Beautiful Code
Jan 12, 2008
This bit of code for converting numbers into their Roman numeral equivalents was posted by geezusfreeek on reddit, and I found it so alarmingly pretty that I'm going to spend some time to break it down. This exercise is largely for my own edification, but perhaps somebody else will learn a bit along the way. Here's the code:
import List
import Control.Arrow
romanize = concat . unfoldr next
where next 0 = Nothing
next x = Just $ second (x-) $ head $ filter ((<=x) . snd) numerals
numerals = [("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
("X", 10), ("IX", 9), ("V", 5), ("IV", 4),
("I", 1)]
Starting from the inside out is usually the easiest method of understanding blocks like this, so we'll look first at the "numerals" list. It is, simply, a list of tuples matching Roman numerals to their value. Importantly, it's ordered from largest to smallest; we'll see why shortly.
The meat of this snippet comes in the "next" function:
next 0 = Nothing
next x = Just $ second (x-) $ head $ filter ((<=x) . snd) numerals
This function is going to return either Nothing (in the base case) or Just a tuple of type (String, Integer), where the string is the largest roman numeral that's smaller than x and the int is the difference between x and the value of that roman numeral.
The first thing we do is filter out the numerals larger than x; we can run some quick tests at the interactive console to see how this works:
Prelude> let numerals = [("M", 1000), ("CM", 900), ("D", 500)]
Prelude> filter ((<=900) . snd) numerals
[("CM",900),("D",500)]
We use head to grab the largest numeral we can squeeze into the current value:
Prelude> head $ filter ((<=900) . snd) numerals
("CM",900)
Then we subtract the value of the roman numeral from the current value of x, in order to return the remainder to be operated on next:
Prelude> :m Control.Arrow
Prelude Control.Arrow> second (900-) $ head $ filter ((<=900) . snd) numerals
("CM",0)
Prelude Control.Arrow> second (915-) $ head $ filter ((<=915) . snd) numerals
("CM",15)
And, finally, we wrap it up in a Maybe monad, so that the main function can handle the base case transparently:
Prelude Control.Arrow> Just $ second (915-) $ head $
filter ((<=915) . snd) numerals
Just ("CM",15)
And now we're getting pretty close! What we have is a function that returns either Nothing or a tuple containing a letter to add to our roman numeral and the next value to operate on. All we have left is the code to drive the operation:
romanize = concat . unfoldr next
The use of unfoldr helps explain why we've wrapped next up in Maybe; here's its type:
Prelude Control.Arrow> :m List
Prelude List> :t unfoldr
unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
So we'll take a function from b -> Maybe (a, b), a value of type b, and return a list of [a]. In our case, a = String and b = Integer.
All unfoldr does is call the function it's given as a first argument, which returns a tuple or Nothing. If it's a tuple, it appends its first element to a list and calls the same function with the second element. If it's Nothing, it returns the list it's built up to this point.
As we can see, that matches up perfectly with the "next" function we've already defined, and the unfoldr will build up a list of roman numeral strings:
Prelude> :m Control.Arrow List
Prelude Control.Arrow List> let numerals = [("X", 10), ("IX", 9), ("V", 5),
("IV", 4), ("I", 1)]
Prelude Control.Arrow List> let next x = if x==0 then Nothing
else Just $ second (x-) $ head $ filter ((<=x) . snd) numerals
Prelude Control.Arrow List> unfoldr next 9
["IX"]
Prelude Control.Arrow List> unfoldr next 8
["V","I","I","I"]
(We had to mangle the definition of next a bit to fit into the interactive interpreter, but we haven't changed anything fundamental about it.)
Finally, all we have to do is concatenate the strings into our final result:
Prelude Control.Arrow List> (concat . unfoldr next) 8
"VIII"
Conclusion
Phew - that's an awful lot of information stuffed into a few pretty lines! I still have a hell of a time geting the types to match up so that I can write stuff like this, but that doesn't mean that I can't appreciate the beauty of how it all fits together. The original function presented here is tight in the very best sense of the word.
[# ▷] haskell, code, programming
Why Publish CS Papers Without Code?
Mar 12, 2007
Imagine that you read a paper analyzing "Hamlet" in great detail. Intrigued, you went to the bookstore to look for a copy, only to find that there were none. Confused, you checked the library - nothing. Finally, you went online to shakespeare.com, only to find a note saying that "Hamlet" was still being edited and would be released Any Day Now.
Unfortunately, an analogous situation is the norm in the world of academic computer science. Graduate students and professors produce code by the truckload, and a majority of them produce only papers about their work. While this is certainly not always the case, it was difficult for me to even find these examples of academic source code. In all four cases, it has proven very useful to programmers outside of academia.
Life, not Math
The fact is, computer science research is more like biology research than it is like mathematical research. For an experiment to be valid, it should be repeatable. If you're publishing analyses of programs without publishing the code that was analyzed, how can the community possibly verify what you claim?
Bugs are endemic to source code everywhere, and there is no reason to believe that academic code is any different. All of the analysis in the world is irrelevant if the program you are analyzing has a subtle bug embedded in it. We, computer scientists, should expect and demand that published, well tested code be made available with every paper which claims to analyze or draw conclusions from a program of any significant size.
A Problem of Environment?
At the moment, there is just no easy way to do this. Where the open source world has SourceForge and other project hosting sites, there is no similar environment where academic papers can live alongside the code they reference. Instead, computer science researchers are encouraged to publish papers in traditional, copyright-locked, academic journals. These journals are not prepared to handle large volumes of code, nor should they be. Computer science research was made for the web, and there should be a home for it on the web.
It is intimidating for a harried academic that just wants to get work done, and advance in his field, to have to set up the host of tools required to share code effectively. If there were a SourceForge-for-academics site where they could simply register their paper, drop in the source code, and have a project ready made for them, it would be much more likely that they would participate. If such a project were to gain steam, the network effect would make it an invaluable resource for academic and industry programmers alike. Collaboration and open code sharing would lead to much more rapid progress in the field, and hopefully encourage the greater rigor that other fields require of their practitioners.
Two Different Worlds
Right now, academics and open-source programmers live largely in two separate, often parallel, worlds. Where they do collide, as they do in the Haskell language, there is often extremely interesting work being produced. Each brings a different, interesting, viewpoint to the table, and greater coordination between the two would have nearly universal benefits for programmers of every stripe.
So free the code! If you're an academic programmer, consider publishing the code that you have, regardless of what you may think of it. Consider asking the journal that you publish in to retain copyright over your work. If you're an open source programmer, look for some work in a field that interests you, and email the author if he hasn't released his code. Ask him an interesting question, and convince him that his code would be useful. If you're both, tell the world your ideas for how we can all work better together.
Update:
I've been contacted already by two scientists who care about the reproducibility of programs. First, Gavin Baker wrote to tell me about the insight toolkit, which "is a cross-platform open-source image processing toolkit for performing registration and segmentation" that attempts to provide reference implementations of published algorithms. He also pointed me towards The Insight Journal, which allows authors to publish open articles which are automatically verified with CMake. This is exactly the type of thing I was hoping to hear about.
Only a few minutes later, "I. Vlad" wrote to tell me that computational geophysicists have a similar system called Madagascar, which uses SCons to provide automated verification of results. Furthermore, they encourage the use of open data sets from the website for the Society of Exploration Geophysicists.
Good to hear that these scientists are out there making stuff happen, while I sit here on my duff.
Update 2
Made a correction to the update. (Made it clear that the Madagascar project did not set up the SEG site - sloppy writing on my part).
[# ▷] Computer Science, cs, programming, computer, python, haskell, science