<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
	<link rel="self" href="/Atom/" />
	<id>http://billmill.org/</id>
	<title>My Name Rhymes</title>
	<subtitle>Bill Mill blogs irregularly</subtitle>
	<updated>2007-03-12T07:29:00Z</updated>
	<author>
		<name>Bill Mill</name>
		<email>bill.mill@gmail.com</email>
		<uri>http://billmill.org/</uri>
	</author>
	<link href="http://billmill.org/" />
	<entry>
		<title>Why Publish CS Papers Without Code?</title>
		<link href="http://billmill.org/why_no_code.html" />	
		<id>http://billmill.org/why_no_code.html</id>
		<updated>2007-03-12T07:29:00Z</updated>
		<summary type="html">&lt;p&gt;Imagine that you read a paper analyzing "Hamlet" in great detail. Intrigued, 
you went to the bookstore to look for a copy, only to find that there were 
none. Confused, you checked the library - nothing. Finally, you went online to 
shakespeare.com, only to find a note saying that "Hamlet" was still being 
edited and would be released Any Day Now.&lt;/p&gt;
&lt;p&gt;Unfortunately, an analogous situation is the norm in the world of academic 
computer science. Graduate students and professors produce code by the 
truckload, and a majority of them produce only papers about their work. While 
this is certainly &lt;a href="http://www.cs.waikato.ac.nz/~ml/"&gt;not&lt;/a&gt; &lt;a 
href="http://www.bluej.org/"&gt;always&lt;/a&gt; &lt;a 
href="https://www.drproject.org/"&gt;the&lt;/a&gt; &lt;a 
href="http://www.haskell.org/ghc/"&gt;case&lt;/a&gt;, it was difficult for me to even 
find these examples of academic source code. In all four cases, it has proven 
very useful to programmers outside of academia.&lt;/p&gt;
&lt;h1&gt;Life, not Math&lt;/h1&gt;
&lt;p&gt;The fact is, computer science research is more like biology research than it 
is like mathematical research. For an experiment to be valid, it should be 
repeatable. If you're publishing analyses of programs without publishing the 
code that was analyzed, how can the community possibly verify what you 
claim?&lt;/p&gt;
&lt;p&gt;Bugs are endemic to source code everywhere, and there is no reason to 
believe that academic code is any different. All of the analysis in the world 
is irrelevant if the program you are analyzing has a subtle bug embedded in it.  
We, computer scientists, should expect and demand that published, well tested 
code be made available with every paper which claims to analyze or draw 
conclusions from a program of any significant size.&lt;/p&gt;
&lt;h1&gt;A Problem of Environment?&lt;/h1&gt;
&lt;p&gt;At the moment, there is just no easy way to do this. Where the open 
source world has &lt;a href="http://sf.net"&gt;SourceForge&lt;/a&gt; and other project 
hosting sites, there is no similar environment where academic papers can live 
alongside the code they reference. Instead, computer science researchers are 
encouraged to publish papers in traditional, copyright-locked, academic 
journals. These journals are not prepared to handle large volumes of code, nor 
should they be. Computer science research was made for the web, and there 
should be a home for it on the web.&lt;/p&gt;
&lt;p&gt;It is intimidating for a harried academic that just wants to get work done, 
and advance in his field, to have to set up the host of tools required to share 
code effectively. If there were a SourceForge-for-academics site where they 
could simply register their paper, drop in the source code, and have a project 
ready made for them, it would be much more likely that they would participate.  
If such a project were to gain steam, the network effect would make it an 
invaluable resource for academic and industry programmers alike. Collaboration 
and open code sharing would lead to much more rapid progress in the field, and 
hopefully encourage the greater rigor that other fields require of their 
practitioners.&lt;/p&gt;
&lt;h1&gt;Two Different Worlds&lt;/h1&gt;
&lt;p&gt;Right now, academics and open-source programmers live largely in two 
separate, often parallel, worlds. Where they do collide, as they do in the 
Haskell language, there is often extremely interesting work being produced.  
Each brings a different, interesting, viewpoint to the table, and greater 
coordination between the two would have nearly universal benefits for 
programmers of every stripe.&lt;/p&gt;
&lt;p&gt;So free the code! If you're an academic programmer, consider publishing the 
code that you have, regardless of what you may think of it. Consider asking the 
journal that you publish in to retain copyright over your work. If you're an 
open source programmer, look for some work in a field that interests you, and 
email the author if he hasn't released his code. Ask him an interesting 
question, and convince him that his code would be useful. If you're both, tell 
the world your ideas for how we can all work better together.&lt;/p&gt;
&lt;h1&gt;Update:&lt;/h1&gt;
&lt;p&gt;I've been contacted already by two scientists who care about the 
reproducibility of programs. First, &lt;a 
href="http://www.cs.mu.oz.au/~gavinb/index.php"&gt;Gavin Baker&lt;/a&gt; wrote to tell 
me about the &lt;a href="http://itk.org"&gt;insight toolkit&lt;/a&gt;, which "is a 
cross-platform open-source image processing toolkit for performing registration 
and segmentation" that attempts to provide reference implementations of 
published algorithms. He also pointed me towards &lt;a 
href="http://www.insight-journal.org/"&gt;The Insight Journal&lt;/a&gt;, which allows 
authors to publish open articles which are automatically verified with CMake.  
This is exactly the type of thing I was hoping to hear about.&lt;/p&gt;
&lt;p&gt;Only a few minutes later, "I. Vlad" wrote to tell me that computational 
geophysicists have a similar system called &lt;a 
href="http://rsf.sourceforge.net/"&gt;Madagascar&lt;/a&gt;, which uses SCons to provide 
automated verification of results. Furthermore, they encourage the use of open 
data sets from &lt;a href="http://software.seg.org/"&gt;the website&lt;/a&gt; for the 
Society of Exploration Geophysicists.&lt;/p&gt;
&lt;p&gt;Good to hear that these scientists are out there making stuff happen, while 
I sit here on my duff.&lt;/p&gt;
&lt;h1&gt;Update 2&lt;/h1&gt;
&lt;p&gt;Made a correction to the update. (Made it clear that the Madagascar project 
did not set up the SEG site - sloppy writing on my part).&lt;/p&gt;
</summary>
		<content type="html">&lt;p&gt;Imagine that you read a paper analyzing "Hamlet" in great detail. Intrigued, 
you went to the bookstore to look for a copy, only to find that there were 
none. Confused, you checked the library - nothing. Finally, you went online to 
shakespeare.com, only to find a note saying that "Hamlet" was still being 
edited and would be released Any Day Now.&lt;/p&gt;
&lt;p&gt;Unfortunately, an analogous situation is the norm in academic computer 
science. Graduate students and professors produce code by the truckload, yet 
most of them publish only papers about their work. While this is certainly 
&lt;a href="http://www.cs.waikato.ac.nz/~ml/"&gt;not&lt;/a&gt; &lt;a 
href="http://www.bluej.org/"&gt;always&lt;/a&gt; &lt;a 
href="https://www.drproject.org/"&gt;the&lt;/a&gt; &lt;a 
href="http://www.haskell.org/ghc/"&gt;case&lt;/a&gt;, it was difficult for me even to 
find these examples of academic source code. In all four cases, the released 
code has proven very useful to programmers outside of academia.&lt;/p&gt;
&lt;h1&gt;Life, not Math&lt;/h1&gt;
&lt;p&gt;The fact is, computer science research is more like biology research than it 
is like mathematical research. For an experiment to be valid, it should be 
repeatable. If you're publishing analyses of programs without publishing the 
code that was analyzed, how can the community possibly verify what you 
claim?&lt;/p&gt;
&lt;p&gt;Bugs are endemic to source code everywhere, and there is no reason to 
believe that academic code is any different. All of the analysis in the world 
is irrelevant if the program you are analyzing has a subtle bug embedded in it.  
We computer scientists should expect and demand that published, well-tested 
code be made available with every paper that claims to analyze or draw 
conclusions from a program of any significant size.&lt;/p&gt;
&lt;h1&gt;A Problem of Environment?&lt;/h1&gt;
&lt;p&gt;At the moment, there is just no easy way to do this. Where the open 
source world has &lt;a href="http://sf.net"&gt;SourceForge&lt;/a&gt; and other project 
hosting sites, there is no similar environment where academic papers can live 
alongside the code they reference. Instead, computer science researchers are 
encouraged to publish papers in traditional, copyright-locked academic 
journals. These journals are not prepared to handle large volumes of code, nor 
should they be. Computer science research was made for the web, and there 
should be a home for it on the web.&lt;/p&gt;
&lt;p&gt;It is intimidating for a harried academic who just wants to get work done 
and advance in his field to have to set up the host of tools required to share 
code effectively. If there were a SourceForge-for-academics site where they 
could simply register their paper, drop in the source code, and have a project 
ready made for them, it would be much more likely that they would participate.  
If such a project were to gain steam, the network effect would make it an 
invaluable resource for academic and industry programmers alike. Collaboration 
and open code sharing would lead to much more rapid progress in the field, and 
hopefully encourage the greater rigor that other fields require of their 
practitioners.&lt;/p&gt;
&lt;h1&gt;Two Different Worlds&lt;/h1&gt;
&lt;p&gt;Right now, academics and open-source programmers live largely in two 
separate, often parallel, worlds. Where they do collide, as they do in the 
Haskell language, there is often extremely interesting work being produced.  
Each brings a different and interesting viewpoint to the table, and greater 
coordination between the two would have nearly universal benefits for 
programmers of every stripe.&lt;/p&gt;
&lt;p&gt;So free the code! If you're an academic programmer, consider publishing the 
code that you have, regardless of what you may think of it. Consider asking the 
journal you publish in to let you retain copyright over your work. If you're an 
open source programmer, look for some work in a field that interests you, and 
email the author if he hasn't released his code. Ask him an interesting 
question, and convince him that his code would be useful. If you're both, tell 
the world your ideas for how we can all work better together.&lt;/p&gt;
&lt;h1&gt;Update&lt;/h1&gt;
&lt;p&gt;I've been contacted already by two scientists who care about the 
reproducibility of programs. First, &lt;a 
href="http://www.cs.mu.oz.au/~gavinb/index.php"&gt;Gavin Baker&lt;/a&gt; wrote to tell 
me about the &lt;a href="http://itk.org"&gt;Insight Toolkit&lt;/a&gt;, which "is a 
cross-platform open-source image processing toolkit for performing registration 
and segmentation" that attempts to provide reference implementations of 
published algorithms. He also pointed me towards &lt;a 
href="http://www.insight-journal.org/"&gt;The Insight Journal&lt;/a&gt;, which allows 
authors to publish open articles which are automatically verified with CMake.  
This is exactly the type of thing I was hoping to hear about.&lt;/p&gt;
&lt;p&gt;Only a few minutes later, "I. Vlad" wrote to tell me that computational 
geophysicists have a similar system called &lt;a 
href="http://rsf.sourceforge.net/"&gt;Madagascar&lt;/a&gt;, which uses SCons to provide 
automated verification of results. Furthermore, they encourage the use of open 
data sets from &lt;a href="http://software.seg.org/"&gt;the website&lt;/a&gt; of the 
Society of Exploration Geophysicists.&lt;/p&gt;
&lt;p&gt;Good to hear that these scientists are out there making stuff happen, while 
I sit here on my duff.&lt;/p&gt;
&lt;h1&gt;Update 2&lt;/h1&gt;
&lt;p&gt;Made a correction to the update. (Made it clear that the Madagascar project 
did not set up the SEG site - sloppy writing on my part).&lt;/p&gt;
</content>
	</entry>
	<entry>
		<title>Rating My Summer League With Python</title>
		<link href="http://billmill.org/elo_ratings.html" />	
		<id>http://billmill.org/elo_ratings.html</id>
		<updated>2005-06-24T21:41:00Z</updated>
		<summary type="html">I play &lt;a href="http://www.ultimatehandbook.com/uh"&gt;Ultimate&lt;/a&gt; (sometimes
incorrectly called &lt;a href="http://whatisultimate.com"&gt;ultimate frisbee&lt;/a&gt;), 
which is one of the
reasons I've been very light on blogging this summer. As well as playing for my
club &lt;a href="http://www2.upa.org/scores/scores.cgi?div=127&amp;page=3&amp;team=3448"&gt;team&lt;/a&gt;,
I play in the Connecticut &lt;a href="http://ctultimate.com"&gt;summer league&lt;/a&gt;.&lt;p&gt;
This is fun, but as a stats nerd, I have to do something beyond just play. As
such, last summer I began 
&lt;a href="http://llimllib.f2o.org/elo/elo.html"&gt;ranking&lt;/a&gt; the teams in my
league. I started out intending to use an 
&lt;a href="http://collegerpi.com/rpifaq.html#Formula"&gt;RPI&lt;/a&gt;-style formula, but
it seemed from my reading that there were better algorithms out there. In
particular, I liked 
&lt;a href="http://www.usatoday.com/sports/sagarin/bkt0405.htm"&gt;Sagarin's&lt;/a&gt; ELO
rankings, which do not take into account margin of victory. Fortunately, I
found the &lt;a href="http://www.masseyratings.com/theory/sauceda.htm"&gt;Sauceda&lt;/a&gt;
ratings, which do, since I think they are significant in my summer league.&lt;p&gt;
After a little tinkering with the constants, I came up with a python
&lt;a href="http://billmill.org/static/files/elo.py"&gt;module&lt;/a&gt; to perform the Sauceda
calculations, and another 
&lt;a href="http://billmill.org/static/files/webpage.py"&gt;one&lt;/a&gt; to print out a 
web page for me. The webpage module is likely to be of little use to anyone,
except to serve as an example of how to use the elo module. You can see the
output &lt;a href="http://llimllib.f2o.org/elo/elo.html"&gt;here&lt;/a&gt; if you didn't
click on the link already.&lt;p&gt;
&lt;h2&gt;The Sauceda Rating System&lt;/h2&gt;&lt;p&gt;
The basic idea of the Sauceda rating is that when two teams play a game, they
contest one "game point". This is then combined with a team's "winning
expectancy", which is a function of the difference in the two teams' ratings,
to update each team's rating.&lt;p&gt;
The "game point" division is done via the following formula, where pd is the
game's point differential (i.e. 4 if a team wins 9-5) and pdv is the relative
value of the point differential:&lt;p&gt;
&lt;code&gt;gp = 1 - .4 ** (1 + (pd / pdv))&lt;/code&gt;&lt;p&gt;
Unless a game is tied, in which case each team gets .5, I use this formula to
determine the winner's game point percentage, and the remainder goes to the
loser. Pdv is a parameter which should be tweaked to provide the
best results for the sport under consideration; I use 4.5 for Ultimate (games
go to 9 or 13 in summer league), while the
author of the ranking system uses 11 for basketball.&lt;p&gt;
Next up, we calculate each team's winning expectancy, where Ra is the current
team's rating, and Rb is their opponent's rating:&lt;p&gt;
&lt;code&gt;we = 1 / (1 + 10 ** ((Rb - Ra) / 400)&lt;/code&gt;&lt;p&gt;
Finally, once we know a team's win expectancy (we) and share of the game point
(gp), we can update their ranking, where R0 is their current ranking:&lt;p&gt;
&lt;code&gt;R_new = R0 + K * (gp - we)&lt;/code&gt;&lt;p&gt;
K is another parameter to tweak; I use 60, currently. I've tweaked all the
parameters based on last year's Connecticut ultimate season, so they work for
me, but you'll probably have to find values that work for whatever particular
sport or game you'll be ranking.&lt;p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
I just wanted to share the algorithm I've been using to rank teams in my summer
league, since I think it's pretty neat. There's a lot more interesting stuff
about rating algorithms to talk about, but I think I've rambled long enough for
now. If you haven't yet, you can go see
&lt;a href="http://llimllib.f2o.org/elo/elo.html"&gt;the ratings&lt;/a&gt; for my summer
league to get an idea of what, practically, this algorithm does. (You can also
see the neat diagrams I used matplotlib to generate).&lt;p&gt;
Thanks go to Eduardo Sauceda and Ken
Massey for publishing the algorithm, which I've used with only tweaks to the
parameters.
</summary>
		<content type="html">I play &lt;a href="http://www.ultimatehandbook.com/uh"&gt;Ultimate&lt;/a&gt; (sometimes
incorrectly called &lt;a href="http://whatisultimate.com"&gt;ultimate frisbee&lt;/a&gt;), 
which is one of the
reasons I've been very light on blogging this summer. As well as playing for my
club &lt;a href="http://www2.upa.org/scores/scores.cgi?div=127&amp;page=3&amp;team=3448"&gt;team&lt;/a&gt;,
I play in the Connecticut &lt;a href="http://ctultimate.com"&gt;summer league&lt;/a&gt;.&lt;p&gt;
This is fun, but as a stats nerd, I have to do something beyond just play. As
such, last summer I began 
&lt;a href="http://llimllib.f2o.org/elo/elo.html"&gt;ranking&lt;/a&gt; the teams in my
league. I started out intending to use an 
&lt;a href="http://collegerpi.com/rpifaq.html#Formula"&gt;RPI&lt;/a&gt;-style formula, but
it seemed from my reading that there were better algorithms out there. In
particular, I liked 
&lt;a href="http://www.usatoday.com/sports/sagarin/bkt0405.htm"&gt;Sagarin's&lt;/a&gt; ELO
rankings, which do not take into account margin of victory. Fortunately, I
found the &lt;a href="http://www.masseyratings.com/theory/sauceda.htm"&gt;Sauceda&lt;/a&gt;
ratings, which do, since I think they are significant in my summer league.&lt;p&gt;
After a little tinkering with the constants, I came up with a Python
&lt;a href="http://billmill.org/static/files/elo.py"&gt;module&lt;/a&gt; to perform the Sauceda
calculations, and another 
&lt;a href="http://billmill.org/static/files/webpage.py"&gt;one&lt;/a&gt; to print out a 
web page for me. The webpage module is likely to be of little use to anyone,
except to serve as an example of how to use the elo module. You can see the
output &lt;a href="http://llimllib.f2o.org/elo/elo.html"&gt;here&lt;/a&gt; if you didn't
click on the link already.&lt;p&gt;
&lt;h2&gt;The Sauceda Rating System&lt;/h2&gt;&lt;p&gt;
The basic idea of the Sauceda rating is that when two teams play a game, they
contest one "game point". This is then combined with a team's "winning
expectancy", which is a function of the difference in the two teams' ratings,
to update each team's rating.&lt;p&gt;
The "game point" division is done via the following formula, where pd is the
game's point differential (e.g. 4 if a team wins 9-5) and pdv is the relative
value of the point differential:&lt;p&gt;
&lt;code&gt;gp = 1 - .4 ** (1 + (pd / pdv))&lt;/code&gt;&lt;p&gt;
If a game is tied, each team gets .5; otherwise, I use this formula to
determine the winner's share of the game point, and the remainder goes to the
loser. Pdv is a parameter that should be tweaked to provide the best results
for the sport under consideration; I use 4.5 for Ultimate (games go to 9 or 13
in summer league), while the author of the ranking system uses 11 for
basketball.&lt;p&gt;
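Here is a minimal Python sketch of that split. It is not the code from elo.py: the function names are hypothetical, and the 4.5 default for pdv is simply the Ultimate value mentioned above. Ties get their own branch because the formula alone would hand the winner .6 even when pd is 0.&lt;p&gt;
&lt;pre&gt;
```python
def game_point_share(pd, pdv=4.5):
    """Winner's share of the single game point at stake.

    pd is the point differential (e.g. 4 for a 9-5 win) and pdv is the
    tunable point differential value. A tie splits the point evenly;
    without the special case the formula would give 0.6 at pd == 0.
    """
    if pd == 0:
        return 0.5
    # float() avoids integer division if this is run under Python 2.
    return 1 - 0.4 ** (1 + pd / float(pdv))


def split_game_point(winner_score, loser_score, pdv=4.5):
    """Return (winner_share, loser_share); the two always sum to 1."""
    pd = abs(winner_score - loser_score)
    winner_share = game_point_share(pd, pdv)
    return winner_share, 1 - winner_share
```
&lt;/pre&gt;
With pdv = 4.5, a 9-5 win gives the winner about .82 of the game point and the loser about .18. A larger pdv makes the share grow more slowly with the margin, so running up the score matters less.&lt;p&gt;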
Next up, we calculate each team's winning expectancy, where Ra is the current
team's rating, and Rb is their opponent's rating:&lt;p&gt;
&lt;code&gt;we = 1 / (1 + 10 ** ((Rb - Ra) / 400))&lt;/code&gt;&lt;p&gt;
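As a hedged sketch (the function name is mine, not from elo.py), the win expectancy is the standard Elo logistic curve on a 400-point scale. Dividing by 400.0 rather than 400 matters under Python 2, where dividing two integer ratings would silently truncate.&lt;p&gt;
&lt;pre&gt;
```python
def win_expectancy(ra, rb):
    """Expected share of the game point for a team rated ra facing rb.

    This is the usual Elo logistic curve: equal ratings give 0.5, and a
    400-point edge gives about 0.91. Dividing by 400.0 keeps the result
    correct under Python 2, where int / int truncates.
    """
    return 1 / (1 + 10 ** ((rb - ra) / 400.0))


# Example: equal ratings, then a 400-point favourite.
print(win_expectancy(1500, 1500))  # 0.5
print(win_expectancy(1900, 1500))  # about 0.909
```
&lt;/pre&gt;
A team's expectancy and its opponent's always sum to 1.&lt;p&gt;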
Finally, once we know a team's win expectancy (we) and share of the game point
(gp), we can update their rating, where R0 is their current rating:&lt;p&gt;
&lt;code&gt;R_new = R0 + K * (gp - we)&lt;/code&gt;&lt;p&gt;
K is another parameter to tweak; I currently use 60. I've tuned all the
parameters on last year's Connecticut Ultimate season, so they work for
me, but you'll probably have to find values that work for whatever particular
sport or game you'll be ranking.&lt;p&gt;
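Putting the three formulas together, updating both teams after one game might look like the Python sketch below. It repeats the two helpers so it stands alone. The dict of ratings, the function names, and the 1500 starting rating in the example are assumptions for illustration, not details of elo.py; K = 60 and pdv = 4.5 are the values mentioned above.&lt;p&gt;
&lt;pre&gt;
```python
K = 60      # update step size mentioned in the post
PDV = 4.5   # point differential value mentioned in the post


def win_expectancy(ra, rb):
    # Elo logistic curve on a 400-point scale.
    return 1 / (1 + 10 ** ((rb - ra) / 400.0))


def game_point_share(pd, pdv=PDV):
    # Winner's share of the game point; a tie is split evenly.
    return 0.5 if pd == 0 else 1 - 0.4 ** (1 + pd / float(pdv))


def rate_game(ratings, team_a, score_a, team_b, score_b, k=K, pdv=PDV):
    """Update ratings (a dict of team name to rating) in place for one game."""
    ra, rb = ratings[team_a], ratings[team_b]
    share = game_point_share(abs(score_a - score_b), pdv)
    # Team A takes the winner's share if it won; a tie already gives 0.5.
    gp_a = share if score_a == max(score_a, score_b) else 1 - share
    we_a = win_expectancy(ra, rb)
    # R_new = R0 + K * (gp - we); B's gp and we are 1 minus A's.
    ratings[team_a] = ra + k * (gp_a - we_a)
    ratings[team_b] = rb + k * ((1 - gp_a) - (1 - we_a))


# Hypothetical example: two teams start at 1500 and A wins 9-5.
ratings = {"A": 1500.0, "B": 1500.0}
rate_game(ratings, "A", 9, "B", 5)
print(ratings)  # A gains about 19.4 points and B loses the same amount
```
&lt;/pre&gt;
Because B's game point and win expectancy are just 1 minus A's, each update is zero-sum: one team gains exactly what the other loses, so the league's average rating does not drift.&lt;p&gt;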
&lt;h2&gt;Conclusion&lt;/h2&gt;
I just wanted to share the algorithm I've been using to rank teams in my summer
league, since I think it's pretty neat. There's a lot more interesting stuff
about rating algorithms to talk about, but I think I've rambled long enough for
now. If you haven't yet, you can go see
&lt;a href="http://llimllib.f2o.org/elo/elo.html"&gt;the ratings&lt;/a&gt; for my summer
league to get a practical sense of what this algorithm does. (You can also
see the neat diagrams I generated with matplotlib.)&lt;p&gt;
Thanks go to Eduardo Sauceda and Ken
Massey for publishing the algorithm, which I've used with only tweaks to the
parameters.
</content>
	</entry>
</feed>
