Evaluating A Few Bracket Picking Models

Since I made the bracket randomizer (please go read that if you haven't, or this won't make any sense to you) I've been wondering about its performance.

Included in the bracket randomizer are four models used to pick which team will win a game; one is quite simple and the other three are variations on a slightly more complicated scheme:

For fun, I hacked up a couple of other possible models to compare to the ones above:

To check how well the models predict, I ran each one 10,000 times and calculated how many games it predicted correctly:

We can see immediately that, of 38 games so far, the higher-seeded team has won 30 of them and the higher Pomeroy-ranked team has won 33. It should also come as no surprise that the random model most frequently predicted half the games right.
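For the curious, here's a minimal sketch of that kind of tally. It is not the actual code from the randomizer: `coin_flip_model`, `count_correct`, `tally`, and the made-up `games` list are all names and data invented for illustration. It also scores each played game on its own, whereas a real bracket has to carry its picks forward from round to round.

```python
import random
from collections import Counter


def coin_flip_model(team_a, team_b, rng):
    """Hypothetical stand-in for a pure random model: either team, equal odds."""
    return team_a if rng.random() < 0.5 else team_b


def count_correct(model, games, rng):
    """Count the games where the model's pick matches the actual winner."""
    return sum(model(a, b, rng) == winner for a, b, winner in games)


def tally(model, games, runs=10_000, seed=0):
    """Run the model `runs` times; map 'games correct' -> number of runs."""
    rng = random.Random(seed)
    return Counter(count_correct(model, games, rng) for _ in range(runs))


# Made-up results for 38 games: (team_a, team_b, actual_winner).
games = [(f"A{i}", f"B{i}", f"A{i}") for i in range(38)]

histogram = tally(coin_flip_model, games)
for correct in sorted(histogram):
    print(correct, histogram[correct])
```

Swapping in a different model is just a matter of passing another function with the same `(team_a, team_b, rng)` signature.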

As the value of n in the Pomeroy random model increases, the curve moves from halfway in between the random and the Pomeroy strict model, with a large spread of possibilities, to a closer and closer approximation of the Pomeroy strict model.

In most bracket pools, the number of games you pick correctly doesn't matter as much as the number of points you score. It should not surprise us that this graph doesn't look much different from the graph of total games predicted accurately.

The most important statistic for your bracket pool is the number of points you can still win; with Kansas (a #1 seed) out already, many people have lost their predicted champions. This creates an immediately obvious pattern in the possible remaining points for the seed model.

Since the seed model always sends all four #1 seeds to the Final Four and then chooses randomly among them, half the time it picks Kansas to lose its semifinal and ends up with a pretty darn good total points remaining score. A quarter of the time, it picks Kansas to win the national championship and has a poor one; the other quarter of the time, Kansas loses in the championship game and the remaining score is in the middle. This explains the seemingly strange shape at the top of the graph.
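The half / quarter / quarter split can be checked by enumerating the picks exactly. This is only a sketch: it assumes every pick among the #1 seeds is an even coin flip, and `kansas_outcomes` is a name made up for this example.

```python
from fractions import Fraction
from itertools import product


def kansas_outcomes():
    """Exact odds of each Kansas result, assuming even coin flips among #1 seeds."""
    half = Fraction(1, 2)
    outcomes = {
        "loses semifinal": Fraction(0),
        "loses final": Fraction(0),
        "wins title": Fraction(0),
    }
    # Each pair is (picked to win the semifinal?, picked to win the final?).
    for wins_semi, wins_final in product([True, False], repeat=2):
        prob = half * half
        if not wins_semi:
            # The final pick is irrelevant once Kansas is out.
            outcomes["loses semifinal"] += prob
        elif wins_final:
            outcomes["wins title"] += prob
        else:
            outcomes["loses final"] += prob
    return outcomes


for result, prob in kansas_outcomes().items():
    print(result, prob)
```

Who Kansas would face in the final doesn't matter here, since every matchup between #1 seeds is treated as a toss-up.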

Another thing the whole graph shows us is the true awfulness of the pure random model; its most common result leaves your bracket with an underwhelming 37 points remaining.

Let's zoom in and check out what's going on down below the seed model.

Again, as n increases in the Pomeroy random model, it approximates the strict Pomeroy model more closely. What's interesting to me, though, is the difference between the two- and three-repetition models; the two-repetition model has a much greater spread than the three, and many fewer cases right near the Pomeroy line. I don't have a good theory for why this is, so if you do, please let me know.

In my bracket pools, anybody with a possible score over 100 is doing well, and anybody with a possible score of 140 or higher is kicking serious butt. If you picked your bracket using the Pomeroy random 2 rep model (like I did, incidentally), you have roughly a 16.3% chance of having >=140 points remaining, and a 71.4% chance of having >=100 points remaining.
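Percentages like these are just the share of simulated brackets at or above a cutoff. A rough sketch, where `share_at_least` is a made-up helper and the `points_remaining` values are invented for illustration, not real output from any of the models:

```python
def share_at_least(samples, threshold):
    """Fraction of simulated brackets with at least `threshold` points remaining."""
    if not samples:
        raise ValueError("need at least one simulated bracket")
    return sum(s >= threshold for s in samples) / len(samples)


# Invented points-remaining values, one per simulated bracket.
points_remaining = [88, 104, 141, 97, 152, 120, 133, 99, 140, 76]

print(share_at_least(points_remaining, 140))  # 0.3
print(share_at_least(points_remaining, 100))  # 0.6
```

With 10,000 real runs in place of the ten fake ones, the same two calls would give the >=140 and >=100 figures for a model.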

Unfortunately, I don't have good data on what counts as a good "points remaining" score. If you do have access to such data, please let me know.

But what if you want more? What if you want not just to win your office pool, but to beat everybody in the bracket? Well then, you'll want to zoom in on the very high end of the distribution.

The Pomeroy model with the most randomness has the very highest scores, with two brackets having a whopping 177 possible points remaining. It's also apparent that the less random a model is, the fewer highly unlikely events it produces, and the less likely it is to generate a world-beating near-perfect bracket.

If you've got an idea for a different model you'd like to try out, or you just want to show me where the bugs in my code are, you can go check it out on GitHub.

Mar 24, 2010