Wednesday, March 20, 2013

Predictions!

This long-overdue post may wind up on the longer side, as a lot of pretty important and interesting results have come out of my model. Before getting to those results, though, let's talk about picking your bracket in general, along with some useful guidelines to consider when making picks for your office or friendly pool.

Does your pool provide bonus points for upset picks?
This is the first question you should ask yourself when picking your bracket. Many pools award bonus points for correctly predicted "upset" picks as an added incentive to make what would traditionally be considered risky choices. For example, consider a pool that gives 1 bonus point for a correct upset pick in the first round regardless of the relative seeding of the teams, so a 9-seed beating an 8-seed is treated identically to a 15-seed beating a 2-seed. Second-round upsets earn 2 bonus points, and so on.

In the first round, this kind of incentive structure can complicate what might otherwise be considered optimal picks, especially if we assume that correct first-round picks are themselves worth 1 point. For the sake of example, let's suppose that the win probabilities on Nate Silver's blog are accurate, at least for the first round. Looking at the South bracket, we see that 7-seed San Diego State has a 56.1% chance of beating 10-seed Oklahoma. Without bonus points, the pick that maximizes your expected points is always San Diego State: you'd expect 1 point the 56.1% of the time San Diego State wins, and 0 points the 43.9% of the time Oklahoma wins.



However, what happens to this example when we add the 1-point upset bonus for correctly picking an Oklahoma victory? Looking at the expected payoff of each of your two choices, you can quickly see that the optimal strategy has changed. Picking San Diego State earns 1 point 56.1% of the time and 0 points 43.9% of the time, for an expected 0.561 points. Picking Oklahoma earns 0 points the 56.1% of the time S.D. State wins, but nets you 2 points in the 43.9% of games Oklahoma wins, for an expected $2 \cdot 0.439 = 0.878$ points! So certain upsets are worth picking purely from the perspective of maximizing your own points (there are also some game-theory arguments for picking upsets that we'll discuss a little later).

So, in the first round, what is the break-even point at which you'll want to pick the upset over the favored team? Let $A$ be the probability that the higher seed wins, and $B = 1 - A$ the probability that the lower seed pulls off the upset. Assuming you get 2 points for a correct upset pick and 1 point for a correct non-upset pick, in the first round you'll want to pick the upset if $2 \cdot B > 1 \cdot A$, i.e. $2(1 - A) > A$, which simplifies to $A < 2/3$. Thus, if the higher-seeded team's probability of winning a first-round game is less than 66.7%, you'll want to pick the upset in a system like this.
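
To make the arithmetic above concrete, here is a minimal Python sketch that computes the expected points of each pick. It assumes a correct favorite pick is worth `base` points and a correct upset pick is worth `base + upset_bonus` points; those parameter names are my own, and the defaults simply mirror the 1-point-plus-1-point-bonus example. The post doesn't say what later-round picks are worth, so those values are inputs here, not facts. Solving $(b + u)(1 - A) = bA$ for a general base $b$ and bonus $u$ gives the break-even threshold $A = (b + u)/(2b + u)$, which reduces to $2/3$ in the first-round example.

```python
from typing import Tuple


def expected_points(p_favorite: float, base: float = 1.0,
                    upset_bonus: float = 1.0) -> Tuple[float, float]:
    """Return (expected points picking the favorite, expected points picking the underdog).

    Scoring is an assumption matching the example in the text: a correct favorite
    pick earns `base`, a correct upset pick earns `base + upset_bonus`, a miss earns 0.
    """
    p_underdog = 1.0 - p_favorite
    ev_favorite = base * p_favorite
    ev_underdog = (base + upset_bonus) * p_underdog
    return ev_favorite, ev_underdog


def break_even(base: float = 1.0, upset_bonus: float = 1.0) -> float:
    """Favorite win probability A below which the upset pick has the higher expected value.

    From (base + bonus) * (1 - A) = base * A  =>  A = (base + bonus) / (2 * base + bonus).
    """
    return (base + upset_bonus) / (2 * base + upset_bonus)
```

For the San Diego State vs. Oklahoma game, `expected_points(0.561)` gives roughly 0.561 for San Diego State and 0.878 for Oklahoma, and `break_even()` returns 2/3. Setting `upset_bonus=0.0` reproduces the no-bonus case, where the threshold drops to 1/2 and the favorite is always the higher-expected-value pick.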

At this point, if we look back at Nate Silver's predictions, we can see that Missouri, Oregon, St. Mary's, Cincinnati, Wichita St., Iowa St., Minnesota, Oklahoma, Bucknell, Davidson and Colorado all become optimal upset picks for maximizing your expected first-round points, with Villanova right on the fence.
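
Applying the threshold across a whole bracket is just a filter over the first-round games. The sketch below is illustrative only: the `(favorite, underdog, p_favorite)` tuple layout and the function name are my own, and the only probability the post quotes is San Diego State's 56.1%, so the remaining probabilities would have to come from Nate Silver's forecasts. It reports the expected-point gain of each upset pick, which makes "on the fence" games like Villanova's easy to spot (their gain is close to zero).

```python
from typing import Iterable, List, Tuple


def upset_picks(matchups: Iterable[Tuple[str, str, float]],
                base: float = 1.0,
                upset_bonus: float = 1.0) -> List[Tuple[str, float]]:
    """Return (underdog, expected-point gain) for every game where picking the upset
    beats picking the favorite on expected points, largest gain first.

    Each matchup is (favorite, underdog, probability the favorite wins). Scoring
    follows the assumed base + upset_bonus scheme described above.
    """
    picks = []
    for _favorite, underdog, p_favorite in matchups:
        # Expected points of the upset pick minus expected points of the favorite pick.
        gain = (base + upset_bonus) * (1.0 - p_favorite) - base * p_favorite
        if gain > 0:
            picks.append((underdog, gain))
    return sorted(picks, key=lambda item: item[1], reverse=True)
```

With the one quoted game, `upset_picks([("San Diego State", "Oklahoma", 0.561)])` returns Oklahoma with a gain of about 0.317 points.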

Applying game theory to your bracket decisions
Up until this point we've been talking about optimal picks in the sense that they represent the picks that maximize your expected point total given the likelihood of various teams' performances. However, is this the best way to win your bracket? It's certainly the best way to maximize the point-total you'll receive in your pool, but is that really what you're after? No. You want to win your pool, and you want to find the strategy that maximizes the likelihood of that happening regardless of the point-total at the end of the day. After all, a win's a win, and close only counts in horse-grenades and sneakers, or something like that.

Anyway, what do I mean by all of this? Game theory tells us that your optimal strategy, given a particular set of goals and circumstances, depends on the strategies of your fellow competitors. So it matters whether the pool you're entering is populated by a bunch of risk-hungry basketball fanatics who are convinced that team X is severely under- or overrated, or by a more conservative group of coworkers or friends who tend to make safer, higher-seeded picks on average.

Let's assume that you're playing in a pool where people tend to make safer picks based on seed and expected-value maximization, as this is generally the norm. In this scenario, it is actually advantageous in certain situations to make picks that are suboptimal in terms of expected value. If the expected value for Team A is 2.5 points while the expected value for Team B is 2.3 points, it will most likely make sense to pick the lower-expected-value team if you believe most of your competitors will pick the other one. The reason is that you want to find places where you can create distance from your competition in terms of points. If you play it safe, you'll likely end up near the top, but probably not at the top. Picking teams that you expect others in your pool to pick against can substantially increase your chances of winning the overall pool, especially when the expected payoffs of the two choices are close.
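
A deliberately extreme toy model shows why "a win's a win" can favor the contrarian pick. The assumptions here are mine, not the post's: the pool is decided by a single game, every opponent picks the favorite, all brackets are otherwise identical, and ties are broken at random. Real pools have many other games that blur this effect, which is why the edge matters most when the two picks' expected values are close.

```python
def toy_pool_win_probability(p_underdog: float, n_opponents: int,
                             contrarian: bool) -> float:
    """Chance of winning a toy pool decided by one game (hypothetical model).

    Assumes every opponent picks the favorite, all other picks are identical,
    and ties are split uniformly at random.
    """
    if contrarian:
        # You alone picked the underdog: you win outright if it wins, else finish last.
        return p_underdog
    # You picked with the field: everyone always ties, so you win 1 in (n + 1) draws.
    return 1.0 / (n_opponents + 1)
```

With Oklahoma's 43.9% chance and nine opponents, the contrarian pick wins this toy pool 43.9% of the time versus 10% for going with the crowd, even though it has the lower expected point total.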

Alright, so now that we know how we're going to think about our picks let's take a look at what my model actually produced.

Results
So after all this work, what did the model have to say about the teams in this year's tournament? After extensive running and testing, the final iteration spat out a surprisingly accurate and interesting set of results:

When compared to the AP rankings, these results are fairly consistent, especially toward the top half of the list. However, there are a few notable differences: the AP ranks Florida 12th and Memphis 19th, while my model places them 3rd and 4th, respectively. As far as Florida is concerned, I'm relatively proud that my model was able to place them so highly, as Nate Silver has calculated them as the 3rd most likely team to win the overall tournament, while Ken Pomeroy, a famous sports statistician (from whom I have taken all my underlying data), actually ranked Florida 1st among all Division I teams in the country. Memphis, on the other hand... we'll find out shortly how far off I was there. At the very least, there's no denying they've had a great season in terms of absolute results: they have the second-highest win total of any team this season at 30, trailing only Gonzaga at 31.

Okay, that's great and all, but how the hell did we get these rankings?
As mentioned in previous posts, we used the Elo rating system with a few key modifications. Let me go over those changes now.

  • As the Elo system was originally designed for chess, home-court advantage isn't something it was built to handle. Thus, when calculating a team's probability of winning a game, I increased the home team's chances accordingly. As it turns out, in NCAA Division I basketball, the home team wins an astonishing 60% of the time. I wasn't expecting a number quite this large, but nevertheless incorporated it into the model.
  • Margin of victory now plays a very important role in determining a team's Elo gains. While in absolute terms a win is a win, especially in a single-elimination tournament, losing a game by a single point is certainly much less damning than getting blown out by 25. To take this into account, I calculated the average margin of victory across all games over the past 12 years, which came to 12.92 points, and scaled each Elo change by the game's margin relative to that average. Winning by double this margin resulted in an Elo gain twice as large as would otherwise be expected, while losing by only 3 points resulted in an Elo loss of 3/12.92, or 23%, of what would otherwise be expected. This change alone shot Florida from 28th in my rankings to 3rd.
  • As there are too few games in a season to accurately estimate a team's Elo, or skill level, I simply doubled the results of each season, running through all the games start to finish twice.
  • As not all players or coaches remain with a team year to year, I partially reset each team's Elo at the end of a season. As you may recall, the "neutral" Elo in our model is 1200. To do this partial reset, I moved each team's Elo one third of the way back to 1200. Thus, a 1500 Elo team would be reset to $(2 \cdot 1500 + 1200)/3 = 1400$ for the beginning of the following season. While this is a relatively arbitrary split, it reflects the fact that most, but not all, of a team's skill is retained year to year.
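
The four modifications can be sketched in a few dozen lines of Python. This is an illustrative reconstruction, not my actual model code, and several pieces are assumptions: the standard logistic Elo expectation with a 400-point scale, a K-factor of 32 (the post doesn't state one), and home court modeled as an additive rating bonus chosen so that two equal teams give the home side the quoted 60%, i.e. $400 \log_{10}(0.6/0.4) \approx 70.4$ points. The `Game` record and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, NamedTuple

NEUTRAL_ELO = 1200.0
AVG_MARGIN = 12.92   # average margin of victory quoted above
K_FACTOR = 32.0      # ASSUMPTION: the post does not state its K-factor
# ASSUMPTION: home court as an additive bonus so equal teams give the home side 60%.
HOME_BONUS = 400.0 * math.log10(0.6 / 0.4)


class Game(NamedTuple):
    home: str
    away: str
    home_score: int
    away_score: int
    neutral: bool = False


def expected_home(r_home: float, r_away: float, neutral: bool) -> float:
    """Assumed standard logistic Elo expectation, with the home bonus when not neutral."""
    bonus = 0.0 if neutral else HOME_BONUS
    return 1.0 / (1.0 + 10.0 ** ((r_away - (r_home + bonus)) / 400.0))


def mov_multiplier(margin: int) -> float:
    """Scale the usual Elo change by margin / average margin (3 points -> ~0.23)."""
    return margin / AVG_MARGIN


def play_season(ratings: Dict[str, float], games: List[Game], passes: int = 2) -> None:
    """Run through the season's games `passes` times, updating ratings in place."""
    for _ in range(passes):
        for g in games:
            e_home = expected_home(ratings[g.home], ratings[g.away], g.neutral)
            s_home = 1.0 if g.home_score > g.away_score else 0.0
            margin = abs(g.home_score - g.away_score)
            delta = K_FACTOR * mov_multiplier(margin) * (s_home - e_home)
            ratings[g.home] += delta   # zero-sum: the winner gains what the loser drops
            ratings[g.away] -= delta


def partial_reset(ratings: Dict[str, float]) -> None:
    """Move every rating one third of the way back to 1200 (1500 -> 1400)."""
    for team in list(ratings):
        ratings[team] = NEUTRAL_ELO + (ratings[team] - NEUTRAL_ELO) * 2.0 / 3.0
```

Usage would look like `ratings = defaultdict(lambda: NEUTRAL_ELO)`, then `play_season(ratings, games)` for each season followed by `partial_reset(ratings)` before the next. Keeping the update zero-sum means the league average stays pinned at the neutral 1200.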
Alright, with all that said, I'll let you get back to making last-second changes to your bracket. As for me, I'll probably go with the Final Four predicted by my model: Louisville, Florida, Gonzaga and Indiana. However, do not take this as any indication that I have faith in my model: I am mostly interested in seeing how it performs. Plus, my only unwavering rule is that Xavier and Gonzaga must always win their first-round games. Seeing as Xavier isn't even in the tournament this year, it's only natural that I put all my eggs in the 'Zags basket.


To calculate the probability that a given team will win a game, use the Elo ratings provided in the above link in conjunction with the formula provided in my second blog post. Have fun!
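
If you'd rather script it, here is a minimal sketch for a neutral-court game. It assumes the formula from the earlier post is the standard logistic Elo expectation with a 400-point scale; check it against that post before relying on it. The function name is my own.

```python
def win_probability(elo_a: float, elo_b: float) -> float:
    """Probability that team A beats team B on a neutral court.

    ASSUMPTION: standard logistic Elo formula with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))
```

For example, a 1400-rated team facing a 1200-rated team comes out at roughly 76%, and two equally rated teams are a 50/50 toss-up.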