Wednesday, December 5, 2012

The Data Has Arrived!

After a long search for a free and comprehensive set of NCAA Men's Basketball results I have finally succeeded in finding what we need. Ken Pomeroy has been nice enough to let me use his most basic game results for this project, for which I am immensely grateful. As mentioned in a previous post, the data we care about most at this stage is relatively simple: for each game between Division I teams we want to know which two teams are involved, where the game took place, and what the score was. Thankfully, that's exactly what Ken has given us via his raw data.

Alright, so let's dive into the predictive model. I was able to put together a quick macro in Excel that ran through every game in the 2000-1 season, and modified the ratings of the teams involved in each game based on the outcome. In all, there were just under 5,000 games played throughout the season involving over 500 different teams. For the first round of tests I used a value of $K=36$ (if you need to remind yourself what the Elo calculations look like, check my previous post).

At the beginning of the season, each team was given a starting rating of 1200. I just went through and churned out new ratings for each team after each game using a simple Excel macro. As I just wanted to perform a super, super simplistic test, home court advantage is not taken into account. Additionally, it is an unrealistic assumption to assume that all teams start the season at the same skill level. Thus, I did not particularly expect these first results to be terribly accurate, but hoped that at least the basic concept would work reasonably well.

After the running through all 5,000 games the final Elo for each team was calculated. These new Elo ratings ranged from 1664 to 846. The table below shows the top- and bottom-10 performing teams based on the results of this first season.
In the 2001 NCAA Tournament Duke ended up beating Arizona in the finals, with Michigan State and Maryland losing in the semi-finals. Considering all of the Final Four teams made it into our top 7, I'd say we're off to a smashing start. This is great news for us, as this is an incredibly simplistic first cut at a modeling process that will undergo many, many iterations in the coming months.

One last test we can quickly perform is comparing our end-of-season results to the last week of AP Top 25 rankings from 2001. The table below shows where our model placed each of those teams after at the end of the season.

While our model pretty clearly begins to show some weird results with respect to teams towards the bottom of the AP list, the top 15 teams from the AP Polling are identical to the top 15 from our modeling with one exception: AP ranked Iowa State as the 10th best team (who we had ranked 17th, so right up there), and we had Gonzaga rounding out our top 15 while the Zags were nowhere to be found in the AP's list. Considering Gonzaga made it to the Sweet Sixteen in the 2001 tournament, this could be seen as a lapse in judgement on the part of the AP rather than a failure in our own modeling.
Alright, that's it for this time. Look for a post in the near future overviewing how we're going to deal with home court advantage, and hopefully extend our model all the way through the 2011-2012 season!

Sunday, October 21, 2012

What the hell is Elo, anyway?

Saying we're going to employ the Elo system to rank NCAA teams is fine, but what does that actually mean? In this post we'll think about exactly what we'd like to get out of any rating system, and how the Elo algorithm helps us get there.

So, first things first, what exactly do we want to get out of any rating system? Any ranking system needs to be able to create a totally ordered set out of the teams in the league. This simply means that, given any two teams, either one team can be deemed better or they are equally good (or bad). Additionally, any totally ordered set maintains transitivity, e.g. if the Celtics are better than the Nuggets and the Nuggets are better than the Bobcats, then the Celtics are better than the Bobcats.

The most basic ranking system that can satisfy the requirements of a totally ordered set is a "rank-ordering". This involves ranking the $n$ teams in a league $1$ through $n$. Regular season standings are the most simple way of doing this, and as we know are not always particularly accurate. In a rank-ordering, if you take any two teams, the team with the lower number will be "better". However, this doesn't tell you how much better an individual team might be.

Moving from the world in which we merely rank teams against one-another to a world in which those rankings can give us some insight into exactly how much better a certain team is can be a very complicated step. This is where our good friend Arpad Elo comes in.

The Elo Rating System
Arpad Elo contemplating his next move

Initially designed as an improved rating system for use in international chess competitions, Arpad Elo ended up creating a ranking algorithm that has since been put to use in a variety of different games and settings. In the online computer game League of Legends, players are ranked using the Elo rating system, and Nate Silver (of the fivethirtyeight election forecasting blog) has implemented a modified Elo algorithm to rank baseball teams in the MLB. Elements of the Elo system are also used in helping determine the BCS rankings in NCAA football.

What is so special about this algorithm that warrants such widespread use? Well, it provides probabilistic insight into the relative skill difference between any two teams. This is incredibly important as instead of knowing that Team A is better than Team B, we can instead say that Team A has a 75% chance of winning a match against Team B.

Damn. Now, to learn exactly how this is accomplished we'll take a slightly more technical look at the actual algorithm. I'll try to use my "sick-nasty teaching skillz" to convey the mathematics behind the theory in an accessible manner, so bear with me.

Heavy favorites UNC vs underdogs UVM
In a typical Elo system every team team is assigned a numerical rating. The higher the rating, the better the team. After two teams play each other, their ratings are modified based on the difference between the expected outcome of the game and the actual outcome. For example, if a team is highly favored and wins, their gain in rating will be modest. Conversely, if a massive underdog pulls out an upset victory they will see a more substantial increase in their rating. The drops in rating for the losing teams would be similarly modest and massive, correspondingly. This makes intuitive sense, as a favored team shouldn't be greatly rewarded for performing at their expected level, while a team with a 10% chance of winning a game shouldn't be overly punished for losing.

To show how this intuitive system works mathematically we need to begin by defining some variables. We'll let $R_A$ and $R_B$ signify the Elo rating of Team A and Team B in our hypothetical encounter. These are the numerical ratings that can be used to create a rank-order of the teams. The next step involves converting these ratings into expected outcomes. To do this we begin by assigning each team a new variable that we'll call Victory Points. $V_A$ and $V_B$ will signify the Victory Points for teams A and B respectively. We define Victory points with the following formula: $$V_A=10^{R_A/400}$$
Whenever two teams play each other, the result is directly proportional to the number of Victory Points each team has. If we represent Victory Points with marbles, each team would throw all their marbles into a bucket, and the winner would be determined by randomly picking one out. Thus, if a team has twice as many Victory Points as their opponent, they will be twice as likely to win. Therefore, if we let $E_A$ represent the expected outcome for Team A in a contest against Team B, then we can define $E_A$ with the following formula: $$E_A=\frac{V_A}{V_A+V_B}$$
Similar to the NBA Draft Lottery, where the worst performing teams are given more lotto balls, thereby increasing the likelihood that they get an early draft pick
In other words, the chance that you would pick out one of Team A's pebbles from the pool of both teams' pebbles. Depending on how much sense this all makes to you, you might be wondering where the hell that 400 came from in the Victory Points formula above. Well, Arpad decided that he wanted a 400 rating advantage to signify that a team was 10 times more likely to win than their opponent. It's a relatively arbitrary number, but he needed to use something there and chose 400. It makes the ratings slightly easier to interpret on face value, and puts most teams in the 1000 to 2000 rating range, which is where he wanted to work (presumably because it allows for four digits of precision when using whole numbers).

This expected outcome, $E_A$, is going to be a number between 0 and 1. If we consider a loss to be 0 and a win to be 1, this directly becomes the chance that a team will win. Now we need to figure out how a team's rating will change after a game based on the actual result. If we let $S_A$ signify the actual result for Team A ($S_A=1$ if Team A wins, and $0$ if they lose), then we can define Team A's change in rating, $\Delta_A$, with the following formula: $$\Delta_A=K(S_A-E_A)$$ This new variable $K$ is called the k-value. It is the maximum possible rating change that a team can achieve. For example, if Team A had a 100% chance of winning ($E_A=1$) according to their rating and ended up losing ($S_A=0$), their rating change would be $\Delta_A=K(S_A-E_A)=K(0-1)=-K$. The opposite is true if a team with a 0% predicted chance of winning ended up taking the game in the infinitely unlikely upset. This is a parameter that can be modified depending on how accurate the ratings are performing, but Arpad noticed that k-values between 16 and 32 worked the best with his system in Chess. Since then, similar k-values have been used successfully. In the computer game League of Legends a k-value of 24 is used. We will begin with $K=24$ in our analysis, but will experiment with other values to see if they are more or less appropriate.




Thus, if Team A has a rating of $R_A$ going into a game, it will have rating $R'_A$ at the end, where $R'_A$ is defined as: $$R'_A=R_A+K(S_A-E_A)$$ To illustrate exactly how this works, let's make up some numbers and use this formula in a real example. Let's consider that UVM vs UNC game from the 2012 NCAA men's tournament. We all know UNC was highly favored over UVM, but for the sake of argument we'll assign each team and Elo rating and look at how those rating would change depending on the actual outcome of the game. We'll begin by giving UNC a rating of $R_{UNC}=1925$ and let UVM be rated at $R_{UVM}=1650$. In this scenario, what is the expected outcome of this game?


To determine the expected outcome we have to calculate the number of Victory Points that each team will contribute towards the end result. UNCs Victory Points can be calculated using the above formulas as $V_{UNC}=10^{R_{UNC}/400}=10^{1925/400}=64,938$. Similarly we can calculate UVMs Victory Points as $V_{UVM}=10^{R_{UVM}/400}=10^{1650/400}=13,335$.

Since we know the expected outcome of the game is directly proportional to the Victory Points we then know that the chance UNC wins is equal to $E_{UNC}=\frac{V_{UNC}}{V_{UNC}+V_{UVM}}=\frac{64,938}{64,938+13,335}=83\%$. Similarly UVMs chances can be found to be $E_{UVM}=\frac{V_{UVM}}{V_{UNC}+V_{UVM}}=\frac{13,335}{64,938+13,335}=17\%$.

If we assume a k-value of 24 as discussed earlier, we can then determine each teams new ratings depending on the outcome of the game. Let's take the more likely scenario where UNC wins ($S_{UNC}=1$ and $S_{UVM}=0$). UNCs new rating will be defined as $R'_{UNC}=R_{UNC}+K(S_{UNC}-E_{UNC})$$=1925+24(1-.83)=1929$. Similarly UVMs new rating will be $R'_{UVM}=R_{UVM}+K(S_{UVM}-E_{UVM})$$=1650+24(0-.17)=1646$ As you can see, UNC's rating will increase by 4 points while UVM will drop 4 points.

However, what happens if UVM pulls out the upset victory? In this case ($S_{UNC}=0$ and $S_{UVM}=1$) we have different changes in the teams' ratings. Here we see UNC drops to $R'_{UNC}=R_{UNC}+K(S_{UNC}-E_{UNC})$$=1925+24(0-.83)=1905$ while UVM jumps up to $R'_{UVM}=R_{UVM}+K(S_{UVM}-E_{UVM})$$=1650+24(1-.17)=1670$. Thus UNC and UVM will lose and gain 20 points of rating respectively. This again makes intuitive sense, as poor performing teams should be more greatly rewarded for toppling an established giant than they should be punished for losing a game they had no right to win.

At this point in our journey, I'd like to take a moment for some self-reflection. If you've made it this far and are not utterly confused, please give yourself a hearty pat on the back, and thanks for legitimizing this exercise in mathematical education for myself. If there's anything that's unclear, leave a comment below and I can try to provide an explanation from a different perspective.

Stay tuned for a post in the next couple of days highlighting where in this process we can make modifications, and what some of the most necessary and simple modifications might be. For example, where can we incorporate home-court advantage into this process?

Tuesday, October 16, 2012

Welcome

Welcome to the Elo Sports blog. I'm Nick, and I'll be taking you on a magical tour through NCAA basketball statistics and history in an attempt to accurately rate teams for the 2013 NCAA tournament. As with any statistical rating algorithm ours will not be perfect. But that's the point of this whole process: to iterate, refine and test different rating systems and algorithms until we have one we feel comfortable using this March.

The first rating system we'll implement was developed by Árpád Élő, a Hungarian Physicist and avid chess player. The Elo system was originally designed as an improved rating system for chess but has since been used to rate the relative skill level of players in many other games and sports as well. While the rating system was designed for a two-player game, it's possible to apply it to team sports simply by considering each team as an individual "player". The athletes or competitors that make up each team can be considered different strengths and weaknesses of the "player". Just as certain chess players might excel in particular areas of the game, so too do different basketball team excel in particular areas depending on the roster and coaching. Finally, we'll consider roster and coaching adjustments between seasons as a player improving or weakening over time in the two-player game, something we'd want the rating system to be able to take into account.

The Elo system at a glance:
  • Each team is assigned a numerical rating.
  • In each game the statistically likely outcome is calculated. Each team is given a probability of winning.
  • Based on the probability of winning and the actual result a new rating is calculated for each team
    • A team with a low probability of winning a game will see their rating increase by more if they win, while losing will result in a more modest decrease. The converse is true for teams with a high probability of winning.
  • Ratings are recalculated after every game throughout a season. The more games you have the more accurate the ratings become.
The first step in this process will be modifying the Elo system to suit our particular needs. If we can build a system that we're comfortable with, applying it to different sports will be the next step. In the next several posts I'll go over several of the more interesting modifications we'll need to make to the pure 2-player game that the Elo algorithm was originally meant to rate. These include things like the home-field/court advantage, season-to-season rating resets, and stronger vs weaker conferences and schedules. So stay tuned for some riveting sports and math discussions!

With that, I'd like to finish by saying thanks for reading, and leave you all with a great Lonely Island video that should fairly accurately describe the tone and direction this blog will take in the coming months.