Ranking Ultimate Teams With the Elo Rating Algorithm

What if we ranked teams with a different system?


Anybody who has even casually followed USA Ultimate’s college and club series since 2011 has surely heard somebody complain about the algorithm that USAU uses to rank teams and distribute bids. Opinions have ranged from neutrality (“just win all your games”) to tacit endorsement (“at least they don’t use the previous year’s nationals results anymore”) to harsh criticism, only occasionally coupled with an alternate proposal (shout out to probabilistic models).

But for all of its controversy, the USAU algorithm is generally competent at evaluating the relative strength of teams using only their box score results; the (rightful) point of contention is in how that algorithm, and its weaknesses, are leveraged as a part of the nationals bid distribution system. However, while there is enough fuel in the prior statement to power a series of articles, this piece will focus more on applying another common ranking tool — one used in many other competitions — to rank college ultimate teams: Elo Ratings.

Elo Background/Primer

Chess master and physicist Arpad Elo originally created his eponymous algorithm to improve how the US Chess Federation ranked its wide pool of competitive players. The ratings were designed to rank the relative skill levels of players even when particular players may never meet head to head.

Elo ratings are still used to rank chess players today, and the algorithm has found its way to the desks of many other sports statisticians, producing Elo ratings for world football and the NFL, as well as applications ranking other large groups of competitors in games like Go and League of Legends.

The concept of the algorithm is quite simple: a player’s rating is re-evaluated after every contest (or collection of contests) based on the rating of his opponent and the result of the match. All matches are evaluated chronologically so that the rating change derived from the match is a function of what the players’ ratings were at the time rather than what they might end up being later (a notable difference from USAU). Further, the rating change that occurs after a game is a function of the following:

  • Relative rating of the teams. Based on the pre-game ratings of the two teams, Elo calculates a win expectation percentage. The rating points awarded or subtracted depend on the result of the game relative to the expected outcome. A victorious team that had a 10% chance of winning will earn more points than a team that won after being rated as 90% favorites. The win expectation formula can be changed, but it is often set so that a 400-point rating differential indicates roughly a 90% win expectation for the favorite.
  • Margin of victory (in many variations). In competitions where margin of victory (MOV) is quantifiable (ultimate, but not chess), the point change will be correlated with MOV. Different sports have different implementations of their MOV multiplier. Many implementations will reward a large margin of victory with a high multiplier, but then lower it again if the winning team was rated much higher than the losing team (a big win by a favorite gets less weight than a big win in an expected toss-up).
  • The update parameter (or k-value). The k-value represents how much influence the latest result has on one’s ranking change. High k’s mean that the latest game will have a larger effect on a player’s rating, while a low k lightens the influence.
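To make the pieces above concrete, here is a minimal sketch of a single Elo update without any margin-of-victory term. It assumes the common logistic win-expectation curve with a 400-point scale, which gives the favorite about 91% at a 400-point gap, close to the 90% figure quoted above. The function names and the default k of 20 are illustrative choices, not something specified by any particular ranking body.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Win expectation for A against B under the common logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update(rating_a, rating_b, a_won, k=20.0, scale=400.0):
    """Return the new (rating_a, rating_b) after one game, ignoring margin of victory."""
    expected_a = expected_score(rating_a, rating_b, scale)
    actual_a = 1.0 if a_won else 0.0
    # The points A gains are exactly the points B loses.
    delta = k * (actual_a - expected_a)
    return rating_a + delta, rating_b - delta


# An underdog's win moves the ratings far more than a favorite's win.
upset = update(1500.0, 1900.0, a_won=True)   # 400-point underdog wins
chalk = update(1900.0, 1500.0, a_won=True)   # 400-point favorite wins
print(upset[0] - 1500.0, chalk[0] - 1900.0)
```

Raising k scales both gains proportionally, which is the sense in which a high k gives the latest game more influence.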

Elo Specs

Assumptions

– Each party/player’s performance in a match is a normally distributed random variable around their true “skill level.”

– New players enter the rankings at a fixed “average” rating (generally 1500)

Notable Properties

– A player can never hurt their ranking by winning. The size of the benefit may be affected by margin of victory and the relative ranking difference between the players, but ranking will never decrease. This differs from the USAU model where a close loss may help or hurt a ranking.

– In the same way, a loss can never help a player’s ranking. The size of the penalty may vary with the margin of loss and the relative ranking difference between the players, but the ranking will never increase.

– Rankings are state functions. A player’s future rating is independent of past results; it is conditioned only on his current rating and on the performance in his next match. This is in contrast to the USA Ultimate rankings, where the future rating is calculated from the new result plus all past results.
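The properties above fall out of processing games one at a time, in order. The following is a rough sketch under simple assumptions (standard 400-point logistic curve, k of 20, no margin of victory, made-up teams and results): each game reads only the two current ratings, the winner's gain is always positive, and replaying the same results in a different order produces different final numbers.

```python
from collections import defaultdict


def expected_score(r_a, r_b, scale=400.0):
    """Win expectation for A against B under the logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


def run_season(games, k=20.0, start=1500.0):
    """games: iterable of (winner, loser) pairs in chronological order."""
    ratings = defaultdict(lambda: start)
    for winner, loser in games:
        # Only the current ratings matter; earlier results are not revisited.
        gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        ratings[winner] += gain   # gain > 0, so a win never lowers a rating
        ratings[loser] -= gain    # and a loss never raises one
    return dict(ratings)


# Hypothetical results, not real data.
season = [("A", "B"), ("B", "C"), ("A", "C")]
in_order = run_season(season)
reordered = run_season(list(reversed(season)))
print(in_order)
print(reordered)
```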

Weaknesses

– The performance curve. The assumption of normally distributed performance, while likely accurate over a very large number of games, introduces bias into the system.

– Provisional rankings. When a player first enters the system, they are given the ‘average’ rating. If the player’s true skill is much higher or lower than average, then for the first n games the player participates in, that player’s opponents are disproportionately rewarded or penalized.

– Tuning the k-factor. There is no definitive guide for how to tune the update speed of the algorithm. While in general high-sample sports like baseball lend themselves to low values, and lower-sample sports like American football favor faster updates, much of the guidance for the update speed is simply heuristic analysis by the investigator.

Adaptations for Ultimate

– In order to avoid the bias of provisional rankings, teams were given initial Elo ratings based on the prior year’s USAU rankings. The top team in the 2015 USAU rankings (Pittsburgh) was initialized to 1700, with each subsequent team initialized one point lower. Teams that were not ranked in 2015 received the average rating of 1500 when they entered the system. The women’s teams received a similar initialization, but starting at 1650, so that the average initialized rating was also 1500. [1] [2] [3]
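The seeding rule described above can be sketched in a few lines. The helper names and the placeholder teams are hypothetical; only the 1700 starting point, the one-point step, and the 1500 default come from the text. As a sanity check on the arithmetic, a one-point ladder starting at 1700 averages 1500 when it runs down to 1300, which would correspond to 401 seeded teams; that count is inferred here, not stated in the article.

```python
def initial_ratings(prior_ranking, top_rating=1700.0, step=1.0):
    """Map an ordered list of teams (best first) to seeded starting ratings."""
    return {team: top_rating - step * i for i, team in enumerate(prior_ranking)}


def starting_rating(team, seeded, default=1500.0):
    """Seeded rating if the team was ranked last year, otherwise the average."""
    return seeded.get(team, default)


# Pittsburgh is named in the article; the other names are placeholders.
seeded = initial_ratings(["Pittsburgh", "Team B", "Team C"])
print(starting_rating("Pittsburgh", seeded))    # seeded from last year's ranking
print(starting_rating("New Program", seeded))   # unranked, enters at the average

# A 401-team ladder from 1700 down to 1300 averages out to 1500.
ladder = initial_ratings(range(401))
print(sum(ladder.values()) / len(ladder))
```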

Rankings

Below are the Elo rankings generated for the 2016 season, including Conferences and Regionals.

Men’s Division

Rank  Team                        Elo Rating
1     Oregon                      2008.51
2     Massachusetts               1982.85
3     Minnesota                   1960.44
4     Pittsburgh                  1940.95
5     North Carolina-Wilmington   1938.86
6     Wisconsin                   1919.38
7     Georgia                     1893.68
8     Carleton College            1889.76
9     Case Western Reserve        1888.41
10    Washington                  1880.10
11    North Carolina              1877.94
12    Stanford                    1868.68
13    Harvard                     1858.89
14    Cal Poly-SLO                1854.86
15    British Columbia            1853.75
16    Colorado                    1852.73
17    Ohio State                  1850.29
18    Brigham Young               1845.31
19    Texas A&M                   1839.42
20    Franciscan                  1837.36
21    Florida                     1832.04
22    Georgia College             1821.27
23    Michigan                    1820.47
24    Florida State               1818.52
25    Brandeis                    1818.26
26    Georgetown                  1815.66
27    Virginia Commonwealth       1814.01
28    Penn State                  1813.15
29    Auburn                      1810.64
30    Arkansas                    1805.33
31    Connecticut                 1801.19
32    Northwestern                1800.71
33    Georgia Tech                1795.68
34    Lewis & Clark               1795.67
35    Bryant                      1794.64
36    Richmond                    1790.74
37    Maryland                    1789.24
38    New Hampshire               1787.28
39    California-San Diego        1777.85
40    Cornell                     1777.38
41    Texas                       1773.92
42    Missouri                    1772.18
43    Air Force                   1771.71
44    Virginia Tech               1767.49
45    Notre Dame                  1766.57
46    James Madison               1763.60
47    Carleton College-GOP        1762.44
48    Utah                        1761.10
49    Colorado College            1755.26
50    Baylor                      1754.42

Women’s Division

Rank  Team                        Elo Rating
1     Oregon                      1885.30
2     Central Florida             1858.32
3     Stanford                    1856.27
4     British Columbia            1835.82
5     Michigan                    1815.30
6     Virginia                    1800.38
7     Wisconsin                   1789.60
8     Colorado                    1782.21
9     Texas                       1778.09
10    Pittsburgh                  1777.55
11    Whitman                     1774.62
12    North Carolina              1770.23
13    UCLA                        1769.50
14    California                  1753.71
15    Washington                  1752.19
16    Vermont                     1750.34
17    North Carolina-Wilmington   1749.66
18    Notre Dame                  1742.70
19    California-San Diego        1742.11
20    Delaware                    1737.40
21    Minnesota                   1733.23
22    Ottawa                      1731.94
23    Williams                    1721.66
24    Georgia Tech                1720.47
25    Dartmouth                   1719.99
26    Colorado College            1714.71
27    Connecticut                 1711.09
28    Penn State                  1709.42
29    Ohio State                  1709.06
30    Florida State               1708.68
31    Iowa State                  1705.23
32    Tufts                       1705.10
33    St Olaf                     1703.19
34    Maryland                    1699.76
35    California-Davis            1697.20
36    Carleton College-Eclipse    1695.25
37    Liberty                     1686.01
38    Kansas                      1679.82
39    Bates                       1678.67
40    Rice                        1678.44
41    Colorado State              1675.92
42    Wesleyan                    1673.29
43    Northeastern                1669.60
44    Chaos (Independent)         1667.04
45    Northwestern                1664.82
46    Texas State                 1663.80
47    Mount Holyoke               1659.83
48    West Chester                1659.21
49    Georgetown                  1657.95
50    George Washington           1651.01

  1. The win expectation formula was tweaked so that a 300-point differential corresponds to a 90% win expectation.

  2. The MOV multiplier used was $\log(\mathrm{pointdiff} + 1) \times \dfrac{2.2}{0.001 \times \mathrm{elo\_diff} + 2.2}$

  3. The k-factor used was 20, which is comparable to the factor used in models for sports that play ~20 games per season 
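Putting the three notes together, a single game update might look like the sketch below. Several details are assumptions rather than anything the notes specify: the 90%-at-300-points tweak is implemented by rescaling the usual logistic curve, the log in the MOV multiplier is taken to be the natural log, and elo_diff is taken to be the winner's pre-game rating minus the loser's. The function names are illustrative.

```python
import math

K = 20.0
# Rescale the logistic curve so a 300-point edge gives a 90% expectation:
# 1 / (1 + 10 ** (-300 / SCALE)) = 0.9  =>  SCALE = 300 / log10(9)
SCALE = 300.0 / math.log10(9.0)


def expected_score(r_a, r_b):
    """Win expectation for A against B on the rescaled curve."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / SCALE))


def mov_multiplier(point_diff, elo_diff):
    """point_diff: winning margin; elo_diff: winner minus loser (assumed)."""
    return math.log(point_diff + 1) * (2.2 / (elo_diff * 0.001 + 2.2))


def update(winner, loser, point_diff):
    """Return the new (winner, loser) ratings after one game."""
    surprise = 1.0 - expected_score(winner, loser)
    gain = K * mov_multiplier(point_diff, winner - loser) * surprise
    return winner + gain, loser - gain


# Hypothetical game: a 1600-rated team beats a 1700-rated team by 5 points.
print(update(1600.0, 1700.0, 5))
```

With this shape, the multiplier grows with the margin but shrinks when the winner was already the higher-rated team, matching the behavior described in the margin-of-victory discussion.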

Cody Mills

    Cody Mills is an Ultiworld reporter and analyst. He is a former captain and coach of the Stanford men's team and currently coaches Cal Poly SLOCORE and Oakland Guerrilla. Before retiring due to injury in 2014 he played club for Boost FC and in the MLU for the San Francisco Dogfish. He aspires to be the Nate Silver of Ultiworld and is the keeper of frisbee-rankings.com
