Ranking Ultimate Teams With the Elo Rating Algorithm

What if we ranked teams with a different system?

May 25, 2016 by Cody Mills in Analysis with 27 comments

Anybody who has even casually followed USA Ultimate’s college and club series since 2011 has surely heard somebody complain about the algorithm that USAU uses to rank teams and distribute bids. Opinions have ranged from neutrality (“just win all your games”) to tacit endorsement (“at least they don’t use the previous year’s nationals results anymore”) to harsh criticism, only occasionally coupled with an alternate proposal (shout out to probabilistic models).

But for all of its controversy, the USAU algorithm is generally competent at evaluating the relative strength of teams using only their box score results; the (rightful) point of contention is how that algorithm, and its weaknesses, are leveraged as part of the nationals bid distribution system. While there is enough fuel in that statement to power a series of articles, this piece instead applies another common ranking tool, one used in many other competitions, to college ultimate teams: Elo ratings.

Elo Background/Primer

Chess master and physicist Arpad Elo originally created his eponymous algorithm to improve how the US Chess Federation ranked its wide pool of competitive players. The ratings were designed to rank the relative skill levels of players even when two particular players never meet head to head.

Elo ratings are still used to rank chess players today, and the algorithm has found its way to the desks of many other sports statisticians, producing Elo ratings for world football and the NFL, as well as applications ranking other large groups of competitors in games like Go and League of Legends.

The concept of the algorithm is quite simple: a player’s rating is re-evaluated after every contest (or collection of contests) based on the rating of his opponent and the result of the match. All matches are evaluated chronologically so that the rating change derived from the match is a function of what the players’ ratings were at the time rather than what they might end up being later (a notable difference from USAU). Further, the rating change that occurs after a game is a function of the following:

Relative rating of the teams. Based on the two teams' ratings going into the game, the Elo formula calculates a win expectation for each side. The rating points awarded or subtracted depend on how the actual result compares with that expectation: a victorious team that had a 10% chance of winning earns more points than a team that won after being rated a 90% favorite. The win expectation formula can be adjusted, but it is often set so that a 400-point rating differential gives the favorite roughly a 90% win expectation (10-to-1 odds).
Margin of victory (in many variations). In competitions where margin of victory (MOV) is quantifiable (ultimate, but not chess), the rating change scales with MOV. Different sports implement their MOV multipliers differently. Many implementations give a large margin of victory a large multiplier, but then reduce it if the winning team was rated much higher than the loser (a big win by a favorite gets less weight than a big win in an expected toss-up).
The update parameter (or k-value). The k-value sets how much influence the latest result has on a team's rating. A high k means the latest game has a larger effect on a rating, while a low k lightens its influence.
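The three ingredients above combine into a single update rule. The sketch below is a minimal Python illustration, not the author's implementation: it assumes the standard logistic expectation with a 400-point scale (so a 400-point edge is 10-to-1 odds, about 90.9%), takes the MOV multiplier as an input (1.0 for a sport like chess with no margin), and uses hypothetical function names `expected_score` and `elo_update`.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Win expectation for A against B under the logistic Elo curve.

    A `scale`-point rating edge corresponds to 10-to-1 odds (about 90.9%).
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a: float, rating_b: float, a_won: bool,
               k: float = 20.0, mov_multiplier: float = 1.0) -> tuple[float, float]:
    """Return both teams' new ratings after a single game.

    The change is k * multiplier * (actual - expected); whatever A gains, B loses.
    """
    actual = 1.0 if a_won else 0.0
    delta = k * mov_multiplier * (actual - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# A roughly 10% underdog (382 points lower) that wins gains far more
# than a roughly 90% favorite that wins.
underdog_gain = elo_update(1500, 1882, a_won=True)[0] - 1500
favorite_gain = elo_update(1882, 1500, a_won=True)[0] - 1882
print(round(expected_score(1900, 1500), 3))  # 0.909
print(round(underdog_gain, 2), round(favorite_gain, 2))
```

Because the same `delta` is added to one side and subtracted from the other, the total rating pool is conserved; MOV and k only change how large each exchange is.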

Elo Specs

Assumptions

– Each party/player’s performance in a match is a normally distributed random variable around their true “skill level.”

– New players enter the rankings at a fixed “average” rating (generally 1500)
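To make the first assumption concrete: if each side's single-game performance is normal around its rating with standard deviation σ, the performance gap is normal with standard deviation σ√2, and the win probability is Φ(d / (σ√2)) for a rating difference d. The sketch below compares that curve with the logistic curve many implementations use in its place. The value σ = 200 and the function names are illustrative assumptions, not parameters taken from the article.

```python
import math


def normal_win_prob(rating_diff: float, sigma: float = 200.0) -> float:
    """P(win) when each side's performance ~ Normal(rating, sigma), independently.

    The performance gap is Normal(rating_diff, sigma * sqrt(2)), so
    P(gap > 0) = Phi(rating_diff / (sigma * sqrt(2)))
               = 0.5 * (1 + erf(rating_diff / (2 * sigma))).
    """
    return 0.5 * (1.0 + math.erf(rating_diff / (2.0 * sigma)))


def logistic_win_prob(rating_diff: float, scale: float = 400.0) -> float:
    """The logistic curve commonly used in place of the normal one."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))


for diff in (0, 100, 200, 400, 800):
    print(diff, round(normal_win_prob(diff), 3), round(logistic_win_prob(diff), 3))
```

The two curves agree closely near even matchups but diverge in the tails: at an 800-point gap the normal model gives the underdog roughly 0.2% while the logistic one gives about 1%. How heavy the tails should be is exactly the kind of modeling choice the "performance curve" weakness below is about.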

Notable Properties

– A player can never hurt their ranking by winning. The size of the benefit may be affected by margin of victory and the relative ranking difference between the players, but ranking will never decrease. This differs from the USAU model where a close loss may help or hurt a ranking.

– In the same way, a loss can never help a player's ranking. The size of the penalty may vary with the margin of loss and the relative ranking difference between the players, but the ranking will never increase.

– Rankings are state functions. A player's future rating is independent of past results: it is conditioned only on his current rating and on his performance in the next match. This is in contrast to the USA Ultimate ranking, where the rating is recalculated from the new result plus all past results.
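The first two properties follow from the update having the form k × multiplier × (actual − expected). The expected score always lies strictly between 0 and 1, so a win (actual = 1) gives a positive change and a loss (actual = 0) a negative one, as long as the multiplier stays positive. With a margin multiplier like the one given in the article's footnotes, that holds only while the winner is rated less than 2,200 points below the loser, which never happens in practice. The quick check below brute-forces the sign properties over a grid; it assumes the logistic expectation with a 400-point scale, reads the footnote's "log" as a natural log, and uses hypothetical helper names.

```python
import math


def expected_score(rating_a, rating_b, scale=400.0):
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def mov_multiplier(point_diff, winner_minus_loser):
    # Margin-of-victory multiplier in the style of the footnoted formula
    # (natural log assumed); positive while winner_minus_loser > -2200.
    return math.log(point_diff + 1) * 2.2 / (winner_minus_loser * 0.001 + 2.2)


def rating_change(own, opp, won, point_diff, k=20.0):
    # Reads only the two current ratings and this game's result; no game
    # history is consulted (the "state function" property).
    winner_minus_loser = (own - opp) if won else (opp - own)
    mult = mov_multiplier(point_diff, winner_minus_loser)
    return k * mult * ((1.0 if won else 0.0) - expected_score(own, opp))


violations = 0
for own in range(1000, 2001, 50):
    for opp in range(1000, 2001, 50):
        for margin in range(1, 16):
            if rating_change(own, opp, True, margin) <= 0:   # a win that hurt
                violations += 1
            if rating_change(own, opp, False, margin) >= 0:  # a loss that helped
                violations += 1
print("sign violations:", violations)  # sign violations: 0
```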

Weaknesses

– The performance curve. The assumption of a normal distribution of performance, while perhaps accurate in the limit of many games, introduces bias into the system.

– Provisional rankings. When a player first enters the system, they are given the ‘average’ rating. If the player's true skill is much higher or lower than average, then for the first n games the player participates in, their opponents are disproportionately rewarded or penalized.

– Tuning the k-factor. There is no definitive guide for tuning the update speed of the algorithm. While high-sample sports like baseball generally lend themselves to low values and lower-sample sports like American football favor faster updates, much of the guidance for the update speed is simply heuristic analysis by the investigator.
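The provisional-rating and k-factor weaknesses interact, which a toy simulation makes visible. The sketch below assumes a hypothetical team whose true strength is 1900 entering at 1500 and repeatedly facing 1500-rated opponents. To keep it deterministic, it credits the team with its true expected score each game instead of simulating individual wins and losses, which is a simplifying assumption. It counts games until the rating is within 50 points of the truth for several k values; since Elo updates are zero-sum, every point the newcomer gains along the way comes out of its opponents' ratings.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def games_to_converge(true_rating, start_rating, k, opponent=1500.0,
                      tolerance=50.0, max_games=10_000):
    """Games until a new team's rating is within `tolerance` of its true strength.

    Deterministic toy model: each game the team scores its *true* expected
    result against an `opponent`-rated team, and its rating moves by
    k * (true expectation - expectation implied by its current rating).
    """
    rating = start_rating
    target = expected_score(true_rating, opponent)
    for game in range(1, max_games + 1):
        rating += k * (target - expected_score(rating, opponent))
        if abs(true_rating - rating) < tolerance:
            return game
    return max_games


for k in (10, 20, 40):
    print(k, games_to_converge(1900, 1500, k))
```

A higher k shortens the provisional period, but the same k also makes established ratings react more to single upsets; the heuristic tuning described above is an attempt to balance those two effects.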

Adaptations for Ultimate

– In order to avoid the bias of provisional rankings, the teams were given initial Elo ratings based on the prior year's USAU ranking. The top team in the 2015 USAU rankings (Pittsburgh) was initialized to 1700, with each subsequent team initialized one point lower than the team ranked above it. Teams that were not ranked in 2015 received the average rating of 1500 when they entered the system. Women's teams received a similar initialization, but starting at 1650, so that the average initialized rating was also 1500.¹²³
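The seeding scheme above is simple to express. The sketch below assumes a hypothetical `seed_ratings` helper that takes the prior year's ranked list in order (best first) and steps down one point per place. The field sizes in the example are illustrative, not taken from the article; they show that a 1700 start over 401 ranked teams (or 1650 over 301) averages exactly 1500. Teams missing from the list fall back to 1500.

```python
from statistics import mean

DEFAULT_RATING = 1500.0


def seed_ratings(prior_ranking, top_rating, step=1.0):
    """Map each team in last season's ranking (best first) to a starting rating."""
    return {team: top_rating - step * place for place, team in enumerate(prior_ranking)}


def initial_rating(team, seeds):
    """Teams that were unranked last season enter at the average rating."""
    return seeds.get(team, DEFAULT_RATING)


# Illustrative field sizes with placeholder team names (not from the article).
men = seed_ratings([f"men_team_{i}" for i in range(401)], top_rating=1700)
women = seed_ratings([f"women_team_{i}" for i in range(301)], top_rating=1650)
print(mean(men.values()), mean(women.values()))  # 1500.0 1500.0
print(initial_rating("brand_new_team", men))      # 1500.0
```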

Rankings

Below are the Elo rankings generated for the 2016 season, including Conferences and Regionals.

Men’s Division

Rank	Team	Elo Rating
1	Oregon	2008.51
2	Massachusetts	1982.85
3	Minnesota	1960.44
4	Pittsburgh	1940.95
5	North Carolina-Wilmington	1938.86
6	Wisconsin	1919.38
7	Georgia	1893.68
8	Carleton College	1889.76
9	Case Western Reserve	1888.41
10	Washington	1880.10
11	North Carolina	1877.94
12	Stanford	1868.68
13	Harvard	1858.89
14	Cal Poly-SLO	1854.86
15	British Columbia	1853.75
16	Colorado	1852.73
17	Ohio State	1850.29
18	Brigham Young	1845.31
19	Texas A&M	1839.42
20	Franciscan	1837.36
21	Florida	1832.04
22	Georgia College	1821.27
23	Michigan	1820.47
24	Florida State	1818.52
25	Brandeis	1818.26
26	Georgetown	1815.66
27	Virginia Commonwealth	1814.01
28	Penn State	1813.15
29	Auburn	1810.64
30	Arkansas	1805.33
31	Connecticut	1801.19
32	Northwestern	1800.71
33	Georgia Tech	1795.68
34	Lewis & Clark	1795.67
35	Bryant	1794.64
36	Richmond	1790.74
37	Maryland	1789.24
38	New Hampshire	1787.28
39	California-San Diego	1777.85
40	Cornell	1777.38
41	Texas	1773.92
42	Missouri	1772.18
43	Air Force	1771.71
44	Virginia Tech	1767.49
45	Notre Dame	1766.57
46	James Madison	1763.60
47	Carleton College-GOP	1762.44
48	Utah	1761.10
49	Colorado College	1755.26
50	Baylor	1754.42

Women’s Division

Rank	Team	Elo Rating
1	Oregon	1885.30
2	Central Florida	1858.32
3	Stanford	1856.27
4	British Columbia	1835.82
5	Michigan	1815.30
6	Virginia	1800.38
7	Wisconsin	1789.60
8	Colorado	1782.21
9	Texas	1778.09
10	Pittsburgh	1777.55
11	Whitman	1774.62
12	North Carolina	1770.23
13	UCLA	1769.50
14	California	1753.71
15	Washington	1752.19
16	Vermont	1750.34
17	North Carolina-Wilmington	1749.66
18	Notre Dame	1742.70
19	California-San Diego	1742.11
20	Delaware	1737.40
21	Minnesota	1733.23
22	Ottawa	1731.94
23	Williams	1721.66
24	Georgia Tech	1720.47
25	Dartmouth	1719.99
26	Colorado College	1714.71
27	Connecticut	1711.09
28	Penn State	1709.42
29	Ohio State	1709.06
30	Florida State	1708.68
31	Iowa State	1705.23
32	Tufts	1705.10
33	St Olaf	1703.19
34	Maryland	1699.76
35	California-Davis	1697.20
36	Carleton College-Eclipse	1695.25
37	Liberty	1686.01
38	Kansas	1679.82
39	Bates	1678.67
40	Rice	1678.44
41	Colorado State	1675.92
42	Wesleyan	1673.29
43	Northeastern	1669.60
44	Chaos (Independent)	1667.04
45	Northwestern	1664.82
46	Texas State	1663.80
47	Mount Holyoke	1659.83
48	West Chester	1659.21
49	Georgetown	1657.95
50	George Washington	1651.01

¹ The win expectation formula was tweaked so that a 300-point differential was a 90% win expectation.
² The MOV multiplier used was $\log(\text{pointdiff} + 1) \times \frac{2.2}{0.001 \cdot \text{elo\_diff} + 2.2}$, where elo_diff is the winning team's rating minus the losing team's.
³ The k-factor used was 20, which is comparable to the factor used in models for sports that play about 20 games per season.
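Putting the three footnoted choices together, a season can be replayed in date order and sorted into a table like the ones above. This is a hedged sketch, not the author's code. It assumes the 300-point tweak means substituting 300 for 400 in the logistic curve (about 90.9% at a 300-point edge), reads "log" as a natural log, treats elo_diff as the winner's pre-game rating minus the loser's, and uses a hypothetical `Game` record with made-up teams and scores.

```python
import math
from dataclasses import dataclass
from datetime import date

K = 20.0        # footnote 3
SCALE = 300.0   # footnote 1, assumed to replace 400 in the logistic curve
DEFAULT = 1500.0


@dataclass
class Game:
    played_on: date
    winner: str
    loser: str
    winner_score: int
    loser_score: int


def expected(rating_a, rating_b):
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / SCALE))


def mov_multiplier(point_diff, elo_diff):
    # Footnote 2; elo_diff = winner's pre-game rating minus loser's.
    return math.log(point_diff + 1) * 2.2 / (elo_diff * 0.001 + 2.2)


def replay(games, seeds):
    ratings = dict(seeds)
    # Chronological order, so each game uses the ratings as they stood at the
    # time; sorted() is stable, so same-day games keep their listed order.
    for game in sorted(games, key=lambda g: g.played_on):
        rw = ratings.setdefault(game.winner, DEFAULT)
        rl = ratings.setdefault(game.loser, DEFAULT)
        mult = mov_multiplier(game.winner_score - game.loser_score, rw - rl)
        delta = K * mult * (1.0 - expected(rw, rl))
        ratings[game.winner] = rw + delta
        ratings[game.loser] = rl - delta
    return ratings


games = [  # hypothetical results, for illustration only
    Game(date(2016, 2, 6), "Team A", "Team B", 13, 9),
    Game(date(2016, 2, 6), "Team C", "Team A", 15, 14),
    Game(date(2016, 2, 7), "Team B", "Team C", 13, 7),
]
table = sorted(replay(games, {"Team A": 1700.0}).items(), key=lambda kv: -kv[1])
for rank, (team, rating) in enumerate(table, start=1):
    print(f"{rank}\t{team}\t{rating:.2f}")
```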

Cody Mills

Cody Mills currently coaches New York PoNY, Cal Poly SLO, and the USA men's team. He was a captain then coach at Stanford and has previously coached San Francisco Revolver and Oakland Guerrilla in the club division. He also runs frisbee-rankings.com

