A New Rankings Approach for Ultimate: Ideas From College Hockey

The USA Ultimate algorithm isn't perfect; here's another way.

Among other sports, I’m a fan of college hockey. And, surprisingly, college hockey shares many of the same issues as ultimate when determining postseason bids: each conference receives one automatic bid, and mathematical formulas[1] are used to determine the other 12 teams.[2]

College hockey also suffers from one of the main problems that plagues ultimate algorithms: a lack of interconnectivity between teams. So I decided to look at the formulas and ranking system adopted in NCAA hockey and see whether they can be used to improve ultimate rankings.


One of the college hockey formulas, the Bradley-Terry system, is based on a stipulation that the probability of a team with rating A defeating a team with rating B is given by A / (A + B).

For example, NationalsTeam has a rating of 4.2. SectionalsTeam has a rating of 0.8. The likelihood of NationalsTeam beating SectionalsTeam is 4.2 / (4.2 + 0.8) = 0.84, or 84%.
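As a minimal sketch of that formula in Python (the function name `win_probability` is my own choice for illustration):

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Bradley-Terry probability that the team rated rating_a beats the team rated rating_b."""
    return rating_a / (rating_a + rating_b)


# The NationalsTeam vs. SectionalsTeam example from the text.
nationals_team = 4.2
sectionals_team = 0.8
print(round(win_probability(nationals_team, sectionals_team), 2))  # 0.84
```

Note that the two probabilities for any matchup always add up to 1, so only the ratio of the two ratings matters.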

But how do we figure out a rating for a single team? Let’s say we have two teams with “known” ratings: RegionalsTeam rated 1.7 and SectionalsTeam rated 0.8. We are trying to figure out the rating for UnknownTeam, which has a 2-3 record against RegionalsTeam and a 2-1 record against SectionalsTeam. We’ll guess an initial rating of 1.0 for UnknownTeam. Using this rating with the Bradley-Terry equation, in the 5 games against RegionalsTeam and 3 games against SectionalsTeam we would expect a win total of:

5 * 1.0 / (1.0 + 1.7) + 3 * 1.0 / (1.0 + 0.8)

which comes out to 3.52 games.

The fact that, in reality, UnknownTeam won a total of 4 games (instead of 3.52) means the rating for UnknownTeam needs to be adjusted up a little bit.[3] Using the new adjusted rating, we can again calculate the expected wins of UnknownTeam, and then adjust the rating again.
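The guess-and-adjust loop for UnknownTeam can be sketched in a few lines of Python. This is an illustration under my own naming (`expected_wins`, `newton_step`), not the code used to produce the ratings in this article; it uses a Newton step, with the opponents' ratings held fixed.

```python
def expected_wins(rating, schedule):
    """Expected win total; schedule is a list of (opponent_rating, games_played)."""
    return sum(n * rating / (rating + opp) for opp, n in schedule)


def newton_step(rating, wins, schedule):
    """One Newton update of `rating` toward expected_wins == wins."""
    # Derivative of expected wins with respect to this team's own rating.
    slope = sum(n * opp / (rating + opp) ** 2 for opp, n in schedule)
    return rating + (wins - expected_wins(rating, schedule)) / slope


# 5 games against RegionalsTeam (1.7) and 3 games against SectionalsTeam (0.8).
schedule = [(1.7, 5), (0.8, 3)]
rating = 1.0  # initial guess
print(round(expected_wins(rating, schedule), 2))  # 3.52

for _ in range(20):
    rating = newton_step(rating, wins=4, schedule=schedule)
print(round(rating, 3))  # 1.284
```

With a rating of about 1.284, UnknownTeam's expected win total over those 8 games is exactly the 4 wins it actually recorded.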

This enables us to calculate the rating for a single team, but we aren’t adjusting ratings for RegionalsTeam and SectionalsTeam — in the example above, those ratings are held constant. To simultaneously generate ratings for all teams playing during the regular season, I followed this process:

  1. Give every team an initial estimated rating of 1.0.
  2. Using current estimated team ratings, calculate adjustments for each team using their win-loss record.[4]
  3. Apply the calculated ratings adjustments for each team to generate new estimated ratings.
  4. Repeat steps 2-3 until the maximum adjustment is tiny.[5]
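The four steps above can be sketched as follows. This is a rough illustration rather than the code behind the published ratings: the function name `fit_ratings`, the input layout, the three teams, and the small positive floor on ratings are all assumptions of mine. As in the footnotes, each team's adjustment uses only the derivative with respect to its own rating. I have not checked that this simple simultaneous update converges for every possible schedule, and it assumes every team has played at least one game.

```python
def fit_ratings(wins, games, tol=1e-7, max_iter=10_000):
    """
    wins:  {team: total wins}
    games: {(team_a, team_b): games played between them}, each pair listed once
    """
    # Build each team's list of (opponent, games played).
    opponents = {team: [] for team in wins}
    for (a, b), n in games.items():
        opponents[a].append((b, n))
        opponents[b].append((a, n))

    ratings = {team: 1.0 for team in wins}  # step 1: everyone starts at 1.0
    for _ in range(max_iter):
        # Step 2: compute every adjustment from the *current* ratings.
        adjustments = {}
        for team, r in ratings.items():
            expected = sum(n * r / (r + ratings[o]) for o, n in opponents[team])
            slope = sum(n * ratings[o] / (r + ratings[o]) ** 2
                        for o, n in opponents[team])
            adjustments[team] = (wins[team] - expected) / slope
        # Step 3: apply them all at once (floor keeps ratings positive; my safeguard).
        for team in ratings:
            ratings[team] = max(ratings[team] + adjustments[team], 1e-9)
        # Step 4: stop once the largest adjustment is tiny.
        if max(abs(a) for a in adjustments.values()) < tol:
            break
    return ratings


# Made-up three-team example (not real data).
wins = {"A": 30, "B": 25, "C": 20}
games = {("A", "B"): 25, ("B", "C"): 27, ("A", "C"): 23}
ratings = fit_ratings(wins, games)
for team, r in ratings.items():
    print(team, round(r, 3))
```

Because only rating ratios matter in the Bradley-Terry formula, the overall scale of the output depends on the starting values; the ratings could be rescaled (for example, so they average 1.0) without changing any predicted probability.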

My initial attempt generated rankings which made no sense: undefeated teams were rewarded way too much, even when using methods to account for this weakness.

However, ultimate has an aspect that can be leveraged to generate a larger data set. Each point of a game can be considered its own “mini-game.” So if a game finished with a score of 15-8, this would count as 23 “mini-games.” Now we’ve got a far larger data set to evaluate. A team’s rating under this system is no longer related to the likelihood of winning an entire ultimate game, but the likelihood of them winning any given point in a game.
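As a sketch of that bookkeeping, each final score can be split into points won and points played. The function name `tally_points` and the input layout are hypothetical, and the 15-8 result is attached to two placeholder team names purely for illustration.

```python
from collections import defaultdict


def tally_points(results):
    """results: iterable of (team_a, score_a, team_b, score_b) final scores."""
    points_won = defaultdict(int)     # team -> "mini-games" won
    points_played = defaultdict(int)  # (team, team) pair -> "mini-games" played
    for team_a, score_a, team_b, score_b in results:
        points_won[team_a] += score_a
        points_won[team_b] += score_b
        pair = tuple(sorted((team_a, team_b)))  # same key regardless of order
        points_played[pair] += score_a + score_b
    return dict(points_won), dict(points_played)


results = [("NationalsTeam", 15, "SectionalsTeam", 8)]
won, played = tally_points(results)
print(won)     # {'NationalsTeam': 15, 'SectionalsTeam': 8}
print(played)  # {('NationalsTeam', 'SectionalsTeam'): 23}
```

These two tallies take the place of game wins and games played when fitting the ratings.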

2015 College

Using this “per-point” method, the 2015 USAU college regular season[6] gives the following ratings (Men’s on left, Women’s on right):

Bradley-Terry Rankings for 2015 College ultimate.

Using the above ratings to determine bids to 2015 College Nationals would produce results fairly similar to the bids actually allotted under the USAU algorithm.

On the Men’s side, the sole difference is that the strength bid awarded to Maryland and the Atlantic Coast region under the current USAU algorithm would instead go to Minnesota and the North Central region. Cincinnati would still have earned its strength bid for the Ohio Valley region.

On the Women’s side, one of the strength bids awarded by the current USAU algorithm to the South Central region is shifted to the Northeast region via Northeastern. The question of which South Central team “lost” the bid is up for debate; in the USAU rankings, Colorado College is ranked above Kansas, while the opposite is true if the above rankings are used.

2015 Club

Just for kicks, I’ve included the ratings using the above algorithm as applied to the 2015 USAU club season (using scores posted as of August 24, 2015). The bids would be as follows.


Men’s

1 bid: Northeast
2 bids: Great Lakes, Mid-Atlantic, Northwest, South Central, Southeast, and Southwest
3 bids: North Central

Compared to the actual bids assigned, the Bradley-Terry system would result in the following bid “changes”: NE -1, SW +1

Team                    Rating  Strength of Schedule
Truck Stop              2.857   2.503
High Five               2.799   2.512
Johnny Bravo            2.760   2.418
Madison Club            2.556   1.973
Ring of Fire            2.444   2.504
Florida United          2.441   2.177
Sub Zero                2.369   2.221
Prairie Fire            2.336   2.023
Chain Lightning         2.156   1.980
Galaxy Swag Universe    1.923   1.490


Women’s

1 bid: Great Lakes, Mid-Atlantic, and North Central
2 bids: South Central, Southeast, and Southwest
3 bids: Northwest
4 bids: Northeast

Compared to the actual bids assigned, the Bradley-Terry system results in no bid changes.

Team                    Rating  Strength of Schedule
Brute Squad             4.263   2.143
Molly Brown             3.414   2.118
Hot Metal               1.459   0.894
Green Means Go          1.438   1.427
Colorado Small Batch    1.034   0.582


Mixed

1 bid: Great Lakes, Mid-Atlantic, and Southeast
2 bids: Northwest and South Central
3 bids: North Central, Northeast, and Southwest[7]

Compared to the actual bids assigned by the USA Ultimate algorithm, the Bradley-Terry system would result in the following bid “changes”: MA -1, NC +1, NE -1, SW +1

Team                          Rating  Strength of Schedule
The Chad Larson Experience    3.424   2.071
Slow White                    2.857   1.939
Wild Card                     2.843   1.861
Drag’n Thrust                 2.780   1.768
Seattle Mixtape               2.646   1.975
Polar Bears                   2.590   2.113
Love Tractor                  2.164   1.424
Metro North                   2.163   1.393
Cosa Nostra                   2.126   2.115
Mental Toss Flycoons          2.026   1.483
American BBQ                  1.914   1.721
Minneapolis Millers           1.894   1.404
The Muffin Men                1.883   1.110
Ambiguous Grey                1.873   1.306

Other Advantages to the Bradley-Terry System

Several related values can be calculated from the ratings, such as strength of schedule. Also, the probability of winning any given game can be calculated from the probability of winning a single point.[8]

Expected point scores can also be calculated, along with expected point spreads. For example, Fury (rating 4.380) and Molly Brown (rating 3.414) would have the following probabilities of outcome:

Fury Score    Molly Brown Score    Probability
Up to 11      15                   10.98%
14            14                   12.03% (overtime)
15            Up to 7              17.99%

Under this model, Fury is expected to win 75.7% of its games with Molly Brown, with an expected margin of victory of 3.1.
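Here is a hedged sketch of how figures like these can be derived from the per-point probability. It treats every point as an independent trial with the same probability, in a game to 15. The function names are mine, and the overtime rule (win by two with no cap once the score reaches 14-14) is my assumption, since the exact rule behind the numbers above is not spelled out here.

```python
from math import comb

GAME_TO = 15


def prob_win_with_score(p, opp_score, game_to=GAME_TO):
    """P(a team with per-point probability p wins game_to-opp_score in regulation)."""
    q = 1.0 - p
    # The winner takes the final point; the earlier points can fall in any order.
    return comb(game_to - 1 + opp_score, opp_score) * p ** game_to * q ** opp_score


def prob_tied_one_short(p, game_to=GAME_TO):
    """P(the score reaches (game_to - 1) all, e.g. 14-14)."""
    n = game_to - 1
    return comb(2 * n, n) * (p * (1.0 - p)) ** n


fury, molly_brown = 4.380, 3.414
p = fury / (fury + molly_brown)  # Fury's chance of winning any single point

blowout = sum(prob_win_with_score(p, k) for k in range(8))   # 15-0 through 15-7
overtime = prob_tied_one_short(p)                            # 14-14
regulation_win = sum(prob_win_with_score(p, k) for k in range(GAME_TO - 1))

# Assumption: overtime is decided by a two-point lead, with no cap.
overtime_win = p ** 2 / (p ** 2 + (1.0 - p) ** 2)
game_win = regulation_win + overtime * overtime_win

print(f"Fury 15, Molly Brown 7 or fewer: {blowout:.4f}")
print(f"14-14 (overtime): {overtime:.4f}")
print(f"Fury wins the game: {game_win:.4f}")
```

The first two values can be compared against the 17.99% and 12.03% rows in the table. The overall win probability depends on the overtime assumption, so it may differ slightly from the 75.7% quoted above.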

Any model will have caveats. This one is no different. Each point is assumed to be played identically. There’s no consideration of offensive and defensive lines, team depth, or cap limits. If only considering effects on winning an overall game, trading points and offensive/defensive lines will serve to reduce variance and decrease the probability of a potential upset.

Capping a game and reducing its length will increase the probability of an upset.[9] Of course, many of these limitations apply to the USAU algorithm as well.

The main advantage this system has over the USAU algorithm is that a team rating conveys additional information about the expected outcome of a game. As shown above, win percentage and point spreads can be calculated. Also, the USAU algorithm has been shown to provide a disproportionate benefit to defeating a similarly rated team. Under the Bradley-Terry model, playing a close game against a similarly rated team will not greatly affect either team’s rating.[10]

If you are interested in an even more technical discussion, here is the full research paper.

Stephen Wang was immensely helpful in discussions I had with him. He suggested that I treat each point as a “mini-game”, and also steered me towards using numerical methods rather than trying to invert an enormous matrix.

  1. There are several unofficial rankings algorithms in use for college hockey, but I took one (the Bradley-Terry system) that I find the most interesting and applied it to ultimate. 

  2. See this writeup on the NCAA website for more information about the college hockey selection process. 

  3. There are several ways to determine exactly how much it should be adjusted, but I used Newton’s method for ease. 

  4. In calculating the ratings adjustment for Newton’s method, I only considered the derivative of a team’s rating and win total with respect to its own rating, and ignored the differential effect of adjusting opposing teams’ ratings. 

  5. Less than 0.0000001 for any team in the ratings. 

  6. Data was scraped off the USAU website, so some data may be included / excluded that was not used in the “official” USAU calculation. 

  7. Excluding Union and Bessarion for the NE; including them would give the NE 4 bids and the NW 1 bid. 

  8. Using a modified binomial distribution. 

  9. The details drawn from the model are likely skewed as well. The probability of a blowout win is likely overstated. Teams may have a more open rotation in such a scenario in order to rest starting players. Team depth would become more of a factor, and explicitly accounting for team depth is not done in this model. 

  10. I haven’t done a full analysis on the effects of playing a higher/lower rated opponent on a team’s rating, but I don’t think it will affect it because, unlike the USAU system, the winner does not get an automatic +125 points. 

Wally Kwong is a long-time ultimate player and one of the country's most experienced observers. He is an engineer by day. Follow him on Twitter at @Wally_Kwong.
