Nationals Probabilities: Every Team’s Chances To Win A Title

A new probabilistic method for ranking, allocating bids, and predicting outcomes.

This article was written by guest author Thomas Murray. Ultiworld’s coverage of the 2016 Club Championships is presented by Spin Ultimate; all opinions are those of the authors. Please support the brands that make Ultiworld possible and shop at Spin Ultimate!

The purpose of this article is to outline an alternative method for ranking ultimate teams, allocating National Championship bids to the regions, and predicting single-game and tournament outcomes. At the end of this article, I report predictive probabilities derived from this method for each team’s pool placement and tournament finish at the National Championships this week. I will keep technical details in this article to a minimum.[1]

Like the current algorithm, the proposed method does the following:

  • accounts for strength of schedule
  • rewards teams for larger margins of victory
  • decays game results over time, i.e., down-weights older results

Unlike the current algorithm, there is never a need to ignore games between teams with hugely differing strengths. Recall, the current algorithm ignores blow-out games between teams with rating differentials of 600 or more, but only if the winning team has at least five other results that are not being ignored. Moreover, the proposed method facilitates the following:

  • down-weighting shortened games, e.g., 10-7
  • probabilistic bid allocation
  • prediction

The proposed method is a hybrid model that I’ve developed specifically for ultimate, which alleviates the known flaws of win/loss and point-scoring models that were considered by Wally Kwong last year here on Ultiworld.[2]

I think the current algorithm and bid allocation scheme are pretty darn good, and overall these tools/procedures have been beneficial for fostering a more competitive and exciting sport. Frankly, we are splitting hairs at this point with respect to ranking, and even bid allocation. The current algorithm doesn’t facilitate prediction, however. I am doing this because I enjoy this sort of thing, and minor improvements for ranking and bid allocation can still be worthwhile. The predictions are interesting in their own right.

I plan to maintain current rankings, and nationals predictions for all divisions in Club and College on my website.

Thank you Nate Paymer for making the USA Ultimate results data publicly available at www.ultimaterankings.net. This was an invaluable resource.

The Method

The key idea is that a win is split between the competing teams based on the score of the game. This is what I call the “win fraction.” In a win/loss model, the winning team gets a full win, or a win fraction of 1.0, and the losing team gets a total loss, or a win fraction of 0.0.

In the proposed method, the winning team gets a win fraction that depends on the proportion of goals that they scored and the losing team gets the remainder. The win fraction for the winning team is depicted in the figure below as a function of their proportion of the total points scored in the match.

[Figure: win fraction for the winning team as a function of its proportion of the total points scored]

Note that there is a diminishing marginal return for larger margins of victory, just like the current algorithm, and in line with our subjective notions.[3]
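The win fraction curve can be sketched from the working assumption stated in the footnotes: it is the chance of winning a game played hard to 13 when each point is won independently with probability p. The function name `win_fraction` and the `target` parameter are illustrative choices, not the author's code.

```python
from math import comb

def win_fraction(p: float, target: int = 13) -> float:
    """Chance of reaching `target` goals first when every point is won
    independently with probability p (hard cap, no win-by-two)."""
    # The winner scores the last point; the loser has k goals (0..target-1)
    # spread among the first target - 1 + k points.
    return sum(
        comb(target - 1 + k, k) * p**target * (1 - p) ** k
        for k in range(target)
    )

# Observed win fractions for a few final scores, plugging in the
# winner's share of the total points for p.
for winner, loser in [(13, 12), (13, 10), (13, 7), (13, 3)]:
    share = winner / (winner + loser)
    print(f"{winner}-{loser}: {win_fraction(share):.3f}")
```

Under this sketch an even split of points gives a win fraction of one half, and each extra goal of margin adds less than the one before it.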

Using the win fractions, one can estimate a strength parameter for each team, just like a win/loss model. A point-scoring model also works similarly, but each team is attributed multiple “wins” each game equal to the number of points they scored. The major flaw of a point-scoring model is that the marginal return for larger margins of victory is constant, rather than diminishing. The major flaw of a win/loss model is that it doesn’t use all the available information, and it tends to perform poorly with small numbers of results like we deal with in ultimate. In particular, win/loss models vastly overrate undefeated teams that played an obviously weak schedule. The hybrid model alleviates both of these issues.

I also assign a weight to each match based on when it was played relative to the most recent week, and the winning score. I down-weight matches by 5% each week, and down-weight shortened matches in proportion to the amount of information they contain relative to a match played to 13.[4]
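As a rough illustration of the weighting just described, the sketch below combines a weekly decay with a shortened-game factor. Two details are assumptions, not taken from the article: the 5% decay is compounded weekly, and a shortened game's information is taken to be its winning score divided by 13. The name `match_weight` is hypothetical.

```python
def match_weight(weeks_ago: int, winning_score: int,
                 weekly_decay: float = 0.05, full_game: int = 13) -> float:
    """Weight for one match: time decay times a shortened-game factor."""
    # Assumption: "5% each week" compounds, so the weight is 0.95 ** weeks.
    time_factor = (1.0 - weekly_decay) ** weeks_ago
    # Assumption: information scales with the winning score relative to 13,
    # capped at 1 so longer games are not up-weighted.
    length_factor = min(winning_score, full_game) / full_game
    return time_factor * length_factor

# A 10-7 game from three weeks ago counts less than a 13-9 game from last week.
print(match_weight(3, 10), match_weight(1, 13))
```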

Rankings

As is typical with Bayesian models, I cannot simply write down the posterior distribution in a tidy equation, but I can draw lots of random samples from it and use these samples to learn about the team strength parameters. In particular, each sample from the posterior distribution corresponds to a ranking of the teams.

The actual rankings I report reflect each team’s average rank across all the posterior samples. For example, if Brute Squad is ranked 1st in 40% of the samples, and 2nd in 60% of the samples, then their average rank is 1.6 = 1*(0.4) + 2*(0.6). In contrast, if Seattle Riot is ranked 1st in 60% of the samples, and 2nd in 40%, then their average rank is 1.4 = 1*(0.6) + 2*(0.4). In this case, Seattle Riot would be ranked ahead of Brute Squad. Bayesian methods, which often rely on Monte Carlo sampling like this, are quite popular, in part due to Nate Silver over at 538 and to huge advancements in computational power and statistical methodology during the past 25 years or so.
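The rank averaging described above can be sketched in a few lines. The posterior draws here are made up purely to mirror the two-team example (Brute Squad on top in 4 of 10 draws, Seattle Riot in 6 of 10), and `mean_ranks` is a hypothetical helper name.

```python
from collections import defaultdict

def mean_ranks(samples: list[dict[str, float]]) -> dict[str, float]:
    """Average each team's rank (1 = strongest) over posterior samples."""
    totals = defaultdict(float)
    for draw in samples:
        # Rank teams within this draw by sampled strength, strongest first.
        ordered = sorted(draw, key=draw.get, reverse=True)
        for rank, team in enumerate(ordered, start=1):
            totals[team] += rank
    return {team: total / len(samples) for team, total in totals.items()}

# Hypothetical strength draws, not real posterior output.
draws = ([{"Brute Squad": 10.5, "Seattle Riot": 10.1}] * 4
         + [{"Brute Squad": 10.2, "Seattle Riot": 10.8}] * 6)
ranks = mean_ranks(draws)
print(sorted(ranks.items(), key=lambda item: item[1]))
```

Sorting by mean rank puts Seattle Riot ahead of Brute Squad, as in the example.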

Probabilistic Bid Allocation

The current rank-order allocation scheme could still be used based on the rankings. However, this new method naturally facilitates an alternative probabilistic bid allocation scheme. To do this, I calculate the probability that a particular team is in the top 16 (or top 20) using the posterior samples.

By summing these probabilities for all the teams in a particular region, I calculate the expected number of top 16 teams in that region, which I call the bid score. To allocate the 8 (or 10) wildcard bids, I sequentially assign a bid to the region with the highest bid score, subtracting 1 from the corresponding region’s bid score each time a bid is awarded. In this way, the bids are allocated to reflect regional strength, accounting for uncertainty in the actual rankings.
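The sequential allocation can be sketched as a short greedy loop. The sketch assumes each region starts with one automatic bid before the wildcards are handed out, which the article implies but does not state. The function name `allocate_bids` is illustrative, and the bid scores are the Men’s values reported in the Results section.

```python
def allocate_bids(bid_scores: dict[str, float], wildcards: int = 8,
                  auto_bids: int = 1) -> dict[str, int]:
    """Greedy wildcard allocation: give each bid to the region with the
    highest remaining bid score, then subtract 1 from that score."""
    remaining = dict(bid_scores)
    # Assumption: every region holds one automatic bid to begin with.
    bids = {region: auto_bids for region in bid_scores}
    for _ in range(wildcards):
        best = max(remaining, key=remaining.get)
        bids[best] += 1
        remaining[best] -= 1.0
    return bids

# Men's bid scores as reported in the Results section.
mens_scores = {"GL": 1.55, "MA": 2.00, "NC": 1.92, "NE": 3.10,
               "NW": 1.49, "SC": 2.51, "SE": 1.61, "SW": 1.54}
mens_bids = allocate_bids(mens_scores)
print(mens_bids)
```

With these inputs the sketch reproduces the probabilistic column of the Men’s bid table: three bids for NE, one for NW, and two for every other region.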

Prediction

Because the proposed method relies on the win fraction, which is derived from a working assumption relating the probability that a team scores on a particular point to the probability that they win the game, I can convert the strength parameters of two teams into the average probability that one team scores on the other on any given point. I can then simulate a match between the two teams using the resulting point-scoring probability.
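One way to picture this inversion is the sketch below. It assumes, as in a Bradley-Terry style model, that the win probability is a logistic function of the difference in strength parameters; the article does not spell out the exact link. It then finds the point-scoring probability whose hard-to-13 win probability matches, by bisection. All function names are hypothetical, and the two strengths are the posterior means reported for Ironside and Truck Stop, whereas the article's method would use posterior draws.

```python
import random
from math import comb, exp

def win_fraction(p: float, target: int = 13) -> float:
    """Chance of reaching `target` first with per-point probability p."""
    return sum(comb(target - 1 + k, k) * p**target * (1 - p) ** k
               for k in range(target))

def point_probability(strength_a: float, strength_b: float,
                      target: int = 13) -> float:
    """Per-point scoring probability for team A implied by two strengths."""
    # Assumption: logistic (Bradley-Terry style) link for the game outcome.
    win_prob = 1.0 / (1.0 + exp(-(strength_a - strength_b)))
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection works because win_fraction is increasing
        mid = (lo + hi) / 2
        if win_fraction(mid, target) < win_prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def simulate_match(p: float, rng: random.Random, target: int = 13):
    """Play points until one team reaches `target`; return (a, b) score."""
    a = b = 0
    while a < target and b < target:
        if rng.random() < p:
            a += 1
        else:
            b += 1
    return a, b

p = point_probability(7.92, 7.77)  # Ironside vs. Truck Stop, mean strengths
score = simulate_match(p, random.Random(2016))
print(round(p, 3), score)
```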

Extrapolating, I can simulate the entire National Championships in each division. In particular, I can sample one set of strength parameters for the 16 teams competing at nationals, simulate the tournament given these values, and record each team’s finish. Iterating this process, I can calculate the posterior probability of each team’s pool placement and finishing place, i.e., 1st, 2nd, semis, and quarters.
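The iteration described above can be illustrated for a single four-team pool. This sketch makes several simplifying assumptions that are not from the article: games are decided directly from a logistic function of the strength difference rather than simulated point by point, ties in the pool standings are broken at random, and the strength draws are placeholder values rather than real posterior samples. The helper name `pool_finish_probs` is hypothetical.

```python
import random
from math import exp

def pool_finish_probs(strength_draws, seed=0):
    """Estimate P(team finishes in each pool place), one simulated
    round robin per draw of the teams' strength parameters."""
    rng = random.Random(seed)
    n = len(strength_draws[0])
    counts = [[0] * n for _ in range(n)]  # counts[team][place]
    for strengths in strength_draws:
        wins = [0] * n
        for i in range(n):
            for j in range(i + 1, n):
                # Assumption: logistic win probability from the strength gap.
                p_i = 1.0 / (1.0 + exp(-(strengths[i] - strengths[j])))
                wins[i if rng.random() < p_i else j] += 1
        # Assumption: ties on wins are broken at random.
        order = sorted(range(n), key=lambda t: (wins[t], rng.random()),
                       reverse=True)
        for place, team in enumerate(order):
            counts[team][place] += 1
    total = len(strength_draws)
    return [[c / total for c in row] for row in counts]

# Placeholder draws: the same strengths repeated, standing in for samples.
draws = [[8.0, 6.0, 6.0, 2.0]] * 2000
probs = pool_finish_probs(draws)
for team, row in enumerate(probs):
    print(team, [round(x, 2) for x in row])
```

Each row gives one team's estimated chance of finishing 1st through 4th in the pool. A full tournament simulation would continue each iteration into the bracket and tally finishes the same way.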

Results

Below, I report the top 25 teams in each division at the end of the regular season, along with bid allocation under the two schemes. I then report the predictive probabilities of each team’s placement in their pool, and Nationals finish at this week’s tournament.

Men’s: End Of Regular Season Rankings & Bid Allocation

| Team | Rank | Mean Rank (90% CrI) | Mean Strength Parameter (90% CrI) |
|---|---|---|---|
| Ironside | 1 | 3.66 (1,10) | 7.92 (5.71,10.14) |
| Truck Stop | 2 | 4.11 (1,10) | 7.77 (5.63,9.93) |
| Revolver | 3 | 6.66 (1,17) | 7.44 (5.08,9.84) |
| GOAT | 4 | 7.87 (1,19) | 7.23 (4.93,9.58) |
| Johnny Bravo | 5 | 8.09 (2,17) | 7.13 (4.97,9.30) |
| Chicago Machine | 6 | 9.06 (2,18) | 6.99 (4.86,9.13) |
| Madison Club | 7 | 9.83 (2,19) | 6.90 (4.77,9.06) |
| H.I.P | 8 | 10.27 (1,23) | 6.98 (4.54,9.58) |
| PoNY | 9 | 10.35 (3,20) | 6.84 (4.67,9.02) |
| Doublewide | 10 | 11.36 (3,22) | 6.74 (4.50,8.98) |
| Sockeye | 11 | 11.81 (3,23) | 6.70 (4.42,9.00) |
| Florida United | 12 | 13.48 (3,26) | 6.50 (4.20,8.80) |
| Rhino | 13 | 14.39 (6,24) | 6.38 (4.25,8.50) |
| HIGH FIVE | 14 | 15.00 (6,25) | 6.31 (4.18,8.45) |
| Patrol | 15 | 15.98 (7,26) | 6.21 (4.05,8.35) |
| Prairie Fire | 16 | 16.30 (6,27) | 6.18 (3.96,8.39) |
| Sub Zero | 17 | 17.63 (8,27) | 6.01 (3.89,8.13) |
| Ring of Fire | 18 | 18.05 (7,30) | 5.99 (3.73,8.25) |
| Chain Lightning | 19 | 18.08 (5,31) | 5.98 (3.65,8.32) |
| SoCal Condors | 20 | 19.60 (3,35) | 5.80 (3.34,8.45) |
| Guerrilla | 21 | 20.35 (10,30) | 5.69 (3.60,7.79) |
| Dig | 22 | 25.32 (12,39) | 5.13 (2.95,7.34) |
| Furious George | 23 | 27.10 (15,40) | 4.96 (2.80,7.13) |
| Inception | 24 | 27.45 (18,39) | 4.95 (2.86,7.03) |
| Richmond Floodwall | 25 | 27.83 (1,58) | 5.15 (1.80,9.12) |

| Region | Rank-Order Bids | Probabilistic Bids | Bid Score |
|---|---|---|---|
| GL | 2 | 2 | 1.55 |
| MA | 2 | 2 | 2.00 |
| NC | 2 | 2 | 1.92 |
| NE | 3 | 3 | 3.10 |
| NW | 2 | 1 | 1.49 |
| SC | 3 | 2 | 2.51 |
| SE | 1 | 2 | 1.61 |
| SW | 1 | 2 | 1.54 |

Mixed: End Of Regular Season Rankings & Bid Allocation

| Team | Rank | Mean Rank (90% CrI) | Mean Strength Parameter (90% CrI) |
|---|---|---|---|
| AMP | 1 | 2.80 (1,7) | 6.59 (4.45,8.79) |
| Slow White | 2 | 3.50 (1,9) | 6.41 (4.20,8.66) |
| Drag’n Thrust | 3 | 5.91 (1,14) | 5.86 (3.68,8.06) |
| Seattle Mixtape | 4 | 6.37 (1,15) | 5.80 (3.60,8.04) |
| Steamboat | 5 | 8.81 (2,20) | 5.42 (3.24,7.65) |
| The Chad Larson Exp. | 6 | 9.77 (3,19) | 5.24 (3.18,7.31) |
| Metro North | 7 | 11.87 (4,23) | 5.01 (2.94,7.08) |
| Mischief | 8 | 12.37 (2,29) | 5.05 (2.73,7.40) |
| Alloy | 9 | 13.88 (3,30) | 4.86 (2.62,7.12) |
| Love Tractor | 10 | 14.37 (6,26) | 4.74 (2.69,6.80) |
| shame. | 11 | 15.22 (1,46) | 5.24 (2.12,8.88) |
| NOISE | 12 | 15.96 (5,30) | 4.60 (2.52,6.69) |
| Cosa Nostra | 13 | 16.54 (4,36) | 4.64 (2.35,6.96) |
| Polar Bears | 14 | 17.30 (5,36) | 4.55 (2.32,6.78) |
| Bucket | 15 | 18.18 (4,39) | 4.49 (2.21,6.83) |
| Wild Card | 16 | 18.48 (6,37) | 4.43 (2.26,6.59) |
| BFG | 17 | 19.12 (4,41) | 4.41 (2.12,6.76) |
| Bang! | 18 | 19.58 (5,40) | 4.34 (2.13,6.58) |
| UPA | 19 | 20.95 (3,53) | 4.41 (1.79,7.12) |
| Ambiguous Grey | 20 | 22.06 (9,40) | 4.11 (2.05,6.18) |
| Blackbird | 21 | 26.87 (12,48) | 3.80 (1.72,5.86) |
| Birdfruit | 22 | 30.13 (12,57) | 3.63 (1.46,5.79) |
| 7 Figures | 23 | 30.64 (11,60) | 3.61 (1.37,5.85) |
| Charlotte Storm | 24 | 30.84 (10,59) | 3.58 (1.37,5.83) |
| Dorado | 25 | 31.73 (10,64) | 3.57 (1.26,5.90) |

| Region | Rank-Order Bids | Probabilistic Bids | Bid Score |
|---|---|---|---|
| GL | 1 | 1 | 0.69 |
| MA | 2 | 2 | 1.54 |
| NC | 1 | 1 | 0.71 |
| NE | 2 | 3 | 3.33 |
| NW | 3 | 3 | 3.10 |
| SC | 2 | 2 | 1.77 |
| SE | 2 | 1 | 1.39 |
| SW | 3 | 3 | 2.95 |

Women’s: End Of Regular Season Rankings & Bid Allocation

| Team | Rank | Mean Rank (90% CrI) | Mean Strength Parameter (90% CrI) |
|---|---|---|---|
| Seattle Riot | 1 | 1.95 (1,4) | 10.68 (7.52,13.90) |
| Brute Squad | 2 | 2.56 (1,5) | 10.38 (7.23,13.59) |
| Molly Brown | 3 | 3.90 (1,7) | 9.81 (6.69,12.98) |
| Fury | 4 | 4.01 (1,7) | 9.77 (6.64,12.93) |
| Scandal | 5 | 4.86 (2,8) | 9.46 (6.34,12.62) |
| Traffic | 6 | 6.34 (3,10) | 8.93 (5.80,12.10) |
| 6ixers | 7 | 8.31 (1,18) | 8.49 (4.88,12.42) |
| Nightlock | 8 | 9.61 (6,15) | 7.91 (4.85,11.02) |
| Phoenix | 9 | 10.34 (7,16) | 7.71 (4.69,10.77) |
| Wildfire | 10 | 10.83 (5,19) | 7.69 (4.42,11.01) |
| Heist | 11 | 12.70 (8,19) | 7.21 (4.23,10.23) |
| Showdown | 12 | 13.09 (7,21) | 7.16 (4.01,10.35) |
| Underground | 13 | 13.36 (8,20) | 7.10 (4.05,10.19) |
| Ozone | 14 | 13.87 (8,20) | 6.97 (3.98,9.98) |
| Green Means Go | 15 | 15.82 (10,21) | 6.61 (3.63,9.60) |
| Rival | 16 | 16.65 (9,23) | 6.43 (3.35,9.56) |
| BENT | 17 | 16.95 (10,23) | 6.39 (3.35,9.47) |
| Siege | 18 | 17.06 (10,23) | 6.35 (3.28,9.46) |
| Iris | 19 | 17.51 (10,24) | 6.27 (3.15,9.41) |
| Schwa | 20 | 17.68 (12,23) | 6.27 (3.28,9.29) |
| Nemesis | 21 | 18.42 (12,24) | 6.11 (3.08,9.16) |
| Stella | 22 | 24.83 (19,31) | 4.57 (1.44,7.70) |
| Hot Metal | 23 | 25.28 (21,31) | 4.45 (1.44,7.49) |
| Colorado Small Batch | 24 | 25.44 (20,32) | 4.39 (1.29,7.50) |
| Pop | 25 | 25.96 (21,32) | 4.24 (1.23,7.26) |

| Region | Rank-Order Bids | Probabilistic Bids | Bid Score |
|---|---|---|---|
| GL | 1 | 1 | 0.73 |
| MA | 2 | 1 | 1.56 |
| NC | 1 | 1 | 0.87 |
| NE | 2 | 3 | 3.15 |
| NW | 3 | 3 | 3.14 |
| SC | 2 | 2 | 1.77 |
| SE | 2 | 2 | 1.72 |
| SW | 3 | 3 | 2.95 |

Men’s Probabilities

Pool A

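| Place | Ironside | PoNY | Prairie Fire | Ring of Fire |
|---|---|---|---|---|
| 1 | 57% | 18% | 13% | 11% |
| 2 | 24% | 29% | 24% | 23% |
| 3 | 12% | 28% | 30% | 30% |
| 4 | 6% | 25% | 34% | 30% |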

Pool B

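| Place | Revolver | Sockeye | Patrol | Doublewide |
|---|---|---|---|---|
| 1 | 42% | 28% | 12% | 18% |
| 2 | 28% | 28% | 19% | 25% |
| 3 | 18% | 24% | 29% | 29% |
| 4 | 12% | 20% | 40% | 28% |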

Pool C

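| Place | Truck Stop | Madison Club | HIGH FIVE | Dig |
|---|---|---|---|---|
| 1 | 43% | 30% | 17% | 9% |
| 2 | 30% | 29% | 24% | 16% |
| 3 | 17% | 24% | 30% | 28% |
| 4 | 9% | 16% | 29% | 46% |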

Pool D

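| Place | Johnny Bravo | Chicago Machine | H.I.P | Furious George |
|---|---|---|---|---|
| 1 | 34% | 29% | 28% | 8% |
| 2 | 30% | 29% | 26% | 15% |
| 3 | 22% | 25% | 25% | 28% |
| 4 | 13% | 17% | 21% | 48% |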

Championship Bracket

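| Team | 1st | 2nd | Semis | Quarters |
|---|---|---|---|---|
| Ironside | 26% | 14% | 17% | 26% |
| Truck Stop | 16% | 12% | 18% | 28% |
| Revolver | 13% | 9% | 21% | 28% |
| Johnny Bravo | 9% | 9% | 21% | 29% |
| Madison Club | 7% | 8% | 14% | 31% |
| Chicago Machine | 6% | 8% | 16% | 30% |
| Sockeye | 6% | 7% | 15% | 28% |
| H.I.P | 5% | 6% | 16% | 28% |
| PoNY | 4% | 6% | 11% | 26% |
| HIGH FIVE | 2% | 4% | 10% | 26% |
| Doublewide | 2% | 4% | 9% | 26% |
| Prairie Fire | 2% | 4% | 8% | 22% |
| Ring of Fire | 1% | 3% | 7% | 20% |
| Patrol | 1% | 3% | 6% | 20% |
| Dig | 1% | 2% | 5% | 18% |
| Furious George | 0% | 2% | 5% | 17% |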

Mixed Probabilities

Pool A

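| Place | AMP | Metro North | Ambiguous Grey | Blackbird |
|---|---|---|---|---|
| 1 | 60% | 22% | 10% | 9% |
| 2 | 25% | 33% | 21% | 21% |
| 3 | 10% | 26% | 31% | 33% |
| 4 | 5% | 20% | 38% | 37% |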

Pool B

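| Place | Slow White | Alloy | NOISE | Public Enemy |
|---|---|---|---|---|
| 1 | 53% | 23% | 17% | 7% |
| 2 | 27% | 30% | 28% | 15% |
| 3 | 14% | 28% | 31% | 28% |
| 4 | 6% | 19% | 24% | 51% |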

Pool C

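| Place | Drag’n Thrust | Steamboat | Love Tractor | No Touching! |
|---|---|---|---|---|
| 1 | 42% | 35% | 20% | 3% |
| 2 | 32% | 31% | 29% | 9% |
| 3 | 19% | 24% | 33% | 23% |
| 4 | 7% | 10% | 18% | 65% |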

Pool D

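| Place | Seattle Mixtape | Mischief | shame. | G-Unit |
|---|---|---|---|---|
| 1 | 26% | 24% | 48% | 2% |
| 2 | 35% | 33% | 25% | 7% |
| 3 | 31% | 32% | 19% | 19% |
| 4 | 9% | 11% | 8% | 72% |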

Championship Bracket

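| Team | 1st | 2nd | Semis | Quarters |
|---|---|---|---|---|
| AMP | 22% | 16% | 17% | 27% |
| Drag’n Thrust | 14% | 12% | 16% | 35% |
| Slow White | 14% | 10% | 26% | 32% |
| shame. | 14% | 9% | 26% | 27% |
| Seattle Mixtape | 9% | 10% | 20% | 31% |
| Steamboat | 9% | 9% | 14% | 34% |
| Mischief | 7% | 8% | 18% | 30% |
| Metro North | 3% | 7% | 11% | 28% |
| Alloy | 3% | 6% | 13% | 30% |
| Love Tractor | 2% | 5% | 11% | 32% |
| NOISE | 2% | 4% | 10% | 29% |
| Ambiguous Grey | 1% | 2% | 6% | 19% |
| Blackbird | 0% | 1% | 5% | 18% |
| Public Enemy | 0% | 1% | 3% | 14% |
| No Touching! | 0% | 0% | 1% | 10% |
| G-Unit | 0% | 0% | 1% | 7% |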

Women’s Probabilities

Pool A

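| Place | Seattle Riot | Nightlock | Iris | Heist |
|---|---|---|---|---|
| 1 | 86% | 7% | 3% | 3% |
| 2 | 12% | 47% | 18% | 24% |
| 3 | 1% | 25% | 28% | 46% |
| 4 | 1% | 21% | 51% | 26% |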

Pool B

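| Place | Brute Squad | Wildfire | Showdown | Rival |
|---|---|---|---|---|
| 1 | 74% | 15% | 7% | 5% |
| 2 | 19% | 35% | 26% | 20% |
| 3 | 6% | 29% | 33% | 32% |
| 4 | 2% | 22% | 35% | 42% |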

Pool C

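| Place | Molly Brown | Traffic | Phoenix | Green Means Go |
|---|---|---|---|---|
| 1 | 53% | 34% | 10% | 4% |
| 2 | 31% | 38% | 21% | 11% |
| 3 | 12% | 20% | 37% | 31% |
| 4 | 4% | 9% | 32% | 55% |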

Pool D

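| Place | Fury | Scandal | Ozone | Schwa |
|---|---|---|---|---|
| 1 | 47% | 44% | 7% | 2% |
| 2 | 39% | 38% | 16% | 7% |
| 3 | 11% | 14% | 45% | 30% |
| 4 | 3% | 4% | 32% | 61% |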

Championship Bracket

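| Team | 1st | 2nd | Semis | Quarters |
|---|---|---|---|---|
| Seattle Riot | 38% | 21% | 17% | 21% |
| Molly Brown | 17% | 17% | 18% | 36% |
| Brute Squad | 13% | 9% | 45% | 26% |
| Fury | 12% | 13% | 29% | 33% |
| Scandal | 8% | 12% | 27% | 38% |
| Traffic | 7% | 12% | 15% | 40% |
| Wildfire | 1% | 4% | 11% | 29% |
| Nightlock | 1% | 3% | 8% | 28% |
| Phoenix | 1% | 2% | 6% | 29% |
| Ozone | 0% | 2% | 7% | 28% |
| Showdown | 0% | 1% | 5% | 21% |
| Heist | 0% | 1% | 4% | 18% |
| Rival | 0% | 1% | 3% | 15% |
| Iris | 0% | 1% | 3% | 13% |
| Green Means Go | 0% | 1% | 2% | 16% |
| Schwa | 0% | 0% | 1% | 11% |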

  1. If you contact me at 8tmurray at gmail dot com, I would be happy to send you a PDF with the bare bones technical details. I will soon be submitting a manuscript to a statistical journal with these details and an objective evaluation of various methods for ranking ultimate teams. 

  2. The first hybrid model was proposed by Annis and Craig (2005) in the paper, “Hybrid Paired Comparison Analysis, with Applications to the Ranking of College Football Teams,” and later simplified by Annis (2007) in the paper, “Dimension Reduction for Hybrid Paired Comparison Models.” These papers discuss the flaws engendered by win/loss and point-scoring models, and I point out some of these flaws in the body of this article. 

  3. I derived the win fraction through a working assumption about the point-scoring process. Namely, the win fraction reflects the % of games that a team would win against their opponent if their probability of scoring on each point is equal to p, and the game was played hard to 13. The observed win fraction in a particular game is calculated by plugging in for p the observed proportion of points that the team scored. This definition for the win fraction is a subjective choice, but I believe it is reasonable, and it results in a parsimonious and useful method. 

  4. The above weights and win fractions for each match lead to an objective function, called the likelihood. I take a Bayesian approach, so I also specify the same weakly informative prior distribution for each team’s strength parameter. Doing so ensures the rankings are fair and dominated by the results from the season. Together the likelihood and prior result in a posterior distribution that tells me the likely values for the team strength parameters, and thus the likely rankings. 
