A potential change for next year?
April 14, 2017 by Cody Mills in Analysis
Since USA Ultimate shifted to a bid allocation process centered on the USAU “Top 20” algorithm in 2011, that process and the algorithm behind it have been the subject of much scrutiny. A strong motif that has emerged from the din is a call for a more probability-focused twist on bid allocation.
Despite the controversy, the USAU algorithm is generally competent at evaluating the relative strength of teams using only their box score results (not a trivial feat). However, while one generally trusts the Top 20 algorithm to place a team in the correct general range, the idea that it can granularly differentiate the 20th best team from the 21st best team, given the relatively small sample size of games and the lack of connectivity between teams, is unrealistic at best. Since this is an accepted shortfall of the algorithm, the fact that USAU draws a “hard” cutoff between the 20th and 21st teams in terms of bid allocation is counterintuitive. There is a degree of uncertainty in the results of the algorithm.
When players and coaches have called for the introduction of a probabilistic element to the rankings in the past, one interpretation has been to try to distribute bids based on the collective probability that a region’s teams would have a team ranked in the top 20, rather than the absolute one-off assessment of whether a single team has exceeded the cutoff.
To use an example, if the North Central had one team ranked at #20 (or whatever the cutoff point is for the division/season), and another region (say, the Southwest1) had teams ranked at #21 and #22, the bid distribution would likely differ between the algorithms (for completeness, let’s also say that both regions already have a similarly ranked team well inside the cutoff and earning an autobid). Given the aptitude of the USAU algorithm, it is more likely than not that the team at #20 is better than the teams at #21 and #22. Collectively, however, the probability that the Southwest, with its two teams in the top 22, has a “true” top 20 team among them may be higher than the probability that the team from the North Central is a top 20 team. Near the bid cutoff, a probabilistic algorithm would reward depth to a greater degree than the current system.
A Probabilistic Modification To Enhance The Current Allocation
While there are many choices and methods for a probabilistic algorithm, this article will focus on a straightforward process proposed by Scott Dunham. The proposed algorithm takes the USAU algorithm’s game ratings as its inputs, outputs a probability that each team is a “true” top 20 team, and ends with bids being allocated to regions via a mechanic similar to the current system.
Under the USAU algorithm, a given team’s ranking is, at a fundamental level, the average of that team’s individual game ratings. There are caveats, of course: the games are weighted differently based on when they occurred during the season, and under certain conditions a “blowout” win is discarded from the final average. However, the basic idea is that the rating is an average.
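To make the averaging idea concrete, here is a minimal sketch in Python. The function name and the simple boolean blowout flag are illustrative assumptions on my part; the actual USAU formula also weights games by date, and its blowout rule is more involved than a plain drop:

```python
import statistics

def team_rating(game_ratings, blowout_flags=None):
    """Average a team's individual game ratings into a single rating.

    A sketch of the averaging idea only: the real USAU formula also
    weights games by when they occurred, and its blowout rule is more
    nuanced. Here, a game flagged as a discardable blowout win is
    simply dropped before averaging.
    """
    if blowout_flags is None:
        blowout_flags = [False] * len(game_ratings)
    kept = [r for r, drop in zip(game_ratings, blowout_flags) if not drop]
    return statistics.mean(kept)
```

For instance, `team_rating([1500, 1600, 2200], [False, False, True])` averages only the first two games, since the third is treated as a discarded blowout.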
The probabilistic transformation that we implemented is consistent with the idea of a team rating as an average of individual scores, but we try to take things a step further by accounting for variability of a team’s rating. There is a limited sample size for each team being rated, and, because of this, the “average” that is calculated for each team may not necessarily reflect the “true” rating of a team. However, based on the number of games and the variability of their ratings, we can define a range of rankings that most likely contain the team’s true ranking.2
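As a rough illustration of the uncertainty calculation described in footnote 2 (the standard deviation of a team’s game ratings divided by the square root of the number of games), here is a short sketch; the function name is mine, and I am assuming the sample standard deviation is the intended one:

```python
import statistics

def rating_uncertainty(game_ratings):
    """Uncertainty of a team's rating average (standard error of the
    mean): the sample standard deviation of the game ratings divided
    by the square root of the number of games."""
    n = len(game_ratings)
    return statistics.stdev(game_ratings) / n ** 0.5
```

Note how the square root in the denominator captures the effect discussed above: more games shrink the uncertainty, so a team with a long season has a tighter range of plausible “true” ratings.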
After we defined the distributions for the teams’ rankings, we ran 1000 simulations of the final ranking set, drawing a random sample from each team’s distribution and then ordering the teams by their sampled ratings (and noting which teams were in the top 20). At the end of the simulations, we had, for each team, the fraction of simulations in which that team appeared in the top 20 (and therefore earned a bid). We added together the fractions for the teams in each region, subtracted 1 from each region’s total (to represent the auto-bid), and then assigned the strength bids one at a time to the region with the highest remaining total (subtracting 1 from a region’s total each time it earned a strength bid).
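The simulation and allocation steps above can be sketched as follows. This is my own simplified rendering, not USAU’s or Dunham’s actual code: the function names, the tie-handling in `max`, and the fixed seed are all assumptions, and each team’s rating is sampled from a normal distribution per footnote 2:

```python
import random
from collections import Counter

def bid_fractions(teams, n_sims=1000, top_n=20, seed=0):
    """teams: list of (name, region, rating, uncertainty) tuples.
    Each simulation samples every team's rating from a normal
    distribution (mean = USAU rating, sd = uncertainty), ranks the
    samples, and records which teams land in the top `top_n`.
    Returns each team's fraction of top-`top_n` appearances."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_sims):
        sampled = sorted(teams, key=lambda t: rng.gauss(t[2], t[3]), reverse=True)
        for name, *_ in sampled[:top_n]:
            counts[name] += 1
    return {t[0]: counts[t[0]] / n_sims for t in teams}

def allocate_strength_bids(teams, fractions, n_strength_bids):
    """Sum fractions by region, subtract 1 per region for the
    auto-bid, then hand out strength bids one at a time to the
    highest remaining total, decrementing that region's total by 1
    each time it earns a bid."""
    totals = Counter()
    for name, region, *_ in teams:
        totals[region] += fractions[name]
    for region in totals:
        totals[region] -= 1  # remove the auto-bid's worth of probability
    bids = Counter()
    for _ in range(n_strength_bids):
        region = max(totals, key=totals.get)
        bids[region] += 1
        totals[region] -= 1
    return bids
```

With zero uncertainties the simulation collapses to the current deterministic cutoff, which is a useful sanity check: the probabilistic system only diverges from the status quo when teams near the cutoff have meaningful uncertainty.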
The results might surprise you…
D-I Men’s Results
Since the Southeast Men’s region featured teams #21 Central Florida and #22 Auburn, you might think that the region would earn a bid under this new system. However, the probabilistic algorithm did not change this year’s bid distribution.3, 4 Given this year’s data, though, this seems logical: there is no team in the top 20 sitting at the bare minimum game requirement (à la the Brown, Northwestern, and Cincinnati teams from 2014 and 2015). A high number of games reduces the uncertainty of the ranking average, meaning the sampled ranking should stay relatively consistent. Further, there is also a 75 point drop between #19 Colorado State and #20 Purdue. Given that most teams near the bid cutoff have uncertainties in the range of 50-70 ranking points, this effectively insulates the top 19 from falling out of the top 20 (and, of course, the Metro East’s auto bid precludes the 20th bid from being redistributed). In a season with a smaller gap between the borderline teams (a spot where the top 20 algorithm is less trustworthy), the depth of the Southeast might have won it an additional bid.
D-I Women’s Results
The probabilistic algorithm also delivers the same bid allocation now that Georgetown slid in front of Whitman to pass a bid from the Northwest to the Atlantic Coast. Even if USA Ultimate hadn’t added the missing UCSB games that swung the bid, the probabilistic approach would have moved the bid to the Atlantic Coast. Whitman sat right on the border of the cutoff, with relatively little point cushion below them; they were done in by the combined strength of Georgetown, Delaware, and UNC-Wilmington, with the latter two teams’ top-30 finishes proving meaningful (note that they would have made no difference at all in the current process). This is an apt demonstration of a key element of the probabilistic system: in a 1v1 context vs. Georgetown, Whitman might seem like the more “worthy” team (thanks in part to their strength of schedule), but the depth of the Atlantic Coast near the cutoff (with five teams in spots 19-34) aggregates enough probability that the likelihood that their region has a third top-20 team in it is greater than the probability that Whitman (or any other team in the Northwest) is a top-20 team.
Pros Of The Probabilistic System
– The probabilistic bid system would likely result in more competitive regional tournaments, where regions with groups of teams that collectively earned strength bids battle to take them (as opposed to top-heavy regions earning exactly enough bids for their top teams and thin regions earning no bids beyond the one for the clear top team). It is also harder for regions to earn a large number of bids, since the probability that each additional team is a top-20 team begins to decrease.
– There is reduced incentive for teams to game the system. Under the current system, if a team well inside the cutoff faces a bubble team from their own region, the higher-ranked team is incentivized to lose to get the lower team over the line (particularly late in the season). In the same way, a team ranked decently high but out of reasonable range of earning a bid is incentivized to allow themselves to be blown out by a borderline team from their region. These instances occur every year: in 2015, it was in Cincinnati’s best interest to allow themselves to be blown out by Ohio State in consolation at Huck Finn; in 2016, it was in Stanford’s best interest to take a big loss to Cal Poly in consolation at Easterns; and in 2011, Harvard had no motive to try to beat Tufts in their last game of the regular season, and every incentive to lose.5
Cons of the Probabilistic System
– An unfortunate drawback of the proposed system is that playing fewer games actually helps teams below the bid cutoff line. By virtue of having fewer games, their uncertainty is higher, which means the range of their potential sampled ratings is wider, increasing their odds of landing in the top 20. A possible counter would be to keep or raise the current 10-game minimum requirement for rankings inclusion. In this year’s data, Boston College (Women’s) has one of the largest uncertainties of any team, 155, because of the distance between their “blowout” game ratings and their final rating, even though those blowout games were ultimately discarded from the final average. The high variance in their sampled values gives them more top-20 results than their rating alone might suggest.
Not this year, baby! ↩
We defined this range by calculating the uncertainty of the team’s rating average (the uncertainty of the mean is equal to the standard deviation of the set of game ratings, divided by the square root of the number of games). Further, the rankings in this range are not equally likely. We treated the team’s ranking as a normally distributed random variable, with the USAU ranking as the mean and the uncertainty of the ranking as the standard deviation. ↩
For the record, when we trialed this process during the club season it shifted a women’s bid from MA (#15 Green Means Go 1578) to NE (#16 BENT 1563; #17 Siege 1561; #18 Iris 1560) and a mixed bid from SC (#14 Cosa Nostra 1662) to NW (#4 Mixtape 1927; #17 BFG 1635) ↩
Note that though all these results did end up in favor of the eventual bid-earning team, I’m not accusing any team of throwing a game. Rather, I’m noting that the system was not set up such that both teams were properly incentivized to pursue a win. ↩