A batch of incorrectly excluded UCSB games swung the bid to the AC.
April 11, 2017 by Cody Mills in Analysis, News with 4 comments
If you thought the bid drama for 2017 was over, think again. The end of the regular season, and the Nationals bid allocation that arrives with it, is never short on drama. In 2015, it was the Cincinnati men forfeiting their way to a bid while the Southwest was shut out again, painfully, with teams at #21 and #22. 2016 brought the dark-horse second bid to the Ohio Valley courtesy of Ohio State’s dominant consolation run at a sub-elite tournament. This year, of course, brought us the BYU story. But we aren’t done yet!
The Northwest Women’s region has lost its third bid to the Atlantic Coast (which moves to three bids) after a discrepancy in the games included in the rankings was corrected earlier today. Georgetown moved ahead of Whitman by the slimmest of margins for the final spot inside the bid cutoff.
As you may have read, I have been projecting the USAU Top 20 algorithm rankings for Ultiworld during the regular season to give readers an advanced look at what each Wednesday’s results might look like. These projections are often quite accurate, hampered only by the mystery of which games USAU has invalidated for eligibility purposes, the incompleteness and poor quality of Score Reporter data, and a degree of ambiguity about which “blowout” wins should be excluded and which should be counted for a team with limited results.
USAU has done a decent job of keeping the public up to date with their invalidated games, posting the results in an occasionally-updated Google Doc. But because that document is updated so infrequently, I am often left to reverse-engineer which games are being counted. I do this by scraping the last official USAU rankings posting and programmatically comparing the listed records and ratings for each team with my own projections. From there, a bit of inspection and iteration can lead to discoveries about the root causes of ranking differences. For example, earlier this season (before the invalidations were posted), I was able to see that Lewis & Clark’s results from Palouse Open and Tennessee’s results from Queen City Tune Up were not being counted in the rankings. Further, though the games have since been reinstated, I was also able to see that the DIII division of Tally Classic was not being counted. These late disclosures are not a big deal; they are a little annoying, but they simply reflect an information asymmetry. However, some of the anomalies and differences I discovered have been more troublesome.
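The record-comparison step can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the actual pipeline: the real process scrapes Score Reporter pages, and the function name, data shapes, and records below are all made up for the example.

```python
def find_record_discrepancies(official, projected):
    """Compare official (wins, losses) records against projected ones.

    Both arguments are hypothetical dicts mapping team name to a
    (wins, losses) tuple. Returns the teams whose records disagree,
    along with how many games the projection counts that the official
    rankings apparently do not.
    """
    diffs = {}
    for team, (o_wins, o_losses) in official.items():
        p_wins, p_losses = projected.get(team, (0, 0))
        if (o_wins, o_losses) != (p_wins, p_losses):
            # A smaller official game count usually means USAU has
            # invalidated games that the projection is still counting.
            diffs[team] = {
                "official": (o_wins, o_losses),
                "projected": (p_wins, p_losses),
                "games_missing": (p_wins + p_losses) - (o_wins + o_losses),
            }
    return diffs


# Illustrative records only -- not the real 2017 data.
official = {"UCSB": (14, 5), "Whitman": (18, 3)}
projected = {"UCSB": (18, 6), "Whitman": (18, 3)}
diffs = find_record_discrepancies(official, projected)
# Here UCSB would surface with 5 games counted by the projection
# but absent from the official record; Whitman would not appear.
```

From a flagged discrepancy like this, inspecting which tournaments account for the missing games narrows down the invalidated set quickly.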
Perhaps the most striking example: for the first few iterations of the official rankings, USAU was counting the results of (and ranking!) Gunn High School. To be fair, Gunn did participate in several college tournaments, but they are listed on Score Reporter as a High School Boys’ team. There were also instances where tournament results reported as 1-0 games were apparently being counted by USAU (notably from Big D in Little D). While that is more an issue on the TD’s end than USAU’s, it still speaks to a lack of rigor in the algorithm’s data quality filtering. All of this is to say that USAU’s ranking process is far from perfect¹.
In the process of preparing a final-eligibility-updated, game-by-game breakdown of teams and ratings to be posted on my hobby ranking website, I noticed a five-game anomaly in my USAU women’s rankings. Though my algorithm converged extremely close to the USAU ratings and ordering, there were a few important differences: 1) several teams had a game being counted by me that was not being counted by USAU, 2) UCSB had five games being counted by me that were not being counted by USAU, and 3) the final bid was now going to Georgetown by a two-point margin instead of to Whitman by three. I looked at the game anomaly and realized that all of the discrepancies would be resolved if UCSB’s games from the Stanford Open were invalidated. However, the USAU Final Exclusions (or Preliminary Exclusions) list made no mention of these games being ineligible. All the same, I did the sensible thing and removed UCSB at Stanford Open, re-ran the rankings, and my resulting ranking set was virtually spot-on with the published rankings, including restoring Whitman’s position and bid.
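For readers curious about the mechanics, the exclude-and-re-run step looks roughly like the sketch below. It uses the rating-differential curve from USAU’s public description of the algorithm, but it omits the date and score weighting and the real convergence test, and every team and tournament name is illustrative, so treat it as a toy model rather than the actual implementation.

```python
import math

def game_diff(winner_score, loser_score):
    # Rating differential curve as described on USAU's rankings page:
    # r is the loser's score relative to (winner's score - 1), and the
    # differential is capped at 600 for blowouts. 1-0 games are treated
    # as maximal blowouts here to avoid dividing by zero.
    r = loser_score / (winner_score - 1) if winner_score > 1 else 0.0
    return 125 + 475 * (math.sin(min(1.0, (1 - r) / 0.5) * 0.4 * math.pi)
                        / math.sin(0.4 * math.pi))

def rerun(games, excluded, iterations=100):
    """Recompute ratings with some games dropped.

    games: list of (winner, w_score, loser, l_score, tournament) tuples.
    excluded: set of (team, tournament) pairs to invalidate, e.g.
    {("UCSB", "Stanford Open")} for the correction described above.

    This naive version just averages game ratings for a fixed number of
    iterations; the real algorithm weights games and tests convergence.
    """
    kept = [g for g in games
            if (g[0], g[4]) not in excluded and (g[2], g[4]) not in excluded]
    ratings = {t: 1000.0 for g in kept for t in (g[0], g[2])}
    for _ in range(iterations):
        new = {t: [] for t in ratings}
        for winner, w_score, loser, l_score, _tourney in kept:
            d = game_diff(w_score, l_score)
            new[winner].append(ratings[loser] + d)
            new[loser].append(ratings[winner] - d)
        ratings = {t: sum(v) / len(v) for t, v in new.items()}
    return ratings
```

Re-running with and without a suspect `(team, tournament)` pair in `excluded`, then diffing the two rating sets against the published numbers, is exactly the kind of inspect-and-iterate loop described above.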
The evidence was enough to make Ultiworld reach out to USAU concerning the difference. My analysis strongly suggested that there was an error in USAU’s data. If the games were truly supposed to be ineligible, then USAU erred by not posting them as such. If the games were supposed to count, then USAU ran their rankings with invalid data, calling into question the accuracy of their past published rankings, especially those where the margins were tight. Recall the 2015 men’s club season, when Kansas City Prairie Fire won the last bid over Santa Barbara Condors by less than one rating point; the previous season, when a single-goal score correction flipped a bid from the Southwest (San Diego Streetgang) to the Northwest (Portland Rhino); or the 2013 college season, when Stanford took the last bid over Texas A&M by one point. I’m sure the people on the losing end of those cutoffs wish they could have checked USAU’s results. These close calls are now shrouded in an even greater specter of second-guessing than previously thought.
This problem should be solved by increasing the transparency behind the algorithm results². Given the significant implications of possible oversights, USAU should give the public view access to their ranking code, as well as to the raw data being fed into it. They should also work to more rigorously define their algorithm, beyond the high-level explanation given on the rankings page. At the very least, they should take these steps for the preliminary ranking period to make the process more reviewable.
¹ though neither are my projections ↩
² the above-mentioned ambiguity behind blowouts, the lack of a rigorous definition of ranking “convergence”, the inconsistency in publishing invalidated games, and the data quality filtering ↩