Fix The Algorithm? Somebody Did

by Charlie Eisenhood in Editors' Blog with 9 Comments

Think you know how to fix USA Ultimate’s algorithm or bid allocation system? Well, Michael Silger wrote his thesis about it.

In his master's thesis, Silger explored different methods to improve the USA Ultimate ranking algorithm. While the reading is not for the layman, much of its discussion is interesting even to a non-expert.

If you are a stats wonk, enjoy. Silger acknowledges some limitations and thinks he can still improve his model, but he makes some corrections to known issues to provide, as he says, “a more statistically sound approach for ranking teams.”
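The thesis and the discussion below revolve around Elo-style rating systems. As a rough sketch for readers unfamiliar with the family, here is the standard Elo update in Python; the 400-point scale and K-factor of 32 are conventional chess defaults, not Silger's parameters or USA Ultimate's.

```python
def elo_expected(r_a, r_b, scale=400):
    """Probability that team A beats team B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / scale))

def elo_update(r_a, r_b, result_a, k=32):
    """Standard Elo update: result_a is 1 for a win, 0 for a loss."""
    return r_a + k * (result_a - elo_expected(r_a, r_b))

# Evenly matched teams: the winner gains k/2 points.
print(elo_update(1500, 1500, 1))  # 1516.0
```

Note that the update depends only on who won, not the score, which is exactly the limitation several commenters raise.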

Master’s Thesis Analyzing USA Ultimate Algorithm

About Charlie Eisenhood

Charlie Eisenhood is the editor-in-chief of Ultiworld. You can reach him by email (charlie@ultiworld.com) or on Twitter (@ceisenhood).


  • 8v92

    A 25-page master’s thesis! That’s amazing – and he got a useful result out of it.

    Mine’s looking more like 85 pages and I’m still not sure whether what I’m doing works yet.

    • mickey silger

      I had to cut quite a bit out due to the time constraints of a single semester, and a lot of the analysis around judging accuracy wouldn't have made sense to my peers and advisors. I'm curious to see your approach and how it differs from mine. If you or anyone else has questions, feel free to contact me. Once I get some free time, I plan on adjusting the tuning parameters a little to ensure Elo's normality assumption isn't abused.

  • Not a Math Major

    This certainly looks like it has some merit, but it is not a quick fix to the system. For example, look at Stanford's results this year. In the proposed model, every loss costs ranking points. That means Stanford probably would have dropped in the rankings (and lost a bid) after going 1-4 at Easterns, even though their losses came against some of the best teams in the country.

    That being said, I think Silger made some excellent points, and actually trying out new systems is a huge step in the right direction.

    • Mitch

      Careful there: you are comparing a team's ranking under one system to a lower ranking under another system. Not apples to apples at all. Perhaps in this system Stanford's expected strength at this point would be much higher, and playing teams they are predicted to lose to would produce only a small drop from that higher ranking. Or maybe Stanford isn't a top-20 team. There are many, many variables. My apologies for butchering any of the thesis author's terminology.

      Kudos for the work. While I doubt it will happen, I would love to see a conference of stat geeks at USAU HQ (or a virtual meeting) with some staff members working to improve the system. We’ll never be perfect, but that shouldn’t stop us from improving (and taking bigger incremental steps than we currently are taking).

      • Taskforce

        There is a group like you describe:
        “USA Ultimate continues to look to improve its ranking algorithm through consultation with Rodney Jacobson, Sholom Simon and other volunteers on the Algorithm Taskforce.”

      • guest

        The point still stands, though. If a team wins a game vs. a crappy B-team by a score of 15-13, that should significantly lower our estimation of the team’s rating, not raise it. Alternatively, if some unknown team loses a game to Oregon 15-13, it would raise our estimation of their ability. In chess, you can’t measure anything other than wins and losses, but not so in ultimate.
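The point above, that score margin carries information a binary win/loss input discards, can be sketched by feeding a fractional game result into an Elo-style update. The margin mapping, rating scale, and K-factor below are illustrative choices of mine, not USA Ultimate's actual algorithm or Silger's model.

```python
def margin_result(points_for, points_against):
    """Map a final score to a fractional result in [0, 1]; one simple choice."""
    return points_for / (points_for + points_against)

def rating_update(r_team, r_opp, points_for, points_against, k=32, scale=400):
    """Elo-style update driven by score margin instead of a binary win/loss."""
    expected = 1 / (1 + 10 ** ((r_opp - r_team) / scale))
    return r_team + k * (margin_result(points_for, points_against) - expected)

# A heavy favorite that only wins 15-13 loses rating points...
print(rating_update(1800, 1400, 15, 13) < 1800)  # True
# ...while an unknown underdog that loses 15-13 to a top team gains.
print(rating_update(1400, 1800, 13, 15) > 1400)  # True
```

Under this sketch a 15-13 win over a weak B-team (actual result about 0.54, expected above 0.9) lowers the winner's rating, matching the commenter's intuition.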

  • Bessie The Cow

    S/O to Rusty

  • Ryan Thompson

    When looking at this last year, I implemented Glicko-2 instead of Elo, and we also discussed TrueSkill – mainly because those algorithms also provide more information about the rating distribution and are possibly more robust with low numbers of games than Elo.

  • Ariel Jackson

    Not to rehash old arguments, but we need to be careful about treating forfeits as harshly as is done here.

    Sure, you are punishing Whitman for their forfeits, but you are also punishing the teams they played against.

    A team that loses to them is doubly punished. Say Whitman didn't rest their starters because they knew in advance they would forfeit later games (not that I'm claiming this is what they did, but it is implied in the article). Their opponent then lost to a team playing above its true strength, yet that team is ranked below its true strength because of the forfeit penalty.
