September 22, 2017 by Charlie Eisenhood in Interview
Every season, there seems to be some controversy about the final bid allocation, the algorithm, or the USA Ultimate rankings. Last college season, Ultiworld’s Cody Mills spotted some missing game data in the USA Ultimate rankings, which ended up sending a bid from the Northwest to the Atlantic Coast. This club season, there has been a lot of discussion about Florida United’s weak regular season schedule and the fact that they earned a bid despite attending no highly competitive tournaments.
We were excited, then, when the head of USAU’s Rankings Committee Stephen Wang reached out offering the chance to do a Q&A. Here is the transcript of our conversation, lightly edited for length and clarity.
Ultiworld: Can you quickly describe your position and role?
Stephen Wang: Sure. My position is that I’m the head of the Rankings Committee. And the role varies from time to time. We talk about ideas and how we can improve things. We send over some things to the Competition Committee which chews on them and sometimes accepts them and sometimes doesn’t. It’s not particularly formal. But anytime USAU runs the algorithm and something funky happens, like there’s some data issue that they don’t know how to deal with, they might have me take a look at it.
Other tweaks they’re thinking about, ones that affect the algorithm or how they use the algorithm, they might run by us just to see if the math makes sense or if there are any unanticipated consequences.
What are some of the most notable recent changes that have been made, whether that’s to how the algorithm is processed or how you deal with the data?
Nothing in the last season or two. There might be a change coming next year, which we’ll announce at the time that it happens. The last major change was how we dealt with time decay, but that was a while ago. We tweaked how Sectionals and Regionals get weighted when they run rankings in the middle of the Series. But that’s not that big a deal: it doesn’t affect bids.
But there’s been a lot that’s been discussed and not a lot that’s been acted upon. That’s partly due to the ad hoc nature of the committee and how things get processed.
So let’s dig into some of that stuff. There’s always chatter every season about some aspect of the rankings or the algorithm. Right now, the biggest thing is probably Florida United and the Southeast bid situation. What do you think about that?
I pay attention to the discussion. I think some of what is being taken for gospel truth is maybe a little bit overrated, with regard to teams that play not-so-strong schedules.
Scott Dunham emailed me his analysis before it got published and he asked me what I thought about it. I think something that’s not being taken into consideration very well is this: his analysis looked at teams that play other teams rated about 300 points or so below them, and what happens in those sorts of games.1 But the problem is that part of the reason those teams have the rating that they do is because of that game. And it’s really hard to disentangle all of those sorts of things.
Let’s say there are four teams and they play a round robin. One team goes 3-0, another team goes 2-1, etc. And let’s say all the games are 13-10. So 13-10 is going to translate to some number, some rating differential. It doesn’t matter what that number is — let’s just call it 200. In and of itself, 200 can’t be too high or too low. It’s just a number. You have to have a number for 13-10. If you run the algorithm on that pool, what will happen is that the top team will be rated 1150, the next team will be 1050, the next team 950, and the bottom team 850. But if you then look at two teams that ended up rated 100 points apart — so now you’re looking at the post-hoc ratings — the game between them counted for a differential of more than 100: it counted for 200.
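The round-robin example can be checked numerically. The sketch below is a toy iteration, not USAU’s actual code; it assumes only what the example states — every game ends 13-10, a 13-10 result is worth a 200-point rating differential, and a team’s rating is pulled toward the average of its game ratings.

```python
# Toy check of the four-team round-robin example (not USAU's actual code).
# Assumed, per the example: every game is 13-10, worth a 200-point
# rating differential. Each iteration sets a team's rating to the
# average of (opponent rating + 200) for wins and (opponent rating - 200)
# for losses.
DIFF = 200  # assumed rating value of a 13-10 win
GAMES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]  # (winner, loser)

ratings = [1000.0] * 4
for _ in range(100):
    new = []
    for t in range(4):
        game_vals = [ratings[l] + DIFF for w, l in GAMES if w == t]
        game_vals += [ratings[w] - DIFF for w, l in GAMES if l == t]
        new.append(sum(game_vals) / len(game_vals))
    # re-center on a mean of 1000 so the fixed point is pinned down
    mean = sum(new) / 4
    ratings = [r - mean + 1000 for r in new]

print([round(r) for r in ratings])  # → [1150, 1050, 950, 850]
```

The iteration converges to exactly the ratings Wang quotes: teams 100 points apart in the final ratings, whose head-to-head game nonetheless counted as a 200-point differential.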
Does that mean we are overvaluing those victories? I would say no.
It’s a little bit tricky trying to back out all of the effects of these games and saying ‘therefore it is clearly beneficial for Team X to play Team Y,’ because you don’t know what the ratings are going to be afterwards.
Another point that I would make here is that part of this is confirmation bias. Everybody notices those teams that play a weak schedule and happen to bubble up to the bid range. But nobody notices the teams that play a weak schedule and do well but don’t bubble up to the bid range.2
It’s hard to say exactly how strong this effect is. I don’t think it’s as strong as people are led to believe. That doesn’t mean it doesn’t exist, but I think the evidence for it is weaker than people think.
I think the problem that a lot of people have is that they look at a Florida United team that has played what most people would consider to be a single competitive game all season: against Chain Lightning, which they won 13-11. And they say, ‘How is it fair that this team gets a bid when the rest of their games are against teams that they’re smashing?’
And I think the real issue is that when you’re playing a team that’s much worse than you, the difference between winning 13-9 and 13-5 may not seem like that big of a difference because of the way you do substitutions, for example. A huge win is a huge win, and there’s obviously relative levels of that and I understand that the algorithm takes into account the margin of victory.
But people see that Florida United gets a bunch of easy wins and they run up the score and therefore they get a bid. How is that fair when Team X didn’t get a bid even though they played and beat good teams at good tournaments and didn’t just play a bunch of weak competition?
I understand that. It’s a hard problem because you have to give Florida United a rating. And how do you do that fairly?
Let’s say they play the same team 10 times and they get the same score every single time. How do you want to rate them? In advance, you have to designate a procedure for dealing with that situation. Obviously you want to take the score into account. You want to get the concavity correct: additional margin of victory is worth less the bigger the gap gets. And we have that.
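The concavity Wang describes can be illustrated with a hypothetical curve. The function below is not the real USAU formula — the shape (a square root) and the 600-point ceiling are made up purely to show the property: each additional goal of margin is worth less than the one before.

```python
import math

# Hypothetical concave margin-of-victory curve (NOT the real USAU
# formula). Maps the loser/winner score ratio to a rating
# differential that rises quickly for close games, then flattens.
def game_differential(winner_score, loser_score, max_diff=600.0):
    r = loser_score / winner_score  # 1.0 would be a tie
    # sqrt gives the concavity: marginal value of extra goals shrinks
    return max_diff * math.sqrt(1.0 - r)

close = game_differential(13, 11)  # tight game
mid = game_differential(13, 9)
blowout = game_differential(13, 5)
# Per extra goal of margin, 13-11 → 13-9 is worth more
# than 13-9 → 13-5.
```

With these made-up parameters, going from 13-11 to 13-9 adds about 49 rating points per goal, while going from 13-9 to 13-5 adds only about 34 per goal — the diminishing returns the committee wants.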
And, rightly, you don’t want to punish teams for playing teams that are way below them.
With this algorithm, it’s tricky because we have more concerns than, say, a Massey rating or a Pomeroy rating. They don’t have to deal with things like incentives. All they’re trying to do is get a pure strength rating for a team. But because this algorithm for USAU is used for things that actually matter, we have to balance what it does in terms of incentivizing teams against how accurately we can measure team strength.
I don’t have a lot of great solutions for that. This is not an algorithm solution, but one thing you could do is have a larger minimum game requirement or a larger out-of-region game minimum or something like that.
I just posited last week in my mailbag to require some number of out-of-region games as a prerequisite for earning a bid from Regionals to Nationals. I know that’s not algorithm-related but do you think that would be an effective solution?
We’ve talked about it quite a bit in the ratings group. I’ll put it that way.
Any chance that we’ll see it implemented?
I have no idea. That’s not really our purview. We’ve certainly discussed it amongst ourselves. There are lots of issues in terms of what you can ask teams to do, reasonably.
I think the competition committee is absolutely aware that the more connectivity you get, the better. They know that having to drop 50% of a team’s games because they’re blowouts is not good.
Are there any things that you think could improve that algorithm itself?
We’ve talked about dealing with games that are low scoring. If you have a 5-4 game, how much can we really put weight on that? We’ve started to talk about that and we’re waiting on some action there. That’s not a hard thing to implement and it’s something we’d like to do.
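One way to picture the idea (this is a hypothetical sketch of something the committee has only discussed, not an implemented USAU rule) is to scale a game’s weight by how many total points were actually played, so a 5-4 game carries less information than a full-length one. The reference value of 23 points is invented for illustration.

```python
# Hypothetical down-weighting of low-scoring games (discussed by the
# committee, not an actual USAU rule). A 5-4 game gets a fraction of
# the weight of a full game; the 23-point reference (e.g. 13-10) is
# a made-up parameter.
FULL_GAME_POINTS = 23

def score_weight(winner_score, loser_score):
    total = winner_score + loser_score
    return min(1.0, total / FULL_GAME_POINTS)

score_weight(13, 10)  # → 1.0 (full weight)
score_weight(5, 4)    # → 9/23, roughly 0.39
```

Any scheme like this would, as Wang says, need experimentation before anyone proposes it formally.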
Other things that are a little bit more complicated: we would like to see if there are ways to minimize whatever positive effects there are [from playing teams that are ranked slightly below your team]. The whole issue that these teams get these ratings because they played those games is a delicate one. There’s a germ of an idea of how to deal with that but it would require a lot of experimentation. That’s not an easy process.
We first have to convince ourselves — the ranking committee — and then we would have to convince the competition committee that this is something that might be really technical but worthwhile.
Are there any common myths about the algorithm or the process by which the algorithm is used that stick out to you?
One of the big values that having such a thing gives to USAU is that it is completely objective. Even if adding some human input would make it more accurate, what you lose by going from 100% objectivity to anything less is so massive that it would outweigh a lot of things. So that’s one of the reasons it exists, and that it is in the form that it is.
The other misconception that we touched on a little bit — it’s not really a misconception, it’s just that when people come up with these ideas of alternative rating systems [they don’t consider this] — one of the reasons the algorithm is the way it is is that we have to take into account incentives. Whereas something like Elo doesn’t think about that.
We need to have, for instance, a cutoff for how much margin of victory matters. Beyond 15-6, it doesn’t matter. The only reason that’s there is for incentive purposes. If we didn’t have that, then things could get out of hand. We have a time decay — that’s not an incentive thing so much as a reflection of how teams treat the season. Teams treat early season tournaments differently than they treat late season tournaments. This is our attempt to reflect that.
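Both adjustments can be sketched in a few lines. This is an illustration, not USAU’s implementation: the 15-6 cap comes from the interview, but the exponential decay form and the 0.95 rate are invented for the example.

```python
# Two incentive/season adjustments the interview mentions, with
# made-up parameter values.

def capped_margin(winner_score, loser_score):
    # Beyond roughly a 15-6 ratio, treat the game as the maximum
    # blowout: running up the score further earns nothing.
    cap_ratio = 6 / 15
    return max(loser_score / winner_score, cap_ratio)

def game_weight(weeks_before_series, decay=0.95):
    # Hypothetical exponential time decay: older games count less.
    return decay ** weeks_before_series

capped_margin(15, 6)   # → 0.4
capped_margin(15, 1)   # → 0.4, same as 15-6: no reward past the cap
game_weight(0)         # → 1.0, a game in the final week
```

The cap removes the incentive to pile on, exactly as Wang describes; the decay is one plausible shape for "early tournaments count less," chosen only for illustration.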
This is what makes it trickier than it could seem at first glance.
Do you think it’s a problem when you have a team — not to pick on Florida United, but this happened to be the most recent case of this — going to a tournament, playing teams that benefit from them doing well and earning a bid for their region? You talk about incentives — there were obviously incentives for the other teams they played that weekend to lose by large margins. I’m not saying that anything like that happened, but is it a problem that that case still exists?
It could be a problem, yes. All I can say is that the competition committee is aware of that. And you guys and the Internet writ large do a good job of noticing when such things could be happening. And I think USAU is aware of possibilities for shenanigans.
Beyond that, what can be done? It’s hard to say, beyond changing how the bids are allocated based on the ratings. That’s really the only solution there. Or making the ratings completely opaque: tweak three or four parameters, don’t tell anyone, list the teams in order, don’t give them any numbers. Maybe Cody [Mills] will be so enterprising as to run the same basic algorithm and reverse engineer what the parameters must be if the teams ended up in that order.
There has been a lot of thought put into the algorithm. I’m a math professor at Rice. And there are three other people on the rankings committee — we have a statistician, we have a computer scientist with a math background, and we have somebody who’s in aerospace. Three of us have PhDs. And I think all of us have been to Nationals in one form or another.
We know the game. We know our math. And we’re very sensitive to how this works and how it affects the ultimate season. It’s not just some slapdash thing that’s been put together.
Dunham’s contention is that playing teams that are a bit worse than your team is advantageous to your ranking, relative to other scenarios ↩
He pointed out the men’s team at Texas Tech from last year as an example ↩