Wednesday, May 09, 2012

"Money puck": when bad sports research attacks

My University of Toronto email account gets these periodic bulletins with all sorts of random info about what's happening at the school and faculty or students who have been in the news. This story popped up in the last bulletin, and while the headline filled me with skepticism - the shamelessness of "Money puck" is almost too much to bear - I was genuinely interested in the research that this engineer, Tim Chan, and his student, David Novati, were doing. [Full-disclosure: I actually work for the University of Toronto. In Engineering, no less.]

Quick - which one is the student and which the professor? Photo by Raj Grainger.

Having read the article and their second paper, I realize now that I probably should've skipped the whole thing. But I didn't, so you're going to hear about it.

The fluff piece immediately throws up a bunch of red flags. The authors explain that their study confirms that Sidney Crosby is a valuable player (really? someone who's regularly compared to Gretzky is good? what a surprise!) and that guys who get in fights and rack up penalty minutes are not valuable (it's bad to give the other team a power play? another shock!) So, unless this study is meant to be read by people who only listen to Don Cherry, it's immediately difficult to see what they're adding to the conversation.

An aside: One of the complaints that Tom Tango makes about academics who wade into statistical sports analysis - apropos of nothing, I think that this is too often the case with my other research interest, comic books - is that they don't bother to check the non-scholarly research, first. And this is a huuuuge problem. Because, y'see, the work that these academics are trying to do - "quantify[ing] [each player's] individual contributions to his team’s performance" - has been done and is being done, and very well. (And we know it's being done well because, as with many of the baseball analysts, their work wins them jobs with pro teams.) Here, for instance. And here. And this one, which is one of my favorites because it's surprisingly accessible. Also, here. Here's another one. This one, too. And here's a slicker option that's also pretty readable and very comprehensive. And one more. (To Chan and Novati's credit, though, they actually referenced the very last site on this list.) But academics ignore them, because that's the nature of the game - you reference other academics because they have academic prestige, thereby increasing your own academic prestige; and then you hope that other academics will reference you, increasing your prestige again.

Ryan Miller and Jim Corsi. Corsi's name should appear somewhere in this paper. It doesn't. And that's a bad thing. Photo source unknown.

So, when their abstract begins with reference to "[r]ecent literature in hockey analytics" and it references only one non-academic advanced statistics site? (Which is, hilariously, equal to the number of references they make to both Bill James - who has nothing to do with hockey - and their own work.) That's a bad sign that, no, they aren't familiar with what's happening in the recent literature. (Or that their definition of "recent literature" is much too narrow.) And when the first line of your abstract is just plain wrong...

Some other (not-so) quick and dirty comments:
  • From the abstract: "Top ranked players in terms of point shares tend to be winners of major NHL awards, are leaders in scoring, and have the highest salaries. ... Overall, a better understanding of individual NHL player characteristics may provide a foundation for deeper, data-driven player analysis." Okay, so I don't get it. Why do we need "deeper" analysis if, in the first sentence, you're saying that your top-ranked players are also the ones who win awards and get paid well? Doesn't that imply that the current evaluation system works just fine? That you can add nothing to it? There's no hook, here.
  • They reference their earlier paper, which uses k-means clustering to establish four player types. Now, admittedly, I know virtually nothing about algebra. But I can tell when someone's approach is begging the question. What benefit does clustering actually have, here, to a game that is determined solely by goal-count? Where does the need to establish "types" come from and what point does it actually serve? And why limit yourself to, for example, four clusters of forwards and define them in the way that they have? I suspect that it's because most teams use four lines of forwards, each characterized by those particular functions (two scoring lines, a defensive line, and a "physical" one), but this is exactly the kind of "traditional" thinking that needs to be challenged (or confirmed), not taken as a given.
  • This is more of a personal preference, but I've never been sold on Win Shares or Point Shares. (Nor has most of the advanced stats community, since most sports seem to prefer some version of Wins Above Replacement.) Especially in a sport like hockey, where 10% of your team's points may be derived from shoot-out wins, team wins and points simply aren't the best way to evaluate individual talent-level. (Also, given the point I just made about shoot-outs, you'd think that they'd incorporate shoot-out stats into their analysis. But they don't.)
  • They use only the most obvious, traditional stats. Now, goals make sense (though shooting percentage is better, and they should also account for the difference between even-strength and power-play goals, because the latter are so much easier to attain), but it's horribly problematic to use assists without differentiating between the quality of different assists - earning the second assist often means you had no direct involvement in the goal, or at least you were just as involved as the guy who fed you the puck but didn't earn a point himself. They use penalty minutes but, at least, admit that this is to help determine the player's function, not because penalties are good. (This is where the analysis is unclear to me - is the "physical" component actually a negative component?) And they use plus/minus, which is where I start to feel The Outrage boil up. Plus/minus is a terrible way to assess a player's value - just click on that link, which explains why the best defensive players routinely have terrible plus/minus ratings. And then The Outrage explodes when I see that they use unadjusted GAA and... Goalie Wins. Um, no.
  • Relatedly, they completely ignore all of the tremendously useful advanced stats that have appeared over the past decade: they never use the terms GVT, Corsi/Fenwick (i.e. shot differential), zone starts (where was the puck when the player got on to the ice? when he got off?), quality of competition, or some version of WOWY/quality of teammates (how did his teammates do when he was on the ice vs. off the ice?). If you're totally unfamiliar with these, you can read about what most of them mean here. The point, though, is that these metrics already exist and they've been proven to do a better job of assessing player ability than the stats that Chan and Novati rely on. I don't know if the reason is hubris or laziness... but, wow, they dropped the ball on this one. (Or dropped the puck. Made a bad cross-ice pass. Let one go through the five-hole. Insert your own metaphor here.)
  • Their metric normalizes for playing-time, but you can't do that. (Well, obviously, you can do that. But you shouldn't.) You can't give everyone equal playing-time because some players are simply injury-prone - their inability to stay on the ice has an undeniable effect on their value, except that normalization does deny it. Endurance and fitness are also factors - some players can simply play at a high level for a longer time than others. But, perhaps most importantly, you have to account for the context of those minutes. Because if you normalize the total on-ice minutes without also normalizing their power-play and penalty-killing time - to say nothing of the various even-strength roles that players are given - then "normalizing" actually becomes something of a misnomer, because you'll exaggerate those contextual differences.
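
To make that last point about context a bit more concrete, here's a minimal Python sketch. Everything in it is my own assumption for illustration: the two players, their minutes and their point totals are invented, and the function names (`per_60`, `overall_rate`, `split_rates`) are hypothetical. This is not Chan and Novati's metric, just a toy version of normalizing by total ice time versus normalizing within each game situation.

```python
# Toy illustration only: every number below is invented, not real NHL data.

def per_60(points, minutes):
    """Scoring rate expressed as points per 60 minutes of ice time."""
    return 60.0 * points / minutes

# Hypothetical players: situation -> (minutes played, points scored).
players = {
    "Player A": {"even_strength": (1000, 30), "power_play": (300, 30)},
    "Player B": {"even_strength": (1200, 36), "power_play": (50, 5)},
}

def overall_rate(splits):
    """Normalize by total ice time, ignoring how those minutes were used."""
    total_minutes = sum(minutes for minutes, _ in splits.values())
    total_points = sum(points for _, points in splits.values())
    return per_60(total_points, total_minutes)

def split_rates(splits):
    """Normalize separately within each game situation."""
    return {
        situation: per_60(points, minutes)
        for situation, (minutes, points) in splits.items()
    }

if __name__ == "__main__":
    for name, splits in players.items():
        by_situation = {s: round(r, 2) for s, r in split_rates(splits).items()}
        print(name, "overall:", round(overall_rate(splits), 2), by_situation)
```

In this made-up example both players score at exactly the same rate at even strength and on the power play, but the overall per-60 number comes out at roughly 2.77 for Player A and 1.97 for Player B, purely because A was handed far more power-play time. That gap is coaching deployment, not ability.
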

Now, all that said, I can see some use for this research. In the fluff piece, it says that they hope to create an online tool that will help you draft your fantasy hockey team. And, yeah, I suppose that could work.
