The voting season for the Hall of Fame wrapped up a couple weeks ago, and as always, it was a good time to have a historical interest in Magic, or to be someone who likes intense Twitter drama. I happily inhabit both categories, and I enjoyed listening to all the discourse, and thinking about what I value in a professional player and how to quantify those values.
It’s not an easy process, which is why there is considerable room for disagreement among totally reasonable people. Even if two voters agree 100% on what traits should determine the strength of someone’s case for induction (which is already a stretch), they might disagree on how big the Hall should be, and just how strong a case must be made before a player has earned a vote. There’s even been discussion this year of only voting once every two or three years, with some suggesting that the Hall is beginning to be watered down.
The best way to evaluate that proposition is to compare recent classes to their predecessors. That’s easier said than done, however, due to the changing structure and popularity of professional Magic. Many attempts on this front have focused on the increasing size of Pro Tours. PT Los Angeles 1996 was the first professional event that resembled those of the modern day – only invited players could participate, there was a record-based cut to Day 2, and a Top 8 playoff to determine the winner – and it had 179 players, meaning that those final 8 represented slightly less than 5% of the total field. In comparison, PT Dragons of Tarkir had 408 players, so the Top 8 was less than 2% of the total field, or (by this crude measure) more than twice as difficult to reach.
Randy Buehler brought this idea up a couple times this voting season, with comments like this:
But as Randy also pointed out, that isn’t the only aspect of the Pro Tour that has changed, or even the most substantial one.
Several major changes to Magic have made the average skill of a Pro Tour player rocket upward. Most broadly, Magic is much, much more popular than it used to be, and with a larger pool of potential players to pull from, the 400 best today are much better than the 400 best in 2000. Additionally, Magic Online has allowed every aspiring pro to practice as much as they want, even in remote locations or without a talented playgroup, and the multi-stage qualification process of PPTQs and RPTQs has made it much harder for a mediocre player to luck into a PT berth. Making exact comparisons is difficult, but it is widely agreed that the worst player at PTDTK is better than the worst player at PTLA.
All of this is easy to accept, but the key question is: by how much? The magnitude of the change between the early era and today is crucial to Hall of Fame discussions and to any understanding of how professional Magic has changed. What follows is an attempt to estimate that magnitude.
My methodology is going to look somewhat similar to my previous article, wherein I tried to estimate the change an individual player goes through as he or she ages. There, I grouped all the players who had results at both age 20 and age 21 together, and took the average difference of those results, then repeated the process for every pair of ages. In this analysis, I’ll do the same, but use years instead of ages. Did players who played in both 2000 and 2001 get, on average, worse or better? What about 2001 and 2002? By combining each year, the result is an estimate of the change in difficulty over time.
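The paired-year approach described above can be sketched in a few lines. This is a minimal illustration with a hypothetical data structure (a dict mapping each player to their per-season results); it is not the actual code or data behind the analysis.

```python
# Sketch of the paired-year method: for each adjacent pair of years, average
# the change in performance across all players with results in both years.
from statistics import mean

def yearly_deltas(results, years):
    """results: {player: {year: match_points_per_PT}} (hypothetical format).
    Returns the average year-over-year change for each adjacent year pair."""
    deltas = {}
    for y in years[:-1]:
        changes = [seasons[y + 1] - seasons[y]
                   for seasons in results.values()
                   if y in seasons and y + 1 in seasons]
        if changes:
            deltas[(y, y + 1)] = mean(changes)
    return deltas

# Toy data: two players across three seasons.
results = {
    "A": {2000: 22.0, 2001: 20.0, 2002: 19.0},
    "B": {2000: 18.0, 2001: 17.0},
}
print(yearly_deltas(results, [2000, 2001, 2002]))
# {(2000, 2001): -1.5, (2001, 2002): -1.0}
```

Note that player B drops out of the 2001-to-2002 pair automatically, since only players with results in both years contribute to that pair's average.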
In that article, I used Pro Points by year. While the thresholds and incentives behind Pro Points have changed over time, that analysis mixed results from across eras together, so those shifts didn't systematically bias the estimates. Unfortunately, that's not the case here, where eras are compared directly, so I needed a different measure.
What I did was pull the final standings from every PT for which they were available, and for each player in a given season, calculate their match points per PT played. The downside is that detailed final standings aren't consistently available until the 2001 season, so the coverage of this database is limited. That said, there are a lot of downsides to using Pro Points as a measure of skill, many of which I discussed in my previous article (players might just attend more or fewer tournaments rather than truly improve or decline, for one), and this measure dodges almost all of those. I didn't give any additional credit for Top 8s, but I did count each win in the Top 8 as an additional three points for that player, and I ignored team PTs.
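The scoring just described reduces to a simple aggregation. Here's a minimal sketch assuming a hypothetical flat record format (player, season, Swiss match points, Top 8 wins); the names and data are illustrative only.

```python
# Per the scoring above: each Top 8 win adds 3 match points, and the season
# metric is total match points divided by PTs played.
from collections import defaultdict

def points_per_pt(records):
    """records: iterable of (player, season, swiss_points, top8_wins)."""
    totals = defaultdict(lambda: [0, 0])  # (player, season) -> [points, events]
    for player, season, swiss_points, top8_wins in records:
        entry = totals[(player, season)]
        entry[0] += swiss_points + 3 * top8_wins
        entry[1] += 1
    return {key: pts / n for key, (pts, n) in totals.items()}

# Toy data: one player, two PTs in a season; winning the event means
# 3 Top 8 wins, worth an extra 9 points.
records = [
    ("Kai", 2001, 36, 3),
    ("Kai", 2001, 30, 0),
]
print(points_per_pt(records))  # {('Kai', 2001): 37.5}
```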
Before going any further, what performances are the best by this measure? These are the ten best player-seasons, by match points per PT:
Unsurprisingly, there are 7 seasons with only 1 PT, and only one with 3—it’s easier to be great for 16 rounds than it is for 32 or 48. Here are the top 10 seasons for players with at least 2 PTs:
One problem with using this measure alone is that it punishes players in some of the early PTs, which sometimes had as few as 12 rounds. To counter that, I scaled each PT result to 19 rounds (16 main rounds, plus 3 for the Top 8). As an example, a player who won 12 of the 15 Swiss rounds at PT Amsterdam in 2004 (an 18-round event counting the Top 8) would get credit for 12 × 19/18 ≈ 12.67 wins instead. How does the top 10 leaderboard look now?
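The scaling step above is just a linear rescaling of win totals. A minimal sketch, with the function name my own:

```python
def scale_to_19_rounds(wins, rounds):
    """Scale a win total from an event with `rounds` total rounds
    (Swiss plus up to 3 Top 8 rounds) to a common 19-round baseline."""
    return wins * 19 / rounds

# An 18-round event (15 Swiss rounds plus the Top 8): 12 wins
# scales to 12 * 19/18, roughly 12.67.
print(round(scale_to_19_rounds(12, 18), 2))
```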
One early season cracks the top 5, Kai Budde's stellar 2001, and there's some minor movement beyond that. But these leaderboards are primarily for fun, and it's less important that this works as an individual metric than as a collective one. How have Magic players changed from year to year? To construct the following curve, I weighted each player's change in performance from year n to year n+1 by the harmonic mean of the number of PTs they played in year n and in year n+1, then took the weighted average across all players. 2001 was set to 0, and each subsequent year's value was built by chaining these year-to-year changes.
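The curve construction just described can be sketched as follows. The data structure is hypothetical (one dict per player mapping year to a pair of points-per-PT and PT count), and this is an illustration of the weighting scheme, not the analysis code itself.

```python
# Weight each player's year-over-year change by the harmonic mean of their
# PT counts in the two years, average those changes, and chain the averages
# into a cumulative curve anchored at 0 in the first year.
def harmonic_mean(a, b):
    return 2 * a * b / (a + b)

def difficulty_curve(players, years):
    """players: list of dicts, year -> (points_per_pt, pts_played)."""
    curve = {years[0]: 0.0}
    for y in years[:-1]:
        num = den = 0.0
        for seasons in players:
            if y in seasons and y + 1 in seasons:
                (p0, n0), (p1, n1) = seasons[y], seasons[y + 1]
                w = harmonic_mean(n0, n1)
                num += w * (p1 - p0)
                den += w
        delta = num / den if den else 0.0
        curve[y + 1] = curve[y] + delta
    return curve

# Toy data: a two-PT regular and a player who went from 1 PT to 3.
players = [
    {2001: (20.0, 2), 2002: (18.0, 2)},
    {2001: (22.0, 1), 2002: (21.0, 3)},
]
print(difficulty_curve(players, [2001, 2002]))
```

The harmonic mean keeps a player who attended one PT in either year from being weighted as heavily as a full-time regular, since a single-event sample is noisy.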
This curve has some expected year-to-year fluctuations, but it also has a fairly clear and consistent decline from 2001 through about 2010, after which it seems to level off. The conclusion from this chart would be that, while the difficulty of the PT hasn’t changed much in the last several years, the change from 2001 to today is between 12 and 15 match points, or 4 to 5 wins.
While it’s not unbelievable, that does seem large to me, and there’s one additional adjustment that I think should be made. Survivor bias might be responsible for some of the steepness of this curve, and I want to see what happens if I adjust for that. If you’re familiar with the concept, you can skip this next section, but if you’re not, consider this hypothetical. There are two identical Magic players, Jane and John Doe. They both attend one PT in 2010 and perform exactly at their true-talent level, getting 4 wins and 12 match points each. The next year, however, they again attend one PT, where Jane runs into several good matchups and ends up with 8 wins and 24 match points, while John runs into his bad matchups and doesn’t win a single match. Jane feels like she’s improving as a player and attends the next PPTQ she can, eventually qualifying for a PT in 2012. John feels discouraged, like his hard work was for nothing, and stops playing competitively.
What this simplified scenario is meant to illustrate is that players who return in year n+1 probably did better in year n than the players who don’t return, and some of that difference in performance is due to luck rather than skill. Even if you don’t believe that discouragement plays a role, the fact that good performance at a PT can qualify you for the next PT means that statement is almost certainly true to some extent. The resulting bias toward good and/or lucky performers is known as survivor bias, and it could affect the above curve in a real way. Consider Jane’s performance in 2012. If she remains a 4 win/12 point true-talent performer, we’d expect her to decline from 2011 to 2012, by 12 match points, and that decline would be reflected in the data. If John had attended the PT, we’d expect him to improve by the same amount, but because he didn’t, it doesn’t show. The data now suggests that 2012 was more difficult than 2011, despite no real change.
Survivor bias is tricky to correct for, and there’s no single right way. What I’m going to do is give each player who played in two consecutive years but not the third an additional “phantom” season in that third year. Their performance in that phantom year is set to the average of the previous two years, with the number of PTs they played in the prior year. With that adjustment, the curve looks like this (in blue, with the original curve in red):
There’s a much smaller decline, with substantial fluctuations in the last several years making it somewhat difficult to pull out a trend. Here, rather than 4 to 5 match wins per PT, the difference between 2001 and 2014 is closer to 1 to 2 match wins. Notably, a lot of the decline comes between 2001 and 2002, where I can’t adjust for survivor bias because of the lack of 2000 data. If you start the curve in 2002 instead, the decline through 2014 is even smaller, between 2 and 3 match points.
So which one is right? I’m not sure. My guess would be somewhere in the middle: adjusting for survivor bias is almost certainly correct, but the method is totally up for debate. Even using the same technique I did, it might be right to set the phantom performance to 50% of the average of prior performance, or 30%, or 90%, so don’t take this as the be-all, end-all. What this does do, however, is give a sense of the range being discussed when the changing difficulty of the Pro Tour is the topic.
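The phantom-year scheme described earlier is easy to parameterize along exactly these lines. Here's a minimal sketch (with the same hypothetical per-player data structure as before, and a `regression` knob of my own invention) showing how the phantom performance could be set to 100%, 90%, or any other fraction of the prior average:

```python
def add_phantom_years(players, years, regression=1.0):
    """players: list of dicts, year -> (points_per_pt, pts_played).
    A player observed in years y-2 and y-1 but absent in year y gets a
    phantom year-y season at `regression` times the average of the two
    observed performances, with the prior year's PT count."""
    adjusted = []
    for observed in players:
        seasons = dict(observed)  # copy; conditions check only real seasons,
        for y in years[2:]:       # so phantom years never chain
            if y - 2 in observed and y - 1 in observed and y not in observed:
                (p0, _), (p1, n1) = observed[y - 2], observed[y - 1]
                seasons[y] = (regression * (p0 + p1) / 2, n1)
        adjusted.append(seasons)
    return adjusted

# John Doe from the example: 12 points in 2010, 0 in 2011, then gone.
# His phantom 2012 season averages to 6 points over 1 PT.
players = [{2010: (12.0, 1), 2011: (0.0, 1)}]
print(add_phantom_years(players, [2010, 2011, 2012])[0][2012])  # (6.0, 1)
```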
Finally, though I said this measure is best used collectively rather than individually, and that the year-to-year fluctuations are probably not reflective of real shifts, it would be boring not to use it to adjust the leaderboard one more time. Here are the top 10 performances with at least 2 PTs since 2001, using the adjusted difficulty curve:
Congratulations Tom! Your 2013 season was definitively the best since 2001.
I hope this was enjoyable and interesting, and as always, I’d love to hear feedback or comments. The best way to reach me is on Twitter, @henrydruschel. Thanks!