Analyzing the Standard Metagame of PT Kaladesh

If you are a Magic fan like I am, you probably enjoyed reading the coverage of Pro Tour Kaladesh, and were intrigued by not only who won but also which deck they brought to the tournament. Wizards’ coverage did some good work filling us in on which decks were successful in making Day 2, and in reaching records of 6-4 or better. But as a participant in the tournament, and an interested party for the future of the format, I felt compelled to dig a little deeper.

There are some gaps in my data, and there are some simplifying assumptions and categorizations used to make these analyses possible. There are also some methods you may not have heard of, but I have tried to make the findings as accessible as possible. I hope that what follows will provide value for anyone looking to play Standard competitively in the near future.

First of all, which decks showed up at Pro Tour Kaladesh?

| Name | Pilots |
| --- | --- |
| Aetherworks | 100 |
| Black-Green Delirium | 63 |
| Vehicles | 43 |
| R/B Aggro | 38 |
| Energy (not Aetherworks) | 35 |
| B/R Zombies | 21 |
| Jeskai | 18 |
| Blue-Red | 17 |
| Midrange (misc) | 17 |
| Bant Midrange | 13 |
| Grixis Emerge | 12 |
| Humans | 12 |
| Temur Emerge | 11 |
| Colossus | 10 |
| White-Blue Flash | 10 |
| Green-White | 9 |
| Emerge (misc) | 8 |
| White-Blue Control | 7 |
| Control (misc) | 7 |
| Aggro (misc) | 7 |
| White-Black Control | 6 |
| Other | 2 |

At CFB’s Team Pantheon, these numbers would have surprised us a lot going in. We felt that Red-White Vehicles was the clear front-runner, and probably the most likely deck to face at the Pro Tour. We made an effort not to spend much time on any deck until we’d run it past Vehicles. Clearly our expectation was wrong, with fewer than 10% of the field playing Vehicles decks. That figure includes 3- and 4-color Vehicles decks splashing for cards like Scrapheap Scrounger, Unlicensed Disintegration, and Ceremonious Rejection (basically you had to be playing Depala and more Vehicles than just Smuggler’s Copter to qualify).
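To put a number on "less than 10%", here is a small sketch that turns the pilot counts from the table above into shares of the field. The names `PILOTS`, `TOTAL_PILOTS`, and `field_share` are just illustrative choices, and the total is simply the sum of the listed counts, not an official attendance figure.

```python
# Pilot counts copied from the archetype table above.
PILOTS = {
    "Aetherworks": 100,
    "Black-Green Delirium": 63,
    "Vehicles": 43,
    "R/B Aggro": 38,
    "Energy (not Aetherworks)": 35,
    "B/R Zombies": 21,
    "Jeskai": 18,
    "Blue-Red": 17,
    "Midrange (misc)": 17,
    "Bant Midrange": 13,
    "Grixis Emerge": 12,
    "Humans": 12,
    "Temur Emerge": 11,
    "Colossus": 10,
    "White-Blue Flash": 10,
    "Green-White": 9,
    "Emerge (misc)": 8,
    "White-Blue Control": 7,
    "Control (misc)": 7,
    "Aggro (misc)": 7,
    "White-Black Control": 6,
    "Other": 2,
}

# Total field size implied by the table (sum of all listed pilots).
TOTAL_PILOTS = sum(PILOTS.values())


def field_share(deck):
    """Fraction of the whole field that registered the given archetype."""
    return PILOTS[deck] / TOTAL_PILOTS


if __name__ == "__main__":
    for deck in ("Aetherworks", "Black-Green Delirium", "Vehicles"):
        print(f"{deck}: {field_share(deck):.1%}")
```

The listed counts sum to 466 pilots, which puts Vehicles at roughly 9.2% of the field, against about 21.5% for Aetherworks and 13.5% for Black-Green Delirium.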

Two decks that we did see as contenders, and that saw a lot more play than R/W Vehicles, were Black-Green Delirium and Aetherworks Marvel. Black-Green Delirium included builds splashing red for Radiant Flames or Kozilek’s Return, with nearly all of them playing Grim Flayer or Sylvan Advocate along with Ishkanah, Grafwidow, Mindwrack Demon, and Traverse the Ulvenwald.

Aetherworks Marvel decks were more chromatically diverse, spanning green-blue, green-red, Temur, and 4-color configurations. We knew this was an exciting deck, but our opinions varied widely within the Pantheon as to whether it was a menace to the format or a deck that borders on non-functional. It looks like lots of pros found it plenty functional in playtesting.

Before I get into the deeper analysis, some words on methodology. Because I did not have access to each player’s deck choice and win/loss record (only those of Day 2 players who did well in Constructed), I had to use some inference to find which decks fared well. I took 6-4 as being one win above the baseline, meaning that if you went 6-4, your deck is credited with earning 1 win more than what is effectively our minimum (i.e., not good enough to make it to the stat sheet). If you went 7-3, that’s 2 wins; 8-2, 3 wins; and so on. I then divided by all players piloting the deck in the entire tournament to find how many wins above baseline (WAB) each deck got per pilot.

If you were White-Blue Flash and had lots of 9-1s and very few pilots, this produced a huge number compared to decks like Aetherworks, which had relatively few strong finishes and lots of pilots.
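The WAB bookkeeping described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are made up, draws are ignored (so it cannot produce the half-point totals that appear in the results), and the list of finishes in the example is hypothetical rather than taken from the tournament.

```python
def wins_above_baseline(wins, rounds=10):
    """Credit for one pilot: wins beyond 5, so 6-4 earns 1, 7-3 earns 2, and so on.

    Records of 5-5 or worse never reach the stat sheet and earn nothing.
    """
    baseline = rounds // 2  # 5 wins in a 10-round Constructed portion
    return max(0, wins - baseline)


def wab_per_pilot(reported_wins, total_pilots):
    """Total WAB from the reported finishes, divided by every pilot of the deck."""
    total_wab = sum(wins_above_baseline(w) for w in reported_wins)
    return total_wab / total_pilots


if __name__ == "__main__":
    # Hypothetical deck: 10 pilots, five of whom finished 6-4 or better.
    print(wab_per_pilot([9, 9, 8, 7, 6], total_pilots=10))
```

Dividing by all pilots, not just the ones on the stat sheet, is what lets a deck with few pilots and several 9-1 finishes tower over a deck with many pilots and a handful of 6-4s.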

I then used a binomial distribution (and a compensation for players not getting a chance to win more after missing Day 2) to infer a win rate from these wins above baseline.
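For readers who want to see the shape of that step, here is a minimal sketch. It assumes every pilot plays all 10 rounds, which deliberately leaves out the Day 2 compensation mentioned above, so its numbers will not match the results table. The function names are illustrative, and the bisection search is just one convenient way to invert the curve.

```python
from math import comb

ROUNDS = 10
BASELINE = 5


def expected_wab(p, rounds=ROUNDS, baseline=BASELINE):
    """Average wins above baseline for a pilot who wins each match with probability p.

    Simplifying assumption: all rounds are played, with no Day 2 cut.
    """
    return sum(
        (k - baseline) * comb(rounds, k) * p**k * (1 - p) ** (rounds - k)
        for k in range(baseline + 1, rounds + 1)
    )


def infer_win_rate(wab_per_pilot, tol=1e-9):
    """Find the win rate whose expected WAB matches the observed WAB per pilot.

    expected_wab rises steadily with p, so a plain bisection is enough.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_wab(mid) < wab_per_pilot:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(expected_wab(0.5))
    print(infer_win_rate(1.0))
```

Under this simplification a 50% deck averages 630/1024, or about 0.62 WAB per pilot, while the results table pairs 50% with roughly 0.51. That gap is a reminder of how much the compensation for missing Day 2 matters.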

To measure variance, I then used Bayesian inference (from pretty conservative priors, for the math people out there) to try to determine the expected win rate for each deck and how likely that win rate is, given the number of matches played with the deck.

I admit, it’s an eyeful.

Another way to digest this is: here is my opinion, and how I got to it (roughly).

| Name | Pilots | Wins above baseline | Wins above baseline/total pilots | Binomial inferred win rate | Bayes inferred win rate | Probability we are within 2.5% of true win rate | Probability we are within 7.5% of true win rate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| White-Blue Flash | 10 | 26 | 2.6 | 77.0% | 74.0% | 45.4% | 90.6% |
| Aggro (misc) | 7 | 10 | 1.43 | 64.0% | 57.1% | 46.5% | 85.3% |
| Jeskai | 18 | 18.5 | 1.03 | 59.0% | 55.8% | 65.6% | 98.1% |
| Control (misc) | 7 | 7.5 | 1.07 | 59.0% | 53.8% | 49.6% | 93.1% |
| Vehicles | 43 | 32 | 0.74 | 54.0% | 53.5% | 69.7% | 99.9% |
| Humans | 12 | 9 | 0.75 | 54.0% | 52.4% | 44.9% | 95.7% |
| Green-White | 9 | 6 | 0.67 | 53.0% | 51.7% | 47.6% | 95.9% |
| Temur Emerge | 11 | 7 | 0.64 | 52.0% | 51.0% | 53.2% | 97.8% |
| Energy (not Aetherworks) | 35 | 18 | 0.51 | 50.0% | 50.0% | 81.2% | 100.0% |
| B/R Zombies | 21 | 11 | 0.52 | 50.0% | 50.0% | 67.9% | 99.7% |
| Black-Green Delirium | 63 | 28 | 0.44 | 49.0% | 49.6% | 90.6% | 100.0% |
| Grixis Emerge | 12 | 5 | 0.42 | 48.0% | 49.0% | 54.7% | 98.2% |
| White-Blue Control | 7 | 2 | 0.29 | 45.0% | 48.1% | 44.3% | 94.0% |
| Midrange (misc) | 17 | 6 | 0.35 | 47.0% | 48.1% | 52.1% | 98.2% |
| Colossus | 10 | 3 | 0.3 | 45.0% | 47.4% | 46.1% | 93.3% |
| Emerge (misc) | 8 | 2 | 0.25 | 44.0% | 47.1% | 46.7% | 92.7% |
| Aetherworks | 100 | 37.5 | 0.375 | 47.0% | 46.8% | 64.5% | 100.0% |
| R/B Aggro | 38 | 13 | 0.34 | 46.0% | 46.6% | 66.2% | 99.8% |
| Bant Midrange | 13 | 3 | 0.23 | 44.0% | 46.4% | 56.6% | 97.3% |
| Blue-Red | 17 | 4 | 0.24 | 44.0% | 46.2% | 62.2% | 98.7% |
| White-Black Control | 6 | 0 | 0 | | | | |
| Other | 2 | 0 | 0 | | | | |

I could have then used a method like standard error to find how likely these win rates are to be true, given the sample size. But in my experience, these methods don’t account for what we know about Magic: a win rate for a deck cannot be above 100% and realistically can’t be above 90% (or so I claim). For what it’s worth, I gave a deck a 1/2000 chance of being 90% vs. the field in my Bayesian analysis. This is really about reporting results, not explaining methods, though, so don’t worry if you don’t know Bayes from Mayonnaise. Recall that we had access to only who made it to 6-4 or better in Standard. I used this to find how many wins above baseline (WAB) each deck got. I then made a table of the average WAB that a given win rate would yield, and used it (along with a compensation for players who did not play out all 10 rounds, which I don’t have space to get into) to reverse engineer a win rate for each deck.
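For the curious, the flavor of the Bayesian step can be sketched on a grid of candidate win rates. Everything specific here is an assumption made for illustration: the prior is a symmetric bell centered on 50% (controlled by a made-up `prior_strength` knob), not the prior actually used for the table, and the wins and matches passed in are hypothetical because per-match records were not available.

```python
from math import exp, log


def posterior_on_grid(wins, matches, prior_strength=20, step=0.005):
    """Posterior over candidate win rates after observing wins out of matches.

    Assumed prior: proportional to (p * (1 - p)) ** prior_strength, which
    favors win rates near 50% and makes extreme ones very unlikely.
    """
    grid = [i * step for i in range(1, int(round(1 / step)))]  # 0.005 .. 0.995
    log_weights = [
        (wins + prior_strength) * log(p)
        + (matches - wins + prior_strength) * log(1 - p)
        for p in grid
    ]
    top = max(log_weights)  # subtract the max before exponentiating, for stability
    weights = [exp(w - top) for w in log_weights]
    total = sum(weights)
    return grid, [w / total for w in weights]


def summarize(wins, matches, half_width=0.025):
    """Posterior mean win rate and the posterior mass within half_width of it."""
    grid, post = posterior_on_grid(wins, matches)
    mean = sum(p * q for p, q in zip(grid, post))
    within = sum(q for p, q in zip(grid, post) if abs(p - mean) <= half_width)
    return mean, within


if __name__ == "__main__":
    # Hypothetical record: 70 wins in 100 matches.
    mean, within = summarize(70, 100)
    print(f"mean {mean:.3f}, mass within 2.5 points {within:.3f}")
```

Two behaviors carry over regardless of the exact prior: a hot record from a small sample gets pulled back toward 50%, and the "within 2.5%" probability climbs as the number of matches grows.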

My strongest reservations about this analysis have to do with data and variety. I was pretty thorough about double-checking which decks got classified which way, but there could easily have been inconsistencies that led to a few deck lists being miscategorized. As for variety, consider the case of Aetherworks Marvel. 100 pilots brought this deck! About 80% of them were Temur, it’s true, but you are still looking at a potentially wide gulf between sideboard plans, mana bases, play styles, etc. One thing I can tell you is that 3 of the best players in the world (William Jensen, Owen Turtenwald, and Jon Finkel) played Temur Aetherworks, and they didn’t fare any better than the average Aetherworker.

What did we learn? While a small sample slightly tempers my enthusiasm over White-Blue Flash, it still turned in an amazing performance, suggesting a 74% win rate when I use Bayes to account for sample size. As a point of reference, R/G Ramp was the best performing deck at the last PT, also suffering from a small sample; in this analysis, it ends up with a win rate of just under 60%. In other words, you’d probably have to go back to Esper Dragons to see a Pro Tour performance from a deck that is anywhere near that of White-Blue Flash. I’d be interested to see if anyone in the comments section has data for another dominant performance they’d be interested in comparing.

While it might look controlling at first blush, this deck differs dramatically from White-Blue Control with its inclusion of not only beatdown staples like Thraben Inspector, Smuggler’s Copter, and Selfless Spirit, but also old favorites like Reflector Mage and main-deck Spell Queller, and the often annoying Rattlechains. Often packing Gideon and Archangel Avacyn, this deck feels like Faeries with an enormous amount of extra punch.

Next, I note the success of Humans (52.4% Bayes inferred win rate, with Red-White, Bant, Green-White, and Mono-White varieties included), miscellaneous aggro (57% Bayes inferred win rate, with the star inclusion of Red-White Tokens), Green-White (52%), and Vehicles (54%), all decks poised to take a conventional aggressive (and in this case, successful) stance. I’d guess that some of the strongest elements of these decks were the ones they had in common with White-Blue Flash: Smuggler’s Copter and the perfect little 1-drop to crew it, the innocuous Thraben Inspector.

Conversely, the two most widely played decks did not put up great showings in either the Top 8 or the tournament as a whole, with Black-Green Delirium putting up a respectable 49.6% Bayes inferred win rate, and Aetherworks coming in at a not-quite-as-bad-as-I’d-thought 46.8%. Especially with Black-Green, I think there is plenty of room in the numbers for one version to have done well and another much worse. Also, if its bad matchup (Aetherworks) falls out of fashion, it’d be hard to see why Black-Green wouldn’t rebound at upcoming Standard tournaments.

Another popular and interesting new deck was Energy (not Aetherworks). Often sporting Electrostatic Pummeler (but not always!), it used energy and large bodies (often generated by pump spells) to deal fast and copious amounts of damage. Some versions opted for Copters or added blue for counterspells, and many went large with cards like Chandra, Torch of Defiance, Woodland Wanderer, and Verdurous Gearhulk.

And what about control decks? The Top 8 was full of them, but while Jeskai (56%) and Misc. Control (54%) did very well (including Shota’s Grixis), Blue-Red, White-Blue Control, and White-Black Control came in at or near the bottom of the heap.

One last category worth mentioning before further analysis is emerge. Some thought this brand of deck was just a slower/less powerful version of Aetherworks. But trusty old Temurge still put up decent numbers (51%), while Grixis and its miscellaneous brothers were further down the list (see the grouping below).

Now for some groupings:

| Grouping | Pilots | Wins above baseline | Wins above baseline/total pilots | Binomial inferred win rate | Bayes inferred win rate | Probability we are within 2.5% of true win rate | Probability we are within 7.5% of true win rate | Decks in category |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dedicated Copter Decks | 140 | 107 | 0.76 | 55% | 55.0% | 99.7% | 100.0% | White-Blue Flash, Aggro (misc), Vehicles, Humans, Green-White, R/B Aggro, B/R Zombies |
| Thraben Inspector decks | 81 | 83 | 1.02 | 59% | 58.5% | 96.4% | 100.0% | White-Blue Flash, Aggro (misc), Vehicles, Humans, Green-White |
| Emerge decks | 31 | 14 | 0.45 | 49% | 49.3% | 74.8% | 99.9% | Temur Emerge, Grixis Emerge, Emerge (misc) |
| Control decks | 55 | 32 | 0.58 | 51% | 51.4% | 71.9% | 100.0% | Jeskai, Control (misc), White-Blue Control, Blue-Red, White-Black Control |
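The first three columns of a grouping are just sums over its member decks, which can be checked with a short sketch like the one below. The `RESULTS` dictionary and `pool` helper are illustrative names, the figures are copied from the per-deck results table, and only the control grouping is shown. The win-rate columns are not simple sums; they would need the inference rerun on the pooled numbers.

```python
# (pilots, wins above baseline) per deck, copied from the per-deck results table.
RESULTS = {
    "Jeskai": (18, 18.5),
    "Control (misc)": (7, 7.5),
    "White-Blue Control": (7, 2),
    "Blue-Red": (17, 4),
    "White-Black Control": (6, 0),
}


def pool(decks, table=RESULTS):
    """Return (pilots, WAB, WAB per pilot) for a group of archetypes."""
    pilots = sum(table[deck][0] for deck in decks)
    wab = sum(table[deck][1] for deck in decks)
    return pilots, wab, wab / pilots


if __name__ == "__main__":
    control = list(RESULTS)
    pilots, wab, per_pilot = pool(control)
    print(pilots, wab, round(per_pilot, 2))
```

For the five control archetypes this gives 55 pilots, 32 WAB, and about 0.58 WAB per pilot, matching the Control decks row.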

Some energy decks ran Smuggler’s Copter, and the majority of Black-Green Delirium decks ran the 2-mana 3/3 flying looting Vehicle, but I decided to leave both out of the “Dedicated Copter Deck” grouping because they were examples of decks that played Copter because it was too good to leave out, rather than decks built to maximize its power.

As you can see, the Copter decks had a great showing with a 55% Bayes inferred win rate, while decks with Thraben Inspector (just about always containing Copter as well) did even better at 58.5%.

You can also see on the right that these samples are pretty robust. For dedicated Copter decks, I found a 99.7% chance that our Bayes inferred win rate was within 2.5% of the truth (meaning the true win rate is between 52.5% and 57.5%) and nearly 100% that it’s within 7.5% of the truth (between 47.5% and 62.5%).

As for control decks, they come out looking pretty good at 51.4%, and with the enormous disparity between the various archetypes, I think it’s fair to say that bringing the right control deck could be an exciting prospect. Then again, if Aetherworks numbers fall as drastically as they might, or people start playing more planeswalkers, it’s possible that Black-Green Delirium (not included in this category) will be about as controlling a deck as we see widely represented.

That’s all for now. Thanks for your attention, and I welcome any questions in the comments section.
