We’re hardly all business here.
If we’re honest about it, Magic metagame analysis is equal parts “figuring out what we need to do to win” and “sports radio.” After all, although it’s useful to have guidelines for what deck to bring to your next tournament and how to tweak that deck for best results, it’s also just fun to hash out ideas like, “What’s the best deck?” and “Why did that deck even win?”
So, with that in mind, today I’m going to delve a little bit into both areas and ask, “Why did those decks do well?” and “How do we win next week?” In doing so, we get to move into two of my favorite topics – how reality (mis)matches our impressions, and how we can use our knowledge of what has gone before to figure out what to do next.
Or, to put it another way, do I really have to play CawBlade?
A big slice of Standard
This time around, I’m harvesting a whole bunch of data from post-M12 Standard events to inform our understanding of the current Standard metagame.
Specifically, I’ve collected the top eights from some forty-nine paper Magic events at roughly the PTQ level and above. This includes PTQs, SCG Standard Opens, other larger-scale events, and a flock of Nationals from around the world.
Going in, there are some clear built-in biases to the data we have to work with. About two thirds of the events come from the United States, and the majority of non-American events come from Western Europe. That said, the U.S. is big enough that we’d expect that if there are significant regional differences, we might see them occur even within the U.S. and not just between the U.S. and other countries.
The questions we’d like to ask the metagame
There are a few things we’d like to extract from a metagame analysis, to serve both our “want to win” and “sports talk” needs.
First, there are the basics. What’s winning? What’s not?
Which, of course, doesn’t give us any indication as to why any of that is happening. Most of the “why” is actually best addressed by in-depth consideration of matchups, card choices, and plays, but we can try to address it by asking some other questions as well.
These more elaborate questions focus on the fact that we’re not all just playing on Magic Online. Are there deck choice biases by region? By time? By type of event? Do different decks succeed at different levels?
With these questions in mind, let’s take a crack at actually using the information from those forty-nine events.
The basics – what’s winning?
Okay, so the basic question is “What’s winning?” That should be easy enough to address – after all, it’s right there in each top eight.
What’s winning, first take
Across the events surveyed, here are the winning archetypes and their tallies:
CawBlade – 22
Valakut – 4
Tempered Steel – 3
Goblins – 2
other aggro – 2
other control – 2
Red Deck Wins (RDW) – 2
Splinter Twin – 2
TwinPod – 2
U/B Control – 2
U/W Control – 2
Value Pod – 2
other combo – 1
Vampires – 1
Before we go on, here are a few quick notes on this tally.
First, any archetype that didn’t appear in at least 10% of the evaluated top eights was wrapped into another category. If it was a subset of a bigger archetype, it was placed in that bigger archetype’s tally (for example, the two Sukenik-style Gravitational Shift CawBlade decks were tossed in with the CawBlade category). If it didn’t fit one of those bigger archetypes, it was batched in with the appropriate “other X” category.
Second, although most of these archetype names will be evident for anyone who’s following Standard, “Value Pod” is my own shorthand for decks that use Birthing Pod but don’t leverage it into some kind of combo (that is, Podding “for value”).
Our first stumbling block
These numbers are nice and all, but they have a problem.
In a word, Nationals.
Consider the potentially depressing top eight from this year’s U.S. Nationals. Six CawBlade decks?
Well, first impressions aside, we’d do well to remember that Nationals tournaments are mixed events. Of the fourteen rounds that led to the top eight at U.S. Nationals, only eight were Standard. It’s possible, then, to make it to the top eight with a deck that has basically no chance of taking you through a full tournament on its own merits.
Thus, although the top eight of U.S. Nationals featured six copies of CawBlade, the list of Standard decks that took no more than one loss across eight rounds of play had the same CawBlade count, but also featured eight other decks:
CawBlade – 6
Value Pod – 2
RUG Splinter Twin – 1
Tempered Steel – 1
TwinPod – 1
U/B Control – 1
Valakut – 1
Vampires – 1
Obviously, CawBlade still has a big footprint here, but 43% of the top decks is very different from 75% of the top decks.
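As a quick check on those two percentages, here is a minimal Python sketch that tallies the U.S. Nationals numbers. The variable names are my own; the counts are copied from the list above.

```python
# Decks with at most one loss across the eight Standard rounds at
# U.S. Nationals (counts copied from the list above).
standard_top_decks = {
    "CawBlade": 6,
    "Value Pod": 2,
    "RUG Splinter Twin": 1,
    "Tempered Steel": 1,
    "TwinPod": 1,
    "U/B Control": 1,
    "Valakut": 1,
    "Vampires": 1,
}

top_eight_caw = 6    # CawBlade copies in the actual top eight
top_eight_size = 8

caw_share_top_eight = top_eight_caw / top_eight_size
caw_share_standard = standard_top_decks["CawBlade"] / sum(standard_top_decks.values())

print(f"Top eight: {caw_share_top_eight:.0%}")
print(f"Best Standard records: {caw_share_standard:.0%}")
```

Same six CawBlade decks, but the denominator grows from eight to fourteen once we only count the Standard rounds.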
Unfortunately, we simply don’t have the data on what decks actually did well in all the Standard rounds at every Nationals. This suggests a very simple solution to the issue of skewed top eights from mixed events.
Leave them out.
So moving forward, we’re going to ignore all those Nationals top eights, bringing us down to thirty-six events to pull data from.
Our clarified results
So, if we remove all those Nationals top eights (goodbye, funky Valakut/TwinPod hybrid deck that won Malaysian Nats!), then our winners look like this:
CawBlade – 15
Valakut – 4
Tempered Steel – 3
Goblins – 2
other aggro – 2
other control – 2
RDW – 2
Splinter Twin – 2
Value Pod – 2
U/B Control – 1
Vampires – 1
So, CawBlade is still at the top of the heap – not that we expected it to completely disappear with the removal of Nationals.
But this is also not the end of the story. After all, if only five people actually played in these thirty-six top eights with Goblins and then two of them won, that would be pretty impressive, right? Certainly, it would be more impressive than if twenty people were in those same top eights with Goblins.
As it happens, five people did play Goblins and come away with two wins.
So let’s clarify our question one more time.
Getting there versus getting there
Knowing a deck’s frequency in the top eight is nice and definitely informative in terms of our desire to win – since we’ll need to beat the others in the top eight if we want to win. But we’d also like to know how often an archetype takes the top prize versus how often it appears.
That breaks down like so:
CawBlade – 21.4%
Valakut – 12.5%
Tempered Steel – 8.3%
Goblins – 40%
other aggro – 22.2%
other control – 50%
RDW – 8%
Splinter Twin – 10.5%
Value Pod – 20%
U/B Control – 5.9%
Vampires – 7.1%
As a reminder, this is the percentage of top eight appearances by that archetype that resulted in a win. I strongly suspect that there is some skew here for small sample sizes for Goblins and “other control,” but the other win rates are pretty informative.
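To put a rough number on that small-sample worry, here is a sketch in Python. It back-calculates each archetype's top eight appearances from its wins and win rate (so the appearance counts are estimates, not recorded data; only the Goblins count of five is stated outright), then asks how far the rate would move if a single win had been a loss. The function names are mine.

```python
# Wins and win-per-appearance rates copied from the lists above.
wins = {"CawBlade": 15, "Goblins": 2, "other control": 2, "Tempered Steel": 3}
rates = {"CawBlade": 0.214, "Goblins": 0.40, "other control": 0.50, "Tempered Steel": 0.083}

def estimated_appearances(archetype):
    """Back-calculate top eight appearances as wins / rate (an estimate)."""
    return round(wins[archetype] / rates[archetype])

def swing_from_one_result(archetype):
    """Percentage-point change in win rate if one win had been a loss."""
    return 100 / estimated_appearances(archetype)

for name in wins:
    n = estimated_appearances(name)
    swing = swing_from_one_result(name)
    print(f"{name}: ~{n} appearances, one result moves the rate by {swing:.1f} points")
```

One flipped final moves Goblins or “other control” by twenty points or more, while it barely nudges CawBlade or Tempered Steel. That is the sense in which the two high rates deserve suspicion.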
Our first pass conclusions
So, after all that, do we want to draw any conclusions?
I was certainly pretty shocked to see the big, chunky-style discrepancy between appearances and wins for U/B Control, Tempered Steel, and RDW. Between these three archetypes, there are seventy-eight top eight appearances and a less than stellar six wins.
Of course, it’s easy to overlook that U/W Control decks had sixteen top eight appearances and no wins.
In fact, here are the top ten decks in terms of making it into top eights, with their appearance percentage (and the percent of total events won in parentheses).
CawBlade – 24.5% (41.7%)
Tempered Steel – 12.6% (8.3%)
Valakut – 11.2% (11.1%)
RDW – 8.7% (5.6%)
Splinter Twin – 6.6% (5.6%)
U/B Control – 5.9% (2.8%)
U/W Control – 5.6% (0%)
Vampires – 4.9% (2.8%)
Value Pod – 3.5% (5.6%)
other aggro – 3.1% (5.6%)
CawBlade is clearly overrepresented in terms of wins versus appearance rate, whereas Valakut is running about even. Clearly, CawBlade is eating a little bit of everyone else’s cake, since most of the other contenders clock in at a little to a lot less than their expected win rate – if everyone won roughly as often as they appeared, anyway.
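One way to make “overrepresented” concrete is to divide each archetype's share of wins by its share of top eight slots. The Python sketch below does that with the percentages from the list above; the ratio itself is my own illustrative yardstick, not a standard statistic.

```python
# (appearance %, share of events won %) copied from the list above.
top_ten = {
    "CawBlade": (24.5, 41.7),
    "Tempered Steel": (12.6, 8.3),
    "Valakut": (11.2, 11.1),
    "RDW": (8.7, 5.6),
    "Splinter Twin": (6.6, 5.6),
    "U/B Control": (5.9, 2.8),
    "U/W Control": (5.6, 0.0),
    "Vampires": (4.9, 2.8),
    "Value Pod": (3.5, 5.6),
    "other aggro": (3.1, 5.6),
}

# Ratio above 1: the archetype wins more often than its top eight
# presence alone would predict. Ratio below 1: it wins less often.
ratios = {name: won / appeared for name, (appeared, won) in top_ten.items()}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:15s} {ratio:.2f}")
```

CawBlade lands well above 1 and Valakut sits almost exactly at 1. Value Pod and “other aggro” also come out above 1, but each of those rests on just two wins, so I wouldn't read much into them.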
So is our conclusion simply “play CawBlade?” Or are there other factors at work here?
Does your mileage actually vary?
We can pretty easily come up with some ideas about factors that could influence the contents and winners of these top eights.
Are there regional differences, for example? While the pre-banning CawBlade monster was ravaging North America, we saw a GP top eight in Barcelona that featured a curious 2-2-2-2 split between CawBlade, U/B Control, Valakut, and RUG.
Are there time trends that we lose when we bunch tournaments together like this? Maybe Tempered Steel was awesome right after M12 came out, but its performance has dropped off since then?
How would you even really ask these questions?
Trawling for insight
This kind of situation comes up all the time in my line of work.
“So, we’ve collected 2,000 samples from that big E. coli outbreak, and we’ve sequenced the genomes of each infecting E. coli strain. What do we do now?”
There are a lot of these problems where you can maybe take a stab at understanding what’s going on using your own feeble brain, but you’d really rather frame the question in a way that’s hard to answer by hand and then let a computer do the work.
One way we do this is called clustering.
“Clustering” describes a family of techniques by which we throw a bunch of data into a computational blender and say “stick the stuff that’s similar together.” The nice aspect of using clustering methods is that you, the person who sets it up, don’t have to sit there doing an infinite amount of tedious work to try and maybe figure out how your data group together. In fact, you often won’t know ahead of time what the defining factors that really separate your data out are.
For example, in this case we might learn that geography matters, or time matters, or something else entirely.
To ask this question, I took those unified (that is, single-format Standard constructed) event top eights and clustered them based on relative presence of the various archetypes.
For the technically minded, here’s what I did:
Each of the thirty-six top eights was broken out as a profile with a number assigned for each archetype’s abundance in that top eight (including assigning “0” when an archetype was absent). This gave each event a numerical archetype profile (e.g. the 7/23 Santa Clara PTQ is 000030100100001020; that “3” is the CawBlade count). I then used Cluster 3.0 to subject them to hierarchical clustering using the Euclidean distance metric and the average linkage method.
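For anyone who wants to see the moving parts, here is a rough pure-Python sketch of the same idea: Euclidean distance between profiles, and average linkage to decide which groups merge next. This is not Cluster 3.0's code, just a small illustration. The Santa Clara profile is the one quoted above; the other three profiles and their event names are made up so there is something to cluster.

```python
from itertools import combinations
from math import dist

def parse_profile(digits):
    """Turn a string like '000030100100001020' into a list of archetype counts."""
    return [int(ch) for ch in digits]

# Only the Santa Clara profile comes from the article; the rest are hypothetical.
profiles = {
    "Santa Clara PTQ": parse_profile("000030100100001020"),
    "Hypothetical A": parse_profile("000040100100000020"),
    "Hypothetical B": parse_profile("100010000004000020"),
    "Hypothetical C": parse_profile("100010000003001020"),
}

def average_linkage(cluster_a, cluster_b):
    """Mean Euclidean distance over every cross-cluster pair of events."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(dist(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

def hierarchical_cluster(names):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b, average_linkage(a, b)))
    return merges

merges = hierarchical_cluster(list(profiles))
for a, b, distance in merges:
    print(f"merge {a} + {b} at distance {distance:.2f}")
```

The merge history is what a dendrogram draws: events that merge early at a small distance have similar top eights, and the final merge joins the two most dissimilar groups.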
The end result is a big chart that looks like this:
It can be a bit much to look at initially, so don’t worry if this doesn’t leap out at you and say “Here’s your answer.”
That wiring-diagram-looking part on the left is a dendrogram, which is a fancy Greek way to say “tree drawing” (not kidding – that’s what it means in Greek). It basically shows how the various top eights group up, starting on the left with a big group that includes all of them, then breaking them down into smaller, more similar groups as we go to the right.
The list of event names on the far right shows how the events are grouped – it matches that chart on the left. In other words, that tiny little group of two at the top of the dendrogram is the pairing of Oklahoma City and Garden City at the top of the list.
The chart in the middle shows us which archetypes were in each top eight, and how many of them appeared there (brighter green = more copies). For example, the top of the chart is a PTQ in Oklahoma City. It had a CawBlade deck (dim green), one “other aggro” deck, two Valakut decks (moderate green), and four Tempered Steel decks (bright green).
How regional are we?
If you stare at that middle graphic for a little bit, something will leap right out at you.
Yeah, check out that giant, glowing slice of CawBlade right through the middle. That’s the kind of thing clustering gives us – one of the defining factors that makes some top eights different from others is the crushing presence (or not) of CawBlade.
I mean, that’s not actually surprising. But it’s comforting when our analysis returns some of the things we expect it to return.
However, in light of weird events like that odd European GP, we might be inclined to think that this CawBlade dominance will primarily be in American top eights. I’ve certainly seen it suggested that the Open Series and similar “big money” events in the U.S. are drivers of a much more rapid rate of advance in Standard tech. Of course, I’ve also seen it suggested that they promote stagnation, since the top players prefer to avoid the risk of switching up designs, and the grinders just don’t have time to test new archetypes on a weekly basis.
Here’s what that CawBlade-dominated chunk looks like:
So, eight events with a heavy Caw presence, seven of them American. Notably, our American Caw-infested events include one Open Series tournament in Seattle, then two PTQs subject to the California player base, and finally a two-PTQ weekend in Richmond and a PTQ in Pennsylvania. The California events fit my anecdotal expectations for our area – we definitely love control decks, and there’s a strong bias toward playing the “best” deck. We can imagine the Richmond and Pennsylvania scenes being similarly biased, although that doesn’t explain the lack of similar enrichment for other major American PTQs.
This, in contrast, is a different grouping that is primarily defined by a strong presence of U/B Control, along with a smattering of fast aggro builds. This is a good time to mention that clustering isn’t just about having a lot of one thing – in this case, it really is the pairing of U/B Control and fast aggro that defines this group, as there are other groups that are just fast aggro, for example.
Unfortunately for our “geography matters” idea, this one is also pretty abundantly American – again, more so than the overall data set.
I’ll cap off this little exploration by assuring you that there isn’t enough of a geographic stamp out there to say, “You’re in Akron, expect Red decks!”
Okay, so that’s out.
How timely are we?
So, if it isn’t geography, maybe it’s time?
We’ll keep it short this time – there isn’t a big timestamp on our results, either. The most notable standout group is a pair of top eights from just two days after the release of M12, both of which included a plethora of U/W and U/B control decks…possibly before it was demonstrated that you could get away with just running CawBlade as a control option, even after the loss of Stoneforge and Jace.
Still, that’s really not a lot to work with.
It’s possible that there’s nothing to be drawn from this particular dataset, other than “don’t play RDW if you want to win.” But there’s one more thing we may want to look at using our clustering analysis.
You are who you fight
There’s a whole different consideration when we look at top eights and winning it all – and that’s who else is keeping you company in the top eight. We already know that CawBlade has been taking down more than its fair share of wins, but those other archetypes still won roughly 58% of these events. Are there some top eights that represent especially soft fields for CawBlade? Are others ripe for combo?
That’s exactly the kind of question we can take a crack at with our clustering. We’re basically asking, “Given a certain type of top eight, what wins?”
Let’s start with that one slice of top eights utterly dominated, numbers-wise, by CawBlade decks.
Unsurprisingly, five of those eight events went to CawBlade. You’d kind of hope so, given how many copies were in each top eight. Nonetheless, wins also went to Valakut, RDW, and Goblins. Perhaps more to the point, these Caw-riffic top eights had decent representation in their remaining space from Valakut, Splinter Twin, RDW, and Tempered Steel.
In contrast, check this out:
Here we have another slice of CawBlade-enriched top eights…so why wasn’t it grouped with the other?
These top eights all feature a bunch of aggro decks – in this case, Vampires and Tempered Steel, but no contribution from Valakut and very little presence of Splinter Twin (or combo in general). The result? A clean sweep for CawBlade.
Obviously, the group is small and this is more a notional result than a statistical one, but I think it points toward a real thing. Combo decks may not be favored against CawBlade, but they can still just plain old win, clearing the way for a combo deck to take down the top eight or for an aggro deck to carve its way through combo opposition for the win.
On the other hand, a field of pure aggro, especially aggro that can’t just burn you out, is pretty easy pickings for CawBlade.
This third Caw-enriched group features slightly less of a CawBlade overload, opening up space for a mixture of other control, combo, and aggro decks – with a notable utter absence of RDW. In this kind of mixed environment, Caw still has an edge, but there is ample opportunity for some assortment of combo and other decks to knock it down a peg, letting their compatriots win.
Switching gears entirely-ish, we have this group that is heavily defined by a strong Valakut presence…although the Caw is still nigh-ubiquitous and RDW is abundant. Notably, the wins here are evenly split between CawBlade, Valakut, and Tempered Steel…with none going to RDW, our current format’s perpetual bridesmaid.
Tempered Steel may falter in a naked head-to-head with CawBlade, but once again, the fast aggro decks seem to benefit from the presence of combo to clear the way.
When CawBlade diminishes, does U/B Control take over?
As it happens, not so much. Although this group is defined largely by a strong presence of U/B Control, none of the wins went to that archetype. Instead, we have wins for other control and aggro decks, including one each to CawBlade and RDW. I’m honestly not sure what the take-home is here – maybe the U/B Control decks keep knocking each other out?
As we approach the end of this breakdown, we have a pair of events pretty much defined by Tempered Steel, with a full half of each top eight given over to the archetype. That much Steel seems to be sufficient to keep the Caw down, but that didn’t ensure Steel dominance, as Valakut took one win and Tempered Steel took the other.
This suggests that if you throw enough copies of Tempered Steel at the metagame, CawBlade won’t make it to the end of the road…but that doesn’t ensure that the Tempered Steel decks will, either (since Valakut may simply be faster).
Finally, we close out the breakdown with this group that accounts for much of the U/W Control presence in recent top eights, with a side helping of CawBlade and Splinter Twin. With so much U/W Control in the mix, what happened?
Right. CawBlade and Splinter Twin took it down. It seems plausible that the U/W Control decks were able to knock CawBlade copies out of the top eight…and then summarily lose to Splinter Twin, whose success here certainly outpaces its abundance.
The answer is…?
So, how do we wrap all of this analysis up into something that’s not only interesting, but also useful?
First, we need to either assume that our local tournament metagame will be reflected in top eights or have a good feel for the most successful decks in our local metagame.
Then, armed with that knowledge, I’d be happy to make some speculative rules of thumb based on our observations so far:
Keep in mind that the moment we started assigning causes to these effects, we entered into the land of speculation (even if it is, hopefully, speculation guided by data and experience). So if some of these “rules” strike you as silly – hey, you may be right! It’s not like we’ve experimentally tested the cause-effect relationships here.
More generally, if you had to walk into a Standard event blind to the local metagame tomorrow, it looks like it does make a lot of sense to play CawBlade or something very much like it, but to shore up your game against Valakut to avoid being randomly topdecked out of a tournament.
Or, even more generally, the regulators in the current metagame are the fast clocks, be they combo (Valakut) or aggro (Tempered Steel). You want to pay attention to their prevalence in your local metagame because they influence the ability of decks that aren’t casting Hawks and Swords to win. Any one Tempered Steel or Valakut deck may not have good odds of taking down the tournament, but taken in aggregate they form the “chaos” that keeps the consistent winners consistent, but not crushingly dominant.
This isn’t a great time to play aggro – but the take-home message is that those of us who don’t want to play aggro sincerely hope that the rest of you will take the job.
magic (at) alexandershearer.com
parakkum on twitter