The Field Report – Killing CawBlade So You Don’t Have To

We’re hardly all business here.

If we’re honest about it, Magic metagame analysis is equal parts “figuring out what we need to do to win” and “sports radio.” After all, although it’s useful to have guidelines for what deck to bring to your next tournament and how to tweak that deck for best results, it’s also just fun to hash out ideas like, “What’s the best deck?” and “Why did that deck even win?”

So, with that in mind, today I’m going to delve a little bit into both areas and ask, “Why did those decks do well?” and “How do we win next week?” In doing so, we get to move into two of my favorite topics – how reality (mis)matches our impressions, and how we can use our knowledge of what has gone before to figure out what to do next.

Or, to put it another way, do I really have to play CawBlade?

A big slice of Standard

This time around, I’m harvesting a whole bunch of data from post-M12 Standard events to inform our understanding of the current Standard metagame.

Specifically, I’ve collected the top eights from some forty-nine paper Magic events at roughly the PTQ level and above. This includes PTQs, SCG Standard Opens, other larger-scale events, and a flock of Nationals from around the world.

Going in, there are some clear built-in biases to the data we have to work with. About two thirds of the events come from the United States, and the majority of non-American events come from Western Europe. That said, the U.S. is big enough that we’d expect that if there are significant regional differences, we might see them occur even within the U.S. and not just between the U.S. and other countries.

The questions we’d like to ask the metagame

There are a few things we’d like to extract from a metagame analysis, to serve both our “want to win” and “sports talk” needs.

First, there are the basics. What’s winning? What’s not?

Which, of course, doesn’t give us any indication as to why any of that is happening. Most of the “why” is actually best addressed by in-depth consideration of matchups, card choices, and plays, but we can try to address it by asking some other questions as well.

These more elaborate questions focus on the fact that we’re not all just playing on Magic Online. Are there deck choice biases by region? By time? By type of event? Do different decks succeed at different levels?

With these questions in mind, let’s take a crack at actually using the information from those forty-nine events.

The basics – what’s winning?

Okay, so the basic question is “What’s winning?” That should be easy enough to address – after all, it’s right there in each top eight.

What’s winning, first take

Across the events surveyed, here are the winning archetypes and their tallies:

CawBlade – 22
Valakut – 4
Tempered Steel – 3
Goblins – 2
other aggro – 2
other control – 2
Red Deck Wins (RDW) – 2
Splinter Twin – 2
TwinPod – 2
U/B Control – 2
U/W Control – 2
Value Pod – 2
other combo – 1
Vampires – 1

Before we go on, here are a few quick notes on this tally.

First, any archetype that didn’t appear in at least 10% of the evaluated top eights was wrapped into another category. If it was a subset of a bigger archetype it was placed in that bigger archetype’s tally (for example, the two Sukenik-style [card]Gravitational Shift[/card] CawBlade decks were tossed in with the CawBlade category). If it didn’t fit one of those bigger archetypes, it was batched in with the appropriate “other X” category.

Second, although most of these archetype names will be evident for anyone who’s following Standard, “Value Pod” is my own shorthand for decks that use Birthing Pod but don’t leverage it into some kind of combo (that is, Podding “for value”).
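The folding rule from these notes could be sketched roughly like this in Python. The sketch assumes each top eight is stored as a list of archetype labels. It also assumes you supply a hand-built mapping from each minor archetype to its parent or "other X" bucket. The `PARENT` entries below are hypothetical examples, not the mapping actually used for this survey.

```python
from collections import Counter

# Hypothetical mapping from minor archetypes to the bigger archetype (or
# "other X" bucket) they get folded into. Illustrative only.
PARENT = {
    "Gravitational Shift CawBlade": "CawBlade",
    "Hypothetical rogue aggro": "other aggro",
}


def fold_minor_archetypes(top_eights, parent, threshold=0.10):
    """Relabel archetypes present in fewer than `threshold` of all top eights.

    top_eights: list of lists of archetype labels, one list per event.
    parent: dict mapping a minor archetype to the label it is folded into.
    Minor archetypes missing from `parent` keep their original label.
    """
    n_events = len(top_eights)
    # Number of top eights in which each archetype appears at least once.
    presence = Counter(label for t8 in top_eights for label in set(t8))
    folded = []
    for t8 in top_eights:
        folded.append([
            parent.get(label, label) if presence[label] / n_events < threshold else label
            for label in t8
        ])
    return folded
```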

Our first stumbling block

These numbers are nice and all, but they have a problem.

In a word, Nationals.

Consider the potentially depressing top eight from this year’s U.S. Nationals. Six CawBlade decks?

Well, first impressions aside, we’d do well to remember that Nationals tournaments are mixed events. Of the fourteen rounds that led to the top eight at U.S. Nationals, only eight were Standard. It’s possible, then, to make it to the top eight with a deck that has basically no chance of taking you through a full tournament on its own merits.

Thus, although the top eight of U.S. Nationals featured six copies of CawBlade, the list of Standard decks that took no more than one loss across eight rounds of play had the same CawBlade count, but also featured eight other decks:

CawBlade – 6
Value Pod – 2
RUG Splinter Twin – 1
Tempered Steel – 1
TwinPod – 1
U/B Control – 1
Valakut – 1
Vampires – 1

Obviously, CawBlade still has a big footprint here, but 43% of the top decks is very different from 75% of the top decks.
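The 75% versus 43% comparison is just two share calculations over the same six CawBlade decks. Here it is sketched in Python, using only the counts given above.

```python
def share_pct(count, total):
    """Percentage of a list made up of one archetype."""
    return 100 * count / total


# Counts from the U.S. Nationals discussion above.
top8 = share_pct(6, 8)               # six CawBlade decks in an eight-player top eight
standard_list = share_pct(6, 6 + 8)  # six CawBlade decks plus eight other decks
print(f"top eight: {top8:.0f}%, Standard-only list: {standard_list:.0f}%")
# prints "top eight: 75%, Standard-only list: 43%"
```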

Unfortunately, we simply don’t have the data on what decks actually did well in all the Standard rounds at every Nationals. This suggests a very simple solution to the issue of skewed top eights from mixed events.

Leave them out.

So moving forward, we’re going to ignore all those Nationals top eights, bringing us down to thirty-six events to pull data from.

Our clarified results

So, if we remove all those Nationals top eights (goodbye, funky Valakut/TwinPod hybrid deck that won Malaysian Nats!), then our winners look like this:

CawBlade – 15
Valakut – 4
Tempered Steel – 3
Goblins – 2
other aggro – 2
other control – 2
RDW – 2
Splinter Twin – 2
Value Pod – 2
U/B Control – 1
Vampires – 1

So, CawBlade is still at the top of the heap – not that we expected it to completely disappear with the removal of Nationals.

But this is also not the end of the story. After all, if only five people actually played in these thirty-six top eights with Goblins and then two of them won, that would be pretty impressive, right? Certainly, it would be more impressive than if twenty people were in those same top eights with Goblins.

As it happens, five people did play Goblins and come away with two wins.

So let’s clarify our question one more time.

Getting there versus getting there

Knowing a deck’s frequency in the top eight is nice and definitely informative in terms of our desire to win – since we’ll need to beat the others in the top eight if we want to win. But we’d also like to know how often an archetype takes the top prize versus how often it appears.

That breaks down like so:

CawBlade – 21.4%
Valakut – 12.5%
Tempered Steel – 8.3%
Goblins – 40%
other aggro – 22.2%
other control – 50%
RDW – 8%
Splinter Twin – 10.5%
Value Pod – 20%
U/B Control – 5.9%
Vampires – 7.1%

As a reminder, this is the percentage of top eight appearances by that archetype that resulted in a win. I strongly suspect that there is some skew here for small sample sizes for Goblins and “other control,” but the other win rates are pretty informative.
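The conversion rate is simply wins divided by top eight appearances. The Python sketch below computes it for figures quoted in the text. To put a number on the small-sample worry, it also adds a Wilson score interval. That interval is a standard statistical tool I'm swapping in for illustration; it was not part of the original analysis.

```python
import math


def conversion_rate(wins, appearances):
    """Share of an archetype's top eight appearances that ended in an event win."""
    if appearances <= 0:
        raise ValueError("archetype never made a top eight")
    return wins / appearances


def wilson_interval(wins, appearances, z=1.96):
    """Approximate 95% Wilson score interval for a conversion rate."""
    p = conversion_rate(wins, appearances)
    n = appearances
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Goblins: five top eight appearances, two wins (figures from the text).
lo, hi = wilson_interval(2, 5)
print(f"Goblins {conversion_rate(2, 5):.0%}, plausible range {lo:.0%} to {hi:.0%}")
# U/B Control + Tempered Steel + RDW pooled: 78 appearances, 6 wins.
print(f"pooled trio {conversion_rate(6, 78):.1%}")
```

Goblins won two of their five appearances. For that record, the interval runs from roughly 12% to 77%, which is one way of saying that five top eights can't tell us much.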

Our first pass conclusions

So, after all that, do we want to draw any conclusions?

I was certainly pretty shocked to see the big, chunky discrepancy between appearances and wins for U/B Control, Tempered Steel, and RDW. Between these three archetypes, there are seventy-eight top eight appearances and a less-than-stellar six wins.

Of course, it’s easy to overlook that U/W Control decks had sixteen top eight appearances and no wins.

In fact, here are the top ten decks in terms of making it into top eights, with their appearance percentage (and the percent of total events won in parentheses).

CawBlade – 24.5% (41.7%)
Tempered Steel – 12.6% (8.3%)
Valakut – 11.2% (11.1%)
RDW – 8.7% (5.6%)
Splinter Twin – 6.6% (5.6%)
U/B Control – 5.9% (2.8%)
U/W Control – 5.6% (0%)
Vampires – 4.9% (2.8%)
Value Pod – 3.5% (5.6%)
other aggro – 3.1% (5.6%)

CawBlade is clearly overrepresented in terms of wins versus appearance rate, whereas Valakut is running about even. CawBlade is eating a little bit of everyone else’s cake, since most of the other contenders clock in a little to a lot below their expected win rate – if everyone won roughly as often as they appeared, anyway.
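One quick way to put the appearance and win columns side by side is to divide the second by the first. Values above 1 then mean an archetype converts more often than its top eight presence would suggest. Below is a Python sketch using the percentages from the table. The ratio is my own illustrative framing, not a metric from the original analysis.

```python
# (appearance share %, share of events won %) copied from the table above.
TABLE = {
    "CawBlade": (24.5, 41.7),
    "Tempered Steel": (12.6, 8.3),
    "Valakut": (11.2, 11.1),
    "RDW": (8.7, 5.6),
    "Splinter Twin": (6.6, 5.6),
    "U/B Control": (5.9, 2.8),
    "U/W Control": (5.6, 0.0),
    "Vampires": (4.9, 2.8),
    "Value Pod": (3.5, 5.6),
    "other aggro": (3.1, 5.6),
}


def win_to_appearance_ratio(appear_pct, win_pct):
    """Event-win share divided by top eight appearance share.

    Above 1: the archetype wins more often than its presence alone predicts.
    """
    return win_pct / appear_pct


ranked = sorted(TABLE, key=lambda name: win_to_appearance_ratio(*TABLE[name]), reverse=True)
for name in ranked:
    print(f"{name:15s} {win_to_appearance_ratio(*TABLE[name]):.2f}")
```

On these numbers, CawBlade comes out near 1.7 and Valakut near 1.0, which matches the reading above. "Other aggro" and Value Pod also land above 1, but on only two wins apiece. That is the same small-sample caveat raised for Goblins.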

So is our conclusion simply “play CawBlade?” Or are there other factors at work here?

Does your mileage actually vary?

We can pretty easily come up with some ideas about factors that could influence the contents and winners of these top eights.

Are there regional differences, for example? While the pre-banning CawBlade monster was ravaging North America, we saw a GP top eight in Barcelona that featured a curious 2-2-2-2 split between CawBlade, U/B Control, Valakut, and RUG.

Are there time trends that we lose when we bunch tournaments together like this? Maybe Tempered Steel was awesome right after M12 came out, but its performance has dropped off since then?

How would you even really ask these questions?

Trawling for insight

This kind of situation comes up all the time in my line of work.

“So, we’ve collected 2,000 samples from that big E. coli outbreak, and we’ve sequenced the genomes of each infecting E. coli strain. What do we do now?”

There are a lot of these problems where you can take a stab at understanding what’s going on using your own feeble brain. Still, you’d really rather set up the kind of question that’s hard to answer by hand and then let a computer do it.

One way we do this is called clustering.

“Clustering” describes a family of techniques by which we throw a bunch of data into a computational blender and say “stick the stuff that’s similar together.” The nice aspect of using clustering methods is that you, the person who sets it up, don’t have to sit there doing an infinite amount of tedious work to try and maybe figure out how your data group together. In fact, you often won’t know ahead of time what the defining factors that really separate your data out are.

For example, in this case we might learn that geography matters, or time matters, or something else entirely.

To ask this question, I took those unified (that is, single-format Standard constructed) event top eights and clustered them based on relative presence of the various archetypes.

For the technically minded, here’s what I did:

Each of the thirty-six top eights was broken out as a profile with a number assigned for each archetype’s abundance in that top eight (including assigning “0” when an archetype was absent). This gave each event a numerical archetype profile (e.g., the 7/23 Santa Clara PTQ is 000030100100001020; that “3” is the CawBlade count). I then used Cluster 3.0 to subject them to hierarchical clustering using the Euclidean distance metric and the average linkage method.
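If you'd rather not install Cluster 3.0, here is a from-scratch Python sketch of the same kind of analysis. Each digit-string profile is parsed into a count vector. Events are then merged bottom-up using Euclidean distance and average linkage, where the distance between two groups is the mean of all pairwise event distances. This is my own minimal reimplementation for illustration, not Cluster 3.0's code. The Santa Clara string is the one quoted above; the other toy profiles are made up.

```python
import math


def parse_profile(profile):
    """Turn a digit string like '000030100100001020' into a count vector.

    One digit per archetype works because a top eight holds at most eight decks.
    """
    return [int(ch) for ch in profile]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles):
    """Agglomerative clustering with average linkage (UPGMA).

    profiles: dict of event name -> count vector.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of event names.
    """
    def dist(c1, c2):
        # Mean of all pairwise distances between members of the two clusters.
        pairs = [euclidean(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j], dist(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


profiles = {
    "Santa Clara": parse_profile("000030100100001020"),
    "Toy B": parse_profile("000030100100001011"),  # hypothetical
    "Toy C": parse_profile("400000000000000040"),  # hypothetical
}
for a, b, d in average_linkage(profiles):
    print(a, "+", b, f"at {d:.2f}")
```

This brute-force loop is fine for thirty-six top eights. For bigger data sets, a library routine such as SciPy's hierarchical clustering would be the practical choice.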

The end result is a big chart that looks like this:

It can be a bit much to look at initially, so don’t worry if this doesn’t leap out at you and say “Here’s your answer.”

That wiring-diagram-looking part on the left is a dendrogram, which is a fancy Greek way to say “tree drawing” (not kidding – that’s what it means in Greek). It basically shows how the various top eights group up. It starts on the left with a big group that includes all of them, then breaks them down into smaller, more similar groups as we go to the right.

The list of event names on the far right shows how the events are grouped – it matches that chart on the left. In other words, that tiny little group of two at the top of the dendrogram is the pairing of Oklahoma City and Garden City at the top of the list.

The chart in the middle shows us which archetypes were in each top eight, and how many of them appeared there (brighter green = more copies). For example, the top of the chart is a PTQ in Oklahoma City. It had a CawBlade deck (dim green), one “other aggro” deck, two Valakut decks (moderate green), and four Tempered Steel decks (bright green).
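Reading the dendrogram from left to right amounts to cutting the tree at lower and lower merge heights. Here is a small Python sketch of that cut. It assumes a merge history of `(cluster_a, cluster_b, height)` tuples in the order the merges happened. The first pairing mirrors the Oklahoma City/Garden City example, but every height below, and the second pairing, is invented for illustration.

```python
def cut_tree(leaves, merges, threshold):
    """Return the groups left after applying only merges at or below `threshold`.

    merges: (cluster_a, cluster_b, height) tuples in merge order, each cluster a
    tuple of leaf names. With average linkage the heights never decrease.
    """
    groups = {leaf: frozenset([leaf]) for leaf in leaves}
    for a, b, height in merges:
        if height > threshold:
            break  # heights are non-decreasing, so no later merge qualifies
        merged = frozenset(a) | frozenset(b)
        for leaf in merged:
            groups[leaf] = merged
    return sorted(set(groups.values()), key=sorted)


# Hypothetical merge history with made-up heights.
leaves = ["Oklahoma City", "Garden City", "Santa Clara", "Seattle"]
merges = [
    (("Oklahoma City",), ("Garden City",), 1.4),
    (("Santa Clara",), ("Seattle",), 2.0),
    (("Oklahoma City", "Garden City"), ("Santa Clara", "Seattle"), 5.3),
]
print(cut_tree(leaves, merges, 3.0))  # two groups of two events each
```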

How regional are we?

If you stare at that middle graphic for a little bit, something will leap right out at you.

Yeah, check out that giant, glowing slice of CawBlade right through the middle. That’s the kind of thing clustering gives us – one of the defining factors that makes some top eights different from others is the crushing presence (or not) of CawBlade.

I mean, that’s not actually surprising. But it’s comforting when our analysis returns some of the things we expect it to return.

However, in light of weird events like that odd European GP, we might be inclined to think that this CawBlade dominance will primarily be in American top eights. I’ve certainly seen it suggested that the Open Series and similar “big money” events in the U.S. drive a much more rapid rate of advance in Standard tech. Of course, I’ve also seen it suggested that they promote stagnation, since the top players prefer to avoid the risk of switching up designs, and the grinders just don’t have time to test new archetypes on a weekly basis.

Here’s what that CawBlade-dominated chunk looks like:

So, eight events with a heavy Caw presence, seven of them American. Notably, our American Caw-infested events include one Open Series tournament in Seattle, then two PTQs subject to the California player base, and finally a two-PTQ weekend in Richmond and a PTQ in Pennsylvania. The California events fit my anecdotal expectations for our area – we definitely love control decks, and there’s a strong bias toward playing the “best” deck. We can imagine the Richmond and Pennsylvania scenes being similarly biased, although that doesn’t explain the lack of similar enrichment for other major American PTQs.

This, in contrast, is a different grouping that is primarily defined by a strong presence of U/B Control, along with a smattering of fast aggro builds. This is a good time to mention that clustering isn’t just about having a lot of one thing – in this case, it really is the pairing of U/B Control and fast aggro that defines this group, as there are other groups that are just fast aggro, for example.
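Why would U/B-plus-aggro top eights form their own group apart from aggro-only ones? A tiny Euclidean-distance comparison in Python shows the idea. The five-archetype profiles are hypothetical, chosen only to illustrate the point. `math.dist` requires Python 3.8 or later.

```python
import math

# Hypothetical profiles over (CawBlade, U/B Control, Tempered Steel, RDW, Valakut);
# each sums to eight decks.
ub_aggro_1 = [1, 3, 2, 2, 0]
ub_aggro_2 = [0, 3, 3, 1, 1]
aggro_only = [1, 0, 4, 3, 0]

# The shared U/B count keeps the first two close, even though all three events
# contain plenty of fast aggro.
print(math.dist(ub_aggro_1, ub_aggro_2))  # 2.0
print(math.dist(ub_aggro_1, aggro_only))  # about 3.74
```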

Unfortunately for our “geography matters” idea, this one is also pretty abundantly American – again, more so than the overall data set.

I’ll cap off this little exploration by assuring you that there isn’t enough of a geographic stamp out there to say, “You’re in Akron, expect Red decks!”

Okay, so that’s out.

How timely are we?

So, if it isn’t geography, maybe it’s time?

We’ll keep it short this time – there isn’t a big timestamp on our results, either. The most notable standout group is a pair of top eights from just two days after the release of M12, both of which included a plethora of U/W and U/B control decks…possibly before it was demonstrated that you could get away with just running CawBlade as a control option, even after the loss of Stoneforge and Jace.

Still, that’s really not a lot to work with.

It’s possible that there’s nothing to be drawn from this particular dataset, other than “don’t play RDW if you want to win.” But there’s one more thing we may want to look at using our clustering analysis.

You are who you fight

There’s a whole different consideration when we look at top eights and winning it all – and that’s who else is keeping you company in the top eight. We already know that CawBlade has been taking down more than its fair share of wins. Still, other archetypes took 21 of the 36 events, and CawBlade decks that made a top eight went on to lose nearly 80% of the time. Are there some top eights that represent especially soft fields for CawBlade? Are others ripe for combo?

That’s exactly the kind of question we can take a crack at with our clustering. We’re basically asking, “Given a certain type of top eight, what wins?”
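The “given a certain type of top eight, what wins?” question boils down to tallying event winners within each cluster. Here is a Python sketch. It assumes you have already assigned each event to a cluster, for example by cutting the dendrogram. The event names, cluster labels, and winners below are hypothetical.

```python
from collections import Counter, defaultdict


def winners_by_cluster(assignments, winners):
    """Count event winners by archetype within each cluster of top eights.

    assignments: dict of event name -> cluster label.
    winners: dict of event name -> archetype that won that event.
    """
    tallies = defaultdict(Counter)
    for event, cluster in assignments.items():
        tallies[cluster][winners[event]] += 1
    return dict(tallies)


# Hypothetical events, clusters, and winners for illustration only.
assignments = {"Event A": "Caw-heavy", "Event B": "Caw-heavy",
               "Event C": "Caw-heavy", "Event D": "Steel-heavy"}
winners = {"Event A": "CawBlade", "Event B": "CawBlade",
           "Event C": "Valakut", "Event D": "Valakut"}

for cluster, tally in winners_by_cluster(assignments, winners).items():
    print(cluster, tally.most_common())
```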

Let’s start with that one slice of top eights utterly dominated, numbers-wise, by CawBlade decks.

Unsurprisingly, five of those eight events went to CawBlade. You’d kind of hope so, given how many copies were in each top eight. Nonetheless, wins also went to Valakut, RDW, and Goblins. Perhaps more to the point, these Caw-riffic top eights had decent representation in their remaining space from Valakut, Splinter Twin, RDW, and Tempered Steel.

In contrast, check this out:

Here we have another slice of CawBlade-enriched top eights…so why wasn’t it grouped with the other?

These top eights all feature a bunch of aggro decks – in this case, Vampires and Tempered Steel, but no contribution from Valakut and very little presence of Splinter Twin (or combo in general). The result? A clean sweep for CawBlade.

Obviously, the group is small and this is more a notional result than a statistical one, but I think it points toward a real thing. Combo decks may not be favored against CawBlade, but they can still just plain old win. That clears the way for a combo deck to take down the top eight, or for an aggro deck to carve its way through combo opposition for the win.

On the other hand, a field of pure aggro, especially aggro that can’t just burn you out, is pretty easy pickings for CawBlade.

This third Caw-enriched group features a slightly less severe CawBlade overload, opening up space for a mixture of other control, combo, and aggro decks – with a notable utter absence of RDW. In this kind of mixed environment, Caw still has an edge, but there is ample opportunity for some assortment of combo and other decks to knock it down a peg, letting their compatriots win.

Switching gears entirely-ish, we have this group that is heavily defined by a strong Valakut presence…although the Caw is still nigh-ubiquitous and RDW is abundant. Notably, the wins here are evenly split between CawBlade, Valakut, and Tempered Steel…with none going to RDW, our current format’s perpetual bridesmaid.

Tempered Steel may falter in a naked head-to-head with CawBlade, but once again, the fast aggro decks seem to benefit from the presence of combo to clear the way.

When CawBlade diminishes, does U/B Control take over?

As it happens, not so much. Although this group is defined largely by a strong presence of U/B Control, none of the wins went to that archetype. Instead, we have wins for other control and aggro decks, including one each to CawBlade and RDW. I’m honestly not sure what the take-home is here – maybe the U/B Control decks keep knocking each other out?

As we approach the end of this breakdown, we have a pair of events pretty much defined by Tempered Steel, with a full half of each top eight given over to the archetype. That much Steel seems to be sufficient to keep the Caw down, but that didn’t ensure Steel dominance, as Valakut took one win and Tempered Steel took the other.

This suggests that if you throw enough copies of Tempered Steel at the metagame, CawBlade won’t make it to the end of the road…but that doesn’t ensure that the Tempered Steel decks will, either (since Valakut may simply be faster).

Finally, we close out the breakdown with this group that accounts for much of the U/W Control presence in recent top eights, with a side helping of CawBlade and Splinter Twin. With so much U/W Control in the mix, what happened?

Right. CawBlade and Splinter Twin took it down. It seems plausible that the U/W Control decks were able to knock CawBlade copies out of the top eight…and then summarily lose to Splinter Twin, whose success here certainly outpaces its abundance.

The answer is…?

So, how do we wrap all of this analysis up into something that’s not only interesting, but also useful?

First, we need to either assume that our local tournament metagame will be reflected in top eights or have a good feel for the most successful decks in our local metagame.

Then, armed with that knowledge, I’d be happy to make some speculative rules of thumb based on our observations so far:

Keep in mind that the moment we started assigning causes to these effects, we entered into the land of speculation (even if it is, hopefully, speculation guided by data and experience). So if some of these “rules” strike you as silly – hey, you may be right! It’s not like we’ve experimentally tested the cause-effect relationships here.

More generally, if you had to walk into a Standard event blind to the local metagame tomorrow, it looks like it does make a lot of sense to play CawBlade or something very much like it, but to shore up your game against Valakut to avoid being randomly topdecked out of a tournament.

Or, even more generally, the regulators in the current metagame are the fast clocks, be they combo (Valakut) or aggro (Tempered Steel). You want to pay attention to their prevalence in your local metagame because they influence the ability of decks that aren’t casting Hawks and Swords to win. Any one Tempered Steel or Valakut deck may not have good odds of taking down the tournament. Taken in aggregate, though, they form the “chaos” that keeps the consistent winners consistent, but not crushingly dominant.

This isn’t a great time to play aggro – but the take-home message is that those of us who don’t want to play aggro sincerely hope that the rest of you will take the job.

magic (at) alexandershearer.com
parakkum on twitter

20 thoughts on “The Field Report – Killing CawBlade So You Don’t Have To”

  1. Hit submit too early…

    Anyway, thirty-six top eights doesn’t give you much room for analysis. Nothing would appear statistically significant if you ran the appropriate tests.

  2. This is as good a metagame article as I have ever seen. The staggering amount of useful information packed in here is incredible.

    And you even have a summary of recommendation!

    Wait… this reeks of academia, they don’t have you too, do they?

  3. super spider monkey

    Another amazing article!

    Your weekly series should be marked “Required Reading” right along with Josh Silvestri’s. Thank you!

  4. Shock and surprise, Spaniel complaining about sample size. It’s almost like the author mentioned on numerous occasions that this was an issue… There’s not been a statistics-based article yet written about Magic that doesn’t suffer from this issue, for obvious reasons (lack of data, shifting format, etc.). So, seriously, as long as the author notes the point, it’s not worth trolling about it in the forums, as it’s meant to be indicative rather than definitive!

  5. Awesome article! Never seen one this thoroughly researched.

    I’d love it if this became a regular thing

  6. Pingback: MTGBattlefield

  7. Try playing cawblade and it’s not overpowered at all. As the article says, it has a very good aggro matchup and enough disruption + clock against combo (twin or pyromancer). But against valakut you need something to deal with that deck or they will just titan you out. And prepared UB control can pretty much overwhelm you quite easily unless you get a draw of hawk into sword.

  8. Great article, I love that you really explain how you treated the data and what you used to back up your conclusions.

    For all you web programmers out there, someone has to build a website where you can do this kind of sophisticated analysis!

  9. @Ross – you might be right, maybe new CawBlade isn’t overpowered… but since people also said that for a long time about old CawBlade (until there was no way they could keep a straight face anymore), I’m reluctant to give too much credence to such an assertion if there’s no data to back it up.

  10. if caw + combo = caw mostly wins, and caw + aggro = caw wins; it makes no sense that caw + combo + aggro would make it a more open field than just caw + combo. i realize these are conclusions drawn from “statistical analysis” but common sense still has its place lol

  11. Hey, i know it’s a lot of work but don’t you think calculating top8 match win% would be more interesting? 3-0 for first place, 2-1 for second place, 1-1 for 3rd-4th and 0-1 for 5th-8th. I didn’t read the whole article, but of course caw wins the events more often if there are 3 caws in the top8, for example?

  12. Caw itself isn’t that powerful of a deck, it’s just good at beating bad players. On mtgo where everyone is at least above average caw doesn’t do that well. At scg opens I think we see caw dominate because the tournament is basically full of people who play magic once every couple months.

  13. Germany nationals would disagree with your assessment of caw blade over pyromancer or rdw

  14. Thanks all for the replies so far. Since this is the kind of thing at least one person is naturally going to comment on, I wanted to include my current favorite take on significance vis-a-vis clustering analysis, from the nice folks at Statistica:

    “In fact, cluster analysis is not as much a typical statistical test as it is a “collection” of different algorithms that “put objects into clusters according to well defined similarity rules.” The point here is that, unlike many other statistical procedures, cluster analysis methods are mostly used when we do not have any a priori hypotheses, but are still in the exploratory phase of our research. In a sense, cluster analysis finds the “most significant solution possible.” Therefore, statistical significance testing is really not appropriate here, even in cases when p-levels are reported (as in k-means clustering).”

    If this were, for example, a gene expression experiment, the kind of clustering analysis I ran here would be the first step. It would be followed by specific experiments targeting those genes that the clustering suggested are important or linked (or whatever). It’s generally going to be problematic to look at a bunch of archetype-versus-archetype win percentages (for example), and declare that one deck is “more powerful” without having some prospective experiments to evaluate that conclusion. This is why medical research often goes in the order (1) anecdotal observation, (2) retrospective correlation, (3) prospective testing.

    Also, for @corey – It actually makes a lot of sense. Remember that there is no one “CawBlade” deck, but rather a family of decks that are mostly able to be batched in that one archetype. That includes decks that do or don’t have Hero, that do or don’t have Gravitational Shift, etc. So if all the opposition in a tournament is aggro, then those Caw builds that stomp aggro (e.g., Gravitational Shift) will clean house. If all the opposition is combo, then the combo-tilted variations will do well. In either case, the result to the outside observer is “Caw decks win.” If *both* aggro and combo are present, then each Caw deck is more likely to be tested along both of these lines of play, and the overall Caw failure rate could be expected to rise quite a bit. My In Development article “Multiple Lines of Attack” is essentially about this idea, for anyone who wants further reading.

    @Tobias – That would be a ton of work; however, I did address the “you have lots, so you win lots” issue — there are top eights that are rich in certain decks where those decks nonetheless fail like crazy to win. So it’s not just “more there, more wins.”

  15. The conclusion is valid. This is a horrid time for aggro, with Caws, Timely Reinforcements and all sorts of incidental lifegain (Batterskull, Wurmcoil, etc.)

    What surprises me is that U/B control (US Nationals notwithstanding) has failed to put up numbers. David Ochoa said his match against Aintrazi’s UB felt really bad. It’s probably a combination of U/B being soft to Tempered Steel, and people running suboptimal lists.

    Very cool use of clustering!

  16. There are a couple of things your data doesn’t take into account. First, the representation of each deck in the top 8’s versus their total representation in the field as a whole. For example, if 50% of the entire field played Caw, then you might expect 50% of the top 8 to be Caw. That would be the best pure statistical way to see how dominant an archetype might be, but there probably aren’t any records to draw off of. Another factor that would be impossible to calculate is the various skill level of the pilots of these decks. If all the best players in the world are playing Caw, then some of the success could be attributed to the players and not just the decks. If those same players decided to play U/B instead, you might see the results skewed in its favor. Of course that wouldn’t matter with decks like Valakut and Hawkward that require no skill at all.

  17. The problem of this article is that it is trying to use statistics and failing to do it. Putting numbers together just doesn’t give the right answer. When you are saying that, for example, to not play UB, like it is not that good, and then we see UB winning German Nationals, after winning US Nationals, it feels just your article is very bad, and indeed it is. The main problem is that you want to prove something that the real results show that you’re wrong. In fact, when we study statistics, and I mean the serious one, you CAN’T induce the samples to give the answer you want to “hear”, you must collect the right appointments. Sorry, I know it is a hard critic, but a CF article affects too much the Magic “environment”, and this could make many people to do wrong choices.

  18. Actually thought this article would be terrible since most articles like this are just compiled percentages, but was surprisingly one of the best articles of its kind I’ve ever seen.

    It’s a shame that it took 10 pages of text to come to the conclusion that Cawblade is good, but maybe this will be a lot more useful in a more diverse metagame. 😛

    Enjoyed the article thoroughly, hope to see more!


