Bilbao Braggings—The Full Story of Who Beats Whom in Modern

The world is changed. I smell it in the air. There’s a whir coming. A shadow has descended and fires burn again on Mount Doom. Most of the Elves have departed these shores, leaving a couple of brave souls to venture into the land of Mardu alone. There are rocks to climb and trees to scale, bridges below and drums in the deep. There are towers and mines. There are amulets and relics. And in the icy waters beneath the mountains lurks one Thing to rule them all…

I’m happy to report on the results of almost all matches played recently at Grand Prix Bilbao. In total, we know which deck beat which in 5,836 single encounters. A bigger data set generally yields more reliable insights, and this is the biggest seen so far. It’s not just huge in absolute numbers, but in relative terms too. We’re only missing deck info on 26 matches. 26 matches can’t put a dent in the findings below, so take the following as the definitive guide to who beats whom in current Modern.

I write “we know” because the complete data is available to everyone on this Google sheet. The full thing is somewhat unwieldy, so I’ll break down the most relevant results for you. What’s relevant? Well…

Statistical Significance

I could point out, for example, that Ponza went 0-2 in matches versus Tron, because dunking on Ponza has a long tradition in Magic entertainment. Back in the previous century, Standard players had this saying that Ponza only ever won the mirror, and even then only if it went first.

It wasn’t true in Standard then, it isn’t true in Modern now, and this unfortunate record probably doesn’t reflect the true matchup between Ponza and Tron either. In fact, the chance of getting such a result from a fifty-fifty matchup is 25%: each match is a coin flip, and 0.5 × 0.5 = 0.25.

This is the crux. When we’re checking whether deck A or deck B is at an advantage in a battle between the two, the null hypothesis has to be that neither is. One way to control for variance is to calculate the probability of an observed match result or better occurring despite an assumed even matchup. Then, if said probability clocks in at 5% or lower, the observation is commonly considered to be of statistical significance.
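To make the idea concrete, here is a minimal sketch of that calculation in Python. It assumes an exact one-sided binomial test against a fifty-fifty matchup, which is my reading of "an observed match result or better"; the function name `tail_probability` is made up for illustration and isn't taken from the spreadsheet.

```python
from math import comb

def tail_probability(wins: int, losses: int) -> float:
    """Chance of winning at least `wins` of wins + losses matches
    when every match is a 50/50 coin flip (the null hypothesis)."""
    n = wins + losses
    # Count all outcomes with `wins` or more wins, out of 2**n equally likely ones.
    favorable = sum(comb(n, k) for k in range(wins, n + 1))
    return favorable / 2 ** n

# Tron's 2-0 against Ponza, seen from Tron's side of the table.
print(f"{tail_probability(2, 0):.2%}")
```

A record counts as significant here when the returned value is 0.05 or lower.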

Frank Karsten already wrote an article on the topic because of course he did. If you’re interested in the approach, its problems, and alternatives, I encourage you to read up on what the good doctor prescribed.

My own expertise lies more in making things look nice, so here we go:

A chart explaining statistical significance in number of wins and losses.


The blue area encompasses combinations of wins and losses that have a greater than 5% chance of coming up in an even matchup. By this standard, a 2-0 record isn’t significant, and as you can see, a 3-0 and a 4-0 record fall within the “insignificant” territory too. A 5-0 record is where it begins to get interesting. Likewise, 7-1 or better is significant, but not 6-1 or 7-2. Of course, if a record of 16-7 bears statistical significance, then a flipped record of 7-16 does as well.
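The records named above can be checked by hand. The following sketch assumes the same exact one-sided binomial test with a 5% cutoff; `tail_probability` and `is_significant` are illustrative names, not anything from the original data sheet.

```python
from math import comb

ALPHA = 0.05  # the 5% threshold described in the article

def tail_probability(wins: int, losses: int) -> float:
    """Chance of at least `wins` wins in wins + losses matches of an even matchup."""
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

def is_significant(wins: int, losses: int, alpha: float = ALPHA) -> bool:
    # A flipped record is exactly as significant, so always test the larger side.
    better, worse = max(wins, losses), min(wins, losses)
    return tail_probability(better, worse) <= alpha

records = [(2, 0), (3, 0), (4, 0), (5, 0), (6, 1), (7, 1), (7, 2), (16, 7), (7, 16)]
for wins, losses in records:
    p = tail_probability(max(wins, losses), min(wins, losses))
    verdict = "significant" if is_significant(wins, losses) else "not significant"
    print(f"{wins}-{losses}: {p:.2%} -> {verdict}")
```

A 4-0 record lands at 6.25%, just outside the cutoff, while 5-0 comes in at 3.125%.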

In the past, the results with which I’ve dealt often didn’t clear such high hurdles. Bilbao’s massive samples, in contrast, afford me the luxury of focusing on significant records exclusively. This still doesn’t mean that all of the following represent unassailable facts. When testing a lot of matchups this way, you’re liable to get a false positive at some point. So let’s play a game here: Whoever finds a “green jelly bean” first and mentions it in a comment wins.
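How liable? A rough sketch, assuming every tested matchup is truly even, that the tests are independent, and that each one has exactly a 5% chance of a false alarm. None of this holds perfectly for real data (exact tests on small samples trigger a bit less often than 5%), and the matchup counts below are hypothetical, not the number actually tested in Bilbao.

```python
def chance_of_false_positive(num_matchups: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `num_matchups` truly even matchups
    looks significant purely by luck, under the assumptions stated above."""
    return 1 - (1 - alpha) ** num_matchups

# Hypothetical numbers of tested matchups, for illustration only.
for m in (1, 10, 20, 50):
    print(f"{m:>2} matchups: {chance_of_false_positive(m):.1%}")
```

The more matchups go through the test, the closer the chance of at least one green jelly bean creeps toward certainty.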

Let’s Get Ready to Rumble

I’ll list all significant matchups for each archetype, beginning with Bilbao’s most played deck and moving down the popularity ranks from there. More pilots complete more matches, but this doesn’t necessarily yield more evidence of significantly lopsided matchups. You’ll find, for example, that Tron, Shadow, and Valakut sport way fewer of those than similarly popular archetypes.

Also be aware that the listed matchup results don’t reflect the totality of a deck’s performance. For that, refer to this handy table, which shows the overall win rate (including mirror matches) for the 19 decks with the highest number of followers in Bilbao:

A table of played and win-rates by deck.

Izzet Phoenix went:

  • 10-0 versus Elves (100%)
  • 12-3 versus Storm (80%)
  • 55-23 versus Hardened Scales (70.5%)
  • 43-66 versus Dredge (39.4%)
  • 17-38 versus Whir Prison (30.9%)
  • 7-21 versus Hollow One (25%)

I sorted these by win percentages (in parentheses), but note that winning 12 or more out of 15 in an even matchup is about one hundred times more likely than winning 55 or more out of 78. Both qualify, though, with probabilities below 2% and below 0.02%, respectively.
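The comparison between the 12-3 and the 55-23 record can be reproduced with a few lines. This is a sketch under the same assumption as before, an exact one-sided binomial test against an even matchup, with `tail_probability` as an illustrative helper name.

```python
from math import comb

def tail_probability(wins: int, losses: int) -> float:
    """Chance of at least `wins` wins in wins + losses matches of an even matchup."""
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

p_storm = tail_probability(12, 3)    # Izzet Phoenix versus Storm
p_scales = tail_probability(55, 23)  # Izzet Phoenix versus Hardened Scales

print(f"12-3:  {p_storm:.4%}")
print(f"55-23: {p_scales:.4%}")
print(f"ratio: {p_storm / p_scales:.0f}")
```

The lower win percentage carries far more weight here simply because it rests on five times as many matches.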

Also notable is how Izzet Phoenix’s favorable matchups compare with the unfavorable. Elves and Storm are niche decks nowadays, whereas Dredge, Whir, and Hollow One all were among the 17 most popular archetypes in Bilbao. This suggests people may be adapting to Izzet Phoenix’s dominant position within the metagame.

Some decks have shown themselves capable of adapting too. At GP Los Angeles, Izzet Phoenix had won 11 of 15 recorded matches against Red-Green Valakut and beat Spirits 16-4. In Bilbao, however, Izzet Phoenix went 36-28 against Spirits (a win rate of 56% instead of 80%) and 25-27 against Red-Green Valakut (48% instead of 73%). None of the new information allows us to be confident that either deck is at any advantage at all, whereas previously we would have at least declared Spirits an underdog.
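For the curious, the same check applied to the four records in this paragraph, all from Izzet Phoenix's side. Again a sketch that assumes an exact one-sided binomial test with a 5% cutoff; the helper name and the dictionary labels are mine.

```python
from math import comb

def tail_probability(wins: int, losses: int) -> float:
    """Chance of at least `wins` wins in wins + losses matches of an even matchup."""
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

# Izzet Phoenix's (wins, losses) as quoted in the text.
records = {
    "Spirits, GP Los Angeles": (16, 4),
    "Spirits, GP Bilbao": (36, 28),
    "Red-Green Valakut, GP Los Angeles": (11, 4),
    "Red-Green Valakut, GP Bilbao": (25, 27),
}

for label, (wins, losses) in records.items():
    p = tail_probability(wins, losses)
    verdict = "significant" if p <= 0.05 else "not significant"
    print(f"{label}: {wins}-{losses}, {p:.2%} -> {verdict}")
```

Only the 16-4 against Spirits clears the bar, which is why Spirits alone would have been declared an underdog before Bilbao.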

Modern hasn’t turned the corner yet, and we certainly can’t turn the page on Izzet Phoenix. The deck still won 55% of its 1,406 non-mirror matches in Bilbao. An actual 50% deck wins this much or more with a probability of only 0.007%. So, in true 007 fashion, the blue and red banner bird is still going to be the strongest weapon in her majesty’s arsenal when the best players fly to London later this month. But we have some cause for optimism that it won’t be all Phoenix all the way.

Burn went:

  • 6-0 versus Storm (100%)
  • 20-33 versus Shadow (37.7%)
  • 11-22 versus Affinity (33.3%)
  • 6-17 versus Whir Prison (26.1%)
  • 9-33 versus Humans (21.4%)
  • 2-11 versus Ad Nauseam (15.4%)
  • 1-8 versus Eldrazi and Taxes (11.1%)

Burn’s performance stopped being embarrassing long ago. Now it’s simply depressing. And these are just the statistically significant results under the criteria outlined above. Lowering our standards doesn’t help Burn’s case either. Sure, the deck won nine of twelve matches against Infect, but more significant is that it went 16-27 against Dredge.

The Rock went:

  • 9-21 versus Humans (30%)
  • 8-19 versus Hardened Scales (29.6%)
  • 5-14 versus Hollow One (26.3%)
  • 6-19 versus Affinity (24%)

The Rock, on the other hand, did embarrass itself in Bilbao. It sported the second lowest win rate overall, only undercut by Mr. Burn’s. Traditionally, the midrange strategy was known as an all-rounder, but here this meant losing all around.

I went looking for favorable matchups in the less significant territory and found some hope in a 14-6 record versus Ad Nauseam. Alas, then I found a 36-51 record versus Izzet Phoenix. Fortunately, Rock fared much better than Jund in the battle against Tron. Unfortunately, it still lost 24 of 41 matches versus Urza lands.

Tron itself, the fourth most represented deck in Bilbao, went:

  • 11-1 versus Jund (91.7%)
  • 3-12 versus Whir Prison (20%)

It didn’t benefit or suffer from many extreme matchups, proving once again that a turn-3 Karn Liberated can beat just about anything.

Shadow went:

  • 5-0 versus Blue Valakut (100%)
  • 33-20 versus Burn (62.3%)

No single negative matchup was negative enough to reach statistical significance, but elsewhere Shadow accumulated 18 more total losses than wins. This itself isn’t statistically significant either, but it leaves a maximally middling overall performance.

Dredge went:

  • 9-0 versus Jeskai Control (100%)
  • 18-7 versus White-Blue Control (72%)
  • 66-43 versus Izzet Phoenix (60.6%)
  • 1-7 versus Ad Nauseam (12.5%)

Dredge beating White-Blue as well as the kindred Jeskai lends extra credibility to the notion that it’s hard to control the undead. And it seems notably harder without Terminus.

Humans went:

  • 12-2 versus Ad Nauseam (85.7%)
  • 33-9 versus Burn (78.6%)
  • 21-9 versus Rock (70%)
  • 6-18 versus Affinity (25%)

White-Blue Control went:

  • 6-0 versus Jund (100%)
  • 7-18 versus Dredge (28%)

Spirits went:

  • 5-0 versus Mill (100%)
  • 3-10 versus Hardened Scales (23.1%)
  • 2-9 versus Affinity (18.2%)

Hardened Scales went:

  • 6-0 versus Hollow One (100%)
  • 7-1 versus Jund (87.5%)
  • 10-3 versus Spirits (76.9%)
  • 19-8 versus Rock (70.4%)
  • 23-55 versus Izzet Phoenix (29.5%)
  • 3-13 versus Ad Nauseam (18.8%)

A couple of weeks ago, I reported that Hardened Scales beat White-Blue Control 5-1 in recorded matches at GP Los Angeles. This caused widespread outcry about how unlucky and unlikely such a result was. Well, in Bilbao Hardened Scales dutifully lost 14 of 20 matches against White-Blue, and indeed this bears more statistical significance than the earlier 5-1. Feel free to feel vindicated.
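A quick sketch of how the two records compare, assuming the same exact one-sided binomial test against an even matchup as elsewhere in this article. The helper name is illustrative, and both records are taken from the winning side's point of view.

```python
from math import comb

def tail_probability(wins: int, losses: int) -> float:
    """Chance of at least `wins` wins in wins + losses matches of an even matchup."""
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

p_los_angeles = tail_probability(5, 1)  # Hardened Scales over White-Blue, GP Los Angeles
p_bilbao = tail_probability(14, 6)      # White-Blue over Hardened Scales, GP Bilbao

print(f"5-1 in Los Angeles: {p_los_angeles:.2%}")
print(f"14-6 in Bilbao:     {p_bilbao:.2%}")
```

The Bilbao record is the less likely fluke of the two, but it still sits a little above the 5% line, which is why it doesn't show up in the lists above.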

Red-Green Valakut went…

  • …about even, both overall as well as in every single matchup.

Affinity went:

  • 7-1 versus Jund (87.5%)
  • 9-2 versus Spirits (81.8%)
  • 19-6 versus Rock (76%)
  • 18-6 versus Humans (75%)
  • 22-11 versus Burn (66.7%)

Whir Prison went:

  • 5-0 versus Eldrazi (100%)
  • 12-3 versus Tron (80%)
  • 17-6 versus Burn (73.9%)
  • 38-17 versus Izzet Phoenix (69.1%)

This marks the end of the exercise, because going through further decks’ matchups would only repeat information already listed above. And it’s a sweet note upon which to end too, because no deck did better than Whir Prison in Bilbao, and no other deck did so well against three of the top five archetypes.

Actually, let’s disregard good practices for a second and review some more of Whir’s full two-day performance. The deck went:

  • 3-0 versus Infect
  • 2-0 versus Martyr of Sands
  • 1-0 versus 8 Rack
  • 1-0 versus Amulet Titan
  • 1-0 versus Elves
  • 1-0 versus Faeries
  • 1-0 versus Grishoalbrand
  • 1-0 versus Living End
  • 1-0 versus Merfolk
  • 1-0 versus Mill
  • 1-0 versus Zoo
  • 5-1 versus Hollow One
  • 4-1 versus Red Phoenix
  • 3-1 versus Shadow Zoo
  • 8-4 versus Dredge
  • 4-2 versus White-Blue Control
  • 4-2 versus Hardened Scales
  • 3-2 versus Jund
  • 10-8 versus Spirits
  • 8-7 versus Affinity
  • 6-6 versus Humans
  • 2-2 versus Ad Nauseam
  • 2-2 versus Red-Green Valakut
  • 1-1 versus Death and Taxes
  • 7-10 versus Rock
  • 8-12 versus Shadow
  • 0-1 versus Devoted Vizier
  • 0-1 versus Eldrazi and Taxes
  • 0-1 versus Esper Vengeance
  • 0-2 versus Storm
  • 0-2 versus Sultai Reclamation
  • 0-3 versus Jeskai Control
  • 0-3 versus Red Prison

This includes every even remotely real Modern deck, and all of it may as well boil down to variance. But it’s an impressive spread nonetheless, with winning records in 20 and losing records in nine matchups. We’re bound to get more reliable information soon, as more people pick up Whir Prison and complete more matches. This in turn may even put a spoke into the whirling wheels here, because more people will be familiar with Whir and may have the wherewithal to beat it.


A chart of significant matchups between GP Bilbao's 17 most popular decks.

