What Can Game Theory Tell Us Will Make a Good Constructed Format?

Recently, the current Standard format has come under scrutiny, with various writers asking what makes a Constructed format “good” or how to fix Standard. This discussion was largely prompted by a Tweet from Wizards R&D member Sam Stoddard.

In this article, I’ll offer my perspective on this discussion. I will describe three key characteristics that I consider to be the most important for a healthy Constructed format, along with examples and definitions inspired by my background in game theory. Although I did my PhD thesis on cooperative game theory, I’m well-versed enough in non-cooperative game theory to be heavily influenced by equilibrium concepts when thinking about formats and metagames. I hope that this article can offer some new terminology or more concrete ideas on what makes for a good Constructed format.

What is a Format?

This may look like a silly question, but I think it is important to start with unambiguous definitions to clarify what I mean by certain words. I need to define formats, metagames, and equilibria.

Formats – A format is the combination of (i) a set of possible decks and (ii) a matrix of matchup percentages.

For Standard, the set of possible decks includes the specific 75 used by Matt Severa to win Grand Prix Denver, the same deck with 1 extra Plains, a deck with 59 Mountain, 1 Incendiary Flow, and no sideboard, and so on. Although the set of possible decks would technically be infinite, we can make it finite (yet still extremely large—a topic for another article) by restricting ourselves to decks with no more than 60 cards.

The other aspect of a format, the matchup matrix, describes the probability that any one deck wins a match against any other deck when two experienced players play the matchup. For instance, Severa’s Mardu Vehicles list would defeat the 59 Mountain deck 100% of the time.

Metagames – A metagame is a distribution over the set of possible decks. For instance, in one metagame, 30% of the players could play Matt Severa’s Mardu Vehicles list and 70% of players could play the 59-Mountain deck. Don’t ask me why we would end up in that metagame, but it does qualify as one of the options.

Equilibria – A metagame is called an equilibrium if every deck from that metagame is 50-50 against that metagame as a whole and there is no way to improve. So every deck that is contained with positive probability in an equilibrium metagame has an expected win percentage of 50% against that metagame, and there is no deck that has an expected win percentage of more than 50% against this metagame.

A format can be viewed as a symmetric two-player game in which a single player and the metagame as a whole each have to choose a deck. By the 1928 paper “Zur Theorie der Gesellschaftsspiele” by John von Neumann, who laid the foundation of game theory, an equilibrium in the way I described always exists. This also follows from the more general result in the famous 2-page paper “Equilibrium Points in N-Person Games” by John Nash from 1950. But as you’ll see, a format does not need to have a unique equilibrium; there may be several. (Apparently, when Nash explained his more general result, von Neumann dismissively said, “That’s trivial, you know. That’s just a fixed point theorem.”)

To illustrate this and other concepts, I’ll show five examples. Some are artificial examples used for illustration—others represent (simplifications of) historical or current formats.

Example 1: Eldrazi Winter

The format used for this example is shown in the following table. Numbers are bracketed if they can be inferred from the other entries by symmetry or because they are 50-50 mirror matches.

| | Eldrazi | Infect | Jund |
| --- | --- | --- | --- |
| Eldrazi | (50%) | 60% | 60% |
| Infect | (40%) | (50%) | 50% |
| Jund | (40%) | (50%) | (50%) |

In this simplified format, only three decks are possible. There is a specific 75-card Eldrazi list, a specific 75-card Infect list, and a specific 75-card Jund list, and those are the only three decks allowed. If even single-card tweaks were allowed, then we would already end up with 10,000 possible decks or so. As you can imagine, the matchup matrix for real formats is incredibly large, and that’s why we tend to cluster similar decks together in archetypes, that is, sets of similar decks. The delineation of which deck does or does not belong to a certain archetype always tends to be a bit vague, but that is simply out of linguistic necessity.

Anyway, for this Eldrazi Winter format, there is only one equilibrium metagame: the one with 100% Eldrazi. In any other metagame, you would have a higher than 50% chance to win a match if you simply showed up with Eldrazi, because Eldrazi is 50% in the mirror and 60% against everything else. This is not a great format.
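The equilibrium definition can be checked numerically against the Example 1 matrix. The following is only a minimal sketch: the names `win_rates` and `is_equilibrium` are invented for illustration, and metagame shares are assumed to be given as whole percentages.

```python
# Example 1 matchup matrix in whole percentages:
# WIN_PCT[i][j] is the chance that deck i beats deck j.
DECKS = ["Eldrazi", "Infect", "Jund"]
WIN_PCT = [
    [50, 60, 60],
    [40, 50, 50],
    [40, 50, 50],
]


def win_rates(matrix, metagame):
    """Expected win percentage of each deck against a metagame.

    `metagame` lists one share per deck, in whole percentages summing to 100.
    """
    return [
        sum(share * pct for share, pct in zip(metagame, row)) / 100
        for row in matrix
    ]


def is_equilibrium(matrix, metagame, tol=1e-9):
    """True if no deck beats the metagame and every played deck is 50-50."""
    rates = win_rates(matrix, metagame)
    nobody_above_50 = all(rate <= 50 + tol for rate in rates)
    played_at_50 = all(
        abs(rate - 50) <= tol
        for rate, share in zip(rates, metagame)
        if share > 0
    )
    return nobody_above_50 and played_at_50


print(is_equilibrium(WIN_PCT, [100, 0, 0]))  # all-Eldrazi metagame
print(is_equilibrium(WIN_PCT, [50, 50, 0]))  # half Eldrazi, half Infect
print(win_rates(WIN_PCT, [50, 50, 0]))       # per-deck win rates vs. that mix
```

The all-Eldrazi metagame passes the check, while the half-Eldrazi, half-Infect metagame fails it because Eldrazi scores 55% against that field.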

Example 2: Rock-Paper-Scissors-Lizard-Spock

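| | Rock | Paper | Scissors | Lizard | Spock |
| --- | --- | --- | --- | --- | --- |
| Rock | (50%) | 0% | 100% | 100% | 0% |
| Paper | (100%) | (50%) | 0% | 0% | 100% |
| Scissors | (0%) | (100%) | (50%) | 100% | 0% |
| Lizard | (0%) | (100%) | (0%) | (50%) | 100% |
| Spock | (100%) | (0%) | (100%) | (0%) | (50%) |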

In this format, five decks or strategies are possible, and there is only one equilibrium: A mixed one where each strategy occurs at a 20% share.

Example 3: Great Decks and Anti-Decks

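| | Great Deck A | Great Deck B | Anti-A | Anti-B | Anti-Anti |
| --- | --- | --- | --- | --- | --- |
| Great Deck A | (50%) | 50% | 30% | 80% | 100% |
| Great Deck B | (50%) | (50%) | 80% | 30% | 100% |
| Anti-A | (70%) | (20%) | (50%) | 100% | 40% |
| Anti-B | (20%) | (70%) | (0%) | (50%) | 40% |
| Anti-Anti | (0%) | (0%) | (60%) | (60%) | (50%) |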

In this five-deck format, there is an infinite number of equilibria, but all of them only contain Great Deck A and Great Deck B. It is easy to check that a metagame is an equilibrium if and only if Great Deck A has a metagame share between 40% and 60% and Great Deck B has the remainder. Indeed, if one of the two Great Decks captures more than 60% of the metagame, then a player could get more than 50% against that metagame by picking the corresponding anti-deck.
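The 40%-to-60% range can be double-checked with a small sweep over mixes of the two Great Decks. This is a rough sketch, not a full equilibrium solver: it only looks at metagames made up of Great Deck A and Great Deck B in whole-percent steps, and the function name is made up for illustration. Scores are kept as integers (hundredths of a percent) to avoid rounding issues.

```python
# Example 3 matchup matrix in whole percentages (row deck vs. column deck).
DECKS = ["Great Deck A", "Great Deck B", "Anti-A", "Anti-B", "Anti-Anti"]
WIN_PCT = [
    [50, 50, 30, 80, 100],
    [50, 50, 80, 30, 100],
    [70, 20, 50, 100, 40],
    [20, 70, 0, 50, 40],
    [0, 0, 60, 60, 50],
]


def best_score_vs_great_deck_mix(share_a):
    """Highest score any deck achieves against a field of `share_a` percent
    Great Deck A and the rest Great Deck B, in hundredths of a percent."""
    share_b = 100 - share_a
    return max(share_a * row[0] + share_b * row[1] for row in WIN_PCT)


# The two Great Decks are always exactly 50% against such a mix, so the mix
# is an equilibrium exactly when no deck scores above 50.00% (5000) against it.
equilibrium_shares = [
    share_a
    for share_a in range(101)
    if best_score_vs_great_deck_mix(share_a) <= 5000
]
print(min(equilibrium_shares), max(equilibrium_shares))
```

At the boundaries the corresponding anti-deck is exactly 50% against the field, which is why 40% and 60% themselves still count as equilibria.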

Example 4: Current Standard

| | Emrakul-Ishkanah decks | Copter-Gideon decks | The other options |
| --- | --- | --- | --- |
| Emrakul-Ishkanah decks | (50%) | 50% | 55% |
| Copter-Gideon decks | (50%) | (50%) | 55% |
| The other options | (45%) | (45%) | (50%) |

This is a rather pessimistic model, and I should stress that it is only one way to provide a simplified abstraction of the current Standard. It’s far from fully accurate, but it may resonate with people who gave low marks to the current Standard. From what I’ve heard and read, this is how many players may look at Standard right now.

The above table is based on the view that there are four cards that are much better than the alternative options. Emrakul, the Promised End and Ishkanah, Grafwidow both require you to get card types in your graveyard, and they are the flag-bearers of the Aetherworks Marvel and G/B Delirium archetypes. Smuggler’s Copter and Gideon, Ally of Zendikar, meanwhile, are seen in decks with Archangel Avacyn (U/W Flash) or Scrapheap Scrounger (Mardu Vehicles), but there is a surprising amount of overlap in terms of cards and the way games play out, so I have also grouped them together.

The other options encompass decks built around Metalwork Colossus, Panharmonicon, Prized Amalgam, Electrostatic Pummeler, Torrential Gearhulk, and so on. They exist, but at least in the above abstraction they are simply not strong enough, as none have more than 45% against Emrakul-Ishkanah decks and Copter-Gideon decks.

In the format represented by the above matrix, a metagame is an equilibrium if and only if “The other options” has a share of 0%. Any mix of Emrakul-Ishkanah decks and Copter-Gideon decks would be an equilibrium.

Example 5: Kamigawa/Ravnica Standard

I don’t have a full metagame matrix for this one, but I want to put it in because it is my all-time favorite and because several people mentioned it when Paulo Vitor asked this question.

I may be biased because of nostalgia or because the Kamigawa/Ravnica era was when I got my best premier event results, but I fondly remember that the format (let’s say around March-April 2006) had a wide variety of viable archetypes that all played out differently during the games. In that sense, it reminds me of Modern.

An aside on “solved format” and “best deck”

Although I won’t use the terms “solved format” or “best deck” in the remainder of this article, they are often used in Magic writing. It is not always clear to me what people mean by them, and I may have used them loosely myself. But while I was thinking about equilibria, I figured I might as well offer interpretations of these terms that resonate with me.

I would say that a format is “solved” when the entire matchup matrix, at least for all relevant decks, has been discovered and the metagame has ended up in an equilibrium. With “relevant” decks, I mean that we can disregard dominated ones and that we have consolidated similar-looking decks into adequate archetypes. The process of discovering this matchup matrix tends to be captivating and fun, but when all reasonable decks have been exhausted and the metagame isn’t evolving anymore, then we are at the point where a format is “solved.”

For “best” deck, I would like to make a distinction between the best deck for a format and the best deck for a metagame. The way I look at it, a deck is best for a metagame if, out of all decks in a format, it maximizes the win percentage against that metagame. A deck is the best in the format if it is the most popular in every equilibrium metagame. Each format with only one equilibrium must have at least one best deck in the format.
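To make “best deck for a metagame” concrete, here is a small sketch that uses the Example 3 matrix. The function name `best_decks_for_metagame` and the 70/30 metagame are assumptions chosen purely for illustration; shares are whole percentages.

```python
# Example 3 matchup matrix in whole percentages (row deck vs. column deck).
DECKS = ["Great Deck A", "Great Deck B", "Anti-A", "Anti-B", "Anti-Anti"]
WIN_PCT = [
    [50, 50, 30, 80, 100],
    [50, 50, 80, 30, 100],
    [70, 20, 50, 100, 40],
    [20, 70, 0, 50, 40],
    [0, 0, 60, 60, 50],
]


def best_decks_for_metagame(decks, matrix, metagame):
    """Return the deck(s) with the highest expected win percentage against
    `metagame` (shares in whole percentages), together with that percentage."""
    # Integer scores in hundredths of a percent keep ties exact.
    scores = [
        sum(share * pct for share, pct in zip(metagame, row))
        for row in matrix
    ]
    top = max(scores)
    winners = [deck for deck, score in zip(decks, scores) if score == top]
    return winners, top / 100


# Hypothetical, non-equilibrium metagame: 70% Great Deck A, 30% Great Deck B.
print(best_decks_for_metagame(DECKS, WIN_PCT, [70, 30, 0, 0, 0]))
# Balanced metagame: 50% Great Deck A, 50% Great Deck B.
print(best_decks_for_metagame(DECKS, WIN_PCT, [50, 50, 0, 0, 0]))
```

In the lopsided metagame, Anti-A is the best deck at 55%. In the balanced one, the two Great Decks tie at 50% and nothing does better.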

With all that out of the way, let me define the three key characteristics that I consider to be the most important: a Constructed format has to be diverse, has to be dynamic, and has to be skill-intensive.

Appealing Characteristic 1: A Format is Diverse

My first desire for a healthy format is diversity. Ideally, there should not be a dominant deck—there should be a variety of viable decks that cover the spectrum of different play styles and color preferences. It’s fine to have some decks that are slightly better than others in a certain metagame, but players should always have a deck that fits their play style and that gives them a reasonable shot at winning.

A more rigorous definition could be that a format is diverse if in every equilibrium metagame, no archetype captures a metagame share larger than 25% and each archetype features a deck with at least 48% against this equilibrium metagame. The specific numerical values are up for debate, but I do find both aspects important: no archetype should be dominant, and fringe archetypes should be at least somewhat viable.

The only vague part in my definition is the notion of “archetype.” I mean it broadly as a macro-archetype, but that is still vague. Sam Stoddard recently wrote that when R&D is working on setting up metagames, they have five different buckets that they want to show up: pure aggro, midrange, combo/ramp, control, and disruptive/tempo aggro.

That’s a good start that makes things more concrete, but I would like to see other (macro-)archetypes as well. For example, I enjoy occasionally playing with or against a Prison deck. Think Winter Orb, Opposition, Stasis, Oath of Druids, and so on. I have always appreciated cards that change the rules of the game or that come from a completely different angle. Such cards offer a new game play experience, which is exactly what I like about Magic! I understand that many people don’t like it when their opponent is preventing them from casting spells, but I enjoy a format with a broad spectrum of game-altering cards because they lead to different types of games.

I would also like to break up R&D’s combo/ramp category into a dedicated ramp category (think Metalwork Colossus or Explosive Vegetation), a spells-combo category (think Angel’s Grace and Ad Nauseam), a creature-combo category (think Electrostatic Pummeler and Larger than Life), and a toolbox category (think Survival of the Fittest or Gifts Ungiven). Ideally for me, all of them are competitive.

But no matter whether we have 4, 5, 6, or 10 macro-archetypes and which ones exactly, I do call for the notion that none of them should be dominant (>25% share) or non-competitive (<48% win percentage) in any equilibrium metagame. Let’s see how this works in my five examples.

Example 1 (Eldrazi Winter) – This format is not diverse because the only equilibrium contains only one macro-archetype.

Example 2 (Rock-Paper-Scissors-Lizard-Spock) – If the five options represent the five possible macro-archetypes, then this format is diverse: In the only equilibrium, each archetype has a 20% share and a 50% win percentage. Note that the basic Rock-Paper-Scissors format, where each strategy has an equilibrium share of 33.33%, is not diverse under my 25% cutoff definition.

Example 3 (Great decks and anti-decks) – If we again say that each of the five decks represents one macro-archetype, then this format is not diverse. For instance, the equilibrium of 50% Great Deck A and 50% Great Deck B has more than 25% of any one archetype. Moreover, Anti-A would only have 45% against that equilibrium metagame and thus wouldn’t be viable under my 48% cutoff definition.

Example 4 (current Standard) – This format, or at least the metagame matrix I gave, is not diverse for the same reason as the previous example. This is largely caused by the specific abstraction that I chose. At the start of the format, I expected that all of the other/fringe archetypes would be closer to 49-50% against every equilibrium metagame and that the various Gideon/Copter decks or Ishkanah/Emrakul decks would play out as completely different archetypes, so it looked like this would be a diverse format. I frankly haven’t played enough lately to take a firm stand on whether that is still the case or if Emrakul, Ishkanah, Gideon, and Copter are indeed too powerful and similar. If it’s the latter, then this Standard format is not diverse according to my definition. Getting Emrakul’d every other match can get old.

Example 5 (Kamigawa/Ravnica Standard) – In my memory, this format was diverse. I don’t have proof via a full matchup matrix of that format, but at least the metagame breakdown from Pro Tour Honolulu (whether that was an equilibrium or not) had a lot of different decks, with the most popular one claiming no more than 13%. There was Heartbeat of Spring combo, U/R Urzatron, W/B Ghost Council of Orzhova, Greater Gifts, Enduring Ideal combo, Gruul/Zoo aggro, Wildfire mana denial decks, various other flavors of aggro, midrange, and control, and so on. That metagame table had a lot of puzzling names: “Roxodon Hierarchy,” “Bad Religion,” or “For Whom the Knell Tolls,” which was rather amusing as I was doing research for this article, but many different competitive decks were indeed viable. Just like Modern, you could essentially pick a deck that matched your preferred play style, and most of the time you would be at least close to 50% at any tournament.
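The diversity test used in these examples can be sketched in Python. Two simplifying assumptions apply: every deck is treated as its own macro-archetype, and the equilibrium metagames are supplied by hand rather than computed. The name `is_diverse` and its parameters are hypothetical; the 25% and 48% cutoffs are the ones from my definition.

```python
from fractions import Fraction

# Matchup matrices in whole percentages (row deck vs. column deck).
RPSLS = [
    [50, 0, 100, 100, 0],
    [100, 50, 0, 0, 100],
    [0, 100, 50, 100, 0],
    [0, 100, 0, 50, 100],
    [100, 0, 100, 0, 50],
]
RPS = [
    [50, 0, 100],
    [100, 50, 0],
    [0, 100, 50],
]


def is_diverse(matrix, equilibria,
               max_share=Fraction(25, 100), min_win=Fraction(48, 100)):
    """Check the two diversity conditions on the given equilibrium metagames.

    Each metagame is a list of Fraction shares summing to 1. Returns False as
    soon as one archetype is too popular or one archetype is not competitive.
    """
    for metagame in equilibria:
        if max(metagame) > max_share:
            return False  # some archetype is dominant
        for row in matrix:
            win = sum(
                share * Fraction(pct, 100)
                for share, pct in zip(metagame, row)
            )
            if win < min_win:
                return False  # some archetype is not viable
    return True


uniform_5 = [Fraction(1, 5)] * 5
uniform_3 = [Fraction(1, 3)] * 3
print(is_diverse(RPSLS, [uniform_5]))  # Example 2
print(is_diverse(RPS, [uniform_3]))    # basic Rock-Paper-Scissors
```

For a format with infinitely many equilibria, such as Example 3, a single failing equilibrium is enough to show that the format is not diverse.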

Appealing Characteristic 2: A Format is Dynamic

My second desire for a healthy format is that it is dynamic. Ideally, the metagame evolves, players’ choices are in flux from week to week, and there is a way to prey on every predator. In a dynamic format, a deck that fell out of favor due to having poor matchups in the metagame at that time may have a resurgence later on if the metagame jumps to a more hospitable environment. Likewise, there are answers and hate cards for pretty much everything. As a result, experiences at tournaments remain fresh all the time.

If a metagame is in an equilibrium, then no one has an incentive to adjust, and no evolution may take place. Fortunately, we may never actually reach an equilibrium in practice, and even if we do, players might not realize it. After all, the matchup matrix is never known with perfect certainty and people may stick to their preferred deck irrespective of how well it performs against the rest of the metagame. What I often see is that people try to beat the two best-performing decks from last weekend, and a healthy format offers the tools to do so.

Since equilibria preclude evolution, a notion of dynamics shouldn’t refer to them. Instead, I will call a format n-dynamic for some integer n if for every combination of n different decks, there exists another deck that is at least 60% to win against the metagame composed of equal amounts of those n decks.

This 60% cutoff point is somewhat arbitrary, but my definition aims to capture the notion that there is a way to attack the top n decks in a metagame. In practice, I would desire both 1-dynamics and 2-dynamics. But let’s see it in some examples.

Example 1 (Eldrazi Winter) – This format is not n-dynamic for any n, as can be shown via counterexamples. Against an Eldrazi opponent, there simply is no deck with at least 60% to win, so the format cannot be 1-dynamic. Against an opponent who randomly chooses between Eldrazi and Infect, Jund only has 45% to win, which doesn’t come close to clearing the 60% bar, and thus the format is not 2-dynamic. Finally, there is no fourth deck, so the format cannot be 3-dynamic, although that isn’t a particularly useful notion in this context.

Example 2 (Rock-Paper-Scissors-Lizard-Spock) – Against any particular deck, there is a way to beat it 100%, so the format is clearly 1-dynamic. As a counterexample for 2-dynamics, take an opponent who randomizes 50-50 between Rock and Paper. Then there is no other deck with a better-than-50% win rate: Scissors, Lizard, and Spock each beat one of the two and lose to the other, for exactly 50%. Paper would work, but that’s not another deck. You don’t get fresh experiences if you play one weekend against a field of 50% Paper and the next weekend against a field of 100% Paper, and that’s what I had in mind when I defined n-dynamics. A similar observation would apply to basic Rock-Paper-Scissors.

Example 3 (Great decks and anti-decks) – This format is easily checked to be 1-dynamic, but it is not 2-dynamic because there is no way to beat, say, 50% Great Deck A and 50% Great Deck B. One Great Deck is easy to beat with an Anti-deck if you ignore everything else, but an Anti-deck is still not able to attack a properly balanced metagame of Great Decks. This could arise in practice when a Magic format only has super-specific answers like Leyline of the Void or Stony Silence, rather than more flexible answers like Putrefy or Mana Leak. This is why I would argue, conceptually, for a mix of both 1-dynamics and 2-dynamics.

Example 4 (current Standard) – This format is not dynamic for any n because there is no way to get 60% against even one deck, let alone multiples. Although the metagame matrix I used is an abstraction, it resonates with BBD’s notion that there are too few effective reactive cards. Things would likely be different if there were cards like Rest in Peace or Relic of Progenitus to fight back against players who collect card types in their graveyard; then you might get 60% or more against deck combinations with Emrakul. A Bribery effect that would allow you to cast an Emrakul from an opponent’s deck or a Pithing Needle against Aetherworks Marvel or Smuggler’s Copter would also help. Given the way it is, Standard might be better if Emrakul were banned, but I dislike bans unless absolutely necessary because it’s frustrating for players who invested time and energy into an Emrakul deck. And I don’t think it’s absolutely necessary.

Example 5 (Kamigawa/Ravnica Standard) – In my memory, this format featured plenty of lopsided matchups. The Howling Mine/Ebony Owl Netsuke deck, for instance, was very strong against durdly midrange decks but had almost no way to beat a fast aggro deck. That’s a good thing for dynamics, because it means that there are good counters to prevent a deck from becoming dominant. I also remember that Kamigawa/Ravnica featured a good mix of specific hate cards and flexible answers (Blood Moon, Cranial Extraction, Guerrilla Tactics, Wrath of God, Mana Leak, Putrefy, etc.) for both 1-dynamics and 2-dynamics. With these answers, it also felt like many games had built-in comeback mechanics, but that fits more with my third characteristic.
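The n-dynamics checks for Examples 1 to 3 can be reproduced with a brute-force sketch over all combinations of n decks. The function name `is_n_dynamic` is made up for illustration, and returning False when n is at least the number of decks is a convention assumed here (there is no “other” deck left to attack with).

```python
from itertools import combinations

# Matchup matrices in whole percentages (row deck vs. column deck).
ELDRAZI_WINTER = [
    [50, 60, 60],
    [40, 50, 50],
    [40, 50, 50],
]
RPSLS = [
    [50, 0, 100, 100, 0],
    [100, 50, 0, 0, 100],
    [0, 100, 50, 100, 0],
    [0, 100, 0, 50, 100],
    [100, 0, 100, 0, 50],
]
GREAT_DECKS = [
    [50, 50, 30, 80, 100],
    [50, 50, 80, 30, 100],
    [70, 20, 50, 100, 40],
    [20, 70, 0, 50, 40],
    [0, 0, 60, 60, 50],
]


def is_n_dynamic(matrix, n, cutoff=60):
    """True if every even mix of n different decks can be attacked by some
    deck outside the mix that wins at least `cutoff` percent against it."""
    decks = range(len(matrix))
    if n >= len(matrix):
        return False  # assumed convention: no deck outside the mix exists
    for targets in combinations(decks, n):
        # The average win percentage against the mix is sum / n; comparing
        # the sum with cutoff * n keeps everything in integers.
        has_predator = any(
            sum(matrix[challenger][t] for t in targets) >= cutoff * n
            for challenger in decks
            if challenger not in targets
        )
        if not has_predator:
            return False
    return True


for name, matrix in [("Eldrazi Winter", ELDRAZI_WINTER),
                     ("RPSLS", RPSLS),
                     ("Great decks", GREAT_DECKS)]:
    print(name, is_n_dynamic(matrix, 1), is_n_dynamic(matrix, 2))
```

Under these assumptions, Eldrazi Winter fails both checks, while Rock-Paper-Scissors-Lizard-Spock and the Great Decks format pass the 1-dynamic check but fail the 2-dynamic one, matching the discussion above.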

Appealing Characteristic 3: Gameplay is Skill-Intensive and Fun

My third and final desire is that gameplay is skill-intensive and fun. You can achieve both when there are choices and your decisions matter, and there is preferably some tension created by unknown information about opponents’ hands. Gifts Ungiven is a good example of a card that promotes all of that, so if cards like that show up frequently in equilibrium metagames, then I’m happy. I’ll also enjoy the choices offered by cards like Ponder or Arcbound Ravager, but more straightforward cards like Grasp of Darkness or Galvanic Bombardment aren’t as much fun in this regard to me.

Note that interactivity isn’t much of a factor for me here, at least not in the sense of casting Grasp of Darkness or Galvanic Bombardment to destroy opposing creatures. It’s fine that there is interactivity in gameplay, but to me it’s not a necessity. I’m perfectly happy to play a goldfish matchup between two non-interactive versions of Ad Nauseam and Affinity when there is enough skill and decisions in terms of mulliganing, sequencing, and optimizing your cards. There is also fun for me in the tension of not knowing whether my opponent could win on the next turn or not.

I’m not going to go deep into formally defining notions of skill-intensiveness here, but if you’re interested you could check out the Borm & van der Genugten method that compares a virtual player with full information, an expert player without full information, and a beginner.

In the current Standard, many matchups appear to be sufficiently skill-testing. For example, when I asked players at Grand Prix Madrid about the matchup between U/W Flash and B/G Delirium, they told me that the better player would win most of the time and that it was one of the most interesting matchups that Magic could produce. Likewise, the combat situations that I have encountered while playing Mardu Vehicles have offered me plenty of interesting options and lines of play to navigate through. But spinning the wheel with Aetherworks Marvel is not particularly skill-intensive, even if I don’t actually mind the tension and high variance it creates.

Concluding Thoughts

In this article, I offered my perspective on what makes for a good Constructed format in general: Using equilibrium concepts from game theory, I defined and explored diversity, dynamics, and skill-intensiveness. This had been on my list of possible topics for a while, and I hope there were some interesting ideas or formalisms for you to take away.

Although I have used numbers and math to communicate my ideas, this doesn’t mean that they are “right.” That’s never how it works. If you feel that some of my definitions are markedly different from the way you had understood certain terms, then please share your view in the comments below. Likewise, if you feel that there is another valuable factor (for example, the existence of a budget deck) missing, then jump to the comment section as well.

Either way, may we have great Constructed formats in 2017!