At any given time, two or more players I respect are likely to hold strong views about what is “the best deck” in a particular format. If one person tells me Mythic is the best deck, while another tells me UW Control is the best deck, they can’t both be right. If each were asked to estimate the probability that their deck would win a tournament, supposing a computer could simulate that tournament, each would assign a high or relatively high chance of victory. One or both of them (likely both) is overstating how strong their deck is in the format. As I said, they can’t both be right about the “best” deck, but they can both be wrong. Even if one of them has correctly identified the best deck, they have likely overestimated even that deck’s effectiveness. In this article, I want to explore how and why this tends to happen. What makes players play extensively in a format and then believe they have found the best deck?
I’ve found that some of the most helpful tools for understanding why humans act the way they act come from a body of research in psychology known as “heuristics and biases.” If you can develop a good grasp of heuristics and biases, along with probability theory and decision theory (two important fields not discussed in this article; stay tuned, though), you’ll understand more about human decision-making than any of our ancestors did. I’ll be focusing on the field of heuristics and biases in this article, but I can only provide a quick glimpse at a few of the many phenomena that make up that body of research.
If you’re one of the people who loves to search for and then declare you have found “the best deck in the format,” you probably should read this article twice. Gerry Thompson should probably read this article three or four times.
A heuristic is a “shortcut” the brain uses to solve a problem. As I discussed in my article on Optimization by Proxy, the brain doesn’t have the resources to precisely calculate everything that is going on around it. Calculating things like probability or determining exactly which information to recall is a very complex problem. When one of the heuristics causes a predictable inaccuracy in our thinking, we call that resulting inaccuracy a bias. Let’s take a look at some specific examples, and apply them to the question “why does everyone think they have the best deck?”
The availability heuristic (and resulting bias) was first identified by Tversky and Kahneman in their 1973 paper “Availability: A heuristic for judging frequency and probability.” That paper describes a study in which subjects were asked whether the letter R was more likely to appear in the first position in an English word (as in “rake”) or the third position (as in “core”). A large majority of subjects responded that the first position was more likely, even though in reality the third position is more likely. What people were doing was thinking of examples of each type of word and using the availability of examples to determine the expected frequency of those types of words. Tversky and Kahneman called this the “availability heuristic.” The availability heuristic has been demonstrated in many studies since 1973, and you can Google the topic to learn more if you’re curious.
Another example of the availability heuristic at work is that imagining a particular event (i.e. “picturing it in our heads”) tends to make us believe that event is more likely than other similar events we did not imagine. From Wikipedia:
“In one experiment that occurred before the 1976 US Presidential election, participants were asked simply to imagine Gerald Ford winning the upcoming election. Those who were asked to do this subsequently viewed Ford as being significantly more likely to win the upcoming election, and vice versa for participants that had been asked to imagine Jimmy Carter winning.”
[citing Carroll, J. S. (1978). The effect of imagining an event on expectations for the event: An interpretation in terms of the availability heuristic. Journal of Experimental Social Psychology 14, 88-96.]
Other studies have shown people are more likely to insure against a specific risk than obtain a broader type of insurance that includes that very same specific risk. This is likely because when offered the specific risk insurance, people visualize the event (such as a plane hijacking or earthquake) and it thus seems more likely to occur.
How does this impact why people think their deck is the best? Well, to answer the question “Which deck is the best?” players bring to mind examples of decks that have performed well. They also might imagine matchups or even tournament results. The research I discussed above predicts that people will substitute the availability of a certain outcome (say, “Mythic wins”) for the probability (i.e. expected frequency) of that same outcome.
Imagine Mythic and Naya are both good decks in a format. They both have some good matchups, and no unwinnable matchups. If I have played more games with Mythic than Naya, examples of the Mythic deck winning likely spring to my mind more readily than examples of Naya deck winning. On the other hand, if Tom Ross has played more games with Naya than Mythic, examples of the Naya deck beating the other popular decks likely spring to his mind. In this context, I am likely to overestimate Mythic’s chances of winning while Tom is likely to overestimate Naya’s chances of winning. Thus, we both might honestly believe we have “the best deck.”
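To make the Mythic and Naya example concrete, here is a toy sketch of what happens if a player substitutes “how many wins can I remember” for “how likely is this deck to win.” The 55% win rate and the game counts are numbers I made up for illustration, and `availability_share` is a hypothetical helper, not a model taken from the research.

```python
def availability_share(games_played, true_win_rate):
    """Toy model: each deck's share of all remembered wins, which the
    player (incorrectly) treats as that deck's chance of winning."""
    remembered_wins = {
        deck: games * true_win_rate[deck] for deck, games in games_played.items()
    }
    total = sum(remembered_wins.values())
    return {deck: wins / total for deck, wins in remembered_wins.items()}

# Assumption: both decks are equally good.
true_win_rate = {"Mythic": 0.55, "Naya": 0.55}

# Assumption: I have mostly played Mythic, Tom has mostly played Naya.
my_games = {"Mythic": 100, "Naya": 20}
toms_games = {"Mythic": 20, "Naya": 100}

mine = availability_share(my_games, true_win_rate)
toms = availability_share(toms_games, true_win_rate)

print(f"My view:    Mythic {mine['Mythic']:.2f}, Naya {mine['Naya']:.2f}")
print(f"Tom's view: Mythic {toms['Mythic']:.2f}, Naya {toms['Naya']:.2f}")
```

Under these assumed numbers, the two decks are identical in strength, yet each of us ends up with roughly five remembered wins for our own deck for every one we remember for the other.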
Fallacies of Compression
There are several ways testing can be unreliable, yet seem reliable. One way this is possible, which I often discuss, is playing against unrepresentative decks or opponents. Another is perhaps even more subtle, but just as dangerous. Sometimes a deck changes over time, and many versions of the deck are tested. Looking back on our testing, we develop one idea of how effective “the deck” has been, even though we haven’t tested one deck; we have tested many. Jund with 2 [card]Sarkhan the Mad[/card] is similar to Jund without any Sarkhan the Mad, but thinking of both as simply “the Jund deck” can lead to bias, as explained below. I’ll start with an example of this effect in a made-up game: Modified Rock-Paper-Scissors.
Imagine the classic game Rock-Paper-Scissors if players were allowed a modification or two to the three existing choices. Players can choose to play Rock, Paper, or Scissors, with or without one of these modifications. Choices are made before the tournament, just like a deck choice in Magic, and they must remain the same for the entire tournament, just like in Magic.
Here are two of the possible modifications, those that modify “Rock,” along with what impact each modification has:
Spiked Rock: Rock now beats Paper 55% of the time, loses to Scissors 55% of the time.
Nerf-Spiked Rock: performs exactly the same as an original, unmodified Rock (but looks cooler.)
Let’s start with preparation for the tournament, assuming players don’t know what the traditional RPS matchups are or what my proposed modifications do to those matchups. Players might set out to test the matchups prior to the tournament.
Test matchup 1:
Rock vs. Paper. Players learn that Paper is a heavy favorite.
The Rock player now tries a modification to see if he can swing the matchup.
Test matchup 2:
(Modified) Spiked Rock vs. Paper. Players learn that Rock now has an advantage. Sweet.
Satisfied with the Paper matchup, the Rock player moves on to test against Scissors.
Test matchup 3:
Spiked Rock vs. Scissors. Players learn that Scissors has the advantage (see the description of Spiked Rock above for why they learn this).
The Rock player has an idea for how he might change the Scissors matchup: he’ll use the Nerf-Spiked modification. Let’s assume the player modifies his Rock in an apparently creative way, by blunting the spikes he has added using Nerf material. The effect of this modification is really just removing the “Spiked Rock” modification, returning his Rock to unmodified Rock, but of course the player doesn’t know he has merely “returned” to unmodified.
Test matchup 4:
Nerf-Spiked Rock vs. Scissors. Players learn that Rock crushes Scissors.
Players conclude Nerf-Spiked Rock “beats everything” and is a dominant strategy.
I hope you can see the flawed step, since that’s the point of this otherwise silly exercise. Just in case you missed it, the players have “solved” a matchup and then moved on. When subsequent modifications of the deck (err, the Rock) changed the earlier tested matchup, they didn’t go back and test it again; they just assumed the obtained result, “this deck beats Paper,” wouldn’t change. In Magic testing, this happens all the time. Spotting this step can be difficult and ruling it out can be time-consuming. Even my simplified example from Rock-Paper-Scissors got complicated very quickly, and we didn’t even discuss the possibility that the Paper and Scissors players could be modifying their tools at the same time as the Rock player. Modification by other players challenges the assumption that your deck can change while theirs stays static, and it adds another source of possible error. What if Paper modifies to become Cheesecloth Paper that always beats Spiked Rock but always loses to Scissors?
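The flawed step can be written out as a short sketch. Spiked Rock’s 55% and 45% figures come from the modification rules above; the 10% and 90% figures for plain Rock are my own stand-ins for “heavy favorite” and “crushes,” since the example never puts numbers on them. The function names are hypothetical.

```python
# Win probability for each Rock version (row) against each opposing choice.
# Spiked Rock's numbers come from the rules above. The 0.10 and 0.90 values
# are assumptions standing in for "heavy favorite" and "crushes".
WIN_RATE = {
    "Rock":             {"Paper": 0.10, "Scissors": 0.90},
    "Spiked Rock":      {"Paper": 0.55, "Scissors": 0.45},
    "Nerf-Spiked Rock": {"Paper": 0.10, "Scissors": 0.90},  # same as plain Rock
}

# The testing log from the walk-through: (version tested, opponent), in order.
TEST_LOG = [
    ("Rock", "Paper"),
    ("Spiked Rock", "Paper"),
    ("Spiked Rock", "Scissors"),
    ("Nerf-Spiked Rock", "Scissors"),
]

def compressed_view(log):
    """What the tester remembers: the last result seen in each matchup,
    credited to "the Rock deck" no matter which version produced it."""
    remembered = {}
    for version, opponent in log:
        remembered[opponent] = WIN_RATE[version][opponent]
    return remembered

def actual_view(final_version):
    """What the version actually brought to the tournament does."""
    return dict(WIN_RATE[final_version])

believed = compressed_view(TEST_LOG)
actual = actual_view("Nerf-Spiked Rock")

for opponent in ("Paper", "Scissors"):
    print(f"vs {opponent}: believed {believed[opponent]:.2f}, "
          f"actual {actual[opponent]:.2f}")
```

The believed Paper matchup is the one Spiked Rock earned, while the actual Paper matchup belongs to the version that was registered. Only by re-testing every earlier matchup after each change would the two views agree.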
There are enough possibilities in modified RPS, and especially in Magic, to allow any “deck” to feel like the “best deck” if you let your testing get biased in that deck’s direction. Trying to “solve the metagame” from the perspective of one deck does just this. By assuming static opposing decks (e.g., unmodified Paper) while wielding a dynamic deck yourself (e.g., a sometimes modified and sometimes unmodified Rock), you are biasing your testing towards feeling like your deck is a “world-beater.” In reality, you haven’t been testing one deck; you’ve tested several. The shortcut you’re using is categorizing all the various versions as one deck. A bias thus emerges in favor of your deck: for each matchup, you consider the versions that are favorable in that matchup to the exclusion of the versions that are unfavorable. You check opposing decks off the list, all the while gaining examples of “favorable matchups.” Additionally, you tend to be “done” with a matchup and ready to move on only when you’re satisfied you can win that matchup. Thus you always end each matchup on a winning note, and the recency effect gets added into the mix. The recency effect is another bias: more recent events tend to be easier to recall, and thus more available, than prior events. When you recall how a particular matchup played out, you recall winning at the end more easily than losing at the beginning. Hopefully you can begin to see how the biases are adding up and interacting.
The availability bias is compounded by our tendency to overestimate the accuracy of our predictions. This well-established tendency is called the overconfidence effect. Subjects, when asked to make estimates regarding complex or unfamiliar topics and then predict their own accuracy, believe themselves to be more accurate than they really are. A classic example is that subjects asked to spell difficult words and then give their level of certainty about the spellings were correct about 80% of the time on words for which they reported being “100% certain.” (Adams, P. A. & Adams, J. K. (1960). Confidence in the recognition and reproduction of words difficult to spell. American Journal of Psychology, 73, pp. 544-552.)
The effect occurs at lower levels of certainty as well.
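One way to check yourself for this effect is to write down how confident you are each time you make a prediction, then later compare your stated confidence with how often you were right. Here is a minimal sketch of that bookkeeping. The log entries are invented for illustration (the “100% certain” row is set up to mirror the roughly 80% figure above), and `calibration_table` is a hypothetical helper.

```python
from collections import defaultdict

def calibration_table(predictions):
    """predictions: iterable of (stated_confidence, was_correct) pairs.
    Returns {stated_confidence: observed_accuracy}."""
    buckets = defaultdict(list)
    for confidence, was_correct in predictions:
        buckets[confidence].append(was_correct)
    return {c: sum(hits) / len(hits) for c, hits in sorted(buckets.items())}

# Invented log of predictions, for illustration only (not real data).
log = (
    [(1.0, True)] * 4 + [(1.0, False)] * 1    # "100% certain" five times
    + [(0.7, True)] * 2 + [(0.7, False)] * 2  # "70% sure" four times
)

table = calibration_table(log)
for confidence, accuracy in table.items():
    print(f"said {confidence:.0%} sure, was right {accuracy:.0%} of the time")
```

A well-calibrated player would see the two columns roughly match. A gap like the one in this made-up log is the overconfidence effect showing up in your own record.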
This means that even if I know I’m only giving my best estimate as to which deck is the best, I’ll tend to be overconfident about how accurate my estimate is. This leads to the certainty with which some people will claim to know the best deck. Remember that people who are “certain” of the answer to complex problems are wrong far more often than they think.
Our own views are the ones we must most closely scrutinize
Understanding the research does not make me immune to the phenomena I discussed above. After I won a PTQ for Amsterdam earlier this year, I wrote in my PTQ report that Mythic was the best deck in Standard. Regarding Block Constructed, at Pro Tour San Juan I told Evan Erwin in an interview that all the best decks in the format had [card]Everflowing Chalice[/card]. The actual “best deck,” the New York team’s Beastmaster deck, did not have Chalices, but it didn’t come to my mind since I wasn’t playing it and hadn’t played it yet in the tournament. These are statements I made without proper support, and my confidence in them was greatly influenced by my own biases. The tools I am trying to teach you are of no value (in fact they’re of negative value) if you don’t apply them to yourself at least as often as you apply them to others. Disproving someone else’s argument about what is the “best deck” doesn’t get you additional wins in a tournament if your arguments about your deck are just as flawed. Pointing out the biases others possess while failing to examine your own also doesn’t lead to more wins. Personally, I must make a better effort to correct for all the biases discussed above. Is one deck more readily available to my mind than the others? How confident am I? OK, now let’s correct for overconfidence. Correcting might just mean adjusting my confidence level downward and attempting to obtain a fresh perspective by switching decks for a while. I must avoid having a “pet” deck if I have time to play with multiple decks (if I don’t have time to play many decks, I need to be aware of my tendency to overestimate the effectiveness of my “pet” decks).
Most importantly, I must view disagreements with others as a chance to spot biases and flaws in each of our perspectives, rather than an opportunity to pound my chest and insist that the world is exactly as my mind thinks it is. The world is never exactly as my mind thinks it is.
@mtg_law_etc on Twitter.