Starts With A Bang

How Probability Misleads Us About The Universe

Sign up for the Starts With a Bang newsletter
Travel the universe with Dr. Ethan Siegel as he answers the biggest questions of all

Just because something’s unlikely doesn’t mean that anything’s wrong.


In our quest to understand the Universe, theoretical physics is perhaps the most powerful tool we have for making predictions. We can measure how the Universe behaves on cosmic scales, gaining information about the laws and rules that it follows as well as its composition. We can then go back to the rules that govern it, throw in the raw ingredients, rewind the clock back as far as we’re willing to go, and simulate what type of Universe we’ll get out.

We can run the simulation as many times as we like, of course, and determine what the odds are of getting a Universe with certain structures or phenomena within it. When we go out to make our measurements, however, we only have the one Universe to observe. Most of the time, our observations align very well with what our simulated predictions indicated we ought to expect. But sometimes, we find phenomena that had extremely low probabilities of occurring. Critics of modern cosmology often point to these examples as proof that we’ve gotten something fundamentally wrong, but that’s generally bad scientific practice. Probabilities can, and often do, easily mislead us about the Universe. Here’s how.

The largest-scale observations in the Universe, from the cosmic microwave background to the cosmic web to galaxy clusters to individual galaxies, all require dark matter to explain what we observe. The large scale structure requires it, but the seeds of that structure, from the Cosmic Microwave Background, require it too. The fluctuations should be random and Gaussian in nature. (CHRIS BLAKE AND SAM MOORFIELD)

Let’s start with a very simple example that’s purely mathematical in nature: flipping a coin. Assuming the coin is perfectly fair, there are only two possible outcomes, heads and tails, each having 50% probability. You run all the simulations, flipping imagined coins as many times as you like (let’s say it’s one billion) and recording the results. You can choose how you divide the different flips up: a billion flips all in a row, 1000 different series of a million flips apiece, or 100 million series of 10 flips apiece.

You could, of course, simply calculate the probabilities exactly, since this problem is simple enough that the math is straightforward. In general, however, most physical processes that we’d simulate are too complicated for that, and you can always reduce your errors further by making a more accurate or comprehensive simulation.
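To make the contrast between calculating and simulating concrete, here is a minimal Python sketch. It is an illustration only: the function name `simulate_exactly_k_heads`, the choice of 10 flips with 5 heads, the 100,000 trials, and the fixed seed are all assumptions made for the example, not anything prescribed by the argument above.

```python
import random
from math import comb

def simulate_exactly_k_heads(n_flips, k, n_trials, seed=0):
    """Estimate P(exactly k heads in n_flips fair flips) by brute-force simulation."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so the run is repeatable
    hits = 0
    for _ in range(n_trials):
        # Each flip comes up heads with probability 0.5.
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if heads == k:
            hits += 1
    return hits / n_trials

# Exact answer: count the favorable sequences among the 2**10 equally likely ones.
exact = comb(10, 5) / 2**10
estimate = simulate_exactly_k_heads(10, 5, n_trials=100_000)

print(f"exact:     {exact:.4f}")
print(f"simulated: {estimate:.4f}")
```

The simulated value lands close to the exact one but not on it, and running more trials narrows the gap. That is the trade being described: simulation costs accuracy, but it keeps working when no exact formula is available.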

Then, with all that out of the way, you perform the real coin flips, and compare them to your simulations. What you get out could, quite possibly, be extraordinary.

Flipping a coin should result in a 50/50 outcome of getting either heads or tails. If you don’t get 50/50 results, that doesn’t necessarily mean your coin is biased: getting a few more heads or a few more tails than you’d expect is likely enough that a small number of flips cannot reveal a bias. (NICU BUCULEI / FLICKR)

Let’s say we choose to flip 10 coins. What results do you expect?

Most of us, instinctively, would anticipate that we’d get 5 heads and 5 tails. Indeed, that is the most common outcome if you flip 10 fair coins, but it’s not overwhelmingly likely. In fact, the odds that you’ll get exactly 5 heads and exactly 5 tails in 10 flips are only 24.6%: about 1 in 4.

If you flipped ten coins and got the same result ten times in a row, you might think that something was rigged. How, after all, could you get saddled with such an unlikely outcome? The odds of getting ten flips that are either all heads or all tails are pretty low, at just 0.2%: 1 in 512.

And if you flipped ten coins and saw that, amidst your results, there were 5 heads in a row in there, you might be a little bit surprised. Should you be? As it turns out, each time you flip 10 coins, your chances of getting at least 5 heads in a row are 10.9%: approximately 1 in 9.
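With only 1024 possible sequences, these three figures can be checked by listing every outcome and counting. The following Python sketch does that; the helper name `longest_heads_run` is made up for this illustration, and "5 heads in a row" is interpreted here as a run of at least 5 consecutive heads anywhere in the sequence.

```python
from itertools import product

def longest_heads_run(seq):
    """Length of the longest unbroken streak of heads in a sequence of 'H'/'T'."""
    best = run = 0
    for flip in seq:
        run = run + 1 if flip == "H" else 0
        best = max(best, run)
    return best

# Every possible result of 10 flips; for a fair coin all are equally likely.
sequences = list(product("HT", repeat=10))
total = len(sequences)

five_and_five = sum(s.count("H") == 5 for s in sequences)
all_same = sum(len(set(s)) == 1 for s in sequences)
run_of_five = sum(longest_heads_run(s) >= 5 for s in sequences)

print(f"exactly 5 heads:         {five_and_five}/{total} = {five_and_five / total:.1%}")
print(f"all heads or all tails:  {all_same}/{total} = {all_same / total:.1%}")
print(f"5+ heads in a row:       {run_of_five}/{total} = {run_of_five / total:.1%}")
```

The counts come out to 252, 2, and 112 of the 1024 sequences, which are the 24.6%, 0.2%, and 10.9% quoted above.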

Ten random coin flips can result in any of 1024 possibilities, all of which have equal probability. While this exact sequence, of HHTTTHHHHH has the same probability as any other, the fact that it has five heads in a row is a feature that is relatively unlikely. Whether the coin is biased or not cannot be determined from this single trial. (© 1998–2020 RANDOM.ORG)

You might look at these results with more (or less) suspicion, depending on what your expectations were. If you flip a coin 10 times and get 5 heads and 5 tails, you might simply say, “well, that’s in line with what I expected,” and never think about it again. If you got 5 heads in a row in your results, you might think, “well, that’s a bit unexpected, but nothing to write home about,” and you might file that information away in the back of your head and then go on with your next test.

But if you got either 10 heads or 10 tails, exclusively, it might raise a few concerns for you. The chances of getting all heads or all tails after 10 flips is so low that you’d probably think, “something is likely amiss. Perhaps my assumption that this is truly a fair coin, with a 50/50 probability of either heads or tails, is flawed in some way?”

And perhaps it is, perhaps it’s not. The way to tell, unsurprisingly, is to perform even better tests, and that requires further investigation.

If you flip 20 coins in a row, sometimes you’ll end up with streaks of 5 or even 6 heads in a row, just by random chance. But that doesn’t necessarily mean your outcomes depend on previous results, or that your coin is unfair. (SCREENSHOT FROM RANDOM.ORG)

If you decided, for example, to flip 100 coins, or 1000, you would have a much better handle on things than if you based your results on 10 coins alone. Even if your first 10 results were all heads, you’d expect that to start to even out with more tosses if the coins were truly fair. Your odds of getting 100 heads or 100 tails in a row are astronomically small: something like 1-in-10³⁰; that would be a clear indication that something is off. But your odds of getting at least 60 heads or at least 60 tails are not so bad: something like 5.7%.

That might fall into the “nothing to write home about” category, sure, but sometimes, further investigation is important, even when the outcome doesn’t defy your expectations. There’s a 38% chance of getting at least 6 heads in 10 tosses: no big deal. But there’s only a 2.8% chance of getting at least 60 heads in 100 tosses, and less than a 1-in-a-billion chance of getting at least 600 heads in 1000 tosses. In general, larger sample sizes — corresponding to more data — can help you discern between what’s just a random fluctuation and what indicates a flaw in your model.
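The way the same 60% excess goes from unremarkable to nearly impossible as the sample grows can be reproduced with exact binomial sums. This is a small Python sketch; `prob_at_least` is a name chosen for the illustration, and the coin is assumed to be perfectly fair.

```python
from math import comb

def prob_at_least(k, n):
    """P(at least k heads in n fair flips), summed exactly with integers."""
    favorable = sum(comb(n, i) for i in range(k, n + 1))
    return favorable / 2**n  # Python divides big integers without overflow here

for k, n in [(6, 10), (60, 100), (600, 1000)]:
    print(f"at least {k:>3} heads in {n:>4} flips: {prob_at_least(k, n):.3g}")

# Heads and tails are symmetric, so "at least 60 of either" doubles the one-sided value.
either_side = 2 * prob_at_least(60, 100)
print(f"at least 60 heads or 60 tails in 100 flips: {either_side:.3g}")
```

The fraction of heads is the same in all three cases; only the number of flips changes, and that alone is what drives the probability down.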

Both simulations (red) and galaxy surveys (blue/purple) display the same large-scale clustering patterns as one another, even when you look at the mathematical details. If dark matter weren’t present, a lot of this structure would not only differ in detail, but would be washed out of existence; galaxies would be rare and filled with almost exclusively light elements. (GERARD LEMSON AND THE VIRGO CONSORTIUM)

The same mathematics that underlies a phenomenon as simple as coin flipping can also be applied to science: from biology to particle physics to cosmology. We have a picture of how the Universe works — the laws that govern it, the components that it’s made of, and the initial conditions that it began with — and so we can simulate how structures within it form, evolve, and grow with time.

We simulate the Universe over and over again, with the same laws and components, but randomly determined initial conditions, and see what happens. We can look at these simulated Universes and ask questions like:

  • How old is the Universe when stars begin forming?
  • When do we form the first galaxy clusters, and how large are they?
  • How often do we get a Universe where two galaxy clusters collide at certain speeds?
  • And how often does the Universe, when we simulate it, appear hotter in one direction than another?

After all, if we want to compare the Universe we have with our models of what we expect, we need to know how probable (or improbable) the outcome we see actually is.

Schematic diagram of the Universe’s history, highlighting reionization. Before stars or galaxies formed, the Universe was full of light-blocking, neutral atoms. While most of the Universe doesn’t become reionized until 550 million years after the Big Bang, a few fortunate regions are mostly reionized at much earlier times. (S. G. DJORGOVSKI ET AL., CALTECH DIGITAL MEDIA CENTER)

Most of the things that we simulate do, in fact, line up precisely with what we expect. Simulations of early structure formation lead to the very first stars of all some 50–100 million years after the Big Bang, with the first deluge of stars forming some 200 million years after the Big Bang and enough to fully reionize the Universe another 300–400 million years later. The distant galaxies and quasars we’ve observed, at the extreme limits of current technology, all point to this picture being correct.

But then we look to the galaxy clusters we find, and compare them to the ones we expect to find, and things aren’t quite as clean. The “El Gordo” galaxy cluster, for instance, is a young but very massive galaxy cluster that’s causing a large amount of gravitational lensing, and also emits X-rays due to a relatively recent merger or collision. There should only be a few clusters in the Universe that have its properties in a typical simulation, and it’s fairly unlikely that we would have found one given the limited amount of the Universe we’ve explored.

The El Gordo galaxy cluster is one of the largest galaxy clusters in the Universe, and perhaps the largest one known to appear this early in the Universe’s history. According to our models of structure formation, it’s rather unlikely to find an object this massive this early in the Universe, but we only have the one Universe to examine. (NASA, ESA, J. JEE (UNIVERSITY OF CALIFORNIA, RIVERSIDE, USA))

Things can get even less likely than that. The Bullet Cluster, where two galaxy clusters are colliding at high speeds, shows clear evidence for the separation between normal matter (that emits X-rays) and total matter (whose mass causes gravitational lensing). It’s some of the clearest evidence for dark matter. And yet, when we simulate the Universe with dark matter as we understand it, the odds of getting a colliding pair of galaxy clusters with this uncanny speed are very small: less than 1-in-1000 by all accounts and as small as 1-in-a-billion in some simulations.

And the leftover glow from the Big Bang itself, the Cosmic Microwave Background, exhibits much smaller temperature fluctuations on the largest scales of all than theory predicts. When we simulate the Universe, only 1-in-770 simulations yield temperature fluctuations that are consistent with what we observe.

If your bias is to be dissatisfied with the current cosmological model, you might point to one of these facts and announce, “Don’t you see? It’s all wrong!” But this is a dangerous path, as it illustrates how probabilities can trick us into fooling ourselves.

The fluctuations in the Big Bang’s leftover glow, the Cosmic Microwave Background, are expected to follow a certain magnitude distribution that’s scale-dependent. The first two multipole moments, l=2 and l=3 (shown here), are too low in magnitude compared to what’s predicted, but opinion on what that means is sharply divided. (CHIANG LUNG-YIH)

When we look at the Universe, we are deliberately examining it for any deviations from our expectations. Our expectations are based on our current understanding of how the Universe behaves: what the laws are as we know them, what the composition is as we know it, and the initial conditions as best we know them. When something deviates from our expectations, we have to consider the possibility that, in some way:

  • we might have gotten the laws wrong,
  • we might have gotten the composition wrong,
  • and/or we might have gotten the initial conditions wrong.

But there’s another possibility that’s entirely different, even assuming that there are no errors. Even with a very unlikely outcome, this could simply be the Universe we have. If we look at the Universe and test it for anomalies in a million different ways, we’d expect to find 45,500 of them at 2-σ significance, 2700 at 3-σ significance, 63 at 4-σ significance, and even 1 at 5-σ significance, which is normally considered the “gold standard” for a discovery in physics. Sometimes, the unlikely just happens by random chance, and that’s just a reflection of the Universe we get.
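The counts quoted for a million tests follow from the tail area of a Gaussian distribution. Here is a minimal Python sketch, under two assumptions that the text does not state explicitly: that the tests are independent, and that "n-σ" means a two-sided deviation (a fluctuation at least that far from the mean in either direction). The name `two_sided_p` is illustrative.

```python
from math import erfc, sqrt

def two_sided_p(n_sigma):
    """Chance that a Gaussian variable lands at least n_sigma from its mean, either side."""
    return erfc(n_sigma / sqrt(2))

n_tests = 1_000_000  # the million independent tests imagined above

for n_sigma in (2, 3, 4, 5):
    p = two_sided_p(n_sigma)
    print(f"{n_sigma}-sigma: p = {p:.3g}, expected false alarms = {n_tests * p:,.2f}")
```

Under these assumptions the expected counts are roughly 45,500, 2,700, and 63, with about 0.6 at 5-σ, which is the "1" in the text once rounded to a whole event.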

The gravitational lensing map (blue), overlaid on the optical and X-ray (pink) data of the Bullet cluster. The mismatch of the locations of the X-rays and the inferred mass is undeniable, supporting the inferred existence of dark matter. But the speeds associated with this cluster are high enough that they appear to be a statistically unlikely realization of what our models of the Universe predict. (X-RAY: NASA/CXC/CFA/M.MARKEVITCH ET AL.; LENSING MAP: NASA/STSCI; ESO WFI; MAGELLAN/U.ARIZONA/D.CLOWE ET AL.; OPTICAL: NASA/STSCI; MAGELLAN/U.ARIZONA/D.CLOWE ET AL.)

If we had billions upon billions of Universes to observe, we could know whether ours was typical or not. We could know in what ways we were statistical outliers, and we could reconstruct what the laws, composition, and initial conditions of a “typical” Universe truly are. But — just like any individual member of a population — our observable Universe is bound to be typical in some ways, atypical in others, and to possess a few extremely rare properties.

When we find an outcome that appears unlikely, it could be a hint that one of our assumptions about the properties of the Universe is flawed, but that isn’t necessarily the case. Even unlikely outcomes sometimes occur, and without more Universes to observe than our own, we cannot know which cosmic oddities point to a real problem with our theories versus which ones are simply due to our own particular uniqueness: what professionals call cosmic variance.

When we observe low-probability events in our Universe, we have every right to be suspicious. But if we play a 1-in-a-billion lottery a few billion times, don’t be surprised at the few occurrences where we actually hit the jackpot.


Ethan Siegel is the author of Beyond the Galaxy and Treknology. You can pre-order his third book, currently in development: the Encyclopaedia Cosmologica.