Poisson distribution: why scientists and media don’t understand clinical trial statistics
- The media, and even many scientists, don’t have a solid enough understanding of statistics to distinguish between significant and non-significant findings in clinical trials.
- For instance, to determine if the results of two studies on vaccine side effects are significantly different, one must understand the Poisson distribution.
- The Poisson distribution is relevant in many domains, from biology to risk modeling for insurance companies.
Last month, Bayern Munich footballer Alphonso Davies was diagnosed with mild myocarditis following a COVID vaccine booster. He wasn’t the first vaccinated high-profile athlete to have suffered myocarditis. Concerns about heart complications in healthy, vaccinated people have repeatedly made the news since the first COVID vaccines rolled out. To investigate these, clinical trials are monitoring the prevalence of myocarditis in vaccinated people.
An Israeli study found that myocarditis occurred in 1 in 12,361 vaccinated boys aged 12 to 15. Comparing the results to those from an earlier CDC study, the New York Times reported that “the Israeli figure is higher than the Centers for Disease Control and Prevention estimate of one case per 16,129 vaccinated adolescents aged 12 to 17.” The authors behind the Israeli study suggested in a letter to the editor that “these differences may be explained by the active surveillance in our population.”
Should we be concerned? Is the Israeli result proof that the side effect rate is higher than we thought? Or is the result due to random chance? We can definitively answer that question, but we first need to meet the Poisson distribution.
A primer on the Poisson distribution
A statistical tool first described by French mathematician Siméon Denis Poisson in the early 19th century, the Poisson distribution models discrete, independent events occurring within a fixed interval of time or space. Myocarditis cases, for example, are discrete and independent of each other. (For the cognoscenti: When the sample size is huge and one of the outcomes is highly unlikely, as in this case, the Poisson distribution approximates the binomial distribution.)
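For readers who want to see that approximation in action, here is a minimal sketch in R. It assumes the figures quoted in this article (a sample of roughly 135,971 boys and a rate of 1 in 12,361) and simply compares the binomial probabilities with the Poisson ones; the variable names are illustrative.

```r
# Assumed inputs, taken from figures quoted in this article
n <- 135971       # approximate number of vaccinated boys in the Israeli sample
p <- 1 / 12361    # reported per-person rate of myocarditis
lambda <- n * p   # expected number of cases (about 11)

k <- 0:30         # a range of possible case counts to compare

binom_probs <- dbinom(k, size = n, prob = p)  # exact binomial probabilities
pois_probs  <- dpois(k, lambda)               # Poisson approximation

# Largest absolute gap between the two sets of probabilities
max_gap <- max(abs(binom_probs - pois_probs))
print(max_gap)
```

With a sample this large and an event this rare, the gap should be tiny, which is why the simpler Poisson formula is a reasonable stand-in.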
Here is how the Poisson distribution works. Let’s assume that you receive an average of ten emails every hour. What is the probability that you will receive four emails in the next hour? What about 12 emails? Or 45 emails? To quantify this, we need to consider the likelihood that the sampled statistic (number of emails in the next hour) could stray from the known average. Given that a phenomenon follows the Poisson distribution, the following nasty-looking equation describes the probability of observing a certain number of events (k) given a particular average rate (λ).
$$P(k) = \frac{\lambda^{k} \, e^{-\lambda}}{k!}$$
Nasty, yes. But the equation isn’t too hard to use. Plugging in the numbers from our previous example (k = 10 emails and λ = 10 emails per hour, on average), the formula to calculate the probability of getting exactly 10 emails, P(10), in the next hour looks like this:
$$P(10) = \frac{10^{10} \, e^{-10}}{10!} \approx 0.125$$
The letter “e” is a weird constant found everywhere in nature (like pi) that is roughly equal to 2.718. The exclamation point doesn’t denote excitement; instead, it represents the factorial (which, in this case, is 10 × 9 × 8 × … × 1). As shown, once all the math is done, the answer is about 0.125. Translation: There is a 12.5% chance that you will receive exactly 10 emails in the next hour.
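The same arithmetic can be checked in a few lines of R. This is only a sketch: `poisson_pmf` is a helper written for this illustration, while `dpois` is R’s built-in function for Poisson probabilities.

```r
# Poisson probability written out by hand:
# P(k) = lambda^k * exp(-lambda) / k!
poisson_pmf <- function(k, lambda) {
  lambda^k * exp(-lambda) / factorial(k)
}

lambda <- 10  # average emails per hour, from the example in the text

p_by_hand <- poisson_pmf(10, lambda)  # formula typed out
p_builtin <- dpois(10, lambda)        # R's built-in equivalent

# The other counts the text asks about: 4, 12 and 45 emails
p_others <- dpois(c(4, 12, 45), lambda)

print(round(p_by_hand, 3))  # 0.125
print(signif(p_others, 3))
```

Four emails comes out at about a 1.9% chance and 12 emails at about 9.5%, while 45 emails in an hour is vanishingly unlikely when the average is ten.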
Poisson distribution for vaccine side effects
What does this have to do with comparing two clinical trials? Great question. When you are trying to determine the rate of something (λ, which in this case is the rate of myocarditis as a COVID vaccine side effect), you need to calculate a confidence interval. A confidence interval is a range of values that, at a stated level of confidence (commonly 95%), is expected to contain the “real answer.” Critically, this was missing from the NYT’s report, as well as from the analysis in the aforementioned letter to the editor.
The exact details involve some nitty-gritty statistics, but the interval can be calculated easily using software* (or even by hand with a calculator). The Israeli study estimated a rate of myocarditis of 1 in 12,361, but the confidence interval (roughly 95%) comes out to 1 in 7,726 to 1 in 30,902. The CDC’s estimate of 1 in 16,129 lies within this range, which means the two results aren’t significantly different from each other.
In other words, the Israeli study does not suggest that the rate of myocarditis is higher than we thought. Its result was statistically indistinguishable from the CDC’s result.
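The comparison can be sketched in R as follows. The sketch assumes the Israeli sample size of approximately 135,971 given in the footnote and uses the shortcut of the mean plus or minus two standard deviations (for a Poisson count, the variance equals the mean). The variable names are illustrative, and unrounded inputs give bounds that differ slightly from the rounded figures quoted in this article.

```r
n_israel <- 135971            # approximate Israeli sample size (assumed, from the footnote)
cases    <- n_israel / 12361  # cases implied by a rate of 1 in 12,361 (about 11)

# For a Poisson count the variance equals the mean,
# so the standard deviation is sqrt(cases)
low  <- cases - 2 * sqrt(cases)
high <- cases + 2 * sqrt(cases)

# Cases the CDC rate of 1 in 16,129 would predict in a sample of the same size
cdc_expected <- n_israel / 16129

inside <- cdc_expected >= low && cdc_expected <= high
print(c(low = low, high = high, cdc_expected = cdc_expected))
print(inside)  # TRUE
```

At the CDC’s rate, a sample the size of the Israeli one would be expected to produce about 8.4 cases, comfortably inside the interval around the 11 cases implied by the Israeli rate.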
Poisson: from biology to finance and beyond
The usefulness of the Poisson distribution in biology goes beyond comparing two clinical trials. Its impact spans from early work in bacterial genetics and species distribution to “omics” technologies that are now mainstream in life sciences research. It also has applications in finance and risk modeling for insurance companies.
Scientists and science writers, who often need to compare the results of biomedical studies, should be more familiar with the Poisson distribution. This obscure, abstract formula has a bigger impact on our daily lives than one might think.
*For the adventurous, the confidence interval can be calculated using R with the code:
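set.seed(1)                          # make the simulation reproducible
x <- rpois(10000, 11)                # simulate 10,000 Poisson counts with a mean of 11 cases
low <- mean(x) - 2 * sqrt(var(x))    # mean minus two standard deviations
high <- mean(x) + 2 * sqrt(var(x))   # mean plus two standard deviations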
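Here, 11 is the number of cases implied by a rate of 1 in 12,361 in the Israeli sample (which was approximately 135,971). The code yields a confidence interval of roughly 4.4 to 17.6 cases of myocarditis; the exact figures vary slightly between runs because the counts are simulated. Converted to fractions, this is 1 in 30,902 and 1 in 7,726, respectively.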
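As a cross-check on the simulation, R can also compute an exact Poisson confidence interval directly. The sketch below uses `poisson.test` from base R; this is an alternative to the approach in the footnote, not the method used for the figures quoted in this article, and the variable names are illustrative.

```r
cases <- 11      # myocarditis cases implied by 1 in 12,361 among ~135,971 boys
n     <- 135971  # approximate Israeli sample size

# Exact 95% confidence interval for a Poisson count (base R, stats package)
ci <- as.numeric(poisson.test(cases, conf.level = 0.95)$conf.int)

# Express each bound as "1 in N" vaccinated boys
one_in <- n / ci
print(ci)
print(round(one_in))

# Cases the CDC rate of 1 in 16,129 would predict in the same sample
cdc_expected <- n / 16129
print(cdc_expected > ci[1] && cdc_expected < ci[2])
```

The exact interval runs from roughly 5.5 to 19.7 cases, so it sits a little higher than the shortcut’s range, but the CDC figure of about 8.4 expected cases still falls inside it and the conclusion is unchanged.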