Study: A Lot of Mind and Brain Research Depends on Flawed Statistics
David Berreby is the author of "Us and Them: The Science of Identity." He has written about human behavior and other science topics for The New Yorker, The New York Times Magazine, Slate, Smithsonian, The New Republic, Nature, Discover, Vogue and many other publications. He has been a Visiting Scholar at the University of Paris, a Science Writing Fellow at the Marine Biological Laboratory, a resident at Yaddo, and in 2006 was awarded the Erving Goffman Award for Outstanding Scholarship for the first edition of "Us and Them." David can be found on Twitter at @davidberreby and reached by email at david [at] davidberreby [dot] com.
Like a biblical parable, the typical human-behavior experiment is easily told and easily reduced to a message: People who pay with credit cards were more likely to have potato chips in their grocery bags than were people who paid with cash. (So if you want to lose weight, use cash!) I tell this kind of tale all the time (though I try to be careful about jumping from an experimental result to a policy). But the old yeshiva saying applies: "For example" isn't proof. Though many experiments lend themselves to a convincing blog post, the actual, logical case—what the researchers did that's different from what I do here—is supposed to be made by statistical inference. So if there's something wrong with people's statistics, then their results fall into a category much dreaded in science: anecdotes. Hence the shock value of this paper (pdf) in this month's Nature Neuroscience. In 157 neuroscience studies that made this kind of group-versus-group claim, the authors write, fully half got their statistics wrong in at least one instance. Allowing for papers where the error clearly didn't invalidate the researchers' central claim, that still means about a third of these papers (from Science, Nature, Neuron, The Journal of Neuroscience and Nature Neuroscience itself) might be describing effects that aren't real.
The authors, Sander Nieuwenhuis, Birte U. Forstmann and Eric-Jan Wagenmakers, target a common procedure in neuroscience (and the social sciences as well): Performing an experimental manipulation on two different groups, you find that one reacts in a satisfyingly clear way, while the other does not. (To be more precise, you test for the probability of your data occurring if there were no effect from your treatment. Using one very common standard, when that probability is less than 5 percent, you declare you have a statistically significant finding.)
So there you are testing the effect of a drug on Group 1 (a strain of mutant mice) and, for comparison, on Group 2 (a strain of plain old vanilla lab mice). The drug has a statistically significant effect on Group 1 and a much smaller, statistically insignificant effect on Group 2. Therefore, you conclude that your drug has a different effect on mutant mice than on normal ones. And when you do this, you are wrong.
The reason: The results from Group 1 and from Group 2 are distinct pieces of information. In order to compare them statistically, you have to relate them to one another. You need to know the probability of finding that difference between Group 1's effect and Group 2's—not the probability of either result in isolation. In fact, as this paper points out, a statistically significant result in Group 1 alongside an insignificant one in Group 2 is not, itself, necessarily statistically significant. The two groups can land on opposite sides of the 5 percent cutoff (say, p = 0.04 in one and p = 0.06 in the other) while the underlying effects are nearly identical. To claim the groups differ, you have to test the difference itself.
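The distinction is easy to see in a toy calculation. The numbers below are my own invented illustration, not data from the paper: a hypothetical treatment effect measured in two groups of eight mice, where Group 1's effect clears the significance threshold, Group 2's does not, and yet a direct test of the difference between the groups finds nothing.

```python
# Toy illustration (invented numbers, not from Nieuwenhuis et al.) of why
# "significant in one group, not in the other" is not itself a significant
# difference between groups.
import statistics

group1 = [1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 1.0]    # tight spread around 1.0
group2 = [2.8, -1.2, 2.3, -0.7, 1.8, -0.2, 1.3, 0.3]  # noisy spread around 0.8

def one_sample_t(xs):
    """t statistic for the null hypothesis that the mean effect is zero."""
    n = len(xs)
    return statistics.fmean(xs) / (statistics.stdev(xs) / n ** 0.5)

def welch_t(xs, ys):
    """Welch t statistic for the difference between the two group means."""
    vx, vy = statistics.variance(xs), statistics.variance(ys)
    se = (vx / len(xs) + vy / len(ys)) ** 0.5
    return (statistics.fmean(xs) - statistics.fmean(ys)) / se

CRIT = 2.36  # approximate two-sided 5% critical t value for 7 degrees of freedom

t1, t2 = one_sample_t(group1), one_sample_t(group2)
t_diff = welch_t(group1, group2)

print(f"Group 1 effect: t = {t1:.2f}, significant: {abs(t1) > CRIT}")      # ~14.1, True
print(f"Group 2 effect: t = {t2:.2f}, significant: {abs(t2) > CRIT}")      # ~1.55, False
print(f"Difference:     t = {t_diff:.2f}, significant: {abs(t_diff) > CRIT}")  # ~0.38, False
```

Group 1 passes the test and Group 2 fails it, but the test that actually matters, the one on the difference between the groups, comes nowhere near significance. That direct test is the step the erroneous papers skipped.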
This is a lot less compelling than a neat story line (Ben Goldacre at The Guardian called his lucid explanation last week "400 words of pain"). But doing the stats right is the essential underpinning for the narrative version. So I was simply astonished that half the researchers making this sort of claim in the very prestigious sample were, according to the paper, not doing it correctly.
I try, dear reader, to separate the wheat from the chaff here, worrying about soundness as well as the gee-whiz factor, and trying to distinguish the experiments that actually took place from the hype that could be derived from them. But Wagenmakers, who has made himself a scourge of statistical error and woolly thinking in general, has me worried.
I first encountered his skepticism of psychology's methods when he and his co-authors dismantled claims that those standard methods could yield evidence of psychic powers. Then, last May, he and another set of co-authors published this paper (pdf), in which they looked at 855 statistical tests in papers published in 2007 in two major psychology journals and found that 70 percent would flunk an alternative (and, they say, better) test of significance.
I mean, it would be one thing if a lot of contemporary research on human behavior were superseded, corrected, improved upon or reinterpreted in the future. Given the way science is supposed to work, one of those fates is to be expected. What I can't get my mind around is the possibility that, instead, a great deal of this work, sheaf upon sheaf of it, will turn out to be simply meaningless.
ADDENDUM: The notion that scientists don't get statistics doesn't shock statisticians, it seems. At least, it doesn't shock my favorite statistics guru, Andrew Vickers of Sloan-Kettering, author of this very clear and handy guide to his field. After I sent him the paper by Nieuwenhuis et al., he emailed: "Bad statistics in neuroscience? Isn't that a bit like going out of your way to say that the Mets have a bad record against Atlanta? They lose against pretty much every team and there is no need to go through the sub-group analyses of multiple different opponents. By the same token, the surprise would be if neuroscientists didn't make the same mistakes as everyone else."
It makes sense to me that the oddities of statistical thinking would be no more congenial to scientists than to the rest of us (if your passion is alligator brains or star clusters, there's no particular reason you should cotton to p-values). Perhaps this leads to a "black box" approach to statistical software that helps explain the situation that Nieuwenhuis et al. decry. On the other hand, Goldacre sees things more darkly, suggesting the trouble may be a desire to publish at all costs.
I do think it's a subject we science writers ought to pay more attention to.
Nieuwenhuis, S., Forstmann, B., & Wagenmakers, E. (2011). Erroneous analyses of interactions in neuroscience: A problem of significance. Nature Neuroscience, 14(9), 1105–1107. DOI: 10.1038/nn.2886
Wetzels, R., Matzke, D., Lee, M., Rouder, J., Iverson, G., & Wagenmakers, E. (2011). Statistical evidence in experimental psychology: An empirical comparison using 855 t tests. Perspectives on Psychological Science, 6(3), 291–298. DOI: 10.1177/1745691611406923