When Studies Prove That Studies Don't Prove Things ...

Do science journalists have weird psychic powers? You might think so, given the near simultaneity of publications this fall on the touchy theme of studies that don’t really prove what they’re supposed to have proved. Last month, The Atlantic Monthly published David H. Freedman’s profile of Stanford’s John Ioannidis, who disbelieves the results of almost all published biomedical research. Three days ago, yours truly explained why four psychologists think much the same about many papers in their field. And yesterday The New Yorker brought out this brilliant piece by Jonah Lehrer, which describes why there’s uneasiness about experimental findings in many disciplines. (If you don’t subscribe, you’ll have to shell out money for a copy of the magazine to read the whole piece. It’s worth it.)

Why should you pay up? Because experimental proof is the sacred text of our secular society. Many of us drop research findings into conversation the way, in his day, Cotton Mather quoted Scripture. Well, of course men evolved to be more promiscuous than women. Remember that experiment at the University of Texas? You bet stereotype threat is real—let me tell you about this Harvard study I read about. In normal talk, experiments convince us for the same reason that Leviticus helped Mather win an argument: They are satisfying stories, they seem to state universal truths, and they have the authority of wisdom and power. Of course exercise is good for us. This study showed that when people jog just 30 minutes a week …

Within the wise and powerful institution of science, though, experiments are supposed to be convincing logically, not rhetorically. Tell all the anecdotes you like: the data come out one way if your hypothesis is correct, and a different way if, instead, the “null hypothesis” is true. How likely are numbers at least as striking as yours if the null hypothesis is right? Commonly, if that probability is less than five percent, you have a “significant” result to publish. Your hopes, fears, fads, and ambitions can’t affect that.
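
To make that five-percent rule concrete, here is a minimal sketch in Python. The coin-flipping scenario, the function name `one_sided_p_value`, and the numbers (60 heads in 100 flips) are all invented for illustration; none of them comes from the studies discussed here. The null hypothesis is that the coin is fair, and the question is how often a fair coin would do at least this well.

```python
from math import comb

def one_sided_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """Probability of at least `heads` heads in `flips` tosses, assuming the null is true."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )

# Hypothetical experiment: 60 heads out of 100 flips of a supposedly fair coin.
p = one_sided_p_value(60, 100)
significant = p < 0.05  # the conventional five-percent threshold
print(f"p = {p:.4f}, significant at 5%: {significant}")
```

In this made-up case the probability lands a little under three percent, so the result would count as “significant” by the usual convention, even though a perfectly fair coin produces a run like that every so often.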

Such is the theory. By getting a number of scientists to speak honestly about their doubts and frustrations, Lehrer sketches a much more ambiguous reality. It’s one in which many experimentally established “truths” fade with time, for instance. (He’s not describing the overthrow of one theory by another, which is supposed to happen in science; rather, he’s talking about experimental effects that looked solid and then seemed to fade away in later experiments. It was “as if nature gave me this great result, and then tried to take it back,” the psychologist Jonathan Schooler told him.)

The unease runs wide and deep through many fields, and it seems to be about the same issues everywhere: Statistics that are manipulated to get a “significant” result; biases affecting everything from the way the data are collected to what gets published to what’s even recorded in the notebook.
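
One way a “significant” result can be coaxed out of nothing is simply to run many comparisons and report the one that clears the bar. The arithmetic can be sketched in a few lines of Python. The function name `chance_of_false_alarm` is made up for this example, and the sketch assumes the tests are independent and that no real effect exists in any of them, which is a simplification.

```python
def chance_of_false_alarm(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `num_tests` independent tests looks
    'significant' at level `alpha` when every null hypothesis is true."""
    return 1 - (1 - alpha) ** num_tests

# How the odds of a spurious "finding" grow with the number of comparisons.
for n in (1, 5, 20):
    print(f"{n:>2} tests: {chance_of_false_alarm(n):.3f}")
```

Under those assumptions, twenty tries give about a 64 percent chance of at least one publishable-looking fluke.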

What is to be done? Some scientists sound as if they think the problem is laxness and lack of interest in statistics. If that’s so, many fields need a general tightening up on standards and math. However, you could make the case that in some areas, the problem is exactly the opposite. For example: If psychology isn’t physics, Paul Rozin argues here, then it will never come into its own until it ceases trying to act like physics. In some cases, an argument about reliability is really an argument about the nature and purpose of a discipline.

All in all, an interesting autumn for people interested in epistemology. (We’re used to getting no respect, what with our editors all the time pushing us for “the bottom line” and “the [note theological metaphor] lay reader’s version.”) Kudos to Lehrer for collecting scattered signs of soul-searching and making them into a big, important picture.