Daryl J. Bem's experiments on psi caught the world's attention, as I posted last month, because he used standard psychology-lab methods to gather and analyze his data. Imagine what astronomers might feel if NASA announced that the Hubble Space Telescope had found evidence for astrology: How do you scoff when you depend on the same instrument yourself? Some, though, had the opposite reaction, among them the four researchers who recently threw a bucket of cold water on Bem's claim. If psychology's telescope yields evidence that people can sense future events, they say, then that thing needs a tune-up.
"The field of psychology," write Eric-Jan Wagenmakers, Ruud Wetzels, Denny Borsboom, and Han van der Maas of the University of Amsterdam, "currently uses methodological and statistical strategies that are too weak, too malleable, and offer far too many opportunities for researchers to befuddle themselves and their peers."
At the heart of their case against Bem is a fundamental tenet, which, I think, can be paraphrased as: Parapsychology? Dude, seriously?
If psi is for reals, they ask, why aren't the world's casinos bankrupt? In one of his experiments, Bem found that people could correctly predict, 53.1 percent of the time, which of two locations on a screen would show an erotic picture. That success rate, if applied to the black/red choice in roulette, would provide a very good living for gamblers and eventually bankrupt casinos, note Wagenmakers et al. Yet casinos are still in business. So either "psi effects are not operative in casinos, but they are operative in psychological experiments on erotic pictures," or psi effects are trivial to non-existent. Bem's "eight experiments are not enough to convince a skeptic that the known laws of nature have been bent."
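The casino argument rests on simple expected-value arithmetic. Here is a rough Python sketch of it. It assumes an even-money bet that pays 1:1, a single-zero (European) wheel with 18 red pockets out of 37, and, as in the authors' thought experiment, that Bem's 53.1 percent hit rate carries over unchanged from erotic pictures to red and black. The function name `expected_profit` is just an illustrative label.

```python
def expected_profit(p_win: float) -> float:
    """Expected profit per unit staked on an even-money bet that pays 1:1."""
    # Win: gain the stake (+1). Lose: forfeit the stake (-1).
    return p_win * 1.0 + (1.0 - p_win) * -1.0

# Assumption: single-zero (European) wheel, 18 red pockets out of 37.
fair_hit_rate = 18 / 37
# Assumption: Bem's 53.1 percent hit rate transfers unchanged to red/black.
psi_hit_rate = 0.531

for label, p in [("fair player", fair_hit_rate), ("psi player", psi_hit_rate)]:
    print(f"{label}: {expected_profit(p):+.4f} per unit bet")

# The per-bet edge adds up: a hypothetical 10,000 one-unit bets.
print(f"psi player, 10,000 bets: {10_000 * expected_profit(psi_hit_rate):+.0f} units")
```

Under these assumptions a fair player loses about 2.7 cents per dollar bet, while the hypothetical psi player gains about 6.2 cents, which is why a crowd of such gamblers would eventually empty the house.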
The point here is not that they'd be more convinced by 800 experiments. What's needed to provide "extraordinary evidence," the authors say, is the testing of many alternative explanations, not many experiments. Instead, they say, Bem's procedures tested only two possible hypotheses: that psi is real, and that the results are due to chance alone (the null hypothesis).
When only one hypothesis is tested against the null, it's easy to make the logical error known as "the prosecutor's fallacy." That comes up in real life when a forensics lab assesses a crime-scene DNA sample and estimates that the odds of a random person sharing its particular combination of alleles are, say, one in three million. Suppose, unfortunately for you, the DNA matches yours. The prosecutor then argues that this means the odds are one in three million that anyone but you committed the crime.
That's wrong: His hypothesis (you're guilty!) requires a DNA match. But that doesn't mean a DNA match requires his hypothesis. Maybe someone else with those markers did the deed (perhaps a lot of people with that genetic signature are concentrated in your neighborhood). Maybe that is your DNA, but you didn't commit the crime. Point is: You aren't convicted because the DNA matches; you're convicted because other possible explanations were tested, and only the one where you're guilty held up to scrutiny. A prosecutor who thinks people cheat on the lottery could throw every winner in jail, the authors point out. After all, the odds against winning are so great, anyone who does it must be guilty! That doesn't happen because winning-by-cheating is even more improbable than winning by chance. If you never compare the won-by-cheating theory to the won-by-chance theory, you won't see this.
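The fallacy is easiest to see with Bayes' rule, which weighs each hypothesis by how plausible it was before the evidence arrived. The Python sketch below applies one small hypothetical helper, `posterior`, to both of the examples above. Only the one-in-three-million match probability comes from the text; the pool of 10 million possible perpetrators and every lottery number are made-up assumptions chosen to show the shape of the argument.

```python
def posterior(prior_h: float, p_data_given_h: float, p_data_given_not_h: float) -> float:
    """P(H | data) by Bayes' rule, comparing H against its complement."""
    joint_h = p_data_given_h * prior_h                  # P(data and H)
    joint_not_h = p_data_given_not_h * (1.0 - prior_h)  # P(data and not-H)
    return joint_h / (joint_h + joint_not_h)

# DNA case. From the post: a 1-in-3-million random match probability.
# Hypothetical: 10 million people could have committed the crime, one of them did,
# and the guilty person's DNA always matches.
pool = 10_000_000
p_guilty = posterior(prior_h=1 / pool, p_data_given_h=1.0,
                     p_data_given_not_h=1 / 3_000_000)

# Lottery case. All numbers hypothetical: a fair ticket wins with probability
# 1 in 100 million, a cheater always wins, and 1 ticket in 10 billion is a cheat.
p_cheated = posterior(prior_h=1e-10, p_data_given_h=1.0, p_data_given_not_h=1e-8)

print(f"P(guilty | DNA match)    = {p_guilty:.2f}")
print(f"P(cheated | won lottery) = {p_cheated:.3f}")
```

With those assumptions the DNA match leaves the probability of guilt at roughly 0.23, because three or so other people in a pool that size would be expected to match too. The lottery winner comes out roughly 100 times more likely to have won fairly than by cheating, even though a fair win is itself wildly improbable.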
In attacking the Bem paper, Wagenmakers et al. seem to be saying that psychology is rife with the prosecutor's fallacy, also known as the fallacy of the transposed conditional. Experimenters often claim they've proven a hypothesis because they've found data consistent with it, without exploring how else those data could have been generated. Bem doesn't say what other hypotheses he tested, they write, so they can't judge how sound his reasoning is.
So they applied a different statistical test to Bem's data (a Bayesian hypothesis test, if you want details) and found that his results flunked.
Now, the severity of their test depends on their initial assumptions, and their initial assumptions were basically the logic-notation version of "Parapsychology? Dude, seriously?" Different assumptions might lead elsewhere, and the authors admit they don't know a lot about this kind of research. However, they say, they don't need to prove that their Bayesian test is superior to Bem's statistical methods. The mere fact that it reaches different conclusions from the same data indicates that those data aren't as strong as they seem.
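One way to see why starting assumptions matter is the odds form of Bayes' rule: posterior odds equal the Bayes factor times the prior odds. The Python sketch below is a general illustration, not a reconstruction of the authors' actual test; the Bayes factor of 10 and the three prior odds are hypothetical, and `posterior_probability` is an illustrative name. (In a real Bayes factor test, the prior placed on the size of the effect also changes the Bayes factor itself; this sketch holds the Bayes factor fixed.)

```python
def posterior_probability(bayes_factor: float, prior_odds: float) -> float:
    """P(H1 | data) from a Bayes factor (H1 over H0) and the prior odds of H1 over H0."""
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# Hypothetical: the data are 10 times more likely if psi is real than if it isn't.
bayes_factor = 10.0

# Hypothetical starting points, from open-minded to deeply skeptical.
priors = {"even odds": 1.0, "doubtful": 1e-3, "Dude, seriously?": 1e-9}

for label, prior_odds in priors.items():
    p = posterior_probability(bayes_factor, prior_odds)
    print(f"{label:>18}: P(psi | data) = {p:.3g}")
```

The same evidence that would make an open-minded reader about 91 percent sure leaves a reader who starts at one-in-a-billion odds essentially where they began.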
How strong do they need to be? That brings us to their most philosophical point: They say the experiments were "exploratory" work, where the researchers didn't define what they were looking for or how they would know they'd found it. In the experiment with the erotic pictures, Bem, they write, "tested not just erotic pictures, but also neutral pictures, negative pictures, positive pictures, and pictures that were romantic but non-erotic. Only the erotic pictures showed any evidence for precognition." That's an exploratory experiment, or, as they call it, "a fishing expedition." Lemuel Moyé calls it "data dredging."
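The cost of a fishing expedition can be put in numbers. If you run five tests at the conventional 0.05 threshold and nothing is really going on, the chance that at least one comes up "significant" is 1 - 0.95^5, or about 23 percent. The Python sketch below computes that and checks it by simulation. It assumes the five picture categories are tested independently, which is a simplification: real tests may well be correlated (for instance, if the same participants see every category). The 0.05 threshold is likewise an assumption, not a figure taken from Bem's paper.

```python
import random

ALPHA = 0.05   # assumed significance threshold
N_TESTS = 5    # erotic, neutral, negative, positive, romantic-but-non-erotic

# Exact chance that at least one of N independent tests is "significant"
# when the null hypothesis is true for every one of them.
familywise = 1 - (1 - ALPHA) ** N_TESTS

def simulate(trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check: under a true null, each p-value is uniform on [0, 1]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One simulated study: did any of the five tests cross the threshold?
        if any(rng.random() < ALPHA for _ in range(N_TESTS)):
            hits += 1
    return hits / trials

print(f"exact:     {familywise:.3f}")
print(f"simulated: {simulate():.3f}")
```

In other words, under these assumptions roughly one null study in four would turn up a "finding" somewhere among the five categories.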
The gold standard of experimentation, according to Wagenmakers et al., is different. It's the "confirmatory" experiment—one in which you have defined in advance what you're looking for and what you need to see to prove that you found it. That, writes Moyé, is like mapping out a treasure site and excavating it "for a jewel that all reliable evidence suggests is present." On the other hand, he writes, exploratory work is "strip mining of the site, churning up the information landscape, looking for any pattern at all in the data and proclaiming that the association leeched from the sample is [real]."
Yikes. Of course, what sounds to some like logical rigor sounds to others like an absurdly limited notion of how experiments actually work. Behind claims of shoddy statistics may lie a deep philosophical division about the trade-offs between surprise and certainty, a division that has also surfaced recently in the fMRI field and in epidemiology. I'll try to keep my eye on these debates. Stay tuned. Unless, of course, you've already foreseen where it's all going.
Daryl J. Bem (in press). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology. DOI: 10.1037/a0021524