The rise of the Internet has worked wonders for the public’s access to science, but it has come with a side effect: a toxic combination of confirmation bias and Google that lets us easily find a study to support whatever we already believe, without bothering to so much as look at research that might challenge our position (or, for that matter, the research that supports it). I’m certainly not immune to credulously accepting research that has later been called into question, even on this blog, where I make a great effort to take a skeptical approach and highlight false claims arising from research. Could it be the case that studies with incorrect findings are not just rare anomalies, but are actually representative of the majority of published research?
The claim that “most published research findings are false” is something you might reasonably expect to come out of the mouth of the most deluded kind of tin-foil-hat-wearing conspiracy theorist. Indeed, it is a statement often used by fans of pseudoscience who take the claim at face value, without applying the principles behind it to their own evidence. It is, however, a concept that is increasingly well understood by scientists. It is the title of a paper written 10 years ago by the legendary Stanford epidemiologist John Ioannidis. The paper, which has become the most widely cited paper ever published in the journal PLoS Medicine, examined how issues ingrained in the scientific process, combined with the way we currently interpret statistical significance, mean that at present most published findings are likely to be incorrect.
Richard Horton, the editor of The Lancet, recently put it only slightly more mildly: “Much of the scientific literature, perhaps half, may simply be untrue.” Horton agrees with Ioannidis’ reasoning, blaming “small sample sizes, tiny effects, invalid exploratory analyses, and flagrant conflicts of interest, together with an obsession for pursuing fashionable trends of dubious importance.” Horton laments: “Science has taken a turn towards darkness.”
Last year UCL pharmacologist and statistician David Colquhoun published a report in the Royal Society’s Open Science in which he backed up Ioannidis’ case: “If you use p=0.05 to suggest that you have made a discovery, you will be wrong at least 30 percent of the time.” That’s assuming “the most optimistic view possible” in which every experiment is perfectly designed, with perfectly random allocation, zero bias, no multiple comparisons and publication of all negative findings. Colquhoun concludes: “If, as is often the case, experiments are underpowered, you will be wrong most of the time.”
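To see how a 5 percent significance threshold can produce a much higher error rate among claimed discoveries, here is a minimal sketch of the arithmetic. It is not a reproduction of Colquhoun's own calculation: the function name and the inputs (10 percent of tested effects being real, 80 percent power for a well-run study, 20 percent for an underpowered one) are assumptions chosen purely for illustration.

```python
def false_discovery_rate(prior, power, alpha=0.05):
    """Share of 'significant' results that are actually false positives.

    prior: assumed fraction of tested hypotheses that are really true
    power: probability that a real effect reaches significance
    alpha: significance threshold (false positive rate when no effect exists)
    """
    false_positives = alpha * (1 - prior)  # no real effect, yet significant
    true_positives = power * prior         # real effect, correctly detected
    return false_positives / (false_positives + true_positives)


# Assumed scenario: 10% of tested effects are real, studies have 80% power.
well_powered = false_discovery_rate(prior=0.10, power=0.80)
# Same assumed prior, but an underpowered design with only 20% power.
underpowered = false_discovery_rate(prior=0.10, power=0.20)

print(f"well powered: {well_powered:.0%}")  # 36%
print(f"underpowered: {underpowered:.0%}")  # 69%
```

Under these assumed inputs, roughly a third of "discoveries" are wrong even when everything else is done perfectly, and more than two thirds are wrong when the studies are underpowered. Different assumptions about how many tested effects are real will shift the numbers.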
The numbers above are theoretical, but they are increasingly being backed up by hard evidence. Among the most widely cited randomized, controlled trials in the world’s highest-quality medical journals, 30 percent of findings have later turned out to be wrong or exaggerated. For non-randomized trials that number rises to an astonishing five out of six.
Over recent years Ioannidis’ argument has received support from multiple fields. Three years ago, when drugs company Amgen tried to replicate the “landmark publications” in the field of cancer drug development for a report published in Nature, 47 out of 53 could not be replicated. When Bayer attempted a similar project on drug target studies, 65 percent of the studies could not be replicated.
The problem is being tackled head-on in the field of psychology, which was shaken by the Stapel affair, in which one Dutch researcher fabricated data in over 50 fraudulent papers before being detected. The social sciences received another blow recently when Michael LaCour was accused of fabricating data; the case exposed how studies are routinely published without raw data ever being made available to reviewers.
A massive operation titled The Open Science Collaboration, involving 270 scientists, has so far attempted to replicate 100 psychology experiments, but has succeeded in replicating only 39. The project looked at the first articles published in 2008 in the leading psychology journals. The news wasn’t entirely bad; the majority of the non-replications were described by the researchers as having at the very least “slightly similar” findings. The resulting paper is currently under review for publication in Science, so we’ll have to wait before we get more details. The paper is likely to ruffle some feathers; tempers flared a few years ago when one of the most high-profile findings of recent years, the concept of behavioral priming, was called into question after a series of failed replications.
Whatever way you look at it, these issues are extremely worrying. Understanding the problem is essential in order to know when to take scientific claims seriously. Below I explore some of Ioannidis’ key observations:
The smaller the study, the less likely the findings are to be true.
Large studies are expensive, take longer and are less effective at padding out a CV; consequently we see relatively few of them. A statistically significant result from a small study, however, is far more likely to be a false positive, so small studies should be treated with caution. This problem is magnified when researchers fail to publish (or journals refuse to publish) negative findings, a problem known as publication bias or the file drawer problem.
The smaller the effect size, the less likely the findings are to be true.
This sounds like it should be obvious, but it is remarkable how much research fails to describe the strength of its results, referring instead to statistical significance alone, which is a far less useful measure. A study’s findings can be statistically significant yet have an effect size so weak that in reality the results are completely meaningless. This can be achieved through a process known as p-hacking, the method John Bohannon recently used to create a spoof paper finding that chocolate helps you lose weight. P-hacking involves playing with variables until a statistically significant result is achieved. As the neuroscientist and blogger Neuroskeptic demonstrated in a recent talk that you can watch online, this is not always the result of foul play, but can happen very easily by accident if researchers simply continue conducting research the way most currently do.
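The gap between statistical significance and effect size is easy to demonstrate with a rough calculation. The sketch below uses a simple two-group z-test and assumes equal group sizes, a known standard deviation, and an observed difference exactly equal to the stated effect size; the function name and all the numbers are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_p_value(effect_size, n_per_group):
    """Two-sided p-value for a two-group z-test.

    effect_size: observed difference in means, in standard deviations
    n_per_group: number of participants in each of the two groups
    """
    z = effect_size * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# A negligible effect measured in an enormous sample (hypothetical numbers).
tiny_effect_huge_sample = two_sample_p_value(effect_size=0.02, n_per_group=100_000)
# A sizeable effect measured in a small sample (hypothetical numbers).
big_effect_small_sample = two_sample_p_value(effect_size=0.5, n_per_group=20)

print(f"tiny effect, huge sample: p = {tiny_effect_huge_sample:.6f}")
print(f"big effect, small sample: p = {big_effect_small_sample:.3f}")
```

In this sketch the negligible effect clears the p < 0.05 bar comfortably while the much larger effect does not, which is why a p-value on its own says little about whether a result matters.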
The greater the number and the lesser the selection of tested relationships, the less likely the findings are to be true.
This was another key factor that enabled Bohannon to design a study rigged to support the case that eating chocolate helps you lose weight. Bohannon used 18 different types of measurements, relying on the fact that some would likely support his case by chance alone. This practice is currently nearly impossible to detect if researchers fail to disclose all the factors they looked at, which is a major reason behind the growing movement of researchers calling for the pre-registration of study methodology.
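A back-of-the-envelope calculation shows why measuring many things is such an effective trick. The sketch below assumes that every measurement is independent and that there is no real effect anywhere, which is a simplification (measurements taken on the same people are usually correlated); the function name is hypothetical.

```python
def chance_of_false_alarm(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests comes out
    'significant' at threshold alpha when no real effect exists."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 18):
    print(f"{n_tests:>2} measurements: {chance_of_false_alarm(n_tests):.0%}")
# Prints 5%, 23% and 60% respectively.
```

Under these assumptions, testing 18 measurements gives about a 60 percent chance of at least one "significant" result from pure noise.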
The greater the financial and other interests and prejudices, the less likely the findings are to be true.
It is always worth checking to see who funded a piece of research. Sticking with our chocolate theme, a recent study that found that chocolate is “scientifically proven to help with fading concentration” was funded by Hershey. On a more serious note, tobacco companies have a long history of funding fraudulent health research over the past century — described by the World Health Organization as “the most astonishing systematic corporate deceit of all time.” Today that baton has been handed to oil companies who give money to scientists who deny global warming and fund dozens of front groups with the purpose of sowing doubt about climate change.
The hotter a scientific field, the less likely the findings are to be true.
Though it seems counter-intuitive, false findings are particularly likely to be published, and then quickly debunked, in fast-moving fields of research where many researchers are working on the same problems at the same time. This has been dubbed the Proteus Phenomenon after the Greek god Proteus, who could rapidly change his appearance. The same can be said for research published in the sexiest journals, which only accept the most groundbreaking findings, where the problem has been dubbed the Winner’s Curse.
What does this all mean to you?
Thankfully science is self-correcting. Over time, findings are replicated or not replicated and the truth comes out in the wash. This is done through a process of replication involving larger, better controlled trials, meta-analyses where the data from many trials are aggregated and analyzed as a whole, and systematic reviews where studies are assessed based on predetermined criteria — preventing the cherry picking that we’re all, whether we like it or not, so naturally inclined to.
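As a rough illustration of how aggregating trials works, here is a minimal sketch of one common pooling approach, a fixed-effect meta-analysis with inverse-variance weights. Real meta-analyses involve far more than this (assessing study quality, heterogeneity, publication bias), and the three studies below are invented numbers, not data from any real trial.

```python
from math import sqrt


def pooled_estimate(studies):
    """Fixed-effect, inverse-variance pooled estimate.

    studies: list of (effect_estimate, standard_error) pairs
    Returns the pooled effect and its standard error.
    """
    # More precise studies (smaller standard error) get more weight.
    weights = [1 / se ** 2 for _, se in studies]
    total_weight = sum(weights)
    pooled = sum(w * effect for w, (effect, _) in zip(weights, studies)) / total_weight
    pooled_se = sqrt(1 / total_weight)
    return pooled, pooled_se


# Hypothetical data: one small study with a dramatic result,
# followed by two larger, more precise replications.
studies = [(0.80, 0.40), (0.10, 0.10), (0.05, 0.08)]
pooled, pooled_se = pooled_estimate(studies)

print(f"pooled effect: {pooled:.2f} (standard error {pooled_se:.2f})")
```

Because the small, noisy study carries little weight, the pooled estimate sits close to the two larger studies, and its standard error is smaller than that of any single study.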
Replications, meta-analyses and systematic reviews are by their nature far more useful for portraying an accurate picture of reality than original exploratory research. But systematic reviews rarely make headlines, which is a good reason the news is not the best place to get an informed opinion about matters of science. The problem is unlikely to go away any time soon, so whenever you hear about a new piece of science news, remember the principles above and the simple rule of thumb that studies of studies are far more likely to present a true picture of reality than individual pieces of research.
What does this mean for scientists?
For scientists, the discussion over how to resolve the problem is rapidly heating up, with calls for big changes to how researchers register, conduct, and publish research, and a growing chorus from hundreds of global scientific organizations demanding that all clinical trials be published. Perhaps most important, and most difficult to change, is the structure of perverse incentives that places intense pressure on scientists to produce positive results while actively encouraging them to quietly sit on negative ones.