Believe It Or Not, Most Published Research Findings Are Probably False
Ten years ago, a researcher claimed most published research findings are false; now a decade later, his claim is stronger than ever before. How can this be?
The rise of the Internet has worked wonders for the public's access to science, but it has also enabled a toxic combination of confirmation bias and Google: we can easily find a study to support whatever we already believe, without bothering to so much as look at research that might challenge our position, or, for that matter, at the research that supports it. I'm certainly not immune to credulously accepting research that has later been called into question, even on this blog, where I take great pains to be skeptical and to highlight false claims arising from research. Could it be that studies with incorrect findings are not rare anomalies, but actually represent the majority of published research?
The claim that "most published research findings are false" is something you might reasonably expect to hear from the most deluded kind of tin-foil-hat-wearing conspiracy theorist. Indeed, the statement is often wielded by fans of pseudoscience who take it at face value without applying the principles behind it to their own evidence. It is, however, a concept that is increasingly well understood by scientists. It is the title of a paper written 10 years ago by the legendary Stanford epidemiologist John Ioannidis. The paper, which has become the most widely cited ever published in the journal PLoS Medicine, examined how flaws ingrained in the scientific process, combined with the way we interpret statistical significance, mean that most published findings are likely to be incorrect.
Richard Horton, the editor of The Lancet, recently put it only slightly more mildly: "Much of the scientific literature, perhaps half, may simply be untrue." Horton agrees with Ioannidis' reasoning, blaming "small sample sizes, tiny effects, invalid exploratory analyses, and flagrant conflicts of interest, together with an obsession for pursuing fashionable trends of dubious importance." Horton laments: "Science has taken a turn towards darkness."
Last year, UCL pharmacologist and statistician David Colquhoun published a paper in the Royal Society's journal Open Science backing up Ioannidis' case: "If you use p=0.05 to suggest that you have made a discovery, you will be wrong at least 30 percent of the time." That figure assumes "the most optimistic view possible," in which every experiment is perfectly designed, with perfectly random allocation, zero bias, no multiple comparisons, and publication of all negative findings. Colquhoun concludes: "If, as is often the case, experiments are underpowered, you will be wrong most of the time."
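Colquhoun's arithmetic is easy to verify for yourself. The sketch below is my own illustration, not code from his paper: it assumes, for the sake of the example, that 10 percent of tested hypotheses are actually true, and compares well-powered studies (80 percent power) with underpowered ones (20 percent).

```python
import random

random.seed(1)

def false_discovery_rate(n_tests, prior_true, power, alpha=0.05):
    """Fraction of 'significant' findings that are actually false positives."""
    true_pos = false_pos = 0
    for _ in range(n_tests):
        if random.random() < prior_true:      # a real effect exists...
            if random.random() < power:       # ...and the study detects it
                true_pos += 1
        elif random.random() < alpha:         # no effect, but p < alpha anyway
            false_pos += 1
    return false_pos / (true_pos + false_pos)

# Well-designed studies: 10% of hypotheses true, 80% power.
print(false_discovery_rate(1_000_000, prior_true=0.10, power=0.80))  # ~0.36
# Underpowered studies (20% power): most "discoveries" are now false.
print(false_discovery_rate(1_000_000, prior_true=0.10, power=0.20))  # ~0.69
```

Even under these generous assumptions, more than a third of "discoveries" are false; cut the power and the false findings become the majority, which is exactly Colquhoun's point about underpowered experiments.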
The numbers above are theoretical, but hard evidence increasingly backs them up. Among the most widely cited randomized controlled trials in the world's highest-quality medical journals, 30 percent of findings were later found to be wrong or exaggerated. For non-randomized trials, that number rises to an astonishing five out of six.
Over recent years, Ioannidis' argument has received support from multiple fields. Three years ago, when the drug company Amgen tried to replicate the "landmark publications" in the field of cancer drug development for a report published in Nature, 47 of the 53 studies could not be replicated. When Bayer attempted a similar project on drug-target studies, 65 percent of the studies could not be replicated.
The problem is being tackled head-on in the field of psychology, which was shaken by the Stapel affair, in which one Dutch researcher fabricated data in over 50 fraudulent papers before being detected. The social sciences received another blow recently when Michael LaCour was accused of fabricating data; the case exposed how studies are routinely published without raw data ever being made available to reviewers.
A massive operation titled The Open Science Collaboration, involving 270 scientists, has so far attempted to replicate 100 psychology experiments, but only succeeded in replicating 39 studies. The project looked at the first articles published in 2008 in the leading psychology journals. The news wasn't entirely bad; the majority of the non-replications were described by the researchers as having at the very least "slightly similar" findings. The resulting paper is currently under review for publication in Science, so we'll have to wait before we get more details. The paper is likely to ruffle some feathers; tempers flared a few years ago when one of the most high-profile findings of recent years, the concept of behavioral priming, was called into question after a series of failed replications.
Whatever way you look at it, these issues are extremely worrying. Understanding the problem is essential in order to know when to take scientific claims seriously. Below I explore some of Ioannidis' key observations:
The smaller the study, the less likely the findings are to be true.
Large studies are expensive, take longer, and are less effective at padding out a CV; consequently, we see relatively few of them. Small studies, however, are far more likely to produce statistically significant results that are in fact false positives, so they should be treated with caution. The problem is magnified when researchers fail to publish (or journals refuse to publish) negative findings, a problem known as publication bias, or the file-drawer problem.
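The interaction between small studies and the file drawer can be simulated directly. In this sketch (my own illustration; the effect size and sample size are arbitrary assumptions), a real but modest effect of 0.2 standard deviations is studied over and over with small samples, and only the statistically significant results get "published":

```python
import math
import random

random.seed(42)

def mean_published_effect(true_effect, n_per_study, n_studies):
    """Average reported effect size across studies that reach p < 0.05,
    when non-significant results stay in the file drawer.
    Each observation ~ Normal(true_effect, 1); z-test with known sd."""
    published = []
    for _ in range(n_studies):
        sample_mean = sum(random.gauss(true_effect, 1)
                          for _ in range(n_per_study)) / n_per_study
        z = sample_mean * math.sqrt(n_per_study)   # standard error = 1/sqrt(n)
        if abs(z) > 1.96:                          # two-sided p < 0.05
            published.append(sample_mean)
    return sum(published) / len(published)

# True effect is 0.2 sd, but the published record tells another story:
print(mean_published_effect(0.2, n_per_study=20, n_studies=50_000))  # ~0.5
```

Only about one small study in seven reaches significance here, and those that do overstate the true effect more than twofold: publication bias doesn't just hide negative results, it inflates the positive ones.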
The smaller the effect size, the less likely the findings are to be true.
This sounds like it should be obvious, but it is remarkable how much research fails to actually describe the strength of its results, preferring to refer to statistical significance alone, which is a far less useful measure. A study's findings can be statistically significant yet have an effect size so weak that in reality the results are completely meaningless. This can be achieved through a process known as P-hacking, which was the method John Bohannon recently used to create a spoof paper finding that chocolate helps you lose weight. P-hacking involves playing with variables until a statistically significant result is achieved. As the neuroscientist and blogger Neuroskeptic demonstrated in a recent talk (which you can watch online), this is not always the result of foul play; it can happen very easily by accident if researchers simply keep conducting research the way most do now.
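One of the easiest accidental routes to a false positive is "peeking": running the test, and if p is not yet below 0.05, collecting a few more data points and testing again. The sketch below (my own illustration of the general phenomenon, not taken from the talk) measures how often pure noise reaches significance under that procedure:

```python
import math
import random

random.seed(7)

def two_sided_p(z):
    """Two-sided p-value for a z statistic under the normal distribution."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

def peeking_false_positive_rate(n_studies, first_look=10, max_n=100):
    """How often data with NO real effect reach p < 0.05 when the analyst
    re-tests after every new observation and stops at the first 'hit'."""
    hits = 0
    for _ in range(n_studies):
        total = 0.0
        for n in range(1, max_n + 1):
            total += random.gauss(0.0, 1.0)    # null is true: mean 0, sd 1
            if n >= first_look:
                z = total / math.sqrt(n)       # z-test with known sd
                if two_sided_p(z) < 0.05:
                    hits += 1
                    break
    return hits / n_studies

print(peeking_false_positive_rate(10_000))  # far above the nominal 0.05
```

A single pre-planned test is "significant" 5 percent of the time by definition; re-testing after every new observation pushes the false-positive rate several times higher, with no intent to deceive anywhere in the process.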
The greater the number and the lesser the selection of tested relationships, the less likely the findings are to be true.
This was another key factor that enabled Bohannon to rig his study to support the claim that eating chocolate helps you lose weight. Bohannon took 18 different measurements, relying on the fact that some would likely support his case through chance alone. The practice is nearly impossible to detect if researchers fail to disclose all the factors they looked at, which is a major reason behind the growing movement of researchers calling for the pre-registration of study methodology.
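The arithmetic behind Bohannon's trick is short enough to check directly. Assuming, for simplicity, that the 18 measurements were independent (in reality they were correlated, which changes the exact number but not the lesson):

```python
def chance_of_spurious_hit(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests of pure noise
    comes up 'significant' at p < alpha."""
    return 1 - (1 - alpha) ** n_tests

print(f"{chance_of_spurious_hit(18):.0%}")  # 60%
```

With 18 outcomes measured, a "significant" result was more likely than not even if chocolate did nothing at all, which is precisely why pre-registering the planned analyses matters.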
The greater the financial and other interests and prejudices, the less likely the findings are to be true.
It is always worth checking who funded a piece of research. Sticking with the chocolate theme, a recent study finding that chocolate is "scientifically proven to help with fading concentration" was funded by Hershey. On a more serious note, tobacco companies spent much of the past century funding fraudulent health research, described by the World Health Organization as "the most astonishing systematic corporate deceit of all time." Today that baton has been handed to oil companies, which give money to scientists who deny global warming and fund dozens of front groups whose purpose is to sow doubt about climate change.
The hotter a scientific field, the less likely the findings are to be true.
Though it seems counter-intuitive, false findings are particularly likely to be published and then quickly debunked in fast-moving fields, where many researchers work on the same problems at the same time. This has been dubbed the Proteus Phenomenon, after the Greek god Proteus, who could rapidly change his appearance. The same is true of research published in the sexiest journals, which accept only the most groundbreaking findings; there, the problem has been dubbed the Winner's Curse.
What does this all mean to you?
Thankfully, science is self-correcting. Over time, findings are replicated or not, and the truth comes out in the wash. This happens through replication involving larger, better-controlled trials; through meta-analyses, where the data from many trials are aggregated and analyzed as a whole; and through systematic reviews, where studies are assessed against predetermined criteria, preventing the cherry-picking that we're all, whether we like it or not, so naturally inclined toward.
Replications, meta-analyses, and systematic reviews are by their nature far better at portraying an accurate picture of reality than original exploratory research. But systematic reviews rarely make headlines, which is one reason the news is not the best place to form an informed opinion about science. The problem is unlikely to go away any time soon, so whenever you hear about a new piece of science news, remember the principles above and the simple rule of thumb that studies of studies are far more likely to present a true picture of reality than individual pieces of research.
What does this mean for scientists?
For scientists, the discussion over how to resolve the problem is rapidly heating up, with calls for big changes to how researchers register, conduct, and publish research, and a growing chorus from hundreds of scientific organizations worldwide demanding that all clinical trials be published. Perhaps most important, and most difficult to change, is the structure of perverse incentives that places intense pressure on scientists to produce positive results while actively encouraging them to quietly sit on negative ones.
Some mysteries take generations to unfold.
- In 1959, a group of nine Russian hikers was killed in an overnight incident in the Ural Mountains.
- Conspiracy theories about their deaths have flourished ever since, invoking everything from an alien invasion to an irate Yeti to angry tribesmen.
- Researchers have finally confirmed that their deaths were due to a slab avalanche caused by intense winds.
a: Last picture of the Dyatlov group taken before sunset, while making a cut in the slope to install the tent. b: Broken tent covered with snow as it was found during the search 26 days after the event.
Photographs courtesy of the Dyatlov Memorial Foundation.

Finally, a new study (https://www.nature.com/articles/s43247-020-00081-8), published in the Nature journal Communications Earth & Environment, has put the case to rest: it was a slab avalanche.

The theory isn't exactly new. Researchers had long been skeptical of the avalanche explanation because of the grade of the hill, but slab avalanches don't need a steep slope to get started. Crown or flank fractures can quickly release as little as a few centimeters of snow sliding down a hill (or mountain).

As researchers Johan Gaume (of Switzerland's WSL Institute for Snow and Avalanche Research SLF) and Alexander Puzrin (of Switzerland's Institute for Geotechnical Engineering) write, it was "a combination of irregular topography, a cut made in the slope to install the tent and the subsequent deposition of snow induced by strong katabatic winds contributed after a suitable time to the slab release, which caused severe non-fatal injuries, in agreement with the autopsy results."

Conspiracy theories abound when evidence is lacking. Twenty-six days after the incident, a team showed up to investigate. They didn't find any obvious signs of an avalanche; the slope angle was below 30 degrees, ruling out (to them) the possibility of a slide, and the head injuries the hikers suffered were not typical of avalanche victims. Inject doubt, and crazy theories will flourish.
Configuration of the Dyatlov tent installed on a flat surface after making a cut in the slope below a small shoulder. Snow deposition above the tent is due to wind transport of snow (with deposition flux Q).
Photo courtesy of Communications Earth & Environment.

Add to this Russian leadership's longstanding battle with (or against) the truth. In 2015, the Investigative Committee of the Russian Federation decided to reopen the case. Four years later, the agency concluded that it was indeed a snow avalanche, an assertion immediately challenged within the Russian Federation, though the oppositional agency eventually agreed as well. The problem was that neither side provided conclusive scientific evidence.

Gaume and Puzrin went to work. They identified four critical factors that confirmed the avalanche:

- The location of the tent under a shoulder in a locally steeper slope, to protect it from the wind
- A buried weak snow layer parallel to the locally steeper terrain, which resulted in an upward-thinning snow slab
- The cut the group made in the snow slab to install the tent
- Strong katabatic winds that led to progressive snow accumulation due to the local topography (the shoulder above the tent), causing a delayed failure

Case closed? It appears so, though don't expect the conspiracy theories to abate. Good research takes time, sometimes generations. We're constantly learning about our environment and then applying those lessons to the past. While we can't expect every skeptic to accept the findings, from the looks of this study, a 62-year-old case is now closed.