
‘The first crack in the wall of significance testing’

A major psychology journal has banned the use of the near-universally adopted practice of significance testing, citing recent evidence of the technique’s unreliability. What will be the fallout for psychology as a field?

If you have ever read a piece of psychology research cover to cover, you will almost certainly have witnessed the p-value, the controversial statistical measure that we have discussed at length on this blog previously. Last month, a scientific journal made perhaps the boldest move since journals began opening their doors to open access. The journal, Basic and Applied Social Psychology (BASP), has banned the use of null hypothesis significance testing, a technique used almost universally in psychology research and much scientific research across the board. Some, however, might call this the nuclear option, as when used properly, the p-value can be a useful indicator. Instead of significance testing, the journal will rely on arguably more reliable measures that are often left out of modern psychology research:


“BASP will require strong descriptive statistics, including effect sizes. We also encourage the presentation of frequency or distributional data when this is feasible. Finally, we encourage the use of larger sample sizes than is typical in much psychology research, because as the sample size increases, descriptive statistics become increasingly stable and sampling error is less of a problem.”
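One of the effect sizes the editorial has in mind is the standardized difference between two group means, commonly reported as Cohen's d. As a rough illustration (the groups and numbers below are invented for demonstration), it can be computed directly from the raw data:

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d: difference between two group means, in units of
    the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Sample variances (Bessel-corrected)
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd

# Hypothetical scores from a made-up two-group study
treatment = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7]
control = [4.2, 4.5, 3.9, 4.8, 4.1, 4.4]
print(round(cohens_d(treatment, control), 2))
```

Unlike a p-value, the result answers the question readers usually care about: not "could this difference be chance?" but "how big is the difference?"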

Significance testing is one of the most important, yet most widely misunderstood, concepts in science. Over at the excellent Science-Based Medicine blog, Yale clinical neurologist Steven Novella sums up the problem well:

“The p-value was never meant to be the sole measure of whether or not a particular hypothesis is true. Rather it was meant only as a measure of whether or not the data should be taken seriously.”

Novella’s account refers to an absolutely beautiful GuitarHero-meets-SpaceInvaders-meets-Tetris-meets-roulette statistical demonstration of the problem by Geoff Cumming, dubbed “The dance of the p-values.” If it is not the most inspired stats lesson you’ve ever had, then I’ll eat my hat.
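The core of Cumming's demonstration can be sketched in a few lines: run the *same* experiment repeatedly, drawing fresh samples from populations with a genuine, fixed difference, and watch the p-value lurch from one replication to the next. (This is a minimal simulation of the idea, not Cumming's actual demonstration; the effect size, sample size, and the normal approximation to the t-test are all choices made here for brevity.)

```python
import math
import random
import statistics

def two_sample_p(a, b):
    """Approximate two-sided p-value for a difference in means,
    using a normal approximation to keep the sketch dependency-free."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    # Two-sided tail probability under a standard normal
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

random.seed(1)
# A real effect exists: the populations differ by half a standard deviation.
p_values = []
for _ in range(20):
    a = [random.gauss(0.5, 1) for _ in range(32)]
    b = [random.gauss(0.0, 1) for _ in range(32)]
    p_values.append(two_sample_p(a, b))

# The same true effect, twenty replications: p "dances" widely.
print(f"min p = {min(p_values):.4f}, max p = {max(p_values):.4f}")
```

Despite an identical underlying effect in every run, some replications come out looking "highly significant" and others entirely unremarkable, which is exactly why a single p-value is such a shaky foundation for a conclusion.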

Image Credit: Adapted from original artwork by Shutterstock.

Reference:

Trafimow, D., & Marks, M. (2015). Editorial. Basic and Applied Social Psychology, 37(1), 1–2. DOI: http://dx.doi.org/10.1080/01973533.2015.1012991

