Faulty Statistics Muddy fMRI Results

An analysis of the widely used technique calls into question the validity of 40,000 studies.

Jul 6, 2016
Tanya Lewis

Functional magnetic resonance imaging (fMRI) is widely used in neuroscience. But according to a recent analysis by researchers from Massachusetts General Hospital in Boston, one of the most commonly used software packages for fMRI data generates false-positive rates as high as 70 percent. The analysis calls some 40,000 studies into question, the researchers reported last week (June 28) in PNAS.

“Though fMRI is 25 years old, surprisingly its most common statistical methods have not been validated using real data,” study coauthor Anders Eklund of Linköping University, in Sweden, told Wired.

The imaging method divides the brain into small three-dimensional units called voxels, in which brain activity is measured. The analysis software then sorts through these voxels, looking for contiguous “clusters” of similar activity.
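The clustering step described above can be illustrated with a toy sketch: threshold a map of voxelwise statistics, then group neighboring suprathreshold voxels into connected components. This is a simplified 2-D illustration of the general idea, not the actual algorithm used by SPM, FSL, or AFNI; the random data and threshold value here are arbitrary choices for demonstration.

```python
import numpy as np

# Fake voxelwise test statistics on a small 2-D grid (real fMRI maps are 3-D).
rng = np.random.default_rng(0)
stat_map = rng.normal(size=(16, 16))
threshold = 2.3                     # an illustrative cluster-forming cutoff
above = stat_map > threshold        # suprathreshold voxels

def clusters(mask):
    """Label 4-connected components of True voxels; return their sizes."""
    seen = np.zeros_like(mask, dtype=bool)
    sizes = []
    rows, cols = mask.shape
    for i in range(rows):
        for j in range(cols):
            if mask[i, j] and not seen[i, j]:
                # Flood-fill one cluster starting from this voxel.
                stack, size = [(i, j)], 0
                seen[i, j] = True
                while stack:
                    x, y = stack.pop()
                    size += 1
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < rows and 0 <= ny < cols
                                and mask[nx, ny] and not seen[nx, ny]):
                            seen[nx, ny] = True
                            stack.append((nx, ny))
                sizes.append(size)
    return sizes

sizes = clusters(above)
```

Cluster-based inference then asks whether any cluster is larger than chance would predict; the PNAS study's finding is that the standard software's estimate of that chance level is often wrong.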

Eklund and colleagues compiled publicly available resting-state fMRI data from nearly 500 healthy controls. The researchers randomly assigned some of the subjects to a control group and others to an “experimental” group. They then analyzed the data with three commonly used software packages (SPM, FSL, and AFNI), repeating the procedure thousands of times, Ars Technica reported.
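The logic of this test can be sketched in miniature: because both groups are drawn from the same null data, any “significant” group difference is by construction a false positive, and repeating the random split many times estimates how often the analysis produces one. The sketch below is a toy illustration of that idea, not the paper's actual pipeline; it uses simulated noise, a simple per-voxel t statistic, and no multiple-comparison correction, which is why the familywise false-positive rate comes out far above the nominal 5 percent.

```python
import numpy as np

# Pure-noise "subjects": no real effect exists anywhere.
rng = np.random.default_rng(1)
n_subjects, n_voxels, n_splits = 40, 100, 500
data = rng.normal(size=(n_subjects, n_voxels))

false_alarms = 0
for _ in range(n_splits):
    # Randomly assign subjects to two groups of 20, as in the study's design.
    perm = rng.permutation(n_subjects)
    a, b = data[perm[:20]], data[perm[20:]]
    # Welch-style two-sample t statistic at every voxel.
    t = (a.mean(0) - b.mean(0)) / np.sqrt(
        a.var(0, ddof=1) / 20 + b.var(0, ddof=1) / 20)
    # Declare a (false) discovery if ANY voxel crosses an uncorrected
    # two-sided ~0.05 cutoff (|t| > 2.02 for roughly 38 degrees of freedom).
    if np.any(np.abs(t) > 2.02):
        false_alarms += 1

familywise_rate = false_alarms / n_splits
```

With 100 independent voxels each tested at 5 percent, at least one false positive turns up in nearly every split, which is why corrected thresholds exist; the PNAS result is that one widely used correction method badly underestimates the true rate.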

In addition to finding a startlingly high false-positive rate, the researchers discovered a bug in the AFNI software package. When they reran their analysis with a debugged version of the software, false positives dropped by more than 10 percent, according to Ars Technica. However, the raw data from most studies that used the faulty software are not available, so those results cannot easily be reanalyzed, the researchers noted in their paper.

Update (July 13): Following publication of this study, coauthor Thomas Nichols noted on his blog that the number of studies called into question by his team’s analysis may be lower than the published figure (40,000). Nichols estimates that only around 15,000 papers use the multiple testing correction method referenced in the study, and only 3,500 of those would result in error rates of 50 percent or more. “I frankly thought this number would be higher, but didn’t [realize] the large proportion of studies that never used any sort of multiple testing correction,” he wrote. On Twitter, Nichols told The Scientist that “an errata is in press.”