A large-scale effort to repeat psychological experiments has failed to confirm the original results about half the time, according to a study published yesterday (November 19) on the preprint server PsyArXiv and scheduled for publication in Advances in Methods and Practices in Psychological Science. According to the manuscript, the failures were not due to differences in study populations between the original experiments and the replications, Nature reports.
“Those [replication efforts] that failed tended to fail everywhere,” Brian Nosek, who led the study and is the executive director of the Center for Open Science in Charlottesville, Virginia, tells Nature.
In the study, known as Many Labs 2, 60 labs in 36 countries and territories repeated 28 classic and contemporary psychology experiments. For example, they successfully replicated a 1981 study by Daniel Kahneman of Princeton University about framing effects, or how people react differently to the same choice depending on how it is worded, such as whether it is framed as a gain or a loss.
The replicators set their statistical threshold for significance at a p-value of less than 0.0001, a much more stringent cutoff than the usual standard of p < 0.05. Overall, they failed to reproduce 14 of the 28 experiments.
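To illustrate what that stricter cutoff means in practice, here is a hypothetical sketch (the p-values are made up for demonstration and are not Many Labs 2 data): a result can clear the conventional p < 0.05 bar while still failing the replicators’ p < 0.0001 criterion.

    # Hypothetical sketch with made-up p-values; not Many Labs 2 data.
    # Shows how the stricter cutoff (p < 0.0001) can reject a result
    # that the conventional p < 0.05 standard would accept.
    hypothetical_results = {
        "effect_a": 0.00002,  # clears both thresholds
        "effect_b": 0.003,    # clears p < 0.05 but not p < 0.0001
        "effect_c": 0.41,     # clears neither
    }

    for name, p in hypothetical_results.items():
        print(f"{name}: p = {p:g}, "
              f"p < 0.05: {p < 0.05}, p < 0.0001: {p < 0.0001}")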
As The Atlantic notes, online bettors were quite good at predicting which experiments would be successfully repeated.
See also: “Gambling on Reproducibility”
The study found that differences in study populations could not explain why these replication efforts failed, even though such differences are a common excuse, Nosek tells The Atlantic. “Your replication attempt failed? It must be because you did it in Ohio and I did it in Virginia, and people are different. But these results suggest that we can’t just wave those failures away very easily.”
See also: “Latest Effort To Reproduce Psych Studies Yields 62 Percent Success”
The consistency of the results across the replicating groups is encouraging, even though many of those results were failures to reproduce earlier findings, University of Oregon psychologist Sanjay Srivastava, who was not involved in the project, tells The Atlantic. If the results had been more variable, that would have cast doubt not just on the replication failures but also on the successes. “That might allow us to dismiss failed replications, but it would require us to dismiss original studies, too,” Srivastava says. “In the long run, Many Labs 2 is a much more hopeful and optimistic result.”