Dealing with Irreproducibility

At AACR, researchers discuss the growing pressures that are driving increases in retraction rates.

April 8, 2014

Recent years have seen increasing numbers of retractions, higher rates of misconduct and fraud, and general problems of data irreproducibility, spurring the National Institutes of Health (NIH) and others to launch initiatives to improve the quality of research results. Yesterday (April 7), at this year’s American Association for Cancer Research (AACR) meeting, researchers gathered in San Diego, California, to discuss why these problems have come to a head and how to fix them.

“We really have to change our culture and that will not be easy,” said Lee Ellis from the University of Texas MD Anderson Cancer Center, referring to the immense pressure researchers often feel to produce splashy results and publish in high-impact journals. Ellis emphasized that it is particularly important in biomedical research to ensure that the data coming out of basic research studies—which motivate human testing—is accurate. “Before we start a clinical trial, we’d better be sure this has some potential to help our patients,” he said.

C. Glenn Begley, chief scientific officer of TetraLogic Pharmaceuticals and former vice president of hematology and oncology research at Amgen, discussed a project undertaken by Amgen researchers to reproduce the results of more than 50 published studies. The vast majority were irreproducible, even by the original researchers who had done the work. “That shocked me,” he said.

William Sellers, global head of oncology at Novartis Institutes for Biomedical Research, described a similar experience. In addition to being unable to reproduce the majority of published experiments they attempted, Sellers and his colleagues got startling results when they began to verify the cell lines they purchased, finding that several commonly used lines were mislabeled as the wrong cancer type.

And these were just a few of the myriad problems that plague the literature, the experts noted. Lack of blinding or controls, unvalidated reagents, and inappropriate statistical tests were also common in the top-tier publications the researchers surveyed, not to mention the rising rates of research misconduct.

As for the cause of these problems, the panelists cited pressure from journals to tell nicely packaged stories, a professional culture that emphasizes high-impact publications, and the ongoing funding strain. “Right now, we have a system that I think is an unprecedented scientific enterprise, but by under-resourcing it, we’re placing it under enormous pressure,” said Ferric Fang of the University of Washington, who has studied rates and causes of retractions and misconduct.

The discussants offered a handful of possible solutions. For reagents and cell lines, Sellers suggested a Wikipedia-like reporting system through which properties could be recorded and verified. And for all the thousands of publications that have used inappropriate or mislabeled materials, retractions may be impractical, he noted, suggesting instead some sort of flagging system on PubMed that could alert readers to potential problems.

And when it comes to outright misconduct, which has been on the rise in recent years, Ellis argued for more stringent consequences. “The punishment of being found guilty of misconduct is relatively light,” he said. “For those found guilty of fraud . . . you should be out [of science], that’s my personal feeling.”

Whatever the solution, the panelists agreed that something needs to happen—and soon. “Our ability to take a drug from concept to FDA [Food and Drug Administration] approval is very poor,” said Ellis. “In the field of cancer, only about 5 percent of drugs that start end up with FDA approval. To improve upon this dismal 5 percent success rate, we really need to have more confidence in our data.”


@mrgunn


Posts: 1

April 8, 2014

Thanks for your coverage of this important issue. I was there at this session and the room was definitely packed. In fact, they had to turn people away! I'm grateful to the panelists for being thought leaders on this issue in their community, and I would also like to point out that the Wikipedia-like system and the flagging system on PubMed both currently exist.

The Reproducibility Initiative has a validation service, and PubMed Commons allows comments to be entered below any abstract on PubMed.

I would be happy to answer any questions about the Initiative that readers of this article may have.

tvence


Posts: 1052

Replied to a comment from @mrgunn made on April 8, 2014

April 8, 2014

Thanks for your comment, @mrgunn! We're indeed interested in both the Reproducibility Initiative and PubMed Commons.

All the best,

Tracy Vence

News Editor, The Scientist

Richard Sever

Posts: 2

April 9, 2014

The bioRxiv preprint server includes categories for "Confirmatory Results" and "Contradictory Results" (in addition to the more common "New Results" category that covers most papers). This is intended to allow authors to rapidly and easily communicate the reproducibility of work by other groups, which is often hard to publish in regular journals.

Paul Yarnold

Posts: 1

April 25, 2014

Results fail to cross-generalize because linear models are preposterous. It is not “one size fits all” as the linear models suggest. Rather, it is “one size fits no one”. The problem is exacerbated in longitudinal trials, due to confounding attributable to Simpson's Paradox. What is needed is an improvement in the statistical education of both teachers and students. While the applied sciences continue to evolve, researchers rely on applied statistical procedures that were developed close to a century ago, or earlier. A provisional patent was awarded a week ago for identifying the descendant (Pareto-optimal) family of globally optimal (maximum-accuracy) statistical models for any given data set. This development finally offers a data-optimal, automatic non-linear analysis, which should alleviate issues with ancient methods.

Selc


Posts: 3

April 29, 2014

Irreproducibility does not equate to misconduct. Many of the greatest experiments in science had problems with reproducibility. In fact, one could argue that the BEST science will always be at the fringes of what is doable, and can be very difficult to reproduce until others are up to snuff. This coming down on researchers for initially irreproducible results could have a chilling effect upon the best science at the frontiers and produce a lot of mediocre yet easily reproducible results for fear of taking chances...with little consequent long-term progress. I always regard one well-done positive study set against lots of negative studies (particularly if the negative ones are large and have inhomogeneous samples) as potentially important. There are also innumerable cases where studies are actually inappropriate because we just don't have the technical ability or homogeneous enough samples to produce sensible results with meaningful power. In those cases, common sense and relying on a few positive "small" but well-done studies may be better than trying to do a study which will produce statistically meaningless (often negative) results. Careful review of difficult-to-reproduce research results on a case-by-case basis is the only real way to ensure good science.