Do That Again

A new initiative offers gold stars to researchers willing to have their studies replicated by other labs, but will it fix science’s growing irreproducibility problem?

By | August 15, 2012

In 2009, Science published a paper linking chronic fatigue syndrome with the mouse virus XMRV, prompting a flurry of subsequent studies, none of which could replicate the findings. The paper was retracted last year. In 2010, Science published a paper describing a strain of bacteria that incorporated arsenic instead of phosphorus into its DNA backbone, only to publish two studies refuting the findings this July. In this case, the journal has not asked the authors for a correction or retraction, citing the self-correcting nature of the scientific process.

And these high-profile examples are by no means isolated incidents. In 2011, scientists at Bayer Healthcare in Germany recounted their dismal experience in trying to validate published research on new drug targets: in more than 75 percent of the 67 studies they attempted, Bayer’s labs could not replicate the published findings. This past March, researchers at Amgen reported a similar problem, successfully reproducing the results of just six of the 53 hematology and oncology studies they attempted.

Indeed, published studies whose findings cannot be reproduced appear to be on the rise, and while some such studies are later retracted, many stand, collecting citations, either because no one has tried to replicate the data or because those who have, successfully or not, cannot get their studies published. A new partnership by the start-up Science Exchange, an online marketplace for outsourcing experiments, and the open-access journal PLoS ONE hopes to address the issue of scientific reproducibility. Announced yesterday (August 14), the Reproducibility Initiative provides a platform for researchers to volunteer their studies for replication by independent third parties. Studies validated through the initiative will earn a certificate of reproducibility, similar to a Consumer Reports recommendation for a particular car model.

“We think that, long term, there will ultimately be a shift from rewarding highly unexpected results to rewarding reproducible, high-quality results that are really true,” said Elizabeth Iorns, a former breast cancer researcher and CEO of Science Exchange.   Whether or not the new incentive system will have a broad impact on the scientific community, however, remains up for debate.

The Reproducibility Initiative takes advantage of Science Exchange’s existing network of more than 1,000 core facilities and commercial research organizations.  Researchers submit their studies to the initiative, which then matches the studies with qualified facilities that will attempt to replicate the studies for a fee.  The pilot program is accepting 40–50 studies, with preference given to preclinical studies that have translational value.  Submitting researchers will have to pay for the replication studies, which Iorns estimates might cost one-tenth that of the original study, as well as a 5 percent transaction fee to Science Exchange. Participants will remain anonymous unless they choose to publish the replication results in a PLoS ONE Special Collection later this year, which will include overall statistics on the rate of replication.

“We can’t oblige anyone to publish anything,” said Damian Pattinson, the executive editor of PLoS ONE, though he stressed, “If you can’t reproduce the study, it’s very important that people know that.”

The new initiative, he added, is in line with the journal’s record of publishing studies replicating previous findings or presenting negative data—the kind of research often ignored by prominent journals. The current incentive structure for scientific research “pays limited attention to replication and more attention to innovation and extravagant claims,” agreed John Ioannidis, a professor of medicine, health research and policy, and statistics at Stanford University and a scientific advisor to the Reproducibility Initiative.  As a result, researchers are pressured to pursue novel areas of research, undermining the self-correcting nature of science cited by some as the solution to the irreproducibility problem.

Even researchers with no intent to commit fraud or misconduct are pressured to cut corners or succumb to bias. In 2005, Ioannidis published a widely read paper in PLoS Medicine suggesting that most published scientific findings are actually false. According to Ioannidis, many false positives stem from researchers hunting for statistically significant results with little regard for the likelihood of the relationship being tested. “As long as journals and reviewers are seeking to publish the ‘perfect story,’ investigators are almost subconsciously persuaded to select their best data for manuscript submission,” said Lee Ellis, a cancer researcher at the MD Anderson Cancer Center and a member of the Reproducibility Initiative’s scientific advisory board.

Researchers say they are eager to see how the Reproducibility Initiative, which appears to be the first of its kind, plays out, and whether it will begin to make a dent in the rising number of irreproducible results in the published literature. With its self-selecting participants, the Reproducibility Initiative is unlikely to uncover scientific misconduct, but it may offer incentives for more rigorous research practices. “It will be very interesting to see what kind of teams would be interested in submitting their studies for replication,” said Ioannidis.

Iorns suspects that researchers trying to license their discoveries to industry might find a certificate of reproducibility particularly useful. In the long term, added Ellis, perhaps high-impact journals will ask for proof of reproducibility prior to publication. “If studies are required to be validated, then perhaps the highest impact journals will more likely publish articles that are likely to impact the lives of our patients,” he said.

“In theory it’s a good idea,” agreed Arturo Casadevall, a microbiologist and immunologist at the Albert Einstein College of Medicine, who is not involved with the initiative.  That said, he could not envision his own lab participating in the Reproducibility Initiative.  “I just don’t see your average scientist running a lab on a very tight budget having the money to have experiments done elsewhere.” Still, he added, “anything out there that tries to improve scientific integrity has to be looked on positively.”

EllenHunt


Posts: 74

August 16, 2012

So, let us accept first that 6/53 and 17/67 studies are correct. Call it 18% or 1/5th of studies. Let us further predicate that each of those studies had a p value of 0.05. I would guess that most of those studies reported p values much lower, but let's be generous to the authors.

Given a 5% probability of achieving the result by random chance, we would expect 95% of the studies to be reproducible. Obviously one of two things is going on. Either the authors are engaging in fraudulent data-filtering, or else we are seeing the filtering effect of positive publication submissions from around 2000 labs who happen to get positive results by chance. This latter certainly seems quite plausible.
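The arithmetic behind this comment can be sketched as follows. This is only an illustration: the 6/53 and 17/67 counts, the 0.05 threshold, and the figure of 2,000 labs come from the comment itself, while the assumption that each lab runs a single test of an effect that does not actually exist is added here for simplicity.

```python
# Figures quoted in the comment; everything else is an illustrative assumption.
reproduced = 6 + 17   # studies the comment counts as reproduced (Amgen + Bayer)
attempted = 53 + 67   # studies the two companies attempted
pooled_rate = reproduced / attempted

ALPHA = 0.05          # significance threshold assumed in the comment
LABS = 2000           # hypothetical number of labs, each assumed to test one nonexistent effect

# Expected number of labs that reach significance purely by chance.
expected_chance_positives = ALPHA * LABS

print(f"Pooled reproduction rate: {pooled_rate:.1%}")
print(f"Expected chance positives among {LABS} labs: {expected_chance_positives:.0f}")
```

Under those assumptions, roughly one study in five is reproduced, and about 100 of the 2,000 labs would obtain a "significant" result by chance alone. If mostly those positive results are submitted and published, the filtering effect described in the comment follows.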

It is common for papers to hide their actual data and just report things like Pearson's blah de blah was woofle waffle. Biologists tend to be not so mathematically inclined and use software, the arcana of which they do not understand. It is not unknown to feed data into statistics programs and decide what to use based on how it comes out. Whether or not that method is the right one may remain a good question. Biostatisticians may or may not help because their job may be on the line.

I have seen two instances of the reverse, or something like it, in the last ten years. In one case, the biologist had a result that was meaningful and it made perfect biological sense that it was. But he didn't know how to analyze the data. In the second, the experimental design was statistically poor.

However, let us consider that the 53 drug target studies originated with high throughput, most likely. That suggests that contamination or assay problems of some sort interfered with results, and not just once. It is possible, I suppose, that high throughput results could be reported raw, without verification. But that seems unlikely. There is at least the suggestion then, that grad students and post-docs motivated to get out of their slavery could be motivated to ensure that results were confirming.

I remember fielding a series of calls a year or so ago from a scientist who had formed a company on the basis of results a grad student got. He was frantic because he couldn't reproduce those results. We went over his numbers, and I had to tell him that based on what he was telling me I didn't think the original results reported were possible. In other words, his grad student had probably lied to him, poor man, and he believed. It was a new area for him.

And then there is my experience that salts my evaluation. I know of multiple instances when PIs sanctioned publication of data that was entirely made up. They did it because it made them money, grant money. They did it because to be honest would have denied them that chair, or denied them tenure. Tenure committees and internal promotion decisions never look at truth and reproducibility.

Then, I am aware of one instance when a company with a product was, shall we say, 'highly motivated' to be unable to reproduce results that showed their product had serious problems. But I don't think that applies to the studies in question here.

So, I suspect that we are seeing a combination of these reasons for inability to reproduce results. I don't know how many research labs there are that could contribute to the narrow sector of the papers discussed in this article. Are there 2,000? I suspect not. But I don't know. If we knew that number with some measure of confidence, we could improve our understanding of this problem. If we knew what the p values reported really were for those studies, we could be more clear how much fraud, filtering and bogus statistical calculation was probably involved.

And, if my experience is any guide, we might find some reproducible results which have been overlooked if we dug deep enough.

Michael Lerman

Posts: 8

August 16, 2012

I do not like the "end of...", "end of time..." and so on. But I had concluded long ago that experimental science initiated by the giants of the beginning (Newton, Faraday, Pascal, Galileo...) is approaching a slow painful death. The reason is the fierce wars for monies and too many people doing science. One of my mentors told the 1% explanation: 1% getting a college education should actually do it and so on...up to the Nobel prize.
Michael Lerman Ph.D., M.D.

donald klein

Posts: 1

August 16, 2012

Not experimental science or dumb researchers. It's lack of raw data access. Could be fixed if journals demanded it via web. Would allow effective peer review.

Ken Witwer

Posts: 1457

August 19, 2012

As Casadevall says, the initiative should be looked on positively. However, it would be much more encouraging if the journal in question followed through consistently in enforcing its own policies on data submission rather than introducing a new way to get authors to spend money on more publications. Unfortunately, ensuring quality by imposing stricter standards is not a revenue generator for an open access journal. The problem at many top journals is that "impact" drives publication, but the open access model practiced by PLoS ONE seems to multiply, not address, the issues: for every XMRV/CFS publication in Science and the like, one might suspect there are at least twenty low-quality, irreproducible publications in journals such as PLoS ONE. There is one sure way to move scientific publishing out of the current morass: industry-wide adoption of twin policies of mandatory data access and truly open peer review, where reviewers are willing to stake their reputations on their verification of the authors' data.

Guest


August 22, 2012

Absolutely. PLoS ONE is keenly aware that their data accessibility record is poor, but instead of putting their (massive) resources towards ensuring that the data are available for all 17,000 papers they'll publish this year, they engage in window dressing like this. What's up with that? I thought they were the champions of open science.

Guest


August 22, 2012

Absolutely! PLoS One is keenly aware that their current data archiving policy is inadequate and at odds with their image as pioneers of open science, so it's odd that they've put their energy into this worthy but very small scale initiative instead of tackling the bigger problem head on.

Amy Greene

Posts: 1457

September 10, 2012

Beautiful response. The problem as I see it is that there is little penalty for such fabrications/omissions. A researcher may be given a penalty of 2 years without funding (see NIH Office of Research Integrity). Then there are some that go completely unpunished, in my opinion. Students blame their mentors for improper training, mentors say they can't oversee every piece of data. Maybe the labs are too big to be "training" students and should only hire fully trained scientists to do the work!

September 20, 2012

It seems that some guy with a clever idea came up with this one! It will be difficult to part with money that could go to mandatory training (renewed every year) in research ethics for trainees and support for a body that will ensure that fraudsters do not become repeat offenders when they move from lab to lab.

Velva


Posts: 1

October 9, 2012

Large and small universities face the same problem - publish at all cost. No one really cares whether the data is accurate or not. But everyone cares whether they have a job. I believe it is ridiculous to think that you can adequately complete a study in a year and publish. Only people that have been in the field for years have accumulated enough data to publish. Plus, statistical accuracy? Who cares? I believe Lerman's reference to Newton, etc. is true. Please train students to follow your lead and have integrity. This is how you fix the problem.
