Want to Boost Reproducibility? Get Another Lab Involved

Including as few as two labs in a study improved the odds of recovering the true effect size by as much as 23 percentage points, according to a replication model.

By Jim Daley | February 22, 2018


According to a study published today (February 22) in PLOS Biology, the current design of some preclinical studies may be undermining their reproducibility. Including just a few additional laboratories in each preclinical trial could improve the replicability of study results, the authors find.

A clinical trial would never be designed to draw its entire cohort of participants from a single tiny village, yet preclinical trials are typically designed in precisely that manner, study coauthor Hanno Würbel, a zoologist at the University of Bern, tells The Scientist. “If you want generally valid conclusions that apply to a whole range of conditions or individuals in a population, then you need to address this heterogeneity,” he says. “That leads to larger variation within your study cohort, but that is just an image of reality.”

In 1999, a report in Science found that individual laboratories had unique differences that could yield “idiosyncratic” results, even when study protocols and housing conditions for animal models were rigorously standardized. That prompted Würbel, who was then studying how different environments affect behavior and brain function in mice and rats, to begin investigating the effect of various lab conditions on study outcomes. “If you run a highly standardized study—all animals with the same genotype, all exposed to the same conditions—you may obtain a highly precise result,” he says. “But it may only be valid under the specific conditions [of the study].” He concluded that the problem might be mitigated by including multiple laboratories in a study.

If your hypothesis doesn’t stand the test of a heterogenized preclinical setting, then there is probably little hope that under clinical conditions it will work. —Hanno Würbel, University of Bern

To test this idea, Würbel and his colleagues simulated single- and multi-laboratory experiments with published results from preclinical studies. They included 440 preclinical studies across 13 animal models of stroke, myocardial infarction, and breast cancer.

To model several laboratories collaborating on a single study, the team combined data from multiple independent studies. They found that including just two to four laboratories in a study was enough to produce more-consistent results than single-laboratory studies, whose findings varied widely. As a benchmark, they first conducted a meta-analysis of 50 independent studies on the effect of hypothermia on stroke severity in rodents, which found that hypothermia reduces severity by 50 percent. They then compared the single- and multi-lab simulations’ predictions of the reduction in severity against this benchmark. Single-lab studies successfully predicted it 50 percent of the time. Adding a second lab to the simulation increased prediction success from 50 percent to 73 percent; adding a third raised it to 83 percent, and adding a fourth, to 87 percent.
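The arithmetic behind those success rates is easy to sketch. Below is a toy Monte Carlo model in Python (not the authors’ code): each simulated lab’s result deviates from the 50-percent benchmark by a lab-specific idiosyncrasy plus sampling noise, and a multi-lab study simply averages its labs’ estimates. The heterogeneity parameters and success criterion here are invented for illustration, so the printed rates only mirror the paper’s 50-to-87-percent trend qualitatively.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not values from the paper):
BENCHMARK = 50.0   # true effect: 50% reduction in stroke severity
TAU = 15.0         # SD of lab-specific idiosyncrasies (between-lab)
SE = 7.0           # SD of a single lab's sampling error (within-lab)
TOLERANCE = 15.0   # a study "succeeds" if its pooled estimate lands
                   # within +/- TOLERANCE of the benchmark

def success_rate(n_labs: int, n_sim: int = 100_000) -> float:
    """Fraction of simulated n_labs-lab studies whose pooled (mean)
    effect estimate falls within TOLERANCE of the benchmark."""
    # Each lab's true effect drifts from the benchmark, and its
    # reported estimate adds sampling error on top of that drift.
    lab_truths = rng.normal(BENCHMARK, TAU, size=(n_sim, n_labs))
    estimates = lab_truths + rng.normal(0.0, SE, size=(n_sim, n_labs))
    pooled = estimates.mean(axis=1)  # unweighted pooling across labs
    return float(np.mean(np.abs(pooled - BENCHMARK) <= TOLERANCE))

for k in (1, 2, 3, 4):
    print(f"{k} lab(s): success rate ~ {success_rate(k):.0%}")
```

With these made-up parameters, the success rate climbs from roughly 64 percent with one lab to about 93 percent with four, for the same underlying reason the paper identifies: lab-specific quirks average out as more labs are pooled.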

“What [the study is] recommending makes sense, and would certainly improve the field,” says oncologist Glenn Begley, the CEO of BioCurate, an Australian public-private biopharmaceutical partnership; Begley was not involved in the study. “There is no doubt in my view that if the recommendations presented in this paper were adopted, we would be much better off.” Begley coauthored a letter to Nature in 2012 that advocated, in part, for improving preclinical trials by “[raising] the bar for reproducibility.”

Using multiple labs for preclinical research may not always be necessary to improve reproducibility, says Würbel—for example, when establishing a proof of concept. “[Researchers] may well run initial studies under highly standardized conditions,” he says, but as soon as you want to generalize your findings, heterogeneity becomes important. “If your hypothesis doesn’t stand the test of a heterogenized preclinical setting, then there is probably little hope that under clinical conditions it will work.”

Tim Errington, metascience manager at the Center for Open Science’s Reproducibility Project, says including multiple labs in preclinical studies could lead to a more efficient use of resources. “These types of suggestions are exactly the way we can move forward on that front,” he says. But he adds that such progress requires changing the culture of how research is conducted. “People are slowly starting to move on this, and I think doing studies like this is a great way to allow us to see the benefits of moving the way that we conduct our research,” he says. “I think that what I’d like to see now is more groups doing this and reporting [their findings].”

The researchers acknowledge that it may be logistically difficult to include multiple labs in every new preclinical study; to address this, they recommend varying experimental conditions within labs to mimic multi-laboratory studies. The problem, says Würbel, is that a method for doing so has yet to be developed, and he is currently investigating how best to design one.

Würbel says that if a study cannot reproduce an observation, “then strictly speaking, you have no evidence for it. I think objectivity is always based on multiple independent observations.”

B. Voelkl et al., “Reproducibility of preclinical animal research improves with heterogeneity of study samples,” PLOS Biology, doi:10.1371/journal.pbio.2003693, 2018. 

Correction (February 23): The original version of this article miscalculated the effects of adding a second, third, and fourth lab on prediction success. The increases in prediction success were actually 23, 33, and 37 percentage points, respectively. Paragraph five has been updated accordingly. The Scientist regrets the error.


James M Clark


February 23, 2018

I find the logic here a bit circular. Their criterion for an effect is a meta-analysis of existing results. They then find that including more of the published studies gives a better estimate of the effect, which is pretty much expected. Wouldn't a simpler solution be to simply make it easier to conduct (funding?) and publish (editorial policies?) replications that could then be aggregated in a meta-analysis? Also, a collaboration between two or more labs is not necessarily the same as the (presumably) independent studies included in a meta-analysis.

PeterUetz

March 15, 2018

In many cases it would suffice to properly document all variables, which admittedly is often nearly impossible (e.g., when the complete genomes of the animals are not sequenced). We have shown that in protein-protein interaction screens, which have a bad reputation for poor reproducibility, very small differences can cause huge differences in results (Chen et al. 2010, Nature Methods 7(9):667-668; Stellberger et al. 2010, Proteome Science 8:8).

Getting another lab involved only helps weed out the variation among experiments; it doesn't improve documentation.
