According to a study published today (February 22) in PLOS Biology, the current design of some preclinical studies may be undermining their reproducibility. Including just a few additional laboratories in each preclinical trial could improve the replicability of study results, the authors find.
No one would design a clinical trial that drew its only cohort of participants from a single tiny village, but preclinical trials are typically designed in precisely that manner, study coauthor Hanno Würbel, a zoologist at the University of Bern, tells The Scientist. “If you want generally valid conclusions that apply to a whole range of conditions or individuals in a population, then you need to address this heterogeneity,” he says. “That leads to larger variation within your study cohort, but that is just an image of reality.”
In 1999, a report in Science found individual laboratories had unique differences that could yield “idiosyncratic” results, even when study protocols and housing conditions for animal models were rigorously standardized. That prompted Würbel, who was then studying how different environments affect behavior and brain function in mice and rats, to begin investigating the effect of various lab conditions on study outcomes. “If you run a highly standardized study—all animals with the same genotype, all exposed to the same conditions—you may obtain a highly precise result,” he says. “But it may only be valid under the specific conditions [of the study].” He concluded that perhaps by including multiple laboratories in a study, the problem could be mitigated.
To test this idea, Würbel and his colleagues simulated single- and multi-laboratory experiments using published results from preclinical studies. They included 440 preclinical studies across 13 interventions in animal models of stroke, myocardial infarction, and breast cancer.
To model several laboratories collaborating on a single study, the team combined data from multiple independent studies. They first conducted a meta-analysis of 50 independent studies on the effect of hypothermia on stroke severity in rodents, and found that it reduces severity by 50 percent. They used this number as a benchmark against which to compare the reduction in severity predicted by single- and multi-lab simulations. Simulated single-lab studies varied widely in their findings and successfully predicted the benchmark 50 percent of the time. Including just two to four laboratories in a study was enough to produce more consistent results: adding a second lab to the simulation increased prediction success from 50 percent to 73 percent, adding a third raised it to 83 percent, and adding a fourth, to 87 percent.
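The logic of such a comparison can be illustrated with a toy simulation. The Python sketch below is not the authors' analysis: it draws synthetic data rather than published results, and every parameter value (the total of 24 animals per study, the between-lab and within-lab spread, the function names) is an assumption chosen for illustration. It also assumes a success criterion the article does not spell out, counting a simulated study as a success when its approximate 95 percent confidence interval contains the benchmark effect.

```python
import random
import statistics


def simulate_study(n_labs, rng, total_animals=24, true_effect=0.5,
                   between_lab_sd=0.2, within_lab_sd=0.2):
    """Simulate one study; return an approximate 95% confidence interval.

    All default values are illustrative assumptions, not figures from the paper.
    """
    per_lab = total_animals // n_labs  # same total sample size, split across labs
    observations = []
    for _ in range(n_labs):
        # Each lab shifts the measured effect by its own idiosyncratic offset.
        lab_offset = rng.gauss(0.0, between_lab_sd)
        for _ in range(per_lab):
            noise = rng.gauss(0.0, within_lab_sd)
            observations.append(true_effect + lab_offset + noise)
    mean = statistics.fmean(observations)
    sem = statistics.stdev(observations) / len(observations) ** 0.5
    # Normal approximation (1.96 standard errors) for the interval.
    return mean - 1.96 * sem, mean + 1.96 * sem


def prediction_success(n_labs, n_simulations=2000, true_effect=0.5, seed=0):
    """Fraction of simulated studies whose interval contains the benchmark."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_simulations):
        low, high = simulate_study(n_labs, rng, true_effect=true_effect)
        if low <= true_effect <= high:
            hits += 1
    return hits / n_simulations


if __name__ == "__main__":
    for n_labs in (1, 2, 3, 4):
        print(n_labs, "lab(s):", prediction_success(n_labs))
```

Because the data are synthetic, the printed fractions will not match the percentages reported in the study. The sketch only shows the mechanism: a single lab yields a narrow interval centered on that lab's idiosyncratic result, while pooling the same number of animals across labs widens the interval and averages out lab-specific offsets.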
“What [the study is] recommending makes sense, and would certainly improve the field,” says oncologist Glenn Begley, the CEO of BioCurate, an Australian public-private biopharmaceutical partnership that was not involved in this study. “There is no doubt in my view that if the recommendations presented in this paper were adopted, we would be much better off.” Begley coauthored a letter to Nature in 2012 that advocated, in part, for improving preclinical trials by “[raising] the bar for reproducibility.”
Using multiple labs for preclinical research may not always be necessary to improve reproducibility, says Würbel, for example when establishing a proof of concept. “[Researchers] may well run initial studies under highly standardized conditions,” he says, but as soon as researchers want to generalize their findings, heterogeneity becomes important. “If your hypothesis doesn’t stand the test of a heterogenized preclinical setting, then there is probably little hope that under clinical conditions it will work.”
Tim Errington, metascience manager at the Center for Open Science’s Reproducibility Project, says including multiple labs in preclinical studies could lead to a more efficient use of resources. “These types of suggestions are exactly the way we can move forward on that front,” he says. But he adds that such progress requires changing the culture of how research is conducted. “People are slowly starting to move on this, and I think doing studies like this is a great way to allow us to see the benefits of moving the way that we conduct our research,” he says. “I think that what I’d like to see now is more groups doing this and reporting [their findings].”
The researchers acknowledge that it may be logistically difficult to include multiple labs in every new preclinical study. To address this, they recommend varying experimental conditions within labs to mimic multi-laboratory studies. The problem, says Würbel, is that a method for introducing such variation still needs to be developed. He is currently investigating how best to do so.
Würbel says that if a study cannot reproduce an observation, “then strictly speaking, you have no evidence for it. I think objectivity is always based on multiple independent observations.”
B. Voelkl et al., “Reproducibility of preclinical animal research improves with heterogeneity of study samples,” PLOS Biology, doi:10.1371/journal.pbio.2003693, 2018.
Correction (February 23): The original version of this article miscalculated the effects of adding a second, third, and fourth lab on prediction success. The increases in prediction success were actually 23, 33, and 37 percentage points, respectively. Paragraph five has been updated to clarify. The Scientist regrets the error.