Potential Causes of Irreproducibility Revealed

Five independent groups got different results in a drug-response experiment, despite sharing protocols, reagents, and cell lines. The researchers identify technical variables that could be to blame.

Jul 11, 2019
Abby Olena

ABOVE: © ISTOCK.COM, BELCHONOCK

While much of biology research suffers from a lack of reproducibility, no single factor has emerged as the driver of this problem. In a multi-lab study published this week in Cell Systems, researchers attempted to reproduce the results of an assay in which cultured cells were treated with cancer drugs. Their lack of success highlights the role that technical variables play in the ability to repeat experiments.

“The key thing [is] raising awareness of this variability and the fact that a lot of it will be really difficult to control,” says Paula Bates, who works in cancer drug development at the University of Louisville and did not participate in the study. “It’s especially important for projects where there is a lot of data being collected and compared.”

The work came out of the National Institutes of Health Library of Network-Based Cellular Signatures (LINCS) Program, a consortium of six data-generation centers and one coordination center scattered across the United States. The goal of the program is to develop high-quality, large-scale datasets that describe how cells respond to various perturbations, including gene knockdowns, environmental changes, and drugs.

Integrating large datasets to get a more complete picture of cellular response requires that the data be reproducible and robust, says coauthor Laura Heiser, one of the leaders of the LINCS center at Oregon Health & Science University. “So what we set out to do was to try to understand the robustness and reproducibility of a pretty simple assay type and one that’s fairly standard in the systems biology community: a drug response assay.”

Heiser’s group and three other labs received human mammary epithelial cells, media, drugs, and a detailed protocol from the group of Peter Sorger, who leads the LINCS group at Harvard University. All five teams (Sorger’s included) cultured the cells and treated them with each of eight small-molecule drugs, then measured cell viability to estimate drug potency.

In initial experiments, the results revealed drug potencies that varied by as much as 200-fold between groups. The researchers investigated the reasons for the incongruent findings and identified several technical factors. For instance, not everyone used the same method for counting cells, and direct counts using a microscope did not correlate well with measurements of ATP levels in lysed cells, which are often used as a proxy for cell number. Groups also used different image-processing algorithms to count live cells, and this seemed to make a difference as well. The researchers also determined that the location in a cell culture plate in which cells were grown could lead to variation in results between labs. These so-called edge effects come from uneven evaporation of culture media and temperature gradients.

Once the teams addressed these issues with a more standardized protocol and randomized which wells in a plate were treated with drug, the experiments’ replicability improved, both between groups and within single labs over time. But the results remained more consistent within a group than between groups. In the paper, the authors write that this gap could be due to differences in pipetting technique, variations in equipment, or failure to stick to the protocol due to “a belief—belied by the final analysis—that counting cells is such a simple procedure that different assays can be substituted for each other without consequence.”
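For readers curious what randomizing well assignments can look like in practice, here is a minimal Python sketch. It is an illustration only, not the layout procedure used in the study: the function name `randomized_layout`, the 96-well plate dimensions, the treatment labels, and the option to leave the outer ring of wells empty are all assumptions made for this example.

```python
import random
import string


def randomized_layout(treatments, rows=8, cols=12, skip_edges=True, seed=0):
    """Map randomly chosen wells of a rows x cols plate to treatments.

    Hypothetical helper for illustration; not taken from the paper.
    skip_edges=True leaves the outer ring of wells unused, which is an
    assumed way of sidestepping edge effects, not the authors' protocol.
    """
    row_labels = string.ascii_uppercase[:rows]
    wells = [
        f"{row_labels[r]}{c + 1}"
        for r in range(rows)
        for c in range(cols)
        if not (skip_edges and (r in (0, rows - 1) or c in (0, cols - 1)))
    ]
    if len(treatments) > len(wells):
        raise ValueError("more treatments than available wells")
    # A seeded generator makes the layout repeatable, so each lab
    # can regenerate and record exactly which well received what.
    rng = random.Random(seed)
    chosen = rng.sample(wells, len(treatments))
    return dict(zip(chosen, treatments))


if __name__ == "__main__":
    # Ten drug-treated wells and ten vehicle controls (labels are made up).
    layout = randomized_layout(["drug"] * 10 + ["vehicle"] * 10, seed=1)
    for well in sorted(layout, key=lambda w: (w[0], int(w[1:]))):
        print(well, layout[well])
```

Because treated and control wells are scattered rather than grouped, any position-dependent bias in evaporation or temperature is less likely to line up with a single treatment.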

But following the protocol exactly is only part of the story, Sorger tells The Scientist. Another important part of understanding reproducibility is that many experiments depend on some biological component that goes unmeasured. “Whether cells grow or die in any given condition or . . . do any other response is actually contextually dependent on where they’re growing, how fast they’re growing, what their prior history is, what the medium is,” among other potential biological factors, Sorger says. More comprehensive measurements of this biological context could help improve the translation of preclinical findings to effective treatments, he adds.

“The point isn’t just to get reproducible effects. We want reproducible effects that are going to be robust to subtle changes in the experiment, which include moving the effect into a clinically relevant setting,” agrees University of Chicago researcher James Evans, who was not involved in the current study.

And with the biological variation that is inherent in any experiment, standardizing protocols may not be the best solution. Evans and his colleagues recently published findings in eLife showing that scientific communities that overlap less, use distinct methods, and cite more diverse prior work are more likely to generate reproducible results. If everyone follows the same protocol, Evans says, “it’s going to make the effects that we end up producing fragile to the changes that occur in the real world.”

The real world—specifically finding clinically relevant biomarkers—is what matters to Benjamin Haibe-Kains, a researcher at Princess Margaret Cancer Centre and the University of Toronto who did not participate in the work. “At the end of the day, I’ll be provocative here and say I don’t really care about the reproducibility. It’s unlikely if the data are very noisy, but you can imagine a case where reproducibility is quite low, but you still have so much data that you can find those biomarkers that would make it to the clinic.”

M. Niepel et al., “A multi-center study on the reproducibility of drug-response assays in mammalian cell lines,” Cell Systems, doi:10.1016/j.cels.2019.06.005, 2019.