A few years ago, officials at Switzerland’s Federal Food Safety and Veterinary Office approached Hanno Würbel, the head of the animal welfare division at the University of Bern, with the task of examining the quality of experimental design in the country’s animal research. Growing public awareness of the reproducibility crisis in science—which has emerged as researchers discover that a large proportion of scientific results cannot be replicated in subsequent experiments—had put pressure on the government authority to examine this issue, Würbel says. “They wanted to know, what is the situation in Switzerland . . . and is there anything that we need to improve?”
To address this question, Würbel and his colleagues examined scientific protocols in 1,277 applications for licenses to conduct animal research that were submitted to and approved by Switzerland’s Federal Food Safety and Veterinary Office (FSVO). Their analysis, published in PLOS Biology in 2016, concluded that most of the experiments described in approved applications lacked scientific rigor. Only a fraction of the protocols included important measures against bias, such as blinding, randomization, or a clear plan for statistical analysis.
It’s now one of several studies that have pointed to critical flaws in the way animal experiments are designed, and many researchers argue that these flaws are major contributors to the reproducibility crisis plaguing published preclinical research. In 2011, for example, scientists at the pharmaceutical company Bayer reported that they were unable to reproduce the findings from 43 of 67 projects on potential drug targets in oncology, cardiology, and women’s health. Meanwhile, a 2015 PLOS Biology paper reported that more than 50 percent of preclinical research is not reproducible. The latter study’s authors highlighted poor experimental design as one of the main causes of the problem and estimated that, in the United States alone, approximately $28 billion is spent each year on preclinical experiments that cannot be replicated.
Poorly designed animal studies raise ethical concerns in addition to financial and scientific ones. Preclinical experiments, which often involve modeling aspects of human diseases in animals, can include procedures that may cause pain or otherwise inflict harm on the organisms under investigation. While most scientists may consider that harm to be justified in cases where well-conducted research leads to scientific advances, projects that generate irreproducible data on account of poor design create far more unease. “Research that is not quality-controlled is unethical,” says toxicologist Thomas Hartung of the Johns Hopkins Bloomberg School of Public Health and the University of Konstanz in Germany. Hartung directs the Center for Alternatives to Animal Testing, a group dedicated to promoting and improving the welfare of research animals.
These issues have led numerous members of the scientific community to express the urgent need to expose and address the flaws of animal studies, with many actively working on ways to improve experimental design and tighten regulatory oversight.
Exposing poor experimental design in animal research
More than 50 years ago, zoologist William Russell and microbiologist Rex Burch established the 3Rs, core principles that governments in many countries, including the US, China, and the member states of the European Union, have since woven into legislation and guidelines regulating the use of animals in research. The 3Rs refer to replacement of animal studies with other methods, reduction of the number of animals used in experiments, and refinement of experimental techniques to reduce pain and improve welfare—a perennially controversial topic, as quantifying animal suffering is notoriously difficult.
During the approval process for animal experiments, regulatory bodies use the latest welfare research to weigh potential harm to animals against possible societal benefits, such as advancements in medicine or new scientific knowledge. But as investigations such as Würbel’s have shown, many poorly designed animal studies still seem to be falling through the cracks, accruing the associated costs without providing the potential benefits.
One of the groups trying to get a handle on the problem is the UK-based Collaborative Approach to Meta-Analysis and Review of Animal Data from Experimental Studies (CAMARADES) team, which has carried out multiple systematic reviews—assessments of all available literature pertaining to a given research question—of preclinical animal studies. CAMARADES was founded in the early 2000s by University of Edinburgh neurologist Malcolm MacLeod and his colleagues, who decided to look into why so many of the drugs coming out of laboratories working on stroke failed in clinical trials.
In one of their first reviews, carried out more than a decade ago, CAMARADES researchers assessed preclinical publications for NXY-059, a small-molecule drug that showed promise in animal models of stroke but failed in later clinical trials. The team’s investigation revealed that, although nine papers reported that the drug successfully reduced the size of infarcts in animals, only two included randomization and blinding procedures. Furthermore, those two studies reported lower estimates of efficacy than the other seven did.
The group has since made similar findings for other neurological conditions, such as multiple sclerosis, Alzheimer’s, and Parkinson’s. For example, a 2016 review of interventions for Alzheimer’s disease reported that of 427 published in vivo studies, fewer than one in four reported blinding and randomization, and none reported sample size calculations. “Essentially, every review we did said the same thing,” says the University of Edinburgh’s Emily Sena, a leader of the CAMARADES group that continues to carry out these reviews. “Very few studies took simple measures to reduce bias.” On top of that, Sena adds, she and her colleagues consistently found that studies that did not incorporate those measures tended to report larger treatment effects.
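To give a sense of what a sample size calculation involves, here is a minimal sketch in Python. It is an illustration only, not the method used in any of the studies reviewed: it assumes a simple two-group comparison of means, a standardized effect size (Cohen's d), and a normal approximation, and the function name and default values (5 percent significance, 80 percent power) are hypothetical choices made for this example.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate animals needed per group for a two-sided,
    two-group comparison of means (normal approximation).

    effect_size is the standardized difference (Cohen's d) the
    experiment should be able to detect. All defaults are
    illustrative assumptions, not values taken from the article.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must be between 0 and 1")

    standard_normal = NormalDist()
    # Critical value for the two-sided significance threshold.
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    # Quantile corresponding to the desired statistical power.
    z_power = standard_normal.inv_cdf(power)

    # n per group = 2 * ((z_alpha + z_power) / d)^2, rounded up.
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    for d in (1.0, 0.5):
        print(f"d = {d}: {sample_size_per_group(d)} animals per group")
```

Under these assumptions, halving the effect size an experiment must detect roughly quadruples the number of animals required per group, which is why the calculation needs to be done before an experiment starts rather than after. A real protocol would typically use a t-distribution correction and an effect size justified by pilot data.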
“[CAMARADES’s] studies have been really useful for highlighting the problem,” says Nathalie Percie du Sert, the head of experimental design and reporting at the National Centre for the Replacement, Refinement, & Reduction of Animals in Research (NC3Rs), a UK-based scientific organization dedicated to advancing the 3Rs. “What they’re measuring is the quality of reporting in published papers—and that highlights that there’s not enough information in published papers to know that the studies are robust.”
Researchers in the US and in Europe are working to implement these types of critical assessments for animal experiments more broadly. For example, the Evidence-Based Toxicology Collaboration that Hartung chairs at the Johns Hopkins Bloomberg School of Public Health has started using systematic reviews to evaluate toxicology studies, and the Netherlands-based organization SYRCLE (SYstematic Review Center for Laboratory animal Experimentation) provides tools, guidelines, and support for researchers to conduct reviews of animal studies in their own fields.
From the perspective of NC3Rs, “if an experiment is not designed to yield robust results or is not reported in enough detail so that other people can actually use those results, then it’s a complete waste” of animals and research resources, Percie du Sert tells The Scientist. “You might as well not do the experiment.”
Improving the scientific rigor of animal research
In 2015, NC3Rs released the Experimental Design Assistant (EDA), a free web tool developed to help researchers reduce the flaws in their methodologies and produce robust designs for animal experiments. “It’s basically like having your own statistical assistant with you when you design your experiment,” Percie du Sert explains. More than 6,000 researchers worldwide are using the EDA, she says, and the tool is recommended by several funders of animal research, such as the Wellcome Trust and the Medical Research Council (MRC) in the UK and the National Institutes of Health in the US.
It’s just one of the ways that NC3Rs is working to improve the rigor of animal experiments. In 2010, the organization published its ARRIVE (Animal Research: Reporting In Vivo Experiments) guidelines, a checklist of items for researchers to include when describing animal experiments, such as sample sizes, full descriptions of the organisms under investigation, and explanations of the measures taken to reduce bias. Although the guidelines have been endorsed by numerous journals and funders, a 2019 paper by Sena, MacLeod, and colleagues revealed that many manuscripts did not live up to their standards. A new version of ARRIVE, which has been updated with changes meant to improve compliance, was posted as a preprint on bioRxiv in July and is currently undergoing peer review.
NC3Rs has also spearheaded efforts to curtail the approval of poorly designed projects. In the UK, where the organization provides funding for 3R-related research, NC3Rs collaborates with other funders such as the Wellcome Trust and MRC to offer training to funding panel members on how to recognize characteristics of high-quality experimental design. These workshops have been running for four years and have included 257 attendees.
Similar developments are underway in other countries. In Switzerland, the vetting process for approving animal research licenses has become more rigorous in recent years, as authorities scrutinize applications closely for both animal welfare and scientific issues. Kaspar Jörger, the head of the animal protection department at the Swiss FSVO, told The Scientist in January that Würbel’s 2016 study was one of the driving forces behind this shift. The FSVO also now requires that all research institutions and pharmaceutical companies hire animal welfare officers to help scientists prepare their applications for animal research permits, and launched the Swiss equivalent to NC3Rs, the Swiss 3R Competence Centre (3RCC), in March 2018.
Sena suggests that, in addition to such measures, the scientific community should develop a formal framework to quantify the scientific rigor of a proposed experiment as part of the assessment of that experiment’s potential benefits—much as it uses the 3Rs to judge potential harm. One way to do this, she says, would be to establish the 3Vs, which she defines as “internal validity,” how well a study minimizes bias by using measures such as randomization and blinding; “external validity,” how generalizable findings are beyond a single laboratory; and “construct validity,” the extent to which a test measures what it claims to measure.
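The two internal-validity measures named above, randomization and blinding, can be sketched in a few lines of Python. This is a hypothetical illustration rather than a procedure described by Sena or any of the groups mentioned here: the function name, the group labels, and the "S001"-style codes are all assumptions made for the example.

```python
import random


def randomize_and_blind(animal_ids, groups=("control", "treatment"), seed=None):
    """Randomly allocate animals to equal-sized groups and hide the
    allocation behind neutral codes.

    Returns two dictionaries:
      blinded_sheet: animal ID -> neutral code (given to the person
                     who scores the outcomes)
      sealed_key:    neutral code -> group (kept by a third party
                     until the analysis is complete)
    All names here are hypothetical and for illustration only.
    """
    animals = list(animal_ids)
    if len(animals) % len(groups) != 0:
        raise ValueError("animals must divide evenly among groups")

    rng = random.Random(seed)

    # Randomization: build a balanced list of labels, then shuffle it.
    labels = list(groups) * (len(animals) // len(groups))
    rng.shuffle(labels)

    # Blinding: shuffle the codes too, so a code's number reveals
    # nothing about the order in which animals were assigned.
    codes = [f"S{n:03d}" for n in range(1, len(animals) + 1)]
    rng.shuffle(codes)

    blinded_sheet = dict(zip(animals, codes))
    sealed_key = dict(zip(codes, labels))
    return blinded_sheet, sealed_key


if __name__ == "__main__":
    sheet, key = randomize_and_blind([f"mouse-{i}" for i in range(1, 9)], seed=2019)
    for animal, code in sheet.items():
        print(animal, code)  # the scorer sees codes, never group names
```

Separating the two dictionaries is the point of the design: the person measuring infarct size or behavior works only from the blinded sheet, and the sealed key is consulted only once the data are locked.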
The wider consequences of poorly designed studies
Many of the problems plaguing experimental design in animal research are not unique to animal studies. Preliminary work by Sena’s group, for example, has found that in vitro preclinical studies often suffer even more than animal studies from poor experimental design. “I’ve had an undergrad student working with me who did a pilot project looking at in vitro study design, and none of the studies she looked at in her sample were randomized, and none of them were blinded,” Sena says, noting that the lower level of scrutiny and lower cost associated with in vitro research might be contributing factors.
However, the costs associated with poorly designed studies are magnified for animal research due to the difficult ethical questions they raise, and they’re beginning to have real consequences for the scientific community as public awareness of them rises. In Switzerland, for example, along with more-rigorous animal license procedures, political pressure is building to reduce or eliminate animal experimentation. This April, the federal government announced that it would hold a national referendum on banning animal experiments altogether after a campaign for such a vote, launched by a group of Swiss citizens, gained more than 100,000 signatories. The move has drawn criticism from the Swiss National Science Foundation and swissuniversities, a group of higher education institutions, which say that the passing of such a bill would hinder research. The vote is expected to take place around 2022.
The debate should act as a wake-up call for researchers working with animals, Würbel says. “[Scientists] need to convince people that what [they] are doing is useful, valid, and they take ethical concerns seriously,” he explains. “But you can only do this convincingly if you make sure that the science is rigorous and that you have done all you could do to make sure that the harm imposed on the animals is as minimal as possible.”
Alternatives to Animals
Researchers are actively searching for alternatives to animal testing in preclinical research, and in recent years there have been significant advancements in both computational and in vitro replacements. For example, researchers have used in vitro techniques such as organoids and organs-on-chips to model various aspects of human biology. As these technologies have become increasingly sophisticated, scientists have used them to make brain organoids capable of generating electrical patterns akin to those found in premature human babies, and even “bodies-on-chips,” interconnected systems of mini-organs.
Substitutes for animal models will likely have the most immediate effect in the area of toxicity testing, because toxicity studies have very clearly defined endpoints, says Anthony Holmes, director of science and technology at the National Centre for the Replacement, Refinement, & Reduction of Animals in Research. There is already some evidence that in silico methods might be superior in this field. Thomas Hartung of Johns Hopkins University and his colleagues recently developed an artificial intelligence–based chemical screen that they demonstrated to be more effective at identifying toxic chemicals than animal tests. (See Hartung’s opinion article, “AI Versus Animal Testing,” The Scientist, May 2019.)
While many of these approaches are promising, it is unlikely that they will ever completely replace animal research. “We can model certain elements of this in in vitro conditions, and we’re getting better at it,” says Ulrich Dirnagl, a neurologist at the Charité University Hospital Berlin. “But there will be a stage when we will need to see what’s going on in a functioning organism.”
Diana Kwon is a Berlin-based freelance science journalist. Follow her on Twitter @DianaMKwon.