Replication Failures Highlight Biases in Ecology and Evolution Science

As robust efforts fail to reproduce findings of influential zebra finch studies from the 1980s, scientists discuss ways to reduce bias in such research.

Aug 1, 2018
Yao-Hua Law

ABOVE: Recent work questions the conclusion of 30-year-old research that leg bands affect the mating success of male zebra finches.

What can males wear to look sexier? For zebra finches, the trick seemed simple: add a dash of red to their legs. Research conducted in the 1980s found that slipping red bands onto the legs of male birds turned them into sex magnets. Those studies became iconic in sexual selection research because they provided something rare in the discipline: strong, consistent effects. But data accumulated in recent years question these influential findings.

Zebra finches (Taeniopygia guttata) are native Australian birds with a bright red-orange beak. They form monogamous breeding pairs in which the male and female cooperate to raise young. Easy to rear in captivity, zebra finches are model organisms for research in cognition and sexual selection.

In the 1980s, ornithologist Nancy Burley, then at the University of Illinois, found that placing plastic leg bands of different colors—used by scientists to identify individual birds—on the legs of zebra finches affected the birds’ chances of mating. Burley reported, first in Science and then in other leading journals, that females preferred red-banded males and disliked green-banded males. Females also spent more time caring for nestlings sired by red-banded males. Burley’s results inspired subsequent research in female choice and maternal effects.

But results contradictory to Burley’s began to emerge in the late 1990s. And this March, Wolfgang Forstmeier of the Max Planck Institute for Ornithology and colleagues published the strongest disagreement yet. Forstmeier’s lab ran eight experiments, analyzed unpublished data from four other labs, and found no effects of leg-band colors on the reproductive success of male or female zebra finches. The new study also included a meta-analysis of 39 published studies, including 22 that supported leg-band color effects. The meta-analysis found that effect sizes shrank as sample sizes increased, a sign of selective reporting. This pattern appeared only among the early and smaller studies that supported the positive effects of red leg bands; in more-recent work challenging the notion of leg-band preference, the researchers found no relationship between effect size and sample size.
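Why shrinking effect sizes point to selective reporting can be illustrated with a toy simulation. The Python sketch below is not the analysis from Forstmeier's study. It assumes a true effect of exactly zero, unit-variance data, and a rule that only estimates with z > 1.96 get "published"; the function name simulate_published and the sample sizes are hypothetical choices made for illustration.

```python
import math
import random
import statistics

def simulate_published(sample_sizes, studies_per_size=2000, true_effect=0.0, seed=7):
    """Mean 'published' effect per sample size under selective reporting.

    Assumption: each study estimates a mean from n unit-variance observations,
    so its sampling error has standard deviation 1 / sqrt(n).
    """
    rng = random.Random(seed)
    published = {}
    for n in sample_sizes:
        kept = []
        for _ in range(studies_per_size):
            estimate = rng.gauss(true_effect, 1.0 / math.sqrt(n))
            z = estimate * math.sqrt(n)
            if z > 1.96:  # only positive, significant results are reported
                kept.append(estimate)
        published[n] = statistics.mean(kept)
    return published

published = simulate_published([10, 40, 160])
for n, mean_effect in published.items():
    print(f"n = {n:3d}: mean published effect = {mean_effect:.2f}")
```

Although no real effect exists in this simulation, the published estimates are all positive, and they are largest for the smallest studies, because a small study can only cross the significance threshold with a large estimate. An unfiltered literature would show no such trend.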

Simon Griffith, an ornithologist at Macquarie University, describes Forstmeier’s study as “a really nice example of thorough and rigorous science.” In a 2010 paper (included in Forstmeier’s meta-analysis), Griffith had found that while females preferred red-banded zebra finches, this preference wasn’t due to the leg bands themselves, but to associated changes in the males’ singing. “For many years I’ve been of the opinion that the band-color preference was overstated and probably incorrect,” he says.

Leigh Simmons, an evolutionary biologist at the University of Western Australia, says that the “robust evidence” in Forstmeier’s study shows that leg-band color effects are not universal among zebra finches. But Simmons also argues that Burley’s peer-reviewed studies are valid. He lists various factors that might explain Burley’s results, such as levels of dietary carotenoids, which have been shown to affect the vision of other finches, and the types of plastic bands used. “It can be any number of reasons why she found the effects and others haven’t,” says Simmons. (Burley did not respond to requests for comment.)

Reports of the effects of male blue tits' (Cyanistes caeruleus) plumage color on their condition or mating success may be influenced by bias.

But researchers’ inability to replicate past findings isn’t limited to Burley’s iconic zebra finch studies. In the past five years, meta-analyses and reviews have generated more evidence of bias in ecology and evolutionary biology research. For example, biases have been found in the literature on ideas such as feather color affecting mate choice in blue tits and black bib sizes indicating male dominance in house sparrows. As with zebra finch leg bands, such biases don’t necessarily invalidate the hypotheses themselves, but undermine the strength of evidence for them, leaving researchers questioning concepts once considered well-supported. While scientists disagree on the extent of the reproducibility problem—which exists across disciplines, from psychology to cancer biology—they have begun to undertake efforts to reduce bias and improve transparency in ecology and evolution research. 

Sources of bias

Biases among scientists or editors can skew which results get published, a phenomenon called publication bias. Simmons agrees that publication bias exists in the scientific literature, particularly when a new hypothesis is first proposed. He cites his experience unraveling conflicting results regarding a phenomenon called fluctuating asymmetry. In the early 1990s, a few studies had reported that female animals prefer males with more bilateral symmetry. It was a novel finding that spurred huge interest in the idea that animals could use the outward physical symmetry of potential mates to assess their underlying genetic quality. Simmons published a meta-analysis in 1999 showing that the strong effects found in early studies disappeared in later studies. He attributed the initially positive results to publication bias, and the subsequent conflicting results to better measurement tools. Simmons reasoned that because scientists are more likely to submit positive results for publication, the early years of a new hypothesis would skew toward data that support it. As subsequent studies began to use more-accurate methods for measuring symmetries, Simmons found, the effect of symmetry on male attractiveness decreased to zero.

In 2016, Forstmeier teamed up with Whitman College ecologist Timothy Parker and colleagues to review selective reporting and transparency in ecology and evolutionary biology literature. A sample of meta-analyses that examined 279 studies from 1970 to 2012 found that more than half of the studies failed to disclose full details of the experiments’ results and statistics. Moreover, given the typically small sample sizes in these fields, studies should average only a 20 percent chance of detecting a real effect, one that is not due to chance. But more than 70 percent of ecology and environmental studies reported significant results.
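The mismatch between those two percentages can be made concrete with a little arithmetic. The sketch below is an illustration rather than a calculation from the review: it takes the 20 percent average power quoted above, assumes the conventional 5 percent significance threshold, and uses a hypothetical helper, expected_significant_share, to ask what share of studies should come out significant for a given share of tested effects that are real.

```python
def expected_significant_share(share_true, power=0.20, alpha=0.05):
    """Expected fraction of studies reaching significance.

    share_true: fraction of tested hypotheses where a real effect exists.
    power: chance of detecting a real effect (20 percent, from the text).
    alpha: false-positive rate when no effect exists (assumed 5 percent).
    """
    return share_true * power + (1.0 - share_true) * alpha

for share_true in (0.0, 0.5, 1.0):
    share_sig = expected_significant_share(share_true)
    print(f"{share_true:.0%} real effects -> {share_sig:.1%} significant")
```

Under these assumptions the expected share of significant results tops out at 20 percent, and only if every tested effect is real. A literature in which more than 70 percent of studies report significance is hard to explain without selective reporting or related practices.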

In addition to being small, sample sizes are often not predetermined, the team found. Instead, researchers frequently continue taking data until they get the desired result. This so-called “flexible stopping rule” is a bad statistical practice that “dramatically increases the chance of a false-positive result,” says Parker. Both he and Forstmeier were taught the flexible stopping rule and considered it sound methodological advice until recent years.
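A small simulation shows why stopping when the result looks good inflates false positives. The Python sketch below rests on simplifying assumptions that are not from Parker's review: the data have no true effect and known unit variance, significance is judged with a two-sided z-test at 1.96, and the "flexible" researcher tests after every new observation from 10 to 50. The function names are hypothetical.

```python
import math
import random

def z_significant(total, n, z_crit=1.96):
    """Two-sided z-test of mean == 0, assuming known unit variance."""
    return abs(total / math.sqrt(n)) > z_crit

def false_positive_rate(flexible, n_start=10, n_max=50, trials=5000, seed=1):
    """Share of null experiments declared significant.

    flexible=False: test once, at the planned sample size n_max.
    flexible=True: test after each observation from n_start on, stop at the first hit.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = 0.0
        significant = False
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)  # the null is true: the mean is exactly 0
            if flexible and n >= n_start and z_significant(total, n):
                significant = True
                break
        if not flexible:
            significant = z_significant(total, n_max)
        hits += significant
    return hits / trials

fixed_rate = false_positive_rate(flexible=False)
flexible_rate = false_positive_rate(flexible=True)
print(f"fixed sample size: {fixed_rate:.3f}")
print(f"flexible stopping: {flexible_rate:.3f}")
```

With a fixed sample size, the false-positive rate stays near the nominal 5 percent. With repeated peeking, every extra look is another chance for random noise to cross the threshold, so the rate climbs well above that level even though nothing real is being measured.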

Alternatively, researchers may collect as much data as possible, then sift through the data for patterns, Parker says. But such an approach leads to a lot of selective reporting, he notes. “To have a good story often means you only tell the story of a part of your data.”

Improving transparency and reducing bias

Available data suggest that questionable research practices are common enough in ecology and evolution research to warrant concern. Last month, Hannah Fraser, an ecologist at the University of Melbourne, and colleagues surveyed more than 800 ecologists and evolutionary biologists and found that many of the researchers (mostly midcareer and senior) admitted to at least one instance of selective reporting (64 percent), use of the flexible stopping rule (42 percent), or having changed hypotheses to fit their results (51 percent).

Recent research challenges the notion that the black bib of house sparrows (Passer domesticus) signals dominance status.

Publishers and funders can make demands and offer rewards to improve scientific rigor. For example, the Center for Open Science crafted the Transparency and Openness Promotion (TOP) Guidelines, which have been endorsed by more than 5,000 journals and funding agencies. TOP guidelines require researchers to fully and clearly report research questions and methods, and to deposit their data and analysis code in public archives. Funders can also require that scientists preregister their experimental design and analysis to discourage any bias during research planning.

Since 2012, Simmons has been the editor-in-chief of Behavioral Ecology, a leading journal in the field. The journal made data sharing compulsory for its authors in 2016 and began adhering to TOP guidelines in 2017. TOP standards formalize good scientific practices, says Simmons. The moves neither affected the journal’s submission numbers nor added administrative burden. “I received many emails congratulating our proactive move of signing up to TOP guidelines,” says Simmons. “We are in a generation where people appreciate [transparency], and there’s immense value in having data openly available.”

Reviewers also have a role to play in cleaning up the literature, says Parker. In May, Parker, Griffith, Forstmeier, and a number of other ecologists published a checklist for reviewers to promote transparency and reduce bias. Among their main suggestions are that reviewers request authors’ complete statistical details and rationale for sample sizes, evaluate study methods independently of the results, and examine the effect sizes.

Simmons notes, however, that reviewers and editors do not have time to “get the data and rerun the analysis” of every study. “To a certain extent, you have to trust the authors when they say what they have done and how they have done it.”

Rather, he argues, the key to more-open and rigorous scientific practices in ecology and evolution is properly educating young scientists. Early career researchers tend to perceive null results as failures, he says. “That’s nonsense. I spend a lot of my time as a science teacher to convey to my students that getting a null result is just as important as getting support for the alternative hypothesis.”

Parker agrees. “If a study was worth doing because you thought the answer could be valuable, then we should really know what the answer is regardless of the answer.”
