Batch Effect Behind Species-Specific Results?

Reanalysis of Mouse ENCODE data suggests mouse and human genes are expressed in tissue-specific, rather than species-specific, patterns. 

May 19, 2015

Late last year, members of the Mouse ENCODE consortium reported in PNAS that, across a wide range of tissues, gene expression was more likely to follow a species-specific rather than tissue-specific pattern. For example, genes in the mouse heart were expressed in a pattern more similar to that of other mouse tissues, such as the brain or liver, than to that of the human heart.

But earlier this month, Yoav Gilad of the University of Chicago called these results into question on Twitter. With a dozen or so 140-character dispatches (including three heat maps), Gilad suggested the results published in PNAS were an anomaly, a result of how the tissue samples were sequenced in different batches. Once this “batch effect” was eliminated, he proposed, mouse and human tissues clustered in a tissue-specific manner, confirming previous results rather than supporting the conclusions reported by the Mouse ENCODE team.

“If you can contest a paper from just one tweet, that’s powerful,” said computational biologist Nicolas Robine of the New York Genome Center. “Of course people need the [reanalysis] to understand what they did, but [the tweet] seemed very clear.”

Now, in a preprint published today (May 19) by F1000 Research, Gilad and his coauthor Orna Mizrahi-Man detailed how they reanalyzed the Mouse ENCODE data to reach this alternate conclusion. The researchers obtained raw data, analysis code, and other details from the Mouse ENCODE team. They then extracted information on when and how each sample was sequenced, and matched the human and mouse tissues to lanes on sequencing runs. Most of the samples, Gilad and Mizrahi-Man discovered, had been sequenced in batches clustered by species; mouse tissues were run in one batch, human tissues in another. Only one of five sequenced batches included tissues from both species, the researchers found.

When Gilad and Mizrahi-Man applied statistical methods to eliminate this apparent batch effect, the species-specific clusters of gene expression disappeared. “When you correct for the batch effect, for the properties of how the study was designed, you do not see clustering by species,” said Gilad. “You see clustering by tissue.”

Batch effects (the results of small differences between methods, machines, or individual experimenters that can influence experimental output) have been reported in several other studies. To avoid the issue, experiments are usually set up so that batches don’t coincide with biological variables. In this instance, said Gilad, including more runs with samples from both species might have proved helpful. With only four such samples, it’s unclear how effectively batch effects can be controlled for.
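To make the design problem concrete, here is a minimal sketch in Python of how one might check whether sequencing batch is confounded with species. The sample sheet is hypothetical and invented for illustration; it is not the Mouse ENCODE layout, and only mimics the reported pattern of five batches with a single mixed one. The function names are likewise assumptions.

```python
from collections import defaultdict

# Hypothetical sample sheet: (tissue, species, sequencing batch).
# The layout is invented for illustration and is NOT the Mouse ENCODE design;
# it only mimics the reported pattern of five batches, one of them mixed.
SAMPLES = [
    ("heart", "human", "batch1"),
    ("liver", "human", "batch1"),
    ("brain", "human", "batch2"),
    ("kidney", "human", "batch2"),
    ("heart", "mouse", "batch3"),
    ("liver", "mouse", "batch3"),
    ("brain", "mouse", "batch4"),
    ("kidney", "mouse", "batch4"),
    ("lung", "human", "batch5"),
    ("spleen", "human", "batch5"),
    ("lung", "mouse", "batch5"),
    ("spleen", "mouse", "batch5"),
]


def species_by_batch(samples):
    """Map each batch to the set of species sequenced in it."""
    table = defaultdict(set)
    for _tissue, species, batch in samples:
        table[batch].add(species)
    return dict(table)


def mixed_batches(samples):
    """Return the batches that contain more than one species."""
    return sorted(b for b, sp in species_by_batch(samples).items() if len(sp) > 1)


def samples_in_mixed_batches(samples):
    """Count samples whose batch allows a within-batch species comparison."""
    mixed = set(mixed_batches(samples))
    return sum(1 for _tissue, _species, batch in samples if batch in mixed)


if __name__ == "__main__":
    print("mixed batches:", mixed_batches(SAMPLES))
    print("samples in mixed batches:", samples_in_mixed_batches(SAMPLES))
```

In a layout like this, any batch holding a single species cannot separate a species difference from a batch difference. Only the samples in mixed batches carry information for telling the two apart.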

Without more data, the conclusions of the Mouse ENCODE studies on species-specific gene expression are, to Gilad’s mind, uncertain at best. “It changes the entire conclusion, and argues that the conclusions of the Nature paper and PNAS paper are not warranted,” he told The Scientist. “At the very least, that there’s a question of whether a batch effect was responsible.”

Steven Salzberg, a computational biologist at Johns Hopkins University in Baltimore who was not involved in the work, called the reanalysis “thorough, careful . . . and really quite convincing.”

“The batch effect they uncovered undermines the results in a very serious way,” Salzberg told The Scientist. “I don’t think there’s any way to escape the fact that the main conclusion of the PNAS paper … is not true.”

Along with Salzberg and Robine, several other researchers took to Twitter to express little surprise at the results of Gilad and Mizrahi-Man’s reanalysis. To many, these latest results echoed their own disbelief in the findings published last year. For example, Benoit Bruneau, a cardiovascular researcher at the Gladstone Institutes in San Francisco, wrote: “That’s a relief. Those papers baffled me. Now I know why.”

To Robine, the issue at hand is a simple one. “If you have exceptional claims you need exceptional data,” he told The Scientist. “The main conclusions [of the PNAS paper] are probably wrong, but the experimental design was also not able to answer the question.”

“The data was collected for a particular question or project,” he continued. “If you’re going to reanalyze the data for a different question, you have to see if your question is compatible with the way the data was collected.”

Michael Snyder of Stanford University, a coauthor on the Mouse ENCODE study published in PNAS, is unconvinced by these latest results. “The reanalysis is really doing something we knew already,” he told The Scientist. “They wound up subtracting the species effect in the reanalysis, and therefore saw the tissue-specific effects, and that’s what we published as well.”

Snyder added that although the samples were run on separate lanes, the experiments were all done at the same time by the same person. “Basically that eliminates the laboratory effect and the person effects. We’ve never seen lane effects ourselves, but we will now document that,” he told The Scientist. “The bottom line is that we are pretty confident in our results, and the fact that it was studied in many different labs. We’ve actually spent two years on this—it wasn’t some cursory thing.”

To Snyder’s recollection, questions about potential batch effects were not raised during peer review of either of the Mouse ENCODE papers. The researchers did not participate in the Twitter discussion, which, as Snyder said, “can get personal—people start making negative comments about the authors very quickly.”

“Social media is a great forum for other discussions, but not when you’re critiquing someone’s work in this format,” Snyder said, adding that he and his colleagues are planning an online response to the preprint, which has not yet been peer reviewed.

Gilad was prompted to do the reanalysis because the published results “did not sit well with me,” he said. Because the results were reported by a consortium rather than a single investigator’s lab, he chose to share his findings via Twitter rather than more conventional means, such as contacting study authors or the journal. “We shared our analysis in the same way that the original data [was] shared—let the entire community figure it out,” he elaborated.

Much like Snyder, Gilad was surprised by the lack of questions he received about his own work on Twitter. “I wasn’t surprised by how many people joined in the discussion, retweeted, and all that,” he said. “But I was surprised that no one said: ‘You can’t just share the headlines; this is science, you need to share the details.’”

To integrative biologist Steve Phelps of the University of Texas at Austin, discussing experimental results in real time on Twitter is a bit unusual, but a reflection of how science is changing.

“Overall, the reanalysis and the participation [from ENCODE researchers] is a pretty positive thing for science,” said Phelps. “The original authors made their data and sources publicly available and shared it readily. [Gilad and Mizrahi-Man] had concerns, and they were able to go into the data and come up with an alternative version.”

“I think that [the discussion] is really pretty healthy and reflects well on both groups,” he added.

O. Mizrahi-Man and Y. Gilad, “A reanalysis of mouse ENCODE comparative gene expression data,” F1000 Research, doi:10.12688/f1000research.6536.1, 2015. 

Comments

James V. Kohl

May 20, 2015

“genes in the mouse heart were expressed in a pattern more similar to that of other mouse tissues, such as the brain or liver, than the human heart.”

If that were an accurate representation of biologically-based cause and effect, it could not be supported by what is currently known about the biophysically constrained chemistry of nutrient-dependent pheromone-controlled cell type differentiation during the life history transitions of all invertebrates and vertebrates.

The transitions are nutrient-dependent, which explains the similarities in tissue type differentiation across species.

RNA-directed DNA methylation links nutrient-dependent RNA-mediated amino acid substitutions to biodiversity via the fixation of the substitutions in the context of the physiology of reproduction. An accurate representation of biologically-based cause and effect links Gilad et al. (2003), Natural Selection on the Olfactory Receptor Gene Family in Humans and Chimpanzees, to Dobzhansky (1973) via this claim: "...the so-called alpha chains of hemoglobin have identical sequences of amino acids in man and the chimpanzee, but they differ in a single amino acid (out of 141) in the gorilla" (p. 127).

Amino acid substitutions differentiate the cell types of all cells in individuals in all genera. That may explain why others who are currently comparing the 1973 claim and 2003 claim also have challenged what appear to be misrepresentations of biologically-based cause and effect. The misrepresentations have been framed in the context of ridiculous theories.

Simply put, others are Combating Evolution to Fight Disease rather than accepting the claim that "...genomic conservation and constraint-breaking mutation is the ultimate source of all biological innovations and the enormous amount of biodiversity in this world." (p. 199). --  Mutation-Driven Evolution

Natural selection of food is one of two 'conditions of life' that Darwin tried to get others to consider before Hugo de Vries's definition of mutation led population geneticists to insist on ignoring Darwin's other 'condition of life.' Instead of linking biodiversity to the nutrient-dependent pheromone-controlled RNA-mediated physiology of reproduction that links metabolic networks to genetic networks in species from microbes to man, evolutionary theorists continue to place biodiversity into the context of perturbed protein folding.

Serious scientists owe a great deal to Yoav Gilad and others who are not afraid to address the inconsistencies of theory in the context of facts. It will be interesting to see if the inconsistencies of evolutionary theory lead to healthy discussion of RNA-mediated events, since they obviously link the epigenetic landscape to the physical landscape of DNA via metabolic networks and genetic networks.

For an example of what has been attributed to a single amino acid substitution, see: Oppositional COMT Val158Met effects on resting state functional connectivity in adolescents and adults
