© THOMAS DEERINCK, NATIONAL CENTER FOR MICROSCOPY AND IMAGING RESEARCH
Billions of years ago, one cell, the ancestral cell of modern eukaryotes, engulfed another, a microbe that gave rise to today’s mitochondria. Over evolutionary history, the relationship between our cells and these squatters has become a close one; mitochondria provide us with energy and enjoy protection from the outside environment in return. As a result of this interdependence, our mitochondria, which once possessed their own complete genome, have lost most of their genes: while the microbe that was engulfed so many years ago is estimated to have contained thousands of genes, humans have just 13 remaining protein-coding genes in their mitochondrial DNA (mtDNA).
Some mitochondrial genes have disappeared completely; others have been transferred to our cells’ nuclei for safekeeping, away from the chemically harsh environment of the mitochondrion. This is akin to storing books in a nice, dry, central library, instead...
Researchers have proposed diverse hypotheses to explain mitochondrial gene retention. Perhaps the products of some genes are hard to introduce into the mitochondrion once they’ve been made elsewhere. (Mitochondria have their own ribosomes and are capable of translating their retained genes in-house.) Or perhaps keeping some mitochondrial genes allows the cell to control each organelle individually. Historically, it has been hard to gather quantitative support for any of these ideas, but in the world of big (and growing) biological data we now have the power to shed light on this question. The mtDNA of thousands of organisms as diverse as plants, worms, yeasts, protists, and humans has now been sequenced, yielding information on the patterns of gene loss and on the gene properties that may have governed this loss.
Modern statistical approaches give us ways to allow this wealth of information to speak for itself, for or against different hypotheses, without (as much) human preconception entering into the process. Such approaches often involve building models to describe how the natural world could have given rise to our observations. Sometimes we do this without realizing it: assuming that the errors on a quantity are normally distributed, for example, invokes a particular (and sometimes inappropriate!) model of the biological and experimental details underlying that measurement. So, in order to analyze the 2,000+ mitochondrial genomes available (Cell Systems, 2:101-11, 2016), we needed a general and unbiased way of accounting for the observed sequences.
To this end, we developed a mathematical description including all possible combinations of the mitochondrial genes we see today, and the different ways organisms could evolve from having a complete ancestral genome to having no genes at all. To avoid any personal preconceptions about possible mechanisms, we first codified our assumption, before seeing the data, that every way of getting from a full set of genes to an empty one could be equally likely; that is, at any point, each gene still present is as likely as any other to be the next one lost. We then used the sequence data to calculate the probabilities that the different evolutionary paths actually occurred.
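To make that starting assumption concrete, here is a minimal Python sketch. It is a toy illustration only, not the analysis used in the study. The four gene labels "A" to "D" and the function name `state_visit_probabilities` are hypothetical. The sketch enumerates every order in which the genes could be lost, treats all orders as equally likely, and reports how often each intermediate gene set is passed through.

```python
from collections import Counter
from itertools import permutations


def state_visit_probabilities(genes):
    """Probability of passing through each gene set on the way from a full
    genome to an empty one, when every loss order is equally likely.

    Toy model: genes are lost one at a time, and all n! orders get the
    same weight (the "no preconceptions" starting assumption).
    """
    genes = tuple(genes)
    visits = Counter()
    n_paths = 0
    for order in permutations(genes):
        n_paths += 1
        remaining = set(genes)
        visits[frozenset(remaining)] += 1  # the full starting genome
        for lost_gene in order:
            remaining.discard(lost_gene)
            visits[frozenset(remaining)] += 1  # state after this loss
    return {state: count / n_paths for state, count in visits.items()}


if __name__ == "__main__":
    # Hypothetical gene labels, chosen only for illustration.
    probabilities = state_visit_probabilities(("A", "B", "C", "D"))
    for state, p in sorted(probabilities.items(),
                           key=lambda item: (-len(item[0]), sorted(item[0]))):
        print(sorted(state), round(p, 3))
```

Under this flat starting point, every gene set of a given size is visited equally often. Observed genomes that cluster on a few particular gene sets are therefore evidence that some losses are more likely than others.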
Not surprisingly, we observed similar patterns of gene loss across different lineages, indicating that some genes are more likely to be lost than others. Some genes tend to be lost early on and are missing from mtDNA in most species, while others are retained by almost all organisms. This consistency speaks to a certain predictability of evolution; guiding trends appear to shape different species in the same way.
We then used another statistical approach called model selection to explore the mechanisms that are responsible for dictating these patterns of gene loss and address the long-debated hypotheses about mitochondrial evolution. We considered a set of possible models for how likely a given gene was to be lost based on different hypotheses, from length to sequence to chemical properties. Again, we initially assumed that all possibilities were equally likely and let the data speak for themselves. In the end, we identified three features that together predict whether a gene is likely to be retained in the mitochondrion, rather than transferred to the nucleus: 1) it encodes a protein that forms the center of a complex, 2) it encodes a hydrophobic (water-repelling) protein, and 3) it contains many Gs and Cs in its DNA sequence.
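The logic of model selection with an even-handed starting point can be sketched in a few lines of Python. This is a generic illustration of Bayesian model comparison, not the study's actual procedure. The function name, the model names, and the scores are hypothetical placeholders. Each candidate model gets a score for how well it explains the data (a log evidence), all models start with equal prior weight, and the scores are turned into posterior probabilities.

```python
import math


def posterior_model_probabilities(log_evidence, prior=None):
    """Turn per-model log evidences into posterior model probabilities.

    log_evidence: dict mapping model name -> log of how well that model
                  explains the data.
    prior:        optional dict of prior probabilities; when omitted, every
                  model starts out equally likely.
    """
    names = list(log_evidence)
    if prior is None:
        prior = {name: 1.0 / len(names) for name in names}
    log_posterior = {name: log_evidence[name] + math.log(prior[name])
                     for name in names}
    # Subtract the largest value before exponentiating to avoid underflow.
    peak = max(log_posterior.values())
    weights = {name: math.exp(value - peak)
               for name, value in log_posterior.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


if __name__ == "__main__":
    # Arbitrary placeholder scores for illustration, NOT values from the study.
    placeholder_scores = {
        "gene_length_only": -120.0,
        "hydrophobicity_only": -112.0,
        "hydrophobicity_plus_gc_content": -110.0,
    }
    for name, p in posterior_model_probabilities(placeholder_scores).items():
        print(name, round(p, 4))
```

Starting from equal priors means any preference among the models comes from the data alone. A real analysis would also have to compute each model's evidence from the sequence data, which this sketch takes as given.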
So what do these results mean? Can we now settle the age-old debate of how and why mitochondrial genes are lost? In a way, yes, because these three features suggest that a combination of hypotheses is on the mark. Proteins that are central to complexes are important for the correct assembly of those complexes, so the first feature supports the idea that mitochondria need to keep some genes to assemble their own machinery locally. That genes encoding hydrophobic proteins are more likely to be retained in mtDNA supports the hypothesis that some proteins won’t end up in the mitochondrion if they are made elsewhere, because hydrophobic proteins made in the cytoplasm tend to be shuttled to other regions of the cell. As for the third feature, we think that the numbers of Gs and Cs may be important in keeping DNA stable in the damaging environment of mitochondria, perhaps like a waterproof coating to protect the contents of the leaky shed.
Of course, these hypotheses still need to be put to the test, but preliminary work from the synthetic biology field supports our findings. Specifically, scientists have tried to transfer genes from the mitochondrial genome to the nuclear genome in yeast, mimicking the process that has occurred in evolution. While some of these experiments produced healthy, normal yeast, others did not. We found that the features we identified in our model selection predicted the genes that could not be viably transferred to the nucleus.
It is becoming clear that we need a combination of mechanisms to explain mtDNA gene loss. It is an odd feature of scientific discussion that researchers tend to develop a single explanation for the phenomena we observe in the very complex biological world; the fact that several hypotheses contribute to the full story helps explain and reconcile the heated historical debate on this topic. Moreover, our work supports the use of unbiased statistical and modelling approaches to interrogate many other biological problems, from crop design to disease infection and progression. Such approaches can help provide us with a genuinely open mind to tackle debated scientific questions and seek the underlying truth.
Iain Johnston is a Birmingham Fellow at the University of Birmingham, U.K. Ben Williams is a postdoc at the Whitehead Institute for Biomedical Research in Cambridge, Massachusetts.
Clarification (May 31): This story has been updated from its original version to clarify that there are 13 protein-coding genes remaining in the human mitochondrial genome. There are additional genes for noncoding RNAs. The Scientist regrets any confusion.