Drosophila’s New Genes

An analysis of the transcriptomes of several fruit fly strains reveals dozens of possible de novo genes in each.

Jef Akst
Jan 23, 2014

WIKIMEDIA, MUHAMMAD MAHDI KARIM

In the last few years, scientists have come to realize that genes really can arise from formerly noncoding regions of the genome. Indeed, comparing the genomes of related species has even suggested that such de novo gene formation may be quite common. Today, the first-ever within-population search for novel genes supports this idea. Publishing in Science, researchers at the University of California, Davis, present a total of 142 transcripts that are expressed in some or all of the six Drosophila melanogaster strains they examined, but that correspond to intergenic sequences of the D. melanogaster reference genome.

“Until recently, de novo origin of genes was considered to be so unlikely as to be impossible,” comparative genomicist Aoife McLysaght of the Smurfit Institute of Genetics at Trinity College in Dublin, Ireland, who was not involved in the study, told The Scientist in an e-mail. “[T]his population level analysis is...

“To show [the formation of de novo genes] at the population genetics level is really a nice story,” agreed evolutionary biologist Diethard Tautz of the Max Planck Institute for Evolutionary Biology in Plön, Germany, who also did not participate in the research. “It shows the power of generating from nothing, so to speak.”

To search for de novo genes in the fruit fly, UC Davis’s David Begun and his colleagues analyzed the transcriptomes of the testes of six D. melanogaster strains, as well as three strains of D. simulans and two strains of D. yakuba. They specifically looked for transcripts that were expressed in at least one D. melanogaster strain, but not in the D. melanogaster reference strain (data generated by the modENCODE project), nor in any of the D. simulans or D. yakuba strains the researchers examined. This pattern would indicate that the transcripts have only recently evolved to be expressed—at some point in the last 2 million to 3 million years since D. melanogaster and D. simulans split. The researchers then compared the transcripts that fit this pattern to the D. melanogaster reference genome and eliminated any sequences that fell fewer than 500 base pairs away from known genes, minimizing the possibility that the sequences were simply part of the untranslated regions of existing genes. In the end, they found 142 de novo gene candidates.
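The screening logic described above can be sketched in a few lines of Python. This is only an illustration of the stated criteria, not the authors' pipeline: the `Transcript` record, its field names, and the function `is_de_novo_candidate` are hypothetical, and the sketch assumes that expression calls and distances to the nearest annotated gene have already been computed upstream. The 500-base-pair threshold is the only value taken from the article.

```python
from dataclasses import dataclass

# Threshold stated in the article; everything else here is a hypothetical sketch.
MIN_GENE_DISTANCE_BP = 500


@dataclass(frozen=True)
class Transcript:
    """Hypothetical summary of one assembled transcript."""
    name: str
    melanogaster_strains: frozenset   # sampled D. melanogaster strains expressing it
    in_reference: bool                # expressed in the D. melanogaster reference strain
    in_simulans: bool                 # expressed in any sampled D. simulans strain
    in_yakuba: bool                   # expressed in any sampled D. yakuba strain
    distance_to_nearest_gene_bp: int  # distance to the closest annotated gene


def is_de_novo_candidate(t: Transcript) -> bool:
    """Apply the three criteria described in the text."""
    # Expressed in at least one sampled D. melanogaster strain.
    expressed_in_mel = len(t.melanogaster_strains) >= 1
    # Not expressed in the reference strain or in either outgroup species.
    absent_elsewhere = not (t.in_reference or t.in_simulans or t.in_yakuba)
    # Not within 500 bp of a known gene (could be an untranslated region).
    far_from_genes = t.distance_to_nearest_gene_bp >= MIN_GENE_DISTANCE_BP
    return expressed_in_mel and absent_elsewhere and far_from_genes


def screen(transcripts):
    """Return the names of transcripts that pass every filter."""
    return [t.name for t in transcripts if is_de_novo_candidate(t)]
```

Passing a list of such records to `screen` returns only those that are expressed in D. melanogaster, absent from the reference strain and both outgroups, and far enough from annotated genes.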

“There are lots of genes that originated and [are] spreading in the D. melanogaster population,” said coauthor Li Zhao, a postdoc in Begun’s lab.

Looking more closely at these transcripts, the researchers found evidence that the majority of the de novo gene candidates were subject to cis-regulation, meaning that expression was controlled by regulatory elements just upstream of the new transcripts. Furthermore, the vast majority of the sequences contained open reading frames (ORFs)—regions that could theoretically produce proteins, as signified by start and stop codon sequences—of at least 150 base pairs. And looking at the ancestral sequence, as well as the sequences of the nonexpressing Drosophila strains, the researchers found these same ORFs, suggesting that the regulatory change alone is responsible for the expression of the new gene.
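To make the ORF criterion concrete, here is a minimal Python sketch of an ORF scan. It rests on several assumptions that the article does not specify: it searches only the forward strand, treats ATG as the only start codon, uses the three standard stop codons, and counts the stop codon toward the ORF length. The function name `find_orfs` is hypothetical; the 150-base-pair minimum comes from the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_ORF_BP = 150  # minimum ORF length mentioned in the article


def find_orfs(seq, min_len=MIN_ORF_BP):
    """Return (start, end) pairs for forward-strand ORFs of at least min_len bp.

    Coordinates are 0-based and end-exclusive; the stop codon is included
    in the reported span (an assumption of this sketch).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((start, end))
                start = None  # look for the next ORF in this frame
    return orfs
```

Running the same scan over the orthologous sequence from a nonexpressing strain is one way to check whether the ORF predates the onset of transcription, which is the comparison the researchers describe.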

“The simplest model [of de novo gene formation] is that they have some mutation in the upstream regions, and those mutations somehow—it could be a binding region or some other regulation region—they somehow make the transcription machinery start,” Zhao said.

Finally, the researchers present preliminary data to suggest that these potential de novo genes may have been subject to natural selection. First, genes that were expressed at high frequencies in the population tended to be longer and more complex than those expressed at lower frequencies, pointing to a role for selection in their spread. Moreover, the researchers observed reduced heterozygosity, also consistent with patterns of selection. Whether these sequences are translated into proteins, or are otherwise functional, remains to be seen.

And of course, not all new genes are likely to be beneficial. In fact, theory would predict that the majority are actually harmful. But the idea that new genes are arising at such high frequencies would certainly give natural selection plenty of raw material to work with. “You’ve got this constant churning of new regulatory mutations activating ancestrally unexpressed sequence, and a lot of them turn out to be deleterious, but some of them turn out to be functionally important and spread by selection,” Begun said. “We think we got a glimpse of that from this dataset.”

Though many questions remain, the study represented yet another advance in scientists’ understanding of a phenomenon that only a few years ago was thought impossible, said Tautz. “There has been a long tradition in biology to think that a gene can only arise due to duplication and diversions from another gene, and this is therefore a completely new story. It’s quite an exciting field.”

L. Zhao et al., “Origin and spread of de novo genes in Drosophila melanogaster populations,” Science, doi:10.1126/science.1248286, 2014.