Expanding ENCODE

Latest Encyclopedia of DNA Elements data enable researchers to compare genome regulation across species. 

Aug 27, 2014
Jyoti Madhusoodanan

Researchers have long recognized genomic similarities across species. New results from the Encyclopedia of DNA Elements (ENCODE) and model organism ENCODE (modENCODE) projects, published in a series of papers in Nature today (August 27), could support further comparative analysis; together, the projects have now added more than 1,600 data sets, bringing the total number of available ENCODE/modENCODE data sets to 3,300. In their respective papers, the teams behind each project also provide key cross-species comparisons of genome regulation in nematode (roundworm), fly, and human cells.

“What’s really striking about these papers is that they find ways in which we can map similarities in genomic function between key model organisms that are often used in lab research,” said geneticist William Bush of Case Western Reserve University in Ohio, who was not involved with the studies. “They have built models of genomic function that span all of these organisms.”

Previously, most cross-species comparisons of genome regulation examined only a few sites in the genome, yielding mixed results. Some studies suggested that regulatory regions were strongly conserved, while others found greater diversity among the same locations.

In one study, Alan Boyle of Stanford University in California and his colleagues compared maps of where transcription-regulating factors bind across the genomes of fly (Drosophila melanogaster), nematode (Caenorhabditis elegans), and human cell lines. They found that, across all three species, approximately half of these binding sites are clustered at high-occupancy target (HOT) regions, areas where several transcription factors (TFs) congregate around chromatin. A related paper by Carlos Araya of Stanford and his colleagues examined the binding of different regulatory proteins to the C. elegans genome at different times and locations during development; this work also identified several HOT regions in the developing roundworm genome.

The significance of HOT regions is not yet clear. However, despite the evolutionary distance among them, human, roundworm, and fly cells shared similar chromatin structures to regulate transcription. “Finding that certain regulatory circuits that control development are actually conserved across all three organisms was fascinating,” said geneticist Michael Snyder of Stanford, a coauthor on both studies.

Comparing deep RNA sequencing data from both ENCODE and modENCODE across these three species, Mark Gerstein of Yale University in Connecticut and his colleagues found that gene expression levels “can be quantitatively predicted from chromatin features at the promoter using a model based on a single set of organism-independent parameters,” as they wrote in their paper. Identifying such a universal model of chromatin features is “sort of revolutionary,” said Bush.

To understand how transcription and other genomic functions are controlled, Joshua Ho—who was then at Harvard Medical School—and his colleagues analyzed chromatin features such as histone modifications and chromatin-associated proteins across the three species. Although these organisms have significantly different genome sizes and their genes are organized differently, the researchers identified many conserved elements—such as shared patterns of histone modification and regulatory regions—among them.

But shared patterns do not necessarily signify importance, said Bush. “Genome architecture is more conserved than we previously thought, but we have to be careful with taking conservation as a symbol of evolutionary constraint.”

Ho and his coauthors also found key differences in the structure of heterochromatin between species. “The idea that we define heterochromatin in more qualitative terms based on sequence content and the organism is interesting,” said geneticist John Greally of Albert Einstein College of Medicine in New York City, who was not involved with these studies. He added that these data show that “heterochromatin is not the same thing in different organisms, not only in terms of distribution but also in terms of composition.”

Snyder suggested that this collection of ENCODE and modENCODE results could help researchers better understand the roles of genomic regulatory regions in diseases like cancer. Several previous studies have examined the roles of cancer-associated mutations in protein-coding regions, but genome-wide association studies frequently pick up mutations in regulatory regions of the genome as well. 

“Everybody studies protein-coding mutations in cancer, but there are probably many regulatory mutations that are important as well that are ignored for the most part,” said Snyder. “We’re going to be able to shed light on these for the first time, so it’s likely we can get to a causative mutation much more quickly when we do genetic studies.”

C.L. Araya et al., “Regulatory analysis of the C. elegans genome with spatiotemporal resolution,” Nature, doi:10.1038/nature13497, 2014.

A.P. Boyle et al., “Comparative analysis of regulatory information and circuits across distant species,” Nature, doi:10.1038/nature13668, 2014.

M.B. Gerstein et al., “Comparative analysis of the transcriptome across distant species,” Nature, doi:10.1038/nature13424, 2014.

J.W.K. Ho et al., “Comparative analysis of metazoan chromatin organization,” Nature, doi:10.1038/nature13415, 2014.