An international team led by researchers at the Broad Institute of MIT and Harvard has compiled and analyzed the largest aggregate collection of human protein-coding sequences to date. The researchers, members of the Exome Aggregation Consortium (ExAC), have made these raw data openly accessible to the research community since 2014. In its latest analysis of exomes from around the world, presented in part at a genomics conference in 2015, the team highlighted the utility of the large dataset for identifying rare disease–causing variants and genes that are particularly sensitive to mutational variation, including loss of function. The results are published today (August 17) in Nature.
“The important part of the work is the large number of [exomes],” Stephen Scherer, who studies variation in the human genome at the Hospital for Sick Children and the University of Toronto, Canada, but was not involved in the work, told The Scientist in an email. “This is good data that research and clinical communities can use in different ways.”
“This is the deepest anyone has gone for any substantial part of the [human] genome,” said Jay Shendure of the University of Washington in Seattle, who penned an accompanying perspective but was not involved in the research.
The protein-coding sequences—which comprise less than 2 percent of the entire human genome—“are the parts of the genome we understand the best and they are also the regions ...