Two international teams have independently produced the first drafts of the human proteome. These curated catalogs of the proteins expressed in most non-diseased human tissues and organs can be used as a baseline to better understand changes that occur in disease states. Their findings were published today (May 29) in Nature.
Both teams uncovered new complexities of the human genome, identifying novel proteins from regions of the genome previously thought to be non-coding.
“While other large proteomic data sets have been collected that cataloged up to 10,000 proteins, the real breakthrough with these two projects is the comprehensive coverage of more than 80 percent of the expected human proteome which has not been achieved previously,” said Hanno Steen, director of proteomics at Boston Children's Hospital, who was not involved in the work. “These efforts clearly show that to get to this deep level of proteome coverage, many different tissue types must be probed.”
Analyzing 30 different tissue types, Akhilesh Pandey, a proteomics researcher at Johns Hopkins University in Baltimore, Maryland, and his colleagues at the Institute of Bioinformatics in Bangalore, India, and elsewhere cataloged proteins encoded by about 84 percent of all human genes predicted to code for proteins. The researchers published the results of their Human Proteome Map online, and the data will also soon be accessible through the National Center for Biotechnology Information database, said Pandey.
Meanwhile, proteomics researcher Bernhard Küster of the Technische Universität München in Germany and his colleagues created ProteomicsDB, a searchable, public database that catalogs 92 percent of the estimated 19,629 human proteins.
Both teams analyzed human tissue samples using mass spectrometry. Pandey’s team generated all new data, analyzing a variety of healthy human tissues, including seven types of fetal tissues and six types of hematopoietic cells. The Küster group took a slightly different approach, compiling already available raw mass spec data from databases and colleagues’ contributions, which currently makes up about 60 percent of ProteomicsDB. To fill in the data gaps, the Küster lab generated its own mass spec data, analyzing 60 human tissues, 13 body fluids, and 147 cancer cell lines. According to Küster, the team selected only high-resolution public data, which was computationally processed for strict quality control.
“These two papers are very complementary,” said Anne-Claude Gingras, a proteomics researcher at the Lunenfeld-Tanenbaum Research Institute in Toronto, Canada, who was not involved in either study. “The Hopkins group really addressed what was missing in proteomics, providing a survey of human proteins from a single source, which allows for easy comparisons within their data.” In contrast, the ProteomicsDB effort connected new information with existing data from the proteomics community. The goal, said Küster, is to continue to grow and refine the database, further engaging the community and pooling more resources.
Comparing protein and mRNA levels globally across the proteome, the Küster lab found that the protein-to-mRNA ratio, a proxy for translation rate, is a constant feature of each mRNA transcript. “This was a surprising and a really important finding,” said Gingras.
Steen agreed. “If this observation holds true, it’s a paradigm shift. The proteomics community has viewed transcriptome and proteome data as two sides of a coin,” he said. “But this analysis shows that at least at steady state, once the ratio for an mRNA/protein pair has been calculated, protein levels can be determined just from specific mRNA levels.”
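The logic Steen describes can be sketched in a few lines. This is a hedged illustration of the constant-ratio idea only, not the study's actual method: the gene names and numbers below are invented for the example, and real analyses would involve calibration across many samples.

```python
# Illustrative sketch: if each transcript has a characteristic
# protein-to-mRNA ratio, then at steady state a protein level can be
# estimated from a new mRNA measurement alone.
# All gene names and values here are hypothetical.

calibration = {  # one (mRNA level, protein level) pair per transcript
    "GENE_A": (100.0, 5000.0),
    "GENE_B": (80.0, 9600.0),
}

# Derive each transcript's characteristic ratio from its calibration pair.
ratios = {gene: protein / mrna for gene, (mrna, protein) in calibration.items()}

def predict_protein(gene: str, mrna_level: float) -> float:
    """Estimate protein abundance from a new mRNA measurement."""
    return ratios[gene] * mrna_level

# GENE_A's ratio is 5000/100 = 50, so 40 units of mRNA imply 2000 units of protein.
print(predict_protein("GENE_A", 40.0))  # → 2000.0
```

The point of the sketch is that the ratio is computed once per transcript; if the ratio really is constant, a single mRNA measurement then suffices to infer the protein level.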
Both studies found evidence of translation from genomic regions previously thought to be non-coding: the Küster team identified more than 400 translated long intergenic non-coding RNAs (lincRNAs), while the Pandey team uncovered 193 novel proteins. Still, the biological relevance of these findings is not yet clear.
“The current genome annotations are based on computational algorithms,” said Min-Sik Kim, a research fellow at Johns Hopkins and an author on the Human Proteome Map study. “These predictions may not all be accurate, which is why [we] analyzed the proteins directly.”
The Pandey group is now working on examining the fetal proteome more closely, as well as adding post-translational protein modification data to its database. The team also wants to develop a protein map of the human brain—an organ that was not extensively studied as part of this latest effort, Pandey told The Scientist.
“The prevalent view was that information transfer was from genome to transcriptome to proteome. What these efforts show is that it’s a two-way road—proteomics can be used to annotate the genome. The importance is that, using these datasets, we can improve the annotation of the genome and the algorithms that predict transcription and translation,” said Steen. “The genomics field can now hugely benefit from proteomics data.”
M. Wilhelm et al., “Mass-spectrometry-based draft of the human proteome,” Nature, doi:10.1038/nature13319, 2014.
M.S. Kim et al., “A draft map of the human proteome,” Nature, doi:10.1038/nature13302, 2014.