A Systematic Approach to Finding Unannotated Proteins

A study suggests that there is more to the eukaryotic genome than was previously suspected.

Katarina Zimmer
Feb 28, 2018

UNEARTHED TREASURE: Confocal microscopy image of a previously unannotated mitochondrial protein, altMiD51 (green), alongside mitochondria (red). Credit: Annie Roy


S. Samandi et al., “Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins,” eLife, 6:e27860, 2017.

For many years, scientists believed that each eukaryotic gene encoded just one protein and its isoforms, and researchers annotated genomes accordingly. But recent research has shown that individual genes can encode multiple different proteins, and that plenty of proteins arise from regions of the genome that are considered noncoding. Xavier Roucou, a biochemist at the University of Sherbrooke in Quebec, Canada, decided to take a systematic approach to annotating these undocumented proteins.  

To detect regions of the genome that might encode these proteins, so-called “alternative open reading frames” (altORFs), Roucou and colleagues scanned nine eukaryotic genomes, including the human genome, for translation initiation sites and stop codons. They then translated the resulting candidate reading frames in silico to predict the corresponding proteins, ending up with 183,191 possible unannotated proteins in the human transcriptome alone. Many of these had orthologs in the other species examined and appeared to contain functional domains.
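The scanning step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `find_orfs` and `translate`, the restriction to ATG start codons, the forward-strand-only search, and the `min_codons` cutoff are all assumptions made for illustration. The sketch walks each of the three reading frames of a sequence, records stretches that begin at a start codon and end at the next in-frame stop codon, and translates them with the standard genetic code.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return (start, end, frame) for each ATG-to-stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon. `min_codons` counts
    codons before the stop; the default is an illustrative assumption.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG since the last stop
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None
    return orfs


def translate(orf_seq):
    """Translate an ORF (start codon through stop codon), dropping the stop."""
    return "".join(
        CODON_TABLE.get(orf_seq[i:i + 3], "X")  # "X" marks ambiguous codons
        for i in range(0, len(orf_seq) - 3, 3)
    )


# Toy example: one short ORF in reading frame 2.
toy = "CC" + "ATG" + "GCT" * 3 + "TAA" + "G"
for start, end, frame in find_orfs(toy, min_codons=2):
    print(start, end, frame, translate(toy[start:end]))
```

A real annotation effort would run such a scan over every transcript and then remove ORFs that are already annotated; those steps are omitted here.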

To estimate how many of the putative alternative proteins are expressed in humans, the researchers searched proteomics data collected from human samples in other studies and detected nearly 5,000 of them. For Roucou, the results suggest that the genome harbors many overlooked proteins. “We cannot ignore them anymore,” he says.
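The idea behind this search can be sketched in a few lines of Python. This is a deliberate simplification and not the method used in the study: real proteomics searches match mass spectra against sequence databases and control the false discovery rate, whereas this sketch assumes the peptides have already been identified as plain sequences. The function name `detected_proteins` and all toy sequences are hypothetical. A predicted protein counts as detected only if it is supported by a peptide that does not also occur in an annotated reference protein.

```python
def detected_proteins(predicted, reference, peptides):
    """Map each predicted protein ID to the peptides that support it.

    predicted, reference: dicts of {protein_id: amino-acid sequence}.
    peptides: iterable of peptide sequences observed in proteomics data.
    Peptides found in any reference protein are ignored, because they
    cannot distinguish a new protein from an annotated one.
    """
    hits = {}
    for pep in peptides:
        if any(pep in ref_seq for ref_seq in reference.values()):
            continue
        for pid, seq in predicted.items():
            if pep in seq:
                hits.setdefault(pid, []).append(pep)
    return hits


# Toy data, invented for illustration only.
predicted = {"alt1": "MAAAKLLPR", "alt2": "MGGSK"}
reference = {"ref1": "MKLLPRTT"}
peptides = ["AAAK", "LLPR", "GGSK", "QQQ"]
print(detected_proteins(predicted, reference, peptides))
```

Here "LLPR" is discarded because it also matches the reference protein, and "QQQ" matches nothing, so only "AAAK" and "GGSK" count as evidence.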

Judith Steen, a neurologist at Harvard Medical School, finds the results intriguing. However, she notes that it’s still unknown how many of the predicted proteins are actively translated in vivo, under what circumstances, and what role they play. “From my perspective, a lot of work needs to be done,” she says. “These are baby steps.”

Update (March 5): The original version of this article mentioned scanning genomes for transcription initiation sites; in fact, they were scanned for translation initiation sites. The Scientist regrets the error.