A Systematic Approach to Finding Unannotated Proteins

A study suggests that there is more to the eukaryotic genome than was previously suspected.

Mar 1, 2018
Katarina Zimmer

UNEARTHED TREASURE: Confocal microscopy image of a previously unannotated mitochondrial protein, altMiD51 (green), alongside mitochondria (red). Image credit: ANNIE ROY


S. Samandi et al., “Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins,” eLife, 6:e27860, 2017.

For many years, scientists believed that each eukaryotic gene encoded just one protein and its isoforms, and researchers annotated genomes accordingly. But recent research has shown that individual genes can encode multiple different proteins, and that plenty of proteins arise from regions of the genome that are considered noncoding. Xavier Roucou, a biochemist at the University of Sherbrooke in Quebec, Canada, decided to take a systematic approach to annotating these undocumented proteins.  

To detect regions of the genome that might encode these proteins, so-called “alternative open reading frames” (altORFs), Roucou and colleagues scanned nine eukaryotic genomes, including the human genome, for translation initiation sites and stop codons. They then translated the resulting open reading frames in silico to predict the corresponding proteins, ending up with 183,191 possible unannotated proteins in the human transcriptome alone. Many of these had orthologs in the genomes of the other species examined and appeared to have functional domains.
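The scan-and-translate step described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes the simplest possible rules: an ATG start codon, the standard genetic code, the three forward reading frames of a single transcript sequence, and a minimum-length cutoff. The function name `find_orfs`, the `min_codons` parameter and its default of 30, and the example sequence are all hypothetical choices made for illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
STOPS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


def find_orfs(seq, min_codons=30):
    """Return (start, end, frame, protein) for each ATG-to-stop ORF.

    Hypothetical helper: scans the three forward frames only, takes the
    first ATG after the previous stop as the start, and keeps ORFs with at
    least `min_codons` coding codons (an assumed cutoff, stop excluded).
    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # candidate translation initiation site
            elif start is not None and codon in STOPS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    # Translate in silico; unknown codons (e.g. with N) become X.
                    protein = "".join(
                        CODON_TABLE.get(seq[j:j + 3], "X")
                        for j in range(start, i, 3)
                    )
                    orfs.append((start, i + 3, frame, protein))
                start = None  # resume the search after this stop codon
    return orfs


# Toy transcript containing two short ORFs in the same reading frame.
toy = "CCATGGCTTAAATGAAATTTTGA"
for start, end, frame, protein in find_orfs(toy, min_codons=2):
    print(start, end, frame, protein)
```

A genome-scale analysis would also need to handle annotated reference coding sequences, both strands where relevant, and overlapping reading frames, none of which this sketch attempts.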

To estimate how many of the putative alternative proteins are expressed in humans, the researchers searched proteomics data collected from human samples in other studies and detected nearly 5,000 of them. For Roucou, the results suggest that the genome harbors many overlooked proteins. “We cannot ignore them anymore,” he says.

Judith Steen, a neurologist at Harvard Medical School, finds the results intriguing. However, she notes that it’s still unknown how many of the predicted proteins are actively translated in vivo, under what circumstances, and what role they play. “From my perspective, a lot of work needs to be done,” she says. “These are baby steps.”

Update (March 5): The original version of this article mentioned scanning genomes for transcription initiation sites; in fact, they were scanned for translation initiation sites. The Scientist regrets the error.
