The Dark Matter of the Human Proteome

Advances in the functional characterization of newly discovered microproteins hint at diverse roles in health and disease.

Apr 1, 2019
Annie Rathore


In 2014, as I started my doctoral work in the lab of Alan Saghatelian at the Salk Institute for Biological Studies in La Jolla, California, the idea that there were tiny proteins in our cells that had long been overlooked by researchers was gaining traction. Researchers had recently recognized that the genome contained genes that were so small that they had been missed by traditional genome annotation methods, and targeted searching for protein-coding snippets of DNA had suggested there may be many thousands of so-called microproteins hard at work in our cells.

Before I joined the lab, Saghatelian’s group had developed a new approach to validate the existence of some 400 microproteins across multiple human cell lines and tissues. Other labs were similarly confirming the existence of these predicted peptides, pointing to the ubiquitous nature of microproteins. But what were they doing?

The functions of the microproteins in biological context quickly became a focus of the field. Some researchers, including Saghatelian’s group, approached this research question by looking for proteins that interacted with the microprotein of interest. In the same year that I started in the lab, the group published its work on the microprotein MRI-2, later renamed CYREN, which appears to regulate DNA repair. It was one of only a handful of microproteins that had been characterized, yet many studies estimated that there were hundreds or thousands more to be looked at. The scope of the field was massive. I knew I wanted in on this burgeoning area of research.

Microproteins are not the cell’s only tiny proteins, but similarly diminutive peptide hormones, such as insulin, only become biologically active after they’re cleaved from larger precursor proteins. Microproteins, on the other hand, start out that way. They are translated from a small open reading frame (smORF) directly into their active form. These smORFs are so tiny, in fact, that researchers overlooked them in the early 2000s as they began predicting all the protein-coding regions in the newly sequenced human genome; they used a minimum length cutoff of 100 codons for gene assignment to decrease the rate of false positives. But over the past 10 years, developments in genomics and proteomics methods have revealed hundreds to thousands of smORFs.
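To see how a length cutoff hides smORFs, consider a minimal sketch of an open reading frame scan. It assumes a toy forward-strand search from ATG to the first in-frame stop codon; the function names and the example sequence are illustrative, and real annotation pipelines also weigh the reverse strand, splicing, and conservation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_CODONS_CLASSIC = 100  # the minimum-length cutoff described in the text


def find_orfs(dna):
    """Return (start, end, codon_count) for each ATG-to-stop ORF on the forward strand."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                codons = (i - start) // 3  # coding codons, stop codon excluded
                orfs.append((start, i + 3, codons))
                start = None
    return orfs


def split_by_cutoff(orfs, min_codons=MIN_CODONS_CLASSIC):
    """Separate ORFs a length filter would keep from those it would discard."""
    kept = [orf for orf in orfs if orf[2] >= min_codons]
    missed = [orf for orf in orfs if orf[2] < min_codons]
    return kept, missed


# Hypothetical toy sequence: an 11-codon smORF followed by a 121-codon ORF.
toy_dna = "ATG" + "GCT" * 10 + "TAA" + "ATG" + "GCT" * 120 + "TGA"
kept, missed = split_by_cutoff(find_orfs(toy_dna))
```

In this toy case the 11-codon reading frame lands in `missed`: it is a perfectly valid ORF, but a 100-codon filter never reports it.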

Nailing down the functions of the microproteins they encode has been challenging. For one, newly discovered protein-coding smORFs often fail to produce detectable amounts of protein when cloned and expressed in cultured human cells, likely because their size makes the tiny proteins unstable. Moreover, most of these novel microproteins aren’t homologous to any known protein in any organism, making it extremely difficult to develop and test hypotheses about their functions.

Despite the challenges, researchers are making progress in characterizing the functions of the putative microproteins that have already been found in the genetic code. New techniques for identifying protein-microprotein interactions reveal how the tiny molecules function in the context of larger protein complexes, and where those complexes tend to be found in the cell. In the last five years, our group and others have uncovered plausible roles for microproteins in development, metabolism, muscle function, DNA repair, and mitochondrial activity, and some may even have links to disease. We are only beginning to scratch the surface of this field. Hundreds more microproteins have been detected across human cell lines and tissues, and thousands are predicted to exist across species on the basis of genomic data.

Finding Microprotein Functions

To determine how microproteins function in the cell, researchers interrogate their interactions with other proteins. One way to do this is to genetically tag the microprotein of interest with a peptide called FLAG and then isolate it and its interacting partners using antibodies that bind to the tag. Alternatively, researchers can use a tag called APEX2 that labels nearby interacting proteins. In both cases, the isolated protein complexes are analyzed using proteomic and biochemical methods, and the results can shed light on the functions of the microproteins themselves.

Tagging a microprotein called cell cycle regulator of nonhomologous end joining (CYREN) revealed its role in regulating DNA repair. Researchers linked a DNA sequence encoding a short peptide called FLAG to the gene encoding CYREN. The FLAG-tagged microprotein sequence was then expressed in cultured human embryonic kidney cells, which were collected, lysed, and incubated with anti-FLAG antibodies bound to beads.

Putative function: CYREN acts by binding to the Ku70/Ku80 heterodimer and inhibits nonhomologous end joining (NHEJ) by protecting breaks with overhangs. The researchers proposed that, in doing so, CYREN promotes error-free repair by homologous recombination during cell cycle phases when sister chromatids are present.

Tagging the interacting partners of a microprotein called mitochondrial elongation factor 1 microprotein (MIEF1-MP) revealed its role in protein translation in mitochondria. My colleagues and I genetically linked MIEF1-MP to a protein tag called ascorbate peroxidase 2 (APEX2), then treated human cells expressing the sequence for the tagged microprotein with the labeling agent biotin-phenol, causing nearby proteins to acquire the label. We then collected, lysed, and treated the cells with streptavidin beads that bind to the biotinylated proteins, allowing for their enrichment for functional analyses.

Putative function: MIEF1-MP interacts with the mitochondrial ribosome (mitoribosome) and may serve as an assembly factor for the mitoribosome. The translation of proteins in the mitochondria slows when MIEF1-MP levels drop and picks up as the abundance of the microprotein increases.

all illustrations by Lucy Reading-Ikkanda

The discovery of microproteins

Once it was clear that genomes contain protein-coding regions of fewer than 100 codons, scientists rethought and invented techniques for discovering smORFs and the microproteins they encode. In a 2011 informatics analysis in fruit flies, using conservation and protein-coding potential across two fly species as parameters, Juan Pablo Couso of the University of Sussex and colleagues predicted some 400 to 4,500 bona fide smORFs.

A few years later, the field adopted ribosome profiling techniques to identify “ribosome footprints”—mRNAs that are found associating with ribosomes, presumably in the act of translation. In 2014, Ariel Bazzini and colleagues at Yale University used this approach to find 190 actively translating smORFs in the zebrafish and human genomes. In addition, they identified 311 smORFs in the upstream 5′ untranslated region (UTR) and 93 in the downstream 3′ UTR of annotated genes. That same year, the Saghatelian group—then at Harvard, but soon to move to Salk, where I joined it—searched the proteomics data against a custom RNA-seq database of all possible translation products in all reading frames, and discovered 237 microproteins across multiple human cell lines and tissues.
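The database-search idea can be illustrated with a short sketch: translate each transcript in all three forward reading frames, then ask which translations contain a peptide observed by mass spectrometry. This is a simplified stand-in under stated assumptions, not the group's actual pipeline; the function names, the exact-substring matching, and the toy transcript are all hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frames(transcript):
    """Translate a transcript in the three forward frames ('*' marks a stop codon)."""
    seq = transcript.upper().replace("U", "T")
    frames = []
    for frame in range(3):
        protein = "".join(
            CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons with unknown bases
            for i in range(frame, len(seq) - 2, 3)
        )
        frames.append(protein)
    return frames


def match_peptides(peptides, transcripts):
    """Map each detected peptide to the (transcript name, frame) pairs that encode it."""
    hits = {}
    for name, seq in transcripts.items():
        for frame, protein in enumerate(translate_frames(seq)):
            for peptide in peptides:
                if peptide in protein:
                    hits.setdefault(peptide, []).append((name, frame))
    return hits


# Hypothetical toy transcript whose third frame encodes M-A-K-W-stop.
toy_transcripts = {"toy_tx": "GGATGGCTAAATGGTAA"}
hits = match_peptides(["AKW", "PPP"], toy_transcripts)
```

A peptide that maps to a frame outside any annotated gene, as `AKW` does in this toy case, is the kind of hit that flags a candidate microprotein for follow-up.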

These techniques have identified evidence of smORFs in diverse types of supposedly noncoding RNA, including introns of pre-mRNAs, long noncoding RNAs, and primary transcripts of microRNAs and ribosomal RNAs. The next step is to identify the microproteins they encode and understand their biological importance.

See “Noncoding RNAs Not So Noncoding”

Microprotein functions

By identifying domains or motifs in microproteins that also appear in larger, well-characterized proteins, researchers can begin to glean clues to function. Genetic screens measuring differential expression of transcripts in different cell types or contexts can also point toward possible biological roles. Most directly, researchers can genetically tag the microprotein of interest, express it in human cells, and look at where it is localized within the cell and what other proteins it interacts with.

Traditional methods, such as tagged-protein immunoprecipitation, can be applied to uncover microprotein-protein interactions. In one commonly used assay, for example, a DNA sequence encoding a short peptide called FLAG is added to the gene encoding the microprotein of interest and expressed in cultured human cells. The cells are then harvested and lysed, and the tagged microprotein is enriched using FLAG-targeting antibodies bound to beads in vitro. When the beads and the attached microprotein are immunoprecipitated, the interacting protein partners come along for the ride. Once isolated from the sample, the interacting proteins can be identified using proteomic techniques such as mass spectrometry and biochemical methods such as western blots.

Small but Mighty

As of early 2019, only about a dozen microproteins have been assigned even putative roles in humans or animal models.

Microprotein                  Size      Species                   Function
MRI-2/CYREN                   69 aa     Human                     DNA repair
Nobody                        68 aa     Human                     Messenger RNA decapping
MIEF1                         70 aa     Human                     Mitochondrial protein synthesis
Hemotin/stannin               88 aa     Fly                       Endosomal maturation
Polar granule component       71 aa     Fly                       Germline development

Physiological Roles
Minion                        84 aa     Mouse                     Muscle development
Tarsal-less or polished rice  11-32 aa  Fly                       Embryonic development
Toddler                       55-58 aa  Rodent, zebrafish, human  Cardiovascular development
Sarcolamban                   28-29 aa  Fly                       Muscle contraction
Sarcolipin                    31 aa     Rodent, rabbit, human     Muscle contraction
Phospholamban                 52 aa     Many vertebrates          Muscle contraction
Myoregulin                    46 aa     Mouse, human              Muscle contraction
DWORF                         34 aa     Mouse                     Muscle contraction

Disease Links
Humanin                       24 aa     Mammal                    Neuronal cell death
CASIMO1                       83 aa     Rodent                    Actin cytoskeleton regulation

Over the past couple of years, these approaches have revealed that, unlike peptide hormones, which carry signals from one cell to another, microproteins function within the cell by interacting with larger protein complexes. Using the FLAG method, for instance, Saghatelian’s group discovered that CYREN interacts with the heterodimeric protein complex in the nucleus that binds to loose ends of DNA double-strand breaks, suggesting a role for CYREN in DNA repair. Later, functional studies by Jan Karlseder’s group, also at Salk, showed that CYREN inhibits the non-homologous end joining repair pathway during certain phases of the cell cycle. (See illustration.)

But methods such as FLAG are nonspecific; in addition to proteins that truly interact with the microprotein of interest, the immunoprecipitates contain numerous housekeeping proteins. Researchers can experiment with different buffer conditions and other parameters to reduce contaminant proteins, but this runs the risk of losing genuine partners that bind the microprotein only weakly.

Shortly after I joined the Saghatelian lab, I set out to address this limitation. In 2015, I started with a protein tag called ascorbate peroxidase 2 (APEX2), developed for proteomic mapping by Alice Ting’s group, then at MIT. When APEX2-tagged proteins are exposed to biotin-phenol and a catalyst in cultured cells, nearby proteins are labeled with the biotin. Because the half-life of the active biotin-phenoxyl radical is less than 1 millisecond, the labeling radius is limited to 20 nanometers, meaning that only proteins that are in immediate proximity to the APEX2-tagged molecules acquire the biotin label. The cells are then lysed, and biotinylated proteins are enriched using streptavidin beads and identified by proteomic and biochemical methods.

My colleagues and I applied the APEX2 proximity labeling method to identify interacting partners of microproteins expressed in human cells. One of the microproteins we targeted was the mitochondrial elongation factor 1 microprotein (MIEF1-MP), which had been reported in earlier studies to regulate mitochondrial dynamics. Our experiments revealed MIEF1-MP’s interaction with the mitochondrial ribosome (mitoribosome), and additional cryogenic electron microscopy studies of the human mitoribosome suggested that MIEF1-MP may be required for mitoribosome assembly. Indeed, when we measured the rate of protein synthesis in the mitochondria, we found that the loss of MIEF1-MP decreased the rate of translation. Elevated MIEF1-MP levels, on the other hand, resulted in increased translation.

The development and application of these microprotein-protein interaction detection technologies provide clues to the functions of microproteins. Early research hints at a diverse functional repertoire that should prompt scientists to think about how these tiny proteins might be exploited.

Microproteins at work

One application that comes to mind is biomedicine. Already, microproteins have been fingered for diverse cellular and physiological functions, biological processes that, if things go wrong, can lead to disease. Defects in DNA repair are a common theme in cancer, for example, while problems in mitochondrial protein synthesis are a leading cause of metabolic and developmental disorders.

A team of Novartis researchers, with whom our group had collaborated on other microprotein projects, published a study on a microprotein called Minion (Microprotein INducer of fusION) that may one day be exploited for targeted drug delivery. Srihari and Srinath Sampath’s group discovered the microprotein in regenerating mouse muscle after an injury. The production of Minion peaks three to four days after injury, similar to the expression profile of a protein called Myomaker that is known to control muscle cell fusion.

Later, functional studies showed that Minion, along with Myomaker, allows cells to fuse and form multinucleated fibers that are capable of contracting. A lack of Minion disables skeletal muscles, including the diaphragm, resulting in perinatal death in mice. This insight into the Minion-Myomaker system led the researchers to envision harnessing it to target fusing cells in cancer or other contexts.

Another newly discovered microprotein with a link to disease is CASIMO1 (Cancer Associated Small Integral Membrane Open reading frame 1), an 83-amino acid microprotein characterized by Sven Diederichs’s group at the German Cancer Research Center in Heidelberg. In an expression profiling study designed to identify human genes involved in hormone receptor–positive primary breast cancer, an RNA transcript initially annotated as a putative noncoding RNA showed sixfold higher levels in breast tumor than in normal tissue.

The researchers later found that the loss of CASIMO1 disrupted the regulation of the actin cytoskeleton, impaired migration ability, reduced cells’ proliferation rate, and stalled the cell cycle in the G0/G1 phase—changes that would be expected to limit cancerous growth. The microprotein also appears to modulate cellular lipid levels and signal transduction.

These findings emphasize the importance of further investigations to discover and characterize novel smORFs and the microproteins they encode. Biochemical studies show that microproteins use short sequences of just two to four amino acids to interact with larger protein complexes to regulate biology. Such interactions are amenable to small-molecule inhibition, and, therefore, microprotein-protein interactions could reveal new druggable targets. As researchers continue to probe the functions of microproteins and their mechanisms of action in these various roles and disease conditions, this work will enable the development of new therapeutics.

Annie Rathore is a life science consultant at Deloitte Management Consulting. She graduated from the Salk Institute for Biological Studies in La Jolla, California, in 2018.


  1. M.C. Frith et al., “The abundance of short proteins in the mammalian proteome,” PLOS Genet, 2:e52, 2006.
  2. E. Ladoukakis et al., “Hundreds of putatively functional small open reading frames in Drosophila,” Genome Biol, 12:R118, 2011.
  3. A.A. Bazzini et al., “Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation,” EMBO J, 33:981–93, 2014.
  4. J. Ma et al., “Discovery of human sORF-encoded polypeptides (SEPs) in cell lines and tissue,” J Proteome Res, 13:1757–65, 2014.
  5. S.A. Slavoff et al., “Peptidomic discovery of short open reading frame-encoded peptides in human cells,” Nat Chem Biol, 9:59–64, 2013.
  6. N. Arnoult et al., “Regulation of DNA repair pathway choice in S and G2 phases by the NHEJ inhibitor CYREN,” Nature, 549:548–52, 2017.
  7. Q. Chu et al., “Identification of microprotein–protein interactions via APEX tagging,” Biochemistry, 56:3299–306, 2017.
  8. A. Rathore et al., “MIEF1 microprotein regulates mitochondrial translation,” Biochemistry, 57:5564–75, 2018.
  9. Q. Zhang et al., “The microprotein Minion controls cell fusion and muscle formation,” Nat Commun, 8:15664, 2017.