In 2014, as I started my doctoral work at the lab of Alan Saghatelian at the Salk Institute for Biological Studies in La Jolla, California, the idea that there were tiny proteins in our cells that had long been overlooked by researchers was gaining traction. Researchers had recently recognized that the genome contained genes that were so small that they had been missed by traditional genome annotation methods, and targeted searching for protein-coding snippets of DNA had suggested there may be many thousands of so-called microproteins hard at work in our cells.
Before I joined the lab, Saghatelian’s group had developed a new approach to validate the existence of some 400 microproteins across multiple human cell lines and tissues. Other labs were similarly confirming the existence of these predicted peptides, pointing to the ubiquitous nature of microproteins. But what were they doing?
Microproteins are not the cell’s only tiny proteins, but similarly diminutive peptide hormones, such as insulin, only become biologically active after they’re cleaved from larger precursor proteins. Microproteins, on the other hand, start out that way. They are translated from a small open reading frame (smORF) directly into their active form. These smORFs are so tiny, in fact, that researchers overlooked them in the early 2000s as they began predicting all the protein-coding regions in the newly sequenced human genome; they used a minimum length cutoff of 100 codons for gene assignment to decrease the rate of false positives. But over the past 10 years, developments in genomics and proteomics methods have revealed hundreds to thousands of smORFs.
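To make the effect of that length cutoff concrete, here is a minimal sketch of a forward-strand ORF scan with an adjustable minimum codon count. It is an illustration only, not the pipeline used by any annotation project: the function name `find_orfs`, the `min_codons` parameter, the restriction to ATG start codons, and the toy sequence are all assumptions made for this example.

```python
# Standard stop codons; ATG is assumed to be the only start codon (a simplification).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return (start, end, n_codons) for forward-strand ORFs of at least min_codons.

    n_codons counts the start codon but not the stop codon.
    Coordinates are 0-based, with end exclusive and including the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None
    return orfs


# Hypothetical toy smORF: ATG + 29 alanine codons + stop = 30 codons.
toy = "ATG" + "GCT" * 29 + "TAA"

print(find_orfs(toy))                 # [] : discarded by the 100-codon cutoff
print(find_orfs(toy, min_codons=10))  # [(0, 93, 30)] : found once the cutoff is relaxed
```

The same 30-codon reading frame is invisible at the default threshold and appears as soon as the threshold drops, which is the trade-off described above: a high cutoff suppresses false positives at the cost of missing every smORF.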
Nailing down the functions of the microproteins they encode has been challenging. For one, newly discovered protein-coding smORFs often fail to produce detectable amounts of protein when cloned and expressed in cultured human cells, likely because their size makes the tiny proteins unstable. Moreover, most of these novel microproteins aren’t homologous to any known protein in any organism, making it extremely difficult to develop and test hypotheses about their functions.
Despite the challenges, researchers are making progress in characterizing the functions of the putative microproteins that have already been found in the genetic code. New techniques for identifying protein-microprotein interactions reveal how the tiny molecules function in the context of larger protein complexes, and where those complexes tend to be found in the cell. In the last five years, our group and others have uncovered plausible roles for microproteins in development, metabolism, muscle function, DNA repair, and mitochondrial activity, and some may even have links to disease. We are only beginning to scratch the surface of this field. Hundreds more microproteins have been detected across human cell lines and tissues, and thousands are predicted to exist across species on the basis of genomic data.
Finding Microprotein Functions
To determine how microproteins function in the cell, researchers interrogate their interactions with other proteins. One way to do this is to genetically tag the microprotein of interest with a peptide called FLAG and then isolate it and its interacting partners using antibodies that bind to the tag. Alternatively, researchers can use a tag called APEX2 that labels nearby interacting proteins. In both cases, the isolated protein complexes are analyzed using proteomic and biochemical methods, and the results can shed light on the functions of the microproteins themselves.
all illustrations by Lucy Reading-Ikkanda
The discovery of microproteins
Once it was clear that genomes contain protein-coding regions of fewer than 100 codons, scientists rethought existing techniques and invented new ones for discovering smORFs and the microproteins they encode. In a 2011 informatics analysis in fruit flies, using conservation and protein-coding potential across two fly species as parameters, Juan Pablo Couso of the University of Sussex and colleagues predicted some 400 to 4,500 bona fide smORFs.
A few years later, the field adopted ribosome profiling techniques to identify “ribosome footprints,” stretches of mRNA found in association with ribosomes, presumably in the act of translation. In 2014, Ariel Bazzini and colleagues at Yale University used this approach to find 190 actively translated smORFs in the zebrafish and human genomes. In addition, they identified 311 smORFs in the upstream 5′ untranslated region (UTR) and 93 in the downstream 3′ UTR of annotated genes. That same year, the Saghatelian group (then at Harvard, but soon to move to Salk, where I joined it) searched mass spectrometry proteomics data against a custom database, built from RNA-seq data, of all possible translation products in all reading frames, and discovered 237 microproteins across multiple human cell lines and tissues.
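The idea of a database of “all possible translation products in all reading frames” can be sketched as a six-frame translation of each transcript. The code below is a simplified illustration under stated assumptions, not the Saghatelian group's actual workflow: the function names, the `min_len` parameter, and the toy transcript are hypothetical, and a real search would match mass spectra against the resulting peptides with dedicated proteomics software.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate from the first base; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_peptides(seq, min_len=8):
    """Yield (strand, frame, peptide) for stop-free stretches in all six frames."""
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            # Split the translated frame at stop codons and keep long-enough pieces.
            for peptide in translate(s[frame:]).split("*"):
                if len(peptide) >= min_len:
                    yield strand, frame, peptide


# Hypothetical toy transcript; real inputs would be assembled RNA-seq transcripts.
toy_transcript = "ATGAAAGCTTGGTAA"
candidates = list(six_frame_peptides(toy_transcript, min_len=4))
for strand, frame, peptide in candidates:
    print(strand, frame, peptide)
```

Because every frame on both strands is translated, a database built this way includes products from regions annotated as noncoding, which is what allows a peptide detected by mass spectrometry to be traced back to a previously unannotated smORF.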
These techniques have identified evidence of smORFs in diverse types of supposedly noncoding RNA, including introns of pre-mRNAs, long noncoding RNAs, and primary transcripts of microRNAs and ribosomal RNAs. The next step is to identify the microproteins they encode and understand their biological importance.
By identifying domains or motifs in microproteins that also appear in larger, well-characterized proteins, researchers can begin to glean clues to function. Genetic screens measuring differential expression of transcripts in different cell types or contexts can also point toward possible biological roles. Most directly, researchers can genetically tag the microprotein of interest, express it in human cells, and look at where it is localized within the cell and what other proteins it interacts with.
Traditional methods, such as tagged-protein immunoprecipitation, can be applied to uncover microprotein-protein interactions. In one commonly used assay, for example, a DNA sequence encoding a short peptide called FLAG is added to the gene encoding the microprotein of interest and expressed in cultured human cells.
The cells are then harvested and lysed, and the tagged microprotein is enriched using FLAG-targeting antibodies bound to beads in vitro. When the beads and the attached microprotein are immunoprecipitated, the interacting protein partners come along for the ride. Once isolated from the sample, the interacting proteins can be identified using proteomic techniques such as mass spectrometry and biochemical methods such as western blots.
Small but Mighty
As of early 2019, only about a dozen microproteins have been assigned even putative roles in humans or animal models.
Over the past couple of years, these approaches have revealed that, unlike peptide hormones, which carry signals from one cell to another, microproteins function within the cell by interacting with larger protein complexes. Using the FLAG method, for instance, Saghatelian’s group discovered that CYREN interacts with the heterodimeric protein complex in the nucleus that binds to loose ends of DNA double-strand breaks, suggesting a role for CYREN in DNA repair. Later, functional studies by Jan Karlseder’s group, also at Salk, showed that CYREN inhibits the non-homologous end joining repair pathway during certain phases of the cell cycle. (See illustration.)
But methods such as FLAG are nonspecific; in addition to proteins that truly interact with the microprotein of interest, the immunoprecipitates contain numerous housekeeping proteins. Researchers can experiment with different buffer conditions and other parameters to reduce contaminant proteins, but this runs the risk of losing target proteins interacting with weaker bonds.
Shortly after I joined the Saghatelian lab, I set out to address this limitation. In 2015, I started with a protein tag called ascorbate peroxidase 2 (APEX2), developed for proteomic mapping by Alice Ting’s group, then at MIT. When APEX2-tagged proteins are exposed to biotin phenol and hydrogen peroxide in cultured cells, the enzyme generates biotin phenoxyl radicals that label nearby proteins with biotin. Because the half-life of the active biotin phenoxyl radical is less than 1 millisecond, the labeling radius is limited to roughly 20 nanometers, meaning that only proteins in immediate proximity to the APEX2-tagged molecules acquire the biotin label. The cells are then lysed, and biotinylated proteins are enriched using streptavidin beads and identified by proteomic and biochemical methods.
My colleagues and I applied the APEX2 proximity labeling method to identify interacting partners of microproteins expressed in human cells. One of the microproteins we targeted was the mitochondrial elongation factor 1 microprotein (MIEF1-MP), which had been reported in earlier studies to regulate mitochondrial dynamics. Our experiments revealed MIEF1-MP’s interaction with the mitochondrial ribosome (mitoribosome), and additional cryogenic electron microscopy studies of the human mitoribosome suggested that MIEF1-MP may be required for mitoribosome assembly. Indeed, when we measured the rate of protein synthesis in the mitochondria, we found that the loss of MIEF1-MP decreased the rate of translation. Elevated MIEF1-MP levels, on the other hand, resulted in increased translation.
The development and application of these microprotein-protein interaction detection technologies provide clues to the functions of microproteins. Early research hints at a diverse functional repertoire that should prompt scientists to think about how these tiny proteins might be exploited.
Microproteins at work
One application that comes to mind is biomedicine. Already, microproteins have been fingered for diverse cellular and physiological functions, biological processes that, if things go wrong, can lead to disease. Defects in DNA repair are a common theme in cancer, for example, while problems in mitochondrial protein synthesis are a leading cause of metabolic and developmental disorders.
A team of Novartis researchers, with whom our group had collaborated on other microprotein projects, published a study on a microprotein called Minion (Microprotein INducer of fusION) that may one day be exploited for targeted drug delivery. Srihari and Srinath Sampath’s group discovered the microprotein in regenerating mouse muscle after an injury. The production of Minion peaks three to four days after injury, similar to the expression profile of a protein called Myomaker that is known to control muscle cell fusion.
Later, functional studies showed that Minion, along with Myomaker, allows cells to fuse and form multinucleated fibers that are capable of contracting. A lack of Minion disables skeletal muscles, including the diaphragm, resulting in perinatal death in mice. This insight into the Minion-Myomaker system led the researchers to envision harnessing it to target fusing cells in cancer or other contexts.
Another newly discovered microprotein with a link to disease is CASIMO1 (Cancer Associated Small Integral Membrane Open reading frame 1), an 83-amino acid microprotein characterized by Sven Diederichs’s group at the German Cancer Research Center in Heidelberg. In an expression profiling study designed to identify genes involved in hormone receptor–positive primary breast cancer, an RNA transcript initially annotated as a putative noncoding RNA showed sixfold higher levels in breast tumors than in normal tissue.
The researchers later found that the loss of CASIMO1 disrupted the regulation of the actin cytoskeleton, impaired migration ability, reduced cells’ proliferation rate, and stalled the cell cycle in the G0/G1 phase—changes that would be expected to limit cancerous growth. The microprotein also appears to modulate cellular lipid levels and signal transduction.
These findings emphasize the importance of further investigations to discover and characterize novel smORFs and the microproteins they encode. Biochemical studies show that microproteins use short sequences of just two to four amino acids to interact with larger protein complexes and thereby regulate biology. Such interactions are amenable to small-molecule inhibition, and, therefore, microprotein-protein interactions could reveal new druggable targets. As researchers continue to probe the functions of microproteins and better understand their mechanisms of action in these various roles and disease conditions, this work will enable the development of new therapeutics.
Annie Rathore is a life science consultant at Deloitte Management Consulting. She graduated from the Salk Institute for Biological Studies in La Jolla, California, in 2018.
- M.C. Frith et al., “The abundance of short proteins in the mammalian proteome,” PLOS Genet, 2:e52, 2006.
- E. Ladoukakis et al., “Hundreds of putatively functional small open reading frames in Drosophila,” Genome Biol, 12:R118, 2011.
- A.A. Bazzini et al., “Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation,” EMBO J, 33:981–93, 2014.
- J. Ma et al., “Discovery of human sORF-encoded polypeptides (SEPs) in cell lines and tissue,” J Proteome Res, 13:1757–65, 2014.
- S.A. Slavoff et al., “Peptidomic discovery of short open reading frame-encoded peptides in human cells,” Nat Chem Biol, 9:59–64, 2013.
- N. Arnoult et al., “Regulation of DNA repair pathway choice in S and G2 phases by the NHEJ inhibitor CYREN,” Nature, 549:548–52, 2017.
- Q. Chu et al., “Identification of microprotein–protein interactions via APEX tagging,” Biochemistry, 56:3299–306, 2017.
- A. Rathore et al., “MIEF1 microprotein regulates mitochondrial translation,” Biochemistry, 57:5564–75, 2018.
- Q. Zhang et al., “The microprotein Minion controls cell fusion and muscle formation,” Nat Commun, 8:15664, 2017.