As a PhD student at the University of Münster in the early 2000s, Sven Diederichs would bike three kilometers each morning to a compact lab in a brick building on the hospital campus. There, he’d defrost small vials of lung cancer specimens from the lab’s deep freezer and extract a single microgram of RNA from each tube. Some of the specimens came from people whose cancers eventually metastasized, and some were from people whose cancers never spread. Diederichs pooled the RNA samples from each group and analyzed the differences in gene expression between the two.
One gene quickly stood out because its RNA was three times more abundant in eventually metastasizing tumor samples than in non-metastasizing ones. Diederichs, postdoc Ping Ji, and their colleagues dubbed this RNA transcript, which was more than 8,000 nucleotides long, MALAT-1 (metastasis-associated in lung adenocarcinoma transcript 1, also abbreviated as MALAT1). High expression of the RNA predicted a poor prognosis, the team found. But, Diederichs notes, “we didn’t find a remarkable open reading frame, and we didn’t find any protein expressed from it. We decided that it has to be something noncoding.”
The scientific community was resistant to the idea that a long noncoding RNA (lncRNA) was linked to increased cancer metastasis, says Diederichs. Biology textbooks have long preached the primacy of proteins, with RNAs the intermediary between code and final product, and scientists considered most noncoding RNAs to be useless junk, he explains. “Not many people believed that long noncoding RNAs would do something. They were still referred to as transcriptional noise. We had reviewer comments that said, ‘If it doesn’t encode for a protein, it doesn’t do anything. So why should we care?’”
Diederichs says it took multiple attempts before his study was accepted for publication in Oncogene, where it appeared in 2003. Some two decades later, that paper has been cited by more than 1,500 others, and it’s clear that the team was on to something.
Indeed, advances in technology have revealed that much of the noncoding genome gives rise to lncRNAs and other RNA molecules that affect all manner of cellular processes. “The vast majority of the human genome gets transcribed into RNA, between 70 and 90 percent,” says Diederichs. And as of 2022, more than 100,000 human lncRNAs have been discovered. Although no one knows how many of these are functional, at least 30, and perhaps more than 100, have a strong link to cancer, while others are associated with conditions such as schizophrenia or heart disease, as well as with normal physiological functions, such as cell growth and metabolism.
How these lncRNAs exert their influence is in many cases still an open question, but scientists have begun to uncover mechanisms that confirm that the RNA molecules themselves interact with DNA, other RNA, and proteins, regulating proliferation, growth suppression, angiogenesis, and cellular immortality. In addition, some purported lncRNAs may not be noncoding after all: Hidden within many of them live codes for peptides called microproteins that are proving to have powerful physiological functions as well as important roles in the development and progression of cancer.
Long noncoding RNAs (lncRNAs) are RNAs longer than 200 nucleotides that do not contain any open reading frames greater than 300 nucleotides (100 codons). The DNA sequences they’re transcribed from can be found between genes (intergenic), within introns of known genes (intronic), or in the antisense strand of DNA, among other places.
Microproteins/micropeptides are proteins translated from a short open reading frame (sORF) of less than or equal to 300 nucleotides, producing a protein of up to 100 amino acids. These sORFs are often found within lncRNAs.
Noncoding RNAs can have various functions
Geneticist Howard Chang’s first foray into lncRNAs came shortly after he started his lab at Stanford University in the mid-2000s. He decided that, instead of looking at the 2 percent of the human genome that encodes proteins, he would focus on the noncoding elements. “We hybridized RNA samples to arrays and, to our surprise, in addition to the known mRNA coding sequences, we also got all kinds of signal,” he recalls. “So, either we had some kind of terrible background [noise] in our array technology, or we have all these new RNAs that we should try to figure out how to characterize.” (Chang helped found and advises several companies working on RNA therapeutics, although he’s not directly involved in lncRNA commercialization.)
Noncoding RNAs (ncRNAs) are defined as RNAs that do not contain any open reading frames greater than 300 nucleotides (enough to code for 100 amino acids). Researchers once assumed “that proteins smaller than 100 amino acids are likely not functional,” explains geneticist Jin Chen of Altos Labs, a California- and UK-based biotech company focused on cellular rejuvenation programming. This definition of ncRNAs was intended to capture any RNA that didn’t encode a functional protein. Any ncRNA longer than 200 nucleotides was dubbed a long noncoding RNA.
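The length-based definition above amounts to a simple rule, which the following Python sketch illustrates. It is an illustration only, not how any annotation pipeline actually works: the function names `longest_orf_nt` and `classify_transcript` are hypothetical, the scan covers only the forward strand, it considers only ATG-initiated ORFs that end in a stop codon, and it measures ORF length without the stop codon. All of these are simplifying assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_nt(seq):
    """Length in nucleotides of the longest complete forward-strand ORF.

    Assumptions (for illustration): an ORF starts at ATG, ends at the first
    in-frame stop codon, and its length excludes the stop codon.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i - start)  # close the ORF at the stop
                start = None
    return best


def classify_transcript(seq):
    """Apply the thresholds from the text: ORF > 300 nt, transcript > 200 nt."""
    if longest_orf_nt(seq) > 300:
        return "coding"
    return "lncRNA" if len(seq) > 200 else "short ncRNA"


# Made-up transcript: a 153-nt ORF (51 codons) inside a 256-nt RNA.
example = "ATG" + "GCC" * 50 + "TAA" + "C" * 100
print(classify_transcript(example))  # lncRNA
```

Under this rule, any transcript whose longest ORF is 300 nucleotides or shorter is labeled noncoding, no matter whether that short ORF is actually translated.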
As Chang and others turned their attention to these molecules in the late 1990s and early 2000s, examples of lncRNAs’ physiological functions began to accumulate. A famous early example is Xist (X-inactive specific transcript). Researchers originally thought that it came from a protein-coding gene, but the protein turned out to be an artifact. Rather, the RNA transcript itself coats and inactivates one of the two X chromosomes in female embryos. Jeannie Lee, a geneticist at Harvard Medical School who advises RNA-targeting drug company Skyhawk Therapeutics, notes the importance of the RNA’s structure for its function. “It’s a very large RNA; each piece of the RNA will recruit a different cluster of proteins. So, the whole RNA is actually wrapped up in a large particle that consists of probably 100 or more proteins. The transcript scaffolds an entire family of proteins that will be required to silence the X chromosome.”
John Mattick, a molecular biologist at the University of New South Wales in Australia who has worked on lncRNAs since 2000 and is on the scientific advisory boards of RNA therapeutics companies US-based NextRNA Therapeutics and UK-based e-therapeutics, says that this feature is common in lncRNAs. These RNAs can have complex secondary structures that contain both “a sequence that will target a place in the DNA, but also another sequence that binds the protein that brings it to that location,” he says.
After discovering MALAT1’s involvement in metastasis, Diederichs wanted to know whether the RNA played a functional role in disease, or if it was just a marker of cancer’s spread. He started his own lab at the German Cancer Research Center and at the University of Heidelberg, and set out to determine just what MALAT1 was doing in the body. His team began by silencing the gene in human lung tumor cells and got its first clue. “There was this moment when we first saw that the cells that lacked MALAT1 did not migrate,” he says. “That was the moment where I was convinced that this was functionally relevant.” The team subsequently found that MALAT1 regulates expression of a set of metastasis-associated genes, and later studies showed that the RNA localizes to cellular structures called nuclear speckles.
Many other groups have followed suit, finding numerous lncRNAs whose expression levels correlate with features of different cancers. Chang, for example, found that a lncRNA his team dubbed HOTAIR (HOX antisense intergenic RNA) is a scaffold that recognizes various targets and silences Hox genes involved in body patterning. His and other teams have identified HOTAIR as an oncogenic molecule that affects proliferation, apoptosis, invasion, aggression, and metastasis of cells. The lncRNA tethers two proteins—polycomb repressive complex 2 (PRC2) on one of its ends and lysine-specific demethylase 1 (LSD1) on the other—that inhibit expression of tumor and metastasis suppressor genes, among others. HOTAIR has so far been linked to lung, breast, pancreatic, and other cancers.
“There’s literally hundreds of papers every year coming out with new long noncoding RNAs,” says Diederichs. “Some are very cancer-specific . . . some with a mechanism, some without a mechanism. So it’s really a jungle out there of long noncoding RNAs these days.”
Some lncRNAs aren’t noncoding at all
A few years ago, the idea that perhaps many of these lncRNAs actually encode proteins—and that the arbitrary threshold of 100 amino acids to define a protein was wrong—began rumbling through the research community. Last year alone, researchers published three major review papers on the resulting microproteins, extolling the mysteries of this dark proteome. “Recently we found out that a lot of these so-called long noncoding RNAs are not really noncoding; they can encode these small microproteins and these small microproteins really play a very diverse role,” says Chen. “Because [of] this initial oversight and assumption, there’s this whole proteome that’s basically missing” from scientific knowledge.
In fact, researchers have recognized that small, functional proteins exist, but these were thought to always be processed from larger proteins, such as is the case with peptide hormones and neuropeptides. Insulin, for example, is only 51 amino acids long, but it is cleaved from the longer proinsulin polypeptide chain (which is cleaved from the even longer preproinsulin). Microproteins, on the other hand, are born small—they are directly translated from short open reading frames (sORFs) that were overlooked as a result of the 100-amino-acid threshold.
To uncover these formerly ignored microproteins in the vast human genome, exploratory studies use computational approaches to search DNA for evolutionarily conserved sequences with few or no mutations that would change the coded amino acid—a feature that would suggest that the conserved sequence represents a true protein. Researchers also use ribosome profiling, in which RNAs actively translated by ribosomes are extracted and sequenced, as well as mass spectrometry to directly quantify the resulting protein. Combining these methods, scientists can confidently assess whether a sORF actually codes for a microprotein. Further experimentation can help determine what the microprotein does.
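The conservation signal described above can be illustrated with a toy comparison of two aligned copies of an ORF, for instance the same sORF from two species. This is a minimal sketch and not any group's actual pipeline: `codon_changes` is a hypothetical helper, the two sequences are invented, and real analyses rely on multi-species alignments and statistical models rather than a raw count. The idea is only that differences that leave the encoded amino acid unchanged (synonymous) are what one expects to dominate if the sequence encodes a protein under selection.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_changes(ref, other):
    """Count (synonymous, nonsynonymous) codon differences.

    Assumes two gap-free, in-frame DNA sequences of equal length.
    """
    if len(ref) != len(other) or len(ref) % 3 != 0:
        raise ValueError("sequences must be aligned, gap-free, and in frame")
    synonymous = nonsynonymous = 0
    for i in range(0, len(ref), 3):
        a = ref[i:i + 3].upper()
        b = other[i:i + 3].upper()
        if a == b:
            continue  # identical codon, nothing to count
        if CODON_TABLE[a] == CODON_TABLE[b]:
            synonymous += 1  # codon differs, amino acid is the same
        else:
            nonsynonymous += 1  # codon differs and changes the amino acid
    return synonymous, nonsynonymous


# Invented example: GCC->GCT keeps alanine; AAA->AGA swaps lysine for arginine.
species_a = "ATGGCCCTGAAA"
species_b = "ATGGCTCTGAGA"
print(codon_changes(species_a, species_b))  # (1, 1)
```

A candidate sORF with many synonymous and few nonsynonymous differences across species would be a stronger candidate for follow-up by ribosome profiling and mass spectrometry.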
As with lncRNAs, researchers are discovering functional roles for microproteins in health and disease. Studies have shown that the peptides act as signaling molecules, regulators of enzymes, ligands for receptors, and critical transmembrane components. And now, scientists are finding more and more microproteins that appear to stoke cancer in humans.
In 2018, as Diederichs and graduate student Maria Polycarpou-Schwartz were searching for lncRNAs and mRNAs involved in breast cancer, their team homed in on one lncRNA that was expressed at levels six times higher in breast tumors than in normal tissue. Scanning a protein sequence database for the region, the researchers saw that the lncRNA contained a sORF that was highly conserved in mice and rats. In humans, it appears to be translated into a small protein of 83 amino acids. Knocking down the gene, dubbed CASIMO1 (cancer-associated small integral membrane ORF), in cultured human breast cancer cells caused actin cytoskeleton deregulation, such that the cells were unable to migrate and less likely to proliferate. The knockdown also suggested a role for CASIMO1 microprotein in cell cycle progression.
“For a long time, we thought that it’s noncoding because many of the prediction programs told us so and the open reading frame was really short and so on,” says Diederichs. So the team was surprised to eventually find that “if we take out the start codon of the longest ORF, we suddenly lost the function.”
Diederichs and his colleagues subsequently identified 12 proteins that appeared to interact with CASIMO1, including an enzyme involved in cholesterol biosynthesis called squalene epoxidase (SQLE), which also happens to be oncogenic. Their work showed that CASIMO1 overexpression in cultured breast cancer cells led to increased SQLE protein levels, while CASIMO1 knockdown led to reduced SQLE. CASIMO1 knockdown also reduced levels of active extracellular signal-regulated kinase, part of the MAPK pathway of cell proliferation that is often involved in cancer. This work, published in 2018, crowned CASIMO1 as the first microprotein discovered to have oncogenic activity.
Discoveries of additional oncogenic microproteins followed. Last year, for example, researchers in China found that a lncRNA highly expressed in certain chemotherapy-resistant breast tumors encodes a 44-amino-acid-long micropeptide, which they named PACMP (PAR-amplifying and CtIP-maintaining micropeptide). PACMP modulates the DNA damage response, thereby regulating cancer progression and drug resistance. Depleting the peptide in cultured cells reduced tumor growth and sensitized tumor cells to several chemotherapies.
lncRNAs and microproteins in cancer
Several long noncoding RNAs and microproteins have been implicated in cancer. While some appear to stoke the development of disease, others keep cancer’s progression in check. A selection is described below.
lncRNA or microprotein and its role, grouped by cancer hallmark
Sustained proliferative signaling
Activates the miR-126/CXCR4 axis and downstream signaling pathways; increases lactate production, glucose uptake, and ATP production, among other actions
CASIMO1: Increases phosphorylation of ERK, part of the MAPK pathway of cell proliferation
PINT: Regulated by p53; its overexpression inhibits proliferation of tumor cells (see its microprotein below)
Evasion of growth suppressors
PINT87aa: Transcribed from the PINT lncRNA, it interacts with PAF1c, inhibits transcriptional elongation of multiple oncogenes, and suppresses tumorigenic capabilities of glioblastoma cell lines
Resistance to cell death
CASC9: Loss of CASC9 leads to apoptosis, likely through the AKT signaling pathway and possibly other pathways
Activation of invasion and metastasis
MALAT1: Regulates expression of a set of metastasis-associated genes
Promotes cell migration and invasion by increasing EMT; regulates insulin-like growth factor-binding protein 2 expression, among other functions
CIP2A-BP: Competes with PP2A for CIP2A binding, leading to inhibition of the oncogenic PI3K/AKT/NFkB pathway
MIAC: Interacts with aquaporin 2 to inhibit the actin cytoskeleton, suppressing tumor growth and metastasis
Deregulating cellular energetics
HOXB-AS3: Helps splice the PKM enzyme into a form that supports normal cell metabolism, inhibiting cancer
Genome instability and mutation
PACMP: Modulates DNA damage repair
Promotes a proinflammatory immune response
When microproteins put the brakes on cancer
Some microproteins slow cancer progression. In 2017, researchers discovered that a lncRNA called HOXB-AS3 encodes a conserved 53-amino-acid peptide that suppresses colon cancer growth. It works via the enzyme pyruvate kinase M (PKM), which comes in two isoforms. The team showed that when the HOXB-AS3 micropeptide is present, it facilitates the splicing of the first isoform, PKM1, which supports normal cell metabolism and growth. In lab experiments, the team further found that when HOXB-AS3 is absent, cells make PKM2, triggering aerobic glycolysis, which allows the rapid proliferation of cancer cells. Previous studies have shown that PKM2 production confers selective advantage to tumor cells and is seen in most cancers. By examining tissues from patients with colon cancer, the scientists saw that individuals with low levels of HOXB-AS3 had more advanced cancers and poorer prognoses.
Similar cancer-blocking microproteins have been found in breast cancer. In 2020, researchers showed that downregulation of a microprotein called CIP2A-BP was linked to increased metastasis and reduced survival in patients with triple-negative breast cancer. In a mouse model of breast cancer, injecting the microprotein reduced lung metastases and improved survival. The team also showed that CIP2A-BP binds to the oncoprotein CIP2A, inhibiting the PI3K/AKT/NFkB pathway and thereby the migration and invasion of the breast cancer cells.
That same year, another group of researchers studying microproteins from various noncoding RNAs discovered a microprotein they called MIAC (micropeptide inhibiting actin cytoskeleton) in head and neck squamous cell carcinoma. Their study showed that when MIAC levels are low, survival tends to be poor in patients with these cancers. The team further analyzed the RNA sequences of 9,657 other human tissues, including 32 different cancers, and found that MIAC is related to the progression of five other types of tumors. Diving deeper into possible mechanisms, the researchers found that MIAC interacts with aquaporin 2, a protein that normally functions in the kidney. This interaction inhibits the actin cytoskeleton and eventually suppresses tumor growth and metastasis. “MIAC . . . differential expression is significantly related to patient prognosis and the tumor state,” says lead investigator Hanmei Xu at China Pharmaceutical University in Nanjing. “So, the possibility of [its] application in cancer diagnosis and treatment [is worth] exploring.”
Although the field is young, the list of microproteins linked to cancer goes on and on. “These microproteins are very important. . . . There’s potentially tens of thousands of them,” says Chen, although not all are likely to be cancer related. Many open questions remain. It’s not clear, for instance, how many are stable and functional. “There are so many unknowns. I think it’s a very exciting field to be in at the moment,” Xu agrees. “Micropeptide discovery has a great future,” she says. “According to recent -omics studies, many microproteins have not been characterized. [Genomes] contain a great quantity of undiscovered treasure.”
The field is already revealing new levels of complexity, too. For example, as Diederichs points out, some lncRNAs both have functions of their own and serve as recipes for microproteins. In 2013, researchers reported on a lncRNA called Pint that is regulated by the tumor suppressor protein p53 and encourages cell proliferation and survival. In 2018, researchers discovered that this lncRNA, in its circular form, contains an ORF that encodes an 87-amino-acid microprotein called PINT87aa, which tamps down tumorigenesis in glioblastoma cell lines in vitro and in vivo. The team found that mice injected with anaplastic astrocytoma cells that were lacking functional copies of the microprotein grew larger tumors than animals injected with unmodified astrocytoma cells.
“I think it’s less important whether it’s coding or noncoding,” Diederichs says. “Indeed, my firm belief is that many of the transcripts we are looking at will have coding functions as well as noncoding functions.”