The worldwide attention on the SARS-CoV-2 virus has highlighted the dearth of analytical methods for rapid, cost-effective identification and characterization of viral RNA. Many virus families, including pathogens such as coronaviruses, human retroviruses, and influenza viruses, are RNA viruses: their genetic material is RNA rather than DNA. When these viruses infect host cells, the viral RNA hijacks the infected cell’s molecular machinery and begins producing viral molecules, which eventually overwhelm the cell. New viral particles are then released to infect other cells in the body.

Almost 20 years ago, an RNA virus now known as SARS-CoV-1 spread through the human population, killing nearly 775 people with the severe acute respiratory syndrome (SARS) it caused. Since 2012, Middle East respiratory syndrome (MERS), another disease caused by an RNA coronavirus, has killed more than 850 people, approximately 35 percent of those infected. Scientists were alarmed by the severity of these infections, and genetic analysis gave them some idea of how these viruses infect cells. But because RNA-characterization techniques lag well behind DNA-focused methods, researchers’ understanding of the biology and processes associated with these infections is arguably still nascent.

RNA is far more heavily modified than DNA: after transcription, it is decorated with an assortment of chemical moieties on its nucleobases and ribose sugar. More than 140 distinct modifications have been detected in RNA to date. The greatest concentration is found on transfer RNA, which delivers amino acids to the ribosome during translation, but every RNA studied so far has contained some level of modification. More than 70 years of study have yielded some understanding of the biological roles RNA modifications play; knowledge of their roles in viral RNA, however, is scant. Analytically, investigation of viral RNA is hindered by a number of technical challenges, chief among them the inability to generate the actual sequence of a viral RNA with its modifications mapped to their proper positions.

For example, when the genetic sequence of SARS-CoV-2 was published in 2020, the authors reported 41 modification sites in the viral RNA. The identities of these modifications were listed as unknown because RNA sequencing (RNA-seq) cannot directly identify modifications. RNA-seq relies on an enzyme to produce a complementary DNA (cDNA) copy of the RNA, and in the process the enzyme “reading” the RNA can stop when it encounters a modification. These hard stops are exploited to infer the probable presence of a modification, but not its identity (methylation, acetylation, etc.).
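To make the idea of hard stops concrete, here is a minimal Python sketch of how stop signals might be turned into candidate modification sites. It assumes, hypothetically, that reads have already been aligned and summarized into two per-position lists: how many cDNA copies terminated at each RNA position and how many reached it. The function name `flag_rt_stops` and the thresholds `min_ratio` and `min_coverage` are illustrative choices, not part of any published pipeline.

```python
from typing import List


def flag_rt_stops(stop_counts: List[int], coverage: List[int],
                  min_ratio: float = 0.3, min_coverage: int = 20) -> List[int]:
    """Return RNA positions whose stop ratio suggests a possible modification.

    Hypothetical inputs:
      stop_counts[i] -- cDNA copies whose synthesis terminated at position i
      coverage[i]    -- cDNA copies that reached position i (including those stopping there)
    """
    if len(stop_counts) != len(coverage):
        raise ValueError("stop_counts and coverage must have the same length")

    flagged = []
    for pos, (stops, depth) in enumerate(zip(stop_counts, coverage)):
        if depth < min_coverage:
            continue  # too few reads to make any call at this position
        if stops / depth >= min_ratio:
            flagged.append(pos)  # presence is inferred; chemical identity is not
    return flagged
```

With stops `[0, 1, 0, 15, 0, 2]` and coverage `[40, 40, 39, 38, 23, 3]`, the call `flag_rt_stops(stops, coverage)` returns `[3]`; position 5 is skipped for low coverage. The output is only a list of positions: nothing in the signal says whether the blocking modification is a methyl group, an acetyl group, or something else.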

This missing information is crucial, especially for viral RNA, because in nonviral RNAs modifications have been shown to enhance structural stability or to act as determinants of RNA-protein binding, two attributes key to viral infection and replication. For SARS-CoV-2, two immediate questions arise: What are the chemical structures of the unknown modifications, and how do they contribute to the virus’s biology?

Since 1985, the gold standard for studying RNA modification has been mass spectrometry. The approach, developed in the lab of James McCloskey at the University of Utah, can reveal not only whether modifications are present but also what they are and where they sit in the sequence. The two biggest limitations of mass spectrometry for transcriptomics are the need for large amounts of sample and inefficient ionization. Viral RNA cannot be purified economically in the amounts necessary for full characterization. Furthermore, large molecules such as intact mRNA are difficult to analyze because of the nature of electrospray ionization: the larger the molecule, the harder it is to ionize fully and to keep it charged during the fragmentation process used to map each modification to its position in the sequence. Even so, mass spectrometry remains the only established way to sequence RNA directly, modifications included.
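A rough calculation shows why size matters for electrospray. RNA is typically analyzed as negative ions formed by losing protons, so an ion carrying z charges appears at m/z = (M - z × 1.00728)/z. The Python sketch below estimates the minimum number of charges needed to bring an RNA of a given length into a detectable m/z range. It assumes an average residue mass of about 321.4 Da (roughly the mean of the four nucleotide monophosphate residues) and a hypothetical upper m/z limit of 4,000; real instruments and real ions differ, and terminal groups and salt adducts are ignored.

```python
import math

PROTON_MASS = 1.00728     # Da
AVG_RESIDUE_MASS = 321.4  # Da, assumed mean nucleotide residue mass in RNA


def approx_rna_mass(length: int, avg_residue: float = AVG_RESIDUE_MASS) -> float:
    """Rough neutral mass of an RNA strand, ignoring terminal groups and adducts."""
    return length * avg_residue


def mz_negative(mass: float, z: int) -> float:
    """m/z of the [M - zH]^z- ion produced by removing z protons."""
    return (mass - z * PROTON_MASS) / z


def min_charge_for_window(mass: float, max_mz: float) -> int:
    """Smallest charge z that places [M - zH]^z- at or below max_mz."""
    # (M - z*P) / z <= max_mz  rearranges to  z >= M / (max_mz + P)
    return math.ceil(mass / (max_mz + PROTON_MASS))


if __name__ == "__main__":
    for n in (20, 1_000, 30_000):
        m = approx_rna_mass(n)
        z = min_charge_for_window(m, 4000.0)
        print(f"{n:>6} nt  ~{m:,.0f} Da  needs z >= {z}  (m/z {mz_negative(m, z):.1f})")
```

Under these assumptions, a 20-nucleotide oligonucleotide needs only 2 charges, whereas an RNA of about 30,000 nucleotides (roughly the length of the SARS-CoV-2 genome) needs at least 2410. Every one of those charges must be acquired in the source and retained through fragmentation, which illustrates why intact viral RNA is so hard to ionize fully and keep charged.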

Newer technologies such as nanopore sequencing promise to handle small samples and also to read and locate RNA modifications. Nanopore sequencing differs from standard RNA-seq in that the RNA strand is threaded through a ring-shaped protein pore. As the nucleotides pass through the pore, they alter an ionic current flowing across it, and that electrical signal is decoded into sequence. The RNA is read directly, without the need to generate cDNA. The limitation of nanopore sequencing is the need to train the system to recognize each RNA modification’s signature. This is a Herculean task, because each nucleotide’s electrical signature is affected by its neighboring nucleotides. The time and cost of generating a training library covering every modification in every combination of neighboring nucleotides are prohibitive.
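A back-of-the-envelope count shows how quickly that training burden grows. The Python sketch below assumes, for illustration only, that the pore’s signal at any instant depends on a window of k nucleotides (k = 5 is a commonly used approximation, but the real value depends on the pore chemistry). It counts the distinct windows that contain one given modified base in any position, with canonical bases (A, C, G, U) everywhere else. Windows holding two or more modifications are ignored, so the result is a lower bound. The function names are hypothetical.

```python
def contexts_per_modification(k: int) -> int:
    """Distinct k-mers containing one modified base plus k - 1 canonical neighbors."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # The modified base can occupy any of the k positions in the window,
    # and each of the other k - 1 positions can hold any of the 4 canonical bases.
    return k * 4 ** (k - 1)


def training_contexts(n_modifications: int, k: int) -> int:
    """Lower-bound count of single-modification contexts to learn (hypothetical model)."""
    return n_modifications * contexts_per_modification(k)


if __name__ == "__main__":
    # 140 is the lower bound on known RNA modifications cited in the text.
    for k in (5, 6):
        print(f"k={k}: {training_contexts(140, k):,} contexts")
```

Under these assumptions, 140 modifications with k = 5 already imply 179,200 distinct contexts, and k = 6 raises that to 860,160, each of which would ideally be represented in the training data.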

We are told the next pandemic is right around the corner. If so, another Space Race–type effort is needed, this time aimed not at reaching the moon but at developing methods that can more fully characterize RNA.

Robert Ross is Senior Product Applications Specialist at Thermo Fisher Scientific, working on development of liquid chromatography tandem mass spectrometry (LC-MS/MS) characterization of nucleic acids. Email him at robert.ross2@thermofisher.com.