How Statistics Weakened mRNA’s Predictive Power

Transcript abundance isn’t a reliable indicator of protein quantity, contrary to what some studies have suggested.

May 22, 2017
Ruth Williams

Using the quantity of messenger RNA (mRNA) as a proxy for protein abundance could be risky, concludes a paper published in PLOS Computational Biology today (May 22). The authors examined data from previous proteomic studies, and their new statistical calculations revealed that while mRNA levels can be a useful guide to protein levels when comparing different genes, relying on mRNA to evaluate the same gene in different tissues can be rather misleading.

“There has been controversy over the question of how well mRNA levels can predict protein levels,” said cell and molecular biologist Marko Jovanovic of Columbia University, who was not involved in the study. “A few papers claim that their predictive power is very limited, others say they predict it very well. . . . The problem [is] that it depends what you are looking at—are you interested in the expression differences of different genes within the same tissues, or of the same gene in different tissues? Here, [the authors] have nicely separated these two, which is crucially important.”

In 2014, two papers published in Nature provided the first draft maps of the human proteome—each detailing the abundance and distribution of the assorted proteins throughout the body’s tissues—as determined by mass spectrometry. Amongst the multitude of data in one of the papers was a practical nugget of information: from looking at the vast array of proteins and their corresponding mRNAs in the various tissues, the authors had determined that mRNA levels are good surrogates for protein levels. If true, this would be really handy, said Jovanovic, “because it’s much easier to measure RNA levels.”

See “Human Proteome Mapped”

Most biologists would likely have nodded at this conclusion and read on, but to bioengineer Nikolai Slavov of Northeastern University in Boston, the paper’s claim represented a statistical “elephant in the room,” he said. “It was clear to me that this was not consistent with their data from the moment I saw it, and that’s why we decided to reanalyze the data.”

The problem, Slavov said, was that, in the original study, changes in mRNA and protein levels between different genes, which can vary by 1,000-fold or more, had been grouped together with expression differences for individual genes between tissues, which are “usually within a 10-fold range.” Analyzing the data en masse in this way had created “a classical Simpson’s paradox,” said Slavov—a statistical phenomenon whereby apparent trends in individual sets of data disappear or reverse when the sets are pooled.

Sure enough, when Slavov reanalyzed the raw data files from the two Nature studies, together with those from a more recent proteome study, he found that when individual genes were compared across different tissues, the mRNA data barely predicted protein levels at all.
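The pooling effect Slavov describes can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' analysis: the gene and tissue counts, the spread values, and the function names are all hypothetical, and the tissue-level deviations of mRNA and protein are made fully independent, which is a deliberately extreme stand-in for post-transcriptional regulation and measurement noise. Abundances are on a log10 scale, so a large spread between genes and a small spread between tissues roughly mimic the 1,000-fold and 10-fold ranges mentioned above.

```python
import numpy as np


def simulate(n_genes=500, n_tissues=12, seed=0):
    """Return toy log10 mRNA and protein matrices of shape (genes, tissues)."""
    rng = np.random.default_rng(seed)
    # Assumption: each gene has one mean level shared by mRNA and protein.
    # A standard deviation of 1 on a log10 scale spans orders of magnitude.
    gene_mean = rng.normal(0.0, 1.0, size=(n_genes, 1))
    # Assumption: tissue-specific deviations are small and independent
    # between mRNA and protein (an extreme, illustrative case).
    mrna = gene_mean + rng.normal(0.0, 0.2, size=(n_genes, n_tissues))
    protein = gene_mean + rng.normal(0.0, 0.2, size=(n_genes, n_tissues))
    return mrna, protein


def pooled_correlation(mrna, protein):
    """Pearson correlation over all gene-tissue pairs lumped together."""
    return float(np.corrcoef(mrna.ravel(), protein.ravel())[0, 1])


def within_gene_correlations(mrna, protein):
    """Pearson correlation across tissues, computed separately per gene."""
    m = mrna - mrna.mean(axis=1, keepdims=True)
    p = protein - protein.mean(axis=1, keepdims=True)
    numerator = (m * p).sum(axis=1)
    denominator = np.sqrt((m ** 2).sum(axis=1) * (p ** 2).sum(axis=1))
    return numerator / denominator


mrna, protein = simulate()
pooled = pooled_correlation(mrna, protein)
within = within_gene_correlations(mrna, protein)
print(f"pooled correlation: {pooled:.2f}")
print(f"median within-gene correlation: {np.median(within):.2f}")
```

In this sketch the pooled correlation comes out high because it is dominated by differences between genes, while the per-gene correlations across tissues scatter around zero. Real data would sit somewhere between these extremes, depending on how much of the tissue-level variation is shared between mRNA and protein.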

Noise within the data was responsible for some of this unpredictability, Slavov said, but there were enough reproducible results in different data sets to suggest that the rest of the unpredictability was due to tissue-specific post-transcriptional regulation. In short, Slavov explained, differences in things like mRNA degradation, protein degradation, and protein secretion between cell types render the mRNA level of an individual gene a very poor predictor of its protein abundance across tissues.

The paper is not implying that one should never trust mRNA levels as an indicator of protein expression, said systems biologist Michael Springer of Harvard Medical School, who was not involved in the study. Of mRNA and protein levels being correlated, he said, “grossly, that is true. Highly expressed mRNAs lead to highly expressed proteins, lowly expressed mRNAs lead to lowly expressed proteins.”

“But,” he went on, “if the differences are more subtle, as they often are for the same gene in different tissues or under different conditions, then you should be careful about using mRNA as a readout of protein level. . . . It really depends on the question one is asking.”

The Scientist reached out to Bernhard Küster, one of the authors of the 2014 Nature study, but did not get a response.

A. Franks et al., “Post-transcriptional regulation across human tissues,” PLOS Computational Biology, 13:e1005535, 2017.

