“There has been controversy over the question of how well mRNA levels can predict protein levels,” said cell and molecular biologist Marko Jovanovic of Columbia University, who was not involved in the study. “A few papers claim that their predictive power is very limited, others say they predict it very well. . . . The problem [is] that it depends what you are looking at: are you interested in the expression differences of different genes within the same tissues, or of the same gene in different tissues? Here, [the authors] have nicely separated these two, which is crucially important.”
In 2014, two papers published in Nature used mass spectrometry to produce the first draft maps of the human proteome, each detailing the abundance and distribution of the assorted proteins throughout the body’s tissues. Among the multitude of data in one of the papers was a practical nugget of information: after comparing the vast array of proteins with their corresponding mRNAs across the various tissues, the authors concluded that mRNA levels are good surrogates for protein levels. If true, this would be really handy, said Jovanovic, “because it’s much easier to measure RNA levels.”
Most biologists would likely have nodded at this conclusion and read on, but to bioengineer Nikolai Slavov of Northeastern University in Boston, the paper’s claim represented a statistical “elephant in the room,” he said. “It was clear to me that this was not consistent with their data from the moment I saw it, and that’s why we decided to reanalyze the data.”
The problem, Slavov said, was that the original study had grouped differences in mRNA and protein levels between different genes, which can vary by 1,000-fold or more, together with expression differences for individual genes between tissues, which are “usually within a 10-fold range.” Analyzing the data en masse in this way had created “a classical Simpson’s paradox,” said Slavov. In a Simpson’s paradox, a trend seen in pooled data can differ from, or even reverse, the trends within the individual groups that make up the pool.
Sure enough, when Slavov reanalyzed the raw data files from the two Nature studies together with those from a more recent proteome study, he found that, when individual genes were compared across different tissues, mRNA levels barely predicted protein levels at all.
Noise within the data was responsible for some of this unpredictability, Slavov said, but enough of the results reproduced across different data sets to suggest that the rest was due to tissue-specific post-transcriptional regulation. In short, Slavov explained, differences between cell types in processes such as mRNA degradation, protein degradation, and protein secretion render the mRNA level of an individual gene a very poor predictor of its protein abundance across tissues.
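One way to see how such regulation can decouple the two measurements is a simple textbook steady-state model. This is an assumption of the sketch, not the model used in the study: protein is made at a rate proportional to its mRNA and removed (by degradation or secretion) at a first-order rate, so at steady state protein = translation_rate × mRNA / loss_rate. The function name, tissue names, and rates below are hypothetical.

```python
def steady_state_protein(mrna, translation_rate, loss_rate):
    """Steady state of dP/dt = translation_rate * mrna - loss_rate * P.

    In this toy model, loss_rate lumps protein degradation and secretion
    into a single first-order removal rate (an assumption).
    """
    return translation_rate * mrna / loss_rate


# Hypothetical rates for one gene in two tissues (arbitrary units).
tissues = {
    "tissue_A": {"mrna": 20.0, "translation_rate": 1.0, "loss_rate": 0.4},
    "tissue_B": {"mrna": 10.0, "translation_rate": 1.0, "loss_rate": 0.1},
}

protein = {name: steady_state_protein(**params) for name, params in tissues.items()}
for name, p in protein.items():
    print(f"{name}: mRNA={tissues[name]['mrna']:.0f}, protein={p:.0f}")
```

Here tissue_A has twice the mRNA of tissue_B (20 versus 10), but because the protein is removed four times faster there, it ends up with half the protein (50 versus 100). A tissue-specific difference in removal alone is enough to reverse the mRNA-based prediction.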
The paper is not implying that one should never trust mRNA levels as an indicator of protein expression, said systems biologist Michael Springer of Harvard Medical School, who was not involved in the study. Of the claim that mRNA and protein levels are correlated, he said, “Grossly, that is true. Highly expressed mRNAs lead to highly expressed proteins, lowly expressed mRNAs lead to lowly expressed proteins.”
“But,” he went on, “if the differences are more subtle, as they often are for the same gene in different tissues or under different conditions, then you should be careful about using mRNA as a readout of protein level. . . . It really depends on the question one is asking.”
The Scientist reached out to Bernhard Küster, one of the authors of the 2014 Nature study, but did not get a response.
A. Franks et al., “Post-transcriptional regulation across human tissues,” PLOS Computational Biology, 13:e1005535, 2017.