Metabolomics researchers Gary Patti and Nathaniel Mahieu of Washington University in St. Louis report that out of about 25,000 compounds detected in E. coli by liquid chromatography/mass spectrometry (LC/MS), 90 percent were not unique metabolites. Rather, the same metabolite, fragmented or with chemical additions, is spotted multiple times, a phenomenon known as degeneracy. A second analysis, designed to weed out contaminants and artifacts in addition to degeneracy, found that just three percent of the observed compounds are bona fide, unique metabolites.
“This study confirms what I think a lot of people in the metabolomics world have known,” says University of Michigan endocrinologist Charles Burant, who was not involved in the work. “All these features that we see during mass spectroscopy, really, a lot of them are sort of junk.”
Scientists use metabolomics experiments to profile the small molecules (less than two kilodaltons in mass) present in a given group of cells and to compare metabolites present in healthy and diseased samples in the hopes of better understanding a disorder. In metabolic profiling using LC/MS, researchers first separate the small molecules from the macromolecules, which include genetic material and proteins. Next, in liquid chromatography, a column physically separates the extract’s various components. Then, a mass spectrometer weighs each compound in the extract by giving it an electrical charge and recording how it moves in response to magnetic or electric fields.
Because identifying the chemicals behind mass spectrometry signals is quite difficult, Patti says, often in systems biology experiments, scientists will compare LC/MS signals between different samples or patient groups without having identified the underlying compounds. “That approach is very dangerous,” Patti says, “because if it’s true that in at least some of these metabolomics experiments that a lot of what we’re detecting are artifacts, contaminants, and degeneracies, then you worry how much of those types of comparisons are being modeled on data that correspond to noise.” Hopefully, most studies perform validation analyses that would catch these mistakes, Patti says, but it would be more efficient to avoid them in the first place.
LC/MS experiments typically reveal thousands of “peaks,” or features, each representing a single compound, many of which researchers cannot identify and therefore categorize as “unknown metabolites.” However, as the authors write, not all of these unidentifiable compounds are novel metabolites. A compound might also be unidentifiable because it’s a contaminant, an artifact, or an adduct: one compound bound to a second, charged molecule.
“Currently, there is a substantial component of the metabolomics communities that equates the number of features to a number of metabolites. However, this is totally erroneous, and Gary Patti and his group have [been] working to resolve this issue,” Lloyd Sumner, director of the University of Missouri Metabolomics Center, tells The Scientist in an email. Patti and graduate student Mahieu set out to determine how many of the LC/MS features from a sample of E. coli represented actual metabolites.
When the researchers ran their samples through the LC/MS, they detected about 25,000 compounds, which, they write, is typical for an untargeted metabolomics experiment—where researchers consider all detected compounds, rather than looking for particular ones. But they found that many features were due to metabolites forming adducts by binding to a charged particle.
Some adduct formation, such as the binding of a compound of interest to a hydrogen ion, is intentional and necessary to charge compounds for mass spectrometry. “What was surprising was that there are many other types of adducts we’re seeing,” Patti says. Often, the researchers found, molecules formed adducts with each other. Sometimes, compounds formed adducts with contaminants. “Because we have a lot of metabolites that are present simultaneously, we’re finding that a lot of these things are sticking together, not just dimers, but trimers and even more,” says Patti.
After removing the extra signals caused by adducts and fragments, the number of potential unique metabolites fell to about 3,000, meaning that around 90 percent of the original mass spectrometry features were redundant.
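To make the idea of adduct redundancy concrete, here is a minimal Python sketch. It is not the annotation method used in the study. It assumes singly charged, positive-mode ions, considers only four common adducts, and uses hypothetical tolerances and made-up feature values. It links features that elute together and whose m/z values differ by the mass difference between two adduct ions, then counts the resulting groups as candidate compounds. Fragments and multimers, which the study also handled, are ignored here.

```python
from itertools import combinations

# Mass added to a neutral molecule M by common positive-mode adduct ions (Da).
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
}


def group_adducts(features, mz_tol=0.005, rt_tol=2.0):
    """Group features that look like different adducts of one compound.

    features: list of (mz, retention_time_seconds) tuples.
    mz_tol and rt_tol are hypothetical tolerances chosen for illustration.
    Returns a list of groups, each a list of indices into `features`.
    """
    parent = list(range(len(features)))

    def find(i):
        # Union-find root lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every m/z gap expected between two different adducts of the same M.
    shifts = list(ADDUCT_SHIFTS.values())
    deltas = {abs(a - b) for a, b in combinations(shifts, 2)}

    for i, j in combinations(range(len(features)), 2):
        (mz_i, rt_i), (mz_j, rt_j) = features[i], features[j]
        if abs(rt_i - rt_j) > rt_tol:
            continue  # adducts of one compound should co-elute
        gap = abs(mz_i - mz_j)
        if any(abs(gap - d) <= mz_tol for d in deltas):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(features)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Made-up example: three adducts of glucose plus one unrelated feature.
features = [
    (181.0707, 60.0),   # glucose [M+H]+
    (203.0526, 60.5),   # glucose [M+Na]+
    (219.0265, 59.8),   # glucose [M+K]+
    (132.1019, 120.0),  # unrelated compound eluting later
]
groups = group_adducts(features)
print(len(features), "features collapse to", len(groups), "candidate compounds")
```

In this toy case, four features reduce to two candidates. The same kind of collapse, applied at scale and with far more adduct types, is what took the study's list from about 25,000 features to about 3,000.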
The researchers used another method to detect contaminants—compounds that did not originate from the sample but came, say, from a test tube or a solvent—and artifacts, or signals due not to the presence of a compound but to some kind of technical fluke or data-processing glitch. The approach, called credentialing, involves growing the bacterial samples with glucose containing the heavy carbon isotope 13C and, in parallel, growing them with regular glucose—which contains mostly 12C—then mixing the two samples before the analysis.
On the mass-spectrometry read-out, any carbon-containing metabolite produced by the bacterial cell should produce two features: one representing the 12C-containing compound, the other representing the compound with the heavier 13C. To detect bona fide metabolites, Patti explains, “we look through all of the different signals, and we ask if a signal has a 13C dance partner? And because contaminants or artifacts come from non-biological sources, they’re not being made by the E. coli, they don’t have a dance partner.” Through this approach they detected 2,462 credentialed compounds. After removing adducts from the list, 892 bona fide metabolites remained—roughly three percent of the starting number.
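The search for a "dance partner" can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the credentialing software used in the study: it assumes singly charged ions and fully labeled 13C compounds, uses hypothetical tolerances and made-up feature values, and ignores the intensity checks a real workflow would likely apply. Replacing one 12C atom with 13C adds about 1.003355 daltons, so a compound with n carbons should appear as two co-eluting features whose m/z values differ by n times that amount.

```python
C13_C12_DELTA = 1.003355  # mass difference between 13C and 12C, in Da


def find_credentialed_pairs(features, max_carbons=50, mz_tol=0.005, rt_tol=2.0):
    """Find co-eluting light/heavy feature pairs.

    features: list of (mz, retention_time_seconds) tuples.
    max_carbons, mz_tol and rt_tol are hypothetical limits for illustration.
    Returns a list of (light_index, heavy_index, n_carbons) tuples.
    """
    pairs = []
    for i, (mz_light, rt_light) in enumerate(features):
        for j, (mz_heavy, rt_heavy) in enumerate(features):
            if mz_heavy <= mz_light or abs(rt_light - rt_heavy) > rt_tol:
                continue
            gap = mz_heavy - mz_light
            n = round(gap / C13_C12_DELTA)  # nearest whole number of carbons
            if 1 <= n <= max_carbons and abs(gap - n * C13_C12_DELTA) <= mz_tol:
                pairs.append((i, j, n))
    return pairs


# Made-up example: glucose from cells grown on 12C and 13C glucose,
# plus a co-eluting contaminant that has no heavy partner.
mixed_features = [
    (181.0707, 60.0),  # 12C glucose [M+H]+
    (187.0908, 60.1),  # 13C6 glucose [M+H]+ (six carbons, all labeled)
    (149.0233, 60.2),  # contaminant, e.g. from plasticware
]
pairs = find_credentialed_pairs(mixed_features)
credentialed = {i for i, _, _ in pairs}
print("pairs:", pairs)
print("features without a partner:", [k for k in range(len(mixed_features))
                                      if k not in credentialed
                                      and k not in {j for _, j, _ in pairs}])
```

Here the two glucose features pair up with a six-carbon spacing, while the contaminant is left without a partner and would be discarded.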
Patti stresses that this doesn’t mean there are only 892 metabolites in E. coli. Textbooks will tell you, he says, that E. coli produces many more than 900. In fact, the large number of known metabolites is partly why scientists found it easy to believe the high numbers of putative metabolites their screens turned up, Patti says. “People sort of said, ‘There’s a lot of signals; there’s a lot of E. coli metabolites in the textbooks; they’re probably loosely correlated.’”
The results of the present study don’t necessarily mean that there aren’t thousands of E. coli metabolites, Patti says. “It just means we can’t detect them with this particular assay.”
Determining the chemical identity of detected compounds is time-consuming. “If you tried to do this for 25,000 and only 1,000 of them were real . . . you’d end up wasting a lot of time and resources,” Patti says. To save researchers the trouble, Patti and Mahieu have created a database (called creDBle) of credentialed features, starting with their E. coli dataset.
N.G. Mahieu, G.J. Patti, “Systems-level annotation of a metabolomics data set reduces 25000 features to fewer than 1000 unique metabolites,” Analytical Chemistry, doi:10.1021/acs.analchem.7b02380, 2017.