For a while it looked as if proteomics' next frontier was the clinic, if one were to believe the hype surrounding a 2002 study from US Food and Drug Administration scientist Emanuel Petricoin III and National Cancer Institute scientist Lance Liotta. The pair and their team used mass spectrometry and pattern-recognition software to probe serum samples for ovarian cancer biomarkers. Their findings suggested that proteomic patterns – series of peaks in mass spectra representing unidentified peptides or low-molecular-weight protein fragments – could be used to diagnose early ovarian cancer with surprising accuracy.
This was the hope in late 2003 when biotech startup Correlogic licensed Quest Diagnostics and LabCorp to market OvaCheck, a blood test for ovarian cancer based on the initially promising findings. But doubts about the validity of Petricoin's results and the robustness of serum proteomics as a tool for biomarker discovery...
THE REPRODUCIBILITY FACTOR
For serum proteomic profiling to be widely accepted as a diagnostic assay, scientists must show that patterns identified by mass spectrometry are reproducible across totally independent data sets. This, some claim, is exactly what the
"There's substantial uncertainty about whether the initial results were strong enough to support the high expectations," says epidemiologist David Ransohoff of the University of North Carolina Medical School. Baggerly, for instance, uncovered a variety of flaws in three publicly posted data sets, including mass calibration problems, a change in sample processing midway through an experiment, and incomplete sample randomization.
He also found inconsistent results across two data sets that were subjected to the same experimental conditions – peaks that successfully differentiated patients in one data set failed to do so in the other. "We [couldn't] make it work with the peaks that they supplied," Baggerly says.
Petricoin, now codirector (with Liotta) of the Center for Applied Proteomics and Molecular Medicine at George Mason University, counters that Baggerly's conclusions were themselves flawed, since the initial studies never drew any conclusions about reproducibility and had different goals, such as testing chip surfaces and machine variability. "Claims about not being able to reproduce our findings were made by people who had no expertise in mass spectrometry, and were based on the faulty assumption that all of our publicly posted ovarian cancer data sets were all derived from the same methodology," Petricoin writes in an E-mail.
But Ransohoff says that reproducing a result in at least one independent group of subjects is important for establishing proof of principle, because it rules out the possibility that the original finding arose by chance. And Baggerly adds that one of the main problems he found with Petricoin and Liotta's results was that some of the peaks that best distinguished cancers from controls resided in the low m/z region of the mass spectrum – a region, he argues, that is primarily noise.
O. John Semmes of Eastern Virginia Medical School, Norfolk, calls the reproducibility issue a "red herring." "There was never a demonstration that mass spec could not reproduce a pattern," he says. The real problem, rather, is that experimental bias may have dominated these early studies.
"There are clearly ways in which we can prevent ourselves from publishing things that we know have not been shown to be robust," says Semmes, including standardization of sample-preparation protocols and instrumentation tuning, and sensitivity and reproducibility checks between runs using antibody-based quality-control chips. Ransohoff says many of the doubts about the robustness of proteomic profiling could easily be addressed by well-designed hypothesis generation and testing.
Bioinformatician James Lyons-Weiler of the University of Pittsburgh Medical Center says a pipeline-construction approach to protein profile analysis, with distinct phases that can each be evaluated during the discovery process, could help scientists determine whether these profiles generalize across patient sets. This can be accomplished through modular, open-source software that uses independent validation sets for testing, he says. The Cancer Biomedical Informatics Grid, an NCI-sponsored multi-institute research network, is in the early stages of planning such software.
And new studies that employ bias-reduction steps – including one by Semmes and colleagues at the Early Detection Research Network, which demonstrated reproducibility of prostate cancer marker results using SELDI-TOF MS across multiple sites4 – show higher predictive accuracy than do studies like Petricoin and Liotta's, says Baggerly.
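The phased approach Lyons-Weiler describes, and the kind of cross-data-set failure Baggerly reports, can be illustrated with a deliberately tiny Python sketch. Everything in it is hypothetical: the function names, the nearest-centroid classifier (a stand-in chosen for brevity, not the method used in any study discussed here), and the synthetic peak intensities, which were constructed for illustration and are not real patient data. The key design choice is that peak selection and model fitting only ever see the training set. The independent set is touched once, at the end, to score the frozen model.

```python
"""Hypothetical sketch of a phased profiling pipeline: choose peaks and fit a
classifier on one data set, then freeze both and score an independent set.
All names and numbers are illustrative; the data are synthetic."""
from statistics import mean

# Synthetic intensities for three peaks per sample; label 1 = case, 0 = control.
TRAIN = [[5.0, 1.0, 9.0], [5.2, 1.1, 9.1], [1.0, 1.0, 1.0], [1.1, 0.9, 1.2]]
TRAIN_LABELS = [1, 1, 0, 0]
# Independent set: peak 0 still tracks disease, but peak 2 (an artifact by
# construction) behaves differently than it did in the training set.
TEST = [[5.1, 1.0, 1.0], [4.9, 1.0, 1.1], [1.0, 1.0, 9.0], [1.2, 1.0, 9.2]]
TEST_LABELS = [1, 1, 0, 0]


def select_peaks(samples, labels, k):
    """Phase 1: rank peaks by |mean(cases) - mean(controls)| on training data only."""
    def separation(j):
        cases = [s[j] for s, y in zip(samples, labels) if y == 1]
        controls = [s[j] for s, y in zip(samples, labels) if y == 0]
        return abs(mean(cases) - mean(controls))
    return sorted(range(len(samples[0])), key=separation, reverse=True)[:k]


def fit_centroids(samples, labels, peaks):
    """Phase 2: per-class mean intensity of each selected peak (nearest-centroid model)."""
    centroids = {}
    for cls in (0, 1):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        centroids[cls] = [mean(r[j] for r in rows) for j in peaks]
    return centroids


def predict(sample, centroids, peaks):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(centroid):
        return sum((sample[j] - c) ** 2 for j, c in zip(peaks, centroid))
    return min(centroids, key=lambda cls: dist(centroids[cls]))


def accuracy(samples, labels, centroids, peaks):
    """Phase 3: fraction of samples classified correctly by the frozen model."""
    hits = sum(predict(s, centroids, peaks) == y for s, y in zip(samples, labels))
    return hits / len(labels)


if __name__ == "__main__":
    peaks = select_peaks(TRAIN, TRAIN_LABELS, k=1)
    centroids = fit_centroids(TRAIN, TRAIN_LABELS, peaks)
    print("selected peaks:", peaks)  # selected peaks: [2]
    print("training accuracy:", accuracy(TRAIN, TRAIN_LABELS, centroids, peaks))  # 1.0
    print("independent-set accuracy:", accuracy(TEST, TEST_LABELS, centroids, peaks))  # 0.0
```

In this contrived case the single best-separating training peak does not carry over, so the model looks perfect on its own data and fails completely on the independent set. Because the second data set played no part in choosing peaks or fitting the model, the failure shows up plainly instead of being masked.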
PEEKING AT THE PEAKS
A corollary to the reproducibility issue is how researchers choose to use proteomic profiling data. Can the pattern uncovered by mass spectrometry itself be used for diagnosis without knowing what the peaks actually represent?
Lyons-Weiler says identifying and characterizing each peak in a biomarker panel is a scientific luxury that hinders early disease detection. He adds that some diagnostic assays used today test for proteins whose identities and/or functions are not fully known, including the ovarian cancer marker CA-125. "The clinician certainly doesn't need to understand the function of a blood-based biomarker that works," he says.
Others argue that mass spectrometry is best used as a discovery tool rather than a diagnostic one. Once peaks are identified, they can be used to develop multiplexed ELISAs, bead-based assays, or antibody arrays – all of which are well-established clinical tools – for deployment as diagnostic tests. "The hurdles for FDA approval are quite high for things that you don't know the identity of," says Semmes.
Baggerly adds that using mass spectrometry for discovery alone makes reproducibility less of an issue: "Reproducibility of the pattern is not so vital if the pattern is acquired at one lab, and peaks found to be important are then pursued and identified so that assays for the biomarkers found are reproducible."
This approach has several advantages. "A collection of identified, named proteins will transcend whatever technology is used to measure it. From one year to the next there may be a better way of actually measuring these markers, but they will still be the same markers," says Liotta.
In any case, using mass spectrometry as a diagnostic tool may be too costly for clinical laboratories, as even the least expensive mass spectrometers cost around $100,000. Instruments currently on the market are not quite ready for the clinic, says Petricoin: "The biggest problem right now for [mass spectrometry] fingerprinting ... is the lack of a common standard operating procedure between different labs, reproducibility between different machines, and the variation of the research instruments. [They] are not clinical-grade devices yet." He notes, however, that this is changing, and that clinical-grade tandem mass spectrometry has already been used for several years in neonatal metabolic screening.
TWEAKING THE TECHNOLOGY
Despite recent advances in instrumentation, proteomic biomarker discovery is still limited by technology: current mass spectrometers cannot cover the dynamic range of protein concentrations in human serum, which spans 9 to 12 orders of magnitude.
Scientists are attacking this problem on two fronts: developing methods to reduce sample complexity, and improving instrumentation. Petricoin explains, for example, that the low-molecular-weight proteins and peptides that made up the patterns he and Liotta found can be enriched using high-molecular-weight carrier proteins as "molecular mops," reducing sample complexity and thus facilitating mass spectrometry analysis (see related story, page 30). David Speicher and colleagues at the Wistar Institute, Philadelphia, developed an alternative strategy that reduces sample complexity by first depleting abundant carrier proteins on a polyclonal-antibody HPLC column and then using three additional separation dimensions to fractionate the remaining proteins prior to digestion and LC/MS/MS analysis.5
Mass spectrometry manufacturers are getting into the game as well. But new instruments with higher sensitivity and mass accuracy aren't necessarily the best choice for biomarker discovery. The algorithms written for high-resolution instruments are designed to improve mass accuracy rather than the relative quantitation of ions reaching the detector. Simpler time-of-flight instruments tend to offer better relative quantitation and are thus more reproducible for protein expression profiling, Semmes notes. "Spending more money on your detector is not the answer to improving these studies. It has to be more logically thought out," says Semmes.
Despite the controversies, most scientists involved in proteomic biomarker discovery do agree on one thing: though the field is in a state of flux, proteomics will ultimately make it to the clinic. Says Liotta, "Until we have one sort of technology that falls out and becomes the platform of choice for everyone, we're still in a wild west race for what is the best way to do this."