Methods for diagnosing early-stage cancer are in notoriously short supply. Although much-ballyhooed work from National Cancer Institute/FDA researchers Emanuel Petricoin, Lance Liotta, and colleagues promised a revolution through clinical proteomic screening, the initial findings came under scrutiny. Critics maintained that the process they used, surface-enhanced laser desorption/ionization-time-of-flight (SELDI-TOF) mass spectrometry, lacked the sensitivity needed to detect low-abundance proteins, and that sample preparation and instrument variation introduced too many biases into the experiment for the results to be valid.
The Hot Papers featured here directly addressed this controversy. Keith Baggerly, a biostatistician at M.D. Anderson Cancer Center in Houston, and his group systematically analyzed Liotta and Petricoin's publicly posted raw-data sets, determined that the published results were not reproducible, and suggested ways...
CREATING A CONTROVERSY
The initial papers on ovarian cancer biomarkers inspired a number of groups to perform similar studies, but many ran into problems. "When people tried to reproduce some of these experiments, they had a hard time," says Baggerly. Speaking about the raw data that his group investigated, he says, "In many cases, several of the things that were picked up as separating one group from another were due to artifacts of differential processing and design, and not biology."
Liotta and Petricoin contend that the study sets Baggerly analyzed were not meant to be reproducible, since each set was generated using different methodologies. "In fact, many of the studies were performed where we purposefully changed different experimental conditions to see how the spectra were affected," says Petricoin, now at George Mason University. The issues addressed by Baggerly, however, have enhanced the field; "Baggerly's group really put it on the front burner for people to be very careful about the way they design their studies," says Petricoin.
Data derived from the Science Watch/Hot Papers database and the Web of Science (Thomson Scientific, Philadelphia) show that Hot Papers are cited 50 to 100 times more often than the average paper of the same type and age.
K.A. Baggerly et al., "Reproducibility of SELDI-TOF protein patterns in serum: comparing datasets from different experiments," Bioinformatics, 20:777-85, 2004. (Cited in 91 papers, Hist Cite Analysis)
Z. Zhang et al., "Three biomarkers identified from serum proteomic analysis for the detection of early stage ovarian cancer," Cancer Res, 64:5882-90, 2004. (Cited in 86 papers, Hist Cite Analysis)
N.L. Anderson et al., "The human plasma proteome: A nonredundant list developed by combination of four separate sources," Mol Cell Proteomics, 3:311-26, 2004. (Cited in 95 papers, Hist Cite Analysis)
Zhang and colleagues' multicenter, cross-validated effort to identify ovarian cancer biomarkers appears to be a well-designed response to Baggerly's paper, even though the work started in 2001. The study independently analyzed serum samples from healthy and diseased patients from two centers, validated the results with samples from two different centers, and then sequenced and identified the resulting three candidate markers. To further validate their findings, Zhang and colleagues developed an immunoassay and tested it on samples from a fifth center.
Even some critics of the field consider the Zhang study one of the strongest papers to come out of the discovery-based proteomics literature. David Ransohoff, of the University of North Carolina School of Medicine, says, "The reason that it's strong was they derived their candidate markers from one group of subjects and then tested them on a completely independent group of subjects. And they included subjects from a variety of centers."
Despite the strength of the study's design, though, some have reservations about the low specificity and sensitivity of the markers. Measured against the classical serum marker for ovarian cancer, CA-125, the predictive model generated from Zhang's data performed better (a sensitivity of 74% vs. 65%). "It is an advance over CA-125, but not a huge advance," says Baggerly.
Further, the three proteins they found were not cancer-specific and, critics argue, may have been artifacts generated by SELDI's bias toward high-abundance proteins that "likely represent epiphenomena," says Eleftherios Diamandis of the University of Toronto, one of the more vocal skeptics of SELDI-based biomarker discovery. "I don't believe the Zhang data at all," he says.
Petricoin points out, however, that two of the Zhang markers are truncated or fragmented forms of high-abundance proteins. These may be the rare products of disease-specific processes. "We're not talking about the parental molecule, we're talking about the end product of an enzymatic process that emanates from the tumor's microenvironment," he says.
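For readers less familiar with the sensitivity and specificity figures quoted above, the short Python sketch below shows how the two measures are conventionally computed. The patient counts are hypothetical, chosen only so that the results match the 74% and 65% sensitivities cited; they are not data from the Zhang study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of diseased patients the test correctly flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of healthy patients the test correctly clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical cohort of 100 women with ovarian cancer (illustrative only).
panel = sensitivity(true_pos=74, false_neg=26)   # three-marker model
ca125 = sensitivity(true_pos=65, false_neg=35)   # CA-125 alone

print(f"panel sensitivity:  {panel:.0%}")
print(f"CA-125 sensitivity: {ca125:.0%}")
print(f"additional cases detected per 100 patients: {74 - 65}")
```

Under these assumed counts, the gain amounts to nine additional cancers caught per 100 patients.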
COMPARING METHODS
In the midst of the biomarker controversy, Leigh Anderson and colleagues set out to compare different biomarker discovery platforms. They combined data from three different experimental methods with literature search results, compared the data, and developed a preliminary nonredundant list of proteins present in human plasma and serum. Of the 1,175 proteins identified, only 46 were detected by all four sources. The Human Proteome Organization obtained similar results in a comparative study done in 2005.
The small overlap has two consequences, Anderson says. "One would conclude that there isn't any best platform for doing plasma proteomics, or more importantly, biomarker searches. And ... if different platforms see different sets of proteins in the same mixture, you'll get more discovery by using a series of different platforms taken together." At the same time, candidate biomarkers discovered in one platform may not be discovered in another. "Therefore it's hard to see how to move forward with those methods to do systematic biomarker validation in plasma," Anderson adds.
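The overlap Anderson describes comes down to simple set arithmetic. The Python sketch below illustrates it: a nonredundant list is the union of the sources, and the shared core is their intersection. The source names and protein identifiers are hypothetical placeholders, not data from the Anderson paper; only the 46 and 1,175 figures come from the article.

```python
# Hypothetical protein lists from four sources (illustrative only).
sources = {
    "method_a": {"ALB", "TF", "APOA1", "CRP"},
    "method_b": {"ALB", "TF", "HP", "FGA"},
    "method_c": {"ALB", "TF", "APOA1", "C3"},
    "literature": {"ALB", "TF", "CRP", "IGHG1"},
}


def nonredundant(lists):
    """Union of all sources: each protein is counted once."""
    return set().union(*lists.values())


def shared_by_all(lists):
    """Proteins reported by every source."""
    return set.intersection(*lists.values())


def overlap_fraction(n_shared, n_total):
    """Share of the combined list that every source detected."""
    return n_shared / n_total


combined = nonredundant(sources)
core = shared_by_all(sources)
print(len(combined), "proteins in the toy nonredundant list")
print(sorted(core), "seen by all four toy sources")

# Figures reported in the article: 46 of 1,175 proteins.
print(f"{overlap_fraction(46, 1175):.1%} of the reported list was seen by all four")
```

With the reported figures, fewer than one protein in 25 turned up in every source, which is the arithmetic behind Anderson's point that no single platform sees the whole plasma proteome.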
These three Hot Papers point to broader problems with proteomics-based biomarker discovery and suggest new directions for the field. Says Ransohoff, "In my view the field doesn't yet have a really strong proof-of-principle study that shows a high degree of discrimination ... where chance and bias aren't explanations. And that's the kind of thing we need."