To see the potential of proteomic fingerprints, consider the limitations of the best-known biomarker for ovarian cancer: cancer antigen 125 (CA125). Ovarian cancer is usually discovered when it has already reached an advanced stage and metastasized. That CA125 levels are abnormal in 80% of advanced-stage cases is a fact of limited clinical utility, because therapy for advanced-stage ovarian cancer is not very good; the five-year survival rate is about 35%. What physicians need is a biomarker that alerts them to early-stage disease, when cancer is confined to the ovary, and surgery can cure nine out of 10 patients. Unfortunately, in early stages CA125 levels are abnormal no more than 60% of the time.
In contrast, the Lancet paper reported that in a masked set of 116 serum samples, the five-protein pattern discovered by Hitt's software correctly identified all 18 cases of stage I disease and, in fact, identified all 50 ovarian cancer cases in the sample set. The single flaw in the performance was predicting ovarian cancer in three of 66 cases that were nonmalignant. Overall, the proteomic fingerprint had a positive predictive value of 94% (50 of 53), vs. 35% for CA125.
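The arithmetic behind those figures can be checked with a short sketch. The counts come from the numbers quoted above (50 cancers all detected, 3 false alarms among 66 nonmalignant samples); the function and variable names are illustrative only.

```python
def screening_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, positive predictive value) from raw counts."""
    sensitivity = tp / (tp + fn)   # fraction of true cancers flagged
    specificity = tn / (tn + fp)   # fraction of nonmalignant samples cleared
    ppv = tp / (tp + fp)           # fraction of positive calls that are true cancers
    return sensitivity, specificity, ppv

# Counts from the masked set: 50 cancers found, none missed,
# 3 of 66 nonmalignant samples wrongly flagged, 63 correctly cleared.
sens, spec, ppv = screening_metrics(tp=50, fn=0, fp=3, tn=63)
print(f"sensitivity={sens:.0%} specificity={spec:.0%} PPV={ppv:.0%}")
```

The 94% figure is therefore the share of positive calls that were correct (50 of 53), which is a different quantity from the share of cancers detected.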
From Fraud to Fingerprints
Hitt's long odyssey outside the halls of academia and long love affair with computing gave him the experience he would need to find a proteomic fingerprint for ovarian cancer. His fascination with computing goes back to his days as a PhD candidate in biochemistry at West Virginia University, when he was required to demonstrate fluency in two languages to graduate. Instead of a second language, he was permitted to substitute a computer language, and he chose to learn FORTRAN. It was such an enthralling experience that years later, during his "mid-life crisis" in the early 1980s, he decided to embrace computing as a second career. He resigned a professor's post at the University of Cincinnati and traded in drug metabolism studies for a new life as a consultant specializing in applications of artificial intelligence. Eventually he became an expert at building neural net models to solve problems in pattern recognition.
By the late 1990s his expertise was furnishing him full-time employment as a sleuth of income tax cheats and credit card chiselers. Designing software to detect fraud from transaction patterns is an appealing line of work if you're looking for a cat-and-mouse exercise that keeps you on your toes; fraud artists are always on the lookout for new scams and gimmicks that help them stay out of sight. They "are very smart," Hitt observes. "If they begin to realize that part of their fraudulent behavior is starting to be learned, they develop something new." Keeping up with his quarry led him to study adaptive pattern recognition, and what he learned there would later be applied to ovarian cancer. In adaptive pattern recognition the objective is not only to teach software to recognize patterns, but also to endow it with the ability to recognize new patterns on its own.
While seeking business opportunities for his technology, Hitt met Peter J. Levine, who today is president of Correlogic Systems. An attorney, Levine had a long history of using pattern recognition in his own business activities. In 1999, while talking with his friend Emanuel F. Petricoin, codirector of the Food and Drug Administration-National Cancer Institute Clinical Proteomics Program, Levine had the flash of insight that disease-state biomarkers might be found in patterns rather than individual proteins. And if such patterns existed in Petricoin's data, Hitt could find them, said Levine. To give Hitt a crack at the problem, Petricoin gave him anonymous mass spectral data from 50 cancer patients and 50 noncancer controls. After letting the problem "percolate" for a few months, Hitt sketched out, over the course of a weekend in January 2000, a method to unearth a multiprotein biomarker buried in a mountain of mass spectra. Petricoin's data became "training" sets for Proteome Quest's maiden expedition.
Finding a New Kind of Biomarker
Hitt's solution was to generate protein combinations for comparison to the training sets by using a genetic algorithm. Developed in the 1970s, genetic algorithms are "very effective in solving near impossible problems," says Hitt. The algorithm takes a list of protein combinations, usually 1,000 or so, and applies Darwinian selection to find the best proteomic fingerprint the list will provide. The process begins by evaluating randomly selected combinations against the training sets. Combinations that fail to distinguish cancer from noncancer drop from further consideration. In fact, the logic of elimination instantly sheds enormous numbers of combinations. For instance, if you have four proteins that cannot segregate the training sets, then any fifth protein you add to the combination is unlikely to create much improvement. Thus, shedding one combination sheds thousands.
The step that comes next gives the algorithm its name. Combinations that survive initial screening are recombined into new combinations and then screened again. After many iterations of recombination, screening, and elimination, a very good combination may emerge. After several months of work, Hitt's best combination for the ovarian cancer training sets consisted of the amplitudes of serum proteins with m/z values 534, 989, 2111, 2251, and 2465. Petricoin, fellow codirector Lance A. Liotta, Hitt, Levine, and others then collaborated on the retrospective study of mass spectral data that resulted in the Lancet paper.
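The screen, recombine, and screen-again loop described above can be sketched in a few lines. This is a toy illustration, not Correlogic's software: the spectra are synthetic, the fitness measure (training accuracy of a nearest-centroid rule) is an assumed stand-in, and the population size, mutation rate, and all function names are hypothetical choices made to keep the example small.

```python
import random

def make_spectra(rng, n_per_class=30, n_peaks=40, informative=(3, 11, 19, 27, 35)):
    """Synthetic 'spectra': amplitude lists; informative peaks are shifted in class 1."""
    samples = []
    for label in (0, 1):
        for _ in range(n_per_class):
            amps = [rng.gauss(0.0, 1.0) for _ in range(n_peaks)]
            if label == 1:
                for i in informative:
                    amps[i] += 1.5
            samples.append((amps, label))
    return samples

def fitness(combo, samples):
    """Training accuracy of a nearest-centroid rule using only the peaks in `combo`."""
    centroids = {}
    for label in (0, 1):
        rows = [amps for amps, lab in samples if lab == label]
        centroids[label] = [sum(r[i] for r in rows) / len(rows) for i in combo]
    correct = 0
    for amps, label in samples:
        point = [amps[i] for i in combo]
        dist = {lab: sum((p - c) ** 2 for p, c in zip(point, cen))
                for lab, cen in centroids.items()}
        correct += (min(dist, key=dist.get) == label)
    return correct / len(samples)

def crossover(a, b, rng, n_peaks, size):
    """Recombine two parents by drawing peaks from their union, with occasional mutation."""
    pool = sorted(set(a) | set(b))
    child = rng.sample(pool, size)
    if rng.random() < 0.2:  # mutation: swap one peak for a peak not already present
        replacement = rng.choice([i for i in range(n_peaks) if i not in child])
        child[rng.randrange(size)] = replacement
    return tuple(sorted(child))

def evolve(samples, n_peaks=40, size=5, pop_size=60, generations=25, seed=0):
    rng = random.Random(seed)
    population = [tuple(sorted(rng.sample(range(n_peaks), size))) for _ in range(pop_size)]
    history = []  # best training accuracy seen in each generation
    for _ in range(generations):
        ranked = sorted(population, key=lambda c: fitness(c, samples), reverse=True)
        history.append(fitness(ranked[0], samples))
        survivors = ranked[: pop_size // 2]  # weaker combinations are eliminated
        children = [crossover(*rng.sample(survivors, 2), rng, n_peaks, size)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    best = max(population, key=lambda c: fitness(c, samples))
    return best, fitness(best, samples), history

samples = make_spectra(random.Random(1))
best, score, history = evolve(samples)
print("best peak indices:", best, "training accuracy:", round(score, 2))
```

Because the surviving half of each generation is carried forward unchanged, the best score in this sketch never gets worse from one round to the next. Scores here are measured on the training data only; a masked test set, as in the Lancet study, is what shows whether a fingerprint generalizes.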
The study compared the amplitudes of Hitt's five-protein fingerprint to amplitudes at the corresponding m/z values in the masked samples, sorting samples into cancer and noncancer categories. When the samples were unmasked, the success of the fingerprint was all the more impressive: The malignant samples had included every major subtype of epithelial ovarian cancer, while the nonmalignant samples included potential false positives, namely benign disorders such as endometriosis and uterine fibroids.
A better understanding of false positives will be especially important in deciding how the fingerprint can best be used. Three false positives among 66 controls may be acceptable if the fingerprint is used only to screen women at high risk, such as those who have inherited mutations in breast and ovarian cancer genes BRCA1 and BRCA2. But for screening the general population, Petricoin and Liotta consider the false positive rate unacceptably high.
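Why the same false positive rate matters more in a general population follows from Bayes' rule, sketched below. Sensitivity and specificity are taken from the study counts (50 of 50 and 63 of 66); the two lower prevalence values are purely illustrative assumptions, not figures from the article or from epidemiological data.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

SENSITIVITY = 50 / 50   # all 50 cancers detected in the masked set
SPECIFICITY = 63 / 66   # 3 false positives among 66 nonmalignant samples

# First value is the study's own case mix; the others are hypothetical
# lower-prevalence settings chosen only to show the trend.
for prevalence in (50 / 116, 0.01, 0.0005):
    value = ppv(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.4f}  PPV={value:.3f}")
```

At the study's case mix the formula reproduces 50 of 53. As the assumed prevalence falls, true cases become rare relative to false alarms, so a growing share of positive results are wrong even though the test itself has not changed.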
The researchers suggest that it may be possible to lessen false positives by combining the proteomic fingerprint with other diagnostics. Then, if the next trials confirm that the false negative rate is zero, negative fingerprint tests would relieve patients of worry about ovarian cancer, while positive results would be subject to confirmation. The future also holds the possibility that as more SELDI-TOF data comes in, Proteome Quest will find a more accurate fingerprint. That is because adaptive pattern recognition is a key component of the software. "That's one of the beauties" of Proteome Quest's design, says Hitt. "When the diagnostic model screens a larger population, it will learn about that population as time goes on."
It is notable that the identities of the fingerprint proteins are unknown; proteomic fingerprinting does not depend on knowing what the proteins do. Once the five serum proteins are identified, they will undoubtedly be studied for what they may reveal about the mechanisms of ovarian cancer.
Multiprotein biomarkers seem so obvious that Correlogic ought to have swarms of competitors. But sometimes the chasm that separates idea and execution is especially broad. The patent search Correlogic did before filing a patent on its software turned up no competitor, says Hitt. "As near as we can tell, no one else has gone this route." Correlogic's broad patent covers discovery of "patterns of biomolecules to classify a biological state."
Correlogic is not entirely alone, however, according to Dick Rubin, director of marketing at Ciphergen Biosystems, in Fremont, Calif. Ciphergen sells the SELDI-TOF system used for the Lancet study. Rubin states that Ciphergen has software capable of finding multiprotein fingerprints by different algorithms.
In any event, Hitt is having the most exciting time of his life and Correlogic Systems is expanding beyond its current four employees to start new projects. In cooperation with the FDA and NCI, Correlogic is already hunting proteomic fingerprints for other cancers. Hitt mentions trying to find a fingerprint correlated to high risk for cardiac disease. Another idea is to look for a proteomic fingerprint for Alzheimer disease. The ovarian cancer fingerprint may be just the first of many, now that Ben Hitt has come back to biology.