Software Zeroes In on Ovarian Cancer

By Tom Hollon | April 15, 2002

Ben A. Hitt is living proof that you can leave biomedical research without saying goodbye forever. More than 20 years after turning out the lights in the lab for what he thought was the last time, Hitt is not only back, he's in demand. He is now chief scientist for Correlogic Systems in Bethesda, Md., and his phone hasn't stopped ringing since Feb. 16, when a paper in The Lancet [1] announced that Proteome Quest, the pattern-recognition software he created, had identified a pattern among five serum proteins that diagnoses ovarian cancer with unheard-of accuracy. It is a computing feat of outstanding significance, and it signals that the search for disease biomarkers stands at the threshold of new possibilities, ready to step beyond single-protein biomarkers to those based on multiple proteins: what might be called proteomic fingerprints.

To see the potential of proteomic fingerprints, consider the limitations of the best-known biomarker for ovarian cancer: cancer antigen 125 (CA125). Ovarian cancer is usually discovered when it has already reached an advanced stage and metastasized. That CA125 levels are abnormal in 80% of advanced-stage cases is a fact of limited clinical utility, because therapy for advanced-stage ovarian cancer is not very good; the five-year survival rate is about 35%. What physicians need is a biomarker that alerts them to early-stage disease, when cancer is confined to the ovary, and surgery can cure nine out of 10 patients. Unfortunately, in early stages CA125 levels are abnormal no more than 60% of the time.

In contrast, the Lancet paper reported that in a masked set of 116 serum samples, the five-protein pattern discovered by Hitt's software correctly identified all 18 cases of stage I disease and, in fact, all 50 ovarian cancer cases in the sample set. The single flaw in its performance was predicting ovarian cancer in three of the 66 nonmalignant cases. Overall, the proteomic fingerprint had a positive predictive value of 94% (50 of 53), vs. 35% for CA125.
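The arithmetic behind those figures can be checked in a few lines. The sketch below uses only the counts reported above; the variable names are illustrative, and the number of true negatives (63) is inferred as 66 nonmalignant cases minus the three false positives.

```python
# Counts reported for the masked set of 116 serum samples.
true_positives = 50    # cancer cases flagged as cancer
false_negatives = 0    # cancer cases missed
false_positives = 3    # nonmalignant cases flagged as cancer
true_negatives = 63    # inferred: 66 nonmalignant cases minus 3 false positives

# Sensitivity: share of cancer cases that the fingerprint caught.
sensitivity = true_positives / (true_positives + false_negatives)
# Specificity: share of nonmalignant cases correctly cleared.
specificity = true_negatives / (true_negatives + false_positives)
# Positive predictive value: share of positive calls that were real cancers.
ppv = true_positives / (true_positives + false_positives)

print(f"sensitivity = {sensitivity:.1%}")
print(f"specificity = {specificity:.1%}")
print(f"positive predictive value = {ppv:.1%}")
```

The 94% figure is therefore the share of positive calls that were correct (50 of 53), which is a different quantity from the share of cancers detected (50 of 50).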


Figure: Protein expression patterns contrast an ovarian cancer patient (Node 1) with healthy patients (Nodes 0, 2-4). © Correlogic Systems, Inc.



From Fraud to Fingerprints

The ovarian cancer fingerprint consists not simply of five proteins present in serum, but of their amounts relative to one another. Hitt's software detected the fingerprint by cleverly sifting through the massive amounts of data produced by a mass spectrometry technique called SELDI-TOF (surface-enhanced laser desorption and ionization time-of-flight). This technique separates proteins according to their mass-to-charge ratio. A graph of SELDI-TOF data from a blood-sample analysis shows a spectrum of mass/charge (M/Z) peaks, or amplitudes, for approximately 15,200 serum proteins and peptides. The height of a peak represents the relative abundance of a protein in the sample. The speed and cost-effectiveness of SELDI-TOF (it analyzes a fingerprick of blood in 30 minutes) made it ideal for investigating the hypothesis that cancer of the ovary is reflected by protein patterns in the serum. Provided, of course, that one knew which proteins to look for.

Hitt's long odyssey outside the halls of academia and long love affair with computing gave him the experience he would need to find a proteomic fingerprint for ovarian cancer. His fascination with computing goes back to his days as a PhD candidate in biochemistry at West Virginia University, when he was required to demonstrate fluency in two languages to graduate. Instead of a second language, he was permitted to substitute a computer language, and he chose to learn FORTRAN. It was such an enthralling experience that years later, during his "mid-life crisis" in the early 1980s, he decided to embrace computing as a second career. He resigned a professor's post at the University of Cincinnati and traded in drug metabolism studies for a new life as a consultant specializing in applications of artificial intelligence. Eventually he became an expert at building neural net models to solve problems in pattern recognition.

By the late 1990s his expertise was furnishing him full-time employment as a sleuth of income tax cheats and credit card chiselers. Designing software to detect fraud from transaction patterns is an appealing line of work if you're looking for a cat-and-mouse exercise that keeps you on your toes; fraud artists are always on the lookout for new scams and gimmicks that help them stay out of sight. They "are very smart," Hitt observes. "If they begin to realize that part of their fraudulent behavior is starting to be learned, they develop something new." Keeping up with his quarry led him to study adaptive pattern recognition, and what he learned there would later be applied to ovarian cancer. In adaptive pattern recognition the objective is not only to teach software to recognize patterns, but also to endow it with the ability to recognize new patterns on its own.

While seeking business opportunities for his technology, Hitt met Peter J. Levine, who today is president of Correlogic Systems. An attorney, Levine had a long history of using pattern recognition in his own business activities. In 1999, while talking with his friend Emanuel F. Petricoin, codirector of the Food and Drug Administration-National Cancer Institute Clinical Proteomics Program, Levine had the flash of insight that disease-state biomarkers might be found in patterns rather than individual proteins. And if such patterns existed in Petricoin's data, Hitt could find them, said Levine. To give Hitt a crack at the problem, Petricoin gave him anonymous mass spectra data from 50 cancer patients and 50 noncancer controls. After letting the problem "percolate" for a few months, Hitt sketched out, over the course of a weekend in January 2000, a method to unearth a multiprotein biomarker buried in a mountain of mass spectra. Petricoin's data became "training" sets for Proteome Quest's maiden expedition.

Finding a New Kind of Biomarker

Hitt's task was to find a small set of proteins whose relative abundances were distinctly different in the two training sets. He knew that even for small clusters, evaluating every combination within the blood proteome was impossible. Combinations of just five proteins number on the order of 10^20; looking at them all "could take a billion years or more, using all the computing power on the planet." The problem had to be scaled down.
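For a rough sense of scale, the number of five-peak selections can be computed directly. This is a back-of-envelope sketch that assumes the pool is the roughly 15,200 M/Z peaks mentioned earlier; the exact count depends on that pool size and on whether the order of the five peaks is counted.

```python
import math

n_peaks = 15_200   # approximate number of M/Z peaks per spectrum (from the article)
k = 5              # size of the candidate fingerprint

unordered = math.comb(n_peaks, k)   # selections where order does not matter
ordered = math.perm(n_peaks, k)     # selections where order matters

print(f"unordered five-peak selections: {unordered:.2e}")
print(f"ordered five-peak selections:   {ordered:.2e}")
```

Under these assumptions the count falls between about 10^18 and 10^21, the same astronomically large range as the estimate quoted above, and far too many to evaluate one by one.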

His solution was to generate protein combinations for comparison to the training sets by using a genetic algorithm. Developed in the 1970s, genetic algorithms are "very effective in solving near impossible problems," says Hitt. The algorithm takes a list of protein combinations, usually 1,000 or so, and applies Darwinian selection to find the best proteomic fingerprint the list will provide. The process begins by evaluating randomly selected combinations against the training sets. Combinations that fail to distinguish cancer and noncancer drop from further consideration. In fact, the logic of elimination instantly sheds enormous numbers of combinations. For instance, if you have four proteins that cannot segregate the training sets, then any fifth protein you add to the combination is unlikely to create much improvement. Thus, shedding one combination sheds thousands.

The step that comes next gives the algorithm its name. Combinations that survive initial screening are recombined into new combinations and then screened again. After many iterations of recombination, screening, and elimination, a very good combination may emerge. After several months of work Hitt's best combination for the ovarian cancer training sets consisted of the amplitudes of serum proteins with M/Z values 534, 989, 2111, 2251, and 2465. Petricoin, fellow codirector Lance A. Liotta, Hitt, Levine, and others then collaborated on the retrospective study of mass spectra data that resulted in the Lancet paper.
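The loop of screening, elimination, and recombination can be sketched in miniature. The code below is a toy genetic algorithm on synthetic data, not Proteome Quest itself: the 200-peak spectra, the planted "informative" peaks, the nearest-centroid fitness score, and all parameter values are assumptions chosen only to make the example small and runnable.

```python
import random

random.seed(0)  # fixed seed so the toy run is repeatable

N_PEAKS = 200                         # stand-in for the ~15,200 peaks in a real spectrum
K = 5                                 # fingerprint size
INFORMATIVE = (3, 47, 88, 120, 199)   # hypothetical peaks that differ in "cancer"

def make_sample(is_cancer):
    """Synthetic spectrum: Gaussian noise, shifted at the informative peaks."""
    sample = [random.gauss(0.0, 1.0) for _ in range(N_PEAKS)]
    if is_cancer:
        for i in INFORMATIVE:
            sample[i] += 2.0
    return sample

# Training sets sized like the ones described in the article (50 and 50).
cancer = [make_sample(True) for _ in range(50)]
control = [make_sample(False) for _ in range(50)]
mean_cancer = [sum(s[i] for s in cancer) / len(cancer) for i in range(N_PEAKS)]
mean_control = [sum(s[i] for s in control) / len(control) for i in range(N_PEAKS)]

def fitness(combo):
    """Fraction of training samples a nearest-centroid rule puts in the right class."""
    def closer_to_cancer(sample):
        d_cancer = sum((sample[i] - mean_cancer[i]) ** 2 for i in combo)
        d_control = sum((sample[i] - mean_control[i]) ** 2 for i in combo)
        return d_cancer < d_control
    correct = sum(closer_to_cancer(s) for s in cancer)
    correct += sum(not closer_to_cancer(s) for s in control)
    return correct / (len(cancer) + len(control))

def random_combo():
    return tuple(sorted(random.sample(range(N_PEAKS), K)))

def recombine(a, b):
    """Child draws its K peaks from the union of both parents' peaks."""
    pool = sorted(set(a) | set(b))
    return tuple(sorted(random.sample(pool, K)))

def mutate(combo, rate=0.2):
    """Occasionally drop one peak and refill the combination with a random peak."""
    peaks = set(combo)
    if random.random() < rate:
        peaks.remove(random.choice(sorted(peaks)))
        while len(peaks) < K:
            peaks.add(random.randrange(N_PEAKS))
    return tuple(sorted(peaks))

def evolve(pop_size=100, generations=30):
    population = [random_combo() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)   # screening
        survivors = ranked[: pop_size // 4]                      # elimination
        children = [mutate(recombine(*random.sample(survivors, 2)))
                    for _ in range(pop_size - len(survivors))]   # recombination
        population = survivors + children
    return max(population, key=fitness)

best = evolve()
print("best combination:", best, "training accuracy:", fitness(best))
```

Because the best survivors are carried over unchanged into each new generation, the top score never gets worse from one round to the next. A real search would also need held-out samples, such as the masked set in the study, to check that the winning combination is not simply fitted to noise in the training data.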

The study compared the amplitudes of Hitt's five-protein fingerprint to amplitudes at the corresponding M/Z values in the masked samples, sorting samples into cancer and noncancer categories. When the samples were unmasked, the success of the fingerprint was all the more impressive: The malignant samples had included every major subtype of epithelial ovarian cancer, while the nonmalignant samples included potential false positives, benign disorders such as endometriosis and uterine fibroids.
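The sorting step can be pictured as a comparison in five dimensions. The sketch below is only an illustration: the five M/Z values come from the article, but the nearest-centroid rule, the centroid values, and the sample spectrum are made-up stand-ins, since the article does not spell out the exact classification rule.

```python
import math

FINGERPRINT_MZ = (534, 989, 2111, 2251, 2465)   # M/Z values named in the article

def fingerprint_vector(spectrum):
    """Take the amplitudes at the five M/Z values and scale them to sum to 1,
    so that only their relative sizes matter."""
    amplitudes = [spectrum[mz] for mz in FINGERPRINT_MZ]
    total = sum(amplitudes)
    return [a / total for a in amplitudes]

def classify(spectrum, centroids):
    """Assign the sample to the class whose centroid is nearest (assumed rule)."""
    vector = fingerprint_vector(spectrum)
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))

# Hypothetical class centroids, as if learned from the training sets.
centroids = {
    "cancer": [0.30, 0.10, 0.25, 0.20, 0.15],
    "noncancer": [0.15, 0.25, 0.20, 0.20, 0.20],
}

# Hypothetical masked sample: M/Z value -> amplitude. Peaks outside the
# fingerprint (here 1500) are ignored.
masked_sample = {534: 6.1, 989: 2.2, 2111: 4.8, 2251: 4.0, 2465: 2.9, 1500: 9.9}

print(classify(masked_sample, centroids))
```

Scaling the five amplitudes to sum to 1 reflects the point that the fingerprint lies in the relative amounts of the proteins rather than in their absolute levels.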

Next Steps

NCI and Correlogic plan to confirm the sensitivity and specificity of the fingerprint in a prospective study with more patients, closely assessing its ability to detect stage I ovarian cancer. They also plan a pilot study to evaluate the fingerprint in epithelial ovarian cancer patients who are in remission. This time-course study will compare proteomic fingerprints of women who remain in remission and those who relapse.

A better understanding of false positives will be especially important in deciding how the fingerprint can best be used. Three false positives among 66 controls may be acceptable if the fingerprint is used only to screen women at high risk, such as those who have inherited mutations in breast and ovarian cancer genes BRCA1 and BRCA2. But for screening the general population Petricoin and Liotta consider the false positive rate unacceptably high.
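A short calculation shows why the same false positive rate can be tolerable in one setting and not in another. The sketch below takes the sensitivity and specificity implied by the masked set (50 of 50 and 63 of 66) and applies Bayes' rule at several prevalence values. Every prevalence other than the study's own 50 of 116 is a hypothetical round number, not an epidemiological estimate.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of positive test results that are true cancers (Bayes' rule)."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)

sensitivity = 50 / 50        # all 50 cancers detected in the masked set
specificity = 63 / 66        # 3 false positives among 66 nonmalignant samples
study_prevalence = 50 / 116  # share of cancers in the masked set itself

# The first value reproduces the study; the rest are hypothetical.
for prevalence in (study_prevalence, 0.01, 0.001, 0.0001):
    ppv = positive_predictive_value(sensitivity, specificity, prevalence)
    print(f"prevalence {prevalence:.4f} -> positive predictive value {ppv:.1%}")
```

At the study's own mix of samples the calculation returns the reported 50 of 53. As the assumed prevalence falls, false positives increasingly outnumber true positives, which is the concern with screening a general population.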

The researchers suggest that it may be possible to lessen false positives by combining the proteomic fingerprint with other diagnostics. Then, if the next trials confirm that the false negative rate is zero, negative fingerprint tests would relieve patients of worry about ovarian cancer, while positive results would be subject to confirmation. The future also holds the possibility that as more SELDI-TOF data comes in, Proteome Quest will find a more accurate fingerprint. That is because adaptive pattern recognition is a key component of the software. "That's one of the beauties" of Proteome Quest's design, says Hitt. "When the diagnostic model screens a larger population, it will learn about that population as time goes on."

It is notable that the identities of the fingerprint proteins are unknown; proteomic fingerprinting does not depend on knowing what the proteins do. Once the five serum proteins are identified, they will undoubtedly be studied for what they may reveal about the mechanisms of ovarian cancer.

Multiprotein biomarkers seem so obvious that Correlogic ought to have swarms of competitors. But sometimes the chasm that separates idea and execution is especially broad. The patent search Correlogic did before filing a patent on its software turned up no competitor, says Hitt. "As near as we can tell, no one else has gone this route." Correlogic's broad patent covers discovery of "patterns of biomolecules to classify a biological state."

Correlogic is not entirely alone, however, according to Dick Rubin, director of marketing at Ciphergen Biosystems, in Fremont, Calif. Ciphergen sells the SELDI-TOF system used for the Lancet study. Rubin states that Ciphergen has software capable of finding multiprotein fingerprints by different algorithms.

In any event, Hitt is having the most exciting time of his life and Correlogic Systems is expanding beyond its current four employees to start new projects. In cooperation with the FDA and NCI, Correlogic is already hunting proteomic fingerprints for other cancers. Hitt mentions trying to find a fingerprint correlated to high risk for cardiac disease. Another idea is to look for a proteomic fingerprint for Alzheimer disease. The ovarian cancer fingerprint may be just the first of many, now that Ben Hitt has come back to biology.

Tom Hollon (thollon@starpower.net) is a freelance writer in Rockville, Md.

1. E.F. Petricoin III et al., "Use of proteomic patterns in serum to identify ovarian cancer," The Lancet, 359:572-7, 2002.

For Further Information
For a detailed explanation of proteomic pattern analysis applied to ovarian cancer, go online to clinicalproteomics.steem.com
