In Japan, Takashi Ito, professor of genome biology, Cancer Research Institute, Kanazawa University, was reaching the same conclusions. "When we finished the sequencing of the budding yeast, we learned that almost half of the genes within the genome ... hadn't been hit by the genetic approach," he says. That necessitated a new way of doing things: working on the proteins themselves. Not being a protein chemist, Ito says, yeast two-hybrid looked like the way to go.
The premise was simple; the reality was Herculean. This reporter-based assay relies on the dichotomous nature of certain transcriptional activators. One gene of interest can be fused to the DNA-binding domain, and another gene believed to interact with the first can be fused to the transcription activation domain. The clones are transformed into haploid yeast, which are then mated. If these proteins interact, the transcription factor becomes functional, switching on expression of the reporter or reporters. For yeast, at around 6,000 genes, checking all interactions means doing that nearly 36 million times.
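The "nearly 36 million" figure follows from simple counting. A minimal sketch in Python, assuming the article's round figure of 6,000 genes and assuming that every gene is tested as bait against every gene as prey (so A-as-bait with B-as-prey counts separately from the reverse); the function name is illustrative only:

```python
# Illustrative arithmetic only; 6,000 is the article's round gene count.
N_ORFS = 6_000


def pairwise_tests(n_orfs: int) -> int:
    """Count bait x prey combinations when every ORF is tested as bait
    against every ORF as prey (ordered pairs, self-pairs included)."""
    return n_orfs * n_orfs


total = pairwise_tests(N_ORFS)
print(f"{total:,} bait-prey combinations")
```

With 6,000 genes this gives 36,000,000 ordered combinations, matching the scale quoted above.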
So, three groups set out to do some truly seminal work in large-scale proteomics: the first comprehensive protein-protein interaction map for a eukaryotic organism.1,2 Whether or not they succeeded is a point of contention.
At the time, Uetz was a postdoc working with Stanley Fields, Howard Hughes Medical Institute Investigator and professor of genome sciences and medicine, University of Washington. Credited as a pioneer of yeast two-hybrid,3 Fields and his group took the same approach they had used on bacteriophage T7.4 They created an array of 6,000 transformants, each with a different open reading frame (ORF) cloned into the Gal4 activation domain, the so-called 'prey' of yeast two-hybrid. Then, they created the 'bait': transformants with ORFs cloned into the Gal4 DNA-binding domain. They added the bait, one at a time, to the colossal array. Then, the researchers looked to see what bit.
The approach, while thorough, was limiting. Their paper has data from just 192 proteins, representing 3% of possible tests. Eighty-seven of the arbitrarily picked proteins yielded 281 protein-protein interactions.
CuraGen researchers, who were in contact with, and eventually published with, the Fields lab, took a faster approach. They pooled roughly 5,300 transformants with ORFs cloned into the Gal4 activation domain, then screened by mating bait clones one at a time. Loïc Giot, a CuraGen group leader, says, "The most important criteria was to go fast and to see if we could extrapolate that approach to a bigger genome." CuraGen found 817 proteins participating in 692 interactions. The data from both studies do not overlap greatly, but Fields says CuraGen demonstrated the ability of a high-throughput approach, while the array method generated more interactions per protein. "They're sort of complementary in their strengths and weaknesses," he says.
Though they worked independently, the groups published together, taking the advice of the papers' reviewers and editors. Uetz says he was reluctant to copublish because the methods were so different. But, "In the interest of getting a Nature article as opposed to two letters to Nature we thought, 'OK, why not?'"
In Japan, Ito and researchers, then at the Human Genome Center, Institute of Medical Science, University of Tokyo, had devised a plan similar to CuraGen's. Instead of pooling all 6,000 transformants, Ito's team split the bait and prey transformants into pools of 96 clones, intending to mate 3,844 pool combinations, accounting for almost 36 million possible combinations. Ito says they were still working when he heard CuraGen had tackled the project in a similar manner. "So," says Ito, "that's the reason why we published this paper before finishing them all; just to report establishment of the system and develop the pilot experiment." The 2000 Ito paper covered roughly 10% of the combinations. Ito followed with a complete view in 2001.5 In total, the group reported 4,549 interactions, 841 of which showed up three or more times, making up their "core data." Yet, some have cast doubt on the full data set.6 Of Ito's methods, Giot says, "For just looking at yeast, that was probably the most efficient way to go." Efficient, yes, but researchers ask, "Is faulty data useful?"
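The pooling numbers can be checked against each other. A rough sketch in Python, assuming equal numbers of bait pools and prey pools (an inference from 3,844 being a perfect square, not something stated here) and assuming every pool holds a full 96 clones; the variable names are illustrative:

```python
import math

CLONES_PER_POOL = 96      # pool size given in the article
PLANNED_MATINGS = 3_844   # pool-by-pool matings given in the article

# Assumption: the same number of bait pools and prey pools, so the
# mating count is a perfect square.
pools_per_side = math.isqrt(PLANNED_MATINGS)
assert pools_per_side ** 2 == PLANNED_MATINGS

# Assumption: every pool is full.
clones_per_side = pools_per_side * CLONES_PER_POOL

# Each mating of one bait pool with one prey pool covers 96 x 96 pairs.
pairs_per_mating = CLONES_PER_POOL ** 2
total_pairs = PLANNED_MATINGS * pairs_per_mating

print(f"{pools_per_side} bait pools x {pools_per_side} prey pools")
print(f"{clones_per_side:,} clones per side")
print(f"{pairs_per_mating:,} bait-prey pairs per mating")
print(f"{total_pairs:,} pairs in total")
```

Under these assumptions the design works out to 62 pools per side, 5,952 clones per side, and about 35.4 million bait-prey pairs, in line with the "almost 36 million" quoted above.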
A Quarry of Questionable Data
Ito agrees that the probabilistic methods of pooling ultimately face nature's indifference to the odds. "In principle, this procedure can test all the possible combinations. In the real world we did miss some interaction."
All told, the groups found around 5,000 unique interactions, many involving previously uncharacterized proteins. About 10% show up in more than one dataset. Michael Snyder, professor and chairman, molecular, cellular, and developmental biology, Yale University, predicts that there are 40,000-50,000 interactions, but he adds, "In this business ... you don't have to know 100% of the story to get a lot of useful information."
Scientists also contend with false positives, ubiquitous in most high-throughput proteomic studies. Giot explains that different kinds exist. "There are false positives which happen because of the technology. Those are pairs that should not appear in that system, and those are easy to identify by looking at reproducibility." He says he thinks these account for about 10% of the information that the screen generates. Other false positives occur between proteins that do not interact in the biological context, but do interact in the two-hybrid system. "To be honest," he says, "we don't know how many of those are showing up during the screen."
Yeast two-hybrid shows interaction, but tells little about dynamics, says Snyder. "[Interaction] could show up at specific times in the life cycle and that might be missed in their studies because they mainly work over vegetative cells." Additionally, yeast two-hybrid misses some interactions from membrane-bound proteins, and it cannot always accurately account for post-translational processing of the proteins of interest. "I think the community data ... is of higher quality," Fields says. "It's more reproducible and much more likely to represent real interaction data." Ito explains that his paper missed almost 90% of two-hybrid interactions reported in the literature because researchers employ various, carefully crafted experiments and not simply a one-shot, hypothesis-free test.
Researchers created these interaction maps to be hypothesis-generating machines intended to raise questions and not answers. False positives could leave many chasing down irrelevant leads. But, says Boone, "Creative people will be able to make use of the false positives and will be able to decipher from this sea of false positives and true interactions which ones are biologically relevant."
People still apply new techniques in a high-throughput manner. Ito explains that his lab has been studying how mutations affect interaction capabilities through so-called reverse two-hybrid. "We find that accumulation of these kind of mutation data has provided us with very nice starting points in identifying binding domains," Ito says. Also, as in genomics, some anticipate that cross-species comparisons will yield more answers about binding motifs and their evolutionary significance. At CuraGen, researchers have finished the Drosophila proteome and are working on the human, Giot reports. He says, "We are putting those data together in order to compare interaction across species in everything from yeast to Drosophila to human. We believe this is one of the best ways to identify real interaction."
1. P. Uetz et al., "A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae," Nature, 403:623-7, Feb. 10, 2000. (Cited in 338 papers)
2. T. Ito et al., "Toward a protein-protein interaction map of the budding yeast: A comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins," Proceedings of the National Academy of Sciences (PNAS), 97:1143-7, Feb. 1, 2000. (Cited in 83 papers)
3. S. Fields and O.K. Song, "A novel genetic system to detect protein-protein interactions," Nature, 340:245-6, 1989.
4. P. Bartel et al., "A protein linkage map of Escherichia coli bacteriophage T7," Nature Genetics, 16:277-82, 1997.
5. T. Ito et al., "A comprehensive two-hybrid analysis to explore the yeast protein interactome," PNAS, 98:4569-74, April 10, 2001.
6. R. Mrowka et al., "Is there a bias in proteome research?" Genome Research, 11:1971-3, Dec. 11, 2001.