An investment of $100 million should be enough to correlate the genome with function, and identify new basic research and drug targets
MapQuest and global positioning systems have radically changed the way we travel. By showing us where we are relative to where we want to go, these tools simplify the job of getting from point A to point B, and make travel in unfamiliar places less stressful. Some years ago I realized molecular biologists face an analogous problem: Cells contain tens of thousands of proteins and other macromolecules, which mediate hundreds of thousands of physical interactions at any given moment. Yet biologists lacked the navigational aids to traverse those interaction networks, aids that travelers today take for granted.
Early efforts to chart protein-protein interaction (or "interactome") networks focused on defined biological processes in yeast and the worm Caenorhabditis elegans; proteome-scale maps of yeast, worm, and the fly soon followed.1 Recently, two groups, Erich Wanker's lab in Berlin and my own, have published initial attempts at a proteome-scale human interactome.2,3
These efforts are already offering insight into the global topology and dynamics of interactome networks.4,5 And they are providing new angles to explore long-standing biological problems. Yet with only about 1% of the human interactome mapped, and perhaps 10% for the model organisms, we must not be complacent. In fact, I propose we expand our efforts and establish a Human Interactome Mapping Project modeled on the Human Genome Project (HGP).
MAPPING TERRA INCOGNITA
Biologists who specialize in particular biological pathways may not feel the need to relate their "local" molecular information to a more global, more integrated cellular model. They might argue the interactome is a waste of money, just as some would argue a system like MapQuest is not worth the investment.
But one afternoon in the spring of 1993, while reading Nature papers describing the sequence of yeast chromosome III and of three adjacent C. elegans cosmids, I realized how little we knew about genes and proteins relative to the whole genome and, one would say today, proteome. It occurred to me that this huge terra incognita could not be functionally investigated without some sort of systematic effort at mapping physical protein-protein interactions.
I was not the first to reach that conclusion, of course. For more than 50 years scientists like Max Delbrück and Conrad H. Waddington have been proposing models based on the idea that macromolecules form complex networks of functionally interacting components, and suggesting that the molecular mechanisms underlying most biological processes correspond to particular steady states adopted by such cellular networks.
Such systems-level conjectures complement molecular biology's reductionist, one-gene/one-function point-of-view in several ways. First, they provide a framework for understanding general biological properties like robustness and adaptability. It is unclear, for example, why more than half of all unique yeast genes (i.e., those without any recognizable genomic homolog) are dispensable for viability. These models also address limitations of the one-gene/one-function paradigm, such as the "gene number paradox": how species as different in complexity as worms and humans could contain approximately the same number of genes.
Systems-level models also provide testable hypotheses to explain, and not merely describe, cellular events like differentiation and homeostasis. Finally, they could aid early drug development, by considering a drug's actions in the context of the cellular networks in which the drug target functions.
A HUMAN INTERACTOME PROJECT
Despite these benefits, systems-level models have remained underdeveloped and underutilized until recently, mainly because of a lack of supporting experimental data. Now, with the human genome sequence in hand, we can consider gene product interactions on a proteomic scale, the starting point for ultimately generating wiring diagrams of all functional interactions in the cell.
A worldwide human interactome project will require a significant infusion of resources, however. With 24,000 or so human genes, a comprehensive pairwise matrix would constitute some 576 million combinations. And given rough estimates of about 300,000 interactions in the human interactome (not counting differential splice variants),2 "back-of-the-envelope" calculations suggest a thorough mapping project could cost anywhere from $100 million to $1 billion, depending on the desired level of quality and completeness, though those numbers will change as high-throughput interaction assays mature.
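The back-of-the-envelope figures above can be reproduced in a few lines. This is only a sketch: the gene count, interaction estimate, and budget range come from the text, while the function name, the unordered-pair count, and the per-interaction cost are derived here for illustration and are not figures stated in the article.

```python
def pairwise_search_space(n_genes: int, ordered: bool = True) -> int:
    """Number of gene-product pairs to test.

    ordered=True counts every cell of the full n x n matrix, which is the
    convention behind the 576 million figure in the text. ordered=False
    counts each unordered pair of distinct genes once (an assumption
    added here for comparison).
    """
    if ordered:
        return n_genes * n_genes
    return n_genes * (n_genes - 1) // 2


N_GENES = 24_000                  # "24,000 or so human genes"
N_INTERACTIONS = 300_000          # rough estimate quoted in the text
BUDGET_LOW = 100_000_000          # $100 million
BUDGET_HIGH = 1_000_000_000       # $1 billion

full_matrix = pairwise_search_space(N_GENES)
unordered_pairs = pairwise_search_space(N_GENES, ordered=False)

# Implied cost per mapped interaction at each end of the budget range.
cost_per_interaction_low = BUDGET_LOW / N_INTERACTIONS
cost_per_interaction_high = BUDGET_HIGH / N_INTERACTIONS

print(f"Full matrix:       {full_matrix:,} combinations")
print(f"Unordered pairs:   {unordered_pairs:,}")
print(f"Cost/interaction:  ${cost_per_interaction_low:,.0f} to ${cost_per_interaction_high:,.0f}")
```

Under these assumptions the budget range works out to roughly a few hundred to a few thousand dollars per interaction, which is why the estimate is so sensitive to assay throughput.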
© 2005 MACMILLAN PUBLISHERS LTD.
This subset of the Vidal group's preliminary human interactome, originally published by Nature in 2005 (437: 1173-8), contains 121 OMIM (Online Mendelian Inheritance in Man) proteins (green nodes) and 424 interactions involving them (orange edges), along with known literature-curated interactions (blue edges). Proteins without an OMIM disease association are depicted as yellow nodes. Note that 94 out of the 424 CCSB-HI1 interactions involve the Ewing sarcoma-related protein, EWSR1.
Global annual funding for interactome projects to date has been about two orders of magnitude less than that allocated for genome projects. As a practical first step, funding agencies around the world must revise how they judge interactome project proposals, and review them as they did genome-sequencing projects. In particular, they must recognize that large-scale discovery of protein-protein interactions will bring a basic understanding of human biology that is at least on par with the HGP.
A second necessary step is the coalescence of the interactome community behind this project. A few months ago many in this field gathered at the Wellcome Trust Sanger Institute in Hinxton, UK, for the first Cold Spring Harbor Laboratory "Interactome Network" meeting. Those who attended share a core set of ideas that we are currently summarizing in a white paper.
Overall, six critical lessons can be learned from the HGP:
1. Make all information and resources publicly available and in affordable, standard formats. One of the cornerstones of the HGP was the recognition that no one could "own" the sequence, an ideal best described by John Sulston in his book, The Common Thread. We already have standards for releasing protein-protein interaction data. Early versions of recombinationally cloned coding-sequence, or "ORFeome" resources are also available, but need further development.
2. Support complementary approaches. Binary (e.g., yeast two-hybrid) and co-complex membership (e.g., affinity chromatography/mass spectrometry) mapping are complementary approaches that each offer something unique to interactome projects. Using both will improve data quality, just as expressed sequence tags complemented, rather than competed with, whole-genome sequencing projects.
3. Develop quality measures. We need the equivalent of a Phred score (a confidence metric for base-calling software) to provide a consistent measure of interaction data quality. Toward this goal it would be useful to establish a "gold standard" set of true positives and true negatives that would have to be tested as a shared control for any interaction dataset made publicly available.
4. Develop computational tools. Just as the human genome sequence must be annotated, the human interactome map must be integrated with other datasets to provide increasingly informative models. Without accessible navigational tools, interactome data will be useless to most of the scientific community. We need new computational tools to achieve these ends.
5. Define our deliverables. The HGP was conveniently subdivided by chromosome; we propose subdividing the human interactome project into "search spaces" defined according to the (increasing) availability of open reading frames, or ORFs, in the human ORFeome. Our group has generated 8,000 human ORFs, corresponding to about one-third of the predicted genes. Using that collection we searched one-ninth of all protein-protein pairwise combinations of the human proteome.2 Subsequent search spaces can be added as additional ORF resources become available.
6. Establish quality-control procedures. In order to finish the interactome map, we must interrogate the interactome using different protein-interaction assays to ensure that both sensitivity and specificity are optimized. It will also be crucial to map as precisely as possible the protein domains that are required for these interactions.
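The "gold standard" idea in lesson 3 can be sketched as a small scoring routine: a dataset is run against a shared set of known interacting and known non-interacting pairs, and its sensitivity and specificity are reported. Everything below is hypothetical, including the function names and the toy protein labels; no such reference set or scoring convention is defined in the text.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Pair = Tuple[str, str]


def as_pairs(pairs: Iterable[Pair]) -> Set[FrozenSet[str]]:
    """Normalize pairs so that (A, B) and (B, A) count as the same interaction."""
    return {frozenset(p) for p in pairs}


def score_against_reference(
    reported: Iterable[Pair],
    true_positives: Iterable[Pair],
    true_negatives: Iterable[Pair],
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of a dataset on a shared control set.

    sensitivity = fraction of known interacting pairs that were reported
    specificity = fraction of known non-interacting pairs that were NOT reported
    """
    found = as_pairs(reported)
    positives = as_pairs(true_positives)
    negatives = as_pairs(true_negatives)
    if not positives or not negatives:
        raise ValueError("reference set needs both positives and negatives")
    sensitivity = len(found & positives) / len(positives)
    specificity = len(negatives - found) / len(negatives)
    return sensitivity, specificity


# Toy reference set with made-up protein labels (illustration only).
gold_positives = [("A", "B"), ("C", "D"), ("E", "F"), ("G", "H")]
gold_negatives = [("A", "C"), ("B", "D"), ("E", "G"), ("F", "H"), ("A", "H")]

# A hypothetical dataset: three true hits (one listed in reverse order)
# and one pair that the reference set says does not interact.
dataset = [("B", "A"), ("C", "D"), ("E", "F"), ("A", "C")]

sensitivity, specificity = score_against_reference(
    dataset, gold_positives, gold_negatives
)
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

A single shared control of this kind would let datasets from different assays be compared on the same scale, much as a Phred score does for base calls.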
One outstanding question in considering the human interactome is how we measure completeness. Measuring the completeness of sequence information in the HGP was trivial, but when do you call an interactome complete? The 80/20 Rule ("Pareto's Principle") could help us here.
Suppose we decide that 80% of all predicted genes should be interrogated with at least one splice variant, with a sensitivity rate of 80% and a specificity rate of 80%. Will such an "80x3" draft map be complete enough to allow accurate analyses of the global properties that organize this complex network? More importantly, will such a draft convey how the network's global properties are perturbed in various human diseases? If one considers tumor viruses, for instance, will it be possible to understand the global perturbations viral proteomes cause to their host's interactome?
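To get a feel for what an "80x3" draft would contain, one can reuse the same squared relationship that turns one-third of the genes into one-ninth of the pairwise combinations. The sketch below rests on two simplifying assumptions that are not made in the text: interactions are spread evenly across gene pairs, and sensitivity applies independently of which pairs are tested. Specificity is not modeled here.

```python
from fractions import Fraction


def pair_coverage(gene_fraction: Fraction) -> Fraction:
    """Fraction of all pairwise combinations covered when a given
    fraction of genes is available on both sides of the matrix."""
    return gene_fraction * gene_fraction


def expected_recovery(gene_fraction: Fraction, sensitivity: Fraction) -> Fraction:
    """Expected fraction of true interactions detected, ASSUMING interactions
    are evenly distributed over pairs and detection is independent of pair."""
    return pair_coverage(gene_fraction) * sensitivity


one_third = Fraction(1, 3)   # 8,000 of roughly 24,000 genes
eighty = Fraction(4, 5)      # the 80% threshold

ninth = pair_coverage(one_third)
draft_pairs = pair_coverage(eighty)
draft_recovery = expected_recovery(eighty, eighty)

print(f"One-third of genes covers {ninth} of all pairs")
print(f"80% of genes covers {draft_pairs} of all pairs ({float(draft_pairs):.0%})")
print(f"Expected share of interactions detected: {float(draft_recovery):.1%}")
```

Under these assumptions an 80x3 draft would test about two-thirds of all pairs and recover roughly half of the true interactions, so the question of whether that is "complete enough" is a real one.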
I think it will be. But just as the HGP has neither answered every outstanding biological question, nor cured all disease, "just" knowing what protein pairs can interact with each other, which proteins can form a complex, or what transcription factors can bind to which promoter, will say nothing about how and when these interactions happen in the cell, nor will it reveal the purpose of the interaction. But it is the necessary scaffold to get us there.
Marc Vidal is the director of the Center for Cancer Systems Biology at the Dana-Farber Cancer Institute and an associate professor of genetics at Harvard Medical School. His research group seeks to understand how global and local properties of macromolecular networks relate to biological processes and to human disease. Dr. Vidal thanks Fritz Roth and David Hill, without whom his group's involvement in the human interactome mapping project would not be possible.
1. M. Vidal, "Interactome modeling," FEBS Lett, 579:1834-8, 2005.
2. J.F. Rual et al., "Towards a proteome-scale map of the human protein-protein interaction network," Nature, 437:1173-8, Oct. 20, 2005.
3. U. Stelzl et al., "A human protein-protein interaction network: A resource for annotating the proteome," Cell, 122:957-68, Sept. 23, 2005.
4. H. Jeong et al., "Lethality and centrality in protein networks," Nature, 411:41-2, 2001.
5. J.D. Han et al., "Evidence for dynamically organized modularity in the yeast protein-protein interaction network," Nature, 430:88-93, 2004.
6. M. Vidal, "A biological atlas of functional maps," Cell, 104:333-9, 2001.