Bolstering Functional Genomics

By | January 21, 2002

Sometimes it pays to listen to your adviser. Hui Ge, a graduate student in Marc Vidal's lab at Harvard Medical School, did just that, pursuing one of her adviser's pet projects, and was published in Nature Genetics for her trouble.1

Ge and second author Zhihua Liu were partners in genetics professor George Church's annual course, Genomics and Computational Biology. As part of that course, students must pick an individual project and run with it, says Church. "If the projects are good enough, then I continue to encourage them and help them take it to the next step." Ge and Liu's project was to correlate expression-profiling (transcriptome) data with protein-protein interaction (interactome) data. They found that the two data sets overlap, which offers a way to add confidence to follow-up studies that build on such results.

This project stems from the ever-increasing gap between available sequence data and available functional data. Some researchers now rely heavily on high-throughput methods, such as DNA microarray analysis, to illuminate new and interesting genes. Because such techniques produce vast quantities of data, the results are often clustered: analyses group genes that behave in a similar way. In the case of biochip data, the genes in a cluster tend to change expression levels in the same way in response to a variety of treatments. The theory is that if two genes are in a cluster, and the function of one is known, scientists can infer that the other gene's function is related, perhaps because both act in the same pathway.
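The inference described above can be illustrated with a minimal sketch. It assumes Pearson correlation as the similarity measure and uses made-up gene names, expression values, annotations, and a hypothetical 0.9 threshold; none of these come from the study, and real clustering pipelines are considerably more involved.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical expression changes across four treatments (illustrative only).
profiles = {
    "geneA": [2.0, 1.5, -1.0, -2.0],
    "geneB": [1.8, 1.2, -0.9, -2.2],
    "geneC": [-1.0, 0.5, 2.0, 0.1],
}
# Hypothetical annotation: only geneA has a known function.
known_function = {"geneA": "DNA repair"}

def infer_function(gene, profiles, known_function, threshold=0.9):
    """Return the function of the best-correlated annotated gene, or None
    if no annotated gene reaches the (assumed) correlation threshold."""
    best_gene, best_r = None, threshold
    for other in known_function:
        if other == gene:
            continue
        r = pearson(profiles[gene], profiles[other])
        if r >= best_r:
            best_gene, best_r = other, r
    return known_function[best_gene] if best_gene else None

guess_b = infer_function("geneB", profiles, known_function)
guess_c = infer_function("geneC", profiles, known_function)
```

Here geneB tracks geneA closely and inherits its annotation as a hypothesis, while geneC does not. Such a guess is only a starting point for experiments, which is why corroborating evidence from a second data type matters.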

©2001 Nature Publishing Group

The intra-cluster region has an average protein interaction density (PID) 5.8-times higher than does the inter-cluster region. (Reprinted with permission from Nature Genetics.)



Protein-protein interaction data are often used in a similar way. Scientists reason that if two proteins are members of a macromolecular complex, they likely have similar or related functions. Thus, investigators often wish to use expression-profiling and interaction data as launching points for further studies, to identify the functions of unknown genes.

But, how can researchers proceed with the confidence that they won't spend years chasing a red herring, or that they haven't missed something really important? The answer, says Vidal, is to overlap multiple types of data, creating a "biological atlas."2 Church explains, "When you find a coincidence between RNAs that co-cluster and proteins that interact, it ... reinforces your confidence that they're both correct."

Ge's approach was to group publicly available expression profiling data into 30 clusters. She then took public yeast two-hybrid interaction data, and investigated whether interacting proteins are encoded by genes in the same cluster (intracluster) or in different clusters (intercluster). The final data take the form of colored squares arrayed in a right triangle with 30 squares on each side. The interaction density for a given square is shown as a color gradient, with yellow squares containing the highest density.
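The triangular matrix can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: it takes the interaction density of a square to be the number of observed interactions divided by the number of possible protein pairs for that pair of clusters, and the protein names, cluster labels, and interactions are invented.

```python
from collections import Counter
from itertools import combinations_with_replacement

def interaction_density_matrix(cluster_of, interactions):
    """Map each cluster pair (i, j) with i <= j to its interaction density.

    cluster_of   -- dict: protein name -> cluster label
    interactions -- iterable of (protein, protein) pairs
    Density is assumed to be observed interactions / possible pairs.
    """
    sizes = Counter(cluster_of.values())
    # Treat (a, b) and (b, a) as one interaction; drop self-interactions.
    unique_pairs = {frozenset(p) for p in interactions}
    observed = Counter()
    for pair in unique_pairs:
        if len(pair) != 2 or not all(p in cluster_of for p in pair):
            continue
        a, b = pair
        observed[tuple(sorted((cluster_of[a], cluster_of[b])))] += 1

    density = {}
    for i, j in combinations_with_replacement(sorted(sizes), 2):
        if i == j:
            possible = sizes[i] * (sizes[i] - 1) // 2  # pairs within a cluster
        else:
            possible = sizes[i] * sizes[j]             # pairs across clusters
        density[(i, j)] = observed[(i, j)] / possible if possible else 0.0
    return density

# Hypothetical toy data: five proteins in two clusters.
cluster_of = {"P1": 0, "P2": 0, "P3": 0, "P4": 1, "P5": 1}
interactions = [("P1", "P2"), ("P2", "P3"), ("P4", "P5"),
                ("P1", "P4"), ("P2", "P1")]  # last entry is a duplicate
density = interaction_density_matrix(cluster_of, interactions)
```

The keys with i equal to j are the diagonal (intracluster) squares. With 30 clusters, the same function would yield 30 × 31 / 2 = 465 squares.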

When all the data were input into the matrix, most of the brightest squares were found along the triangle's diagonal, meaning that intracluster interactions are more common than are intercluster ones. When these data were plotted as a histogram, Ge found that the intracluster interaction density is about 6-7 times higher than the intercluster density. In other words, genes that are expressed coordinately often interact at the protein level, and conversely, proteins that interact are often encoded by coordinately expressed genes.
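The intracluster-versus-intercluster comparison can be sketched as a pooled calculation. As before, this assumes that density means observed interactions divided by possible protein pairs, and the toy data are invented; the resulting ratio is illustrative only and has no bearing on the figure reported in the study.

```python
def pooled_densities(cluster_of, interactions):
    """Return pooled interaction densities for intra- and intercluster pairs.

    cluster_of   -- dict: protein name -> cluster label
    interactions -- iterable of (protein, protein) pairs
    """
    proteins = sorted(cluster_of)
    interacting = {frozenset(p) for p in interactions}
    counts = {"intra": [0, 0], "inter": [0, 0]}  # [observed, possible]
    for idx, a in enumerate(proteins):
        for b in proteins[idx + 1:]:
            kind = "intra" if cluster_of[a] == cluster_of[b] else "inter"
            counts[kind][1] += 1
            if frozenset((a, b)) in interacting:
                counts[kind][0] += 1
    return {kind: (obs / poss if poss else 0.0)
            for kind, (obs, poss) in counts.items()}

# Hypothetical toy data: five proteins in two clusters.
cluster_of = {"P1": 0, "P2": 0, "P3": 0, "P4": 1, "P5": 1}
interactions = [("P1", "P2"), ("P2", "P3"), ("P4", "P5"), ("P1", "P4")]

result = pooled_densities(cluster_of, interactions)
ratio = result["intra"] / result["inter"]
```

In this toy example, three of four possible intracluster pairs interact, against one of six intercluster pairs, so the intracluster density is 4.5 times the intercluster density.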

©2001 Nature Publishing Group

Transcriptome-Interactome correlation data for 335 protein pairs culled from the literature, compared to a randomized control.



In a sense, this is perfectly rational, even obvious. "If you imagine a complex of proteins ... you would hope that the genes encoding the subunits of this complex are coregulated," says Vidal. From a teleological point of view, this is in the cell's best interest. As Church points out, "if you have two gene products that work as a dimer, and one of them is expressed and the other one isn't, the other one is like 'idle hands in the devil's workshop.' It can go off and stick to things." So when Ge began this study, other researchers offered some good-natured ribbing. Vidal says, "When we were working on this, our [lab] neighbors kept telling us, 'yeah, right, so, of course.'" But, he adds, "It's very nice to have proof."

Vidal's biological atlas metaphor extends beyond interactome and expression data, to include biochemical genomics, structural genomics, gene knockouts, and protein localization data.2 Recently, his group published another article demonstrating the correlation of such datasets, in which they overlaid interactome data with large-scale phenotypic analyses, resulting in the identification of several new DNA damage response genes in Caenorhabditis elegans.3

Jeffrey M. Perkel can be contacted at jperkel@the-scientist.com.

References
1. H. Ge et al., "Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae," Nature Genetics, 29:482-6, December 2001.

2. M. Vidal, "A biological atlas of functional maps," Cell, 104:333-9, Feb. 9, 2001.

3. S.J. Boulton et al., "Combined functional genomic maps of the C. elegans DNA damage response," Science, 295:127-31, Jan. 4, 2002.
