Functional Genomics Annotation: It's Logical!

Researchers are now using symbolic logic as a proteomics tool.

Mar 14, 2005
Maria Anderson


Todd Yeates and colleagues at the University of California, Los Angeles, used logic analysis of phylogenetic profiles to identify functional correlations between proteins [1]. These correlations can be used to make inferences about the likely functions of many previously uncharacterized genes and proteins, Yeates says.

Yeates' team identified eight different logic statements representing all the possible relationships between three proteins. For example, protein C is present if and only if protein A and protein B are both present; or, protein C is present if and only if protein A is present and protein B is absent. Using such statements, the group examined the complete set of proteins, divided into 4,873 families known as clusters of orthologous groups, from the fully sequenced and publicly available genomes of 67 organisms, mostly bacteria and archaea.
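
To make the idea concrete, here is a minimal Python sketch of how one such logic statement might be checked against phylogenetic profiles. Everything in it is an assumption made for illustration: the profiles are invented toy data, the names `profile_a`, `LOGIC_STATEMENTS`, and `agreement` are hypothetical, and simple agreement counting stands in for whatever scoring the UCLA group actually used. Only the two statements quoted above are encoded.

```python
from typing import Callable, Dict, List

# Hypothetical presence/absence profiles, one entry per genome:
# 1 = protein family present, 0 = absent. Toy data, not from the study.
profile_a: List[int] = [1, 1, 0, 0, 1, 0]
profile_b: List[int] = [1, 0, 1, 0, 1, 0]
profile_c: List[int] = [1, 0, 0, 0, 1, 0]

# The two example statements quoted in the article, as functions of (A, B).
LOGIC_STATEMENTS: Dict[str, Callable[[int, int], int]] = {
    "C iff (A and B)": lambda a, b: a & b,
    "C iff (A and not B)": lambda a, b: a & (1 - b),
}


def agreement(a: List[int], b: List[int], c: List[int],
              rule: Callable[[int, int], int]) -> float:
    """Fraction of genomes in which C's presence equals rule(A, B)."""
    hits = sum(1 for x, y, z in zip(a, b, c) if rule(x, y) == z)
    return hits / len(c)


scores = {
    name: agreement(profile_a, profile_b, profile_c, rule)
    for name, rule in LOGIC_STATEMENTS.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

In this toy data the first statement holds in every genome and the second in only half of them, so only the first would be a candidate relationship.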

The team found 750,000 previously uncharacterized relationships between protein families. For example, they found that an archaeal DNA-binding protein is present if and only if an ATPase involved in DNA repair is present and a mismatch repair ATPase is absent. "Traditionally people have looked at genomic data in a binary way," says coauthor Peter Bowers. "We took that a step further. ... We went through systematically and looked at ternary relationships between proteins in a phylogenetic matrix."
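
A systematic pass over a phylogenetic matrix of the kind Bowers describes could be sketched as follows. This is a brute-force illustration under stated assumptions, not the group's procedure: the matrix is invented, the family names only echo the archaeal example above, `and_not` and `find_and_not_triplets` are hypothetical helpers, and an exact match across all genomes replaces any statistical measure.

```python
from itertools import permutations
from typing import Dict, List, Tuple

# Hypothetical phylogenetic matrix: rows are protein families,
# columns are genomes (1 = present, 0 = absent). Toy data only.
matrix: Dict[str, List[int]] = {
    "dna_binding":     [1, 0, 0, 1, 0, 0],
    "repair_atpase":   [1, 1, 0, 1, 1, 0],
    "mismatch_atpase": [0, 1, 0, 0, 1, 1],
    "unrelated":       [1, 1, 1, 0, 0, 1],
}


def and_not(a: List[int], b: List[int]) -> List[int]:
    """Profile that is present exactly where A is present and B is absent."""
    return [x & (1 - y) for x, y in zip(a, b)]


def find_and_not_triplets(m: Dict[str, List[int]]) -> List[Tuple[str, str, str]]:
    """Return ordered triples (A, B, C) where C matches 'A and not B' in every genome."""
    found = []
    for a, b, c in permutations(m, 3):
        if and_not(m[a], m[b]) == m[c]:
            found.append((a, b, c))
    return found


triplets = find_and_not_triplets(matrix)
print(triplets)
```

Note that `dna_binding` matches neither of the other two profiles on its own; the relationship only appears when all three families are considered together. With thousands of families the number of triples grows cubically, so a real analysis would need pruning or a statistical score rather than this exhaustive exact match.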

Christopher Hogue, a biochemist at the University of Toronto and principal investigator for the Blueprint Initiative, an organization that develops and maintains public biological databases and bioinformatics software tools, points out that connecting proteins through a logic function doesn't necessarily indicate a direct interaction. What makes this approach novel is that the relationships described by these logic statements could represent protein-protein interactions, or inclusion in the same molecular complex or signaling pathway, says Gary Bader, a computational biologist at Memorial Sloan-Kettering Cancer Center in New York. "[They] are not just simple co-occurrences," he concurs.

Yeates says these relationships might allow scientists to "potentially place ... completely uncharacterized genes or proteins in a cellular context." If all the connections drawn to one uncharacterized protein involve proteins known to be players in signaling, virulence, or cell trafficking, then a role for the unknown protein can be predicted. The patterns found across multiple genomes might allow biologists to determine which cellular components operate together, substitute for each other, or represent alternatives to each other, he adds.
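
The inference step described here, assigning a role when every connection points to the same kind of protein, can be illustrated with a short Python sketch. The family names, roles, and links below are invented, and `predict_role` is a hypothetical helper that applies a deliberately strict rule: it returns a role only when all annotated partners agree.

```python
from typing import Dict, List, Optional

# Hypothetical annotations for characterized protein families.
known_role: Dict[str, str] = {
    "famA": "signaling",
    "famB": "signaling",
    "famC": "virulence",
}

# Hypothetical logic-derived links from uncharacterized families to known ones.
links: Dict[str, List[str]] = {
    "unknown1": ["famA", "famB"],
    "unknown2": ["famA", "famC"],
}


def predict_role(protein: str,
                 links: Dict[str, List[str]],
                 known_role: Dict[str, str]) -> Optional[str]:
    """Return a role only if every annotated partner shares that same role."""
    roles = {known_role[p] for p in links.get(protein, []) if p in known_role}
    return next(iter(roles)) if len(roles) == 1 else None


print(predict_role("unknown1", links, known_role))
print(predict_role("unknown2", links, known_role))
```

Partners without an annotation are simply ignored in this sketch; a more cautious version might refuse to predict unless every partner is characterized.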

Hogue describes the approach as "straightforward and elegant," but adds, "My only critique is that you can't do anything with it yet." The raw data are available online for download now, Bowers says, but in the next few months the group hopes to incorporate them into the Prolinks database (http://mysql5.mbi.ucla.edu/cgi-bin/functionator/pronav), a project of the UCLA Institute for Genomics and Proteomics. The project uses inference methods to predict functional linkages between proteins.