Douglas Brutlag challenges students in his computational biology classes at Stanford University to search the large proteomics databases for yeast membrane proteins. Without knowledge of the database lexicons, the students generally come up well short of the mark. "They find 20 to 200," says Brutlag, professor of biochemistry and medicine at Stanford's School of Medicine. "In fact, there are almost 2,000 proteins."
The problem: linguistics. "These are controlled vocabularies," Brutlag explains. "The key words for membrane proteins are transmembrane, inner membrane, and outer membrane, and unless you have synonyms for all of those, you miss them when you search the data."
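A small sketch can make Brutlag's point concrete. The Python below is illustrative only: the records, their IDs, the `keywords` field, and the synonym table are all hypothetical, not drawn from any real proteomics database. It compares a literal search for "membrane protein" with one that first expands the query into the controlled-vocabulary terms he lists.

```python
import re

# Hypothetical records; real databases use their own fields and vocabularies.
RECORDS = [
    {"id": "P1", "keywords": "Transmembrane; transport"},
    {"id": "P2", "keywords": "Inner membrane; mitochondrion"},
    {"id": "P3", "keywords": "Outer membrane"},
    {"id": "P4", "keywords": "Membrane protein; vacuole"},
    {"id": "P5", "keywords": "Cytoplasm; kinase"},
    {"id": "P6", "keywords": "Trans-membrane; receptor"},
]

# Assumed synonym table mapping a plain-language query to vocabulary terms.
SYNONYMS = {
    "membrane protein": [
        "membrane protein",
        "transmembrane",
        "trans membrane",
        "inner membrane",
        "outer membrane",
    ],
}


def normalize(text):
    """Lowercase and turn runs of hyphens/whitespace into single spaces."""
    return re.sub(r"[\s\-]+", " ", text.lower()).strip()


def search(records, query, expand=True):
    """Return IDs of records whose keywords contain any search term."""
    key = normalize(query)
    terms = SYNONYMS.get(key, [key]) if expand else [key]
    terms = [normalize(t) for t in terms]
    return [
        r["id"]
        for r in records
        if any(t in normalize(r["keywords"]) for t in terms)
    ]


literal_hits = search(RECORDS, "membrane protein", expand=False)
expanded_hits = search(RECORDS, "membrane protein", expand=True)
print(len(literal_hits), "literal hit(s);", len(expanded_hits), "with synonyms")
```

On this toy data the literal query finds one record and the expanded query finds five, the same kind of gap the students run into at much larger scale.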
Multiply the graduate students' difficulty in searching databases by orders of magnitude, and you have the trials facing researchers in academia and industry worldwide. There is a Babel of different computer languages, imagery systems, and software programs that use unique symbols and store their biological treasures in distinctive...