Several dozen biologists and computer scientists are gathering this week in Bar Harbor, Maine, to discuss ontology: not the hoary philosophical concept, but the bioinformatics buzzword referring to a computer-based representation of the facts established by a scientific field. One group of conferees, the five-year-old Gene Ontology Consortium, will likely focus on programming issues and new viewing, browsing, and editing tools, says Monte Westerfield, the University of Oregon biology professor who directs the core zebrafish database, the Zebrafish Information Network (ZFIN). The other group, only a year and a half old and loosely established, will be hashing out the basic computational grammar and syntax needed to describe phenotypes and their abnormalities.
As the relative progress of these two groups indicates, phenotype lags far behind genotype in the world of bioinformatics. And nowhere is that disparity greater than in the case of anatomy, phenotype's most tangible manifestation. But therein lies a problem: how to digitally link inchoate anatomical information with well-ordered genomic and proteomic data. Finding a solution is increasingly urgent as database and Web site curators try to assimilate thousands of images depicting tissues, organs, and body structures. That number should balloon as journals print ever fewer costly pictures per article, forcing microscopists and imaging specialists to turn to digital forms of publication.
A computational approach to anatomy "is actually rather difficult," observes Jonathan Bard, a professor of bioinformatics and development at Edinburgh University. "Also, anatomy is not very fashionable." Nevertheless, progress is occurring on several fronts: Links between anatomical and gene-expression data continue to expand. An upcoming Internet site should allow viewers to compare anatomies across species, a task that is now difficult, if not impossible, to accomplish online. Finally, various online resources already detail the anatomy of a single species or organ.
WHY ANATOMY IS DIFFICULT
Bard says that an anatomy ontology linked to a database would facilitate faster and more rigorous computer searches. But translating anatomy into computerese is a daunting task. Unlike genomes with their four bases and proteomes with their 20 amino acids, anatomies are not reducible to a few building blocks or functions.
Moreover, descriptions of phenotype are "normally contingent upon assays," conducted under a wide range of conditions and with varying degrees of precision, notes Michael Ashburner, a biology professor at the University of Cambridge who is creating a mutant-phenotype ontology. Codifying and standardizing phenotypes, he grouses, "turns out to be a bit of a pig of a problem." Though ontogeny might proverbially recapitulate phylogeny, ontology does not easily recapitulate anatomy.
Recently established model organisms can present a more basic difficulty: Scientists are unsure how to characterize--or even what to call--certain anatomical parts. Westerfield acknowledges that some structures in the developing zebrafish remain nameless. "You see something in an embryo and you say maybe that's the spleen or the liver," he explains. "But to prove that, you have to label those cells in some way and show that they are the primordium, or the precursor, of the liver or the spleen. And that's a lot of work, slow work."
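Bard's point that an anatomy ontology would make searches more rigorous can be illustrated with a toy example. Ontologies of the kind the Gene Ontology Consortium builds link terms with relations such as is_a and part_of, so a query for a structure can automatically take in its parts. The sketch below is a minimal illustration under stated assumptions: the hierarchy, the gene names, and the annotations are all hypothetical, and each term has a single parent for brevity, whereas real ontologies are directed acyclic graphs in which a term may have several.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical toy hierarchy: child term -> the parent term it is part_of.
PART_OF: Dict[str, str] = {
    "ventricle": "heart",
    "atrium": "heart",
    "heart": "cardiovascular system",
    "dorsal aorta": "cardiovascular system",
}

# Hypothetical annotations: anatomical term -> genes observed there.
EXPRESSED_IN: Dict[str, Set[str]] = {
    "ventricle": {"geneA"},
    "atrium": {"geneB"},
    "dorsal aorta": {"geneC"},
}

def subparts(term: str) -> Set[str]:
    """Return the term plus every term that is (transitively) part_of it."""
    children = defaultdict(list)
    for child, parent in PART_OF.items():
        children[parent].append(child)
    found, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in found:
            found.add(t)
            stack.extend(children[t])
    return found

def genes_in(term: str) -> Set[str]:
    """Collect genes annotated to the term or to any of its parts."""
    return set().union(*(EXPRESSED_IN.get(t, set()) for t in subparts(term)))

print(sorted(genes_in("heart")))                  # ['geneA', 'geneB']
print(sorted(genes_in("cardiovascular system")))  # ['geneA', 'geneB', 'geneC']
```

Without the part_of links, a search for "heart" would find nothing, because no gene in this toy data set is annotated to "heart" directly; the ontology is what lets the query reach the ventricle and atrium records.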
PIONEERING APPROACH
Despite the lack of appropriate ontologies, limited integration of anatomical and molecular data has already occurred. The ZFIN Web site, for example, allows users to click on any one of almost 1,000 genes and receive images of tissues expressing that gene during development. But clicking on an image or the name of a tissue to learn which genes it expresses is not yet possible. Westerfield predicts that improvements to ZFIN could enable such queries within a year.
The Mouse Genome Informatics database has an anatomical dictionary browser that "allows you to start from a tissue or an anatomical term and recover information about expression," says Judith A. Blake, a principal investigator at the Jackson Laboratory in Bar Harbor. That information might include details about the assays used, as well as images of gels and in situ hybridizations. WormBase, the core database of Caenorhabditis elegans, allows users to designate the cell, cell group, and/or life stage in a search for gene-expression patterns.
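The gene-to-image lookup ZFIN already offers, and the tissue-to-gene query that Westerfield anticipates and that the MGI dictionary browser supports, amount to indexing the same expression annotations in two directions. A minimal sketch in Python, assuming each annotation is a simple (gene, tissue, image) record; the record layout, gene names, tissue names, and image identifiers are hypothetical and do not reflect the actual schema of ZFIN, MGI, or WormBase.

```python
from collections import defaultdict
from typing import NamedTuple

class Annotation(NamedTuple):
    """One hypothetical expression record: a gene seen in a tissue, documented by an image."""
    gene: str
    tissue: str
    image_id: str

def build_indexes(annotations):
    """Index the same records by gene (forward) and by tissue (reverse)."""
    by_gene = defaultdict(list)    # gene -> list of (tissue, image_id)
    by_tissue = defaultdict(set)   # tissue -> set of genes expressed there
    for a in annotations:
        by_gene[a.gene].append((a.tissue, a.image_id))
        by_tissue[a.tissue].add(a.gene)
    return by_gene, by_tissue

# Hypothetical sample data, for illustration only.
records = [
    Annotation("geneA", "notochord", "img001"),
    Annotation("geneA", "somite", "img002"),
    Annotation("geneB", "somite", "img003"),
]
by_gene, by_tissue = build_indexes(records)

print(by_gene["geneA"])             # [('notochord', 'img001'), ('somite', 'img002')]
print(sorted(by_tissue["somite"]))  # ['geneA', 'geneB']
```

Building both indexes from one list of records keeps the two directions consistent; adding the reverse query is then a matter of exposing the second index rather than collecting new data.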
Sudhir Kumar, an associate professor of life sciences at Arizona State University in Tempe, is pioneering an image-based approach to linking anatomy with gene expression. Called FlyExpress, it uses digital-image processing and pattern-recognition technology to encode and compare gene-expression patterns in the developing Drosophila melanogaster.1 "You give us an image, and we will find all other images with a similar expression pattern," promises Kumar. Interactions then can be inferred between genes or gene products that yield similar in situ or immunoblotting patterns. Biologists now make such inferences the low-tech way--by seeing a picture and remembering others.
Kumar expects that FlyExpress, which was funded in June, will contain about 90,000 images within three years. He is also fashioning a system that will computationally infer gene interactions. "There are a huge number of images coming through, and we no longer can afford to spend time on them by eye," he notes. "It just takes forever, so we really need to have computational techniques."
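Kumar's promise, "You give us an image, and we will find all other images with a similar expression pattern," describes a similarity search over encoded images. The sketch below shows one generic way such a search could work; it is not FlyExpress's or BEST's actual algorithm. It assumes each image has already been aligned to a standard embryo outline and thresholded into a grid of expressed and non-expressed cells, and it ranks patterns by Jaccard similarity (shared expressed cells divided by cells expressed in either pattern). The grids and gene names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]

def encode(grid: List[str]) -> Set[Cell]:
    """Encode a pre-aligned, thresholded image as the set of (row, col) cells marked '#'."""
    return {(r, c) for r, row in enumerate(grid) for c, ch in enumerate(row) if ch == "#"}

def jaccard(a: Set[Cell], b: Set[Cell]) -> float:
    """Shared expressed cells divided by cells expressed in either pattern."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def most_similar(query: Set[Cell], library: Dict[str, Set[Cell]],
                 k: int = 3) -> List[Tuple[str, float]]:
    """Rank library patterns by similarity to the query, best first."""
    scored = [(name, jaccard(query, pattern)) for name, pattern in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical 3x8 "embryos": '#' marks cells where the gene is expressed.
library = {
    "geneX": encode(["##......", "##......", "##......"]),  # anterior stripe
    "geneY": encode(["..#..#..", "..#..#..", "..#..#.."]),  # two stripes
    "geneZ": encode(["##......", "##......", "........"]),  # partial anterior
}
query = encode(["##......", "##......", "##......"])

for name, score in most_similar(query, library):
    print(name, round(score, 2))
# geneX 1.0
# geneZ 0.67
# geneY 0.0
```

A real system would need far more than this, including image registration, noise handling, and an index that scales to tens of thousands of images, but the core step of turning each picture into a comparable encoding is what replaces "seeing a picture and remembering others."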
A HANDY HANDBOOK
Suppose a researcher is studying the excretory system of worms and wants to compare it to fly excretion. No existing database will provide much help, remarks Edinburgh University's Bard. So he and nine colleagues are creating one called XSPAN.
In XSPAN, Bard says, "If you click on a particular tissue in a particular species, you will be given all the equivalent tissues in other model organisms, the reasons why they're considered equivalent, and the common cell types." The main challenge in building the database, he observes, is identifying tissue equivalents (a term Bard prefers to the more contentious concepts of homology and analogy).
A working version of XSPAN's cross-species anatomy is not due to go online for two years. But Web sites focusing on a single model organism's anatomy are already available. In April 2002, for example, Wormatlas began presenting visual and textual information about C. elegans. "We started with the adult hermaphrodite, and we're doing the adult male now," says editor David H. Hall, a neuroscience professor at Albert Einstein College of Medicine in New York. "My goal is to do the normal embryo a few years down the road and the normal larval anatomy after that."
Fellow editor Zeynep Altun recalls modeling one of Wormatlas' more unusual features, its "Anatomy Handbook," on the pocket-size texts she carried during medical school; these books used images, words, and flowcharts to elucidate a complex subject. Elliot Perens, an MD-PhD student at Rockefeller University, is grateful for this resource. "Details of the [worm's] anatomy aren't just sitting there when you look under the microscope," he points out. Consulting the handbook means that "you don't have to go and reinvent the wheel. Someone's already figured out the anatomy for you."
The handbook is not a database, but its text-based structure has its advantages, according to Thomas B. Brody, a staff fellow at the National Institute of Neurological Disorders and Stroke. Creator of the Interactive Fly, an Internet resource about Drosophila genes and development, he argues: "The most amenable approach to presenting the biological information in a coherent form, to me, is text-based because a database doesn't really allow you to put information in context. It allows you to make lists."
AN "ALTRUISTIC" ACT
The Human Brain Project (HBP) represents another approach to computerizing anatomy, albeit of a single organ. Funded by the US government (its FY2002 budget was $15.8 million, up from $6.7 million in FY1995), the 10-year-old HBP now encompasses 37 databases. Its aim is to integrate the huge amounts of brain data being generated "so you can really start to understand how individual cells, then groups of cells, then circuits and systems, work together," says coordinator Stephen H. Koslow, an associate director of the National Institute of Mental Health.
One HBP Web site, overseen by Yale University's Gordon Shepherd, tells a user within seconds which brain cells have a specific ion channel, neurotransmitter, or receptor. Another site, run by the Medical College of Georgia's Kristen Harris, uses ultra-thin serial sections to reconstruct three-dimensional brain images. And a third, headed by John C. Mazziotta of the University of California, Los Angeles, School of Medicine, is combining magnetic resonance images from 7,000 people to create a brain atlas.
HBP's large-scale approach to elucidating brain anatomy is spreading. In June 2002, the Neuroinformatics Working Group, part of the Paris-based Organisation for Economic Co-operation and Development, reported on various efforts in 17 nations plus the European Community (www.oecd.org/dataoecd/58/34/1946728.pdf).
One difficulty plaguing all computer-based efforts to describe model organisms has been many researchers' reluctance to share data. This problem is "slowly being resolved," Koslow contends, because "people are understanding why it's important to do [that]." Yet he acknowledges that few rewards exist for sharing data or analyzing other investigators' data, and ethical guidelines for sharing are still under discussion.2 Another hindrance, says Hall, is that contributing to model-organism Web sites is "not an accepted way to get recognition" professionally, even though many sites name their contributors. He describes the act of providing data as still "very much altruistic."
Douglas Steinberg is a freelance writer in New York City.
1. S. Kumar et al., "BEST: a novel computational approach for comparing gene expression patterns from early stages of Drosophila melanogaster development," Genetics, 162:2037-47, 2002.
2. D. Gardner et al., "Towards effective and rewarding data sharing," Neuroinformatics J, 1:289-95, 2003.