Digging through the Data

Maria Anderson
Apr 11, 2004

1. Which databases get a lot of traffic?

The three largest international DNA databases are the European Bioinformatics Institute's (EBI) EMBL, the US National Center for Biotechnology Information's (NCBI) GenBank, and the DNA Data Bank of Japan (DDBJ). They rank at the top of the list for traffic, followed by the EBI's Swiss-Prot, a protein sequence database, and EnsEMBL, EBI's annotated metazoan genome browser. Filling out the toolbox are the model organism databases (MODs), including WormBase and FlyBase, and Gramene, a resource for rice and grass genomics. All these databases contain extensive sequence information and genome maps. Some allow searches by sequence as well as for gene products and expression patterns. BLAST, NCBI's search tool, compares a query sequence against the sequences already stored in a database and can be used to detect similarities between nucleotide sequences or protein sequences.
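To give a feel for what "detecting similarities" between sequences means, here is a minimal Python sketch that scores the best local match between two short sequences using the classic Smith-Waterman approach. It is not BLAST, which relies on faster heuristics and statistical scoring. The function name and the scoring values (match, mismatch, gap) are assumptions chosen purely for illustration.

```python
def local_alignment_score(a: str, b: str,
                          match: int = 2,
                          mismatch: int = -1,
                          gap: int = -2) -> int:
    """Return the best Smith-Waterman local alignment score of a and b.

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)   # scores for the previous row of the table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally (match or mismatch) ...
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # ... or open a gap in either sequence; never drop below zero,
            # which is what makes the alignment "local".
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Two made-up sequences that share the stretch "ACGT".
    print(local_alignment_score("TTACGTTT", "GGACGTGG"))
```

A higher score means a longer or cleaner shared stretch; two sequences with nothing in common score zero.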

2. The gigabytes are mounting. Is there any help out there?

Most database Web sites offer their own...

3. Who maintains them?

Curators ensure that databases stay in good working order, although they rely heavily on primary researchers for updates. Mistakes do happen, and once online, they're hard to correct.2 Though it takes time, users may want to double-check online search results against the printed literature.

4. Do the databases speak the same language?

No, and that can cause big headaches. But things are beginning to change. The Gene Ontology Consortium is working to establish "consistent descriptions for gene products in different databases," according to its Web site. The distributed annotation system (DAS) allows databases to exchange information and annotations on genomic sequence data and display them in a single view.

5. What does the future hold for databases?

NCBI's RefSeq is trying to overcome some of GenBank's shortcomings. Those working on the Generic Model Organism Database hope to provide a single, standardized format for MODs. But Cold Spring Harbor Laboratory's Lincoln Stein, an informaticist who helped develop WormBase and Gramene, takes a different stance: "There will only be one database in 15 years, and its name will be Google."

- Maria W. Anderson