Once the part-time, poorly paid province of postdocs and graduate students, biocuration has become a full-time, salaried career, driven by the explosive growth of biological data in recent years. More genomes are being sequenced, cDNA and EST projects are proliferating, the HapMap Project is going strong - and that's just nucleotide sequencing. Then there's proteomics or 3-D structures (see "Seeing is Believing" pg. 46), notes Rolf Apweiler, who heads the sequence database group in Hinxton, UK, at the European Bioinformatics Institute (EBI), a division of the European Molecular Biology Laboratory.
"We're getting to the point where it's possible to sequence a genome in a day," adds Maria Costanzo, a biocurator working on Stanford University's Saccharomyces and Candida genome databases. "But the raw sequence information is pretty much worthless unless it's interpreted and organized. The open-reading frames and other sequence features have to be identified; comparisons need...
Essentially, that's what biocurators do: maintain a trove of information on a particular organism, manage it, add to it, and help other people use it. Biocuration falls under the umbrella of bioinformatics, or the use of computing, mathematical, and statistical techniques to investigate and understand biological systems. But rather than doing their own experiments, biocurators gather and organize this information so other scientists can use it. While their absolute numbers remain relatively small - approximately 1,000 people work full-time as biocurators - nearly every life scientist working today incorporates biocuration in his or her research (see Tips box), points out Sue Rhee, a staff scientist at the Carnegie Institution and principal investigator of The Arabidopsis Information Resource (TAIR).
The average salary for a biocurator working in academia is $50,000 to $60,000; those in the private sector earn somewhat more. Philip Bourne of the University of California, San Diego, who oversees several biocurators in his job as codirector of the Protein Data Bank, says he expects salaries to rise. "The level of professionalism required is going up," and salaries should follow, he says. A PhD in molecular biology, developmental biology, microbiology, or another relevant field is now a requirement for the job, as is an in-depth knowledge of the organism being curated. But there is no single career path, and no formal training programs exist. Rhee, who has a PhD in plant developmental biology, went from making educational videos on alternative career choices for scientists to a part-time position at TAIR's predecessor, which eventually led to her becoming the project's director. Costanzo, with a PhD in microbiology, got her start working on a yeast genome database project in the private sector.
Biocuration demands a unique mix of scientific skills and personality traits that differs from the mix important for bench scientists. "If you're a devoted experimentalist and love to stand all the time in the lab, it's definitely not something for you," says Apweiler. He and other biocurators agree that near-fanatical attention to detail and a love for organization are essential. The job also demands excellent communication skills. "It's definitely more collegial than working in the lab might be," Costanzo says. "All the decisions about how to organize the information and so forth are made in a group." Biocurators also need to be computer literate - for example, understanding Unix and Windows, Apweiler says. Many of the skills, such as HTML for mocking up database pages, can be learned on the job.
The discipline can lend itself to working remotely, as it is software-based rather than dependent on hardware or labware. Costanzo works from her home in upstate New York, though she's in constant contact with her colleagues by E-mail, telephone, and computer video chat. The entire scientific team meets face-to-face twice a year, and Costanzo also attends major scientific meetings on yeast in addition to smaller and more specialized meetings.
Biocurators have been holding regular meetings since 2003, and the first international meeting for biocurators took place in December 2005. The biggest employers of biocurators are large-scale research operations such as EBI and the National Center for Biotechnology Information, followed by academic institutions such as Stanford, which has 17 curators working on its three genome databases, all of whom are on staff at the medical school's department of genetics. The National Science Foundation is funding more than 500 database projects, all of which require some biocuration work. Biocurators are also working in the private sector, for example at companies that produce databases and related products for the life sciences industry, such as Jubilant Biosys in Bangalore, India, and BIOBASE in Wolfenbüttel, Germany. Some pharma and biotech companies also employ in-house biocurators, for example to annotate genes in a company's pipeline for knockouts and other purposes.
Five Tips for Working Effectively with Biocurators
Helping biocurators do their jobs helps all scientists in their search for information, notes Sue Rhee, a staff scientist at the Carnegie Institution and principal investigator of The Arabidopsis Information Resource. Here are easy ways to help supply accurate information.
1. Use correct nomenclature to describe genes and gene families when writing an article or submitting a new mutation or strain for database inclusion. If you're not sure about any aspect of naming a gene, allele, or strain, don't hesitate to call someone working on the appropriate database to ask for help.
2. Clearly identify the organism and genes you are working with in article titles or abstracts.
3. Identify the strain background you are using when generating, maintaining, and characterizing a mouse line; if a mutant or inbred strain was obtained from a commercial or nonprofit distributor, this information is usually sufficient.
4. Answer seemingly mundane questions from curators: What gene does your probe detect? What is the sequence of your probe? Does your antibody detect a specific gene product? Not responding to questions like these can mean a paper is not cross-referenced.
5. Proactively provide feedback to databases, particularly when you notice errors.
Resources
Biocurator.org (information, including meetings, related to curation of biological data)
http://tesuque.stanford.edu/biocurator.org