Dueling Databases

Ted Agres (tagres@the-scientist.com)
Jul 3, 2005

Celera Genomics made hundreds of millions of dollars by selling access to its proprietary genome sequence information. But this month, Celera discontinued its database subscription service and made its 30 billion base pairs of genomic data of humans, rats, and mice freely available through GenBank, operated by the US National Center for Biotechnology Information.

Some see Celera's decision to exit the sequence business as proof of the adage that information wants to be free, and yet another sign that selling access to data is no longer a viable business model. "The trend is perfectly clear. It would be surprising to find any company setting up a business plan that was based on a subscription database of precompetitive information," says Francis Collins, director of the National Human Genome Research Institute and leader of the Human Genome Project, Celera's publicly funded rival in the race to sequence the human genome.

During the...


Some companies, such as Biobase in Germany, are trying to increase value by curating, annotating, and extending the reach of their databases. Others, like the American Chemical Society's Chemical Abstracts Service (CAS), are attempting to maintain market exclusivity by keeping potential competitors at bay. Novartis and Perlegen Sciences, on the other hand, believe they will generate more business if they allow other researchers access to their proprietary databases. "We don't know if it's collaboration or competition or some combination that will drive science the fastest," Campbell says. "Nobody has studied it before."

Celera knew that its genomic information was a perishable commodity. "There is a time component to the value of information," says Tony Kerlavage, Celera's senior director of online business. In the company's early days, when Celera held a near-monopoly on human genome sequences, pharmaceutical companies and research institutions paid big bucks to access the raw data to locate novel genes and drug targets.

At its height, more than 200 institutions and 25 drug and biotech companies subscribed to the Celera Discovery System (CDS), paying annual fees ranging from thousands to millions of dollars, depending on the number of researchers. Over the years, the CDS was supplanted by such resources as GenBank and Ensembl – a project of the European Bioinformatics Institute and the Sanger Institute. Today, the CDS is useful primarily as a reference source, hence the company's willingness to place it in the public domain. Kerlavage declined to say how many subscriptions expired July 1, closing the service for good.

Three years ago, Celera's parent company, Applera Corp., decided to shift from information to drug discovery and development, and to selling gene expression arrays and diagnostic tools. The move may have been prescient. "If you have complementary services, it may be better to have your data freely available so you can sell more of those other services, whether they are machines or other things," says Arti Rai, a Duke University law professor who focuses on intellectual property in the life sciences.


Not anxious to see its Chemical Abstracts Service Registry follow in the footsteps of Celera's database, the American Chemical Society (ACS) is aggressively lobbying federal officials to curtail development of PubChem, a free online resource on the biochemical structures and properties of some 650,000 small organic molecules. PubChem was initiated in September 2004 as part of the National Institutes of Health Roadmap and is maintained by the National Center for Biotechnology Information.

The 159,000-member ACS claims PubChem is exceeding its mandate and has become a "mini-replica" of the Chemical Abstracts Service Registry, a database of more than 25 million organic and inorganic substances and more than 56 million sequences. "That replica will, over time, pose an insurmountable threat to CAS's survival," simply because it is publicly funded, the ACS said in a statement. That PubChem data are already in the public domain "is completely irrelevant," the ACS added. "If a scientist obtains this data from PubChem, there is no reason to purchase it from the CAS Registry."

NIH officials say PubChem will complement, not compete with, the CAS. The data overlap will be minimal, they say, and PubChem will not offer the detailed manual curation that makes the CAS valuable to its subscribers.

"It's all about money," Collins fumes. "It's hard to see how this very small effort on the part of NIH could represent a significant threat. I am astonished by their very strong negative reaction, especially for a database that's run by a supposedly scientific society."

Revenues from the Registry and other publications yield more than half of the ACS's annual funding, said CAS president Bob Massie in an E-mail. Massie says the CAS has discussed the matter with congressmen from Ohio, where the company is located. The House budget bill for NIH that was drafted in June acknowledges the controversy and "urges NIH to work with private sector providers to avoid unnecessary duplication and competition."

"Twelve people at NIH will put 1200 people at ACS out of business? That's absurd," says Stephen Heller, a consultant and expert in numerical databases who has extensive U.S. government experience. "There is minor overlap between PubChem and the CAS Registry, but basically they are two separate things. One is for chemists and the other is for biologists, and they are two different cultures and they don't talk to each other let alone use the same computer systems and databases," he says. And he notes that the ACS received millions in funding from the National Science Foundation in the late 1950s and early 1960s to develop the technical infrastructure for the CAS Registry system.

"There are structural changes going on in the dissemination of scientific information because of the Internet and because everything has become computer-readable. It's not the same sort of business it used to be," says Heller. "Either you adjust or you have problems."


For years, researchers had enjoyed access to the Yeast Protein Database, a detailed curation of Saccharomyces cerevisiae compiled by James I. Garrels at Proteome Inc., in Beverly, Mass. In 2000, Incyte Corp. (then Incyte Genomics) acquired Proteome for $77 million and began charging for what had previously been free, triggering an outcry from researchers.

Incyte sold Proteome in January 2005 to Biobase, a commercial biological database vendor in Germany, a move that completed Incyte's transition to drug discovery and development. For Biobase, acquiring Proteome's BioKnowledge Library increases the company's depth of offerings – which also include transcription factors and signal transduction databases – and helps solidify its competitive position in the marketplace. "The synergism between Biobase's traditional database portfolio and the BKL range of products will open unprecedented opportunities for the customers to optimally exploit their investments in novel high-throughput approaches," said Biobase President Edgar Wingender in a statement.

Taking a completely different tack, Perlegen Sciences, a biotech company in Mountain View, Calif., will donate its proprietary database of 1.6 million single nucleotide polymorphisms to the International HapMap Consortium by the end of the year. Perlegen believes that having a completed HapMap will enable it to scan for and develop drugs for specific diseases and patient populations more quickly and effectively than if it kept its database secret.

In an effort to elucidate the underlying genetic basis for type 2 diabetes, and subsequently create therapies to treat it, Novartis is partnering with the Broad Institute to create a public database of genetic variants associated with the disease.

"It's very likely the answers we get will be complex and require a lot of work," says Tom Hughes, head of the diabetes and metabolism disease areas at Novartis Institutes for BioMedical Research in Cambridge, Mass. "We believe it's important to get the best minds to the problem, and the best way is to share data and get people working on it."

Novartis will contribute $4.5 million to the collaboration, which will also involve Leif Groop at Lund University in Sweden, who has collected thousands of genetic samples from diabetes patients. The first data will be posted later this year. "It makes good sense to get the data out there to help the field mature," Hughes says. "It will help us define better what the medicine should look like. We can't do it on our own."
