A healthy volunteer died in a Johns Hopkins asthma study because the researcher missed information about an inhalant's potential dangers. A vendor to a large pharmaceutical company says that the firm wasted almost two years trying to isolate a compound, not realizing that colleagues had already obtained a patent for it. University of Minnesota researchers, like many others, discovered after three years of research that the results they were writing up had already been published.
The common denominator: the difficulty, if not impossibility, of keeping pace with the overwhelming amount of scientific literature and the mountains of new data released daily. Throw in more discoveries, more funding, more specialists, and you have more data, more print journals, more E-journals, more room to miss something, more room for mistakes. PubMed, the major literature database for biomedical researchers, adds about 40,000 new entries each month; the complete database (which dates to 1966) has nearly 12 million citations from more than 4,500 journals. Add GenBank, Swiss-Prot, and myriad other databases, and the list grows like kudzu. Factor in an enormous worldwide scientific community whose members speak, and publish in, many different languages, and the communication problems become frustratingly clear.
Those drawn to hot fields report that information overload is common. "In angiogenesis or apoptosis there can be about 40 papers per week," says Judah Folkman, a professor who studies angiogenesis at Children's Hospital in Boston. In Alzheimer's disease research, the number is double.1
Not everyone is groaning under the weight; some relish it. "I think it's great. ... I don't feel there is information overload," says Nicholas Cozzarelli, a professor of molecular and cell biology, University of California, Berkeley, and the editor of Proceedings of the National Academy of Sciences (PNAS). "The number of researchers has increased, and accuracy has increased, and the number of things to attack has increased."
But researchers do not always find the right piece of information. Paul Leitman chairs a Johns Hopkins committee established after asthma-study volunteer Ellen Roche died; the group is responsible for establishing literature-searching guidelines.2 Leitman says that patients expect investigators to know their stuff. "You wouldn't like to hear how overwhelmed they are with the literature, how they can't possibly cope, and there are only 24 hours in a day ...," he says.
Add another problem: limited funds. Numerous librarians interviewed say it is a struggle to pay for print and E-journal subscriptions. "We can't keep up," says David Osterbur, librarian at Harvard University's Biological Laboratories Library. "[Subscriptions] usually increase 10% to 15% per year. ... It becomes very difficult to choose between the highly specialized journals." Marilyn Tinsley, Information Services librarian at Stanford University's Lane Medical Library, agrees. "We get the Lancet, but not Lancet Oncology." Says Kim Bevis, librarian at the Salk Institute for Biological Studies, "[Researchers] are looking at a smaller and smaller body of research. I just think ... they are missing things."
The plethora of papers obviously has not stopped researchers from working, notes Monica Bradford, Science's executive editor. "But it's constant pressure. You have to do triage of the information you are going to monitor. That's why you see more dependence on [review articles]." Hee-Jeong Im, biotechnologist at Rush-Presbyterian-St. Luke's Medical Center in Chicago, says what happened to her former University of Minnesota colleagues happens quite often. "The only way to avoid it," she says, is "working hard day and night." Adds Nature managing editor Peter Wrobel, "As night follows day, everybody has to publish."
The ways to attack this assimilation problem continue to evolve. Better search engines, free journal access, proprietary databases, and E-mail alerts are all helping scientists get what they want. But some worry that they are not getting all they need.
SAME PROBLEM, ONLY DIFFERENT
Managing volumes of knowledge became an issue soon after humans developed writing; books, libraries, encyclopedias, and periodicals have all contributed to the solution. With the advent of the Web and its vast potential to disseminate, gather, and store information, some Web sites, such as PubMed, established parameters to limit which journals were included. Journal Citation Reports, a product of the Institute for Scientific Information, includes journals that ISI deems internationally influential in their fields. "That was started back in the 1970s," says David Pendlebury, manager, contract research, at ISI. It began as a way to "help librarians pick the right journals for their institutions." Pendlebury says that ISI is the "only database that has a comprehensive citation index."
At the same time, good research was begetting more good research, and scientists were becoming specialists, such as Susan M. Bailey, who studies telomeres at Colorado State University. The load snowballs, she explains, as she follows new information on DNA repair, which leads her to breast cancer research, microarrays, and other related fields. "It generates more information than one person can [read]," says Bailey.
Yuri Rukazenkov, global brand manager at AstraZeneca in Macclesfield, Cheshire, explains that with 50,000 to 100,000 employees worldwide, the company needs procedures to make sure nothing is missed, so more time can be spent on actual research. "We have processes ... for example, we have global cross-functional teams that meet on a regular basis." In addition, AstraZeneca, like other pharmaceutical companies, has a department staffed with trained scientists who follow and analyze the literature. "They will collect and filter it for you. For example, [they will] tell you whether a recent trial was well done or had potential flaws," says Rukazenkov.
Some companies are looking at software that consolidates disparate information into highly structured databases. Providers such as Ingenuity Systems of Alviso, Calif., offer this service for researchers, and BioSpace in San Francisco provides it to those on the business side of research. Joe Horvath, director of knowledge management at Millennium Pharmaceuticals in Cambridge, Mass., says that his company's recent agreement with Ingenuity to develop its own knowledge management system will allow Millennium to gather key internal and external information in a customized searchable system. "It will enable us to get the key findings from the scientific literature into a knowledge base along with our internal knowledge," says Horvath.
CuraGen of New Haven, Conn., developed an Internet-based bioinformatics-genetics platform called GeneScape when the company began operations in 1993. A staff of 60 keeps CuraGen researchers current with new data, and relevant journal articles, related to the company's key interests. "They flip a button on the computer, and the results are delivered to the desktop," says Cory Brouwer, the company's group leader of bioinformatics. The system allows for correction methods "when we have found errors."
FREE, OR NEARLY SO
Some researchers at universities and smaller companies with limited financial resources have found tools that help manage the load. More are hiring independent research companies, such as Access Information Services in Dayton, Ohio, where president Jodi Gregory specializes in the life sciences arena. Using a product called RadarScreen, which filters out unwanted information, Gregory scans about 300 articles a day and picks out "ones that apply to each individual. ... It's one of the ways they cope with the overload."
Perhaps the most common programs are the free alerting services that send journal articles, tables of contents, or news directly to users by E-mail. "Alerting services are popular," says Michael Sterns, BioSpace's executive vice president. More than 47,000 BioSpace subscribers get news from more than 800 news sources. PubCrawler, developed in the department of genetics at Trinity College in Dublin, automatically searches both the PubMed literature database and the Entrez (GenBank) gene database based on user-defined keywords.
John Sack is director of HighWire Press in Palo Alto, Calif., a 7-year-old technology service provider for publishers that operates a Web portal for online journals. He attended a conference that prompted HighWire's first alerting service. One guest speaker, from a large lab, described a study designed to see what relevant information had been missed in a literature search. "Each person in the lab ... [was] tracking about 30 journals. ... [Then] they looked at many more journals, like 300, to see what they were missing. The good news is that they weren't missing a lot, but the bad news was that they were missing some. ... They wanted to know how they could find those articles without doing so much more work."
THE NEXT-TO-IT EFFECT
Because of online alerting services and widening availability of full-text access, some researchers almost never pick up paper. "We used to spend endless hours physically in the library, and now don't darken a library door because we have access to the Internet," says Hopkins' Leitman. And given that most researchers say they are spending more than 10 hours a week keeping up, they welcome the time saved.
However, murmurings are surfacing that online searching is stifling serendipitous discovery. Keyword searching has serious limitations, many agree. "We should be focusing on tools for discovery, which include search engines and alerting services that go well beyond keyword searching," says HighWire's Sack. For example, when researchers get an article from HighWire, they also see links to articles that cite, and are cited by, that article. The user ultimately ends up at the ISI database.
A company called Collexis, based in the Netherlands, uses a system that depends not on keyword searching but on concept searching. Barend Mons, co-owner of Collexis and assistant professor, University of Rotterdam, says Collexis uses concept numbers to find information. Using the Medical Subject Headings (MeSH) and other hierarchical thesauri, each concept, regardless of whether it is a disease, protein, or gene, is assigned a number. Epstein-Barr and human herpes virus 4 would have the same number, as would a gene that is known by one name in the United States and by another name in Spain or France. These concepts are then combined to form a fingerprint, which is sold to users. Collexis, which lists Nature Publishing Group and Elsevier among its clients, does not hold the information per se; the various fingerprints link to the respective articles. Collexis abstracts 50 to 100 concepts from a short article. "We can also mine for relationships, like molecule A inhibits molecule B," says Mons. So far, Collexis has MeSH in French, Portuguese, Spanish, and English.
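The concept-number idea described above can be illustrated with a minimal sketch. This is not Collexis's actual system: the tiny thesaurus, the concept numbers, and the function name `fingerprint` are all hypothetical, chosen only to show how synonyms in different languages can collapse to the same number so that two differently worded texts yield the same fingerprint.

```python
import re
from collections import Counter

# Hypothetical mini-thesaurus: every synonym maps to one concept number.
# The numbers are made up for illustration; they are not MeSH identifiers.
THESAURUS = {
    "epstein-barr virus": 1001,
    "human herpes virus 4": 1001,  # same virus, different name
    "asthma": 2001,
    "asma": 2001,                  # Spanish term for the same concept
}


def fingerprint(text, thesaurus=THESAURUS):
    """Return a Counter mapping concept number -> number of mentions in text."""
    lowered = text.lower()
    counts = Counter()
    for term, concept_id in thesaurus.items():
        # Word boundaries keep "asma" from matching inside "plasma".
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = len(re.findall(pattern, lowered))
        if hits:
            counts[concept_id] += hits
    return counts


english = fingerprint("Epstein-Barr virus infection in asthma patients")
spanish = fingerprint("Human herpes virus 4 y asma")
```

In this sketch the two texts share no keywords, yet both reduce to the same pair of concept numbers, which is the property a keyword search lacks. A real system would need a far larger thesaurus and more careful term matching.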
Mons says that traditional searches turn up only about 30% of the information required, and that the percentage could fall over time. With 110,000 identified proteins and another 600,000 still to come, he warns: "The tidal waves coming from genomics and proteomics are unmanageable unless we develop very intelligent meta-analysis techniques."
AND NOW FOR SOMETHING COMPLETELY DIFFERENT
About 18 months ago, entrepreneur Vitek Tracz launched BioMedCentral.com, a site that allows scientists to have their work peer-reviewed and, if accepted, published online, where everyone has instant access. It is an idea, says publisher Jan Velterop, that was not, and still is not, thoroughly embraced. In the beginning, people looked at Tracz as if he had three heads. "They still do," Velterop quips. BioMed Central "challenges every axiom of the publishing world. ... They find it difficult to emotionally wake up to it."
So far, of the 2,000 papers submitted, 1,000 have been published, Velterop says. The abstracts are indexed in PubMed, where BioMed Central provides a link listing related articles. BioMed Central, which is co-owner of The Scientist, publishes 75 journals.
Some librarians say they are concerned that E-journals that have no print counterpart could disappear, leaving no trace of the actual papers on other Web sites. "With E-journals, you don't get that ownership, so if you drop your subscription later, or if they go out of business, you don't get access," says Stanford's Tinsley.
When the Johns Hopkins tragedy became public, most reports said the researcher missed key references on the inhalant's toxicity because the pertinent information preceded PubMed's earliest index. However, medical librarians who later searched PubMed found references that cast doubt on the inhalant's safety.3 Both Stanford and Johns Hopkins have changed the way they reach out to investigators, moving their searching classes from the library into lab meetings and grand rounds.
PNAS's Cozzarelli sees no negatives, only positives. "I do not see this as a problem. I think it's a healthy sign that science and biology [are] deepening and broadening, and we should applaud it, and be happy for it."
3. E. Perkins, "Johns Hopkins' tragedy: Could librarians have prevented a death?" Information Today, 2001.