Literature Forensics: Navigating Through Flotsam, Jetsam, and Lagan


By Christian G. Daughton | February 18, 2002

Intimidation and bewilderment are but two feelings scientists often confront when facing the ever-expanding published scientific literature. With the birth of any hypothesis, all fantasies of a one-way freeway for a scientific endeavor evaporate when the journey abruptly confronts a forked-road dilemma. One direction, what is known and what was known, leads back in time. A twisted, rutted, convoluted course, it can reveal how, and from where, pioneers from other, unrelated journeys arrived at the same juncture; but it can make for a punishing and, at first thought, boring ride. The other, what is unknown or pretends to be the unknown, quickly recedes into what at least appears to be the unexplored horizon, and its seductive siren can easily win our attention.

Proper navigation of this juncture of old vs. new, past vs. future, dull vs. exciting, known vs. unknown is critical to avoiding a morass of ill fates. These include reinvention, duplication, and attendant ridicule or censure by colleagues for failing to build upon or acknowledge what those before us have done. Following the siren of exploration without investigating where others have traveled is fraught with risks, the worst being when the fork's two branches loop back on one another, revealing that they are one continuum. What had seemed to be uncharted territory is unveiled as a Möbius path toward the fool's gold of rediscovery.

The essence of an ever-expanding map, revealing where science is and how it developed, exists in the combined published literature, in the unpublished knowledge of experts, and in the insight of visionaries. Unfortunately, this many-dimensional map is highly fragmented: a dynamic jigsaw puzzle that is never fully assembled, the unique cleft of each piece always in need of its neighbor. Since this map cannot self-assemble, it requires the continual effort of specialists to reveal its secrets. As in the maritime world, the literature is strewn with its own printed versions of flotsam (floating debris), jetsam (useful things cast overboard to lighten the load), and lagan (useful items discarded but marked for later recovery). Yet even flotsam may prove valuable on closer scrutiny.

So much has been written about the abuse and neglect of the literature that this essay could almost have been composed simply by excerpting quotations from those who have written so well on the topic before. Writing about the under-use of the literature is as risky as instructing someone on their mispellings. Little if anything new can be offered here, only a recounting of a dialog that has been formally under way for many decades, led by a number of insightful scientists and writers.1 The work of Swanson stands out for its practical outcomes.2 So perhaps this missive will simply help fan the flames. The price of quality literature is eternal vigilance!

The major objective of science is to reveal the knowable. To ensure that others can build upon each incremental discovery or paradigm shift, progress is published in the printed and electronic literature. The exponentially expanding literature, however, makes its retrieval, consumption, digestion, and assimilation increasingly difficult. Indigestion from overconsumption or from pointless works, or outright poisoning by errant or manufactured data, can conspire to make us forgo the meal and set off on long, ill-conceived treks that pay no heed to what is already known.

The literature is a repository for more than just what is known (and, at times, what was once known but is now forgotten).3 It also holds vast potential for the creation of new knowledge simply by forming new connections among existing points. Synthesis of new knowledge from the literature has been codified in a range of approaches (falling under such rubrics as "knowledge management" and "literature-based knowledge discovery" with "complementary literatures"), assisted by continually improving tools developed by information specialists (such as "text mining"). I have referred to this broad endeavor as "literature forensics": the use of the literature alone to solve mysteries.
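The linking idea behind "complementary literatures" can be illustrated with a minimal sketch of a simplified, closed-discovery version of Swanson's A-B-C approach. It assumes each article has already been reduced to a set of keywords (a step real text-mining tools handle far more carefully) and looks for intermediate "B" terms that co-occur with a starting term A in some articles and with a target term C in others, while A and C are never discussed together. The function name `abc_links` and the keyword sets are hypothetical and for illustration only; the terms loosely echo Swanson's well-known fish oil and Raynaud's syndrome example, but the "articles" are invented, not real records.

```python
from collections import Counter


def abc_links(docs, a_term, c_term):
    """Return candidate intermediate (B) terms linking a_term to c_term.

    docs: list of keyword sets, one per (hypothetical) article.
    A B term qualifies if it co-occurs with a_term in some article and with
    c_term in some article, provided a_term and c_term never share an article.
    """
    # If A and C are already discussed together, the link is not "hidden".
    if any(a_term in d and c_term in d for d in docs):
        return []

    a_side = Counter()  # how often each term appears alongside A
    c_side = Counter()  # how often each term appears alongside C
    for d in docs:
        if a_term in d:
            a_side.update(d - {a_term})
        if c_term in d:
            c_side.update(d - {c_term})

    shared = set(a_side) & set(c_side)
    # Rank by the weaker of the two supports (strongest first), then alphabetically.
    return sorted(shared, key=lambda b: (-min(a_side[b], c_side[b]), b))


# Toy keyword sets, invented for illustration (not real bibliographic records).
docs = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"fish oil", "blood viscosity", "cholesterol"},
    {"raynaud", "blood viscosity"},
    {"raynaud", "platelet aggregation", "vasoconstriction"},
]

print(abc_links(docs, "fish oil", "raynaud"))
# ['blood viscosity', 'platelet aggregation']
```

Here both intermediate terms receive equal support, so they fall back to alphabetical order; with real literatures, ranking the intermediates and discarding uninformative ones is where most of the effort lies.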

Much has been written over the last 50 years about the explosive growth of the literature, how it is abused by neglect or disregarded,4 what undiscovered knowledge resides within it, and even ways to help us avoid indigestion while attempting to feast on the entire banquet (e.g., "knowledge management"). Even within small sub-disciplines, it is now rare for a scientist to be versed in the entire literature. Imagine trying to read all that has been published on a particular sub-discipline, then synthesizing it into a definitive review and perspective on the field. Or, more daunting still, making connections with other, unrelated fields.

It is paradoxical that many disciplines find it essential to assure the quality of newly acquired data but show little interest in ensuring that the ruts in already traveled roads are not made deeper. Why is quality assurance applied to the performance of science but not to its retrieval and synthesis from the literature? How does one ensure that a quality effort has been made in accessing and capturing the literature? Does your organization have a roadmap for ensuring that, prior to (and during) an investigation, the published literature is digested and assimilated? By not assuring the quality of literature research (or review), we run many risks, not the least of which are self-humiliation and dilution of the literature by contributing to its bloat. A simple introspective exercise is to ask: "What would my planned publication add to the body of knowledge? Does it extend the frontier established by those who ventured before me? Am I adding to our base of knowledge (forming connections, unifying previously disparate and fragmented elements of science), or just adding to the already overwhelmingly large bushel bag of haphazard facts?"

Determining where our work fits within the larger picture is a responsibility we must value highly, especially if we are to communicate that work successfully.5 Erwin Chargaff's comment,6 "...depth engenders restriction. In the end, we know nearly all about nearly nothing," reflects the danger of too narrow a focus. A reasonable guess as to why so little time is invested in the literature is that the pursuit of supposedly "new" science is valued over the rediscovery of the old.

Perhaps the overwhelming scope and depth of the literature are enough to scare many away from exploring it. After all, the Self-Taught Man (in Jean-Paul Sartre's Nausea) set out to systematically assimilate all that was housed in the library, starting with the "As," only to discover that new materials kept appearing in the sections he had just corralled. Mastering even a small portion of the literature can be akin to sweeping mercury into a dustpan. Nor should we lose sight of what the literature comprises: peer-reviewed archival publications and books are but one part. There is also the "gray" literature: government and university reports, theses, technical advertisements, proceedings and abstracts from scientific meetings, and the dizzying expanse of the World Wide Web, not to mention the proprietary holdings of the private sector.

If one pledges a closer commitment to the literature, much more is possible than its "mining." Discovering, locating, reading, understanding, and archiving the information gleaned from myriad pages of text—these never-ending and time-consuming tasks are but one dimension of the advancement of science. They provide only a foundation for building new knowledge. A world of genuinely new discoveries hides in the morass of unconnected facts and ideas that already populate the literature; no new experiments are required. By identifying new linkages, new knowledge can be synthesized in a paper or electronic laboratory. No fancy equipment required—just your time and access to electronic databases and search tools.

Christian G. Daughton, PhD,7 is chief, Environmental Chemistry Branch, Office of Research and Development, US Environmental Protection Agency, Las Vegas, NV 89119.

1. E. Garfield, "Demand citation vigilance," The Scientist, 16[2]:6, Jan. 21, 2002. See also

2. D.R. Swanson et al., "Information discovery from complementary literatures: categorizing viruses as potential weapons," Journal of the American Society for Information Science and Technology, 52:797-812, 2001.

3. C.G. Daughton, "Literature forensics? Door to what was known but now forgotten," Environmental Forensics, 4:277-82, 2001. (Available at

4. I. Ginsburg, "The disregard syndrome: A menace to honest science?" The Scientist, 15[24]:51, 2001.

5. C.G. Daughton, "Emerging pollutants, and communicating the science of environmental chemistry and mass spectrometry: Pharmaceuticals in the Environment," Journal of the American Society for Mass Spectrometry, 12:1067-76, 2001. (Available at

6. E. Chargaff, Heraclitean Fire, New York: Rockefeller University Press, 1978.

7. The views expressed here are those of the individual author and do not necessarily reflect the views and policies of the US Environmental Protection Agency.
