Although the basic currency of science is the research article, the fruits of modern laboratory research often cannot be distilled into the aliquot suitable for publication in a scientific manuscript. Genome-scale inquiry and high-throughput experimentation yield enormous data sets that strain the established article framework; meanwhile, isolated findings and negative results are seldom published at all. Further, it has become obvious that preserving data in its native digital format - with search, annotation, and update capabilities - is desirable. Databases are already the primary form of information storage and access for genomics and protein structure research.
The various shortcomings of the article format have been quietly patched with other modes of communication. The typical reader scans general information first - press coverage, textbooks, and high-level descriptions - before exploring in greater detail through PubMed abstracts, conference presentations, and online data sets.
Scientific information is exchanged in a multi-tiered manner, and these myriad other channels render the scientific manuscript optional, if not obsolete. For instance, those seeking authoritative high-level scientific knowledge can visit the NCBI Bookshelf, an indexed and fully searchable digital archive of textbooks with citations linking directly to PubMed abstracts; a scientist in search of genomic data or bioinformatics software need look no further than online databases or laboratory Web sites. Often the journal article, the bedrock of peer-reviewed scientific knowledge, is the last information source consulted.
While this highlights the importance of nontraditional communication in science, it is also regrettable: after all, journal articles are the main output for which scientists earn recognition, and producing them consumes a huge share of our efforts. Meanwhile, virtually no credit is given for producing quality high-level summaries or for depositing data online.
Journals must produce more than just papers. Editors should demand online deposit of data as a requirement for publication, and enforce a unified nomenclature for biology. In addition to the traditional manuscript, authors should deliver structured methods and results sections suitable for computer parsing, a lay-friendly news blurb (like those PLoS Medicine includes), and a single PowerPoint slide summarizing the work. This entire body of information should be peer-reviewed, published en masse, and kept in sync, thereby avoiding the current problem of disjointed articles and data sets.
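To make the idea of computer-parsable methods and results concrete, a structured deposit could be as simple as a small record filed alongside the manuscript. The sketch below is purely illustrative: the schema, field names, identifiers, and values are hypothetical, not an established standard.

```python
import json

# Hypothetical structured-results record accompanying a manuscript.
# Field names and values are illustrative placeholders only.
record = {
    "article_doi": "10.0000/example.0001",   # placeholder identifier
    "organism": "Saccharomyces cerevisiae",
    "method": {
        "assay": "microarray",
        "replicates": 3,
    },
    "results": [
        {"gene": "YGR192C", "fold_change": 2.4, "p_value": 0.003},
        {"gene": "YLR044C", "fold_change": 0.6, "p_value": 0.010},
    ],
}

def validate(rec):
    """Check that the record carries the fields a database parser would rely on."""
    required = {"article_doi", "organism", "method", "results"}
    missing = required - rec.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return True

validate(record)
# Serializing on deposit makes the findings machine-readable from day one,
# rather than hand-extracted from prose by curators after publication.
deposit = json.dumps(record, indent=2)
```

A journal requiring such a deposit at submission time would let curated databases ingest results directly, which is the efficiency argument made below.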
Broadly, such publishing reform would expand the purview of journals to other tiers of scientific content. The next step is to consolidate all tiers into a single searchable resource. We envision a centralized digital index spanning all information in the biomedical sciences. Just as PubMed indexes journal abstracts in a structured fashion, we propose cataloging a broad range of material, which would enable users to run PubMed-like queries over abstracts, full text, data sets, lay summaries, and presentations, all through a single portal.
Of course, to some degree this goal mimics that of existing entities such as the NCBI's Entrez. The major difference is that the NCBI approach is monolithic: an attempt to amass and house all scientific communication in one place. This is neither realistic nor desirable. We must recognize the plurality of voices contributing to science worldwide. The driving force behind data integration should not be a single American entity; instead, it should be a collaborative effort driven by journals: decentralized information, central access.
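The "decentralized information, central access" model can be sketched as a portal that fans a single query out to independently hosted sources and merges the answers. Everything below — the source names, functions, and return values — is a hypothetical illustration of the architecture, not a description of any existing service.

```python
# Hypothetical federated portal: each tier of content stays with its host;
# only the query and the merged result pass through the central access point.
def search_abstracts(term):
    # Stand-in for a query to a journal-hosted abstract index.
    return [f"abstract mentioning {term}"]

def search_datasets(term):
    # Stand-in for a query to a laboratory or database Web site.
    return [f"data set annotated with {term}"]

def search_lay_summaries(term):
    # Stand-in for a query to a repository of lay-friendly blurbs.
    return [f"lay summary about {term}"]

SOURCES = {
    "abstracts": search_abstracts,
    "data sets": search_datasets,
    "lay summaries": search_lay_summaries,
}

def portal_query(term):
    """One query, many tiers: fan out to every source and merge by tier."""
    return {tier: fetch(term) for tier, fetch in SOURCES.items()}

results = portal_query("kinase")
```

The design choice here is the point of contrast with a monolithic archive: no source surrenders its data to the portal; the portal only knows how to ask.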
This central index would add value by cataloging and interrelating disparate data sources. For instance, a data set might link not only to its companion article, but also to earlier versions of the data, news coverage, reviews, and related talks given by the authors. Community annotation and discussion would add another dimension to peer review, and interested parties of all backgrounds could access information at a level suitable to their needs.
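The interrelating described above amounts to typed links between catalog entries. A minimal sketch, in which every record type, identifier, and link name is a hypothetical placeholder:

```python
# Hypothetical central-index entries: each record carries typed links
# to related records, which may live on entirely different hosts.
index = {
    "dataset:42": {
        "kind": "data set",
        "links": {
            "companion_article": ["article:7"],
            "earlier_version": ["dataset:41"],
            "news_coverage": ["news:3"],
            "presentation": ["talk:9"],
        },
    },
    "article:7": {"kind": "journal article", "links": {}},
    "dataset:41": {"kind": "data set", "links": {}},
    "news:3": {"kind": "lay summary", "links": {}},
    "talk:9": {"kind": "presentation", "links": {}},
}

def related(index, record_id):
    """Return every record one typed link away, grouped by relation."""
    out = {}
    for relation, targets in index[record_id]["links"].items():
        out[relation] = [(t, index[t]["kind"]) for t in targets]
    return out

# A single lookup surfaces the article, prior data, coverage, and talks.
neighbors = related(index, "dataset:42")
```

Because the links are typed, a reader can choose the tier suited to their needs — a journalist follows news_coverage, a bioinformatician follows earlier_version.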
The future of scientific data lies in digital storage and access. It makes sense to revamp academic publishing now to ensure efficient database deposit. Today, considerable resources are poured into extracting data from journal articles; indeed, many databases are still hand-curated by dedicated staff. There will be some up-front costs to implementing this system, but a transition to include machine-readable output will soon pay for itself. Forget "publish or perish." Academic publishing must diversify or die.
Michael Seringhaus is a graduate student in Mark Gerstein's group at Yale University, where Gerstein is A. L. Williams Professor of Biomedical Informatics. firstname.lastname@example.org