By long-standing policy, scientific data are not made public until a reviewed manuscript is published. Why, then, treat large-scale DNA sequence data differently from all other experimental data? The answer is that even raw sequence data contain substantial information, and the amount of information increases as genome sequencing progresses. Thus, hundreds of scientists per genome project (tens of thousands if summed over all extant genome projects) use public, but unpublished, DNA sequence data to design their own experiments or to interpret their own experimental data. Public and private grant agencies have recognized the substantial information within incomplete genome sequences and require early release of sequence data as a condition of funding.

To cite one example, as participants in the international Malaria Genome Project, we hoped that providing the sequence of the Plasmodium falciparum genome long before publication would jump-start drug discovery and...

