The First DNA Sequence Database


Jeffrey M. Perkel
Apr 1, 2006
Credit: COURTESY OF GREG HAMM

In the middle of 1981, Greg Hamm was a 30-year-old software programmer newly hired by the European Molecular Biology Laboratory to head up its DNA data library, a database that did not yet exist. So he set about making one. "We had journals publishing sequence data in increasingly small point size type, which was useless," he says. "It was clear that one thing that was needed was a transmission format, a way to send the data from one place to another."

Rather than limiting the system to the hardware at hand, Hamm decided to adopt an "archaic" file format that both relatively simple and sophisticated systems could read. "The decision I made was to reach backward in the history of computing to come up with a format," he says. "The idea was, if you had a very sophisticated environment, this format should be easy...

Shown here is a "very early sketch" of Hamm's suggested format. Each line of the file contains specific information, tagged with a two-character identifier, such as DT for date and FT for feature table. Though some things have changed (this drawing doesn't include an accession number field, for one thing), EMBL's data format remains remarkably close to what Hamm envisioned 25 years ago. The first EMBL record, accession #X0001, is shown in the inset.
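
To make the idea of a line-tagged format concrete, here is a minimal Python sketch of how such a file might be read. It assumes only what is described above: each line begins with a two-character identifier. The sample lines, the ID and DE tags, the spacing after each tag, and the function name parse_tagged_lines are all invented for illustration; they are not taken from Hamm's sketch or from a real EMBL record.

```python
from collections import defaultdict

# Hypothetical sample lines, made up for illustration only.
# They are NOT a real EMBL record; only the DT and FT tags come from the article.
SAMPLE = """\
ID   EXAMPLE1
DT   01-JAN-1982
DE   An invented description line
FT   CDS             1..30
FT   misc_feature    40..50
"""


def parse_tagged_lines(text):
    """Group line contents by their leading two-character tag."""
    fields = defaultdict(list)
    for line in text.splitlines():
        # Skip blank lines and anything that does not start with two letters.
        if len(line) < 2 or not line[:2].isalpha():
            continue
        tag, value = line[:2], line[2:].strip()
        fields[tag].append(value)  # a tag may repeat, so keep every line
    return dict(fields)


record = parse_tagged_lines(SAMPLE)
print(record["DT"])       # the date line(s)
print(len(record["FT"]))  # number of feature-table lines
```

Because every line carries its own tag, a reader needs nothing more than line-by-line text handling, which fits the goal of a format that simple and sophisticated systems alike could process.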