<figcaption> Credit: COURTESY OF GREG HAMM</figcaption>

In mid-1981, Greg Hamm was a 30-year-old software programmer newly hired by the European Molecular Biology Laboratory to head up its DNA data library, a database that did not yet exist. So he set about building one. "We had journals publishing sequence data in increasingly small point size type, which was useless," he says. "It was clear that one thing that was needed was a transmission format, a way to send the data from one place to another."

Rather than limiting the system to the hardware at hand, Hamm decided to adopt an "archaic" file format that both relatively simple and sophisticated systems could read. "The decision I made was to reach backward in the history of computing to come up with a format," he says. "The idea was, if you had a very sophisticated environment, this format should be easy...

Shown here is a "very early sketch" of Hamm's proposed format. Each line of the file holds a specific kind of information, tagged with a two-character identifier, such as DT for date and FT for feature table. Some things have changed (this drawing doesn't include an accession number field, for one), but EMBL's data format (the first EMBL record, accession #X0001, is shown in the inset) remains remarkably close to what Hamm envisioned 25 years ago.
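Because every line carries its own two-character tag, a program can make sense of a record without much machinery, which fits Hamm's aim of a format that simple systems could read. The sketch below is an illustration only. It assumes the layout of the modern EMBL flat file (a two-character line code, three blanks, data from column 6, sequence lines with a blank code, and `//` ending the entry), which may differ in detail from the 1981 drawing. The `parse_record` function and the `SAMPLE_RECORD` text are hypothetical and are not the real entry shown in the inset.

```python
from collections import defaultdict

# Hypothetical record invented for illustration; NOT the real first EMBL entry.
# Assumed layout: 2-character line code, 3 blanks, data starting at column 6.
SAMPLE_RECORD = """\
ID   EXAMPLE1   standard; DNA; 12 BP.
DT   01-JAN-1982
DE   Illustrative sequence fragment.
FT   CDS             1..12
SQ   Sequence 12 BP;
     acgtacgtacgt
//
"""


def parse_record(text: str) -> dict[str, list[str]]:
    """Group each line's data under its two-character line code (hypothetical helper)."""
    fields: dict[str, list[str]] = defaultdict(list)
    current = None
    for line in text.splitlines():
        if line.startswith("//"):  # assumed end-of-entry marker
            break
        code, data = line[:2], line[5:].rstrip()
        if code.strip():  # a non-blank code starts a new line type
            current = code
        # Lines with a blank code (e.g. sequence data) continue the previous type.
        if current is not None:
            fields[current].append(data)
    return dict(fields)


if __name__ == "__main__":
    record = parse_record(SAMPLE_RECORD)
    print(record["DT"])
    print(record["SQ"])
```

Keying on the tag rather than on line position means a reader can skip line types it does not understand, which is one reason a tagged, line-oriented layout tolerates later additions such as an accession number field.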
