DNA Data Storage

Researchers code a book into DNA, demonstrating the possibility of using the biological molecule for long-term data storage.

By | August 16, 2012

Coding messages into DNA was first demonstrated in the 1980s, but technology at the time would only allow one graphical symbol to be encoded. While that capacity has grown over the last three decades, the largest project to date, completed in 2010, managed just 7,920 bits of data, equating to approximately half a page of typed text. Using a novel technique, detailed today in Science, researchers at Harvard and Johns Hopkins Universities have now encoded a 53,000-word book into DNA, including 11 JPG images and one JavaScript program.

"Others have pointed out that DNA has certain advantages," said study co-author Sriram Kosuri. "But no one had really taken it to a level that we were able to code really useful amounts of information."

Those advantages include the density of information that can be stored: an estimate of maximum capacity predicts that one gram of single-strand DNA could store as much as an exabyte (10^18 bytes) of data. However, synthesizing and sequencing DNA are inherently error-prone. Synthetic DNA typically has one incorrect nucleotide in every 70, and next-generation sequencing techniques can make many mistakes when interpreting the stored data.

To overcome such errors, the team assigned the bases A and C as 0s, and G and T as 1s, creating a digital data stream. The manuscript and its accompaniments (a draft version of Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves, a book co-authored by one of the study's authors, George Church) were converted to HTML before being translated into the stream of 0s and 1s that could be written into the DNA sequence. The resulting stream was 5.27 megabits long, or 5.27 million 0s and 1s.
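The bit-to-base mapping described above can be sketched in a few lines of Python. This is only an illustration, not the study's software: the function names are hypothetical, and the rule for choosing between the two candidate bases (avoid repeating the previous base) is an assumption, since the article does not say how that choice was made.

```python
# Mapping taken from the article: A or C encodes 0, G or T encodes 1.
BASES_FOR_BIT = {"0": "AC", "1": "GT"}
BIT_FOR_BASE = {"A": "0", "C": "0", "G": "1", "T": "1"}


def text_to_bits(text: str) -> str:
    """Turn text into a string of 0s and 1s (8 bits per UTF-8 byte)."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_text(bits: str) -> str:
    """Inverse of text_to_bits."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def bits_to_dna(bits: str) -> str:
    """Write each bit as one base.

    Assumption (not from the article): of the two bases allowed for a bit,
    pick the one that differs from the previous base, so the same base
    never appears twice in a row.
    """
    bases = []
    for bit in bits:
        first, second = BASES_FOR_BIT[bit]
        bases.append(second if bases and bases[-1] == first else first)
    return "".join(bases)


def dna_to_bits(dna: str) -> str:
    """Read bases back as bits; either base of a pair decodes to the same bit."""
    return "".join(BIT_FOR_BASE[base] for base in dna)
```

With this sketch, `bits_to_dna("0011")` gives `"ACGT"`, and decoding with `dna_to_bits` recovers the original bits no matter which base of each pair was written.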

Previous methods have faced problems when trying to create whole streams in one long DNA sequence, a tricky and expensive process. The team's solution was to split the stream into smaller sections. They coded 96 bits per short nucleotide section, called an oligonucleotide, each of which contained a 19-bit "address" to order the information in the overall sequence. Each oligonucleotide was synthesized multiple times, so that upon reading, the copies could be compared and a consensus reading could be reached.
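The splitting scheme can be sketched as follows. The layout is an assumption made for illustration: the 19-bit address is placed in front of a 96-bit payload, and the final block is padded with zeros. The article gives the two bit counts but not how they were arranged, and the function names are hypothetical.

```python
PAYLOAD_BITS = 96   # data bits per oligonucleotide, from the article
ADDRESS_BITS = 19   # address bits per oligonucleotide, from the article


def split_into_blocks(bits: str) -> list[str]:
    """Cut a bit stream into addressed blocks (address first, then payload)."""
    blocks = []
    for index, start in enumerate(range(0, len(bits), PAYLOAD_BITS)):
        if index >= 2 ** ADDRESS_BITS:
            raise ValueError("stream needs more blocks than the address can count")
        # Assumption: pad the final, shorter block with zeros.
        payload = bits[start:start + PAYLOAD_BITS].ljust(PAYLOAD_BITS, "0")
        address = format(index, f"0{ADDRESS_BITS}b")
        blocks.append(address + payload)
    return blocks


def reassemble(blocks: list[str], total_bits: int) -> str:
    """Sort blocks by address, strip the addresses, and drop the padding."""
    ordered = sorted(blocks, key=lambda block: int(block[:ADDRESS_BITS], 2))
    return "".join(block[ADDRESS_BITS:] for block in ordered)[:total_bits]
```

Because every block carries its own address, the blocks can be read back in any order. A stream of roughly 5.27 million bits needs about 54,900 blocks of 96 bits, well within the 524,288 values a 19-bit address can represent.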

"It's similar in the way that when you sequence the human genome, you don't sequence it once, you sequence it at 30 or 50 times coverage, and you just take consensus at each position," said Kosuri.

After synthesizing the sequence and attaching drops of DNA to microarray chips, the data was stored at 4 degrees Celsius for 3 months before being dissolved in water, amplified by PCR, and sequenced. By storing multiple copies, and sequencing each copy many times to reach consensus, the team managed to decode the entire 5.27-million-bit sequence with only 10 bit errors.
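The consensus step can be illustrated with a simple majority vote at each position. This sketch assumes the reads of one oligonucleotide are already aligned and of equal length; real sequencing data also contains insertions and deletions, which it does not handle. The function name is hypothetical.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Return the most common base at each position across aligned reads."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to the same length")
    # zip(*reads) yields one tuple of bases per position (one "column").
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )
```

For example, `consensus(["ACGT", "ACGA", "TCGT"])` returns `"ACGT"`: each of the two erroneous bases is outvoted by the two correct reads at that position.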

"They've come up with a very clever way of managing error in the creation of the information," said synthetic biologist Steven Benner at the Foundation for Applied Molecular Evolution, who was not involved in the study. "[The authors] provide some clever ways to get around the problems, allowing the reading of the minority molecules containing the desired information amid the larger numbers of molecules that do not."

While DNA storage is not rewritable and is not intended to replace your hard drive, long-term storage of large amounts of data in a very small space has advantages for archiving records and data. In contrast to a flat disc like a CD, with data inscribed only on the surface, a sheet of DNA has data stored throughout its thickness. The major remaining challenge is the cost and efficiency of today’s synthesizing and sequencing technologies, which currently make this system impractical for regular use. As sequencing costs continue to drop and technologies continue to advance, however, such DNA storage strategies may soon become much more practical.

Another challenge that must be overcome is preservation. DNA can still be sequenced from dried mummies thousands of years old, but such sequences are rarely complete.

"The chemistry of DNA does not easily lend itself to century-scale passive, unpackaged archives," said Benner. "However, this paper should encourage people to tackle the challenges of molecule-based information storage, given its potential for very high density storage."

G. Church et al., "Next-generation digital information storage in DNA," Science, DOI: 10.1126/science.1226355, 2012.

Comments

Avatar of: mortonkurzweil

mortonkurzweil

Posts: 1

August 17, 2012

Is this the First Book of Moremen?

Avatar of: EllenHunt

EllenHunt

Posts: 74

August 20, 2012

Oh, for god's sake how ridiculous. Anybody could "code" anything into DNA for a long time. Just define a sequence and send it out for synthesis. It's a lab curiosity of zero utility. You can't read it back in any reasonable time period. We can't read a single strand of DNA with any reliability. You can't synthesize the strand in any reasonable time period. So some grad student got himself a paper or two and some lab managed to snag funding for that silly "project." This is a meaningless waste of time.
