Coding messages into DNA was first demonstrated in the 1980s, but the technology of the time allowed only one graphical symbol to be encoded. While that capacity has grown over the last three decades, the largest project to date, completed in 2010, managed just 7,920 bits of data, equating to approximately half a page of typed text. Using a novel technique, detailed today in Science, researchers at Harvard and Johns Hopkins Universities have now encoded a 53,000-word book into DNA, including 11 JPG images and one JavaScript program.

"Others have pointed out that DNA has certain advantages," said study co-author Sriram Kosuri. "But no one had really taken it to a level that we were able to code really useful amounts of information."

Those advantages include the density of information that can be stored: an estimate of maximum capacity predicts that one gram of single-strand DNA could store as...

To overcome such errors, the team assigned the bases A and C as 0s and G and T as 1s, so that a digital data stream could be written as a sequence of bases. The manuscript and its accompaniments were converted to HTML before being translated into the stream of 0s and 1s that could be written into the DNA sequence. The manuscript is a draft version of Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves, a book co-authored by one of the study's authors, George Church. The resulting stream was 5.27 megabits long, or 5.27 million 0s and 1s.
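The one-bit-per-base mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's software: the function names are hypothetical, and the rule used to pick between the two bases available for each bit (avoid repeating the previous base) is an assumption made here for illustration.

```python
import random

# Mapping stated in the article: A or C encodes 0, G or T encodes 1.
ZERO_BASES = "AC"
ONE_BASES = "GT"
BIT_OF_BASE = {"A": "0", "C": "0", "G": "1", "T": "1"}


def bytes_to_bits(data):
    """Turn raw bytes (for example an HTML file) into a string of 0s and 1s."""
    return "".join(format(byte, "08b") for byte in data)


def bits_to_dna(bits, rng):
    """Write each bit as one base.

    Assumption (not from the article): when two bases are allowed,
    prefer one that differs from the previous base, so the sequence
    never repeats the same base twice in a row.
    """
    bases = []
    for bit in bits:
        choices = ZERO_BASES if bit == "0" else ONE_BASES
        options = [b for b in choices if not bases or b != bases[-1]]
        bases.append(rng.choice(options))
    return "".join(bases)


def dna_to_bits(dna):
    """Read the bases back as bits; raises KeyError on an unknown base."""
    return "".join(BIT_OF_BASE[base] for base in dna)


def bits_to_bytes(bits):
    """Regroup a bit string (length divisible by 8) into bytes."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    rng = random.Random(0)
    message = b"<html>"
    dna = bits_to_dna(bytes_to_bits(message), rng)
    print(dna)
    print(bits_to_bytes(dna_to_bits(dna)))
```

Because two bases are available for every bit, many different DNA sequences decode to the same data, which leaves the encoder free to avoid sequences that are hard to synthesize or read.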

Previous methods have faced problems when trying to create whole streams in one long DNA sequence, a tricky and expensive process. The team's solution was to split the stream into smaller sections. They coded 96 bits per short nucleotide section, called an oligonucleotide, and gave each section a 19-bit "address" that fixes its position in the overall sequence. Each oligonucleotide was synthesized multiple times, so that upon reading, the copies could be compared and a consensus reading reached.
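The splitting scheme can be illustrated with a short Python sketch. It assumes that the 19-bit address is added in front of the 96 data bits (the article does not say whether the address is counted within the 96 bits) and that the final block is padded with 0s. The function names are hypothetical.

```python
ADDRESS_BITS = 19  # address length given in the article
DATA_BITS = 96     # data bits per oligonucleotide given in the article


def split_into_blocks(bits):
    """Cut a bit string into addressed blocks of ADDRESS_BITS + DATA_BITS bits."""
    blocks = []
    for index, start in enumerate(range(0, len(bits), DATA_BITS)):
        if index >= 2 ** ADDRESS_BITS:
            raise ValueError("stream too long for a 19-bit address")
        # Assumption: the last block is padded with 0s to full length.
        payload = bits[start:start + DATA_BITS].ljust(DATA_BITS, "0")
        address = format(index, "0{}b".format(ADDRESS_BITS))
        blocks.append(address + payload)
    return blocks


def reassemble(blocks, total_bits):
    """Order blocks by address, strip the addresses, and drop the padding."""
    ordered = sorted(blocks, key=lambda block: int(block[:ADDRESS_BITS], 2))
    stream = "".join(block[ADDRESS_BITS:] for block in ordered)
    return stream[:total_bits]


if __name__ == "__main__":
    bits = "10" * 500                      # a 1,000-bit toy stream
    blocks = split_into_blocks(bits)
    shuffled = list(reversed(blocks))      # blocks can be read in any order
    print(len(blocks), reassemble(shuffled, len(bits)) == bits)
```

A 19-bit address can label 2^19 = 524,288 blocks, comfortably more than the roughly 55,000 blocks of 96 bits needed for a stream of about 5.27 million bits.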

"It's similar to the way that when you sequence the human genome, you don't sequence it once, you sequence it at 30 or 50 times coverage, and you just take consensus at each position," said Kosuri.

After synthesizing the sequence and attaching drops of DNA to microarray chips, the data was stored at 4 degrees Celsius for 3 months before being dissolved in water, amplified by PCR, and sequenced. By storing multiple copies, and sequencing each copy many times to reach consensus, the team managed to decode the entire 5.27-million-bit sequence with only 10 bit errors.
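Taking a consensus at each position amounts to a majority vote across the reads of one oligonucleotide. The sketch below assumes the reads are already aligned, have equal length, and differ only by substitutions. Real sequencing pipelines also have to handle insertions, deletions, and alignment, which are left out here. The function name is hypothetical.

```python
from collections import Counter


def consensus(reads):
    """Return the most common base at each position across aligned reads."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned and of equal length")
    result = []
    for column in zip(*reads):
        # most_common(1) returns the base with the highest count; on a tie,
        # the base seen first in the column wins.
        base, _count = Counter(column).most_common(1)[0]
        result.append(base)
    return "".join(result)


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "ACTTAC", "ACGTAC", "GCGTAC"]
    print(consensus(reads))
```

With five reads, an error at a given position is outvoted unless it occurs in at least three of them, which is why higher coverage leaves fewer residual errors.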

"They've come up with a very clever way of managing error in the creation of the information," said synthetic biologist Steven Benner at the Foundation for Applied Molecular Evolution, who was not involved in the study. "[The authors] provide some clever ways to get around the problems, allowing the reading of the minority molecules containing the desired information amid the larger numbers of molecules that do not."

While DNA storage is not re-writable, and not intended to replace your hard drive, the idea of long-term storage of large amounts of data in a very small space has advantages for archiving records and data. In contrast to a flat disc like a CD, with data only inscribed on the surface, a sheet of DNA has data stored throughout its thickness. The major remaining challenge is the cost and efficiency of today’s synthesizing and sequencing technologies, which currently make this system impractical for regular use. As sequencing costs continue to drop and technologies continue to advance, however, such DNA storage strategies may soon become much more practical.

Another challenge that must be overcome is preservation. DNA can still be sequenced from dried mummies thousands of years old, but such sequences are rarely complete.

"The chemistry of DNA does not easily lend itself to century-scale passive, unpackaged archives," said Benner. "However, this paper should encourage people to tackle the challenges of molecule-based information storage, given its potential for very high density storage."

G. Church et al., "Next-generation digital information storage in DNA," Science, DOI: 10.1126/science.1226355, 2012.
