
Infographic: Writing with DNA

Researchers devise numerous strategies to encode information into nucleic acids.

Sep 30, 2017
Catherine Offord

To encode text alone, one approach is to convert each letter of the alphabet into a three-base code. Using three bases, such as A, C, and T, gives 3 × 3 × 3 = 27 combinations, enough for the English alphabet plus a space, with a code such as AAA = A, AAC = B, and so on (1 in graphic below). However, researchers often want to encode more than just text, so most current methods instead first translate data into binary code, the language of 1s and 0s used in electronic media. Using binary, the four bases of DNA could theoretically store up to two bits of information per nucleotide, with a code such as A = 00, C = 01, and so on (2).
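The two basic codes described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in any of the cited studies. The text gives only AAA = A, AAC = B and A = 00, C = 01; the ordering of the remaining codons and the assignments G = 10 and T = 11 are assumptions made here to complete the tables, and all function names are hypothetical.

```python
from itertools import product

# Method 1: three-base codes over A, C, and T (3**3 = 27 codons).
SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # 26 letters plus a space
# Codon order AAA, AAC, AAT, ACA, ... is an assumption beyond the first two.
CODONS = ["".join(p) for p in product("ACT", repeat=3)]
TEXT_TO_CODON = dict(zip(SYMBOLS, CODONS))
CODON_TO_TEXT = {codon: ch for ch, codon in TEXT_TO_CODON.items()}


def encode_text(message: str) -> str:
    """Map each letter or space to its three-base codon."""
    return "".join(TEXT_TO_CODON[ch] for ch in message.upper())


def decode_text(dna: str) -> str:
    """Read the strand three bases at a time and look up each codon."""
    return "".join(CODON_TO_TEXT[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Method 2: two bits per base. G = 10 and T = 11 are assumed here.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_bytes(data: bytes) -> str:
    """Write each byte as 8 bits, then each pair of bits as one base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_bytes(dna: str) -> bytes:
    """Turn each base back into 2 bits and regroup the bits into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

Under these assumptions, every byte becomes exactly four bases, which is the theoretical maximum of two bits per nucleotide. Nothing in either code prevents long runs of the same base.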

In reality, though, biochemical features of nucleic acids make some combinations of bases more desirable than others. Particularly problematic are homopolymers (long runs of the same nucleotide), which are difficult to write and read using current methods. One way to avoid homopolymers is to allocate two bases to each binary digit; long runs of the same digit can then be encoded by alternating between the two bases assigned to that digit (3). A more efficient method is to convert text or other data into a ternary code, which uses three digits (0, 1, and 2) rather than two, and then write bases so that no base is used twice in a row, for example by encoding 0, 1, and 2 as C, G, and T after an A, but as G, T, and A after a C (4). Newer methods include more complex codes, as well as error-correcting techniques, to pack as much information as possible into DNA while maximizing the accuracy of information retrieval.
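Both homopolymer-avoiding schemes can be sketched as follows. This is a simplified illustration under stated assumptions, not the published encoders. For the binary scheme, the pairing 0 = A or C and 1 = G or T and the strict alternation rule are assumptions. For the ternary scheme, the rows for "after A" and "after C" come from the text, while the rows for G and T and the starting reference base are extrapolated from the same rotation. All function names are hypothetical.

```python
from typing import List

# Scheme 3: two candidate bases per binary digit (assumed pairing).
BIT_CHOICES = {"0": "AC", "1": "GT"}
BASE_TO_BIT = {"A": "0", "C": "0", "G": "1", "T": "1"}


def encode_bits_alternating(bits: str) -> str:
    """Pick the first candidate base unless it would repeat the previous base."""
    strand: List[str] = []
    for bit in bits:
        first, second = BIT_CHOICES[bit]
        strand.append(second if strand and strand[-1] == first else first)
    return "".join(strand)


def decode_bits(dna: str) -> str:
    """Either base of a pair decodes to the same binary digit."""
    return "".join(BASE_TO_BIT[base] for base in dna)


# Scheme 4: ternary digits chosen relative to the previous base.
BASES = "ACGT"


def encode_trits(trits: List[int], previous: str = "A") -> str:
    """Encode digits 0-2 as one of the three bases that differ from the last one.

    After A: 0, 1, 2 -> C, G, T. After C: 0, 1, 2 -> G, T, A (as in the text).
    The rows for G and T continue the same rotation (an assumption here).
    """
    strand: List[str] = []
    for trit in trits:
        base = BASES[(BASES.index(previous) + 1 + trit) % 4]
        strand.append(base)
        previous = base
    return "".join(strand)


def decode_trits(dna: str, previous: str = "A") -> List[int]:
    """Invert encode_trits by measuring the rotation from the previous base."""
    trits: List[int] = []
    for base in dna:
        trits.append((BASES.index(base) - BASES.index(previous) - 1) % 4)
        previous = base
    return trits


def has_repeat(dna: str) -> bool:
    """True if any base appears twice in a row."""
    return any(a == b for a, b in zip(dna, dna[1:]))
```

The trade-off is density. The binary scheme stores one bit per base, whereas the ternary scheme stores log2(3), or about 1.58, bits per base while still guaranteeing that no base repeats.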

Sources for methods depicted: 1. Bancroft et al., 2001; 3. Church et al., 2012; 4. Goldman et al., 2013.

Storage Cycle

After an encoding method is chosen, researchers write the DNA message into a series of long oligonucleotides. In earlier methods, these fragments were each tagged with a unique address sequence to aid reassembly, as well as common flanking sequences that allow amplification by PCR (1). Newer methods incorporate selective retrieval of specific sections of stored data, known as random access, by combining the address and PCR sequences into unique codes on either side of every oligonucleotide. Appropriate primers allow researchers to select and amplify only a sequence of interest (2).
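The random-access idea can be modeled very roughly in code. In the sketch below, plain string matching stands in for primer binding and PCR amplification; real primers are longer, and the reverse primer binds the complementary strand, which this toy model ignores. The six-base codes, the file labels, and the function names are all hypothetical.

```python
from typing import List


def tag_oligo(payload: str, left_code: str, right_code: str) -> str:
    """Flank a payload with the unique codes that act as address and primer sites."""
    return left_code + payload + right_code


def select_by_primers(pool: List[str], left_code: str, right_code: str) -> List[str]:
    """Toy stand-in for PCR: keep only oligos flanked by the chosen codes.

    Returns the payloads with the flanking codes stripped off.
    """
    selected: List[str] = []
    for oligo in pool:
        long_enough = len(oligo) >= len(left_code) + len(right_code)
        if long_enough and oligo.startswith(left_code) and oligo.endswith(right_code):
            selected.append(oligo[len(left_code):len(oligo) - len(right_code)])
    return selected


# Hypothetical flanking codes for two stored items.
CODES = {
    "file_a": ("ACGTCA", "TGACTG"),
    "file_b": ("GATCGA", "CTAGCT"),
}

# A mixed pool, as it might sit in a single tube.
pool = [
    tag_oligo("CATGCATG", *CODES["file_a"]),
    tag_oligo("TACGTACG", *CODES["file_b"]),
    tag_oligo("GCATGCAT", *CODES["file_a"]),
]

# "Amplify" only the oligos that belong to file_a.
file_a_payloads = select_by_primers(pool, *CODES["file_a"])
```

In the earlier approach described above, every oligonucleotide would share the same flanking codes, so the whole pool would be amplified and the address sequence would be used only afterward to put the fragments in order.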

These oligonucleotides are synthesized and deposited in tiny test tubes or printed onto DNA microchips, which are stored in a cold, dry, dark place. When the message needs to be read, researchers rehydrate the sample and add primers corresponding to the addresses of the sequences of interest. The amplified product is then sequenced and decoded to retrieve the original message.

THE SCIENTIST STAFF

