Infographic: Writing with DNA

Researchers devise numerous strategies to encode information into nucleic acids.

Written by Catherine Offord

To encode text alone, one approach is to convert each letter of the alphabet into a three-letter code, that is, a triplet of bases. Using three bases, such as A, C, and T, gives 3 × 3 × 3 = 27 combinations, enough for the English alphabet plus a space, with a code such as AAA = A, AAC = B, and so on (1 in graphic below). However, researchers often want to encode more than just text, so most current methods instead first translate data into binary code, the language of 1s and 0s used in electronic media. Using binary, the four bases of DNA could theoretically store up to two bits of information per nucleotide, with a code such as A = 00, C = 01, and so on (2).
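Both schemes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article gives AAA = A, AAC = B and A = 00, C = 01, while the rest of each table (the ordering of the remaining triplets, the space as the 27th symbol, and G = 10, T = 11) is assumed here, and all function names are hypothetical.

```python
from itertools import product

# Scheme 1: three-base triplets over A, C, T give 3**3 = 27 codes.
# Assumed ordering: A-Z followed by a space, so AAA = A, AAC = B, ...
SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
TRIPLETS = ["".join(p) for p in product("ACT", repeat=3)]
TEXT_TO_DNA = dict(zip(SYMBOLS, TRIPLETS))
DNA_TO_TEXT = {triplet: ch for ch, triplet in TEXT_TO_DNA.items()}

# Scheme 2: two bits per nucleotide.
# A = 00 and C = 01 come from the article; G = 10 and T = 11 are assumed.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_text(text: str) -> str:
    """Map each letter or space to its three-base triplet."""
    return "".join(TEXT_TO_DNA[ch] for ch in text.upper())


def decode_text(dna: str) -> str:
    """Read the strand three bases at a time and look up each triplet."""
    return "".join(DNA_TO_TEXT[dna[i:i + 3]] for i in range(0, len(dna), 3))


def encode_bytes(data: bytes) -> str:
    """Write each byte as 8 bits, then map every 2 bits to one base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_bytes(dna: str) -> bytes:
    """Map each base back to 2 bits, then regroup the bits into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

In this sketch the triplet scheme spends three nucleotides per character and handles only letters and spaces, whereas the binary scheme spends four nucleotides per byte and can carry any kind of data.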

In reality, though, biochemical features of nucleic acids make some combinations of bases more desirable than others. Particularly problematic are homopolymers—long strands of the same nucleotide—which are difficult to write and read using ...

Meet the Author

Catherine Offord
After undergraduate research with spiders at the University of Oxford and graduate research with ants at Princeton University, Catherine left arthropods and academia to become a science journalist. She has worked in various guises at The Scientist since 2016. As Senior Editor, she wrote articles for the online and print publications, and edited the magazine’s Notebook, Careers, and Bio Business sections. She reports on subjects ranging from cellular and molecular biology to research misconduct and science policy. Find more of her work at her website.