Duke University researchers have created a new algorithm to determine the least-repetitive DNA sequence that can encode any given protein, according to a paper published last week (January 4) in Nature Materials. The algorithm, which is freely available online, should allow researchers to more easily experiment with biopolymers and other structurally repetitive polypeptides.

“Repetitive proteins can have important structural and functional properties of interest to materials science and biomedicine, and being able to easily study them and build many variants is quite exciting,” Daniel Goodman, a synthetic biologist at the Wyss Institute for Biologically Inspired Engineering at Harvard University who was not involved in the study, wrote in an email to The Scientist.

“We’re providing a new tool to empower people to do new and cool polymer science using amino acid building blocks,” said study coauthor Ashutosh Chilkoti, a professor of biomedical...

Our cars, homes, and workplaces are filled with polymers—often plastics. “The polymers that have changed the world have largely been synthetic,” Chilkoti said. New biopolymers, encoded genetically, will be biodegradable and nontoxic. And biopolymers can have even more-precise repeating structures than synthetic polymers. “Biology can make a perfect polymer,” said Chilkoti. In addition, the algorithm could be useful for creating other proteins used in molecular biology that have highly repetitive structures, such as gene-editing TALENs.

Repetitive polypeptides can be difficult to produce. Synthetic biologists can produce polypeptides most efficiently by first building and amplifying their novel DNA sequence and then inserting it into workhorse organisms such as Escherichia coli and yeast. Producing the proper genetic code involves synthesizing multiple short DNA sequences, piecing them together, and amplifying them. Repetitive sequences can easily stick to one another in the wrong order. And during PCR, repetitive single-stranded DNA fragments can anneal to each other out of register. Often, that results in “garbage,” Chilkoti said. “What you’d get is a smear on a gel.”

Sixty-one codons encode cells’ 20 amino acids. Researchers have long realized that they can make the sequences encoding repetitive proteins less repetitive by using a mixture of different codons for each repeated amino acid in the protein. Chilkoti suggested to Duke graduate student Nicholas Tang that he could try to find a mathematical method to figure out the least repetitive way to genetically encode any protein.
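To make the codon arithmetic concrete, here is a minimal Python sketch of the long-standing trick of mixing synonymous codons. It is not the algorithm from the paper: the round-robin rotation in `encode_rotating`, the helper names, and the elastin-like test repeat `VPGVG` are all assumptions chosen for illustration. Only the standard genetic code table is taken as given.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group the sense codons by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS.setdefault(aa, []).append(codon)


def encode_fixed(protein):
    """Always use the first listed codon for each amino acid."""
    return "".join(SYNONYMS[aa][0] for aa in protein)


def encode_rotating(protein):
    """Hypothetical naive scheme: cycle through synonymous codons in turn."""
    uses = {}
    out = []
    for aa in protein:
        options = SYNONYMS[aa]
        out.append(options[uses.get(aa, 0) % len(options)])
        uses[aa] = uses.get(aa, 0) + 1
    return "".join(out)


def translate(dna):
    """Translate a coding sequence back into its amino acid sequence."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def longest_repeat(dna):
    """Length of the longest substring that occurs at least twice."""
    best = 0
    for k in range(1, len(dna)):
        seen = set()
        repeated = False
        for i in range(len(dna) - k + 1):
            piece = dna[i:i + k]
            if piece in seen:
                repeated = True
                break
            seen.add(piece)
        if not repeated:
            break  # no repeat of length k means none of any greater length
        best = k
    return best


protein = "VPGVG" * 4  # illustrative elastin-like repeat (an assumption)
fixed = encode_fixed(protein)
mixed = encode_rotating(protein)
print(sum(len(v) for v in SYNONYMS.values()), "sense codons for",
      len(SYNONYMS), "amino acids")
print("longest repeat, fixed codons:", longest_repeat(fixed))
print("longest repeat, mixed codons:", longest_repeat(mixed))
```

Both DNA strings translate to the same protein, but the mixed version contains much shorter repeated stretches. A simple rotation like this still leaves repeats behind, which is why a systematic search for the least repetitive encoding is worth having.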

“He had this great insight: figuring out it’s the traveling salesman problem,” Chilkoti said. In the traveling salesman problem, a salesman attempts to find the most efficient route through a variety of cities, beginning and ending in the same city. The goal is to avoid unnecessary repetition, just as Chilkoti and Tang’s goal was to avoid unnecessarily using the same codons repeatedly. Adapting mathematical concepts that had already been applied to the traveling salesman class of problems, the researchers created an algorithm that would allow synthetic biologists to input an amino acid sequence and receive an optimized DNA sequence.
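For readers unfamiliar with the traveling salesman problem, the following Python sketch solves a tiny instance by brute force. It illustrates the classic problem only; it does not show how the researchers mapped codon choice onto it, which the article does not describe. The city coordinates and function names are hypothetical.

```python
from itertools import permutations
from math import dist


def tour_length(order, cities):
    """Total length of a closed tour visiting cities in the given order."""
    n = len(order)
    return sum(dist(cities[order[i]], cities[order[(i + 1) % n]])
               for i in range(n))


def shortest_tour(cities):
    """Try every ordering that starts at city 0 and keep the shortest."""
    candidates = ((0,) + rest for rest in permutations(range(1, len(cities))))
    best = min(candidates, key=lambda order: tour_length(order, cities))
    return best, tour_length(best, cities)


# Four hypothetical cities at the corners of a unit square.
cities = [(0, 0), (1, 1), (0, 1), (1, 0)]
order, length = shortest_tour(cities)
print(order, length)
```

Brute force works only for a handful of cities because the number of possible tours grows factorially, so practical solvers rely on heuristics and optimization techniques developed for this class of problems.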

To test their algorithm, the researchers selected 19 repetitive proteins commonly used in materials science and biology, including the protein mussels use to adhere to surfaces, and artificial polypeptides inspired by silks and elastins. Chilkoti and Tang ran the amino acid sequences of their proteins through their algorithm. They then produced DNA sequences encoding each protein either in-house or by sending their sequences to the companies Gen9 or Genscript. The researchers found that they were able to harness E. coli to produce proteins using all of the DNA sequences they had produced.

Previously, researchers hoping to produce repetitive proteins would have needed to arduously assemble their DNA in-house. But using the algorithm, other scientists should be able to cheaply hire companies to make their DNA sequences.

One remaining concern is that proteins with substituted codons may not be optimally expressed by model organisms. “In general, such an approach should be very useful and solve many problematic cases in the field,” Tamir Tuller of Tel Aviv University, who was not involved in the work, wrote in an email to The Scientist. “However, it is important to remember that the nucleotide composition of a coding sequence can affect various phenomena including gene expression, protein folding, protein aggregation, and organismal (host) fitness.”

Some organisms may prefer one synonymous codon over another during the transcription or translation processes, for reasons that are not yet fully understood. But Chilkoti noted that scrambling codons did not appear to harm expression of the genes that he and Tang tested.

Synthetic biologists would find it useful to synthesize and test large numbers of novel proteins with only slight variations between them, Goodman said. But for this high-throughput approach, efficiency will be of the essence. “What I’m really excited about is the ability to synthesize lots of stuff very cheaply,” said Goodman.

N.C. Tang, A. Chilkoti, “Combinatorial codon scrambling enables scalable gene synthesis and amplification of repetitive proteins,” Nature Materials, doi:10.1038/nmat4521, 2016.
