Abundant Sequence Errors in Public Databases

A new algorithm reveals hordes of preparation-induced DNA mutations in publicly available human sequences.

Feb 16, 2017
Ruth Williams

 

Some sequence variants found in DNA specimens may actually be caused by damage during sample processing, according to a paper in Science today (February 16). A team of researchers at New England Biolabs (NEB) has devised an algorithm for assessing the degree of such damage, and suggests that using DNA repair enzymes during sample preparation might rectify the problem.

“The work demonstrates how to distinguish somatic variants from those due to DNA preparation damage,” Stanford University’s Stephen Montgomery, who was not involved with the work, wrote in an email to The Scientist. “The benefits of this [include] reduced false positives . . . in discovery-based cancer genome projects,” he added.

It is well known that DNA samples extracted from ancient specimens or from formalin-fixed, paraffin-embedded tissues are prone to fragmentation and chemical modification, which can produce mutations that did not exist in the living organism. But recent evidence suggests that, in fact, any DNA sample may be at risk of such artificial mutagenic damage. DNA sonication—the use of sound energy to agitate the DNA fragments in preparation for amplification and sequencing—is known to induce oxidative damage that introduces mutations.

Such mutations occur only rarely within a sample and so, in many instances, are not problematic. But in cancer biology, explained molecular oncologist Marc Ladanyi of Memorial Sloan Kettering Cancer Center in New York, who was not involved with the work, “there is an increasing emphasis on [identifying] sub-clonal mutations [as well as] detecting mutations in free tumor DNA in the plasma,” both of which may be present in only a very small proportion of cells in the sample.

“[When dealing with variants at] such low allele frequencies, this artifact is a genuine concern,” Ladanyi said, “and the paper is a good reminder that the artifact needs to be guarded against.”

Laurence Ettwiller and fellow researchers at NEB in Ipswich, Massachusetts, have now devised an algorithm that calculates the extent of such damage in a sequenced DNA sample. The algorithm makes use of the fact that oxidative damage to DNA during sonication converts guanine to 8-oxoguanine, which is read as a thymine during sequencing. When the sequencing reads of the two complementary strands are compared, these converted guanines show up as mismatches: one strand reads out thymine, but the complementary strand reveals a partnered cytosine (which pairs with guanine). Naturally occurring guanine-to-thymine variants, on the other hand, would have thymine’s natural partner, adenine, on the complementary strand. The algorithm thus compares the first and second sequencing reads to measure the degree of mismatching (or imbalanced) thymines and so determine the amount of damage.
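The read-comparison idea described above can be illustrated with a short Python sketch. This is not the NEB team's code. It assumes a simplified input in which each observation is a hypothetical `(read_number, ref_base, read_base)` tuple, with bases given in the orientation of the read, and it treats the imbalance as the ratio of the guanine-to-thymine rate in first reads to that in second reads. The function name `gt_imbalance` and the input layout are invented for illustration.

```python
from collections import Counter


def gt_imbalance(observations):
    """Return the ratio of the G>T rate in read 1 to the G>T rate in read 2.

    `observations` is an iterable of (read_number, ref_base, read_base)
    tuples, where read_number is 1 or 2. This layout is an assumption made
    for the sketch, not the format used by the published tool.
    A ratio near 1 suggests balanced (likely genuine) variants; a ratio well
    above 1 suggests an excess of G>T calls confined to one read.
    """
    g_total = Counter()  # reference-G positions seen, per read number
    g_to_t = Counter()   # of those, positions read as T, per read number

    for read_number, ref_base, read_base in observations:
        if ref_base != "G":
            continue
        g_total[read_number] += 1
        if read_base == "T":
            g_to_t[read_number] += 1

    # rate_r1 / rate_r2, written as a cross-product to avoid rounding noise
    numerator = g_to_t[1] * g_total[2]
    denominator = g_to_t[2] * g_total[1]
    if denominator == 0:
        # No G>T in read 2 (or no data): unbounded if read 1 has any, else neutral
        return float("inf") if numerator > 0 else 1.0
    return numerator / denominator


# Toy data: 100 reference-G positions per read; 4 read as T in read 1, 1 in read 2
toy = (
    [(1, "G", "T")] * 4 + [(1, "G", "G")] * 96
    + [(2, "G", "T")] * 1 + [(2, "G", "G")] * 99
)
print(gt_imbalance(toy))
```

In this toy input the first reads carry four times the guanine-to-thymine rate of the second reads, the kind of one-sided excess the article attributes to damage rather than to true variants.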

When applied to sequences in the 1000 Genomes and The Cancer Genome Atlas databases, the algorithm—called Global Imbalance Value (GIV)—determined that 41 percent of the 1000 Genomes datasets had an imbalance score indicative of damage, while 73 percent of those in The Cancer Genome Atlas showed extensive damage.

“The damage is more prevalent than we would have expected,” said NEB’s Thomas Evans, who co-authored the study. Such errors would be likely to confound the identification of true low-frequency somatic variants, he said.

On a positive note, said Ettwiller, “one thing that people can do is to look at samples that they have and flag ones that are too damaged”—essentially, use the GIV algorithm, which is freely available on GitHub, as a quality control step. The GIV score of a sample could also be used as a guide to set stringent thresholds for identifying potentially genuine low-frequency variants.
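As a rough illustration of the quality-control step Ettwiller describes, the following Python sketch flags samples whose imbalance score exceeds a cutoff. The function name `flag_damaged`, the sample names, and the default cutoff of 1.5 are assumptions chosen for the example. The article does not state a threshold, so the value should be set according to the tool's own guidance.

```python
def flag_damaged(scores, threshold=1.5):
    """Return the names of samples whose score is strictly above `threshold`.

    `scores` maps a sample name to its imbalance score. The default
    threshold is an illustrative assumption, not a value from the article.
    """
    return sorted(name for name, score in scores.items() if score > threshold)


# Hypothetical scores for three samples
example_scores = {"sample_a": 1.02, "sample_b": 2.30, "sample_c": 1.50}
print(flag_damaged(example_scores))
```

Flagged samples could then be excluded, or analyzed with stricter variant-calling thresholds, before any search for low-frequency variants.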

In addition, the authors suggest a way to rectify the damage before sequencing takes place. When a mix of DNA repair enzymes was added to the DNA sample during preparation, the oxidation damage was fixed, they reported.

“[The paper] provides a technical solution, which is repairing the DNA with this enzyme cocktail,” said Ladanyi. But, he noted, “the authors are from NEB and the solution to the problem is to use the NEB repair kit, so there’s an intrinsic conflict of interest.”

To that point, Ettwiller said that while the team did use NEB enzymes to fix their own damaged DNA samples, they are not asserting it would work for all DNA preparations.

“We do sell that mix for repairing DNA upstream of sequencing, but we don’t want to make any grandiose claims; that’s not how NEB works,” Evans said. “We’re continuing to evaluate it.”

L. Chen et al., “DNA damage is a pervasive cause of sequencing errors, directly confounding variant identification,” Science, 355: 752-56, 2017.
