“The work demonstrates how to distinguish somatic variants from those due to DNA preparation damage,” Stanford University’s Stephen Montgomery, who was not involved with the work, wrote in an email to The Scientist. “The benefits of this [include] reduced false positives . . . in discovery-based cancer genome projects,” he added.
It is well known that DNA samples extracted from ancient specimens or from formalin-fixed, paraffin-embedded tissues are prone to fragmentation and chemical modification, which can produce mutations that did not exist in the living organism. But recent evidence suggests that, in fact, any DNA sample may be at risk of such artificial mutagenic damage. DNA sonication, the use of sound energy to shear DNA into fragments in preparation for amplification and sequencing, is known to induce oxidative damage that introduces mutations.
Such mutations occur only rarely within a sample and so, in many instances, are not problematic. But in cancer biology, explained molecular oncologist Marc Ladanyi of Memorial Sloan Kettering Cancer Center in New York, who was not involved with the work, “there is an increasing emphasis on [identifying] sub-clonal mutations [as well as] detecting mutations in free tumor DNA in the plasma,” both of which may be present in only a very small proportion of cells in the sample.
“[When dealing with variants at] such low allele frequencies, this artifact is a genuine concern,” Ladanyi said, “and the paper is a good reminder that the artifact needs to be guarded against.”
Laurence Ettwiller and fellow researchers at New England Biolabs (NEB) in Ipswich, Massachusetts, have now devised an algorithm that calculates the extent of such damage in a sequenced DNA sample. The algorithm makes use of the fact that the oxidative damage of DNA during sonication converts guanine to 8-oxoguanine, which appears and acts like a thymine during sequencing reads. When the sequencing reads of the two complementary strands are compared, these converted guanines can be spotted as mismatches: one strand reads out thymine, but the complementary strand reveals a partnered cytosine (which pairs with guanine). Naturally occurring guanine-to-thymine variants, on the other hand, would have thymine’s natural partner adenine. The algorithm thus compares the first and second sequencing reads to reveal the degree of mismatched (or imbalanced) thymines and thereby determine the amount of damage.
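The read-comparison idea can be illustrated with a short Python sketch. This is a simplified illustration, not the authors' implementation: it assumes each read has already been aligned and reduced to (reference base, observed base) pairs expressed in the read's own orientation, and the function names `g_to_t_rate` and `imbalance_score` are hypothetical. The score is the ratio of the guanine-to-thymine mismatch rate in the first reads to that in the second reads, so a value near 1 suggests balance and a larger value suggests an excess on one side.

```python
def g_to_t_rate(pairs):
    """Fraction of reference-G positions where the read shows T.

    `pairs` is a list of (reference_base, observed_base) tuples,
    an assumed input format chosen for this illustration.
    """
    observed_at_g = [obs for ref, obs in pairs if ref == "G"]
    if not observed_at_g:
        raise ValueError("no reference G positions covered")
    return sum(1 for obs in observed_at_g if obs == "T") / len(observed_at_g)


def imbalance_score(read1_pairs, read2_pairs):
    """Ratio of the G-to-T rate in first reads to that in second reads."""
    rate1 = g_to_t_rate(read1_pairs)
    rate2 = g_to_t_rate(read2_pairs)
    if rate2 == 0:
        # No G-to-T mismatches in the second reads: treat as balanced
        # only if the first reads have none either.
        return 1.0 if rate1 == 0 else float("inf")
    return rate1 / rate2


# Toy data: 4 of 16 G sites read as T in read 1, 2 of 16 in read 2.
read1 = [("G", "T")] * 4 + [("G", "G")] * 12
read2 = [("G", "T")] * 2 + [("G", "G")] * 14
print(imbalance_score(read1, read2))  # 2.0
```

A true variant should contribute equally to both sets of reads and leave the ratio close to 1, whereas damage confined to one strand pushes it away from 1.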
When applied to sequences in the 1000 Genomes and The Cancer Genome Atlas databases, the algorithm—called Global Imbalance Value (GIV)—determined that 41 percent of the 1000 Genomes datasets had an imbalance score indicative of damage, while 73 percent of those in The Cancer Genome Atlas showed extensive damage.
“The damage is more prevalent than we would have expected,” said NEB’s Thomas Evans, who co-authored the study. Such errors would be likely to confound the identification of true low-frequency somatic variants, he said.
On a positive note, said Ettwiller, “one thing that people can do is to look at samples that they have and flag ones that are too damaged”—essentially, use the GIV algorithm, which is freely available on GitHub, as a quality control step. The GIV score of a sample could also be used as a guide to set stringent thresholds for identifying potentially genuine low-frequency variants.
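As a rough illustration of that quality-control step, the Python sketch below flags samples whose imbalance score exceeds a cutoff. The function name `flag_damaged`, the sample names, and the default cutoff of 1.5 are assumptions made for this example; an appropriate threshold would depend on the study and should be taken from the published method or the tool's documentation.

```python
def flag_damaged(giv_by_sample, threshold=1.5):
    """Return the names of samples whose score exceeds the cutoff.

    `giv_by_sample` maps a sample name to its imbalance score.
    The default threshold is an assumed value for illustration only.
    """
    return sorted(
        name for name, score in giv_by_sample.items() if score > threshold
    )


# Hypothetical scores for three samples.
scores = {"sample_A": 1.02, "sample_B": 2.40, "sample_C": 1.51}
print(flag_damaged(scores))  # ['sample_B', 'sample_C']
```

Flagged samples could then be excluded, or analyzed with a stricter allele-frequency threshold for low-frequency variants.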
In addition, the authors suggest a way to rectify the damage before sequencing takes place. When a mix of DNA repair enzymes was added to the DNA sample during preparation, the oxidation damage was fixed, they reported.
“[The paper] provides a technical solution, which is repairing the DNA with this enzyme cocktail,” said Ladanyi. But, he noted, “the authors are from NEB and the solution to the problem is to use the NEB repair kit, so there’s an intrinsic conflict of interest.”
To that point, Ettwiller said that while the team did use NEB enzymes to fix their own damaged DNA samples, they are not asserting it would work for all DNA preparations.
“We do sell that mix for repairing DNA upstream of sequencing, but we don’t want to make any grandiose claims; that’s not how NEB works,” Evans said. “We’re continuing to evaluate it.”
L. Chen et al., “DNA damage is a pervasive cause of sequencing errors, directly confounding variant identification,” Science, 355: 752-56, 2017.