DNA from diverse species, including bacteria, plants, and humans, contaminates nearly every sample sent through a next-generation sequencer, according to a study published today (October 29) in PLOS ONE.
DNA contamination is a known problem, illuminated by the use of today’s uber-sensitive tools and techniques, which can detect, amplify, and sequence even a single molecule of the nucleic acid. This latest analysis finds that the degree of contamination is inversely related to the starting concentration of a sample, demonstrates that “blank” controls may not be sufficient, and makes the case that a recent study claiming to have found evidence of food-derived DNA in the human bloodstream may have been the result of contamination.
Together with other recent documentations of the widespread nature of DNA contamination, this study suggests “that these really eye-popping kind of papers that tell us there are foreign sequences in everything—that our nucleic acids are this mishmash of everything that we interact with in the environment—they may very well be overstating the point,” said Ken Witwer, a molecular biologist at Johns Hopkins Medicine who was not involved in the work.
“I believe this issue of possible DNA contamination in next-gen sequencing experiments is an extremely relevant issue that could explain a variety of unexpected results,” agreed sequencing pioneer Leroy Hood, president of the Institute for Systems Biology in Seattle, Washington, in an e-mail. “It is essential that proper precautions be taken in these experiments.”
Geneticist Richard Lusk of the University of Michigan became interested in the consequences of DNA contamination when he was asked to review a study that supposedly documented the presence of food-derived DNA in the human bloodstream. He raised his concerns about possible contamination at the time, but ultimately got word that the study was accepted for publication in PLOS ONE.
“I thought this could be a paper that would be alarming to quite a few people . . . and it didn’t seem like it was correct,” Lusk told The Scientist. So he set out to demonstrate what he considered a much simpler explanation for the findings.
He started by identifying four datasets of what might be considered “clean” DNA information, such as that derived from individually washed cells. In each, he carefully analyzed the reported sequences and found that DNA contamination was rampant, with molecules from all sorts of organisms turning up in the data. Negative-control libraries prepared from “blank” samples (empty Eppendorf tubes treated the same as those containing real samples) did recover the most common contaminants, meaning they could be used to eliminate matching sequences from the test samples, but the controls often missed low-frequency contaminants. Samples with low starting concentrations were most susceptible to contamination, because they required greater amplification.
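The blank-control logic described above can be sketched in a few lines of Python. This is only an illustration, not the method used in the study: the function name `subtract_blank`, the taxon labels, and every read count are hypothetical. It shows how removing anything seen in the blank clears the common contaminants while a rare contaminant that never appeared in the blank slips through.

```python
def subtract_blank(sample_counts, blank_counts):
    """Drop every taxon that also appears in the blank control.

    Both arguments map a taxon label to a read count. This is a
    deliberately crude filter: presence in the blank is enough to discard.
    """
    return {
        taxon: reads
        for taxon, reads in sample_counts.items()
        if taxon not in blank_counts
    }


# Hypothetical read counts, invented for illustration only.
sample = {"mouse": 9500, "E. coli": 300, "human": 40, "wheat": 3}
blank = {"E. coli": 120, "human": 15}  # common contaminants show up here

filtered = subtract_blank(sample, blank)
print(filtered)  # the rare "wheat" reads are still present after filtering
```

In this toy case the wheat reads survive the filter simply because the blank library never sampled them, which is the failure mode the study describes for low-frequency contaminants.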
In order to specifically test his theory that the PLOS ONE results he had reviewed could be explained by contamination, Lusk selected “datasets where I could be almost positive [had] nothing to do with food,” he said. Sure enough, in each dataset, “I could find the same [sequences] that the [authors of the PLOS ONE] paper found,” Lusk noted. To him, it was the confirmation he’d been looking for: there was an alternative explanation for the results pointing to the presence of food-derived DNA in the human bloodstream.
The lead author of that paper, Sándor Spisák, who’s now at the Dana-Farber Cancer Institute in Boston, stands by his team’s results, however. “The presences of foreign molecules are not enough to conclude or predict their source,” he told The Scientist in an e-mail. “Compared to the relatively sterile laboratory environment, our gastrointestinal tract is continuously processing a huge amount of foreign DNA, just few microns away from blood vessels, and fluid exchange cannot be excluded, especially in case of inflammation.” Moreover, his group has recently identified a possible mechanism by which food-derived DNA could make it to the bloodstream, Spisák added. “According to our results, this is more complicated issue.”
Kaare Magne Nielsen, a microbial geneticist at the Arctic University of Norway who was not involved in either study, agreed that macromolecules like DNA fragments could travel from the gut to the bloodstream. He pointed to studies in mice and other mammals that even show a temporal correlation between when food is ingested and when the food-derived nucleic acids show up in the animal’s blood. “I think there is quite some evidence that food derived DNA enters the bloodstream,” he said. However, he added, “I can be skeptical to what extent full-length DNA [does so],” and contamination may need to be considered as a possible explanation for some of the longer sequences identified in human blood by Spisák’s group.
Additional work is needed to confirm or reject Spisák’s results, but Lusk’s study highlights the broader concerns surrounding DNA contamination, which are only going to become more apparent as technologies improve.
“We are pleased that other researchers have turned their attention toward this phenomenon, and our work was used as a source of their analyses,” Spisák said. “In any experiment, especially when it involves relatively new methodologies like NGS [next-gen sequencing], it is very important to identify possible artifacts and properly separate signal from noise.”
Running repeat samples is important, said Witwer, as is careful experimental design. “A contaminant you would expect to be present no matter the concentration of the sample of interest. That raises the possibility of doing dilution experiments that would perhaps help you separate the contaminate background from the real signal.”
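Witwer's dilution idea can be illustrated with a toy calculation. The sketch below assumes a very simple model that is not taken from the study: each library picks up a fixed number of contaminant molecules (for example from reagents), while everything that truly comes from the sample shrinks in proportion to the dilution. The function name `read_shares` and all the numbers are hypothetical.

```python
def read_shares(sample_molecules, real_fraction, contaminant_molecules, dilutions):
    """Return (dilution, real_share, contaminant_share) for each dilution factor.

    Assumed model: the contaminant adds a fixed number of molecules to every
    library, while the sample input is divided by the dilution factor.
    """
    rows = []
    for d in dilutions:
        diluted = sample_molecules / d          # sample input after dilution
        real = diluted * real_fraction          # a sequence truly in the sample
        total = diluted + contaminant_molecules
        rows.append((d, real / total, contaminant_molecules / total))
    return rows


# Hypothetical numbers: 1,000,000 input molecules, a real sequence at 0.1%
# of the sample, and 100 contaminant molecules added to every library.
rows = read_shares(1_000_000, 0.001, 100, (1, 10, 100, 1000))
for d, real_share, contam_share in rows:
    print(f"1:{d:<5} real={real_share:.5f} contaminant={contam_share:.5f}")
```

Under these assumptions the real sequence keeps a roughly steady share of the reads across the series, while the contaminant's share climbs as the sample is diluted. That is the same inverse relationship between starting concentration and contamination that the study reports.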
No matter how researchers doing high-throughput sequencing choose to control their experiments, it’s most important to simply be aware of the problem, said Lusk. “This is a really powerful technology,” he said. “Powerful enough to detect things that weren’t actually originally in the sample.”
R.W. Lusk, “Diverse and widespread contamination evident in the unmapped depths of high throughput sequencing data,” PLOS ONE, doi:10.1371/journal.pone.0110808, 2014.