Contaminated genomes

Human DNA sequences are found in nearly a quarter of the publicly available non-primate genomes, emphasizing the need for better quality control measures.

More than 20 percent of non-primate genome sequences from the top public sequencing facilities are contaminated with human DNA, reports a linkurl:study;http://www.plosone.org/article/info:doi/10.1371/journal.pone.0016410 published today (February 16) in PLoS ONE.
A Sanger sequencing read
Image: Wikimedia commons, Loris
This research calls for scientists to work harder to ensure that the genomes they're sequencing do not become contaminated during the sequencing process, and, more importantly, to check for potential contamination in genomes pulled from the public databases on which genomes are normally deposited. "Genome contamination is a big problem -- but it's not new," said linkurl:Jonathan Eisen,;http://bobcat.genomecenter.ucdavis.edu/mediawiki/index.php/Main_Page evolutionary biologist at the University of California, Davis and lead of the phylogenomics program at the United States Department of Energy Joint Genome Institute. "This paper might help remind people of this [issue]."
Contamination can be introduced into a genomic sequence at any number of stages. It could be airborne bacteria landing in a sample, or even DNA fragments floating around in reagents, left behind after sterilization. But probably the most common contaminant is the scientist herself. It just takes a skin cell falling into the sample before amplification. "Are you wearing gloves to protect yourself from your sample or your sample from you?" linkurl:Rachel O'Neill,;http://www.oneill.mcb.uconn.edu/R.ONeill_Lab/Home.html paper author and molecular geneticist at the University of Connecticut, wondered. "I think it's a little bit of both."
A graduate student in O'Neill's lab was screening genome databases for conserved sequences, and was excited to find the same sequence across diverse species. However, when he tried to replicate the results in the lab, he failed, suggesting that the database genomes were contaminated.
So he decided to screen all non-primate genomes housed in four public databases -- University of California, Santa Cruz's genome browser, National Center for Biotechnology Information's GenBank, the Joint Genome Institute, and Ensembl -- for human-specific repetitive sequences known as AluY elements.
Of the 2,057 raw sequence genomes searched, 454, or about 22 percent, contained this human DNA sequence. "The level of contamination we have found is high enough to show concern," said O'Neill. And that's just contamination from human sources, she added -- just imagine how much contamination could exist from species like E. coli or others commonly found in the lab.
Eisen noted the flurry of papers reporting horizontal gene transfers between species, such as the linkurl:report;http://mbio.asm.org/content/2/1/e00005-11.long this week in mBio of human DNA acquired by gonorrhea, and wondered if this could simply be an issue of human DNA contaminating the data.
The frequency of human contamination requires scientists to do extra experiments, to go above and beyond the norm to confirm their results, Eisen argued. "All you need is one cell to do something weird and you have the potential for all kinds of anomalies."
"There is always that lingering doubt," linkurl:Mark Pallen,;http://pathogenomics.bham.ac.uk/staff/mpallen.html a microbial genomicist at the University of Birmingham, said of the gonorrhea sequence, though he added he thinks the gonorrhea example is probably a case of bona fide DNA transfer.
The high level of sequence contamination could spell real trouble when it comes to human sequencing, O'Neill said. "Finding an Alu element from a human in a fish sample is very straightforward," she said. "Finding a human sample in a human sample is where the difficulty comes in."
Relying on sequencing with such high human contamination to make decisions about personal health could be catastrophic.
Moving forward, scientists must invest more in quality control, Eisen said, but the importance of this step can be lost behind the pressure to generate more data. "It would be nice if everybody took a step back and said that the quality of data is also important," he said. "But it's a hard argument to win; it's hard to convince myself in some cases."
Longo, M.S., et al. "Abundant Human DNA Contamination Identified in Non-Primate Genome Databases." PLoS ONE, DOI: linkurl:10.1371/journal.pone.0016410;http://www.plosone.org/article/info:doi/10.1371/journal.pone.0016410
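
For readers who want a feel for the kind of screen described in the article (checking non-primate datasets for a human-specific marker sequence), here is a minimal Python sketch. It is not the study's pipeline: it swaps in simple exact k-mer matching, whereas a real screen would more likely use an alignment tool. The marker below is an arbitrary placeholder, not a real AluY sequence, and the function names, `k`, and `min_hits` are illustrative assumptions.

```python
# Toy contamination screen: flag datasets whose reads share k-mers with a marker.
# PLACEHOLDER_MARKER is made up for illustration; it is NOT a real AluY element.
PLACEHOLDER_MARKER = "ACGTTGCAAGGCTTAACCGGATCGATTACGCA"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def looks_contaminated(read: str, marker_kmers: set, k: int, min_hits: int) -> bool:
    """True if the read shares at least min_hits k-mers with the marker on either strand."""
    read_kmers = kmers(read, k) | kmers(reverse_complement(read), k)
    return len(read_kmers & marker_kmers) >= min_hits


def contaminated_fraction(datasets: dict, marker: str, k: int = 12, min_hits: int = 3):
    """Return (names of flagged datasets, fraction flagged).

    datasets maps a dataset name to a list of read strings. A dataset is
    flagged if any single read looks contaminated.
    """
    if not datasets:
        return [], 0.0
    marker_kmers = kmers(marker, k)
    flagged = [
        name
        for name, reads in datasets.items()
        if any(looks_contaminated(r, marker_kmers, k, min_hits) for r in reads)
    ]
    return flagged, len(flagged) / len(datasets)


if __name__ == "__main__":
    # Hypothetical toy data: one forward-strand hit, one clean set, one reverse-strand hit.
    toy = {
        "fish": ["TTTT" + PLACEHOLDER_MARKER + "GGGG"],
        "fly": ["ATATATATATATATATATATATAT"],
        "worm": [reverse_complement(PLACEHOLDER_MARKER)],
    }
    flagged, fraction = contaminated_fraction(toy, PLACEHOLDER_MARKER)
    print(flagged, f"{fraction:.1%}")
```

Checking both strands matters because a contaminating fragment can be sequenced in either orientation. Exact k-mer matching will miss diverged copies of a repeat, so a sketch like this would tend to undercount rather than overcount.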
**__Related stories:__**
* linkurl:Sequencing on target;http://www.the-scientist.com/article/display/55645/ [1st May 2009]
* linkurl:Bacterial genes jump to host;http://www.the-scientist.com/news/display/53552/ [30th August 2007]
* linkurl:DNA Sequencing Industry Sets its Sights on the Future;http://www.the-scientist.com/2004/09/27/44/1/ [27th September 2004]
* linkurl:Related F1000 evaluations;http://f1000.com/search/evaluations?query=genome+contamination [16th February 2011]
Meet the Author

  • Hannah Waters
