A new algorithm published yesterday (July 2) in Nature Biotechnology takes the best of second- and third-generation sequencing technologies to produce fuller and more accurate whole genome sequences.
Second-generation sequencers read short DNA snippets—between 100 and 700 base pairs long—then stitch them together to produce a full genome. However, stitching them in the correct order remains a challenge. Third-generation sequencers, on the other hand, can read long stretches of DNA at once, but are more prone to errors. The new algorithm, developed by researchers at the National Biodefense Analysis and Countermeasures Center in Frederick, Maryland, corrects the sequences obtained from third-generation sequencers using the short reads of their second-generation counterparts.
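The correction idea can be illustrated with a toy Python sketch. This is not the published algorithm: it assumes the long read contains only substitution errors and that each short read can be placed without gaps, whereas real third-generation reads also contain insertions and deletions and need gapped alignment. The function names, the `max_mismatches` threshold, and the example sequences are all hypothetical.

```python
from collections import Counter


def best_offset(long_read, short_read):
    """Return (offset, mismatches) for the best ungapped placement of short_read."""
    best = (0, len(short_read) + 1)
    for offset in range(len(long_read) - len(short_read) + 1):
        window = long_read[offset:offset + len(short_read)]
        mismatches = sum(a != b for a, b in zip(window, short_read))
        if mismatches < best[1]:
            best = (offset, mismatches)
    return best


def correct_long_read(long_read, short_reads, max_mismatches=2):
    """Replace each base of long_read with the majority vote of the short reads covering it."""
    votes = [Counter() for _ in long_read]
    for short_read in short_reads:
        if len(short_read) > len(long_read):
            continue  # cannot be placed on this long read
        offset, mismatches = best_offset(long_read, short_read)
        if mismatches > max_mismatches:
            continue  # placement too poor to trust (hypothetical cutoff)
        for i, base in enumerate(short_read):
            votes[offset + i][base] += 1
    # Keep the original base wherever no short read covers the position.
    return "".join(
        column.most_common(1)[0][0] if column else original
        for original, column in zip(long_read, votes)
    )


# Made-up example: a 22-base "genome", a long read with two substitution
# errors, and accurate 8-base short reads tiled across it.
truth = "ACGTTGCAAGCTTAGGCTAACG"
noisy_long_read = "ACGATGCAAGCTCAGGCTAACG"
short_reads = [truth[i:i + 8] for i in (0, 2, 4, 8, 12, 14)]

corrected = correct_long_read(noisy_long_read, short_reads)
print(corrected == truth)
```

The long read supplies the order of the bases, while the more accurate short reads outvote its errors position by position, which is the division of labor described above.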
The researchers tested the new algorithm on the Escherichia coli and yeast genomes, and found it raised read accuracy to as high as 99.9 percent. The algorithm was also used to sequence the genome of the common pet parakeet (Melopsittacus undulatus) for the first time.
This new approach may be useful for researchers interested in sequences that lie beyond gene-coding regions, Adam Phillippy, the bioinformatics researcher who led the study, told Nature. “Normally people just get the genes out, but you lose structural information,” he said.
For example, the algorithm may be particularly well suited to transcriptome analysis, which could benefit from reads that span an entire messenger RNA sequence, Elaine Mardis, co-director of the Genome Institute at Washington University in St Louis, Missouri, told Nature. (Read a profile of Elaine Mardis from The Scientist’s January 2012 issue.)