UNIVERSITY HOSPITAL MÜNSTER, GERMANY
While attending the American Society of Microbiology meeting in New Orleans in May 2011, University of Münster medical microbiologist Dag Harmsen started hearing rumors about an E. coli outbreak back in Germany. The culprit bacterium seemed especially nasty, and wasn’t a previously known strain. By late July, the outbreak would infect at least 4,000 individuals, killing 50. An unusually high number (22 percent) of patients developed hemolytic uremic syndrome, a form of sometimes fatal kidney damage triggered by a bacterial toxin.
Because his lab was relatively close to the epicenter of the outbreak, local authorities sent Harmsen some of the first stool samples from infected patients. After a colleague characterized the isolates and developed a rapid screening test for the strain, dubbed O104:H4, Harmsen’s lab set to work sequencing the bacterium. Within 3 days, they were done.
“It took us, to be absolutely exact, 62 hours from DNA to sequence and then to draft genome,” Harmsen says.
The University of Münster team—which, in addition to Harmsen, consisted of only two postdocs, a technician, and a handful of bioinformaticians—shared the first-to-genome title with the Beijing Genomics Institute (BGI), a huge sequencing center that had also obtained samples early in the E. coli outbreak. Such rapid sequencing using so little manpower would have been unprecedented 10 years ago, but thanks to continually evolving technologies, quick-fire genome reads may before long become the rule, rather than the exception.
To draft the O104:H4 genome, Harmsen and his team used an instrument (released in December 2010) called the Ion Torrent Personal Genome Machine (PGM) that utilizes out-of-the-box electronics and shuns the complex optics of traditional next-generation sequencers. The relatively affordable price tag, around $50,000, means that even a university hospital with no genomics center can afford it. (Incidentally, BGI also used the Ion Torrent to sequence the E. coli strain.)
And the Ion Torrent isn’t the only new sequencer that is changing the equation for genomics research. There are a handful of machines, both on the market and in development, that are making the process faster, cheaper, and customizable for experiments involving genomes of any size.
“Until now, next-gen sequencing was something for a couple hundred genome sequencing centers around the world,” Harmsen says. But with these new technologies, “in principle, all hospitals around the world are potential customers.”
Let there be no light
The key to the Ion Torrent PGM’s speed and modest price tag is that it avoids the finicky and complex optics of traditional next-gen techniques. Older sequencers, such as the Illumina HiSeq, involve fluorescent labeling of nucleotides to sequence DNA fragments. These systems use DNA polymerase to construct new strands from the fragments, while a laser excites fluorescently labeled nucleotides and sophisticated optics detect the resulting signal, identifying which base has been added to the sequence.
In contrast, the Ion Torrent machine takes advantage of the hydrogen ion released when DNA polymerase adds a nucleotide during DNA synthesis. The machine starts with amplified snippets of DNA in up to a million or more microwells. It then floods the plate with each of the four nucleotides in succession—first As, then Gs, then Ts, then Cs. If the flowed nucleotide complements the next base of the DNA template, it is incorporated, releasing a hydrogen ion; the resulting pH change is detected by a semiconductor sensor and translated into a voltage signal. The process of reading each nucleotide can occur in seconds, and the elimination of light-based optics makes the machine more robust and inexpensive, says Maneesh Jain, vice president of marketing and business development at Life Technologies, which makes the machine. To keep costs down, the company also uses chips made in factories that produce consumer electronics. “It’s like your Xbox in terms of electronics,” Jain says.
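The flow-by-flow logic can be sketched in a few lines of Python. This is a hypothetical illustration, not Life Technologies code: nucleotides are flowed in a fixed order, and each flow’s signal is proportional to how many complementary bases are incorporated in a row.

```python
# Toy model of flow-based sequencing: flow one nucleotide at a time in a
# fixed order; each flow incorporates a run of complementary template bases,
# and the signal strength reports how many were incorporated.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
FLOW_ORDER = "AGTC"  # the order described in the text: As, Gs, Ts, Cs

def simulate_flows(template, n_flows):
    """Return the signal (number of incorporations) for each flow."""
    pos, signals = 0, []
    for i in range(n_flows):
        nuc = FLOW_ORDER[i % len(FLOW_ORDER)]
        count = 0
        # The flowed nucleotide keeps incorporating while it pairs
        # with the next template base.
        while pos < len(template) and COMPLEMENT[template[pos]] == nuc:
            count += 1
            pos += 1
        signals.append(count)
    return signals

def flows_to_read(signals):
    """Translate flow signals back into the synthesized strand."""
    return "".join(
        FLOW_ORDER[i % len(FLOW_ORDER)] * count
        for i, count in enumerate(signals)
    )
```

Note that a run of identical template bases produces a single stronger signal rather than several separate ones, so the machine must infer run length from signal amplitude—the root of the homopolymer miscounting issue discussed below.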
In January, Ion Torrent released a more powerful machine, called the Ion Proton, which the company says will allow a large sequencing center to churn out a human genome in a single day for the long-sought-after price of $1,000. One limitation of the Ion Torrent, however, is that it may miscount long stretches of the same base (homopolymers), says Keith Robison, an informatician at Cambridge, Massachusetts-based biotech Warp Drive Bio, who has analyzed data from several sequencing machines. That can make it hard to identify insertions or deletions in a cancer genome, for instance. The Ion Torrent is best for applications where speed and cost are key, he says.
Pacific Biosciences also has a new machine on the market: the Single Molecule Real Time (SMRT) sequencer, commercially released in April 2011. Like more traditional machines, SMRT relies on fluorescently labeled nucleotides, but there’s a twist: the instrument sequences just one molecule at a time, cutting out the time-consuming step of amplification. The system traps a single DNA molecule, along with a DNA polymerase, in one of 150,000 tiny holes—tens of nanometers in diameter and etched in an aluminum film—then floods the surface with all four fluorophore-labeled nucleotides, says Eric Schadt, Pacific Biosciences’s chief science officer. As DNA polymerase attaches each nucleotide to the sequence, laser beams of two different wavelengths illuminate each of those holes, the geometry of which permits only the nucleotides that match the growing DNA strand to fluoresce. A sophisticated camera can then detect the particular nucleotide being incorporated into each chain based on the color of light emitted. “You’re literally looking at a single molecule of DNA,” Schadt says.
Rather than carefully aligning the sequences of many DNA fragments (which gradually go out of sync because of accumulated errors), the SMRT machine can generate extremely long sequencing reads—on average more than 1,000 base pairs, but occasionally as long as 10,000—a massive chain compared with the 35- to 400-base-pair reads of traditional next-gen technologies.
This ability to sequence very long DNA fragments is handy for Simon Chan, a geneticist at the University of California, Davis, who is trying to understand whether the repetitive, rapidly evolving DNA of centromeres is involved in speciation. By analyzing the sequential pattern of different repeat types, the team can trace how new repeat combinations arise. But because each sequence is so similar, it’s hard to take short DNA fragments and assemble them with any certainty that you have the right order, Chan says. “Having really long reads allows us to address that in a much more comprehensive way.”
The SMRT’s steep price tag, however—in the same ballpark as the Illumina HiSeq at nearly $700,000 (US price)—means that only a few dozen genome centers have purchased the machine to date. The machine also has a fairly high error rate—“around 15 percent on a good run,” Robison says, compared to about 1 in 1,000 for the Illumina HiSeq. But because the errors are random and the machine can read such long snippets of DNA, it’s possible to cinch the DNA into a circle and do multiple runs around the same snippet to get a more accurate read, he says. That makes the machine most useful for applications such as haplotyping, where really long runs are needed, or de novo genome assembly, where individual errors are less critical, he says.
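The circular-consensus idea rests on the errors being random and independent: a simple per-position majority vote across repeated passes of the same circularized fragment converges on the true sequence. A minimal sketch (illustrative only, not PacBio’s actual consensus algorithm):

```python
import random
from collections import Counter

def consensus(passes):
    """Majority vote at each position across repeated reads of one snippet."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )

def noisy_pass(true_seq, error_rate, rng):
    """One sequencing pass with random substitution errors."""
    bases = "ACGT"
    return "".join(
        rng.choice([b for b in bases if b != base])
        if rng.random() < error_rate
        else base
        for base in true_seq
    )
```

With a 15 percent per-base error rate, the chance that the same wrong base wins the vote at any position drops off rapidly as more passes around the circle are added.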
Another technology on the horizon does away with both amplification and optics. UK-based Oxford Nanopore Technologies is currently developing a machine called the GridION, which centers on membrane pores just big enough for DNA nucleotides to squeeze through. The company has isolated and genetically modified bacterial nanopores, such as Staphylococcus aureus’s α-hemolysin, a protein whose real-life role is to breach the membranes of the bacterium’s host cells and allow passage of ions and other molecules. The nanopores are inserted into an artificial lipid bilayer, placed in individual microwells tens of micrometers wide, and arrayed on a sensor chip. As each nucleotide or single strand of DNA travels through a channel—at a pace managed by a specially designed enzyme—it disrupts a current running through the pore, and the change is measured by a semiconductor sensor. Because each base disrupts the electric field in a slightly different way, those current changes can then be translated into a DNA sequence.
Like the Ion Torrent system, the GridION will do away with fluorescent labels and lasers, relying instead on semiconductor detection of nucleotides. And like the Pacific Biosciences machine, it won’t require amplification of DNA fragments before sequencing and can generate reads that are thousands of base pairs long—up to 48 kilobases.
The GridION system is made up of “nodes,” each of which takes a single-use reagent cartridge containing 2,000 nanopores that can produce tens of gigabases of sequence every day. A powered-up, 8,000-nanopore cartridge is in the works for 2013, and in certain configurations is expected to deliver a complete human genome in 15 minutes. Oxford Nanopore is also developing a disposable sequencing device the size of a USB stick, called the MinION, that will retail for $900 and bring sequencing to the masses.
The company announced a 4 percent error rate for the GridION at a conference in February and is currently aiming to reach an error rate of 0.1 to 2 percent by its commercial launch later this year. Still, it’s hard to say how the GridION will stack up against existing systems, as it has not been commercially released yet, Robison says. “In the end, you really have to demonstrate data.”
Compressing the Data
While new sequencing technologies aiming to bring high-throughput sequencing into everyday academic labs hold much promise for a wide range of biomedical research, they also threaten to magnify an already growing problem—what to do with all the data. Even with projects that employ more traditional next-gen techniques, which are often outsourced to one of a select few genome centers, terabytes of sequence data are being sent back to the original labs. (See “Sequence Analysis 101,” March 2011.)
“For many biologists, this is the first time that they’ve had to use this amount of data,” says Ewan Birney, an informatician at the European Bioinformatics Institute in the United Kingdom, which in partnership with the US government’s National Center for Biotechnology Information aims to archive all the sequencing data generated around the world.
Several teams are developing ways to stem the sequencing data deluge. Birney and his colleagues have developed an algorithm that compresses sequence data 5- to 50-fold more tightly than standard compression techniques alone can. The system uses basic principles derived from image and video compression (like that used for YouTube or satellite TV), but also takes advantage of the fact that sequence data is highly redundant. The algorithm compares each read to a reference sequence and marks places where the two differ, while ignoring spots that are the same. For instance, only about one in 10,000 base pairs in a newly sequenced human genome differs from the reference human genome, so the vast majority of the data can be ignored.
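The reference-based trick can be sketched in a few lines. This is a hypothetical simplification, not Birney’s actual implementation: a production system must also encode alignment positions, unmapped reads, and quality scores, which dominate the compressed size in practice.

```python
# Minimal sketch of reference-based compression: for a read aligned to a
# known position on the reference, store only the positions where it
# differs. Identical bases cost nothing; the reference fills them back in.

def compress_against_reference(read, reference, offset):
    """Record (position, base) pairs where the aligned read differs.

    `offset` is the read's alignment position on the reference,
    assumed known from a prior mapping step.
    """
    return [
        (i, base)
        for i, base in enumerate(read)
        if reference[offset + i] != base
    ]

def decompress(diffs, reference, offset, length):
    """Rebuild the original read from the reference plus its differences."""
    read = list(reference[offset : offset + length])
    for i, base in diffs:
        read[i] = base
    return "".join(read)
```

Because a resequenced human genome differs from the reference at roughly one base in 10,000, the difference list is tiny compared with storing every base of every read.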
Pharma companies and biotechs are also launching efforts to deal with the lack of an open-source framework for sharing gene-sequencing data. For example, many have decided to participate in a not-for-profit group called the Pistoia Alliance that aims to do just that, says Nick Lynch, an alliance founder.
In October 2011, the alliance announced the Sequence Squeeze challenge. Teams compete to see who can compress a reference sequence into the smallest size in the least amount of time. Part of the idea behind the challenge was to entice researchers who are used to dealing with massive amounts of data, such as computer scientists or astrophysicists, to apply their knowledge to sequencing problems. The winner, announced on April 23, James Bonfield of the Wellcome Trust Sanger Institute, was able to compress the original sequence data to a very small size, while also making the process of compression and decompression rapid. “We hope [this technique] will become part of the way that this sequence data is sent around the globe,” Lynch says.