University of Oklahoma graduate student Richard Wilson spent the early 1980s reading DNA. First he’d add four radioactively labeled synthesis-terminating nucleotides, one corresponding to each of the four natural bases, to mixtures of DNA fragments. He’d then load fragments treated with different radioactive bases into separate wells of a polyacrylamide gel and use electrophoresis to separate the strands into a pattern that reflected their length and, consequently, where the unnatural bases had been incorporated.
“It was all very manual,” recalls Wilson, now director of the McDonnell Genome Institute at Washington University in St. Louis. “We used to get the sequencing gels running, go have dinner and probably a few beers. Then we’d come back to the lab around two in the morning, take the gels down, put X-ray film on them, and develop them the next day.”
Wilson and his lab mates would gather in an office and read out the order of bases from the X-ray films, going from shortest fragment to longest fragment, as someone typed the sequence into a computer. In this way, running four electrophoresis experiments at once—“or eight if we were feeling adventurous,” Wilson says—the team completed the 17,553-base mitochondrial genome of the frog Xenopus laevis in about two years (J Biol Chem, 260:9759-74, 1985).
The method, known as dideoxy or Sanger sequencing, was published in 1977 and was one of the first widely adopted techniques for reading DNA. For a decade, it was carried out manually in labs around the world and used mainly to sequence individual genes and small viral genomes. But things were about to change. DNA sequencing was on the cusp of a technological revolution that would kick-start a succession of ever-faster, cheaper, and more-accurate methods, turning genomics into the high-throughput powerhouse of scientific data generation that it is today.
“It was clear to us that automating DNA sequencing was really going to be key to the future of biology,” says Leroy Hood, who cofounded Applied Biosystems Inc. (ABI) in 1981 to develop some of the instruments that would drive this revolution. “Molecular biology was coming to the fore, and it was clearly central to understanding biological information in living organisms. . . . Sequencing was going to become very, very important.”
In 1986, ABI announced the first automated DNA sequencer. Although based on the Sanger technique, the new machine used fluorescent, not radioactive, labels. With one color for each nucleotide, sequencing one section of DNA required just one lane in a gel instead of four (Nature, 321:674-79). After electrophoresis, the base sequence could be read from the gel by a computer equipped with a lens and photomultiplier. Later versions of the technology incorporated automatic lane loading, too.
“We thought it would be transformative,” says Kim Worley, a geneticist at Baylor College of Medicine who was involved in the Human Genome Project. “Every lab around the world was spending lots of time analyzing one part of one gene. Giving people all the genes, all at once, so they could just do the biology would be a tremendous benefit.” Ten years and $3 billion later, the Project’s members completed a draft of the human genome.
Working in parallel
As researchers sifted through the data pouring out of these projects, a wave of technologies that would become next-generation (next-gen) sequencing was already gathering steam. These technologies used massive parallelization, with the ability to produce “millions of different sequences simultaneously, but with very short reads,” Hood says.
In the first commercially successful next-gen sequencers, released by 454 Life Sciences in 2005, parallelization was achieved via rapid amplification of small, bead-bound fragments of DNA using polymerase chain reaction (PCR), and nucleotides were read using a technique called pyrosequencing (Nature, 437:376-80, 2005). The system could sequence 25 million bases with 99 percent accuracy in a single 4-hour run, a 100-fold increase in throughput, at less than one-sixth the cost of conventional methods.
The following year, Solexa (acquired by biotech giant Illumina in 2007) presented its take on next-gen sequencing, introducing the technology that is most widely used today. Instead of bead-based amplification, Illumina machines employ a technique called bridge amplification to clone fragments of single-stranded DNA immobilized in a flow cell (Nature, 456:53-59, 2008). The sequences themselves are read using fluorescently labeled nucleotides similar to those of the Sanger method. Along with their offshoots, these technologies have come to dominate research and clinical labs as the cheap and effective sequencers of choice; the release of Illumina’s HiSeq X Ten system in 2014 brought the cost of sequencing a human genome below the $1,000 mark.
“Now, my students, some of whom don’t know any sequencing, think nothing of it,” says Harvard University’s George Church, who pioneered one of the first next-gen bead-based methods back in 2005 (Science, 309:1728-32). “If they change one base pair in the human genome, they’ll send it out for sequencing and check they changed that base pair and nothing else. That’s kind of a ridiculous assay by 1980s standards, but it actually makes sense today.”
The sequencing field shows no signs of slowing down. Today, emerging technologies such as single-molecule real-time (SMRT) and nanopore sequencing are beginning to eliminate the need for amplification, with advantages that go beyond just increasing speed: in addition to reducing PCR-derived bias and permitting longer reads, these single-molecule techniques retain DNA-bound molecules so researchers “could read out methylation and footprinting information,” Church notes, presenting the possibility of obtaining genetic and epigenetic reads simultaneously. (See “Sons of Next Gen,” The Scientist, June 2012.)
Such “third-generation sequencing” is already making its debut in biomedical research. Earlier this year, for example, scientists used Oxford Nanopore’s portable MinION device to classify Ebola strains in West Africa with a genome sequencing time of under 60 minutes (Nature, 530:228-32). The same device is currently being used in Brazil to map the spread of Zika virus across the country in real time, and was used this summer to sequence DNA on the space station.
Of course, these nascent technologies are not without problems, says Wilson. “I would say there’s not much that’s really shown itself to be incredibly robust,” he notes. “If you’re going to use those technologies, either in the research or clinical setting, you’ve got to be able to get consistent results from experiment to experiment. I’m not sure we’re quite there yet.”
But according to Hood, now 77 years old and president of the Institute for Systems Biology in Seattle, that transition is just on the horizon, and will reinforce the remarkably swift scientific progress that has characterized the last 30 years of DNA sequencing. “Living through it, you were very impatient, and always wondered when we’d be able to move to the next stage,” he reflects. “But in looking back, all of the things that have happened since ’85, they’re really pretty astounding.”