The Automated DNA Sequencer

As a graduate student at Stanford University in the early 1990s, Jonathan Eisen convinced a friend with access to one of the first automated DNA sequencers to run 10,000 base pairs for him. "Doing it myself, without an automated sequencer, would have taken at least a year, and it wouldn't have been particularly accurate," he recalls. Instead, Eisen got the entire sequence in just two weeks. "I never did manual sequencing again," says Eisen. "Even a simple sequence. There was no point."

The automated sequencer irrevocably altered Eisen's work, as it has the work of every life scientist working today. Without it, researchers would likely still be working on the first draft of the human genome, spending years on sequences that now take just hours to complete. Eisen is now an evolutionary biologist at The Institute for Genomic Research (TIGR) in Rockville, Md., an...

CALTECH ROOTS

The "Aha!" moment in the automated sequencer's development occurred in the early 1980s at the California Institute of Technology. Back then, scientists knew manual sequencing was holding them back. It provided only short read lengths, required four lanes per sample, used radioactivity, and forced researchers to read off and key in sequences by hand, which added another source of human error. But just adding robots wouldn't do, says Dennis Gilbert, chief scientific officer at Applied Biosystems. The technology needed to change in fundamental ways.

The key was switching from a single radioactive nucleotide to four fluorescently labeled ones, explains Gilbert. Lloyd Smith, primary developer of the first automated sequencer, says he got involved in the project during late-night brainstorms with Tim Hunkapiller, who would "hang around the hallways" while Smith was a postdoc at Caltech in the 1980s. Smith brought a background in instrumentation and fluorescence to the project, and recalls that the idea for using four dyes to label each of the four bases "kind of bubbled up." Hunkapiller was working in Leroy Hood's lab, also at Caltech, and participated in the initial development of ideas for the automated sequencer, Smith notes.

Collaborators Tim and Mike Hunkapiller (Tim's brother, who would eventually become president of Applied Biosystems), Hood, and Smith realized they would need only one lane per sequence if they used four fluorescent dyes. "Once you knew you could do that, we knew we were on our way," Gilbert says. Smith then ordered every kind of fluorophore he could and used "trial and error" to find four that were clearly distinguishable. He ended up with fluorescein, NBD, tetramethylrhodamine, and Texas Red.
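With a single radioactive label, each of the four bases needs its own lane. With four distinguishable dyes, the color of each fragment identifies its terminal base, so one lane is enough. The minimal Python sketch below illustrates that idea. It assumes an idealized detector that reports one intensity per dye for each fragment, in order of arrival. The dye-to-base assignment is a hypothetical mapping chosen for illustration, and the function name `call_bases` is likewise hypothetical. Real instruments must also correct for spectral overlap between dyes, which this sketch ignores.

```python
# Hypothetical dye-to-base assignment, for illustration only.
DYE_TO_BASE = {
    "fluorescein": "C",
    "NBD": "A",
    "tetramethylrhodamine": "G",
    "Texas Red": "T",
}


def call_bases(peaks):
    """Return the sequence read from a single lane.

    Each element of `peaks` is a dict mapping dye name to the fluorescence
    intensity measured as one fragment passes the detector. Shorter fragments
    arrive first, so list order is the order of bases in the sequence.
    Each base is called from whichever dye channel is brightest; no
    spectral cross-talk correction is applied.
    """
    bases = []
    for intensities in peaks:
        brightest_dye = max(intensities, key=intensities.get)
        bases.append(DYE_TO_BASE[brightest_dye])
    return "".join(bases)


# Example: three fragments detected, one after another, in one lane.
peaks = [
    {"fluorescein": 0.1, "NBD": 0.9, "tetramethylrhodamine": 0.2, "Texas Red": 0.1},
    {"fluorescein": 0.2, "NBD": 0.1, "tetramethylrhodamine": 0.1, "Texas Red": 0.8},
    {"fluorescein": 0.7, "NBD": 0.2, "tetramethylrhodamine": 0.3, "Texas Red": 0.1},
]
print(call_bases(peaks))  # prints "ATC"
```

Picking the brightest channel is the simplest possible base caller. It shows why color alone is enough to replace four lanes, not how production software actually assigns bases.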

Smith, now at the University of Wisconsin-Madison, cleared the other significant hurdle, coupling the dye to DNA, by using the then newly emerged phosphoramidite chemistry. "That was a big breakthrough," he says. "For the first time, there was some chemistry that enabled you to do this." The final system was announced in Nature in 1986 and commercialized the same year.¹

CHANGING TO MEET THE NEED

Rick Wilson, director of the genome-sequencing center and professor of genetics at Washington University in St. Louis, recalls that his first automated sequencer was difficult to use. His center now has over 100 of the latest generation sequencers from Applied Biosystems. The software was "awful," he says, and he had to sit in front of the machine for over an hour, then tell the computer where on the gel to start collecting data. Early users learned that you needed a good sample to get good sequence data, says Mark Adams of Case Western Reserve University in Cleveland, Ohio, cofounder of Celera Genomics, the company that spearheaded the private human genome sequencing effort. Adams describes his first experience with the automated sequencer as "initially reasonably frustrating."

The instruments have come a long way since then. In the biggest shift, Applied Biosystems moved from slab gels, whose runs can be spoiled by mess and human error, to capillary electrophoresis, says Gilbert. Today, the company's sequencers separate DNA in a capillary no wider than a human hair, and the 3730xl can run 96 of them in parallel. Beckman Coulter of Fullerton, Calif., and GE Healthcare of Chalfont St. Giles, UK, also sell capillary systems. But LI-COR Biosciences in Lincoln, Neb., continues to sell automated sequencers using slab gels, citing greater accuracy and longer read lengths, says Larry Middendorf, vice president of sales and marketing for the company's environmental product line.

Automated sequencing in the clinic

For all the hype, genome sequencing is still, at best, a bit player in the clinic. It's prohibitively expensive and time-consuming to read a patient's DNA, and no one knows what the vast majority of the sequence means. But that could be changing, experts say. "The whole point of sequencing the human genome was to improve human health," says Eric Green of the National Human Genome Research Institute. "Imagine a day when part of a routine clinical exam might be the acquisition of sequence data."

Washington University's Rick Wilson, whose lab is investigating the genetic underpinnings of non-small cell lung cancer, says his team has identified mutations that correlate with response to treatment. Linking a patient's genotype to drug response in this way is the essence of pharmacogenetics, and Wilson says he expects to see more of it in the future.

Most clinics will likely use automated sequencers to run fragment analysis and look for single nucleotide polymorphisms, rather than sequence patients' entire genomes, predicts Noreen Galvin, GenomeLab business center manager at Beckman Coulter. "That's something that's very applicable to the capillary system, in a clinic," she says.

But there are many clinical situations where an entire sequence still comes in handy, says Applied Biosystems' Dennis Gilbert. For instance, HIV-positive patients benefit from knowing the entire sequence of the virus they carry, because some strains respond better than others to particular drugs and the virus is always changing. People battling viruses like avian flu may also fare better if doctors know, without a doubt, which strain they carry, notes Gilbert.

Still, even if researchers learn more about which genes cause disease, and how to interpret reams of sequence data in a way that benefits health, other experts will have to work out the legal and ethical ramifications of that information. For instance, should people know how many disease-carrying genes they have? "We're probably going to make these technical advances, but what are we going to do with that information?" asks Max-Planck's Glenis Wiebe.

Throughput has also improved. Applied Biosystems' first instrument contained 16 lanes and could sequence up to 6,400 bases in 24 hours. The latest generation technology, the model 3730, can spell out up to 2 million base pairs in the same time period with appropriate upstream automation, at half the cost of the earlier model 3700. "It's been a continual evolution of the instruments, chemistry, and so forth," Gilbert says.
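Taking the two throughput figures above at face value, since both are quoted for a 24-hour period, the gain in daily output works out to roughly a 300-fold increase. This is a simple ratio of the quoted maxima, not a measured comparison:

```latex
\frac{2\,000\,000\ \text{bases per 24 h}}{6\,400\ \text{bases per 24 h}} = 312.5
```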

Not every company is looking to play at this level. As a company of around 200 people, LI-COR chooses not to compete with Applied Biosystems, which employs over 4,200, says Ron Stolley, director of sales and marketing for biotechnology. Instead, the company targets "niche markets," such as labs that do AFLP or TILLING analysis and microsatellite work, Stolley says.

The company also targets educational labs, which typically use instruments sporadically and need low maintenance and operating costs, says Jeff Harford, LI-COR production manager for genomics. Reflecting that focus, LI-COR is offering $1.75 million in matching funds for high schools and colleges to acquire its sequencing tools and software. Cheryl Kerfeld, director of the undergraduate genomics research institute at the University of California, Los Angeles, says she prefers using LI-COR instruments with her students precisely because they are less automated, which gives students a better idea of what happens during the experiment. In any event, Kerfeld adds, most biology textbooks describe sequencing using slab gel systems, not capillaries. But budget is also a factor: at $40,000, the LI-COR model 4300 costs just a fraction of the price of the Applied Biosystems 3730xl DNA Analyzer ($365,000).

THE NEXT GENERATION

The life sciences would be vastly different had the sequencer never been developed. Beyond enabling large-scale genome projects, the automated sequencer democratized sequencing, according to TIGR's Eisen. "Everybody and their mothers are getting the genome sequence of the organism they're working on," he says. "That would have been inconceivable" without the automated machine.

Eisen has received funding from the National Science Foundation's Tree of Life program to sequence major bacterial groups in order to understand early events in bacterial evolution. Other researchers are using the tool to resequence organisms whose genomes are already known, to clarify the mutation process, how sequences vary among individuals, and what those variations mean. "The value of 20 E. coli genomes is enormously greater than having just one," says Eisen.

Having genomes at their fingertips enables scientists to investigate much "bigger questions" in biology than counting letters, says Max-Planck's Wiebe. One example is how drugs and disease affect gene expression patterns genome-wide. Experts credit the new technology with creating the still-booming field of bioinformatics, which grew out of the need to handle massive amounts of sequencing data. And scientists are still investigating how the automated sequencer might change the clinic (see Sidebar).

Fast Facts

How has the sequencer transformed the life sciences: Democratized sequence data; facilitated the Human Genome Project

When was it developed: 1986

Primary application: Automates DNA sequencing gel electrophoresis

Pros: Provides massive amounts of sequence data quickly

Cons: Cost, short read lengths

Key reference: L.M. Smith et al., "Fluorescence detection in automated DNA sequence analysis," Nature, 321:674–9, 1986.

Clinical application: Individualized medical care based on sequence data?

Yet the technology still needs to evolve. For one thing, many say automated sequencers and their chemistry remain too expensive. Wiebe estimates that it costs millions of dollars to sequence an entire genome, which makes the technology prohibitive for many applications. And read lengths are too short, says Smith, which increases the likelihood of errors. The error rate in raw sequence data is getting "better and better all the time, but it's still not perfect," he says. (According to Applied Biosystems, the 3730xl can process 96 capillaries, each yielding 900 called bases, in three hours with a 1% error rate.)
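As a rough illustration of what those specifications imply, here is the arithmetic for a single three-hour run. It assumes that the 1% figure applies uniformly to every called base, which is a simplification, because real error rates vary along a read:

```latex
96 \times 900 = 86\,400\ \text{called bases per run}, \qquad 0.01 \times 86\,400 = 864\ \text{miscalled bases per run}
```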

NHGRI has awarded $70 million in three-year grants for new sequencing technologies during fiscal years 2004 and 2005. Applied Biosystems' Gilbert estimates that between 30 and 40 companies are working on next-generation sequencing technologies. These include single-molecule techniques, which decode sequence from a single DNA molecule, and technologies that dramatically cut sequencing costs.

Michael Metzker, of the Baylor College of Medicine in Houston, and colleagues recently described a new sequencer configuration that uses four different lasers matched to four different dyes, improving the instrument's sensitivity.² The lasers also pulse at different times, Metzker explains, which improves sensitivity even further. Metzker has formed a company called LaserGen to commercialize the technology; the company plans to release a prototype in 2006.

On July 31, Branford, Conn.-based 454 Life Sciences Corp. published a paper in Nature describing the sequencing of the bacterium Mycoplasma genitalium in four hours with 99.96% accuracy, 100 times faster than standard methods.³ A paper in the August 4 Sciencexpress described a technology that converts an epifluorescence microscope into an automated sequencer.⁴ Jay Shendure at Harvard Medical School and colleagues used the technology to resequence E. coli at one-ninth the cost per base of conventional tools, with fewer than one error per million consensus bases.

If successful, some of these technologies could create new sequencing revolutions of their own, Eisen says. Using single-molecule sequencing, doctors could look at the sequences of the cells within a single tumor, which probably contains thousands, or even millions, of distinct cell subpopulations, each with a distinct genotype, he says. "People aren't even thinking what they can do with it yet," he adds.

For now, the automated sequencer is still the "gold standard," says Gilbert, and genome centers continue to put their faith in the technology. The Broad Institute in Cambridge, Mass., for instance, recently added nearly 20% more Applied Biosystems machines, bringing its total to 126. But the institute is also looking ahead: in March it installed a genome sequencer from 454 Life Sciences, the company's first placement.

Only a few years ago, notes Eric Green of NHGRI, researchers considered manual sequencing using slab gels to be state-of-the-art; today it is decidedly old-fashioned. In a few more years, he expects the same will happen to the latest generation of sequencers. "What we have seen with automated sequencing is just the first step."

SEQUENCING MILESTONES

1975

Frederick Sanger invents the "chain termination" method of DNA sequencing

1977

Sanger sequences bacteriophage φX174 (5,386 bp)

1982

GenBank launches with 606 sequences and 680,338 bases

1986

Nature publishes, and Applied Biosystems commercializes, the automated DNA sequencer

1990

The Human Genome Project begins

1992

Yeast chromosome III sequenced (315,000 bp)

1995

Haemophilus influenzae becomes the first free-living organism sequenced (1.8 million bases)

Applied Biosystems releases the first sequencer based on capillary electrophoresis

1996

Saccharomyces cerevisiae sequenced (14 million bp)

1997

GenBank exceeds 1 billion bases

1998

Caenorhabditis elegans sequenced (97 million bp)

1999

First human chromosome (22) sequenced (33.4 million bp)

2000

Drosophila melanogaster (180 million bp) and Arabidopsis thaliana (120 million bp) sequenced

2001

Human Genome Project and Celera publish drafts of the human genome (3 billion bp)

2002

Mouse genome sequenced (2.5 billion bp)

2003

Human Genome Project finishes

2004

GenBank holds 44.5 billion base pairs, and 40.6 million sequences
