© CHRISTIAN DARKIN
Five years after publication of two drafts of the human genome, Maynard Olson of the University of Washington finds himself longing for another "lurch." To be sure, genomic scientists across the world have chalked up many achievements since 2001, but, like many of his colleagues, Olson is feeling more impatient than celebratory.
Progress has included a blizzard of comparisons between the human sequence and many others, including the chicken, the mouse, the rat, the dog, and the chimp. The flourishing of comparative genomics, says Olson, has changed the focus of genomics from a single reference sequence of genes to a rich variety of "functional elements," largely sequences that serve as ignition switches, brakes and accelerators for gene expression. And the focus on single-base changes has widened to an array of evolutionary rearrangements: insertions, deletions, reversals, and duplications. There are new tools: new global databases of all functional elements in genomes (e.g., ENCODE), small molecules for chemical genomics (e.g., PubChem), and a raft of protein structures.
And yet the last five years, in Olson's view, have been "a period of a great grinding of gears, kind of shifting of gears." In the terms of the science historian Thomas Kuhn, it's been "a period of consolidation and more normal science." Others, such as Sydney Brenner of the Salk Institute, the Nobel Prize-winning pioneer of the worm, Caenorhabditis elegans, go further, worrying that the genome sequence and the growing lists of sequences and proteins and protein interactions and functional elements don't get very deep into such core problems of biology as the operations of the cell, of development from egg to adult, or the problem of consciousness. "We've become very geno-centric," says Brenner. "The cell must become the focus."
What vexes many thousands of colleagues around the world most is that genomics hasn't yet moved into the "real world" of medical relevance. Olson led a team that sequenced the principal microbe involved in lung infections in cystic fibrosis patients, Pseudomonas aeruginosa. Referring to changes in cells of both the patient and the infecting organisms, Olson says, "it's clear that mutational cascades are a really critical aspect of disease progression, just as is the case with cancer." To build a genomics "bridge" into this area is going to call for a "very large" amount of sequencing of both patients and microbes to follow the progression of the disease. For this, the faster second-generation sequencing technologies emerging from several startup companies will be essential, Olson thinks, just as they will be for the new National Institutes of Health Cancer Genome Project, on which pilot work has begun. "They're over-promising and are trying to move too quickly, without a strong enough strategic plan," Olson argues. "Nonetheless the scientific idea is right. These policy things usually eventually fall into place as reality exerts itself."
Such issues as cheaper, faster DNA sequencing to get genomic tools into the clinic sooner will define the field for the next five years and beyond. Echoing Olson, Cold Spring Harbor Laboratory's Lincoln Stein says the last five years have fit Kuhn's definition of "normal science," although "the number of questions never decreases."
The annotation work of the last five years, says David R. Bentley, the former director of human genetics at the Wellcome Trust Sanger Institute in Hinxton, UK, has been "changed completely" by the discovery of microRNAs, which include the small interfering RNAs, siRNAs. "All you had five years ago was strings of bases.... Now we have this beautiful... emerging colorful picture of what each of these bases does. I think we may build a whole new picture of how the genome works and how it specifies the cell," says Bentley, now chief scientist of Solexa, one of several startup companies that are speeding DNA sequencing and driving down its costs beyond what is possible with the classic Sanger method.
To those who shrug at "just another sequence," says Phillip Sharp of MIT, a Nobel Prize-winning leader in RNA research for three decades, "the widening array of genomes means everything." MicroRNAs were found in C. elegans by Andrew Fire and Craig Mello in 1998. Comparing human, worm, fruit fly, and plant genome sequences allowed microRNA research to move fast in the past few years. Soon after microRNAs were found in humans, researchers were calculating the number of microRNA genes and focusing on their targets; they went on in 2005 to use them as a signature of cancer in cells and a potential tool for reducing the expression of genes for synthesizing cholesterol.
"We have gone from a situation where we didn't even know about this regulatory network in three years to being able to identify gene systems that are being regulated," says Sharp, who contends that the pace of microRNA discoveries is ten times faster than the work on RNA splicing that he and others began in the late 1970s. The speedup, in Sharp's opinion, is entirely due to having the human sequence - and is a major example of the influence that genomics is having on other fields. "We made progress on the biochemistry of that process, but we made very little progress on the big picture of how it's regulated and changes in normal versus disease states." So now, the comparison of genome sequences will also be harnessed to get at the specifiity and regulation of RNA splicing.
The growing menagerie of "model organisms" and what comparative analysis of them can achieve also impresses Robert Waterston, chairman of the genomic sciences department at the University of Washington. With the ability to knock down every gene in an organism like yeast, he says, a "true molecular description of yeast... is on the table." Since yeast is a primary organism for comparisons with human sequences to discover genes and their controls, he adds, "I can't imagine it wouldn't have a profound impact on how you view humans."
NEXT FIVE YEARS: SPEED UP, COSTS DOWN
Sensing "some motion" recently, Olson hopes for the success of the new, faster sequencing techniques that are coming over the horizon from startups like 454 Life Sciences, Solexa, Agencourt Bioscience, and Helicos Biosciences. They claim processes 100 times faster than the "classical" machines of the 1980s and 1990s, which now operate in "reads," or sample lengths of 800 bases compared to some 100 for the new processes. The workhorses of the 2001 human drafts have kept doubling their throughput about every 22 months over 15 years. In September, 454 reported that, in a single run, its system did a shotgun sequence and assembly of the microbe, Mycoplasma genitalium, in four hours. Claire Fraser's team at the Institute for Genomic Research took three months to work out Mycoplasma's sequence in 1995.
Solexa, Bentley said, has a "marketing timeline" that calls for some of the instruments it's developing to be in the hands of "early access customers" toward the end of the second quarter of 2006, with commercialization scheduled by the end of the year. For the rival second-generation sequencing machines, Bentley sees "overlapping markets, although exactly where the overlaps are is not clear at the moment." While the new technologies have worked with small genomes, there are challenges of accuracy, longer "reads," and cost for larger genomes, he says.
As he has for decades, Leroy Hood, director of Seattle's Institute for Systems Biology, focuses on the "toolbox" for a genomics that will enable a truly personalized, predictive, and preventive medicine. "All the big revolutions [in science] are technique-driven," he says. Hence, the drive for machines to sequence a human genome for $100,000 and then $1,000, compared to today's $3 million price, should succeed over the next 10 to 15 years. The 454 machine, he says, can handle several hundred thousand samples at once. As to Helicos, to which he is an advisor, he sees an "enormous advantage" to the single-strand technique of Stephen Quake of Stanford, which Helicos uses. The result of faster sequencing will be "an explosion of biology," with demand for full sequences of hundreds of millions of people in Europe and America - so the market will be there, says the ebullient Hood, contradicting Olson.
Francis Collins (see Delivering on the Dream), director of the National Human Genome Research Institute, whose overall budget in 2005 was $500 million, is betting $30 million a year for five years on second-generation sequencing. Although he says, "Technology development is a risky experience," he adds, "We are on the cusp of a real paradigm shift."
And the Broad Institute in Cambridge, Massachusetts, is testing a 454 machine. However, "we're going to be testing all the others, too... I'm in favor of all clever ideas," says Broad director Eric Lander. Nonetheless, "it takes quite a long time before clarity emerges around a new technology platform. We are going to have to see how they perform in many, many respects." Compared to the 1990s decisions to use sequencing technology from Applied Biosystems and its rivals, "We might have a more textured, layered solution."
GETTING TO THE CLINIC
Such "second generation" sequencing could make possible one of Olson's other goals: being able to sequence, in, say, 100 patients and 100 controls, the HLA region of chromosome 6. The HLA region contains genes associated with autoimmune diseases like type 1 diabetes, multiple sclerosis, rheumatoid arthritis, and lupus. The "precise molecular explanation" for these diseases "has remained elusive for decades," Olson notes. Presently, he says, such "large-scale targeted resequencing of genomes," going after small nucleotide changes along "substantial chunks of real estate," is beyond the capacities of today's principal sequencing robots.
Sharp says that's true for other fields as well. "They can't afford to do the cancer genome without dropping the cost of sequencing by over ten-fold," he says. In general, he is optimistic about second-generation sequencing, saying he "would bet on it without a question that we will be at a $1,000 genome in a five-year window." At that point, "It will be feasible to sequence everybody at a cost that will be insignificant compared to the medical costs," opening the way to wide clinical application in diagnosing disease and picking particular therapies.
One of the most dramatic efforts to push genomics into the realm of complex, multi-genic diseases is the five-year, $138 million haplotype map (HapMap) project, involving samples donated by Japanese, Han Chinese, Yoruba, and Americans of European descent. The project takes advantage of the fact that the millions of single nucleotide polymorphisms (SNPs) found in at least one percent of humans tend to pass between generations in blocks of DNA called haplotypes. The project announced its Phase 1 analysis in October 2005, and said that the analysis of Phase 2, already completed, would be published in 2006. Despite successes, such as using HapMap data to pinpoint a gene for macular degeneration, there remains controversy over HapMap's reach into domains such as rearrangements like deletions and reversals, or the numerous rare mutations that may be involved in diseases.
The minor variations are of central interest to Bentley of Solexa, who has specialized in rare variations. The HapMap, he says, has limitations, capturing only common variations in three target populations, missing the rare mutations. But it may provide a quick way to find more disease genes. Still, in three to five years, he says, the new sequencing machines should open the option of going after virtually all the many genes involved in a disease like diabetes. To be sure, the multiple sequences of patients and "controls" will have to square with what HapMap has found. "Everything that a HapMap captures should also be captured by a technology that aims to do better." Bentley, an early proponent, calls the HapMap "a real benchmark."
Collins, who has directed NHGRI since 1993, is cheerful in the face of the doubts. He is certain that HapMap will become "the most powerful tool" to date "for unraveling the genetics of common diseases." He adds, "I think you can expect... in the next, let's say, two to three years, that the major genetic contributions to genetic diseases, perhaps as many as a dozen... will be identified. And that's going to be incredibly exciting."
The long road of such incremental steps to clinical relevance creates impatience in genomics. Waterston says, "Yeah, there's frustration. I suspect among some quarters that read the hype and didn't know enough of the science, they would be frustrated. Anybody who knew the science knew that it was going to be a long time coming. These are hard problems. I would say that the progress that has been made on them is pretty substantial. But that's because I come in with a deep understanding of how hard it is."