How many sequenced genomes are enough? The minimum number for comparative genomics, researchers say, depends on what you want to learn. The optimum number is still a mystery.
For identifying cis-regulatory regions such as enhancers and promoters, the genomes of three species that are roughly equidistant evolutionarily are the bare minimum, and more is better, according to Lincoln Stein. Stein, who is at Cold Spring Harbor Laboratory, is first author of a paper on the
For finding conserved sequences in mammals, with their more complicated genomes and long stretches of repetitive DNA, some half-dozen species might be enough, said Eric Green, scientific director at the National Human Genome Research Institute (NHGRI). His estimate comes from his lab's unpublished work and from a paper in the August 14
“By the time you get to about five or six mammals, you start to plateau in the detection of these highly conserved sequences,” Green said. “But,” he cautioned, “we don't know if we've found everything we need to find, and we don't know if our algorithms are properly developed to find everything we're trying to find. It just means that with our algorithms, you plateau. In the absence of knowing what it is that you're really trying to find, it's hard to assess the methods and the datasets that you're using to find it.”
Studies of evolutionary change require another comparative approach.
That's why the
Will three genomes really be enough for evolution studies? “If you're trying to understand details about genome evolution, there's probably never enough genomes. Every genome you get more data from, you're getting insight about the evolution of that genome relative to all other genomes,” Green said. Green has put his resources where his mouth is. The
“I'd say we're just scraping the surface right now,” noted Hugh Robertson, who studies the evolution of insect transposons at the University of Illinois at Urbana-Champaign. He forecasts eventual genome projects on several insect orders, including beetles, moths, and bugs (Hemiptera). To say nothing of fruit flies:
The main barrier to the immediate sequencing of many more genomes remains cost. “The reason there is so much discussion about which vertebrates to sequence is simply because it still costs between $50 million and $100 million to sequence a vertebrate genome, and that's real money,” Green pointed out. “There's no question that if the cost of a sequence were ten or a hundred times cheaper, we wouldn't be worried about whether we were going to sequence three mammals, or six, or ten. We would just sequence a lot of them.”
Even at present prices, sequencing is a bargain, Robertson argued. “When you think about it relative to the enormous amount of resources that NIH puts into grants to characterize individual genes of different species, sometimes it just seems ludicrous not to be sequencing their genomes.” Sequencing an insect genome, he declared, would cost no more than a few National Institutes of Health (NIH) grants.
Sequencing costs have dropped by roughly two orders of magnitude, from $10 per finished base in 1990 to today's cost, which Green estimates at about 5 or 6 cents per base for finished sequence and about 2 to 4 cents for draft sequence. For some comparisons, draft sequence is adequate. Last spring NHGRI projected future cost at about a cent per finished base by 2005.
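To see how the per-base prices relate to the whole-genome price tag Green cites, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the per-base figures come from the estimates above, but the genome size of about 3 billion bases is an assumption added here for a mammal-sized genome, and the function names are hypothetical.

```python
import math

# Per-base costs in US dollars, as quoted in the article.
COST_1990_FINISHED = 10.00
COST_NOW_FINISHED = (0.05, 0.06)   # low and high estimates
COST_NOW_DRAFT = (0.02, 0.04)      # low and high estimates
COST_2005_PROJECTED = 0.01

# ASSUMPTION (not from the article): a mammal-sized genome of ~3 billion bases.
ASSUMED_GENOME_BASES = 3_000_000_000


def genome_cost(cost_per_base, genome_bases=ASSUMED_GENOME_BASES):
    """Total cost in dollars of sequencing genome_bases at cost_per_base."""
    return cost_per_base * genome_bases


def fold_drop(old_cost, new_cost):
    """How many times cheaper new_cost is than old_cost."""
    return old_cost / new_cost


def orders_of_magnitude(old_cost, new_cost):
    """Base-10 logarithm of the fold drop in cost."""
    return math.log10(fold_drop(old_cost, new_cost))


def main():
    for label, (low, high) in [("draft", COST_NOW_DRAFT),
                               ("finished", COST_NOW_FINISHED)]:
        print(f"{label}: ${genome_cost(low) / 1e6:.0f}M "
              f"to ${genome_cost(high) / 1e6:.0f}M per genome")
    print(f"projected 2005: ${genome_cost(COST_2005_PROJECTED) / 1e6:.0f}M")
    for now in COST_NOW_FINISHED:
        print(f"since 1990: {fold_drop(COST_1990_FINISHED, now):.0f}-fold cheaper "
              f"({orders_of_magnitude(COST_1990_FINISHED, now):.1f} orders of magnitude)")


if __name__ == "__main__":
    main()
```

Under that assumed genome size, draft sequence at 2 to 4 cents per base comes to about $60 million to $120 million, in the same range as the $50 million to $100 million Green quotes for a vertebrate genome. Finished sequence at 5 or 6 cents would run $150 million to $180 million, and the projected cent-per-base price would bring a finished genome down to about $30 million. The drop from $10 per base to 5 or 6 cents is a factor of roughly 170 to 200.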
Although the plummeting price of sequencing is welcome, it is due to incremental improvements on the basic technology. “What we're all praying for is one of those great breakthroughs—a new technology that will allow us to read single-molecule sequences, or whatever the trick is going to be that will give us several orders of magnitude increase in speed and reduced cost,” Robertson said. Teams of competitive technology developers around the world are racing toward that goal, cheered on by a lot of casual prophecy about the $1000 genome.
Nor is cost the only challenge. “The really big questions about genome dynamics, selection, adaptation, and gene networks await better theory, methods, and clever hypothesis testing,” said Cristian I. Castillo-Davis, who studies regulatory sequence evolution at Harvard University. “Currently, the field is very thin on biological analysis and very heavy on technology and the reporting of numbers for numbers' sake.” Castillo-Davis' solution? “We are in great need of biologists who can develop novel analytical tools and theory to make biological sense of comparative genomic data.”
But the infrastructural problems of comparative genomics tend to fade in the dazzle of its prospects. With the data that will flood public databases in the next few years, Coghlan expects researchers to take on questions such as: How does regulatory DNA evolve? How is chromosome and protein evolution related to population size and structure? How do differences in meiosis and recombination in different species, such as those with holocentric chromosomes versus those with a true centromere, affect the structure of chromosomes and proteins?
“Research communities are realizing that they're going to wither if they don't have a genome project,” Robertson said. “I suppose we're not going to sequence every genome on the planet, and that's certainly true if technology stays the way it is. But if technology changes as radically as some people think it will, then yes, why not sequence most of the species on earth?”