© 2001 AAAS
Along a 106-kilobase stretch of human chromosome 21, one study found that 18 haplotype blocks represent a segment of 147 SNPs from 20 individual copies of the chromosome. One block, containing 26 SNPs and spanning 19 kilobases, is detailed at right. The four most common haplotypes, occurring in 16 of the 20 chromosomes sampled, can be identified by two tag SNPs (bottom right). (Adapted from N. Patil, Science, 294:1719–23, 2001)
The myriad medical breakthroughs predicted to come from the sequencing of the human genome have yet to pour freely. The idea that genes related to common diseases and unique drug responses can be uncovered through careful scrutiny of genetic variation is an inspiring one, but searching for variability remains expensive and time-consuming. A project that would map variants common in most human populations might ease that search. In July 2001, five months after publication of...
The description of haplotype structure in these papers arrived at a critical point in human genome sequencing, allowing researchers to focus on applications for the nascent data. "It came at the right time in the genome project when the sequence was nearly in hand, and we could concentrate on why we did the sequence in the first place," says Aravinda Chakravarti, director of the Institute of Genetic Medicine at Johns Hopkins University.
While earlier studies had hinted at this structure of the genome, most scientists still assumed that genomic data would be very confusing, says Mark Daly, Whitehead's human genetics informatics director. "The true simplicity of the underlying structure was not revealed in any of the earlier studies due to too few markers, too little sampling."
Four studies published in October 2001 cleared some of the confusion and set the stage for Cox's and Altshuler's reports. Alec Jeffreys, a geneticist at the University of Leicester, UK, described the occurrence of recombinational hotspots, or short regions of low linkage disequilibrium (LD) due to recombination.3 Whitehead/MIT scientists illustrated how a haplotype in the chromosome 5q31 region is a risk factor for Crohn disease,4 and explained how a single representative SNP, or tag SNP, can be used to characterize each block.5 Finally, geneticist John Todd and colleagues called for more SNP identification in order to facilitate the disease-gene search.6
Then, this issue's Hot Papers presented support for the idea that haplotype information from a few specific populations could be extrapolated to all humans in order to identify potential disease loci. Cox explains that the blocks containing correlated SNPs are relatively short, which means that many SNPs are needed to identify disease-related genes. "The good news is that you don't have to look at all of them," says Cox, "because within that 12,000 base pair region, there's only, on average, three flavors that account for almost everybody. It's incredible."
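The tag-SNP idea behind Cox's remark can be made concrete with a toy example. The sketch below is only an illustration under stated assumptions: the four haplotypes, the six SNP positions, and the function names `find_tag_snps` and `classify` are hypothetical, and the brute-force search is not the method used in any of the studies discussed here. It looks for the smallest set of SNP positions whose alleles are enough to tell every haplotype in a block apart.

```python
from itertools import combinations

# Hypothetical toy block: four haplotypes scored at six SNP positions
# (0 = one allele, 1 = the other). Illustrative values only, not study data.
HAPLOTYPES = {
    "A": (0, 0, 1, 0, 1, 1),
    "B": (0, 1, 1, 0, 0, 1),
    "C": (1, 0, 0, 1, 1, 0),
    "D": (1, 1, 0, 1, 0, 0),
}


def find_tag_snps(haplotypes):
    """Return the smallest tuple of SNP positions that distinguishes
    every haplotype from every other (brute-force search)."""
    rows = list(haplotypes.values())
    n_snps = len(rows[0])
    for size in range(1, n_snps + 1):
        for positions in combinations(range(n_snps), size):
            # Alleles seen at the candidate positions, one signature per haplotype
            signatures = {tuple(row[p] for p in positions) for row in rows}
            if len(signatures) == len(rows):
                return positions
    # Reached only if two haplotypes are identical at every position
    return tuple(range(n_snps))


def classify(sample, haplotypes, tags):
    """Name the haplotype whose alleles match the sample at the tag positions."""
    key = tuple(sample[p] for p in tags)
    for name, row in haplotypes.items():
        if tuple(row[p] for p in tags) == key:
            return name
    return None  # sample matches none of the listed haplotypes


tags = find_tag_snps(HAPLOTYPES)
print(tags)                                            # (0, 1)
print(classify((1, 0, 0, 1, 1, 0), HAPLOTYPES, tags))  # C
```

In this toy block, two positions suffice because two biallelic SNPs can encode up to four distinct combinations, which is why typing only the tags can stand in for typing all six SNPs.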
Some researchers had predicted this structure before, but never with such "a massive amount of data," says Cox. His group examined two haploid copies of chromosome 21 from 24 "ethnically diverse individuals" from the NIH-sponsored Coriell Cell Repositories in Camden, NJ. Altshuler's group looked at 275 individuals from four populations: 50 unrelated African Americans, 42 unrelated Japanese and Chinese individuals, 93 of European ancestry, and 30 parent-offspring trios from the Yoruba-speaking peoples in Nigeria. Altshuler, Whitehead's medical and population genetics program director, explains that these did not constitute a comprehensive survey of the human population, but that population genetics literature supports the idea that these "were somewhat meaningful as coarse groupings."
These two papers "asserted that what was predicted [before] ... was actually the case for the majority of the human genome," says Daly. He adds that these papers were the justification for pursuing the construction of a whole genome haplotype map.
An international consortium with research teams from Japan, China, Canada, the United Kingdom, and the United States launched the $120 million HapMap Project in October 2002. Since then, scientists have been collecting and analyzing DNA samples of 270 people from four populations: Europeans (samples from the French CEPH genotype database), Nigerian Yorubans, Japanese, and Chinese. This past December the consortium published its scientific strategy.7
When first announced, the HapMap was touted as the way to use genomic data to tackle medical conundrums such as complex genetic disorders and to identify variations that might contribute to good health. Knowing that the human genome could be parsed into smaller segments "made it real and practical for us to search the human genome for disease associations," says Hopkins' Chakravarti.
DISSENT AND APPEAL
But not everyone was convinced. Yale University geneticist Kenneth Kidd says that the genome structure proposed by the Perlegen and Whitehead groups is a "gross oversimplification." The idea that three or four haplotypes would account for the entire human population, he says, is not accurate. "Of course there are blocks," he says, "but that's not the majority of the genome." He agrees that there are regions of reduced recombination mixed with recombinational hotspots, but he says, "there are lots of regions where there just appears to be random recombination at sort of a uniform rate."
Besides questioning the interpretation of their findings, he questions the actual data. "Much of [Perlegen's] data might be purely a consequence of the ethnic stratification of their sample and have nothing to do with linkage disequilibrium in the genome in general," says Kidd. "The problem is, because you can't know the ethnicities, you can't test that within those data."
Many say the HapMap Project may not answer the questions its sponsors initially claimed it would. Pennsylvania State University anthropologist and biologist Ken Weiss says he is "skeptical that the project will be able to identify things of major health importance in the way that was claimed." Weiss adds that momentum may have played a role in raising support for the project, and Kidd agrees: "A lot of money was set aside by people who were convinced by the sales pitch."
Those involved with the project disagree. "There's so much widespread support from various genetic communities," says Chakravarti. "It's difficult for anyone to argue that this is a completely misguided project." This structure "turns out to be a very simple, appealing, and elegant hypothesis," he adds.
Regardless of how it came to be, Chakravarti notes that the structure exists, it has led to discoveries, and it will continue to do so. "It sounds very trite that the possibilities are endless, but we really haven't figured out all of the ways in which we can use genomic data."
Maria W. Anderson can be contacted at