Phase I of HapMap Complete

International consortium publishes most comprehensive catalog of human genetic variation to date

Oct 26, 2005
David Secko

Researchers have released a public database of human genetic variation, designed to help scientists study the effects of small genetic differences on health, reports an international consortium in this week's Nature. The findings suggest that only 260,000 to 470,000 single nucleotide polymorphisms (SNPs) are needed to capture all the common genetic variation in the populations studied, despite the fact that there are an estimated 10 million common SNPs in the human genome.

The HapMap, launched in 2002 by the International HapMap Consortium, is a catalogue of millions of SNPs that maps the natural organization of the human genome in blocks called haplotypes. "The HapMap is a resource that ushers in a new era of disease studies by effectively allowing all the common variation in the human genome to be compared," said Peter Donnelly, from the University of Oxford, UK, and one of the authors of the paper.

However, some concerns have surrounded the HapMap project in the past – for instance, scientists have questioned whether haplotypes are the best way to search for genetic factors, and whether the populations used are really representative of human diversity.

Newton Morton from the University of Southampton, UK, who was not involved in the study, suggested that some of these debates are still swirling, but most researchers are now focused on what can be done with the HapMap. The International HapMap Consortium has "a few very good maps from several populations, these are not huge by a long shot, but I think they will be very useful," Morton told The Scientist.

Indeed, scientists have already expressed excitement about the HapMap's potential, given that it proposes a solution to the relatively unsuccessful track record genetics has in dissecting complex disease traits. "Most of the common diseases, like hypertension, stroke, and heart disease, have an important genetic component," said Donnelly. "But, for most of these diseases, we understand very little of what is going on with them. It's pretty depressing actually."

One potential solution is to compare people with and without a disease to measure how they differ genetically. But comparing 10 million common SNPs in people is "just too expensive given current technology," Donnelly told The Scientist. However, a few years ago, scientists recognized that the human genome is organized into haplotypes, providing a potential shortcut. "With [haplotypes] we might be able to get away with only comparing 5-10% of the 10 million SNPs, suddenly making searches affordable," Donnelly noted.
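The shortcut Donnelly describes can be illustrated with a toy example. The sketch below is only an illustration, not the consortium's method: the function name `select_tag_snps` and the six made-up chromosomes are hypothetical, and it assumes the simplest possible case, in which SNPs that always travel together carry identical information. Real analyses typically rely on a statistical correlation threshold between SNPs rather than exact matching.

```python
def select_tag_snps(haplotypes):
    """Group SNPs that carry identical information; keep one "tag" per group.

    haplotypes: list of equal-length strings of '0'/'1', one per chromosome
    (hypothetical toy encoding: '0' and '1' are the two alleles of each SNP).
    Returns a dict mapping each tag SNP index to the SNP indices it represents.
    """
    n_snps = len(haplotypes[0])
    groups = {}  # allele pattern across chromosomes -> list of SNP indices
    for j in range(n_snps):
        column = tuple(h[j] for h in haplotypes)
        # A pattern and its complement differ only in allele labelling,
        # so normalise every pattern to start with '0'.
        if column[0] == "1":
            column = tuple("1" if a == "0" else "0" for a in column)
        groups.setdefault(column, []).append(j)
    # The first SNP seen in each group serves as its tag.
    return {members[0]: members for members in groups.values()}


# Toy data (assumed, not from the HapMap): 6 chromosomes typed at 6 SNPs.
toy_haplotypes = [
    "000000",
    "000000",
    "111000",
    "111000",
    "000111",
    "111111",
]

tags = select_tag_snps(toy_haplotypes)
for tag, members in tags.items():
    print(f"SNP {tag} stands in for SNPs {members}")
```

In this toy data, SNPs 0 to 2 always occur together, as do SNPs 3 to 5, so genotyping two SNPs recovers all six. That is the same logic, on a vastly smaller scale, behind comparing only a fraction of the 10 million common SNPs.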

During the study, researchers took 296 DNA samples from four populations in Nigeria, Tokyo, Beijing and Utah, aiming to genotype one SNP for every 5 kb of genome. They characterized over one million SNPs, verified the low haplotype diversity in the above populations, and created a fine-scale genetic map of 21,617 recombination hotspots.

Results from a second phase of the project were also added to the database on Monday, Donnelly noted. This project analyzed an additional 2.1 million SNPs and was done in collaboration with Perlegen Sciences, Inc., California.

"Approaching 3 million SNPs, I think, is far ahead of anyone's prediction of what the project could do," said Morton. "So, [the HapMap] is a tremendous step forward, for which the pay off could be quite large, since this puts us at the stage of looking for genes of small effect in disease," he added.

Both Morton and Donnelly note that scientists are already benefiting from the project – for instance, Josephine Hoh and colleagues at Yale University used the HapMap to link the complement factor H gene (HF1) to age-related macular degeneration, a leading cause of blindness in the U.S.

"We have been using the data from the HapMap for awhile," Hoh told The Scientist, "and I'm glad to have it, since otherwise, we would have had to painstakingly narrow the search for HF1, so it saved a lot of energy and time."

Indeed, in a report appearing in the same issue of Nature, Vivian Cheung and colleagues from the University of Pennsylvania used the HapMap to look at the genetic basis of natural variation in gene expression. Starting with a scan of the whole genome, they were able to find a functional cis-acting transcription regulator for one test gene (chitinase 3-like 2).

Ritsert Jansen, from the Groningen Bioinformatics Centre, said this is indeed something to get excited about. "The advent of detailed information about SNP variation is highly essential for genetical genomics in experimental -- like mice -- and non-experimental -- like human -- organisms," Jansen, who was not involved in either study, told The Scientist.

Nevertheless, Jansen, Morton and Donnelly all caution that the statistical effects gleaned from genome-wide association studies using the HapMap need to be rigorously verified to avoid false positives. "There has been a history in the field of claimed results that haven't replicated," said Donnelly. "So now, I think, there is a widespread feeling that we have to be careful about how we do these experiments…for instance, trying to replicate studies before we get too excited," he said.