<figcaption> Credit: Illustration: Thom Graves</figcaption>

Phase 1 of the International HapMap Project (http://www.hapmap.org), published in November 2005, was hailed by the mainstream press as a revolutionary tool for gene-association studies. Researchers using the data have been similarly enthusiastic. Says Jeanette McCarthy of the Graduate School of Public Health, San Diego State University, "It's an unprecedented resource. ... It provides a lot more information not just for people doing whole genome association studies, but [also] for those focused on specific regions of the genome or even candidate genes. It can add a lot of information and help us pinpoint the genes a lot easier."

Left out of the discussion, however, are more practical issues. Like any map, the HapMap requires some training to use properly. How, for instance, do you use the data? Are there things to look out for when choosing SNPs (single nucleotide polymorphisms) and determining haplotype block...

1. DON'T USE HAPMAP FOR RARE VARIANTS

At the core of the HapMap is the concept of linkage disequilibrium (LD), a numerical measure of how well a SNP in a particular location correlates with a SNP in another location nearby; high LD between two SNPs means that the presence of an allele at one SNP is a good predictor of the other SNP's allele. So-called htSNPs (haplotype-tagging SNPs) are SNPs in high LD with every other polymorphism located in a haplotype region or block; in other words, an htSNP can be used as a surrogate marker for every SNP within that particular block. Others prefer tagSNPs, which can be determined using pairwise LD (e.g., r² ≥ 0.8), without reference to blocks.
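To make the pairwise approach concrete, here is a minimal Python sketch that computes r² between biallelic SNPs from phased haplotypes and then greedily picks tags at a threshold of 0.8. The function names, the 0/1 allele coding, and the toy data are assumptions made for illustration; this is not the algorithm used by any particular HapMap tool.

```python
from itertools import combinations

def r_squared(hap_a, hap_b):
    """Pairwise LD (r^2) between two biallelic SNPs.

    hap_a, hap_b: lists of 0/1 alleles observed on the same phased chromosomes.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom

def greedy_tag_snps(haplotypes, threshold=0.8):
    """Greedy pairwise tagging (an illustrative heuristic, assumed here).

    haplotypes: dict mapping SNP id -> list of 0/1 alleles.
    Returns a list of tag SNPs such that every SNP is either a tag
    or has r^2 >= threshold with at least one tag.
    """
    snps = list(haplotypes)
    captures = {s: {s} for s in snps}         # each SNP captures itself
    for s, t in combinations(snps, 2):
        if r_squared(haplotypes[s], haplotypes[t]) >= threshold:
            captures[s].add(t)
            captures[t].add(s)
    untagged, tags = set(snps), []
    while untagged:
        # Pick the SNP that captures the most still-untagged SNPs.
        best = max(snps, key=lambda s: len(captures[s] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags

# Toy data: eight chromosomes, four SNPs (hypothetical values).
toy = {
    "snp1": [0, 0, 1, 1, 0, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1, 0, 1],   # identical to snp1
    "snp3": [1, 1, 0, 0, 1, 0, 1, 0],   # mirror image of snp1
    "snp4": [0, 1, 0, 1, 0, 1, 0, 1],   # only weakly correlated with snp1
}
tags = greedy_tag_snps(toy)
```

In this toy set, snp1, snp2 and snp3 are perfectly correlated, so any one of them could serve as the tag; which one is chosen depends only on tie-breaking order.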

In gene-association studies, users generally pick five or six SNPs with a high degree of LD for a known variation, making the assumption that if these SNPs tag well for that variation, they will also tag well for another, unknown variation associated with it. Most of the time this will be a good assumption, but not always, says Oxford scientist Gilean McVean. One case involves rare variation, though McVean says more sophisticated algorithms, yet to be developed, might be able to tag these. "The second is that some SNPs are simply untaggable," he says, "and it's not clear why that's true. They may be hot spots of mutation ..., they may be hotspots of gene conversion, or they might be sitting in the middle of recombination hot spots. And if any of those three is true, then the current HapMap won't tag those."

That's the conclusion Oxford researcher Eleftheria Zeggini and colleagues came to in a recent evaluation of HapMap performance,1 which showed that SNPs selected using different algorithms all captured common variations well. However, she says, "Dedicated resequencing efforts in large sample sizes will be necessary in order to capture rare variants, those occurring with minor allele frequency [less than five percent]."
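As a rough illustration of the five percent cutoff, the Python sketch below separates SNPs into common and rare by minor allele frequency. The function names, the 0/1 allele coding, and the sample of 120 chromosomes are assumptions chosen for the example rather than anything prescribed by the HapMap.

```python
def minor_allele_frequency(alleles):
    """Frequency of the less common allele; alleles is a list of 0/1 codes."""
    freq = sum(alleles) / len(alleles)
    return min(freq, 1 - freq)

def split_by_maf(haplotypes, cutoff=0.05):
    """Return (common, rare) lists of SNP ids using the given MAF cutoff."""
    common, rare = [], []
    for snp, alleles in haplotypes.items():
        if minor_allele_frequency(alleles) < cutoff:
            rare.append(snp)      # unlikely to be well captured by tags
        else:
            common.append(snp)
    return common, rare

# Hypothetical sample of 120 chromosomes.
example = {
    "snp_rare": [1] * 3 + [0] * 117,     # MAF = 3/120 = 0.025
    "snp_common": [1] * 30 + [0] * 90,   # MAF = 30/120 = 0.25
}
common, rare = split_by_maf(example)
```

SNPs that land in the rare list are the ones for which tagging is least likely to be reliable, and for which resequencing is the safer route.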

2. NOT ALL tagSNPs ARE CREATED EQUAL

Several tools (see Table) are available to help you pick HapMap SNPs, but not all of them will give you the same answer.

"Really, all they're doing is going through the HapMap data, computing statistical correlations between the different SNPs," says Lon Cardon of the Wellcome Trust Centre for Human Genetics. "Whenever they see a high correlation, they pick one of them, and say ... you just need to genotype this one out of this set. But naturally, there's multiple solutions to that problem." Adds Michael Nothnagel of the University of Kiel, Germany (whose own work demonstrated the effect of SNP marker choice on haplotype block patterns2): "Depending on the marker set you use, you might get different haplotype blocks and different haplotype patterns in these different subsets."

Nothnagel adds that further verification of the LD pattern is also needed. "You can have identical patterns with different SNP sets ... but you cannot assume that this is always the case," he says, noting that in his view, the pairwise method of selecting SNPs is more stable than the block approach.

Cardon suggests that comparisons between two studies using different tags be done at the marker level. "If I wanted to see how my answers compared with someone else's, I [would] genotype the same markers that they did, not necessarily just proxies," he says.

3. DON'T DISCARD YOUR FAILED DATA

Scientists can use the HapMap to identify large insertion-deletion polymorphisms by mining apparent genotyping failures (such as null genotypes or Mendelian inheritance inconsistencies) for patterns that are indicative of these chromosomal abnormalities.

Until now, deletions have been hard to detect with standard SNP genotyping assays. With the HapMap, though, one can scan the genotyping data for unusual regions that match the appearance of a deletion.3–5 "If one is performing an association study with a genome-wide set of SNPs, one is not only testing the standard combination of single nucleotides for associations, but one has the ability to test also for the role of large structural genetic variants as well, in the pathogenesis of disease," says Mark Daly of the Broad Institute in Cambridge, Mass.
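One way to picture such a scan: in a parent-child trio, a parent who carries a deletion on one chromosome is typically called as homozygous for the remaining allele, and a child who inherits the deletion can then appear to violate Mendelian inheritance. The Python sketch below flags runs of neighboring SNPs with failed calls or apparent inconsistencies. It is a simplified illustration under assumed data structures and names, not the method of the cited studies.

```python
def mendelian_consistent(father, mother, child):
    """True if the child's genotype can be formed from one allele of each parent.

    Genotypes are two-letter strings such as "AG".
    """
    return any(sorted(f + m) == sorted(child) for f in father for m in mother)

def candidate_deletion_runs(trio_genotypes, min_run=2):
    """Find runs of consecutive suspicious SNPs in one trio.

    trio_genotypes: list of (position, father, mother, child) sorted by
    position; a genotype of None marks a failed (null) call.
    Returns (start, end) positions for runs of at least min_run SNPs.
    """
    runs, current = [], []
    for pos, father, mother, child in trio_genotypes:
        failed = None in (father, mother, child)
        if failed or not mendelian_consistent(father, mother, child):
            current.append(pos)
        else:
            if len(current) >= min_run:
                runs.append((current[0], current[-1]))
            current = []
    if len(current) >= min_run:               # close a run at the end of the data
        runs.append((current[0], current[-1]))
    return runs

# Hypothetical trio data: (position, father, mother, child).
trio = [
    (100, "AA", "AG", "AG"),   # consistent
    (200, "CC", "TT", "CC"),   # child lacks a maternal allele
    (300, "GG", "AA", "GG"),   # child lacks a maternal allele
    (400, "TT", None, "TT"),   # null genotype in the mother
    (500, "AC", "CC", "AC"),   # consistent
    (600, "GG", "TT", "GG"),   # isolated inconsistency, too short to report
]
runs = candidate_deletion_runs(trio)
```

Requiring a run of adjacent suspicious SNPs, rather than reporting every single failure, is what separates a plausible deletion from an isolated genotyping error; the `min_run` value here is arbitrary.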

4. DON'T ASSUME HAPLOTYPE BLOCKS HAVE FIXED ENDS

Yale researcher Josephine Hoh and colleagues recently performed a genome-wide screen for SNPs associated with age-related macular degeneration (AMD) in 146 Caucasian subjects and identified a 500-kb region containing two alleles associated with the disease.6 Because the region was too wide for fine mapping, Hoh's team looked at data from the HapMap project and found that the two SNPs were located within a much smaller (41-kb) block in a gene already known to be strongly associated with AMD.

Hoh's team assumed the block would contain functional polymorphisms in linkage disequilibrium with the risk alleles. But after resequencing the exons, they found that a functional polymorphism associated with AMD was in fact located 2 kilobases upstream, just outside the HapMap block.

One lesson, says Hoh, is this: Researchers who use HapMap data to fine-tune their disease-association studies must rely on educated judgment when determining block boundaries. "The question will be: Are you going to sequence in just that block of intervals, or do you want to widen the interval a little bit and do a larger size of sequencing?" she says. "After all, the cohorts we are investigating would never be the same as the ones in the HapMap project."

5. VALIDATE, VALIDATE, VALIDATE

Keith Cheng and colleagues at the Pennsylvania State University College of Medicine, along with Penn State anthropologist Mark Shriver, used the HapMap recently in a study of the genetic basis of human pigment variation.7 Cheng's team first identified a human ortholog to a zebrafish gene associated with changes in melanosome number and morphology in zebrafish stripes, then used HapMap data to search for polymorphisms in that gene that showed high-frequency differences among populations. They found a SNP at amino acid 111 that encodes either alanine (the ancestral allele, predominant in darker-skinned Africans and Asians) or threonine (a mutation present in all of the lighter-skinned European population).

Cheng points out that his team performed three levels of validation: frequency difference between populations, regional evidence of selection, and functional evidence. Although the genomic data clearly provided evidence of selection, Cheng's team performed a second study in two admixed populations (with measured skin pigmentation levels) not included in the HapMap data to confirm their findings. "When you do these sorts of studies, you need to do multiple levels of validation in order to prove that what you assert is correct. I think it's very dangerous to start with any one feature alone, such as just frequency or just population distribution around the world. There can be a lot of artifacts caused by things like bottlenecks," Cheng cautions.

aconstans@the-scientist.com

References

1. E. Zeggini et al., "An evaluation of HapMap sample size and tagging SNP performance in large-scale empirical and simulated data sets," Nat Genet, 37:1320–2, December 2005.
2. M. Nothnagel, K. Rohde, "The effect of single-nucleotide polymorphism marker selection on patterns of haplotype frequency estimates," Am J Hum Genet, 77:988–98, December 2005.
3. D.A. Hinds et al., "Common deletions and SNPs are in linkage disequilibrium in the human genome," Nat Genet, 38:82–5, January 2006.
4. S.A. McCarroll et al., "Common deletion polymorphisms in the human genome," Nat Genet, 38:86–92, January 2006.
5. D.F. Conrad et al., "A high-resolution survey of deletion polymorphism in the human genome," Nat Genet, 38:75–81, January 2006.
6. R.J. Klein et al., "Complement factor H polymorphism in age-related macular degeneration," Science, 308:385–9, 2005.
7. R.L. Lamason et al., "SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans," Science, 310:1782–6, Dec. 16, 2005.
