Take any two individuals, sequence and compare their genomic DNA, and you'll find that the vast majority (about 99.9%) of the sequences are identical. In the remaining 0.1% lie differences in disease susceptibility, environmental response, and drug metabolism. Researchers are understandably keen to dissect these variations, most of which take the form of single-nucleotide polymorphisms (SNPs).
A SNP (pronounced "snip") is a substitution of one base pair at a given location on the genome. At position 11,294,479 on human chromosome 7, for instance, some people have an A, while others have a G. SNPs occur on average about once every 300 bases throughout the human genome, for an estimated total of nearly 10 million. Each is a genomic landmark, a surveyor's marker that researchers can use to chart the location of disease genes and heritable traits.
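The idea of a single-base difference at a fixed position can be made concrete with a minimal sketch. The two short sequences below are invented for illustration, and the genome size of roughly 3 billion bases is an assumption not stated in the article; a difference between two individuals is only a candidate SNP until it is seen across a population.

```python
def find_snps(seq_a, seq_b):
    """Return (position, base_a, base_b) for every single-base difference
    between two aligned, equal-length sequences (positions are 1-based)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]

# Hypothetical aligned fragments from two individuals.
person_1 = "GATTACAGGCT"
person_2 = "GATTGCAGGCT"
snps = find_snps(person_1, person_2)
print(snps)  # [(5, 'A', 'G')]

# Back-of-envelope check of the "one SNP per 300 bases" figure.
GENOME_BASES = 3_000_000_000  # assumption: approximate human genome size
SNP_SPACING = 300             # from the text
estimated_snps = GENOME_BASES // SNP_SPACING
print(estimated_snps)  # 10000000
```

The arithmetic shows why the two figures in the text agree: a spacing of 300 bases across about 3 billion bases gives roughly 10 million SNPs.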
Most SNPs reside outside coding regions, exerting potential influence on gene regulation and expression. Many researchers value these SNPs for use in association studies and whole-genome linkage-disequilibrium mapping. In this type of analysis, maps of common, genome-wide polymorphisms are used to unearth variations that are associated with, but not causative of, medical conditions.
But some polymorphisms occur in protein-coding regions (cSNPs) and may directly contribute to disease, disease susceptibility, and drug metabolism by altering gene function. An estimated 200,000 cSNPs are present in the human genome,[1] and scientists like David Altshuler, director of the Program on Medical and Population Genetics at the Whitehead Genome Center in Cambridge, Mass., say such SNPs should take precedence in large-scale genotyping studies.
Cataloging SNPs across the genome is one of the goals of the Human Genome Project[2] and a focus of the SNP Consortium. Together, these two projects have contributed millions of SNPs to dbSNP, a public database maintained by the National Center for Biotechnology Information (NCBI). The availability of so many genetic markers presents an interesting problem: How can researchers sample SNPs on a genome-wide scale to take advantage of all the information? Scientists have developed several solutions, including microarray, mass spectrometry, and bead-based approaches. "There is no clear winner yet," says Altshuler, "but there's plenty of excitement."
A TECHNOLOGICAL FIX
Whole-genome genotyping of 10 million SNPs is a monumental effort, technologically daunting, and prohibitively expensive. A newly launched initiative called the HapMap project (see sidebar) aims to simplify whole-genome genotyping by mapping haplotypes, blocks of neighboring SNPs that tend to be inherited together, so that a few representative SNPs can stand in for each block. Such shortcuts "provide substantial statistical power in association studies of common genetic variation," according to a recent report.[3]
Theoretically, researchers using the HapMap will be able to genotype an individual using only 500,000 SNPs, but typing even that smaller number of SNPs is a tremendous challenge, and a huge expense. "Estimates of the number of SNPs required for good genome-wide linkage disequilibrium studies varies a great deal," says Gabor Marth of the NCBI, "but the price of genotyping has to drop significantly before we can simultaneously test tens or hundreds of thousands of SNPs in large clinical samples."
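The scale of the problem Marth describes can be sketched with simple arithmetic. The SNP counts come from the text; the sample size and the per-genotype price are purely hypothetical placeholders, chosen only to show how the totals scale.

```python
def study_genotypes(num_snps, num_samples):
    """Total genotype calls needed to type every SNP in every sample."""
    return num_snps * num_samples

ALL_SNPS = 10_000_000     # from the text: estimated SNPs in the genome
HAPMAP_SNPS = 500_000     # from the text: SNPs needed when using the HapMap
SAMPLES = 1_000           # hypothetical clinical study size
CENTS_PER_GENOTYPE = 10   # hypothetical price, in US cents

full = study_genotypes(ALL_SNPS, SAMPLES)
reduced = study_genotypes(HAPMAP_SNPS, SAMPLES)

reduction_factor = full // reduced
reduced_cost_dollars = reduced * CENTS_PER_GENOTYPE // 100

print(reduction_factor)      # 20
print(reduced_cost_dollars)  # 50000000
```

Under these assumed numbers, the haplotype shortcut cuts the workload 20-fold, yet a 1,000-person study would still require half a billion genotype calls, which is why the per-genotype price matters so much.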
Several companies are scrambling to fill this niche by developing high-throughput, accurate, and affordable technologies for large-scale SNP genotyping. What follows is a profile of four such technologies, each approaching the problem from a unique angle. "The technologies for accurate, high-throughput genotyping are improving daily, and their actual costs are likely to be quite low," says Altshuler.
THE MICROARRAY APPROACH
Affymetrix, best known for GeneChip® microarrays for gene expression analysis, has now turned its attention to large-scale genotyping. In 1999, the Santa Clara, Calif.-based company released its GeneChip HuSNP™, capable of profiling 1,200 SNPs simultaneously. Now its GeneChip Mapping 10K Array (in early access) genotypes 10,000 SNPs per assay. By the end of the year, Affymetrix expects to begin offering early access to next-generation products that can genotype 100,000 SNPs per assay across multiple arrays.
"The use of the chips accelerates the mapping process considerably," says Peter Nuernberg, of the Gene Mapping Center in Berlin's Max Delbrück Center for Molecular Medicine. "It takes only a couple of days as compared to months when using microsatellites. Furthermore, we expect the chip approach [with the 100,000 SNPs] to be the best method for whole-genome association studies."
Scientists at Perlegen Sciences, an Affymetrix affiliate, have pushed the genotyping envelope even further. In 2001, Nila Patil and colleagues resequenced human chromosome 21 from each of 20 individuals using GeneChip sequencing arrays.[4] They then compared the sequences, finding nearly 36,000 SNPs and defining the chromosome's haplotype structure.
More recently, the company has used larger-format GeneChips (five-inch wafers with 60 million probes) to resequence 50 full haploid human genomes in 18 months, discovering 1.7 million SNPs; it has licensed the SNP collection to Affymetrix. According to David Cox, Perlegen's chief scientific officer, "Perlegen has developed pooled, quantitative genotyping as well as individual genotyping assays for all 1.7 million SNPs." Its scientists are presently using these assays in conjunction with their human genome haplotype map to carry out whole-genome association studies in collaboration with several pharmaceutical partners.
THE MIP APPROACH
Affymetrix recently agreed to supply biotech startup ParAllele BioScience of South San Francisco, Calif., with GeneChip Tag Arrays (universal detection biochips) for use with ParAllele's Molecular Inversion Probe™ (MIP™) assays for custom SNP genotyping. MIP is a highly multiplexed, platform-independent assay and, when combined with Tag Arrays, provides scalable and flexible DNA-analysis applications.
MIPassays employ a series of enzymatic steps (nucleotide addition, ligation, and digestion) to invert an MIP probe (hence the name) into a form suitable for PCR amplification. A unique sequence tag targets each MIP probe to a specific address on the GeneChip Tag Array for SNP calling.
"Up to 10,000 genotyping reactions can be combined in a single tube with MIPassay technology," says Tom Willis, ParAllele president and CEO. He adds, "ParAllele technologies provide a high-throughput platform for all kinds of SNP genotyping, including difficult applications such as fine mapping, in which custom SNP assays are needed." ParAllele is using MIPassays in its work on the International HapMap Project, in collaboration with Baylor College of Medicine in Houston.
THE BEADARRAY APPROACH
In October 2002, the NIH selected Illumina, a genotyping services provider, as one of five US HapMap project participants. The work will be conducted in Illumina's San Diego genotyping service facility with the aid of the company's proprietary BeadArray™ technology and GoldenGate™ assay.
BeadArray technology employs two components: bundles of about 50,000 optical-fiber strands, each strand with a microscopic well at its tip; and three-micron beads of 1,520 types (each type can interrogate one SNP) that fit into those wells. The fiber bundle is dipped into a pool of coated beads, which then self-assemble into the wells to form an array in which each bead type is represented about 30 times. To improve throughput, Illumina's Sentrix™ multiarray matrix contains 96, 384, or 1,536 bundles of optical-imaging fibers, so users can genotype 1,520 SNPs in each sample of a microtiter plate simultaneously.
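The BeadArray figures can be checked with a short calculation. This is only a sketch: it assumes every fiber well ends up holding a bead and that the 1,520 SNP-specific bead types are equally represented, neither of which the article states.

```python
FIBERS_PER_BUNDLE = 50_000   # from the text: strands (wells) per bundle
SNPS_PER_BUNDLE = 1_520      # from the text: SNPs interrogated per bundle

# Average number of wells available per SNP if all wells are filled evenly.
redundancy = FIBERS_PER_BUNDLE / SNPS_PER_BUNDLE
print(round(redundancy, 1))  # 32.9, in line with the quoted 30-fold figure

# Genotypes read per Sentrix matrix, one sample per fiber bundle.
genotypes_per_matrix = {
    bundles: bundles * SNPS_PER_BUNDLE for bundles in (96, 384, 1536)
}
for bundles, genotypes in genotypes_per_matrix.items():
    print(bundles, genotypes)
# 96 145920
# 384 583680
# 1536 2334720
```

Under these assumptions, a single 96-sample matrix yields close to 146,000 genotypes per run, and the redundancy means each call can rest on about 30 independent bead readings.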
Allele discrimination is performed in a microtiter plate using Illumina's GoldenGate allele-specific extension PCR-based assay, which identifies each SNP with a discrete fluorescent tag and a unique "address" to target a particular bead in the array. Once the microtiter plate is "mated" with a Sentrix multiarray device to capture the reaction products on the beads, Illumina's Sherlock™ confocal scanner scans the arrays and "calls" the genotypes at each marker.
THE MASS SPEC APPROACH
San Diego-based Sequenom uses mass spectrometry as its enabling technology for high-throughput genotyping. "Accuracy improves with this approach," says Dirk van den Boom, director of Molecular Biology at Sequenom, "because we're measuring the molecule of interest directly."
At the heart of this technology is a beadless, label-free, primer-extension chemistry called homogeneous MassEXTEND™ (hME). Each allele-specific primer extension product has a unique molecular weight that allows individual genotyping with mass spectrometry, as well as multiplexing when assays are designed in a staggered fashion. Using Sequenom's newest MassARRAY™ 200K system, researchers can genotype 200,000 samples per day, or as many as one million SNPs with fivefold multiplexing.
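The principle of calling a genotype from molecular weight can be illustrated with a minimal sketch. It is not Sequenom's software: the function name, the expected masses, and the mass tolerance are all hypothetical, and a real spectrum would also contain peaks (such as unextended primer) that this toy version ignores. The throughput line treats the quoted 200,000 figure as reactions per day, which is one reading of the text.

```python
def call_genotype(peaks, expected_masses, tolerance=2.0):
    """Call a genotype from observed peak masses (in daltons).

    expected_masses maps each allele to the mass of its extension product.
    An allele counts as present if any peak lies within `tolerance` of it.
    """
    detected = sorted(
        allele
        for allele, mass in expected_masses.items()
        if any(abs(peak - mass) <= tolerance for peak in peaks)
    )
    if len(detected) == 1:
        return detected[0] * 2      # homozygous, e.g. "GG"
    if len(detected) == 2:
        return "".join(detected)    # heterozygous, e.g. "AG"
    return None                     # no call

# Hypothetical product masses for one A/G SNP.
EXPECTED = {"A": 5500.0, "G": 5800.0}

het = call_genotype([5499.2, 5800.9], EXPECTED)
hom = call_genotype([5800.4], EXPECTED)
no_call = call_genotype([6100.0], EXPECTED)
print(het, hom, no_call)  # AG GG None

# Throughput arithmetic from the text: fivefold multiplexing.
REACTIONS_PER_DAY = 200_000
MULTIPLEX_LEVEL = 5
genotypes_per_day = REACTIONS_PER_DAY * MULTIPLEX_LEVEL
print(genotypes_per_day)  # 1000000
```

Multiplexing works on the same principle: if the expected masses for several SNPs are staggered so that no two products fall within the tolerance of each other, one spectrum can be decoded into several genotypes.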
In an hME assay, a sequence-specific primer is annealed to an area adjacent to the polymorphic site and extended differentially depending on which allele is present, producing allele-specific products that are distinguishable by MALDI-TOF MS (matrix-assisted laser desorption/ionization time-of-flight mass spectrometry). The sample is then dispensed to a SpectroCHIP™ target plate (96 or 384 elements) for mass spectrometry. The masses and the corresponding genotypes are determined with MassARRAY RT software.
"This technology is easy to use," says Edward I. Ginns, director, Brudnick Neuropsychiatric Research Institute at the University of Massachusetts Medical School, Worcester. "But we also chose it because of the potential developments in future high-throughput technology, for its flexibility, and because it isn't likely to become outdated."
Sequenom is using its Allele Frequency Analysis application internally to complete eight independent high-resolution scans of the human genome and plans to finish at least seven more by midyear. The MassARRAY platform is in use at more than 85 customer sites worldwide, including the Wellcome Trust Sanger Institute and the Whitehead Institute for Biomedical Research, which together account for a substantial portion of the work performed for the SNP Consortium's Allele Frequency/Genotype Project.
WHAT'S NEXT
The approaches described above represent only a fraction of available high-throughput genotyping technologies. Other options include offerings from Amersham Biosciences, Applied Biosystems, Beckman Coulter, GlaxoSmithKline, Luminex, Lynx Therapeutics, Molecular Devices, Nanogen, PerkinElmer, Qiagen, Third Wave Technologies, and Variagenics.
Which approach will ultimately prove hardiest is anyone's guess: Most of the methods described here are still too new to have been evaluated in the peer-reviewed literature. Until this happens, says Altshuler, "Many scientists and institutions will remain on the sidelines, waiting to see which ones perform as described and emerge as winners. The HapMap project intentionally involves a variety of genotyping methods, with the expectation that most of the effective technologies will be sorted out within a year."
Marilee Ogren is a freelance writer in Boston.
1. F.S. Collins et al., "Variations on a theme: Cataloging human DNA sequence variation," Science, 278:1580-1, 1997.
2. F.S. Collins et al., "New goals for the US human genome project: 1998-2003," Science, 282:682-9, 1998.
3. S.B. Gabriel et al., "The structure of haplotype blocks in the human genome," Science, 296:2225-9, 2002.
4. N. Patil et al., "Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21," Science, 294:1719-23, 2001.