Getting Started with SNPs


Laura Spinney
Nov 20, 2005


Richard Houlston works at the Institute of Cancer Research in Sutton, UK, where he searches for genes that confer susceptibility to disease. Until recently, Houlston mapped microsatellite markers within families. But he is switching to single nucleotide polymorphism (SNP) genotyping using the Affymetrix 10K SNP array because, says Houlston, there is less room for interpretative error. It is also faster: previously his team might have processed 150 families in nine months, whereas with one Affymetrix scanner in-house Houlston can complete a study of similar size in 15 weeks.

If you've been thinking about doing some genotyping work of your own but don't know how to get started, there are five questions you should answer before deciding on a platform.


Each platform caters to different SNP counts and throughputs. For 10 or fewer SNPs and sample numbers in the thousands, the...


What's Your Type?

Pick a platform that meets your needs

The throughput capacity of the various platforms (how many samples they can process each day) varies by model, level of automation, and number of technicians. An Affymetrix GeneChip Scanner 3000 System can theoretically process 48 arrays (48 samples) in a day, but a lab with one such system and one technician will process just eight samples in an average day, says Ragoussis. It takes two to three days to get from DNA to data with Affymetrix, so the real throughput is either eight samples every two to three days, or eight samples a day with a second technician dedicated to sample preparation. Illumina's DNA-to-data cycle is also two to three days, but one person can process 288 samples (3 × 96-well plates) per cycle.
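The cycle-time reasoning above can be put into numbers. This minimal Python sketch assumes that one batch finishes per DNA-to-data cycle with no overlap between batches (the one-technician case) and evaluates both ends of the quoted two-to-three-day range. The function name is illustrative, not part of any vendor software.

```python
# Average daily output when one batch of samples completes per DNA-to-data cycle.
# Assumption: batches do not overlap (one technician handles each batch end to end).

def samples_per_day(samples_per_cycle: int, cycle_days: float) -> float:
    """Mean samples finished per day for a given batch size and cycle length."""
    return samples_per_cycle / cycle_days

for days in (2, 3):
    affy = samples_per_day(8, days)        # one GeneChip system, one technician
    illumina = samples_per_day(288, days)  # one person, 3 x 96-well plates
    print(f"{days}-day cycle: Affymetrix ~{affy:.1f}/day, Illumina ~{illumina:.0f}/day")
```

This gives roughly 2.7 to 4 Affymetrix samples a day against 96 to 144 for Illumina. Adding a second technician for sample preparation lets batches overlap, which is how the Affymetrix figure rises to the eight samples a day quoted above.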

The ABI 7900 HT can process 384 TaqMan assays in 30 minutes, or 48 plates (18,432 samples) per day, while Sequenom's high-capacity Autoflux mass spectrometer can run twenty 384-well chips (7,680 samples). Pyrosequencing can process 96 samples in 10 minutes, but those are PCR products, which must be generated first.
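The plate figures for the ABI and Sequenom instruments follow from simple multiplication. The sketch below assumes one sample per well and reads the ABI's 48 plates per day as back-to-back 30-minute runs over a full 24 hours. That reading is inferred from the numbers, not stated by the vendors.

```python
# Check the plate arithmetic quoted for the ABI 7900 HT and the Sequenom system.
WELLS_PER_PLATE = 384          # one sample per well (assumption)
MINUTES_PER_DAY = 24 * 60

# ABI 7900 HT: one 384-well TaqMan plate every 30 minutes, running around the clock.
abi_plates_per_day = MINUTES_PER_DAY // 30                  # 48 plates
abi_samples_per_day = abi_plates_per_day * WELLS_PER_PLATE  # 18,432 samples

# Sequenom: twenty 384-well chips.
sequenom_samples = 20 * WELLS_PER_PLATE                     # 7,680 samples

print(abi_plates_per_day, abi_samples_per_day, sequenom_samples)
```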


SNPs are hard to come by for organisms other than humans, either because the variation has not been catalogued or because lab strains are inbred, so few polymorphisms distinguish them. "We know about the sequences of a lot of other organisms, but we don't know about variation in any other organism like we do with the human," says David Goldstein, director of Duke University's population genetics and pharmacogenetics program in Durham, NC.

The NCBI's dbSNP database lists more than 10 million human SNPs, but just 565,000 for mouse and 1,065 for Caenorhabditis elegans. On the other hand, agricultural genomics, both of farm animals and crop plants, is popular: dbSNP lists 19,409 cow SNPs, and a recent paper identified 1.7 million SNPs in rice (ref. 1; see Case Study 2).


The cost of reagents for genotyping is relatively low. Some platforms require a hardware investment, but there are other cost considerations, too. "Per-sample cost is potentially far more important than per-SNP cost, particularly if the SNPs you're calling are not sufficiently informative, and if you need to run additional samples to generate better statistics," says Illumina spokesperson Bill Craumer. Similarly, a small drop in reproducibility demands significantly more samples for the same statistical power. Consider also SNP discovery, the ability to detect unanticipated variation. Only DNA sequencing provides this capacity across the whole genome, but at a minimum cost of just over $1 a sample it remains prohibitively expensive. For small regions, however, pyrosequencing and other sequencing-based technologies will work.
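Craumer's point about running additional samples can be illustrated with the simplest case: samples that fail to give a usable call. The sketch below is a deliberately simplified model, not a power calculation. It assumes failures are random and independent, ignores miscalled (as opposed to missing) genotypes, and uses hypothetical numbers (1,000 usable samples needed, US$100 per sample) purely for illustration.

```python
import math

def samples_to_run(usable_needed: int, success_rate: float) -> int:
    """Samples to genotype so that, on average, `usable_needed` yield usable calls."""
    return math.ceil(usable_needed / success_rate)

def cost_per_usable_sample(cost_per_sample: float, success_rate: float) -> float:
    """Effective cost once failed samples are paid for but discarded."""
    return cost_per_sample / success_rate

# Hypothetical study: 1,000 usable samples needed at US$100 per sample.
for rate in (0.99, 0.95):
    n = samples_to_run(1_000, rate)
    cost = cost_per_usable_sample(100.0, rate)
    print(f"success {rate:.0%}: run {n} samples, ~${cost:.2f} per usable sample")
```

Under these assumptions, falling from 99% to 95% success raises the run from 1,011 to 1,053 samples. Genotypes that are called but wrong erode statistical power further, an effect this sketch does not capture.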


According to a recent survey by Arlington, Va.-based Bioinformatics, nearly 40% of all SNP genotyping work is either outsourced or performed in a core or collaborator's lab. Cost is among the reasons cited, and it can be considerable.

The Wellcome Trust Centre in Oxford, for instance, uses Illumina's BeadArray for whole-genome scanning and Sequenom's MassARRAY for fine mapping. The former system costs $325,000 up front for the scanner, software, installation, and so on; a base Sequenom system costs $289,000. The Broad Institute also uses Sequenom for fine mapping, but mainly Affymetrix for whole-genome scanning. Duke's Goldstein puts his faith in Illumina. "Because my group has been heavily involved in methods for picking particular SNPs to type, I really want a method that allows me to select my own SNPs, as opposed to Affymetrix, which gives you a chip and says, 'Here are your SNPs.'"

All three labs perform their genotyping in dedicated core facilities. "We're users really," says Goldstein. "You just want a professional operation, a group whose job it is to operate this machinery accurately and to establish a secure data pipeline."


Richard Bowman works at the Medical Research Council's Dunn Human Nutrition Unit in Cambridge, UK, where he is part of a team that is screening a 25,000-member cohort for seven SNPs, including two for apolipoprotein E (ApoE). The team genotypes via pyrosequencing, using a PSQ 96 MA system they bought from Biotage in 2002.

A DNA sequencing technology, pyrosequencing is capable of identifying unexpected polymorphisms. The throughput is lower than with TaqMan or Sequenom, but Bowman, whose lab processes some 4,000 samples per week, has been impressed by the method's robustness. "We have done 75,000 samples to date and we have a 99% success rate," he says. "That's comparable to, if not slightly better than, other technologies. We also get exceptionally high reproducibility." In particular, the team has a high success rate with the ApoE SNPs. "ApoE is a difficult region to look at because it is GC-rich, and certain assays don't like that," explains Bowman. "But with pyrosequencing, one assay can look at two SNPs that are relatively far apart in that region. Another advantage is that we get the sequence directly, and this acts as an inbuilt quality control."
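Bowman's remark that pyrosequencing returns the sequence directly reflects how the readout works: nucleotides are dispensed one at a time, and each incorporation releases pyrophosphate, which is converted into a light signal roughly proportional to the number of bases added. The idealized Python sketch below models only that principle. It ignores signal decay and incomplete extension, and the two short allele sequences and the dispensation order are hypothetical, not the team's ApoE assays.

```python
def pyrogram(synthesized: str, dispensation: str) -> list[int]:
    """Idealized peak heights for a pyrosequencing run.

    For each dispensed nucleotide, count how many consecutive bases of the
    strand being synthesized it extends. A homopolymer run gives a
    proportionally taller peak; a nucleotide that does not match gives 0.
    """
    pos = 0
    peaks = []
    for nt in dispensation:
        run = 0
        while pos < len(synthesized) and synthesized[pos] == nt:
            run += 1
            pos += 1
        peaks.append(run)
    return peaks

# Hypothetical G/A SNP at the third position of a short stretch.
order = "ACGAGT"
allele_g = pyrogram("ACGGT", order)   # [1, 1, 2, 0, 0, 1]
allele_a = pyrogram("ACAGT", order)   # [1, 1, 0, 1, 1, 1]

# A heterozygote carries one copy of each allele, so peak heights are averaged.
het = [(g + a) / 2 for g, a in zip(allele_g, allele_a)]
print(allele_g, allele_a, het)
```

Because the expected peak pattern for each genotype is known in advance, a pattern that matches none of them suggests a failed reaction or an unanticipated variant. This is one plausible reading of Bowman's "inbuilt quality control."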


Steve Moore heads a program at the University of Alberta, Canada, to discover the genes that influence economically important traits in beef cattle. His group is currently working on whole-genome scans using around 3,000 SNPs, which they hope to increase to 6,000 in the next year. "We anticipate this will actually be enough for whole-genome association studies, given the level of linkage disequilibrium found in cattle," he says.
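Whether a few thousand SNPs can cover a genome depends on how far linkage disequilibrium (LD) extends relative to the gap between markers. The sketch below computes only the average marker spacing, assuming evenly spaced SNPs and a cattle genome of roughly 3 billion base pairs (an approximate figure used for illustration). Moore's argument holds only if LD in cattle typically extends over distances comparable to this spacing.

```python
# Average spacing between evenly distributed markers.
# Assumption: a cattle genome of roughly 3 billion base pairs (approximate).
GENOME_BP = 3_000_000_000

def mean_spacing_kb(n_snps: int, genome_bp: int = GENOME_BP) -> float:
    """Average distance between adjacent markers, in kilobases."""
    return genome_bp / n_snps / 1_000

for n in (3_000, 6_000):
    print(f"{n:,} SNPs -> one every ~{mean_spacing_kb(n):,.0f} kb")
```

On these assumptions, 3,000 SNPs place a marker about every 1,000 kb, and doubling to 6,000 halves the gap to about 500 kb.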

Moore's group genotypes in-house using Illumina BeadArrays. "There are real advantages in terms of flexibility and self-determination around this," he says. "The cost for outsourcing is, however, not very different." With one technologist, Moore's group can process around 600,000 genotypes per week. "One plate of 96 animals and 1,536 SNPs costs approximately $8,000 (US) to run. You can spend a lot of money very quickly," says Moore. He chose Illumina partly because no robots were needed, which made a big difference in the service costs. But Moore points out that it took three years to get the funding for his genotyping work. "In that time the best platform available changed around four times."
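Moore's figures illustrate the per-sample versus per-SNP distinction Craumer draws. This short Python sketch derives per-animal and per-genotype costs from the quoted US$8,000 plate price and checks how many plates a week the 600,000-genotype figure implies; the variable names are illustrative only.

```python
# Per-sample and per-genotype cost from Moore's figures:
# one 96-animal plate typed for 1,536 SNPs costs about US$8,000.
PLATE_COST_USD = 8_000
ANIMALS_PER_PLATE = 96
SNPS_PER_ANIMAL = 1_536

genotypes_per_plate = ANIMALS_PER_PLATE * SNPS_PER_ANIMAL   # 147,456 genotypes
cost_per_animal = PLATE_COST_USD / ANIMALS_PER_PLATE        # about $83.33
cost_per_genotype = PLATE_COST_USD / genotypes_per_plate    # about $0.054

# At ~600,000 genotypes per week, one technologist runs roughly four plates.
plates_per_week = 600_000 / genotypes_per_plate             # about 4.07

print(genotypes_per_plate, round(cost_per_animal, 2),
      round(cost_per_genotype, 4), round(plates_per_week, 2))
```

A genotype costs only about five cents, yet each animal costs more than $80, which is why running extra samples adds up quickly.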
