Following the sequencing of the human genome in 2001, genetic variation between people was largely attributed to simple sequence differences known as single-nucleotide polymorphisms, or SNPs. This led to large-scale SNP-mapping ventures, such as the International HapMap Project, to identify regions of the genome underlying phenotypic variation and disease susceptibility. But SNPs are only part of the picture. More recently, scientists have realized that structural differences - including deletions, duplications, inversions, and copy-number variants - encompass millions of bases of DNA and are at least as important as SNPs in contributing to genomic variation in humans.
In 2004, two landmark studies showed that gains or losses of large swaths of DNA - known as copy number variants (CNVs) - are common features of the human genome. These first genome-wide studies identified a few hundred CNVs, but because of the techniques used, researchers could detect only large-scale differences of roughly 50 kb...
Variation Investigation
Evan Eichler of the University of Washington describes deletion polymorphisms as "binary CNVs," because only two possible states exist in an individual: The genomic region is either there or it's not. Deletions, however, make up only a small subset of a much larger number of CNVs and structural variants in general, says Lars Feuk of the Hospital for Sick Children in Toronto. Feuk helps maintain the online Database of Genomic Variants (http://projects.tcag.ca/variation), which, as of April, contained 9,735 individual variants greater than 100 bp.
The Hot Papers' novel statistical methods for finding deletions based on the existing SNP data were clever, says Jonathan Sebat of Cold Spring Harbor Laboratory, but they were heavily biased because of inherent limitations of the HapMap project. "The patterns the [authors] observed were true for unique regions of the genome, but they're not necessarily true for complex regions where deletions reoccur with high frequencies," he notes. That's why most researchers now use different methods to find structural variants. "Analysis of intensity data is where the money's at," Sebat says. This approach helped construct an unbiased genome-wide CNV map, which revealed around 1,500 CNVs greater than 1 kb covering 12% of the genome.
Others, though, are using different tactics. Scott Devine of Emory University reanalyzed previously generated DNA sequence traces from the HapMap project to probe for even smaller structural variants. He found more than 400,000 deletion and insertion polymorphisms ranging from 1 bp up to 10 kb.
Many researchers are also turning to resequencing techniques. A recent comparison of Craig Venter's diploid genome with the human genome reference sequence found close to a million structural variants encompassing around 10 Mb of DNA.
A Structured Approach
Many insertion and deletion polymorphisms land in the coding regions of genes. "So-called normal people are walking around with broken copies of genes very frequently," says Devine. For example, researchers at the Karolinska University Hospital in Stockholm showed earlier this year that individuals homozygous for the deletion of UGT2B17, one of the most commonly deleted genes and one identified in the Hot Paper by Steven McCarroll's group at the Broad Institute, have lower levels of urinary testosterone. The finding suggests that steroid users might often pass undetected in current athletic doping tests simply because of their DNA.
Still, there's a long way to go before we have a complete understanding of human genetic variation, cautions Eichler. "We've only captured a subset of a subset of the complete view of structural variation," he says. Many current hybridization probes can reliably detect some CNVs, and two newly developed genotyping platforms from Affymetrix and Illumina include CNV probes in combination with SNP probes. But the next step, notes Eichler, is to design more comprehensive microarray chips dedicated to genome-wide structural variation. Feuk says that this goal might not be far off: "Within a year, I think we'll see the first arrays targeted specifically toward structural variation."