Following the sequencing of the human genome in 2001, genetic variation between people was largely attributed to simple sequence differences known as single-nucleotide polymorphisms, or SNPs. This led to large-scale SNP-mapping ventures, such as the International HapMap Project, to identify regions of the genome underlying phenotypic variation and disease susceptibility. But SNPs are only part of the picture. More recently, scientists have come to realize that structural differences - including deletions, duplications, inversions, and copy-number variants - encompass millions of bases of DNA and are at least as important as SNPs in contributing to genomic variation in humans.
In 2004, two landmark studies showed that gains or losses of large swaths of
DNA - known as copy number variants (CNVs) - are common features of the human
genome. These first genome-wide studies identified a few
hundred CNVs, but because of the techniques used, researchers could detect only
large-scale differences of roughly 50 kb and greater. Then, in early 2006, two new
studies, both Hot Papers this month, discovered close to 700 finer-scale CNVs within
the human genome. Both papers looked for odd patterns in the existing HapMap SNP
data to uncover deletion "footprints." One study discovered apparent violations of
The large number of segregating deletions "was an eye opener for all of us," says Jonathan Pritchard of the University of Chicago, who led one of the studies. "It changes the way we think about the stability of the genome." These Hot Papers revealed the extent of genomic dynamism and opened the floodgates to the wave of structural variation that genomicists have discovered since.
Evan Eichler of the University of Washington describes deletion polymorphisms as "binary CNVs," because only two possible states exist in an individual: The genomic region is either there or it's not. Deletions, however, make up only a small subset of a much larger number of CNVs and structural variants in general, says Lars Feuk of the Hospital for Sick Children in Toronto. Feuk helps maintain the online Database of Genomic Variants (http://projects.tcag.ca/variation), which, as of April, contained 9,735 individual variants greater than 100 bp.
The Hot Papers' novel statistical methods for finding deletions based on the
existing SNP data were clever, says Jonathan Sebat of Cold Spring Harbor Laboratory,
but they were heavily biased because of inherent limitations of the HapMap project.
"The patterns the [authors] observed were true for unique regions of the genome, but
they're not necessarily true for complex regions where deletions reoccur with high
frequencies," he notes. That's why most people are using different methods to find
structural variants today. "Analysis of intensity data is where the money's at,"
Sebat says. This approach helped construct an unbiased genome-wide CNV map, which
revealed around 1,500 CNVs greater than 1 kb covering 12% of the genome.
Others, though, are using different tactics. Scott Devine of Emory University
reanalyzed previously generated DNA sequence traces from the HapMap project to probe
for even smaller structural variants. He found more than 400,000 deletion and
insertion polymorphisms ranging from 1 bp up to 10 kb.
Many researchers are also turning to resequencing techniques. A recent
comparison of Craig Venter's diploid genome with the human genome reference sequence
found close to a million structural variants encompassing around 10 Mb of
A Structured Approach
Many insertion and deletion polymorphisms land in the coding regions of
genes. "So-called normal people are walking around with broken copies of genes very
frequently," says Devine. For example, researchers at the Karolinska University
Hospital in Stockholm showed earlier this year that individuals homozygous for a
deletion of UGT2B17, one of the most commonly deleted genes identified in the Hot
Paper by Steven McCarroll's group at the Broad Institute, have lower levels of
urinary testosterone, suggesting that steroid users might often pass undetected in
current athletic doping tests simply because of their DNA.
Still, there's a long way to go before we have a complete understanding of human genetic variation, cautions Eichler. "We've only captured a subset of a subset of the complete view of structural variation," he says. Many current hybridization probes can reliably detect some CNVs, and two newly developed genotyping platforms from Affymetrix and Illumina include CNV probes in combination with SNP probes. But the next step, notes Eichler, is to design more comprehensive microarray chips dedicated to genome-wide structural variation. Feuk says that this goal might not be far off: "Within a year, I think we'll see the first arrays targeted specifically toward structural variation."