Restructuring Human Variation

Investigators put deletions on the map of human genetic variation.

August 1, 2008

Following the sequencing of the human genome in 2001, genetic variation between people was largely attributed to simple sequence differences known as single-nucleotide polymorphisms, or SNPs. This led to large-scale SNP-mapping ventures, such as the International HapMap Project, to identify regions of the genome underlying phenotypic variation and disease susceptibility. But SNPs are only part of the picture. More recently, scientists have come to realize that structural differences - including deletions, duplications, inversions, and copy-number variants - encompass millions of bases of DNA, and are at least as important as SNPs in contributing to genomic variation in humans.

In 2004, two landmark studies showed that gains or losses of large swaths of DNA - known as copy-number variants (CNVs) - are common features of the human genome. These first genome-wide studies identified a few hundred CNVs, but because of the techniques used, researchers could detect only large-scale differences of roughly 50 kb and greater. Then, in early 2006, two new studies, both Hot Papers this month, discovered close to 700 finer-scale CNVs within the human genome. Both papers looked for odd patterns in the existing HapMap SNP data to uncover deletion "footprints." One study looked for apparent violations of Mendelian inheritance,1 while the other inspected clusters of SNPs that were out of expected equilibrium frequencies, along with other apparent genotyping errors.2 These papers also showed that deletions and their neighboring SNPs are tightly linked, indicating that most polymorphic deletions have ancient origins.
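To make the idea of a deletion "footprint" concrete, here is a minimal Python sketch of the Mendelian-inheritance check described above. It is an illustration only, not the statistical method used in either Hot Paper: the function names, the genotype encoding (two-letter strings per SNP), and the `min_run` threshold are all assumptions made for this example. The intuition is that a parent carrying a deletion on one chromosome is miscalled as homozygous, so a child who inherits the deleted copy appears to break the rules of inheritance at several neighboring SNPs in a row.

```python
from itertools import product


def mendelian_consistent(father, mother, child):
    """Return True if the child's genotype can be formed by taking
    one allele from the father and one from the mother."""
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) in possible


def deletion_footprints(trio_genotypes, min_run=2):
    """Scan SNPs in chromosomal order and return (start, end) index
    ranges where at least `min_run` consecutive SNPs violate Mendelian
    inheritance. Such runs are candidate deletion footprints.

    `trio_genotypes` is a list of (father, mother, child) genotype
    strings, one tuple per SNP (a hypothetical encoding for this sketch).
    """
    runs = []
    start = None
    for i, (father, mother, child) in enumerate(trio_genotypes):
        if not mendelian_consistent(father, mother, child):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and len(trio_genotypes) - start >= min_run:
        runs.append((start, len(trio_genotypes) - 1))
    return runs


# Made-up genotypes: SNPs 1 to 3 look impossible because the father's
# single remaining allele was called as a homozygote.
example_trio = [
    ("AG", "AA", "AG"),
    ("CC", "TT", "TT"),
    ("AA", "GG", "GG"),
    ("TT", "CT", "CC"),
    ("GG", "GG", "GG"),
]
print(deletion_footprints(example_trio))
```

Requiring a run of adjacent violations, rather than a single one, is a simple way to separate a shared underlying cause such as a deletion from isolated genotyping errors; a real analysis would also need to model error rates and marker spacing.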

The large number of segregating deletions "was an eye opener for all of us," says Jonathan Pritchard of the University of Chicago, who led one of the studies. "It changes the way we think about the stability of the genome." These Hot Papers revealed the extent of genomic dynamism and opened the floodgates to the wave of structural variation that genomicists have discovered since.

Variation Investigation

Evan Eichler of the University of Washington describes deletion polymorphisms as "binary CNVs," because only two possible states exist in an individual: The genomic region is either there or it's not. Deletions, however, make up only a small subset of a much larger number of CNVs and structural variants in general, says Lars Feuk of the Hospital for Sick Children in Toronto. Feuk helps maintain the online Database of Genomic Variants, which, as of April, contained 9,735 individual variants greater than 100 bp.

The Hot Papers' novel statistical methods for finding deletions based on the existing SNP data were clever, says Jonathan Sebat of Cold Spring Harbor Laboratory, but they were heavily biased because of inherent limitations of the HapMap project. "The patterns the [authors] observed were true for unique regions of the genome, but they're not necessarily true for complex regions where deletions reoccur with high frequencies," he notes. That's why most people are using different methods to find structural variants today. "Analysis of intensity data is where the money's at," Sebat says. This approach helped construct an unbiased genome-wide CNV map, and discovered around 1,500 CNVs greater than 1 kb covering 12% of the genome.3

Others, though, are using different tactics. Scott Devine of Emory University reanalyzed previously generated DNA sequence traces from the HapMap project to probe for even smaller structural variants. He found more than 400,000 deletion and insertion polymorphisms ranging from 1 bp up to 10 kb.4 He says he also has unpublished data revealing 1.5 million more deletions, over 99% of which are less than 100 bp. "Add up the bases, and it's almost as many as the known SNPs," he says.

Many researchers are also turning to resequencing techniques. A recent comparison of Craig Venter's diploid genome with the human genome reference sequence found close to a million structural variants encompassing around 10 Mb of DNA.5 Other approaches, including new high-throughput sequencing and older-generation technologies, are also probing genome-wide variation. Earlier this year, Eichler and his colleagues used fosmid clone-based sequencing of eight genomes to identify close to 1,700 CNVs greater than 8 kb, around a third of which were not present in the human reference genome sequence.6

A Structured Approach

Many insertion and deletion polymorphisms land in the coding regions of genes. "So-called normal people are walking around with broken copies of genes very frequently," says Devine. For example, researchers at the Karolinska University Hospital in Stockholm showed earlier this year that individuals homozygous for a deletion of UGT2B17, one of the most commonly deleted genes identified in the Hot Paper by Steven McCarroll's group at the Broad Institute, have lower levels of urinary testosterone. The finding suggests that steroid users might often pass undetected in current athletic doping tests simply because of their DNA.7

Still, there's a long way to go before we have a complete understanding of human genetic variation, cautions Eichler. "We've only captured a subset of a subset of the complete view of structural variation," he says. Many current hybridization probes can reliably detect some CNVs, and two newly developed genotyping platforms from Affymetrix and Illumina include CNV probes in combination with SNP probes. But the next step, notes Eichler, is to design more comprehensive microarray chips dedicated to genome-wide structural variation. Feuk says that this goal might not be far off: "Within a year, I think we'll see the first arrays targeted specifically toward structural variation."

Data derived from the Science Watch/Hot Papers database and the Web of Science (Thomson ISI) show that Hot Papers are cited 50 to 100 times more often than the average paper of the same type and age.

D.F. Conrad et al., "A high-resolution survey of deletion polymorphism in the human genome," Nat Genet, 38:75-81, 2006. (Cited in 149 papers)

S.A. McCarroll et al., "Common deletion polymorphisms in the human genome," Nat Genet, 38:86-92, 2006. (Cited in 152 papers)


References

1. D.F. Conrad et al., "A high-resolution survey of deletion polymorphism in the human genome," Nat Genet, 38:75-81, 2006. (Cited in 149 papers)
2. S.A. McCarroll et al., "Common deletion polymorphisms in the human genome," Nat Genet, 38:86-92, 2006. (Cited in 152 papers)
3. R. Redon et al., "Global variation in copy number in the human genome," Nature, 444:444-54, 2006.
4. R.E. Mills et al., "An initial map of insertion and deletion (INDEL) variation in the human genome," Genome Res, 16:1182-90, 2006.
5. S. Levy et al., "The diploid genome sequence of an individual human," PLoS Biol, 5:e254, 2007.
6. J.M. Kidd et al., "Mapping and sequencing of structural variation from eight human genomes," Nature, 453:56-64, 2008.
7. J.J. Schulze et al., "Doping test results dependent on genotype of UGT2B17, the major enzyme for testosterone glucuronidation," J Clin Endocrinol Metab, doi:10.1210/jc.2008-0218, published online March 11, 2008.


anonymous poster, August 21, 2008

Perhaps SNPs may be responsible for major differences between phenotypic groups, such as races or ethnic groups, while all of the other things lead to the differences between families and/or individuals.
