Between missing chunks of chromosomes and single nucleotide polymorphisms (SNPs) lies a vast middle ground of genomic alterations. Among these are copy-number variations (CNVs) - differences between individuals in the number of copies of a genomic region. "The total nucleotide content that is encompassed by CNVs most certainly exceeds that of SNPs," says Stephen Scherer of The Hospital for Sick Children in Toronto.
The recent surge of interest in CNVs has spurred a proliferation of technologies designed to detect them in normal DNA, in congenital diseases, and in cancer cells, where copy-number changes may drive unruly cell division.
Scientists have largely turned to comparative genomic hybridization (CGH) arrays, which involve hybridizing two genomes - one as a reference and one to be tested...
BAC IN TIME
User: Frank Speleman, Ghent University Hospital, Belgium
The project: Using BAC arrays left over from the Human Genome Project to screen cell lines for CNVs that associate with disease, including neuroblastoma and Hodgkin lymphoma.
The problem: BAC arrays are time-intensive to create, and the arrays themselves are not completely reproducible. Also, they might miss copy-number differences smaller than 50-100 kb.
The solution: Oligonucleotide and SNP arrays offer much quicker answers to copy-number questions, Speleman says. He expects that BAC arrays will die out in the near future, but his group continues to use them, mainly because of the time they've already invested in creating them. Moreover, since their genome coverage is comprehensive, BACs also offer more robust data than other arrays.
ANEUPLOID PROBLEMS
User: Ken Lo, Roswell Park Cancer Institute, Buffalo, New York
The project: Looking for CNVs in medulloblastoma and glioblastoma cell lines.
The problem: Tumor samples have two characteristics that make copy-number analyses difficult: abnormal numbers of chromosomes, and a heterogeneous cell composition.
The solution: Lo's group uses both BAC arrays and Affymetrix SNP arrays, and picks through their data to manually correct suspected errors. For example, a detected single-copy gain might actually be a loss in a tetraploid cell or a gain in a diploid cell. They then look for a loss of heterozygosity (the loss of one parental allele) to decide which is the case.
Current platforms "are all designed based on the premise that the natural ploidy state of your DNA sample is two," Lo says. What's more, he adds, "sometimes, you can't tell whether [a data problem] is a ploidy issue or a tumor heterogeneity issue." Tumors are often mixtures of many types of cells, and copy-number changes occur differently in each. Since the cells are merged before DNA extraction, the results reflect an average across different cell types. His group is collaborating with Yuhang Wang, a computer scientist at Southern Methodist University in Dallas, who is developing algorithms to control for these issues. For now, says Lo, "you have to really, really think about the results."
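Lo's two confounders can be illustrated with a little arithmetic. The sketch below is a simplified model, not his group's procedure or Wang's algorithm. It assumes that array normalization centers the sample's average copy number at a log2 ratio of zero, that contaminating normal cells are diploid, and that the tumor-cell fraction (`purity`) is known. The function name and its parameters are hypothetical.

```python
import math


def expected_log2_ratio(region_copies, tumor_ploidy=2.0, purity=1.0):
    """Expected log2 ratio for a region under a simple mixture model.

    Assumptions (illustrative only): normal cells carry 2 copies everywhere,
    and normalization sets the sample-wide average copy number to log2 = 0.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    # Average copies of the region across tumor and normal cells.
    region_avg = purity * region_copies + (1.0 - purity) * 2.0
    # Average copies genome-wide, which normalization treats as "no change".
    genome_avg = purity * tumor_ploidy + (1.0 - purity) * 2.0
    return math.log2(region_avg / genome_avg)


# Three copies of a region read as a gain in a diploid tumor...
gain_in_diploid = expected_log2_ratio(3, tumor_ploidy=2.0)
# ...but as a loss in a tetraploid one.
loss_in_tetraploid = expected_log2_ratio(3, tumor_ploidy=4.0)
# Mixing in 50% normal cells shrinks the diploid gain toward zero.
diluted_gain = expected_log2_ratio(3, tumor_ploidy=2.0, purity=0.5)

print(round(gain_in_diploid, 3), round(loss_in_tetraploid, 3), round(diluted_gain, 3))
```

Under these assumptions the same three-copy region gives a positive ratio in a diploid tumor and a negative one in a tetraploid tumor, and dilution by normal cells pulls the signal toward zero. That is why a single ratio cannot separate a ploidy problem from a heterogeneity problem.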
OLD DOG, NEW TRICKS
User: George Zogopoulos, University of Toronto
The project: Genome-wide scans to detect CNVs in the general population and in patients with gastrointestinal cancer.
The problem: Genotyping platforms like the Affymetrix array that Zogopoulos uses can generate noisy data and don't cover the whole genome, particularly regions rich in repetitive sequences that are likely to contain CNVs.
The solution: "Given that [SNP arrays] weren't primarily designed for this, it's important to validate using a second laboratory approach," Zogopoulos says. He and his colleagues confirm their results with quantitative PCR. The sensitivity of PCR is often just at the level needed to detect copy-number changes, so a high number of replicates is used to generate statistical power, he says.
Despite their drawbacks, Zogopoulos stuck with SNP-based arrays for their sensitivity to very fine-scale copy-number changes, and the ability to detect both SNPs and CNVs in a single assay. "We had generated the data for a different project, and we took advantage of the wealth of genetic data and reanalyzed it for copy-number variation."
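To show why replicates matter in a quantitative PCR check like the one Zogopoulos describes, here is a minimal sketch of the widely used 2^-ΔΔCt calculation. It is not his group's protocol. It assumes perfect doubling in every cycle, a reference gene present at two copies, and a calibrator DNA with a known copy number. The function name and the Ct values are made up for illustration.

```python
import math
import statistics


def qpcr_copy_number(sample_target_ct, sample_ref_ct,
                     calib_target_ct, calib_ref_ct, calib_copies=2):
    """Estimate copy number per replicate with the 2^-ddCt calculation.

    Returns (mean copy number, standard error of the mean).
    """
    if len(sample_target_ct) != len(sample_ref_ct) or len(sample_target_ct) < 2:
        raise ValueError("need at least two paired sample replicates")
    # dCt of the calibrator: target region relative to the reference gene.
    calib_dct = statistics.mean(calib_target_ct) - statistics.mean(calib_ref_ct)
    copies = []
    for target_ct, ref_ct in zip(sample_target_ct, sample_ref_ct):
        ddct = (target_ct - ref_ct) - calib_dct
        # Each cycle is assumed to double the product (100% efficiency).
        copies.append(calib_copies * 2.0 ** (-ddct))
    sem = statistics.stdev(copies) / math.sqrt(len(copies))
    return statistics.mean(copies), sem


# Hypothetical Ct values: four replicates of a test sample, three of a calibrator.
mean_copies, sem = qpcr_copy_number(
    sample_target_ct=[24.45, 24.40, 24.35, 24.50],
    sample_ref_ct=[24.00, 24.05, 23.95, 24.00],
    calib_target_ct=[25.00, 25.05, 24.95],
    calib_ref_ct=[24.00, 24.00, 24.00],
)
print(f"estimated copies: {mean_copies:.2f} +/- {sem:.2f} (SEM)")
```

A gain from two copies to three shifts Ct by only about 0.6 cycles, which is close to ordinary run-to-run noise. Averaging over more replicates shrinks the standard error and makes that shift distinguishable.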
MULTIPLEX FOR CONTROL
User: Matthew Hurles, Genome Dynamics and Evolution Group, The Wellcome Trust Sanger Institute, Cambridge, UK
The project: Screening genomic samples from thousands of individuals to look for CNVs associated with common conditions such as diabetes, rheumatoid arthritis, and hypertension.
The problem: Results from such a large sample are often plagued by what Hurles terms "batch effects." Quality between sets of extracted DNA can vary, or discrepancies in DNA processing might arise simply because different people run the samples at different times. "Systematic differences can really screw up your association studies," Hurles says.
The solution: Most manufacturers offer "multiplex" arrays, which contain fewer probes but multiple sets of the same probes, Hurles says. "They're not going to give you whole-genome coverage, but they're targeted towards the CNVs that you already know exist." One of the best ways to control for batch issues is to run control genomic samples on each subarray in the multiplex setup, he says. "That kind of approach minimizes any systematic differences that you might get between cases and controls. It doesn't get rid of the effect, but it means it has less of an impact on your association testing." As for quality differences in your original DNA samples, Hurles says, "ideally you control that by not having it in the first place."
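Hurles does not say how the control measurements enter the analysis. One simple possibility, sketched here as an assumption, is to subtract each subarray's control reading from every sample run on that subarray, so that cases and controls are compared on a common baseline. The function name, data layout, and values are hypothetical.

```python
import statistics


def center_on_controls(sample_log2, control_log2):
    """Subtract each subarray's median control value from its samples.

    sample_log2:  {subarray_id: {sample_id: log2_ratio}}
    control_log2: {subarray_id: [log2 ratios of the shared control DNA]}
    """
    adjusted = {}
    for subarray, samples in sample_log2.items():
        if subarray not in control_log2 or not control_log2[subarray]:
            raise ValueError(f"no control measurement for subarray {subarray!r}")
        # The shared control DNA should read the same everywhere, so its
        # deviation on this subarray is treated as the batch offset.
        offset = statistics.median(control_log2[subarray])
        for sample_id, value in samples.items():
            adjusted[sample_id] = value - offset
    return adjusted


# Made-up log2 ratios at one CNV for samples run on two subarrays.
samples = {
    "subarray_A": {"case_1": 0.75, "control_1": 0.25},
    "subarray_B": {"case_2": 0.25, "control_2": -0.25},
}
controls = {"subarray_A": [0.25], "subarray_B": [-0.25]}

print(center_on_controls(samples, controls))
```

In this toy data the two cases look different before adjustment and identical afterward, because the whole gap came from the subarray offset. As Hurles notes, this kind of correction reduces systematic differences rather than eliminating them.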
NEEDLE IN THE HAYSTACK
User: Yao-Shan Fan, University of Miami
The project: Detecting pathogenic gene CNVs in patients with unexplained mental retardation.
The problem: Fan needs to detect CNVs associated with a disorder without overwhelming his detections with normal variations that occur in everyone.
The solution: Oligonucleotide arrays, which have poorer signal-to-noise ratios than large-insert clones such as BACs but offer dense genome coverage, have just the right level of specificity and resolution for this type of project, Fan says. He uses the Agilent platform, which covers the whole human genome with 44,000 probes. Detecting normal CNVs in healthy individuals might work with an SNP array because of its superior resolution, Fan says, but that isn't desirable for his studies. "If you use the array with a very high resolution, then you see a lot of normal variations, and it's hard to pick up the pathogenic one," he says. Traditional BAC arrays, on the other hand, have resolutions that are too low for his purposes.