Looking at Variation in Numbers

The massive efforts to systematically find and catalog single nucleotide polymorphisms (SNPs) bear witness to the conviction that small genomic changes may provide clues to the origins of such things as heart problems, obesity, and pharmacologic responses.

March 14, 2005


But another type of variation, largely overlooked by the genetics community, might ultimately make equally important contributions to health. Large, submicroscopic rearrangements comprise about 5%-10% of the human genome. Many of these contain duplications that vary in the number of times and ways they are repeated: tandemly, at distal parts of the same chromosome, or even on other chromosomes.

At least three papers last summer took advantage of new technologies to discover the extent to which these polymorphic rearrangements are duplications of genes found elsewhere in the genome. And research published in January documented the first instance of the resulting gene-dosage effect on disease susceptibility: the effect of copy number of the CCL3L1 chemokine gene on susceptibility to HIV infection and progression to AIDS.1

"In a sense, we're just seeing the tip of the iceberg here in terms of the potential importance of copy-number variation to human disease, to human health, and to evolutionary change," says James Sikela, professor in human medical genetics at the University of Colorado Health Sciences Center in Aurora. "I think there's a newly-found appreciation for how important these copy numbers can be."


HIV-1 uses the CCR5 chemokine receptor like a keyhole to unlock and enter a cell. "There's a battle going on for this keyhole," says Sunil Ahuja, director of the Veterans Administration Center for AIDS and HIV-1 Infection in San Antonio, and one of the recent study's corresponding authors. The most potent endogenous key is CCL3L1, a chemokine whose gene is found anywhere from zero to at least 14 times in a normal diploid genome. CCL3L1 is also known to be a potent anti-HIV-1 chemokine. "You can imagine that if there are individuals who produce high amounts of the chemokines, there might be a possibility that they would gum up this keyhole and prevent the entry of the virus," says Ahuja.

Robert Nibbs and his group at Beatson Institute for Cancer Research in Glasgow, Scotland, discovered the CCL3L1 copy-number polymorphism (CNP) and the copy-number correlation with production of the chemokine.2 "The prediction would be," says Nibbs, "that those individuals with high copy number would be protected from HIV infection, and also be protected from progression once they have become infected." He adds, "What Professor Ahuja has done is to prove that that hypothesis is correct."

Ahuja teamed up with Matthew Dolan, who oversees the US military's Tri-Service AIDS Clinical Consortium (TACC) cohort. A unique feature of that cohort, comprising 1,300 HIV-positive individuals who were in the US Air Force, is that the participants "all had uniform access to health care and were a racially balanced population," says Dolan. "You can look at questions that are constrained by ethnicity, and you can also eliminate the factor of adequate access to health care," he explains, pointing out that patients with HIV often do not join a prevalent cohort until years after they become infected.

As predicted, Ahuja and Dolan found that individuals who had a higher number of CCL3L1 copies were less likely to become infected with HIV, or to progress to AIDS once they were infected. But there was a twist. Individuals of African origin had an average of about six copies per genome, whereas those of European ancestry had an average of two per genome. The researchers found that a person's copy number relative to the population average matters more than absolute gene dosage. The contribution of CCL3L1 gene dosage could be teased apart from that of the copy-number-independent variant of CCR5, already known to confer some resistance to HIV infection and HIV progression.
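The relative-versus-absolute point can be made concrete with a small sketch. It uses only the two approximate population averages quoted above. The function name, the dictionary, and the three labels are hypothetical, and the comparison is an illustration only, not the statistical method the study used.

```python
# Approximate average CCL3L1 copies per genome, as quoted in the article.
# The dictionary and function names are illustrative assumptions.
POPULATION_MEAN_COPIES = {"African": 6, "European": 2}


def relative_dose(copies: int, ancestry: str) -> str:
    """Label a copy count relative to the average for the person's population."""
    mean = POPULATION_MEAN_COPIES[ancestry]
    if copies < mean:
        return "below average"
    if copies > mean:
        return "above average"
    return "average"


# The same absolute dose lands on opposite sides of the two averages.
same_dose_african = relative_dose(4, "African")
same_dose_european = relative_dose(4, "European")
print(same_dose_african, "/", same_dose_european)
```

Four copies falls below the average in one population and above it in the other, which is why absolute gene dosage alone can mislead.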

Such an analysis could not have been done in the United Kingdom, says Nibbs: "We don't have the access to the kind of cohorts that Professor Ahuja and his collaborators do."

Indeed, in a paper last year that examined a different cohort, the researchers found no correlation between the absence of CCL3L1 and susceptibility to HIV infection or the rate of its progression.3 Graeme Stewart, corresponding author from Westmead Millennium Institute at Westmead Hospital in Sydney, writes in an E-mail: "Our study doesn't contradict their results, as we only examined the proportion of people with HIV who fail to express any CCL3L1 (null/null)." He adds, "Since such people are few we did not have the power to detect a partial effect."



© 2004 Nature Publishing Group

The interchromosomal (red) and intrachromosomal (blue) duplications (> 20 kb, > 95%) for chromosome 16. Chromosome 16 is drawn 20 times larger than scale to other chromosomes. Centromeres (purple) are shown for reference. (J. Martin et al., Nature, 432:988–94 Dec. 22, 2004.)

The TACC study may be the first to correlate CNPs with disease susceptibility, says Evan Eichler, associate professor of genome science at the University of Washington, Seattle. He calls it "a beautiful piece of work ... a Christmas present," a sentiment shared by many. But it was not the first time that researchers have seen a gene-dosage effect. Genes for the Rhesus factor blood group, cytochrome P450, glutathione S-transferase, and drug susceptibility, for example, "are all known to be copy-number variants in the population," Eichler says.

Baylor College of Medicine pediatrician and genetics professor James Lupski argued in 1991 that a common inherited neuropathy, Charcot-Marie-Tooth disease, was due to the duplication of a large segment of chromosome 17.4 The 1.5-megabase segment comprises 21 different genes, and only the one for peripheral myelin protein 22 is gene-dosage sensitive.

"At the time, there was a lot of resistance to the idea that you could get clinical phenotype related to just gene dosage, not having an aberrant protein or abnormal gene," Lupski says. Yet it was already well known that Down syndrome, caused by duplication of an entire chromosome 21, is the most common genetic disease, affecting one in 600 live births. "We were so fixated on mutations," Lupski says. "Dosage can obviously have phenotypic consequences."

The field of human genetics had focused on characterizing single Mendelian traits found in very small portions of the population, Eichler says. But now geneticists are shifting that focus to more complex diseases: those with multiple, smaller contributions from several factors. With the TACC study, he notes, it became apparent that the "ability to become infected with HIV, or develop AIDS, [is] a complex interplay between the environment, single base-pair mutations, as well as copy-number variation."


Eichler has been working on mapping genetic duplications, though he won't discuss details. "We can safely say that we've mapped all the sites of duplication in at least three or four individuals." To determine which of these sites have variants across the population, or the extent of that variation, requires a far larger and broader sampling. Those are the more interesting duplications, medically speaking: "If everyone has the same copy number, even though it's duplicated, it probably doesn't mean that there's going to be any association there with disease," Eichler notes.

Determining the extent of variable duplication is another matter. At least three studies have used microarrays, each in a different way, to screen the human genome for large-scale variation.

A group led by Michael Wigler of Cold Spring Harbor Laboratory in New York used representational oligonucleotide microarray analysis (ROMA) to measure the relative concentration of DNA segments in the population. About 85,000 oligonucleotides were printed onto a glass microarray and then hybridized with differentially labeled genomic digests from 20 different individuals. The experiment found 76 unique CNPs of about 100 kilobases and greater, with 70 genes among them, "including genes involved in neurological function, regulation of cell growth, regulation of metabolism, and several genes known to be associated with disease."5

In another study, Stephen Scherer at the Hospital for Sick Children in Toronto and Charles Lee at Harvard University led a group that used array-based comparative genomic hybridization (aCGH) to look for CNPs.6 Arrayed BAC-derived genomic clones were hybridized with the labeled genome digests of 55 individuals. They found 255 variable loci, half of which overlap with genes. Of those 255 variable regions, only 11 were detected by both the Wigler group and the Lee/Scherer group.
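To show how two such call sets might be compared, here is a minimal sketch. It assumes each variable region is a (chromosome, start, end) tuple with a half-open end coordinate. The coordinates are invented toy values and the helper names are hypothetical. Neither study's actual matching procedure is described in this article. The final lines simply divide the two counts reported above.

```python
def regions_overlap(a, b):
    """True if two (chrom, start, end) regions share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_shared(set_a, set_b):
    """Count regions in set_a that overlap any region in set_b."""
    return sum(any(regions_overlap(a, b) for b in set_b) for a in set_a)


# Toy, made-up coordinates purely for illustration (not data from either study).
study_a = [("chr1", 100, 200), ("chr2", 50, 80), ("chr3", 10, 20)]
study_b = [("chr1", 150, 400), ("chr3", 20, 30)]
toy_shared = count_shared(study_a, study_b)

# Figures reported in the article: 11 of the 255 variable loci seen by both groups.
shared_fraction = 11 / 255
print(f"toy shared regions: {toy_shared}")
print(f"reported overlap: {shared_fraction:.1%}")
```

With the reported figures, about 4% of the 255 loci were found by both groups.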


© A. Fortna, et al.

A number of great ape and human lineage-specific gene copy-number variations are apparent from genome-wide cDNA array comparative genomic hybridization. Each horizontal row represents aCGH data for one cDNA clone on the microarray, while each vertical column represents data from one experiment (H=human, B=bonobo, C=chimpanzee, G=gorilla, and O=orangutan). Regions shown contain lineage-specific genes (vertical black lines) and adjacent flanking genes ordered by chromosome map position using the UCSC Golden Path genome assembly (November 2002 sequence freeze). Arrows denote the hominoid lineage to which the copy-number change is unique. (From A. Fortna et al., PLoS Biol, 7:E207, July 2, 2004.)

Methodological differences, such as different density and scope of the arrays used, and perhaps the use of different builds of the reference genome, probably account for much of the discrepancy between the studies, notes Nigel Carter in a commentary7 to the Lee/Scherer study. Carter writes: "It is common practice in selecting clones or probes for [aCGH] to avoid regions that hybridize to more than one genomic location or show variation," concluding that "many more [large-scale copy-number variations] probably remain to be discovered."

Also, only about half of the variable regions found in either study were detected in more than a single individual. This leaves open the question of whether these duplications arose uniquely in the individual screened, or perhaps the sample size was too small to detect a more common polymorphism.


At about the time the latter papers appeared last summer, another group, from Stanford University and the University of Colorado Health Sciences Center, published work that used cDNA arrays to undertake what they call the first genome-wide gene-based survey of gene duplication across hominoid species.8 cDNAs representing nearly 30,000 human genes were spotted onto glass slides and then hybridized with human and either gorilla, chimpanzee, bonobo, or orangutan genomic digests. In all, the researchers found more than 1,000 genes that showed copy-number changes unique to one or more of the human and great ape lineages. Of these, 134 showed increases in copy number specific to the human lineage, including a number of genes thought to be involved in the structure and function of the brain.

One advantage of using cDNA instead of genomic oligonucleotides or BAC clones is that "we're actually getting gene-specific information when we use these chips," says co-corresponding author Sikela. "We're really excited by some of the genes we've found that are either increased or decreased specifically in human, where you could relate it to cognition, language, those kind of things," says Sikela. "Many different traits distinguish these organisms, and it's plausible that copy-number change could be a major reason for that."

Genomes are dynamic, fluctuating entities, which "evolved by duplicating and by inducing variation when they duplicate," says Wigler. Focusing only on SNPs and point mutations, he says, is not enough. "There's a big picture here that people are missing."

In some cases though, the extent of variation has long been under scrutiny. Barbara Trask, director of the human biology division at the Fred Hutchinson Cancer Research Center in Seattle, invokes the collection of human olfactory receptor genes and pseudogenes. Exactly how many, and which, of the 800 or so related sequences (grouped in 17 clades) individuals have varies throughout the population and affects the ability to detect and differentiate smells.

When genes duplicate, the selective pressure to keep them from mutating may no longer be present. Over time, copies may be rendered nonfunctional, or they may take on a new function. "There are going to be cases where additional copies themselves might have a phenotypic effect, and therefore might confer a selective advantage or disadvantage," says Trask.

Now, the technology is available to interrogate the genome for CNPs of such duplications, says Wigler. "What you're going to see is that ... some of the more subtle differences between humans – disease susceptibility genes, and the rates at which people age – are going to be caused by [gene-]dosage effects."
