Structural variations common in human genome

At least 12 percent of the genome is made of regions that vary in number across individuals, according to four new papers

Written by Charles Q. Choi

Regions where large segments of DNA are gained or lost cover at least 12 percent of the human genome, far more than previously thought, an international consortium of scientists reports in four papers in three journals this week. The findings could help scientists identify new traits with medical or other phenotypic relevance and understand human evolution, Wan Lam at the British Columbia Cancer Research Center in Vancouver, who did not participate in the studies, told The Scientist. "The work is beautiful," added Evan Eichler at the University of Washington in Seattle, who was not a coauthor.

The consortium focused on copy number variable regions (CNVRs), which are DNA segments 500 base pairs or larger found in varying numbers in different people. In their Nature paper, coauthor Matthew Hurles at the Wellcome Trust Sanger Institute in Cambridge, England, and his colleagues report the first comprehensive map of CNVRs.

The researchers analyzed DNA samples from 270 individuals from four populations with ancestry in Africa, Europe or Asia who were part of the International HapMap Project. Specifically, they screened for copy number variants (CNVs) using two complementary technologies detailed in the consortium's new papers in Genome Research: single-nucleotide polymorphism (SNP) genotyping arrays and clone-based comparative genomic hybridization (CGH). SNP genotyping arrays assayed samples for more than 500,000 known SNPs, looking for stretches of adjacent SNPs that occurred at levels that differed from expected ratios. Clone-based CGH compared samples from 269 of the HapMap individuals against DNA from one HapMap individual chosen as a reference standard, looking for differences in copy number among more than 26,000 large-insert cloned segments that span nearly all of the currently sequenced portion of the genome. The reference individual was male, to allow detection of CNVs on the Y chromosome, and was the son of two HapMap participants, to maximize prior information on his CNVs.

SNP genotyping arrays and clone-based CGH found CNVs with average sizes of 206 and 341 kilobases, respectively. While SNP genotyping was better at detecting smaller CNVs, clone-based CGH had large specific targets, leading to less background noise in results and making it better at detecting CNVs in more complex regions of the genome, such as those where two or more segments are duplicated.

The two technologies combined found 1,447 CNVRs, covering an eighth of the human genome. "We could have culled more CNVs from our data," Hurles told The Scientist via email. However, he said they aimed conservatively "so that investigators are not overwhelmed by the false positives that are inevitable in any study of this nature."

The CNVRs contained 2,908 genes, 285 of which are linked to disease. They also contained 67 non-coding RNAs, 50 ultraconserved elements and 130,353 conserved non-coding sequences. Notably, CNVRs encompass about two to three times as much nucleotide content per genome as SNPs do, Hurles said.

"I suspect our paper will be a wakeup call to a lot of scientists that they need to start incorporating a 'CNV-analysis step' in their study designs if they want to fully understand their data," Stephen Scherer at the Hospital for Sick Children in Toronto, coauthor on all four papers, told The Scientist via email.

The consortium found that only about 10 to 15 percent of copy number variation occurred between populations. "This is as a result of our recent common ancestry in Africa," Hurles said. The researchers suggest these differences could explain the increased prevalence of some diseases in certain populations. For instance, prior studies have shown that one CNV the consortium confirmed affects UGT2B17, a gene linked to an increased risk of prostate cancer in populations of African and European descent. The consortium is now expanding its studies to thousands of healthy individuals from populations outside the HapMap collection.

"We think that we are at the stage where we can confidently detect CNVs of 50 kilobases or more, but we think that the overall aim must be to increase resolution by two orders of magnitude such that we can detect CNVs of 500 base pairs. Our consortium is currently pursuing this objective," Hurles said. A large number of CNVs remain to be found, Lam agreed in an email to The Scientist. Lam's team has found many CNVs that are not seen in the consortium's new papers, and vice versa.

"The important next steps will be to identify the specific variants within these regions, to learn what the alleles are -- zero copies? three copies? -- and to develop ways to type these variants in large patient cohorts so that researchers can see whether these variants are associated with disease risk," Steven McCarroll at Massachusetts General Hospital in Boston, who was not a coauthor, told The Scientist via email.

Scherer added that the researchers would "also like to have a much better idea of the precise start and end points of the CNVs so we can better understand the underlying mechanism -- that is, are they random events or DNA sequence-driven?" The scientists aim to better understand "the new mutation rate of CNVs and how this might be dependent on the region of the genome involved," he said.

In the future, the most sensitive way to identify all kinds of DNA variation, including SNPs, CNVs and inversions of sequences, may be to directly compare whole genomes, according to the consortium. In their Nature Genetics paper, the consortium computationally compared whole genome sequences assembled by the two human genome projects and confirmed more than 1.5 million SNPs and 240 variable regions, including CNVs and inversions.

Charles Q. Choi
cchoi@the-scientist.com

Links within this article:

Wan Lam
http://www.bccrc.ca/cg/people_wanlam.html

Evan Eichler
http://eichlerlab.gs.washington.edu

J.P. Roberts. "Looking at Variation in Numbers," The Scientist, March 14, 2005.
http://www.the-scientist.com/article/display/15302/

R. Redon et al. "Global variation in copy number in the human genome," Nature 444: 444-54, Nov. 23, 2006.
http://www.nature.com/nature/journal/v444/n7118/full/nature05329.html

Matthew Hurles
http://www.sanger.ac.uk/Teams/Team29

A. Constans. "A Practical Guide to the HapMap," The Scientist, Feb. 1, 2006.
http://www.the-scientist.com/article/display/23052

D. Komura et al. "Genome-wide detection of human copy number variations using high density DNA oligonucleotide arrays," Genome Research, published online ahead of print Nov. 22, 2006.
http://www.genome.org/cgi/content/abstract/gr.5629106v1

H. Fiegler et al. "Accurate and reliable high-throughput detection of copy number variation in the human genome," Genome Research, published online ahead of print Nov. 22, 2006.
http://www.genome.org/papbyrecent.shtml

J.L. Peirce. "Following Phylogenetic Footprints," The Scientist, September 27, 2004.
http://www.the-scientist.com/article/display/14954

Stephen Scherer
http://www.the-scientist.com/article/display/21257

R. Khaja et al. "Genome assembly comparison identifies structural variants in the human genome," Nature Genetics, published online ahead of print Nov. 22, 2006.
http://www.nature.com/ng/journal/vaop/ncurrent/abs/ng1921.html

V.K. McElheny. "The Human Genome Project +5," The Scientist, Feb. 1, 2006.
http://www.the-scientist.com/article/display/23065