Researchers investigating genetic variation and its contribution to phenotypic differences have gained a windfall of data, described in two Nature papers out today (September 14). A collaboration involving more than forty scientists from the United Kingdom, Germany, and the United States has published the sequences of 17 mouse genomes, including 13 of the most common lab strains and 4 wild-derived strains. The breadth of the publicly available data promises to aid research tracing the link between DNA sequence and phenotype, and to shed light on the genetic underpinnings of disease susceptibility and species evolution.
The new sequences are “eminently more powerful than previous mouse data,” said David Threadgill, a geneticist at North Carolina State University who was not involved with the project. Although the first mouse genome sequence (strain C57BL/6) was released in 2002, scientists using other strains had little sequence data to support their research.
“It’s a misperception that we had strain sequences,” explained Threadgill. Most of the earlier data had focused on mapping single nucleotide polymorphisms, or SNPs, to the C57BL/6 reference strain, leaving large swathes of mouse genomes unsequenced. In addition to identifying over 56 million SNPs, the new data reveal many larger structural variations, including insertions, deletions, and copy number variations in areas of the genome with repeated DNA sequences.
Using the C57BL/6 sequence as a guide, the consortium of researchers broke down the genomes of each of the 17 strains into bits of about 100 base pairs and reassembled them, much like a “jigsaw puzzle,” said Jonathan Flint, co-author and neurogeneticist from The Wellcome Trust Centre for Human Genetics in Oxford, UK. Along the way, researchers identified locations where a strain’s genome differed from the C57BL/6 sequence, building a picture of the variation in SNPs and larger sequence variations among mouse strains.
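The reference-guided approach described above can be illustrated with a toy sketch. This is not the consortium's actual pipeline: the sequences, the 8-base read length, and the function names `place_read` and `call_snps` are all made up for illustration, and real aligners handle sequencing errors, insertions, deletions, and repeats that this example ignores.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def place_read(read, reference):
    """Return the reference offset where the read fits with the fewest mismatches."""
    best_pos, best_dist = 0, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        dist = hamming(read, reference[pos:pos + len(read)])
        if dist < best_dist:
            best_pos, best_dist = pos, dist
    return best_pos


def call_snps(reads, reference):
    """Place each read on the reference and record single-base differences."""
    snps = {}
    for read in reads:
        pos = place_read(read, reference)
        for i, base in enumerate(read):
            if base != reference[pos + i]:
                snps[pos + i] = (reference[pos + i], base)
    return dict(sorted(snps.items()))


# Hypothetical 28-base "reference" and a "strain" differing at position 10 (A -> G).
reference = "ACGTACGGTCATTGCAAGCTTAGGCCTA"
strain = "ACGTACGGTCGTTGCAAGCTTAGGCCTA"

# Break the strain into overlapping 8-base reads (the real reads were ~100 bases).
reads = [strain[i:i + 8] for i in range(0, len(strain) - 7, 4)]

print(call_snps(reads, reference))  # {10: ('A', 'G')}
```

Each read is placed where it best matches the reference, like fitting a jigsaw piece against the picture on the box, and any base that disagrees with the reference at that spot is reported as a candidate SNP.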
The researchers found that wild-derived strains showed more sequence variation than the inbred lab lines, echoing previous concerns about extrapolating results obtained in lab mice to more outbred populations, like humans, said Ira Hall, a geneticist at the University of Virginia who was not involved in the sequencing. It is also becoming clearer where in the genome sequence variation tends to fall: within exonic coding regions, introns, or the intervals between genes. The researchers identified a surprisingly large amount of variation attributable to transposable elements, bits of DNA that insert themselves into, or excise themselves from, the genome.
Flint likened the new sequencing data to Christmas, and future research projects to “opening Christmas presents.” The previously long, tedious process of identifying a characteristic of interest, like memory formation, and homing in on a stretch of DNA linked to the trait has been dramatically expedited. It is now like “shooting fish in a barrel,” said Rob Williams, a neurogeneticist at the University of Tennessee, who was not involved with the project. Now, researchers investigating specific strains carefully bred for disease susceptibility simply have to compare one mouse sequence to another to find the DNA segment of interest.
Indeed, the researchers have already linked much of the variation they identified to complex traits, like asthma or anxiety, and found that some structural variants that altered large stretches of sequence appeared to contribute more to phenotypes than simple SNPs.
“[It’s the] first comprehensive look at the level of genetic variation and the effect it can have on phenotype,” said co-author David J. Adams, a cancer geneticist at The Wellcome Trust Sanger Institute in Cambridge, UK. Adams began the project to address the dearth of data facing his own lab as it investigated the genetic variation leading to cancer susceptibility. Now, Adams plans to use the sequences to understand how the genome evolves when cancer forms, and what genetic variation influences susceptibility to different types of cancer. The sequences “provide a blueprint” for those cancer studies, explained Adams.
The researchers also examined how different strain backgrounds might contribute to gene expression. The teams crossed two different strains and compared RNA expression of specific genes in the offspring. Surprisingly, chromosomes from one strain sometimes “out-performed” those from the other, a phenomenon called allelic bias, in which more RNA is transcribed from one parental chromosome than from the other. Occasionally, a strain whose chromosome produced more RNA in one tissue, like liver, produced less RNA in another tissue.
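One simple way to quantify allelic bias is to count the RNA reads that can be assigned to each parental chromosome and ask how far the split departs from 50:50. The sketch below is an illustration only, not the method used in the papers: the read counts are invented, and the exact binomial test is one common choice among several.

```python
from math import comb


def allelic_fraction(reads_a, reads_b):
    """Fraction of allele-assignable reads that come from strain A's chromosome."""
    total = reads_a + reads_b
    if total == 0:
        raise ValueError("no allele-assignable reads")
    return reads_a / total


def binomial_two_sided_p(reads_a, reads_b):
    """Exact two-sided p-value against an unbiased 50:50 expectation.

    Sums the probabilities of every outcome that is no more likely than the
    observed split when each read is equally likely to come from either allele.
    """
    n = reads_a + reads_b
    observed = comb(n, reads_a)
    return sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed) / 2 ** n


# Hypothetical read counts for one gene in two tissues of the same hybrid mouse.
tissues = {"liver": (30, 10), "brain": (12, 28)}
for tissue, (a, b) in tissues.items():
    print(tissue, allelic_fraction(a, b), binomial_two_sided_p(a, b))
```

With these invented counts, strain A's chromosome supplies most of the reads in liver but a minority in brain, the kind of tissue-dependent reversal described above.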
Rob Williams looks forward to the chance to finally do “real genetics.” Genetics, said Williams, is the study of variation, and with 17 sequences to compare, mouse researchers can finally begin to study just that.
T.M. Keane, et al., "Mouse genomic variation and its effect on phenotypes and gene regulation," Nature, 477:289-94, 2011.
B. Yalcin, et al., "Sequence-based characterization of structural variation in the mouse genome," Nature, 477:326-29, 2011.