Sequencing On Target
Techniques for pulling out and sequencing selected areas of the genome
It's time for a genomics reality check.
Despite the constant, glowing coverage of speedy, low-cost next-generation DNA sequencing, whole-genome analysis, and consumer genomics, researchers still have no idea what the vast majority of human genomic DNA does, nor the functional consequence of variations in those sequences. Thus, few researchers actually need to sequence entire genomes—yet.
For the moment, most next-gen projects have more limited aims, such as "exome" sequencing (targeting that 1% of the genome, 250,000 exons or so, that actually encodes protein), immunogenomics (profiling individuals' antibody gene complement), or identifying variants in mere handfuls of genes.
Even if researchers actually desire whole-genome analyses, there's a financial angle to consider: No matter how cheap gene sequencing gets, it's still cheaper to sequence a fraction of the genome than to do the whole thing—especially when studying large populations.
"Hypothetically, if you could enrich for 1% of the genome which captures 90% of the most causative alleles, conceivably you could apply that to 100 times more people," says George Church of Harvard University.
Until recently there was no easy way to do that enrichment; PCR is not easily scaled. In the past few years, however, new techniques have emerged to make this process, variously called "targeted resequencing" or "genomic partitioning," tractable.
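Church's back-of-the-envelope arithmetic can be sketched in a few lines. This is only an illustration under idealized assumptions: a fixed sequencing budget counted in bases, uniform coverage, and perfect enrichment with no off-target reads. The function name, the 30x depth, and the genome size of roughly 3 billion bases are assumptions chosen for the example, not figures from Church.

```python
GENOME_SIZE_BASES = 3_000_000_000  # approximate human genome size (assumption)


def individuals_sequenced(budget_bases, target_fraction, depth=30):
    """Number of people whose target region can be covered at `depth`.

    Idealized: assumes every sequenced base lands on target.
    """
    target_bases = round(GENOME_SIZE_BASES * target_fraction)
    bases_per_person = target_bases * depth
    return budget_bases // bases_per_person


# Hypothetical budget: exactly enough for one whole genome at 30x.
budget = GENOME_SIZE_BASES * 30

whole_genome = individuals_sequenced(budget, 1.0)
one_percent = individuals_sequenced(budget, 0.01)
print(whole_genome, one_percent)
```

In practice the gain would be smaller than the ideal ratio, since no enrichment method is perfectly specific or perfectly uniform.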
The Scientist asked five researchers to give the pros and cons of their chosen methods for fractionating genomes to feed the sequencing pipelines. Here's what they said.
Researcher: Elaine Mardis, Associate Professor of Genetics and Codirector, the Genome Center, Washington University, St. Louis, Mo.
Project: Sequencing cancer-associated genes in matched tumor and normal samples from one or two hundred cancer patients. The goal: to identify rare variants that cannot be found via traditional clonal sequencing approaches.
Technique: Mardis used Raindance Technologies' high-tech system, the RDT-1000. The system makes a PCR vinaigrette: millions of individual reaction chambers, each an aqueous bubble in an oil emulsion containing genomic DNA, reagents, and any of 4,000 separate primer pairs, all in a single tube.
First, the team designs primer pairs for each desired genomic segment. These are shipped to Raindance, which encapsulates each pair into 8-pL microdroplets to produce a pool of aqueous micelles in oil. That emulsion is then returned to the lab, where it is plugged into the instrument, along with genomic DNA and other PCR reagents. Finally, the RDT-1000 merges the ingredients into millions of discrete 22-pL droplets, each an individual reaction vessel, which can be amplified in any thermal cycler.
"Our experience so far has been that it's well designed," she says. "The interface is quite simple."
Considerations: Ideal for core facilities or small groups of labs, the method, says Mardis, is "plug-and-play" and doesn't require a lot of optimization.
The technique requires a considerable upfront investment, both in terms of hardware ($225,000) and oligos. Also, because each instrument run is limited to 4,000 reactions, full exome amplification requires multiple runs. Importantly, the process retains all the shortcomings of PCR, especially uneven target amplification.
Finally, says Mardis, success (as with all genome-partitioning methods) depends entirely on the uniqueness of the sequence being targeted. "Being able to design a genome-unique set of capture probes or PCR products is not always as easy as it sounds."
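To get a feel for the 4,000-reaction limit, here is a rough estimate of how many instrument runs a full exome might take. It assumes, purely for illustration, one primer pair per exon and uses the article's approximate figure of 250,000 exons; long exons would need several amplicons each, so the real number could be higher. The function name is hypothetical.

```python
REACTIONS_PER_RUN = 4_000  # primer pairs per instrument run, per the article


def runs_needed(num_amplicons, per_run=REACTIONS_PER_RUN):
    """Smallest number of runs that covers every amplicon (ceiling division)."""
    return -(-num_amplicons // per_run)


# Assumption: one amplicon per exon, roughly 250,000 exons.
exome_runs = runs_needed(250_000)
print(exome_runs)
```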
Researcher: George Church, Professor of Genetics, Harvard Medical School
Project: The Personal Genome Project, a massive effort to read the genomes of up to 100,000 individuals.
Technique: Though he hopes to be sequencing complete genomes by the end of the year, for now Church uses "padlock probes" to focus exclusively on exonic sequences (Nature Methods, 4:931-936, 2007).
Padlocks are 70-bp oligos containing two 20-base targeting arms flanking a generic 30-base linker. When padlocks are incubated with single-stranded, sheared genomic DNA, the targeting arms bind to the desired sequences, producing a bimolecular circle in which the two arms flank the region to be sequenced. One of the two arms then acts as a primer for DNA polymerase to copy the intervening sequence, and ligase seals the circle. Finally, noncircularized DNA is digested away, leaving only the circularized targets of interest, which are then amplified with universal primers and sequenced.
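The probe layout described above (two 20-base targeting arms around a 30-base linker) can be sketched as follows. This is a minimal illustration, not Church's design software: the function names, the example template, and the all-N placeholder linker are assumptions, and real probe design would also have to screen for sequence uniqueness and melting temperature.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
ARM_LENGTH = 20     # bases per targeting arm, per the article
LINKER_LENGTH = 30  # bases in the generic linker, per the article


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_padlock(template, start, end, linker):
    """Build a padlock probe (5'->3') for template[start:end].

    `template` is the genomic strand the probe anneals to, written 5'->3';
    coordinates are 0-based and end-exclusive.
    """
    if len(linker) != LINKER_LENGTH:
        raise ValueError("linker must be 30 bases long")
    if start < ARM_LENGTH or end + ARM_LENGTH > len(template):
        raise ValueError("not enough flanking sequence for both arms")
    upstream = template[start - ARM_LENGTH:start]
    downstream = template[end:end + ARM_LENGTH]
    # The 3' arm pairs with the downstream flank, so its 3' end abuts the
    # target and can prime the copy; the 5' arm pairs with the upstream
    # flank, where ligation closes the circle.
    return reverse_complement(upstream) + linker + reverse_complement(downstream)


# Hypothetical 52-base template: 20-base flank, 12-base target, 20-base flank.
template = "GATTACAGATTACAGATTAC" + "CCCGGGTTTAAA" + "TGCATGCATGCATGCATGCA"
linker = "N" * LINKER_LENGTH  # placeholder for the generic linker sequence
probe = design_padlock(template, 20, 32, linker)
print(len(probe))
```

Because both arms are reverse complements of the flanks, the probe binds antiparallel to the template and the target region itself is left as the gap that polymerase fills.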
"Basically, when the circle is formed and ligated, you can think of it as locked" to exonucleases, says Church—hence the name.
Considerations: Church says the padlock method is both highly scalable (he uses it to select all 258,000 human exons) and precise. Hybridization approaches always pull down whatever off-target sequences (such as introns) the selected sequences may carry with them, but "the padlock probe approach is precise to one base pair," he says. "You get exactly what you want."
It also maximizes sequencing dollars by minimizing off-target sequencing, Church says, though sequence bias (the "preferential capture of certain sequences, hence requiring more sequencing to get the under-represented sequences up to an adequate level") remains a persistent, if diminishing, problem.
Researcher: Andreas "Andy" Gnirke, Research Scientist, Broad Institute of Massachusetts Institute of Technology and Harvard University
Project: Finding an inexpensive way to read either exomes or specific susceptibility loci in large numbers of affected and normal individuals.
Technique: Gnirke developed a method he calls "hybrid selection" (or solution hybrid selection). He uses a custom 22,000-oligonucleotide microarray from Agilent, in which each oligo contains a 170-base targeting sequence flanked by 15-base universal primer sites. First, the oligos are cleaved from the array and PCR-amplified to introduce a T7 polymerase promoter at one end. The amplified oligos are then transcribed in the presence of biotin-UTP to create a biotinylated capture pool, or "bait." Next, the genomic DNA to be sequenced is sheared and coupled to adaptors to create a "pond" of prey molecules, which hybridize to the bait.
The resulting RNA:DNA hybrids are then captured on streptavidin beads, eluted, and amplified to yield the sequencing template (Nature Biotech, 27:182-189, 2009). "We believe [this technique] currently gives the best balance of specificity, uniformity, recovery of both alleles, and cost," he says. Agilent has commercialized the method as the SureSelect Target Enrichment System ($600/reaction if you buy the 100-reaction kit).
Considerations: Because the method requires on-array synthesis, it works best for a defined set of sequencing targets to be scanned over and over again. Adding new probes can be expensive.
On the other hand, being able to create, test, aliquot, and store large batches of oligos ups the technique's reproducibility, Gnirke says. With arrays, by contrast, "there's no way you can really test the thing before you use it." The technique also uses less DNA than arrays—about 500 ng instead of 10 μg.
The method is less precise than padlock probes, however. "We capture fragments that have exon sequence and some flanking sequence," he says. "We always get a certain amount of flanking bycatch that was specifically captured but still not what you want."
Researcher: Hanlee Ji, Assistant Professor of Medicine, Stanford University School of Medicine, and Senior Associate Director, Stanford Genome Technology Center (SGTC)
Project: Optimizing genomic biomarker discovery with a cost-effective, scalable, and generic sequencing pipeline that can process hundreds of patient samples.
Technique: Ji and SGTC director Ron Davis have devised several approaches to the genome-partitioning problem; one uses what Ji calls "targeted genomic circularization" or alternatively, "selector probes."
Selector probes are 80-nucleotide-long double-stranded oligos, with a central 40-base generic "vector" sequence and two target-specific overhanging termini. Genomic DNA is digested with a restriction enzyme, denatured, and incubated with the selector probe. When the ends of the probe find their target sequences, the result is a partially double-stranded circle in which the two ends of the genomic fragment are bridged by the probe. These molecules can then be amplified and sequenced (PNAS, 104:9387-9392, 2007).
"The reaction is very simple," says Ji. "It's multistep, it can be done in any molecular biology lab, and it is very easy to integrate with any next-generation sequencing system."
Considerations: The selector probe strategy fills a different niche than do other, exome-scale methods, says Ji. "We see this as something any group could use to target anywhere from 10 to, say, 1,000 genes of interest."
That's because oligos are both inexpensive and stable, and because normalization (the tricky process of ensuring that different sequences amplify to the same degree) is easily accomplished by adjusting oligo sequence or concentration.
The strategy does impose several design constraints. To be efficient, the two probe ends must target sequences that are between 150 and 1,000 bases apart. Also, one of the ends must correspond to a restriction site. Still, even if some probes work better than others, you can simply use redundant oligos, Ji says. "It's just a simple matter of overengineering to compensate for potential failures of a given oligonucleotide."
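The two design constraints Ji describes lend themselves to a quick check. The sketch below is an illustration only: the function names are hypothetical, EcoRI's GAATTC recognition site is used simply as a familiar example, and the start of the recognition site stands in for the fragment end, whereas a real enzyme cuts at a defined offset within or near its site.

```python
MIN_SPAN = 150    # minimum distance between probe ends, per the article
MAX_SPAN = 1_000  # maximum distance between probe ends, per the article


def find_sites(sequence, recognition_site):
    """Return the 0-based start position of every occurrence of the site."""
    positions = []
    pos = sequence.find(recognition_site)
    while pos != -1:
        positions.append(pos)
        pos = sequence.find(recognition_site, pos + 1)
    return positions


def is_valid_selector_target(left_end, right_end, site_positions):
    """Check the span limit and that one end coincides with a restriction site."""
    span = right_end - left_end
    if not MIN_SPAN <= span <= MAX_SPAN:
        return False
    return left_end in site_positions or right_end in site_positions


# Hypothetical sequence containing two EcoRI sites (GAATTC).
sequence = "A" * 100 + "GAATTC" + "C" * 300 + "GAATTC" + "A" * 50
sites = find_sites(sequence, "GAATTC")
print(sites)
print(is_valid_selector_target(100, 400, sites))
```

Running such a check over several candidate end pairs per gene is one way to picture the redundancy Ji mentions: design more probes than strictly needed and keep those that satisfy both constraints.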
Researcher: Jonathan and Christine "Kricket" Seidman, Department of Genetics, Harvard Medical School
Project: Searching for genetic variants of cardiac disease by performing deep sequencing of cardiac disease–associated genes in large populations.
Technique: The Seidmans opted for surface capture-based hybridization, a solid-state complement to Gnirke's approach.
They separately PCR amplify each desired exonic sequence—about 45,000 bases representing 11 genes in the case of hypertrophic cardiomyopathy—concatenate them into a single linear molecule, and bind that to a nitrocellulose filter. They then shear the genomic DNA to be sequenced, attach flanking PCR primers, hybridize those molecules to the filter, and sequence what bound. The technique, says Jonathan, "is absolutely based on Southern blotting technology," and is just as simple.
Considerations: One primary strength of the approach is flexibility. PCR products are easier to generate than long oligos, and anyone with even a modicum of technical know-how can generate a filter and modify it at will, encompassing new genes as they come up. Up to 30 Mb can be thus targeted, says Jonathan.
The method is also inexpensive, requiring only PCR primers and nitrocellulose, and capable of detecting copy number variants PCR-based methods overlook. "We think these capture methods are going to be a useful tool for measuring a two-fold change in the amount of sequence," he says.
If you prefer commercial methods, Roche/NimbleGen recently released an array-based genome-partitioning product. The NimbleGen Sequence Capture 2.1M Human Exome array uses oligonucleotide probes to capture more than 180,000 exons and microRNA sequences.