Reveling in the Revealed

© HENNING DALHOFF/SCIENCE SOURCE

A cell packs its genome as if our lives depended on it, and they do. If you could unwind the DNA within the nucleus of a single cell, it would stretch about two meters. The 2–3 percent of the genome revealed at any one time performs an essential function: transcription. “Assaying the parts that are being used is a very powerful way to try to understand gene-expression regulation at the level of DNA,” says William Greenleaf of Stanford University. And probing that regulation process is key for understanding health and disease.

Large consortia-led projects such as ENCODE (Encyclopedia of DNA Elements) have made great strides in identifying various functional elements of the genome. These include enhancers, activators, and promoters, all regions of DNA that bind proteins that control transcription. Studies have also tapped into the nature of DNA’s primary packing material: protein spools called histones around which genomes wind to form nucleosomes. Nucleosomes, which are often compared to beads on a string of DNA, further stack as chromatin folds and winds, forming some 10,000 loops within the cell’s nucleus (Cell, 159:1665-80, 2014). This brings distant regions of the genome into close contact and ensures that genes aren’t unintentionally transcribed.

Which parts of the genome are available for transcription at a given moment? ENCODE helped answer this question by using DNase-seq, a technique that digests and sequences nucleosome-free regions of the genome. Similar methods have come along in recent years, including ATAC-seq and MNase-seq, expanding researchers’ options for taking snapshots of available (or unavailable) DNA. Surveying the whole genome using these methods can be a helpful first step toward cataloging potential functional elements of transcription.
ChIP-seq (or its myriad variations) may then provide more mechanistic insights, by using antibodies to pinpoint the binding sites of specific transcription factors, notes senior investigator Keji Zhao of the National Heart, Lung, and Blood Institute.

The Scientist talked to developers and users about the pros and cons of each of these commonly used techniques. Here’s what they said.

DNase-seq

Background: Deoxyribonuclease (DNase) has long been paired with Southern blotting to reveal exposed regions of DNA, known as DNase hypersensitive sites, and that work found that such regions are indeed active. Next-generation sequencing has allowed researchers to probe exposed regions across entire genomes, and the ENCODE project alone has generated more than 400 data sets using DNase-seq.

How it works: DNase-seq takes advantage of the fact that exposed regions of the genome are naturally more prone to degradation by DNases. The method employs the enzyme DNase I to cleave DNA at sites along the genome that are not wrapped around nucleosomes, which become displaced by the binding of transcription factors. The resulting small fragments, which are thought to indicate the presence of transcription factors, are then sequenced and mapped to the genome.

Pros: The technique is better established than any of the other chromatin accessibility methods; many labs have applied it across a wide range of cell types and species (including plants), and its cutting bias is better understood. It’s possible to tweak DNase-seq to look at protected regions of the genome, presumably where transcription factors or nucleosomes may dwell. This is called DNase footprinting.

Cons: DNase-seq is technically difficult to master, especially in finding the optimal digestion conditions for a given cell type and number. Because the method requires millions of cells, it may be challenging to analyze rare patient samples.

Getting started: Check out the two main protocols: Cold Spring Harb Protoc, doi:10.1101/pdb.prot5384, 2010; Curr Protoc Mol Biol, Supplement 103:Unit 21.27, 2013.

Considerations: Recent research has revealed how DNase I’s cutting bias may limit the method’s usefulness for the identification of DNA footprints. Analyzing supposed binding of 36 different transcription factors, the researchers showed that DNase-seq data were not useful for illuminating footprints for many of them (Nat Methods, 11:73-78, 2014). Because where the enzyme cuts is sequence-dependent, researchers should use naked DNA (i.e., DNA with no associated proteins) as a control in DNase-seq (and in ATAC-seq) footprint analysis, says Clifford Meyer, research scientist in X. Shirley Liu’s lab at Harvard University and a coauthor on the Nature Methods study. “If you see a pattern in the naked DNA, then you know it’s got nothing to do with transcription-factor binding,” he adds.

SPACE BETWEEN: DNase-seq and ATAC-seq are used to sequence and map exposed regions of DNA, whereas MNase-seq maps regions that are protected by nucleosomes. But because the methods provide snapshots of a dynamic process that is averaged across many thousands of cells, DNase- and ATAC-seq do not provide data that perfectly complement those of MNase-seq. (TF = transcription factor) BASED ON EPIGENETICS CHROMATIN, 7:33, 2014, REDRAWN WITH PERMISSION.

Single cells?: Just a month ago, Keji Zhao’s group described single-cell DNase-seq (scDNase-seq), using the technique to identify exposed regions of DNA in tumor cells that they had manually scraped from fixed-tissue slides of thyroid cancer biopsies. The team also analyzed exposed genome regions of single living cells isolated using fluorescence-activated cell sorting (Nature, 528:142-46, 2015).
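
To make the naked-DNA control that Meyer describes for DNase-seq footprinting more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method used in the Nature Methods study: the function name, the pseudocount, and the toy cut counts are all hypothetical. It scales the chromatin and naked-DNA cut profiles around a candidate binding site to the same total and reports a per-position log2 ratio, so that a pattern also present in naked DNA cancels out.

```python
import math

def bias_corrected_profile(chromatin_cuts, naked_cuts, pseudocount=1.0):
    """Per-position log2(chromatin / naked DNA) cut frequency.

    Both inputs are lists of cut counts at the same positions around a
    candidate binding site. Each profile is scaled to sum to 1 so that
    sequencing depth does not affect the ratio. The pseudocount
    (an assumed value) avoids division by zero at uncut positions.
    """
    if len(chromatin_cuts) != len(naked_cuts):
        raise ValueError("profiles must cover the same positions")
    n = len(chromatin_cuts)
    chrom_total = sum(chromatin_cuts) + pseudocount * n
    naked_total = sum(naked_cuts) + pseudocount * n
    profile = []
    for c, k in zip(chromatin_cuts, naked_cuts):
        chrom_frac = (c + pseudocount) / chrom_total
        naked_frac = (k + pseudocount) / naked_total
        profile.append(math.log2(chrom_frac / naked_frac))
    return profile

# Toy, made-up counts: the chromatin sample shows a dip in the middle
# that the naked-DNA control does not share.
chromatin = [40, 38, 10, 8, 9, 41, 39]
naked = [20, 19, 21, 20, 20, 21, 19]
print([round(x, 2) for x in bias_corrected_profile(chromatin, naked)])
```

Positions where the ratio stays near zero behave the same with or without proteins on the DNA, which is the situation Meyer warns has nothing to do with transcription-factor binding. Strongly negative values in the middle of the window are the candidates worth a closer look.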

ATAC-seq

Background: In collaboration with Howard Chang at Stanford University, Greenleaf’s group introduced the Assay for Transposase-Accessible Chromatin (ATAC)-seq in 2013 (Nat Methods, 10:1213-18, 2013).

How it works: ATAC-seq inserts sequencing adapters directly into accessible DNA using the enzyme Tn5 transposase. The bits captured between the adapters are then amplified with qPCR and sequenced.

Pros: The protocol is the easiest of the accessibility methods, and the signal-to-noise ratio is fantastic, says the University of Chicago’s Jason Lieb, who developed a related method, called FAIRE-seq, several years ago. (Although FAIRE-seq is as easy as ATAC-seq, Lieb has mostly switched over to the latter because it gives a better signal.) You need only 50,000 or fewer cells to get results from ATAC-seq.

Cons: Starting materials are slightly more expensive; you have to purchase a kit for Tn5 transposase from Illumina (Nextera DNA Library Preparation Kit). There’s not much precedent for ATAC-seq footprinting yet, notes bioinformatician Michael Buck of the State University of New York at Buffalo. His group is working on a tool that accounts for ATAC-seq bias to highlight potential footprints. Last year, Greenleaf’s group released a high-resolution nucleosome peak detector (NucleoATAC; available via GitHub).

Getting started: Greenleaf has started a forum to field questions from an expanding user group. You can request access to it at sites.google.com/site/atacseqpublic/home?pli=1. Those experienced with molecular biology techniques can generate a sequencing library in a day, Greenleaf says.

Tips: Every cell is different, so you will need to adjust the cell number and the lysis conditions for your particular situation. “Ideally you want to gently lyse cells to get the transposase in but not disrupt the chromatin state,” says Greenleaf.
Using too many cells leads to fewer sequencing adapters being inserted, and thus larger DNA fragments; using too few cells leads to shorter fragments. The optimal number of cells can vary, depending on the tissue or organism from which the cells originate. It’s always good to do some preliminary analysis before you run your samples on a sequencer, or do light sequencing to start, says Greenleaf. You could run a preliminary gel to check out fragment distributions, or run the sample through a machine that quantifies DNA and measures its quality (e.g., Agilent 2100 Bioanalyzer). For production-level sequencing, Greenleaf recommends using paired-end sequencing for the best results.

Single cells?: Two groups recently published different methods for single-cell ATAC-seq. Jay Shendure’s group at the University of Washington and his collaborators tagged cell nuclei with barcodes and separated them using fluorescence-activated cell sorting (Science, 348:910-14, 2015). In contrast, Greenleaf’s lab uses microfluidic approaches for cell isolation (Nature, 523:486-90, 2015). Much of the challenge for both methods comes down to data analysis, Greenleaf says, because the data are sparse. “In a single cell, there’s either zero, one, or two loci that are open at any specific region of the genomic sequence,” he says.

MNase-seq

Background: Researchers have used micrococcal nuclease (MNase), from Staphylococcus aureus, for digesting and studying chromatin for at least 40 years. In 2010, they started pairing it with high-throughput sequencing.

How it works: MNase works by chewing up exposed stretches of the genome; the DNA associated with nucleosomes is recovered and sequenced. That makes MNase-seq the inverse of ATAC-seq and DNase-seq, at least conceptually.

Pros: In combination with chromatin immunoprecipitation (ChIP-seq), which requires a high-quality antibody, MNase digestion can be used to study regulatory factors that bind to nucleosomes.
The technique has been used on the cells of many species, from yeast to humans.

Cons: MNase-seq requires 10–20 million cells. Most enzymes used in chromatin accessibility assays have sequence-specific biases; MNase likes to cut in AT-rich regions of the genome. For reasons that are not always clear, certain regions of the genome are more sensitive than others to MNase digestion.

Getting started: Researchers have developed a protocol that takes into account the shorter reads produced by MNase digestion and generates base-pair resolution mapping (PNAS, 108:18318-23, 2011). Buck’s group has described methodology aimed at standardizing digestion and data analysis steps (BMC Mol Biol, 13:15, 2012).

Tips: DNase-seq and MNase-seq are not perfect opposites: studies can, for example, suggest that a given site on the genome is both DNase I hypersensitive and nucleosomal, says Lieb. “Just imagine that a site is open half the time and nucleosomal half the time,” he says. “It’s theoretically possible to get [DNase] hypersensitive signaling and a nucleosome. Kinetics are still a challenge which none of these methods has addressed completely.” Averaging over many populations of cells also muddles the data, he adds. An alternative to MNase-seq, called NOMe-seq, generates genome-wide information about both nucleosome positioning and the state of DNA methylation (Genome Res, 22:2497-506, 2012).

Single cells?: Nothing published yet.

A word about data analysis

In probing DNase-seq, ATAC-seq, and MNase-seq data, most researchers use programs that were originally developed for ChIP-seq, says Michael Buck. It’s simple enough to recognize places in the genome that are open, “but if you want to do more analysis, that’s where people get bogged down,” he adds. Sophisticated analysis is required to get more meaningful results, and for that you will need some programming abilities, Zhao says.
“It doesn’t matter what programming language you use—R, Perl, C++—but programming ability is important.” That doesn’t mean you have to be a bioinformatician. It’s relatively easy for molecular biologists to pick up enough Perl, for example, to do data analysis themselves, or at least be able to communicate with a bioinformatician about the analysis. Core facilities and collaborators who specialize in data analysis can be key resources, Zhao says. In addition, Buck says, new assay-specific analysis tools are in the works and the tools for these methods should improve in the near future.
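
As a rough illustration of the “simple enough” first step Buck mentions, recognizing places in the genome that are open, here is a minimal Python sketch. It is not any published peak caller: the function name, the fixed 200-base window, and the threshold of five cuts are assumptions chosen for the example. It bins mapped cut (or insertion) positions on one chromosome into windows and reports the windows whose counts reach the threshold.

```python
from collections import Counter

def open_windows(cut_positions, window_size=200, min_cuts=5):
    """Return (start, end, count) for each window with enough cuts.

    cut_positions: 0-based coordinates of mapped cut or insertion
    sites on a single chromosome. window_size and min_cuts are
    illustrative defaults, not recommended settings.
    """
    # Count how many cut sites fall into each fixed-size window.
    counts = Counter(pos // window_size for pos in cut_positions)
    # Keep only windows at or above the threshold, in genomic order.
    return sorted(
        (idx * window_size, (idx + 1) * window_size, n)
        for idx, n in counts.items()
        if n >= min_cuts
    )

# Toy, made-up positions: two dense clusters and one stray cut.
cuts = [10, 20, 30, 40, 50, 450, 1000, 1001, 1002, 1003, 1004, 1005]
for start, end, n in open_windows(cuts):
    print(start, end, n)
```

A fixed threshold like this ignores sequencing depth, enzyme cutting bias, and local background, which is part of why the more sophisticated analysis Zhao describes calls for purpose-built tools or a collaborator who specializes in them.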