Two researchers sit hunched in front of a fume hood dressed head-to-toe in stark white Tyvek suits, though the yellow-tinted window I’m viewing them through lends the entire scene a sulfurous hue. One of the scientists, a research associate named Hongjie Li, pipettes tiny volumes of solutions containing decades-old DNA into centrifuge tubes, while the other, PhD student Lu Yao, types information into a laptop. Airlock doors and a sensitive ventilation system minimize the incursion of outside air and the myriad bits of contaminating DNA it carries. Yao, reaching a point when she can take a break, looks up from her work and waves, a...
This is the ancient-DNA lab at the University of Illinois, Urbana-Champaign, tucked in a corner of the basement at the Carl R. Woese Institute for Genomic Biology. Yao has spent hours in this space. Working under the guidance of molecular anthropologist Ripan Malhi, she hopes to answer questions about phylogeny, biogeography, and island dwarfism among long-tailed macaques (Macaca fascicularis) in Southeast Asia by sequencing decades- and even century-old mitochondrial DNA collected from the dried skulls of monkeys in museum collections. And thanks to recent methodological, computational, and conceptual advances in the study of ancient DNA, Yao, Li—who studies ancient DNA from native Californians—and other researchers are succeeding, compiling sequences at an unprecedented rate.
In just a few decades, the study of ancient DNA has gone from a scientific curiosity to an extremely powerful method for reconstructing past biological phenomena. Malhi recalls that in his own PhD research, which he finished in 2001, he devoted an entire dissertation chapter and a year of lab work to the genetic analysis of 40 ancient samples from Native Americans, zeroing in on a 300-base-pair-long fragment of mitochondrial DNA. “Now, that’s something that one of my students can do in a month,” he says. “It’s pretty amazing.”
In addition to greatly condensing the amount of time it takes to extract and sequence old DNA, new techniques are allowing researchers to pluck sequenceable fragments from ever-more-ancient samples, providing genetic blueprints from long-forgotten epochs of evolution, migration, and ancestry. In 2014 alone, scientists successfully sequenced the mitochondrial genome of a hominin that lived more than 400,000 years ago,1 exomes from the bones of two Neanderthal individuals more than 40,000 years old,2 and a nearly complete nuclear genome from a 45,000-year-old modern human fossil,3 to name but a few. In 2013, an international team of researchers led by scientists at the University of Copenhagen published the full genome sequence of an ancestral horse species that roamed the Middle Pleistocene permafrost of North America more than 700,000 years ago—the oldest complete genome sequenced thus far.4
For ancient-DNA researchers, these truly are heady times. “The last two or three years have been amazing,” says Mattias Jakobsson, a population geneticist at Uppsala University in Sweden who studies ancient DNA as a way to understand human evolutionary history. And the coming years only promise more sequences from more and older specimens, he adds. “We’re certainly heading to much more data. There’s going to be many more studies of many more individuals.”
Roots and shoots
The seeds of ancient DNA research sprouted in 1984, even before polymerase chain reaction (PCR) became the ubiquitous technique that it is today. Researchers at the University of California, Berkeley, successfully cloned and sequenced two fragments of mitochondrial DNA from a 140-year-old museum specimen of a quagga, an extinct relative of zebras, demonstrating that genetic material could survive and be recovered from the remains of long-dead animals.5
The quagga paper and similar reports of ancient DNA recovery from China and Germany ignited excitement among geneticists, and the race was on to comb ever-older specimens for sequenceable DNA. In 1985, Svante Pääbo, then a young PhD student at Uppsala University, published in Nature that he had cloned nuclear DNA from a mummified Egyptian child who was laid to rest 2,400 years ago.6 Just a few years later, however, as PCR burst onto the scene, Pääbo learned that the DNA he’d recovered was at least in part modern human DNA, likely from the archaeologists or museum staff who had handled the specimen.
Soon thereafter, other claims of ancient DNA recovery were determined to be the result of contamination and/or faulty methodology, rather than a glimpse into prehistory. The recovery of supposed chloroplast DNA from 20-million-year-old magnolia leaves,7 for example, could not be repeated; those who tried turned up only the genetic sequences of contaminating bacteria, mistakenly amplified by PCR. And pieces of mitochondrial DNA supposedly collected from 80-million-year-old dinosaur bone fragments8 were, in fact, of modern human origin. “In the middle of the ’90s you have a lot of people realizing that a lot of things were wrong,” says Ludovic Orlando, a leading ancient-DNA researcher at the University of Copenhagen.
The mistakes made by ancient-DNA pioneers were not wholly uninformative, though. Problems with contamination led researchers to adopt carefully designed protocols for unearthing, cataloging, handling, and studying ancient samples. And realizing the inherent difficulty in working with highly degraded and aged bits of DNA sparked creative strategies for recovering short fragments of truly ancient genetic material and for differentiating them from modern DNA. Combine these protocols with new technologies developed in the past few years to mine, extract, isolate, and sequence genetic material from fossilized specimens, and the result is nothing short of a revolution in ancient DNA research.
Better tools blossom
The first rule of paleogenomics is: the older the sample, the more fragmented the DNA. Exogenous and endogenous nucleases get to work as soon as an organism dies, degrading its tissues and genetic material. Water and oxygen take their toll as well, leaving ancient DNA with characteristic double- and single-strand breaks, crosslinks, and telltale patterns of molecular modification. All told, DNA has an average half-life of only about 521 years in bone,9 meaning that most fossilized samples contain only trace amounts of endogenous genetic material. It’s no wonder early studies of ancient DNA led to so many cases of mistaken identity.
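To get a feel for what a 521-year half-life implies, here is a minimal sketch that treats DNA survival as simple exponential decay. The function name and the sample ages are illustrative choices, not part of the cited study, and the single default half-life is an assumption: real decay rates depend heavily on preservation conditions such as temperature, which is why cold sites can yield far older DNA than this naive calculation suggests.

```python
def fraction_remaining(age_years, half_life_years=521.0):
    """Fraction of the starting DNA expected to survive after age_years.

    Assumes plain exponential decay with one fixed half-life, which is a
    simplification: the 521-year figure is an average quoted in the article,
    not a constant that holds for every bone or burial environment.
    """
    if age_years < 0:
        raise ValueError("age_years must be non-negative")
    return 0.5 ** (age_years / half_life_years)


# Illustrative ages only: a museum specimen, a mummy, and a Pleistocene fossil.
for age in (140, 2_400, 45_000):
    print(f"{age:>7,} years: {fraction_remaining(age):.3g} of the starting material")
```

Passing a longer `half_life_years` is a rough way to mimic colder, more protective conditions.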
The first problem was that 1980s cloning techniques relied on enzymes that would actually repair damaged DNA, and not always correctly, introducing errors into resultant sequences. The adoption of PCR sidestepped this problem by eliminating the need to clone DNA samples before sequencing, but traditional PCR only amplifies fragments that are at least 90 base pairs in length, longer than the highly degraded fragments found in specimens thousands of years old. As a result, PCR is far more likely to amplify contaminating modern DNA than genetic material originating in a fossil sample. It’s also extremely labor- and time-intensive to assemble an entire genome using traditional PCR, because the method only amplifies one specific stretch of DNA at a time.
Enter next-generation sequencing. The technology, which uses faster and simpler library preparation to ready DNA for massively parallel sequencing, came into wide use in the mid-2000s, allowing researchers to read all the DNA molecules in a given sample, not just a target sequence. “You recover all the genetic information that is in your library and in your extract,” says Matthias Meyer, a researcher at the Max Planck Institute for Evolutionary Anthropology who has pioneered recent methodological improvements to working with ancient DNA. “With the same amount of DNA extract, you can now get thousands or tens of thousands of times more information. This is really what has made ancient DNA sequencing on a larger scale possible.”
Next-gen sequencing is particularly useful for analyzing highly fragmented DNA, adds Eske Willerslev, a geneticist at the University of Copenhagen’s Natural History Museum of Denmark, because it can capture the sequences of exceedingly short stretches of nucleotides. “The ability to go down and take 30 or 35 base pairs makes a huge difference,” he says. “Those technological improvements that come with next-generation sequencing have made the biggest difference.”
Improvements in the processing of ancient genetic material prior to sequencing have also helped researchers in the quest to retrieve older and shorter fragments of DNA. In ancient bone, DNA is nestled among diverse organic and inorganic molecules, including collagen and a mineral form of calcium called hydroxyapatite, which must be dissolved away to extract sequenceable genetic material. “The challenge of DNA extraction is to purify the DNA,” Meyer says, “to wash out the substances that are interfering with your downstream analysis.” Meyer’s lab used an extraction buffer to address this issue, employing the chelating agent ethylenediaminetetraacetic acid (EDTA) to dissolve hydroxyapatite and the enzyme proteinase K to dissolve collagen.
Meyer’s lab also optimized a formula of binding buffers that contains isopropanol, which aids in the capture of very short DNA fragments, and the salt guanidine hydrochloride, to help attach DNA to special silica filters nestled inside centrifuge tubes. In a 2012 Science paper, Meyer led an international team of researchers that obtained a high-quality genome sequence from the finger bone of a Siberian Denisovan, an extinct relative of Neanderthals, by capturing, sequencing, and stitching together DNA fragments that were as small as 35 base pairs long.10 (That bone is estimated to be somewhere between 30,000 and 80,000 years old, but because it is just the tip of a finger, it does not contain enough carbon for dating, and its true age is still debated.)
In addition to the DNA extraction and isolation improvements, Meyer and his colleagues accomplished the unprecedented feat by using a homegrown single-stranded library preparation method to ready the DNA for sequencing. Instead of using double-stranded DNA to make genetic libraries that can be fed into a next-gen sequencer, as was standard practice at the time, Meyer and his team first separated the double helix, then prepared the sequence library using each of the single strands, doubling the number of fragments the group had to sequence. This also had the benefit of circumventing a purification step that leads to the loss of some genetic material in the sample, further increasing the number of precious ancient DNA fragments that could be recovered from such a small and degraded specimen. “One thing that we noticed when we generated the first high-quality DNA sequence is that not only did we get more DNA, but we got much shorter sequences, which had always been lost before during library preparation,” Meyer says. “This technique allowed us to achieve a level of resolution that has not been achievable before.”
But being able to recover tiny fragments of DNA doesn’t change the fact that any sample will contain genetic sequences from more than just the organism of interest. Even with the widespread adoption of handling procedures to minimize the risk of contamination, modern DNA fragments—from humans, plants, microbes, or other organisms—far outnumber the bits of ancient DNA in any centuries- or millennia-old biological specimen. To overcome this dilemma, researchers have turned to clever analytical methods that allow them to differentiate ancient and modern sequences based on characteristic patterns of molecular modification to which degrading DNA is subjected.
Chief among these telltale DNA alteration patterns is cytosine deamination, in which cytosine (C) bases are chemically converted to uracil (U), a base that normally occurs in RNA. Researchers have established a clear correlation between the postmortem age of a biological sample, the preservation conditions in which the sample was found, and cytosine deamination rates. These patterns of cytosine deamination can help ancient-DNA hunters distinguish modern human DNA from Neanderthal sequences, for example, which are otherwise genetically very similar.11 “[DNA] damage makes your life more difficult, but it also gives you a lot of power,” says Orlando.
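In sequencing data, a uracil is read as a thymine (T), so deamination shows up as an excess of C-to-T mismatches against a reference sequence, typically concentrated near the ends of fragments. The sketch below shows one simple way such a signal could be tallied. It is a toy illustration, not the method used in the studies cited here: the function name is hypothetical, the reads are invented, and it assumes each read has already been aligned to its reference without gaps and is written in the 5'-to-3' direction.

```python
from collections import Counter


def c_to_t_rate_by_position(pairs, max_pos=5):
    """Per-position C-to-T mismatch rate over the first max_pos bases of each read.

    pairs: iterable of (reference, read) strings of equal length, assumed to be
    ungapped alignments with the read oriented 5' to 3'.
    Returns a list of rates; None marks positions where no reference C was seen.
    """
    c_sites = Counter()  # positions where the reference base is C
    c_to_t = Counter()   # subset of those where the read shows T
    for ref, read in pairs:
        if len(ref) != len(read):
            raise ValueError("reference and read must have the same length")
        for pos, (ref_base, read_base) in enumerate(zip(ref.upper(), read.upper())):
            if pos >= max_pos:
                break
            if ref_base == "C":
                c_sites[pos] += 1
                if read_base == "T":
                    c_to_t[pos] += 1
    return [
        c_to_t[pos] / c_sites[pos] if c_sites[pos] else None
        for pos in range(max_pos)
    ]


# Invented example reads: two of the three reference Cs at position 0 read as T.
toy_pairs = [
    ("CACGT", "TACGT"),
    ("CCGTA", "TCGTA"),
    ("CAGCT", "CAGCT"),
    ("GCCAT", "GCCAT"),
]
rates = c_to_t_rate_by_position(toy_pairs, max_pos=3)
print(rates)
```

A set of reads whose mismatch rate is high at the first position and falls off quickly would be consistent with the damage pattern described above, whereas reads with a flat, low rate look more like undamaged modern DNA.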
The fruits of technology
A couple of years ago, as Meyer and his colleagues continued to tinker with ways to recover shorter fragments of DNA, they got their hands on some 400,000-year-old bear bones from the Sima de los Huesos (Pit of Bones) site in Atapuerca, Spain. Applying their new techniques to the samples, the team was able to recover DNA fragments, 95 percent of which were shorter than 50 base pairs, Meyer says. Then, encouraged by their success using the single-stranded library preparation method on the Denisovan fossil, the Meyer lab turned its efforts to another fossil from the same Spanish cave—that of a 400,000-year-old hominin femur—and succeeded in generating a high-coverage mitochondrial genome sequence.1
“It’s very exciting that you can look directly into the past with ancient DNA,” says Rasmus Nielsen, a University of California, Berkeley, computational biologist who studies population genetics using modern and ancient DNA. “We’ve gotten some big surprises from ancient DNA.”
One of the biggest of these surprises involves the spread of lactase persistence, the ability to metabolize the milk sugar lactose, through human populations in Europe. Lactase persistence in Europeans is strongly associated with well-described genetic polymorphisms that confer the production of the lactose-digesting enzyme lactase into adulthood. (Distinct polymorphisms conferring the trait became established independently in parts of Africa, an example of convergent evolution.) In the distant past, humans, like all other mammals today, only produced lactase as young feeding on their mothers’ milk; adults were lactose intolerant. But at some point in the species’s evolution the 13,910*T allele arose in certain European populations, likely conferring lactase persistence.
Genome analysis of SNP and microsatellite variation among modern European genomes led researchers to propose that the 13,910*T allele swept through European populations sometime between about 7,000 and 10,000 years ago. But in 2011, researchers in Hungary sequenced DNA collected from 23 ancient bone samples—European commoners and Asian conquerors who lived in the 10th and 11th centuries. The DNA sequences revealed that the 13,910*T allele was relatively rare among commoners and completely absent among invaders from Asia, where the modern human population still has a relatively low prevalence of the allele.12 Then, in 2014, another European research team reported that lactase-persistence alleles were completely absent from DNA they had pulled from the bones of Europeans inhabiting the Great Hungarian Plain between 5,700 BC and 800 BC.13 Those ancient DNA analyses suggest that the 13,910*T allele swept across Europe much more recently—probably between 3,000 and 4,000 years ago—than researchers had surmised by studying modern DNA and archaeology.
With the sequencing and analysis of ancient DNA, “you have the way here to really make the difference between the two evolutionary scenarios,” says Orlando.
Improved analysis of ancient DNA has also led to important revisions in models of early human migration. Last year, Willerslev’s group at the Natural History Museum of Denmark sequenced the genome of a 12,600-year-old ancient Native American Clovis boy who lived in what is now Montana. The sequence revealed that roughly 80 percent of all modern Native Americans are direct descendants of the boy’s family.14 It also confirmed that North America was first populated by individuals from Northeast Asia, not Western Europe, as proposed by one contrarian hypothesis. Other recent analyses of ancient DNA samples have led to revisions of hypotheses about the peopling of Europe, the Arctic, and Australia.
Going further back in hominin evolution, ancient DNA work has helped to uncover human ancestors that were completely unknown to science. That same finger bone from which Meyer’s group generated a high-quality Denisovan genome sequence was first used in 2010 to generate a draft genome sequence of the hominin, which had never before been described.15 “They discovered something that we think is a new species of humans that anthropology had overlooked for hundreds of years,” says Orlando. “You can see how those techniques have big surprises and they can really discover things that are unexpected.”
What’s next (gen)?
While the past few years have seen a profusion of new and interesting uses of ancient DNA spurred by the rapid improvement of research methods, scientists working in the field agree that the coming years hold even more in store. For one thing, sequencing technologies continue to develop at a breakneck pace, and, according to Willerslev, as third-generation technologies such as nanopore sequencing are applied to ancient-DNA work, researchers will be able to probe even deeper into the biological past. Using a combination of next-gen sequencing and a third-generation, single-molecule sequencer from Helicos Biosciences that can sequence DNA directly without the need for an amplification step, his group sequenced the 700,000-year-old horse DNA that still holds the record as the oldest genome yet sequenced.4 “This type of technology will be the future,” he says.
Researchers are also beginning to apply their newfound skills in dealing with ancient materials to branch out beyond simply sequencing genomes. In 2012, for example, Willerslev’s lab published an analysis of proteins, which are generally longer lived postmortem than genetic material, of 43,000-year-old woolly mammoth bones.16 And last year, Willerslev, Orlando, and colleagues published a genome-wide nucleosome map and survey of cytosine methylation levels in the DNA they pulled from the 4,000-year-old hair shafts of a Paleo-Eskimo, effectively launching the field of ancient epigenetics.17 Also last year, Pääbo’s group at the Max Planck Institute for Evolutionary Anthropology published the first full DNA methylation maps of the Neanderthal and Denisovan genomes.18 “For the first time we’ll be able to address what is the role of epigenomics and epigenetics in evolution,” Willerslev says.
But just how far back into biological history will ancient DNA researchers be able to reach? Most scientists feel that recovering sequenceable DNA isn’t likely in samples more than 1,000 millennia in age. “I would bet all my money that 1 million is the limit,” says Meyer. But some are confident that further improvements to DNA isolation and sequencing techniques could take us even further back. “It would not surprise me if we were able to sequence DNA older than 1 million years given appropriate environmental conditions,” Malhi says. Willerslev agrees, speculating that researchers may eventually be able to sequence DNA collected from samples dating to 2 million years ago. “I wouldn’t be surprised at all.”
ANCIENT-DNA TECHNOLOGY MOVES INTO THE CRIME LAB
Sitting locked in bone for millennia is just one way that DNA can become fragmented and degraded. Although not very old, genetic material in biological fluids or tissues left at crime scenes can be similarly damaged and sparse. As a result, some in the forensic science field are calling for the use of the latest paleogenomic techniques in the crime lab. “It makes sense that forensic scientists adopt the protocols that are most effective,” says University of Illinois molecular anthropologist Ripan Malhi, “because we’re dealing with the same types of issues, which are degraded DNA in low concentrations that is subject to contamination and damage.”
Two separate companies, Parabon Nanolabs and Identitas, now offer modern genomic techniques to forensic investigators looking to generate leads in cold cases or instances of missing persons. Both firms utilize microarray genotyping to pinpoint hundreds of thousands of single nucleotide polymorphisms (SNPs) that indicate phenotypic traits such as eye color, hair color, and freckling as well as geographic ancestry. “What our system does is really generate new information just from DNA,” says Parabon’s director of bioinformatics, Ellen Greytak. “It’s like if there were an eyewitness that was telling [investigators], ‘This is what that person looked like.’”
Such techniques are still rarely used in state, local, federal, or international crime labs, but they could eventually supplant the standard forensic genetic method—matching DNA to known samples using short tandem repeats (STRs) that serve as a genetic fingerprint to identify suspects. SNPs, because they can be pulled out of shorter and more degraded fragments of DNA, could prove extremely informative to criminal investigators. “I hope that... the standards in the forensics space will be informed by innovations in the academic labs,” says Cris Hughes, a University of Illinois forensic anthropologist who also works as deputy forensic anthropologist at the Champaign County Coroner’s Office.
Because forensic scientists around the world have already compiled STR databases that contain information about thousands of individuals, and because the work can be done relatively cheaply, the move to newer genomic technologies at working crime labs may happen very slowly. Laurence Rubin, Identitas CEO and a practicing rheumatologist at St. Michael’s Hospital in Toronto, says that his company is now shifting gears from demonstrating the utility of SNP-based microarrays for analyzing forensic samples to taking on more of an advocacy role, introducing law enforcement agencies to the power of these genomic technologies.
“Until there’s a willingness to adopt this on a larger scale, we and others in the field will be faced with doing exploratory efforts,” he says, such as a 2013 study of their proprietary chip that predicted eye color, hair color, and biparental ancestry—with anywhere from 48 percent to 94 percent accuracy—among more than 3,000 blinded DNA samples (Int J Legal Med, 127: 559-72, 2013). “The problem right now is that we need to advance the technology with the agencies and groups that can use it the most.”
1. M. Meyer et al., “A mitochondrial genome sequence of a hominin from Sima de los Huesos,” Nature, 505:403-06, 2014.
2. S. Castellano et al., “Patterns of coding variation in the complete exomes of three Neandertals,” PNAS, 111:6666-71, 2014.
3. Q. Fu et al., “Genome sequence of a 45,000-year-old modern human from western Siberia,” Nature, 514:445-49, 2014.
4. L. Orlando et al., “Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse,” Nature, 499:74-78, 2013.
5. R. Higuchi et al., “DNA sequences from the quagga, an extinct member of the horse family,” Nature, 312:282-84, 1984.
6. S. Pääbo, “Molecular cloning of Ancient Egyptian mummy DNA,” Nature, 314:644-45, 1985.
7. E.M. Golenberg et al., “Chloroplast DNA sequence from a Miocene Magnolia species,” Nature, 344:656-58, 1990.
8. S.R. Woodward et al., “DNA sequence from Cretaceous period bone fragments,” Science, 266:1229-32, 1994.
9. M.E. Allentoft et al., “The half-life of DNA in bone: Measuring decay kinetics in 158 dated fossils,” Proc R Soc B, 279:4724-33, 2012.
10. M. Meyer et al., “A high-coverage genome sequence from an archaic Denisovan individual,” Science, 338:222-26, 2012.
11. P. Skoglund et al., “Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal,” PNAS, 111:2229-34, 2014.
12. D. Nagy et al., “Comparison of lactase persistence polymorphism in ancient and present-day Hungarian populations,” Am J Phys Anthropol, 145:262-69, 2011.
13. C. Gamba et al., “Genome flux and stasis in a five millennium transect of European prehistory,” Nat Commun, 5:doi:10.1038/ncomms6257, 2014.
14. M. Rasmussen et al., “The genome of a Late Pleistocene human from a Clovis burial site in western Montana,” Nature, 506:225-29, 2014.
15. D. Reich et al., “Genetic history of an archaic hominin group from Denisova Cave in Siberia,” Nature, 468:1053-60, 2010.
16. E. Cappellini et al., “Proteomic analysis of a Pleistocene mammoth femur reveals more than one hundred ancient bone proteins,” J Proteome Res, 11:917-26, 2012.
17. J.S. Pedersen et al., “Genome-wide nucleosome map and cytosine methylation levels of an ancient human genome,” Genome Res, 24:454-66, 2014.
18. D. Gokhman et al., “Reconstructing the DNA methylation maps of the Neandertal and the Denisovan,” Science, 344:523-27, 2014.