How to Calculate Mutation Rate for Evolutionary Biology

Four ways to study mutation rate, a crucial statistic in studies of evolution

Jul 1, 2018
Amber Dance

Mutation: it’s the raw material for evolution. That makes knowing the rate at which it occurs crucial to the study of evolutionary biology.

Mutation rate figures into all kinds of calculations. For example, the “molecular clocks” that evolutionary biologists use to estimate when one species first diverged into two are based on species’ mutation rates. Scientists also use the rates to track how quickly viruses, such as influenza, evolve. And cancer biologists are interested in using mutation rates to estimate how quickly tumor cell genomes might change over time.

“It is a parameter that you have to input into every mutation-evolution model there is,” says Yuan Zhu, a postdoc at the Genome Institute of Singapore.

Scientists used to infer mutations from phenotypic changes, such as the development of drug resistance. Now, thanks to increasingly cost-effective and rapid DNA sequencing, more-sophisticated ways of getting a handle on whole-genome mutation rates have emerged. Among these techniques are methods that researchers can apply to just about any species. Though scientists have primarily analyzed microbes and viruses thus far, they’ve also tackled lab models such as Drosophila and Arabidopsis, and even humans. These techniques are revealing how the mutation rate varies across the genome of a single species, and they’re pinpointing regions that are especially prone to alteration. They’re also uncovering the error rates of different enzymes, such as polymerases and repair enzymes, in the DNA replication process.

Here, The Scientist profiles four different ways of studying mutation rates in viruses, yeasts, and humans.

Au naturel

Researchers can identify mutations in natural populations of organisms. Rafael Sanjuán, an evolutionary biologist at the Institute for Integrative Systems Biology in Valencia, Spain, does so with viruses. “They mutate a lot, so it’s easy to witness evolution in real time,” he says.

HIV, in particular, is known for its astronomical mutation rate, an estimated 3 × 10⁻⁵ errors per base, per infection cycle. However, that rate was determined with virus grown in the lab. Sanjuán instead investigated the HIV mutation rate in the wild, that is, in blood samples donated by 11 people before they underwent HIV treatment. Because the pathogen mutates constantly, a single untreated person contains a population of ever-diversifying viruses.

How could one time point give Sanjuán a rate of change? His trick is to look only for lethal mutations. These are found in viral particles that received a faulty copy of the HIV genome. They exist in blood cells, but are unable to infect other cells or replicate any further—they’re evolutionary dead ends. That means any lethal mutation had to happen in the generation immediately preceding the dead-end viruses.

Sanjuán and colleagues considered any nonsense mutation, which would insert a premature stop codon, to be a probable lethal event. Based on the HIV sequences they analyzed, the researchers figured such a mutation could have arisen at 732,350 sites in total, summed across all the sequences they examined. In their samples, they observed 3,069 likely lethal mutations. Then, it was just a simple fraction: 3,069 actual lethal mutations divided by 732,350 possible lethal mutations gave them a mutation rate of 4.1 × 10⁻³ mutations per base, per cell infection cycle (PLOS Biol, 13:e1002251, 2015).

Assuming nonlethal mutations arise at a similar rate, that corresponds to roughly one mutation per 250 bases every time the viral genome is copied, more than 100 times the in vitro rate and one of the highest known in biology.
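
The lethal-mutation estimate is a single division, and the follow-on comparisons are simple ratios. The short Python sketch below reruns them with the counts quoted above; the variable names are illustrative, and the 3 × 10⁻⁵ lab rate is the figure cited earlier in this section. The raw fraction comes out at about 4.2 × 10⁻³, close to the published 4.1 × 10⁻³.

```python
# Lethal-mutation estimate of the HIV mutation rate (counts from the article).
observed_lethal = 3_069     # likely lethal (nonsense) mutations observed
possible_lethal = 732_350   # sites, summed over all sequences, where one could arise

# Mutations per base, per cell infection cycle.
raw_rate = observed_lethal / possible_lethal

published_rate = 4.1e-3     # the published estimate quoted in the text
in_vitro_rate = 3e-5        # the lab-based estimate quoted in the text

bases_per_mutation = 1 / published_rate               # ~244, i.e. "one per 250 bases"
fold_over_in_vitro = published_rate / in_vitro_rate   # ~137-fold

print(f"raw rate: {raw_rate:.2e} per base per cycle")
print(f"about one mutation per {bases_per_mutation:.0f} bases")
print(f"about {fold_over_in_vitro:.0f}x the in vitro rate")
```

The denominator is what makes the trick work: it counts every site where a lethal change could have been observed across all the sequences analyzed, not just the sites in one viral genome.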

Pros
  • Analyzing “wild” populations, as with viruses in real hosts, gives better information about the real mutation rate, Sanjuán says.
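  • Sanjuán’s lethal-mutation approach, in particular, sidesteps the confounding effects of further replication and natural selection: because lethal mutants never copy themselves, each one observed must be a fresh event.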
  • Natural-population methods, such as the lethal-mutation trick, can work with any sample and any organism, so long as researchers make certain assumptions about sites where mutations are neutral and unaffected by natural selection. Researchers have done similar studies with humans, for example, by picking out unique mutations in a closely related group.

Cons
  • Scientists working with nonviral wild populations might find that natural selection can affect which mutations persist.
  • If the individuals in a sample are distantly related, it can be difficult to tell which mutations are new, notes Zhu.

The Power of Three

New mutations haven’t yet had time to fall subject to natural selection. One way to be sure mutations arose recently is to sequence groups of two parents and their offspring, known as trios. Any variant found in the offspring but in neither parent must have arisen in this generation. “It’s the most direct observation you can have,” says Shamil Sunyaev, a computational geneticist at Brigham and Women’s Hospital and Harvard Medical School in Boston.
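
At its core, the trio logic is a set difference. The sketch below is a minimal illustration, assuming each person's variants have already been called and stored as hypothetical (chromosome, position, alternate allele) tuples; real pipelines must also filter out sequencing errors and poorly covered sites, which this ignores.

```python
# A variant is represented as (chromosome, position, alternate allele).
Variant = tuple[str, int, str]


def de_novo_variants(child: set[Variant],
                     mother: set[Variant],
                     father: set[Variant]) -> set[Variant]:
    """Variants present in the child but absent from both parents."""
    return child - (mother | father)


# Hypothetical calls for one trio.
mother = {("chr1", 1_000, "A"), ("chr2", 5_000, "T")}
father = {("chr1", 1_000, "A"), ("chr3", 7_500, "G")}
child = {("chr1", 1_000, "A"), ("chr3", 7_500, "G"), ("chr5", 42_000, "C")}

print(de_novo_variants(child, mother, father))  # {('chr5', 42000, 'C')}
```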

To understand patterns of mutation across the human genome, Sunyaev and colleagues in the Netherlands examined sequences from 250 Dutch trios. They observed 11,020 new mutations in the children of these families.

The authors generated a map of mutation rates across the human genome. The map revealed that genes were more likely to mutate than noncoding DNA. That’s probably because CpG dinucleotides, sites where a guanine follows a cytosine, are more frequent in coding sequences. These sites are especially mutation-prone because their cytosines are often methylated, and methylated cytosines tend to undergo deamination, which turns them into thymines (Nat Genet, 47:822-26, 2015).
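
To make the CpG idea concrete, here is a small Python sketch that lists CpG sites in a sequence and checks whether a given substitution is the C-to-T change expected from deamination. It is illustrative only: the function names are hypothetical, and the code looks at sequence context alone, not at whether a particular cytosine is actually methylated.

```python
def cpg_sites(seq: str) -> list[int]:
    """0-based positions of the C in every CpG (a C followed by a G)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def is_cpg_transition(seq: str, pos: int, ref: str, alt: str) -> bool:
    """True if ref>alt at pos looks like deamination of a CpG cytosine.

    C>T at the C of a CpG is the direct case; G>A at the G of a CpG is the
    same event seen on the opposite strand, since CpG reads CpG on both strands.
    """
    seq = seq.upper()
    if seq[pos] != ref.upper():
        raise ValueError(f"reference base at {pos} is {seq[pos]}, not {ref}")
    ref, alt = ref.upper(), alt.upper()
    if (ref, alt) == ("C", "T"):
        return seq[pos + 1:pos + 2] == "G"
    if (ref, alt) == ("G", "A"):
        return pos > 0 and seq[pos - 1] == "C"
    return False


example = "ATCGGACGTT"
print(cpg_sites(example))                          # [2, 6]
print(is_cpg_transition(example, 2, "C", "T"))     # True
print(is_cpg_transition(example, 4, "G", "A"))     # False: preceded by G, not C
```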

Though the researchers’ goal was not to determine the overall human mutation rate, back-of-the-envelope calculations showed that their data matched the accepted number, about 1.2 × 10⁻⁸ mutations per nucleotide, per generation, says Sunyaev. While that mutation rate hardly measures up to that of HIV, “it’s a very large number,” he notes. That corresponds to about 70 de novo mutations in every baby, about one of which will occur in a protein-coding gene, he says.
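
The back-of-the-envelope step from a per-nucleotide rate to about 70 new mutations per child can be sketched as follows. The genome size and coding fraction are assumptions, not figures from the study: roughly 3 billion bases per haploid genome (about 6 billion in a diploid child) and roughly 1.5% of the genome coding for protein.

```python
rate_per_base = 1.2e-8   # mutations per nucleotide, per generation (from the text)
diploid_bases = 6.0e9    # ASSUMPTION: ~3e9 bases inherited from each parent
coding_fraction = 0.015  # ASSUMPTION: ~1.5% of the genome is protein-coding

de_novo_per_child = rate_per_base * diploid_bases
de_novo_in_coding = de_novo_per_child * coding_fraction

print(f"~{de_novo_per_child:.0f} de novo mutations per child")    # ~72
print(f"~{de_novo_in_coding:.1f} of them in protein-coding DNA")  # ~1.1
```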

Pros
  • You can confirm mutations are novel.
  • It works for any sexually reproducing species, if you can collect all members of the trio.

Cons
  • It misses embryonic lethal mutations.
  • With large organisms such as humans, Sunyaev notes, there are few ways to experimentally investigate and confirm the mechanisms behind the observed mutations.

In the Lab

A method that gives scientists more experimental control is a mutation accumulation experiment. Researchers grow organisms in the lab, generation by generation, and track the genetic changes that build up.

Zhu and colleagues used this technique with the yeast Saccharomyces cerevisiae when she was a graduate student in the Stanford University laboratory of Dmitri Petrov. Their collaborators at the University of Georgia had grown 145 strains of budding yeast for approximately 2,000 generations each. Every two days, or about 20 generations, they streaked out each culture and used a single colony, descended from a single cell, to start the next round. These repeated bottlenecks minimized the chance that the fittest yeast would take over the culture. “This is as close to neutral evolution as possible,” says Zhu.

At the end of the experiment, the researchers identified almost 1,000 spontaneous mutations, including 867 single-nucleotide swaps and 26 indels. They calculated a single-nucleotide mutation rate of 1.7 × 10⁻¹⁰ per base, per generation, and an indel rate of 5 × 10⁻¹² per base, per generation (PNAS, 111:E2310-18, 2014).
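
A mutation accumulation rate is the number of mutations divided by the total opportunity for them to occur: lines × generations × sites examined per line. The Python sketch below writes that out with a hypothetical helper, `accumulation_rate`. The article does not report how many sites per line the study could examine, so the sketch does not try to reproduce the published figure directly. It does show a useful consistency check: because the single-nucleotide and indel rates share the same denominator, the indel rate should equal the single-nucleotide rate scaled by 26/867, which gives about 5.1 × 10⁻¹².

```python
def accumulation_rate(n_mutations: int, n_lines: int,
                      generations: float, sites_per_line: float) -> float:
    """Mutations per site, per generation, pooled across all lines."""
    return n_mutations / (n_lines * generations * sites_per_line)


# Toy numbers, purely hypothetical: 10 mutations in 5 lines,
# each followed for 100 generations with 1 million sites examined.
print(accumulation_rate(10, 5, 100, 1_000_000))  # 2e-08

# Counts and single-nucleotide rate from the text. Both mutation classes
# share the same lines, generations, and sites, so only the counts differ.
snm_count, indel_count = 867, 26
snm_rate = 1.7e-10
implied_indel_rate = snm_rate * indel_count / snm_count
print(f"implied indel rate: {implied_indel_rate:.1e}")  # 5.1e-12
```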

Pros
  • Natural selection is minimal.
  • The lab environment gives researchers control.

Cons
  • Mutation accumulation is only feasible for organisms with short generation times, and those that breed well in the lab setting. 
  • It’s always possible that the mutation rate is different in other strains, or in the wild.

Errors & Repairs

Mutation accumulation experiments can also help scientists understand how mistakes in DNA replication occur, and how cells fix them. Those questions interest bioinformatician Scott Lujan and molecular geneticist Thomas Kunkel, both at the National Institute of Environmental Health Sciences in North Carolina. They also studied S. cerevisiae, following it over 900 generations, and calculated the same mutation rate for wild-type yeast that Zhu did: 1.7 × 10⁻¹⁰ per base pair, per generation.

To delve into how and why those mutations occur, they also analyzed a variety of strains with defects in the DNA polymerases that synthesize new strands, or in the DNA mismatch repair protein MSH2. By analyzing mutation patterns in these strains, they could discover what kinds of mistakes the polymerases are most likely to make, and what mistakes cells are best able to repair.

Lujan analyzed 40,000 mutations in the eight strains he studied. Among his discoveries, he found that different DNA polymerases work on each strand of a growing DNA molecule. When the two strands separate at a replication fork, replication proceeds differently on each strand because polymerases can only add nucleotides in the 5’ to 3’ direction. On one side, a polymerase can synthesize a single, continuous strand, called the leading strand, moving in the same direction as the fork. On the other side of the fork, the new strand has to grow away from the fork. Polymerases handle this by synthesizing short chunks of DNA, each in the 5’ to 3’ direction, and then linking those together. This is called the lagging strand.

Lujan could tell which of three yeast polymerases worked on different sections of DNA because each leaves a unique pattern of errors in its wake. For example, pol delta is prone to create A-A mismatches, and these were prevalent in new DNA synthesized as the lagging strand, indicating that pol delta makes the lagging strand.
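
The logic of reading polymerase fingerprints can be sketched as a simple tally: for one error signature, compare what share of the mutations on each strand it accounts for. The Python below assumes each mutation has already been assigned to the strand it arose on and to a mismatch type; those annotations, the `signature_share` helper, and the example data are all hypothetical, and this is not the published analysis.

```python
from collections import Counter


def signature_share(mutations: list[tuple[str, str]], signature: str) -> dict[str, float]:
    """Fraction of mutations on each strand that carry a given error signature.

    Each mutation is a (strand, mismatch_type) pair, e.g. ("lagging", "A-A").
    """
    totals = Counter(strand for strand, _ in mutations)
    hits = Counter(strand for strand, kind in mutations if kind == signature)
    return {strand: hits[strand] / totals[strand] for strand in totals}


# Hypothetical annotated mutations from a polymerase-mutant strain.
mutations = [
    ("lagging", "A-A"), ("lagging", "A-A"), ("lagging", "G-T"),
    ("leading", "G-T"), ("leading", "A-A"), ("leading", "C-C"), ("leading", "G-T"),
]
print(signature_share(mutations, "A-A"))  # lagging ~0.67, leading 0.25
```

A signature that is strongly enriched on one strand points to the polymerase that synthesizes that strand.
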

The mismatch repair system also handled different errors with different efficiency. For example, says Kunkel, it was pretty good at fixing a T mistakenly paired with a G, but less efficient at fixing rarer mismatches, such as a C paired with another C. The cell was better at repairing the more common errors (Genome Res, 24:1751-64, 2014). “That’s how organisms get very low mutation rates,” concludes Kunkel.
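
One common way to quantify how well mismatch repair handles each error type is to compare a repair-deficient strain, such as one lacking MSH2, with wild type. The sketch below assumes the repair-deficient rate approximates how often an error is made and the wild-type rate reflects what survives repair. The `repair_efficiency` helper and its rates are made up, chosen only to mirror the G-T versus C-C contrast Kunkel describes.

```python
def repair_efficiency(rate_repair_deficient: float, rate_wild_type: float) -> float:
    """Fraction of one error type removed by mismatch repair.

    ASSUMPTION: the repair-deficient rate approximates how often the error is
    made, and the wild-type rate is what remains after repair.
    """
    return 1 - rate_wild_type / rate_repair_deficient


# Hypothetical per-base rates (not from the study).
rates = {
    "G-T": (1e-7, 1e-9),    # common error, mostly repaired
    "C-C": (1e-9, 5e-10),   # rare error, repaired less well
}
for mismatch, (deficient, wild_type) in rates.items():
    print(mismatch, f"{repair_efficiency(deficient, wild_type):.0%} repaired")
```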

Pros
  • Mutations accumulate faster in strains that lack key DNA replication or repair genes, so researchers can collect many more mutations in fewer generations.

Cons
  • While this works well for microbes, it would be harder to avoid natural selection in animals such as mice. For example, any embryonic lethal mutation would be invisible to the researchers. As Kunkel puts it: “Dead cells tell no tales.”