Researchers Propose Automating the Naming of Novel Microbes

With modern technologies unearthing novel bacterial and archaeal species by the dozens, hundreds, or even thousands, manually naming them all is no longer practical, scientists say.

Jef Akst
Mar 1, 2021


When Mark Pallen and his colleagues began describing the chicken gut microbiome several years ago, they soon identified DNA sequences from undocumented species. In 2019, the team conducted its most comprehensive survey yet and found hundreds of seemingly novel microbes, some belonging to entirely new genera. “It became clear that much of what we were looking at then had never been named before, never been characterized,” says Pallen, a professor of microbial genomics at the Quadram Institute in the UK as well as the University of East Anglia and the University of Surrey. 

Often, researchers publishing on microbial discoveries will assign alphanumeric designations such as “s__JCVI-SCAAA005 sp000224765” to new bacterial and archaeal species. But trained as a medical microbiologist, Pallen says he values the traditional binomial nomenclature instituted by Carl Linnaeus in the 18th century that identifies an organism by its genus and species names. Giving microbes a name “makes them real,” he says. And so, he set out to name his team’s newly discovered chicken gut microbes.

Having studied Latin in school, Pallen says it wasn’t difficult to come up with new names. He simply looked up the ancient Greek and Latin translations for words such as “chicken,” “bird,” “gut,” and “microbe,” and began following the languages’ grammatical rules to recombine the words’ roots. In total, he came up with 160 new genus names and 41 species epithets, for a total of more than 600 new binomial species names. “I realized actually there was great power in this combinatorial approach, where you can just use different ways of saying the same thing,” taking advantage of the various synonyms for each word, says Pallen. “And I thought, this is probably of more general interest.”


Indeed, as researchers employ metagenomic approaches to investigate samples from all sorts of environments, they are revealing more of the millions upon millions of as-yet unknown—and unnamed—microbes on Earth. The problem is that most microbiologists are simply not trained in nomenclature. “Naming organisms in this whole sphere of microbiology seems to really get missed in education,” says Alison Murray, a microbial ecologist at the Desert Research Institute in Reno, Nevada. “It’s this little mysterious thing that happens when people isolate organisms, and then they have to drill down and figure out, well, what do you have to do to name it?”

It’s not just ancient grammar that researchers have to worry about. Scientists must follow the 9 principles, 65 rules, and 8 recommendations set forth by the International Code of Nomenclature of Prokaryotes. While Pallen’s language studies had prepared him for the task, he still had the chicken gut microbe names he’d come up with checked by the Hebrew University of Jerusalem’s Aharon Oren, one of only a handful of nomenclature reviewers at the International Journal of Systematic and Evolutionary Microbiology (IJSEM), where new microbe names are codified.

Pallen had created the names manually, moving around the Latin and Greek roots using spreadsheets and a word processor, but he realized that the approach could likely be automated. He called on his Quadram Institute colleague Andrea Telatin to write a Python script that could take input about the species to be named and output a proper Latin or Greek name along with information about its etymology. Pallen then set to work compiling the databases that could be fed into the program. “I spent a fair part of [last] summer just buried in Greek and Latin dictionaries and on various sites, just searching for multiple terms that meant similar things,” he says. 
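To make the idea of "roots in, name plus etymology out" concrete, here is a minimal Python sketch. It is not the team's script: the `Root` record, the `join_stems` helper, and the simplified vowel-elision rule are all assumptions made for illustration, and the roots are the ones the article itself cites for its horse-gut example.

```python
from dataclasses import dataclass

VOWELS = "aeiou"


@dataclass(frozen=True)
class Root:
    """Hypothetical record for one entry in a root table."""
    stem: str      # combining form used inside the name
    language: str  # "L." for Latin, "Gr." for Greek
    meaning: str   # English gloss


def join_stems(stems):
    """Concatenate stems, dropping a trailing vowel before a vowel-initial stem.

    This is a deliberately simplified stand-in for real Latin and Greek
    compounding rules.
    """
    name = ""
    for stem in stems:
        if name and name[-1] in VOWELS and stem[0] in VOWELS:
            name = name[:-1]
        name += stem
    return name.capitalize()


def name_with_etymology(roots):
    """Return the compound name and a short etymology string."""
    name = join_stems([r.stem for r in roots])
    parts = "; ".join(f"{r.language} {r.stem}, {r.meaning}" for r in roots)
    return name, f"{name}: {parts}"


roots = [
    Root("equi", "L.", "horse"),
    Root("intesti", "L.", "intestine"),
    Root("monas", "Gr.", "microbe"),
]
name, etymology = name_with_etymology(roots)
print(name)
print(etymology)
```

Keeping the language and gloss alongside each stem means the etymology can be assembled at the same moment as the name, which is the kind of detail a nomenclature reviewer would otherwise have to reconstruct by hand.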


Pallen looked up the roots of words for the general environments (say, an animal host) and the specific environments (such as the intestines) in which novel microbes were found, as well as for the microbes themselves (bacteria, coccus, etc.). He also gathered some useful prefixes such as neo- for “new” and crypto- for “hidden” or “secret” that could be added to existing genus and species names. Telatin finished the program, which the team dubbed the Great Automatic Nomenclator, and Pallen used it to generate entirely new genus names—for example, Equintestimonas, for a microbe (“monas”) that lives in the intestines (“intestine-”) of a horse (“equi-”)—plus prefix-tweaked species epithets, for a total of more than 1 million new binomial species names. In December, he and his colleagues presented the approach as a proof of principle. With the creation of more input tables, they say, the program could be applied to any batch of newly discovered microbe species. 
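The jump to very large numbers of names follows from simple multiplication: every host root can pair with every site root and every microbe root, and every genus can pair with every epithet. The sketch below illustrates that arithmetic with tiny made-up tables. The root lists, the example epithets, and the `join_stems` helper are illustrative assumptions, not the program's actual input tables or rules.

```python
from itertools import product

VOWELS = "aeiou"


def join_stems(stems):
    """Concatenate stems with a simplified vowel-elision rule (an assumption)."""
    name = ""
    for stem in stems:
        if name and name[-1] in VOWELS and stem[0] in VOWELS:
            name = name[:-1]
        name += stem
    return name.capitalize()


# Tiny illustrative tables; a real run would use far larger ones.
hosts = ["equi", "galli", "avi"]          # horse, chicken, bird
sites = ["intesti", "entero"]             # Latin and Greek roots for "intestine"
microbes = ["monas", "bacter", "coccus"]  # unit, rod, berry

# Every host x site x microbe combination yields a candidate genus name.
genera = sorted({join_stems(combo) for combo in product(hosts, sites, microbes)})

# Species epithets: existing epithets plus prefix-tweaked variants.
prefixes = ["neo", "crypto"]              # "new", "hidden"
epithets = ["faecalis", "intestinalis"]   # example epithets, chosen for illustration
all_epithets = epithets + [p + e for p, e in product(prefixes, epithets)]

# Every genus can be paired with every epithet to form a binomial.
binomials = [f"{genus} {epithet}" for genus, epithet in product(genera, all_epithets)]

print(len(genera), len(all_epithets), len(binomials))
```

Because the counts multiply, adding a handful of synonyms to any one table enlarges the whole name space, which is why a few modest tables can produce more than a million binomials.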

“I thought this was amazing,” says Murray, who notes that she and others have recently been discussing ways to use computers to help with the problem of naming new species. “This is the right team to have gone through the effort to think about how to actually do this. [It’s] really bold of them . . . to take a first whack at it.” She says she suspects that this publication will attract microbiologists’ attention and perhaps get others to start building on the idea of automating the naming of new species.

In addition to helping microbiologists quickly and easily create accurate and informative binomial names for new species they discover, the approach could help ease the burden on the limited number of IJSEM nomenclature reviewers. By ensuring that language rules are properly followed, and by providing details about the name’s etymology, the Great Automatic Nomenclator streamlines one step in the review process, says Michigan State University’s George Garrity, a nomenclature reviewer at IJSEM. “It provides a very useful resource that automates a process that would ordinarily take a fair amount of manual intervention.”

But there’s still room for improvement, Pallen says of the program; “this is version 1.0.” One major task going forward will be to extend the program so that it can generate species names using the full combinatorial approach, as it does for genus names, rather than simply adding prefixes to existing species names, he says. “Normally, creating species epithets is a lot more fiddly than creating genus names,” Pallen explains, and doing so “requires some linguistic skill.” He adds that he also hopes to make the software easier to use, with an interface that accepts simple inputs, rather than requiring users to work from the command line as the current version does.

Murray says that she hopes the attention that nomenclature is now getting could help unofficially work the practice into the consciousness of microbiologists. And if user-friendly software allowed researchers to input basic information about their organisms and see that information translated into a proper binomial species name, Murray muses, it may serve as “a way to entrain the next generation in this aspect of microbiology.”

This article is part of a two-part series on the challenges of species classification. 

Read about researchers who identified a novel species of comb jelly using only video footage here.