ANDRZEJ KRAUZE
We are surrounded by an invisible world of microorganisms—including many species of bacteria, archaea, and fungi—that play fundamental roles in natural processes, from cycling carbon in soil to fermenting food in the mammalian gut. But until recently, there hasn’t been a standardized way of documenting these ubiquitous little organisms, making it difficult to fully understand the extent of their functions on Earth.
In the summer of 2010, 26 leading experts in microbiology and bioinformatics congregated for a workshop in Snowbird, Utah, to discuss the challenges standing in the way of achieving this goal. The trouble, the group concluded, was that while laboratories across the world were rapidly advancing their knowledge of microbes using genetic sequencing, they were going about it in completely different ways, which made it difficult to compare one group’s data on microbial samples to another’s.
“People who studied particular systems tended to use different...
Knight was particularly interested in global ecological questions about how microbial communities are distributed, and what factors dictate their distributions. So, together with two colleagues from the workshop, Janet Jansson, chief scientist for biology and laboratory fellow at Pacific Northwest National Laboratory, and Jack Gilbert, a professor of surgery at the University of Chicago and director of the Microbiome Center, Knight founded the Earth Microbiome Project (EMP). Its goal was simple: “Get started with characterizing microbes on a scale that nobody had known before,” Knight says.
The trio put out a call to microbiologists around the world to send physical samples to one of their three laboratories, with the promise that the team would sequence the microbes harbored in them according to a standardized protocol and make the resulting genomic data publicly available. These included host-associated samples, such as those from primate guts or the skins of Komodo dragons; aquatic ones, collected from oceans and lakes; and sediment and soil samples, gathered everywhere from the ocean floor to the Alaskan permafrost.
Not everyone thought the project was doable, however. For example, Jonathan Eisen, a professor in evolutionary biology at the University of California, Davis, who also attended the 2010 workshop, told Nature in 2012, “Knight and Gilbert literally talk about sequencing the entire planet. It is ludicrous and not feasible—yet they are doing it.”
Knight had his own doubts, too: one initial concern was that even with the lure of free sequencing, researchers would want to hang on to their own samples. “But to my delight that turned out to not be true,” he says. More than 500 researchers sent in samples, from 43 countries across the world. The team soon had thousands of samples—all neatly packed into about 25 freezers across the three founders’ laboratories.
The researchers’ meta-analysis, published late last year in Nature with 300 coauthors, describes almost 28,000 samples from labs around the world (Nature, 551:457-63, 2017).
To catalog the microbes, the team developed a standardized protocol that involved probing for and sequencing the 16S ribosomal RNA (rRNA) gene, which serves as a unique barcode for species of bacteria and archaea. They also used a new method to remove sequencing errors from the data to ensure accuracy. In addition, the researchers built a framework that records where each sequence came from and which other sequences it was found with, making it easier to add new sequences to the database.
Using this protocol, the team detected a total of 307,572 unique 16S rRNA sequences from the microbial samples. For around 90 percent of these sequences, precise matches could not be identified in reference databases. For Jansson, the database opens up many possibilities: “If we get a sequence and we don’t know where it comes from, [we] could have a good probability of finding that it was a soil microbe or one that was associated with a host, or an aquatic microbe, just based on this sequence.”
The researchers also performed a meta-analysis in order to explore ecological principles in microbiology. For instance, they debunked the notion that microbial richness correlates positively with temperature—in fact, data from non-host-associated samples suggest that microbial richness peaks at a narrow and relatively cool temperature range, and then declines, depending on pH and the type of sample.
It’s not the only study to take advantage of the EMP’s resources. The EMP also undertook the DNA extraction, sequencing, and analysis of the samples involved in about 100 other individual studies. Those contributions have steadily expanded the database since the first entries were made public in 2011.
One global study of the microbes associated with sponges demonstrated that these microorganisms are major contributors to the microbial diversity of the world’s oceans (Nat Commun, 7:11870, 2016). A different study took a close look at the gut microbes of ant- and termite-eating placental mammals such as aardvarks and pangolins, and found that diet and phylogeny are both important factors in shaping the evolution of mammalian gut microbiota (Mol Ecol, doi:10.1111/mec.12501, 2013).
Some researchers went to great lengths to collect the samples that fueled these projects. Among them was Jansson herself, who happened to be working on a collaborative research project on microbial communities and oil when the Deepwater Horizon oil rig exploded in the Gulf of Mexico in 2010. Before the wellhead was closed, oil and gas company British Petroleum agreed to send out fleets to collect sediment samples from the 1,500-meter-deep seafloor to find out how microbes in ocean sediments were responding to the oil spill. Jansson was able to show in real time how microbes were helping digest large amounts of oil that would otherwise have reached the shoreline, and she identified new microbes that had genes for oil degradation (ISME J, 8:1464-75, 2014).
“We now have ideas about who they are, even though we’ve never cultivated them,” she tells The Scientist. “That was so exciting.” She enthusiastically added her samples to the database, where anyone can view them.
Thanks to efforts such as these, the total number of samples collated by the EMP has now reached 100,000, says Knight. He hopes the database will grow even further: the team has made its sequencing protocol publicly available, so that laboratories across the world will be able to contribute their own data directly. And Jansson notes that the scale of the database will allow many researchers to make predictions about what kinds of microorganisms to expect in different environments, or indeed the inverse: to link a microbe to its environment of origin based only on its 16S rRNA sequence.
Being able to make such links could have applications across many scientific disciplines, from microbiology to forensics. For example, in 2001, when at least 22 people contracted anthrax that had been mailed through the US postal service, the FBI was able to use genetic analysis to trace back the spores to the likely source, a single flask in a laboratory in Maryland. A similar approach could also be used to pinpoint the source of food- or water-borne microbial pathogens, or microbes found in specks of dirt at crime scenes.
It may be a while before such evidence routinely reaches the courtroom, says Randall Murch, a former Special Agent and senior executive with the FBI who led the creation of a department within the agency devoted to combining microbiological and forensic sciences to support bioterrorist and criminal investigations. But databases such as the EMP’s could significantly contribute to propelling the field forward, he says. “Anyone in this field understands that repositories—properly constructed repositories of microbes—are crucial.”