Sequencing the sea's microbial communities uncovers millions of new proteins and thousands of unknown protein families
By Melissa Lee Phillips | March 14, 2007
A sampling of genetic sequences from ocean microbial communities reveals millions of new proteins and thousands of new protein families, according to a report in this month's Public Library of Science Biology.
The analysis also suggests that continued sampling of microbial communities will reveal novel protein families "for some time to come," said study first author Shibu Yooseph of the J. Craig Venter Institute in Rockville, Md.
From 2003 to 2005, the Sorcerer II Global Ocean Sampling expedition collected seawater samples from a range of the world's oceans. Yooseph and his colleagues analyzed the genetic fragments in these samples using the metagenomics technique of shotgun sequencing. They assembled 7.7 million microbial sequences and predicted that these sequences code for 6.12 million proteins -- nearly twice the number of proteins in current databases, according to the authors.
"The actual number of proteins might not be surprising to some people," Yooseph told The Scientist, but "we were really surprised with the amount of diversity."
The authors found that many of the newly discovered proteins clustered into previously unknown protein families. They have evidence for at least 2,000 fairly large clusters of novel protein families, Yooseph said.
These findings suggest that additional analyses of samples from other environments -- such as soil or the deep sea -- "will continue to reveal a great deal of novelty in terms of proteins," Yooseph said.
"It's been known for some years that we still are in a linear phase in terms of protein family discovery," said Darren Natale of Georgetown University Medical Center in Washington, D.C., who was not involved in the study. But "it's nice that the dataset here is so large and that it still holds," he told The Scientist.
"Having this enormous additional number of sequences from very unfamiliar organisms will be immensely useful," said Cyrus Chothia of the MRC Laboratory of Molecular Biology in Cambridge, UK. "It will give us much greater information about how diverse families can be."
However, Chothia added that relying solely on sequence information to determine protein families may be misleading. Without structural or functional information about these proteins, it remains possible that some of the new proteins are simply relatives of known proteins whose sequences diverged greatly, Chothia said. "It may well be true that they found entirely new families, but it could be true that they are very distant relatives of known families," he told The Scientist.
Yooseph and his colleagues also found that several protein domains thought to be kingdom-specific are actually found in more than one kingdom of life. These findings suggest either that some lineages are more ancient than previously thought, or that these shared sequences have jumped kingdoms through lateral gene transfer, Yooseph said. "We will need to look at those on a case-by-case basis."
The expedition's samples also turned up more sequences of viral origin than suspected, Yooseph said, indicating that researchers are far from having fully explored the diversity of viruses. For example, the researchers found that at least two protein families -- UV repair enzymes and glutamine synthetase -- contain many new viral additions.
In accompanying papers in the same issue of PLoS Biology, researchers present analyses of two other aspects of Sorcerer II data. In the first paper, Douglas B. Rusch of the Venter Institute and colleagues analyze genome structure and evolution in the microbial samples and present new methods for measuring the genomic similarity between metagenomic samples. They also show various ways in which oceanic organisms differ based on their location or environmental pressures. In the other paper, Natarajan Kannan of the University of California, San Diego, and colleagues examine the protein kinase-like (PKL) superfamily and report that these proteins cluster into 20 major families that contain many family-specific features.
Links within this article
S. Yooseph et al., "The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families," PLoS Biology, March 2007.
J.M. Perkel, "The big picture in microbial genomics," The Scientist, July 1, 2006.
T.M. Powledge, "Shotgun sequencing comes of age," The Scientist, December 31, 2002.
J.M. Perkel, "Bacterial census of Texas air reveals microbial diversity," The Scientist, December 19, 2006.
D.B. Rusch et al., "The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific," PLoS Biology, March 2007.
N. Kannan et al., "Structural and Functional Diversity of the Microbial Kinome," PLoS Biology, March 2007.