All Systems Go

March 1, 2009

Some peculiar microorganisms are showing that systems biology can color in what's missing from models of biochemical and cellular networks.

By Elie Dolgin

On April 22, 2006, Nitin Baliga, a microbiologist at the Institute for Systems Biology (ISB) in Seattle, was spending a lazy Saturday afternoon at home when he noticed an enticing email in his inbox from his ISB collaborator Richard Bonneau. The subject line: "woooooohoooooo!"

Baliga's team had just constructed a new model that could predict the molecular-level responses of a free-living cell to genetic and environmental changes. That cell, however, was not Escherichia coli or yeast. It was the little-known archaeon Halobacterium salinarum, a tiny extremophile that thrives in highly saline lakes such as the Great Salt Lake and the Dead Sea.

The model was accurately predicting Halobacterium's dynamics at the genome scale. But could it predict new molecular-level responses to changes in environmental conditions not tested in the initial data used to construct the model? Yes, Bonneau had just found out, and he was so thrilled that he couldn't wait to share his findings—or finish his sentences.

1 Tackling such a multitude of unknowns all at once in any organism, especially one from a peculiar domain of life, was sure to be difficult, Baliga admits, but he wasn't deterred. "My goal was to deconstruct the whole bug," he says. "How I was going to do it in detail, I did not know."

Shortly thereafter, Baliga joined the then-fledgling ISB as a postdoc working with Leroy Hood, the so-called godfather of systems biology. He set to work developing the basic lab tools to achieve his then "pie in the sky" idea, as he calls it. The knock-out genetics needed improvement, no suitable protein expression system existed for Halobacterium, and the microbe's high salt environment denatured all the standard enzymes and antibodies.

Once he got everything up and running, Baliga continued to amass data, including microarray transcript profiles, gene knockout responses, and vast protein catalogs. "All of these together gelled the whole effort rapidly," Baliga says. Not only did the lab techniques get better, faster, and cheaper, but importantly, analyzing the preliminary results helped identify critical computational problems that needed solutions before genome-scale modeling would be feasible.

Baliga set up his own research group at the ISB in 2002, and together with Bonneau and ISB physicists-turned-computational biologists David Reiss and Vesteinn Thorsson he devised a three-pronged approach for transforming the tangles of data culled from more than 150 different experiments into a cohesive model that could reconstruct global gene networks. They first developed an algorithm called cMonkey that grouped functionally related genes in an environmental condition-dependent manner. This is important because patterns of co-expression often vary significantly across diverse environmental settings, which leads to genetic relationships that are valid under some, but not all, conditions. This first step was critical for later modeling success, says Bonneau, now at New York University. "If you screw this step up, you're hosed."
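To see why condition-dependent grouping matters, consider a toy calculation. The Python sketch below is not cMonkey, and its expression values are invented for illustration. It compares the correlation between two hypothetical genes across all conditions with the correlation inside each of two condition subsets.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical expression levels for two genes across eight conditions.
# Conditions 0-3 might vary one factor (say, oxygen) and
# conditions 4-7 another (say, light); both labels are assumptions.
gene_a = [1.0, 2.0, 3.0, 4.0, 2.0, 4.0, 1.0, 3.0]
gene_b = [2.1, 3.9, 6.2, 7.8, 5.0, 3.0, 4.0, 6.0]

r_all = pearson(gene_a, gene_b)             # every condition pooled
r_subset = pearson(gene_a[:4], gene_b[:4])  # first four conditions only
r_rest = pearson(gene_a[4:], gene_b[4:])    # remaining four conditions

print(f"all conditions: {r_all:.2f}")
print(f"conditions 0-3: {r_subset:.2f}")
print(f"conditions 4-7: {r_rest:.2f}")
```

In this made-up data the two genes track each other almost perfectly in the first four conditions and hardly at all in the rest, so pooling everything blurs a relationship that holds only in some settings.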

Next, the researchers integrated the predicted gene clusters with mRNA and protein time series data to create a dynamic gene network model with a temporal component. Inferelator—the tool Baliga's team developed, referred to as "inf" in Bonneau's ecstatic email—took the cMonkey gene clusters, incorporated real-time expression data, and created a suite of mathematical equations that could successfully predict many novel gene regulatory relationships. Lastly, to visualize and analyze the copious amounts of data, the researchers developed Gaggle, an open-source software system for integrating different bioinformatics tools and databases. "Then we just sat staring at [the model] for a year or two trying to figure out what the heck we had done and what it all meant," Baliga says.

In the meantime, Baliga's team had accumulated additional data from around 130 new experiments, including novel environmental perturbations, unique gene-environment combinations, and different time series measurements. Since these data were completely unlike what was used to create the algorithms, they provided the perfect test of the model's predictive power. Bonneau crunched the new data and it worked. The model's new predictions of Halobacterium's cellular turnover matched the actual experimental results with the same precision as in the data used to fit the model. And it could spit out accurate predictions of the transcriptional responses of more than 1,900 genes (around 80% of the genome).2
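The logic of that test, fitting on one set of experiments and scoring on experiments the model has never seen, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the single-regulator linear model and all the numbers are hypothetical stand-ins, not the team's actual model or data.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(xs, ys, slope, intercept):
    """Root-mean-square error of the fitted line on the given points."""
    errors = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return sqrt(sum(errors) / len(errors))


# Hypothetical data: regulator level (x) and target-gene expression (y).
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [1.0, 3.1, 4.9, 7.0]
# Held-out experiments, never used when fitting the model.
test_x, test_y = [4.0, 5.0, 6.0], [9.1, 10.9, 13.0]

slope, intercept = fit_line(train_x, train_y)
train_error = rmse(train_x, train_y, slope, intercept)
test_error = rmse(test_x, test_y, slope, intercept)

print(f"training error: {train_error:.3f}")
print(f"held-out error: {test_error:.3f}")
```

If the held-out error stays close to the training error, the model is generalizing rather than memorizing, which is the property the new experiments were meant to probe.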

"The techniques that you develop from a modeling standpoint in these simpler systems will extrapolate to more complex systems." -Jason McDermott

"Their network is predictive in addition to providing [gene] topology," says Tim Gardner, associate director of computational biology at Amyris Biotechnologies, a bioenergy company in Emeryville, Calif. "If you stimulate gene A, it says what will be the likely responses in the rest of the cell." The Halobacterium gene network is probably the most powerful predictive regulatory model to date, adds Michael Laub, a microbiologist at the Massachusetts Institute of Technology. "It really demonstrates where the field is going and the sorts of things that are going to be possible."

To Baliga, the reason the model can predict responses to so many novel stimuli is clear: Although there are near-infinite numbers of environmental factors, many of those different stimuli are linked. For example, radiation is connected to temperature, which in turn affects gas solubility, pressure, and salinity, to name a few. Thus, if the radiation changes, the bug's built-in wiring, which has been molded by billions of years of evolution, anticipates future environmental changes in linked factors and adjusts its gene expression accordingly. So even though Baliga never primed the model with salinity, for example, he could accurately predict the relevant salinity-related genes, as they are often the same ones that change with temperature or radiation. This interconnectedness "is the fundamental property of biological systems," Baliga says.

A fully predictive systems biology model for Shewanella is "not quite there yet," says Alexander Beliaev, a microbiologist at Pacific Northwest National Laboratory (PNNL). But the Shewanella researchers are closing in on one, although they're taking a different approach from Baliga's group. In addition to building a model focused on gene transcript levels (a so-called "gene regulatory network," for which the researchers have modeled more than 1,000 gene interactions), the group is pursuing a metabolic model that can predict growth and metabolism under various environmental conditions. Shewanella is ideal for such a model because it flourishes nearly everywhere.

Of Baliga's Halobacterium model, Beliaev says, "This is a good approach. But what you have is a regulatory network. To have a [fully] predictive model, you also need to have information about the metabolic network, because in the very end … the metabolic model is what's going to predict your cell's behavior." Currently, the Shewanella metabolic model contains 774 reactions, 634 metabolites, and 783 genes, and counting. That is some way behind the best metabolic model of E. coli, which has more than 2,000 reactions, 1,000 metabolites, and 1,200 genes,3 but ahead of models for other microbes such as Clostridium acetobutylicum. Eventually, Beliaev hopes that by understanding metabolic responses to the environment, his team will be able to use Shewanella in toxic metal bioremediation and in biofuel production.

Himadri Pakrasi and his colleagues are also hot on the heels of genome-scale models in Cyanothece. At last count, their metabolic model was on par with Shewanella's, with 719 reactions, 749 metabolites, and 574 genes. "We're still definitely in the mid-stages of the [metabolic] model," says Jennie Reed, a bioengineer at the University of Wisconsin-Madison, who works on the project. Meanwhile, the gene regulatory model is progressing quite rapidly. Last year, Pakrasi's team published Cyanothece's genome and transcriptome, and now they're working on the proteome.

"To suddenly have an order of magnitude increase in data acquisition is a challenge. But it's a good challenge." -Dick Smith

The initial proteomic results have important implications for all biological research, says Dick Smith, a chief scientist at PNNL who collaborates with Pakrasi. "In going from the transcriptome to the proteome there's not a one-to-one correlation," Smith says. Rather, there's only about a 50% overlap. "Where we find concordance [between the proteome and transcriptome], our models that are based on time series data work very well," adds Pakrasi. "But 50% of the time they're not working simply because we're not getting into more detailed [proteome] analysis."

Smith is starting to make more detailed analyses possible by vastly improving the mass spectrometry techniques used for high-throughput proteomics. This has created the same obstacle encountered by Baliga's group before him, though—namely, a glut of data to sort out. "To suddenly have an order of magnitude increase in data acquisition is a challenge," he says. "But it's a good challenge."

In response, McDermott developed a bioinformatics tool to graphically explore gene networks in Cyanothece, whose gene activity waxes and wanes in a predictable fashion throughout the day. Yet that's not always intuitively obvious when you stare at the heat maps or time plots that researchers have traditionally relied upon, McDermott says. So when he first created his illustration, he figured Christmas had come early: Not only was the software working, but through his graphical representation, which looked like a circular wreath, he could finally make sense of the immense Cyanothece dataset. "It portrays that temporal and cyclical nature of this data in a more intuitive fashion," he says.

Jason McDermott's 'wreath' of gene activation in Cyanothece

The image depicted a ring of interconnected genes, all mapped neatly onto time. If you look at the top of the ring, McDermott explains, you can find all the genes that are active at dawn. And at the bottom sit all the genes that peak at mid-day. "In this way the wreath mimics a clock," he says (see figure).

Currently, the Cyanothece model can predict some aspects of the regulatory network, such as which genes are essential for survival, but the scientists can't yet model time-dependence or responses to novel stimuli. "The models that we've generated are predictive in a certain way," McDermott says, "but we'd need a few more datasets to get a good predictive model." At the moment, they only have overlapping transcriptomic, proteomic, and metabolomic data from the same sample for one environmental condition: the standard 12-hour light, 12-hour dark cycle.

Still, considering that Cyanothece's genome and first-pass transcriptome were only published last year, Pakrasi suspects that more predictive models won't be far off. "What we've been doing for the last three to four years is we were generating just the baseline data compared to other well-studied organisms," he says. "We've made significant progress… I hope 2009 is the year for a [fully predictive] model."

References

1. W.V. Ng et al., "Genome sequence of Halobacterium species NRC-1," Proc Natl Acad Sci, 97:12176–81, 2000.
2. R. Bonneau et al., "A predictive model for transcriptional control of physiology in a free living cell," Cell, 131:1354–65, 2007.
3. A.M. Feist et al., "A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information," Mol Sys Biol, 3:121, 2007.


Joan Slonczewski

March 3, 2009

This article about systems biology of haloarchaea is very interesting, but it incorrectly states that E. coli lives in a uniform environment. In fact, E. coli navigates extremes of pH, oxygen, nutrient levels, and in some cases salinity. The global stress response studies of E. coli pioneered by Fred Neidhardt and colleagues laid the groundwork for studies such as those reported here.

Ellen Hunt

March 3, 2009

E. coli doesn't navigate the extremes of halobacteria, but it definitely changes expression.

I would be more curious about the system developed because I worked with AI and pseudo-AI approaches to other problems at one time. This sounds like some type of training dataset into an "inference engine". But how was that inference system made? A problem with such systems is that they can present you with results without being able to enlighten you as to what exactly is going on. In other words, predictive is not necessarily intelligible.

Nitin Baliga

March 3, 2009

I agree with the comment that E. coli (for that matter any organism) is capable of adapting to environmental changes. I also do not subscribe to the idea that there are housekeeping genes that are constitutively expressed and not regulated. There is ample evidence that every gene in any organism is under some type of regulation; for example, in our study of Halobacterium over 80% of genes were included in the model, suggesting they were differentially regulated in some or all of the environments we tested.

With regard to the nature of the model: we have shown that disregarding the time component can reduce the predictive power of the Inferelator model, suggesting that regulatory influences in the inferred network are causal. Furthermore, the model recapitulates and extends a lot of known biology and has also provided experimentally testable hypotheses that have led to novel biological insights. These issues have been addressed very carefully in the Cell paper as we think the model should eventually represent true operational relationships within a cell.

Matthew Grossman

March 3, 2009

Good article but the comments regarding E. coli, other than it is very very well studied, are sheer nonsense.

E. coli is one of the most metabolically versatile microbes and certainly experiences a vast range of environmental changes in its daily life.

Vinay Rale

March 30, 2009

Comparison to E. coli is to have a very well known benchmark. We know much more about E. coli.
For that matter every bacterium is versatile, till we find out its versatility in different niches and its own. Thanks.

Vinay Rale
