All Systems Go
Some peculiar microorganisms are showing systems biology can color in what's missing from models of biochemical and cellular networks.
By Elie Dolgin
On April 22, 2006, Nitin Baliga, a microbiologist at the Institute for Systems Biology in Seattle, was spending a lazy Saturday afternoon at home, when he noticed an enticing email in his inbox from his ISB collaborator Richard Bonneau. The subject line: "woooooohoooooo!"
Baliga's team had just constructed a new model that could predict the molecular-level responses of a free-living cell to genetic and environmental changes. That cell, however, was not Escherichia coli or yeast. It was the little-known archaeon Halobacterium salinarum, a tiny extremophile that thrives in highly saline lakes such as the Great Salt Lake and the Dead Sea.
The model was accurately predicting Halobacterium's dynamics at the genome scale. But could it predict new molecular-level responses to changes in environmental conditions...
1 Tackling such a multitude of unknowns all at once in any organism, especially one from a peculiar domain of life, was sure to be difficult, Baliga admits, but he wasn't deterred. "My goal was to deconstruct the whole bug," he says. "How I was going to do it in detail, I did not know."
Shortly thereafter, Baliga joined the then-fledgling ISB as a postdoc working with Leroy Hood, the so-called godfather of systems biology. He set to work developing the basic lab tools to achieve his then "pie in the sky" idea, as he calls it. The knock-out genetics needed improvement, no suitable protein expression system existed for Halobacterium, and the microbe's high salt environment denatured all the standard enzymes and antibodies.
Once he got everything up and running, Baliga continued to amass data, including microarray transcript profiles, gene knockout responses, and vast protein catalogs. "All of these together gelled the whole effort rapidly," Baliga says. Not only did the lab techniques get better, faster, and cheaper, but importantly, analyzing the preliminary results helped identify critical computational problems that needed solutions before genome-scale modeling would be feasible.
Baliga set up his own research group at the ISB in 2002, and together with Bonneau and ISB physicists-turned-computational biologists David Reiss and Vesteinn Thorsson he devised a three-pronged approach for transforming the tangles of data culled from more than 150 different experiments into a cohesive model that could reconstruct global gene networks. They first developed an algorithm called cMonkey that grouped functionally related genes in an environmental condition-dependent manner. This is important because patterns of co-expression often vary significantly across diverse environmental settings, which leads to genetic relationships that are valid under some, but not all, conditions. This first step was critical for later modeling success, says Bonneau, now at New York University. "If you screw this step up, you're hosed."
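The point about condition-dependent co-expression can be illustrated with a toy calculation. The sketch below is not cMonkey and does not use the team's data; the two gene profiles are hypothetical numbers chosen so that the genes track each other in one set of conditions and move in opposite directions in another.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical expression levels for two genes across eight experiments.
# The first four experiments stand for condition set A, the last four for set B.
gene_a = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
gene_b = [1.1, 2.1, 2.9, 4.2, 4.0, 3.1, 1.9, 1.0]

r_all = pearson(gene_a, gene_b)            # all experiments pooled
r_set_a = pearson(gene_a[:4], gene_b[:4])  # condition set A only
r_set_b = pearson(gene_a[4:], gene_b[4:])  # condition set B only

print(f"pooled: {r_all:.2f}, set A: {r_set_a:.2f}, set B: {r_set_b:.2f}")
```

Pooled across every experiment, the two genes look unrelated, yet within each condition set the relationship is strong. A grouping method that only looks at pooled data would miss it, which is why a condition-aware first step matters.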
Next, the researchers integrated the predicted gene clusters with mRNA and protein time series data to create a dynamic gene network model with a temporal component. Inferelator—the tool Baliga's team developed, referred to as "inf" in Bonneau's ecstatic email—took the cMonkey gene clusters, incorporated real-time expression data, and created a suite of mathematical equations that could successfully predict many novel gene regulatory relationships. Lastly, to visualize and analyze the copious amounts of data, the researchers developed Gaggle, an open-source software system for integrating different bioinformatics tools and databases. "Then we just sat staring at [the model] for a year or two trying to figure out what the heck we had done and what it all meant," Baliga says.
In the meantime, Baliga's team had accumulated additional data from around 130 new experiments, including novel environmental perturbations, unique gene-environment combinations, and different time series measurements. Since these data were completely unlike what was used to create the algorithms, they provided the perfect test of the model's predictive power. Bonneau crunched the new data and it worked. The model's new predictions of Halobacterium's cellular turnover matched the actual experimental results with the same precision as in the data used to fit the model. And it could spit out accurate predictions of the transcriptional responses of more than 1,900 genes (around 80% of the genome).2
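The logic of that test, fitting on one set of experiments and scoring on another, can be sketched in a few lines. This is a generic illustration and not the Inferelator; the one-parameter model and all the numbers are made up for the example.

```python
def fit_slope(xs, ys):
    """Least-squares slope for a line through the origin, y = slope * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def rmse(xs, ys, slope):
    """Root-mean-square error of the fitted line on the given points."""
    squared = [(y - slope * x) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5

# Hypothetical "old" experiments used to fit the model.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.1, 3.9, 6.2, 7.8]

# Hypothetical "new" experiments the model never saw during fitting.
test_x = [5.0, 6.0, 7.0]
test_y = [10.1, 11.8, 14.2]

slope = fit_slope(train_x, train_y)
train_error = rmse(train_x, train_y, slope)
test_error = rmse(test_x, test_y, slope)

print(f"train RMSE: {train_error:.3f}, held-out RMSE: {test_error:.3f}")
```

When the held-out error is close to the training error, as it is in this toy case, the model is generalizing and not just memorizing the data it was fitted to.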
"Their network is predictive in addition to providing [gene] topology," says Tim Gardner, associate director of computational biology at Amyris Biotechnologies, a bioenergy company in Emeryville, Calif. "If you stimulate gene A, it says what will be the likely responses in the rest of the cell." The Halobacterium gene network is probably the most powerful predictive regulatory model to date, adds Michael Laub, a microbiologist at the Massachusetts Institute of Technology. "It really demonstrates where the field is going and the sorts of things that are going to be possible."
To Baliga, the reason the model can predict responses to so many novel stimuli is clear: Although there are near-infinite numbers of environmental factors, many of those different stimuli are linked. For example, radiation is connected to temperature, which in turn affects gas solubility, pressure, and salinity, to name a few. Thus, if the radiation changes, the bug's built-in wiring, which has been molded by billions of years of evolution, anticipates future environmental changes in linked factors and adjusts its gene expression accordingly. So even though Baliga never primed the model with salinity, for example, he could accurately predict the relevant salinity-related genes as they are often the same ones that change with temperature or radiation. This interconnectedness "is the fundamental property of biological systems," Baliga says.
A fully predictive systems biology model for Shewanella is "not quite there yet," says PNNL microbiologist Alexander Beliaev. But the Shewanella researchers are narrowing in on one, although they're taking a different approach than Baliga's group. In addition to tackling a model focused on gene transcript levels—a so-called "gene regulatory network," for which the researchers have modeled more than 1,000 gene interactions—the group is pursuing a metabolic model that can predict growth and metabolism under various environmental conditions. Shewanella is ideal for such a model because it flourishes nearly everywhere.
Of Baliga's Halobacterium model, Beliaev says, "This is a good approach. But what you have is a regulatory network. To have a [fully] predictive model, you also need to have information about the metabolic network, because in the very end … the metabolic model is what's going to predict your cell's behavior." Currently, the Shewanella metabolic model contains 774 reactions, 634 metabolites, 783 genes, and counting. This is a ways behind E. coli's best metabolic model with its more than 2,000 reactions, 1,000 metabolites, and 1,200 genes,3 but ahead of other microbes such as Clostridium acetobutylicum. Eventually, Beliaev hopes that by understanding metabolic responses to the environment, his team will be able to use Shewanella in toxic metal bioremediation and in biofuel production.
Pakrasi and his colleagues are also hot on the heels of genome-scale models in Cyanothece. At the last count, their metabolic model was on par with Shewanella's, with 719 reactions, 749 metabolites, and 574 genes. "We're still definitely in the mid-stages of the [metabolic] model," says Jennie Reed, a bioengineer at the University of Wisconsin-Madison, who works on the project. Meanwhile, the gene regulatory model is progressing quite rapidly. Last year, Pakrasi's team published Cyanothece's genome and transcriptome, and now they're working on the proteome.
The initial proteomic results have important implications for all biological research, says Dick Smith, a chief scientist at the PNNL who collaborates with Pakrasi. "In going from the transcriptome to the proteome there's not a one-to-one correlation," Smith says. Rather, there's only about a 50% overlap. "Where we find concordance [between the proteome and transcriptome], our models that are based on time series data work very well," adds Pakrasi. "But 50% of the time they're not working simply because we're not getting into more detailed [proteome] analysis."
Smith is starting to make more detailed analyses possible by vastly improving the mass spectrometry techniques used for high-throughput proteomics. This has created the same obstacle encountered by Baliga's group before him, though—namely, a glut of data to sort out. "To suddenly have an order of magnitude increase in data acquisition is a challenge," he says. "But it's a good challenge."
In response, McDermott developed a bioinformatics tool to graphically explore gene networks in Cyanothece, whose gene activity waxes and wanes in a predictable fashion throughout the day. Yet that's not always intuitively obvious when you stare at the heat maps or time plots that researchers have traditionally relied upon, McDermott says. So when he first created his illustration, he figured Christmas had come early: Not only was the software working, but through his graphical representation, which looked like a circular wreath, he could finally make sense of the immense Cyanothece dataset. "It portrays that temporal and cyclical nature of this data in a more intuitive fashion," he says.
The image depicted a ring of interconnected genes, all mapped neatly onto time. If you look at the top of the ring, McDermott explains, you can find all the genes that are active at dawn. And at the bottom sit all the genes that peak at mid-day. "In this way the wreath mimics a clock," he says (see figure to the left).
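The idea of placing genes around a ring by the time they peak can be sketched as a simple angle calculation. This is only an illustration of the clock-style layout, not McDermott's software: the function name, the default period, and the clockwise direction are assumptions, and how the real figure scales its time axis is not described here.

```python
import math

def wreath_position(peak_time, period=24.0):
    """Return (x, y) on a unit circle for a gene peaking at peak_time.

    Time zero sits at the top of the ring and time advances clockwise,
    like the hands of a clock. The period is an assumed parameter.
    """
    angle = 2 * math.pi * (peak_time % period) / period
    return (math.sin(angle), math.cos(angle))

# Hypothetical peak times, in hours after the start of the cycle.
for gene, peak in [("gene_1", 0.0), ("gene_2", 6.0), ("gene_3", 12.0)]:
    x, y = wreath_position(peak)
    print(f"{gene}: x={x:+.2f}, y={y:+.2f}")
```

Genes that peak at the start of the cycle land at the top of the ring, and genes that peak half a cycle later land at the bottom, so neighbors on the ring are neighbors in time.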
Currently, the Cyanothece model can predict some aspects of the regulatory network, such as which genes are essential for survival, but the scientists can't yet model time-dependence or responses to novel stimuli. "The models that we've generated are predictive in a certain way," McDermott says, "but we'd need a few more datasets to get a good predictive model." At the moment, they only have overlapping transcriptomic, proteomic, and metabolomic data from the same sample for one environmental condition: the standard 12-hour light, 12-hour dark cycle.
Still, considering that Cyanothece's genome and first-pass transcriptome were only published last year, Pakrasi suspects that more predictive models won't be far off. "What we've been doing for the last three to four years is we were generating just the baseline data compared to other well-studied organisms," he says. "We've made significant progress… I hope 2009 is the year for a [fully predictive] model."