Even the simplest biochemical pathway - ligand, receptor, intracellular messenger, output - is far more complex than a back-of-the-envelope sketch suggests. Signaling molecules have multiple kinase and phosphatase regulators, and gene promoters can be influenced simultaneously by both positive and negative transcription factors. Ligands, of course, are themselves subject to myriad control systems.
In short, biological systems are not binary entities in which something is simply on or off; they are stochastic and behave nonlinearly.
Fortunately, a range of tools exists to help you build computational models, from the simple to the complex. Here, experts helped us break down the four main classes of computational models, showing how they've made use of the data available.
1. Network Mapping
The simplest type of model is the computational equivalent of that back-of-the-envelope sketch: a network map. Trey Ideker, at the University of California, San Diego, leads one group of researchers that is developing software to create these maps.
Cytoscape (www.cytoscape.org) maps pathway components and the connections between them, producing the cellular equivalent of an electrician's wiring diagram. Such diagrams are useful, but like a circuit diagram, they are devoid of dynamic information, such as when and where each interaction occurs.
Genome-scale approaches to identify protein-protein, protein-nucleic acid, and protein-metabolite interactions provide the data for constructing network maps, and a surprising amount already exists. Resources like the Database of Interacting Proteins (http://dip.doe-mbi.ucla.edu) and KEGG (www.genome.jp/kegg) provide such data, which can be input directly into Cytoscape to build maps.
From there, the challenge is to get a running simulation, that is, a computational model. "A circuit diagram is static," says Ideker, "but what people then do is they simulate dynamic voltages across that circuit." Similarly, researchers can use network maps as the foundations for building computational models that can test how cellular events propagate.
Though conceptually simple, Ideker says, network mapping should not be overlooked: Lack of knowledge about pathway intermediates can invalidate, and in many cases preclude, a simulation.
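To make the wiring-diagram idea concrete, here is a minimal sketch of a network map as a directed graph, with a search that lists everything downstream of a given component. The component names are invented for illustration; they do not come from Cytoscape, DIP, or KEGG, and this is not how Cytoscape stores its maps.

```python
from collections import deque

# Hypothetical pathway: each key acts on the components listed as its value.
network = {
    "ligand": ["receptor"],
    "receptor": ["kinase_A", "kinase_B"],
    "kinase_A": ["transcription_factor"],
    "kinase_B": ["transcription_factor"],
    "transcription_factor": ["output_gene"],
    "output_gene": [],
}

def downstream(graph, start):
    """Return every component reachable from `start` (breadth-first search)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Which components could a change at the receptor reach?
print(sorted(downstream(network, "receptor")))
```

Like the map itself, this says only what is connected to what. It carries no information about timing or strength, and a missing intermediate in the dictionary silently cuts off everything beyond it.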
2. Correlation Models
Though they often are considered distinct, most biological pathways talk to each other. Take, for instance, tumor necrosis factor (TNF) and epidermal growth factor (EGF) signaling. Peter Sorger at Harvard Medical School and colleagues recently published a pair of papers on these two pathways, the first of which used correlation modeling to identify those intracellular signals that most tightly correlated with TNF signaling.
A correlation model renders a pathway at a relatively coarse level, without necessarily considering each element explicitly. As a result, it can be built with little a priori knowledge of the pathway components.
Suppose, for instance, that you know the ligand levels, the phosphorylation state of some intracellular molecule, and the resulting phenotype. "If your model has gaps like that you can use correlation modeling ... you can have black boxes between the stages," says Hamid Bolouri at the Institute for Systems Biology in Seattle.
Correlation models use data types similar to those for kinetic models. But, says Sorger, the difference lies in the density of the data: "Kinetic modeling requires a greater density of data." Correlation modeling can also more easily accommodate heterogeneous data such as biochemical levels, activities, and phenotypes, he adds. "Kinetic modeling, being more local, cannot easily predict phenotype." The tradeoff, says Bolouri, is that while correlation models "can make good predictions from incomplete data ... the model is not mechanistic."
Nevertheless, correlation models can reveal unexpected insights. "EGF receptor activation was as tightly correlated with TNF as it was with EGF treatment," says Sorger. "That seems nuts, until you realize that as soon as you add TNF, the cell releases an EGF-like ligand in the first of a four-step autocrine cascade."
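As a rough illustration of the black-box idea, the sketch below ranks measured signals by how tightly they track a treatment, using a plain Pearson correlation. This is a stand-in chosen for simplicity: the article does not say which statistical method Sorger's group used, and every number and signal name here is invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented data: one ligand dose series and two hypothetical readouts.
ligand_dose = [0.0, 1.0, 2.0, 4.0, 8.0]
signals = {
    "signal_A": [0.1, 0.9, 2.1, 3.9, 8.2],  # rises with dose
    "signal_B": [5.0, 4.8, 5.1, 4.9, 5.0],  # roughly flat
}

# Rank readouts by the strength of their correlation with the treatment.
ranked = sorted(
    signals,
    key=lambda name: abs(pearson(ligand_dose, signals[name])),
    reverse=True,
)
print(ranked)
```

Nothing in the ranking explains why a signal tracks the treatment, which is the tradeoff Bolouri describes: the prediction can be good while the model remains non-mechanistic.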
3. Logical Models
Logical models describe biological processes as a series of simple Boolean logic gates: If X and not Y then Z, and so on. Reka Albert at Pennsylvania State University and her colleagues used this approach to model the behavior of plant stomata - the pores on plant leaves that allow carbon dioxide in and oxygen out.
"We have about 100 components in our network," she says, "and about half of them are unknown." Given that lack of knowledge, and the dearth of information describing how these components interact, Albert's group developed a logical model to simulate the pathway.
"Because there is almost no quantitative or kinetic information available for the components and interactions, we assumed two states for each component," she explains - on or off, or active or inactive. One of the components of the model, for instance, was "actin cytoskeleton reorganization," whose Boolean rule was "cytosolic calcium OR NOT RAC1."
Though intuitive and relatively simple to construct (assuming you have some mechanistic knowledge of your pathway), such models have several drawbacks. Biological components typically do not follow Boolean behavior. In addition, with no kinetic data as input, the models can identify which components are important but not the concentrations at which they act.
Still, logical models can yield benefits. Albert's team was able to use its model of stomata behavior to predict accurately the effect of intracellular pH on the process. "We tested that experimentally and found good qualitative agreement."
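The Boolean rule quoted above for actin cytoskeleton reorganization, "cytosolic calcium OR NOT RAC1," can be written directly as a function. This sketch encodes only that one rule; the function and argument names are our own, and the rest of the roughly 100-component network is not represented.

```python
def actin_reorganization(cytosolic_calcium: bool, rac1: bool) -> bool:
    """Boolean rule from the text: cytosolic calcium OR NOT RAC1."""
    return cytosolic_calcium or not rac1

# Print the full truth table for the two inputs.
for calcium in (False, True):
    for rac1 in (False, True):
        state = actin_reorganization(calcium, rac1)
        print(f"calcium={calcium!s:5} RAC1={rac1!s:5} -> reorganization={state}")
```

The rule is off in only one of the four cases: calcium absent and RAC1 active. Note what the table cannot say, namely how much calcium counts as "on."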
4. Kinetic Modeling
Kinetic models seek to describe the temporal and spatial behavior of every component in the system individually, using differential equations. Sorger says that they represent the pinnacle of complexity in computational modeling. "With kinetics, you require the most knowledge." John Tyson at Virginia Polytechnic Institute and State University uses kinetic modeling in his work on cell-cycle oscillation.
Data that can be factored into the model include: mutant phenotypes; how proteins interact, are activated, and are degraded; and how fast things change. In the case of the cell-cycle oscillator, the literature is awash in data, Tyson says, but consolidating that information into a simple set of rules was out of the question. "It's impossible to figure out intuitively how all these components will work together," he says. "The network is too complicated, with too many contradictory signals."
Untangling these signals can provide unexpected insights into the so-called emergent behaviors of complex systems. These are properties that are only apparent in the context of the whole system. The cell-cycle oscillator, for example, does nothing dramatic for a time, until some critical threshold is attained, and suddenly there's a burst of activity. Then everything quiets down and the cycle begins again. "These threshold events are nonlinear, and simple linear models don't capture them," Bolouri says.
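A toy example shows how a threshold can emerge from a differential equation. The sketch below integrates a single species with Hill-type positive feedback using a simple Euler scheme. The equation and all parameter values are illustrative assumptions of ours, not Tyson's cell-cycle model; setting `feedback=0.0` reduces it to a linear model for comparison.

```python
def simulate(stimulus, feedback=1.0, half_max=0.5, hill=4, decay=1.0,
             x0=0.0, dt=0.01, steps=5000):
    """Euler-integrate dx/dt = stimulus + feedback*x^n/(K^n + x^n) - decay*x.

    Returns the value of x after `steps` time steps of size `dt`.
    """
    x = x0
    for _ in range(steps):
        # Positive feedback: x promotes its own production (Hill function).
        activation = feedback * x**hill / (half_max**hill + x**hill)
        dxdt = stimulus + activation - decay * x
        x += dt * dxdt
    return x

weak = simulate(0.1)     # below the threshold: x settles at a low level
strong = simulate(0.3)   # above the threshold: x switches to a high level
linear = simulate(0.3, feedback=0.0)  # no feedback: x simply tracks the input

print(weak, strong, linear)
```

With these parameters, tripling the stimulus from 0.1 to 0.3 moves the final level from near 0.1 to above 1, a jump far larger than threefold. Without feedback, the same stimulus just settles at 0.3, so the linear version shows no switch at all.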
But in the absence of data, simple linear models are sometimes required. Indeed, Tyson's work on the cell-cycle oscillator has been marked by a long series of refinements along the continuum of model complexity.
"When we first started modeling the cell cycle, we didn't know all the parts," says Tyson. "We built a detailed kinetic model of part of the cycle, and used a simple Boolean model for the rest, where we had no detail." Through successive refinements, his group has been able to fill in the gaps.
"Modeling is a way to refine your understanding of the molecular basis of cell physiology," he says, "a way to determine the implications of what you're thinking. A kinetic model can often tell you where your thinking is wrong and how to fix it."