Following a recent computational biology meeting, a group of us got together for dinner, during which the subject of our individual research projects came up. After I described my efforts to model signaling pathways, the young scientist next to me shrugged and said that models were of no use to him because he did "discovery-driven research". He then went on to state that discovery-driven research is hypothesis-free, and thus independent of the preexisting bias of traditional biology. I listened patiently, because I have heard this argument many times before.
I was too polite to point out that all biological research is hypothesis-driven, although the hypothesis might be implicit. Genomic sequencing projects might seem to lack a hypothesis, but the resulting data is exploited by hypothesizing specific evolutionary relationships between different genes.
The idea that there are actually two distinct ways of conducting biological research was formally proposed several years ago in a Nature Biotechnology commentary (R. Aebersold et al., 18:359, 2000). The authors described "discovery science," like genome sequencing projects, as blindly cataloguing the elements of a system, disregarding any hypotheses on how it works. In contrast, they described "hypothesis-driven science" as being small-scale, narrowly focused, and using a limited range of technologies.
Although the authors' intent was to justify large-scale research as a valid way to approach biological problems (another frequent topic at after-meeting dinners), in my opinion, casting it as hypothesis-free did the emerging field of systems biology a great disservice. To imply that large-scale systems biology research can be productively conducted without a prior set of underlying hypotheses is nonsense. A good hypothesis is at the heart of the best science, regardless of scale.
We started our systems biology program almost eight years ago, and one of our first projects was to establish the relationship between specific cell signaling pathways and both gene and protein expression. We thought that important patterns would quickly become self-evident, but sorting through lists of thousands of genes and proteins soon disabused us of that idea. We could see patterns, but they simply did not make any obvious sense. We have a general understanding of the relationship between gene expression and subsequent protein levels, but looking at thousands of genes made it seem more complex and overwhelmed our intuition.
Extracting biological meaning from the data required a level of simplification, and this is where we needed a hypothesis. By postulating that specific classes of proteins were degraded at an accelerated rate, for example, we could create hypothetical patterns against which to compare our data. This allowed us to quickly look for both expected and unexpected relationships. After our initial, disappointing foray into "discovery science", we used specific hypotheses to guide our experimental designs. For example, by proposing that signaling pathways regulate the shedding of proteins from the cell surface, we were able to identify these proteins, relate them to specific signaling pathways, and discover that they are frequently released by cancer cells (Jacobs et al., J Proteome Res, 7:558, 2008).
Despite the importance of hypotheses in systems biology research, they are not always explicitly stated. As biologists, we are well trained in posing small, specific questions, but we have little familiarity with framing systems-level hypotheses. (Unlike small questions, systems-level hypotheses might take the form of postulating how the outputs from different signaling pathways are combined.) Likewise, our intuition regarding systems-level relationships in biological systems is difficult to translate into experimental design. This is why computational models are so central to systems biology research. Unlike humans, computers are very good at keeping track of complex relationships and predicting how low-level changes will alter higher-level functions. Computational models, however, must be built from a set of explicit, hypothesized relationships.
Finding meaningful relationships in complex datasets also requires starting with the appropriate data. A hypothesis usually takes the form of a mechanistic relationship between a specific cause and a consequent effect, and this will almost always depend on experimental context. There are some circumstances when data must be gathered in the absence of context or hypothesis to characterize a system, but it is unrealistic to expect such preliminary studies to lead to significant biological insights. For this, you need a hypothesis.
Systems biology might be the future of biology, but we still need hypotheses to take us where we want to go.
Steven Wiley is a Pacific Northwest National Laboratory Fellow and director of PNNL's Biomolecular Systems Initiative.