It was the code that hooked Bob Murphy on biology. At age 13, he visited New York City’s American Museum of Natural History with his parents and picked up a copy of Isaac Asimov’s book The Genetic Code in the gift shop. After reading it, “I came downstairs, and I told my parents, ‘I know what I want to do with my life,’” he recalls. “I was just fascinated by the idea that you could decode biological information, that biological systems were built on this DNA material that could be converted into RNAs and proteins.”
That was in the mid-1960s. Murphy, true to his word, went on to study biochemistry at Columbia University. Then in 1974, when he was in graduate school at Caltech, he encountered another type of code that would shape his career. One day, after he’d extracted proteins from chromatin and run polyacrylamide gels of the samples, he and others in his lab were “having discussions about how to analyze [the gels], and somebody said, ‘You know, that’s the kind of thing that you could use a computer for,’” Murphy recalls. “And I literally said, ‘What do you mean, use a computer?’”
Murphy was introduced to the computing lab at Caltech, and was soon learning to code—first in BASIC, and later in Fortran and MACRO-11. His advisor, James Bonner, acquired the lab’s first minicomputer during Murphy’s time there, and Murphy remembers connecting the lab’s spectrophotometers and other instruments to the computer and developing software to allow automated data collection. With that computational assistance, he studied chromatin structure, determining that the distances between DNA-histone complexes called nucleosomes change during DNA replication. It was the start of a career devising computer programs to answer biological questions.
Harnessing Big Data
Murphy earned his PhD in 1980 and returned to Columbia for a postdoc with Charles Cantor. In Cantor’s lab, Murphy started working on identifying interactions between histones, which led him to the question of how cells take up certain chromatin components in the first place. He tackled the problem using a fluorescence cytometer that Cantor had just acquired for the lab. “You could put a suspension of cells in the instrument, and it would flow them through a laser and then measure the fluorescence given off by each cell,” Murphy explains. “This was a very early high-throughput, data-generating instrument, and it was perfect for the kinds of things that I was interested in, especially because it produced a lot of data quickly.” Using fluorescent probes, he and Cantor were among the first to show that the compartments containing molecules that have been gobbled from the cell’s surface via endocytosis undergo a rapid drop in pH. The researchers also detailed the kinetics of what happens to those materials once a compartment enters the cell.
When it came time to apply for faculty jobs, Murphy saw an ad that seemed too perfect to pass up: Carnegie Mellon University (CMU) in Pittsburgh was starting a new center for applying fluorescence to biology experiments and was looking for professors. Murphy joined CMU in 1983 and has never left.
Murphy’s early work at CMU continued in a similar vein to his research in Cantor’s lab, using fluorescence spectrometry to trace the kinetics of what happens to materials after they are endocytosed into cells. For example, his team found strong evidence that the transmembrane protein Na+,K+–ATPase regulates the endosomal acidification that he and Cantor had identified.
Seeing is Believing
In the course of his fluorescence spectrometry research, Murphy learned that if one of his studies found, for example, that a particular cargo would end up inside a specific type of cellular compartment, reviewers would ask what the compartment looked like. “My first reaction was, well . . . the point here is to study the kinetics and show the biochemistry of this,” he says. But eventually he relented. “We would then just go and do a very simple microscopy experiment, and that would make the reviewers happy. It actually hadn’t changed the story, but there was a picture to go along with the story.”
Similarly, at scientific talks he went to in the early 1990s, Murphy would see microscopy images given center stage—and, in his view, used to support models of what happened inside cells via tenuous reasoning. “I kept saying to myself, somebody has to try to find a way to make these images into something closer to data, something that you actually can operate on. . . . At a certain point I decided that we would try to do it.”
The machine doesn’t need us. It’s the same way that a self-driving car works, in that you get in the self-driving car, and you tell it, “I want to go to Cleveland,” and it figures it out. You don’t tell it how to get to Cleveland, you just tell it what your goal is.—Bob Murphy, CMU
Murphy “is one of these people who’s brilliant, and he’s almost always right,” says Mario Roederer, who was Murphy’s first graduate student in the mid-1980s and is now an immunologist at the National Institute of Allergy and Infectious Diseases. “His work ethic was amazing.”
That work ethic served Murphy well as he and then MD/PhD student Michael Boland went about trying to computationally analyze images of cells. They would label proteins that home to specific organelles, then feed images of the labeled cells into a computer to train it to use pattern recognition to group together cells that contained the same labeled protein. It was a machine learning approach to visually identifying organelles within cells that removed the need for a human to first make the determination of which organelles were shown in training images. As a result, it could potentially achieve greater accuracy than people could.
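The core idea can be sketched in a few lines. This is an illustrative toy, not Murphy and Boland’s actual pipeline: each labeled cell image is assumed to have been reduced to a numeric feature vector (in practice, texture and shape measurements; here, synthetic draws), and a simple classifier learns to map feature vectors to organelle classes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for image features: three organelle classes,
# each with its own characteristic feature pattern plus noise.
n_classes, n_per_class, n_features = 3, 20, 8
centers = 3.0 * rng.normal(size=(n_classes, n_features))  # one pattern per organelle
X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

# "Training": a nearest-centroid classifier stores each class's mean features.
centroids = np.array([X[y == k].mean(axis=0) for k in range(n_classes)])

def classify(features):
    """Assign an image's feature vector to the nearest organelle centroid."""
    return int(np.argmin(np.linalg.norm(centroids - features, axis=1)))

# A new, unlabeled "image" drawn near the class-2 pattern.
prediction = classify(centers[2] + 0.5 * rng.normal(size=n_features))
```

No human labels which organelle a test image shows; once trained, the program assigns the class itself based on feature distances.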
“When we started, I would go to cell biology meetings and talk about the idea of recognizing what organelle a protein is in by this automated approach, and the reaction most people had was . . . ‘No, you have to go to grad school in cell biology to be able to tell the difference between [organelles].’ It took a lot of subsequent work to convince people that this was a viable approach,” Murphy recalls.
His group’s first paper on the image recognition program came out in 1997, and eventually his team was able to achieve “basically perfect” accuracy with it, he says. Yet all was not smooth sailing. He had originally hoped the program would learn to recognize proteins with similar distribution patterns as members of a common class—for example, to group different lysosome-localizing proteins in the same bucket. Instead, the program picked up on subtle differences in the distribution patterns, such that “almost every new protein we looked at wasn’t readily recognized as being one of those classes,” he explains. Rather than continue trying to develop a classification model, the group switched gears, embracing the messy complexity of protein dynamics. They trained a model to break down each protein’s distribution into a mixture of fundamental patterns, so that it could determine which organelles a protein was likely to be found in.
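That switch in approach amounts to unmixing: instead of forcing each protein into one class, model its observed distribution as a nonnegative blend of a few fundamental patterns and recover the mixing fractions. The sketch below is illustrative (the pattern vectors and numbers are invented), using nonnegative least squares by projected gradient descent.

```python
import numpy as np

rng = np.random.default_rng(1)

# Columns are "fundamental" organelle patterns, e.g. binned fluorescence
# intensity over image regions (synthetic here).
n_bins, n_patterns = 60, 3
patterns = np.abs(rng.normal(size=(n_bins, n_patterns)))

# Simulate a protein that is ~70% in one compartment, ~30% in another.
true_frac = np.array([0.7, 0.3, 0.0])
observed = patterns @ true_frac + 0.01 * rng.random(n_bins)

# Nonnegative least squares via projected gradient descent:
# take a gradient step on the squared error, then clip weights at zero.
w = np.full(n_patterns, 1.0 / n_patterns)
step = 1.0 / np.linalg.norm(patterns.T @ patterns, 2)  # 1 / Lipschitz constant
for _ in range(3000):
    w = np.clip(w - step * patterns.T @ (patterns @ w - observed), 0.0, None)
fractions = w / w.sum()  # estimated share of the protein in each compartment
```

The recovered `fractions` approximate the true mixture, which is the information the group wanted: not a single label, but how much of a protein sits in each type of organelle.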
A New Paradigm
Murphy’s fascination with biology has remained, even as the idea that first attracted him to the field—that life is a code that can be cracked, with each gene corresponding to a single function—has failed the test of time. “The way we used to think about these kinds of problems was to try to simplify things as much as possible—reductionism—but it’s been at least 20 years since we’ve realized that that’s not going to work, that biological systems are complex systems,” he told The Scientist late last year, in an interview in his corner office in one of Carnegie Mellon’s newer buildings, overlooking the Steel City. Murphy talked about slides projected on a large wall-mounted flat-screen TV with the polish of someone practiced at conveying the reasons for his excitement to nonspecialists. Given biology’s complexity, he says, it’s just not possible to do all of the experiments needed to figure out all the interactions that govern each of a cell’s functions in a traditional, hypothesis-driven way. “We need to have a way in which we can do only the experiments that we need to do and not do all possible experiments.”
There are no hard and fast rules in biology, as there are in physics, Murphy elaborated in a later phone interview, because there are always exceptions. Computational models provide a way forward for biologists. For example, let’s say a researcher had 96 drug candidates and wanted to know how they’d act on 96 different proteins within a cell line. Doing 9,216 experiments is out of the question, so instead, the researcher aims to do some fraction of those experiments, and use the results to train a machine learning program to model what the outcome of the others would have been.
That training process will be most effective if it’s active rather than passive, Murphy says. That means that, rather than feeding a computer program a large data set for training, he wants researchers to hand the reins over to the program from the get-go, enabling it to determine which experiments’ results would be most useful for improving its model—and then, to go get those results. By hooking up computers that run machine learning programs to instruments such as robotic liquid handlers and microscopes, and putting needed starting materials, such as drug candidates and cell lines, within robot arm’s reach, his group has created automated setups that can construct and continuously refine experiments in response to research questions. In the drug candidate and cell-line example, the program starts out running experiments with nearly random combinations of the two. As it runs microarray analyses to “see” the results, it builds a model predicting what the results would be of all possible combinations. The program then interrogates that model to see which of its predictions are the most uncertain, and runs those experiments, using the results to further refine the model. As the cycle repeats, the accuracy of the model increases.
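The loop Murphy describes—measure, model, find the most uncertain prediction, measure again—can be sketched in miniature. Everything below is an illustrative stand-in: a hidden drug-by-protein response matrix plays the role of the wet-lab experiments, a simple additive model plays the role of the machine learning program, and uncertainty is approximated by how sparsely measured a cell’s row and column are.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hidden "ground truth": each drug has an effect, each protein a sensitivity.
# In the real setup this would come from instruments, not a stored array.
n_drugs, n_proteins = 12, 12
truth = (rng.normal(size=n_drugs)[:, None]
         + rng.normal(size=n_proteins)[None, :]
         + 0.05 * rng.normal(size=(n_drugs, n_proteins)))

# Seed the model with a handful of near-random starting experiments.
measured = np.zeros_like(truth, dtype=bool)
for i, j in zip(rng.integers(0, n_drugs, 20), rng.integers(0, n_proteins, 20)):
    measured[i, j] = True

def fit_predict():
    """Additive model: global mean plus row and column deviations of measured cells."""
    mu = truth[measured].mean()
    pred = np.full_like(truth, mu)
    for i in range(n_drugs):
        if measured[i].any():
            pred[i] += (truth[i, measured[i]] - mu).mean()
    for j in range(n_proteins):
        if measured[:, j].any():
            pred[:, j] += (truth[measured[:, j], j] - mu).mean()
    return pred

def rmse(pred):
    """Model error on the experiments that have NOT been run yet."""
    return float(np.sqrt(((pred - truth)[~measured] ** 2).mean()))

initial_rmse = rmse(fit_predict())

for _ in range(50):  # 50 machine-chosen experiments
    # Uncertainty heuristic: prefer cells in sparsely measured rows/columns.
    row_n = measured.sum(axis=1)[:, None]
    col_n = measured.sum(axis=0)[None, :]
    uncertainty = np.where(measured, -np.inf,
                           1.0 / (1 + row_n) + 1.0 / (1 + col_n))
    i, j = np.unravel_index(np.argmax(uncertainty), uncertainty.shape)
    measured[i, j] = True  # "run" the chosen experiment

final_rmse = rmse(fit_predict())
```

Each pass through the loop, the program chooses its own next experiment, and the model’s error on the experiments it never runs shrinks—the essence of active learning, without exhaustively measuring all 144 combinations.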
“It doesn’t need us. It’s the same way that a self-driving car works, in that you get in the self-driving car, and you tell it, ‘I want to go to Cleveland,’ and it figures it out. You don’t tell it how to get to Cleveland, you just tell it what your goal is,” Murphy says. Accordingly, he’s dubbed these active machine learning setups “self-driving instruments.”
Driving Into the Future
Not surprisingly, realizing this ambitious vision of experiments with minimal human oversight requires plenty of people-powered planning, setup, and tinkering. Even as his own lab works out the kinks, Murphy has been working on launching a new CMU master’s program in automated science to teach trainees how to set up and maintain self-driving instruments. The first master’s program students will arrive this fall to an array of new equipment to practice on, including robotic liquid handling, microscopy, and nucleic acid extraction instruments. Murphy expects the program’s graduates to fan out to industry employers and national labs, or to go on to earn PhDs and eventually set up their own research labs that harness artificial intelligence and automation.
We really believe that these self-driving instruments are going to change the way science is done.—Bob Murphy, CMU
The students in the new program should be prepared for a challenge. Both Roederer and Greg Johnson, who completed his PhD in Murphy’s lab in 2016 and is now a scientist at the Allen Institute for Cell Science in Seattle, say Murphy is a tough and rigorous mentor. In one-on-one meetings, Johnson recalls Murphy repeatedly asking, “What is the question that you’re asking, and why is it important?”—and how that jibed with the computational methods used to address it. “[He] definitely was the first person who I met who had a clear articulation of the tight coupling between computational modeling and biological research,” Johnson says.
Murphy expects that the utility of such modeling, coupled with self-driving instruments, will extend to experiments on many different species. He and colleague Joshua Kangas have already collaborated with plant biologists to automate an experiment on how various chemicals affect the growth of Arabidopsis protoplast cultures, and while that particular study didn’t work out as hoped, Murphy and Kangas think machines could one day steer such plant experiments, and perhaps similar experiments in animals. Alternatively, a machine learning program could produce a readout of instructions for, say, mouse experiments it needs to refine its model, and a researcher could perform the experiments and feed the data back into the program. “The methods that we’re working on are generalizable, and we try to find collaborations that will test the parts of the methods. . . that we think might need the most work,” Murphy says.
Using images along with machine learning, Murphy thinks biology will advance toward a much more detailed structural understanding of spatial relationships within cells—where a given protein will be found in relation to the cell membrane, for example, or to microtubules. Having that foundation will make it feasible to investigate how perturbations such as mutations or drug candidates change those orientations. Building comprehensive computational models of cells’ spatial relationships will require the work of many labs, Murphy says, and he thinks the role of his lab is to “develop tools that will enable that, and describe a way in which we think that this task could be done.” In one recent step in that direction, he and a graduate student worked with Seema Lakdawala of the University of Pittsburgh School of Medicine, using images to train a model of the spatial relationships among different segments of influenza RNA and predict how they likely come together within an infected cell to produce new infectious particles.
Ultimately, Murphy says, “We really believe that these self-driving instruments are going to change the way science is done.”