“I’ve been collaborating with computational biologists since before genomics was genomics,” says Ross Hardison, a comparative geneticist at Pennsylvania State University. Back then, “we called it molecular cloning,” he says.
Hardison began working on genomics in the 1980s, when algorithms for aligning DNA sequences could only handle 10,000 base pairs, he says. Long strings of sequences were still spliced together by hand—“with effort”—and many biologists doubted whether generating enormous amounts of genomic data was even worthwhile.
“Many people thought we should stay focused on small model systems. They didn’t want to drown in a sea of sequences,” Hardison remembers. Although he and his colleagues could tease apart some regulatory pathways in a few organisms, without better comparative tools they could only guess whether their findings might apply to other organisms.
Biologists and informaticians must learn to speak each other’s languages in order to benefit from the alliance.
Enter computer scientist Webb Miller. He approached Hardison, envisioning a system whereby whole genomes could be aligned and compared. Hardison was skeptical, knowing this would require programs capable of handling hundreds of thousands of base pairs. But because he still had to laboriously tape together panels of individually generated sequence alignments for any figure over 10 kilobases long, he was willing to give Miller’s idea a shot.
The initial joint venture would turn into an ongoing, fruitful collaboration for both researchers. But Miller and Hardison had to learn to speak each other’s languages in order to benefit from the alliance.
When Miller came to Hardison with his first attempt at an algorithm that would align two genomes, hoping for immediate feedback, Hardison promised to look as soon as he got a chance. “[Miller] asked me, ‘If I come back to your office at noon on Monday, will you have something?’” recalls Hardison. “I didn’t realize it, but he was already setting the tone of our collaboration.” Miller needed Hardison’s feedback in order to move forward. Frequent dialogues about the alignments quickly resulted in new and better algorithms. “You have to have computational help as a biologist,” says Hardison, and collaboration offers fantastic opportunities. “It has enriched my scientific pursuits enormously, in ways I had never foreseen.”
Even when computational biologists and classically trained biologists appreciate each other’s input, they don’t always understand each other’s work processes. The Scientist talked to researchers and computational biologists for tips on how to plan for and get the most out of bioinformatics collaborations.
In the Beginning
Get a partner or a hired gun?
Before contacting potential collaborators, make sure that’s what you really want. A collaborator will, by definition, be involved in the process and may want to help shape the questions your project is designed to answer. Some bioinformaticians are willing to analyze sequence data without knowing anything about it, says Lindsay Farrer, chief of biomedical genetics at Boston University, who investigates the genetic component of diseases such as Alzheimer’s. Others may actually be interested in a project’s bigger biological questions, he says. If you don’t want the extra input, it may be better to hire an expert, recommends Sanja Rogic, who works at the Centre for High-Throughput Biology at the University of British Columbia. Pricing can depend on the number of samples to be analyzed, but sequencing and interpreting a single genome runs $5,000 or more. Many institutions also have bioinformatics core facilities that can provide such services for a registration fee and a bioinformatician’s hourly fee. But if you’re interested in a collaboration, Rogic cautions, don’t treat the analysis step as “something a technician does quickly after [biologists] generate the data.”
Don’t expect all things from one person
Before approaching a bioinformatician, make sure you know what you need from him or her. There are a variety of skills a computational biologist can offer, including database management, programming, and data analysis. A smaller academic lab needs to decide whether to bring in collaborators or find someone who can be involved “soup to nuts,” says Farrer. And if you want help designing your experiments, you might need a statistician instead.
Get your point across
One of the most frustrating problems for a bioinformatician is working with a biologist who can’t clearly and concisely explain his scientific question. A bioinformatician will be immediately intrigued if a biologist can clearly communicate how the concept for a study is novel, says Simon Kasif, a biomedical engineer at Boston University who studies signaling networks in disease. When biologists and computational biologists understand each other, it’s possible to define more challenging problems. “The bar is raised for what is expected of the project,” says Kasif.
Be an early bird
If you’re hiring a computational biologist, it’s a good idea to discuss the project early on. Rogic has seen RNAseq data sets in which poorly considered controls prevented all the data from being analyzed. To avoid such boondoggles, biologists and informaticians working toward a common goal, such as measuring gene expression patterns in response to oxidative stress, should each list relevant controls. The lists will be substantially different—and equally valuable, says Nigam Shah, a biomedical informatics scientist at Stanford University.
Get on the same page
Make sure everyone involved has the same goals for a project. Computational biologists more interested in developing new methodologies may not work comfortably with biologists who only want to use established tools to probe their biological problem. Some projects, like the Human Genome Project, in which data analysis is light, go smoothly, explains Kasif, but some suffer from incompatible goals, such as perfecting the methods versus trying to unearth deeper biological knowledge. Agreeing on which goal takes priority will enable a smooth group effort.
Make the cash flow clear
Another catch can be funding, says Kasif. Biologists tend to be much more pragmatic about money. “Bioinformaticians will be proposing all these crazy ideas, and the biologists will say, ‘But how will we pay for that?’ They know how expensive it is to produce such large amounts of data,” he adds. Ideally, Kasif says, bioinformaticians will contribute funding to make sure the necessary experiments get done.
Get a clue
It helps for each side to have a basic working knowledge of what the other does. Ask your collaborators for the important papers from their field, suggests Rogic. There’s an art to picking the best computational tools for the job, and there’s almost never a single “right” method, explains Aimée Dudley, a biologist at the Institute for Systems Biology in Seattle who focuses on gene network regulation. So a bioinformatician needs to make sure her biologist collaborator understands why specific tools are being used and what information they convey. Also, knowing how the results will be analyzed makes it easier for a biologist to design effective experiments and then to understand the analysis, she says.
Make the limitations clear
Make sure your collaborators understand the limitations of your contribution. For bioinformaticians, that means conveying the weaknesses inherent in their predictions and analyses, says Kasif. Experiments can fail for reasons beyond the experimenter’s control, which should in turn be made clear to the computational biologist, Dudley notes. Discussing the sources of variability in a specific data set can also help stimulate discussions of where to draw the line between “meaningful” data and noise, says Hardison.
Add some data to that data
Biologists often don’t realize how enormous data sets have to be for bioinformaticians to glean meaningful patterns and insights. Data set size requirements must be clear from the initial conversations. Biologists often don’t think about how much data they need to generate statistically significant conclusions, says Rogic; their samples may be too few, and sometimes the controls used are just what’s conveniently in the freezer. In turn, bioinformaticians need to be sensitive to the time and money involved in generating such data, she says.
Define data usage and exclusivity
It’s also important to make clear how the data can be used, and by whom. A curious bioinformatician, says Shah, may want to add the new data set to publicly available data sets. This could make it possible to ask new questions an experimental biologist cannot—and they might even be more interesting than the original questions. But a biologist who designed the experiment and feels a sense of ownership over the data might resent a collaborator squeezing a different paper out of it. “It’s awkward when [the bioinformatician gets] a higher-profile paper,” says Shah.
Dudley says that it can also get sticky when an experimental biologist is approached by a second computational biologist, who is a potential new collaborator. If both bioinformaticians end up applying the same tools to analyze the data, it can become “a little like cheating,” says Dudley.
Give it time
The amount of time it takes to run an experiment or to analyze the data can often be woefully misunderstood by the other party. Biologists can frustrate computational biologists by expecting their analyses to be completed within hours. In reality, the work, which can involve programming, running simulations, and analyzing data sets, may take days or even weeks, says James Collins, a biomedical engineer at Boston University. Computational biologists, in turn, may not understand why a biological experiment can take even more time than originally anticipated. “Sometimes they don’t recognize that reagents can be back-ordered, or that there are queues to get on the right machine,” says Dudley. Collecting human data can take lots of time, too. So make sure everyone has a sense of how long each step may take and how things can stall.
It’s important to keep communication open during the entire process, says Dudley. This can take many forms, from in-person meetings to e-mail and videoconferencing updates with collaborators in far-flung locations. Dudley, who manages projects with upwards of 20 researchers, often communicates with collaborators daily. The number and type of updates will depend on the project and personalities involved, but all collaborators need to be kept updated often enough to respond to new information, new ideas, and new directions. Make sure “no one feels stupid when asking a question,” Dudley explains—because everyone should be asking questions.
Give a round of applause
Computational biology has come a long way in the decades since Miller sat on a grant review board and heard a biologist refer to a computer scientist as a “bottle washer.” Today computational biologists are held in higher regard, but a little more appreciation never hurts. No one wants to feel like a “code monkey or a data monkey,” Shah notes. Give credit where credit is due, says Hardison: “Really, it’s playground rules.”