In today’s data-heavy research environment, wet-lab scientists can benefit from new computational skills.
July 1, 2016
Adelaide Rhodes had no idea a tiny crustacean would fuel such a big career shift. About a decade ago, as a postdoc at the University of Washington, she was researching copepods—microscopic organisms that convert unsaturated fatty acids into the omega-3 fats that make salmon a healthy meal. They’re “what fish eat to get fat,” Rhodes says. During an aquaculture boom, she began hunting for genes involved in the fat-converting process. Trouble was, very few researchers studied copepod genetics. Back in 2005, Rhodes’s searches for “copepod and lipids” on the DNA Data Bank of Japan, European Nucleotide Archive, and GenBank yielded no results. When she searched “crustacean,” she got a list of some 50 genes, but none were related to lipid metabolism.
Undeterred, Rhodes broadened her search to include insect genes, then designed primer sets and ran countless PCR assays to check if those same genes were found in copepods. She also went around at meetings asking other researchers if they had copepod data or sequences to share. Rhodes eventually identified two potential copepod desaturases—enzymes that introduce double bonds into fatty acid chains. However, she couldn’t confirm whether those genes are specific to copepods, because there weren’t enough publicly available crustacean genomes for comparison.
These days, she wouldn’t have that problem. When researchers identify a new genomic sequence, they can use modern computing and bioinformatics tools to check for its presence in related species’ genomes with just a few keystrokes. And as technical advances yield unmanageable amounts of data across diverse fields, more wet-lab scientists are turning to bioinformatics to make sense of their results. Online courses, workshops, and a growing community of bioinformatics-savvy researchers are now available to help scientists better understand available data-analysis tools or create their own—or even to persuade them to leave the bench altogether for a computing career.
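The “few keystrokes” check described above can be made concrete with a toy sketch. In practice a researcher would run BLAST or a similar alignment tool against public databases; the exact substring scan below, with made-up mini-genomes, merely stands in for that lookup step.

```python
# Toy stand-in for checking whether a newly identified sequence appears
# in related species' genomes. Real searches use alignment tools such as
# BLAST; an exact substring scan keeps the idea visible in a few lines.

def find_in_genomes(query, genomes):
    """Return the names of genomes that contain the query sequence."""
    return [name for name, seq in genomes.items() if query in seq]

# Hypothetical mini-genomes, for illustration only.
genomes = {
    "copepod_A": "ATGGCCTTCGATTACGGA",
    "copepod_B": "TTCGGAATGGCCGGGAAA",
    "insect_C":  "ATGGCCTTCGTTTTTAAA",
}

print(find_in_genomes("ATGGCCTTCG", genomes))  # ['copepod_A', 'insect_C']
```

A real comparison would also tolerate mismatches and score partial alignments, which is exactly what dedicated tools handle for you.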
After her struggles to identify copepod genes involved in omega-3 fatty acid production, Rhodes went on to do two additional postdocs: at Smithsonian Marine Station in Fort Pierce, Florida, and at Texas A&M University–Corpus Christi. Knee-deep in computational analyses by then, she returned to school and completed a master’s degree in bioinformatics at Johns Hopkins University in 2012.
You don’t need to be an expert in computational tools or bioinformatics or math. —Raquel Hontecillas-Magarzo, Virginia Tech
Rhodes, who is now a researcher and bioinformatics trainer at Oregon State University, wasn’t alone in choosing to make this career shift. A recent survey by the jobs and recruiting site Glassdoor.com rated “data scientist” as the best job of 2016, and in 2012 Harvard Business Review called it the “sexiest job of the 21st century.” But even if you’re not looking to change paths, a little bioinformatics know-how can still be helpful in the lab.
As an immunology postdoc at Virginia Tech, Raquel Hontecillas-Magarzo worked with mice and did molecular biology experiments. She then spent two years doing benchwork at the Spanish Institute for Research and Agriculture in Madrid before returning to Virginia Tech’s Biocomplexity Institute as an assistant professor. So when the university assembled a team to develop computational models for studying human immunity to gut pathogens, Hontecillas-Magarzo was tapped for her expertise in experimental design. The team included life scientists, physicists, bioinformaticians, and software engineers—about a 50:50 mix of experimental and computational researchers.
At a weeklong symposium on computational immunity in summer 2014, Hontecillas-Magarzo and other immunologists learned how computational tools could deepen their analysis of wet-lab data and suggest new hypotheses that might not seem intuitive based on the literature. Nowadays, Hontecillas-Magarzo uses computer simulations to model the behavior of immune cells during infection by Helicobacter pylori, a bacterium that can cause ulcers. She and her colleagues define the simulation’s parameters based on experimental data—for instance, the level of T-cell activity measured on the third day of an H. pylori infection in a mouse. Recently, a sensitivity analysis using this model suggested that anti-inflammatory macrophages may help maintain mucosal integrity and prevent stomach epithelial cells from dying during H. pylori infection. These in silico analyses don’t reveal underlying mechanisms. However, they can show that “if you change one [element], it has a significant effect on the other,” which helps inform decisions on what to validate in bench experiments, says Hontecillas-Magarzo. She is currently conducting mouse studies to follow up on the macrophage/epithelial link.
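The sensitivity analysis Hontecillas-Magarzo describes—vary one element of a model and watch how the others respond—can be sketched in miniature. The dynamics and parameter values below are invented for illustration and are not the Virginia Tech team’s actual model; the point is only that sweeping one input (here, a macrophage level) reveals its effect on an output (epithelial cell survival).

```python
# Minimal sensitivity-analysis sketch on a toy infection model.
# All equations and numbers are illustrative assumptions.

def simulate(macrophage_level, days=10, dt=0.1):
    """Toy dynamics: a pathogen kills epithelial cells, and
    anti-inflammatory macrophages damp the kill rate."""
    epithelium = 1.0                              # fraction of intact cells
    pathogen = 0.5                                # normalized pathogen load
    kill_rate = 0.3 / (1.0 + macrophage_level)    # protection term
    for _ in range(int(days / dt)):
        epithelium -= kill_rate * pathogen * epithelium * dt
    return epithelium

# Sensitivity check: sweep the macrophage parameter, watch the output.
for m in (0.0, 1.0, 5.0):
    print(f"macrophages={m:.1f} -> epithelium={simulate(m):.3f}")
```

Running the sweep shows epithelial survival rising monotonically with the macrophage level—the kind of in silico signal that, as the article notes, tells you what is worth validating at the bench without itself revealing mechanism.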
Even without access to local courses or symposia, wet-lab researchers can gain familiarity with computational and bioinformatics methods by arranging collaborations with research groups whose members have that expertise, suggests Josep Bassaganya-Riera, who directs Virginia Tech’s Nutritional Immunology and Molecular Medicine Laboratory, which includes Hontecillas-Magarzo’s lab. Researchers specifically interested in computational immunology can find links to books, tutorials, and other resources at this Virginia Tech site.
To analyze data with interdisciplinary teams, “you don’t need to be an expert in computational tools or bioinformatics or math,” Hontecillas-Magarzo says. However, “you need some level of understanding. You need to understand some of their terminology.”
Kathleen Fisch, a computational biologist at the University of California, San Diego, got her first taste of bioinformatics from evolutionary biologist Craig Moritz as a University of California, Berkeley, undergraduate using geographic software to map climatic niches of hummingbirds. But it wasn’t formal instruction. “He’d say, ‘Here are some data points. Go play with the software,’” Fisch says. Then, while working on her PhD at UC Davis, Fisch continued dabbling with computational tools—using a program called Structure to detect population structure from microsatellite DNA markers and SPAGeDi (Spatial Pattern Analysis of Genetic Diversity) to assess the genetic diversity of endangered smelt populations in the San Francisco Bay Delta.
But those software packages were developed by others; Fisch wanted to create her own. She started by learning Python and R, two widely used programming languages. “I bought a bunch of books and hardly looked at them,” Fisch jokes. Instead, she immersed herself in online courses through Coursera. Five days a week, Fisch logged onto Coursera to watch lectures, puzzle over problems, and get feedback from fellow students. At first the “computational stuff seems intimidating,” says Fisch, “but it’s totally within your grasp if you have time to dedicate to it.” (The Python and R classes were free when Fisch took them five years ago, though Coursera now charges $79–$99 per course for similar offerings. Upon successful completion, students earn electronic course certificates that can be added to their LinkedIn profile.)
Researchers can also learn computational basics by attending Data Carpentry and Software Carpentry courses. Software Carpentry runs about 100 two-day workshops around the world each year, teaching core skills for research computing through short tutorials and practical exercises. All instruction is done via live coding. While Software Carpentry is mostly aimed at researchers who are already doing some data analysis and programming, its sister organization, Data Carpentry, is good for those who are just beginning the transition from spreadsheets to R, Python, and command-line data analysis.
With more computational skills under her belt, Fisch decided to leave the bench entirely and work with Scripps Research Institute bioinformatician Andrew Su, whose lab builds and applies tools to use crowdsourcing for genetics and genomics. As a Scripps postdoc, Fisch learned how to analyze next-generation sequencing data on different platforms and has collaborated with multiple research groups on projects ranging from precision medicine studies in breast cancer to systems biology analyses of osteoarthritis. “Working with lots of PIs and collaborators, I was able to get exposed to pretty much all the next-gen sequencing types,” Fisch says. In the fall of 2014, she took a job at UC San Diego, where she currently works at the Institute of Genomic Medicine developing an open-source platform to automate multi-omics data analysis pipelines on computer clusters and in the cloud.
As she began taking Coursera classes to learn Python and R, Fisch also turned to help from colleagues and an online community forum called Stack Overflow, where she picked up the basics of a command-line language called bash. Although self-teaching on a “need to know” basis was probably not as comprehensive as a formal college course, “it was enough to get me off the ground,” Fisch told The Scientist. The collection of Python “recipes” on GitHub.com, a public code repository, is another good resource for bioinformatics code snippets and concepts.
Another postdoc in Su’s lab at Scripps, Tim Putman, also waded into bioinformatics with help from a supportive community. When he began his PhD research at Oregon State University (OSU) in 2010, Putman conducted cell biology experiments to study the pathogenesis of Chlamydia infection. But sequencing bacterial genomes and doing comparative genomics quickly hooked Putman on the analysis side of the research. To do that kind of work, he needed to navigate the Linux environment to pull files from other servers, extract the data he wanted, and run Python and R scripts to reformat the results to work with the lab’s algorithms.
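The reformatting step Putman describes—pulling tool output into the shape a downstream algorithm expects—is a bread-and-butter bioinformatics script. The sketch below uses Python’s standard `csv` module on a tab-delimited table; the column names and values are hypothetical, not from Putman’s Chlamydia work.

```python
# Sketch of a typical reformatting step: read tab-delimited results and
# rewrite selected columns in the order a downstream tool expects.
# Column names and values here are hypothetical.
import csv
import io

raw = """gene\tspecies\tscore
ftsZ\tC. trachomatis\t98.2
ompA\tC. muridarum\t87.5
"""

def reformat(tsv_text, columns):
    """Keep only the requested columns, in the requested order."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t",
                            extrasaction="ignore")  # drop unlisted columns
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

print(reformat(raw, ["score", "gene"]))
```

In day-to-day use the same pattern reads from and writes to files rather than in-memory strings, and it is exactly the kind of glue script that makes one tool’s output usable as another’s input.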
Putman picked up some command-line basics from other members of his lab. He also took a Python workshop offered on campus through OSU’s Center for Genome Research and Biocomputing. Another big help was OSU’s bioinformatics users group (BUG). This group of life scientists, bioinformaticians, computer scientists, mathematicians, and engineers meets every other week to chat over lunch about metagenomics, structured query language (SQL), and other computational challenges. The primary goal of BUG is “getting people into the same room to chat about what they’re learning and what they’re struggling with,” says Shawn O’Neill, one of OSU’s bioinformatics trainers.
Indeed, Putman sometimes found others in the room who had answers for his nagging problems. “A big thing I learned from BUG people was how to configure and debug the command-line tools and set up my environment,” he says. “This can be a big hurdle to someone new to computer science.”
UC Davis also provides opportunities for bioinformaticians to share struggles and solutions, through a forum called the Data Intensive Biology program. The sessions are organized by Titus Brown, a key leader of a grassroots movement to have bioinformaticians train each other. Some of the discussions are broadcast online, so that interested researchers outside the UC Davis campus can participate. And some attendees, including Rhodes, meet periodically to hash out new course materials and teaching methodologies.
It was during one of these UC Davis workshops that Rhodes learned about several techniques for hybrid de novo assembly of Illumina data. So when she returned to OSU and Putman complained to her that there were no reference sequences with which to align his own bacterial genomes, she encouraged him to look into new tools. “It was great because she was out learning about cutting edge stuff from the leaders in the field and then bringing it back to researchers at OSU,” Putman says.
Now at Scripps, Putman is putting his bioinformatics experiences to good use, working with colleagues to build a web interface application that will allow researchers to explore how their gene of interest is connected to proteins, drugs, enzymatic substrates, and microbes hosted in Wikidata, a community-curated database for many types of structured data. Users will also be able to use the application to add their own microbial data to the database. Reflecting on his career journey, Putman feels fortunate to have had so many resources to guide his transition from the bench to bioinformatics. “For what I’m doing now, it’s more typical to have a computer science background,” he says.
Rhodes, too, is grateful for the little crustaceans that nudged her toward computational research. “I feel my switch to bioinformatics has enabled me to ask bigger and more interesting questions than before,” she says. “I still hope to answer my original research question about how copepods produce highly unsaturated omega-3 fatty acids, but I now have the ability to ask even more compelling questions touching on biodiversity, adaptation, and evolution.”
Esther Landhuis is a freelance science writer living in the San Francisco Bay Area.