When Lior Pachter came across one of the latest publications from the federally funded Genotype-Tissue Expression (GTEx) project, he couldn’t suppress his disappointment.
In the paper, published last October, researchers from the GTEx consortium had analyzed RNA sequencing (RNA-seq) data from more than 40 tissue types in the human body. The findings themselves were exciting, says Pachter, a computational biologist at Caltech. But a single line, tucked away in the methods section, left him feeling exasperated. The line read: “RNA-seq reads were aligned to the human genome . . . using TopHat (v1.4).”
In response, Pachter took to Twitter. “Please stop using Tophat,” he wrote in early December. “There is no reason to use it anymore.”
TopHat version 1.4 was a 2012 update to an open-source program conceived by Pachter and his colleagues in 2008 that aligns reads from RNA-seq experiments to a reference genome. Not only has TopHat been updated several times since then, it has also been superseded: first by TopHat2, released in 2012, and later by a faster, more accurate successor, HISAT2.
“The original TopHat program is very far out of date, not just in time, but in performance—it’s really been superseded,” Pachter tells The Scientist. “By now, in 2017, certainly a high-profile consortium with interesting data oughtn’t be using this tool.”
Kristin Ardlie, director of the GTEx Laboratory Data Analysis and Coordination Center at the Broad Institute, notes that the group does pay careful attention to its choice of tool, but that there are inevitable delays given the project’s scale.
“Getting consortium papers written and to a publication endpoint can take a long time,” she writes in an email to The Scientist. The data for the October publications were finalized in 2014 and made public in 2015. “The original analyses of that would have been performed months before that time,” she adds. (TopHat2, TopHat’s immediate successor, became available in 2012.) “We do consider [TopHat v1.4] out of date (or that there are better versions available), and we have indeed updated our tools many times since.” More recent GTEx projects use STAR.
But Pachter points out that GTEx isn’t the only group putting out papers citing obsolete versions of the software. Since its 2009 publication, the original TopHat paper, coauthored by Pachter, his graduate student Cole Trapnell, and Trapnell’s coadvisor, Steven Salzberg, has racked up more than 6,500 citations—of which more than 1,000 were logged in the last year.
And TopHat is just one of many out-of-date computational tools to have become embedded as bad scientific habits. Indeed, anecdotal evidence, as well as recent research into the issue, suggests that the use of obsolete software is widespread in the biological sciences community, and rarely even recognized as a problem.
“Quite often, we’ve encountered students or faculty who have been unconsciously using these outdated software tools,” says Jüri Reimand, a computational cancer biologist at the University of Toronto. Asked why they haven’t considered updating their workflows, “they usually answer because they were first familiarized with those tools and they didn’t really pay attention to whether they were updated frequently.”
There’s now growing momentum to counter this attitude, as it becomes increasingly obvious that the choice of computational software can have a substantial influence on the progress of science. Not only do users of older methods fail to take advantage of faster and more-accurate algorithms, improved data sets, and tweaks and fixes that avoid bugs in earlier versions, they also contribute to a reproducibility crisis due to differences in the results new and old methods produce.
From that perspective, “when users are using very old tools that we really know are not the right thing to use, it in a sense devalues the contributions of all of us developing new methodology,” says Pachter. “It sends the message that it doesn’t really matter what program you use, that they’re all similar—and that’s not really the case.”
The effect of outdated software on results and reproducibility
The last few years have seen a handful of efforts to quantify the effect of using outdated computational tools on biological research. In 2016, Reimand and his colleagues explored 25 web-based pathway enrichment tools—programs that help researchers tap into online databases to make sense of experimental genetic data. The team wanted to know whether updates to these databases and the software being used to access them were making their way into the literature, and whether those changes had an effect on scientific results.
Their findings were damning. In a letter to the editor published in Nature Methods, the researchers wrote that “the use of outdated resources has strongly affected practical genomic analysis and recent literature: 67% of ∼3,900 publications we surveyed in 2015 referenced outdated software that captured only 26% of biological processes and pathways identified using current resources.”
The main culprit in that statistic was a popular gene-annotation tool called DAVID, which, in 2015, had not been revised since 2010 (although it has since been updated). Despite its failure to discover nearly three-quarters of the information revealed using more-recent alternatives, DAVID had made it into more than 2,500 publications—many of which must have used the tool when it was already substantially out of date and superseded by other available tools, Reimand notes. “It is not the effect of people just taking a long time to publish results.”
Even when a single tool is regularly updated, the research community may significantly lag behind, as highlighted by a 2017 study by University of Pennsylvania pharmacologist and computational biologist Casey Greene and his former graduate student, Brett Beaulieu-Jones, in Nature Biotechnology.
The duo focused on just one tool: BrainArray Custom CDF, an online resource developed in 2005 consisting of various files that aid gene-expression experiments by matching DNA probes to genes. Combing through the 100 most recent publications that employed the tool, now in its 22nd version, Greene and Beaulieu-Jones found that more than half failed to specify which version the authors had used, making these studies’ findings essentially unreproducible. The remaining papers, which were published between 2014 and 2016, cited nine different versions, ranging from 6 to 19.
When the researchers applied several recent BrainArray Custom CDF versions to a gene-expression data set—obtained from human cell lines engineered to lack particular T-cell proteins—they found multiple discrepancies in the results. For example, while versions 18 and 19 both identified a total of around 220 genes showing significantly altered expression compared to controls, 10 genes that were identified using version 18 were omitted by version 19, and a further 15 genes that were identified using version 19 were missed by version 18.
“It’s making a difference at the margins,” says Greene. “If one of those is your favorite gene, it might change your interpretation.”
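At bottom, the comparison Greene and Beaulieu-Jones ran comes down to a set operation: which genes are flagged as significantly altered under one annotation version but not the other. The short Python sketch below illustrates that check; the gene names and lists are hypothetical placeholders, not data from the study.

```python
# Minimal sketch of a version-to-version comparison of differential-expression
# calls. The gene identifiers below are hypothetical placeholders.

def compare_annotation_versions(hits_v18: set[str], hits_v19: set[str]) -> None:
    """Report genes called significant under one annotation version but not the other."""
    only_v18 = hits_v18 - hits_v19   # found with version 18, missed by version 19
    only_v19 = hits_v19 - hits_v18   # found with version 19, missed by version 18
    shared = hits_v18 & hits_v19

    print(f"called by both versions: {len(shared)}")
    print(f"only with version 18: {sorted(only_v18)}")
    print(f"only with version 19: {sorted(only_v19)}")


if __name__ == "__main__":
    # Hypothetical significant-gene lists from the same expression data set,
    # re-annotated with two versions of a probe-to-gene mapping.
    compare_annotation_versions(
        hits_v18={"GENE_A", "GENE_B", "GENE_C"},
        hits_v19={"GENE_B", "GENE_C", "GENE_D"},
    )
```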
Raising awareness of the need to stay up to date
Studies such as Greene’s and Reimand’s are a reminder that “there’s a difference between software and experimental protocol,” Pachter says. “Changes in computer science are very rapid—the pace of change and nature of change is just very different than it is for experimental protocol.”
But getting that message to researchers is not so simple, he adds. While some responders to Pachter’s December tweet suggested simply removing old tools or old versions of a software online—in order to, at the very least, prevent new downloads of obsolete tools—there are good reasons to retain a record of the computational dinosaurs online. “There is an argument—and it’s an important one—that people may want to reproduce old results or have the ability to run the software as it was at the time,” Pachter says.
Reimand agrees that reproducibility is a key reason to keep good records of older tools. “There should be a version available of the same software that allows you to go back to, say, six months from now, and say, ‘This is how I got the results back then,’” he notes. Many sites now do this: the BrainArray website, for example, currently hosts all 22 of its versions for download—although at the time of Greene’s 2017 study, at least five versions were unavailable.
Some developers instead opt for warning notices on the websites where software is available to download. On TopHat’s homepage, a notice below the description panel reads: “Please note that TopHat has entered a low maintenance, low support stage as it is now largely superseded by HISAT2 which provides the same core functionality . . . in a more accurate and much more efficient way.” (Emphasis TopHat’s.)
Pachter suggests that old versions of software could also be modified by developers to include their own warnings, “so that when you download the tool, and you go and actually run it, then the program itself outputs a message and says, ‘You can use this, but there are newer and better tools.’”
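For a tool written in Python, such a built-in notice could be little more than a warning printed every time the program starts. The sketch below is a hypothetical illustration of that idea; it is not code from TopHat or any other real package, and the tool and successor names are invented.

```python
import warnings

# Hypothetical end-of-life notice for an outdated tool: the program still runs,
# but tells the user on every invocation that a newer alternative exists.
DEPRECATION_NOTICE = (
    "old_aligner 1.4 is no longer maintained. You can still use it, but newer "
    "and more accurate tools are available; see the project homepage for the "
    "recommended successor."
)


def main() -> None:
    warnings.warn(DEPRECATION_NOTICE, DeprecationWarning, stacklevel=2)
    # ... the tool's original functionality would run here ...


if __name__ == "__main__":
    # Make sure the notice is shown even where DeprecationWarning is silenced.
    warnings.simplefilter("always", DeprecationWarning)
    main()
```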
On the flip side, the publishers of scientific literature may also help increase awareness around the role of computational tools by requiring greater transparency about software information. A number of heavyweight publishing companies such as Elsevier, Springer Nature, and AAAS have adopted publishing guidelines aimed at improving reproducibility, many of which take the software problem into account.
“Including all the information, dependencies, configuration variables, test data, and other items necessary to repeat an analysis is really just part of the larger reproducibility picture, which Elsevier strongly supports,” writes William Gunn, director of scholarly communications at Elsevier, in an email to The Scientist. For example, one set of guidelines known as STAR methods—introduced by Cell Press in 2016 and now being expanded across Elsevier journals—“requires a description of the software, which includes version information, and a link to get it, unless it’s provided as a supplementary file,” adds Gunn.
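For analyses run in Python, much of that version information can be captured automatically at run time rather than reconstructed during write-up. The snippet below is a generic sketch, not part of any journal’s tooling: it records the interpreter version and the versions of all installed packages so the manifest can be deposited with a paper’s supplementary files.

```python
import json
import sys
from importlib import metadata


def software_manifest() -> dict:
    """Collect interpreter and installed-package versions for a methods report."""
    return {
        "python": sys.version,
        "packages": {
            dist.metadata["Name"]: dist.version
            for dist in metadata.distributions()
        },
    }


if __name__ == "__main__":
    # Write the manifest alongside the analysis results so the exact software
    # versions used are preserved with the publication.
    with open("software_versions.json", "w") as fh:
        json.dump(software_manifest(), fh, indent=2, sort_keys=True)
```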
Doing away with the software download
While initiatives like these might raise awareness of the risks of using outdated software, there are also moves within the biological sciences community to make updating computational tools, and switching between tools and versions, far easier.
One possible solution, Greene notes, is for researchers to adopt the practice of uploading their entire computing environment with their publications, so that analyses can be run with any and all versions of a tool as they become available. “As a version changes, you could run the analysis with both versions through that software and quickly look at the difference in the results,” says Greene, whose Nature Biotechnology paper outlined how such a system could work in detail.
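Greene’s paper describes implementing this with software containers and continuous integration; as a simplified, language-level stand-in, the hedged Python sketch below shows one piece of the idea: checking that the current environment matches the exact dependency versions published with an analysis before rerunning it. The package names and pinned versions are hypothetical.

```python
import sys
from importlib import metadata

# Hypothetical pinned environment published alongside an analysis. In practice
# this would come from a container image or lock file, not be written by hand.
PINNED_VERSIONS = {
    "numpy": "1.26.4",
    "pandas": "2.2.2",
}


def check_environment(pins: dict[str, str]) -> list[str]:
    """Return a list of mismatches between pinned and installed package versions."""
    problems = []
    for package, expected in pins.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed (expected {expected})")
            continue
        if installed != expected:
            problems.append(f"{package}: installed {installed}, expected {expected}")
    return problems


if __name__ == "__main__":
    mismatches = check_environment(PINNED_VERSIONS)
    if mismatches:
        print("Environment differs from the published analysis environment:")
        print("\n".join(f"  - {m}" for m in mismatches))
        sys.exit(1)
    print("Environment matches the published versions; results should be comparable.")
```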
This sort of dynamic approach to software is widely used in computer science, but remains a relatively novel concept among biologists. Nevertheless, as Nature reported earlier this year, some researchers see the transition to an era in which “scientists will no longer have to worry about downloading and configuring software” as only years away.
Until then, Pachter has advice for other tool developers. “Do as I’ve done, on Twitter and elsewhere, in public talks and statements,” he says. “Make a point of taking the time to tell people, ‘I have this tool, it’s very popular. Please don’t use it anymore.’”