
In the spring of 2013, around 180 scientists who had recently published computational studies in Science received an email from a Columbia University student asking for the code underpinning those pieces of research. Despite the journal having a policy mandating that computer code be made available to readers, the email prompted a range of responses. Some authors refused point-blank to share their code with a stranger, while others reacted defensively, demanding to know how the code would be used. Many, though, simply wrote that they preferred not to share, admitting that their code wasn’t “very user-friendly” or was “not written with an eye towards distributing for other people to use.”

Unbeknownst to the authors, the code requests were part of a study by Columbia University researchers focusing on reproducibility in science, who would go on to publish several of the responses they received. Of 204 randomly chosen studies published in 2011 and 2012, the Columbia team could only obtain the code for 44 percent—from 24 studies in which the authors had provided data and code upfront, and thus didn’t need to be contacted, and 65 whose authors had shared it with the student upon request. The researchers often couldn’t run the code they did receive, though, as it would have required additional information from authors and special expertise they didn’t possess. Overall, the team could only reproduce the original published results for 26 percent of the 204 studies, they reported in a 2018 PNAS study.

Authors’ hesitation around code-sharing didn’t surprise Jennifer Seiler, who was at the time part of the Columbia team and is now a senior engineer at the Bethesda, Maryland–based systems engineering and software development company RKF Engineering Solutions. Beyond any sinister motives—like trying to conceal fraud or misconduct—Seiler says that some authors might be afraid that sharing their code would allow other scientists to scoop them on their next research project. In many other cases, she suspects, scientists simply don’t have the skill or incentive to write their code in a way that would be usable for other researchers. Many are probably embarrassed over badly written, inefficient, or generally unintelligible code, she says. “I think more often it’s shame than it is data manipulation or anything like that.” 


Without the code underlying studies—used to execute statistical analyses or build computational models of biological processes, for instance—other scientists can’t vet papers or reproduce them and are forced to reinvent the wheel if they want to pursue the same methods, slowing the pace of scientific progress. Altogether, “it’s probably billions of dollars down the drain that people are not able to build on existing research,” Seiler says. Although many scientists say the research community has become more open about sharing code in recent years, and journals such as Science have beefed up their policies since Seiler’s study, reluctance around the practice persists. 

Compared to laboratory protocols where there’s long been an expectation of sharing, “it’s just recently that we’re starting to come around to the idea that [code] is also a protocol that needs to be shared,” notes Tyler Smith, a conservation biologist at Agriculture and Agri-Food Canada, a governmental department that regulates and conducts research in food and agriculture. He too has had trouble getting hold of other groups’ code, even when studies state that the files are “available on request,” he says. “If the code isn’t published online with the article, your chances of getting someone to respond, in my experience, have been slim to none.”

Poor incentives to keep code functioning

Much of the problem with code-sharing, Smith and others suggest, boils down to a lack of time and incentive to maintain code in an organized and shareable state. There’s not much reward for scientists who dig through their computers for relevant files or create reliable filing systems, Smith says. They also may not have the time or resources to clean up the code so it’s usable by other researchers—a process that can involve formatting and annotating files and tweaking them to run more efficiently, says Patrick Mineault, an independent neuroscientist and artificial intelligence researcher. The incentive to do so is especially low if the authors themselves don’t plan on reusing the code or if it was written by a PhD student soon to move on to another position, for instance, Mineault adds. Seiler doesn’t blame academic researchers for these problems; amid writing grant proposals, mentoring, reviewing papers, and churning out studies, “no one’s got time to be creating really nice, clean, well-documented code that they can send to anyone that anyone can run.”

Stronger journal policies could make researchers more likely to share and maintain code, says Sofia Papadimitriou, a bioinformatician at the Machine Learning Group of the Université Libre de Bruxelles in Belgium. Many journals still have relatively soft policies that leave it up to authors to share code. Science, which at the time of Seiler’s study only mandated that authors fulfill “reasonable requests” for data and materials, strengthened its policies in 2017, requiring that code be archived and uploaded to a permanent public repository. Study authors have to complete a checklist confirming that they’ve done so, and editors and/or copyeditors handling the paper are required to double-check that authors have provided a repository link, says Valda Vinson, executive editor at Science. While Vinson says that initially, authors occasionally complained to the journal about the new requirement, “I don’t think we get a whole lot of pushback now.” But she acknowledges the system isn’t bulletproof; a missing code file might occasionally slip past a busy editor. Smith adds that he’s sometimes struggled to find a study’s underlying code even in journals that do require authors to upload it. 

Papadimitriou says that more journals should encourage, or even require, reviewers to double-check that code is available, or even examine it themselves. In one study she and her lab recently reviewed, for example, the code couldn’t be downloaded from an online repository due to a technical issue. The second time she saw the paper, she found an error in the code that she believed changed the study’s conclusions. “If I didn’t look at it, nobody would have noticed,” she says. She reported both problems to the relevant editors—who had encouraged reviewers to check papers in this way—and says that study was ultimately rejected. But Papadimitriou acknowledges that scrutinizing code is a lot to ask of reviewers—typically practicing scientists who aren’t compensated for their reviews. In addition, it’s particularly hard to find reviewers who are both knowledgeable enough about a particular topic and proficient enough at programming to comb through someone else’s code, Smith adds. 

While firmer stances from journals may help, “I don’t think we’re going to get out of this crisis of reproducibility simply with journal policies,” Seiler says. She also sees a responsibility for universities to provide scientists with resources such as permanent digital repositories where code, data, and other materials can be stored and maintained long-term. Institutions could help lighten the burden for large research groups by hiring research software engineers—professional developers specializing in scientific research—adds Ana Trisovic, a computational scientist and reproducibility researcher at Harvard University. During Seiler’s PhD in astrophysics at the Max Planck Institute for Gravitational Physics in Germany, her research group had a software developer who built programs they needed as well as organizational systems to archive and share code. “That was extremely useful,” she says.

A lack of coding proficiency

There’s another big component to the code-sharing issue. Scientists who do most of the coding in studies—frequently graduate students—are typically self-taught, Mineault notes. In his experience as a mentor and teacher, students can be very self-conscious about their less-than-perfect coding skills and are therefore reluctant to share clunky code that is possibly riddled with bugs they’d rather nobody find. “There’s often a great sense of shame that comes from not having a lot of proficiency in this act of coding,” Mineault says. “If they’re not required to [share] it, then they probably wouldn’t want to,” adds Trisovic.  

A recent study by Trisovic and her colleagues underscored the challenges of writing reproducible code. The team crunched through 9,000 code files written in the programming language R, along with their accompanying datasets, that had been posted to the Harvard Dataverse, a public repository for materials associated with various scientific studies. The analysis revealed that 74 percent of the R scripts failed to complete without an error. After the team applied a program to clean up small errors in the code, that number only dropped to 56 percent.

Some of the failures were due to simple problems, such as a script looking for a data file at a path hard-coded for the author’s own computer, something that had to be changed before the code would run on other machines. The biggest obstacle, however, was an issue particularly acute in R, where code files often call on multiple interdependent software “packages,” such that the functioning of one package is contingent on a specific version of another. In many cases, Trisovic’s group was running the code years after it had been written, so some since-updated packages were no longer compatible with others. As a result, the team couldn’t run many of the files. In R, “you can very easily have this dependency hell where you cannot install [some library] because it’s not compatible with many other ones that you also need,” Trisovic says.
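The hard-coded-path failure mode is straightforward to avoid. As a minimal sketch (the file and directory names here are hypothetical, and the same idea applies in R or any other language): build paths relative to the project directory rather than baking in one machine’s layout.

```python
from pathlib import Path

# Fragile: an absolute path that exists only on the original author's machine.
BAD_PATH = "C:/Users/alice/project/data/results.csv"

# More portable: keep data files in a directory inside the project and build
# paths with pathlib, so the script runs wherever the repository is copied.
DATA_DIR = Path("data")

def data_path(filename):
    """Return the project-relative location of a data file."""
    return DATA_DIR / filename

print(data_path("results.csv").as_posix())  # → data/results.csv
```

Anyone who clones the project then only needs the `data` directory in place, with no edits to the script itself.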

While there are ways to address this issue by documenting which package versions were used, the continual development of software packages is a challenge to creating reproducible code, even for skilled programmers, Mineault notes. He recalls the experience of a colleague, University of Washington graduate student Jason Webster, who decided to try to reproduce a computational analysis of neuroimaging data published by one of Mineault’s colleagues. Webster found that, just a few months after the study’s publication, the code was practically impossible to run, mainly because packages had changed in Python, the programming language used. “The half-life of that code, I think, was three months,” Mineault recalls. How reproducible one scientist’s code is, Trisovic says, can sometimes depend on how much time others are willing to invest in understanding and updating it—which, she adds, can be a good practice, as it forces researchers to give code more scrutiny, as opposed to running it blindly. 
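One way to document package versions, sketched here in Python using only the standard library (the function name and the package list are illustrative, not from any particular study), is to snapshot the environment a script was last run with and ship that record alongside the code:

```python
import sys
from importlib import metadata

def snapshot_environment(packages):
    """Record the interpreter and package versions a script was run with,
    so future users can recreate a compatible environment."""
    lines = [f"python {sys.version_info.major}.{sys.version_info.minor}"]
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{name}  # not installed")
    return lines

# Example: print the snapshot; in practice it might be written to a
# requirements.txt file committed next to the analysis code.
print("\n".join(snapshot_environment(["pip"])))
```

Tools such as R’s lockfile-based package managers or Python virtual environments automate the same bookkeeping; the point is that the version record travels with the code.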

In Mineault’s view, moving toward better reproducibility will at the very least require systemic overhauls of how programming is taught in higher education. There’s a widely held belief in science that practice alone will make young scientists better at programming, he says. But coding isn’t necessarily something that people naturally get better at, in the same way that an algebra student won’t discover integral and differential calculus on their own if asked to compute the area under a curve. Rather, some computer science experts have noted that proficiency in coding comes from targeted, structured instruction. Instead of occasional coding classes, “I would like to see a more structured set of programming courses, which are just building up to becoming a proficient programmer in general. Otherwise, I think we’re in too deep too early,” Mineault says. 

Even without institutional changes, there are practices researchers themselves can adopt to build confidence in coding. Scientists could strike up coding groups—for instance, in the form of online, open-source coding projects—to learn from peers, Mineault says. Trisovic recommends that researchers create departmental workshops where scientists walk colleagues through their own code. Within research groups, scientists could also make it a habit to review each other’s code, Trisovic adds; in her study, the code files that had undergone some form of review by external scientists were more likely to run without error. 

Some scientists have also compiled practical advice for researchers on writing reproducible code and preparing it for publication. Mineault recently wrote The Good Research Code Handbook, which includes some practices he learned while working at tech companies Google and Facebook, such as regularly testing code to ensure it works. Mineault recommends setting aside a day after each research project to clean up the code, including writing documentation for how to run it, naming associated files in a sensible way—in other words, not along the lines of “analysis_final_final_really_final_this_time_revisions.m,” he cautions. To truly appreciate how to write reproducible code, Mineault suggests that researchers try rerunning their code a few months after they complete the project. “You are your own worst enemy,” he says. “How many times does it happen in my life that I’ve looked at code that I wrote six months ago, and I was like, ‘I have no idea what I’m doing here. Why did I do this?’” 

There are also software tools that can make writing reproducible code easier—for example, version control systems that track and manage changes to code so that researchers aren’t perpetually overwriting past file versions. The online repository-hosting platform GitHub and the data archive Zenodo have introduced ways of citing code files, for instance with a DOI, which Science and some other journals require from authors. Making research code citable places a cultural emphasis on its importance in science, Trisovic adds. “If we recognize research software as a first-class research product—something that is citable [and] valuable—then the whole atmosphere around that will change,” she says.

Seiler reminds researchers, though, that even if code isn’t perfect, they shouldn’t be afraid to share it. “Most of these people put a lot of time and thought into these codes, and even if it’s not well-documented or clean, it’s still probably right.” Smith agrees, adding that he’s always grateful when researchers share their code. “If you’ve got a paper and you’re really interested in it, to have that [code], to be able to take that extra step and say, ‘Oh, that’s how they did that,’” it’s really helpful, he says. “It’s so much fun and so rewarding to see the nuts-and-bolts side of things that we don’t normally get to.” 


Tips for writing, managing, and sharing code

The Scientist assembled advice from people working with code on how to write, manage, and share files as smoothly as possible.

Manage versions: Avoid overwriting past file versions; instead, use tools to track changes to code scripts so previous iterations can be accessed if needed.

Document dependencies: Keep track of which software packages (and which specific versions) were used in compiling a script; this helps ensure that code can still be used if packages are updated and are no longer mutually compatible.

Test it: Run code regularly to ensure it works. This can be done manually, or automated through specialized software packages.

Clean up: Delete unnecessary or duplicated bits of code, name variables in intuitive ways (not just as letters), and ensure that the overall structure—including indentation—is readable.

Annotate: Help yourself and others understand the code months later by adding comments to the script to explain what chunks are doing and why.

Provide basic instructions: Compile a “README” file to accompany the code detailing how to run it, what it’s used for, and how to install any relevant software.

Seek peer review: Before uploading the code into a repository, have someone else review it to ensure that it’s readable, and look for glaring errors or points that could cause confusion.
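The “test it” and “annotate” advice above can be as lightweight as a commented function with a regression check that is re-run after every change. A toy Python sketch (the function and data are hypothetical, not from any study discussed here):

```python
def mean_measurement(values):
    """Average a list of measurements, ignoring missing (None) entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no usable measurements")
    return sum(observed) / len(observed)

# A lightweight regression test: re-running this after every edit catches
# silent breakage before the code is shared or archived.
def test_mean_measurement():
    assert mean_measurement([1.0, 2.0, 3.0]) == 2.0
    assert mean_measurement([1.0, None, 3.0]) == 2.0

test_mean_measurement()
```

Testing frameworks can run such checks automatically, but even plain assertions like these document what the code is supposed to do—for the next user and for the author’s own future self.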