Scientists Abandon their Software

Sam Jaffe
Feb 15, 2004
Last summer, a member of the biology department of the University of Udine in Italy approached Nicola Vitacolonna with an intriguing project. The ANREP program, which annotates structural motifs in gene or protein sequences, was out of date, having been written more than a decade ago. Molecular biologists still used it, but it was so slow that a straightforward multiple search could take all night on a desktop PC. The Udine biologist wanted Vitacolonna, a postdoctoral fellow in computational biology, to write a program that could do the job more quickly.

For the next six months, Vitacolonna made the project his main work. In March, he will present the final release of his new program, SmartFinder, which promises to speed up searches by a factor of ten. He will complete his postdoc position this summer, and he hopes to get a new job shortly thereafter. Until then, Vitacolonna will...

ORPHANED CODE

Vitacolonna isn't alone in abandoning a good project at the end of his postdoc term. It's a ritual of many a bioinformatician. If a person's postdoc project is wildly successful and published in multiple journals, he or she can continue overseeing the program in a junior professor job. If it's just ho-hum work, the creator will probably move on to more interesting projects on the next step up the career ladder.

In the field of bioinformatics, this process, which leaves behind programs known as abandonware, has a debilitating impact. Postdocs and graduate students write code, release it into cyberspace under an open-source license, and then move on to the next innovation. Meanwhile, new grad students and postdocs don't want to work on a project that already has a solution, even if nobody is fixing that solution's bugs or supporting its users. The discipline cherishes innovation and creativity, not sound technical support.

Moreover, no private company wants to sell a commercial version of a program that is already available for free. "It's one of the biggest challenges in the field of bioinformatics," says Sean Eddy, a Howard Hughes Medical Institute investigator and an associate professor at Washington University in St. Louis. "There's thousands of programs floating around orphaned by their creators, [which] can't help anybody and tend to clog up software development in critical areas."

The problem stems from biology's dependence on the free flow of information and the sharing of lab equipment, reagents, and protocols. The very idea of a proprietary program whose source code is kept secret and which is sold for profit is anathema to the entire academic biocomputing community. "The tendency is greater in biology than in chemistry or physics for researchers to rely on free software," says Don Gilbert, a bioinformatist at the Center for Genomics and Bioinformatics at Indiana University, Bloomington. He says that biology's addiction to no-cost software is killing the industry, because it squelches the ability of small startups to launch new bioinformatics projects.

Another reason for all the abandonware has to do with the very nature of bioinformatics: it is geared toward solving specific problems rather than providing permanent solutions to general problems. "Solving a computational challenge is a career; servicing a preexisting program is just an engineering task," Gilbert says.

Gene Myers, professor of computer science at the University of California, Berkeley, agrees and actually recommends that his junior lab staff steer clear of doing too much support on finished projects. A living legend in bioinformatics, Myers helped to decode the human genome at Celera Genomics. "In academia there is no reward structure for maintaining and supporting a piece of software," Myers says. "It would not be good for the postdoc's career and I would not advise them to do so. There's just no academic incentive for it."

Meanwhile, the clutter of older, half-useful programs can hamper future software development. "I came across a lot of programs that I wanted to use, but they had no documentation and no support," says Glen Otero, the founder of Calident Software, which oversees its own distribution of Linux called BioBrew. "It's hard for someone else to start working on a similar project when it already exists, even if it's not really useful to anyone."

PENNIES FOR PRESERVATION

There isn't much of a financial incentive, either. Few bioinformatics-only companies remain standing, and most industry jobs are within larger biotech or pharmaceutical companies. And don't let the myth of bioinformatics as a hot field fool you. According to The Scientist's most recent salary survey,[1] the median salary of US bioinformatics PhDs actually fell from $80,000 a year in 2002 to $76,800 in 2003. "Bioinformaticians are the first ones to the chopping block when a drug company needs to cut somewhere," says Joe Landman, who founded his own consulting firm, Scaled Informatics.

That's not to say that it's impossible to take these biology tools into the marketplace. Chris Xie is living proof. As a graduate student at the University of California, Riverside, he says, he had a dream one night about a new approach to creating powerful networks. He went back to his office and wrote a Java program that scientists without information technology experience could use to build their own distributed computing networks for solving complex bioinformatics problems. Xie has since left graduate school and turned his program into a suite of software for Java-based distributed computing, called GreenTea. He now employs 25 programmers in the United States and China and has attracted venture capital from China. He concentrates on selling his software to drug companies, but in the spirit of the field, he still provides it free to university biologists.

Xie is the exception to the rule, however. It is very difficult to make a bioinformatics company succeed financially, and few inside academia or industry want to take such chances with their careers. That's why the National Institutes of Health has recently taken a step to promote maintenance of bioinformatics programs.

In July 2002, NIH unveiled a new grant program that pays young scientists to update and maintain their bioinformatics tools. "This is the first time we've offered money to maintain and strengthen preexisting programs," says Yuan Liu of the National Institute of Neurological Disorders and Stroke. But the pool is limited to $3.5 million.

Washington University's Eddy recommends a more dramatic change to the structure of the world bioinformatics system. "I think it's time to start thinking about extending the length of post-docs in our field so that they have time to identify a problem, create a solution and then get all the bugs out," he says. "It's not like the rest of biology where you prove a hypothesis and you're done. For a bioinformatics program to work well, you need to keep repeating the experiment over and over until all the bugs are out."

Sam Jaffe can be contacted at sjaffe@the-scientist.com