Left: Courtesy of Pratul K. Agarwal; Right: Courtesy of High Performance Computing Facility, University of Puerto Rico

Users no longer need to remember arcane command-line incantations with Linux; the OS hides its complexities beneath a snazzy user interface. Here, Linux versions of ImageJ, an image manipulation suite (top) and PyMOL, a biomolecular structure visualizer (left) are shown.

On August 25, 1991, Linus Torvalds, a student at the University of Helsinki, posted an innocuous message to comp.os.minix, an Internet newsgroup. "Hello everybody out there using minix," he wrote. "I'm doing a (free) operating system (just a hobby, won't be big and professional like gnu) for 386(486) AT clones." Minix was a small UNIX-like operating system written for teaching, and it was not freely redistributable. Torvalds, who was building a free UNIX-like system of his own, was taking feature requests.

Thirteen years later Torvalds' project, called Linux ("Linus' Minix"), has become a wildly popular alternative to...


"Stability and efficiency are still the main reasons for using Linux," says Israel Nelken, a neurobiologist at Hebrew University, who started using Linux in 1996. At that time, Nelken used the operating system to run a network of computers from Silicon Graphics. Now, Nelken uses it to control all the computers in his lab. "The Linux workstations are used mostly for data analysis, with MATLAB as the main application," he says. "MATLAB under Linux still runs faster than MATLAB under Windows on the same machine, and its memory handling is substantially better." That memory-handling benefit is important to Nelken, because he and his colleagues analyze large databases.

In addition to direct research benefits, Linux helps Nelken in other ways. "A smaller advantage is security and viruses," he says. Many viruses are written to exploit security loopholes in Microsoft's Windows operating system and in Outlook, its E-mail client. But, he says, "We do all our mailing under Mozilla on the Linux machines, and therefore E-mail viruses are not an issue."


Courtesy of Sun Microsystems

Biomedical researchers and scientists at the National Center for Microscopy and Imaging Research and the San Diego Supercomputer Center at the University of California, San Diego, use clustered Sun Java Workstations running Linux and partner technologies to power this 30-megapixel BioWall Tiled Display.

John Yates III, director of the proteomics mass spectrometry lab at the Scripps Research Institute in La Jolla, Calif., also started using Linux around 1996. Today he uses it to run a cluster of 128 Intel Xeon processor nodes. When asked why he selected Linux, he says, "It's free, it's reasonably robust, and it has developed more features over the years."

Linux offers other benefits that can be especially useful for modern life scientists. According to Steffen Möller, a member of the bioinformatics group at the University of Rostock, Germany, "Many bioinformatics tools have evolved under Unix and found a common platform in Linux." Möller also cites the value of being able to use Linux across multiple hardware platforms. He adds, "The software is distributed very easily and consistently, and there is a strong community spirit."

Other research scientists prefer Linux's open-source model. "Linux is more flexible and versatile than Windows because it is infinitely more customizable, and so applications can be written for virtually any purpose," says Christopher Davies, assistant professor at the Medical University of South Carolina, Charleston. He adds, "Importantly, it is open-source, and this facilitates communication between scientists and developers to overcome problems, which leads to faster optimization of applications under Linux."

Davies also has financial reasons for choosing Linux, and not simply that it can be obtained for nothing. To him, the main pecuniary advantage is that he can custom-build task-specific computers at relatively low cost and then run them under Linux. For example, he says, "One computer can include high-performance graphics cards for molecular graphics applications, or the best CPU for CPU-intensive applications, or both."


Trying Linux

If you're curious about Linux but not quite ready to take the plunge, you can try Knoppix (http://knopper.net/knoppix/index-en.html) or one of its many derivatives. Knoppix is a Linux distribution that runs from a bootable CD instead of the hard drive, providing a risk-free way to take the operating system out for a spin.

At least two bioinformatics-themed Knoppix derivatives exist: Bioknoppix (http://bioknoppix.hpcf.upr.edu) and VigyaanCD (http://www.vigyaancd.org). A third bioinformatics distribution, Bio-Linux 4.0 (http://envgen.nox.ac.uk/biolinux.html), is expected this fall.

For more on the Linux operating system itself, visit http://www.linux.org.

Of course, everything has a downside, even Linux. "The free version is not well supported," says Yates. Nelken's complaint is one of compatibility: "The disadvantages are, well, that Linux is not Windows. When we do word processing or when we prepare presentations in the lab, we use MS Office under Windows." He concludes, "To communicate with the world, we need Windows."

The primary complaint is compatibility. "Linux comes in different flavors," says Davies, "hence some programs may work on one variant of Linux but not others." Davies also points out that when a scientist upgrades hardware, Linux drivers to support it may lag behind. And software installation can be taxing, he adds. "Trying to track down various libraries to get a particular program running can be frustrating at times, but overall it's improving."

Others, however, see few disadvantages to using Linux in the lab. Joseph DiRisi, who holds the Tomkins Chair in the department of biochemistry and biophysics at the University of California, San Francisco, says, "There are none that I can justify."


Even just a few years ago, most people thought of Linux as an operating system for computer hackers or dedicated code writers. Today, though, even the largest computer companies are taking part. Loralyn Mears, life sciences market segment manager for Sun Microsystems of Santa Clara, Calif., says, "It has had a huge impact, because life science research in particular is largely fueled by the open-source initiative." She adds, "The freeware operating system opened entirely new possibilities to readily enable collaboration among researchers."

Sun helps meet customer demand for Linux in several ways. First, all of the company's x86-based bioinformatics hardware is Linux-compatible. "All of our Intel- and Opteron-based servers can boot Linux, Solaris, or Windows," Mears says. "We've seen an overwhelming request for free Linux in the life-science community." In response, Sun has contributed more code to the Linux Standard Base than any other life-science information-technology vendor. In addition, Sun's Solaris operating environment is becoming increasingly compatible with Linux.

Philip Papadopoulos, program director of grid and cluster computing at the University of California at San Diego, and his colleagues use Linux-based Sun computers to drive a bank of monitors called a "tiled display wall." The wall contains 20 monitors, each driven by its own dual-processor Sun Java Workstation. This cluster runs a Linux operating system that is managed with the Rocks Clustering Toolkit (developed at the San Diego Supercomputer Center). "Linux allows us a more complete control of the operating-system stack. It is also significantly easier to manage a Linux-based cluster than a Windows-based one," says Papadopoulos.

In essence, the tiled display wall behaves like a 30-million-pixel monitor. "That's 20 times more pixels than on your home monitor," Papadopoulos says, noting that this high pixel count is especially useful for electron microscopy images. "The display allows the scientist to look at the complete image at full resolution without having to pan," he says. Moreover, the images can be manipulated rapidly because the system delivers 160 gigaflops (160 billion floating-point operations per second). The average desktop, says Papadopoulos, runs at about 5.6 gigaflops. The tiled display wall is being used in a wide variety of research projects, including building a complete atlas of the mouse brain and looking for neural structures that might predispose a patient to depression.
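The comparisons above can be checked with back-of-the-envelope arithmetic. The short Python sketch below is illustrative only: it assumes the 30 million pixels and 160 gigaflops are spread evenly across the 20 monitors and their workstations, takes the quoted 5.6-gigaflop desktop figure at face value, and uses a hypothetical function name, `display_wall_ratios`.

```python
def display_wall_ratios(total_pixels=30_000_000, monitors=20,
                        wall_gflops=160.0, desktop_gflops=5.6):
    """Back-of-the-envelope ratios for the tiled display wall.

    Assumption: pixels and compute are divided evenly, with one
    workstation driving each monitor, as described in the text.
    """
    pixels_per_monitor = total_pixels / monitors       # average pixels per screen
    gflops_per_workstation = wall_gflops / monitors    # one workstation per monitor
    speedup_vs_desktop = wall_gflops / desktop_gflops  # whole wall vs. one desktop
    return pixels_per_monitor, gflops_per_workstation, speedup_vs_desktop


if __name__ == "__main__":
    px, per_node, speedup = display_wall_ratios()
    print(f"{px:,.0f} pixels per monitor")
    print(f"{per_node:.1f} gigaflops per workstation")
    print(f"{speedup:.1f}x a single desktop")
```

Under these assumptions each screen carries about 1.5 million pixels, which fits the "20 times" comparison if each tile is roughly one ordinary monitor, and the wall as a whole offers about 28.6 times the quoted desktop throughput.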



Courtesy of Pacific Northwest National Laboratory

Sporting nearly 2,000 processors, this Linux-based supercomputer at the William R. Wiley Environmental Molecular Sciences Laboratory, part of the Pacific Northwest National Laboratory, clocks in at an impressive 11.4 teraflops, which places it ninth on the June 2004 list of the world's top 500 supercomputers (http://www.top500.org).

Joining Sun in support of Linux is IBM. "Linux is ready for the vast majority of applications that computational scientists use," says Dan Frye, vice president of the IBM Linux Technology Center in Beaverton, Ore., who adds that "it has been years" since he met anyone with a life science project the system couldn't handle. Linux's major strength, says Frye, is portability across platforms. "You can get Linux from multiple players and multiple hardware architectures," he says. But he also says that many scientists are perfectly happy with the free versions of the operating system.

Still, IBM finds many ways to work with scientists who use Linux in the lab, and the company offers a wide variety of programs for life science investigators. "We also encourage applications companies to port programs to Linux and tune their applications," Frye adds. When needed, his team can also help get programs running on Linux. "My team's job is to make Linux better," he says. "If there are things that are missing in Linux, our life science team comes in and we improve that."

Part of making Linux better involves outside collaborations. In October 2003, IBM and Accelrys, a software company with facilities in San Diego and Cambridge, UK, joined forces to bring two more software tools to Linux-loving life scientists. First, Accelrys made its Insight II and Catalyst software Linux-ready. (The former provides graphical tools for modeling and analyzing molecular systems; the latter manages a database of three-dimensional information about molecules. In combination, these software packages can tackle problems such as testing the three-dimensional alignment between a drug candidate and a potential molecular target.) IBM then worked with Accelrys to make sure that these software packages would work on the IBM IntelliStation workstation running Red Hat Linux.


Scientists of all stripes have found ways to incorporate Linux into their work. Ravi Iyengar of the Mount Sinai School of Medicine uses Linux on PCs and Sun workstations to model biochemical signaling pathways in cells. In 1999, Masaru Tomita and his colleagues at Keio University in Fujisawa developed their E-CELL simulation software, compatible with both UNIX and Linux, to model the interactions of systems of cells, genes, proteins, and other macromolecules. Over time, the operating system has been applied to other life science challenges, too.

Consider X-ray crystallography, one of the most computationally intensive biological techniques. Davies uses the technique to study the three-dimensional structure of penicillin-binding proteins, antibiotic targets involved in the synthesis of bacterial cell walls. "Many of the applications in crystallography are CPU-intensive, and historically we used high-end UNIX workstations, such as Silicon Graphics, to perform these tasks," he says.

But such workstations are expensive, and PCs are cheap, so Davies switched to Linux. "Because Linux is very similar to UNIX, many of these applications could be ported relatively easily to Linux, thus allowing us to harness the power of today's PCs for crystallography applications at a fraction of the cost," he says.

DiRisi uses Linux in his research, too. "Our primary portal for our malaria transcriptome data is completely built and served on a Linux platform." So far, DiRisi's system has handled more than 600,000 data requests since September 2003. Describing the success of his site and the Linux system, DiRisi says, "All this without a single reboot!"

Stability is nice, but so is speed. At the Pacific Northwest National Laboratory (PNNL) in Richland, Wash., scientists developed an 11.4-teraflop (trillion floating-point operations per second) supercomputer that runs on Linux. Scientists at PNNL use this system for a variety of life-science applications, including one study of how various chemical modifications change the structure of DNA. Scott Studham, technical group leader of computer operations at the lab's molecular science computing facility, says, "When I started this project, I was a firm believer in vendor-supported, proprietary UNIX." Now, after nine months of running Linux on this supercomputer, he says, "I'm a convert to the open-source support model." He adds, "The biggest advantage of Linux is that we can see and modify the code to better meet our users' needs. Also, if you need a tool or a patch, chances are it is going to be available on Linux first."

Nevertheless, Linux isn't for everyone. Retaining vestiges of its hacker ancestry, Linux still requires considerable computer know-how to stay afloat, regardless of vendor support. "You can't just call the vendor anymore if it isn't working," observes Studham. Installing new software remains surprisingly daunting, because much of it is available only as source code that must be compiled before installation. And there are few friendly message boxes to make sure you know what you're doing before you execute a potentially destructive command.

Still, those scientists who use Linux in the lab seem generally satisfied, and express considerable support for the open-source approach. Says Studham: "Linux rocks!"

Mike May mmay@the-scientist.com
