Researchers use BLAST to search previously characterized DNA or protein sequences for partial or total matches. For the last 12 years, Pasadena, Calif.-based Paracel
"Unlike NCBI [National Center for Biotechnology Information] BLAST, Paracel BLAST is designed as a parallel application. Both can be run on clusters, but we can run a single job in parallel while NCBI can only run one job per processor," says Marc Raeffell, Paracel's senior manager of R&D.
Raeffell says Paracel BLAST can eliminate many of the bottlenecks in NCBI BLAST that cause problems with large sequences. "Typically NCBI gets slow at the 10^5 base range, and completely fails in the 10^6 to 10^7 range, which Paracel BLAST searches without problems," says Raeffell.
But Tom Madden, an NCBI staff scientist, counters that the current version of NCBI BLAST breaks up the databases into segments that are distributed across a group of machines running in parallel. "NCBI currently runs 150,000 searches a day, at peaks of 180 searches a minute, on a distributed system of CPUs," Madden writes in an E-mail. The NCBI hopes to make this version of BLAST publicly available in about six months, Madden says.
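The database-segmentation approach Madden describes can be illustrated with a toy sketch. It is not NCBI's or Paracel's code: the function names, the sample records, and the use of exact substring matching in place of BLAST's alignment heuristics are all assumptions made for illustration, and threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split_database(records, n_segments):
    """Deal (name, sequence) records round-robin into n_segments lists."""
    return [records[i::n_segments] for i in range(n_segments)]


def search_segment(segment, query):
    """Return (name, offset) for each record whose sequence contains query.

    Exact substring search is a stand-in for a real alignment step.
    """
    hits = []
    for name, seq in segment:
        pos = seq.find(query)
        if pos != -1:
            hits.append((name, pos))
    return hits


def distributed_search(records, query, n_workers=4):
    """Search every segment concurrently, then merge the per-segment hits."""
    segments = split_database(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        per_segment = pool.map(lambda seg: search_segment(seg, query), segments)
        merged = [hit for hits in per_segment for hit in hits]
    return sorted(merged)


# Hypothetical sample data, not taken from any real database.
database = [
    ("seq1", "ACGTACGTGA"),
    ("seq2", "TTGACCA"),
    ("seq3", "GGGACGTTT"),
    ("seq4", "CCCCCC"),
    ("seq5", "ACGT"),
]

print(distributed_search(database, "ACGT"))
```

Because each segment is searched independently, the only coordination needed is the final merge, which is why this layout spreads naturally across a group of machines.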
According to Raeffell, Paracel's 64-bit version will enable individual nodes in the cluster to address up to 64 GB of local RAM, while the current 32-bit version is limited to 4 GB. This will allow 64-bit Paracel BLAST to perform more complex alignments, and in some cases will offer better performance.
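The 4 GB ceiling follows directly from address width, as a short calculation shows. The sketch below assumes flat byte addressing and treats the article's "GB" as binary gigabytes (2^30 bytes); the helper names are illustrative.

```python
GIB = 2 ** 30  # bytes in one binary gigabyte


def addressable_gib(address_bits: int) -> int:
    """Memory reachable with byte addresses of the given width, in GiB."""
    return (2 ** address_bits) // GIB


def bits_needed(gib: int) -> int:
    """Smallest address width that can reach every byte of `gib` GiB."""
    return (gib * GIB - 1).bit_length()


print(addressable_gib(32))  # 32-bit addresses reach 4 GiB
print(bits_needed(64))      # 64 GiB needs 36 address bits
print(addressable_gib(64))  # theoretical reach of full 64-bit addresses
```

The 64 GB figure Raeffell quotes is therefore well below what 64-bit addresses could reach in principle. The article does not say where that ceiling comes from.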
Paracel, which to date has been selling Intel-based clusters, decided with this release to support AMD's Opteron rather than Intel's competing Itanium processor chip to meet customer demand for 64-bit versions needed to support very large databases. "We see Opteron offering much better price [and] performance than competing platforms. BLAST is an integer application, and Itanium offers better performance only in floating-point," says Raeffell.
A typical Paracel cluster includes 16 dual-processor nodes (a total of 32 processors), each with 2 GB of RAM and a 60-GB hard disk. The nodes are connected to a central disk array with a total capacity of around 1 terabyte (1,000 GB), at a cost of "around $50,000," according to Raeffell.
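The aggregate resources of such a cluster are easy to tally. The sketch below assumes the 2 GB of RAM and 60-GB disk are per node, which is how the description reads; the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Per-node figures for a homogeneous cluster (assumed layout)."""
    nodes: int
    cpus_per_node: int
    ram_gb_per_node: int
    disk_gb_per_node: int

    def total_cpus(self) -> int:
        return self.nodes * self.cpus_per_node

    def total_ram_gb(self) -> int:
        return self.nodes * self.ram_gb_per_node

    def total_local_disk_gb(self) -> int:
        return self.nodes * self.disk_gb_per_node


# Figures quoted in the article for a typical configuration.
typical = Cluster(nodes=16, cpus_per_node=2, ram_gb_per_node=2, disk_gb_per_node=60)

print(typical.total_cpus())           # 32 processors
print(typical.total_ram_gb())         # 32 GB of RAM across the cluster
print(typical.total_local_disk_gb())  # 960 GB of local disk, excluding the central array
```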
- John D. Ruley