Analyzing protein and DNA sequences has become a daily routine in most life-science laboratories. But today's scientists expect more from sequence-analysis software than just motif identification and sequence alignments. They want fully integrated workflows. They want to be able, for example, to run queries against several databases, design oligonucleotide primers based on the results, design a cloning experiment, and then order the primers – all using one seamless suite of programs.

Programs that do the basic groundwork, such as BLAST and other sequence-analysis algorithms, have just about reached their performance peak, according to Vivien Benazzi, director for bioinformatics R&D at Invitrogen in Carlsbad, Calif. As a result, software vendors are scrambling to provide added features to attract customers. Two trends currently dominate the market: open-access design to allow greater customization, and increased user friendliness (especially enhanced user interfaces that work seamlessly with other programs in the workflow).

These trends...


Another trend, tighter integration of software suites, is designed to simplify sequence analysis. Says Bob Gross, president of West Lebanon, NH-based Textco Biosoftware, "Having worked on mainframes and stared at green letters on a black background, I find it particularly satisfying to be able to fly cross-country at 35,000 feet and work on a complex sequence-analysis project on my trusty laptop."

Oracle Database 10g, the latest release of what is arguably the world's most widely used database software, has BLAST searches integrated into the very heart of the database, reducing the need to export ever-expanding sequence datasets to analytical servers. "The Oracle data mining (ODM) option brings BLAST to the data, rather than the other way around," says Susie Stephens, a principal product manager for life sciences at Oracle in Redwood Shores, Calif.


Courtesy of Invitrogen

Vector NTI Advance from Invitrogen is one of the most popular sequence-analysis packages. Featuring tight integration with Invitrogen services, the software supports cloning "in silico" with Invitrogen's proprietary Gateway technology and can even track and order the reagents necessary to perform the experiment.

The database enables users to retrieve sequences of interest by supporting sophisticated searches that filter data based on sequence annotations and sequence homology. "Combined with the other analytical capabilities in the database, such as regular expression searches, statistics, and further data-mining algorithms, the database provides a strong analytical engine for drug discovery. Also, queries can be performed in batch, or automated with the use of scheduling tools," says Stephens.

The Lasergene sequence-analysis software package from DNAStar of Madison, Wis., contains six tightly integrated modules, which enable sequence alignment, plasmid construction, gene discovery, and sequence assembly, as well as protein structure prediction and primer design. The full integration of these modules ensures that changes in a file in one module are automatically updated in files in all the other modules. Modular program design also simplifies the dissemination of bug fixes and new features, and it potentially cuts costs by allowing users to purchase only the features they need.

GeneSolve by Stratagene of La Jolla, Calif., provides restriction-site digestion, motif searching for prokaryotes and eukaryotes, simulated gel mapping, and 3-D structural visualization. Integration is a key feature. Stratagene claims that using GeneSolve in conjunction with IobionLab Protein-Solve software allows seamless, integrated analysis of nucleic acid sequences and their protein counterparts simultaneously. (Iobion Informatics of La Jolla, Calif., is a Stratagene affiliate.)

Textco's Gene Inspector integrates more than 60 analysis tools (including sequence alignment, restriction mapping, and antigenicity and structure prediction) with a powerful "notebook." This notebook acts as an output for the data, but it also stores a record of the analysis that can be repeated with new sequences – a form of pipeline. "No longer are [users] in the position of trying to compare two different analysis runs on different sequences using different parameters, a situation that is far too common and is nearly impossible to interpret accurately," says Gross.


This trend towards ever-increasing integration is also evident in the design and evaluation of new clones and primers. New programs replace computer-centric metaphors such as "cut and paste" with biology-based ones. Thus, researchers can select a fragment bounded by restriction sites, "cut" it with the specified enzymes, and then "ligate" it to another fragment to create a new clone.
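The cut-and-ligate metaphor can be sketched in a few lines of Python. This is an illustrative toy, not a description of how any product named here works: the function names `digest` and `ligate`, the example sequence, and the vector arms are hypothetical, sequences are treated as linear single-stranded strings, and sticky-end compatibility is not checked. The EcoRI and BamHI recognition sites and cut positions are the standard ones.

```python
# Toy model of "cut" and "ligate" on linear, single-stranded sequence strings.
# Each enzyme maps to (recognition site, offset within the site where the
# top strand is cut).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "BamHI": ("GGATCC", 1),  # G^GATCC
}


def digest(seq, enzyme_names):
    """Return the fragments produced by cutting seq with every named enzyme."""
    seq = seq.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def ligate(*fragments):
    """Join fragments end to end (no sticky-end compatibility check)."""
    return "".join(fragments)


# Hypothetical source molecule with one EcoRI site and one BamHI site.
insert_source = "TTTTGAATTCAAACCCGGGAAAGGATCCTTTT"
fragments = digest(insert_source, ["EcoRI", "BamHI"])

# The middle fragment is the piece bounded by the two restriction sites;
# "ligate" it between two made-up vector arms to build a new clone.
excised = fragments[1]
clone = ligate("CCCCG", excised, "GATCCCCCC")
print(fragments)
print(clone)
```

A real cloning tool would also track both strands, overhang types, and circular molecules; the sketch only shows why the biological vocabulary maps naturally onto simple sequence operations.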

Vector NTI Advance expands on that idea with support for parent company Invitrogen's Gateway cloning technology, which uses recombinases rather than restriction enzymes. The program can generate a step-by-step cloning protocol and track all the necessary reagents; it can even order reagents from Invitrogen or any other supplier. Similarly, in Accelrys' new release of MacVector (version 7.2), the user can now click directly on restriction enzyme sites on a graphical map and then copy and paste fragments of DNA from one molecule into another.

Three programs from Scientific & Educational Software of Cary, NC, which cover clone and primer design and multiple sequence alignments, have been merged into a single integrated suite called Clone Manager Suite 7. A central program hub, called Sci Ed Central, manages integration by controlling file handling, molecule viewing, sequence editing, and general utility functions.

MiraiBio's DNASIS MAX is another well-integrated program suite. Not only does the program perform local and remote homology searches, it also sports an annotation editor that allows the analysis results (such as ORFs, primers, or hydrophobic regions) to be saved as annotations alongside the sequence. Editing of multiple sequences is made possible by displaying results in a single window. Site searching for restriction enzymes allows the user to find the best enzymes to excise a selected sequence area. A motif search displays the DNA sequence and the translated amino acid sequence in a result window, and common motifs that are found in multiple sequences can be detected.

Increased integration and extensibility are also prominent in publicly available software tools. The European Bioinformatics Institute's (EBI's) public database site (http://www.ebi.ac.uk) has long offered a comprehensive package of analysis tools free of charge. Since professional Web designers were hired three years ago to improve the site's user friendliness, usage has soared. Rodrigo Lopez, EBI's head of external services, reports growth from about two million hits per month three years ago to a current average of 1.6 million per day.

"To understand the trends in sequencing software, you only have to look at who is using the tools," says Lopez. "Ten years ago a university professor would have been the only one in [the] lab to hold an account. Now, everyone is using the tools. Today's user demands more than just technical reports such as sequence alignments; current developments are towards increased user friendliness."

In keeping with the movement towards integration, EBI is promoting its new Simple Object Access Protocol (SOAP)-based tools, which allow third-party software to query its tools and databases automatically. Indeed, the entente between public and private services is decidedly congenial. "We always have the most up-to-date and the most exact data," says Lopez. "They have the nicest tools."




As BLAST reaches its performance peak, new search algorithms are being developed. PatternHunter from Bioinformatics Solutions in Ontario uses spaced-seed technology to allow more sensitive homology searches. BLAST uses seeds: short sequences whose match signals to the algorithm that the current sequence segment may form part of a wider match. In BLAST, every position of the seed must match the target sequence, whereas PatternHunter uses spaced seeds, which require matches only at selected positions within the seed. Tolerating mismatches at the remaining positions lets the search detect similarities that a contiguous seed would miss.
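The difference between contiguous and spaced seeds can be illustrated with a minimal Python sketch. It is not PatternHunter's or BLAST's implementation: the function `seed_hits`, the two short sequences, and the masks `1111` and `110101` are made up for illustration (real seeds are longer). Both masks have four "must match" positions, so they are equally strict about how many letters must agree; only the layout differs.

```python
def seed_hits(query, target, mask):
    """Return (i, j) pairs where query[i:] and target[j:] agree at every
    position marked '1' in mask. Positions marked '0' are ignored."""
    span = len(mask)
    care = [k for k, c in enumerate(mask) if c == "1"]

    # Index every window of the target by its letters at the '1' positions.
    index = {}
    for j in range(len(target) - span + 1):
        key = "".join(target[j + k] for k in care)
        index.setdefault(key, []).append(j)

    # Look up every window of the query in that index.
    hits = []
    for i in range(len(query) - span + 1):
        key = "".join(query[i + k] for k in care)
        for j in index.get(key, []):
            hits.append((i, j))
    return hits


# Two hypothetical sequences that are similar around position 2 but differ
# at every other letter or so, leaving no run of four identical letters.
query = "TTACGTACAA"
target = "GGACTTGCGG"

contiguous = seed_hits(query, target, "1111")    # all four positions adjacent
spaced = seed_hits(query, target, "110101")      # same weight, with gaps
print(contiguous)
print(spaced)
```

In this example the contiguous seed finds nothing, while the spaced seed reports a hit where the two sequences share letters at the masked positions. A full search program would then extend and score each hit; that step is omitted here.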

PatternHunter is also fast. In work with the Mouse Genome Sequence Consortium, the program compared the mouse genome with the human genome in 20 CPU-days. BLAST would need at least 20 CPU-years to do the same job at the same sensitivity, a difference of roughly 365-fold.[1]

Meanwhile, traditional searches and alignments are getting easier, thanks to advances in software interfaces. Vector NTI's graphical sequence-analysis components present search and alignment results, along with their annotations, in a variety of graphic formats. Similarly, StarBlast, a product of DNAStar, is a three-component system consisting of a server, a data manager, and a Web browser-based client.

PepTool from BioTools in Edmonton, Canada, parses search results from a number of databases and includes tools for annotating sequences, multiple alignments, expected protease cleavage points, and posttranslational modification analysis. Jellyfish from LabVelocity of Burlingame, Calif., can import sequences directly from a database, allowing users to drag and drop sequences straight into an alignment.

Old favorites are also undergoing improvements. The latest release of MacVector adds support for fast user switching, a feature found only in Mac OS X version 10.3. Accelrys' senior product manager, Kevin Kendall, says, "The key for us is not necessarily to incorporate the latest bleeding-edge bioinformatics algorithms, but to provide simple interfaces that enable users to solve their everyday bioinformatics problems in a straightforward, no-hassle way." To provide this same power to PC users, Accelrys offers DS Gene, which is basically a PC front-end to MacVector. DS Gene's primer-design module has been completely rewritten to improve user interactivity and the display of results. For instance, individual primers can now be selected and highlighted in the parent sequence to simplify annotation.


Finding transcription factor binding sites in a DNA sequence is important in view of escalating interest in the transcriptome. MatInspector from Genomatix in Munich uses a library of matrix descriptions for transcription factor binding sites to locate matches in sequences of unlimited length. Similar and/or functionally related sites are grouped into so-called matrix families.
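Matrix-based site searching in general can be sketched as follows. This is a generic position weight matrix scan, not MatInspector's actual scoring scheme: the count matrix, pseudocount, background frequency, threshold, and the names `log_odds` and `scan` are all assumptions made for illustration, and the input is assumed to contain only uppercase A, C, G, and T.

```python
import math

# Hypothetical count matrix for a 4-bp binding site: one column of base
# counts per site position (consensus ACGT).
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def log_odds(counts, pseudocount=0.5, background=0.25):
    """Convert base counts to log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(seq, matrix, threshold):
    """Slide the matrix along seq and report windows scoring >= threshold."""
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        score = sum(matrix[k][base] for k, base in enumerate(window))
        if score >= threshold:
            hits.append((i, window, round(score, 2)))
    return hits


matrix = log_odds(COUNTS)
hits = scan("TTACGTGGACGTAA", matrix, threshold=5.0)
for position, site, score in hits:
    print(position, site, score)
```

Because each window is scored independently, the scan runs in time proportional to sequence length, so the sequence itself can be arbitrarily long. Grouping related matrices into families would amount to keeping a dictionary of such matrices keyed by family name.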

A variety of commercial and free tools simplify analysis of DNA sequencer output. Sequencher from Gene Codes of Ann Arbor, Mich., is possibly today's most popular software for DNA sequencing. Its fast contig assembly is coupled to a set of user-friendly editing tools that support restriction enzyme mapping, heterozygote detection, cDNA-to-genomic DNA large-gap alignment, confidence scoring, comparative sequencing, and ORF, motif, and SNP analysis. Sequencher has also been used in applications as diverse as mutation detection and forensics.

The GenomBench module, part of Vector NTI Advance, enables the download, viewing, analysis, and annotation of local copies of reference genomic DNA sequences. DNATool from BioTools allows multiple alignments of DNA data, as well as an integrated viewer to allow PDB views of DNA structure and fast dot-plot analyses.

With ever-increasing feature sets, easier use, and expanding opportunities for customization, academic and corporate scientists are finding that the power of the genome is getting ever easier to tap. Says Benazzi, "Once you are connected, you've got a quick workflow."
