Bioinformatics researchers who align long protein sequences face a difficult choice: They can get accurate results in hours (sometimes days), or quick results if they're willing to sacrifice accuracy. Now, Robert C. Edgar, an independent researcher in Mill Valley, Calif., offers an alternative: a multiple sequence alignment algorithm that delivers both high accuracy and speed. Edgar calls his new algorithm MUSCLE (multiple sequence comparison by log expectation).
MUSCLE uses a log expectation score to guide progressive alignment of the input amino acid sequences. This avoids the need to explicitly generate multiple alignments, drastically reducing the complexity of the problem (particularly with long sequences). As a result, the algorithm is both fast and accurate. Evaluated on the BAliBASE benchmark (which includes 142 reference alignments and more than 1,000 sequences), MUSCLE delivers a higher score than T-Coffee (until now considered the most accurate technique) while requiring only 20 seconds...
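
As an illustration of the scoring idea, the sketch below computes a log-expectation score for a single pair of profile columns. It assumes the commonly cited form of the score: the log of the expected odds ratio over all residue pairs, scaled by the non-gap fraction of each column. The article itself does not spell out the formula, so the function name, the two-letter alphabet, and the probabilities here are hypothetical toy values, not MUSCLE's actual parameters.

```python
import math

def log_expectation(fx, fy, gap_x, gap_y, joint, background):
    """Toy log-expectation score for two profile columns x and y.

    fx, fy      -- residue frequencies in each column, normalized over
                   non-gap residues (assumption of this sketch)
    gap_x/gap_y -- fraction of gaps in each column
    joint       -- joint probability of seeing residues (i, j) aligned
    background  -- background probability of each residue
    """
    expectation = 0.0
    for i, fxi in fx.items():
        for j, fyj in fy.items():
            # Odds ratio: how much more often i and j are aligned than chance predicts.
            expectation += fxi * fyj * joint[(i, j)] / (background[i] * background[j])
    # Columns with many gaps contribute proportionally less to the score.
    return (1.0 - gap_x) * (1.0 - gap_y) * math.log(expectation)

# Hypothetical two-letter alphabet; real use would take the 20 amino acids
# and probabilities derived from a substitution matrix.
background = {"A": 0.5, "B": 0.5}
joint = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}

same = log_expectation({"A": 1.0}, {"A": 1.0}, 0.0, 0.0, joint, background)
diff = log_expectation({"A": 1.0}, {"B": 1.0}, 0.0, 0.0, joint, background)
gappy = log_expectation({"A": 1.0}, {"A": 1.0}, 0.5, 0.0, joint, background)
print(same, diff, gappy)
```

In this toy setup, matching columns score above zero, mismatched columns score below zero, and a column that is half gaps earns half the score of its gap-free counterpart. A progressive aligner would sum such column scores inside a dynamic-programming alignment of two profiles.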