A dictionary for genomes

Written by William Wells

With sequence information in hand, the search for regulatory sites in promoters can be done by computer rather than by cloning. But the primary analysis tools, multiple-alignment algorithms, can handle only a small amount of sequence data. In the August 29 Proceedings of the National Academy of Sciences, Bussemaker et al. introduce an alternative algorithm that they dub 'MobyDick' (Proc Natl Acad Sci USA 2000, 97: 10096-10100). MobyDick treats DNA sequence as text in which allthewordshavebeenruntogether. It attempts to build a dictionary of 'words' by first finding over-represented pairs of letters. Letter frequencies are used to estimate the probability that a given pair occurs by chance, and this estimate guides how larger fragments are then built up. Bussemaker et al. test their algorithm on a space-less version of the first ten chapters of the novel Moby Dick, then attack a list of all of the upstream regions in the yeast ...
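
The first step described above, finding letter pairs that occur more often than letter frequencies alone would predict, can be illustrated with a rough sketch. This is not the MobyDick algorithm as published: the function name `overrepresented_pairs`, the `min_z` threshold, and the binomial z-score used as the chance model are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def overrepresented_pairs(text, min_z=3.0):
    """Return (pair, observed, expected, z) for adjacent letter pairs
    seen more often than independent letter frequencies predict.

    Illustrative only: the z-score below is an assumed stand-in for
    the statistical test used in the paper.
    """
    n_pairs = len(text) - 1
    if n_pairs < 1:
        return []

    # Single-letter frequencies define the "chance" model.
    letter_freq = {c: k / len(text) for c, k in Counter(text).items()}

    # Count every overlapping pair of adjacent letters.
    pair_counts = Counter(text[i:i + 2] for i in range(n_pairs))

    results = []
    for pair, observed in pair_counts.items():
        # Probability of the pair if letters were drawn independently.
        p = letter_freq[pair[0]] * letter_freq[pair[1]]
        expected = n_pairs * p
        sd = sqrt(n_pairs * p * (1 - p))
        z = (observed - expected) / sd if sd > 0 else 0.0
        if z >= min_z:
            results.append((pair, observed, round(expected, 2), round(z, 2)))

    # Most strongly over-represented pairs first.
    return sorted(results, key=lambda r: r[3], reverse=True)


if __name__ == "__main__":
    sample = "thewhalethewhalethewhalethesea" * 10
    for row in overrepresented_pairs(sample):
        print(row)
```

Overlapping pairs are not truly independent, so the z-score here is only a crude ranking. A dictionary-building method would then go on to merge the surviving pairs into longer candidate words and re-test them, as the article describes.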
