Finding the beginnings of genes within genomic sequence is a formidable challenge for projects that aim to annotate the human genome. In the Advanced Online Publication of Nature Genetics, Ramana Davuluri and colleagues at Cold Spring Harbor Laboratory in New York describe a bioinformatic strategy to predict gene promoters and first exons (Nat Genet 2001, DOI: 10.1038/ng780).

They developed a new program, called FirstEF, that attempts to predict the starts of genes. They collected over two thousand first exons to use as a training dataset, and characterized those that were associated with a CpG island. FirstEF is designed to recognize CpG islands, promoter regions and first splice-donor sites.
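
To give a sense of what recognizing a CpG island involves, here is a minimal Python sketch that scores a DNA window by its GC content and its observed/expected CpG ratio. It is not FirstEF's own model, which the article does not describe. The function names are hypothetical, and the thresholds (at least 200 bp, GC content above 50%, observed/expected ratio above 0.6) are the commonly cited Gardiner-Garden and Frommer criteria, assumed here purely for illustration.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # observed CpG dinucleotides
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G were placed independently
    expected = c * g / n
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed length, GC-content and CpG-ratio thresholds."""
    if len(seq) < min_len:
        return False
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction > min_gc and obs_exp > min_obs_exp
```

A real gene-start predictor would scan such a test along a chromosome in sliding windows and combine it with promoter and splice-donor signals; this sketch covers only the CpG part.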

The program predicted 86% of all first exons (92% of CpG-related first exons and 74% of non-CpG-related first exons), with about 17% false positives. FirstEF gave a similar performance when tested against the finished sequences for human chromosomes 21 and 22.
