Gene Finding with Hidden Markov Models

Written by Karen Heyman


Haemophilus influenzae made history in 1995 when it became the first free-living organism to have its genome completely sequenced. In the decade since, some 180 organisms have followed suit.

For every one of these genomes, the sequence is only the beginning. The challenge for the computational biologists charged with making sense of the data is to find the gene sequences hidden within those strings of As, Cs, Gs, and Ts, which can run to billions of bases. The genome annotation strategies these computer-scientists-cum-biologists have developed have clearly come a long way. The most recent iteration (version 4.0) of the Drosophila genome annotation, for instance, updated only 25 predictions out of 13,472 protein-coding genes.

But improvements can still be made. "If they were 100% reliable, then they would have been run on the April 2003 complete human sequence and that would've been it. Those would have been your genes," says ...
