Genomic quirks make Candida albicans a troublesome creature in the lab. Unlike Saccharomyces cerevisiae, which was sequenced in 1996, C. albicans has no known haploid or homozygous form. That makes whole-genome shotgun sequencing tricky, because the multiple alleles confound standard assembly software. Moreover, the C. albicans genome is full of repeated sequences and recently diverged gene families.
In a 2004 paper, a team of researchers led by Ron Davis of the Stanford Genome Technology Center presented the first complete sequence of the pathogen's diploid genome [1]. "The analysis was challenging," says study coauthor Nancy Federspiel. "It was hard to tell apart the allelic differences from the sequencing errors." Because their sequence assembly software assumed single-copy sequence, the researchers were not surprised when the summed length of the contigs exceeded the genome size by 20%. They had to perform a pairwise comparison of the contigs...