A pufferfish offers a compact version of the human genome, packing essentially the same information into one-eighth the DNA. It includes the protein-encoding exons and their control regions, but it lacks many repeats and has compressed introns interrupting its genes. This correspondence makes the pufferfish genome a uniquely useful tool to annotate the human genome, and sequencing it has been a goal since 1989. But "its" is a misnomer, because in late 2001, researchers revealed the genome sequences of two species of pufferfish.
The International Fugu Genome Consortium, of which Elgar's group is a part, announced the sequencing of the Japanese pufferfish Fugu rubripes at the 13th International Genome Sequencing and Analysis Conference in San Diego on Oct. 25. The consortium includes the US Department of Energy Joint Genome Institute, Myriad Genetics Inc. of Salt Lake City, the Institute for Molecular and Cell Biology of Singapore, the Singapore Biomedical Research Council, the UK Medical Research Council, the Cambridge University department of oncology, the Institute for Systems Biology in Seattle, and Celera Genomics Group in Rockville, Md. Close on its fins came a Nov. 6 announcement from Agencourt Bioscience Corp. of Beverly, Mass., GenoScope in Evry, France, and the Whitehead Institute Center for Genome Research in Cambridge, Mass., that they had carried out whole-genome shotgun sequencing of Tetraodon nigroviridis. "Tetraodon lives in brackish waters in Malaysia, whereas Fugu is a saltwater delicacy," says Kevin McKernan, co-chief scientific officer and cofounder of Agencourt. The split between the human and teleost (bony fish) lineages occurred 360-380 million years ago, he adds.
Fugu—From Delicacy to Annotation Tool
(Image courtesy of H. Roest Crollius)
The idea to use Fugu DNA sequences to probe the human genome came from Sydney Brenner, distinguished professor at the Salk Institute for Biological Studies in La Jolla, Calif. [1] He reasoned that because vertebrates share similar development and physiology, their genomes were likely to be similar in protein-encoding content. "Sydney Brenner suggested that it was the way in which genes are regulated, rather than different genes, that gave rise to vertebrate diversity," says Elgar, who worked with Brenner at the time. Fugu was a good candidate because it is only distantly related to humans: its lineage diverged from the lobe-finned fishes that gave rise to the tetrapods 450-500 million years ago, so sequences still shared by the two are likely to be functionally important. And its genome is small, which Brenner knew from a landmark paper [2] by Ralph Hinegardner, a professor emeritus of ecology and evolutionary biology at the University of California at Santa Cruz.
Hinegardner analyzed the DNA content of more than 300 species of fish. Members of the order Tetraodontiformes, which includes the pufferfishes, were the lightweights, with genomes of 350-400 million bases. "Fugu is at one end of a normal distribution of fish genome sizes," explains Elgar. And dissecting its genome proved gratifying. "We were excited when we started sequencing it and all these lovely little genes, just like the human ones, only about one-tenth the total size, kept popping up. They were exact copies of their big mammalian counterparts, with the same numbers of exons and introns, with splice sites in the same positions. The only difference was they all had small introns," he adds [3].
What's so exciting about pufferfish comparisons is not just that they have piscine versions of human genes—after all, mice, rats, and even flies and worms have many of the genes that humans have—but that the control sequences are easier to detect. Researchers can use Fugu regulatory sequences to fish out related sequences linked to previously unknown human genes.
"We're pretty good at spotting coding sequences, mostly through cDNAs and EST [expressed sequence tags] work, but we're very poor at finding regulatory sequences, basically because no one knows what we're looking for. These control sequences are often shared between Fugu and mammals, and because the rest of the 'junk' is not well conserved, as it often is when you compare mouse and man, they stand out and slap you if you know how to look," relates Elgar. McKernan says much the same thing for Tetraodon, "Junk DNA is a problem in identifying promoters and regulatory regions. Pufferfish doesn't have the junk, so it is easier to tell the signal from the noise."
Tracking Genes in Tetraodon
The Tetraodon project started in 1997 at GenoScope, the French National Sequencing Center. "We did about threefold coverage, and the Whitehead Institute and Jean Weissenbach's group at GenoScope have done another 2.5-fold coverage, so that's about fivefold overall," explains Joel Malek, manager of sequencing operations for Agencourt. Assembly of the sequence is ongoing.
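Malek rounds the combined effort to "about fivefold"; strictly, 3 + 2.5 gives 5.5-fold coverage. To show what that depth buys, here is a minimal sketch assuming the idealized Lander-Waterman (Poisson) model, in which reads land uniformly at random, so the expected fraction of bases hit at least once at c-fold coverage is 1 - e^(-c). The function and variable names are our own, and real projects usually fall short of this ideal because the model ignores repeats and cloning bias.

```python
import math

def expected_covered_fraction(coverage: float) -> float:
    """Expected fraction of a genome covered by at least one read.

    Assumes the idealized Lander-Waterman (Poisson) model: reads land
    uniformly at random, so a given base is missed with probability
    e^(-coverage). Repeats, cloning bias, and assembly problems are ignored.
    """
    return 1.0 - math.exp(-coverage)

# Coverage figures quoted in the article for the Tetraodon project.
genoscope_first_pass = 3.0
additional_coverage = 2.5  # Whitehead Institute plus Weissenbach's group
combined = genoscope_first_pass + additional_coverage  # 5.5-fold

for c in (genoscope_first_pass, combined):
    covered = expected_covered_fraction(c)
    print(f"{c:.1f}-fold: {covered:.1%} covered, {1.0 - covered:.1%} missed")
```

Under this model, threefold coverage leaves roughly 5% of bases unsequenced, while 5.5-fold leaves about 0.4%.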
The Tetraodon group's strategy is to seek synteny, the maintenance of stretches of genes in the same order on a chromosome across species. The researchers developed a tool called Exofish, which stands for "exon finding by sequence homology," to identify "ecores," which are "evolutionary conserved regions" of the genome. The Tetraodon genome contains 2,992 ecores. The idea is that the most vital genes are the ones that are retained throughout evolution.
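The article does not describe how Exofish scores its matches, so the sketch below is only a toy illustration of the general idea behind homology-based detection: compare stretches of two sequences and keep those that are nearly identical, since a conserved stretch embedded in otherwise divergent DNA is a candidate exon. The function name, the 12-base window, and the 80% identity cutoff are hypothetical choices, not Exofish parameters.

```python
from typing import List, Tuple

def conserved_windows(a: str, b: str, window: int = 12,
                      min_identity: float = 0.8) -> List[Tuple[int, int, int]]:
    """Toy ungapped similarity scan between two DNA sequences.

    Compares every length-`window` stretch of `a` with every length-`window`
    stretch of `b` and keeps pairs whose fraction of identical bases is at
    least `min_identity`. Returns (start_in_a, start_in_b, matching_bases).
    Hypothetical illustration only; this is not the Exofish algorithm.
    """
    hits = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            same = sum(x == y for x, y in zip(a[i:i + window], b[j:j + window]))
            if same / window >= min_identity:
                hits.append((i, j, same))
    return hits

# Hypothetical example: a shared 12-base "exon" in unrelated flanking DNA.
core = "ATGGCCAAGCTG"
fish = "CCCCCCCC" + core + "CCCCCCCC"
human = "TTTTTTTT" + core + "TTTTTTTTTTTT"
for hit in conserved_windows(fish, human):
    print(hit)
```

The brute-force double loop is deliberately simple; practical homology searches rely on indexed seed matches and alignment statistics to scale to whole genomes.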
"What we are hoping to get out of the pufferfish genome project is a refined method to find the promoters and regulatory sequences in the human genome. They are more tightly spaced in the pufferfish genome. They are closer to the genes that they regulate. We call this a Rosetta stone," says McKernan.
Even though the first draft sequence isn't yet complete, genome information from this fish is already being used to annotate the human genome, specifically chromosome 20 [4, 5]. Matching human sequences to 207 ecores identified human exons that earlier annotation had missed because researchers had no ESTs, cDNAs, or protein homologies to guide them. Specifically, of 727 human genes and 168 pseudogenes on chromosome 20, 36% contain exons conserved in mouse and Tetraodon.
The Bigger Picture—Genome Structure
(Image: James P. McVey, NOAA Sea Grant Program)
"There is no logical explanation as to why some genomes are so much more cluttered with 'junk' than others. There have been some obvious explosions, such as LINES [long interspersed elements] and SINES [short interspersed elements] in primates. These make up a large proportion of our genome and are probably the result of a rogue retrovirus or retrotransposon that just went out of control and started replicating and inserting all over the genome," speculates Elgar.
But the larger size of mammalian genomes might reflect more than an accidental embrace of an errant bit of RNA copied into DNA that then inserted itself into the human genome. Perhaps when chromosomes grew to a certain size, they had to add non-protein-encoding DNA sequences just to maintain their integrity. After all, the largest of Tetraodon's 21 chromosomes is still smaller than the smallest human chromosome, and so might be beneath such a threshold. Comparisons of chromosomes across species hint at such a threshold. "It may have something to do with chromosome architecture. Mammalian chromosomes are quite large, fish chromosomes are small, and chickens have a set of large macrochromosomes and a set of microchromosomes. The microchromosomes are thought to be gene-dense, as are Fugu's, and the macros are gene-poor, as are mammals'. It may be that once a chromosome gets to a certain size or state, that the whole environment changes, and junk starts squeezing in," suggests Elgar.
Comparative genomics also hints that biodiversity reflects differences not only in DNA sequence but also in gene organization. "The Institute for Genomic Research [TIGR] has sequenced a diversity of microbial genomes. And every time they do a sequence, about 20% to 30% of the genes are unknown. In higher organisms, this isn't necessarily the case," says Malek, who came to Agencourt from TIGR. Instead, the genome sequences of more complex, multicellular organisms seem to differ more in organization and in number of repeats than they do in unique content. For example, various genes have been traced through modern primates, duplicating and moving among the chromosomes, but not necessarily changing very much in sequence. This helps account for our great similarity to our closest primate cousins, at least in terms of DNA sequence.
Genomic closeness, however, may be a matter of semantics and expectations. "For vertebrates I think we need a different definition of a gene. In microbial genomes, there are clear starts and stops to genes. This is not so in vertebrates. A gene in Drosophila melanogaster, for example, may theoretically have 38,000 splice variants. It is beginning to look like evolution took a path where these variants became the mechanism for diversity. Everyone has been running around counting genes, but it may really be splice site variants that define the proteome," says McKernan. Having the genetic instructions of the two pufferfish in hand may provide glimpses into how the complexity of the human genome arose.
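McKernan's figure of 38,000 variants reflects combinatorial splicing: when a gene carries several clusters of mutually exclusive alternative exons and each transcript uses exactly one exon from each cluster, the number of possible isoforms is the product of the cluster sizes. The article does not name the gene; the cluster sizes below (12, 48, 33, and 2) are those commonly reported for the Drosophila Dscam gene and are an assumption on our part, as is the helper function name.

```python
from math import prod
from typing import Dict

# Assumption: alternative-exon cluster sizes commonly reported for
# Drosophila Dscam (the article does not name the gene it refers to).
alternative_exons_per_cluster: Dict[str, int] = {
    "exon 4": 12,
    "exon 6": 48,
    "exon 9": 33,
    "exon 17": 2,
}

def isoform_count(clusters: Dict[str, int]) -> int:
    """Distinct transcripts when exactly one exon is chosen from each cluster."""
    return prod(clusters.values())

print(isoform_count(alternative_exons_per_cluster))  # 38016
```

Because the choices multiply, a handful of exon clusters is enough to generate tens of thousands of potential proteins from a single gene.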
1. S. Brenner et al., "Characterization of the pufferfish (Fugu) genome as a compact model vertebrate genome," Nature, 366:265-8, 1993.
2. R. Hinegardner, "Evolution of cellular DNA content in teleost fishes," American Naturalist, 102:517-23, 1968.
3. G. Elgar et al., "Generation and analysis of 25 Mb of genomic DNA from the pufferfish Fugu rubripes by sequence scanning," Genome Research, 9:960-71, 1999.
4. M. Hattori, T.D. Taylor, "Part three in the book of genes," Nature, 414:854-5, 2001.
5. P. Deloukas et al., "The DNA sequence and comparative analysis of human chromosome 20," Nature, 414:865-71, 2001.