What's in a Gene Name?

If you thought the hard work of sequencing the human genome was complete, think again.

February 28, 2005


Joelle Bolt

If you thought the hard work of sequencing the human genome was complete, think again. Just ask Human Genome Organization nomenclature committee (HGNC) chair Sue Povey, of London's Galton Institute Laboratory. "The major effort of our group is now an attempt to make the human genome data more human-friendly!" exclaims Povey on her Web site.1

Take, for example, the Down syndrome critical region (DSCR) genes, which lie in a region of chromosome 21 long assumed to be critical for that disease's phenotype. A recent study by Roger Reeves and colleagues at the Johns Hopkins School of Medicine demonstrated that some of those genes are neither critical nor necessary for most of the structural features of Down syndrome.2 Those genes, the focus of many scientific papers, are now known to have names that have nothing to do with their real, as yet unknown, functions. According to the Galton Laboratory's Elspeth Bruford, the HGNC says that until the genes' functions are identified, the DSCR name, while a misnomer, will remain. Once function is determined, "we could consider renaming them," Bruford says via E-mail.

Similarly, identifying a gene's function by homology between species can lead to false predictions of evolutionary descent and illogical or incorrect gene annotations. Chris Ponting, a functional geneticist at Oxford University, points out that in protein databases Caenorhabditis elegans hypothetical protein M04C9.4 is described as containing 'similarity to Bos taurus osteopontin precursor (bone sialoprotein 1) SW:OSTP_BOVIN.'3 "This functional prediction appears to be incorrect," Ponting writes in an E-mail. "Obviously, the invertebrate nematode C. elegans does not have bones!"

The use of the phrase "similarity to" shows how scientists simply follow their own rules for explaining relationships between entities in databases, ignoring the fact that the C. elegans example is "not connected with approved nomenclature of any kind," says Bruford. "This kind of 'similarity to' term is by no means an agreed standard, nor is it to be taken as definitive."

The central problem is the amount of guesswork involved when a gene is unknown. Scientists using language to reflect their predictions about gene function may unintentionally convey a certainty not supported by their data. The HGNC, along with several other groups around the world, hopes to change all that, but it's been an uphill battle. Language users, scientists included, tend to hate following rules.

Michael Ashburner, a Drosophila geneticist at Cambridge University, says scientists think nomenclature is "tedious beyond belief." They "don't spend much time on it," he says, and "everyone breaks the rules." Nomenclature committees may have no authority to enforce the rules, leaving enforcement up to journal editors. "It is up to the community to accept what the committees say, so rules are often broken," says Ashburner.

The HGNC has approved more than 20,000 human gene symbols and names and has established a clear set of rules for naming genes and their products.4 The committee has also joined in the effort to clarify ontological relationships among genetic entities. Chief among these efforts is the Gene Ontology consortium,5 which has organized thousands of terms into three distinct networks or ontologies: cellular components, molecular function or activity, and biological process. The consortium's three networks are structured by two expressions of relations: subsumption (is a), and inclusion (part of).
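
To make the two relations concrete, here is a minimal sketch of how terms linked by "is a" and "part of" can be stored as a graph and traversed. The term names, the edge list, and the helper functions build_graph and ancestors are illustrative assumptions, not actual Gene Ontology records or the consortium's software.

```python
from collections import defaultdict

# Toy edges for illustration only (assumed, not real Gene Ontology data).
# Each edge reads: child --relation--> parent.
EDGES = [
    ("nucleolus", "part_of", "nucleus"),
    ("nucleus", "is_a", "organelle"),
    ("mitochondrion", "is_a", "organelle"),
    ("organelle", "part_of", "cell"),
]


def build_graph(edges):
    """Map each child term to a list of (relation, parent) pairs."""
    graph = defaultdict(list)
    for child, relation, parent in edges:
        graph[child].append((relation, parent))
    return graph


def ancestors(graph, term, relations=("is_a", "part_of")):
    """Collect every term reachable from `term` via the allowed relations."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for relation, parent in graph.get(current, []):
            if relation in relations and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


GRAPH = build_graph(EDGES)

# Following both relations, the nucleolus is linked to the nucleus,
# to organelle, and to cell.
print(sorted(ancestors(GRAPH, "nucleolus")))

# Restricting the walk to "is a" alone gives a narrower answer.
print(sorted(ancestors(GRAPH, "nucleus", relations=("is_a",))))
```

In this sketch the answer to "what does this term belong to?" depends on which relations the query follows, which is one reason the precise meaning of each relation matters.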

Still, a transparent, entirely appropriate nomenclature and system of linguistic relationships may be out of reach. Phrases such as "similarity to," "is part of," "is a," "derives from," and "located in" remain "nowhere near clearly defined," according to Barry Smith, director of the Institute for Formal Ontology and Medical Information Science at Saarland University in Germany. Smith writes in an E-mail that ill-defined terms and relations lead authors to use them differently from one ontology to the next (and sometimes within a single ontology), and to make many mistakes along the way.6

Convincing everyone to agree may be the most challenging goal. According to a recent case study, biologists can't even agree on a single conception of the gene.7 Even if many scientists agree that function is the best basis for naming genes, not everyone can agree on the definition of function. Most biologists use the word to mean "acting in a certain way," while those involved in clinical work use it to mean "having a function or purpose."

Smith and his colleagues are hoping to help solve these problems by convincing everyone "of the advantages of a single consolidated suite of well-defined relations, and training everyone in its use," he says. "In this way, all the data annotated in terms of the resulting ontologies would be capable of becoming integrated together automatically."

