An analysis of proteomic data from seven studies suggests the human genome contains fewer than 20,000 protein-coding genes, 1,700 fewer than previously predicted. The results, published last month (June 16) in Human Molecular Genetics and posted last year on arXiv.org, also show little evidence of protein expression for more evolutionarily recent genes that can only be traced back to primate lineages.
The estimated size of the protein-coding portion of the human genome has been shrinking since the genome was first sequenced. The first sequences, published in 2001, predicted 26,000 to 30,000 genes; a recent evolutionary comparison suggested the number was closer to 20,500. Now, that number might be reduced to approximately 19,000.
According to a January post on the Physics arXiv Blog, “That’s an interesting result that is partly a reflection of the state of genomics. The human genome is by no means fully defined and biologists are still in the process of refining their gene...
"The coding part of the genome [which produces proteins] is constantly moving," lead author Alfonso Valencia of the Spanish National Cancer Research Centre said in a statement. "No one could have imagined a few years ago that such a small number of genes could make something so complex."
Valencia and his colleagues combined peptide information from large-scale mass spectrometric analyses to confirm the expression of 11,838 genes. The analyses revealed an additional 2,000 genes that did not show evidence of peptide expression in these datasets. The majority of expressed genes correspond to the oldest, most conserved regions of the genome, according to the study.
"The number of new genes that separate humans from mice [those genes that have evolved since the split from primates] may even be fewer than ten," study co-author David Juan said in the press release.
Although this work reduces the number of human genes to fewer than the number currently predicted for the nematode Caenorhabditis elegans, Valencia said it might be too early to compare humans to worms. “Our work suggests that we will have to redo the calculations for all genomes, not only the human genome,” he said.
These results are being evaluated for incorporation into an updated genome annotation. “When this happens it will redefine the entire mapping of the human genome, and how it is used in macro projects such as those for cancer genome analysis,” Valencia said.