
New Database Expands Number of Estimated Human Protein-Coding Genes

Some scientists are not yet convinced that the list is accurate.

Jun 19, 2018
Diana Kwon


The human genome may contain more protein-coding genes than prior analyses suggested. A study published last month (May 29) on bioRxiv provides an expanded database that includes approximately 5,000 novel genes. Around 1,000 of those code for proteins, raising the estimated number of protein-coding genes from around 20,000 to 21,000.

“If people like our gene list, then maybe a couple years from now we’ll be the arbiter of human genes,” study coauthor Steven Salzberg, a computational biologist at Johns Hopkins University, tells Nature.

Salzberg and his colleagues compiled a catalog of human genes and transcripts using data from the Genotype-Tissue Expression (GTEx) project, in which scientists sequenced the RNA from various tissues in hundreds of human subjects. By comparing the sequenced RNA to the human genome, the researchers were able to compile a database of 43,162 genes, 21,306 of which coded for proteins and 21,856 of which were noncoding.

According to Nature, this dataset includes many more genes than existing datasets. For example, the GENCODE gene set, a widely used human gene database run by the European Bioinformatics Institute (EBI) in the U.K., includes 19,901 protein-coding genes and 15,779 noncoding ones.
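
To make the comparison concrete, the counts quoted above can be tallied in a few lines of Python. This is only an arithmetic sketch using the figures reported in this article; the variable names are illustrative and do not come from the study or from GENCODE.

```python
# Gene counts as reported in the article; names here are illustrative only.
new_catalog = {"protein_coding": 21_306, "noncoding": 21_856}
gencode = {"protein_coding": 19_901, "noncoding": 15_779}

# Total genes in each catalog.
new_total = sum(new_catalog.values())
gencode_total = sum(gencode.values())

# Raw gap between the two catalogs, per gene category.
difference = {kind: new_catalog[kind] - gencode[kind] for kind in new_catalog}

print(new_total)      # 43162
print(gencode_total)  # 35680
print(difference)     # {'protein_coding': 1405, 'noncoding': 6077}
```

These are raw gaps between two catalogs, so they need not match the roughly 5,000 novel genes the study itself reports.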

Some scientists say more evidence is required to verify that the new gene list is accurate. For example, Adam Frankish, a computational biologist at the EBI who works on the GENCODE project and was not involved in the study, tells Nature that after carefully analyzing about 100 of the newly identified protein-coding genes, he and his colleagues found that only one of them seems to truly code for protein.

Salzberg tells Nature that having an accurate gene count is important, because uncounted genes are frequently ignored—meaning those containing disease-causing mutations may be overlooked. On the other hand, Frankish tells Nature that hastily adding genes could also be problematic, because they may divert scientists’ attention away from the genes that are actually involved in a disease.
