ABOVE: When it comes to the human genome, scientists opt for well-studied genes, neglecting thousands of known and conserved genes with unknown function in the process. Modified from © istock.com, erhui1979, Natallia Yatskova; designed by Luiza Augusto

In the last two years, scientists achieved two genomic milestones: the complete sequences of the human non-Y genome and, just this past August, that of the Y chromosome.1,2 With the final pages of the human genetic playbook complete, plenty of mysteries remain, including the function of many of the 20,000-25,000 protein-coding genes. To encourage research on these many mystery genes, a team of scientists has created a new publicly available database that ranks genes based on how little is known about them.3 Using this new directory, they selected more than 200 neglected genes that are evolutionarily conserved between fruit flies and humans. The systematic silencing of these genes in fruit flies revealed that many are essential for survival and other important biological functions, demonstrating that there is still much to be explored in the vast unknowns of the genome.

Sean Munro, a biologist at the MRC Laboratory of Molecular Biology and coauthor of the paper, noted that the functional unknome—a portmanteau of unknown genome and a catchall coined by the authors to describe the collection of known genes with unknown function—has shrunk since the early 2000s, but scientists have plenty of ground to cover. “There's still a couple thousand genes in the human genome, at least, for which essentially nothing is known, and then there are some where a little is known, but not very much,” said Munro. 


“This phenomenon very much touches on human health,” said Thomas Stoeger, a systems biologist at Northwestern University who was not involved in the study. There is growing evidence that scientists have a bias towards studying well-known proteins, which some researchers argue hinders progress in biomedical research and limits what is known about gene-disease relationships.4-6

The origin of the unknome dates back to conversations Munro had with colleague Matthew Freeman, a study coauthor, now a cell biologist at the University of Oxford, about the untapped potential of a database that would help scientists identify genes of unknown function for further analysis. Munro also came across this problem in his own research on the components of membrane trafficking. “Quite often we'd turn out proteins, or other people would find proteins, which are very well conserved in evolution, but absolutely nothing is known about them,” said Munro.

To determine just how mysterious each protein-coding gene was, Munro and his team gathered data about the function of human proteins and their orthologs in popular model organisms. For this they turned to the Gene Ontology (GO) Consortium, an established bioinformatics initiative that uses a controlled vocabulary to maintain consistency across species and provides annotated information on genes and gene products, including protein function. The Unknome database generates a protein popularity score that is based on the number of GO annotations the protein has.
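As a rough illustration of the idea of scoring proteins by annotation count, the sketch below tallies annotations per protein and lists the least-annotated proteins first. It is a simplified assumption, not the Unknome database's actual method: the protein names, the annotation labels, and both function names are hypothetical placeholders, and a real score may treat annotations and orthologs differently.

```python
from collections import Counter

# Hypothetical (protein, annotation) pairs; placeholder labels, not real GO data.
annotations = [
    ("PROT_A", "TERM_1"),
    ("PROT_A", "TERM_2"),
    ("PROT_A", "TERM_3"),
    ("PROT_B", "TERM_1"),
    ("PROT_B", "TERM_1"),  # duplicate record, counted only once
]
proteins = ["PROT_A", "PROT_B", "PROT_C"]


def annotation_scores(proteins, annotations):
    """Count distinct annotations per protein (assumed scoring rule)."""
    counts = Counter(protein for protein, _ in set(annotations))
    return {protein: counts.get(protein, 0) for protein in proteins}


def rank_least_known(scores):
    """Order proteins from fewest to most annotations, ties broken by name."""
    return sorted(scores, key=lambda protein: (scores[protein], protein))


scores = annotation_scores(proteins, annotations)
ranking = rank_least_known(scores)
print(scores)
print(ranking)
```

In this toy example, the protein with no annotations at all comes first, which mirrors how a ranking by scarcity of annotation would surface the least-studied candidates.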

Munro and his team then used their new database to identify neglected human proteins that are evolutionarily conserved in their model organism, Drosophila melanogaster. After buzzing around the database, they landed on 260 Drosophila proteins and got busy assessing the biological consequences of knocking the genes down using RNA interference (RNAi). “That took much, much longer than we hoped,” said Munro. This was a result of a confluence of factors, including the usual turnover in the research team and the ambitious task of testing 260 genes in replicates. 

An initial screen revealed that nearly 25 percent of the unknown genes were essential for survival. “The Drosophila geneticists found that quite surprising as they sort of assumed that everything important had been found by conventional genetic screens in flies,” said Munro.

To run their biological assays at scale, Munro and his team used a three-tiered flywheel. Each level holds 20 96-well plates.
2023 Rocha et al., CC BY 4.0.

The researchers advanced the remaining 198 nonessential genes to a second round of testing to determine whether they played a role in a range of biological functions. They found that many of these genes contributed to male and female fertility, wing growth, aberrant protein removal, locomotion, and resilience to stress. One gene, CG11103, called TM2 domain containing 2 (TM2D2) in humans, surprised Munro and his team. When they deleted the gene in female flies, any eggs they laid failed to develop. They linked this failure to launch to an overproduction of cells in the offspring's nervous systems, a phenotype that indicates defects in the conserved and highly studied Notch signaling pathway. “Despite all the work that's been done on [the Notch signaling pathway], this gene had not been found,” said Munro.

“Having these things online or in a database, I hope it makes this searchable,” said Stoeger. “It could be that maybe someone finds a gene in their assay, and maybe Googles for this gene, and maybe the only thing that they find is this record in the Unknome database. I think this is very valuable.” He also noted that new lingo like unknomics helps to give life to the initiative. “Without having a name, I think it's actually difficult to describe,” said Stoeger.

Although this is a step in the right direction, Stoeger said that the lack of a database is not the only roadblock keeping scientists from exploring the human unknome. He added, “My fear is that the problem is not so much a problem of biological information.”  Previous research by Stoeger revealed that scientists continue to focus their research on a minority of known genes identified before the Human Genome Project.7 In a recent study, Stoeger and his team examined where in the -omics analysis pipeline these understudied genes leak out and found that scientists tend to abandon them while writing up the results, instead drawing attention to known, popular genes.8 

The availability and reliability of reagents is another rate-limiting step, as is convincing researchers to step into the unknome. “It is risky,” said Munro. “It's something where you have to have a very good reason or a good clue to take on something like this, because it is very challenging.” Additionally, funding bodies tend to be risk averse when doling out the dough for research, but Munro said that he has spoken with a couple of organizations that are considering funding research into the unknome to address this problem. 

Beyond its potential value in guiding scientists towards neglected proteins, the Unknome database also highlights just how much of biology remains to be explored. Looking to the future, Munro is excited to see what new tools, like the Unknome database, help reveal. “There might be things out there which are like the unknowns,” said Munro. “No one's looking for the components because no one knows the biological process exists yet. That may sound a bit fanciful, but there are a few examples.” CRISPR was hiding in plain sight in Escherichia coli, leaving scientists like Munro to wonder what else is out there.

References

  1. Nurk S, et al. The complete sequence of a human genome. Science. 2022;376(6588):44-53. 
  2. Rhie A, et al. The complete sequence of a human Y chromosome. Nature. 2023;621:344-354.
  3. Rocha JJ, et al. Functional unknomics: Systematic screening of conserved genes of unknown function. PLOS Biol. 2023;21(8):e3002222.
  4. Sinha S, et al. Darkness in the human gene and protein function space: Widely modest or absent illumination by the life science literature and the trend for fewer protein discoveries since 2000. Proteomics. 2018;18(21-22):e1800093. 
  5. Haynes WA, et al. Gene annotation bias impedes biomedical research. Sci Rep. 2018;8:1362.
  6. Kustatscher G, et al. Understudied proteins: Opportunities and challenges for functional proteomics. Nat Methods. 2022;19:774-779.
  7. Stoeger T, et al. Large-scale investigation of the reasons why potentially important genes are ignored. PLOS Biol. 2018;16(9):e2006643.
  8. Richardson RAK, et al. Meta-research: Understudied genes are lost in a leaky pipeline between genome-wide assays and reporting of results. eLife. 2023;12:RP93429.