Closing the Gaps in the Human Genome: Why Y Was the Final Hurdle

For two decades, scientists struggled to fully sequence the Y chromosome. Finally, researchers have mapped its full length thanks to recent advances in sequencing technology.

Kamal Nahas

Human chromosome Y has been fully sequenced for the first time, completing the first gapless reference male genome.

By 2022, each human chromosome had been fully mapped with the exception of the Y chromosome.1 Despite being one of the shortest, this chromosome has been the toughest to sequence because it’s studded with repetitive DNA.2 Common sequencing techniques collect short reads from random sites on a chromosome and piece them together into one continuous sequence where they overlap, but repetitive DNA on the Y chromosome complicates assembly because a short read from a repeat can overlap at many sites.
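The assembly problem described above can be illustrated with a toy example. This is a minimal sketch, not a real assembler: the sequence `toy_genome` and the function `count_ambiguous` are made up for illustration, and the read lengths are far shorter than real ones. It counts how many reads of a given length match more than one site in a sequence that contains two copies of a repeat.

```python
def count_ambiguous(genome: str, read_len: int) -> int:
    """Count reads of length read_len that match more than one site in genome."""
    n_windows = len(genome) - read_len + 1
    ambiguous = 0
    for i in range(n_windows):
        read = genome[i:i + read_len]
        # Count every start position where this read matches, overlaps included.
        sites = sum(genome.startswith(read, j) for j in range(n_windows))
        if sites > 1:
            ambiguous += 1
    return ambiguous


# Hypothetical toy sequence: two copies of a 20-base repeat separated by
# unique DNA. Real Y chromosome repeats are vastly longer.
repeat = "AT" * 10
toy_genome = "GATTACAGGC" + repeat + "CCGTGAACTT" + repeat + "TGGCACTCAA"

print("ambiguous 8-base reads: ", count_ambiguous(toy_genome, 8))
print("ambiguous 40-base reads:", count_ambiguous(toy_genome, 40))
```

In this toy case, reads shorter than the repeat can land in several places, while reads long enough to span a repeat plus some unique flanking sequence each match a single site.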

At last, two teams of scientists have collaboratively tackled this challenge and fully sequenced the Y chromosome. They reported their results in two separate studies published in Nature. The first study described a carefully validated, complete reference sequence, while the second study reported Y chromosome variation among 43 men from different backgrounds.3,4 Together, these data create new opportunities for exploring the genetic makeup and diversity of the Y chromosome.

“A lot of people don’t appreciate the technological development that went on under the hood. It’s really impressive, and it’s going to make assembling accurate and full genomes a lot more possible,” said Brianna Chrisman, a computational genomics researcher at the cancer genomics company GRAIL, who was not involved in either study.

See also “Large Scientific Collaborations Aim to Complete Human Genome”

In the first study, researchers from different institutes banded together under the Telomere-to-Telomere (T2T) consortium to fill in the gaps in the reference human genome. To sequence the Y chromosome, Adam Phillippy, a genomics researcher at the National Human Genome Research Institute and study coauthor, together with his colleagues, chose nanopore sequencing because it produces long reads, which overlap unambiguously even when repetitive DNA is present.5 However, this technique is error-prone, producing an error every 100 bases or so. So, the researchers also used a high-fidelity technique called single-molecule circular consensus sequencing, which produces shorter reads but generates only one error every 1000 bases on average.6 Then, in a first, the T2T consortium used an algorithm named Verkko that incorporated data from both techniques to assemble highly accurate long reads into a full Y chromosome sequence.7
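To put the two error rates quoted above side by side, here is a rough back-of-the-envelope sketch. The per-base rates (one error per 100 bases and one per 1000 bases) come from the article; the read lengths are illustrative assumptions, not figures reported by the researchers, and the function `expected_errors` is hypothetical.

```python
def expected_errors(read_length: int, bases_per_error: int) -> float:
    """Expected number of errors in one read, assuming errors are spread evenly."""
    return read_length / bases_per_error


# Assumed read lengths, chosen only for illustration.
long_read_length = 100_000       # hypothetical nanopore read
high_fidelity_length = 15_000    # hypothetical circular consensus read

print("long read, 1 error per 100 bases:      ",
      expected_errors(long_read_length, 100))
print("high-fidelity read, 1 error per 1000 bases:",
      expected_errors(high_fidelity_length, 1000))
```

Under these assumptions, the long read carries many more errors in absolute terms, which is why pairing it with a more accurate, shorter read type is attractive.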

The first full sequence of the Y chromosome contained 30 million new base pairs. Phillippy said that most of these newly discovered sequences relate to sequences on other chromosomes but carry subtle variations. “Now the question is ‘are those subtle variations doing anything interesting?’” he said.

Phillippy and his colleagues found 110 new genes, 41 of which are predicted to code for proteins. The majority were extra copies of the TSPY gene, which is involved in sperm production. It’s not clear why these backups have evolved.

The new Y chromosome sequences could spell change for metagenomics research, which involves sequencing microbial genomes. Human DNA contaminants often creep into these studies.8 “You have people in the lab shedding skin cells into their reagents,” Phillippy explained, and these contaminant sequences could be incorrectly attributed to microbes. From a bioethics standpoint, contaminants could contain DNA signatures of the individuals from whom they came. He added that people who donate samples in human microbiome studies, for example, are promised anonymity, and their DNA needs to be excluded from published datasets to avoid the future possibility of tracing their DNA back to them.

The 30 million base pairs in the Y chromosome that were not sequenced until now created a blind spot and could have leaked through the filters. Using the complete Y chromosome sequence rather than previous versions, the team identified nearly 1000 more potential contaminants in these datasets. “It would be helpful and doable to go through the collection of public bacterial reference genomes we have, and maybe viruses as well, and try to flag these Y chromosome sequences,” Chrisman said.
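The blind spot described above can be sketched with a toy contamination screen. This is only an illustration under stated assumptions: the article does not say how the team screened the datasets, and the k-mer matching approach, the function names `kmers` and `flag_contaminants`, the threshold, and all sequences below are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def flag_contaminants(reads: list[str], reference: str,
                      k: int = 8, min_shared: float = 0.5) -> list[str]:
    """Flag reads that share at least min_shared of their k-mers with reference."""
    ref_kmers = kmers(reference, k)
    flagged = []
    for read in reads:
        read_kmers = kmers(read, k)
        if read_kmers and len(read_kmers & ref_kmers) / len(read_kmers) >= min_shared:
            flagged.append(read)
    return flagged


# Hypothetical toy references: the "incomplete" one lacks the second half.
previously_known = "GATTACAGGCCCGTGAACTT"
newly_sequenced = "TGGCACTCAAGCTAGGATCC"
incomplete_ref = previously_known
complete_ref = previously_known + newly_sequenced

reads = [
    "CCGTGAACTT",        # human read from the previously known region
    "CACTCAAGCTAG",      # human read from the newly sequenced region
    "GGGGTTTTCCCCAAAA",  # stand-in for a genuine microbial read
]

print("flagged with incomplete reference:", flag_contaminants(reads, incomplete_ref))
print("flagged with complete reference:  ", flag_contaminants(reads, complete_ref))
```

In this sketch, the read from the newly sequenced region slips past a screen built on the incomplete reference and is caught only once the missing sequence is added.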

Charles Lee, a genomics researcher at the Jackson Laboratory who led the second study, approached the problem from a different angle. Once the T2T consortium had fine-tuned the sequencing protocol they used for their study, Lee and his colleagues adopted it and applied it to 43 Y chromosomes from men who inhabited every continent except Australia. “They have samples from all over the world focusing a little bit more on South America, West Africa, and East Asia, which have been historically underrepresented,” Chrisman said. Half of the chromosomes came from African backgrounds, which were among the most genetically diverse because humans who migrated to other continents lost mutations along the way.9 By comparing variations across all 43 chromosomes, the researchers estimated that the most recent common ancestor lived approximately 183,000 years ago.

On average, each chromosome showed a striking degree of variation, including three inverted sequences longer than 1000 base pairs, 88 large insertions or deletions longer than 50 base pairs, and more than 3000 single-base-pair mutations. Charting this diversity could help to identify genes that affect health and fertility in males.

Sex chromosomes have been overlooked in disease research because they were not fully sequenced until recently. “Now, there’s no excuse not to include the Y chromosome in studies of human health,” said Melissa Wilson, a computational evolutionary biologist at Arizona State University and coauthor of the study by the T2T consortium. In fact, chromosome Y has recently garnered attention in cancer research because its loss in aging cells correlates with a poor prognosis of bladder cancer.10

“What I’m looking for next is the ability to do what we’ve done here at the single-cell level” to explore variation within an individual, said Lee. Although single-cell sequencing technology already exists, it cannot collect long reads from the DNA of one cell, he explained.

  1. Nurk S, et al. The complete sequence of a human genome. Science. 2022; 376(6588):44–53.
  2. Bachtrog D, Charlesworth B. Towards a complete sequence of the human Y chromosome. Genome Biol. 2001; 2(1016.1).
  3. Rhie A, et al. The complete sequence of a human Y chromosome. Nature. 2023; 620(7975).
  4. Hallast P, et al. Assembly of 43 human Y chromosomes reveals extensive complexity and variation. Nature. 2023; 620(7975).
  5. Goodwin S, et al. Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome. Genome Res. 2015; 25:1750–1756.
  6. Rhoads A, Au KF. PacBio Sequencing and Its Applications. Genomics Proteomics Bioinformatics. 2015; 13(5):278–289.
  7. Rautiainen M, et al. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat Biotechnol. 2023.
  8. Chrisman B, et al. The human “contaminome”: bacterial, viral, and computational contamination in whole genome sequences from 1000 families. Sci Rep. 2022; 12:9863.
  9. Choudhury A, et al. High-depth African genomes inform human migration and health. Nature. 2020; 586(7831):741–748.
  10. Abdel-Hafiz HA, et al. Y chromosome loss in cancer drives growth by evasion of adaptive immunity. Nature. 2023; 619(7970):624–631.

Note: August 30: This story was updated to correct the error rates for nanopore sequencing and single-molecule circular consensus sequencing.

Meet the Author

Kamal Nahas, PhD

Kamal is a freelance science journalist based in the UK with a PhD in virology from the University of Cambridge.