Human Transcriptome Maps Exclude Most Populations. Scientists Decoded Just How Much.

Sequencing diverse populations revealed over 41,000 transcripts missing in Eurocentric references, exposing ancestry bias that limits insights into global disease risk.

Written by Sneha Khedkar

At the turn of the century, the biomedical research community saw a landmark global effort when scientists built a reference human genome sequence to better understand the genes underlying health and disease. However, they stitched together this reference sequence from the genomes of a small number of people representing only a narrow slice of humanity.

“Most of the omics data, including transcriptomic data, is dominated by samples that have been obtained from individuals of European ancestry,” said Roderic Guigó, a computational genomics researcher at Barcelona Institute of Science and Technology. To bridge this gap, he teamed up with Marta Melé, a transcriptomics and functional genomics researcher at Barcelona Supercomputing Center.

Now, the researchers analyzed samples belonging to people from eight genetically diverse populations and identified thousands of novel transcripts not found in the reference transcriptome.1 Their findings, published in Nature Communications, highlight the extent of ancestry bias in gene maps, which prevents scientists from obtaining important insights about the biology and disease risk in non-European populations.

Current Human Gene Maps Are Biased Towards European Ancestry

It is not surprising that there is a Eurocentric bias, but it was interesting to see the extent, said Divya Tej Sowpati, a computational genomics researcher at the Center for Cellular and Molecular Biology, who was not involved in the study. “It is good that someone went ahead and showed it. It was important to catalog [it], and it's commendable,” he added.


Members of the research team that uncovered the extent of ancestry bias, pictured in the facilities housing MareNostrum5, the supercomputer that was critical for processing the vast amounts of data generated by the study.

Mario Ejarque / BSC-CNS

Melé agreed that the ancestry bias in transcriptomes was not unexpected. “When we started this project, we had the suspicion that this might be the case, that [there] could be a bias,” she said. What surprised her was that nobody had looked at these differences despite the well-documented bias in the field towards samples from European ancestry.

For instance, most of the sequence of the original reference human genome came from just a handful of people, many of whom were recruited through a newspaper advertisement in New York. By studying more diverse genomes, such as those from the GenomeIndia Project or the Egyptian Multiomics Dataset, scientists have uncovered previously unreported genes associated with differences in disease risk between people of European and non-European ancestry.2

Long-Read RNA Sequencing Uncovers the Extent of European Bias in Genomics

Building on such studies, Guigó, Melé, and their team sought to investigate whether transcripts differed between populations. To do this, they used long-read RNA sequencing (RNA-seq), a technology that can read each RNA molecule from end to end as a full-length transcript.3 They sequenced RNA extracted from B cell lines derived from 43 people belonging to eight populations across Africa, America, Asia, and Europe.

After applying a series of stringent quality filters, the team identified more than 155,000 transcripts in these samples. Of these, more than 41,000 were novel and had not been reported in any official gene map. Nearly 700 of the novel transcripts came from DNA regions previously thought to contain no genes.

To gauge the extent of European ancestry bias, the team then grouped the cell line samples by European or non-European ancestry and compared each group against conventional reference maps. Samples of non-European ancestry carried more novel transcripts than those of European ancestry, highlighting that non-European transcripts are underrepresented in reference gene maps.


Guigó, Melé, and their team also identified more than 2,200 population-specific transcripts, present in one ancestry but not in others. Most of the non-European population-specific transcripts were novel, whereas most of the European ones had already been characterized.

Missing Transcripts Have Implications for Disease Biology

The team discovered that many of the novel ancestry-specific transcripts occurred in genes associated with autoimmune diseases, which present differently between the populations. Current reference maps do not contain information about such transcripts.

“When you lack [a] reference that is unbiased or that represents the populations fully…it has the potential for you to miss important connections…between genetics, diseases, and genetic ancestry,” said study coauthor Fairlie Reese, a genomics researcher in Melé’s group. This hampers the understanding, diagnosis, and treatment of diseases in non-European populations.

“[For] example…if there is a mutation…in a transcript that is not annotated, that we don't have the [correct] map [of], we're going to think that this mutation or this change doesn't have any effect,” explained Melé. In contrast, a more complete characterization of transcripts from all over the world can provide information about the implications of such mutations, “because we have the maps that are correct and are more representative of the whole humanity.”


Divya Tej Sowpati is a computational genomics researcher at the Center for Cellular and Molecular Biology who was involved in the GenomeIndia Project, which uncovered the genetic diversity of the Indian population.

Shambhavi Garde

The data generated in the study can offer important insights into disease susceptibility and severity in populations all over the world, agreed Sowpati. “This is paving way to a new paradigm in RNA-seq or that kind of a transcript-based analysis.”

Still, Sowpati noted that the study’s small sample size cannot capture all of human genetic diversity. “But this is a good starting point.”

Guigó agreed: “This is not a sufficiently representative sampling of the human transcriptome diversity.” As part of the human pangenome project, an ongoing effort to build a more complete and diverse human reference genome, scientists have catalogued the transcriptomes of hundreds of populations, and analyzing those data is important, he added.

“[This] is more like the tip of the iceberg,” said Melé. She hopes that scientists can expand such investigations to other populations as well as other cell types. “We need to fix it as a scientific community, not only [us], to get more representation of other populations and other cell types and tissues.”


Meet the Author

  • Sneha Khedkar

    Sneha Khedkar is an Assistant Editor at The Scientist. She has a Master’s degree in biochemistry, after which she studied the molecular mechanisms of skin stem cell migration during wound healing as a research fellow at the Institute for Stem Cell Science and Regenerative Medicine in Bangalore, India. She has previously written for Scientific American, New Scientist, and Knowable Magazine, among others.
