Technical Bias Widespread in RNA-Seq Datasets

Genes that are exceptionally long or short are overrepresented in some published reports, which can lead to misinterpreted results.


ABOVE: © ISTOCK.COM, SHUOSHU

RNA sequencing is a popular tool among molecular biologists because it allows them to examine gene expression patterns by sequencing the RNA present in a sample. However, the technique is susceptible to experimental artifacts that can lead to misinterpreted findings. According to a study published last week (November 12) in PLOS Biology, one such bias, which is associated with gene length, is widespread in many published datasets.

Rani Elkon, a bioinformatician at Tel Aviv University in Israel, says that his team stumbled upon a puzzling finding while analyzing RNA sequencing (RNA-seq) datasets for a project aimed at inferring the co-regulation of genes from their co-expression across many different biological conditions. Genes coding for proteins in the ribosome or other translation-related machinery, which are exceptionally short, and genes coding for extracellular matrix proteins such as collagen, which are exceptionally long, kept popping up in their analyses. “In many different datasets, genes that were upregulated ...


Meet the Author

  • Diana Kwon

    Diana is a freelance science journalist who covers the life sciences, health, and academic life.