RNA sequencing is a popular tool among molecular biologists because it allows them to examine gene expression patterns by measuring RNA transcripts. However, the technique is susceptible to experimental artifacts, which can lead to misinterpreted findings. According to a study published last week (November 12) in PLOS Biology, one such bias, which is associated with gene length, is widespread in published datasets.
Rani Elkon, a bioinformatician at Tel Aviv University in Israel, says that his team was analyzing RNA sequencing (RNA-seq) datasets for a project aimed at inferring the co-regulation of genes by examining their co-expression across many different biological conditions when they stumbled upon a puzzling finding. Genes coding for proteins in the ribosome or other translation-related machinery, which are exceptionally short, and genes coding for extracellular matrix proteins such as collagen, which are exceptionally long, kept popping up in their analyses. “In many different datasets, genes that were upregulated ...