Microparadigms in cell biology?

Textual model questions efficiency of gaining scientific knowledge

Published scientific statements, whether they are later proven true or false, have a profound effect on subsequent interpretations by researchers and on the probability that they will eventually come to a correct conclusion about a scientific question, a statistical analysis of protein interaction literature reveals. The findings, published this week in the Proceedings of the National Academy of Sciences (PNAS), suggest that the way these "microparadigms" bias future interpretations may actually slow down the process of gaining scientific truth.

The paper "sets up a very sophisticated model" to answer large-scale questions about how scientific knowledge is produced that "no one has previously been able to measure," said Neil Smalheiser at the University of Illinois at Chicago, who did not participate in this study.

These findings hint that the "current way we produce and interpret results is not optimal" for scientists to ultimately converge upon the correct result, according to first author Andrey Rzhetsky of Columbia University. "The model suggests that dependence between statements is too strong."

Rzhetsky's team assessed 1.5 million unique statements about protein interactions from 150,000 full-text articles in 78 journals (GENEWAYS 6.0). Using a binary system (e.g., protein A either interacts or does not interact with protein B), they chronologically ordered statements about each pair of proteins to construct chains of reasoning over time.

The group then simulated different ways scientists might approach published findings, and assessed the probability that each scenario would lead to the correct answer at any given step of the chain. If scientists trust nobody, for example, and ignore all previous literature, the probability of publishing the correct result remains constant.
At other extremes, scientists could be super-conformists (usually agreeing with the majority opinion about a given protein-protein relationship) or super-anti-conformists (usually agreeing with the minority opinion).

The authors searched their real-world data set for these hypothetical patterns; while all five were present, the pattern of mild skepticism was most common.

When they measured the momentum, or strength of influence, of published statements on future interpretations, they found that scientists give their own data at least 10-fold greater weight than others' findings, but are still heavily influenced by previous results and particularly by the majority opinion, revealing a tendency toward conformism. What's more, the authors discovered that a strikingly large proportion of results (95%) are positive, reporting the presence rather than the absence of an interaction.

According to the authors' stochastic analysis, this predominance of positive results can only be explained by one of two extremes: a very low rate of experimental errors or exceptionally invalid experiments. So scientists are either perpetuating truth or perpetuating errors, Rzhetsky said.

The authors also found that the momentums of actual published statements are too high to optimize the probability of coming to the right result at the end of a given chain. This phenomenon could be explained by the premium placed on new data in science publishing, said Gully Burns at the University of Southern California, who did not participate in this study. "You can't really get things published simply reproducing other people's results," he said. To produce correct scientific knowledge more efficiently, Rzhetsky suggested "independent benchmarking" by an institution that would periodically verify a sampling of the literature.

This paper demonstrates the utility of similar exercises for large-scale data mining, even beyond protein interactions, Burns told The Scientist.
Researchers have performed similar data mining only in sequence databases, added Smalheiser. "It's probably as sophisticated an example of text mining as there is so far, [and] more direct and more sensitive than citation analysis."

Still, according to Burns, it will be important to "parametrize more details of individual experiments" in a future model, for example by accounting for the section of the paper in which a statement is found or the animal model or cell type used to derive it.

Rzhetsky said this work is part of a larger effort to sort and evaluate millions of facts from the literature to create an overarching model of cellular interactions. "A huge amount of information is already published and locked in literature," he said. "We're trying to get that information out."

Ishani Ganguli
iganguli@the-scientist.com

Links within this article

A. Rzhetsky et al., "Microparadigms: Chains of collective reasoning in publications about molecular interactions," PNAS, March 14, 2006. http://www.pnas.org/cgi/doi/10.1073/pnas.0600591103

R. Finn, "Program uncovers hidden connections in the literature," The Scientist, May 11, 1998. http://www.the-scientist.com/article/display/18032/

Neil Smalheiser http://www.psych.uic.edu/faculty/smalheiser.htm

Andrey Rzhetsky http://genome6.cpmc.columbia.edu/andrey/

Gully Burns http://www-rcf.usc.edu/~gully/