ChatGPT Fails to Flag Retracted and Problematic Articles

The large language model scored a majority of discredited or retracted articles highly, highlighting that information obtained from AI tools must be verified.

Written by Sneha Khedkar | 3 min read

Not long ago, researchers may have spent weeks combing through the literature, looking for clues to piece together a perfect research paper. Now, many simply seek help from large language models (LLMs) like ChatGPT to speed up their work.

With academics increasingly relying on ChatGPT, Mike Thelwall, a data scientist at the University of Sheffield, sought to understand its credibility. “We wondered whether ChatGPT knew about retractions and would report [them]…or whether it would filter them out of its database so that it wouldn't report them,” said Thelwall.

Mike Thelwall, a data scientist at the University of Sheffield, investigated whether ChatGPT ignored retractions and other concerns in research articles.

Mike Thelwall

To find out, Thelwall and his team asked the LLM to assess the quality of discredited or retracted articles and discovered that ChatGPT did not flag the concerns.1 Their results, published in the journal Learned Publishing, emphasize the need for verifying information obtained from LLMs.

“This is a fantastic paper [on a] really, really important topic,” said Jodi Schneider, an information scientist at the University of Illinois Urbana-Champaign, who was not involved in the study. The bottom line for researchers is “don't trust any fact that is coming from AI [tools],” she noted.

For their study, Thelwall and his team identified 217 articles that either had controversial claims or had been retracted. They then submitted the article titles and abstracts to ChatGPT, asking the tool to evaluate the quality of each paper, benchmarked against standard guidelines, 30 times, yielding 6,510 responses. They did not ask the LLM upfront whether the article had been retracted, “because that's not what a user would do,” said Thelwall.
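For readers curious about the mechanics, here is a minimal sketch of this kind of repeated-query protocol, assuming the OpenAI Python client. The prompt wording and model name are placeholders, not the paper's exact setup; the study benchmarked quality against standard research-assessment guidelines.

```python
# Minimal sketch of a repeated-query quality-assessment protocol, loosely
# modeled on the study design (217 articles x 30 queries = 6,510 responses).
# The prompt wording and model name are placeholders, not the paper's setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_article(title: str, abstract: str, n_repeats: int = 30) -> list[str]:
    """Ask the model to rate one paper's quality n_repeats times."""
    prompt = (
        "Assess the quality of the following research article, where the top "
        "grades correspond to world-leading and internationally excellent work. "
        "Give a score and a brief justification.\n\n"
        f"Title: {title}\n\nAbstract: {abstract}"
    )
    responses = []
    for _ in range(n_repeats):
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; the paper documents its own model
            messages=[{"role": "user", "content": prompt}],
        )
        responses.append(reply.choices[0].message.content)
    return responses
```

Repeating each query 30 times accounts for the variability of LLM output; across 217 articles this yields the study's 6,510 responses.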

None of the 6,510 responses that ChatGPT generated mentioned that the articles were retracted or had been flagged for serious concerns. The tool scored a majority of the papers highly, indicating that the articles were world-leading or internationally excellent.

This surprised Thelwall. “I was really expecting a low score because of the retraction,” he said. “But it didn't do that in nearly all cases.”

Although ChatGPT identified a few methodological weaknesses in the articles it scored lower, none of these criticisms were relevant to the articles’ retraction or correction statements. Only in five cases did the LLM mention that the study was part of a controversy.

Jodi Schneider, an information scientist at the University of Illinois Urbana-Champaign, studies post-retraction citations.

© School of Information Sciences, University of Illinois Urbana-Champaign / Thompson-McClellan Photography

To investigate further, the researchers directly asked ChatGPT whether specific claims from retracted articles were true. The LLM responded that the claims were likely to be true, partially true, or consistent with research almost two-thirds of the time. While it sometimes noted that statements were not established or were unsupported by current research, it flagged a statement as false in only one percent of cases.
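To give a rough sense of how such verdicts could be tallied, the sketch below pairs a direct claim-check query with a crude keyword bucketing step. It again assumes the OpenAI Python client; the prompt wording and keyword lists are hypothetical, since the paper's exact coding scheme is not reproduced here.

```python
from openai import OpenAI

client = OpenAI()

def check_claim(claim: str) -> str:
    """Ask the model directly whether a claim is true."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{
            "role": "user",
            "content": f"Is the following claim true? Answer briefly.\n\n{claim}",
        }],
    )
    return reply.choices[0].message.content

def bucket(verdict: str) -> str:
    """Crude keyword bucketing of a model verdict; the check order matters
    because 'partially true' contains 'true'."""
    text = verdict.lower()
    if "false" in text:
        return "false"
    if "partially true" in text or "consistent with" in text:
        return "partially true"
    if "not established" in text or "unsupported" in text:
        return "unsupported"
    if "true" in text:
        return "true"
    return "other"
```

In the study's version of this step, roughly two-thirds of verdicts landed in the true or partially true buckets, and only about one percent in false.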

The results did not surprise Schneider for two reasons. “We know that LLMs lie. They tell us what we want to hear,” she said. Additionally, through her studies on post-retraction citations, she had observed that researchers continue to cite work that has been retracted, indicating that retracted research still circulates.

She noted that one of the strengths of this study was how carefully the researchers chose the retracted studies. “They gave a lot of thought to the subtleties of why things were retracted, and whether [in] certain reasons for retraction, the information might still be valid…like plagiarism,” she said. Going forward, she suggested that researchers could use this methodology to investigate whether other LLMs have a similar tendency.

Thelwall agreed, noting that they could conduct a similar study using more powerful models. But for now, he hopes that the results highlight the limitations of AI tools. “They can augment us; they can help us, make us more efficient,” he said. “But they can't replace [us]. Not yet, anyhow.”

  1. Thelwall M, et al. Does ChatGPT ignore article retractions and other reliability concerns? Learn Publ. 2025;38(4):e2018.

Meet the Author

  • Sneha Khedkar

    Sneha Khedkar is an Assistant Editor at The Scientist. She has a Master’s degree in biochemistry, after which she studied the molecular mechanisms of skin stem cell migration during wound healing as a research fellow at the Institute for Stem Cell Science and Regenerative Medicine in Bangalore, India. She has previously written for Scientific American, New Scientist, and Knowable Magazine, among others.
