Retractions are on the rise. But reams of flawed research papers persist in the scientific literature. Is it time to change the way papers are published?
May 1, 2016
© KRIS MUKAI
An unfortunate story has become all too common: a researcher is suspected of having manipulated data, an investigation is launched, the paper is retracted by a scientific journal, and the offending scientist is punished. But while cases of misconduct and subsequent retractions headline a growing reproducibility problem in the sciences, they actually represent a relatively small number of the flawed studies out there. The vast majority of publications that reported inaccurate results, used impure cell cultures, relied on faulty antibodies, or analyzed contaminated DNA are not the result of wrongdoing, but of honest mistakes, and many such papers persist in the scientific literature uncorrected.
“I think there is a continuum between fraud and errors, and I think people are all too willing to go easy on something if there is no fraud,” says Columbia University statistician Andrew Gelman, who blogs about retractions and reproducibility problems in the scientific literature.
Are these “zombie papers” (to repurpose a term coined by academic publishing watchdog Leonid Schneider) benign—relics of antiquated methodologies or poor reagents that serve as a historical record for the field of inquiry? Or are they worrisome enough to be hunted down and excised from the body of the scientific literature altogether, in the same way that intentionally falsified reports are?
Many researchers argue for the latter. Flawed papers, especially those that become highly cited, run the danger of perpetuating faulty methods or conclusions, sending funding and effort in fruitless directions, and building layers of theory upon shaky conceptual foundations. In this way, zombie papers can spawn more zombie publications, and the damage can be amplified and spread in an infectious pattern.
“It is a big problem, and it is a pervasive problem,” says Brian Nosek, a University of Virginia psychologist and cofounder/executive director of the Center for Open Science. Just how big remains unclear, but Gelman estimates that flawed publications may outnumber the good ones. “I think there are journals and years where I would guess more than half the papers have essentially fatal errors,” he says.
And the zombie horde will only continue to grow as ever more journals churn out reams of scientific papers at an increasing rate. Nosek and Gelman are critical of traditional scientific publishing, which has remained essentially unchanged for centuries. They and others say it’s time to modernize the process. Over the past couple of years, researchers have begun to implement new mechanisms and avenues to review, flag, correct, and annotate the scientific literature. In the future, some hope, the way that researchers and publishers interact with each other and the body of work they generate could be radically transformed.
“There is certainly evolution in how people are thinking about these issues,” Nosek says, “and what role publishers then would play if there was more responsivity to evidence as it accumulates rather than just the static record of what was thought at that particular time.”
In the early 1980s, Svante Pääbo was a PhD student at Uppsala University in Sweden studying how an adenovirus can block a human histocompatibility antigen and so conceal itself from its host’s immune system. But the young Pääbo, now director of the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, had a side project up his sleeve. “I had studied Egyptology before I went to medical school, so I knew there were all these hundreds and thousands of mummies in the museums,” he says. “I thought I should try to see if DNA might be preserved in them.” Pääbo obtained samples from 23 mummies and scoured them for traces of usable genetic material. And in a few of the samples he found some. He stained the mummy cells, located the nuclei, and cloned the DNA from one of the samples, taken from a child who died 2,400 years ago, using a plasmid vector, as was the era’s go-to DNA sequencing protocol. In a 1985 letter to Nature, Pääbo reported that he had extracted and sequenced DNA from the millennia-old relic.
The publication helped launch the now red-hot field of ancient DNA research. Pääbo would become known as a pioneer of the discipline, and he would go on to extract ancient DNA from a variety of long-dead organisms, extinct mammoths and Neanderthals among them. There was just one problem. That mummy DNA Pääbo sequenced was not from the mummy at all. As Pääbo himself determined nearly a decade later, using the newer method of PCR amplification that became widely used around 1986, the genetic material he had isolated was actually from a modern-day human, likely from the antigen research that he was also conducting. “In hindsight, that clone that’s presented there is surely a contaminant,” Pääbo says.
In 1994, after Pääbo revisited his original mummy data and realized the error, he and colleagues briefly admitted to the mistake in a Nature paper describing their sequencing of ancient mammoth DNA (using methods to ensure contamination was avoided). We “believe that [contamination] represents a great danger to the field of molecular archaeology,” Pääbo and his coauthors wrote, adding that sequences retrieved by molecular cloning are particularly susceptible and “are therefore of only limited scientific value.” More than 20 years later, however, Pääbo’s 1985 mummy DNA paper still stands without a correction or erratum.
While Pääbo is candid about the mistake he made as a PhD student, he contends that the paper doesn’t need formal correction, much less retraction, for three reasons. First, the methodologies it showcased were so rapidly overtaken by advancing technology—PCR and, later, targeted sequencing library preparation and direct DNA capture—that there was no danger of anyone using plasmid cloning and obtaining similarly misleading results, he says. Second, the histological staining results he presented in that paper remain valid. “In general, I do not think I would call the 1985 paper a ‘zombie paper’ in the sense that if it is cited today it is to say that DNA from ancient tissues can survive and be studied,” he wrote in an email to The Scientist. “That conclusion is right even if the actual DNA sequence shown is wrong.” And third, the 1985 paper was more a proof of concept, and was not meant to form a foundation for future research, he says. “It’s not that that sequence leads to any conclusions, any inference about Egyptian history or something.”
Nature seems to agree that the paper, which has been cited more than 560 times since its publication, according to Google Scholar, should be viewed as more of a historical relic than a blemish in the literature. “As technology evolves, so too does science, and new technologies, techniques, and evidence may lead to the reinterpretation or refining of a finding,” Sowmya Swaminathan, head of editorial policy at Nature, wrote in an email to The Scientist. “Researchers accept this as a part of science evolving.”
Leonid Schneider, an erstwhile molecular biologist who now bills himself as an independent science journalist and frequently writes about science publishing and researcher misconduct, also concurs that the 1985 paper has value, but he suggests that action be taken, more on principle than because of any chance of extreme scientific damage. “I still recommend that [Pääbo] issue a statement to go with this article,” he says, “so that whenever somebody clicks on this article from the original publisher, they should also see a statement explaining which part of it is not reliable anymore. So I think it is his duty, even if it’s 30 years old.”
In today’s era of digital publishing, flawed studies are much more likely to attract immediate criticism than did Pääbo’s 1985 mummy DNA paper. In December 2010, for example, then NASA research fellow Felisa Wolfe-Simon and her colleagues published a paper showing that a gammaproteobacterium collected from Mono Lake in California was capable of replacing the essential element phosphorus with arsenic, so that it could grow in an arsenic-rich medium devoid of phosphorus. But after a NASA press conference about the findings and the online posting of the manuscript on Science’s website, critics descended on the paper.
Dozens of researchers wrote on blogs, in online forums, and directly to Science claiming that they spied problems with the study’s experimental design and the authors’ interpretations of the results. The journal published much of the debate, including the authors’ responses and a news story detailing the controversy, and when users pull up the paper on the journal’s website, they will find a list of links to these resources. “The scholarly record associated with this paper was significantly amended to reflect the seriousness and volume of questions raised by the scientific community,” Marcia McNutt, editor-in-chief of Science, wrote in an email to The Scientist. “Science published an unprecedented number of technical responses and comments, as a package.” That said, the paper remains, uncorrected and unretracted, largely because its authors maintain the veracity and robustness of its findings.
Just how many flawed papers like the arsenic-life study, as it has come to be known, continue to stand in the literature is anyone’s guess. But it’s likely a very large number, especially if one goes beyond just those papers with identifiable errors to include any study whose methodologies or conclusions have been replaced with new knowledge or understanding. “Any paper has errors. This is part of how science works, right?” says Nosek. “We don’t understand the phenomena we’re investigating, and so we do some research, we identify some things, we learn a little bit more, and we’re a little bit less wrong in how we understand that phenomenon.”
Of course, correcting or retracting the vast numbers of flawed papers isn’t exactly practical. Obesity researcher David Allison of the University of Alabama at Birmingham recently got a taste of the challenges involved in taking on the zombie horde: last year, he and a few collaborators began searching for and trying to correct errors in published papers. For 18 months, the researchers pored over the literature in their fields of obesity, energetics, and nutrition, finding dozens of errors that warranted corrections. But they also found that trying to correct those errors or to retract the papers containing them was a difficult proposition. “After attempting to address more than 25 of these errors with letters to authors or journals, . . . we had to stop—the work took too much of our time,” Allison and his coauthors wrote in a Nature comment published this February.
Too often, Allison says, the concerns he and his coauthors raised—which typically involved problems with the statistical analysis or design of experiments—were met with defensiveness from authors. “Nobody wants to have their errors pointed out publicly,” Allison tells The Scientist. “We all realize they should be, but it’s not fun. If it’s a severe error, we really don’t like it.” And when Allison and his colleagues approached journal editors about the problems they had discovered, most were too consumed with the herculean task of staying on top of mountains of new manuscripts seeking publication to engage in retrospective reviews of already-printed papers. “For the editors, it’s time-consuming for them to resolve this,” Allison says. “So you’ve got all these disincentives up and down the line, and I think that’s a big reason why these things aren’t corrected.”
One way to root out questionable papers is postpublication peer review and online commenting, which has become more pervasive in the form of sites such as Faculty of 1000 (F1000), PubMed Commons, PubPeer, and others, as well as commenting functions on the websites of some traditional publishers. This approach is fraught with challenges, however. “Several journals that have implemented online commenting have since discontinued it,” Science’s McNutt wrote in an email. “For most journals, there may be a staff-power problem in terms of monitoring the commenting to keep it constructive and civil.”
Another consideration is anonymity. Last year, PubPeer came under fire for allowing users to post anonymous comments. PubPeer’s founders—who had retained their own anonymity but revealed themselves in response to the criticism—argued that anonymous comments on the site were not inferior to those posted by registered users, and said in an October blog post that allowing anonymous comments was “the only certain defense against legal attack or a breach of site security.”
Some researchers argue that implementing new systems within the existing one will not be sufficient; policing the literature will require a new, broader approach to scientific publishing. “Our present system is an ad hoc invention that dominated science and [has] never been evaluated,” says Nosek. He envisions a system that can help people assess a study’s value based on all the available evidence. As highlighted by Pääbo’s mistaken identification of mummy DNA in 1985 and his admission of error published in a separate paper nine years later, “there is very little direct connection between any [one] scientific contribution and any other scientific contribution,” Nosek says. “The solution is to have better curation of what is actually happening in science, which is [the] accumulation of knowledge.”
Attempting to create that connection, Nosek has spent the last two years helping to launch the SHared Access Research Ecosystem (SHARE) notification service, a collaboration between his Center for Open Science, the Association of Research Libraries, the Association of American Universities, and the Association of Public and Land-grant Universities. “It’s trying to create a single, open data set of all research events—so not just publications, but also grants and clinical trials and retractions and everything else that happens about research,” Nosek says. Once the massive data set is compiled, he adds, “the second step is providing really good curation tools so that these different units of the research literature are linked together and [it is] much easier to search and discover these kinds of things.”
In Nosek’s vision, the scientific paper ceases to exist as a static snapshot of the current state of understanding. Instead, papers become dynamic entities that authors can continually update with new knowledge. “A paper is a paper, and it’s a paper that way forever,” Nosek says. “But really, as new research happens, we should be able to revise those papers, and then just say this is the new version. A paper is never done, because a phenomenon isn’t understood at that point. So you could imagine careers built on the continuous editing of a single paper, which is what we know about a particular phenomenon.”
Implementing the SHARE project—and its European correlate, the OpenAIRE project—is achievable, says Nosek. The key is to develop the technologies necessary to help researchers search, sort, and filter information about a particular paper, after gathering mountains of information about all papers into a single, searchable pipeline. Although the job of corralling not only the scientific literature but all the ancillary discussion that surrounds published papers under one roof would be a big one, Nosek concedes, there is a precedent that points to the viability of achieving the task. “This problem has already been addressed in very effective ways, and that is [by] news and media information via the Internet,” he says. Search engines like Google allow users to digest a huge amount of information by providing tools that allow them to home in on and highlight specific needles among massive haystacks of information.
At least one title, the open-access journal F1000Research, does indeed allow authors of submitted papers to revise their original manuscripts based on comments from users made postpublication, and it posts revised versions alongside earlier ones, creating “living” versions of scientific studies. “I am delighted that F1000 and other groups are trying new models,” says Nosek. But the real challenge lies in getting the scientific community to broadly agree to adopt such a new system. This might require both researchers and publishers to freely submit not only manuscripts, but also comments, data, and reviews of papers. “To the extent that we can move this infrastructure to be part of the publishing workflow, it’s just a matter of changing our mind-set about what publishing means,” Nosek says.
“If things are preprints, we know not to just believe them, right?” adds Columbia University’s Gelman. “That’s how published papers should be too, I think.”
In addition, Allison says, the scientific community would need to overhaul its whole concept of who actually owns data and research findings. “You’re in charge of it for a while, but it’s really the public’s data,” he says. “And this [change] won’t happen overnight.”
So while zombie papers, such as Pääbo’s mummy DNA study, the arsenic-life paper, and many others too numerous to mention here, will likely live on in the scientific literature, there is a glimmer of hope that, as science adopts a more modern model for publishing and revising results, making papers more dynamic and less static, we may see a downtick in recruitment to the zombie hordes.
May 2, 2016
When I was a graduate student, one of the professors told me that if my research plans included relying on a published result from another lab, I had best repeat that result myself before using it. This is for a variety of reasons, among them simple fraud. Even absent worries about fraud, this is excellent advice: nothing is ever correct until it's been done by two different people.
May 2, 2016
I enjoyed this thoughtful and balanced article. I also think that we can take very practical and feasible steps towards increasing the transparency and reliability of scientific articles without placing unreasonable burden on traditional editors and journals. These steps are articulated in this eLife feature article: http://elifesciences.org/content/4/e12708v1
May 2, 2016
And yet Pääbo's "honest" mistake:
- makes the fundamental point of the paper (i.e. being able to extract DNA from a mummy) void and flawed.
- and most importantly, that flawed paper served him a very successful career. Possibly at the expense of some other researcher who did not make those mistakes but did not publish in a top journal and draw the attention of the media.
May 6, 2016
The problem is compounded when "reputable" publications refuse to publish anything that seems critical of or challenges published articles. The only thing they accomplish is to delay the inevitable while letting zombie publications spread misinformation.
May 8, 2016
I agree that this is a valuable and interesting article. Well done!
May 12, 2016
"its authors maintain the voracity and robustness of its findings". As they should, for a zombie paper B-)
May 12, 2016
I'm glad that someone is finally pointing this out. You would not believe some of the studies I've seen, things like one researcher who used DMSO (toxic on its own) to suspend zinc particles, and then concludes that 5 parts per million of zinc are toxic to living cells! WHAT! If this were true we'd all be dead!
I recently transferred from one school to another, and if I hear the words, "Bad results are still results," one more time, I will lose my cool. Negative results are good results, positive results are good results; bad results just come from bad lab practice. The problem is that students are being taught that whatever results they get are still worth the time and materials it took to produce them, even if they totally screwed them up! Being able to pass a class based entirely on your ability to bs your way through conclusions on a lab report is unacceptable. Imagine transferring into a class where 4th year students (yes, plural) are unable to do a DNA extraction from a kit, yet are still able to pass a lab where the final result depends on the product from that extraction! How about control DNA that is so sheared it's nothing but a smear, yet is supposed to demonstrate an actual medical diagnostic test? If you did work like that in an actual lab you'd be lucky not to be fired, but public institutions are routinely turning out that kind of garbage education. Would you even want to work alongside a person who repeatedly cross-contaminates everyone else's work? Would you ever feel confident hiring someone like that? The "publish or perish" mentality has contaminated so far down the ladder, it will take decades to get back to quality results from quality research.