A study shows that the methods by which scientists evaluate each other’s work are error-prone and poor at measuring merit.
October 15, 2013
Scientific publications are regularly evaluated by post-publication peer review, number of citations, and the impact factor (IF) of the journal in which they are published. But research evaluating these three methods, published in PLOS Biology last week (October 8), found that they do a poor job of measuring scientific merit. “Scientists are probably the best judges of science, but they are pretty bad at it,” said first author Adam Eyre-Walker of the University of Sussex in the U.K. in a statement.
Eyre-Walker and coauthor Nina Stoletzki of Hannover, Germany, analyzed post-publication peer review databases from Faculty of 1000 (F1000) and the Wellcome Trust, containing 5,811 and 716 papers respectively. In each of these databases, reviewers assigned subjective scores to papers based on merit. Eyre-Walker and Stoletzki expected that papers of similar merit would get similar scores, but they found that the reviewers assigned papers the same scores about half the time—only slightly more often than expected by chance. The researchers also found a strong correlation between the IF of the journal in which papers were published and the merit scores that reviewers assigned to papers.
“Overall, it seems that subjective assessments of science are poor; they do not correlate strongly to each other and they appear to be strongly influenced by the journal in which the paper was published, with papers in high-ranking journals being afforded a higher score than their intrinsic merit warrants,” the authors wrote.
Eyre-Walker and Stoletzki also found that the number of citations a paper accumulated was mostly random, though papers published in journals with higher IFs had more citations, suggesting that citation count is also a poor measure of merit. They suggested, too, that because reviewers usually disagree on the merit of a paper, journal IF is likewise an inconsistent way to judge it. However, they concluded that IF is probably the least error-prone of the three measures, owing to its transparency.
In an accompanying editorial, Jonathan Eisen of the University of California, Davis, and colleagues wrote that the “study is important for two reasons: it is not only among the first to provide a quantitative assessment of the reliability of evaluating research . . . but it also raises fundamental questions about how we currently evaluate science and how we should do so in the future.”
October 15, 2013
A response from Faculty of 1000 to the Eyre-Walker and Stoletzki study has been published here: http://blog.f1000.com/2013/10/10/peer-review-subjective/
Disclosure: I am currently employed by Faculty of 1000
October 15, 2013
Big news here! Peer reviewers are lousy! Details at 11!
So I went over my publications from the past five years.
I've had exactly two reviewers who were good, and eight who stank. The rest were mediocre but tolerable. Their faults included:
- Glaringly obvious they had abysmal English comprehension and could not understand what they reviewed.
- Their complaints had nothing to do with what was written, but went off on a tangent, despite a clear letter that addressed why those aspects were left out. (Impossible to write everything in one paper.) Getting rejected for impossibilities is very frustrating.
- Indecipherable gobbledegook. Impossible to tell what on earth they were talking about.
- Astonishing errors of basic science or geometry. I mean real jaw-droppers, like appearing not to know what molecular weight is.
- Innumeracy and failure to understand basic mathematics.
- Comments not tied to specific parts of the paper, making it impossible to respond to them.
I'll balance that by saying that I have reviewed papers with similar problems. But I almost always wade through and write real comments. Even for the worst paper, I did that for the first two pages.
October 15, 2013
It is good that people are defining criteria for evaluating peer review. If flaws are identified, then the next step has to be the definition of remedies. I see lots of complaints: Every rejected manuscript is blamed by the authors on an incompetent reviewer even when the authors should be a bit more introspective. Then you have the manuscripts accepted that should have been rejected. Yes, those all occur and still will, even under the best of circumstances. But let's get beyond the griping and define improvements that lessen (although never will eliminate) the issues that we all gripe about.
Do we all self-publish online with views, reads and 'stars' as the merit criteria? If the work is out there and good, do these merit criteria even matter, particularly when they can be gamed to look better than they are? If evaluations are needed, who does the evaluation? Anybody on the web no matter her/his qualifications? Is the Faculty of 1000 model the way to go? If so, why has it gained no traction? Equally important, if self-publishing, how does one ensure the integrity of the work and its archiving for history (traditionally in the domain of libraries)? There are lots of moving parts to be considered thoughtfully.