Useless Peer Review?

A study shows that the methods by which scientists evaluate each other’s work are error-prone and poor at measuring merit.

Oct 15, 2013
Abby Olena

FLICKR, EUROPEAN SOUTHERN OBSERVATORY

Scientific publications are regularly evaluated by post-publication peer review, number of citations, and impact factor (IF) of the journal in which they are published. But research evaluating these three methods, published in PLOS Biology last week (October 8), found that they do a poor job of measuring scientific merit. “Scientists are probably the best judges of science, but they are pretty bad at it,” said first author Adam Eyre-Walker of the University of Sussex in the U.K. in a statement.

Eyre-Walker and coauthor Nina Stoletzki of Hannover, Germany, analyzed post-publication peer review databases from Faculty of 1000 (F1000) and the Wellcome Trust, containing 5,811 and 716 papers respectively. In each of these databases, reviewers assigned subjective scores to papers based on merit. Eyre-Walker and Stoletzki expected that papers of similar merit would get similar scores, but they found that the reviewers assigned papers the same scores about half the time—only slightly more often than expected by chance. The researchers also found a strong correlation between the IF of the journal in which papers were published and the merit scores that reviewers assigned to papers.
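To make the comparison with chance concrete, here is a minimal Python sketch of how one might compare observed reviewer agreement with the agreement expected by chance. The scores below are invented for illustration and do not come from the F1000 or Wellcome Trust data, and the function names are hypothetical. The calculation is a generic one and is not necessarily the method the authors used.

```python
from collections import Counter


def observed_agreement(pairs):
    """Fraction of papers on which both reviewers gave the same score."""
    return sum(a == b for a, b in pairs) / len(pairs)


def chance_agreement(pairs):
    """Agreement expected if the two reviewers scored independently,
    keeping each reviewer's own frequency of using each score."""
    n = len(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    return sum(first[score] * second[score] for score in first) / (n * n)


# Hypothetical data: each tuple is (reviewer 1 score, reviewer 2 score)
# for one paper, on an assumed three-point scale.
pairs = [(1, 1), (1, 2), (2, 2), (2, 1), (1, 1), (3, 2), (2, 3), (1, 1)]

print(f"observed agreement: {observed_agreement(pairs):.3f}")
print(f"chance agreement:   {chance_agreement(pairs):.3f}")
```

With these made-up scores the two reviewers agree on half the papers, while independent scoring would be expected to produce agreement on about 41 percent of them, which illustrates what "only slightly more often than expected by chance" can look like in practice.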

“Overall, it seems that subjective assessments of science are poor; they do not correlate strongly to each other and they appear to be strongly influenced by the journal in which the paper was published, with papers in high-ranking journals being afforded a higher score than their intrinsic merit warrants,” the authors wrote.

Eyre-Walker and Stoletzki also found that the number of citations a paper accumulated was mostly random, though papers published in journals with higher IF had more citations, suggesting that citation number is also poor at measuring merit. They suggested, too, that because reviewers usually disagree on the merit of a paper, journal IF is also an inconsistent way to judge a paper. However, they concluded that IF is probably the least error-prone of the three measures based on its transparency.

In an accompanying editorial, Jonathan Eisen of the University of California, Davis, and colleagues wrote that the “study is important for two reasons: it is not only among the first to provide a quantitative assessment of the reliability of evaluating research . . . but it also raises fundamental questions about how we currently evaluate science and how we should do so in the future.”