Study: Peer Review Predicts Success

Scientists who evaluate National Institutes of Health grant applications often identify the projects that will have the biggest scientific impact, according to an analysis.

Apr 23, 2015
Ruth Williams

The National Institutes of Health (NIH) peer-review scoring system, which is used to select grant proposals for funding, is an accurate predictor of how impactful proposed research will ultimately become, according to an analysis published today (April 23) in Science. Overall, applicants with the highest-scoring grants published the most papers, garnered the most citations, and earned the most patents, researchers have found.

“This is the most important science policy paper in a long time,” said Pierre Azoulay of the MIT Sloan School of Management who was not involved in the research. When it comes to peer review, “most of the pontifications that you hear—most of the anger, editorials, suggestions for reform—have been remarkably data-free. So this paper, as far as I am concerned, is really a breath of fresh air.”

“[As] it turns out,” he added, “the NIH is doing a pretty good job.”

The process by which NIH grants are applied for, reviewed, and awarded has come under scrutiny in recent years. Among the concerns is that the large investment of time and effort required of both applicants and reviewers leaves each with less time for doing research.

Another concern is whether the peer-review process actually works. “There is very little prior research on how effective peer-review committees are at deciding which grant applications to fund, and yet that is the major mechanism by which science funding is allocated in the United States and internationally,” said study coauthor Leila Agha of the Boston University Questrom School of Business.

To evaluate the efficacy of NIH’s peer-review process, Agha and Danielle Li of Harvard Business School compared the scientific impact of more than 130,000 projects funded by R01 grants from across all of the agency’s institutes with the scores those projects received during review. Agha and Li assessed scientific impact according to the number of publications that acknowledged funding by the grant, citations to those publications, and patents that cited either the grant itself or a grant-funded publication. Overall, they found “the better the score that the [peer-review] committee had assigned, the more likely the grant was to result in a high number of publications, or in publications that are highly cited, or even . . . in research that ultimately gets patented,” Agha told The Scientist. “The results are suggestive that the committees are successfully discriminating even amongst very strong applications.”

This correlation persisted even after Agha and Li accounted for differences across applications, such as the year the grant was funded, or the principal investigator’s credentials—including publication history, institution, and prior funding history. This showed “that the intrinsic merit of a scientific idea is more valuable than the actual person,” said Aruni Bhatnagar, a professor of medicine at the University of Louisville who was not involved in the work.

The results “illustrate the ability of the NIH reviewers to identify which projects are going to be the most promising,” said Brian Jacob, a professor of education policy and economics at the University of Michigan. “It would have been a little worrying,” he added, “if they weren’t getting it right.”

This study wasn’t the first attempt to assess the federal agency’s funding approach. NIH’s own Michael Lauer, director of the division of cardiovascular sciences at the National Heart, Lung, and Blood Institute (NHLBI), previously found no correlation between peer-review percentile score and subsequent scientific impact in an analysis of funded NHLBI grants. Lauer suggested this was because his analysis looked only at new grant applications, whereas Agha and Li examined all grants—new and renewed.

“The literature suggests that experts do a much better job of assessing past and present performance than predicting what is going to happen in the future,” Lauer told The Scientist. Agha noted, however, that even when she and her colleague separated new grants from renewed ones, they saw similar results.

Agha and Li, who included awards from all NIH institutes, analyzed a greater number of grants than Lauer (137,215 versus 1,492) over a longer time period. This, too, may have contributed to apparent discrepancies between the studies. Differences aside, “the critical message,” said Lauer, “is that the conversation about peer review of grants is moving into this sphere of rigorous science. Instead of having arguments about opinions . . . we’re debating actual data.”

Even if peer review is a satisfactory predictor of future scientific success, that “doesn’t mean the process can’t be shortened and improved,” said Jacob.

The results of this latest analysis indicate that such improvements can be built from a solid foundation, said Azoulay. “We’re not starting from a situation where peer-review is a complete disaster and where we might as well be picking projects out of a hat or throwing darts,” he said. In refining and reforming the process, he added, “there’ll be no need to throw out the baby with the bathwater.”

D. Li and L. Agha, “Big names or big ideas: Do peer-review panels select the best science proposals?” Science, 348:434-438, 2015.