The United States currently spends about 2.7 percent of its gross domestic product (GDP) on research and development, about half of which comes from federal sources. This amount is comparable to annual expenditures on transportation and water infrastructure (3 percent of GDP) and on education (5.5 percent). The magnitude of the investments required for maintaining the scientific enterprise has resulted in calls for a quantitative assessment of the impact of the contributions of individuals and institutions, so that policy makers are persuaded that resources are being used effectively.
Despite its importance, whether and how to quantify scientific impact remains a source of controversy within the research community. For example, the San Francisco Declaration on Research Assessment has promoted “the need to eliminate the use of journal-based metrics, such as journal Impact Factors, in funding, appointment, and promotion considerations.” I find it surprising that a scientist would propose a move away from measurement and quantification when these activities are at the core of science itself. I believe that when considering an imperfect but necessary tool, the right course of action is to seek to improve it, rather than to discard it. The scientific community—and especially the funding agencies—should support the development of better bibliometric evaluation tools rather than oppose their use altogether.
There is a long history of using bibliometric-based measures to quantify scientific production and impact. Opponents of such measures, including many prominent scientists, have recently urged the scientific community to return to the “gold standard” of peer review. Underlying this recommendation is the “hypothesis” that, if two intelligent, unbiased evaluators were to read the papers of, say, applicants for a faculty position, they would draw the same conclusion about which candidate was best for the job.
This is a naive, unsustainable position. Anyone working as an editor of a journal, or as a member of a selection or promotion and tenure committee, knows how broadly the ratings of papers, individuals, or proposals vary across reviewers. Indeed, Case Western Reserve University’s David Kaplan and his colleagues have demonstrated that one would need tens of thousands of independent unbiased peer-evaluations in order to obtain an accurate ranking. And the number of reviewers needed is not the only limitation of peer review as a measurement process. Like all humans, scientists have biased views of students, collaborators and competitors. Sadly, being an expert is not a guarantee that those biases will be absent; it is only a guarantee that one will be convinced that one is right.
Scientists are already being evaluated using bibliometric-based measures such as number of citations or the h-index. An indisputable fact is that most bibliometric-based measures are very strongly correlated with one another, suggesting that they capture something real and important about the impact of bodies of work.
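For readers unfamiliar with the h-index mentioned above, its standard definition is the largest number h such that a researcher has h papers with at least h citations each. A minimal sketch of that definition follows; the function name and the sample citation counts are illustrative assumptions, not data from any real researcher.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for two researchers with five papers each.
steady = [10, 8, 5, 4, 3]
one_hit = [25, 8, 5, 3, 3]
print(h_index(steady), h_index(one_hit))
```

The second list has more total citations than the first but a lower h-index, which illustrates how different bibliometric measures can weigh the same record differently even when they are correlated overall.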
There are, however, two major challenges when addressing scientific impact. First, scientists benefit from being perceived as having a large impact. Thus, a sound measure of scientific impact must resist manipulation. Second, a sound measure of impact must, arguably, reward quality over quantity. Indeed, one of the risks of the broad use of bibliometric-based measures is the change in publication patterns of scientists with the goal of inflating their own apparent impact. These manipulation efforts are likely related to observed increases in publication rates and in questionable self-citation practices.
Moreover, while citations are arguably the most trustworthy indicators of scientific impact, the number of citations of single papers spans more than five orders of magnitude, with the most highly cited papers having hundreds of thousands of citations. The broad range of observed citation counts, together with the 2006 findings on cultural markets by Columbia University’s Duncan Watts and his team, suggests that the process by which a paper’s quality gets translated into citations is almost certainly driven by “rich get richer” dynamics.
It may be possible to overcome these challenges, however, if one possesses a rigorous characterization of the statistical properties of the number of citations to scientific papers. Using the functional form of the distribution of the number of citations, one could take a principled approach to developing measures of impact.
My lab has demonstrated that the logarithm of the number of citations to a paper published in a journal converges to an ultimate value within about 10 years. Remarkably, the distribution of the ultimate number of citations to papers published in a scientific journal converges to a discrete lognormal distribution with stable parameters μ and σ, which are analogous to the parameters with the same names for a Gaussian distribution. Our results suggest that there is a latent quantity, which one might denote “citability,” that determines a paper’s ability to accrue citations.
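To make the idea of a discrete lognormal concrete, here is a minimal sketch, not my lab's actual procedure. It assumes each paper has a latent Gaussian citability q with mean μ and standard deviation σ, and that the observed count is exp(q) rounded down to an integer. The function names, the natural-log base, the floor rule, and the +1 offset used to handle uncited papers are all illustrative assumptions.

```python
import math
import random
import statistics


def simulate_citations(mu, sigma, n_papers, seed=0):
    """Simulate integer citation counts from a latent Gaussian 'citability'.

    Assumption: count = floor(exp(q)) with q ~ Normal(mu, sigma).
    """
    rng = random.Random(seed)
    return [math.floor(math.exp(rng.gauss(mu, sigma))) for _ in range(n_papers)]


def estimate_mu_sigma(citations):
    """Crude estimate of (mu, sigma) from the mean and spread of log(count + 1).

    The +1 offset is an assumption that keeps uncited papers (count 0) defined;
    it slightly biases the estimates for journals with few citations per paper.
    """
    logs = [math.log(c + 1) for c in citations]
    return statistics.fmean(logs), statistics.stdev(logs)


# Hypothetical journal: 20,000 papers with mu = 3.0 and sigma = 1.0.
counts = simulate_citations(mu=3.0, sigma=1.0, n_papers=20000, seed=1)
mu_hat, sigma_hat = estimate_mu_sigma(counts)
print(f"estimated mu = {mu_hat:.2f}, estimated sigma = {sigma_hat:.2f}")
```

Under these assumptions, two numbers summarize a journal's entire citation distribution, which is what makes the functional form a useful starting point for principled measures of impact.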
Even though we lack a deep understanding of the concept of scientific impact, our burgeoning understanding of the dynamics of citations will enable the development of measures that are objective, easy to calculate, resist manipulation, and foster desirable publication behaviors. With such measures in hand, we will be able to finally uncover the individual and institutional conditions that foster significant scientific advances, and help policy makers and the public to become confident that resources are being used wisely.
Luís A. Nunes Amaral is a professor of chemical and biological engineering, physics and astronomy, and medicine at Northwestern University, where he co-directs the Northwestern Institute on Complex Systems.