As scientific research proliferates and becomes more costly to produce, funding agencies around the world are increasingly interested in objectively assessing the quality of academic research. Several governments with centralized academic funding mechanisms (e.g., the United Kingdom and Australia) have already implemented research evaluation systems and distribute at least a portion of research funding on the basis of quality assessments. The National Science Foundation is also being pressured by Congress to better assess the results of investments in research. Given this reality, it is important for scientists to understand the strengths and weaknesses of research quality assessment methodologies and to contribute to the ongoing international debate about their appropriate use.
Of course, scientists have always been in the business of evaluating research. Although peer review has a long history, many studies suggest that it is, at best, extremely imperfect.1 To any researcher who has had one reviewer rate a submission as outstanding while another dismisses it as rubbish, the major weakness of peer review should be obvious.
There really is no decent alternative to peer review, however, when evaluating the quality of a single work. It is when evaluating a collection of work, such as that produced by a department, or by an individual over a career, that alternative evaluation methodologies, especially citation and publication analysis, become relevant. The question is whether these alternative analyses really improve on existing evaluation methods. This is important because scientific productivity is increasingly being measured through publication and citation data.
Assessing National Scientific Productivity
Counts of publications are already published frequently, and international comparisons based on them are rarely questioned. The United States produces the highest percentage of publications in each field, and it particularly dominates some fields, such as neuroscience and behavior, molecular biology and genetics, and biochemistry. Various other nations produce a relatively high percentage of world research in certain fields: Japan in biochemistry, biotechnology and applied microbiology, and chemistry; Germany in physics; and the United Kingdom in molecular biology and genetics, and microbiology.
These data are valid, but care must be taken with their interpretation. The data discussed so far, for example, confirm that the United States produces a major portion of the world's science in these fields, but they do not show that the United States is more productive in science than other nations. Per capita data are more appropriate to address that. Adjusting for population reveals that Switzerland, Sweden, Israel, and Denmark all produce a larger share of world publications relative to population than the United States in each of these fields, and the United Kingdom and Canada are more productive than the United States in most of them.
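The per capita adjustment described above amounts to a simple division, and a minimal sketch makes the effect on rankings concrete. The country names and figures below are hypothetical, chosen only for illustration and not drawn from any publication database, and the function names are assumptions of this sketch.

```python
def publications_per_million(publications, population_millions):
    """Publications per million inhabitants for each country."""
    return {
        country: publications[country] / population_millions[country]
        for country in publications
    }


def ranked(values):
    """Country names ordered from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)


# Hypothetical figures for illustration only, not real publication data.
publications = {"Country A": 30000, "Country B": 2000}
population_millions = {"Country A": 300.0, "Country B": 8.0}

per_capita = publications_per_million(publications, population_millions)

print(ranked(publications))  # ordered by raw publication output
print(ranked(per_capita))    # ordered by output relative to population
```

In this made-up case the larger country leads on raw output while the smaller one leads once population is taken into account, which is the pattern the per capita comparison is meant to expose.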
Regardless of which data are examined, most would agree that publications are an important outcome of scientific research. Two important questions remain, however: First, should publication and citation data be used to evaluate research quality, and second, is the same type of data with which conclusions are reached about international research activity also appropriate for assessments of institutions, departments, and individuals?
Producing a lot of research is not the same as producing good research. Most judgments about the quality of published research are based on the perceived quality of the article or book itself or, failing that, on perceptions about the journal or publisher that prints the research. Of course, most judgments about article or journal quality are inherently subjective. Citation analysis provides some degree of objectivity for assessments of research quality or impact. The idea is that an article or journal that is cited by many researchers has, in some way, made a significant contribution to science.
Of course, there are many criticisms of this approach. Some common ones are that "bad" research accrues many citations as people attack it; citations are perfunctory; many highly cited articles are "only" methodological papers or reviews; "good" research may not be recognized for years; and technical problems with the Science Citation Index, the best source of data, make it inappropriate to use. All of these issues, and many others, have been considered in depth.2 The reality is, however, that when you count the citations researchers, journals, departments, or institutions receive, the identities of high achievers are rarely surprising--they are generally those that attain distinction by many other measures.3 A strong record of publication and citation attainment is an accomplishment that most academic scientists agree is desirable, and by analyzing data drawn from the Science Citation Index, these accomplishments can be measured accurately.
Institutional and Departmental Quality
A substantial body of literature (often published in Scientometrics) suggests that citations are a valid measure of research quality or impact. Assuming their validity for a moment, does it follow that institutions and departments attaining many citations are "better" in some way than those that garner few? Perhaps, but there are issues that need consideration.
Most importantly, comparing raw counts of citations obtained by different departments within a university isn't useful because some fields tend to produce shorter (and more numerous) articles, or simply have a tradition of more thorough citing than do other fields. Also, larger fields have more researchers to potentially act as citers than smaller ones. In short, many factors other than departmental quality might account for differences in raw counts. A better strategy is to calculate the percentage of all citations in the major journals of a field that a department garners, and to compare it with the same measure calculated for a department in another field. Citation data make such comparisons possible. A university dean who lacks intimate knowledge of specific fields, for example, might find this information a useful supplement to the possibly embellished claims that chairs make about the quality of their departments.
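The share-of-field measure described above can be sketched in a few lines. The department labels and citation counts here are hypothetical, and the function name is an assumption of this sketch rather than part of any established tool.

```python
def field_share(department_citations, field_citations):
    """Fraction of a field's citations that one department received."""
    if field_citations <= 0:
        raise ValueError("field_citations must be positive")
    return department_citations / field_citations


# Hypothetical counts: citations to each department, and all citations
# appearing in the major journals of that department's field.
chemistry_share = field_share(1200, 400000)
sociology_share = field_share(150, 20000)

print(f"Chemistry department: {chemistry_share:.2%} of field citations")
print(f"Sociology department: {sociology_share:.2%} of field citations")
```

With these invented numbers the chemistry department collects eight times as many citations as the sociology department, yet it holds the smaller share of its own field. A raw count would rank the two departments one way and the field-adjusted share the other.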
Even if a department garners many citations, it is worthwhile to examine the distribution of them. Does a department have only a few high achievers with a disproportionate share of citations while the remaining faculty are much less productive, or are citations distributed relatively evenly among faculty of equal rank? It is likely that the good reputation of a department characterized by numerous solid achievers will be more enduring than a reputation based on the achievements of a single superstar. In the absence of citation data, however, such realities are difficult to gauge for those who lack detailed knowledge of specific fields, but whose jobs nevertheless involve judgments about research performance and resource allocation.
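One way to examine such a distribution is to compute how much of a department's total the most-cited member accounts for, together with a summary of inequality such as the Gini coefficient (0 when citations are spread evenly, approaching 1 when one person holds nearly all of them). The Gini coefficient is a choice made for this sketch, not a measure prescribed here, and the two departments below are hypothetical.

```python
def top_share(citations, k=1):
    """Share of all citations held by the k most-cited faculty members."""
    total = sum(citations)
    if total == 0:
        return 0.0
    return sum(sorted(citations, reverse=True)[:k]) / total


def gini(citations):
    """Gini coefficient of a list of non-negative citation counts."""
    values = sorted(citations)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula on values sorted in ascending order.
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical departments with the same total number of citations.
superstar_dept = [900, 40, 30, 20, 10]
balanced_dept = [220, 210, 200, 190, 180]

for name, dept in [("superstar", superstar_dept), ("balanced", balanced_dept)]:
    print(name, round(top_share(dept), 2), round(gini(dept), 2))
```

Both invented departments total 1,000 citations, so a raw count cannot tell them apart. The concentration measures can: in the first, a single person accounts for nine-tenths of the total.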
While national and institutional performance measures tend to be accepted as valid, evaluating the performance of individual scientists is more controversial. Of course, researchers' performances are routinely assessed in academic institutions, and impressionistic judgments about research quality are central to these evaluations. Supplementing subjective assessments with readily available publication and citation data seems desirable.
Studies are already being conducted that use publication data to evaluate the impact of NSF funding on research output. Scholars are also turning to citation data for tenure battles (and it is often true that those denied tenure have more citations than those who already have tenure). Also, arguments about gender-related or other salary inequities are stronger when research productivity data are included in the equation.
When it comes to evaluating individuals, however, a fundamental problem with citation data is that most scientists attain relatively few citations. There is no good reason to believe, for example, that a scientist with three citations is better than one with only a single citation. Yet three citations is more than many scientists attain during their whole careers. Citation data, which are always highly skewed, are most relevant in analyses involving aggregate units, or in analyses of academic elites.
While citation analysis might be very relevant to evaluating accomplishments at research universities, publication counts (perhaps adjusted for journal quality) might be more suitable for studies at institutions where the central focus is not on research. Regardless of whether citations or publications are more relevant, however, good data for analysis exist and should be a central component of any research assessment.
The scientific community would do well to understand and be actively involved in developing appropriate assessment methodologies. If we fail to develop acceptable performance measures ourselves, we should hardly be surprised if others develop them for us.
Thomas J. Phelan (email@example.com) is director of the Science and Technology Research Project and social sciences computing at the University of California, Los Angeles.
1. J. Campanario, "Peer review for journals as it stands today, Part 2," Science Communication, 19:277-306, 1998; or S. Cole et al., "Chance and consensus in peer review," Science, 214:881-6, 1981.
2. T.J. Phelan, "A compendium of issues for citation analysis," Scientometrics, 45:117-36, 1999.
3. E. Garfield, "Random thoughts on citationology: its theory and practice," Scientometrics, 43:69-76, 1998; or H.A. Zuckerman, Scientific Elite: Nobel Laureates in the United States, Brunswick, N.J., Transaction Publishers, 1996.