It is well known that the distribution of citation counts is highly skewed, with a few scientists receiving many citations and most receiving very few. What is less well known is that when these counts are aggregated by institution, and then by place, the distributions become even more extreme: most citations are associated with individuals in a small number of institutions, located in an even smaller number of places and countries.
Demonstrating this geographical concentration requires data that can be aggregated in this way. The Institute for Scientific Information's HighlyCited database (www.isihighlycited.com) is such a source.1 In December 2002 the database comprised the 100 or so most-cited individuals in 21 scientific fields. Here, I use it to illustrate the geography of scientific citation.
I must qualify the analysis. The source has several limitations: the data exclude mathematics, the social sciences, and the humanities, and are therefore biased toward the medical sciences. Moreover, the database is growing rapidly: by June 2003 it had almost doubled in size since the December 2002 snapshot used here.
The analysis reveals a remarkable pattern of concentration: 1,222 scientists work in 429 institutions, which are located in 232 places in 27 countries. Almost half of these researchers are in 50 institutions in five countries, most of them in the United States. The top 20 institutions are listed by the number and percentage of cited scientists they employ; together these 20 institutions employ nearly 30% of them. The concentration increases as the data are aggregated from institution to place and then to country. The top 10 locations, ranked by number of scientists, and the areas in which those scientists work are also shown.
To measure these increases, I have computed the relative entropy $R = 1 - H/H_{\max}$, where $H = -\sum_j p_j \ln p_j$ is the Shannon entropy, $p_j$ is the proportion of citations in institution, place, or country $j$, and $H_{\max} = \ln n$ is the largest value $H$ can take over $n$ such units. This statistic varies from 0 to 1: 0 represents a completely dispersed (even) pattern of citations, and 1 represents all citations concentrated in a single institution, place, or country. For institutions, R is 0.23, increasing to 0.36 for places and then to a massive 0.79 for countries.
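The relative-entropy statistic can be sketched in Python as follows. This is a minimal illustration, not the code used for the analysis: the records are hypothetical toy data, the helper names (`relative_entropy`, `share_of_top_k`, `relative_entropy_by_level`) are invented for the example, and H_max is assumed to be ln n over the n units that have at least one cited scientist.

```python
import math
from collections import Counter


def relative_entropy(counts):
    """Return R = 1 - H / H_max for counts over n units.

    H is the Shannon entropy -sum(p_j * ln p_j) of the proportions p_j,
    and H_max = ln n is its largest possible value over n units.
    R = 0 for a perfectly even spread, R = 1 when one unit holds everything.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    n = len(counts)
    if n == 1:
        # H_max = ln 1 = 0, so R is undefined; by convention treat a
        # single unit as fully concentrated.
        return 1.0
    # Units with zero count contribute nothing to H (p ln p -> 0 as p -> 0).
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - h / math.log(n)


def share_of_top_k(counts, k):
    """Fraction of the total held by the k largest units."""
    ranked = sorted(counts, reverse=True)
    return sum(ranked[:k]) / sum(counts)


# Hypothetical toy records, one per cited scientist: (institution, place, country).
scientists = [
    ("Inst A", "Boston", "US"),
    ("Inst A", "Boston", "US"),
    ("Inst B", "Boston", "US"),
    ("Inst C", "Cambridge", "US"),
    ("Inst D", "London", "UK"),
    ("Inst E", "Oxford", "UK"),
]

# Position of each aggregation level within a record.
LEVELS = {"institution": 0, "place": 1, "country": 2}


def relative_entropy_by_level(records, level):
    """Aggregate records to one level of the hierarchy, then compute R."""
    counts = Counter(rec[LEVELS[level]] for rec in records)
    return relative_entropy(list(counts.values()))


for level in LEVELS:
    print(level, round(relative_entropy_by_level(scientists, level), 3))
```

Dividing by ln n puts each level on the same 0 to 1 scale even though the number of units shrinks as data are aggregated (429 institutions, 232 places, 27 countries). The toy values printed here are not meant to reproduce the figures reported above; `share_of_top_k` gives the simpler "top 20 institutions employ nearly 30%" style of measure for comparison.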
The figure above illustrates this basic pattern, mapping the main places as circles proportional to the number of cited scientists in each. These locations bear out common perceptions of where the world's top institutions are most heavily concentrated: four US cities on the West Coast; the Washington-to-Boston corridor; Chicago; the cluster of towns around Research Triangle Park, NC; and, in Europe, central London.
I have not yet examined the local detail of where these institutions are located, but casual knowledge suggests that they are clustered even more tightly at ever-finer scales. For example, the institutions in Boston all lie within a two-mile radius of the MIT (Massachusetts Institute of Technology) Museum, and those in London lie within a three-mile radius of the British Museum. At an even more local scale, the majority of the cited scientists in central London can be found within half a mile of Euston station in Bloomsbury.
Although the analysis is limited by the ISI data's bias toward English-language publications, the medical sciences, and full-time research rather than education, I consider that these findings have important implications for national educational policies, for choosing the best graduate schools, and so on. I do not yet know how robust these indicators of geographical concentration are, though I suspect they will not change much from year to year. The individual names in the HighlyCited database, by contrast, may vary considerably from year to year, but once the data are aggregated across institutions, places, and countries, I expect such volatility to largely disappear.
What interests me most, however, is how places and countries change over decades rather than years. This will give some idea of how global research quality is changing, which is of central importance to science policy everywhere.
Michael Batty is the director of the Centre for Advanced Spatial Analysis, University College, London.
A more detailed commentary based on this work is published as "The geography of scientific citation," Environ Plan A, 35:761-5, 2003.
Further details of this analysis are given on the author's Web site www.casa.ucl.ac.uk/citations.