Opinion: Mutations of citations

Just like genetic information, citations can accumulate heritable mutations

Sep 16, 2010
Christian G. Specht
Few scientific studies have attracted as much attention as "Cleavage of structural proteins during the assembly of the head of bacteriophage T4", linkurl:published 40 years ago;http://www.nature.com/nature/journal/v227/n5259/abs/227680a0.html by Uli Laemmli. With an estimated 2 x 10^5 citations (about 15 per day), it is inevitable that the article is often cited incorrectly. Indeed, database searches reveal more than 600 variants of the correct reference linkurl:(ISI database).;http://wok.mimas.ac.uk/
Figure 1A. Sequence alignment of the correct citation (#1) and a selection of citation variants (#2-10), comprising the author's name, journal, volume, first page number and year of publication. Sequence identity is indicated in grey.
Wrong citations (WCs) contain errors in the sequence of letters and numbers that make up the correct citation, including the name of the author or journal, the volume and page numbers, or the year of publication (see examples listed in Fig. 1A). The omission or addition of a character, or the replacement of one character by another on the keyboard, leads to variants that can be described in genetic terms and classified as deletions, insertions, point mutations and inversions of characters, or as complete nonsense mutations.
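The genetic classification above can be made concrete with a short comparison of strings. The following is a minimal Python sketch, assuming that a variant differs from the correct citation by a single edit and that both are compared as plain strings in a hypothetical format such as "Laemmli UK 1970 Nature 227 680". The function name `classify_variant` and the catch-all label "complex/nonsense" are illustrative choices, not part of the original analysis; an inversion is taken here to mean two adjacent characters swapped, as in 680 to 608.

```python
def classify_variant(correct: str, variant: str) -> str:
    """Classify a citation variant by the single edit that separates it from
    the correct citation (hypothetical helper for illustration)."""
    if variant == correct:
        return "identical"

    # Insertion: removing one character from the variant restores the original.
    if len(variant) == len(correct) + 1:
        if any(variant[:i] + variant[i + 1:] == correct for i in range(len(variant))):
            return "insertion"

    # Deletion: removing one character from the original yields the variant.
    if len(variant) == len(correct) - 1:
        if any(correct[:i] + correct[i + 1:] == variant for i in range(len(correct))):
            return "deletion"

    if len(variant) == len(correct):
        diffs = [i for i, (a, b) in enumerate(zip(correct, variant)) if a != b]
        # Point mutation: exactly one character replaced (e.g. 680 -> 681).
        if len(diffs) == 1:
            return "point mutation"
        # Inversion: two adjacent characters swapped (e.g. 680 -> 608).
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and correct[diffs[0]] == variant[diffs[1]]
                and correct[diffs[1]] == variant[diffs[0]]):
            return "inversion"

    # Anything that needs more than one edit goes into a catch-all bin.
    return "complex/nonsense"


# Hypothetical serialisation of the correct reference and a few variants.
reference = "Laemmli UK 1970 Nature 227 680"
for v in ["Laemmli UK 1970 Nature 227 608",
          "Laemmli UK 1970 Nature 227 681",
          "Laemli UK 1970 Nature 227 680",
          "Laemmli UK 1970 Nature 277 680"]:
    print(f"{v!r}: {classify_variant(reference, v)}")
```

Variants carrying several independent changes fall into the catch-all bin here; a fuller treatment would align the strings with an edit distance that also counts transpositions, such as the Damerau-Levenshtein distance.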
Figure 1B. Incidence of spontaneous WCs in which the page number is incorrect (Laemmli, U.K. (1970) Nature 227, pages 600 through 700). The most common errors are inversions (680 to 608) or the replacement of a digit with one of similar shape (680 to 630) or value (680 to 681). Note that the number of correct citations (estimated at 2 x 10^5) exceeds the capacity of the ISI database (2^16 = 65,536 'cytes').
While many citation variants are unique, others are found hundreds of times (see ISI database and examples in Fig. 1A). What, then, are the principles that govern the distribution of WCs? The incidence of a WC can be explained by the likelihood that a certain character is mixed up with another. For example, the number 8 is more similar in shape to a 3 than to a 2, so these two substitutions occur spontaneously at different rates (Fig. 1B). Character confusion alone does not tell the whole story, however: when searching ISI for incorrect references to Laemmli's article, citations in which the page number deviates by one are far more common than those with a similar alteration of the year (more than 10-fold, Fig. 1A). Here WCs do not occur in a purely stochastic fashion, because the year carries meaning for the scientist typing the reference, which increases their proofreading activity. Other WCs, on the other hand, are more frequent than one might expect from their unusual sequence (e.g. #10 in Fig. 1A, which has appeared 11 times since 1983). Since these are often found in publications that cite one another, it seems safe to assume that they represent inherited WCs (Fig. 1C).
Figure 1C. Tracing of WCs to an ancestor from 1983 (#10 from Fig. 1A; occurrences 1-9 are identified by research location and year). Inherited WCs are generally transmitted between overlapping groups of scientists within the same institution (boxes) or with shared research interests (dashed lines). Lineages are easily identified in articles that cite a previous paper containing the WC (black lines), although this may involve a missing link that does not contain the WC itself (e.g. 4 to 7).
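The lineage tracing in Fig. 1C amounts to following reference lists backwards from each paper that carries the WC, while allowing for a missing link. A minimal Python sketch of that idea follows, assuming that citation data are available as hypothetical `Paper` records (identifier, year, whether the WC is present, list of cited papers). The names `wc_parents`, `founder` and `max_gap` are illustrative, and the records at the end are invented toy data, not the occurrences shown in the figure.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical record for one citing article."""
    pid: str
    year: int
    has_wc: bool                       # does it carry the wrong citation?
    cites: list[str] = field(default_factory=list)


def wc_parents(papers: dict[str, Paper], pid: str, max_gap: int = 1) -> set[str]:
    """Carriers of the WC reachable from `pid` through its reference list,
    passing through at most `max_gap` papers that lack the WC."""
    found: set[str] = set()
    seen: set[str] = set()
    # Breadth-first, so each paper is first reached via the fewest missing links.
    queue = deque((cid, 0) for cid in papers[pid].cites)
    while queue:
        cid, gaps = queue.popleft()
        if cid in seen or cid not in papers:
            continue
        seen.add(cid)
        if papers[cid].has_wc:
            found.add(cid)             # a carrier: stop searching this branch
        elif gaps < max_gap:
            # A missing link: keep following its own references.
            queue.extend((nxt, gaps + 1) for nxt in papers[cid].cites)
    return found


def founder(papers: dict[str, Paper], pid: str, max_gap: int = 1) -> str:
    """Walk back through WC parents and return the earliest carrier found."""
    best = papers[pid]
    visited = {pid}
    stack = [pid]
    while stack:
        for parent in wc_parents(papers, stack.pop(), max_gap):
            if parent not in visited:
                visited.add(parent)
                stack.append(parent)
                if papers[parent].year < best.year:
                    best = papers[parent]
    return best.pid


# Invented toy records (not the data of Fig. 1C); C is a missing link.
papers = {
    "A": Paper("A", 1983, True),
    "B": Paper("B", 1986, True, ["A"]),
    "C": Paper("C", 1990, False, ["B"]),
    "D": Paper("D", 1994, True, ["C"]),
    "E": Paper("E", 1995, True),
}
print(founder(papers, "D"))              # traced back through C and B to A
print(founder(papers, "D", max_gap=0))   # without missing links, D stands alone
print(founder(papers, "E"))              # no ancestor in the data set
```

With `max_gap=1`, a paper that reaches an earlier carrier only through one intermediate paper without the WC (like the link from 4 to 7 in Fig. 1C) is still attributed to that carrier, while `max_gap=0` restricts inheritance to direct citations. Taking the earliest carrier as the founder is a simplification, since the same WC could also arise independently in unrelated lineages.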
In summary, citation variants arise through a variety of mechanisms similar to those described by molecular genetics. They are heritable between scientists and offer exciting insights into the transfer of knowledge. The high incidence of wrong citations reflects the fact that the information they contain is to a certain extent redundant, so a citation can tolerate many mutations. In future, however, reference management software may help to minimise the number of wrong citations, provided that the database entries are correct in the first place.
Christian G. Specht is a neurobiologist working on learning and memory, currently based at the ENS in Paris.
Editor's note (October 20): This article generated some online discussion, prompting a response from the author linkurl:here.;http://www.the-scientist.com/news/display/57698/