Theory of Citing — Handbook of Optimization in Complex Networks DOI: 10.1007/978-1-4614-0754-6_16
The great majority of misprints are identical to misprints in articles that earlier cited the same paper. The distribution of the numbers of misprint repetitions follows a power law. We develop a stochastic model of the citation process, which explains these findings and shows that about 70–90% of scientific citations are copied from the lists of references used in other papers. Citation copying can explain not only why some misprints become popular, but also why some papers become highly cited. We show that a model where a scientist picks few random papers, cites them, and copies a fraction of their references accounts quantitatively for empirically observed distribution of citations.
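To make the last sentence of the abstract concrete, here is a minimal sketch of such a copying process. It is only my reading of the abstract, not the authors' code: the function name `simulate_copying` and the parameters `n_seed`, `n_picked` and `copy_prob` are assumptions chosen for illustration, and the default values are arbitrary.

```python
import random
from collections import Counter

def simulate_copying(n_papers=2000, n_seed=5, n_picked=2, copy_prob=0.3, seed=0):
    """Toy citation-copying process (hypothetical parameters, not from the paper).

    Each new paper picks `n_picked` earlier papers at random, cites them,
    and copies each of their references with probability `copy_prob`.
    Returns a list with the number of citations received by each paper.
    """
    rng = random.Random(seed)
    # references[i] is the set of papers cited by paper i;
    # the first n_seed papers start with empty reference lists.
    references = [set() for _ in range(n_seed)]
    for new in range(n_seed, n_papers):
        picked = rng.sample(range(new), n_picked)  # papers that are "read"
        cited = set(picked)
        for p in picked:
            for ref in references[p]:
                if rng.random() < copy_prob:       # reference copied unread
                    cited.add(ref)
        references.append(cited)
    counts = Counter()
    for refs in references:
        counts.update(refs)
    return [counts.get(i, 0) for i in range(n_papers)]

if __name__ == "__main__":
    citations = simulate_copying()
    print("most cited:", max(citations))
    print("mean citations:", sum(citations) / len(citations))
```

Because copied references are more likely to come from papers that are already cited often, a few papers end up with many more citations than the average, which is the "rich get richer" effect the abstract alludes to. Whether this sketch reproduces the empirical distribution quantitatively is not something it can show.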
Without having read the article, I wonder whether they assume that the misprints mean the authors lazily copied the reference without having studied it, or whether they consider the possibility that the citation format was copied but the authors did indeed read the studies. Previous citations (together with RSS/eTOC alerts for recent articles) are the main way we become acquainted with previously unknown articles. Currently highly cited articles are more likely to be even more cited in the future…
I don’t see an a priori problem with 100% of the citations being cut-and-pasted from previous publications. The problem would arise if, for example, they come in blocks, look very similar to a Google Scholar search, or otherwise suggest that the authors used the references as an argument from authority instead of carefully selecting a few that illustrate the point or review previous work well (i.e. quantity instead of quality). The fact that the citations contain the same typos may point to irresponsible copying, but it might also mean that the authors were not confident about, or had no strong reason to prefer, another way of citing them: e.g. when the citation is not unequivocal (“Lecture Notes in CS” or “RECOMB”? “Molekuliarnaia biologiia” or “Molecular Biology”, which will furthermore change the page numbers?) or the article does not carry enough information (a scanned PDF file without the issue/page info, where I must rely on other sources for the proper reference)…
I also wonder if the authors compare their model to an alternative model (it’s easy to come up with many models that somehow fit the data; the hard part is to quantify which of them is better). I could think of a model in which citation misprints appear at random and spread by drift, which tells us nothing about the motives for copying them from one article to another. The ease with which we can check for typos (and therefore improve our citation practice) is relatively recent, coming with the internet/PDF/DOI/Mendeley/Scholar explosion.
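As a rough illustration of the drift alternative I have in mind, here is a minimal sketch. Everything in it is an assumption of mine rather than something from the paper: the name `simulate_misprint_drift`, the `mutation_rate` parameter, and the rule that each new citation either introduces a brand-new misprint or reuses the written form of a randomly chosen earlier citation. The rule says nothing about whether the citing author read the paper.

```python
import random
from collections import Counter

def simulate_misprint_drift(n_citations=5000, mutation_rate=0.01, seed=0):
    """Neutral-drift toy model of citation forms (hypothetical, for illustration).

    Form 0 is the correct citation; every other integer is a distinct misprint.
    Returns a Counter mapping each form to the number of times it was used.
    """
    rng = random.Random(seed)
    forms = [0]      # the first citation uses the correct form
    next_id = 1      # label for the next new misprint
    for _ in range(n_citations - 1):
        if rng.random() < mutation_rate:
            forms.append(next_id)            # a new misprint appears at random
            next_id += 1
        else:
            forms.append(rng.choice(forms))  # reuse the form of an earlier citation
    return Counter(forms)

if __name__ == "__main__":
    usage = simulate_misprint_drift()
    misprints = sorted((n for form, n in usage.items() if form != 0), reverse=True)
    print("distinct misprints:", len(misprints))
    print("most repeated misprints:", misprints[:5])
```

In a run like this some misprints get repeated many times and most only once, even though no motive was modelled. Comparing the repetition counts from such a model with those of the authors' model would be one way to quantify which fits the data better.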
IOW, I don’t think we can detect lazy referencing by looking at misprints (if I am not making a straw man of their argument by reading just their abstract, that is…).