Since 1990, scholars have been citing older articles (from Verstak et al. 2014. arXiv:1411.0275)

Scholars have been devoting more attention to older literature, a new study of the citation patterns in journal articles reveals.

The paper, “On the Shoulders of Giants: The Growing Impact of Older Articles,” by Alex Verstak and six employees of Google Inc. analyzed the age of citations from 1990 to 2013. They report that scholars are citing proportionally more of the older literature. What’s more, the trend appears to be increasing over time.

“Older articles” were defined as being 10 years old or older.

The researchers report that since 1990, the share of references to older articles has increased by 28%. Business, Economics & Management journals increased their share by 56%, and Computer Science by 39%. In contrast, Chemical & Material Sciences and Engineering increased their share by just 2% and 3%, respectively.
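To make the arithmetic concrete, here is a minimal sketch of how a "share of references to older articles" and its growth could be computed. It assumes the reported percentages are relative increases in the share rather than percentage-point changes. The function names and the two reference lists are invented for illustration and do not come from the paper.

```python
from typing import Iterable

OLDER_THRESHOLD_YEARS = 10  # "older" = at least 10 years old, per the definition above


def older_share(citing_year: int, cited_years: Iterable[int],
                threshold: int = OLDER_THRESHOLD_YEARS) -> float:
    """Fraction of references that are at least `threshold` years old."""
    ages = [citing_year - year for year in cited_years]
    if not ages:
        raise ValueError("no references given")
    return sum(age >= threshold for age in ages) / len(ages)


def relative_increase(share_then: float, share_now: float) -> float:
    """Relative (not percentage-point) change between two shares."""
    return (share_now - share_then) / share_then


# Hypothetical publication years of cited works, invented for illustration only.
refs_1990 = [1989, 1988, 1987, 1985, 1984, 1983, 1982, 1981, 1978, 1970]
refs_2013 = [2012, 2011, 2010, 2008, 2006, 2005, 2004, 2003, 2001, 1995]

share_1990 = older_share(1990, refs_1990)  # 2 of 10 references are 10+ years old
share_2013 = older_share(2013, refs_2013)  # 3 of 10 references are 10+ years old
growth = relative_increase(share_1990, share_2013)
```

In this toy data the share moves from two in ten to three in ten, which is a ten-point rise but roughly a 50% relative increase. Which of the two the headline figures mean changes how large they sound.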

The researchers explain their results by several changes in the way scholarly information is produced, disseminated, and discovered, including:

  1. An industry-wide transformation from print to online distribution
  2. Mass digitization of backfiles and journal archives
  3. Full text indexing, and
  4. Better search engines based on relevance ranking

While the results of this study are not novel (indeed, I covered several earlier studies on this topic several years ago), they are intuitive. For scholars, access to the literature has been getting much, much easier, especially for older materials. As the effort required to identify and retrieve older literature declines, we should expect to see greater use of the older materials.

Most of us (publishers, software engineers, librarians, archivists, and scholars themselves) would accept the findings of this paper as good news, especially in a news cycle that focuses on negative stories (shrinking library funds, commercial publisher profits, licensing impasses, misuse of research funds, plagiarism and fraud in science, among others). We could all benefit from some good news. Everyone, please pat yourself on the back!

Now that finding and reading relevant older articles is about as easy as finding and reading recently published articles, significant advances aren’t getting lost on the shelves and are influencing work worldwide for years after.

Like the recent Google study of more highly-cited articles being published in non-elite journals, this research paper leaves out details that would allow me to make better sense of the data and ignores other factors that may explain their results. For example, there is no mention of article-level linking, facilitated by the Digital Object Identifier (DOI). While I understand that the authors of this study all work for a search engine company, for someone working in publishing or libraries, ignoring the DOI is a major omission.

Similarly, a lack of historical context makes the paper ripe for availability bias. The trend reported by the researchers begins in 1990, years before widespread use of the web and functioning relevance-based search engines, like Google, and more than a decade before Google Scholar debuted. Somehow, it feels like we’ve always had them. But in 1990, you were likely to be using a 2400 bit/s modem. Feeling nostalgic? If you’re over 40, take a short trip down memory lane with the sound of dial-up Internet.

During much of the 1990s, academic journals were print-based, although some publishers were experimenting with digital delivery. Internet connectivity was not ready to send large PDF files, so much early digital access to journal articles came on compact discs, arranged in towers or locked in deep library filing cabinets. Still, even in this exciting time of new tools, few publishers were looking backward into their archives. JSTOR would be the first major company to move into this void.

The first ‘Big Deal’ (bundled access to a publisher’s entire collection of articles) was introduced in 1996 by Academic Press, now part of Elsevier. It was several years before other publishers adopted similar package deals. When the library market became saturated, publishers diverted their attention to digitizing backfiles. Most of these mass digitization projects took place in the mid-2000s.

My historical diversion is to point out that the trend to cite proportionally more older material has been going on for decades, as early as the 1960s, long before the Internet or digital publishing arrived. It is likely that these tools (along with email, Big Deals, mass digitization, article-level linking, repositories, and better search engines) are making it easier to find–and therefore cite–older relevant material; however, these new tools only explain the last little upward swing on Google’s chart.

Something much bigger is taking place.

Phil Davis


Phil Davis is a publishing consultant specializing in the statistical analysis of citation, readership, publication and survey data. He has a Ph.D. in science communication from Cornell University (2010), extensive experience as a science librarian (1995-2006) and was trained as a life scientist. https://phil-davis.com/

Discussion

27 Thoughts on "Growing Impact of Older Articles"

Excellent post, pointing out that access to the literature has improved consistently for decades. Last night, coincidentally, I came across a quote from David Perlmutter: “If the 20th century was defined by a crisis in access, the 21st century will be defined by a crisis in authority.”

Perhaps citing older research is also occurring because authors are finding more clear authority in earlier works, and perhaps authority is harder to find today. As you say, there are many big questions to be considered in this intuitive finding.

My research indicates that most citations usually occur early in a paper, where the background for the reported effort is being described. By convention this background description includes a fair amount of history, with some citations going back several decades. Perhaps it has simply become increasingly fashionable to include more history in explaining the context for the research being reported.

Note that historical citation is not necessarily an indicator of direct influence on the research being reported, contrary to the above quotation. There is anecdotal evidence that many citations are only found in the process of writing the article, after the work is done. Thus the trend in older citations may not mean that researchers are going further back to get their ideas. On the other hand this trend has interesting implications for half-life computations, in the context of embargo mandates, so it may be important to understand its reason for being.

Several university libraries are reporting that the STEM literature available in eBook packages like Springer’s is getting rapidly increasing use by faculty and students, measured in chapter downloads. I know this isn’t exactly on topic, and it’s purely anecdotal, but it’s another development I hope is going to be tracked. Long form, anyone?

Yes, looking at the eBook as container offers new and possibly better ways to organize and disseminate certain information. The trick will be to think beyond the paper analogs (e.g. the novel). The economics of paper allow eBooks to be of any length from very short to very long as well as provide a far more expressive palette to the author, especially where ePub 3 is used.

Wait. This seems like it’s highly confounded with there just being older literature. Before 19xx there was no literature in field Y – choose your X and Y, so as any field develops you’ll see this trend.

This is an excellent point. If we use the “tree of knowledge” metaphor and consider that new fields are like twigs that form from established branches, your hypothesis would be true if these new twigs largely ignored literature published on prior branches and connected trunk. The Google team shows us an overall effect among all articles in their dataset (Figure 1), but also effects at the level of broad areas of research (Figure 2). We could test your hypothesis if we had graphs for each of the 261 subject classifications. Perhaps this is where the true answer is to be found.

Actually, all you need, I believe, is a seminal paper, not a whole new field. (Of course it’s not clear what a field is to begin with.) My advisor, Herb Simon, once told me “never cite anything less than 50 years old that isn’t still being read”. Of course, most of those papers were his. 🙂 My point being that as time goes on, the seminal papers being cited will get older and older. So my current theory on this is that it’s mostly artifact.

I think your Seminal Paper theory is a good one and should be tested. Unfortunately, we don’t have the data from the Google study, nor lists of top cited papers, which may help us understand the nature of older citation. This growing impact of older articles may be concentrated among a small core of seminal papers.

In my “issue tree” model of science the number of papers that ultimately flow from a seminal paper grows exponentially for some time, until the seminal idea becomes more or less exhausted or gives rise to new seminal papers, which take over the growth (and hence the citations). As Price pointed out long ago, specific areas of research tend to exhibit S-shaped cumulative growth curves, first growing rapidly then slowing to something like nothing. When I was with DOE OSTI my team explored this phenomenon using a contagion model. If there are a large number of these growth areas going on at once, in many different stages of development, they might generate long term changes like this Google “aging citation” curve just by chance. Or there might be some structural change in progress. It might take some modeling to figure this out.

I have long been working on an “issue tree” model of scientific progress (and Simon was my advisor too!). Having precursors is as old as science so new research thrusts never come without lots of prior work to cite. See http://scholarlykitchen.sspnet.org/2012/07/17/how-does-science-progress-by-branching-and-leaping-perhaps/. We are after all only talking about a relatively small number of citations per paper, around 20 in many cases. I tend to agree with Jeff that this trend is an artifact, at least in the sense of not a change in how science is done, just how it is reported. (Mind you I am not claiming that this is what Jeff means by an artifact.)

Yes. “Artifact” isn’t the right word for the hypotheses. It’s more correctly a confounded alternate theory, I think.

Thanks for an interesting post, Phil.

A relevant article appeared in Science about a year ago, 4 October 2013, vol 342, pp. 127-132, “Quantifying Long-Term Scientific Impact” — where “long-term” might be up to 30 years! (So, well outside the usual Impact Factor contribution.)
http://www.sciencemag.org/content/342/6154/127.short

On page 130 of the article is the authors’ analysis of why NEJM’s IF might have gone up while Cell’s might have gone down: “Cell papers have gravitated from short- to long-term impact: a typical Cell paper gets 50% more citations than a decade ago, but fewer of the citations come within the first 2 years.”

No explanation about why this shift would happen (beyond the mathematical part of course) in terms of author behavior (citing more background as David suggests) or editor behavior (selecting articles that are knitting together stories told over time?).

I believe I’ve seen some documentation that suggests the number of citations IN a paper is going up (perhaps just as the number of authors increases). But if the growth is in historical papers that would be interesting to recognize.

(I hereby admit to not yet reading the Verstak et al. paper that Phil is writing about! I’ll bet the data to answer my question is inside there. It just went to the top of my reading pile!)

So, this has nothing to do with the fact that Google Scholar also ranks publications based on their previous links, benefitting older publications disproportionately? Newer publications by default have fewer links and thus end up low in the ranking. And scholars may not look so far down the ranking.

This is an interesting and, some say, undesirable feature of GS, but it does not help much with a trend from 1990, or the 1960s as Phil says, because GS is too new. It might relate to the small acceleration in the most recent data.

Phil kindly posted a link to this Kitchen piece on Gene Garfield’s SIGMETRICS listserv, where it got the following interesting response from Éric Archambault:

“This paper doesn’t present much we didn’t already know and in fact omits known explanations.

My colleagues Vincent Larivière, Yves Gingras and I suggested several years ago that this phenomenon was due to the fact that the scientific literature is not growing as fast as the growth of researchers using this literature and since researchers have plenty of time to “digest” what is being produced now, they are increasingly citing older articles.

http://www.science-metrix.com/en/publications/scientific-publications?title=&&search-submit=Apply&page=1#/en/publications/scientific-papers/long-term-variations-in-the-aging-of-scientific-literature-from

This was confirmed mathematically by Leo Egghe:

‘This paper proves two regularities that were found in the paper [V. Larivière, E. Archambault and Y. Gingras (2007). Long-term patterns in the aging of the scientific literature, 1900-2004. Proceedings of ISSI 2007. CSIC, Madrid, Spain, 449-456, 2007]. The first is that the mean as well as the median reference age increases in time. The second is that the Price Index decreases in time.

Using an exponential literature growth model we prove both regularities. Hence we show that the two results do not have a special informetric reason but that they are just a mathematical consequence of a widely accepted simple literature growth model.’

https://doclib.uhasselt.be/dspace/bitstream/1942/9283/2/showing%202.pdf

Eric”

I wonder whether some authors may choose to cite older articles because these may not be behind a paywall and may be more easily accessed when writing papers. Some journals with paywalls may open access to articles a year or two after their publication.

While I don’t take issue with these observations, the disproportionate increase of 56% in Business compared to only 2%-3% in hard science reflects the lag in digitization of the social sciences combined with the relatively recent increase in discovery via Google. Years ago JSTOR proved that older content that is readily accessible will be used. I expect the rate of increase will continue to rise for a growing global audience until we’ve reached a balance point (which apparently the sciences already have).

But as Phil and Eric both point out, this trend is far too old for digitization to be a factor, except perhaps for the slight acceleration toward the end.

We (CIBER Research), in collaboration with Carol Tenopir’s group, interviewed 90 researchers in the sciences and social sciences in the UK and the US for our Trust In Information Sources project. The semi-structured questions we raised included a section on citations. Unfortunately the paper concentrating on the results is not yet published. There is some information at http://www.ciber-research.eu/download/20140115-Trust_Final_Report.pdf. The great majority of researchers started by citing seminal papers and sometimes books.

Anthony

Here’s something else in line with a long-term trend of increasing use of old literature.
There’s a steady increase in cited half lives for Annals of Mathematics. Rounded to the nearest year, half lives for citing year:
1953 – 4
1963 – 8
1973 – 14
1983 – 18
1993 – 22
2003 – 26
2013 – 29

People normally cite a few papers that are decades old, that is standard practice, but for more than half the citations to be over 29 years back is incredible. Unless it were a history journal of course. This is such an extreme case of the trend in question, going from 4 years to 29 years, that it might be a good place for detailed analysis, to try to see what is causing the change.

This is very interesting, and the low rate of increase in chemistry and (I suspect) structural biology correlates with studies at Caltech for the 10/1979-9/1980 fiscal year. We found that over 90% of the references to journal articles were to articles published in the previous 10 years.
