We all want to be scored. We want to know exactly where we stand. We want to know how much people like us. In other words, we want metrics. I am not an expert on human behavior, so I really can't explain the science behind this, but it seems to be a universal human condition.
Despite solid evidence that workers experience anxiety and demoralization when ranked in performance evaluations, companies that try to get rid of ratings and rankings face lots of resistance from staff.
Despite a common-sense push to promote students in school based on ability and not age, we continue to see no creative way around giving grades and having kids take standardized tests that rate how they compare to their peers.
We are comfortable being rated and ranked and we insist on protecting the metrics that currently exist. Added to performance metrics are social metrics. How many “impressions” did my tweet get? What about “total engagements?” Did my “thought” perform better on Facebook? How many “reactions” did I get? How many of those were positive and how many negative?
With the exception of professional athletes, who literally make a living on whether they have positive stats, the only other professionals who seem absolutely enamored of metrics are academics.
All metrics come with an asterisk, a caveat, a grain of salt, or several grains of salt. What follows is a review of the strengths and limitations of different metrics around citation, attention, and usage. For each product or platform, I am making a judgment on how many grains of salt one should keep in mind when using the data. I decided to use this source for measuring the number of grains needed:
- A pinch is a thousand grains of salt;
- A cup holds a million grains of salt;
- A bathtub holds a billion grains of salt;
- and a classroom holds a trillion grains of salt.
Crossref
Citation information from Crossref is fast and relatively accurate, but it depends on publishers depositing XML references along with their Crossref deposits. Participating publishers deposit tagged and parsed references for each paper, chapter, and so on to Crossref. If the references are complete and properly formatted, Crossref tallies the citations from those reference lists. Questions remain about completeness of the literature: really old papers for which there are no Crossref records will never be deposited, and not all publishers are depositing reference data.
Users may find Crossref citation information displayed on journal article pages, either as a simple citation count or as a complete "cited-by" list. Crossref makes the information easy to access via an API. Sharing sites or indexing sites may also use Crossref citation information, but publishers must agree to share the data with the third party. So even among third-party platforms using Crossref citation data, users may see different numbers if a publisher opts out of allowing a third party to access its data from Crossref.
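For those who want to check these numbers directly, Crossref's public REST API exposes the tally as the `is-referenced-by-count` field on a works record. A minimal Python sketch (the helper names and the example structure are mine; the endpoint and field name are Crossref's):

```python
import json
from urllib.request import urlopen

# Crossref's public REST API; no key required.
CROSSREF_WORKS = "https://api.crossref.org/works/{doi}"

def extract_count(record: dict) -> int:
    """Pull the cited-by tally out of a Crossref works record."""
    return record["message"]["is-referenced-by-count"]

def crossref_citation_count(doi: str) -> int:
    """Fetch a work's record from the Crossref REST API and return
    how many times Crossref has tallied it as cited."""
    with urlopen(CROSSREF_WORKS.format(doi=doi)) as resp:
        return extract_count(json.load(resp))
```

Keep the caveat above in mind: this number reflects only references deposited with Crossref, so it will usually differ from WoS, Scopus, or Google Scholar counts.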
Grains of Salt—a pinch: For older papers the citation counts may never be accurate if backfile content is not digitized or if it is digitized but references are not deposited into Crossref. “Most Cited” lists using Crossref data may be based on citations over a specific period of time as opposed to all time. This is not always clear on the publisher or index site.
Web of Science
Citation information from Web of Science (WoS) is accessible by subscription only. That said, some publisher and index sites may use WoS data to display "cited by" information if they have paid to surface it. WoS only counts citations from publications it indexes, which include about 12,000* journals. The actual number of journals in existence is hotly debated, but the 2015 STM Association Report estimated it at about 28,000. WoS publishes a biweekly list of the journals included in the database.
Grains of Salt—a bathtub*: Due to the limited size of the WoS database, much caution should be used with its citation metrics. The citation database only goes back to 1990, so older content is not included. Because WoS accepts publisher-supplied metadata in many different formats, processing seems less automated, and there can be a lag of several months before supplied articles appear in WoS.
*UPDATE: I have been informed that the WoS database includes 28,000 records, with 20,000 of them collecting citation information. I don't have a URL for this information; it comes from Clarivate Analytics staff. Given that the breadth is larger than originally reported in an earlier version of this post, I will upgrade the "grains of salt" status to a cup. 2/9/2017: I was contacted by a different Clarivate employee who provided these numbers: the full database has about 33,000 journals, but the Core Collection has only 17,500, and all of the citation metrics come from the Core Collection. Further, some selected journals go back as far as 1900. You can read more about the process here.
Scopus
Scopus provides a platform for subscribers to access raw citation information and generate their own analyses, and the recently launched CiteScore adds another level of metric crunching. While this is a spiffy new tool, it sits on the existing Scopus functionality of reporting citation counts. Scopus includes 23,000 journals, many more than WoS, and publishes an annual list of the journals in the database.
Grains of Salt—a cup: While bigger than WoS, Scopus still omits many journals and is available only via subscription. Content goes back to 1996*, so citation information is missing for older content. The processing of publisher-supplied content into Scopus seems less automated, again because specific file formats and tagging are not required. There can be delays before newly published content appears in Scopus, which causes slight delays in citation updates as well.
UPDATE: Scopus has been adding backfile content with indexed citations. The database currently includes citations back to 1970, and metadata records without citation data are available back to 1900.
Google Scholar
Google Scholar gets behind-the-paywall access to journals that agree to be crawled; I cannot find any statistics or reports on how many publishers allow this. Google uses the full-text crawl (of paywalled and open content) as well as data from its patent database to count citations. Anyone can access the data. In addition to patent citations, Google Scholar has been adding citations from other sources, such as policy documents and preprint servers.
Google Scholar does not provide a list of which publications or web sites they are indexing or collecting citation data from, though some have tried to make a guess. Unlike WoS and Scopus, there is no curation process so everything gets swept into the metrics, regardless of whether it is helpful or appropriate.
Grains of Salt—a classroom: While Google Scholar is free, giving it a leg up in accessibility, it is not at all transparent about what is actually being counted as a citation. Citations are updated when Google Scholar itself updates. I know that sounds redundant, but the frequency of those updates is an open question. For example, here at ASCE, we noticed some issues with how some content was displaying in Google Scholar. We made the changes they suggested and were told that when they re-index the site later this year, those changes will be reflected. This indicates that data is not updated in real time. In fact, the Google Scholar Citations page warns authors that any fix to citations "usually takes 6-9 months for the changes to be reflected in Google Scholar; for very large publishers, it can take much longer." Google Scholar is also not the cleanest database. Author disambiguation and the collapsing of multiple versions of a paper into one record can be sloppy. It is very possible for the same paper to appear as multiple records, each with its own citation count, and it is not at all clear how duplicates are handled.
Alternative Metrics and Social Sharing Sites
Altmetric is probably the most popular tool for measuring “attention” metrics. In addition to citation data, it collects social media mentions, mainstream media mentions, public policy documents, and “saves” on reference management sites—namely, Mendeley. They are also now counting mentions on post-publication peer review sites (such as Publons and PubPeer), Wikipedia, and social media sites like Reddit and YouTube. Altmetric does perform some “weighting” of the mentions. For example, an author blogging about their own work counts less than Gizmodo blogging about the same work.
Grains of Salt—a cup: Altmetric is not always transparent about how the little donut score for each paper was created, and the weighting is unknown. Further, they continue to add sources and change their weighting rules as more resources become available. This is not a bad thing, but it does mean that the Altmetric score on a paper could change significantly over time, even if no one is talking about the paper anymore.
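To make the weighting idea concrete: a source-weighted attention score is just a weighted sum of mention counts per source. The weights below are entirely made up for illustration; Altmetric's real weights are not public, which is precisely the caveat.

```python
# Hypothetical weights for illustration only -- NOT Altmetric's real values.
SOURCE_WEIGHTS = {
    "news": 8.0,
    "blog": 5.0,
    "author_blog": 1.0,  # an author blogging about their own work counts less
    "tweet": 0.25,
    "reddit": 0.25,
}

def attention_score(mentions: dict) -> float:
    """Weighted sum of mention counts; unknown sources count for nothing."""
    return sum(SOURCE_WEIGHTS.get(source, 0.0) * n
               for source, n in mentions.items())
```

Note the consequence: if the provider later changes a weight or adds a source, every historical score shifts, which is exactly why a paper's score can move even when no one is talking about it anymore.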
ResearchGate, a social sharing site, does provide citation counts for papers but it is not clear where this information is coming from. On a spot check, it does not seem that ResearchGate is using Crossref data and may instead only be counting citations within the full text article shared on their site.
Grains of Salt—a classroom: This is a big black box; we have no idea what happens inside.
Downloads and Usage Metrics
While it may seem to make sense that the most cited papers would also be the most downloaded, this is not often the case. Still, authors are asking for more and more information about how many times their papers have been downloaded. This shift suggests to me that download counts are being included in CVs and tenure and promotion packages. For clarity, I am counting a "download" as a full-text download of the PDF or a view of the full-text HTML.
Download information is available only from the journal platform. Subscription-based journals should provide consistent statistics following the COUNTER rules for reporting, which ensure an apples-to-apples comparison across publishers and eliminate usage by bots and spiders. Not all open access publishers follow COUNTER rules, as they are not beholden to librarians for usage statistics.
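As a toy illustration of what COUNTER-style filtering does, the sketch below collapses repeat downloads of the same item by the same user within a short window. The real Code of Practice specifies exact windows per format and maintains a robot/spider exclusion list, neither of which this sketch implements.

```python
def counter_filtered(events, window=30.0):
    """Count downloads, collapsing repeats of the same (user, item)
    within `window` seconds into a single count (double-click filtering).

    `events` is an iterable of (user, item, timestamp_seconds) tuples.
    """
    last_seen = {}
    count = 0
    for user, item, t in sorted(events, key=lambda e: e[2]):
        key = (user, item)
        if key not in last_seen or t - last_seen[key] > window:
            count += 1
        last_seen[key] = t
    return count
```

Raw server logs would report four hits for a user who double-clicks a PDF; a filter like this reports the apples-to-apples figure librarians compare across publishers.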
Grains of Salt—a bathtub: If a journal moves from one publisher to another, or a publisher moves from one journal platform to another, download history may be lost. Total downloads are also undercounted because papers are shared in private groups (such as on Mendeley), on commercial sites such as ResearchGate, and of course on illegal hosting sites like Sci-Hub and LibGen. The STM sharing principles call for building usage statistics across legal sharing networks, but that infrastructure has not been built yet.
Total downloads (or full text access counts) are also muddled by versions of articles in institutional repositories, funding agency repositories, and preprint/archiving servers. If an author has posted a version of the paper on multiple web sites, he or she may need to go to every site for which downloads are available and add them up.
Can We Fix This?
No. Authors want to know about the impact they are making, and for some of these metrics, their career advancement depends partly on this information. That said, we are fully immersed in a culture of sharing, and authors are heavily encouraged, and in some cases mandated, to put multiple copies of their papers on multiple platforms. While this "spread the seeds" approach may sow more visibility, it can also make quantifying the citations, views, and attention impossible.
What is important is to let authors and editors know the limitations of the data that are available.
15 Thoughts on "How Many Grains of Salt Must We Take When Looking at Metrics?"
An interesting and enjoyable read, thanks. I will only comment that the different databases perhaps should have different amounts of salt allowed depending on the use you intend to make of them – for instance, I find Google Scholar the best overall tool for searching the scientific literature to discover new and/or relevant research, but I agree it would be a terrible tool to make granular assessments of single individuals. (Would it be that bad for making overall comparative assessments of large numbers of people/organizations, though? I don’t know the answer, but perhaps the false positives would balance themselves out in the end.)
I’d also like to suggest that if the default quantum is a pinch, then the other amounts should be defined as kilopinches, megapinches, etc. – let’s go for a new SI unit of scepticism here.
I really like your grain of salt scale, Angela, and will share your article with my students. By coincidence, just last evening we covered publication metrics. It was great to have Sara Rouhi from Altmetric.com join me to cover all the various types of metrics as you did. [Aside: Although I covered the Scopus metrics I missed the new Scopus Citation Tracker. Good to know.]
Despite the limitations you pointed out, the use of such metrics will not go away any time soon. Remembering a time when all we had was the Impact Factor, it seems the area of metrics will only grow larger. Looking through the editorial lens, students offered up some ways that metrics could assist publishers: 1) tracking trends, especially in fast-moving areas of research; 2) helping authors promote their own work rather than relying on the publisher to do all the heavy lifting; 3) using altmetrics to identify fraudulent or out-of-context hype that might need to be countered quickly; and 4) using all available metrics to highlight articles on publisher websites, which could create additional citations/postings and generate a positive feedback loop.
Excellent post. The decisions with anything relating to data start with how it’s collected and even the assumptions that undergird its collection, and track all the way through to interpretation. Too often, we think a number is pure and unfettered by moral or logical assumptions, value systems, incentives, or countervailing information. Interpretation is a key step that gets glossed over too often. The way Altmetric weights tweets vs. news stories still bothers me, for example, as a news story in a major publication certainly travels more widely than merely the 16x difference I think their algorithm gives news stories. It’s probably more like 1,000x. Empirical studies could help move us beyond assumptions, which can be set in stone too early.
Publishers are increasingly interested in having a platform-independent source for usage statistics, especially as the platform square dance continues. As you note, a platform change can prove disruptive to usage stats and long-term comparability. There are also solutions being developed that will make publishers even less dependent on platforms for usage data.
Libraries are increasingly aware that vendors offering content aggregations while also offering analytics solutions need a clear firewall between the practices. And with the same article appearing on various platforms, libraries are looking to compare value and usage across platforms (primary publisher, aggregators). Even basic usage reporting has different layers and complexity depending on your perspective (buyer vs. seller, so to speak).
Kent, you refer to platform-independent solutions to measuring usage. What would these be? Thanks!
Hi Phil. With publishers spreading their content across multiple platforms, there are analytics solutions that aggregate these data and normalize them. We are offering something like this, and there are others. In addition, relying on the back-end COUNTER data from a platform may not be required in the future as other technologies come to bear that take the data from the front-end instead. This could help with both the timeliness of the data, as well as making platform moves carry less risk to data acquisition and structure.
Kent, count me among those who are concerned about conflicts of interest as some content providers move more into the analytics space, but I don’t recall seeing very much suggesting that libraries are increasingly aware of or vocal about this issue. I’d love to see more of this and would welcome any pointers you can share.
I strongly question the assertion that we all want to be measured. If that were really true, then many of us would not seem so hyper-sensitive to the failings of metrics. I believe that those who are most concerned are the ones who dislike their standings (or even the idea that their lives can be quantified). No measurement is perfect, but because of that, professors argue about metrics. Could it be that they secretly hate receiving grades, and that they prefer to dish them out to students instead?
What inspired me to approach this topic at all was work we were doing here at ASCE to get our platform launched on an upgraded site. This involved a pretty substantial redesign and a review of everything being presented. It became difficult for us to determine how the Most Read list was being generated. We also added download counters on articles, which is fantastic as more and more authors are asking us for the number of downloads, but there is no way to add a caveat that says these are the downloads for the last 5 years only (historical data was lost when we changed platforms). We present “cited by” information from Crossref as well as a Most Cited list. But again, most cited this year, for a span of years, for all time? Any of those are possible but we should make it clear to the user what they are seeing.
We also get questions from authors about why they see more citations for their paper in Google Scholar than on our page. Some ask if our download counter includes downloads from places like ResearchGate (um, no!). Being more transparent about what is being measured, what is missing, and where else one would need to go to find additional metrics would benefit inquiring minds.
I think Angela is exactly right to highlight the role of transparency in promoting trust. Many altmetrics stakeholders, including altmetric & Mendeley, worked with NISO to define a set of best practices & provide more transparency in how their metrics are derived/reported.
You can see the gory details behind the metrics for Mendeley, Crossref, & others here: http://www.niso.org/apps/group_public/download.php/16121/NISO%20RP-25-201x-3,%20Altmetrics%20Data%20Quality%20Code%20of%20Conduct%20-%20draft%20for%20public%20comment.pdf (PDF)
Angela, thank you for one of the best SK pieces I've read in a while. I'm dealing with many of the same problems you are at ASCE, so this was especially timely for me. I'm going to share this with my staff, journal editors, and organization managers.
In all measurements there are basically two meta-questions: what do you want to know, and how accurately the measure can tell you this. In scholarly measures, "what do you want to know" boils down to "is person A 'better' (in some sense) than person B?"
Recently, in JASIST, we (http://onlinelibrary.wiley.com/doi/10.1002/asi.23689/pdf , arXiv version at: https://arxiv.org/abs/1510.09099) looked into this question. We found (by comparing different, but very similar measures of exactly the same papers by exactly the same people, chosen to avoid any large systematic) that to say, with 95% confidence, that person A is “better” than person B on the basis of either citations or downloads, person A’s measures must be at least a factor of two larger than person B’s.
Note that these statistical errors are in addition to the normal systematics which bedevil these measures, such as age or (sub-)field or co-authorship pattern of the individual, what publications are covered for a citation database, and whose actions are counted for a download database. Downloads can be especially problematic, as, while downloads by researcher-authors correlate well with citations, downloads by others definitely do not. We reviewed this in ARIST: (http://onlinelibrary.wiley.com/doi/10.1002/aris.2010.1440440108/abstract, arXiv version at: https://arxiv.org/abs/1102.2891).
The errors in citation and download measures are really quite large, and, while for any ensemble of individuals they will tend to cancel, leaving a relation between (say) citations and some other measure of quality, such as grant funding or prizes, this is of little practical use. In the very practical case of making a personnel decision between two qualified individuals citation or download metrics have little use, unless the differences are large enough to be significant, a factor of two or more.
Regarding the salt metric, it is hard to see Scopus as a million times better than Google Scholar. If you do not have a subscription to Scopus, then GS is infinitely better, because we are dividing by zero.
This just in:
New KBS Research Event: The Future of Research Assessment Peer Review vs. Metrics
Thanks Angela, an excellent post and I really like the grains of salt measure – I feel a new metric will appear soon alongside articles and books 😉
In particular, I am pleased to see your score for Crossref. We use Crossref as one of the data sources in Bookmetrix, as it is the only one that 'indexes' all our books and chapters.
http://www.bookmetrix.com/detail/book/781242e6-e31d-47a3-869c-c2a828823159#citations It is, however, only as good as the publishers contributing to it.
The key question that remains is what it all means: for the researcher, for the funder, for the institution. Rightly, you mention that we should 'let authors and editors know the limitations of the data', a point on which we could all try harder. We have done so previously, highlighting the differences between the citation data sources: https://www.springer.com/citations. I think it is very important to provide context so the user understands what it means and doesn't get drowned in a sea of metrics.