In researching today’s post, I made several reporting errors that affected how I evaluated Scopus metrics. As a result, I thought it only fair to give Wim Meester, Head of Content Strategy for Scopus, an equal opportunity to have his response read rather than buried in a long list of comments:
After our correspondence on this topic, it is good to read the resulting blog post and see the discussion of document type classification issues and journal metrics. By way of this comment, I would like to clarify some of the statements in the post above.
I believe that some nuance is lost in how Scopus document type classification is described. Contrary to the post’s reference to “Scopus‘ model of allowing publishers to classify their own content and make modifications to its indicators”, we do not, in fact, allow publishers to classify their own content or make such modifications. We take the publisher-provided classification and match it to our own definition of the document type. While we are open to feedback and to publisher disagreement with a classification, we will not change a document type if it does not match our general document type definitions.
I would also like to clear up confusion about the reported SNIP ranks in the table. Source Normalized Impact per Publication (SNIP) is a journal metric calculated with a proprietary methodology developed by CWTS. SNIP measures contextual citation impact by weighting citations based on the total number of citations in a subject field. SCImago does not calculate SNIP values; it calculates a metric called SCImago Journal Rank (SJR). The SJR methodology was developed by SCImago, and it is a prestige metric based on the idea that not all citations carry the same weight. With SJR, the subject field, quality and reputation of the journal have a direct effect on the value of a citation. SNIP and SJR are two different types of journal metrics and, therefore, their values and ranks should not be compared. More details on these metrics and how they work can be found here: http://www.journalmetrics.com/.
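To make the structural difference concrete: following the CWTS methodology paper linked in the comments below, the revised SNIP is, roughly, a journal's citations per paper divided by the citation potential of its subject field, while SJR is computed iteratively, PageRank-style, from the prestige of the citing journals. A rough sketch of the SNIP side (the paper gives the exact definitions):

```latex
\mathrm{SNIP} \;=\; \frac{\mathrm{IPP}}{\mathrm{RDCP}},
\qquad
\mathrm{IPP} \;=\; \frac{\text{citations received in year } y \text{ by the journal's papers from years } y-1,\, y-2,\, y-3}{\text{number of those papers}}
```

Here RDCP is the citation potential of the journal's subject field, normalized over the whole database, so a journal in a field with long reference lists is not automatically advantaged.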
This approach, providing Scopus data to third parties, is no different from how we work with other organizations, such as university rankers, who use Scopus data as input for their rankings. For example, Times Higher Education and QS both use Scopus data for their world university rankings; however, the weight they give to citations and the methodology they use to calculate citation impact differ. The resulting rankings will therefore differ, even though Scopus is the citation data source for both.
I think the comparison you are actually interested in, and what we corresponded about, is the difference in document counts (sometimes referred to as “citable items”). Scopus assigns the document type to the data, and every year we provide the full dataset to CWTS and SCImago. As described in Ludo Waltman’s response here and the research papers he cites, CWTS takes articles, reviews and conference papers and then further excludes those documents that do not contain cited references. That document count is used in the calculation of SNIP. From the same dataset, SCImago takes articles, reviews and conference papers and adds “short review” documents to the document count, which is then used for the calculation of SJR.
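As a purely illustrative sketch of how the same dataset yields two different counts (the record fields below are hypothetical placeholders, not Scopus field names or the actual CWTS/SCImago code):

```python
# Illustrative only: hypothetical record fields, not the real CWTS/SCImago implementations.

def cwts_document_count(records):
    """Documents used for SNIP: articles, reviews and conference papers
    that contain at least one cited reference."""
    types = {"Article", "Review", "Conference Paper"}
    return sum(1 for r in records
               if r["doc_type"] in types and r.get("cited_references"))

def scimago_document_count(records):
    """Documents used for SJR: articles, reviews and conference papers
    plus short reviews; no cited-reference filter is applied."""
    types = {"Article", "Review", "Conference Paper", "Short Review"}
    return sum(1 for r in records if r["doc_type"] in types)
```

Run over the same yearly dataset, the two functions can return different counts for the same journal even though neither party has altered the underlying data.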
Therefore, I believe it is not the integrity of the dataset but the different methodologies used to calculate different types of journal metrics that explain the difference in document counts. Also note that the actual IPP, SNIP and SJR values reported by CWTS and SCImago are exactly the same as those reported in Scopus and other Elsevier sources. These values are consistent and can be trusted.
Finally, I do want to thank you for your critical look at document type classification in Scopus and how it is used to calculate journal metrics. If there is one thing I have learned from this exercise, it is that we should be even more transparent and that there is room for a simple, easy-to-use journal metric that gives credit to every document regardless of how Scopus or anybody else classifies it.
Wim Meester
Discussion
Wim, thank you for your response. I agree that interpretation depends on nuance, but I’m concerned that your definition of Article is much too broad and overlaps with other Scopus classification types. The ambiguity between document definitions allows publishers to make a strong case for one classification type (non-citable) over another (citable).
I’m also wondering exactly how many Articles, Reviews and Conference Papers are published with no references. I don’t think I’ve ever seen one, which makes me wonder whether your classification rubric is working accurately or reliably. Can you or @LudoWaltman provide some examples?
Phil, you are raising an important issue. In footnote 13 of the paper in which the SNIP indicator is defined (http://dx.doi.org/10.1016/j.joi.2012.11.011), the following is written: “We exclude publications that have no references. In many cases, these publications actually do have references, but data on their references is missing in Scopus. Also, some publications appear two times in Scopus, one time with references and one time without.” The last two sentences in this quote indicate an important problem of the Scopus database. You can read more about the problem of duplicate publications, mentioned in the last sentence, in the following paper: http://dx.doi.org/10.1016/j.joi.2015.05.002. Wim will probably be able to provide more information on these issues related to the data quality of Scopus.
Besides capturing data from the original publisher source, Scopus also receives data feeds from third parties such as Medline to enrich records with indexing terms and PubMed IDs, but these feeds do not include cited references. Medline classifies these items as “Article,” and they are deduplicated and merged with the corresponding Scopus item (with the Scopus document type classification prevailing). This deduplication is performed by a high-precision algorithm. However, any algorithm has limits on its precision; sometimes the Medline item is unique, or, with millions of records being processed, differences in the metadata mean a match is not made and the item is not deduplicated. It can therefore happen that there are (duplicate) items classified as Article in Scopus that do not have cited references. As explained by Ludo, CWTS has decided to exclude these documents from the calculation of its metrics by excluding all documents that do not have cited references. If we discover such duplicates in Scopus, we make sure to remove them or merge them with the Scopus record. If our users find duplicates, they are encouraged to report them via the help function in the product or by contacting the local Elsevier customer service team. As with our approach to author profiles, affiliation profiles, research metrics, etc., we trust that combining an algorithm with manual feedback results in the most accurate deduplication possible.
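Purely as an illustration of the merge flow described above (the matching function and metadata fields are placeholders; the actual Scopus deduplication algorithm is not public):

```python
# Hypothetical sketch of the Medline/Scopus merge described above;
# `match` stands in for the high-precision matching step, whose real logic is unknown here.

def merge_feeds(scopus_records, medline_records, match):
    """Merge Medline-derived records into Scopus records.

    scopus_records / medline_records: lists of dicts (hypothetical fields).
    match(medline_item, scopus_records): returns the matching Scopus dict or None.
    """
    merged = list(scopus_records)
    for m in medline_records:
        target = match(m, scopus_records)
        if target is not None:
            # Enrich the matched Scopus record; the Scopus document type prevails.
            target.setdefault("pubmed_id", m.get("pubmed_id"))
            target.setdefault("indexing_terms", m.get("indexing_terms"))
        else:
            # No match found: the Medline item remains a separate "Article"
            # record without cited references -- a potential duplicate.
            merged.append(m)
    return merged
```

In this sketch, the unmatched branch is exactly how an “Article with no cited references” can end up in the database, which is why CWTS filters such records out before computing SNIP.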