It’s no secret that Google’s PageRank algorithm is basically the familiar journal citation approach blown out mathematically and practically to achieve the real-time network effect. Oh, how powerful it is! Now, Google is going a bit more old school, ratcheting its engine back to dabble in math of a different kind, this time the math of Jorge Hirsch, whose h-index is slowly becoming an alternative to the impact factor.
While the h-index was originally designed to measure an individual scientist’s impact, its attempt to square article counts with citation counts can be applied to any set of articles. Google is calling its creation of five-year h-indices “Scholar Metrics.” For any set of articles, the h-index is the largest number h such that h of the articles have each been cited at least h times. In applying the h-index to journals, Google is creating a set of very interesting tensions.
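To make the definition concrete, here is a minimal sketch of the calculation in Python. The citation counts are invented for illustration; this is not Google’s code, just the textbook h-index computation.

```python
def h_index(citation_counts):
    """Return the largest h such that h articles have >= h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # at least `rank` articles have >= `rank` citations
        else:
            break
    return h

# Five articles cited 10, 8, 5, 4, and 3 times: four of them have at
# least 4 citations each, but there aren't five with at least 5, so h = 4.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

Scholar Metrics applies this same calculation to the set of articles a publication has produced over five years.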
One of the main advantages of Google Scholar is that it is more comprehensive in its scope than Thomson Reuters’ Web of Science. At least, let’s start with that assumption — that more is better. Using Google Scholar, you get a ranking like that shown in the figure below. Nature is the top-ranked journal, followed by NEJM, Science . . . and RePEc? That stopped me. Turns out that’s Research Papers in Economics, a metadata hub for economics papers — or, as it describes itself, “a decentralized bibliographic database of working papers, journal articles, books, book chapters and software components, all maintained by volunteers.” Publishers participate by depositing metadata, and RePEc provides links. RePEc has its own rankings of the papers listed there, but getting to a paper is no easy feat.
So is more better? RePEc is a metadata hub listed among journals. It’s not indexed by Thomson Reuters for good reason — RePEc produces no new scientific information itself. Yet, it has an h-index in Scholar Metrics. Strange.
Other entries raise doubts about whether the wider selection Google Scholar draws on to generate Scholar Metrics adds value — arXiv.org and SSRN both publish interesting preliminary papers, but do they belong on this list? Conferences can also make the list.
Now, some may argue that adding these “gray literature” elements is a good thing, but it’s a highly selective set of the gray literature — I know newsletters, blogs, and monograph series that could be included but aren’t indexed in Google Scholar because they don’t have an academic institute or journal brand behind them. While Google Scholar is broader, it’s also idiosyncratic.
An interesting 2011 paper in Research on Social Work Practice entitled “Evaluating Journal Quality: Is the H-Index a Better Measure Than Impact Factors?” compared the h-indices of social science journals with their five-year impact factors and found a high correlation. However, faculty ratings of empirical quality correlated better with h-index values, something the authors attribute to their field’s applied research culture — that is, useful clinical research aimed at practitioners (who don’t cite) may attract fewer citations yet build a better reputation, something citations in and of themselves may not capture completely.
But who needs to continue listing Scholar Metrics’ virtues? The bottom line:
It’s free! It’s Google!
OK, let’s settle down. Pitfalls exist. For instance, is a “citation” in the Google Scholar index really a citation in the traditional sense? We’re all well aware of the baggage a citation can carry — most are straightforward, but many are not, and they can range from damning to fraudulent. Which gets us back to one of the main questions — are the extra citations Google Scholar captures better? Or are they merely adding noise? Thomson Reuters analyzes self-citation and punishes journals that generate an appreciable percentage of their impact factor through self-citation. Yet, looking at the arXiv.org citation data in Google Scholar, it’s clear that most of the citations are self-citations from one arXiv.org paper to another. The same seems to hold true for SSRN, although there are more citations from outside.
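A screen of this sort is easy to describe in code. The sketch below is hypothetical: the venue names and citation pairs are invented, and this is not how Thomson Reuters or Google actually computes anything. It simply shows the basic calculation — what share of a venue’s incoming citations come from the venue itself.

```python
def self_citation_rate(citation_pairs, venue):
    """Share of citations *to* `venue` that also come *from* `venue`."""
    incoming = [src for src, dst in citation_pairs if dst == venue]
    if not incoming:
        return 0.0
    return sum(1 for src in incoming if src == venue) / len(incoming)

# Invented data: (citing venue, cited venue) pairs.
citations = [
    ("arXiv.org", "arXiv.org"),
    ("arXiv.org", "arXiv.org"),
    ("Phys. Rev. D", "arXiv.org"),
    ("SSRN", "SSRN"),
    ("J. Finance", "SSRN"),
]

print(self_citation_rate(citations, "arXiv.org"))  # 2 of 3 incoming -> 0.666...
print(self_citation_rate(citations, "SSRN"))       # 1 of 2 incoming -> 0.5
```

A venue-level h-index built on citation pairs like these can rise on self-citation alone, which is exactly the concern with the arXiv.org numbers.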
There are other concerns with Google’s approach. For one, only journals meeting Google’s inclusion criteria can participate in Scholar Metrics. These inclusion criteria are largely technological in nature, and can change with the wave of an engineer’s hand, as happened to multiple journals last summer. Untangling a new Google edict can take months, during which time it seems a journal would be delisted from Scholar Metrics. While being delisted from Web of Science is usually a sign of some sort of malfeasance, being delisted from Scholar Metrics could be due to misapplied headers or a misconfigured robots.txt file.
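For a flavor of how technological those criteria are: Google Scholar’s indexing guidelines ask article pages to carry bibliographic meta tags, such as the Highwire Press-style citation_* tags. The snippet below is illustrative only; the values are invented, and the exact requirements are Google’s to change.

```html
<!-- Illustrative Highwire Press-style meta tags of the sort Google
     Scholar's indexing guidelines describe; all values are invented. -->
<meta name="citation_title" content="A Hypothetical Article Title">
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_publication_date" content="2012/04/02">
<meta name="citation_journal_title" content="Journal of Hypothetical Studies">
```

Garble tags like these, or block the crawler with an overzealous robots.txt rule, and a journal’s articles can quietly drop out of the index.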
Google’s Scholar Metrics is a nice start on something that could be honed into a useful, free tool. But as it stands now, a start is all it is. More human judgment must be brought to bear, whether through better engineering, actual human curation, or a mixture of the two.