A few years ago, I helped institute a journal’s first “Most and Top” lists — Most Read, Top Searches, Most Blogged, Most Cited. They seemed to work pretty well, but the Top Searches list had an anomaly — the term “biological” was coming in ranked in the Top 5 week after week. It didn’t fit. Every other term was a medical condition — diabetes, sepsis, hyponatremia, and so forth. Why would “biological” rank so high? Was there something about our audience we didn’t understand?
Because it was a raw count, it took the team a while to really question its presence. I fell into the same trap. We’d worked hard to create the counters and lists and categories and interface, and the data were the data. Who were we to question what was a pure reflection of search engine behavior?
Finally, after weeks of strangely consistent results, we started digging. We entered “biological” into our search engine, and only one article showed up as a likely result, an article with “biological” in the title. Some IP address detective work ensued. Fast-forward, and it turns out that the author of the article in question had once hired an industrious assistant who had set up an automated search for the term “biological” in the belief that this would help him know when the paper was cited so he could impress his boss. The assistant? He’d moved on a few years earlier, but the automated search was still running. It took about a week for the new staff to find it and shut it down. “Biological” immediately dropped out of our Top Searches list.
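In hindsight, the raw count was the weak point: one tireless script could outvote hundreds of human readers. A minimal sketch of the difference, assuming a hypothetical search log of (IP address, term) pairs with made-up numbers (our real log format and counts were different), might look like this:

```python
from collections import Counter

def top_terms_raw(log, n=5):
    """Rank terms by total number of searches (the raw count)."""
    return Counter(term for _ip, term in log).most_common(n)

def top_terms_by_distinct_source(log, n=5):
    """Rank terms by how many distinct IP addresses searched for them."""
    unique_pairs = set(log)  # collapse repeat searches from the same IP
    return Counter(term for _ip, term in unique_pairs).most_common(n)

# Hypothetical data: one automated client repeating a single search,
# plus ordinary readers searching once each.
log = (
    [("10.0.0.9", "biological")] * 500
    + [(f"10.0.1.{i}", "diabetes") for i in range(40)]
    + [(f"10.0.2.{i}", "sepsis") for i in range(25)]
)

print(top_terms_raw(log, 3))
# [('biological', 500), ('diabetes', 40), ('sepsis', 25)]
print(top_terms_by_distinct_source(log, 3))
# [('diabetes', 40), ('sepsis', 25), ('biological', 1)]
```

Counting distinct sources is only one possible corrective, and it has its own blind spots (shared institutional IP addresses, for instance), but it would have flagged “biological” much sooner.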
The edge of the network can have unexpected power.
Recently, Google learned about the power of the Internet’s Oort cloud, that seemingly endless cluster of small sites orbiting the major players like the outer comet cloud slowly orbiting our solar system, launching the stray comet our way every now and again. As explained in a New York Times article entitled “Search Optimization and Its Dirty Little Secrets,” J.C. Penney’s search engine optimization (SEO) firm probably indulged in “black hat” optimization: tricks used to deceive Google by promulgating links at the fringes of the Internet so intensely that they overwhelm the PageRank algorithms.
Where did the black hat SEO firm sprinkle J.C. Penney links? Some were those bastions of affordable household goods and practical family fashions like nuclear.engineeringaddict.com, casino-focus.com, bulgariapropertyportal.com, usclettermen.org, and elistofbanks.com. (Notice, I’m not linking to those — no need to perpetuate any SEO problems for Google.)
Google responded by instituting correctives that sank J.C. Penney’s rankings well below the radar. It was a less severe reaction than in 2006, when Google delisted BMW for a short period after that company was caught spamming for links.
Of course, scholarly publishers should watch any malfeasance involving Google rankings with some interest. After all, our practice of citation is what Google’s PageRank is expressly built upon. And with new approaches like the Eigenfactor being built to make scholarly citation take on some of the network effects Google has achieved, the relevance only increases.
The Eigenfactor boasts of its approach’s breadth and inclusiveness while denigrating the non-networked approach to calculating impact:
Our algorithms use the structure of the entire network (instead of purely local citation information) to evaluate the importance of each journal.
As the Google/J.C. Penney story reminds us, relying on the entire network can be a double-edged sword. With journals proliferating in number and with more journals launching in remote locations, the network is becoming a potential liability.
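To make the double edge concrete, here is a rough sketch of a textbook PageRank-style power iteration. It is not Google’s production algorithm, nor the Eigenfactor’s actual method; the site names, the damping value of 0.85, and the link structure are all hypothetical, chosen only to show how links from otherwise unnoticed fringe nodes can lift a target’s score.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration.

    links maps each node to the list of nodes it links to (or cites).
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1 - damping) / n + damping * dangling / n
                    for node in nodes}
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical "honest" network: nobody links to store_b.
honest = {
    "news": ["store_a"],
    "blog": ["store_a", "news"],
    "store_a": ["news"],
    "store_b": ["news"],
}

# Same network plus 50 hypothetical fringe sites that all link to store_b.
spammed = dict(honest)
for i in range(50):
    spammed[f"fringe_{i}"] = ["store_b"]

honest_rank = pagerank(honest)
spammed_rank = pagerank(spammed)
print(f"store_b before: {honest_rank['store_b']:.4f}")
print(f"store_b after:  {spammed_rank['store_b']:.4f}")
```

In this toy network, store_b’s score rises several-fold even though none of the fringe sites has any standing of its own. A citation metric that draws on the entire network is open to the same kind of inflation from journals at its edges.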
We’ve talked here before about the Eigenfactor, going all the way back to a 2008 post by Phil Davis summarizing a paper of his in which the Eigenfactor and the impact factor mapped nearly identically, raising the question of whether popularity and prestige are purely reflective in a relatively closed system like scientific communication. As more online-only or online-mainly journals are developed, the power of the network effect on citation systems will grow.
The argument of increased virtue based on network reliance is dubious at best. It’s entirely possible that the errors of the old approach remain while the novel weaknesses of the network are added on top.
(Hat-tip to Marie McVeigh for the pointer.)