Evidence has been mounting that PubMed Central (PMC) pulls traffic away from publishers’ sites, no matter what the publisher’s business model is — subscription or open access (OA). One of the interface choices that seems clearly designed to divert traffic from publishers’ sites is the direct and prominent links to PMC versions of articles which appear immediately in search results lists. Links to publisher versions of articles are one layer below at the abstract level, requiring an additional click and pageload.
This design has been controversial ever since it was first noticed, and may contribute to the traffic siphoning being observed in studies of PMC’s effect on publisher traffic.
So, how does the National Library of Medicine (NLM) justify this design?
According to documents recently obtained through an ongoing Freedom of Information Act (FOIA) request, not very well or clearly — and certainly not openly.
This story begins in February 2011, when Janet O’Flaherty of the BMJ Group emailed Sheldon Kotzin at the NLM:
(Note once again the confusion between PubMed and MEDLINE on O’Flaherty’s part. We continue to see this even among sophisticated people like her and a large proportion of academics.)
Kotzin writes a polite response to O’Flaherty, telling her he is referring her question to Ed Sequeira and informing her of the dates she asked about. That night, Sequeira emails Bart Trawick, David Lipman, Jim Ostell, and Kathi Canese, all NLM employees, asking:
A “docsum” is a “document summary.” In systems that use a docsum approach, a user query returns a list of summary records, each built from the information the system holds about an item. A docsum can be configured to show any of the elements available to the system, which implies that if information is available, it can be included in the docsum. Therefore, leaving selected information out of the docsum is a choice rather than a technological limitation.
Lipman replies the following morning, copying the entire group on his response and cc’ing Dennis Benson and Eric Sayers as well:
This two-part justification is fascinating. In the first part, Lipman is basically saying that publisher sites aren’t reliable enough, and he doesn’t want to have their problems reflect badly on PubMed.
In the second part, Lipman is saying a few things. First, he’s speaking like a publisher who wants to keep people on his version of the article because there are all sorts of fancy services arrayed nearby. Second, he’s saying that the publisher links might be more attractive to users, and he’s not willing to take the risk (“. . . we will clearly lose hits. I would rather drop the PMC links on the docsum than add the publisher links.”). This undercuts the claim that publisher sites are too flaky to link into, and shows that NLM is purposely suppressing links to publisher sites in order to gain and keep traffic. Third, he implies that the docsum isn’t hard to modify: he doesn’t treat dropping the PMC links as a technical challenge, and he doesn’t recall adding them as something that was difficult to accomplish.
Perhaps now that we know that Lipman’s willing to drop PMC links from the PubMed search docsum, we should take him up on the idea . . . after all, that would make for a more level playing field, and it’s apparently an easy solution.
Of course, these rationalizations were not shared with O’Flaherty in Sequeira’s response, which came two weeks after her initial inquiry:
This is a fascinating bundle of nonsense to unpack. Essentially, Sequeira is saying that approaches that work well at the abstract level can’t be made to work well on the search results list, and that risks NLM accepts with links at the abstract level can’t be accepted on the search results list, even though some of these links would help “an extremely important constituency for NLM” and the like.
It’s worth noting that I received copies of these 2011 emails during the government shutdown, while PMC is showing an error message and publisher sites are functioning normally. This undercuts the implicit assumption that PMC is less likely than publisher sites to generate error messages: the entire PMC repository is now generating them, and the cause is not technological but political (and it is lasting longer than any technical outage would be allowed to last).
This short email exchange also shows that Sequeira was hoping for a stronger justification for the PMC links (as he put it, “[a]nything better than: all the links in the PubMed search result go to other pages within the NCBI Entrez site”), but Lipman didn’t deliver.
Or did he?
I believe Lipman’s answer is actually closer to the truth: the managers of PMC are competing for traffic, they implemented a search results design in their docsum that worked, and they were simply defending it and the traffic benefits it had been delivering.
I asked O’Flaherty via email about how she recalled this exchange, but had not received a response as of this writing. (Update: Apparently, there was an email failure, and O’Flaherty never got my message. She’s since read the post, and so far has nothing to add. But the door [and comments thread] is open.)
Based on these emails, it seems the suspicion about NLM’s design of PubMed search results was largely correct: the search results design in PubMed was meant to steer traffic away from publishers’ sites, increase PMC traffic, and show off NCBI and other government information tools. It’s competitive, and it seems to be working. That’s why they’re defending it and are not willing to change it.