Evidence has been mounting that PubMed Central (PMC) pulls traffic away from publishers’ sites, no matter what the publisher’s business model is, subscription or open access (OA). One interface choice that seems clearly designed to divert traffic from publishers’ sites is the placement of direct and prominent links to PMC versions of articles right in PubMed search results lists. Links to publisher versions of articles sit one layer down, at the abstract level, requiring an additional click and page load.
This design has been controversial ever since it was first noticed, and may contribute to the traffic siphoning being observed in studies of PMC’s effect on publisher traffic.
So, how does the National Library of Medicine (NLM) justify this design?
According to documents recently obtained through an ongoing Freedom of Information Act (FOIA) request, not very well or clearly — and certainly not openly.
This story begins in February 2011, when Janet O’Flaherty of the BMJ Group emailed Sheldon Kotzin at the NLM:
Dear Sheldon – Hope all is well with you and that the winter weather hasn’t reached Washington.
When we met in November I mentioned our concern about the prominence of links to PMC full text rather than the primary journal (see screen shot below) – do you know if there are any plans to at least include a link to the primary source as well?
Can you tell me when the PubMed selection panel is due to meet next? We are hoping a couple of our journals are on the list!
(Note once again the confusion between PubMed and MEDLINE on O’Flaherty’s part. We continue to see this even among sophisticated people like her and a large proportion of academics.)
Kotzin writes a polite response to O’Flaherty, telling her he is referring her question to Ed Sequeira and informing her of the dates she asked about. That night, Sequeira emails Bart Trawick, David Lipman, Jim Ostell, and Kathi Canese, all NLM employees, asking:
Do we have an official answer to why the PubMed docsum has direct links to PMC articles, but not to free articles at the journal site? Anything better than: all the links in the PubMed search result list go to other pages within the NCBI Entrez site. There are no external links there, but once you get to the abstract page for an article there’s a prominent link to the journal site.
A “docsum” is a “document summary.” In systems that use a docsum approach, a user query generates a summary document containing the information available to the system. A docsum can be modified to show various elements available to the system, which implies that if information is available, it can be included in the docsum. Therefore, leaving selected information out of the docsum is more of a choice than a technology limitation.
Lipman replies the following morning. He copies the entire group on his response, cc’ing Dennis Benson and Eric Sayers, as well:
There are two reasons:
When a user goes to the fulltext several things can happen:
1) They may get the full text
2) They may get some login screen
3) There may be some sort of error on the publisher site and they get an error message of some sort
If we have the link directly from the docsum, it will increase the number of users that see the problem as coming from the NLM.
Second reason (internal reason):
We provide other useful information on the PubMed page and on the PMC page (e.g. neighbors) that a high fraction of users click on. We know that many of these users may not have realized ahead of time that this information is useful to them (i.e. thought they ended up clicking on a related article, when looking at the docsum page, they may have thought they would only go to the publisher site for article).
Many more hits in docsum have publisher links than have PMC links. So we will clearly lose hits. I would rather drop the PMC links on the docsum than add the publisher links.
Bottom line is that doing things the way we’re doing it doesn’t prevent a path but practically speaking, doing what BMJ is asking here will prevent paths.
This two-part justification is fascinating. In the first part, Lipman is basically saying that publisher sites aren’t reliable enough, and he doesn’t want to have their problems reflect badly on PubMed.
In the second part, Lipman is saying a few things. First, he’s speaking like a publisher, who wants to keep people on his version of the article because there are all sorts of fancy services arrayed nearby. Second, he’s saying that the publisher links might be more attractive to users, and he’s not willing to take the risk (“. . . we will clearly lose hits. I would rather drop the PMC links on the docsum than add the publisher links.”). This undercuts the criticism that publisher sites are too flaky to link into, and shows that NLM is purposely suppressing links to publisher sites in order to increase and keep traffic. Third, he is saying that the docsum isn’t that hard to modify — dropping PMC links doesn’t strike him as a technology challenge, and adding them isn’t remembered as something that was difficult to accomplish.
Perhaps now that we know that Lipman’s willing to drop PMC links from the PubMed search docsum, we should take him up on the idea . . . after all, that would make for a more level playing field, and it’s apparently an easy solution.
Of course, these rationalizations were not shared with O’Flaherty in Sequeira’s response, which came two weeks after her initial inquiry:
Sorry for the slow response. I got sidetracked in the middle of my original attempt and then let this slip.
By design, the links from the main PubMed search result page are confined to other pages on the NLM/NCBI/Entrez site. This is to ensure that a user clicking on an item in a result list will always get some more detail about that article, even if it’s just an abstract. Once you’re at the PubMed abstract, you have links that are both internal and external to Entrez, and there’s a prominent link to the journal site. Incidentally, a fair portion of PubMed users come directly to an abstract from an external search engine such as google, bypassing the PubMed search page altogether.
The links to journal sites are set up automatically by the respective publishers / providers. So on any given day, you’ll have some percentage that simply are dead, a significant number that take a user to an access control page, and then a fair number that actually get you to an article. Given the number of articles in PubMed, there’s no way we can do any sort of human curation of the links. Users who get a dead end of one sort or the other tend to see this as a PubMed problem. That’s true even for a surprising number of users at academic institutions, who I’d expect to know better. In any case, moving the external links to the main search result page will simply aggravate the situation.
Another consideration, among several, is that if we were to add external links to the main search page, academic libraries would press us to include customized links to their preferred provider (subscription source) for a given article on the search result list. That’s a feature now available on the abstract page, thought the link to the journal site still gets greater prominence. As you might expect, academic medical librarians are an extremely important constituency for NLM, so this isn’t something we can treat lightly.
In short, I don’t see us changing the current approach. Sorry.
This is a fascinating bundle of nonsense to unpack. Essentially, Sequeira is saying that approaches that work well at the abstract level can’t be made to work well on the search result list, and that risks being taken with links on the abstract level can’t be taken on the search result list, despite some of these things being helpful to “an extremely important constituency for NLM” and the like.
It’s worth noting that I received copies of these 2011 emails during the government shutdown, while PMC is showing an error message and publisher sites are functioning normally. The implicit assumption that PMC is less likely to generate error messages than publisher sites is undercut by this, as the entire PMC repository is now generating error messages, and the problem is not technological but political (and is lasting longer than any technology challenge would be allowed to last).
This short email exchange also shows that Sequeira was hoping for, as he put it, “[a]nything better than: all the links in the PubMed search result list go to other pages within the NCBI Entrez site,” but Lipman didn’t deliver.
Or did he?
I believe Lipman’s answer is actually closer to the truth: the managers of PMC are competing for traffic, they implemented a search results design in their docsum that worked, and they were simply defending it and the traffic benefits it’s been delivering.
I asked O’Flaherty via email about how she recalled this exchange, but had not received a response as of this writing. (Update: Apparently, there was an email failure, and O’Flaherty never got my message. She’s since read the post, and so far has nothing to add. But the door [and comments thread] is open.)
Based on these emails, it seems the suspicion about NLM’s design on PubMed search results was largely correct: the search results design in PubMed was meant to steer traffic away from publishers’ sites, increase PMC traffic, and show off NCBI and other government information tools. It’s competitive, and it seems to be working. That’s why they’re defending it and are not willing to change it.
3 Thoughts on "Link Miser — Why the NLM Links to PubMed Central Versions Directly from PubMed Search Results"
“This is a fascinating bundle of nonsense to unpack.”
Thank you for unbundling, over the past couple of years, PMC’s entrance into the publishing business.
These “indexes” are growing into full-fledged competitive publishers. As such, PMC is using their self-promotion oriented discretion on accepting publications (without a grading system like NLM, or even referring to the appropriate CDM section as the basis for their decision) and burying links to the primary publisher sources in favor of their own.
PMC encouraged and launched eLife (as you set forth here: http://scholarlykitchen.sspnet.org/2013/02/06/answers-finally-how-pubmed-central-came-to-help-launch-and-initially-publish-elife/).
Which fledgling, well-connected publisher without content will be chosen to present to the PMC Advisory Committee next? Perhaps PMC will provide “upgrade link packages” to publishers?
We have a similar problem in Latin America with SciELO, which was founded in 2000 to help small journals from learned societies publish online. But then they became “selective” and turned into a sort of database of LA journals. The problem is that publishing standards are pretty much the same as at the start, with few innovations and a harsh user experience.
My journal is OA and online only, and I have no interest in putting it into SciELO, for some of the same reasons mentioned in the article above. I really do not understand this whole point of republishing… however, I do agree with MEDLINE’s requirement for external and certified digital preservation systems, and the “dark” ones (see Portico) seem much more appropriate for the purpose than PMC.
It seems to me that everyone has to justify their program/income and what we see here is a bureaucrat doing just that. Additionally, I think we are seeing what I like to call the lunch room factor. These people from PMC and NLM eat together and no one wants to see the other harmed. I have noticed over the course of a rather long career that in hard times those farthest from the office are fired before those in the office. Ask PMC how NLM is doing and they will praise the organization and vice versa.