Comments on: Pubget: Time-saver or Content Aggregator?

By: Pubget Continues to Puzzle and Diversify « The Scholarly Kitchen

Pubget Continues to Puzzle and Diversify « The Scholarly Kitchen — Fri, 06 Aug 2010 09:23:15 +0000

[…] last year’s coverage of Pubget, I questioned whether their business model, “based on selling advertisements to lab equipment […]

By: Daniel Schwartz

Daniel Schwartz — Mon, 08 Feb 2010 01:54:25 +0000

I, for one, would like to applaud the work of Pubget. My university spends an enormous sum gaining access to the medical literature but when it comes time for me to actually access a relevant article, the process of finding the PDF is incredibly burdensome and actually a deterrent to the perusal of the literature. This barrier is likely part of the reason why many in the health sciences make decisions about an article solely based on the abstract – the full text is just too hard to get to, even if one has an institutional subscription.

Pubget has come up with a solution to a significant problem that the publishing industry and academia should have solved years ago.

If they need to generate income from contextual advertising, we should be pleased that this service has been made available, rather than question the validity of their approach.

If there are copyright issues, I suggest we leave that to the publishers and Pubget to sort out. In any event, no one has provided any documentation that Pubget is actually spidering PDF content. I suspect that the search terms, names of journals/articles and user search history may provide adequate content to create good contextual ads without even needing to review the PDF itself.

Much luck to Pubget and I hope they succeed in revolutionizing access to the underutilized scientific literature.

NB I have no relationship with Pubget.com, other than being a dedicated user.

By: David Crotty

David Crotty — Mon, 24 Aug 2009 14:17:13 +0000

In reply to David Crotty. Thanks for the clarification. I'm not sure how PubGet is generating their ads, whether they're based on the content they've spidered or on the search term used (as Google does). It's an interesting question, though. Perhaps one could argue that it's fair use, and in particular, PubGet is asking publishers for permission to spider their content, so that may be part of the deal. One thing also to note is that Google is currently facing a trademark lawsuit over selling companies' trademarked names as adwords, so it's unclear if their approach is wholly legal.

By: David Smith

David Smith — Mon, 24 Aug 2009 08:44:53 +0000

In reply to David Crotty.

At the risk of being ‘orribly pedantic, I’ll explain why I think the Google approach is a bit different and thus doesn’t attack copyright directly in the manner that I see PubGet might possibly do…

OK, for search results (AdWords), the ads are set against the incoming search terms, not the actual text that can be found in the search results. Now, clearly there is a connection between the search, Google’s spidering of the web and the ability to lay ads alongside snippets of content, but you can opt out (with robots.txt), and the ads are not shown as a direct result of the text in the search results. I think this is why it has never been (seriously) challenged.
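To make the robots.txt opt-out concrete, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. The robots.txt contents, the `ExampleAggregatorBot` crawler name and the `journal.example` URLs are all made up for illustration; nothing here describes how Google, PubGet or any publisher actually configures things.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a journal site might publish (an assumption,
# not taken from any real publisher): one named crawler is shut out entirely,
# and every other crawler is kept out of the PDF directory.
ROBOTS_TXT = """\
User-agent: ExampleAggregatorBot
Disallow: /

User-agent: *
Disallow: /pdf/
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAggregatorBot", "https://journal.example/articles/1"),
        ("OtherBot", "https://journal.example/articles/1"),
        ("OtherBot", "https://journal.example/pdf/1.pdf"),
    ]:
        print(agent, url, may_fetch(agent, url))
```

Note that robots.txt is only a request: a well-behaved crawler checks it before fetching, but nothing technically stops a crawler that ignores it.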

AdSense does use site text for context-based adverts, but again, the nature of AdSense is such that the responsibility for copyright adherence lies with the person signing up to the AdSense scheme.

I am speculating of course, but my original comment was based on the idea that PubGet would have to use the journal texts to develop a context-based advertising model, and thus, unlike Google, they would be running directly up against the copyright restrictions. Also, over here at least, copyright extends into databases… where the process of harvesting a database can sometimes also be an issue, regardless of what’s in it.

By: David Crotty

David Crotty — Fri, 21 Aug 2009 14:35:48 +0000

In reply to David Smith.

Any copyright issue would be the same here as for any search engine. I don’t think anyone has challenged Google’s ability to spider the web and sell ads against content in their search results. The lawsuit that brought about the proposed Google Books Settlement did ask that question, challenging whether Google indexing book content (even if that content was never displayed) was fair use. Unfortunately, the case never made it to court, as the settlement was worked out instead.

Most journals allow search engines to spider and index their content, and ads are sold against searches that bring up the content in results. The big difference is that the search engines send traffic to the journals, where PubGet would not.

By: David Smith

David Smith — Fri, 21 Aug 2009 08:38:42 +0000

Isn’t there going to be a copyright issue here shortly?

PubGet want to make money off advertising and the logical approach to do that is to mine the text of the articles in order to be able to place relevant ads next to the research results.

That approach would be in breach of copyright I think, unless they are going to enter into agreements with publishers for a revenue split on the ad income.

I agree with David up at the top – this looks and smells awfully like a hijacking program.

By: James Schmidt

James Schmidt — Thu, 20 Aug 2009 22:09:51 +0000

As a health care practitioner who frequently accesses the professional literature, I would find such a tool very useful. However, I am concerned about the broader issue here, which is that tools like this are likely to ultimately make the subscription and advertising-based scientific journal a thing of the past. As more and more of us stop subscribing to journals, and access what we want from them over the web instead, the subscriptions and associated advertising revenue of these journals will dry up and so will they.

The problem is that professional journals offer more than information; they offer information that has to some degree been assessed for reliability and accuracy. Articles are routinely peer reviewed by independent experts in the field who know the existing literature in the field as well as the principles of scientific research design. They use that information to determine the value of a given paper. As anyone who has published in such journals knows, far more papers are rejected than are accepted, the usual reasons for rejection being that the research design is flawed or the results are trivial. When results are unexpected or contrary to existing information, replications of the findings are often requested from the paper author or from independent sources. Editors will also, in many journals, append editorials or summaries of reviewers’ comments to allow the reader to have a more balanced context in which to evaluate the value of a study. Further, most journals require verification that the study was conducted in an ethically acceptable way. Likewise, journals typically require a statement as to where the funding for the research came from so that readers can consider whether factors such as funding from a pharmaceutical firm might have had an impact on the results. As studies move directly onto the web without such review it will become increasingly impossible for readers to determine the quality of the research, whether it was conducted in an ethical way, and whether it was paid for by individuals with vested interests. There have been numerous disclosures of subversion of the scientific literature by pharmaceutical firms in the recent past, and as the editorial process is reduced and ultimately eliminated we can expect these abuses to increase.

It took many long years to develop the editorial oversight system we have with the published journal. Sadly, I see little being done to transfer this process to the web and think this must become an overriding priority if we are indeed to move all of our scientific reporting to the web. Otherwise, the term “virtual reality” will take on a new and much less favorable meaning as we substitute “virtual truth” for actual fact.

By: Alex Frost

Alex Frost — Thu, 20 Aug 2009 16:05:05 +0000

There are good points raised here; however, I think publishers’ efforts to expand the functionality of html versions don’t necessarily reflect what scientists want(ed), contrary to the suggestion in the post (‘If the image of an article was what scientists really wanted, publishers could have stopped developing somewhere around 1995.’). As an extension of David Crotty’s comment, it needs to be considered that traditional models of online revenue rely on web traffic around articles, and this is an important source of potential bias when comparing the efforts of scholarly publishers and the interests of their readers.

I’ve certainly advocated new online tools for publishers, but frankly have never seen (and probably more importantly, never looked for) research on the reading preferences/habits of representative samples of scientists. If current online platforms were designed with a bias towards increasing traffic (or more benignly to test out new tools), the usage/download patterns from these sites are not a source for clear answers…

It is interesting to consider future radical changes in connectedness in which access to the tools around articles is not centralized at one publisher’s website – and local ‘image copies’ are seamlessly connected to the same features now restricted to online/html versions.

By: David Crotty

David Crotty — Thu, 20 Aug 2009 14:08:02 +0000

I think there is a fundamental conflict here with journal publishers. Journals generate traffic, and they sell ads based on that traffic. PubGet wants to take that traffic away from the journal and to PubGet, and sell the same ads to the same companies. I expect to see most journals blocking this sort of hijacking attempt.

The direct line to the pdf also stops many interesting new ventures like article ratings or commenting systems, as those are done on the html versions of papers, not the static and disconnected pdf.