This claim, and the paper supporting it, is displayed prominently on the Academia.edu website, a platform for scholars to share research papers. It was also sent out to all 21 million registered Academia.edu members.
The paper, “Open Access Meets Discoverability: Citations to Articles Posted to Academia.edu,” was authored by six employees of Academia.edu and two members of Polynumeral, a data consultancy company.
In recent years, other large data-driven companies, like Facebook and Google, have conducted and reported on their own research, and it would be unfair to discount this paper simply because of its self-aggrandizing results. Scholars require serious scientific studies to back bold claims. To millions of potential users of Academia.edu, a scientific paper is much more effective than a glitzy vendor stall at a national conference or a glossy brochure, and the investors behind this company undoubtedly know that a return on investment requires more than false promises. It requires hard data.
There are many papers claiming a citation advantage for open access (OA) articles, and most are not worth reading. This paper is an exception. The authors clearly understand the limitations of observational research and how correlations are often confused with causation. They analyze their data using three similar models to verify their results. They use covariates in their regression model and look for other explanations. They take a fair and unbiased look at the literature and don’t purposefully obscure or ignore research that contradicts their conclusions. Strangely, these characteristics are absent from most OA-advantage papers.
Compared to a control group of papers, selected at random from the same journals and same years as the Academia.edu group, their analysis finds a positive association between free access and article citations that grows over time. This association should not be surprising, given a decade and a half of similarly reported results. What IS surprising about their findings is that having one’s paper available from other free-access locations boosted its citations by just 3%. Or expressed as a comparison:
We find that a typical article posted to Academia.edu has 75% more citations than an article that is available elsewhere online through a non-Academia.edu venue: a personal homepage, departmental homepage, journal site, or any other online hosting venue.
What seems to be omitted in the above statement is that other online venues include much more than personal and departmental home pages or journal sites. They include massive open literature repositories like PubMed Central, arXiv, bioRxiv, and SSRN, not to mention Academia.edu’s chief competitors, ResearchGate and Mendeley. That a relatively young upstart is besting them all is unexpected, except from a marketing perspective.
In a 2014 Nature online survey of social network use, only 29% of scientists and engineers reported that they were even aware of Academia.edu, and just 5% visited the site regularly, compared with 88% and 29%, respectively, for ResearchGate. For social science, arts, and humanities respondents, the results were somewhat closer. Respondents who use these two services did so mainly to maintain a profile, followed by posting content.
The authors of the paper claim that it isn’t open access that is driving the results, but discoverability. Academia.edu users are actively notified when new papers are posted in their field or by authors they follow. But many other indexes, archives, social media tools, and journal websites have comparable notification services as well, so I find their explanation unsatisfactory, especially when all of these other sources of content, taken together, have just a tiny effect against the mighty power of Academia.edu to boost citations.
This paper suffers from a data problem. And that problem is their control group.
The researchers compared the performance of papers uploaded to Academia.edu to a control group: a random sample of papers selected from the same journals. If the randomization was successful, the control group should be similar in all respects to the Academia.edu group. In this way, differences observed in the data over time are likely attributable to Academia.edu and not some other cause.
If you look at their data (download the file: papers.csv.gz), you’ll notice something odd in the title of the first article: it is an erratum, and it belongs to the control group. Indeed, if you search the title list, you’ll find editorials, corrections, retraction notices, letters to the editor (and their responses), commentaries, book reviews, conference program abstracts, news, and even obituaries in the control group. As a general rule, these kinds of papers receive few (if any) citations. So, it’s no surprise that papers uploaded to Academia.edu outperformed a sample that included a good proportion of non-research material.
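This kind of spot-check takes only a few lines of code. The sketch below is illustrative, not a reproduction of my check: the keyword list and sample titles are hypothetical, and scanning papers.csv.gz itself would require the file’s actual column layout.

```python
# Hypothetical spot-check: flag titles that look like non-research
# front/back matter (errata, obituaries, etc.) rather than citable
# research articles. Keywords and sample titles are invented.
NON_RESEARCH_MARKERS = (
    "errata", "erratum", "correction", "retraction", "editorial",
    "letter to the editor", "book review", "obituary", "commentary",
)

def looks_non_research(title: str) -> bool:
    t = title.lower()
    return any(marker in t for marker in NON_RESEARCH_MARKERS)

sample_titles = [
    "Errata: Measurement of soil carbon flux",
    "A randomized trial of open peer review",
    "Obituary: Jane Doe (1931-2014)",
]
flags = [looks_non_research(t) for t in sample_titles]
print(flags)  # [True, False, True]
```

A keyword scan like this is crude (it would miss untitled letters and catch research papers about errata), but even a crude pass would have revealed the contamination before the analysis was run.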
There is other evidence in the paper that the treatment and control groups are not similar. The researchers reported that 45% of papers uploaded to Academia.edu were also found on other free access sites compared to 25% for the control group (Table 5). We don’t know whether this difference is the result of compositional differences between the treatment and control groups, or whether important papers are more likely to be posted freely or published as open access.
Academia.edu is not a publisher but a metrics and analytics company, whose value proposition is to generate valid statistics around the impact of scientific research. So, it’s difficult for me to comprehend how this paper got so far without anyone even spot-checking the article list or attempting to first verify that the treatment and control groups were similar before proceeding with the analysis. The huge citation boost to the Academia.edu group may have had nothing to do with open access or discovery, and may be explainable entirely by bad data.
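A back-of-the-envelope calculation shows how easily a contaminated control group manufactures a citation “advantage.” The numbers below are invented for illustration and are not drawn from the paper’s dataset.

```python
# Illustrative only: treatment and research-only control papers draw
# from the same citation distribution, so the true advantage is zero.
treatment = [10, 12, 8, 11, 9]          # hypothetical uploaded articles
research_controls = [10, 11, 9, 12, 8]  # comparable research articles
non_research = [0, 0, 0]                # errata, obituaries, news items

def mean(xs):
    return sum(xs) / len(xs)

clean_advantage = mean(treatment) / mean(research_controls) - 1
contaminated = research_controls + non_research  # mixed control group
inflated_advantage = mean(treatment) / mean(contaminated) - 1

print(f"clean: {clean_advantage:+.0%}, contaminated: {inflated_advantage:+.0%}")
# clean: +0%, contaminated: +60%
```

Diluting the control group with just three uncitable items turns a true 0% effect into an apparent 60% boost, without Academia.edu doing anything at all.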
Addendum (17 August 2015): The authors of the Academia.edu paper have addressed my critique of their control group by classifying each of the papers in their dataset and limiting their reanalysis to articles reporting original research. While the main result of their reanalysis still holds, the effect size dropped from 83% to 73%. Papers posted to Academia.edu are still more likely to be found on other free access sites, however. A response to my post with a description of their classification and reanalysis can be found here. The authors’ research paper was replaced with the revised copy, and the homepage for Academia.edu now claims “Boost Your Citations By 73%.”