SOAP Survey Requires Clean Interpretation of Data

All surveys are biased, although this does not make them wrong or useless.

It’s the responsibility of a competent survey researcher to disclose potential sources of bias and to detail how they may distort one’s findings. A professor of mine once taught me that it is better to reveal one’s own biases, or others will do it for you. It’s one of those cautionary tales that most social scientists know deeply, and a lesson that bears repeating here.

The report, “Highlights from the SOAP project survey. What Scientists Think about Open Access Publishing,” has been trumpeted recently with much fanfare, and there is much to admire. The survey reflects the cooperation of several publishers, foundations, and research organizations working together on a joint project. It presents the results of over 40,000 respondents on the perceptions and behaviors of active scientists with respect to publishing. It extends the dialog on how to support the production and distribution of freely accessible scientific literature.

SOAP’s main conclusions are uncontroversial: Scientists are generally supportive of uninhibited access to the research of other scientists and view access to funds and the lack of high-quality open access journals as barriers to publishing. I’m less comfortable, however, with how the researchers got to these conclusions.

First, the survey was based on a convenience sample, which was achieved by sending out requests to listservs and by directly emailing potential subjects. If you’re like me, when this email came through, you spent your 20 minutes doing something more meaningful than answering another Web-based survey. The profile of the respondents to this survey is therefore highly indicative of sampling bias and non-response bias:

The sources of the largest amount of responses, are, respectively, those of SOAP partners SAGE, Springer and BioMed Central, with 800k, 250k and 170k addresses. The fourth largest mailing was run through Thomson Reuters to 70k authors in fields where, after the first three months of the survey live-time, a relatively low response rate was observed.

Indeed, the majority of active scientists responding to this survey indicated that they have published open access articles, reaffirming the demographic profile of the respondents. The survey also includes questions that are leading, and may invoke acquiescence and social desirability biases. In the case of Q9, the question is also double-barreled:

Do you think your research field benefits, or would benefit from journals that publish Open Access articles

Not surprisingly, 89% of respondents gave a resounding “yes.” Despite the issues I just mentioned surrounding this question, the researchers do not pause to question their results, but use this factoid to construct a definitive conclusion:

The most relevant findings of the survey are that around 90% of researchers who answered the survey, tens of thousands, are convinced that open access is beneficial for their research field, directly improving the way the scientific community work. At the same time, our previous study found that only 8-10% of articles are published yearly in open access journals. The origin of this gap is apparently mostly due to funding and to the (perceived) lack of high-quality open access journals in particular fields.

Luckily, the researchers made their dataset available, and I was able to calculate the responses to several questions not covered in their report. I was particularly interested in which factors their respondents felt were most important when selecting a journal for publication. Below is a ranked list of the factors, with the percentage of authors who rated each as either “important” or “extremely important”:

Q 13. What factors are important to you when selecting a journal to publish in?

Prestige (94%)
Relevance for community (90%)
Impact Factor (84%)
Likelihood of acceptance (79%)
Positive experience (79%)
Speed of publication (79%)
Importance for career (75%)
Absence of fees (67%)
Recommendation by colleagues (57%)
Open Access (45%)
Copyright policy (36%)
Organisation policy (36%)
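
A ranking of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the script used to produce the figures above: the answer labels, the decision to leave unanswered items out of the denominator, and the toy responses are all assumptions made up for the example, and none of the numbers come from the SOAP dataset.

```python
# Hypothetical answer labels; the real SOAP dataset may code answers differently.
TOP_TWO = {"important", "extremely important"}


def top_two_box(responses):
    """Return the percentage of answered items rated in TOP_TWO.

    Assumption: unanswered items (None) are excluded from the denominator.
    Returns None when nobody answered.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        return None
    hits = sum(1 for r in answered if r.lower() in TOP_TWO)
    return 100.0 * hits / len(answered)


def rank_factors(answers_by_factor):
    """Rank factors by their top-two-box percentage, highest first."""
    shares = {f: top_two_box(r) for f, r in answers_by_factor.items()}
    scored = [(f, s) for f, s in shares.items() if s is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy responses invented purely for illustration (NOT SOAP data).
toy = {
    "Open Access": ["important", "not important", "less important", None],
    "Prestige": ["extremely important", "important", "important", "not important"],
}

for factor, share in rank_factors(toy):
    print(f"{factor}: {share:.0f}%")
```

How missing answers are handled matters: counting non-responses in the denominator instead would lower every percentage, so any reanalysis should state which convention it uses.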

Like similar studies of the publishing priorities of scientists, “Prestige,” “Relevance,” and “Impact Factor” are listed at the top, while “Open Access,” “Copyright,” and “Organisation policy” occupy the last places. You’ll also note that “Absence of fees” was rated as “important” or “extremely important” by two-thirds of submitting authors, a detail highlighted in last year’s faculty survey by Ithaka S+R. Scientists who self-selected to take a survey on open access publishing seem very much like scientists in general: Everyone wants free access. No one wants to pay or to be told how and where to publish.

And yet the inclusion of Q13 changes the interpretation of the study quite significantly, since it adds another dimension of scientists’ attitudes that did not make it into the narrative of the report. While respondents were overwhelmingly supportive of other researchers making their articles freely accessible (the entire field, if possible), they showed little interest in open access and copyright issues with respect to their own articles.

Given the scores of individuals involved in creating, promoting and analyzing this survey, and the voluntary participation of 40,000 scientists, the researchers missed a great opportunity to contribute valid and generalizable details to a field that is woefully lacking in objective data. While one should not dismiss the SOAP survey out of hand, we should be critical of what it measures, what the data mean, and how we can conduct better surveys in the future.

Phil Davis

@ScholarlyChickn

Phil Davis is a publishing consultant specializing in the statistical analysis of citation, readership, publication and survey data. He has a Ph.D. in science communication from Cornell University (2010), extensive experience as a science librarian (1995-2006) and was trained as a life scientist. https://phil-davis.com/

Discussion

15 Thoughts on "SOAP Survey Requires Clean Interpretation of Data"

This survey is biased not towards open access but, as the title of this report indicates, towards open access publishing. It thus completely omits the alternative route to open access, green open access by self-archiving in digital repositories in conjunction with publishing in journals. It turns out this could seriously misjudge the wishes of its target constituency, or ‘convenience sample’, and ultimately prejudice efforts to improve wider uptake of open access.

In exposing the publication factors in Q13, Phil has found the same gap in this extensive survey that I first spotted in the original slide presentations, commenting on Twitter (19 Jan.):

– Wondering whether SOAP might have missed green repository OA entirely? Slides 31 and 37 here have the answer-Yes http://slidesha.re/hhf9nr
(NB slide 31 is figure 5 in the report)

– Gap between (author) beliefs and actions (on OA publishing) identified by SOAP are where green OA and conventional journal publishing stand

Not only does the report fail to mention the word ‘repository’ and refer to ‘green’ open access just once (indeed, as a ‘Barrier’ to open access publishing under ‘Other’ in Table 3), it also completely misses the significance of green open access in the context of its own results.

Phil and I might disagree on our interpretations of what this gap might mean and how to respond to it, but I agree with him that this potentially important report is devalued by its failure to comment on and understand the implications of these critical findings.

By Steve Hitchcock
Feb 2, 2011, 8:45 AM

I do acknowledge your point that OA advocates are more likely to go through the hassle of answering such a survey and that this influences the resulting data.

But while talking about bias, you forgot to mention that the outcome of Q13 is heavily biased itself by the current academic environment. The individual scientist is judged by his publications. Present decision makers (for funding or tenure) decide on the basis of the prestige and impact factor of the journals published in (instead of reading the articles), so that is what scientists aim for. It does not really matter that it is more than questionable to value a single article on the basis of a journal’s impact factor. If that is how grants are distributed, scientists will adjust to the system. No matter if it’s right or wrong or their personal opinion.

Same with “Absence of fees”. As long as funding agencies do not offer money for OA fees, scientists will be reluctant to spend their hard-earned research money on something they might get for free. But this does not necessarily mean this is the right way to go.

In this sense, I think the outcome of Q13 is kind of a self-fulfilling prophecy and can hardly be used as an argument that traditional publishing is giving scientists what they want.

Anyways, it is clear that OA is something others benefit from if I do it. It’s all about ethics – charity if you want to put it that way. So it’s not surprising that scientists want open access to others’ articles but don’t care about their own publishing. They are as selfish and lazy as anybody else.

Along this line is a comment by Lars Fischer which can be found on my website:

“As I wrote elsewhere, as long as scientists get all the papers they want, most of them won’t give a shit. Until, that is, the whole system breaks down and every scientist is confined to those journals his or her institution can actually afford.

That just won’t happen because Open Access is changing the publishing landscape in a way that publishers simply can’t afford to limit access, because it would put them out of business very quickly. The Open Access movement is, in essence, a bunch of people working their asses off so that the majority can blissfully go on with not giving a shit.”

By olchemist
Feb 2, 2011, 9:07 AM

Reality is not a bias. Moreover, there are good reasons for the system to operate as it does. For example, given the high rejection rates of high-impact journals, the mere fact of acceptance is a credible measure of an article’s importance, albeit just one of several. If you don’t understand the rationality of the present system, then you probably don’t understand the issues with OA.

By David Wojick
Feb 2, 2011, 9:49 AM

I simply don’t agree that “there are good reasons for the system to operate as it does”, because one of the reasons is lack of time and/or expertise. That does not mean I do not understand.

Anyways, this is not the place to have another discussion about the IF.

By olchemist
Feb 2, 2011, 10:13 AM

Good points, Phil. However, am I correct that you are involved in a kind of running academic battle on this issue? I seem to recall some lengthy exchanges on the Sigmetrics listserv. Scientific debate is the core of scientific progress, but if so perhaps you should disclose your position here.

By David Wojick
Feb 2, 2011, 9:40 AM

David,
My position is, and always has been, to allow good science to inform policy development. Calling out undisclosed bias and inappropriate analysis is part of that mission. If you know my history, you will note that I’ve railed just as strongly against publisher corruption and library groupthink. But labels are not important here. I hope that you concern yourself with the validity, reliability and generalizability of the SOAP survey and not with the personalities and motivations of the individuals who comment on the study.

By Phil Davis
Feb 2, 2011, 9:47 AM

Please don’t be offended, Phil. Those of us in the policy community always wonder where someone is “coming from”, that’s all.

By David Wojick
Feb 2, 2011, 9:53 AM

I think it is fair to say that publishers have been doing what the academic system AS A WHOLE wants, viz., serving the P&T system that is the linchpin of the entire academic enterprise. Whether that system itself is dysfunctional, of course, is another question that deserves close scrutiny. I am among those (and I suspect Phil is also) who think it is and is long overdue for serious reform. As the former director of a smaller university press (Penn State) who had worked for one of the most prestigious presses earlier (Princeton), I was always struck by the illogic of a system that rated books published by Princeton higher than those published by Penn State even though the peer review conducted at both presses was equally rigorous and based on reports from the same stable of reviewers. And the failure of the P&T system to accommodate itself to new forms of publishing as technology advanced is itself a scandal and another reason to question the utility of this system. It is, frankly, shot through with biases of various sorts that have no basis in reality. Librarians know this, publishers know this, and even some faculty do (at least those paying any attention), but the system keeps chugging along anyway in its inertial fashion.