Editor’s Note: Today’s post is by Shamsi Brinn and Bill Kasdorf. Shamsi is UX Manager at arXiv, bringing the experience of arXiv’s diverse users to the forefront of organizational planning. Bill is Principal of Kasdorf & Associates, LLC, a consultancy focusing on editorial and production workflows, XML/HTML/EPUB content modeling, standards and best practices, and accessibility.
Access is not the same as accessibility, which guarantees the ability to fully access content regardless of disability. Much progress has been made in the past few years to make publications accessible, but the focus is typically on formal publications: the books, articles, or websites that are the end products of a publishing workflow. The need to address accessibility upstream in those workflows is critical. The papers scholars and scientists write, review, and edit, as well as other forms of research communication that are increasingly central to scholarship and science, are still largely inaccessible.
This is not an edge case. According to research by the Allen Institute from 2021, only about 2.4% of published scientific research papers are fully accessible. This is confirmed by recent research by arXiv, the leading preprint server in science and math. Their recent Accessibility Report documents that the experience of accessing and participating in the scholarly record by people with print disabilities is overwhelmingly negative. An important theme is that PDF is a barrier. HTML, which enables content to be much more accessible natively, is strongly preferred.
To advance the effort to make all research accessible, arXiv recently hosted an Accessibility Forum in which researchers, academics, and software and systems providers, many of whom are themselves print disabled, could surface the issues and discuss possible solutions.
It was a huge success. Over 2,000 people registered for the Forum; over 350 attended the live event; and hundreds more are accessing the recently published videos. This outpouring of interest was, frankly, shocking, and enormously encouraging to those of us who are working to make all research, not just formally published books and articles, “born accessible.”
Why This Topic is So Essential
The keynote speaker, Jonathan Godfrey, a blind statistics researcher from Massey University in New Zealand, made a big impact. He was the first totally blind person to get a job as a lecturer in statistics despite the formidable obstacles posed by inaccessible scientific content and tools. Today, in contrast, he uses R and HTML for both authoring and consumption, the same formats his students use. He stated that 2022 was the first year of his professional life that he didn’t need the assistance of sighted people, thanks to HTML — and he’s now much more employable than he was twenty years ago.
The Importance of Standards
Avneesh Singh from the DAISY Consortium has had a hand in forging today’s foundational standards that are a prerequisite for accessibility. He pointed out that an accessible reading experience is dependent on three components: making the content itself accessible; using rendering software that supports accessibility; and assistive technology (AT) such as screen readers. As he made clear, “Assistive technology needs a uniform or common language to talk to each other. And this common language is accessibility standards,” specifically citing Unicode, HTML, CSS, WCAG, ARIA, EPUB 3, PDF/UA, and MathML, as well as extended descriptions of images, on which accessibility standards are based.
Addressing Accessibility at Submission
The next speaker was Dr. Cynthia Bennett from Google, speaking as a blind peer reviewer who uses screen readers that generate either text-to-speech or Braille. Her description of dealing with most papers was devastating. She pointed out that the current process for making published papers accessible “reinforces an ableist hierarchy of access.” She often has to OCR inaccessible papers; the results are so poor that this is often not worth the effort. And attempting to create image descriptions with AI is virtually useless: the resulting descriptions bear little resemblance to what is needed to review a paper. The situation is so dire that she can’t even make her own papers accessible to screen readers; instead, she’s dependent on junior scholars to help her out. She articulated a common theme of the forum: scientific papers are usually made accessible at publication; they need to be accessible at submission.
The Student’s Struggles
Lukas Nadolskis, who is working on his Ph.D. in neuroscience at UC Santa Barbara, said, “I think every blind person that works on research [recognizes] that we need to work at least twice as much as everybody else.” HTML has been a game changer for him. “In order for me not to work 10, 11 hours a day to access half of a paper . . . the only way I could do this is through HTML.” But papers are often not available as HTML, and books are rarely available in HTML, which is a big problem in his field. “HTML has enabled me to work in research like my sighted colleagues,” he said; yet he still has not found a good way to get from HTML to embossed braille, which is his preferred format.
The Power of the Publisher
Stacy Scott, head of accessibility at Taylor & Francis and chair of the Accessibility Action Group of The Publishers Association in the UK, reported on how a leading scholarly publisher is successfully addressing accessibility. At university she started out intending to major in the humanities and social sciences, but she found the burden of having to scan so many inaccessible print books untenable, so she switched to math, which ironically was more accessible. But she couldn’t get a job with her math degree: “people didn’t believe a blind person could do STEM.” She wound up first working in international development, and then with RNIB Bookshare, which was “a sea change,” enabling her to understand the need to deal with so many types of disability. Now at Taylor & Francis, she looks at “both content and access to content.” T&F now has 130,000 books as EPUB 3s; their journal articles are available as EPUB, PDF, and HTML. Today, 2,500 of their books are fully accessible with alt text, and they have pioneered a workflow based on obtaining draft image descriptions from authors at submission. They’ve provided a hub to support and train authors. She is a big advocate of building accessibility into workflows instead of retrofitting it, and she stressed the importance of securing ownership of this work at a senior level in the organization.
Research Tools Need to Be Accessible
Dr. Patrick Smyth, a Postdoctoral Fellow in Humanities Entrepreneurship at the Publics Lab and “Chief Learner” at the Iota School, describes himself as “a blind hacker and programmer whose work focuses on citizen technology, critical infrastructure, accessibility, and technical pedagogy.” Dr. Smyth pointed out that he not only has to read content, he also has to take notes and participate in many other ways in scholarship and pedagogy. He stressed how important it is to involve disabled people in the technical work with new formats. It is critical they are “not just consumers but are actively involved with the creation of the tools, and the platforms, and the formatting, and the standards.”
The Essential Role of Support Networks
A joint presentation by Breanne Kisselstein and Anne Logan, both Ph.D. candidates in separate departments at the School of Integrative Plant Science at Cornell, stressed that achieving accessibility goes beyond research formats. Ms. Kisselstein is deaf/blind and neurodiverse; Ms. Logan is deaf and an advocate and tutor. They have personal experience with “The Deaf Tax”: the mental fatigue associated with having to advocate for oneself to access the same information as others. “We can’t take a break from advocating because we are trailblazers,” explained Logan, “and we want to make it easier for other people after us to achieve their dreams.” They pointed out that “multiple marginalities” don’t just add difficulty, they multiply the problem. They discussed the “hidden curriculum,” an all-too-common set of unwritten or unspoken rules. All of these challenges are eased by support networks, which make “a world of difference”: “It takes a village to make an MS/Ph.D. candidate.”
Dynamic Discussions
The balance of the forum consisted of a panel followed by three discussion sessions: one focusing on image and graph descriptions, one on “myth busting the needs of Deaf academics,” and one on what authors and publishers can do.
As earlier presenters had emphasized, a theme of the panel was the need for research to be accessible right from the start, not at a later phase of publishing. We are underutilizing existing technologies and creating avoidable remediation work downstream. Lastly, Sarah Kane, a blind astrophysics student at the University of Pennsylvania, ended with a rousing challenge to professors and journal editors: change accessibility from a nice-to-have to a requirement.
A common theme of the first discussion session was to “move off PDF,” which can never be as accessible as a fully accessible website or EPUB 3. A primary issue is that the content needs to be reflowable in order to work in multiple modalities and on multiple devices. A complex graphic consisting of two bar charts was used in a demonstration. The use of an extended description in HTML performed much better than just using VoiceOver to “read” the figure. Best of all, when the HTML code of the data tables from which the bar charts were produced was provided, assistive technology could actually read all the detail. Extended descriptions like these can be provided as links from a PDF.
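To make the data-table point concrete, here is a minimal sketch in Python of how the numbers behind a bar chart might be published as an HTML table alongside the figure. It is an illustration only, not the markup used in the Forum demonstration: the function name `chart_data_to_html_table` and the sample values are hypothetical.

```python
from html import escape


def chart_data_to_html_table(caption, column_headers, rows):
    """Render the data behind a chart as an HTML table.

    Hypothetical helper for illustration. Each row is a sequence whose
    first item is the row label and whose remaining items are values.
    """
    # Column headers are marked with scope="col" so assistive technology
    # can associate each data cell with its column.
    head_cells = "".join(
        f'<th scope="col">{escape(str(h))}</th>' for h in column_headers
    )
    body_rows = []
    for label, *values in rows:
        cells = "".join(f"<td>{escape(str(v))}</td>" for v in values)
        # The first cell of each row is a row header (scope="row").
        body_rows.append(
            f'<tr><th scope="row">{escape(str(label))}</th>{cells}</tr>'
        )
    return (
        "<table>"
        f"<caption>{escape(caption)}</caption>"
        f"<thead><tr>{head_cells}</tr></thead>"
        f"<tbody>{''.join(body_rows)}</tbody>"
        "</table>"
    )


# Made-up sample values, standing in for the data behind one bar chart.
table_html = chart_data_to_html_table(
    "Sample data for the first bar chart (hypothetical values)",
    ["Category", "Group A", "Group B"],
    [("First", 12, 30), ("Second", 18, 55)],
)
print(table_html)
```

The `caption` element and the `scope` attributes are standard HTML. They let a screen reader announce what the table is about and which row and column each value belongs to, which a picture of a bar chart cannot convey by itself.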
One key insight from the second discussion group was that it’s insufficient to presume that a sighted Deaf user can just “read” a website in English. The reason? They may not be proficient in English, since their first language may have been ASL (American Sign Language) or BSL (British Sign Language). Another insight: Deaf people are often visual learners, so good figures and posters are very valuable to them. Yet many figures and posters are poorly done. As with visual disabilities, making research accessible to more people makes the content better for everyone.
Finally, in the last discussion session, it was emphasized that PDF cannot be neglected, because it is still so dominant in scholarly publishing. Dr. Kaveh Bazargan from River Valley Technologies pointed out the importance of open peer review and post-publication peer review, and the need for them to be accessible. An initiative at NIST called LaTeXML converts LaTeX to HTML, and arXiv is very close to implementation of HTML formats alongside the traditional PDF.
Toward an Accessible arXiv
What arXiv has heard from the community is that the most impactful change it can make is to offer HTML versions of papers, alongside the current PDF and TeX source files. Ninety percent of the arXiv corpus is submitted as LaTeX, and converting it to HTML with reasonable accuracy will be a collaborative endeavor with the LaTeXML and LaTeX project teams. In addition, arXiv will make it simple for authors and readers to flag issues in the new HTML renderings, building accessible feedback systems that will unearth rendering issues sooner and improve conversion over time.
Conclusion
A highlight of a number of presentations was that progress is being made, if slowly. As Stacy Scott said, “I really feel like we’re finally turning that corner. I interact with publishers and vendors on a very regular basis and I’m seeing a great change in the trend… I’m just absolutely delighted.”
The trends towards accessibility are heartening and energizing. The enthusiasm around the Forum is one more sign that the scientific community is embracing accessibility as necessary and inevitable. Let’s make Open Access truly open. Whether research is published in a journal, a preprint, a website, a notebook, or beyond, disability should never be a barrier to accessing it.
Discussion
2 Thoughts on "Guest Post — Making Research Accessible: The arXiv Accessibility Forum Moved the Action Upstream"
Thank you for this article. At the recent ConTech conference in London there was an excellent panel discussion on content accessibility with tie-ins to the European Accessibility Act. The act will soon have compliance penalties that will affect organizations seeking to do business in the EU. Accessibility audit and remediation services are in high demand at my company these days.
I was delighted to be involved in this excellent and informative Forum. A point I did not get a chance to emphasize is that we need content to be fully accessible (i.e. to people with any disabilities) but the accessibility must not be format-dependent. Currently by far the most convenient way of making content accessible is via HTML. But we need to look long-term, and beyond HTML. Who knows what the standard format of tomorrow will be? If we save content in a logical and structured format (say XML) then it is possible to write a “filter” to convert that structured format to any other current or future format, including HTML, PDF, Braille, voice, etc.