Unless you are a data specialist, extracting valuable insights and trends from usage reports can range from painful to impossible. Data is housed in various formats and systems, often with visualization capabilities too limited to let non-technical users explore patterns in their data. In recent posts, we have explained how current analytics tools, typically developed for specific use cases and business models, are not well-suited for today’s requirements.
At LibLynx, we are in the midst of a project that aims to transform how stakeholders of all kinds interact with usage data and explore the impacts of publishing programs. Deploying human-centered design principles, we aim to democratize analytics tools so they are delightful and intuitive for anyone to use.
In this post, we’re sharing what we’ve learned about the limitations of current data experiences, as well as some recommendations for humanizing those experiences so that our community can more effectively leverage the value in usage data.
Limitations on our data experiences
For most content providers, analyzing usage data requires gathering reports from multiple platforms and leveraging specialist staff and software to make sense of it all. Web analytics live in Adobe Analytics or Google Analytics, COUNTER reports might come from a dozen different sources, and customer or sales records sit in yet another system. As a result, most data are converted to tables that can be exported as spreadsheets, where much of the analysis takes place.
Our industry lacks a method that delivers simple and effective visualizations of usage data that are accurate, flexible, and easy for anyone to use. Content providers need to communicate the value of scholarly publishing and, therefore, often rely on data experts (if available) and software (such as Tableau, Power BI, or Adobe Analytics) to bring disparate reports together. This means that most decision makers in our sector only enjoy second-hand data experiences, rather than engaging directly with data (which otherwise requires significant technical training). This all adds up to a challenging experience navigating usage data, which are critical assets for informing business decisions and measuring impacts.
Many of these limitations experienced within our communities result from leveraging the same reports to serve multiple purposes for a variety of stakeholders. Usage data are a commercial imperative for publishers to share with institutional subscribers and partners, all of whom use this data to help measure the returns on their investments. That same data is aggregated and analyzed by publishers to determine the most popular collections, the development of fields of study, and in-demand topics, as well as to prioritize the efforts of marketing, sales, editorial, and product management teams.
To reshape these experiences, we aim to improve under-the-hood data frameworks as well as front-end designs. In the last year, we have conducted a few dozen interviews and usability tests with publishing stakeholders, alongside market research, and have unearthed some valuable insights into how we might transform how our industry thinks about usage data. Based on these learnings, we are experimenting with innovative approaches to evaluating usage data, optimized to meet today’s needs for open access, such as consortial reporting and open-access analytics, as well as the interactive tools and features that enable exploration of the data.
“Fractured” journeys, from exploration to explanation
Based on this research, we have learned that publishing staff in sales, editorial, and product management are eager to play with data directly, to investigate trendlines and observe patterns, such as seasonal peaks and troughs. Business development teams rely on impact metrics to do their jobs, but this requires the kind of data that “is out there, but takes a hundred hours to collect,” as one manager told us. We have learned that publishers believe trends in usage data are best discovered and understood when reports are aggregated across platforms and access types (such as subscription or open access). However, “we all struggle with the fractured nature of usage data,” as one analyst put it.
Human-centered design research shows that we typically enter data analysis workflows either intending to explore patterns within the data or to explain a trend observed in the data. Exploratory workflows need features that enable immersive and engaging experiences with data, intuitive and accessible at any level of data expertise. In contrast, explanatory workflows focus on telling a story to a specific audience, where optimizing data visualizations and exporting those images are paramount. These two workflows demand different kinds of tools and features to facilitate a productive experience with usage data.
However, we have learned in this project that our industry’s typical usage reporting systems do not attend to these distinct workflows, and therefore often fail to meet either set of needs. Where resources allow, publishers invest in software to ingest and normalize reports, in data experts to oversee the process of aggregating and making sense of usage data, or in both. This needs gap is also evidenced by the fact that some publishers use chatbots to lighten the analytical load, uploading large data sets and prompting the system to present results as easily digestible conclusions.
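To make that ingest-and-normalize step concrete, here is a minimal sketch (in Python with pandas) of how usage exports from two platforms that label the same activity differently might be consolidated into one table. It is an illustration only, not any vendor’s implementation: the file names, column names, and mappings are hypothetical.

```python
import pandas as pd

# Hypothetical exports from two platforms that label the same usage
# differently ('Views' vs. 'Hits'); the file and column names are
# illustrative assumptions, not a real COUNTER layout.
COLUMN_MAPS = {
    "platform_a.csv": {"Title": "title", "Month": "month", "Views": "usage"},
    "platform_b.csv": {"Publication": "title", "Period": "month", "Hits": "usage"},
}

frames = []
for path, rename in COLUMN_MAPS.items():
    df = pd.read_csv(path).rename(columns=rename)[["title", "month", "usage"]]
    df["source"] = path  # keep provenance so the 'fractured' origins stay visible
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)

# One aggregated view: total reported usage per title per month, across platforms.
summary = combined.groupby(["title", "month"], as_index=False)["usage"].sum()
print(summary.head())
```

Even this toy version hints at why the work tends to fall to specialists: every additional platform means another bespoke mapping to build and maintain.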
Labeling and terminology
When we conducted a card-sort test, designed to identify how publishing staff perceive usage metrics and their relative value, we learned that our community is not using the same labels and terms for various datasets. For example, usage for ‘titles’ might report on data for specific publications, or might report on data for articles or chapters of a publication. During our interview series, one publishing executive told us they want to be able to “compare like-for-like across our channels,” but find this difficult because some platforms report usage as ‘views’ while others report on “‘hits,’ which are vague… and unclear metrics.”
We also learned that the most important data points are those that represent usage by organization, region, subject, and publication format or title. Teams dedicated to business development often want to start their exploration by drilling into data for a particular region or organization. Editorial and product teams typically begin their usage-data explorations via a publication format, title, or hosting platform/collection.
One executive told us that ‘total item requests,’ a COUNTER metric, is often seen as a valuable starting point for exploring data “because it is an indication of real engagement” by readers. Yet these engagement metrics can be difficult to compare across sales channels. Some amount of reported usage is impossible to attribute to a specific location or organization, let alone to confirm that it comes from humans (and not ‘bots,’ i.e., non-human traffic from systems that scrape content sites).
There is a high demand for slicing and dicing data by how a publication is being consumed, which is often referred to as ‘event types,’ a phrase that is unclear to many users. While ‘download’ metrics are often understood to indicate perceived value found in a publication, some stakeholders are less sure how to interpret the meaning of metrics such as ‘investigations’ or ‘requests.’ Labels for such critical data should ideally reflect the context and objectives of the staff conducting the data analysis. One interviewee commented that all other relevant data points, such as date, location, and subject matter, relate to the usage itself: “Everything else flows from the usage event.”
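One way to act on that finding, sketched below purely as an illustration, is a thin presentation layer that keeps the canonical COUNTER Metric_Type values intact underneath while surfacing labels each team already uses. The team-facing wording here is invented for the example, not a recommendation.

```python
# A sketch of a presentation-layer label map: canonical COUNTER metric
# names stay intact underneath, while each team sees terminology that
# matches its context. The team-facing labels are illustrative only.
CONTEXTUAL_LABELS = {
    "editorial": {
        "Total_Item_Investigations": "Reader interest (any interaction with an item)",
        "Total_Item_Requests": "Reader engagement (full-text views or downloads)",
    },
    "sales": {
        "Total_Item_Investigations": "Investigations",
        "Total_Item_Requests": "Requests (downloads)",
    },
}

def label_for(team: str, metric_type: str) -> str:
    """Return a team-facing label, falling back to the canonical COUNTER name."""
    return CONTEXTUAL_LABELS.get(team, {}).get(metric_type, metric_type)

print(label_for("editorial", "Total_Item_Requests"))
# -> Reader engagement (full-text views or downloads)
```

Keeping the standard names underneath preserves comparability across platforms while letting the interface speak each stakeholder’s language.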
Humanizing the data experience
Given the business-critical needs for data analysis, our industry needs a new approach to designing data experiences. In addition to the obvious user expectations for data accuracy, security, and speed of processing, I would recommend that service providers adopt user-centric principles for the design of their data products:
- Universal: Develop features and workflows that can be used by any publishing staff or stakeholder, regardless of data expertise.
- Flexible: For publishing stakeholders to directly interact with and explore their data, instead of simply consuming reports, analytics tools should allow them to begin their journey from any data point.
- Consistent: No matter where a user begins that journey, they should always be able to see data visualized in some manner (even just a table), so they get an immediate sense of how changing filters or variables impacts a report.
- Error proof: To deliver an enjoyable and immersive data experience, tools should prevent illogical data queries; error messaging is one thing, but it’s important not to let users trip over mistakes as they explore.
- Contextual: While internally consistent terminology may be ideal from an operational perspective, data labels should reflect how stakeholders refer to data points and understand their relative value and impact.
“Stories are data with soul” (Brené Brown)
If we aim to democratize and revolutionize usage analytics, our industry must develop tools that allow publishers to identify the human stories and social impacts reflected in their data. This community is hungry for more elevated analytical experiences that are intuitive and simple, and that do not require hours of training or a data science degree to manage usage data. Data experiences should be engaging and immersive, supporting delightful exploratory journeys through usage data. These tools should be accessible to users with disabilities (e.g., meeting WCAG standards) and inclusive enough for anyone to pick up, whatever their level of data expertise.
Users should be able to explore their data from any starting point, to support the exploratory and explanatory needs of all publishing stakeholders. Users should be able to visualize their data in a variety of ways, to best suit their reporting and presentation needs. The next generation of analytics tools should be customizable and exportable, and easily embedded in other web environments. Tools should make usage data easily viewed and understood, not hidden in Excel tables or complicated systems. Various AI solutions offer new ways of analyzing large, complex datasets at lightning speed, an ideal application of such technology.
LibLynx is experimenting with analytics tools that can be used by anyone within an organization to easily and delightfully explore usage data patterns and insights. Watch this space!
Discussion
7 Thoughts on "New Ways to Illuminate Stories in Your Usage Data"
Didn’t realise the kitchen was now running ads!
Hi Heather,
There’s often a blurry line between providing information valuable to the community and promoting the services and products of the organization providing that information. Last week, for example, we ran a piece questioning the role that preprints are playing in misinformation, which some interpreted as an “advertisement” for the author’s journal, which offers a different approach to peer review. The week before that we had a post discussing the results of a consultancy’s study on PID adoption for a country, which could easily be seen as an “advertisement” for that consultancy’s services.
As the editor, my view is that all of these posts provided useful information to the community and weren’t blatant sales pitches. In this case, the post offers a report on lessons learned from a study about approaches to better data practices, and while it does mention that efforts are underway to develop better tools, no specific tools are mentioned or promoted. What, in your opinion, is being sold here, other than better data practices? Is it possible to mention the work being done by any individual or organization without it being somewhat promotional for that individual or organization?
Thanks for sharing, Lettie. Compiling, unifying, and understanding usage/user data has been a challenge for publishers and clients for a long time; it’s nice to see examples of ongoing research in this area. The insight you shared shows the importance of a forward-thinking data and presentation strategy.
I’m a longtime HE textbook person who has been learning a lot about libraries since coming to Cambridge. That intensified in 2020 when we launched our HE website offering subscription-based textbooks for libraries. Part of that learning has been around usage data, which in some ways didn’t seem that different from the learning analytics offered in most educational ebooks, but it is different! The learning analytics are for the instructor and the student, and while the instructor makes the decision to use or not use a textbook, I don’t “think” that learning analytics is the biggest tool they use for that decision.
But this article has made me wonder whether librarians have thought about what usage data would be helpful to instructors using materials in their courses through the library. You do run into another whole set of parameters, as instructors look for data like this in a formative rather than summative manner: they want to act on what they see during the course, not after. Maybe I should just ask my Cambridge library colleagues, but as librarians promote the use of library materials as course materials, have they been thinking about how to share this type of data? Based on the above article, I see many challenges, but that feels like life today.
As always, I’m delighted that the Kitchen is highlighting the value of usage metrics.
One of the primary benefits of standards is that all users of the standard share a common language. The lack of definition around ‘views’, ‘downloads’, ‘hits’, and so forth is precisely why the volunteer team who developed Release 5 and then Release 5.1 of the COUNTER Code of Practice chose instead to use much more clearly defined metrics (investigations and requests). Over the last 18 months I have been working directly with publisher staff to deliver education and training about how to work with COUNTER metrics well beyond their traditional use within a sales team. In most cases, editorial, operations and other non-sales teams had never been provided with COUNTER reporting and thus they were simply unaware of the availability of normalised metrics. I would therefore encourage everyone to start from the existing standard when considering storytelling with usage metrics, rather than reinventing the wheel.
I would also like to take this opportunity to correct a misconception present in the post: the assertion that metrics within COUNTER reports cannot be definitively associated with humans rather than bots. For more than two decades, the COUNTER Code of Practice has required that bots be excluded, with that rule stated as clearly as possible and with bots broadly defined. I’m (painfully!) aware that COUNTER is unable to keep pace with the ever-increasing quantity and variety of bots, including new AI tools, and we have been working on a project to update our bots guidance to replace the old reliance on a list of bots – a project Liblynx were invited to participate in.
For readers who are not familiar with COUNTER, I encourage you to go take a look at the plethora of free educational materials on our website (https://www.countermetrics.org/education). For those who are familiar with the standard but want to take their knowledge further, we’ve recently announced the COUNTER Academy, a 16-week in-depth online course that includes things like combining different types of data with COUNTER reports.
Hi Tasha,
While the code of practice has always required that bots be excluded, in practice the tools across our industry to do so are proving increasingly inadequate (including the current COUNTER list of robot agents at https://github.com/atmire/COUNTER-Robots). A problem that used to be manageable has been super-charged by AI-driven bots looking to crawl content in increasingly sophisticated ways, and there are many of us across the community who are exploring more effective ways to identify robotic activity. This trend has also been reported by others in social media and blog posts, e.g. https://go-to-hellman.blogspot.com/2025/03/ai-bots-are-destroying-open-access.html.
In the meantime, robotic activity is absolutely slipping through into the aggregate unattributed usage of open content. I’ve personally had conversations with multiple publishers who are concerned about how this activity reduces the value of their open access usage statistics.
As LibLynx processes 100 million+ authentication and analytics events monthly, we’re particularly motivated to find better solutions. Our technical roadmap for this year includes a project to analyse our usage logs to find better ways to identify robotic activity, so that we can send misbehaving bots to the naughty chair (https://blog.cloudflare.com/ai-labyrinth/), and do a better job of understanding and communicating human vs robotic-generated usage.
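For readers curious what the first line of defence looks like in practice, here is a minimal, hypothetical sketch of screening usage events against a list of robot user-agent patterns, in the spirit of the COUNTER-Robots list linked above. The local file name and one-pattern-per-line format are assumptions for illustration, and, as noted above, effective detection now has to go well beyond user-agent matching.

```python
import re

def load_robot_patterns(path: str) -> re.Pattern:
    """Compile a case-insensitive regex from a file with one pattern per line."""
    with open(path) as f:
        patterns = [line.strip() for line in f if line.strip() and not line.startswith("#")]
    return re.compile("|".join(patterns), re.IGNORECASE)

def is_probable_robot(user_agent: str, robot_regex: re.Pattern) -> bool:
    """Flag an event as robotic if its user agent matches any known pattern."""
    return bool(robot_regex.search(user_agent or ""))

# Hypothetical usage events; only the user-agent check is shown here.
robot_regex = load_robot_patterns("robot_patterns.txt")
events = [
    {"user_agent": "Mozilla/5.0 (Windows NT 10.0; rv:126.0) Firefox/126.0", "doi": "10.1234/abc"},
    {"user_agent": "GPTBot/1.0 (+https://openai.com/gptbot)", "doi": "10.1234/abc"},
]
human_events = [e for e in events if not is_probable_robot(e["user_agent"], robot_regex)]
print(len(human_events), "of", len(events), "events look human by user agent alone")
```

Pattern lists like this only catch bots that announce themselves, which is exactly why the log analysis described above is becoming necessary.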
Finally, a shout out to Charles Watkinson from UMP. Charles and I were at a table together at the NISO preconference on OA usage that you also attended, Tasha. He noted that not all robotic activity is bad, and that more reporting on robotic usage would be valuable. Exploring new ways to communicate ‘good’ robotic activity, beyond that envisaged by COUNTER’s existing TDM usage reporting, may be worthwhile.
Great, I’ll be in touch again about Liblynx contributing to the projects we have on the go about bots (as mentioned in the original comment) and AI metrics.