Over the past several years, discussions about data have been on the rise in most organizations, within scholarly communications and beyond. The last time we talked about data on Ask The Chefs was more than two years ago, when we asked what the most important data is for publishers to capture. But when we have data, how should it be handled? What can and can’t it do for us?

This month we asked the Chefs: Where should organizations leverage data in decision making? Where shouldn’t they?

Alice Meadows: Rather than attempting to answer this question directly, I’m instead going to try and address the issues it raises by sharing two examples of when an initiative I’ve been involved with used data: one less successful, one more successful.

The first is from my time in scholarly publishing, when the company I was working for brought in consultants to help analyze and streamline our marketing activities, which were organized primarily by broad discipline, with some teams focused on journals and others marketing a mix of books and journals. The consultants asked each team to gather data about time spent on different activities, and the relative value of each type. When the consultants presented their findings — based on this data — many of their recommendations didn’t make sense to the marketing directors. The reason soon became clear. Although we had been provided with a common template for collecting our data, it didn’t include definitions for each activity type, so every team used their own. Sometimes these matched up across all teams, but often they didn’t. The consultants were not well-versed enough in marketing to know that, for example, PR was the same as publicity, and not well-versed enough in publishing to know that books marketing and journals marketing are quite different. Lesson learned: don’t make decisions based on data unless you’re confident that data has been collected by someone who fully understands the context.


The second example is from my time at ORCID. One of the first things I did was to carry out a community survey, which we then repeated every 18 months or so. I can’t claim to be a survey expert — or a data expert — but I do know that every time we did that survey we learned a bit more about which information to collect and why. In particular, I learned that we needed to understand enough about our community to know how to segment it. The first time around, when I was very new, I really didn’t know much about the ORCID community so, although the data was helpful, the survey questions were limited by my lack of understanding. By the third time, I had a much better sense of both who the key stakeholder groups within our community were and what we wanted to learn from our community, and the data were significantly more valuable. Lesson learned: meaningful data collection and analysis requires understanding of the community being studied.

Haseeb Irfanullah: Data guides decision-making in almost every facet of our lives and, in many cases, we don’t even realize it. Organizations are no different.

We create organizations based on data about their possible roles in our society and the opportunities they could harness. While running an organization and occasionally changing its direction, data helps us a lot. But sometimes we may even feel that it is the data that is controlling an organization’s fate — telling us when to merge our organization with another, sell it, or even kill it.

But which data we use depends on the organization in question. For a commercial entity, profit margins, growth rates, bid success rates, the clients and consumers it is reaching, staff size, and efficiency indices — often in relation to its competitors — matter a lot.

For not-for-profit organizations, charities, or civil society organizations, some of the above data might not be that important in decision-making. While working in the non-governmental organization (NGO) sector for the last couple of decades, I have seen that although visions, strategies, programs, and brand value do define an organization and its direction, it is often the fundraising opportunity, or the lack of it, that is the main driver. I have seen how the boundaries of the ‘not-for-profit’ and ‘for-profit’ personas get blurred just to keep such an organization alive.


Numbers are a reality of life, but they are not the only one. Sometimes organizations are born out of emotion, not necessarily because hardcore data supports them. There is a bookshop-cum-cafe in Dhaka, Bangladesh. Three years back, 30-odd people joined hands to build it, led by the widow of a reputable, progressive book publisher who was brutally murdered by a group of radicals in his office in 2015. I didn’t know this family in person. But a grief-stricken wife’s strength and courage to commemorate and celebrate her slain husband’s beliefs and philosophies fascinated me, and I joined in. Despite many attempts, this bookshop, called “DipanPur”, has been struggling to reach break-even. I honestly doubt it will survive the ongoing pandemic-forced lockdown. But to me, it really doesn’t matter. Sometimes you need to do things not because data tells you to do it. You need to do it because you believe you should do it.

David Smith: If you can get data to help inform a decision, then you should, and your goal should always be to get data. Without getting into the weeds, I’ll define data as ‘actionable information’; it could be quantitative, it could be qualitative, and the likelihood is you’ll need both kinds to help you. Almost certainly you won’t have enough of it. You will also need some other things, namely a hypothesis and a model. There are two basic types of hypothesis — and to make proper use of your data, you need to know which is which:

The Null Hypothesis: There’s no difference between this thing and that thing, where ‘thing’ is what you are trying to figure out (some examples: the market will not support another journal in this field, so we shouldn’t launch one; the new-fangled peer review approach is not better than ye olde traditional peer review approach, so we shouldn’t do it). The objective here is to collect data and show that what you discover cannot be explained by this null hypothesis.

The Research Hypothesis: “There’s Gold in them thar hills!” This is a statement that is both predictive and testable. The prediction is the presence of gold in a specific location, and the test is to go digging for it.

You need both hypotheses so you can figure out what sort of data you need to collect and analyze, and you also need a model. Here, a model is a framework for how you and your organization think about and understand the world you operate in.
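To make the null-hypothesis idea concrete, here is a minimal sketch in Python of how a publisher might test whether a new peer review approach changes acceptance rates. The submission counts, the 0.05 significance level, and the choice of a two-proportion z-test are all illustrative assumptions for this example, not anything David prescribes.

```python
from math import sqrt, erfc

# Hypothetical counts: acceptances out of total submissions under each review workflow.
traditional_accepted, traditional_total = 120, 400
new_accepted, new_total = 150, 400

# Null hypothesis: the new peer review approach makes no difference to the acceptance rate.
p1 = traditional_accepted / traditional_total
p2 = new_accepted / new_total
p_pooled = (traditional_accepted + new_accepted) / (traditional_total + new_total)

# Two-proportion z-test (normal approximation), two-sided p-value.
standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / traditional_total + 1 / new_total))
z = (p2 - p1) / standard_error
p_value = erfc(abs(z) / sqrt(2))

print(f"traditional rate {p1:.1%}, new rate {p2:.1%}, z = {z:.2f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Hard to explain by the null hypothesis alone; the difference is worth a closer look.")
else:
    print("Consistent with the null hypothesis; no evidence the new approach performs differently.")
```

A real decision would, of course, also weigh sample quality, the model of the world David mentions, and whether acceptance rate is even the right outcome to measure.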

Frankly, I’m struggling to think of anything that can be decided upon without recourse to data. The most audacious goal or world-changing crusade will ultimately cleave to the overwhelming force of reality, so you need data to tell you. You need to be constantly sampling the world as you test your ideas so you can respond to what you see in the data. But you also need to be constantly referring back to your hypotheses and your model to see whether they are accurately reflecting the reality of your particular world.


Data is seductive, especially when displayed as pretty pictures. The data you collect is a reflection of your assumptions and biases, and it’s all too easy to mistake a mirage for an oasis. Good decisions (whether correct or incorrect) are ones taken by people with a robust and rational understanding of what they are actually trying to achieve, the limitations of the information they have to hand, AND the consequences of what they are about to decide. So, to change the question: it’s not where data should be leveraged, but whether there are sufficient and effective data inputs in the organization’s (or the individual’s) decision-making process.

Lettie Conrad: There’s a lot of hype around data-driven decision making and the ever-sharper tools available for automated analysis, visualizations, and storage. In my work with publishing and technology organizations, I advocate for evidence-driven decisions, which is a broader view of utilizing diverse datasets that inform our strategic and operational thinking. When we want to answer questions about what occurred in the past, let’s think like a historian and draw on the documentation, experts, and systems that reflect previous events. Data can shed light on what actually happened — how many downloads and who bought what — but data can also provide insights into the human conditions behind those occurrences. Consulting the right data in context should fuel our thinking about what comes next, empowering us to craft a vision of the future.


To be clear, I take a broad view of the word “data.” Of course, there’s the quantifiable phenomena that populate dashboards — usage logs, sales records, etc. Robust data-driven decision making also embraces qualitative evidence and interpretive insights — customer perceptions, trend analysis, and the like. If “all is data,” then we have a world of methods to leverage the intelligence available to us. If we only look to that which can be calculated, we’ll leave behind important clues to the answers we seek. In particular, qualitative data can help fill in the gaps left in the margins of our spreadsheets — adding insights into the why and the how behind the what and the who.

There are questions, however, that require data (of any sort) to play a background or consultative role. These are the big questions of mission and values, the questions that drive innovations and inspire communities of scholars and scientists. Having a firm handle on market size and user demand is important, but these act as foundational knowledge on which we build our dreams. We cannot operate entirely on balance sheets; we cannot always wait for consensus or proof documented on every ledger. An evidence-driven strategic plan must leave room for creative vision and strategic bravery. Data and human ingenuity must work together.


Ann Michael: Whenever we ask a question about data, responses vary. Looking at the responses above, there are some clear best practices stated or implied. There are also a few more to add:

Have an objective. David’s discussion of hypotheses is very important. I’ve seen many organizations question the value of data in their planning and decision-making processes because they don’t have clear objectives. Having specific questions you want to answer enables you to focus. However, there is a time and place for exploring what the objective might be, for learning more about your data and what it means. Just “playing” with data can lead you to discover new questions and new hypotheses to explore. Of course, we need to manage this time carefully so that it doesn’t become endless and to ensure that our data and analytics produce value for the organization. Bottom line: We should always be pushing toward clear questions we want answered (a precursor to a hypothesis), but doing that often requires some less structured exploration.

Know your data. Alice did a beautiful job of showing what happens when we don’t take care in data collection (and cleaning). To use data effectively and trust the results, we need to understand where it came from, how it was defined, and the context in which it was collected (to name just a few considerations). When combining internal sources of data, data elements need to be cataloged and have some type of governance process around them to ensure that integrity is maintained (as much as possible) in data collection and preparation processes. When considering external sources of data, similar information should be provided defining where the data has come from and what has been done to it. As Alice also mentioned, this is not ONLY in the purview of a data scientist — if you use data, you need to understand its meaning and context. Data collected for one purpose may be used for another, but do so with caution.


Interpret context. Haseeb and Lettie both brought up the role of subject matter expertise in analysis. Subject matter experts are needed to frame objectives and to interpret results. Data, and the models built from data, don’t magically spit out answers to complex questions. As Lettie stated, we “…must leave room for creative vision and strategic bravery.” The key is to be willing to start on a path that might be based on that bravery, but to get to the point where you can use data to confirm or refute that path’s value — and to assess if adjustment is required. We must be careful not to mistake bias for intuition. It’s a difficult balance to maintain.

Check your bias at the door. We all come to any analysis with preconceived ideas and biases. It’s natural and unavoidable. This is the nature of having experience; experts have opinions! Many times these opinions are valuable in shaping research, but they can also get in the way of recognizing when research is pointing us in a new or non-intuitive direction. Some things we can do to minimize the impact of our biases include: structuring analysis around a hypothesis, including individuals on our analysis team who have very different perspectives than our own, deeply reflecting on results that cause us to have an immediate negative reaction, “sanity checking” our reaction with others outside of the analysis, and conducting secondary research on results that don’t sit well with us; I’m sure you can think of more. One of my favorite sayings is from Ronald H. Coase: “If you torture data long enough it will confess to anything.”


Don’t expect perfection. (Understand limitations.) No model is perfect. If we wait for perfection, we will never get started. That said, when dealing with data in the context of the question (hypothesis) we are attempting to address, we must be able to articulate where it’s a fair representation of reality, where it is not, and how that impacts our interpretation of it. Imperfect does NOT mean useless. A timely example of this is COVID. Under normal circumstances we use the past to try to understand the future (finding trends, correlations, etc.). What if something happens that is so rare or new that it makes the future potentially less connected to the past? Is it reasonable to assume that past trends in publishing revenue or library budgets will continue as they have for the past five years? No. However, what we can do is think in terms of ranges, scenarios, and triggers. We can still model; we just build different models with different assumptions and objectives.
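As a minimal sketch of what thinking in ranges, scenarios, and triggers might look like, here is a toy projection of next year's revenue under three explicit sets of assumptions. The renewal rates, growth figures, and trigger threshold are all invented for illustration and are not drawn from any real publisher's numbers.

```python
# Toy scenario model: project next year's revenue under three sets of assumptions
# instead of extrapolating a single historical trend. All figures are invented.

scenarios = {
    "optimistic":  {"renewal_rate": 0.97, "new_business_growth": 0.05},
    "base":        {"renewal_rate": 0.92, "new_business_growth": 0.00},
    "pessimistic": {"renewal_rate": 0.80, "new_business_growth": -0.10},
}

current_revenue = 10_000_000   # hypothetical current annual revenue
trigger_threshold = 8_500_000  # hypothetical level at which a contingency plan kicks in

for name, assumptions in scenarios.items():
    # Each scenario applies its own renewal and growth assumptions to the same baseline.
    projected = current_revenue * assumptions["renewal_rate"] * (1 + assumptions["new_business_growth"])
    flag = "  <-- trigger: revisit the contingency plan" if projected < trigger_threshold else ""
    print(f"{name:>11}: projected revenue ${projected:,.0f}{flag}")
```

The point is not the arithmetic but the structure: each scenario makes its assumptions explicit, and the trigger tells you in advance which outcome would force a change of plan.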

Don’t stop. If you are truly committed to incorporating data into operational, product, and strategic decisions, your journey is just beginning. Like anything else, aptitude and creativity in using data to uncover new insights require practice and experience. As you start to formulate objectives, deliver on them, and execute on the practices in this list, you will find increased and more sophisticated uses for data. You’ll clean and prepare your data more adeptly. You’ll find new sources of data to tap. And you’ll uncover new ways to make it impactful for your organization and your mission.

So, in short, where should we be leveraging data? Everywhere we possibly can! But with an understanding of what it can and can’t do, with a commitment to interpreting results as free from bias as humanly possible, and with an understanding that our insights will get better and more impactful over time.

Now it’s your turn. Where do you believe organizations should leverage data in decision making? Where shouldn’t they?

Ann Michael


Ann Michael is Chief Transformation Officer at AIP Publishing, leading the Data & Analytics, Product Innovation, Strategic Alignment Office, and Product Development and Operations teams. She also serves as Board Chair of Delta Think, a consultancy focused on strategy and innovation in scholarly communications. Throughout her career she has gained broad exposure to society and commercial scholarly publishers, librarians and library consortia, funders, and researchers. As an ardent believer in data informed decision-making, Ann was instrumental in the 2017 launch of the Delta Think Open Access Data & Analytics Tool, which tracks and assesses the impact of open access uptake and policies on the scholarly communications ecosystem. Additionally, Ann has served as Chief Digital Officer at PLOS, charged with driving execution and operations as well as their overall digital and supporting data strategy.

Discussion

10 Thoughts on "Ask The Chefs: Do’s And Don’ts Of Data"

Thanks Ann et al, great post … one thing we touched on in our joint NISO presentation (which also covered the evolving world of metrics) was merging internal data (downloads, citations, author, reviewer, editor information) with external data points: where else does an author or editor publish, what’s the benchmarking and competitive external analysis? Merging the inner work environment with the larger global data goldfish bowl can have its challenges, especially when comparing apples to oranges, but of course is key. Data governance. Also, some people just don’t know what they don’t know, sort of thing!

Absolutely – combining data sets in a meaningful and accurate way is always a challenge, but can have great rewards.

As the above answers call out, data literacy and domain expertise are essential and need higher attention in organisations when leveraging data. Too often people use data (because we are all told to make data-driven decisions) without understanding the limitations and use cases for the data. Data, when misused, can be selectively chosen and “manipulated” to give authority to any answer we seek. Instead of being used to seek deterministic answers, data should be used to raise questions that need qualitative follow-up and inspection of cause and effect; it should be the starting point of a discussion, not the end point.

When collecting data, we should first ask how different outcomes in the data would change our actions – if they would not change our actions there is no need to collect the data. We need to be cautious that it is not what data we can collect that determines decisions (and bonuses), but the desired outcomes for users, customers, stakeholders, and employees that should guide decisions. It is as important to ask what data we do not have available (or are not able to measure) as what data we do have and can measure.

I’m a massive proponent of using data when it is used the right way and with the right understanding of the limitations a given dataset has. I have used it to significantly improve services and employee and customer satisfaction, but I have unfortunately also seen countless questionable uses of data – often not out of ill intent, but from a lack of data literacy.

Niels – I could not agree more – especially with this part “we should first ask how different outcomes in the data would change our actions – if they would not change our actions there is no need to collect the data.”

When I saw the title of this article, I was hoping the discussion would be on research data associated with the journal articles that are published. Many scholarly publishers now ask for research data for the peer review process or to publish for other researchers to use, or point to repositories where the full data is located. Have any best practices been seen? Do other researchers use the data? What about huge data (e.g., AI data or raw data)? In cases where the research results are public but the data isn’t (e.g., research with human subjects), how does that affect best practices on data?
Would someone be able to create a blog entry on the do’s and don’ts of research data publication?

“Lesson learned: don’t make decisions based on data unless you’re confident that data has been collected by someone who fully understands the context.”

This is so important. A former employer once made a massive change to job responsibilities based on data collected from a consultant who did not understand the context. It did not go well.

Great post, thank you! As an employee and as a consultant, I have made a career out of managing and coaching not-for-profit organizations in collecting, and then transforming, data into useful information to inform decision-making. The points made in this post are excellent and all necessary considerations for handling data. One item to add to this post is that data collection, handling, and information dissemination should be conducted ethically and responsibly, maintaining confidentiality/anonymity and using informed consent tools. When working with a data set it is sometimes easy to forget about the people in the data. No data collection is worthwhile if it ultimately damages the trust of the people we are serving.

Belated thanks Ann for this great post! You might be interested in reading a recent article published in the journal I work with that was written for organizations trying to implement data and AI strategies in their business processes. It’s not publisher specific but it still contains helpful advice that publishers can follow. Here’s the link: https://hdsr.mitpress.mit.edu/pub/4vlrf0x2/release/1
