Editor’s Note: Today’s post is by Molly Hardy. Molly is the Project Lead for Public Data at the Library Innovation Lab of the Harvard Law School Library. Prior to that, she served as a senior program officer for the National Endowment for the Humanities. Her current research focuses on the cultural history of Botany in the early United States.

“As researchers, we often say, ‘We need the data.’ Today, the data needs us.” Kathy Reid

Behold this commemorative pitcher. The banner across the top announces, “Prosperity in the United States of America,” celebrating the completion in 1790 of the first U.S. Census. Created in or around that year in England for an American market, it does not celebrate the U.S. Constitution, nor does it remember a Revolutionary battle; instead, its focus seems far more mundane, as it lists the states and the numbers of their inhabitants. Mandated by the Constitution just three years prior and carried out by U.S. marshals on horseback, the census was, we might say in modern parlance, the first federal data set.

[Image: white ceramic jug with black illustration, commemorating the first census of the United States, taken in 1790. Image courtesy of the Smithsonian Institution.]

This object, nearly 250 years old and held at the Smithsonian’s Museum of American History, symbolizes the dual nature of data – in this case, federal data. It is a set of numbers attached to places in a table-like form that we can easily see as a forerunner of our modern-day spreadsheets. But it is also very much a cultural object, a ceramic pitcher adorned with the iconography of national “progress” and advancement. When studied, it reveals insights across disciplines about the breadth of human experience and even, at the level of its base materials, the natural world. I see its duality as something to be celebrated, as something we can learn from today.

Data is having a moment. Between a policy environment focused on defunding and deleting data collections – an environment in which little can be trusted – and an onslaught of new AI tools that feed indiscriminately on data, bits of information at the intersection of rows and columns are appearing in headlines more than ever before.

But a time of loss can also be a time of invention. So as federal agencies from NOAA to IMLS, from NIH to NEH, have been weakened or destroyed, and as we emerge from the rubble, we might ask what data itself can teach us about how to rebuild. What kind of infrastructure does data require as we reimagine its preservation and accessibility?

One key lesson is already emerging: to avoid cultural memory loss, we must build systems that save what humanity needs across disciplinary silos rather than saving some archives and losing others through an accident of history. Starting in the twentieth century, the sciences and humanities have been separated at all levels: in university departments, in professional tracks, and, crucially, in funding sources. Whether philanthropic or governmental, federal or local, support for the sciences and for the humanities has come from silos with little overlap.

But data does not know or care about partitions erected in the last century to create disciplines. In the past, natural history museums as well as antiquarian and historical societies would not have seen “history” and “science,” “culture” and “nature” as separate endeavors but as strands of a shared pursuit of knowledge.

These disciplinary divisions are costly. As we read about the devastation to the alphabet soup of federal agencies — NSF, NIH, NOAA, NASA on the science side; NEH, IMLS, and the Smithsonian on the humanities side — we see that each discipline has been left to scramble for itself, reinventing the tools to preserve its own data. This scramble is particularly wasteful because the data each agency relies on is actually intermingled and mutually necessary to a coherent view of both humanity and the natural world. We must address this division — not only at the level of sharing tools and techniques, but at the level of examining how the support systems for these disciplines became so fractured and vulnerable in the first place.

An example of a more unified and holistic approach is the Climate and Economic Justice Screening Tool (CEJST) created by the Justice40 Initiative, an effort to advance environmental equity. A search for this tool, which was created with federal funding, now produces a “404 page not found,” but when it operated, the tool drew on both scientific and cultural data, from agricultural loss rates to census demographics, from projected flood losses to rates of heart disease. It was used by practitioners across the “hard sciences,” “social sciences,” and “humanities” to understand the built environment and the people living in it. And when it was about to be deleted, a group drawing from all of those communities came together to form the Public Environmental Data Partners to create a safe home for it. This example provides a model in which both the tool itself and the means of preserving it are the result of interdisciplinary cooperation.

Other projects like America’s Essential Data, started by the former U.S. Chief Data Scientist, are now emerging to document the great value of data resources like CEJST. America’s Essential Data is a platform for people to share “real-world examples of how federal data can benefit the American people and economy.” The brief testimonials collected on the site offer oral histories of how federal data shapes people’s lives: stories that cut across disciplinary silos and emphasize the importance of data to everyone.

Responding to this cross-cutting need, the Data Rescue Project, an effort spearheaded by librarians and technologists across the U.S. and Europe, has often bridged “science” and “humanities” communities and data. And the Public Data Project at the Harvard Law School Library, a participant in the Data Rescue Project, has preserved large-scale, cross-cutting collections, including all the data sets indexed by data.gov, and all data published by the twenty Smithsonian Museums.

We preserve this material because it needs to be preserved; data from any of these sources may be relevant to legal scholarship and practice, just as it may be important in validating and deriving knowledge across other fields. Knowledge about a complicated world does not come in separate boxes. It is ironic, and I hope instructive, that as we build cutting-edge tools to preserve vast digital collections, we find ourselves returning to a time when a far more porous understanding of “culture” and “nature,” “science” and “humanities” reigned.

Discussion

6 Thoughts on "Guest Post — Rethinking Disciplinary Data Regimes"

The word “data” is the plural form of “datum.” This seems to have been forgotten by most. Maybe editors need a nudge.

https://www.merriam-webster.com/dictionary/data
Is data singular or plural?: Usage Guide
Data leads a life of its own quite independent of datum, of which it was originally the plural. It occurs in two constructions: as a plural noun (like earnings), taking a plural verb and plural modifiers (such as these, many, a few) but not cardinal numbers, and serving as a referent for plural pronouns (such as they, them); and as an abstract mass noun (like information), taking a singular verb and singular modifiers (such as this, much, little), and being referred to by a singular pronoun (it). Both constructions are standard. The plural construction is more common in print, evidently because the house style of several publishers mandates it.

Thanks for the Merriam-Webster quote.
However, in academic and formal writing (like the article above perhaps), especially scientific fields, the word “data” is treated as a plural, requiring a plural verb. Most style guides mandate treating “data” as a plural form because it is correct to do so. The fact that many people do not realize there is a singular form does not excuse them using the plural form incorrectly. Correct them and they will learn.

I’m not sure I would call our conversational opinion blog “formal” in any way (have you seen any of our Friday posts?). Regardless (or perhaps I should say “irregardless” to be further infuriating), language evolves and while one can try to force others to follow particular rules, over time it is a losing battle (see the seemingly abandoned “don’t end a sentence in a preposition” argument in an interesting article here described as based on the “sunk cost fallacy” https://www.npr.org/2024/02/27/1233663125/grammar-preposition-sentence-rule-myth-merriam-webster-dictionary).

Now, let’s talk about spaces at the end of a sentence after a period and em-dashes to really stir up some controversy….

Well, per Chicago, while “data” is always plural in the sciences, “In formal contexts, the most reliable approach is to retain the plural uses unless doing so makes you feel as if you’re being artificial, stuffy, and pedantic…But make your play and be consistent.”
AMA, which uses the plural “data,” also notes that many resources use the term as a singular collective noun.
Given that the blog is not a hard science or medical journal, or particularly formal, I’d say the usage is appropriate.
(After editing medical journals for several years, singular “data” does pop out at me—but the plural in nonformal use still sounds overly stuffy. So it’s weird either way)

I agree that if we’re too siloed, we run the risk of losing important data. But on the other hand, having different standards and metadata tagging across disciplines and types of data is important for making data more robust and reusable. What is the correct balance? Maybe the silos aren’t just in disciplines but in governance mechanisms? It seems most of the problems with lost datasets at the moment are due to relying on a particular federal government rather than diversification and redundancy across many government and private entities.
