I can’t tell you when I first encountered C. Wright Mills’s article, On Intellectual Craftsmanship, but for me, as an educated layperson, the article was memorable for its discussion of the steps a working scholar should take in beginning to integrate his or her thinking about a topic with the scholarship of those who had gone before. The critical take-away for me was Mills’s characterization of the monographs that would be generated by his young audience of social scientists as being “organized releases from the continuous work that goes into them.” The scholarly process (his ‘continuous work’) could be expected to emerge from files of notes, annotated pages, and scribbled connections among ideas.

I was reminded of Mills’s work when I was reading the working draft of a JSTOR Labs report published in December of 2016, Reimagining the Digital Monograph: Design Thinking to Build New Tools for Researchers. (The draft is open for comment until January 31, 2017.) JSTOR Labs has already launched tools for discovering content quoting material from Shakespeare as well as for helping to understand the United States Constitution. If the Labs’ folks were revisiting the structure of the monograph, I was curious as to the outcome.

The report sets out a dozen functional or design principles aimed at fostering an ideal engagement between scholarly reader and content presented in monographic form. An extended appendix includes a set of user profiles built from a small group of ethnographic studies of practicing historians and graduate students in the field. The studies look at devices and applications used by the practitioners, how they found and evaluated the monographs that they used, and the specific activities of information extraction, close reading, and subsequent re-use. JSTOR Labs used these findings to develop its Topicgraph visualization tool, which is specifically aimed at streamlining the process of identifying and evaluating the depth and scope of topic coverage within a single title.

There was a throw-away reference in the report to the possible creation of “The Way Better Table of Contents.” The Topicgraph tool is not that; rather, it appears to be a supplement to both the table of contents (ToC) and the back-of-the-book index. In various ways, the six profiles appended to JSTOR’s report note the prevalent activity of rapidly scanning standard elements (the ToC, the introduction, the end-notes) in order to assess a book’s argument and relevance. For me, then, the question arose as to whether those standard elements of front- and back-matter might need to be reconsidered.

Casually speaking, the ToC in a book signals the book’s structure or organizational flow. In a human-readable environment, its chapter headings serve a descriptive purpose as well as a navigational one. In the instance of a biography, for example, the ToC allows one to navigate to precisely the point of the subject’s life that is of greatest interest. When I pick up John Halperin’s The Life of Jane Austen (Johns Hopkins University Press, 1984), the ToC features descriptive and informative chapter headings (the “Years of the First Trilogy, 1796-1799” followed by the “Treacherous Years, 1800-1806”). Such specificity is perhaps more serviceable to the serious reader than the ToC encountered in Irene Collins’ more narrowly focused biography Jane Austen: The Parson’s Daughter (Hambledon Press, 1998), in which the chapter headings are far more general: “Dancing,” “War,” “Love and Tragedy.” The serious reader has to compare the two volumes before he or she can recognize that Collins’ chapter “Love and Tragedy” covers at least in part the same timeframe in Austen’s life as Halperin’s chapter “The Years of the First Trilogy.”

Leveraged against one or both of those titles, JSTOR’s elegant prototype is a visualization tool that would allow me, the reader, to grasp more immediately where within those two relevant chapters I might turn in order to reach segments most relevant to a study of Austen’s early novel Sense and Sensibility. On the one side of the screen is the visualization display showing key concepts and phrases within the text, while on the other side is an online PDF reader for viewing the text. Clicking on the left-hand display takes one immediately to the critical page and/or chapter. The value-add lies in the glance-ability provided by the tool. If the two biographies I mentioned had been part of JSTOR’s beta, the visualization would have displayed the greater frequency of references in Halperin’s book without the loss of time required to scan the table of contents and then flip to the index. It’s less disruptive to the workflow of the reader in pursuit of a thought.
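To make the idea of glance-ability concrete, here is a minimal sketch in Python of the simplest possible version: counting how often a phrase appears in each chapter and printing a small bar for each. This is an illustration only, not how Topicgraph works; the function name `mentions_by_chapter` and the sample chapter text are hypothetical stand-ins and are not drawn from either biography.

```python
import re


def mentions_by_chapter(chapters, phrase):
    """Count case-insensitive occurrences of `phrase` in each chapter's text."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return {title: len(pattern.findall(text)) for title, text in chapters.items()}


# Hypothetical stand-in text, invented for this sketch.
sample_chapters = {
    "Chapter A": "Sense and Sensibility was drafted early. Later, Sense and Sensibility was revised.",
    "Chapter B": "This chapter covers family life and mentions Sense and Sensibility once.",
    "Chapter C": "No novels are discussed here.",
}

counts = mentions_by_chapter(sample_chapters, "Sense and Sensibility")

# Print chapters from most to fewest mentions, with a crude text bar.
for title, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(f"{title}: {'#' * n} ({n})")
```

Even a crude count like this shows at a glance which chapter to open first, which is the kind of time saving described above.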

Consider applying that kind of evaluative support to another book on my shelf, A Companion to Jane Austen Studies (Greenwood Press, 2000). This is not a monograph. It is a series of individually authored literature reviews and bibliographical essays that provide the reader with an overview of useful scholarship in the field. There is no guidance provided as to the content of the various essays except for those listed chapter headings. The value-add of the Topicgraph tool, if applied to this volume, would be that I could see where references to Sense and Sensibility might occur outside of the two chapters readily identifiable in the ToC.

Of course, the value of such a tool could be taken a step further if it were constructed to run against a full collection of Austen-related materials. Such new dashboard tools could be a boon to digitized special collections in saving the time of the reader.

Front and back matter are evaluative aids. Aside from making available PDF scans of those pages, I’m not persuaded that digital platforms have done much in support of that active form of use. There may be specific platform functionalities that have been introduced; in the trade sector, I’m thinking of such props as Amazon’s “Look Inside” feature or the Kindle X-Ray function. (Amazon’s X-Ray feature, useful in identifying people, places, and terms appearing in a book, is currently applied primarily to those works of fiction that might be assigned in the classroom. Its value is thereby limited to being a type of rapid reference or “look-up” tool.) Aside from wondering what’s holding up Amazon’s application of the tool to nonfiction, the next question that springs to mind is why digital book publishing in the humanities and social sciences hasn’t assumed more of a leadership role.

Front and back matter isn’t something that serious readers just skip over. As the JSTOR Labs report shows, it’s where they actually begin. Is there a way to make those critical, “load-bearing” supports more useful for the purposes of conveying information or aiding in navigation? For years, there have been mutterings about metadata and abstracts at the chapter level, but (one assumes) the lack of any evident progress in that regard is due to the additional workload for authors that it entails. It’s much better to see how technology might automate and amplify meaning.

One of the dozen design principles that was referenced in the JSTOR Labs report (in fact the primary principle heading the list) was the centrality of great writing to the monograph. There may be automated functionalities that can amplify or otherwise be appended to a linear narrative, but monographs rely upon the ability of the author to express his or her findings logically and effectively via text. If we’re to redesign the monograph, then, it is key to avoid blocking the connection that can be made between author and reader. In Reimagining the Digital Monograph, the contributors seem to suggest that providers should be focused on enhancements to navigation and collaboration. The anticipated future of the book may not be about augmenting chapter titles or associated metadata, but neither is it about eliminating anything. It’s about the author’s text.

What we think of as front- and back-matter has largely been honed down (over a very long period of time) to the most necessary elements: title page, colophon, acknowledgements, notes, bibliography, index, etc. There might be some finagling of placement or the occasional folding in of a preface with an introduction, but no one seems upset over the status quo of the book as conveyance. Mulling over the JSTOR Labs report, I have come to the conclusion that the digital monograph features that are most immediately amenable to being changed or tweaked (insofar as content or platform providers are concerned) are exactly those Topicgraph-style dashboard tools that allow the reader to examine content from more varied distances.

When you look at user profiles and how individual scholars develop their intellectual craftsmanship, you see that their work isn’t primarily performed at a collection level (unless it involves text- or data-mining of an author’s corpus). It’s done at the level of the individual volume, and as one of the principles put forward in this report notes, “The ideal digital monograph should allow different kinds of readers to navigate it in different ways” (emphasis is from the original). The tools of intellectual craftsmanship may shift in terms of device or medium, but the nature of the activity remains the fitting of one new idea into the edges of the greater knowledge framework.

Again, the draft version of Reimagining the Digital Monograph: Design Thinking to Build New Tools for Researchers remains open for comment until January 31, 2017.


Jill O'Neill

Jill O'Neill is the Educational Programs Manager for NISO, the National Information Standards Organization. Over the past twenty-five years, she has held positions with commercial publishing firms Elsevier, ThomsonReuters and John Wiley & Sons followed by more than a decade of serving as Director of Planning & Communication for the National Federation of Advanced Information Services (NFAIS). Outside of working hours, she manages one spouse and two book discussions groups for her local library.


10 Thoughts on "Intellectual Craftsmanship and Scholarly Engagement — JSTOR’s Ideas for Redesigning the Digital Monograph"


You write that “For years, there have been mutterings about metadata and abstracts at the chapter level, but (one assumes) the lack of any evident progress in that regard is due to the additional workload for authors that it entails. It’s much better to see how technology might automate and amplify meaning.”

In our case, we have been creating metadata and abstracts at the chapter level (and for some titles, at the sub-chapter level too) for many years – something STM publishers seem to do too. Our authors sometimes balk at the need to provide abstracts so we have a simple workaround – we’ll use the chapter’s opening paragraph instead. This may not be ideal, but the opening paragraph usually has the necessary keywords, so it serves as a good proxy for an abstract.

Since we now handle around 1000 new books a year, we’ve built tools to semi-automate the process of metadata extraction at the chapter level (in some cases it is totally automated), and we collect all the metadata, at all levels, into a single database from which we publish.

The results, in terms of generating downloads, have been impressive: our books are more discoverable because the chapters can now be found by search engines independently of the book. Users also seem to like being able to choose to download chapters only. Some two-thirds of our downloads are now at the chapter level, compared to one-third at the whole-book level.

Toby Green
OECD Publishing

The issue tree structure of monographs might play in here. See my little essay at

The TOC is a rudimentary issue tree diagram. It works because while text is usually a linear string of sentences, the thinking expressed has a branching structure, which I have named the issue tree. Presenting a tree of thoughts as a linear string requires a lot of jumping around, from one line of thought to another, in bits and pieces. Seeing at least the top of the tree, along the lines of the TOC, might be useful. One might see not just the topics, but the basic thinking.

Thanks, Jill, for this great overview of what was a tremendously fun project. I’m so pleased to get the word out about it, and am very very eager to hear the thoughts and feedback of the community and incorporate that input into the final draft and future projects. Topicgraph is really just the first of many potential ways to improve the digital experience of monographs — so much still to do!

I love your suggestion that Topicgraph support not just book-analysis, but also corpora. If you don’t mind, that might just find its way into the final paper and maybe a future project.

I 100% agree on the value of chapter-level metadata and abstracts! As Toby points out, above, those don’t come free or easily, and that’s for a publisher with contact with the author (as an aggregator, JSTOR would face an even steeper hill to climb). One of the constraints we set ourselves on this project was that whatever we did would “scale” without substantial incremental per-book cost, as might be incurred for chapter-level abstracts, or, more granular semantic tagging of book indices. That’s not to say those investments aren’t worthwhile! They very often are, and I am thrilled that forward-thinking publishers like OECD are investing in them (I’m always quite interested in hearing about automation of this chapter-level data). For us, this constraint was there in acknowledgement of the fact that that investment might not be worth it or even possible for all (especially backfile) books. Topicgraph’s data-science-based topic-modeling approach was a way to craft a solution that, if it were considered valuable, could scale to the 50k books within JSTOR and beyond.

It would be useful to be able to use Topicgraph to find specific lines of reasoning or argument, rather than just topics. The underlying issue tree structure plays in here.

I agree that topics are just the beginning. It would be great to identify entities (such as Jane Austen or Sense and Sensibility, as Jill describes in her piece), as well as things like schools of thought (some schools of thought do emerge as a topic model, but I can’t say it’s consistently successful at this point). I like the idea of also grasping lines of reasoning or even the kind of argument being made (at least for some disciplines this could be helpful). Are you aware of systems that are doing this algorithmically? Is there an automagic way of generating “issue trees”?

I know of no such system but it should be feasible, especially given the success of question answering systems. The issue tree exists because when we write we are constantly answering unspoken questions. Typically each sentence is answering an unspoken question posed to a specific prior sentence. Branching occurs because more than one question can be asked.
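As a rough illustration of the structure described in this thread, an issue tree can be sketched as a small tree in which each sentence records the unspoken question it answers and the prior sentence that question was put to. This Python sketch is hand-built and hypothetical: the class name `IssueNode`, the helper `outline`, and the example sentences are all invented here, and nothing in it generates a tree automatically from text.

```python
from dataclasses import dataclass, field


@dataclass
class IssueNode:
    sentence: str
    question: str = ""  # the unspoken question this sentence answers
    children: list = field(default_factory=list)  # answers to questions about this sentence

    def answer(self, question, sentence):
        """Attach a sentence that answers `question` about this node."""
        child = IssueNode(sentence, question)
        self.children.append(child)
        return child


def outline(node, depth=0):
    """Return the tree as indented lines, one per sentence, depth-first."""
    lines = ["  " * depth + node.sentence]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Invented example: two questions asked of the same sentence create a branch.
root = IssueNode("Chapter-level abstracts are rare.")
why = root.answer("Why?", "They add work for authors.")
root.answer("What could help?", "Automated extraction could help.")
why.answer("How much work?", "Each chapter needs its own summary.")

print("\n".join(outline(root)))
```

Reading the printed outline top to bottom gives the linear text, while the indentation shows the branching that the linear string hides.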
