Will Editing Mix Machines With Humans? Dan Cohen Ponders the Future of Publishing

During the opening plenary of the SSP Annual Meeting on Wednesday, Dan Cohen offered a perspective on what the world of scholarly publishing might look like if it were a “digital native.” It was an interesting vision of new modes of scholarly communication based on social media and alternative metrics, with some examples of how scholars may navigate the onslaught of digitally distributed content.

Cohen is an Associate Professor in the Department of History and Art History at George Mason University and the Director of the Center for History and New Media (CHNM). He and the rest of the team at CHNM have been actively working to put some of these new technologies for scholarly information distribution into place. The Center has been a tremendous leader in digital information distribution and has supported a variety of projects, from Zotero to ScholarPress and from PressForward to THATCamp. During his presentation, Cohen devoted a great deal of time and attention to the PressForward service and some of the new publications that CHNM has produced using it.

Using PressForward, Cohen and Joan Fragaszy Troyano are now editing two new publications, Digital Humanities Now and the Journal of Digital Humanities. Digital Humanities Now began as a curation tool to extract humanities information from Twitter. It now covers a variety of blog, social media, and repository content that is openly available on the web. The Journal of Digital Humanities takes this curation process a step further: it consists of the best content exposed by Digital Humanities Now.

In some ways, these two publications are quite novel: they automatically cull from thousands of daily posts, organize and segment the content, and then algorithmically rank it. As mega journals such as PLoS ONE, Nature’s Scientific Reports, and the few others that exist begin collecting a wide range of diverse scholarly content, these overlay journals will become increasingly useful. Digital Humanities Now and the Journal of Digital Humanities are examples of how to bring this content together and of what these overlay journals might look like.

But despite its novelty, the Journal of Digital Humanities is essentially doing what traditional journals and editors had done — gather, review, and validate content — but based on a curation model rather than one that is submission-based. Traditional journals had to build sufficient reputation and distribution to attract paper submissions from authors. If a traditional publication were fortunate enough to have a sufficient reputation for quality, it could be selective, or even highly selective, about the content it published. The new journals, by contrast, push out content that already exists, curating it from what is available on the open web.

There is an internal bias in these two publications in that they rely solely on open access content, for a combination of business, practical, and legal reasons. Reuse of traditionally published content in this fashion is generally prohibited by copyright. As such, the Journal of Digital Humanities is a skewed sample of the work being done in the humanities, but that needn’t undermine its relevance. It is an open question whether these virtual publications will encourage additional publications to move toward greater reuse allowances, or whether certain paper repositories will develop around subject areas, in which copyrighted material can be culled from within a single publisher’s collection.

There is a parallel between these algorithmically generated journals and another trend in publishing that has emerged over the past year: the automatic generation of stories from structured data by companies such as Narrative Science and Automated Insights. These services are nothing like the Lorem Ipsum dummy text generators, which simply spew text characters. They turn structured data into narrative news stories without human drafting or editing. You might expect that these services are used only for populating advertising spam sites, but this is not the case. In fact, many reputable news publications are using them to add content to their publications (while simultaneously cutting their writing staff, we should note). Wired magazine featured an article on this last month entitled “Can an Algorithm Write a Better News Story Than a Human Reporter?” I met Narrative Science’s Kristian Hammond at the Tools of Change conference earlier this year, and the service is impressive. While I am not certain I agree with his belief that it will be less than five years before an auto-generated story wins some reporting award, there is certainly a place in our world for this auto-generated content. Similarly, there is a place for the auto-summarized journal.

Hopefully, we can all agree that machines are poor surrogates for humans when it comes to curating and selecting content. But they do have strengths in speed and scale, which are important for matching the pace of content creation. Assessing quality will, for the foreseeable future, remain the domain of editorial experts and peer reviewers. In many ways, the quality of the journal will depend on the quality of the filtering mechanisms. Of course, this has always been true; it has simply taken decades for the highest-quality titles to rise to the top. In our digital environment, it will probably happen much more quickly. How these new forms of communication are accepted by the scholarly community, authors, and the administrators who manage promotion and tenure committees is another important question that will take several years to work out.

Todd A Carpenter

@TAC_NISO

Todd Carpenter is Executive Director of the National Information Standards Organization (NISO). He also serves in a number of leadership roles in a variety of organizations, including as Chair of the ISO Technical Subcommittee on Identification & Description (ISO TC46/SC9), founding partner of the Coalition for Seamless Access, Past President of FORCE11, Treasurer of the Book Industry Study Group (BISG), and a Director of the Foundation of the Baltimore County Public Library. He previously served as Treasurer of SSP.

Discussion

9 Thoughts on "Will Editing Mix Machines With Humans? Dan Cohen Ponders the Future of Publishing"

Interesting article, thanks Todd. With the majority of people considering themselves content creators now and with an over-abundance of information, the machinery and editorial processes of STM journals have never been so critical. To loosen the filter, as some new approaches appear to advocate, would be of no useful service to the professional community in my view.

By Andrew Miller
Jun 1, 2012, 6:09 AM

My first comment, excellent!!!
I certainly am not advocating the removal of filters, and I agree that the filtering that scholarly journals (even beyond STM) provide is even more critical. What is interesting is that new types of filtering are starting to open up. Over time these automated filters will improve. It is unlikely they will be as nuanced and of as high quality as human filters, but there are positives and negatives to each approach (human versus machine). Automated filters needn’t be looser than human filters; they are just different. How the two interplay moving forward will be fascinating.

By toddacarpenter
Jun 1, 2012, 7:25 AM

Thanks for the thoughtful response to my talk, but this makes it seem like PressForward publications are almost entirely algorithmic. (Especially in your rather odd and inappropriate parallel to fully automated journalism.) I made it clear several times in my talk (indeed it was the main thesis of the talk) that we need the best of algorithmic and more traditional human editorial methods of selection. PressForward publications are hybrids of those two methods. The algorithms and associated technology (like RSS) help us find content of interest to a community of scholars, but humans have to check to make sure that content is of high quality before disseminating it.

Readers can get a better sense of how, for instance, the Journal of Digital Humanities was put together in our editors’ introduction to the first issue. I would agree that we are trying to do something traditional as well, which is to do what print journals have done: providing new, important scholarship to people with limited attention. It’s different than completely personalized “publications” like Flipboard (which indeed are run by algorithms).

As to the question of “internal bias” toward open access, if the Scholarly Kitchen will plead guilty to its internal bias against open access, I shall plead guilty as well. 😉

By Dan Cohen (@dancohen)
Jun 1, 2012, 10:01 AM

I don’t think Scholarly Kitchen has an internal bias against OA. As part of SK I just have a bias in favor of business models that work. I am even thinking of starting an author pays journal. Being skeptical of Utopian schemes is not a bias.

By David Wojick
Jun 1, 2012, 10:43 AM

I work for a publisher with a strong history as a leader in open access publishing. I think it’s the ideal way scholarship should be done. I do think there are issues in the practicality of the proposed system and its implementation that should be openly discussed, criticized and improved upon though, and many unfortunately see any form of analysis as an attack.

As always, the Scholarly Kitchen is a diverse group of authors bundled together under one umbrella, and while we each have our own biases, I’m not sure the group as a whole can agree on anything.

By David Crotty
Jun 1, 2012, 11:41 AM

Todd, you seem to be using the term “curation” in a new way to me. You say “… the Journal of Digital Humanities is essentially doing what traditional journals and editors had done — gather, review, and validate content — but based on a curation model rather than one that is submission-based.”

What is a curation model? Do you mean search and collection? I think of curation as managing a digital archive. Is curation a new buzz word I need to know? I do algorithms.

By David Wojick
Jun 1, 2012, 10:58 AM

Todd is using curation in the sense that seems to be winning out on the web, wherein curation means selection and organization of digital content. For some examples of this usage see the discussion of the differences between digital curation and digital preservation.

By tjowens (@tjowens)
Jun 1, 2012, 11:31 AM

I like to say that confusion is the price of progress, so this is very interesting. I first encountered the term curation when I did staff work for the US Interagency Working Group on Digital Data (IWGDD). The focus was preservation, not selection, and I thought at the time that selection was the bigger issue. But collecting Twitter messages strikes me as an odd use of the term curation. On the other hand, web search is called discovery, which I find especially distasteful in the context of science, where discovery should mean discovery. Looking for language is the mark of revolution.

See also http://en.wikipedia.org/wiki/Zettel_(Wittgenstein) which is like a Twitter collection. Dare we call it zettel?

By David Wojick
Jun 1, 2012, 1:41 PM

The Scholarly Kitchen
