Why Are Publishers and Editors Wasting Time Formatting Citations?

Sign reading, "citation needed" — Actually, we have the citation. It’s the identifiers and the metadata we need. (Image via futureatlas.com https://www.flickr.com/people/87913776@N00).

Citations are foundational elements of scholarly publishing. They provide evidence of the reliance of current work on existing literature, background on how research strategies were developed, indication of the thoroughness of the work, and a summary of significant prior related art, as well as facilitating plagiarism detection. As an ancillary benefit, they also have created one metric against which past research is judged. The investment in ensuring references are accurate, complete, and the link (if one exists) to the referenced object is functioning is certainly a core publishing function.

However, all of these things have very little to do with how references are presented in citations, nor the importance of one style over another. Much of the focus on citation style is driven by domain tradition and adherence to one of the many dozens of style guides that exist in the scholarly publishing world. I certainly don’t intend to disparage any of the different manuals, nor their adherents in any field. What concerns me is the fact that there is so much time wasted in the production process of editing references to adhere to the house style, whatever that style may be.

Anyone who has written a scholarly article, a book chapter, an entire book, or a bibliography understands the challenges of references. Recently, I submitted a book manuscript for a project that I am co-editing. One of the most tedious and painful aspects of pulling together the book was going through and editing the references. While I appreciate and acknowledge the importance of the references, the entirely manual process of checking and formatting the references was an incredible waste of time.

A scholarly reference at its most basic element is a string that provides readers with sufficient information to track back to the original source of the content being referenced. That information, of course, varies by the type of publication, the source document, and the complexity of the resource being referenced. There will always be a variety of citation styles based on the type of resource, but why do we need so many different styles for the same type of resource?

The consistent formatting of names, the placement of commas and periods, and the representation of all manner of data are all style issues that should be handled by automated style sheets drawing on linked metadata. Because we are focusing on a string of text, we are inefficiently producing citation content and wasting effort on post-distribution processing. We are also not taking advantage of all the potential from machine-linking this information, a critical requirement as we move further into a linked data world.

There have been many studies over the years of citations, their accuracy, and the types of errors researchers are making. These errors can be serious or minor, with minor errors being the most common. Significant errors can be so bad that discovery of the cited work is impossible. Although the percentage (in the 5-20% range across several studies) is modest, it is still troubling.

Some of the research on citation errors showed inappropriate use or misuse of a referent’s conclusion. If the other identification and metadata errors in a citation are eliminated or reduced significantly in ways discussed below, editors or reviewers might redirect their time to confirming the appropriateness of the reference. This might be a better investment of editorial resources.

Many publishers and vendors who provide copyediting or publishing services have for some time done production transformations of references and validation. Unfortunately, the match rate of these post-production references against CrossRef and PubMed metadata is far short of perfect. This editorial process has been the focus of a lot of process development and programming over the years. There has also been research in how to use pattern matching for names and topics, as well as how to algorithmically parse citations.

Rather than using text to identify digital resources, we should be taking advantage of the system of identifiers and metadata that exist to streamline the process of reference creation and validation. This would solve the problem at the root rather than after the fact. Instead of identifying people with just their names, we should be using ORCIDs or ISNIs to ensure disambiguation and machine-processability. Identifiers can also be used for institutions and publications. Nearly every article has (or at least should have) a persistent identifier such as a DOI. Publications have ISBNs or ISSNs. Sound and A/V formats use ISRC, ISAN, and ISMN identifiers. Special collections and archive materials have collection identifiers and accession numbers. Data sets are being tagged with DataCite DOIs, or ARKs. Even concepts and other digital materials can be identified with URIs. All of these identifiers should have associated metadata that describe what that object is, be it name, title, or description.
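One concrete benefit of machine-processable identifiers is that some of them carry a check digit, so a typo can be caught before any lookup happens. The following is a minimal sketch, assuming the ISO 7064 MOD 11-2 scheme used for the final character of an ORCID iD and the standard alternating 1/3 weighting for ISBN-13. The function names are illustrative, and the identifiers in the tests are the commonly published sample values, not references from this post.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, digits, and check character of an ORCID iD."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


def is_valid_isbn13(isbn: str) -> bool:
    """Check the length, digits, and check digit of an ISBN-13."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    # Weights alternate 1, 3, 1, 3, ... over the first twelve digits.
    weighted = sum(
        int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits[:12])
    )
    return (10 - weighted % 10) % 10 == int(digits[12])
```

A check like this only catches malformed identifiers. It cannot tell whether a well-formed identifier points at the intended person or work, so it complements metadata lookup rather than replacing it.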

When authors are submitting references, why doesn’t the community simply send in a reference that is submitted like this:

<Author ID>, <Publication ID>, <Object Identifier>, <Publisher identifier>, <Date (of publication/access)>, and specific location (such as page number, etc), if necessary.

Each of these elements, aside from date and specific item location should be represented by unique persistent identifiers. References should not be strings of text that describe the referenced content. In our increasingly digital and machine-intermediated world, the production departments can pull in related metadata using an automated process to retrieve the data and format it in any way the publication’s editors would prefer. Any outliers can be dealt with in traditional ways.
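To make the idea concrete, here is a minimal sketch of a reference stored as identifiers and rendered into two different styles on demand. Everything in it is an assumption for illustration: the `StructuredReference` class, the `METADATA` dictionary (a stand-in for whatever lookup service a production department would query), the placeholder identifiers, and the two simplified styles, which are not real style guides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StructuredReference:
    author_ids: tuple          # e.g. ORCID iDs (placeholders below)
    publication_id: str        # e.g. an ISSN
    object_id: str             # e.g. a DOI
    publisher_id: str          # e.g. an ISNI; carried but not rendered here
    date: str                  # ISO date of publication or access
    location: Optional[str] = None  # page number or similar, if needed


# Hypothetical local stand-in for metadata retrieved by identifier.
METADATA = {
    "orcid:PLACEHOLDER-1": {"family": "Doe", "given": "Jane"},
    "orcid:PLACEHOLDER-2": {"family": "Roe", "given": "Richard"},
    "doi:10.5555/example": {"title": "An Example Article"},
    "issn:PLACEHOLDER": {"title": "Journal of Examples"},
}


def resolve(identifier: str) -> dict:
    """Look up the metadata record for an identifier."""
    return METADATA[identifier]


def render(ref: StructuredReference, style: str) -> str:
    """Format a structured reference as human-readable text."""
    authors = [resolve(a) for a in ref.author_ids]
    work = resolve(ref.object_id)
    journal = resolve(ref.publication_id)
    year = ref.date[:4]
    if style == "author-year":
        names = ", ".join(f"{a['family']}, {a['given'][0]}." for a in authors)
        text = f"{names} ({year}). {work['title']}. {journal['title']}"
    elif style == "numeric":
        names = ", ".join(f"{a['given'][0]} {a['family']}" for a in authors)
        text = f"{names}. {work['title']}. {journal['title']} {year}"
    else:
        raise ValueError(f"unknown style: {style}")
    if ref.location:
        text += f", {ref.location}"
    return text + "."


example = StructuredReference(
    author_ids=("orcid:PLACEHOLDER-1", "orcid:PLACEHOLDER-2"),
    publication_id="issn:PLACEHOLDER",
    object_id="doi:10.5555/example",
    publisher_id="isni:PLACEHOLDER",
    date="2014-11-06",
    location="p. 12",
)
print(render(example, "author-year"))
print(render(example, "numeric"))
```

The point of the sketch is that the stored reference never changes. Only the rendering function does, so switching house style becomes a configuration choice rather than an editing task.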

While such a change requires buy-in from the community for broad adoption, providing citation data in these structured ways would actually be easier for authors to do than to correctly gather and format all of the information in references the way they are doing so now. A large number of authors are already doing something similar by using reference managers like Zotero, EndNote, Mendeley, RefWorks or some similar service. All of these services use structured metadata to transform easily from one style to another. Rather than taking that structured data, converting it to text streams, then having editorial services return it back to structured data, publishers should ask for that data and plug it into an automated formatting process.

Making all (or at least most) of the elements of references machine-readable through persistent identifiers will, through unambiguous reference, remove the errors and confusion about the identity of the author, the publication, or whatever else is being identified. A greater range of metrics and analytics is possible through machine-processable references. Some of this is already being done by organizations that process references for analytics, such as Thomson-Reuters. Who can say what metrics or services might be possible if all the material were generally available? Additionally, citations can be scanned for appropriate or correct quote attribution. Presently, CrossRef partners to provide a service for plagiarism detection, mediated through the CrossCheck metadata, but there are others. A more robust system might follow with new and different types of providers.

One might respond that the scholarly community has successfully implemented the DOI system and DOIs are regularly included in references, so isn’t that sufficient? In some ways yes, the inclusion of DOIs does address part of the challenge. DOIs do create the functional tie to the metadata regarding the referenced item. However, they do not form the basis for how most references are constructed or formatted or allow the parsing of the reference. Also, DOIs provide an indirect link to information rather than a direct link to, for example, the author’s ORCID profile. This lack of direct connections creates a barrier to simple analytics and information mapping. This is certainly not an insurmountable issue, based on the robust infrastructure that CrossRef provides, but traversing this linked data web directly rather than via the CrossRef metadata also simplifies the collection of metrics for assessment purposes. Relying on DOI metadata also lacks some of the real-time validation services and potential linked data extraction from a master identity record that would be associated with a persistent identifier for that entity, such as a person using ORCID, a book using ISBN, or an institution using an ISNI.

PLOS hosted a hack day at their offices earlier this month in San Francisco on trying to find ways to improve the machine interoperability of citations. One idea is what PLOS is calling rich citations, essentially using linked metadata to provide richer citation information. PLOS Labs has developed an open source bot that can automatically collect rich citation information. While these might be interesting services, would they be necessary, if the power of identifiers instead of text were already built into the references?

To implement these suggestions would require a cultural shift in the current publication process. Authors will need to view much of their content not as narrative, but rather as structured information and data. Citations are a good place to start since much of this content has existing identifiers that can be assigned and used to replace the textual format with a structured one. Let’s leverage the existing systems and tools to redesign how we manage citations to stop wasting our time on reformatting and add the increased functionality that machine-processable structures can provide.

Todd A Carpenter

Todd Carpenter is Executive Director of the National Information Standards Organization (NISO). He additionally serves in a number of leadership roles of a variety of organizations, including as Chair of the ISO Technical Subcommittee on Identification & Description (ISO TC46/SC9), founding partner of the Coalition for Seamless Access, Past President of FORCE11, Treasurer of the Book Industry Study Group (BISG), and a Director of the Foundation of the Baltimore County Public Library. He also previously served as Treasurer of SSP.

Discussion

39 Thoughts on "Why Are Publishers and Editors Wasting Time Formatting Citations?"

One reason why reference lists should be in reasonably good shape upon manuscript submission is so that reviewers can check the references. If only the metadata is submitted, it may be necessary for journals to convert the reference list to its edited format before sending manuscripts out for review (which should not be too hard, just something for publishers to keep in mind).

By RMS
Nov 6, 2014, 7:56 AM

Yes. But this is of course only a reason why the references should be comprehensible, not a reason why they need to conform to the journal’s preferred format.

By Mike Taylor
Nov 6, 2014, 9:03 AM

Thought-provoking article. I think using just the DOI makes this a lot easier to implement; maybe add the access date and that’s it. All other metadata can be pulled from CrossRef if and when available. Surely CrossRef could create the APIs required that would serve ORCIDs when present and relevant links.

By Pieter Lamers
Nov 6, 2014, 8:23 AM

Authors will need to view much of their content not as narrative, but rather as structured information and data.

I don’t think we can ignore the narrative use of citation and the information they convey to the editor and reviewers. Who is the author citing? What is the name of the journal? Are these authoritative sources? While I agree with your argument, we need to find a way to convey the structured metadata in a way in which humans AND machines can read. I think the value of citation would be greatly diminished if it were merely a string of machine readable code.

By Phil Davis
Nov 6, 2014, 8:40 AM

My point wasn’t that we should get rid of narrative text altogether. A string of identifiers can easily be rendered digitally as a narrative string humans can read. The same applies to the HTML in which this page is sent over the net. Simply including the IDs provides the opportunity to do both.

By Todd A Carpenter
Nov 6, 2014, 9:40 AM

An interesting post about a subtle issue.

Print still matters for many journals in the biomedicine area, and PDFs are still in high demand by users. I can guess, but how would this approach affect the ability to create traditional reference lists in print and PDFs?

There is also the workflow issue for authors, who are often cutting-and-pasting into Word. This goes beyond how they view their works as a philosophical matter. They have habits, and they take shortcuts. Their workflow is a vital piece of the overall publishing workflow. So while editors, copyeditors, and publishers may all evolve in this direction, authors would have to, as well. With Word as the common input tool, there is a major workflow issue to solve before this could ever be adopted. As you mention, there is already a GIGO (garbage-in, garbage-out) problem around citations. How could we address the authorship workflow challenge without exacerbating the GIGO problem?

I love the idea of increasing accuracy and efficiency, as well as machine-readability. Looking at the problem more holistically will be important to finding a way through to the goal.

By Kent Anderson
Nov 6, 2014, 8:47 AM

Interesting. ASCE recently moved to a single article publishing workflow. With this transition, we eliminated page numbers in favor of Content IDs. We also have de-emphasized the volume and issue. Mostly this is because we are publishing them in their final form before they are assigned to issues. We have updated our reference style to rely heavily on the DOI.

I have to say, I get at least one call or email every week from an author about the “missing” page numbers. I have also had authors claim to boycott the journals because we aren’t putting the Vol and Issue numbers on the final PDF despite the fact that every PDF has a linked DOI. The editors get a lot of questions as well. Mostly I hear…”How will I put this paper on my CV”? There seem to be fewer concerns about citing the paper in other journals.

We have decided to stick it out. We knew there would be a learning curve and that we were leaping a little further out on this than some of our “sister” journals. I am not sure that the format suggested above is ready for prime time but I do joke about the reference of the future being “ORCID, ORCID, DOI”. But for now, people want to know the author, the journal, the title and the year of publication before wasting their time on a click.

By Angela Cochran
Nov 6, 2014, 9:35 AM

Right now, we are in a transitional phase – where scholars and their machines are both reading (and writing) citations. The whole trick is how to empower each of these “readers” to do what they’re best at. Machines need identifiers for efficient, meaningful retrieval; humans need text and its inherent narrative to recall most effectively.

Having done my time in the bowels of raw citation data, citation capture and link, citation correction, and citation metrics, metrics, metrics…I know the enormous benefits of metadata fingerprints, algorithmic and heuristic matching to known sources, and machine-assisted capture.

From the very beginning of manually-keyed data capture, ISI was using a minimal, machine-readable “identifier” (see: http://www.dlib.org/dlib/september99/atkins/09atkins.html from 1999, and this from 1977: http://garfield.library.upenn.edu/essays/v3p042y1977-78.pdf).

In the tens of millions of references processed each year by Thomson Reuters, there are not many that make it out to Web of Science without some enhancement from the vast metadata files in the TR production systems. When there is a link created to a source item, a citation that notes only the first author can be searched using ANY author as the cited author (yep, even Dr. 500th-Author-because my name starts with Z); full article titles can be added, even if they’re not in the original. But minor errors can also be corrected: some misspellings or alternate spellings of author name, variant presentations of a source title. When DOI was added as part of the metadata, it didn’t fix everything (early days, it was actually an obstacle), but it increased the number of corrections that could be filled onto a reference. Year, volume and page could be corrected – and those had been the most frequent sources of mistakes and the largest source of link-failure. More recent enhancements to capture enabled even more references to be expanded based on their re-recurrence.
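The kind of heuristic matching described above can be sketched in a few lines. This is only an illustration of the general idea, assuming a tiny in-memory list of known source records and a simple string-similarity score from Python's standard `difflib` module. It is not a description of how any production system works, and the records, the `best_match` function, and the 0.6 threshold are all made up for the example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical known source records; a real system would hold millions.
KNOWN_RECORDS = [
    {"id": "rec-1", "author": "Doe J", "title": "An Example Article",
     "source": "Journal of Examples", "year": "2014"},
    {"id": "rec-2", "author": "Roe R", "title": "A Different Study",
     "source": "Annals of Testing", "year": "2012"},
]


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity score between 0 and 1 for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(raw_citation: str, records: list, threshold: float = 0.6):
    """Return the most similar record, or None if nothing clears the threshold."""
    best, best_score = None, 0.0
    for record in records:
        candidate = (
            f"{record['author']} {record['title']} "
            f"{record['source']} {record['year']}"
        )
        score = similarity(raw_citation, candidate)
        if score > best_score:
            best, best_score = record, score
    return best if best_score >= threshold else None
```

Once a noisy reference is linked to a record this way, the misspelled title or abbreviated journal name can be replaced with the curated values from that record.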

My ideal future: the machine reader will be a powerful assistant to the scholarly reader. When writing or reading a manuscript, the scholar will work in cooperation with the machine to identify relevant materials; the machine will manage them as identifiers but spit them out as human-readable references – maybe as enriched by the scholar’s notes and commentary. The completed MS will be sent from one machine to another – the receiving machine can use its metadata to verify/format and present the reader with a new set of enhanced references, rich with the reader’s notes or comments.

Format was originally for the purpose of consistency and retrieval. It’s time to update what each reader needs in order to retrieve and contextualize the cited work.

By Marie McVeigh
Nov 6, 2014, 10:13 AM

This should not be an “either or” discussion.

In the Editorial Manager peer review system we offer an option that automatically transforms the authors’ unstructured references into journal style with automated links to digital objects such as DOIs (using eXtyles built in).

This means that authors don’t have to bother conforming to journal style for submission, but journals can nevertheless benefit from familiar/narrative styling during peer review for reviewer and editor convenience. Incidentally this fully automated process also creates structured XML that is ideal for machine reading and production.

What’s strange is that while we’ve offered this capability for close to a decade, only a few publishers have chosen to purchase this option.

By Richard Wynne
Nov 6, 2014, 10:14 AM

There’s a lot more automation already available to publishers than you might realize. My organization does not require authors to conform to our particular house style when submitting their manuscripts (in fact, none of the journals I’ve worked on in my career have required this – is this really a serious concern beyond the smallest of publishers?). As long as the references are mostly complete, I’m happy. In composition, we use automated tools to identify the component parts of the reference, and then automatically style and reorder elements. We can also pull in some associated metadata (mostly DOIs) to enable linking in the online products. The system can trip up on oddball stuff such as whitepapers, but the copyeditors take care of that formatting. And all these tools are available either directly to publishers, or through composition service providers, and they’re not that expensive.

I think we’re on our way toward making better use of digital records, but more infrastructure is needed, and some formats need to be proven. If publishers are expected to support automatic data retrieval for all of the standards you mention (ORCID, ISNI, ISAN, etc.), and all the ones that are still appearing, it will be a never-ending software development black hole, and frankly, software development is already an area where many publishers struggle.

By Shaun Halloran
Nov 6, 2014, 10:27 AM

There are many other incomprehensible sources of wasted time:
– why do editors ask for cover letters alongside the submissions? ‘Skillful’ editors should be able to spot the ‘importance’ of texts they read, otherwise they do not ‘deserve’ to be editors… do they?
– why do editors ask for complex page formatting while authors are not yet sure whether their papers will be accepted? Isn’t that a waste of authors’ time, too?

By Rorer
Nov 6, 2014, 12:13 PM

Todd, I am sorry you had to go through that process. But as you say 1000s of academics have to go through the same torture. It is actually worse than you think. You have now finished your job, putting commas in the right places, putting “et al”, etc. as per instructions. You might think you have saved the publisher some time, or saved them money, but actually I am afraid you have completely wasted your time!! Here is why…

Publishers now usually ask for XML-first typesetting (quite rightly). That means XML, or structured data should be produced first, then PDF etc automatically from that – so that XML is the definitive archive. And for that, they typically use offshore companies (like mine). We undo what you have done, throw away all the punctuation, then structure the references in XML. Then we get DOIs from CrossRef etc and embed those. Then for the preparation of PDF, we use a script to typeset the references in the publisher’s style, thus recreating what you had done in the first place!!
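The round trip described above can be sketched as two steps: structure the reference as XML, then generate the styled text from that XML. This is a minimal illustration, not any vendor's actual pipeline. The element names follow the JATS citation vocabulary (`element-citation`, `person-group`, `pub-id`), while the input dictionary, the function names, and the simplified house style are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_reference_xml(ref: dict) -> ET.Element:
    """Turn parsed reference fields into a JATS-like element-citation."""
    cit = ET.Element("element-citation", {"publication-type": "journal"})
    group = ET.SubElement(cit, "person-group", {"person-group-type": "author"})
    for surname, given in ref["authors"]:
        name = ET.SubElement(group, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given
    ET.SubElement(cit, "article-title").text = ref["title"]
    ET.SubElement(cit, "source").text = ref["journal"]
    ET.SubElement(cit, "year").text = ref["year"]
    if ref.get("doi"):
        ET.SubElement(cit, "pub-id", {"pub-id-type": "doi"}).text = ref["doi"]
    return cit


def typeset(cit: ET.Element) -> str:
    """Render the XML in a simplified, hypothetical house style."""
    names = [
        f"{n.findtext('surname')} {n.findtext('given-names')}"
        for n in cit.iter("name")
    ]
    text = (
        f"{', '.join(names)}. {cit.findtext('article-title')}. "
        f"{cit.findtext('source')}. {cit.findtext('year')}."
    )
    doi = cit.findtext("pub-id")
    if doi:
        text += f" doi:{doi}"
    return text


parsed = {
    "authors": [("Doe", "J"), ("Roe", "R")],
    "title": "An Example Article",
    "journal": "Journal of Examples",
    "year": "2014",
    "doi": "10.5555/example",  # placeholder DOI for illustration
}
citation = build_reference_xml(parsed)
print(ET.tostring(citation, encoding="unicode"))
print(typeset(citation))
```

Because the punctuation lives only in `typeset`, the author's hand-placed commas contribute nothing to the archived XML.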

Actually, it’s even worse than that! If you were using a reference manager when writing your article, e.g. Mendeley, Zotero, EndNote, then all the structured information would have been in those anyway, so we just need that data. Less work for you, and no QC for us…

Sounds mad, doesn’t it? Yes, it is. 😉

By Kaveh Bazargan (@kaveh1000)
Nov 6, 2014, 2:32 PM

Now that most, if not all, academic publishers have stopped hoping the Web would just go away, it’s beyond time to move past the Word doc as the preferred form of submission. A simple web form as the publisher’s submission system would make things easier for you, Kaveh, if not perhaps reducing your value add. All publishers have to do is to say: OK, no more Word docs. Just paste your plain text into these fields, give us DOIs for your citations (where available), upload your figures, data, and code here & we’ll handle the rest. Less effort for everyone & lower costs all around. What’s not to like?

With XML as the version of record, what and how it’s displayed can be determined on the fly. Also, see content negotiation for DOIs: http://crosscite.org/cn/ Instructions for what version or format of an article or metadata record to return can be specified dynamically.
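As a rough sketch of what such a content negotiation request could look like: the client asks the DOI resolver for a formatted citation by sending an `Accept` header instead of following the link to the landing page. This assumes the `text/x-bibliography` media type with a `style` parameter, which is how the content negotiation service is generally documented. The function names and the placeholder DOI in the usage note are illustrative, and whether a given DOI supports this depends on its registration agency.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_citation_request(doi: str, style: str = "apa") -> Request:
    """Build a request asking the DOI resolver for a formatted citation."""
    url = "https://doi.org/" + quote(doi, safe="/")
    return Request(
        url, headers={"Accept": f"text/x-bibliography; style={style}"}
    )


def fetch_formatted_citation(doi: str, style: str = "apa", timeout: int = 10) -> str:
    """Send the request and return the citation text (needs network access)."""
    with urlopen(build_citation_request(doi, style), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_formatted_citation` with a registered DOI and a different `style` value would return the same reference in a different format, which is the "determined on the fly" behavior described above.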

By mrgunn (@mrgunn)
Nov 7, 2014, 4:00 PM

Now that most, if not all, academic publishers have stopped hoping the Web would just go away,

I’m sorry William, but this statement is complete nonsense. Academic publishers were among the first to move their publications successfully onto the web, and unlike most of their counterparts in other parts of publishing, it has proven to be a tremendously profitable place to be (just see the margins made by your employer, Elsevier for example). Do you really, honestly think that Elsevier, who bought your web based company for a large sum, wants the web to go away?

There are lots of things to take publishers to task over, but let’s not get ridiculous.

As for your other suggestions, this is a lot easier said than done. Researchers are incredibly conservative, and they have established workflows. If they write their documents in Word (which most do), it’s incredibly difficult to move them off to another methodology. If a publisher starts making lots of demands, do extra work, change your workflow, learn new tools, etc., that becomes a competitive disadvantage compared to other journals that let the researchers do things in an easier and (for them) more efficient manner. Many of my journals still get regular complaints from authors that they have to use an online submission system, why can’t they just email their manuscripts into the editor. Do you really think these same researchers are going to track down DOIs for their lists of hundreds of references?

By David Crotty
Nov 7, 2014, 4:12 PM

Reblogged this on DailyHistory.org and commented:
Formatting citations is a time-consuming and soul-crushing process. Todd Carpenter at the Scholarly Kitchen argues that there is a much better way. Citations have only one purpose: they are designed to direct readers to sources used by an author. Today, disciplines have their own specialized citation forms. Personally, I have used at least six different citation methods as an attorney and academic. Carpenter argues that this process should be automated using metadata. This approach makes even more sense as more and more sources are digitized. Carpenter’s proposal goes beyond including just DOIs in references. What do you think of his proposal?

By sandvick
Nov 6, 2014, 6:06 PM

Hi, we have done some work on this for a new platform for PLOS where it is possible to enter a DOI and the citation is populated in the manuscript in the correct *uneditable* format. Changing the journal profile then reformats the citation to the target journal’s requirements. This means no author-introduced error.

By Andrew Hyde
Nov 6, 2014, 8:07 PM

Has anyone actually developed software to do this in a general way, that is, to work with the existing identifier and publication systems? There are a number of non-trivial interface issues. Developing this software and deploying it commercially throughout the industry is not what I would call a culture change, whatever that means. It is a new technology. Note too that authors have to acquire and enter a large number of complex numbers, so there is still room for error as well as the expenditure of time. It is a good idea but far from simple in execution.

By David Wojick
Nov 7, 2014, 3:56 AM

Well, Mendeley automatically extracts DOIs and other identifiers from PDFs & retrieves metadata from Crossref, Pubmed, etc, and we integrate with word processors and other reference management and writing software, so that solves the acquisition and entry issues and pretty much removes any room for error on the author’s part.

Think about how you create a webpage these days. You’re not opening up your text editor and typing HTML tags, nor are you generally typing out the URLs, but rather copying and pasting them from elsewhere. Finally, with systems like Authorea and Overleaf, which submit manuscripts directly into the submission systems of publishers that work with them, I don’t see this being much of a technical problem, but rather something that just needs broader adoption, in other words, a cultural change, one which could be led by publishers, because it ultimately decreases their costs (see Kaveh’s comment and my reply above).

By mrgunn (@mrgunn)
Nov 7, 2014, 4:08 PM

I did not realize that “cultural change” means the adoption of new technology. In that case it is important that cultural change is difficult, time consuming and expensive. It is what is often called technological change. Calling it cultural change makes it sound easy, like changing one’s mind, but technological change is seldom easy.

The creation of the technology is by far the easiest part. In particular, technological change is almost always accompanied by short term productivity losses (incurred for long term benefits) or what I call the confusion cost of progress. How much short term productivity loss can be afforded is often the prime determinant of the slow pace of technological change. So the important question is what kinds of short term productivity losses are required in order to make this cultural change? I doubt they are small.

By David Wojick
Nov 9, 2014, 10:45 AM

This is a really important point. Since the days of the dotcom boom, we’ve been inundated with pitches from startup companies, “here’s our revolutionary new technology, all we have to do is get the culture of science to change and get the practice and workflow of science to change, and then scientists will get some marginal improvement (and my company will get rich).”

So far, we’ve seen only a tiny percentage of such companies gaining any leverage in the market (MySpace for Scientists never did quite catch on). Asking researchers to change their established workflows, to invest the time in learning new technologies and such is a big ask for a population that is increasingly short on time. There has to be a big payoff offered, much more than the idea that this will make life easier for publishers.

There is, however, near universal dissatisfaction with online article submission systems (http://scholarlykitchen.sspnet.org/2013/09/03/what-does-a-scientist-want/). If the payoff offered is easing the submission process, then perhaps you’d see uptake.

By David Crotty
Nov 10, 2014, 9:28 AM

Todd’s thesis is fundamentally correct – with the tools that are currently available, there is no need for publishers to demand that authors use either a particular citation style or even a consistent citation style, and editors/reviewers can better invest their time validating the accuracy and the appropriateness of references. However, the proposed solution, to replace strings of text with strings of identifiers, is somewhat akin to asking someone for directions and being told “Well, I wouldn’t start from here…”

As has been pointed out already, citations come from multiple sources, by direct download from online published sources, by copy-and-paste from other publications, from reference-management tools, or even from memory. Author behavior is the part of the scholarly publishing process that is least amenable to publisher-driven change. It is not yet realistic to expect authors to deliver the kind of identifier-based citations Todd proposes.

There are behavioral changes that should be happening already: journal editors should be told that they should not reject a paper simply because the references are formatted to the wrong style (it still happens!); Instructions for Authors should focus on what information is required in a citation to make an unambiguous link, not the presentational minutiae; and publishers should look at the tools that are available to automatically embed identifiers in human-readable citations and look to invest in new tools as new identifiers emerge.

There are several reasons why it makes more sense for publishers to work with what authors deliver, without overburdening them with style diktats. These include reasons that have already been given, including the very important point that the reference list is used extensively during the peer review process, but there are other arguments in favor of the “human-readable” citation style:

1. Authors are used to delivering citations in some sort of human-readable form. We’ve turned around enough supertankers in scholarly publishing without taking on this one.

2. As well as the points already made about assessing the relevance, significance, and authority of the cited material, the human-readable citation allows the reader to make fundamental judgments such as “is the citation written in a language I can read?”

3. There appear to be no mainstream authoring tools available right now that allow “on the fly” conversion of the type of machine-readable citation envisioned by Todd to a human-readable reference.

4. Human-readable citations are far more error tolerant. Two DOIs that differ by a single character can point to completely unrelated works, and there are generally no contextual clues to figure out what the errant DOI should have been.

5. Human readers find identifiers such as DOI and ORCID opaque, and mistakes are almost impossible to spot without automated tools. Identifiers are highly prone to typographic error – an internal study that was shared with us at Inera by a customer revealed that at least 20% of author-supplied DOIs contained mistakes that prevented linking. In the end, the customer decided to remove all author-supplied DOIs in their workflow and query CrossRef to reacquire correct DOIs, proving it is more efficient and accurate for the publisher to use automated tools to embed identifiers based on the human-readable citation.

6. Not all authors have ORCID iDs yet (in fact, most don’t), not all articles have a DOI, and many never will. Orphan journals are unlikely ever to have DOIs assigned to their articles, and dead authors are unlikely to sign up for ORCID iDs, but works by these dead authors in orphan journals still get cited.

7. Even the internally curated PubMed database contains errors, and databases that rely on depositor-curated metadata such as CrossRef and ORCID are far more error prone.

8. Where and how would authors find the relevant identifiers? There is already considerable pushback from authors about the demands publishers place on them. Is this a good/better use of their time?

9. Many types of cited material do not have DOIs or other unambiguous identifiers, such as personal communications (though in the world of linked citations, these belong in the text or acknowledgments, not the bibliography), patents, some conference proceedings, and, in particular, poster abstracts, reports and white papers, software, and new media.

Of course, some of these problems may lessen or even disappear with time. The widely adopted DOI has taken time to reach a point where reliable tools are available to incorporate it easily and automatically into publishing workflows; ORCID is still some way from that point. The fact that a problem exists implies that the community should be looking for a solution. But solutions already exist that acknowledge and work with author behavior. These solutions use the power of the new identifiers to turn a human-readable citation into a machine-readable link, and also help editors/reviewers validate the accuracy of citations. This is a less radical solution that achieves the same goals.
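
As a minimal sketch of that kind of lookup (not Inera's or any publisher's actual tool), the snippet below sends a human-readable citation string to the public Crossref REST API via its `query.bibliographic` parameter and keeps only the candidate DOIs whose relevance score clears a threshold. The function names and the `min_score` cut-off of 60 are assumptions made for illustration; a real workflow would tune the threshold and compare the returned metadata against the citation before accepting a match.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_lookup_url(citation_text, rows=3):
    """Build a Crossref query URL for a free-text, human-readable citation."""
    params = {"query.bibliographic": citation_text, "rows": rows}
    return CROSSREF_WORKS + "?" + urlencode(params)


def candidate_dois(response, min_score=60.0):
    """Return (doi, title, score) for items scoring at least min_score.

    min_score is an illustrative threshold, not a Crossref recommendation.
    """
    items = response.get("message", {}).get("items", [])
    matches = []
    for item in items:
        score = item.get("score", 0.0)
        if score >= min_score:
            # Crossref returns the title as a list of strings.
            title = (item.get("title") or [""])[0]
            matches.append((item["DOI"], title, score))
    return matches


def lookup(citation_text):
    """Query Crossref over the network and return candidate matches."""
    with urlopen(build_lookup_url(citation_text)) as resp:
        return candidate_dois(json.load(resp))
```

Calling `lookup()` with the full text of a reference returns a short list of candidates for an editor or a script to confirm. The author only ever supplies the readable citation.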

Humans and machines can have the best of both worlds.

(Thanks to my colleagues Robin Dunford and Igor Kleshchevich for sharing their thoughts and comments about this post in a lively office discussion.)

By Bruce Rosenblum
Nov 7, 2014, 9:45 AM

So, one of the advantages of working in an online environment is that you can leverage basic web technologies to integrate with external services. We have been working on a prototype to do exactly this at PLOS and it is Open Source. Essentially it works like this:
1. place a marker in the text
2. open the citation plugin
3. enter free form text
4. the editor does a DOI lookup against the text and presents the author with potential options
5. the author chooses the right citation
6. the citation is referenced and cited correctly

The citations are not editable manually, and it is possible to change the citation format and then both reference and citation are changed on the fly.
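
To illustrate the "change the format on the fly" idea in general terms, here is a rough sketch. It is not the prototype's code, and the two styles are invented, simplified stand-ins rather than real Chicago or APA rules. The point is only that once a reference is stored as structured fields, switching style means picking a different formatting function, with no hand editing.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    """A structured reference; all field names here are illustrative."""
    authors: list
    year: int
    title: str
    journal: str


def author_year(ref, number):
    """Hypothetical author-year style; the list number is not used."""
    names = ", ".join(ref.authors)
    return f"{names} ({ref.year}). {ref.title}. {ref.journal}."


def numbered(ref, number):
    """Hypothetical numbered style."""
    names = ", ".join(ref.authors)
    return f"[{number}] {names}. {ref.title}. {ref.journal} {ref.year}."


STYLES = {"author-year": author_year, "numbered": numbered}


def render(refs, style):
    """Format every reference in the list with the chosen style."""
    formatter = STYLES[style]
    return [formatter(ref, i) for i, ref in enumerate(refs, start=1)]


# Placeholder data, not a real publication.
refs = [Reference(["Doe", "Roe"], 2014, "An Example Title", "Journal of Examples")]
print(render(refs, "author-year")[0])
print(render(refs, "numbered")[0])
```

The same stored record produces both outputs, so a change of house style touches the formatter and not the author's text.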

It is a very powerful solution and web based. It is in alpha stage.

For some reason this forum lists me as Andrew Hyde; it’s Adam Hyde.

adam

By Adam Hyde
Nov 7, 2014, 2:42 PM

William hits the nail on the head, as does Adam. Get things structured from the start, and you won’t need our services, nor that of my very good friend, Bruce!! We have an industry that thinks it is efficient, but actually has gotten very good at reverse engineering. We automate what we can, but use manual labour when we can’t. Every time an author chooses “bold” from the dreaded Word menu, our guys double check what they mean by bold. Is it emphasis, subheading, mathematical vector? We have the technology to go beyond Word, which is really a glorified typewriter.

Authors love their research, but hate writing it up, and hate having to read pages of “author instructions” which are usually a decade old. They also hate setting aside half a day to submit the papers. Peer reviewers hate dealing with attachments. And we hate careful proofreading of something that could have been right in the first place.

The answer has to be some kind of online system. Authors log in and a modern web page guides them to write their account. A reference tool grabs their references including DOIs if present. They click to submit, etc.

Sorry for being so negative on a Friday night, but we really deserve better!!

By Kaveh Bazargan (@kaveh1000)
Nov 7, 2014, 6:57 PM

Lots of good points here, but practicing librarians know the value of having duplicate, human-readable metadata as part of citations. “Hi, I’m trying to find identifier 193843820.345 in publication 29383991.231 but get a 404 error” leaves little room for troubleshooting, whereas “Hi, I’m trying to find Carpenter 2014 in the Journal of Academic something something” is a great start. With so much software to help format citations, the “pain” of formatting citations is a straw man. Lookup tools and other author helpers are great, though. Don’t take away our human-readable metadata just yet, please!

By Jody
Nov 8, 2014, 2:11 AM

I fully agree with Jody on the value of having human-readable citation data.
But I think Jody is too optimistic when saying that the pain of formatting citations is a “straw man”.

I use Zotero and I’m very happy with its functionality. I input the bibliographical details in the relevant fields, taking care to do so accurately and to include ISSNs, ISBNs and DOIs when available. (And just to answer some of the other, technologically advanced commentators on this post, my research is comparative and historical in nature. I use a lot of older and off-the-beaten-track material and the vast majority of items that I cite do not have DOIs. Finding these sources is greatly facilitated by accurate, human-readable citations.)

In theory, when I format my bibliography all I have to do is select the style prescribed by the journal. With a click my whole bibliography can be formatted as required by the relevant journal. EXCEPT, that not one of the library and information science journals in which I publish (I’m a librarian too) uses a plain vanilla version of any of the well known citation styles such as Chicago or APA. These are always tweaked in some or other way. Neither do they specify that they want authors to adhere to the style of another, named journal of which the style is listed in Zotero’s style repository, which as of today lists 7176 named styles. As a result a conscientious author may spend many hours working his/her way through the bibliography, adding, removing or substituting brackets, full stops, commas, colons, authors’ given names, italics, bold, etc. When I suggested that one of the major journals get a publishing student or intern to write a Zotero citation format for its unique style, my suggestion was met with incomprehension. It seems that some publishers just don’t care about how much time their authors have to spend on formatting citations.

I mentioned that Zotero lists 7176 styles. Of these 1094 are unique. Should I switch to another citation manager that supports more citation styles? Maybe 20,000 will do? No. That does not address the problem. The problem is the pointless proliferation of citation styles. Why must every editor, editorial board or publisher feel the need to tweak one of the existing well-known and well-documented styles to produce one unique to their journal(s)? To me, as a librarian, it is particularly galling that we don’t set an example in this regard.

By Peter Lor
Nov 22, 2014, 4:55 AM

I am surprised there is no mention of LaTeX typesetting here yet.

By Arap Sutter
Nov 8, 2014, 9:08 AM

Me too. I’m in the biological sciences, where there’s less frequent use of LaTeX; but I find it far preferable to Word for anything with many figures and references. Word is OK for some things, like working up a collaboratively written draft document, but a hassle to use if you have more than 50 references and tons of figures. Since you can just swap out style files, re-formatting for submission to a different journal simply involves changing one line in the .tex file.

By Allison (@DrStelling)
Nov 8, 2014, 11:47 AM

Arap and Allison: Because >95% of authors use Word the industry has built a lot of tools around it, and when a LaTeX file arrives, it is a spanner in the works. You might be amused to know that the “industry standard” way of dealing with LaTeX files is to convert them to Word, as then everything follows the same system! We have frequent enquiries from other suppliers, asking to convert LaTeX to Word.

By Kaveh Bazargan (@kaveh1000)
Nov 8, 2014, 12:45 PM

In my world (physics) LaTeX is the de facto standard, and for a good reason. References are handled by a single identifier named by the author and are pulled into the manuscript from a BibTeX file which is your own database. A reference only has to be entered into the database once in your lifetime and this can be done by a single click on a journal site. The journal you want to publish in provides the style file you need to format correctly according to that journal’s wishes. It also sorts the references according to their appearance in the text. Most journals provide .tex templates for the manuscript also.
If the journal then ends up resetting the whole thing it must be their problem. You have done what they asked you to do. And should they reject your pearls of wisdom, reformatting for a different journal is a piece of cake.
LaTeX compilers exist as GNU freeware. You can even write books with the stuff. No reason to stick with dinosaurs.

By Klavs Hansen
Nov 8, 2014, 2:20 PM

@Klavs: Fully agree with you that once you are in the world of TeX/LaTeX, writing is a pleasure. Physics is my world too, and so is LaTeX. I was the first TeX user at Imperial College, London in 1983. 😉

But LaTeX is not for everyone, so we need tools to encourage structured documents from all authors. I am happy to say several people are working on this, and I am too…

By Kaveh Bazargan (@kaveh1000)
Nov 8, 2014, 5:12 PM

I was interested to notice that Copernicus run their pricing depending on how much work a manuscript will be to handle; a submission in LaTeX is charged less on a per-page basis than a submission in Word, with an extra supplement for difficult or complicated cases.

http://www.annales-geophysicae.net/general_information/article_processing_charges.html

http://www.climate-of-the-past.net/general_information/article_processing_charges.html

So they’re both a) making it clear that LaTeX is a bit easier/cheaper to work with (for their publishing workflow, anyway = ), and b) gently incentivising authors to submit that way – saving fifty or a hundred euro is a nice encouragement!

By Andrew Gray (@generalising)
Nov 10, 2014, 8:26 AM

I’d bet the success of an approach like this would vary greatly depending on the field and the level of funding in the field. For a biomedical researcher with millions of dollars in grants, all publication charges are going to come out of those grants, and fifty euros is likely not worth the time to alter one’s workflow or do extra work. In fields where authors are unfunded and paying out of pocket, however, it may be more effective.

By David Crotty
Nov 10, 2014, 9:21 AM

Copernicus are (with a couple of exceptions) targeting the earth sciences, which I suppose sits in the middle – not biomedicine levels of cash, but certainly not the penniless historians we all fret about!

I agree it’s unlikely to cause switching on a massive scale – though I wonder what the overall effect would be if journals routinely offered small reductions in APCs for things that make their life easier (or conversely, small penalties for things that would cause more work). I suppose this ties into the original question again…

By Andrew Gray (@generalising)
Nov 10, 2014, 11:26 AM

I’ve seen proposals like this, an à la carte system where you only pay for what you want (copyediting? that’s $100 extra), but not really seen many in practice. One area where I have seen it is in licensing, where some publishers charge an additional fee for CC-BY in order to make up for lost licensing right revenue. Those seem to have gone okay, though there was a slew of protests over the new Science journal doing this (and a favorable comparison given to the new Nature journal which automatically charges authors the more expensive rate rather than giving them the choice). But those came from advocates looking more to drive an agenda than real world researchers making financial decisions.

But for many, there remains a layer of removal between their own accounts and any of these costs. If the university or the funder is paying the bill, then $50 isn’t going to make much of a difference either way.

By David Crotty
Nov 10, 2014, 3:10 PM

Totally agree with you Kaveh. If the world were to use LaTeX for everything it would be great. But that is not going to happen and there is no point pretending it will happen. We need to build tools for where the user is now.

By Adam Hyde
Nov 10, 2014, 9:34 PM

My point with LaTeX was just that it actually gets the job done in less time, which I think was one of the concerns of the original posting. I’ll leave it to others to save the world.

By Klavs Hansen
Nov 11, 2014, 6:57 AM

The Scholarly Kitchen

Why Are Publishers and Editors Wasting Time Formatting Citations?

Todd A Carpenter
