It is a truth universally acknowledged that when given the choice between something that is free and something that costs even a small amount of money, other things being equal, most people recall that they are rational actors in an economic morality play and choose the free item.

I suppose it’s another universally acknowledged truth that when you begin to write anything in which Jane Austen will be cited, it is irresistible to quote the famous opening line of “Pride and Prejudice.” It’s a great line from a great book, one not only studied by literary scholars but taught in classrooms everywhere and even available in airport bookstores, where it sits on the shelf along with the small number of other public-domain classics that have established an enduring place in the popular imagination: “Great Expectations,” “Moby Dick,” “Vanity Fair,” “The Portrait of a Lady,” etc. In our digital age, those great incipits can be cited with abandon (“Happy families are all alike”; “Call me Ishmael”; even “This is the saddest story I have ever heard”), as someone need only copy them into the Google search bar to discover their provenance (“Anna Karenina,” “Moby Dick,” and Ford Madox Ford’s “The Good Soldier”). The pyrotechnology of Google married to the classics: What’s not to like? Google can make even the elliptical poetry of Ezra Pound accessible to many readers.

Other things are not always equal, however. Step inside your local bookstore, assuming your town is fortunate enough to still have one, and pick up a copy of, say, the Penguin Classics edition of “Martin Chuzzlewit” or the Oxford World’s Classics edition of “Tom Jones.” These public-domain classics have been beautifully edited: authoritative texts, useful introductions and suggestions for further reading, and notes on obscure references in the text. These are perfect, attractively priced student editions.

For some writers the amount of editorial work is much greater; consider, for example, what it means to edit one of Shakespeare’s plays for students or to translate the works of Aeschylus, Horace, or Flaubert. All of these works are in the public domain, but the work of translators and editors lifts them above what the public has title to. Once you throw an editor into the mix, other things simply are not equal.

These thoughts were prompted by my introduction to Google Ebooks, the much-awaited program that promises to revolutionize access to books.

When the program was launched in December, I proceeded to install the Google Books app on my iPad to test the service.  With installation, three titles automatically appeared in my personal library:  “Pride and Prejudice,” “Great Expectations,” and “Alice in Wonderland” — all in the public domain.

I decided to read a few pages of “Pride and Prejudice” to test Google’s e-reader, but Austen being Austen, I am now almost finished with all of her major works: “Emma,” “Pride and Prejudice,” “Mansfield Park,” “Sense and Sensibility,” “Persuasion,” and “Northanger Abbey.” I have done this several times before, though in print, and I am not alone, as Austen is one of those writers people keep coming back to. Hence her availability even in airport bookstores.

Incidentally, one of the advantages of the iPad and other e-reading devices is that you can load the complete Austen onto them, step inside your Man Cave, turn to NASCAR on the TV, and curl up with the tale of Elizabeth Bennet or Emma Woodhouse.  Heck, for all anyone knows, you could be reading “Typee,” Hemingway, or even Norman Mailer.

Rather than pay for the Penguin or any other edited version of Austen, I decided to be a cheapskate and searched for free Google versions. And that’s when things began to go wrong. The Google editions were packed with errors. Had I not been studying Google Ebooks for professional reasons, and had I not already been familiar with the works of Austen, would I have gone on? Would I have thought that Austen does not know how to place quotation marks, that she made grammatical mistakes that would embarrass even a high school freshman, or that her dialogue sometimes breaks off without explanation? I began to wonder what service or disservice Google had performed, rendering one of the world’s most popular writers in a form as bizarre as the Zemblan translation of Shakespeare in Nabokov’s “Pale Fire.”

The problems with the Google versions of Austen potentially stem from four sources, though the third is the principal culprit:

  1. The original print edition. Except for those books sent to Google directly by publishers, the books found in Google Ebooks derive from Google’s mass digitization project of library collections. The assumption is that if a university library puts a book on its shelf, that book must be okay. This is a bad assumption, however. Publishers make mistakes, libraries make mistakes; over time the seriousness of these mistakes becomes more apparent. Texts have to be reviewed; sometimes explanatory notes are necessary to provide context. Take a look at the public-domain 1911 edition of Encyclopaedia Britannica (not part of Google Ebooks) and ask yourself if some people will mistake the text for a modern one.
  2. Digital scans of the print edition. Digital scanning has gotten very good; as far as I know, Google does as good a job as anyone at this.  But scanning can nonetheless introduce errors and odd artifacts and in any event does not provide a text in a modern typeface that can fit any size screen.  This may not be a problem for a scholar working with obscure works, but for popular works like “Pride and Prejudice,” this raises a real barrier to readership.
  3. Optical character recognition (OCR). OCR has come a long way. Each Google Ebook of a scanned text is accompanied by an OCR’d version, which allows the user to change fonts and type size and reflow the text. Unfortunately, some of the OCR for the Austen volumes I read was terrible. Words got mashed together, spacing was bizarre, punctuation was not picked up, and so on. The problem for Austen is that it is the OCR version, not the scanned pages, that most readers are likely to use, as they will be reading from mobile devices with tiny screens, which require reflowed text.
  4. Metadata. The metadata for the public-domain works in Google Ebooks is atrocious. Geoffrey Nunberg has written forcefully about this. To his comments I would add that the crime is worse for popular works such as Austen’s. Scholars can struggle with poor metadata, but someone who might pick up a classic but once in his or her life is not likely to make much of an effort. My experience with “Mansfield Park” is not atypical. I began to read the work (in the corrupt OCR version) and then came to what appeared to be the end. But it was marked the “end of Volume One.” There was no Volume Two. I had a similar experience when, researching books about the Beatles, I encountered a title and cover for a Beatles book, but inside was a musical score by Mozart.

I wish to be clear that I am restricting my criticism to the small number of literary classics that continue to have a popular readership.  Google has done a disservice to these works and their readers.  Free is a terrible price, as many readers will flock to these free editions — not knowing that other things are not equal — bypassing the edited volumes prepared by scrupulous publishers.

The horse is out of the barn, alas:  free rules, and damn the consequences. What is needed is a branded collection of reader’s editions for popular public-domain works. Such editions would not have all the apparatus provided by a Penguin or Oxford, but they would have reliable texts and be in technical formats such that schoolkids can read them on their phones. They would have to be free.

I would like to see an authoritative organization (the Modern Language Association, perhaps?) take this on, possibly beginning with a review of the texts in Project Gutenberg. There would be a one-time cost to get this going and then a small cost for maintenance. There is no self-evident way to recoup these costs. The number of titles would be in the range of 200 to 300. Titles that pose challenges (Shakespeare, translations) could be put off for a later time. Perhaps the founders of Google, whose personal wealth will someday be the subject of literary epics, would underwrite this. This is what is meant by “giving back” after you have taken away.

The branding (MLA Editions or something of that kind) is key, as teachers could then recommend these free editions to their students. It’s a shame that the good work of Penguin, Oxford, and other reputable publishers would be diminished by this, but the onslaught of uncurated free culture creates new costs, some hidden, everywhere.

