Note: This post was written by Carol Anne Meyer, an SSP Board member.
While the Scholarly Kitchen’s chefs were still mopping up from the food fight, 26 attendees of this year’s SSP Annual Meeting jumped in taxis to the Richmond District in San Francisco for an impromptu field trip to the Internet Archive. A few minutes later, the Archive’s founder and this year’s Annual Meeting keynote speaker, Brewster Kahle, met the group at the Internet Archive’s new building on Funston Street.
As we stepped out of our cars in front of the Greek revival building, we thought of how appropriate the neoclassical columnar architecture was for the organization. The beautiful building, formerly a Christian Science church, sparkled white in the California sunshine. Its columns bear an uncanny resemblance to the Internet Archive’s own logo.
Inside, in the vestibule, Kahle drew another parallel, this time to the Library of Alexandria. “We know it will eventually burn,” he said, speaking of the massive digital archive of web pages, audio, video, and printed material, “but we don’t know when. So we are making copies. There are copies in the Netherlands, Egypt, and here in San Francisco. . . . On a flood plain, the Middle East, and on a fault,” he added wryly. “When it burns, it will probably be by the stroke of a pen. So we make plans so that copies will be available. One copy is just not a good idea,” he said with a shudder. “Great libraries are burned in times of political turmoil. If we keep a copy of the library, we can restore it once stability has returned.”
“We pride ourselves on our frugality,” Kahle said. “Everybody says that these days, but we really are frugal.” Still, all employees in the scanning centers receive benefits, he quickly noted. Internet Archive funding comes from the government, from organizations paying 10 cents per page to scan their materials, and from foundations. Kahle likened foundations to the venture capital of the not-for-profit world, noting that they are often happy to fund experimental projects but that ongoing operations need to find other financial sources.
The next stop on the tour was “The Library,” the former sanctuary of the church. Kahle continued his remarks, backlit with almost a halo of golden late-afternoon light streaming in through the magnificent stained glass windows of the 1923 building. The setting put me in the same reverent mood I’ve felt in other worshipful cathedrals of information (like the main reading room of the New York Public Library, the Bodleian Library, or the Library of Congress). I don’t think I could imagine this operation more happily situated (with apologies to Jane Austen for the turn of phrase).
Kahle explained that they plan to make some changes to the space, like leveling the slanted floor, to turn it into some sort of yet-to-be-defined 21st-century “reading room,” combining physical presence and ubiquitous information. In the meantime, they will use the lovely room and all its pews to present lectures and show films, presumably of neat materials uncovered in the archiving process.
The Scanning Center
Our group then crammed into a crowded room in the building next door — the scanning center. One of a number of scanning centers located in five countries, the long narrow room was lined with perhaps 20 stations that looked a little like curtained photo booths.
Continuing the comparison to the Library of Alexandria, the custom-designed and custom-built book scanners are called Scribes. An operator sat at each station, scanning books donated from private collections, lent or “de-accessioned” from academic institutions, or paid for by content owners who want their material scanned. (The cost, according to the numbers Kahle presented at his keynote on the first day of the annual meeting, is 10 cents per page, which includes scanning, performing Optical Character Recognition [OCR], and generating digital versions like Daisy, ePub, and Mobi.)
One of the attendees murmured that the center was literally a sweatshop. The building was not air-conditioned on the muggy June day, and the Scribes and other equipment generate a lot of heat.
The new Scribes are high-resolution scanners, specially designed with rollers to compress books. Two glass plates are set at an angle so that the compression does not break the back of a book the way a flat scanner would. Two cameras are precisely angled to get sharp, distortion-free images of each page. At first, during the Million Book Project, the scanners were not of high enough quality, and the scans did not look good.
“Frankly,” Kahle admitted, “they were no better than the books scanned by Google Books. That wasn’t good enough. We need high quality digital versions of things that will survive forever.”
Along the center of the San Francisco scanning center is a row of microfilm scanners, which operators use to digitally capture documents stored on the older media. Kahle asked the “tourists” not to disclose some of the content types we saw on the tour because they haven’t been publicly announced, but we can say that they ranged from historical US government documents to many types of print and non-print media. More than once he reiterated that the Internet Archive is about access, not just about preservation. “We don’t really see the point of a dark archive,” he noted.
Kahle seems unfazed by the immensity of the Internet Archive’s mission of “universal access to all knowledge.” They have plans, it seemed, to ingest any content type the group asked about. They have begun to scan current books (those in print and in copyright). Once a book is scanned, it is converted into multiple formats: Daisy, ePub, Mobi, PDF, and B&W PDF for POD, among them.
After print materials are scanned, if they have been donated, they are carefully packed into archival boxes and stored very densely in climate-controlled storage facilities where they will be preserved indefinitely, taking non-destructive scanning to the nth degree. Kahle’s reverence for the books (any books, all books) is evident in everything he says and does.
What About Copyright?
The Internet Archive, according to Kahle, makes a point of respecting copyright law. He reported that they can and do immediately make in-copyright books available for the print disabled. (The recipients are vetted by the Library of Congress to ensure they have the rights to them.) They are happy to have and store the other formats until the copyright expires and they can legally be exposed. The Open Library project, which Kahle described in depth during his keynote address, is another approach to making in-copyright works more accessible.
The Internet Archive employs about 40 people. “We don’t really want to grow. We want other people to do what we are doing,” said Kahle.
Kahle noted that many of the operators’ salaries were funded by money from the economic stimulus package, and likened it to Works Progress Administration (WPA) projects in the Great Depression. “In those cases, physical infrastructure was preserved by the government to employ people in a difficult economic period. Now the Internet Archive is preserving information infrastructure at a similar time.”
“The Sunday School Room”
Back in the main building, we proceeded to the basement, where we saw the “Sunday School room,” a vast open space filled with desks, rose chintz sofas left by the Christian Scientists, and a custom-designed and custom-built open-source computer filled with Internet Archive storage. At one end of the enormous room, staffers were crowded into a small conference room watching a demo. While Kahle told us more about the program, they burst into applause at what they had seen. An ancient set of encyclopedias adorned a floor-to-ceiling bookshelf.
“What kind of content do you take?” we asked. A better question is “What do they not take?” Books, yes. Audio media, yes. Government documents, yes. Journals, yes. Television, yes. Videos, yes. The World Wide Web, yes. They have experimented a little with archiving software, but are leaving much of that to the Computer History Museum.
“What about file migration?” we asked. “What happens when the world shifts and computers can no longer do anything with, say, PDFs?”
“Oh, yes,” Kahle answered, almost nonchalantly, “we migrate them. We’ve already reprocessed all our audio files multiple times: from mp2 to divx, that didn’t catch on, then MP4, but not the same MP4 as we have today.” He noted that it takes many machines many months to reprocess all those files.
“Is there loss?” we wondered.
“Well, whenever possible, we go back to the original digital source rather than reprocessing converted files,” was the reply.
“Do you want books?” someone asked. The unequivocal answer: “Yes! We want contemporary books. If you have more than 100, contact us first, please! If you are a publisher, send us your books (just one of each).”
We ended our tour where we began, in the vestibule. Each of us signed the guest book, took a baseball cap as a souvenir, and left, pondering just how 40 people can really save the knowledge of the world.
To Learn More:
You can experience a tour of the building yourself here: http://vimeo.com/7531102. That’s right, before the Internet Archive took over the building, they posted a video. Living the dream — preservation and access.
Scanning center video tour (this one is in Los Angeles) http://www.archive.org/details/srlfscanningcentertour_1
Brewster Kahle’s 2007 TED talk: http://www.ted.com/index.php/talks/brewster_kahle_builds_a_free_digital_library.html
Thanks to SSP President Ray Fastiggi and Past-President Bill Kasdorf for making the arrangements for our tour, to Brewster Kahle for inviting us and showing us the works, and again to Bill for reviewing a draft of this report, adding valuable information, and fixing my embarrassing malapropisms.