This is a short post on a big topic.
Google has opened up its database of digitized books to researchers, who are mining the data for trends and patterns. The New York Times has a stimulating summary of the project, and only a person completely lacking in curiosity could fail to be fascinated by some of the early findings.
How far can this go?
It’s hard to say at this juncture, but my guess is that the emergent properties of large databases will prove to be very rich indeed.
What jumped out at me in the Times’ coverage, however, was this statement:
Google says the culturomics project raises no copyright issue because the books themselves, or even sections of them, cannot be read.
How’s that again? It looks to me as though someone slipped a curve ball by the batter.
A copyright is a right to make copies. But here we are told that a copyright applies only when there is a human reader. Machines can read, too, and creating and repurposing content for them is a significant growth opportunity for the publishing industry. On what authority does Google assert that human readership is an essential aspect of copyright?
To my knowledge, publishers have shown little interest in asserting their rights in data-mining projects. Perhaps that will change if the results of these experiments with the Google book database prove to be significant.