According to a recent blog post from Google, the number of distinct URLs it has discovered and indexed has just crossed the 1 trillion threshold, up from the 26 million pages in Google's original index in 1998. Google also says it now sees several billion new URLs added every day.
Something the blog post mentions that I hadn't considered is how the Web now generates URLs automatically. Blog sites, for instance, create a new calendar page every day. There is really no upper limit on the number of URLs Google will have to index in order to maintain a comprehensive accounting of the Web.
But, as TechCrunch has pointed out, discovering all these URLs and actually storing information about them are two different things. After eliminating spam, duplicates, and other noise, Google probably stores only about 40 billion URLs.
Also, is Google actually the most complete index of the Web? Apparently not as of this week, when Cuil (pronounced "cool," though that pronunciation doesn't work for me) unveiled its search engine, which it promotes as the world's biggest. After trying it, I suspect they focused on scale before they focused on usability.