A Trillion URLs

According to a recent post on Google’s blog, the number of distinct URLs the company has found and indexed online just crossed the 1 trillion threshold. That is up from 26 million pages in 1998, when Google started its index. Google also says it is seeing a few billion URLs added every day.

Something the blog post mentions that I hadn’t thought about is how the Web now throws off URLs automatically. For instance, blog sites create a new calendar entry every day. There is really no upper limit on the number of URLs that Google will have to index in order to have a comprehensive accounting of the Web.
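To make the calendar example concrete, here is a minimal sketch in Python of how a site might mint one URL per day without anyone writing a thing. The base address and the year/month/day path pattern are assumptions made for illustration, not a description of any particular blog platform or of how Google's crawler sees these pages.

```python
from datetime import date, timedelta
from itertools import islice


def calendar_urls(base, start):
    """Yield one archive URL per day, starting at `start`, with no end.

    `base` and the /YYYY/MM/DD/ pattern are hypothetical.
    """
    day = start
    while True:
        yield f"{base}/{day:%Y/%m/%d}/"
        day += timedelta(days=1)


# Take only the first three; the generator itself never runs out.
first_three = list(
    islice(calendar_urls("https://blog.example.com", date(2008, 7, 25)), 3)
)
print(first_three)
```

The generator never finishes on its own, which is the point: a crawler that follows every "next day" link on such a site has to decide for itself where to stop.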

But, as TechCrunch has pointed out, discovering all these URLs and actually storing information about them are two different things. Google probably only stores about 40 billion URLs, after eliminating spam, duplicates, and other noise.

Also, is Google actually the most complete index of the Web? Apparently not as of this week, when Cuil (pronounced “cool,” but it doesn’t work for me) unveiled its search engine, which it promotes as the world’s biggest. Having tried it, I suspect they focused on scale before they focused on usability.

Kent Anderson

@kanderson

Kent Anderson is the CEO of RedLink and RedLink Network, a past-President of SSP, and the founder of the Scholarly Kitchen. He has worked as Publisher at AAAS/Science, CEO/Publisher of JBJS, Inc., a publishing executive at the Massachusetts Medical Society, Publishing Director of the New England Journal of Medicine, and Director of Medical Journals at the American Academy of Pediatrics. Opinions on social media or blogs are his own.

The Scholarly Kitchen
