Statistics Often Lie

In statistics, I was taught that a sample size of 30 is usually sufficient for a fairly reliable estimate around a particular question. In most cases where curiosity is the only thing at stake, asking 30 people will give you a pretty fair idea of the lay of the land.

In usability testing, the numbers get even smaller — ask 4-5 people to use something, and you’re seeing everything you’re going to see by person 3, with subjects 4 and 5 merely confirming what you’ve already learned.

With the Internet, the idea of scale has become baked into our daily lives. Reach anyone, anywhere, with information at the speed of light.

That’s why it was a surprise when Hal Varian, the chief economist of Google, stepped forward recently in an interview with CNet, saying scale is “bogus.”

Varian was responding to Steve Ballmer's claim that the Microsoft-Yahoo! deal was made to create sufficient scale for both to compete with Google:

With our new Bing search platform, we’ve created breakthrough innovation and features. This agreement with Yahoo will provide the scale we need to deliver even more rapid advances in relevancy and usefulness.

Granted, in the full quote, Ballmer is throwing the term “scale” around pretty loosely, apparently referring both to the scale of staff and expertise the deal will create, and to the scale of the search index it will yield. However, the premise that scale is important to Ballmer is clear.

Varian is skeptical of the notion that scale in search matters much anymore — skeptical on a fundamental, theoretical basis:

when you look at data, there’s a small statistical point that the accuracy with which you can measure things as they go up is the square root of the sample size. So there’s a kind of natural diminishing returns to scale just because of statistics: you have to have four times as big a sample to get twice as good an estimate. Another point that I think is very important to remember…query traffic is growing at over 40 percent a year. If you have something that is growing at 40 percent a year, that means it doubles in two years. So the amount of traffic that Yahoo, say, has now is about what Google had two years ago. So where’s this scale business? I mean, this is kind of crazy.
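Varian's two bits of arithmetic can be sketched in a few lines of Python. The numbers here are purely illustrative (not actual Google or Yahoo! figures), but they show why quadrupling a sample only halves the measurement error, and why 40% annual growth roughly doubles traffic in two years:

```python
import math

# Varian's statistical point: the standard error of an estimate shrinks
# like 1 / sqrt(n), so you need 4x the sample to get 2x the accuracy.
def standard_error(sigma, n):
    """Standard error of the mean for a sample of size n."""
    return sigma / math.sqrt(n)

sigma = 1.0  # illustrative population standard deviation
print(standard_error(sigma, 1_000_000))  # 0.001
print(standard_error(sigma, 4_000_000))  # 0.0005 -- 4x the data, only 2x the accuracy

# Varian's growth arithmetic: traffic growing 40% a year roughly
# doubles in two years, since 1.4 squared is about 1.96.
print(round(1.4 ** 2, 2))  # 1.96
```

The diminishing returns are built into the square root: each doubling of accuracy costs a quadrupling of data, which is Varian's "natural diminishing returns to scale."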

Varian’s criticism also touches on something Nicholas Carr commented on earlier, a view by Tim O’Reilly that:

Ultimately, on the network, applications win if they get better the more people use them. As I pointed out back in 2005, Google, Amazon, ebay, craigslist, wikipedia, and all the other Web 2.0 superstar applications have this in common.

Carr argues that Google isn’t really benefiting from the “network effect” since:

What Google did was to successfully mine the “intelligence” that lies throughout the public web (not just within its own particular network or user group). The intelligence embedded in a link is equally valuable to Google whether the person who wrote the link is a Google user or not.

My head was spinning by this time. Carr, Varian, Ballmer, O’Reilly — scale, network effect, power laws. Yip!

So I tried to boil this down.

Google does benefit from the network effect because the scale of their index is the entire Web on a macro level. They need that to grow, and the number of links to grow, for their algorithms to get better. But for linking around particular terms and sites on a micro level, scale is less important. A very rare term might have a very small scale, so scale is relative in Google. Therefore, scale matters to Google in some cases and doesn’t matter in others. This is part of its power — it can do a good job at either scale.

Varian’s argument is that the velocity of growth makes any given scale meaningless fairly quickly, so acquiring new scale buys you only a few months and very little real market power — the square-root math means the extra accuracy you’ve purchased shows up deep in the decimal points.

So, what about the new entrants and scale? Twitter? Facebook?

Scale matters to them, too, but in a different way — both are more powerful when more people use them.

This is not true of Google in the same way, since Google depends on information and links rather than directly on people.

Facebook and Twitter rely on both people and links. This gives them two dimensions along which to scale, one of which has a definite plateau — there are only so many people. So, to succeed, both have to get as high a percentage of the population as possible using them, and then make it terribly easy to contribute links.

Once this is done, they will have achieved the same scale potential Google has (go big, go niche), but with a qualitative difference — instead of an algorithm aggregating links, people on Facebook and Twitter will be recommending things to one another, and a search experience based on personal recommendations can emerge. And then, while Varian is watching Yahoo! and Microsoft and debating the scale of information systems, Facebook will eat Google’s lunch.

For STM publishers, the issue of scale in social media is about the number of people — but in this case, the people in an existing community. Social media can be scaled effectively in smaller communities, where it can be statistically valid and very usable. You don’t have to make a Facebook-level play to get the benefits of increased engagement, greater loyalty, user-generated ratings and content, and audience availability.

Ultimately, part of scale is keeping things in perspective.

Kent Anderson

Kent Anderson is the CEO of RedLink and RedLink Network, a past-President of SSP, and the founder of the Scholarly Kitchen. He has worked as Publisher at AAAS/Science, CEO/Publisher of JBJS, Inc., a publishing executive at the Massachusetts Medical Society, Publishing Director of the New England Journal of Medicine, and Director of Medical Journals at the American Academy of Pediatrics. Opinions on social media or blogs are his own.