Raw Numbers Mean Nothing
Posted Tuesday, June 21, 2005, at 05:38PM by e;
This morning Sean linked to a comparison Tristan Louis did of how Google and Yahoo report links to blogs.
Tristan takes a look at the number of links Yahoo, Google, and Technorati report as pointing to a site and uses that to infer how well each engine is doing in covering blogs.
I skimmed it this morning, and then just went back and had a nice back-and-forth with Sean about it. My contention: the comparison is worthless.
Tristan's data shows that Google generally reports only about 3% as many links to the Technorati top 100 blogs as Yahoo does. For my statistically insignificant blogs the ratio varies: blog.ericrichardson.com shows 400 links in Google and 2,760 in Yahoo (14.5%); blogdowntown shows 222 and 28,800 (0.8%).
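For anyone who wants to check the math, the percentages are just Google's count divided by Yahoo's. Here's a quick sketch in Python using the four counts quoted above; the `share` function name is mine, and the results are rounded to one decimal place.

```python
def share(google_links, yahoo_links):
    """Google's reported link count as a percentage of Yahoo's."""
    return 100.0 * google_links / yahoo_links

# (Google count, Yahoo count) as quoted in the post.
counts = {
    "blog.ericrichardson.com": (400, 2760),
    "blogdowntown": (222, 28800),
}

for site, (google, yahoo) in counts.items():
    print(f"{site}: {share(google, yahoo):.1f}%")
```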
The question is whether those 28,800 "links" Yahoo tells me about actually mean anything.
It would seem that they don't, since 24,800 of them come from LA Voice. Now, I appreciate that Mack links me in the sidebar. But why do I care that Yahoo can find close to 25,000 permutations of LA Voice URLs that happen to have my link on them?
The problem with dynamic content is that there are a near-infinite number of ways to access the same information. Back when we were all writing HTML there were a certain number of "pages" on a site. They were files. Today dynamic sites don't have a conception of "pages." If you look at the archives, blogdowntown has 294 posts. But who knows how many different URL combinations might allow access to those same pieces of information?
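One way to see how much a single site inflates the total is to group the linking pages by host and compare the raw count with the number of distinct sites. What follows is only a rough sketch, not how any engine actually counts: the `summarize_inbound_links` name and every URL in the sample are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit

def summarize_inbound_links(linking_urls):
    """Return (raw link count, Counter of links per linking host)."""
    hosts = Counter(urlsplit(url).netloc.lower() for url in linking_urls)
    return len(linking_urls), hosts

# Hypothetical sample: four URL permutations on one site, one page on another.
sample = [
    "http://lavoice.example/archives/2005/06/",
    "http://lavoice.example/archives/2005/06/?page=2",
    "http://lavoice.example/category/downtown/",
    "http://lavoice.example/index.php?p=101",
    "http://otherblog.example/2005/06/21/some-post",
]

raw_count, hosts = summarize_inbound_links(sample)
print("raw links:", raw_count)
print("distinct linking sites:", len(hosts))
print("biggest contributor:", hosts.most_common(1))
```

In this toy sample the raw count is five, but only two sites are actually linking. Grouping by host doesn't solve the harder problem of many URLs serving the same content within a site, but it does show how quickly a sidebar link turns into thousands of "links."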
I can create a site that has 50,000 "pages" but very little content just as easily as I can create a site that has 100 pages of good content (well, probably easier... 100 pages of good content takes time). To have the former site indexed more fully only increases the noise in the index.
There's no assurance that more entries in the index mean that an engine is hitting more information. And in the end, that's what matters: information. I don't care about raw numbers -- ever. Raw numbers are worthless. Raw "link" counts are worthless. They might be interesting to look at, but I would say that they have no connection to the reality of how comprehensively any engine is indexing the web. The dynamic reality of most blog software only exaggerates this disconnect.
Eric Richardson lives in Los Angeles, California, and is generally trying to figure out the future of community news. He is the publisher of blogdowntown, an online news site for Downtown Los Angeles.