Wednesday, June 4, 2008

Is Google search suffering from its success?

When Google started, its innovation was to judge relevance from the web's own accumulated behavior: PageRank ranked a page higher when many other pages linked to it, treating those links as a historical record of what people had found worth pointing to. Now that everyone has been using Google constantly for several years, those signals (links and usage patterns alike) may reflect the rankings of past Google results as much as they reflect true relevance.

When Google burst onto the scene, several search approaches were already in use, including human-curated directories à la Yahoo.
Several years later, the landscape has changed. Google has a quasi-monopoly on search. What's more, I have noticed people not even bothering to type the URL of a site they know; instead they Google its name and click the link, because that is faster than entering the URL.
So what?
So this means that over the last three or four years, when people were looking for something, chances are they Googled it. They looked it up and clicked on some of the links on the first page of results. A few might go to the second or third page. I doubt more than 10% of queries ever get beyond the third page.
There is no issue as long as the most valuable site for that query does indeed show up on the first page, or the second or third at most.
However, what if it doesn't? What if new material was added to an existing site that now makes it the most relevant site for your query? Or what if it's a new site? Or worse, what if there has always been an excellent site that has simply always been overlooked?
There are plenty of services dedicated to boosting the visibility of new sites. However, a good site that has been consistently overlooked will keep slipping further and further down the rankings.
Imagine, for instance, a site that started out on page three because it was new. Nobody got that far; everyone used the first couple of links the query brought up. Meanwhile, new websites kept being added, so the site kept slipping further and further down, until several years later it might sit on page 15. Nobody will ever get there, and nobody will know what they are missing.
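To make that mechanism concrete, here is a toy simulation in Python of the rich-get-richer loop I have in mind. Every number in it is invented for illustration: it simply assumes that the chance of a result being seen falls off sharply with its position, and that clicks feed back into the ranking score. Under those assumptions, a genuinely excellent site that starts out buried tends to stay buried.

import random

random.seed(1)

N_SITES, ROUNDS = 50, 2000

# Each site has an intrinsic "true quality" between 0 and 1.
quality = [random.random() for _ in range(N_SITES)]
best = max(range(N_SITES), key=lambda i: quality[i])

# Initial ranking scores roughly track quality, but the genuinely best site
# launches late and starts buried in the bottom third ("page three").
score = [q + random.gauss(0, 0.2) for q in quality]
score[best] = sorted(score)[N_SITES // 3]

for _ in range(ROUNDS):
    ranking = sorted(range(N_SITES), key=lambda i: score[i], reverse=True)
    for pos, site in enumerate(ranking):
        seen = random.random() < 0.5 ** (pos / 3)     # position bias: visibility halves every few slots
        if seen and random.random() < quality[site]:  # once seen, clicks roughly track true quality
            score[site] += 0.01                       # and clicks reinforce the ranking

final_rank = sorted(range(N_SITES), key=lambda i: score[i], reverse=True)
print("True quality rank of the best site: 1")
print("Final search rank of the best site:", final_rank.index(best) + 1)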
Commercial sites would advertise to compensate for the lack of visibility. (This might suggest that ads are more relevant today than they were initially. An interesting thought to study.) However, non-commercial sites will simply remain hidden.
How would one measure this loss of good, relevant sites? What I've noticed (again, just observing a sample of one) is that if the first page doesn't give me good material, I won't dig down to the other pages. Instead, I will re-phrase my query and try again, maybe several times. So if Google wanted to measure the effectiveness of its search results, it couldn't just look at click-through from the results page. It would have to study the number of re-phrasings of the original search terms. And even that is not necessarily an indication of poor search results: it also reflects the searcher's growing knowledge of the subject matter and the growing clarity of his or her query.
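Counting re-phrasings could be done quite simply from a query log. Here is a minimal sketch in Python, assuming a hypothetical log of (user, timestamp, query) records; the ten-minute session window and the term-overlap threshold are arbitrary illustrative choices, and a real system would need something far subtler to separate frustration from the "growing clarity" case just mentioned.

from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=10)   # assumed session window
MIN_OVERLAP = 0.3                     # assumed term-overlap threshold

def token_overlap(q1, q2):
    """Jaccard similarity over lowercased query terms."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0

def count_rephrasings(log):
    """Count, per user, how often a query was re-phrased within one session.

    `log` is an iterable of (user_id, timestamp, query) tuples.
    """
    counts = {}
    last_seen = {}  # user_id -> (timestamp, query)
    for user, ts, query in sorted(log, key=lambda r: (r[0], r[1])):
        prev = last_seen.get(user)
        if prev:
            prev_ts, prev_query = prev
            same_session = ts - prev_ts <= SESSION_GAP
            related = token_overlap(prev_query, query) >= MIN_OVERLAP
            changed = query.lower() != prev_query.lower()
            if same_session and related and changed:
                counts[user] = counts.get(user, 0) + 1
        last_seen[user] = (ts, query)
    return counts

log = [
    ("u1", datetime(2008, 6, 4, 10, 0), "cheap flights paris"),
    ("u1", datetime(2008, 6, 4, 10, 1), "cheap flights paris june"),
    ("u1", datetime(2008, 6, 4, 10, 3), "budget airline paris june"),
]
print(count_rephrasings(log))  # {'u1': 2} -- two successive re-phrasings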
One could measure how long a user spends actively reading a page reached from the results. A long visit would presumably indicate high relevance. Except that some searches are simple, pointed queries where a one-second answer is enough. I don't know whether we already have models that can distinguish a quick lookup from a deep search.
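For what it's worth, here is a crude sketch of that idea, again in Python. The dwell-time thresholds, and the guess at which queries are quick "navigational" lookups, are entirely invented; they illustrate the difficulty rather than solve it.

NAVIGATIONAL_HINTS = {"login", "homepage", "official site", "www"}

def looks_like_quick_lookup(query):
    """Very rough guess that the user only wants a short, pointed answer."""
    q = query.lower()
    return any(hint in q for hint in NAVIGATIONAL_HINTS) or len(q.split()) <= 2

def judge_click(query, dwell_seconds):
    """Label a click as 'satisfied', 'quick lookup', or 'bounced'."""
    if dwell_seconds >= 30:
        return "satisfied"      # spent real time reading the page
    if looks_like_quick_lookup(query) or dwell_seconds >= 5:
        return "quick lookup"   # a short visit may still have answered the query
    return "bounced"            # came straight back: probably poor relevance

for q, t in [("gmail login", 2),
             ("history of pagerank", 120),
             ("best laptop for travel", 3)]:
    print(q, "->", judge_click(q, t))
# gmail login -> quick lookup
# history of pagerank -> satisfied
# best laptop for travel -> bounced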

In any case, the one with the data on search patterns is Google. I'm sure they have developed models for evaluating search success and user frustration. I wonder how accurate they are. Would they ever come out and tell us how search outcomes have been evolving over time? What I'm always afraid of, in a Google search or elsewhere, is that I don't know what I don't know. We need an additional service that evaluates and categorizes all the search results that never make it to page one. Remember Northern Light?