Sunday, August 21, 2011

Google Caffeine: Towards a Fresher Google Index

Back in December 2009, we wrote a post on the issues that Google faced with real-time search. At that time, the integration of real-time search, and especially of Twitter updates, into Google's search engine result pages showed some weaknesses.

Since then, things have changed: Google has reviewed its way of indexing documents and built a completely new search index called Google Caffeine. On July 8th, Google officially announced and explained Caffeine.

What is new with Caffeine

The original Google search index was built some time ago, when the Internet was much smaller and very different. In the last couple of years, and even months, the amount of content on the Internet has increased dramatically, and the need for instantaneous and fresh information has put a lot of pressure on the Google search index, resulting in long delays before the latest information is displayed.

The 'old' Google search index was built around layers. The layers were updated at different rates (some, such as the one for Google News, more frequently than others), but fully updating the main layer, or main index, required crawling a very large number of web pages. As a result, it could take up to 30 days for newly created pages to be integrated into the index.

Google Caffeine Search Index Schema

Although Google does not give much information on what is radically different, Caffeine appears to have higher crawling capabilities. The crawling process has also been reviewed: rather than crawling the entire web at once, Caffeine crawls smaller portions and updates the index on a continuous basis, which should improve the freshness of information.
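
To make the difference more concrete, here is a toy sketch in Python. It is only an illustration of the general idea (a full rebuild from a crawl snapshot versus small, continuous updates) and says nothing about how Google actually implements Caffeine. The class `IncrementalIndex`, the function `rebuild_batch_index` and the example URLs are all made up for this post.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index that can be updated one page at a time (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of URLs containing it
        self.terms_by_url = {}            # url -> terms currently indexed for it

    def update(self, url, text):
        """Index (or re-index) a single page; it becomes searchable right away."""
        new_terms = set(text.lower().split())
        old_terms = self.terms_by_url.get(url, set())
        # Drop terms that disappeared from the page, add the ones that appeared.
        for term in old_terms - new_terms:
            self.postings[term].discard(url)
        for term in new_terms - old_terms:
            self.postings[term].add(url)
        self.terms_by_url[url] = new_terms

    def search(self, term):
        """Return the sorted list of URLs whose text contains the term."""
        return sorted(self.postings.get(term.lower(), set()))


def rebuild_batch_index(pages):
    """Layer-style refresh: build a whole index from one full crawl snapshot."""
    index = IncrementalIndex()
    for url, text in pages.items():
        index.update(url, text)
    return index


# Both indexes start from the same (made-up) crawl snapshot.
crawl_snapshot = {"example.com/a": "old news about search"}
batch = rebuild_batch_index(crawl_snapshot)
live = rebuild_batch_index(crawl_snapshot)

# A new page is published. The live index takes it in immediately,
# while the batch index only sees it after the next full rebuild.
live.update("example.com/b", "fresh caffeine post")

print(batch.search("caffeine"))  # []
print(live.search("caffeine"))   # ['example.com/b']
```

In this sketch, the batch index stays stale until someone crawls everything again and calls `rebuild_batch_index`, which is roughly the situation described for the old main layer. The live index pays a small cost per page instead, so a new post can show up in results shortly after it is published.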

At this stage, it is still too early to see the difference and measure the real effect of the changes.

Update:
Since the Google Caffeine release, I have published a few posts and noticed that most of them were indexed within 30 minutes of being published. I am not sure whether this is because we use Blogger, but it seems to be working quite well so far!
