Thursday, July 10, 2008

Googlebot

Googlebot is the search bot used by Google. A search bot is a web robot used for the purpose of searching, and web robots are more commonly called web crawlers (or web spiders). A web crawler is an automated program that browses the web in a methodical, routine manner. Crawlers can also be used for other tasks, such as automated web maintenance. Googlebot crawls the web to discover new and updated pages for inclusion in the Google index.

Googlebot is the program that does the fetching, running on large sets of computers that retrieve millions of pages from the web. The bot fetches pages according to an algorithmic process: computer programs determine which sites to crawl, how frequently to crawl them, and how many pages to fetch from each. The crawl starts with a list of web page URLs, augmented by Sitemap data provided by webmasters. Googlebot visits each of these sites, finds links on the pages it visits, and adds them to its list of pages to crawl. Along the way it finds dead links, detects new sites, and notes changes to existing sites, and these discoveries are used to update the Google index.
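At its core, that loop is a breadth-first crawl: take a URL off the list, fetch the page, collect its links, and queue any links not seen before. Here is a minimal sketch of that idea in Python; the seed list, the crawl function, and the page limit are illustrative, not Google's actual implementation, which adds politeness delays, robots.txt checks, and distributed scheduling.

    # Minimal breadth-first crawler sketch (illustrative, not Googlebot).
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkParser(HTMLParser):
        """Collects the href targets of <a> tags on a page."""
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(seeds, max_pages=50):
        frontier = deque(seeds)   # the list of pages left to crawl
        seen = set(seeds)
        fetched = 0
        while frontier and fetched < max_pages:
            url = frontier.popleft()
            try:
                html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
            except OSError:
                continue          # dead link: skip it
            fetched += 1
            parser = LinkParser()
            parser.feed(html)
            for link in parser.links:
                absolute = urljoin(url, link)
                if absolute not in seen:   # newly discovered page
                    seen.add(absolute)
                    frontier.append(absolute)
        return seen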

There are two versions of Googlebot: deepbot and freshbot. Deepbot crawls in depth, following almost every link on the web and delivering as many pages as it can to the Google indexers; this deep crawl happens about once a month, and it is when the Google Dance is witnessed on the web. Freshbot crawls the web in search of the latest content, visiting frequently changing websites at a rate matched to how often they change. Googlebot also processes the information in important content tags and attributes, such as title tags and ALT attributes. It can handle most content types, but not certain types such as Flash files or dynamic pages.
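As a rough illustration of what processing those content tags involves, the Python sketch below extracts a page's title text and image ALT attributes. The parser class and the sample markup are invented for the example and say nothing about how Googlebot itself is written.

    # Sketch: pull title text and img ALT attributes out of HTML.
    from html.parser import HTMLParser

    class ContentTagParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self.title = ""
            self.alts = []
            self._in_title = False
        def handle_starttag(self, tag, attrs):
            if tag == "title":
                self._in_title = True
            elif tag == "img":
                for name, value in attrs:
                    if name == "alt" and value:
                        self.alts.append(value)
        def handle_endtag(self, tag):
            if tag == "title":
                self._in_title = False
        def handle_data(self, data):
            if self._in_title:
                self.title += data

    parser = ContentTagParser()
    parser.feed('<html><head><title>Example</title></head>'
                '<body><img src="cat.png" alt="a cat"></body></html>')
    print(parser.title)  # Example
    print(parser.alts)   # ['a cat']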

Googlebot can take up an enormous amount of bandwidth, which can cause a website to exceed its bandwidth limit and be temporarily taken down. This is a serious problem webmasters have to tackle. Google provides Webmaster Tools to control the crawl rate, which allows the server to handle the load more efficiently. Webmasters also use robots.txt files to give instructions about their site to web robots: with the right directives in robots.txt, you can block or allow Googlebot's access to parts of your site, as the example below shows.
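For example, a robots.txt file placed at the root of a site can block Googlebot from one directory while leaving the rest of the site open to all crawlers; the directory name here is purely illustrative.

    User-agent: Googlebot
    Disallow: /private/

    User-agent: *
    Allow: /

Python's standard library can evaluate such rules, which is a convenient way to test a robots.txt file before deploying it:

    # Check robots.txt rules with the standard-library parser.
    from urllib.robotparser import RobotFileParser

    rules = """\
    User-agent: Googlebot
    Disallow: /private/
    """
    rp = RobotFileParser()
    rp.parse(rules.splitlines())
    print(rp.can_fetch("Googlebot", "http://www.example.com/private/page.html"))  # False
    print(rp.can_fetch("Googlebot", "http://www.example.com/index.html"))         # True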
