An analysis by Incapsula researchers of 400 million search engine visits to 10,000 sites has revealed details that may interest web operators and SEO professionals:
- Google’s web-crawling bot is more active and far more thorough than any of its peers (the MSN/Bing, Baidu, and Yandex bots).
- Sites crawled more often by Googlebot don’t receive a higher share of organic search traffic, meaning that Google doesn’t “play favorites” with websites.
- Google’s average visit rate per website is 187 visits/day, and its average crawl rate is 4 pages/visit. Both rates vary widely from site to site, and content-heavy, frequently updated websites (forums, news sites, big e-shops) appear to be crawled more thoroughly.
Since Google is still the most used search engine globally, most website operators never block Googlebot, as they don’t want to “disappear” from Google.
Unfortunately, this preference allows fake Googlebots to do their dirty deeds in the form of DDoS attacks, content theft, spamming and hacking.
Googlebot impostors take on Googlebot’s identity to gain privileged access to websites and online information. They do so by misusing Googlebot’s user-agent string, the HTTP(S) header that functions as an ID for website visitors but that any client can set to whatever it likes. According to the data gathered by Incapsula researchers, over 4% of bots operating with this user-agent are not actually Googlebots.
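To see why the user-agent makes such a weak ID, here is a minimal Python sketch of a check that trusts it. The function name `claims_googlebot` and the request dictionaries are hypothetical, and the user-agent string is one commonly sent by Googlebot. A botnet node that copies the header is indistinguishable from the real crawler:

```python
# Hypothetical example: a naive check that trusts the User-Agent header.
GOOGLEBOT_TOKEN = "Googlebot"


def claims_googlebot(headers: dict) -> bool:
    """Return True if the request *claims* to be Googlebot.

    Only the self-declared User-Agent header is inspected, and any client
    can set it to an arbitrary value, so this cannot separate a real
    Googlebot from an impostor.
    """
    user_agent = headers.get("User-Agent", "")
    return GOOGLEBOT_TOKEN in user_agent


UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

real_request = {"User-Agent": UA}     # sent by Google's crawler
spoofed_request = {"User-Agent": UA}  # sent by a bot copying the header

print(claims_googlebot(real_request), claims_googlebot(spoofed_request))  # True True
```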
“By observing recent data, collected from over 50 million Fake Googlebot sessions, we saw that 34.3% of all identified impostors were explicitly malicious, with 23.5% of these bots used for Layer 7 DDoS attacks,” they noted.
“These numbers make all sorts of sense because DDoS is just the situation where Googlebot’s ID can come in handy, particularly in the case of security solutions that still rely on rate limiting instead of case-by-case traffic inspection.”
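The weakness the researchers describe can be illustrated with a hypothetical fixed-window rate limiter (the class `NaiveRateLimiter` and its parameters are assumptions, not any vendor's product) that exempts anything whose user-agent mentions Googlebot so the real crawler is never throttled. Ordinary clients are capped, but a flood carrying a spoofed Googlebot header passes untouched:

```python
import time
from typing import Dict, Optional, Tuple


class NaiveRateLimiter:
    """Hypothetical fixed-window limiter that exempts self-declared Googlebots."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client IP -> (window start time, requests counted in that window)
        self.windows: Dict[str, Tuple[float, int]] = {}

    def allow(self, client_ip: str, user_agent: str, now: Optional[float] = None) -> bool:
        if "Googlebot" in user_agent:
            # The flaw: a header any client can forge skips limiting entirely.
            return True
        if now is None:
            now = time.monotonic()
        start, count = self.windows.get(client_ip, (now, 0))
        if now - start >= self.window_seconds:
            # Previous window expired: start a fresh one.
            start, count = now, 0
        if count >= self.max_requests:
            return False
        self.windows[client_ip] = (start, count + 1)
        return True


limiter = NaiveRateLimiter(max_requests=5, window_seconds=1.0)
browser = "Mozilla/5.0 (X11; Linux x86_64)"
fake_bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# 100 requests in the same instant from each client.
ordinary = sum(limiter.allow("203.0.113.7", browser, now=0.0) for _ in range(100))
spoofed = sum(limiter.allow("198.51.100.9", fake_bot, now=0.0) for _ in range(100))
print(ordinary, spoofed)  # 5 100
```

Case-by-case inspection closes this gap by verifying the claimed identity before granting the exemption, rather than trusting the header.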
This attack is particularly effective because it puts web operators in a lose-lose situation: either they block everything identifying as Googlebot (fake ones included) and disappear from the search results, or they let it all through and risk having the site DDoSed and taken down, unless they can afford more bandwidth or effective DDoS protection.
Fake Googlebot visits originate from botnets, and currently most of this traffic comes from the US (25.2%), China (15.6%), Turkey (14.7%), Brazil (13.49%) and India (8.4%). Genuine Googlebot visits, by contrast, originate overwhelmingly from the US (over 98%).
“The good news is that Fake Googlebots can be accurately identified using a combination of security heuristics, including IP and ASN verification – a process which allows you to identify bots based on their point of origin,” they added. But again, these practices are not usually available to owners of small websites.
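One origin check that small sites can run themselves is the reverse-then-forward DNS lookup that Google publicly documents for verifying Googlebot. The Python sketch below shows that technique, not Incapsula's own heuristics, and it omits ASN lookup. The function name `is_verified_googlebot` and the suffix list are assumptions; check the list against Google's current documentation. Note that `socket.gethostbyname_ex` resolves IPv4 only, so IPv6 clients would need `socket.getaddrinfo` instead:

```python
import socket
from typing import Callable, List, Tuple

# Hostname suffixes Google documents for Googlebot (assumption: verify
# against Google's current crawler documentation before relying on it).
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")

Lookup = Callable[[str], Tuple[str, List[str], List[str]]]


def is_verified_googlebot(
    ip: str,
    reverse_lookup: Lookup = socket.gethostbyaddr,
    forward_lookup: Lookup = socket.gethostbyname_ex,
) -> bool:
    """Verify a claimed Googlebot IP with a reverse-then-forward DNS check."""
    try:
        # Step 1: reverse DNS (PTR) lookup of the client IP.
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # covers socket.herror, socket.gaierror, timeouts
        return False
    hostname = hostname.rstrip(".").lower()
    # Step 2: the PTR name must belong to one of Google's crawler domains.
    if not hostname.endswith(GOOGLE_CRAWLER_SUFFIXES):
        return False
    try:
        # Step 3: forward DNS lookup of that hostname.
        _name, _aliases, addresses = forward_lookup(hostname)
    except OSError:
        return False
    # Step 4: the hostname must resolve back to the original IP; otherwise
    # anyone who controls the PTR record of their own IP could pass step 2.
    return ip in addresses
```

In practice you would call `is_verified_googlebot(client_ip)` only for requests whose user-agent claims Googlebot, and cache the result per IP, since DNS lookups are slow. The lookup functions are parameters so the check can be tested without network access.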