Bing bot aggressively spidering LCP?

Any chit-chat not directly LonelyCache website related
Corfman Clan
Global Moderator
Posts: 914
Joined: January 17th, 2012, 12:21 am

Bing bot aggressively spidering LCP?

Post by Corfman Clan »

This morning I received an email from the admin of the company hosting LonelyCache and thought it was rather interesting
We run an automated process to attempt to detect and block data harvesting from some of our websites. The process watches for "aggressive" web browsing activity from individual IP addresses. In general, any IP that requests more than 30 pages per minute averaged over a 5 minute period triggers an alert. In addition to monitoring our own website, this system monitors all sites we host.

For some time I've noticed that the Bing bot has been aggressively spidering your Lonely Cache site. Bing bot is often visiting your site from multiple IP addresses simultaneously requesting upwards of 100-150 pages per minute combined. What I thought was odd is that the Bing bot is aggressively spidering your site almost daily. I haven't seen this type of aggressive spidering on any other sites we host. Do you have any idea why Bing bot is so interested in the Lonely Cache Project?
What are your ideas on why Bing bot is so interested in LonelyCache?
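
For the curious, the alert rule described in the email (more than 30 pages per minute from one IP, averaged over 5 minutes) could be sketched roughly as below. This is only an illustration of the rule as quoted; the hosting company's actual process is not shown anywhere in this thread, and the class name, method name, and IP addresses are made up for the example.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60      # 5 minute averaging window (from the email)
MAX_PAGES_PER_MINUTE = 30    # alert threshold (from the email)


class CrawlRateMonitor:
    """Hypothetical per-IP monitor; not the hosting company's real code."""

    def __init__(self, window_seconds=WINDOW_SECONDS,
                 max_per_minute=MAX_PAGES_PER_MINUTE):
        self.window_seconds = window_seconds
        # 30 pages/minute over 5 minutes is 150 requests in the window.
        self.max_requests = max_per_minute * window_seconds / 60
        self.hits = defaultdict(deque)

    def record(self, ip, timestamp):
        """Record one page request (timestamp in seconds).

        Returns True if this IP now exceeds the averaged threshold.
        """
        hits = self.hits[ip]
        hits.append(timestamp)
        # Drop requests that have fallen out of the averaging window.
        while hits and hits[0] <= timestamp - self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


monitor = CrawlRateMonitor()
# One request every 2 seconds is exactly 30 pages/minute: no alert.
steady = any(monitor.record("10.0.0.1", t) for t in range(0, 300, 2))
# One request every second is 60 pages/minute: alert.
aggressive = any(monitor.record("10.0.0.2", t) for t in range(0, 300))
```

Note that a rule like this counts each IP separately, so a crawler spread over several addresses would only trip it if an individual address went over the limit.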
skeeper
Benefactor
Posts: 106
Joined: January 18th, 2012, 8:05 am

Re: Bing bot aggressively spidering LCP?

Post by skeeper »

What is Bing bot?
Corfman Clan
Global Moderator
Posts: 914
Joined: January 17th, 2012, 12:21 am

Re: Bing bot aggressively spidering LCP?

Post by Corfman Clan »

the greenskeeper wrote:What is Bing bot?
Oops, I sometimes forget not everyone knows what these techie terms are. :oops:

The internet search facilities, such as Google, Yahoo, and Bing, have web bots (or spiders) that basically traverse (crawl) the whole World Wide Web, gathering information that allows them to return search results quickly and (hopefully) make those results worthwhile. So "Bing Bot" is Bing's web bot.
chris geertsen
Posts: 11
Joined: October 3rd, 2012, 9:21 pm

Re: Bing bot aggressively spidering LCP?

Post by chris geertsen »

I am not a computer expert so I would not know a lot of these terms, nor do I know how they work :? But why is this a bad thing for the site?
chris geertsen
Posts: 11
Joined: October 3rd, 2012, 9:21 pm

Re: Bing bot aggressively spidering LCP?

Post by chris geertsen »

So these bots basically help out Bing, Google, etc. by helping their browser page have more links to sites. Maybe because this site is so new they're trying to gather as much information as they can, so when someone searches in their browser it will show up. I have been to at least one browser where I searched this name and it did not come up at all. There is my wisdom. Doubt it's that good. :roll:
Corfman Clan
Global Moderator
Posts: 914
Joined: January 17th, 2012, 12:21 am

Re: Bing bot aggressively spidering LCP?

Post by Corfman Clan »

Of course I want the search engines to know about LonelyCache and include it as results in searches. That is a good thing. The question really is why the Bing bot is hitting LonelyCache as much as it is (way more than any other site the company is hosting).

My response to the email was
I don’t know why the Bing bot would be spidering LonelyCache that much. Perhaps it’s because all the pages in LonelyCache tend to have a lot of hyperlinks to other LonelyCache pages. With the dynamic nature of the site and the number of geocaches & geocachers in the LonelyCache territory, there are essentially millions of pages to navigate through.
That may be the reason why, I don't know. For example, for the points leaderboards, a typical page has over 400 hyperlinks and there are over 425,000 of them just for the LonelyCache Wide region.

Anyway, I was hoping for more whimsical reasons why the Bing bot might be so interested in LonelyCache, such as because it's so awesome how could it not be :lol:
Corfman Clan
Global Moderator
Posts: 914
Joined: January 17th, 2012, 12:21 am

Re: Bing bot aggressively spidering LCP?

Post by Corfman Clan »

Corfman Clan wrote:Anyway, I was hoping for more whimsical reasons why the Bing bot might be so interested in LonelyCache, such as because it's so awesome how could it not be :lol:
Or maybe, because LonelyCache is filled with so much Baad Daata it constantly needs re-scanning...
Team Tierra Buena
Posts: 8
Joined: January 18th, 2012, 9:48 pm

Re: Bing bot aggressively spidering LCP?

Post by Team Tierra Buena »

Corfman Clan wrote:
Corfman Clan wrote:Anyway, I was hoping for more whimsical reasons why the Bing bot might be so interested in LonelyCache, such as because it's so awesome how could it not be :lol:
Or maybe, because LonelyCache is filled with so much Baad Daata it constantly needs re-scanning...
Or maybe the bots have a website where they get points for visiting lonely websites!

Happy Thanksgiving, everyone!
Team Tierra Buena
Making geocaching needlessly difficult for ourselves since 2001!
rocketsciguy
Posts: 145
Joined: January 18th, 2012, 9:55 am

Re: Bing bot aggressively spidering LCP?

Post by rocketsciguy »

Corfman Clan wrote:
Corfman Clan wrote:Anyway, I was hoping for more whimsical reasons why the Bing bot might be so interested in LonelyCache, such as because it's so awesome how could it not be :lol:
Or maybe, because LonelyCache is filled with so much Baad Daata it constantly needs re-scanning...
That's funny!

I think your response to the hosting company is probably right... tons of hyperlinks on every dynamically-generated page, and every page is updated every day. Even if the content of a particular page doesn't change, the time stamp at the bottom of the page changes every update cycle, so if the Bing-Bot is doing a text-comparison of the HTML, it will find differences. Those changes probably tell the Bot to dig deeper. Blame Microsoft for having an overly aggressive, poorly designed web-crawler algorithm.
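
To illustrate the time stamp idea: nobody here knows how Bing actually decides a page has changed, but if a crawler did compare the raw HTML, a footer time stamp alone would make every page look new on every update cycle. A rough sketch, assuming a made-up "Last updated: ..." footer format (the site's real markup isn't shown in this thread):

```python
import hashlib
import re

# Hypothetical footer format, assumed for illustration only.
TIMESTAMP_PATTERN = re.compile(r"Last updated: [^<]*")


def fingerprint(html, ignore_timestamp=False):
    """Return a SHA-256 digest of the page, optionally skipping the time stamp."""
    if ignore_timestamp:
        html = TIMESTAMP_PATTERN.sub("", html)
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


monday = "<p>Rank 1: Some Cacher</p><footer>Last updated: Mon 06:00</footer>"
tuesday = "<p>Rank 1: Some Cacher</p><footer>Last updated: Tue 06:00</footer>"

# A naive whole-page comparison sees a change every day...
naive_changed = fingerprint(monday) != fingerprint(tuesday)
# ...while a comparison that skips the time stamp sees none.
real_changed = (fingerprint(monday, ignore_timestamp=True)
                != fingerprint(tuesday, ignore_timestamp=True))
```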

But please keep all those hyperlinks! They make the site very useful!

I think I remember from somewhere that there's a way to prevent or inhibit spiders from crawling your domain. A "policy" stored as a specially formatted 'spider.txt' file in the root directory or something like that. I bet your hosting company would be happier if the spiders only did their thing once every week or month, or not at all.
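
For reference, the file in question is conventionally named robots.txt and lives at the root of the site. Here is a small sketch of what such a policy might look like and how a well-behaved crawler could read it, using Python's standard urllib.robotparser module. The policy text is invented for this example; Crawl-delay is a non-standard directive that only some crawlers honor, and the value of 10 is arbitrary.

```python
from urllib.robotparser import RobotFileParser

# Made-up policy: ask Bing's crawler to slow down, allow everything for everyone.
SAMPLE_ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Disallow:
"""


def load_policy(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


policy = load_policy(SAMPLE_ROBOTS_TXT)

bing_delay = policy.crawl_delay("bingbot")               # delay requested of bingbot
bing_allowed = policy.can_fetch("bingbot", "/any/page")  # crawling is still allowed
other_delay = policy.crawl_delay("googlebot")            # no delay requested of others
```

The file is only a request: a crawler has to choose to read it and follow it.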
Ranger Alpha
Posts: 6
Joined: March 31st, 2012, 10:47 pm

Re: Bing bot aggressively spidering LCP?

Post by Ranger Alpha »

Does LonelyCache have a robots.txt file?
Corfman Clan
Global Moderator
Posts: 914
Joined: January 17th, 2012, 12:21 am

Re: Bing bot aggressively spidering LCP?

Post by Corfman Clan »

Ranger Alpha wrote:Does LonelyCache have a robots.txt file?
No, it doesn't, and at this time I see no compelling reason to add one.
  • A web bot may honor a robots.txt file or completely ignore it, so its utility is limited.
  • We do want the search engines to know about LonelyCache, so we don't want to direct those web bots to stay away.
  • The web hosting company isn't concerned about any adverse effects (performance or otherwise) from the Bing Bot spider. The admin was mostly just curious about what might be going on with it.
With that said, this did highlight a deficiency in our configuration, which we have since changed, and that should make things better in the future. We currently have two domains: lonelycache.com and lonelycacheproject.com. We changed things so http://www.lonelycache.com is our primary web site and http://www.lonelycacheproject.com permanently redirects to http://www.lonelycache.com. Before this was done, they appeared to the search engines as two different web sites; now they will appear as just one. This should help you not get duplicate search results.
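
As a rough illustration of the redirect logic (not the actual server configuration, which isn't shown here), a permanent redirect answers any request on the secondary domain with HTTP status 301 and the matching address on the primary one. The function name is made up, and treating the bare lonelycacheproject.com host the same way as the www one is an assumption.

```python
PRIMARY_HOST = "www.lonelycache.com"
# The www host comes from the post above; the bare host is an assumption.
REDIRECTED_HOSTS = {"www.lonelycacheproject.com", "lonelycacheproject.com"}


def canonical_redirect(host, path):
    """Return (301, location) if the request should be redirected, else None."""
    host = host.lower().split(":")[0]  # ignore case and any port number
    if host in REDIRECTED_HOSTS:
        return 301, f"http://{PRIMARY_HOST}{path}"
    return None


moved = canonical_redirect("www.lonelycacheproject.com", "/stats")
stays = canonical_redirect("www.lonelycache.com", "/stats")
```

Because the status is 301 (permanent) rather than a temporary redirect, search engines are expected to treat the primary address as the one to index.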