I think the link I posted last time wasn't right. Here is some info and a few tips:
I’ve seen the same bot on two different IP ranges, so I emailed Backbone about this bot and got this reply today:
Hi Mike,

The bots belong to snap.com, one of the search engines for websites; the purpose of the bots is web indexing. It is my understanding that the bots are harmless.

Here is a little more explanation from snap.com regarding how their bots work:

Since a robot is by its nature an aggressive piece of agent software on the net, we have to be very careful in running it, and we have implemented some self-restraint mechanisms in the robot software, as follows:

a. Our bot strictly abides by the robots.txt exclusion convention. In every crawling run, the bot first checks the robots.txt file on each target site and filters out the URLs that robots.txt excludes.

b. To avoid putting heavy load on the crawled servers, we have set an upper bound on fetches per day for each unique IP (many different site names may share one IP, which is why the bound is per IP). In addition, the requests to a particular site are distributed evenly across the crawling time interval, to avoid fetching many URLs in a short period.

c. Some site administrators do not like crawlers, but they do not know about robots.txt and do not put one in the document root directory of their web site.

In our experience there are complaints from some site managers about the crawling, although very few, and sometimes someone will even complain about a DoS attack.

There is a tutorial on robots.txt in the document root at http://www.searchengineworld.com/robots/robots_tutorial.htm. If a site has

User-agent: snap.com beta crawler v0
Disallow: /

in that file, our bots will skip it.

If you have any questions or concerns, please feel free to contact us.

Source: http://www.heliopolis.us/archives/2006/05/bot-conundrum/#comment-5341
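For what it's worth, point (a) above is easy to verify or reproduce yourself: Python's standard library ships a robots.txt parser. Here is a minimal sketch of the check a polite crawler performs before each fetch; the user-agent string is the one quoted in the reply, and the example URL is made up.

import urllib.robotparser
from urllib.parse import urlparse

USER_AGENT = "snap.com beta crawler v0"  # the user-agent named in the reply above

def allowed_to_fetch(url):
    # A polite crawler checks the site's robots.txt before requesting any URL.
    parts = urlparse(url)
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(parts.scheme + "://" + parts.netloc + "/robots.txt")
    try:
        rp.read()  # downloads and parses robots.txt (a missing file means "allow all")
    except OSError:
        return True  # robots.txt unreachable; this sketch assumes crawling is allowed
    return rp.can_fetch(USER_AGENT, url)

print(allowed_to_fetch("http://example.com/some/page.html"))  # hypothetical URL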
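Point (b) is plain rate limiting. The numbers below are invented for illustration (the reply does not say what snap.com's actual limits are); the idea is simply to divide a per-IP daily budget evenly over the crawl window instead of fetching in bursts:

import time

MAX_FETCHES_PER_DAY = 500          # hypothetical per-IP daily budget
CRAWL_WINDOW_SECONDS = 24 * 3600   # spread the budget across the whole day

def crawl_politely(urls, fetch):
    # Fetch at most the daily budget, with a fixed delay between requests
    # so they are spaced evenly across the window.
    delay = CRAWL_WINDOW_SECONDS / MAX_FETCHES_PER_DAY
    for url in urls[:MAX_FETCHES_PER_DAY]:
        fetch(url)
        time.sleep(delay)  # even spacing avoids bursts that can look like a DoS attack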
It's probably not a search engine, though. A hack attack?
Speedy
http://www.iis-resources.com/modules/mydownloads/singlefile.php?cid=13&lid=417
Thanks.
Available memory, processor and disk utilization, connections, and so on.
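If you want to read those values yourself in a script instead of through the tool linked above, the third-party psutil package covers them in a few lines. This is only a sketch and has nothing to do with how the linked download works internally:

import psutil  # third-party: pip install psutil

mem = psutil.virtual_memory()
print("available memory: %d MB" % (mem.available // (1024 * 1024)))
print("processor utilization: %.1f %%" % psutil.cpu_percent(interval=1.0))
print("disk utilization: %.1f %%" % psutil.disk_usage("/").percent)
print("open connections: %d" % len(psutil.net_connections()))  # may need admin rights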