
@IT experts: Who or what is a "Snapbot/1.0"?

#
This thing, or whatever it is, keeps hanging around on my homepage. Not just once, but about 20 times.

It's probably not a search engine. A hacking attack?

Speedy
#
Can someone translate this?

Thanks.

#
So that's probably a tool/script that measures and evaluates the server's performance!

Available memory, processor and disk utilization, connection, and so on.
#
I think the link I mentioned last wasn't the right one.
Here is some info and a few tips:

I’ve seen the same bot on two different IP ranges, so I emailed Backbone about this bot and got this reply today:

Hi Mike:
The bots belong to snap.com, which is one of the search engines for websites. The purpose of the bots is web indexing. It is my
understanding that the bots are harmless.

Here is a little more explanation from snap.com regarding how their bots
work.

Since a robot is by nature an aggressive software agent
on the net, we have to be very careful in
running it, and we have implemented
some self-restraint mechanisms in the robot software,
as follows:

a. Our bot strictly abides by the robots.txt exclusion
convention. In every run of crawling, the bot will check
the robots.txt file in the target sites first to filter out
urls that are specified in robots.txt.

b. In order not to cause heavy load on the crawled servers,
we have set an upper bound on fetches per day
for each unique IP.
(Many different site names may have the same IP, so the
bound is on the IP.)
In addition, the requests to a particular site will be
distributed evenly over the crawling time interval to
avoid the possibility of fetching a lot of URLs in a short
time period.

c. Some site administrators do not like crawlers, but
they don’t know about robots.txt and do not put a robots.txt
in the document root directory of their web site.

In our experience there can be complaints, although very few,
from site managers about the crawling, and sometimes some
complain about a DoS attack.

There is a tutorial on placing robots.txt in the document root:
http://www.searchengineworld.com/robots/robots_tutorial.htm

If they have
User-agent: snap.com beta crawler v0
Disallow: /
in the file, our bots will skip their site.

If you have any questions or concerns, please feel free to contact us.

Source: http://www.heliopolis.us/archives/2006/05/bot-conundrum/#comment-5341
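
If you want to check how the robots.txt rule quoted in the email would be read before putting it on your server, here is a minimal sketch using Python's standard urllib.robotparser. The user-agent token is copied from the email; the URL, the second bot name and the helper is_allowed are only placeholders for illustration. This shows how Python's parser interprets the rule, not how Snap's crawler actually behaves, and whether the crawler also answers to a token such as "Snapbot" is not stated in the email.

```python
from urllib.robotparser import RobotFileParser

# Rule quoted in the email; the user-agent token is taken from that reply.
ROBOTS_TXT = """\
User-agent: snap.com beta crawler v0
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; it parses the text locally
    instead of downloading robots.txt from a server.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/index.html"  # placeholder URL
    # The token named in the rule is blocked from the whole site.
    print(is_allowed(ROBOTS_TXT, "snap.com beta crawler v0", url))
    # A bot with an unrelated name is not affected by this rule.
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

The file itself has to sit in the document root of the site, as the email says, so that it is reachable as /robots.txt.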

