SindiceBot
About SindiceBot
SindiceBot is a web crawler that collects web pages and RDF documents for the Sindice search engine.
Excluding SindiceBot from all or part of your site
If you don't want SindiceBot to access your site, you can create a file called robots.txt in the top-level directory of your site.
To exclude SindiceBot completely from your site, add this to the robots.txt file:
User-agent: SindiceBot
Disallow: /
To exclude SindiceBot only from the /private directory, but allow it access to the rest of your site, add this to the robots.txt file:
User-agent: SindiceBot
Disallow: /private/
For more information about the robots.txt file, please refer to the Web Robots Pages.
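If you want to sanity-check your rules before a crawler visits, Python's standard urllib.robotparser module applies the same matching logic. The sketch below parses the /private/ example above inline; the domain example.com is only a placeholder.

# A minimal sketch: parse the example rules with Python's standard
# urllib.robotparser and check which URLs SindiceBot may fetch.
# The domain example.com is only a placeholder.
import urllib.robotparser

rules = [
    "User-agent: SindiceBot",
    "Disallow: /private/",
]

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

# /private/ is blocked for SindiceBot; the rest of the site is allowed.
print(parser.can_fetch("SindiceBot", "http://example.com/private/report.html"))  # False
print(parser.can_fetch("SindiceBot", "http://example.com/index.html"))           # True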
Limiting crawl frequency
SindiceBot waits about two seconds between successive requests to your site. If you want SindiceBot to wait longer between requests, you can use the Crawl-delay directive in robots.txt. To do so, add the following to your robots.txt file:
User-agent: SindiceBot
Crawl-delay: 10
This directive instructs SindiceBot to wait at least 10 seconds between requests.
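The same standard-library parser also exposes Crawl-delay values, so you can verify the setting without waiting for a crawl. Here is a minimal sketch, again parsing the example rules inline rather than fetching them from a live site.

# A minimal sketch: read back the Crawl-delay value for SindiceBot
# using Python's standard urllib.robotparser (crawl_delay() is
# available since Python 3.6).
import urllib.robotparser

rules = [
    "User-agent: SindiceBot",
    "Crawl-delay: 10",
]

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

print(parser.crawl_delay("SindiceBot"))  # 10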
Contact us
If you have further questions, comments or feedback, please send a message to the public mailing list.