SindiceBot
About SindiceBot
SindiceBot is a web crawler that collects web pages and RDF documents for the Sindice search engine.
Excluding SindiceBot from all or part of your site
If you don't want SindiceBot to access your site, you can create a file called robots.txt in the top-level directory of your site.
To exclude SindiceBot completely from your site, add this to the robots.txt file:
User-agent: SindiceBot
Disallow: /
To exclude SindiceBot only from the /private directory, but allow it access to the rest of your site, add this to the robots.txt file:
User-agent: SindiceBot
Disallow: /private/
For more information about the robots.txt file, please refer to the Web Robots Pages.
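If you want to sanity-check your rules before a crawler visits, Python's standard urllib.robotparser module applies the same matching logic. The sketch below parses the /private/ example above inline; the domain example.com is only a placeholder.

# A minimal sketch: parse the example rules with Python's standard
# urllib.robotparser and check which URLs SindiceBot may fetch.
# The domain example.com is only a placeholder.
import urllib.robotparser

rules = [
    "User-agent: SindiceBot",
    "Disallow: /private/",
]

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

# /private/ is blocked for SindiceBot; the rest of the site is allowed.
print(parser.can_fetch("SindiceBot", "http://example.com/private/report.html"))  # False
print(parser.can_fetch("SindiceBot", "http://example.com/index.html"))           # True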
Limiting crawl frequency
SindiceBot waits about two seconds between successive requests to your site. If you want SindiceBot to wait longer between requests, you can use the Crawl-delay directive in robots.txt. To do so, add the following to your robots.txt file:
User-agent: SindiceBot
Crawl-delay: 10
This directive instructs SindiceBot to wait at least 10 seconds between requests.
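The same standard-library parser also exposes Crawl-delay values, so you can verify the setting without waiting for a crawl. Here is a minimal sketch, again parsing the example rules inline rather than fetching them from a live site.

# A minimal sketch: read back the Crawl-delay value for SindiceBot
# using Python's standard urllib.robotparser (crawl_delay() is
# available since Python 3.6).
import urllib.robotparser

rules = [
    "User-agent: SindiceBot",
    "Crawl-delay: 10",
]

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

print(parser.crawl_delay("SindiceBot"))  # 10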
Contact us
If you have further questions, comments or feedback, please send a message to the public mailing list.