Query Services
The Sindice API provides programmatic access to its search capabilities. Please refer here for support questions.
Query services (v2)
There are two types of search in the new API: term search and advanced search.
In general these APIs are based on the OpenSearch 1.1 specification.
- the q parameter specifies the query
- the page parameter (mandatory) specifies the result page. Pages are 1-indexed, so the first page is 1, the second is 2 and so on.
- the qt parameter must be either "term" or "advanced" to select between Term Search and Advanced Search.
- the sortbydate parameter is a boolean flag that specifies whether the results have to be sorted by date (they are sorted by relevance, otherwise).
Example:
http://api.sindice.com/v2/search?q=Rome&qt=term&page=1
http://api.sindice.com/v2/search?q=Rome&qt=term&page=1&sortbydate=1
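The parameters above can also be assembled programmatically. The following is a minimal Python sketch rather than an official client: the helper name build_search_url and its argument names are assumptions made for illustration, and only the parameters documented above are used.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URLs above.
BASE_URL = "http://api.sindice.com/v2/search"


def build_search_url(query, query_type="term", page=1, sort_by_date=False):
    """Build a v2 search URL (hypothetical helper, not part of any Sindice library)."""
    if query_type not in ("term", "advanced"):
        raise ValueError('qt must be "term" or "advanced"')
    if page < 1:
        raise ValueError("pages are 1-indexed")
    params = [("q", query), ("qt", query_type), ("page", page)]
    if sort_by_date:
        # Boolean flag: results are sorted by relevance when it is absent.
        params.append(("sortbydate", 1))
    # urlencode also %-encodes spaces and reserved characters in the query.
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("Rome"))
print(build_search_url("Rome", sort_by_date=True))
```

Building the query string with urlencode keeps the URL valid when the query contains spaces or URIs.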
Term Search
Term Search allows you to retrieve documents that are related to keywords and/or URIs.
To activate Term Search, use qt=term in the query parameters. Example:
http://api.sindice.com/v2/search?q=Rome&qt=term
Currently, Term Search enjoys better ranking and is in general more suitable when searching for user-provided strings.
Term search automatically parses URIs and uses them to look at URIs inside the RDF. Example:
For the complete documentation of the Term Search query language see http://sindice.com/developers/querylanguages.
Advanced Search
Advanced Search allows the use of triple level expressions in the query. Example
will locate RDF that contain resources which have "foaf:name" "Renaud Delbru".
For the complete documentation of the Advanced Search query language see http://sindice.com/developers/api#QueryLanguages.
Combined Search
Just like the advanced search, but with an additional parameter that specifies a term query. This additional query will be combined with the advanced query using an AND operator. For example:
will locate those resources which have "foaf:name" "Renaud Delbru" and contain the word "michele".
Result formats
You can use HTTP content negotiation to retrieve results in three different formats:
- json: curl -H "Accept: application/json" "http://api.sindice.com/v2/search?q=gabriele&qt=term&page=1"
- rdf: curl -H "Accept: application/rdf+xml" "http://api.sindice.com/v2/search?q=gabriele&qt=term&page=1"
- atom: curl -H "Accept: application/atom+xml" "http://api.sindice.com/v2/search?q=gabriele&qt=term&page=1"
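The same content negotiation can be done from a script. This is a small Python sketch using only the standard library; the helper name make_request and the ACCEPT_TYPES table are assumptions for illustration, while the MIME types are the ones used in the curl commands above. The request is prepared but not sent.

```python
from urllib.request import Request

# MIME types copied from the curl examples above.
ACCEPT_TYPES = {
    "json": "application/json",
    "rdf": "application/rdf+xml",
    "atom": "application/atom+xml",
}


def make_request(url, result_format="json"):
    """Prepare (but do not send) a request with the matching Accept header."""
    return Request(url, headers={"Accept": ACCEPT_TYPES[result_format]})


req = make_request(
    "http://api.sindice.com/v2/search?q=gabriele&qt=term&page=1", "atom"
)
print(req.get_header("Accept"))
# To actually send it: urllib.request.urlopen(req).read()
```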
The basic format has three "groups" of fields:
- generation time of this search
- base url, without the specific page
- number of total results
- url of this result page
- url of previous, next, first and last page of results
- link to the HTML alternate representation for this page, in the normal sindice website
- author field, Sindice.com
- number of items per page
- starting index in this page
- a Query object with fields that allow replaying of this query (search Term, page, role)
Then there is a list of entries, each of which has:
- title, a list of the document labels in JSON and RDF, and a single field with comma separated strings for Atom (we can't change the spec)
- formats, a list, for example RDFa and Microformat
- content, a simple string such as: "13 triples in 1000 bytes"
- link, the document URI
- updated, the document modification date
Specifically, a JSON-encoded object looks like this:
{
"updated": "2008/06/03 18:27:29 +0100",
"base": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term",
"totalResults": 211,
"search": "http://www.sindice.com/opensearch.xml",
"self": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term\u0026page=1",
"previous": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term\u0026page=",
"title": "Sindice search: gabriele",
"last": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term\u0026page=22",
"alternate": "http://sindice.com/v2/search?q=gabriele\u0026qt=term",
"author": "Sindice.com",
"first": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term\u0026page=1",
"itemsPerPage": 10,
"startIndex": 1,
"next": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term\u0026page=2",
"query":
{
"role": "request",
"startPage": 1,
"searchTerms": "gabriele"
},
"link": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term\u0026page=1",
"entries":
[
{
"title": ["Gabriele Albertini"],
"formats": ["RDF"],
"content": "183 triples in 32484 bytes",
"link": "http://dbpedia.org/resource/Gabriele_Albertini",
"updated": "2008/05/23"
},
{
"title": ["Gabriele Paonessa"],
"formats": ["RDF"],
"content": "111 triples in 16153 bytes",
"link": "http://dbpedia.org/resource/Gabriele_Paonessa",
"updated": "2008/05/23"
},
...
]
}
The format closely matches the OpenSearch format, so refer to that specification for further details. The only two differences are the title field of each entry, which is a list (a document can have several labels), and the formats field, which lists the formats found in a page (for example, RDFa and microformats).
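To show how such a response might be consumed, here is a short Python sketch that parses an abbreviated sample shaped like the JSON object above. The function name summarize is an assumption for illustration; the field names (totalResults, entries, title, link, formats) are the ones shown in the example.

```python
import json

# Abbreviated sample shaped like the JSON object shown above.
sample = """
{
  "totalResults": 211,
  "itemsPerPage": 10,
  "entries": [
    {"title": ["Gabriele Albertini"],
     "formats": ["RDF"],
     "content": "183 triples in 32484 bytes",
     "link": "http://dbpedia.org/resource/Gabriele_Albertini",
     "updated": "2008/05/23"}
  ]
}
"""


def summarize(response_text):
    """Return the total result count and a (label, link, formats) tuple per entry."""
    data = json.loads(response_text)
    rows = []
    for entry in data.get("entries", []):
        # "title" is a list because a document can have several labels.
        labels = entry.get("title") or [""]
        rows.append((labels[0], entry["link"], entry.get("formats", [])))
    return data["totalResults"], rows


total, rows = summarize(sample)
print(total, rows)
```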
Example ATOM format:
<?xml version="1.0" encoding="iso-8859-1"?>
<feed xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/"
      xmlns:sindice="http://sindice.com/vocab/fields#"
      xmlns="http://www.w3.org/2005/Atom">
  <title>Sindice search: gabriele</title>
  <link href="http://api.sindice.com/v2/search?page=1&amp;q=gabriele&amp;qt=term"/>
  <updated>2008-06-03T19:50:39+01:00</updated>
  <author>
    <name>Sindice.com</name>
  </author>
  <id>http://api.sindice.com/v2/search?page=1&amp;q=gabriele&amp;qt=term</id>
  <opensearch:totalResults>211</opensearch:totalResults>
  <opensearch:startIndex>1</opensearch:startIndex>
  <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
  <opensearch:Query role="request" startPage="1" searchTerms="gabriele"/>
  <link href="http://sindice.com/search?page=1&amp;q=gabriele&amp;qt=term" rel="alternate" type="text/html"/>
  <link href="http://api.sindice.com/v2/search?page=1&amp;q=gabriele&amp;qt=term" rel="first" type="application/atom+xml"/>
  <link href="http://api.sindice.com/v2/search?q=gabriele&amp;qt=term" rel="previous" type="application/atom+xml"/>
  <link href="http://api.sindice.com/v2/search?page=2&amp;q=gabriele&amp;qt=term" rel="next" type="application/atom+xml"/>
  <link href="http://api.sindice.com/v2/search?page=22&amp;q=gabriele&amp;qt=term" rel="last" type="application/atom+xml"/>
  <link href="http://api.sindice.com/v2/search?page=1&amp;q=gabriele&amp;qt=term" rel="self" type="application/atom+xml"/>
  <link href="http://www.sindice.com/opensearch-term.xml" rel="search" type="application/opensearchdescription+xml"/>
  <entry>
    <title>Gabriele Albertini</title>
    <link href="http://dbpedia.org/resource/Gabriele_Albertini"/>
    <id>http://dbpedia.org/resource/Gabriele_Albertini</id>
    <updated>2008-05-23T00:00:00+01:00</updated>
    <sindice:format>RDF</sindice:format>
    <content>183 triples in 32484 bytes</content>
  </entry>
  <entry>
    <title>Gabriele Paonessa</title>
    <link href="http://dbpedia.org/resource/Gabriele_Paonessa"/>
    <id>http://dbpedia.org/resource/Gabriele_Paonessa</id>
    <updated>2008-05-23T00:00:00+01:00</updated>
    <sindice:format>RDF</sindice:format>
    <content>111 triples in 16153 bytes</content>
  </entry>
</feed>
It is a simple ATOM file, plus the OpenSearch schema, plus a single additional tag that carries information about the document format. You should be able to parse this easily with any XML parser.
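As one way of parsing the feed, the sketch below uses Python's standard xml.etree.ElementTree on a shortened sample. The function name parse_feed and the prefix names in the NS table are assumptions for illustration; the namespace URIs are the ones declared in the feed above.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used for lookups; the URIs come from the feed above.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "opensearch": "http://a9.com/-/spec/opensearch/1.1/",
    "sindice": "http://sindice.com/vocab/fields#",
}

# Shortened sample with the same structure as the ATOM example.
sample = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/"
      xmlns:sindice="http://sindice.com/vocab/fields#">
  <opensearch:totalResults>211</opensearch:totalResults>
  <entry>
    <title>Gabriele Albertini</title>
    <link href="http://dbpedia.org/resource/Gabriele_Albertini"/>
    <sindice:format>RDF</sindice:format>
  </entry>
</feed>"""


def parse_feed(xml_text):
    """Return the total result count and one dict per <entry> element."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("opensearch:totalResults", namespaces=NS))
    entries = []
    for entry in root.findall("atom:entry", NS):
        entries.append({
            "title": entry.findtext("atom:title", namespaces=NS),
            "link": entry.find("atom:link", NS).get("href"),
            # The extra Sindice tag may occur once per detected format.
            "formats": [f.text for f in entry.findall("sindice:format", NS)],
        })
    return total, entries


total, entries = parse_feed(sample)
print(total, entries)
```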
The RDF representation defines the base search URI as a search:Results object, which has many search:ResultPage objects, each one having many search:Entry objects. The other fields should be self-explanatory and mirror those of the other formats.
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:fields="http://sindice.com/vocab/fields#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns="http://sindice.com/vocab/search#">
  <Results rdf:about="http://api.sindice.com/v2/search?q=gabriele&amp;qt=term">
    <dc:title>Sindice search: gabriele</dc:title>
    <dc:date>2008-06-03T19:54:11+01:00</dc:date>
    <dc:creator>Sindice.com</dc:creator>
    <totalResults>211</totalResults>
    <itemsPerPage>10</itemsPerPage>
    <terms>gabriele</terms>
    <firstPage rdf:resource="http://api.sindice.com/v2/search?page=1&amp;q=gabriele&amp;qt=term"/>
    <lastPage rdf:resource="http://api.sindice.com/v2/search?page=22&amp;q=gabriele&amp;qt=term"/>
    <page rdf:resource="http://api.sindice.com/v2/search?page=1&amp;q=gabriele&amp;qt=term"/>
    <opensearchDescription rdf:resource="http://www.sindice.com/opensearch.xml"/>
  </Results>
  <ResultPage rdf:about="http://api.sindice.com/v2/search?page=1&amp;q=gabriele&amp;qt=term">
    <startIndex>1</startIndex>
    <previousPage rdf:resource="http://api.sindice.com/v2/search?q=gabriele&amp;qt=term"/>
    <nextPage rdf:resource="http://api.sindice.com/v2/search?page=2&amp;q=gabriele&amp;qt=term"/>
    <htmlPage rdf:resource="http://sindice.com/search?page=1&amp;q=gabriele&amp;qt=term"/>
    <entry rdf:resource="#result1"/>
    <entry rdf:resource="#result2"/>
    ...
  </ResultPage>
  <Entry rdf:about="#result1">
    <dc:title>Gabriele Albertini</dc:title>
    <link rdf:resource="http://dbpedia.org/resource/Gabriele_Albertini"/>
    <dc:created>2008-05-23T00:00:00+01:00</dc:created>
    <fields:format>RDF</fields:format>
    <content>183 triples in 32484 bytes</content>
    <rank>1</rank>
  </Entry>
  <Entry rdf:about="#result2">
    <dc:title>Gabriele Paonessa</dc:title>
    <link rdf:resource="http://dbpedia.org/resource/Gabriele_Paonessa"/>
    <dc:created>2008-05-23T00:00:00+01:00</dc:created>
    <fields:format>RDF</fields:format>
    <content>111 triples in 16153 bytes</content>
    <rank>2</rank>
  </Entry>
  ...
</rdf:RDF>
Integrating JSON in your script
If you want, you can add an additional argument to the request called callback, which causes the JSON output to be wrapped in a call to a function with the name you choose.
This allows clean integration of the Sindice results in your webpage, for example:
<script type="text/javascript" src="http://api.sindice.com/v2/search?q=mike&qt=term&format=json&callback=showSindiceResults"></script>
Notice that to force JSON output we added an additional parameter, format. It also accepts the values atom and rdfxml.
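A non-browser client that receives such a wrapped response has to strip the wrapper before parsing. The Python sketch below shows one way to do that; it assumes the response has the shape callbackName({...}) with an optional trailing semicolon, which is the usual JSONP convention but is not spelled out on this page. The function name unwrap_jsonp is likewise an assumption.

```python
import json
import re


def unwrap_jsonp(payload, callback):
    """Strip a `callback(...)` wrapper and parse the JSON inside (assumed shape)."""
    pattern = r"^\s*" + re.escape(callback) + r"\s*\((.*)\)\s*;?\s*$"
    match = re.match(pattern, payload, re.DOTALL)
    if match is None:
        raise ValueError("payload is not wrapped in " + callback + "(...)")
    return json.loads(match.group(1))


# Hand-written sample; a real response would carry the full result object.
wrapped = 'showSindiceResults({"totalResults": 211, "entries": []});'
print(unwrap_jsonp(wrapped, "showSindiceResults")["totalResults"])
```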
Other API versions
Currently, our API version is 2, with base address http://api.sindice.com/v2/.
As new API versions are released, older ones will remain available at their existing locations.
API v1
The previous version of the Sindice API is still available. It offers three query types, which mimic the old Sindice search queries:
- Lookup URIs. Syntax: http://api.sindice.com/v1/lookup?uri=... [Superseded by the V2 Term query]
- Lookup keywords. Syntax: http://api.sindice.com/v1/lookup?keyword= [Superseded by the V2 Term query]
- Lookup IFPs. Syntax: http://api.sindice.com/v1/lookup?property=...&uri=foo [Superseded by the V2 Advanced query]
V1 Result Formats
The result format can be selected in two ways: by HTTP content negotiation or by an optional format query parameter. The default format is HTML.
Content negotiation examples:
- To get results in RDF:
curl -H "Accept: application/rdf+xml" http://api.sindice.com/v1/lookup?keyword=berlin
- To get results in JSON:
curl -H "Accept: application/json" http://api.sindice.com/v1/lookup?keyword=berlin
- To get results in Plain text:
curl -H "Accept: text/plain" http://api.sindice.com/v1/lookup?keyword=berlin
- To get results in XOXO:
curl -H "Accept: text/html" http://api.sindice.com/v1/lookup?keyword=berlin
V1 Query Parameters
- keyword, searches for documents which contain the given keyword, the parameter value specifies the keyword to look for, akin to the term search in the v2 search. Example: http://sindice.com/query/lookup?keyword=sindice
- uri, searches for documents which mention the given URI. Parameter value is the URI for the index search, %-encoded. Example: http://sindice.com/query/lookup?uri=http%3A%2F%2Fwww.w3.org%2FPeople%2FBerners-Lee%2Fcard%23i
- property and object, searches for documents that contain entities which have this property with value this object. It used to work only for Inverse Functional Properties, but now works for any property. Example: http://sindice.com/query/lookup?property=http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fmbox&object=mailto%3Atimbl%40w3.org.
- format, specifies the format in which results will be encoded. Possible values are: rdfxml, txt, json, html. Query example: http://sindice.com/query/lookup?keyword=sindice&format=txt. Note: absence of this attribute causes Sindice to adjust results format as specified in the Accept HTTP header, with a default of HTML.
- callback, when the JSON format is requested, the callback parameter can be specified to give the name of the structure that will wrap the response. Example: http://sindice.com/query/lookup?keyword=sindice&format=json&callback=sindice
- page, Sindice returns results in sets of 10. This parameter can be used to get a specific result page. Please note that the return code 401 will be returned by Sindice if the page parameter is set beyond what is considered an acceptable value (currently 100, the tenth page). Example: http://sindice.com/query/lookup?keyword=sindice&page=2
Instead of using a single query type parameter, the V1 API uses multiple parameters. This means that you can specify more than one argument, and they are tried in order: thus, specifying both keyword and uri means that you will get results only for the former.
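Because uri, property and object values must be %-encoded, it can help to build V1 URLs with a small helper. The Python sketch below is illustrative only: the function name v1_lookup_url is an assumption, and the base address is the one used in the parameter examples above.

```python
from urllib.parse import quote

# Base address used in the parameter examples above.
V1_BASE = "http://sindice.com/query/lookup"


def v1_lookup_url(**params):
    """Build a V1 lookup URL, %-encoding every value (hypothetical helper)."""
    # safe="" makes quote() encode ":" "/" "#" and "@" as well.
    pairs = [key + "=" + quote(str(value), safe="") for key, value in params.items()]
    return V1_BASE + "?" + "&".join(pairs)


print(v1_lookup_url(uri="http://www.w3.org/People/Berners-Lee/card#i"))
print(v1_lookup_url(property="http://xmlns.com/foaf/0.1/mbox",
                    object="mailto:timbl@w3.org"))
```

Keyword arguments keep their order, so the parameters appear in the URL in the order they are passed.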
Query Limits
Sindice currently limits each query to 100 result pages. For special needs you can refer to our developer forum or contact us directly.
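A client that walks through result pages can work out in advance how far it is allowed to go. The Python sketch below takes the 100-page limit stated above at face value and combines it with the totalResults and itemsPerPage fields of a response; the function name last_fetchable_page is an assumption for illustration.

```python
import math

# Limit stated above, taken at face value.
MAX_PAGES = 100


def last_fetchable_page(total_results, items_per_page=10):
    """Return the highest 1-indexed page worth requesting, or 0 if there are no results."""
    if total_results <= 0:
        return 0
    return min(math.ceil(total_results / items_per_page), MAX_PAGES)


print(last_fetchable_page(211))
print(last_fetchable_page(5000))
```

For the "gabriele" example, 211 results at 10 per page gives 22, which agrees with the page number in the "last" link of the sample response.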
Sindice currently offers two ways to search for semantic data: Term Search and Advanced Search.
Term Search
The Term query language allows you to search for keywords and URIs in both the original and inferred content of indexed documents. This means that, for example, if you look for paolo it will match both documents containing this word in a URI, like:
<http://paolo.capriotti.name/foaf.rdf#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>
and those containing it in a literal:
<http://paolo.capriotti.name/foaf.rdf#me> <http://xmlns.com/foaf/0.1/givenName> "Paolo"
You can combine simple queries using boolean operators (AND, OR) and parentheses, for example:
(tim AND berners AND lee) OR timbl OR http://www.w3.org/People/Berners-Lee/card
Also note that the AND operator is automatically assumed when entering multiple terms.
You can additionally restrict your query by using field operators. For example, a query like 'washington' would match people with this name, the city, the football team, and so on. If you want to just look for people, you can try:
Or, if you want to exclude people from the results, you can use the '-' operator, like:
Other field operators include:
- label, words in the title of the document
- domain, e.g. www.deri.ie
- format, one of RDF, RDFA, MICROFORMAT, XFN, HCARD, HCALENDAR, HLISTING, HRESUME, LICENSE, GEO, ADR
- ontology, such as skos, foaf...
Namespaces
Popular namespaces are supported for convenience. For example, in any query you can use foaf:knows instead of http://xmlns.com/foaf/0.1/knows.
Some supported prefixes are: foaf, dc, owl, rdf, rdfs, skos, pimo, swc, sioct, void, sioc, dbpedia, bio, doap. For a complete list, please refer to prefix.cc.
In addition, we support the following namespaces which map directly to microformats: vcard, xhtml, doac, geo, hlisting, ical, rev, xfn (see Microformats Support for details).
Example query: vcard:name
Advanced Search
Sindice can also leverage the full power of the triple concepts through the advanced query language. The syntax of this query language is simple to use: the basic element is a triple pattern. A triple pattern is a complete or partial representation of a triple:
* <http://xmlns.com/foaf/0.1/name> "Renaud Delbru"
* <http://xmlns.com/foaf/0.1/name> 'Renaud AND Delbru'
The symbol '*' stands for a wildcard matching one of the elements of a triple. A triple pattern composed of three '*' is not allowed, and will return no results.
Due to a bug, the Sindice beta1 query engine does not behave as expected when two wildcards are used in a triple pattern. The problem will be fixed in beta2. For the moment, we recommend avoiding two wildcards in a single triple pattern, since this leads to unexpected results.
The query language enables boolean combination of triple patterns using binary operators:
- AND: intersection
- OR: union
- -: complement (include - exclude)
- (): Grouping
For example, you can try combining two triple queries:
* <http://sindice.com/exfn/0.1/friend> <http://klogs.org> AND * <http://sindice.com/exfn/0.1/friend> <http://www.isaacmao.com>
You can also use the same operators inside a literal pattern delimited by single quotes. For example, try this query:
* <http://sindice.com/hlisting/0.1/itemName> 'ipod AND (nano OR shuffle)'
If a literal element is enclosed in double quotes, an exact match will be performed. The query language also allows multiple operators, for example:
(* <http://xmlns.com/foaf/0.1/givenname> "Giovanni" AND * <http://xmlns.com/foaf/0.1/family_name> "Tummarello") OR * <http://xmlns.com/foaf/0.1/name> "Giovanni Tummarello"
will find matches for either a pair of given and family name, or a single full name.
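Queries like the one above can be composed from small string helpers instead of being typed by hand. The Python sketch below is only an illustration of the syntax described in this section: the helper names triple_pattern, uri and exact are assumptions, and the code only builds the query string without sending it.

```python
def triple_pattern(subject="*", predicate="*", obj="*"):
    """Join three elements into one triple pattern; "*" is the wildcard."""
    parts = (subject, predicate, obj)
    if all(part == "*" for part in parts):
        # The documentation states that three wildcards are not allowed.
        raise ValueError("a pattern of three wildcards is not allowed")
    return " ".join(parts)


def uri(value):
    """Wrap a URI in angle brackets."""
    return "<" + value + ">"


def exact(text):
    """Double quotes request an exact literal match."""
    return '"' + text + '"'


FOAF = "http://xmlns.com/foaf/0.1/"

given = triple_pattern(predicate=uri(FOAF + "givenname"), obj=exact("Giovanni"))
family = triple_pattern(predicate=uri(FOAF + "family_name"), obj=exact("Tummarello"))
full = triple_pattern(predicate=uri(FOAF + "name"), obj=exact("Giovanni Tummarello"))

# Group the two-part name with parentheses, then OR it with the full name.
query = "(" + given + " AND " + family + ") OR " + full
print(query)
```

The resulting string is the same query as the example above and would be passed as the q parameter with qt=advanced.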
