Query Services

Sindice API:

The Sindice API provides programmatic access to its search capabilities. Please refer here for support questions.

Query services (v2)

There are two types of search in the new API: term search and advanced search.
In general these APIs are based on the OpenSearch 1.1 specification.

  • the q parameter specifies the query
  • the page parameter (mandatory) specifies the result page. Pages are 1-indexed, so the first page is 1, the second is 2 and so on.
  • the qt parameter selects the search type: "term" for Term Search, "advanced" for Advanced Search, or "combined" for Combined Search.
  • the sortbydate parameter is a boolean flag: when it is set (for example sortbydate=1), results are sorted by date; otherwise they are sorted by relevance.

Examples:

http://api.sindice.com/v2/search?q=Rome&qt=term&page=1

http://api.sindice.com/v2/search?q=Rome&qt=term&page=1&sortbydate=1
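
A minimal Python sketch of assembling these parameters into a request URL, using only the standard library. The helper name search_url is an assumption made for illustration and is not part of the Sindice API; the parameter names and base address are taken from the examples above.

```python
from urllib.parse import urlencode

BASE = "http://api.sindice.com/v2/search"  # base address from the examples above


def search_url(q, qt="term", page=1, sort_by_date=False):
    """Build a v2 search URL (hypothetical helper, not part of the API)."""
    if page < 1:
        raise ValueError("pages are 1-indexed")
    params = {"q": q, "qt": qt, "page": page}
    if sort_by_date:
        params["sortbydate"] = 1  # leave the flag out to sort by relevance
    return BASE + "?" + urlencode(params)


print(search_url("Rome"))
print(search_url("Rome", sort_by_date=True))
```

The two calls reproduce the two example URLs above.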

Term Search

Term Search allows you to retrieve documents that are related to keywords and/or URIs.
To activate Term Search, use qt=term in the query parameters. Example:

http://api.sindice.com/v2/search?q=Rome&qt=term&page=1

Currently, Term Search enjoys better ranking and is in general more suitable when searching for user-provided strings.
Term Search automatically recognizes URIs in the query and matches them against URIs inside the RDF. Example:

http://api.sindice.com/v2/search?q=Giovanni+Tummarello+http%3A%2F%2Frichard.cyganiak.de%2Ffoaf.rdf%23cygri&qt=term&page=1

For the complete documentation of the Term Search query language see http://sindice.com/developers/querylanguages.

Advanced Search

Advanced Search allows the use of triple-level expressions in the query. Example:

http://api.sindice.com/v2/search?q=*+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fname%3E+%22Renaud+Delbru%22&qt=advanced&page=1

will locate RDF documents that contain resources whose foaf:name is "Renaud Delbru".
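
The percent-encoded query in the URL above can be produced from the readable triple pattern. The following Python sketch shows one way to do it with the standard library; passing safe="*" is a choice made here so that the output matches the example URL, and whether the server also accepts an escaped wildcard (%2A) is not stated in this document.

```python
from urllib.parse import urlencode

# Triple pattern: wildcard subject, foaf:name predicate, exact literal object.
pattern = '* <http://xmlns.com/foaf/0.1/name> "Renaud Delbru"'

# safe="*" keeps the wildcard unescaped, as in the example URL above.
query_string = urlencode({"q": pattern, "qt": "advanced", "page": 1}, safe="*")
url = "http://api.sindice.com/v2/search?" + query_string
print(url)
```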

For the complete documentation of the Advanced Search query language see http://sindice.com/developers/api#QueryLanguages.

Combined Search

Combined Search works like Advanced Search, but takes an additional parameter, qv, that specifies a term query. It is selected with qt=combined, and the term query is combined with the advanced query using an AND operator. For example:

http://api.sindice.com/v2/search?q=*+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fname%3E+%22Renaud+Delbru%22&qt=combined&page=1&qv=michele

will locate those resources which have "foaf:name" "Renaud Delbru" and contain the word "michele".

Result formats

You can use content negotiation to retrieve three different formats:

  • json: curl -H "Accept: application/json" "http://api.sindice.com/v2/search?q=gabriele&qt=term&page=1"
  • rdf: curl -H "Accept: application/rdf+xml" "http://api.sindice.com/v2/search?q=gabriele&qt=term&page=1"
  • atom: curl -H "Accept: application/atom+xml" "http://api.sindice.com/v2/search?q=gabriele&qt=term&page=1"
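
The same content negotiation can be done from a script. This is a minimal Python sketch using the standard library; the helper name build_request and the ACCEPT table are assumptions made for illustration, while the media types are the ones used in the curl commands above. The request is only constructed here, not sent.

```python
from urllib.request import Request

# Media types taken from the curl examples above.
ACCEPT = {
    "json": "application/json",
    "rdf": "application/rdf+xml",
    "atom": "application/atom+xml",
}


def build_request(url, fmt="json"):
    """Return a Request asking for the given format (hypothetical helper)."""
    return Request(url, headers={"Accept": ACCEPT[fmt]})


req = build_request(
    "http://api.sindice.com/v2/search?q=gabriele&qt=term&page=1", "atom"
)
print(req.get_header("Accept"))
# To actually send it: urllib.request.urlopen(req).read()
```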

The basic format starts with a group of general fields:

  • generation time of this search
  • base url, without the specific page
  • number of total results
  • url of this result page
  • url of previous, next, first and last page of results
  • link to the HTML alternate representation for this page, in the normal sindice website
  • author field, Sindice.com
  • number of items per page
  • starting index in this page
  • a Query object with fields that allow replaying of this query (search terms, start page, role)

Then there is a list of entries, each of which has:

  • title, a list of the document labels in JSON and RDF, and a single field with comma-separated strings for Atom (the Atom specification allows only one title per entry)
  • formats, a list, for example RDFa and Microformat
  • content, a simple string such as: "13 triples in 1000 bytes"
  • link, the document URI
  • updated, the document modification date

Specifically, a JSON-encoded object looks like this:

{
 "updated": "2008/06/03 18:27:29 +0100",
 "base": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term",
 "totalResults": 211,
 "search": "http://www.sindice.com/opensearch.xml",
 "self": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term\u0026page=1",
 "previous": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term\u0026page=",
 "title": "Sindice search: gabriele",
 "last": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term\u0026page=22",
 "alternate": "http://sindice.com/v2/search?q=gabriele\u0026qt=term",
 "author": "Sindice.com",
 "first": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term\u0026page=1",
 "itemsPerPage": 10,
 "startIndex": 1,
 "next": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term\u0026page=2",
 "query":
  {
   "role": "request",
   "startPage": 1,
   "searchTerms": "gabriele"
  },
 "link": "http://api.sindice.com/v2/search?q=gabriele\u0026qt=term\u0026page=1",
 "entries":
  [
   {
    "title": ["Gabriele Albertini"],
    "formats": ["RDF"],
    "content": "183 triples in 32484 bytes",
    "link": "http://dbpedia.org/resource/Gabriele_Albertini",
    "updated": "2008/05/23"
   },
   {
    "title": ["Gabriele Paonessa"],
    "formats": ["RDF"],
    "content": "111 triples in 16153 bytes",
    "link": "http://dbpedia.org/resource/Gabriele_Paonessa",
    "updated": "2008/05/23"
   },
  ...
  ]
}
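
A short Python sketch of reading such a response. It parses a trimmed copy of the sample above rather than a live response; a real client would pass the HTTP response body to json.loads instead.

```python
import json

# Trimmed copy of the sample response above.
sample = """
{
 "totalResults": 211,
 "itemsPerPage": 10,
 "startIndex": 1,
 "entries": [
  {"title": ["Gabriele Albertini"], "formats": ["RDF"],
   "content": "183 triples in 32484 bytes",
   "link": "http://dbpedia.org/resource/Gabriele_Albertini",
   "updated": "2008/05/23"}
 ]
}
"""

result = json.loads(sample)
for entry in result["entries"]:
    # "title" and "formats" are lists, so join them for display.
    labels = ", ".join(entry["title"])
    formats = ", ".join(entry["formats"])
    print(f"{labels} [{formats}] {entry['link']}")
```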

The format closely matches the OpenSearch format, so refer to that specification for further details. The only two differences are the title field of each entry, which is a list (a document can have different labels), and the formats field, which is a list of the formats found in one page (for example, RDFa and microformats).

Example Atom format:

<?xml version="1.0" encoding="iso-8859-1"?>
<feed xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/"
      xmlns:sindice="http://sindice.com/vocab/fields#"
      xmlns="http://www.w3.org/2005/Atom">
  <title>Sindice search: gabriele</title>
  <link href="http://api.sindice.com/v2/search?page=1&amp;q=gabriele&amp;qt=term"/>
  <updated>2008-06-03T19:50:39+01:00</updated>
  <author>
    <name>Sindice.com</name>
  </author>
  <id>http://api.sindice.com/v2/search?page=1&amp;q=gabriele&amp;qt=term</id>
  <opensearch:totalResults>211</opensearch:totalResults>
  <opensearch:startIndex>1</opensearch:startIndex>
  <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
  <opensearch:Query role="request" startPage="1" searchTerms="gabriele"/>
  <link href="http://sindice.com/search?page=1&amp;q=gabriele&amp;qt=term"
        rel="alternate" type="text/html"/>
  <link href="http://api.sindice.com/v2/search?page=1&amp;q=gabriele&amp;qt=term"
        rel="first" type="application/atom+xml"/>
  <link href="http://api.sindice.com/v2/search?q=gabriele&amp;qt=term"
        rel="previous" type="application/atom+xml"/>
  <link href="http://api.sindice.com/v2/search?page=2&amp;q=gabriele&amp;qt=term"
        rel="next" type="application/atom+xml"/>
  <link href="http://api.sindice.com/v2/search?page=22&amp;q=gabriele&amp;qt=term"
        rel="last" type="application/atom+xml"/>
  <link href="http://api.sindice.com/v2/search?page=1&amp;q=gabriele&amp;qt=term"
        rel="self" type="application/atom+xml"/>
  <link href="http://www.sindice.com/opensearch-term.xml"
        rel="search" type="application/opensearchdescription+xml"/>
  <entry>
    <title>Gabriele Albertini</title>
    <link href="http://dbpedia.org/resource/Gabriele_Albertini"/>
    <id>http://dbpedia.org/resource/Gabriele_Albertini</id>
    <updated>2008-05-23T00:00:00+01:00</updated>
    <sindice:format>RDF</sindice:format>
    <content>183 triples in 32484 bytes</content>
  </entry>
  <entry>
    <title>Gabriele Paonessa</title>
    <link href="http://dbpedia.org/resource/Gabriele_Paonessa"/>
    <id>http://dbpedia.org/resource/Gabriele_Paonessa</id>
    <updated>2008-05-23T00:00:00+01:00</updated>
    <sindice:format>RDF</sindice:format>
    <content>111 triples in 16153 bytes</content>
  </entry>
</feed>

It is a simple Atom file, plus the OpenSearch schema, plus a single additional tag carrying information about the document format. You should be able to parse this easily with any XML parser.
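
As an illustration of parsing the feed with a generic XML parser, here is a minimal Python sketch using xml.etree.ElementTree on a trimmed copy of the sample above. The prefix names in the NS table are local choices; only the namespace URIs, which come from the sample, matter.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the sample feed above.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "opensearch": "http://a9.com/-/spec/opensearch/1.1/",
    "sindice": "http://sindice.com/vocab/fields#",
}

sample = """<feed xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/"
      xmlns:sindice="http://sindice.com/vocab/fields#"
      xmlns="http://www.w3.org/2005/Atom">
  <opensearch:totalResults>211</opensearch:totalResults>
  <entry>
    <title>Gabriele Albertini</title>
    <link href="http://dbpedia.org/resource/Gabriele_Albertini"/>
    <sindice:format>RDF</sindice:format>
    <content>183 triples in 32484 bytes</content>
  </entry>
</feed>"""

feed = ET.fromstring(sample)
total = int(feed.findtext("opensearch:totalResults", namespaces=NS))
entries = [
    {
        "title": e.findtext("atom:title", namespaces=NS),
        "link": e.find("atom:link", NS).get("href"),
        # An entry may carry several sindice:format tags.
        "formats": [f.text for f in e.findall("sindice:format", NS)],
    }
    for e in feed.findall("atom:entry", NS)
]
print(total, entries)
```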

The RDF representation defines the base search URI as a search:Results object, which has many search:ResultPage objects, each one having many search:Entry objects. The other fields should be self-explanatory and mirror those of the other formats.

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:fields="http://sindice.com/vocab/fields#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns="http://sindice.com/vocab/search#">
  <Results rdf:about="http://api.sindice.com/v2/search?q=gabriele&amp;qt=term">
    <dc:title>Sindice search: gabriele</dc:title>
    <dc:date>2008-06-03T19:54:11+01:00</dc:date>
    <dc:creator>Sindice.com</dc:creator>
    <totalResults>211</totalResults>
    <itemsPerPage>10</itemsPerPage>
    <terms>gabriele</terms>
    <firstPage rdf:resource="http://api.sindice.com/v2/search?page=1&amp;q=gabriele&amp;qt=term"/>
    <lastPage rdf:resource="http://api.sindice.com/v2/search?page=22&amp;q=gabriele&amp;qt=term"/>
    <page rdf:resource="http://api.sindice.com/v2/search?page=1&amp;q=gabriele&amp;qt=term"/>
    <opensearchDescription rdf:resource="http://www.sindice.com/opensearch.xml"/>
  </Results>
  <ResultPage rdf:about="http://api.sindice.com/v2/search?page=1&amp;q=gabriele&amp;qt=term">
    <startIndex>1</startIndex>
    <previousPage rdf:resource="http://api.sindice.com/v2/search?q=gabriele&amp;qt=term"/>
    <nextPage rdf:resource="http://api.sindice.com/v2/search?page=2&amp;q=gabriele&amp;qt=term"/>
    <htmlPage rdf:resource="http://sindice.com/search?page=1&amp;q=gabriele&amp;qt=term"/>
    <entry rdf:resource="#result1"/>
    <entry rdf:resource="#result2"/>
    ...
  </ResultPage>
  <Entry rdf:about="#result1">
    <dc:title>Gabriele Albertini</dc:title>
    <link rdf:resource="http://dbpedia.org/resource/Gabriele_Albertini"/>
    <dc:created>2008-05-23T00:00:00+01:00</dc:created>
    <fields:format>RDF</fields:format>
    <content>183 triples in 32484 bytes</content>
    <rank>1</rank>
  </Entry>
  <Entry rdf:about="#result2">
    <dc:title>Gabriele Paonessa</dc:title>
    <link rdf:resource="http://dbpedia.org/resource/Gabriele_Paonessa"/>
    <dc:created>2008-05-23T00:00:00+01:00</dc:created>
    <fields:format>RDF</fields:format>
    <content>111 triples in 16153 bytes</content>
    <rank>2</rank>
  </Entry>
 ...
</rdf:RDF>

Integrating JSON in your script

You can optionally add an argument called callback to the request, which causes the JSON output to be wrapped in a call to a function with the name you choose.
This allows clean integration of the Sindice results in your web page, for example:

<script type="text/javascript"
        src="http://api.sindice.com/v2/search?q=mike&amp;qt=term&amp;format=json&amp;callback=showSindiceResults"></script>

Notice that, to force the rendering of JSON output, we added an additional parameter, format. It also accepts the values atom and rdfxml.
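
If the wrapped output is consumed outside a browser, the wrapper has to be removed before the JSON can be parsed. The Python sketch below assumes the response has the form callbackName(<json>) with an optional trailing semicolon; the exact wrapper Sindice emits is not shown in this document, and unwrap_jsonp is a hypothetical helper.

```python
import json


def unwrap_jsonp(text, callback):
    """Strip a callback( ... ) wrapper and parse the JSON inside.

    Assumes the shape callback(<json>) with an optional trailing
    semicolon; this shape is an assumption, not documented behaviour.
    """
    body = text.strip().rstrip(";").strip()
    prefix = callback + "("
    if not (body.startswith(prefix) and body.endswith(")")):
        raise ValueError("not a wrapped response for " + callback)
    return json.loads(body[len(prefix):-1])


data = unwrap_jsonp(
    'showSindiceResults({"totalResults": 211});', "showSindiceResults"
)
print(data["totalResults"])
```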

Other API versions

Currently, our API version is 2, with base address http://api.sindice.com/v2/
As new API versions are released, older ones will be kept at their existing locations.

API v1

The previous version of the Sindice API is still available. It offers 3 query types, which mimic the old Sindice search queries.

V1 Result Formats

The result format can be selected in two ways: by HTTP content negotiation or by an optional format query parameter. The default format is HTML.

Content negotiation examples:

  • To get results in RDF:
    curl -H "Accept: application/rdf+xml" http://api.sindice.com/v1/lookup?keyword=berlin
  • To get results in JSON:
    curl -H "Accept: application/json" http://api.sindice.com/v1/lookup?keyword=berlin
  • To get results in Plain text:
    curl -H "Accept: text/plain" http://api.sindice.com/v1/lookup?keyword=berlin
  • To get results in XOXO:
    curl -H "Accept: text/html" http://api.sindice.com/v1/lookup?keyword=berlin

V1 Query Parameters

Instead of using a single query type parameter, the V1 API uses multiple parameters. This means that you can specify more than one argument; they are tried in order, so specifying both keyword and url returns results only for the former.

Query Limits

Sindice currently limits each query to 100 result pages. For special needs you can refer to our developer forum or contact us directly.
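
A small Python sketch of how a client might take this limit into account when paging through results. The helper name last_fetchable_page is an assumption; totalResults and itemsPerPage are the fields from the v2 result format.

```python
import math

MAX_PAGES = 100  # page limit stated above


def last_fetchable_page(total_results, items_per_page):
    """Highest page number worth requesting (hypothetical helper)."""
    pages = math.ceil(total_results / items_per_page)
    return min(pages, MAX_PAGES)


print(last_fetchable_page(211, 10))   # the sample response: 211 results, 10 per page
print(last_fetchable_page(5000, 10))  # more results than the limit allows
```

For the sample response (211 results, 10 per page) this gives 22, which agrees with the "last" link in that sample; for 5000 results the limit caps it at 100.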

Query Languages:

Sindice currently offers two ways to search for semantic data: Term Search and Advanced Search.

Term Search

The Term query language allows you to search for keywords and URIs in both the original and inferred content of indexed documents. This means that, for example, if you look for paolo it will match both documents containing this word in a URI, like:

<http://paolo.capriotti.name/foaf.rdf#me>
  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  <http://xmlns.com/foaf/0.1/Person>

and those containing it in a literal:

<http://paolo.capriotti.name/foaf.rdf#me> <http://xmlns.com/foaf/0.1/givenName> "Paolo"

You can combine simple queries using boolean operators (AND, OR) and parentheses, for example:

(tim AND berners AND lee) OR timbl OR http://www.w3.org/People/Berners-Lee/card

Also note that the AND operator is automatically assumed when entering multiple terms.

You can additionally restrict your query by using field operators. For example, a query like 'washington' would match people with this name, the city, the football team, and so on. If you want to just look for people, you can try:

washington class:Person

Or, if you want to exclude people from the results, you can use the '-' operator, like:

washington -class:Person

Other field operators include:

  • label, words in the title of the document
  • domain, e.g. www.deri.ie
  • format, one of RDF, RDFA, MICROFORMAT, XFN, HCARD, HCALENDAR, HLISTING, HRESUME, LICENSE, GEO, ADR
  • ontology, such as skos, foaf...
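
The field operators can be composed programmatically. This Python sketch builds Term Search strings like the ones above; term_query is a hypothetical helper, not something Sindice provides.

```python
def term_query(*terms, include=None, exclude=None):
    """Compose a Term Search string (hypothetical helper).

    include / exclude map a field operator such as "class" or "format"
    to a value; excluded filters get the '-' prefix.
    """
    parts = list(terms)
    parts += [f"{field}:{value}" for field, value in (include or {}).items()]
    parts += [f"-{field}:{value}" for field, value in (exclude or {}).items()]
    return " ".join(parts)  # AND is implied between terms


print(term_query("washington", include={"class": "Person"}))
print(term_query("washington", exclude={"class": "Person"}))
```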

Namespaces

Popular namespaces are supported for convenience. For example, in any query you can use foaf:knows instead of http://xmlns.com/foaf/0.1/knows

Some supported prefixes are: foaf, dc, owl, rdf, rdfs, skos, pimo, swc, sioct, void, sioc, dbpedia, bio, doap. For a complete list, please refer to prefix.cc.

In addition, we support the following namespaces which map directly to microformats: vcard, xhtml, doac, geo, hlisting, ical, rev, xfn (see Microformats Support for details).
Example query: vcard:name
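
To make the shorthand concrete, the following Python sketch expands a prefixed name into a full URI on the client side. Sindice resolves prefixes itself, so this is purely illustrative; the table is a small subset whose namespace URIs all appear elsewhere on this page, and expand is a hypothetical helper.

```python
# Illustrative subset; these namespace URIs appear elsewhere on this page.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(qname):
    """Expand prefix:local to a full URI; return the input unchanged otherwise."""
    prefix, sep, local = qname.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return qname


print(expand("foaf:knows"))
```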

Advanced Search

Sindice can also search at the level of individual triples through the advanced query language. The syntax is simple: the basic element is a triple pattern, which is a complete or partial representation of a triple:

* <http://xmlns.com/foaf/0.1/name> "Renaud Delbru"
* <http://xmlns.com/foaf/0.1/name> 'Renaud AND Delbru'

The symbol '*' stands for a wildcard matching one of the elements of a triple. A triple pattern composed of three '*' is not allowed, and will return no results.

 Due to a bug, the Sindice beta1 query engine does not behave as expected
 when using two wildcards in a triple pattern. The problem will be fixed
 in beta2. For the moment, we recommend avoiding the use of two wildcards
 in a single triple pattern, since it will lead to unexpected results.
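
The wildcard rules above can be checked on the client before a query is sent. This is a minimal Python sketch; triple_pattern is a hypothetical helper, and rejecting two-wildcard patterns outright is a choice made here in response to the beta1 note, not behaviour of the API.

```python
WILDCARD = "*"


def triple_pattern(subject=WILDCARD, predicate=WILDCARD, obj=WILDCARD):
    """Join three already-formatted elements into a triple pattern.

    Elements are passed as they appear in a query: <uri>, "exact literal",
    'keyword literal' or *. The checks are illustrative only.
    """
    parts = [subject, predicate, obj]
    wildcards = parts.count(WILDCARD)
    if wildcards == 3:
        raise ValueError("a pattern of three wildcards is not allowed")
    if wildcards == 2:
        raise ValueError("two wildcards give unexpected results in beta1")
    return " ".join(parts)


print(triple_pattern(predicate="<http://xmlns.com/foaf/0.1/name>",
                     obj='"Renaud Delbru"'))
```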

The query language enables boolean combination of triple patterns using the following operators:

  • AND: intersection
  • OR: union
  • -: complement (include - exclude)
  • (): grouping

For example, you can try combining two triple queries:

* <http://sindice.com/exfn/0.1/friend>
    <http://klogs.org>
AND
* <http://sindice.com/exfn/0.1/friend>
    <http://www.isaacmao.com>

You can also use the same operators inside a literal pattern delimited by single quotes. For example, try this query:

* <http://sindice.com/hlisting/0.1/itemName>
    'ipod AND (nano OR shuffle)'

If a literal element is enclosed in double quotes, an exact match will be performed. The query language also allows multiple operators, for example:

(* <http://xmlns.com/foaf/0.1/givenname> "Giovanni"  AND
* <http://xmlns.com/foaf/0.1/family_name> "Tummarello")
OR * <http://xmlns.com/foaf/0.1/name> "Giovanni Tummarello"

will find matches for either a pair of given and family name, or a single full name.
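
The query above can also be assembled from its parts. In this Python sketch, all_of and any_of are hypothetical helpers that join already-written triple patterns with the AND and OR operators; they do no validation of the patterns themselves.

```python
def all_of(*patterns):
    """Combine patterns with AND inside a group."""
    return "(" + " AND ".join(patterns) + ")"


def any_of(*patterns):
    """Combine patterns with OR."""
    return " OR ".join(patterns)


given = '* <http://xmlns.com/foaf/0.1/givenname> "Giovanni"'
family = '* <http://xmlns.com/foaf/0.1/family_name> "Tummarello"'
full = '* <http://xmlns.com/foaf/0.1/name> "Giovanni Tummarello"'

query = any_of(all_of(given, family), full)
print(query)
```

The resulting string is the same query as above, written on a single line.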