Sindice Cache API

Cache API V3

The Sindice Cache provides read-only access to the Sindice Data Store.


Arguments

url=STRING (Required)
The url of the page to be retrieved from the Sindice Data Store. More than one url argument may be provided.
field=FIELDNAME (Optional) (for available field names, see "Response Format")
The names of the fields to be included in the results. More than one field argument may be provided. If no field argument is provided all fields will appear in the results.
callback=STRING (Optional)
If a callback parameter is provided, the result will be wrapped as callback({here the results}). This makes it possible to issue JSONP calls directly from JavaScript.
output=json (Optional)
The output format. Only json output format is supported.
pretty=true/false (Optional)
Whether to format output for easier reading by people. If set to 'true' the output will appear nicely formatted.
filter.FIELDNAME.include.regex=(<REGEX>) (Optional)
For the named field, filter to include only the items matching the given regular expression. This is useful for the explicit_content and implicit_content fields, to include only the desired triples. If more than one include filter is provided for the same field, an item will be included if it matches any of them. Filters are not case sensitive.
filter.FIELDNAME.exclude.regex=(<REGEX>) (Optional)
For the named field, filter to exclude any items matching the given regular expression. This is useful for the explicit_content and implicit_content fields, to exclude undesired triples. If more than one exclude filter is provided for the same field, an item will be excluded if it matches any of them. Filters are not case sensitive.
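
The include and exclude rules can be mimicked on the client to preview what a filter would keep. The following is a minimal Python sketch, not the Sindice implementation: the function name apply_filters and the sample triples are hypothetical, and the sketch assumes that a pattern may match anywhere in an item and that exclude filters are applied after include filters, neither of which is specified above.

```python
import re

def apply_filters(items, include=(), exclude=()):
    """Approximate the documented include/exclude semantics locally."""
    # Filters are documented as not case sensitive.
    inc = [re.compile(p, re.IGNORECASE) for p in include]
    exc = [re.compile(p, re.IGNORECASE) for p in exclude]
    kept = []
    for item in items:
        # With include filters present, an item must match at least one.
        if inc and not any(r.search(item) for r in inc):
            continue
        # An item matching any exclude filter is dropped.
        if any(r.search(item) for r in exc):
            continue
        kept.append(item)
    return kept

# Hypothetical triples, used only to exercise the function.
triples = [
    '<http://example.org/id/a> <http://example.org/p> "x" .',
    '<http://example.org/DATA/a> <http://example.org/p> "y" .',
]
filtered = apply_filters(triples, exclude=["(/data/)"])
```

Here filtered keeps only the first triple, because the case-insensitive pattern (/data/) also matches /DATA/ in the second.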

Response Format

The response is a json map, keyed by the request urls. The values are json objects containing the current fields for the url as they appear in the Sindice Data Store.

The following fields are available from the cache. Not all fields are available for each resource.

checksum
This field is present only on some documents and will be removed in the future.
class
List of class URIs used in the document
dataset_uri
The uri of the dataset to which the url belongs. This field is normally only present for urls which have data_source=DUMP.
data_source
The means by which the data was obtained: PING, CRAWL, DUMP, SIGMA, etc.
domain
The website domain to which the url belongs.
etag
The HTTP etag for the url, only sometimes present.
explicit_content
A list of rdf statements extracted for that url. Each item of the list is a single statement in ntriple format.
format
How semantic data was exposed, for example RDF, RDFA, MICROFORMAT. There may be one format or a list of formats.
implicit_content
A list of rdf statements inferred for that url. Each item of the list is a single statement in ntriple format.
label
A list of labels (literals from rdfs:label) of the indexed resource.
length
The number of bytes in explicit_content.
ontology
A list of the urls of the ontologies imported by the statements.
size
The number of statements in explicit_content.
timestamp
The date of indexing in the form yyyy-MM-dd'T'HH:mm:ss.SSS (expressed in Java SimpleDateFormat format).
url
The url which uniquely identifies the indexed resource.
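
The timestamp pattern carries no time zone component, so parsing it yields a naive date-time. As a small Python sketch (the constant and variable names are illustrative), the strptime equivalent of the pattern is %Y-%m-%dT%H:%M:%S.%f:

```python
from datetime import datetime

# Python equivalent of the SimpleDateFormat pattern yyyy-MM-dd'T'HH:mm:ss.SSS
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"

# Sample value in the documented timestamp form.
indexed_at = datetime.strptime("2009-09-21T12:17:14.419", TIMESTAMP_FORMAT)

# The three-digit millisecond part is read as 419000 microseconds.
print(indexed_at.isoformat())
```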

Example GET request

wget "http://api.sindice.com/v3/cache?url=http%3A%2F%2Fwordnet.rkbexplorer.com%2Fid%2Fword-Galway&field=explicit_content&field=data_source&field=ontology&output=json&pretty=true&filter.explicit_content.exclude.regex=(%2Fdata%2F)"
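
A request like the one above can also be assembled programmatically, which takes care of the percent-encoding. This is a sketch in Python using only the standard library; the helper name build_cache_url and its parameters are hypothetical and not part of the API. Note that urlencode also encodes the parentheses of the regex as %28 and %29.

```python
from urllib.parse import urlencode

ENDPOINT = "http://api.sindice.com/v3/cache"

def build_cache_url(urls, fields=(), pretty=False, exclude=None):
    """Build a V3 cache request URL; `exclude` maps a field name to a regex."""
    # Repeated url and field arguments are allowed, so use a list of pairs.
    params = [("url", u) for u in urls]
    params += [("field", f) for f in fields]
    params.append(("output", "json"))
    if pretty:
        params.append(("pretty", "true"))
    for field, regex in (exclude or {}).items():
        params.append((f"filter.{field}.exclude.regex", regex))
    # urlencode percent-encodes every key and value.
    return ENDPOINT + "?" + urlencode(params)

request_url = build_cache_url(
    ["http://wordnet.rkbexplorer.com/id/word-Galway"],
    fields=["explicit_content", "data_source", "ontology"],
    pretty=True,
    exclude={"explicit_content": "(/data/)"},
)
print(request_url)
```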

Example Response

{ "http://dbpedia.org/resource/Terry_Fox_Run" :
   { "checksum" : "e5be6d2dee6eff0916cc169872c5f6155b10e9a7",
      "data_source" : "DUMP",
      "domain" : "dbpedia.org",
      "explicit_content" : [
        "<http://dbpedia.org/resource/Terry_Fox_Run> <http://www.w3.org/2000/01/rdf-schema#label> \"Terry Fox Run\"@en .\n",
        "<http://dbpedia.org/resource/Terry_Fox_Run> <http://www.w3.org/2004/02/skos/core#subject> <http://dbpedia.org/resource/Category:Long-distance_races> .\n",
        "<http://dbpedia.org/resource/Terry_Fox_Run> <http://www.w3.org/2004/02/skos/core#subject> <http://dbpedia.org/resource/Category:Philanthropy> .\n",
        "<http://dbpedia.org/resource/Terry_Fox_Run> <http://www.w3.org/2004/02/skos/core#subject> <http://dbpedia.org/resource/Category:Charities_based_in_Canada> .\n",
        "<http://dbpedia.org/resource/Terry_Fox_Run> <http://dbpedia.org/property/hasPhotoCollection> <http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/Terry_Fox_Run> .\n",
        "<http://dbpedia.org/resource/Terry_Fox_Run> <http://xmlns.com/foaf/0.1/homepage> <http://www.terryfoxrun.org/> .\n"
        ],
      "format" : "RDF",
      "implicit_content" : [
        "<http://www.terryfoxrun.org/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Document> .\n",
        "<http://dbpedia.org/resource/Terry_Fox_Run> <http://xmlns.com/foaf/0.1/isPrimaryTopicOf> <http://www.terryfoxrun.org/> .\n",
        "<http://dbpedia.org/resource/Terry_Fox_Run> <http://xmlns.com/foaf/0.1/page> <http://www.terryfoxrun.org/> .\n"
        ],
      "label" : "Terry Fox Run",
      "length" : "844",
      "ontology" : [ "http://xmlns.com/foaf/spec/",
          "http://www.w3.org/2002/07/owl.rdf",
          "http://www.w3.org/2000/01/rdf-schema",
          "http://www.w3.org/1999/02/22-rdf-syntax-ns",
          "http://www.w3.org/2003/06/sw-vocab-status/ns.rdf",
          "http://www.w3.org/TR/skos-reference/skos.rdf"
        ],
      "size" : "6",
      "timestamp" : "2009-09-21T12:17:14.419",
      "url" : "http://dbpedia.org/resource/Terry_Fox_Run"
    }
 }
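
A response of this shape can be read with any JSON parser. The sketch below, in Python, uses an abbreviated stand-in body built from values in the example above; the helper as_list and the summary structure are illustrative assumptions. It normalises the format field, which may hold one value or a list, and converts size, which appears as a JSON string in the example.

```python
import json

# Abbreviated stand-in for a response body, using values from the example.
body = """
{ "http://dbpedia.org/resource/Terry_Fox_Run" :
   { "data_source" : "DUMP",
     "format" : "RDF",
     "size" : "6",
     "length" : "844" } }
"""

def as_list(value):
    """Return a list whether the field held one value or a list of values."""
    return value if isinstance(value, list) else [value]

# The top-level map is keyed by request url; each value holds that url's fields.
summary = {
    url: (as_list(fields.get("format", [])), int(fields["size"]))
    for url, fields in json.loads(body).items()
}
print(summary)
```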

Cache API V2

Example calls:

The Cache API lets you retrieve the triples of a cached document by its URI. To get the triples, specify url=URI of the document.
To include the implicit triples as well, add the parameter implicit=1. Examples:

http://api.sindice.com/v2/cache?url=http%3A%2F%2Fwordnet.rkbexplorer.com%2Fid%2Fword-Galway
http://api.sindice.com/v2/cache?url=http%3A%2F%2Fwordnet.rkbexplorer.com%2Fid%2Fword-Galway&implicit=1

The result format can be selected by HTTP content negotiation.
Content negotiation examples:
  • To get results in HTML:
    curl -H "Accept: text/html" \
    "http://api.sindice.com/v2/cache?url=http%3A%2F%2Fwordnet.rkbexplorer.com%2Fid%2Fword-Galway"
  • To get results in RDF:
    curl -H "Accept: application/rdf+xml" \
    "http://api.sindice.com/v2/cache?url=http%3A%2F%2Fwordnet.rkbexplorer.com%2Fid%2Fword-Galway"
  • To get results in plain text:
    curl -H "Accept: text/plain" \
    "http://api.sindice.com/v2/cache?url=http%3A%2F%2Fwordnet.rkbexplorer.com%2Fid%2Fword-Galway"
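
The same content negotiation can be expressed from a program by setting the Accept header on the request. The following Python sketch only builds the request object and does not send it; the helper name v2_request is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

V2_ENDPOINT = "http://api.sindice.com/v2/cache"

def v2_request(document_url, accept, implicit=False):
    """Build a V2 cache request that asks for the given media type."""
    params = {"url": document_url}
    if implicit:
        # implicit=1 asks for the implicit triples as well.
        params["implicit"] = "1"
    return Request(V2_ENDPOINT + "?" + urlencode(params),
                   headers={"Accept": accept})

req = v2_request("http://wordnet.rkbexplorer.com/id/word-Galway",
                 "application/rdf+xml", implicit=True)
# Passing req to urllib.request.urlopen would send it; that step is omitted here.
print(req.full_url, req.get_header("Accept"))
```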