Sindice Cache API

Cache API V3

The Sindice Cache provides read-only access to the Sindice Data Store.

Arguments

url=STRING (Required): The url of the page to be retrieved from the Sindice Data Store. More than one url argument may be provided.
field=FIELDNAME (Optional; for available field names see "Response Format"): The names of the fields to be included in the results. More than one field argument may be provided. If no field argument is provided, all fields will appear in the results.
callback=STRING (Optional): If the callback parameter is provided, the result will be wrapped as callback({here the results}). This allows JSONP calls to be made directly from JavaScript.
output=json (Optional): The output format. Only the json output format is supported.
pretty=true/false (Optional): Whether to format the output for easier reading by people. If set to 'true', the output will appear nicely formatted.
filter.FIELDNAME.include.regex=(<REGEX>) (Optional): For the named field, filter to include only the items matching the given regular expression. This is useful for the explicit_content and implicit_content fields, to include only the desired triples. If more than one include filter is provided for the same field, an item will be included if it matches any of them. Filters are not case sensitive.
filter.FIELDNAME.exclude.regex=(<REGEX>) (Optional): For the named field, filter to exclude any items matching the given regular expression. This is useful for the explicit_content and implicit_content fields, to exclude undesired triples. If more than one exclude filter is provided for the same field, an item will be excluded if it matches any of them. Filters are not case sensitive.
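
The arguments above can be combined into a single query string. The following is a minimal sketch in Python, assuming the endpoint shown in the example request further down. The helper name build_cache_url and its keyword arguments are hypothetical and not part of the API. It wraps each regular expression in parentheses, following the (<REGEX>) notation above.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the example GET request of this document.
CACHE_ENDPOINT = "http://api.sindice.com/v3/cache"


def build_cache_url(urls, fields=(), include=None, exclude=None, pretty=False):
    """Build a Cache API V3 request URL (hypothetical helper, not part of the API).

    include / exclude map a field name to a list of regular expressions.
    """
    # Repeated arguments (url, field, filters) are sent as repeated keys.
    params = [("url", u) for u in urls]
    params += [("field", f) for f in fields]
    for field, patterns in (include or {}).items():
        params += [(f"filter.{field}.include.regex", f"({p})") for p in patterns]
    for field, patterns in (exclude or {}).items():
        params += [(f"filter.{field}.exclude.regex", f"({p})") for p in patterns]
    params.append(("output", "json"))
    if pretty:
        params.append(("pretty", "true"))
    # safe="()" leaves the parentheses unescaped, as in the documented example.
    return CACHE_ENDPOINT + "?" + urlencode(params, safe="()")


if __name__ == "__main__":
    print(build_cache_url(
        ["http://wordnet.rkbexplorer.com/id/word-Galway"],
        fields=["explicit_content", "data_source", "ontology"],
        exclude={"explicit_content": ["/data/"]},
        pretty=True,
    ))
```

Passing the parameters as a list of pairs rather than a dict keeps repeated keys such as url and field intact.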

Response Format

The response is a json map, keyed by the request urls. The values are json objects containing the current fields for the url as they appear in the Sindice Data Store.
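
As an illustration of the map-by-url shape, the sketch below parses an abbreviated copy of the example response from this document and looks up each requested url. The helper name fields_for is hypothetical. How the service reports a url that is not in the Data Store is not stated here, so the sketch simply tolerates a missing key.

```python
import json

# Abbreviated copy of the example response in this document (most fields omitted).
body = """
{ "http://dbpedia.org/resource/Terry_Fox_Run" :
   { "data_source" : "DUMP",
     "domain" : "dbpedia.org",
     "format" : "RDF",
     "size" : "6" } }
"""


def fields_for(body, requested_urls):
    """Map each requested url to its field dict, or None if the url is absent."""
    response = json.loads(body)
    return {url: response.get(url) for url in requested_urls}


if __name__ == "__main__":
    result = fields_for(body, ["http://dbpedia.org/resource/Terry_Fox_Run"])
    for url, fields in result.items():
        print(url, fields)
```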

The following fields are available from the cache. Not all fields are available for each resource.

checksum: This field is present only on some documents and will be removed in the future.
class: List of class URIs used in the document
dataset_uri: The uri of the dataset to which the url belongs. This field is normally only present for urls which have data_source=DUMP.
data_source: The means by which the data was obtained: PING, CRAWL, DUMP, SIGMA, etc.
domain: The website domain to which the url belongs.
etag: The HTTP etag for the url, only sometimes present.
explicit_content: A list of rdf statements extracted for that url. Each item of the list is a single statement in ntriple format.
format: How semantic data was exposed, for example RDF, RDFA, MICROFORMAT. There may be one format or a list of formats.
implicit_content: A list of rdf statements inferred for that url. Each item of the list is a single statement in ntriple format.
label: A list of labels (literals from rdfs:label) of the indexed resource.
length: The number of bytes in explicit_content.
ontology: A list of the urls of the ontologies imported by the statements.
size: The number of statements in explicit_content.
timestamp: The date of indexing in the form yyyy-MM-dd'T'HH:mm:ss.SSS (expressed in Java SimpleDateFormat format).
url: The url which uniquely identifies the indexed resource.
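
A client will usually want typed values for some of these fields. The sketch below is one possible normalization, under two assumptions drawn from the example response in this document: length and size arrive as strings, and format may be a single string rather than a list. The helper names are hypothetical.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse yyyy-MM-dd'T'HH:mm:ss.SSS into a naive datetime."""
    # %f accepts the three millisecond digits and scales them to microseconds.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f")


def as_list(value):
    """Return a list whether the field held one value, a list, or was absent."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def normalize(record):
    """Return a copy of one url's field dict with typed values (hypothetical helper)."""
    out = dict(record)
    if "timestamp" in out:
        out["timestamp"] = parse_timestamp(out["timestamp"])
    for key in ("length", "size"):
        if key in out:
            out[key] = int(out[key])
    out["format"] = as_list(record.get("format"))
    return out


if __name__ == "__main__":
    print(normalize({
        "format": "RDF",
        "length": "844",
        "size": "6",
        "timestamp": "2009-09-21T12:17:14.419",
    }))
```

No time zone is attached to the parsed timestamp because the documented format carries none.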

Example GET request

wget "http://api.sindice.com/v3/cache?url=http%3A%2F%2Fwordnet.rkbexplorer.com%2Fid%2Fword-Galway&field=explicit_content&field=data_source&field=ontology&output=json&pretty=true&filter.explicit_content.exclude.regex=(%2Fdata%2F)"
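
The request above uses an exclude filter on explicit_content. To make the documented filter rules concrete, here is a local emulation in Python. It is only a sketch of the rules as described under "Arguments", not the server's implementation. Three points are assumptions: that a pattern may match anywhere in an item, that include filters are applied before exclude filters when both are given, and that Python's regex dialect is close enough to the server's for simple patterns.

```python
import re


def apply_filters(items, include=(), exclude=()):
    """Emulate the documented include/exclude rules on a list of items.

    An item is kept if it matches any include pattern (when include patterns
    are given) and matches no exclude pattern. Matching ignores case.
    """
    inc = [re.compile(p, re.IGNORECASE) for p in include]
    exc = [re.compile(p, re.IGNORECASE) for p in exclude]
    kept = []
    for item in items:
        if inc and not any(r.search(item) for r in inc):
            continue  # include filters given, but none matched
        if any(r.search(item) for r in exc):
            continue  # matched at least one exclude filter
        kept.append(item)
    return kept


if __name__ == "__main__":
    # Two statements taken from the example response in this document.
    triples = [
        "<http://dbpedia.org/resource/Terry_Fox_Run> "
        "<http://xmlns.com/foaf/0.1/homepage> <http://www.terryfoxrun.org/> .\n",
        "<http://dbpedia.org/resource/Terry_Fox_Run> "
        "<http://www.w3.org/2004/02/skos/core#subject> "
        "<http://dbpedia.org/resource/Category:Philanthropy> .\n",
    ]
    print(apply_filters(triples, exclude=["(SKOS)"]))
```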

Example Response

{ "http://dbpedia.org/resource/Terry_Fox_Run" :
   { "checksum" : "e5be6d2dee6eff0916cc169872c5f6155b10e9a7",
      "data_source" : "DUMP",
      "domain" : "dbpedia.org",
      "explicit_content" : [ 
      	"<http://dbpedia.org/resource/Terry_Fox_Run> 
      	<http://www.w3.org/2000/01/rdf-schema#label> 
      	\"Terry Fox Run\"@en .\n",
        "<http://dbpedia.org/resource/Terry_Fox_Run> 
        <http://www.w3.org/2004/02/skos/core#subject> 
        <http://dbpedia.org/resource/Category:Long-distance_races> .\n",
        "<http://dbpedia.org/resource/Terry_Fox_Run> 
        <http://www.w3.org/2004/02/skos/core#subject> 
        <http://dbpedia.org/resource/Category:Philanthropy> .\n",
        "<http://dbpedia.org/resource/Terry_Fox_Run> 
        <http://www.w3.org/2004/02/skos/core#subject> 
        <http://dbpedia.org/resource/Category:Charities_based_in_Canada> .\n",
        "<http://dbpedia.org/resource/Terry_Fox_Run> 
        <http://dbpedia.org/property/hasPhotoCollection> 
        <http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/Terry_Fox_Run> .\n",
        "<http://dbpedia.org/resource/Terry_Fox_Run> 
        <http://xmlns.com/foaf/0.1/homepage> 
        <http://www.terryfoxrun.org/> .\n"
        ],
      "format" : "RDF",
      "implicit_content" : [ 
      	"<http://www.terryfoxrun.org/> 
      	<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
      	<http://xmlns.com/foaf/0.1/Document> .\n",
        "<http://dbpedia.org/resource/Terry_Fox_Run> 
        <http://xmlns.com/foaf/0.1/isPrimaryTopicOf> 
        <http://www.terryfoxrun.org/> .\n",
        "<http://dbpedia.org/resource/Terry_Fox_Run> 
        <http://xmlns.com/foaf/0.1/page> 
        <http://www.terryfoxrun.org/> .\n"
        ],
      "label" : "Terry Fox Run",
      "length" : "844",
      "ontology" : [ "http://xmlns.com/foaf/spec/",
          "http://www.w3.org/2002/07/owl.rdf",
          "http://www.w3.org/2000/01/rdf-schema",
          "http://www.w3.org/1999/02/22-rdf-syntax-ns",
          "http://www.w3.org/2003/06/sw-vocab-status/ns.rdf",
          "http://www.w3.org/TR/skos-reference/skos.rdf"
        ],
      "size" : "6",
      "timestamp" : "2009-09-21T12:17:14.419",
      "url" : "http://dbpedia.org/resource/Terry_Fox_Run"
    }
 }
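
When the callback argument is supplied, the response is wrapped as callback({here the results}). A client outside the browser can strip that wrapper before parsing. The sketch below is one way to do so. The function name unwrap_jsonp is hypothetical, and tolerating a trailing semicolon is an assumption, since the exact wrapper text is not shown in this document.

```python
import json
import re


def unwrap_jsonp(body, callback):
    """Strip a callback(...) wrapper and parse the JSON inside it."""
    pattern = r"\s*" + re.escape(callback) + r"\s*\((.*)\)\s*;?\s*"
    match = re.fullmatch(pattern, body, re.DOTALL)
    if match is None:
        raise ValueError("body is not wrapped in the expected callback")
    return json.loads(match.group(1))


if __name__ == "__main__":
    wrapped = 'handle({"http://dbpedia.org/resource/Terry_Fox_Run": {"size": "6"}})'
    print(unwrap_jsonp(wrapped, "handle"))
```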

Cache API V2

Call examples:

The Cache API allows you to retrieve triples from cached documents using the document URI. To get the triples, you must specify url=URI of the document.
To get the implicit triples as well, add the parameter implicit=1.

The result format can be selected by HTTP content negotiation.
Content negotiation examples:

To get results in HTML:

curl -H "Accept: text/html" \
"http://api.sindice.com/v2/cache?url=http%3A%2F%2Fwordnet.rkbexplorer.com%2Fid%2Fword-Galway"

To get results in RDF:

curl -H "Accept: application/rdf+xml" \
"http://api.sindice.com/v2/cache?url=http%3A%2F%2Fwordnet.rkbexplorer.com%2Fid%2Fword-Galway"

To get results in plain text:

curl -H "Accept: text/plain" \
"http://api.sindice.com/v2/cache?url=http%3A%2F%2Fwordnet.rkbexplorer.com%2Fid%2Fword-Galway"
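
The same V2 calls can be prepared from Python. The sketch below only builds the request object with the chosen Accept header and the optional implicit=1 parameter. It does not send anything. The helper name build_v2_request is hypothetical, and the endpoint is the one used in the curl examples above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as it appears in the curl examples of this document.
V2_ENDPOINT = "http://api.sindice.com/v2/cache"


def build_v2_request(document_uri, accept="application/rdf+xml", implicit=False):
    """Prepare a Cache API V2 request (hypothetical helper, nothing is sent)."""
    params = [("url", document_uri)]
    if implicit:
        params.append(("implicit", "1"))  # also ask for implicit triples
    # The Accept header selects the result format by content negotiation.
    return Request(V2_ENDPOINT + "?" + urlencode(params), headers={"Accept": accept})


if __name__ == "__main__":
    req = build_v2_request("http://wordnet.rkbexplorer.com/id/word-Galway",
                           accept="text/plain")
    print(req.full_url)
    print(req.get_header("Accept"))
```

To perform the call, the prepared request can be passed to urllib.request.urlopen.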