The SIREn 1.0 Open Source Release and its Use in the Semantic Web Community

We are happy to announce the availability of SIREn 1.0 under the Apache License, Version 2.0. SIREn is the information retrieval engine that has been powering Sindice.com for the past few years. It has been developed as a plugin for Apache Lucene and Apache Solr to enable efficient indexing and searching of arbitrary structured documents, e.g., JSON or XML. SIREn's data model is a tree, whereas RDF's is a graph, so SIREn is NOT a replacement for a SPARQL endpoint. It can nevertheless be used in many interesting ways by the Linked Data / Semantic Web community. We explain below how to index and search RDF graphs using the JSON-LD format.

SIREn can be downloaded from its homepage, and the source code is available on GitHub.

Acknowledgements for this release must go to the European FP7 LOD2 project of which SIREn is a deliverable, and to the Irish Research Council for Science, Engineering and Technology which has supported this project.

Indexing and Searching JSON-LD with SIREn

SIREn’s API is based on the JSON data format, and is therefore compatible with JSON-LD, a JSON-based format for serialising Linked Data. To leverage the full power of SIREn and search not only on text and entities but also on relations between entities, one can use JSON-LD Framing to map a graph to a tree structure. Mature libraries for exporting RDF data to JSON-LD are available in various programming environments (json-ld.org maintains a list of implementations).
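To make the graph-to-tree idea concrete, here is a minimal sketch in plain Python. It is not the JSON-LD Framing algorithm and does not use any JSON-LD library: the `build_tree` helper, the abbreviated triples, and the `embed` parameter are all hypothetical names introduced for illustration. It only shows the core step, which is choosing a root node and embedding the nodes it links to instead of keeping them as flat references.

```python
import json

# Hypothetical, abbreviated (subject, predicate, object) triples for illustration.
TRIPLES = [
    ("store", "@type", "Store"),
    ("store", "name", "Links Bike Shop"),
    ("store", "product", "p:links-swift-chain"),
    ("store", "product", "p:links-speedy-lube"),
    ("p:links-swift-chain", "@type", "Product"),
    ("p:links-swift-chain", "price", 10),
    ("p:links-speedy-lube", "@type", "Product"),
    ("p:links-speedy-lube", "price", 5),
]


def build_tree(triples, root, embed):
    """Nest the graph under `root`, embedding objects of the predicates in `embed`."""
    by_subject = {}
    for subject, predicate, obj in triples:
        by_subject.setdefault(subject, []).append((predicate, obj))

    def expand(node, seen):
        tree = {"@id": node}
        for predicate, obj in by_subject.get(node, []):
            # Embed the linked node unless it is already on the current path,
            # which would otherwise recurse forever on a cyclic graph.
            if predicate in embed and obj in by_subject and obj not in seen:
                value = expand(obj, seen | {obj})
            else:
                value = obj
            if predicate in tree:
                # A repeated predicate becomes a JSON array.
                if not isinstance(tree[predicate], list):
                    tree[predicate] = [tree[predicate]]
                tree[predicate].append(value)
            else:
                tree[predicate] = value
        return tree

    return expand(root, {root})


tree = build_tree(TRIPLES, "store", embed={"product"})
print(json.dumps(tree, indent=2))
```

In practice, a JSON-LD library with framing support would do this work, including context handling and compaction, which the sketch ignores.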

With this set of tools, one could transform the following N-Triples document about an online bike shop:

<http://store.example.com/> <http://ns.example.com/store#product> <http://store.example.com/products/links-speedy-lube> .
<http://store.example.com/> <http://ns.example.com/store#product> <http://store.example.com/products/links-swift-chain> .
<http://store.example.com/> <http://purl.org/dc/terms/description> "The most \"linked\" bike store on earth!" .
<http://store.example.com/> <http://purl.org/dc/terms/title> "Links Bike Shop" .
<http://store.example.com/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ns.example.com/store#Store> .
<http://store.example.com/products/links-speedy-lube> <http://ns.example.com/store#category> <http://store.example.com/category/chains> .
<http://store.example.com/products/links-speedy-lube> <http://ns.example.com/store#category> <http://store.example.com/category/lube> .
<http://store.example.com/products/links-speedy-lube> <http://ns.example.com/store#price> "5"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://store.example.com/products/links-speedy-lube> <http://ns.example.com/store#stock> "20"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://store.example.com/products/links-speedy-lube> <http://purl.org/dc/terms/description> "Lubricant for your chain links." .
<http://store.example.com/products/links-speedy-lube> <http://purl.org/dc/terms/title> "Links Speedy Lube" .
<http://store.example.com/products/links-speedy-lube> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ns.example.com/store#Product> .
<http://store.example.com/products/links-swift-chain> <http://ns.example.com/store#category> <http://store.example.com/category/chains> .
<http://store.example.com/products/links-swift-chain> <http://ns.example.com/store#category> <http://store.example.com/category/parts> .
<http://store.example.com/products/links-swift-chain> <http://ns.example.com/store#price> "10"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://store.example.com/products/links-swift-chain> <http://ns.example.com/store#stock> "10"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://store.example.com/products/links-swift-chain> <http://purl.org/dc/terms/description> "A fine chain with many links." .
<http://store.example.com/products/links-swift-chain> <http://purl.org/dc/terms/title> "Links Swift Chain" .
<http://store.example.com/products/links-swift-chain> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ns.example.com/store#Product> .

into a JSON-LD document (the ‘@context’ part has been removed for readability):

{
    "@id": "http://store.example.com/",
    "@type": "Store",
    "name": "Links Bike Shop",
    "description": "The most \"linked\" bike store on earth!",
    "product": [
        {
            "@id": "p:links-swift-chain",
            "@type": "Product",
            "name": "Links Swift Chain",
            "description": "A fine chain with many links.",
            "category": ["cat:parts", "cat:chains"],
            "price": 10.00,
            "stock": 10
        },
        {
            "@id": "p:links-speedy-lube",
            "@type": "Product",
            "name": "Links Speedy Lube",
            "description": "Lubricant for your chain links.",
            "category": ["cat:lube", "cat:chains"],
            "price": 5.00,
            "stock": 20
        }
    ]
}

and then index it as a SIREn document, making it possible to search bike shops by their products, e.g., find all shops selling chains priced at 20 or less:

(@type : Store)
AND
(product : { category : chains, price : int([* TO 20]) })
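The intent of this query can be sketched in plain Python against the JSON-LD document above. This is only an illustration of how we read the query, not how SIREn evaluates it: the `matches` function is hypothetical, the category comparison on the part after the `cat:` prefix is a stand-in for the engine's text analysis, and the upper bound is treated as inclusive, following the usual Lucene range convention. The point to notice is that both conditions inside the braces must hold for the same product object.

```python
# Abbreviated copy of the JSON-LD document shown above.
doc = {
    "@id": "http://store.example.com/",
    "@type": "Store",
    "product": [
        {"@id": "p:links-swift-chain", "category": ["cat:parts", "cat:chains"], "price": 10},
        {"@id": "p:links-speedy-lube", "category": ["cat:lube", "cat:chains"], "price": 5},
    ],
}


def matches(store, category, max_price):
    """Return True if `store` is a Store with one product satisfying both conditions."""
    if store.get("@type") != "Store":
        return False
    for product in store.get("product", []):
        # Assumption: compare on the local part of the category identifier.
        in_category = any(c.split(":")[-1] == category for c in product.get("category", []))
        # Assumption: the range [* TO max_price] includes its upper bound.
        cheap_enough = product.get("price", float("inf")) <= max_price
        if in_category and cheap_enough:
            return True
    return False


print(matches(doc, "chains", 20))
print(matches(doc, "parts", 5))
```

The second call returns `False`: one product is in the `parts` category and another costs 5, but no single product satisfies both conditions, which is what distinguishes a nested query from two independent keyword matches.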

About Other Sindice Open Source Releases

This release follows the open source release of Sig.ma and, more notably, of our RDF extraction library Any23 (now a top-level Apache project).

We are currently requesting permission to release other components of Sindice, as we believe this would be quite beneficial to the community. These include the assisted SPARQL query editor, powered by the RDF graph summary. We hope to report back on this soon.


2 comments

  1. Comment by jaydeep  

    Hey,
    What tool should I use to crawl online RDF URIs and convert them to JSON-based content so that it can be indexed by SIREn?

  2. Comment by renaud  

    Hi,

    You can use the Any23 library [1] to convert structured data into RDF, then use the jsonld-java library [2] with OpenRDF Sesame to convert the RDF into JSON-LD documents.
