Sindice posts

  • Farewell to Giovanni and Renaud

    By Naoise Dunne with no comments.

    It is with sadness that the Sindice team here at Insight say goodbye to two of the original founders of Sindice.com, Dr. Giovanni Tummarello and Dr. Renaud Delbru. Renaud and Giovanni have been with the project since its inception and have been crucial for Sindice to become a showcase for what is possible using […]

  • The SIREn 1.0 Open Source Release and its Use in the Semantic Web Community

    By renaud with 2 comments.

    We are happy to announce the availability of SIREn 1.0 under the Apache License Version 2.0. SIREn is the information retrieval engine that has been powering Sindice.com these past years. SIREn has been developed as a plugin for Apache Lucene and Apache Solr to enable efficient indexing and searching of arbitrary structured documents, e.g., JSON […]

  • How we ingested 100M semantic documents in a day (And where do they come from)

    By Giovanni Tummarello with 4 comments.

    How to get some unexpected big data satisfaction. First: build an infrastructure to process millions of documents. Instead of just doing it home-brew, however, do your big data homework, no shortcuts. Second: unclog some long-standing clogged pipe. The feeling is that of “it all makes sense” and it happened to us the other […]

  • Updates on the Sindice SPARQL Endpoint

    By Giovanni Tummarello with no comments.

    SPARQL is really hard to beat as a tool to experiment with data integration across datasets. For this reason we get many requests about what data we have in our sparql.sindice.com, frequency of updates, etc. We admit there was a bit of disappointment as in the past months we were not able to keep up […]

  • Sig.ma Enterprise Edition (EE) available

    By Giovanni Tummarello with no comments.

    The original Sig.ma: The http://sig.ma service was created as a demonstration of live, on-the-fly Web of Data mashup. Provide a query and Sig.ma will demonstrate how the Web of Data is likely to contain surprising structured information about it (pages that embed RDF, RDFa, Microdata, Microformats). By using the Sindice search engine, Sig.ma allows a […]

  • Sindice reindexed: find your datasets (much faster)

    By Giovanni Tummarello with no comments.

    Having streamlined several procedures inside Sindice, rebuilding the Sindice index from scratch now takes just a few hours. Over the weekend, we built a new Sindice index based on the latest updates of SIREn and improvements to the pipelines. This is now in production and sports the following enhancements: Ranking no more big docs first […]

  • Searching infinite amounts of Web Data: The new Sindice Index and Frontend

    By Giovanni Tummarello with 3 comments.

    Several goals have kept the Sindice Team constantly busy in the past year or so. Luckily we’re now getting close to their deployment and today we’re happy to begin by introducing SIREn, the new Sindice core index, its supporting new frontend and the API. SIREn: Sindice’s own semantic search engine SIREn (Semantic Information Retrieval […]

  • Sindice migration

    By robert with no comments.

    This is mainly a test post to verify that the Sindice blog continues to work after migrating it to a new server. But it is also a good opportunity to briefly mention the upgrades we are making to the Sindice infrastructure. I’m happy to report that Sindice has been suffering from some growing pains over […]

  • Sindice now supports Efficient Data discovery and Sync

    By Giovanni Tummarello with 1 comment.

    So far semantic web search engines and semantic aggregation services have been inserting datasets by hand or have been based on “random walk” like crawls with no data completeness or freshness guarantees. After quite some work, we are happy to announce that Sindice is now supporting effective large scale data acquisition with *efficient syncing* capabilities based on […]

  • Sindice planned downtime this weekend

    By robert with no comments.

    Hi. Due to an expansion of one of our datacentres (and the electrical work that this implies), Sindice and related services such as sig.ma will be down from 1730 GMT+1, 11-Jun (Friday) to 1730 GMT+1, 12-Jun (Saturday). This major upgrade will give us increased room to grow the Sindice infrastructure over time. On 27-May we […]

About this blog

In this blog you'll find announcements related to the Sindice project, as well as news about Semantic Web topics or technical issues strictly related to the search engine.

Categories