Sindice @ 20+ Million and Openings

Work on Sindice is proceeding at full speed, and so is the indexing of the Semantic Web.

Sindice now indexes over 20 million Semantic Web documents (21.5 million as I type) and will usually index your submitted RDF in less than 30 minutes. This great result is entirely due to the dedication of the Sindice development team.

Some of the geeky bits :-). We now have version 2 of the indexing pipeline up and running (Renaud, Eyal, Michele).

The Sindice indexing pipeline does a job that is anything but trivial, and it does it at an amazing speed.

Basically, each document is integrated by recursively resolving the URIs of the properties and classes in use, thus calculating a “Web closure” of the explicitly or implicitly imported ontologies. Once this is done, reasoning is applied using RDFS and some OWL constructs (e.g. FunctionalProperty, TransitiveProperty, sameAs, inverseOf, InverseFunctionalProperty, SymmetricProperty). Sindice has done this for each of the 22 million sources independently, in less than 3 weeks (plus the actual indexing and all sorts of other processes), on a relatively small cluster (4-6 Xeon cores). Not bad? :-)
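To make the idea concrete, here is a minimal Python sketch of the two steps described above: computing a “Web closure” and then reasoning over it. It is not the actual Sindice pipeline. The `FAKE_WEB` dictionary, the `ex:` URIs, and the function names are all made up for illustration, fetching is simulated in memory rather than done over HTTP, and only two rules (owl:SymmetricProperty and rdfs:subClassOf) are applied.

```python
from collections import deque

RDF_TYPE = "rdf:type"
OWL_IMPORTS = "owl:imports"

# Hypothetical stand-in for the Web: URI -> set of (subject, predicate, object) triples.
FAKE_WEB = {
    "ex:doc": {("ex:alice", "ex:knows", "ex:bob"),
               ("ex:alice", RDF_TYPE, "ex:Person")},
    "ex:knows": {("ex:knows", RDF_TYPE, "owl:SymmetricProperty")},
    "ex:Person": {("ex:Person", "rdfs:subClassOf", "ex:Agent")},
}

def fetch(uri):
    """Simulate dereferencing a URI; a real pipeline would do HTTP and parse RDF."""
    return FAKE_WEB.get(uri, set())

def vocabulary_uris(triples):
    """URIs of the properties and classes in use, plus explicit owl:imports targets."""
    uris = set()
    for s, p, o in triples:
        uris.add(p)                      # every predicate is a property in use
        if p in (RDF_TYPE, OWL_IMPORTS):
            uris.add(o)                  # the class used, or the imported ontology
    return uris

def web_closure(doc_uri):
    """Resolve vocabulary URIs recursively (breadth-first) until nothing new appears."""
    seen = {doc_uri}
    queue = deque([doc_uri])
    triples = set()
    while queue:
        new = fetch(queue.popleft())
        triples |= new
        for uri in vocabulary_uris(new) - seen:
            seen.add(uri)
            queue.append(uri)
    return triples

def infer(triples):
    """Apply the symmetric-property and subclass rules until a fixpoint is reached."""
    facts = set(triples)
    while True:
        symmetric = {s for s, p, o in facts
                     if p == RDF_TYPE and o == "owl:SymmetricProperty"}
        subclass = {(s, o) for s, p, o in facts if p == "rdfs:subClassOf"}
        derived = set()
        for s, p, o in facts:
            if p in symmetric:
                derived.add((o, p, s))
            if p == RDF_TYPE:
                derived |= {(s, RDF_TYPE, sup) for sub, sup in subclass if sub == o}
        if derived <= facts:
            return facts
        facts |= derived

closure = web_closure("ex:doc")
inferred = infer(closure)
```

In this toy example the document itself never says that ex:bob knows ex:alice, or that ex:alice is an ex:Agent. Both statements only appear after the vocabulary definitions have been pulled in and the rules applied, which is why the closure has to be computed per document before reasoning.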

Thanks to this processing, we can be as precise and complete as possible in solving tasks such as computing the IFP (inverse functional property) index, composing human-readable descriptions of documents, and powering the forthcoming entity-based APIs as well as we can.
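As a rough illustration of what an IFP index is for, the Python sketch below groups resources from different documents that share a value for an inverse functional property, since such a value identifies at most one entity. This is an assumption about the general technique, not a description of how Sindice builds its index; the document URIs, the sample data, and the function names are invented, and the set of IFPs is hard-coded here, whereas in practice it would come out of the reasoning step.

```python
from collections import defaultdict

# Assumed set of inverse functional properties (both are declared as such in FOAF).
IFPS = {"foaf:mbox", "foaf:homepage"}

# Hypothetical documents: document URI -> list of (subject, predicate, object) triples.
DOCS = {
    "http://example.org/a.rdf": [
        ("_:p1", "foaf:name", "Alice"),
        ("_:p1", "foaf:mbox", "mailto:alice@example.org"),
    ],
    "http://example.org/b.rdf": [
        ("_:x", "foaf:mbox", "mailto:alice@example.org"),
        ("_:y", "foaf:mbox", "mailto:bob@example.org"),
    ],
}

def build_ifp_index(docs):
    """Map each (IFP, value) pair to the (document, subject) pairs that use it."""
    index = defaultdict(set)
    for doc_uri, triples in docs.items():
        for s, p, o in triples:
            if p in IFPS:
                index[(p, o)].add((doc_uri, s))
    return index

def co_referent(index):
    """Keep only the keys shared by more than one resource: likely the same entity."""
    return {key: refs for key, refs in index.items() if len(refs) > 1}

index = build_ifp_index(DOCS)
same_entity = co_referent(index)
```

Here `same_entity` contains a single key, the shared mailbox, pointing at the two blank nodes from the two documents, which can then be treated as descriptions of the same person.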

Notably, all large datasets (e.g. the huge UniProt) are now proudly processed using our brand-new Hadoop-based Semantic Sitemap processor, special courtesy of Holger Stenzhorn, who joined the team last month.

Sindice is Hiring!

In the context of the EU project OKKAM, which starts in January, we are now looking for candidates who are interested in developing highly scalable and innovative Semantic Web infrastructures and applications. Positions include Interns, Masters, Ph.D.s, Postdocs, and Scientific Developers.

While we of course highly value academic brilliance, we’re especially looking for candidates who, like us, believe that it is through clever but hard-core software engineering and development that we can make a difference on the Semantic Web.

Successful candidates will be rewarded with top salaries and working conditions.

Post filed under Announcements.

6 comments

  1. Comment by Shantanu  

    The work is really interesting. One would have worked on this project even without the salary

  2. Comment by Vinay Kumar  

    I’m a student from India, and I would love to work here as an intern. How do I apply?

  3. Comment by Jeremy Flowers  

    Saw a thread on LinkedIn with item posted by David Peterson that mentioned your site under Semantic Web group. I’ve been developing a growing interest in this type of stuff. Web Crawlers Bots, Data extraction etc.
    Would be keen to know what reading material you’ve found the best?
    I’ve been thinking of writing a search engine to search for IT jobs with real employers (not the fake ones agencies post on job boards) and bringing back results in geographic radius.
    I’ve recently been reading Lucene in Action, HTTP Programming Bots in Java (Jeff Heaton). Web Content Mining with Java (Tony Loton). He had good way of creating a string representing DOM structure then using wildcards to extract rows of data, so you ended up with an SQL like tool to get data out of tables..
    I’ve also got the Collective Intelligence in Action on order.. Due out any day..
    Saw your note about candidates above. This kind of stuff does appeal to me. But I think I’d have some catching up to do to get to the level you folks are at.

    PS:Was looking at WebMonkey today too. Thinking I need to understand Microformats/RDF better too. A lot to learn! But I’m up for it!

  4. Pingback from Blogabriel » Traduction : How to Publish Linked Data on the Web? (10/10)  

    [...] developed by DERI Ireland, currently indexes over 20 million RDF documents. See also their ISWC paper Sindice.com: Weaving the Open Linked [...]

  5. Pingback from Blogabriel » Traduction française : How to Publish Linked Data on the Web?  

    [...] developed by DERI Ireland, currently indexes over 20 million RDF documents. See also their ISWC paper Sindice.com: Weaving the Open Linked [...]

  6. Pingback from Traduction : How to Publish Linked Data on the Web? (10/10) « Blogabriel  

    [...] developed by DERI Ireland, currently indexes over 20 million RDF documents. See also their ISWC paper Sindice.com: Weaving the Open Linked [...]
