Sigma - Live views on the Web of Data

Today we release Sigma. Hurray \o/ ! Sigma is a fairly advanced application implemented on top of Sindice which gives very visual, interactive access to the “Web of Data” as a whole. The best thing to do, really, is to watch the screencast. Bear with the first 60 seconds, where I introduce the Web of Data; it moves pretty fast after that.

While the demo is, we hope, agreeably cool :-) there is more to talk about.

While Sigma is by no means the first data aggregator for the Semantic Web, its contribution is to show that the sum is really greater than its parts, and that exciting possibilities lie in a holistic approach to automatic semistructured data discovery and consolidation.

In Sigma, elements such as large-scale Semantic Web indexing, logic reasoning, data aggregation heuristics, pragmatic ontology alignments and, last but not least, user interaction and refinement all play together to provide entity descriptions which become live, embeddable data mashups.

An interesting example:

When we first saw the B&W pictures (e.g. see the demo) pop up automatically the first time we ran Sigma, we were really excited: that DERI data had been there forever, yet never meaningfully used or integrated, let alone automatically! The DERI RDF file does not reuse the right URIs for people, doesn’t use inverse functional properties such as “email”, and uses only one of many ways to say “author”.

But here it was! That file was there, discovered automatically and contributing marvelously to the mashup: providing information about papers (including technical reports that would not be listed otherwise), an extra picture, the phone number, a confirmation of the personal homepage, research projects and more.

Note: this doesn’t mean the DERI file is bad at all, actually. It’s simply not unrealistically great; in other words, it was created with a realistic effort, the same we can expect from any data publisher.

There was no way to get that very useful data with classic Semantic Web inference and rule-based consolidation alone. All it took instead was a mix of Semantic Web practices and tricks with pragmatic heuristics and elements of soft computing (quite basic ones, indeed).
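To make the consolidation idea concrete, here is a minimal sketch in plain Python of merging two descriptions that never mention each other but share the value of an inverse functional property (the classic example is foaf:mbox, an email address). All records, URIs and values below are made up for illustration; real aggregators work on RDF graphs, not dicts.

```python
# Minimal sketch of IFP-based entity consolidation.
# Records come from different sources and use different URIs for the
# same person; a shared inverse functional property value (here "mbox")
# is enough to merge them. All data below is made up.

def consolidate(records, ifps=("mbox",)):
    """Group records whose IFP values overlap, then merge each group."""
    groups = []  # each group: (set of (ifp, value) keys, merged dict of sets)
    for rec in records:
        keys = {(p, v) for p in ifps for v in rec.get(p, [])}
        merged = {p: set(vs) for p, vs in rec.items()}
        # absorb any existing groups that share an IFP value with this record
        for g in [g for g in groups if g[0] & keys]:
            groups.remove(g)
            keys |= g[0]
            for p, vs in g[1].items():
                merged.setdefault(p, set()).update(vs)
        groups.append((keys, merged))
    return [m for _, m in groups]

source_a = {"uri": ["http://example.org/people/42"],
            "name": ["Foo Guy"], "mbox": ["foo@example.org"]}
source_b = {"uri": ["http://deri.example/staff#fooguy"],
            "mbox": ["foo@example.org"], "phone": ["+353-1-555-0000"]}
source_c = {"uri": ["http://example.org/people/7"],
            "name": ["Someone Else"], "mbox": ["else@example.org"]}

entities = consolidate([source_a, source_b, source_c])
print(len(entities))  # 2: sources a and b merged on the shared mbox
```

Note that the merge happens even though the two sources use completely different URIs for the person, which is exactly the situation described above.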

In our opinion it all makes sense and inspires the following thoughts:

  1. A little semantics might in fact go a long way: there could have been nothing comparable to Sigma had we not had a large core of semantically structured data (the Web of Data itself). Publish way more, please! In whatever format can be consolidated to RDF.
  2. … and it goes an even longer way when the user is involved and can, with pragmatic actions (e.g. “reject” or “approve”), steer and validate the results.
  3. For data publishers: just as on the HTML web, you can care only about your own site. If you don’t reuse other people’s URIs, don’t put “sameAs” links, or don’t use exactly the ontology everyone else is using, then... it will most likely work all the same, for most applications!
  4. … but overdescribe.
    Be verbose with your semantic descriptions, more than you would be for a human. A well-described entity is the best possible “entity identifier” one could think of: it will automatically generate invisible but robust links to other entity descriptions. So don’t just write name = fooguy; make sure you expose all you have (and are willing to share) and let aggregation engines use this data to do the best consolidation possible. Good descriptions will also make you show up more often in semantic aggregations, foster new applications and make people more likely to integrate with you.
  5. For data consumers: we are really working for you and are willing to do the hard work.
    This is again very similar to the HTML world. How difficult is it to make sense of all the broken HTML out there? Very! How many people really have to do it? Just a few: the browser makers. Others can reuse their efforts and concentrate on other aspects. Sigma and Sindice are engines that do the hard part for you as a Web of Data developer. We provide open services and open source components (heck, at the end of the week we’re even releasing our index as open source :-), with the reasoning engine next). If there is interest and a market, others will come and there will be more choice.

So let me conclude with a sigma of Stefan Decker :-) (50 sources; sigma “Stefan Decker” + additional info “Stefan Decker DERI”, with a couple of sources manually added or deleted).

The rest follows from the small FAQ on the about page. Cheers!

Why is this potentially revolutionary?

As appropriate data sources become available (pages annotated with RDFa or Microformats), Sigma is in a different league in terms of information richness and precision compared to methods based solely on web text analysis. Sigma can be used by humans and software agents alike to obtain structured data about any entity.

Is Sigma noise free?

Not yet. Sigma still employs heuristics for many aspects and has to deal with the heterogeneous data in the current Web of Data – a very early-stage environment! What we can say, however, is:

  1. Sigma is interactive and can learn from its usage: when a user deletes a piece of information or a source, Sigma takes note, and that piece of information is less likely to show up again later.
  2. We have deliberately chosen very simple strategies at this point, to test the general idea rather than advanced strategies: the potential for improvement is tremendous.
  3. The Web of Data itself is very new: until very recently there was basically no way to see this data in action, and markup has been done in a best-effort, hacker-enthusiasm, leap-of-faith way. Now that Google and Yahoo are starting to recognize the value of page markup, it is realistic to expect improvements in data coverage and quality.
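One plausible way to implement the “learning from usage” in point 1 is feedback-driven source scoring: each source starts with a neutral score, rejections lower it, approvals raise it, and sources below a threshold are dropped from future mashups. The update rule and all numbers below are our own assumptions for illustration, not Sigma’s actual code.

```python
# Sketch of feedback-driven source scoring: a plausible way to "learn
# from usage". Every source starts with a neutral score; each rejection
# moves it toward 0, each approval toward 1, and sources below a
# threshold are dropped. Update rule and constants are assumptions.

class SourceScores:
    def __init__(self, threshold=0.3):
        self.scores = {}          # source URL -> score in [0, 1]
        self.threshold = threshold

    def score(self, src):
        return self.scores.setdefault(src, 0.5)  # neutral prior

    def feedback(self, src, approved):
        s = self.score(src)
        target = 1.0 if approved else 0.0
        # move 30% of the way toward the target on each click
        self.scores[src] = s + 0.3 * (target - s)

    def keep(self, sources):
        return [s for s in sources if self.score(s) >= self.threshold]

m = SourceScores()
for _ in range(3):                                   # three user rejections
    m.feedback("http://spammy.example/data.rdf", approved=False)
m.feedback("http://deri.example/people.rdf", approved=True)
print(m.keep(["http://spammy.example/data.rdf",
              "http://deri.example/people.rdf"]))
# only the approved source survives the threshold
```

After three rejections the score decays to 0.5 × 0.7³ ≈ 0.17, below the 0.3 threshold, so the rejected source no longer shows up: exactly the “less likely to show back” behaviour described above.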

Why does my phone number/picture/favourite movie not appear?

Pages exposing RDF, RDFa or Microformats will appear. If you or your company want information to be found on the Web of Data, it is very simple: mark up your HTML using RDFa, then submit it to Sindice. You will find it returned by Sigma within 10-15 minutes.
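As a sketch of what that markup step looks like, here is a small Python helper that wraps personal details in RDFa using the FOAF vocabulary. The generator function and the example values are our own illustration (not a Sindice/Sigma API); the FOAF property names (foaf:name, foaf:phone, foaf:homepage) are standard.

```python
# Sketch: generating RDFa-annotated HTML for a person, using the FOAF
# vocabulary. The helper function and values are illustrative only.

from html import escape

def person_rdfa(uri, name, phone, homepage):
    """Return an HTML fragment describing a foaf:Person in RDFa."""
    return f"""<div xmlns:foaf="http://xmlns.com/foaf/0.1/"
     about="{escape(uri, quote=True)}" typeof="foaf:Person">
  <span property="foaf:name">{escape(name)}</span>
  <span property="foaf:phone">{escape(phone)}</span>
  <a rel="foaf:homepage" href="{escape(homepage, quote=True)}">homepage</a>
</div>"""

fragment = person_rdfa("http://example.org/me", "Foo Guy",
                       "+353-1-555-0000", "http://fooguy.example.org")
print(fragment)
```

Dropping a fragment like this into an existing page is usually all it takes: the visible text stays the same, while crawlers can extract the triples.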

How is Sigma built? Can I build applications like it?

Sigma is enabled by Sindice, an index of the Web of Data. Thanks to Sindice, Sigma can accurately locate sources of web data using not only text but also precise attribute-value searches and more. Sindice is alive and growing, constantly finding new information, receiving “pings” and immediately adding new documents. Where to start? Please write on our forum.

Acknowledgements

Sigma and Sindice are built at DERI, mainly within the OKKaM project (ICT-215032), but also with the support of Science Foundation Ireland under Grant No. SFI/02/CE1/I131, of the ROMULUS project (ICT-217031) and of the iMP project.

R&D by Michele Catasta, Richard Cyganiak, Szymon Danielczyk and Giovanni Tummarello.


