Metadata extractions
Contents
- Status
- Overview
- Processing metadata with Any23
- Any23 Extractors
- Any23 Microformats Nesting
- Verify metadata with Sindice Inspector
Status
- Major improvements to the document. Added reference to Any23 external documentation. Migrated from Microformats to Metadata document topic.
- Added support for rel-license, hListing, EXFN. Changed archive format to accommodate new spec.
- General improvements to the document.
- First public release.
Overview
This section describes how Sindice extracts metadata from the Web and which formats are supported.
Metadata parsed from HTTP resources (HTML, RDF-specific formats, CSV, etc.) is converted to RDF graphs. These graphs are enhanced, stored and indexed by the Sindice backend components,
which make them available for querying through several Sindice services, such as SIREn, the SPARQL endpoint and the Sindice Cache API.
See the full list of the Sindice open API and Tools.
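The conversion from embedded metadata to RDF can be illustrated with a minimal sketch. It is not Sindice's or Any23's implementation: it handles only rel-license links, uses Python's standard HTML parser, and assumes the XHTML vocabulary term `license` as the predicate. The function names `extract_license_triples` and `to_ntriples` are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# XHTML vocabulary term for license links (assumed here as the predicate).
LICENSE_PREDICATE = "http://www.w3.org/1999/xhtml/vocab#license"


class RelLicenseParser(HTMLParser):
    """Collects href values of <a> and <link> elements with rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, or no value at all.
        rels = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "license" in rels and href:
            self.licenses.append(href)


def extract_license_triples(page_url, html):
    """Return (subject, predicate, object) triples, one per license link."""
    parser = RelLicenseParser()
    parser.feed(html)
    # Relative links are resolved against the URL of the page itself.
    return [(page_url, LICENSE_PREDICATE, urljoin(page_url, href))
            for href in parser.licenses]


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join("<%s> <%s> <%s> ." % t for t in triples)
```

Calling `extract_license_triples` with a page URL and its HTML yields a small RDF graph as a list of triples, which `to_ntriples` turns into text that an RDF store could load.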
Processing metadata with Any23
The core extraction library used by Sindice is Anything To Triples (Any23); Sindice runs the latest stable release.
Any23 is a library, a Web service and a set of command-line tools written in Java
for extracting structured data in RDF format from a variety of Web documents.
Any23 will be maintained in the Google Code repository until the 0.7.0 release,
then the development infrastructure will be migrated to the Apache Any23 site.
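Because Any23 is also offered as a Web service, a client can ask it to extract triples from a document by URL. The sketch below only builds such a request URL and does not send it. The query parameter names `format` and `uri`, the `turtle` format value and the base URL `http://any23.example/` are assumptions made for illustration; check the Any23 service documentation for the actual interface.

```python
from urllib.parse import urlencode


def build_any23_request(service_base, document_uri, output_format="turtle"):
    """Build a GET URL asking an Any23-style service to extract a document.

    The parameter names "format" and "uri" are assumptions, not confirmed
    by this document.
    """
    query = urlencode({"format": output_format, "uri": document_uri})
    return service_base.rstrip("/") + "/?" + query


# Hypothetical service location, used only as a placeholder.
request_url = build_any23_request("http://any23.example/", "http://example.org/page")
```

Percent-encoding the document URI keeps its own query string, if any, from being confused with the parameters of the service request.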
Any23 Extractors
Any23 supports multiple input and output metadata formats including:
- Microformats
- RDFa 1.0, 1.1
- Microdata
- and more ...
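The three embedded syntaxes listed above leave different traces in HTML markup, which the following sketch uses to guess which ones a page contains. It is a rough heuristic for illustration, not the way Any23 selects its extractors; the function name `detect_syntaxes` is hypothetical and the attribute and class lists are deliberately incomplete.

```python
from html.parser import HTMLParser

# Attributes introduced by RDFa and by Microdata.
RDFA_ATTRS = {"about", "property", "typeof", "vocab"}
MICRODATA_ATTRS = {"itemscope", "itemtype", "itemprop"}
# A few well-known microformat root class names; the real list is longer.
MICROFORMAT_ROOTS = {"vcard", "vevent", "hentry", "hreview", "hlisting", "hresume"}


class SyntaxSniffer(HTMLParser):
    """Records which embedded-metadata syntaxes appear in a document."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        names = set(attr)
        if names & RDFA_ATTRS:
            self.found.add("rdfa")
        if names & MICRODATA_ATTRS:
            self.found.add("microdata")
        # Microformats are signalled by class names or rel values.
        classes = set((attr.get("class") or "").split())
        rels = set((attr.get("rel") or "").lower().split())
        if classes & MICROFORMAT_ROOTS or "license" in rels:
            self.found.add("microformats")


def detect_syntaxes(html):
    """Return a sorted list of the syntaxes detected in the HTML string."""
    sniffer = SyntaxSniffer()
    sniffer.feed(html)
    return sorted(sniffer.found)
```

A page can mix syntaxes, so the result is a list rather than a single label.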
Any23 Microformats Nesting
Any23 adds specific structural statements to express the nesting relationship of Microformat metadata.
The logic of these statements is described in the Microformat Nesting section of the documentation.
Verify metadata with Sindice Inspector
The main purpose of the Sindice Inspector is to provide a verification and visualization tool for Semantic Web metadata.
It also supports web engineers who want to evaluate third-party metadata content, which can then be
pinged to Sindice and used in data mashup scenarios.
The Sindice Inspector uses Any23 to extract metadata, so it can be used to verify how Sindice reads any data exposed on the Web.
A live demo of the Any23 Web Service alone is available here.