What Is This Thing Called Linked Data?*

DocEng2015 Tutorial

*This title is inspired by Alan Chalmers' book "What Is This Thing Called Science?"


The Linked Data initiative has made it possible for the web to evolve from a global information space in which only documents are linked to one in which both documents and data are linked: a web of documents and data. This tutorial, held during the 15th ACM SIGWEB International Symposium on Document Engineering, September 8-11, 2015, in Lausanne, Switzerland, aims to give an overview of the principles, models and technologies underlying Linked Data.

Topic and Goal

The Linked Data initiative promotes exposing, sharing and connecting structured data on the Web [2,6]. Linked Data has made a large amount of data available in the form of RDF triples: today there are more than 89 billion triples, distributed over datasets from many different domains such as media, life sciences, and geography, to name a few. As a result, the Web has gone from being a global information space in which only documents are linked to one in which both documents and data are linked: a web of documents and data. This has attracted many organizations that today publish and consume linked data, ranging from public and private companies (BBC, New York Times) to public institutions (UK government).

The goal of this tutorial is to present an overview of the principles, models and technologies of Linked Data. We will start by reviewing Tim Berners-Lee's Linked Data principles and motivating Linked Data and its precursor, the Semantic Web. We will explain the RDF data model for distributing data on the Web. Then, we will introduce the SPARQL protocol and query language for RDF data. We will finish by briefly introducing semantic modeling with the RDF-S and OWL ontology languages. Below we elaborate on these topics.

Tutorial's Content

The tutorial will consist of two sessions, one dedicated to theory and the other to practice. The theoretical session will mainly cover the topics summarized below.

Linked Data Principles

Tim Berners-Lee [1] outlined four main principles that are the guidelines for publishing linked data on the web:

  1. use URIs to name (identify) things,
  2. use HTTP URIs so that people can look up these names,
  3. when someone looks up a URI, provide useful information using open standards such as RDF, SPARQL, etc., and
  4. include links to other URIs so that people can discover more things.
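
As an illustration of principles 2 and 3, the following minimal Python sketch builds (but does not send) an HTTP request that looks up a URI and asks for an RDF representation through the Accept header. The URI and the helper name build_lookup_request are hypothetical, and a real server may offer other RDF media types than text/turtle.

```python
import urllib.request


def build_lookup_request(uri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Build (without sending) an HTTP GET request asking for an RDF representation."""
    return urllib.request.Request(uri, headers={"Accept": media_type}, method="GET")


# Hypothetical URI naming a thing; used only for illustration.
request = build_lookup_request("http://example.org/id/lausanne")

print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the actual lookup; it is left out here so that the sketch runs without network access.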

Distributing Data on the Web with RDF

RDF (Resource Description Framework) [4] provides a graph model for publishing and interlinking data on the web. RDF describes web resources by using URIs both to identify resources and to represent (binary) relations between them. RDF not only provides a graph model but also serves as a foundation for other standards for querying data (like SPARQL) and reasoning over data (RDF-S, OWL).
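
The graph model can be sketched in a few lines of Python: each triple is a (subject, predicate, object) edge whose three components are URIs. This is a simplified illustration that ignores literals and blank nodes; the http://example.org/ namespace and the resource names are assumptions made for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI naming the binary relation
    obj: str        # URI of the related resource


EX = "http://example.org/"  # hypothetical namespace

graph = {
    Triple(EX + "DocEng2015", EX + "heldIn", EX + "Lausanne"),
    Triple(EX + "Lausanne", EX + "locatedIn", EX + "Switzerland"),
}


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines, sorted for stable output."""
    return "\n".join(
        f"<{t.subject}> <{t.predicate}> <{t.obj}> ." for t in sorted(triples)
    )


def neighbours(triples, node):
    """Return the resources reachable from `node` by following one edge."""
    return {t.obj for t in triples if t.subject == node}


print(to_ntriples(graph))
print(neighbours(graph, EX + "DocEng2015"))
```

Because the graph is just a set of triples, merging two datasets amounts to a set union, which is one reason the model suits interlinking data from different sources.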

Querying Linked Data with SPARQL

SPARQL [5] builds on RDF and provides:

  1. a query language for accessing RDF graphs;
  2. an XML format for representing the results of a query; and
  3. a protocol to submit a query to a remote server and receive the results through HTTP.

Linked Data applications typically rely on SPARQL for consuming linked open data.
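
The three components listed above can be sketched together in Python: a query string, the URL of a protocol request that carries it in the query parameter, and a parser for the XML results format. The endpoint, the query and the sample result document are hypothetical; no request is actually sent.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ENDPOINT = "http://example.org/sparql"  # hypothetical SPARQL endpoint

# 1. A query over an RDF graph (the URIs are illustrative).
QUERY = """
SELECT ?city WHERE {
  <http://example.org/DocEng2015> <http://example.org/heldIn> ?city .
}
"""


def build_query_url(endpoint, query):
    """3. Protocol: encode the query as the `query` parameter of an HTTP GET URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


# 2. A hand-written sample of the XML results format, standing in for a server reply.
SAMPLE_RESULTS = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="city"/></head>
  <results>
    <result>
      <binding name="city"><uri>http://example.org/Lausanne</uri></binding>
    </result>
  </results>
</sparql>"""

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}


def parse_bindings(xml_text):
    """Turn a SPARQL XML result document into a list of {variable: value} rows."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            value = binding[0]  # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows


print(build_query_url(ENDPOINT, QUERY))
print(parse_bindings(SAMPLE_RESULTS))
```

A client would typically send the URL with the header Accept: application/sparql-results+xml and pass the response body to parse_bindings.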

Semantic Modeling with RDF-S and OWL

Even though Linked Data needs no more than RDF and SPARQL (apart from URIs and HTTP) to be deployed, we consider that a tutorial on Linked Data should also briefly introduce the RDF-S [3] and OWL [7] ontology languages. These languages make it possible to specify the semantics of the vocabularies used to describe data, which supports the correct interpretation and use of these data. Furthermore, because the semantics is specified logically, it is possible to reason over data and, for example, detect logical inconsistencies in datasets.
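
To give a flavour of such reasoning, the following Python sketch applies two RDF-S entailment patterns (subclass transitivity and type propagation along rdfs:subClassOf) until nothing new is derived, then checks owl:disjointWith statements. It is a toy illustration under simplifying assumptions, not a complete RDF-S or OWL reasoner; the ex: names are hypothetical and prefixed names stand in for full URIs.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DISJOINT = "owl:disjointWith"

# Hypothetical vocabulary and data.
triples = {
    ("ex:Tutorial", SUBCLASS, "ex:Event"),
    ("ex:Event", SUBCLASS, "ex:Thing"),
    ("ex:Event", DISJOINT, "ex:Person"),
    ("ex:t1", RDF_TYPE, "ex:Tutorial"),
    ("ex:t1", RDF_TYPE, "ex:Person"),
}


def rdfs_closure(triples):
    """Add subclass-transitivity and type-propagation inferences until a fixpoint."""
    inferred = set(triples)
    while True:
        new = set()
        for (a, p, b) in inferred:
            if p != SUBCLASS:
                continue
            for (s, q, o) in inferred:
                if q == SUBCLASS and s == b:   # a ⊑ b, b ⊑ o  =>  a ⊑ o
                    new.add((a, SUBCLASS, o))
                if q == RDF_TYPE and o == a:   # s : a, a ⊑ b  =>  s : b
                    new.add((s, RDF_TYPE, b))
        if new <= inferred:
            return inferred
        inferred |= new


def inconsistencies(triples):
    """Return (individual, class1, class2) for individuals typed with two disjoint classes."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return {
        (x, c1, c2)
        for (c1, p, c2) in triples if p == DISJOINT
        for x, classes in types.items() if c1 in classes and c2 in classes
    }


closure = rdfs_closure(triples)
print(inconsistencies(triples))  # explicit statements only
print(inconsistencies(closure))  # after inference
```

In this example the clash between ex:Event and ex:Person only becomes visible after inference, since ex:t1 is an ex:Event only by virtue of the subclass statement.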

Hands-on Session

The second session of the tutorial will be devoted to practical work: participants will learn how to use different tools to transform raw data into linked open data and how to query linked data. For this, we will use the OpenRefine tool and Apache Jena ARQ.
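
The idea behind the transformation step can be sketched in plain Python, independently of the tools used in the session: each row of a raw table becomes a resource with its own URI, and each remaining column becomes a property with a literal value. The sample data, the http://example.org/ namespace and the id/ and prop/ URI patterns are assumptions made for this illustration; this is not how OpenRefine itself is operated.

```python
import csv
import io
import urllib.parse

BASE = "http://example.org/"  # hypothetical namespace

# Hypothetical raw data, as it might come from a spreadsheet export.
RAW = """id,name,country
lausanne,Lausanne,Switzerland
"""


def rows_to_ntriples(csv_text, base=BASE):
    """Map each CSV row to N-Triples lines: one subject per row, one triple per column."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base}id/{urllib.parse.quote(row['id'])}>"
        for column, value in row.items():
            if column == "id":
                continue  # the id column is used to mint the subject URI
            # Escape backslashes and double quotes inside the literal.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{base}prop/{column}> "{escaped}" .')
    return lines


for line in rows_to_ntriples(RAW):
    print(line)
```

Replacing literal values such as the country name with links to URIs in other datasets is what turns this output into linked data in the sense of the fourth principle.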

References

[1] T. Berners-Lee. Linked Data - Design Issues. 2006. http://www.w3.org/DesignIssues/LinkedData.html
[2] C. Bizer, T. Heath, and T. Berners-Lee. Linked Data - The story so far. International Journal on Semantic Web and Information Systems, 5(3):1-22, 2009.
[3] D. Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation 10 February 2004. http://www.w3.org/TR/rdf-schema/
[4] R. Cyganiak, D. Wood, and M. Lanthaler. RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation 25 February 2014. http://www.w3.org/TR/rdf11-concepts/
[5] S. Harris and A. Seaborne. SPARQL 1.1 Query Language. W3C Proposed Recommendation 8 November 2012. http://www.w3.org/TR/sparql11-query/
[6] T. Heath and C. Bizer. Linked Data - Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web, Morgan & Claypool Pub., 2011.
[7] P. Hitzler, M. Krötzsch, B. Parsia, P. F. Patel-Schneider, and S. Rudolph. OWL 2 Web Ontology Language Primer (Second Edition). W3C Recommendation 11 December 2012. http://www.w3.org/TR/owl2-primer/