The Linked Data initiative has made it possible for the web to evolve from a global information space in which only documents are linked to one in which both documents and data are linked: a web of documents and data. This tutorial, held at the 15th ACM SIGWEB International Symposium on Document Engineering (September 8-11, Lausanne, Switzerland), aims to give an overview of the principles, models and technologies underlying Linked Data.
The Linked Data initiative promotes exposing, sharing and connecting structured data on the Web [2,6]. It has brought a large amount of data in the form of RDF triples, nowadays amounting to more than 89 billion triples distributed over datasets from many different domains, such as media, life sciences and geography. As a result, the Web has gone from being a global information space in which only documents are linked to one in which both documents and data are linked: a web of documents and data. This has attracted many organizations that today publish and consume Linked Data, ranging from public and private companies (BBC, New York Times) to public institutions (UK government).
The goal of this tutorial is to present an overview of the principles, models and technologies of Linked Data. We will start by reviewing Tim Berners-Lee's Linked Data principles and motivating Linked Data and its precursor, the Semantic Web. We will then explain the RDF data model for publishing data on the Web and introduce the SPARQL protocol and query language for RDF data. We will finish by briefly introducing semantic modeling with the RDF-S and OWL ontology languages. Below we elaborate on these topics.
The tutorial will consist of two sessions, one dedicated to theory and the other to practice. The theoretical session will mainly cover the topics summarized below.
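Tim Berners-Lee [1] outlined four main principles that serve as guidelines for publishing Linked Data on the web:
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs, so that they can discover more things.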
RDF (Resource Description Framework) [4] provides a graph model for publishing and interlinking data on the web. RDF makes it possible to describe web resources by using URIs to identify them and by representing (binary) relations between them as triples of the form subject, predicate, object. Beyond the graph model itself, RDF serves as the foundation for other standards for querying data (like SPARQL) and reasoning over data (RDF-S, OWL).
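To make the triple model concrete, the sketch below represents a small RDF graph as a Python set of (subject, predicate, object) tuples and serializes it in the N-Triples syntax. It is a minimal illustration under stated assumptions, not a substitute for an RDF library: the `http://example.org/` URIs and the `Literal`, `to_nt` and `to_ntriples` names are hypothetical, literals are plain strings without datatypes or language tags, and only the FOAF and `rdf:type` URIs come from real vocabularies.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """A plain string literal (as opposed to a URI that names a resource)."""
    value: str


Term = Union[str, Literal]        # in this sketch a URI is a plain str
Triple = Tuple[str, str, Term]    # (subject URI, predicate URI, object)

EX = "http://example.org/"         # hypothetical namespace
FOAF = "http://xmlns.com/foaf/0.1/"  # the FOAF vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

graph: Set[Triple] = {
    (EX + "alice", RDF_TYPE, FOAF + "Person"),
    (EX + "alice", FOAF + "name", Literal("Alice")),
    (EX + "alice", FOAF + "knows", EX + "bob"),  # an edge to another resource
}


def to_nt(term: Term) -> str:
    """Serialize one term: URIs in angle brackets, literals in escaped quotes."""
    if isinstance(term, Literal):
        escaped = term.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'
    return f"<{term}>"


def to_ntriples(g: Set[Triple]) -> str:
    """One line per triple, terminated by ' .', sorted for a stable output."""
    return "\n".join(sorted(f"{to_nt(s)} {to_nt(p)} {to_nt(o)} ." for s, p, o in g))


print(to_ntriples(graph))
```

Because `ex:bob` is itself a URI, another dataset can publish triples about the same resource; this is what allows RDF graphs from different sources to be merged and linked.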
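SPARQL [5] builds on top of RDF and provides both a query language for RDF data and a protocol for submitting queries to remote endpoints over HTTP.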
Linked Data applications typically rely on SPARQL for consuming linked open data.
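The core of a SPARQL query is a basic graph pattern: a set of triple patterns in which some positions are variables, evaluated by finding every assignment of the variables that makes all patterns match triples in the graph. The toy evaluator below joins the patterns one at a time; it is a sketch of the idea, not how real engines such as ARQ are implemented. Assumptions: terms are written as prefixed names like `ex:alice` without expansion to full URIs, variables are strings starting with `?`, and `match_pattern` and `evaluate_bgp` are hypothetical helper names. The two patterns in the example correspond to the query `SELECT ?friend ?name WHERE { ex:alice foaf:knows ?friend . ?friend foaf:name ?name }` (PREFIX declarations omitted).

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]  # variable name -> matched term


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None  # variable already bound to a different term
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def evaluate_bgp(patterns: List[Triple], graph: Iterable[Triple]) -> List[Binding]:
    """Nested-loop join: each pattern filters and extends the current solutions."""
    triples = list(graph)
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions: List[Binding] = []
        for binding in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


graph = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:carol", "foaf:name", '"Carol"'),
]

results = evaluate_bgp(
    [("ex:alice", "foaf:knows", "?friend"), ("?friend", "foaf:name", "?name")],
    graph,
)
print(results)
```

A real engine also handles FILTER, OPTIONAL, result serialization and join ordering; the nested loop here only shows how a shared variable such as `?friend` joins two patterns.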
Even though Linked Data needs no more than RDF and SPARQL (apart from URIs and HTTP) to be deployed, we consider that a tutorial on Linked Data should also briefly introduce the RDF-S [3] and OWL [7] ontology languages. These languages make it possible to specify the semantics of the vocabularies used to describe data, which supports a shared interpretation and use of these data. Furthermore, because these languages have a logical semantics, it is possible to reason over data and, for example, to detect logical inconsistencies in datasets.
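As an illustration of such reasoning, the sketch below applies two RDF-S entailment rules (transitivity of `rdfs:subClassOf`, and propagation of `rdf:type` to superclasses) until no new triple can be derived, then flags resources that end up typed with two classes declared disjoint via `owl:disjointWith`. It covers only a tiny fragment of RDF-S and OWL semantics and is not a general reasoner; the `ex:` resources and the function names are hypothetical, and terms are kept as prefixed strings for readability.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DISJOINT = "owl:disjointWith"


def rdfs_closure(graph: Set[Triple]) -> Set[Triple]:
    """Forward-chain two RDF-S rules until a fixpoint is reached:
    rdfs11: (C subClassOf D), (D subClassOf E) => (C subClassOf E)
    rdfs9:  (x type C), (C subClassOf D)        => (x type D)"""
    closed = set(graph)
    while True:
        subclass_pairs = [(s, o) for s, p, o in closed if p == SUBCLASS]
        derived: Set[Triple] = set()
        for c, d in subclass_pairs:
            for d2, e in subclass_pairs:
                if d == d2:
                    derived.add((c, SUBCLASS, e))  # rdfs11
            for x, p, cls in closed:
                if p == TYPE and cls == c:
                    derived.add((x, TYPE, d))  # rdfs9
        if derived <= closed:  # nothing new: fixpoint reached
            return closed
        closed |= derived


def disjointness_violations(graph: Set[Triple]) -> List[Tuple[str, str, str]]:
    """Return (resource, class1, class2) for resources that, after inference,
    belong to two classes declared disjoint with owl:disjointWith."""
    closed = rdfs_closure(graph)
    types: Dict[str, Set[str]] = {}
    for s, p, o in closed:
        if p == TYPE:
            types.setdefault(s, set()).add(o)
    violations = []
    for c1, p, c2 in closed:
        if p == DISJOINT:
            for resource, classes in types.items():
                if c1 in classes and c2 in classes:
                    violations.append((resource, c1, c2))
    return sorted(violations)


graph: Set[Triple] = {
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:Animal", DISJOINT, "ex:Plant"),
    ("ex:rex", TYPE, "ex:Dog"),
    ("ex:rex", TYPE, "ex:Plant"),  # deliberately erroneous statement
}

print(disjointness_violations(graph))
```

The contradiction is only detectable after inference: the data never states `ex:rex rdf:type ex:Animal` explicitly, so a check on the raw triples alone would miss it.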
The second session of the tutorial will be devoted to practical work: participants will learn how to use different tools to transform raw data into linked open data and how to query Linked Data. For this, we will use OpenRefine and Apache Jena ARQ.
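The practical session relies on OpenRefine for this transformation. The plain-Python sketch below does not reproduce OpenRefine; it only illustrates the kind of mapping involved: each CSV row becomes a resource with a minted URI, and each non-key column becomes a property whose value is a literal. The sample CSV, the `http://example.org/` base URI, the `person/` and `vocab#` paths, and the function names are all hypothetical.

```python
import csv
import io
from typing import List
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace for minted URIs

RAW_CSV = """id,name,city
1,Alice,Lausanne
2,Bob,Geneva
"""


def nt_literal(value: str) -> str:
    """Quote a cell value as an N-Triples plain literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def csv_to_ntriples(text: str, key: str = "id") -> List[str]:
    """Map each row to a subject URI built from the key column, and each
    other non-empty column to one triple with a hypothetical vocab property."""
    lines: List[str] = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}person/{quote(row[key])}>"
        for column, value in row.items():
            if column == key or not value:
                continue  # the key is already encoded in the subject URI
            predicate = f"<{BASE}vocab#{quote(column)}>"
            lines.append(f"{subject} {predicate} {nt_literal(value)} .")
    return lines


print("\n".join(csv_to_ntriples(RAW_CSV)))
```

Loading the resulting N-Triples into a store would make them queryable with SPARQL, for example with ARQ. A natural next step would be to replace the literal city names with URIs from an external dataset, so that the output links to other data instead of standing alone.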