Boek/Portele

Revision by Jwvveen, 17 Feb 2014, 22:14 (1 revision)

INSPIRE and Linked Data

 

Author

Clemens Portele (interactive instruments GmbH)

 

Linked Data as such did not exist when the INSPIRE Directive and its technical foundation in the Implementing Rules were designed. However, as linked data focuses on principles for how data should be published on the Web, the goals of INSPIRE and of linked data overlap. There are also differences, some of them subtle. For example:

  • INSPIRE did not limit the networks on which spatial data is to be made available, while linked data is explicitly focussed on the Web. So far INSPIRE has limited its technical guidance to the Web, so in practice this is not a significant difference. Nevertheless, the network-neutrality requirements in the Directive and Implementing Rules have an impact even if all INSPIRE data is published on the Web. This affects identifiers in particular.
  • The scope of INSPIRE is about publishing existing data sets and excludes the collection of new data. Typically publishing data as linked data goes beyond this as it strongly encourages adding links to other, related data.

The compatibility of INSPIRE concepts and linked data was first analysed in 2009 and presented at the INSPIRE conference 2010. That paper highlighted that linked data is not a technical subject per se. It uses the principles of the Web, and technologies of the Semantic Web, to develop the Web of data. Linked data can be seen as a philosophy about using the Web to create a unified set of data. A key aspect is to interlink the data through Web mechanisms.

All this is consistent with what is done in SDIs in general and INSPIRE in particular. There are some differences in technologies, but these are minor. The biggest issues are non-technical.

INSPIRE is a European Directive that entered into force in May 2007. It aims to create a European Union spatial data infrastructure. This will enable the sharing of environmental spatial information among public sector organisations and better facilitate public access to spatial information across Europe. The Directive addresses 34 spatial data themes needed for environmental applications, with key components specified through technical implementing rules. Implementation of INSPIRE is ongoing.

 INSPIRE is based on a number of common principles:

  • Data should be collected only once and kept where it can be maintained most effectively.
  • It should be possible to combine seamless spatial information from different sources across Europe and share it with many users and applications.
  • It should be possible for information collected at one level/scale to be shared with all levels/scales; detailed for thorough investigations, general for strategic purposes.
  • Geographic information needed for good governance at all levels should be readily and transparently available.
  • It should be easy to find what geographic information is available, how it can be used to meet a particular need, and under which conditions it can be acquired and used.


Technical Comparison of Linked Data and INSPIRE

In the table below we compare how different technical aspects related to (spatial) data are treated in both linked data and INSPIRE. For INSPIRE we take the technical guidelines, which are not legally binding, as the basis.

We can distinguish two main causes for the differences shown in the table. First, INSPIRE – and SDIs in general – developed using a different set of web technologies than linked data. Linked data is firmly based on semantic web technologies while SDIs are mostly based on web services sending XML messages via HTTP, typically based on OGC and ISO/TC 211 standards. Second, there are differences that are due to the way the INSPIRE Directive was worded.

 

| Aspect | Linked Data | INSPIRE |
| --- | --- | --- |
| Schema description | RDF-S / OWL are preferred; other languages are ok, too | UML as specified by ISO 19109 |
| Data encoding | RDF (XML or TTL) are the preferred encoding; other encodings are ok, too; any encoding should use an open specification or at least provide the data in a structured form. GeoSPARQL specifies two RDF geometry encoding options using WKT and GML. | GML (derived from the UML model using the standard GML encoding rule) as default encoding; other encodings are ok, too. Over the next years a broad variety of encodings will be used, see http://inspire.ec.europa.eu/media-types |
| Terms and vocabularies | Managed as resources, typically encoded in SKOS | Managed as resources, currently encoded in GML and in the future likely using SKOS, too |
| Identifiers | HTTP URIs for all resources. The identifiers should be stable and not depend on implementation. | Identifiers not required for all data. HTTP URIs may be used, but this is only a recommendation, not a requirement. |
| Links | Links to other data are qualified by a link type. HTTP URIs are used to reference the linked resource. | Links to other features are qualified by a link type (property). URIs are used to reference the linked resource in the GML encoding. Links are restricted to associations identified in application schemas. Most datasets do not have links to external resources. |
| Access to resources | Using HTTP | Pre-defined Dataset Download Service (Atom feed option): no access to individual resources, only datasets. Pre-defined Dataset and Direct Access Download Service (WFS option): GetFeatureById query supports access to each feature using an HTTP URI. |
| Queries on datasets | Optional, but typically provided using a SPARQL endpoint, if RDF is supported as an encoding. GeoSPARQL provides extensions for spatial query predicates. | Only supported in Direct Access Download Services (WFS) |
| Extensions (e.g., additional attributes) | Linked data follows the open world assumption: additional information may be attached to any resource by anyone. Extensions may be part of another dataset. | INSPIRE follows a closed world assumption: extensions are supported and specified in extensions to the UML schema. The complete information about a feature is always part of one dataset. |

Table 1: Comparison of technical characteristics of spatial data in linked data and INSPIRE

 

To understand the differences better, we discuss what data publishers with a mandate to publish spatial data in INSPIRE need to do if they also want to provide their data in a way that is consistent with linked data practices.

There are three tasks that they would have to address beyond the INSPIRE interoperability requirements:

  • use persistent, resolvable HTTP URIs as identifiers for all features,
  • support RDF as an additional encoding, optionally provide a SPARQL endpoint for queries,
  • add and maintain links to other data

Let's have a look at each topic separately.


HTTP URIs as Identifiers

Identifiers in the legal framework of INSPIRE have been defined independently of any specific platform, as a combination of a namespace and a local identifier within that namespace.

This is a direct consequence of supporting multiple types of platforms, at least conceptually. In practice, however, INSPIRE is implemented as part of the web and no other type of platform is supported by the implementing rules and technical guidance documents.

In a way it could also be said that INSPIRE is implemented on the web, but with additional conventions that make it hard for the rest of the web to link to and use the spatial data in INSPIRE. Similar things can be said about the semantic web. Both linked data and INSPIRE/SDIs add a layer of conventions on top of the web, and the two sets of conventions are supported by different communities.

However, these conventions should honour the basic conventions of the web in order to avoid ending up as a closed platform that simply uses the web as a basic network infrastructure. Identifiers of resources are an important case where INSPIRE currently does not follow the conventions of the web as it is today: HTTP URIs should be used for all information resources, and these URIs must be stable. There is also the expectation that information about the identified resource can be retrieved using HTTP.

Supporting HTTP URIs for features in INSPIRE is a pre-condition for the other tasks necessary to provide INSPIRE data consistent with linked data practices. Even without considering linked data, it is a prerequisite for publishing data on the web.

This has been recognised, and the technical guidance in INSPIRE already includes a strong recommendation to use such HTTP URIs for all features (http://inspire.ec.europa.eu/ids). A key issue is that this requires business processes and technical infrastructure that many providers of spatial data are not used to. As INSPIRE has no legal requirement for using HTTP URIs as identifiers, it is likely that without supporting measures this recommendation will not be followed. The Netherlands has acknowledged this and is developing a URI strategy that also includes spatial data (see the separate paper in this book).

To implement the recommendation, two actions are needed:

First, all identifiers in INSPIRE need to be mapped to stable HTTP URIs. For features without identifiers, it should be considered whether such identifiers could be defined. A challenge in this process is that URIs must be independent of implementation details. For example, a URI of a GetFeatureById request to a WFS is not appropriate as this is likely to change with time.
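The mapping from an INSPIRE identifier to an HTTP URI might look roughly like the following Python sketch. It is an illustration only, not an official INSPIRE or national URI pattern: the base URI `http://data.example.org/id`, the function name `mint_feature_uri`, and the sample namespace and local identifier are all assumptions made for this example.

```python
from urllib.parse import quote

# Hypothetical base for persistent identifiers (an assumption, not an official pattern).
BASE_URI = "http://data.example.org/id"


def mint_feature_uri(namespace, local_id, base_uri=BASE_URI):
    """Map an INSPIRE identifier (namespace + local identifier) to an HTTP URI.

    The URI is built from the identifier parts only. It contains no service
    endpoint, request parameters, or software names that could change later.
    """
    if not namespace or not local_id:
        raise ValueError("both namespace and local_id are required")
    # Percent-encode each part so that characters such as '/' or ' '
    # cannot change the path structure of the URI.
    return f"{base_uri}/{quote(namespace, safe='')}/{quote(local_id, safe='')}"


# Sample values are invented for illustration.
uri = mint_feature_uri("NL.KAD.CP", "parcel-12345")
print(uri)
```

The point of the design is what the URI leaves out: because it carries no WFS address or query string, the service behind it can be replaced without changing the identifier.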

Second, the necessary technical infrastructure needs to be set up, configured and maintained to resolve the HTTP URIs and return information resources. Typically this will involve a standard HTTP redirect to a current location of a resource, for example the URI of a WFS GetFeatureById request.
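A resolver of this kind could be sketched in Python as shown here, using only the standard library. The URI pattern `/id/{namespace}/{localId}`, the endpoint `http://services.example.org/wfs`, the way the WFS feature id is composed from namespace and local identifier, and the choice of a 303 redirect are assumptions for illustration; the stored query identifier is the GetFeatureById query defined by WFS 2.0.

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import urlencode

# Hypothetical current location of the download service; it may change over time.
WFS_ENDPOINT = "http://services.example.org/wfs"
# Stored query defined by WFS 2.0 for retrieving a single feature.
GET_FEATURE_BY_ID = "urn:ogc:def:query:OGC-WFS::GetFeatureById"


def redirect_target(path):
    """Return the current WFS request URL for a persistent URI path, or None."""
    parts = path.strip("/").split("/")
    # Assumed pattern for persistent URIs: /id/{namespace}/{localId}
    if len(parts) != 3 or parts[0] != "id":
        return None
    _, namespace, local_id = parts
    query = urlencode({
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "storedquery_id": GET_FEATURE_BY_ID,
        # Assumption: the service uses "{namespace}.{localId}" as feature id.
        "id": f"{namespace}.{local_id}",
    })
    return f"{WFS_ENDPOINT}?{query}"


class ResolverHandler(BaseHTTPRequestHandler):
    """Answers GET requests on persistent URIs with a redirect."""

    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404, "Unknown identifier")
            return
        # 303 See Other is one common choice; other redirect codes would also work.
        self.send_response(303)
        self.send_header("Location", target)
        self.end_headers()
```

To try it locally, the handler can be served with `HTTPServer(("localhost", 8080), ResolverHandler).serve_forever()` from `http.server`. Only `WFS_ENDPOINT` and `redirect_target` need to change when the service moves; the published URIs stay the same.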

For features and data sets, this is the responsibility of the data/service providers. For code lists, this is the responsibility of the European Commission (for the INSPIRE code lists) and the Member States (when extending the standard code lists). For coordinate reference systems this has already been done by the Open Geospatial Consortium.

If this is accomplished, others may use the spatial data in INSPIRE to provide location context to their business information in a way that is consistent with web technologies. For example, one could associate property rights with a parcel, a timetable with a railway station, statistical information with a statistical unit, materials with an industrial facility, etc.

Openness of the data will be important in this, too, as links to data that turns out to be inaccessible will not be useful and will reduce the acceptance and uptake of spatial data that is published on the web.


RDF encoding and SPARQL endpoints

The linked data movement has a strong preference for using RDF as an encoding of data, but this is not a strict requirement and its importance will depend on the context. However, to integrate a dataset into the linked data cloud it is likely that RDF needs to be supported. That is, data providers that want to provide their spatial data as linked data also need to provide the data in an RDF encoding. Technically this will be straightforward as the INSPIRE application schemas and GML encoding are both largely isomorphic with RDF.
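As a rough illustration of what such an RDF encoding might look like, the following Python sketch writes one feature as Turtle, with its geometry expressed as a GeoSPARQL WKT literal. The `ex:` vocabulary, the feature type `CadastralParcel` as used here, the `/geometry` suffix for the geometry node, and the sample values are assumptions for this example and are not taken from an INSPIRE RDF vocabulary; `geo:hasGeometry`, `geo:asWKT`, and `geo:wktLiteral` come from GeoSPARQL.

```python
GEO = "http://www.opengis.net/ont/geosparql#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://data.example.org/def/"  # hypothetical vocabulary for feature types


def escape_literal(value):
    """Escape backslashes, quotes, and newlines for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def feature_to_turtle(uri, feature_type, label, wkt):
    """Serialise one feature with a label and a WKT geometry as Turtle."""
    lines = [
        f"@prefix geo: <{GEO}> .",
        f"@prefix rdfs: <{RDFS}> .",
        f"@prefix ex: <{EX}> .",
        "",
        f"<{uri}> a ex:{feature_type} ;",
        f'    rdfs:label "{escape_literal(label)}" ;',
        f"    geo:hasGeometry <{uri}/geometry> .",
        "",
        f"<{uri}/geometry> a geo:Geometry ;",
        f'    geo:asWKT "{escape_literal(wkt)}"^^geo:wktLiteral .',
    ]
    return "\n".join(lines) + "\n"


# Sample values are invented for illustration.
print(feature_to_turtle(
    "http://data.example.org/id/NL.KAD.CP/parcel-12345",
    "CadastralParcel",
    "Parcel 12345",
    "POINT(5.1 52.0)",
))
```

In practice an RDF library would usually handle serialisation; plain string building is used here only to keep the sketch free of dependencies.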

Typically the RDF will be stored in a triple store and be published via a SPARQL endpoint that also supports queries on the data.
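A query against such an endpoint could be prepared as in the following Python sketch, which builds (but does not send) an HTTP GET request in the style of the SPARQL 1.1 Protocol. The endpoint address `http://data.example.org/sparql` and the function name `build_sparql_request` are assumptions; the query assumes the data uses the GeoSPARQL geometry properties.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical SPARQL endpoint of a data provider.
SPARQL_ENDPOINT = "http://data.example.org/sparql"

# List up to ten features together with their WKT geometries.
QUERY = """\
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
SELECT ?feature ?wkt WHERE {
  ?feature geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
} LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build a GET request that passes the query in the 'query' parameter."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for the standard JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(SPARQL_ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute the query against a real endpoint; that step is left out because the endpoint here is fictional.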

The challenge here lies in the additional workflows that need to be supported. In particular, updates to the dataset imply an update of the triple store, too. In addition, the technical infrastructure needs to be maintained.
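One simple way to keep a triple store in step with the source dataset is to replace all statements about a feature whenever that feature changes. The Python sketch below generates a SPARQL 1.1 Update request for this. It is a simplification under stated assumptions: the function name `update_for_feature` is hypothetical, the triples are expected as ready-made N-Triples statements, and statements about dependent nodes (such as a separate geometry resource) would need the same treatment.

```python
def update_for_feature(uri, triples):
    """Build a SPARQL 1.1 Update that replaces all statements about one feature.

    'triples' is a list of complete N-Triples statements, each ending in ' .'.
    """
    body = "\n".join(f"  {triple}" for triple in triples)
    return (
        # First remove every existing statement with the feature as subject ...
        f"DELETE WHERE {{ <{uri}> ?p ?o }} ;\n"
        # ... then insert the current state exported from the source dataset.
        f"INSERT DATA {{\n{body}\n}}"
    )


# Sample values are invented for illustration.
feature = "http://data.example.org/id/NL.KAD.CP/parcel-12345"
print(update_for_feature(feature, [
    f'<{feature}> <http://www.w3.org/2000/01/rdf-schema#label> "Parcel 12345" .',
]))
```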


Links to other data

The fifth star of the linked data deployment scheme is to link your data to other data to provide context. If spatial data in INSPIRE is published with HTTP URIs as identifiers, this enables others to provide location context to their data as we have discussed above. Likewise spatial data should be linked to related data. Otherwise the contribution to interlinking data would be limited. The web depends on links.

Today, most spatial datasets do not contain links to features maintained in other datasets or other resources. Spatial datasets are typically closed collections of data. The experiences with GML are relevant here: The Geography Markup Language (GML) was created more than a decade ago to web-enable spatial data and their schemas. Even today, spatial data in most cases has no links to other data, and GML is mostly used as just another spatial data format and not as originally intended. Maybe the time was not ripe when GML was developed, but there is also a good chance that the spatial data community is still not ready to adapt to the web.

The INSPIRE application schemas reflect this and support for references to other data is limited. Using the open world assumption of the semantic web, spatial data that is encoded using RDF may of course be enriched with additional link types not included in the INSPIRE application schemas.
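Such additional links can be expressed as plain RDF statements, each qualified by a link type. The Python sketch below produces them in N-Triples form. The feature and target URIs are invented for illustration, and the function name `make_link` is hypothetical; `rdfs:seeAlso` and `owl:sameAs` are existing, widely used link types, chosen here only as examples.

```python
from urllib.parse import urlparse

# Two generic link types from the RDF Schema and OWL vocabularies.
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def make_link(subject_uri, link_type_uri, target_uri):
    """Return one N-Triples statement linking a feature to another resource."""
    for uri in (subject_uri, link_type_uri, target_uri):
        # Linked data expects HTTP URIs for resources and link types alike.
        if urlparse(uri).scheme not in ("http", "https"):
            raise ValueError(f"not an HTTP URI: {uri}")
    return f"<{subject_uri}> <{link_type_uri}> <{target_uri}> ."


# Sample URIs are invented for illustration.
station = "http://data.example.org/id/NL.TN/station-42"
links = [
    make_link(station, RDFS_SEE_ALSO, "http://transport.example.com/timetables/station-42"),
    make_link(station, OWL_SAME_AS, "http://other.example.net/resource/station-42"),
]
print("\n".join(links))
```

Because of the open world assumption, statements like these do not have to live in the provider's dataset; a third party could publish them separately, as long as the feature has a stable HTTP URI to point at.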

In INSPIRE there are no legal requirements to collect additional data – including links.


Conclusions

This paper provides a brief look at the gaps between INSPIRE and linked data. Technologically, the gap is small.

The main challenges are:

  • Whether organisations see sufficient benefit in publishing their features and related data with persistent HTTP URIs, encoded in RDF and accessible via SPARQL endpoints, and in keeping that data up to date;
  • Whether organisations see sufficient benefit in establishing and maintaining links to other data. Today this is typically not the case. Over the last 10-15 years not much of the existing spatial data has been adapted to web principles, and it is unclear whether this is changing.

As the discussion in this paper has shown, publishing five star linked data in most cases introduces new data workflows. Research providing a clear assessment of the value and benefits is not available at the moment.


References

Directive 2007/2/EC of the European Parliament and of the Council of 14 March 2007 establishing an Infrastructure for Spatial Information in the European Community (INSPIRE): http://bit.ly/iwGjRH

Commission Regulation 1089/2010 implementing Directive 2007/2/EC of the European Parliament and of the Council as regards interoperability of spatial data sets and services: http://bit.ly/15dUzhB

INSPIRE Generic Conceptual Model (D2.5), version 3.4rc3: http://bit.ly/11lhloQ

INSPIRE Guidelines for the encoding of spatial data (D2.7), version 3.3rc3: http://bit.ly/173DuZt

Cox, S., Schade, S., Portele, C. Linked Data in SDI, INSPIRE Conference 2010: http://bit.ly/12SDWMK

Bizer, C., Heath, T., Berners-Lee, T. Linked Data - The Story So Far. Special Issue on Linked Data, International Journal on Semantic Web and Information Systems (IJSWIS), http://linkeddata.org/docs/ijswis-special-issue

Berners-Lee, T. Linked Data, http://www.w3.org/DesignIssues/LinkedData.html