INSPIRE and Linked Data
Author
Clemens Portele (interactive instruments GmbH)
Linked data as such did not exist when the INSPIRE Directive and the Implementing Rules that form its technical foundation were designed. However, because linked data focuses on principles for publishing data on the Web, the goals of INSPIRE and of linked data overlap, although there are also differences, some of them subtle. For example:
The compatibility of INSPIRE concepts and linked data was first analysed in 2009 and presented at the INSPIRE conference 2010. That paper highlighted that linked data is not a technical subject per se. It uses the principles of the Web, and technologies of the Semantic Web, to develop the Web of data. Linked data can be seen as a philosophy about using the Web to create a unified set of data. A key aspect is to interlink the data through Web mechanisms.
All this is consistent with what is done in SDIs in general and INSPIRE in particular. There are some differences in technologies, but these are minor. The biggest issues are non-technical.
INSPIRE is a European Directive that entered into force in May 2007. It aims to create a European Union spatial data infrastructure. This will enable the sharing of environmental spatial information among public sector organisations and better facilitate public access to spatial information across Europe. The Directive addresses 34 spatial data themes needed for environmental applications, with key components specified through technical implementing rules. Implementation of INSPIRE is ongoing.
INSPIRE is based on a number of common principles:
In the table below we compare how different technical aspects related to (spatial) data are treated in both linked data and INSPIRE. For INSPIRE we take the technical guidelines, which are not legally binding, as the basis.
We can distinguish two main causes for the differences shown in the table. First, INSPIRE – and SDIs in general – developed using a different set of web technologies than linked data. Linked data is firmly based on semantic web technologies while SDIs are mostly based on web services sending XML messages via HTTP, typically based on OGC and ISO/TC 211 standards. Second, there are differences that are due to the way the INSPIRE Directive was worded.
Aspect | Linked Data | INSPIRE |
---|---|---|
Schema description | RDF-S / OWL are preferred; other languages are ok, too | UML as specified by ISO 19109 |
Data encoding | RDF (XML or TTL) are the preferred encoding; other encodings are ok, too; any encoding should use an open specification or at least provide the data in a structured form | GML, derived from the UML model using the standard GML encoding rule, as default encoding; other encodings are ok, too. Over the next years a broad variety of encodings will be used, see http://inspire.ec.europa.eu/media-types |
Terms and vocabularies | Managed as resources, typically encoded in SKOS | Managed as resources, currently encoded in GML and in the future likely using SKOS, too |
Identifiers | HTTP URIs for all resources. The identifiers should be stable and not depend on implementation. | Identifiers not required for all data. HTTP URIs may be used, but this is only a recommendation, not a requirement. |
Links | Links to other data are qualified by a link type. HTTP URIs are used to reference the linked resource. | Links to other features are qualified by a link type (property). URIs are used to reference the linked resource in GML encoding. Links are restricted to associations identified in application schemas. Most datasets do not have links to external resources. |
Access to resources | Using HTTP | Pre-defined Dataset Download Service (Atom feed option): no access to individual resources, only datasets. Pre-defined Dataset and Direct Access Download Service (WFS option): GetFeatureById query supports access to each feature using an HTTP URI. |
Queries on datasets | Optional, but typically provided using a SPARQL endpoint, if RDF is supported as an encoding. GeoSPARQL provides extensions for spatial query predicates. | Only supported in Direct Access Download Services (WFS) |
Extensions (e.g., additional attributes) | Linked data follows the open world assumption: additional information may be attached to any resource by anyone. Extensions may be part of another dataset. | INSPIRE follows a closed world assumption: extensions are supported and specified in extensions to the UML schema. The complete information about a feature is always part of one dataset. |
Table 1: Comparison of technical characteristics of spatial data in linked data and INSPIRE
To understand the differences better, we discuss what data publishers with a mandate to publish spatial data in INSPIRE need to do if they also want to provide their data in a way that is consistent with linked data practices.
There are three tasks that they would have to address beyond the INSPIRE interoperability requirements:
Let's have a look at each topic separately.
Identifiers in the legal framework of INSPIRE have been defined independent of a specific platform as a combination of a namespace and a local identifier in that namespace.
This is a direct consequence of supporting multiple types of platforms, at least conceptually. In practice, however, INSPIRE is implemented as part of the web and no other type of platform is supported by the implementing rules and technical guidance documents.
In a way it could also be said that INSPIRE is implemented on the web, but with additional conventions that make it hard for the rest of the web to link to and use the spatial data in INSPIRE. Similar things can be said about the semantic web. Both linked data and INSPIRE/SDIs add an additional layer of conventions on top of the web and both conventions are supported by different communities.
However, these conventions should honour the basic conventions of the web in order to avoid ending up as a closed platform that simply uses the web as a basic network infrastructure. Identifiers of resources are an important case where INSPIRE currently does not follow the conventions of the web as it is today. I.e., HTTP URIs should be used for all information resources and these URIs must be stable. There is also the expectation that information about the identified resource can be retrieved using HTTP.
Supporting HTTP URIs for features in INSPIRE is a pre-condition for the other tasks necessary to provide INSPIRE data consistent with linked data practices. Even without considering linked data, it is a prerequisite for publishing data on the web in general.
This has been recognised and the technical guidance in INSPIRE already includes a strong recommendation to use such HTTP URIs for all features (http://inspire.ec.europa.eu/ids). A key issue is that this requires business processes and technical infrastructure that many providers of spatial data are not used to. As INSPIRE has no legal requirement for using HTTP URIs as identifiers it is likely that without supporting measures this recommendation will not be followed. The Netherlands has acknowledged this and is developing a URI strategy that also includes spatial data (see the separate paper in this book).
To implement the recommendation, two actions are needed:
First, all identifiers in INSPIRE need to be mapped to stable HTTP URIs. For features without identifiers, it should be considered whether such identifiers could be defined. A challenge in this process is that URIs must be independent of implementation details. For example, a URI of a GetFeatureById request to a WFS is not appropriate as this is likely to change with time.
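As an illustration of this first action, the following minimal sketch maps an INSPIRE identifier (namespace plus local identifier) to an HTTP URI that carries no service or implementation details. The base `https://data.example.org/id`, the path pattern and the sample namespace `XX.EXAMPLE.CP` are assumptions made for this example; INSPIRE does not prescribe them, and a real provider would define its own pattern as part of a URI strategy.

```python
from urllib.parse import quote

# Hypothetical base for persistent identifiers (an assumption for this
# sketch, not an INSPIRE requirement).
BASE_URI = "https://data.example.org/id"


def mint_feature_uri(namespace: str, local_id: str, base: str = BASE_URI) -> str:
    """Build an HTTP URI from an INSPIRE namespace and local identifier.

    The URI contains no service endpoint, query string or software name,
    so it can stay the same when the underlying implementation changes.
    """
    # Percent-encode both parts so that each remains one path segment.
    ns = quote(namespace, safe="")
    lid = quote(local_id, safe="")
    return f"{base}/{ns}/{lid}"


# Hypothetical identifier values, for illustration only.
print(mint_feature_uri("XX.EXAMPLE.CP", "parcel-12345"))
# prints https://data.example.org/id/XX.EXAMPLE.CP/parcel-12345
```

Keeping the mapping a pure function of the identifier means the same URI can be regenerated at any time, independent of where the data is currently served.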
Second, the necessary technical infrastructure needs to be set up, configured and maintained to resolve the HTTP URIs and return information resources. Typically this will involve a standard HTTP redirect to a current location of a resource, for example the URI of a WFS GetFeatureById request.
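The resolution step could look roughly like the sketch below. It assumes persistent URIs of the hypothetical form `https://data.example.org/id/{namespace}/{localId}`, a hypothetical WFS endpoint, and the convention that the feature's `gml:id` is `{namespace}.{localId}`; none of these come from the INSPIRE guidance. The stored query identifier is the one WFS 2.0 defines for GetFeatureById. The choice of status code 303 is also an assumption; the text above only calls for a standard HTTP redirect.

```python
from urllib.parse import unquote, urlencode, urlsplit

# Hypothetical current location of the download service (assumption).
WFS_ENDPOINT = "https://services.example.org/wfs"
# Identifier of the GetFeatureById stored query in WFS 2.0.
STORED_QUERY = "urn:ogc:def:query:OGC-WFS::GetFeatureById"


def resolve(uri: str, wfs_endpoint: str = WFS_ENDPOINT):
    """Return (status, location) for a persistent feature URI.

    Only the resolver knows the current service location, so the service
    can move without changing the published URIs.
    """
    # Split the path first, then decode each segment.
    parts = [unquote(p) for p in urlsplit(uri).path.split("/") if p]
    # Expected path layout (assumed): /id/{namespace}/{localId}
    if len(parts) != 3 or parts[0] != "id":
        return 404, None
    _, namespace, local_id = parts
    query = urlencode({
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "STOREDQUERY_ID": STORED_QUERY,
        # Assumed gml:id convention for this sketch.
        "ID": f"{namespace}.{local_id}",
    })
    return 303, f"{wfs_endpoint}?{query}"


status, location = resolve("https://data.example.org/id/XX.EXAMPLE.CP/parcel-12345")
print(status, location)
```

In a deployment this function would sit behind a small web handler or be expressed as rewrite rules in a web server; the point is that the GetFeatureById request is the redirect target, not the identifier itself.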
For features and data sets, this is the responsibility of the data/service providers. For code lists, this is the responsibility of the European Commission (for the INSPIRE code lists) and the Member States (when extending the standard code lists). For coordinate reference systems this has already been done by the Open Geospatial Consortium.
If this is accomplished, others may use the spatial data in INSPIRE to provide location context to their business information in a way that is consistent with web technologies. For example, one could associate property rights with a parcel, a timetable with a railway station, statistical information with a statistical unit, materials with an industrial facility, etc.
Openness of the data will be important in this, too, as links to data that turns out to be inaccessible will not be useful and will reduce the acceptance and uptake of spatial data that is published on the web.
The linked data movement has a strong preference for using RDF as an encoding of data, but this is not a strict requirement and its importance will depend on the context. However, to integrate a dataset into the linked data cloud it is likely that RDF needs to be supported. That is, data providers that want to publish their spatial data as linked data also need to provide it in an RDF encoding. Technically this will be straightforward as the INSPIRE application schemas and GML encoding are both largely isomorphic with RDF.
Typically the RDF will be stored in a triple store and be published via a SPARQL endpoint that also supports queries on the data.
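To give an impression of what an RDF encoding of a feature involves, the sketch below serialises one feature as N-Triples that could be loaded into a triple store. It uses the GeoSPARQL vocabulary for the geometry, which is an assumption made for this example rather than an official INSPIRE RDF vocabulary; the feature URI, the `/geometry` suffix for the geometry node and the sample values are hypothetical as well.

```python
GEO = "http://www.opengis.net/ont/geosparql#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def literal(value: str, datatype: str = "") -> str:
    """Format a string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    suffix = f"^^<{datatype}>" if datatype else ""
    return f'"{escaped}"{suffix}'


def feature_to_ntriples(uri: str, label: str, wkt: str) -> str:
    """Serialise one feature with a label and a WKT geometry as N-Triples."""
    # Hypothetical URI pattern for the geometry node (assumption).
    geom = f"{uri}/geometry"
    triples = [
        (f"<{uri}>", f"<{RDF_TYPE}>", f"<{GEO}Feature>"),
        (f"<{uri}>", f"<{RDFS_LABEL}>", literal(label)),
        (f"<{uri}>", f"<{GEO}hasGeometry>", f"<{geom}>"),
        (f"<{geom}>", f"<{GEO}asWKT>", literal(wkt, f"{GEO}wktLiteral")),
    ]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical feature, for illustration only.
print(feature_to_ntriples(
    "https://data.example.org/id/XX.EXAMPLE.TN/station-1",
    "Example station",
    "POINT(5.1 52.0)",
))
```

A production conversion would more likely be driven by the application schema and an RDF library; the hand-written serialiser only keeps the example free of dependencies.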
The challenge here lies in the additional workflows that need to be supported. In particular, updates to the dataset imply an update of the triple store, too. In addition, the technical infrastructure needs to be maintained.
The fifth star of the linked data deployment scheme is to link your data to other data to provide context. If spatial data in INSPIRE is published with HTTP URIs as identifiers, this enables others to provide location context to their data as we have discussed above. Likewise spatial data should be linked to related data. Otherwise the contribution to interlinking data would be limited. The web depends on links.
Today, most spatial datasets do not contain links to features maintained in other datasets or other resources. Spatial datasets are typically closed collections of data. The experiences with GML are relevant here: the Geography Markup Language (GML) was created more than a decade ago to web-enable spatial data and their schemas. Still today, spatial data in most cases has no links to other data, and GML is mostly used as just another spatial data format and not as originally intended. Maybe the time was not ripe when GML was developed, but there is also a good chance that the spatial data community is still not ready to adapt to the web.
The INSPIRE application schemas reflect this and support for references to other data is limited. Using the open world assumption of the semantic web, spatial data that is encoded using RDF may of course be enriched with additional link types not included in the INSPIRE application schemas.
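Such an enrichment might be sketched as follows: extra link statements about a feature are emitted as N-Triples next to the data derived from the application schema. The link types used here (`owl:sameAs`, `rdfs:seeAlso`) are ordinary RDF vocabulary chosen for illustration, and all URIs are hypothetical; which link types are appropriate depends on the data.

```python
from urllib.parse import urlsplit

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def is_http_uri(value: str) -> bool:
    """Check that a value is an absolute http(s) URI."""
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def link_triples(feature_uri: str, links) -> list:
    """Turn (link_type_uri, target_uri) pairs into N-Triples statements.

    Every link is qualified by a link type and references its target by
    an HTTP URI, as linked data practice expects.
    """
    lines = []
    for predicate, target in links:
        for uri in (feature_uri, predicate, target):
            if not is_http_uri(uri):
                raise ValueError(f"not an HTTP URI: {uri!r}")
        lines.append(f"<{feature_uri}> <{predicate}> <{target}> .")
    return lines


# Hypothetical feature and external resources, for illustration only.
for line in link_triples(
    "https://data.example.org/id/XX.EXAMPLE.TN/station-1",
    [
        (OWL_SAME_AS, "https://other.example.net/resource/station/42"),
        (RDFS_SEE_ALSO, "https://other.example.net/timetable/42"),
    ],
):
    print(line)
```

Because of the open world assumption, these statements do not have to live in the INSPIRE dataset itself; they can be published separately by the provider or by a third party.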
In INSPIRE there are no legal requirements to collect additional data – including links.
This paper provides a brief look at the gaps between INSPIRE and linked data. Technologically, the gap is small.
The main challenges are:
As the discussion in this paper has shown, publishing five star linked data in most cases introduces new data workflows, and research providing a clear assessment of the value and benefits is not available at the moment.
Directive 2007/2/EC of the European Parliament and of the Council of 14 March 2007 establishing an Infrastructure for Spatial Information in the European Community (INSPIRE): http://bit.ly/iwGjRH
Commission Regulation 1089/2010 implementing Directive 2007/2/EC of the European Parliament and of the Council as regards interoperability of spatial data sets and services: http://bit.ly/15dUzhB
INSPIRE Generic Conceptual Model (D2.5), version 3.4rc3: http://bit.ly/11lhloQ
INSPIRE Guidelines for the encoding of spatial data (D2.7), version 3.3rc3: http://bit.ly/173DuZt
Cox, S., Schade, S., Portele, C. Linked Data in SDI, INSPIRE Conference 2010: http://bit.ly/12SDWMK
Bizer, C., Heath, T., Berners-Lee, T. Linked Data - The Story So Far. Special Issue on Linked Data, International Journal on Semantic Web and Information Systems (IJSWIS), http://linkeddata.org/docs/ijswis-special-issue
Berners-Lee, T. Linked Data, http://www.w3.org/DesignIssues/LinkedData.html