Thijs Brentjens and Linda van den Brink, Geonovum February 24, 2014
In our view five things are needed in order to integrate geo-information successfully with the Semantic Web.
These five things are described in more detail below.
For the sake of interoperability there should be one standard vocabulary for linked geo data, published by W3C. W3C Basic Geo is too limited, as it supports only point geometries. To leverage all the geo data that is out there, more than just point geometry is needed. Several vocabularies already offer this. Some of the well-known ones are the vocabulary from GeoSPARQL, OGC GML, OGC KML, NeoGeo, Location Core Vocabulary and GeoJSON. They have a lot of similarities. GeoSPARQL, NeoGeo, GML, KML and GeoJSON don’t only describe geometry types. They also share a class of objects called Feature, which has the same or a similar meaning in each, and they all have a property called ‘geometry’. Location Core Vocabulary is a bit broader. Where Features are generally objects that have a geometry as a property, Location Core defines the class Location (actually it reuses Dublin Core Location), which is a spatial region or a named place. Properties are defined not only for geometry but also for place names, place identifiers and addresses.
In our opinion GeoJSON is a good starting point. It has a basic data model and supports a decent set of geometry types: "Point", "MultiPoint", "LineString", "MultiLineString", "Polygon", "MultiPolygon", and "GeometryCollection".
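As an illustration of this basic data model, the following minimal Python sketch builds a GeoJSON Feature with a Polygon geometry and checks its type against the list above. The property name, the coordinate values and the helper function is_supported are made up for the example.

```python
import json

# Geometry types supported by GeoJSON, as listed in the text above.
GEOMETRY_TYPES = {
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon", "GeometryCollection",
}

# Hypothetical feature: the name and the coordinates are illustrative only.
feature = {
    "type": "Feature",
    "properties": {"name": "Example parcel"},
    "geometry": {
        "type": "Polygon",
        # One linear ring; the first and last positions are identical.
        "coordinates": [[
            [5.0, 52.0], [5.1, 52.0], [5.1, 52.1], [5.0, 52.1], [5.0, 52.0],
        ]],
    },
}


def is_supported(geometry: dict) -> bool:
    """Return True if the geometry type is one of the GeoJSON types."""
    return geometry.get("type") in GEOMETRY_TYPES


if __name__ == "__main__":
    print(is_supported(feature["geometry"]))
    print(json.dumps(feature, indent=2))
```

A geometry type outside this set, such as an arc, is rejected by is_supported, which mirrors the limitation discussed in the text.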
We estimate this is sufficient for all but very advanced use cases. For example, in the Dutch national dataset (which is now being created) for large scale topography, arcs are used. This geometry type is not supported in GeoJSON. There should be discussion on whether more than what is currently in GeoJSON is needed. Elements from GeoSPARQL (e.g. more geometry types), NeoGeo, and Location Core Vocabulary (e.g. named places, addresses) should be considered. This should be a joint W3C/OGC activity. The ideal outcome in our eyes would be a W3C vocabulary covering most use cases, and possibly an OGC vocabulary covering special, advanced use cases (like GML / GeoSPARQL).
Recently, on January 16th, 2014, W3C published JSON-LD as a Recommendation. JSON-LD is a JSON encoding for Linked Data. Since (Geo)JSON is becoming increasingly popular in the geospatial domain and Linked Data is also a hot topic (at least in the Netherlands), the question arises whether JSON-LD and GeoJSON can be used together.
We tried this in an experimental setup and successfully combined GeoJSON with JSON-LD. GeoJSON-LD could be the web-encoding for linked geospatial data; it could be direct output of (Geo)SPARQL or be used as output of specific APIs.
JSON-LD allows for the use of any vocabulary to encode geometry. Instead of combining it with GeoJSON, it could also be combined with, for example, GeoSPARQL for more specialized GIS use cases. Via GeoSPARQL any GML geometry type can be used. Location Core Vocabulary also takes the approach of offering different possibilities: a geometry may be encoded as a WKT (Well Known Text, see ISO 19125-1) string literal, GML or KML, a GeoSPARQL or Basic Geo geometry class, schema.org RDF, or a geocoded URI. This allows people to select whatever fits them best, but does not help much with gaining interoperability between datasets.
GeoJSON, a JSON-based format for encoding geographic features and their geometry, is in our opinion good enough for a large number of use cases. In its favour, it is a lightweight encoding and less verbose than XML encodings like GML. Support in existing software and platforms is good. After adding a JSON-LD @context to GeoJSON in our experiment, we found that GIS applications with no understanding of JSON-LD could still use the data.
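The idea of adding an @context to GeoJSON can be sketched as follows. This is not the context used in our experiment: the example.org vocabulary IRI is a placeholder, the term mappings are assumptions, and the sketch leaves open how coordinate arrays should map to RDF. It only shows that a consumer that knows nothing about JSON-LD can ignore the extra member and still read the geometry.

```python
import json

# Hypothetical JSON-LD context. The example.org IRI is a placeholder
# vocabulary, not a published one; the mappings are assumptions.
CONTEXT = {
    "@vocab": "http://example.org/geo-vocab#",
    "type": "@type",
    "name": "http://schema.org/name",
}


def add_context(feature: dict, context: dict) -> dict:
    """Return a copy of a GeoJSON feature with an @context member added."""
    document = {"@context": context}
    document.update(feature)
    return document


def read_geometry_plain(document: dict) -> dict:
    """Simulate a GeoJSON consumer that is unaware of JSON-LD.

    It looks only at the members it knows and ignores everything else.
    """
    if document.get("type") != "Feature":
        raise ValueError("not a GeoJSON Feature")
    return document["geometry"]


plain_feature = {
    "type": "Feature",
    "properties": {"name": "Example place"},  # illustrative value
    "geometry": {"type": "Point", "coordinates": [5.38, 52.15]},
}

linked_feature = add_context(plain_feature, CONTEXT)

if __name__ == "__main__":
    print(json.dumps(linked_feature, indent=2))
    print(read_geometry_plain(linked_feature))
```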
In some cases there is a need to embed geometry as the string value of a property. GeoSPARQL uses this approach: a geometry may be encoded as a GML or WKT literal in an RDF triple. Encoding geometry as WKT could also be useful for embedding geometry in HTML pages, using for example RDFa. WKT is preferred here, since GML is an XML encoding, which could more easily lead to encoding issues in HTML if not done properly.
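To make the string encoding concrete, here is a minimal sketch that serializes a few GeoJSON geometry types to WKT. The function name to_wkt is our own, and only Point, LineString and Polygon are handled; a real converter would cover all geometry types.

```python
def _position(position) -> str:
    """Format one coordinate tuple as 'x y' (or 'x y z')."""
    return " ".join(str(value) for value in position)


def _sequence(positions) -> str:
    """Format a list of positions as '(x y, x y, ...)'."""
    return "(" + ", ".join(_position(p) for p in positions) + ")"


def to_wkt(geometry: dict) -> str:
    """Serialize a GeoJSON Point, LineString or Polygon to WKT."""
    kind = geometry["type"]
    coordinates = geometry["coordinates"]
    if kind == "Point":
        return "POINT (" + _position(coordinates) + ")"
    if kind == "LineString":
        return "LINESTRING " + _sequence(coordinates)
    if kind == "Polygon":
        rings = ", ".join(_sequence(ring) for ring in coordinates)
        return "POLYGON (" + rings + ")"
    raise ValueError("geometry type not handled in this sketch: " + kind)


if __name__ == "__main__":
    print(to_wkt({"type": "Point", "coordinates": [5.38, 52.15]}))
```

The resulting string can be used as the value of a single property, for example in an RDF literal or in an RDFa attribute.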
OGC GeoSPARQL, as an extension of W3C SPARQL, defines a vocabulary for asserting and querying topological relations between spatial objects. This is very useful as it allows you to assert / query whether two spatial objects cross each other, one lies within the other, is near another, etc. However the topology clause in GeoSPARQL is parameterized to allow the use of different families of topological relations (Simple Features, RCC8, and Egenhofer; this goes back to different mathematical definitions of what, for example, an intersection is precisely). This seems overly complex and could hinder wide implementation as well as interoperability.
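The effect of this parameterization can be sketched with a small query builder. The function names (geof:sfDisjoint, geof:ehDisjoint, geof:rcc8dc and so on) and the geo: properties come from GeoSPARQL; the dictionary keys, the variable names and the build_query helper are our own illustration. The same question, "which objects are disjoint?", yields three different queries depending on the family chosen.

```python
PREFIXES = (
    "PREFIX geo: <http://www.opengis.net/ont/geosparql#>\n"
    "PREFIX geof: <http://www.opengis.net/def/function/geosparql/>\n"
)

# GeoSPARQL filter functions for two relations, per family of relations.
# The dictionary keys are hypothetical labels chosen for this sketch.
RELATION_FUNCTIONS = {
    "simple_features": {"equals": "geof:sfEquals", "disjoint": "geof:sfDisjoint"},
    "egenhofer": {"equals": "geof:ehEquals", "disjoint": "geof:ehDisjoint"},
    "rcc8": {"equals": "geof:rcc8eq", "disjoint": "geof:rcc8dc"},
}


def build_query(family: str, relation: str) -> str:
    """Build a SPARQL query testing one relation from one family."""
    function = RELATION_FUNCTIONS[family][relation]
    return (
        PREFIXES
        + "SELECT ?a ?b WHERE {\n"
        + "  ?a geo:hasGeometry/geo:asWKT ?aWkt .\n"
        + "  ?b geo:hasGeometry/geo:asWKT ?bWkt .\n"
        + "  FILTER(" + function + "(?aWkt, ?bWkt))\n"
        + "}"
    )


if __name__ == "__main__":
    for family in RELATION_FUNCTIONS:
        print(build_query(family, "disjoint"))
        print()
```

A client and a server that each support a different family cannot exchange such queries directly, which is the interoperability concern raised above.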
In other standards where these topological relations are used, such as OGC Filter Encoding, there is no such parameterization. We recommend selecting just one of these families of topological relations. It still needs to be determined which one this should be. OGC Filter Encoding (ISO 19143) uses Simple Features (ISO 19125-1); NeoGeo uses RCC8. OGC Filter Encoding is probably the best starting point.
Coordinate reference systems (CRS) are to geo-information what character encodings are to text. If you don’t know which CRS is used, you can’t use the coordinates. Different CRSs exist for a reason: localized CRSs provide more precise coordinates for a certain part of the globe. It is not possible for a global CRS to be as precise, for example because the continental plates move a few centimetres every year. For large scale data and applications this continental drift could be very relevant over time. Take for example the boundary of cadastral parcels. If this drift is not taken into account, there could be issues if parcel boundaries that were established e.g. 10 years ago are overlaid over recently acquired aerial imagery with high accuracy (e.g. 10 cm). There could be visual differences, while the actual situation did not change.
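The size of this effect can be estimated with simple arithmetic. The drift rate of 2.5 cm per year below is an assumed value standing in for "a few centimetres every year"; the 10 years and the 10 cm imagery accuracy are the figures from the example above.

```python
# Assumption: a constant drift rate of 2.5 cm per year. The actual rate
# depends on the plate and on the reference frame used.
ASSUMED_RATE_CM_PER_YEAR = 2.5
YEARS_SINCE_SURVEY = 10        # from the example in the text
IMAGERY_ACCURACY_CM = 10.0     # from the example in the text


def drift_offset_cm(rate_cm_per_year: float, years: float) -> float:
    """Accumulated offset, assuming a constant drift rate."""
    return rate_cm_per_year * years


offset = drift_offset_cm(ASSUMED_RATE_CM_PER_YEAR, YEARS_SINCE_SURVEY)

if __name__ == "__main__":
    print("offset in cm:", offset)
    print("exceeds imagery accuracy:", offset > IMAGERY_ACCURACY_CM)
```

Under these assumptions the accumulated offset is larger than the accuracy of the imagery, so the parcel boundary would appear shifted even though nothing changed on the ground.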
Discussion is necessary on whether support for different coordinate reference systems, geographic as well as projected ones, is needed in linked geo data standards. The possibility to use different CRSs hinders interoperability (datasets using different CRSs cannot be easily combined, a complex transformation is necessary) but on the other hand this option is perhaps needed for use cases where a high precision of coordinates is important.
Even if this turns out to be necessary, the default should be WGS84 (lat/lon).
In GeoSPARQL it is possible to refer to a CRS, but the reference is part of the geometry literal. If this were a separate property it would be easier to use the CRS as a selection criterion (which is desirable, for example, when displaying data on a map: data which uses different CRS cannot be combined on a map). In the GeoJSON object model a member ‘crs’ is defined. In GML there is a similar property, ‘srsName’, to indicate the coordinate reference system used. These are good examples of how it should be done.
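To show what it means that the reference is part of the literal, the sketch below splits a GeoSPARQL WKT literal into its CRS URI and the WKT text. A client has to do something like this before it can select on CRS. The function name split_wkt_literal is our own; the optional leading URI in angle brackets and the CRS84 default follow GeoSPARQL, and the sample coordinates are illustrative.

```python
import re

# CRS assumed by GeoSPARQL when a WKT literal carries no CRS URI.
DEFAULT_CRS = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"

# Optional '<uri>' prefix followed by the WKT text.
_LITERAL = re.compile(r"^\s*<([^>]+)>\s*(.*)$", re.DOTALL)


def split_wkt_literal(literal: str) -> tuple:
    """Return (crs_uri, wkt) for the lexical form of a WKT literal."""
    match = _LITERAL.match(literal)
    if match:
        return match.group(1), match.group(2).strip()
    return DEFAULT_CRS, literal.strip()


if __name__ == "__main__":
    literals = [
        "<http://www.opengis.net/def/crs/EPSG/0/28992> POINT(155000 463000)",
        "POINT(5.38 52.15)",
    ]
    for literal in literals:
        print(split_wkt_literal(literal))
```

With a separate property, as in the GeoJSON and GML examples mentioned above, this parsing step would not be needed and the CRS could be used directly in a query.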
Geometries, especially lines and polygons, may contain many coordinates. For example, a municipal boundary could easily contain more than 1500 coordinate pairs. Compared to non-geometric properties, this can result in large amounts of data to transfer and process. For polygons, the coordinates can easily make up 95% of all data of an object. The question arises whether there is a need for performance optimization and/or compression techniques for large amounts of coordinates. If so, there could also be a need to standardize such a technique, similar to the PNG format for encoding images.
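The share of coordinate data can be measured for a given object with a few lines of Python. The feature below is synthetic: a ring of 1500 generated points with two made-up properties, so the outcome only indicates the order of magnitude and is not a measurement on real data.

```python
import json
import math


def synthetic_ring(count: int, cx: float = 5.0, cy: float = 52.0,
                   radius: float = 0.1) -> list:
    """Generate a closed ring of `count` points on a circle (synthetic data)."""
    points = []
    for i in range(count):
        angle = 2 * math.pi * i / count
        points.append([round(cx + radius * math.cos(angle), 6),
                       round(cy + radius * math.sin(angle), 6)])
    points.append(points[0])  # close the ring
    return points


# Hypothetical object: the property names and values are illustrative only.
feature = {
    "type": "Feature",
    "properties": {"name": "Example municipality", "code": "0000"},
    "geometry": {"type": "Polygon", "coordinates": [synthetic_ring(1500)]},
}


def coordinate_share(feature: dict) -> float:
    """Fraction of the serialized feature taken up by its coordinates."""
    total = len(json.dumps(feature).encode("utf-8"))
    coords = len(json.dumps(feature["geometry"]["coordinates"]).encode("utf-8"))
    return coords / total


if __name__ == "__main__":
    print(round(coordinate_share(feature), 3))
```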
There are several examples of coordinate compression techniques. The Google Maps API defines an algorithm to compress the coordinate values of a polyline to a single string. Also, the human readable Well Known Text (WKT) representation of geometry has a binary counterpart, Well Known Binary (WKB), which is much more compact. Both are defined in ISO 19125-1 and used for storage and exchange of geometries. WKT is referenced in several OGC standards.