Boek 5/Blog: Phil Archer - Standardisation is a community effort

One of the endless discussions among W3C staff is around what we do, what we should have done in the past, and what we should do in future to support the development and maintenance of vocabularies.

Whatever your technology, data interoperability can only be achieved if different systems use the same terms, or, at least, terms that can be programmatically mapped. The W3C process is good for creating stable vocabularies like SKOS and the RDF Data Cube, but what about vocabularies that are continually evolving? The stellar example of that is schema.org. Should W3C adapt its tooling and processes to match? What’s the right balance between stability and agility, integrity and practical flexibility, centralisation and distribution?

To make progress, I need a hook, that is, an example of a vocabulary that has a significant community of interest but is in need of maintenance. Step forward DCAT, the Data Catalog Vocabulary, and step forward the VRE4EIC project. A Virtual Research Environment (VRE) offers visualisation and manipulation services across multiple datasets from multiple sources. The project has looked at dozens of potential data sources and it’s clear that there is no single dataset description vocabulary that dominates, nor is there likely to be. That’s the Web.

At the end of November, I made my sixth and final trip of 2016 to the Netherlands, this time to run the Smart Descriptions & Smarter Vocabularies (SDSVoc) workshop that discussed these issues. The final report from that event was published around the end of January. As well as looking at some of the many dataset description vocabularies in use, and the specific need to update and improve DCAT, we discussed the idea of content negotiation by profile. Today you can request data encoded in JSON, RDF, XML and so on, but we want clients to be able to request data in, say, RDF using DCAT, and for servers to support multiple alternative responses. Datasets with spatiotemporal aspects need specific metadata fields; tooling and search are important too, of course. There’s a lot to think about.
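
To make the idea of content negotiation by profile a little more concrete, here is a minimal Python sketch of how a server might choose between alternative responses. It is an illustration under stated assumptions, not anything agreed at the workshop: the `AVAILABLE` table, the `negotiate` function, and the use of vocabulary namespace URIs as profile identifiers are all hypothetical, and the sketch says nothing about how a client would actually express its preference on the wire.

```python
from typing import Optional, Tuple

# Assumption: a profile is identified by the namespace URI of the vocabulary.
DCAT = "http://www.w3.org/ns/dcat#"
SCHEMA_ORG = "http://schema.org/"

# Hypothetical set of representations one server holds for a single
# dataset description, keyed by (media type, profile).
AVAILABLE = {
    ("text/turtle", DCAT): "<Turtle using DCAT terms>",
    ("text/turtle", SCHEMA_ORG): "<Turtle using schema.org terms>",
    ("application/ld+json", SCHEMA_ORG): "<JSON-LD using schema.org terms>",
}


def negotiate(
    media_type: str, profile: Optional[str] = None
) -> Optional[Tuple[Tuple[str, str], str]]:
    """Return ((media type, profile), body) for the chosen response, or None."""
    if profile is not None:
        # The client asked for a specific profile: require an exact match.
        key = (media_type, profile)
        if key in AVAILABLE:
            return key, AVAILABLE[key]
        return None
    # No profile requested: fall back to classic negotiation on media type
    # and return the first representation the server lists for it.
    for key, body in AVAILABLE.items():
        if key[0] == media_type:
            return key, body
    return None


if __name__ == "__main__":
    print(negotiate("text/turtle", DCAT))          # Turtle, DCAT profile
    print(negotiate("application/ld+json", DCAT))  # no such combination
```

The point of the sketch is only that the media type and the profile are two independent axes: the same Turtle serialisation can carry DCAT or schema.org terms, and a server may support some combinations but not others.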

Subject to the usual W3C member-driven process, I expect to begin a new Working Group around April-May that will pick up on these ideas. DCAT is the poster child, but the issues are much broader. As ever, standardisation is a community effort - and you’ll be welcome to join in.

Phil Archer
Data Strategist at W3C (www.w3.org)