Unconference 2.0 February 9th, 2017 Meeting Notes

A shorter Dutch version of the meeting notes (without links) can be found here.

The second PLDN Unconference was held on Thursday February 9 at the Kadaster in Amsterdam. The participants suggested a large number of Unconference topics, which we collectively structured into logical groupings for discussion. A summary of what we discussed can be found in the following text. In general, we followed this structure for each topic discussion:

  • Unconference topic introduction (pitch)
  • Questions, issues, challenges and/or practices
  • Possible solution scenarios, thoughts, ideas and/or related initiatives (including useful links to relevant sources)
  • Agreed upon follow-up actions (if any)


For some topics the meeting notes are written more in the style of W3C meeting minutes.

Unconference Topics

W3C Working Groups Update

Bart van Leeuwen of Netage gave an update on this topic.

People from several companies in the Netherlands are now participating in three W3C Working Groups. Netage and Taxonic are official W3C members and participate in the development of a number of W3C standards, so-called W3C Recommendations. Strictly speaking, W3C Working Groups don’t create standards, but propose Recommendations that can become standards if they are adopted as such, e.g. on the ‘comply or explain’ list of the Dutch government.

Linda van den Brink of Geonovum is one of the contributors to the Spatial Data on the Web Best Practices document, a deliverable of the Spatial Data on the Web Working Group. Netage and Taxonic are active within the Data Shapes Working Group, and Lieke Verhelst of the Linked Data Factory has recently started the SKOS vs OWL for Interoperability Community Group together with Bart, which was launched during a discussion at the SDSVoc meeting in Amsterdam last month.

SKOS and OWL Community Group

Bart van Leeuwen of Netage gave an update on this topic.

The W3C SKOS and OWL for Interoperability Community Group was set up after a group discussion at the SDSVoc W3C Workshop at the end of last year. Lieke Verhelst of the Linked Data Factory initiated the group within the W3C context. They see a number of bad modelling practices with respect to SKOS and OWL usage, for which they would like to provide clarity and good examples of how to use SKOS and OWL better. SKOS is more of a general-purpose language, while OWL requires a more specific use case. This W3C Community Group is looking for more participants.

Spatial Data on The Web Working Group

Linda van den Brink of Geonovum gave an update on this topic.

Linda: The Spatial Data on The Web Working Group is working on the following deliverables:

  • The Spatial Data on the Web Best Practices document
  • The Semantic Sensor Network (SSN) ontology
  • The OWL Time ontology


This Working Group will finish its activities in June 2017.

Pano: Is it clear yet how you can refer to a CRS from a set of coordinates?
Linda: No, but this will be included in the Best Practice release at the end of March.
Wouter: Do you need to provide proof (implementation evidence) for each deliverable?
Linda: Not for the best practice, but for SSN and OWL Time it is required. For the best practice we would nevertheless like to include a number of good real-world examples.
Wouter: Will look into this together with Pano to determine which Kadaster examples would fit best within the best practice document.

Data Shapes Working Group

Nicky van Oorschot of Netage gave an update on this topic.

Nicky: Pano and I are members of the RDF Data Shapes Working Group, which is working on SHACL. With SHACL it is possible to define constraints on Linked Data; this was an omission in RDF.
Pano: The Working Group has now been active for more than 2 years.
Nicky: SHACL is still in the draft phase, but it is already being used by a number of projects, so it is fulfilling a real need in this area. The Working Group is currently working on solving a number of issues; a new Working Draft was published last week.
Pano: The intention is that the draft version becomes a Candidate Recommendation, but it is also clear that there are some sensitivities around this topic (the SHACL vs ShEx debate).
Bart: You can implement every shape in SHACL with SPARQL.
Pano: The Kadaster is using shapes, e.g. to generate UIs with screen validation. Triply is also using SHACL for faceted browsing in the Linked Data Reactor (LD-R).
Nicky: Netage has described incident forms for the American Fire Department Authority as SHACL shapes based upon known incidents from the past.
Wouter: How likely is it that SHACL becomes a de facto standard instead of a W3C recommendation? In practice W3C is now lagging behind current implementations.
Bart: In general, W3C creates recommendations based upon member submissions; this is also the case for SHACL. It is our expectation that this will work out well in the end.
Nicky: The last version of the public draft shows a number of differences compared to the current implementations.
Bart: At W3C you see that recommendation tracks are finished according to schedule and that a community group is initiated after those tracks to think about the next version.
Erwin: Is SHACL a good candidate for the comply-or-explain list of the Dutch government?
Linda: That’s too early; SHACL is not yet a standard. The W3C Data on the Web Best Practices document is a Recommendation now and could therefore be a candidate for this list.
Erwin: I will look into how we can proceed with this topic.
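
To make the discussion above more concrete, below is a minimal sketch of what a SHACL shape looks like and how it can be validated with the pySHACL library. The shape, the data and all ex: names are illustrative, not actual Kadaster or Netage shapes.

  # A minimal SHACL validation sketch: the shape requires every ex:Incident
  # to have at least one ex:reportedAt value of type xsd:dateTime.
  from rdflib import Graph
  from pyshacl import validate

  shapes_ttl = """
  @prefix sh:  <http://www.w3.org/ns/shacl#> .
  @prefix ex:  <http://example.org/> .
  @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

  ex:IncidentShape a sh:NodeShape ;
      sh:targetClass ex:Incident ;
      sh:property [
          sh:path ex:reportedAt ;
          sh:datatype xsd:dateTime ;
          sh:minCount 1 ;
      ] .
  """

  data_ttl = """
  @prefix ex: <http://example.org/> .

  ex:incident42 a ex:Incident .   # no ex:reportedAt -> constraint violation
  """

  shapes = Graph().parse(data=shapes_ttl, format="turtle")
  data = Graph().parse(data=data_ttl, format="turtle")

  conforms, report_graph, report_text = validate(data, shacl_graph=shapes)
  print(conforms)      # False
  print(report_text)   # human-readable validation report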

How-To Convert UML-Diagrams to Correct Semantic Models?

Linda van den Brink of Geonovum pitched this topic.

Topic context: The Geonovum information models are all in UML (e.g. IM-GEO), for which a norm is available. Geonovum would also like to publish these models as Linked Data, but is looking for best practices to convert the UML models to Linked Data.

Known Kadaster issues and examples: Object-orientation in models, normalization and denormalization practices, definition of time aspects, enumerations, etc. (Pano)

  • Everything in the BAG is a BAG object with an id and a status with a valid-from and a valid-to definition. Two dimensions are therefore combined in one object, the data and the metadata, which you would like to model separately in Linked Data.
  • A lot of reference data is used within the context of zoning plans (bestemmingsplannen). You would also like to ‘decouple’ this reference data for more general reference purposes.


Would ‘thingy-fying’ everything (creating a class for everything) be a possible best practice?
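
As a sketch of what such a separation could look like, the fragment below models the BAG example with two OWL classes: one for the real-world object and one for its registration (the metadata dimension). All names are made up for illustration; this is not an agreed-upon Kadaster or Geonovum model.

  # Illustrative split of a combined "BAG object" into an object class and
  # a registration class, so data and metadata are modelled separately.
  from rdflib import Graph, Namespace, RDF, RDFS
  from rdflib.namespace import OWL, XSD

  EX = Namespace("http://example.org/def/")
  g = Graph()
  g.bind("owl", OWL)
  g.bind("ex", EX)

  g.add((EX.Building, RDF.type, OWL.Class))          # the real-world thing

  g.add((EX.Registration, RDF.type, OWL.Class))      # the metadata about it
  g.add((EX.isRegistrationOf, RDF.type, OWL.ObjectProperty))
  g.add((EX.isRegistrationOf, RDFS.domain, EX.Registration))
  g.add((EX.isRegistrationOf, RDFS.range, EX.Building))

  # Validity applies to the registration, not to the building itself
  g.add((EX.validFrom, RDF.type, OWL.DatatypeProperty))
  g.add((EX.validFrom, RDFS.domain, EX.Registration))
  g.add((EX.validFrom, RDFS.range, XSD.dateTime))

  print(g.serialize(format="turtle"))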

Other example:


Problem: A paradigm change, where you want to get it right for Linked Data while also having a transitional arrangement for the old UML models (Gerard).

Requirement: A best practice that helps you define models in a semantically correct way and define the relationships between models, that takes the relevant legal frameworks into account, and that gives recommendations about what to include in legislation and, more importantly, what not, given the current experience with BAG data and definitions.

Possible solution scenarios:

  • The ISO standard for converting UML to OWL from the Geo domain, which has often been criticized given the quality of the OWL models produced with it.
  • Use shapes to define governance separately


Follow-up actions:

  • Initiate a new W3C community group with e.g. Michael Lutz to work on this, and/or
  • Focus upon NEN 3610 with the Kadaster, Geonovum, ArchiXL, … and schedule a Conceptual Friday to discuss this further (action: Linda)


Towards A Sound and Comprehensive RDF 2.0 Version?

Pieter van Everdingen of OpenInc pitched this topic.

The current set of W3C core recommendations is a set of complementary standards, where each next standard adds new concepts where a previous standard falls short. Working this way, we now have e.g. the following set of core standards:

  • RDF
  • RDFS
  • SKOS
  • OWL
  • SHACL


Having many standards adds to the confusion of when and how to use a standard properly for a given situation (see also the SKOS vs OWL update). And if we look at inferencing with data, we see that the current set of standards is not semantically rich enough, nor mathematically sound enough, to do it well for all possible domains and situations. Certain relationship types are missing, and certain standards, like OWL, have some mathematical foundation, while others, like RDFS and SKOS, are less formal.

When we look at the possible evolution of these standards, we see two possible scenarios in the effort to get these standards right at some point in the future:

  • Scenario 1: Make better versions of the current set of complementary standards via a consensus approach
  • Scenario 2: Investigate the current usage of these standards in detail, determine their shortcomings, ‘weed out’ unnecessary concepts, and work with a small number of lead experts towards the best possible minimal viable version of RDF 2.0: one that includes all fundamental concepts needed for creating semantically rich models, with all possible list, tree and network structures (simple and complex) and the most often used constraints within these structures. This minimal viable version of RDF 2.0 should also include all fundamental concepts for correct inferencing over well-designed data structures and constraints.


My concern is that in the current effort to ‘get things right’ we keep adding concepts to standards and inventing new additional standards, while the foundation of these standards still has its shortcomings, and will keep them if we don’t decide upon a different approach. The foundation ‘rumbles’, and we will continue to suffer from these shortcomings if we don’t get the most foundational level of the Semantic Web and Linked Data right in the first place.

The idea of scenario 2 is also based upon the evolution from version 1 to version 2 of the relational model. Version 1 of the relational model had its shortcomings too, but Edgar Codd (and Chris Date) worked very hard in the late eighties to get version 2 of the relational model right, with a solid mathematical foundation using set theory, predicate logic and relational calculus. At this moment I see no arguments why this scenario could not work for Linked Data.

Comments on this pitch from the other participants:

Wouter: If you would like to extend the entailment options of Linked Data and the Semantic Web, you might be better off first looking at initiatives outside the RDF standard, like SKOS and OWL.
Pano: In practice, inferencing is mostly a performance problem. In theory you can do a lot of inferencing, but in practice it might not be feasible.
Wouter: In practice we also see that the partOf relationship is missing, but you can add this relationship to your own vocabulary. OWL does not prevent you from doing so.
Bart and Wouter: What exactly is missing in the current set of core standards? Did you already look at the W3C RDF 1.1 archives to see why certain concepts were not included?
Pieter: Not yet; it is something I should do at some point to get a better idea of how the recommendation has evolved, but it also sounds like a lot of work. First, I would like to have your feedback on the possible scenarios and on whether working on a new RDF 2.0 recommendation as proposed is desirable or not, and then start doing the ‘homework’.
Gerard: The OWL recommendation was produced by reaching consensus among the Working Group members. They did not succeed in getting everything into this recommendation.
Pieter: Ok, I understand this process, but that does not yet say anything (or not in enough detail) about the mathematical correctness and completeness of the fundamental concepts with respect to the correct and detailed (or more complex) inferencing you would like to do with the data structures, constraints and rules defined with these concepts.
Wouter: In practice we see wrong usage of mathematical relationships, e.g. with the owl:sameAs relationship; it might be an idea to add an rdfs:sameAs relationship as a less formal variant. Jim Hendler also proposed a number of improvements that would be helpful (from slide 21: e.g. the partOf relationship, the implies relationship, being able to work with probability, temporal reasoning, procedural attachment, sufficient formalism, different kinds of completeness in a decidable way, etc., in order to improve and ‘revive’ the usage of OWL).
Wouter: My RDF wish list of possible improvements would be:

  • Remove blank nodes from the recommendation and use well-known URIs instead
  • Allow literals to be used in the subject and predicate positions (so that you can say something about a string, which can be helpful)


Pano: Is the second wish not in conflict with the blank nodes wish?
Wouter: I don’t think so. Ideally everything has its own URI.
Nicky: You can also solve literals at subject position in a different way (details to be included later)
Wouter: Then I withdraw my second wish.
Joop: Blank nodes can be useful in certain contexts given their compact syntax.
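
As an illustration of the first wish, RDF 1.1 already describes a halfway house: blank nodes can be ‘skolemized’ into well-known genid URIs, which rdflib supports out of the box. The data below is made up for this sketch.

  # Replacing blank nodes with well-known (skolem) URIs, per RDF 1.1.
  from rdflib import Graph

  ttl = """
  @prefix ex: <http://example.org/> .
  ex:address42 ex:locatedAt [ ex:street "Dam" ; ex:number 1 ] .
  """

  g = Graph().parse(data=ttl, format="turtle")
  skolemized = g.skolemize()   # blank node -> /.well-known/genid/... URI

  print(skolemized.serialize(format="turtle"))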
Bart:

  • Also remove Reification from the recommendation


Pieter: In general I think it would be good if it became clearer when to use RDF, SKOS and OWL (especially for newcomers), so we need to keep explaining the Linked Data basics. The W3C SKOS vs OWL for Interoperability Community Group can help to give more clarity, but I think it would also be good to get the foundation right by looking at the current shortcomings of the set of complementary recommendations, following an approach similar to the creation of version 2 of the relational model, to arrive at a solid and correct RDF foundation for the future.

Need & Proper Usage of Indirect Identifiers (id & doc URIs)?

Linda van den Brink of Geonovum pitched this topic.

Linda: Explains the problem with indirect identifiers (the difference between id and doc URIs).
Wouter: The distinction is only important for digital things, not for physical things.
Wouter: Is any empirical information available on how indirect identifiers are used now? And can this be translated into a best practice?
Bart: This is also applicable for fire department incidents. You can model it in a way that something is a description of something else.
Gerard: +1 Make sure that your model is right and that you make the distinction.
Bart: An incident is an event with an id from the moment it has been reported. But the information about the fire department’s response to an incident also gets an id.
Pano: You would like to make this distinction clear to everybody to avoid confusion, e.g. by using an isDescribedBy predicate?
Linda: The current best practice has defined an issue for this on Github on which you can react:
https://github.com/w3c/sdw/issues/208
See also http://w3c.github.io/sdw/bp/ (section Status of this Document)
Nicky: Maybe it is also a problem that id and doc URIs are too much alike.
Bart: Maybe it is something that can be solved in the HTTP link header.
Wouter: This topic could use some uptake, since it is used a lot by web developers. The HTTP RFC does not support this at the moment, but you can add and register a keyword at IANA.
Bart: This is also something that can be discussed within LDP Next (the follow-up of the Linked Data Platform Community Group). This is a follow-up action for Bart.
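
For reference, the id/doc distinction is usually visible ‘on the wire’ as a 303 redirect: the id URI identifies the thing itself and redirects to a doc URI that describes it. The sketch below uses DBpedia only as a familiar example; the exact status code and redirect target may differ per publisher.

  # Observing the id/doc distinction: an id URI answers with 303 See Other,
  # pointing to a doc URI (a document about the thing).
  import requests

  id_uri = "http://dbpedia.org/resource/Amsterdam"   # identifies the city
  resp = requests.get(id_uri, headers={"Accept": "text/html"},
                      allow_redirects=False)

  print(resp.status_code)              # expected: 303
  print(resp.headers.get("Location"))  # doc URI, e.g. .../page/Amsterdam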

How-To Query Geo Data with Acceptable Performance?

Wouter Beek of the VU Amsterdam/Triply pitched this topic.

Requirements and findings: Next to 2D polygons, we would also like to have tools to define 3D data in Linked Data and to query that data with acceptable response times (e.g. a containment question). Reasonable results can already be accomplished using Lucene and smart index strategies.

Another requirement is being able to define an inclusion relationship at the functional level. In practice we see that commercial products are not yet making much effort to include these requirements, but we must also say that we still need to investigate the Oracle and Stardog solutions further to come to a better conclusion.
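
As an illustration of the kind of containment question meant here, below is a GeoSPARQL sketch sent through SPARQLWrapper. The endpoint URL and the ex: vocabulary are placeholders; geof:sfContains is a standard GeoSPARQL function, but store support (and its performance) varies.

  # Illustrative GeoSPARQL containment query: which buildings lie within a
  # given municipality geometry?
  from SPARQLWrapper import SPARQLWrapper, JSON

  query = """
  PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
  PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
  PREFIX ex:   <http://example.org/>

  SELECT ?building WHERE {
    ex:municipality42 geo:hasGeometry/geo:asWKT ?area .
    ?building a ex:Building ;
              geo:hasGeometry/geo:asWKT ?point .
    FILTER(geof:sfContains(?area, ?point))
  }
  """

  endpoint = SPARQLWrapper("http://example.org/sparql")  # placeholder
  endpoint.setQuery(query)
  endpoint.setReturnFormat(JSON)

  for row in endpoint.query().convert()["results"]["bindings"]:
      print(row["building"]["value"])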

Questions to be answered:

  • What is so different about Geo data that it impacts performance?
  • How does GeoSPARQL work, and can you define user-defined functions properly?
  • How are indexes used, given the graph paradigm of Linked Data?
  • Do we need to ‘decompose’ WKT into triples or not? (causing a triple explosion)
  • Can we cooperate with the Linked Data Benchmark Council (LDBC), HOBBIT or any other organization for Geodata benchmarks?
  • Does any H2020 program have a specific use case for geospatial queries (next to NWO and Environmental Act projects)?
  • What is the feasibility of this topic (positive or negative)? We tend to be positive and realistic: good progress can be made with the right parties involved.


Follow-up action:

  • Get this on the agenda of the OGC meeting next month in Delft (Wouter)


How-To Make Linked Data Tooling More Sustainable?

Pieter van Everdingen of OpenInc pitched this topic.

Question to be answered:

In the past we have seen examples of Linked Data tooling stacks like LOD2, but unfortunately in practice these stack proposals don’t get enough ‘critical mass’ to become sustainable, while e.g. a large number of Java tools became sustainable enough that you can design and implement mature software factories for a community of developers. The question then is: what can we do differently to make Linked Data tooling more sustainable?

Findings on current stacks:

Stacks are often too big and too complex for starters, they are difficult to install (with or without Docker), and tools within a stack are sometimes immature, which makes day-to-day usage of these tools cumbersome. It is also the intention of European projects like LOD2 that commercial parties adopt the insights of these projects in their own Linked Data products, but this is happening only at a very small scale, with a limited number of vendors in this area.

Vendors and tools that did adopt insights from European projects include Eccenca, the Semantic Web Company with their PoolParty product suite, the Linked Data Reactor (LD-R), YASGUI and HDT. But we also see that it takes a lot of effort to make these tools sustainable. In practice it is feasible to make small components sustainable, but making whole stacks sustainable is still too challenging, since we cannot yet create enough critical mass.

A counter-productive force in this area is that developers sometimes build new tools themselves faster than they adopt existing ones. On the other hand, we also see some convergence efforts from commercial organizations, which convert their Linked Data tools to Java-based solutions in order to be more competitive and more compliant.

How-To Make Linked Data Publishing Less Challenging?

Bart van Leeuwen of Netage pitched this topic.

Question to be answered:

If we look at the quality of beta releases of current Linked Data publications, we see much room for improvement. So, what can we do to improve the quality of these publications?

Possible solution scenarios:

Can we make more use of different types of validators to check the quality of a Linked Data publication and of APIs? This might be an interesting research topic for a student at a university (building upon what the LOD Laundromat can already do at the dataset level).
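
As a sketch of where such a project could start, the fragment below fetches a published document and reports only the most basic problems; a real validator (e.g. on top of the LOD Laundromat) would check far more, such as broken links, undefined vocabulary terms and shape violations.

  # Minimal Linked Data publication check: is the document retrievable and
  # parseable as RDF, and does it contain any triples at all?
  import sys
  from rdflib import Graph

  def check_publication(url):
      problems = []
      g = Graph()
      try:
          g.parse(url)                 # rdflib guesses the RDF format
      except Exception as exc:         # syntax errors, wrong content type
          return [f"not parseable as RDF: {exc}"]
      if len(g) == 0:
          problems.append("parsed, but contains no triples")
      return problems

  if __name__ == "__main__":
      for problem in check_publication(sys.argv[1]) or ["no basic problems found"]:
          print(problem)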

How-To Organize Coordination & Trust in Collaboration Chains?

Jeroen van Beele pitched this topic.

Question to be answered: How can sub-economies collaborate using DEMO and Linked Data (http://www.ee-institute.org/en/demo)? The starting points are fundamental concepts like Resources, Labor, Planning, Goals and Knowledge that we would like to share in the collaboration chain. The biggest challenge is in the area of sharing Goals and Planning, and in how we can move from exchange values to use values.

Related initiatives:

  • The SOLID project, the distributed social web with Tim Berners-Lee and others
  • The World Garden project from 2005, with WebID based upon FOAF, which you can use to manage your identity across the web
  • Cryptocurrency initiatives
  • Data science initiatives at the Jheronimus college (JADS)
  • Linked Data Blockchain / Hyperledger initiatives


Issues:

  • All initiatives have a rather technical focus, with no sound business case and no clear incentives for end users. Only central models, like Facebook, have a clear business case.
  • Executing queries against one central source is much easier and scales much better than querying over many nodes (which is often not feasible with current technology).


How-To Develop More Data-Aware User Interfaces & Why?

Ali Khalili of the VU Amsterdam pitched this topic.

You would like to make better use of the (linked) data that is available to create better user interfaces. E.g. websites with unstructured data, like Wikipedia, usually use static templates to create a simple user interface, while websites with structured data, like DBpedia, can use the available data to create better user interfaces more flexibly and more dynamically.

Related initiatives and ideas

  • Web of intent
  • Express intent using UI profiles and Schema.org
  • A Yahoo initiative from many years ago (which one?) (Bart)
  • PoolParty presentation (Taxonomy Driven UX by Andreas Blumauer)
  • Personalized learning (Elsevier education application for nurses)
  • LD-R, which is a 'toolkit' of web components using Linked Data
  • Towards Data-Aware User Interfaces (LinkedIn article)


Issues:

  • The flexible graph structures of Linked Data must be converted to the standard tree structures usually used in UIs (see the sketch below).
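
A minimal sketch of that conversion: unfold the graph from a chosen root into a nested structure, cutting off cycles and limiting the depth. Function and key names are illustrative.

  # Unfold an RDF graph into a tree (nested dicts) for UI rendering.
  from rdflib import Graph, URIRef

  def graph_to_tree(g, root, depth=3, seen=None):
      seen = set() if seen is None else seen
      node = {"id": str(root), "children": []}
      if depth == 0 or root in seen:
          return node                  # cut off cycles and deep branches
      seen.add(root)
      for pred, obj in g.predicate_objects(root):
          child = (graph_to_tree(g, obj, depth - 1, seen)
                   if isinstance(obj, URIRef)
                   else {"id": str(obj), "children": []})  # literal leaf
          node["children"].append({"property": str(pred), "node": child})
      return node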


Possible Synergies between Graph Data & Machine Learning?

Nicky van Oorschot of Netage pitched this topic.

Netage has worked on a project for one of their customers with a lot of historic data, where they used data shapes and machine learning to detect patterns and clusters in fire brigade data. The main challenge in combining data shapes and machine learning is that semantic and statistical models are different and cannot easily be combined.

In the current solution Netage is working with a mathematical model, where the assumption is made that the semantics of the fire brigade data is implicitly available within the data itself (e.g. certain fires only happen at certain locations). Netage is looking for ways to use the semantics and the relationships within the fire brigade data in a statistical context.

Questions to be answered (also more in general):

  • How-to improve Machine Learning by making use of the semantics in data?
  • How-to use Machine Learning for improving Linked Data publishing?


Related initiatives:

  • 2 to 3 Linked Data research initiatives at universities (e.g. the Maestro project at the VU Amsterdam, starting this year, which will research how we can better combine symbolic and non-symbolic systems; RDF2vec is still at a very early development stage).
  • Around 10 Machine Learning research initiatives at universities.


But how can we stimulate possible synergies between these research initiatives and align their research paths and directions?
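
A rough sketch of the RDF2vec idea mentioned above: random walks over the graph are treated as ‘sentences’ for word2vec, so every entity gets a vector that statistical models can consume. The file name and gensim parameters are illustrative, and real RDF2vec uses more refined walk strategies.

  # Graph embeddings from random walks (RDF2vec-style, much simplified).
  import random
  from rdflib import Graph, URIRef
  from gensim.models import Word2Vec

  def random_walks(g, num_walks=100, length=4):
      subjects = [s for s in set(g.subjects()) if isinstance(s, URIRef)]
      walks = []
      for _ in range(num_walks):
          node = random.choice(subjects)
          walk = [str(node)]
          for _ in range(length):
              hops = list(g.predicate_objects(node))
              if not hops:
                  break
              pred, node = random.choice(hops)
              walk += [str(pred), str(node)]
          walks.append(walk)
      return walks

  g = Graph().parse("incidents.ttl")   # placeholder data file
  model = Word2Vec(random_walks(g), vector_size=50, window=5, min_count=1)
  # model.wv["http://example.org/incident42"] -> embedding vector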

How-To Automatically Derive The Semantics of Data from The Data Itself?

Wouter Beek of the VU Amsterdam pitched this topic.

Question to be answered:

How can we derive the definitions of known but undefined concepts automatically from text documents where these concepts are used in the text?

Possible solution scenarios & related initiatives:


SKOS Taxonomies versus More Exact Knowledge Modelling?

Gerard Kuys of Ordina and the DBpedia NL Chapter pitched this topic.

Question to be answered:

What is the optimal mix of SKOS and OWL when we look at:

  • how to make data disclosure as easy as possible, and
  • how to make detailed and exact models of knowledge domains


SKOS can be used for creating thesauri, which works well for data disclosure. But SKOS falls short when very detailed and exact models have to be created for a knowledge domain.

Related initiatives:


Agreed upon follow-up actions:

  • One (or more) members of the DBpedia NL Chapter are going to participate in the SKOS vs OWL for Interoperability Community Group (action: Gerard)
  • Members of the Kadaster Linked Data team will share their best practices within the PLDN community (action: Erwin & Pano)


How-To Deal with Conflicting Statements over Time in History?

Pano Maria of Taxonic pitched this topic.

Question to be answered:

Statements about reality can change over time and can come into conflict with each other. The question is how we can model this in such a way that the historical data on these statements can be queried in a convenient way (as simple as possible, or at least not too complex).

Possible solution scenarios:

  • Each version of an object is a Named Graph, but querying the data in this scenario is complex, and another question is how you can define the relationships between Named Graphs (see the sketch after this list).
  • Linked Data Fragments (LDF) with Memento and HDT: a scalable solution for historic linked data, but this is a single-source solution at the document level (not an endpoint or node).
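
A minimal sketch of the first scenario, using rdflib: every version of an object lives in its own named graph, and version metadata (validity, succession) is stated about the graphs themselves in the default graph. The vocabulary is made up for illustration; this is not a Kadaster model.

  # One named graph per object version; version metadata in default graph.
  from rdflib import Dataset, Namespace, Literal
  from rdflib.namespace import XSD

  EX = Namespace("http://example.org/")
  ds = Dataset()

  v1 = ds.graph(EX.building42_v1)      # named graph for version 1
  v1.add((EX.building42, EX.status, Literal("planned")))

  v2 = ds.graph(EX.building42_v2)      # named graph for version 2
  v2.add((EX.building42, EX.status, Literal("in use")))

  # Statements about the versions themselves go in the default graph
  ds.add((EX.building42_v1, EX.validUntil,
          Literal("2016-01-01T00:00:00", datatype=XSD.dateTime)))
  ds.add((EX.building42_v2, EX.replaces, EX.building42_v1))

  print(ds.serialize(format="trig"))   # all versions, grouped per graph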


Useful link to other sources: