Introduction

“Data is the new gold” or “data is the new oil”: most people have probably come across one of these expressions in recent years. That is no surprise, as more data has been generated in the past 2 years than in the entire previous history of mankind. Data and analytics are changing the way companies make decisions, creating a new world of opportunities.

Open data, although not a new phenomenon, has recently received a lot of attention from governmental organizations as well as private companies. Open data is the idea that certain data, such as governmental data, should be freely available for everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.

Big data usually refers to data sets whose size is beyond the ability of commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. Big data is often defined in terms of three dimensions: volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources). According to a Gartner survey, variety rather than volume is the most challenging aspect nowadays.

Linked Data describes a method of publishing structured data so that it can be interlinked and become more useful, which addresses the variety dimension of big data. Linked Open Data (LOD) is a practical way, defined by a set of standards and guidelines, to contribute to the Semantic Web. “Semantic” refers to meaning. The Semantic Web can be defined as a web of connections between pieces of information that allows new insights to arise. Publishing information as Linked Open Data makes it easier to reuse, because it includes many references to other sources of knowledge, which makes the information easier to access.
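To make the idea of interlinking a bit more tangible, here is a minimal sketch in plain Python. It is not a real Linked Data toolchain: statements are modeled as simple (subject, predicate, object) triples, and every URI, vocabulary term and value below is invented for illustration. It only shows how a shared identifier lets two independently published data sets be combined.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical vocabulary terms and resource URIs (assumptions, not real data).
LOCATED_IN = "http://example.org/vocab/locatedIn"
YEAR_BUILT = "http://example.org/vocab/yearBuilt"
NAME = "http://example.org/vocab/name"
BUILDING = "http://example.org/building/42"
CITY = "http://example.org/city/exampleville"

# Data set A: a made-up register of buildings.
buildings = [
    Triple(BUILDING, LOCATED_IN, CITY),
    Triple(BUILDING, YEAR_BUILT, "1931"),
]

# Data set B: a made-up register of cities, published separately.
cities = [
    Triple(CITY, NAME, "Exampleville"),
]


def follow(uri, predicate, triples):
    """Return all objects reachable from `uri` via `predicate`."""
    return [t.obj for t in triples if t.subject == uri and t.predicate == predicate]


def describe(uri, triples):
    """Return a {predicate: object} mapping for all triples about `uri`."""
    return {t.predicate: t.obj for t in triples if t.subject == uri}


# Because both data sets use the same URI for the city, merging them is
# just concatenation; the link can then be followed across the data sets.
merged = buildings + cities
city_uri = follow(BUILDING, LOCATED_IN, merged)[0]
city = describe(city_uri, merged)
print(city[NAME])
```

The point of the sketch is the shared URI: data set A never stores the city's name, yet it becomes reachable once data set B is added, without any schema mapping between the two.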

Although many discussions revolve around open data and big data, in the end we should be talking about (re)usable data. Publishing open data that is not easy to use is pointless, just as it is impossible to combine and use a large variety of data sets (big data) when the data is not really understandable and usable. Linked (Open) Data is a conceptual solution for the reuse of data, and it is mainly standards based. The Linked Data stack of standards, concepts, approaches, and technologies is very large and therefore not always easy to use, or at least it is often not easy to know where to start. Some cookbooks and step-by-step approaches already exist, but they often either lack detail or cover only a small part of the complex picture.

The roadmap at hand describes a 9-step approach with enough detail that anyone interested in a hands-on exercise in opening up a data set should be able to do so. Of course some debatable choices have been made, and it is certainly not the only way to create linked open data, but it is at least a pragmatic one. We address not only the technical aspects; we also draw attention to the selection of open data sets and place emphasis on organizing the governance around a data set.

In this post we are talking about big, small, linked and open data, and any combination of those. Size does have some impact, but mainly on the costs of providing data, the need for automation, and the SLA requirements on the data platform. The impact of openness is mainly limited to the license form and the organization of the governance of the dataset (which is covered in a separate publication, called BOMOD). Linked Data does not have much impact on the governance, but it has a major impact on the technical steps needed to prepare the data for publication. More and more datasets are published online, but not all of them are reusable or can actually be linked. Several guidelines for arriving at Linked Data and LOD are available online; the best known are the 5-star scheme from Tim Berners-Lee and the Linked Data Cookbook.

We will use these guidelines as input to arrive at an extended LOD roadmap that provides a concrete structure and pays special attention to the quality of the datasets and the description of the metadata. Our goal is to make the roadmap as practical as possible, including tools and checklists where possible.

We hope that this roadmap inspires you to start working and experimenting with linked data, and that it helps you take the first steps. We are open to suggestions for improving this roadmap. Finally, if you need help you can either try the Platform Linked Data the Netherlands community or contact us directly.
