Step 8: Publish the data

In this step the dataset is made available on the Internet. There are different options for publishing the dataset. A good practice is to make use of several options, so that data users have a choice and can select the method that best suits their purposes.

One option is to publish the dataset as a flat file. Commonly used syntaxes are RDF/XML (.rdf) and Turtle (.ttl). LODRefine, the tool recommended in Step 5 for converting the data to RDF, can export to both formats. The resulting files can simply be placed on a web server.
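When dump files are placed on a web server, it helps if the server announces the correct media type for each syntax, so that clients can recognise the format. The sketch below is only an illustration using Python's standard library; the handler name RdfDumpHandler and the helper media_type_for are assumptions made for this example and are not part of LODRefine or of this guideline.

```python
from http.server import SimpleHTTPRequestHandler
from pathlib import Path

# Registered media types for the two syntaxes mentioned above.
RDF_MEDIA_TYPES = {
    ".ttl": "text/turtle",
    ".rdf": "application/rdf+xml",
}


def media_type_for(filename: str) -> str:
    """Return the media type a server should send for the given dump file."""
    suffix = Path(filename).suffix.lower()
    # Anything that is not a known RDF dump falls back to a generic type.
    return RDF_MEDIA_TYPES.get(suffix, "application/octet-stream")


class RdfDumpHandler(SimpleHTTPRequestHandler):
    """Static file handler (hypothetical name) that knows the RDF media types."""

    # Extend the default extension table instead of replacing it.
    extensions_map = {**SimpleHTTPRequestHandler.extensions_map, **RDF_MEDIA_TYPES}
```

For a quick local trial, the handler could be passed to http.server.HTTPServer in the directory that holds the dump files; a production web server would be configured with the same two media types instead.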

Another, more advanced way to make the data available is to store it in a triple store and serve it through a SPARQL endpoint. A SPARQL endpoint allows others to query your linked data and metadata. You can provide links to the dataset download files (dumps), to the SPARQL endpoint, or to both. Download files relieve your server of heavy crawling and querying by users who want to bulk load (e.g. index) your dataset, while a SPARQL endpoint lets users select just the subset they are interested in through a query.
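To illustrate how a data user might select a subset through a query, the following sketch builds an HTTP GET request in the style of the SPARQL protocol, with the query passed in the query parameter and JSON results requested via the Accept header. The endpoint URL http://example.org/sparql is a placeholder, and the function name is an assumption made for this example. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL protocol GET request."""
    # The query text travels URL-encoded in the "query" parameter.
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask the endpoint for SPARQL results in JSON.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Placeholder endpoint; replace it with the address of your own endpoint.
EXAMPLE_ENDPOINT = "http://example.org/sparql"
# Select only ten triples instead of downloading the whole dataset.
EXAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

request = build_sparql_request(EXAMPLE_ENDPOINT, EXAMPLE_QUERY)
```

Passing the request to urllib.request.urlopen would send it to a real endpoint; users who need the full dataset are better served by the dump files.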

If you have a SPARQL graph, please provide information about it, such as the name of the SPARQL graph, in the metadata of your dataset. It is also important to publish your metadata on a central data broker to give it more visibility and increase the reuse of your dataset. The metadata quality dimensions that are important for this step of the guideline are defined in the table below.
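One possible way to record the dump location, the endpoint and the graph name in the metadata is sketched below. It uses the VoID vocabulary for the dataset and the SPARQL 1.1 Service Description vocabulary for the named graph; this choice of vocabularies is an assumption made for the example and is not prescribed by this guideline. All URIs passed in are placeholders, and the function name void_description is hypothetical.

```python
def void_description(dataset_uri: str, title: str, dump_url: str,
                     endpoint_url: str, graph_uri: str) -> str:
    """Return a small Turtle document describing where the dataset lives."""
    # Escape backslashes and quotes so the title stays a valid Turtle literal.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    return f"""@prefix void: <http://rdfs.org/ns/void#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .

<{dataset_uri}> a void:Dataset ;
    dcterms:title "{safe_title}" ;
    void:dataDump <{dump_url}> ;
    void:sparqlEndpoint <{endpoint_url}> .

<{endpoint_url}> a sd:Service ;
    sd:endpoint <{endpoint_url}> ;
    sd:defaultDataset [
        a sd:Dataset ;
        sd:namedGraph [ a sd:NamedGraph ; sd:name <{graph_uri}> ]
    ] .
"""


# Placeholder values for illustration only.
description = void_description(
    dataset_uri="http://example.org/dataset",
    title="Example dataset",
    dump_url="http://example.org/dumps/dataset.ttl",
    endpoint_url="http://example.org/sparql",
    graph_uri="http://example.org/graph/dataset",
)
```

The resulting Turtle text can be published next to the dump files or added to the metadata created in Step 7.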

Dimension	Definition	Source	Metrics
Accessibility	Extent to which information is available or easily retrievable. Extent to which data are easily found and linked to (API).	Knight & Burn [1], ODI	various from Zaveri [2]
Format: Machine-readable	Whether the data is machine-readable.	ODI

Consumers of Linked Data do not have the luxury of talking to a database administrator who could help them understand a schema. Therefore, a best practice for publishing a Linked Data set is to make it “self-describing”, e.g. by adding metadata as described in Step 7. Self-describing data means that information about the encodings used for each representation is provided explicitly within the representation.

Several frameworks and tools are available for hosting RDF data. One of them is Sesame, an open source framework for storing and querying RDF data. Sesame can be installed on any appropriate server. Its web interface, the OpenRDF Workbench, enables you to create a new RDF repository and upload the RDF triples created in Step 5 from a file. Once the data is uploaded to Sesame, users can query the dataset with SPARQL, the standard query language for linked data.
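Besides the Workbench, the upload can also be scripted. The sketch below assumes a Sesame 2 server that exposes its HTTP protocol, in which a POST to the statements resource of a repository adds triples to it. The server URL, the repository ID and the function name are placeholders chosen for this example, and older Sesame versions may expect the media type application/x-turtle instead of text/turtle. The request is only constructed, not sent.

```python
from urllib.request import Request


def build_upload_request(server_url: str, repository_id: str,
                         turtle_data: bytes) -> Request:
    """Build (but do not send) a request that adds Turtle triples to a repository."""
    # Assumed layout of the Sesame HTTP protocol: <server>/repositories/<id>/statements
    url = f"{server_url.rstrip('/')}/repositories/{repository_id}/statements"
    return Request(
        url,
        data=turtle_data,
        method="POST",  # POST adds statements to the repository
        headers={"Content-Type": "text/turtle;charset=UTF-8"},
    )


# Placeholder server address, repository ID and data for illustration only.
upload = build_upload_request(
    "http://localhost:8080/openrdf-sesame/",
    "my-dataset",
    b"<http://example.org/s> <http://example.org/p> <http://example.org/o> .\n",
)
```

Sending the request with urllib.request.urlopen would perform the upload; creating the repository itself is still done in the Workbench as described above.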

Other options to publish your data include the following platforms:

Swirrl: Commercial software as a service publishing platform.
LOD Cloud: This group catalogs data sets that are available on the Web as Linked Data and contain data links pointing at other Linked Data sets.
Open Data overheid: The Dutch National Open Data platform where governmental organizations can register their open datasets.
CitySDK: A web service offering unified and direct access to open data from government, commercial and crowd sources alike. Cities can open up their data using CitySDK.
Platform Linked Data Nederland: Platform that enables organizations to publish their linked open data.
Open data Nederland: A registry listing all the open datasets of the Netherlands on one single website.
CKAN: A powerful data management system that makes data accessible by providing tools to streamline publishing, sharing, finding and using data.
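As an illustration of registering a dataset with one of these platforms, the sketch below builds a request for the package_create action of the CKAN Action API. The portal URL, API key, dataset name and resource descriptions are placeholders, the function name is hypothetical, and a given portal may require additional fields. The request is only constructed, not sent.

```python
import json
from urllib.request import Request


def build_package_create_request(ckan_url: str, api_key: str, name: str,
                                 title: str, dump_url: str,
                                 endpoint_url: str) -> Request:
    """Build (but do not send) a CKAN request that registers a dataset."""
    payload = {
        "name": name,  # URL-friendly identifier of the dataset
        "title": title,
        # Offer both access options: a bulk download and a query endpoint.
        "resources": [
            {"name": "Turtle dump", "url": dump_url, "format": "Turtle"},
            {"name": "SPARQL endpoint", "url": endpoint_url, "format": "SPARQL"},
        ],
    }
    return Request(
        f"{ckan_url.rstrip('/')}/api/3/action/package_create",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": api_key},
    )


# Placeholder portal, key and URLs for illustration only.
registration = build_package_create_request(
    "http://ckan.example.org",
    "hypothetical-api-key",
    "example-dataset",
    "Example dataset",
    "http://example.org/dumps/dataset.ttl",
    "http://example.org/sparql",
)
```

Listing both the dump and the endpoint as resources follows the advice given earlier in this step to offer data users more than one way to access the data.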

[1] Knight, S. A., & Burn, J. (2005). Developing a framework for assessing information quality on the World Wide Web. Informing Science, 8, 159-172.

[2] Zaveri, A., et al. (2013). Quality assessment methodologies for linked open data. Submitted to Semantic Web Journal.
