How to use AllegroGraph to work with Linked Open Data
Authors
Dr. Jans Aasman (Franz Inc.)
Roel Stap (Data for Use)
Most articles in this book focus on interesting applications of Linked Open Data (LOD). This chapter instead describes some simple steps: how to use a triple store, how to load Linked Open Data, and how to create SPARQL queries with a graphical query builder. This should help readers who are new to these topics understand the methods and techniques, and so follow the more complex examples later in the book.
What are triples and what is RDF?
For completeness we introduce the concept of triples here, but we assume that readers of this book are familiar with the RDF stack. The Resource Description Framework (RDF) is a language for expressing data about resources, where a ‘resource’ can be anything (a web page, a person, an idea, etc.). The basic building block is the triple, consisting of a subject, a predicate, and an object. The subject is a URI, the predicate is a property defined for the type (class) of the subject, and the object is either a literal or the URI of another resource. Let’s look at a few assertions that express data about a resource:
bb:YogiBerra rdf:type bio:Person .
bb:YogiBerra bb:playsPosition bb:Catcher .
bb:YogiBerra bb:careerHomeRuns 358 .
Together these assertions say that a ‘resource,’ Yogi Berra, whose URI is defined in the bb: namespace, is of type Person (where the meaning of Person is defined in the bio: namespace), that he played the catcher position (where the meanings of playsPosition and Catcher are defined in the bb: namespace), and that he had 358 career home runs (where the meaning of careerHomeRuns is defined in the bb: namespace). This is what data looks like in RDF: triples expressed as a subject, a predicate, and an object, separated by spaces and concluded with a period.
The description above comes from a short RDF course that can be found on the Franz website (www.franz.com).
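To make the subject, predicate, object structure concrete, here is a minimal Python sketch that holds the three Yogi Berra assertions as plain tuples and looks them up by pattern. It is only an illustration of the data model, not how AllegroGraph stores triples; the helper name `match` is made up for this example.

```python
# Each triple is a (subject, predicate, object) tuple.
# Prefixed names are kept as plain strings; the literal 358 is an integer.
TRIPLES = [
    ("bb:YogiBerra", "rdf:type", "bio:Person"),
    ("bb:YogiBerra", "bb:playsPosition", "bb:Catcher"),
    ("bb:YogiBerra", "bb:careerHomeRuns", 358),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every position that is not None.

    None acts as a wildcard, much like a variable in a query pattern.
    (Hypothetical helper, written for this illustration only.)
    """
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Everything we know about Yogi Berra.
for triple in match(TRIPLES, s="bb:YogiBerra"):
    print(triple)

# Only his career home runs: take the object of the single matching triple.
print(match(TRIPLES, p="bb:careerHomeRuns")[0][2])
```

Leaving a position open is the same idea that a triple store uses when it answers a query pattern with variables.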
Most Linked Open Data comes in the form of files containing RDF triples. To work efficiently with triples you need a triple store: a database specialized for storing data in the triple format. A good triple store lets you store triples and index them for fast retrieval, perform SPARQL queries, and reason either dynamically or through materialization. AllegroGraph is such a triple store, with some additional unique capabilities.
In our example we work with a data set that we extracted from DBpedia, the triple version of Wikipedia. We took all the information about movies, actors, producers, and directors and stored it in a single file in N-Triples format. This file can be downloaded from our website (see the instructions below).
We are going to use a powerful visual navigation tool called Gruff. Gruff is one of the interfaces to AllegroGraph: it allows you to create a new triple store, load triple files to populate the store, and then query triples or display them on the screen. Gruff comes in two forms: a standalone version that includes a basic version of AllegroGraph, and the server edition. You will want the server edition if you are working with hundreds of millions to billions of triples.
If you just want to look at a few million triples and you don't have easy access to a Linux Server, then you can just install the standalone version. We are going to use the standalone version in this tutorial.
Visit http://www.franz.com/agraph/gruff and go to the download section. Assuming you have a 64-bit Windows machine you should download Gruff v5.0.x for AG 3.3.
Extract the file that you downloaded into a convenient location (which we will refer to as the Gruff directory).
Go into the Gruff directory and double click 'gruff.exe'
The data for our example is at http://bit.ly/126Ng82. Please unzip it and put it in a convenient place.
Creating a triple store is now about as easy as starting an Excel spreadsheet.
First we create a new triple store: File -> New Triple-Store
Because we work with the standalone version, we use the name of the local machine as the host; Gruff will probably have filled it in for you already. As you can see, my laptop is called JansSamsung. Note that you don’t have to fill in a port number. For the Store Folder, type in the full name of the triple store you are going to create. Make sure it is not an existing directory, because that directory would be overwritten.
Once you click ok, the database will ask you how many triples you expect. Just accept the default. This number is only important if you know you are going to use millions of triples.
Now we are going to load the file with movies and actors in Gruff.
File->Load Triples->Load N-Triples
Gruff will ask if you want to load the triples from a file or from the web. Choose ‘file’ for this tutorial.
Then find the file actors.ntriples that you downloaded from the Franz Inc. website, select it, and load it. You will see a yellow bar for a few seconds; when that bar disappears, the data is ready to be used.
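If you are curious what such a file contains: an N-Triples file holds one triple per line, with URIs in angle brackets and literals in double quotes, closed by a period. The sketch below reads two such lines with a regular expression. It covers only a simplified subset of the format (no blank nodes, no comments, escapes are not decoded), and the example.org URIs are made up for illustration, not taken from the tutorial file.

```python
import re

# One statement: <subject> <predicate>, then either an <object URI> or a
# "literal" with an optional @language tag or ^^<datatype>, then a period.
# Simplified subset: blank nodes and comment lines are not handled.
LINE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+'
    r'(?:<([^>]*)>|"((?:[^"\\]|\\.)*)"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
    r'\s*\.\s*$'
)


def parse_line(line):
    """Return (subject, predicate, object) or None if the line does not match."""
    m = LINE.match(line)
    if m is None:
        return None
    subject, predicate, uri, literal = m.groups()
    # Exactly one of uri / literal is set, depending on the object form.
    obj = uri if uri is not None else literal
    return (subject, predicate, obj)


# Hypothetical sample lines (the real file uses DBpedia URIs).
SAMPLE = """\
<http://example.org/film/FilmA> <http://example.org/starring> <http://example.org/person/Kevin_Bacon> .
<http://example.org/person/Kevin_Bacon> <http://example.org/name> "Kevin Bacon"@en .
"""

triples = [parse_line(line) for line in SAMPLE.splitlines() if line.strip()]
for triple in triples:
    print(triple)
```

For real data you would rely on the loader in Gruff or on an RDF library rather than a hand-written pattern like this one.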
To quickly test that the data was loaded, choose from the Gruff menu:
Display->Display All Triples Up To A Limit.
And after a few seconds you’ll see activity on your screen. Use the wheel on your mouse (or shift- or shift+,) to zoom in and out and then press the letter ‘r’ to reformat the screen.
Just for fun you might want to click on a node and go to the Tabular View. Click around a little bit to become familiar with the data. Go back to the Graph View (View->Graph View) or press the letter ‘g’.
Now we are going to clear the screen with Remove->Remove-All-Nodes. (Don’t worry: this does not delete any triples, it only removes the nodes from the screen.)
In many cases you start exploring a data set by typing in some of the concepts that you expect to find in it. For that we need text indexing (i.e. keyword search, as in Google).
Display -> Edit Free Text Predicates
A widget will pop up, just select all and then click ok.
Now we want to find Kevin Bacon:
Display-> Display Triples by Freetext Index (or press ‘;’ ) and type Kevin Bacon in the search field.
Browse through the results and choose Kevin Bacon and select ok.
You now have one node on the screen, which we are going to use in the next section.
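The idea behind a free-text index can be sketched in a few lines of Python: map every word that occurs in an indexed literal to the triples containing it, then intersect the matches for each search word. This is a toy model under our own assumptions, not AllegroGraph's implementation; the ex: names and both helper functions are made up for illustration.

```python
from collections import defaultdict

# Made-up sample data; the tutorial file uses DBpedia URIs instead.
TRIPLES = [
    ("ex:Kevin_Bacon", "ex:name", "Kevin Bacon"),
    ("ex:FilmA", "ex:name", "Film A"),
    ("ex:FilmA", "ex:starring", "ex:Kevin_Bacon"),
    ("ex:Arnold_Schwarzenegger", "ex:name", "Arnold Schwarzenegger"),
]


def build_index(triples, indexed_predicates):
    """Map each lowercase word of an indexed literal to the triples holding it."""
    index = defaultdict(set)
    for triple in triples:
        _, predicate, obj = triple
        # Only predicates selected for indexing contribute words.
        if predicate in indexed_predicates:
            for word in obj.lower().split():
                index[word].add(triple)
    return index


def search(index, text):
    """Return the triples whose literal contains every word of the query."""
    words = text.lower().split()
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)


index = build_index(TRIPLES, {"ex:name"})
print(search(index, "Kevin Bacon"))
print(search(index, "arnold"))
```

Choosing which predicates to index corresponds to the selection you made in the Edit Free Text Predicates dialog.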
With Kevin Bacon on the screen, we want to see some triples where Kevin is the subject or the object. First select the predicates that you want to see on the screen: type the letter ‘p’ and you’ll see a list of predicates. Choose DS, Director, and Starring and click OK.
Now select the Kevin Bacon node and press the letter ‘f’. You’ll see a lot of new nodes come up. Click on a movie and expand it by pressing the letter ‘f’ again. Repeat this for a few nodes and you’ll see that the screen gets crowded with nodes and links. Zoom in a little (use the wheel on your mouse or shift-.) and press the letter ‘r’ to reformat. (The Layout menu lists all the ways of reorganizing the screen that Gruff provides.) The screenshot below should look similar to what you see on your machine.
Note how you see on the left side the names of the predicates and classes used as well as the corresponding colors.
There are other ways to explore the graph on the screen. Press the letter ‘z’ a few times until only the Kevin Bacon node is on the screen (you might have to zoom back to see his name again).
For example: right click on Kevin and play with the first two options (Display a Linked Node from Menus, Display Linked Nodes from a Tree) to show triples on the screen.
One significant advantage of using a triple store is that the database can find connections between nodes for you.
First let us clean up the screen by removing all triples from the screen (Remove -> Remove All Triples)
Then use ‘;’ to find Kevin Bacon (or use Display->Display Triples by Freetext Query)
Do this again to find Arnold Schwarzenegger (just type Arnold and you’ll find him). You should now have two nodes on the screen: Arnold Schwarzenegger and Kevin Bacon. Now select Arnold, press ‘shift-f’, drag the cursor to Kevin, and click. You should see something like the picture below.
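Conceptually, finding a connection between two nodes is a path search over the graph that the triples form. The following sketch uses a breadth-first search and ignores the direction of the triples; it illustrates the idea only and is not the algorithm Gruff or AllegroGraph uses. The films and the extra actor are invented placeholders.

```python
from collections import defaultdict, deque

# Invented placeholder data: two films that share one actor.
TRIPLES = [
    ("ex:FilmA", "ex:starring", "ex:Kevin_Bacon"),
    ("ex:FilmA", "ex:starring", "ex:ActorX"),
    ("ex:FilmB", "ex:starring", "ex:ActorX"),
    ("ex:FilmB", "ex:starring", "ex:Arnold_Schwarzenegger"),
]


def neighbours(triples):
    """Build an undirected adjacency map: a triple links its subject and object."""
    graph = defaultdict(set)
    for s, _, o in triples:
        graph[s].add(o)
        graph[o].add(s)
    return graph


def shortest_path(triples, start, goal):
    """Breadth-first search; returns one shortest list of nodes, or None."""
    graph = neighbours(triples)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sorting keeps the result deterministic when several paths tie.
        for nxt in sorted(graph[node]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(shortest_path(TRIPLES, "ex:Arnold_Schwarzenegger", "ex:Kevin_Bacon"))
```

On a store with millions of triples you would leave this search to the database, which can use its indexes instead of scanning every triple.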
First let us discuss the Tabular View. Assuming that Kevin Bacon is still on the screen, double click on his name (or press the letter ‘t’) to open the tabular view. See the picture below. Navigating this view is fairly self-evident. Note the thick grey line in the middle: above the line are the triples that have Kevin as their subject, and below the line are the triples that have Kevin in the object position.
[fig 5]
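The split that the tabular view makes can be expressed in a few lines: collect the triples in which the node is the subject, and separately those in which it is the object. This is a hedged sketch with invented ex: data and a made-up function name, not Gruff's code.

```python
# Invented sample data for illustration.
TRIPLES = [
    ("ex:Kevin_Bacon", "rdf:type", "ex:Person"),
    ("ex:Kevin_Bacon", "ex:name", "Kevin Bacon"),
    ("ex:FilmA", "ex:starring", "ex:Kevin_Bacon"),
    ("ex:FilmB", "ex:director", "ex:Kevin_Bacon"),
]


def tabular_view(triples, node):
    """Split the triples around one node.

    outgoing: (predicate, object) pairs where the node is the subject
              (shown above the grey line).
    incoming: (subject, predicate) pairs where the node is the object
              (shown below the grey line).
    """
    outgoing = [(p, o) for s, p, o in triples if s == node]
    incoming = [(s, p) for s, p, o in triples if o == node]
    return outgoing, incoming


outgoing, incoming = tabular_view(TRIPLES, "ex:Kevin_Bacon")
print("subject of:", outgoing)
print("object of: ", incoming)
```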
Another way to explore the triples is to use the outline view. Click on Kevin again and hit the letter ‘O’ and you’ll see this. Note that black text means that you are going deeper into the hierarchy, the blue text means that you see triples that point back at Kevin. Just play with it and you’ll soon understand intuitively.
Gruff will help you write SPARQL queries. If you know SPARQL to some extent, you can go to the query view by pressing ‘w’ or choosing View->Query View. Select the SPARQL bullet and then, just for fun, type this little query, which selects a hundred arbitrary triples from the triple store (without an ORDER BY clause, the store is free to return any hundred).
SELECT * WHERE { ?x ?y ?z . } LIMIT 100
First click on the ‘Do Query’ button and then click on the Create Visual Graph button.
You don’t have to type SPARQL queries by hand; you can also build them graphically. Here is a brief example.
The query that we are going to build: ‘Who directed the movies that Kevin Bacon starred in?’
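Written out by hand, this question is a join of two triple patterns that share the variable ?movie. The sketch below shows one possible SPARQL text as a Python string and then evaluates the same join over a few invented triples. The dbo: and dbr: names follow DBpedia conventions and are an assumption on our part; check the predicate list in Gruff (the letter ‘p’) for the names that the loaded file really uses.

```python
# One way to phrase the question in SPARQL. The predicate and resource names
# are assumptions based on DBpedia naming, not taken from the tutorial file.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT DISTINCT ?movie ?director WHERE {
  ?movie dbo:starring dbr:Kevin_Bacon .
  ?movie dbo:director ?director .
}
"""

# Invented data to show what the join does; films and directors are placeholders.
TRIPLES = [
    ("ex:FilmA", "dbo:starring", "dbr:Kevin_Bacon"),
    ("ex:FilmA", "dbo:director", "ex:DirectorX"),
    ("ex:FilmB", "dbo:starring", "ex:ActorY"),
    ("ex:FilmB", "dbo:director", "ex:DirectorZ"),
]


def directors_of_films_starring(triples, actor):
    """Evaluate the two patterns by hand: first bind ?movie, then ?director."""
    # Pattern 1: ?movie dbo:starring <actor>
    movies = {s for s, p, o in triples if p == "dbo:starring" and o == actor}
    # Pattern 2: ?movie dbo:director ?director, restricted to the movies found.
    return sorted(
        (s, o) for s, p, o in triples if p == "dbo:director" and s in movies
    )


print(QUERY)
print(directors_of_films_starring(TRIPLES, "dbr:Kevin_Bacon"))
```

The graphical query builder produces the same kind of structure: two links that meet in a shared movie node.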
We have shown you a simple way to create a triple store, navigate the triples, and create queries. Please note that we only used Gruff with the built-in triple store. For larger data sets and for working with SPARQL 1.1 you will want to try the combination of the AllegroGraph server and Gruff as the client.
AllegroGraph® is a modern, high-performance, persistent graph database. AllegroGraph uses efficient memory utilization in combination with disk-based storage, enabling it to scale to billions of quads while maintaining superior performance. AllegroGraph supports SPARQL, RDFS++, and Prolog reasoning from numerous client applications.
Franz Inc. is a bronze sponsor of PLDN. It provides a variety of services as part of its Knowledge Graph platform solution, from architectural consulting and technical seminars to training. Franz’s flagship product, AllegroGraph, provides the necessary power and flexibility to address your Knowledge Graph needs.