Semantic Hackday Notes

This blog post is liable to change rapidly and should be treated as a work in progress.

I’m at the Bright Lemon/Kasabi Open Government hackday today and thought that a blog post might prove a useful record of the day. After a round of introductions around the table, Leigh Dodds gave a brief run-through of Kasabi, introducing the latest NHS datasets loaded onto Kasabi and their functionality.

Kasabi Default APIs

One of the most difficult things for me to get my head around was the five default APIs available for every Kasabi-hosted dataset. I have linked to the documentation pages for each of these:

I will also discuss each of them below as I start to use and understand them.

SPARQL Endpoint

The SPARQL Endpoint page for each dataset provides access to a SPARQL API Tester or Experimental API Explorer. It also links to any example SPARQL queries that have been defined for a dataset, e.g. “List all UK Primary Care Trusts” on some of the NHS data.
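Behind the API explorer, a SPARQL endpoint is an ordinary HTTP service. As a minimal sketch, assuming the endpoint follows the W3C SPARQL 1.1 Protocol (the query is sent URL-encoded in a `query` parameter of a GET request) and can return the standard SPARQL JSON results format, the Python below builds such a request and flattens a response into one dict per row. The endpoint URL is a placeholder rather than a real Kasabi address, any API key Kasabi required is not shown, and the sample response is hand-written, not real output.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder: substitute the endpoint URL shown on the dataset's API tab
# (plus any API key the service requires).
ENDPOINT = "http://example.org/dataset/sparql"

QUERY = """PREFIX ns: <http://www4.wiwiss.fu-berlin.de/factbook/ns#>
SELECT ?country WHERE { ?country a ns:Country . } LIMIT 10"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol 'query via GET' request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results_json: str) -> list:
    """Flatten SPARQL JSON results into one {variable: value} dict per row.

    A variable left unbound in a row is absent from that row's bindings,
    so it is skipped here as well.
    """
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in data["results"]["bindings"]
    ]


# Hand-written sample response with placeholder values (not real Factbook data).
SAMPLE_RESPONSE = """{
  "head": {"vars": ["country"]},
  "results": {"bindings": [
    {"country": {"type": "uri", "value": "http://example.org/resource/CountryA"}},
    {"country": {"type": "uri", "value": "http://example.org/resource/CountryB"}}
  ]}
}"""

if __name__ == "__main__":
    request = build_sparql_request(ENDPOINT, QUERY)
    print(request.full_url)
    # Against a live endpoint you would send the request with
    # urllib.request.urlopen(request) and pass the decoded body to rows().
    for row in rows(SAMPLE_RESPONSE):
        print(row["country"])
```

Keeping request-building and result-parsing separate means the parsing can be tried out without any network access.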

Dataset homepages

The homepage for each dataset shows a descriptive section at the top and, below it, three tabs:

  • API – lists the APIs (including the five defaults above) available for a dataset
  • Explore – displays a list of ways that the data can be viewed, including the default “As linked data” layout (whose default URL is the dataset’s linked data void description)
  • Attribution – shows ways in which data or output can be attributed to Kasabi.

Getting started with SPARQL on Kasabi

In order to get started with a dataset, you can edit and play with some of its sample queries or, if there are no sample queries, try the following procedure (this procedure uses the CIA World Factbook dataset):
  1. Open the dataset homepage
  2. Open the Explore tab and open the “Browse as Linked Data” link in a new tab
  3. This opens the ‘void description’ page for the dataset, which is the default URL for the dataset’s linked data and gives some very basic information about the dataset
  4. Go back to the API tab and click on the SPARQL Endpoint for the dataset
  5. Open the experimental API explorer for that dataset
  6. This lets you write and run some example queries

SPARQL Examples

On the ‘void description’ page, look for the definitions of any vocabularies used. These allow you to query the ‘types’ within the data. There are a couple of ways of doing this. Try this one: go to the experimental API explorer for the CIA dataset and type in:

PREFIX ns: <http://www4.wiwiss.fu-berlin.de/factbook/ns#>

SELECT ?country WHERE {
 ?country a ns:Country .
}

Running this will give a list of the countries in the dataset (‘a’ is SPARQL shorthand for rdf:type). This next one will give 10 triples from the dataset (‘LIMIT 10’); without an ORDER BY clause, which 10 you get is up to the store:

PREFIX ns: <http://www4.wiwiss.fu-berlin.de/factbook/ns#>

SELECT * WHERE {
 ?s ?p ?o .
} LIMIT 10

These queries both declare a single vocabulary prefix, PREFIX ns: <http://www4.wiwiss.fu-berlin.de/factbook/ns#> (the second query does not actually use it). They then name the variables to be returned in the SELECT clause (‘?country’, or ‘*’ for every variable) and follow that with a WHERE clause holding the triple pattern the data must match. Another useful form of SPARQL query is DESCRIBE. Try this:

PREFIX ns: <http://www4.wiwiss.fu-berlin.de/factbook/ns#>

DESCRIBE <http://www4.wiwiss.fu-berlin.de/factbook/resource/Ireland>

This query should return the equivalent of this CIA page about Ireland, or this page on the German source of the CIA data.
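What exactly DESCRIBE returns is left by the SPARQL specification to the service answering the query. One common choice is every triple that has the resource as its subject, optionally adding the triples that point at it. A toy Python sketch of that strategy, over a few made-up triples with abbreviated names (not real Factbook data, and not necessarily how Kasabi implemented DESCRIBE):

```python
# Illustrative triples only; "res:", "ns:" and "rdf:" stand in for full URIs.
TRIPLES = [
    ("res:Ireland", "rdf:type", "ns:Country"),
    ("res:Ireland", "ns:name", "Ireland"),
    ("res:France", "rdf:type", "ns:Country"),
    ("res:Example", "ns:exampleLink", "res:Ireland"),
]


def describe(resource, triples, include_incoming=False):
    """Return the triples with `resource` as subject (and, optionally, as object)."""
    return [
        (s, p, o)
        for s, p, o in triples
        if s == resource or (include_incoming and o == resource)
    ]


for triple in describe("res:Ireland", TRIPLES):
    print(triple)
```

Setting `include_incoming=True` would also pull in the hypothetical `res:Example` triple that links to Ireland, which is one reason two endpoints can give different answers to the same DESCRIBE.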

The vocabulary used by Kasabi with this dataset can be seen here: http://www4.wiwiss.fu-berlin.de/factbook/page/Ireland. This allows us to list every value of a particular property (the predicate of a triple), such as this list of factbook codes (these are in fact the FIPS country codes used by the US Federal government):

PREFIX ns: <http://www4.wiwiss.fu-berlin.de/factbook/ns#>

SELECT * WHERE {
 ?s ns:factbookcode ?code .
}
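A query like this is a single triple pattern: the predicate is fixed and the subject and object are variables. Conceptually, the store compares the pattern with every triple, and each triple that matches produces one result row binding the variables. A toy Python sketch of that matching, with made-up triples and abbreviated names (real stores use indexes rather than scanning everything):

```python
# Illustrative triples only; "res:", "ns:" and "rdf:" stand in for full URIs.
TRIPLES = [
    ("res:CountryA", "rdf:type", "ns:Country"),
    ("res:CountryA", "ns:factbookcode", "aa"),
    ("res:CountryB", "rdf:type", "ns:Country"),
    ("res:CountryB", "ns:factbookcode", "bb"),
]


def is_var(term):
    """Terms starting with '?' are variables; anything else must match exactly."""
    return term.startswith("?")


def match_triple(pattern, triple):
    """Return the variable bindings if `triple` matches `pattern`, else None."""
    bindings = {}
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable repeated within a pattern must bind the same value each time.
            if bindings.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return bindings


def select(pattern, triples):
    """One result row per matching triple, like a one-pattern WHERE clause."""
    return [b for b in (match_triple(pattern, t) for t in triples) if b is not None]


# Roughly: SELECT * WHERE { ?s ns:factbookcode ?code . }
for row in select(("?s", "ns:factbookcode", "?code"), TRIPLES):
    print(row)
```

Swapping the pattern for `("?s", "rdf:type", "ns:Country")` gives the country-listing query instead.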

Another query might be to find which properties (predicates) are used in the dataset. This one does that:

PREFIX ns: <http://www4.wiwiss.fu-berlin.de/factbook/ns#>

SELECT DISTINCT ?p WHERE {
 ?s ?p ?o .
} ORDER BY ?p
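`DISTINCT` drops duplicate rows and `ORDER BY` sorts what is left, so this query amounts to collecting the predicate of every triple into a set and sorting it. A toy Python equivalent over a few made-up triples with abbreviated names (plain string sorting stands in for SPARQL's ordering rules here):

```python
# Illustrative triples only; "res:", "ns:" and "rdf:" stand in for full URIs.
TRIPLES = [
    ("res:CountryA", "rdf:type", "ns:Country"),
    ("res:CountryA", "ns:factbookcode", "aa"),
    ("res:CountryB", "rdf:type", "ns:Country"),
    ("res:CountryB", "ns:factbookcode", "bb"),
]

# Roughly: SELECT DISTINCT ?p WHERE { ?s ?p ?o . } ORDER BY ?p
predicates = sorted({p for _s, p, _o in TRIPLES})
print(predicates)  # ['ns:factbookcode', 'rdf:type']
```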

So, one of the next things you might like to do is to make that list available at its own URI. We can do that by creating an API on Kasabi.

Useful links

Note – GitHub:

There is a GitHub user called kasabi whose kasabi-xsl repository (github.com/kasabi/kasabi-xsl) can be used to add XSL files for transforming the output of new APIs.

Linked data – there’s more … (as Jimmy Cricket used to say)

Ingrid Koehler recently posted a nice blog post on Linked Data. Most of it I was already aware of and subscribed to, but there was one point that had never struck me. As I read it I had a DOH! moment, thinking “… that’s so obvious, why hadn’t I thought of it before!”.

It was her point 5:

Linked data does not have to be open data. Public services would benefit tremendously from using linked data formats. It means that we could stop spending resources on data aggregation and start spending it on analysis and action. Linked data can be used in secure settings to help partners share personal, sensitive or commercial information on performance and resources and help better target those in need or areas for improvement.

I just wonder whether, if we could create tools that make it easier to convert data TO linked data formats, more people would publish in those formats. You still need to be a bit of a geek to get data into Linked Data formats.
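As a rough illustration of the core of such a tool, here is a minimal Python sketch that turns a CSV table into N-Triples, one triple per cell, with each row becoming a resource and each column a property. The base URI, the column names and the sample rows are all made up for the example, and a real converter would map columns onto established vocabularies rather than minting a new property per column.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base URI for the minted resources and properties.
BASE = "http://example.org/"

# Made-up sample data.
CSV_TEXT = """id,name,region
a1,Example Trust A,Region 1
a2,Example Trust B,Region 2
"""


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(text: str, key: str = "id"):
    """Yield one N-Triples line per non-key cell: <row> <column> "value" ."""
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}resource/{quote(row[key])}>"
        for column, value in row.items():
            if column == key:
                continue  # the key column only identifies the row
            predicate = f"<{BASE}property/{quote(column)}>"
            yield f'{subject} {predicate} "{escape_literal(value)}" .'


for line in csv_to_ntriples(CSV_TEXT):
    print(line)
```

Even this toy shows where the geekery comes in: someone still has to decide what identifies a row and what each column means.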
