
2014 11-17 crichton institute talk on open data



I once worked on the base of the US Naval Medical Center in Bethesda, and there are a couple of points about it that I want to mention. Firstly, it was instigated by Franklin Delano Roosevelt: he was once out on a drive into the ‘country’, away from the District, stopped in Bethesda, did a quick sketch and then said that this was to be the new naval hospital and that this was the location. Sometimes big things come from a minimalist approach to planning. All you need is the vision and the impetus.

The World Wide Web and the developing Semantic Web, the web of data, and the Internet of Things, in the same way as the Naval Medical Center, started as a small sketch, a few ideas and a drawing. As we have observed, in a few years this “vague but exciting …” idea has grown wings and gone global in a big way.

When I was at “Navy Med” and the military medical school there, I remember one of the Navy priests giving a talk. Addressing a crowd from the naval base on the subject of abstinence from meat on Good Friday, he said something like, “Don’t just head off today and get surf ‘n’ turf and ask them to go light on the turf”. In other words, he was pointing out that one could simply respond with a box-ticking exercise, but that wouldn’t be keeping to the spirit of the ordinance. There is a similar check in software engineering, where we have the practices of Behaviour Driven Development and Test Driven Development: we need to both “do the right thing” [BDD] and “do the thing right” [TDD].

So the Vision, from the policy, is:

1) Delivery of improved public services through public bodies making use of the data
2) Wider social and economic benefits through innovative use of the data
3) Accountability and transparency of delivery of our public services

To help people in public service make use of data, they need to be able to find out what’s available, and when they’ve found what they want it needs to be available in a form that they can use.

The “racing cert” for the approach to be taken across Europe, and probably globally, for cataloguing and discovery is to use DCAT, the RDF vocabulary for describing data catalogues. Here are some examples of its use in a data catalogue, and also an indicator of the ongoing work to harmonise INSPIRE catalogues with the DCAT vocabulary.
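As a rough illustration of what a DCAT description contains, here is a minimal sketch in Python that builds a catalogue record as JSON-LD. The `dcat:` and `dct:` terms come from the DCAT and Dublin Core vocabularies; the catalogue, the dataset title and every `example.org` URL are invented for the example.

```python
import json

# Namespace prefixes for the DCAT and Dublin Core Terms vocabularies.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def describe_dataset(dataset_uri, title, download_url, media_type):
    """Return a dcat:Dataset that has a single dcat:Distribution."""
    return {
        "@id": dataset_uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dcat:distribution": [{
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},
            # Media type given as a plain string for brevity.
            "dcat:mediaType": media_type,
        }],
    }


def describe_catalog(catalog_uri, title, datasets):
    """Wrap dataset descriptions in a dcat:Catalog with a JSON-LD context."""
    return {
        "@context": CONTEXT,
        "@id": catalog_uri,
        "@type": "dcat:Catalog",
        "dct:title": title,
        "dcat:dataset": datasets,
    }


# Hypothetical catalogue and dataset, for illustration only.
catalog = describe_catalog(
    "http://data.example.org/catalog",
    "Example council data catalogue",
    [describe_dataset(
        "http://data.example.org/dataset/road-accidents",
        "Road accidents (example)",
        "http://data.example.org/files/road-accidents.csv",
        "text/csv",
    )],
)
print(json.dumps(catalog, indent=2))
```

The point of the sketch is only that a catalogue entry is a small, regular structure: one record says what the dataset is, and each distribution says where and in what format it can be fetched.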

The move away from proprietary formats for data is not only useful for interoperability; it has also been seen by UK Gov and others as a way of encouraging the SME sector to gain government business, and perhaps of reducing the costs and risk associated with IT contracts. This is another aspect of public service ‘improvement’. The Standards Hub is the route through which proposals for non-proprietary formats are pitched to the UK Data Standards Board, and it is worth keeping an eye on.

With regard to data interoperability, one key consideration is character encoding, and again, the ‘racing cert’ is UTF-8.
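In practice this often means re-encoding exports from older systems. The following is a minimal Python sketch, assuming the source encoding is already known (Windows-1252 is used here purely as an example); the sample text is made up.

```python
def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from a known legacy encoding and re-encode them as UTF-8."""
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# Made-up sample containing characters outside plain ASCII.
legacy = "Café, £5".encode("cp1252")
converted = to_utf8(legacy)
print(converted.decode("utf-8"))
```

The sketch does not guess the encoding. If the source encoding is unknown, it has to be established first, because decoding with the wrong one silently corrupts the text.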


There is a range of text-based formats for documents and data that are all suitable, in different ways, for making your data interoperable. The key aspects to note are that they are all simple text formats and that, through nesting and other markup, they provide structure to the data and so make the data “smarter” than if the structure were absent. As Eric Raymond said, “smart data and dumb code is a lot better than the other way round”.

To innovate with data, people need to know what it is about, they need to be able to merge data from different sources with the minimum of hassle, and their life is made easier if the data is available in useful ‘smart’ formats such as JSON. Obviously, there are many innovators who are not skilled programmers, and the provision of tools for application development will be a helpful encouragement to those wanting to start in this area. It is a simple matter of seed and soil. There is no point in putting data ‘out there’ and just expecting things to happen without any further effort.
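To make the “minimum hassle” point concrete, here is a small Python sketch of merging two JSON sources that share an identifier. The field names, area codes and figures are all invented for the example.

```python
import json

# Two invented JSON sources that both identify areas by the same code.
population_json = '[{"area": "A01", "population": 1200}, {"area": "A02", "population": 800}]'
libraries_json = '[{"area": "A01", "libraries": 3}, {"area": "A03", "libraries": 1}]'


def merge_on_key(left, right, key):
    """Combine records from two lists of dicts that share the same key value."""
    merged = {}
    for record in left + right:
        merged.setdefault(record[key], {}).update(record)
    return list(merged.values())


merged = merge_on_key(
    json.loads(population_json), json.loads(libraries_json), "area"
)
print(json.dumps(merged, indent=2))
```

The merge is a few lines only because both sources use the same identifier for the same area. When the identifiers differ, most of the effort goes into reconciling them.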

Equally, we don’t want to spend our own time reinventing the wheel. In all ways we need to re-use what is available, because it improves interoperability and it speeds up development. It makes sure that we are doing the thing right. So when it comes to providing semantics, if a community has already developed terminologies for the domain we are working in, and the definitions are exactly what we need, then reuse the ones that are ‘out there’ in the wild.

The UK Government Linked Data working group has spent a lot of time working out the best way of designing URLs to act as globally unique identifiers for ‘things’. When you have to mint URL identifiers, use these standard patterns. It will help you get the job done faster and help you get it right first time.
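As a sketch of what following a pattern looks like, the Python below mints a pair of URLs for a ‘thing’, assuming the `/id/{concept}/{reference}` and `/doc/{concept}/{reference}` path convention described in the UK public sector URI guidance. The domain, concept and reference are hypothetical, and the function name is illustrative.

```python
from urllib.parse import quote


def mint_urls(domain, concept, reference):
    """Return the identifier URL for a thing and the URL of the document about it."""
    # Percent-encode each path segment so unusual references stay valid.
    path = f"{quote(concept, safe='')}/{quote(reference, safe='')}"
    return {
        "identifier": f"http://{domain}/id/{path}",
        "document": f"http://{domain}/doc/{path}",
    }


# Hypothetical domain, concept and reference.
urls = mint_urls("data.example.org", "school", "12345")
print(urls["identifier"])
print(urls["document"])
```

Keeping the identifier of the thing separate from the document that describes it is the design choice being illustrated. The exact path segments should be taken from the published guidance, not from this sketch.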

Reuse the experiences, promotional materials and toolsets of other projects for your project. The EC “Citadel on the move” project is all about helping localities bring their data into the 21st century, and to do this there are data converters, mobile app development web services and toolkits. You can either join in with “Citadel on the move” or (because their software is on an open license) host your own.

Accountability and transparency will be improved by adopting all the approaches to data that I’ve mentioned earlier, but to make this easier, and to really enter into the spirit of the policy goal, local data needs to be easily accessible from what can sometimes be complex and very large datasets. The road traffic accident data, for example, is published for Scotland by the Department for Transport in a multi-megabyte download that comprises three tables of normalised data. Linking the tables and subsetting them to get data even at the city level is beyond most citizens. Helpful subsetting needs to be made available. This can be done either by data intermediaries or by using well-designed APIs. Tutorials, too, help ensure that the accountability and transparency are real, that we are attending to the soil as well as scattering the seeds.
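The kind of linking and subsetting described above can be sketched in a few lines of Python. The three tables, their column names and every value are invented for the example; the real download will differ in layout and size.

```python
import csv
import io

# Three invented, normalised tables linked by accident_id.
ACCIDENTS = """accident_id,city,date
1,Anytown,2013-01-05
2,Othertown,2013-01-06
3,Anytown,2013-02-10
"""
VEHICLES = """accident_id,vehicle_type
1,car
1,bicycle
2,car
3,van
"""
CASUALTIES = """accident_id,severity
1,slight
3,serious
3,slight
"""


def read_table(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_for(table, accident_id, column):
    """Collect one column from the rows that belong to a given accident."""
    return [r[column] for r in table if r["accident_id"] == accident_id]


def subset_for_city(city):
    """Join the three tables and keep only the accidents in one city."""
    vehicles = read_table(VEHICLES)
    casualties = read_table(CASUALTIES)
    result = []
    for accident in read_table(ACCIDENTS):
        if accident["city"] != city:
            continue
        aid = accident["accident_id"]
        result.append({
            **accident,
            "vehicles": rows_for(vehicles, aid, "vehicle_type"),
            "casualties": rows_for(casualties, aid, "severity"),
        })
    return result


subset = subset_for_city("Anytown")
for record in subset:
    print(record)
```

An intermediary or an API would do this work once, so that a citizen can ask for one city and get a single readable record per accident.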

So, what are the design patterns that are helpful in industrialising the conversion of data from proprietary containers to interoperable, mergeable forms? Excel and PDF files can be pipelined through Tika to be converted in bulk to XHTML. There might be some loss of charts, but it is a good route to data conversion. Without developing a triplestore-based approach, existing relational databases can be exposed to SPARQL queries using converters such as D2R or Virtuoso. A more traditional API approach can be used to generate textual outputs. Both of these will allow helpful subsetting of data, making the provision of ‘data as a service’ possible.
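To show what querying such an endpoint involves, here is a minimal Python sketch that builds a SPARQL Protocol GET request URL, in which the query text travels in the `query` parameter. The endpoint address and the property URLs in the query are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; a converter in front of a relational database
# would publish its own address.
ENDPOINT = "http://data.example.org/sparql"

# Hypothetical property URLs, for illustration only.
QUERY = """
SELECT ?accident ?date
WHERE {
  ?accident <http://data.example.org/def/city> "Anytown" ;
            <http://data.example.org/def/date> ?date .
}
LIMIT 10
"""


def sparql_get_url(endpoint, query):
    """Build a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

The subsetting lives in the query itself: the client asks only for the rows it needs, and the relational database behind the endpoint stays as it is.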

The publication of data on the web is still in its infancy, and the best ways of actually doing it are still being worked out. This is why I used words like “racing cert” earlier. Projects like the W3C/EC “Share PSI 2.0”, in which I participate, are working across Europe to gather and evaluate best practice. We are working towards providing a set of guidance in 2015/6. You can get involved in this work. The output is going to be given special attention by the W3C “Data on the Web Best Practices Working Group”, which is due to report in 2016. You can get more ideas and keep up with developing trends at a number of sites, including those listed in the slide.

In trying to do the right thing we need to ensure that we do the thing right. This is made easier by using standards and re-using tools and other artefacts that are made available under open licenses by others who are on the same journey. Together we are creating the new world in which we empower a society and economy to make the most of data.

Peter Winstanley

2014-11-12