Stop making tools! Nobody likes them anyway


DANS is an institute of KNAW and NWO

Data Archiving and Networked Services

Stop making tools! Nobody likes them anyway...

Christophe Guéret (@cgueret)

New Trends in eHumanities, 16 April 2015


Data-driven research

Diagram: new and existing data → data collection → data cleaning and integration → data processing → tool → happy users

What kind of tool?
● Could be:
– An interactive website
– An “app” for smartphones
– A stand-alone program
● The goal is always to let users consume the data for their needs
● The actual tooling will depend on the skills and preferences of the team member coding it!

Well-behaved scholars do a bit more

Diagram: new and existing data → data collection → data cleaning and integration → data processing → tool → happy users

The myths of long-term use
● Data and software sent to a trusted digital repository will surely be re-used later
● Tools can be maintained after the project ends and further improved to fit new needs
● If a tool is not used enough, it should be adapted to fit more users' needs

In reality
● Data that is not easy to use is not used
● Tools are not maintained once the person who coded them has moved on to other things
● It is not possible to make everyone happy and fit all research questions with one tool

Data re-use: could you do it?
CEDAR is all open on GitHub: data, queries and scripts.
● Usage example:
– Download the dumps
– Install a triple store
– Load the data & wait
– Recursively query for provenance
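The last step is the least obvious one. As a minimal sketch, assuming the dump has already been loaded as (subject, predicate, object) triples and that provenance is recorded with the W3C PROV-O property `prov:wasDerivedFrom`, a recursive walk could look like this. The function name and resource names are hypothetical and not taken from the CEDAR data.

```python
from typing import Iterable

# W3C PROV-O property linking a resource to the resource it was derived from.
# Assumption: the dataset records its provenance with this property.
WAS_DERIVED_FROM = "http://www.w3.org/ns/prov#wasDerivedFrom"

Triple = tuple[str, str, str]


def provenance_chain(triples: Iterable[Triple], start: str) -> list[str]:
    """Return every resource `start` was (transitively) derived from, depth-first."""
    # Index the derivation links once so each lookup is cheap.
    sources: dict[str, list[str]] = {}
    for s, p, o in triples:
        if p == WAS_DERIVED_FROM:
            sources.setdefault(s, []).append(o)

    seen: list[str] = []

    def visit(node: str) -> None:
        for parent in sources.get(node, []):
            if parent not in seen:  # guards against cycles and shared ancestors
                seen.append(parent)
                visit(parent)

    visit(start)
    return seen


# Hypothetical resources, for illustration only.
triples = [
    ("ex:harmonised-obs-1", WAS_DERIVED_FROM, "ex:raw-cell-42"),
    ("ex:raw-cell-42", WAS_DERIVED_FROM, "ex:census-sheet-1899"),
    ("ex:raw-cell-42", "http://www.w3.org/2000/01/rdf-schema#label", "42"),
]
print(provenance_chain(triples, "ex:harmonised-obs-1"))
# ['ex:raw-cell-42', 'ex:census-sheet-1899']
```

Inside a triple store the same walk would typically be a SPARQL 1.1 property path such as `?x prov:wasDerivedFrom+ ?src`. Either way, every re-user has to go through all four steps before asking a single research question.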

Data is the important thing

http://redmonk.com/jgovernor/2007/04/05/why-applications-are-like-fish-and-data-is-like-wine/


Where we're going we don't need “tools”

So what needs to be done?
Do not bake the data into the tool. Instead, build the tool on top of the data, and make sure others can do the same.

Diagram: data collection → data cleaning and integration → data processing → data exposition → Tool 1, Tool 2, ...

In fact, do not write any tool
● Focus on exposing the data
– Less time spent coding, and less code
– Easier and cheaper to maintain
● To increase availability, expose your data on the Web
● Exposing != making a package and putting it somewhere

The magic keyword 1: “API”
● “In computer programming, an application programming interface (API) is a set of routines, protocols, and tools for building software applications” - Wikipedia

● Regardless of the data, all the software you use is a layered cake bound together by APIs
– Presentation software > GUI toolkit > Rendering system > Operating system > Hardware

Example (courtesy of Wikipedia)
● In this code, “nextLine” and “close” are part of the API of “Scanner”
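The Wikipedia code referred to here is not reproduced in this text. As an illustrative analogue in Python (not the original Java `Scanner` example), `readline` and `close` below are part of the API of a text stream: the caller uses them without knowing how the stream stores its data.

```python
import io

# An in-memory text stream stands in for standard input.
stream = io.StringIO("first line\nsecond line\n")

first = stream.readline()   # API call: read one line, including its newline
second = stream.readline()
stream.close()              # API call: release the stream

print(repr(first), repr(second), stream.closed)
# 'first line\n' 'second line\n' True
```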

APIs can be on the Web too
● HTTP can be used as an API.
● Get a specific record from a database
– http://example.com/api?action=show&id=500
● Delete a record in a database
– http://example.com/api?action=delete&id=500
● But don't do it that way! This abuses the role of the HTTP “GET” method

Generic design for tool + API
● Tools consume the data provided by a set of APIs over the Web
● If you are coding tools
– Forget about server-side page rendering
– Learn JavaScript

Diagram: Data (MySQL, R, ...) → API (HTTP, JSON, ...) → Tool
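A minimal sketch of the “Data → API” part of this design, using only Python's standard library: records held in a dict (standing in for MySQL, R output, etc.) are served as JSON over HTTP, so any tool, including a JavaScript front end, can consume them. The `/api/records/<id>` path, the `RecordAPI` and `start_server` names, and the sample record are all hypothetical.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

# Hypothetical data store standing in for MySQL, R output, etc.
RECORDS = {"500": {"id": "500", "name": "Example school", "city": "Utrecht"}}
PREFIX = "/api/records/"


class RecordAPI(BaseHTTPRequestHandler):
    """Serve GET /api/records/<id> as JSON; anything else is a 404."""

    def do_GET(self) -> None:
        record = None
        if self.path.startswith(PREFIX):
            record = RECORDS.get(self.path[len(PREFIX):])
        status = 200 if record is not None else 404
        payload = record if record is not None else {"error": "not found"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request logging


def start_server() -> ThreadingHTTPServer:
    """Start the API on a free local port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), RecordAPI)  # port 0: OS picks one
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    port = server.server_address[1]
    # The "tool" side: any HTTP client can consume the API.
    with urlopen(f"http://127.0.0.1:{port}/api/records/500") as response:
        print(json.load(response))
    server.shutdown()
    server.server_close()
```

A tool only needs the URL; swapping the dict for a real database changes nothing on the client side.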

The magic keyword 2: “REST”
● “Representational State Transfer (REST) is a software architecture style consisting of guidelines and best practices for creating scalable web services” - Wikipedia
● For example: instead of using GET to do a delete, just use the HTTP DELETE method on the target resource
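A small illustration with Python's `urllib.request`, assuming a hypothetical resource URL `http://example.com/api/records/500`: the REST style names the record once and lets the HTTP method carry the action, instead of encoding `action=delete` in a GET query string. The requests are only built here, not sent.

```python
from urllib.request import Request

# Anti-pattern: the action hides in a GET query string, so anything that
# merely follows links (a crawler, a prefetcher) could delete data.
unsafe = Request("http://example.com/api?action=delete&id=500")

# REST style: one URL per resource; the HTTP method says what to do with it.
record_url = "http://example.com/api/records/500"
read = Request(record_url, method="GET")
delete = Request(record_url, method="DELETE")

for req in (unsafe, read, delete):
    print(req.get_method(), req.full_url)
# GET http://example.com/api?action=delete&id=500
# GET http://example.com/api/records/500
# DELETE http://example.com/api/records/500
```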

The magic keyword 3: “JSON”
● “JSON (/ˈdʒeɪsən/ JAY-sən), or JavaScript Object Notation, is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs” - Wikipedia
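For instance, a record as a set of attribute–value pairs, serialised to JSON text and parsed back with Python's built-in `json` module (the field names are made up for illustration):

```python
import json

# A record as attribute-value pairs (hypothetical field names).
record = {"name": "Example school", "pupils": 230, "open": True, "tags": ["primary"]}

text = json.dumps(record)      # Python object -> human-readable JSON text
print(text)
# {"name": "Example school", "pupils": 230, "open": true, "tags": ["primary"]}

round_trip = json.loads(text)  # JSON text -> equivalent Python object
print(round_trip == record)
# True
```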

A step further with JSON-LD
● JSON-LD is Linked Data expressed in JSON.

Let users follow links across datasets
● Example of JSON data that is not JSON-LD

OK, but what is the API call to get more information about the board?
● You need to figure it out somehow
● With Linked Data you would get a link

(Example: part of the result from http://api.openonderwijsdata.nl/api/v1/get_document/duo/po_school/2013-20YF)
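The API response itself is not reproduced in this text. The sketch below uses hypothetical fields and `example.org` URLs, not the actual openonderwijsdata.nl output, to show the difference: in plain JSON the board is an opaque identifier, while in JSON-LD it is a URL the client can follow, and `@context` maps the keys to shared vocabulary terms (schema.org here, as an assumption).

```python
import json

# Plain JSON: the board is an opaque code. Which API call resolves it? The client must guess.
plain = {"name": "Example school", "board_id": "12345"}

# JSON-LD: "@context" maps keys to shared vocabulary terms, and the board is a URL.
linked = {
    "@context": {
        "name": "http://schema.org/name",
        # "@type": "@id" declares that values of "board" are links (IRIs).
        "board": {"@id": "http://schema.org/parentOrganization", "@type": "@id"},
    },
    "@id": "http://example.org/schools/example-school",
    "name": "Example school",
    "board": "http://example.org/boards/12345",
}

print(linked["board"])  # a link the client can simply dereference
print(json.dumps(linked, indent=2))
```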

Web APIs
● There are a lot of them (> 12k), and their number is increasing rapidly. See: http://www.programmableweb.com/
● Some examples:
– https://dev.twitter.com/rest/public

– http://www.slideshare.net/developers/documentation

– http://developer.rottentomatoes.com/docs

– https://www.flickr.com/services/api/

Bonuses

© All Seeing, Flickr

Give less to share more
● Noticed something about the examples of Web APIs given on the previous slide?
● None of them would give you a copy of their dataset, yet they all have an API that lets you access the data!
● => APIs enable fine-grained access to data
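A hypothetical sketch of that fine-grained access: instead of handing out the whole dataset, an API function returns one record, or only the fields a caller asks for. The dataset, the `get_record` function and the field names are invented for illustration.

```python
from typing import Optional

# Hypothetical dataset that is never shipped as a whole.
DATASET = {
    "500": {"name": "Example school", "city": "Utrecht", "pupils": 230},
    "501": {"name": "Other school", "city": "Leiden", "pupils": 180},
}


def get_record(record_id: str, fields: Optional[list[str]] = None) -> Optional[dict]:
    """Return one record, optionally restricted to the requested fields."""
    record = DATASET.get(record_id)
    if record is None:
        return None
    if fields is None:
        return dict(record)  # a copy, so callers cannot modify the store
    return {key: record[key] for key in fields if key in record}


print(get_record("500", ["name", "pupils"]))
# {'name': 'Example school', 'pupils': 230}
```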

Monetize a service, not a dataset
● APIs open up the opportunity to monetize the usage of the data instead of the data itself
● Users can be charged per API call
● Similar “download vs. API” approaches:
– Paid game vs. free-to-play
– Music download vs. music streaming
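One way to charge per API call, as a minimal sketch: count calls per API key and price them at a flat rate. The `metered` decorator, the `lookup` and `invoice` functions, the keys and the price are all hypothetical; a real service would persist these counts and check keys against a user database.

```python
from collections import Counter
from functools import wraps
from typing import Callable

PRICE_PER_CALL = 0.001  # hypothetical flat price per call, in euros
calls: Counter[str] = Counter()


def metered(func: Callable) -> Callable:
    """Count every successful call against the caller's API key."""
    @wraps(func)
    def wrapper(api_key: str, *args, **kwargs):
        result = func(*args, **kwargs)
        calls[api_key] += 1  # only counted if the call did not raise
        return result
    return wrapper


@metered
def lookup(record_id: str) -> str:
    return f"record {record_id}"


def invoice(api_key: str) -> float:
    return calls[api_key] * PRICE_PER_CALL


lookup("key-alice", "500")
lookup("key-alice", "501")
lookup("key-bob", "500")
print(calls["key-alice"], calls["key-bob"])
# 2 1
```

The same counter also gives the finer tracking of data usage mentioned on the next slide.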

Extra technical bonuses
● Most of the processing happens on the client side, so fewer resources are needed to serve the data
● Finer tracking of data usage
● Extra possibilities for caching, round-robin load balancing, CDNs, etc. => easier to scale
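One concrete form of the caching point, as a sketch: the server derives an HTTP `ETag` from the content, and when a client (or a CDN) sends the same value back in `If-None-Match`, it answers `304 Not Modified` with an empty body. The `respond` function is hypothetical and framework-free, and it simplifies `If-None-Match` to a single ETag; the header names and status codes are standard HTTP.

```python
import hashlib
import json
from typing import Optional


def respond(record: dict, if_none_match: Optional[str] = None) -> tuple[int, dict, bytes]:
    """Return (status, headers, body) for a JSON record, honouring If-None-Match."""
    body = json.dumps(record, sort_keys=True).encode("utf-8")
    # Validator: a quoted hash of the exact bytes that would be sent.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {"ETag": etag, "Cache-Control": "public, max-age=3600"}
    if if_none_match == etag:
        return 304, headers, b""  # the client's copy is still valid: send nothing
    headers["Content-Type"] = "application/json"
    return 200, headers, body


status, headers, _ = respond({"id": "500"})
print(status)                                      # 200
print(respond({"id": "500"}, headers["ETag"])[0])  # 304
```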

Ending on some more examples...

Facilityregistry.org
The website is the API. No user interface of any kind

Nlgis.nl
An API and a simple data visualisation tool using it

Lod.cedar-project.nl/cedar
Generic query interface + extra API

To summarise
● When your data is ready to be shared, first make an API for it. This will minimise friction in re-use.
● If you want/need to write an end-user tool, make it use your own API (and others'!)
● Plan maintenance for the API to keep it running.
