Stop making tools! Nobody likes them anyway... Christophe Guéret (@cgueret), Data Archiving and Networked Services (DANS). New Trends in eHumanities, 16 April 2015. DANS is an institute of KNAW and NWO.

Stop making tools! Nobody likes them anyway

Page 1: Stop making tools! Nobody likes them anyway

DANS is an institute of KNAW and NWO

Data Archiving and Networked Services

Stop making tools! Nobody likes them anyway...

Christophe Guéret (@cgueret)

New Trends in eHumanities, 16 April 2015

Page 2: Stop making tools! Nobody likes them anyway

Data-driven research

[Diagram labels: New data; Existing data; Data collection; Data cleaning and integration; Data processing; Tool; Happy users]

Page 3: Stop making tools! Nobody likes them anyway

What kind of tool?

● Could be
– An interactive web site
– An “app” for smartphones
– Stand-alone software

● The goal is always to let users consume the data for their needs

● Actual tooling will depend on the skills and preferences of the team member coding it!

Page 4: Stop making tools! Nobody likes them anyway

Behaving scholars do a bit more

[Diagram labels: New data; Existing data; Data collection; Data cleaning and integration; Data processing; Tool; Happy users]

Page 5: Stop making tools! Nobody likes them anyway

The myths of long-term use

● Data and software sent to a trusted digital repository will for sure be re-used later

● Tools can be maintained after the project and further improved to fit new needs

● If the tool is not being used enough it should be adapted to fit more user needs

Page 6: Stop making tools! Nobody likes them anyway

In reality

● Data that is not easy to use is not used

● Tools are not maintained once the person who coded them has moved on to other things

● It is not possible to make everyone happy and fit all research questions with one tool

Page 7: Stop making tools! Nobody likes them anyway

Data re-use: could you do it?

CEDAR: all open on GitHub (data, queries and scripts).

● Usage example:
– Download dumps
– Install triple store
– Load data & wait
– Recursively query for provenance

Page 8: Stop making tools! Nobody likes them anyway

Data is the important thing

http://redmonk.com/jgovernor/2007/04/05/why-applications-are-like-fish-and-data-is-like-wine/

Data

Tool

Page 9: Stop making tools! Nobody likes them anyway

Where we're going we don't need “tools”

Page 10: Stop making tools! Nobody likes them anyway

So what needs to be done?

Do not bake the data into the tool. Instead build the tool on top of the data, and ensure others can do the same.

[Diagram labels: Data collection; Data cleaning and integration; Data processing; Data exposition; Tool 1; Tool 2; ...]

Page 11: Stop making tools! Nobody likes them anyway

In fact, do not write any tool

● Focus on exposing the data
– Less time spent coding and less code
– Easier and cheaper to maintain

● To increase availability, expose your data on the Web

● Exposing != Make a package and put it somewhere

Page 12: Stop making tools! Nobody likes them anyway

The magic keyword 1: “API”

● “In computer programming, an application programming interface (API) is a set of routines, protocols, and tools for building software applications” - Wikipedia

● Regardless of data, all the software you use is a layered cake bound by software APIs
– Presentation software > GUI toolkit > Rendering System > Operating System > Hardware

Page 13: Stop making tools! Nobody likes them anyway

Example (courtesy of Wikipedia)

● In this code “nextLine” and “close” are part of the API of “Scanner”
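The code shown on this slide is not reproduced in the transcript. As a rough Python analogue (an assumption, not the slide's own example), `io.StringIO` can stand in for the input source, with `readline` and `close` playing the roles that “nextLine” and “close” play for “Scanner”:

```python
import io

# io.StringIO stands in for an input source; the sample text is hypothetical.
reader = io.StringIO("first line\nsecond line\n")

# readline() and close() are part of the API of the reader object: we call
# them without knowing how the object stores its text internally.
first = reader.readline().rstrip("\n")
reader.close()

print(first)
```

The caller depends only on the two method names and their documented behaviour, which is the point of an API.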

Page 14: Stop making tools! Nobody likes them anyway

APIs can be on the Web too

● HTTP can be used as an API too.

● Get a specific record from a database
– http://example.com/api?action=show&id=500

● Delete a record in a database
– http://example.com/api?action=delete&id=500

● But don't do it that way! This is abusing the role of the “GET” method from HTTP

Page 15: Stop making tools! Nobody likes them anyway

Generic design for tool + API

● Tools consume the data provided by a set of APIs over the Web

● If you are coding tools
– Forget about server-side page rendering
– Learn JavaScript

[Diagram: Data (MySQL, R, ...), API (HTTP, JSON, ...), Tool]
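A minimal sketch of this layering, under stated assumptions: an in-memory SQLite table stands in for the data store, the table contents are made up, and the function names are hypothetical. A real deployment would serve the JSON over HTTP; the network hop is left out here to keep the example short.

```python
import json
import sqlite3

# Data layer: an in-memory SQLite table stands in for MySQL, R, etc.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schools (id INTEGER PRIMARY KEY, name TEXT, pupils INTEGER)")
conn.executemany(
    "INSERT INTO schools (id, name, pupils) VALUES (?, ?, ?)",
    [(1, "School A", 120), (2, "School B", 80)],  # hypothetical rows
)

def api_get_schools():
    """API layer: expose the data as a JSON document (hypothetical endpoint body)."""
    rows = conn.execute("SELECT id, name, pupils FROM schools ORDER BY id").fetchall()
    return json.dumps([{"id": r[0], "name": r[1], "pupils": r[2]} for r in rows])

def tool_total_pupils(payload):
    """Tool 1: consumes only the JSON, never the database."""
    return sum(item["pupils"] for item in json.loads(payload))

def tool_school_names(payload):
    """Tool 2: a different tool built on the same API output."""
    return [item["name"] for item in json.loads(payload)]

payload = api_get_schools()
print(tool_total_pupils(payload))
print(tool_school_names(payload))
```

Neither tool touches the database, so either can be replaced or abandoned without affecting the data or the other tool.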

Page 16: Stop making tools! Nobody likes them anyway

The magic keyword 2: “REST”

● “Representational State Transfer (REST) is a software architecture style consisting of guidelines and best practices for creating scalable web services” - Wikipedia

● For example: instead of using GET to do a delete, just use the DELETE method from HTTP on the target resource
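The DELETE example can be sketched with Python's standard library. Everything specific here is an assumption made for illustration: the `/records/<id>` path layout, the in-memory `RECORDS` store and the handler name are hypothetical. GET only reads the resource, while DELETE removes it.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory "database" holding a single record.
RECORDS = {"500": {"id": 500, "title": "example record"}}

class RecordHandler(BaseHTTPRequestHandler):
    def _record_id(self):
        # Paths look like /records/500; the last segment is the record id.
        return self.path.rstrip("/").split("/")[-1]

    def _send(self, status, body=None):
        data = json.dumps(body).encode("utf-8") if body is not None else b""
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        # GET only reads: it never changes the stored data.
        record = RECORDS.get(self._record_id())
        if record is None:
            self._send(404, {"error": "not found"})
        else:
            self._send(200, record)

    def do_DELETE(self):
        # DELETE is the method that removes the target resource.
        if RECORDS.pop(self._record_id(), None) is None:
            self._send(404, {"error": "not found"})
        else:
            self._send(204)

    def log_message(self, format, *args):
        pass  # keep the example output quiet

def request(port, method, path):
    """Send one request to the local server and return the status code."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, path)
    response = conn.getresponse()
    response.read()
    conn.close()
    return response.status

server = HTTPServer(("127.0.0.1", 0), RecordHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

statuses = [
    request(port, "GET", "/records/500"),     # read the record
    request(port, "DELETE", "/records/500"),  # remove it
    request(port, "GET", "/records/500"),     # it is gone now
]
server.shutdown()
server.server_close()
print(statuses)
```

Because the method carries the intent, crawlers, caches and prefetchers that only issue GET requests cannot delete anything by accident.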

Page 17: Stop making tools! Nobody likes them anyway

The magic keyword 3: “JSON”

● “JSON (/ˈdʒeɪsən/ JAY-sən), or JavaScript Object Notation, is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs” - Wikipedia
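As a small illustration, Python's standard `json` module can turn a data object into text and back. The record and its values are made up for the example.

```python
import json

# A data object made of attribute-value pairs (hypothetical values).
record = {"id": 500, "name": "Example school", "tags": ["primary", "public"]}

text = json.dumps(record, sort_keys=True)  # serialise to human-readable text
restored = json.loads(text)                # parse the text back into an object

print(text)
```

The text form is what travels over HTTP; any client in any language with a JSON parser can read it.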

Page 18: Stop making tools! Nobody likes them anyway

A step further with JSON-LD

● JSON-LD is Linked Data expressed in JSON. Let users follow links across datasets

● Example of JSON data that is not JSON-LD: part of the result from http://api.openonderwijsdata.nl/api/v1/get_document/duo/po_school/2013-20YF

OK, but what is the API call to get more information about the board?

● Need to figure it out in some way

● With LD you would get a link
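The difference can be sketched with two small documents. Both are hypothetical and are not taken from the API response mentioned above; the `example.org` IRIs, the field names and the `next_link` helper are assumptions made for illustration.

```python
# Plain JSON: the board is only an opaque identifier, so the client must
# find out elsewhere which API call returns more about it.
plain = {"name": "Example school", "board_id": 12345}

# JSON-LD: the context declares that "board" holds an IRI, so the value is a
# link the client can follow directly.
linked = {
    "@context": {
        "name": "http://schema.org/name",
        "board": {"@id": "http://example.org/vocab/board", "@type": "@id"},
    },
    "@id": "http://example.org/schools/1",
    "name": "Example school",
    "board": "http://example.org/boards/12345",
}

def next_link(document, key):
    """Return the value under key if it looks like a followable link, else None."""
    value = document.get(key)
    if isinstance(value, str) and value.startswith(("http://", "https://")):
        return value
    return None

print(next_link(plain, "board_id"))
print(next_link(linked, "board"))
```

With the linked version, a generic client needs no dataset-specific documentation to fetch the board.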

Page 19: Stop making tools! Nobody likes them anyway

Web APIs

● There are a lot of them (> 12k) and their number is increasing rapidly. See: http://www.programmableweb.com/

● Some examples:
– https://dev.twitter.com/rest/public

– http://www.slideshare.net/developers/documentation

– http://developer.rottentomatoes.com/docs

– https://www.flickr.com/services/api/

Page 20: Stop making tools! Nobody likes them anyway

Bonuses

© All Seeing, Flickr

Page 21: Stop making tools! Nobody likes them anyway

Give less to share more

● Noticed something about the examples given in the previous slide on Web APIs?

● None of them would give you a copy of their dataset, yet they have an API to let you access the data!

● => APIs enable fine-grained access to data

Page 22: Stop making tools! Nobody likes them anyway

Monetize a service, not a dataset

● APIs open up the opportunity for monetizing the usage of the data instead of the data itself

● Users can be charged per API call

● Similar “download vs API” approaches
– Paid game vs free-to-play
– Music download vs streaming music
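Per-call charging can be sketched as a counter keyed by API key. The price, the key names and the `get_record` handler are all hypothetical; a real service would persist the counts and authenticate the keys.

```python
from collections import Counter

PRICE_PER_CALL_CENTS = 1  # hypothetical price, in cents, to avoid float rounding
calls = Counter()         # number of API calls made per API key

def get_record(record_id):
    """Hypothetical API handler returning one record."""
    return {"id": record_id}

def metered(api_key, handler, *args):
    """Count the call against the key, then run the actual handler."""
    calls[api_key] += 1
    return handler(*args)

def invoice_cents(api_key):
    """Amount owed by a key: number of calls times the per-call price."""
    return calls[api_key] * PRICE_PER_CALL_CENTS

for record_id in (1, 2, 3):
    metered("alice", get_record, record_id)
metered("bob", get_record, 1)

print(invoice_cents("alice"), invoice_cents("bob"))
```

The same counter doubles as a usage log, which is not available when a dataset is handed over as a download.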

Page 23: Stop making tools! Nobody likes them anyway

Extra technical bonuses

● Most of the processing happens on the client side, so fewer resources are needed to serve the data

● Finer tracking of data usage

● Extra possibilities for caching, round-robin, CDNs, etc. => easier to scale

Page 24: Stop making tools! Nobody likes them anyway

Ending on some more examples...

Page 25: Stop making tools! Nobody likes them anyway

Facilityregistry.org

The website is the API. No interface of any kind

Page 26: Stop making tools! Nobody likes them anyway

Nlgis.nl

API and a simple data visualisation tool using it

Page 27: Stop making tools! Nobody likes them anyway

Lod.cedar-project.nl/cedar

Generic query interface + extra API

Page 28: Stop making tools! Nobody likes them anyway

To summarise

● When your data is ready to be shared, first make an API for it. This will minimise friction in re-use.

● If you want/need to write an end-user tool, make it use your own API (and others!)

● Plan maintenance for the API to keep it running.