grlc: Bridging the Gap Between RESTful APIs and Linked Data

Page 1: grlc: Bridging the Gap Between RESTful APIs and Linked Data

BRIDGING THE GAP BETWEEN RESTFUL APIS AND LINKED DATA

Albert Meroño-Peñuela, Rinke Hoekstra & many others

CLARIAH Tech Day, 07-10-2016

Page 2: grlc: Bridging the Gap Between RESTful APIs and Linked Data

Vrije Universiteit Amsterdam

ACCESSING LINKED DATA

Page 3: grlc: Bridging the Gap Between RESTful APIs and Linked Data

ACCESSING LINKED DATA

• Multiple Linked Data consuming applications

• Variety of access interfaces needed

Page 4: grlc: Bridging the Gap Between RESTful APIs and Linked Data

Page 5: grlc: Bridging the Gap Between RESTful APIs and Linked Data

GITHUB AS A HUB OF SPARQL QUERIES

One .rq file per SPARQL query

Good support for query curation processes
> Versioning
> Branching
> Clone-pull-push

Web-friendly features!
> One URI per query
> Uniquely identifiable
> De-referenceable (raw.githubusercontent.com)

Page 6: grlc: Bridging the Gap Between RESTful APIs and Linked Data

Rinke: this is an asset in itself. For reproducibility, we need to be able to keep the queries we use to answer research questions.

Page 7: grlc: Bridging the Gap Between RESTful APIs and Linked Data

MEANWHILE IN THE SEMANTIC WEB…

Linked Data APIs emerge
> RESTful entry point to Linked Data hubs for Web applications
> OpenPHACTS

…but the Linked Data API (e.g. Swagger spec, code itself) still needs to be coded and maintained

Page 8: grlc: Bridging the Gap Between RESTful APIs and Linked Data

Cousin of BASIL in a SALAD

Same basic principle: 1 SPARQL query = 1 API operation

Automatically builds Swagger spec and UI from SPARQL

But:
> External query management
> Organization of SPARQL queries in the GitHub repo matches organization of the API
> Thin layer – nothing stored server-side

Maps
> GitHub API
> Swagger spec

Meroño & Hoekstra. ‘grlc Makes GitHub Taste Like Linked Data APIs’. SALAD, ESWC (2016)
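
The "1 SPARQL query = 1 API operation" principle can be sketched roughly as follows. This is a minimal illustration, not grlc's actual code: the helper name `build_paths`, the `/{operation}` path layout, the minimal operation object, and the file name `provinces.rq` are all assumptions made for the example.

```python
from pathlib import PurePosixPath


def build_paths(repo_files):
    """Map each .rq file in a repository listing to one API operation.

    Hypothetical helper: the path layout and the operation object are
    assumptions for illustration, not grlc's real output.
    """
    paths = {}
    for name in sorted(repo_files):
        file_path = PurePosixPath(name)
        if file_path.suffix != ".rq":
            continue  # only SPARQL query files become operations
        operation = file_path.stem  # "houseType_all.rq" -> "houseType_all"
        paths["/" + operation] = {
            "get": {"summary": "Runs the query stored in " + name}
        }
    return paths


# Example repository listing (file names are illustrative).
spec_paths = build_paths(["README.md", "houseType_all.rq", "provinces.rq"])
print(sorted(spec_paths))
```

Because the operations are derived from the file listing, reorganising the repository reorganises the API, and nothing has to be stored on the server.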

Page 9: grlc: Bridging the Gap Between RESTful APIs and Linked Data

MAPPING GITHUB AND SWAGGER

Page 10: grlc: Bridging the Gap Between RESTful APIs and Linked Data

SPARQL DECORATOR SYNTAX

Page 11: grlc: Bridging the Gap Between RESTful APIs and Linked Data

THE GRLC SERVICE

Assuming your repo is at https://github.com/:owner/:repo and your grlc instance at :host,

> http://:host/api/:owner/:repo/spec returns the JSON swagger spec
> http://:host/api/:owner/:repo/api-docs returns the swagger UI
> http://:host/api/:owner/:repo/:operation?p_1=v_1...p_n=v_n calls operation with specified parameter values
> Uses BASIL’s SPARQL variable name convention for query parameters

Sends requests to
> https://api.github.com/repos/:owner/:repo to look for SPARQL queries and their decorators
> https://raw.githubusercontent.com/:owner/:repo/master/file.rq to dereference queries, get the SPARQL, and parse it
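
The URL patterns above can be assembled as in the sketch below. The function names and the `branch` argument are assumptions introduced for illustration; only the URL shapes come from the slide.

```python
from urllib.parse import urlencode


def grlc_urls(host, owner, repo):
    """Return the spec and UI URLs of a grlc instance for one repository."""
    base = f"http://{host}/api/{owner}/{repo}"
    return {"spec": base + "/spec", "api_docs": base + "/api-docs"}


def operation_url(host, owner, repo, operation, params=None):
    """Return the URL that calls one operation with the given parameters."""
    url = f"http://{host}/api/{owner}/{repo}/{operation}"
    if params:
        url += "?" + urlencode(params)
    return url


def github_urls(owner, repo, filename, branch="master"):
    """Return the GitHub URLs used to list and dereference queries.

    The `branch` parameter is an assumption; the slide hard-codes "master".
    """
    return {
        "listing": f"https://api.github.com/repos/{owner}/{repo}",
        "raw": f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{filename}",
    }


print(operation_url("localhost:8088", "CEDAR-project", "Queries",
                    "houseType_all", {"page": 3}))
```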

Page 12: grlc: Bridging the Gap Between RESTful APIs and Linked Data

DROPDOWNS

• Fills in the swag[paths][op][method][parameters][enum] array

• Uses the de-contextualized triple pattern of the SPARQL query’s BGP against the same SPARQL endpoint

• Very inefficient

• JSON spec caching via reverse proxy

• LOD cache

• Own dimension/codelist cache

• Unmapped parameter ambiguity if the user wants to mix enum with arbitrary parameter values (“all values”)
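
Filling the `enum` array might look like the sketch below. It assumes a Swagger 2.0 style spec held as a Python dict in which `parameters` is a list of parameter objects; the helper name `fill_enum`, the parameter name, and the sample values are hypothetical. The step that actually fetches the values from the SPARQL endpoint is left out, so the values are passed in directly.

```python
def fill_enum(swag, op, method, param_name, values):
    """Attach the allowed values of one parameter to the spec.

    Returns True if the parameter was found, False otherwise.
    """
    for param in swag["paths"][op][method]["parameters"]:
        if param["name"] == param_name:
            # De-duplicate and sort so the dropdown order is stable.
            param["enum"] = sorted(set(values))
            return True
    return False


# Illustrative spec fragment; names and values are made up for the example.
swag = {
    "paths": {
        "/houseType_all": {
            "get": {
                "parameters": [
                    {"name": "houseType", "in": "query", "type": "string"}
                ]
            }
        }
    }
}

found = fill_enum(swag, "/houseType_all", "get", "houseType",
                  ["farm", "ship", "farm"])
print(swag["paths"]["/houseType_all"]["get"]["parameters"][0]["enum"])
```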

Page 13: grlc: Bridging the Gap Between RESTful APIs and Linked Data

CONTENT NEGOTIATION

• API endpoints can now end with .content_type (e.g. grlc.io/CLARIAH/wp-queries/MyQuery.csv)

• Supports .csv, .json, .html (can be extended)

• grlc sets ‘Accept’ HTTP header and agnostically returns same ‘Content-Type’ as the SPARQL endpoint

• Up to the SPARQL endpoint to accept it
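
A minimal sketch of the extension-to-header step, assuming a simple lookup table. The media types chosen here (in particular `application/json` for `.json`) and the helper name `split_route` are assumptions; the slide only names the three extensions.

```python
import posixpath

# Assumed mapping from route extension to the Accept header value.
ACCEPT_BY_EXTENSION = {
    ".csv": "text/csv",
    ".json": "application/json",
    ".html": "text/html",
}


def split_route(route):
    """Strip a known extension from the route and derive request headers.

    Unknown or missing extensions leave the route untouched and add no
    Accept header.
    """
    stem, ext = posixpath.splitext(route)
    if ext in ACCEPT_BY_EXTENSION:
        return stem, {"Accept": ACCEPT_BY_EXTENSION[ext]}
    return route, {}


print(split_route("grlc.io/CLARIAH/wp-queries/MyQuery.csv"))
```

Extending the supported formats then only means adding an entry to the table; whether the request succeeds still depends on the SPARQL endpoint accepting that media type.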

Page 14: grlc: Bridging the Gap Between RESTful APIs and Linked Data

PAGINATION

• Large query results are typically nasty to consuming applications

• Split the result in multiple parts (or “pages”)

• Size? #+ pagination: 100

• Navigating pages

• rel=next,prev,first,last links in the HTTP headers (GitHub API Traversal convention)

• Extra request parameter ?page (defaults to 1)
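
The `#+ pagination: 100` decorator and the `?page` parameter could be combined as in the sketch below. The regular expression, the helper names, and the LIMIT/OFFSET arithmetic are assumptions: the slides do not say how grlc turns a page number into a query window, so this shows one plausible approach rather than the actual implementation.

```python
import re

# Matches comment lines of the form "#+ key: value".
DECORATOR = re.compile(r"^#\+\s*(\w+)\s*:\s*(.*?)\s*$")


def parse_decorators(query_text):
    """Collect "#+ key: value" decorator lines from a query file."""
    decorators = {}
    for line in query_text.splitlines():
        match = DECORATOR.match(line.strip())
        if match:
            decorators[match.group(1)] = match.group(2)
    return decorators


def page_size(query_text, default=None):
    """Return the page size declared in the query, or `default` if absent."""
    value = parse_decorators(query_text).get("pagination")
    return int(value) if value is not None else default


def limit_offset(page, size):
    """Assumed mapping from a 1-based page number to LIMIT and OFFSET."""
    return size, (page - 1) * size


QUERY = """#+ pagination: 100
SELECT ?s ?p ?o WHERE { ?s ?p ?o }
"""

print(page_size(QUERY), limit_offset(3, page_size(QUERY)))
```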

~ curl -X GET -H "Accept: text/csv" -I "http://localhost:8088/api/CEDAR-project/Queries/houseType_all"

HTTP/1.0 200 OK
Content-Type: text/csv; charset=UTF-8
Content-Length: 18447
Server: grlc/1.0.0
Link: <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=next, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last

~ curl -X GET -H "Accept: text/csv" -I "http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=3"

HTTP/1.0 200 OK
Content-Type: text/csv; charset=UTF-8
Content-Length: 18142
Server: grlc/1.0.0
Link: <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=4>; rel=next, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=prev, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=1>; rel=first, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last
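
A client can follow these links by parsing the `Link` header, and a server can produce them from the current and last page numbers. The sketch below reproduces the two header values shown above. The function names are hypothetical, and the behaviour on the last page (still emitting `rel=last`) is an assumption, since the slides only show pages 1 and 3.

```python
import re

# One "<url>; rel=name" entry; quotes around the rel value are optional.
LINK = re.compile(r'<([^>]*)>\s*;\s*rel="?([^",\s]+)"?')


def build_link_header(base_url, page, last_page):
    """Build a Link header value for a 1-based page number."""
    links = []
    if page < last_page:
        links.append((f"{base_url}?page={page + 1}", "next"))
    if page > 1:
        links.append((f"{base_url}?page={page - 1}", "prev"))
        links.append((f"{base_url}?page=1", "first"))
    # Assumption: rel=last is always present, even on the last page.
    links.append((f"{base_url}?page={last_page}", "last"))
    return ", ".join(f"<{url}>; rel={rel}" for url, rel in links)


def parse_link_header(header):
    """Return a dict mapping each rel name to its URL."""
    return {rel: url for url, rel in LINK.findall(header)}


BASE = "http://localhost:8088/api/CEDAR-project/Queries/houseType_all"
header = build_link_header(BASE, 3, 889)
links = parse_link_header(header)
print(links["next"])
```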

Page 15: grlc: Bridging the Gap Between RESTful APIs and Linked Data

CACHE

• Moved implementation outside of grlc (not its direct responsibility)

• grlc sets HTTP header Cache-Control to public, max-age=900 (15 minutes, customizable)

• nginx caches all grlc generated JSON (and other static/dynamic assets)

• nginx becomes part of the bundle
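
Setting the `Cache-Control` header and leaving the caching itself to a reverse proxy could look like this small WSGI wrapper. It is a sketch under assumptions, not grlc's code: the wrapper name `with_cache_control` and the demo application are hypothetical, and the default of 900 seconds corresponds to the 15 minutes mentioned above.

```python
def with_cache_control(app, max_age=900):
    """Wrap a WSGI app so every response carries a Cache-Control header."""

    def wrapped(environ, start_response):
        def start(status, headers, exc_info=None):
            # Replace any existing Cache-Control header with ours.
            kept = [(k, v) for k, v in headers if k.lower() != "cache-control"]
            kept.append(("Cache-Control", f"public, max-age={max_age}"))
            return start_response(status, kept, exc_info)

        return app(environ, start)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in application returning an empty JSON document."""
    start_response("200 OK", [("Content-Type", "application/json")])
    return [b"{}"]


captured = {}


def fake_start_response(status, headers, exc_info=None):
    """Record what the wrapped app sends, in place of a real server."""
    captured["status"] = status
    captured["headers"] = dict(headers)


body = with_cache_control(demo_app)({}, fake_start_response)
print(captured["headers"]["Cache-Control"])
```

With the header in place, a proxy such as nginx can decide on its own how long to serve the stored response, which keeps the caching logic out of the application.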

Page 16: grlc: Bridging the Gap Between RESTful APIs and Linked Data

CONTAINER RELEASE

• Uses docker

• Infrastructure-independent install

• Bundles (composes) all required packages (python, python libs, grlc, nginx). Can be easily extended to more

• Publicly available at hub.docker.com

• One-command server deploy: docker pull clariah/grlc

Page 17: grlc: Bridging the Gap Between RESTful APIs and Linked Data

CONCLUSIONS

The spectrum of Linked Data clients: SPARQL intensive applications vs RESTful API applications

grlc uses decoupling of SPARQL from all client applications (including LDA) as a powerful practice

Separates query curation workflows from everything else

Allows at the same time
> Web-friendly SPARQL queries
> Web-friendly RESTful APIs

Helps you to easily organise your LDA – just organise your SPARQL repository and you’re set

Try it out!
> http://grlc.io/
> https://github.com/CLARIAH/grlc

Page 18: grlc: Bridging the Gap Between RESTful APIs and Linked Data

Finish with the curl -X GET that gives the result of the original query in the crappy script

Page 19: grlc: Bridging the Gap Between RESTful APIs and Linked Data

THANK YOU!

@ALBERTMERONYO

DATALEGEND.NET

CLARIAH.NL