LODStats

LODStats: The Data Web Census Dataset. Kobe, Japan, 2016

Agenda

● Introduction
● Description and System Architecture
● Dataset Model
● Use Cases
● Data Web Statistics (Summary)
● Conclusions

Introduction

How to comprehend this data?

● Data portals
● Big nucleus datasets
● SPARQL endpoints

9,960+ RDF Datasets on the Data Portals

LODStats: Web Application

● Aggregates datasets from the largest data portals
● Calculates statistical metrics
● User interface
● SPARQL interface

“LODStats – An Extensible Framework for High-performance Dataset Analytics” (EKAW’2012) [1]
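Statistical metrics over an RDF dump can be computed in a single streaming pass over the triples. The sketch below is not LODStats code: it assumes N-Triples input, uses a deliberately naive line splitter (the hypothetical helper `parse_ntriple`, which does not validate IRIs or unescape literals), and computes only a small illustrative subset of metrics: triple count, distinct subjects, property usage and classes used via `rdf:type`.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Naive N-Triples pattern (hypothetical helper, not a full parser):
# subject (IRI or blank node), predicate (IRI), then everything up to the final " ."
_TRIPLE = re.compile(r'^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$')


def parse_ntriple(line: str) -> Optional[Tuple[str, str, str]]:
    """Split one N-Triples line into (s, p, o); return None for blanks, comments, junk."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    m = _TRIPLE.match(line)
    return m.groups() if m else None


def stream_stats(lines: Iterable[str]) -> dict:
    """One pass over the input; memory grows only with distinct subjects/properties/classes."""
    triples = 0
    subjects = set()
    properties = Counter()
    classes = Counter()
    for line in lines:
        t = parse_ntriple(line)
        if t is None:
            continue
        s, p, o = t
        triples += 1
        subjects.add(s)
        properties[p] += 1
        if p == RDF_TYPE:  # the object of rdf:type is a class in use
            classes[o] += 1
    return {
        "triples": triples,
        "distinct_subjects": len(subjects),
        "properties": dict(properties),
        "classes": dict(classes),
    }


if __name__ == "__main__":
    sample = [
        "<http://ex.org/a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.org/City> .",
        '<http://ex.org/a> <http://ex.org/name> "Kobe" .',
        "# a comment line",
        '<http://ex.org/b> <http://ex.org/name> "Leipzig" .',
    ]
    print(stream_stats(sample))
```

Because the function consumes any iterable of lines, an open file handle can be passed directly, so the dump never has to fit in memory as a whole.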

LODStats: System Architecture

CKAN Aggregator

● Scans the largest CKAN repositories
● Keeps only the RDF datasets

“Linked Open Data Statistics: Collection and Exploitation” (KESW’2013) [2]
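The aggregator step (scan CKAN repositories, keep the RDF datasets) can be illustrated with the package structure returned by CKAN's Action API: `package_search` returns packages under `result.results`, each with a `resources` list whose entries carry `format` and `url` fields. The sketch works on an already-fetched response so that it runs offline. The set of format labels treated as RDF (`RDF_FORMATS`) and the helper names are assumptions for illustration, not the actual filtering rules of the LODStats aggregator.

```python
import json
from typing import Dict, List

# Assumption: format labels treated as RDF. Real portals spell formats
# inconsistently, so a production filter would need a longer list.
RDF_FORMATS = {
    "rdf", "rdf/xml", "application/rdf+xml", "turtle", "text/turtle",
    "n-triples", "application/n-triples", "nt", "ttl", "n3",
}


def rdf_resources(package: Dict) -> List[Dict]:
    """Return the resources of one CKAN package whose format looks like RDF."""
    return [
        res for res in package.get("resources", [])
        if (res.get("format") or "").strip().lower() in RDF_FORMATS
    ]


def filter_rdf_datasets(search_response: Dict) -> Dict[str, List[str]]:
    """Map dataset name -> RDF resource URLs, dropping datasets without RDF."""
    selected = {}
    for package in search_response.get("result", {}).get("results", []):
        urls = [res.get("url") for res in rdf_resources(package)]
        if urls:
            selected[package["name"]] = urls
    return selected


if __name__ == "__main__":
    # Hand-written response in the shape of /api/3/action/package_search.
    response = json.loads("""{"success": true, "result": {"count": 2, "results": [
      {"name": "cities", "resources": [
         {"format": "Turtle", "url": "http://example.org/cities.ttl"},
         {"format": "CSV", "url": "http://example.org/cities.csv"}]},
      {"name": "budget", "resources": [
         {"format": "XLS", "url": "http://example.org/budget.xls"}]}]}}""")
    print(filter_rdf_datasets(response))  # {'cities': ['http://example.org/cities.ttl']}
```

Against a live portal, the response would come from GET requests to `/api/3/action/package_search`, paging through the catalogue with the `rows` and `start` parameters.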

LODStats: System Architecture (cont.)

LODStats core application

● Queues RDF datasets
● Calculates statistics
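The queue-then-compute pattern can be sketched in-process with Python's standard `queue` and `threading` modules. The Docker setup of this deck includes a RabbitMQ service for this kind of job distribution; here `queue.Queue` merely stands in for a broker, and `compute_stats` is a hypothetical placeholder for the real statistics step (it only counts non-empty lines).

```python
import queue
import threading
from typing import Callable, Dict


def compute_stats(dump: str) -> dict:
    """Hypothetical placeholder for the statistics step: counts non-empty lines."""
    return {"lines": sum(1 for line in dump.splitlines() if line.strip())}


def run_workers(datasets: Dict[str, str], n_workers: int = 2,
                compute: Callable[[str], dict] = compute_stats) -> Dict[str, dict]:
    """Queue every dataset name, let worker threads drain the queue, collect results."""
    jobs = queue.Queue()
    results: Dict[str, dict] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            name = jobs.get()
            if name is None:  # sentinel: no more work for this worker
                return
            try:
                stats = compute(datasets[name])
            except Exception as exc:  # record the failure, keep the worker alive
                stats = {"error": str(exc)}
            with lock:
                results[name] = stats

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for name in datasets:  # producer: enqueue every dataset
        jobs.put(name)
    for _ in threads:      # one sentinel per worker, queued after the real jobs
        jobs.put(None)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    dumps = {"a": "<s> <p> <o> .\n\n<s> <p> <o2> .", "b": ""}
    print(run_workers(dumps))
```

Recording failures instead of letting them escape matters for a census over thousands of third-party dumps: one unreachable or malformed dataset should not stop the remaining jobs. With a real broker, the workers would be separate processes or containers rather than threads.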

LODStats: Provisioning

● Docker image per component
● docker-compose.yml for the whole project
● Sustainable and platform-independent deployment

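LODStats: Provisioning (cont.)

# Legacy Compose file format (v1): every top-level key is a service.
web:
  restart: always             # restart the container whenever it stops
  build: ./web                # image built from the Dockerfile in ./web
  links:                      # make the db and rabbitmq containers reachable by name
    - db
    - rabbitmq
  environment:
    - LODSTATS_DB=db          # value is the db service's host name
    - RABBITMQ=rabbitmq       # value is the rabbitmq service's host name
rabbitmq:
  restart: always
  image: rabbitmq:3.6.1       # pinned upstream RabbitMQ image
db:
  restart: always
  build: ./db
virtuoso:
  restart: always
  build: ./virtuoso
  environment:
    - DBA_PASSWORD=dba        # default admin password; change it outside local testing
    - SPARQL_UPDATE=false
    - DEFAULT_GRAPH=http://lodstats.aksw.org/
nginx:
  build: ./nginx
  restart: always
  links:
    - web
    - virtuoso
  environment:
    - VIRTUAL_HOST=lodstats.aksw.org,stats.lod2.eu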

LODStats: Provisioning (cont.)

$ git clone https://github.com/AKSW/lodstats.docker
$ cd lodstats.docker
$ docker-compose build
$ docker-compose up -d

Data Model

Data Web Statistics Summary

More statistics are available from the SPARQL endpoint.

                 2011           2016
Datasets         422            9,644
Links            3%             40%
Data portals     datahub.io     publicdata.eu, data.gov, datahub.io
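The additional statistics can be pulled from the LODStats SPARQL endpoint (http://lodstats.aksw.org/sparql) with a plain SPARQL 1.1 Protocol GET request: the query goes into the `query` parameter and the `Accept` header asks for JSON results. Since the exact vocabulary of the LODStats dataset is not spelled out in these slides, the example query is deliberately vocabulary-agnostic and lists the most frequently used properties, which is a reasonable first step for exploring an unfamiliar dataset. Whether the endpoint is still online, and whether it answers an aggregate over the whole store within its timeout, is not guaranteed.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://lodstats.aksw.org/sparql"

# Vocabulary-agnostic exploration query: which properties occur most often?
TOP_PROPERTIES = """
SELECT ?p (COUNT(*) AS ?n)
WHERE { ?s ?p ?o }
GROUP BY ?p
ORDER BY DESC(?n)
LIMIT 10
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def run_query(query: str, endpoint: str = ENDPOINT, timeout: float = 30.0) -> list:
    """Execute the query over the network and return the result bindings."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=timeout) as resp:
        data = json.load(resp)
    return data["results"]["bindings"]
```

Usage: `for row in run_query(TOP_PROPERTIES): print(row["p"]["value"], row["n"]["value"])`. Each binding maps a variable name to an object with `type` and `value` keys, as defined by the SPARQL 1.1 JSON results format.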

Use Cases

How can you use LODStats data?

● Privacy Analysis: Does the dataset contain sensitive information?
● Coverage Analysis: Does the dataset contain the necessary information?
● Quality Analysis: Define quality metrics using statistical data.
● Vocabulary Reuse: Find a suitable vocabulary for your dataset.
● Link Target Identification: Which datasets are good candidates for interlinking?

“Detecting Similar Linked Datasets Using Topic Modelling” (ESWC’2016) [3]
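Two of these use cases can be approximated directly from per-dataset statistics. The sketch assumes each dataset is summarised by the set of vocabularies it uses (a hypothetical input format; the dataset names and the short labels such as "foaf" standing for namespaces are invented for illustration). `vocabulary_popularity` counts how many datasets use each vocabulary (vocabulary reuse), and `link_candidates` ranks dataset pairs by the Jaccard similarity of their vocabulary sets (link target identification). Jaccard overlap is a deliberately simple stand-in; the cited ESWC 2016 work uses topic modelling instead.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def vocabulary_popularity(usage: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Vocabularies sorted by the number of datasets that use them (ties by name)."""
    counts = Counter(vocab for vocabs in usage.values() for vocab in vocabs)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_candidates(usage: Dict[str, Set[str]],
                    threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Dataset pairs whose vocabulary overlap reaches the threshold, best first."""
    pairs = [
        (x, y, jaccard(usage[x], usage[y]))
        for x, y in combinations(sorted(usage), 2)
    ]
    kept = [pair for pair in pairs if pair[2] >= threshold]
    return sorted(kept, key=lambda pair: (-pair[2], pair[0], pair[1]))


if __name__ == "__main__":
    usage = {
        "cities": {"foaf", "geo", "dbo"},
        "museums": {"foaf", "geo", "schema"},
        "budget": {"qb", "sdmx"},
    }
    print(vocabulary_popularity(usage))
    print(link_candidates(usage))  # [('cities', 'museums', 0.5)]
```

Comparing all pairs is quadratic in the number of datasets, which is manageable for a few thousand entries but would call for blocking or indexing at larger scale.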

Availability

● Application
  ○ Online at: http://lodstats.aksw.org
  ○ LODStats processing module: https://github.com/aksw/lodstats
  ○ LODStats frontend including SPARQLify mappings: https://github.com/aksw/lodstats_www
  ○ Deployment setup (Docker): https://github.com/AKSW/lodstats.docker
● Dataset
  ○ Online at: http://lodstats.aksw.org/sparql
  ○ Datahub.io: https://datahub.io/dataset/lodstats
  ○ Can be deployed in Virtuoso using docker-compose from the deployment repo

Conclusions & Future Work

● LODStats is easily replicable using Docker technology.
● Future work:
  ○ Processing of very large datasets (Spark/Hadoop)
  ○ Improving usability of the frontend
  ○ Extending data collection to crawling
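Distributing the statistics over Spark or Hadoop relies on many metrics being mergeable: each partition of a dump yields partial results, and partial results combine associatively. The sketch below shows only that map/merge idea in plain Python; it is not Spark code, the helpers `partial_stats`, `merge` and `chunked` are hypothetical, and it assumes N-Triples input where subject and predicate are the first two whitespace-separated tokens of each line.

```python
from collections import Counter
from dataclasses import dataclass, field
from functools import reduce
from typing import Iterable, List, Set


@dataclass
class Partial:
    """Partial statistics for one chunk of an N-Triples dump."""
    triples: int = 0
    subjects: Set[str] = field(default_factory=set)
    predicates: Counter = field(default_factory=Counter)


def partial_stats(lines: Iterable[str]) -> Partial:
    """'Map' step: scan one chunk independently of all the others."""
    part = Partial()
    for line in lines:
        tokens = line.split(None, 2)  # subject, predicate, rest of the line
        if len(tokens) < 3 or tokens[0].startswith("#"):
            continue  # skip blank, comment and malformed lines
        part.triples += 1
        part.subjects.add(tokens[0])
        part.predicates[tokens[1]] += 1
    return part


def merge(a: Partial, b: Partial) -> Partial:
    """'Reduce' step: associative, so chunks can be combined in any grouping."""
    return Partial(a.triples + b.triples, a.subjects | b.subjects,
                   a.predicates + b.predicates)


def chunked(lines: List[str], size: int) -> List[List[str]]:
    """Split a list of lines into consecutive chunks of at most `size` lines."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]


if __name__ == "__main__":
    dump = [
        "<http://ex.org/a> <http://ex.org/p> <http://ex.org/b> .",
        '<http://ex.org/a> <http://ex.org/q> "x" .',
        "<http://ex.org/b> <http://ex.org/p> <http://ex.org/c> .",
    ]
    total = reduce(merge, map(partial_stats, chunked(dump, 2)), Partial())
    print(total.triples, len(total.subjects), dict(total.predicates))
```

Exact subject sets still grow with the data; distributed systems commonly replace them with approximate cardinality sketches such as HyperLogLog, which merge in the same way.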

Contact Information

Address: Augustusplatz 10, Room P905, 04109 Leipzig, Germany
Phone: +49-341-97-32260
Email: [email protected]
Twitter: twitter.com/akswgroup
Web: http://aksw.com/IvanErmilov

Thank You

Ivan Ermilov <[email protected]>

References

[1] LODStats – An Extensible Framework for High-performance Dataset Analytics by Jan Demter, Sören Auer, Michael Martin, and Jens Lehmann in Proceedings of EKAW 2012

[2] Linked Open Data Statistics: Collection and Exploitation by Ivan Ermilov, Michael Martin, Jens Lehmann, and Sören Auer in Proceedings of the 4th Conference on Knowledge Engineering and Semantic Web (KESW 2013)

[3] Detecting Similar Linked Datasets Using Topic Modelling by Michael Röder, Axel-Cyrille Ngonga Ngomo, Ivan Ermilov, and Andreas Both in The Semantic Web. Latest Advances and New Domains: 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 – June 2, 2016, Proceedings