LODStats
Agenda
● Introduction
● Description and System Architecture
● Dataset Model
● Use Cases
● Data Web Statistics (Summary)
● Conclusions
Introduction
9960+ RDF Datasets on the Data Portals
● Data portals
● Big nucleus datasets
● SPARQL endpoints
How to comprehend this data?
LODStats: Web Application
● Aggregates datasets from the largest data portals
● Calculate statistical metrics
● User interface
● SPARQL interface
“LODStats – An Extensible Framework for High-performance Dataset Analytics” (EKAW’2012) [1]
LODStats: System Architecture
CKAN Aggregator
● Scan largest CKAN repos
● Filter out RDF datasets
“Linked Open Data Statistics: Collection and Exploitation” (KESW’2013) [2]
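The filtering step of the aggregator could be sketched roughly as follows. This is a minimal illustration and not the LODStats implementation: it assumes package metadata shaped like CKAN API responses (a `resources` list whose entries carry a `format` string), and the labels in `RDF_FORMATS` are a hypothetical choice. "Filter out" is read here as "select the RDF datasets".

```python
# Hypothetical set of format labels treated as RDF; not taken from LODStats.
RDF_FORMATS = {
    "rdf", "rdf/xml", "application/rdf+xml",
    "n-triples", "nt", "turtle", "ttl", "n3",
}


def is_rdf_resource(resource):
    """Return True if the resource's declared format looks like RDF."""
    fmt = (resource.get("format") or "").strip().lower()
    return fmt in RDF_FORMATS


def filter_rdf_datasets(packages):
    """Keep only packages that have at least one RDF resource."""
    return [
        pkg for pkg in packages
        if any(is_rdf_resource(res) for res in pkg.get("resources", []))
    ]


# Made-up sample metadata, for illustration only.
packages = [
    {"name": "a", "resources": [{"format": "CSV"}, {"format": "Turtle"}]},
    {"name": "b", "resources": [{"format": "PDF"}]},
    {"name": "c", "resources": []},
]
rdf_names = [pkg["name"] for pkg in filter_rdf_datasets(packages)]
print(rdf_names)
```

Declared formats on data portals are often inconsistent, so a real aggregator would likely need a broader label set or content sniffing.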
LODStats: System Architecture (cont.)
LODStats core application
● Queue RDF datasets
● Calculate statistics
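The queue-then-calculate flow might look like the sketch below. It is an assumption-laden illustration: an in-memory `queue.Queue` stands in for the RabbitMQ broker used in the deployment, datasets are passed as N-Triples text rather than downloaded, and the two metrics (triple count and property usage) are only examples of the kind of statistics that could be computed. The function names are hypothetical.

```python
import queue
from collections import Counter


def compute_stats(ntriples_text):
    """Count triples and property usage in N-Triples text (simplified parsing)."""
    triples = 0
    properties = Counter()
    for line in ntriples_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subject and predicate contain no whitespace in N-Triples,
        # so the first two tokens are safe to split off.
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # ignore malformed lines in this sketch
        triples += 1
        properties[parts[1]] += 1
    return {"triples": triples, "properties": dict(properties)}


def process_queue(jobs):
    """Drain the queue and compute statistics for each (name, text) job."""
    results = {}
    while not jobs.empty():
        name, text = jobs.get()
        results[name] = compute_stats(text)
        jobs.task_done()
    return results


# Made-up sample dataset, for illustration only.
SAMPLE = (
    '<http://ex.org/s> <http://ex.org/p> "o" .\n'
    "<http://ex.org/s> <http://ex.org/p> <http://ex.org/o> .\n"
    "<http://ex.org/s> <http://ex.org/q> <http://ex.org/o> .\n"
)

jobs = queue.Queue()
jobs.put(("example", SAMPLE))
results = process_queue(jobs)
print(results["example"]["triples"])
```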
LODStats: Provisioning
Docker image per component
docker-compose.yml for the whole project
Sustainable and platform-independent deployment
LODStats: Provisioning (cont.)
web:
  restart: always
  build: ./web
  links:
    - db
    - rabbitmq
  environment:
    - LODSTATS_DB=db
    - RABBITMQ=rabbitmq
rabbitmq:
  restart: always
  image: rabbitmq:3.6.1
db:
  restart: always
  build: ./db
virtuoso:
  restart: always
  build: ./virtuoso
  environment:
    - DBA_PASSWORD=dba
    - SPARQL_UPDATE=false
    - DEFAULT_GRAPH=http://lodstats.aksw.org/
nginx:
  build: ./nginx
  restart: always
  links:
    - web
    - virtuoso
  environment:
    - VIRTUAL_HOST=lodstats.aksw.org,stats.lod2.eu
LODStats: Provisioning (cont.)
$ git clone https://github.com/AKSW/lodstats.docker
$ cd lodstats.docker
$ docker-compose build
$ docker-compose up -d
Data Model
Data Web Statistics Summary

               2011         2016
Datasets       422          9,644
Links          3%           40%
Data Portals   datahub.io   publicdata.eu, data.gov, datahub.io

More statistics are available from the SPARQL endpoint
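Querying the SPARQL endpoint for further statistics could be sketched as below. The endpoint URL is the one given in the slides; the query is a generic triple count chosen because the LODStats vocabulary is not shown here, and the helper names are hypothetical. The `query` parameter follows the SPARQL protocol, and the result format is requested through the `Accept` header.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://lodstats.aksw.org/sparql"  # endpoint named in the slides

# Generic illustrative query; it does not rely on the LODStats data model.
QUERY = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"


def build_query_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as a GET request URL."""
    return f"{endpoint}?{urlencode({'query': query})}"


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the raw JSON response body (needs network access)."""
    request = Request(
        build_query_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


url = build_query_url(QUERY)
print(url)
```

`run_query` is defined but not called here, since the endpoint may not be reachable from every environment.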
Use Cases
How can you use LODStats data?
● Privacy Analysis: Does the dataset contain sensitive information?
● Coverage Analysis: Does the dataset contain the necessary information?
● Quality Analysis: Define quality metrics using statistical data.
● Vocabulary Reuse: Find a suitable vocabulary for your dataset.
● Link Target Identification: Which datasets are good candidates for interlinking? “Detecting Similar Linked Datasets Using Topic Modelling” (ESWC’2016) [3]
Availability
● Application
  ○ Online at: http://lodstats.aksw.org
  ○ LODStats processing module: https://github.com/aksw/lodstats
  ○ LODStats frontend including SPARQLify mappings: https://github.com/aksw/lodstats_www
  ○ Deployment setup (docker): https://github.com/AKSW/lodstats.docker
● Dataset
  ○ Online at: http://lodstats.aksw.org/sparql
  ○ Datahub.io: https://datahub.io/dataset/lodstats
  ○ Can be deployed in Virtuoso using docker-compose from the deployment repo
Conclusions & Future Work
LODStats is easily replicable using Docker technology
Future work:
● Processing of very large datasets (Spark/Hadoop)
● Improving usability of the frontend
● Extending data collection to crawling
Thank You
Contact Information
Ivan Ermilov <[email protected]>
Address: Augustusplatz 10, Room P905, 04109 Leipzig, Germany
Phone: +49-341-97-32260
twitter.com/akswgroup
http://aksw.com/IvanErmilov
References
[1] Jan Demter, Sören Auer, Michael Martin, and Jens Lehmann. LODStats – An Extensible Framework for High-performance Dataset Analytics. In Proceedings of the EKAW 2012.
[2] Ivan Ermilov, Michael Martin, Jens Lehmann, and Sören Auer. Linked Open Data Statistics: Collection and Exploitation. In Proceedings of the 4th Conference on Knowledge Engineering and Semantic Web.
[3] Michael Röder, Axel-Cyrille Ngonga Ngomo, Ivan Ermilov, and Andreas Both. Detecting Similar Linked Datasets Using Topic Modelling. In The Semantic Web. Latest Advances and New Domains: 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 – June 2, 2016, Proceedings.