LDIF Lightening Talk

DESCRIPTION

LDIF translates heterogeneous Linked Data from the Web into a clean, local target representation while keeping track of data provenance.

Text of LDIF Lightening Talk

  • 1. LDIF: Linked Data Integration Framework
  • 2. LINKED DATA CHALLENGES
    Data sources that overlap in content may:
    - use a wide range of different RDF vocabularies
    - use different identifiers for the same real-world entity
    - provide conflicting values for the same properties
    Implications:
    - Queries are usually hand-crafted against individual sources, no different from using an API
    - Improvised or manual merging of entities
    Integrating public datasets with internal databases poses the same problems.
  • 3. LDIF
    LDIF homogenizes Linked Data from multiple sources into a clean, local target representation while keeping track of data provenance.
    1. Collect data: managed download and update
    2. Translate data into a single target vocabulary
    3. Resolve identifier aliases into local target URIs
    4. Cleanse data, resolving the conflicting values
    5. Output
    Open source (Apache License, Version 2.0).
    Collaboration between Freie Universität Berlin and mes|semantics.
  • 4. LDIF PIPELINE, step 1: Collect data
    Supported data sources:
    - RDF dumps (various formats)
    - SPARQL endpoints
    - Crawling Linked Data
  • 5. LDIF PIPELINE, step 2: Translate data
    Sources use a wide range of different RDF vocabularies.
    Example: R2R translates dbpedia-owl:City, schema:Place and fb:location.citytown into local:City.
    - Mappings expressed in RDF (Turtle)
    - Simple mappings using OWL / RDFS statements (x rdfs:subClassOf y)
    - Complex mappings with SPARQL expressivity
    - Transformation functions
  • 6. LDIF PIPELINE, step 3: Resolve identities
    Sources use different identifiers for the same entity.
    [Figure: several places named Berlin (Berlin, Germany; Berlin, CT; Berlin, MD; Berlin, NJ; Berlin, MA); Silk determines that "Berlin" and "Berlin, Germany" denote the same entity.]
    - Profiles expressed in XML
    - Supports various comparators and transformations
  • 7. LDIF PIPELINE, step 4: Cleanse data
    Sources provide different values for the same property.
    Example: one source states that the population of Berlin is 3.4M, another that it is 3.5M; Sieve outputs a single value, 3.5M.
    - Profiles expressed in XML
    - Supports various quality assessment policies and conflict resolution methods
  • 8. LDIF PIPELINE, step 5: Output
    Output options:
    - N-Quads
    - N-Triples
    - SPARQL Update Stream
    Provenance tracking using Named Graphs.
  • 9. LDIF ARCHITECTURE
    [Architecture diagram, top to bottom:]
    - Application Layer: Application Code; SPARQL or RDF API
    - Data Access, Integration and Storage Layer (LDIF): Web Data Access Module, Data Translation Module, Identity Resolution Module, Data Quality and Fusion Module, Integrated Web Data
    - Web of Data: accessed over HTTP
    - Publication Layer: RDFa, RDF/XML, LD Wrapper (twice), Database A, Database B, CMS
  • 10. LDIF VERSIONS
    - In-memory: keeps all intermediate results in memory; fast, but scalability is limited by local RAM
    - RDF Store (TDB): stores intermediate results in a Jena TDB RDF store; can process more data than In-memory, but doesn't scale
    - Cluster (Hadoop): scales by parallelizing work across multiple machines using Hadoop; can process a virtually unlimited amount of data
  • 11. THANK YOU
    Website: http://ldif.wbsg.de
    Google group: http://bit.ly/ldifgroup
    Supported in part by:
    - Vulcan Inc. as part of its Project Halo
    - EU FP7 project LOD2 - Creating Knowledge out of Interlinked Data (Grant No. 257943)
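
To make pipeline steps 2 to 5 from the slides concrete, here is a minimal Python sketch on toy data. It is not LDIF's implementation and does not use R2R, Silk, or Sieve. The class names and the two population values come from the slides; everything else is a hypothetical stand-in, including the `ex:population` property, the graph names, the quality scores, the assignment of values to sources, and all helper function names. In particular, the fusion rule shown (keep the value from the best-scored graph) is just one plausible conflict resolution method.

```python
# Toy sketch of LDIF steps 2-5. All data, scores, and helper names are
# hypothetical; this does not call R2R, Silk, or Sieve.

# Quads are (subject, predicate, object, graph); the graph records provenance.
QUADS = [
    ("dbpedia:Berlin", "rdf:type", "dbpedia-owl:City", "graph:dbpedia"),
    ("dbpedia:Berlin", "ex:population", 3_400_000, "graph:dbpedia"),
    ("fb:berlin", "rdf:type", "fb:location.citytown", "graph:freebase"),
    ("fb:berlin", "ex:population", 3_500_000, "graph:freebase"),
]

# Step 2 stand-in: source classes mapped to one target class.
CLASS_MAP = {
    "dbpedia-owl:City": "local:City",
    "fb:location.citytown": "local:City",
}

# Step 3 stand-in: identifier aliases mapped to a local target URI
# (hard-coded here; an identity resolution step would produce these links).
ALIASES = {"dbpedia:Berlin": "local:Berlin", "fb:berlin": "local:Berlin"}

# Step 4 stand-in: assumed quality score per source graph.
GRAPH_SCORE = {"graph:dbpedia": 0.6, "graph:freebase": 0.9}


def translate(quads):
    """Rewrite rdf:type objects into the target vocabulary."""
    return [
        (s, p, CLASS_MAP.get(o, o) if p == "rdf:type" else o, g)
        for s, p, o, g in quads
    ]


def resolve(quads):
    """Replace subject aliases with their local target URI."""
    return [(ALIASES.get(s, s), p, o, g) for s, p, o, g in quads]


def fuse(quads):
    """Keep one quad per (subject, predicate): the one from the best-scored graph.

    Simplification: every property is treated as single-valued.
    """
    best = {}
    for s, p, o, g in quads:
        key = (s, p)
        if key not in best or GRAPH_SCORE[g] > GRAPH_SCORE[best[key][3]]:
            best[key] = (s, p, o, g)
    return list(best.values())


def run_pipeline(quads):
    return fuse(resolve(translate(quads)))


def to_line(quad):
    """Format a quad as an N-Quads-like line.

    Prefixed names are kept for readability; real N-Quads requires full
    IRIs in angle brackets.
    """
    s, p, o, g = quad
    obj = o if isinstance(o, str) else f'"{o}"'
    return f"{s} {p} {obj} {g} ."


if __name__ == "__main__":
    for quad in run_pipeline(QUADS):
        print(to_line(quad))
```

Each surviving statement keeps its graph name in the fourth position, so its source can still be traced after fusion. This mirrors the slides' point about provenance tracking with Named Graphs.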