Skolemising Blank Nodes while Preserving Isomorphism
Aidan Hogan – DCC, Universidad de Chile
WHY? BLANK NODES ARE GREAT!
When life gives you blank nodes …
Blank Nodes are glue!
Blank node names aren’t important …
(Isomorphic)
Blank nodes are common in real-world data …
Aidan Hogan, Marcelo Arenas, Alejandro Mallea and Axel Polleres "Everything You Always Wanted to Know About Blank Nodes". Journal of Web Semantics 27: pp. 42–69, 2014
BLANK NODES ENABLE SYNTAX SHORTCUTS
• They represent implicit nodes in the graph
• They help specify order, higher-arity relations, reification, etc., succinctly
• They are common in real-world data
BLANK NODES: WHAT’S THE PROBLEM?
Are two RDF graphs isomorphic?
RDF ISOMORPHISM IS GI-COMPLETE
A general algorithm to see if two RDF graphs are the “same” will (probably) not be tractable
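To make the cost concrete, here is a minimal brute-force sketch of an RDF isomorphism check. It is not the method proposed in these slides. It assumes a graph is a Python set of (subject, predicate, object) string triples and that blank nodes are the strings starting with `_:`; the function names are hypothetical. It tries every bijection between the blank nodes of the two graphs, so it takes factorial time in the worst case.

```python
from itertools import permutations

def is_blank(term):
    # Assumption for this sketch: blank nodes are strings starting with "_:"
    return term.startswith("_:")

def blank_nodes(graph):
    # Blank nodes in subject or object position, in a fixed (sorted) order
    return sorted({t for (s, p, o) in graph for t in (s, o) if is_blank(t)})

def relabel(graph, mapping):
    # Apply a blank-node mapping to every triple; other terms are left alone
    return {(mapping.get(s, s), p, mapping.get(o, o)) for (s, p, o) in graph}

def isomorphic(g, h):
    """Try every bijection from the blank nodes of g to those of h."""
    bg, bh = blank_nodes(g), blank_nodes(h)
    if len(g) != len(h) or len(bg) != len(bh):
        return False
    for perm in permutations(bh):
        if relabel(g, dict(zip(bg, perm))) == h:
            return True
    return False
```

With n blank nodes the loop may visit n! mappings, which is why a practical algorithm needs something smarter for the common cases.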
BLANK NODES ADD COMPLEXITY? WHAT TO DO?
RDF 1.1 proposes Skolemisation
But fresh IRIs every time is not ideal
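A small sketch of why fresh IRIs are not ideal, assuming the same set-of-triples representation with `_:`-prefixed blank nodes. The function name and the `example.org` base are hypothetical; the `/.well-known/genid/` path follows the RDF 1.1 recommendation for Skolem IRIs. Each call mints random identifiers, so Skolemising the same graph twice gives two different outputs.

```python
import uuid

def skolemise_fresh(graph, base="http://example.org/.well-known/genid/"):
    """Replace each blank node with a freshly minted (random) Skolem IRI."""
    mapping = {}
    out = set()
    for s, p, o in graph:
        terms = []
        for t in (s, o):
            if t.startswith("_:"):
                # One fresh IRI per blank node, reused within this call only
                if t not in mapping:
                    mapping[t] = base + uuid.uuid4().hex
                t = mapping[t]
            terms.append(t)
        out.add((terms[0], p, terms[1]))
    return out
```

Because the identifiers are random, two Skolemised copies of one document no longer compare equal triple by triple.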
Would prefer a “consistent” labelling
Compute isomorphically-unique graph hash
Finding duplicate documents from a crawler
CANONICAL LABELLING USEFUL FOR:
1. Mapping blank nodes to IRIs
2. Computing unique hashes for RDF graphs
OLD BUT RECURRING QUESTION
An old question that won’t go away …
Jeremy J. Carroll. “Signing RDF Graphs.” ISWC 2003.
Edzard Höfig, Ina Schieferdecker. “Hashing of RDF Graphs and a Solution to the Blank Node Problem.” URSW 2014.
NO EXISTING APPROACH IS GENERAL
• Hard cases seem unlikely in practice
• Let’s build a general (and thus worst-case exponential) algorithm that’s efficient for practical cases
NAÏVE CANONICAL LABELLING SCHEME
(Naïve) Canonical labels for blank nodes
But wait … what happens if ... ?
Or another case …
Fixpoint does not distinguish all blank nodes!
NAÏVE: COLOUR BLANK NODES RECURSIVELY UNTIL FIXPOINT
• Efficient
• Incomplete
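The naïve scheme can be sketched roughly as follows. This is an illustrative approximation, not the exact hashing rules from the talk. It assumes graphs are sets of string triples with `_:`-prefixed blank nodes; MD5 and the helper names are arbitrary choices. Every blank node starts with the same colour and is repeatedly recoloured from the colours of its neighbours until the partition of blank nodes stops changing.

```python
import hashlib

def h(*parts):
    """Combine string parts into one hex digest (MD5 is an arbitrary choice)."""
    return hashlib.md5("\x00".join(parts).encode("utf-8")).hexdigest()

def is_blank(term):
    return term.startswith("_:")

def term_colour(term, colour):
    # Blank nodes use their current colour; other terms hash their own name
    return colour[term] if is_blank(term) else h("term", term)

def refine(graph, colour=None):
    """Recolour blank nodes from their neighbourhood until a fixpoint."""
    bnodes = {t for s, p, o in graph for t in (s, o) if is_blank(t)}
    if colour is None:
        colour = {b: h("blank") for b in bnodes}
    while True:
        new = {}
        for b in bnodes:
            edges = sorted(
                [h("out", p, term_colour(o, colour)) for s, p, o in graph if s == b]
                + [h("in", p, term_colour(s, colour)) for s, p, o in graph if o == b])
            # The old colour is mixed in, so classes can only split, never merge
            new[b] = h(colour[b], *edges)
        # Same number of classes means the partition is unchanged: fixpoint
        if len(set(new.values())) == len(set(colour.values())):
            return new
        colour = new

path = {("_:a", "p", "_:b"), ("_:b", "p", "_:c")}   # all nodes end up distinct
cycle = {("_:a", "p", "_:b"), ("_:b", "p", "_:a")}  # symmetric: colours stay equal
```

On `path` the three blank nodes receive three different colours, but on `cycle` the two nodes are symmetric and keep the same colour, which is the incompleteness the slide refers to.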
CANONICAL LABELLING SCHEME: ALWAYS DISTINGUISH ALL BLANK NODES
Brendan D. McKay. "Practical graph isomorphism". Congressus Numerantium 30: pp. 45–87, 1981.
Start with a (non-distinguished) colouring …
Let’s distinguish a node …
Colouring is no longer a fixpoint!
Rerun colouring to fixpoint
Fixpoint reached: still not finished!
So again let’s distinguish another …
… and rerun colouring to fixpoint
Now all blank nodes are distinguished!
Blank node labels computed from colour
Let’s go back: first, why pick _:a and _:c?
Okay so: why _:a …
Adapt ideas from the Nauty algorithm (for standard graph isomorphism)
Check all leaves for minimum graph
What happened?
Automorphisms cause repetitions
CORE ALGORITHM: FIND MINIMAL GRAPH FOLLOWING FIXED COLOURING RULES
• Complete
• Efficient for many cases?
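The distinguish-and-refine search can be sketched as below. This is a simplified illustration under stated assumptions, not the author's implementation: graphs are sets of string triples with `_:`-prefixed blank nodes, MD5 is an arbitrary hash, the rule for choosing which colour class to split (smallest class, ties broken by colour) is one possible fixed rule, and no automorphism pruning is done. Each branch distinguishes one node of the chosen class and reruns colouring to a fixpoint; leaves label blank nodes by colour, and the lowest leaf graph wins.

```python
import hashlib

def h(*parts):
    return hashlib.md5("\x00".join(parts).encode("utf-8")).hexdigest()

def is_blank(term):
    return term.startswith("_:")

def term_colour(term, colour):
    return colour[term] if is_blank(term) else h("term", term)

def refine(graph, colour):
    """Recolour blank nodes from their neighbours until the partition is stable."""
    while True:
        new = {}
        for b in colour:
            edges = sorted(
                [h("out", p, term_colour(o, colour)) for s, p, o in graph if s == b]
                + [h("in", p, term_colour(s, colour)) for s, p, o in graph if o == b])
            new[b] = h(colour[b], *edges)
        if len(set(new.values())) == len(set(colour.values())):
            return new
        colour = new

def canonicalise(graph):
    """Return the lowest colour-labelled graph over all distinguishing choices."""
    bnodes = {t for s, p, o in graph for t in (s, o) if is_blank(t)}
    best = None
    stack = [refine(graph, {b: h("blank") for b in bnodes})]
    while stack:
        colour = stack.pop()
        classes = {}
        for b, c in colour.items():
            classes.setdefault(c, []).append(b)
        shared = [(len(bs), c, bs) for c, bs in classes.items() if len(bs) > 1]
        if not shared:
            # Leaf: every blank node is distinguished, so label nodes by colour
            label = {b: "_:" + c for b, c in colour.items()}
            leaf = sorted((label.get(s, s), p, label.get(o, o)) for s, p, o in graph)
            if best is None or leaf < best:
                best = leaf
            continue
        # Fixed rule (an assumption): split the smallest class, ties by colour
        _, c, bs = min(shared, key=lambda x: (x[0], x[1]))
        for b in bs:
            branch = dict(colour)
            branch[b] = h(c, "distinguished")
            stack.append(refine(graph, branch))
    return best
```

Isomorphic inputs explore the same set of leaf graphs and so return the same minimum, while symmetric inputs visit many equivalent leaves, which is the repetition that pruning later avoids.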
OKAY … SO WHAT HASHING TO USE?
What about hash collisions?
128 bit: MD5, Murmur3_128
160 bit: SHA1
HASHING MAY LEAD TO COLLISIONS
• The scheme does not depend on which hash function you use
• A 128-bit hash is the shortest hash with an acceptable collision probability
• For cryptographic use-cases, SHA-256 or better might be needed
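To get a feel for the numbers, the standard birthday approximation p ≈ 1 − exp(−n(n−1) / 2^(b+1)) estimates the chance of at least one collision among n hashes of b bits. The sketch below assumes hash values are uniformly distributed, which holds only approximately for real hash functions; the function name is hypothetical.

```python
import math

def collision_probability(n, bits):
    """Birthday-bound estimate of at least one collision among n hashes."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))

# Compare a short hash with a 128-bit hash for the same number of values
p_32 = collision_probability(10**5, 32)
p_128 = collision_probability(10**12, 128)
```

Under this estimate a 32-bit hash is already more likely than not to collide at a hundred thousand values, while a 128-bit hash stays negligible even at a trillion.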
EVALUATION
Evaluation: Real-world Graphs
Evaluation: Nasty Synthetic Graphs
CONCLUSIONS
In loving memory of
Linked Data
2007–2012
Survived by its research community
_:b
1999–2015
Conclusions
Aside: Why GI-Hard? (Can encode graph isomorphism as RDF isomorphism)
if and only if
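One possible encoding, as a hedged sketch: each vertex of an undirected graph becomes a blank node and each edge becomes a pair of triples with a single fixed predicate. The function name and the `example.org` predicate are hypothetical, and isolated vertices are not represented in this simple form.

```python
def graph_to_rdf(edges, predicate="http://example.org/edge"):
    """Encode an undirected graph (list of vertex pairs) as RDF triples."""
    triples = set()
    for u, v in edges:
        # An undirected edge becomes one triple in each direction
        triples.add((f"_:{u}", predicate, f"_:{v}"))
        triples.add((f"_:{v}", predicate, f"_:{u}"))
    return triples
```

Two undirected graphs without isolated vertices are then isomorphic exactly when their encodings are isomorphic as RDF graphs, because only the blank node labels may be renamed.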
Aside: Why GI-Complete? (Can we encode RDF isomorphism as graph isomorphism?)
if and only if
?
?
Aside: Why GI-Complete? (Yes: We can encode RDF isomorphism as graph isomorphism)
if and only if
COMPLETE CANONICAL LABELLING SCHEME
A complete canonical labelling?
Find a canonical labelling for H
Choose the lowest possible graph
COMPLETE: FIND MINIMUM POSSIBLE GRAPH USING FIXED BLANK NODE LABELS
• Complete
• Inefficient
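A minimal sketch of this complete but inefficient scheme, assuming graphs are sets of string triples with `_:`-prefixed blank nodes. The labels `_:b0`, `_:b1`, … and the function name are illustrative choices. Every assignment of the fixed labels to the blank nodes is tried, and the lowest resulting graph (comparing sorted triple lists) is kept.

```python
from itertools import permutations

def canonical_brute_force(graph):
    """Try all assignments of fixed labels and keep the lowest graph."""
    bnodes = sorted({t for s, p, o in graph for t in (s, o) if t.startswith("_:")})
    labels = [f"_:b{i}" for i in range(len(bnodes))]
    best = None
    for perm in permutations(labels):
        mapping = dict(zip(bnodes, perm))
        candidate = sorted((mapping.get(s, s), p, mapping.get(o, o))
                           for s, p, o in graph)
        if best is None or candidate < best:
            best = candidate
    return best
```

Isomorphic graphs yield the same minimum, so the result is canonical, but the loop visits n! labellings for n blank nodes.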
The need for a graph-level hash
OPTIMISATION: PRUNE THE TREE USING AUTOMORPHISMS
Trim the search tree using “found” automorphisms
Found Automorphisms …
PRUNING PER AUTOMORPHISMS AVOIDS SYMMETRIC REPETITIONS
• Automorphisms are found naturally
• Makes very “regular” structures (like cliques) a lot easier
• Need to be careful how to manage the automorphism group
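The sketch below does not implement pruning. It only illustrates why symmetric structures cause repeated work, by enumerating automorphisms by brute force. It assumes graphs are sets of string triples with `_:`-prefixed blank nodes; the function name is hypothetical. Each automorphism found corresponds to a different labelling that yields the same graph, and hence to a repeated leaf in an unpruned search.

```python
from itertools import permutations

def automorphisms(graph):
    """List every blank-node permutation that maps the graph onto itself."""
    bnodes = sorted({t for s, p, o in graph for t in (s, o) if t.startswith("_:")})
    found = []
    for perm in permutations(bnodes):
        mapping = dict(zip(bnodes, perm))
        image = {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in graph}
        if image == graph:
            found.append(mapping)
    return found

nodes = ["_:a", "_:b", "_:c"]
cycle = {("_:a", "p", "_:b"), ("_:b", "p", "_:c"), ("_:c", "p", "_:a")}
clique = {(u, "p", v) for u in nodes for v in nodes if u != v}
```

A directed 3-cycle has only its three rotations, whereas a clique on the same nodes admits every permutation, so regular structures like cliques benefit most from pruning.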