Chang Liu 1, Guilin Qi 2 1 Shanghai Jiao Tong University 2 Southeast University, China

Toward Scalable Reasoning over Annotated RDF Data Using MapReduce

Chang Liu (1), Guilin Qi (2)

(1) Shanghai Jiao Tong University; (2) Southeast University, China

Motivation

Growing interest in representing additional information on top of RDF: time, uncertainty, trust, and provenance => Annotated RDF

Large amounts of data, e.g. YAGO2

Problem: large-scale reasoning

Motivation (cont’d)

Recent work on scalable reasoning using MapReduce: WebPIE (ISWC ’09, ESWC ’10), Fuzzy pD* (ISWC ’11)

Our idea: a large-scale annotated RDF reasoner using MapReduce

Background: Annotated RDF

Syntax:

Deductive rules: Subproperty, Subclass, Domain, Range, Generalization

Example: Subproperty (a)

Zimmermann et al.: A general framework for representing, reasoning and querying with annotated Semantic Web data. Journal of Web Semantics 11, 72-95 (2012)

Background: MapReduce

Naïve Implementation

Subproperty (a)

(X, P, Y) : λ₁, (P, sp, Q) : λ₂ ⇒ (X, Q, Y) : λ₁ ⊗ λ₂

[Figure: three mappers feeding three reducers]

Key   Value
P     1, X, Y
P     2, Q
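The key/value layout above can be sketched as a single in-memory MapReduce pass. This is a minimal illustration, not the authors' implementation: it assumes fuzzy annotations in [0, 1] with ⊗ taken as `min`, it carries the annotation inside the value, and the names `map_triple`, `reduce_key`, and `run` are hypothetical.

```python
from collections import defaultdict

def combine(l1, l2):
    """Assumed ⊗ for fuzzy annotations: the minimum of the two degrees."""
    return min(l1, l2)

def map_triple(triple):
    """Emit (key, value) pairs keyed on the property P being joined."""
    (s, p, o), annotation = triple
    if p == "sp":
        # Schema triple (P, sp, Q): key on P, tag the value with 2.
        yield s, (2, o, annotation)
    else:
        # Data triple (X, P, Y): key on P, tag the value with 1.
        yield p, (1, s, o, annotation)

def reduce_key(key, values):
    """Join data triples with the schema triples that share the key P."""
    data = [v[1:] for v in values if v[0] == 1]
    schema = [v[1:] for v in values if v[0] == 2]
    for x, y, l1 in data:
        for q, l2 in schema:
            yield (x, q, y), combine(l1, l2)

def run(triples):
    """Simulate one pass in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for triple in triples:
        for key, value in map_triple(triple):
            groups[key].append(value)
    derived = []
    for key, values in groups.items():
        derived.extend(reduce_key(key, values))
    return derived

triples = [
    (("alice", "worksFor", "acme"), 0.8),
    (("worksFor", "sp", "affiliatedWith"), 0.9),
]
derived = run(triples)
```

Both input triples share the key `worksFor`, so they meet in one reducer call, which derives `(alice, affiliatedWith, acme)` with degree `min(0.8, 0.9)`.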

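Challenges and solutions

Generalization rule

Deletes triples from the data set, which incurs a large data reconstruction cost

Solution: only perform it at the beginning and at the end, and combine the generalization rule with the other rules

E.g. when a reducer generates two annotated triples that differ only in their annotation, it generates a single triple with the combined annotation instead.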
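Folding the generalization step into a reducer might look like the sketch below. It assumes fuzzy annotations where the combined annotation of two copies of a triple is their maximum; `emit_generalized` is a hypothetical helper name, and other annotation domains would need a different `join`.

```python
def join(l1, l2):
    """Assumed ⊕ for fuzzy annotations: the maximum of the two degrees."""
    return max(l1, l2)

def emit_generalized(derived):
    """Merge annotated triples that share (s, p, o) before the reducer emits them."""
    best = {}
    for triple, annotation in derived:
        if triple in best:
            best[triple] = join(best[triple], annotation)
        else:
            best[triple] = annotation
    return sorted(best.items())

reducer_output = [
    (("alice", "affiliatedWith", "acme"), 0.6),
    (("alice", "affiliatedWith", "acme"), 0.8),
    (("bob", "affiliatedWith", "acme"), 0.5),
]
merged = emit_generalized(reducer_output)
```

Merging inside the reducer means the two `alice` triples never reach the data set as separate entries, so no later pass has to delete one of them.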

Challenges and solutions (cont’d)

Unnecessary derivation

Wastes a lot of computation time

Solution: incorporate the annotation into the mapped key

E.g.

Map to ((t1, p), (1, s, o, [1,2]))

Map to ((t3, p), (2, q, [3,4]))

They will not be grouped together!
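One way to read the keys above is sketched here under explicit assumptions: temporal annotations are closed integer intervals, each integer is one time unit, and a triple is emitted once per time unit it covers. The slides do not spell out the partitioning scheme, so `map_with_time` and `shuffle` are hypothetical names for illustration only.

```python
from collections import defaultdict

def map_with_time(triple):
    """Emit one (key, value) pair per time unit covered by the triple's interval."""
    (s, p, o), (start, end) = triple
    for t in range(start, end + 1):
        if p == "sp":
            # Schema triple (p, sp, q): key on (time unit, p), tag 2.
            yield (t, s), (2, o, (start, end))
        else:
            # Data triple (s, p, o): key on (time unit, p), tag 1.
            yield (t, p), (1, s, o, (start, end))

def shuffle(triples):
    """Group mapper output by key, as the MapReduce shuffle phase would."""
    groups = defaultdict(list)
    for triple in triples:
        for key, value in map_with_time(triple):
            groups[key].append(value)
    return dict(groups)

triples = [
    (("s", "p", "o"), (1, 2)),    # data triple annotated with [1,2]
    (("p", "sp", "q"), (3, 4)),   # schema triple annotated with [3,4]
]
groups = shuffle(triples)
```

Because the intervals [1,2] and [3,4] do not overlap, no key receives both a data value and a schema value, and no reducer attempts the join. Under this scheme, triples whose intervals overlap on several time units would meet in several reducers, so the duplicate results would still need merging.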

Challenges and solutions (cont’d)

Fixpoint calculation

Subproperty/subclass rules require fixpoint iteration

Solution: load the subproperty/subclass schema triples into memory and calculate the closure there

Shortest path calculation: Floyd-Warshall style algorithm

(x₁, sp, x₂) : λ₁, (x₂, sp, x₃) : λ₂, …, (xₙ, sp, xₙ₊₁) : λₙ ⇒ (x₁, sp, xₙ₊₁) : λ₁ ⊗ … ⊗ λₙ

[Figure: chain x₁ → x₂ → … → xₙ₊₁, labeled "Shortest" path]
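A Floyd-Warshall style closure over the in-memory schema can be sketched as follows. The sketch assumes fuzzy annotations, with ⊗ as `min` along a path and `max` to keep the best of several paths; `sp_closure` is a hypothetical name and the actual reasoner may organise the computation differently.

```python
def sp_closure(edges):
    """Compute the annotated transitive closure of sp edges.

    edges maps (x, y) to the annotation of (x, sp, y).
    """
    nodes = sorted({n for pair in edges for n in pair})
    best = dict(edges)
    for k in nodes:                      # allow k as an intermediate node
        for i in nodes:
            if (i, k) not in best:
                continue
            for j in nodes:
                if (k, j) not in best:
                    continue
                # Annotation of the path i -> k -> j (assumed ⊗ = min).
                via_k = min(best[(i, k)], best[(k, j)])
                # Keep the best annotation over all paths (assumed max).
                best[(i, j)] = max(best.get((i, j), 0.0), via_k)
    return best

edges = {("a", "b"): 0.9, ("b", "c"): 0.7, ("a", "c"): 0.5}
closure = sp_closure(edges)
```

Here the direct edge `(a, sp, c)` has degree 0.5, but the path through `b` gives `min(0.9, 0.7)`, so the closure keeps the larger value. Running this once in memory replaces repeated MapReduce passes over the schema triples.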

Experiment setup

Dataset: fuzzified DBPedia core ontology; fpdLUBM 1000, 2000, 4000, 8000

Cluster: 25 machines with 75 mapper/reducer slots

Liu et al.: Reasoning with Large Scale Ontologies in Fuzzy pD* Using MapReduce. Computational Intelligence Magazine, IEEE 7(2), 54-66 (2012)

Experiment result - fuzzy DBPedia

Dataset: fuzzified DBPedia core ontology

Results:

#units       128      64       32       16       8        4        2
Time (sec.)  122.653  136.861  146.393  170.859  282.802  446.917  822.269
Speedup      6.70     6.01     5.62     4.81     2.91     1.84     1.00
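The speedup figures are consistent with dividing the 2-unit runtime by the runtime at each unit count. The short check below recomputes them from the reported times; the variable names are illustrative.

```python
# Reported runtimes in seconds, keyed by number of units.
times = {
    128: 122.653, 64: 136.861, 32: 146.393, 16: 170.859,
    8: 282.802, 4: 446.917, 2: 822.269,
}

baseline = times[2]  # the 2-unit run is the reference point
speedup = {units: round(baseline / t, 2) for units, t in times.items()}
```

Going from 2 to 128 units is a 64-fold increase in resources for a speedup below 7, which suggests fixed per-job overhead dominates on this small dataset.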

Experiment result – fpdLUBM

Number of universities   Time of FuzzyPD (minutes)   Time of WebPIE (minutes)
1000                     38.8                        41.32
2000                     66.97                       74.57
4000                     110.40                      130.87
8000                     215.48                      210.01

Experimental results of FuzzyPD and WebPIE

Experiment result – fpdLUBM (cont’d)

Number of units   Time (minutes)   Speedup
128               38.80            4.01
64                53.15            2.93
32                91.58            1.70
16                155.47           1.00

Scalability over number of units

Experiment result – fpdLUBM (cont’d)

[Figure: scalability over number of units]

Experiment result – fpdLUBM (cont’d)

Number of universities   Input (Mtriples)   Output (Mtriples)   Time (minutes)   Throughput (Ktriples/second)
1000                     155.51             92.01               38.8             39.52
2000                     310.71             185.97              66.97            46.28
4000                     621.46             380.06              110.40           57.37
8000                     1243.20            792.54              215.50           61.29

Scalability over data volume
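The slides do not define throughput, but the reported values match output triples divided by runtime. The check below tests that reading against the table; it is an inference from the numbers, and the function name is hypothetical.

```python
def throughput_ktriples_per_sec(output_mtriples, minutes):
    """Output size in thousands of triples divided by runtime in seconds."""
    return output_mtriples * 1000.0 / (minutes * 60.0)

# (universities, output in Mtriples, time in minutes, reported throughput)
rows = [
    (1000, 92.01, 38.8, 39.52),
    (2000, 185.97, 66.97, 46.28),
    (4000, 380.06, 110.40, 57.37),
    (8000, 792.54, 215.50, 61.29),
]

computed = {n: throughput_ktriples_per_sec(out, mins) for n, out, mins, _ in rows}
max_gap = max(abs(computed[n] - reported) for n, _, _, reported in rows)
```

Every recomputed value lands within 0.01 of the reported one, so throughput rises with data volume as the fixed job overhead is spread over more triples.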

Conclusion and Future work

We show how to design MapReduce algorithms to achieve scalable annotated RDFS reasoning

Several challenges along with solutions

Future work: more experiments on annotated RDFS ontologies; annotated OWL 2 RL

Q&A
