1 A Framework to Support Spatial, Temporal and Thematic Analytics over Semantic Web Data Matthew Perry Ph.D. Dissertation Defense Kno.e.sis Center, Wright State University Committee: Dr. Amit Sheth (advisor), Dr. T.K. Prasad, Dr. Soon M. Chung, Dr. Christopher Barton (EES WSU), Dr. Kate Beard (SISE U. of Maine)


A Framework to Support Spatial, Temporal and Thematic Analytics over Semantic Web Data

Matthew Perry

Ph.D. Dissertation Defense

Kno.e.sis Center, Wright State University

Committee: Dr. Amit Sheth (advisor), Dr. T.K. Prasad, Dr. Soon M. Chung, Dr. Christopher Barton (EES WSU), Dr. Kate Beard (SISE U. of Maine)


Three Dimensions of Information

Fred Smith moved into the house at 244 Elm Street on November 16, 2007

Thematic Dimension: What

Spatial Dimension: Where

Temporal Dimension: When


Background


Ontology

• What is an ontology?
  – Agreed-upon formalization of concepts and relationships in the real world
• Types of ontologies
  – General-purpose vs. domain ontologies
• Parts of an ontology
  – Classes: types or logical groups of objects
  – Relationships: how objects relate to each other
  – Attributes: features and characteristics of objects
  – Instances: members of Classes that have Attributes and participate in Relationships

Schema, e.g., Student attends University

Data, e.g., ‘Matt’ attends ‘Wright State University’


Representing ontologies and instance data

• World Wide Web Consortium (W3C) standards
  – Resource Description Framework (RDF)
    • Language for representing information about resources
    • Resources are identified by Uniform Resource Identifiers (URIs), which are globally unique
    • Common framework for expressing information allows exchange and reuse without loss of meaning
    • Graph-based data model
    • Relationships are first-class objects


[Figure: directed labeled RDF graph. Nodes: rdfs:Class, rdf:Property, rdfs:Literal, knoesis:Person, knoesis:Politician, knoesis:Speech, knoesis:gives, knoesis:name, knoesis:Politician_123, knoesis:Speech_456, “Hillary Clinton”. Edge labels: rdfs:domain, rdfs:range, knoesis:gives, name. Legend: rdf:type, rdfs:subClassOf, statement.]

Each statement is a triple of the form Subject Predicate Object.

Defining Classes:
<knoesis:Person> <rdf:type> <rdfs:Class> .

Defining Class/Property Hierarchies:
<knoesis:Politician> <rdfs:subClassOf> <knoesis:Person> .

Defining Properties:
<knoesis:gives> <rdf:type> <rdf:Property> .

Defining Properties (domain and range):
<knoesis:gives> <rdfs:domain> <knoesis:Politician> .
<knoesis:gives> <rdfs:range> <knoesis:Speech> .

Statement (triple):
<knoesis:Politician_123> <knoesis:name> “Hillary Clinton” .

Statement (triple):
<knoesis:Politician_123> <knoesis:gives> <knoesis:Speech_456> .

Directed Labeled Graph


[Figure: the same directed labeled RDF graph as on the previous slide, with classes knoesis:Person, knoesis:Politician and knoesis:Speech, properties knoesis:gives and knoesis:name, and instances knoesis:Politician_123 and knoesis:Speech_456. Legend: rdf:type, rdfs:subClassOf, statement.]

RDF(S) Inferencing

Rule: (x, rdf:type, y) and (y, rdfs:subClassOf, z) ⇒ (x, rdf:type, z)

Asserted:
(knoesis:Politician_123, rdf:type, knoesis:Politician)
(knoesis:Politician, rdfs:subClassOf, knoesis:Person)

Infer:
(knoesis:Politician_123, rdf:type, knoesis:Person)
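The subclass rule on this slide can be illustrated with a small Python sketch. It is only an illustration under simple assumptions (triples held as plain string tuples in memory; the function name `infer_types` is hypothetical), not the inference procedure used in the dissertation.

```python
def infer_types(triples):
    """Apply (x rdf:type y), (y rdfs:subClassOf z) => (x rdf:type z) until nothing new is added."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the current type and subclass statements before adding new ones.
        types = [(s, o) for (s, p, o) in inferred if p == "rdf:type"]
        subclass = [(s, o) for (s, p, o) in inferred if p == "rdfs:subClassOf"]
        for x, y in types:
            for y2, z in subclass:
                if y == y2 and (x, "rdf:type", z) not in inferred:
                    inferred.add((x, "rdf:type", z))
                    changed = True
    return inferred


asserted = {
    ("knoesis:Politician_123", "rdf:type", "knoesis:Politician"),
    ("knoesis:Politician", "rdfs:subClassOf", "knoesis:Person"),
}
closure = infer_types(asserted)
print(("knoesis:Politician_123", "rdf:type", "knoesis:Person") in closure)
```

Repeating the loop until no new triple appears lets the rule chain through longer subclass hierarchies as well.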


Outline

• Background
  – Motivation
  – Related Work
• STT Modeling Approach
• STT Query Operators
• Implementation Scheme
• Experimental Evaluation
• Query Language Support


[Figure: Aggregated RDF Instance Base. Labels: Ontology Schemas; sources XML, HTML, RDBMS, TEXT. Example graph nodes: E1:Reviewer, E2:Paper, E3:Paper, E4:Person, E5:Person, E6:Person, E7:Submission. Edge labels: author_of, friend_of.]

How is entity1 (Reviewer) related to entity2 (Submission) ?

SemDis Project

Semantic Analytics: Searching, browsing and analyzing semantically meaningful connections among named entities where an ontology provides the context or domain semantics

What do we need?

Data model that represents relationships explicitly as first class objects

Ability to model semantics of the relationships

Tools for efficient storage and querying of these relationships


An Example: Battlefield Intelligence

[Figure: query graph pattern. Nodes: ?Person, ?Symptom, Chemical_X, ?Military_Event, ?Location_1, Enemy_Group_Y, ?Enemy, ?Location_2. Edge labels: has_symptom, induces, participated_in, located_at, member_of, spotted_at.]

How close are these locations in space?

How are these events related in time?

SELECT ?p
FROM TABLE(spatial_eval(
  '(?p has_symptom ?s) (Chemical_X induces ?s) (?p participated_in ?m) (?m located_at ?l1)', '?l1',
  '(?e member_of Enemy_Group_Y) (?e spotted_at ?l2)', '?l2',
  'geo_distance(distance=2 unit=mile)'));


Application Areas: Semantics + Space and Time

• Semantic Sensor Web
  – Web-accessible sensor networks and archived sensor data that can be discovered and accessed using standard application protocols and application program interfaces. [1]
• Event Web
  – Event Web organizes data in terms of events and experiences and allows access from users perspectives. For each event, Event Web collects and organizes audio, visual, textual, and other data to provide people an environment for experiencing the event from their perspective. … Unlike events, hypertext has no notion of time, space or semantic structures other than often ad-hoc hyperlinks. [2]

1. Botts, M., Percivall, G., Reed, C., and Davidson, J. (2007). OGC Sensor Web Enablement: Overview and High Level Architecture (OGC 07-165). Technical Report, Open Geospatial Consortium.

2. Jain, R. (2008). EventWeb: Developing a Human-centered Computing System. IEEE Computer, 41(2):42–50.


Objective

• Goal
  – Enable Semantic Analytics over thematic, spatial and temporal dimensions
• Shortcomings of State of the Art
  – Current GIS technology does not support complex thematic analytics operations
  – Current Semantic Analytics technology does not support spatial and temporal relationship analysis


Contributions

• An ontology-based spatiotemporal modeling approach using temporal RDF

• A formalization of a set of spatial, temporal and thematic query operators for the proposed modeling approach

• A SQL-based implementation of the proposed query operators (storage, indexing, inferencing, query processing)

• An extension of the SPARQL RDF query language: SPARQL-ST

• A detailed performance study using large synthetic and real-world RDF datasets


Broad Differences from Related Work

• ST Modeling
  – Represent thematic entities as first-class objects rather than directly attached attributes of spatial objects
  – Provide many-to-many mapping between thematic and spatial objects
• ST Querying
  – Utilize thematic relationships to connect entities to spatial regions in a variety of ways (contexts)
  – Analyze ST properties of a given entity w.r.t. different contexts
  – Dynamic binding of objects to ST properties
• ST Data on Semantic Web
  – Focus on relationship-centric nature of RDF data for analytical queries
  – Implicit relationships (e.g., distance)
  – Look at query language aspects and performance issues
  – Only system supporting both spatial and temporal


STT Modeling Approach


Example Domain Ontology


Temporal RDF: Incorporating Temporal Information

Associate a temporal label with a statement that represents the valid time of the statement, e.g.:

(Student1, rdf:type, Graduate) : [2004, 2008]

[Figure: Undergraduate and Graduate are each rdfs:subClassOf Student. Student1 has rdf:type edges labeled [2002, 2004] and [2004, 2008] to the two subclasses, and an rdf:type edge to Student labeled [?, ?].]

Temporal Inferencing

Interval Union: (Student1, rdf:type, Student) : [2002, 2008]

1. Claudio Gutiérrez, Carlos A. Hurtado, Alejandro A. Vaisman. “Temporal RDF”. ESWC 2005: 93-107


STT Query Operators


Querying in the STT dimensions

• Define a notion of context based on a graph pattern– Query about entities w.r.t. a given context

• Associate spatial region with an entity w.r.t. a context

• Associate temporal interval with an entity w.r.t. a context

• How are entities related in space and time w.r.t. a given context?


Contexts Linking Non-Spatial Entities to Spatial Entities

[Figure: entity graph above a Georeferenced Coordinate Space (Spatial Regions). Layer labels: Dynamic Entities, Spatial Occurrents, Named Places. Nodes: E1:Soldier, E2:Soldier, E3:Soldier, E4:Address, E5:Battle, E6:Address, E7:Battle, E8:Military_Unit. Edge labels: assigned_to, participates_in, occurred_at, lives_at, located_at. Highlighted contexts: Residency, Battle Participation.]


Context Definition

Graph Pattern: recursive definition
Basis: a tuple from (UL ∪ VN) × (U ∪ VN) × (UL ∪ VN) is a graph pattern (triple pattern)
Recursion: if P1 and P2 are graph patterns, then (P1 AND P2) is a graph pattern

Semantics [1] of a graph pattern are defined in terms of a function [[.]], which takes a graph pattern and returns a set of mappings, where a mapping μ : VN → RT is a function from VN to RT

1. Perez, J., Arenas, M., Gutierrez, C.: Semantics and Complexity of SPARQL. ISWC 2006
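To make the mapping semantics concrete, here is a minimal Python sketch that evaluates a conjunction of triple patterns against a small triple set and returns the resulting variable mappings. The names `match_triple` and `evaluate` and the sample data are hypothetical, and the naive nested-loop matching is an assumption made for brevity; it is not the SQL-based evaluation described later in the talk.

```python
def match_triple(pattern, triple, mapping):
    """Try to extend `mapping` so that `pattern` matches `triple`; return None on failure."""
    result = dict(mapping)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):              # variable (element of VN)
            if result.setdefault(term, value) != value:
                return None                   # conflicts with an earlier binding
        elif term != value:                   # constant URI or literal
            return None
    return result


def evaluate(graph_pattern, triples):
    """Return all mappings for a list of triple patterns joined by AND."""
    mappings = [{}]
    for pattern in graph_pattern:
        mappings = [
            extended
            for mapping in mappings
            for triple in triples
            if (extended := match_triple(pattern, triple, mapping)) is not None
        ]
    return mappings


# Hypothetical sample data for illustration only.
triples = [
    ("Unit_1", "participates_in", "Battle_1"),
    ("Battle_1", "occurred_at", "Region_1"),
    ("Soldier_1", "assigned_to", "Unit_1"),
    ("Unit_2", "participates_in", "Battle_2"),
]
pattern = [
    ("?x", "assigned_to", "?y"),
    ("?y", "participates_in", "?z"),
    ("?z", "occurred_at", "?s"),
]
result = evaluate(pattern, triples)
print(result)
```

Each returned dictionary plays the role of one mapping μ; a spatial context would then look up the geometry of the value bound to its spatial variable (here `?s`).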


Context Definition

A spatial context is a 2-tuple (GP, v) where:

1) GP is a graph pattern
2) v ∈ var(GP) is a variable in GP identifying a Spatial_Region instance

Example:
('(?x assigned_to ?y) (?y participates_in ?z) (?z occurred_at ?s)', '?s')


Spatial Operators

spatial_extent((GP, v))_G → {(μ, s)}

Given: a spatial context (GP, v), a temporal RDF graph G
Find: {(μ, s) | μ ∈ [[GP]]_TRIPLES(G) and s = geom(μ(v))}

Example: What are the properties of the 101st Airborne Division w.r.t. battle participation?

ANS ← spatial_extent('(<101st Airborne Div> participates_in ?x) (?x occurred_at ?s)', '?s')_G


Spatial Operators

spatial_restrict((GP, v), sf($s))_G → {(μ, s)}

Given: a spatial context (GP, v), a spatial formula sf defined over S and a variable $s, a temporal RDF graph G
Find: {(μ, s) | μ ∈ [[GP]]_TRIPLES(G) and s = geom(μ(v)) and sf evaluates to true for $s = s}

Example: Which military units have spatial extents that are within 20 miles of (48.45 N, 44.30 E)?

ANS ← spatial_restrict('(?x participates_in ?y) (?y occurred_at ?s)', '?s', distance($s, point(48.45 N, 44.30 E)) < 20 miles)_G


Spatial Operators

spatial_eval((GP1, v1), (GP2, v2), sf($s1, $s2))_G → {(μ1, s1, μ2, s2)}

Given: a spatial context (GP1, v1), a spatial context (GP2, v2), a spatial formula sf defined over S and variables $s1, $s2, a temporal RDF graph G
Find: {(μ1, s1, μ2, s2) | μ1 ∈ [[GP1]]_TRIPLES(G) and μ2 ∈ [[GP2]]_TRIPLES(G) and s1 = geom(μ1(v1)) and s2 = geom(μ2(v2)) and sf evaluates to true for $s1 = s1 and $s2 = s2}

Example: Which military unit’s operational area overlaps the operational area of the 3rd Armored Division?

ANS ← spatial_eval('(?x1 participates_in ?y1) (?y1 occurred_at ?s1)', '?s1', '(<3rd Armored Div> participates_in ?y2) (?y2 occurred_at ?s2)', '?s2', overlap-bdy-intersect($s1, $s2))_G


Temporal Operators

Initial Definitions:

For each statement e = (s, p, o) ∈ TRIPLES(G), let temporal(e) = {t | (s, p, o) : [t] ∈ G}

For a set of time points T’ ⊆ T, let contig_intervals(T’) = {[t_i, t_j] | for all t ∈ T : (if t_i ≤ t and t ≤ t_j then t ∈ T’) and t_(i-1) ∉ T’ and t_(j+1) ∉ T’}

Example
Suppose:
T = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
T’ = {2, 3, 4, 7, 8}

Then:
contig_intervals(T’) = {[2, 4], [7, 8]}
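A minimal Python sketch of contig_intervals, assuming time points are integers with unit spacing (as in the example on this slide). The function groups a set of time points into maximal contiguous runs.

```python
def contig_intervals(time_points):
    """Group a set of integer time points into maximal contiguous runs (start, end)."""
    intervals = []
    for t in sorted(time_points):
        if intervals and t == intervals[-1][1] + 1:
            intervals[-1][1] = t          # extend the current run
        else:
            intervals.append([t, t])      # start a new run
    return [tuple(interval) for interval in intervals]


print(contig_intervals({2, 3, 4, 7, 8}))  # [(2, 4), (7, 8)]
```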


Temporal Operators

Given a set of temporal triples E = {e1, e2, …, en}, we define the interval extension of E, int_extension(E), as the set:

contig_intervals(temporal(e1)) × contig_intervals(temporal(e2)) × … × contig_intervals(temporal(en))

Example
Suppose:
E = {e1, e2, e3},
contig_intervals(temporal(e1)) = {[2, 4], [7, 8]},
contig_intervals(temporal(e2)) = {[1, 5], [7, 9]},
contig_intervals(temporal(e3)) = {[4, 5]}

Then:
int_extension(E) = {{[2, 4], [1, 5], [4, 5]}, {[2, 4], [7, 9], [4, 5]}, {[7, 8], [1, 5], [4, 5]}, {[7, 8], [7, 9], [4, 5]}}
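The interval extension is a Cartesian product, which the following Python sketch reproduces with `itertools.product`. Representing each interval as a `(start, end)` tuple is an assumption made for the illustration.

```python
from itertools import product


def int_extension(interval_sets):
    """Return every combination that picks one interval per temporal triple."""
    return list(product(*interval_sets))


e1 = [(2, 4), (7, 8)]
e2 = [(1, 5), (7, 9)]
e3 = [(4, 5)]
combos = int_extension([e1, e2, e3])
for combo in combos:
    print(combo)
```

The four printed combinations correspond to the four members of int_extension(E) in the example above.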


Temporal Operators

Given a set of time intervals I = {[s1, t1], [s2, t2], …, [sn, tn]}:

smin = min(si) and smax = max(si) for 1 ≤ i ≤ n
tmin = min(ti) and tmax = max(ti) for 1 ≤ i ≤ n

Intersect(I) = [smax, tmin], or NULL if tmin < smax

Range(I) = [smin, tmax]

[Figure: Soldier#123 and Soldier#789 are each assigned_to Platoon#456, with valid times [3, 12] and [6, 20].]

Intersect: [6, 12]

Range: [3, 20]
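The two interval constructors can be sketched in Python as follows. Intervals are assumed to be closed `(start, end)` tuples, and `None` stands in for NULL; the name `interval_range` is used only to avoid shadowing Python's built-in `range`.

```python
def intersect(intervals):
    """Return the common sub-interval [smax, tmin], or None if the intervals do not all overlap."""
    start = max(s for s, _ in intervals)
    end = min(t for _, t in intervals)
    return None if end < start else (start, end)


def interval_range(intervals):
    """Return the smallest interval [smin, tmax] covering all inputs."""
    return (min(s for s, _ in intervals), max(t for _, t in intervals))


assignments = [(3, 12), (6, 20)]
print(intersect(assignments))       # (6, 12)
print(interval_range(assignments))  # (3, 20)
```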


Temporal Operators

temporal_extent(GP, IT)_G → {(μ, i)}

Given: a graph pattern GP, an interval type IT ∈ {intersect, range}, a temporal RDF graph G
Find: {(μ, i) | μ ∈ [[GP]]_TRIPLES(G) and i ∈ intersect/range(int_extension(μ(GP)))}

Example: Find all pairs of soldiers who were members of the 101st Airborne Division at the same time and return the times of joint membership.

ANS ← temporal_extent('(?x assigned_to <101st Airborne Div>) (?y assigned_to <101st Airborne Div>)', 'intersect')_G


Temporal Operators

temporal_restrict(GP, IT, tf($t))_G → {(μ, i)}

Given: a graph pattern GP, an interval type IT ∈ {intersect, range}, a temporal formula tf defined over I and a variable $t, a temporal RDF graph G
Find: {(μ, i) | μ ∈ [[GP]]_TRIPLES(G) and i ∈ intersect/range(int_extension(μ(GP))) and tf evaluates to true for $t = i}

Example: Which members of the 3rd Armored Division participated in battles during September 1944?

ANS ← temporal_restrict('(?x assigned_to <3rd Armored Div>) (<3rd Armored Div> participates_in ?y)', 'intersect', during($t, [09:01:1944, 09:30:1944]) = true)_G


Temporal Operators

temporal_eval(GP1, IT1, GP2, IT2, tf($t1, $t2))_G → {(μ1, i1, μ2, i2)}

Given: a graph pattern GP1, a graph pattern GP2, an interval type IT1 ∈ {intersect, range}, an interval type IT2 ∈ {intersect, range}, a temporal formula tf defined over I and variables $t1, $t2, a temporal RDF graph G
Find: {(μ1, i1, μ2, i2) | μ1 ∈ [[GP1]]_TRIPLES(G) and μ2 ∈ [[GP2]]_TRIPLES(G) and i1 ∈ intersect/range(int_extension(μ1(GP1))) and i2 ∈ intersect/range(int_extension(μ2(GP2))) and tf evaluates to true for $t1 = i1 and $t2 = i2}

Example: Which speeches by President Roosevelt were given during a military event?

ANS ← temporal_eval('(<President Roosevelt> gives ?x)', 'intersect', '(?y participates_in ?z)', 'intersect', during($t1, $t2) = true)_G


Implementation Scheme


Overview

• Extended ORDBMS (Oracle 10g)
  – Defined storage and indexing scheme
  – User-defined functions for temporal RDFS inferencing
  – User-defined functions for querying
• Challenges
  – Thematic relationships can be directly stated, but spatial and temporal relationships require additional computation
  – Spatial and temporal properties of subgraphs aren’t known until query execution time, which makes them challenging to index
• RDF(S) Inferencing
  – If statements have an associated valid time, this must be taken into account when performing inferencing
  – (x, rdfs:subClassOf, y) : [1, 4] AND (y, rdfs:subClassOf, z) : [3, 5] ⇒ (x, rdfs:subClassOf, z) : [3, 4]
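The subclass example above keeps only the time during which both premises hold, that is, the intersection of their valid times. A minimal Python sketch of that single inference step follows; the function name `infer_subclass` and the `((s, p, o), (start, end))` representation are assumptions made for the illustration.

```python
def infer_subclass(first, second):
    """Chain two temporal rdfs:subClassOf statements; the result is valid
    only while both premises are valid (interval intersection)."""
    (x, p1, y), (s1, e1) = first
    (y2, p2, z), (s2, e2) = second
    if p1 != "rdfs:subClassOf" or p2 != "rdfs:subClassOf" or y != y2:
        return None                       # the rule does not apply
    start, end = max(s1, s2), min(e1, e2)
    if end < start:
        return None                       # premises never hold at the same time
    return ((x, "rdfs:subClassOf", z), (start, end))


derived = infer_subclass(
    (("x", "rdfs:subClassOf", "y"), (1, 4)),
    (("y", "rdfs:subClassOf", "z"), (3, 5)),
)
print(derived)  # (('x', 'rdfs:subClassOf', 'z'), (3, 4))
```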


Existing Oracle Technology

• Semantic Technology Component
  – Storage Structures for RDF(S) Data (non-spatial, non-temporal)
  – RDFS Inference Procedures (non-temporal)
  – SQL-based Querying (non-spatial, non-temporal)
• Spatial Component
  – Spatial Types: SDO_GEOMETRY
    • Implementation of Spatial_Region
  – Spatial Indexing
  – Spatial Operators


SQL-based Querying Approach

SQL Table Functions

SELECT X, Y
FROM TABLE (Table_Func(…));

X Y Z

a b c

d e f

… … …


Spatial Functions

spatial_extent (graphPattern VARCHAR,
                spatialVar VARCHAR,
                ontology RDFModels,
                <geom SDO_GEOMETRY>,
                <spatialRelation VARCHAR>)
  return AnyDataSet;

spatial_eval (graphPattern VARCHAR,
              spatialVar VARCHAR,
              graphPattern2 VARCHAR,
              spatialVar2 VARCHAR,
              spatialRelation VARCHAR,
              ontology RDFModels)
  return AnyDataSet;


Temporal Functions

temporal_extent (graphPattern VARCHAR,
                 intervalType VARCHAR,
                 ontology RDFModels,
                 <start DATE>,
                 <end DATE>,
                 <temporalRel VARCHAR>)
  return AnyDataSet;

temporal_eval (graphPattern VARCHAR,
               intervalType VARCHAR,
               graphPattern2 VARCHAR,
               intervalType2 VARCHAR,
               temporalRel VARCHAR,
               ontology RDFModels)
  return AnyDataSet;


Storage Scheme

[Figure: storage scheme. Components: Load RDF Data with Oracle; Spatial Indexing Procedure; Temporal Indexing Procedure.]

Thematic Indexes (on TemporalTriples):
(subj_id, prop_id, obj_id)
(prop_id, subj_id, obj_id)
(obj_id, prop_id, subj_id)


Temporal Inferencing

RDFS Inferencing Rules

1. (x, rdf:type, y) AND (y, rdfs:subClassOf, z) ⇒ (x, rdf:type, z)

2. (x, p, y) AND (p, rdfs:domain, a) ⇒ (x, rdf:type, a)

3. (x, p, y) AND (p, rdfs:range, b) ⇒ (y, rdf:type, b)

4. (x, p, y) AND (p, rdfs:subPropertyOf, z) ⇒ (x, z, y)

Example:
(x, participates_in, e):[2, 5]
(y, participates_in, e):[3, 7]
(z, participates_in, e):[6, 9]

By rule 3 (given that the range of participates_in is event) and Interval Union: (e, rdf:type, event):[2, 9]


Temporal Inferencing Algorithm

1. create table asserted_temporal_triples (subj, prop, obj, start_date, end_date)

2. perform schema-level inferencing

3. perform instance-level inferencing

4. sort redundant_triples by (subj_id, prop_id, obj_id, start)

5. make a single pass and merge overlapping intervals for same statement

6. insert updated triples and intervals into final temporal_triples table

asserted_temporal_triples:
(x, participates_in, e):[2, 5]
(y, participates_in, e):[3, 7]
(z, participates_in, e):[6, 9]

redundant_triples:
(e, rdf:type, event):[2, 5]
(e, rdf:type, event):[3, 7]
(e, rdf:type, event):[6, 9]

temporal_triples:
(e, rdf:type, event):[2, 9]
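Steps 4 and 5 of the algorithm (sort, then merge overlapping intervals in a single pass) can be sketched in Python as below. This is an in-memory illustration only; the implementation described in the talk works on database tables. The function name `merge_temporal_triples` is hypothetical, and treating closed intervals that share at least one time point as overlapping is an assumption.

```python
def merge_temporal_triples(redundant_triples):
    """Sort (subj, prop, obj, start, end) rows and merge overlapping
    intervals of the same statement in one pass."""
    merged = []
    for subj, prop, obj, start, end in sorted(redundant_triples):
        last = merged[-1] if merged else None
        if last and last[:3] == (subj, prop, obj) and start <= last[4]:
            # Same statement and overlapping interval: extend the previous row.
            merged[-1] = (subj, prop, obj, last[3], max(last[4], end))
        else:
            merged.append((subj, prop, obj, start, end))
    return merged


redundant = [
    ("e", "rdf:type", "event", 6, 9),
    ("e", "rdf:type", "event", 2, 5),
    ("e", "rdf:type", "event", 3, 7),
]
print(merge_temporal_triples(redundant))  # [('e', 'rdf:type', 'event', 2, 9)]
```

Sorting first means each row only has to be compared with the most recently emitted row, which is what makes a single pass sufficient.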


Query Function Implementation

• Oracle Extensibility Framework
  – ODCITable Interface
• Start()
  – Prepare SQL query over TemporalTriples and SpatialData tables (the Base Query)
• Fetch()
  – Retrieve row from base query and do additional processing (e.g., construct int/range intervals)
• Close()
  – Final cleanup (e.g., close DB cursors)


Temporal Filtering Example

[Figure: three edge intervals drawn on a timeline from 1 to 8.]

1. [2, 7]
2. [1, 5]
3. [4, 8]

Intersection: [4, 5]
Range: [1, 8]

Partial Filter on Each Edge: during (3, 6)

Intersection: (start <= 6) and (end >= 3)

Range: (start > 3) and (end < 6)
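The idea of a partial filter on each edge can be sketched in Python as follows. This sketch reads the first per-edge condition as the one for intersect intervals and the second as the one for range intervals, which is an interpretation of the slide rather than something it states explicitly; `during` is assumed to be strict containment, and all function names are hypothetical.

```python
def during(interval, window):
    """True if `interval` lies strictly inside `window`."""
    return window[0] < interval[0] and interval[1] < window[1]


def prefilter(edges, window, interval_type):
    """Cheap per-edge test: a necessary condition for the final interval to pass `during`."""
    lo, hi = window
    if interval_type == "intersect":
        return all(start <= hi and end >= lo for start, end in edges)
    return all(start > lo and end < hi for start, end in edges)


def intersect(edges):
    start, end = max(s for s, _ in edges), min(e for _, e in edges)
    return None if end < start else (start, end)


def interval_range(edges):
    return (min(s for s, _ in edges), max(e for _, e in edges))


edges = [(2, 7), (1, 5), (4, 8)]
window = (3, 6)

passes_intersect = prefilter(edges, window, "intersect") and during(intersect(edges), window)
passes_range = prefilter(edges, window, "range") and during(interval_range(edges), window)
print(passes_intersect, passes_range)
```

The per-edge test can be pushed into the base query as ordinary comparisons on start and end columns, so most non-matching rows are discarded before the intersect or range interval is constructed.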


Experimental Evaluation


Evaluation

• Environment
  – Oracle 10g R2 on 64-bit Solaris 9
  – Four 1.8 GHz UltraSPARC IV processors, 8 GB RAM
  – 512 MB buffer cache, 512 MB pga_aggregate_target
• Objective
  – Test scalability w.r.t. (1) dataset size, (2) query complexity
• Dataset
  – Synthetic RDF Graph [1]: Historical Battlefield Analysis (SynHist)
    • Spatial: US Census block group polygons
    • Temporal: random intervals
  – Real-world Data: Political Domain (GovTrack)
    • Spatial: US Census congressional district polygons
    • Temporal: given in data

1. Matthew Perry "TOntoGen: A Synthetic Data Set Generator for Semantic Web Applications", AIS SIGSEMIS Bulletin Volume 2 Issue 2 (April - June) 2005, pp. 46 - 48


Dataset Characteristics


Scalability w.r.t. Dataset Size


Scalability w.r.t. Dataset Size


Scalability w.r.t. Graph Pattern Size


Scalability w.r.t. Graph Pattern Size


Scalability of Spatiotemporal Queries


Query Language Support


Overview

• SPARQL
  – W3C recommended query language for RDF data (as of Jan. 15, 2008)
  – Graph pattern-based queries (subgraph match)
• SPARQL-ST
  – Spatial variables
  – Temporal variables
  – Spatial filter expressions
  – Temporal filter expressions


Intro to SPARQL

Basic Query:
SELECT ?b, ?p
WHERE { ?b rdf:type usbill:HouseBill .
        ?b usbill:sponsor ?p }

Result:
b                               p
<http://.../106/bills/h2916>    <http://.../people/N000002>
<http://.../107/bills/h3041>    <http://.../people/D000275>

Filtered Query:
SELECT ?b
WHERE { ?b rdf:type usbill:HouseBill .
        ?b rdfs:label ?l .
        FILTER (regex(?l, "handgun")) }


SPARQL-ST: Spatiotemporal Graph Pattern

Sets of Terms:
UL = URIs ∪ Literals
U = URIs
RT = RDF Terms

Sets of Variables:
VN = Variables
VS = Spatial Variables
VT = Temporal Variables

Triple Pattern: 3-tuple from (UL ∪ VN) × (U ∪ VN) × (UL ∪ VN)

Spatial Triple Pattern: 3-tuple from (UL ∪ VN ∪ VS) × (U ∪ VN) × (UL ∪ VN ∪ VS)

Spatiotemporal Triple Pattern: 4-tuple from (UL ∪ VN ∪ VS) × (U ∪ VN) × (UL ∪ VN ∪ VS) × (VT)

Spatiotemporal graph patterns are constructed from triple patterns and/or spatial triple patterns and/or spatiotemporal triple patterns


SPARQL-ST Mappings

[Figure: temporal RDF graph. Nodes: Politician_123, Committee_456, District_789, Polygon_1, Linear_Ring_1, NAD83, and the coordinate list “-85.32 34.1, -85.33 34.2, …, -85.32 34.1”. Edges with valid times: on_committee : [1990, 2000]; represents : [1984, 1992]; located_at : [1990, 2000]; uses_crs : [-∞, +∞]; exterior : [-∞, +∞]; lrPosList : [-∞, +∞].]

SELECT ?c, %s, #t1
WHERE { <Politician_123> on_committee ?c #t1 .
        <Politician_123> represents ?d #t2 .
        ?d located_at %s #t3 }

?c maps to a single URI
%s maps to a set of triples
#t1 maps to a time interval


SPARQL-ST by Example

SELECT ?s1, ?s2, intersect(#t1, #t2, #t3, #t4)
WHERE { ?s1 usgov:hasRole ?a #t1 .
        ?a usgov:forOffice usgov:senate/oh #t2 .
        ?s2 usgov:hasRole ?b #t3 .
        ?b usgov:forOffice usgov:senate/oh #t4 .
        FILTER (?s1 != ?s2) }

Find all politicians who were senators of Ohio at the same time and return the times of joint senatorship.


SPARQL-ST by Example

SELECT ?p, ?b
WHERE { ?p usgov:hasRole ?r #t1 .
        ?r usgov:forOffice ?o #t2 .
        ?o usgov:isPartOf usgov:congress/house #t3 .
        ?p usgov:sponsor ?b #t4 .
        TEMPORAL FILTER (after(intersect(#t1, #t2, #t3, #t4), interval(04:02:2008, 04:02:2008, MM:DD:YYYY))) }

Find all House members who sponsored a bill after April 2, 2008


SPARQL-ST by Example

SELECT ?n
WHERE { ?p foaf:name ?n .
        ?p usgov:hasRole ?r .
        ?r usgov:forOffice ?o .
        ?o usgov:represents ?q .
        ?q stt:located_at %g .
        ?a foaf:name "Nancy Pelosi" .
        ?a usgov:hasRole ?b .
        ?b usgov:forOffice ?c .
        ?c usgov:represents ?d .
        ?d stt:located_at %h .
        SPATIAL FILTER (distance(%g, %h) <= 100 miles) }

Find all politicians that represent areas within 100 miles of the district represented by Nancy Pelosi.


SPARQL-ST by Example

SELECT ?p
WHERE { ?p usgov:hasRole ?r #t1 .
        ?r usgov:forOffice ?o #t2 .
        ?o usgov:represents ?c #t3 .
        ?c stt:located_at %g #t4 .
        SPATIAL FILTER (inside(%g, GEOM(POLYGON((-75.14 40.88, -70.77 40.88, -70.77 42.35, -75.14 42.35, -75.14 40.88)))))
        TEMPORAL FILTER (anyinteract(intersect(#t1, #t2, #t3, #t4), interval(03:01:2008, 03:31:2008, MM:DD:YYYY))) }

Find all politicians representing congressional districts within a given geographical area at any time in March 2008


Conclusions

• Showed how the relationship-centric nature of the RDF data model can extend the state of the art in modeling and querying STT data
• Modeling
  – Many-to-many mapping between thematic and spatial objects (formalized as a context)
• Querying
  – Support spatial and temporal relationships in graph pattern queries
  – More complex thematic aspects than traditional STT querying
  – Proposed SPARQL-ST to integrate with current standards
• Implementation
  – Good scalability on large synthetic and real-world datasets
  – Only system for spatial and temporal RDF
• Future Work
  – Semantic Associations
  – Sensor Web and Event Web applications

67

Publications

• Dissertation Related

– Matthew Perry, Amit Sheth, Farshad Hakimpour, Prateek Jain. "Supporting Complex Thematic, Spatial and Temporal Queries over Semantic Web Data", Second International Conference on Geospatial Semantics (GeoS '07), Mexico City, MX, November 29 – 30, 2007

– Matthew Perry, Farshad Hakimpour, Amit Sheth. "Analyzing Theme, Space and Time: An Ontology-based Approach", Fourteenth International Symposium on Advances in Geographic Information Systems (ACM-GIS '06), Arlington, VA, November 10 - 11, 2006

– Matthew Perry and Amit Sheth. "A Framework for Spatial, Temporal and Thematic Analytics over Semantic Web Data", submitted to VLDB Journal

– Matthew Perry and Amit Sheth. "SPARQL-ST: Extending SPARQL for Spatial and Temporal Queries", in preparation

68

Publications

• Other STT Related

– Amit Sheth and Matthew Perry. "Traveling the Semantic Web through Space, Time and Theme", IEEE Internet Computing, Volume 12, Issue 2, March/April 2008

– Farshad Hakimpour, Boanerges Aleman-Meza, Matthew Perry, Amit Sheth. "Data Processing in Space, Time, and Semantics Dimensions", Terra Cognita 2006 - Directions to the Geospatial Semantic Web, in conjunction with the Fifth International Semantic Web Conference (ISWC '06), Athens, GA, November 6, 2006

– Matthew Perry, Amit Sheth, Ismailcem Budak Arpinar. "Geospatial and Temporal Semantic Analytics", to appear in Encyclopedia of Geoinformatics, Hassan A. Karimi (Ed), Idea-Group Inc., 2008

– Farshad Hakimpour, Boanerges Aleman-Meza, Matthew Perry, Amit Sheth. "Spatiotemporal-Thematic Data Processing in Semantic Web", to appear in The Geospatial Web, Springer-Verlag, May 2007

• Proposals

– Amit Sheth (PI), T.K. Prasad. "Spatial, Temporal and Thematic Analysis of Semantic Web Data", NSF-Small

69

Publications

• SemDis Related

– Matthew Perry, Maciej Janik, Cartic Ramakrishnan, Conrad Ibanez, Ismailcem Budak Arpinar, Amit Sheth. "Peer-to-Peer Discovery of Semantic Associations", Second International Workshop on Peer-to-Peer Knowledge Management (P2PKM '05), San Diego, CA, July 17, 2005

– Cartic Ramakrishnan, William Milnor, Matthew Perry, Amit Sheth. "Discovering Informative Connection Subgraphs in Multi-relational Graphs", SIGKDD Explorations Special Issue on Link Mining, Volume 7, Issue 2, December 2005

– Matthew Perry. "TOntoGen: A Synthetic Data Set Generator for Semantic Web Applications", AIS SIGSEMIS Bulletin, Volume 2, Issue 2 (April - June) 2005, pp. 46 – 48

– Matthew Perry and Eric Stiles. "SEMPL: A Semantic Portal", Thirteenth International World Wide Web Conference (WWW '04), New York, NY, May 17-22, 2004

• Semantics and Databases

– Matthew Perry, Souripriya Das, Melliyal Annamalai, Eugene Inseok Chong, Zhe Wu, Jagannathan Srinivasan. "Semantic Similarity based Top-k Queries over Resources categorized using a Taxonomy", submitted to VLDB 2009

70

Questions?