Approximate Semantic Matching of Heterogeneous Events
Souleiman Hasan, Sean O’Riain, Edward Curry
Digital Enterprise Research Institute (DERI), National University of Ireland, Galway (NUIG)
[email protected]
http://www.StefanDecker.org/
In Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems (DEBS 2012), Berlin, Germany, 2012.
Copyright 2010 Digital Enterprise Research Institute. All rights reserved. www.deri.ie

Approximate Semantic Matching of Heterogeneous Events


DESCRIPTION

Event-based systems have loose coupling within space, time and synchronization, providing a scalable infrastructure for information exchange and distributed workflows. However, event-based systems are tightly coupled, via event subscriptions and patterns, to the semantics of the underlying event schema and values. The high degree of semantic heterogeneity of events in large and open deployments such as smart cities and the sensor web makes it difficult to develop and maintain event-based systems. In order to address semantic coupling within event-based systems, we propose vocabulary free subscriptions together with the use of approximate semantic matching of events. This paper examines the requirement of event semantic decoupling and discusses approximate semantic event matching and the consequences it implies for event processing systems. We introduce a semantic event matcher and evaluate the suitability of an approximate hybrid matcher based on both thesauri-based and distributional semantics-based similarity and relatedness measures. The matcher is evaluated over a structured representation of Wikipedia and Freebase events. Initial evaluations show that the approach matches events with a maximal combined precision-recall F1 score of 75.89% on average in all experiments with a subscription set of 7 subscriptions. The evaluation shows how a hybrid approach to semantic event matching outperforms a single similarity measure approach.


Page 1: Approximate Semantic Matching of Heterogeneous Events

Copyright 2010 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Approximate Semantic Matching of Heterogeneous Events

Souleiman Hasan, Sean O’Riain, Edward Curry
Digital Enterprise Research Institute (DERI), National University of Ireland, Galway (NUIG)

[email protected]
http://www.StefanDecker.org/

In Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems (DEBS 2012), Berlin, Germany, 2012.

Page 2: Approximate Semantic Matching of Heterogeneous Events


Outline

Introduction
  Smart Environments
  Motivational Scenario
  Related Work
Proposal
  Approximate Semantic Matching
Experiments
  Wikipedia
  Freebase
Conclusions
Q & A

Page 3: Approximate Semantic Matching of Heterogeneous Events

Smart Environments

Smart Homes, Grids, Cities…
Internet-of-Things, Sensor Web…
Non-technical users
High heterogeneity
Trend for dynamic data-driven decision making
By 2020, 50 billion devices connected to mobile networks (OECD, 2012)

Event/Situation of Interest: New free parking space near me
Event/Situation of Interest: Soccer match played in Berlin

Page 4: Approximate Semantic Matching of Heterogeneous Events

Motivational Scenario - Enterprise

[Figure: event cloud diagram. Labels: Event Cloud, CIO, CSO, Helpdesk, Maintenance Personnel, Equipment Monitoring, Building, Data Center, Operational Level, Strategic Level]

Various terms used: energy consumption, energy usage…; room, space, zone…

Situation of Interest: Company CO2 emissions performance

Dynamic environments: new events from equipment joining and leaving

Energy usage by global IT department
PUE of the Data Center in Dublin
kWhs used by server 172.16.0.8

Page 5: Approximate Semantic Matching of Heterogeneous Events

Requirements

Handling of semantically heterogeneous events
Handling of dynamic environments with event types by sources joining and leaving
Low cost of rules management
Usability
Precision

Page 6: Approximate Semantic Matching of Heterogeneous Events

[Figure: event processing architecture. Labels: User, Developer, UI, EPL Interface and Parser, Translation, Templates Repository, Rules Repository, Execution Repository, Single Event Matcher, Pattern Matcher, CEP Engine, Event Processing, ERP, BMS]

Situation of Interest: When a floor is empty and its energy usage for an hour is above threshold w.r.t. budget, then it is an excessive usage

EVENT PROCESSING RULE
INSERT INTO ExcessiveEnergyUsageByFloor
SELECT a.floor as floor
FROM PATTERN [(a=FloorEmptySensor -> every b=DeviceEnergyUsageSensor (a.floor=b.floor))].WIN:TIME(1 hour)
GROUP BY a.floor
WHERE (b.usage) > GetAcceptableThreshold(a.budgetValue)

Example events:
PC NO XDG26359, Floor: 1st, usage: 3 kWh
VM: vmdgsit01.deri.ie, Floor: 1st, usage: 15 kWh

Non-technical users with natural language needs
Separated from the engine
Rules tied to vocabulary
High cost in case of heterogeneity or change

Page 7: Approximate Semantic Matching of Heterogeneous Events

Exact Event Processing Paradigm

Requirement | Addressing by the paradigm
Semantic Heterogeneity | Does not scale out to highly heterogeneous environments
Dynamic Environment | Does not scale out to highly dynamic environments
Rule Management | High cost under large heterogeneity and dynamicity
Usability | Low
Precision | 100% (typically)

Page 8: Approximate Semantic Matching of Heterogeneous Events

Decoupling in Event Systems

[Figure: event producer and event consumer, decoupled in space, time, and synchronization]

Space: Producers and consumers don’t know each other
Time: Participants don’t need to be actively involved in the interaction at the same time
Synchronization: Event producers and consumers don’t get blocked to send/receive events

Page 9: Approximate Semantic Matching of Heterogeneous Events

Decoupling in Event Systems

Principle: “Removal of explicit dependencies between participants” (Eugster et al., 2003)
Outcome: Scalability

[Figure: event producer and event consumer, decoupled in space, time, and synchronization]

Page 10: Approximate Semantic Matching of Heterogeneous Events

Semantic Coupling

Current event-based systems keep an explicit semantic dependency between participants
Limited scalability in highly heterogeneous and dynamic environments

[Figure: event producer and event consumer, with dimensions Space, Time, Synchronization, and Semantic (event types, properties, values)]

Page 11: Approximate Semantic Matching of Heterogeneous Events

Current Approaches

Ontology-based (Petrovic et al., 2003), (Zhang & Ye, 2008)…
  Does not “remove explicit dependency”
  Hard to achieve ontology agreement a priori at a large scale of heterogeneity and dynamism
  Medium usability, 100% precision typically
Fuzzy sets (Liu & Jacobsen, 2002)
  Address only event numerical values vs. string value subscriptions
  Medium usability, high precision

Page 12: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach

Approximate semantic matching of events

Types & properties possible mappings
Values possible mappings
Pick best overall mapping
Post-matching event processing

[Figure: an Event (Type(s), Properties, Values) matched against a Subscription (Type(s), Properties, Values)]

Page 13: Approximate Semantic Matching of Heterogeneous Events

Background

Semantic Similarity
  f: Terms × Terms → [0,1], where term1 and term2 are Terms
  f(term1, term2) = 0: absolute semantic mismatch
  f(term1, term2) = 1: exact match
  E.g. Football Match and Soccer Match are similar
Relatedness: a general case of similarity
  E.g. Football Match and Referee are related but not similar
Thesaurus-based: e.g. WordNet-based
Distributional semantics-based: e.g. Wikipedia ESA
  The more Wikipedia articles two terms co-occur in, the more related they are
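A minimal sketch of a distributional relatedness function f: Terms × Terms → [0,1], assuming each term is represented by a weight vector over concepts and compared by cosine. The vectors and concept names below are hand-made for illustration only; real ESA derives them from Wikipedia text.

```python
import math

# Hypothetical, hand-made concept vectors: each term maps to weights over a
# few made-up "articles". These are illustration values, not ESA output.
CONCEPT_VECTORS = {
    "football match": {"Association football": 0.9, "FIFA World Cup": 0.7, "Referee": 0.3},
    "soccer match": {"Association football": 0.9, "FIFA World Cup": 0.6},
    "referee": {"Referee": 0.9, "Association football": 0.4},
    "election": {"Voting": 0.9, "Parliament": 0.6},
}


def relatedness(term1: str, term2: str) -> float:
    """Cosine of the two concept vectors; in [0, 1] for non-negative weights."""
    v1 = CONCEPT_VECTORS.get(term1.lower(), {})
    v2 = CONCEPT_VECTORS.get(term2.lower(), {})
    dot = sum(w * v2.get(concept, 0.0) for concept, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # unknown term: treat as a semantic mismatch
    return min(1.0, dot / (norm1 * norm2))
```

With these toy vectors, "Football Match" scores higher against "Soccer Match" than against "Referee", and 0 against "Election", which mirrors the similar / related / mismatch distinction on the slide.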

Page 14: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach Instantiation

Types & properties possible mappings

Values possible mappings

Pick best overall mapping

Post-matching event processing

Event
  type: Football Match
  name: 2010 FIFA World Cup Final
  team: Netherlands National Football Team
  team: Spain National Football Team
  location: Johannesburg
  location: FNB stadium
  referee: Howard Webb

Subscription
  Event type “Soccer Match”
  Event team “Spain”
  Event place “South Africa”
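One way to picture the example is as two lists of property-value statements. The sketch below is an assumed representation (the `Statement` type and `exact_match` helper are hypothetical names), with the property-value pairing read off the figure labels. It shows why an exact matcher misses this event.

```python
from typing import List, NamedTuple


class Statement(NamedTuple):
    prop: str
    value: str


# The example event as property-value statements.
event: List[Statement] = [
    Statement("type", "Football Match"),
    Statement("name", "2010 FIFA World Cup Final"),
    Statement("team", "Netherlands National Football Team"),
    Statement("team", "Spain National Football Team"),
    Statement("location", "Johannesburg"),
    Statement("location", "FNB stadium"),
    Statement("referee", "Howard Webb"),
]

subscription: List[Statement] = [
    Statement("type", "Soccer Match"),
    Statement("team", "Spain"),
    Statement("place", "South Africa"),
]


def exact_match(subscription: List[Statement], event: List[Statement]) -> bool:
    """Exact matching: every subscription statement must appear verbatim."""
    return all(stmt in event for stmt in subscription)


matched = exact_match(subscription, event)  # False: no statement is identical
```

None of the three subscription statements occurs verbatim in the event, although each has an obvious semantic counterpart.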

Page 15: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach Instantiation

Types & properties possible mappings

Values possible mappings

Pick best overall mapping

Post-matching event processing

Event properties: type, name, referee, team, location
Subscription properties: type, place, team

[Figure: precision-recall curves (x-axis: Recall, y-axis: Precision) for Lin, Jiang&Conrath, Leacock&Chodorow, Lesk, Path, Resnik, Gloss Vector, Wu&Palmer, Wikipedia ESA, and Exact Matcher]

Page 16: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach Instantiation

Types & properties possible mappings

Values possible mappings

Pick best overall mapping

Post-matching event processing

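Event properties: type, name, referee, team, location
Subscription properties: type, place, team

Determine top m correspondence candidates: $\mathrm{Rank}_{\mathrm{Sim}_{\mathrm{Jiang\&Conrath}}}(p_s, p_e)$

Measure properties relatedness: $f_P = \min\bigl(1,\ m - \mathrm{Rank}_{\mathrm{Sim}_{\mathrm{Jiang\&Conrath}}}(p_s, p_e) + 1\bigr) \cdot \mathrm{WikipediaESA}(p_s, p_e)$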
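The hybrid property measure can be sketched as a rank gate followed by a relatedness score. This is an illustrative reading of the formula, not the authors' implementation: the two scoring functions are passed in as stand-ins for Jiang&Conrath and Wikipedia ESA, the toy scores are made up, and clamping the gate at 0 for candidates ranked below m is an assumption the slide does not state.

```python
from typing import Callable, Dict, List

Similarity = Callable[[str, str], float]


def hybrid_property_scores(
    ps: str,
    event_properties: List[str],
    thesaurus_sim: Similarity,  # stand-in for Jiang&Conrath
    relatedness: Similarity,    # stand-in for Wikipedia ESA
    m: int,
) -> Dict[str, float]:
    """Score each event property pe against the subscription property ps."""
    ranked = sorted(event_properties, key=lambda pe: thesaurus_sim(ps, pe), reverse=True)
    scores = {}
    for rank, pe in enumerate(ranked, start=1):
        # Min(1, m - rank + 1) equals 1 for the top m candidates.
        # Clamping at 0 for lower ranks is an assumption made here.
        gate = max(0, min(1, m - rank + 1))
        scores[pe] = gate * relatedness(ps, pe)
    return scores


# Hypothetical scores for the subscription property "place".
TOY_JC = {"location": 0.8, "name": 0.3, "team": 0.2, "referee": 0.1}
TOY_ESA = {"location": 0.9, "name": 0.4, "team": 0.5, "referee": 0.3}

scores = hybrid_property_scores(
    "place",
    list(TOY_JC),
    thesaurus_sim=lambda ps, pe: TOY_JC[pe],
    relatedness=lambda ps, pe: TOY_ESA[pe],
    m=2,
)
```

With m = 2, only the two candidates ranked best by the thesaurus-based stand-in keep their relatedness score; the rest are zeroed.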

Page 17: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach Instantiation

Types & properties possible mappings

Values possible mappings

Pick best overall mapping

Post-matching event processing

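Event properties: type, name, referee, team, location
Subscription properties: type, place, team

Top 1 mapping (90%): event (type, location, team) to subscription (type, place, team)
Top 2 mapping (40%): event (type, name, referee) to subscription (type, place, team)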

Page 18: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach Instantiation

Types & properties possible mappings

Values possible mappings

Pick best overall mapping

Post-matching event processing

Event values: Football Match, Howard Webb, Spain National Football Team, Johannesburg, FNB stadium, Netherlands National Football Team
Subscription values: Soccer Match, South Africa, Spain

Measure values relatedness: $f_V = \mathrm{WikipediaESA}(v_s, v_e)$

Page 19: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach Instantiation

Types & properties possible mappings

Values possible mappings

Pick best overall mapping

Post-matching event processing

Event values: Football Match, Howard Webb, Spain National Football Team, Johannesburg, FNB stadium, Netherlands National Football Team
Subscription values: Soccer Match, South Africa, Spain

Spain National Football Team (event) vs. Spain (subscription): 95%
Netherlands National Football Team (event) vs. Spain (subscription): 30%

Page 20: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach Instantiation

Types & properties possible mappings

Values possible mappings

Pick best overall mapping

Post-matching event processing

Event values: Football Match, Howard Webb, Spain National Football Team, Johannesburg, FNB stadium, Netherlands National Football Team
Subscription values: Soccer Match, South Africa, Spain
Event properties: type, name, referee, team, location
Subscription properties: type, place, team

Calculate statements relatedness: $f_{STMT} = f_P(p_s, p_e) \cdot f_V(v_s, v_e)$

Page 21: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach Instantiation

Types & properties possible mappings

Values possible mappings

Pick best overall mapping

Post-matching event processing

Event values: Football Match, Howard Webb, Spain National Football Team, Johannesburg, FNB stadium, Netherlands National Football Team
Subscription values: Soccer Match, South Africa, Spain
Event properties: type, name, referee, team, location
Subscription properties: type, place, team

Determine correspondent event statement Corre by Max fSTMT
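The last two steps, scoring each candidate statement as fP times fV and keeping the maximum, can be sketched as follows. The function name and the lookup tables are hypothetical. The scores are made up except 0.95 and 0.30, which are the value relatedness figures the slides show for the two team values against "Spain".

```python
from typing import Callable, List, Tuple

Statement = Tuple[str, str]  # (property, value)
Score = Callable[[str, str], float]


def best_correspondence(
    sub_stmt: Statement,
    event: List[Statement],
    f_p: Score,
    f_v: Score,
) -> Tuple[Statement, float]:
    """Return the event statement that maximizes f_STMT = f_P * f_V."""
    ps, vs = sub_stmt
    scored = [((pe, ve), f_p(ps, pe) * f_v(vs, ve)) for pe, ve in event]
    return max(scored, key=lambda item: item[1])


# Hypothetical property scores for the subscription property "team".
F_P = {("team", "team"): 1.0, ("team", "referee"): 0.4}
# 0.95 and 0.30 come from the slides; 0.10 is made up.
F_V = {
    ("Spain", "Spain National Football Team"): 0.95,
    ("Spain", "Netherlands National Football Team"): 0.30,
    ("Spain", "Howard Webb"): 0.10,
}

event = [
    ("team", "Netherlands National Football Team"),
    ("team", "Spain National Football Team"),
    ("referee", "Howard Webb"),
]
match, score = best_correspondence(
    ("team", "Spain"),
    event,
    f_p=lambda ps, pe: F_P[(ps, pe)],
    f_v=lambda vs, ve: F_V[(vs, ve)],
)
```

Here the subscription statement (team, "Spain") is paired with the event statement (team, "Spain National Football Team"). An empty event would make `max` raise, so a real matcher would need to handle that case.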

Page 22: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach Instantiation

Types & properties possible mappings

Values possible mappings

Pick best overall mapping

Post-matching event processing

Rank within a window
Complex Event Processing
…

Page 23: Approximate Semantic Matching of Heterogeneous Events

Experiments Overview

Methodology
  Prepare an event set that reflects the required semantic heterogeneity (Wikipedia events)
  Prepare a gold standard set of subscriptions that stress multiple aspects of semantic coupling
  Validate the suitability of semantic approximation from a precision perspective
  Use a different event set and the same subscriptions to validate low maintainability cost (Freebase events)
Evaluation Criteria
  Average interpolated precision-recall curve on 11 recall points
  Maximal F1 score over the average curve
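The two evaluation criteria can be sketched with the standard information-retrieval definitions: interpolated precision at recall r is the highest precision seen at any recall of at least r, and F1 = 2PR / (P + R). The function names are illustrative, and the ranked list in the usage lines is made up.

```python
from typing import List, Sequence, Tuple

RECALL_POINTS = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0


def interpolated_precision(ranked_relevance: Sequence[bool], total_relevant: int) -> List[float]:
    """Interpolated precision at the 11 recall points for one ranked result list."""
    points: List[Tuple[float, float]] = []  # (recall, precision) after each hit
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / total_relevant, hits / rank))
    return [
        max((p for r, p in points if r >= level - 1e-12), default=0.0)
        for level in RECALL_POINTS
    ]


def average_curve(curves: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-subscription curves point by point."""
    return [sum(col) / len(col) for col in zip(*curves)]


def max_f1(curve: Sequence[float]) -> float:
    """Largest F1 = 2PR / (P + R) over the 11 points of a curve."""
    return max(
        (2 * p * r / (p + r) if p + r > 0 else 0.0)
        for r, p in zip(RECALL_POINTS, curve)
    )


# Made-up ranked list: relevant events at ranks 1 and 3, two relevant in total.
curve = interpolated_precision([True, False, True, False], total_relevant=2)
best = max_f1(curve)
```

For this toy list the curve is 1.0 up to recall 0.5 and 2/3 afterwards, so the maximal F1 is reached at recall 1.0.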

Page 24: Approximate Semantic Matching of Heterogeneous Events

Experiment 1- Wikipedia Events

Event Set Statistics

Source: structured Wikipedia infoboxes, DBpedia 31 August 2011
Collection: triples directly associated to instances of dbpedia-owl:Event class
Data model: RDF
Total # of events: 20,156
Total # of distinct event types: 4,950
Total # of distinct event properties: 1,459
Total # of distinct event values: 500,717
Total # of triples: 1,502,599
Average # of distinct types per event: 7.42
Average # of distinct properties per event: 30.52
Average # of distinct values per event: 54.16
Average # of triples per event: 64.67

Page 25: Approximate Semantic Matching of Heterogeneous Events

Experiment 1- Wikipedia Events

Example event types: Football Match, Race, Music Festival, Space Mission, Election, 10th-Century BC Conflicts, Academic Conference, Aviation Accident, …

Page 26: Approximate Semantic Matching of Heterogeneous Events

Experiment 1- Subscription Set

Manually created gold standard set of subscriptions

ID | Description | Subscription | # of relevant events | # of needed exact rules | Event type approximation | Event properties approximation | Literals and resources approximation
1 | Football matches played by Spain in the FNB stadium | event type "Football Match"; event team "Spain national football team"; event stadium "FNB Stadium" | 1 | 1 | NO | NO | NO
2 | Football matches played in the FNB stadium | event type "Football Match"; event place "FNB Stadium" | 2 | 2 | NO | YES | NO
3 | Events taking place in Wembley stadium | event type "Event"; event place "Wembley Stadium" | 219 | 5 | NO | YES | Syntactic
4 | Charity events taking place in Wembley stadium | event type "Charity"; event place "Wembley Stadium" | 29 | 6 | YES | YES | Semantic + Syntactic
5 | Charity Rock events taking place in Wembley stadium | event type "Charity"; event type "Rock"; event place "Wembley Stadium" | 2 | 2 | YES | YES | Semantic + Syntactic
6 | Football matches played in the UK | event type "Football Match"; event stadium "United Kingdom" | 505 | 603 | NO | YES | Background Knowledge
7 | Football matches played by a South American team in Europe | event type "Football Match"; event team "South America"; event stadium "Europe" | 20 | 123,774 | NO | YES | Background Knowledge

Page 27: Approximate Semantic Matching of Heterogeneous Events

Experiment 1- Subscription Set

Manually created gold standard set of subscriptions

(Same subscription table as on the previous slide, with the Subscription column headed "Template".)

Example: exact rules needed for subscription 3

ID | Description | Subscription | # of relevant events | # of needed exact rules | Event type approximation | Event properties approximation | Literals and resources approximation
3 | Events taking place in Wembley stadium | event type "Event"; event place "Wembley Stadium" | 219 | 5 | NO | YES | Syntactic

Subscription: event type "Event"; event place "Wembley Stadium"
SPARQL pattern 1: ?event rdf:type dbpedia-owl:Event. ?event dbpprop:stadium dbpedia:Wembley_Stadium.
SPARQL pattern 2: ?event rdf:type dbpedia-owl:Event. ?event dbpedia-owl:location dbpedia:Wembley_Stadium.
…
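To illustrate why one approximate subscription stands for several exact rules, the sketch below generates one exact SPARQL graph pattern per property that can express "place". The helper name is hypothetical. Only the two properties shown on the slide are listed, since the remaining patterns are elided there.

```python
from typing import List

# The two properties shown on the slide; the other exact rules are elided
# there ("…"), so they are not guessed here.
PLACE_PROPERTIES = ["dbpprop:stadium", "dbpedia-owl:location"]


def exact_patterns(event_class: str, place_resource: str, properties: List[str]) -> List[str]:
    """Build one exact SPARQL graph pattern per candidate place property."""
    return [
        f"?event rdf:type {event_class}. ?event {prop} {place_resource}."
        for prop in properties
    ]


patterns = exact_patterns("dbpedia-owl:Event", "dbpedia:Wembley_Stadium", PLACE_PROPERTIES)
for pattern in patterns:
    print(pattern)
```

Each additional property or type variant multiplies the number of exact rules, which is the maintenance cost the "# of needed exact rules" column counts.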

Page 28: Approximate Semantic Matching of Heterogeneous Events

Experiment 1- Results

[Figure: two precision-recall plots (x-axis: Recall, y-axis: Precision) comparing Jiang&Conrath and Wikipedia ESA, one per subscription: "Events taking place in Wembley stadium" and "Football matches played in the UK"]

Need for a hybrid matcher that combines both

[Figure: frequency (0% to 45%) of semantic similarity or relatedness scores (log scale, 0 to 1) for Jiang&Conrath and Wikipedia ESA]

Page 29: Approximate Semantic Matching of Heterogeneous Events

Experiment 1- Results

[Figure: precision-recall curves (x-axis: Recall, y-axis: Precision) for Jiang&Conrath, Wikipedia ESA, and Hybrid]

Matcher | Jiang&Conrath | Wikipedia ESA | Hybrid
Maximal F1 Score | 70.06% | 44.26% | 75.45%
Recall | 80% | 80% | 90%
Precision | 62.31% | 30.59% | 64.94%

Hybrid matcher outperforms a single similarity or relatedness measure matcher.

Page 30: Approximate Semantic Matching of Heterogeneous Events

Experiment 2- Freebase Event Set

Event Set Statistics

Source: Freebase events dump 1 December 2011, triples current
Collection: triples directly associated to instances of “fbase:time.event" class
Data model: RDF
Total # of events: 84,529
Total # of distinct event types: 858
Total # of distinct event properties: 1,242
Total # of distinct event values: 1,199,627
Total # of triples: 1,859,338
Average # of distinct types per event: 3.33
Average # of distinct properties per event: 10.67
Average # of distinct values per event: 21.66
Average # of triples per event: 21.99

Page 31: Approximate Semantic Matching of Heterogeneous Events

Experiment 2- Subscription Set

Same as in Experiment 1.

ID | Description | Subscription | # of relevant events | # of needed exact rules | Event type approximation | Event properties approximation | Literals and resources approximation
1 | Football matches played by Spain in the FNB stadium | event type "Football Match"; event team "Spain national football team"; event stadium "FNB Stadium" | 1 | 1 | YES | YES | NO
2 | Football matches played in the FNB stadium | event type "Football Match"; event place "FNB Stadium" | 8 | 2 | YES | YES | NO
3 | Events taking place in Wembley stadium | event type "Event"; event place "Wembley Stadium" | 29 | 5 | NO | YES | NO
4 | Charity events taking place in Wembley stadium | event type "Charity"; event place "Wembley Stadium" | 0 | - | - | - | -
5 | Charity Rock events taking place in Wembley stadium | event type "Charity"; event type "Rock"; event place "Wembley Stadium" | 0 | - | - | - | -
6 | Football matches played in the UK | event type "Football Match"; event stadium "United Kingdom" | 34 | 1,398 | YES | YES | Background Knowledge
7 | Football matches played by a South American team in Europe | event type "Football Match"; event team "South America"; event stadium "Europe" | 2 | 219,600 | YES | YES | Background Knowledge

Page 32: Approximate Semantic Matching of Heterogeneous Events

Experiment 2- Results

Matcher | Jiang&Conrath | Wikipedia ESA | Hybrid
Maximal F1 Score | 44.60% | 70.73% | 76.33%
Recall | 60% | 80% | 80%
Precision | 35.49% | 63.39% | 72.98%

Hybrid matcher gives similar results in Freebase as in DBpedia

[Figure: precision-recall curves (x-axis: Recall, y-axis: Precision) for Jiang&Conrath, Wikipedia ESA, and Hybrid]

Page 33: Approximate Semantic Matching of Heterogeneous Events

Conclusions

Metric | Exact Matcher | Approximate Semantic Matcher
Number of Required Subscriptions | 345,000 | 7
Maximal F1-Score | 100% | 75.89%

Approximate semantic matcher addresses subscriptions/ rules maintainability cost in heterogeneous and dynamic environments

Approximate semantic matcher is suitable when less than 100% precision is acceptable

A hybrid matcher outperforms a single similarity or relatedness measure matcher.
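The two headline figures appear consistent with the per-experiment numbers reported earlier. The arithmetic below is an observation about those numbers, not a derivation stated on the slides: it sums the "# of needed exact rules" columns of both subscription tables and averages the hybrid matcher's maximal F1 scores.

```python
# "# of needed exact rules" per subscription, copied from the two tables.
EXACT_RULES_DBPEDIA = [1, 2, 5, 6, 2, 603, 123_774]
# Subscriptions 4 and 5 have no relevant Freebase events, so no rule count.
EXACT_RULES_FREEBASE = [1, 2, 5, 1_398, 219_600]

total_exact_rules = sum(EXACT_RULES_DBPEDIA) + sum(EXACT_RULES_FREEBASE)

# Maximal F1 of the hybrid matcher in Experiments 1 and 2 (percent).
HYBRID_F1 = [75.45, 76.33]
average_f1 = sum(HYBRID_F1) / len(HYBRID_F1)

print(total_exact_rules, round(average_f1, 2))
```

The sum comes to 345,399 exact rules, in line with the rounded 345,000, and the mean F1 is 75.89%.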

Page 34: Approximate Semantic Matching of Heterogeneous Events


Future Work

Need to enhance subscription set for more representativeness.

Approximate semantic matcher generates “uncertain” results whose impact on further event processing functions such as CEP needs to be studied
