Page 1: Approximate Semantic Matching of Heterogeneous Events

© Copyright 2010 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Approximate Semantic Matching of Heterogeneous Events

Souleiman Hasan, Sean O’Riain, Edward Curry
Digital Enterprise Research Institute (DERI)

National University of Ireland, Galway (NUIG)

http://www.StefanDecker.org/

In Proceedings of DEBS 2012, Berlin, Germany

Page 2: Approximate Semantic Matching of Heterogeneous Events

Further Reading

Hasan S, O’Riain S, Curry E. Approximate Semantic Matching of Heterogeneous Events. In: 6th ACM International Conference on Distributed Event-Based Systems (DEBS 2012). www.edwardcurry.org

Page 3: Approximate Semantic Matching of Heterogeneous Events

Outline

- Introduction
  - Smart Environments
  - Motivational Scenario
  - Related Work
- Proposal
  - Approximate Semantic Matching
- Experiments
  - Wikipedia
  - Freebase
- Conclusions
- Q & A

Page 4: Approximate Semantic Matching of Heterogeneous Events

Smart Environments

- Smart Homes, Grids, Cities…
- Internet-of-Things, Sensor Web…
- Non-technical users
- High heterogeneity
- Trend toward dynamic, data-driven decision making

By 2020, 50 billion devices will be connected to mobile networks (OECD, 2012).

Event/Situation of Interest: New free parking space near me

Event/Situation of Interest: Soccer match played in Berlin

Page 5: Approximate Semantic Matching of Heterogeneous Events

Motivational Scenario- Enterprise

Diagram: CIO, CSO, Helpdesk, Maintenance Personnel, Building, Data Center

Situation of Interest: Company CO2 emissions performance

- Energy usage by global IT department
- PUE of the Data Center in Dublin
- kWhs used by server 172.16.0.8

Various terms are used: energy consumption, energy usage…; room, space, zone…

Dynamic environments: new events from equipment joining and leaving

Page 6: Approximate Semantic Matching of Heterogeneous Events

Requirements

- Handling of semantically heterogeneous events
- Handling of dynamic environments, where event types come and go as sources join and leave
- Low cost of rule management
- Usability
- Precision

Page 7: Approximate Semantic Matching of Heterogeneous Events

Diagram components: User, UI, Translation, Developer, EPL Interface and Parser, Event Processing, Single Event Matcher, Pattern Matcher, Templates Repository, Rules Repository, Execution Repository, CEP Engine, ERP, BMS

Situation of Interest: When a floor is empty and its energy usage for an hour is above the threshold w.r.t. the budget, then it is an excessive usage.

Event processing rule (EPL):

INSERT INTO ExcessiveEnergyUsageByFloor
SELECT a.floor AS floor
FROM PATTERN [a=FloorEmptySensor -> every b=DeviceEnergyUsageSensor(floor = a.floor)].win:time(1 hour)
WHERE b.usage > GetAcceptableThreshold(a.budgetValue)
GROUP BY a.floor

Example events:
- PC NO XDG26359, Floor: 1st, usage: 3 kWh
- VM: vmdgsit01.deri.ie, Floor: 1st, usage: 15 kWh

- Non-technical users with natural language needs
- Separated from the engine
- Rules tied to vocabulary: high cost in case of heterogeneity or change
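The following Python sketch gives one possible reading of the rule's intent, evaluated over an in-memory list of events. It assumes dict-shaped events with `type`, `floor`, `ts` (seconds), `usage` (kWh) and `budgetValue` fields, replaces the user-defined `GetAcceptableThreshold` with a hypothetical `get_acceptable_threshold` that returns 80% of the budget, and simplifies the EPL one-hour data window to "the reading arrives within one hour after the floor was reported empty". It is not how a CEP engine executes the pattern.

```python
from typing import Dict, List

HOUR = 3600  # window length in seconds

def get_acceptable_threshold(budget_value: float) -> float:
    # Hypothetical stand-in for the GetAcceptableThreshold UDF: 80% of the budget.
    return 0.8 * budget_value

def excessive_usage_by_floor(events: List[Dict]) -> List[str]:
    """Floors matching a simplified ExcessiveEnergyUsageByFloor rule."""
    empty_events: List[Dict] = []  # FloorEmptySensor events seen so far ("a")
    flagged = set()                # one output row per floor (GROUP BY a.floor)
    for ev in sorted(events, key=lambda e: e["ts"]):
        if ev["type"] == "FloorEmptySensor":
            empty_events.append(ev)
        elif ev["type"] == "DeviceEnergyUsageSensor":  # "every b"
            for a in empty_events:
                same_floor = a["floor"] == ev["floor"]         # floor = a.floor
                within_hour = 0 <= ev["ts"] - a["ts"] <= HOUR  # simplified 1-hour window
                too_high = ev["usage"] > get_acceptable_threshold(a["budgetValue"])
                if same_floor and within_hour and too_high:
                    flagged.add(a["floor"])
    return sorted(flagged)

events = [
    {"type": "FloorEmptySensor", "floor": "1st", "budgetValue": 10.0, "ts": 0},
    {"type": "DeviceEnergyUsageSensor", "device": "PC NO XDG26359",
     "floor": "1st", "usage": 3.0, "ts": 600},
    {"type": "DeviceEnergyUsageSensor", "device": "VM: vmdgsit01.deri.ie",
     "floor": "1st", "usage": 15.0, "ts": 1200},
]
print(excessive_usage_by_floor(events))  # ['1st']

# Same readings, but a source uses a different (hypothetical) type name.
renamed = [dict(e, type="DeviceEnergyConsumptionSensor")
           if e["type"] == "DeviceEnergyUsageSensor" else e for e in events]
print(excessive_usage_by_floor(renamed))  # []
```

The second call illustrates the vocabulary coupling: renaming the reading type makes the same data produce no match, so the rule would have to be rewritten.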

Page 8: Approximate Semantic Matching of Heterogeneous Events

Exact Event Processing Paradigm

Requirement | How the paradigm addresses it
Semantic heterogeneity | Does not scale out to highly heterogeneous environments
Dynamic environment | Does not scale out to highly dynamic environments
Rule management | High cost under large heterogeneity and dynamicity
Usability | Low
Precision | 100% (typically)

Page 9: Approximate Semantic Matching of Heterogeneous Events

Decoupling in Event Systems

- Space: producers and consumers don’t know each other
- Time: participants don’t need to be actively involved in the interaction at the same time
- Synchronization: event producers and consumers don’t get blocked when sending/receiving events

Diagram: Event Producer and Event Consumer decoupled in space, time, and synchronization

Page 10: Approximate Semantic Matching of Heterogeneous Events

Decoupling in Event Systems

- Principle
  - “Removal of explicit dependencies between participants” (Eugster et al., 2003)
- Outcome
  - Scalability

Diagram: Event Producer and Event Consumer decoupled in space, synchronization, and time

Page 11: Approximate Semantic Matching of Heterogeneous Events

Semantic Coupling

- Current event-based systems keep explicit semantic dependencies between participants
- Limited scalability in highly heterogeneous and dynamic environments

Diagram: Event Producer and Event Consumer decoupled in space, synchronization, and time, but still coupled semantically (event types, properties, values)

Page 12: Approximate Semantic Matching of Heterogeneous Events

Current Approaches

- Ontology-based: (Petrovic et al., 2003), (Zhang & Ye, 2008)…
  - Does not “remove explicit dependency”
  - Hard to achieve ontology agreement a priori at a large scale of heterogeneity and dynamism
  - Medium usability, typically 100% precision
- Fuzzy sets: (Liu & Jacobsen, 2002)
  - Address only numerical event values, as opposed to string values, in subscriptions
  - Medium usability, high precision

Page 13: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach

- Approximate semantic matching of events

Pipeline (diagram): the event's type(s), properties, and values and the subscription's type(s), properties, and values pass through:
1. Types & properties possible mappings
2. Values possible mappings
3. Pick best overall mapping
4. Post-matching event processing

Page 14: Approximate Semantic Matching of Heterogeneous Events

Background

- Semantic similarity: $f: \mathit{Terms} \times \mathit{Terms} \to [0, 1]$
  - For terms term1, term2: $f(\mathit{term}_1, \mathit{term}_2) = 0$ means absolute semantic mismatch; $f(\mathit{term}_1, \mathit{term}_2) = 1$ means exact match
  - E.g. Football Match and Soccer Match are similar
- Relatedness: a more general case of similarity
  - E.g. Football Match and Referee are related but not similar
- Thesaurus-based: e.g. WordNet-based
- Distributional semantics-based: e.g. Wikipedia ESA
  - The more Wikipedia articles two terms both occur in, the more related they are
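The ESA idea can be illustrated on a toy corpus. This Python sketch follows the usual Explicit Semantic Analysis recipe: each article is a concept, words get TF-IDF weights per article, a term is the sum of its words' concept vectors, and relatedness is the cosine of two such vectors. The four "articles" are hypothetical stand-ins for Wikipedia, so only the relative ordering of the scores means anything.

```python
import math
from collections import Counter, defaultdict
from typing import Dict

# Hypothetical toy stand-in for Wikipedia: article title -> article text.
ARTICLES = {
    "Association football": "football soccer match team referee goal stadium",
    "FIFA World Cup": "football soccer world cup final match team",
    "Referee": "referee match official rules football",
    "Parking": "parking space car city street",
}

def build_index(articles: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Inverted index word -> {article: TF-IDF weight}; each article is one concept."""
    n = len(articles)
    tokens = {title: text.lower().split() for title, text in articles.items()}
    df = Counter(w for words in tokens.values() for w in set(words))
    index: Dict[str, Dict[str, float]] = defaultdict(dict)
    for title, words in tokens.items():
        for w, tf in Counter(words).items():
            index[w][title] = tf * math.log(n / df[w])
    return index

INDEX = build_index(ARTICLES)

def concept_vector(term: str) -> Dict[str, float]:
    """Represent a (multi-word) term as the sum of its words' concept vectors."""
    vec: Dict[str, float] = defaultdict(float)
    for w in term.lower().split():
        for title, weight in INDEX.get(w, {}).items():
            vec[title] += weight
    return vec

def esa_relatedness(t1: str, t2: str) -> float:
    """Cosine of the concept vectors; in [0, 1] because all weights are >= 0."""
    v1, v2 = concept_vector(t1), concept_vector(t2)
    dot = sum(weight * v2.get(c, 0.0) for c, weight in v1.items())
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

for other in ["Soccer Match", "Referee", "Parking Space"]:
    print(other, round(esa_relatedness("Football Match", other), 3))
```

In this toy corpus, Football Match scores higher against Soccer Match (similar) than against Referee (related), and 0 against Parking Space, which shares no concepts with it.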

Page 15: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach Instantiation


Event: event type "Football Match" event name "2010 FIFA World Cup Final" event team "Netherlands National Football Team" event team "Spain National Football Team" event location "Johannesburg" event location "FNB stadium" event referee "Howard Webb"

Subscription: event type "Soccer Match" event team "Spain" event place "South Africa"
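The example can be written down as lists of (property, value) statements. This Python sketch assumes that flat representation (the experiments themselves use RDF triples) and shows that a purely exact matcher finds none of the three subscription statements in the event, which is the gap approximate matching targets.

```python
from typing import List, Tuple

Statement = Tuple[str, str]  # (property, value)

# The example event and subscription as flat statement lists.
event: List[Statement] = [
    ("type", "Football Match"),
    ("name", "2010 FIFA World Cup Final"),
    ("team", "Netherlands National Football Team"),
    ("team", "Spain National Football Team"),
    ("location", "Johannesburg"),
    ("location", "FNB stadium"),
    ("referee", "Howard Webb"),
]
subscription: List[Statement] = [
    ("type", "Soccer Match"),
    ("team", "Spain"),
    ("place", "South Africa"),
]

def exact_match(sub: List[Statement], ev: List[Statement]) -> bool:
    """Exact matching: every subscription statement must occur verbatim in the event."""
    return all(s in ev for s in sub)

unmatched = [s for s in subscription if s not in event]
print(exact_match(subscription, event))  # False
print(unmatched)  # all three subscription statements
```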

Page 16: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach Instantiation

Step 1: Types & properties possible mappings

Subscription properties: type, place, team
Event properties: type, name, referee, team, location

Figure: precision vs. recall of WordNet-based similarity measures: Lin, Jiang&Conrath, Leacock&Chodorow, Lesk, Path, Resnik, Gloss Vector
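Jiang & Conrath's measure, one of the WordNet-based measures above, is commonly defined from information content: IC(c) = -log p(c), distance = IC(c1) + IC(c2) - 2·IC(lcs(c1, c2)), and similarity = 1 / distance, where lcs is the lowest common subsumer. The Python sketch below applies that definition to a hypothetical seven-concept taxonomy with made-up frequencies instead of WordNet and real corpus counts, and ranks event properties against the subscription property place.

```python
import math
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (child -> parent); "entity" is the root.
PARENT: Dict[str, Optional[str]] = {
    "entity": None, "location": "entity", "place": "location",
    "person": "entity", "referee": "person", "group": "entity", "team": "group",
}
# Hypothetical corpus frequencies of each concept's own occurrences.
OWN_COUNT = {"entity": 5, "location": 20, "place": 15, "person": 30,
             "referee": 5, "group": 15, "team": 10}

def ancestors(c: str) -> List[str]:
    """The concept itself followed by its ancestors up to the root."""
    chain: List[str] = []
    node: Optional[str] = c
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain

TOTAL = sum(OWN_COUNT.values())
# freq(c) counts occurrences of c and of every concept it subsumes.
FREQ = {c: sum(n for d, n in OWN_COUNT.items() if c in ancestors(d)) for c in PARENT}

def ic(c: str) -> float:
    """Information content: -log p(c)."""
    return -math.log(FREQ[c] / TOTAL)

def lcs(c1: str, c2: str) -> str:
    """Lowest common subsumer in a tree-shaped taxonomy."""
    a2 = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in a2)

def jiang_conrath(c1: str, c2: str) -> float:
    """Similarity 1 / (IC(c1) + IC(c2) - 2 * IC(lcs)); infinite for identical concepts."""
    dist = ic(c1) + ic(c2) - 2 * ic(lcs(c1, c2))
    return math.inf if dist == 0 else 1.0 / dist

event_props = ["location", "team", "referee"]
ranked = sorted(event_props, key=lambda p: jiang_conrath("place", p), reverse=True)
print(ranked)  # ['location', 'team', 'referee']
```

Note that this similarity is not confined to [0, 1]: here place vs. location scores about 1.18.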

Page 17: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach Instantiation

Step 1: Types & properties possible mappings

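Subscription properties: type, place, team
Event properties: type, name, referee, team, location

- Determine the top $m$ correspondence candidates by $\mathrm{RankSim}_{\mathrm{Jiang\&Conrath}}(p_s, p_e)$
- Measure property relatedness: $f_P(p_s, p_e) = \min\left(1,\; m - \mathrm{RankSim}_{\mathrm{Jiang\&Conrath}}(p_s, p_e) + 1\right) \cdot \mathrm{WikipediaESA}(p_s, p_e)$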
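Read literally, the factor $\min(1, m - \mathrm{rank} + 1)$ equals 1 for the top $m$ candidates and 0 at rank $m + 1$ (and goes negative beyond that), so $f_P$ reduces to the ESA score for the top $m$ candidates. This Python sketch implements that literal reading with hypothetical Jiang & Conrath and ESA scores for the subscription property place and $m = 2$; the numbers are illustrative, not from the paper.

```python
from typing import Dict

def ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """1-based rank of each candidate by descending score (RankSim)."""
    ordered = sorted(scores, key=lambda p: scores[p], reverse=True)
    return {p: i for i, p in enumerate(ordered, start=1)}

def f_p(jc: Dict[str, float], esa: Dict[str, float], m: int) -> Dict[str, float]:
    """f_P(p_s, p_e) = min(1, m - RankSim(p_s, p_e) + 1) * ESA(p_s, p_e) for each p_e."""
    rank = ranks(jc)
    return {p: min(1, m - rank[p] + 1) * esa[p] for p in jc}

# Hypothetical scores of event properties against the subscription property "place".
jc_place = {"location": 1.18, "team": 0.24, "referee": 0.20}
esa_place = {"location": 0.60, "team": 0.10, "referee": 0.05}
print(f_p(jc_place, esa_place, m=2))  # {'location': 0.6, 'team': 0.1, 'referee': 0.0}
```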

Page 18: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach Instantiation

Step 1: Types & properties possible mappings

Subscription properties: type, place, team
Event properties: type, name, referee, team, location

Top 1 mapping (90%): type → type, place → location, team → team
Top 2 mapping (40%): type → type, place → name, team → referee

Page 19: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach Instantiation

Step 2: Values possible mappings

Subscription values: Soccer Match, South Africa, Spain
Event values: Football Match, Howard Webb, Spain National Football Team, Johannesburg, FNB stadium, Netherlands National Football Team

- Measure value relatedness: $f_V(v_s, v_e) = \mathrm{WikipediaESA}(v_s, v_e)$

Page 20: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach Instantiation

Step 2: Values possible mappings

Subscription values: Soccer Match, South Africa, Spain
Event values: Football Match, Howard Webb, Spain National Football Team, Johannesburg, FNB stadium, Netherlands National Football Team

Example value relatedness scores:
- Spain and Spain National Football Team: 95%
- Spain and Netherlands National Football Team: 30%

Page 21: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach Instantiation

Step 3: Pick best overall mapping

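Subscription properties: type, place, team; values: Soccer Match, South Africa, Spain
Event properties: type, name, referee, team, location; values: Football Match, Howard Webb, Spain National Football Team, Johannesburg, FNB stadium, Netherlands National Football Team

- Calculate statement relatedness: $f_{STMT}(s, e) = f_P(p_s, p_e) \cdot f_V(v_s, v_e)$, where $s = (p_s, v_s)$ is a subscription statement and $e = (p_e, v_e)$ is an event statement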
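The statement score is the product of the property and value scores. This Python sketch uses the 95% and 30% value scores for Spain given on the previous slide; the $f_P$ numbers and the Howard Webb value score are hypothetical, and pairs missing from the tables are assumed to score 0.

```python
from typing import Dict, List, Tuple

Statement = Tuple[str, str]  # (property, value)

def f_stmt(s: Statement, e: Statement,
           f_p: Dict[Tuple[str, str], float],
           f_v: Dict[Tuple[str, str], float]) -> float:
    """f_STMT(s, e) = f_P(p_s, p_e) * f_V(v_s, v_e); unknown pairs score 0."""
    (ps, vs), (pe, ve) = s, e
    return f_p.get((ps, pe), 0.0) * f_v.get((vs, ve), 0.0)

F_P = {("team", "team"): 1.0, ("team", "referee"): 0.2}  # hypothetical
F_V = {("Spain", "Spain National Football Team"): 0.95,
       ("Spain", "Netherlands National Football Team"): 0.30,
       ("Spain", "Howard Webb"): 0.10}  # last entry hypothetical

sub_stmt: Statement = ("team", "Spain")
event_stmts: List[Statement] = [
    ("team", "Spain National Football Team"),
    ("team", "Netherlands National Football Team"),
    ("referee", "Howard Webb"),
]
scores = {e: f_stmt(sub_stmt, e, F_P, F_V) for e in event_stmts}
for e, score in scores.items():
    print(e, round(score, 3))
```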

Page 22: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach Instantiation

Step 3: Pick best overall mapping

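Subscription properties: type, place, team; values: Soccer Match, South Africa, Spain
Event properties: type, name, referee, team, location; values: Football Match, Howard Webb, Spain National Football Team, Johannesburg, FNB stadium, Netherlands National Football Team

- Determine the corresponding event statement for each subscription statement $s$ as the event statement with maximal $f_{STMT}$: $\mathrm{Corr}_e(s) = \arg\max_{e} f_{STMT}(s, e)$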
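Picking the corresponding event statement is an argmax per subscription statement. This Python sketch takes $f_{STMT}$ as a function argument and uses a hypothetical score table; how the per-statement scores are then combined into a single score per event is not specified on this slide, so the sketch stops at the correspondences.

```python
from typing import Callable, Dict, List, Tuple

Statement = Tuple[str, str]  # (property, value)

def correspondences(sub: List[Statement], ev: List[Statement],
                    f_stmt: Callable[[Statement, Statement], float]
                    ) -> Dict[Statement, Tuple[Statement, float]]:
    """Map each subscription statement s to Corr_e(s) = argmax_e f_STMT(s, e) and its score."""
    result: Dict[Statement, Tuple[Statement, float]] = {}
    for s in sub:
        best = max(ev, key=lambda e: f_stmt(s, e))  # first maximum wins on ties
        result[s] = (best, f_stmt(s, best))
    return result

# Hypothetical f_STMT scores; pairs not listed score 0.
SCORES = {
    (("type", "Soccer Match"), ("type", "Football Match")): 0.90,
    (("team", "Spain"), ("team", "Spain National Football Team")): 0.95,
    (("team", "Spain"), ("team", "Netherlands National Football Team")): 0.30,
    (("place", "South Africa"), ("location", "Johannesburg")): 0.70,
    (("place", "South Africa"), ("location", "FNB stadium")): 0.50,
}

def lookup(s: Statement, e: Statement) -> float:
    return SCORES.get((s, e), 0.0)

subscription = [("type", "Soccer Match"), ("team", "Spain"), ("place", "South Africa")]
event = [("type", "Football Match"), ("team", "Netherlands National Football Team"),
         ("team", "Spain National Football Team"), ("location", "Johannesburg"),
         ("location", "FNB stadium"), ("referee", "Howard Webb")]
corr = correspondences(subscription, event, lookup)
for s, (e, score) in corr.items():
    print(s, "->", e, score)
```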

Page 23: Approximate Semantic Matching of Heterogeneous Events

Proposed Approach Instantiation

Step 4: Post-matching event processing

- Rank within a window
- Complex Event Processing
- …
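The slide does not detail the post-matching step. As one hypothetical illustration of ranking within a window, this Python sketch keeps the most recent N (event, match score) pairs in a count-based window and returns the k best; the window type, N, and k are all assumptions.

```python
from collections import deque
from typing import Deque, List, Tuple

class RankedWindow:
    """Keeps the last `size` (event_id, score) pairs and ranks them by score."""

    def __init__(self, size: int) -> None:
        self.window: Deque[Tuple[str, float]] = deque(maxlen=size)

    def push(self, event_id: str, score: float) -> None:
        # When the window is full, the oldest pair is dropped automatically.
        self.window.append((event_id, score))

    def top(self, k: int) -> List[Tuple[str, float]]:
        """The k highest-scoring matches currently in the window."""
        return sorted(self.window, key=lambda pair: pair[1], reverse=True)[:k]

w = RankedWindow(size=3)
for eid, score in [("e1", 0.40), ("e2", 0.90), ("e3", 0.10), ("e4", 0.70)]:
    w.push(eid, score)
print(w.top(2))  # [('e2', 0.9), ('e4', 0.7)]
```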

Page 24: Approximate Semantic Matching of Heterogeneous Events

Experiments Overview

- Methodology
  - Prepare an event set that reflects the required semantic heterogeneity (Wikipedia events)
  - Prepare a gold standard set of subscriptions that stress multiple aspects of semantic coupling
  - Validate the suitability of semantic approximation from a precision perspective
  - Use a different event set with the same subscriptions to validate low maintainability cost (Freebase events)
- Evaluation criteria
  - Average interpolated precision-recall curve at 11 recall points
  - Maximal F1 score over the average curve
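The evaluation criteria can be computed as follows. This Python sketch uses the standard definition of 11-point interpolated precision (the maximum precision at any rank whose recall is at least the given level), averages the per-subscription curves point-wise, and takes the maximal F1 = 2PR / (P + R) over the averaged points. The two ranked relevance lists are hypothetical, not data from the experiments.

```python
from typing import List, Tuple

LEVELS = [i / 10 for i in range(11)]  # recall levels 0.0, 0.1, ..., 1.0

def interpolated_11pt(relevance: List[bool], total_relevant: int) -> List[float]:
    """Interpolated precision at the 11 recall levels for one ranked result list.

    Assumes total_relevant > 0. The value at level r is the maximum precision at
    any rank whose recall is >= r, or 0.0 if recall r is never reached."""
    points: List[Tuple[float, float]] = []  # (recall, precision) after each rank
    hits = 0
    for rank, rel in enumerate(relevance, start=1):
        hits += int(rel)
        points.append((hits / total_relevant, hits / rank))
    return [max((p for r, p in points if r >= level), default=0.0) for level in LEVELS]

def average_curve(curves: List[List[float]]) -> List[float]:
    """Point-wise mean of several 11-point curves (e.g. one per subscription)."""
    return [sum(c[i] for c in curves) / len(curves) for i in range(len(LEVELS))]

def max_f1(curve: List[float]) -> Tuple[float, float, float]:
    """Maximal F1 over the curve's points; returns (F1, recall, precision)."""
    best = (0.0, 0.0, 0.0)
    for r, p in zip(LEVELS, curve):
        if p + r > 0:
            f1 = 2 * p * r / (p + r)
            if f1 > best[0]:
                best = (f1, r, p)
    return best

# Hypothetical ranked results for two subscriptions (True = relevant event).
curve_a = interpolated_11pt([True, False, True, False, False], total_relevant=2)
curve_b = interpolated_11pt([True, True, False, True], total_relevant=4)
print(max_f1(average_curve([curve_a, curve_b])))
```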

Page 25: Approximate Semantic Matching of Heterogeneous Events

Experiment 1- Wikipedia Events

Event Set Statistics

Source | Structured Wikipedia infoboxes (DBpedia, 31 August 2011)
Collection | Triples directly associated with instances of the dbpedia-owl:Event class
Data model | RDF
Total # of events | 20,156
Total # of distinct event types | 4,950
Total # of distinct event properties | 1,459
Total # of distinct event values | 500,717
Total # of triples | 1,502,599
Average # of distinct types per event | 7.42
Average # of distinct properties per event | 30.52
Average # of distinct values per event | 54.16
Average # of triples per event | 64.67

Page 26: Approximate Semantic Matching of Heterogeneous Events

Experiment 1- Wikipedia Events

- Example event types
  - Football Match
  - Race
  - Music Festival
  - Space Mission
  - Election
  - 10th-Century BC Conflicts
  - Academic Conference
  - Aviation Accident
  - …

Page 27: Approximate Semantic Matching of Heterogeneous Events

Experiment 1- Subscription Set

- Manually created gold standard set of subscriptions

ID | Description | Subscription | # of relevant events | # of needed exact rules | Event type approximation | Event properties approximation | Literals and resources approximation
1 | Football matches played by Spain in the FNB stadium | event type "Football Match" event team "Spain national football team" event stadium "FNB Stadium" | 1 | 1 | NO | NO | NO
2 | Football matches played in the FNB stadium | event type "Football Match" event place "FNB Stadium" | 2 | 2 | NO | YES | NO
3 | Events taking place in Wembley stadium | event type "Event" event place "Wembley Stadium" | 219 | 5 | NO | YES | Syntactic
4 | Charity events taking place in Wembley stadium | event type "Charity" event place "Wembley Stadium" | 29 | 6 | YES | YES | Semantic + Syntactic
5 | Charity Rock events taking place in Wembley stadium | event type "Charity" event type "Rock" event place "Wembley Stadium" | 2 | 2 | YES | YES | Semantic + Syntactic
6 | Football matches played in the UK | event type "Football Match" event stadium "United Kingdom" | 505 | 603 | NO | YES | Background Knowledge
7 | Football matches played by a South American team in Europe | event type "Football Match" event team "South America" event stadium "Europe" | 20 | 123,774 | NO | YES | Background Knowledge

Page 28: Approximate Semantic Matching of Heterogeneous Events

Experiment 1- Subscription Set


Example: subscription 3, Events taking place in Wembley stadium (219 relevant events; 5 needed exact rules; event type approximation: NO; event properties approximation: YES; literals and resources approximation: Syntactic)

Subscription: event type "Event" event place "Wembley Stadium"
SPARQL pattern 1: ?event rdf:type dbpedia-owl:Event. ?event dbpprop:stadium dbpedia:Wembley_Stadium.
SPARQL pattern 2: ?event rdf:type dbpedia-owl:Event. ?event dbpedia-owl:location dbpedia:Wembley_Stadium.
… …
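One way to see how the number of needed exact rules grows is to treat each subscription statement as a list of exact (predicate, object) alternatives and take their Cartesian product. This Python sketch is a hypothetical construction, not the authors' tooling: it reproduces the two SPARQL patterns shown for subscription 3 from the two place predicates given on this slide, and does not fill in the alternatives elided there.

```python
from itertools import product
from typing import List

def exact_patterns(alternatives: List[List[str]]) -> List[str]:
    """One SPARQL basic graph pattern per combination of exact (predicate object) choices."""
    return [" ".join(f"?event {po}." for po in combo) for combo in product(*alternatives)]

subscription_3 = [
    ["rdf:type dbpedia-owl:Event"],                 # event type "Event"
    ["dbpprop:stadium dbpedia:Wembley_Stadium",     # event place "Wembley Stadium"
     "dbpedia-owl:location dbpedia:Wembley_Stadium"],
]
patterns = exact_patterns(subscription_3)
for pattern in patterns:
    print(pattern)
```

Each extra alternative for a statement multiplies the number of patterns, which is why subscriptions involving background knowledge need thousands of exact rules.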

Page 29: Approximate Semantic Matching of Heterogeneous Events

Experiment 1- Results

Figure: precision-recall curves of the Jiang&Conrath and Wikipedia ESA matchers for two subscriptions, "Events taking place in Wembley stadium" and "Football matches played in the UK"

- Need for a hybrid matcher that combines both

Figure: frequency distribution of semantic similarity or relatedness scores (log scale, from 2^-25 to 1) for Jiang&Conrath and Wikipedia ESA

Page 30: Approximate Semantic Matching of Heterogeneous Events

Experiment 1- Results

Figure: precision-recall curves of the Jiang&Conrath, Wikipedia ESA, and Hybrid matchers

Matcher | Maximal F1 score | Recall | Precision
Jiang&Conrath | 70.06% | 80% | 62.31%
Wikipedia ESA | 44.26% | 80% | 30.59%
Hybrid | 75.45% | 90% | 64.94%

- The hybrid matcher outperforms matchers based on a single similarity or relatedness measure.
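The reported maximal F1 scores are consistent with F1 = 2PR / (P + R) applied to the listed precision and recall. This short Python check recomputes them for Experiment 1; the hybrid value comes out at about 75.44% against the reported 75.45%, a difference presumably due to rounding of the listed precision.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported maximal F1) for Experiment 1, in percent.
REPORTED = {
    "Jiang&Conrath": (62.31, 80.0, 70.06),
    "Wikipedia ESA": (30.59, 80.0, 44.26),
    "Hybrid": (64.94, 90.0, 75.45),
}
for name, (p, r, reported) in REPORTED.items():
    print(f"{name}: computed {f1(p, r):.2f}% vs. reported {reported}%")
```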

Page 31: Approximate Semantic Matching of Heterogeneous Events

Experiment 2- Freebase Event Set

Event Set Statistics

Source | Freebase events dump, 1 December 2011 (current triples)
Collection | Triples directly associated with instances of the "fbase:time.event" class
Data model | RDF
Total # of events | 84,529
Total # of distinct event types | 858
Total # of distinct event properties | 1,242
Total # of distinct event values | 1,199,627
Total # of triples | 1,859,338
Average # of distinct types per event | 3.33
Average # of distinct properties per event | 10.67
Average # of distinct values per event | 21.66
Average # of triples per event | 21.99

Page 32: Approximate Semantic Matching of Heterogeneous Events

Experiment 2- Subscription Set

- Same subscriptions as in Experiment 1.

ID | Description | Subscription | # of relevant events | # of needed exact rules | Event type approximation | Event properties approximation | Literals and resources approximation
1 | Football matches played by Spain in the FNB stadium | event type "Football Match" event team "Spain national football team" event stadium "FNB Stadium" | 1 | 1 | YES | YES | NO
2 | Football matches played in the FNB stadium | event type "Football Match" event place "FNB Stadium" | 8 | 2 | YES | YES | NO
3 | Events taking place in Wembley stadium | event type "Event" event place "Wembley Stadium" | 29 | 5 | NO | YES | NO
4 | Charity events taking place in Wembley stadium | event type "Charity" event place "Wembley Stadium" | 0 | - | - | - | -
5 | Charity Rock events taking place in Wembley stadium | event type "Charity" event type "Rock" event place "Wembley Stadium" | 0 | - | - | - | -
6 | Football matches played in the UK | event type "Football Match" event stadium "United Kingdom" | 34 | 1,398 | YES | YES | Background Knowledge
7 | Football matches played by a South American team in Europe | event type "Football Match" event team "South America" event stadium "Europe" | 2 | 219,600 | YES | YES | Background Knowledge

Page 33: Approximate Semantic Matching of Heterogeneous Events

Experiment 2- Results

Matcher | Maximal F1 score | Recall | Precision
Jiang&Conrath | 44.60% | 60% | 35.49%
Wikipedia ESA | 70.73% | 80% | 63.39%
Hybrid | 76.33% | 80% | 72.98%

- The hybrid matcher gives similar results on Freebase as on DBpedia

Figure: precision-recall curves of the Jiang&Conrath, Wikipedia ESA, and Hybrid matchers

Page 34: Approximate Semantic Matching of Heterogeneous Events

Conclusions


Metric | Exact Matcher | Approximate Semantic Matcher
Number of required subscriptions | 345,000 | 7
Maximal F1 score | 100% | 75.89%

- The approximate semantic matcher addresses the maintainability cost of subscriptions/rules in heterogeneous and dynamic environments
- The approximate semantic matcher is suitable when less than 100% precision is acceptable
- A hybrid matcher outperforms a matcher based on a single similarity or relatedness measure.

Page 35: Approximate Semantic Matching of Heterogeneous Events

Future Work

- Enhance the subscription set to make it more representative.
- The approximate semantic matcher generates “uncertain” results whose impact on further event processing functions, such as CEP, needs to be studied.
