© Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
Approximate Semantic Matching of Heterogeneous Events
Souleiman Hasan, Sean O’Riain, Edward Curry Digital Enterprise Research Institute (DERI)
National University of Ireland, Galway (NUIG)
[email protected] http://www.StefanDecker.org/
In proceedings of DEBS 2012, Berlin, Germany
Further Reading
Hasan S, O’Riain S, Curry E. Approximate Semantic Matching of Heterogeneous Events. In: 6th ACM International Conference on Distributed Event-Based Systems (DEBS 2012). www.edwardcurry.org
Outline
- Introduction
  - Smart Environments
  - Motivational Scenario
  - Related Work
- Proposal
  - Approximate Semantic Matching
- Experiments
  - Wikipedia
  - Freebase
- Conclusions
- Q & A
Smart Environments
- Smart Homes, Grids, Cities…
- Internet-of-Things, Sensor Web…
- Non-technical users
- High heterogeneity
- Trend for dynamic data-driven decision making
By 2020, 50 billion devices connected to mobile networks (OECD, 2012)
Event/Situation of Interest: New free parking space near me
Event/Situation of Interest: Soccer match played in Berlin
Motivational Scenario - Enterprise
[Figure labels: CIO, CSO, Helpdesk, Maintenance Personnel, Building, Data Center]
Various terms used: energy consumption, energy usage…; room, space, zone…
Situation of Interest: Company CO2 emissions performance
Dynamic environments: new events from equipment joining and leaving
Examples:
- Energy usage by global IT department
- PUE of the Data Center in Dublin
- kWhs used by server 172.16.0.8
Requirements
- Handling of semantically heterogeneous events
- Handling of dynamic environments, where event types change as sources join and leave
- Low cost of rules management
- Usability
- Precision
[Figure: event processing architecture. Labels: User, UI, Event Processing Developer, Translation, EPL Interface and Parser, Single Event Matcher, Pattern Matcher, Templates Repository, Rules Repository, Execution Repository, CEP Engine; sources: ERP, BMS]
Situation of Interest: When a floor is empty and its energy usage for an hour is above threshold w.r.t. budget, then it is an excessive usage
Event processing rule:
INSERT INTO ExcessiveEnergyUsageByFloor
SELECT a.floor as floor FROM PATTERN
[(a=FloorEmptySensor -> every b=DeviceEnergyUsageSensor (a.floor=b.floor))]
.WIN:TIME(1 hour) GROUP BY a.floor
WHERE (b.usage) > GetAcceptableThreshold(a.budgetValue)
Example events:
PC NO XDG26359 Floor: 1st usage: 3 kWh
VM: vmdgsit01.deri.ie Floor: 1st usage: 15 kWh
Observations:
- Non-technical users with natural language needs
- Separated from the engine
- Rules tied to vocabulary
- High cost in case of heterogeneity or change
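The point that rules are tied to vocabulary can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `exact_match` helper, the rule, and the two event dictionaries are made up for illustration and do not come from any CEP engine or from the slides.

```python
# Hypothetical exact-match rule: every term must appear verbatim in the event.
RULE = {"type": "DeviceEnergyUsageSensor", "floor": "1st"}

def exact_match(rule, event):
    """True only if the event has every rule property with an identical value."""
    return all(event.get(prop) == value for prop, value in rule.items())

# Two made-up events describing the same situation with different vocabulary.
event_a = {"type": "DeviceEnergyUsageSensor", "floor": "1st", "usage": 3}
event_b = {"type": "DevicePowerConsumptionSensor", "storey": "1st", "usage": 15}

matched_a = exact_match(RULE, event_a)  # same terms as the rule
matched_b = exact_match(RULE, event_b)  # different terms for type and floor
```

Under exact matching, the second event is missed unless another rule is written for its vocabulary, which is where the maintenance cost comes from.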
Exact Event Processing Paradigm
Requirement | How the paradigm addresses it
Semantic heterogeneity | Does not scale to highly heterogeneous environments
Dynamic environment | Does not scale to highly dynamic environments
Rule management | High cost under large heterogeneity and dynamicity
Usability | Low
Precision | 100% (typically)
Decoupling in Event Systems
- Space: producers and consumers don’t know each other
- Time: participants don’t need to be actively involved in the interaction at the same time
- Synchronization: event producers and consumers are not blocked while sending or receiving events
[Figure: Event Producer and Event Consumer, decoupled in space, time, and synchronization]
Decoupling in Event Systems
- Principle
  - “Removal of explicit dependencies between participants” (Eugster et al., 2003)
- Outcome
  - Scalability
[Figure: Event Producer and Event Consumer, decoupled in space, time, and synchronization]
Semantic Coupling
- Current event-based systems keep an explicit semantic dependency between participants
- Limited scalability in highly heterogeneous and dynamic environments
[Figure: Event Producer and Event Consumer are decoupled in space, time, and synchronization, but remain coupled semantically (event types, properties, values)]
Current Approaches
- Ontology-based
  - (Petrovic et al., 2003), (Zhang & Ye, 2008)…
  - Does not “remove explicit dependency”
  - Hard to achieve ontology agreement a priori at a large scale of heterogeneity and dynamism
  - Medium usability, typically 100% precision
- Fuzzy sets
  - (Liu & Jacobsen, 2002)
  - Address only numerical event values, as opposed to string values, in subscriptions
  - Medium usability, high precision
Proposed Approach
- Approximate semantic matching of events
[Figure: a Subscription (type(s), properties, values) is matched against an Event (type(s), properties, values) in four steps]
1. Types & properties possible mappings
2. Values possible mappings
3. Pick best overall mapping
4. Post-matching event processing
Background
- Semantic similarity
  - f: Terms × Terms → [0,1]
  - term1, term2 are Terms
  - f(term1, term2) = 0: absolute semantic mismatch
  - f(term1, term2) = 1: exact match
  - E.g. Football Match and Soccer Match are similar
- Relatedness: a general case of similarity
  - E.g. Football Match and Referee are related but not similar
- Thesaurus-based: e.g. WordNet-based
- Distributional semantics-based: e.g. Wikipedia ESA
  - The more Wikipedia articles two terms co-occur in, the more related they are
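The idea behind a distributional relatedness function f: Terms × Terms → [0,1] can be sketched in Python as follows. This is a toy illustration, not Wikipedia ESA itself: the four-entry `ARTICLES` corpus is made up, terms are limited to single words, and raw occurrence counts stand in for the TF-IDF weights that ESA normally uses. Each term is mapped to a vector over articles, and relatedness is the cosine of two such vectors.

```python
import math

# Hypothetical mini corpus standing in for Wikipedia articles (made-up texts).
ARTICLES = {
    "Association football": "football soccer match team referee stadium goal",
    "FIFA World Cup": "football soccer match team final stadium",
    "Referee": "referee match rules whistle",
    "Data center": "server energy power cooling",
}

def concept_vector(term):
    """Count how often a single-word term occurs in each article."""
    term = term.lower()
    return [text.split().count(term) for text in ARTICLES.values()]

def relatedness(term1, term2):
    """Cosine of the two concept vectors; 0.0 if either term is unseen."""
    v1, v2 = concept_vector(term1), concept_vector(term2)
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return min(1.0, dot / norm) if norm else 0.0

same_context = relatedness("football", "soccer")    # occur in the same articles
partly_shared = relatedness("football", "referee")  # share one article
unrelated = relatedness("football", "server")       # share no article
```

In this toy corpus, terms that appear in the same articles score close to 1, terms that share only some articles score in between, and terms with no shared article score 0.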
Proposed Approach Instantiation
Event:
- type: Football Match
- name: 2010 FIFA World Cup Final
- team: Spain National Football Team
- team: Netherlands National Football Team
- location: Johannesburg
- location: FNB stadium
- referee: Howard Webb
Subscription:
- event type "Soccer Match"
- event team "Spain"
- event place "South Africa"
Proposed Approach Instantiation
Subscription properties: type, place, team
Event properties: type, name, referee, team, location
[Figure: precision-recall curves (axes: Recall, Precision) comparing the measures Lin, Jiang&Conrath, Leacock&Chodorow, Lesk, Path, Resnik, and Gloss Vector]
Proposed Approach Instantiation
Subscription properties (ps): type, place, team
Event properties (pe): type, name, referee, team, location
- Determine top m correspondence candidates: RankSimJiang&Conrath(ps, pe)
- Measure properties relatedness: fP = Min(1, m - RankSimJiang&Conrath(ps, pe) + 1) * WikipediaESA(ps, pe)
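The property relatedness step can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: `rank_sim` and `esa` are placeholder callables for the Jiang&Conrath and Wikipedia ESA scores, the score tables are made up, and clamping the gate at 0 for candidates ranked below m is an added assumption, since the formula as written would yield a non-positive factor there.

```python
def property_relatedness(ps, event_props, rank_sim, esa, m):
    """Return {pe: fP} for one subscription property ps."""
    # Rank event properties by similarity to ps, best first (rank starts at 1).
    ranked = sorted(event_props, key=lambda pe: rank_sim(ps, pe), reverse=True)
    scores = {}
    for rank, pe in enumerate(ranked, start=1):
        gate = min(1, m - rank + 1)  # 1 for the top m candidates
        gate = max(0, gate)          # assumption: candidates below rank m score 0
        scores[pe] = gate * esa(ps, pe)
    return scores

# Made-up scores for the subscription property "place".
JC = {"location": 0.9, "name": 0.3, "team": 0.2, "referee": 0.1, "type": 0.05}
ESA = {"location": 0.8, "name": 0.4, "team": 0.6, "referee": 0.2, "type": 0.1}

scores = property_relatedness(
    "place",
    ["type", "name", "referee", "team", "location"],
    rank_sim=lambda ps, pe: JC[pe],
    esa=lambda ps, pe: ESA[pe],
    m=2,
)
```

With m = 2, only the two candidates ranked highest by the similarity measure keep their ESA score; the ranking measure acts as a filter and the relatedness measure supplies the final value.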
Proposed Approach Instantiation
Subscription properties: type, place, team
Event properties: type, name, referee, team, location
Candidate mappings (subscription → event):
- Top 1 (90%): type → type, place → location, team → team
- Top 2 (40%): type → type, place → name, team → referee
Proposed Approach Instantiation
Subscription values (vs): Soccer Match, South Africa, Spain
Event values (ve): Football Match, Howard Webb, Spain National Football Team, Johannesburg, FNB stadium, Netherlands National Football Team
- Measure values relatedness: fV = WikipediaESA(vs, ve)
Proposed Approach Instantiation
Subscription values: Soccer Match, South Africa, Spain
Event values: Football Match, Howard Webb, Spain National Football Team, Johannesburg, FNB stadium, Netherlands National Football Team
Example value relatedness:
- Spain National Football Team and Spain: 95%
- Netherlands National Football Team and Spain: 30%
Proposed Approach Instantiation
Subscription: properties type, place, team; values Soccer Match, South Africa, Spain
Event: properties type, name, referee, team, location; values Football Match, Howard Webb, Spain National Football Team, Johannesburg, FNB stadium, Netherlands National Football Team
- Calculate statements relatedness: fSTMT = fP(ps, pe) * fV(vs, ve)
Proposed Approach Instantiation
Subscription: properties type, place, team; values Soccer Match, South Africa, Spain
Event: properties type, name, referee, team, location; values Football Match, Howard Webb, Spain National Football Team, Johannesburg, FNB stadium, Netherlands National Football Team
- Determine the correspondent event statement Corr_e by Max fSTMT
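Scoring statements and picking the correspondent one can be sketched in Python as follows. The helper names and the score tables are hypothetical: only the 95% and 30% value scores echo the example shown earlier in the slides, and the property scores are made up. Statements are modelled as (property, value) pairs.

```python
def statement_relatedness(sub_stmt, event_stmt, f_p, f_v):
    """fSTMT = fP(ps, pe) * fV(vs, ve) for one subscription/event statement pair."""
    (ps, vs), (pe, ve) = sub_stmt, event_stmt
    return f_p(ps, pe) * f_v(vs, ve)

def correspondent(sub_stmt, event_stmts, f_p, f_v):
    """Return the event statement with the highest fSTMT, together with its score."""
    best = max(event_stmts, key=lambda e: statement_relatedness(sub_stmt, e, f_p, f_v))
    return best, statement_relatedness(sub_stmt, best, f_p, f_v)

# Hypothetical score tables; missing pairs default to 0.0.
FP = {("team", "team"): 1.0, ("team", "referee"): 0.4}
FV = {
    ("Spain", "Spain National Football Team"): 0.95,
    ("Spain", "Netherlands National Football Team"): 0.30,
    ("Spain", "Howard Webb"): 0.10,
}

event_statements = [
    ("team", "Netherlands National Football Team"),
    ("team", "Spain National Football Team"),
    ("referee", "Howard Webb"),
]
best_stmt, best_score = correspondent(
    ("team", "Spain"),
    event_statements,
    f_p=lambda ps, pe: FP.get((ps, pe), 0.0),
    f_v=lambda vs, ve: FV.get((vs, ve), 0.0),
)
```

Because the two factors are multiplied, a statement needs both a plausible property mapping and a related value to be selected.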
Proposed Approach Instantiation
Post-matching event processing
- Rank within a window
- Complex Event Processing
- …
Experiments Overview
- Methodology
  - Prepare an event set that reflects the required semantic heterogeneity (Wikipedia events)
  - Prepare a gold standard set of subscriptions that stresses multiple aspects of semantic coupling
  - Validate the suitability of semantic approximation from a precision perspective
  - Use a different event set with the same subscriptions to validate the low maintenance cost (Freebase events)
- Evaluation Criteria
  - Average interpolated precision-recall curve on 11 recall points
  - Maximal F1 score over the average curve
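The two evaluation criteria can be sketched in Python as below, assuming the standard information retrieval definition in which the interpolated precision at a recall level is the highest precision observed at that recall or above. The function names and the four-item ranked result are made up for illustration and are not taken from the experiments.

```python
RECALL_LEVELS = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0

def interpolated_curve(ranked_relevance, total_relevant):
    """11-point interpolated precision for one subscription's ranked results."""
    points, hits = [], 0
    for k, is_relevant in enumerate(ranked_relevance, start=1):
        hits += bool(is_relevant)
        points.append((hits / total_relevant, hits / k))  # (recall, precision)
    return [
        max((p for r, p in points if r >= level - 1e-12), default=0.0)
        for level in RECALL_LEVELS
    ]

def average_curve(curves):
    """Average several 11-point curves level by level."""
    return [sum(column) / len(curves) for column in zip(*curves)]

def maximal_f1(curve):
    """Highest F1 over the (recall level, precision) points of a curve."""
    return max(
        (2 * p * r / (p + r) for r, p in zip(RECALL_LEVELS, curve) if p + r > 0),
        default=0.0,
    )

# Made-up ranking: relevant, irrelevant, relevant, irrelevant; 2 relevant in total.
curve = interpolated_curve([True, False, True, False], total_relevant=2)
best_f1 = maximal_f1(average_curve([curve, curve]))
```

Averaging the per-subscription curves before taking the maximal F1 gives one summary number per matcher.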
Experiment 1- Wikipedia Events
Event Set Statistics
Statistic | Value
Source | Structured Wikipedia infoboxes, DBpedia, 31 August 2011
Collection | Triples directly associated with instances of the dbpedia-owl:Event class
Data model | RDF
Total # of events | 20,156
Total # of distinct event types | 4,950
Total # of distinct event properties | 1,459
Total # of distinct event values | 500,717
Total # of triples | 1,502,599
Average # of distinct types per event | 7.42
Average # of distinct properties per event | 30.52
Average # of distinct values per event | 54.16
Average # of triples per event | 64.67
Experiment 1- Wikipedia Events
- Example Event Types
  - Football Match
  - Race
  - Music Festival
  - Space Mission
  - Election
  - 10th-Century BC Conflicts
  - Academic Conference
  - Aviation Accident
  - …
Experiment 1- Subscription Set
- Manually created gold standard set of subscriptions
ID | Description | Subscription | # of relevant events | # of needed exact rules | Event type approximation | Event properties approximation | Literals and resources approximation
1 | Football matches played by Spain in the FNB stadium | event type "Football Match"; event team "Spain national football team"; event stadium "FNB Stadium" | 1 | 1 | NO | NO | NO
2 | Football matches played in the FNB stadium | event type "Football Match"; event place "FNB Stadium" | 2 | 2 | NO | YES | NO
3 | Events taking place in Wembley stadium | event type "Event"; event place "Wembley Stadium" | 219 | 5 | NO | YES | Syntactic
4 | Charity events taking place in Wembley stadium | event type "Charity"; event place "Wembley Stadium" | 29 | 6 | YES | YES | Semantic + Syntactic
5 | Charity Rock events taking place in Wembley stadium | event type "Charity"; event type "Rock"; event place "Wembley Stadium" | 2 | 2 | YES | YES | Semantic + Syntactic
6 | Football matches played in the UK | event type "Football Match"; event stadium "United Kingdom" | 505 | 603 | NO | YES | Background Knowledge
7 | Football matches played by a South American team in Europe | event type "Football Match"; event team "South America"; event stadium "Europe" | 20 | 123,774 | NO | YES | Background Knowledge
Experiment 1- Subscription Set
Subscription: event type "Event" event place "Wembley Stadium"
SPARQL pattern 1: ?event rdf:type dbpedia-owl:Event. ?event dbpprop:stadium dbpedia:Wembley_Stadium.
SPARQL pattern 2: ?event rdf:type dbpedia-owl:Event. ?event dbpedia-owl:location dbpedia:Wembley_Stadium.
…
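How one approximate subscription fans out into several exact rules can be sketched in Python as below. The `exact_patterns` helper is hypothetical, and only the two properties visible in the patterns above are used; the remaining patterns are elided in the slide and are not guessed here.

```python
def exact_patterns(event_class, place_properties, resource):
    """Build one exact SPARQL graph pattern per candidate property."""
    return [
        f"?event rdf:type {event_class}. ?event {prop} {resource}."
        for prop in place_properties
    ]

# Only the two properties shown in the slide; the others are elided there.
patterns = exact_patterns(
    "dbpedia-owl:Event",
    ["dbpprop:stadium", "dbpedia-owl:location"],
    "dbpedia:Wembley_Stadium",
)
```

Each additional property that can express "place" adds one more exact rule, and the count multiplies when several statements in a subscription each have alternatives.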
ID | Description | Subscription | # of relevant events | # of needed exact rules | Event type approximation | Event properties approximation | Literals and resources approximation
3 | Events taking place in Wembley stadium | event type "Event"; event place "Wembley Stadium" | 219 | 5 | NO | YES | Syntactic
Experiment 1- Results
[Figure: precision-recall curves (axes: Recall, Precision) for Jiang&Conrath and Wikipedia ESA, in two panels: "Events taking place in Wembley stadium" and "Football matches played in the UK"]
- Need for a hybrid matcher that combines both
[Figure: frequency (0% to 45%) of semantic similarity or relatedness scores (log scale, 0 to 1) for Jiang&Conrath and Wikipedia ESA]
Experiment 1- Results
[Figure: precision-recall curves (axes: Recall, Precision) for Jiang&Conrath, Wikipedia ESA, and Hybrid]
Matcher | Jiang&Conrath | Wikipedia ESA | Hybrid
Maximal F1 Score | 70.06% | 44.26% | 75.45%
Recall | 80% | 80% | 90%
Precision | 62.31% | 30.59% | 64.94%
- The hybrid matcher outperforms a matcher based on a single similarity or relatedness measure.
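As a consistency check on the reported figures, the F1 score is the harmonic mean of precision P and recall R. Plugging in the precision and recall values reported for each matcher reproduces the reported maximal F1 scores to within rounding of the inputs:

```latex
F_1 = \frac{2PR}{P + R}

F_1^{\text{Jiang\&Conrath}} = \frac{2 \times 0.6231 \times 0.80}{0.6231 + 0.80} \approx 0.7006

F_1^{\text{Wikipedia ESA}} = \frac{2 \times 0.3059 \times 0.80}{0.3059 + 0.80} \approx 0.4426

F_1^{\text{Hybrid}} = \frac{2 \times 0.6494 \times 0.90}{0.6494 + 0.90} \approx 0.7544
```

The hybrid value computed this way is 75.44% against the reported 75.45%, a difference that is plausibly due to the precision being rounded to two decimals.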
Experiment 2- Freebase Event Set
Event Set Statistics
Statistic | Value
Source | Freebase events dump 1 December 2011, triples current
Collection | Triples directly associated with instances of the "fbase:time.event" class
Data model | RDF
Total # of events | 84,529
Total # of distinct event types | 858
Total # of distinct event properties | 1,242
Total # of distinct event values | 1,199,627
Total # of triples | 1,859,338
Average # of distinct types per event | 3.33
Average # of distinct properties per event | 10.67
Average # of distinct values per event | 21.66
Average # of triples per event | 21.99
Experiment 2- Subscription Set
- Same as in Experiment 1.
ID | Description | Subscription | # of relevant events | # of needed exact rules | Event type approximation | Event properties approximation | Literals and resources approximation
1 | Football matches played by Spain in the FNB stadium | event type "Football Match"; event team "Spain national football team"; event stadium "FNB Stadium" | 1 | 1 | YES | YES | NO
2 | Football matches played in the FNB stadium | event type "Football Match"; event place "FNB Stadium" | 8 | 2 | YES | YES | NO
3 | Events taking place in Wembley stadium | event type "Event"; event place "Wembley Stadium" | 29 | 5 | NO | YES | NO
4 | Charity events taking place in Wembley stadium | event type "Charity"; event place "Wembley Stadium" | 0 | - | - | - | -
5 | Charity Rock events taking place in Wembley stadium | event type "Charity"; event type "Rock"; event place "Wembley Stadium" | 0 | - | - | - | -
6 | Football matches played in the UK | event type "Football Match"; event stadium "United Kingdom" | 34 | 1,398 | YES | YES | Background Knowledge
7 | Football matches played by a South American team in Europe | event type "Football Match"; event team "South America"; event stadium "Europe" | 2 | 219,600 | YES | YES | Background Knowledge
Experiment 2- Results
Matcher | Jiang&Conrath | Wikipedia ESA | Hybrid
Maximal F1 Score | 44.60% | 70.73% | 76.33%
Recall | 60% | 80% | 80%
Precision | 35.49% | 63.39% | 72.98%
- The hybrid matcher gives similar results on Freebase as on DBpedia
[Figure: precision-recall curves (axes: Recall, Precision) for Jiang&Conrath, Wikipedia ESA, and Hybrid]
Conclusions
Measure | Exact Matcher | Approximate Semantic Matcher
Number of Required Subscriptions | 345,000 | 7
Maximal F1-Score | 100% | 75.89%
- The approximate semantic matcher addresses the maintenance cost of subscriptions/rules in heterogeneous and dynamic environments
- The approximate semantic matcher is suitable when less than 100% precision is acceptable
- A hybrid matcher outperforms a matcher based on a single similarity or relatedness measure.
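The two summary figures appear consistent with the per-experiment numbers reported earlier, which the following Python arithmetic sketches. This is an inference from the slides rather than a stated derivation: it assumes the 345,000 figure is the rounded total of needed exact rules across both experiments, and that 75.89% is the mean of the two hybrid maximal F1 scores.

```python
# Needed exact rules per subscription, copied from the two subscription tables.
exp1_rules = [1, 2, 5, 6, 2, 603, 123_774]  # DBpedia event set, subscriptions 1-7
exp2_rules = [1, 2, 5, 1_398, 219_600]      # Freebase event set; 4 and 5 have no relevant events

total_exact_rules = sum(exp1_rules) + sum(exp2_rules)

# Hybrid matcher maximal F1 scores from the two results tables (in percent).
hybrid_f1 = [75.45, 76.33]
mean_hybrid_f1 = sum(hybrid_f1) / len(hybrid_f1)
```

Under these assumptions the total comes to roughly 345 thousand exact rules, against 7 approximate subscriptions.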
Future Work
- The subscription set needs to be enhanced to make it more representative.
- The approximate semantic matcher generates “uncertain” results, whose impact on further event processing functions such as CEP needs to be studied