p. 1
Lutz Maicher (maicher@informatik.uni-leipzig.de)
Topic Maps Exchange in the Absence of Shared Vocabularies
TMRA'05 International Workshop on Topic Maps Research and Applications, 06.10.2005
Lutz Maicher, University of Leipzig
p. 2
Topic Maps Exchange = Retrieval Task
1. Subject Proxies are created in a remote environment.
2. A requesting peer requests further information about a Subject of interest.
3. The requested peers have to decide whether a Subject Proxy indicating an identical Subject is available.
4. The requested peers send a fragment to the requesting peer.
5. The requesting peer has to merge the requested fragments.
[Figure: one requesting peer and several requested peers]
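The five steps can be sketched as a small retrieval loop. This is a minimal illustration, not TMRAP or any other existing protocol: a Topic is assumed to be a dict of property sets, each requested peer's map is a plain list of such dicts, and the hypothetical `decide` predicate stands in for whatever Subject Equality decision a requested peer applies in step 3.

```python
from typing import Callable, Dict, List, Set

# Hypothetical representation: a Topic maps a property name to a set of values.
Topic = Dict[str, Set[str]]
# Hypothetical Subject Equality decision of a requested peer (step 3).
Decide = Callable[[Topic, Topic], bool]


def answer(remote_map: List[Topic], subject: Topic, decide: Decide) -> List[Topic]:
    """Steps 3 and 4: return every proxy judged to indicate the same Subject."""
    return [t for t in remote_map if decide(t, subject)]


def merge(local: Topic, fragment: List[Topic]) -> Topic:
    """Step 5: fold the returned proxies into a copy of the local proxy (set union per property)."""
    merged = {k: set(v) for k, v in local.items()}
    for t in fragment:
        for k, v in t.items():
            merged.setdefault(k, set()).update(v)
    return merged


def retrieve(subject: Topic, peers: List[List[Topic]], decide: Decide) -> Topic:
    """Step 2: ask every requested peer and merge all answers."""
    result = subject
    for remote_map in peers:
        result = merge(result, answer(remote_map, subject, decide))
    return result


# Usage: a name-overlap test stands in for a real Subject Equality decision.
same_name: Decide = lambda t, s: bool(t.get("names", set()) & s.get("names", set()))
peers = [[{"names": {"Leipzig", "Lipsk"}}, {"names": {"Berlin"}}]]
print(sorted(retrieve({"names": {"Leipzig"}}, peers, same_name)["names"]))  # ['Leipzig', 'Lipsk']
```

The interesting part is `decide`: the rest of this deck is about what that function can be when the peers share no vocabulary.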
p. 3
Enterprise Information Integration
Source: Taylor, John: Thoughts from the Integration Consortium: Enterprise Information Integration: A New Definition, DM Review Online (9/2004).
p. 4
Existing Approaches to Topic Maps Exchange
● TMRAP – Topic Maps Remote Access Protocol
● TMIP – the RESTful Topic Maps Interaction Protocol (formerly: Federated Topic Maps)
● SHARK (alternatively: Knowledge Port Approach)
● TMShare
● all of them are based on the TMDM: if distributed peers do not use a common vocabulary (PSIs), the exchange fails completely
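Why a missing shared vocabulary is fatal follows from the TMDM merging rule: two topic items are merged only if they share an identifier. The sketch below is a simplified version of that rule (the [reified] condition is left out; the slides' "Source Locators" correspond to TMDM item identifiers). The PSI URLs are made-up examples.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class TopicItem:
    item_identifiers: Set[str] = field(default_factory=set)     # "Source Locators" on the slides
    subject_identifiers: Set[str] = field(default_factory=set)  # PSIs
    subject_locators: Set[str] = field(default_factory=set)


def tmdm_equal(a: TopicItem, b: TopicItem) -> bool:
    """Simplified TMDM equality rule: merge only if some identifier is shared."""
    return bool(
        (a.subject_identifiers & b.subject_identifiers)
        or (a.item_identifiers & b.item_identifiers)
        or (a.subject_locators & b.subject_locators)
        or (a.subject_identifiers & b.item_identifiers)
        or (a.item_identifiers & b.subject_identifiers)
    )


# Two peers describe the same city with different (made-up) PSIs: no merge happens.
peer1 = TopicItem(subject_identifiers={"http://psi.example.org/city/leipzig"})
peer2 = TopicItem(subject_identifiers={"http://example.com/places/Leipzig"})
print(tmdm_equal(peer1, peer2))  # False
```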
p. 5
Semantics in Topic Maps
● Topic Maps are a semantic technology ... only from the perspective of information integration
– “Subject Proxies indicating identical Subjects have to be viewed as merged ones”
● A Subject Map Disclosure (SMD) discloses:
– SMD ontology
  ● implies the Subject Indication Approach
– Subject Equality Decision Approach
  ● defines the semantics of the given Subject Proxies with respect to keeping the Co-Location objective true
– Subject Viewing Approach
p. 6
How is Subject Equality detected?
Subject Equality:
$$\mathrm{SubjectEquality}_{SMD_i}\Big(\mathrm{SubjectIndication}_{SMD_1}\big(\mathrm{SubjectIdentity}_{\mathrm{SubjectStage}_1}\big),\; \mathrm{SubjectIndication}_{SMD_2}\big(\mathrm{SubjectIdentity}_{\mathrm{SubjectStage}_2}\big)\Big)$$
– Subject Equality: both Subject Proxies indicate identical Subjects, governed by the Subject Equality Decision Approach of SMD_i
– Subject Identity is indicated, governed by the Subject Indication Approach of SMD_1
Subject Identity under the integration perspective?
$$\mathrm{SubjectIdentity}_{\text{integration perspective}}\big(\mathrm{SubjectStage}_1,\; \mathrm{SubjectStage}_2\big)$$
p. 7
How is Subject Equality really detected?
$$\mathrm{SubjectEquality}_{SMD_i}\big(\mathrm{SubjectIndication}_{SMD_1},\; \mathrm{SubjectIndication}_{SMD_2},\; \mathrm{SubjectMap}_{\mathrm{SubjectProxy}_1},\; \mathrm{SubjectMap}_{\mathrm{SubjectProxy}_2}\big) \rightarrow \mathrm{true} \mid \mathrm{false}$$
– Subject Equality: both Subject Proxies indicate identical Subjects, governed by the Subject Equality Decision Approach of SMD_i
– Subject Indication of SMD_1? Subject Indication of SMD_2?
p. 8
Possible Subject Equality Approaches of an SMD
Referential Subject Equality Approach
[A reference to a discrete ‘object’ indicates the intended Subject.]
- Subject Proxy 1 indicates its Subject by pointing to it with S1
- Subject Proxy 2 indicates its Subject by pointing to it with S2
- Subject Equality holds if S1 = S2
Structuralist Subject Equality Approach
[The Subject depends on other Subject Proxies of the Subject Map.]
- Subject Proxy 1 indicates its Subject through a set of Subject Proxies s1
- Subject Proxy 2 indicates its Subject through a set of Subject Proxies s2
- Subject Equality holds if s1 = s2 (or s1 is similar to s2)
Meaning (semantics) in linguistics
referential semantics: the meaning of a word is defined by the object it refers to.
structuralist semantics: the meaning of a word is defined by its usage in the language.
The different approaches to Subject Equality define the semantics of the vocabulary in use at the time of the Subject Equality Decision.
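The two approaches can be contrasted in a few lines. This is an illustrative sketch only: S1 and S2 are modeled as plain strings (e.g. locators), the neighbourhoods s1 and s2 as sets of comparable labels of neighbouring proxies, and "s1 similar to s2" is read as Jaccard similarity above a chosen threshold, which is an assumption, not a measure given on the slides.

```python
from typing import Set


def referential_equal(s1: str, s2: str) -> bool:
    """Referential approach: equal iff both proxies point to the same discrete object."""
    return s1 == s2


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two neighbourhoods (assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def structuralist_equal(s1: Set[str], s2: Set[str], threshold: float = 1.0) -> bool:
    """Structuralist approach: threshold 1.0 demands s1 == s2; lower values accept similar sets."""
    return jaccard(s1, s2) >= threshold


print(referential_equal("http://example.org/leipzig", "http://example.org/leipzig"))  # True
neighbours1 = {"city", "Saxony", "Germany"}
neighbours2 = {"city", "Saxony", "Europe"}
print(structuralist_equal(neighbours1, neighbours2))       # False: the sets differ
print(structuralist_equal(neighbours1, neighbours2, 0.5))  # True: Jaccard = 2/4
```

The referential test needs a shared reference system; the structuralist test only needs the neighbourhoods to be comparable, which is why it remains usable in the absence of shared PSIs.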
p. 9
Absence of Shared Vocabularies
[Figure: layers labeled Topic Map Processing Application; Subject Map Disclosure ontology; Subject Map ontology; Subject Map vocabulary; Subject Map Disclosure (SMD); Structuralist Subject Equality Decision; Referential Subject Equality Decision]
p. 10
Towards an SMD_SIM
[Figure: layers labeled Topic Map Processing Application; Subject Map Disclosure ontology; Subject Map ontology; Subject Map vocabulary; Subject Map Disclosure (SMD); Referential Subject Equality Decision; Structuralist Subject Equality Decision]
p. 11
Subject Similarity Measure (SIM)
● SIM: similarity of the Subjects of two different Topics
● Procedure: a Subject available in Topic Map TM2 is requested by Topic Map TM1
– extract a Topic Map fragment (F) from TM2 around the Topic representing the Subject
– for each pair (T1, T2), with T1 from TM1 and T2 from F:
  ● determine the simDNAtype of both Topics
  ● calculate the simDNA of the pair
  ● calculate the simDNA a second time, using the similarities detected in the first iteration
  ● simDNA'(T1,T2) = sum of digits(simDNA(T1,T2))
– Subject Equality(T1,T2) holds if simDNA'(T1,T2) is maximal and simDNA(T1,T2) > threshold
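The procedure can be sketched as follows, under stated assumptions: `compare` is a hypothetical callback that returns the simDNA digit string of a pair and may consult the pairs found similar in the previous iteration; 'X' digits count as 0 in the digit sum; a pair counts as "similar" for the second iteration when its digit sum exceeds the threshold; and the threshold is compared with the digit sum simDNA', since simDNA itself is a string.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]
# Hypothetical: compare(t1, t2, similar_pairs) -> simDNA string such as "21113".
Compare = Callable[[str, str, Set[Pair]], str]


def digit_sum(sim_dna: str) -> int:
    """simDNA' = sum of digits; non-digits such as 'X' count as 0 (assumption)."""
    return sum(int(c) for c in sim_dna if c.isdigit())


def sim_match(tm1: List[str], fragment: List[str], requested: str,
              compare: Compare, threshold: int) -> Optional[str]:
    """Return the Topic of TM1 judged to indicate the same Subject as `requested`, if any."""
    pairs = [(t1, t2) for t1 in tm1 for t2 in fragment]
    similar: Set[Pair] = set()
    scores: Dict[Pair, int] = {}
    for _ in range(2):  # the second iteration reuses similarities found in the first
        scores = {p: digit_sum(compare(p[0], p[1], similar)) for p in pairs}
        similar = {p for p, s in scores.items() if s > threshold}
    candidates = {t1: s for (t1, t2), s in scores.items() if t2 == requested}
    if not candidates:
        return None
    best = max(candidates, key=lambda t: candidates[t])  # maximal simDNA'
    return best if candidates[best] > threshold else None


# Usage with a toy comparison: "3" if the first letters agree, else "1".
toy = lambda t1, t2, similar: "3" if t1[0] == t2[0] else "1"
print(sim_match(["person", "source"], ["source"], "source", toy, threshold=2))  # source
```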
p. 12
simDNAtype
TMDM Topic Item
(0..*) Source Locators [Locator Item]
(0..1) Subject Locator [Locator Item]
(0..1) Subject Identifier [Locator Item]
(0..*) Topic Names [Topic Name Item]
  (0..*) Source Locators [Locator Item]
  (0..1) Type [Topic Item]
  (0..*) Scope [Topic Item]
  (1) Value [String]
  (0..*) Variants [Variant Item]
    (0..*) Source Locators [Locator Item]
    (0..*) Scope [Topic Item]
    (0..1) Value [String]
    (0..1) Resource [Locator Item]
(0..*) Occurrences [Occurrence Item]
  (0..*) Source Locators [Locator Item]
  (0..1) Type [Topic Item]
  (0..*) Scope [Topic Item]
  (0..1) Value [String]
  (0..1) Resource [Locator Item]
(0..*) rolesPlayed [Association Role Item]
  (0..1) Type [Topic Item]
  (1) Parent [Association Item]
simDNAType
/x*y*z*w*s*1*2*3*t*n*(o)*[a]*/
x – the current Topic is typing a Topic
y – the current Topic is typing an Association
z – the current Topic is typing a Topic Characteristic
w – the current Topic is typing an Association Role
s – the current Topic is scoping a Topic Characteristic
1 – the current Topic has a Source Locator
2 – the current Topic has a Subject Locator
3 – the current Topic has a Subject Identifier
t – the current Topic is typed
n – the current Topic has a Topic Name
o – the current Topic has an Occurrence
o => /(v|l)t?s*/ (OccDNAtype)
a – the current Topic takes part in an Association
a => /a(tp)*/ (AssDNAtype)
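A simDNAtype string can be assembled from per-Topic feature counts. In this sketch the counts are assumed to be precomputed; reading 'v'/'l' as an inline value vs. a resource locator, and 'tp' as one pair per association role, are interpretations of the legend, not definitions from the slides. `TopicFeatures` and `Occ` are hypothetical names; the feature counts in the usage line were chosen to reproduce the string listed for Beispiel.xtm#M1 in the SIM example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Occ:
    is_locator: bool      # 'l' for a resource locator, 'v' for an inline value (assumed reading)
    typed: bool = False   # 't'
    scopes: int = 0       # one 's' per scoping Topic


@dataclass
class TopicFeatures:
    types_topics: int = 0            # x
    types_associations: int = 0      # y
    types_characteristics: int = 0   # z
    types_roles: int = 0             # w
    scopes_characteristics: int = 0  # s
    source_locators: int = 0         # 1
    subject_locators: int = 0        # 2
    subject_identifiers: int = 0     # 3
    typed: int = 0                   # t
    names: int = 0                   # n
    occurrences: List[Occ] = field(default_factory=list)       # (o)
    associations: List[int] = field(default_factory=list)      # [a]: roles per association


def occ_dna(o: Occ) -> str:
    """OccDNAtype /(v|l)t?s*/, wrapped in parentheses."""
    return "(" + ("l" if o.is_locator else "v") + ("t" if o.typed else "") + "s" * o.scopes + ")"


def sim_dna_type(f: TopicFeatures) -> str:
    """Concatenate symbols in the order /x*y*z*w*s*1*2*3*t*n*(o)*[a]*/."""
    head = "".join(sym * n for sym, n in [
        ("x", f.types_topics), ("y", f.types_associations), ("z", f.types_characteristics),
        ("w", f.types_roles), ("s", f.scopes_characteristics), ("1", f.source_locators),
        ("2", f.subject_locators), ("3", f.subject_identifiers), ("t", f.typed), ("n", f.names),
    ])
    occs = "".join(occ_dna(o) for o in f.occurrences)
    assocs = "".join("[a" + "tp" * roles + "]" for roles in f.associations)  # AssDNAtype /a(tp)*/
    return head + occs + assocs


m1 = TopicFeatures(source_locators=1, subject_identifiers=1, typed=1, names=2,
                   occurrences=[Occ(False, scopes=1), Occ(True, typed=True),
                                Occ(False, typed=True, scopes=1)],
                   associations=[2])
print(sim_dna_type(m1))  # 13tnn(vs)(lt)(vts)[atptp]
```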
p. 13
simDNA – 1st Iteration
Example
simDNAtype(T1) = x13tn
x – the current Topic is typing a Topic
1 – the current Topic has a Source Locator
3 – the current Topic has a Subject Identifier
t – the current Topic is typed
n – the current Topic has a Topic Name
simDNA(T1,T2) = 01XX1
T2 types an Association
T2 has a Source Locator
T2 has no Subject Identifier
T2 is not typed
T2 has a Topic Name, which is not similar
simDNA(T1,T3) = 21113
T3 types a Topic
T3 has a Source Locator
T3 has a Subject Identifier
T3 is typed
T3 has a Topic Name, which is a “bit” similar
p. 14
simDNA – 2nd Iteration
Example
simDNAtype(T1) = x13tn
x – the current Topic is typing a Topic
1 – the current Topic has a Source Locator
3 – the current Topic has a Subject Identifier
t – the current Topic is typed
n – the current Topic has a Topic Name
simDNA(T1,T2) = 01XX1
T2 types an Association
T2 has a Source Locator
T2 has no Subject Identifier
T2 is not typed
T2 has a Topic Name, which is not similar
simDNA(T1,T3) = 21133
T3 types a Topic
T3 has a Source Locator
T3 has a Subject Identifier
T3 is typed, and the typing Topic is similar
T3 has a Topic Name, which is a “bit” similar
p. 15
SIM - Example
simDNAtype(T1)             T1 (TM1)                     simDNAtype(T2)  simDNA  T2 (requested)         Similar
13n                        Beispiel.xtm#TMStandards     z13n            X111    t_source.xtm#t_source  false
xx1n                       Beispiel.xtm#t_person        z13n            01X1    t_source.xtm#t_source  false
z1n                        Beispiel.xtm#t_introduction  z13n            21X1    t_source.xtm#t_source  false
zz1n                       Beispiel.xtm#t_homepage      z13n            21X1    t_source.xtm#t_source  false
s1n                        Beispiel.xtm#t_en            z13n            X1X1    t_source.xtm#t_source  false
s1n                        Beispiel.xtm#t_de            z13n            X1X1    t_source.xtm#t_source  false
x1n                        Beispiel.xtm#t_requirements  z13n            01X1    t_source.xtm#t_source  false
ss1n                       Beispiel.xtm#t_nickname      z13n            X1X1    t_source.xtm#t_source  false
13n                        Beispiel.xtm#t_sort          z13n            X111    t_source.xtm#t_source  false
z1n                        Beispiel.xtm#t_source        z13n            21X3    t_source.xtm#t_source  true
y1nnn                      Beispiel.xtm#at_authorship   z13n            01X1    t_source.xtm#t_source  false
ws1n                       Beispiel.xtm#art_author      z13n            01X1    t_source.xtm#t_source  false
ws1n                       Beispiel.xtm#art_document    z13n            01X1    t_source.xtm#t_source  false
13tnn(vs)(lt)(vts)[atptp]  Beispiel.xtm#M1              z13n            X111    t_source.xtm#t_source  false
13tnn(lt)                  Beispiel.xtm#M2              z13n            X111    t_source.xtm#t_source  false
12tn(lt)[atptp]            Beispiel.xtm#RA1             z13n            X1X1    t_source.xtm#t_source  false
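The selection rule can be checked against the example rows. Counting 'X' as 0 (an assumption), the candidate with the highest simDNA' for t_source.xtm#t_source is Beispiel.xtm#t_source (21X3 gives 6), which is the only row marked Similar. The threshold value used here is hypothetical; the slides do not state one.

```python
from typing import List, Optional, Tuple


def sim_dna_prime(sim_dna: str) -> int:
    """simDNA' = sum of digits; 'X' (feature absent on one side) is assumed to add 0."""
    return sum(int(c) for c in sim_dna if c.isdigit())


# (candidate Topic in Beispiel.xtm, simDNA against t_source.xtm#t_source), copied from the example
ROWS: List[Tuple[str, str]] = [
    ("TMStandards", "X111"), ("t_person", "01X1"), ("t_introduction", "21X1"),
    ("t_homepage", "21X1"), ("t_en", "X1X1"), ("t_de", "X1X1"),
    ("t_requirements", "01X1"), ("t_nickname", "X1X1"), ("t_sort", "X111"),
    ("t_source", "21X3"), ("at_authorship", "01X1"), ("art_author", "01X1"),
    ("art_document", "01X1"), ("M1", "X111"), ("M2", "X111"), ("RA1", "X1X1"),
]


def best_match(rows: List[Tuple[str, str]], threshold: int) -> Optional[str]:
    """Pick the candidate with maximal simDNA'; reject it if simDNA' does not exceed the threshold."""
    topic, dna = max(rows, key=lambda r: sim_dna_prime(r[1]))
    return topic if sim_dna_prime(dna) > threshold else None


print(best_match(ROWS, threshold=5))  # t_source
# The iteration examples: 01XX1 -> 2, 21113 -> 8, and 21133 -> 10 after the 2nd iteration
print([sim_dna_prime(d) for d in ("01XX1", "21113", "21133")])  # [2, 8, 10]
```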
p. 16
SIM - Assessment
● Self-Assessment
– take each Topic from the Topic Map,
– create a (randomly pruned) fragment around the Topic, and
– send it as a request to the Topic Map.
– pruning probabilities:
  ● probType – of the Type of the Topic
  ● probTopNam – of a whole Topic Name
  ● probAss – of an Association in which the Topic plays a role
  ● probOcc – of an Occurrence (and all of its properties)
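The self-assessment loop can be sketched as follows. The dict layout of a Topic, the `match` callback (standing in for the SIM request) and the hit-rate score are assumptions for illustration; the slides only name the four pruning probabilities.

```python
import random
from typing import Callable, Dict, Optional, Tuple

Probs = Tuple[float, float, float, float]  # (probType, probTopNam, probAss, probOcc)


def prune(topic: dict, probs: Probs, rng: random.Random) -> dict:
    """Return a randomly pruned copy of a Topic (hypothetical dict layout)."""
    p_type, p_name, p_ass, p_occ = probs
    return {
        "type": None if rng.random() < p_type else topic.get("type"),
        "names": [n for n in topic.get("names", []) if rng.random() >= p_name],
        "associations": [a for a in topic.get("associations", []) if rng.random() >= p_ass],
        "occurrences": [o for o in topic.get("occurrences", []) if rng.random() >= p_occ],
    }


def self_assess(topic_map: Dict[str, dict],
                match: Callable[[dict, Dict[str, dict]], Optional[str]],
                probs: Probs, seed: int = 0) -> float:
    """Share of Topics found again from their own pruned fragment (assumed score)."""
    rng = random.Random(seed)
    hits = sum(1 for tid, topic in topic_map.items()
               if match(prune(topic, probs, rng), topic_map) == tid)
    return hits / len(topic_map) if topic_map else 0.0


# Usage with a toy matcher that compares the first Topic Name only.
def first_name_match(fragment: dict, topic_map: Dict[str, dict]) -> Optional[str]:
    for tid, topic in topic_map.items():
        if fragment["names"] and (topic.get("names") or [None])[0] == fragment["names"][0]:
            return tid
    return None


tm = {"t_person": {"names": ["Person"]}, "t_source": {"names": ["Source"]}}
print(self_assess(tm, first_name_match, (0.0, 0.0, 0.0, 0.0)))  # 1.0
print(self_assess(tm, first_name_match, (0.0, 1.0, 0.0, 0.0)))  # 0.0
```

Fixing the seed keeps runs reproducible, so different probability settings can be compared on the same random draws.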
p. 17
SIM - (Self-)Assessment
p. 18
Besides the TMDM Subject Equality Approach
[Figure: overview of Subject Equality approaches; labels include Syntax, Data Model (Graph), Referential Subject Equality, Structuralist Subject Equality, semantics as relative value, semantics as absolute value, O(n*n), O(n*log(n)), and Sowa's Knowledge Signature]
● bound to SM ontology
  – simpleSIM: yields very good results in restricted domains; the usage of the Topic is ignored
● bound to TMV vocabulary
● bound to SMD ontology
  – SIM (bound to TMDM): more generic, yields good results; the usage of the Topic is exploited
● bound to TMRM
  – adoption of Melnik's Similarity Flooding approach: not suitable for this usage scenario, but suitable for SM ontology matching
● bound to TMA ontology
  – work to do
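$$\mathrm{SubjectEquality}_{SMD_i}\big(\mathrm{SubjectIndication}_{SMD_1},\; \mathrm{SubjectIndication}_{SMD_2},\; \mathrm{SubjectMap}_{\mathrm{SubjectProxy}_1},\; \mathrm{SubjectMap}_{\mathrm{SubjectProxy}_2}\big) \rightarrow \mathrm{true} \mid \mathrm{false}$$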
How can an SMD_SIM be defined, i.e., how can a deterministic Subject Indication Approach be defined?