10
iRDQL – Imprecise RDQL Queries Using Similarity Joins iRDQL – Imprecise RDQL Queries Using Similarity Joins Abraham Bernstein, Christoph Kiefer Banff, August 29, 2022

IRDQL – Imprecise RDQL Queries Using Similarity Joins Abraham Bernstein, Christoph Kiefer Banff, February 11, 2014

Embed Size (px)

Citation preview

Page 1: IRDQL – Imprecise RDQL Queries Using Similarity Joins Abraham Bernstein, Christoph Kiefer Banff, February 11, 2014

iRD

QL

– Im

pre

cise

RD

QL

Queri

es

Usi

ng S

imila

rity

Join

s

iRDQL – Imprecise RDQL Queries Using Similarity Joins

Abraham Bernstein, Christoph Kiefer

Banff, April 10, 2023

Page 2: IRDQL – Imprecise RDQL Queries Using Similarity Joins Abraham Bernstein, Christoph Kiefer Banff, February 11, 2014

2

iRD

QL

– Im

pre

cise

RD

QL

Queri

es

Usi

ng S

imila

rity

Join

s Surfing the Beach

Questions: Where to rent a surf board? Which beach service offers best prices? Where to find the biggest waves?

Page 3: IRDQL – Imprecise RDQL Queries Using Similarity Joins Abraham Bernstein, Christoph Kiefer Banff, February 11, 2014

3

iRD

QL

– Im

pre

cise

RD

QL

Queri

es

Usi

ng S

imila

rity

Join

s RDQL – RDF Data Query Language

Jena – Implementing the Semantic Web Recommendations [Carroll et al. ‘03]

SELECT ?s1,?p1

WHERE (?s1 presents ?p1)

(?p1 serviceName “beach surfing”)

?s1 ?p1

Beach Surfing Service

Beach Surfing Profile

Page 4: IRDQL – Imprecise RDQL Queries Using Similarity Joins Abraham Bernstein, Christoph Kiefer Banff, February 11, 2014

4

iRD

QL

– Im

pre

cise

RD

QL

Queri

es

Usi

ng S

imila

rity

Join

s iRDLQ – Extending RDQL With Similarity Joins

3 additional language constructs IMPRECISE, SIMMEASURE, OPTIONS

SELECT ?s1,?p1,?p2

WHERE (?s1 presents ?p1)

(?p2 serviceName “beach surfing”)

IMPRECISE ?p1,?p2

SIMMEASURE Levenshtein

OPTIONS IGNORECASE false THRESHOLD 0.7

?s1 ?p1 ?p2 sim

Beach Surfing Service Beach Surfing

ProfileBeach Surfing Profile

1.0

Beach Broker Service

Beach Broker Profile

Beach Surfing Profile

0.85

Abstract Broker Service

Abstract Broker Profile

Beach Surfing Profile

0.7

Similarity Join

Page 5: IRDQL – Imprecise RDQL Queries Using Similarity Joins Abraham Bernstein, Christoph Kiefer Banff, February 11, 2014

5

iRD

QL

– Im

pre

cise

RD

QL

Queri

es

Usi

ng S

imila

rity

Join

s Similarity Measurement – Status Quo

Similarity between feature vectors [Lee et al. 98]

– features of objects like name, degree,...– Cosine of angle

strings or sequences of strings [Levenshtein ‘66]

– textual description of objects– Levenshtein, TFIDF

trees and graphs [Shasha et al. ‘02]

– tree/graph comparison– Isomorphisms, Tree-edit distance

objects [Resnik ‘95]

– amount of information contained in objects

CS Dept US

UnderGrad

Courses

Grad

Courses People

FacultyStaff

Associate Prof

Prof

name: Mike Meyers

granting-institution: NYU

CS Dept Swiss

Courses Staff

Academic

StaffTechnical

Staff

Lecturer Professor

first-name: Abraham

last-name: Bernstein

degree: Prof., Ph.D.

AdministrationStaff

„This is the Department of Informatics at the University of Zurich.“

„Computer Science Department at the University of NY.“

p = 0.0345

Page 6: IRDQL – Imprecise RDQL Queries Using Similarity Joins Abraham Bernstein, Christoph Kiefer Banff, February 11, 2014

6

iRD

QL

– Im

pre

cise

RD

QL

Queri

es

Usi

ng S

imila

rity

Join

s Evaluation Approach

Quantitative evaluation using an OWL-S service retrieval test collection [Klusch ‘05]

OWL-S Service-based Precision, Recall andF-Measure as performance measures

Precision / Recall / F-Measure:

Page 7: IRDQL – Imprecise RDQL Queries Using Similarity Joins Abraham Bernstein, Christoph Kiefer Banff, February 11, 2014

7

iRD

QL

– Im

pre

cise

RD

QL

Queri

es

Usi

ng S

imila

rity

Join

s OWL-S Service Retrieval Test Collection

<?xml version="1.0" encoding="UTF-8"?>

<service:Service rdf:ID="CITY_BROKER_SERVICE">

<service:presents rdf:resource="#CITY_BROKER_PROFILE"/>

<service:describedBy rdf:resource="#CITY_BROKER_PROCESS_MODEL"/>

<service:supports rdf:resource="#CITY_BROKER_GROUNDING"/>

</service:Service>

<profile:Profile rdf:ID="CITY_BROKER_PROFILE">

<service:isPresentedBy rdf:resource="#CITY_BROKER_SERVICE"/>

<profile:serviceName xml:lang="en">

hotel reservation booking service

</profile:serviceName>

<profile:textDescription xml:lang="en">

Provide the best hotel reservation system in a given city.

</profile:textDescription>

<profile:hasInput rdf:resource="#_CITY"/>

<profile:hasOutput rdf:resource="#_BROKER"/>

<profile:has_process rdf:resource="CITY_BROKER_PROCESS" />

</profile:Profile>

city_broker_service.owls

city_broker_service2.owls

city_financial_agent_service.owls

city_financial_agent_service1.owls

urbanarea_financial_agent_service.owls

city_organization_service.owls

...

406 OWL-S services of 6 different domains

9 queries together with its correct answers

Query Relevance Set

Query

http://www.w3.org/Submission/OWL-S/

Page 8: IRDQL – Imprecise RDQL Queries Using Similarity Joins Abraham Bernstein, Christoph Kiefer Banff, February 11, 2014

8

iRD

QL

– Im

pre

cise

RD

QL

Queri

es

Usi

ng S

imila

rity

Join

s iRDQL – Performance (iP vs. iPM) iP: IMPRECISE ?p1, ?p2

iPM: IMPRECISE ?p1, ?p2 & IMPRECISE ?m1, ?m2

Average Precision, Recall, and F-Measure (iP vs. iPM), Levenshtein

avg precision (iPM)

avg precision (iP)

avg recall (iPM)

avg recall (iP)

avg f-measure (iPM)

avg f-measure (iP)

Page 9: IRDQL – Imprecise RDQL Queries Using Similarity Joins Abraham Bernstein, Christoph Kiefer Banff, February 11, 2014

9

iRD

QL

– Im

pre

cise

RD

QL

Queri

es

Usi

ng S

imila

rity

Join

s iRDQL – Performance (iRDQL vs. OWLS-M4)

iRDQL slightly outperformed

by specialized algorithm

OWLS-M4: Matchmaking algorithm of OWLS-MX [Klusch et al. ‘05]

Precision

Recall

F-Measure

Page 10: IRDQL – Imprecise RDQL Queries Using Similarity Joins Abraham Bernstein, Christoph Kiefer Banff, February 11, 2014

10

iRD

QL

– Im

pre

cise

RD

QL

Queri

es

Usi

ng S

imila

rity

Join

s Conclusions and Outlook

Combination of RDQL and similarity measures

Generic IR-based approach only slightly outperformed

Inspired by Cohen’s work [Cohen ‘00]

no flat tables aggregated (ontologized) objects

Performance improvements Switch to SPARQL Thank You...