www.insight-centre.org An Ontology-based Technique for Online Profile Resolution Keith Cortis, Simon Scerri, Ismael Rivera, Siegfried Handschuh International Conference on Social Informatics Kyoto, Japan 27th November 2013


DESCRIPTION

This paper was presented at the 5th International Conference on Social Informatics (http://www.socinfo2013.com/) in Kyoto, Japan on 27 November 2013. The full paper can be found at: http://link.springer.com/chapter/10.1007%2F978-3-319-03260-3_25


Page 2: An Ontology-based Technique for Online Profile Resolution

Introduction (1)

Instance Matching: determining whether two instances (representations) refer to the same real-world entity, e.g., a person

Research Challenge: discovering multiple online profiles that refer to the same person identity across heterogeneous social networks

Page 3: An Ontology-based Technique for Online Profile Resolution

Introduction (2)

Improved profile matching system extended with:
– Named Entity Recognition
– Linked Open Data
– Semantic Matching

Additional benefit: an ontology is used as a background schema

Advantage: a standard schema enables cross-network interoperability

Page 4: An Ontology-based Technique for Online Profile Resolution

Motivation

Contact Matcher applications:
– Control the sharing of personal data
– Detection of fully or partly anonymous contacts
  o > 83 million fake accounts
– Suggestions of new contacts that are of direct interest to the user

Page 5: An Ontology-based Technique for Online Profile Resolution

Profile Resolution Technique

[Figure: pipeline overview]
1. User Profile Data Extraction
2. Semantic Lifting (to NCO)
3. Named Entity Recognition (ANNIE IE System, Large KB Gazetteer; entities such as Name, Surname, City, Country)
4. Hybrid Matching Process
   a) Attribute Value Matching
   b) Semantic-based Matching Extension (e.g., a City related to a Country through the country relation)
   c) Attribute Weighting Function
5. Online Profile Suggestions
6. Online Profile Merging

Page 6: An Ontology-based Technique for Online Profile Resolution

Profile Resolution Technique

[Figure: pipeline overview showing step 1, User Profile Data Extraction, and step 2, Semantic Lifting]

Page 7: An Ontology-based Technique for Online Profile Resolution

Semantic Lifting

Lifting semi-/un-structured profile information from a remote schema
Transforming the information into instances of the Contact Ontology (NCO)
NCO: identity-related online profile information
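
The lifting step can be illustrated with a minimal sketch. It assumes a flat profile record as returned by some social network API and maps it to subject-predicate-object triples. The field names, the `lift_profile` helper and the profile URI are hypothetical, the NCO namespace and property names are assumptions, and attaching locality and country directly to the contact is a simplification.

```python
# Assumed namespace of the Contact Ontology (NCO).
NCO = "http://www.semanticdesktop.org/ontologies/2007/03/22/nco#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping from remote-schema field names to (assumed) NCO properties.
FIELD_TO_NCO = {
    "name": NCO + "fullname",
    "first_name": NCO + "nameGiven",
    "last_name": NCO + "nameFamily",
    "city": NCO + "locality",   # simplification: attached directly to the contact
    "country": NCO + "country",
}


def lift_profile(profile_uri, raw_profile):
    """Turn a flat profile dict into a list of (subject, predicate, object) triples."""
    triples = [(profile_uri, RDF_TYPE, NCO + "PersonContact")]
    for field, value in raw_profile.items():
        prop = FIELD_TO_NCO.get(field)
        if prop and value:  # skip unmapped fields and empty values
            triples.append((profile_uri, prop, str(value).strip()))
    return triples


triples = lift_profile(
    "urn:example:profile1",
    {"first_name": "Joffrey", "last_name": "Baratheon",
     "city": " King's Landing ", "likes": 3},
)
for triple in triples:
    print(triple)
```

Fields without a mapping (here `likes`) are dropped, so profiles from different networks end up described with the same vocabulary.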

Page 8: An Ontology-based Technique for Online Profile Resolution

Profile Resolution Technique

[Figure: pipeline overview showing steps 1 to 4: User Profile Data Extraction, Semantic Lifting (NCO), Named Entity Recognition (ANNIE IE System, Large KB Gazetteer; Name, Surname, City, Country) and the Hybrid Matching Process with sub-step a) Attribute Value Matching]
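
To give a feel for gazetteer-based entity recognition on a free-text location field, here is a toy sketch. It is not the ANNIE IE System or its API: the `GAZETTEER` contents and the `tag_entities` helper are hypothetical, and a real gazetteer built from a large knowledge base would be far bigger.

```python
import re

# Tiny illustrative gazetteer; the entries are assumptions for this example.
GAZETTEER = {
    "City": {"galway", "dublin", "kyoto"},
    "Country": {"ireland", "japan"},
}


def tag_entities(text, gazetteer=GAZETTEER):
    """Split a free-text field on common separators and look each part up."""
    found = []
    for chunk in re.split(r"[,;/]", text):
        candidate = chunk.strip()
        for entity_type, names in gazetteer.items():
            if candidate.lower() in names:
                found.append((entity_type, candidate))
    return found


print(tag_entities("Galway, Ireland"))
```

Splitting an unstructured value into typed sub-entities is what lets the later matching steps compare a city against a city, or a city against a country.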

Page 9: An Ontology-based Technique for Online Profile Resolution

Attribute Value Matching

Direct value comparison
String matching
The best string matching metric is used for each attribute type
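
A minimal sketch of choosing a string metric per attribute type follows. The slides do not say which metrics were selected, so the three metrics below and the `METRIC_FOR_ATTRIBUTE` mapping are assumptions chosen only to illustrate the idea.

```python
from difflib import SequenceMatcher


def exact_match(a, b):
    """1.0 when the values are equal ignoring case and outer whitespace."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def sequence_ratio(a, b):
    """Character-level similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a, b):
    """Overlap of whitespace-separated tokens in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


# Hypothetical assignment of one metric to each attribute type.
METRIC_FOR_ATTRIBUTE = {
    "email": exact_match,
    "name": sequence_ratio,
    "role": token_jaccard,
}


def attribute_similarity(attribute, value1, value2):
    """Compare two values with the metric registered for their attribute type."""
    metric = METRIC_FOR_ATTRIBUTE.get(attribute, sequence_ratio)
    return metric(value1, value2)


print(attribute_similarity("name", "Joffrey Baratheon", "Joff Baratheon"))
print(attribute_similarity("role", "King", "Singer"))
```

Keeping the metric choice in a lookup table makes it easy to swap in whichever metric performs best for an attribute type after experimentation.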

Page 10: An Ontology-based Technique for Online Profile Resolution

Profile Resolution Technique

[Figure: pipeline overview showing steps 1 to 4, with the Hybrid Matching Process sub-steps a) Attribute Value Matching and b) Semantic-based Matching Extension (e.g., a City related to a Country through the country relation)]

Page 11: An Ontology-based Technique for Online Profile Resolution

Semantic-based Matching

Indirect semantic relations at a schema level
Use case: location-related profile attributes
The location sub-entities being semantically compared are city, region and country
The semantic relations between the sub-entities in question are found in a bi-directional manner
E.g., Galway (profile 1) vs. Ireland (profile 2)

[Figure: relations considered in each direction]
Galway → Ireland: country, isPartOf, locatedWithin
Ireland → Galway: capital, largestCity, containsLocation, isLocationOf
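
The bi-directional lookup can be sketched over a small in-memory set of triples. The triples below are a toy stand-in for a Linked Open Data source, and `relations_between` and `semantically_related` are hypothetical helper names. A real system would query a remote knowledge base instead.

```python
# Toy (subject, predicate, object) triples; illustrative data only.
TRIPLES = {
    ("Galway", "country", "Ireland"),
    ("Galway", "isPartOf", "Connacht"),
    ("Connacht", "country", "Ireland"),
    ("Ireland", "capital", "Dublin"),
}


def relations_between(a, b, triples=TRIPLES):
    """Return the predicates linking a to b, and those linking b to a."""
    forward = sorted(p for s, p, o in triples if s == a and o == b)
    backward = sorted(p for s, p, o in triples if s == b and o == a)
    return forward, backward


def semantically_related(a, b, triples=TRIPLES):
    """Two location values are related if a relation exists in either direction."""
    forward, backward = relations_between(a, b, triples)
    return bool(forward or backward)


print(relations_between("Galway", "Ireland"))
print(semantically_related("Ireland", "Galway"))
print(semantically_related("Galway", "Dublin"))
```

Checking both directions means the order in which the two profiles are compared does not change the outcome.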

Page 12: An Ontology-based Technique for Online Profile Resolution

Profile Resolution Technique

[Figure: pipeline overview showing steps 1 to 4, with the Hybrid Matching Process sub-steps a) Attribute Value Matching, b) Semantic-based Matching Extension and c) Attribute Weighting Function]

Page 13: An Ontology-based Technique for Online Profile Resolution

Attribute Weighting Function

Approach 1: Direct Similarity Score

Attribute   Profile 1       Profile 2   Similarity Value
Name        Justin Bieber   J. Bieber   0.90

Approach 2: Normalised Similarity Score, based on a threshold for each attribute type

Attribute threshold for Name: 0.70

Attribute   Profile 1       Profile 2           Metric Similarity Value   Similarity Value
Name        Justin Bieber   J. Bieber           0.90                      1.0
Name        Justin Bieber   Joffrey Baratheon   0.4                       0.0
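
The two scoring approaches can be written as a short sketch. The function names are hypothetical, and treating a metric value exactly equal to the threshold as a match is an assumption, since the slide only shows values clearly above and below 0.70.

```python
def direct_score(metric_similarity):
    """Approach 1: use the string metric value as the attribute score."""
    return metric_similarity


def normalised_score(metric_similarity, threshold):
    """Approach 2: snap the metric value to 1.0 or 0.0 around the threshold.

    Assumption: a value equal to the threshold counts as a match.
    """
    return 1.0 if metric_similarity >= threshold else 0.0


NAME_THRESHOLD = 0.70  # attribute threshold for Name, as given on the slide

print(direct_score(0.90))                      # Justin Bieber vs. J. Bieber
print(normalised_score(0.90, NAME_THRESHOLD))  # Justin Bieber vs. J. Bieber
print(normalised_score(0.4, NAME_THRESHOLD))   # Justin Bieber vs. Joffrey Baratheon
```

The normalised approach discards how close a match was, which is the main behavioural difference between the two.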

Page 14: An Ontology-based Technique for Online Profile Resolution

Profile Resolution Technique

[Figure: pipeline overview showing steps 1 to 5, ending with step 5, Online Profile Suggestions]

Page 15: An Ontology-based Technique for Online Profile Resolution

Online Profile Suggestions

Similarity threshold: 0.90

Attribute       Profile 1           Profile 2
Name            Joffrey Baratheon   Joff Baratheon
City            King’s Landing      King’s Landing
Role            King                King
Date of Birth   286AL               286AL
Similarity Score: 0.95

Attribute       Profile 1           Profile 2
Name            Joffrey Baratheon   Joffrey Bieber
City            King’s Landing      London, Ontario
Role            King                Singer
Date of Birth   286AL               01/03/1994
Similarity Score: 0.30
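
One way to turn per-attribute scores into a suggestion decision is a weighted average compared against the similarity threshold. This is a sketch under assumptions: the slides do not give the aggregation formula or the weights, so the `WEIGHTS` values, the attribute scores and the helper names are all hypothetical and are not meant to reproduce the scores shown on the slide.

```python
# Hypothetical attribute weights; the real weights are not given in the slides.
WEIGHTS = {"name": 4, "city": 2, "role": 1, "date_of_birth": 3}


def profile_similarity(attribute_scores, weights=WEIGHTS):
    """Weighted average of the scores of attributes present in both inputs."""
    common = [a for a in attribute_scores if a in weights]
    total_weight = sum(weights[a] for a in common)
    if total_weight == 0:
        return 0.0
    return sum(weights[a] * attribute_scores[a] for a in common) / total_weight


def suggest(attribute_scores, weights=WEIGHTS, threshold=0.90):
    """Suggest the pair as the same person when the score reaches the threshold."""
    return profile_similarity(attribute_scores, weights) >= threshold


close_pair = {"name": 0.9, "city": 1.0, "role": 1.0, "date_of_birth": 1.0}
distant_pair = {"name": 0.5, "city": 0.0, "role": 0.0, "date_of_birth": 0.0}

print(profile_similarity(close_pair), suggest(close_pair))
print(profile_similarity(distant_pair), suggest(distant_pair))
```

Dividing by the total weight of the attributes actually compared keeps the score in the range 0 to 1 even when two profiles share only a few attributes.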

Page 16: An Ontology-based Technique for Online Profile Resolution

Online Profile Suggestions

Page 17: An Ontology-based Technique for Online Profile Resolution

Profile Resolution Technique

[Figure: complete pipeline overview showing steps 1 to 6, ending with step 6, Online Profile Merging]

Page 18: An Ontology-based Technique for Online Profile Resolution

Experiments & Evaluation

Two-stage evaluation:
1. Technique
   a) The best attribute similarity score approach
   b) Whether NER and the semantic-based matching extension improve the overall technique
   c) The computational performance of the hybrid technique against the syntactic-based one
   d) A similarity threshold that determines profile equivalence within a satisfactory degree of confidence
2. Usability
   e) The level of precision of the profile matching

Page 19: An Ontology-based Technique for Online Profile Resolution

Technique Evaluation

Two datasets:
1. A controlled dataset of public profiles obtained from the Web (LinkedIn and Twitter)
   182 online profiles
   – 112 ambiguous real-world persons (common attributes)
   – 70 refer to 35 well-known sports journalists
   Maximised false positives
2. Private personal and contact-list profiles obtained from 5 consenting participants

Page 20: An Ontology-based Technique for Online Profile Resolution

Technique Evaluation – Experiment 1

Goal: find the profile attribute similarity score approach that fares best
8631 online profile pair comparisons
The Direct Approach outperforms the Normalised Approach

[Figure: two charts, "Direct Approach" and "Normalised Approach", each plotting Precision, Recall and F1-Measure (results, 0 to 1) against threshold values 0.7, 0.75, 0.8, 0.85 and 0.9]

Page 21: An Ontology-based Technique for Online Profile Resolution

Technique Evaluation – Experiment 2

String-based technique vs. String + NER + Semantic-based technique
The new hybrid technique improves the results considerably over the string-only one
The F-measure is more or less stable for thresholds of 0.75 and 0.8

[Figure: two charts, "String Technique" and "Hybrid Technique", each plotting Precision, Recall and F1-Measure (results, 0 to 1) against threshold values 0.7, 0.75 and 0.8]

Page 22: An Ontology-based Technique for Online Profile Resolution

Technique Evaluation – Experiment 3

Computational performance of the hybrid technique vs. the syntactic-only one
For this test we selected profile pairs:
– Having a number of common attributes
– With at least 1 attribute that is a candidate for semantic matching
On average, the hybrid technique takes ≈15 ms more

[Figure: chart of Time (ms, 0 to 40) against Number of Common Attributes (1 to 15) for the Syntactic and Hybrid techniques]

Page 23: An Ontology-based Technique for Online Profile Resolution

Technique Evaluation – Experiment 4

Goal: find a deterministic similarity threshold with the highest degree of confidence
The optimal threshold is 0.9, with an F-measure of 0.693

Threshold    0.8    0.82   0.84   0.86   0.88   0.9    0.92   0.94   0.96
Precision    0.290  0.317  0.550  0.694  0.806  0.876  0.940  0.947  0.988
Recall       0.805  0.784  0.654  0.600  0.584  0.573  0.508  0.486  0.454
F1-Measure   0.426  0.452  0.598  0.643  0.677  0.693  0.660  0.643  0.622

[Figure: chart of the results above (0.0 to 1.0) against the threshold]
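
The threshold selection can be checked with a few lines, using the precision and recall values reported for Experiment 4. The F1-measure is the harmonic mean of precision and recall. The variable and function names are illustrative.

```python
# Precision and recall per similarity threshold, as reported for Experiment 4.
thresholds = [0.80, 0.82, 0.84, 0.86, 0.88, 0.90, 0.92, 0.94, 0.96]
precision = [0.290, 0.317, 0.550, 0.694, 0.806, 0.876, 0.940, 0.947, 0.988]
recall = [0.805, 0.784, 0.654, 0.600, 0.584, 0.573, 0.508, 0.486, 0.454]


def f1(p, r):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 2 * p * r / (p + r) if p + r else 0.0


scores = {t: f1(p, r) for t, p, r in zip(thresholds, precision, recall)}
best_threshold = max(scores, key=scores.get)

print(best_threshold, round(scores[best_threshold], 3))
```

Raising the threshold trades recall for precision, and the F1-measure peaks where that trade-off is balanced best.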

Page 24: An Ontology-based Technique for Online Profile Resolution

Usability Evaluation (1)

Quantitative & Qualitative
Performance of the profile matching technique
Contact matcher run against the two social networks on which the user is most active
Social networks chosen: LinkedIn, Facebook and Twitter
Number of participants: 16
Person suggestion page
Short survey about their user experience

Page 25: An Ontology-based Technique for Online Profile Resolution

Usability Evaluation (2)

Usability evaluation results:
– Distinct profiles: 8,415
– Average profiles per social network per participant: 262
– Comparisons: 1,041,279
– Person matching suggestions: 1,195
– Correct matches: 975
– Incorrect matches: 220
– Precision rate: 0.816
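
The reported precision rate follows directly from the match counts, as this short worked example shows. Precision is the share of suggestions that were correct. The variable names are illustrative.

```python
correct_matches = 975
incorrect_matches = 220

# Every suggestion was judged either correct or incorrect.
suggestions = correct_matches + incorrect_matches
precision_rate = correct_matches / suggestions

print(suggestions, round(precision_rate, 3))
```

Recall cannot be derived from these counts, because the number of true matches that were never suggested is not reported.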

Page 26: An Ontology-based Technique for Online Profile Resolution

Usability Evaluation (3)

Statistics & Results:

Social Network Integration
– 56.25%: LinkedIn and Facebook
– 25%: LinkedIn and Twitter
– 18.75%: Facebook and Twitter

User Satisfaction
– 50%: Extremely
– 43.8%: Quite a bit
– 0%: Moderately
– 6.3%: A little
– 0%: Not at all

Page 27: An Ontology-based Technique for Online Profile Resolution

Usability Evaluation (4)

Application 1: Management & Sharing
Application 2: Enhanced Security
Application 3: Networking & Suggestions

Page 28: An Ontology-based Technique for Online Profile Resolution

Limitations

A person’s gender is not provided by all social network APIs
– Identify gender based on first name or surname through NER
The weights of some profile attributes, e.g., first name and surname, are too high
– In some cases they impact the final result too strongly
– More experiments will be conducted to fine-tune these weights

Page 29: An Ontology-based Technique for Online Profile Resolution

Future Work

Consider the identification of higher degrees of semantic relatedness
Enrich the technique with other LOD cloud datasets
Target additional social networks

Page 30: An Ontology-based Technique for Online Profile Resolution

Conclusion

Profile matching algorithm with:
– Semantic Lifting
– NER on semi-/un-structured profile information
– Linked Open Data to improve the NER process
– Semantic matching at the schema level to find any possible indirect semantic relations
– Weighted Profile Attribute Matching
– Quantitative & Qualitative Evaluation

Thank you for your attention

Page 31: An Ontology-based Technique for Online Profile Resolution

Related Work Comparison

Existing profile matching approaches are based on:
– The user’s friends
– Specific Inverse Functional Properties, e.g., email address
– String matching of all profile attributes
– Semantic relatedness between text, depending on remote Knowledge Bases, e.g., Wikipedia

Evaluation of these approaches:
– Technique evaluation on controlled datasets
– No usability evaluation