This paper was presented at the 5th International Conference on Social Informatics (http://www.socinfo2013.com/) in Kyoto, Japan on 27 November 2013. The full paper can be found at: http://link.springer.com/chapter/10.1007%2F978-3-319-03260-3_25
www.insight-centre.org
An Ontology-based Technique for
Online Profile Resolution
Keith Cortis, Simon Scerri, Ismael Rivera,
Siegfried Handschuh
International Conference on Social Informatics
Kyoto, Japan 27th November 2013
Introduction (1)
Instance matching: deciding whether two instances (representations) refer to the same real-world entity, e.g., a person
Research challenge: discovering multiple online profiles that refer to the same person identity across heterogeneous social networks
Introduction (2)
Improved profile matching system, extended with:
– Named Entity Recognition
– Linked Open Data
– Semantic Matching
Additional benefit: an ontology is used as a background schema
Advantage: a standard schema enables cross-network interoperability
Motivation
Contact matcher applications:
– Control over the sharing of personal data
– Detection of fully or partly anonymous contacts (more than 83 million fake accounts)
– Suggestions of new contacts that are of direct interest to the user
Profile Resolution Technique
[Figure: pipeline overview]
1. User Profile Data Extraction
2. Semantic Lifting (to NCO)
3. Named Entity Recognition (ANNIE IE System, Large KB Gazetteer): Surname, Name, City, Country
4. Hybrid Matching Process
a) Attribute Value Matching
b) Semantic-based Matching Extension (e.g., City -> country -> Country)
c) Attribute Weighting Function
5. Online Profile Suggestions
6. Online Profile Merging
Semantic Lifting
Lift semi-structured and unstructured profile information from a remote schema
Transform that information into instances of the Contact Ontology (NCO)
NCO covers identity-related online profile information
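A minimal sketch of the lifting step, assuming each network returns a flat dictionary of fields. The field names, the `FIELD_MAP` table and the `nco:`-prefixed property names are illustrative assumptions, not the mapping used in the paper; the actual property names should be checked against the NCO specification.

```python
# Hypothetical mapping from remote-schema field names to NCO-style property
# names. Both sides are illustrative assumptions, not the authors' mapping.
FIELD_MAP = {
    "twitter": {"name": "nco:fullname", "location": "nco:locality"},
    "linkedin": {
        "firstName": "nco:nameGiven",
        "lastName": "nco:nameFamily",
        "city": "nco:locality",
        "country": "nco:country",
    },
}


def lift_profile(network: str, raw: dict) -> dict:
    """Return the profile re-keyed to the shared schema.

    Fields without a mapping are dropped, and empty values are skipped.
    """
    mapping = FIELD_MAP[network]
    lifted = {}
    for field, value in raw.items():
        prop = mapping.get(field)
        if prop is not None and value:
            lifted[prop] = value.strip()
    return lifted


twitter_profile = {"name": " Joffrey Baratheon ", "location": "King's Landing", "bio": ""}
lifted = lift_profile("twitter", twitter_profile)
print(lifted)
```

Once every network is re-keyed this way, later matching steps only need to compare values under the same property name.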
Attribute Value Matching
Direct value comparison
String matching
The best-performing string matching metric is used for each attribute type
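The idea of one string metric per attribute type can be sketched as a lookup table of functions. This is an illustration under assumptions: the slides do not name the metrics that were selected, so `difflib.SequenceMatcher` and an exact comparison stand in for them, and the attribute-to-metric assignment is hypothetical.

```python
from difflib import SequenceMatcher


def ratio_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] from difflib's ratio()."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def exact_similarity(a: str, b: str) -> float:
    """1.0 for identical values after trimming and lower-casing, else 0.0."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


# Hypothetical choice of metric per attribute type.
METRIC_FOR_ATTRIBUTE = {
    "name": ratio_similarity,
    "city": ratio_similarity,
    "email": exact_similarity,
}


def match_attribute(attribute: str, value_a: str, value_b: str) -> float:
    """Compare two values with the metric registered for the attribute type."""
    metric = METRIC_FOR_ATTRIBUTE.get(attribute, exact_similarity)
    return metric(value_a, value_b)


name_score = match_attribute("name", "Joffrey Baratheon", "Joff Baratheon")
email_score = match_attribute("email", "joff@example.org", "JOFF@example.org")
print(name_score, email_score)
```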
Semantic-based Matching
Finds indirect semantic relations at the schema level
Use case: location-related profile attributes
Location sub-entities compared semantically: city, region and country
Semantic relations between the sub-entities in question are searched in both directions
E.g., Galway (profile 1) vs. Ireland (profile 2)
[Figure: example relations between a city and a country. City -> country: country, isPartOf, locatedWithin. Country -> city: capital, largestCity, containsLocation, isLocationOf.]
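The bi-directional lookup can be sketched over a small in-memory set of triples. This is a toy stand-in for a Linked Open Data source: the predicate names come from the slide's example, while the triples, `relations_between` and `semantically_related` are assumptions made for illustration.

```python
# Toy knowledge base of (subject, predicate, object) triples. The predicate
# names follow the slide's example; the triples themselves are illustrative.
TRIPLES = {
    ("Galway", "country", "Ireland"),
    ("Galway", "isPartOf", "Ireland"),
    ("Ireland", "containsLocation", "Galway"),
    ("Dublin", "country", "Ireland"),
    ("Ireland", "capital", "Dublin"),
}


def relations_between(a: str, b: str) -> dict:
    """Collect predicates linking a and b, checking both directions."""
    forward = sorted(p for s, p, o in TRIPLES if s == a and o == b)
    backward = sorted(p for s, p, o in TRIPLES if s == b and o == a)
    return {"forward": forward, "backward": backward}


def semantically_related(a: str, b: str) -> bool:
    """True when at least one relation exists in either direction."""
    found = relations_between(a, b)
    return bool(found["forward"] or found["backward"])


galway_ireland = relations_between("Galway", "Ireland")
print(galway_ireland)
```

With this check, a profile listing only "Galway" and another listing only "Ireland" can still contribute evidence of a match, which a plain string comparison would miss.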
Attribute Weighting Function
Approach 1: Direct Similarity Score
– Name: Justin Bieber vs. J. Bieber
– Similarity value: 0.90
Approach 2: Normalised Similarity Score, based on a threshold for each attribute type
– Attribute threshold for Name: 0.70
– Name: Justin Bieber vs. J. Bieber
– Metric similarity value: 0.90 -> similarity value: 1.0
– Name: Justin Bieber vs. Joffrey Baratheon
– Metric similarity value: 0.4 -> similarity value: 0.0
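The two scoring approaches can be written as two small functions. The numbers are the ones from the slide; the function names are hypothetical, and treating a metric value exactly equal to the threshold as a match is an assumption the slide does not settle.

```python
def direct_score(metric_value: float) -> float:
    """Approach 1: use the string metric's value unchanged."""
    return metric_value


def normalised_score(metric_value: float, threshold: float) -> float:
    """Approach 2: map the metric value to 1.0 or 0.0 around a threshold.

    Treating a value equal to the threshold as a match is an assumption.
    """
    return 1.0 if metric_value >= threshold else 0.0


NAME_THRESHOLD = 0.70  # attribute threshold for names, from the slide

close_pair = 0.90  # metric value for "Justin Bieber" vs "J. Bieber"
far_pair = 0.40    # metric value for "Justin Bieber" vs "Joffrey Baratheon"

print(direct_score(close_pair))                      # 0.9
print(normalised_score(close_pair, NAME_THRESHOLD))  # 1.0
print(normalised_score(far_pair, NAME_THRESHOLD))    # 0.0
```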
Online Profile Suggestions
Similarity threshold: 0.90
Example 1 (similarity score: 0.95)
Attribute        Profile 1            Profile 2
Name             Joffrey Baratheon    Joff Baratheon
City             King’s Landing       King’s Landing
Role             King                 King
Date of Birth    286AL                286AL
Example 2 (similarity score: 0.30)
Attribute        Profile 1            Profile 2
Name             Joffrey Baratheon    Joffrey Bieber
City             King’s Landing       London, Ontario
Role             King                 Singer
Date of Birth    286AL                01/03/1994
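How per-attribute scores might be combined into one profile-level score and compared with the 0.90 threshold is sketched below. The weights and the per-attribute scores are hypothetical (the slides do not list them), and a weighted average is only one plausible form of the weighting function, so the sketch does not attempt to reproduce the 0.95 and 0.30 scores.

```python
# Hypothetical attribute weights; the source does not list the real ones.
WEIGHTS = {"name": 0.4, "city": 0.2, "role": 0.2, "date_of_birth": 0.2}
SIMILARITY_THRESHOLD = 0.90  # profile-level threshold from the slide


def profile_similarity(attribute_scores: dict) -> float:
    """Weighted average over the attributes both profiles have in common."""
    shared = [a for a in attribute_scores if a in WEIGHTS]
    total_weight = sum(WEIGHTS[a] for a in shared)
    if total_weight == 0:
        return 0.0
    weighted = sum(WEIGHTS[a] * attribute_scores[a] for a in shared)
    return weighted / total_weight


def suggest(attribute_scores: dict) -> bool:
    """Suggest the pair as the same person when the score meets the threshold."""
    return profile_similarity(attribute_scores) >= SIMILARITY_THRESHOLD


# Illustrative per-attribute scores, not values from the paper.
likely_same = {"name": 0.9, "city": 1.0, "role": 1.0, "date_of_birth": 1.0}
likely_different = {"name": 0.5, "city": 0.0, "role": 0.0, "date_of_birth": 0.0}

print(suggest(likely_same), suggest(likely_different))
```

Dividing by the weight of the shared attributes keeps the score comparable when one profile lacks an attribute, which is a design choice of this sketch rather than something stated in the slides.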
Experiments & Evaluation
Two-stage evaluation:
1. Technique
a) Which attribute similarity score approach performs best
b) Whether NER and the semantic-based matching extension improve the overall technique
c) The computational performance of the hybrid technique against the syntactic-based one
d) A similarity threshold that determines profile equivalence with a satisfactory degree of confidence
2. Usability
e) Level of precision of the profile matching
Technique Evaluation
Two datasets:
1. A controlled dataset of public profiles obtained from the Web (LinkedIn and Twitter)
182 online profiles
– 112 refer to ambiguous real-world persons (common attributes)
– 70 refer to 35 well-known sports journalists
Selected to maximise false positives
2. Private personal and contact-list profiles obtained from 5 consenting participants
Technique Evaluation – Experiment 1
Goal: find the profile attribute similarity score approach that fares best
8631 online profile pair comparisons
Result: the Direct Approach outperforms the Normalised Approach
[Figure: two panels (Direct Approach, Normalised Approach) plotting Precision, Recall and F1-Measure (y-axis: Results, 0 to 1) against Threshold value (0.7, 0.75, 0.8, 0.85, 0.9).]
Technique Evaluation – Experiment 2
String-based technique vs. String + NER + Semantic-based technique
The new hybrid technique improves the results considerably over the string-only one
F-measure remains more or less stable for thresholds of 0.75 and 0.8
[Figure: two panels (String Technique, Hybrid Technique) plotting Precision, Recall and F1-Measure (y-axis: Results, 0 to 1) against Threshold value (0.7, 0.75, 0.8).]
Technique Evaluation – Experiment 3
Computational performance of the hybrid technique vs. the syntactic-only one
For this test we selected profile pairs:
– Having a number of common attributes
– With at least 1 attribute that is a candidate for semantic matching
On average, the hybrid technique takes ≈15 ms more
[Figure: Time (ms, 0 to 40) against Number of Common Attributes (1 to 15) for the Syntactic and Hybrid techniques.]
Technique Evaluation – Experiment 4
Goal: find a deterministic similarity threshold with the highest degree of confidence
Optimal threshold is 0.9 -> F-measure of 0.693
Threshold     0.8    0.82   0.84   0.86   0.88   0.9    0.92   0.94   0.96
Precision     0.290  0.317  0.550  0.694  0.806  0.876  0.940  0.947  0.988
Recall        0.805  0.784  0.654  0.600  0.584  0.573  0.508  0.486  0.454
F1-Measure    0.426  0.452  0.598  0.643  0.677  0.693  0.660  0.643  0.622
[Figure: the tabulated Precision, Recall and F1-Measure values plotted per threshold (y-axis: Results, 0.0 to 1.0).]
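As a cross-check on the reported optimum, F1 can be recomputed from the precision and recall values listed for Experiment 4. The variable and function names below are hypothetical; because the inputs are rounded to three decimals, the recomputed F1 values can differ from the reported ones in the last digit.

```python
# Precision and recall per threshold, copied from the Experiment 4 results.
THRESHOLDS = [0.80, 0.82, 0.84, 0.86, 0.88, 0.90, 0.92, 0.94, 0.96]
PRECISION = [0.290, 0.317, 0.550, 0.694, 0.806, 0.876, 0.940, 0.947, 0.988]
RECALL = [0.805, 0.784, 0.654, 0.600, 0.584, 0.573, 0.508, 0.486, 0.454]


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1_by_threshold = {t: f1(p, r) for t, p, r in zip(THRESHOLDS, PRECISION, RECALL)}

# Pick the threshold whose F1 is highest.
best_threshold = max(f1_by_threshold, key=f1_by_threshold.get)
best_f1 = f1_by_threshold[best_threshold]
print(best_threshold, round(best_f1, 3))
```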
Usability Evaluation (1)
Quantitative & qualitative
Performance of the profile matching technique
Contact matcher run against the two social networks on which the user is most active
Social networks chosen: LinkedIn, Facebook and Twitter
Number of participants: 16
Person suggestion page
Short survey about the user experience
Usability Evaluation (2)
Usability evaluation results:
Distinct profiles: 8,415
Average profiles per social network per participant: 262
Comparisons: 1,041,279
Person matching suggestions: 1,195
Correct matches: 975
Incorrect matches: 220
Precision rate: 0.816
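The reported precision rate follows directly from the match counts, as this short worked check shows (variable names are illustrative):

```python
correct_matches = 975
incorrect_matches = 220
suggestions = correct_matches + incorrect_matches  # 1,195 suggestions in total

# Precision: share of suggested matches that were correct.
precision = correct_matches / suggestions
print(round(precision, 3))  # prints 0.816
```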
Usability Evaluation (3)
Statistics & results:
Social network integration
– 56.25%: LinkedIn and Facebook
– 25%: LinkedIn and Twitter
– 18.75%: Facebook and Twitter
User satisfaction
– 50%: Extremely
– 43.8%: Quite a bit
– 0%: Moderately
– 6.3%: A little
– 0%: Not at all
Usability Evaluation (4)
Application 1: Management & Sharing
Application 2: Enhanced Security
Application 3: Networking & Suggestions
Limitations
A person’s gender is not provided by all social network APIs
– Identify gender from the first name or surname through NER
The weights of some profile attributes, e.g., first name and surname, are too high
– In some cases they impact the final result too strongly
– More experiments will be conducted to fine-tune these weights
Future Work
Consider identifying higher degrees of semantic relatedness
Enrich the technique with other LOD cloud datasets
Target additional social networks
Conclusion
Profile matching algorithm with:
– Semantic lifting
– NER on semi-structured and unstructured profile information
– Linked Open Data to improve the NER process
– Semantic matching at the schema level to find any possible indirect semantic relations
– Weighted profile attribute matching
Quantitative & qualitative evaluation
Thank you for your attention
Related Work Comparison
Existing profile matching approaches are based on:
– The user’s friends
– Specific inverse functional properties, e.g., email address
– String matching of all profile attributes
– Semantic relatedness between text, depending on remote knowledge bases, e.g., Wikipedia
Evaluation of these approaches:
– Technique evaluation on controlled datasets
– No usability evaluation