8/18/2019 Feature-based approaches to semantic similarity assessment of concepts using Wikipedia
Feature-based approaches to semantic similarity assessment of concepts using Wikipedia
- Yuncheng Jiang, Xiaopei Zhang, Yong Tang, Ruihua Nie
Presented By: Kushagra Sharma (286/CO/12), Lakshay Bansal (287/CO/12)
Abstract
In the past, several approaches have been proposed that assess similarity by evaluating the knowledge modeled in one or more ontologies.
However, the existing measures have some limitations, such as relying on predefined ontologies and fitting only non-dynamic domains.
In this paper, some novel feature based similarity assessment methods are proposed that are fully dependent on Wikipedia and can avoid most of the limitations and drawbacks introduced in the previous methods.
Introduction
Definition: Semantic similarity is understood as the degree of taxonomic proximity between concepts (or terms, words).
In other words, semantic similarity states how taxonomically near two concepts (or terms, words) are, because they share some aspects of their meaning.
Technically, similarity measures assess a numerical score that quantifies this proximity as a function of the semantic evidence observed in one or several knowledge sources.
Ontology based methods to estimate similarity
Edge Counting Measures
Information Content Measures
Feature Based Measures
Hybrid Measures
Edge counting measures
- Consists of taking into account the length of the path linking the concepts (or terms) and the position of the concepts (or terms) in a given dictionary (or taxonomy, ontology).
The main advantage of edge counting measures is their simplicity.
They rely only on the graph model of an input ontology, whose evaluation requires a low computational cost.
Because of this simplicity, these approaches offer limited accuracy: ontologies model a large amount of taxonomical knowledge that is not considered during the evaluation of the minimum path.
From another perspective, the main assumption of edge counting measures is that an edge represents the same semantic distance anywhere in the structure of the graph (or path), which is not true, as some sections of the graph may be finely classified and others only coarsely defined.
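As a minimal sketch of the edge counting idea, the following computes the shortest path length between two concepts with a breadth-first search. The toy taxonomy and the names `IS_A`, `neighbours`, and `path_length` are illustrative assumptions, not taken from the paper.

```python
from collections import deque

# Hypothetical toy taxonomy (child -> parent); not from the paper.
IS_A = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
}

def neighbours(node):
    """Return the nodes one is-a edge away from node, in either direction."""
    linked = {child for child, parent in IS_A.items() if parent == node}
    if node in IS_A:
        linked.add(IS_A[node])
    return linked

def path_length(a, b):
    """Number of edges on the shortest path between a and b, or None if unconnected."""
    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

Every edge counts as one unit of distance here, which is exactly the uniform-distance assumption questioned above.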
DISADVANTAGES of ONTOLOGY BASED METHODS
Clearly, the construction process of domain ontologies is time-consuming and error-prone, and maintaining these ontologies also requires a lot of effort from experts. Thus, the methods of ontology based similarity measures are limited in scope and scalability.
With the emergence of social networks or instant messaging systems, a lot of (sets of) concepts or terms (proper nouns, brands, acronyms, new words, conversational words, technical terms and so on) are not included in WordNet and domain ontologies (in fact, Web users can at present publish whatever they want to share with the rest of the world by using Wikis, Blogs and online communities). Therefore, similarity measures that are based on these kinds of knowledge resources cannot be used in these tasks.
These limitations are the motivation behind the new techniques presented in this paper, which infer semantic similarity from a new kind of source of information, i.e., a wide coverage online encyclopedia, namely Wikipedia.
Feature based similarity
Feature based approaches to similarity measures assess similarity between concepts as a function of their properties.
Common features tend to increase similarity and non-common ones tend to diminish it.
Admitting a function Ψ(c) that yields the set of features relevant to c, Tversky proposed the following similarity function:
sim(a, b) = α·F(Ψ(a) ∩ Ψ(b)) − β·F(Ψ(a) \ Ψ(b)) − γ·F(Ψ(b) \ Ψ(a))
where F is some function that reflects the salience of a set of features, Ψ(a) ∩ Ψ(b) is the intersection between those two sets of features, Ψ(a) \ Ψ(b) is the set obtained when eliminating the elements of Ψ(b) from the set of features of concept a, Ψ(a), and α, β and γ are parameters that provide for differences in focus on the different components.
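The contrast model described above can be sketched in a few lines. This is only an illustration under stated assumptions: the salience function F is taken to be plain set cardinality, the default parameter values are arbitrary, and the function name `tversky_contrast` is hypothetical.

```python
def tversky_contrast(features_a, features_b,
                     alpha=1.0, beta=0.5, gamma=0.5, salience=len):
    """Contrast-model score: shared features add, distinctive features subtract.

    Assumptions: salience defaults to set cardinality, and the default
    weights are illustrative only.
    """
    common = features_a & features_b   # features shared by a and b
    only_a = features_a - features_b   # features of a that b lacks
    only_b = features_b - features_a   # features of b that a lacks
    return (alpha * salience(common)
            - beta * salience(only_a)
            - gamma * salience(only_b))
```

For example, with `{"wings", "feathers", "flies"}` and `{"wings", "feathers", "swims"}` the two shared features contribute 2.0 and each distinctive feature subtracts 0.5, giving a score of 1.0.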
Rodriguez and Egenhofer (R&E) present another approach to computing semantic similarity. The similarity is computed as the weighted sum of similarities between synsets, features (e.g., meronyms, attributes, etc.) and neighbor concepts (those linked via semantic pointers) of the evaluated terms:
Sim_re(a, b) = w·S_synsets(a, b) + u·S_features(a, b) + v·S_neighborhoods(a, b)
where the functions S_synsets, S_features, and S_neighborhoods are the similarity between synonym sets, features, and semantic neighborhoods of the evaluated terms, and w, u, and v (w, u, v ≥ 0) weight the respective components.
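A minimal sketch of the weighted sum, assuming the three component similarities have already been computed and lie in [0, 1]. The function name `sim_re` and the equal default weights are illustrative assumptions, not values from the paper.

```python
def sim_re(s_synsets, s_features, s_neighborhoods, w=1/3, u=1/3, v=1/3):
    """Weighted sum of the three component similarities.

    Assumption: the default weights are equal; the slides do not state
    which values were used.
    """
    if min(w, u, v) < 0:
        raise ValueError("weights must be non-negative")
    return w * s_synsets + u * s_features + v * s_neighborhoods
```

Changing the weights shifts the emphasis between synonym overlap, feature overlap, and neighborhood overlap without changing the component measures themselves.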
S represents the overlapping between the different features, computed as follows:
A feature based function called X-similarity relies on the matching between synsets and term description sets. The term description sets contain words extracted by parsing term definitions. Two terms are similar if their synsets or description sets or the synsets of the terms in their neighborhood (e.g., more specific and more general terms) are lexically similar. The similarity function is expressed as follows:
The similarity for the semantic neighbors S_neighborhoods is calculated as follows:
where i denotes relationship type.
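The X-similarity formulas themselves are not reproduced in this text. The sketch below assumes the form usually given for X-similarity: set overlap measured as |A ∩ B| / |A ∪ B|, the neighborhood score taken as the maximum overlap over relationship types i, and a final score of 1 when the synsets overlap at all. All function names and the input layout are hypothetical.

```python
def jaccard(a, b):
    """Overlap |A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def s_neighborhoods(neigh_a, neigh_b):
    """Best overlap over relationship types i present for both terms.

    neigh_a and neigh_b map a relationship type (e.g. "hypernym") to the
    set of neighbor terms reached through that relationship.
    """
    shared_types = neigh_a.keys() & neigh_b.keys()
    return max((jaccard(neigh_a[i], neigh_b[i]) for i in shared_types),
               default=0.0)

def x_similarity(syn_a, syn_b, desc_a, desc_b, neigh_a, neigh_b):
    """Assumed X-similarity: 1.0 if the synsets overlap, otherwise the
    larger of the neighborhood and description overlaps."""
    if jaccard(syn_a, syn_b) > 0:
        return 1.0
    return max(s_neighborhoods(neigh_a, neigh_b), jaccard(desc_a, desc_b))
```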
Wikipedia's hyperlinks are also useful as an additional source of synonyms not captured by redirects. Hyperlinks also complement disambiguation pages by encoding polysemy. In particular, articles mentioning other encyclopedic entries point to them through internal hyperlinks. This models article cross-reference.
Category structure: Since May 2…
Feature-based similarity using Wikipedia
Formal representation of Wikipedia concepts
Let A be a Wikipedia article and Con be the title of A. The formal representation of Wikipedia concept Con is defined as follows:
Con = <Synonyms, Glosses, Anchors, Categories>
where Synonyms = {Con, Con1, …, Conm} is the set of synonyms of Con, Glosses is the first paragraph of text of A, Anchors = {Anc1, …, Ancn} is the set of anchor texts (i.e., labels of internal hyperlinks) in A, and Categories = {Cat1, …, Catk} is the set of categories of A.
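The four-part representation maps naturally onto a small record type. The sketch below is one possible encoding; the class name `WikiConcept`, the helper `make_concept`, and the choice of redirect titles as the source of synonyms are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WikiConcept:
    """Con = <Synonyms, Glosses, Anchors, Categories> for one article."""
    synonyms: frozenset    # the title itself plus its synonyms
    glosses: str           # first paragraph of the article text
    anchors: frozenset     # labels of internal hyperlinks in the article
    categories: frozenset  # categories the article belongs to

def make_concept(title, redirects, first_paragraph, anchor_texts, categories):
    """Build a concept record from already-extracted article parts.

    Assumption: redirect titles are used as the synonyms of the title.
    """
    return WikiConcept(
        synonyms=frozenset({title, *redirects}),
        glosses=first_paragraph,
        anchors=frozenset(anchor_texts),
        categories=frozenset(categories),
    )
```

Using frozen sets keeps each component directly usable in the set-overlap computations that feature based measures rely on.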
A framework for feature-based similarity
Let Con1 = <Synonyms1, Glosses1, Anchors1, Categories1> and Con2 = <Synonyms2, Glosses2, Anchors2, Categories2> be two Wikipedia concepts. The similarity of Con1 and Con2, denoted as SimCon(Con1, Con2), is the function
SimCon: WikiCon × WikiCon → …
MATHEMATICAL MODELLING OF FEATURE BASED ASSESSMENT
We can obtain different feature based approaches to similarity assessment resulting from instantiations of the framework.
Without loss of generality, we assume that there are two sets of terms (or words, concepts) Set1 and Set2. Obviously, these two sets may be Synonyms, Setglosses, Anchors, or Categories.
According to the X-similarity approach or the R&E approach (Rodriguez & Egenhofer), we have the following similarity computation methods for Set1 and Set2:
Given two Wikipedia concepts
Con1 = <Synonyms1, Setglosses1, Anchors1, Categories1> and
Con2 = <Synonyms2, Setglosses2, Anchors2, Categories2>
According to the notions of S_Xsim and S_R&E, we have the following approaches to similarity measures for Wikipedia concepts (suppose that the function Sconcepts is the average or max):
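The individual measures are not reproduced in this text, so the following is only a sketch of how such an instantiation could look: a set similarity is computed per component and the four scores are combined by average or max. The Jaccard overlap used here is a stand-in for S_Xsim or S_R&E, and the names `sim_con`, `COMPONENTS`, and the dictionary layout are hypothetical.

```python
from statistics import mean

# Component names follow the four-part concept representation.
COMPONENTS = ("synonyms", "setglosses", "anchors", "categories")

def jaccard(a, b):
    """Stand-in set similarity; the paper's S_Xsim or S_R&E would go here."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def sim_con(con1, con2, set_sim=jaccard, aggregate=mean):
    """Combine the per-component set similarities with aggregate.

    con1 and con2 map each component name to a set of terms; pass
    aggregate=max to use the maximum instead of the average.
    """
    scores = [set_sim(con1[name], con2[name]) for name in COMPONENTS]
    return aggregate(scores)
```

Swapping `set_sim` or `aggregate` yields a different member of the same family of measures, which mirrors the idea of instantiating one framework in several ways.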
Comparison of various Approaches to Human based Judgements
Results on correlation with human judgements of similarity measures.
(From left to right: measure approach, correlation for M&C benchmark, correlation for R&G benchmark, and correlation for 353-TC benchmark)
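A correlation of this kind can be computed as sketched below, assuming Pearson's coefficient between the scores produced by a measure and the human ratings for the same word pairs. The function name `pearson` is illustrative, and no benchmark data from the paper is reproduced.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

A value close to 1 means the measure ranks and spaces word pairs much as the human judges did.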
Benchmark and Experimental results (based on students' and professors' judgements)
Results on correlation with our benchmark of similarity measures.
Analysis of Experimental Results
The approaches SimFirCon, SimSecCon, SimThiCon, and SimFouCon perform relatively well, with the lowest correlation being…
Conclusion
The final goal of computerized similarity measures is to accurately mimic human judgements about semantic similarity.
In this paper, some limitations of the existing feature based measures are identified, such as relying on one or more predefined domain ontologies and fitting static domains (i.e., non-dynamic domains).
To implement feature based semantic similarity measurement using Wikipedia, a formal representation of Wikipedia concepts is presented. Then, a framework for feature based similarity based on the formal representation of Wikipedia concepts is given.
The evaluation, based on several widely used benchmarks and a benchmark developed in this paper, sustains the intuitions with respect to human judgements.
Overall, several methods presented here have good human correlation and constitute some effective ways of determining similarity between Wikipedia concepts. In addition, considering the limitations (e.g., small size) of the existing standard benchmarks for concept similarity assessment, we will pursue the design of a new benchmark specially focused on Wikipedia concepts.