MEASURING THE SIMILARITY BETWEEN IMPLICIT SEMANTIC RELATIONS USING WEB SEARCH ENGINES

Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka (WSDM’09)

Speaker: Yi-Ling Tai
Date: 2009/11/23

OUTLINE
Introduction
Method: Retrieving Contexts, Extracting Lexical Patterns, Identifying Semantic Relations, Measuring Relational Similarity
Experiments
Conclusions

INTRODUCTION
Implicit semantic relations between two words:
Google, YouTube (acquisition)
Ostrich, bird (is a large)

Similar semantic relations between two word-pairs:
(Google, YouTube) → (Yahoo, Inktomi)
(Ostrich, bird) → (lion, cat)

This paper proposes a method to compute the similarity between the implicit semantic relations in two word-pairs.

OUTLINE OF THE SIMILARITY METHOD

Web search component: queries a Web search engine to find the contexts in which the two words co-occur.

Pattern extraction component: extracts lexical patterns that express semantic relations.

Pattern clustering component: clusters the patterns to identify particular relations.

Similarity computation component: computes the relational similarity between two word-pairs.

RETRIEVING CONTEXTS
Snippets are brief summaries that Web search engines return along with the search results. A snippet that contains both words captures their local context.

Example query: “Google * * YouTube”

“*” is the wildcard operator; it matches one word or none.

To retrieve snippets for a word pair (A, B), we issue seven queries: “A * B”, “B * A”, “A * * B”, “B * * A”, “A * * * B”, “B * * * A”, and A B.

The wildcard queries retrieve snippets in which the two query words co-occur within a maximum of three words.

The quotation marks (“ ”) ensure that the two words appear in the specified order.

Snippets are treated as duplicates and removed if they contain the exact same sequence of words.
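The query set above can be generated mechanically. The following Python sketch builds the seven queries for a word pair and removes duplicate snippets. It assumes the final A B query is sent without quotation marks (as written on the slide) and treats two snippets as duplicates when their whitespace-separated word sequences match exactly; it does not call the Yahoo BOSS API. `build_queries` and `deduplicate` are hypothetical names.

```python
def build_queries(a: str, b: str) -> list[str]:
    """Build the seven queries for the word pair (A, B)."""
    queries = []
    for gap in (1, 2, 3):
        stars = " ".join(["*"] * gap)
        # Quoted phrase queries fix the word order and the maximum gap.
        queries.append(f'"{a} {stars} {b}"')
        queries.append(f'"{b} {stars} {a}"')
    # Assumption: the last query is unquoted, so both words may appear anywhere.
    queries.append(f"{a} {b}")
    return queries


def deduplicate(snippets: list[str]) -> list[str]:
    """Keep the first snippet for each exact word sequence."""
    seen = set()
    unique = []
    for snippet in snippets:
        key = tuple(snippet.split())  # exact sequence of words
        if key not in seen:
            seen.add(key)
            unique.append(snippet)
    return unique


queries = build_queries("Google", "YouTube")
```

For (Google, YouTube), the first query is `"Google * YouTube"` and the last is the plain `Google YouTube`.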

EXTRACTING LEXICAL PATTERNS
A shallow lexical pattern extraction algorithm extracts the semantic relations between two words from Web snippets.

It does not require language preprocessing.

The algorithm consists of the following three steps.

Step 1: Replace the two words with two variables, X and Y. Replace all numeric values with D. Punctuation marks are not removed.

Step 2: Generate the subsequences of each snippet that satisfy the following conditions.
Exactly one X and one Y must occur in a subsequence.
The maximum length of a subsequence is L words.
Gaps should not exceed g words.
The total length of all gaps should not exceed G words.
All negation contractions are expanded (didn’t → did not).

Step 3: Select the subsequences whose frequency is greater than N.
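The three steps can be sketched in Python as follows. This is a brute-force enumeration, not the modified PrefixSpan algorithm the authors use, and several details are assumptions: snippets are lowercased, tokenized with a simple regular expression that keeps punctuation marks as separate tokens, the two words are matched as single tokens, and each pattern is counted at most once per snippet. All function names are hypothetical. The defaults L = 5, g = 2, G = 4 follow the experiment settings, and N = 1 keeps patterns that occur at least twice.

```python
import re
from collections import Counter

# Irregular contractions that the generic "n't" rule would mangle.
IRREGULAR = {"can't": "can not", "won't": "will not"}
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)*|\w+|[^\w\s]")
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def normalize(snippet: str, a: str, b: str) -> list[str]:
    """Step 1 plus contraction expansion: A -> X, B -> Y, numbers -> D."""
    text = snippet.lower()
    for short, full in IRREGULAR.items():
        text = text.replace(short, full)
    text = re.sub(r"(\w+)n't\b", r"\1 not", text)  # didn't -> did not
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if tok == a.lower():
            tokens.append("X")
        elif tok == b.lower():
            tokens.append("Y")
        elif NUMBER_RE.fullmatch(tok):
            tokens.append("D")
        else:
            tokens.append(tok)  # punctuation marks stay as their own tokens
    return tokens


def subsequences(tokens: list[str], L: int, g: int, G: int) -> set[str]:
    """Step 2: exactly one X and one Y, at most L tokens, each gap <= g, total gap <= G."""
    found = set()
    n = len(tokens)

    def extend(idx: list[int], total_gap: int) -> None:
        chosen = [tokens[i] for i in idx]
        nx, ny = chosen.count("X"), chosen.count("Y")
        if nx > 1 or ny > 1:
            return  # adding tokens cannot remove the extra X or Y
        if nx == 1 and ny == 1:
            found.add(" ".join(chosen))
        if len(idx) == L:
            return
        last = idx[-1]
        for nxt in range(last + 1, min(n, last + g + 2)):
            gap = nxt - last - 1  # number of skipped tokens
            if total_gap + gap <= G:
                extend(idx + [nxt], total_gap + gap)

    for start in range(n):
        extend([start], 0)
    return found


def extract_patterns(snippets, a, b, L=5, g=2, G=4, N=1) -> Counter:
    """Step 3: keep patterns whose frequency (number of snippets) exceeds N."""
    counts = Counter()
    for s in snippets:
        counts.update(subsequences(normalize(s, a, b), L, g, G))
    return Counter({p: c for p, c in counts.items() if c > N})
```

On the snippets "Google agreed to acquire YouTube for 1.65 billion" and "Google plans to acquire YouTube for $1.65 billion", this sketch yields, among others, the patterns X to acquire Y, X acquire Y, and X to acquire Y for.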

A modified PrefixSpan algorithm is used to generate the subsequences.

It considers all the words in a snippet and is not limited to extracting patterns from only the mid-fix (the words between X and Y).

Example patterns: X to acquire Y, X acquire Y, X to acquire Y for

IDENTIFYING SEMANTIC RELATIONS
A semantic relation can be expressed using more than one pattern.

If two word-pairs share many related patterns, we can expect a high relational similarity between them.

We cluster lexical patterns using their distributions over word-pairs to identify semantically related patterns.

$\mathbf{p}$: word-pair frequency vector of pattern $p$; its $i$-th element $f(a_i, b_i, p)$ is the frequency with which pattern $p$ occurs with the word-pair $(a_i, b_i)$

SORT: sorts the patterns in descending order of their total occurrence over all word-pairs

$\mathbf{c}$: the centroid of a cluster, i.e., the vector sum of the word-pair frequency vectors of all patterns that belong to that cluster

$\oplus$: vector addition

$\theta$: similarity threshold
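The clustering procedure can be sketched as a greedy, single-pass algorithm: patterns are visited in SORT order, each pattern vector is compared with the existing cluster centroids, and it either joins the most similar cluster (its vector is added to that centroid by vector addition) or starts a new cluster when no similarity exceeds the threshold θ. The use of cosine similarity, the strict comparison against θ, the toy counts, and the function names are assumptions of this sketch.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return 0.0 if nu == 0 or nv == 0 else float(u @ v / (nu * nv))


def cluster_patterns(patterns: dict[str, np.ndarray], theta: float):
    """Greedy sequential clustering of patterns by word-pair frequency vectors.

    patterns maps each pattern to its word-pair frequency vector.
    Returns (members, centroids), one entry per cluster.
    """
    # SORT: descending order of total occurrence over all word-pairs.
    order = sorted(patterns, key=lambda p: patterns[p].sum(), reverse=True)
    members: list[list[str]] = []
    centroids: list[np.ndarray] = []
    for p in order:
        vec = np.asarray(patterns[p], dtype=float)
        best, best_sim = None, -1.0
        for j, c in enumerate(centroids):
            sim = cosine(vec, c)
            if sim > best_sim:
                best, best_sim = j, sim
        if best is not None and best_sim > theta:
            centroids[best] = centroids[best] + vec  # centroid update by vector addition
            members[best].append(p)
        else:
            members.append([p])  # start a new cluster whose centroid is p
            centroids.append(vec.copy())
    return members, centroids


# Hypothetical counts over three word-pairs: (Google, YouTube), (Yahoo, Inktomi), (ostrich, bird).
P = {
    "X acquire Y": np.array([5, 4, 0]),
    "X to acquire Y": np.array([3, 2, 0]),
    "X is a large Y": np.array([0, 0, 6]),
}
members, centroids = cluster_patterns(P, theta=0.9)
```

Here the two acquisition patterns end up in one cluster and "X is a large Y" forms a singleton, because its vector is orthogonal to the acquisition centroid.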

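MEASURING RELATIONAL SIMILARITY
$\mathbf{x}$: feature vector of a word-pair

The elements of the feature vector are the total frequencies of the word-pair in each cluster.

The relational similarity between two word-pairs with feature vectors $\mathbf{x}$ and $\mathbf{y}$ is
$\mathrm{relsim}(\mathbf{x}, \mathbf{y}) = \mathbf{x}^{\top} \Lambda \mathbf{y}$,

where $\Lambda$ is a correlation matrix.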

The $(i, j)$ element of $\Lambda$ is the correlation $\rho(c_i, c_j)$ between clusters $c_i$ and $c_j$.

$c_i \cup c_j$ denotes the union of the two clusters.
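A minimal sketch of the cluster-based feature vectors and the bilinear similarity relsim(x, y) = xᵀΛy, where Λ is the correlation matrix between clusters. Because the slides do not give the formula for the cluster correlation, Λ is passed in by the caller; the L2 normalization of the feature vectors, the toy frequencies, the hypothetical correlation matrix, and the function names are all assumptions. With Λ set to the identity matrix, the score reduces to an inner product of the normalized feature vectors.

```python
import numpy as np


def feature_vector(pair_pattern_freqs: dict[str, int], clusters: list[list[str]]) -> np.ndarray:
    """x_j = total frequency of this word-pair over the patterns in cluster j."""
    return np.array(
        [sum(pair_pattern_freqs.get(p, 0) for p in cluster) for cluster in clusters],
        dtype=float,
    )


def relsim(x: np.ndarray, y: np.ndarray, Lam: np.ndarray) -> float:
    """Bilinear similarity x^T Lam y on L2-normalized vectors (normalization is an assumption)."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    if nx == 0 or ny == 0:
        return 0.0
    return float((x / nx) @ Lam @ (y / ny))


clusters = [["X acquire Y", "X to acquire Y"], ["X is a large Y"]]
x = feature_vector({"X acquire Y": 5, "X to acquire Y": 3}, clusters)  # (Google, YouTube)
y = feature_vector({"X acquire Y": 4, "X to acquire Y": 2}, clusters)  # (Yahoo, Inktomi)
z = feature_vector({"X is a large Y": 6}, clusters)                    # (ostrich, bird)
identity = np.eye(len(clusters))               # Lambda = I
toy_corr = np.array([[1.0, 0.5], [0.5, 1.0]])  # hypothetical correlation matrix
```

With the identity matrix, (Google, YouTube) and (ostrich, bird) score 0 because they share no cluster; a non-zero off-diagonal entry in Λ lets correlated clusters contribute to the similarity.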

EXPERIMENTS
Dataset: 100 instances (word or named-entity pairs)

Five relation types: ACQUIRER-ACQUIREE, PERSON-BIRTHPLACE, CEO-COMPANY, COMPANY-HEADQUARTERS, PERSON-FIELD

We manually selected 20 instances for each relation type from Wikipedia, online newspapers, and company reviews.

For each instance, snippets were downloaded using the Yahoo BOSS API.

EXPERIMENTS - LEXICAL PATTERNS
We ran the pattern extraction algorithm with L = 5, g = 2, and G = 4. The total number of unique patterns is 473,910.

We select only the 148,655 patterns that occur at least twice.

EXPERIMENTS - PATTERN CLUSTERS
(Figure: ratio of singletons to the total number of clusters.)

EXPERIMENTS - RELATION CLASSIFICATION
We evaluate the proposed relational similarity measure in a relation classification task using k-nearest neighbor classification.

Evaluation measures: classification accuracy and average precision.

Rel(r): a binary-valued function that returns 1 if the word-pair at rank r has the same relation as the query word-pair, and 0 otherwise.
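The evaluation loop can be sketched as follows: the k most similar labeled word-pairs vote on the relation of a query word-pair, and average precision is computed over the same ranked list using Rel(r). The similarity function is supplied by the caller (for example, one of the relational similarity measures). Tie-breaking in the vote (the relation encountered first in the ranking wins), dividing AP by the number of relevant word-pairs among the top k, and the function names are assumptions of this sketch.

```python
from collections import Counter


def knn_classify(query, labeled, sim, k):
    """Rank labeled (word_pair, relation) items by sim(query, word_pair) and
    let the top k vote. Returns (predicted_relation, ranked_relations)."""
    ranked = sorted(labeled, key=lambda item: sim(query, item[0]), reverse=True)[:k]
    relations = [rel for _, rel in ranked]
    # most_common orders equal counts by first occurrence,
    # so the higher-ranked relation wins a tied vote.
    predicted = Counter(relations).most_common(1)[0][0]
    return predicted, relations


def average_precision(ranked_relations, true_relation):
    """Mean of precision@r over the ranks r where Rel(r) = 1."""
    hits, total = 0, 0.0
    for r, rel in enumerate(ranked_relations, start=1):
        if rel == true_relation:  # Rel(r) = 1
            hits += 1
            total += hits / r     # precision at rank r
    return total / hits if hits else 0.0
```

For a ranked list ACQ, ACQ, CEO, ACQ with true relation ACQ, the precisions at the relevant ranks are 1, 1, and 3/4, so AP is their mean.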

With the similarity threshold θ = 0.955, there are 2,629 non-singleton clusters and 6,930 singletons.

(Table: the top 10 clusters with the largest number of lexical patterns and, for each cluster, the top four patterns that occur with the largest number of word-pairs.)

RELATIONAL SIMILARITY MEASURES
We compare the following relational similarity measures.

VSM: Each word-pair is represented by a vector of pattern frequencies, and the relational similarity between two word-pairs is computed as the cosine similarity of their vectors.

LRA (Latent Relational Analysis): Create a matrix in which the rows represent word-pairs and the columns represent lexical patterns, then apply singular value decomposition (SVD).
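The two baselines can be sketched over a word-pair by pattern frequency matrix M (rows are word-pairs, columns are patterns). The LRA function below keeps only the SVD step, comparing rows of U_k Σ_k from the top k singular directions; it omits the other stages of full LRA, such as generating alternate word-pairs and weighting the matrix entries. The toy matrix, the choice of k, and the function names are assumptions.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return 0.0 if nu == 0 or nv == 0 else float(u @ v / (nu * nv))


def vsm_similarity(M: np.ndarray, i: int, j: int) -> float:
    """VSM: cosine between the raw pattern-frequency rows of word-pairs i and j."""
    return cosine(M[i], M[j])


def lra_similarity(M: np.ndarray, i: int, j: int, k: int) -> float:
    """Simplified LRA: cosine between rows of U_k Sigma_k from the SVD of M."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    reduced = U[:, :k] * s[:k]  # scale each of the first k columns by its singular value
    return cosine(reduced[i], reduced[j])


# Hypothetical counts: rows (Google, YouTube), (Yahoo, Inktomi), (ostrich, bird).
M = np.array([[5.0, 3.0, 0.0], [4.0, 2.0, 0.0], [0.0, 0.0, 6.0]])
```

The design difference is that VSM compares raw pattern counts directly, while the SVD projection lets word-pairs that use correlated patterns end up close in the reduced space.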

IP: Set $\Lambda$ in Formula 2 to the identity matrix, i.e., compute relational similarity from the pattern clusters while treating the clusters as uncorrelated.

CORR: the proposed relational similarity measure.


CONCLUSIONS
We proposed a method to compute the similarity between implicit semantic relations in two word-pairs.

The method needs only a few queries per word-pair, so it can quickly compute relational similarity for unseen word-pairs.

It provides a general framework: designing relational similarity measures can be modeled as searching for a matrix.
