Naive Bayes Classifier Approach to Word Sense Disambiguation
Daniel Jurafsky and James H. Martin
Chapter 20: Computational Lexical Semantics, Sections 1 to 2
Seminar in Methodology and Statistics, 3 June 2009
Outline
1. Word Sense Disambiguation (WSD)
   - What is WSD?
   - Variants of WSD
2. Naive Bayes Classifier
   - Statistics difficulty
   - Get around the problem
   - Assumption
   - Substitution
   - Intuition of the Naive Bayes Classifier for WSD
3. Conclusion
What is WSD?
WSD is the task of automatically assigning the appropriate meaning to a polysemous word within a given context.

Polysemy is:
the ambiguity of an individual word or phrase that can be used (in different contexts) to express two or more different meanings.

Here WSD is discussed in relation to computational lexical semantics.
Example of a polysemous word
Figure: Example sentences of the polysemous word bar
Variants of generic WSD
Many WSD algorithms rely on contextual similarity to help choose the proper sense of a word in context.

Two variants of WSD are:
- The all-words approach, and
- The supervised (or lexical sample) approach
Unsupervised WSD approach
All-words WSD approach
A system is given entire texts and a lexicon with an inventory of senses for each entry, and it is required to disambiguate every content word in the text. Disadvantages:
1. Training data for each word in the test set may not be available.
2. Training one classifier per term is not practical.
Supervised WSD approach
Supervised WSD approach (or lexical sample WSD approach)
Takes as input a word in context along with a fixed inventory of potential word senses, and outputs the correct word sense for that use.
The input data is hand-labeled with correct word senses.
Unlabeled target words in context can then be labeled using such a trained classifier.
Collecting features for Supervised WSD
Inputs for supervised WSD are collected in feature vectors.
A feature vector consists of numeric or nominal values that encode linguistic information as input to most ML algorithms.
Two classes of features extracted from the neighbouring context are:
1. Bag-of-words feature vectors, and
2. Collocational feature vectors
Classes of feature vectors
Bag-of-words feature vectors
These are unordered sets of words, with their exact positions ignored.
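A minimal sketch of such a vector, assuming a toy 12-word vocabulary (the vocabulary and function name below are illustrative, not from the slides):

```python
# A minimal sketch of a binary bag-of-words feature vector.
# The toy vocabulary is illustrative, not prescribed by the slides.
def bag_of_words_vector(context_words, vocabulary):
    # 1 if the vocabulary word occurs anywhere in the context,
    # 0 otherwise; word order and position are deliberately ignored.
    present = set(context_words)
    return [1 if w in present else 0 for w in vocabulary]

vocab = ["fishing", "big", "sound", "player", "fly", "rod",
         "pound", "double", "runs", "playing", "guitar", "band"]
context = "an electric guitar and bass player stand off to one side".split()
print(bag_of_words_vector(context, vocab))
# -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
```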
Classes of feature vectors
Collocational feature vectors
A collocation is a word or phrase in a position of specific relationship to a target word. Thus a collocational feature encodes information about specific positions to the left or right of the target word.
E.g., take bass as the target in:

An electric guitar and bass player stand off to one side, ...

A collocational feature vector extracted from a window of two words to the right and left of the target word, made up of the words themselves and their respective POS tags, that is:

[w_{i-2}, POS_{i-2}, w_{i-1}, POS_{i-1}, w_{i+1}, POS_{i+1}, w_{i+2}, POS_{i+2}]

would yield the following vector:

[guitar, NN, and, CC, player, NN, stand, VB]
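A minimal sketch of this extraction; the (word, POS) pair representation of the tagged sentence is an assumed input format, not something the slides specify:

```python
# A minimal sketch of collocational feature extraction.
# `tagged` is assumed to be a list of (word, POS) pairs; `i` is the
# index of the target word, which is itself skipped.
def collocational_features(tagged, i, window=2):
    feats = []
    for j in range(i - window, i + window + 1):
        if j == i or not (0 <= j < len(tagged)):
            continue
        word, pos = tagged[j]
        feats.extend([word, pos])
    return feats

sent = [("an", "DT"), ("electric", "JJ"), ("guitar", "NN"),
        ("and", "CC"), ("bass", "NN"), ("player", "NN"),
        ("stand", "VB"), ("off", "RP")]
print(collocational_features(sent, 4))  # target: "bass" at index 4
# -> ['guitar', 'NN', 'and', 'CC', 'player', 'NN', 'stand', 'VB']
```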
Naive Bayes Classifier
Because of the feature-vector annotations, we can use a Naive Bayes classifier approach to WSD.
This approach is based on the premise that:
Choosing the best sense $\hat{s}$ out of the set of possible senses $S$ for a feature vector $\vec{f}$ amounts to choosing the most probable sense given that vector.
This is to say:
\[
\hat{s} = \arg\max_{s \in S} P(s \mid \vec{f}) \qquad (1)
\]
Statistics difficulty
Collecting reasonable statistics for the above equation is difficult.
For example:
Consider that a binary bag-of-words vector defined over a vocabulary of 20 words would have
\[
2^{20} = 1{,}048{,}576 \qquad (2)
\]
possible feature vectors.
To get around the problem
Equation 1 is reformulated in the usual Bayesian manner:
\[
\hat{s} = \arg\max_{s \in S} \frac{P(\vec{f} \mid s)\, P(s)}{P(\vec{f})} \qquad (3)
\]
Data that associates a specific $\vec{f}$ with each sense is sparse. But information about individual feature-value pairs in the context of specific senses is available in a tagged training set.
Assumption
We naively assume that the features are independent of one another given the word sense, i.e., conditionally independent.
This yields the following approximation for $P(\vec{f} \mid s)$:
\[
P(\vec{f} \mid s) \approx \prod_{j=1}^{n} P(f_j \mid s) \qquad (4)
\]
The probability of an entire vector given a sense can thus be estimated as the product of the probabilities of its individual features given that sense.
Naive Bayes Classifier for WSD
Since $P(\vec{f})$ is the same for all possible senses, it does not affect the final ranking of senses. Substituting equation 4 for $P(\vec{f} \mid s)$ in equation 3 above leaves us with:
\[
\hat{s} = \arg\max_{s \in S} P(s) \prod_{j=1}^{n} P(f_j \mid s) \qquad (5)
\]
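A practical aside not made on the slides: for long feature vectors the product in equation 5 underflows floating-point arithmetic, so implementations usually maximize the equivalent sum of logarithms:

\[
\hat{s} = \arg\max_{s \in S} \Big[ \log P(s) + \sum_{j=1}^{n} \log P(f_j \mid s) \Big]
\]

Since the logarithm is monotonic, this selects the same sense as equation 5.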
Training a Naive Bayes Classifier
We can estimate each of the probabilities in equation 5 as shown below:
Prior probability of each sense, $P(s)$
This is the fraction of labeled instances of the word $w_j$ that carry sense $s_i$:
\[
P(s_i) = \frac{\mathrm{count}(s_i, w_j)}{\mathrm{count}(w_j)} \qquad (6)
\]
Individual feature probabilities, $P(f_j \mid s)$
\[
P(f_j \mid s) = \frac{\mathrm{count}(f_j, s)}{\mathrm{count}(s)} \qquad (7)
\]
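A minimal training sketch for equations 6 and 7; the (sense, feature list) format for each hand-labeled occurrence is an illustrative assumption about how the training data is stored:

```python
# A minimal sketch of maximum-likelihood training (equations 6 and 7).
# Each example is assumed to be (sense, feature_list) for one
# hand-labeled occurrence of the target word.
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    sense_counts = Counter()               # count(s_i, w_j)
    feature_counts = defaultdict(Counter)  # count(f_j, s)
    for sense, features in examples:
        sense_counts[sense] += 1
        for f in features:
            feature_counts[sense][f] += 1
    total = sum(sense_counts.values())     # count(w_j): all labeled occurrences
    priors = {s: n / total for s, n in sense_counts.items()}           # eq. 6
    likelihoods = {s: {f: n / sense_counts[s] for f, n in fc.items()}  # eq. 7
                   for s, fc in feature_counts.items()}
    return priors, likelihoods
```

In practice these raw counts are smoothed (e.g. with add-one/Laplace smoothing) so that a feature never seen with a given sense does not force the whole product in equation 5 to zero.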
Intuition of Naive Bayes Classifier for WSD
- Take a target word in context.
- Extract the specified features, e.g. neighbouring words, POS, position.
- Compute $P(s) \prod_{j=1}^{n} P(f_j \mid s)$ for each sense.
- Return the sense with the highest score, as in the sketch below.
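A minimal sketch of that procedure, using the log form of equation 5; priors and likelihoods are assumed to come from a trainer like the one sketched earlier, and the small probability floor for unseen features is a stand-in for proper smoothing:

```python
import math

# A minimal sketch of the decision rule in equation 5, computed in log
# space to avoid underflow on long feature vectors.
def classify(features, priors, likelihoods, floor=1e-10):
    best_sense, best_score = None, float("-inf")
    for sense, prior in priors.items():
        score = math.log(prior)  # log P(s)
        for f in features:
            # + log P(f_j | s); unseen features get a tiny floor value
            score += math.log(likelihoods[sense].get(f, floor))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense
```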
Conclusion
We discussed the Naive Bayes classifier for WSD, based on Bayes' theorem, and showed that it is possible to disambiguate word senses in context. But we have not discussed:
- Evaluation of such systems, and
- Disambiguation of phrases

To find out, come to my TabuDag presentation.