Semi-Supervised Natural Language Learning Reading Group
• I set up a site at: http://www.cs.cmu.edu/~acarlson/semisupervised/
• Cover other applications of semi-supervised learning?
• Volunteers?
• Every week or bi-weekly?
• Time change? 1pm? Noon?
Unsupervised Word Sense Disambiguation Rivaling Supervised Methods
Author: David Yarowsky (1995)
Presented by: Andy Carlson
Word Sense Disambiguation
• Determining what sense of a word is meant in a given sentence
• “Toyota is considering opening a plant in Detroit.”
• “The banana plant is grown all over the tropics for its fruit.”
• Different from sense induction: we assume the distinct senses are already known
Using unlabeled data
• Two properties of language let us use unlabeled data:
• One sense per collocation: nearby words provide strong and consistent clues to the sense
• One sense per discourse: within a document, the sense of a word is highly consistent
• We can base an iterative bootstrapping algorithm on these two properties
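As a rough illustration of the one-sense-per-discourse property, the sketch below spreads the majority sense of a document to every occurrence of the word in that document. The function name, the `min_margin` parameter, and the tie handling are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter

def apply_one_sense_per_discourse(doc_ids, labels, min_margin=1):
    """Relabel occurrences with the majority sense of their document.

    doc_ids: document identifier for each occurrence of the target word.
    labels:  current sense label for each occurrence, or None if unlabeled.
    min_margin: hypothetical knob; the majority must beat the runner-up
                by at least this many occurrences, otherwise labels are kept.
    """
    votes = {}
    for doc, label in zip(doc_ids, labels):
        if label is not None:
            votes.setdefault(doc, Counter())[label] += 1

    majority = {}
    for doc, counter in votes.items():
        ranked = counter.most_common(2)
        top_sense, top_count = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if top_count - runner_up >= min_margin:
            majority[doc] = top_sense

    # Documents without a clear majority keep their existing labels.
    return [majority.get(doc, label) for doc, label in zip(doc_ids, labels)]
```

Here a document with two "living" labels and one unlabeled occurrence would end up fully labeled "living", while a document with a one-to-one tie is left untouched.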
Decision Lists
• List of rules of the form “collocation => sense”
• Example: life (within 2-10 words) => biological sense of plant
• Rules are ordered by log-likelihood ratio
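A minimal sketch of such a decision list for two senses follows. It assumes each training example is a set of collocation features plus a sense label, scores each collocation by the smoothed log ratio of its counts under the two senses, and classifies with the first matching rule. The names `build_decision_list` and `classify` and the smoothing constant `alpha` are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def build_decision_list(examples, senses, alpha=0.1):
    """Build rules (collocation, sense, score), strongest evidence first.

    examples: iterable of (set_of_collocations, sense).
    senses:   pair of sense labels, e.g. ("living", "factory").
    alpha:    additive smoothing so unseen counts do not divide by zero
              (an assumption of this sketch).
    """
    sense_a, sense_b = senses
    counts = defaultdict(Counter)
    for collocations, sense in examples:
        for c in collocations:
            counts[c][sense] += 1

    rules = []
    for c, by_sense in counts.items():
        llr = math.log((by_sense[sense_a] + alpha) / (by_sense[sense_b] + alpha))
        if llr == 0:
            continue  # collocation gives no evidence either way
        rules.append((c, sense_a if llr > 0 else sense_b, abs(llr)))

    rules.sort(key=lambda rule: rule[2], reverse=True)
    return rules

def classify(rules, collocations, default=None):
    """Return (sense, score) from the highest-ranked rule that matches."""
    for c, sense, score in rules:
        if c in collocations:
            return sense, score
    return default, 0.0
```

Only the single strongest matching rule decides the sense; evidence from weaker rules is not combined.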
The algorithm – step 1
• Find all occurrences of the given polysemous word
• We follow examples for the word plant
Step 2 – Initial Labeling
• For each sense of the word, identify a small number of training examples
• Strategies: words from dictionary definitions, human labeling of the most frequent collocates, or human-chosen seed collocates
• Example: the words life and manufacturing are used as seed collocations
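The seeding step can be sketched as follows, assuming each occurrence of the target word is represented by the list of tokens around it and each sense has a single seed word. The function name `seed_label` and the example contexts are made up for illustration.

```python
def seed_label(contexts, seeds):
    """Label contexts that contain exactly one sense's seed collocation.

    contexts: list of token lists, one per occurrence of the target word.
    seeds:    dict mapping sense -> seed word, e.g. {"living": "life"}.
    Contexts matching no seed, or seeds of more than one sense, stay None.
    """
    labels = []
    for tokens in contexts:
        matched = {sense for sense, word in seeds.items() if word in tokens}
        labels.append(matched.pop() if len(matched) == 1 else None)
    return labels

# Hypothetical contexts for the word "plant".
contexts = [
    ["plant", "life", "in", "the", "tropics"],
    ["a", "manufacturing", "plant", "in", "detroit"],
    ["the", "plant", "closed"],
]
seeds = {"living": "life", "factory": "manufacturing"}
print(seed_label(contexts, seeds))
```

Most occurrences remain unlabeled after this step; they form the residual set that later iterations try to label.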
Step 3d
• Repeat step 3 iteratively
• Details: grow the window size for collocations, and randomly perturb the class inclusion threshold
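The slides for steps 3a through 3c are not reproduced in this text, so the sketch below assumes the usual shape of the loop: train a decision list on the currently labeled examples, label any example whose best matching rule scores above a threshold, and repeat until nothing changes. It is a simplified illustration, not the author's implementation: all names are hypothetical, labels are never retracted, and the window growth and random threshold perturbation mentioned above are left out.

```python
import math
from collections import Counter, defaultdict

def train(contexts, labels, senses, target, alpha=0.1):
    """Decision list as (score, token, sense) tuples, strongest first."""
    sense_a, sense_b = senses
    counts = defaultdict(Counter)
    for tokens, label in zip(contexts, labels):
        if label is None:
            continue
        for tok in set(tokens):
            if tok != target:  # the target word itself is not a clue
                counts[tok][label] += 1
    rules = []
    for tok, c in counts.items():
        llr = math.log((c[sense_a] + alpha) / (c[sense_b] + alpha))
        if llr != 0:
            rules.append((abs(llr), tok, sense_a if llr > 0 else sense_b))
    rules.sort(reverse=True)
    return rules

def bootstrap(contexts, seed_labels, senses, target, threshold=1.0, max_iters=10):
    """Grow the labeled set from the seeds until it stops changing."""
    labels = list(seed_labels)
    for _ in range(max_iters):
        rules = train(contexts, labels, senses, target)
        new_labels = list(labels)
        for i, tokens in enumerate(contexts):
            for score, tok, sense in rules:
                if tok in tokens:
                    # Only the highest-ranked matching rule is consulted.
                    if score >= threshold:
                        new_labels[i] = sense
                    break
        if new_labels == labels:
            break  # converged: no label changed this round
        labels = new_labels
    return labels
```

Contexts that share no collocation with any labeled example never receive a label, which matches the idea of a residual set that shrinks across iterations without necessarily emptying.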