Semi-Supervised Learning & Summary
Advanced Statistical Methods in NLP, Ling 572
March 8, 2012
Roadmap
Semi-supervised learning:
- Motivation & perspective
- Yarowsky's model
- Co-training
Summary
Semi-supervised Learning
Motivation
Supervised learning:
- Works really well, but needs lots of labeled training data
Unsupervised learning:
- No labeled data required, but may not work well and may not learn the desired distinctions
- E.g., unsupervised parsing techniques: they fit the data, but don't correspond to linguistic intuition
Solution
Semi-supervised learning. General idea:
- Use a small amount of labeled training data
- Augment it with a large amount of unlabeled training data
- Use information in the unlabeled data to improve models
Many different semi-supervised machine learners:
- Variants of supervised techniques: semi-supervised SVMs, CRFs, etc.
- Bootstrapping approaches: Yarowsky's method, self-training, co-training
Label the First Use of "Plant"
Biological example: "There are more kinds of plants and animals in the rainforests than anywhere else on Earth. Over half of the millions of known species of plants and animals live in the rainforest. Many are found nowhere else. There are even plants and animals in the rainforest that we have not yet discovered."
Industrial example: "The Paulus company was founded in 1938. Since those days the product range has been the subject of constant expansions and is brought up continuously to correspond with the state of the art. We're engineering, manufacturing and commissioning world-wide ready-to-run plants packed with our comprehensive know-how. Our Product Range includes pneumatic conveying systems for carbon, carbide, sand, lime and many others. We use reagent injection in molten metal for the…"
Word Sense Disambiguation
An application of lexical semantics.
Goal: given a word in context, identify the appropriate sense.
- E.g., "plants and animals in the rainforest"
Crucial for real syntactic & semantic analysis: the correct sense can determine
- the available syntactic structure
- the available thematic roles, the correct meaning, ...
Disambiguation Features
Key: What are the features?
- Part of speech: of the word and its neighbors
- Morphologically simplified form
- Words in the neighborhood
  - Question: How big a neighborhood? Is there a single optimal size? Why?
- (Possibly shallow) syntactic analysis: e.g., predicate-argument relations, modification, phrases
- Collocation vs. co-occurrence features:
  - Collocation: words in a specific relation (e.g., predicate-argument), 1 word +/-
  - Co-occurrence: bag of words
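The collocation vs. co-occurrence distinction can be made concrete. A minimal sketch (my own illustration, not code from the course; function names are mine): collocational features record the word at each fixed offset from the target, while co-occurrence features are a position-free bag of words from a window.

```python
# Two kinds of WSD features for a target token at index i.

def collocation_features(tokens, i, k=1):
    """Words at fixed offsets from the target: position matters."""
    feats = {}
    for off in range(-k, k + 1):
        if off == 0:
            continue
        j = i + off
        word = tokens[j] if 0 <= j < len(tokens) else "<pad>"
        feats[f"w[{off:+d}]={word}"] = 1
    return feats

def cooccurrence_features(tokens, i, window=3):
    """Bag of words within +/-window of the target: positions discarded."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return {f"bow={tokens[j]}": 1 for j in range(lo, hi) if j != i}

sent = "plants and animals in the rainforest".split()
print(collocation_features(sent, 0))   # target word: "plants"
print(cooccurrence_features(sent, 0))
```

For the target "plants", the collocational view keeps only "the word at +1 is 'and'", while the co-occurrence view keeps {and, animals, in} without positions.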
WSD Evaluation
Ideally, an end-to-end evaluation with the WSD component:
- Demonstrates the real impact of the technique in a system
- Difficult, expensive, and still application-specific
Typically, an intrinsic, sense-based evaluation:
- Accuracy, precision, recall
- SENSEVAL/SEMEVAL: all-words and lexical-sample tasks
Baseline: most frequent sense
Topline: human inter-rater agreement: 75-80% on fine-grained senses; 90% on coarse-grained senses
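The intrinsic, sense-based metrics above are straightforward to compute. A small illustrative sketch (the toy data and function names are mine, not from the slides):

```python
# Sense-based WSD scoring: overall accuracy plus per-sense precision/recall.

def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def precision_recall(gold, pred, sense):
    tp = sum(g == sense and p == sense for g, p in zip(gold, pred))
    fp = sum(g != sense and p == sense for g, p in zip(gold, pred))
    fn = sum(g == sense and p != sense for g, p in zip(gold, pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

gold = ["bio", "bio", "ind", "bio", "ind"]
pred = ["bio", "ind", "ind", "bio", "ind"]
print(accuracy(gold, pred))                 # 0.8
print(precision_recall(gold, pred, "bio"))  # (1.0, 0.666...)
```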
Minimally Supervised WSD
Yarowsky's algorithm (1995)
Bootstrapping approach: use a small labeled seed set to iteratively train.
Builds on two key insights:
- One sense per discourse: a word appearing multiple times in a text has the same sense
  - In a corpus of 37,232 "bass" instances: always a single sense
- One sense per collocation: local phrases select a single sense
  - "fish" -> Bass1
  - "play" -> Bass2
Yarowsky's Algorithm: Training Decision Lists
1. Pick seed instances & tag them
2. Find collocations: word left, word right, word +/-K
   (A) Calculate informativeness on the tagged set, and order the rules by it
   (B) Tag new instances with the rules
   (C) Apply one sense per discourse
   (D) If instances are still unlabeled, go to 2
3. Apply one sense per discourse
Disambiguation: first rule matched
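One round of the loop above can be sketched as follows. This is my own hedged reconstruction, not Yarowsky's code: informativeness is computed as a smoothed log-likelihood ratio between the two senses given a collocation (the smoothing constant and data structures are my choices), rules are ordered by its magnitude, and disambiguation returns the first matching rule.

```python
import math
from collections import defaultdict

def train_decision_list(labeled, alpha=0.1):
    """labeled: list of (features:set, sense in {1, 2}).
    Returns rules ordered by informativeness (|log-likelihood ratio|)."""
    counts = defaultdict(lambda: [0, 0])      # feature -> [n_sense1, n_sense2]
    for feats, sense in labeled:
        for f in feats:
            counts[f][sense - 1] += 1
    rules = []
    for f, (n1, n2) in counts.items():
        llr = math.log((n1 + alpha) / (n2 + alpha))   # smoothed LLR
        rules.append((abs(llr), f, 1 if llr > 0 else 2))
    rules.sort(reverse=True)                  # most informative first
    return rules

def classify(rules, feats):
    for _, f, sense in rules:                 # first rule matched wins
        if f in feats:
            return sense
    return None                               # stays unlabeled this round

seed = [({"fish", "caught"}, 1), ({"play", "guitar"}, 2), ({"fish"}, 1)]
rules = train_decision_list(seed)
print(classify(rules, {"fish", "river"}))     # -> 1
print(classify(rules, {"play"}))              # -> 2
```

Instances that match no rule are left unlabeled, to be picked up on a later iteration once new collocations enter the list.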
Yarowsky Decision List
Iterative Updating
Label the First Use of "Plant" (the biological and industrial "plant" examples are repeated here)
Sense Choice with Collocational Decision Lists
Create an initial decision list, with rules ordered by informativeness.
Check nearby word groups (collocations):
- Biology: "animal" within +/-2-10 words
- Industry: "manufacturing" within +/-2-10 words
Result: correct selection; 95% on pairwise tasks
Self-Training
Basic approach:
- Start with a small labeled training set
- Train a supervised classifier on the training set
- Apply the new classifier to the residual unlabeled training data
- Add the 'best' newly labeled examples to the labeled training set
- Iterate
Self-Training
Simple, right?
The devil is in the details: which instances are 'best' to add?
- Highest confidence? Probably accurate, but probably adds little new information to the classifier.
- Most different? Probably adds information, but may not be accurate.
- Compromise: use the most different of the highly confident instances.
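The self-training loop can be sketched in a few lines. This is a toy illustration of the general recipe, not a specific system: the classifier (per-class mean of 1-D points), the distance-based confidence, and the threshold are all my own stand-ins.

```python
# Self-training: train, label the pool, keep only confident predictions.

def train(labeled):
    """Toy classifier: per-class mean of 1-D points."""
    means = {}
    for label in {y for _, y in labeled}:
        pts = [x for x, y in labeled if y == label]
        means[label] = sum(pts) / len(pts)
    return means

def predict(means, x):
    """Return (label, confidence); confidence shrinks with distance."""
    label = min(means, key=lambda c: abs(x - means[c]))
    return label, 1.0 / (1.0 + abs(x - means[label]))

def self_train(labeled, unlabeled, threshold=0.5, rounds=5):
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        scored = [(x, *predict(model, x)) for x in pool]
        keep = [(x, y) for x, y, conf in scored if conf >= threshold]
        if not keep:                       # nothing confident: stop early
            break
        kept = {x for x, _ in keep}
        labeled += keep                    # grow the labeled set
        pool = [x for x in pool if x not in kept]
    return train(labeled)

model = self_train([(0.0, "a"), (10.0, "b")], [1.0, 2.0, 8.5, 9.0])
print(predict(model, 1.5)[0])   # -> "a"
```

Note how the confidence threshold implements the 'best instances' choice discussed above; lowering it trades accuracy of the added labels for coverage.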
Co-Training
Blum & Mitchell, 1998
Basic intuition: "Two heads are better than one."
Ensemble classifier:
- Uses results from multiple classifiers
Multi-view classifier:
- Uses different views of the data (feature subsets)
- Ideally, the views should be:
  - Conditionally independent
  - Individually sufficient: enough information to learn from either view alone
Co-training Set-up
Create two views of the data:
- Typically partition the feature set by type
- E.g., predicting speech emphasis:
  - View 1: acoustics: loudness, pitch, duration
  - View 2: lexicon, syntax, context
Some approaches use learners of different types instead.
In practice, the views may not truly be conditionally independent, but co-training often works pretty well anyway.
Co-training Approach
- Create a small labeled training data set
- Train two (supervised) classifiers on the current training data, using different views
- Use the two classifiers to label the residual unlabeled instances
- Select the 'best' newly labeled data to add to the training data: instances labeled by C1 are added to the training data for C2, and vice versa
- Iterate
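The steps above can be sketched as a loop over two views. This is a hedged toy version (the centroid classifier, confidence measure, and synthetic 2-view data are mine, not Blum & Mitchell's code); the key point is that each classifier's confident labels feed the OTHER classifier's training data.

```python
# Co-training over two views of 1-D data: each instance is (view1, view2).

def fit_view(labeled):
    """Toy per-view classifier: per-class mean of view values."""
    means = {}
    for y in {y for _, y in labeled}:
        pts = [v for v, lab in labeled if lab == y]
        means[y] = sum(pts) / len(pts)
    return means

def predict_view(means, v):
    y = min(means, key=lambda c: abs(v - means[c]))
    return y, 1.0 / (1.0 + abs(v - means[y]))

def co_train(labeled, unlabeled, threshold=0.5, rounds=3):
    train1 = [(v1, y) for (v1, _), y in labeled]   # C1's training data
    train2 = [(v2, y) for (_, v2), y in labeled]   # C2's training data
    pool = list(unlabeled)
    for _ in range(rounds):
        c1, c2 = fit_view(train1), fit_view(train2)
        remaining = []
        for v1, v2 in pool:
            y1, conf1 = predict_view(c1, v1)
            y2, conf2 = predict_view(c2, v2)
            if conf1 >= threshold:
                train2.append((v2, y1))            # C1 labels for C2
            if conf2 >= threshold:
                train1.append((v1, y2))            # C2 labels for C1
            if conf1 < threshold and conf2 < threshold:
                remaining.append((v1, v2))
        pool = remaining
    return fit_view(train1), fit_view(train2)

c1, c2 = co_train([((0.0, 0.0), "a"), ((10.0, 10.0), "b")],
                  [(1.0, 4.0), (9.0, 6.0)])
print(predict_view(c1, 1.0)[0])   # -> "a"
```

The cross-feeding is what makes this co-training rather than two independent self-trainers: view 2 here is too ambiguous to label its own pool confidently, but view 1's confident labels still improve C2.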
Graphically
(Figure from Jeon & Liu, 2011.)
More Devilish Details
Questions for co-training:
- Which instances are 'best' to add to training? The most confident? The most different? Random? Many approaches combine these.
- How many instances to add per iteration? A threshold, by count or by value?
- How long to iterate? A fixed count? A threshold on classifier confidence? etc.
Co-training Applications
Applied to many language-related tasks:
- Blum & Mitchell's paper: academic home-page classification; 95% accuracy with 12 pages labeled and 788 classified
- Sentiment analysis
- Statistical parsing
- Prominence recognition
- Dialog classification
Learning Curves: Semi-supervised vs. Supervised
(Figure: accuracy, 66-84%, vs. number of labeled examples, 9 to 300, for supervised and semi-supervised learning.)
Semi-supervised Learning
An umbrella term for machine learning techniques that:
- Use a small amount of labeled training data
- Augment it with information from unlabeled data
Can be very effective:
- Training on ~10 labeled samples can yield results comparable to training on 1000s
Can be temperamental:
- Sensitive to the data, learning algorithm, and design choices
- Hard to predict the effects of the amount of labeled data, unlabeled data, etc.
Summary
Review
Introduction:
- Entropy, cross-entropy, and mutual information
Classic machine learning algorithms:
- Decision trees, kNN, Naïve Bayes
Discriminative machine learning algorithms:
- MaxEnt, CRFs, SVMs
Other models:
- TBL, EM, semi-supervised approaches
General Methods
Data organization:
- Training, development, and test data splits
Cross-validation:
- Parameter tuning, evaluation
Feature selection:
- Wrapper methods, filtering, weighting
Beam search
Tools, Data, & Tasks
Tools:
- Mallet
- libSVM
Data:
- 20 Newsgroups (text classification)
- Penn Treebank (POS tagging)
Beyond 572
Ling 573:
- 'Capstone' project class: integrates material from the 57* classes
- More 'real-world': project teams, deliverables, repositories
Ling 575s:
- Speech technology: Michael Tjalve (Th, 4pm)
- NLP on mobile devices: Scott Farrar (T, 4pm)
Ling and other electives
Course Evaluations
https://depts.washington.edu/oeaias/webq/survey.cgi?user=UWDL&survey=1397
Thank you!