Learning Within-Sentence Semantic Coherence
Elena Eneva, Rose Hoberman, Lucian Lita, Carnegie Mellon University


Page 1: Learning Within-Sentence Semantic Coherence

Learning Within-Sentence Semantic Coherence

Elena Eneva

Rose Hoberman

Lucian Lita

Carnegie Mellon University

Page 2: Learning Within-Sentence Semantic Coherence

Semantic (in)Coherence

Trigram: content words unrelated

Effect on speech recognition:

– Actual Utterance: “THE BIRD FLU HAS AFFECTED CHICKENS FOR YEARS BUT ONLY RECENTLY BEGAN MAKING HUMANS SICK”

– Top Hypothesis: “THE BIRD FLU HAS AFFECTED SECONDS FOR YEARS BUT ONLY RECENTLY BEGAN MAKING HUMAN SAID”

Our goal: model semantic coherence

Page 3: Learning Within-Sentence Semantic Coherence

A Whole Sentence Exponential Model [Rosenfeld 1997]

P0(s) is an arbitrary initial model (typically N-gram)

fi(s)’s are arbitrary computable properties of s (aka features)

Z is a universal normalizing constant

Pr(s) ≝ (1/Z) P0(s) exp( Σi λi fi(s) )

Page 4: Learning Within-Sentence Semantic Coherence

A Methodology for Feature Induction

Given corpus T of training sentences:

1. Train best-possible baseline model, P0(s)

2. Use P0(s) to generate corpus T0 of “pseudo sentences”

3. Pose a challenge: find (computable) differences that allow discrimination between T and T0

4. Encode the differences as features fi(s)

5. Train a new model:

P1(s) = (1/Z) P0(s) exp( Σi λi fi(s) )
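Rescoring a sentence under the resulting model can be sketched as follows (the weights λi shown here are hypothetical placeholders; in practice they are trained, e.g. by maximum-entropy estimation):

```python
import math

def exponential_rescore(p0, features, lambdas):
    """Unnormalized whole-sentence exponential model score:
    P1(s) proportional to P0(s) * exp( sum_i lambda_i * f_i(s) ).
    p0:       baseline (e.g. trigram) probability of the sentence
    features: feature values f_i(s) for this sentence
    lambdas:  feature weights (illustrative values, not trained here)
    """
    return p0 * math.exp(sum(l * f for l, f in zip(lambdas, features)))

# Toy example: a single coherence feature with a positive weight
# raises the score of a sentence relative to its baseline probability.
score = exponential_rescore(p0=1e-12, features=[2.9], lambdas=[0.5])
```

Dividing by the universal normalizer Z then yields a proper probability; Z is intractable to compute exactly, which is why the feature-induction step works with samples from P0 instead.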

Page 5: Learning Within-Sentence Semantic Coherence

Discrimination Task:

1. - - - feel - - sacrifice - - sense - - - - - - - - -meant - - - - - - - - trust - - - - truth

2. - - kind - free trade agreements - - - living - - ziplock bag - - - - - - university japan's daiwa bank stocks step –

Are these content words generated from a trigram model or from a natural sentence?

Page 6: Learning Within-Sentence Semantic Coherence

Building on Prior Work

Define “content words” (all but the top 50)

Goal: model the distribution of content words in a sentence

Simplify: model pairwise co-occurrences (“content word pairs”)

Collect contingency tables; calculate a measure of association for each pair

Page 7: Learning Within-Sentence Semantic Coherence

Q Correlation Measure

Q values range from –1 to +1

Q = (c11·c22 − c12·c21) / (c11·c22 + c12·c21)

Derived from the co-occurrence contingency table:

          W1 yes   W1 no
W2 yes    c11      c21
W2 no     c12      c22
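This measure (Yule's Q) is straightforward to compute from the cell counts of the 2×2 table; the counts below are invented for illustration:

```python
def yules_q(c11, c12, c21, c22):
    """Yule's Q association measure from a 2x2 co-occurrence
    contingency table; values range from -1 to +1."""
    num = c11 * c22 - c12 * c21
    den = c11 * c22 + c12 * c21
    return num / den if den else 0.0

# A word pair that co-occurs far more often than chance gets Q near +1.
q = yules_q(c11=30, c12=5, c21=5, c22=960)
```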

Page 8: Learning Within-Sentence Semantic Coherence

Density Estimates

We hypothesized:
– Trigram sentences: wordpair correlation completely determined by distance
– Natural sentences: wordpair correlation independent of distance

Kernel density estimation:
– distribution of Q values in each corpus
– at varying distances

Page 9: Learning Within-Sentence Semantic Coherence

Q Distributions

[Figure: density of Q values for trigram-generated (dashed) vs. Broadcast News sentences, at distance 1 and distance 3]

Page 10: Learning Within-Sentence Semantic Coherence

Likelihood Ratio Feature

L = Π over content word pairs (i,j) of  Pr(Qij | dij, BNews) / Pr(Qij | dij, Trigram)

she is a country singer searching for fame and fortune in nashville

Q(country, nashville) = 0.76, distance = 8
Pr(Q = 0.76 | d = 8, BNews) = 0.32
Pr(Q = 0.76 | d = 8, Trigram) = 0.11
Likelihood ratio = 0.32 / 0.11 ≈ 2.9
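Given the two estimated conditional densities, the feature is a product of per-pair ratios; a sketch (the density lookups are passed in as functions, a hypothetical interface):

```python
def likelihood_ratio(pairs, p_bnews, p_trigram):
    """Product over content-word pairs (q, d) of
    Pr(q | d, BNews) / Pr(q | d, Trigram).
    pairs:     list of (Q value, distance) tuples for the sentence
    p_bnews:   density lookup Pr(q | d, BNews)
    p_trigram: density lookup Pr(q | d, Trigram)
    """
    ratio = 1.0
    for q, d in pairs:
        ratio *= p_bnews(q, d) / p_trigram(q, d)
    return ratio
```

For the slide's worked example, a single pair with densities 0.32 and 0.11 yields 0.32 / 0.11 ≈ 2.9; values above 1 favor the natural-sentence hypothesis.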

Page 11: Learning Within-Sentence Semantic Coherence

Simpler Features

Q value based:
– Mean, median, min, max of Q values for content word pairs in the sentence (Cai et al. 2000)
– Percentage of Q values above a threshold
– High/low correlations across large/small distances

Other:
– Word and phrase repetition
– Percentage of stop words
– Longest sequence of consecutive stop/content words
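Several of these simpler features are direct to compute; a sketch of two of them (the stopword set here is a tiny placeholder, not the paper's top-50 list):

```python
from statistics import mean, median

def q_summary_features(q_values):
    """Mean, median, min, max of Q values for the content-word
    pairs in a sentence (after Cai et al. 2000)."""
    return {"mean": mean(q_values), "median": median(q_values),
            "min": min(q_values), "max": max(q_values)}

def stopword_fraction(tokens, stopwords):
    """Fraction of tokens in the sentence that are stop words."""
    return sum(t in stopwords for t in tokens) / len(tokens)
```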

Page 12: Learning Within-Sentence Semantic Coherence

Datasets

LM and contingency tables (Q values) derived from 103 million words of BN

From the remainder of the BN corpus and sentences sampled from the trigram LM:
– Q value distributions estimated from ~100,000 sentences
– Decision tree trained and tested on ~60,000 sentences

Disregarded sentences with < 7 words:
– “Mike Stevens says it’s not real”
– “We’ve been hearing about it”

Page 13: Learning Within-Sentence Semantic Coherence

Experiments

Learners:
– C5.0 decision tree
– Boosting decision stumps with AdaBoost.MH

Methodology:
– 5-fold cross-validation on ~60,000 sentences
– Boosting for 300 rounds
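AdaBoost.MH is the multiclass/multilabel generalization; for this two-class task the core idea reduces to standard binary AdaBoost over one-feature threshold stumps. The following is an illustrative from-scratch sketch, not the authors' implementation:

```python
import math

def train_stump(X, y, w):
    """Find the single-feature threshold stump with lowest weighted error.
    Returns (error, feature index, threshold, sign)."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if sign * (1 if xi[j] > thr else -1) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, rounds=10):
    """Boost decision stumps on labels in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr, sign = train_stump(X, y, w)
        err = max(err, 1e-10)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, sign))
        # Reweight: upweight examples the stump got wrong.
        w = [wi * math.exp(-alpha * yi * sign * (1 if xi[j] > thr else -1))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model

def predict(model, x):
    """Sign of the weighted vote of all stumps."""
    s = sum(a * sign * (1 if x[j] > thr else -1)
            for a, j, thr, sign in model)
    return 1 if s >= 0 else -1
```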

Page 14: Learning Within-Sentence Semantic Coherence

Results

Feature Set                                   Classification Accuracy
Q mean, median, min, max (previous work)      73.39 ± 0.36
Likelihood ratio                              77.76 ± 0.49
All but likelihood ratio                      80.37 ± 0.42
All features                                  80.37 ± 0.46
Likelihood ratio + non-Q

Page 15: Learning Within-Sentence Semantic Coherence

Shannon-Style Experiment

50 sentences:
– ½ “real” and ½ trigram-generated
– Stopwords replaced by dashes

30 participants:
– Average accuracy of 73.77% ± 6
– Best individual accuracy 84%

Our classifier:
– Accuracy of 78.9% ± 0.42

Page 16: Learning Within-Sentence Semantic Coherence

Summary

Introduced a set of statistical features which capture aspects of semantic coherence

Trained a decision tree to classify with accuracy of 80%

Next step: incorporate features into exponential LM

Page 17: Learning Within-Sentence Semantic Coherence

Future Work

Combat data sparsity:
– Confidence intervals
– Different correlation statistic
– Stemming or clustering the vocabulary

Evaluate derived features:
– Incorporate into an exponential language model
– Evaluate the model on a practical application

Page 18: Learning Within-Sentence Semantic Coherence

Agreement among Participants

Page 19: Learning Within-Sentence Semantic Coherence

Expected Perplexity Reduction

Semantic coherence feature fires on:
– 78% of broadcast news sentences
– 18% of trigram-generated sentences

Kullback-Leibler divergence: .814

Average perplexity reduction per word = .0419 (2^.814/21) per sentence?

Features modify the probability of the entire sentence

Effect of the feature on per-word probability is small

Page 20: Learning Within-Sentence Semantic Coherence

Distribution of Likelihood Ratio

[Figure: density of the likelihood ratio feature for trigram-generated (dashed) vs. Broadcast News sentences]

Page 21: Learning Within-Sentence Semantic Coherence

Discrimination Task

Natural sentence:
– but it doesn't feel like a sacrifice in a sense that you're really saying this is you know i'm meant to do things the right way and you trust it and tell the truth

Trigram-generated:
– they just kind of free trade agreements which have been living in a ziplock bag that you say that i see university japan's daiwa bank stocks step though

Page 22: Learning Within-Sentence Semantic Coherence

Q Values at Distance 1

[Figure: density of Q values at distance 1 for trigram-generated (dashed) vs. Broadcast News sentences]

Page 23: Learning Within-Sentence Semantic Coherence

Q Values at Distance 3

[Figure: density of Q values at distance 3 for trigram-generated (dashed) vs. Broadcast News sentences]

Page 24: Learning Within-Sentence Semantic Coherence

Outline

The problem of semantic (in)coherence

Incorporating this into the whole-sentence exponential LM

Finding better features for this model using machine learning

Semantic coherence features

Experiments and results