Language Model Methods and Metrics

Gary Luu, Ryan Fortune

Skip N-grams

• Interpolated with a bigram model
• Capture the influence of words further away without increasing dimensionality
• Learning curve
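The interpolation in the bullets above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes unsmoothed maximum-likelihood estimates, a skip distance of 2, and an arbitrary interpolation weight `lam=0.7`. The function names (`count_pairs`, `cond_prob`, `interpolated_prob`) and the toy corpus are hypothetical.

```python
from collections import Counter

def count_pairs(sentences, distance):
    """Count (context, word) pairs where the context sits `distance` tokens back."""
    pairs, contexts = Counter(), Counter()
    for sent in sentences:
        for i in range(distance, len(sent)):
            pairs[(sent[i - distance], sent[i])] += 1
            contexts[sent[i - distance]] += 1
    return pairs, contexts

def cond_prob(pairs, contexts, context, word):
    """Maximum-likelihood P(word | context); 0.0 for an unseen context."""
    if contexts[context] == 0:
        return 0.0
    return pairs[(context, word)] / contexts[context]

def interpolated_prob(word, history, bigram, skip, lam=0.7, distance=2):
    """lam * P_bigram(word | w_{i-1}) + (1 - lam) * P_skip(word | w_{i-distance})."""
    p_bi = cond_prob(*bigram, history[-1], word) if len(history) >= 1 else 0.0
    p_skip = (cond_prob(*skip, history[-distance], word)
              if len(history) >= distance else 0.0)
    return lam * p_bi + (1 - lam) * p_skip

# Toy corpus (illustrative only).
corpus = [
    ["the", "cat", "sat", "down"],
    ["the", "dog", "sat", "down"],
    ["the", "cat", "ran", "off"],
]
bigram = count_pairs(corpus, 1)   # adjacent-word counts
skip = count_pairs(corpus, 2)     # counts that skip one intervening word
p = interpolated_prob("sat", ["the", "cat"], bigram, skip, lam=0.7)
```

Both component models condition on a single word, so the skip model adds longer-range evidence while each table stays the size of a bigram table. A real system would also need smoothing for unseen pairs.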

Skip N-gram Learning Curve

Content Word Language Model

• Help predict the next word using the last uncommon word, to try to capture context
• Found a list of the 250 most common words
• Tried different sizes for the common-word list
• Interpolated with other language models, since this model alone wouldn't maintain grammar
• P(w|C)
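One way to read the bullets above is that C is the most recent word not on the common-word list. The sketch below follows that reading. It is an assumption-laden illustration: the common-word set is passed in by hand, estimates are unsmoothed, and the names `train_content_model`, `last_content_word`, and `content_prob` are hypothetical.

```python
from collections import Counter, defaultdict

def train_content_model(sentences, common_words):
    """Count which words follow each 'last content word' C (assumed reading of P(w|C))."""
    counts = defaultdict(Counter)
    for sent in sentences:
        last_content = None
        for w in sent:
            if last_content is not None:
                counts[last_content][w] += 1
            if w not in common_words:
                last_content = w  # common words never become the context
    return counts

def last_content_word(history, common_words):
    """Return the most recent word in `history` that is not a common word."""
    for w in reversed(history):
        if w not in common_words:
            return w
    return None

def content_prob(counts, content_word, word):
    """Maximum-likelihood P(word | C); 0.0 if C was never seen as a context."""
    following = counts.get(content_word)
    if not following:
        return 0.0
    return following[word] / sum(following.values())

# Toy data (illustrative only); the slides use a list of 250 common words.
common = {"the"}
corpus = [
    ["storm", "hit", "the", "coast"],
    ["storm", "near", "the", "coast"],
    ["storm", "over", "the", "town"],
]
counts = train_content_model(corpus, common)
context = last_content_word(["storm", "hit", "the"], common)
p = content_prob(counts, context, "coast")
```

Here the context skips over "the" and conditions on "hit" instead. Because this model ignores local word order, its probability would be interpolated with a conventional n-gram model, as the bullets note.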

Content Word Model

Bag Generation Metrics

• Bag generation: NP-hard
• Random-restart greedy hill-climbing
• Stability metric
  • Give the model the correct sentence: does it maintain it as an optimum?
  • The percentage of sentences that remain stable
• Reconstruction metric
  • Needs to be compared against a lucky/random baseline
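The search and the stability metric described above can be sketched as follows. This is a minimal illustration under several assumptions: the scorer is an add-alpha smoothed bigram model, the neighborhood is pairwise word swaps, and any improving swap is accepted immediately. The slides do not specify these details, and all function names are hypothetical.

```python
import itertools
import math
import random
from collections import Counter

def train_bigram_scorer(sentences, alpha=0.1):
    """Return a function scoring a word order by smoothed bigram log-probability."""
    pairs, contexts, vocab = Counter(), Counter(), set()
    for sent in sentences:
        padded = ["<s>"] + list(sent) + ["</s>"]
        vocab.update(padded)
        for a, b in zip(padded, padded[1:]):
            pairs[(a, b)] += 1
            contexts[a] += 1
    v = len(vocab)

    def score(words):
        padded = ["<s>"] + list(words) + ["</s>"]
        return sum(math.log((pairs[(a, b)] + alpha) / (contexts[a] + alpha * v))
                   for a, b in zip(padded, padded[1:]))
    return score

def hill_climb(words, score):
    """Accept improving pairwise swaps until no swap raises the score."""
    current, best = list(words), score(words)
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(len(current)), 2):
            cand = current[:]
            cand[i], cand[j] = cand[j], cand[i]
            s = score(cand)
            if s > best + 1e-9:  # tolerance guards against floating-point ties
                current, best, improved = cand, s, True
    return current, best

def reconstruct(bag, score, restarts=10, seed=0):
    """Random-restart hill-climbing: keep the best local optimum found."""
    rng = random.Random(seed)
    best_order, best_score = None, -math.inf
    for _ in range(restarts):
        start = list(bag)
        rng.shuffle(start)
        order, s = hill_climb(start, score)
        if s > best_score:
            best_order, best_score = order, s
    return best_order

def stability(sentences, score):
    """Fraction of correct sentences that hill-climbing leaves unchanged."""
    stable = sum(1 for sent in sentences if hill_climb(sent, score)[0] == list(sent))
    return stable / len(sentences)

corpus = [["the", "cat", "sat"], ["a", "dog", "ran"]]
score = train_bigram_scorer(corpus)
stable_fraction = stability(corpus, score)
climbed, _ = hill_climb(["the", "sat", "cat"], score)
rebuilt = reconstruct(["sat", "the", "cat"], score)
```

Even in this toy setting some starting orders stall at a local optimum, which is why the search uses random restarts. A reconstruction score is only meaningful relative to how often a random ordering would match by chance.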

Bag Generation Metrics

Clustering - IBMFullPredict

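• Clustering overview
• Perplexity down to 107 with a million-sentence corpus
• With W_i denoting the cluster of word w_i:

$$P_{\text{ibmfullpredict}}(w_i \mid w_{i-2} w_{i-1}) = \big[\lambda P(W_i \mid w_{i-2} w_{i-1}) + (1-\lambda) P(W_i \mid W_{i-2} W_{i-1})\big] \times \big[\mu P(w_i \mid w_{i-2} w_{i-1}, W_i) + (1-\mu) P(w_i \mid W_{i-2} W_{i-1}, W_i)\big]$$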
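The IBMFullPredict formula above can be evaluated directly from four count tables, one per component distribution. The sketch below is illustrative only: it assumes a hand-written word-to-cluster map (the slides do not say how clusters were obtained), unsmoothed maximum-likelihood estimates, and arbitrary weights `lam` and `mu`. The class and method names are hypothetical.

```python
from collections import Counter

def mle(pairs, contexts, context, outcome):
    """Maximum-likelihood P(outcome | context); 0.0 for an unseen context."""
    if contexts[context] == 0:
        return 0.0
    return pairs[(context, outcome)] / contexts[context]

class IBMFullPredict:
    def __init__(self, sentences, cluster_of, lam=0.5, mu=0.5):
        self.cluster_of, self.lam, self.mu = cluster_of, lam, mu
        # One (pair counts, context counts) table per component distribution.
        names = ("W|ww", "W|WW", "w|wwW", "w|WWW")
        self.tables = {name: (Counter(), Counter()) for name in names}
        for sent in sentences:
            for i in range(2, len(sent)):
                w2, w1, w = sent[i - 2], sent[i - 1], sent[i]
                W2, W1, W = cluster_of[w2], cluster_of[w1], cluster_of[w]
                self._add("W|ww", (w2, w1), W)
                self._add("W|WW", (W2, W1), W)
                self._add("w|wwW", (w2, w1, W), w)
                self._add("w|WWW", (W2, W1, W), w)

    def _add(self, name, context, outcome):
        pairs, contexts = self.tables[name]
        pairs[(context, outcome)] += 1
        contexts[context] += 1

    def _p(self, name, context, outcome):
        return mle(*self.tables[name], context, outcome)

    def prob(self, w, w2, w1):
        """P(w | w2 w1): interpolated cluster prediction times interpolated word prediction."""
        c = self.cluster_of
        W2, W1, W = c[w2], c[w1], c[w]
        p_cluster = (self.lam * self._p("W|ww", (w2, w1), W)
                     + (1 - self.lam) * self._p("W|WW", (W2, W1), W))
        p_word = (self.mu * self._p("w|wwW", (w2, w1, W), w)
                  + (1 - self.mu) * self._p("w|WWW", (W2, W1, W), w))
        return p_cluster * p_word

# Hand-assigned toy clusters and corpus (illustrative only).
cluster_of = {"the": "DET", "a": "DET", "cat": "NOUN",
              "dog": "NOUN", "sat": "VERB", "ran": "VERB"}
corpus = [["the", "cat", "sat"], ["a", "dog", "ran"], ["the", "dog", "sat"]]
model = IBMFullPredict(corpus, cluster_of)
p = model.prob("ran", "the", "cat")
```

The trigram "the cat ran" never occurs in the toy corpus, so a word-only trigram estimate would be zero. The cluster-conditioned terms still assign it probability because "ran" has been seen after a DET NOUN context.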

Learning Curve for IBMFullPredict