Unsupervised learning of Natural languages Eitan Volsky Yasmine Meroz

Introduction

• Grammar learning methods can be grouped into two kinds: supervised and unsupervised.

• Roughly speaking, unsupervised methods use only pre-tagged sentences, while supervised methods are first initialized with structured sentences.

• Other forms of supervision exist as well, for example probabilistic grammars.

• Supervised methods clearly outperform unsupervised ones, but they are much more time consuming, and in many cases no treebank or corpus suitable for the specific task can be found.

• Examples:
– Deciphering text in an unknown language
– DNA sequence analysis

The bootstrapping process

• The process generates the syntactic structure of a sentence starting from scratch (when it is completely unsupervised).

• The structure has to be useful, thus arbitrary, random or incomplete structures are avoided.

• The system should try to minimize the amount of information it needs in order to learn structure.

The Scope of the article

• The article presents two unsupervised learning frameworks:
– EMILE 4.1
– ABL (Alignment-Based Learning)

• We'll present the frameworks and the algorithms that underlie them, and compare them on the ATIS and the OVIS corpora.

EMILE 4.1

• Some definitions first:

• The sentence: David makes tea

“David (.) tea” is a “Context”: the sentence with a gap (.) where something was removed

“makes” is an “Expression”: the part that fills the gap
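The definitions above can be made concrete with a small sketch that lists every way of splitting a sentence into a context and an expression. It assumes that any contiguous run of words may serve as an expression and that "(.)" marks the gap; the function name `context_expression_pairs` is hypothetical and not part of EMILE.

```python
def context_expression_pairs(sentence):
    """Return every (context, expression) split of a sentence.

    Assumption: an expression is any contiguous run of words, and the
    context is the rest of the sentence with "(.)" marking the gap.
    """
    words = sentence.split()
    pairs = []
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            expression = " ".join(words[start:end])
            context = " ".join(words[:start] + ["(.)"] + words[end:])
            pairs.append((context, expression))
    return pairs


pairs = context_expression_pairs("David makes tea")
for context, expression in pairs:
    print(f"{context!r:20} {expression!r}")
```

One of the printed pairs is the slide's example: context "David (.) tea" with expression "makes".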

Substitution Classes - intuition

• If a language has a CFG, then expressions that are generated from the same non-terminal can substitute for each other in every context where that non-terminal is a valid constituent.

• If we have a sufficiently rich sample, we can expect to find classes of expressions that cluster together.

Primary and characteristic contexts and expressions

• A grammatical type T is defined as a pair <TC, TE>, where TC is a set of contexts and TE is a set of expressions.

• Expressions and contexts from those sets are called primary.

• A characteristic context for T is a context which appears only with expressions of type T. Characteristic expressions are defined in the same way.

Example

• “walk” can be both a noun and a verb, so it is characteristic neither for noun phrases nor for verb phrases.

• “thing” can only be a noun, thus it appears only in noun phrases.

• “thing” is characteristic for the type “noun”.

Shallow languages within Chomsky Hierarchy

[Figure: the Chomsky hierarchy (type 0, context-sensitive, context-free, regular) with the class of shallow languages drawn as a separate region]

• Shallow languages seem to be an independent category

Shallow languages - first criterion

• A grammar G has context separability if each type of G has a characteristic context, and expression separability if each type of G has a characteristic expression.

• A shallow language has to be context and expression separable.

Shallow languages - second criterion

• A shallow language has to have a sample set of sentences S that induces characteristic contexts and expressions for all types of GL. Such a set is called a characteristic sample.

• For all sentences s of this sample set:

K(s) < log(|G|)

where K(s) is the Kolmogorov complexity (the descriptive length) of s

Why Shallow languages?

• If the sample is drawn under a simple distribution (one dominated by a recursively enumerable distribution), the last criterion guarantees that the sample can be learnt (its grammar induced) in time polynomial in |G|.

• Shallow grammars can be learned efficiently from positive examples, which makes the poverty-of-the-stimulus argument, based on Gold's results, unconvincing.

Natural Languages are Shallow

• It is claimed (but not proven) that natural languages are shallow.

• Natural languages have large lexicons and relatively few rules.

• Their shallowness ensures that if we sample enough sentences, the sample will be characteristic with high confidence.

How does EMILE really work?

• Two phases:
– Clustering
– Rule Induction

Corpus

• John makes tea.
• John likes coffee.
• John is eating.
• John likes tea.
• John likes eating.
• John makes coffee.

• …

• …

Clustering

Rows are expressions, columns are contexts; x marks a pair that occurs in the corpus, - a pair that does not.

expression   John (.) tea   John (.) coffee   John (.) eating   John makes (.)   John likes (.)   John is (.)
makes        x              x                 -                 -                -                -
likes        x              x                 x                 -                -                -
is           -              -                 x                 -                -                -
tea          -              -                 -                 x                x                -
coffee       -              -                 -                 x                x                -
eating       -              -                 -                 -                x                x
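As an illustration of the clustering step, the context/expression matrix for the six-sentence corpus can be built as follows. This is a minimal sketch restricted to single-word expressions, which is a simplification of the matrix EMILE builds; the function name `single_word_matrix` is hypothetical.

```python
from collections import defaultdict

corpus = [
    "John makes tea", "John likes coffee", "John is eating",
    "John likes tea", "John likes eating", "John makes coffee",
]


def single_word_matrix(sentences):
    """Map each single-word expression to the set of contexts it occurs in."""
    matrix = defaultdict(set)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            # Replace the word at position i by the gap marker "(.)".
            context = " ".join(words[:i] + ["(.)"] + words[i + 1:])
            matrix[word].add(context)
    return matrix


matrix = single_word_matrix(corpus)
for expression in ["makes", "likes", "is", "tea", "coffee", "eating"]:
    print(f"{expression:7} {sorted(matrix[expression])}")
```

Expressions that share many contexts (here "tea" and "coffee", or "makes" and "likes") are the candidates for a common cluster.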

Identification of clusters - settings

• The identification of clusters depends on the following settings:
– Total_support%
– Context_support%
– Expression_support%

Suppose that:

Total_support% = Context_support% = 75%

Expression_support% = 80%

Contexts: John makes (.), John likes (.), John drinks (.), John buys (.)

expression   contexts it appears in
tea          4 of 4
coffee       4 of 4
lemonade     3 of 4
soup         4 of 4
apples       1 of 4
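The slides do not spell out how EMILE compares a candidate cluster with the three support settings, so the sketch below only computes the fill ratios that such thresholds would be checked against. The counts are read off the table above; the names `expression_fill` and `total_fill` are hypothetical, and treating the settings as thresholds on these ratios is an assumption.

```python
# Number of the four contexts each expression appears in (from the table).
N_CONTEXTS = 4
counts = {"tea": 4, "coffee": 4, "lemonade": 3, "soup": 4, "apples": 1}


def expression_fill(expression):
    """Fraction of the contexts in which one expression appears."""
    return counts[expression] / N_CONTEXTS


def total_fill(expressions):
    """Fraction of filled cells in the block spanned by these expressions."""
    filled = sum(counts[e] for e in expressions)
    return filled / (N_CONTEXTS * len(expressions))


for e in counts:
    print(f"{e:9} {expression_fill(e):.2f}")
print("all five:      ", total_fill(list(counts)))
print("without apples:", total_fill(["tea", "coffee", "lemonade", "soup"]))
```

Under this reading, "apples" is the obvious outlier: it fills only a quarter of its row, and dropping it raises the overall fill of the block.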

Rule Induction

• T => s0 [T1] s1 [T2] s2 [T3] s3

• EMILE attempts to transform the collection of derivation rules it has found into a CFG consisting of those rules.

• [0] => what is a family fare
• [19] => a family fare

• [0] => what is [19]
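The step from the first two rules to the third can be sketched as a substitution: wherever the body of one rule contains an expression that belongs to another type, that expression is replaced by the type's label. The helper `abstract_rule` below is hypothetical and only illustrates this single step, not EMILE's full rule induction.

```python
def abstract_rule(rule_body, type_label, expression):
    """Replace the first occurrence of `expression` in `rule_body` by [type_label]."""
    body = rule_body.split()
    expr = expression.split()
    for i in range(len(body) - len(expr) + 1):
        if body[i:i + len(expr)] == expr:
            return " ".join(body[:i] + [f"[{type_label}]"] + body[i + len(expr):])
    return rule_body  # expression not found: leave the rule unchanged


result = abstract_rule("what is a family fare", 19, "a family fare")
print(result)  # what is [19]
```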

ABL (Alignment-Based Learning)

• ABL is based on Harris’ principle of substitutability (1951) :

All constituents of the same kind can be replaced by each other.

ABL uses a reversed version of this principle :

If parts of sentences can be substituted by each other, they are constituents of the same type.

The Algorithm

• The output of the algorithm is a labeled, bracketed version of the input corpus.

• The model learns by comparing all sentences in the input corpus to each other in pairs.

• Two phases:
– Alignment learning
– Selection learning

A Comparison of two sentences

• The comparison of two sentences falls into one of three categories:

– All words in the two sentences are the same
– The sentences are completely unequal
– Some words are the same in both sentences and some are not

Alignment Learning

• What is a family fare ?

• What is the payload of an African swallow ?

• The unequal parts of the two sentences are possible constituents.

• {a family fare, the payload of an African swallow}
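A minimal sketch of this step: align two sentences word by word and collect the parts that differ. It uses Python's `difflib.SequenceMatcher` as a stand-in for the alignment procedure, which is an assumption; ABL itself aligns sentences via the edit distance. The function name `unequal_parts` is hypothetical.

```python
from difflib import SequenceMatcher


def unequal_parts(sentence_a, sentence_b):
    """Return the pairs of word spans that differ between two sentences."""
    a, b = sentence_a.split(), sentence_b.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    parts = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag != "equal":
            # Each non-equal block is a pair of candidate constituents.
            parts.append((" ".join(a[a0:a1]), " ".join(b[b0:b1])))
    return parts


hypotheses = unequal_parts(
    "What is a family fare",
    "What is the payload of an African swallow",
)
print(hypotheses)
```

For this pair of sentences the shared prefix "What is" is linked, and the remaining spans are returned as one pair of candidate constituents of the same type.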

The Edit Distance

• The edit distance is the minimum edit cost needed to transform one sentence into another (Wagner & Fischer 1974).

• The algorithm that finds the edit distance also finds the longest common subsequence, and it gives an estimate of how far apart the linked parts of the two sentences are.
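For reference, the dynamic-programming computation of the edit distance can be sketched as below, applied to words rather than characters. The unit insertion and deletion costs and the `substitution_cost` parameter are assumptions for illustration; the slides do not state which costs ABL uses.

```python
def edit_distance(a, b, substitution_cost=1):
    """Wagner-Fischer dynamic programme over two sequences.

    Insertions and deletions cost 1; `substitution_cost` is an assumed,
    adjustable parameter. Only two rows of the table are kept in memory.
    """
    previous = list(range(len(b) + 1))  # distance from empty prefix of a
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                                    # delete x
                current[j - 1] + 1,                                 # insert y
                previous[j - 1] + (0 if x == y else substitution_cost),
            ))
        previous = current
    return previous[-1]


s1 = "What is a family fare".split()
s2 = "What is the payload of an African swallow".split()
print(edit_distance(s1, s2))
```

With `substitution_cost=2` a substitution is never cheaper than a deletion plus an insertion, so the distance equals the two lengths added together minus twice the length of the longest common subsequence.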

Example

From (San Francisco to)1 Dallas ()2
From ()1 Dallas (to San Francisco)2

From (San Francisco)1 to (Dallas)2
From (Dallas)1 to (San Francisco)2

Overlapping Constituents

• I didn’t take my passport.

• I didn’t like this plane.

• If {this plane} was already stored as a constituent, “like this plane” overlaps with it, and we cannot assign them different types, because that would prevent us from inducing a CFG at a later stage.

Selection Learning

• In the Selection Learning phase, we try to get rid of the overlapping constituents by finding the best combination of constituents of each type.

• Three ways to compute a constituent's probability:
– ABL : first-is-correct
– ABL : leaf
– ABL : branch

Selection Learning (cont'd)

• Once the probabilities of the overlapping constituents have been computed, the probability of each combination of constituents is computed as the geometric mean of its members' probabilities. A Viterbi-style optimization is used to do this efficiently.
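To illustrate the scoring, the sketch below computes the geometric mean of each candidate combination and picks the highest-scoring one. It enumerates the candidates by brute force rather than using a Viterbi-style search, and the probabilities are made-up illustrative values, not results from ABL.

```python
import math


def geometric_mean(probabilities):
    """Geometric mean of strictly positive probabilities."""
    log_sum = sum(math.log(p) for p in probabilities)
    return math.exp(log_sum / len(probabilities))


def best_combination(combinations):
    """Pick the combination with the highest geometric mean (brute force)."""
    return max(combinations, key=geometric_mean)


# Hypothetical probabilities for three competing combinations of constituents.
candidates = [[0.5, 0.5], [0.9, 0.2], [0.6]]
for combo in candidates:
    print(combo, round(geometric_mean(combo), 3))
best = best_combination(candidates)
```

Unlike a plain product, the geometric mean does not automatically shrink as more constituents are added, so combinations of different sizes can be compared on the same scale.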

Theoretical Comparison

• ABL is much greedier, and thus learns faster and better on small corpora, but it cannot learn on large corpora for efficiency reasons. It stores all possible constituents, and only then selects the best ones.

• EMILE was developed for large corpora (more than 100K sentences) and is much less greedy. It finds a grammar rule only when enough information has been found to support it.

Conclusions

• Both frameworks work reasonably well as unsupervised learning models.

• Their underlying ideas match rather well.

• It should be possible to develop a hybrid version, which uses the best qualities of both algorithms.