CONSTRAINED CONDITIONAL MODELS TUTORIAL Jingyu Chen, Xiao Cheng

INTRODUCTION

Main ideas:
• Idea 1: Modeling
  Separate modeling and problem formulation from algorithms
  • Similar to the philosophy of probabilistic modeling
• Idea 2: Inference
  Keep model simple, make expressive decisions (via constraints)
  • Unlike probabilistic modeling, where models become more expressive
  • Inject background knowledge
• Idea 3: Learning
  Expressive structured decisions can be supported by simply learned models
  • Global inference can be used to amplify the simple models (and even minimal supervision).

Task of interest: Structured Prediction
• Common formulation
  • e.g. HMM, CRF, Structured Perceptron, etc.
• Covers a lot of NLP problems:
  • Parsing; Semantic Parsing; Summarization; Transliteration; Co-reference resolution; Textual Entailment…
• IE problems:
  • Entities, relations, attributes…
• How to improve without incurring performance issues?

Pipeline?
• Very crude approximation to the real problem; propagates error.
• Ignores dependencies:
  • e.g. in relation extraction, the label of an entity depends on the relation it is involved in, and the relation label depends on the labels of its arguments.

Model Formulation
• Typical models
• With CCM we choose

[Equation annotations: Penalty; Violation measure; Regularization; Local dependency, e.g. HMM, CRF]


Constraint expressivity

Multiclass Problem:

One v. All approximation:

Ideal classification, can be expressed through constraints

Implementations
• Modeling: objective function
• Constrained optimization solver: Integer Linear Programming
• Inference: exact ILP, heuristic search, relaxation, dynamic programming
• Learning: learn $w$ and $\rho$; they can be learnt jointly or separately, with semi-supervised learning, etc.

Objective function:

$$\arg\max_{y} \; w^T f(x, y) - \rho^T d(x, y)$$
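To make the objective concrete, here is a minimal sketch that evaluates $w^T f(x, y) - \rho^T d(x, y)$ by enumerating every candidate output. The toy tagging task, the feature function `f`, the violation measure `d`, and all weights are hypothetical and chosen only for illustration; exhaustive enumeration stands in for the ILP or search procedures listed above and only works for tiny problems.

```python
from itertools import product

def ccm_argmax(x, labels, w, f, rho, d):
    """Return the label tuple y maximizing w.f(x, y) - rho.d(x, y) by enumeration."""
    def score(y):
        model = sum(wi * fi for wi, fi in zip(w, f(x, y)))
        penalty = sum(ri * di for ri, di in zip(rho, d(x, y)))
        return model - penalty
    return max(product(labels, repeat=len(x)), key=score)

# Hypothetical toy task: tag each token with "A" or "B".
x = ["a", "b", "a"]
labels = ["A", "B"]

def f(x, y):
    # Single feature: number of positions whose label equals the upper-cased token.
    return [sum(1 for tok, lab in zip(x, y) if tok.upper() == lab)]

def d(x, y):
    # Single constraint "all labels are equal"; violation = number of label changes.
    return [sum(1 for prev, cur in zip(y, y[1:]) if prev != cur)]

unconstrained = ccm_argmax(x, labels, [1.0], f, [0.0], d)
constrained = ccm_argmax(x, labels, [1.0], f, [2.0], d)
```

With a zero penalty the local model wins and the output follows the tokens; with a penalty of 2.0 per violation the constraint dominates and the output becomes uniform.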


How do we use CCM to learn?

EXAMPLE 1: JOINT INFERENCE-BASED LEARNING
Constrained HMM in Information Extraction

Typical work flow
• Define basic classifiers
• Define constraints as linear inequalities
• Combine the two into an objective function

HMMCCM Example
• Information extraction without prior knowledge
• Use HMM

HMMCCM Example

[AUTHOR] Lars Ole Andersen . Program analysis and
[TITLE] specialization for the
[EDITOR] C
[BOOKTITLE] Programming language
[TECH-REPORT] . PhD thesis .
[INSTITUTION] DIKU , University of Copenhagen , May
[DATE] 1994 .

Violates a lot of natural constraints

HMMCCM Example
• Each field must be a consecutive list of words and can appear at most once in a citation.
• State transitions must occur on punctuation marks.
• The citation can only start with AUTHOR or EDITOR.
• The words "pp." and "pages" correspond to PAGE.
• Four-digit numbers of the form 19xx or 20xx are DATE.
• Quotations can appear only in TITLE.
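Several of these constraints can be turned into violation counts over a labeled token sequence. The sketch below is one possible reading, not the tutorial's implementation: the function name `count_violations`, the punctuation test, and the choice to allow a transition when either neighbouring token is punctuation are all assumptions. It is applied to the unconstrained HMM labeling of the Andersen citation from the earlier slide.

```python
import re
import string
from itertools import groupby

PUNCT = set(string.punctuation)

def is_punct(token):
    """True if the token consists only of punctuation characters."""
    return token != "" and all(ch in PUNCT for ch in token)

def count_violations(tokens, labels):
    """Count violations of four of the citation constraints (assumed encodings)."""
    # The citation can only start with AUTHOR or EDITOR.
    start = int(labels[0] not in ("AUTHOR", "EDITOR"))
    # A label change is allowed only next to a punctuation token (assumption).
    transition = sum(
        1 for i in range(1, len(labels))
        if labels[i] != labels[i - 1]
        and not (is_punct(tokens[i - 1]) or is_punct(tokens[i]))
    )
    # Each field is one consecutive block: count fields split into several blocks.
    blocks = [lab for lab, _ in groupby(labels)]
    once = sum(1 for lab in set(blocks) if blocks.count(lab) > 1)
    # Four-digit numbers 19xx / 20xx must be labeled DATE.
    date = sum(
        1 for tok, lab in zip(tokens, labels)
        if re.fullmatch(r"(19|20)\d\d", tok) and lab != "DATE"
    )
    return {"start": start, "transition": transition, "once": once, "date": date}

# The unconstrained HMM output shown on the earlier slide.
segments = [
    ("AUTHOR", "Lars Ole Andersen . Program analysis and"),
    ("TITLE", "specialization for the"),
    ("EDITOR", "C"),
    ("BOOKTITLE", "Programming language"),
    ("TECH-REPORT", ". PhD thesis ."),
    ("INSTITUTION", "DIKU , University of Copenhagen , May"),
    ("DATE", "1994 ."),
]
tokens = [tok for _, text in segments for tok in text.split()]
labels = [lab for lab, text in segments for _ in text.split()]
violations = count_violations(tokens, labels)
```

Under these assumptions the labeling breaks the punctuation-transition constraint at four field boundaries, which is the kind of count that the violation measure $d(x, y)$ feeds into the objective.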

HMMCCM Example
• How do we use constraints with HMM?
• Standard HMM:
  • Learn the joint probability of the label sequence $y$ and the input $x$ (with $y_0$ a start symbol):
    $$P(x, y) = \prod_{i} P(y_i \mid y_{i-1}) \, P(x_i \mid y_i)$$
  • Inference, taking the most likely label sequence:
    $$y^* = \arg\max_{y} P(y \mid x) = \arg\max_{y} P(x, y)$$

HMMCCM Example
• New objective function involving constraints
• Penalize the probability of a sequence if it violates a constraint

[Equation annotation: penalty for each time the constraint is violated]

HMMCCM Example
• Transform to linear model
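The slide's own derivation did not survive extraction. The following is a sketch of the standard argument, under the assumption that $f(x, y)$ counts transition and emission events and $w$ holds the corresponding log-probabilities; the symbols $n_{a \to b}$ and $n_{a \Rightarrow o}$ are introduced here for illustration only.

```latex
\begin{align*}
\log P(x, y)
  &= \sum_{i} \log P(y_i \mid y_{i-1}) + \sum_{i} \log P(x_i \mid y_i) \\
  &= \sum_{a, b} n_{a \to b}(y) \, \log P(b \mid a)
   + \sum_{a, o} n_{a \Rightarrow o}(x, y) \, \log P(o \mid a) \\
  &= w^T f(x, y), \\[4pt]
\text{penalized score:}\quad
  & w^T f(x, y) - \sum_{k} \rho_k \, d_k(x, y) = w^T f(x, y) - \rho^T d(x, y).
\end{align*}
```

Here $n_{a \to b}(y)$ is the number of transitions from label $a$ to label $b$ in $y$, and $n_{a \Rightarrow o}(x, y)$ is the number of times label $a$ emits token $o$. Taking logs turns the product of probabilities into a dot product, so each constraint penalty simply subtracts $\rho_k$ once per violation.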

HMMCCM Example
• We need to learn the new parameters that maximize the scoring function
• Although the scoring function is no longer a log-likelihood of the dataset, it is still a smooth concave function with a unique global maximum, attained where the gradient is zero.

HMMCCM Example

Learning the penalties reduces to simple counting: estimate the probability of each constraint being violated
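A minimal sketch of the counting step follows. The slide's formula is not recoverable, so the mapping from violation probability to penalty used here, $\rho_k = -\log p_k$ with add-one smoothing, is an assumption for illustration; the function name `estimate_penalties` is also hypothetical.

```python
import math

def estimate_penalties(violation_counts, smoothing=1.0):
    """Estimate one penalty per constraint from gold-labeled examples.

    violation_counts[n][k] is how often constraint k is violated in gold
    example n. The penalty is -log of the smoothed fraction of examples
    that violate the constraint (an assumed choice, not the tutorial's).
    """
    n_examples = len(violation_counts)
    n_constraints = len(violation_counts[0])
    penalties = []
    for k in range(n_constraints):
        violated = sum(1 for row in violation_counts if row[k] > 0)
        p = (violated + smoothing) / (n_examples + 2 * smoothing)
        penalties.append(-math.log(p))
    return penalties

# Hypothetical counts: constraint 0 is never violated, constraint 1 in half the data.
rho = estimate_penalties([[0, 1], [0, 0], [0, 1], [0, 0]])
```

A constraint that the gold data never violates receives a larger penalty than one that is violated often, so rarely broken rules behave almost like hard constraints.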


Are there other ways to learn?

Can this paradigm be generalized?


TRAINING PARADIGMS

Training paradigms

[Diagram labels: Decompose, Learn, Inference]

Prior knowledge: Features vs. Constraints

                        Feature               Constraint
Data dependent          Yes                   No (if not learnt)
Learnable               Yes                   Yes
Size                    Large                 Small
Improvement approach    Higher order model    Post-processing for L+I
Domain
Penalty type            Soft                  Hard & Soft
Common usage            Local                 Global
Formulation             Propositional/ FOL/

Comparison with MLN
• In MLNs, constraints are formulated as explicit probabilities, jointly with the overall distribution:
  • e.g.
• Constraints in CCM are formulated as linear inequalities
  • e.g.
• Theoretically the same, very different in practice

Training paradigms
• Learning + Inference (L+I): train with some constraints, apply all constraints only in inference
  • No need to retrain an existing system
  • Fast and modular
• Inference-Based Training (IBT): train jointly with constraints and dependencies (e.g. Graphical Models)
  • Better for strong interactions between output labels
• Other training paradigms:
  • Pipeline-like sequential model [Roth, Small, Titov: AI&Stat’09]
  • Constraints Driven Learning (CODL) [Chang et al. ’07, ’12]


Which paradigm is better?

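Algorithmic view of the differences

For each iteration
  For each $(x, Y_{GOLD})$ in the training data
    $Y_{PRED} = \arg\max_{y} \; w^T f(x, y) - \rho^T d(x, y)$    (the $-\rho^T d(x, y)$ term is used during training in IBT only)
    If $Y_{PRED} \neq Y_{GOLD}$
      $w = w + f(x, Y_{GOLD}) - f(x, Y_{PRED})$
    endif
  endfor
endfor

L+I: train without the penalty term, then predict at decision time with
$Y_{PRED} = \arg\max_{y} \; w^T f(x, y) - \rho^T d(x, y)$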
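The difference between the two paradigms can be sketched as a single training loop with a switch. This is a toy structured perceptron under stated assumptions: the feature map `features`, the violation measure `too_many_a`, the data, and the perceptron update are illustrative stand-ins, and inference is done by brute-force enumeration rather than ILP.

```python
from itertools import product

def features(x, y):
    """Hypothetical feature map: counts of (token, label) pairs."""
    counts = {}
    for tok, lab in zip(x, y):
        counts[(tok, lab)] = counts.get((tok, lab), 0) + 1
    return counts

def predict(w, x, labels, rho, d):
    """arg max over all label tuples of w.f(x, y) - rho * d(x, y)."""
    def score(y):
        model = sum(w.get(k, 0.0) * v for k, v in features(x, y).items())
        return model - rho * d(x, y)
    return max(product(labels, repeat=len(x)), key=score)

def train(data, labels, d, rho, use_constraints, epochs=5):
    """IBT if use_constraints is True (penalty inside training), else L+I."""
    w = {}
    for _ in range(epochs):
        for x, y_gold in data:
            y_pred = predict(w, x, labels, rho if use_constraints else 0.0, d)
            if y_pred != tuple(y_gold):
                # Perceptron update: promote gold features, demote predicted ones.
                for k, v in features(x, y_gold).items():
                    w[k] = w.get(k, 0.0) + v
                for k, v in features(x, y_pred).items():
                    w[k] = w.get(k, 0.0) - v
    return w

def too_many_a(x, y):
    # Hypothetical constraint: label "A" is used at most once.
    return max(0, y.count("A") - 1)

data = [(("a", "b"), ("A", "B")), (("b", "a"), ("B", "A"))]
labels = ("A", "B")
rho = 1.0

w_li = train(data, labels, too_many_a, rho, use_constraints=False)
w_ibt = train(data, labels, too_many_a, rho, use_constraints=True)

# Both paradigms apply the constraints at decision time.
pred_li = [predict(w_li, x, labels, rho, too_many_a) for x, _ in data]
pred_ibt = [predict(w_ibt, x, labels, rho, too_many_a) for x, _ in data]
```

In L+I the weights never see the constraints, so an existing classifier can be reused unchanged; in IBT every update depends on the constrained prediction, which makes training more expensive because constrained inference runs inside the loop.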

L+I vs. IBT tradeoffs

[Plot; axis label: # of features]

In some cases problems are hard due to lack of training data.
• Semi-supervised learning

Choice of paradigm
• IBT:
  • Better when the interaction between output labels is strong
• L+I:
  • Faster computationally
  • Modular: no need to retrain existing classifiers, and works with simple models such as

PARADIGM 2: LEARNING + INFERENCE
An example with Entity-Relation Extraction

Entity-Relation Extraction [RothYi07]

Dole's wife, Elizabeth, is a native of N.C.
• Entities: E1 = Dole, E2 = Elizabeth, E3 = N.C.
• Relations: R12, R23

Decision time inference

Entity-Relation Extraction [RothYi07]

• Formulation 1: Joint Global Model
  Intractable to learn; needs decomposition


Entity-Relation Extraction [RothYi07]

• Formulation 2: Local learning + global inference

Entity-Relation Extraction [RothYi07]

Cost function:

c{E1 = per}· x{E1 = per} + c{E1 = loc}· x{E1 = loc} + … + c{R12 = spouse_of}· x{R12 = spouse_of} + … + c{R12 = ∅}· x{R12 = ∅} + …

Entity variables: E1 (Dole), E2 (Elizabeth), E3 (N.C.)
Relation variables: R12, R21, R23, R32, R13, R31


Entity-Relation Extraction [RothYi07]

Exactly one label for each relation and entity

Relation and entity type constraints

Integral constraints, in effect boolean

Entity-Relation Extraction [RothYi07]

• Each entity is either a person, organization or location (or none of these):
  x{E1 = per} + x{E1 = loc} + x{E1 = org} + x{E1 = ∅} = 1

• (R12 = spouse_of) → (E1 = person) ∧ (E2 = person)
  x{R12 = spouse_of} ≤ x{E1 = per}
  x{R12 = spouse_of} ≤ x{E2 = per}
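The effect of these constraints can be seen on a tiny instance of the Dole example. The sketch below minimizes the total cost by enumerating every joint assignment instead of calling an ILP solver; all cost values, the `born_in` relation, and its argument types are hypothetical. Because each variable takes exactly one label in the enumeration, the "exactly one label" constraint holds by construction.

```python
from itertools import product

# Required argument types per relation label (born_in is a made-up example).
ARG_TYPES = {"spouse_of": ("per", "per"), "born_in": ("per", "loc")}

def joint_inference(entity_cost, relation_cost):
    """Return the cheapest (entities, relations) assignment satisfying ARG_TYPES."""
    ents, rels = list(entity_cost), list(relation_cost)
    best, best_cost = None, float("inf")
    for e_labels in product(*(entity_cost[e] for e in ents)):
        e_assign = dict(zip(ents, e_labels))
        for r_labels in product(*(relation_cost[r] for r in rels)):
            r_assign = dict(zip(rels, r_labels))
            # A relation label implies the types of both of its arguments.
            feasible = all(
                lab not in ARG_TYPES
                or (e_assign[a], e_assign[b]) == ARG_TYPES[lab]
                for (a, b), lab in r_assign.items()
            )
            if not feasible:
                continue
            cost = sum(entity_cost[e][l] for e, l in e_assign.items())
            cost += sum(relation_cost[r][l] for r, l in r_assign.items())
            if cost < best_cost:
                best, best_cost = (e_assign, r_assign), cost
    return best

# Hypothetical local classifier costs (lower is better).
entity_cost = {
    "E1": {"per": 0.55, "loc": 0.45, "org": 0.9},  # local guess: loc (wrong)
    "E2": {"per": 0.1, "loc": 0.8, "org": 0.9},
    "E3": {"per": 0.7, "loc": 0.2, "org": 0.8},
}
relation_cost = {
    ("E1", "E2"): {"spouse_of": 0.1, "born_in": 0.8, "none": 0.6},
    ("E2", "E3"): {"spouse_of": 0.9, "born_in": 0.2, "none": 0.5},
}
entities, relations = joint_inference(entity_cost, relation_cost)
```

Taken alone, the entity classifier would label E1 as a location, but the confident `spouse_of` prediction together with the type constraint makes labeling E1 as a person the cheapest consistent assignment.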


Entity-Relation Extraction [RothYi07]

• Entity classification results

• Relation identification results


INNER WORKINGS OF INFERENCE

Constraints Encoding
• Atoms
• Existential quantification
• Negation
• Conjunction
• Disjunction
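The slide's formulas were lost in extraction. The sketch below uses the standard textbook linearizations of Boolean operators over 0/1 variables, which may differ in form from the ones on the slide, and verifies each against its truth table by enumeration. All function names are introduced here for illustration.

```python
from itertools import product

def neg_ok(a, z):
    # z = NOT a   is encoded as   z = 1 - a
    return z == 1 - a

def and_ok(a, b, z):
    # z = a AND b   is encoded as   z <= a, z <= b, z >= a + b - 1
    return z <= a and z <= b and z >= a + b - 1

def or_ok(a, b, z):
    # z = a OR b   is encoded as   z >= a, z >= b, z <= a + b
    return z >= a and z >= b and z <= a + b

def exists_ok(xs):
    # "at least one x_i is true"   is encoded as   sum(x_i) >= 1
    return sum(xs) >= 1

def feasible_z(check, *inputs):
    """All values of z in {0, 1} that satisfy the linear encoding."""
    return [z for z in (0, 1) if check(*inputs, z)]

# Each encoding admits exactly the Boolean result, for every input.
neg_matches = all(feasible_z(neg_ok, a) == [1 - a] for a in (0, 1))
and_matches = all(
    feasible_z(and_ok, a, b) == [a & b] for a, b in product((0, 1), repeat=2)
)
or_matches = all(
    feasible_z(or_ok, a, b) == [a | b] for a, b in product((0, 1), repeat=2)
)
exists_matches = all(
    exists_ok(xs) == any(xs) for xs in product((0, 1), repeat=3)
)
```

Because every operator reduces to linear inequalities over 0/1 variables, any propositional constraint can be compiled into an ILP by introducing one auxiliary variable per sub-formula.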

Integer Linear Programming (ILP)
• Powerful tool, very general
• NP-hard even in the binary case, but efficient for most NLP problems
• If ILP cannot solve the problem efficiently, we can fall back to approximate solutions using heuristic search


SENTENCE COMPRESSION

Sentence Compression Example
Modelling Compression with Discourse Constraints, James Clarke and Mirella Lapata, COLING/SCL 2006

• 1. What is sentence compression?
  • Sentence compression is commonly expressed as a word deletion problem: given an input sentence of words W = w1, w2, ..., wn, the aim is to produce a compression by removing any subset of these words (Knight and Marcu 2002).

A trigram language model: maximize a scoring function by ILP:

• $p_i$: word $i$ starts the compression
• $q_{i,j}$: the sequence $w_i, w_j$ ends the compression
• $x_{i,j,k}$: the trigram $w_i, w_j, w_k$ is in the compression
• $y_i$: word $i$ is in the compression

Each $p$, $q$, $x$, $y$ is either 0 or 1.

Sentential Constraints:
• 1. Disallow the inclusion of modifiers without their head words:
• 2. Force the presence of modifiers when the head is retained in the compression:
• 3. If a verb is present in the compression, then so are its arguments:
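A minimal sketch of how constraints 1 and 3 shape the output, assuming both are written as $y_i \le y_j$ ("word $i$ may be kept only if word $j$ is kept"). To stay short it scores each word independently with made-up values instead of using the trigram language model, and it enumerates all subsets instead of solving an ILP; the sentence and the dependency pairs are hypothetical.

```python
from itertools import product

def compress(words, scores, requires):
    """Pick the best-scoring subset of words.

    requires is a list of (i, j) index pairs encoding y_i <= y_j:
    word i can only be kept if word j is kept.
    """
    best, best_score = None, float("-inf")
    for y in product((0, 1), repeat=len(words)):
        if any(y[i] > y[j] for i, j in requires):
            continue  # violates a modifier or argument constraint
        total = sum(s for s, keep in zip(scores, y) if keep)
        if total > best_score:
            best, best_score = y, total
    return [w for w, keep in zip(words, best) if keep]

words = ["he", "bought", "a", "very", "old", "car"]
scores = [1.0, 2.0, 0.3, 0.4, -1.0, 1.5]  # hypothetical per-word scores
requires = [
    (2, 5),  # "a" modifies "car"
    (3, 4),  # "very" modifies "old"
    (4, 5),  # "old" modifies "car"
    (1, 0),  # verb "bought" needs its subject "he"
    (1, 5),  # verb "bought" needs its object "car"
]
constrained = compress(words, scores, requires)
unconstrained = compress(words, scores, [])
```

Without constraints the scorer keeps "very" while dropping its head "old", producing an ungrammatical output; with the modifier constraint the two words are dropped together.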


Modifier Constraint Example

Sentential Constraints:
• 4. Preserve personal pronouns in the compressed output:

Discourse Constraints:
• 1. The center of a sentence is retained in the compression, and the entity realised as the center in the following sentence is also retained.
  • The center of a sentence is the entity with the highest rank.
  • Entities may be ranked by many features.
  • Ex.: grammatical role (subjects > objects > others).

Discourse Constraints:
• 2. Lexical chain constraints:
  • A lexical chain is a sequence of semantically related words.
  • Often the longest lexical chain is the most important chain.


SEMANTIC ROLE LABELING

Semantic Role Labeling Example:

• What is SRL?
  • SRL identifies all constituents that fill a semantic role, and determines their roles.

General information:
• Both models (the argument identifier and the argument classifier) are trained with SNoW.
• Idea: maximize the scoring function

SRL: Argument Identification
• Use a learning scheme that utilizes two classifiers, one to predict the beginnings of possible arguments, and the other the ends. The predictions are combined to form argument candidates.
• Why:
  • When only shallow parsing is available, the system does not have constituents to begin with. Therefore, conceptually, the system has to consider all possible subsequences.
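The combination step can be sketched as follows, assuming each classifier outputs one score per token and that a fixed threshold decides which positions count as predicted beginnings or ends. The function name, the threshold, and the scores are hypothetical.

```python
def argument_candidates(begin_scores, end_scores, threshold=0.5):
    """Pair every predicted beginning with every predicted end at or after it.

    Returns inclusive (begin, end) token spans.
    """
    begins = [i for i, s in enumerate(begin_scores) if s >= threshold]
    ends = [j for j, s in enumerate(end_scores) if s >= threshold]
    return [(b, e) for b in begins for e in ends if b <= e]

# "I left my pearls to my daughter" (7 tokens), with made-up classifier scores.
begin_scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.1, 0.1]
end_scores = [0.9, 0.1, 0.1, 0.8, 0.1, 0.1, 0.9]
candidates = argument_candidates(begin_scores, end_scores)
```

Only spans bounded by a predicted beginning and a predicted end survive, which cuts the quadratic number of possible subsequences down to a short candidate list for the argument classifier.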

SRL: List of features
• POS tags
• Length
• Verb class
• Head word and POS tag of the head word
• Position
• Path
• Chunk pattern
• Clause relative position
• Clause coverage
• NEG
• MOD

SRL: Constraints
• 1. Arguments cannot overlap with the predicate.
• 2. Arguments cannot exclusively overlap with the clauses.
• 3. If a predicate is outside a clause, its arguments cannot be embedded in that clause.
• 4. No overlapping or embedding arguments.
• 5. No duplicate argument classes for core arguments.
  • Note: conjunction is an exception.
  • [A0 I] [V left] [A1 my pearls] [A2 to my daughter] and [A1 my gold] [A2 to my son].

SRL: Constraints
• 6. If an argument is a reference to some other argument arg, then this referenced argument must exist in the sentence.
• 7. If there is a C-arg argument, then there has to be an arg argument; in addition, the C-arg argument must occur after arg.
  • The label C-arg is then used to specify the continuity of the arguments.
• 8. Given a specific verb, some argument types should never occur.
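As an illustration, the sketch below checks constraints 4, 5 and 7 on a set of labeled candidate spans and picks the best legal labeling by enumeration. The candidate spans, the classifier scores, and the `null` label for rejected candidates are assumptions; the conjunction exception to constraint 5 is ignored, and a real system would use ILP instead of brute force.

```python
from itertools import product

CORE = {"A0", "A1", "A2", "A3", "A4", "A5"}

def overlaps(s1, s2):
    """True if two inclusive (begin, end) spans share a token (covers embedding)."""
    return s1[0] <= s2[1] and s2[0] <= s1[1]

def is_legal(spans, labels):
    chosen = [(s, l) for s, l in zip(spans, labels) if l != "null"]
    # Constraint 4: no overlapping or embedding arguments.
    for i in range(len(chosen)):
        for j in range(i + 1, len(chosen)):
            if overlaps(chosen[i][0], chosen[j][0]):
                return False
    # Constraint 5: no duplicate core argument classes.
    core = [l for _, l in chosen if l in CORE]
    if len(core) != len(set(core)):
        return False
    # Constraint 7: a C-arg needs the same arg earlier in the sentence.
    for span, lab in chosen:
        if lab.startswith("C-"):
            base = lab[2:]
            if not any(l == base and s[1] < span[0] for s, l in chosen):
                return False
    return True

def best_labeling(spans, scores):
    """Highest-scoring legal assignment of one label per candidate span."""
    best, best_score = None, float("-inf")
    for labels in product(*(list(s) for s in scores)):
        if not is_legal(spans, labels):
            continue
        total = sum(s[l] for s, l in zip(scores, labels))
        if total > best_score:
            best, best_score = labels, total
    return best

# Hypothetical candidates for "I left my pearls to my daughter".
spans = [(0, 0), (2, 3), (2, 6), (4, 6)]
scores = [
    {"A0": 0.9, "null": 0.1},
    {"A1": 0.6, "null": 0.4},
    {"A1": 0.7, "null": 0.3},
    {"A2": 0.8, "null": 0.2},
]
labeling = best_labeling(spans, scores)
```

Choosing each candidate's best label independently would tag both (2, 3) and (2, 6) as A1, which overlaps and duplicates a core argument; the constrained search keeps the shorter A1 span so that the A2 span can also be kept.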


SRL Results:

QA
• Questions?