Grammatical processing with LFG and XLE
Ron Kaplan
ARDA Symposium, August 2004
Advanced QUestion Answering for INTelligence



Page 1: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Grammatical processing with LFG and XLE

Ron Kaplan

ARDA Symposium, August 2004

Advanced QUestion Answering for INTelligence

Page 2

Layered Architecture for Question Answering

[Diagram: text sources and questions pass through XLE/LFG parsing to f-structures, then through a KR mapping to conceptual semantics and the target KR. Assertions and queries are matched in the KR&R layer; answers, explanations, and subqueries flow back through composed f-structure templates and XLE/LFG generation to text for the user.]

Page 3

Layered Architecture for Question Answering

[Diagram repeated from the previous slide.]

Page 4

Layered Architecture for Question Answering

[Diagram repeated, annotated with the components at each layer:]

Theories: Lexical Functional Grammar, Ambiguity management, Glue Semantics
Resources: English grammar, Glue lexicon, KR mapping
Infrastructure: XLE, MaxEnt models, Linear deduction, Term rewriting

Page 5

Deep analysis matters… if you care about the answer

Example:

A delegation led by Vice President Philips, head of the chemical division, flew to Chicago a week after the incident.

Question: Who flew to Chicago?

Candidate answers:

- division (the closest noun), head (next closest), V.P. Philips (next): shallow but wrong
- delegation (furthest away, but the Subject of flew): deep and right

"grammatical function"

Page 6

F-structure: localizes arguments

Was John pleased?
  "John was easy to please": Yes
  "John was eager to please": Unknown

"John was easy to please":
  PRED easy<SUBJ, COMP>
  SUBJ John
  COMP [ PRED please<SUBJ, OBJ>; SUBJ someone; OBJ John ]

"John was eager to please":
  PRED eager<SUBJ, COMP>
  SUBJ John
  COMP [ PRED please<SUBJ, OBJ>; SUBJ John; OBJ someone ]

"lexical dependency"

Page 7

Topics

- Basic LFG architecture
- Ambiguity management in XLE
- Pargram project: large-scale grammars
- Robustness
- Stochastic disambiguation
- [Shallow markup]
- [Semantic interpretation]

Focus on the language end, not knowledge

Page 8

The Language Mapping: LFG & XLE

[Diagram: a sentence such as "Tony decided to go." is parsed, via tokens and morphology, into functional structures using an LFG grammar (English, German, etc.), named entities, and a stochastic model; generation maps functional structures back to sentences. XLE provides efficient ambiguity management; knowledge processing sits beyond the language mapping.]

Page 9

Why deep analysis is difficult

Languages are hard to describe:
- Meaning depends on complex properties of words and sequences
- Different languages rely on different properties
- Errors and disfluencies

Languages are hard to compute:
- Expensive to recognize complex patterns
- Sentences are ambiguous
- Ambiguities multiply: explosion in time and space

Page 10

Different patterns code same meaning

The small children are chasing the dog.

English: group, order
[Tree: S over NP (Det the, Adj small, N children) and V' (Aux are, V chasing, NP: Det the, N dog)]

Japanese: group, mark
[Tree: S over NP (Adj tiisai 'small', N kodomotati 'children', particle ga = Sbj), NP (N inu 'dog', particle o = Obj), V oikaketeiru 'are chasing']

Page 11

Different patterns code same meaning (continued)

[English and Japanese trees repeated from the previous slide.]

Warlpiri: mark only
[Tree: S over NP (N kurdujarrarlu 'children-Sbj'), Aux kapala (Present), NP (N maliki 'dog-Obj'), V wajilipinyi 'chase', NP (A witajarrarlu 'small-Sbj'); the constituents are discontinuous.]

All three express: chase(small(children), dog)

  PRED  'chase<Subj, Obj>'
  TENSE Present
  SUBJ  [ PRED children; MOD small ]
  OBJ   [ PRED dog ]

LFG theory: minor adjustments on a universal theme

Page 12

LFG architecture

C(onstituent)-structures and F(unctional)-structures, related by a piecewise correspondence.

C-structure (formal encoding of order and grouping):
[Tree: S over NP (John) and VP (V likes, NP Mary)]

F-structure (formal encoding of grammatical relations):
  PRED  'like<SUBJ, OBJ>'
  TENSE PRESENT
  SUBJ  [ PRED 'John'; NUM SG ]
  OBJ   [ PRED 'Mary'; NUM SG ]

Modularity: nearly decomposable.

Page 13

LFG grammar

Rules:

  S  →  NP            VP
        (↑ SUBJ)=↓    ↑=↓

  VP →  V     (NP)
        ↑=↓   (↑ OBJ)=↓

  NP →  (Det)  N
        ↑=↓    ↑=↓

Lexical entries:

  John   N  (↑ PRED)='John'
            (↑ NUM)=SG

  likes  V  (↑ PRED)='like<SUBJ, OBJ>'
            (↑ SUBJ NUM)=SG
            (↑ SUBJ PERS)=3

Context-free rules define valid c-structures (trees). Annotations on the rules give constraints that the corresponding f-structures must satisfy. Satisfiability of the constraints determines grammaticality; the f-structure is the solution of the constraints (if satisfiable).

Page 14

Rules as well-formedness conditions

  S  →  NP            VP
        (↑ SUBJ)=↓    ↑=↓

[Tree: S over NP and VP; the f-unit shared by S and VP has SUBJ [ ] contributed by the NP]

A tree containing S over NP - VP is OK if the f-unit corresponding to the NP node is the SUBJ of the f-unit corresponding to the S node; the same f-unit corresponds to both the S and VP nodes.

If * denotes a particular daughter node:
  ↑ : f-structure of the mother, M(*)
  ↓ : f-structure of the daughter, *

Page 15

Inconsistent equations = Ungrammatical

What's wrong with "They walks"?

[Tree: S over NP (they) and VP (walks)]

  S  →  NP            VP
        (↑ SUBJ)=↓    ↑=↓

  they   (↑ NUM)=PL
  walks  (↑ SUBJ NUM)=SG

Let f be the (unknown) f-structure of the S, s the f-structure of the NP, and v the f-structure of the VP.

  (f SUBJ) = s and (s NUM)=PL  =>  (f SUBJ NUM)=PL

Then (substituting equals for equals):

  f = v and (v SUBJ NUM)=SG  =>  (f SUBJ NUM)=SG

  (f SUBJ NUM)=PL and (f SUBJ NUM)=SG  =>  SG=PL  =>  FALSE

If a valid inference chain yields FALSE, the premises are unsatisfiable: no f-structure.
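The consistency check above can be sketched in a few lines of Python (a minimal illustration, not XLE's actual solver; the tuple-path encoding of equations is assumed for the example):

```python
def solve(equations):
    """Solve a flat list of (path, value) feature equations.

    Returns the resulting f-structure as a dict, or None when two
    equations assign different atomic values to the same path
    (e.g. SG=PL => FALSE => ungrammatical)."""
    f = {}
    for path, value in equations:
        if path in f and f[path] != value:
            return None  # inconsistent equations: no f-structure exists
        f[path] = value
    return f

# "They walk": the equations are consistent
assert solve([(("SUBJ", "NUM"), "PL")]) == {("SUBJ", "NUM"): "PL"}

# "They walks": (f SUBJ NUM)=PL from "they" vs (f SUBJ NUM)=SG from "walks"
assert solve([(("SUBJ", "NUM"), "PL"), (("SUBJ", "NUM"), "SG")]) is None
```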

Page 16

English and Japanese

Japanese: any number of NPs before the verb; a particle on each defines its grammatical function.

  S  →  NP*             V
        (↑ (↓ GF))=↓    ↑=↓

  ga: (↑ GF)=SUBJ
  o:  (↑ GF)=OBJ

English: one NP before the verb, one after: Subject and Object.

  S  →  NP            V     NP
        (↑ SUBJ)=↓    ↑=↓   (↑ OBJ)=↓

Page 17

Warlpiri: Discontinuous constituents

[Tree: S over NP (N kurdujarrarlu 'children-Sbj'), Aux kapala (Present), NP (N maliki 'dog-Obj'), V wajilipinyi 'chase', NP (A witajarrarlu 'small-Sbj')]

  PRED  'chase<Subj, Obj>'
  TENSE Present
  SUBJ  [ PRED children; MOD small ]
  OBJ   [ PRED dog ]

Like Japanese: any number of NPs; a particle on each defines its grammatical function.

  S  →  … NP* …
          (↑ (↓ GF))=↓

  rlu: (↑ GF)=SUBJ
  ki:  (↑ GF)=OBJ

Unlike Japanese, the head Noun is optional in NP:

  NP →  A*             (N)
        ↓ ∈ (↑ MOD)    ↑=↓

Page 18

English: Discontinuity in questions

Who did Mary see?
Who did Bill think Mary saw?
Who did Bill think saw Mary?

Who is understood as the subject/object of a distant verb. Uncertainty: which function of which verb?

[Tree: S' over NP (Who) and S (Aux did, NP Bill, V think, S: NP Mary, V saw)]

  Q     Who
  PRED  'think<SUBJ, COMP>'
  TENSE past
  COMP  [ PRED 'see<SUBJ, OBJ>'; TENSE past; SUBJ Mary; OBJ (= Who) ]

The annotation uses functional uncertainty: the question NP fills SUBJ or OBJ at any COMP depth (OBJ, COMP OBJ, COMP SUBJ, …).

  S' →  NP          S
        (↑ Q)=↓     ↑=↓
        (↑ COMP* SUBJ|OBJ)=↓

Page 19

Summary: Lexical Functional Grammar

Modular: c-structure/f-structure in correspondence.

Mathematically simple, computationally transparent:
- Combination of context-free grammar and quantifier-free equality theory
- Closed under composition with regular relations: finite-state morphology

Grammatical functions are universal primitives:
- Subject and Object are expressed differently in different languages
  (English: Subject is the first NP; Japanese: Subject has ga)
- But Subject and Object behave similarly in all languages
  (Active to Passive: Object becomes Subject. English: move words; Japanese: move ga)

Adopted by a world-wide community of linguists:
- Large literature: papers, (text)books, conferences; reference theory
- (Relatively) easy to describe all languages
- Linguists contribute to practical computation

Stable: only minor changes in 25 years.

Kaplan and Bresnan, 1982

Page 20

Efficient computation with LFG grammars: Ambiguity Management in XLE

Page 21

Computation challenge: Pervasive ambiguity

Ambiguity arises at every level, from tokenization and morphology through syntax and semantics to knowledge:

- I like Jan. → |Jan|.| or |Jan.|.| (sentence end or abbreviation?)
- walks → Noun or Verb? untieable knot → (untie)able or un(tieable)? bank → river or financial?
- The duck is ready to eat. → cooked or hungry?
- Every proposer wants an award. → the same award, or each their own?
- The sheet broke the beam. → atoms or photons?

Page 22

Coverage vs. Ambiguity

I fell in the park. (PP modifies the verb)
+ I know the girl in the park. (PP modifies the noun)
= I see the girl in the park. (covering both patterns makes this ambiguous)

Page 23

Ambiguity can be explosive

If alternatives multiply within or across components…

[Diagram: alternatives branching through Tokenizer, Morphology, Syntax, Semantics, Knowledge]

Page 24

Computational consequences of ambiguity

Serious problem for computational systems:
- Broad-coverage, hand-written grammars frequently produce thousands of analyses, sometimes millions
- Machine-learned grammars easily produce hundreds of thousands of analyses if allowed to parse to completion

Three approaches to ambiguity management:
- Prune: block unlikely analysis paths early
- Procrastinate: do not expand alternative analysis paths until something else requires them (also known as underspecification)
- Manage: compact representation and computation of all possible analyses

Page 25

Pruning ⇒ Premature Disambiguation

Conventional approach: use statistical heuristics to kill alternatives as soon as possible.

[Diagram: alternatives pruned at each stage through Tokenizer, Morphology, Syntax, Semantics, Knowledge]

Oops: strong constraints may reject the so-far-best (= only) option.

Fast computation, wrong result.

Page 26

Procrastination: Passing the Buck

Chunk parsing as an example:
- Collect noun groups, verb groups, PP groups
- Leave it to later processing to put these together
- Some combinations are nonsense

Later processing must either:
- Call (another) parser to check constraints
- Have its own model of constraints (= a grammar)
- Solve constraints that the chunker includes with its output

Page 27

Computational Complexity of LFG

LFG is a simple combination of two simple theories:
- Context-free grammars for trees
- Quantifier-free theory of equality for f-structures

Both theories are easy to compute:
- Cubic CFG parsing
- Linear equation solving

But the combination is difficult: the parsing problem is NP-complete.
- Exponential/intractable in the worst case (but computable, unlike some other linguistic theories)
- Can we avoid the worst case?

Page 28

Some syntactic dependencies

Local dependencies (agreement): These dogs / *This dogs

Nested dependencies (agreement): The dogs [in the park] bark

Cross-serial dependencies (predicate/argument map):
  Jan Piet Marie zag helpen zwemmen
  see(Jan, help(Piet, swim(Marie)))

Long-distance dependencies:
  The girl who John says that Bob believes … likes Henry left.
  left(girl), says(John, believes(Bob, (… likes(girl, Henry))))

Page 29

Expressiveness vs. complexity

The Chomsky Hierarchy (n is the length of the sentence):

  Type               Dependency                       Complexity
  Regular            Local                            O(n)      linear
  Context-free       Nested                           O(n^3)    cubic
  Context-sensitive  Cross-serial and long-distance   O(2^n)    exponential: intractable!

But languages have mostly local and nested dependencies... so (mostly) cubic performance should be possible.

Page 30

NP-Complete Problems

Problems that can be solved by a Nondeterministic Turing Machine in polynomial time.

General characterization: generate and test.
- Lots of candidate solutions that need to be verified for correctness (n elements, 2^n candidates)
- Every candidate is easy to confirm or disconfirm

A nondeterministic TM has an oracle that provides only the right candidates to test; it doesn't search. A deterministic TM doesn't have the oracle and must test all (exponentially many) candidates.

Page 31

Polynomial search problems

- Subparts of a candidate are independent of the other parts: the outcome is not influenced by the other parts (context-freeness)
- The same independent subparts appear in many candidates
- We can (easily) determine that this is the case
- Consequence: test subparts independent of context, and share the results
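The sharing idea can be seen in a standard CKY recognizer (the textbook algorithm, not XLE itself; the toy grammar is invented for illustration): each span is analyzed once, and the result is reused by every candidate tree containing it, which is what keeps context-free recognition cubic.

```python
from collections import defaultdict

def cky(words, lexicon, rules):
    """CKY recognition for a binary CFG.

    lexicon: word -> set of categories
    rules:   (B, C) -> set of A, for each rule A -> B C
    Each chart cell [i, k] is filled once and shared by all larger spans."""
    n = len(words)
    chart = defaultdict(set)
    for i, w in enumerate(words):
        chart[i, i + 1] |= lexicon[w]
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            k = i + span
            for j in range(i + 1, k):          # split point
                for B in chart[i, j]:
                    for C in chart[j, k]:
                        chart[i, k] |= rules.get((B, C), set())
    return "S" in chart[0, n]

lexicon = {"they": {"NP"}, "walk": {"VP"}}
rules = {("NP", "VP"): {"S"}}
assert cky(["they", "walk"], lexicon, rules)       # grammatical order
assert not cky(["walk", "they"], lexicon, rules)   # no S -> VP NP rule
```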

Page 32

Why is LFG parsing NP-complete?

A classic generate-and-test search problem.

Exponentially many tree candidates:
- A CFG chart parser quickly produces a packed representation of all trees
- But a CFG can be exponentially ambiguous
- And each tree must be tested for f-structure satisfiability

Exponentially many exponential problems: Boolean combinations of per-tree constraints. E.g., English base verbs are "not 3rd singular":

  (↑ SUBJ NUM)≠SG ∨ (↑ SUBJ PERS)≠3    Disjunction!

Page 33

XLE Ambiguity Management: The intuition

The sheep saw the fish. How many sheep? How many fish?

Options multiplied out:
  The sheep-sg saw the fish-sg.
  The sheep-pl saw the fish-sg.
  The sheep-sg saw the fish-pl.
  The sheep-pl saw the fish-pl.

Options packed:
  The sheep-{sg|pl} saw the fish-{sg|pl}.

In principle, a verb might require agreement of Subject and Object, so we have to check. But English doesn't do that: the subparts are independent.

The packed representation is a "free choice" system:
- Encodes all dependencies without loss of information
- Common items are represented and computed once
- Key to practical efficiency
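A tiny sketch of the free-choice idea (illustrative only, not XLE's data structures): two independent packed choice-sets of size 2 stand in for 2x2 enumerated analyses, and enumeration is just the cross-product, taken only on demand.

```python
from itertools import product

# Packed: one choice-set per ambiguous word, stored once (2 + 2 entries).
packed = {"sheep": ["sg", "pl"], "fish": ["sg", "pl"]}

# Free choice: any combination is valid, so the packed form represents
# 2 * 2 = 4 full analyses without ever storing them separately.
analyses = [dict(zip(packed, choice)) for choice in product(*packed.values())]

assert len(analyses) == 4
assert {"sheep": "sg", "fish": "pl"} in analyses
```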

Page 34

Dependent choices

… but free choice can be wrong: if the packed form doesn't encode all dependencies, the choices are not free.

The girl saw the cat:
  Das Mädchen-nom sah die Katze-nom   (bad)
  Das Mädchen-nom sah die Katze-acc   (The girl saw the cat)
  Das Mädchen-acc sah die Katze-nom   (The cat saw the girl)
  Das Mädchen-acc sah die Katze-acc   (bad)

Packed: Das Mädchen-{nom|acc} sah die Katze-{nom|acc}
Again, packing avoids duplication, but here the choices interact.

Another example:
  Who do you want to succeed?
  I want to succeed John   (want intransitive, succeed transitive)
  I want John to succeed   (want transitive, succeed intransitive)

Page 35

Solution: Label dependent choices

- Label each choice with distinct Boolean variables p, q, etc.
- Record the acceptable combinations as a Boolean expression Φ
- Each analysis corresponds to a truth-value assignment satisfying Φ (a free choice from the true lines of Φ's truth table)

Das Mädchen sah die Katze:
  p: Mädchen-nom    ¬p: Mädchen-acc
  q: Katze-nom      ¬q: Katze-acc

  Φ = (p ∧ ¬q) ∨ (¬p ∧ q)

  p ∧ ¬q: The girl saw the cat
  ¬p ∧ q: The cat saw the girl
  p ∧ q, ¬p ∧ ¬q: bad
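The labeling step can be sketched in Python (a hypothetical encoding, not XLE's internals): each choice gets a Boolean variable, Φ records the acceptable combinations, and the analyses are exactly the satisfying assignments.

```python
from itertools import product

# p chooses the case of "Maedchen", q the case of "Katze".
choices = {"p": ["nom", "acc"], "q": ["nom", "acc"]}

def phi(a):
    # Acceptable combinations: exactly one nominative argument, i.e.
    # (p AND not q) OR (not p AND q) when p, q mean "is nominative".
    return (a["p"] == "nom") != (a["q"] == "nom")

assignments = [dict(zip(choices, v)) for v in product(*choices.values())]
analyses = [a for a in assignments if phi(a)]

# Of the 4 candidate combinations, 2 satisfy phi:
# girl-saw-cat (p, not q) and cat-saw-girl (not p, q).
assert len(assignments) == 4
assert len(analyses) == 2
```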

Page 36

Boolean Satisfiability

Can solve Boolean formulas by multiplying out to Disjunctive Normal Form:

  (a ∨ b) ∧ x ∧ (c ∨ d)
  ⇒ (a∧x∧c) ∨ (a∧x∧d) ∨ (b∧x∧c) ∨ (b∧x∧d)

This produces simple conjunctions of literal propositions ("facts", i.e. equations), with easy checks for satisfiability: if a∧d ⇒ FALSE, replace any conjunction containing a and d by FALSE.

But: blow-up of the disjunctive structure before fact processing. Individual facts are replicated (and re-processed): exponential.

Page 37

Alternative: "Contexted" normal form

Produce a flat conjunction of contexted facts:

  (a ∨ b) ∧ x ∧ (c ∨ d)
  ⇒ (p→a) ∧ (¬p→b) ∧ x ∧ (q→c) ∧ (¬q→d)

  context → fact

Page 38

Alternative: "Contexted" normal form (continued)

  (a ∨ b) ∧ x ∧ (c ∨ d)
  ⇒ (p→a) ∧ (¬p→b) ∧ x ∧ (q→c) ∧ (¬q→d)

- Each fact is labeled with its position in the disjunctive structure
- The Boolean hierarchy is discarded

No blow-up, no duplicates:
- Each fact appears, and can be processed, once
- Claims:
  - Checks for satisfiability are still easy
  - Facts can be processed first, disjunctions deferred
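The conversion can be sketched as a small recursive rewrite (an illustration of the idea with an assumed tuple encoding of formulas, not the Maxwell & Kaplan implementation): each disjunct's facts are emitted once, tagged with a fresh context variable, and no DNF cross-product is ever built.

```python
import itertools

_fresh = itertools.count()

def contexted(formula, context=()):
    """Flatten a formula into a list of (context, fact) pairs.

    formula is a fact (a string), ('and', f1, f2, ...), or
    ('or', left, right).  A disjunction phi1 | phi2 becomes
    (p -> phi1) & (~p -> phi2) for a fresh Boolean variable p."""
    if isinstance(formula, str):
        return [(context, formula)]
    op, *args = formula
    if op == "and":
        return [cf for part in args for cf in contexted(part, context)]
    if op == "or":
        p = f"p{next(_fresh)}"
        left, right = args
        return (contexted(left, context + ((p, True),))
                + contexted(right, context + ((p, False),)))
    raise ValueError(op)

facts = contexted(("and", ("or", "a", "b"), "x", ("or", "c", "d")))
# Five contexted facts, each fact appearing exactly once -- no DNF blow-up:
assert len(facts) == 5
assert ((), "x") in facts  # x holds in the default (empty) context
```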

Page 39

A sound and complete method

Conversion to logically equivalent contexted form.

Lemma: φ1 ∨ φ2 is satisfiable iff (p→φ1) ∧ (¬p→φ2) is satisfiable (p a new Boolean variable).

Proof: (If) If φ1 is true, let p be true, in which case (p→φ1) ∧ (¬p→φ2) is true. (Only if) If p is true, then φ1 is true, in which case φ1 ∨ φ2 is true. (Symmetrically for φ2 with ¬p.)

Ambiguity-enabled inference (by trivial logic): if φ1 ∧ φ2 ⊢ φ3 is a rule of inference, then so is

  (C1→φ1) ∧ (C2→φ2)  ⊢  (C1∧C2)→φ3

E.g., substitution of equals for equals: x=y ∧ φ ⊢ φ[x/y] is a rule of inference. Therefore:

  (C1→x=y) ∧ (C2→φ)  ⊢  (C1∧C2)→φ[x/y]

Valid for any theory.

Maxwell & Kaplan, 1987, 1991

Page 40

Test for satisfiability

- Perform all fact-inferences, conjoining contexts
- If FALSE is inferred, add its context to the nogoods
- Solve the conjunction of nogoods
  - Boolean satisfiability: exponential in the nogood context-Booleans
  - Independent facts: no FALSE, no nogoods

E.g., if R → SG=PL, then R → FALSE; R is called a "nogood" context. If R → FALSE is deduced from a contexted formula φ, then φ is satisfiable only if ¬R.

The method implicitly notices independence/context-freeness.

Page 41

Example 1

"They walk"
- No disjunction; all facts are in the default "True" context
- No change to inference:
    T→(f SUBJ NUM)=PL ∧ T→(f SUBJ NUM)=PL ⊢ T→PL=PL
  which reduces to: (f SUBJ NUM)=PL ∧ (f SUBJ NUM)=PL ⊢ PL=PL

"They walks"
- No disjunction; all facts are still in the default "True" context
- No change to inference:
    T→(f SUBJ NUM)=PL ∧ T→(f SUBJ NUM)=SG ⊢ T→PL=SG ⊢ T→FALSE
- Satisfiable only if ¬T: unsatisfiable

Page 42

Example 2

"The sheep walks"
- Disjunction of the NUM feature from sheep:
    (f SUBJ NUM)=SG ∨ (f SUBJ NUM)=PL
- Contexted facts:
    p→(f SUBJ NUM)=SG ∧ ¬p→(f SUBJ NUM)=PL
    ∧ (f SUBJ NUM)=SG   (from walks)
- Inferences:
    p→(f SUBJ NUM)=SG ∧ (f SUBJ NUM)=SG ⊢ p→SG=SG
    ¬p→(f SUBJ NUM)=PL ∧ (f SUBJ NUM)=SG ⊢ ¬p→PL=SG ⊢ ¬p→FALSE

¬p→FALSE is true iff ¬p is false, iff p is true. Conclusion: the sentence is grammatical in context p: only 1 sheep.
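The sheep example can be reproduced with a small nogood detector (a sketch under an assumed (context, path, value) fact encoding; the real XLE machinery is far more elaborate):

```python
def compatible(c1, c2):
    """Contexts conflict when some variable is True in one, False in the other."""
    return not any((var, not val) in c2 for var, val in c1)

def nogoods(facts):
    """facts: list of (context, path, value), context a frozenset of
    (variable, bool) pairs.  Conflicting values for the same path make
    the conjoined context a nogood (it implies FALSE)."""
    bad = []
    for i, (c1, path1, v1) in enumerate(facts):
        for c2, path2, v2 in facts[i + 1:]:
            if compatible(c1, c2) and path1 == path2 and v1 != v2:
                bad.append(c1 | c2)
    return bad

# "The sheep walks": p -> NUM=SG, ~p -> NUM=PL (sheep); True -> NUM=SG (walks)
facts = [
    (frozenset({("p", True)}), ("SUBJ", "NUM"), "SG"),
    (frozenset({("p", False)}), ("SUBJ", "NUM"), "PL"),
    (frozenset(), ("SUBJ", "NUM"), "SG"),
]

# The only nogood is {~p}: the sentence is grammatical exactly when p is true.
assert nogoods(facts) == [frozenset({("p", False)})]
```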

Page 43

Contexts and packing: Index by facts

The sheep saw the fish.

  SUBJ [ NUM { p→SG, ¬p→PL } ]
  OBJ  [ NUM { q→SG, ¬q→PL } ]

Contexted unification ≈ concatenation, when choices don't interact.

Page 44

Compare: DNF unification

The sheep saw the fish.

  [SUBJ [NUM SG]] ∨ [SUBJ [NUM PL]]  unified with  [OBJ [NUM SG]] ∨ [OBJ [NUM PL]]

DNF cross-product of alternatives: exponential.

  [SUBJ [NUM SG], OBJ [NUM SG]]
  [SUBJ [NUM SG], OBJ [NUM PL]]
  [SUBJ [NUM PL], OBJ [NUM SG]]
  [SUBJ [NUM PL], OBJ [NUM PL]]

Page 45

The XLE wager (for real sentences of real languages)

Alternatives from distant choice-sets can be freely chosen without affecting satisfiability: FALSE is unlikely to appear.

The contexted method optimizes for independence: no FALSE ⇒ no nogoods ⇒ nothing to solve.

Bet: the worst case 2^n reduces to k·2^m where m << n.

Page 46

Ambiguity-enabled inference: Choice-logic common to all modules

If φ1 ∧ φ2 ⊢ φ3 is a rule of inference, then so is (C1→φ1) ∧ (C2→φ2) ⊢ (C1∧C2)→φ3.

1. Substitution of equals for equals (e.g. for LFG syntax): x=y ∧ φ ⊢ φ[x/y]
   Therefore: (C1→x=y) ∧ (C2→φ) ⊢ (C1∧C2)→φ[x/y]

2. Reasoning: Cause(x,y) ∧ Prevent(y,z) ⊢ Prevent(x,z)
   Therefore: (C1→Cause(x,y)) ∧ (C2→Prevent(y,z)) ⊢ (C1∧C2)→Prevent(x,z)

3. Log-linear disambiguation: Prop1(x) ∧ Prop2(x) ⊢ Count(Feature_n)
   Therefore: (C1→Prop1(x)) ∧ (C2→Prop2(x)) ⊢ (C1∧C2)→Count(Feature_n)

Ambiguity-enabled components propagate choices; they can defer choosing and enumerating.

Page 47

Summary: Contexted constraint satisfaction

Packed:
- facts not duplicated
- facts not hidden in Boolean structure

Efficient:
- deductions not duplicated
- fast fact processing (e.g. equality) can prune slow disjunctive processing
- optimized for independence

General and simple:
- applies to any deductive system, uniform across modules
- not limited to special-case disjunctions
- mathematically trivial

Compositional free-choice system:
- enumeration of (exponentially many?) valid solutions deferred across module boundaries
- enables backtrack-free, linear-time, on-demand enumeration
- enables packed refinement by cross-module constraints: new nogoods

Page 48

The remaining exponential

Contexted constraint satisfaction (typically) avoids the Boolean explosion in solving the f-structure constraints of a single tree.

How can we also suppress tree enumeration, and still determine satisfiability?

Page 49

Ordering strategy: Easy things first

- Do all c-structure processing before any f-structure processing (the chart is a free-choice representation that guarantees valid trees)
- Only produce/solve f-structure constraints for constituents in complete, well-formed trees
- [NB: interleaved, bottom-up pruning is a bad idea]
- Bets on inconsistency, not independence

Page 50

Asking the right question

How can we make it faster?
- More efficient unifier: undoable operations, better indexing, clever data structures, compiling
- Reordering for more effective pruning

Why not cubic?
- Intuitively, the problem isn't that hard
- GPSG: natural language is nearly context-free
- Surely for context-free-equivalent grammars!

Page 51

No f-structure filtering, no nogoods... but still explosive

An LFG grammar for a context-free language:

  S  →  S          S        |    a
        (↑ L)=↓    (↑ R)=↓       (↑ A)=+

[Diagram: over the string "a a a", the chart packs all binary trees, but the f-structures, built from L and R features such as L [A +] and R [A +], enumerate the trees one by one.]

Page 52

Disjunctive lazy copy

- Pack the functional information from alternative local subtrees
- Unpack/copy to higher consumers only on demand

[Diagram: alternative S subtrees contribute packed f-structures p→f1, q→f2, r→f3 and p→f6, q→f5, r→f4 to the L and R features of the mother; the (↑ L)=↓ annotation on S doesn't access internal features.]

Automatically takes advantage of context-freeness, without grammar analysis or compilation.

Page 53

The XLE wager

Most feature dependencies are restricted to local subtrees:
- mother/daughter/sister interactions
- maybe a grandmother now and then
- very rarely span an unbounded distance

Optimize for the local case:
- bounded computation per subtree gives a cubic curve
- graceful degradation with non-local interactions
- … but still correct

Page 54

Packing Equalities in F-structure

"Visiting relatives is boring": the subject has two analyses, A1 and A2.

[Tree: S over NP (visiting relatives, with alternatives A1: (↑ NUM)=sg and A2: (↑ NUM)=pl) and VP (V is: (↑ SUBJ NUM)=sg, Adj boring)]

  T  → (SUBJ NUM)=sg    (from is)
  A1 → (SUBJ NUM)=sg
  A2 → (SUBJ NUM)=pl

  T ∧ A1 ⊢ sg=sg
  T ∧ A2 ⊢ sg=pl ⇒ nogood(A2)

Page 55

Page 56

Page 57

XLE Performance: HomeCentre Corpus

[Two scatter plots over about 1100 English sentences: parse time (0-14 secs) vs. sentence length (0-60 words), and number of local subtrees (0-3500) vs. sentence length (0-60 words).]

Page 58

Time is ~linear in subtrees: Nearly cubic

[Scatter plot: parse time (0-14 secs) vs. local subtrees (0-4000); fitted slope 2.1 ms/subtree, R^2 = .79]

Page 59

French HomeCentre

[Scatter plot: parse time (0-30 secs) vs. local subtrees (0-8000); fitted slope 3.3 ms/subtree, R^2 = .80]

Page 60

German HomeCentre

[Scatter plot: parse time (0-35 secs) vs. local subtrees (0-4000); fitted slope 3.8 ms/subtree, R^2 = .44]

Page 61

Generation with LFG/XLE

- Parse: string → c-structure → f-structure
- Generate: f-structure → c-structure → string
- Same grammar: shared development and maintenance
- Formal criterion: s ∈ Gen(Parse(s))
- Practical criterion: don't generate everything
  - Parsing robustness → undesired strings, needless ambiguity
  - Use optimality marks to restrict the generation grammar
  - Restricted (un)tokenizing transducer: don't allow arbitrary white space, etc.

Page 62: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Mathematics and Computation

Formal properties: Gen(f) is a (possibly infinite) set
– Equality is idempotent: x=y ∧ x=y ⇔ x=y
– Longer strings with redundant equations map to the same f-structure

What kind of set? A context-free language (Kaplan & Wedekind, 2000)

Page 63: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Computation

XLE/LFG generation: convert the LFG grammar to a CFG only for the strings that map to f
– NP-complete, ambiguity managed (as usual)
– All strings in the CFL are grammatical w.r.t. the LFG grammar
– Composition with regular relations is crucial

The CFG is a packed, free-choice representation of all strings
– Can use ordinary CF generation algorithms to enumerate strings
– Can defer enumeration, give the CFG to a client to enumerate
– Can apply other context-free technology
  » Choose the shortest string
  » Reduce to a finite set of unpumped strings (context-free Pumping Lemma)
  » Choose the most probable (for fluency, not grammaticality)
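Choosing the shortest string from a CFG that packs all realizations can be done by a fixed-point computation over the grammar, without enumerating the whole language. A sketch on a toy grammar (not an actual XLE-generated CFG):

```python
# Shortest derivable string per nonterminal, by fixed-point iteration.
# grammar: {NT: [RHS tuples]}; symbols not in the dict are terminals.
def shortest_string(grammar, start):
    best = {}                        # NT -> shortest word sequence found so far
    changed = True
    while changed:                   # iterate until no entry improves
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                parts, ok = [], True
                for sym in rhs:
                    if sym in grammar:           # nonterminal: need its best string
                        if sym not in best:
                            ok = False
                            break
                        parts.extend(best[sym])
                    else:                        # terminal word
                        parts.append(sym)
                if ok and (nt not in best or len(parts) < len(best[nt])):
                    best[nt] = parts
                    changed = True
    return " ".join(best[start])

g = {
    "S":  [("NP", "VP")],
    "NP": [("the", "dog"), ("the", "big", "dog")],
    "VP": [("barks",), ("barks", "loudly")],
}
print(shortest_string(g, "S"))   # -> "the dog barks"
```

Length here is measured in words; weighting by characters or by probabilities is a straightforward variation.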

Page 64: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Generating from incomplete f-structures

Grammatical features can't be read from
– Back-end question-answering logic
– F-structures translated from other languages

Generating from a bounded underspecification of a complete f-structure is still context-free
– Example: a skeleton of predicates
– Proof: CFLs are closed under union, and bounded extensions produce finite alternatives

Generation from arbitrary underspecification is undecidable
– Reduces to an undecidable emptiness problem (= Hilbert's 10th)

(Dymetman, van Noord, Wedekind, Roach)

Page 65: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

A (light-weight?) approach to QA

Question → Parse → F-structure → Generate → Queries → Search

Analyze the question, anticipate and search for possible answer phrases

Question: What is the graph partitioning problem?
– Generated query: “The graph partitioning problem is *”
– Answer (Google): The graph partitioning problem is defined as dividing a graph into disjoint subsets of nodes …

Question: When were the Rolling Stones formed?
– Generated queries: “The Rolling Stones were formed *”
  “* formed the Rolling Stones *”
– Answer (Google): Mick Jagger, Keith Richards, Brian Jones, Bill Wyman, and Charlie Watts formed the Rolling Stones in 1962.
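The answer-anticipation idea above can be sketched as a tiny rewriting step: turn a question's predicate-argument structure into declarative search patterns. Everything here is hypothetical, a plain dict stands in for a real f-structure, and `anticipate` is an invented helper, not XLE's generator:

```python
# Hypothetical sketch: rewrite a "when" question about a passive predicate
# into declarative search-engine patterns, as in the Rolling Stones example.
def anticipate(fs):
    """fs: a toy stand-in for a question f-structure."""
    if fs["wh"] == "when" and fs.get("passive"):
        subj, aux, pred = fs["subj"], fs["aux"], fs["pred"]
        return [f'"{subj} {aux} {pred} *"',    # passive declarative pattern
                f'"* {pred} {subj} *"']        # active declarative pattern
    return []

q = {"wh": "when", "subj": "the Rolling Stones", "aux": "were",
     "pred": "formed", "passive": True}
for pattern in anticipate(q):
    print(pattern)   # patterns to send to a search engine
```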

Page 66: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Pipeline for Answer Anticipation

[Diagram: Question → Parser (English grammar) → question f-structures → Convert → answer f-structures → Generator (English grammar) → answer phrases → Search (Google...)]

Page 67: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Grammar engineering: The Parallel Grammar Project

Page 68: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Pargram project

Large-scale LFG grammars for several languages
– English, German, Japanese, French, Norwegian
– Coming along: Korean, Urdu, Chinese, Arabic, Welsh, Malagasy, Danish
– Intuition + corpus: cover real uses of language--newspapers, documents, etc.

Parallelism: test LFG universality claims
– Common c- to f-structure mapping conventions (unless typologically motivated variation)
– Similar underlying f-structures permit shared disambiguation properties and Glue interpretation premises
– Practical: all grammars run on XLE software

International consortium of world-class linguists
– PARC, Stuttgart, Fuji Xerox, Konstanz, Bergen, Copenhagen, Oxford, Dublin City University, PIEAS…
– Full-week meetings, twice a year
– Contributions to linguistics and comp-ling: books and papers
– Each group is self-funded, self-managed

Page 69: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Pargram goals

Practical
– Create grammatical resources for NL applications
  » translation, question answering, information retrieval, ...
– Develop a discipline of grammar engineering
  » what tools, techniques, and conventions make it easy to develop and maintain broad-coverage grammars?
  » how long does it take?
  » how much does it cost?

Theoretical
– Refine and guide LFG theory through broad coverage of multiple languages
– Refine and guide XLE algorithms and implementation

Page 70: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Parallel f-structures (where possible)

Page 71: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

…but different c-structures

Page 72: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Pargram grammars

Language            #Rules   #States   #Disjuncts
German              251      3,239     13,294
English*            388      13,655    55,725
French              180      3,422     16,938
Japanese (Korean)   56       368       2,012

* English allows for shallow markup: labeled bracketing, named entities

Page 73: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Why Norwegian and Japanese?

Engineering assessment: given a mature system and parallel grammar specs, how hard is it?

Norwegian: best case
– Well-trained LFG linguists
– Users of previous PARC software
– Closely related to existing Pargram languages

Japanese: worst case
– One computer scientist, one traditional Japanese linguist--no LFG experience
– Typologically different language
– Character sets, typographical conventions

Conclusion: not that hard. For both languages: good coverage and accuracy in ~2 person-years

Page 74: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Engineering results

Grammars and lexicons
– Grammar writer's cookbook (Butt et al., 1999)

New practical formal devices
– Complex categories for efficiency: NP[nom] vs. NP: (↑ CASE) = NOM
– Optimality marks for robustness: enlarge the grammar without being overrun by peculiar analyses
– Lexical priority: merging different lexicons

Integration of off-the-shelf morphology
– From Inxight (based on earlier PARC research) and Kyoto

Page 75: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Accuracy and coverage

WSJ F-scores for the English Pargram grammar
– Produces dependencies, not labeled trees
– Stochastic model trained on sections 2-22
– Tested on dependencies for 700 sentences in section 23
– Robustness: some output for every input

                 Full (74.7%)   Fragments (25.3%)
Best             88.5           76.7
Most probable    82.5           69.0
Random           78.4           67.7

(Named entities seem to bump these by ~3%)

Riezler et al., 2002

Page 76: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

“Meridian will pay a premium of $30.5 million to assume $2 billion in deposits.”

mood(pay~0, indicative), tense(pay~0, fut), adjunct(pay~0, assume~7), obj(pay~0, premium~3), stmt_type(pay~0, declarative), subj(pay~0, Meridian~5), det_type(premium~3, indef), adjunct(premium~3, of~23), num(premium~3, sg), pers(premium~3, 3), adjunct(million~4, 30.5~28), number_type(million~4, cardinal), num(Meridian~5, sg), pers(Meridian~5, 3), obj(assume~7, $~9), stmt_type(assume~7, purpose), subj(assume~7, pro~8), number($~9, billion~17), adjunct($~9, in~11), num($~9, pl), pers($~9, 3), adjunct_type(in~11, nominal), obj(in~11, deposit~12), num(deposit~12, pl), pers(deposit~12, 3), adjunct(billion~17, 2~19), number_type(billion~17, cardinal), number_type(2~19, cardinal), obj(of~23, $~24), number($~24, million~4), num($~24, pl), pers($~24, 3), number_type(30.5~28, cardinal)

Page 77: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Accuracy and coverage

Japanese Pargram grammar
– ~97% coverage on large corpora
  » 10,000 newspaper sentences (EDR)
  » 460 copier-manual sentences
  » 9,637 customer-relations sentences
– F-scores against 200 hand-annotated sentences from the newspaper corpus:
  » Best: 87%
  » Average: 80%

Recall: the grammar was constructed with ~2 person-years of effort (compare: the effort to create an annotated training corpus)

Page 78: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Robustness: Some output for every input

Page 79: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Sources of Brittleness

Vocabulary problems
– Gaps in coverage, neologisms, terminology
– Incorrect entries, missing frames…

Missing constructions
– No theoretical guidance (or interest) (e.g. dates, company names)
– Core constructions overlooked
  » Intuition and corpus are both limited

Ungrammatical input
– Real-world text is not perfect
– Sometimes it's horrendous

Strict performance limits (XLE parameters)

Page 80: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Real world input

Other weak blue-chip issues included Chevron, which went down 2 to 64 7/8 in Big Board composite trading of 1.3 million shares; Goodyear Tire & Rubber, off 1 1/2 to 46 3/4, and American Express, down 3/4 to 37 1/4. (WSJ, section 13)

``The croaker's done gone from the hook” (WSJ, section 13)

(SOLUTION 27000 20) Without tag P-248 the W7F3 fuse is located in the rear of the machine by the charge power supply (PL3 C14 item 15. (Copier repair tip)

Page 81: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

LFG entries from Finite-State Morphologies

Broad-coverage inflectional transducers:

falls → fall +Noun +Pl
        fall +Verb +Pres +3sg
Mary → Mary +Prop +Giv +Fem +Sg
vienne → venir +SubjP +SG {+P1|+P3} +Verb

For listed words, the transducer provides
– the canonical stem form
– inflectional information

Page 82: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

On-the-fly LFG entries

The “-unknown” head word matches unrecognized stems; the grammar writer defines -unknown and the affixes:

-unknown  N  (↑ PRED)=‘%stem’ (↑ NTYPE)=common;
          V  (↑ PRED)=‘%stem<SUBJ,OBJ>’.   (transitive)

+Noun  N-AFX  (↑ PERS)=3.
+Pl    N-AFX  (↑ NUM)=pl.
+Pres  V-AFX  (↑ TENSE)=present
+3sg   V-AFX  (↑ SUBJ PERS)=3 (↑ SUBJ NUM)=sg

Pieces are assembled by sublexical rules:
NOUN → N N-AFX*.
VERB → V V-AFX*.
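The sublexical idea can be sketched outside XLE: the morphology emits a stem plus tags ("fall +Noun +Pl"), and per-tag rules contribute features to an entry built on the fly. The dict-based rules below are a toy stand-in for the grammar notation above, not actual XLE syntax:

```python
# Toy sketch of on-the-fly lexical entries from morphological tags.
# Each affix tag contributes features, mirroring the N-AFX/V-AFX entries.
AFFIX_RULES = {
    "+Noun": {"PERS": 3},
    "+Pl":   {"NUM": "pl"},
    "+Pres": {"TENSE": "present"},
    "+3sg":  {"SUBJ PERS": 3, "SUBJ NUM": "sg"},
}

def build_entry(morph_output):
    stem, *tags = morph_output.split()
    fstr = {"PRED": stem}
    for tag in tags:
        if tag == "+Verb":
            # assume a transitive frame, as in the slide's V entry
            fstr["PRED"] = f"{stem}<SUBJ,OBJ>"
        else:
            fstr.update(AFFIX_RULES.get(tag, {}))
    return fstr

print(build_entry("fall +Noun +Pl"))
print(build_entry("fall +Verb +Pres +3sg"))
```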

Page 83: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Guessing for unlisted words

Use an FST guesser for general patterns
– Capitalized words can be proper nouns
  » Saakashvili → Saakashvili +Noun +Proper +Guessed
– -ed words can be past-tense verbs or adjectives
  » fumped → fump +Verb +Past +Guessed
             fumped +Adj +Deverbal +Guessed

Languages with richer morphology allow better guessers
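A minimal stand-in for such a guesser, using string tests rather than a real finite-state transducer (the stemming is deliberately naive):

```python
# Pattern-based guesses for words the morphology doesn't list.
def guess(word):
    analyses = []
    if word[0].isupper():                    # capitalized: maybe a proper noun
        analyses.append(f"{word} +Noun +Proper +Guessed")
    if word.endswith("ed"):                  # -ed: maybe past verb or adjective
        stem = word[:-2]                     # naive stemming; an FST does better
        analyses.append(f"{stem} +Verb +Past +Guessed")
        analyses.append(f"{word} +Adj +Deverbal +Guessed")
    return analyses

print(guess("fumped"))
print(guess("Saakashvili"))
```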

Page 84: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Subcategorization and Argument Mapping?

Transitive, intransitive, inchoative…
– Not related to inflection
– Can't be inferred from shallow data

Fill in gaps from external sources
– Machine-readable dictionaries
– Other resources: VerbNet, WordNet, FrameNet, Cyc
– Not always easy, not always reliable
  » Current research

Page 85: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Grammatical failures

Fall-back approach: first try to get a complete analysis
– Prefer standard rules, but
– Allow for anticipated errors (e.g. subject/verb disagree, but the interpretation is obvious)
– Optimality-theory marks to prefer standard analyses

If that fails, enlarge the grammar and try again
– Build up fragments that get complete sub-parses (c-structure and f-structure)
– Allow tokens that can't be chunked
– Link chunks and tokens in a single f-structure

Page 86: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Fall-back grammar for fragments

Grammar writer specifies REPARSECAT
– Alternative c-structure root if there is no complete parse
– Allows for fragments and linking

Grammar writer specifies possible chunks
– Categories (e.g. S, NP, VP but not N, V)
– Looser expansions

Optimality theory: grammar writer specifies marks to
» Prefer standard rules over anticipated errors
» Prefer the parse with the fewest chunks
» Disprefer using tokens over chunks

Page 87: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Example

“The the dog appears.”

Analyzed as– “token” the– sentence “the dog appears”

Page 88: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

C-structure

Page 89: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

F-structure

• Many chunks have useful analyses
• XLE/LFG degrades to shallow parsing in the worst case

Page 90: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Robustness summary

External resources for incomplete lexical entries
– Morphologies, guessers, taggers
– Current work: VerbNet, WordNet, FrameNet, Cyc
– Order by reliability

Fall-back techniques for missing constructions
– Dispreferred rules
– Fragment grammar

Current WSJ evaluation:
– 100% coverage, ~85% full parses
– F-score (esp. recall) declines for fragment parses

Page 91: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Brief demo

Page 92: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Stochastic disambiguation: When you have to choose

Page 93: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Finding the most probable parse

XLE produces many candidates
– All valid (with respect to grammar and OT marks)
– Not all equally likely
– Some applications are ambiguity-enabled (defer selection)
– …but some require a single best guess

Grammar writers have only coarse preference intuitions
– Many implicit properties of words and structures with unclear significance

Appeal to a probability model to choose the best parse
– Assume: previous experience is a good guide for future decisions
– Collect a corpus of training sentences
– Build a probability model that optimizes for previous good results
– Apply the model to choose the best analysis of new sentences

Page 94: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Issues

What kind of probability model?
What kind of training data?
Efficiency of training and disambiguation?
Benefit vs. random choice of parse?
– Random is awful for treebank grammars
– Hard LFG constraints restrict to plausible candidates

Page 95: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Probability model

Conventional models: stochastic branching process
– Hidden Markov models
– Probabilistic context-free grammars

Sequence of decisions, each independent of previous decisions, each choice having a certain probability
– HMM: choose from the outgoing arcs at a given state
– PCFG: choose from the alternative expansions of a given category

Probability of an analysis = product of choice probabilities

Efficient algorithms
– Training: forward/backward, inside/outside
– Disambiguation: Viterbi

Abney 1997 and others: not appropriate for LFG, HPSG…
– Choices are not independent: information from different CFG branches interacts through the f-structure
– The relative-frequency estimator is inconsistent
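The branching-process picture is easy to make concrete: in a PCFG, a derivation's probability is just the product of the probabilities of the rules it uses. A toy example with invented rule probabilities:

```python
# PCFG derivation probability = product of independent rule-choice
# probabilities (the assumption that fails for LFG, per Abney 1997).
from math import prod

RULE_PROB = {
    ("S",  ("NP", "VP")):   1.0,   # S -> NP VP
    ("NP", ("the", "dog")): 0.4,   # NP -> the dog
    ("VP", ("barks",)):     0.5,   # VP -> barks
}

def derivation_prob(rules):
    return prod(RULE_PROB[r] for r in rules)

deriv = [("S", ("NP", "VP")), ("NP", ("the", "dog")), ("VP", ("barks",))]
print(derivation_prob(deriv))   # 1.0 * 0.4 * 0.5 = 0.2
```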

Page 96: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Exponential models are appropriate (aka log-linear models)

Assign probabilities to representations, not to choices in a derivation

No independence assumption

Arithmetic combined with human insight
– Human:
  » Define properties of representations that may be relevant
  » Based on any computable configuration of f-structure features and trees
– Arithmetic:
  » Train to figure out the weight of each property

Page 97: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Stochastic Disambiguation in XLE: All parses → Most probable

Discriminative ranking
– Conditional log-linear model on c/f-structure pairs

  p_λ(x | s) = e^(λ · f(x)) / Z

  Probability of parse x for string s, where
  f is a vector of feature values for x,
  λ is a vector of feature weights, and
  Z is the normalizer over all parses of s

– Discriminative estimation of λ from partially labeled data (Riezler et al., ACL'02)
– Combined l1-regularization and feature selection
  » Avoid over-fitting, choose the best features (Riezler & Vasserman, EMNLP'04)
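The conditional log-linear model is a softmax over the candidate parses of a string. A sketch with invented feature values and weights (real XLE features are the packed-structure properties described later):

```python
# p_lambda(x|s) = exp(lambda . f(x)) / Z, normalized over the parses of s.
from math import exp

def parse_probs(parses, weights):
    """parses: one feature dict per candidate parse of the same string s."""
    scores = [exp(sum(weights.get(k, 0.0) * v for k, v in f.items()))
              for f in parses]
    z = sum(scores)                 # normalizer over all parses of s
    return [sc / z for sc in scores]

weights = {"cs_right_branch": -0.03, "fs_attr_val DET-TYPE def": 0.29}
parses = [{"cs_right_branch": 2.0, "fs_attr_val DET-TYPE def": 1.0},
          {"cs_right_branch": 5.0}]
probs = parse_probs(parses, weights)
print(probs, sum(probs))            # a distribution: the values sum to 1
```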

Page 98: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Coarse training data for XLE

“Correct” parses are consistent with weak annotation:

Considering/VBG (NP the naggings of a culture imperative), (NP-SBJ I) promptly signed/VBD up.

Sufficient for disambiguation, not for grammar induction.

Compare with the full PTB annotation:

(S (S-ADV (NP-SBJ (-NONE- *-1)) (VP (VBG Considering) (NP (NP (DT the) (NNS naggings)) (PP (IN of) (NP (DT a) (NN culture) (NN imperative)))))) (, ,) (NP-SBJ-1 (PRP I)) (VP (ADVP-MNR (RB promptly)) (VBD signed) (PRT (RB up))) (. .))

Page 99: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Classes of properties

C-structure nodes and subtrees
– indicating certain attachment preferences
Recursively embedded phrases
– indicating high vs. low attachment
F-structure attributes
– presence of grammatical functions
Atomic attribute-value pairs in f-structure
– particular feature values
Left/right branching behavior of c-structures
(Non)parallelism of coordinations in c- and f-structures
Lexical elements
– tuples of head words, argument words, grammatical relations

~60,000 candidate properties, ~1,000 selected

Page 100: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Some properties and weights

 0.937481   cs_embedded VPv[pass] 1
-0.126697   cs_embedded VPv[perf] 3
-0.0204844  cs_embedded VPv[perf] 2
-0.0265543  cs_right_branch
-0.986274   cs_conj_nonpar 5
-0.536944   cs_conj_nonpar 4
-0.0561876  cs_conj_nonpar 3
 0.373382   cs_label ADVPint
-1.20711    cs_label ADVPvp
-0.57614    cs_label AP[attr]
-0.139274   cs_adjacent_label DATEP PP
-1.25583    cs_adjacent_label MEASUREP PPnp
-0.35766    cs_adjacent_label NPadj PP
-0.00651106 fs_attrs 1 OBL-COMPAR
 0.454177   fs_attrs 1 OBL-PART
-0.180969   fs_attrs 1 ADJUNCT
 0.285577   fs_attr_val DET-FORM the
 0.508962   fs_attr_val DET-FORM this
 0.285577   fs_attr_val DET-TYPE def
 0.217335   fs_attr_val DET-TYPE demon
 0.278342   lex_subcat achieve OBJ,SUBJ,VTYPE SUBJ,OBL-AG,PASSIVE=+
 0.00735123 lex_subcat acknowledge COMP-EX,SUBJ,VTYPE

Page 101: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Efficiency

Property counts
– Associated with the AND/OR tree of XLE contexts (a1, b2)
  » Detectors may add new nodes to the tree: conjoined contexts
– Shared among many parses

Training
– Dynamic-programming algorithm applied to the AND/OR tree
  » Avoids unpacking of individual parses (Miyao and Tsujii HLT'02)
  » Similar to the inside-outside algorithm for PCFGs
– Fast algorithm for choosing the best properties
– Can train only on sentences with relatively low ambiguity
  » Shorter, perhaps easier to annotate
– 5 hours to train over the WSJ (given a file of parses)

Disambiguation
– Viterbi algorithm applied to the Boolean tree
– 5% of parse time to disambiguate
– 30% gain in F-score over the random-parse baseline
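The Viterbi step over the packed representation can be sketched as a recursion on an AND/OR tree: OR nodes pick their highest-scoring alternative, AND nodes sum the scores of their parts, so the best parse is found without unpacking the alternatives. A toy version (real XLE shares subtrees, so the recursion would be memoized):

```python
# Viterbi over a packed AND/OR tree of analyses.
# A node is (kind, children, local_score) with kind in {'AND','OR','LEAF'}.
def best_score(node):
    kind, children, local = node
    if kind == "LEAF":
        return local
    scores = [best_score(c) for c in children]
    # OR: choose the best packed alternative; AND: combine all parts
    return local + (max(scores) if kind == "OR" else sum(scores))

leaf = lambda w: ("LEAF", [], w)
tree = ("AND",
        [("OR", [leaf(0.9), leaf(0.3)], 0.0),    # one packed choice point
         ("OR", [leaf(-0.2), leaf(0.4)], 0.0)],  # another choice point
        0.1)
print(best_score(tree))   # 0.1 + 0.9 + 0.4 = 1.4
```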

Page 102: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Integrating Shallow Markup: Part-of-speech tags, Named entities, Syntactic brackets

Page 103: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Shallow mark-up of input strings

Part-of-speech tags (tagger?)
I/PRP saw/VBD her/PRP duck/VB.
I/PRP saw/VBD her/PRP$ duck/NN.

Named entities (named-entity recognizer)
<person>General Mills</person> bought it.
<company>General Mills</company> bought it.

Syntactic brackets (chunk parser?)
[NP-S I] saw [NP-O the girl with the telescope].
[NP-S I] saw [NP-O the girl] with the telescope.

Page 104: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Hypothesis

Shallow mark-up
– Reduces ambiguity
– Increases speed
– Without decreasing accuracy
– (Helps development)

Issues
– Markup errors may eliminate correct analyses
– The markup process may be slow
– Markup may interfere with existing robustness mechanisms (optimality, fragments, guessers)
– Backoff may restore robustness but decrease speed in a 2-pass system

Page 105: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Implementation in XLE

Without markup: Input string → Tokenizer (FST) → Morphology (FST) → LFG grammar → c-str, f-str

With markup: Marked-up string → Tokenizer (FST) (plus POS, NE converter) → Morphology (FST) (plus POS filter) → LFG grammar (plus bracket metarule, NE sublexical rule) → c-str, f-str

Integration with minimal changes to the existing system/grammar

Page 106: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Experimental Results: PARC 700

Paired values are Full/All.

            % Full parses   Optimal sol'ns   Best F-sc   Time %
Unmarked         76            482/1753        82/79      65/100
Named ent        78            263/1477        86/84      60/91
POS tag          62            248/1916        76/72      40/48
Lab brk          65            158/774         85/79      19/31

Page 107: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Comparison: Shallow vs. Deep parsing (HLT, 2004)

Popular myth
– Shallow statistical parsers are fast, robust… and useful
– Deep grammar-based parsers are slow and brittle

Is this true? Comparison on predicate-argument relations, not phrase trees
– Needed for meaning-sensitive applications (= usefulness) (translation, question answering… but maybe not IR)
– Collins (1999) parser: state-of-the-art, marks arguments (for a fair test, wrote special code to make the relations explicit--not so easy)
– LFG/XLE with morphology, named entities, disambiguation
– Measured time and accuracy against the PARC 700 Gold Standard
– Results:
  » Collins is a bit faster than LFG/XLE
  » LFG/XLE makes somewhat fewer errors and provides more useful detail

Page 108: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

XLE System

Parser/generator for LFG grammars: multilingual
Composition with finite-state transductions
Careful ambiguity-management implementation
– Preserves context-free locality in equational disjunctions
– Exports ambiguity-enabling interfaces
Efficient implementation of clause conjunction (C1 ∧ C2)
Log-linear disambiguation
– Appropriate for LFG representations
– Ambiguity-enabled theory and implementation
Robustness: shallow in the worst case
Scales to broad-coverage grammars and long sentences
Semantic interface: Glue

Page 109: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

LFG/XLE: Current issues

Induction of LFG grammars from treebanks
– Basic work in ParGram: Dublin City University
– Principles of generalization, for human extension, combination with the manual grammar: DCU + PARC

Large grammars for more language typologies
– E.g. verb-initial: Welsh, Malagasy, Arabic

Reduce performance variance; why not linear?
– Competence vs. performance: limit center embedding?
– Investigate the speed/accuracy trade-off

Embedding in applications: XLE as a black box
– Question answering(!), translation, sentence condensation…
– Develop and combine with other ambiguity-enabled modules (reasoning, transfer-rewriting…)

Page 110: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Matching for Question Answering

[Diagram: a Question and Answer Sources are each fed through a Parser (English grammar) to an F-structure and then to Semantics; the two semantic representations are compared by an overlap detector]

Page 111: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Glue Semantics

Page 112: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Logical & Collocational Semantics

Logical semantics
– Maps sentences to logical representations of meaning
– Enables inference and reasoning

Collocational semantics
– Represents word meanings as feature vectors
– Typically obtained by statistical corpus analysis
– Good for indexing, classification, language modeling, word sense disambiguation
– Currently does not enable inference

Complementary, not conflicting, approaches

Page 113: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Example Semantic Representation

The wire broke

Syntax (f-structure):
[ PRED   ‘break<SUBJ>’
  SUBJ   [ PRED wire, SPEC def, NUM sg ]
  TENSE  past ]

Semantics (logical form):
∃w. wire(w) & w=part25 & ∃t. interval(t) & t<now & ∃e. break_event(e) & occurs_during(e,t) & object_of_change(e,w) & ∃c. cause_of_change(e,c)

The f-structure gives the basic predicate-argument structure, but lacks:
– Standard logical machinery (variables, connectives, etc.)
– Implicit arguments (events, causes)
– Contextual dependencies (the wire = part25)

The mapping from f-structure to logical form is systematic, but non-trivial.

Page 114: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Glue Semantics (Dalrymple, Lamping & Saraswat 1993 and subsequently)

Syntax-semantics mapping as linear-logic inference

Two logics in semantics:
– Meaning logic (the target semantic representation): any suitable semantic representation
– Glue logic (deductively assembles the target meaning): a fragment of linear logic

Syntactic analysis produces lexical glue premises

Semantic interpretation uses deduction to assemble the final meaning from these premises

Page 115: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Linear Logic

Influential development in theoretical computer science (Girard 87)

Premises are resources consumed in inference
(Traditional logic: premises are non-resourced)

Linguistic processing is typically resource-sensitive: words/meanings are used exactly once

Traditional                          Linear
A, A→B ⊨ B                           A, A -o B ⊨ B
A, A→B ⊨ A & B    (A re-used)        A, A -o B ⊭ A ⊗ B    (A consumed)
A, B ⊨ B          (A discarded)      A, B ⊭ B             (cannot discard A)

Page 116: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Glue Interpretation (Outline)

Parsing a sentence instantiates lexical entries to produce lexical glue premises

Example lexical premise (verb “saw” in “John saw Fred”):

  see : g -o (h -o f)
  (meaning term : glue formula)

see is a 2-place predicate; g, h, f are constituents in the parse:
“consume the meanings of g and h to produce the meaning of f”

Glue derivation: Γ ⊨ M : f
– Consume all lexical premises Γ
– to produce a meaning, M, for the entire sentence, f

Page 117: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Glue Interpretation: Getting the premises

Syntactic analysis of “John saw Fred”:

c-structure: [S [NP John] [VP [V saw] [NP Fred]]]

f-structure f: [ PRED see, SUBJ g:[PRED John], OBJ h:[PRED Fred] ]

Lexicon:
John  NP  john : ↑
Fred  NP  fred : ↑
saw   V   see : (↑ SUBJ) -o ((↑ OBJ) -o ↑)

Instantiated premises:
john : g
fred : h
see : g -o (h -o f)

Page 118: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Glue Interpretation: Deduction with premises

Premises:
john : g
fred : h
see : g -o (h -o f)

Linear-logic derivation (using linear modus ponens):
g -o (h -o f), g ⊢ h -o f
h -o f, h ⊢ f

Derivation with meaning terms:
see : g -o (h -o f), john : g ⊢ see(john) : h -o f
see(john) : h -o f, fred : h ⊢ see(john)(fred) : f

Linear modus ponens = function application
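The deduction above can be simulated directly: premises are (meaning, formula) pairs, and each linear modus ponens step consumes two premises and produces one, applying the meaning term as a function. A toy prover, not the actual XLE glue machinery:

```python
# Linear modus ponens over glue premises: each premise is a (meaning, formula)
# pair; a formula is an atom like "g" or an implication tuple (ant, '-o', cons).
def derive(premises):
    items = list(premises)
    changed = True
    while changed and len(items) > 1:
        changed = False
        for i, (fun, ffml) in enumerate(items):
            if isinstance(ffml, tuple):               # an implication premise
                ant, _, cons = ffml
                for j, (arg, afml) in enumerate(items):
                    if j != i and afml == ant:        # matching resource found
                        rest = [p for k, p in enumerate(items) if k not in (i, j)]
                        # both premises consumed; meaning term applied as function
                        items = rest + [(fun(arg), cons)]
                        changed = True
                        break
                if changed:
                    break
    return items

see = lambda g: lambda h: f"see({g},{h})"
premises = [("john", "g"), ("fred", "h"), (see, ("g", "-o", ("h", "-o", "f")))]
[(meaning, formula)] = derive(premises)
print(meaning, ":", formula)   # see(john,fred) : f
```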

Page 119: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Modus Ponens = Function Application: The Curry-Howard Isomorphism

The Curry-Howard Isomorphism pairs LL inference rules with operations on meaning terms:

  Fun : g -o f    Arg : g
  -----------------------
       Fun(Arg) : f

Propositional linear-logic inference constructs meanings
LL inference is completely independent of the meaning language
(Modularity of meaning representation)

Page 120: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Semantic Ambiguity: Multiple derivations from a single set of premises

“Alleged criminal from London”

f: [ PRED criminal, MODS {alleged, from London} ]

Premises:
criminal : f
alleged : f -o f
from-London : f -o f

Two distinct derivations:
1. from-London(alleged(criminal))
2. alleged(from-London(criminal))
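Because each modifier has type f -o f, any order of application yields a valid derivation; with N modifiers there are N! orders. A brute-force enumeration of the readings (the next slide's packing techniques avoid exactly this explosion):

```python
# Enumerate the readings produced by permuting f -o f modifiers
# around a skeleton meaning, as in "alleged criminal from London".
from itertools import permutations

def readings(skeleton, modifiers):
    results = set()
    for order in permutations(modifiers):   # N! application orders
        meaning = skeleton
        for name in order:
            meaning = f"{name}({meaning})"  # apply one f -o f modifier
        results.add(meaning)
    return results

mods = ["alleged", "from_London"]
print(sorted(readings("criminal", mods)))
# ['alleged(from_London(criminal))', 'from_London(alleged(criminal))']
```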

Page 121: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Semantic Ambiguity & Modifiers

Multiple derivations from a single premise set
– Arise through different ways of permuting modifiers around a skeleton

Modifiers are given formal representation in glue as f -o f logical identities
– E.g. an adjective is a noun -o noun modifier

Modifiers are prevalent in natural language, and lead to combinatorial explosion
– Given N f -o f modifiers, there are N! ways of permuting them around the f skeleton

Page 122: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Ambiguity management in semantics

Efficient theorem provers that manage the combinatorial explosion of modifiers
– Packing of N! analyses
  » Represent all N! analyses in polynomial space
  » Compute the representation in polynomial time
  » Free choice: read off any given analysis in linear time
– Packing through structure re-use
  » N! analyses through combinations of N sub-analyses
  » Compute each sub-analysis once, and re-use it

Page 123: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence

Parc Linguistic Environment

[Diagram: a multidimensional architecture relating
– Processes: Parse, Generate, Select, Transfer, Interpret
– Theory: Glue Semantics, LFG Syntax, FS Morphology
– Languages: English, French, Japanese, German, Urdu, Norwegian
– Software: algorithms, programs, data structures; mathematics; models, parameters
– Applications: Translation, Condensation, Dialog, Question Answering, Email Routing, Knowledge tracking, Email Response
– Themes: Scale, Modularity, Robustness, Ambiguity Management
(Theory / Software / Tableware)]

Page 124: Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004 Advanced QUestion Answering for INTelligence