



Knowledge and Tree-Edits in Learnable Entailment Proofs

Asher Stern, Amnon Lotan, Shachar Mirkin, Eyal Shnarch, Lili Kotlerman, Jonathan Berant and Ido Dagan

TAC, November 2011, NIST, Gaithersburg, Maryland, USA
Download at: http://www.cs.biu.ac.il/~nlp/downloads/biutee

BIUTEE


RTE
• Classify a (T, H) pair as ENTAILING or NON-ENTAILING

Example
T: The boy was located by the police.
H: Eventually, the police found the child.


Matching vs. Transformations
• Matching
• Sequence of transformations (a proof)
  – Tree-Edits
    • Complete proofs
    • Estimate confidence
  – Knowledge-based Entailment Rules
    • Linguistically motivated
    • Formalize many types of knowledge

T = T_0 → T_1 → T_2 → ... → T_n = H


Transformation based RTE - Example
T = T_0 → T_1 → T_2 → ... → T_n = H

Text: The boy was located by the police.
The police located the boy.
The police found the boy.
The police found the child.
Hypothesis: Eventually, the police found the child.
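The chain above can be pictured programmatically. Below is a minimal sketch in Python that runs such a chain over plain strings rather than the dependency parse trees BIUTEE actually transforms; the step names and the lambda bodies are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    name: str                      # e.g. "passive to active", "boy -> child"
    apply: Callable[[str], str]    # transformation applied to the current text

def run_proof(text: str, steps: List[Step]) -> List[str]:
    """Apply each transformation in order, returning the chain T0, T1, ..., Tn."""
    chain = [text]
    for step in steps:
        chain.append(step.apply(chain[-1]))
    return chain

steps = [
    Step("passive to active", lambda t: "The police located the boy."),
    Step("X locate Y -> X find Y", lambda t: t.replace("located", "found")),
    Step("boy -> child", lambda t: t.replace("boy", "child")),
    Step("insertion on the fly", lambda t: "Eventually, the police found the child."),
]

for i, t in enumerate(run_proof("The boy was located by the police.", steps)):
    print(f"T{i}: {t}")
```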



BIUTEE Goals
• Tree Edits
  1. Complete proofs
  2. Estimate confidence
• Entailment Rules
  3. Linguistically motivated
  4. Formalize many types of knowledge
• BIUTEE integrates the benefits of both worlds


Challenges / System Components

How to:
1. Generate linguistically motivated complete proofs?
2. Estimate proof confidence?
3. Find the best proof?
4. Learn the model parameters?


1. Generate linguistically motivated complete proofs

Entailment Rules
• Lexical: boy → child
• Lexical Syntactic
• Generic Syntactic

Bar-Haim et al. 2007. Semantic inference at the lexical-syntactic level.
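A minimal sketch of what a lexical entailment-rule resource could look like, assuming rules are (lhs, rhs) pairs with a confidence score. The resource names, scores, and token-level matching are illustrative simplifications of the tree-level rule application described by Bar-Haim et al.

```python
# Illustrative lexical rules: (lhs, rhs) pairs with a resource name and score.
lexical_rules = {
    ("boy", "child"): {"resource": "WordNet", "score": 1.0},
    ("locate", "find"): {"resource": "DIRT", "score": 0.9},
}

def applicable_rules(lemmas, rules):
    """Return the rules whose left-hand side occurs among the sentence lemmas."""
    return [(lhs, rhs, info) for (lhs, rhs), info in rules.items() if lhs in lemmas]

# Lemmatized text of the running example.
print(applicable_rules(["the", "police", "locate", "the", "boy"], lexical_rules))
```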


Extended Tree Edits (On-the-Fly Operations)
• Predefined custom tree edits
  – Insert node on the fly
  – Move node / move sub-tree on the fly
  – Flip part of speech
  – …
• Heuristically capture linguistic phenomena
  – Operation definition
  – Features definition
(a node-insertion sketch follows below)
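A minimal sketch of one on-the-fly operation (node insertion) on a toy dependency tree. The Node structure and the helper function are hypothetical and far simpler than the labeled parse trees BIUTEE actually edits.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    word: str
    children: List["Node"] = field(default_factory=list)

def insert_node_on_the_fly(root: Node, new_word: str) -> Node:
    """Return a copy of the tree with a new child node (taken from the hypothesis) attached to the root."""
    return Node(root.word, list(root.children) + [Node(new_word)])

tree = Node("found", [Node("police"), Node("child")])
print(insert_node_on_the_fly(tree, "Eventually"))
```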

Proof over Parse Trees - Example
T = T_0 → T_1 → T_2 → ... → T_n = H

Text: The boy was located by the police.
  (passive to active)
The police located the boy.
  (X locate Y → X find Y)
The police found the boy.
  (boy → child)
The police found the child.
  (insertion on the fly)
Hypothesis: Eventually, the police found the child.



2. Estimate proof confidence


Cost-based Model
• Define operation cost
  – Assesses the operation's validity
  – Represent each operation as a feature vector
  – Cost is a linear combination of feature values
• Define proof cost as the sum of the operations' costs
• Classify: ENTAILING if and only if the proof cost is smaller than a threshold (a sketch of this model follows below)
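A minimal sketch of the cost model above, assuming each operation is already represented by a feature vector f(o) (see the next slide); the weight values and the threshold are made up for illustration.

```python
import numpy as np

w = np.array([0.3, 1.2, 0.8])   # weight vector (illustrative values; learned in practice)
b = 1.5                          # decision threshold (illustrative)

def operation_cost(f_o: np.ndarray) -> float:
    """Cost of one operation: a linear combination of its feature values."""
    return float(w @ f_o)

def proof_cost(operation_vectors) -> float:
    """Cost of a proof: the sum of its operations' costs."""
    return sum(operation_cost(f) for f in operation_vectors)

def entails(operation_vectors) -> bool:
    """ENTAILING iff the proof cost is below the threshold."""
    return proof_cost(operation_vectors) < b

print(entails([np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.457, 0.0])]))
```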

Feature vector representation
• Define operation cost
  – Represent each operation as a feature vector

Features: (Insert-Named-Entity, Insert-Verb, …, WordNet, Lin, DIRT, …)

An operation:
The police located the boy.
  DIRT: X locate Y → X find Y (score = 0.9)
The police found the boy.

Feature vector that represents the operation:
  (0, 0, …, 0.457, …, 0)
where 0.457 is a downward (decreasing) function of the rule score (one plausible such mapping is sketched below).
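A minimal sketch of turning a resource score into an operation feature value. The slide only says the value is a decreasing ("downward") function of the score; -log(score) below is one plausible choice, not necessarily the one BIUTEE uses, and the feature list is abbreviated.

```python
import math

FEATURES = ["Insert-Named-Entity", "Insert-Verb", "WordNet", "Lin", "DIRT"]  # abbreviated list

def operation_feature_vector(resource: str, score: float):
    """Build a sparse vector with one non-zero entry for the resource that licensed the rule."""
    vec = [0.0] * len(FEATURES)
    vec[FEATURES.index(resource)] = -math.log(score)  # decreasing in the score: confident rules cost less
    return vec

# DIRT rule "X locate Y -> X find Y" with score 0.9 yields a small positive feature value.
print(operation_feature_vector("DIRT", 0.9))
```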


Cost-based Model
• Define operation cost
  – Cost is a linear combination of feature values:
    Cost = weight-vector · feature-vector
  – The weight vector is learned automatically

C_w(o) = w^T · f(o)

Confidence Model
• Define operation cost
  – Represent each operation as a feature vector
• Define proof cost as the sum of the operations' costs:

  C_w(P) = Σ_{i=1..n} C_w(o_i) = Σ_{i=1..n} w^T · f(o_i) = w^T · Σ_{i=1..n} f(o_i) = w^T · f(P)

  where C_w(P) is the cost of the proof and w is the weight vector.

• Define f(P) = Σ_{i=1..n} f(o_i), the vector that represents the proof.


Feature vector representation - example
T = T_0 → T_1 → T_2 → ... → T_n = H

Text: The boy was located by the police.
  (passive to active)
The police located the boy.
  (X locate Y → X find Y)
The police found the boy.
  (boy → child)
The police found the child.
  (insertion on the fly)
Hypothesis: Eventually, the police found the child.

Each operation is represented by a sparse feature vector:
  (0, 0, ……………….………, 1, 0)
+ (0, 0, ………..…, 0.457, .., 0, 0)
+ (0, 0, ..…, 0.5, .…….……, 0, 0)
+ (0, 0, 1, ……..…….…..…, 0, 0)
Their sum represents the whole proof:
= (0, 0, 1, .., 0.5, ..…, 0.457, ...…, 1, 0)
(see the sketch below)
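A small sketch showing why summing the operation vectors first gives the same proof cost, i.e. C_w(P) = w^T · f(P). The vector values echo the illustrative ones above; the positions and the weight vector are arbitrary.

```python
import numpy as np

# One sparse vector per operation (positions and values are illustrative).
operation_vectors = [
    np.array([0.0, 0.0, 0.0, 1.0, 0.0]),
    np.array([0.0, 0.0, 0.457, 0.0, 0.0]),
    np.array([0.0, 0.5, 0.0, 0.0, 0.0]),
    np.array([1.0, 0.0, 0.0, 0.0, 0.0]),
]

f_P = np.sum(operation_vectors, axis=0)    # f(P) = sum_i f(o_i): the proof vector
w = np.array([1.0, 2.0, 3.0, 0.5, 1.0])   # arbitrary weight vector

# Summing the per-operation costs equals a single dot product with the proof vector.
assert np.isclose(sum(w @ f for f in operation_vectors), w @ f_P)
print(f_P, float(w @ f_P))
```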


Cost-based Model
• Define operation cost
  – Represent each operation as a feature vector
• Define proof cost as the sum of the operations' costs
• Classify: ENTAILING if and only if the proof cost is smaller than a threshold:

  w^T · f(P) < b

Learn w and b.


3. Find the best proof


Search the best proof

[Diagram: several candidate proofs (Proof #1 … Proof #4) lead from T to H]

• Need to find the "best" proof
• "Best proof" = the proof with the lowest cost
  ‒ assuming a weight vector is given
• The search space is exponential
  ‒ an AI-style search algorithm is used (a sketch follows below)
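A minimal sketch of searching for the lowest-cost proof with a uniform-cost (Dijkstra-style) search. BIUTEE's actual search algorithm over parse trees is more involved; all helper functions here (possible_operations, apply_operation, operation_cost, matches_hypothesis) are hypothetical placeholders.

```python
import heapq
import itertools

def best_proof(text_state, hypothesis, possible_operations, apply_operation,
               operation_cost, matches_hypothesis, max_steps=20):
    """Uniform-cost search: always expand the cheapest partial proof first."""
    counter = itertools.count()                      # tie-breaker so states are never compared
    frontier = [(0.0, next(counter), text_state, [])]
    while frontier:
        cost, _, state, proof = heapq.heappop(frontier)
        if matches_hypothesis(state, hypothesis):
            return cost, proof                       # cheapest complete proof found
        if len(proof) >= max_steps:
            continue                                 # crude bound on the proof length
        for op in possible_operations(state, hypothesis):
            new_state = apply_operation(state, op)
            heapq.heappush(frontier, (cost + operation_cost(op), next(counter),
                                      new_state, proof + [op]))
    return None                                      # no proof found within the step bound
```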


4. Learn model parameters


Learning
• Goal: learn the parameters (w, b)
• Use a linear learning algorithm
  – logistic regression, SVM, etc. (a sketch follows below)
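A minimal sketch of the learning step, assuming scikit-learn is available: once every training pair has a proof feature vector f(P), any linear classifier yields w and b. The training data below is made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Proof feature vectors f(P) for four training (T, H) pairs -- made-up numbers.
X = np.array([[0.1, 0.0, 0.3],
              [1.2, 0.9, 0.7],
              [0.0, 0.2, 0.1],
              [0.8, 1.1, 0.9]])
y = np.array([1, 0, 1, 0])       # 1 = ENTAILING, 0 = NON-ENTAILING

clf = LogisticRegression().fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]   # the learned weight vector and bias
print(w, b)
```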


Inference vs. Learning

[Diagram: training samples → best proofs → feature extraction → vector representation → learning algorithm → w, b]


Iterative Learning Scheme

[Diagram: training samples → best proofs → vector representation → learning algorithm → w, b]

1. w = a reasonable guess
2. Find the best proofs
3. Learn new w and b
4. Repeat from step 2 (a sketch of this loop follows below)
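A minimal sketch of the iterative scheme above. find_best_proof and proof_feature_vector stand in for the search and feature-extraction components described earlier and are hypothetical placeholders, as is the fixed iteration count used as a stopping criterion.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def iterative_training(train_pairs, labels, find_best_proof, proof_feature_vector,
                       n_features, iterations=5):
    w = np.ones(n_features)                          # 1. a reasonable initial guess
    b = 0.0
    for _ in range(iterations):
        # 2. find the best (lowest-cost) proof for each pair under the current parameters
        proofs = [find_best_proof(t, h, w, b) for (t, h) in train_pairs]
        X = np.array([proof_feature_vector(p) for p in proofs])
        # 3. learn new w and b from the resulting proof vectors
        clf = LogisticRegression().fit(X, labels)
        w, b = clf.coef_[0], clf.intercept_[0]
        # 4. repeat from step 2 with the updated parameters
    return w, b
```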


Summary - System Components

How to:
1. Generate syntactically motivated complete proofs?
   – Entailment rules
   – On-the-fly operations (extended tree edit operations)
2. Estimate proof validity?
   – Confidence model
3. Find the best proof?
   – Search algorithm
4. Learn the model parameters?
   – Iterative learning scheme


Results on RTE7

ID    Knowledge Resources                                                           Precision %  Recall %  F1 %
BIU1  WordNet, Directional Similarity                                               38.97        47.40     42.77
BIU2  WordNet, Directional Similarity, Wikipedia                                    41.81        44.11     42.93
BIU3  WordNet, Directional Similarity, Wikipedia, FrameNet, Geographical database   39.26        45.95     42.34

BIUTEE 2011 on RTE6 (F1 %)

Baseline (IR top-5 relevance)    34.63
Median (September 2010)          36.14
Best (September 2010)            48.01
Our system                       49.54


Conclusions
• Inference via a sequence of transformations
  – Knowledge
  – Extended tree edits
• Proof confidence estimation
• Results
  – Better than the median on RTE7
  – Best on RTE6
• Open source: http://www.cs.biu.ac.il/~nlp/downloads/biutee

Thank You

http://www.cs.biu.ac.il/~nlp/downloads/biutee