Global Learning of Textual Entailment Graphs
Supervisors: Eytan Ruppin, Ido Dagan, Shimon Edelman (15/8/12)
Natural Language Understanding
• Roots in early AI (50's)
• Information explosion - 2008: 1 trillion unique URLs found by Google
Organizing Textual Information
• Acute need to organize textual information - based on meaning rather than form
• Etzioni, "Search needs a shake-up", Nature 476 - challenge: inference/language variability
NLU Applications
Q: Where was Reagan raised?
A: Reagan was brought up in Dixon.
… IBS is characterized by lower abdominal pain …
NLU Applications
Purchase events:
IBM Coremetrics
Yahoo! Overture
Google reMail
… …
… Google acquires mobile E-mail utility reMail …
Obama gave a speech last night in the Israeli lobby conference
In his speech at the American Israel Public Affairs Committee yesterday, the president challenged …
Barack Obama's AIPAC address yesterday ...
NLU Applications
• Multi-document Summarization
Textual Entailment
Textual Entailment (TE) - Definition
• A directional relation between two text fragments: Text (t) and Hypothesis (h)
• t textually entails h (t ⇒ h) if humans reading t will infer that h is most likely true
t: Usain Bolt celebrates 100m gold with Swedish women's handball team h: Usain Bolt won a gold medal
• Recognising Textual Entailment (RTE): given t and h, recognize whether t textually entails h
Appeal of Textual Entailment
• Inferences required by semantic applications can be reduced to recognizing textual entailment:
• QA (Harabagiu and Hickl, 2006): Reagan was brought up in Dixon ⇒ Reagan was raised in X
• IE (Romano et al., 2006): Google acquired e-mail utility reMail ⇒ Xcompany purchase Ycompany
• Summarization: omit sentences entailed by the current summary
• Led to seven challenges for semantic inference systems since 2005
Entailment rules
• TE systems require knowledge - encoded as entailment rules:
giraffe ⇒ mammal
X was brought up in Y ⇒ X was raised in Y
Xcity capital of Ycountry ⇒ Xcity be part of Ycountry
X [VERBtrans] Y ⇒ Y be [VERBparticiple] by X
• Automatic methods for learning large-scale knowledge resources are imperative
Predicative Entailment Rules
• Proposition: NL expression containing one predicate and at least one argument:
Andy Murray jumped up and down
Ryan Lochte beat Michael Phelps
Yohan Blake passed the baton to Usain Bolt
• Propositional template: proposition where at least one argument is replaced by a variable:
X jumped up and down
Xplayer beat Yplayer
X passed the baton to Y
Predicative Entailment Rules
• Information is conveyed by propositions
Y is a symptom of X ⇒ X cause Y
X cause an increase in Y ⇒ X affect Y
X’s treatment of Y ⇒ X treat Y
Contribution 1: Entailment Graphs
X affect Y
X treat Y X lower Y
X reduce Y
X affect Y ⇒ X treat Y
X treat Y ⇒ X affect Y
X affect Y ⇒ X lower Y
X lower Y ⇒ X affect Y
…
…
X lower Y ⇒ X reduce Y
X reduce Y ⇒ X lower Y
• Advantage: employ structural constraints
Contribution 2: Scaling
(Chart: number of graph nodes and learned rules, log scale from 1 to 100,000, by year - 2010 and 2011 with exact methods, 2012 with an approximate method)
Contribution 3: Applications
• Public release of learned rules:
1. High precision resource: 100,000 rules
2. High coverage resource: 10,000,000 rules
• Textual exploration application
Outline
• Background
• Basic algorithm and ILP formulation
• Scalability 1: Graph decomposition
• Scalability 2: Tree-based approximation
• Text exploration application
Background
Local Learning
• Sources of information:
1. Lexicographic: WordNet (Szpektor and Dagan, 2009), FrameNet (Ben Aharon et al., 2010)
2. Pattern-based (Chklovski and Pantel, 2004)
3. Distributional similarity (Lin and Pantel, 2001; Szpektor and Dagan, 2008; Bhagat et al., 2007; Yates and Etzioni, 2009; Poon and Domingos, 2010; Schoenmackers et al., 2010)
Input: pair of predicates (p1, p2)
Question: p1 → p2?
Properties of Information Sources
1. Lexicographic:
- WordNet relations: hyponym, derivation, entailment
- Limited coverage: "X cause a reduction in Y"
2. Pattern-based:
- "he scared and even startled me"
- Requires a very large corpus (web-scale)
- Distinguishes different semantic relations: "to X and even Y" vs. "either X or Y"
3. Distributional similarity
Distributional Similarity

X affect Y:
X        Y           #
insulin  metabolism  7
Zantac   BP          4
…        …           …

X treat Y:
X       Y         #
Zantac  BP        9
diet    diabetes  2
…       …         …

• Methods vary w.r.t. representation and similarity computation
• Symmetric vs. directional
• Good coverage
• Discerning the exact semantic relation is difficult
Global Learning
Input: set of predicates P
Question: find E = { (p1, p2) | p1 → p2 }
• Applying transitivity constraints:
- Undirected graphs:
• Paraphrasing (Yates and Etzioni, 2009)
• Co-reference resolution (Finkel and Manning, 2008)
- Directed graphs:
• Taxonomy induction (Snow et al., 2006)
• Ontology induction (Poon and Domingos, 2010)
• Temporal information extraction (Ling and Weld, 2010)
Global Learning
• Snow et al. (2006) presented an algorithm for taxonomy induction.
(Figure: taxonomy fragment - mammal, carnivore, feline, horse, cat)
• At each step they add the concept that maximizes the likelihood of the taxonomy given the transitivity constraint.
Basic Algorithm and ILP Formulation

Entailment Graphs
• Nodes: propositional templates
• Edges: entailment rules
(Graph fragment: X treat Y ⇒ X affect Y; X lower Y ⇒ X affect Y; X reduce Y ⇒ X lower Y)
Properties:
• Entailment is transitive
• Strongly connected components correspond to "equivalence"
• Caveat: ambiguity
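Since entailment is transitive, the full rule set a graph encodes is the transitive closure of its edges. A minimal illustrative sketch (not the thesis implementation):

```python
def transitive_closure(edges):
    """Add (i, l) whenever i => j and j => l are present, until a
    fixed point is reached (Warshall-style; fine for small graphs)."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (i, j) in list(closure):
            for (k, l) in list(closure):
                if j == k and i != l and (i, l) not in closure:
                    closure.add((i, l))
                    changed = True
    return closure

# The graph fragment from this slide:
edges = {("X treat Y", "X affect Y"),
         ("X lower Y", "X affect Y"),
         ("X reduce Y", "X lower Y")}
# Transitivity adds one implied rule: X reduce Y => X affect Y
```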
Learning Entailment Graph Edges
Input: corpus C
Output: entailment graph G = (P, E)
1. Extract propositional templates P from C
2. Train a local entailment classifier: given (p1, p2), estimate whether p1 → p2
3. Decoding: find the edges of the graph using the local probabilities and a transitivity constraint
Local Entailment Classifier - Steps 1+2
(Figure: distant supervision. Propositions extracted from the corpus - treat(Norvasc,BP), affect(Norvasc,BP), treat(insulin,metab.), affect(diet,diabetes), raise(wine,fatigue), lower(wine,BP) - yield training examples: positive, e.g. treat → affect with feature vector (0.1, 0.9, 0.8, 0.6, 1, 0, 0.7); negative, e.g. raise → lower with feature vector (0, 0.1, 0.2, 0, 0, 0.3, 0). The trained classifier then scores new pairs, e.g. (raise, increase): P = 0.7)
Graph Objective Function
• Use local classifier probabilities p_ij to express the graph probability:

Ĝ = argmax_X Σ_{i≠j} w_ij · x_ij

w_ij = log [ p_ij · (1 − θ) / ((1 − p_ij) · θ) ]

where x_ij = 1 if i → j and 0 otherwise, and θ is a "density" prior.
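The weight of a candidate edge is just the classifier's log-odds shifted by the density prior θ; a small sketch with hypothetical values:

```python
import math

def edge_weight(p_ij, theta):
    """w_ij = log[ p_ij * (1 - theta) / ((1 - p_ij) * theta) ].
    Positive exactly when p_ij > theta, so a larger prior theta
    yields a sparser graph."""
    return math.log(p_ij * (1 - theta)) - math.log((1 - p_ij) * theta)

# With theta = 0.5 the weight reduces to the plain log-odds:
w = edge_weight(0.8, 0.5)  # log(0.8/0.2) > 0: worth taking
```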
Global Learning of Edges - Step 3
Input: set of nodes V, weighting function w: V×V → R
Output: set of directed edges E that maximizes the objective function under a global transitivity constraint
• The problem is NP-hard - reduction from "Transitive Subgraph" (Yannakakis, 1978):
Input: directed graph G = (V, E)
Output: maximal set of edges A ⊆ E such that G' = (V, A) is transitive
• Integer Linear Programming formulation
Integer Linear Program

Ĝ = argmax_x Σ_{i≠j} w_ij · x_ij
s.t. ∀ i,j,k ∈ V: x_ij + x_jk − x_ik ≤ 1
     x_ij ∈ {0, 1}

• Variables: x_ij
• Objective function: maximizes P(G)
• Global transitivity constraint: O(|V|^3) constraints
(Example of a violated constraint: x_ij = x_jk = 1 and x_ik = 0 gives 1 + 1 − 0 = 2 > 1)
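For intuition only, the program can be solved by exhaustive search on toy inputs (the weights below are hypothetical; real instances need an ILP solver):

```python
from itertools import product

def solve_exact(nodes, w):
    """Enumerate all edge sets, keep those satisfying transitivity
    (x_ij = x_jk = 1 implies x_ik = 1), and return the best-scoring
    one.  Exponential in the number of node pairs -- toy graphs only."""
    pairs = [(i, j) for i in nodes for j in nodes if i != j]
    best_score, best_edges = float("-inf"), set()
    for bits in product([0, 1], repeat=len(pairs)):
        edges = {p for p, b in zip(pairs, bits) if b}
        violated = any(
            (i, j) in edges and (j, k) in edges and (i, k) not in edges
            for i in nodes for j in nodes for k in nodes
            if len({i, j, k}) == 3)
        if violated:
            continue
        score = sum(w.get(p, 0.0) for p in edges)
        if score > best_score:
            best_score, best_edges = score, edges
    return best_edges, best_score

nodes = ["a", "b", "c"]
w = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): -0.5}
# a -> b and b -> c are both worth taking, and transitivity then
# forces the mildly negative edge a -> c into the solution.
edges, score = solve_exact(nodes, w)  # score = 1.5
```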
Evaluation: Focused Entailment Graphs
• An argument is instantiated by a target concept (nausea)
• Instantiating an argument reduces ambiguity
X-related-to-nausea, X-associated-with-nausea
X-prevent-nausea, X-help-with-nausea
X-reduce-nausea, X-treat-nausea
Experimental Evaluation
• 50-million-word-token healthcare corpus
• Ten medical students prepared gold standard graphs for 23 medical concepts:
- smoking, seizure, headache, lungs, diarrhea, chemotherapy, HPV, Salmonella, asthma, etc.
• Evaluation: F1 on the set of learned edges vs. the gold standard
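Edge-set F1 against a gold standard graph can be sketched as follows (toy edges, for illustration only):

```python
def edge_f1(learned, gold):
    """Precision, recall and F1 of a learned edge set vs. a gold one."""
    tp = len(learned & gold)
    p = tp / len(learned) if learned else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = {("a", "b"), ("b", "c"), ("a", "c")}
learned = {("a", "b"), ("a", "c"), ("c", "a")}
p, r, f1 = edge_f1(learned, gold)  # p = r = f1 = 2/3
```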
Gold Standard Graph - Asthma
Evaluated algorithms
• Local algorithms:
- Distributional similarity
- WordNet
- Local classifier
• Global algorithms:
- ILP / Snow et al. (greedy optimization)
Results

Method      Recall  Precision  F1
ILP-global  46.0    50.1       43.8*
Greedy      45.7    37.1       36.6
ILP-local   44.5    45.3       38.1
Local1      53.5    34.9       37.5
Local2      52.5    31.6       37.7
Local*1     53.5    38.0       39.8
Local*2     52.5    32.1       38.1
WordNet     10.8    44.1       13.2

• The global ILP algorithm significantly outperforms all baselines.
Results

                       Global=false / Local=true   Global=true / Local=false
Gold standard = true   42                          48
Gold standard = false  494                         78

• The global algorithm avoids false positives.
Illustration - Graph Fragment
Graph Decomposition

Graph Decomposition
• Proposition: if we can partition the nodes into sets I and J such that w_ij < 0 for every i ∈ I, j ∈ J, then no pair (i, j) crossing the partition is an edge in the optimal solution
(Figure: partition of the nodes into I and J)
Graph Decomposition
(Figure: a graph whose edges are marked w_ij > 0 or w_ij < 0)
• Entailment graphs are sparse
Scalability - Graph Decomposition
(Figure: edges with w_ij > 0 vs. w_ij < 0; panels contrast a local solution with its transitivity violations, a local vs. global solution, and a local solution with its connected components)
Scalability - Graph Decomposition
1. Input
2. Undirected positive edges
3. Connected components
4. Apply ILP
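Steps 2-3, splitting the graph along the connected components of the positive-weight skeleton, can be sketched as follows (illustrative weights):

```python
from collections import defaultdict

def decompose(nodes, w):
    """Connected components of the undirected graph linking i and j
    whenever w[(i, j)] > 0 or w[(j, i)] > 0.  By the proposition above,
    no entailment edge of the optimal solution crosses a component
    boundary, so each component can be handed to the ILP separately."""
    adj = defaultdict(set)
    for (i, j), weight in w.items():
        if weight > 0:
            adj[i].add(j)
            adj[j].add(i)
    seen, components = set(), []
    for n in nodes:
        if n in seen:
            continue
        comp, stack = set(), [n]
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        components.append(comp)
    return components

nodes = ["a", "b", "c", "d"]
w = {("a", "b"): 1.0, ("b", "a"): -2.0, ("c", "d"): 0.5, ("a", "c"): -1.0}
parts = decompose(nodes, w)  # {a, b} and {c, d}
```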
Scalability - Graph Decomposition
• Input • Undirected positive edges • Connected components
5. Global solution
Decomposition Extension
Cutting-plane Method
• Applied in parsing (Riedel and Clarke, 2006)
• Number of constraints: O(|V|^3)
• A reasonable local classifier does not violate most of the constraints
• Add constraints only if they are violated
Cutting-plane Method
(Figure: nodes a, b, c, d)
Constraints: 1. (c,a,b) 2. (d,b,a) 3. (d,b,c)
• Empirically converges in 6 iterations and reduces the number of constraints from 10^6 to 10^3-10^4
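The core of a cutting-plane loop is the check for transitivity constraints violated by the current solution; a sketch over a toy edge set:

```python
def violated_triples(nodes, edges):
    """Transitivity constraints x_ij + x_jk - x_ik <= 1 that the
    current solution violates: i -> j and j -> k are present but
    i -> k is not.  A cutting-plane solver adds only these
    constraints and re-solves."""
    return [(i, j, k)
            for i in nodes for j in nodes for k in nodes
            if len({i, j, k}) == 3
            and (i, j) in edges and (j, k) in edges
            and (i, k) not in edges]

# a -> b and b -> c without a -> c violates exactly one constraint:
bad = violated_triples(["a", "b", "c"], {("a", "b"), ("b", "c")})
```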
Final Algorithm
• Given a set of predicates:
- Decompose into components
- Apply a cutting-plane method to each component
Evaluation: Typed Entailment Graphs
• Predicate variables are typed:
- Xcompany acquire Ycompany → Ycompany is sold to Xcompany
• Rules of wide applicability, yet less ambiguous
• Schoenmackers et al. (EMNLP 2010) used a local algorithm to learn 30,000 entailment rules
• We want to use a global algorithm
Typed Entailment Graphs
province of (place,country)
become part of (place,country)
annex (country,place)
invade (country,place)
• A graph is defined for every pair of types
• Nodes are termed typed predicates
• Types alleviate the ambiguity problem
Experiment 1 - Transitivity
• 1 million TextRunner tuples over 10,672 typed predicates and 156 types
• These constitute ~2,000 typed entailment graphs
• 10 gold standard graphs of sizes 7, 14, 22, 30, 38, 53, 59, 62, 86 and 118
• Evaluation:
- F1 on the set of edges vs. the gold standard
- Area Under the Curve (AUC)
Evaluated algorithms
• Local algorithms:
- Distributional similarity (DIRT, BInc, etc.)
- Distributional similarity + background knowledge
- ILP with no transitivity constraints
- Sherlock rule resource
• Global algorithm: ILP
Precision-Recall: Global vs. Local
(Precision-recall curve comparing the Global and Local algorithms; precision 0-1 against recall 0-0.9)
Precision-Recall: Global vs. Local
(Precision-recall curve comparing Global against Local1-Local4)
Precision-Recall: Global vs. Local
(Precision-recall curve comparing Global against Sherlock)
Results

Method    R     P     F1    AUC (micro-average)
ILP       43.4  42.2  42.8  0.22
Clsf      30.8  37.5  33.8  0.17
Sherlock  20.6  43.3  27.9  N/A
SR        38.4  23.2  28.9  0.14
DIRT      25.7  31.0  28.1  0.13
BINC      31.8  34.1  32.9  0.17

• R/P/F1 at the point of maximal micro-F1
• Transitivity improves rule learning over typed predicates
Experiment 2 - Scalability
• Run ILP with and without graph decomposition over ~2,000 graphs
• Compare the number of learned rules for various sparseness parameters
Results

Sparsity  # learned rules (without / with decomposition)  Δ(%)
-1.75     6,242 / 7,466                                   +20
-1        16,790 / 19,396                                 +16
-0.6      26,330 / 29,732                                 +13

• The scaling techniques add ~3,500 new rules to the best configuration
• This corresponds to a 13% increase in relative recall
Tree-based Approximation

Tree-based Approximation - Outline
1. Forest-reducible graphs (FRG)
2. Tree-Node-Fix algorithm (TNF)
3. Experiments:
- Typed entailment graphs
- Untyped entailment graphs
Directed Forest
• For an edge i → j in a DAG we say that i is a child of j and j is a parent of i
• A directed forest is a DAG in which no node has more than one parent
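Under this child/parent convention (an edge i → j makes j the parent of i), the forest test on a DAG reduces to a parent count; a small sketch:

```python
def is_directed_forest(nodes, edges):
    """True iff no node has more than one parent.  Assumes the input
    is already a DAG; an edge (i, j) means i -> j, so j is i's parent."""
    parents = {n: 0 for n in nodes}
    for i, j in edges:
        parents[i] += 1
    return all(count <= 1 for count in parents.values())

# Two children sharing one parent: a forest.
ok = is_directed_forest(["a", "b", "c"], [("a", "b"), ("c", "b")])
# One node with two parents: not a forest.
bad = is_directed_forest(["a", "b", "c"], [("a", "b"), ("a", "c")])
```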
Entailment Graphs are not Forests
(Graph over: Xdisease be epidemic in Ycountry, Xdisease common in Ycountry, Xdisease frequent in Ycountry, Xdisease occur in Ycountry, Xdisease begin in Ycountry)
• But generally entailment goes from "specific" predicates to more "general" ones
Computing the SCC Graph
(Graph over: Xdisease be epidemic in Ycountry, Xdisease common in Ycountry, Xdisease frequent in Ycountry, Xdisease occur in Ycountry, Xdisease begin in Ycountry)
• Contract strongly connected components.
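Contracting SCCs can be done with any standard algorithm; a sketch using Kosaraju's two-pass DFS (illustrative, not the thesis code), on this slide's example:

```python
def sccs(nodes, edges):
    """Kosaraju's algorithm.  Each SCC groups templates that mutually
    entail each other ("equivalence"); contracting them yields a DAG."""
    adj = {n: [] for n in nodes}
    radj = {n: [] for n in nodes}
    for i, j in edges:
        adj[i].append(j)
        radj[j].append(i)
    order, seen = [], set()                 # pass 1: DFS finish order
    for s in nodes:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, 0)]
        while stack:
            u, idx = stack.pop()
            if idx < len(adj[u]):
                stack.append((u, idx + 1))
                v = adj[u][idx]
                if v not in seen:
                    seen.add(v)
                    stack.append((v, 0))
            else:
                order.append(u)
    comps, seen = [], set()                 # pass 2: reversed graph
    for s in reversed(order):
        if s in seen:
            continue
        seen.add(s)
        group, stack = set(), [s]
        while stack:
            u = stack.pop()
            group.add(u)
            for v in radj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(group)
    return comps

nodes = ["be epidemic in", "common in", "frequent in",
         "occur in", "begin in"]
edges = [("be epidemic in", "common in"), ("common in", "frequent in"),
         ("frequent in", "common in"), ("common in", "occur in"),
         ("frequent in", "occur in"), ("begin in", "occur in")]
comps = sccs(nodes, edges)  # "common in" and "frequent in" collapse
```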
Transitive Reduction
(Figure: the contracted graph from the previous slide)
• Delete edges inferred by transitivity.
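Transitive reduction of the contracted DAG keeps an edge only when no longer path implies it; a compact sketch (for small graphs, illustration only):

```python
def transitive_reduction(nodes, edges):
    """Drop every edge i -> j for which j is still reachable from i
    without using that edge.  Correct for DAGs, so run it after SCC
    contraction."""
    edges = set(edges)
    adj = {n: {j for (i, j) in edges if i == n} for n in nodes}

    def reachable(src, dst, banned):
        stack, seen = [src], set()
        while stack:
            u = stack.pop()
            if u == dst:
                return True
            for v in adj[u]:
                if (u, v) != banned and v not in seen:
                    seen.add(v)
                    stack.append(v)
        return False

    return {(i, j) for (i, j) in edges if not reachable(i, j, (i, j))}

# a -> c is implied by a -> b -> c and gets deleted:
kept = transitive_reduction(["a", "b", "c"],
                            {("a", "b"), ("b", "c"), ("a", "c")})
```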
Reduced Graphs
• Reduced graphs are an equivalent representation for transitive graphs.
• The reduced graph in this case is a directed forest
(Figure: the graph over "be epidemic in", "common in", "frequent in", "occur in", "begin in" beside its reduced form, in which "common in" and "frequent in" are contracted into a single node)
Forest-reducible Graphs (FRG)
• Hypothesis: the reduced graph of an entailment graph is usually a directed forest, i.e., entailment graphs are FRGs
• Not necessarily true:
(Counterexample: Xperson ambush Yperson entails both Xperson wait for Yperson and Xperson plan to attack Yperson)
• But exceptions are rare…
FRG Assumption is Reasonable
• Over our typed entailment graph data set:
- We manually constructed FRGs with recall ≥ 0.95 and precision 1.0 on 7 gold standard graphs
• Finding the best FRG is still NP-hard, by a polynomial reduction from X3C:
Input: a set X of size 3n and subsets S1, S2, …, Sm of size 3
Output: n subsets Si1, …, Sin covering X
(Figure: the reduction gadget - element nodes x1, …, x3n, subset nodes s1, …, sm, auxiliary nodes t1, …, tm and a root node a, connected by edges of weight 1, 2 and M)
FRG Summary
• Entailment graphs are often FRGs
- Interesting linguistic observation
- Leads to an efficient method for learning edges
Tree-based Approximation - Outline
• Forest-reducible graphs (FRG)
• Tree-Node-Fix algorithm (TNF)
• Experiments:
- Typed entailment graphs
- Large untyped entailment graphs + resource
Sequential Approximation
(Figure: nodes 1-5)
Input: Nodes V and weighting function w
1. Initialize graph (e.g., empty graph)
2. For every node v in V:
   2.1 Re-attach v in a locally-optimal way
3. Repeat until convergence
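The scheme above, as a skeleton with a pluggable re-attachment step (the toy `attach_to_root` below is purely illustrative; TNF supplies the real re-attachment):

```python
def sequential_approximation(nodes, reattach, max_rounds=100):
    """Repeatedly re-attach every node in a locally optimal way until
    a full pass over the nodes changes nothing.  `reattach(graph, v)`
    returns the edge set after optimally re-attaching v."""
    graph = set()                    # 1. initialize (empty graph)
    for _ in range(max_rounds):
        changed = False
        for v in nodes:              # 2. one pass over the nodes
            new_graph = reattach(graph, v)
            if new_graph != graph:
                graph, changed = new_graph, True
        if not changed:              # 3. converged
            break
    return graph

# Toy re-attachment rule: hang every node under a fixed root.
def attach_to_root(graph, v):
    return graph | {(v, "root")} if v != "root" else graph

result = sequential_approximation(["a", "b", "root"], attach_to_root)
```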
Re-attach (2)
Candidate edges: (1,2) (2,1) (2,3) (3,2) (2,4) (4,2) (2,5) (5,2)
(Figure: node 2 re-attached)
Re-attach (4)
Candidate edges: (1,4) (4,1) (2,4) (4,2) (3,4) (4,3) (4,5) (5,4)
(Figure: node 4 re-attached)
Tree-Node-Fix: Re-attach v
Input: Graph G, node v in V, function w
1. Delete v
2. Compute the reduced graph Gred (a directed forest)
3. Re-attach v in a locally-optimal way
Node re-attachment in FRGs takes linear time!
Objective Function
(Figure: node v being re-attached among components c1-c5)

argmax Σ_{i≠v} w_iv · x_iv + Σ_{k≠v} w_vk · x_vk
       (incoming edges)      (outgoing edges)
Scores for Components
(Figure: node v and components c1-c5, with S_v-in(c4) highlighted)

S_v-in(c) = Σ_{i∈c} w_iv + Σ_{d∈child(c)} S_v-in(d)
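The incoming-edge score of a component recurses down the reduced forest and bottoms out at leaf components; a sketch with hypothetical components and weights:

```python
def s_in(c, v, w, members, children):
    """S_v-in(c) = sum of w[(i, v)] over nodes i in component c, plus
    S_v-in(d) for every child component d: the total incoming-edge
    score if everything in and below c entails v."""
    total = sum(w.get((i, v), 0.0) for i in members[c])
    for d in children.get(c, []):
        total += s_in(d, v, w, members, children)
    return total

# Hypothetical reduced forest: component c1 is a child of c2.
members = {"c1": ["a"], "c2": ["b", "c"]}
children = {"c2": ["c1"]}
w = {("a", "v"): 1.0, ("b", "v"): 0.5, ("c", "v"): -0.2}
score = s_in("c2", "v", w, members, children)  # 0.5 - 0.2 + 1.0
```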
Definitions
(Figure: node v and components c1-c5, with S_v-out(c4) highlighted)

S_v-out(c) = Σ_{i∈c} w_vi + S_v-out(parent(c))
TNF - case 1
• Insertion into one of the components
(Figure: v inserted into a component c)

Score: S_v-in(c) + S_v-out(c)
TNF - case 2
• Choose a single parent p
• Only direct children of p can become children of v
• Once a parent is chosen, choose children independently

Score: S_v-out(p) + Σ_{c∈child(p)} max(0, S_v-in(c))
TNF - case 3
• Choose no parent (v becomes a forest root)
• Choose children independently from the roots

Score: Σ_{r∈roots} max(0, S_v-in(r))
TNF - Summary
• Computing S_v-in(c) and S_v-out(c) is linear (dynamic programming)
• Choosing the best case is linear
• Re-attachment is linear
• TNF is quadratic
• Extension: Tree-Node-Component-Fix (TNCF)
- Allows re-attachment of reduced-graph components.
Complexity Results

                    FRG      Transitive Graph
Entire graph        NP-hard  NP-hard
Node re-attachment  Linear   ?
Outline
• Forest-reducible graphs (FRG)
• Tree-Node-Fix algorithm (TNF)
• Experiments:
- Typed entailment graphs
- Large untyped entailment graphs + resource
Experiment 1: Typed Graphs
province of (place,country)
become part of (place,country)
annex (country,place)
invade (country,place)
• ACL '11 experimental setting
• Medium-size graphs
• Predicates/nodes are disambiguated
Evaluated Algorithms
• No-trans: no transitivity constraints (w_ij)
• Exact-graph: the ACL '11 exact method
• Exact-forest: find the optimal FRG with an ILP solver
• LP-relax (Martins et al., '09)
• Tree-Node-Fix (TNF)
• Graph-Node-Fix (GNF)
Run Time
Precision-recall Curve
Experiment 2: Untyped Graphs
• Large graph: 10,000 predicates
• Based on ReVerb (Fader et al., '11)
• Predicates are ambiguous
- this challenges the transitivity and FRG assumptions
• Features: distributional similarity, lexicographic and string-similarity features for every pair of predicates
X lower Y
X reduce Y
Training and Test Set Generation
• Gold standard annotation:
- Instance-based evaluation with crowdsourcing (Zeichner et al., 2012)
- Training set: ~3,300 pairs of predicates
- Test set: ~1,750 pairs of predicates
• The trained classifier provides w_ij
• At this scale we could run only No-trans, TNF, and TNCF
Experiment 2 - Results

Method    Recall  Precision  F1
No-trans  0.14    0.7        0.24
TNF       0.15    0.7        0.25
TNCF      0.16    0.8        0.27

• The FRG and transitivity assumptions limit recall
• TNCF improves precision at recall ≈ 0.15
• High-precision resource of > 100,000 entailment rules
Resource
• High-precision resource of > 100,000 entailment rules
• In addition, 15 million rules learned over 100,000 predicates
- Outperforms DIRT (Lin and Pantel, 2001)
• Syntactic and lexical representations
• http://u.cs.biu.ac.il/~nlp/downloads/
Local Results
Re-attachment Examples
offer a wealth of provide a wealth of
provide a lot of provide plenty of
offer a lot of
offer plenty of
give a lot of
generate a lot of
mean a lot of
cause of
cause
Re-attachment Examples
encrypt
convert convince
compress
code encode
Re-attachment Examples
depress
deter
discourage
bring down
lower
Conclusions
• NLU applications require large knowledge resources of predicative entailment rules
• In this dissertation we presented:
- A model that utilizes graph structure to improve rule learning
- Algorithms that scale for learning entailment graphs
- A large knowledge resource of rules
- An application for text exploration
• Future work: ambiguity
Thanks for coming!