Probabilistic Lexical Models for Textual Inference

Bar Ilan University @ IBM July 2012

Eyal Shnarch, Ido Dagan, Jacob Goldberger

The entire talk in a single sentencewe address

with a

lexical textual

inference

principled probabilistic

improves state-of-the

Outlinewe address

with a

lexical textual

inference

we address

with a

lexical textual

inference

Textual inference – useful in many NLP apps

improves state-of-the-artprincipled probabilistic model

in Belgium Napoleon was defeated

In the Battle of Waterloo, 18 Jun 1815,

the French army, led by

Napoleon, was crushed.

Napoleon was

Emperor of the

French from 1804 to

lexical textual inference

Napoleon was not tall enough to win

the Battle of Waterloo

At Waterloo Napoleon did surrender...Waterloo - finally facing my Waterloo

Napoleon engaged in a series of wars, and won many

BIU NLP lab

Chaya Liebeskind

Lexical textual inference

• Complex systems use parser

• Lexical inference rules link terms from T to H• Lexical rules come from lexical resources• H is inferred from T iff all its terms are inferred

Improves state-of-the-artprincipled probabilistic model lexical textual inference

In the Battle of Waterloo, 18 Jun 1815, the French army, led by Napoleon, was crushed.

in Belgium Napoleon was defeatedText Hypothesis

1st or 2nd order co-occurrence

Textual inference for ranking

In which battle was Napoleon defeated?

Napoleon was

Emperor of the

French from 1804 to

Ranking textual inference – prior work

• Transform T’s parsed tree into H’s parsed tree• Based on principled ML model(Wang et al. 07, Heilman and Smith 10, Wang and Manning 10)

Syntactic-based

methods

• Fast, easy to implement, highly competitive• Practical across genres and languages(MacKinlay and Baldwin 09, Clark and Harrison 10,

Majumdar and Bhattacharyya 10)

Heuristic lexical

methods

Lexical entailment scores – current practice

• Count covered/uncovered• (Majumdar and Bhattacharyya, 2010; Clark and Harrison, 2010)

• Similarity estimation• (Corley and Mihalcea, 2005; Zanzotto and Moschitti, 2006)

• Vector space• (MacKinlay and Baldwin, 2009)

Mostly heuristic

we address

with a

lexical textual

inference

Probabilistic model – overview

H which battle was Napoleon defeated

Battle of Waterloo French army led by Napoleon was crushed

)( HTP

knowledge integration

term-level

sentence-level

)( 3hTP )( 1hTP

t1 t2 t3 t4 t5 t6

h1 h2 h3

)( 2hTP

annotations are available at

sentence-level only

x1 x2 x3

Knowledge integration

• Distinguish resources reliability levels• WordNet >> similarity-based thesauri (Lin, 1998; Pantel and Lin, 2002)

• Consider transitive chains length• The longer a chain is the lower its probability

• Consider multiple pieces of evidence• More evidence means higher probability

which battle was Napoleon defeated

transitive chainr

is a rulemultiple evidence

Probabilistic model – term level

chainr

chainhtP )()(

ris a rule

is the reliability level of the

resource which suggested r

1)( hTP

ACL 11 short paper this level parameters: one per input lexical resource

t1 t2 t3 t4 t5 t6

h1 h2 h3

multiple evidence

)](1[hchainsc

𝜃𝑊𝑁

𝜃𝑊𝑖𝑘𝑖

Probabilistic model – overview

)( HTP

knowledge integration

term-level

sentence-level

)( 3hTP )( 2hTP )( 1hTP

Probabilistic model – sentence level

we define hidden binary random variables:

xt = 1 iff ht is inferred from T (zero otherwise)

h1 h2 h3

x1 x2 x3

)( 3hTP )( 2hTP )( 1hTP

final sentence-level decision

Modeling with AND gate:• Most intuitively• However

• Too strict• Does not model terms dependency

Probabilistic model – sentence level

),|()( 1 jxiykyPkq tttij

this level parameters

}1,0{,, kji

h1 h2 h3

x1 x2 x3

y1 y2 y3

)( 3hTP )( 2hTP )( 1hTP

we define another binary random variable:

yt – inference decision for the prefix h1… ht

P(yt = 1) is dependent on yt-1 and xt

xt = 1 iff ht is inferred by T (zero otherwise)

M-PLM – inference

)1(),|()()()1(2

tttttn xyyPxPxPyP

1,,,,,

h1 h2 h3

x1 x2 x3

y1 y2 y3

)( 3hTP )( 2hTP )( 1hTP

qij(k)

)2()()()()()(}1,0{,

ijtttt kqjxPikyPk

)3()()( 11 kxPk

can be computed efficiently with a forward algorithm

)4()1()1( nnyP

M-PLM – summary

Annotation final sentence-level decision

Parameters

resource Observed

Lexical rules which link terms

Learning we developed EM

scheme to jointly learn all parameters

Hidden

so how our model does?

In which battle was Napoleon defeated?

Napoleon was

Emperor of the

French from 1804 to

we address

with a

lexical textual

inference

Evaluations – data sets

improves sate-of-the-artprincipled probabilistic model lexical textual inference

Ranking in passage retrieval for QA

(Wang et al. 07)

5700/1500 question-candidate answer pairs from TREC 8-13

Manually annotated

Notable line of work from recent years: Punyakanok et al. 04, Cui et al. 05, Wang et al. 07, Heilman and Smith 10, Wang and Manning 10

Recognizing textual entailment

within a corpus

20,000 text-hypothesis pairs in each RTE-5, RTE-6

Originally constructed for classification

Evaluations – baselines

Syntactic generative models• Require parsing• Apply sophisticated machine learning methods(Punyakanok et al. 04, Cui et al. 05, Wang et al. 07, Heilman and Smith 10, Wang and Manning 10)

Lexical model – Heuristically Normalized-PLM• AND-gate for the sentence-level• Add heuristic normalizations to addresses its disadvantages (TextInfer

workshop 11)

• Performance in line with best RTE systems

HN-PLM

QA results – syntactic baselines

MAP MRR40

Punyakanok et al.Cui et al. 05Wang and ManningWang et al. 07Heilman and Smith

QA results – syntactic baselines + HN-PLM

MAP MRR40

Punyakanok et al.Cui et al. 05Wang and ManningWang et al. 07Heilman and SmithHN-PLM

QA results – baselines + M-PLM

MAP MRR40

Punyakanok et al.Cui et al. 05Wang and ManningWang et al. 07Heilman and SmithHN-PLMM-PLM

RTE results – M-PLM vs. HN-PLM

RTE-5 MAP RTE-5 MRR RTE-6 MAP RTE-6 MRR40

HN-PLMM-PLM

+6.0%+3.6%

First approach - summary

Clean probabilistic lexical model• As a lexical component or as a stand alone inference system• Superiority of principled methods over heuristic ones • Attractive passage retrieval ranking method• Code available - BIU NLP downloads

M-PLM limits• Processing is term order dependent• Lower performance on classification vs. HN-PLM

does not normalize well across hypotheses length

improves state-of-the-artprincipled probabilistic model lexical textual inference

lexical textual

inference

1 2 3A

second approach:

resources as observers

each resource is a witness

Battle of Waterloo French army led by Napoleon was crushedt1 t2 t3 t4 t5 t6

h1 h2 h3

Bottom-up witnesses model

Battle of Waterloo French army led by Napoleon was crushedt1 t2 t3 t4 t5 t6

h1 h2 h3

x1 x2 x3

𝑃 (𝑊 (h𝑖 )|𝑥 𝑖=1 )= ∏𝑤∈𝑊 (h𝑖)

𝜃𝑤 ⋅ ∏𝑤∉𝑊 (h𝑖)

(1−𝜃𝑤)

𝜂0≝𝑃 (𝑥 𝑖=1∨𝑦=0)𝜂1≝𝑃 (𝑥𝑖=1∨𝑦=1)

𝜃𝑤=𝑃 (𝑤 (𝑥 𝑖 )=1∨𝑥 𝑖=1)𝜏𝑤=𝑃 (𝑤 (𝑥 𝑖 )=1∨𝑥𝑖=0) 𝑃 (𝑊 (h𝑖 )|𝑥 𝑖=0 )= ∏

𝑤∈𝑊 (h𝑖)𝜏𝑤 ⋅ ∏

𝑤∉𝑊 (h𝑖)(1−𝜏𝑤)

Likelihood

Advantages of the second approach

Inference:

• Hypothesis length is not an issue• Learn from non-entailing resources• and provide a recall and precision estimation for a resource

¿𝑃 (𝑊 (𝐻 )|𝑦=1 ) ⋅ 𝑃 (𝑦=1)

𝑃 (𝑊 (𝐻 ))

¿𝑃 (𝑊 (𝐻 )|𝑦=1 ) ⋅ 𝑃 (𝑦=1)

𝑃 (𝑊 (𝐻 )|𝑦=0 ) ⋅ 𝑃 ( 𝑦=0 )+𝑃 (𝑊 (𝐻 )|𝑦=1 ) ⋅ 𝑃 (𝑦=1)

(near) future plans

• Context model• There are other languages than English

• Deploy the new version of a Wikipedia-base lexical resource with the Italian dump

• Test the probabilistic lexical models for other languages• Cross language textual entailment

Cross Language Textual Entailment

quale battaglia fu sconfitto Napoleone

Italian monolingual

English-Italian phrase table

English monolingual

Bar Ilan University @ IBM July 2012 35/34

Demo examples:

[Bap,WN] no transitivityJack and Jill go_up the hill to fetch a pail of water Jack and Jill climbed a mountain to get a bucket of fluid

[WN,Wiki] <show graph>Barak Obama's Buick got stuck in Dublin in a large Irish crowdUnited_States_President's car got stuck in Ireland, surrounded by many people

Barak Obama - WN is out of date, need a new version of Wikipedia

Bill_Clinton's Buick got stuck in Dublin in a large Irish crowdUnited_States_President's car got stuck in Ireland, surrounded by many people

------------------------------------------------------------------------------[Bap,WN] this time with <transitivity & multiple evidence> Jack and Jill go_up the hill to fetch a pail of water Jack and Jill climbed a mountain to get a bucket of fluid

[VO,WN,Wiki]in the Battle_of_Waterloo the French army led by Napoleon was crushedin which battle Napoleon was defeated?

------------------------------------------------------------------------------[all]1. in the Battle_of_Waterloo the French army led by Napoleon was crushed 72%

2. Napoleon was not tall enough to win the Battle_of_Waterloo 47%

3. at Waterloo Napoleon did surrender... Waterloo - finally facing my Waterloo 34%

4. Napoleon engaged in a series of wars, and won many 47%

5. Napoleon was Emperor of the French from 1804 to 18159% [a bit long run]

Probabilistic Lexical Models for Textual Inference

Documents

Computational Modeling of Lexical Ambiguitymodeling multiple sense disambiguation tasks within a uniform framework, we propose a probabilistic model (topic model), which encodes human

A Sentiment Polarity Analyser based on a Lexical ...ceur-ws.org/Vol-1509/ITALIA2015_paper_2.pdf · A Sentiment Polarity Analyser based on a Lexical-Probabilistic Approach ... in the

Referential cohesion and textual variation in · PDF fileReferential cohesion and ... –Lexical cohesion •Reiteration –Repetition –Synonymy ... Analysis: oral narrative [A]

TEXTUAL INFORMATION SEGMENTATION BY COHESIVE TIES · partitioned text, lexical cohesion also reveals the textual segmentability which means how texts are seen not as a continuous

A Review of the Lexical Content and Its Treatment in ... · Contents (cont.) 2.2.1.3 Lexical Textual Frequency of Occurrence and 17 Lexical Range 2.2.1.4 Textual Frequency: Problems

Lexical Analysis (Scanning) Lexical Analysis (Scanning)

A Probabilistic Model of Lexical and Syntactic Access and ...web.stanford.edu/~jurafsky/cogsci-journal.pdf · input– are fundamental to language understanding. The literature abounds

Probabilistic Lexical Models for Textual Inference Eyal Shnarch, Ido Dagan, Jacob Goldberger Probabilistic Lexical Models for Textual Inference Eyal Shnarch,

Textual inference: Methods, open source platform and ...neumann/slides/biu-symposium-exci.pdf · Italian German English Lexical component Entailment rules (Biutee) Wikipedia Italian

CSE391 – 2005 NLP 1 Natural Language Processing Machine Translation Predicate argument structures Syntactic parses Lexical Semantics Probabilistic Parsing

Extending lexical resources for Abui - 2016 · 2 OUTLINE OF THE TALK Lexicographic workﬂow for ‘small’ languages • Wordlists • Textual corpus • Rapid Words • WordNet

Stuttgart October 2001 From an informal textual lexicon to a well-structured lexical database: An experiment in data reverse engineering WCRE 2001, Stuttgart,

Anahtar Kelimeler: LEXICAL, SYNTACTIC AND TEXTUAL … · Anahtar Kelimeler: Türkiye Türkçesi, ikilemeler, ikili tekrarlar, sözlükbirimsel görünümler, dizgesel görünümler,

A COESÃO LEXICAL COM O USO DO APLICATIVO ...repositorio.roca.utfpr.edu.br/.../1/CT_TCTE_I_2017_47.pdfum elemento da coesão textual: a anáfora, o qual será estudado por meio do

Marzieh Razavimarziehrazavi.github.io/cv/cv-razavi-short-classic.pdf · Dec. 2013 Probabilistic Lexical Modeling and Unsupervised Training for Zero-Resourced ASR, Ramya Rasipuram;

Lexical - sfs.uni-tuebingen.de · Slide - 2 Lexical Seman tics and Relations A Con textual Approac h to Lexical Seman tics Seman tic prop erties of a lexical item are fully re ected

Textual Data Analysis · Usually a Text Mining (TM) strategy includes the following steps: Pre-processing Lexical processing Information Extraction: Taltac2 Software - consists of

A AMBIGUIDADE LEXICAL NO GÊNERO TEXTUAL CHARGE: …dspace.bc.uepb.edu.br/jspui/bitstream/123456789/10380/1/PDF... · Existe uma polissemia produzida por uma memória discursiva,

Introduction and Jurafsky Model Resource: A Probabilistic Model of Lexical and Syntactic Access and Disambiguation, Jurafsky 1996

ISCOL 2011 – Bar Ilan University /151 A Probabilistic Model for Lexical Entailment Eyal Shnarch, Jacob Goldberger, Ido Dagan Bar Ilan University