Two-Stage Constraint Based Sanskrit Parser

Akshar Bharati,

IIIT, Hyderabad

Brief outline

Dependency
Paninian framework
vibhakti-karaka correspondence
karaka frames (basic + transformation)
Source groups, demand groups

Constraints
Three basic constraints
Constraints as Integer programming equations

Notions from Paninian Framework – a) Karaka relations

The parser uses the notion of karaka relations between verbs and nouns in a sentence.

The notion of karaka relations is central to the Paninian model.

The karaka relations are syntactico-semantic (or semantico-syntactic) relations between the verbals and other related constituents in a sentence.

Notions from Paninian Framework – Demand Frames

For the task of karaka assignment, the core parser uses the fundamental principles of 'akanksha' (demand unit) and 'yogyata' (qualification of the source unit).

Ex: CAwraH     vixyAlayam  gacCawi
    (student)  (school)    (go)

Verb Frame for this form of “gacCawi”

Demand Frame Gam1:

--------------------------------------------------------------
arc-label  necessity  vibhakti  lex-type  src-pos  arc-dir
--------------------------------------------------------------
K1         m          1         n         l        ds
K2         m          2         n         l        ds
K3         m          3         n         l        ds
K5         m          5         n         l        ds
--------------------------------------------------------------
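The frame above can be pictured as a small data structure. The following is a minimal Python sketch, assuming each row becomes one record and that a noun is a candidate for a row when its lexical type and vibhakti number match the row. The names `FrameRow`, `GAM1_FRAME` and `find_candidates` are illustrative and are not taken from the parser itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameRow:
    # One row of a demand frame; field names mirror the table columns.
    arc_label: str
    necessity: str
    vibhakti: str
    lex_type: str
    src_pos: str
    arc_dir: str


# Rows copied from the Gam1 table as shown on the slide.
GAM1_FRAME = [
    FrameRow("K1", "m", "1", "n", "l", "ds"),
    FrameRow("K2", "m", "2", "n", "l", "ds"),
    FrameRow("K3", "m", "3", "n", "l", "ds"),
    FrameRow("K5", "m", "5", "n", "l", "ds"),
]


def find_candidates(frame, nouns):
    """Return (arc_label, word) pairs whose lex type and vibhakti match a row."""
    return [
        (row.arc_label, word)
        for row in frame
        for word, (lex_type, vibhakti) in nouns.items()
        if row.lex_type == lex_type and row.vibhakti == vibhakti
    ]


# Nouns of the example sentence with (lex type, vibhakti number).
nouns = {"CAwraH": ("n", "1"), "vixyAlayam": ("n", "2")}
candidates = find_candidates(GAM1_FRAME, nouns)
print(candidates)
```

For the example sentence only the K1 and K2 rows find a matching noun; the K3 and K5 rows produce no candidate arcs.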

Constraint Based Parsing

Computational Paninian Model
Integer Programming with basic constraints

For each mandatory karaka in a karaka chart, there should be exactly one outgoing edge labelled by that karaka from the demand group.

For each desirable or optional karaka in a karaka chart, there should be at most one outgoing edge labelled by that karaka from the demand group.

There should be exactly one incoming arc into each source group.
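The three constraints can be read as equations over 0/1 variables, one variable per candidate arc. The sketch below is an illustration only: it enumerates all 0/1 assignments by brute force instead of calling an integer programming solver, and the function name `solve`, the extra candidate arc `("k2", "CAwraH")` and the necessity values in `demands` are assumptions made for the toy example (only k1 and k2 are treated as mandatory here so that the sentence has a solution).

```python
from itertools import product


def solve(demands, sources, candidates):
    """Enumerate 0/1 choices per candidate arc and keep those meeting all constraints.

    demands:    {karaka_label: "m" (mandatory) or "o" (optional)} for one demand group
    sources:    list of source-group names
    candidates: list of (karaka_label, source) candidate arcs
    """
    solutions = []
    for bits in product((0, 1), repeat=len(candidates)):
        chosen = [arc for arc, bit in zip(candidates, bits) if bit]
        # Constraint 1: each mandatory karaka labels exactly one chosen arc.
        c1 = all(
            sum(1 for label, _ in chosen if label == k) == 1
            for k, need in demands.items() if need == "m"
        )
        # Constraint 2: each optional karaka labels at most one chosen arc.
        c2 = all(
            sum(1 for label, _ in chosen if label == k) <= 1
            for k, need in demands.items() if need != "m"
        )
        # Constraint 3: each source group receives exactly one chosen arc.
        c3 = all(
            sum(1 for _, src in chosen if src == s) == 1 for s in sources
        )
        if c1 and c2 and c3:
            solutions.append(chosen)
    return solutions


# Hypothetical toy input: one verb demand group, two noun source groups.
demands = {"k1": "m", "k2": "m", "k3": "o"}
sources = ["CAwraH", "vixyAlayam"]
candidates = [("k1", "CAwraH"), ("k2", "CAwraH"), ("k2", "vixyAlayam")]
parses = solve(demands, sources, candidates)
print(parses)
```

Brute force is exponential in the number of candidate arcs, so it only serves to make the constraints concrete; a real system would hand the same equalities and inequalities to an integer programming solver.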

Parser

Two stage strategy

Stage I (Intra-clausal relations)
Dependency relations marked
Relations such as k1, k2, k3, etc. for each verb

Stage II (Inter-clausal relations & conjunct relations)
Conjuncts and relative clauses

Steps in Parsing

SENTENCE
Morph, POS tagging, Chunking
Identify Demand Groups
Load Frames & Transform
Find Candidates
Apply Constraints & Solve
Is Complex? NO: Final Parse. YES: STAGE - II

Morph, Chunked, Tagged data

((
1    ((          NP   <fs af='CAwra,n,m,sg,,o,1,1'>
1.1  CAwraH      NN   <fs af='CAwra,n,m,sg,,o,1,1'>
     ))
2    ((          NP   <fs af='vixyAlaya,n,m,sg,,d,2,2'>
2.1  vixyAlayam  NN   <fs af='vixyAlaya,n,m,sg,,d,2,2'>
     ))
3    ((          VGF  <fs af='gam1,v,,sg,3,,karwari_lat,' gaNaH='BvAxiH' paxI='parasmEpaxI' XAwuH='gamLz'>
3.1  gacCawi     VM   <fs af='gam1,v,,sg,3,,karwari_lat,' paxI='parasmEpaxI' gaNaH='BvAxiH' XAwuH='gamLz'>
     ))
))
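To show how the feature structures above could be read programmatically, here is a minimal Python sketch that pulls out the `af='...'` value and splits it on commas. The field names in `AF_FIELDS` are an assumption about the meaning of the eight positions (the slides do not name them), and `parse_af` is an illustrative helper, not part of any existing toolkit.

```python
import re

# Assumed names for the eight comma-separated positions of the af attribute.
AF_FIELDS = (
    "root", "category", "gender", "number",
    "person", "case", "vibhakti", "suffix",
)


def parse_af(fs):
    """Extract the af='...' value from an <fs ...> string and name its fields."""
    match = re.search(r"af='([^']*)'", fs)
    if match is None:
        return None
    values = [value.strip() for value in match.group(1).split(",")]
    return dict(zip(AF_FIELDS, values))


noun = parse_af("<fs af='CAwra,n,m,sg,,o,1,1'>")
verb = parse_af(
    "<fs af='gam1,v,,sg,3,,karwari_lat,' paxI='parasmEpaxI' "
    "gaNaH='BvAxiH' XAwuH='gamLz'>"
)
print(noun["root"], noun["vibhakti"])
print(verb["root"], verb["vibhakti"])
```

Under this reading the seventh position carries the vibhakti number for nouns (1 for CAwraH, 2 for vixyAlayam) and the value karwari_lat for the verb, which is the information the demand frame is matched against.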

CAwraH <fs af='CAwra,n,m,sg,,o,1,1'>

vixyAlayam <fs af='vixyAlaya,n,m,sg,,d,2,2'>

gacCawi <fs af='gam1,v,,sg,3,,karwari_lat,' paxI='parasmEpaxI' gaNaH='BvAxiH' XAwuH='gamLz'>

Demand Frame Gam1:

--------------------------------------------------------------
arc-label  necessity  vibhakti  lex-type  src-pos  arc-dir
--------------------------------------------------------------
K1         m          1         n         l        ds
K2         m          2         n         l        ds
K3         m          3         n         l        ds
K5         m          5         n         l        ds
--------------------------------------------------------------

Resulting arcs for CAwraH vixyAlayam gacCawi (both attached to the verb gacCawi):
k1: CAwraH
k2: vixyAlayam

Sanskrit Example

CAwraH vixyAlayam gacCawi

Steps (Stage II)

Output of STAGE - I
Identify New Demand Groups
Load Frames & Transform
Find Candidates
Apply Constraints & Solve
Repair
FINAL PARSE

Example – Relative Clause

vaha puswaka jo    rAma ne  mohana ko   xI   hE prasixXa hE
that book    which Ram ERG  Mohana DAT  gave is famous   is
‘The book which Ram gave to Mohana is famous’

Output after Stage - I

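[Figure: dependency graph after Stage I. Both verbs, xI and hE, attach to _ROOT_ with the label main. Nodes: vaha, puswaka, jo, rAma, mohana, xI, prasixXa, hE. Arc labels: k1, k2, k4 (under xI); k1, k1s (under hE).]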

Identify the demand group

xiyA ‘give’: main verb of the relative clause

Identify the demand group, Load and Transform DF

jo ‘which’ transformation (special)
Transforms the demand frame of the main verb of the relative clause

------------------------------------------------------------------------
arc-label   necessity  vibhakti  lextype  src-pos  arc-dir  oprt
------------------------------------------------------------------------
nmod__relc  m          any       n        r|l      p        insert
------------------------------------------------------------------------

Karaka Frame

vaha puswaka jo    rAma ne  mohana ko   xI   hE prasixXa hE |
that book    which Ram ERG  Mohana DAT  gave is famous   is
‘The book which Ram gave to Mohana is famous’

xI: main verb of the relative clause

Transformed frame for xe after applying the jo transformation (new row inserted by the transformation):

------------------------------------------------------------------------
arc-label   necessity  vibhakti  lextype  src-pos  arc-dir  oprt
------------------------------------------------------------------------
nmod__relc  m          any       n        r|l      p        insert
------------------------------------------------------------------------
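The effect of the jo transformation can be sketched as a function that appends rows marked `insert` to a copy of the verb's demand frame. This is a minimal Python illustration under stated assumptions: the base rows in `XE_FRAME` are hypothetical placeholders (the slides do not show the untransformed frame for xe), the function name `apply_transformation` is invented for this sketch, and only the `insert` operation seen on the slide is handled.

```python
def apply_transformation(frame, transformation):
    """Return a new frame with the transformation rows applied.

    Only the 'insert' operation is supported; the input frame is not modified.
    """
    new_frame = [dict(row) for row in frame]
    for row in transformation:
        if row["oprt"] == "insert":
            # Copy every column except the operation marker into the frame.
            new_frame.append({k: v for k, v in row.items() if k != "oprt"})
        else:
            raise ValueError(f"unsupported operation: {row['oprt']}")
    return new_frame


# Hypothetical base frame for xe 'give'; these rows are illustrative only.
XE_FRAME = [
    {"arc_label": "k1", "necessity": "m", "vibhakti": "ne",
     "lextype": "n", "src_pos": "l", "arc_dir": "c"},
    {"arc_label": "k2", "necessity": "m", "vibhakti": "0",
     "lextype": "n", "src_pos": "l", "arc_dir": "c"},
    {"arc_label": "k4", "necessity": "o", "vibhakti": "ko",
     "lextype": "n", "src_pos": "l", "arc_dir": "c"},
]

# The single row shown on the slide for the jo transformation.
JO_TRANSFORMATION = [
    {"arc_label": "nmod__relc", "necessity": "m", "vibhakti": "any",
     "lextype": "n", "src_pos": "r|l", "arc_dir": "p", "oprt": "insert"},
]

transformed = apply_transformation(XE_FRAME, JO_TRANSFORMATION)
print([row["arc_label"] for row in transformed])
```

After the transformation the relative-clause verb carries an additional demand, nmod__relc, which Stage II can satisfy by linking the verb to a noun on either side of it.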

Possible candidates

vaha puswaka jo rAma ne mohana ko xI hE prasixXa hE |

nmod__relc

Output after Stage - II

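[Figure: dependency graph after Stage II. hE attaches to _ROOT_ with the label main; the relative-clause verb xiyA hE is now attached with nmod__relc instead of main. Nodes: vaha puswaka, jo, rAma, mohana, xiyA hE, prasixXa, hE. Other arc labels: k1, k2, k4 (relative clause); k1, k1s (main clause).]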

Example II – Coordination

rAma Ora siwA kala      Aye |
Ram  and Sita yesterday came
‘Ram and Sita came yesterday’

Output of Stage - I

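[Figure: dependency graph after Stage I for the coordination example. Nodes: _ROOT_, rAma, Ora, siwA, kala, Aye. Arc labels shown: main, k1, k7t, dummy, dummy.]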

For Stage – II (Constraint Graph)

[Figure: constraint graph for Stage II. Nodes: _ROOT_, rAma, Ora, siwA, kala, Aye. Arc labels shown: main, k1, k7t, ccof, ccof.]

Candidate Arcs

[Figure: candidate arcs for Stage II. Nodes: _ROOT_, rAma, Ora, siwA, kala, Aye. Arc labels shown: main, k1 (three arcs), ccof (two arcs).]

Solution Graph

[Figure: solution graph. Nodes: _ROOT_, rAma, Ora, siwA, kala, Aye. Arc labels shown: main, k1, k7t, ccof, ccof.]

Parse tree

Output after Stage II:

_ROOT_
  main: Aye
    k7t: kala
    k1: Ora
      ccof: rAma
      ccof: siwA

Results for Hindi

Results

CBP: results when only the first parse is considered

CBP’’: results when the best of the first 25 parses is considered

CBP was tested on 220 sentences. These are the results published in IALP-2008.

Work Progress in Sanskrit

The existing Constraint Based parser for Sanskrit can parse simple sentences.

Over 2000 demand charts
Two stage parsing needs more development
Experiments performed with 268 simple sentences
Re-ranking of parses is not done; only the first parse is considered for results
Results not very accurate due to data problems

Results in Sanskrit

Labelled attachment score: 540 / 1213 * 100 = 44.52 %

Unlabelled attachment score: 876 / 1213 * 100 = 72.22 %

Label accuracy score: 566 / 1213 * 100 = 46.66 %
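The three scores above follow the usual definitions: a token counts for the labelled attachment score when both head and label are right, for the unlabelled score when the head is right, and for label accuracy when the label is right. The Python sketch below illustrates this on a made-up three-token example (the `gold` and `predicted` lists are hypothetical) and then rechecks the arithmetic of the reported figures; `attachment_scores` is an illustrative name.

```python
def attachment_scores(gold, predicted):
    """Return (LAS, UAS, label accuracy) in percent, rounded to two decimals.

    gold, predicted: equal-length lists of (head, label) pairs, one per token.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")
    total = len(gold)
    las = sum(1 for g, p in zip(gold, predicted) if g == p)
    uas = sum(1 for g, p in zip(gold, predicted) if g[0] == p[0])
    la = sum(1 for g, p in zip(gold, predicted) if g[1] == p[1])
    return tuple(round(100.0 * n / total, 2) for n in (las, uas, la))


# Hypothetical toy data: the second token has the right head but the wrong label.
gold = [(3, "k1"), (3, "k2"), (0, "main")]
predicted = [(3, "k1"), (3, "k1"), (0, "main")]
toy_scores = attachment_scores(gold, predicted)
print(toy_scores)

# Recompute the reported percentages from the counts on the slide.
reported = [round(100.0 * correct / 1213, 2) for correct in (540, 876, 566)]
print(reported)
```

The gap between the unlabelled score and the two label-sensitive scores indicates that most errors come from choosing the wrong karaka label rather than the wrong head.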

Treebank requirement

Proper gold tagged, chunked and dependency marked data for Sanskrit will improve the efficiency of the parser
Annotation with proper tools
It will also help us in using machine learning methods to train statistical parsers for Sanskrit

Further work on Constraint Based Parsing

Extension of the parser using treebank data
Hybrid approaches
Soft Constraints
Pruning of the graph in data driven parsers using Constraint Graph
Allow learning of the parser from the treebank data
Better performance

What we expect from data

((
1    ((          NP   <fs af='CAwra,n,m,sg,,o,1,1' drel='k1:3' name='1'>
1.1  CAwraH      NN   <fs af='CAwra,n,m,sg,,o,1,1'>
     ))
2    ((          NP   <fs af='vixyAlaya,n,m,sg,,d,2,2' drel='k2:3' name='2'>
2.1  vixyAlayam  NN   <fs af='vixyAlaya,n,m,sg,,d,2,2'>
     ))
3    ((          VGF  <fs af='gam1,v,,sg,3,,karwari_lat,' name='3' gaNaH='BvAxiH' paxI='parasmEpaxI' XAwuH='gamLz'>
3.1  gacCawi     VM   <fs af='gam1,v,,sg,3,,karwari_lat,' paxI='parasmEpaxI' gaNaH='BvAxiH' XAwuH='gamLz'>
     ))
))
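In the annotated data above, each chunk carries a `name` and, unless it is the root, a `drel='label:head'` attribute. A minimal Python sketch of reading those arcs back out is given below; `read_arcs` and `SSF_SAMPLE` are illustrative names, and the regular expressions assume the attribute layout exactly as it appears on the slide.

```python
import re

# The three chunk-level lines from the slide, kept verbatim.
SSF_SAMPLE = """\
1 (( NP <fs af='CAwra,n,m,sg,,o,1,1' drel='k1:3' name='1'>
2 (( NP <fs af='vixyAlaya,n,m,sg,,d,2,2' drel='k2:3' name='2'>
3 (( VGF <fs af='gam1,v,,sg,3,,karwari_lat,' name='3' gaNaH='BvAxiH'>
"""


def read_arcs(ssf_text):
    """Return (dependent, label, head) for each line that has both name and drel."""
    arcs = []
    for line in ssf_text.splitlines():
        drel = re.search(r"\bdrel='([^':]+):([^']+)'", line)
        name = re.search(r"\bname='([^']*)'", line)
        if drel and name:
            arcs.append((name.group(1), drel.group(1), drel.group(2)))
    return arcs


arcs = read_arcs(SSF_SAMPLE)
print(arcs)
```

The verb chunk has a name but no drel, so it yields no arc and can be treated as the root of the sentence.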

THANKS!!