English version A method for top-down and deterministic parsing of multilingual corpora : application : computing subject-verb links Jacques Vergne GREYC - Université de Caen http://www.info.unicaen.fr/~jvergne TALN 2002


English version

A method for top-down and deterministic

parsing of multilingual corpora :

application : computing subject-verb links

Jacques Vergne, GREYC - Université de Caen

http://www.info.unicaen.fr/~jvergne

TALN 2002


24/6/2002 © Jacques Vergne TALN 2002 -2-


Features of the experiment

• experimenting, exploring, explaining, transmitting

deterministic parsing methods

• choice of a classical task, limited and (apparently) simple:

detecting and linking subjects and verbs in clauses

with the smallest possible software (program + resources)


Linking subject <—> verb

• linking the subject (a pronoun or a chunk) to the verbal chunk in every clause

• multilingual corpus (English, German, French, Italian, Spanish)

with language identification: is the method generic?

• top-down: document —> clause and chunk (with partial chunking, without going down to the word level)

• written in Perl:

- sentence parsing: 40 KB

- resources: 20 KB for 5 languages


How to do without a dictionary?

with beginnings of clauses and beginnings of chunks

<[>||<d>L'euro</d> ||<V>rend déjà <p>d'éminents</p> services

<[><p>Dans les deux</p> cas ||<d>ces systèmes</d> <p>d'armes</p> ||<V>disposent <p>de radars</p>

<[>||<d>Questo tema</d> ||<V>rischia <p>di essere</p> <d>la questione</d> sociale <p>del futuro</p>

<[>||<d>La Bolsa</d> <p>de Tokio</p> ||<V>cerró ayer <p>a su nivel</p> más bajo <p>en 17</p> años

with (determiner, verbal ending) pairs


<[>||<d>Das Sternbild</d> nämlich ||<V>steht <p>in dieser Jahreszeit</p> besonders tief <p>am Himmel</p>

<[><p>Bis Ende Oktober</p> ||<V>schließt sich ||<d>der Reigen</d> <p>in Connecticut</p>, Massachusetts <cc>und Rhode Island

<[>||<d>The costs</d> ||<V>mount rapidly,

<[cc>But ||<d>the Pentagon</d> move ||<V>represents <d>the first</d> significant federal call-up



Resources: all of them, for French

beginnings of clause:

"à condition que|à condition qu|ainsi que|ainsi qu|auquel|auxquels|combien|comme|comment|dont|dés que|dés qu|lorsque|lorsqu|même si|où|parce que|parce qu|pourquoi|quand|alors que|alors qu|bien que|bien qu|quoi que|quoi qu|tandis que|tandis qu|tant que|tant qu|puisque|puisqu|sans que|sans qu|que|qu|qui|sauf si|si"

"et donc|et encore|et ensuite|et même|et non|et pas|et pourtant|et|ou bien|ou même|ou encore|ou|mais aussi|mais|car|mais|or|puis"

beginnings of chunk:

"quant à|quant au|quant aux|grâce à|grâce au|grâce aux|face à|face au|face aux|à partir de|à partir du|à partir d|à|À|afin de|afin d|aprés|au-delà d|au-delà de|au-delà du|au-delà des|au|aux|auprés d|auprés de|auprés du|auprés des|autour d|autour de|autour du|autour des|avant|avec|chez|contre|dans|de par|d'entre|d'où|d|de|des|du|depuis|devant|dés|durant|en tant que|en tant qu|en|entre|hors d|hors de|hors du|hors des|jusque|jusqu'à|jusqu'au|jusqu'aux|lors d|lors de|lors du|lors des|malgré|outre|par|parmi|pendant|pour|près de|près d|sans|sauf|sous|selon|sur|vers|via|voire"

"un|une|le|la|l|ce|cet|cette|sa|son|notre|leur|tout|toute|chaque|aucun|aucune|Un|Une|Le|La|L|Ce|Cet|Cette|Sa|Son|Notre|Leur|Tout|Toute|Chaque|Aucun|Aucune"

"les|ces|ses|leurs|nos|tous|toutes|plusieurs|deux|trois|quatre|cinq|six|sept|huit|neuf|dix|d'autres|certains|quelques|Les|Ces|Ses|Leurs|Nos|Tous|Toutes|Plusieurs|Deux|Trois|Quatre|Cinq|Six|Sept|Huit|Neuf|Dix|D'autres|Certains|Quelques"

subject pronouns:

"je|j|tu|il|elle|l'on|on|c|ça|cela|ceci"

"ils|elles|nous|vous"

auxiliaries:

"a|avait|aura|ait|aurait|est|était|sera|serait|va|allait|ira|faisait|fera"

"ont|avaient|auront|aient|auraient|sont|étaient|seront|seraient|vont|allaient|iront|font|faisaient|feront"

verbal endings:

"e|a|ed|pand|end|ond|erd|ord|oud|et|it|ît|tient|vient|pent|sent|eint|ort|ut|ût"

"ent|ont"

clitics:

"n'|ne |m'|me |t'|te |s'|se |s'en |s'y |lui |leur |en |y |le |la |les |l'"
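A minimal Python sketch of the (determiner, verbal ending) pairing, built from four of the French lists above (lowercase forms only). Reading the first list of each pair as singular forms and the second as plural forms is an assumption, suggested by the pronoun lists (je ... / ils ...); the helper names `determiner_number` and `ending_agrees` are hypothetical and are not part of the author's Perl program.

```python
# Lowercase halves of the French determiner lists above, and both ending lists.
DET_SG = "un|une|le|la|l|ce|cet|cette|sa|son|notre|leur|tout|toute|chaque|aucun|aucune"
DET_PL = ("les|ces|ses|leurs|nos|tous|toutes|plusieurs|deux|trois|quatre|cinq|six"
          "|sept|huit|neuf|dix|d'autres|certains|quelques")
END_SG = "e|a|ed|pand|end|ond|erd|ord|oud|et|it|ît|tient|vient|pent|sent|eint|ort|ut|ût"
END_PL = "ent|ont"

# Split each '|'-separated list once: whole words go into sets,
# endings into tuples (str.endswith accepts a tuple of suffixes).
DETERMINERS = {"sg": set(DET_SG.split("|")), "pl": set(DET_PL.split("|"))}
ENDINGS = {"sg": tuple(END_SG.split("|")), "pl": tuple(END_PL.split("|"))}


def determiner_number(word):
    """Return 'sg' or 'pl' if the word is a listed determiner, else None."""
    w = word.lower()
    for number, forms in DETERMINERS.items():
        if w in forms:
            return number
    return None


def ending_agrees(word, number):
    """True if the word ends with one of the verbal endings of that number."""
    return word.lower().endswith(ENDINGS[number])


# "ces systèmes d'armes disposent de radars"
print(determiner_number("ces"))            # pl
print(ending_agrees("disposent", "pl"))    # True
print(ending_agrees("radars", "pl"))       # False
```

Note that `ending_agrees("disposent", "sg")` is also True, because disposent ends in the singular ending sent. An ending alone does not settle the number; it only narrows the candidates, which is presumably why it is paired with a determiner.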


Resources: all of them, for English

beginnings of clause:

"although|as if|as|because|before|how|if|since|than|that|though|unless|until|whatever|what|when|where|whether|while|who|which|whom|whose|why"

"and|but|or|nor"

beginnings of chunk:

"about|according to|across|after|against|along|amid|among|around|such as|at|because of|behind|between|by|despite|due to|during|except for|for|from|in order to|in|inside|into|instead of|like|of|off|on|out of|over|per|prior to|less than|more than|throughout|through|to|toward|under|unlike|via|within|without|with"

"such a|a|an|another|this|any|each|one|Such a|A|An|Another|This|Any|Each|One"

"many|most|much of|much|plenty of|several|some|such|these|those|both|two|three|four|five|six|seven|eight|nine|ten|a few|Many|Most|Much of|Much|Plenty of|Several|Some|Such|These|Those|Both|Two|Three|Four|Five|Six|Seven|Eight|Nine|Ten|A few"

"the|our|your|its|his|her|their|The|Our|Your|Its|His|Her|Their"

subject pronouns:

"I|he|she|it"

"we|they|you"

auxiliaries:

"has|is|was|does|says|tells|hasn't|isn't|wasn't|doesn't"

"have|are|were|do|say|tell|haven't|aren't|weren't|don't"

"had|will|would|shall|should|may|might|must|cannot|can|could|did|said|told|hadn't|wouldn't|shouldn't|may|mustn't|can't|couldn't|didn't|won't "

verbal endings:

"s|ed"

"a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|t|u|v|w|x|y|z"

clitics:

""


Resources: all of them, for German

beginnings of clause:

"dass|daß|in denen|indessen|dessen|indem|nachdem|ob|obwohl|was|warum|wer|weil|wenn|wie|wo|wofür|worauf|worin"

"aber|oder|und"

beginnings of chunk:

"dem|den|des|diesem|diesen|dieser|dieses|einem|einen|einer|eines|meinem|meinen|meiner|meines|deinem|deinen|deiner|deines|seinem|seinen|seiner|seines|ab|als|am|an|anhand|auf|aus|bei|bis|durch|für|gegen|gen|hinter|ihren|im|innerhalb|ins|in|mit|nach|neben|ohne|pro|seit|über|Über|um|unseren|unter|vom|von|vor|während|wegen|zum|zur|zu|zwischen"

"der|das|ein|eine|dieser|diese|kein|keine|ihres|ihr|Der|Das|Ein|Eine|Dieser|Diese|Kein|Keine|Ihres|Ihr|die|meine|seine|viel|Die|Meine|Seine|Viel"

"die|meine|seine|ihre|viele|alle|zwei|Die|Meine|Seine|Ihre|Viele|Alle|Zwei"

subject pronouns:

"ich|er|sie|es|man|Ich|Er|Sie|Es|Man"

"wir|Sie|sie|Wir"

auxiliaries:

"habe|hat|hatte|bin|ist|sei|wäre|war|wird|werde|wurde|darf|dürfte|kann|konnte|könnte|könne|lässt|muss|soll|will|wollte"

"haben|hatten|sind|waren|werden|wurden|worden|können|könnten|lassen|müssen|mussten|sollen|sollten"

verbal endings:

"b|nd|te|e|f|ag|ng|ah|hm|t"

"en|rn"

clitics:

""


Parsing and hierarchies of grains: a purely top-down parser

going down in the hierarchy of physical grains, starting from 1 document:

• textual zones (physical grains), by extracting

• sentences (physical grains), by segmenting / punctuation

• proto-clauses (intermediary grains), by segmenting / written forms

• proto-chunks (intermediary grains), by tagging / written forms

• clauses (computed grains), by validating, segmenting, linking

• chunks (computed grains)


Parsing process

1 sentence, segmented on its written forms (beginnings of clause) into proto-clauses (= hypotheses on clauses)

standard process:

• partial chunking (beginnings of chunks; auxiliaries, subject pronouns, verbal endings)

• linking subject - verb

• diagnosis: subject & verb in every proto-clause? main clause in the sentence?

yes: clauses (= 1 proto-clause each)

no: post-processing

post-processing:

• cutting, linking: clauses (= 1/2 proto-clause, or 2 proto-clauses)


Standard process: example 1

0 : <[>Je n'ai jamais dit

1 : <[cs>que </cs>l'euro allait remplacer le dollar

2 : <[.>.

Je n'ai jamais dit que l'euro allait remplacer le dollar.

(Ouest-France of 18/10/2001; "I never said that the euro was going to replace the dollar.")

• tagging beginnings of proto-clauses —> segmentation into proto-clauses:

here, each proto-clause = one clause
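The segmentation step can be sketched as follows, assuming a hypothetical subset of the French clause-beginning lists and a plain regular expression. The function name `proto_clauses` and its `(mark, text)` output are illustrative choices, not the GREYC implementation: each proto-clause starts at a listed word (marked `cs` for subordinating, `cc` for coordinating words), and the final punctuation becomes its own segment, mimicking the `<[.>` line.

```python
import re

# Hypothetical subset of the French "beginnings of clause" lists.
SUBORDINATING = ["parce que", "parce qu", "lorsque", "lorsqu", "quand", "que", "qu", "qui", "si"]
COORDINATING = ["et", "ou", "mais", "car", "puis"]


def alternation(words):
    # Longest alternatives first, so "parce que" is tried before shorter items.
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


# A listed word starting at a word boundary and not followed by a letter:
# "qu" matches in "qu'il" but not at the start of "quand".
CLAUSE_BEGINNING = re.compile(
    r"\b(?:(?P<cs>%s)|(?P<cc>%s))(?!\w)" % (alternation(SUBORDINATING), alternation(COORDINATING)),
    re.IGNORECASE,
)


def proto_clauses(sentence):
    """Cut a sentence before every clause beginning; return (mark, text) pairs."""
    body, final = sentence.strip(), ""
    if body and body[-1] in ".!?":
        body, final = body[:-1].rstrip(), body[-1]
    segments, start, mark = [], 0, ""
    for m in CLAUSE_BEGINNING.finditer(body):
        if m.start() == 0:              # the sentence itself opens with a clause beginning
            mark = m.lastgroup
            continue
        segments.append((mark, body[start:m.start()].strip()))
        start, mark = m.start(), m.lastgroup
    segments.append((mark, body[start:].strip()))
    if final:
        segments.append((".", final))
    return segments


for i, (mark, text) in enumerate(proto_clauses("Je n'ai jamais dit que l'euro allait remplacer le dollar.")):
    print(f"{i} : <[{mark}>{text}")
# 0 : <[>Je n'ai jamais dit
# 1 : <[cs>que l'euro allait remplacer le dollar
# 2 : <[.>.
```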


Standard process: example 1

0 : <[><pp>Je <V>n'ai jamais dit [nbpp=1 nbV=1]

1 : <[cs>que </cs><d>l'euro</d> allait remplacer <d>le dollar</d>[nbpp=0 nbV=0]

2 : <[.>.

• tagging beginnings of chunks —> partial chunking in the written form of the proto-clause

• tagging subject pronouns, auxiliaries —> counting subject pronouns and auxiliaries
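A rough sketch of partial chunking and counting, under stated assumptions: the word lists are small hypothetical subsets of the French resources, and the chunk-closing rule (a chunk beginning, any chunk beginnings right after it, then one more word) is our reading of the tagged examples rather than a documented rule. Only ASCII apostrophes are handled.

```python
import re

# Hypothetical French subsets of the "beginnings of chunk", "subject pronouns"
# and "auxiliaries" lists; elided forms (l, d) are written with their apostrophe.
DETERMINERS = {"un", "une", "le", "la", "l'", "les", "ces", "ce", "cette"}
PREPOSITIONS = {"de", "d'", "du", "des", "au", "aux", "à", "dans", "par", "pour", "sur"}
CHUNK_BEGINNINGS = DETERMINERS | PREPOSITIONS
SUBJECT_PRONOUNS = {"je", "il", "elle", "on", "ils", "elles", "nous", "vous"}
AUXILIARIES = {"a", "avait", "est", "était", "va", "allait", "ont", "sont", "vont"}

# Tokens: an elided word with its apostrophe, a plain word, or a punctuation sign.
TOKEN = re.compile(r"\w+'|\w+|[^\w\s]")


def join(pieces):
    """Join with spaces, except right after an elided form such as l' or d'."""
    return "".join(p if p.endswith("'") else p + " " for p in pieces).strip()


def partial_chunking(proto_clause):
    """Tag chunk beginnings, subject pronouns and auxiliaries; count the last two."""
    tokens = TOKEN.findall(proto_clause)
    pieces, nbpp, nbV, i = [], 0, 0, 0
    while i < len(tokens):
        word = tokens[i].lower()
        if word in CHUNK_BEGINNINGS:
            tag = "p" if word in PREPOSITIONS else "d"
            j = i + 1
            while j < len(tokens) and tokens[j].lower() in CHUNK_BEGINNINGS:
                j += 1                      # absorb "de la", "dans les", ...
            j = min(j + 1, len(tokens))     # plus one more word: a partial chunk
            pieces.append(f"<{tag}>{join(tokens[i:j])}</{tag}>")
            i = j
            continue
        if word in SUBJECT_PRONOUNS:
            pieces.append("<pp>" + tokens[i])
            nbpp += 1
        elif word in AUXILIARIES:
            pieces.append("<V>" + tokens[i])
            nbV += 1
        else:
            pieces.append(tokens[i])
        i += 1
    return join(pieces), nbpp, nbV


tagged, nbpp, nbV = partial_chunking("ces systèmes d'armes disposent de radars")
print(tagged, f"[nbpp={nbpp} nbV={nbV}]")
# <d>ces systèmes</d> <p>d'armes</p> disposent <p>de radars</p> [nbpp=0 nbV=0]
```

On this clause from the dictionary-free examples, the sketch produces the same `<d>` and `<p>` chunks as the slide; a verb such as disposent, which is not an auxiliary, is left for the subject-verb linking step.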


Standard process: example 1

0 : <[>||<pp>Je ||<V>n'ai jamais dit [nbV=1 saturS=1]

1 : <[cs>que </cs>||<d>l'euro</d> ||<V>allait remplacer <d>le dollar</d>[nbV=1 saturS=1]

2 : <[.>.

• for every proto-clause: detecting and linking subject and verb
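One way to sketch subject-verb detection, assuming a deliberately simplified rule: the subject is the first subject pronoun or determiner chunk (determiner plus the next word), and the verb is the first later word that is an auxiliary of the same number or carries an agreeing verbal ending, skipping words governed by a preposition. The word lists are hypothetical subsets of the French resources; `link_subject_verb` and its returned `saturS` flag are illustrative names borrowed from the slide notation.

```python
import re

# Hypothetical French subsets of the resource lists, each word mapped to its number.
DETERMINERS = {"un": "sg", "une": "sg", "le": "sg", "la": "sg", "l'": "sg",
               "les": "pl", "ces": "pl", "ses": "pl"}
SUBJECT_PRONOUNS = {"je": "sg", "il": "sg", "elle": "sg", "on": "sg",
                    "ils": "pl", "elles": "pl"}
AUXILIARIES = {"a": "sg", "est": "sg", "allait": "sg",
               "ont": "pl", "sont": "pl", "allaient": "pl"}
PREPOSITIONS = {"de", "d'", "du", "des", "au", "à", "dans"}
ENDINGS = {"sg": ("e", "a", "et", "it", "ut", "end", "ond"), "pl": ("ent", "ont")}
TOKEN = re.compile(r"\w+'|\w+|[^\w\s]")


def join(tokens):
    # No space after an elided form such as l'.
    return "".join(t if t.endswith("'") else t + " " for t in tokens).strip()


def link_subject_verb(proto_clause):
    """Return (subject, verb, saturS): the first subject candidate, the first later
    word agreeing with it in number, and 1 if both were found, else 0."""
    words = TOKEN.findall(proto_clause)
    subject = number = None
    i = 0
    while i < len(words):
        w = words[i].lower()
        if subject is None and w in SUBJECT_PRONOUNS:
            subject, number = words[i], SUBJECT_PRONOUNS[w]
        elif subject is None and w in DETERMINERS and i + 1 < len(words):
            subject, number = join(words[i:i + 2]), DETERMINERS[w]
            i += 1                       # the noun belongs to the subject chunk
        elif w in PREPOSITIONS:
            i += 1                       # skip the word the preposition governs
        elif subject is not None and w not in DETERMINERS:
            if AUXILIARIES.get(w) == number or w.endswith(ENDINGS[number]):
                return subject, words[i], 1
        i += 1
    return subject, None, 0


print(link_subject_verb("ces systèmes d'armes disposent de radars"))  # ('ces systèmes', 'disposent', 1)
print(link_subject_verb("que l'euro allait remplacer le dollar"))     # ("l'euro", 'allait', 1)
```

A proto-clause such as "Les tueurs," comes back as `("Les tueurs", None, 0)`: a subject candidate without a verb, which is the situation the post-processing handles.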


Standard process: example 1

0 : <[>||<pp>Je ||<V>n'ai jamais dit [nbV=1 saturS=1]

1 : <[cs>que </cs>||<d>l'euro</d> ||<V>allait remplacer <d>le dollar</d>[nbV=1 saturS=1]

2 : <[.>.

• diagnosis of every clause and of the sentence

• every clause has its subject and its verb, and the sentence has a main clause (one without a mark)
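The diagnosis can be written as a small predicate. In this sketch, which is an assumption and not the author's code, each proto-clause carries its mark and the `nbV` and `saturS` counters shown in brackets on the slides; infinitive clauses are not modelled.

```python
from dataclasses import dataclass


@dataclass
class ProtoClause:
    mark: str        # "" = unmarked, "cs"/"cc"/"pr" = marked beginning, "." = final punctuation
    nbV: int = 0     # number of verbal chunks found in the proto-clause
    saturS: int = 0  # 1 once a subject has been linked to the verb


def diagnose(proto_clauses):
    """True if every proto-clause has one verb and a linked subject,
    and the sentence has a main clause, i.e. an unmarked one."""
    clauses = [c for c in proto_clauses if c.mark != "."]
    every_clause_ok = all(c.nbV == 1 and c.saturS == 1 for c in clauses)
    has_main_clause = any(c.mark == "" for c in clauses)
    return every_clause_ok and has_main_clause


# Example 1 after linking: the standard process is enough.
example_1 = [ProtoClause("", 1, 1), ProtoClause("cs", 1, 1), ProtoClause(".")]
# "Although ... that ..." with two verbs in one proto-clause: post-processing needed.
although = [ProtoClause("cs", 1, 1), ProtoClause("cs", 2, 0), ProtoClause(".")]
print(diagnose(example_1), diagnose(although))   # True False
```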


Standard process: example 2

Eine spektakuläre Operation gelang ihm im November 1974, als er ein Spenderherz transplantierte, ohne das Herz des Empfängers zu entfernen.

(Der Spiegel - 2/9/2001; "In November 1974 he succeeded in a spectacular operation, when he transplanted a donor heart without removing the recipient's heart.")

0 : <[>Eine spektakuläre Operation gelang ihm im November 1974,

1 : <[cs>als </cs>er ein Spenderherz transplantierte,

2 : <[><pi>ohne </pi>das Herz des Empfängers <pi>zu </pi>entfernen

3 : <[.>.

• tagging the beginnings of proto-clauses —> segmentation into proto-clauses


Standard process: example 2

0 : <[><d>Eine spektakuläre Operation</d> gelang ihm <p>im November</p> 1974, [nbpp=0 nbV=0]

1 : <[cs>als </cs><pp>er <d>ein Spenderherz</d> transplantierte, [nbpp=1 nbV=0]

2 : <[><pi>ohne </pi><d>das Herz</d> <p>des Empfängers</p> <pi>zu entfernen</pi>

3 : <[.>.

• tagging beginnings of chunks —> partial chunking in the written form of the proto-clause

• tagging pronouns, auxiliaries —> counting pronouns and auxiliaries


Standard process: example 2

• for every proto-clause: detecting and linking subject and verb

0 : <[>||<d>Eine spektakuläre Operation</d> ||<V>gelang ihm <p>im November</p> 1974, [nbV=1 saturS=1]

1 : <[cs>als </cs>||<pp>er <d>ein Spenderherz</d> ||<V>transplantierte, [nbV=1 saturS=1]

2 : <[><pi>ohne </pi><d>das Herz</d> <p>des Empfängers</p> <pi>zu entfernen</pi>

3 : <[.>.


Standard process: example 2

0 : <[>||<d>Eine spektakuläre Operation</d> ||<V>gelang ihm <p>im November</p> 1974, [nbV=1 saturS=1]

1 : <[cs>als </cs>||<pp>er <d>ein Spenderherz</d> ||<V>transplantierte, [nbV=1 saturS=1]

2 : <[><pi>ohne </pi><d>das Herz</d> <p>des Empfängers</p> <pi>zu entfernen</pi>

3 : <[.>.

• diagnosis of every clause and of the sentence

• every clause has its subject and its verb, and the sentence has a main clause (one without a mark)


Post-processing: from proto-clause to clause

2 operations are possible:

• cutting 1 proto-clause => 2 clauses

• linking 2 proto-clauses => 1 clause


Post-processing: cutting a proto-clause into 2 clauses

Result of the standard process:

2 verbs in 1 proto-clause

=> searching for a cut point

0 : <[cs>Although </cs>||<pp>they ||<V>have not ruled out <d>a possibility</d> [nbV=1 saturS=1]

1 : <[cs>that </cs><d>another criminal</d> <V>could be <p>behind the anthrax</p>

      attacks, investigators <V>are intensely looking <p>at evidentiary</p> threads

      linking <d>the letters</d> <p>to the hijackers</p>[nbV=2]

2 : <[.>.


Post-processing: cutting a proto-clause into 2 clauses

0 : <[cs>Although </cs>||<pp>they ||<V>have not ruled out <d>a possibility</d> [nbV=1 saturS=1]

1 : <[cs>that </cs>||<d>another criminal</d> ||<V>could be <p>behind the anthrax</p>

      attacks, [nbV=1 saturS=1]

2 : <[>||investigators ||<V>are intensely looking <p>at evidentiary</p> threads linking

      <d>the letters</d> <p>to the hijackers</p>[nbV=1 saturS=1]

3 : <[.>.

Cut on the comma:

every clause now has its subject and its verb, and the sentence has a main clause (one without a mark)
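The cut can be sketched directly on the written form. Choosing the last comma between the first two `<V>` tags is an assumption that fits this example, but the slides do not state the rule; `cut_on_comma` is a hypothetical name.

```python
def cut_on_comma(written_form):
    """Cut a proto-clause holding two verbal chunks (<V> tags) at the last
    comma between them; return one or two written forms."""
    first = written_form.find("<V>")
    second = written_form.find("<V>", first + 1) if first >= 0 else -1
    if second < 0:
        return [written_form]              # fewer than two verbs: nothing to cut
    comma = written_form.rfind(",", first, second)
    if comma < 0:
        return [written_form]              # no cut point between the two verbs
    return [written_form[:comma + 1].strip(), written_form[comma + 1:].strip()]


clause = ("that <d>another criminal</d> <V>could be <p>behind the anthrax</p> attacks, "
          "investigators <V>are intensely looking <p>at evidentiary</p> threads "
          "linking <d>the letters</d> <p>to the hijackers</p>")
for part in cut_on_comma(clause):
    print(part)
```

The second part starts with "investigators", which has no determiner; in the slide it is nevertheless linked as the subject of "are intensely looking", a step this sketch does not attempt.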


Post-processing: linking 2 proto-clauses

0 : <[><d>Eine junge Südafrikanerin</d>, [nbV=0]

1 : <[pr>||die </pr>1969 <d>ein neues Herz</d> ||<V>erhielt, [nbV=1 saturS=1]

2 : <[>überlebte damit zwölf Jahre[nbV=0]

3 : <[.>.

Result of the standard process:

2 proto-clauses have no verb

=> trying to link them


Post-processing: linking 2 proto-clauses

linking proto-clause 0 to proto-clause 2 by the "ping-pong" process:

0 : <[>|<d>Eine junge Südafrikanerin</d>, [nbV=0 S_en_attente=1] (ping of the subject)

1 : <[pr>||die </pr>1969 <d>ein neues Herz</d> ||<V>erhielt,

[nbV=1 saturS=1]

2 : <[>überlebte damit zwölf Jahre

[nbV=0]


Post-processing: linking 2 proto-clauses

linking proto-clause 0 to proto-clause 2 by the "ping-pong" process:

0 : <[>||<d>Eine junge Südafrikanerin</d>, [nbV=0 S_en_attente=0 lienS=2] (ping of the subject)

1 : <[pr>||die </pr>1969 <d>ein neues Herz</d> ||<V>erhielt,

[nbV=1 saturS=1]

2 : <[>||<V>überlebte damit zwölf Jahre

[nbV=1 saturS=1 lienS=0] (pong of the verb)

3 : <[.>.

every clause now has its subject and its verb, and the sentence has a main clause (one without a mark)


Post-processing: cutting a proto-clause into 2 clauses + linking 2 proto-clauses

0 : <[><d>Les tueurs</d>, [nbV=0]

1 : <[pr>||qui </pr>||<V>ont assassiné Rehavam Zeevi, ministre israélien <p>du Tourisme</p>, appartiennent <p>au camp</p> <p>des ennemis</p> <p>de la paix</p>

[nbV=1 saturS=1]

2 : <[.>.

Result of the standard process:

1 proto-clause has no verb

=> trying to cut and link

Post-processing: cutting a proto-clause into 2 clauses + linking 2 proto-clauses

"ping-pong" process: ping of the subject = putting a subject candidate in a waiting position

0 : <[>|<d>Les tueurs</d>, [nbV=0 S_en_attente=plur] (ping of the subject?)

1 : <[pr>||qui </pr>||<V>ont assassiné Rehavam Zeevi, ministre israélien <p>du Tourisme</p>, appartiennent <p>au camp</p> <p>des ennemis</p> <p>de la paix</p>

[nbV=1 saturS=1]

cutting proto-clause 1 into 2 proto-clauses:


Post-processing: cutting a proto-clause into 2 clauses + linking 2 proto-clauses

cutting proto-clause 1 into 2 proto-clauses:

0 : <[>|<d>Les tueurs</d>, [nbV=0 S_en_attente=plur] (ping of the subject?)

1 : <[pr>||qui </pr>||<V>ont assassiné Rehavam Zeevi, ministre israélien <p>du Tourisme</p>,

[nbV=1 saturS=1]

2 : <[>appartiennent <p>au camp</p> <p>des ennemis</p> <p>de la paix</p>

[nbV=0]


Post-processing: cutting a proto-clause into 2 clauses + linking 2 proto-clauses

"ping-pong" process: pong of the verb = a waiting subject candidate & an agreeing verbal ending

0 : <[>||<d>Les tueurs</d>, [nbV=0 S_en_attente=0 lienS=2] (ping of the subject?)

1 : <[pr>||qui </pr>||<V>ont assassiné Rehavam Zeevi, ministre israélien <p>du Tourisme</p>,

[nbV=1 saturS=1]

2 : <[>||<V>appartiennent <p>au camp</p> <p>des ennemis</p> <p>de la paix</p>

[nbV=1 saturS=1 lienS=0] (pong of the verb)

3 : <[.>.

every clause now has its subject and its verb, and the sentence has a main clause (one without a mark)
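A minimal sketch of the ping-pong process, assuming each proto-clause already carries its words, its `nbV` counter and, for a verbless proto-clause, the number of its subject candidate. Field names follow the slide notation (`S_en_attente`, `lienS`), but the data structure and the choice to test only the first word of the later verbless proto-clause are assumptions; the plural endings are the French ones listed earlier, the singular endings a hypothetical subset.

```python
from dataclasses import dataclass
from typing import List, Optional

# French verbal endings: the plural list as given earlier, a subset of the singular one.
ENDINGS = {"pl": ("ent", "ont"), "sg": ("e", "a", "et", "it", "ut")}


@dataclass
class ProtoClause:
    words: List[str]
    nbV: int = 0
    subject_number: Optional[str] = None   # "sg"/"pl" for a verbless subject candidate
    S_en_attente: Optional[str] = None     # number of the subject waiting for a verb
    saturS: int = 0
    lienS: Optional[int] = None            # index of the linked proto-clause


def ping_pong(clauses):
    """Link a verbless subject candidate (ping) to a later verbless proto-clause
    whose first word has an agreeing verbal ending (pong)."""
    waiting = None
    for k, clause in enumerate(clauses):
        if clause.nbV:
            continue                                        # already has its verb
        if waiting is None and clause.subject_number:
            clause.S_en_attente = clause.subject_number     # ping of the subject
            waiting = k
        elif waiting is not None and clause.words:
            number = clauses[waiting].S_en_attente
            if clause.words[0].lower().endswith(ENDINGS[number]):   # pong of the verb
                clause.nbV, clause.saturS = 1, 1
                clause.lienS, clauses[waiting].lienS = waiting, k
                clauses[waiting].S_en_attente = None   # shown as S_en_attente=0 on the slides
                waiting = None
    return clauses


sentence = ping_pong([
    ProtoClause(["Les", "tueurs", ","], subject_number="pl"),
    ProtoClause(["qui", "ont", "assassiné", "Rehavam", "Zeevi", ","], nbV=1, saturS=1),
    ProtoClause(["appartiennent", "au", "camp", "des", "ennemis", "de", "la", "paix"]),
])
print(sentence[0].lienS, sentence[2].lienS, sentence[2].nbV)   # 2 0 1
```

The German example (Südafrikanerin ... überlebte) would need the German singular endings instead of the French ones.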


Implementation of the linguistic model

physical grains: sentences

intermediary grains: proto-clauses, proto-chunks

computed grains: clauses, chunks

• sentences and (proto-)clauses: these grains are represented in a repetitive structure

• (proto-)chunks: these grains are tagged in the written forms of the (proto-)clauses, in the repetitive structure of the (proto-)clauses
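To make the two representations concrete, here is a hypothetical Python data structure (class and method names are ours, not the author's): a sentence holds a list of (proto-)clause records, the repetitive structure, while chunks are not stored as objects at all but read back from the tags in each written form.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Clause:
    written_form: str                                            # chunks live here as tags
    attributes: Dict[str, object] = field(default_factory=dict)  # e.g. nbV, saturS, lienS


@dataclass
class Sentence:
    clauses: List[Clause] = field(default_factory=list)          # the repetitive structure

    def show(self):
        """Render as 'index : written form [attributes]', close to the slide layout."""
        lines = []
        for i, clause in enumerate(self.clauses):
            attrs = " ".join(f"{k}={v}" for k, v in clause.attributes.items())
            lines.append(f"{i} : {clause.written_form}" + (f" [{attrs}]" if attrs else ""))
        return "\n".join(lines)

    def chunks(self, tag):
        """Recover the chunks of one kind (d, p, pi, ...) by reading the tags back."""
        found = []
        for clause in self.clauses:
            found += re.findall(rf"<{tag}>(.*?)</{tag}>", clause.written_form)
        return found


s = Sentence([
    Clause("<[>||<pp>Je ||<V>n'ai jamais dit", {"nbV": 1, "saturS": 1}),
    Clause("<[cs>que </cs>||<d>l'euro</d> ||<V>allait remplacer <d>le dollar</d>", {"nbV": 1, "saturS": 1}),
    Clause("<[.>."),
])
print(s.show())
print(s.chunks("d"))   # ["l'euro", 'le dollar']
```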


Aims of the "Groupe Syntaxe" of the GREYC

• searching for minimal solutions:

for a given task, minimising the means

- very small programs

- very simple algorithms

- deterministic solutions (without enumerating combinations):

. computing on forms and their positions

- minimal linguistic bases:

. using very few properties, only those which are useful in the process

. very few resources (typographical, morphological)


Very small programs!

• how?

by using very general linguistic properties,

defined in comprehension (intensionally)

and not in extension

• why?

because these properties are interesting:

few and abstract: for understanding and modelling

operative and efficient: for acting


Conclusions

• classical tasks are feasible with minimal means (almost no dictionary)

other tasks: computing reported speech, locating explanations (cf. Nadine Lucas (GREYC) and Emmanuel Giguet (LATTICE))

• with fewer means, work is easier:

- fewer lexical resources => lower cost

- easy to add a new language

- always above the word level

• the beginnings of a promising approach

• still a long way to go...


Your questions?

End of the lecture


To download

• you can download this presentation from http://www.info.unicaen.fr/~jvergne/TALN2002_JVergne_en.ppt

• see also my presentation at TALN 2001, "Parsing natural languages: from "combinatory" to "deterministic" parsing",

at http://www.info.unicaen.fr/~jvergne/TALN2001_JVergne_en.ppt

• see also the Coling 2000 tutorial "Trends in Robust Parsing"

at http://www.info.unicaen.fr/~jvergne/tutorialColing2000.html

(presentation and references)


Parsing and hierarchies of grains: classical parsers

top-down in the hierarchy of physical grains: 1 document, segmenting into sentences, segmenting into tokens

bottom-up in the hierarchy of computed grains: grouping tokens and phrases into recursive phrases, up to the sentence


Parsing and hierarchies of grains: the 1998 parser

top-down in the hierarchy of physical grains: 1 document, segmenting into sentences, segmenting into tokens

bottom-up in the hierarchy of computed grains: grouping tokens into chunks, linking chunks


Parsing and hierarchies of grains: the GREYC parser

top-down in the hierarchy of physical grains: 1 document, extracting textual zones, segmenting into tokens

bottom-up in the hierarchy of computed grains: grouping and linking into chunks, clauses and sentences
