CAP: A Hierarchical Lexical Function Amalia Todirascu Linguistique, Langues, Paroles (LILPA) University of Strasbourg [email protected]

CAP: A Hierarchical Lexical Function


Page 1: CAP: A Hierarchical Lexical Function

CAP: A Hierarchical Lexical Function

Amalia Todirascu

Linguistique, Langues, Paroles (LILPA)

University of Strasbourg

[email protected]

Page 2: CAP: A Hierarchical Lexical Function

The Project

Goals
• to study a specific lexical function, CAP, in several languages (French, English, German) and in two domains (economy, politics)
• to provide a complete linguistic description of this function
• to extend a multilingual ontology, Prolexbase (Tran and Maurel, 2006)

Page 3: CAP: A Hierarchical Lexical Function

The Project (II)

Collaboration with the CLARIN European project (http://www.clarin.eu)
– WP3 Humanities overview
  • WP3.3 Call for collaboration with Humanities projects
– Collaboration
  • access to existing corpora and tools
  • consultancy

Page 4: CAP: A Hierarchical Lexical Function

CAP – a Lexical Function

CAP lexical function (Mel'čuk 1984, 1988, 1992, 1999) – hierarchical relations

• Two persons
  – François Fillon est premier ministre de Nicolas Sarkozy
  – Sebek em war ein Oberpriester ca. 1780 v.Chr
• Two organisations
  – Swiss Private Aviation AG, a fully-owned subsidiary of Swiss International Air Lines AG
  – Peugeot est une firme sochalienne
• A person and an organization or a country
  – SWISS Finanzchef Marcel Klaus
  – Traian Băsescu is the Romanian president

Page 5: CAP: A Hierarchical Lexical Function

Context

• linguistics: noun classifications (Kleiber 1990, Kleiber 1999, Jonasson 1994)
• lexical databases: WordNet (Miller, 1995), EuroWordNet (Vossen, 1998), BalkaNet (Tufis, 2004), FrameNet (Baker et al., 1998)
• ontologies: Prolexbase (Tran and Maurel, 2006; Grass et al., 2004), SUMO (Niles and Pease, 2001)
• several applications: information extraction, QA systems

Page 6: CAP: A Hierarchical Lexical Function

The Methodology

• we identify existing monolingual and parallel corpora
  – DE, EN, FR
  – CLARIN language resource registry
  – tagged and raw corpora
  – annotation tools (both from the repository and on-line web services)
• we create our own multilingual corpora

Page 7: CAP: A Hierarchical Lexical Function

The Methodology (II)

• we apply several data extraction strategies
  – searching synonyms of "chef/head of/Vorsitzender"
  – searching Named Entities related by the CAP relation (Martine Aubry – Parti Socialiste)
  – searching annotated persons and organizations through aligned corpora
• we analyse the contexts to classify the expressions and their arguments
• we extend the Prolexbase ontology

Page 8: CAP: A Hierarchical Lexical Function

Corpora (I)

• Available public data
• Web interfaces (CQP)
• Various domains and genres
• monolingual: Wortschatz (http://corpora.informatik.uni-leipzig.de), IULA (http://bwananet.iula.upf.edu), COSMAS (http://www.ids-mannheim.de/cosmas2), BNC (http://www.natcorp.ox.ac.uk/)
• multilingual: Oslo (http://www.hf.uio.no), CLUVI (http://sli.uvigo.es/CLUVI), DGT-TM (http://langtech.jrc.it/DGT-TM.html)

Page 9: CAP: A Hierarchical Lexical Function

Corpora (II)

Corpora built for the project
• monolingual: party chiefs (DE, EN), French president (FR) (200,000 tokens/language)
• multilingual (parallel and comparable):
  – airline companies (51,000-54,000 tokens)
  – European Parliament (127,000-134,000 tokens)
  – European Commission (175,000-195,000 tokens)

Domains: politics, economy

Page 10: CAP: A Hierarchical Lexical Function

Preprocessing the Corpora

• Unitex tool (Paumier, 2000)
  – resources available for the three languages
  – tools:
    · tokenizer, lemmatizer and tagger
    · CasSys (Friburger and Maurel, 2004) to annotate French Named Entities
• WebLicht platform
  – NE annotations for German and English
• sentence aligner: Alinea (Kraif, 2001)

Page 11: CAP: A Hierarchical Lexical Function

Data extraction

Three strategies for data extraction:

A. we identify synonyms/hyponyms for English (WordNet, FrameNet) and their equivalents in French and German
  • chef, président, PDG, directeur général
  • Chief executive officer, president, head of
  • Vorsitzender, Direktor
B. we search pairs of entities which are related by a CAP relation
  • Barack Obama – United States of America
  • José Manuel Barroso – la Commission européenne
  • Marcel Klaus – SWISS
C. we use aligned corpora and the French NER tool CasSys (Friburger and Maurel, 2004) to obtain relevant contexts of Person or Organization
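
Strategies A and B can be illustrated with a minimal Python sketch. It is not the project's implementation: the seed terms and entity pairs are copied from the slide, while the dictionary layout, the function names and the plain substring matching are assumptions made for illustration. A real run would query lemmatized, NE-annotated corpora rather than raw strings.

```python
import re

# Strategy A: seed terms from the slide (the per-language layout is an assumption).
CAP_TERMS = {
    "fr": ["chef", "président", "PDG", "directeur général"],
    "en": ["chief executive officer", "president", "head of"],
    "de": ["Vorsitzender", "Direktor"],
}

# Strategy B: entity pairs from the slide, stored as (person, organisation or country).
CAP_PAIRS = [
    ("Barack Obama", "United States of America"),
    ("José Manuel Barroso", "Commission européenne"),
    ("Marcel Klaus", "SWISS"),
]


def find_cap_terms(sentence, lang):
    """Return the seed terms of `lang` that occur as whole words in the sentence."""
    hits = []
    for term in CAP_TERMS[lang]:
        # Lookarounds keep "chef" from matching inside a longer word such as "Finanzchef".
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            hits.append(term)
    return hits


def find_cap_pairs(sentence):
    """Return the pairs whose two members both occur in the sentence."""
    return [(p, o) for p, o in CAP_PAIRS if p in sentence and o in sentence]
```

For instance, `find_cap_pairs("SWISS Finanzchef Marcel Klaus")` returns the pair `("Marcel Klaus", "SWISS")`, while `find_cap_terms` finds no German seed term in that context. Contexts of this kind, retrieved by strategy B but missed by strategy A, are where new lexical units such as Finanzchef could be collected.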

Page 12: CAP: A Hierarchical Lexical Function

Data Extraction (II)

Problems
• few contexts from existing corpora (30 to 50)
• various queries
  – CQP/web interface
  – raw texts
• various annotations
  – few tagged corpora
  – almost no NE annotated corpora
• heterogeneous tools to preprocess corpora

Page 13: CAP: A Hierarchical Lexical Function

'Cap' lexical units

• various lexical categories
  – nouns: positions (e.g. Finanzdirektor), professions (infirmière en chef), titles (Dr.), army ranks (General)
  – verbs: to lead, to organize, to command
• a trilingual ontology
  – 95 lexical units (FR), 93 lexical units (EN), 67 lexical units (DE)
  – from existing lexical databases
  – from corpora
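
One way to picture an entry of such a trilingual resource is sketched below in Python. The slides do not show the Prolexbase schema, so the class, its field names and the `source` values are hypothetical. Only the lexical units themselves come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class CapEntry:
    """Hypothetical record for one CAP concept; not the Prolexbase schema."""
    category: str   # e.g. "position", "profession", "title", "army rank"
    source: str     # "lexical database" or "corpus" (placeholder values below)
    lexical_units: dict = field(default_factory=dict)  # language code -> lexical unit


def count_by_language(entries):
    """Count how many entries have a lexical unit in each language."""
    counts = {}
    for entry in entries:
        for lang in entry.lexical_units:
            counts[lang] = counts.get(lang, 0) + 1
    return counts


# Lexical units taken from the slides; category and source are illustrative.
ENTRIES = [
    CapEntry("profession", "corpus",
             {"fr": "infirmière en chef", "en": "head nurse", "de": "Oberschwester"}),
    CapEntry("position", "corpus",
             {"fr": "directeur général", "de": "Generaldirektor"}),
]
```

With this layout an entry may lack a unit in one language, so per-language totals, as returned by `count_by_language(ENTRIES)`, need not be equal.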

Page 14: CAP: A Hierarchical Lexical Function

Linguistic Analysis

• argument types
  – persons, organizations, places
  – common nouns: anaphoric references to organisations or persons in charge, nationality adjectives
• various linguistic expressions
  – nouns – morpho-syntactic variations
  – verbs
  – complex verbo-nominal predicates (sous la gouverne de, unter der Leitung von, under the direction of, become président, être élu …)

Page 15: CAP: A Hierarchical Lexical Function

Morpho-Syntactic Properties

Nouns
• affixation
  – général, généralissime (FR)
• composition
  – vice-roi (FR), viceroy (EN), Vizekönig (DE)
• modification
  – adjective (directeur général FR, Generaldirektor DE)
  – prepositional phrase (infirmière en chef FR, head nurse EN, Oberschwester DE)
• noun being the possessor of another noun
  – du Conseil de Sécurité des Nations Unies, United Nations Security Council, des UN-Sicherheitsrates

Page 16: CAP: A Hierarchical Lexical Function

Conclusion and Further Work

• study from the lexical semantics field: a hierarchy relation in a multilingual perspective – CAP
• various expressions and various argument types
• data from monolingual and multilingual corpora
• trilingual ontology (FR, DE, EN) – extension of Prolexbase

Overall experience
• querying various interfaces
• heterogeneous annotation information
• heterogeneous tools
• combining linguists' and computational linguists' competences

Page 17: CAP: A Hierarchical Lexical Function

The Lexico-Syntactic Patterns

• French patterns
  – <Organization> de <Organization>
    Conseil d'Administration de SWISS
• English patterns
  – <CAP function> of <Organisation>, <Person>
    Chief executive officer of the company TAROM, M. Gheorghe Birla
• German patterns
  – <Person> <sein> <tokens>* <CAP function> <Organisation>
    Peter Siegenthaler ist seit Juli 2000 Direktor der Eidgenössischen Finanzverwaltung
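
The three patterns can be tried out with a small Python sketch that matches them over chunks already labelled by an NE annotator. This is an illustration only. The label set, the one-letter encoding and the function name are assumptions, not the notation of Unitex or CasSys. The optional punctuation chunk in the English pattern is also an assumption, added to absorb the comma.

```python
import re

# Hypothetical one-letter codes for chunk labels, so that a pattern over
# labels can be written as an ordinary regular expression.
LABEL_CODES = {
    "PER": "P",   # person
    "ORG": "G",   # organisation
    "CAP": "C",   # CAP function noun
    "SEIN": "S",  # a form of the German verb "sein"
    "DE": "D",    # French preposition "de"
    "OF": "F",    # English preposition "of"
    "O": "o",     # any other token, including punctuation
}

PATTERNS = {
    "fr": re.compile(r"GDG"),     # <Organization> de <Organization>
    "en": re.compile(r"CFGo?P"),  # <CAP function> of <Organisation>, <Person>
    "de": re.compile(r"PSo*CG"),  # <Person> <sein> <tokens>* <CAP function> <Organisation>
}


def match_pattern(chunks, lang):
    """Return the texts of the chunks covered by the first match, or None."""
    codes = "".join(LABEL_CODES[label] for _, label in chunks)
    match = PATTERNS[lang].search(codes)
    if match is None:
        return None
    # One code per chunk, so match offsets index directly into `chunks`.
    return [text for text, _ in chunks[match.start():match.end()]]


# The German example from the slide, chunked and labelled by hand.
GERMAN_EXAMPLE = [
    ("Peter Siegenthaler", "PER"), ("ist", "SEIN"), ("seit", "O"),
    ("Juli", "O"), ("2000", "O"), ("Direktor", "CAP"),
    ("der Eidgenössischen Finanzverwaltung", "ORG"),
]
```

Calling `match_pattern(GERMAN_EXAMPLE, "de")` returns all seven chunk texts, because `<tokens>*` absorbs "seit Juli 2000". Encoding labels as single characters keeps the patterns close to the notation on the slide, at the cost of requiring pre-chunked input.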