LIRICS Mid-term Review 1 LIRICS WP2 – NLP Lexica Monica Monachini [email protected] CNR-ILC - Pisa 23rd May 2006

LIRICS WP2 – NLP Lexica


Page 1: LIRICS WP2 – NLP Lexica

LIRICS Mid-term Review 1

LIRICS WP2 – NLP Lexica

Monica Monachini [email protected]

CNR-ILC - Pisa

23rd May 2006

Page 2: LIRICS WP2 – NLP Lexica


Summary of the presentation

Overview of WP2

1st year objectives

Main results in T2.1 and T2.2

Work done

Synergies with other LIRICS WPs, ISO activities, meetings

Priorities for future activities

Page 3: LIRICS WP2 – NLP Lexica


WP2 overall objective

Define a “family” of standards for NLP lexicons

Two-level standards:

the high-level specifications provide structural elements, i.e. lexical classes and relations between them (the meta-model);

the low-level specifications provide standardized constants, i.e. data categories used to “adorn” the lexical classes (ISO 12620)

Page 4: LIRICS WP2 – NLP Lexica


WP2 T2.1 overview and objectives

Gather, from past and ongoing standardization activities, the linguistic information considered relevant for lexical description, to be combined with the layers of the lexical model

Coherent input to ISO Data Category Registry revision

Page 5: LIRICS WP2 – NLP Lexica


WP2 T2.1 results

Proposal for a unified set of lexical information and unified descriptors as a draft set of Data Categories

Maximum set of candidate lexical data categories, subdivided along the layers of linguistic description: morphosyntax, syntax and semantics

Data Categories shared between WP2 and WP3 that are relevant to morphosyntactic description have been incorporated in the Syntax Tool: the Morphosyntactic Profile

Page 6: LIRICS WP2 – NLP Lexica


WP2 T2.1 Deliverables

[Gantt chart: months M1 to M30 across the 1st, 2nd and 3rd year; rows WP2, T2.1, T2.2 and T2.3; a marker "I" appears on the T2.2 and T2.3 rows]

D.2.1 Survey and evaluation of existing standards for Lexica

D.2.1 Survey and evaluation of existing standards for Lexica (revision)

(version foreseen in conjunction with the Data Categories, to be issued together with the data model in T2.2)

D.2.1 Survey and evaluation of existing standards for Lexica

Page 7: LIRICS WP2 – NLP Lexica


WP2 T2.2 overview and objectives

Define a lexical framework: a general and abstract meta-model, as a set of structural nodes relevant for lexical description, enabling specific implementations on the basis of common Data Categories

Definition of the common set of related Data Categories

Page 8: LIRICS WP2 – NLP Lexica


WP2 T2.2 results

Formulation of a high-level lexical meta-model, the Lexical Markup Framework (LMF), a flexible environment for user-defined mark-up languages

Proofs of concept: exercises mapping well-known NLP lexicon practices against the model

Page 9: LIRICS WP2 – NLP Lexica


WP2 T2.2 Deliverables

[Gantt chart: months M1 to M30 across the 1st, 2nd and 3rd year; rows WP2, T2.1, T2.2 and T2.3; a marker "I" appears on the T2.2 and T2.3 rows]

NLP Lexica standard for CD ballot (submitted at the beginning of 2006)

NLP Lexica standard for ISO DIS ballot

Internal milestone for internal quality control

Page 10: LIRICS WP2 – NLP Lexica


WP2 Activities, Meetings, Synergies...

LIRICS WPs bi- and tri-lateral working meetings:
CNR-ILC – MPI, 15.2.2005: PAROLE-SIMPLE lexical architecture and LEXUS tool
WP2 internal meeting, 16.2.2005: basic structure of the meta-model for lexicons (core model + extensions)
CNR-ILC – DFKI, 5.5.2005: convergences between morpho-syntactic and syntactic data; issues for the submission of the NWI on Syntax (SynAF) to ISO
WP2 internal meeting, Pisa, 23-24.11.2005: basic structure of the meta-model for the representation of multiword expressions

LIRICS meetings:
Paris, 16-17.3.2005: progress of work within WP2; presentation of the standard core model for lexicons and the extensions for NLP lexicons
Barcelona, 21-22.6.2005: LIRICS Industrial Advisory Board Meeting
Barcelona, 22.6.2005: presentation of the first bulk of information relevant for lexical description
Nancy, 8-9.12.2005: WP4 TDG3 Workshop: connections between lexico-semantic representation and semantic roles in the lexicon

ISO meetings:
Berlin, 8-9.4.2005: ISO TC37/SC4 WG4 meetings
Warsaw, 21-26.8.2005: plenary meeting of ISO TC37/SC4; task force for designating generic data category sets for alignment with the level of the metamodel; task force related to the representation of MWEs
Rome, 27.10.2005: UNI-DIAM Commission: candidature of Italy as P-member in ISO TC37/SC4 (CNR-ILC reference expert)

Page 11: LIRICS WP2 – NLP Lexica


What is LMF for?

• provide a common model for the creation and use of lexical resources
• manage the exchange of data between and among these resources
• enable the merging of electronic resources to form extensive global resources

Range of topics:
• monolingual
• bilingual
• multilingual lexical resources

Scalability
• the same specifications are to be used for both small and large lexicons

Coverage
• linguistic description ranges from morphology, syntax and semantics to multilingual representation
• languages are not restricted to European languages
• the range of targeted NLP applications is not restricted

Page 12: LIRICS WP2 – NLP Lexica


Future activities/Priorities/Plans

Data Categories
deliver rev. 2 of D2.1: candidate data categories will receive the necessary adjustments after discussion
extend the ISO Registry to cover further layers of linguistic description: do we need an ISO Syntactic Profile (Beijing)?

LMF model
refine the NLP multilingual and MWE extensions
XML representation of LMF linguistic objects, in order to allow unified access to LMF-conformant lexicons through APIs

Provide implementation of test-suite lexical entries: PAROLE-SIMPLE lexicons ready to be described according to LMF (LEXUS), to be put on the LMF server and made accessible via the web.

Page 13: LIRICS WP2 – NLP Lexica


Structure of LMF

NLP Multilingual notations extension

NLP Inflectional paradigm extension

NLP Morphology extension

NLP MWE pattern extension

NLP Semantic extension

MRD extension

NLP Syntax extension

Core Package

Core package: structural skeleton, with the basic hierarchy of information in a lexical entry

Extensions: extend a subset of core-model classes; conform to the core model; cannot be used independently of the core model

LMF specifications comply with UML modeling principles

Page 14: LIRICS WP2 – NLP Lexica


Core package

[UML class diagram of the core package. Classes: Database, Lexicon, Lexicon Information, Lexical Entry, Form, Representation Frame, Sense, Entry Relation, Sense Relation]

Container for managing the top-level language components. The number of words or MWEs of the lexicon is equal to the number of lexical entries in a given lexicon.

Form consists of a text string that represents a single word or a multi-word expression

Sense specifies or disambiguates the meaning and context of a form

One to many Representation Frames can be associated with Form, each of which contains a form and data categories that specify the orthographic types and name of the word

It is a cross-reference pivot that can link to many Lexical Entries within or across Lexicons.
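To make the core-package descriptions above more concrete, here is a minimal Python sketch of the containment hierarchy. It is not LMF's normative model: the class layout is simplified, Lexicon Information is flattened into a `language` field, and the `gloss` attribute and the `script` data category are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RepresentationFrame:
    """One written representation of a form, adorned with data categories."""
    written_form: str
    data_categories: Dict[str, str] = field(default_factory=dict)


@dataclass
class Form:
    """Text string for a single word or a multi-word expression."""
    text: str
    frames: List[RepresentationFrame] = field(default_factory=list)


@dataclass
class Sense:
    """Specifies or disambiguates the meaning of a form."""
    gloss: str  # hypothetical attribute, not named on the slide


@dataclass
class LexicalEntry:
    forms: List[Form]
    senses: List[Sense] = field(default_factory=list)


@dataclass
class Lexicon:
    language: str  # simplification: the slide keeps this in Lexicon Information
    entries: List[LexicalEntry] = field(default_factory=list)

    def size(self) -> int:
        # The number of words or MWEs equals the number of lexical entries.
        return len(self.entries)


lexicon = Lexicon(language="eng")
lexicon.entries.append(
    LexicalEntry(
        # "script" is a hypothetical data category used only for illustration
        forms=[Form("oak", [RepresentationFrame("oak", {"script": "Latn"})])],
        senses=[Sense("a deciduous tree"), Sense("the wood of that tree")],
    )
)
lexicon.entries.append(LexicalEntry(forms=[Form("throw to the lions")]))
```

Here `lexicon.size()` counts one entry per word or multi-word expression, regardless of how many senses or representation frames each entry carries.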

Page 15: LIRICS WP2 – NLP Lexica


Package for extensional morphology

[UML class diagram of the extensional morphology package. Classes: LexicalEntry, LemmatisedForm, InflectedForm, Stem, ListOfComponents, InflectionalParadigm]

: InflectedForm
grammaticalNumber = singular
writtenForm = clergyman

: InflectedForm
grammaticalNumber = plural
writtenForm = clergymen

: LemmatisedForm
writtenForm = clergyman

: LexiconInformation
language = eng

: LexicalEntry

: Database

: Lexicon

1st strategy: describe the morphology by representing all inflections explicitly
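The first strategy can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the class and attribute names mirror the instance diagram above, while the `inflect` lookup method is a hypothetical helper that is not part of LMF.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InflectedForm:
    written_form: str
    grammatical_number: str


@dataclass
class LexicalEntry:
    lemmatised_form: str
    inflected_forms: List[InflectedForm] = field(default_factory=list)

    def inflect(self, grammatical_number: str) -> Optional[str]:
        """Look up the stored form; nothing is computed (hypothetical helper)."""
        for form in self.inflected_forms:
            if form.grammatical_number == grammatical_number:
                return form.written_form
        return None


# Every inflection is listed explicitly in the entry.
clergyman = LexicalEntry(
    lemmatised_form="clergyman",
    inflected_forms=[
        InflectedForm("clergyman", "singular"),
        InflectedForm("clergymen", "plural"),
    ],
)
```

The trade-off is visible in the data: each entry repeats its full list of forms, which is simple to query but grows with the size of the paradigm.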

Page 16: LIRICS WP2 – NLP Lexica


Package for inflectional paradigm

[UML class diagram of the inflectional paradigm package. Classes: InflectionalParadigm, MorphologicalFeaturesCombo, MorphologicalFeature, InflectedFormCalculator, Operation, OperationArgument, Composer, ListOfComponents, LemmatisedForm, Stem]

: MorphologicalFeaturesCombo

: MorphologicalFeaturesCombo

: Operation
graphicalOperator = removeAfter

: InflectedFormCalculator
stem = 0

: Operation
graphicalOperator = addAfter

: InflectedFormCalculator
stem = 0

: LemmatisedForm
writtenForm = clergyman

: MorphologicalFeature
att = number
val = singular

: MorphologicalFeature
att = gender
val = masculine

: MorphologicalFeature
att = number
val = plural

: InflectionalParadigm
id = asMan

: OperationArgument
val = 2

: OperationArgument
val = en

for "clergymen"

for "clergyman"

2nd strategy: declare an inflectional paradigm; use the inflectional paradigm extension for defining it
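A minimal sketch of how such a paradigm could be evaluated follows. The operator names `removeAfter` and `addAfter` and the arguments `2` and `en` come from the instance diagram; their exact semantics (strip N trailing characters, then append a suffix) are an assumption made for this example, and the dictionary encoding of the `asMan` paradigm is a simplification of the class structure.

```python
from typing import Dict, List, Tuple

# (graphicalOperator, OperationArgument value)
Operation = Tuple[str, str]

# Paradigm "asMan": the singular keeps the stem, the plural rewrites its ending.
AS_MAN: Dict[str, List[Operation]] = {
    "singular": [],
    "plural": [("removeAfter", "2"), ("addAfter", "en")],
}


def apply_operation(form: str, operation: Operation) -> str:
    """Apply one graphical operation (assumed semantics, see lead-in)."""
    operator, argument = operation
    if operator == "removeAfter":
        count = int(argument)
        return form[: len(form) - count]  # drop the last `count` characters
    if operator == "addAfter":
        return form + argument  # append the suffix
    raise ValueError(f"unknown graphical operator: {operator}")


def inflect(lemma: str, paradigm: Dict[str, List[Operation]], number: str) -> str:
    """Compute an inflected form by running the paradigm's operations in order."""
    form = lemma
    for operation in paradigm[number]:
        form = apply_operation(form, operation)
    return form
```

With this encoding, `inflect("clergyman", AS_MAN, "plural")` strips "an" and appends "en". Any other lemma that follows the same pattern can reuse the paradigm without listing its forms, which is the point of the second strategy.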

Page 17: LIRICS WP2 – NLP Lexica


Package for NLP syntax

[UML class diagram of the NLP syntax package. Classes: LexicalEntry, Sense, SyntacticBehavior, Self, Construction, ConstructionSet, SyntacticArgument, SemanticArgument. Annotations on the diagram mark some classes as described in the core package or in the semantic package]

: SyntacticArgument
function = subject
syntacticConstituent = NP

: SyntacticArgument
function = object
syntacticConstituent = NP

: Construction
id = amare-SyntFrame

: Self
id = amare-self
auxiliary = avere

Syntactic behavior represents one of the behaviors of one (or more) senses

Construction describes one syntactic construction and can be shared by all words with the same syntactic behavior

Self refers to the head lexical entry and describes syntactic properties

Syntactic Argument describes a syntactic actant

ConstructionSet groups together various syntactic Constructions and factors out shared syntactic descriptions, so that the lexicon needs a minimum of syntactic behavior elements.
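The following Python sketch shows how the instances for the Italian verb "amare" could be wired together. Attribute values are taken from the instance diagram; the way SyntacticBehavior points at one Self and one Construction is a simplifying assumption, and `describe` is a hypothetical helper added for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class SyntacticArgument:
    function: str
    syntactic_constituent: str


@dataclass(frozen=True)
class Construction:
    id: str
    arguments: Tuple[SyntacticArgument, ...]


@dataclass(frozen=True)
class Self:
    id: str
    auxiliary: str


@dataclass
class SyntacticBehavior:
    self_: Self                 # properties of the head lexical entry
    construction: Construction  # can be shared by several entries


@dataclass
class LexicalEntry:
    lemma: str
    behaviors: List[SyntacticBehavior] = field(default_factory=list)


frame = Construction(
    "amare-SyntFrame",
    (SyntacticArgument("subject", "NP"), SyntacticArgument("object", "NP")),
)
amare = LexicalEntry("amare", [SyntacticBehavior(Self("amare-self", "avere"), frame)])


def describe(entry: LexicalEntry) -> List[str]:
    """Render each behavior as one readable line (hypothetical helper)."""
    lines = []
    for behavior in entry.behaviors:
        args = ", ".join(
            f"{a.function}:{a.syntactic_constituent}"
            for a in behavior.construction.arguments
        )
        lines.append(f"{entry.lemma} [aux={behavior.self_.auxiliary}] {args}")
    return lines
```

Because `Construction` is a separate, immutable object, another verb with the same frame could reference `frame` directly instead of repeating its arguments.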

Page 18: LIRICS WP2 – NLP Lexica


XML representation

Page 19: LIRICS WP2 – NLP Lexica


Package for NLP semantics

[UML class diagram of the NLP semantics package. Classes: LexicalEntry, Sense, SenseExample, SenseRelation, SemanticDefinition, Proposition, Synset, SynsetRelation, PredicativeRepresentation, SemanticPredicate, PredicateRelation, SemanticArgument, SyntacticArgument, SyntacticBehavior, Construction. Annotations on the diagram mark some classes as described in the core package or in the syntactic package]

Predicative Representation describes the link between Sense and Semantic Predicate

Semantic Predicate describes an abstract meaning

Semantic Argument describes a semantic actant and is linked with its syntactic counterpart

Page 20: LIRICS WP2 – NLP Lexica


Package for NLP semantics (cont.)

[UML class diagram of the NLP semantics package, repeated from the previous slide]

Page 21: LIRICS WP2 – NLP Lexica


XML representation

Page 22: LIRICS WP2 – NLP Lexica


Package for NLP semantics (cont.)

[UML class diagram of the NLP semantics package, repeated; an instance example follows]

: Definition
text = a deciduous tree of the genus Quercus; has acorns ...
language = eng
view

: Definition
text = the hard durable wood of any oak
language = eng
view

: Definition
text = a tall perennial woody plant ...
language = eng
view

: Form
lemmatisedForm = oak tree

: SynSetRelation
type = hyponymy

: Form
lemmatisedForm = tree

: Form
lemmatisedForm = oak

: LexicalEntry
partOfSpeech = noun

: LexicalEntry
partOfSpeech = noun

: LexicalEntry
partOfSpeech = noun

: SynSet
id = 11520753

: SynSet
id = 11520081

: SynSet
id = 12352501

: Sense

: Sense

: Sense
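As a rough illustration of how synsets and a hyponymy relation fit together, consider the Python sketch below. The slide does not show which synset id belongs to which definition, so the ids `ss-oak` and `ss-tree` are hypothetical placeholders, as are the grouping of lemmas and the convention that the relation points from the more specific synset to the more general one.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SynSet:
    id: str
    lemmas: List[str]
    definition: str


@dataclass
class SynSetRelation:
    type: str
    source: str  # id of the more specific synset (assumed direction)
    target: str  # id of the more general synset


# Hypothetical ids; definitions abbreviated from the slide.
synsets: Dict[str, SynSet] = {
    "ss-oak": SynSet("ss-oak", ["oak", "oak tree"],
                     "a deciduous tree of the genus Quercus"),
    "ss-tree": SynSet("ss-tree", ["tree"], "a tall perennial woody plant"),
}

relations: List[SynSetRelation] = [
    SynSetRelation("hyponymy", "ss-oak", "ss-tree"),
]


def hypernyms(synset_id: str) -> List[str]:
    """Return ids of synsets that the given synset is a hyponym of."""
    return [
        r.target
        for r in relations
        if r.type == "hyponymy" and r.source == synset_id
    ]
```

Keeping relations as separate objects, rather than as fields on `SynSet`, follows the slide's separation of SynSet and SynSetRelation and lets further relation types be added without changing the synset class.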

Page 23: LIRICS WP2 – NLP Lexica


Package for Multilingual representation

[UML class diagram of the multilingual representation package. Classes: Sense, SynSet, SenseExample, Syntactic Behavior, Sense Axis, Sense Axis Relation, Transfer Axis, Transfer Axis Relation, Example Axis, Source Test, Target Test]

: Sense Axis Relation
comment = flows into the sea
label = more precise

: Sense
label = eng:river

: Sense
label = fra:rivière

: Sense
label = fra:fleuve

: Sense Axis

: Sense Axis

Sense Axis Relation describes the link between two different Sense Axis instances

Source Test and Target Test make it possible to express conditions on the translation on the source/target language side
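The river example can be sketched in Python as follows. The sense labels and the relation's `label` and `comment` values come from the instance diagram; which senses sit on which axis, the axis ids `A1` and `A2`, and the `translations` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SenseAxis:
    id: str
    senses: List[str]  # sense labels of the form "language:lemma"


@dataclass
class SenseAxisRelation:
    source: str  # id of the source axis
    target: str  # id of the target axis
    label: str
    comment: str = ""


# Assumed grouping: "river" and "rivière" share one axis, "fleuve" has its own.
axes: Dict[str, SenseAxis] = {
    "A1": SenseAxis("A1", ["eng:river", "fra:rivière"]),
    "A2": SenseAxis("A2", ["fra:fleuve"]),
}
relations: List[SenseAxisRelation] = [
    SenseAxisRelation("A1", "A2", "more precise", "flows into the sea"),
]


def translations(sense: str, target_language: str) -> List[Tuple[str, str]]:
    """List (equivalent, qualifier) pairs; the qualifier is empty for direct links."""
    prefix = target_language + ":"
    results: List[Tuple[str, str]] = []
    for axis in axes.values():
        if sense not in axis.senses:
            continue
        # Senses on the same axis are direct equivalents.
        for other in axis.senses:
            if other != sense and other.startswith(prefix):
                results.append((other, ""))
        # Senses on a related axis are equivalents qualified by the relation label.
        for relation in relations:
            if relation.source == axis.id:
                for other in axes[relation.target].senses:
                    if other.startswith(prefix):
                        results.append((other, relation.label))
    return results
```

Calling `translations("eng:river", "fra")` yields the direct equivalent first and the qualified one second, which mirrors how the axis relation records that "fleuve" is the more precise choice.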

Page 24: LIRICS WP2 – NLP Lexica


Package for Multiword expressions

[UML class diagram of the multiword expressions package. Classes: MWE Pattern, Combiner, Combiner Argument, List of Components, Lemmatised Form]

: MWE Pattern
id = VPSomebodyPP
comment = for a pattern, VP somebody IndirectObject

: Lemmatised Form
writtenForm = throw to the lions

: Combiner
constituent = NP
semanticRestriction = human

: Combiner
head = true
constituent = VP
rank = 0
graphicalSeparator = space

: Combiner Argument
rank = 1
graphicalSeparator = space

: Combiner Argument
rank = 2
graphicalSeparator = space

: Combiner Argument
rank = 3
graphicalSeparator = space

: Combiner Argument
function = directObject

: Combiner Argument
function = indirectObject

: List of Components

: Lemmatised Form
writtenForm = throw

: Lemmatised Form
writtenForm = to

: Lemmatised Form
writtenForm = the

: Lemmatised Form
writtenForm = lion

: Combiner
constituent = PP
number = plural
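One way to read the instance diagram is that the components are ordered by rank, joined with their graphical separator, and the noun inside the PP is realised in the plural. The Python sketch below implements that reading; the flat `Component` class, the mapping of ranks 1 to 3 onto "to", "the" and "lion", the `PLURALS` lookup and the `realise` function are all assumptions for illustration, not part of the LMF specification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Component:
    lemma: str
    rank: int
    separator: str = " "  # graphicalSeparator = space
    number: str = ""      # empty means: use the lemma unchanged


# Hypothetical lookup standing in for a real morphological component.
PLURALS: Dict[str, str] = {"lion": "lions"}


def realise(components: List[Component]) -> str:
    """Join components by rank, inflecting those marked as plural."""
    ordered = sorted(components, key=lambda c: c.rank)
    text = ""
    for index, component in enumerate(ordered):
        if component.number == "plural":
            word = PLURALS.get(component.lemma, component.lemma)
        else:
            word = component.lemma
        # The separator is placed before every component except the first.
        text += word if index == 0 else component.separator + word
    return text


throw_to_the_lions = [
    Component("throw", rank=0),              # head of the VP
    Component("to", rank=1),
    Component("the", rank=2),
    Component("lion", rank=3, number="plural"),
]
```

The direct-object slot restricted to humans (the NP combiner) is left out of this sketch, since it is filled by the surrounding sentence rather than by the lexical components of the expression.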