25
March 5, 2008 Companions Semantic Representation and Dia log Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax Jan Hajič Institute of Formal and Applied Linguistics School of Computer Science Faculty of Mathematics and Physics Charles University, Prague Czech Republic

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

Embed Size (px)

Citation preview

Page 1: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

March 5, 2008 Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax

1

The PDTMorphology

and Surface Syntax

Jan Hajič

Institute of Formal and Applied Linguistics

School of Computer Science

Faculty of Mathematics and Physics

Charles University, Prague

Czech Republic

Page 2: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

March 5, 2008 Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax

2

Morphology (m-layer) Prerequisites for the manual annotation process:

Tokenized data Annotation guidelines Annotation tool

Manual decision making support Offline (or online) morphological analyzer

Quality checking tool Process description

Results (manually annotated data) to be used for... tagger training, linguistic research, basis for further

annotation, ...

Page 3: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

March 5, 2008 Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax

3

Morphological Attributes

Tag: 13 categoriesExample: AAFP3----3N----Adjective no poss. Gender negatedRegular no poss. Number no voiceFeminine no person reserve1Plural no tense reserve2Dative superlative base

var.Lemma: POS-unique identifier

Books/verb -> book-1, went -> go, to/prep. -> to-1

Ex.: nejnezajímavějším“(to) the most uninteresting”

Page 4: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

March 5, 2008 Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax

4

Morphological Tagset 13 categories, 4452 plausible tags (combinations):

Category # of values Example(s)POS 10 N (noun), Z (punctuation)SUBPOS 75 P (personal pron.), U (possessive adj.)GENDER 8 I (masc. inanimate), X (any), - (N.A)NUMBER 4 P (plural), D (dual)CASE 9 1 (nominative), 6 (locative)POSSGENDER 4 M (masc. animate), F (feminine)POSSNUMBER 3 S (singular), P (plural)PERSON 5 1 (first), ...TENSE 4 P (present), M (past)GRADE 5 3 (superlative)NEGATION 3 A (affirmative), N (negative)VOICE 3 A (active), P (passive)VAR 11 1 (1st variant), 6 (colloq. style), 8 (abbrev.)

Page 5: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

March 5, 2008 Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax

5

Morphological Analysis

Formally: MA: A+ → Pow(L x T) MA(f) = { [ l,t ] };

f A+ (the token), l L (lemma), t T (tag)

tokens taken in isolation no attempt to solve e.g. auxiliaries vs. full verbs Ex.: MA(“má“) = { [mít,VB-S---3P-AA---], lit. “to have”

lit. “has”,”my” [můj,PSFS1-S1------1], lit. “my” [můj,PSFS5-S1------1], [můj,PSNP1-S1------1], [můj,PSNP4-S1------1], [můj,PSNP5-S1------1] }

Page 6: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

March 5, 2008 Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax

6

Morphological Analysis:Implementation Dictionary-based

covers 800kW (lemmas), ~ 20 mil. forms (w/tag)

C code implementation standard (regular) derivations on-the-fly; ex.:

spojit spojený spojenýspojenost

spojitelný spojitelný

spojitelnost irregular forms listed in dictionary (w/tags) no phonological processing (concatenation only) grammatical prefixes only: negation, superlative

joinedlyjoin joined joinedliness

joinablyjoinable joinability

Page 7: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

March 5, 2008 Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax

7

The Morphological Annotation Tool (LAW)

Page 8: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

March 5, 2008 Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax

8

The Process ofMorphological Annotation

From tokenized to annotated text:

(Auto) morphological

analysismorphological

dictionary

tokenized

text (auto,

w-layer)

text w/morph.

interpretations

Manual morphological

disambiguation (DA)

text w/select.

interpretation Manual adjudication annotated

text (m-layer)

annotation

guidelines

Page 9: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

9

PDT – Syntactic Annotation Surface syntax annotation

Dependency surface syntax Comparable to Penn Treebank annotation

Convertible: dependency ↔ parse trees

Deep syntactic/semantic annotation Dependency trees Different topology High level of generalization and formalization Many node attributes

Page 10: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

10

Analytical Syntax (a-layer) Dependency + Analytical Function

dependent

governor

The influence of the Mexicancrisis on Central and EasternEurope has apparently been underestimated.

Page 11: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

11

Analytical Syntax: Functions

Main (for [main] semantic lexemes):Pred, Sb, Obj, Adv, Atr, Atv(V), AuxV, Pnom“Double” dependency: AtrAdv, AtrObj, AtrAtr

Special (function words, punctuation,...):Reflefives, particles: AuxT, AuxR, AuxO, AuxZ, AuxYPrepositions/Conjunctions: AuxP, AuxCPunctuation, Graphics: AuxX, AuxS, AuxG, AuxK

StructuralElipsis: ExD, Coordination etc.: Coord, Apos

Page 12: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

12

Example

All came from Cray Research.

Page 13: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

13

Surface Syntax Example Complete sentence: Sb, Pred, Obj

Resistance needs courage.

Page 14: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

14

Surface Syntax Example Analytical verb form:

he would be allowed to be enrolled

Page 15: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

15

Surface Syntax Example Predicate with copula (state)

you were fired

Page 16: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

16

Surface Syntax Example Passive construction (action)

(The) book has been translated [by Mr. X]

Page 17: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

17

Surface Syntax Example Complement

she left crying

Page 18: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

18

Surface Syntax Example Object

he gave Mary a book

Page 19: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

19

Surface Syntax Example Object used for infinitive of analytical verb

forms he wants to learn

Page 20: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

20

Surface Syntax Example Relative clause (embedded)

the woman, who had a French accent, was very pretty

Page 21: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

21

Surface Syntax Example Coordination

... (to) magic, mysticism(,) etc.

Page 22: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

22

Surface Syntax Example Apposition

cheap, i.e. under five dollars

Page 23: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

23

Incomplete phrases Peter works well, but Paul badly

Surface Syntax Example

Page 24: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

24

Surface Syntax Example Variants (equality)

he bought shoes for his son

Page 25: March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax

25

XML Annotation Layers (English)

Strictly top-down links w+m+a can be easily

“knitted” API for cross-layer

access (programming)

PML Schema / Relax NG

[With slight modification, can be used for spoken data (audio as layer “-1”)]