Designing an Elicitation Corpus with Semantic Representations

Simon Fung
Advisor: Lori Levin
November 2006

Corpus example

Was there an apple?
Wasn't there an apple?
Will there be an apple?
Won't there be an apple?
There was an apple.
There was not an apple.
There will be an apple.
There will not be an apple.
...

那裡曾經有一個蘋果嗎?
那裡不是曾經有一個蘋果嗎?
那裡會有一個蘋果嗎?
那裡不是會有一個蘋果嗎?
那裡曾經有一個蘋果。
那裡曾經沒有一個蘋果。
那裡會有一個蘋果。
那裡不會有一個蘋果。

Uses for parallel corpus

• statistical MT training data
• learning about grammar of new language

Motivation

how do languages form various constructions (e.g. relative clauses)?

1. The student whom I saw
2. 我見過的學生。

Motivation

what semantic distinctions are important in different languages?

English                  Mandarin                 French
He is talking.           Tā zài jiăng huà.        Il parle.
They are talking.        Tā mén zài jiăng huà.    Ils parlent.
He talks. {habitually}   Tā jiăng huà.            Il parle.

The MILE (MInor Language Elicitation) Corpus

• sentences covering various semantic categories/constructions (e.g. number, gender, relative clauses)
• to be translated into language under study
• semantic representation for each sentence

The MILE (MInor Language Elicitation) Corpus

• 10,000-20,000 words
• translations done by one person
• 7 languages per year for next 5 years
• e.g., Thai, Bengali, Punjabi
• may have a lot of speakers, but fewer electronic resources

Constraints

maximize range of semantic categories and constructions

minimize corpus size

Constraints

• different languages complex in different areas
• only one corpus, for this project
• ultimate goal: dynamically navigate through features
• e.g. no sing./pl. distinction → no dual
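The navigation idea above can be sketched as a small set of implication rules: once a distinction is found to be absent, any feature that depends on it is dropped from the elicitation plan. This is only an illustration, not the project's implementation. The rule table, the feature labels, and the function names are assumptions; only the "no sing./pl. → no dual" rule comes from the slide.

```python
# Hypothetical implication rules: if the language lacks the distinction
# named by the key, the dependent features listed as values can be skipped.
IMPLICATIONS = {
    "singular/plural": ["dual"],  # the one rule taken from the slide
}


def features_to_skip(absent):
    """Return every feature implied to be absent, following rules transitively."""
    skipped = set()
    stack = list(absent)
    while stack:
        feature = stack.pop()
        for dependent in IMPLICATIONS.get(feature, []):
            if dependent not in skipped:
                skipped.add(dependent)
                stack.append(dependent)
    return skipped


def plan_elicitation(candidates, absent):
    """Keep only candidate features still worth eliciting, in original order."""
    skip = features_to_skip(absent) | set(absent)
    return [feature for feature in candidates if feature not in skip]


remaining = plan_elicitation(
    ["singular/plural", "dual", "gender"],
    absent=["singular/plural"],
)
print(remaining)
```

With a single fixed corpus this pruning cannot happen during elicitation, which is why the slide lists it as an ultimate goal rather than a current feature.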

Method

1. create semantic representations first (instead of starting with English)

2. write English sentences based on them

3. translate sentences into various languages

Example: feature structure

srcsent: Who will break windows?
context: "Who" refers to two men; spoken to a co-worker;

((actor ((np-function fn-actor)
         (np-general-type interrogative-type)
         (np-person person-third)
         (np-number num-dual)
         (np-biological-gender bio-gender-male)
         (np-animacy anim-human)
         (np-pronoun-antecedent antecedent-n/a)
         (np-specificity specificity-neutral)
         (np-identifiability identifiability-neutral)
         (np-distance distance-neutral)
         (np-pronoun-exclusivity inclusivity-n/a)))
 (undergoer ((np-person person-third)
             (np-identifiability unidentifiable)
             (np-number num-pl)
             (np-specificity non-specific)
             (np-animacy anim-inanimate)
             (np-biological-gender bio-gender-n/a)
             (np-function fn-undergoer)
             (np-general-type common-noun-type)
             (np-pronoun-exclusivity inclusivity-n/a)
             (np-pronoun-antecedent antecedent-n/a)
             (np-distance distance-neutral)))
 (c-polarity polarity-positive)
 (c-v-absolute-tense future)
 (c-general-type open-question)
 (c-question-gap gap-actor)
 (c-my-causer-intentionality intentionality-n/a)
 (c-comparison-type comparison-n/a)
 (c-relative-tense relative-n/a)
 (c-our-boundary boundary-n/a)
 (c-comparator-function comparator-n/a)
 (c-causee-control control-n/a)
 (c-our-situations situations-n/a)
 (c-comparand-type comparand-n/a)
 (c-causation-directness directness-n/a)
 (c-source source-neutral)
 (c-causee-volitionality volition-n/a)
 (c-assertiveness assertiveness-neutral)
 (c-solidarity solidarity-neutral)
 (c-v-grammatical-aspect gram-aspect-neutral)
 (c-adjunct-clause-type adjunct-clause-type-n/a)
 (c-v-phase-aspect phase-aspect-neutral)
 (c-v-lexical-aspect activity-accomplishment)
 (c-secondary-type secondary-neutral)
 (c-event-modality event-modality-none)
 (c-function fn-main-clause)
 (c-minor-type minor-n/a)
 (c-copula-type copula-n/a)
 (c-power-relationship power-peer)
 (c-our-shared-subject shared-subject-n/a))

Example: feature structure

srcsent: Who will break windows?
context: "Who" refers to two men; spoken to a co-worker;

((ACTOR ((NP-FUNCTION FN-ACTOR)
         (NP-GENERAL-TYPE INTERROGATIVE-TYPE)
         (NP-PERSON PERSON-THIRD)
         (NP-NUMBER NUM-DUAL)
         (NP-BIOLOGICAL-GENDER BIO-GENDER-MALE)))
 (UNDERGOER ((NP-PERSON PERSON-THIRD)
             (NP-IDENTIFIABILITY UNIDENTIFIABLE)
             (NP-NUMBER NUM-PL)
             (NP-SPECIFICITY NON-SPECIFIC)))
 (C-POLARITY POLARITY-POSITIVE)
 (C-V-ABSOLUTE-TENSE FUTURE))

Each parenthesized pair in the feature structure gives a feature name followed by its value, e.g. feature name NP-NUMBER with value NUM-DUAL.
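To make the name/value structure concrete, here is a minimal sketch that reads the abbreviated feature structure into nested Python dictionaries. The parser and its function names (`tokenize`, `parse`, `to_dict`) are assumptions made for illustration; the slides do not say what tooling the project used to process these structures.

```python
import re


def tokenize(text):
    # Parentheses are their own tokens; everything else splits on whitespace.
    return re.findall(r"[()]|[^\s()]+", text)


def parse(tokens):
    """Parse one parenthesized expression into nested Python lists."""
    token = tokens.pop(0)
    if token != "(":
        return token
    items = []
    while tokens[0] != ")":
        items.append(parse(tokens))
    tokens.pop(0)  # discard the closing ")"
    return items


def to_dict(pairs):
    """Turn a list of [name, value] pairs into a dict, recursing into nested lists."""
    result = {}
    for name, value in pairs:
        result[name] = to_dict(value) if isinstance(value, list) else value
    return result


# The abbreviated feature structure for "Who will break windows?"
FS = """
((ACTOR ((NP-FUNCTION FN-ACTOR)(NP-GENERAL-TYPE INTERROGATIVE-TYPE)
         (NP-PERSON PERSON-THIRD)(NP-NUMBER NUM-DUAL)
         (NP-BIOLOGICAL-GENDER BIO-GENDER-MALE)))
 (UNDERGOER ((NP-PERSON PERSON-THIRD)(NP-IDENTIFIABILITY UNIDENTIFIABLE)
             (NP-NUMBER NUM-PL)(NP-SPECIFICITY NON-SPECIFIC)))
 (C-POLARITY POLARITY-POSITIVE)(C-V-ABSOLUTE-TENSE FUTURE))
"""

features = to_dict(parse(tokenize(FS)))
print(features["ACTOR"]["NP-NUMBER"])
print(features["C-V-ABSOLUTE-TENSE"])
```

Noun-phrase features end up nested under their role (ACTOR, UNDERGOER), while clause-level features (the C- prefix) sit at the top level, mirroring the layout of the original structure.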

Using semantic representation

Advantages:
• more precise
• more complete
• encode actual linguistic features to elicit

1. Naturalness

naturalness of sentences vs. holding lexical items constant
• minimal pairs ideal (A tree fell/The tree fell)
• but also want natural sentences
• natural → easier to translate → fewer mistakes

She hurt herself. *It hurt itself.

sentences are hand-written vs. using natural language generators (GenKit)

2. Restrictions

• need to find restrictions on combinations of features

• some combinations invalid/unnatural

• e.g. inclusive and third-person
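One way to picture such restrictions is as sets of feature/value pairs that must not occur together in a single representation. The sketch below is an assumption about how that could be encoded, not the project's method. The feature name `np-pronoun-exclusivity` and the value `person-third` appear in the feature structure example; the values `inclusive` and `person-first` are hypothetical labels chosen for illustration.

```python
# Each restriction is a set of (feature, value) pairs that must not co-occur.
# The value "inclusive" is a hypothetical label, not taken from the slides.
RESTRICTIONS = [
    {("np-pronoun-exclusivity", "inclusive"), ("np-person", "person-third")},
]


def is_valid(feature_values):
    """Return False if the assignment contains any restricted combination."""
    assigned = set(feature_values.items())
    return not any(restriction <= assigned for restriction in RESTRICTIONS)


third_inclusive = {
    "np-pronoun-exclusivity": "inclusive",
    "np-person": "person-third",
}
first_inclusive = {
    "np-pronoun-exclusivity": "inclusive",
    "np-person": "person-first",  # hypothetical value name
}

print(is_valid(third_inclusive))
print(is_valid(first_inclusive))
```

Filtering candidate representations through a check like this before writing English sentences keeps invalid or unnatural combinations out of the corpus.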

3. Definition of values

• use language-independent semantic categories
• precise, e.g. specificity better than definiteness
• agreement on definitions
  • intercoder agreement (informal experiment)
  • writers agreed on English forms to use

Avoiding language-specificity

many-to-many translations of determiners

English                        French
I have a cat.                  J’ai un chat.
The cat is fat.                Le chat est gros.
I like chocolate.              J’aime le chocolat.
I eat chocolates.              Je mange des chocolats.
Communism failed.              Le communisme a échoué.
He has (some) money.           Il a de l’argent.
I am a teacher.                Je suis professeur.
England                        L’Angleterre
I don’t have a/any cat(s).     Je n’ai pas de chat.

Avoiding language-specificity

Have to break it down by function:
• indefinite quantity (some water)
• generic (the moose is a noble animal)
• predicate nominal (I am a doctor)
• definite noun phrase (the dog is sick)
• etc.

Definiteness

• example of a problem in design of features and values
• how to define definiteness, while avoiding using English definiteness categories?

Criteria for definiteness

Lyons (1999):
• uniqueness
• familiarity
• identifiability
• specificity
• inclusiveness

Criteria for definiteness

chose the most important criteria:
• identifiability
• specificity

Definiteness

You and I are in a room. I say

“The chair is on fire!”

Definiteness

Why did I say “the chair”?
• identifiability: I know that you know what chair I’m talking about
• specificity: I’m referring to a particular chair

Grammatical feature: specificity

John wants to marry a Norwegian.

Feature: np-specificity

Values:
• specific: John wants to marry a (specific) Norwegian.
• non-specific: John wants to marry some Norwegian.
• specificity-neutral: She is a Norwegian.

Grammatical feature: specificity

Turkish direct objects:

Ali bir kitap okudu.
Ali one book read
Ali read a book.

Ali bir kitab-ı okudu.
Ali one book-acc read
Ali read a (specific) book.

Layout of Corpus

1. Clause types, negation, and formality
2. Discourse setting/Speaker-hearer features
3. Basic NP features
4. Verbal Tense and Aspect
5. Evidentiality and Modality
6. Causatives
7. Comparatives
8. Modifiers
9. Conjunctions
10. Clause-combining

Layout of Corpus

combine feature values systematically
• why combine? some features interact, e.g. Will the woman be happy? (interrogative, future tense)
• what to combine? some features known to interact, e.g. person, number (I am, we are, he is)
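Systematic combination of interacting features amounts to taking the cross product of their values. The sketch below does this for person and number. The value labels and the function name are assumptions for illustration, and the slides do not describe the actual generation procedure.

```python
from itertools import product

# Hypothetical value inventories for two features known to interact.
FEATURES = {
    "person": ["first", "second", "third"],
    "number": ["singular", "plural"],
}


def all_combinations(features):
    """Yield one dict per combination of feature values (the cross product)."""
    names = list(features)
    for values in product(*(features[name] for name in names)):
        yield dict(zip(names, values))


combos = list(all_combinations(FEATURES))
print(len(combos))
for combo in combos:
    print(combo)
```

Because the number of combinations grows multiplicatively with each added feature, combining only features known to interact is what keeps the corpus small enough to translate.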

Status

• delivered 21,133 words (sampled version)
• translated into Thai, Bengali
• Spanish → Guarani
