Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia

Page 1: Improving Translation Selection using Conceptual Vectors

Improving Translation Selection using Conceptual Vectors

LIM Lian Tze
Computer Aided Translation Unit
School of Computer Sciences
Universiti Sains Malaysia

Page 2: Improving Translation Selection using Conceptual Vectors

Presentation Overview

  • Problem Background & Motivation
  • Research Objectives
  • Methodology
  • Advantages & Contributions

Page 4: Improving Translation Selection using Conceptual Vectors

Natural Language is Ambiguous

bank: which meaning is intended?

Page 5: Improving Translation Selection using Conceptual Vectors

Word Sense Disambiguation

Given:
  • a list of meanings/senses of words (dictionaries)
  • input text containing occurrences of ambiguous words

Assign the correct sense to each instance of an ambiguous word in context

A.k.a. “sense-tagging”

Dictionary senses:
  • bank#1: a financial institution that accepts deposits and channels the money into lending activities
  • bank#2: sloping land (especially the slope beside a body of water)

Example: …withdraw money from the bank... → bank#1
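The sense-tagging task on this slide can be illustrated with a minimal sketch. It uses plain gloss overlap (a simplified Lesk-style heuristic), which is an assumption made for illustration and not necessarily the WSD method used in this work. The two glosses are the ones quoted on the slide; the stopword list and the function names are hypothetical.

```python
import re

# Glosses copied from the slide; the overlap heuristic itself is an assumption.
SENSES = {
    "bank#1": "a financial institution that accepts deposits and channels "
              "the money into lending activities",
    "bank#2": "sloping land (especially the slope beside a body of water)",
}

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "and", "the", "that", "of", "into", "from", "to", "in"}


def content_words(text):
    """Lower-case the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def sense_tag(context, senses):
    """Return the sense whose gloss shares the most content words with the context."""
    ctx = content_words(context)
    return max(senses, key=lambda s: len(ctx & content_words(senses[s])))


print(sense_tag("withdraw money from the bank", SENSES))
```

Here the word "money" appears both in the input and in the gloss of bank#1, so that sense wins. Real inputs rarely overlap with glosses this neatly, which is one motivation for richer representations of meaning.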

Page 6: Improving Translation Selection using Conceptual Vectors

Disambiguation in Machine Translation (1)

Dictionary senses and their Malay translations:
  • bank#1: a financial institution that accepts deposits and channels the money into lending activities → bank
  • bank#2: sloping land (especially the slope beside a body of water) → tebing

English input: …withdraw money from the bank...
Sense-tag (WSD): …withdraw money from the bank#1...
Select translation word
Malay output: …mengeluarkan wang dari bank...

That worked well…

Page 7: Improving Translation Selection using Conceptual Vectors

Disambiguation in Machine Translation (2)

Dictionary sense and its Malay translations:
  • circulation#6: the spread or transmission of something (as news or money) to a wider group or area → edaran (for money), penyebaran (for news; Malay: berita)

English input: …50 ringgit notes in circulation...
Sense-tag (WSD): …50 ringgit notes in circulation#6...
Translate
Malay output: …duit kertas 50 ringgit dalam edaran? penyebaran?...

That DIDN’T work well…
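The contrast between the two examples can be sketched as a lookup from sense tags to candidate translations. The table below contains only the senses and Malay words shown on these two slides; the function names are hypothetical. It shows why a correct sense tag is not always enough: one sense can still map to several translation words.

```python
# Sense-to-translation table taken from the two examples on these slides.
TRANSLATIONS = {
    "bank#1": ["bank"],
    "bank#2": ["tebing"],
    "circulation#6": ["edaran", "penyebaran"],
}


def candidate_translations(sense):
    """Return every Malay word listed for a sense-tagged English word."""
    return TRANSLATIONS.get(sense, [])


def is_resolved(sense):
    """A sense tag settles the translation only if exactly one candidate remains."""
    return len(candidate_translations(sense)) == 1


for sense in ("bank#1", "circulation#6"):
    print(sense, candidate_translations(sense), is_resolved(sense))
```

For bank#1 a single candidate remains, so sense-tagging followed by lookup is sufficient. For circulation#6 two candidates remain, so a further selection step is needed.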

Page 8: Improving Translation Selection using Conceptual Vectors

Optimising WSD for MT

Input word → (select) → Sense number → (select) → Translation word
Input word → (select) → Translation word

(Lee and Kim 2002)

Page 9: Improving Translation Selection using Conceptual Vectors

Presentation Overview

  • Problem Background & Motivation
  • Research Objectives
  • Methodology
  • Advantages & Contributions

Page 10: Improving Translation Selection using Conceptual Vectors

Main Objective

Existing MT system:
  • Selects fragments (translation units) from previously translated examples
  • Re-combines selected translation units to produce translation output for new input text

Improve the translation quality of this MT system by adapting a WSD algorithm specifically for MT purposes.

Page 11: Improving Translation Selection using Conceptual Vectors

Need semantic knowledge about…

  • Word senses
      • Use dictionary definitions
  • Pairs of translation words
      • From a bilingual knowledge bank (BKB) made up of pairs of sentences that are translations of each other
      • Corresponding words in each translation sentence pair are explicitly marked

Need a model to capture semantic knowledge of lexical items
  • Conceptual Vectors (CVs) (Lafourcade 2001)
  • Using a selection of concepts or themes
  • Construct mathematical vectors from concepts
  • Thematic similarity between lexical items ≡ angle between CVs
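The idea that thematic similarity corresponds to the angle between CVs can be sketched as follows. The angle is computed as the arccosine of the normalised dot product; the three concept axes and all vector values are hypothetical and chosen only for illustration.

```python
import math


def angular_distance(u, v):
    """Angle in radians between two conceptual vectors (0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


# Hypothetical concept axes: (FINANCE, GEOGRAPHY, COMMUNICATION)
money = [1.0, 0.0, 0.0]
bank_institution = [0.9, 0.1, 0.0]
bank_slope = [0.0, 1.0, 0.0]

print(angular_distance(money, bank_institution))
print(angular_distance(money, bank_slope))
```

A smaller angle means the two items activate similar concepts. With these toy values, "money" lies much closer to the financial sense of "bank" than to the sloping-land sense.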

Page 12: Improving Translation Selection using Conceptual Vectors

Need to:

  • Compile CVs for word meanings on 2 levels:
      • Word sense (from dictionary)
      • Word/phrase translation unit (from BKB), using data compiled from the previous step
  • Use compiled information during translation run time to select correct translation units
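One possible way to compile CVs on the two levels is sketched below. The slides do not give the exact construction, so this is an assumption: a sense CV is a normalised indicator vector over its concept labels, and a translation unit CV is the normalised sum of the sense CVs that accompany it in BKB examples. The concept set, sense labels, and BKB contexts are all hypothetical.

```python
import math

CONCEPTS = ["FINANCE", "COMMUNICATION", "PUBLISHING"]  # hypothetical concept set


def normalise(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v] if length else list(v)


def sense_cv(labels):
    """Level 1: build a CV for a word sense from its concept category labels."""
    return normalise([1.0 if c in labels else 0.0 for c in CONCEPTS])


def translation_unit_cv(context_senses, sense_cvs):
    """Level 2: sum the CVs of the senses seen with a translation unit in the BKB."""
    total = [0.0] * len(CONCEPTS)
    for sense in context_senses:
        total = [t + x for t, x in zip(total, sense_cvs[sense])]
    return normalise(total)


# Hypothetical sense labels and BKB contexts, for illustration only.
SENSE_CVS = {
    "note#1": sense_cv({"FINANCE"}),
    "ringgit#1": sense_cv({"FINANCE"}),
    "rumour#1": sense_cv({"COMMUNICATION"}),
}
PROFILE = {
    "edaran": translation_unit_cv(["note#1", "ringgit#1"], SENSE_CVS),
    "penyebaran": translation_unit_cv(["rumour#1"], SENSE_CVS),
}
print(PROFILE)
```

The second level reuses the output of the first, which mirrors the dependency stated on the slide: translation unit knowledge is compiled from word sense knowledge.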

Page 13: Improving Translation Selection using Conceptual Vectors

Presentation Overview

  • Problem Background & Motivation
  • Research Objectives
  • Methodology
  • Advantages and Contributions

Page 14: Improving Translation Selection using Conceptual Vectors

Brief Outline

[Figure: system overview in two phases]

Data Preparation Phase:
  • Dictionary / Lexicon → word senses (word → sense number level knowledge), tagged with concept category labels
  • BKB examples → translation units, tagged to build the Translation Unit Profile (word → translation level knowledge)

EBMT Run-time Phase:
  • Input text → “clues” → matching, comparison, selection → selected translation units → translated text

Page 15: Improving Translation Selection using Conceptual Vectors

During Translation

[Figure: the two-phase system overview again (Data Preparation Phase and EBMT Run-time Phase)]

EBMT Run-time Phase:
  • Input text → “clues” → matching, comparison, selection against the Translation Unit Profile → selected translation units → translated text

Page 16: Improving Translation Selection using Conceptual Vectors

Some Results

Translating ‘circulation’ to Malay: edaran or penyebaran
  • TS: proposed translation selection using CVs
  • BS: baseline strategy, chooses the translation that co-occurs with the same input words (and same structure) as in the BKB, or the most frequently occurring translation

Input                                              Translation chosen by TS   Translation chosen by BS
We will stop the circulation of that magazine.     edaran                     penyebaran
We will stop the circulation of that rumour.       penyebaran                 penyebaran
We will stop the circulation of that newspaper.    edaran                     penyebaran
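The difference between the two strategies can be sketched as follows. This is only an illustration under stated assumptions: TS is approximated as picking the translation whose profile CV has the smallest angle to a context CV, and BS is reduced to its most-frequent-translation fallback. The concept axes, vector values, and frequency counts are all made up and are not the data behind the table.

```python
import math
from collections import Counter


def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))


def select_by_cv(context_cv, profile):
    """TS-style choice: the translation whose profile CV is closest in angle."""
    return min(profile, key=lambda t: angle(context_cv, profile[t]))


def select_most_frequent(bkb_translations):
    """BS-style fallback: the translation seen most often in the BKB."""
    return Counter(bkb_translations).most_common(1)[0][0]


# Hypothetical axes (PUBLISHING, COMMUNICATION) and hypothetical values.
PROFILE = {"edaran": [0.9, 0.3], "penyebaran": [0.2, 0.9]}
CONTEXT = {"magazine": [1.0, 0.2], "rumour": [0.1, 1.0]}
BKB_TRANSLATIONS = ["penyebaran", "penyebaran", "edaran"]  # made-up counts

for word, cv in CONTEXT.items():
    print(word, select_by_cv(cv, PROFILE), select_most_frequent(BKB_TRANSLATIONS))
```

A frequency-based fallback returns the same word regardless of context, whereas the vector-based choice changes with the concepts active in the input sentence.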

Page 17: Improving Translation Selection using Conceptual Vectors

Presentation Overview

  • Problem Background & Motivation
  • Research Objectives
  • Methodology
  • Advantages & Contributions

Page 18: Improving Translation Selection using Conceptual Vectors

Advantages and Weaknesses

Pros:
  • optimized for EBMT
  • focuses on translation selection, bypassing intermediate WSD at run time
  • handles many-to-many mapping between source word senses and translation words
  • allows for bi-directional translation with sense-tagging for only 1 language
  • mathematical operations on vectors are easy to implement
  • avoids the combinatorial effect when the input contains multiple ambiguous words

Cons:
  • not all ambiguities can be solved using co-occurring concepts
  • does not handle translation selection of function words
  • manual work required in data preparation

Page 19: Improving Translation Selection using Conceptual Vectors

Research Contributions

  • Adaptation of a WSD approach for the specific aim of translation selection
  • Proposal of specific guidelines for assigning related concepts for word meanings from dictionaries
  • Production of knowledge about word meanings on two levels:
      • Word senses as in dictionaries
      • Translations as in parallel text

Page 20: Improving Translation Selection using Conceptual Vectors

Summary

  • WSD can be customized for different NLP applications
      • Different requirements
      • Increase efficiency
  • WSD and related tasks based on concepts common to co-occurring word senses can be facilitated using the conceptual vector model
      • Requires a concept category hierarchy and a word sense list
      • Concepts related to a word sense are modelled as a mathematical vector
      • Conceptual similarity is measured by the angular distance between vectors
  • Future work
      • Automating data preparation tasks
      • Investigating suitable weights or normalizing factors during CV manipulation
      • Integration with other WSD or translation selection strategies

Page 21: Improving Translation Selection using Conceptual Vectors

Future Work

Automate tagging tasks that are currently done manually

Investigate different weight values for CVs for different syntactic relations or word classes

Integrate with other WSD/translation selection tasks

Page 22: Improving Translation Selection using Conceptual Vectors

Thank You