Hebrew-to-English XFER MT Project - Update Alon Lavie March 17, 2004


Page 1: Hebrew-to-English XFER MT Project - Update

Hebrew-to-English XFER MT Project - Update

Alon Lavie

March 17, 2004

Page 2: Hebrew-to-English XFER MT Project - Update

March 17, 2004 Hebrew-to-English MT Update 2

The Team

• Alon Lavie
• Shuly Wintner (Faculty at Haifa Univ.)
• Yaniv Eytani (MS student at Haifa Univ.)
• Erik Peterson and Kathrin Probst…

Page 3: Hebrew-to-English XFER MT Project - Update


Main Tasks in Month-1

• Hebrew Encoding Issues
• Hebrew Language Resources:
  – H-to-E Translation Lexicon
  – Morphological Analyzer
• Putting together a front-end to the XFER engine: morphology, format conversions
• Elicitation for Hebrew (two versions of EC)
• Strong Decoder for H-to-E
• Installing system on local server in Haifa

Page 4: Hebrew-to-English XFER MT Project - Update


Hebrew Encoding Issues

• Input texts are (mostly) in standard Windows encoding for Hebrew
• Morphology analyzer and other resources already set to work in an “ASCII-like” representation
• Converter script converts the input into the ASCII representation
• All further processing is done in the ASCII representation
• Lexicon and grammar rules are also in ASCII representation
• Elicitation is done in UTF-8 Hebrew, output is converted to ASCII representation
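A minimal sketch of what such a converter could look like. It assumes the Windows encoding is Windows-1255 (Python codec `cp1255`) and uses a one-symbol-per-letter table that is an illustrative assumption, not the project's actual scheme. The table is at least consistent with the “KTIB XSR” spelling quoted on the lexicon slide. The names `ASCII_SYMBOLS`, `to_ascii_representation` and `convert_bytes` are hypothetical.

```python
# Hypothetical transliteration table: the 27 Hebrew letters U+05D0..U+05EA
# (final forms included) in code-point order, each mapped to one ASCII symbol.
# This table is an assumption for illustration, not the project's real scheme.
ASCII_SYMBOLS = "ABGDHWZX@IKKLMMNNS&PPCCQR$T"
HEBREW_TO_ASCII = {chr(0x05D0 + i): sym for i, sym in enumerate(ASCII_SYMBOLS)}


def to_ascii_representation(text: str) -> str:
    """Map each Hebrew letter to its ASCII symbol; leave other characters as-is."""
    return "".join(HEBREW_TO_ASCII.get(ch, ch) for ch in text)


def convert_bytes(raw: bytes, encoding: str = "cp1255") -> str:
    """Decode raw input (assumed Windows-1255 by default) and transliterate it."""
    return to_ascii_representation(raw.decode(encoding))
```

Passing `encoding="utf-8"` would cover the elicitation output, so both input paths end in the same representation.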

Page 5: Hebrew-to-English XFER MT Project - Update


Translation Lexicon

• “Dahan” H-to-E and E-to-H dictionary available to us
• Excel spreadsheet format
• Coverage is not great but not bad
  – H-to-E is about 15K translation pairs
  – E-to-H is about 7K translation pairs
• POS information on both sides
• No proper names or named entities
• Issue with spelling convention “KTIB XSR”

Page 6: Hebrew-to-English XFER MT Project - Update


Translation Lexicon

• Yaniv wrote scripts that
  – Extract the relevant fields from the Excel file
  – Merge with added lexicons (i.e. names)
  – Sort and remove duplicate entries
  – Convert to the XFER lexicon format
• Kathrin adapted a script that “enhances” the lexicon for English generation (plurals of nouns, tensed verb forms)
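The extract, merge, sort and deduplicate steps above can be sketched as follows. This is an illustration under assumptions, not Yaniv's scripts: the column names (`hebrew`, `pos`, `english`) are hypothetical, the rows are assumed to come from the spreadsheet exported as dictionaries, and the one-line output format is made up because the real XFER lexicon syntax is not shown in these slides.

```python
from typing import Iterable, List, Tuple

# (Hebrew word in ASCII representation, part of speech, English translation)
Entry = Tuple[str, str, str]


def build_lexicon(dictionary_rows: Iterable[dict], extra: Iterable[Entry]) -> List[Entry]:
    """Extract the three fields, merge with added entries, sort, drop duplicates."""
    extracted = {
        (row["hebrew"].strip(), row["pos"].strip().upper(), row["english"].strip())
        for row in dictionary_rows  # column names are hypothetical
    }
    return sorted(extracted | set(extra))


def format_entry(entry: Entry) -> str:
    """Render one entry in a made-up one-line format (NOT the real XFER syntax)."""
    hebrew, pos, english = entry
    return f'{pos} ["{hebrew}"] -> ["{english}"]'
```

Normalizing whitespace and POS case before building the set is what lets near-identical rows collapse into one entry.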

Page 7: Hebrew-to-English XFER MT Project - Update


Morphological Analyzer

• Morphology is a big deal for Hebrew
• Not just inflections and derivations, but also
  – Different words due to omission of vowels from the script
  – Attached prefixes for conj, det, prepositions, and some attached possessive suffixes
• Analyzer program from MS student at Technion already available, works on Windows and with minimal adaptation on Linux
• Coverage is reasonable…
• Produces all analyses or a disambiguated analysis for each word
• Entire sentence passed as input to morpher (not word-by-word)
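To illustrate why attached prefixes and unwritten vowels produce several analyses per word, here is a toy sketch. It is not how the Technion analyzer works. The prefix table, the two-entry stem list and the transliteration are assumptions chosen for the example: the written form BCL can be read as one noun (“onion”) or as the preposition B attached to CL (“shade”).

```python
from typing import List, Set

# Toy data for illustration only (ASCII transliteration assumed).
PREFIXES = {"W": "conj", "H": "det", "B": "prep", "L": "prep"}
STEMS: Set[str] = {"BCL", "CL"}  # "onion", "shade"


def segmentations(word: str, stems: Set[str] = STEMS) -> List[List[str]]:
    """Return every reading of `word` as zero or more prefixes plus a known stem."""
    results: List[List[str]] = []
    if word in stems:
        results.append([word])
    if len(word) > 1 and word[0] in PREFIXES:
        # Peel off one prefix letter and analyze the remainder recursively.
        for rest in segmentations(word[1:], stems):
            results.append([word[0]] + rest)
    return results
```

With this toy data, `segmentations("WBCL")` yields two readings, one with the conjunction followed by the whole noun and one with the conjunction, the preposition and the shorter noun.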

Page 8: Hebrew-to-English XFER MT Project - Update


Morphology Work Completed

• Split attached prefixes and suffixes into separate words for translation
• Produce f-structures as output
• Convert feature-value codes to our conventions
• Install morpher as a server running on our Linux machines
• Yaniv wrote java scripts to handle input-output from the morpher
• Erik integrated a wrapper for running morpher as a server on our Linux machines
• Currently works in single-analysis-per-word mode, almost ready to test in all-analyses mode
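A rough sketch of the code-conversion and f-structure steps, under stated assumptions: the analyzer codes (`m`, `sg`, ...), the target feature names, and the parenthesized output form are all hypothetical stand-ins, since neither the analyzer's codes nor the project's conventions appear in these slides.

```python
from typing import Dict, List

# Hypothetical analyzer codes mapped to hypothetical feature names and values.
FEATURE_MAP = {
    "m": ("gender", "masc"),
    "f": ("gender", "fem"),
    "sg": ("num", "sg"),
    "pl": ("num", "pl"),
}


def to_fstructure(lemma: str, pos: str, codes: List[str]) -> Dict[str, str]:
    """Build a flat feature structure for one morpheme from its analyzer codes."""
    fs = {"lex": lemma, "pos": pos}
    for code in codes:
        name, value = FEATURE_MAP[code]
        fs[name] = value
    return fs


def render(fs: Dict[str, str]) -> str:
    """Print a feature structure as a parenthesized attribute-value list."""
    return "(" + " ".join(f"({k} {v})" for k, v in fs.items()) + ")"
```

A word split into a prefix and a stem would simply produce two such structures, one per resulting token.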

Page 9: Hebrew-to-English XFER MT Project - Update


Elicitation for Hebrew

• Erik made sure Elicitation Tool works for Hebrew

• Two reduced versions of full EC: Alon version, and Kathrin version

• Shuly and Yaniv translated and aligned substantial portion of both

• Kathrin trained an initial learned grammar

Page 10: Hebrew-to-English XFER MT Project - Update


Pending Issues

• Strong Decoder for H-to-E:
  – Kathrin and Alon adapting script for running Stephan’s decoder.
  – No real amounts of parallel text, so no translation model scores for the edges…
  – It seems to be working, but questions about English LM and parameter settings
  – Need to consult with Stephan…
• Installing system locally in Haifa
  – Erik is working with Yaniv and Alon to get it working there…
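To make the “no translation model scores” point concrete: without parallel text, candidate outputs can only be ranked by an English language model. The sketch below is a stand-in with an add-one-smoothed bigram model. It is not Stephan's decoder, and the class and function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


class BigramLM:
    """Tiny add-one-smoothed bigram language model (illustration only)."""

    def __init__(self, sentences: List[str]) -> None:
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for s in sentences:
            tokens = ["<s>"] + s.split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def logprob(self, sentence: str) -> float:
        """Sum of smoothed log bigram probabilities over the sentence."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.unigrams[prev] + self.vocab_size
            total += math.log(num / den)
        return total


def pick_best(candidates: List[str], lm: BigramLM) -> str:
    """Choose the candidate translation the language model scores highest."""
    return max(candidates, key=lm.logprob)
```

Because raw log probabilities favor shorter strings, a real setup would also need length handling, which is one kind of parameter setting the slide alludes to.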

Page 11: Hebrew-to-English XFER MT Project - Update


Demo

• Small input file of 10 sentences
• Alon wrote very small manual grammar
• Lexicon needs a lot of cleanup…
• Morphology errors (only single analysis)
• Translation output directly from XFER engine (no strong decoding)

Page 12: Hebrew-to-English XFER MT Project - Update


Plans for Month-2

• Expanding and cleaning up translation lexicon
• All morphological analyses: adapting to a lattice input!
• Translation with Strong Decoder
• More extensive manual grammar development for a solid comparison
• Elicitation – finish translating and aligning
• Testing and Evaluation
  – Collect some dev and eval sets with parallel translations
  – Evaluation with METEOR and BLEU
  – Frequent testing and fixing cycles
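One way to picture the planned lattice input: each word contributes all of its analyses as parallel paths between two shared nodes, and an analysis that splits the word into several tokens gets intermediate nodes. The sketch below is an assumption about the general idea, not the XFER engine's actual input format. `build_lattice` and the `Edge` layout are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[int, int, str]  # (from_node, to_node, token)


def build_lattice(words: List[List[List[str]]]) -> Tuple[List[Edge], int]:
    """words[i] holds the alternative analyses of word i, each a token list.

    Returns the lattice edges and the id of the final node; node 0 is the start.
    """
    edges: List[Edge] = []
    start, next_free = 0, 1
    for analyses in words:
        end = next_free  # shared end node for all analyses of this word
        next_free += 1
        for tokens in analyses:
            prev = start
            for i, tok in enumerate(tokens):
                if i == len(tokens) - 1:
                    node = end  # last token rejoins the shared end node
                else:
                    node = next_free  # fresh intermediate node
                    next_free += 1
                edges.append((prev, node, tok))
                prev = node
        start = end
    return edges, start
```

Node ids here are not in topological order, because intermediate nodes are numbered after the word's end node, so a consumer that needs ordered ids would have to renumber them.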