Machine Translation
Marathi to UNL
Presented by
Ashwini, Salil
Center for Indian Language Technology Solutions
CSE, IIT Powai
Characteristics of Marathi
a. Syntactic structure
– Subject-object-verb, e.g. राम भात खातो ("Ram eats rice").
– Similarity with Hindi
b. Morphology
– प्रत्यय (suffixes)
– Differences with Hindi
Main tasks
1. Marathi-UW dictionary building
2. Rulebase building for converting Marathi language phenomena to UNL expressions
3. Testing using corpus sentences
4. Verification with Hindi and Marathi deconverters.
Analysis consists of
• Morphology
• Syntax
• Semantics
• Pragmatics
Marathi analysis done so far
We focus on Marathi morphology
• Noun morphology
• Pronoun morphology
• Verb morphology
• Relation label morphology
• Adjective morphology
Types of adjectives in Marathi
1. Pronominal adjectives
1.1 Pronoun adjectives: the nine pronouns used as adjectives.
1.2 Adjectives derived from the nine pronouns
2. Qualitative adjectives
2.1 Adjectives ending with the vowel आ
2.2 Adjectives ending with vowels other than आ
2.3 Postposition adjectives
Types of adjectives [contd.]
3. Numerical adjectives
• 3.1 Cardinal
3.1.1 (whole number)
3.1.2 (fractional number)
3.1.3 (entirety, totality, completeness)
• 3.2 Ordinal
• 3.3 Occurrencial
6 types
• 3.4 Distinctive
[pAvaNedonashe] means 175 or 199.75?
- There is no word assigned to 199.75, 299.75, etc.
- This is the problem with paun, pauvane and savva.
- The reading is (pAvaNedon) times 100 (she), i.e. 1.75 × 100 = 175. she and shambhar both mean 100, yet pAUNashe means 75 while pAvaNeshambhar means 99.75.
- The powers of ten for which there is a distinct word in Marathi need to be stored separately.
- The pronunciation is not pAvaNedona-[pause]-she but pAvaNe-[pause]-donashe.
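The two candidate readings can be made concrete with a small arithmetic sketch. It assumes that pAvaNe means "a quarter less than the following number" and that she and shambhar both stand for 100, as the slide states. The function names are hypothetical and are not part of the Marathi rulebase.

```python
from fractions import Fraction

QUARTER = Fraction(1, 4)   # assumed value subtracted by "pAvaNe"
HUNDRED = 100              # "she" and "shambhar" both mean 100

def pavane(n):
    """A quarter less than n, e.g. pAvaNe + don (2) = 1.75."""
    return n - QUARTER

def reading_multiplicative(n):
    """(pAvaNe n) times 100: the fraction applies to the multiplier only."""
    return pavane(n) * HUNDRED

def reading_subtractive(n):
    """pAvaNe (n times 100): the fraction applies to the whole number."""
    return pavane(n * HUNDRED)

# pAvaNedonashe under each reading
print(float(reading_multiplicative(2)))  # 175.0
print(float(reading_subtractive(2)))     # 199.75

# The slide's contrast: pAUNashe = 75 but pAvaNeshambhar = 99.75
print(float(reading_multiplicative(1)))  # 75.0
print(float(reading_subtractive(1)))     # 99.75
```

The contrast between the last two lines suggests why the two words for 100 cannot be treated as interchangeable in the dictionary.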
Tables of numbers: continuous and random access.
• Some forms of numbers are used for verbalizing the tables of numbers: ºÉÉiÉ / ºÉÉiÉÉ / ºÉÉiÉä / ºÉÉiÉÒä / ºÉiiÉä.
• Marathi: A, B times, (is C), occurring in the table for A. English: B A’s (are C).
• Usage of forms: 1. only for the expression ‘A’ 2. only for ‘B times’ 3. only while recalling the number directly without going through the table.
• Some forms occur especially for square. The repetition is emphasized.
words used to familiarise a child with numbers
• Some words are used mostly to familiarise a child with numbers: BEÒ BE, nÖEÔ nÉäxÉ, ÊiÉEÔ iÉÒxÉ, etc. The similarity of each word with the number is used to help a child remember the number. The words used as familiarisers are: BEÒ, nÖEÔ, ÊiÉEÔ, SÉÉèEÒ, {ÉÉSÉÒ, ºÉɽÒ, ºÉÉiÉÒä, +É`Ò, xÉ´Éä, nɽÒ.
playing cards and game of cricket
1. playing cards:
ekka, durri / durra, tirri / tirra, chavvi / chouka, panji / panja, chhakki / chhakka, satti / satta, atthi / attha, navvi / nashsha, dashshi / dashsha.
2. shots scoring multiple runs in the game of cricket:
चौकार (a four), षट्कार (a six).
The current status of dictionary
Number of entries: 375
• Dictionary
• Nouns
• Noun morphology suffixes
• Verbs
• Verb morphology suffixes
The current status of rulebase
Number of rules is 1050.
• Verb morphology (simple and conjunct verbs)
– Tense (past, present, future)
– Aspect of tense (progress, complete, custom)
– Voice (passive voice)
– अर्थ (mood: imperative, should, negative)
– Ability, intention etc. for conjunct verbs only.
The current status of rulebase [contd.]
• Noun morphology
– Number
– With case marker (सामान्यरूप)
• Case when penultimate vowel is either ऊ or ई, e.g. मूल - मुले (plural)
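The penultimate-vowel case can be sketched as a small string rule. This is only an illustration built from the single example pair on the slide, read here as मूल / मुले. It assumes that the long vowel sign in penultimate position is shortened and that the plural suffix is े. It does not handle independent vowel letters, and the function names are hypothetical rather than taken from the rulebase.

```python
# Long-to-short mapping for Devanagari vowel signs (matras).
SHORTEN = {
    "\u0942": "\u0941",  # long u sign -> short u sign
    "\u0940": "\u093f",  # long i sign -> short i sign
}
PLURAL_E = "\u0947"      # e sign, assumed plural suffix for this noun class

def shorten_penultimate(stem):
    """Shorten a long i/u vowel sign sitting just before the final consonant."""
    if len(stem) >= 2 and stem[-2] in SHORTEN:
        return stem[:-2] + SHORTEN[stem[-2]] + stem[-1]
    return stem

def plural(stem):
    """Apply the shortening, then attach the assumed plural suffix."""
    return shorten_penultimate(stem) + PLURAL_E

print(plural("मूल"))  # मुले
```

A stem with no long vowel sign in penultimate position passes through `shorten_penultimate` unchanged, so the same function can sit in front of other suffixes.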
The current status of rulebase [contd.]
• Relation labels used so far: agt, obj, gol, aoj, and, or
e.g. मुलांनी आंबे खाल्ले नव्हते. ("The children had not eaten mangoes.")
obj(eat(icl>do).@entry.@pred.@past.@not.@complete, mango(icl>fruit):08.@pl)
agt(eat(icl>do).@entry.@pred.@past.@not.@complete, child(icl>person):00.@pl)
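The structure of these two expressions can be made explicit with a small formatter. This is a minimal sketch, not the EnConverter's own output routine: the helper names `uw` and `relation` are hypothetical, and the layout (restriction in parentheses, node id after a colon, attributes prefixed with `.@`) is inferred only from the example above.

```python
def uw(headword, restriction=None, node_id=None, attrs=()):
    """Format a universal word: headword(restriction):id.@attr1.@attr2..."""
    text = headword
    if restriction:
        text += f"({restriction})"
    if node_id is not None:
        text += f":{node_id}"
    text += "".join(f".@{a}" for a in attrs)
    return text

def relation(label, source, target):
    """Format a binary relation: label(source, target)."""
    return f"{label}({source}, {target})"

eat = uw("eat", "icl>do", attrs=("entry", "pred", "past", "not", "complete"))
mango = uw("mango", "icl>fruit", "08", ("pl",))
child = uw("child", "icl>person", "00", ("pl",))

print(relation("obj", eat, mango))
print(relation("agt", eat, child))
```

Both relations share the same verb node, so the tense, negation and aspect attributes appear once on `eat` and are repeated verbatim in each relation.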
Plans
• Adjective morphology
• Pronoun morphology
• Relation labels handling for corpus sentences.
For simple sentences only.
THANK YOU
References:
• Damle, Moro Keshav (1970). Shastriya marathi vyakarana [SaswrIya marATI vyAkaraNa]. (Ed: K. S. Arjunwadkar). Pune: Deshmukh & Co.
• Meying, Zhu (2000). EnConverter specifications, version 2.1. Tokyo: UNU/IAS/UNL Center.
• Meying, Zhu (2002). UNL specifications, version 3 edition 1. Tokyo: UNU/IAS/UNL Center.
• Valambe, M. R. (2001). Sugam marathi vyakaran lekhan [sugama marATI vyAkaraNa leKana]. Pune: Nitin Prakashan.