
CHAPTER 6

KANNADA MORPHOLOGICAL ANALYZER AND GENERATOR

From a computational perspective, the creation and availability of a morphological analyzer and generator (MAG) for a language is important. Morphology plays a significant role in NLP applications such as MT, QA systems, IE, IR, spell checking and lexicography. Morphological analyzers and generators are essential for languages with rich inflectional and derivational morphology, such as Kannada. A morphological analyzer returns all the morphemes of a given word form together with their grammatical categories. Conversely, given a root word and grammatical information, a morphological generator produces the corresponding word form. Developing full-fledged morphological analyzer and generator tools for a highly agglutinative language like Kannada is a challenging task. To build a morphological analyzer and generator for a language, one has to account for the morphological peculiarities of that language. Some peculiarities of Kannada, such as the usage of classifiers and the extensive presence of vowel harmony, make it morphologically complex and thus a challenge in NLG.

Table 6.1: Input/output examples for the morphological analyzer

| Input | Output |
|---|---|
| ಆನೆಗಳು (AnegaLu) | ಆನೆ + ಗಳು (Ane + gaLu) |
| ಹೋಗುತ್ತೇನೆ (hOguttEne) | ಹೋಗು + ಉತ್ತ್ + ಏನೆ (hOgu + utt + Ene) |

The function of a morphological analyzer is to segment a given word into its component morphemes and to assign the correct morpho-syntactic information to each. Table 6.1 shows examples of morphological analysis of Kannada words. The function of a morphological generator is to combine the constituent morphemes into the surface word form. Table 6.2 shows examples of morphological generation of Kannada words.


Table 6.2: Input/output examples for the morphological generator

| Input | Output |
|---|---|
| ಆನೆ + ಗಳು (Ane + gaLu) | ಆನೆಗಳು (AnegaLu) |
| ಹೋಗು + ಉತ್ತ್ + ಏನೆ (hOgu + utt + Ene) | ಹೋಗುತ್ತೇನೆ (hOguttEne) |
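The two directions illustrated in Tables 6.1 and 6.2 can be sketched in a few lines of Python. This is only a toy illustration on the transliterated forms, not the FST or SVM systems described in this chapter. The function names, the tiny suffix lists and the single sandhi rule (a stem-final 'u' is dropped before a vowel-initial suffix) are assumptions chosen to cover just these two examples.

```python
VOWELS = set("aAiIuUeEoO")

def generate(morphemes):
    """Join morphemes left to right, dropping a final 'u' before a vowel."""
    word = ""
    for m in morphemes:
        if word.endswith("u") and m[0] in VOWELS:
            word = word[:-1]
        word += m
    return word

# Hypothetical mini-lexicon; a real system needs full stem and suffix lists.
ROOTS = {"hOgu", "Ane"}
TENSE = ["utt"]
PNG = ["Ene"]
PLURAL = ["gaLu"]

def analyze(word):
    """Return the list of morphemes of a word, or None if nothing matches."""
    # Noun pattern: root + plural marker.
    for pl in PLURAL:
        if word.endswith(pl) and word[:-len(pl)] in ROOTS:
            return [word[:-len(pl)], pl]
    # Verb pattern: root + tense marker + PNG suffix.
    for png in PNG:
        if not word.endswith(png):
            continue
        rest = word[:-len(png)]
        for t in TENSE:
            if rest.endswith(t):
                stem = rest[:-len(t)]
                # Undo the 'u' elision assumed in generate().
                for cand in (stem, stem + "u"):
                    if cand in ROOTS:
                        return [cand, t, png]
    return None
```

Under these assumptions the analyzer and generator are inverses on the two table entries; anything outside the mini-lexicon simply returns None.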

The first section of this chapter explains the development of a Rule Based Morphological Analyzer and Generator (RBMAG) system for Kannada that incorporates the morphological information and peculiarities of the language. The proposed RBMAG system was developed using a finite state transducer (FST).

The second section explains the development of a statistical morphological analyzer for Kannada verbs. This system is a paradigm based morphological analyzer developed using a machine learning approach. It was designed as a sequence labelling task, and training, testing and evaluation were performed with SVM algorithms.

6.1 CHALLENGES IN KANNADA MORPHOLOGY

Kannada belongs to the South Dravidian family of languages. Kannada morphology is agglutinative (concatenative): words are formed by adding a series of suffixes to the root word. When suffixes attach to the root word, several morphophonemic changes take place, and the order in which suffixes attach determines the morpho-syntax. Kannada is a verb-final inflectional language with relatively free word order. From 38 basic characters, 330 conjuncts are formed by combining vowels and consonants. There are more than 10,000 basic stems (root words) in Kannada, and more than a million morphed variants are formed from more than 5,000 distinct character variants [158,159].

The complexity of developing a morphological analyzer for a Dravidian language like Kannada is comparatively higher than for languages like English. Most words change spelling when their stems are inflected. In an agglutinative language like Kannada, a root word is normally affixed with several morphemes, generating thousands of word forms. To build an effective morphological analyzer, one should carefully analyze and identify all these roots and morphemes. The next challenge is the design of the morphological structure and the creation of a well-organized corpus that covers, as far as possible, all types of inflections.

6.1.1 Agglutination of Kannada Language

Agglutination is the most critical feature of Kannada. Because of the highly agglutinating nature of the language and the morphophonemic variations that take place at the point of agglutination, it is very difficult to mark word boundaries. For example, consider the verb form ‘ಓದಿಕೊಂಡಿದ್ದವನ’ (OdikoMDiddavana), ‘the one (masculine) who was reading’. The meaningful parts of this word are as follows:

ಓದು + ಇ + ಕೊಳ್ಳು + ಂಡ್ + ಉ + ಇರು + ದ್ದ್ + ಅ + ಅವನು + ಅ

Odu + i + koLLu + MD + u + iru + dd + a + avanu + a

Root + VBP + AUXV + PST + VBP + AUXV + PST + RP + PRON-3SM + ACC

The word consists of ten meaningful parts: one root word (Root), two verbal participles (VBP), two auxiliary verbs (AUXV), two past tense markers (PST), one relative participle (RP), one pronoun (PRON-3SM) and one accusative marker (ACC).
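One simple way to hold such an analysis in a program is a list of (morpheme, tag) pairs. The sketch below stores the segmentation given above in transliteration and counts the tags; the variable and function names are illustrative assumptions, not part of the system described in this chapter.

```python
from collections import Counter

# (morpheme, tag) pairs for OdikoMDiddavana, as listed in the text.
ANALYSIS = [
    ("Odu", "Root"), ("i", "VBP"), ("koLLu", "AUXV"), ("MD", "PST"),
    ("u", "VBP"), ("iru", "AUXV"), ("dd", "PST"), ("a", "RP"),
    ("avanu", "PRON-3SM"), ("a", "ACC"),
]

def tag_counts(analysis):
    """Count how often each grammatical tag occurs in an analysis."""
    return Counter(tag for _, tag in analysis)

def gloss(analysis):
    """Render the analysis as 'morpheme/TAG + morpheme/TAG + ...'."""
    return " + ".join(f"{m}/{t}" for m, t in analysis)
```

Note that the surface form cannot be recovered by plain concatenation of these morphemes (koLLu + MD surfaces as koMD), which is exactly the morphophonemic difficulty the paragraph describes.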

6.1.2 Types and Features of Kannada Words

In general, there are three types of Kannada words: i) namapada (declinable words, or nouns), ii) kriyapada (conjugable words, or verbs) and iii) avyaya (uninflected words). Nouns, pronouns and adjectives are declinable words and are inflected for case, number and gender. Conjugable words are inflected for person, gender, number, aspect, mood and tense. Kannada words have three genders: masculine, feminine and neuter. Declinable and conjugable words have two numbers: singular and plural. The singular carries no distinguishing marker. The plural marker is usually “gaLu”, with the following exceptions. Masculine nouns ending in “a” (e.g., huDuga) and some feminine nouns ending in “u” (e.g., heMgasu) form the plural with “aru”. Feminine nouns ending in “i” (e.g., huDugi) or “e” (e.g., atte) form the plural with “yaru”. For kinship terms (e.g., aNNa), the plural marker is often “aMdiru”. Some nouns have irregular plurals, such as “makkaLu”, the plural of “magu”.

6.1.3 Noun Cases and Characteristic Suffixes

The case system of Kannada is similar to those of other South Dravidian languages such as Tamil, Telugu and Malayalam. Nouns usually end in a, e, i, u, A or in a consonant [158]. Various suffixes are added to the noun stem to indicate different relationships between the noun and other constituents of the sentence. The suffix used for a particular case depends on the type of noun and its final character. For example, the characteristic suffix of the dative case is chosen according to the criteria shown in Table 6.3.

Table 6.3: Dative Case Characteristic Suffixes for Nouns

| Noun type | Ends with | Dative suffix | Example noun | Dative form |
|---|---|---|---|---|
| Neuter noun | ಅ (a) | ಕ್ಕೆ (kke) | ಮರ (mara) | ಮರಕ್ಕೆ (marakke) |
| Neuter noun | ಎ, ಇ, ಉ (e, i, u) | ಗೆ (ge) | ಮನೆ (mane) | ಮನೆಗೆ (manege) |
| Neuter noun | consonants | ಇಗೆ (ige) | ಊರು (Uru) | ಊರಿಗೆ (Urige) |
| Neuter determinative | - | ಅಕ್ಕೆ (akke) | ಇದು (idu) | ಇದಕ್ಕೆ (idakke) |
| Rational noun | - | ನಿಗೆ (nige) | ಅಣ್ಣ (aNNa) | ಅಣ್ಣನಿಗೆ (aNNanige) |
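The selection logic of Table 6.3 amounts to a lookup keyed by noun type and ending class. The following sketch works on transliterated nouns; the dictionary keys, the function name and the elision rule (a final 'u' is dropped before a vowel-initial suffix) are assumptions made for illustration, and the caller is expected to supply the ending class rather than have it detected automatically.

```python
# Suffixes from Table 6.3, keyed by (noun type, ending class).
DATIVE_SUFFIX = {
    ("neuter", "a"): "kke",
    ("neuter", "e/i/u"): "ge",
    ("neuter", "consonant"): "ige",
    ("neuter determinative", None): "akke",
    ("rational", None): "nige",
}
VOWELS = set("aAiIuUeEoO")

def dative(noun, noun_type, ending=None):
    """Return the dative form of a transliterated noun."""
    suffix = DATIVE_SUFFIX[(noun_type, ending)]
    # Assumed sandhi: a final 'u' is dropped before a vowel-initial suffix.
    if noun.endswith("u") and suffix[0] in VOWELS:
        noun = noun[:-1]
    return noun + suffix
```

For instance, `dative("mara", "neuter", "a")` yields `marakke` and `dative("idu", "neuter determinative")` yields `idakke`, matching the table rows.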

Table 6.4 below shows the different cases and their corresponding characteristic

suffixes for nouns.

Table 6.4: Noun Cases and their Characteristic Suffixes

| Feature | Singular suffix | Plural suffix |
|---|---|---|
| Nominative (Prathama) | ವು (vu) / ಯು (yu) / ಉ (u) / ನು (nu) | ಗಳು (gaLu) / ರು (ru) / ಂದಿರು (Mdiru) / ಯರು (yaru) |
| Accusative (Dwitiya) | ವನ್ನು (vannu) / ಯನ್ನು (yannu) / ಅನ್ನು (annu) / ನನ್ನು (nannu) | ಗಳನ್ನು (gaLannu) / ಂದಿರನ್ನು (Mdirannu) / ಯರನ್ನು (yarannu) / ರನ್ನು (rannu) |
| Instrumental (Tritiya) | ದಿಂದ (diMda) / ಯಿಂದ (yiMda) / ಇಂದ (iMda) / ನಿಂದ (niMda) | ಗಳಿಂದ (gaLiMda) / ಂದಿರಿಂದ (MdiriMda) / ಯರಿಂದ (yariMda) / ರಿಂದ (riMda) |
| Dative (Chaturthi) | ಕ್ಕೆ (kke) / ಗೆ (ge) / ಇಗೆ (ige) / ನಿಗೆ (nige) / ಅಕ್ಕೆ (akke) | ಗಳಿಗೆ (gaLige) / ಂದಿರಿಗೆ (Mdirige) / ಯರಿಗೆ (yarige) / ರಿಗೆ (rige) |
| Ablative (Panchami) | ದೆಸೆಯಿಂದ (deseyiMda) | ಗಳದೆಸೆಯಿಂದ (gaLadeseyiMda) / ಂದಿರದೆಸೆಯಿಂದ (MdiradeseyiMda) / ಯರದೆಸೆಯಿಂದ (yaradeseyiMda) / ರದೆಸೆಯಿಂದ (radeseyiMda) |
| Genitive (Shashti) | ದ (da) / ಯ (ya) / ಇನ (ina) / ನ (na) / ಅ (a) / ವಿನ (vina) / ರ (ra) | ಗಳ (gaLa) / ಂದಿರ (Mdira) / ಯರ (yara) / ರ (ra) |
| Locative (Saptami) | ದಲ್ಲಿ (dalli) / ಯಲ್ಲಿ (yalli) / ಅಲ್ಲಿ (alli) / ನಲ್ಲಿ (nalli) | ಗಳಲ್ಲಿ (gaLalli) / ಂದಿರಲ್ಲಿ (Mdiralli) / ಯರಲ್ಲಿ (yaralli) / ರಲ್ಲಿ (ralli) |
| Vocative (Sambhodana) | ಏ (E) / ವೇ (vE) / ಆ (A) / ಈ (I) | ಗಳೆ (gaLe) / ಂದಿರೇ (MdirE) / ಯರೇ (yarE) |

6.1.4 Verb Morphology

Compared with other Dravidian languages such as Malayalam, the morphological structure of Kannada is more complex because its verbs inflect for person, number and gender. In verb morphology, each root word combines with auxiliaries that indicate aspect, mood, causation, attitude, etc. The structure of the verbal complex is unique, which makes it very challenging to capture in a form that a machine can analyze and generate. The formation of the verbal complex also involves the arrangement of the verbal units and the interpretation of their combined meaning. Phonology plays a role in word formation as well, through ‘morphophonemic’ and ‘sandhi’ rules that account for the shape changes caused by inflection. To resolve the computational challenges in verb morphological analysis, I have classified verbs into 35 distinct paradigms and grouped verbs according to their paradigm class.

6.1.4.1 Inflections and Features of Kannada Verbs

Verb forms can be broadly classified into two types: finite and non-finite. Finite verbs usually occur at the end of a sentence and, with the exception of clitics or reportives, can have nothing added to them. The general sentence pattern with a finite verb is Subject-Object-Verb. Finite forms include imperatives, present and past forms marked with person-number-gender (PNG), modals and verbal/participle nouns. In the affirmative, the tense can be past, present or future; the negative form does not take tense. Non-finite verbs, in contrast, cannot stand alone and must be followed by some other form. Non-finite verb forms include infinitives, verbal and adjectival participles, and tense-marked verb stems [158]. The non-past denotes both present and future tenses and, unlike Malayalam (another South Dravidian language), all tenses have different tense markers in Kannada. Mood is another important feature of Kannada and is associated with statements of fact versus possibility, supposition, etc. [160]. Four moods are expressed in Kannada: infinitive, imperative, affirmative and negative. Kannada also has some additional modal forms: indicative, conditional, optative, potential, monitory and conjunctive.

Kannada also has past verb stems in addition to simple verb stems; these are used in forming the past tense, past participles, conditionals and some other constructions. The contingent form is another distinguishing feature of Kannada that is not present in other Dravidian languages [161]. Past verbs are broadly classified into two types: regular and irregular (or semi-regular). Regular verbs form the past by adding ‘id’ to the verb stem. Irregular verbs form it by adding one of the other past tense markers listed in Table 6.18.

6.1.4.1.1 The Infinitive

The infinitive is a non-finite verb form that occurs together with other verbs, auxiliary verbs (modals), negative morphemes and some other forms. Kannada has two infinitive markers, ‘ಅಲ್’ (al) and ‘ಓಕ್ಕೆ’ (Okke). Both are added to the verb root to generate other word forms, as shown in Table 6.5.

Table 6.5: The Infinitive

| Case | Example and Meaning |
|---|---|
| Case 1 | ಬರು (baru, ‘come’) + ಅಲ್ (al) + ಇಲ್ಲ (illa, negative) = ಬರಲಿಲ್ಲ (baralilla), ‘didn’t come’ |
| Case 2 | ನೋಡು (nODu, ‘see’) + ಓಕ್ಕೆ (Okke) = ನೋಡೋಕ್ಕೆ (nODOkke), ‘to see’ |

6.1.4.1.2 The Imperative

Kannada verbs exhibit a number of forms that express commands or exhortations, based on various degrees of politeness and deference. These imperatives also change with the verb type, which usually depends on the ending of the verb stem. Table 6.6 gives the imperatives with an example of each.

Table 6.6: The Imperative

| Degree of politeness | i-stem, e.g. ಕುಡಿ (kuDi) ‘drink’ | o-stem, e.g. ತೆಗೊ (tego) ‘take’ | bA, tA stems, e.g. ಬಾ, ಬರು (bA, baru) ‘come’ | Other stems, e.g. ಹೋಗು (hOgu) ‘go’ |
|---|---|---|---|---|
| Impolite, casual (masculine) | ಕುಡಿಯೋ (kuDiyO) | ತೆಗೊಳ್ಳೋ (tegoLLO) | ಬಾರೋ (bArO) | ಹೋಗೋ (hOgO) |
| Impolite, casual (feminine) | ಕುಡಿಯೇ (kuDiyE) | ತೆಗೊಳ್ಳೇ (tegoLLE) | ಬಾರೇ (bArE) | ಹೋಗೇ (hOgE) |
| Nonpolite | ಕುಡಿ (kuDi) | ತೆಗೊ (tego) | ಬಾ (bA) | ಹೋಗು (hOgu) |
| Polite, plural | ಕುಡೀರಿ (kuDIri) | ತೆಗೊಳ್ಳಿ (tegoLLi) | ಬನ್ನಿ (banni) | ಹೋಗಿ (hOgi) |
| Very polite | ಕುಡಿಯಿರಿ (kuDiyiri) | ತೆಗೊಳ್ಳಿರಿ (tegoLLiri) | ಬನ್ನಿರಿ (banniri) | ಹೋಗಿರಿ (hOgiri) |
| Ultrapolite | ತಾವು ಕುಡಿಯಿರಿ (tAvu kuDiyiri) | ತಾವು ತೆಗೊಳ್ಳಿರಿ (tAvu tegoLLiri) | ತಾವು ಬನ್ನಿರಿ (tAvu banniri) | ತಾವು ಹೋಗಿರಿ (tAvu hOgiri) |

6.1.4.1.3 Negative Imperatives

These forms command someone not to do something. Negative imperatives are formed in two ways, using ‘ಬಾರದು’ (bAradu) or ‘ಬೇಡ’ (bEDa). The first, ‘bAradu’ (historically a form of bA/baru, ‘come’), may be added to the infinitive in ‘ಅ’ (a) to generate a negative verb form. It may also be added to the infinitive in ‘ಅಲ್’ (al) or ‘ಲ್’ (l) followed by the emphatic ‘ಏ’ (E) to generate strong negative verb forms. Table 6.7 illustrates this negative imperative with examples.

Table 6.7: The Negative Imperative

| Case | Example and Meaning |
|---|---|
| Case 1 | ಹೋಗು (hOgu) + ಅ (a) + ಬಾರದು (bAradu) = ಹೋಗಬಾರದು (hOgabAradu), ‘don’t go’; or ಹೋಗು (hOgu) + ಬಾರದು (bAradu) = ಹೋಗ್ಬಾರದು (hOgbAradu), ‘don’t go’ |
| Case 2 | ಹೋಗು (hOgu) + ಅಲ್ (al) + ಏ (E) + ಬಾರದು (bAradu) = ಹೋಗಲೇಬಾರದು (hOgalEbAradu), ‘definitely don’t go’; or ಹೋಗು (hOgu) + ಲ್ (l) + ಏ (E) + ಬಾರದು (bAradu) = ಹೋಗ್ಲೇಬಾರದು (hOglEbAradu), ‘definitely don’t go’ |

Similarly, the second way of forming negative imperatives uses the negative modals ‘ಬೇಡ’ (bEDa, the negative of ‘ಬೇಕು’ (bEku): want, need, must, should) and ‘ಕೂಡದು’ (kUDadu, ‘must not’). Table 6.8 illustrates these negative imperatives with examples.

Table 6.8: The Negative Imperative

| Case | Example and Meaning |
|---|---|
| Case 1 | ಮಾಡು (mADu) + ಅ (a) + ಬೇಡ (bEDa) = ಮಾಡಬೇಡ (mADabEDa); or ಮಾಡು (mADu) + ಬೇಡ (bEDa) = ಮಾಡ್ಬೇಡ (mADbEDa), ‘don’t do’ |
| Case 2 | ಮಾಡು (mADu) + ಅ (a) + ಬೇಡಿ (bEDi) = ಮಾಡಬೇಡಿ (mADabEDi); or ಮಾಡು (mADu) + ಬೇಡಿ (bEDi) = ಮಾಡ್ಬೇಡಿ (mADbEDi), ‘please don’t do’ |
| Case 3 | ಮಾಡು (mADu) + ಅಲ್ (al) + ಏ (E) + ಬೇಡ (bEDa) = ಮಾಡಲೇಬೇಡ (mADalEbEDa); or ಮಾಡು (mADu) + ಲ್ (l) + ಏ (E) + ಬೇಡ (bEDa) = ಮಾಡ್ಲೇಬೇಡ (mADlEbEDa), ‘must not do’ |
| Case 4 | ಮಾಡು (mADu) + ಅ (a) + ಕೂಡದು (kUDadu) = ಮಾಡಕೂಡದು (mADakUDadu); or ಮಾಡು (mADu) + ಕೂಡದು (kUDadu) = ಮಾಡ್ಕೂಡದು (mADkUDadu), ‘must not do’ |
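The negative imperative patterns listed in the two tables above follow one template: stem, then an optional linking element, then the negative modal. A minimal Python sketch on transliterated forms is given below. The function name and the `contracted` and `emphatic` flags are labels introduced here for illustration, and the rule that the stem-final 'u' is dropped is inferred from the listed examples only.

```python
def negative_imperative(stem, modal, emphatic=False, contracted=False):
    """Build a negative imperative from a transliterated verb stem.

    modal is a negative modal such as 'bAradu', 'bEDa', 'bEDi' or 'kUDadu'.
    emphatic inserts the emphatic E after the infinitive in al / l.
    contracted selects the shorter variant without the linking vowel.
    """
    # Inferred from the examples: the stem-final 'u' is dropped.
    base = stem[:-1] if stem.endswith("u") else stem
    if emphatic:
        link = ("l" if contracted else "al") + "E"
    else:
        link = "" if contracted else "a"
    return base + link + modal
```

With these assumptions, `negative_imperative("hOgu", "bAradu")` gives `hOgabAradu`, and adding `emphatic=True` gives `hOgalEbAradu`.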

6.1.4.1.4 Optative

The optative is usually used with the first and third persons in Kannada and is formed by adding ‘ಇ’ (i) to the infinitive. It often translates into English as ‘let’ when used in the affirmative, and as ‘shall’, ‘should’ or ‘may’ when used in the interrogative, as shown in Table 6.9.

Table 6.9: The Optative

| Case | Example and Meaning |
|---|---|
| Affirmative | ಅವಳು ಮಾಡು (avaLu mADu) + ಅಲ್ (al) + ಇ (i) = ಅವಳು ಮಾಡಲಿ (avaLu mADali), ‘let her do’ |
| Interrogative | ಅವಳು ಯಾವಾಗ ಮಾಡು (avaLu yAvAga mADu) + ಅಲ್ (al) + ಇ (i) = ಅವಳು ಯಾವಾಗ ಮಾಡಲಿ (avaLu yAvAga mADali)?, ‘when shall/should/may she do?’ |

6.1.4.1.5 Hortative

The hortative is formed by adding ‘ಓಣ’ (ONa) to the verb stem. It can be translated as ‘let’s (do something)’ in the affirmative or ‘shall we (do something)?’ in the interrogative. Table 6.10 illustrates this with examples.

Table 6.10: The Hortative

| Case | Example and Meaning |
|---|---|
| Affirmative | ಮಾಡು (mADu) + ಓಣ (ONa) = ಮಾಡೋಣ (mADONa), ‘let’s do’ |
| Interrogative | ಏನು (Enu) + ಮಾಡು (mADu) + ಓಣ (ONa) = ಏನು ಮಾಡೋಣ (Enu mADONa)?, ‘What shall we do?’ |

6.1.4.1.6 Participle

Kannada has some non-finite verb forms called participles, which function verbally or adjectivally or have some special syntactic function in the sentence. Participles may be affirmative, in which case they can be marked for tense, or negative. The important verbal participles in Kannada are shown in Table 6.11, with an example of each.

Table 6.11: Verbal Participle

| Participle | Description | Example |
|---|---|---|
| Present verbal participle | Formed by adding ‘ಆ’ (A) to the verb stem + ‘ಉತ್ತ್’ (utt, present tense marker), followed by a finite verb or verb phrase. | ಅವ್ನು ಯೋಚನೆ ಮಾಡುತ್ತಾ ಕುಳಿತ್ತಿದ್ದನು (avnu yOcane mADuttA kuLittiddanu), ‘he was sitting thinking’ |
| Past verbal participle | If the past tense marker of the verb is ‘ಇದ್’ (id), add ‘ಇ’ (i) to the verb stem, followed by a finite verb or verb phrase. Otherwise add ‘ಉ’ (u) to the past verb stem, followed by a finite verb or verb phrase. | Case 1: ಮಾಡಿ ಊರಿಗೆ ಬಂದೆನು (mADi Urige baMdenu), ‘having done, I came to village’. Case 2: ಹೋಗಿ + ಬಿಟ್ಟು + ಬನ್ನಿ (hOgi + biTTu + banni), ‘go and come’ |
| Negative verbal participle | Expresses the negative of both the present and past verbal participles. Formed by adding ‘ಅದೆ’ (ade) to the verb stem or to the negative stem ‘illa’. | Case 1: ನೋಡು + ಅದೆ = ನೋಡದೆ (nODu + ade = nODade), ‘without seeing’. Case 2: ಇಲ್ಲ + ಅದೆ = ಇಲ್ಲದೆ (illa + ade = illade), ‘not being/having been’ |
| Verbal/participle noun | The most common verbal participle nouns are the neuter singular ‘ಅದು’ (adu) and personal verbal nouns such as ‘ಅವನು’ (avanu), ‘ಅವಳು’ (avaLu), ‘ಅವರು’ (avaru), etc. | Case 1: ಮಾಡು + ಓ + ಅದು = ಮಾಡೋದು (mADu + O + adu = mADOdu), ‘the (act/fact of) doing, that which does’. Case 2: ನೋಡು + ಓ + ಅವರು = ನೋಡೋವವರು (nODu + O + avaru = nODOvavaru), ‘those (people) who see’ |
| Negative verbal/participle noun | Formed by affixing the negative adjectival participle ‘ಅದ’ (ada) to the demonstrative pronouns. | Case 1: ಮಾಡು + ಅದ + ಅದು = ಮಾಡದದು (mADu + ada + adu = mADadadu), ‘(the act/fact of) not doing, that which does/did not do’. Case 2: ಇಲ್ಲ + ಅದ + ಅದು = ಇಲ್ಲದದು or ಇಲ್ಲದ್ದು (illa + ada + adu = illadadu or illaddu), ‘the act/fact of not being, that which is/was not’. Case 3: ಹೋಗು + ಅದ + ಅವಳು = ಹೋಗದವಳು (hOgu + ada + avaLu = hOgadavaLu), ‘the woman who does/did not go’ |

6.1.4.1.7 Modal auxiliaries

Kannada has a number of modal auxiliaries, each of which may have several meanings, as shown in Table 6.12.

Table 6.12: Modal Auxiliaries

| Modal auxiliary | Description | Example |
|---|---|---|
| ‘ಬೇಕು’ (bEku) | The modal ‘ಬೇಕು’ (is wanted, needed; must, should, ought) is used in different situations with different meanings. | Case 1: ನಾನು ಹೋಗಬೇಕು (nAnu hOga bEku), ‘I ought/need/want to go’. Case 2: ನೀವು ನಾಳೆ ಇಲ್ಲಿ ಇರಬೇಕು (nIvu nALe illi ira bEku), ‘you must/should be here tomorrow’. Case 3: ನೀವು ಅವನನ್ನು ನೋಡಿ ಇರಬೇಕು (nIvu avanannu nODi ira bEku), ‘you must have seen him’ |
| ‘ಬೇಡ’ (bEDa), ‘ಬೇಡಿ’ (bEDi) | The negative of the modal ‘ಬೇಕು’ is ‘ಬೇಡ’ (should not, must not, need not). The more polite or plural form is ‘ಬೇಡಿ’. | Case 1: ಬರಬೇಡ (bara bEDa), ‘don’t come’. Case 2: ಬರದಿರಬೇಡ (baradira bEDa), ‘come’ |
| ‘ಕೂಡದು’ (kUDadu) | Similar to the negative modal ‘ಬೇಡ’, but ‘ಕೂಡದು’ is the strongest negative modal. Like ‘ಬೇಡ’ and ‘ಬಾರದು’ (bAradu), it is attached to the infinitive. | ಬರಕೂಡದು (bara kUDadu), ‘should/must not come’ |
| ‘ಬಹುದು’ (bahudu) | Also attached to the infinitive; has the meaning ‘(someone) can/may (do something)’. | ನೋಡಬಹುದು (nODa bahudu), ‘can/might see’ |
| ‘ಬಾರದು’ (bAradu) | The negative of ‘ಬಹುದು’ is ‘ಬಾರದು’. | ನೋಡಬಾರದು (nODa bAradu), ‘can’t/shouldn’t see’ |
| ‘ಆರ್’ (Ar) | The negative contingent ‘ಆರ್’ with PNG markers is attached to the verbal infinitive to give the meaning ‘cannot’/‘might not’. | ನಗು + ಅಲ್ + ಆರ್ + ಎನು = ನಗಲಾರೆನು (nagu + al + Ar + enu = nagalArenu), ‘cannot/might not laugh’ |

6.1.4.1.8 Dative-stative verbs

Two verb stems, ‘ಬರು’ (baru) and ‘ಆಗು’ (Agu), are frequently used as dative-stative verbs in Kannada. These verbs have a habitual sense when they stand alone and normally do not take tense markers. The verb ‘ಇರು’ (iru, ‘be’), when indicating possession, has the meaning ‘have’ in English. The second verb, ‘ಆಗು’ (‘become’), acts as an aspect marker indicating finality.

6.1.4.1.9 Verbal aspect markers

Aspect markers are very similar to main verbs in their morphology and syntax, but semantically they do not express lexical meaning as main verbs do. In Kannada, the verbal aspect marker is usually added to the past verbal participle. Table 6.13 shows the various verbal aspect markers used in Kannada. The aspect markers and their English meanings are underlined in each example.

Table 6.13: Verbal Aspect Markers

| Aspect marker and meaning | Description | Example |
|---|---|---|
| ‘ಬಿಡು’ (biDu). Meaning: ‘completion’ (‘perfective’, definiteness). | Aspectual ‘ಬಿಡು’ is attached to the past verbal participle. It is homophonous with the lexical verb ‘ಬಿಡು’ (leave) and has the same tense formation. The aspectual ‘ಬಿಡು’ can also be attached to the lexical verb ‘ಬಿಡು’. | Case 1: ಅವನು ಬಿದ್ದು ಬಿಟ್ಟನು (avanu biddu biTTanu), ‘he fell down’. Case 2: ಬಿಟ್ಟುಬಿಡು (biTTubiDu), ‘let go’ |
| ‘ಹೋಗು’ (hOgu). Meaning: ‘completion’ (sometimes involuntary or accidental). | The aspectual ‘ಹೋಗು’ indicates that something has changed from one state to another. It is homophonous with the lexical verb ‘ಹೋಗು’ (go) and has the same tense formation. It is attached to the past verbal participle. | Case 1: ಅನ್ನ ಬೆಂದು ಹೋಗಿದೆ (anna beMdu hOgide), ‘the rice has gotten overcooked’. Case 2: ಕೆಟ್ಟು ಹೋಗು (keTTu hOgu), ‘get spoiled’ |
| ‘ಆಡು’ (ADu). Meaning: continuity, duration (with some verbs ‘reciprocal’ or ‘competitive’). | It is homophonous with the lexical verb ‘ಆಡು’ (play) and has the same tense formation. It is also attached to the past verbal participle. | Case 1: ಅವರು ಓಡಾಡಿದರು (avaru ODADidaru), ‘they ran around’. Case 2: ಅವರು ಕಾದಾಡಿದರು (avaru kAdADidaru), ‘they fought with each other’ |
| ‘ಕೊಡು’ (koDu). Meaning: benefactive. | It is homophonous with the lexical verb ‘ಕೊಡು’ (give) and has the same tense formation. It is also attached to the past verbal participle. | ಅವನು ಕತೆ ಬರೆದು ಕೊಟ್ಟನು (avanu kate baredu koTTanu), ‘he wrote the story for someone’s benefit’ |
| ‘ನೋಡು’ (nODu). Meaning: ‘attemptive’, experimentive. | It is homophonous with the lexical verb ‘ನೋಡು’ (see) and has the same tense formation. It is usually used with transitive verbs and rarely with intransitive ones. | ಅವನು ಕಾಫಿ ಕುಡಿದು ನೋಡಿದನು (avanu kAfi kuDidu nODidanu), ‘he tried drinking/tasted the coffee’ |
| ‘ಹಾಕು’ (hAku). Meaning: exhaustive, malefactive. | It is homophonous with the lexical verb ‘ಹಾಕು’ (put, place) and takes regular (‘weak’) tense formation. It is usually used with transitive verbs. | ಅವನು ದೋಸೆಯೆಲ್ಲ ತಿಂದು ಹಾಕಿದನು (avanu dOseyella tiMdu hAkidanu), ‘he ate up all the pancakes’ (against our wishes) |
| ‘ಕೊಳ್ಳು’ (koLLu). Meaning: reflexive, self-benefactive. | It is homophonous with the lexical verb ‘ಕೊಳ್ಳು’ (buy, take, acquire) and has the same tense formation. | ಅವನು ಕತೆ ಬರೆದು ಕೊಂಡನು (avanu kate baredu koMDanu), ‘he wrote a story for himself’ |

6.1.4.1.10 The causative suffix ‘ಇಸು’ (isu)

The causative suffix ‘ಇಸು’ (isu) (or ‘yisu’) can be added to any verb stem to make a causative verb out of a non-causative one, as in Table 6.14.

Table 6.14: The causative suffix ‘ಇಸು’ (isu)

| Case | Example and Meaning |
|---|---|
| Case 1 | ‘ಕಲಿ’ (kali, ‘learn’; intransitive) + ‘ಇಸು’ (isu) -> ‘ಕಲಿಸು’ (kalisu, ‘teach’; causative) |
| Case 2 | ‘ಮಾಡು’ (mADu, ‘do’) + ‘ಇಸು’ (isu) -> ‘ಮಾಡಿಸು’ (mADisu, ‘make (someone) do’; intransitive-causative) + ‘ಇಸು’ (isu) -> ‘ಮಾಡಿಸಿಸು’ (mADisisu, ‘make (someone) make (someone) do’; double causative) |
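The causative derivation can be written as a small function on transliterated stems. This is a sketch covering only the listed examples: the function name is hypothetical, the rule that a stem-final 'u' or 'i' is dropped before 'isu' is inferred from those examples, and the 'yisu' variant is not handled.

```python
def causative(stem):
    """Attach the causative suffix 'isu' to a transliterated verb stem."""
    # Inferred from the examples: a stem-final 'u' or 'i' is dropped.
    if stem.endswith(("u", "i")):
        stem = stem[:-1]
    return stem + "isu"
```

Because the output is again a stem ending in 'u', the function can be applied twice to obtain the double causative, for example `causative(causative("mADu"))`.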

6.1.4.1.11 The conditional suffix ‘are’

The conditional suffix ‘are’ expresses the meaning of English ‘if’, and the same form is used for all persons. The suffix is usually attached to the past verb stem to create the conditional form, as Table 6.15 illustrates.

Table 6.15: The conditional suffix

| Case | Example and Meaning |
|---|---|
| Case 1 | ‘ಮಾಡಿದರೆ’ (mADidare), ‘if (someone) does (something), (then...)’ |
| Case 2 | ‘ಹೋದರೆ’ (hOdare), ‘if (someone) goes, (then...)’ |

6.1.4.1.12 Verbs marked with tense and PNG

The PNG and tense markers concatenated to the verb stem are two important aspects of verb morphology. The verbal inflectional morphemes attached to a verb provide information about syntactic features such as number, person, case-ending relation and tense. Table 6.16 shows the PNG suffixes that can be attached to any verb root.

Table 6.16: PNG Suffixes

| Person | Number | Gender | Present | Future | Past | Contingent |
|---|---|---|---|---|---|---|
| First | Singular | Masculine/Feminine | ಏನೆ (Ene) | ಎನು, ಎ (enu, e) | ಎನು, ಎ (enu, e) | ಏನು (Enu) |
| First | Plural | Masculine/Feminine | ಏವೆ (Eve) | ಎವು (evu) | ಎವು (evu) | ಏವು (Evu) |
| Second | Singular | Masculine/Feminine | ಈ, ಈಯೆ (I, Iye) | ಇ, ಇಯೆ (i, iye) | ಇ, ಇಯೆ (i, iye) | ಈಯ (Izha) |
| Second | Plural | Masculine/Feminine | ಈರಿ (Iri) | ಇರಿ (iri) | ಇರಿ (iri) | ಈರಿ (Iri) |
| Third | Singular | Masculine | ಆನೆ (Ane) | ಅನು (anu) | ಅನು (anu) | ಆನು (Anu) |
| Third | Singular | Feminine | ಆಳೆ (ALe) | ಅಳು (aLu) | ಅಳು (aLu) | ಆಳು (ALu) |
| Third | Plural | Masculine/Feminine | ಆರೆ (Are) | ಅರು (aru) | ಅರು (aru) | ಆರು (Aru) |
| Third | Singular | Neuter | ಇದೆ (ide) | ಉದು (udu) | ಇತು (itu) | ಈತು (Itu) |
| Third | Plural | Neuter | ಇವೆ (ive) | ಅವು (Avu) | ಅವು (avu) | ಆವು (Avu) |
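A generator can treat the PNG table as a lookup and concatenate root, tense marker and PNG suffix. The sketch below uses three rows of the table in transliteration. The tense markers `utt` (present), `uv` (future) and `id` (regular past) are the ones named in this chapter; the function names and the single elision rule (a final 'u' is dropped before a vowel) are assumptions for illustration, so irregular past stems are out of scope.

```python
# A subset of the PNG table: (person, number, gender) -> suffix per tense.
PNG_SUFFIX = {
    ("1", "sg", "m/f"): {"present": "Ene", "future": "enu", "past": "enu"},
    ("3", "sg", "m"): {"present": "Ane", "future": "anu", "past": "anu"},
    ("3", "pl", "m/f"): {"present": "Are", "future": "aru", "past": "aru"},
}

# 'id' is the regular past marker; irregular paradigms use other markers.
TENSE_MARKER = {"present": "utt", "future": "uv", "past": "id"}

VOWELS = set("aAiIuUeEoO")

def join(morphemes):
    """Concatenate morphemes, dropping a final 'u' before a vowel."""
    word = ""
    for m in morphemes:
        if word.endswith("u") and m[0] in VOWELS:
            word = word[:-1]
        word += m
    return word

def inflect(root, tense, person, number, gender):
    """Generate root + tense marker + PNG suffix for a regular verb."""
    suffix = PNG_SUFFIX[(person, number, gender)][tense]
    return join([root, TENSE_MARKER[tense], suffix])
```

For example, `inflect("hOgu", "present", "1", "sg", "m/f")` reproduces the form hOguttEne used earlier in the chapter.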

6.2 CLASSIFYING VERB PARADIGMS AND SYSTEM DESIGN

The first step in implementing the morphological analyzer is to classify the verb paradigms from a computational perspective. In most cases, the difficulty arises from the past tense markers, which change from one paradigm to another [160]. I have classified Kannada verbs into 35 paradigms by considering all the situations, discussed previously, that can give rise to different word forms. All root words in a paradigm inflect with the same set of inflections. In other words, every word in a paradigm undergoes the same orthographic changes (sandhi changes), or shows the same inflectional behavior, when a suffix is added to it. Consider the two words ‘ಏಳು’ (ELu) and ‘ಬೀಳು’ (bILu). When inflected with the past tense and PNG markers, they form ‘ಎದ್ದೆನು’ (eddenu) and ‘ಬಿದ್ದೆನು’ (biddenu). As the two words show the same orthographic changes, they are grouped under the same paradigm. Table 6.17 illustrates these paradigms with an example of each.

Table 6.17: Verb Paradigms

Page 19: KANNADA MORPHOLOGICALANALYZER AND GENERATOR

202

Paradigms Past tense

marker

Description & Example

Class-1

-ತ್ತತ-(-tt-) Verbs ends with 'Ayu', 'Iyu', 'ILu'; Eg: sAyu, Iyu, kILu etc.

Class-2

-ತ್ತತ-(-tt-) Verbs ends with 'ru', 'aLu', 'uLu'; Eg: „heru‟,‟aLu‟,‟uLu‟etc.

Class-3

-ತ್ತತ-(-tt-) Verbs ends with „aLu‟,‟uLu; Eg : aLu, uLu

Class-4

-ನ್ತತ- (-Mt-) Verbs ends with 'illu'; Eg : nillu

Class-5

-ತ್ತ- (-t-) Verbs ending with „i‟ and „e‟; Eg: „kali‟, „mere‟, „koLe‟ etc.

Class-6

-ತ್ತ- (-t-) Verbs ends with 'ULu'; Eg: Example: hULu

Class-7

-ತ್ತ- (-t-) Verbs ends with „Olu‟, „Ulu‟, „Elu‟; Eg:jOlu‟,‟nUlu‟,‟hElu‟

Class-8

-ದ್- (-d-)

Verbs ending with 'Ayu','Oyu','Eyu','Iyu'; Eg:‟ kAyu‟, „sIyu‟

Class-9

-ದ್- (-d-) Verbs ending with 'Agu','Ogu'; Eg: „hOgu‟, „Agu‟ etc

Class-10

-ದ್- (-d-) Verbs ends with 'are' ; Eg: „bare‟

Class-11

-ದ್- (-d-) verbs ending with 'ge' and 'gi' ; Eg: „age‟, „agi‟

Class-12

-ದ್- (-d-) Verbs ending with 'yyu' ; Eg: „koyyu‟, „geyyu‟, „hoyyu‟ etc.

Class-13

-ದ್- (-d-) Verbs ends with 'nnu' ; Eg:‟ annu‟, „tinnu‟, „ennu‟ etc

Class-14

-ದ್- (-d-) Verbs ending with 'Eyu' ; Eg: „gEyu‟, „nEyu‟ etc

Class-15

-ದ್- (-d-) Verbs ending with 'Ayu' ; Eg: „Ayu‟

Class-16

-ದ್ -(-dd-) Verbs ends with 'iru' ; Eg: „iru‟

Class-17

-ದ್ -(-dd-) Verbs ends with 'kaLu' ; Eg: kaLu

Class-18

-ದ್ -(-dd-) Verbs ends with 'ILu','ELu' ; Eg: „bILu‟ ,‟ELu‟, etc

Class-19

-ದ್ -(-dd-) Verbs ends with 'Eyu' ; Eg: :mEyu

Class-20

-ದ್ -(-dd-) Verbs ends with 'ellu' ; Eg: „gellu‟

Class-21

-ಇದ್-(-id-) Verbs ends with 'ADu', 'ODu' ; Eg: „ADu‟,‟nODu‟, „tODu‟

Page 20: KANNADA MORPHOLOGICALANALYZER AND GENERATOR

203

Class-22

-ಇದ್- (-id-) Verbs ending with 'TTu', 'ddu', 'bbu', 'ttu', 'llu', 'ccu'; Eg: aTTu, addu, ubbu, kuttu, cellu, heTTu, beccu, hottu etc.

Class-23

-ಇದ್- (-id-) Verbs ending with 'Oru', 'Eru'; Eg: tOru, sEru etc.

Class-24

-ಇದ್- (-id-) Verbs ending with 'ju', 'su'; Eg: mOju, aMkurisu etc.

Class-25

-ಇದ್- (-id-) Verbs ending with 'MTu', 'Mju', 'Mcu'; Eg: IMTu, aMju, hoMcu

Class-26

-ಇದ್- (-id-) Verbs ending with 'ELu', 'ILu'; Eg: hELu, sILu etc.

Class-27

-ಂದ್- (-nd-) Verbs ending with 'Eyu', 'Oyu'; Eg: bEyu, nOyu etc.

Class-28

-ಂದ್- (-nd-) Verbs ending with 'A' (aru); Eg: taru (tA), baru (bA) etc.

Class-29

-ಂದ್- (-nd-) Verbs ending with 'ollu', 'ellu', 'allu'; Eg: kollu, mellu, sallu etc.

Class-30

-ಂಡ್- (-nD-) Verb stems ending with 'ANu'; Eg: kANu

Class-31

-ಂಡ್- (-nD-) Verbs ending with 'oLLu'; Eg: koLLu

Class-32

-ಟ್- (-T-) Verbs ending with 'aDu', 'eDu', 'oDu', 'iDu', 'uDu'; Eg: aDu, keDu, koDu, iDu, uDu etc.

Class-33

-ಕ್ಕ- (-k-) Verbs ending with 'ggu' and 'gu'; Eg: oggu, sigu, nagu etc.

Class-34

-ದ್- (-d-) Verbs ending with 'kAyu'; Eg: kAyu, dArikAyu

Class-35

-ಂಡ್- (-nD-) Verbs ending with 'ko'; Eg: baggiko, bEDiko etc.
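The paradigm table can be read as a lookup from root ending to past tense marker. The sketch below is only an illustration of that reading: it uses a small subset of the classes, and the names `PARADIGMS` and `candidate_classes` are hypothetical. It also shows that the ending alone does not always fix the class (for example 'Eyu' appears in Classes 14, 19 and 27), so the lexicon presumably has to record the class of each root.

```python
# Hypothetical subset of the paradigm table above:
# (class number, root ending, romanized past tense marker).
PARADIGMS = [
    (9, "Agu", "d"), (9, "Ogu", "d"),
    (14, "Eyu", "d"),
    (16, "iru", "dd"),
    (19, "Eyu", "dd"),
    (21, "ADu", "id"), (21, "ODu", "id"),
    (27, "Eyu", "nd"),
    (30, "ANu", "nD"),
]


def candidate_classes(root):
    """Return every (class, marker) pair whose ending matches the root."""
    return sorted({(cls, marker)
                   for cls, ending, marker in PARADIGMS
                   if root.endswith(ending)})


print(candidate_classes("nODu"))  # one candidate: Class 21
print(candidate_classes("mEyu"))  # three candidates: Classes 14, 19 and 27
```

Because several classes share an ending, the function returns candidates rather than a single answer.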

The next important stage was designing a morphological system that generates all possible word forms for a given verb root. Fig. 6.1 shows the proposed flowchart for verb morphology, and Table 6.18 gives the meaning of each abbreviation used in the flowchart.

Table 6.18: Abbreviations in the System Design

Abbreviation | Examples
PAST: Past tense marker | tt, Mt, t, d, dd, id, Md, D, T, kk, MD, ttidd, Mtidd, tidd, didd, ddidd, idd, Mdidd, Didd, Tidd, kkidd, MDidd
PRESENT: Present tense marker | utt, yutt, uttidd, yuttidd, iyutt, iyuttidd
FUTURE: Future tense marker | uv, yuv
PNG: Person Number Gender | As shown in Table V
PN: Pronoun | avanu, avaLu, avaru, adu
PP: Past Participle | u, I, ǿ (null)
TM: Tense Marker | PRESENT, PAST, FUTURE
INF: Infinitive | isi, is, isal, isalu, isid, iyisid, sid, al, alu, i, sal, salu, si, s, a

The different levels show how different verb forms can be derived from the same root word. For example, the morphemes associated with the verb form 'ಓದಿಸಿನೋಡುತ್ತಾನೆ' (OdisinODuttAne) are:

ಓದು + ಇಸಿ + ನೋಡು + ಉತ್ತ್ + ಆನೆ

Odu + isi + nODu + utt + Ane

Root (Level 1) + INF (Level 2) + AUXV (Level 3) + PRESENT (Level 2) + PNG (Level 3)
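The decomposition above can be reversed by concatenating the morphemes. The following sketch makes one simplifying assumption that is not stated in the text: a morpheme-final 'u' is dropped when the next morpheme begins with a vowel. The function name `join_morphemes` is hypothetical, and a full system would need the complete set of sandhi rules.

```python
VOWELS = set("aAiIuUeEoO")


def join_morphemes(morphemes):
    """Concatenate romanized morphemes, dropping a final 'u' before a vowel.

    The u-elision rule is an assumption made for this example only.
    """
    word = morphemes[0]
    for morpheme in morphemes[1:]:
        if word.endswith("u") and morpheme[0] in VOWELS:
            word = word[:-1]  # elide the final 'u' of the preceding morpheme
        word += morpheme
    return word


print(join_morphemes(["Odu", "isi", "nODu", "utt", "Ane"]))  # OdisinODuttAne
```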

6.3 PROPOSED RULE BASED MAG

The proposed rule based MAG tool was developed using the AT&T Finite State Machine. This section describes the work required to create the proposed rule based MAG system.

6.3.1 Information Required to Build MAG

The following information is required to build a morphological analyzer and generator:


6.3.1.1 Lexicon

The list of stems and affixes, together with basic information about them (noun stem, verb stem, etc.).

6.3.1.2 Morphotactics

The model of morpheme ordering, which explains which classes of morphemes can follow other classes of morphemes inside a word. For example, the Kannada plural morpheme follows the noun stem rather than preceding it.

6.3.1.3 Orthographic rules

These are spelling rules used to model the changes that occur in a word, usually when two morphemes combine. For example: insert "yu" on the surface tape when the lexical tape has a morpheme ending in 'e' (or 'i', etc.) and the next morphemes are "tt" (PRES) and "Ane" (3SM).

Fig. 6.1: Proposed flowchart for verb morphology. Nodes: Root, INF, PAST, PRESENT, FUTURE, TM, PP, RP, NEG, PN and PNG, with suffix alternants (for example uva/yuva/LLuva and ade/yade/LLade) and auxiliary verbs (for example nODu, koLLu, hOgu, biDu, iru).

beLe + insert "yu" + PRES(tt) + 3SM(Ane) -> beLe-yu-tt-Ane = beLeyuttAne
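As an illustration of the orthographic rule, the sketch below applies the "yu" insertion to a list of romanized morphemes. It covers only the single rule quoted in the text, restricted to roots ending in 'e' or 'i'. The function name `apply_yu_insertion` is hypothetical and this is not the rule syntax of the FST tool.

```python
def apply_yu_insertion(morphemes):
    """Insert 'yu' after a root ending in 'e' or 'i' when the following
    morphemes are 'tt' (PRES) and 'Ane' (3SM)."""
    root, rest = morphemes[0], list(morphemes[1:])
    if root[-1] in ("e", "i") and rest[:2] == ["tt", "Ane"]:
        return [root, "yu"] + rest
    return [root] + rest


print("".join(apply_yu_insertion(["beLe", "tt", "Ane"])))  # beLeyuttAne
print("".join(apply_yu_insertion(["Odu", "tt", "Ane"])))   # OduttAne (rule does not fire)
```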

6.3.2 Creation of rules using FST

An FST is essentially a finite state automaton that works on two (or more) tapes. The most common way to think about a transducer is as a kind of “translating machine” that reads from one tape and writes onto the other. For example, on one tape we read “ಉದ್ಯಾನಗಳು” and on the other we write “ಉದ್ಯಾನ+N+PL”, or the other way around, as shown in Fig. 6.2. The label “ಉ:ಉ” means read the symbol “ಉ” on one tape and write the same “ಉ” on the other tape. Similarly, “+N:ε” means read the symbol “+N” on one tape and write nothing on the other.
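The two-tape idea can be sketched with a toy transducer in plain Python. This is a minimal illustration and does not use the AT&T tools: the states, the transition table and the romanized example 'mara' are assumptions chosen for the sketch. Each transition pairs a lexical symbol with a surface string, so the same table serves generation (lexical to surface) and analysis (surface to lexical).

```python
import string

# Each transition is (lexical symbol, surface string, next state).
# State 0 copies stem letters; "+N" maps to the empty string (epsilon);
# "+PL" maps to the plural suffix "gaLu".
TRANSITIONS = {
    0: [(ch, ch, 0) for ch in string.ascii_letters] + [("+N", "", 1)],
    1: [("+PL", "gaLu", 2)],
    2: [],
}
FINAL_STATES = {1, 2}


def generate(lexical_symbols):
    """Read the lexical tape and write the surface tape."""
    state, surface = 0, []
    for symbol in lexical_symbols:
        for lex, surf, nxt in TRANSITIONS[state]:
            if lex == symbol:
                surface.append(surf)
                state = nxt
                break
        else:
            return None  # no transition accepts this symbol
    return "".join(surface) if state in FINAL_STATES else None


def analyze(surface, state=0, pos=0, path=()):
    """Read the surface tape and return every accepted lexical string."""
    results = []
    if pos == len(surface) and state in FINAL_STATES:
        results.append("".join(path))
    for lex, surf, nxt in TRANSITIONS[state]:
        if surface.startswith(surf, pos):
            results.extend(analyze(surface, nxt, pos + len(surf), path + (lex,)))
    return results


print(generate(["m", "a", "r", "a", "+N", "+PL"]))  # maragaLu
print(analyze("maragaLu"))  # includes 'mara+N+PL'
```

Because the toy stem loop accepts any letter sequence, analysis also returns the spurious reading 'maragaLu+N'. A real system avoids this by restricting stems to those listed in the lexicon.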

Fig. 6.2: FST working principle (lexical level and surface level)

FSTs can be used for both analysis and generation (they are bidirectional), and they act as a two-level morphology [162, 163]. A word is represented as a correspondence between a lexical level and a surface level. The lexical level represents a simple concatenation of the morphemes making up a word, whereas the surface level represents the actual spelling of the final word.

6.3.3 Architecture of Proposed MAG Model

Using all the relevant morphological feature information of Kannada words, well-defined sandhi rules were created based on FST. The architecture of the proposed rule based MAG system is shown in Fig. 6.3.

For the morphological generator, if the string containing the root word and its morphemic information is accepted by the automaton, then the automaton generates the corresponding root word and morpheme units at the first level, as shown in Fig. 6.4.

Fig. 6.3: Architecture of proposed MAG model

Here “beLe” is the root word, “V” indicates that the category of the root word is verb, “PRES” and “FUT” indicate the tense markers for present tense and future tense respectively, and “3SM” indicates the PNG marker for third person singular masculine.


Fig. 6.4: Example for Morphotactics Rule

The output of the first level becomes the input of the second level, where the

orthographic (sandhi) rules are handled as shown in Fig. 6.5. If it gets accepted then it

generates the inflected word.

Fig. 6.5: Application of Sandhi Rule

The sandhi rules should be written in such a way that, if the root word ends with “e”

and the next morphemes are “tt”(PRES) or “Ane”(3SM), then insert “yu” immediately

after the root word. Fig. 6.6 below shows the corresponding sandhi rule.


Fig. 6.6: Example for Sandhi Rule

Based on the inflections and differences, all possible Morphotactic and Sandhi rules

were written for all forms of nouns and verbs using FST.

Example Rules for Noun Morphology

####Plural "gaLu"#####

[b2] [<epsilon>] / __ gaLu

This rule works as follows. The input is given in the form 'Root+N+PL', for example 'mara+N+PL', where mara (tree) is the root word. The symbol '+' is treated as [b2], which is already defined in the alphabets file. The tag '+N' is replaced by [<epsilon>], that is, the empty string, and the plural marker '+PL' is replaced by 'gaLu', so the tool gives the output 'maragaLu'.

Example Rules for Verb Morphology

####PRESENT “tt”#####

[b2] [<epsilon>] / __ tt

This rule works as follows. The input is given in the form 'Root+V+PRES', for example 'Odu+V+PRES', where Odu (read) is the root word. The symbol '+' is treated as [b2], which is already defined in the alphabets file and is common to both nouns and verbs. The tag '+V' is replaced by [<epsilon>], that is, the empty string, and the present tense marker '+PRES' is replaced by 'tt', so the tool gives the output 'Odutt'.
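The effect of the two example rules can be mimicked in a few lines of Python. This is only a sketch of the input/output behaviour described above. It is not the rule syntax of the FST tool, and the table `TAG_TO_SURFACE` and the function `realize` are hypothetical names.

```python
# Category tags map to the empty string; feature tags map to their suffix.
TAG_TO_SURFACE = {"N": "", "V": "", "PL": "gaLu", "PRES": "tt"}


def realize(lexical_form):
    """Turn 'Root+TAG+TAG' into the surface string by replacing each tag."""
    root, *tags = lexical_form.split("+")
    return root + "".join(TAG_TO_SURFACE[tag] for tag in tags)


print(realize("mara+N+PL"))    # maragaLu
print(realize("Odu+V+PRES"))   # Odutt
```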


6.4 MORPHOLOGICAL ANALYZER USING MACHINE LEARNING APPROACH

The proposed morphological analyzer model was developed with a supervised machine learning approach, using the popular classification and regression tool called SVM. In the supervised machine learning approach, the training corpus consists of pairs of input objects and the desired outputs.

Generally, in the morphological analysis process, a complex word form is transformed into a root and suffixes. In the machine learning approach, all the rules, including complex spelling rules, are handled by the classification task. Machine learning approaches need only corpora with linguistic information and do not require any hand-coded morphological rules [164]. The morphological or linguistic rules are automatically extracted from the annotated corpora. In the proposed method, sequence labelling is used to align the parallel corpus, and the morphological analysis problem is converted into a classification problem.

6.4.1 Corpus development

The performance of the morphological analyzer depends greatly on the aligned morphological corpora, which should be large, of good quality and representative. The sequence of steps involved in the corpus development is shown in Fig. 6.7.

6.4.1.1 Romanization

The SVM tool accepts only Roman characters, whereas text in Dravidian languages such as Kannada is encoded in Unicode. Mapping files were therefore created and used to map from Unicode to Roman and vice versa. The Romanized aligned input-output corpus, consisting of the most commonly used verbs selected from all verb paradigms, was created manually.
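A mapping of this kind might look like the sketch below. The actual mapping files are not given in the text, so the tables here are an assumed subset that follows the romanization used in this chapter (capital letters for long vowels and retroflex consonants), and the function names are hypothetical. Kannada code points are written as escapes to keep the tables unambiguous.

```python
# Assumed subset of a Kannada <-> Roman mapping (not the thesis mapping files).
VOWELS = {"\u0c85": "a", "\u0c86": "A", "\u0c87": "i", "\u0c88": "I",
          "\u0c89": "u", "\u0c8a": "U", "\u0c8e": "e", "\u0c8f": "E",
          "\u0c92": "o", "\u0c93": "O"}            # independent vowels
SIGNS = {"\u0cbe": "A", "\u0cbf": "i", "\u0cc0": "I", "\u0cc1": "u",
         "\u0cc2": "U", "\u0cc6": "e", "\u0cc7": "E", "\u0cca": "o",
         "\u0ccb": "O"}                            # dependent vowel signs
CONSONANTS = {"\u0c95": "k", "\u0c97": "g", "\u0c9f": "T", "\u0ca1": "D",
              "\u0ca4": "t", "\u0ca6": "d", "\u0ca8": "n", "\u0cac": "b",
              "\u0cae": "m", "\u0caf": "y", "\u0cb0": "r", "\u0cb2": "l",
              "\u0cb3": "L", "\u0cb5": "v", "\u0cb8": "s", "\u0cb9": "h"}
VIRAMA = "\u0ccd"                                  # suppresses the inherent 'a'

R2V = {r: k for k, r in VOWELS.items()}
R2S = {r: k for k, r in SIGNS.items()}
R2C = {r: k for k, r in CONSONANTS.items()}


def to_roman(text):
    """Kannada (Unicode) to Roman; a bare consonant carries an inherent 'a'."""
    out, pending_a = [], False
    for ch in text:
        if ch in CONSONANTS or ch in VOWELS:
            if pending_a:
                out.append("a")
            out.append(CONSONANTS.get(ch) or VOWELS[ch])
            pending_a = ch in CONSONANTS
        elif ch in SIGNS:
            out.append(SIGNS[ch])
            pending_a = False
        elif ch == VIRAMA:
            pending_a = False
        else:
            raise ValueError(f"unmapped character: {ch!r}")
    return "".join(out) + ("a" if pending_a else "")


def to_kannada(roman):
    """Roman to Kannada (Unicode), inserting the virama between consonants."""
    out, after_consonant = [], False
    for ch in roman:
        if ch in R2C:
            if after_consonant:
                out.append(VIRAMA)
            out.append(R2C[ch])
            after_consonant = True
        elif ch in R2V:
            out.append(R2S.get(ch, "") if after_consonant else R2V[ch])
            after_consonant = False
        else:
            raise ValueError(f"unmapped character: {ch!r}")
    return "".join(out) + (VIRAMA if after_consonant else "")


print(to_roman(to_kannada("beLeyuttAne")))  # beLeyuttAne
```

The sketch handles only single-letter Roman symbols, so aspirated consonants, the anusvara and diphthongs would need additional table entries.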


Fig. 6.7: Preprocessing steps

6.4.1.2 Segmentation

This step involves four different stages: grapheme segmentation, syllable splitting, C-V (Consonant-Vowel) representation and segmentation. In the first stage, every Romanized word in the corpus is segmented based on Kannada graphemes. In the second stage, these graphemes are split into consonant and vowel syllables. In the next stage, the consonant and vowel markers “-C” and “-V” are appended to the segmented consonant and vowel syllables respectively. The symbol “*” is used to indicate the morpheme boundaries in the output data.
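The C-V representation stage can be illustrated as follows. This sketch assumes that each Roman character is one segment and that the vowel set is the one used in this chapter's romanization. The function name `cv_segments` is hypothetical.

```python
VOWELS = set("aAiIuUeEoO")


def cv_segments(word):
    """Append '-V' to vowel characters and '-C' to all other characters."""
    return [f"{ch}-{'V' if ch in VOWELS else 'C'}" for ch in word]


print(" ".join(cv_segments("kalitanu")))
# k-C a-V l-C i-V t-C a-V n-C u-V
```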

6.4.1.3 Alignment and Mapping

Using the sequence labelling approach [165], the segmented input and output words were aligned vertically, segment by segment, with a space between each input segment and its output label. Table 6.19 shows a sample alignment of input and output data in the corpus.

Table 6.19: Sample Training Data Format

Input:  k-C a-V l-C i-V t-C a-V n-C u-V
Output: k   a   l   i*  t*  a   n   u*
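The output labels in Table 6.19 can be derived from a morpheme segmentation by marking the last character of each morpheme with “*”. The sketch below pairs each input segment with its label, one pair per line. The exact column layout expected by the training tool is an assumption here, and the function names are hypothetical.

```python
def boundary_labels(morphemes):
    """Label every character; the last character of each morpheme gets '*'."""
    labels = []
    for morpheme in morphemes:
        labels.extend(morpheme[:-1])
        labels.append(morpheme[-1] + "*")
    return labels


def training_lines(segments, morphemes):
    """Pair input segments with output labels, one 'segment label' per line."""
    labels = boundary_labels(morphemes)
    if len(segments) != len(labels):
        raise ValueError("input and output lengths differ; resolve the mismatch first")
    return [f"{segment} {label}" for segment, label in zip(segments, labels)]


segments = "k-C a-V l-C i-V t-C a-V n-C u-V".split()
print("\n".join(training_lines(segments, ["kali", "t", "anu"])))
```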

6.4.1.4 Dissolving Mismatch

When input and output words are mapped, a mismatch may occur because the number of input units is either larger or smaller than the number of output units. In the first case, when there are more input units than output units, the mismatch is resolved by inserting a null symbol '$' in the output vector, based on the morphosyntactic rules. For example, the Kannada verb 'kaliyuttAne' has 11 segments in the input sequence and 10 segments in the output sequence.

Due to the morphosyntactic rule, the 'y' in the input sequence has no counterpart in the output sequence. To equalize the input and output data in this situation, the 'y' in the training data is mapped to the empty symbol '$', as shown in Table 6.20.

In the second case, as opposed to the first, the input sequence has fewer segments than the corresponding output sequence. To illustrate this, consider the Kannada verb 'OdidaLu', which has 7 segments in the input sequence but 8 segments in the output sequence. To overcome this problem, based on the morphosyntactic rule, the first occurrence of 'd-C' in the input sequence is mapped to the two segments 'd' and 'u' of the output sequence in the training data.

Table 6.20: Dissolving Mismatch

Case 1
Input sequence (11 segments): k-C | a-V | l-C | i-V | y-C | u-V | t-C | t-C | A-V | n-C | e-V
Mismatched output labels (10 segments): k | a | l | i* | u | t | t* | A | n | e*
Corrected output labels (11 segments): k | a | l | i* | $ | u | t | t* | A | n | e*

Case 2
Input sequence (7 segments): O-V | d-C | i-V | d-C | a-V | L-C | u-V
Mismatched output labels (8 segments): O | d | u* | i | d* | a | L | u*
Corrected output labels (7 segments): O | du* | i | d* | a | L | u*
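Both corrections in Table 6.20 can be reproduced mechanically. The sketch below is not the procedure used in this work, which relies on morphosyntactic rules. It substitutes a generic sequence alignment from Python's `difflib` to find input characters with no output counterpart (mapped to '$') and output labels with no input counterpart (merged into the preceding label). The function name `dissolve_mismatch` is hypothetical, and the inputs are given as bare characters without the C-V markers.

```python
from difflib import SequenceMatcher


def dissolve_mismatch(inputs, outputs):
    """Return output labels padded or merged to match the input length."""
    bare = [label.rstrip("*") for label in outputs]  # ignore boundary marks
    matcher = SequenceMatcher(None, inputs, bare, autojunk=False)
    corrected = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            corrected.extend(outputs[j1:j2])
        elif op == "delete":
            # Case 1: input segments without an output label get '$'.
            corrected.extend("$" * (i2 - i1))
        elif op == "insert" and corrected:
            # Case 2: extra output labels are merged into the previous label.
            corrected[-1] += "".join(outputs[j1:j2])
        else:
            raise ValueError(f"unsupported alignment operation: {op}")
    return corrected


case1 = dissolve_mismatch(list("kaliyuttAne"), "k a l i* u t t* A n e*".split())
case2 = dissolve_mismatch(list("OdidaLu"), "O d u* i d* a L u*".split())
print(" | ".join(case1))
print(" | ".join(case2))
```

For these two words the result agrees with the corrected labels in Table 6.20. Other words may align differently from what the morphosyntactic rules prescribe, so the output would need manual checking.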


6.4.2 Morphological Analyzer Model Creation

The architecture of proposed model mainly consists of three different phases as shown

in Fig. 6.8.

Fig. 6.8: Architecture of Morphological Analyzer Model

The pre-processed parallel corpus consists of sequences of input characters and their corresponding output labels. The parallel corpus, consisting of more than 200,000 words, was used for training with the SVMTlearn tool, and the morphological analyzer model called model-I was generated. Model-I is used to analyze the given test word and identify the different morphemes associated with it. Similarly, another model called model-II was created for assigning grammatical classes to each morpheme in a word. This second model was trained using sequences of morphemes and their grammatical categories.

The working principle is as follows. The test input word is given to the trained model-I, which predicts a label for each input segment. In the next phase, the segmented morphemes are given to the trained model-II, which predicts the grammatical category of each segmented morpheme of the given input word.
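Between the two models, the labels predicted by model-I have to be turned back into morphemes before model-II can tag them. The sketch below shows one plausible decoding, assuming the label conventions of Tables 6.19 and 6.20 ('*' closes a morpheme, '$' is the null symbol). The function name `labels_to_morphemes` is hypothetical, and the models themselves are not shown.

```python
def labels_to_morphemes(labels):
    """Rebuild morphemes from per-segment labels ('*' = boundary, '$' = null)."""
    morphemes, current = [], ""
    for label in labels:
        if label == "$":
            continue  # null symbol: the input segment has no output
        if label.endswith("*"):
            morphemes.append(current + label[:-1])
            current = ""
        else:
            current += label
    if current:
        morphemes.append(current)  # trailing segments without a boundary mark
    return morphemes


print(labels_to_morphemes("k a l i* $ u t t* A n e*".split()))
# ['kali', 'utt', 'Ane']
```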


6.5 SYSTEM EVALUATION AND PERFORMANCE

Developing a MAG that handles all types of word forms is a challenging task. The developed rule based MAG is capable of analyzing and generating a list of twenty thousand nouns, around three thousand verbs and a relatively smaller list of adjectives. A distinctive feature of the developed MAG is its capacity to generate and analyze transitive, causative and tense forms, in addition to passive constructions, auxiliaries and verbal nouns. The performance of the developed system can be substantially improved by adding more rules, such as rules for complex morphology. The accuracy of the developed MAG can also be improved by checking it against more and more different types of word lexicons. A rule based machine translation system for English to Kannada was developed using the proposed MAG. Fig. 6.9 shows a command line screenshot of the developed RMAG.

In the second proposed method, a parallel corpus consisting of more than 200,000 words was used for training with the SVMTlearn tool, and the morphological analyzer model was generated. Using SVMTagger, we tested more than 50,000 different verb forms selected from two standard dictionaries [166, 167] and from the Amrita POS tagged corpus. The performance of the system was evaluated using the SVMTeval tool, and the incorrect outputs were noted. In contrast to the rule based approach, the performance of this system increases considerably when the input words whose outputs were incorrect during testing and evaluation are added to the training corpus. In this experiment the system achieved a competitive accuracy of 96.25% for Kannada verbs.


Fig. 6.9: Command Line Output of Morph Tool

Sample screenshots of the developed MAG model for nouns are shown in Fig. 6.10 and 6.11. Similarly, Fig. 6.12 and 6.13 show screenshots of the developed MAG model for verbs.

Fig. 6.10: Screenshot of Kannada Morph analyzer for Noun


Fig. 6.11: Screenshot of Kannada Morph generator for Noun

Fig. 6.12: Screenshot of Kannada Morph analyzer for Verb


Fig. 6.13: Screenshot of Kannada Morph generator for Verb

6.6 SUMMARY

This chapter is the part of the research work that deals with the design and development of a morphological analyzer and generator for Kannada using rule based as well as statistical approaches.

Developing a morphological analyzer and generator for all types of word forms is a challenging task for an agglutinative language like Kannada. The implementations aimed to incorporate more lexical information about Kannada, with good semantic features, in order to solve the morphological problem more effectively. The performance of the statistical approach depends on large aligned bilingual corpora covering all types of word categories. On the other hand, the performance of the rule based approach depends on having all types of simple and complex linguistic rules, in order to cover all types of word forms.

The performance of the developed rule based system can be substantially improved by adding more rules and by checking against more and more different types of word lexicons. The performance of the statistical approach, in turn, can be improved by increasing the corpus size to cover other word categories such as nouns, pronouns and adverbs.


6.7 PUBLICATIONS

1. Antony P J, Anand Kumar M and Soman K P: “Paradigm Based Morphological Analyzer for Kannada Language Using Machine Learning Approach”, International Journal on Advances in Computer Science and Technology (ACST), ISSN 0973-6107, Vol. 3, No. 4, 2010, pp. 457–481.

2. Ramasami Veerappan, Antony P J and Soman K P: “A Rule based Kannada Morphological Analyzer and Generator using Finite State Transducer”, International Journal on Computer Application - IJCA (0975 – 8887), Volume 27, No. 10, August 2011.