CHAPTER 6
KANNADA MORPHOLOGICAL ANALYZER AND GENERATOR
From a computational perspective, the creation and availability of a morphological analyzer
and generator (MAG) for a language is important. Morphology plays a significant role in NLP,
as seen in applications such as MT, QA systems, IE, IR, spell checking and lexicography.
Morphological analyzers and generators are essential for languages with rich inflectional
and derivational morphology, such as Kannada. A morphological analyzer returns all the
morphemes of a given word form together with their grammatical categories. Conversely,
given a root word and grammatical information, a morphological generator produces the
corresponding word form. Developing full-fledged morphological analyzer and generator tools
for a highly agglutinative language like Kannada is a challenging task. To build them, one
has to account for the morphological peculiarities of the language. Some peculiarities of
Kannada, such as the usage of classifiers and the excessive presence of vowel harmony, make
it morphologically complex and thus a challenge in NLG.
Table 6.1: Input/Output examples for morphological analyzer
Input | Output
ಆನೆಗಳು (AnegaLu) | ಆನೆ+ಗಳು (Ane+gaLu)
ಹೋಗುತ್ತೇನೆ (hOguttEne) | ಹೋಗು+ಉತ್ತ್+ಏನೆ (hOgu+utt+Ene)
The function of the morphological analyzer is to segment a given word into its component
morphemes and assign the correct morpho-syntactic information. Table 6.1 shows examples of
morphological analysis of Kannada words. The function of the morphological generator is to
combine the constituent morphemes to obtain the actual word. Table 6.2 shows examples of
morphological generation of Kannada words.
Table 6.2: Input/output Examples for Morphological Generator
Input | Output
ಆನೆ+ಗಳು (Ane+gaLu) | ಆನೆಗಳು (AnegaLu)
ಹೋಗು+ಉತ್ತ್+ಏನೆ (hOgu+utt+Ene) | ಹೋಗುತ್ತೇನೆ (hOguttEne)
The first section of this chapter explains the development of a Rule Based Morphological
Analyzer and Generator (RBMAG) system for the Kannada language that incorporates the
morphological information and peculiarities of the language. The proposed RBMAG system was
developed using FST.
The second section of this chapter explains the development of a statistical morphological
analyzer for Kannada verbs. The statistical system is a paradigm based morphological
analyzer developed using a machine learning approach. It was designed as a sequence
labelling task, and training, testing and evaluation were performed using SVM algorithms.
6.1 CHALLENGES IN KANNADA MORPHOLOGY
Kannada belongs to the south Dravidian family of languages. Kannada morphology is
agglutinative or concatenative, i.e., words are formed by adding suffixes to the root word
in a series. When suffixes attach to the root word, several morphophonemic changes take
place. The order in which suffixes attach to a root word determines the morpho-syntax. It is
a verb-final inflectional language with relatively free word order. Out of 38 basic
characters, 330 conjuncts are formed by combining vowels and consonants. There are more than
10,000 basic stems (root words) in Kannada, and more than a million morphed variants are
formed from more than 5000 distinct character variants [158,159].
The complexity of developing a morphological analyzer for a Dravidian language like Kannada
is comparatively higher than for languages like English. Most words may change spelling when
stems are inflected. In an agglutinative language like Kannada, a root word is normally
affixed with several morphemes to generate thousands of word forms. To build an effective
morphological analyzer, one should carefully analyze and identify all these roots and
morphemes. The next challenge is the design of the morphological structure and the
generation of a well organized corpus that covers, as far as possible, all types of
inflections.
6.1.1 Agglutination of Kannada Language
Agglutination is the most critical and important feature of the Kannada language. Due to
the highly agglutinating nature of this language and the morphophonemic variations that
take place at the point of agglutination, it is very difficult to mark word boundaries. For
example, consider the verb form ‘ಓದಿಕೊಂಡಿದ್ದವನ’ (OdikoMDiddavana), “the one (masculine) who
was reading”. The different meaningful parts of this word are as follows:
ಓದು + ಇ + ಕೊಳ್ಳು + ಂಡ್ + ಉ + ಇರು + ದ್ದ್ + ಅ + ಅವನು + ಅ
Odu + i + koLLu + MD + u + iru + dd + a + avanu + a
Root + VBP + AUXV + PST + VBP + AUXV + PST + RP + PRON-3SM + ACC
The above word consists of ten meaningful parts: one root word (Root), two verbal
participles (VBP), two auxiliary verbs (AUXV), two past tense markers (PST), one relative
participle (RP), one pronoun (PRON-3SM) and one accusative marker (ACC).
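The segmentation above can be held as a simple sequence of morpheme/tag pairs. The following is a minimal sketch in Python, using the transliterated morphemes and tags from the example; the names `MORPHEMES`, `TAGS` and `analysis_string` are illustrative assumptions and not part of the developed system.

```python
from collections import Counter

# Segmentation of OdikoMDiddavana, copied from the example above.
MORPHEMES = ["Odu", "i", "koLLu", "MD", "u", "iru", "dd", "a", "avanu", "a"]
TAGS = ["Root", "VBP", "AUXV", "PST", "VBP", "AUXV", "PST", "RP", "PRON-3SM", "ACC"]


def analysis_string(morphemes, tags):
    """Pair each morpheme with its tag, e.g. 'Odu/Root + i/VBP + ...'."""
    if len(morphemes) != len(tags):
        raise ValueError("each morpheme needs exactly one tag")
    return " + ".join(f"{m}/{t}" for m, t in zip(morphemes, tags))


# Count how often each grammatical category occurs in the word.
tag_counts = Counter(TAGS)

print(analysis_string(MORPHEMES, TAGS))
print(len(MORPHEMES), "parts;", dict(tag_counts))
```

Keeping the morphemes and tags as parallel sequences mirrors the three aligned rows of the example and makes the count of ten parts easy to check.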
6.1.2 Types and Features of Kannada Words
In general, there are three types of Kannada words: i) namapada (declinable words or
nouns), ii) kriyapada (conjugable words or verbs) and iii) avyaya (uninflected words).
Nouns, pronouns and adjectives are declinable words and are inflected for case, number and
gender. Conjugable words are inflected for person, gender, number, aspect, mood and tense.
Kannada words have three genders: masculine, feminine and neuter. Declinable and conjugable
words have two numbers: singular and plural. The singular has no distinguishing marker. The
plural marker is usually “gaLu”, with the following exceptions. Masculine nouns ending in
“a” (e.g., huDuga) and some feminine nouns ending in “u” (e.g., heMgasu) form the plural
with “aru”. Feminine nouns ending in “i” (e.g., huDugi) or “e” (e.g., atte) form the plural
with “yaru”. For kinship terms (e.g., aNNa), the plural marker is often “aMdiru”. Some nouns
have irregular plurals, such as “makkaLu”, the plural of the noun “magu”.
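The plural rules described above can be sketched as a small decision procedure over transliterated nouns. This is only an illustration under stated assumptions: the function name `pluralize`, its `gender` and `kinship` parameters, and the vowel-dropping sandhi before “aru” and “aMdiru” are assumptions made for the sketch, and the “u”-final feminine rule is applied to every such noun although the text says it holds only for some.

```python
# Irregular plurals are looked up directly (example taken from the text).
IRREGULAR_PLURALS = {"magu": "makkaLu"}


def pluralize(noun, gender="neuter", kinship=False):
    """Return a plural form for a transliterated Kannada noun (toy rules)."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if kinship:
        # Assumed sandhi: the final vowel is dropped before 'aMdiru'.
        return noun[:-1] + "aMdiru"
    if gender == "masculine" and noun.endswith("a"):
        # Assumed sandhi: final 'a' merges with the 'a' of 'aru'.
        return noun[:-1] + "aru"
    if gender == "feminine" and noun.endswith("u"):
        # Over-generalised: the text restricts this to some feminine nouns.
        return noun[:-1] + "aru"
    if gender == "feminine" and noun.endswith(("i", "e")):
        return noun + "yaru"
    # Default plural marker.
    return noun + "gaLu"


for args in [("Ane",), ("huDuga", "masculine"), ("huDugi", "feminine"),
             ("atte", "feminine"), ("aNNa", "masculine", True), ("magu",)]:
    print(args[0], "->", pluralize(*args))
```

The irregular lookup is checked first so that lexical exceptions always override the suffix rules.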
6.1.3 Noun Cases and Characteristic Suffixes
The case system of Kannada is similar to those of other south Dravidian languages like
Tamil, Telugu and Malayalam. Nouns usually end in a, e, i, u, A or in a consonant [158].
Various suffixes are added to the noun stem to indicate different relationships between the
noun and other constituents of the sentence. The suffix used for a particular case depends
on the type of noun and its final character. For example, the characteristic suffixes of the
dative case are chosen according to the criteria shown in Table 6.3.
Table 6.3: Dative Case Characteristic Suffixes for Nouns
Noun type | Ends with | Dative suffix | Example noun | Dative form
Neuter noun | ಅ (a) | ಕ್ಕೆ (kke) | ಮರ (mara) | ಮರಕ್ಕೆ (marakke)
Neuter noun | ಎ, ಇ, ಉ (e, i, u) | ಗೆ (ge) | ಮನೆ (mane) | ಮನೆಗೆ (manege)
Neuter noun | consonants | ಇಗೆ (ige) | ಊರು (Uru) | ಊರಿಗೆ (Urige)
Neuter determinative | - | ಅಕ್ಕೆ (akke) | ಇದು (idu) | ಇದಕ್ಕೆ (idakke)
Rational noun | - | ನಿಗೆ (nige) | ಅಣ್ಣ (aNNa) | ಅಣ್ಣನಿಗೆ (aNNanige)
Table 6.4 below shows the different cases and their corresponding characteristic
suffixes for nouns.
Table 6.4: Noun Cases and their Characteristic Suffixes
Feature | Characteristic suffix (singular) | Characteristic suffix (plural)
Nominative (Prathama) | ವು (vu) / ಯು (yu) / ಉ (u) / ನು (nu) | ಗಳು (gaLu) / ರು (ru) / -ಂದಿರು (Mdiru) / ಯರು (yaru)
Accusative (Dwitiya) | ವನ್ನು (vannu) / ಯನ್ನು (yannu) / ಅನ್ನು (annu) / ನನ್ನು (nannu) | ಗಳನ್ನು (gaLannu) / -ಂದಿರನ್ನು (Mdirannu) / ಯರನ್ನು (yarannu) / ರನ್ನು (rannu)
Instrumental (Tritiya) | ದಿಂದ (diMda) / ಯಿಂದ (yiMda) / ಇಂದ (iMda) / ನಿಂದ (niMda) | ಗಳಿಂದ (gaLiMda) / -ಂದಿರಿಂದ (MdiriMda) / ಯರಿಂದ (yariMda) / ರಿಂದ (riMda)
Dative (Chaturthi) | ಕ್ಕೆ (kke) / ಗೆ (ge) / ಇಗೆ (ige) / ನಿಗೆ (nige) / ಅಕ್ಕೆ (akke) | ಗಳಿಗೆ (gaLige) / -ಂದಿರಿಗೆ (Mdirige) / ಯರಿಗೆ (yarige) / ರಿಗೆ (rige)
Ablative (Panchami) | ದೆಸೆಯಿಂದ (deseyiMda) | ಗಳದೆಸೆಯಿಂದ (gaLadeseyiMda) / -ಂದಿರದೆಸೆಯಿಂದ (MdiradeseyiMda) / ಯರದೆಸೆಯಿಂದ (yaradeseyiMda) / ರದೆಸೆಯಿಂದ (radeseyiMda)
Genitive (Shashti) | ದ (da) / ಯ (ya) / ಇನ (ina) / ನ (na) / ಅ (a) / ವಿನ (vina) / ರ (ra) | ಗಳ (gaLa) / -ಂದಿರ (Mdira) / ಯರ (yara) / ರ (ra)
Locative (Saptami) | ದಲ್ಲಿ (dalli) / ಯಲ್ಲಿ (yalli) / ಅಲ್ಲಿ (alli) / ನಲ್ಲಿ (nalli) | ಗಳಲ್ಲಿ (gaLalli) / -ಂದಿರಲ್ಲಿ (Mdiralli) / ಯರಲ್ಲಿ (yaralli) / ರಲ್ಲಿ (ralli)
Vocative (Sambhodana) | ಏ (E) / ವೇ (vE) / ಆ (A) / ಈ (I) | ಗಳೇ (gaLE) / -ಂದಿರೇ (MdirE) / ಯರೇ (yarE)
6.1.4 Verb Morphology
Compared with other Dravidian languages like Malayalam, the morphological structure of
Kannada is more complex because its verbs inflect for person, gender and number. In verb
morphology, each root word combines with auxiliaries that indicate aspect, mood, causation,
attitude etc. The unique structure of the verbal complex makes it very challenging to
capture in a machine analyzable and generatable format. The formation of the verbal complex
also involves the arrangement of the verbal units and the interpretation of their combined
meaning. Phonology also plays a role in word formation through ‘morphophonemic’ and ‘sandhi’
rules, which account for the shape changes due to inflection. To resolve the computational
challenges in verb morphological analysis, I have classified verbs into 35 distinct
paradigms, and verbs are grouped based on their paradigm class.
6.1.4.1 Inflections and Features of Kannada Verbs
Verb forms can be broadly classified into two types: finite verbs and non-finite verbs.
Finite verbs usually occur at the end of a sentence and, with the exception of clitics or
reportives, can have nothing added to them. The general sentence order with a finite verb is
Subject-Object-Verb. Some of the finite forms of verbs are imperatives, present and past
forms marked with PNG, modals and verbal/participle nouns. In the affirmative, the tense can
be past, present or future. The negative form does not take tense. Non-finite verbs, in
contrast, cannot stand alone and must be followed by some other form. Non-finite verb forms
include infinitives, verbal and adjectival participles and tense-marked verb stems [158].
The non-past denotes both present and future tenses and, unlike in Malayalam (another south
Dravidian language), all tenses have different tense markers in Kannada. Mood is another
important feature of Kannada and is associated with statements of fact versus possibility,
supposition, etc. [160]. Four moods are expressed in Kannada: infinitive, imperative,
affirmative and negative. Kannada also has some additional modal forms: indicative,
conditional, optative, potential, monitory and conjunctive.
Kannada also has past verb stems in addition to simple verb stems; these are used in forming
the past tense, past participles, conditionals and some other constructions. The contingent
form is another distinguishing feature of Kannada that is not present in any other Dravidian
language [161]. Past verbs are broadly classified into two types, regular and irregular (or
semi-regular). Regular verbs form their past by adding ‘id’ to the verb stem. Irregular
verbs form it by adding one of the other past tense markers shown in Table 6.18.
6.1.4.1.1 The Infinitive
The infinitive is a non-finite form of the verb that occurs together with other verbs,
auxiliary verbs (modals), negative morphemes and some other forms. There are two types of
infinitive in Kannada, ‘ಅಲ್’ (al) and ‘ಓಕ್ಕೆ’ (Okke). Both are added to the verb root to
generate other word forms, as shown in Table 6.5.
Table 6.5: The Infinitive
Case | Example and Meaning
Case 1 | ಬರು (baru, ‘come’) + ಅಲ್ (al) + ಇಲ್ಲ (illa, negative) = ಬರಲಿಲ್ಲ (baralilla), ‘didn’t come’
Case 2 | ನೋಡು (nODu, ‘see’) + ಓಕ್ಕೆ (Okke) = ನೋಡೋಕ್ಕೆ (nODOkke), ‘to see’
6.1.4.1.2 The Imperative
Kannada verbs exhibit a number of forms that express commands or exhortations with various
degrees of politeness and deference. These imperatives also change with the verb type, which
usually depends on the ending of the verb. Table 6.6 lists the imperatives with an example
of each.
Table 6.6: The Imperative
Degree of politeness | i-stem, e.g. ಕುಡಿ (kuDi) ‘drink’ | o-stem, e.g. ತೆಗೊ (tego) ‘take’ | bA, tA stems, e.g. ಬಾ, ಬರು (bA, baru) ‘come’ | Other stems, e.g. ಹೋಗು (hOgu) ‘go’
Impolite, casual (masculine) | ಕುಡಿಯೋ (kuDiyO) | ತೆಗೊಳ್ಳೋ (tegoLLO) | ಬಾರೋ (bArO) | ಹೋಗೋ (hOgO)
Impolite, casual (feminine) | ಕುಡಿಯೇ (kuDiyE) | ತೆಗೊಳ್ಳೇ (tegoLLE) | ಬಾರೇ (bArE) | ಹೋಗೇ (hOgE)
Nonpolite | ಕುಡಿ (kuDi) | ತೆಗೊ (tego) | ಬಾ (bA) | ಹೋಗು (hOgu)
Polite, plural | ಕುಡೀರಿ (kuDIri) | ತೆಗೊಳ್ಳಿ (tegoLLi) | ಬನ್ನಿ (banni) | ಹೋಗಿ (hOgi)
Very polite | ಕುಡಿಯಿರಿ (kuDiyiri) | ತೆಗೊಳ್ಳಿರಿ (tegoLLiri) | ಬನ್ನಿರಿ (banniri) | ಹೋಗಿರಿ (hOgiri)
Ultrapolite | ತಾವು ಕುಡಿಯಿರಿ (tAvu kuDiyiri) | ತಾವು ತೆಗೊಳ್ಳಿರಿ (tAvu tegoLLiri) | ತಾವು ಬನ್ನಿರಿ (tAvu banniri) | ತಾವು ಹೋಗಿರಿ (tAvu hOgiri)
6.1.4.1.3 Negative Imperatives
These forms command someone not to do something. There are two ways of forming negative
imperatives, using ‘ಬಾರದು’ (bAradu) or ‘ಬೇಡ’ (bEDa). The first, ‘bAradu’ (historically a form
of bA/baru, ‘come’), may be added to the infinitive in ‘ಅ’ (a) to generate a negative verb
form. It can also be added to the infinitive in ‘ಅಲ್’ (al) or ‘ಲ್’ (l) followed by an
emphatic ‘ಏ’ (E) to generate strong negative verb forms. Table 6.7 illustrates the negative
imperative with examples.
Table 6.7: The Negative Imperative
Case | Example and Meaning
Case 1 | ಹೋಗು (hOgu) + ಅ (a) + ಬಾರದು (bAradu) = ಹೋಗಬಾರದು (hOgabAradu), ‘don’t go’; or ಹೋಗು (hOgu) + ಬಾರದು (bAradu) = ಹೋಗ್ಬಾರದು (hOgbAradu), ‘don’t go’
Case 2 | ಹೋಗು (hOgu) + ಅಲ್ (al) + ಏ (E) + ಬಾರದು (bAradu) = ಹೋಗಲೇಬಾರದು (hOgalEbAradu), ‘definitely don’t go’; or ಹೋಗು (hOgu) + ಲ್ (l) + ಏ (E) + ಬಾರದು (bAradu) = ಹೋಗ್ಲೇಬಾರದು (hOglEbAradu), ‘definitely don’t go’
Similarly, the second way of forming negative imperatives uses the negative modals ‘ಬೇಡ’
(bEDa, the negative of ‘ಬೇಕು’ (bEku): want, need, must, should) and ‘ಕೂಡದು’ (kUDadu, ‘must
not’). Table 6.8 illustrates these negative imperatives with examples.
Table 6.8: The Negative Imperative
Case | Example and Meaning
Case 1 | ಮಾಡು (mADu) + ಅ (a) + ಬೇಡ (bEDa) = ಮಾಡಬೇಡ (mADabEDa); or ಮಾಡು (mADu) + ಬೇಡ (bEDa) = ಮಾಡ್ಬೇಡ (mADbEDa), ‘don’t do’
Case 2 | ಮಾಡು (mADu) + ಅ (a) + ಬೇಡಿ (bEDi) = ಮಾಡಬೇಡಿ (mADabEDi); or ಮಾಡು (mADu) + ಬೇಡಿ (bEDi) = ಮಾಡ್ಬೇಡಿ (mADbEDi), ‘please don’t do’
Case 3 | ಮಾಡು (mADu) + ಅಲ್ (al) + ಏ (E) + ಬೇಡ (bEDa) = ಮಾಡಲೇಬೇಡ (mADalEbEDa); or ಮಾಡು (mADu) + ಲ್ (l) + ಏ (E) + ಬೇಡ (bEDa) = ಮಾಡ್ಲೇಬೇಡ (mADlEbEDa), ‘must not do’
Case 4 | ಮಾಡು (mADu) + ಅ (a) + ಕೂಡದು (kUDadu) = ಮಾಡಕೂಡದು (mADakUDadu); or ಮಾಡು (mADu) + ಕೂಡದು (kUDadu) = ಮಾಡ್ಕೂಡದು (mADkUDadu), ‘must not do’
6.1.4.1.4 Optative
The optative is usually used with the first and third persons in Kannada and is formed by
adding ‘ಇ’ (i) to the infinitive. It often translates as English ‘let’ when used in the
affirmative and has the meaning ‘shall’, ‘should’ or ‘may’ in the interrogative, as shown in
Table 6.9.
Table 6.9: The Optative
Case | Example and Meaning
Affirmative | ಅವಳು ಮಾಡು (avaLu mADu) + ಅಲ್ (al) + ಇ (i) = ಅವಳು ಮಾಡಲಿ (avaLu mADali), ‘let her do’
Interrogative | ಅವಳು ಯಾವಾಗ ಮಾಡು (avaLu yAvAga mADu) + ಅಲ್ (al) + ಇ (i) = ಅವಳು ಯಾವಾಗ ಮಾಡಲಿ (avaLu yAvAga mADali)?, ‘when shall/should/may she do?’
6.1.4.1.5 Hortative
The hortative is formed by adding ‘ಓಣ’ (ONa) to the verb stem and can be translated as
‘let’s (do something)’, or as ‘shall we (do something)?’ in an interrogative sentence.
Table 6.10 illustrates this with examples.
Table 6.10: The Hortative
Case | Example and Meaning
Affirmative | ಮಾಡು (mADu) + ಓಣ (ONa) = ಮಾಡೋಣ (mADONa), ‘let’s do’
Interrogative | ಏನು (Enu) + ಮಾಡು (mADu) + ಓಣ (ONa) = ಏನು ಮಾಡೋಣ (Enu mADONa)?, ‘What shall we do?’
6.1.4.1.6 Participle
Kannada has some non-finite verb forms called participles that function verbally or
adjectivally, or have some special syntactic function in the sentence. Participles may be
affirmative, in which case they can be marked for tense, or negative. The important verbal
participles in Kannada are shown in Table 6.11 with an example of each.
Table 6.11: Verbal Participle
Participle | Description | Example
Present verbal participle | Add ‘ಆ’ (A) to the verb stem + ‘ಉತ್ತ್’ (utt, present tense marker), followed by a finite verb or verb phrase. | ಅವ್ನು ಯೋಚನೆ ಮಾಡುತ್ತಾ ಕುಳಿತ್ತಿದ್ದನು (avnu yOcane mADuttA kuLittiddanu), ‘he was sitting thinking’
Past verbal participle | If the past tense marker of the verb is ‘ಇದ್’ (id), add ‘ಇ’ (i) to the verb stem, followed by a finite verb or verb phrase. Otherwise add ‘ಉ’ (u) to the past verb stem, followed by a finite verb or verb phrase. | Case 1: ಮಾಡಿ ಊರಿಗೆ ಬಂದೆನು (mADi Urige baMdenu), ‘having done, I came to village’. Case 2: ಹೋಗಿ + ಬಿಟ್ಟು + ಬನ್ನಿ (hOgi + biTTu + banni), ‘go and come’
Negative verbal participle | Used to express the negative of both the present and past verbal participles. Formed by adding ‘ಅದೆ’ (ade) to the verb stem or to the negative stem ‘illa’. | Case 1: ನೋಡು + ಅದೆ = ನೋಡದೆ (nODu + ade = nODade), ‘without seeing’. Case 2: ಇಲ್ಲ + ಅದೆ = ಇಲ್ಲದೆ (illa + ade = illade), ‘not being/having been’
Verbal/participle noun | The most common verbal participle nouns are the neuter singular ‘ಅದು’ (adu) and personal verbal nouns like ‘ಅವನು’ (avanu), ‘ಅವಳು’ (avaLu), ‘ಅವರು’ (avaru) etc. | Case 1: ಮಾಡು + ಓ + ಅದು = ಮಾಡೋದು (mADu + O + adu = mADOdu), ‘the (act/fact of) doing, that which does’. Case 2: ನೋಡು + ಓ + ಅವರು = ನೋಡೋವವರು (nODu + O + avaru = nODOvavaru), ‘those (people) who see’
Negative verbal/participle noun | Formed by affixing the negative adjectival participle ‘ಅದ’ (ada) to the demonstrative pronouns. | Case 1: ಮಾಡು + ಅದ + ಅದು = ಮಾಡದದು (mADu + ada + adu = mADadadu), ‘(the act/fact of) not doing, that which does/did not do’. Case 2: ಇಲ್ಲ + ಅದ + ಅದು = ಇಲ್ಲದದು or ಇಲ್ಲದ್ದು (illa + ada + adu = illadadu or illaddu), ‘the act/fact of not being, that which is/was not’. Case 3: ಹೋಗು + ಅದ + ಅವಳು = ಹೋಗದವಳು (hOgu + ada + avaLu = hOgadavaLu), ‘the woman who does/did not go’
6.1.4.1.7 Modal auxiliaries
There are a number of modal auxiliaries in Kannada, each of which may have several different
meanings, as shown in Table 6.12.
Table 6.12: Modal Auxiliaries
Modal auxiliary | Description | Example
‘ಬೇಕು’ (bEku) | The modal ‘ಬೇಕು’ (is wanted, needed; must, should, ought) is used in different situations with different meanings. | Case 1: ನಾನು ಹೋಗಬೇಕು (nAnu hOga bEku), ‘I ought/need/want to go’. Case 2: ನೀವು ನಾಳೆ ಇಲ್ಲಿ ಇರಬೇಕು (nIvu nALe illi ira bEku), ‘you must/should be here tomorrow’. Case 3: ನೀವು ಅವನನ್ನು ನೋಡಿ ಇರಬೇಕು (nIvu avanannu nODi ira bEku), ‘you must have seen him’
‘ಬೇಡ’ (bEDa), ‘ಬೇಡಿ’ (bEDi) | The negative of the modal ‘ಬೇಕು’ is ‘ಬೇಡ’ (should not, must not, need not). The more polite or plural form is ‘ಬೇಡಿ’. | Case 1: ಬರಬೇಡ (bara bEDa), ‘don’t come’. Case 2: ಬರದಿರಬೇಡ (baradira bEDa), ‘come’
‘ಕೂಡದು’ (kUDadu) | Similar to the negative modal ‘ಬೇಡ’, but ‘ಕೂಡದು’ is the strongest negative modal. Like ‘ಬೇಡ’ and ‘ಬಾರದು’ (bAradu), it is attached to the infinitive. | ಬರಕೂಡದು (bara kUDadu), ‘should/must not come’
‘ಬಹುದು’ (bahudu) | Also attached to the infinitive; has the meaning ‘(someone) can/may (do something)’. | ನೋಡಬಹುದು (nODa bahudu), ‘can/might see’
‘ಬಾರದು’ (bAradu) | The negative of ‘ಬಹುದು’ is ‘ಬಾರದು’. | ನೋಡಬಾರದು (nODa bAradu), ‘can’t/shouldn’t see’
‘ಆರ್’ (Ar) | The negative contingent ‘ಆರ್’ with PNG markers is attached to the verbal infinitive to give the meaning ‘cannot’/‘might not’. | ನಗು + ಅಲ್ + ಆರ್ + ಎನು = ನಗಲಾರೆನು (nagu + al + Ar + enu = nagalArenu), ‘cannot/might not laugh’
6.1.4.1.8 Dative-stative verbs
Two verb stems, ‘ಬರು’ (baru) and ‘ಆಗು’ (Agu), are frequently used as dative-stative verbs in
Kannada. These verbs have a habitual sense when they stand alone and normally do not take
tense markers. The verb ‘ಇರು’ (iru, ‘be’) indicating possession has the meaning ‘have’ in
English. The verb ‘ಆಗು’ (Agu, ‘become’) acts as an aspect marker indicating finality.
6.1.4.1.9 Verbal aspect markers
Aspect markers are very similar to main verbs in their morphology and syntax, but
semantically they do not express lexical meaning as main verbs do. In Kannada, the verbal
aspect marker is usually added to the past verbal participle. Table 6.13 shows the various
verbal aspect markers used in Kannada. The aspect markers and their English meanings are
underlined in each of the given examples.
Table 6.13: Verbal Aspect Markers
Aspect marker and meaning | Description | Example
‘ಬಿಡು’ (biDu); meaning: ‘completion’ (‘perfective’, definiteness) | Aspectual ‘ಬಿಡು’ is attached to the past verbal participle. It is homophonous with the lexical verb ‘ಬಿಡು’ (leave) and has similar tense formation. The aspectual ‘ಬಿಡು’ can also attach to the lexical verb ‘ಬಿಡು’. | Case 1: ಅವನು ಬಿದ್ದು ಬಿಟ್ಟನು (avanu biddu biTTanu), ‘he fell down’. Case 2: ಬಿಟ್ಟುಬಿಡು (biTTubiDu), ‘let go’
‘ಹೋಗು’ (hOgu); meaning: ‘completion’ (sometimes involuntary or accidental) | The aspectual ‘ಹೋಗು’ indicates that something has changed from one state to another. It is homophonous with the lexical verb ‘ಹೋಗು’ (go) and has similar tense formation. Aspectual ‘ಹೋಗು’ is attached to the past verbal participle. | Case 1: ಅನ್ನ ಬೆಂದು ಹೋಗಿದೆ (anna beMdu hOgide), ‘the rice has gotten overcooked’. Case 2: ಕೆಟ್ಟು ಹೋಗು (keTTu hOgu), ‘get spoiled’
‘ಆಡು’ (ADu); meaning: continuity, duration (with some verbs ‘reciprocal’ or ‘competitive’) | It is homophonous with the lexical verb ‘ಆಡು’ (play) and has similar tense formation. Aspectual ‘ಆಡು’ is also attached to the past verbal participle. | Case 1: ಅವರು ಓಡಾಡಿದರು (avaru ODADidaru), ‘they ran around’. Case 2: ಅವರು ಕಾದಾಡಿದರು (avaru kAdADidaru), ‘they fought with each other’
‘ಕೊಡು’ (koDu); meaning: benefactive | It is homophonous with the lexical verb ‘ಕೊಡು’ (give) and has similar tense formation. Aspectual ‘ಕೊಡು’ is also attached to the past verbal participle. | ಅವನು ಕತೆ ಬರೆದು ಕೊಟ್ಟನು (avanu kate baredu koTTanu), ‘he wrote the story for someone’s benefit’
‘ನೋಡು’ (nODu); meaning: ‘attemptive’, experimentive | It is homophonous with the lexical verb ‘ನೋಡು’ (see) and has similar tense formation. It is usually used with transitive verbs and rarely with intransitive ones. | ಅವನು ಕಾಫಿ ಕುಡಿದು ನೋಡಿದನು (avanu kAfi kuDidu nODidanu), ‘he tried drinking/tasted the coffee’
‘ಹಾಕು’ (hAku); meaning: exhaustive, malefactive | It is homophonous with the lexical verb ‘ಹಾಕು’ (put, place) and takes regular (‘weak’) tense formation. It is usually used with transitive verbs. | ಅವನು ದೋಸೆಯೆಲ್ಲ ತಿಂದು ಹಾಕಿದನು (avanu dOseyella tiMdu hAkidanu), ‘he ate up all the pancakes’ (against our wishes)
‘ಕೊಳ್ಳು’ (koLLu); meaning: reflexive, self-benefactive | It is homophonous with the lexical verb ‘ಕೊಳ್ಳು’ (buy, take, acquire) and has similar tense formation. | ಅವನು ಕತೆ ಬರೆದು ಕೊಂಡನು (avanu kate baredu koMDanu), ‘he wrote a story for himself’
6.1.4.1.10 The causative suffix ‘ಇಸು’ (isu)
The causative suffix ‘ಇಸು’ (isu) (or ‘yisu’) can be added to any verb stem to make a
causative verb out of a non-causative one, as in Table 6.14.
Table 6.14: The causative suffix ‘ಇಸು’ (isu)
Case | Example and Meaning
Case 1 | ‘ಕಲಿ’ (kali, ‘learn’; intransitive) + ‘ಇಸು’ (isu) -> ‘ಕಲಿಸು’ (kalisu, ‘teach’; causative)
Case 2 | ‘ಮಾಡು’ (mADu, ‘do’) + ‘ಇಸು’ (isu) -> ‘ಮಾಡಿಸು’ (mADisu, ‘make (someone) do’; causative) + ‘ಇಸು’ (isu) -> ‘ಮಾಡಿಸಿಸು’ (mADisisu, ‘make (someone) make (someone) do’; double causative)
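The causative derivation in Table 6.14 can be sketched on transliterated stems as follows. The function name `causative` and the two sandhi assumptions (a final ‘u’ or ‘i’ is dropped before ‘isu’; any other stem takes the variant ‘yisu’) are illustrative simplifications inferred from the two examples, not rules taken from the developed system.

```python
def causative(stem):
    """Attach the causative suffix to a transliterated verb stem (toy rule)."""
    if stem.endswith(("u", "i")):
        # Assumed sandhi: the final short vowel is dropped before 'isu'.
        return stem[:-1] + "isu"
    # Assumed: remaining stems take the glide variant 'yisu'.
    return stem + "yisu"


print(causative("kali"))             # learn -> teach
print(causative("mADu"))             # do -> make (someone) do
print(causative(causative("mADu")))  # double causative
```

Because the output of `causative` is itself a ‘u’-final stem, applying the function twice yields the double causative without a separate rule.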
6.1.4.1.11 The conditional suffix ‘are’
The conditional suffix ‘are’ in Kannada expresses ‘if (something)’ in English, and the same
form is used for all persons. The suffix ‘are’ is usually attached to the past verb stem to
create the conditional form, as illustrated in Table 6.15.
Table 6.15: The conditional suffix
Case | Example and Meaning
Case 1 | ‘ಮಾಡಿದರೆ’ (mADidare), ‘if (someone) does (something), (then...)’
Case 2 | ‘ಹೋದರೆ’ (hOdare), ‘if (someone) goes, (then...)’
6.1.4.1.12 Verbs marked with tense and PNG
The PNG and tense markers concatenated to the verb stem are the two important aspects of
verb morphology. The verbal inflectional morphemes attached to verbs provide information
about syntactic aspects such as number, person, case-ending relation and tense. Table 6.16
shows the various PNG suffixes that can be attached to any verb root.
Table 6.16: PNG-Suffixes
Person | Number | Gender | PNG suffix: Present | Future | Past | Contingent
First | Singular | Masculine/Feminine | ಏನೆ (Ene) | ಎನು, ಎ (enu, e) | ಎನು, ಎ (enu, e) | ಏನು (Enu)
First | Plural | Masculine/Feminine | ಏವೆ (Eve) | ಎವು (evu) | ಎವು (evu) | ಏವು (Evu)
Second | Singular | Masculine/Feminine | ಈ, ಈಯೆ (I, Iye) | ಇ, ಇಯೆ (i, iye) | ಇ, ಇಯೆ (i, iye) | ಈಯ (Izha)
Second | Plural | Masculine/Feminine | ಈರಿ (Iri) | ಇರಿ (iri) | ಇರಿ (iri) | ಈರಿ (Iri)
Third | Singular | Masculine | ಆನೆ (Ane) | ಅನು (anu) | ಅನು (anu) | ಆನು (Anu)
Third | Singular | Feminine | ಆಳೆ (ALe) | ಅಳು (aLu) | ಅಳು (aLu) | ಆಳು (ALu)
Third | Plural | Masculine/Feminine | ಆರೆ (Are) | ಅರು (aru) | ಅರು (aru) | ಆರು (Aru)
Third | Singular | Neuter | ಇದೆ (ide) | ಉದು (udu) | ಇತು (itu) | ಈತು (Itu)
Third | Plural | Neuter | ಇವೆ (ive) | ಅವು (avu) | ಅವು (avu) | ಆವು (Avu)
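To illustrate how tense and PNG markers concatenate to a verb stem, the following sketch conjugates a transliterated regular verb using a few of the suffixes from Table 6.16. It rests on several assumptions made only for this illustration: the tense markers ‘utt’, ‘uv’ and ‘id’ (the last one valid only for regular verbs), a sandhi step that drops the final ‘u’ of the root, and the names `TENSE_MARKER`, `PNG_SUFFIX` and `conjugate`.

```python
# Tense markers; 'id' is the regular past marker, so irregular verbs
# such as hOgu would need their own past stem (not handled here).
TENSE_MARKER = {"present": "utt", "future": "uv", "past": "id"}

# A small subset of the PNG suffixes from Table 6.16.
PNG_SUFFIX = {
    "1S": {"present": "Ene", "future": "enu", "past": "enu"},
    "3SM": {"present": "Ane", "future": "anu", "past": "anu"},
    "3P": {"present": "Are", "future": "aru", "past": "aru"},
}


def conjugate(root, tense, png):
    """Concatenate root + tense marker + PNG suffix (toy sandhi)."""
    # Assumed sandhi: the enunciative final 'u' is dropped before a vowel.
    stem = root[:-1] if root.endswith("u") else root
    return stem + TENSE_MARKER[tense] + PNG_SUFFIX[png][tense]


print(conjugate("mADu", "present", "3SM"))
print(conjugate("mADu", "past", "1S"))
print(conjugate("hOgu", "present", "1S"))
```

The last call reproduces the form hOguttEne analysed in Table 6.1 as hOgu+utt+Ene.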
6.2 CLASSIFYING VERB PARADIGMS AND SYSTEM DESIGN
The first step in implementing the morphological analyzer is to classify the verb paradigms
from a computational perspective. In most cases the problem arises from the past tense
markers, which change from one paradigm to another [160]. I have classified Kannada verbs
into 35 paradigms by considering all the situations that generate different word forms, as
discussed previously. Every root word in a paradigm inflects with the same set of
inflections. In other words, every word in a paradigm undergoes similar orthographic changes
(sandhi changes), or has the same inflectional behavior, when a suffix is added to it.
Consider the two words ‘ಏಳು’ (ELu) and ‘ಬೀಳು’ (bILu). When inflected with past tense and PNG
markers, they form ‘ಎದ್ದೆನು’ (eddenu) and ‘ಬಿದ್ದೆನು’ (biddenu). As these two words show the
same orthographic changes, they are grouped under the same paradigm. Table 6.17 illustrates
the paradigms with examples.
Table 6.17: Verb Paradigms
Paradigm | Past tense marker | Description and examples
Class-1 | -ತ್ತ್- (-tt-) | Verbs ending with ‘Ayu’, ‘Iyu’, ‘ILu’; e.g. sAyu, Iyu, kILu etc.
Class-2 | -ತ್ತ್- (-tt-) | Verbs ending with ‘ru’, ‘aLu’, ‘uLu’; e.g. heru, aLu, uLu etc.
Class-3 | -ತ್ತ್- (-tt-) | Verbs ending with ‘aLu’, ‘uLu’; e.g. aLu, uLu
Class-4 | -ಂತ್- (-Mt-) | Verbs ending with ‘illu’; e.g. nillu
Class-5 | -ತ್- (-t-) | Verbs ending with ‘i’ and ‘e’; e.g. kali, mere, koLe etc.
Class-6 | -ತ್- (-t-) | Verbs ending with ‘ULu’; e.g. hULu
Class-7 | -ತ್- (-t-) | Verbs ending with ‘Olu’, ‘Ulu’, ‘Elu’; e.g. jOlu, nUlu, hElu
Class-8 | -ದ್- (-d-) | Verbs ending with ‘Ayu’, ‘Oyu’, ‘Eyu’, ‘Iyu’; e.g. kAyu, sIyu
Class-9 | -ದ್- (-d-) | Verbs ending with ‘Agu’, ‘Ogu’; e.g. hOgu, Agu etc.
Class-10 | -ದ್- (-d-) | Verbs ending with ‘are’; e.g. bare
Class-11 | -ದ್- (-d-) | Verbs ending with ‘ge’ and ‘gi’; e.g. age, agi
Class-12 | -ದ್- (-d-) | Verbs ending with ‘yyu’; e.g. koyyu, geyyu, hoyyu etc.
Class-13 | -ದ್- (-d-) | Verbs ending with ‘nnu’; e.g. annu, tinnu, ennu etc.
Class-14 | -ದ್- (-d-) | Verbs ending with ‘Eyu’; e.g. gEyu, nEyu etc.
Class-15 | -ದ್- (-d-) | Verbs ending with ‘Ayu’; e.g. Ayu
Class-16 | -ದ್ದ್- (-dd-) | Verbs ending with ‘iru’; e.g. iru
Class-17 | -ದ್ದ್- (-dd-) | Verbs ending with ‘kaLu’; e.g. kaLu
Class-18 | -ದ್ದ್- (-dd-) | Verbs ending with ‘ILu’, ‘ELu’; e.g. bILu, ELu etc.
Class-19 | -ದ್ದ್- (-dd-) | Verbs ending with ‘Eyu’; e.g. mEyu
Class-20 | -ದ್ದ್- (-dd-) | Verbs ending with ‘ellu’; e.g. gellu
Class-21 | -ಇದ್- (-id-) | Verbs ending with ‘ADu’, ‘ODu’; e.g. ADu, nODu, tODu
Class-22 | -ಇದ್- (-id-) | Verbs ending with ‘TTu’, ‘ddu’, ‘bbu’, ‘ttu’, ‘llu’, ‘ccu’; e.g. aTTu, addu, ubbu, kuttu, cellu, heTTu, beccu, hottu etc.
Class-23 | -ಇದ್- (-id-) | Verbs ending with ‘Oru’, ‘Eru’; e.g. tOru, sEru etc.
Class-24 | -ಇದ್- (-id-) | Verbs ending with ‘ju’, ‘su’; e.g. mOju, aMkurisu etc.
Class-25 | -ಇದ್- (-id-) | Verbs ending with ‘MTu’, ‘Mju’, ‘Mcu’; e.g. IMTu, aMju, hoMcu
Class-26 | -ಇದ್- (-id-) | Verbs ending with ‘ELu’, ‘ILu’; e.g. hELu, sILu etc.
Class-27 | -ಂದ್- (-nd-) | Verbs ending with ‘Eyu’, ‘Oyu’; e.g. bEyu, nOyu etc.
Class-28 | -ಂದ್- (-nd-) | Verbs ending with ‘A’ (aru); e.g. taru (tA), baru (bA) etc.
Class-29 | -ಂದ್- (-nd-) | Verbs ending with ‘ollu’, ‘ellu’, ‘allu’; e.g. kollu, mellu, sallu etc.
Class-30 | -ಂಡ್- (-nD-) | Verb stems ending with ‘ANu’; e.g. kANu
Class-31 | -ಂಡ್- (-nD-) | Verbs ending with ‘oLLu’; e.g. koLLu
Class-32 | -ಟ್- (-T-) | Verbs ending with ‘aDu’, ‘eDu’, ‘oDu’, ‘iDu’, ‘uDu’; e.g. aDu, keDu, koDu, iDu, uDu etc.
Class-33 | -ಕ್ಕ- (-k-) | Verbs ending with ‘ggu’ and ‘gu’; e.g. oggu, sigu, nagu etc.
Class-34 | -ದ್- (-d-) | Verbs ending with ‘kAyu’; e.g. kAyu, dArikAyu
Class-35 | -ಂಡ್- (-nD-) | Verbs ending with ‘ko’; e.g. baggiko, bEDiko etc.
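The idea that roots in the same paradigm share a past tense marker can be sketched with two small lookup tables. Because several endings in Table 6.17 occur in more than one class (for example ‘ILu’), this sketch assumes a lexicon that maps each root directly to its class; the names `PAST_MARKER`, `ROOT_CLASS`, `past_marker` and `group_by_paradigm` are hypothetical, and only a handful of entries from the table are included.

```python
from collections import defaultdict

# Past tense marker per paradigm class (subset of Table 6.17).
PAST_MARKER = {9: "d", 18: "dd", 21: "id"}

# Assumed lexicon: each root is listed with its paradigm class.
ROOT_CLASS = {
    "hOgu": 9, "Agu": 9,
    "bILu": 18, "ELu": 18,
    "nODu": 21, "ADu": 21, "tODu": 21,
}


def past_marker(root):
    """Look up the past tense marker through the root's paradigm class."""
    return PAST_MARKER[ROOT_CLASS[root]]


def group_by_paradigm(root_class):
    """Collect the roots that belong to each paradigm class."""
    groups = defaultdict(list)
    for root, cls in root_class.items():
        groups[cls].append(root)
    return dict(groups)


print(past_marker("ELu"), past_marker("bILu"))
print(group_by_paradigm(ROOT_CLASS))
```

A lexicon lookup is used instead of suffix matching because the ending alone does not always determine the class.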
Designing a morphological system that generates all possible word forms for a given verb
root was the next important stage. Fig. 6.1 shows the proposed flowchart for verb
morphology. The abbreviations used in the flowchart are explained in Table 6.18.
Table 6.18: Abbreviations in the System Design
Abbreviation | Examples
PAST: Past tense marker | tt, Mt, t, d, dd, id, Md, D, T, kk, MD, ttidd, Mtidd, tidd, didd, ddidd, idd, Mdidd, Didd, Tidd, kkidd, MDidd
PRESENT: Present tense marker | utt, yutt, uttidd, yuttidd, iyutt, iyuttidd
FUTURE: Future tense marker | uv, yuv
PNG: Person Number Gender | As shown in Table 6.16
PN: Pronoun | avanu, avaLu, avaru, adu
PP: Past Participle | u, i, ǿ (null)
TM: Tense Marker | PRESENT, PAST, FUTURE
INF: Infinitive | isi, is, isal, isalu, isid, iyisid, sid, al, alu, i, sal, salu, si, s, a
The different levels show the different verb forms that can be derived from the same root
word. For example, the morphemes associated with the verb form ‘ಓದಿಸಿನೋಡುತ್ತಾನೆ’
(OdisinODuttAne) are:
ಓದು + ಇಸಿ + ನೋಡು + ಉತ್ತ್ + ಆನೆ
Odu + isi + nODu + utt + Ane
Root (Level 1) + INF (Level 2) + AUXV (Level 3) + PRESENT (Level 2) + PNG (Level 3)
6.3 PROPOSED RULE BASED MAG
The proposed rule based MAG tool was developed using the AT&T Finite State Machine. This
section describes the work required to create the proposed rule based MAG system.
6.3.1 Information Required to Build MAG
The following information is required to build a morphological analyzer and generator:
6.3.1.1 Lexicon
The list of stems and affixes together with basic information about them (noun stem, verb
stem, etc.).
6.3.1.2 Morphotactics
The model of morpheme ordering that explains which classes of morphemes can follow
other classes of morphemes inside a word. E.g., the rule that Kannada plural morpheme
follows the noun stem rather than preceding it.
6.3.1.3 Orthographic rules
These are spelling rules used to model the changes that occur in a word, usually when two
morphemes combine. For example: insert “yu” on the surface tape when the lexical tape has a
morpheme ending in ‘e’ (or i, etc.) and the next morphemes are “tt” (PRES) and “Ane” (3SM).
Fig. 6.1: Flowchart for verb morphology (levels: Root, INF, TM (PAST, PRESENT, FUTURE), PP, RP, NEG, PN and PNG, with auxiliary verbs)
beLe + insert “yu” + PRES(tt) + 3SM(Ane) -> beLe-yu-tt-Ane = beLeyuttAne
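The insertion rule in this example can be sketched as a plain function over a list of transliterated morphemes. This is an illustration only: the actual system states the rule as an FST, and the names `insert_yu` and `surface`, as well as the choice to trigger on a root ending in ‘e’ or ‘i’ followed by “tt”, are assumptions made for the sketch.

```python
def insert_yu(morphemes):
    """Insert 'yu' after an e/i-final root when the next morpheme is 'tt'."""
    root, *suffixes = morphemes
    if root.endswith(("e", "i")) and suffixes and suffixes[0] == "tt":
        return [root, "yu", *suffixes]
    # The rule does not apply: return the morphemes unchanged.
    return list(morphemes)


def surface(morphemes):
    """Apply the insertion rule and concatenate into a surface form."""
    return "".join(insert_yu(morphemes))


print(insert_yu(["beLe", "tt", "Ane"]))
print(surface(["beLe", "tt", "Ane"]))
```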
6.3.2 Creation of rules using FST
An FST is essentially a finite state automaton that works on two (or more) tapes. The most
common way to think about a transducer is as a kind of “translating machine” that reads from
one tape and writes onto the other. For example, on one tape we read “ಉದ್ಯಾನಗಳು”, and on the
other we write “ಉದ್ಯಾನ+N+PL”, or the other way around, as shown in Fig. 6.2. “ಉ:ಉ” means read
a “ಉ” symbol on one tape and write the same “ಉ” on the other tape. Similarly, “+N:ε” means
read a “+N” symbol on one tape and write nothing on the other.
Fig. 6.2: FST working principle (lexical level and surface level)
FSTs can be used for both analysis and generation (they are bidirectional), and they act
as a two-level morphology [162, 163]. A word is represented as a correspondence between
a lexical level and a surface level. The lexical level represents a simple concatenation of
the morphemes making up a word, whereas the surface level represents the actual spelling of
the final word.
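The bidirectional behaviour can be illustrated with a small Python sketch. It models a single path through a transducer as a list of lexical:surface symbol pairs; the Romanized example ‘mara+N+PL’ is used instead of Kannada script, and the names `PATH`, `generate` and `analyze` are hypothetical. A real FST is a network of many such paths, so this is a simplification, not the implementation used in the system.

```python
# One path of a transducer as (lexical symbol, surface string) pairs.
# An empty surface string plays the role of epsilon, as in "+N:ε".
PATH = [
    ("m", "m"), ("a", "a"), ("r", "r"), ("a", "a"),
    ("+N", ""),        # read "+N", write nothing
    ("+PL", "gaLu"),   # read "+PL", write the plural marker
]


def generate(lexical_symbols):
    """Lexical level -> surface level; returns None if the path rejects."""
    if len(lexical_symbols) != len(PATH):
        return None
    surface = []
    for symbol, (lex, surf) in zip(lexical_symbols, PATH):
        if symbol != lex:
            return None
        surface.append(surf)
    return "".join(surface)


def analyze(surface):
    """Surface level -> lexical level, walking the same path in reverse roles."""
    position = 0
    lexical = []
    for lex, surf in PATH:
        if not surface.startswith(surf, position):
            return None
        position += len(surf)
        lexical.append(lex)
    return "".join(lexical) if position == len(surface) else None


print(generate(["m", "a", "r", "a", "+N", "+PL"]))
print(analyze("maragaLu"))
```

The same table of pairs serves both directions, which is the property that lets one rule set act as both analyzer and generator.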
6.3.3 Architecture of Proposed MAG Model
Using all the relevant morphological feature information of Kannada words, well
defined sandhi rules were created based on FST. The architecture of the proposed rule based
MAG system is shown in Fig. 6.3.
For the morphological generator, if the string containing the root word and its
morphemic information is accepted by the automaton, then it generates the corresponding
root word and morpheme units in the first level, as shown in Fig. 6.4.
Fig. 6.3: Architecture of proposed MAG model
Here “beLe” is the root word, “V” indicates that the category of the root word is verb,
“PRES” and “FUT” are the tense markers for present and future tense respectively,
and 3SM is the PNG marker for third person singular masculine.
Fig. 6.4: Example for Morphotactics Rule
The output of the first level becomes the input of the second level, where the
orthographic (sandhi) rules are handled as shown in Fig. 6.5. If it gets accepted then it
generates the inflected word.
Fig. 6.5: Application of Sandhi Rule
The sandhi rules should be written in such a way that, if the root word ends with “e”
and the next morphemes are “tt” (PRES) or “Ane” (3SM), then “yu” is inserted immediately
after the root word. Fig. 6.6 below shows the corresponding sandhi rule.
Fig. 6.6: Example for Sandhi Rule
Based on the inflections and differences, all possible Morphotactic and Sandhi rules
were written for all forms of nouns and verbs using FST.
Example Rules for Noun Morphology
####Plural "gaLu"#####
[b2] [<epsilon>] / __ gaLu
This rule works as follows. Suppose a root word is given as input in the form ‘Root+N+PL’,
for example ‘mara+N+PL’, where mara (tree) is the root word. The symbol + is treated as [b2],
which is already defined in the alphabets file. The tag ‘+N’ is replaced by [<epsilon>], that is,
by the empty string, and the plural tag ‘+PL’ is replaced by the plural marker ‘gaLu’, so the
tool gives the output ‘maragaLu’.
Example Rules for Verb Morphology
####PRESENT “tt”#####
[b2] [<epsilon>] / __ tt
This rule works as follows. Suppose a root word is given as input in the form ‘Root+V+PRES’,
for example ‘Odu+V+PRES’, where Odu (read) is the root word. The symbol + is treated as [b2],
which is already defined in the alphabets file and is common to both nouns and verbs. The tag
‘+V’ is replaced by ‘[<epsilon>]’, that is, by the empty string, and the present tense tag
‘+PRES’ is replaced by the present tense marker ‘tt’, so the tool gives the output ‘Odutt’.
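The effect of the two example rules can be mimicked with a short Python sketch. It is not the rule formalism of the FST tool: the table `TAG_TO_SURFACE` and the function `apply_rules` are hypothetical, and only the four tags mentioned in the examples are covered.

```python
# Hypothetical tag table: category tags map to the empty string (epsilon),
# feature tags map to the marker that replaces them on the surface.
TAG_TO_SURFACE = {
    "N": "",       # +N    -> epsilon
    "V": "",       # +V    -> epsilon
    "PL": "gaLu",  # +PL   -> plural marker
    "PRES": "tt",  # +PRES -> present tense marker
}


def apply_rules(lexical_form):
    """Rewrite a form such as 'mara+N+PL' into its surface string."""
    root, *tags = lexical_form.split("+")
    return root + "".join(TAG_TO_SURFACE[tag] for tag in tags)


print(apply_rules("mara+N+PL"))
print(apply_rules("Odu+V+PRES"))
```

An unknown tag raises a `KeyError` in this sketch, which loosely corresponds to the automaton rejecting an input it has no rule for.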
6.4 MORPHOLOGICAL ANALYZER USING MACHINE LEARNING APPROACH
The proposed morphological analyzer model was developed with a supervised machine
learning approach, using the popular classification and regression tool called SVM. In the
supervised machine learning approach, the training corpus consists of pairs of input
objects and the desired output.
Generally, in the morphological analysis process, a complex word form is transformed
into its root and suffixes. In the machine learning approach, all the rules, including
complex spelling rules, are handled by the classification task. Machine learning
approaches need only corpora with linguistic information and do not require any hand
coded morphological rules [164]. The morphological or linguistic rules are automatically
extracted from the annotated corpora. In the proposed method, sequence labelling is used
to align the parallel corpus, and the morphological analysis problem is converted into a
classification problem.
6.4.1 Corpus development
The performance of the morphological analyzer depends greatly on the aligned
morphological corpora, which should be large, of good quality and representative. The
sequence of steps involved in the corpus development is shown in Fig. 6.7.
6.4.1.1 Romanization
The SVM tool supports only Roman character input, whereas text in a Dravidian language
like Kannada is encoded in Unicode. Mapping files were therefore
created and used to map from Unicode to Roman and vice versa. The Romanized aligned
input-output data corpus, consisting of the most commonly used verbs selected from all verb
paradigms, was created manually.
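A Unicode-to-Roman mapping of this kind can be sketched as follows. The mapping files themselves are not given in the text, so the tables below are assumptions: they cover only the handful of characters needed for two example words, and the Roman symbols follow the transliteration seen elsewhere in this chapter (capital letters for long vowels and retroflex consonants).

```python
# Tiny, assumed mapping tables; a real mapping file would cover the full script.
CONSONANTS = {"ಮ": "m", "ರ": "r", "ಗ": "g", "ಳ": "L", "ನ": "n"}
VOWEL_SIGNS = {"ು": "u", "ೆ": "e"}   # dependent vowel signs
INDEPENDENT_VOWELS = {"ಆ": "A"}
VIRAMA = "್"                          # suppresses the inherent vowel


def romanize(text):
    """Map Kannada Unicode text to Roman, adding the inherent 'a' where needed."""
    out = []
    pending_a = False  # True when the last consonant still carries inherent 'a'
    for ch in text:
        if ch in CONSONANTS:
            if pending_a:
                out.append("a")
            out.append(CONSONANTS[ch])
            pending_a = True
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])  # vowel sign replaces the inherent 'a'
            pending_a = False
        elif ch == VIRAMA:
            pending_a = False
        else:
            if pending_a:
                out.append("a")
            pending_a = False
            out.append(INDEPENDENT_VOWELS.get(ch, ch))
    if pending_a:
        out.append("a")
    return "".join(out)


print(romanize("ಮರಗಳು"))
print(romanize("ಆನೆಗಳು"))
```

The reverse direction (Roman to Unicode) would need the inverse tables plus a rule for choosing between independent vowels and vowel signs; it is omitted here.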
Fig. 6.7: Preprocessing steps
6.4.1.2 Segmentation
This step involves four different stages: grapheme segmentation, syllable splitting, C-V
(Consonant-Vowel) representation and segmentation. In the first stage, every
Romanized word in the corpus is segmented based on Kannada graphemes. In the second
stage, these graphemes are split into syllables of consonants and vowels. In the
next stage, the consonant and vowel markers “-C” and “-V” are appended to the segmented
consonant and vowel syllables respectively. The symbol “*” is used to indicate the
morpheme boundaries in the output data.
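The C-V representation can be illustrated with a short Python sketch. It assumes that every Roman character is one segment and that the vowel inventory is the set shown in `VOWELS`; both are simplifications (multi-character graphemes are not handled), and the function name `cv_tag` is hypothetical.

```python
# Assumed Roman vowel symbols; capitals stand for long vowels.
VOWELS = set("aAiIuUeEoO")


def cv_tag(romanized_word):
    """Append -V to vowel segments and -C to consonant segments."""
    return [
        f"{ch}-{'V' if ch in VOWELS else 'C'}"
        for ch in romanized_word
    ]


print(" ".join(cv_tag("kalitanu")))
```

For the word ‘kalitanu’ this reproduces the input row of the sample training data in Table 6.19.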
6.4.1.3 Alignment and Mapping
Using sequence labelling approaches [165], the segmented input and output words
were aligned vertically, with consecutive segments separated by spaces.
Table 6.19 shows a sample of the alignment of input and output data in the corpus.
Table 6.19: Sample Training Data Format
Input k-C a-V l-C i-V t-C a-V n-C u-V
Output k a l i* t* a n u*
6.4.1.4 Dissolving Mismatch
When input and output words are mapped, a mismatch may occur
because the number of input units is either larger or smaller than the number of output units.
In the first case, when there are more input units than output units, the mismatch can be
resolved by inserting a null symbol ‘$’ in the output vector, based on the morphosyntactic
rules. Consider the Kannada verb ‘kaliyuttAne’, which has 11 segments in the input sequence and
10 segments in the output sequence.
Due to the morphosyntactic rule, the ‘y’ in the input sequence has no counterpart
in the output sequence. To equalize the input and output data in such a situation, the
‘y’ in the training data is mapped to the empty symbol ‘$’, as shown in Table 6.20.
In the second case, the number of segments in the input sequence is smaller than that of the
corresponding output sequence, as opposed to the first case. To illustrate this situation,
consider the Kannada verb ‘OdidaLu’, which has 7 segments in the input sequence
but 8 segments in the output sequence. To overcome this problem, using the
morphosyntactic rule, the first occurrence of ‘d-C’ in the input sequence is mapped to the
two segments ‘d’ and ‘u’ of the output sequence in the training data.
Table 6.20: Dissolving Mismatch
Case 1
Input sequence (11 segments): k-C | a-V | l-C | i-V | y-C | u-V | t | t-C | A-V | n-C | e-V
Mismatched output labels (10 segments): k | a | l | i* | u | t | t* | A | n | e*
Corrected output labels (11 segments): k | a | l | i* | $ | u | t | t* | A | n | e*
Case 2
Input sequence (7 segments): O | d-C | i-V | d-C | a-V | L-V | u-V
Mismatched output labels (8 segments): O | d | u* | i | d* | a | L | u*
Corrected output labels (7 segments): O | du* | i | d* | a | L | u*
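Both cases of Table 6.20 can be reproduced with a simple greedy alignment, sketched below in Python. This is an assumption about how such an alignment could be automated, not the procedure used to build the corpus (which relied on morphosyntactic rules). The input segments are given without their “-C”/“-V” markers, and the names `base` and `align` are hypothetical.

```python
def base(label):
    """Strip the morpheme-boundary marker '*' from an output label."""
    return label.rstrip("*")


def align(input_segments, output_labels):
    """Return one output label per input segment.

    Case 1: an input segment with no matching output label gets '$'.
    Case 2: surplus output labels are merged into the previous label.
    """
    aligned = []
    j = 0
    for i, segment in enumerate(input_segments):
        if j < len(output_labels) and base(output_labels[j]) == segment:
            label = output_labels[j]
            j += 1
            inputs_left = len(input_segments) - (i + 1)
            # Merge while there are more output labels left than input
            # segments and the next label does not match the next input.
            while (
                j < len(output_labels)
                and len(output_labels) - j > inputs_left
                and (inputs_left == 0
                     or base(output_labels[j]) != input_segments[i + 1])
            ):
                label = base(label) + output_labels[j]
                j += 1
            aligned.append(label)
        else:
            aligned.append("$")
    return aligned


case1 = align(list("kaliyuttAne"), "k a l i* u t t* A n e*".split())
case2 = align(list("OdidaLu"), "O d u* i d* a L u*".split())
print(" | ".join(case1))
print(" | ".join(case2))
```

The greedy strategy works for these two examples; words in which the same character repeats around the mismatch may need the rule-based treatment described in the text.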
6.4.2 Morphological Analyzer Model Creation
The architecture of proposed model mainly consists of three different phases as shown
in Fig. 6.8.
Fig. 6.8: Architecture of Morphological Analyzer Model
The pre-processed parallel corpus consists of sequences of input characters and their
corresponding output labels. A parallel corpus of more than 200,000 words was
used for training with the SVMTlearn tool, and the morphological analyzer model called model-I was
generated. Model-I is used to analyze and identify the different morphemes associated
with a given input test word. Similarly, another model called model-II was created for
assigning grammatical classes to each morpheme in a word; this second model was
trained using sequences of morphemes and their grammatical categories.
The working principle is as follows. The test input word is given to the trained model-I,
which predicts a label for each input segment. In the next phase, the
segmented morphemes are given to the trained model-II, which predicts the grammatical
categories of the segmented morphemes for the given input word.
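The hand-over between the two models can be sketched as follows: the per-segment labels predicted by model-I are turned into morphemes by cutting at the boundary marker “*” and dropping the null symbol “$”. The function name `labels_to_morphemes` is hypothetical, and the label sequences are taken from Tables 6.19 and 6.20 rather than from an actual model run.

```python
def labels_to_morphemes(labels):
    """Group predicted labels into morphemes, cutting after each '*'."""
    morphemes = []
    current = ""
    for label in labels:
        text = label.rstrip("*")
        if text != "$":          # '$' marks a segment with no output
            current += text
        if label.endswith("*"):  # '*' closes the current morpheme
            morphemes.append(current)
            current = ""
    if current:                  # trailing material without a final '*'
        morphemes.append(current)
    return morphemes


print(labels_to_morphemes("k a l i* $ u t t* A n e*".split()))
print(labels_to_morphemes("k a l i* t* a n u*".split()))
```

The resulting morpheme list is what a second classifier such as model-II would receive in order to assign a grammatical category to each morpheme.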
6.5 SYSTEM EVALUATION AND PERFORMANCE
Developing a MAG that handles all types of word forms is a challenging task. The
developed rule based MAG is capable of analyzing and generating a list of twenty
thousand nouns, around three thousand verbs and a relatively smaller list of adjectives. The
uniqueness of the developed MAG is its capacity to generate and analyze transitive,
causative and tense forms, apart from passive constructions, auxiliaries and verbal
nouns. The performance of the developed system can be substantially improved by adding
more rules, such as rules for complex morphology. The accuracy of the developed MAG can
also be improved by checking it against more and more different types of word lexicons.
A rule based machine translation system for English to Kannada was
developed using the proposed MAG. Fig. 6.9 shows a command line
screenshot of the developed RMAG.
In the second proposed method, a parallel corpus of more than 200,000 words was
used for training with the SVMTlearn tool, and the morphological analyzer model was generated.
Using SVMTagger, we tested more than 50,000 different verb forms selected
from two standard dictionaries [166, 167] and from the Amrita POS tagged corpus.
The performance of the system was evaluated using the SVMTeval tool, and the incorrect
outputs were noted. In contrast to the rule based approach, the system performance
increased considerably when the input words whose outputs were incorrect during
testing and evaluation were added to the training corpus. From the experiment we
found that the system achieves a very
competitive accuracy of 96.25% for Kannada verbs.
Fig. 6.9: Command Line Output of Morph Tool
Sample screenshots of the developed MAG model for nouns are shown in Fig. 6.10 and
6.11. Similarly, Fig. 6.12 and 6.13 show screenshots of the developed MAG model for
verbs.
Fig. 6.10: Screenshot of Kannada Morph analyzer for Noun
Fig. 6.11: Screenshot of Kannada Morph generator for Noun
Fig. 6.12: Screenshot of Kannada Morph analyzer for Verb
Fig. 6.13: Screenshot of Kannada Morph generator for Verb
6.6 SUMMARY
This chapter is the part of the research work that deals with the design and
development of a morphological analyzer and generator for the Kannada language using rule
based as well as statistical approaches.
Development of a morphological analyzer and generator for all types of word forms is a
challenging task for an agglutinative language like Kannada. The implementations aimed
to incorporate more lexical information about the Kannada language, with good semantic features,
in order to solve the morphological problem more effectively. The performance of the
statistical approach depends on large aligned corpora covering all types of word
categories. On the other hand, the performance of the rule based approach depends on having
simple and complex linguistic rules that cover all types of word forms.
The performance of the developed rule based system can be substantially improved by
adding more rules and by checking against more and more different types of word lexicons.
The performance of the statistical approach can be improved by
increasing the corpus size to cover other word categories such as noun, pronoun and adverb.
6.7 PUBLICATIONS
1. Antony P J, Anand Kumar M and Soman K P: “Paradigm Based Morphological
Analyzer for Kannada Language Using Machine Learning Approach”, International
Journal on Advances in Computer Science and Technology (ACST), ISSN 0973-6107,
Vol. 3, No. 4, 2010, pp. 457–481.
2. Ramasami Veerappan, Antony P J and Soman K P: “A Rule based Kannada
Morphological Analyzer and Generator using Finite State Transducer”, International
Journal on Computer Application, IJCA (0975 – 8887), Volume 27, No. 10, August
2011.