International Journal of Advancements in Research & Technology, Volume 2, Issue 10, October-2013 174
ISSN 2278-7763
Copyright © 2013 SciResPub. IJOART

OCR for Classifying and Retrieving Different Syriac Alphabet Forms

Prof. Dr. Abdul Monem S. Rahma 1, Dr. Basima Z. Yacob 2 and Danny T. Baito 2

1 Computer Science, University of Technology, Baghdad, Iraq. Email: [email protected]
2 Computer Science, University of Duhok, Duhok, Iraq. Email: sadaanweya@yahoo.com

ABSTRACT

In this paper, an attempt is made to present a method for recognizing Syriac alphabet forms using invariant moments and retrieving the other forms of the recognized character. The character is recognized first, and then its other two forms are retrieved. A minimum distance nearest neighbor classifier is adopted for classification. The experimental results confirm a recognition and retrieval accuracy of 100% for the different Syriac alphabet forms.

Keywords: OCR, Syriac alphabet forms, West Syriac (Serta) alphabet, Estrangela alphabet

1 INTRODUCTION

Optical character recognition (OCR) is one of the important active areas of research in pattern recognition. Presently, communications, business and trade are all conducted through technology; hence, the development of reliable OCR systems for different scripts and languages is inevitable. Even though reasonably good OCR systems are available on the market for local and universal languages and scripts, the recognition rate drops drastically when an OCR system reads a document containing different font styles (forms) of the characters [1].

In the past several decades, a large number of OCR systems have been developed for natural languages [2], [3], [4]. However, the problem of Syriac character recognition has rarely been addressed [5]. The purpose of this paper is to develop an efficient system for recognizing different Syriac alphabet forms by computing invariant moments as features.
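The text here does not say which invariant moments are computed. As a minimal sketch, assuming they are Hu's seven moment invariants built from normalized central moments of a glyph image, the feature vector could be computed as follows. The function name `hu_moments` and the toy glyph are hypothetical, and this is not presented as the authors' implementation.

```python
import numpy as np

def hu_moments(image):
    """Return Hu's seven moment invariants of a 2-D binary or grayscale image."""
    img = np.asarray(image, dtype=float)
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc = (xs * img).sum() / m00  # centroid column
    yc = (ys * img).sum() / m00  # centroid row

    def eta(p, q):
        # Normalized central moment: invariant to translation and scale.
        mu = (((xs - xc) ** p) * ((ys - yc) ** q) * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ])

# Toy L-shaped glyph (made up for illustration), plus shifted and rotated copies.
glyph = np.zeros((12, 12))
glyph[2:7, 3:5] = 1
glyph[2:4, 3:9] = 1
shifted = np.roll(glyph, (3, 2), axis=(0, 1))
rotated = np.rot90(glyph)

features = hu_moments(glyph)
```

The shifted and rotated copies yield the same seven values up to floating-point error, which is the property that makes such moments usable as font-position-independent features.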
Syriac is an ancient language of Iraq that remains in cultural use there. It has many religious scripts as well as scientific and literary books, which have been written in different Syriac alphabet forms: Estrangela (bible font), Serta (West) and Madnkhaya (East).

In [5], the authors used invariant moment features to recognize the East Syriac alphabet form. In this paper, the Estrangela (bible font) and Serta (West) forms are recognized by computing the invariant moments as features, and the corresponding character in the other two Syriac alphabet forms is also retrieved.

In a multi-form script like Syriac, many manuscripts and books throughout the long history of the language have been written in three different alphabet forms. This creates the need for a single recognition system that covers all the Syriac alphabet forms.

2 OVERVIEW OF SYRIAC ALPHABET FORMS

The Syriac language is one of the Semitic languages, spoken in Iraq, Syria, Turkey and Iran by Assyrians. It is an ancient language, one of the rarest and oldest in the world. The Syriac alphabet consists of 22 characters and is written from right to left [6].

There are three important forms of the Syriac alphabet: Estrangela (bible font), Serta (West) and Madnkhaya (East).

Estrangela (bible font), meaning 'rounded', is the oldest form and is considered the classical version of the Syriac alphabet. It was revived during the 10th century, and is now used mainly in scholarly publications, titles and inscriptions. Fig. 1 shows the Estrangela Syriac alphabet form.

West Syriac is generally written with Serta, meaning 'line', which is also known as Psheta (simple), Maronite or Jacobite. It was modeled on Estrangela but with simpler, more flowing lines. A version of Serta appeared in the earliest Syriac manuscripts, and it became popular during the 8th century. Fig. 2 shows the West (Serta) Syriac alphabet form.

Fig. 1. Estrangela Syriac alphabet form



two other forms of “Taw” in Estrangela and Serta (West) can be retrieved using the same index. Fig. 4 shows the three forms of the letter “Taw”.
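The index-based retrieval described above can be sketched as a lookup into per-form template lists that share one letter order. This is an illustrative sketch only: the transliterated letter names, the `TEMPLATES` store, and the file paths are hypothetical, since the text does not describe how the templates are stored.

```python
# Transliterated names of the 22 Syriac letters in alphabetical order
# (spellings vary between sources; these are assumed for illustration).
LETTERS = [
    "Alap", "Beth", "Gamal", "Dalath", "He", "Waw", "Zain", "Heth",
    "Teth", "Yudh", "Kap", "Lamadh", "Mim", "Nun", "Semkath", "E",
    "Pe", "Sadhe", "Qop", "Resh", "Shin", "Taw",
]
FORMS = ("Estrangela", "Serta", "Madnkhaya")

# Hypothetical template store: every form lists its glyph images in the
# same letter order, so one index identifies the same letter in all forms.
TEMPLATES = {form: [f"{form}/{name}.png" for name in LETTERS] for form in FORMS}

def retrieve_other_forms(recognized_form, index):
    """Return the glyphs at `index` in the two forms other than the recognized one."""
    return {
        form: TEMPLATES[form][index]
        for form in FORMS
        if form != recognized_form
    }

taw_index = LETTERS.index("Taw")
others = retrieve_other_forms("Madnkhaya", taw_index)
```

Here a “Taw” recognized in the Madnkhaya (East) form returns the Estrangela and Serta entries stored at the same index.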

Using 7, 6, 5, 4, 3, 2 or 1 moments as the extracted features for classification gives the same recognition rate, and the recognition and retrieval accuracy is 100% for the different Syriac alphabet forms. The proposed OCR can therefore use one moment instead of two to seven, which reduces the time needed to recognize a character. Table 4 shows the recognition and retrieval time in milliseconds for one character when using between 1 and 7 moments.
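A minimum distance nearest neighbor classifier restricted to the first k moments can be sketched as below. The Euclidean distance, the function name `classify`, and the template values are assumptions: the values are synthetic placeholders, not measured moments, so the sketch shows the mechanism rather than reproducing the reported result.

```python
import numpy as np

def classify(features, templates, k=1):
    """Index of the template closest to `features`, using only the first k moments."""
    diffs = templates[:, :k] - features[:k]
    return int(np.argmin(np.linalg.norm(diffs, axis=1)))

# Synthetic placeholder features, NOT measured moments: 22 letters x 7 values,
# chosen so that every letter already differs in its first value.
templates = np.arange(22 * 7, dtype=float).reshape(22, 7) / 100.0

# A slightly perturbed copy of the last letter's feature vector.
query = templates[21] + 0.001

# Classify with 1, 2, ..., 7 moments.
results = [classify(query, templates, k) for k in range(1, 8)]
```

With these placeholder values every k returns the same index. In general, a single moment can only match the accuracy of seven when the first moment alone separates all the templates, which is what the results above report for the tested Syriac forms.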

5 CONCLUSION

A simple recognition system for Syriac alphabet characters in three different forms is proposed. It uses invariant moments as features and uses the index of the recognized letter to retrieve the other two forms of that letter. The test results show that using seven, six, five, four, three, two or one moment as features gives the same recognition and retrieval results. Consequently, one moment can be used instead of two to seven moments to recognize a character, which reduces the running time of the Syriac alphabet forms recognition system.

REFERENCES

[1] Hangarge, Shashikala Patil and B.V. Dhandra, “Multi-font/size Kannada Vowels and Numerals Recognition Based on Modified Invariant Moments”, IJCA Special Issue on “Recent Trends in Image Processing and Pattern Recognition”, RTIPPR, pp. 126-130, 2010.

[2] S. Sardar, A. Wahab, “Optical character recognition system for Urdu”, International Conference on Information and Emerging Technologies, June 2010.

[3] M. Soleymani and F. Razzazi, “An efficient front-end system for Isolated Persian/Arabic character Recognition of handwritten data-Entry Forms”, International Journal of Computational Intelligence, Vol. 1, pp. 193-196, 2003.

[4] B. B. Chaudhuri and U. Pal, “A Complete Printed Bangla OCR System”, Pattern Recognition, Vol. 31, pp. 531-549, 1998.

[5] Abdul Monem S. Rahma, Basima Z. Yacob and Danny T. Baito, “Moment invariants Based Features Extraction For Classification of Syriac Alphabet Language”, International Journal of Advances in Engineering and Technology, Vol. 6, Issue 4, pp. 1442-1451, Sept. 2013.

[6] Rev. Shlemon I. Khoshaba, “Lessons in the teaching of the Syriac language”, AL-Mashrq House Cultural, Duhok, Iraq, 2010.

[7] http://learnaramaic.blogspot.com/2012/05/estrangelo-syriac-script.html#.UjxT-NLwl7I, 2012.

[8] http://www.omniglot.com/writing/syriac.htm

[9] Syriac alphabet, http://en.wikipedia.org/wiki/Syriac_alphabet

[10] Dhandra B.V., Malemath V.S., Mallikarjun H., Hegadi Ravindra, “Multi-font English Character Recognition based on Modified Invariant Moments”, Journal of Combinatorial Mathematics and Combinatorial Computing, Vol. 67, pp. 153-162, 2008.

TABLE 4
THE RECOGNITION AND RETRIEVAL TIME (MS) OF USING MOMENTS BETWEEN 7 AND 1

Number of moments    Recognition time for one character + retrieval time of the two other forms (ms)
7                    230.5
6                    227.5
5                    225.9
4                    225.5
3                    223.2
2                    222.0
1                    220.1
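The size of the saving can be worked out directly from the Table 4 figures. The short calculation below copies the seven times from the table; the variable names are hypothetical, and nothing beyond the table is assumed.

```python
# Times copied from Table 4 (ms per character, recognition plus retrieval).
TIME_MS = {7: 230.5, 6: 227.5, 5: 225.9, 4: 225.5, 3: 223.2, 2: 222.0, 1: 220.1}

# Absolute and relative saving when using one moment instead of seven.
saving_ms = TIME_MS[7] - TIME_MS[1]
saving_pct = 100 * saving_ms / TIME_MS[7]

print(f"{saving_ms:.1f} ms saved per character ({saving_pct:.1f}%)")
# prints "10.4 ms saved per character (4.5%)"
```

On these figures, dropping from seven moments to one saves about 10.4 ms per character, roughly 4.5% of the seven-moment time.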

Fig. 4. The letter “Taw” in its three forms: (a) Estrangela, (b) Serta (West), (c) Madnkḥaya (East)
