17
A Web based system to capture outlines of Arabic fonts M. Sarfraz * , F.A. Razzak Department of Information and Computer Science, King Fahd University of Petroleum and Minerals, KFUPM # 1510, Dhahran 31261, Saudi Arabia Received 30 December 2001; received in revised form 17 March 2002; accepted 4 July 2002 Abstract This paper presents a Web based system for the automatic capture of Arabic fonts. The system is based on an algorithm, which incorporates various features including the detection of boundary, discovering corner and break points, and fitting the curve. The work done, in this paper, fully automate the process and produces best optimal results. Moreover the automated system developed is deployed over the Web using client/server model and state of the art technology. Ó 2002 Elsevier Science Inc. All rights reserved. Keywords: Web; Font; Corner points; Contour; Spline 1. Introduction Today, the systems, tools, and applications through World Wide Web have revolutionized the way people used to think, act, and react in the past. This revolution can be observed in all disciplines of life including business, education, government, entertainment, health, defense, etc. Although, a large amount of developments, in this area, have taken place in the last few years but it is too Information Sciences 150 (2003) 177–193 www.elsevier.com/locate/ins * Corresponding author. Tel.: +966-3-860-2763; fax: +966-3-860-1562. E-mail addresses: [email protected], [email protected] (M. Sarfraz). 0020-0255/02/$ - see front matter Ó 2002 Elsevier Science Inc. All rights reserved. PII:S0020-0255(02)00376-6

A Web based system to capture outlines of Arabic fonts

Embed Size (px)

Citation preview

A Web based system to capture outlinesof Arabic fonts

M. Sarfraz *, F.A. Razzak

Department of Information and Computer Science, King Fahd University of Petroleum and Minerals,

KFUPM # 1510, Dhahran 31261, Saudi Arabia

Received 30 December 2001; received in revised form 17 March 2002; accepted 4 July 2002

Abstract

This paper presents a Web based system for the automatic capture of Arabic fonts.

The system is based on an algorithm, which incorporates various features including the

detection of boundary, discovering corner and break points, and fitting the curve. The

work done, in this paper, fully automate the process and produces best optimal results.

Moreover the automated system developed is deployed over the Web using client/server

model and state of the art technology.

� 2002 Elsevier Science Inc. All rights reserved.

Keywords: Web; Font; Corner points; Contour; Spline

1. Introduction

Today, the systems, tools, and applications through World Wide Web have

revolutionized the way people used to think, act, and react in the past. This

revolution can be observed in all disciplines of life including business, education,

government, entertainment, health, defense, etc. Although, a large amount ofdevelopments, in this area, have taken place in the last few years but it is too

Information Sciences 150 (2003) 177–193

www.elsevier.com/locate/ins

*Corresponding author. Tel.: +966-3-860-2763; fax: +966-3-860-1562.

E-mail addresses: [email protected], [email protected] (M. Sarfraz).

0020-0255/02/$ - see front matter � 2002 Elsevier Science Inc. All rights reserved.

PII: S0020 -0255 (02 )00376 -6

young yet. The area in itself is too hot as far as current needs and challenges are

concerned. International community feels to achieve a great success to their

problems through this discipline. This article is devoted to one of the appli-

cations concerned with font designing. A Web based system has been devel-oped to facilitate the Internet community specifically concerned with Arabic

fonts.

Arabic characters are different from other characters such as English, Latin,

etc. The Arabic script is very rich in different font formats and its cursively

nature requires much more attention. This paper deals with an algorithm to

eliminate the human interaction to obtain the outline of original digital image

and the system�s appearance over the Web.In the traditional approaches [5,10,14], initially, a digitized image is ob-

tained either by scanning or from some electronic device. From this digitized

image, boundary or contour of the image is obtained [1,2,25]. Then corner

points of the image are determined from contour. These corner points can be

obtained by some interactive method or by some automated corner detection

algorithm [3,6,8,18,20]. Optimal curve fitting is done by segmenting the con-

tour outline at the corner points. Normally, the curve fitting methods are based

on Bezier quadratics or cubics [9,11–13,15–17,21,22,32,33].

The methodology, in this paper, mainly differs to the traditional ap-proaches in various ways. Since, some times corners are not detected precisely

and some times only corner points are not sufficient to fit the curve which

represent the original image, some more points are needed to achieve a best fit.

These points are called the break points and are used along with corner points

to achieve the best fit by using the least square method. The subdivision

methodology is used to conquest the desired solution. Another major difference

lies in the curve model for the description of design curve. The outline cap-

turing technique, instead of traditional Bezier quadratics or cubics, is basedupon a cubic spline model that has attracting features to control the curve

segments.

The growth of information technology and the World Wide Web motivates

us to make the system appear over the Web. The advantages of a Web based

system are

• share the knowledge with the whole Internet community,• good and easy-to-use user interface using HTML technology,• no additional software is required at the client side except Web browser,• user can test their own images and get a quick and faster response.The client/server model is used to achieve our goal. The Matlab Web Server,

Apache and Tomcat Web Server along with HTML, JSP and JavaBeans

technology [19] are used to implement the system.

This paper is organized as follows. The discussion of scanning the image is

in Section 2. Extraction of boundary is discussed in Section 3. The issue of

detecting the corner points is discussed in Section 4. The details of fitting cubic

178 M. Sarfraz, F.A. Razzak / Information Sciences 150 (2003) 177–193

spline are given in Section 5. The Web application structure and implemen-

tation is discussed in Section 6. Finally Section 7 concludes the paper.

2. Getting digitized image

Digitized image can be obtained directly from some electronic device or by

scanning an image. The quality of digitized scanned image depends on various

factors such as image on paper, scanner, and attributes set during scanning.

The quality of digitized image obtained directly from electronic device dependson the resolution of device, source of image, type of image etc. Fig. 1(a) and (b)

shows the digitized images of two different Arabic fonts.

3. Image outline extraction

The next step is to find all the cyclical outlines (i.e., closed curves) in thebitmap image. The resulting list is called a pixel outline list that consists of the

pixel coordinates of each edge on the outline. For example, the pixel outline list

for an ‘‘i’’ has two elements: one for the dot, and one for the stem. The pixeloutline list for an ‘‘o’’ also has two elements: one for the outside of the shape,

and one for the inside. Boundary of digitized image is extracted by using some

boundary detection algorithm. There are numerous algorithms for detecting

boundary [1,2,25]. We have used the algorithm in [25]. The input to this al-

gorithm is a bitmap file. The algorithm returns number of pieces and for eachpiece number of boundary points and values of these boundary points

Fig. 1. Two Arabic fonts.

M. Sarfraz, F.A. Razzak / Information Sciences 150 (2003) 177–193 179

pi ¼ ðxi; yiÞ, i ¼ 1; . . . ;N . Fig. 2(a) and (b) shows detected boundary of theimages of Fig. 1(a) and (b) respectively.

4. Detection of corner points

After finding out boundary points, next step is detection of corner points.

The corner points are those points that partition the outline into various

segments. There has been a good amount of work done for the detection of

corner points given the boundary of an image. A number of approaches have

been proposed by the researchers [3,6,8,18,20,23,24,26]. These include curva-

ture analysis with numerical techniques and some signal processing methods.

In [20] some of the possible ways to detect corners in an image are presented.The curvature can be analyzed using some numerical approaches. The detec-

tion of corner actually depends on the actual resolution of the image and

processing width to calculate the curvature. For the detection of corners, in this

paper, we adopted the technique used by Chetverikov and Szabo [6] algorithm.

The algorithm is demonstrated in Fig. 3(a) and (b) that shows detected corner

points for outlines in Fig. 2(a) and (b) respectively.

5. Outline generation

We divide the whole set of contour points into groups called segments such

that each segment lies between two consecutive corner points. The parametric

representation of curves is then used to fit the curve in a piecewise manner.

There are various curve design techniques in the literature [4,7,25–31], butcubic polynomials are most often used because lower-degree polynomials give

0 50 100 150 200 25020

40

60

80

100

120

140

160

180

0 20 40 60 80 100 120 1400

50

100

150

200

250

(a) (b)

Fig. 2. The contour of the fonts in Fig. 1.

180 M. Sarfraz, F.A. Razzak / Information Sciences 150 (2003) 177–193

too little flexibility in controlling the shape of the curve, and higher-degree

polynomials can introduce unwanted wiggles and also require more compu-

tation. No lower-degree representation allows a curve segment to interpolate

(pass through) two specified endpoints with specified derivatives at each end-

point. Given a cubic polynomial with its four coefficients, four knowns are used

to solve for the unknown coefficients. The four knowns might be the two

endpoints and the derivatives at the endpoints.

Let Fi, Fiþ1 i 2 Z be the two end characteristic points given at the distinctknots ti, tiþ1, i 2 Z with interval spacing hi ¼ tiþ1 � ti > 0. Also let Di, Diþ1,

i 2 Z denote the first derivative values defined at the knots. Then the gener-alized form of the cubic is defined by

Pijðti ;tiþ1ÞðtÞ ¼ ð1� tÞ3Fi þ 3tð1� tÞ2Vi þ 3t2ð1� tÞ þ t3Fiþ1 ð1Þ

where

Vi ¼ Fi þ hiDi=3; Wi ¼ Fiþ1 � hiDiþ1=3 ð2Þ

The interpolation conditions are as follows:

PiðtiÞ ¼ Fi; Piðtiþ1Þ ¼ Fiþ1 and

P ð1Þi ðtiÞ ¼ Di; P ð1Þ

i ðtiþ1Þ ¼ Diþ1; i 2 Zð3Þ

Eq. (1) can be rewritten as

Pijðti ;tiþ1ÞðtÞ ¼ R0;iðtÞFi þ R1;iðtÞVi þ R2;iðtÞ þ R3;iðtÞFiþ1 ð4Þ

0 50 100 150 200 25020

40

60

80

100

120

140

160

180

0 20 40 60 80 100 120 1400

50

100

150

200

250

(a) (b)

Fig. 3. The contours of the fonts in Fig. 1, identifying the corner points.

M. Sarfraz, F.A. Razzak / Information Sciences 150 (2003) 177–193 181

where

R0;iðtÞ ¼ ð1� tÞ3

R1;iðtÞ ¼ 3tð1� tÞ2

R2;iðtÞ ¼ 3t2ð1� tÞR3;iðtÞ ¼ t3

ð5Þ

The functions Rj;i, j ¼ 0; 1; 2; 3 are Bernstein Bezier like basis functions, suchthat

X3j¼0

Rj;iðtÞ ¼ 1 ð6Þ

One can observe that the piecewise curve, thus obtained, results to a cubic

spline with C1 continuity.There are number of parameterizations techniques in the literature [4,7,25–

28], we use the chord-length parameterization to estimate the parametric value

t associated with each point pi ¼ ðxi; yiÞ.

ti ¼0 if i ¼ 0;jp1p2j þ jp2p3j þ � � � þ jpi�1pijjp1p2j þ jp2p3j þ � � � þ jpn�1pnj

if i < n;

1 if i ¼ n:

8><>:

It should be noted that ti is in normalized form and varies from 0 to 1, andhence hi in our case is always equal to 1.

5.1. Estimation of tangent vectors

We define a distance based choice for tangent vectors Di�s at Fi�s as follows.For open curves, the tangents are defined as follows:

D0 ¼ 2ðF1 � F0Þ � ðF2 � F0Þ=2;Dn ¼ 2ðFn � Fn�1Þ � ðFn � Fn�2Þ=2;Di ¼ aiðFi � Fi�1Þ þ ð1� aiÞðFiþ1 � FiÞ; i ¼ 1; . . . ; n� 1

9=; ð7Þ

For close curves, the tangents are described as follows:

F�1 ¼ Fn�1; Fnþ1 ¼ F1;Di ¼ aiðFi � Fi�1Þ þ ð1� aiÞðFiþ1 � FiÞ; i ¼ 0; . . . ; n

ð8Þ

where

ai ¼jFiþ1 � Fij

jFiþ1 � Fij þ jFi � Fi�1j; i ¼ 0; . . . ; n: ð9Þ

182 M. Sarfraz, F.A. Razzak / Information Sciences 150 (2003) 177–193

5.2. Optimal design curve

The case, when tangents are estimated as described in Section 6.1, is a C1

Hermite spline curve and will be treated as a default design curve in this article.This is a simplest case to consider and requires less computation initially. Since

hi are equal to 1, we have:

Vi ¼ Fi þ Di=3 and Wi ¼ Fiþ1 � Diþ1=3

Fig. 4(a) and (b) are the Hermite cubic splines fitted to the corner points in Fig.

3(a) and (b) respectively.

Suppose, for i ¼ 0; 1; 2; . . . ; n� 1, the data segmentsfPi;j ¼ ðxi;j; yi;jÞ; j ¼ 1; 2; . . . ;mig ð10Þ

are given as the ordered sets of the universal set of the data points. Then the

square roots Si;j�s of distances between Pi;j�s and their corresponding parametricpoints Piðtj)�s on the curve are computed as:

Si ¼ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffijPiðui;jÞ � Pi;jj

q; i ¼ 0; 1; 2; . . . ; n� 1 ð11Þ

where the parameterization over u�s is in accordance with the chord-lengthparameterization.

For the best fitting of the curve to the given data, we have to find out the

spline curve so that the Si;j�s are minimal in each segment. This can be done bybreaking the curve pieces at those points, where the square roots of distances

are highest. Thus the curve fitted using this way, will be a candidate of best fit.

A fitted Hermite spline curve may not satisfy the threshold tolerance limit.

The curve is then to be subdivided in two at the point of worst error––the point

20

40

60

80

100

120

140

160

180

0 20 40 60 80 100 120 1400

50

100

150

200

250(a) (b)

0 50 100 150 200 250

Fig. 4. The Hermite cubic splines to the corner points in Fig. 3.

M. Sarfraz, F.A. Razzak / Information Sciences 150 (2003) 177–193 183

where the fitted spline is farthest from the digitized curve. The new break point

will be considered as a significant point and the curve is again fitted to thesecorner and break points obtained so far. This process of inserting break points

is repeated unless a threshold value is not violated.

Figs. 5 and 6 are the demonstration of the scheme. In the matrices of Fig. 5,

the first (from left to right) is for the bitmapped image of the font, second is for

the outline of the font, third is showing the corner points, fourth is the Hermite

spline curve fit, from fifth to ninth is the insertion of break points against the

threshold value three.

The matrix of Fig. 6 follows in the same way against the threshold valuetwo. One can observe that lesser threshold value gives rise to more break

points. Hence more accuracy costs more in terms of break points.

6. Web application

A system or application appearance over the World Wide Web is nowadays

common and sometimes essential and economical. Our aim is to implementand test the curve fitting approach that is discussed in Section 6 providing end

Fig. 5. Implementation of the scheme for threshold value 3.

184 M. Sarfraz, F.A. Razzak / Information Sciences 150 (2003) 177–193

user with an easy-to-use interface. Another reason for making the system Web

based is to share the knowledge amongst the whole Internet community.

We have used the Matlab Web Server along with Apache Web Server. We

will briefly discuss here the Matlab Web Server environment, it�s components,and how to build the applications in it. For extensive details, the reader isreferred to [19].

6.1. Matlab Web server environment

The Matlab Web Server enables to create Matlab applications that use the

capabilities of the World Wide Web to send data to Matlab for computation

and to display the results in a Web browser. The Matlab Web Server depends

upon TCP/IP networking for transmission of data between the client system

and Matlab. The required networking software and hardware must be installed

on the system prior to using the Matlab Web Server. In the simplest configu-

ration, a Web browser runs on client workstation, while Matlab, the MatlabWeb Server (matlabserver), and the Web server daemon (httpd) run on another

machine (see Fig. 7). In a more complex network, the Web server daemon can

Fig. 6. Implementation of the scheme for threshold value 2.

M. Sarfraz, F.A. Razzak / Information Sciences 150 (2003) 177–193 185

run on a machine apart from the others (see Fig. 8). Fig. 9 shows how Matlab

operates over the Web.

6.2. Structure and implementation

The structure of our system is similar to the one depicted in Fig. 7. We have

used Apache Web Server as the httpd daemon and our Web system works as

shown in Fig. 9. The scheme, discussed in previous sections, has been imple-

mented and made available for testing over the Internet. Demonstration of the

working Web page can be seen in Figs. 10–14. A set of Arabic alphabets has

been provided for testing in Fig. 10. Fig. 11 shows the option of threshold valuefor the square root difference of the actual and computed outline, the running of

the scheme, and the online help for the user. The selected font and its outline is

displayed in Fig. 12, whereas the computed outline is shown in Fig. 13 together

with the computed corner points and break points. The statistical information,

about the selected font and the computed outline, is provided in Fig. 14.

6.3. Uploading feature

In addition to providing some standard images and sets of alphabets, a usercan upload a bitmap image. Since Matlab Web Server does not support the

Fig. 7. Simple configuration.

186 M. Sarfraz, F.A. Razzak / Information Sciences 150 (2003) 177–193

uploading feature, we opt for Tomcat Web Server as it supports JSP and

JavaBeans technology. By using JSP and JavaBean technology, a user bitmap

Fig. 8. Complex configuration.

Fig. 9. Matlab on the Web.

M. Sarfraz, F.A. Razzak / Information Sciences 150 (2003) 177–193 187

Fig. 10. Set of Arabic alphabets.

Fig. 11. About the working of Web page.

188 M. Sarfraz, F.A. Razzak / Information Sciences 150 (2003) 177–193

Fig. 12. Screen for the selected font and its outline.

Fig. 13. Final result screen showing corner points, break points, and the computed outline.

M. Sarfraz, F.A. Razzak / Information Sciences 150 (2003) 177–193 189

image file is uploaded and then one html file is also updated according to it, so

that a user can see and test his image. Although the uploading feature uses

Fig. 15. Screen shot of an upload page.

Fig. 14. Statistical information.

190 M. Sarfraz, F.A. Razzak / Information Sciences 150 (2003) 177–193

different Web Server, this is completely transparent to the novice user. All that

one has to do is to select the file and then upload it by clicking a button. A

sample screen of an upload page is shown in Fig. 15.

7. Concluding remarks

An algorithm for the approximation of boundary of digital images, for

Arabic fonts, has been presented. In addition to the detection of corner points,

a strategy to detect a set of break points is also explained to optimize theoutline. The class of cubic splines is used to fit the optimal curve. It eliminates

the human interaction in obtaining the outline of digitized image. A World

Wide Web based interface has been created for automatic outline capturing of

the Arabic fonts. An easy-to-use interface is developed whereby users can test

the scheme discussed in this paper.

This work is still in progress and some more elegant results are expected to

be achieved by using different parametric techniques, segment subdivision and

cubic models. In addition to the available fonts, the authors are also looking toadd more sets of fonts. Moreover, the facility where the users can upload their

own fonts, is also in progress. The presently designed interface is simple and

more features will be added in the future.

Acknowledgements

The authors acknowledge the support of King Fahd University of Petro-

leum and Minerals, through the project FT/2000-03, in the development of this

work.

References

[1] I.S.I. Abuhaiba, S. Datto, M.J.J. Holt, Straight line approximation and id representation of

off-line hand-written text, Image and Vision Computing 13 (10) (1994) 755–769.

[2] G. Avrahami, V. Pratt, Sub-pixel edge detection in character digitization, in: R. Morris,

J. Andre (Eds.), Raster Imaging and Digital Typography II, Cambridge University Press, 1991,

pp. 54–64.

[3] H.L. Beus, An improved Corner Detection Algorithm based on chain coded plane curves,

Pattern Recognition 20 (3) (1987) 291–296.

[4] J.M. Brun, S. Fouofu, A. Bouras, P. Arthaud, In search of an optimal parameterization of

curves, in: COMPUGRAPHICS95, 1995, pp. 135–142.

[5] H. Chang, H. Yan, Vectorization of hand-drawn image using piecewise cubic Bezier curves

fitting, Pattern Recognition 31 (11) (1998) 1747–1755.

M. Sarfraz, F.A. Razzak / Information Sciences 150 (2003) 177–193 191

[6] D. Chetverikov, Z. Szabo, A simple and efficient algorithm for detection of high curvature

points in planar curves, Proc. 23rd Workshop of the Australian Pattern Recognition Group,

1999, pp. 175–184.

[7] G.E. Farin, Curves and Surfaces for Computer Aided Geometric Design: A Practical Guide,

Academic Press, New York, 1997.

[8] H. Freeman, L.S. Davis, A corner finding algorithm for chain-coded curves, IEEE Trans.

Computers 26 (1977) 297–303.

[9] R.D. Hersch, in: Roger D. Hersch (Ed.), Font Rasterization, Visual and Technical Aspects of

Type, Cambridge University Press, 1993, pp. 78–109, ISBN 0-521-44026-2.

[10] J. Herz, R.D. Hersch, Towards a universal auto-hinting system for typographic shapes,

Electronic Publishing 7 (4) (1994) 251–260.

[11] F. Hussain, 1994. Quadratic representation of Bezier cubic outlines, private communication.

[12] F. Hussain, Conic rescue of Bezier founts, New Advances in Computer Graphics, Proceeding

of CG International�89, Springer-Verlag, Tokyo, 1989, pp. 97–120.[13] F. Hussain, B. Zalik, S. Kolmanic, Intelligent digitisation of Arabic characters, IEEE Pro-

ceedings on Information Visualisation conference, London, 2000, pp. 337–342.

[14] K. Itoh, Y. Ohno, A curve fitting algorithm for character fonts, Electronic Publishing 6 (3)

(1993) 195–205.

[15] P. Karow, Font Technology (Methods and Tools), Springer-Verlag, Berlin, 1994.

[16] P. Karow, Digital Typefaces: Description and Formats, Springer-Verlag, Berlin, 1994.

[17] M.A. Khan, An efficient font design method, Master�s thesis, Department of Informationand Computer Science, King Fahd University of Petroleum and Minerals, Saudi Arabia,

2001.

[18] H.C. Liu, M.D. Srinath, Corner detection from chain-code, Pattern Recognition (1990) 51–68.

[19] MATLAB Web Server, The Language of Technical Computing, The MathWorks, Inc.,

Available from http://www.mathworks.com/access/helpdesk/help/pdfndoc/webserver/web-server.pdf (2000).

[20] S. Pei et al., Corner detection using nest moving average, Pattern Recognition 27 (11) (1994)

1533–1537.

[21] M.L.V. Pitteway, Algorithms of Conic Generation, in: NATO ASI series F, vol. 17, Springer-

Verlag, 1985, pp. 219–237.

[22] V. Pratt, Techniques for conic splines, computer graphics, ACM SIGGRAPH 19 (1985) 151–

159.

[23] A. Quddus, M.M. Fahmy, Wavelet transformation for grey-level corner detection, Pattern

Recognition 28 (6) (1995) 853–861.

[24] A. Quddus, M.M. Fahmy, A fast wavelet-based corner detection technique, Electronics Letters

35 (4) (1999) 287–288.

[25] A. Quddus, Curvature Analysis Using Multiresolution Techniques, PhD thesis, King Fahd

University of Petroleum and Minerals, Saudi Arabia, 1998.

[26] A. Rosenfeld, E. Johnston, Angle detection on digital curves, IEEE Trans. Computers 22

(1973) 875–878.

[27] A. Rosenfeld, J.S. Weszka, An improved method of angle detection on digital curves, IEEE

Trans. Computers 24 (1975) 940–941.

[28] M. Sarfraz, Curves and surfaces for CAD using C2 rational cubic splines, Engineering with

Computers 11 (2) (1995) 94–102.

[29] M. Sarfraz, Generalized geometric interpolation for rational cubic splines, Computers and

Graphics 18 (1) (1994) 61–72.

[30] M. Sarfraz, A C2 rational cubic spline which has linear denominator and shape control,

Annales Univ. Sci. Budapest 37 (1994) 53–62.

[31] M. Sarfraz, Interpolatory rational bicubic spline surface with shape control, J. Sci. Res. 20

(1&2) (1991) 43–64.

192 M. Sarfraz, F.A. Razzak / Information Sciences 150 (2003) 177–193

[32] P.J. Schneider, Phoenix: an interactive curve design system based on the automatic fitting of

hand sketched curves, Master�s thesis, University of Washington, USA, 1988.[33] B. Zalik, F. Hussain, Constraint-based interactive system for font outlines design, Proceedings:

International Conference on Imaging Science, Systems, and Technology, CISST�98, Las Vegas,Nevada, USA, 1998, pp. 274–281, ISBN: 1-892512-03-3.

M. Sarfraz, F.A. Razzak / Information Sciences 150 (2003) 177–193 193