Arabic mathematical e-documents

M. EDDAHIBI, A. LAZREK and K. SAMI

University Cadi Ayyad, Marrakech, Morocco
www.ucam.ac.ma/fssm/rydarab

[email protected]

TUG


Outline

• Mathematics in Arabic
• e-document in Arabic
• Why MathML?
• MathML I18n
• Mozilla’s extension
• Conclusions & Prospects

Mathematics in Arabic

Arabic writing

• writing runs from right to left
• the Arabic alphabet is specific
• the writing is cursive
• consonants are written as letters, while vowels are diacritics
• some letters differ only by the number and the position of dots
• some letters differ only by some parts of their glyphs
• the shape of a letter depends on its position in the word (initial, medial, final and isolated forms)
• punctuation marks have particularities of orientation and of glyph
• the letters of a word can be superposed through ligatures, as characters
• letters can be stretched in a curvilinear way through the kashida, which is an extensible curve
• some letters and words can also be superposed as non-characters
• several calligraphic styles are in use; the shapes of letters, ligatures, calligraphic rules, . . . vary according to these styles (Farisy, Koufy, Maghriby, Naskh, Thuluth, Rouqaa, Dywany)

Arabic mathematical layout

In Arabic presentation, mathematical expressions are characterized by:

• the direction of writing: from right to left
• the cursivity of letters: characters are tied with small flowing curves
• the use of specific symbols:
  - Arabic alphabetic symbols
  - some conventional calligraphic symbols
  - some mirrored symbols

These differences are an additional source of difficulty for tools that compose mathematical documents.

e-document in Arabic

The RyDArab system allows the composition of Arabic mathematical expressions. As it is a TeX extension, it preserves all the distinguished qualities of TeX:

• fully digital composition
• high typographical quality
• several options to adapt it to different areas and levels
• generation of documents in several formats (DVI, PS, PDF)
• production of documents in HTML format with image-based mathematical expressions

Why MathML?

Mathematical documents in HTML (with images), PDF or PS formats lead to some problems, such as:

• Size: loading such formats can be very slow
• Reuse: mathematical expressions can’t be re-edited
• Structure: mathematical information is not available for searching, indexing, . . .
• Portability: mathematical expressions can’t be processed by a computer algebra system

Moreover, the typographical quality of documents is poor in the case of the HTML format: mathematical expressions appear dotty or grainy when the document is printed.

MathML presents a better solution that avoids these drawbacks.

MathML I18n

Semantically, an Arabic mathematical expression is the same as its Latin equivalent. The Latin encoding is:

  <declare type="fn">
    <ci> f </ci>
    <lambda>
      <bvar> <ci> x </ci> </bvar>
      <apply>
        <plus/>
        <apply>
          <power/>
          <ci> x </ci>
          <cn> 2 </cn>
        </apply>
        <ci> x </ci>
        <cn> 3 </cn>
      </apply>
    </lambda>
  </declare>

The Arabic encoding has exactly the same structure; only the identifiers inside the ci elements are Arabic letters.
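The claim that both encodings carry the same meaning can be checked mechanically. The sketch below, using Python's standard xml.etree module, parses two Content MathML fragments and compares them after renaming identifiers by order of first appearance. The Arabic identifiers د and س and the helper name skeleton are assumptions made for illustration; they do not come from the slides.

```python
import xml.etree.ElementTree as ET

LATIN = """
<declare type="fn">
  <ci>f</ci>
  <lambda>
    <bvar><ci>x</ci></bvar>
    <apply><plus/>
      <apply><power/><ci>x</ci><cn>2</cn></apply>
      <ci>x</ci>
      <cn>3</cn>
    </apply>
  </lambda>
</declare>
"""

# Same structure with Arabic identifiers (the letters are illustrative assumptions).
ARABIC = LATIN.replace("<ci>f</ci>", "<ci>د</ci>").replace("<ci>x</ci>", "<ci>س</ci>")


def skeleton(xml_text):
    """Return the tree as nested tuples, renaming identifiers by first appearance."""
    names = {}

    def walk(node):
        text = (node.text or "").strip()
        if node.tag == "ci":
            # Every distinct identifier gets a positional name: id0, id1, ...
            text = names.setdefault(text, f"id{len(names)}")
        return (node.tag, text, tuple(walk(child) for child in node))

    return walk(ET.fromstring(xml_text))


same_meaning = skeleton(LATIN) == skeleton(ARABIC)
print(same_meaning)
```

Only the identifier names are abstracted away, so a change to a number or an operator still makes the comparison fail.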

Only display aspects need to be taken into account. The presentation markup is the same in both notations, apart from the identifier inside mi:

  <math>
    <msqrt>
      <mfrac>
        <mrow>
          <mo> - </mo>
          <mi> x </mi>
          <mo> + </mo>
          <mn> 2 </mn>
        </mrow>
        <mn> 3 </mn>
      </mfrac>
    </msqrt>
  </math>

[Rendered formulas: the expression in Arabic layout and in Latin layout]

Presentation viewer

Is there a need to create a new Arabic MathML viewer, or only to extend an existing one? To answer this question, we tested several software tools, such as:

• MathType, Equation Editor, WebEq, MathPlayer, TechExplorer, MathCad, Ezmath, . . .
• TeXmacs, Mozilla, Amaya, . . .

For several technical reasons, and once non-free and non-open-source tools were set aside, two web browsers remained:

• Mozilla
• Amaya

Amaya

Amaya is a W3C WYSIWYG editor/browser that allows authors to include mathematical expressions in web pages, following the MathML specification.

Under Linux, Amaya uses GTK+ 1.2, without support for bidirectionality or cursivity. Moreover, it uses a drawing area as the medium for viewing web pages. The gdk_draw_text function is used to display text. This function can’t be used to display text in Arabic, Japanese, Thai, . . . , even when it is used with multilingual versions of GTK+. Amaya uses fixed-size PCF fonts.

For extensible symbols, Amaya uses drawing functions. For example, the square root is obtained by drawing three lines.

[Figure: square root drawn with three lines]

Mozilla

Mozilla is a web browser produced by Mozilla and Netscape. Mozilla was selected for adaptation to the needs of the situation, mainly because of its popularity and widespread adoption, as well as the existence of an Arabic version. Its layout of mathematical expressions in Latin writing is more elegant and of good typographical quality compared to other systems.

Mozilla uses GTK+ 2.0 for the display of web pages. It is a multilingual web browser and uses TrueType fonts. Mozilla uses FreeType, which allows the handling of glyphs.

For extensible mathematical symbols, it combines drawing functions and font glyphs. For example, the square root is obtained from one glyph and one line.

[Figure: square root built from one glyph and one line]

Mozilla’s extension

In Mozilla’s mathematical environment, there is no support for bidirectionality or cursivity. A bidirectionality algorithm for MathML, if one existed, would necessarily differ from the one used for text.

  <mn> 1 </mn>
  <mo> + </mo>
  <mi> … </mi>
  <mo> - </mo>
  <mn> 2 </mn>

Here the mi element holds an Arabic letter. The operands are not rendered in the intended order, because that letter is a strong right-to-left character in Unicode.
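The bidirectional classes that drive this behavior can be inspected with Python's standard unicodedata module. In the sketch below, the letter beh (U+0628) is an assumed stand-in for the Arabic identifier. AL marks a strong right-to-left Arabic letter, while EN (European number) and ES (European separator) are weak classes whose placement depends on the surrounding strong characters.

```python
import unicodedata

# Tokens of the expression; the Arabic letter beh (U+0628) is an assumed example.
tokens = ["1", "+", "\u0628", "-", "2"]

# Bidirectional class of each token, as recorded in the Unicode Character Database.
classes = [unicodedata.bidirectional(token) for token in tokens]

for token, bidi_class in zip(tokens, classes):
    print(f"U+{ord(token):04X} {bidi_class}")
```

A text-oriented bidirectional algorithm resolves the weak classes relative to the single strong character, which is one way to see why a formula needs different rules from running text.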

In MathML, expressions spread out from left to right: subexpressions are placed from left to right within their parent expression. Neither bidirectionality nor cursivity is supported, even with mtext (the counterpart of \hbox in TeX) holding Arabic text:

  <mtext> … </mtext>

The behavior is the same with mi:

  <mi> … </mi>

Expression rendering

A new element rl will be introduced to invert the direction of rendering of an expression:

  <mrow>
    <rl>
      <mi> … </mi>
      <mo> + </mo>
      <mi> … </mi>
    </rl>
  </mrow>

Text rendering

rl is also used for suitable text rendering:

  <mtext> <rl> … </rl> </mtext>
  <mi> <rl> … </rl> </mi>
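The effect of rl on an expression can be pictured with a toy layout routine. The Python sketch below is not Mozilla's implementation; it only assumes that a renderer places children left to right, except inside rl, where the order is inverted. The function name visual_order is hypothetical, and the Latin letters a and b stand in for Arabic identifiers.

```python
import xml.etree.ElementTree as ET


def visual_order(node):
    """Return leaf texts in left-to-right visual order.

    Children are laid out in document order, except under <rl>,
    where their order is reversed (toy model of the proposed element).
    """
    children = list(node)
    if not children:
        return [(node.text or "").strip()]
    if node.tag == "rl":
        children.reverse()
    order = []
    for child in children:
        order.extend(visual_order(child))
    return order


with_rl = ET.fromstring("<mrow><rl><mi>a</mi><mo>+</mo><mi>b</mi></rl></mrow>")
without_rl = ET.fromstring("<mrow><mi>a</mi><mo>+</mo><mi>b</mi></mrow>")

print(visual_order(with_rl))
print(visual_order(without_rl))
```

The model handles only the ordering of boxes; cursive joining inside mtext or mi is a separate problem that it does not address.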

Mirrored symbols

rl can help to obtain mirrored symbols:

  <rl>
    <mo> <rl> [ </rl> </mo>
    <mi> … </mi>
    <mo> , </mo>
    <mn> 3 </mn>
    <mo> <rl> ) </rl> </mo>
  </rl>

The separator can be encoded as <mo> , </mo>, as <mo> <rl> , </rl> </mo>, or with a dedicated comma character placed directly inside mo.

The characters [ and ) are already contained in the Bidi Mirroring list of the Unicode Character Database, but the character , is not yet mentioned in this list.
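The mirroring status of these three characters can be looked up with Python's unicodedata module, which exposes the Bidi_Mirrored property of the Unicode version bundled with the interpreter. This is a quick check, not part of the proposed extension.

```python
import unicodedata

# unicodedata.mirrored returns 1 for characters that carry the
# Bidi_Mirrored property in the Unicode Character Database, else 0.
mirrored_flags = {ch: bool(unicodedata.mirrored(ch)) for ch in "[),"}

for ch, flag in mirrored_flags.items():
    print(repr(ch), flag)
```

The bracket and the parenthesis are flagged as mirrored, while the comma is not, which matches the observation on the slide.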

Vertical superposition

Elements that require more than one argument perform two-dimensional composition, and MathML rendering arranges their arguments in a particular way. Elements of vertical arrangement, such as mfrac, do not need special handling:

  <mfrac>
    <rl>
      <mi> … </mi>
      <mo> + </mo>
      <mi> … </mi>
    </rl>
    <mn> 3 </mn>
  </mfrac>

Multi-argument elements

Using rl to place the arguments of a superscript element msup or a subscript element msub in the suitable right-to-left position generates a syntax error, because msup requires two arguments, whereas it then has only one:

  <msup>
    <rl>
      <mi> … </mi>
      <mi> … </mi>
    </rl>
  </msup>

In that case, we need new elements amsup and amsub:

  <amsup>
    <mi> … </mi>
    <mi> … </mi>
  </amsup>
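The arity problem can be reproduced with a small checker. The Python sketch below is a simplified validator written for illustration (the names REQUIRED_ARGS and arity_errors are hypothetical); it only counts the direct children of a few elements that MathML defines with exactly two arguments.

```python
import xml.etree.ElementTree as ET

# Elements that take exactly two arguments in MathML presentation markup.
REQUIRED_ARGS = {"msup": 2, "msub": 2, "mfrac": 2}


def arity_errors(xml_text):
    """List the elements whose number of children differs from the required count."""
    errors = []
    for node in ET.fromstring(xml_text).iter():
        expected = REQUIRED_ARGS.get(node.tag)
        if expected is not None and len(node) != expected:
            errors.append(f"{node.tag}: expected {expected} arguments, found {len(node)}")
    return errors


wrapped = "<msup><rl><mi>a</mi><mi>b</mi></rl></msup>"  # rl becomes the only child
plain = "<msup><mi>a</mi><mi>b</mi></msup>"

print(arity_errors(wrapped))
print(arity_errors(plain))
```

Because rl counts as a single child of msup, the wrapped form is rejected, which is the motivation for dedicated elements such as amsup and amsub.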

Specific notation

In Arabic, some mathematical objects have a completely different notation from their Latin equivalents. For example, the Arabic notation for arrangements, in combinatorics, is encoded as:

  <amarrange>
    <mi> … </mi>
    <mn> 5 </mn>
    <mn> 2 </mn>
  </amarrange>

Reflected symbols

In Arabic mathematical presentation, some symbols, such as the square root symbol or, in some Arabic cultural areas, the sum, are built through a symmetric reflection of the corresponding Latin ones. These symbols require the introduction of a new font family, such as the one offered by Arabic Computer Modern. The ACM fonts correspond to the Computer Modern fonts, in TrueType format, with a mirror effect on some glyphs.

This leads to the introduction of a new element amath, which corresponds to the math element. amath is used to select the ACM fonts.

amath would not be necessary if the Arabic mathematical symbols had already been added to the Unicode tables. For example, the Unicode name N-ARY SUMMATION and the code U+02211, written &sum; or &#x02211;, correspond to two glyphs: the Latin form and its mirrored form.

  <amath>
    <rl>
      <mstyle displaystyle="true">
        <munderover>
          <mo> &sum; </mo>
          <mrow>
            <rl>
              <mi> … </mi>
              <mo> = </mo>
              <mn> 1 </mn>
            </rl>
          </mrow>
          <mi> … </mi>
        </munderover>
      </mstyle>
      <mi> … </mi>
    </rl>
  </amath>
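That both references denote one and the same character, so that the glyph choice has to come from elsewhere (here, the font selected by amath), can be confirmed with Python's standard html and unicodedata modules. This is only a lookup sketch; it says nothing about how a given font draws the character.

```python
import unicodedata
from html import unescape

sigma = "\u2211"

# Both the named entity and the numeric reference decode to the same character.
same_character = unescape("&sum;") == unescape("&#x02211;") == sigma

# Code point, Unicode name, and Bidi_Mirrored flag of that character.
info = (f"U+{ord(sigma):04X}", unicodedata.name(sigma), bool(unicodedata.mirrored(sigma)))

print(same_character, info)
```

Since the encoding offers a single code point, the Latin and the reflected shapes can only be told apart by the rendering context.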

Alphabetic symbols

To distinguish alphabetic symbols, in their different shapes, from the letters used in Arabic text, and to avoid the heterogeneity that results from the use of several fonts, a complete and homogeneous Arabic mathematical font is needed. That is exactly what we are trying to build in another project.

Until their adoption by Unicode, the symbols in use in this font will be located in the Private Use Area E000-F8FF of the Basic Multilingual Plane:

  <mi> &#xE004; </mi>
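A code point such as U+E004 can be tested against that range in a few lines. The Python sketch below uses a hypothetical helper, in_bmp_private_use, together with the standard unicodedata module, which reports the general category Co (private use) for such characters.

```python
import unicodedata

# Bounds of the Private Use Area in the Basic Multilingual Plane.
PUA_START, PUA_END = 0xE000, 0xF8FF


def in_bmp_private_use(ch):
    """True if the character lies in the BMP Private Use Area."""
    return PUA_START <= ord(ch) <= PUA_END


symbol = "\uE004"       # code point used in the example above
arabic_beh = "\u0628"   # an ordinary Arabic letter, for comparison

print(in_bmp_private_use(symbol), unicodedata.category(symbol))
print(in_bmp_private_use(arabic_beh), unicodedata.category(arabic_beh))
```

Private-use code points have no standard name or glyph, so a document that uses them is only readable with the matching font installed.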

Composed extensible symbols

The use of the Arabic Computer Modern font family is not enough to compose expressions with symbols such as the square root.

[Figure: result of using the ACM font only, compared with what should be done]

The use of rl alone is not a solution; we will have to introduce a new element amsqrt:

  <amath>
    <amsqrt>
      <mfrac>
        <rl>
          <mi> … </mi>
          <mo> + </mo>
          <mi> … </mi>
        </rl>
        <mn> 3 </mn>
      </mfrac>
    </amsqrt>
  </amath>

The root element amroot requires two treatments:

• the same treatment as that used for amsqrt
• a treatment similar to that used for amsup

  <amath>
    <amroot>
      <mi> … </mi>
      <mn> 3 </mn>
    </amroot>
  </amath>

Slanted symbols

Some elements, such as munderover, munder and mover, need italic correction. Mathematical symbols like the integral are slanted, so the subscript and superscript are shifted in the direction of the symbol’s slant. These corrections allow indices and exponents to sit very close to the slanted symbol.

With amsubsup:

  <amath>
    <rl>
      <mstyle displaystyle="true">
        <amsubsup>
          <mo> &int; </mo>
          <mn> 0 </mn>
          <mn> 1 </mn>
        </amsubsup>
      </mstyle>
      <amsup>
        <mi> … </mi>
        <mi> … </mi>
      </amsup>
      <mi> … </mi>
      <mi> … </mi>
    </rl>
  </amath>

With amunderover:

  <amath>
    <rl>
      <mstyle displaystyle="true">
        <amunderover>
          <mo> &int; </mo>
          <mn> 0 </mn>
          <mn> 1 </mn>
        </amunderover>
      </mstyle>
      <amsup>
        <mi> … </mi>
        <mi> … </mi>
      </amsup>
      <mi> … </mi>
      <mi> … </mi>
    </rl>
  </amath>

Extensible symbols

In addition to the extensible symbols borrowed from Latin mathematics, Arabic mathematics has several other extensible symbols:

• the limit
• the product
• the sum

In the following example, the limit symbol is lengthened manually. Dynamic lengthening, based on an automatic calculation of the width of the expression under the limit sign, would be better.

[Figure: limit expression with a manually lengthened limit symbol]

Dynamic lengthening requires the creation of new elements. Lengthening with a straight line does not conform to the rules of Arabic calligraphy; a curvilinear lengthening is required. It can be obtained by using CurExt, which makes it possible to stretch Arabic letters according to calligraphic rules:

• the limit
• the product
• the sum

The following mathematical expression is an example of the use of mover, with automatic lengthening of the over arrow, obtained by setting the attribute accentunder to true.

[Figure: expression with an automatically lengthened over arrow]

Discussion

The use of the element rl is not a very practical solution, as the encoding becomes heavier. The addition of this element and of all the new elements must be transparent for the user, since they affect only the presentation and not the semantics of the expression.

There are two possible solutions:

• to build a new bidirectionality algorithm
• to add a new attribute to math that makes it possible to choose the mathematical notation:
  <math nota="latin"> for a Latin expression
  <math nota="arabic"> for an Arabic expression

One of these values can be the default value of the attribute.
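One way to picture how a nota attribute could keep the new elements hidden from the author is a rewriting step applied before rendering. The Python sketch below is purely hypothetical: the function expand_notation, the tag mapping, and the choice to wrap the whole content in a single rl are assumptions made for illustration, not a description of the extended Mozilla.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from standard elements to the Arabic-specific ones
# named in the slides (the mapping itself is an assumption).
ARABIC_TAGS = {"msup": "amsup", "msub": "amsub", "msqrt": "amsqrt", "mroot": "amroot"}


def expand_notation(xml_text):
    """Rewrite <math nota="arabic"> into amath/rl/am* markup; leave other input unchanged."""
    root = ET.fromstring(xml_text)
    if root.tag != "math" or root.get("nota") != "arabic":
        return ET.tostring(root, encoding="unicode")

    # Rename the elements that need an Arabic-specific treatment.
    for node in root.iter():
        node.tag = ARABIC_TAGS.get(node.tag, node.tag)

    # Move all children under a single rl element.
    rl = ET.Element("rl")
    children = list(root)
    rl.extend(children)
    for child in children:
        root.remove(child)
    root.append(rl)

    root.tag = "amath"
    del root.attrib["nota"]
    return ET.tostring(root, encoding="unicode")


arabic = expand_notation('<math nota="arabic"><msqrt><mi>a</mi></msqrt></math>')
latin = expand_notation('<math nota="latin"><msup><mi>a</mi><mn>2</mn></msup></math>')
print(arabic)
print(latin)
```

With such a step, the author writes standard MathML plus one attribute, and the heavier encoding exists only inside the rendering pipeline.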

Conclusions & Prospects

The project for the development of communication and publication tools for scientific and technical e-documents in Arabic is still in its beginning.

Our goal was to identify the difficulties and limitations that can obstruct the use of MathML for writing mathematics in Arabic.

Arabic mathematical e-documents can be structured and published on the web using this extended version of Mozilla. Thus, such documents benefit from all the advantages of using MathML.

We hope that the previous proposals can help in finding suitable recommendations for Arabic mathematics in Unicode and MathML.

Mathematical notation

Special features:

• high precision of the composition of expressions
  - integration of the expressions into the text
  - any change in the position of a symbol in the streamline has a meaning
  - the size of a symbol has to change according to its position
  - the shape of a symbol changes according to the size and the need
• size variation of the symbols according to the context
• use of a significant number of writing features:
  - types: text, symbols, extensible symbols, . . .
  - attributes: bold, italic, . . .

MathML

MathML is a W3C recommendation for mathematical expressions. It consists of an XML vocabulary with the following goals:

• Encode both mathematical notation and mathematical meaning
• Facilitate conversion to and from other mathematical formats
• Be well suited to template and other mathematics editing techniques
• Be human legible
• Allow the passing of information intended for specific renderers and applications