Localizing Korn Shell Scripts

    Finnbarr P. Murphy

    In response to recent messages on the ast-users mailing list asking for a how-to or FAQ on how to localize Korn Shell (ksh93) shell scripts, I decided to write this post, because there is a paucity of good information on this particular topic available on the Internet or in print.

    First of all, what is meant by localization? Internationalization (internationalisation, I18N) and localization (localisation, L10N) are means of adapting software applications to different languages and cultural differences. Internationalization is the process of designing and engineering a software application so that it can readily support various languages and regional differences without changes to the source code. Localization is the process of adapting internationalized software for a particular geographical area by translating the text strings in the user interface into the local language and providing any necessary environment variables that affect codesets, character sorting order, date and time display, thousands separators and suchlike.

    An example is probably the simplest way to demonstrate what is involved in localizing a shell script. Assume we want to localize the following very simple shell script, called demo, which is located in the directory /example:

    #!/bin/ksh
    name="John Kane"
    print "Simple demonstration of ksh93 message translation"
    print "Message locale is: $LC_MESSAGES"
    echo "Hello"
    print "Goodbye"
    printf "Welcome %s\n" "$name"
    print "This string should not be translated because it is not in the message catalog"
    exit 0

    This shell script is to be localized for French and Italian users so that the strings enclosed in double quotation marks (message text strings) are displayed in their native language.

    Before the shell script can be localized, it must first be internationalized. The message text strings must be written in a format which ksh93 understands to mean "replace this text string, if possible, with the appropriate text string from a message catalog for the current locale." Fortunately this is easy to do in ksh93 using the special syntax $"...". A $ in front of a double-quoted string is ignored in the C or POSIX locale, but in other locales it may cause the text inside the double quotes (the default message text string) to be replaced by a locale-specific message text string. Why the use of "may" instead of "shall" in the previous sentence? Well, if the shell script has not yet been localized, a suitable message catalog may not yet exist, and therefore the default message text string will be displayed.

    Here is the internationalized version of demo:

    #!/bin/ksh
    name="John Kane"
    print "Simple demonstration of ksh93 message translation"
    print "Message locale is: $LC_MESSAGES"
    echo $"Hello"
    print $"Goodbye"
    printf $"Welcome %s\n" "$name"
    print $"This string should not be translated because it is not in the message catalog"
    exit 0

    If you execute this version of the shell script, the default message text strings are still displayed. It works just like the original version, since no message catalogs have been created so far.

    The next stage of the process is to extract the message text strings and translate them into the appropriate languages. You can extract these text strings manually, or you can let ksh93 do all the work of extracting them by invoking ksh93 with the -D option:

    $ ksh -D demo
    "Hello"
    "Goodbye"
    "Welcome %s\n"
    "This string should not be translated because it is not in the message catalog"
    $

    Incidentally, bash also supports the $"..." message string syntax and the -D option. However, the Bash Reference Manual covers this functionality only briefly and relies on the GNU gettext tools and the PO (portable object) file format and localization methodology for the actual translations.

    When localizing a shell script, a decision has to be made as to where to place the localized message catalogs. Typically they are placed in a subdirectory under the directory where the script is located, but they can be placed elsewhere if the NLSPATH environment variable is set. By default, ksh93 supports the following locations for message catalogs:

    ${ROOT}/share/lib/locale/%l/%C/%N
    ${ROOT}/share/locale/%l/%C/%N
    ${ROOT}/lib/locale/%l/%C/%N

    where ${ROOT} is the directory containing the shell script and %l, %C and %N have the same meaning as when used with the NLSPATH environment variable.

    NLSPATH is the environment variable which catopen() uses when attempting to locate message catalogs. The NLS in NLSPATH stands for National Language Support. An NLSPATH variable consists of one or more templates separated by colons. Each template consists of an optional prefix, one or more format elements, a filename and an optional suffix. For example, the following NLSPATH variable contains a leading colon followed by two templates:

    NLSPATH=":%N.cat:/shlib/message/%L/%N.cat"

    A leading colon or two adjacent colons (::) is equivalent to specifying %N. A string describing the current locale is expected to have the form language[_territory[.codeset]], e.g. en_US.utf8 or de_DE.utf8, because all three components are used by the NLSPATH format elements:

    %N  This format element is substituted with the name of the message catalog file.
    %L  This format element is substituted with the current locale name.
    %l  This format element is substituted with the language component of the current locale name.
    %t  This format element is substituted with the territory component of the current locale name.
    %c  This format element is substituted with the codeset component of the current locale name.
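    To make these substitution rules concrete, here is a small POSIX shell sketch that expands an NLSPATH value into the candidate catalog paths for a given locale and catalog name. The functions nlspath_expand and nlspath_candidates are hypothetical helpers written only for this illustration; the real lookup happens inside catopen() and libast and may differ in details (for example, the %C element used in the ksh93 default locations is not handled here).

```shell
#!/bin/sh
# Hypothetical helpers (not part of ksh93 or libast): expand NLSPATH
# templates using the %N, %L, %l, %t and %c format elements.

# nlspath_expand TEMPLATE LOCALE NAME
nlspath_expand() {
    _tmpl=$1 _loc=$2 _name=$3
    # Split language[_territory[.codeset]] into its components.
    _lang=${_loc%%_*}
    _lang=${_lang%%.*}
    _terr= _cset=
    case $_loc in *_*) _terr=${_loc#*_}; _terr=${_terr%%.*} ;; esac
    case $_loc in *.*) _cset=${_loc#*.} ;; esac
    _out=
    while [ -n "$_tmpl" ]; do
        case $_tmpl in
            %N*) _out=$_out$_name ;;
            %L*) _out=$_out$_loc ;;
            %l*) _out=$_out$_lang ;;
            %t*) _out=$_out$_terr ;;
            %c*) _out=$_out$_cset ;;
            %%*) _out=$_out% ;;
            *)
                # Ordinary character: copy it and advance by one.
                _tail=${_tmpl#?}
                _out=$_out${_tmpl%"$_tail"}
                _tmpl=$_tail
                continue ;;
        esac
        # Skip the two-character format element just handled.
        _tmpl=${_tmpl#??}
    done
    printf '%s\n' "$_out"
}

# nlspath_candidates NLSPATH LOCALE NAME: print one candidate per template.
nlspath_candidates() {
    _list=$1:
    while [ -n "$_list" ]; do
        _t=${_list%%:*}
        _list=${_list#*:}
        # An empty template (leading colon or "::") means plain %N.
        [ -n "$_t" ] || _t=%N
        nlspath_expand "$_t" "$2" "$3"
    done
}
```

    For example, nlspath_candidates ':%N.cat:/shlib/message/%L/%N.cat' fr_FR.utf8 demo prints demo, demo.cat and /shlib/message/fr_FR.utf8/demo.cat, one per line.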

    In order to demonstrate the use of NLSPATH and to keep things simple, our example places the message catalogs in a subdirectory under the directory where the shell script is located, using the following directory structure:

    /example/demo
    /example/locale
    /example/locale/C
    /example/locale/fr
    /example/locale/it

    Note the /example/locale/C subdirectory. A message catalog in this directory is mandatory; otherwise localization does not work and only the default message text strings are displayed. The message text strings in this message catalog must be exactly the same as the message text strings in your script. Code in libast (../libast/port/mg.c) compares the default message text string from the shell script with all the message text strings in this message catalog. If there is a match, the message catalog set and member numbers (more about these shortly) are used to quickly retrieve the corresponding message text string from the appropriate locale's message catalog, if one exists.

    Here is the message text source file (demo.msg) for the C locale:

    $quote "
    $set 3 This is the C locale message set
    1 "Hello"
    2 "Goodbye"
    3 "Welcome %s\\n"

    Message text source files must conform to the gencat format specification; see IEEE Std 1003.1-2008 for the full specification. Here is a brief summary of the more important directives in this specification:

    $ comment          A line beginning with $ followed by a blank character is treated as a comment.
    $delset N comment  Deletes message set N from an existing message catalog. Any text following the set number is treated as a comment.
    $quote C           Specifies a quote character C to surround message-text so that trailing spaces or empty messages are visible in a message source line. By default no quoting of message-text is recognized.
    $set N comment     Specifies the set identifier N of the following messages until the next $set or EOF. Any text following the set identifier is treated as a comment. If no $set directive is specified, all messages are placed in message set 1.
    M message-text     The message-text is stored in the message catalog with the set identifier specified by the last $set directive and with a message identifier of M.

    Refer to the msggen or gencat man pages or the IEEE standard for the complete specification. There is much more to the specification than what I have described here.

    You must use the msggen utility from the AT&T AST (Advanced Software Technologies) Open Source Collection to generate (compile) a message catalog from a message text source file. In case you are unaware of it, ksh93 is also part of the AST Open Source Collection. The msggen utility is part of the ast-base package; it is not part of the ast-ksh package. Note the use of 3 for the set number in the above example. This is mandatory for ksh93 shell scripts: it is hardcoded into ksh93, and no other set number will work. AST libraries use set 1, AST commands and utilities use set 2, and shell scripts use set 3.
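    Putting the -D extraction step and the set 3 rule together, a rough sketch can turn the output of ksh -D into a starting point for the C locale message text source file. The function name msg_skeleton is hypothetical. It assumes one double-quoted string per input line, numbers the messages from 1 in the order they appear, and doubles every backslash so that a literal \n in the script (as in "Welcome %s\n") is written as \\n, as in demo.msg above. If ksh -D escapes embedded quotes or other characters differently, the result will need hand editing, and you will still want to delete any strings you do not intend to translate.

```shell
#!/bin/sh
# Hypothetical helper: read `ksh -D script` output on stdin (one
# double-quoted string per line) and write a msggen/gencat source
# skeleton for message set 3 on stdout.
msg_skeleton() {
    printf '%s\n' '$quote "' '$set 3 This is the C locale message set'
    # Double each backslash, then prefix each line with its message number.
    sed 's/\\/\\\\/g' | awk '{ printf "%d %s\n", NR, $0 }'
}
```

    Typical use would be ksh -D demo | msg_skeleton > demo.msg, followed by a review of demo.msg before compiling it.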

    Message catalogs produced by msggen are platform independent and are smaller than the equivalent catalogs produced by the gencat utility. On the other hand, message catalogs produced by gencat are platform dependent and may have to be recompiled when ported to a different platform. If you have difficulty distinguishing between message catalogs produced by gencat and those produced by msggen, an easy way to tell the two formats apart is to look at the first 4 bytes of the message catalog. Catalogs generated by msggen begin with the magic string:

    "\015\023\007\000"

    Here is how to use msggen to generate a message catalog from a message text source file:

    $ msggen locale/C/demo.cat demo.msg

    If the specified message catalog already exists, msggen merges the message text source file into it; otherwise a new message catalog is created. If set and message numbers collide, the new message text replaces the message text currently contained in the message catalog. Non-ASCII characters must be UTF-8 encoded. Message text source files containing symbolic identifiers cannot be processed by msggen.
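    Because this post relies on msggen-format catalogs, it can be handy to check a catalog's format from a script. A minimal sketch, assuming a POSIX od(1), reads the first four bytes of a file and compares them with the msggen magic string quoted earlier (octal \015\023\007\000, which is hexadecimal 0d 13 07 00). The function name is_msggen_catalog is hypothetical, and it checks only the magic number, not whether the rest of the file is a valid catalog.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if FILE begins with the
# msggen magic bytes \015\023\007\000, i.e. 0x0d 0x13 0x07 0x00.
is_msggen_catalog() {
    # -An: no offsets; -tx1: one hex byte per item; -N4: first 4 bytes only.
    _magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    [ "$_magic" = 0d130700 ]
}
```

    For example, is_msggen_catalog locale/C/demo.cat && echo "msggen format" reports whether the C locale catalog was produced by msggen.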

    A message catalog does not need an extension, although it is common practice to give message catalogs the .cat extension. By default, however, ksh93 expects the name of the message catalog to be exactly the same as the name of the shell script. You can work around this restriction to a certain extent by using an appropriate NLSPATH string. In this example we set NLSPATH so that it handles the .cat extension:

    $ NLSPATH=/example/locale/%l/%N.cat; export NLSPATH

    where %l is the language element and %N is the catalog name element.

    You can also use msggen to check a compiled message catalog:

    $ msggen -l locale/C/demo.cat
    $quote "
    $set 3
    1 "Hello"
    2 "Goodbye"
    3 "Welcome %s\\n"

    Another useful AST utility, msgget, retrieves a specific message string from a message catalog, as shown here (3 is the set number and 2 is the message number):

    $ msgget C demo 3.2
    Goodbye
    $ msgget fr_FR.utf8 demo 3.2
    Au Revoir
    $

    Here are the contents of the French message text source file (demo.msg.fr):

    $quote "
    $set 3 This is the French locale message set
    1 "Bonjour"
    2 "Au Revoir"
    3 "Bienvenu %s\\n"

    which is compiled into the French locale message catalog by:

    $ msggen locale/fr/demo.cat demo.msg.fr

    and here are the contents of the Italian message text source file (demo.msg.it):

    $quote "
    $set 3 This is the Italian locale message set
    1 "Ciao"
    2 "Addio"
    3 "Benvenuto %s\\n"

    which is compiled into the Italian locale message catalog by:

    $ msggen locale/it/demo.cat demo.msg.it
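    The steps for writing and compiling the three catalogs can be collected into one script. This is a sketch that assumes the /example layout used in this post and that msggen is on your PATH; build_demo_catalogs is a hypothetical name, and when msggen is not installed the script only writes the message text source files and prints a warning instead of compiling them.

```shell
#!/bin/sh
# Hypothetical helper: write demo.msg, demo.msg.fr and demo.msg.it into
# DIR (default: current directory) and compile them with msggen if available.
# The function body runs in a subshell, so the cd does not affect the caller.
build_demo_catalogs() (
    cd "${1:-.}" || exit 1
    mkdir -p locale/C locale/fr locale/it || exit 1
    # Quoted 'EOF' delimiters stop the shell from expanding $quote and $set.
    cat > demo.msg <<'EOF'
$quote "
$set 3 This is the C locale message set
1 "Hello"
2 "Goodbye"
3 "Welcome %s\\n"
EOF
    cat > demo.msg.fr <<'EOF'
$quote "
$set 3 This is the French locale message set
1 "Bonjour"
2 "Au Revoir"
3 "Bienvenu %s\\n"
EOF
    cat > demo.msg.it <<'EOF'
$quote "
$set 3 This is the Italian locale message set
1 "Ciao"
2 "Addio"
3 "Benvenuto %s\\n"
EOF
    if command -v msggen >/dev/null 2>&1; then
        msggen locale/C/demo.cat demo.msg &&
        msggen locale/fr/demo.cat demo.msg.fr &&
        msggen locale/it/demo.cat demo.msg.it
    else
        echo "msggen not found; message source files written but not compiled" >&2
    fi
)
```

    Running build_demo_catalogs /example would then leave demo.cat in each of the locale/C, locale/fr and locale/it subdirectories, assuming msggen succeeds.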

    Now that we have generated all the necessary message catalogs and placed them in the appropriate subdirectories, we are ready to test the localization of the demo script:

    $ NLSPATH=/example/locale/%l/%N.cat; export NLSPATH
    $ LC_MESSAGES=en_US.utf8; export LC_MESSAGES
    $ ./demo
    Simple demonstration of ksh93 message translation
    Message locale is: en_US.utf8
    Hello
    Goodbye
    Welcome John Kane
    This string should not be translated because it is not in the message catalog
    $ LC_MESSAGES=fr_FR.utf8; export LC_MESSAGES
    $ ./demo
    Simple demonstration of ksh93 message translation
    Message locale is: fr_FR.utf8
    Bonjour
    Au Revoir
    Bienvenu John Kane
    This string should not be translated because it is not in the message catalog
    $ LC_MESSAGES=it_IT.utf8; export LC_MESSAGES
    $ ./demo
    Simple demonstration of ksh93 message translation
    Message locale is: it_IT.utf8
    Ciao
    Addio
    Benvenuto John Kane
    This string should not be translated because it is not in the message catalog

    In the above example I used the LC_MESSAGES environment variable to tell ksh93 which message catalog to use when displaying message strings. This is all that ksh93 actually needs to locate the right message catalog.
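    Note that the demo script prints $LC_MESSAGES directly, which shows an empty value when only LANG or LC_ALL is set. Under the usual POSIX precedence rules, a non-empty LC_ALL overrides LC_MESSAGES, which in turn overrides LANG, and the C locale applies if none of them is set. A small sketch of that lookup follows; messages_locale is a hypothetical helper, not a ksh93 builtin.

```shell
#!/bin/sh
# Hypothetical helper: report which locale governs message text, using the
# POSIX precedence LC_ALL > LC_MESSAGES > LANG > C (empty values are ignored).
messages_locale() {
    if [ -n "${LC_ALL-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_MESSAGES-}" ]; then
        printf '%s\n' "$LC_MESSAGES"
    elif [ -n "${LANG-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' C
    fi
}
```

    In the demo script, print "Message locale is: $(messages_locale)" would then report the effective message locale even when only LANG is set.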

    In the real world, however, the LANG environment variable would normally be set to the appropriate locale instead of just LC_MESSAGES. Because LANG affects every locale category, not only message text, this can cause unexpected output and errors in your scripts. Consider floating-point numbers, for example: do not assume that the decimal point is always a period.

    $ float pi=3.14159; printf "%.5f\n" pi
    3.14159
    $ LANG=es_ES.utf8; export LANG
    $ locale -k LC_NUMERIC
    decimal_point=","
    thousands_sep=""
    grouping=-1;-1
    numeric-decimal-point-wc=44
    numeric-thousands-sep-wc=0
    numeric-codeset="UTF-8"
    $ printf "%.5f\n" pi
    3,14159
    $ float pi=3.14159; printf "%.5f\n" pi
    ksh: 3.14159: arithmetic syntax error
    $

    When the locale is set to es_ES (Spain), the decimal point is a comma, not a period as in the USA. Note how the assignment float pi=3.14159 fails in the es_ES locale because it uses a period as the decimal point.
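    One defensive approach, offered here as a general shell technique rather than a ksh93 feature, is to pin the numeric locale to C for values your script produces for other programs to parse, and to keep locale-sensitive formatting for text shown to people. The helper name c_printf is hypothetical; it runs printf in a subshell with LC_ALL=C so that the caller's locale settings are left untouched.

```shell
#!/bin/sh
# Hypothetical helper: printf with the C locale forced, so "." is always
# the decimal point. The subshell body keeps LC_ALL=C out of the caller.
c_printf() (
    LC_ALL=C
    export LC_ALL
    printf "$@"
)

# A value that another program (or a later arithmetic step) will parse.
ratio=$(c_printf '%.2f' 3.14159)
printf 'ratio=%s\n' "$ratio"
```

    The value in $ratio uses a period as the decimal point whatever LANG or LC_NUMERIC is set to, so it can safely be written to a file or passed to another tool.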

    Do not assume that digits are grouped in threes or that the grouping separator is a comma. Consider the following:

    $ printf "%d %'d\n" 10000000 10000000
    10000000 10000000
    $ LC_NUMERIC=en_GB printf "%d %'d\n" 10000000 10000000
    10000000 10,000,000
    $ LC_NUMERIC=de_DE printf "%d %'d\n" 10000000 10000000
    10000000 10.000.000
    $ LC_NUMERIC=de_CH printf "%d %'d\n" 10000000 10000000
    10000000 10'000'000

    Note the use of %'d (the apostrophe flag) to indicate that the grouping separator should be included in the output.

    Do not make assumptions about the format of the output of commands such as date and who. Such assumptions will generally fail in a non-US locale. For example, determining the day of the month by piping the output of the date command to awk will fail in a non-US locale. If a shell script makes assumptions about the format of the output from locale-sensitive commands and utilities, then it needs to be changed.
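    For the date example, a sketch of two safer alternatives: ask date(1) for exactly the field you need with a format string, or pin the locale to C when you really must parse a command's default output. Both +%d and LC_ALL=C are standard mechanisms; the variable names day and stamp are only for illustration.

```shell
#!/bin/sh
# Fragile: assumes the day of the month is the third field of date's
# default output, which depends on the locale.
#   day=$(date | awk '{print $3}')

# Better: request the field explicitly; +%d is the two-digit day of month.
day=$(date +%d)

# If default output really must be parsed, fix its format with the C locale.
stamp=$(LC_ALL=C date)

printf 'Day of month: %s\n' "$day"
printf 'C-locale timestamp: %s\n' "$stamp"
```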

    In conclusion, I hope this post helps readers understand how to localize their ksh93 shell scripts. Obviously it is only a quick introduction to the subject. Please let me know if there is anything important that I have not discussed or have got wrong, and I will update the post.

    P.S. The above example was tested on ksh93 version 93t+.2010-03-05

    07-18-2010 Copyright 2004-2010 Finnbarr P. Murphy. All rights reserved.
    http://blog.fpmurphy.com/2010/07/localizing-korn-shell-scripts.html