Localizing Korn Shell Scripts
Finnbarr P. Murphy
In response to recent messages on the ast-users mailing list asking for a how-to or FAQ on how to
localize Korn Shell (ksh93) shell scripts, I decided to write this post as there is a paucity of good
information available on the Internet or in print on this particular topic.
First of all, what is meant by localization? Internationalization (internationalisation, I18N) and
localization (localisation, L10N) are means of adapting software applications to different
languages and cultural differences. Internationalization is the process of designing and
engineering a software application so that it can readily support various languages and regional
differences without changes to the source code. Localization is the process of adapting
internationalized software for a particular geographical area by translating text strings in the user
interface into a local language and providing any necessary environment variables which affect
codesets, character sorting order, date and time display, thousands separators and suchlike.
An example is probably the simplest way to demonstrate what is involved in localizing a shell
script. Assume we want to localize the following very simple shell script called
demo, which is located in the directory /example:
#!/bin/ksh
name="John Kane"
print "Simple demonstration of ksh93 message translation"
print "Message locale is: $LC_MESSAGES"
echo "Hello"
print "Goodbye"
printf "Welcome %s\n" "$name"
print "This string is not translated because it is not in the message catalog"
exit 0
This shell script is to be localized for French and Italian users so that the strings enclosed in
double quotations (message text strings) are displayed in their native language.
Before the shell script can be localized, it must first be internationalized. The message text strings
must be written in a format which ksh93 understands to mean "replace this text string, if possible,
by the appropriate text string from a message catalog for the current locale". Fortunately this is
easy to do in ksh93 using the special syntax $"...". A $ in front of a double quoted string is ignored
in the C or POSIX locale but in other locales may cause the text inside the double quotes (the
default message text string) to be replaced by a locale specific message text string. Why the use of
may instead of shall in the previous sentence? Well, if the shell script has not yet been localized, a
suitable message catalog may not yet exist and therefore the default message text string will be
displayed.
Here is the internationalized version of demo.
#!/bin/ksh
name="John Kane"
print "Simple demonstration of ksh93 message translation"
print "Message locale is: $LC_MESSAGES"
echo $"Hello"
print $"Goodbye"
printf $"Welcome %s\n" "$name"
print $"This string is not translated because it is not in the message catalog"
exit 0
The default message text strings are still displayed if you execute the shell script. It works just like
the original version since no message catalogs have so far been created.
The next stage of the process is to extract the message text strings and translate them into the
appropriate languages. You can manually extract these text strings or you can let ksh93 do all the
work of extracting the text strings by invoking ksh93 with the -D option.
$ ksh -D demo
"Hello"
"Goodbye"
"Welcome %s\n"
"This string should not be translated because it is not in the message catalog"
$
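If ksh93 is not at hand, the manual extraction mentioned above can be approximated with sed. The following is only a rough sketch, not a substitute for ksh -D: the function name extract_msgs is made up for illustration, and it assumes at most one $"..." string per line with no escaped double quotes inside it.

```shell
# extract_msgs FILE   (hypothetical helper, not part of ksh93 or AST)
# Print every $"..." string found in FILE, one per line, quotes included.
# Assumes one such string per line and no escaped quotes inside it.
extract_msgs() {
    # \$ matches a literal dollar sign; the quoted string is captured
    # and printed without the leading $.
    sed -n 's/.*\$\("[^"]*"\).*/\1/p' "$1"
}
```

For example, extract_msgs demo lists the marked strings of the internationalized script, which can then be compared with the output of ksh -D.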
Incidentally, the bash shell also supports the $"..." message string syntax and the -D
option. However, the Bash Reference Manual describes localization in terms of the GNU gettext
PO (portable object) file format and methodology rather than the message catalogs used here.
When localizing a shell script, a decision has to be made as to where to place the localized
message catalogs. Typically they are placed in a subdirectory under the directory where the script
is located, but they can be placed elsewhere if the NLSPATH environment variable is set. ksh93
supports the following locations for message catalogs by default:
${ROOT}/share/lib/locale/%l/%C/%N
${ROOT}/share/locale/%l/%C/%N
${ROOT}/lib/locale/%l/%C/%N
where ${ROOT} is the directory containing the shell script and %l, %C and %N have the same
meaning as when used with the NLSPATH environment variable.
NLSPATH is the environment variable which catopen() uses to attempt to locate message
catalogs. The NLS in NLSPATH stands for National Language Support. An NLSPATH variable
consists of one or more templates. Templates consist of an optional prefix, one or more format
elements, a filename and an optional suffix. Templates are separated by colons. For example, the
following NLSPATH variable consists of two templates:
NLSPATH=":%N.cat:/shlib/message/%L/%N.cat"
A leading colon or two adjacent colons (::) is equivalent to specifying %N. A string describing the
current locale is expected to have the form language[_territory[.codeset]], e.g. en_US.utf8,
de_DE.utf8, as all three components are used by NLSPATH formatting elements.
%N  This format element is substituted with the name of the message catalog file.
%L  This format element is substituted with the current locale name.
%l  This format element is substituted with the language component of the current locale name.
%t  This format element is substituted with the territory component of the current locale name.
%c  This format element is substituted with the codeset component of the current locale name.
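To make the substitution rules above concrete, here is a small sketch that expands a single NLSPATH template by hand. It is purely illustrative: the real expansion is done by catopen(), the function name expand_nls_template is hypothetical, and the sketch handles neither multiple colon-separated templates nor the leading-colon shorthand. It also assumes that the locale and catalog names contain no characters that are special to sed (such as | or &).

```shell
# expand_nls_template TEMPLATE LOCALE NAME   (hypothetical helper)
# Print TEMPLATE with %N, %L, %l, %t and %c replaced, given a locale of
# the form language[_territory[.codeset]] and a catalog name.
expand_nls_template() {
    template=$1
    locale=$2
    name=$3

    # Split the locale name into its three components.
    codeset=
    territory=
    case $locale in *.*) codeset=${locale#*.} ;; esac
    rest=${locale%%.*}
    case $rest in *_*) territory=${rest#*_} ;; esac
    language=${rest%%_*}

    # sed is case sensitive, so %L and %l are replaced independently.
    printf '%s\n' "$template" | sed \
        -e "s|%N|$name|g" \
        -e "s|%L|$locale|g" \
        -e "s|%l|$language|g" \
        -e "s|%t|$territory|g" \
        -e "s|%c|$codeset|g"
}
```

With these assumptions, expand_nls_template '/shlib/message/%L/%N.cat' fr_FR.utf8 demo prints /shlib/message/fr_FR.utf8/demo.cat.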
In order to demonstrate the use of NLSPATH and to keep things simple, our example places the
message catalogs in a subdirectory under the directory where the shell script is located, using
the following directory structure:
/example/demo
/example/locale
/example/locale/C
/example/locale/fr
/example/locale/it
Note the /example/locale/C subdirectory. It is mandatory to have a message catalog in this
directory, otherwise localization does not work and only the default message text strings are
displayed. The message text strings in this message catalog must be exactly the same as the
message text strings in your script. Code in libast (../libast/port/mg.c) compares the default
message text string from the shell script to all the message text strings in this message catalog. If
there is a match, the message catalog set and message numbers (more about these shortly) are
used to quickly retrieve the corresponding message text string from the appropriate locale
message catalog if one exists.
Here is the message text source file (demo.msg) for the C locale:
$quote "
$set 3 This is the C locale message set
1 "Hello"
2 "Goodbye"
3 "Welcome %s\\n"
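Because the strings in the C locale catalog must match the strings in the script exactly, a quick consistency check can save some head scratching. The following sketch lists the $"..." strings in a script that do not appear in a message text source file. The function name missing_from_catalog is hypothetical, and the sketch assumes one $"..." string per line, no escaped double quotes, and backslashes doubled in the source file as in the example above.

```shell
# missing_from_catalog SCRIPT MSGFILE   (hypothetical helper)
# Print each $"..." string in SCRIPT that has no identical entry in the
# message text source file MSGFILE.
missing_from_catalog() {
    script=$1
    msgfile=$2
    sed -n 's/.*\$\("[^"]*"\).*/\1/p' "$script" |
    while IFS= read -r msg; do
        # Double the backslashes before comparing, since the source file
        # stores them that way.
        escaped=$(printf '%s\n' "$msg" | sed 's/\\/\\\\/g')
        # -F compares the text literally rather than as a pattern.
        grep -F -q -e "$escaped" "$msgfile" || printf '%s\n' "$msg"
    done
}
```

Run against demo and demo.msg, this should report only the string that was deliberately left out of the catalog.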
Message text source files must conform to the gencat format specification. See IEEE Std
1003.1-2008 for the full specification. Here is a brief summary of the more important directives in
this specification:
$ comment
    A line beginning with $ followed by a blank character is treated as a comment.
$delset N comment
    Delete message set N from an existing message catalog. Any text following
    the set number is treated as a comment.
$quote C
    Specifies a quote character C to surround message-text so that trailing spaces or
    empty messages are visible in a message source line. By default no quoting of message-text is
    recognized.
$set N comment
    Specifies the set identifier N of the following messages until the next $set or
    EOF. Any text following the set identifier is treated as a comment. If no $set directive is
    specified, all messages are placed in message set 1.
M message-text
    The message-text is stored in the message catalog with the set identifier
    specified by the last $set directive and with a message identifier of M.
Refer to the msggen or gencat man pages or the IEEE Standard for the complete specification.
There is much more to the specification than what I have described here.
You must use the msggen utility from the AT&T AST (Advanced Software Technologies) Open
Source Collection to generate (compile) a message catalog from a message text source file. In case
you are unaware of it, ksh93 is also part of the AST Open Source Collection. The msggen utility is
part of the ast-base package; it is not part of the ast-ksh package. Note the use of 3 for the set
number in the above example. This is mandatory for ksh93 shell scripts. It is hardcoded into ksh93;
no other set number will work. AST libraries use set 1, AST commands and utilities use set 2 and
shell scripts use set 3.
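As a convenience, a skeleton message text source file can be generated from a list of quoted strings such as the one printed by ksh -D. This is only a sketch under a few assumptions: the function name make_msg_source is made up, the input has one quoted string per line, and backslashes are doubled to follow the convention used in demo.msg above. Strings that should not be translated still have to be removed by hand.

```shell
# make_msg_source   (hypothetical helper)
# Read quoted message strings on standard input, one per line, and write
# a message text source file that places them in set 3.
make_msg_source() {
    # Header: declare the quote character and the set used by shell scripts.
    printf '%s\n' '$quote "' '$set 3'
    # Double each backslash, then number the messages starting from 1.
    sed 's/\\/\\\\/g' | awk '{ print NR " " $0 }'
}
```

A typical use would be ksh -D demo | make_msg_source > demo.msg, followed by a manual review of the result.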
Message catalogs produced by msggen are platform independent and are smaller than the
equivalent catalog produced by the gencat utility. On the other hand, message catalogs produced
by gencat are platform dependent and may have to be recompiled when ported to a different
platform. If you have difficulty distinguishing between message catalogs produced by gencat and
those produced by msggen, an easy way to differentiate the two formats is by means of the first 4
bytes of a message catalog. Those generated by msggen contain the magic string:
"\015\023\007\000"
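The check on the first 4 bytes is easy to script. Here is a minimal sketch using od; the function name is_msggen_catalog is hypothetical, and the only fact it relies on is the magic string quoted above (octal 015 023 007 000, which is 0d 13 07 00 in hexadecimal).

```shell
# is_msggen_catalog FILE   (hypothetical helper)
# Succeed if the first 4 bytes of FILE are the msggen magic string.
is_msggen_catalog() {
    # Dump the first 4 bytes as hex and strip od's padding.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \t\n')
    [ "$magic" = "0d130700" ]
}
```

For example: is_msggen_catalog locale/C/demo.cat && echo "msggen format".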
Here is how to use msggen to generate a message catalog from a message text source file.
$ msggen locale/C/demo.cat demo.msg
If the specified message catalog already exists, msggen merges the message text source file into
this message catalog, otherwise a new message catalog is created. If set and message numbers
collide, the new message text will replace the message text currently contained in the message
catalog. Non-ASCII characters must be UTF-8 encoded. Message text source files containing
symbolic identifiers cannot be processed by the msgget utility.
You do not have to give a message catalog an extension. However it is common practice for
message catalogs to use the .cat extension. In fact you can call the message catalog anything you
like but the default is for the name of the message catalog to be exactly the same as the name of
the shell script. You can work around this restriction to a certain extent by the use of an
appropriate NLSPATH string.
In this example we set NLSPATH so that it handles the .cat extension.
$ NLSPATH=/example/locale/%l/%N.cat; export NLSPATH
where %l is the language element and %N is the catalog name parameter.
You can also use msggen to check a compiled message catalog:
$ msggen -l locale/C/demo.cat
$quote "
$set 3
1 "Hello"
2 "Goodbye"
3 "Welcome %s\\n"
Another useful utility is msgget, which you can use to retrieve a specific message string from a
message catalog as shown here (3 is the set, 2 is the id):
$ msgget C demo 3.2
Goodbye
$ msgget fr_FR.utf8 demo 3.2
Au Revoir
$
Here are the contents of the French message text source file (demo.msg.fr):
$quote "
$set 3 This is the French locale message set
1 "Bonjour"
2 "Au Revoir"
3 "Bienvenu %s\\n"
which is compiled into the French locale message catalog by:
$ msggen locale/fr/demo.cat demo.msg.fr
and here are the contents of the Italian message text source file (demo.msg.it):
$quote "
$set 3 This is the Italian locale message set
1 "Ciao"
2 "Addio"
3 "Benvenuto %s\\n"
which is compiled into the Italian locale message catalog by:
$ msggen locale/it/demo.cat demo.msg.it
Now that we have generated all the necessary message catalogs and placed them in the
appropriate subdirectories, we are ready to test the localization of the demo script.
$ NLSPATH=/example/locale/%l/%N.cat; export NLSPATH
$ LC_MESSAGES=en_US.utf8; export LC_MESSAGES
$ ./demo
Simple demonstration of ksh93 message translation
Message locale is: en_US.utf8
Hello
Goodbye
Welcome John Kane
This string should not be translated because it is not in the message catalog
$ LC_MESSAGES=fr_FR.utf8; export LC_MESSAGES
$ ./demo
Simple demonstration of ksh93 message translation
Message locale is: fr_FR.utf8
Bonjour
Au Revoir
Bienvenu John Kane
This string should not be translated because it is not in the message catalog
$ LC_MESSAGES=it_IT.utf8; export LC_MESSAGES
$ ./demo
Simple demonstration of ksh93 message translation
Message locale is: it_IT.utf8
Ciao
Addio
Benvenuto John Kane
This string should not be translated because it is not in the message catalog
In the above example I used the LC_MESSAGES environment variable to indicate to ksh93 which
message catalog to use when displaying message strings. This is all that ksh93 actually needs to
locate the right message catalog.
In the real world, however, the LANG environment variable would be set to the appropriate
locale instead of just LC_MESSAGES. This can cause unexpected output and errors in your scripts.
Consider floating point numbers for example. Do not assume that the decimal point is always a
period.
$ float pi=3.14159; printf "%.5f\n" pi
3.14159
$ LANG=es_ES.utf8; export LANG
$ locale -k LC_NUMERIC
decimal_point=","
thousands_sep=""
grouping=-1;-1
numeric-decimal-point-wc=44
numeric-thousands-sep-wc=0
numeric-codeset="UTF-8"
$ printf "%.5f\n" pi
3,14159
$ float pi=3.14159; printf "%.5f\n" pi
ksh: 3.14159: arithmetic syntax error
$
When the locale is set to es_ES (Spain), the decimal point is a comma, not a period as in the USA.
Note how the assignment float pi=3.14159 fails in the es_ES locale because of the use of a period
as the decimal point.
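One way to sidestep this class of problem is to pin numeric handling to the C locale while leaving the message locale alone. The sketch below wraps a command so that it runs with LC_NUMERIC=C. The function name with_c_numeric is hypothetical, and the sketch assumes LC_ALL is not set, since LC_ALL overrides the individual LC_* variables.

```shell
# with_c_numeric CMD [ARG...]   (hypothetical helper)
# Run CMD with LC_NUMERIC set to C so that a period is used as the
# decimal point. Assumes LC_ALL is unset, because LC_ALL would override
# LC_NUMERIC.
with_c_numeric() {
    # The subshell keeps the caller's own locale settings untouched.
    ( LC_NUMERIC=C; export LC_NUMERIC; "$@" )
}
```

For example: with_c_numeric awk 'BEGIN { printf "%.5f\n", 3.14159 }'. A script that contains numeric literals could equally set and export LC_NUMERIC=C once near the top, which is the simpler choice when none of its output needs locale-specific number formatting.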
Do not assume that numbers group in threes and that the grouping separator is a comma.
Consider the following:
$ printf "%d %'d\n" 10000000 10000000
10000000 10000000
$ LC_NUMERIC=en_GB printf "%d %'d\n" 10000000 10000000
10000000 10,000,000
$ LC_NUMERIC=de_DE printf "%d %'d\n" 10000000 10000000
10000000 10.000.000
$ LC_NUMERIC=de_CH printf "%d %'d\n" 10000000 10000000
10000000 10'000'000
Note the use of %'d (the ' flag) to indicate that the grouping separator should be included in the output.
Do not make assumptions about the format of the output of commands such as date and who.
Such assumptions will generally fail in a non-US locale. For example, determining the
day of the month by piping the output of the date command to awk will fail in a non-US locale. If
a shell script makes assumptions about the format of the output from locale-sensitive commands
and utilities, then it needs to be changed.
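For the day-of-the-month case, a more robust approach is to ask date for exactly the field that is needed instead of parsing its default output. A minimal sketch follows; the function name day_of_month is made up for illustration.

```shell
# day_of_month   (hypothetical helper)
# Print the current day of the month as a two-digit number by requesting
# that field explicitly, rather than parsing the locale-dependent default
# output of date.
day_of_month() {
    date +%d
}
```

The same idea applies to other fields: an explicit format string such as +%Y-%m-%d gives the script a layout it controls.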
In conclusion, I hope this post helps readers understand how to localize their ksh93 shell scripts.
Obviously it is only a quick introduction to the subject. Please let me know if there is anything
important that I have not discussed or got wrong and I will update the post.
P.S. The above example was tested on ksh93 version 93t+.2010-03-05
07-18-2010 Copyright 2004-2010 Finnbarr P. Murphy. All rights reserved.