
Internationalizing JavaScript Applications

Norbert Lindenberg

© Norbert Lindenberg 2012. All rights reserved.

ECMAScript

• Language Specification

• Developed by Ecma TC 39

• Language syntax and semantics

• Core API: Object, String, Array, RegExp, ...

• Edition 5.1 current

• Edition 6 expected December 2013

ECMAScript

• Internationalization API Specification

• Developed by Ecma TC 39 + experts

• Collation, number, date & time formatting

• Started fall 2010

• Specification stable

• Implementations and test suite in progress

• Approval expected December 2012

JavaScript Environments

• Web browsers: with DOM, XHR

• Servers: Node

• Platforms: Firefox OS, Windows 8-style UI (Metro), PhoneGap

• Libraries: jQuery, Dojo, YUI, GWT, and many more

Collation

Collation (Sorting)

• Old: String.prototype.localeCompare

• Only argument: the string to compare with

• New: Intl.Collator

• locales

• options

• Fixed: String.prototype.localeCompare

• With locales and options arguments
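A minimal sketch of the new collation entry points, assuming an engine that implements the Internationalization API with German and Swedish locale data (for example, Node.js with full ICU). German treats "ä" as a variant of "a", while Swedish sorts it as a separate letter after "z".

```javascript
// Words to sort; "ä" is a separate letter in Swedish but a variant of "a" in German.
const words = ["z", "ä", "a"];

// New API: an Intl.Collator object whose compare function can be passed to sort.
const germanCollator = new Intl.Collator("de");
const germanOrder = [...words].sort(germanCollator.compare);

// Fixed localeCompare: accepts the same locales argument for one-off comparisons.
const swedishOrder = [...words].sort((x, y) => x.localeCompare(y, "sv"));

console.log(germanOrder); // [ 'a', 'ä', 'z' ]
console.log(swedishOrder); // [ 'a', 'z', 'ä' ]
```

Creating the Collator once and reusing its compare function avoids repeating locale negotiation for every comparison, which matters when sorting large arrays.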

Locales

• BCP 47 language tags

• Language, script, country codes

• “es”, “en-AU”, “zh-Hans-CN”

• Unicode locale extension

• “de-u-co-phonebk”

• Preference lists

• [“mr”, “hi”, “en-IN”]
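The tag structure above can be inspected directly. This sketch uses Intl.getCanonicalLocales and Intl.Locale, which were added in later editions of ECMA-402 (not the 2012 specification), and assumes a current engine.

```javascript
// Canonicalize the case of BCP 47 tags.
const canonical = Intl.getCanonicalLocales(["ES", "en-au", "zh-hans-cn"]);
console.log(canonical); // [ 'es', 'en-AU', 'zh-Hans-CN' ]

// Take a tag apart into language, script, and region subtags.
const zh = new Intl.Locale("zh-Hans-CN");
console.log(zh.language, zh.script, zh.region); // zh Hans CN

// The Unicode locale extension "-u-co-phonebk" requests phonebook collation.
const dePhonebook = new Intl.Locale("de-u-co-phonebk");
console.log(dePhonebook.baseName, dePhonebook.collation); // de phonebk

// A preference list is simply an array, most preferred first;
// the resolved locale depends on the locale data the engine ships.
const nf = new Intl.NumberFormat(["mr", "hi", "en-IN"]);
console.log(nf.resolvedOptions().locale);
```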

Locale Negotiation

• BCP 47 Lookup

• [“es-GT”, “es-MX”] → “es-GT”, “es”, “es-MX”

• Best fit

• implementation-defined

• [“es-GT”, “es-MX”] → “es-GT”, “es-MX”, “es”

• Unicode extension handled separately
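The Lookup fallback order can be sketched as a simplified version of the RFC 4647 Lookup algorithm that BCP 47 refers to. The function name `lookupLocale` is hypothetical; the sketch ignores "*" ranges and strips Unicode extensions first, since the Internationalization API negotiates them separately.

```javascript
// Hypothetical, simplified BCP 47 / RFC 4647 Lookup.
function lookupLocale(requestedLocales, availableLocales, defaultLocale) {
  // Map lowercase tags to their original spelling for case-insensitive matching.
  const available = new Map(availableLocales.map((tag) => [tag.toLowerCase(), tag]));
  for (const requested of requestedLocales) {
    // Remove a Unicode extension sequence such as "-u-co-phonebk".
    let candidate = requested.toLowerCase().replace(/-u(-[a-z0-9]{2,8})+/, "");
    while (candidate) {
      if (available.has(candidate)) return available.get(candidate);
      let cut = candidate.lastIndexOf("-");
      if (cut < 0) break; // nothing left to truncate; try the next requested tag
      // Never leave a single-character subtag (like "x") dangling at the end.
      if (cut >= 2 && candidate[cut - 2] === "-") cut -= 2;
      candidate = candidate.slice(0, cut);
    }
  }
  return defaultLocale;
}

console.log(lookupLocale(["es-GT", "es-MX"], ["en", "es", "es-MX"], "en")); // "es"
console.log(lookupLocale(["de-u-co-phonebk"], ["de", "en"], "en")); // "de"
console.log(lookupLocale(["fr-CA"], ["en", "es"], "en")); // "en"
```

Lookup picks "es" in the first call even though "es-MX" is available, because it exhausts the truncations of "es-GT" before moving on; a best-fit matcher is free to prefer "es-MX".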

Collator Extensions

• co: collation – phonebook, pinyin, ...

• kf: case first – upper, lower

• kn: numeric sorting

• kk: use normalization
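Collator extension keys travel inside the language tag. A short sketch, assuming an engine with English collation data; resolvedOptions() reports which values the implementation actually honored.

```javascript
// kn: numeric sorting, so "file2" comes before "file10".
const numericCollator = new Intl.Collator("en-u-kn-true");
const files = ["file10", "file2", "file1"].sort(numericCollator.compare);
console.log(files); // [ 'file1', 'file2', 'file10' ]

// kf-upper: uppercase sorts first when strings differ only in case.
const upperFirst = new Intl.Collator("en-u-kf-upper");
console.log(upperFirst.compare("A", "a") < 0); // true

// Check which extension values were applied.
console.log(numericCollator.resolvedOptions().numeric); // true
console.log(upperFirst.resolvedOptions().caseFirst); // upper
```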

Collator Options

• localeMatcher: lookup, best fit

• usage: sort, search

• sensitivity: base, accent, case, variant

• ignorePunctuation

• numeric, normalization, caseFirst

Non-ECMAScript

• Nothing good found (some libraries handle Latin script only)

• Collation is hard

• Requires knowledge of the full Unicode character set

• Requires big tables

Number Formatting

Number Formatting

• Old: Number.prototype.toLocaleString

• No arguments

• New: Intl.NumberFormat

• locales

• options

• Fixed: Number.prototype.toLocaleString

• With locales and options arguments
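Old and new number formatting side by side, as a sketch assuming an engine with English, German, and Indian English locale data.

```javascript
const n = 1234567.891;

// Fixed toLocaleString: the locales argument is now honored.
console.log(n.toLocaleString("en-US")); // 1,234,567.891
console.log(n.toLocaleString("de-DE")); // 1.234.567,891

// Intl.NumberFormat: create once, format many numbers.
const indian = new Intl.NumberFormat("en-IN");
console.log(indian.format(1234567)); // 12,34,567 (lakh grouping)
```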

NumberFormat Extensions

• nu: numbering system
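The `nu` key selects digits independently of the language. A brief sketch, assuming Thai and Arabic numbering-system data is available:

```javascript
// Thai digits for an English-language formatter.
const thaiDigits = new Intl.NumberFormat("en-u-nu-thai");
console.log(thaiDigits.format(123)); // ๑๒๓
console.log(thaiDigits.resolvedOptions().numberingSystem); // thai

// Latin digits for an Arabic-language formatter.
const arabicLatn = new Intl.NumberFormat("ar-u-nu-latn");
console.log(arabicLatn.resolvedOptions().numberingSystem); // latn
```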

NumberFormat Options

• localeMatcher: lookup, best fit

• style: decimal, currency, percent

• currency: ISO 4217 currency code

• currencyDisplay: symbol, code, name

• Digit settings: minimumIntegerDigits, minimum/maximumFractionDigits, minimum/maximumSignificantDigits

• useGrouping
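A sketch of the main options, assuming US English locale data. Note that `style: "currency"` requires a `currency` code.

```javascript
// Currency style with an ISO 4217 code.
const euros = new Intl.NumberFormat("en-US", { style: "currency", currency: "EUR" });
console.log(euros.format(1234.5)); // €1,234.50

// Percent style multiplies by 100; default rounding is to whole percents.
const percent = new Intl.NumberFormat("en-US", { style: "percent" });
console.log(percent.format(0.256)); // 26%

// Digit settings and grouping.
const plain = new Intl.NumberFormat("en-US", { useGrouping: false, minimumFractionDigits: 2 });
console.log(plain.format(1234567)); // 1234567.00
```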

Non-ECMAScript

Library              ¤   %   ๙   #   ,   ⚑
Globalize            +   +   -   +   -   250+
Dojo                 +   +   -   +   -   30+
Closure              +   +   +   +   +   300+
Windows 8-style UI   +   +   +   +   +   100s
iLib                 +   +   -   +   -   10+

¤: currency formatting. %: percent formatting. ๙: numbering systems. #: digit settings. ,: grouping separator option. ⚑: supported locales.

Date and Time Formatting

Date and Time Formatting

• Old: Date.prototype.toLocaleString, toLocaleDateString, toLocaleTimeString

• No arguments

• New: Intl.DateTimeFormat

• locales

• options

• Fixed: Date.prototype.toLocaleString, toLocaleDateString, toLocaleTimeString

• With locales and options arguments
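A sketch of the fixed methods and the new constructor, assuming an engine with the listed locales. The time zone is pinned to UTC so the output does not depend on the machine's local zone.

```javascript
// December 20, 2012, 15:05 UTC (months are zero-based in Date.UTC).
const date = new Date(Date.UTC(2012, 11, 20, 15, 5));

// Fixed toLocaleDateString with locales and options arguments.
console.log(date.toLocaleDateString("en-US", { timeZone: "UTC" })); // 12/20/2012
console.log(date.toLocaleDateString("de-DE", { timeZone: "UTC" })); // 20.12.2012
console.log(date.toLocaleDateString("ja-JP", { timeZone: "UTC" })); // 2012/12/20

// Intl.DateTimeFormat: a reusable formatter.
const british = new Intl.DateTimeFormat("en-GB", { timeZone: "UTC" });
console.log(british.format(date)); // 20/12/2012
```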

DateTimeFormat Extensions

• ca: calendar

• nu: numbering system
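Calendar and numbering system can be combined in one tag. This sketch assumes Thai and Japanese calendar data; the exact output patterns vary with the engine's CLDR version, so only the Thai year digits are shown as expected output.

```javascript
const date = new Date(Date.UTC(2012, 11, 20));

// Buddhist calendar with Thai digits: 2012 CE is Buddhist Era year 2555.
const thai = new Intl.DateTimeFormat("th-TH-u-ca-buddhist-nu-thai", { timeZone: "UTC" });
const thaiOptions = thai.resolvedOptions();
console.log(thaiOptions.calendar, thaiOptions.numberingSystem); // buddhist thai
console.log(thai.format(date)); // contains ๒๕๕๕

// Japanese imperial calendar with era names.
const japanese = new Intl.DateTimeFormat("ja-JP-u-ca-japanese", {
  era: "long", year: "numeric", timeZone: "UTC",
});
console.log(japanese.format(date));
```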

DateTimeFormat Options

• localeMatcher: lookup, best fit

• timeZone: "UTC" (support for other time zones is implementation-dependent)

• hour12

• Components: weekday, era, year, month, day, hour, minute, second, timeZoneName

• formatMatcher: basic, best fit
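Components describe what to show, and the format matcher picks the closest pattern the locale has. A sketch assuming US English data:

```javascript
const date = new Date(Date.UTC(2012, 11, 20, 15, 5));

// Request weekday, year, month, and day; the locale decides order and punctuation.
const longDate = new Intl.DateTimeFormat("en-US", {
  weekday: "long", year: "numeric", month: "long", day: "numeric", timeZone: "UTC",
});
console.log(longDate.format(date)); // Thursday, December 20, 2012

// hour12 overrides the locale's default 12-hour clock.
const time24 = new Intl.DateTimeFormat("en-US", {
  hour: "numeric", minute: "2-digit", hour12: false, timeZone: "UTC",
});
console.log(time24.format(date)); // 15:05
```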

Non-ECMAScript

Library              ca   tz   ๙   ⚑
Globalize            5+   +    -   250+
Dojo                 4    -    -   30+
Closure              +    +    +   300+
Windows 8-style UI   ?    -    ?   ?
iLib                 3    +    -   10+
YUI                  -    -    -   50+

ca: calendars. tz: time zones. ๙: numbering systems. ⚑: supported locales.

Message Construction

• Substitution

• {user} went to {city}.

• {user}さんは{city}へ行きました。
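Named placeholders let each translation order the arguments as its grammar requires. A minimal sketch; `formatMessage` is a hypothetical helper, not part of any standard API.

```javascript
// Hypothetical helper: replace {name} placeholders with values from args.
// Unknown placeholders are left untouched.
function formatMessage(pattern, args) {
  return pattern.replace(/\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(args, name) ? String(args[name]) : match);
}

const args = { user: "Yuki", city: "Kyoto" };
console.log(formatMessage("{user} went to {city}.", args)); // Yuki went to Kyoto.
console.log(formatMessage("{user}さんは{city}へ行きました。", args)); // YukiさんはKyotoへ行きました。
```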

Message Construction

• Plurals

• {user} est allé à {city}.

• {user1} et {user2} sont allés à {city}.

• 1-6 forms depending on language

• {number, plural, one {...} few {...} many {...} other {...}}
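Choosing the plural form needs per-language rules. This sketch uses Intl.PluralRules, which was added in a later edition of ECMA-402 (not the 2012 specification); `selectPlural` is a hypothetical helper. Arabic shows all six categories.

```javascript
// French: 0 and 1 both use the "one" form.
const french = new Intl.PluralRules("fr");
const frenchForms = [0, 1, 2].map((n) => french.select(n));
console.log(frenchForms); // [ 'one', 'one', 'other' ]

// Arabic: six plural categories.
const arabic = new Intl.PluralRules("ar");
const arabicForms = [0, 1, 2, 3, 11, 100].map((n) => arabic.select(n));
console.log(arabicForms); // [ 'zero', 'one', 'two', 'few', 'many', 'other' ]

// Hypothetical helper: pick the form for n, falling back to "other".
function selectPlural(rules, n, forms) {
  return forms[rules.select(n)] ?? forms.other;
}

const went = { one: "{user1} est allé à {city}.", other: "{user1} et {user2} sont allés à {city}." };
console.log(selectPlural(french, 2, went)); // {user1} et {user2} sont allés à {city}.
```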

Message Construction

• Gender

• {user} est allé à {city}.

• {user} est allée à {city}.

• 1-4 forms depending on language

• {gender, select, female {...} male {...} other {...}}
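A select argument matches a key exactly and otherwise falls back to a catch-all form. In this sketch, `selectByKey` is a hypothetical helper, and using the masculine form as the French fallback is an assumption.

```javascript
// Hypothetical helper: use the form whose key matches exactly,
// otherwise fall back to the "other" form.
function selectByKey(key, forms) {
  return Object.prototype.hasOwnProperty.call(forms, key) ? forms[key] : forms.other;
}

// Assumption: the masculine form serves as the fallback for other/unknown gender.
const wentTo = {
  female: "{user} est allée à {city}.",
  male: "{user} est allé à {city}.",
  other: "{user} est allé à {city}.",
};

console.log(selectByKey("female", wentTo)); // {user} est allée à {city}.
console.log(selectByKey("unknown", wentTo)); // {user} est allé à {city}.
```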

Message Construction

{gender, select,
    female {{num, plural,
        one {{user1} est allée à {city}.}
        other {{user1} et {user2} sont allées à {city}.}}}
    male {{num, plural,
        one {{user1} est allé à {city}.}
        other {{user1} et {user2} sont allés à {city}.}}}
    other {...}}
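The nested message can be evaluated by selecting the gender branch first, then the plural form, then substituting placeholders. This is a sketch assuming Intl.PluralRules (a later ECMA-402 addition). `buildMessage` is hypothetical, and because the slide elides the branch for other genders, the sketch assumes it falls back to the masculine forms.

```javascript
const messages = {
  female: {
    one: "{user1} est allée à {city}.",
    other: "{user1} et {user2} sont allées à {city}.",
  },
  male: {
    one: "{user1} est allé à {city}.",
    other: "{user1} et {user2} sont allés à {city}.",
  },
};

const frenchPlurals = new Intl.PluralRules("fr");

// Hypothetical helper: gender branch, then plural form, then substitution.
function buildMessage(gender, count, args) {
  const byPlural = messages[gender] ?? messages.male; // assumption: masculine fallback
  const pattern = byPlural[frenchPlurals.select(count)] ?? byPlural.other;
  return pattern.replace(/\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(args, name) ? String(args[name]) : match);
}

console.log(buildMessage("female", 1, { user1: "Marie", city: "Lyon" }));
// Marie est allée à Lyon.
console.log(buildMessage("male", 2, { user1: "Paul", user2: "Luc", city: "Nice" }));
// Paul et Luc sont allés à Nice.
```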

Message Construction

• Google's Closure Library includes MessageFormat (goog.i18n.MessageFormat)

• Alex Sexton wrote a standalone version (messageformat.js)

[Photo: Occupy Wall Street. By @tanlines.]

Supplementary Characters

• Characters above U+FFFF

• Emoji, rare CJK, ancient scripts, musical symbols, ...

• Encoded as 2 code units (a surrogate pair) in UTF-16
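The surrogate-pair arithmetic behind the slide, as a sketch; `toSurrogatePair` is a hypothetical helper. The 20 bits left after subtracting 0x10000 are split into a high and a low 10-bit half.

```javascript
// Hypothetical helper: split a supplementary code point into UTF-16 code units.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000; // 20 bits remain
  const high = 0xd800 + (offset >> 10); // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const grinning = "\u{1F600}"; // GRINNING FACE emoji
console.log(toSurrogatePair(0x1f600).map((u) => u.toString(16))); // [ 'd83d', 'de00' ]
console.log(grinning.length); // 2
console.log(grinning.charCodeAt(0).toString(16), grinning.charCodeAt(1).toString(16)); // d83d de00
```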

Today: UCS-2 or UTF-16?

UCS-2 (each 16-bit code unit treated as a character):

• Regular expressions

• String comparison

• Case conversion

UTF-16 (surrogate pairs treated as one character):

• Source text conversion

• URI handling

• DOM, text input, text rendering, XMLHttpRequest, libraries, apps
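Both behaviors can be observed in the same engine. A sketch: regular expressions without the ES6 `u` flag see two code units, while URI encoding treats the pair as one character and emits its UTF-8 bytes.

```javascript
const grinning = "\u{1F600}";

// Regular expressions (UCS-2 view): "." matches a single code unit.
const oneDot = /^.$/.test(grinning);
const twoDots = /^..$/.test(grinning);
console.log(oneDot, twoDots); // false true

// URI handling (UTF-16 view): the pair becomes one 4-byte UTF-8 sequence.
console.log(encodeURIComponent(grinning)); // %F0%9F%98%80

// A lone surrogate cannot be encoded.
let loneSurrogateError = null;
try {
  encodeURIComponent("\uD83D");
} catch (e) {
  loneSurrogateError = e;
}
console.log(loneSurrogateError instanceof URIError); // true
```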

ECMAScript 6: UTF-16

• New Unicode mode in regular expressions

• Case conversion for full Unicode

• Full Unicode in identifiers

• String accessors for code points

• But: no change to low-level string comparison
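A sketch of the ES6 additions as shipped in current engines, ending with the comparison that did not change: `<` and `>` still compare UTF-16 code units, so U+FF61 sorts after U+1F600 even though its code point is smaller.

```javascript
const text = "a\u{1F600}";

// Unicode mode in regular expressions: "." matches a whole code point.
const uMode = /^.$/u.test("\u{1F600}");
console.log(uMode); // true

// Code point accessors and iteration.
console.log(text.codePointAt(1).toString(16)); // 1f600
console.log(String.fromCodePoint(0x1f600) === "\u{1F600}"); // true
console.log(text.length, [...text].length); // 3 2

// Full Unicode case conversion (Deseret small/capital long I).
const upper = "\u{10428}".toUpperCase();
console.log(upper === "\u{10400}"); // true

// Unchanged: comparison by code units, not code points.
console.log("\uFF61" > "\u{1F600}"); // true
```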

Rendering

• Emoji on Mac/iOS are rendered with a color font

• On Mac, only Safari supports this font

• Not Firefox, Chrome, Opera

• Fonts for other supplementary characters supported in all modern browsers

Regular Expressions

• RegExp in ES5 doesn’t have much Unicode support

• No support for Unicode character properties

• No support for supplementary characters

Regular Expressions

• CSet (inimino): Character classes with supplementary characters

• XRegExp (Steven Levithan and Mathias Bynens): Unicode categories and properties with supplementary characters
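Later editions closed these gaps natively. This sketch, assuming a current engine, shows why ES5-style character classes fail for supplementary characters (the range is parsed between surrogate halves and ends up out of order), and how the `u` flag and property escapes (ES2018) handle them.

```javascript
// Without u, the class is parsed as code units: "\uDE00-\uD83D" is out of order.
let es5Error = null;
try {
  new RegExp("[\u{1F600}-\u{1F64F}]");
} catch (e) {
  es5Error = e;
}
console.log(es5Error instanceof SyntaxError); // true

// With u, the same class works on code points.
const emoticons = new RegExp("[\u{1F600}-\u{1F64F}]", "u");
console.log(emoticons.test("\u{1F601}")); // true

// Unicode property escapes.
console.log(/\p{Script=Greek}/u.test("α")); // true
console.log(/^\p{Lu}$/u.test("\u{10400}")); // true (supplementary uppercase letter)
```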

Unicode Normalization

• Makes strings that users perceive as equal (more or less) actually equal

• ä = a ¨

• ự = ự

• 김 = ㄱ ㅣ ㅁ

Unicode Normalization

• ECMAScript “assumes” normalization happens where needed

• Reality: applications have to do it

• Libraries available, but not up to date:

• unorm (Matsuza)

• Richard Ishida’s normalizer
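ES6 later added String.prototype.normalize, which removes the need for a library in current engines. A sketch using the slide's examples: composed and decomposed forms, reordering of combining marks, and Hangul syllable decomposition.

```javascript
// ä: precomposed U+00E4 vs. a + combining diaeresis U+0308.
const composed = "\u00E4";
const decomposed = "a\u0308";
console.log(composed === decomposed); // false
console.log(composed === decomposed.normalize("NFC")); // true

// ự: mark order (dot below, horn) does not matter after normalization.
console.log("u\u0323\u031B".normalize("NFC") === "\u1EF1"); // true
console.log("u\u031B\u0323".normalize("NFC") === "\u1EF1"); // true

// 김: the syllable decomposes into three conjoining jamo.
const jamo = [..."\uAE40".normalize("NFD")].map((c) => c.codePointAt(0).toString(16));
console.log(jamo); // [ '1100', '1175', '11b7' ]
```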

北京大学.中国

Internationalized Domain Names

• Unicode at user interface

• ASCII under the hood

• 北京大学.中国 = xn--1lq90ic7fzpc.xn--fiqs8s

• Main steps:

• normalization (as discussed)

• Punycode encoding (Mathias Bynens maintains the latest implementation, punycode.js)
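The two main steps can be sketched with a Punycode encoder written from RFC 3492. `punycodeEncode` and `toASCII` are hypothetical names; the sketch omits overflow checks, and real IDNA processing (UTS #46) also maps case and validates labels, so production code should use a maintained library or the platform's URL parser.

```javascript
// Punycode parameters (RFC 3492, section 5).
const BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700;
const INITIAL_BIAS = 72, INITIAL_N = 0x80;

// Bias adaptation (RFC 3492, section 6.1).
function adapt(delta, numPoints, firstTime) {
  delta = firstTime ? Math.floor(delta / DAMP) : Math.floor(delta / 2);
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > ((BASE - TMIN) * TMAX) >> 1) {
    delta = Math.floor(delta / (BASE - TMIN));
    k += BASE;
  }
  return k + Math.floor(((BASE - TMIN + 1) * delta) / (delta + SKEW));
}

// Digit values 0-25 map to "a"-"z", 26-35 map to "0"-"9".
function encodeDigit(d) {
  return String.fromCharCode(d < 26 ? 0x61 + d : 0x16 + d);
}

// Hypothetical helper: encode one label (RFC 3492, section 6.3).
function punycodeEncode(label) {
  const codePoints = Array.from(label, (ch) => ch.codePointAt(0));
  const basic = codePoints.filter((cp) => cp < INITIAL_N);
  let output = String.fromCodePoint(...basic);
  let handled = basic.length;
  if (handled > 0) output += "-";
  let n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
  while (handled < codePoints.length) {
    // Next smallest code point not yet encoded.
    const m = Math.min(...codePoints.filter((cp) => cp >= n));
    delta += (m - n) * (handled + 1);
    n = m;
    for (const cp of codePoints) {
      if (cp < n) delta++;
      if (cp === n) {
        // Emit delta as a generalized variable-length integer.
        let q = delta;
        for (let k = BASE; ; k += BASE) {
          const t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
          if (q < t) break;
          output += encodeDigit(t + ((q - t) % (BASE - t)));
          q = Math.floor((q - t) / (BASE - t));
        }
        output += encodeDigit(q);
        bias = adapt(delta, handled + 1, handled === basic.length);
        delta = 0;
        handled++;
      }
    }
    delta++;
    n++;
  }
  return output;
}

// Hypothetical, simplified ToASCII: normalize, then encode each non-ASCII label.
function toASCII(domain) {
  return domain
    .normalize("NFC")
    .split(".")
    .map((label) => (/^[\x00-\x7f]*$/.test(label) ? label : "xn--" + punycodeEncode(label)))
    .join(".");
}

console.log(toASCII("北京大学.中国")); // xn--1lq90ic7fzpc.xn--fiqs8s
```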

Summary

• ECMAScript Internationalization API provides core functionality

• Please review and provide feedback

• http://norbertlindenberg.com/2012/06/ecmascript-internationalization-api/

• Libraries provide more internationalization support than you may think
