Software Localization and Internationalization: How and Why

Gregory M. Shreve

Kent State University

11/7/2004

Internet, E-Commerce & Foreign Markets

The growth of the Internet and e-commerce over the next decade will be driven by the expansion of foreign markets.

Internet World Stats estimates the current number of WWW users at 785 million. Of these, 29% reside in North America, 27.7% in Europe, and 31% in Asia, with penetration rates of 69.8%, 29.9% and 6.7% respectively.

With 58.7% of current users residing in regions (Europe and Asia) with an average penetration rate of only 18.3%, it is clear that these foreign markets offer substantial rewards for those prepared to enter them.

Consumer as Foreigner

In 2003 e-commerce sales to foreign customers exceeded domestic sales. This year the European Internet economy is expected to break the 4 trillion dollar mark, growing at a compound annual rate of 87%. Western Europe is expected to lead all regions with 692 billion dollars in global online exports in 2004.

North America will move 23% of its exports online, with the U.S. pumping 210 billion dollars into cross-border e-commerce. The Asia-Pacific region will reach 219 billion dollars in 2004, sparked by 57 billion dollars in Japanese online exports.

Global, Globalize, Globalization

Companies that intend to sell online will have to globalize their web presence and their products to reach the majority of the online marketplace. They will have to make their web sites, software interfaces, and product documentation available in the languages and cultural styles of an increasingly diverse and international market by applying a process called localization – the translation of content and adaptation of interface and form to reflect the expectations of one or many given locales.

For global-strategy American companies, over 40% of total revenue comes from international sales. These companies market high-technology products such as software, medical instrumentation, CAD / CAM devices, and so on.

Global, Globalize, Globalization

Most of these products have a high document overhead, with instructions on the assembly, use, maintenance, and repair of the products delivered via off- and on-line electronic documentation. Most are marketed and supported online. Further, many products have embedded software components and user interfaces, and many use on-line databases. These products and documents must be delivered to locales: target markets with different cultural and linguistic contexts.

• CBT: computer-based training
• UI: user interfaces
• Marketing: packages, web
• Documentation: manuals, help files
• Support: customer, technical, web

Language Industry

While global marketing existed before the 1990s, today's translation / software localization industry (or “language industry” for short) has evolved primarily as a result of the rapid global expansion of the computer software market and the increasing use of the Internet as a global marketing and customer service tool, all part of globalization.

The corporate problem is, of course, that many companies do not understand HOW to prepare their many products, documents, web pages and database interfaces for distribution in other linguistic and cultural locales – hence the need for the services of the language industry.

New Media, New Markets

Experts estimate the current worth of the U.S. language industry at just under $2 billion annually, with the global market worth approximately $6 billion. Indications are that growth will continue to be strong into the next decade because of new electronic media and markets.

Consider the case of massively multiplayer online games (MMOGs): the language industry enables the publishers of these games to leverage their initial development investment by translating and adapting the games for international locales. Industry projections are that MMOGs will post a 52% compound annual growth rate between 2002 and 2006.

Initial Definitions

This presentation examines the issues and processes involved in software internationalization and localization.

There are three related major processes to consider. We have already discussed globalization.

• globalization (G11N), a strategic decision to reach an international audience or to include different linguistic and cultural materials in a product, software application, web site or digital collection;

• internationalization (I18N), a design process intended to enable efficient and cost-effective subsequent linguistic and cultural adaptation;

• localization (L10N), the preparation of locale-specific versions of an application’s interface and content.

Internationalization & Localization

Localization is the preparation of locale-specific versions of a software application, electronic document, internet resource, or digital collection. It consists of the translation of textual material into the language and textual conventions of the target locale and the adaptation of non-textual materials and delivery / display mechanisms to take into account the cultural requirements of that locale.

Internationalization is an “upstream” engineering process that should precede localization. Its aim is to make subsequent localization/translation easier, more efficient, and less costly.

[Diagram labels: globalization, internationalization, localization, translation]

Scope of Processes

Each of these processes has a different scope and occurs at a different point in the business and document cycles of an organization.

[Diagram: globalization, internationalization, localization, and translation placed on an axis running from earlier to later, against three scopes: organizational policies & strategies; business, IT, & document processes; documents, interfaces, tools]

Evolution of Software Localization

Software localization developed as part of the globalization of the personal computer software market. Software applications and supporting electronic documents were the first “localized” products. The growth of the Internet and the World Wide Web created a demand for localized web pages and sites. Digital multimedia and digital repositories (including digital libraries) are emerging foci of localization.

[Timeline, 1980 to 2005: PC software, WWW, multimedia, repositories]

Document: Display and Content

Localization focuses on both display (appearance, presentation) and content. Thus, localization includes a cultural adaptation as well as a linguistic translation component.

[Diagram: documents divided into display and content, each with non-linguistic and linguistic elements. Labels:
• color, graphics, icons, symbols, display organization
• interface: menus, dialogs, messages, prompts, alerts
• document organization, writing system
• date, time, calendar, currency, number, address
• content: help files, auxiliary documents, HTML / XML document content
• metadata, vocabularies]
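To make the non-linguistic items above (number, currency, and so on) concrete, here is a minimal sketch of formatting one amount under two locale conventions. The `number_format` table, the locale names, and the `format_amount` function are hypothetical and exist only for illustration; a real product would take these conventions from locale data rather than a hard-coded table.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-locale conventions, hard-coded for illustration only. */
struct number_format {
    const char *locale;  /* locale name, e.g. "en-US" */
    char group_sep;      /* separator between groups of three digits */
    char decimal_sep;    /* separator before the fraction */
};

static const struct number_format FORMATS[] = {
    { "en-US", ',', '.' },
    { "de-DE", '.', ',' },
};

/* Format an amount given in hundredths (e.g. cents) into out.
   Returns 0 on success, -1 for an unknown locale or a buffer that is too small. */
int format_amount(const char *locale, unsigned long hundredths,
                  char *out, size_t out_size)
{
    assert(locale != NULL && out != NULL);

    const struct number_format *fmt = NULL;
    for (size_t i = 0; i < sizeof FORMATS / sizeof FORMATS[0]; i++) {
        if (strcmp(FORMATS[i].locale, locale) == 0) {
            fmt = &FORMATS[i];
            break;
        }
    }
    if (fmt == NULL)
        return -1;

    char digits[32];
    int len = snprintf(digits, sizeof digits, "%lu", hundredths / 100);

    /* digits + group separators + decimal separator + 2 fraction digits + NUL */
    size_t needed = (size_t)len + (size_t)(len - 1) / 3 + 1 + 2 + 1;
    if (needed > out_size)
        return -1;

    size_t pos = 0;
    for (int i = 0; i < len; i++) {
        /* Insert a group separator before every remaining block of three digits. */
        if (i > 0 && (len - i) % 3 == 0)
            out[pos++] = fmt->group_sep;
        out[pos++] = digits[i];
    }
    out[pos++] = fmt->decimal_sep;
    out[pos++] = (char)('0' + (hundredths % 100) / 10);
    out[pos++] = (char)('0' + hundredths % 10);
    out[pos] = '\0';
    return 0;
}
```

The same value is rendered differently for each locale without touching the calling code, which is the point of treating these formats as locale data rather than as part of the program.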

Localizing Software Applications

Software applications were the first localized “electronic documents.” Early localization included finding all “strings” embedded in code:

source.c (strings are directly in code):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        int n;
        char y[5];
        /* Every user-visible string below is embedded directly in the code. */
        printf("This program converts decimal numbers to hexadecimal\n\n");
        while (1) {
            printf("\nEnter decimal number: ");
            scanf("%d", &n);
            printf("\nNumber entered is <%d> decimal and <%x> hexa", n, (unsigned)n);
            printf("\nDo you want to continue? ");
            scanf("%4s", y);   /* width limit keeps the answer inside y[5] */
            if (strcmp(y, "yes")) {   /* any answer other than "yes" exits */
                printf("\n exiting ..\n");
                exit(0);
            }
        }
    }

Extract Localizable Resources

Strings are not the only localizable material:

• dialog boxes
• controls
• labels
• menus
• icons
• graphics
• tooltips

RESOURCES:

    PortfolioMenu MENU
    BEGIN
        POPUP "&File"
        BEGIN
            MENUITEM "&Add Student", 1
            MENUITEM SEPARATOR
            MENUITEM "&Delete Student", 2
            MENUITEM SEPARATOR
            MENUITEM "&Update Student", 3
            MENUITEM "E&xit", 4
        END
        POPUP "&Tools"
        BEGIN
            MENUITEM "Add &Portrait", 5
        END
        POPUP "&Help"
        BEGIN
            MENUITEM "About Portfolio", 6
            MENUITEM SEPARATOR
            MENUITEM "Contents", 7
        END
    END

Localizing Web Pages

Localization of HTML

Web sites are also now being localized. The link below points to a commented HTML file that gives a simple introduction to localizing an HTML web page. At the localizer’s level some of the issues (not an exhaustive list) are:

• character sets
• localizing tag content
• recognizing which tags have localizable content
• not breaking tags
• looking for text generated by attributes (title, alt)
• looking for text generated by scripts (server-side, client-side)
• evaluating CSS and stylesheet changes
• making changes to graphics
• dealing with graphics with integral text
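One of the issues above, not breaking tags, lends itself to a simple automated check. The sketch below is a deliberately naive illustration and not a tool from the presentation: the hypothetical functions `count_tags` and `tags_preserved` only compare how many complete `<...>` tags appear before and after translation, and they ignore comments, scripts, and attribute values that contain angle brackets.

```c
#include <assert.h>
#include <stddef.h>

/* Count complete tags: each '<' that is later closed by a '>'. */
size_t count_tags(const char *html)
{
    assert(html != NULL);
    size_t tags = 0;
    int inside = 0;   /* nonzero while between '<' and the matching '>' */
    for (; *html != '\0'; html++) {
        if (*html == '<') {
            inside = 1;
        } else if (*html == '>' && inside) {
            inside = 0;
            tags++;
        }
    }
    return tags;
}

/* Nonzero if the translated fragment has as many complete tags as the source. */
int tags_preserved(const char *source, const char *translated)
{
    return count_tags(source) == count_tags(translated);
}
```

A localizer who accidentally deletes the closing `>` of a tag changes the count, so `tags_preserved` returns 0 and the fragment can be flagged for review.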

A Solution: Re-Engineer the Software

As one could imagine, localizing directly in code led to problems. First, translators / localizers were quite capable of “breaking code.” There were also problems associated with the need for multiple “re-builds” of the basic software for each language version. Language expansion (differences in textual volume) created sizing problems in dialogs and controls. Localization was labor-intensive, difficult and expensive. A solution was to re-engineer the software with the intent of separating language resources from the underlying delivery mechanism.
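The sizing problem caused by language expansion can be illustrated with a small sketch. The functions `utf8_length`, `expansion_percent`, and `fits_control` are hypothetical names, and the sketch assumes strings are UTF-8 encoded and that a control's capacity can be expressed as a character count, which is a simplification of real layout.

```c
#include <assert.h>
#include <stddef.h>

/* Count characters in a UTF-8 string by skipping continuation bytes (10xxxxxx). */
size_t utf8_length(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Growth of the translation relative to the source, in whole percent
   (integer division, truncated toward zero). */
int expansion_percent(const char *source, const char *translation)
{
    size_t src = utf8_length(source);
    size_t dst = utf8_length(translation);
    assert(src > 0);
    return (int)(((long)dst - (long)src) * 100 / (long)src);
}

/* Nonzero if the text fits a control limited to max_chars characters
   (a hypothetical limit chosen by the UI designer). */
int fits_control(const char *text, size_t max_chars)
{
    return utf8_length(text) <= max_chars;
}
```

For example, "Enter decimal number:" has 21 characters and its French counterpart "Entrer le nombre décimal:" has 25, so a prompt field sized for exactly 21 characters would no longer fit the translated text.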

Internationalization: Separate Resources

Internationalization is a re-engineering and re-design process intended to make localization and translation easier, faster and more cost-effective.

A first step in the internationalization of software applications is the separation or extraction of linguistic and cultural resources from the application, leaving a “neutral” software kernel.

Extraction requires specialized localization tools.

[Diagram: application software separated into kernel and resources]

Extract Localizable Materials

EXTRACT: source.c to mypg.en

source.c:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Message lookup functions, defined elsewhere; each call fetches a
       numbered message from the "mypg" catalog. */
    extern unsigned char *intl_m_msg(), *intl_f_msg();

    int main(void) {
        int n;
        char y[5];
        printf(intl_m_msg("", "mypg", 1));
        while (1) {
            printf(intl_m_msg("", "mypg", 2));
            scanf("%d", &n);
            printf(intl_m_msg("", "mypg", 3), n, n);
            printf(intl_m_msg("", "mypg", 4));
            scanf("%4s", y);
            /* Message 6 is the localized word for "yes". */
            if (strcmp(y, intl_m_msg("", "mypg", 6))) {
                printf(intl_m_msg("", "mypg", 5));
                exit(0);
            }
        }
    }

mypg.en:

    1  "This program converts decimal numbers to hexadecimal\n\n"
    2  "\nEnter decimal number: "
    3  "\nNumber entered is <%d> decimal and <%x> hexa"
    4  "\nDo you want to continue? "
    5  "\n exiting ..\n"
    6  "yes"

Extract Localizable Materials

TRANSLATE: mypg.en to mypg.fr

source.c is the same internationalized file shown on the previous slide; only the message catalog is translated.

mypg.fr:

    1  "Ce programme convertit les nombres décimaux en hexadécimal\n\n"
    2  "\nEntrer le nombre décimal: "
    3  "\nLe nombre entré est <%d> décimal et <%x> hexadécimal"
    4  "\nVoulez vous continuer? "
    5  "\nSortie ..\n"
    6  "oui"
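The lookup that the extracted program relies on can be sketched as follows. This is not the implementation of `intl_m_msg`, whose library and exact behavior the slides do not show; `lookup_message` is a hypothetical stand-in that keeps the two catalogs in memory (rather than reading mypg.en and mypg.fr from disk) and falls back to English for any locale other than "fr", which is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MESSAGE_COUNT 6

/* In-memory stand-in for mypg.en. */
static const char *const CATALOG_EN[MESSAGE_COUNT] = {
    "This program converts decimal numbers to hexadecimal\n\n",
    "\nEnter decimal number: ",
    "\nNumber entered is <%d> decimal and <%x> hexa",
    "\nDo you want to continue? ",
    "\n exiting ..\n",
    "yes",
};

/* In-memory stand-in for mypg.fr. */
static const char *const CATALOG_FR[MESSAGE_COUNT] = {
    "Ce programme convertit les nombres décimaux en hexadécimal\n\n",
    "\nEntrer le nombre décimal: ",
    "\nLe nombre entré est <%d> décimal et <%x> hexadécimal",
    "\nVoulez vous continuer? ",
    "\nSortie ..\n",
    "oui",
};

/* Return message number id (1-based) for the given locale,
   or NULL if the id is out of range. Unknown locales fall back to English. */
const char *lookup_message(const char *locale, int id)
{
    assert(locale != NULL);
    if (id < 1 || id > MESSAGE_COUNT)
        return NULL;
    const char *const *catalog =
        strcmp(locale, "fr") == 0 ? CATALOG_FR : CATALOG_EN;
    return catalog[id - 1];
}
```

Because the program refers to messages only by number, adding a new language means adding a catalog, not editing or rebuilding the program logic.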

Content and Display in Web Pages

Web pages share the problem of “separation of content and coding” with application software. You can see from the web page example below how true this is. Internationalization solutions in web pages also involve the “extraction” of linguistic and cultural material from the software vehicle. Cutting-edge solutions create dynamic HTML from XML-based language content.

HTML:

    <BODY>
    <TABLE>
      <TR><TD>Joan</TD><TD>Smith</TD></TR>
      <TR><TD>266 South Prospect Street</TD></TR>
      <TR><TD>Kent</TD></TR>
      <TR><TD> Ohio</TD></TR>
      <TR><TD> 44240</TD></TR>
      .
      .
      .
    </TABLE>
    </BODY>

XML:

    <gradinquiry>
      <name>
        <firstname>Joan </firstname>
        <lastname>Smith</lastname>
      </name>
      <address>
        <addressline1>266 South Prospect Street</addressline1>
        <addressline2/>
        <city>Kent</city>
        <state>Ohio</state>
        <zip>44240</zip>
      </address>
      <country>USA</country>
      <phone>330-673-9999</phone>
      <fax>330-672-4017</fax>
      <email>[email protected]</email>
    </gradinquiry>

Two Multilingual Web Architectures

OLD: Multiple static versions of pages are stored in a folder hierarchy by language and navigated by a selection mechanism. After language selection, a static web page is selected and displayed.

NEW: Multilingual XML content is “dynamically” inserted in generated local page templates using XSL transforms. This applies the principle of separating linguistic from software elements, as used in software localization.
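The older, static architecture amounts to mapping a language selection onto a folder. The sketch below assumes a layout of the form /language/page (for example /fr/index.html), a fixed list of supported languages, and a fallback to the first language in that list; the layout, the list, and the function name `build_page_path` are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical set of language folders; the first entry is the default. */
static const char *const SUPPORTED[] = { "en", "fr", "de" };

/* Build the path of the static page for the selected language.
   Returns 0 on success, -1 if the path does not fit in out. */
int build_page_path(const char *language, const char *page,
                    char *out, size_t out_size)
{
    assert(language != NULL && page != NULL && out != NULL);

    const char *folder = SUPPORTED[0];   /* fall back to the default language */
    for (size_t i = 0; i < sizeof SUPPORTED / sizeof SUPPORTED[0]; i++) {
        if (strcmp(SUPPORTED[i], language) == 0) {
            folder = SUPPORTED[i];
            break;
        }
    }

    int written = snprintf(out, out_size, "/%s/%s", folder, page);
    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}
```

Every page must exist once per language folder in this scheme, which is exactly the duplication that the newer, content-driven architecture avoids.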

I18N Content Management

This system assumes an internationalized dynamic web page architecture.

[Diagram labels: translation; dynamic pages; localization; XML representation (content only, strip format); content repository (archive, database); style sheet repository; display medium; acquire information; organize, classify; deploy; format]

Internationalization: Control

Truly effective internationalization also involves early intervention in and re-design of “upstream” business and document processes like authoring to exert greater control and to reduce variability.

[Diagram: the document cycle, with stages labeled creation (authoring), storage, acquisition, distribution, rendering, and retrieval]

Internationalization & Authoring

For instance, intervention in and re-design of document creation processes (authoring) can yield significant “downstream” benefits for localization. Controlled language and terminology control are two strategies.

[Diagram labels: I18N: controlled languages, terminology control; technical writers; software documents; help text; L10N: localization vendor; machine translation; dependency]

Internationalization & Localization

Internationalization engineers work with or for clients to create internationalized products.

[Diagram labels: I18N: internationalization engineers, software internationalization tools, controlled languages, terminology control; technical writers; software documents; help text; resources; localizable software distribution; L10N: localization vendor]

Localization Management & Tools

A localization project requires its own processes and tools.

[Diagram labels: L10N; localizable software distribution; localization project; project management tools; localization tools; workflow management; document / version control; translators / localizers; QA / testing / validation tools]

Localization Management & Tools

Translation memories (TMs) and terminology managers are important tools for maintaining standardized translations and glossaries. TMs provide the focus of QA, ensure replicability / repeatability, and allow re-use of linguistic and cultural materials.

[Diagram labels: localization project; localizable software distribution; localization tool (enterprise); project manager; localization engineer; translators / localizers; localization toolkit (distribution); localization tool (translator); translation memory; terminology manager]
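At its simplest, a translation memory can be pictured as a store of source segments paired with their approved translations. The sketch below shows only exact-match lookup; the `translation_unit` structure and the `tm_lookup` function are hypothetical, and it leaves out the fuzzy matching and metadata that commercial TM tools typically add.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One stored pair: a source segment and its approved translation. */
struct translation_unit {
    const char *source;
    const char *target;
};

/* Return the stored translation for an exactly matching segment,
   or NULL if the segment has not been translated before. */
const char *tm_lookup(const struct translation_unit *memory, size_t count,
                      const char *segment)
{
    assert(memory != NULL || count == 0);
    assert(segment != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(memory[i].source, segment) == 0)
            return memory[i].target;
    }
    return NULL;
}
```

A hit returns the same translation every time the segment recurs, which is what makes translations repeatable; a miss marks the segment as new work for the translator.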

Localization Management & Tools

Specialized localization tools for alignment and term extraction are used to automate the construction of TMs.

[Diagram labels: translators / localizers; localization toolkit (distribution); localization tool (translator); translation memory; terminology manager; term extraction tool; text alignment tool]

Reusability

Reusability is an especially important objective of internationalization and reduces the cost of localization.

• Version 1: initial translation with TM tool
• Version 2: new version uses 70% of the same text (30% change)
• Version 3: latest version uses 80% of the same text as the previous version (20% change)

[Diagram label: translation memory]
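The effect of reuse on translation volume is simple arithmetic, sketched below. The function name `words_to_translate` and the figure of 10,000 words per version are assumptions chosen for illustration; the 70% and 80% reuse rates are the ones given for versions 2 and 3.

```c
#include <assert.h>

/* Words that still need translation when reused_percent of the text
   can be taken from the translation memory (integer arithmetic). */
unsigned long words_to_translate(unsigned long total_words,
                                 unsigned reused_percent)
{
    assert(reused_percent <= 100);
    return total_words * (100 - reused_percent) / 100;
}
```

With a hypothetical 10,000 words per version, version 1 requires all 10,000 words, version 2 requires 3,000, and version 3 requires 2,000: 15,000 words in total instead of the 30,000 that three translations from scratch would need.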

Goals of Internationalization

The goals of internationalization are:

• reusability
• scalability
• authority / quality
• accessibility
• accuracy / acceptability

[Diagram labels: translations; I18N solution; equivalence; cross-language; target culture(s); control target document]

These goals are met by separating content from display, defining and extracting culturally variable material from fixed or neutral material, intervening in the document cycle to exert control over document processes, and using translation memories and terminology management to ensure critical characteristics such as authority and reusability.

Enhanced Corpora

Future directions in internationalization will involve exploiting document corpora more effectively and extracting useful linguistic and textual objects for control and re-use.

Control of the document cycle begins with understanding the documents we already “own” and enhancing them.

New Localization Objects

Many linguistic objects useful in computer-assisted authoring and translation, web page localization, machine translation and cross-language information retrieval (including browsing) can be extracted from a well-understood and deliberately structured document corpus.

Corpus Replication

Using statistical techniques it is possible to replicate the contents of a monolingual corpus and add multilingual equivalents for terms, phrases, document segments and other objects to it.

What The Industry is Doing Now

The language industry currently relies on translation memories and terminology managers. There are significant drawbacks to this method that prevent new gains in cost reduction and profitability, the goal of internationalization.

A New Model

New approaches to internationalization and automatic localization leverage the linguistic value of existing corpora and allow the creation of “enhanced” corpora whose contents are understood and controlled. Statistical corpus linguistics and XML combine to allow the next step in localization technology.

Peer-to-Peer Localization Resources

A peer-to-peer networking platform with a security and digital rights management layer can be used to link clients in an XML resource network. A vendor can assess per transaction charges for access to corpus object stores.

Socio-Cultural Style Sheets

The peer-to-peer networking platform can also provide new capabilities for next-generation localization. Client-Side Socio-Cultural Style-sheets (CSSCS) can automate the on-the-fly provision of web content in the languages and formats desired and expected by web users all over the world.