XLIFF - the XML based Open Standard for Localisable Content Tony Jewtushenko Oracle Corporation - Principal Product Manager Chair – OASIS XLIFF TC The XML Localisation Interchange File Format


XLIFF - the XML based Open Standard for Localisable Content

Tony Jewtushenko

Oracle Corporation - Principal Product Manager

Chair – OASIS XLIFF TC

The XML Localisation Interchange File Format

Slide 2

Agenda

• Open Standards
– Definition and process

• Overview of XLIFF
– Definition, goals, and benefits of XLIFF
– Architecture and Main Features of XLIFF
– Use cases

• Open Source Localisation
– Technical Overview
– Process Overview
– Use case

• Where does XLIFF fit?
– Tools Support for XLIFF
– XLIFF Adoption by Open Source community

Slide 3

What is an Open Standard?

Open standards are:

• Publicly available in stable, persistent versions

• Developed and approved under a published process

• Open to public input: public comments, public archives, no NDAs

• Subject to explicit, disclosed IPR terms

• See the US, EU, WTO governmental & treaty definitions of “standards”

Anything else is proprietary

Source: “Relationship Between Open Standards and Open Source Software”, Patrick Gannon – CEO OASIS, Open Source in Government, Washington, DC, 15-17 March 2004

Slide 4

OASIS: Standards Body Home of XLIFF

• OASIS: Organization for the Advancement of Structured Information Standards

• World’s largest independent, non-profit organization dedicated to the standardisation of eBusiness specifications.

• More than 150 member companies plus individuals

• Operates XML.ORG Registry, the open community clearinghouse of XML application schemas

• Technical work on XML interoperability includes XML conformance and XML Registries/Repositories

• General XML and eBusiness technical resource

Slide 5

OASIS Standards Process

• Specifications are created under an open, democratic, vendor-neutral process
– Anyone may participate
– No single organisation can dictate the specification - specifications must meet everyone’s needs
– All discussions are open to the public view and comment

• Two Tiered Specification approval process
– Committee Draft approved by Technical Committee
– OASIS members approve specification as OASIS Standard

• Process guarantees that specifications are created by a broad range of industry, not just a single vendor

Slide 6

XLIFF Overview

A glance at the definitions, goals and benefits of the XML Localisation Interchange File Format.

Slide 7

What is XLIFF?

A specification for the lossless interchange of localizable data and its related information, which is tool-neutral, has been formalized as an XML vocabulary, and features an extensibility mechanism.

Slide 8

Why is XLIFF Needed?

Localization offers the following challenges:

• Insufficient interoperability between tools.

• Lack of support for overall localization workflow.

• Localization tool developers have to deal with many formats.

• Large number of proprietary intermediate formats.

Slide 9

Advantages – Technology (1/2)

• For a given utility, only one implementation is necessary (e.g. not one spell checker for PO Files, and another one for HTML).

• Increases usability of utilities (i.e. all formats with XLIFF filters can be used with XLIFF-enabled utilities).

• Can contain either UI or Document content

• Metadata provides integration with automated workflow.

Slide 10

Advantages – Technology (2/2)

• All advantages of XML-based processing:
– Content validation (XSD)
– Use of its internationalization features.
– Better interoperability and cross-platform support.
– Powerful rendering options (XSL-FO, CSS).
– Powerful transformation options (XSLT).
– Greater integration with Web services.

• Access to existing, and often open-source, XML implementations

Slide 11

XLIFF Timeline

• September 2000 - DataDefinition Kickoff

• December 2000 - first face to face

• March 2001 - second face to face

• End March 2001 - draft 1.0 spec and DTD published

• June 2001 - White Paper published

• December 2001 - OASIS XLIFF Technical Committee Proposal submitted

• April 2002 – XLIFF 1.0 Specification approved by formal vote as an OASIS Committee Specification

• May 2003 – XLIFF 1.1 Specification approved by formal vote as an OASIS Committee Specification

• August/Sept 2003 – XLIFF 1.1 Peer Review

• November 2003 – Revised XLIFF 1.1 Specification approved as OASIS Committee Specification

• November 2003 – XLIFF 1.1 Specification submitted for public review

Slide 12

Drivers Behind XLIFF

Alchemy Software
Bowne Global Solutions
Convey Software
Ektron, Inc
ENLASO Corp (RWS)
Globalsight
HP
Lotus/IBM
Lionbridge
LRC
Moravia IT
Novell
Oracle
PASS Engineering
Microsoft
SAP
SDL International
Sun Microsystems
Tektronix
TRADOS
XML-Intl

Slide 13

XLIFF TC in the Standards Community

• Shared interests with OASIS Translation Web Services Technical Committee
– XLIFF may be used as data container for WS

• Shared interests with the OSCAR SIG at LISA
– Segmentation and word-count.
– Content markup (inline codes).

• Shared interests with the W3C i18n WG
– Localization directives.
– Best practices.
– Localization aspects of the W3C recommendations.
– Web services.

Slide 14

Architecture

A look at XLIFF’s main features and how they work together.

Slide 15

Extract-Localize-Merge Paradigm

• Separate data related to localization from parts not related to localization.

• Merge translated data with codes at the end of the process to create the final document.

• Skeleton file is optional, so this paradigm is also optional
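
The paradigm above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is a simple key=value resource file, and the `%%%key%%%` placeholder syntax, `extract`, and `merge` are hypothetical names invented for this sketch, not anything defined by XLIFF.

```python
import re


def extract(properties_text):
    """Split key=value text into a skeleton and a dict of translatable units.

    The skeleton keeps everything unrelated to localization (comments,
    keys, layout) and marks where each translated value belongs.
    """
    skeleton_lines = []
    units = {}
    for line in properties_text.splitlines():
        match = re.match(r"^([\w.]+)\s*=\s*(.*)$", line)
        if match:
            key, value = match.groups()
            units[key] = value
            # Hypothetical placeholder syntax, chosen only for this sketch.
            skeleton_lines.append(f"{key}=%%%{key}%%%")
        else:
            skeleton_lines.append(line)  # comments and blank lines
    return "\n".join(skeleton_lines), units


def merge(skeleton, translations):
    """Put translated values back into the skeleton to rebuild the file."""
    return re.sub(r"%%%([\w.]+)%%%",
                  lambda m: translations[m.group(1)],
                  skeleton)
```

For example, extracting from `file.open=Open` yields the unit `{"file.open": "Open"}`, and merging `{"file.open": "Ouvrir"}` back into the skeleton produces `file.open=Ouvrir` with any comments left untouched.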

Slide 16

A Bird's-Eye View

An XLIFF document can capture anything needed for a localization project:

1. Localizable objects (e.g. text strings) in source and target languages.

2. Supplementary information (e.g. glossaries, or material to recreate the original format).

3. Administrative information (e.g. workflow data).

4. Custom data (e.g. initialization information for tools).

Slide 17

The XLIFF Document

• An XLIFF document is designed to store the extracted data related to localization.

• Each given source container (e.g. a file, a database table, and so forth) corresponds to a <file> element in XLIFF.

• Each XLIFF document can include several <file> elements.

• A whole localization project can possibly be stored in a single XLIFF document.

Slide 18

Bilingual Model

• Each <file> element is designed to store one source language and one target language.

• The rationale is that different target languages are usually translated by different people.

• However, the languages in <alt-trans> elements can differ. For example, proposed matches in national Portuguese when translating into Brazilian Portuguese.

Slide 19

Localizable Objects

• XLIFF allows not only text strings as localizable objects but also other object types, such as graphics.

• Supplementary information can be represented in a generic way through inline codes (e.g. formatting of text).

• Relationships between objects can be captured (e.g. all items in a menu).

Slide 20

An XLIFF Snippet…

A simple menu represented as XLIFF
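
The slide's original snippet did not survive extraction, so the following Python sketch generates a comparable document rather than reproducing the author's. It assumes the XLIFF 1.1 namespace `urn:oasis:names:tc:xliff:document:1.1`; the file name `menu.rc`, the ids, and the attribute values (`winres`, `menu`, `menuitem`) are illustrative choices, and the output has not been validated against the XLIFF schema.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"


def q(tag):
    """Qualify a tag name with the XLIFF 1.1 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def build_menu_xliff(menu_id, items, source_lang="en", target_lang="fr"):
    """Build an XLIFF document holding one menu as a <group> of <trans-unit>s.

    `items` maps a unit id to a (source text, target text or None) pair.
    """
    ET.register_namespace("", XLIFF_NS)
    root = ET.Element(q("xliff"), {"version": "1.1"})
    # One <file> element per source container, one language pair per <file>.
    file_el = ET.SubElement(root, q("file"), {
        "original": "menu.rc",  # hypothetical source container
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "winres",
    })
    body = ET.SubElement(file_el, q("body"))
    # The <group> captures the relationship between the menu items.
    group = ET.SubElement(body, q("group"), {"id": menu_id, "restype": "menu"})
    for unit_id, (source_text, target_text) in items.items():
        unit = ET.SubElement(group, q("trans-unit"),
                             {"id": unit_id, "restype": "menuitem"})
        ET.SubElement(unit, q("source")).text = source_text
        if target_text is not None:
            ET.SubElement(unit, q("target")).text = target_text
    return ET.tostring(root, encoding="unicode")
```

Calling `build_menu_xliff("file_menu", {"open": ("Open", "Ouvrir"), "close": ("Close", None)})` returns a document with one translated and one untranslated menu item inside a single group.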

Slide 21

Supplementary Info

• XLIFF provides “hooks” for storing supplementary information (for example, references to glossaries or translation memories that should be used).

• The supplementary information can be referenced (i.e. reside outside of the document), or embedded within the document.

Slide 22

Administrative Info

XLIFF provides mechanisms for capturing administrative information:

• For relating source material to XLIFF documents.

• For storing workflow data.

• For providing pre-translation entries generated by TM, MT, translation repository.

• For keeping track of changes.

Slide 23

XLIFF 1.1 Custom Data

In XLIFF 1.1, we have the ability to customise XLIFF by extending via private namespace:
– Elements
– Attributes
– Attribute Values
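
A rough Python sketch of the three extension points follows. The namespace `urn:example:acme-l10n`, the `acme` prefix, and the `reviewer` and `layout` names are all hypothetical. The `x-` prefix on the custom attribute value reflects my reading of the XLIFF 1.1 convention for user-defined values and should be checked against the specification.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"
ACME_NS = "urn:example:acme-l10n"  # hypothetical private namespace


def add_custom_data(unit, reviewer, max_width):
    """Attach private-namespace data to an XLIFF <trans-unit> element."""
    # Custom attribute in the private namespace.
    unit.set(f"{{{ACME_NS}}}reviewer", reviewer)
    # Custom element in the private namespace.
    layout = ET.SubElement(unit, f"{{{ACME_NS}}}layout")
    layout.set("max-width", str(max_width))
    # Custom value for a standard attribute (assumed "x-" convention).
    unit.set("restype", "x-acme-toolbar")
    return unit


def demo():
    """Serialise one extended <trans-unit> to show the resulting markup."""
    ET.register_namespace("", XLIFF_NS)
    ET.register_namespace("acme", ACME_NS)
    unit = ET.Element(f"{{{XLIFF_NS}}}trans-unit", {"id": "ok_button"})
    ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = "OK"
    add_custom_data(unit, reviewer="jdoe", max_width=80)
    return ET.tostring(unit, encoding="unicode")
```

Because the custom data lives in its own namespace, a tool that does not know it can skip it while still processing the standard <source> and <target> content.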

Slide 24

Embedding XLIFF 1.1

• An entire XLIFF document, or part of one, can be embedded in another XML document.

• The host XML format must be defined by an XML Schema (XSD) that includes an <any> element in the definition of the element where the XLIFF data can be inserted.

Slide 25

Use Cases

XLIFF in the localisation process.

Slide 26

Basic Use Case – without XLIFF

[Diagram: in the Publisher/Customer Domain, developer applications produce native files (Native File 1, e.g. HTML; Native File 2, e.g. Java files; Native File 3, e.g. Java Properties; ... Native File n). In the Localisation Domain, the translator handles each format through tool resource filters and specific tool(s).]

Slide 27

Basic Use Case –with XLIFF

[Diagram: in the Publisher/Customer Domain, either XLIFF compliant developer applications author directly to XLIFF, or non XLIFF compliant developer applications produce HTML, Java Properties, and RC Data that are pre-processed. In the Localisation Domain, the translator uses an XLIFF compliant editor on XLIFF file(s) containing HTML, Java, Properties, etc. translatable resources.]

Slide 28

Automated Localisation with CAT Use Case

[Diagram: workflow between Developer, Localization Engineer, and Translator. Labels: Generate XLIFF; Pseudo Translate / Test; Defect Report; XLIFF Translation Kit; Translation Repository (100% match); Translation Memory (fuzzy match); Machine Translation (Machine Translate); Requires Translation; 0% Translated; 100% Translated; Translate (XLIFF Editor); Update.]

Slide 29

Open Source Localisation

Issues specific to localising Open Source software.

Slide 30

Open Source Resource Formats

• User Assistance (Help):
– DocBook as intermediate container

• UI Resources:
– Many different format types, but converge on:
  • PO / POT
  • Java Resource Bundles (.properties & .java)

Slide 31

Docbook

• Formed in 1991

• SGML and XML versions

• Many commercial XML editors optimised for Docbook

• No good Open Source XML editors available.

• GNU converts Docbook to (XML->) PO files, translates, then converts back.

• Docbook converted to HTML dynamically by Yelp Help Browser.

• To optimise performance can pre-convert to HTML

Slide 32

UI Resource Format – Java Resources

• ListResourceBundle
– .java file
– Can contain binary data
– Compiled into class file

• PropertyResourceBundle
– .properties file
– Contain strings only
– Values acquired at runtime
– Requires 8859-1 encoding
– Non 8859-1 characters represented as Unicode escape codes (i.e. \uxxxx)
– native2ascii to convert non 8859-1 content
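
The escaping step can be approximated in Python as below. This is a sketch of the idea, not a reimplementation of native2ascii: `escape_for_properties` is a hypothetical helper, and it assumes that escaping every character outside ASCII (a subset of 8859-1) is acceptable. Characters outside the Basic Multilingual Plane are written as two \uxxxx escapes, one per UTF-16 code unit, which is how Java represents them.

```python
def escape_for_properties(text):
    """Replace every non-ASCII character with Java-style \\uxxxx escapes."""
    pieces = []
    for ch in text:
        if ord(ch) < 0x80:
            pieces.append(ch)
            continue
        # Encode to UTF-16 so that supplementary characters become a
        # surrogate pair, then emit one escape per 16-bit code unit.
        utf16 = ch.encode("utf-16-be")
        for i in range(0, len(utf16), 2):
            unit = int.from_bytes(utf16[i:i + 2], "big")
            pieces.append(f"\\u{unit:04x}")
    return "".join(pieces)
```

For instance, the Polish word "plików" becomes `plik\u00f3w`, which is safe to store in an 8859-1 encoded .properties file.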

Slide 33

UI Resource Format – Java Resources

• Localization challenges:
– Each file contains 1 language locale pair
– Key / Value Pairs
– No normalized metadata – comments often used for ad hoc metadata.

Slide 34

UI Resource Format - PO

PO (Portable Object) Files, and POT (templates)
– A “Catalog”
– Bi-lingual model
– Resource bundle accessed by “gettext()”
– Text files
– Utilities available to convert from many resource types to PO (e.g. C, Delphi, Java, Python, etc.)
– Compiled into “MO” files
– Support for Plurals
– Limited metadata
– Used by most GNU, GNOME, KDE and other Open Source projects

Slide 35

PO File Syntax

Header:

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
msgid ""
msgstr ""
"Project-Id-Version: Project Version \n"
"PO-Revision-Date: YYYY-MM-DD HH:MM+ZONE\n"
"Last-Translator: TranslatorName <email>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=code\n"
"Content-Transfer-Encoding: 8bit\n"
"POT-Creation-Date: \n"
"Language-Team: \n"

Separator:

white-space (usually a single new line)

Comments (segment metadata):

# translator-comments
#. automatic-comments
#: reference...
#, flag...

Resource(s):

msgid untranslated-string
msgstr translated-string
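
To make the layout concrete, here is a small Python sketch that reads entries of this shape into dictionaries. It is a simplification under stated assumptions: `parse_po` is a hypothetical helper, plural entries are ignored, and PO string escapes are decoded with Python's own string-literal rules, which agree for common cases such as \n and \" but are not guaranteed to match gettext in every corner.

```python
import ast


def parse_po(text):
    """Parse simple (non-plural) PO entries into a list of dicts."""
    entries = []
    entry = None
    field = None  # which of msgid/msgstr a continuation line extends
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # White-space separates entries.
            if entry is not None:
                entries.append(entry)
                entry = None
            field = None
            continue
        if entry is None:
            entry = {"comments": [], "auto_comments": [], "references": [],
                     "flags": [], "msgid": "", "msgstr": ""}
        if line.startswith("#."):
            entry["auto_comments"].append(line[2:].strip())
        elif line.startswith("#:"):
            entry["references"].extend(line[2:].split())
        elif line.startswith("#,"):
            entry["flags"].extend(f.strip() for f in line[2:].split(","))
        elif line.startswith("#"):
            entry["comments"].append(line[1:].strip())
        elif line.startswith("msgid "):
            field = "msgid"
            entry[field] = ast.literal_eval(line[6:])
        elif line.startswith("msgstr "):
            field = "msgstr"
            entry[field] = ast.literal_eval(line[7:])
        elif line.startswith('"') and field:
            # Continuation line: concatenated onto the current field.
            entry[field] += ast.literal_eval(line)
    if entry is not None:
        entries.append(entry)
    return entries
```

The header shows up as the first entry, with an empty msgid and the header fields concatenated in msgstr. Every other entry carries its comments, references, and flags alongside the source and translated strings.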

Slide 36

PO File Plural Form

Plural form of a message in the PO file looks like this:

white-space
# translator-comments
#. automatic-comments
#: reference...
#, flag...
msgid untranslated-string-singular
msgid_plural untranslated-string-plural
msgstr[0] translated-string-plural-form
msgstr[1] translated-string-plural-form
msgstr[n] translated-string-plural-form

“n” is language specific

Slide 37

PO File Plural Forms Syntax / Examples

Syntax:

msgid untranslated-string-singular
msgid_plural untranslated-string-plural
msgstr[0] translated-string-plural-form
msgstr[1] translated-string-plural-form
msgstr[n] translated-string-plural-form

French:

msgid "%s file"
msgid_plural "%s files"
msgstr[0] "%s fichier"
msgstr[1] "%s fichiers"

Polish:

msgid "%s file"
msgid_plural "%s files"
msgstr[0] "%s plik"
msgstr[1] "%s pliki"
msgstr[2] "%s plików"
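
How a count selects one of the msgstr[n] forms can be sketched in Python. The two rules below are my transcription of the commonly published gettext Plural-Forms expressions for French and Polish. The `CATALOG` structure and `translate_files` are hypothetical names for this sketch, not part of gettext.

```python
def french_plural_index(n):
    """French: form 0 for 0 and 1, form 1 for anything larger."""
    return 1 if n > 1 else 0


def polish_plural_index(n):
    """Polish: three forms, selected by the last digits of the count."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


# Each language pairs its rule with the msgstr[n] forms from the examples.
CATALOG = {
    "fr": (french_plural_index, ["%s fichier", "%s fichiers"]),
    "pl": (polish_plural_index, ["%s plik", "%s pliki", "%s plików"]),
}


def translate_files(lang, n):
    """Pick the plural form for `n` and substitute the count."""
    rule, forms = CATALOG[lang]
    return forms[rule(n)] % n
```

With these rules, 3 and 22 both select "pliki" in Polish while 5 and 12 select "plików", which is the kind of distinction a two-form (singular/plural) model cannot express.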

Slide 38

PO File Localization Challenges

• Plural Forms Challenges
– Rules differ across languages, and implementations differ across platforms.
– PO editing tools don’t support plural form well (poedit, Kbabel), and recommend using text editors.

• Limited normalized metadata

• Little or no context information for translators

• Docbook represented as PO files loses metadata

• Limited support for segmentation, alignment

Slide 39

Simplified GNU/KDE Style Use Case

[Diagram: Developer Domain and Localisation Domain. Developer Domain labels: Documentation Author (Docbook, Docbook/PO converter); UI Developer (Generate PO Files); CVS. Localisation Domain labels: i18n Coordinator (Preparation & Project Management); Translator (Translation of PO files with Text Editor or PO Editor, TM); CVSUP; PO/Docbook converter.]

Slide 40

Open Source Localisation Process

• Localization in the Open Source community is very technical and almost entirely manual: the primary interface is CVS, even for translators (e.g. http://i18n.kde.org/translation-howto/index.html)

• Process and tools differ from project to project, even language to language.

• Little or no formal linguistic review: quality, style consistency vary widely.

• Project Management and translation are performed by volunteers.

Slide 41

Tools Support

A survey of localization tools that support XLIFF

Slide 42

XML-Enabled Translation Tools

• Any XML-enabled translation tool can work with an XLIFF document, as long as the text to translate is initially copied into the <target> elements. However, this does not mean the tool supports all XLIFF features; it merely permits translation of <target> content.

• Many tools cannot handle conditional translation (for example: <trans-unit translate="no">). In that case, extra elements have to be added temporarily.
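
The preparation step described above, copying the source text into <target>, might look like this in Python. It is a sketch assuming the XLIFF 1.1 namespace; `prefill_targets` is a hypothetical helper. It simply skips units marked translate="no" and does not attempt the temporary extra elements mentioned in the second bullet.

```python
import copy
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"
NS = {"x": XLIFF_NS}


def prefill_targets(xliff_text):
    """Copy each <source> into a new <target> where none exists yet."""
    ET.register_namespace("", XLIFF_NS)
    root = ET.fromstring(xliff_text)
    # findall returns a list, so the tree can be modified safely in the loop.
    for unit in root.findall(".//x:trans-unit", NS):
        if unit.get("translate") == "no":
            continue  # leave non-translatable units alone
        source = unit.find("x:source", NS)
        if source is None or unit.find("x:target", NS) is not None:
            continue
        # Deep copy keeps any inline-code child elements of <source>.
        target = copy.deepcopy(source)
        target.tag = f"{{{XLIFF_NS}}}target"
        unit.insert(list(unit).index(source) + 1, target)
    return ET.tostring(root, encoding="unicode")
```

After this pass, a generic XML translation tool can be pointed at the <target> elements only, overwriting the copied source text with the translation.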

Slide 43

XLIFF Enabled Commercial Tools

• Alchemy Software - Catalyst 5.0 – Visual XLIFF 1.1 Editor http://www.alchemysoftware.ie

• Heartsome XLIFF Editor, support for PO files, Docbook: http://www.heartsome.net

• PASS: Passolo: Visual XLIFF Editor: http://www.passolo.com

• Trados: No direct XLIFF support yet, but can edit XLIFF files using modified INI

• XML-Intl : XLIFF Editor http://www.xml-intl.com

Slide 44

XLIFF Enabled Shareware/Freeware

• ENLASO Corp (formerly “RWS Group”): Extraction Utility for RC Data and Java Properties to XLIFF 1.1 http://dotnet.goglobalnow.net/

• Various Freeware Utilities, including converters for PO files: http://www.translate.com/shared/tools

Slide 45

XLIFF Enabled Open Source

• International Components for Unicode (ICU):
– Open Source set of C/C++ and Java libraries for Unicode support, software internationalization and globalization, extends JDK i18n
– genrb, and XLIFF2ICUConverter class to convert between common formats and XLIFF
– Includes RBManager, a Java based resource bundle editor with XLIFF support

http://oss.software.ibm.com/icu/

Slide 46

XLIFF Enabled Open Source

• Okapi Framework XSL Template Collection:
– Sample utilities for transforming XLIFF to PO, RC, Java Properties
http://sourceforge.net/project/showfiles.php?group_id=42949&release_id=67485

• xliffRoundTrip tool
– Transforms any XML file to/from XLIFF using XSLT
http://sourceforge.net/projects/xliffroundtrip/

• Lionbridge ForeignDesk
– Incomplete XLIFF support
http://sourceforge.net/projects/foreigndesk/

Slide 47

Future Support for XLIFF Announced:

• Apple Corp: Apple’s resource editor AppleGlot

• Idiom: Worldserver V.6.0

• SDL International: SDLX support for XLIFF currently in development. See http://www.sdlx.com for more information.

• uPortal: Open Source Web portal infrastructure for Universities – XLIFF support announced for Version 3.0, to be released in 2005

Slide 48

Where does XLIFF fit?

• Good choice for projects with multiple resource formats, especially good for XML.

• XLIFF addresses the process and metadata related problems of Open Source projects:
– Supports workflow metadata.
– Supports multiple resource formats
– Normalised translation memory / repository data.
– Simplifies translator usability experience.

Slide 49

Where does XLIFF fit?

• Issues Blocking Adoption by Open Source:
– Adoption requires retooling - lack of existing open source XLIFF tools for PO and Docbook.
– PO tools deemed adequate for current requirements
– “Volunteer” model reduces urgency to reduce costs

Slide 50

Where does XLIFF fit?

• Issues Encouraging Adoption by Open Source:
– Increase in commercial product development for Open Source platforms
  • Translation not volunteer effort - cost control important.
  • Integration with existing automation required.
  • Increased availability of commercial tools that support XLIFF
– Increase in Java Open Source projects
  • Java projects are well supported by XLIFF.
  • Well documented L10n best practices include XLIFF
  • Available commercial and Open Source tools

Slide 51

More Information

• The XLIFF TC Web Site: http://www.xliff.org

• A “best practice” from Sun Developer Network: http://developers.sun.com/dev/gadc/technicalpublications/whitepapers/translation_technology_sun.html

• Presenter: – XLIFF TC Chair: Tony Jewtushenko (Oracle)

([email protected])

Slide 52

Thank You...

Questions?