Software Architecture 04/2008/KW www.folt.de OPENTMS SOFTWARE ARCHITECTURE Roßtal, 29/08/2008 Doc.Nr.: HEA-1.1-2008 Version 1.3 Author: Dr. Klemens Waldhör / [email protected] Location: OpenTMS_Software_Architecure_v1.3.doc

OpenTMS Software Architecture


This paper describes the basic architecture of the open source translation memory system OpenTMS.



Dok. Nr.: HEA-1-2008; Version 1.0; April/May/June/August 2008 2/72

1 VERSIONING INFORMATION

• V0.1 – Version 0.1 – April/May/June 2008: Start version; Klemens Waldhör, Heartsome Europe - TOSS_Software_Architecure.doc

• V1.0 – Version 1.0 – 05.08.2008: Initial version; Klemens Waldhör, Heartsome Europe; based on discussion with Michael Schneider, beodoc, 04.07.2008 - OpenTMS_Software_Architecure_v1.0.doc

• V1.1 – Version 1.1 – 30.08.2008: Modifications based on the FOLT internal architecture discussion meeting, 29.08.2008, Acolada GmbH, Nürnberg. Participants: Ulrike Baral, beodoc; Torsten Kuprat; Michael Schneider, beodoc; Klemens Waldhör, Heartsome Europe; Thomas Wedde, euroscript; OpenTMS_Software_Architecure_v1.1.doc


2 PREFACE

This manual gives an overview of the OpenTMS software architecture. It is based on the requirements defined in the FOLT Open Source Initiative (FOLT, 2007b).

The architecture of OpenTMS is built on several models which describe its key components. Each model handles a specific aspect of the translation process and its requirements. Together the models form a framework which guides the construction of language-specific software tools.

The following core models are identified:

• Security model

• Document model

• Process model

• User model

• Data model

• GUI model

• Interface model

On top of those models the application model organises real applications (like the GUI model).

OpenTMS uses a data source in the data model which organises access to a database or to any kind of device that can store (TM or terminology) data.

The architecture also contains a description of some basic functions which can form the core of translation tools. The architecture is defined in such a way that it can easily be extended with new functions or by combining existing functions into new functionality.


CONTENTS

1 VERSIONING INFORMATION
2 PREFACE
3 LIST OF TABLES AND FIGURES
4 DEFINITIONS
5 INTRODUCTION
5.1 Arguments for an OpenTMS Software Architecture
5.2 Basics
5.2.1 Naming conventions
5.2.2 Naming of OpenTMS specific functions/methods
5.3 Character set
5.4 Standards
5.5 Basic Requirements
5.6 Architecture
6 OPENTMS ARCHITECTURE AND MODELS
6.1 Parameters in OpenTMS models
6.2 Core Models of OpenTMS
6.3 OpenTMS Core Library
6.4 The Application Model
6.5 Implementation Languages
7 SECURITY MODEL
7.1 Security, OpenTMS and Programming Languages
7.2 Communication Level
7.3 Document Level
7.4 Database Level
7.5 Security Level
8 BASIC OPENTMS COMPONENTS
9 DOCUMENT MODEL
9.1 Documents
9.2 Character Sets
9.3 XML document handling
9.4 XLIFF Documents
9.4.1 OpenTMS and Skeleton files
9.4.2 Security and encryption in XLIFF – secureXLIFF
9.5 TMX Documents
9.5.1 Security and encryption in TMX – secureTMX
9.6 TBX Documents
9.6.1 Security and encryption in TBX – secure TBX
9.7 Other Documents
9.8 Basic Document Access Functionality
10 OPENTMS AS A CLIENT/SERVER ARCHITECTURE
11 DATA MODEL
11.1 Data sources
11.2 TM Matches
11.3 Basic data source access functionality
11.4 Databases
11.4.1 Open source SQL databases
11.4.2 Closed source SQL databases
11.4.3 Alternatives
11.4.4 Database Access
11.4.5 Database and data source configuration
12 TRANSLATION OBJECTS
12.1 Format information
12.2 Terminology versus Translation Memory
12.3 Variables, placeholders, replacement classes
13 PROCESS MODEL
13.1 OpenTMS Process
13.2 OpenTMS Scripting Language
13.3 OpenTMSL Communication Methods
14 USER MODEL
14.1 User roles
14.2 Basic user functionality
15 GUI MODEL
16 INTERFACE MODEL
17 CONFIGURING OPENTMS
17.1 Naming of the configuration file
17.2 Structure of the configuration file
17.3 Configuration Options
18 DMS INTERFACE
19 BIBLIOGRAPHY
20 APPENDIX
20.1 Multiple translations for a linguistic concept


3 LIST OF TABLES AND FIGURES

Fig 1: OpenTMSName defined as a regular expression
Fig 2: Naming of OpenTMS functions for export
Fig 3: OpenTMS Procedure description
Fig 4: OpenTMS Models
Fig 5: Example securing XLIFF document exchange
Fig 6: OpenTMS Objects
Fig 7: XLIFF File
Fig 8: Some basic XLIFF File functions
Fig 9: Hierarchy of processes
Fig 10: Applications
Fig 11: Pipeline Architecture
Fig 12: Data sources and data components
Fig 13: Data sources with several data components
Fig 14: Data source access types
Fig 15: Data source access types
Fig 16: Configuring different database types
Fig 17: Representation of linguistic entities as General Linguistic Object
Fig 18: Conversions of linguistic entities
Fig 19: OpenTMS Scripting Language
Fig 20: OpenTMSL Inter-process and computer communication
Fig 21: Some basic user functions
Fig 22: Configuration of OpenTMS
Fig 23: Configuration file naming example
Fig 24: Configuration option structure
Fig 25: OpenTMS options table


4 DEFINITIONS

Client: A client is an application or system that accesses a (remote) service on another computer system, known as a server, by way of a network. URL: http://en.wikipedia.org/wiki/Client_%28computing%29

Client-Server: Client-server is a computing architecture which separates a client from a server, and is almost always implemented over a computer network. A client-server application is a distributed system that consists of both client and server software. A client is a piece of software or a process that may initiate a communication session, while a server cannot initiate sessions but waits for requests from a client. The terms client and server may also refer to the host computers connected to a network on which the client and server software respectively reside. URL: http://en.wikipedia.org/wiki/Client-server

Doclet: By analogy with applets, doclets are modules used by documentation tools to process and automatically generate documentation and possibly also code. Doclets are best known in the context of the Java programming language, where they are used as modules in the documentation tool Javadoc. URL: http://de.wikipedia.org/wiki/Doclet

GUI: Graphical User Interface. An application which allows a human user to interact with a program through windows, menus etc.

“A graphical user interface (GUI) (IPA: /ˈguːiː/) is a type of user interface which allows people to interact with electronic devices like computers, hand-held devices (MP3 Players, Portable Media Players, Gaming devices), household appliances and office equipment. A GUI offers graphical icons, and visual indicators as opposed to text-based interfaces, typed command labels or text navigation to fully represent the information and actions available to a user. The actions are usually performed through direct manipulation of the graphical elements.” URL: http://en.wikipedia.org/wiki/GUI

FOLT: Forum Open Language Tools URL: www.folt.org

HTTP: Hypertext Transfer Protocol (HTTP) is a communications protocol for the transfer of information on intranets and the World Wide Web. Its original purpose was to provide a way to publish and retrieve hypertext pages over the Internet. URL: http://en.wikipedia.org/wiki/HTTP

HTTPS: Hypertext Transfer Protocol over Secure Socket Layer or HTTPS is a URI scheme used to indicate a secure HTTP connection. It is syntactically identical to the http:// scheme normally used for accessing resources using HTTP. Using an https: URL indicates that HTTP is to be used, but with a different default TCP port (443) and an additional encryption/authentication layer between the HTTP and TCP. This system was designed by Netscape Communications Corporation to provide authentication and encrypted communication and is widely used on the World Wide Web for security-sensitive communication such as payment transactions and corporate logons. URL: http://en.wikipedia.org/wiki/Https

Open Source: Open source is a development methodology which offers practical accessibility to a product's source (goods and knowledge). Some consider open source as one of various possible design approaches, while others consider it a critical strategic element of their operations. Before open source became widely adopted, developers and producers used a variety of phrases to describe the concept; the term open source gained popularity with the rise of the Internet, which provided access to diverse production models, communication paths, and interactive communities.

The open source model of operation and decision making allows concurrent input of different agendas, approaches and priorities, and differs from the more closed, centralized models of development. The principles and practices are commonly applied to the development of source code for software that is made available for public collaboration, and it is usually released as open-source software. URL: http://en.wikipedia.org/wiki/Open_source

RPC: Remote procedure call (RPC) is a technology that allows a computer program to cause a subroutine or procedure to execute in another address space (commonly on another computer on a shared network) without the programmer explicitly coding the details for this remote interaction. That is, the programmer would write essentially the same code whether the subroutine is local to the executing program, or remote. When the software in question is written using object-oriented principles, RPC may be referred to as remote invocation or remote method invocation. URL: http://en.wikipedia.org/wiki/Remote_procedure_call


Server: In information technology, a server is an application or device that performs services for connected clients as part of a client-server architecture. A server application, as defined by RFC 2616 (HTTP/1.1), is "an application program that accepts connections in order to service requests by sending back responses." Server computers are devices designed to run such an application or applications, often for extended periods of time with minimal human direction. Examples of d-class servers include web servers, e-mail servers, and file servers. URL: http://en.wikipedia.org/wiki/Server_%28computing%29

Software Architecture: The software architecture of a program or computing system is the structure or structures of the system, which comprise software components, the externally visible properties of those components, and the relationships between them. The term also refers to documentation of a system's software architecture. Documenting software architecture facilitates communication between stakeholders, documents early decisions about high-level design, and allows reuse of design components and patterns between projects. URL: http://en.wikipedia.org/wiki/Software_architecture

TOMCAT: Apache Tomcat is a Servlet container developed by the Apache Software Foundation (ASF). Tomcat implements the Java Servlet and the JavaServer Pages (JSP) specifications from Sun Microsystems, and provides a "pure Java" HTTP web server environment for Java code to run. … Apache Tomcat includes tools for configuration and management, but can also be configured by editing configuration files that are normally XML-formatted. URL: http://en.wikipedia.org/wiki/Apache_Tomcat

UML (Unified Modeling Language): In the field of software engineering, the Unified / Universal Modeling Language (UML) is a standardized visual specification language for object modeling. UML is a general-purpose modeling language that includes a graphical notation used to create an abstract model of a system, referred to as a UML model. UML is officially defined at the Object Management Group (OMG) by the UML metamodel, a Meta-Object Facility metamodel (MOF). Like other MOF-based specifications, UML has allowed software developers to concentrate more on design and architecture. URL: http://en.wikipedia.org/wiki/Unified_Modeling_Language


Unicode: In computing, Unicode is an industry standard allowing computers to consistently represent and manipulate text expressed in most of the world's writing systems. Developed in tandem with the Universal Character Set standard and published in book form as The Unicode Standard, Unicode consists of a repertoire of more than 100,000 characters, a set of code charts for visual reference, an encoding methodology and set of standard character encodings, an enumeration of character properties such as upper and lower case, a set of reference data computer files, and a number of related items, such as character properties, rules for normalization, decomposition, collation, rendering and bidirectional display order (for the correct display of text containing both right-to-left scripts, such as Arabic or Hebrew, and left-to-right scripts). URL: http://en.wikipedia.org/wiki/Unicode

UTF-8: UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. It is able to represent any character in the Unicode standard, yet the initial encoding of byte codes and character assignments for UTF-8 is backwards compatible with ASCII. For these reasons, it is steadily becoming the preferred encoding for e-mail, web pages, and other places where characters are stored or streamed. URL: http://en.wikipedia.org/wiki/UTF-8

XML-RPC: XML-RPC is a remote procedure call protocol which uses XML to encode its calls and HTTP as a transport mechanism. URL: http://en.wikipedia.org/wiki/Xml-rpc


5 INTRODUCTION

5.1 Arguments for an OpenTMS Software Architecture

The arguments for an open source based localization tool have been discussed in FOLT, 2007a.

Software design principles:

For end users (translators): easy to install

For translation providers: server version, networking

For customers: running own servers; secure interfaces

5.2 Basics

5.2.1 Naming conventions

OpenTMS uses a standardized naming convention for variables, names in XML files etc.

Each legal OpenTMS name (string, literal, variable name, function name) consists of one or more words. Variables start with an uppercase letter [A-Z]. Function names (e.g. identifying processes) start with a lowercase letter [a-z]; every subsequent word starts with an uppercase letter. The remaining characters of a word are either [a-z] or [0-9]. No blanks are allowed between words.

Word := [A-Z]([a-z]|[0-9])*

word := [a-z]([a-z]|[0-9])*

OpenTMSName := Word+

OpenTMSFunctionName := word Word*

Examples:

• The variable: XliffDocument

• The function: openXliffDocument

Fig 1: OpenTMSName defined as a regular expression

Exceptions from the naming conventions could be introduced if acronyms etc. are used for words (e.g. TMX). Nevertheless it is not recommended to do this.
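
The grammar of Fig 1 can be checked mechanically. The following is a minimal sketch in Java, assuming the rules are applied exactly as written; the class name OpenTmsNames and its methods are hypothetical helpers for illustration and are not part of OpenTMS.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, for illustration only; not part of the OpenTMS specification. */
final class OpenTmsNames {
    // Word := [A-Z]([a-z]|[0-9])*   and   word := [a-z]([a-z]|[0-9])*
    private static final String WORD_UPPER = "[A-Z][a-z0-9]*";
    private static final String WORD_LOWER = "[a-z][a-z0-9]*";

    // OpenTMSName := Word+
    private static final Pattern NAME =
            Pattern.compile("(?:" + WORD_UPPER + ")+");

    // OpenTMSFunctionName := word Word*
    private static final Pattern FUNCTION_NAME =
            Pattern.compile(WORD_LOWER + "(?:" + WORD_UPPER + ")*");

    /** True if the whole string matches OpenTMSName. */
    static boolean isName(String s) {
        return NAME.matcher(s).matches();
    }

    /** True if the whole string matches OpenTMSFunctionName. */
    static boolean isFunctionName(String s) {
        return FUNCTION_NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isFunctionName("openXliffDocument"));
        System.out.println(isName("XliffDocument"));
        System.out.println(isName("Xliff Document")); // blanks are not allowed
    }
}
```

Under these patterns a name that starts with a lowercase letter can only match the function-name rule, and any blank or underscore makes a name illegal.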


5.2.2 Naming of OpenTMS specific functions/methods

It is suggested to use a consistent OpenTMS naming system for functions and variables which are exported from OpenTMS. Exported functions are functions which can be used in applications (similar to the public concept in Java or C++). This immediately helps to identify code which is used in systems outside of OpenTMS. The special string “OpenTMS_” is used for this purpose.

ExportOpenTMSName:= “OpenTMS_” Word+

ExportOpenTMSFunctionName := “OpenTMS_” word Word*

Examples:

• The variable: OpenTMS_Encoding

• The function: OpenTMS_openXliffDocument

Fig 2: Naming of OpenTMS functions for export
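
As an illustration of the export rule, the sketch below derives an exported name from an internal one and recognises exported names by their prefix. It assumes the two grammar rules of Fig 2 are applied literally; OpenTmsExportNames and its methods are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, for illustration only; not part of the OpenTMS specification. */
final class OpenTmsExportNames {
    static final String EXPORT_PREFIX = "OpenTMS_";

    // ExportOpenTMSName := "OpenTMS_" Word+
    private static final Pattern EXPORT_NAME =
            Pattern.compile(EXPORT_PREFIX + "(?:[A-Z][a-z0-9]*)+");

    // ExportOpenTMSFunctionName := "OpenTMS_" word Word*
    private static final Pattern EXPORT_FUNCTION_NAME =
            Pattern.compile(EXPORT_PREFIX + "[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*");

    /** Builds the exported form of an internal name by adding the prefix. */
    static String toExportName(String internalName) {
        return EXPORT_PREFIX + internalName;
    }

    static boolean isExportedName(String s) {
        return EXPORT_NAME.matcher(s).matches();
    }

    static boolean isExportedFunctionName(String s) {
        return EXPORT_FUNCTION_NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        String exported = toExportName("openXliffDocument");
        System.out.println(exported + " exported function: " + isExportedFunctionName(exported));
    }
}
```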

5.3 Character set

OpenTMS uses UTF-8 as its basic character encoding, especially for exchanging files.
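
In Java, one way to honour this rule is to name the encoding explicitly whenever text is turned into bytes or back, instead of relying on the platform default. The sketch below shows this with the standard library; the class Utf8Codec is a hypothetical name used only for this example.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, for illustration only. */
final class Utf8Codec {
    /** Encodes text as UTF-8 bytes, e.g. before writing an exchange file. */
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes UTF-8 bytes, e.g. after reading an exchange file. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String place = "Ro\u00DFtal"; // "Roßtal", written with an escape to stay source-encoding neutral
        byte[] bytes = encode(place);
        System.out.println(bytes.length + " bytes, round trip ok: " + decode(bytes).equals(place));
    }
}
```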

5.4 Standards

FOLT builds heavily on the idea of Open Source and using standards. Therefore the FOLT requirements use well-established localization standards to represent various types of localization information - based on XML.

• XLIFF - XML based localization exchange format

• TTX – Trados TM format

• TMX - XML based localization translation memory exchange format

• SRX - XML based format for describing segmentation rules

• GMX – standard for measuring quantitative aspects in the translation process

• TBX / MARTIF / OLIF – formats for representing terminology

• CSV

• Language Encoding ISO 639…

In general the basic architecture makes heavy use of XML. XML based structures are used as the basic mechanism to exchange information between different applications (->Translets). Using XML has the advantage that many (open source) parsers are available for different programming languages, which enables implementing the core OpenTMS architecture in different languages and environments.

5.5 Basic Requirements

The following main requirements are taken from FOLT (2007b):

• Software: Web based application; thin client; no installation; no proprietary run time components; preferably open source software (FOLT, 2007b, p. 17)

• Operating System: OS independent

• Hardware: standard hardware (FOLT, 2007b, p. 17)

• Interfaces: Integration into CMS, workflow management should be supported (FOLT, 2007b, p. 17).

• Product interfaces: Exchange supported through XLIFF and TMX (FOLT, 2007b, p. 18).

• Database: Open source database (FOLT, 2007b, p. 21); basically all SQL databases should be supported, therefore a generic database interface is required.

• Scalability: single and multi user requirement

5.6 Architecture

The architecture is described mainly in diagrams and text. The target group of this document is mainly non-technicians. The document is therefore kept as informal as possible without losing the necessary precision. Further documents or versions of this document may add more details to the various items discussed. Where possible the basic methods and classes have been written in Java, but this does not imply that an implementation requires Java as its implementation language.

The various components described in the document are called models. A model organizes a certain functionality or aspect of the OpenTMS system. An example of a model is the security model of OpenTMS. This model describes all necessary functions and structures to implement the OpenTMS security system.

There are several methods to describe the architecture, methods and objects of a piece of software. Within this document mainly diagrams and block diagrams are used to show the structure of the software. For describing methods and objects an XML based methodology is used (taken from Tomcat).

The following is an example of a method call description using the Tomcat interface description. The method will be enhanced by also describing the possible return values.

<translet>
  <translet-name>ApplyTranslationMemoryToSegment</translet-name>
  <translet-class>com.OpenTMS.translet.translateSegment</translet-class>
  <init-param>
    <param-name>TMXDB</param-name>
    <param-value>OpenTMSexampledatabase</param-value>
  </init-param>
  <init-param>
    <param-name>SEGMENT</param-name>
    <param-value>This segment needs to be translated.</param-value>
  </init-param>
  <init-param>
    <param-name>FUZZYQUALITY</param-name>
    <param-value>70</param-value>
  </init-param>
</translet>

Fig 3: OpenTMS Procedure description

Annotation: In order to keep the text more compact, function names in this document do not follow the naming scheme described in chapter 5.2.2. This is just for readability purposes; the real implementation should adhere to the naming scheme.
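
A description in the style of Fig 3 can be read with any standard XML parser. The following sketch uses the DOM parser shipped with Java to extract the translet name, class and parameters. It assumes a well-formed description with exactly the element names shown in Fig 3; the class TransletDescriptor and its members are hypothetical and only illustrate the idea.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader for a Fig 3 style description; for illustration only. */
final class TransletDescriptor {
    final String name;
    final String className;
    final Map<String, String> params;

    private TransletDescriptor(String name, String className, Map<String, String> params) {
        this.name = name;
        this.className = className;
        this.params = params;
    }

    /** Parses the XML text; any parsing problem is reported as IllegalArgumentException. */
    static TransletDescriptor parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            Map<String, String> params = new LinkedHashMap<>();
            NodeList inits = root.getElementsByTagName("init-param");
            for (int i = 0; i < inits.getLength(); i++) {
                Element p = (Element) inits.item(i);
                params.put(text(p, "param-name"), text(p, "param-value"));
            }
            return new TransletDescriptor(
                    text(root, "translet-name"), text(root, "translet-class"), params);
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a usable translet description", e);
        }
    }

    // Returns the trimmed text of the first descendant element with the given tag.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    /** The description of Fig 3 as a string (values copied from the figure). */
    static final String EXAMPLE =
            "<translet>"
            + "<translet-name>ApplyTranslationMemoryToSegment</translet-name>"
            + "<translet-class>com.OpenTMS.translet.translateSegment</translet-class>"
            + "<init-param><param-name>TMXDB</param-name>"
            + "<param-value>OpenTMSexampledatabase</param-value></init-param>"
            + "<init-param><param-name>FUZZYQUALITY</param-name>"
            + "<param-value> 70 </param-value></init-param>"
            + "</translet>";

    public static void main(String[] args) {
        TransletDescriptor d = parse(EXAMPLE);
        System.out.println(d.name + " -> " + d.className + " " + d.params);
    }
}
```

Parameter values are returned as strings here; converting "70" into a number would be the task of the parameter library discussed in chapter 6.1.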


6 OPENTMS ARCHITECTURE AND MODELS

The OpenTMS architecture is composed of several models. Each model implements a specific aspect and behavior of the OpenTMS system. The models communicate with each other through parameters and values.

6.1 Parameters in OpenTMS models

Realizing parameters, and especially their types, independently of a specific programming language is not trivial, apart from simple types like characters, strings, integers or other numbers. The transfer of more complex structured information has to be organized on the basis of those primitive types. Programming languages typically use “serialization” approaches to achieve at least a transfer of data from one application instance to another.

OpenTMS tries to use a general parameter / value model which addresses both programming language specific and programming language independent parameter / value transfer. In order to make the integration of existing applications possible, OpenTMS supports different options for parameter representation.

The following methods should be supported:

• XML based parameters: all values should be transferred through XML elements, where the value is given by the element content (string), the name of the parameter by an attribute and the type of the parameter by a further attribute. XML based parameter / value transfer is especially useful when transferring complex structured values (e.g. objects) between functions. Nevertheless complex parameters (objects) need to be serialized. It is suggested that OpenTMS defines some additional basic parameter types which often occur in translation tools (e.g. date type, TransUnits from XLIFF, tu or tuvs in TMX).

• Tomcat parameters: This follows the way the Tomcat server engine defines method calls with parameter values. It is also XML based.

• XML-RPC parameters: This follows the way XML-RPC defines method calls with parameter values. It supports some basic types like integer etc. More complex parameters have to be serialized.

• Programming language specific parameters: Those parameters should be wrapped in a specific object through serialisation. This parameter type should only be used within a specific implementation where it is very unlikely that it will be used by other programming languages.

• Hash tables: Hash tables are supported by most programming languages, and transfer between databases is often supported. Basically an entry in the table contains a key (the name of the parameter) and the value of the parameter (value of the key).

The kernel of each language specific OpenTMS implementation contains a basic library which supports creating, reading and writing OpenTMS parameters.
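
The XML based and hash table representations can be sketched as follows. The text fixes only that the value is the element content and that name and type are attributes; the element name "param" and the attribute names "name" and "type" used below are assumptions, as is the class OpenTmsParameter itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parameter holder; element and attribute names are assumptions. */
final class OpenTmsParameter {
    final String name;
    final String type;
    final String value;

    OpenTmsParameter(String name, String type, String value) {
        this.name = name;
        this.type = type;
        this.value = value;
    }

    /** XML form: value as element content, name and type as attributes. */
    String toXml() {
        return "<param name=\"" + escape(name) + "\" type=\"" + escape(type) + "\">"
                + escape(value) + "</param>";
    }

    /** Escapes the characters that are not allowed verbatim in XML text or attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Hash table form: parameter name as key, parameter value as value. */
    static Map<String, String> toHashTable(OpenTmsParameter... parameters) {
        Map<String, String> table = new LinkedHashMap<>();
        for (OpenTmsParameter p : parameters) {
            table.put(p.name, p.value);
        }
        return table;
    }

    public static void main(String[] args) {
        OpenTmsParameter quality = new OpenTmsParameter("FUZZYQUALITY", "int", "70");
        System.out.println(quality.toXml());
        System.out.println(toHashTable(quality));
    }
}
```

The hash table form loses the type information, which is one reason why a typed XML representation may be preferable for exchange between different programming languages.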

Type        Comment
int         Integer as in Java
float       Float as in Java
char        Character as in Java
String      String as in Java
Time
Date
TransUnit   XML based XLIFF TransUnit Structure
tu          XML based TMX tu Structure
GLO         General Linguistic Object - see chapter 12
MoLo        Monolingual Object - see chapter 12
Mulo        Multilingual Object - see chapter 12

Fig 4: Table of Core OpenTMS parameter types

An example of how parameters are used is given in Fig. 3.


6.2 Core Models of OpenTMS

The following chapter describes the core models of OpenTMS. The key idea is that OpenTMS uses an extensible architecture which allows new models to be added to the kernel architecture in an easy, yet compatible way. A new model has to fulfill some basic requirements, e.g. that parameters are defined and used in the way described in chapter 6.1.

Fig 5: OpenTMS Models and their relations

The OpenTMS models are arranged in a kind of “onion model”. The kernel is represented by the process model, which in turn builds on the user, document and data models, each of which models a specific aspect of the OpenTMS system. These kernel models are “shielded” by the security model, which is responsible for ensuring that only allowed operations are performed.

• Security Model: This model describes the security aspects and requirements of OpenTMS. Other models use the security model to allow or restrict access to OpenTMS-specific functions. OpenTMS uses a security model which on the one hand secures the communication channel and on the other hand secures data (e.g. the values of elements in an XML file or the values in a property file).

• User Model: This model represents the user within OpenTMS. The user model works in tight connection with the security model. “User” does not only mean human users, but also other processes. User models have rights attached to them, which in turn support the security model of OpenTMS.

• Process Model: This model implements the functions of OpenTMS (which are finally combined into applications, see the application model), e.g. a converter or a translation memory search.

• Data Model: Basically this model implements the database side of OpenTMS. It uses a generalized database model called data sources. Data sources are any kind of storage medium for data, ranging from plain text files to SQL and other types of databases.

• Document Model: The document model describes the core documents used in OpenTMS. Basically it is based on XLIFF and TMX. The document model could also be seen as part of the data model, but because documents are one of the core outputs of the translation and localization process, they are modeled separately.

• GUI Model: This model specifies editors and other functionality which requires a GUI. The GUI model is not detailed further in this architecture specification; it should be defined in a separate document.

• Interface Model: This model describes how to extend OpenTMS with new models. The interface model is an abstract model and needs further investigation. An example of such an extension is the interface to CMS systems. Interface models are also quite important, as they serve as the connection to other applications (e.g. web servers, CMS systems) and in general to scripting languages like Perl, PHP etc.

• Application Model: This model realizes programs which perform tasks like translation etc.

6.3 OpenTMS Core Library

In order to achieve a consistent and quick implementation, OpenTMS implements its key functions in a core library. Functions implemented in the core library should not be re-implemented (“reinvented”) in external functions or processes. Obviously the set of key functions will evolve over time. Functionality and implementation of the core should not be changed without important reasons (similar to the Linux development process).

By using a core library OpenTMS ensures that certain functions behave in the same way across applications. It also assures developers and users that functionality does not change unforeseeably.

Core library functions should be the first ones to be realized when OpenTMS is implemented in a different programming language.

6.4 The Application Model

The OpenTMS architecture just serves as a model of how the different aspects of tools supporting the translation process can be implemented. As a model it is independent of any programming language.

Applications need to be written in order to make the functionality of OpenTMS accessible to users. This is realized in the application model. The GUI model can be seen as an example of an application model.

Applications obviously depend on the existence of a concrete implementation in an existing programming language (Java, C#, Perl or whatever). In this sense OpenTMS provides a programming framework which allows language support tools to be constructed.

In the beginning OpenTMS will come with some basic applications (editors etc.). But the main idea is that a sound framework is defined and specified which allows the construction of new language applications.

OpenTMS also supports its own scripting language (OpenTMSL). This language makes the OpenTMS functions accessible through simple calls (similar to batch files). This scripting language can also be used to construct applications.

6.5 Implementation Languages

As a first step it is suggested to implement a Java version of OpenTMS. Compared to other languages, Java has the advantage that it runs on several operating systems (which is one of the goals of FOLT and OpenTMS). Tools written in other languages can be integrated, because OpenTMS is designed from its basic model onwards to use XML-RPC and similar communication modes.

The basic Java implementation can serve as the basis for other implementations

(C, C#, C++, Perl, PHP etc.).

For security issues associated with choosing a suitable programming language see chapter 7.

7 SECURITY MODEL

A key success factor of the OpenTMS system is security. As translation can always involve documents of various security levels, proper handling of the documents and of document transmission is required.

Depending on the security level data can be encoded/encrypted. It is suggested to use three different levels.

• Level 0: No security procedures are applied; data are transferred as they are.

• Level 1: The communication channel is secured using standard secure protocols.

• Level 2: Security is applied at data level. Basically this means that strings are encrypted when they are sent through a communication channel or are written to or retrieved from a database. This also covers encrypted XLIFF files (or parts of them).

• Level 4: GUI-level security

Levels 1 and 2 can be used together to achieve optimal security where necessary. Security is attached to the OpenTMS user model.

A key feature of the OpenTMS architecture is that the security model is transparent. When writing a (new) application the programmer does not need to take care of the security aspect. The OpenTMS kernel provides all the functions and interfaces to make those calls transparent; supplying the correct parameters is sufficient.

A further security level (Level 4) can be introduced at GUI level. At this level functions like copy and paste are secured in addition. This should prevent users from copying and pasting the content of text windows (editing windows) into other applications. Defining this security level is left to the GUI model definition.

The following diagram shows how several methods can be combined to achieve high security during the transmission of an XLIFF file. In this example the XLIFF file is first secured (encrypted). When the file has to be transferred over the network, the channel as such is also secured. Once the XLIFF file is received it is decoded by the OpenTMS system. From a programming point of view this is achieved simply by defining and setting the security to be used.

Fig 6: Example securing XLIFF document exchange

7.1 Security, OpenTMS and Programming Languages

In the previous chapter the issue of programming languages was discussed. A commonly known problem with programming languages (more precisely, with applications written in those languages, and often only in connection with specific operating systems) is that security measures are often not properly implemented (e.g. the very old problem of “buffer overflows” in C).

OpenTMS overcomes this problem by clearly defining specific modules which are encapsulated and follow modern software development rules (e.g. access only through well-defined interfaces). A special security layer wraps the various modules.

This architecture specification is mainly targeted at the server part of OpenTMS. Thus it is independent of any GUI application.

GUIs can use OpenTMS basically in two ways:

a) Through the OpenTMS server functionality: This approach encapsulates all modules and functions and gives the highest possible level of security. Here only “public server-side functionality” can be used.

b) By directly calling functions from the OpenTMS library: Obviously this can cause problems if the GUI does not call the functions properly (esp. in programming languages like C or C++).

One group of OpenTMS target GUIs are web-based (browser-based) applications. These will call all functionality through a web server, SOAP or XML-RPC interfaces. This minimises the danger of introducing security problems on the client side (e.g. for GUIs which have to follow requirements like ZDv 54/100 VS-NfD „IT-Sicherheit in der Bundeswehr“). By restricting the GUI to “plain HTML” one can reduce the risk to a minimum. Obviously increasing the security level goes along with a decrease in comfort and user friendliness. This decision is up to the end user and their organisation.

7.2 Communication Level

Communication which goes through TCP/IP should support (strong) encryption of the data transmitted. This is done in addition to using protocols like HTTPS, secure FTP etc.

7.3 Document Level

The basis of most activities in OpenTMS are documents.

A key problem is the transfer of XLIFF files. The content of the segments is normally readable by humans. If required, the segments in the XLIFF files (as well as in TMX or TBX files) can be encrypted (creating something like a secureXLIFF, secureTMX, secureTBX). The segments can then only be read with a valid user and password combination. The users who have regular access to the content can be stored in encrypted form in the header of the XLIFF file or be supplied when opening the XLIFF document.

7.4 Database Level

Database entries follow the same procedure. If required the entries should be encrypted. At this level database-specific security functionality can and should be applied too.

Without knowledge of the user/password combination, an export etc. of the database does not reveal any information in case of an attack.

In addition any database security layers need to be supported too.

7.5 Security Level

The following functions assume that each encryption and decryption process associates the relevant user and his roles with the security function. At this point no function parameters are defined. This will be done in an implementation manual.

Function                           Comment

Encrypt / Decrypt                  General function which encrypts and decrypts any type of document.

Encrypt XLIFF / Decrypt XLIFF      Encrypts or decrypts the texts (segments) of an XLIFF document. The XML structure as such remains visible. Depending on the parameters supplied, attributes etc. are secured too.

Encrypt TMX / Decrypt TMX          Encrypts or decrypts the texts (segments) of a TMX document. The XML structure as such remains visible. Depending on the parameters supplied, attributes etc. are secured too.

Encrypt TBX / Decrypt TBX          Encrypts or decrypts the texts (segments) of a TBX document. The XML structure as such remains visible. Depending on the parameters supplied, attributes etc. are secured too.

Establish Secure Communication     Establishes a secure communication channel. The type of security depends on the supplied parameters.

Terminate Secure Communication     Terminates a secure communication channel.

Secure Data Source                 Enables the encryption / decryption of database entries.
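To make the idea behind "Encrypt XLIFF" / "Decrypt XLIFF" more concrete, the following Java sketch encrypts only the text nodes below the source and target elements, so that the XML structure stays visible. It is a minimal illustration under stated assumptions: the class name SecureXliffSketch, the choice of AES-GCM with a PBKDF2-derived key, and the storage of the IV in front of the ciphertext are choices made for this example only. The specification itself does not prescribe an algorithm.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: only text nodes below <source> and <target> are
// encrypted; element names and attributes remain readable.
class SecureXliffSketch {
    private static final int IV_LENGTH = 12; // bytes, usual size for AES-GCM

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable XML document", e);
        }
    }

    // Derive a 128-bit AES key from the user's password (PBKDF2).
    static SecretKey deriveKey(char[] password, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, 100_000, 128);
            byte[] raw = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
            return new SecretKeySpec(raw, "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String encrypt(String plain, SecretKey key) {
        try {
            byte[] iv = new byte[IV_LENGTH];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] ct = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[IV_LENGTH + ct.length]; // IV goes in front of the ciphertext
            System.arraycopy(iv, 0, out, 0, IV_LENGTH);
            System.arraycopy(ct, 0, out, IV_LENGTH, ct.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String decrypt(String encoded, SecretKey key) {
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, in, 0, IV_LENGTH));
            byte[] plain = cipher.doFinal(in, IV_LENGTH, in.length - IV_LENGTH);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Wrong key or corrupted segment", e);
        }
    }

    // Encrypt (or decrypt) all text nodes below <source> and <target>.
    static void transformSegments(Document xliff, SecretKey key, boolean encrypting) {
        for (String tag : new String[] {"source", "target"}) {
            NodeList elements = xliff.getElementsByTagName(tag);
            for (int i = 0; i < elements.getLength(); i++) {
                transformText(elements.item(i), key, encrypting);
            }
        }
    }

    private static void transformText(Node node, SecretKey key, boolean encrypting) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String value = node.getNodeValue();
            node.setNodeValue(encrypting ? encrypt(value, key) : decrypt(value, key));
            return;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            transformText(child, key, encrypting);
        }
    }
}
```

Because inline elements such as ph are kept and only their text is replaced, the document can still be validated and routed by tools that do not hold the key. How salts, users and roles are stored (for instance in the XLIFF header) is left open here, as it is in the text above.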

8 BASIC OPENTMS COMPONENTS

The OpenTMS framework is organized around a set of basic components called models (see chapter 6), which interact and to which processes can be applied. The following is a brief overview of the basic models:

• Documents: Documents form one key feature of the architecture. Basically, a document is any form of text. Translations and other modification processes (e.g. segmentation) are applied to documents. A key document type in OpenTMS is the XLIFF document, which is the main paradigm for communicating text between various processes.

• Database: Database refers to any kind of storage which can be used to retrieve a specific text or sub-text (like a paragraph or segment). In the OpenTMS context, database is understood broadly, ranging from simple text files to highly sophisticated SQL or object-oriented database systems. OpenTMS uses a general database object which can come in various flavors, e.g. a translation memory, a phrase database or a terminology database. The OpenTMS database architecture supports various security levels. Encryption of entries should be supported. OpenTMS uses the notion of “data source” for these generalized databases.

• Processes: Processes apply operations to documents and databases. Operations could be modifying, inserting, searching, editing, converting etc. A key process in OpenTMS is the translation process. OpenTMS processes are named “Translets” (singular: Translet). An example of a Translet is a Doclet, a module which is applied for the conversion, modification etc. of documents. Processes in OpenTMS are normally accessible through the OpenTMS Scripting Language, a language which gives access to the core operations of the OpenTMS architecture (similar to JavaScript).

Fig 7: OpenTMS Objects

From a certain perspective processes can be seen as a special type of communication. Within OpenTMS three different communication types can be distinguished. Communication is used here in a broad sense.

• Command (file) based process: Here an executable is run (batch mode). Command processes use XML-based command files as input parameters.

• Function based process: Here the specific process is called as a function or method within a piece of software.

• Net (TCP/IP) based process: Here a process is run over a network (TCP/IP) using SOAP, RPC, XML-RPC or similar communication methods. The method is invoked in one process while the actual execution takes place in another process (which could be a server, a virtual machine, a thread or similar).

• Workflow: A workflow is a set of processes which are applied in a specific sequence. A workflow may also involve humans. A typical workflow could be: PM receives document to translate – determines document characteristics – computes statistics – provides offer – client accepts offer – PM determines translator – converts document for translator – sends to translator – and so on. This means that a workflow can also contain purely human actions interwoven with computer processes. In any case each human process must be mapped to a computer process.

Later in the document it is mentioned that processes can be organized in pipelines. This means that one process can take the output of another process, do some computation on this output and create a new output, which itself can form the input to another process.

9 DOCUMENT MODEL

9.1 Documents

Documents (“texts”) are a core concept in OpenTMS. Documents are normally the core interest, as documents need to be translated. Documents normally come into OpenTMS as input or leave it as output. Documents are normally processed in OpenTMS through XLIFF (chapter 9.4): they are converted into XLIFF and back. Documents come in various formats, e.g.:

• WinWord

• RTF

• Plain text

• HTML

• XML

• OpenOffice

• program texts

• resource files

• property files

• database entries

• any other common localization industry formats

• any other document type

The simplest type of document is a string, a sequence of characters. For OpenTMS processes strings are packed into XML structures, mainly a subset of XLIFF.

A key property of a document is the language associated with it, although the language may vary within the document. If a document gets translated, at least a second language is associated with it.

9.2 Character Sets

OpenTMS uses the Unicode character set for all (internal) representation purposes. This has the advantage that most of the characters used worldwide can be processed with OpenTMS. Also, most programming languages nowadays use Unicode as their internal character representation.

UTF-8 encoded text is used as the core character encoding when OpenTMS produces and delivers files which are some kind of final document (e.g. for statistics output). Deviations occur if the original character set differs.

The core library of OpenTMS contains basic functions to convert from one character set to another. In addition the kernel library should contain functions which allow the character encoding of a document to be detected.
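A minimal Java sketch of such conversion and detection functions could look as follows. The class and method names are hypothetical, and the detection shown here only answers the narrow question of whether a byte sequence is well-formed UTF-8; real encoding detection needs more heuristics.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical core library helpers for character set handling.
class CharsetSketch {
    // Re-encode bytes from one character set into another.
    static byte[] convert(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    // Very simple detection step: true if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] input) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(input));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

For example, a file delivered in ISO-8859-1 would fail the isValidUtf8 check as soon as it contains an umlaut or ß, and could then be converted with convert(bytes, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8).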

9.3 XML document handling

OpenTMS makes heavy use of XML-based standards (XLIFF, TMX, TBX). There are several good open source implementations for XML handling available (DOM model, SAX parser, JDOM, just to name a few). Obviously those functions should be used to manipulate those documents.

On top of the standard XML library functionality, functions are required to support the manipulation of the translation / localization XML standards. Those functions will also be part of the core library.

9.4 XLIFF Documents

XLIFF documents form the core document type to which most of the processes are applied (segmentation, translation etc.). XLIFF documents are created by converters. Converters take different document formats (RTF, XML, HTML etc.) and convert them to the XML-based XLIFF format (XLIFF, 2008).

The following shows a very simple example of an XLIFF document.

<?xml version="1.0" encoding="UTF-8" ?>
<xliff version="1.0">
  <file datatype="XML" original="D:\araya\test\simplexml\simplexml.xml" source-language="de" target-language="es">
    <header>
      <phase-group>
        <phase company-name="Araya" date="Sun May 11 11:29:11 CEST 2008" phase-name="1" process-name="pre-process" tool="XML2XLIFF version 2.0"/>
        <phase company-name="Araya" date="Sun May 11 11:29:11 CEST 2008" phase-name="2" process-name="Segmentation" tool="SEGMENTER version 2.0"/>
      </phase-group>
      <skl>
        <external-file href="C:\araya\skl\simplexml.xml.27120.skl"/>
        <internal-file form="mimestring">PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiID8+DQo8c2ltcGxleG1sPg0KPHNl
Z21lbnQ+JSUlMCUlJQo8L3NlZ21lbnQ+DQo8c2VnbWVudD4lJSUxJSUlCjwvc2VnbWVudD4NCjwv
c2ltcGxleG1sPg==</internal-file>
      </skl>
      <prop-group name="encoding">
        <prop prop-type="encoding">UTF-8</prop>
      </prop-group>
      <prop-group name="xmlformat">
        <prop prop-type="donotresolveentitiesfile">C:\araya\ini\edqm-ent.txt</prop>
        <prop prop-type="iniFile">c:/Araya/ini/config_simplexml.xml</prop>
      </prop-group>
      <prop-group name="specialinfo">
      </prop-group>
    </header>
    <body>
      <trans-unit approved="no" help-id="0" id="0" xml:space="preserve">
        <source xml:lang="de">Das ist ein Segment</source>
        <target xml:lang="es" xml:space="preserve"/>
        <prop-group><prop prop-type="segmentid">1067381512</prop></prop-group>
      </trans-unit>
      <trans-unit approved="no" help-id="1" id="1" xml:space="preserve">
        <source xml:lang="de">Das ist ein <ph id="0">&lt;b&gt;</ph>Segment mit<ph id="1">&lt;/b&gt;</ph> Format</source>
        <target xml:lang="es" xml:space="preserve"/>
        <prop-group><prop prop-type="segmentid">1067381512</prop></prop-group>
      </trans-unit>
    </body>
  </file>
</xliff>

Fig 8: XLIFF File (annotated parts: header of the XLIFF file, reference to an external file, internal file, properties of the XLIFF file, segments)

9.4.1 OpenTMS and Skeleton files

Skeleton files are one of the key features of XLIFF. In order to reduce the size of the content of a segment (trans-unit, source and target), most converters move the non-relevant part (e.g. format information) of an (external) document into an external representation. They then use a kind of referencing scheme to specify where parts of the text and the segment come together (mainly for back conversion). Skeleton files mainly contain the format (non-textual) part of a document. Often this part is bigger than the core text.

One can distinguish between internal and external skeleton files (also called skl files).

External skl files keep the XLIFF file small, while internal skl files create a bigger XLIFF file. With external files back conversion is more complicated, as the back converter requires the skl file. One way to overcome this problem is to use an internal skl file that is compressed and encoded appropriately.

OpenTMS supports the back conversion of a document independently of the place where it was created. Thus XLIFF files in OpenTMS normally use internal skl files. Where this is not possible or not wanted, a procedure must be supplied which allows the skl file to be reintegrated into the XLIFF file before it is transmitted to another machine, user etc.
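The compress-and-encode approach for internal skl files can be sketched in Java as shown below. This is an illustration under assumptions: the class name SkeletonCodec is hypothetical, and the combination of GZIP with MIME Base64 is one possible choice. The internal file in Fig 8 is Base64 encoded only, without compression.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for embedding a skeleton into an XLIFF <internal-file>.
class SkeletonCodec {
    // Compress the skeleton text and encode it so that it is safe as element content.
    static String pack(String skeleton) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(skeleton.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getMimeEncoder().encodeToString(buffer.toByteArray());
    }

    // Reverse of pack: decode, decompress and return the original skeleton text.
    static String unpack(String packed) {
        byte[] compressed = Base64.getMimeDecoder().decode(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Compression pays off mainly for large skeletons with repetitive markup; for very small skeletons the GZIP header can make the packed form longer than plain Base64.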

9.4.2 Security and encryption in XLIFF – secureXLIFF

As described in the section about security, XLIFF documents must follow the security architecture of OpenTMS. XLIFF documents are a potential security threat. If they are transmitted via the web or by another transport method (USB stick etc.), other persons may read the XLIFF document. In order to prevent access by unauthorized users it is proposed to encrypt the relevant parts (esp. the source and target elements) of the document. Only specified users with the correct password will gain access to the content of the XLIFF document through an editor or similar. XLIFF editors reading the file must support the OpenTMS security layer. Using such a security approach one could also forbid copy and paste etc. for a given XLIFF document.

Annotation: Obviously an open source encryption method should be used.

Using a secureXLIFF may be a good argument for industrial users to adopt the OpenTMS concept and architecture.

9.5 TMX Documents

TMX documents form the core document type to which database operations apply (fuzzy search, word-based search etc.). TMX documents, or rather their entries, are stored in databases. Converters take different translation memory exchange formats (Trados etc.) and convert them to the XML-based TMX format (TMX, 2008).

Databases store the TMX entries. While there is no problem with the meta information associated with each TMX entry (tu), the global TMX document meta information creates a problem. As databases are organized around entries, this meta information must be stored in separate tables and referenced by each entry.

TMX files are normally imported into databases to support high access speed¹.

9.5.1 Security and encryption in TMX – secureTMX

The same security architecture as for XLIFF should be applied to TMX.

9.6 TBX Documents

TBX documents form the core document type for terminology data. TBX documents are imported into an OpenTMS database. TMX and TBX documents are internally stored in the same entry structure. They can be distinguished by specific markers.

The reason for storing both TMX and TBX documents in the same type of database is that this allows both kinds of data to be reused in similar situations. Obviously the database functions need to support reading and writing the entries given the context. Thus an (originally) TBX entry may be used as a TMX entry (translation memory match) in one context, while a TMX entry could be used as a terminology match in another context. This internally identical handling should not imply that both entry types are the same, but reality shows that usage patterns often require that they can be used interchangeably.

9.6.1 Security and encryption in TBX – secureTBX

The same security architecture as for XLIFF should be applied to TBX.

¹ A key question is whether OpenTMS should also allow direct access to TMX files (like Star text files) without the need to import them into a database. The advantage would be that, esp. for small TMX files, there is no real need to store them in a database. It would also not require any database drivers; XML access functions would be sufficient. One could see this as a special type of database.

9.7 Other Documents

OpenTMS is required to process all other types of documents. Once those files are brought into the OpenTMS system they are converted to XLIFF (except in the cases discussed above). Once processed, those XLIFF documents are converted back to their original format.

Ideally OpenTMS should contain or interact with a CMS system which provides a convenient way of storing all kinds of documents. Interfaces to CMS will be defined, although the implementation of the interface is not part of the OpenTMS implementation. See chapter 18.

9.8 Basic Document Access Functionality

In the following some basic XLIFF file functions are described. Those functions should go into the core library of OpenTMS. They are by far not exhaustive. A more detailed function library for XLIFF will be defined later. Although most of the functions can be realised using DOM functionality, a function library which makes it easy to handle XLIFF files should be provided.

As the functions will involve complex parameter combinations, the parameters will be supplied as XML constructs. For performance reasons one will not actually supply flat XML files, but an in-memory version of the XML file (nodes etc.).

Basic Translation Functions for XLIFF documents

Function               Comment

Convert Document       Converts a given document to XLIFF

Backconvert Document   Back converts a given document from XLIFF

CreateXLIFFDocument    Creates an empty XLIFF document. This function may be questionable, as XLIFF documents normally just have a temporary status. They normally come into existence through a converter call. Nevertheless such a function may be helpful. Pure to text conversion can be achieved anyway.

GetProperties          Retrieves the (general) properties of the XLIFF document

SetProperties          Sets the (general) properties of the XLIFF document

Segment                Segments the XLIFF document based on some SRX rules (configuration file)

AddTransUnit           Adds a new TransUnit at a certain position. This function also depends on the original format. Depending on the format this function may cause problems in the back conversion process.

RetrieveTransUnit      Retrieves a segment of the XLIFF document; this includes all the information of the segment (thus the whole trans-unit is retrieved)

RemoveTransUnit        Removes a TransUnit; here one could distinguish between executing the operation immediately (and therefore permanently) or just making the change in memory and saving the changes later.

ModifyTransUnit        Modifies a TransUnit; here one could distinguish between executing the operation immediately (and therefore permanently) or just making the change in memory and saving the changes later.

TranslateTransUnit     The TransUnit is translated based on some parameters supplied. This can include TM translation, term translation or machine translation or basically any other kind of translation or invocation.

SplitTransUnit         Splits the source part of a TransUnit. Care has to be taken with regard to validity.

CombineTransUnit       Combines the source parts of a TransUnit. Care has to be taken with regard to validity.

SaveDocument           Saves the XLIFF document

GetStatistics          Returns some statistics of the translation process (GMX based)

Fig 9: Some basic XLIFF File functions
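Two of the functions listed in Fig 9 can be sketched on top of the standard Java DOM API as follows. The class name XliffAccessSketch, the method signatures and the use of plain arguments instead of XML parameter constructs are simplifying assumptions for this example; they do not describe the OpenTMS core library itself.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical convenience layer over DOM for XLIFF documents.
class XliffAccessSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable XML document", e);
        }
    }

    // RetrieveTransUnit: return the whole trans-unit element with the given id.
    static Optional<Element> retrieveTransUnit(Document xliff, String id) {
        NodeList units = xliff.getElementsByTagName("trans-unit");
        for (int i = 0; i < units.getLength(); i++) {
            Element unit = (Element) units.item(i);
            if (id.equals(unit.getAttribute("id"))) {
                return Optional.of(unit);
            }
        }
        return Optional.empty();
    }

    // ModifyTransUnit (in-memory variant): set the target text of a trans-unit.
    // Returns false if no trans-unit with this id exists.
    static boolean setTargetText(Document xliff, String id, String translation) {
        Optional<Element> unit = retrieveTransUnit(xliff, id);
        if (!unit.isPresent()) {
            return false;
        }
        NodeList targets = unit.get().getElementsByTagName("target");
        Element target;
        if (targets.getLength() > 0) {
            target = (Element) targets.item(0); // first target, nested ones are ignored
        } else {
            target = xliff.createElement("target");
            unit.get().appendChild(target);
        }
        target.setTextContent(translation);
        return true;
    }
}
```

The change is made in memory only, which corresponds to the second variant mentioned for ModifyTransUnit; a separate SaveDocument step would write it out.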

10 OPENTMS AS A CLIENT/SERVER ARCHITECTURE

The kernel OpenTMS architecture is based on the client/server principle. Using a client/server architecture brings many advantages, one of the most critical being that processes can be spread over several computers, or over several threads in modern operating systems and hardware architectures. This does not imply that the OpenTMS architecture can only be implemented on a client/server basis. All the processes (Translets) can also run in a single-user environment (e.g. through a procedure call within an editor). But by using a client/server framework one avoids having to re-program or re-implement a piece of software which was designed to run in a single-threaded environment only. From an implementation point of view this concerns, for example, the use of global or static variables.

Each procedure developed for OpenTMS should be designed with multi-threading in mind. Each procedure should be encapsulated in such a way that it can be surrounded by a process wrapper which allows it either to run as a thread in the same software or computer environment or to be distributed over several computers. This means that “globally defined variables” should be avoided as far as possible. As described before, the key functions are implemented in the OpenTMS core library.

All (main) procedures should also be written in such a way that they can easily be called by the OpenTMS scripting language.
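The following Java sketch illustrates what a procedure without global state and a simple process wrapper might look like. The Translet interface shown here and the TransletRunner class are hypothetical simplifications: a real Translet would take OpenTMS parameters rather than a plain string, and the wrapper could equally dispatch to another machine instead of a thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical Translet interface: all state comes in through the argument,
// nothing is kept in global or static variables.
interface Translet {
    String apply(String input);
}

class TransletRunner {
    // Process wrapper: runs one Translet over many inputs on a thread pool.
    static List<String> runAll(Translet translet, List<String> inputs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String input : inputs) {
                futures.add(pool.submit(() -> translet.apply(input)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // results keep the order of the inputs
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the Translet does not share mutable state, the same implementation can be called directly from an editor in a single-user setup or wrapped as shown here without modification.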

Fig 10: Hierarchy of processes

Processes have to adhere to the security concept of OpenTMS. Processes can only be executed if they (and the user associated with the process) have appropriate rights (gained through the security model). This applies especially to processes which use network connections.

Fig 11: Applications

Most of the processes are based on XLIFF exchange (thinking in terms of functions and variables, this means that the parameters of functions are XLIFF documents or substructures of XLIFF). The processes thus mainly operate on XLIFF-based XML structures, which they add to or modify. In principle the operations should be non-destructive, i.e. information is not deleted or removed but only added. In some cases this cannot be fully maintained: e.g. if a translator modifies a translation (in a destructive way), the older information is lost. The same may apply to database entries. This also depends on the use of a proper versioning system. As a consequence of using XLIFF-related structures internally, conversions to related XML-based formats like TMX, TBX etc. must be supported. This can be realized by attaching import and export procedures to the OpenTMS kernel.

Exceptions are, for example, converters which take a document in an arbitrary format as input and produce an XLIFF document. The same applies to back conversion.

Please note that the above figure also represents some kind of workflow. Basic

workflows can be part of the OpenTMS architecture (e.g. each process applying

changes to an XLIFF document should document this in the XLIFF header). But it

is not intended that OpenTMS as such comes with its own workflow solution. More

complex workflow procedures should be modeled either using proprietary or open

source software.

OpenTMS also follows the “old style” of UNIX pipelining. Processes (see chapter about process model) take an input and produce an output. The next process takes the output of the previous process, applies some further transformation to it and creates new output. Nevertheless there is a difference. As parameters can become quite complex, the UNIX style of interpreting the input just as “a string” is opened up here to support input and output in the form of the parameters described before.
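A minimal Python sketch of such a pipeline follows. The element layout is a strongly simplified stand-in for XLIFF (no namespaces, a bare phase element in the header), and the function names pretranslate, log_step and run_pipeline are hypothetical. Each step copies its input and only adds information, in line with the non-destructive principle.

```python
import copy
import xml.etree.ElementTree as ET


def pretranslate(doc, memory):
    """Adds a <target> to each <trans-unit> whose source text is in `memory`."""
    out = copy.deepcopy(doc)  # the input document stays untouched
    for unit in list(out.iter("trans-unit")):
        source = unit.findtext("source")
        if source in memory and unit.find("target") is None:
            ET.SubElement(unit, "target").text = memory[source]
    return out


def log_step(doc, name):
    """Documents a processing step in the header (simplified, only adds)."""
    out = copy.deepcopy(doc)
    ET.SubElement(out.find("header"), "phase", {"process-name": name})
    return out


def run_pipeline(doc, translets):
    """Feeds the output of each Translet into the next one."""
    for translet in translets:
        doc = translet(doc)
    return doc


xliff = ET.fromstring(
    "<file><header/><body>"
    "<trans-unit id='1'><source>Save</source></trans-unit>"
    "</body></file>"
)
result = run_pipeline(xliff, [
    lambda d: pretranslate(d, {"Save": "Speichern"}),
    lambda d: log_step(d, "pretranslate"),
])
```

Unlike a UNIX pipe, the value passed between the steps is a structured document rather than a plain string.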

Fig 12: Pipeline Architecture

Figure 12 shows a typical pipelining of several processes (Translets) during a translation process. OpenTMS differentiates between two basic types of Translets.

• Human Initiated Translets: These are Translets which are invoked and (fully) controlled by humans. Examples are a Translation Editor or operations which insert or update entries in a database.

• Automated Translets: These are processes which normally run automatically and do not require human interaction. Examples are the steps conversion – segmentation – pre-translation. Automated procedures (e.g. pre-translating a project, i.e. Translets applied to a set of documents) also have to be mentioned here.

11 DATA MODEL

11.1 Data sources

Data (mostly databases) are modeled through data sources. Data sources are the basic objects which allow access to all kinds of data, especially databases. Data sources mainly store segments from TMX files or TBX entries. Data sources are XML oriented, that is, depending on the XML document supplied, a data source converts the entry in such a way that it can be transferred to a data component.

Fig 13: Data sources and data components

Why not refer to databases directly? The basic idea behind using a data source as the core data object in OpenTMS (representing databases etc.) is that such a layer between the real databases (e.g. MySQL) and the OpenTMS software makes adding new types of data quite easy. The various types of data are referred to as data components. Thus an SQL database is a data component, but a TMX file can also be seen as a data component if the relevant access operations are supported. Similarly, an Excel file can be considered as a data source. Using this approach, OpenTMS is not restricted to SQL databases but can also use flat files, spreadsheets etc. It can also support direct access to vendor-specific databases or systems. A server-side installation of OpenTMS can also act as a data source.

Fig 14: Data sources with several data components

[Figure 14 labels: OpenTMS software; OpenTMS DataSourceLayer; Data type specific access functions; Maps the OpenTMS access functions to the specific data component; Access to data sources through standardised interface; Various data components like files etc.]

A data component which is connected through a data source must support a core functionality. This core functionality is divided into three types of functions (methods):

• Read methods: These comprise all functions retrieving data from a data component. Read methods also map the results to the format the caller needs (e.g. TBX or TMX).

• Write methods: These comprise all functions writing, updating and deleting data in a data component. Write methods also take into account which input format is used (e.g. TMX or TBX) and convert it into the internal data source format.

• Select methods: These methods are part of the read methods and allow specific entries to be selected from the data source.

The chosen security level has to be taken into account. Depending on the level, the data have to be encrypted and decrypted.

Two types of data components can be distinguished:

• Read only data components: This type of component can only retrieve

data, but not store data. An example could be if a plain TMX file is used as

data component.

• Full data components: Here both read and write methods are supported.

Depending on the user configuration, data components can be configured to behave differently. A component can appear as a read only data component for one user, while for another user it is accessible as a full data component.
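The layering of data source and data component can be sketched in Python as follows. The classes MemoryComponent and DataSource, and the way write permission is configured per user, are assumptions made for illustration only; the document does not prescribe this interface.

```python
class MemoryComponent:
    """Hypothetical full data component: supports read and write methods."""

    def __init__(self):
        self.entries = {}

    def read(self, key):
        return self.entries.get(key)

    def write(self, key, value):
        self.entries[key] = value


class DataSource:
    """Hypothetical data source layer: one interface in front of any component."""

    def __init__(self, component, writers=()):
        self.component = component
        self.writers = {name.lower() for name in writers}

    def read(self, user, key):
        return self.component.read(key)

    def write(self, user, key, value):
        # The same component is read only for users not configured as writers.
        if user.lower() not in self.writers or not hasattr(self.component, "write"):
            raise PermissionError(f"{user} has read-only access")
        self.component.write(key, value)


source = DataSource(MemoryComponent(), writers=["klemens"])
source.write("Klemens", "Save", "Speichern")
```

A component without a write method (e.g. a wrapper around a plain TMX file) would behave as a read only data component for every user.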

11.2 TM Matches

OpenTMS differentiates between three types of matches:

• Perfect Match: This is a match where the segment to be searched matches the segment in the TM both with regard to the text content and the format.

• Exact Match: In this case only the text part of the segment matches the database entry perfectly; the format information differs.

• Fuzzy Match: In this case there are some deviations between the search

segment and the match in the TM. The difference is usually stated in %

values. This type of match is also often called inexact match.

One may consider other types of matches in the future too, e.g. replacement class matches where only the “blank characters (white spaces)” differ. For this see also chapter 12.3.
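The three match types can be sketched in Python as below. This is an illustration under stated assumptions: format information is taken to be anything that looks like an inline tag, similarity is measured with difflib.SequenceMatcher from the standard library as a stand-in for whatever measure OpenTMS uses, and the 70% threshold is hypothetical.

```python
import re
from difflib import SequenceMatcher

TAG = re.compile(r"<[^>]+>")  # assumption: format information is carried by tags


def classify(search, stored, threshold=70):
    """Returns (match type, similarity in %) for a search segment and a TM entry."""
    if search == stored:
        return ("perfect", 100)          # text and format agree
    search_text, stored_text = TAG.sub("", search), TAG.sub("", stored)
    if search_text == stored_text:
        return ("exact", 100)            # text agrees, format differs
    score = round(SequenceMatcher(None, search_text, stored_text).ratio() * 100)
    return ("fuzzy", score) if score >= threshold else ("none", score)


perfect = classify("Press <b>OK</b>", "Press <b>OK</b>")
exact = classify("Press <b>OK</b>", "Press <i>OK</i>")
fuzzy = classify("Press OK now", "Press OK")
```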

11.3 Basic data source access functionality

The following (read and write) access functions are the core functions needed. Access results in matches. A basic idea is that the function decides, based on the input supplied, how the entry is interpreted and written into the database. This means that TMX entries are handled differently from TBX entries etc.

Please note that in the description of the functions no explicit reference is made to the security model. It is assumed that the security level is set before, or together with, the invocation of the database function.

Access Type | Comment
Exact Access | A given entry is found by the “string=segment” supplied, independently of the format.
Exact Format Access | A given entry is found by the “string” supplied, taking format information into account.
Fuzzy Access | A given entry is found by using a similarity search. Similarity is measured in %, where 100% is identical to an exact access.
Fuzzy Format Access | A given entry is found by using a similarity search, taking the format into account. Similarity is measured in %, where 100% is identical to an exact format access.
Word Based Access | A search is done by splitting the string into individual words. The word identification is language dependent. The words can be searched using either OR or AND (see footnote 2). Word based access could be enhanced by supporting stemming (e.g. Porter stemming algorithm).
Regular Expression Access | A regular expression is used to retrieve the result set. Such a function is quite resource consuming.
Sub segment Access | Segments are retrieved based on some sub segments of a given search string. This could be seen as a more specialized form of the regular expression search or word based search. This type of search is especially important if a segment actually represents a paragraph and may contain several sentences.

Fig 15: Data source access types

Footnote 2: It is suggested to use a logical representation of the query similar to Google (www.google.com). Here + denotes “word must exist”, while – denotes that the word is not allowed to exist in the result set.
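A word based access with the suggested + and - prefixes could look like the following Python sketch. Splitting on whitespace is a simplification (word identification is language dependent), and treating unprefixed words as an OR group is an assumption; parse_query and word_search are hypothetical names.

```python
def parse_query(query):
    """Splits a query into required (+), excluded (-) and optional words."""
    required, excluded, optional = set(), set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            optional.add(token)
    return required, excluded, optional


def word_search(query, segments):
    """Returns the segments satisfying the query, in their original order."""
    required, excluded, optional = parse_query(query)
    hits = []
    for segment in segments:
        words = set(segment.lower().split())  # naive word identification
        if not required <= words or excluded & words:
            continue
        if optional and not required and not optional & words:
            continue
        hits.append(segment)
    return hits


memory = ["Open the file", "Save the file", "Delete the file", "Close the window"]
with_prefixes = word_search("+file -delete", memory)
or_search = word_search("open close", memory)
```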

Access Functions for TM and TBX data | Comment
RetrieveTMMatch | Gets a match from the Translation Memory. The actual result depends on the data source access type chosen. Parameters involve match quality etc.
RetrieveTBXMatch | Gets a TBX match from the terminology database. The actual result depends on the data source access type chosen.
AddEntry | A generic function adding data (e.g. TMX entries) to data sources. The function is generic in the sense that it decides, based on the type of the XML document to be added, how the entry is stored (TMX, TBX etc.).
CreateEntry | Creates an empty data source entry of a specific type.
AddTMEntry | Adds a TM entry; a specialization of AddEntry.
AddTBXEntry | Adds a TBX entry; a specialization of AddEntry.
RemoveEntry | A generic function removing data (e.g. TMX entries) from data sources. The function is generic in the sense that it decides, based on the type of the XML document supplied, how the entry is handled (TMX, TBX etc.).
ModifyEntry | A generic function modifying data (e.g. TMX entries) in data sources. The function is generic in the sense that it decides, based on the type of the XML document supplied, how the entry is handled (TMX, TBX etc.).
CopyEntry | A generic function copying data (e.g. TMX entries) to data sources. The function is generic in the sense that it decides, based on the type of the XML document supplied, how the entry is stored (TMX, TBX etc.).

Fig 16: Data source access types
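The generic behaviour of AddEntry can be sketched in Python as a dispatch on the root element of the XML fragment supplied. The sketch assumes that a TMX entry arrives as a tu element and a TBX entry as a termEntry element; the class and method names are hypothetical and only mirror the function names in the table.

```python
import xml.etree.ElementTree as ET


class EntryStore:
    """Hypothetical data source keeping TM and term entries in memory."""

    def __init__(self):
        self.tm_entries = []
        self.tbx_entries = []

    def add_tm_entry(self, element):
        self.tm_entries.append(element)

    def add_tbx_entry(self, element):
        self.tbx_entries.append(element)

    def add_entry(self, xml_text):
        """Generic add: the root element decides which specialization is used."""
        element = ET.fromstring(xml_text)
        handlers = {"tu": self.add_tm_entry, "termEntry": self.add_tbx_entry}
        handler = handlers.get(element.tag)
        if handler is None:
            raise ValueError(f"unsupported entry type: {element.tag}")
        handler(element)
        return element.tag


store = EntryStore()
first = store.add_entry("<tu><tuv><seg>Save</seg></tuv></tu>")
second = store.add_entry("<termEntry id='t1'/>")
```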

11.4 Databases

A key principle of the OpenTMS architecture is its independence from database

products. OpenTMS defines a core subset of access functions (based on SQL)

which can be implemented by nearly all database systems.

The following gives a non-exhaustive list of database types which should be supported (see footnote 3).

11.4.1 Open source SQL data bases

• MySQL - www.mysql.de

• Postgres - www.postgresql.org

• H2 - www.h2database.com

• Cloudscape - www.ibm.com/software/data/cloudscape (IBM)

• …

11.4.2 Closed source SQL databases

• SQL Server (different flavors) -

www.microsoft.com/germany/sql/default.mspx

• Oracle - www.oracle.com

• …

11.4.3 Alternatives

SQL databases are not the only databases out there. Other database formats could be:

• Spreadsheets (like SQL)

• Object oriented databases

• XML database systems (e.g. XINDICE)

• Plain text files

Footnote 3: A key question at this point is whether OpenTMS should implement something like an “internal database”, which would just mean storing the database as “simple hash tables” which can be serialised and de-serialised. See also the discussion of TMX documents (Footnote 1). Alternatively the internal database could just consist of an XML file.

11.4.4 Database Access

Internally all main access functions of OpenTMS are based on specific objects (see chapter 12, Translation Objects) and all access happens through these objects. This additional abstraction level (interfaces, as they are called in most programming languages nowadays) makes OpenTMS independent even of SQL and keeps it open for future advances in database development.

All access functions are mapped to SQL statements (or their equivalents) which are not hardcoded but stored in XML database configuration files.

Up to this point there is no real necessity to realize the database in SQL only. The advantages of using SQL as the language describing the access functions are that it is a) widespread and b) standardized.

Fig 17: Configuring different database types
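How access functions might be resolved from an XML configuration instead of hardcoded SQL is sketched below in Python, using the standard library sqlite3 module as a stand-in database. The database and function elements, the function names and the table layout are assumptions for illustration; only the command element with its step attribute is taken from this document.

```python
import sqlite3
import xml.etree.ElementTree as ET

CONFIG = """
<database>
  <function name="CreateTables">
    <command step="1">DROP TABLE IF EXISTS MONO</command>
    <command step="2">CREATE TABLE MONO (id INTEGER PRIMARY KEY, lang TEXT, segment TEXT)</command>
  </function>
  <function name="AddEntry">
    <command step="1">INSERT INTO MONO (lang, segment) VALUES (?, ?)</command>
  </function>
  <function name="ExactAccess">
    <command step="1">SELECT segment FROM MONO WHERE lang = ? AND segment = ?</command>
  </function>
</database>
"""


def load_functions(config_xml):
    """Maps each access function name to its SQL statements, ordered by step."""
    functions = {}
    for function in ET.fromstring(config_xml).iter("function"):
        commands = sorted(function.iter("command"), key=lambda c: int(c.get("step")))
        functions[function.get("name")] = [c.text for c in commands]
    return functions


def call(connection, functions, name, params=()):
    """Runs all statements of one access function; returns the last result set."""
    rows = []
    for statement in functions[name]:
        rows = connection.execute(statement, params).fetchall()
    return rows


functions = load_functions(CONFIG)
connection = sqlite3.connect(":memory:")
call(connection, functions, "CreateTables")
call(connection, functions, "AddEntry", ("de", "Speichern"))
found = call(connection, functions, "ExactAccess", ("de", "Speichern"))
```

Supporting another SQL dialect would then only require a different configuration file, not a change to the calling code.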

11.4.5 Database and data source configuration

As OpenTMS needs to support many different database / data source types, adding a new database type should not require changing the data source code kernel.

Therefore, for each data source type a configuration file defines the main parameters of the database. Depending on the security requirements, the configuration file can be secured using the security model functions for documents. The configuration includes:

• Database class driver – e.g. com.mysql.jdbc.Driver

• Connection String – e.g. jdbc:mysql:

• Any other connection string specific commands (e.g. buffer size)

• Commit support

• Unicode support

• Server Address

• Port

• User (encrypted)

• Password (encrypted)

• Mapping of OpenTMS database access functions to database specific access code (e.g. SQL code like <command step="1">DROP TABLE IF EXISTS MONO</command>). The commands can be organized in groups if a specific functionality requires several database functions to be run (e.g. creating all the necessary tables for a new database). This is mainly important for SQL databases, as the supported SQL dialects vary.

• Reference to code (e.g. jar file, dll etc.) if a specific function needs to run at a specific point in time (e.g. when creating a new database). This should make it possible to inject specific implementation code for specific tasks (e.g. if some functionality cannot be executed through SQL commands).

In addition a more generic interface can be called if a database cannot be inte-

grated with the configuration file specifications above. In this case the whole inter-

face for the new database needs to be implemented and made available to

OpenTMS.

12 TRANSLATION OBJECTS

Translations are a key entity in the translation process. Translations (inherently multilingual) usually consist of segments (monolingual) and languages associated with those segments.

As a consequence the architecture uses three types of language related entities. These objects are used by processes to create the translation functionality.

A “General Linguistic Object” (GLO) contains information (features, attributes) which is common to all linguistic information types. Examples are: unique id, creation and modification dates, authors etc. Linguistic Objects can always be serialized to XML. The main supported formats are XLIFF, TMX and TBX.

From that object two objects are derived:

• A “Monolingual Object” (MoLO) which represents a linguistic entity for a given language. It inherits all the features of the GLO and adds, for example, the language of the entity (segment).

• A “Multilingual Object” (MuLO) represents translations by linking one or more MoLOs into one object. A MuLO consists of at least one MoLO and can contain up to n MoLOs. It is not required that each MoLO of a MuLO has a different language (see footnote 4).

Each of those object types contains a unique id. In addition, a MoLO carries a MuLO related id so that it can easily be associated with its translations.

Footnote 4: The behaviour of multilingual objects can be configured. One option is to treat all entries as bilingual objects only. Thus one MuLO would contain only two MoLOs, a source and a target MoLO. Options like this should normally be used with caution as they introduce problems in managing real multilingual databases. This is especially true if one source segment may have several translations (target MoLOs). Nevertheless there may be cases where several translations are required for a source segment, e.g. something like a temporary translation. In this case it is suggested to associate “status attributes” with the MoLO. These could then be used on the one hand as a sorting criterion for matches and on the other hand for identifying problem translations.

Attributes are associated with Linguistic Objects. As several standards are used (TMX, XLIFF and TBX), a mapping of the attributes between the different types is required. Within the object the attributes may be identified through their namespace.

Fig 18: Representation of linguistic entities as General Linguistic Object
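The relation between the three object types described in this chapter can be sketched with Python dataclasses. The field names (uid, author, created, language, segment, mulo_id) and the helper methods are hypothetical; only the inheritance structure and the MuLO related id follow the description.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class GeneralLinguisticObject:
    """Features common to all linguistic objects (GLO)."""
    author: str = ""
    uid: str = field(default_factory=lambda: uuid.uuid4().hex)
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class MonolingualObject(GeneralLinguisticObject):
    """A segment in one language (MoLO)."""
    language: str = ""
    segment: str = ""
    mulo_id: Optional[str] = None  # id of the MuLO this MoLO belongs to


@dataclass
class MultilingualObject(GeneralLinguisticObject):
    """Links one or more MoLOs into one translation object (MuLO)."""
    molos: List[MonolingualObject] = field(default_factory=list)

    def add(self, molo):
        molo.mulo_id = self.uid
        self.molos.append(molo)

    def segments(self, language):
        return [m.segment for m in self.molos if m.language == language]


mulo = MultilingualObject(author="klemens")
mulo.add(MonolingualObject(language="en", segment="Save"))
mulo.add(MonolingualObject(language="de", segment="Speichern"))
mulo.add(MonolingualObject(language="de", segment="Sichern"))
```

The last two lines show that a MuLO may hold several MoLOs of the same language.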

12.1 Format information

Format information (e.g. transported through the <ph> tag in XLIFF) and its correct handling is a key kernel function of OpenTMS. The core OpenTMS library contains all the necessary functions to handle format information correctly. OpenTMS should aim at providing the highest possible support in format handling.

12.2 Terminology versus Translation Memory

Within computational linguistics a key distinction is made between terminology and translation memory. The two concepts are clearly used in different contexts. This is also reflected in the fact that there are (at least) two standards: TMX (TMX, 2008) and TBX (TBX, 2008). Nevertheless, from a conceptual and software engineering point of view the two concepts share more than separates them. Both have “strings” as their basic representation, either as terms or as segments, and their meta information also matches in most cases. A main difference is their usage context. TMs are normally applied at segment level (and normally consist of more characters), while terms are used at sub-segment (word, phrase) level.

As these differences only appear at the usage level, OpenTMS consequently implements the same underlying (database) structure for TM and term entries. Using special markers, a distinction can be made at run time (= usage time). The immediate advantage of this approach is that both concepts can be used in different usage contexts. Search and retrieval functionality is available for both concepts (e.g. fuzzy search is rarely available for term databases; with a common internal representation this drawback is overcome).

Fig 19: Conversions of linguistic entities

12.3 Variables, placeholders, replacement classes

Translation memory entries, sometimes also terminology entries, often contain

textual parts which can act as placeholders. Typical examples of placeholders are

numbers, month names, acronyms etc. In many cases it is possible to automatically replace those “variable parts” with their actual counterpart in a segment. This is especially useful in matching, e.g. by just replacing the numbers in a match with their correct values to achieve a better match, or even a perfect match.

For this reason OpenTMS supports the concept of replacement classes. A replacement class is a specific construct which generalizes a certain type of string or information. A replacement class basically consists of the following parts:

• A class name (e.g. number)

• A procedure describing the replacement class. In many cases the procedure can be defined through a regular expression. Another option may be that specific strings (e.g. terms from a terminology database) act as a replacement class.

• A procedure may be language dependent. If a procedure is language dependent, transformation rules have to be defined for how a value in language A is transformed into language B.

Example:

Class: GeneralNumber
Procedures:
General: Definition: ([0-9]+)(\.)([0-9]+) Transform: $1.$3
German: Definition: ([0-9]+)(,)([0-9]+) Transform: $1,$3

The basic idea is that a language specific procedure involves two parts:

• a definition part which describes how to detect (evaluate) an instance of a

replacement class

• a transformation part which describes how to compute the instance of a

replacement class given that a replacement class has been detected (e.g.

in another language)
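The interplay of the definition part and the transformation part can be sketched in Python. The sketch assumes greedy quantifiers and a transform that combines the first and third capture groups (integer and fractional part); Python writes group references as \1 and \3 rather than $1 and $3. The dictionary layout and the function convert are hypothetical.

```python
import re

# Hypothetical representation of the GeneralNumber replacement class.
GENERAL_NUMBER = {
    "general": {"definition": r"([0-9]+)(\.)([0-9]+)", "transform": r"\1.\3"},
    "german": {"definition": r"([0-9]+)(,)([0-9]+)", "transform": r"\1,\3"},
}


def convert(text, source_lang, target_lang, procedures=GENERAL_NUMBER):
    """Detects instances with the source definition and rewrites them
    with the transform of the target language."""
    definition = procedures[source_lang]["definition"]
    transform = procedures[target_lang]["transform"]
    return re.sub(definition, transform, text)


to_german = convert("Width: 3.14 cm", "general", "german")
to_general = convert("Breite: 3,14 cm", "german", "general")
```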

When a replacement class matches part of a segment, the matching part is replaced with the replacement class, carrying forward the class name and the original value.

Replacement classes pose two main challenges:

• A key problem in defining replacement classes is the order in which they are invoked (checked). Depending on the definition of the regular expressions, several expressions may match (e.g. numbers without and with decimal points). OpenTMS should apply a strict linear order procedure. The first matching expression is applied and used.

• The other key problem is checking whether all the replacement classes appear a) in both the source and the target of the match and b) in the source segment (the one which requires translation). For OpenTMS the proposed solution is that the replacement classes in source and target have to match exactly. If this is the case, the replacement classes also have to match those of the source segment to be translated. It has to be noted that another approach could be used too: removing the non-matching replacement classes in all three strings involved.
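Both challenges can be illustrated with a short Python sketch. The class list, the {name} placeholder syntax and the function names are assumptions made for the example; the point is the strict linear order (the decimal class is tried before the integer class) and the comparison of the classes found in all three strings.

```python
import re

# Checked strictly in this order: the first matching class wins.
CLASSES = [
    ("decimal", re.compile(r"[0-9]+\.[0-9]+")),
    ("integer", re.compile(r"[0-9]+")),
]


def generalize(segment):
    """Replaces class instances by {name}; returns the text and the values found."""
    out, values, pos = [], [], 0
    while pos < len(segment):
        for name, pattern in CLASSES:
            match = pattern.match(segment, pos)
            if match:
                values.append((name, match.group()))
                out.append("{" + name + "}")
                pos = match.end()
                break
        else:  # no class matched at this position: keep the character
            out.append(segment[pos])
            pos += 1
    return "".join(out), values


def class_names(segment):
    return sorted(name for name, _ in generalize(segment)[1])


def classes_agree(tm_source, tm_target, new_segment):
    """True if all three strings contain exactly the same replacement classes."""
    return class_names(tm_source) == class_names(tm_target) == class_names(new_segment)


text, values = generalize("Use 2.5 l for 3 cans")
```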

13 PROCESS MODEL

13.1 OpenTMS Process

An OpenTMS process realizes the functionality of the OpenTMS system, mainly supporting the translation process. Examples of processes are converters, segmenters, translation memories, machine translation, statistics modules etc.

OpenTMS processes build on the core library functions and move them into a process environment. In many cases this does not mean that a process is created in the strict sense of the word; it could also mean that a function of the core library (or any other function defined in another OpenTMS context) is called from an application.

13.2 OpenTMS Scripting Language

Most OpenTMS processes are available through the OpenTMS Scripting Lan-

guage (OpenTMSL). The OpenTMS Scripting language enables developers and

users to write their own scripts to adapt the OpenTMS processes to their needs.

OpenTMSL is defined in a programming language independent way and should be

implemented in different programming languages. It basically makes the functions

defined in the core library accessible to the public through an easy to learn script-

ing language.

Fig 20: OpenTMS Scripting Language

OpenTMSL itself is defined within an OpenTMSL XML document and can be read by different XML parsers. Reference implementations should be provided in Java, Perl, C# etc.

Fig 21: OpenTMSL Inter-process and computer communication

OpenTMSL also supports multithreading. It takes a procedure (see chapter about Basic Architecture) and enriches it with multithreading capabilities as well as interprocess communication capabilities. This requires that the procedures interpreted and executed by the scripting language can be run in such an environment.

OpenTMSL is designed in such a way that it can communicate with other

OpenTMSL instances on the same or other machines. Running different

OpenTMSL engines on the same machine should enhance reliability and scalabil-

ity of the overall system. One might think that one OpenTMSL engine (instance) is

dedicated towards TM translation where another OpenTMSL instance is dedicated

towards MT translations. This approach should also support the option to run older

software (e.g. old MT system; legacy software) in a special environment so that

nevertheless the overall goals of FOLT - exchange of information through stan-

dards - are met.

13.3 OpenTMSL Communication Methods

OpenTMS supports the following communication methods:

• XML-RPC Interface

• SOAP

• HTTP Interface

• Servlet Implementations

• Batch File Processing

14 USER MODEL

Users, either human users or processes, are key components of OpenTMS. Whenever a process attaches to an OpenTMS instance, an OpenTMS user name is attached to the process.

Normally the user login is used to identify the OpenTMS user name. If OpenTMS runs as a server process, the OpenTMS user name is assigned at service start time.

OpenTMS user names are case insensitive.

An OpenTMS user basically consists of a user name (together with one or more aliases), a password (or any other secure identification method), a set of rights, a set of roles and a set of groups the user belongs to.

Rights are the usual rights like read, write and delete. Most operating systems provide their own user rights system. OpenTMS reuses those rights systems.

If a user uses several machines, aliases can be defined to allow identification across machines. See chapter 17 (Configuring OpenTMS).

Depending on the security requirements, the user model configuration files can be secured using the security model functions for documents.

14.1 User roles

Users can have different roles attached to them and can appear differently depending on their roles. Each role may have specific rights assigned to it.

User roles are (not exhaustive):

• Translator

• Evaluator

• Project Manager

• Customer

Users always have to be associated with passwords.

14.2 Basic user functionality

In the following some basic user functions are described. Those functions should go into the core library of OpenTMS. The list is by no means exhaustive.

Basic User Functions | Comment
CreateUser | Creates a user
RemoveUser | Removes a user
ModifyUser | Modifies the user properties

Fig 22: Some basic user functions

15 GUI MODEL

The GUI model realizes editors and similar applications which support the interac-

tion of the human user with the OpenTMS software.

This document is not intended to discuss or describe in more detail how to implement one or more GUIs for OpenTMS.

If such applications are defined and realized, however, they should adhere to the principles of the OpenTMS architecture.

16 INTERFACE MODEL

OpenTMS software should be easy to integrate into other software systems. An interface model realizes this aspect of the behavior of OpenTMS. Interface models are used for functions which do not model kernel aspects of OpenTMS (e.g. workflow management, CMS integration) but are nevertheless of great interest to the OpenTMS community.

An example of an interface model is given in the section about CMS interfaces.

17 CONFIGURING OPENTMS

A key feature of the OpenTMS architecture is the ability to configure the system.

Fig 23: Configuration of OpenTMS

Due to its broad usage requirements (stand alone, server etc.) several different

configuration methods should be supported.

• General configuration (GC): This configuration contains all the configuration options which are used when no user related configuration is available. The GC can also define whether an option can be overwritten by the user configuration file or not.

• Server configuration (SC): This is a configuration which resides on a server and is mainly targeted towards controlling server-side options. It is a subset of the GC.

• User configuration (UC): This configuration stores the user specific configuration options.

Depending on the security requirements, the configuration file can be secured using the security model functions for documents. Each option of the configuration file can be secured separately.

17.1 Naming of the configuration file

The name of the user configuration file depends on the user name. The general configuration file is always named OpenTMSConfig.xml. The file is always located in the direct sub directory config (so one level below the main OpenTMS directory). A distribution mechanism for transferring user profiles between different machines should be supported.

Examples

c:/Program Files/OpenTMS/config | Configuration directory
c:/Program Files/OpenTMS/config/OpenTMSConfig.xml | General configuration file
c:/Program Files/OpenTMS/config/OpenTMSConfig.klemens.xml | User configuration file

Fig 24: Configuration file naming example
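The naming scheme from the examples can be expressed as a small Python helper. The function name config_paths is hypothetical, and lower-casing the user name is an assumption based on the statement that OpenTMS user names are case insensitive.

```python
from pathlib import PurePosixPath


def config_paths(opentms_dir, user=None):
    """Returns (general configuration file, user configuration file or None)."""
    config_dir = PurePosixPath(opentms_dir) / "config"
    general = config_dir / "OpenTMSConfig.xml"
    # Assumption: the user name is lower-cased before it goes into the file name.
    user_file = config_dir / f"OpenTMSConfig.{user.lower()}.xml" if user else None
    return general, user_file


general, personal = config_paths("c:/Program Files/OpenTMS", "Klemens")
```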

In addition OpenTMS can support the storage of configuration options in data-

bases. This has the advantage that one user can work with his personal environ-

ment on different machines.

17.2 Structure of the configuration file

A configuration file is written in an XML based format. The location of the configuration file is relative to the start directory of the main OpenTMS application; it should be stored in a config directory.

The configuration file uses an XML schema (XSD) to restrict the possible values and support error detection. Each option supports the overwrite attribute. This attribute defines whether a user has the right to modify the option or not. The admin attribute overrules the overwrite attribute: users mentioned in this list (separated by “;”) always have the right to modify the option.

Changes made in the user configuration have no influence on (do not modify) the general configuration.

<option name=optionname [admin=<list of admins>] [overwrite=true|false]>
…value…
</option>

or alternatively

<option name=optionname [overwrite=true|false] value=value/>

Fig 25: Configuration option structure
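How the overwrite and admin attributes interact can be illustrated with a small Python sketch. Several details are assumptions made only for this example: the root element <options>, the quoted attribute values, the user names, and the rule that a missing overwrite attribute means "not modifiable". The source does not specify any of these.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration fragment using both option forms of Fig 25.
SAMPLE = """
<options>
  <option name="LogLevel" admin="klemens;michael" overwrite="false">2</option>
  <option name="LogDir" overwrite="true" value="log"/>
</options>
"""


def option_value(option):
    """Return the value from the value attribute or, if absent, the element text."""
    value = option.get("value")
    return value if value is not None else (option.text or "").strip()


def may_modify(option, user):
    """Decide whether a user may modify an option."""
    admins = [a.strip() for a in option.get("admin", "").split(";") if a.strip()]
    if user in admins:
        # The admin list overrules the overwrite attribute.
        return True
    # Assumption: without an overwrite attribute the option is read-only.
    return option.get("overwrite", "false") == "true"


options = {o.get("name"): o for o in ET.fromstring(SAMPLE).iter("option")}

print(may_modify(options["LogLevel"], "klemens"))  # listed as admin
print(may_modify(options["LogLevel"], "ulrike"))   # overwrite is false
print(may_modify(options["LogDir"], "ulrike"))     # overwrite is true
```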

17.3 Configuration Options

In the following table some main options are described.

Option name   Description                              Values                  Option Type
OpenTMSDir    Location of the OpenTMS directory        Directory name          GC, SC, UC
LogDir        Log Dir name; relative to OpenTMSDir     Directory name
ErrorDir      Error Dir name; relative to OpenTMSDir   Directory name
LogLevel      Control amount of logging                <number> 0, 1, 2, 3…
ConfigDir     Configuration directory                  Directory name

Fig 26: OpenTMS options table
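As an illustration of how these options might be evaluated, the following Python sketch resolves LogDir and ErrorDir relative to OpenTMSDir and converts LogLevel to a number. The function name, the dictionary layout and the sample values ("log", "error", "2") are assumptions for the example only.

```python
from pathlib import PurePosixPath


def load_settings(options):
    """Turn raw option strings into resolved settings.

    LogDir and ErrorDir are interpreted relative to OpenTMSDir,
    LogLevel is parsed as a non-negative integer.
    """
    base = PurePosixPath(options["OpenTMSDir"])
    log_level = int(options["LogLevel"])
    if log_level < 0:
        raise ValueError("LogLevel must be 0, 1, 2, 3, ...")
    return {
        "log_dir": base / options["LogDir"],
        "error_dir": base / options["ErrorDir"],
        "log_level": log_level,
    }


# Hypothetical option values as they could be read from a configuration file.
raw_options = {
    "OpenTMSDir": "c:/Program Files/OpenTMS",
    "LogDir": "log",
    "ErrorDir": "error",
    "LogLevel": "2",
}

settings = load_settings(raw_options)
print(settings["log_dir"])
print(settings["error_dir"])
print(settings["log_level"])
```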


18 DMS INTERFACE

As explained in the chapter on “Documents”, OpenTMS may need to store various documents. Ideally this document repository should be a DMS system.

To start with, OpenTMS should support only some very basic functions for retrieving documents from and inserting documents into a DMS system. Please note that this does not imply a very complex DMS system; a flat system based on a directory structure can be sufficient. It has to be clearly stated here that these functions must be implemented as part of the DMS system and are not implemented as part of OpenTMS. OpenTMS just uses (calls) these functions (methods) when documents need to be stored within OpenTMS. Document storage is not considered to be a core functionality of OpenTMS.

OpenTMS will provide this functionality through an XML-RPC interface. The basic idea is that documents are organized in repositories.

The DMS system must be able to handle any input document supplied. No restriction is placed on the format of the document.
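What such an XML-RPC call could look like on the wire can be sketched with the Python standard library module xmlrpc.client. The method name dms.addDocument, the parameter order and the identifiers are hypothetical; the source only states that an XML-RPC interface is used. The document is sent as binary (base64) data so that no restriction is placed on its format.

```python
import xmlrpc.client


def build_add_document_call(repository_id, name, content):
    """Marshal a hypothetical 'add document' request into XML-RPC XML."""
    params = (repository_id, name, xmlrpc.client.Binary(content))
    return xmlrpc.client.dumps(params, methodname="dms.addDocument")


request_xml = build_add_document_call("repo-1", "manual.xlf", b"<xliff/>")
print(request_xml)

# A receiving DMS would unmarshal the request again:
params, method = xmlrpc.client.loads(request_xml)
print(method, params[0], params[1], params[2].data)
```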

The DMS interface can also be used for versioning documents. During the translation process, XLIFF files and other documents change as translations are added. The different versions can be kept in the DMS system.

DMS document handling should also be supported through WebDAV.

The following core functions should be supported by the DMS system. Each function normally returns a unique identifier.

Function                    Comment
Connect                     Connects to the DMS system.
Create Repository           Creates a repository to which documents can be added.
Remove Repository           Removes a repository.
Add Document                Adds a document to a given repository and identifies it by a unique identifier.
Add Document Version        Adds a new version of the document. Each version is identified through its original id augmented by a version code.
Remove Document             Removes a document from a repository.
Replace Document            Replaces an existing document in the repository.
Retrieve Document           Returns a document from the repository; if it is a versioned document, the most recent version is returned.
Retrieve Document Version   Returns a specific version of the document from the repository.
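A flat, directory-based DMS offering the functions listed above could look roughly like the following Python sketch. It is not the OpenTMS or any real DMS implementation. The class name, the directory layout (one sub directory per repository and per document, one file per version) and the version code format "<id>.<n>" are assumptions. "Connect" is reduced to opening the root directory, and "Replace Document" is assumed to overwrite the most recent version.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


class FlatDms:
    """Flat document store: root/<repository id>/<document id>/<version number>."""

    def __init__(self, root):
        # "Connect": remember (and create if needed) the root directory.
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def create_repository(self):
        repo_id = uuid.uuid4().hex
        (self.root / repo_id).mkdir()
        return repo_id

    def remove_repository(self, repo_id):
        shutil.rmtree(self.root / repo_id)
        return repo_id

    def add_document(self, repo_id, content):
        doc_id = uuid.uuid4().hex
        (self.root / repo_id / doc_id).mkdir()
        self._write(repo_id, doc_id, 1, content)
        return doc_id

    def add_document_version(self, repo_id, doc_id, content):
        version = self._latest(repo_id, doc_id) + 1
        self._write(repo_id, doc_id, version, content)
        # Original id augmented by a version code (format is an assumption).
        return f"{doc_id}.{version}"

    def remove_document(self, repo_id, doc_id):
        shutil.rmtree(self.root / repo_id / doc_id)
        return doc_id

    def replace_document(self, repo_id, doc_id, content):
        # Assumption: replacing overwrites the most recent version.
        self._write(repo_id, doc_id, self._latest(repo_id, doc_id), content)
        return doc_id

    def retrieve_document(self, repo_id, doc_id):
        latest = self._latest(repo_id, doc_id)
        return self.retrieve_document_version(repo_id, doc_id, latest)

    def retrieve_document_version(self, repo_id, doc_id, version):
        return (self.root / repo_id / doc_id / str(version)).read_bytes()

    def _latest(self, repo_id, doc_id):
        return max(int(p.name) for p in (self.root / repo_id / doc_id).iterdir())

    def _write(self, repo_id, doc_id, version, content):
        (self.root / repo_id / doc_id / str(version)).write_bytes(content)


dms = FlatDms(tempfile.mkdtemp())
repo = dms.create_repository()
doc = dms.add_document(repo, b"<xliff version='1.2'/>")
version_id = dms.add_document_version(repo, doc, b"<xliff version='1.2'><file/></xliff>")
print(version_id)
print(dms.retrieve_document(repo, doc))
print(dms.retrieve_document_version(repo, doc, 1))
```

Documents are handled as raw bytes, so any input format is accepted. An object like this could, for instance, be exposed over XML-RPC with register_instance of Python's xmlrpc.server.SimpleXMLRPCServer, with document contents wrapped as binary values.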


19 BIBLIOGRAPHY

FOLT (2007a). Arguments for the development of an open source software translation memory technology to support translators. Stuttgart, February 2007. FOLT - Accessed on 13 April 2007 on http://www.folt.org/index.php?option=com_docman&task=doc_download&gid=12&Itemid=39.

FOLT (2007b). ExposéTranslationMemoryOpenSourceSystemTMOSS. Stuttgart, 28 October 2007. FOLT - Accessed on 13 April 2007 on http://www.folt.org/index.php?option=com_docman&task=doc_download&gid=16&Itemid=39.

GMX (2008). Global information management Metrics eXchange (GMX). LISA Standard - Accessed on 13 April 2007 on http://www.lisa.org/Global-information-m.104.0.html.

MARTIF (1999). ISO 12200 Terminology - Computer applications - Machine-readable Terminology Interchange Format (MARTIF) - Negotiated Interchange. ISO TC 37 - http://www.ttt.org/clsframe/negotiated.html.

OLIF (2008). OLIF - The open XML language data standard. OLIF 2 Consortium - Accessed on 13 April 2007 on http://www.olif.net/.

SRX (2008). Segmentation Rules eXchange (SRX). LISA Standard - Accessed on 13 April 2007 on http://www.lisa.org/Segmentation-Rules-e.40.0.html.

TBX (2008). Term Base eXchange (TBX). LISA Standard - Accessed on 13 April 2007 on http://www.lisa.org/Term-Base-eXchange.32.0.html.

TMX (2008). Translation Memory eXchange (TMX). LISA Standard - Accessed on 13 April 2007 on http://www.lisa.org/Translation-Memory-e.34.0.html.

XLIFF (2008). XLIFF Version 1.2. OASIS Standard, 1 February 2008 - Accessed on 13 April 2007 on http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html.

Xml:tm (2008). XML Text Memory (xml:tm). LISA Standard - Accessed on 13 April 2007 on http://www.lisa.org/XML-Text-Memory-xml.107.0.html.


20 APPENDIX

20.1 Multiple translations for a linguistic concept

A key problem in translation is the handling of multiple translations for a linguistic concept, e.g. two German translations for an English concept. This is an inherent problem, and the situation becomes more complex if more than two languages are involved. It can easily lead to a situation (-> Process) where two MuLOs may have to be merged into one MuLO, because each translation in a newly entered translation pair refers to a MoLO which links to a different MuLO. This is shown by the following examples:

In the following we assume DE - EN as source and target languages. Segments are identified as follows: S-<language code>-<n>: <string>, where n is the sequence number. In the examples below, segments which result in unifications/merges are coloured.

Time 1:

S-DE-1: Haus S-EN-1: house

Once the pair is accepted, the database will contain these two entries as translations of each other. In TMX terms this is now one TU:

TU 1:

TUV: S-DE-1: Haus

TUV: S-EN-1: house

Next, another translation pair is added.

Time 2:

S-DE-2: Heim S-EN-2: home

After accepting this translation pair the database will contain two TU entries.

TU 1:


TUV: S-DE-1: Haus

TUV: S-EN-1: house

TU 2:

TUV: S-DE-2: Heim

TUV: S-EN-2: home

Now another pair is added:

Time 3:

S-DE-3: Haus S-EN-3: building

After accepting this translation pair the database still contains two TU entries, but TU 1 is extended with the new translation S-EN-3, because S-DE-3 is identical to S-DE-1.

TU 1:

TUV: S-DE-1: Haus

TUV: S-EN-1: house

TUV: S-EN-3: building

TU 2:

TUV: S-DE-2: Heim

TUV: S-EN-2: home

Now another translation pair comes in:

Time 4:

S-DE-4: Heim S-EN-4: house

S-DE-4 is already contained in TU 2 (as S-DE-2), while S-EN-4 is already contained in TU 1 (as S-EN-1). As there is no obvious entry which should be preferred for adding the translation pair, TU 1 and TU 2 are unified, meaning that both entries are merged into one.

The result, with TU 2 removed, is:

TU 1:

TUV: S-DE-1: Haus

TUV: S-EN-1: house

TUV: S-EN-3: building

TUV: S-DE-2: Heim


TUV: S-EN-2: home

This means that Haus can now be translated into English as house, building or home (and vice versa), Heim likewise as house, building or home, and house can be translated into German as Haus or Heim, and so on.
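The unification walked through in Time 1 to Time 4 can be reproduced with a short Python sketch. It is a minimal model, not the OpenTMS data structure: a TU is represented as a set of (language, text) tuples, and two segments count as identical when language and text are equal. The function names are hypothetical.

```python
def add_pair(tus, source, target):
    """Add a translation pair to the list of TUs, unifying all TUs it touches."""
    pair = {source, target}
    hits = [i for i, tu in enumerate(tus) if tu & pair]
    if not hits:
        # Neither segment is known yet: create a new TU.
        tus.append(set(pair))
        return tus
    keep = hits[0]
    tus[keep] |= pair
    # Merge every further TU containing one of the segments into the first one.
    for i in hits[1:]:
        tus[keep] |= tus[i]
    for i in reversed(hits[1:]):
        del tus[i]
    return tus


def proposals(tus, segment, target_lang):
    """Return all translations of a segment into the target language."""
    for tu in tus:
        if segment in tu:
            return sorted(text for lang, text in tu
                          if lang == target_lang and (lang, text) != segment)
    return []


tus = []
add_pair(tus, ("de", "Haus"), ("en", "house"))     # Time 1: TU 1 is created
add_pair(tus, ("de", "Heim"), ("en", "home"))      # Time 2: TU 2 is created
add_pair(tus, ("de", "Haus"), ("en", "building"))  # Time 3: TU 1 is extended
add_pair(tus, ("de", "Heim"), ("en", "house"))     # Time 4: TU 1 and TU 2 are unified

print(len(tus))
print(proposals(tus, ("de", "Haus"), "en"))
print(proposals(tus, ("en", "house"), "de"))
```

After the fourth call only one TU remains, and Haus yields building, home and house as English proposals, as described above.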

Although this sounds quite simple for two languages, it immediately gets complicated if several languages are involved. Here a language can act as a “pivot language”, meaning that, although not really intended, a whole set of entries gets merged which were two distinct entries before. This can be especially confusing if several translators are working. The DE-EN translator may be surprised by a unified entry, as he was never the source of the merger and never produced a double translation. This can be seen in the following example:

Time x:

Initial entries

TU 1:

TUV: S-DE-1: Haus

TUV: S-EN-1: house

TUV: S-EN-3: building

TUV: S-DE-2: Heim

TUV: S-EN-2: home

TUV: S-LA-6: domus

TU 2:

TUV: S-DE-1: Wald

TUV: S-EN-1: wood

TUV: S-LA-6: silva

Assume now that the EN-LA translator makes an error in his translation and adds the following combination (the argument holds for other combinations too):

S-EN-11: wood S-LA-11: domus

Since wood belongs to TU 2 and domus to TU 1, the two TUs are unified immediately. This results in just one TU, with TU 2 removed.


TU 1:

TUV: S-DE-1: Haus

TUV: S-EN-1: house

TUV: S-EN-3: building

TUV: S-DE-2: Heim

TUV: S-EN-2: home

TUV: S-LA-6: domus

TUV: S-DE-1: Wald

TUV: S-EN-1: wood

TUV: S-LA-6: silva

The DE-EN translator will be confused the next time he searches for “Haus”, as he will now get the following EN proposals: house, building, home, wood. The reason is the entry made by the EN-LA translator. One has to add that in some cases the corresponding EN-LA translation pair may be perfectly correct, while for DE-EN the result may be totally wrong and confusing.

As a consequence, translators should be careful with their translations in order to avoid unexpected translation links.
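The pivot effect can be made visible by checking which existing TUs a new pair would link before it is accepted. The following Python sketch illustrates this for the EN-LA example. It is only an illustration under assumptions: TUs are modelled as sets of (language, text) tuples, and the function would_merge is hypothetical, not a function of OpenTMS.

```python
def would_merge(tus, source, target):
    """Return the TUs that accepting the pair (source, target) would unify.

    An empty list means that the pair extends at most one existing TU
    and therefore causes no merger.
    """
    pair = {source, target}
    hits = [tu for tu in tus if tu & pair]
    return hits if len(hits) > 1 else []


tu_haus = {("de", "Haus"), ("en", "house"), ("en", "building"),
           ("de", "Heim"), ("en", "home"), ("la", "domus")}
tu_wald = {("de", "Wald"), ("en", "wood"), ("la", "silva")}
tus = [tu_haus, tu_wald]

# The erroneous EN-LA pair links both TUs through the Latin segment.
affected = would_merge(tus, ("en", "wood"), ("la", "domus"))
print(len(affected))

# A pair whose segments both belong to the same TU causes no merger.
print(would_merge(tus, ("en", "wood"), ("la", "silva")))
```

A check of this kind would let a tool show the translator which entries are about to be unified, including entries in languages he does not work with.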