36
OLIF2 Consortium: OLIF2 Consortium: Organizational Organizational Meeting Meeting April 6, 2000 SAP AG Walldorf, Germany

OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Embed Size (px)

Citation preview

Page 1: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

OLIF2 Consortium: OLIF2 Consortium: Organizational Meeting Organizational Meeting

April 6, 2000SAP AG

Walldorf, Germany

Page 2: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

AgendaAgenda

9.00 – 9.15 Welcome and introductory Remarks: Daniel Grasmick

9.15 – 9.45 Structure of the OLIF2 Consortium: Daniel Grasmick, Susan McCormick

9.45 – 10.30 Time frame for OLIF2: Daniel Grasmick, Susan McCormick

Financial issues for the consortium: Daniel Grasmick

10.30 – 10.45 Coffee break

10.45 – 12.00 Current status of OLIF: Gregor Thurmair

12.00 – 13.00 Discussion of changes to OLIF currently envisaged for OLIF2: Susan McCormick

13.00 – 14.00 Lunch

14.00 – 14.30 Review of current level of support for OLIF among tool vendors: Daniel Grasmick, Susan McCormick

14.30 – 15.30 Review of other interchange formats and initiatives: all participants

Discussion of interaction of OLIF2 Consortium with SALT and/or OSCAR: all participants

15.30 – 15.45 Coffee break

15.45 – 17.00 Task descriptions for work groups to review current OLIF and suggest changes/additionsin linguistic, terminology, and technical specifications; recommendations to be

completed in April/May, 2000: all participants

Page 3: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Consortium ParticipantsConsortium Participants

Gregor Thurmair, Sail LabsJohannes Ritzke, Sail LabsAlex Muzarku, LogosPierre-Yves Foucou, Systran Yves Mahe, XeroxPaolo Martins, EUChris Pyne, L10NBRIDGEJörgen Danielsen, L10NBRIDGENils van der Laan, TradosPeter Quartier, LotusUlrike Irmler, MicrosoftDaniel Grasmick, SAPSusan McCormick, SAPJennifer Brundage, SAP Christian Lieske, SAPChristoph Pahlke-Lerch, SAP

Page 4: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Welcome and IntroductionsWelcome and Introductions

• Company

• Professional background

• Terminology volume

• Languages supported

• Organization of terminology management in your company

• Terminology database(s) used

• Other tools related to terminology

• Any exchange formats?

• Future plans for terminology/lexicon management

Page 5: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Purpose of OLIF2Purpose of OLIF2

To upgrade the current OLIF standard

so that it can be supported by tool

vendors and applied by users in 2001

Page 6: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Why a New Consortium?Why a New Consortium?

• OLIF was developed in the OTELO project as a OLIF was developed in the OTELO project as a prototype, but is not usable in its current formprototype, but is not usable in its current form

• The SALT project plans to use the OLIF format as part The SALT project plans to use the OLIF format as part of its XLT standard, but will not edit OLIF1 for contentof its XLT standard, but will not edit OLIF1 for content

• LISA TBX will be based on SALT XLTLISA TBX will be based on SALT XLT

• None of the other formats supports MT requirementsNone of the other formats supports MT requirements

• Thus, usable OLIF is requiredThus, usable OLIF is required

e.g., SAP will double its terminology volume by the end of 2000 and add additional NLP tools needing term data

Page 7: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

• OTELO participantsOTELO participants SAIL Labs, Logos, Lotus, SAP

• New MT representativeNew MT representative Systran

• Term Management representativesTerm Management representatives Trados, Xerox

• Service (and tool) providersService (and tool) providers L10NBRIDGE, L&H via SAIL Labs

• UsersUsers EU, Microsoft...

• ... And open to interested parties... And open to interested parties

Structure of the ConsortiumStructure of the Consortium

Page 8: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Time Frame for OLIF2Time Frame for OLIF2

Phase I: Specification

• Working groups make recommendations for changes to OLIF format by May 31, 2000

• Specifications for OLIF2 complete by September, 2000

Phase II: Implementation

• Tool vendors support new format in 2001

• Maintenance tools developed by end of 2000/beginning of 2001

Page 9: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Changes to OLIF Changes to OLIF for OLIF2 for OLIF2

Page 10: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

OLIF to OLIF2OLIF to OLIF2

• technical structure

• linguistic analysis

• terminology handling

Review current OLIF format for changes to:

Page 11: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

XMLXML

• well-supported industry standard

• extensible - new element types easily defined

• well-suited for data exchange formats

• SALT project already working on XML-based standard in which they want to embed OLIF

Make OLIF compliant with XML:

technical

Page 12: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Achieving XML-ComplianceAchieving XML-Compliance

• OLIF2 is primarily ‘rewrite’ of OLIF, but with XML-compliance

• OLIF entry structure remains basically the same for OLIF2

technical

Page 13: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

• reanalyze some current tags as attributes of XML element types, e.g.,

<LINK=“synonym”>

• allow for more embedding of structure

Use some of the features of XML to make design changes for OLIF2:

XML-Driven Design ChangesXML-Driven Design Changes

technical

Page 14: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Current OLIF: ISO-Latin-1

OLIF2 functionality:

• double-byte characters

• bidirectionality

XML supports ISO/IEC 10646, which is similar to unicode

Character SetsCharacter Sets

technical

Page 15: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

• company-code as part of central entry base

• formally distinguish bilingual from monolingual links

• develop protocol for user-defined fields

Make substantive changes to the structure

Changes to the OLIF ConceptChanges to the OLIF Concept

technical

Page 16: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Converging with other StandardsConverging with other Standards

• Achieve as much overlap as possible with, e.g.,

» names of element types

» structure of entries

Coordination with other standardization initiatives such as SALT

technical

Page 17: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Review of Linguistic Features Review of Linguistic Features

• are features in correct feature groups?

• are all of the features that are essential for the different vendors covered?

» transitivity for Logos

» Systran requirements

» Xerox

• what about other NLP products or users?

Comprehensive review of linguistic features

linguistic

Page 18: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Morphology Morphology

• currently includes only German, Danish and English

• theoretical underpinnings of analysis are inconsistent

Review the current morphological analysis

linguistic

Page 19: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Syntax and Semantics Syntax and Semantics

• selectional restrictions (transfer conditions) - representation should be improved

• syntactic frames - currently for German, Danish and English only

• semantic types - should be reviewed and expanded

Special attention to:

linguistic

Page 20: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Features and Values Features and Values

• Make sure feature names and values conform to general practice

• Make sure all element types that we want to cover are actually in DTD

linguistic

Page 21: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Canonical Forms Canonical Forms

• defined for formulation of entry string in given language

• necessary for optimal convergence of entries from different systems

• based on language-specific lexical conventions

• published as part of formal specification

Conventions for formulating canonical forms

linguistic

Page 22: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Structure of Terminology Structure of Terminology

• allow for deeper structure, more embedding (in line with MARTIF?)

• expand on feature/value pairs to allow more admin detail

Expand current structure?

terminology

Page 23: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Entry Identifier Entry Identifier

• current OLIF does not support a unique

identifier for each entry, although many

termbanks require this

Add unique entry identifier

terminology

Page 24: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Review of OLIF Support Review of OLIF Support Among Tool VendorsAmong Tool Vendors

Page 25: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Overview of Other Overview of Other Exchange Formats and Exchange Formats and

Initiatives Initiatives

Page 26: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

MARTIF MARTIF

• SGML-based

• strictly terminology

• formal concept-orientation

• extensive DTD

• lots of administrative information

• relatively complex embedding in structure

ISO 12200:1999 Standard

Page 27: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

• extended MARTIF - attempt to coordinate with TMX and OLIF

• adapted to XML

• extends MARTIF to include NLP some features

Proposal ISO/TC 37/SC 3 N 318

X-MARTIF X-MARTIF

Page 28: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

SALT SALT

XLT (lex/term exchange)

OLIF (lex) MSC (term < MARTIF)

SALT Project - Currently funded by the EU

Page 29: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

OSCAR OSCAR

• TMX - format for re-use of translation memory data

• TBX - lex/termbase exchange (subset of XLT)

Group within LISA Organization

Page 30: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Geneter Geneter

• for DB management

• compatibility with internet

• fairly complex hierarchical structure

• reworked to allow multiple word senses alongside concept model

“Generic model for the distribution and reuse of heterogeneous terminological data”

Page 31: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Meeting Results:Meeting Results:Participation of all companies invitedParticipation of all companies invited

working in 3 action groups ... working in 3 action groups ...

Page 32: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

TG1: Technical Structure TG1: Technical Structure

Goal: provide formal structure of the formatGoal: provide formal structure of the format

• Review for XML complianceReview for XML compliance• RedundancyRedundancy• Links representationLinks representation• Definition of the headerDefinition of the header• Incorporation of user-defined fieldsIncorporation of user-defined fields

= Output: OLIF DTD= Output: OLIF DTD

Page 33: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

TG2: Linguistic AnalysisTG2: Linguistic Analysis

Goal: provide a “final” list of feature-value Goal: provide a “final” list of feature-value pairs for the linguistic componentpairs for the linguistic component

• Canonical form formulationCanonical form formulation• Morphology, syntax and semanticsMorphology, syntax and semantics• Transfer conditions and transformationsTransfer conditions and transformations• Cross-references (based on ISO)Cross-references (based on ISO)

Page 34: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

TG3: Terminology HandlingTG3: Terminology Handling

Goal: to provide a “final” list of feature-Goal: to provide a “final” list of feature-value pairs for terminologyvalue pairs for terminology

• Concordance with other standardsConcordance with other standards• Administrative informationAdministrative information

Page 35: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Languages Supported in OLIF2Languages Supported in OLIF2

Priority 1Priority 1

• ENEN• DEDE• DADA• FRFR• ESES• PTPT• JAJA

Priority 2Priority 2

• RURU• ITIT• NLNL

• Other priorities...Other priorities...

• ELEL• HUHU• ZHZH• ZFZF• KOKO• ARAR

Page 36: OLIF2 Consortium: Organizational Meeting April 6, 2000 SAP AG Walldorf, Germany

Other ItemsOther Items

• Terminology samples from all participantsTerminology samples from all participants at least 100 entries incl. description at least 2 languages and different categories