The diachronic study of term usage in Basque-language legal texts: use of UZEI’s automatic verifier for bilingual texts (2DITE) in terminometry

Iker Etxebeste - UZEI

Donostia, 23-11-2018


INTRODUCTION

• Sections

1) Implantation evaluation in terminology planning

2) UZEI’s contribution

3) A trial with the 2DITE lexical verifier

1. IMPLANTATION EVALUATION

EVALUATION OF IMPLANTATION IN TERMINOLOGY PLANNING

(Auger, 1986)

[Diagram: stages of terminology planning: Research, Standardization, Dissemination, Implantation, Evaluation, Updating]

1. IMPLANTATION EVALUATION

Standardization

▪ Standardization can be a spontaneous or a planned process

▪ Standardization → Institutional intervention, planning

▪ Basque language context:

• EUSKALTERM Basque Public Term Bank (2001)

• Basque Advisory Council: Terminology Committee (2002)

1. IMPLANTATION EVALUATION

Implantation

▪ Terminology work is not only about standardizing and disseminating terms

▪ Terminology implantation cannot be taken for granted:

• More socialization/dissemination is required

• The terminology work already done has not been taken up sufficiently by the agents involved in transmitting it

1. IMPLANTATION EVALUATION

Evaluation

▪ Key step in terminology planning

▪ Monitoring the use of terms

▪ Rethinking/updating terms, re-examining criteria

1. IMPLANTATION EVALUATION

Evaluation

▪ Research on implantation evaluation:

• Measuring the success or failure of terms

• Aimed at improving the standardization process from the outset

• Goal: appropriate, viable, correct and reliable terms

▪ Basque language context:

• Evaluation is foreseen in the Terminology Committee’s action plan

• Further steps are still pending

• UZEI’s contribution

2. UZEI AND TERMINOMETRY

▪ UZEI’s terminometric contribution

▪ New action line in 2005

▪ Feedback on our terminology proposals

2.1 TEIS

TEIS: TERMINOLOGY IMPLANTATION INFORMATION SYSTEM

▪ Project 2006-07

▪ Goal: information on terminology implantation

▪ Monitoring Basque-language usage in written texts

▪ First systematic study of this kind for Basque

2.1 TEIS

CORPORA

▪ Synchronic corpora, 2004

▪ Four domains: administration, education, the media, companies

▪ In total, 500 000 words

2.1 TEIS

COMPILATION OF TERMINOLOGICAL DICTIONARY

▪ No proposals from the Terminology Committee were available yet

▪ Source: Unified Dictionary of the Basque Royal Academy (Euskaltzaindia), entries with a subject-field label

▪ Classified by concepts: recommended/preferred terms and other forms

▪ 552 concepts, 1 333 terms

2.1 TEIS

TERM IDENTIFICATION AND RESULTS

▪ 10 141 occurrences

▪ High implantation rate of recommended terms: 90%
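The slides do not define how the 90% rate was computed. Assuming the usual definition, the share of all identified occurrences that correspond to recommended terms, the rate for a corpus can be written as follows, where $n_{\text{rec}}(c)$ and $n_{\text{other}}(c)$ (notation introduced here) are the occurrences of the recommended term and of the other forms for concept $c$; in TEIS the denominator would be the 10 141 occurrences.

$$
\text{implantation rate} = \frac{\sum_{c} n_{\text{rec}}(c)}{\sum_{c} \big( n_{\text{rec}}(c) + n_{\text{other}}(c) \big)} \times 100
$$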

2.2 IDITE

IDITE

▪ Lexical verifier

▪ Identifies recommended and “non-recommended” words (the latter being those rated “use” or “preferably use”)

▪ Relies on updated lexical databases
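IDITE's internals are not described in the slides. The following is a minimal sketch of the behaviour stated above, assuming a word-level lexicon in which each non-recommended entry carries a rating and points to a preferred form. The word forms are taken from later in this talk, but the ratings assigned to them are hypothetical placeholders, and exact token matching stands in for the lemmatization a real Basque verifier would need.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    rating: Optional[str]     # None = recommended; otherwise "use" or "preferably use"
    preferred: Optional[str]  # form the rating points to (None for recommended entries)


# Hypothetical lexicon: forms come from the slides, ratings are placeholders.
LEXICON = {
    "dekretu": Entry(None, None),
    "dekreto": Entry("use", "dekretu"),
    "eranskin": Entry(None, None),
    "eraskin": Entry("use", "eranskin"),
    "funtzionario": Entry(None, None),
    "funtzionari": Entry("preferably use", "funtzionario"),
}

TOKEN = re.compile(r"\w+")


def check_text(text):
    """Return (token, status, preferred form) for every token found in the lexicon.

    Tokens are matched exactly after lowercasing; a real verifier would lemmatize
    first, since Basque forms such as "dekretuaren" carry case suffixes.
    """
    findings = []
    for token in TOKEN.findall(text.lower()):
        entry = LEXICON.get(token)
        if entry is None:
            continue  # word not covered by the lexicon
        status = "recommended" if entry.rating is None else entry.rating
        findings.append((token, status, entry.preferred))
    return findings


if __name__ == "__main__":
    for finding in check_text("Dekreto honen eraskin bat eta funtzionario bat"):
        print(finding)
```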

2.2 IDITE

2013 TRIAL

CORPORA

▪ 1 000 Wikipedia entries, 300 000 words

RESULTS

▪ Identification of forms marked “use” or “preferably use”

▪ 681 forms were identified.

3. 2DITE

2DITE (2018)

▪ Automatic lexical verifier for bilingual texts

▪ Developed to meet our quality requirements for bilingual texts

▪ Uses a bilingual lexicon of correct equivalents as its reference resource

3. 2DITE

1) Processing of parallel texts

2) Identifying source-language terms

3) Searching for and identifying the appropriate equivalent in the target language

4) If no equivalent is found: flagging the case and showing its context
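The workflow above can be sketched as follows. This is not UZEI's implementation: the lexicon pairs are illustrative, Spanish is assumed as the source language (the slides do not name it), the input is assumed to be already aligned into (source, target) segment pairs, and a prefix match at the start of a word is a crude stand-in for lemmatization that tolerates Basque suffixes such as the -a in "Dekretua".

```python
import re
from dataclasses import dataclass

# Hypothetical bilingual lexicon: Spanish source term -> accepted Basque stems.
# The pairs are illustrative; the real 2DITE lexicon is not reproduced here.
LEXICON = {
    "decreto": ["dekretu"],
    "anexo": ["eranskin"],
    "funcionario": ["funtzionario"],
    "pensión": ["pentsio"],
    "disposición transitoria": ["xedapen iragankor"],
    "presupuestos generales": ["aurrekontu orokor"],
}


@dataclass
class Finding:
    segment: int      # index of the aligned segment pair
    source_term: str
    found: bool       # was an accepted equivalent found in the target?
    context: str      # target segment, reported only when nothing was found


def occurs(stem, text):
    """Naive stand-in for lemmatization: the stem must begin a word."""
    return re.search(r"\b" + re.escape(stem), text, re.IGNORECASE) is not None


def verify_pairs(pairs):
    """Check each (source, target) pair of an aligned bilingual text."""
    findings = []
    for i, (source, target) in enumerate(pairs):
        for term, equivalents in LEXICON.items():
            if not occurs(term, source):
                continue  # source term not present in this segment
            # Look for any accepted equivalent in the target segment
            ok = any(occurs(eq, target) for eq in equivalents)
            # If none is found, keep the target segment as context
            findings.append(Finding(i, term, ok, "" if ok else target))
    return findings


if __name__ == "__main__":
    pairs = [
        ("El Decreto 12/2016 y su anexo", "12/2016 Dekretua eta haren eranskina"),
        ("Los funcionarios del Gobierno", "Jaurlaritzako funtzionariak"),
    ]
    for f in verify_pairs(pairs):
        if not f.found:
            print(f"segment {f.segment}: no equivalent for '{f.source_term}' in: {f.context}")
```

Here the second segment is flagged because the older form "funtzionari-" does not match the accepted stem "funtzionario". A production tool would also need sentence alignment and proper morphological analysis.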

3. 2DITE

RESEARCH TEST (2018)

▪ Testing the use of 2DITE

▪ Corpora with special features: bilingual texts

▪ Analysis of legal and administrative terminology

▪ Diachronic test

3. 2DITE

CORPORA

▪ Basque legal texts

▪ Two corpora:

1978 – 1994

2014 – 2018

▪ Period of 25 years → evolution

▪ Volume: 375 000 words (180 000 + 195 000)

3. 2DITE

LEXICON COMPILATION

▪ Normalized/agreed terms in the legal and administrative sphere

• Basque Advisory Council’s Terminology Committee: dictionaries from those domains

• Basque Language Academy (Euskaltzaindia): Basic Legal-Parliamentary Dictionary

• Justice Department of the Basque Government: Commission for the normalization of court forms

▪ 3 249 concepts, 5 212 terms

3. 2DITE

1978 – 1994 corpus

▪ Correct forms: 13 861 occurrences, 607 concepts

▪ Other forms: 5 608 occurrences, 367 concepts

CORRECT FORMS: 71%

OTHER FORMS: 29%

TOTAL OCCURRENCES: 19 469

3. 2DITE

2014 – 2018 corpus

▪ Correct forms: 15 989 occurrences, 831 concepts

▪ Other forms: 2 035 occurrences, 323 concepts

CORRECT FORMS: 89%

OTHER FORMS: 11%

TOTAL OCCURRENCES: 18 024
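Both percentage splits follow from the occurrence counts alone. A short check, using only the counts reported for the two corpora, recomputes the totals and the shares to one decimal place:

```python
# Occurrence counts reported for the two corpora.
CORPORA = {
    "1978-1994": {"correct": 13_861, "other": 5_608},
    "2014-2018": {"correct": 15_989, "other": 2_035},
}


def shares(counts):
    """Return total occurrences and the percentage of correct and other forms."""
    total = counts["correct"] + counts["other"]
    return total, 100 * counts["correct"] / total, 100 * counts["other"] / total


if __name__ == "__main__":
    for period, counts in CORPORA.items():
        total, correct, other = shares(counts)
        print(f"{period}: {total} occurrences, correct {correct:.1f}%, other {other:.1f}%")
    # 1978-1994: 19469 occurrences, correct 71.2%, other 28.8%
    # 2014-2018: 18024 occurrences, correct 88.7%, other 11.3%
```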

3. 2DITE

1978 – 1994 corpus

Wide variety of cases:

▪ Typographical errors

▪ Dubious equivalences: bezero (client) ≠ kontsumitzaile (consumer)

▪ Forms not adapted to general lexicon norms: giza- (“human” in compound nouns)

3. 2DITE

1978 – 1994 corpus

▪ Forms that do not meet general lexicon criteria established later on:

▪ dekreto (24%) / dekretu (76%)

▪ dirulaguntza (30%) / diru(-)laguntza (70%)

▪ eraskin (47%) / eranskin (53%)

▪ funtzionari (49%) / funtzionario (51%)

Many forms found in the corpus were standardized only later

Example: funtzionario (the -ario ending was standardized in 1995)

3. 2DITE

1978 – 1994 corpus

▪ Terminology standardization: in some cases, variation is greater than it is today

▪ bizisari,… (41%) / pentsio (59%)

▪ aldibaterako xedapen,… (45%) / xedapen iragankor (55%)

▪ arduralari batzorde, eraentza batzorde,… (51%) / administrazio kontseilu (49%)

3. 2DITE

1978 – 1994 corpus

▪ Terminology standardization: an example with an extremely large variation

▪ aurrekontu orokorrak (64%)

▪ guztizko diruegitamuak (21%)

▪ diruegitamu nagusiak (7%)

▪ guztizko diru-egitamuak (2.6%)

▪ aurrekontu nagusiak (2.2%)

▪ guztizko aurrekontuak (1.1%)

▪ erabateko aurrekontuak (0.7%)

▪ guztizko diruizendapena (0.4%)

▪ orotariko diruegitamuak (0.4%)
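One way to put a number on this "extremely large variation" is sketched below. The shares are those listed on the slide; they add up to 99.4% because of rounding, so they are renormalized first. The normalized Shannon entropy is a measure swapped in here for illustration, not one used in the study: it is 0 when a single form is used and 1 when all forms are equally frequent.

```python
import math

# Shares (%) of the forms found for "aurrekontu orokorrak" in the 1978-1994 corpus.
VARIANTS_1978_1994 = {
    "aurrekontu orokorrak": 64,
    "guztizko diruegitamuak": 21,
    "diruegitamu nagusiak": 7,
    "guztizko diru-egitamuak": 2.6,
    "aurrekontu nagusiak": 2.2,
    "guztizko aurrekontuak": 1.1,
    "erabateko aurrekontuak": 0.7,
    "guztizko diruizendapena": 0.4,
    "orotariko diruegitamuak": 0.4,
}


def variation_profile(shares):
    """Summarize how dispersed the forms of one concept are."""
    listed_total = sum(shares.values())  # rounded shares need not add up to 100
    probs = [s / listed_total for s in shares.values()]  # renormalize to sum to 1
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    n = len(shares)
    return {
        "variants": n,
        "dominant": max(shares, key=shares.get),
        "dominant_share": max(probs),
        "listed_total": listed_total,
        # 0 = one form only; 1 = all forms equally frequent
        "normalized_entropy": entropy / math.log2(n) if n > 1 else 0.0,
    }


if __name__ == "__main__":
    print(variation_profile(VARIANTS_1978_1994))
    # In the 2014-2018 corpus the same concept appears with a single form (100%)
    print(variation_profile({"aurrekontu orokorrak": 100}))
```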

3. 2DITE

2014 – 2018 corpus

Much less variation:

▪ Few typographical errors

▪ General lexicon criteria are followed much more consistently

dekretu, eranskin, funtzionario (100%)

dekreto, eraskin, funtzionari (0%)

3. 2DITE

2014 – 2018 corpus

▪ Terminology standardization: significantly less formal variation

▪ pentsio, xedapen iragankor (100%)

▪ aurrekontu orokorrak (100%)

▪ Some divergences remain, however:

▪ falta (oso) larri (46%) / falta (oso) astun (54%)

▪ gizarte-arloko epaitegi (42%) / lan-arloko epaitegi (58%)
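For the forms whose shares are reported for both corpora, the diachronic change can be tabulated directly. The sketch below takes only the percentages given on the slides for the 1978-1994 and 2014-2018 corpora and computes the gain of each standardized form in percentage points.

```python
# Share (%) of the standardized form: (1978-1994 corpus, 2014-2018 corpus).
STANDARD_FORM_SHARES = {
    "dekretu": (76, 100),
    "eranskin": (53, 100),
    "funtzionario": (51, 100),
    "pentsio": (59, 100),
    "xedapen iragankor": (55, 100),
    "aurrekontu orokorrak": (64, 100),
}


def gains(shares):
    """Percentage-point change of each standardized form between the two corpora."""
    return {form: later - earlier for form, (earlier, later) in shares.items()}


if __name__ == "__main__":
    # Largest gains first
    ranked = sorted(gains(STANDARD_FORM_SHARES).items(), key=lambda kv: kv[1], reverse=True)
    for form, gain in ranked:
        print(f"{form}: +{gain} points")
```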

3. 2DITE

OVERVIEW

▪ Logical evolution of implantation

▪ Standardization in practice → standardized terminology

▪ Lexical prescriptions and terminology standardization processes → updating and adapting terms in practice

▪ 2DITE, a complementary tool for terminometry

4. LOOKING TO THE FUTURE

▪ Terminometry research is strategic

▪ In Basque terminology, it is set to become the next logical phase of language planning

▪ Implantation of the proposals of the Terminology Committee

▪ Adapting and improving the terminological proposals for Basque

Iker Etxebeste - UZEI