The diachronic study of term usage in
Basque-language legal texts: use of UZEI’s
automatic verifier for bilingual texts
(2DITE) in terminometry
Iker Etxebeste - UZEI
Donostia, 23-11-2018
INTRODUCTION
• INTRODUCTION SECTIONS
1) Implantation evaluation in terminology planning
2) UZEI’s contribution
3) A trial with 2DITE lexical verifier
1. IMPLANTATION EVALUATION
EVALUATION OF IMPLANTATION IN TERMINOLOGY PLANNING
(Auger, 1986)
Research → Standardization → Dissemination → Implantation → Evaluation → Updating
1. IMPLANTATION EVALUATION
Standardization
▪ Spontaneous process or planned process
▪ Standardization → Institutional intervention, planning
▪ Basque language context:
• EUSKALTERM Basque Public Term Bank (2001)
• Basque Advisory Council: Terminology Committee (2002)
1. IMPLANTATION EVALUATION
▪ Terminology work is not only about standardization and dissemination of terms
▪ Terminology implantation is not to be taken for granted:
• More socialization/dissemination is required
• Insufficient uptake among the agents who transmit the completed terminology work
Implantation
1. IMPLANTATION EVALUATION
▪ Key step in terminology planning
▪ Monitoring the use of terms
▪ Rethinking/updating terms, re-examining criteria
Evaluation
1. IMPLANTATION EVALUATION
▪ Research on implantation evaluation:
• Measuring the success or failure of terms
• Aimed at improving the standardization process from the start
• Goal: appropriate and viable, correct and reliable terms
▪ Basque language context:
• Evaluation is foreseen in the Terminology Committee’s action plan
• Steps forward pending
• UZEI’s contribution
Evaluation
2. UZEI AND TERMINOMETRY
▪ UZEI’s terminometric contribution
▪ New action line in 2005
▪ Feedback on our terminology proposals
2.1 TEIS
TEIS: TERMINOLOGY IMPLANTATION INFORMATION SYSTEM
▪ Project 2006-07
▪ Goal: information on terminology implantation
▪ Monitoring Basque language in written texts
▪ First systematic work with Basque language
2.1 TEIS
CORPORA
▪ Synchronic corpora, 2004
▪ Four domains: administration, education, the media, companies
▪ In total, 500 000 words
2.1 TEIS
COMPILATION OF TERMINOLOGICAL DICTIONARY
▪ The Terminology Committee had not yet issued any proposals
▪ Unified Dictionary of the Basque Royal Academy (Euskaltzaindia), subject field mark
▪ Classified by concepts: recommended/preferred terms and other forms
▪ 552 concepts, 1 333 terms
2.1 TEIS
TERM IDENTIFICATION AND RESULTS
▪ 10 141 occurrences
▪ High implantation rate of recommended terms: 90%
2.2 IDITE
IDITE
▪ Lexical verifier
▪ Identifies recommended and “non-recommended” words (those marked “use” or “preferably use”)
▪ Updated lexical databases
2.2 IDITE
2013 TRIAL
CORPORA
▪ 1 000 Wikipedia entries, 300 000 words
RESULTS
▪ Identifying those marked as “use” or “preferably use”
▪ 681 forms were identified.
3. 2DITE
2DITE (2018)
▪ Automatic lexical verifier for bilingual texts
▪ Developed to meet our quality requirements for bilingual texts
▪ Correct bilingual lexicon as reference resource
3. 2DITE
▪ Processing of parallel texts
▪ Identifying source-language terms
▪ Searching for and identifying the appropriate equivalent in the target language
▪ If no equivalent is found: flagging the case and showing its context
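The steps above can be sketched in a few lines of Python. This is a minimal illustration and not UZEI's implementation: the function names, the `Miss` record, the prefix-matching rule and the Spanish source terms are all assumptions made for the example, and the sentence pairs are invented. Only the Basque forms dekretu and eranskin come from the slides.

```python
from dataclasses import dataclass

# Hypothetical bilingual lexicon: source-language term -> accepted target stems.
# The Spanish source terms are an assumption; the Basque stems appear in the slides.
LEXICON = {
    "decreto": ["dekretu"],
    "anexo": ["eranskin"],
}

PUNCTUATION = ".,;:()"


@dataclass
class Miss:
    source_term: str   # term identified in the source segment
    expected: list     # accepted target-language stems
    context: str       # target segment shown to the reviewer


def tokens(text):
    """Lowercase, whitespace-split tokens with surrounding punctuation removed."""
    return [word.strip(PUNCTUATION) for word in text.lower().split()]


def verify_segment(source, target, lexicon=LEXICON):
    """Return a Miss for every source term whose accepted equivalent is absent."""
    misses = []
    source_tokens = tokens(source)
    target_tokens = tokens(target)
    for term, equivalents in lexicon.items():
        if term not in source_tokens:
            continue  # the term does not occur in this source segment
        # Naive prefix match so that inflected forms (dekretua, dekretuak) count.
        found = any(tok.startswith(eq) for tok in target_tokens for eq in equivalents)
        if not found:
            misses.append(Miss(term, equivalents, target))
    return misses


def verify_parallel(pairs, lexicon=LEXICON):
    """Run the check over aligned (source, target) segment pairs."""
    misses = []
    for source, target in pairs:
        misses.extend(verify_segment(source, target, lexicon))
    return misses


if __name__ == "__main__":
    pairs = [
        ("El decreto entra en vigor.", "Dekretua indarrean jarriko da."),
        ("Véase el anexo.", "Ikus eraskina."),
    ]
    for miss in verify_parallel(pairs):
        print(f"{miss.source_term}: expected {miss.expected}, context: {miss.context}")
```

Prefix matching is a deliberately crude stand-in for morphological analysis. A real verifier for Basque would most likely need lemmatization and support for multiword terms.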
3. 2DITE
RESEARCH TEST (2018)
▪ 2DITE use test
▪ Corpora with special features: bilingual texts
▪ Analysis of legal and administrative terminology
▪ Diachronic test
3. 2DITE
CORPORA
▪ Basque legal texts
▪ Two corpora:
1978 – 1994
2014 – 2018
▪ Period of 25 years → evolution
▪ Volume: 375 000 words (180 000 + 195 000)
3. 2DITE
LEXICON COMPILATION
▪ Normalized/agreed terms in the legal and administrative sphere
• Basque Advisory Council’s Terminology Committee: dictionaries from those domains
• Basque Language Academy (Euskaltzaindia): Basic Legal-Parliamentary Dictionary
• Justice Department of the Basque Government: Commission for the normalization of court forms
▪ 3 249 concepts, 5 212 terms
3. 2DITE
1978 – 1994 corpus
▪ Correct forms:
Occurrences: 13 861
Concepts: 607
▪ Other forms:
Occurrences: 5 608
Concepts: 367
CORRECT FORMS: 71%
OTHER FORMS: 29%
TOTAL OF OCCURRENCES: 19 469
3. 2DITE
2014 – 2018 corpus
▪ Correct forms:
Occurrences: 15 989
Concepts: 831
▪ Other forms:
Occurrences: 2 035
Concepts: 323
CORRECT FORMS: 88%
OTHER FORMS: 12%
TOTAL OF OCCURRENCES: 18 024
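The shares reported for the two corpora follow directly from the occurrence counts. The sketch below recomputes them to one decimal place; the function and variable names are illustrative. The slides give whole percentages, so small rounding differences are to be expected.

```python
def shares(correct, other):
    """Return (total, % correct forms, % other forms), rounded to one decimal."""
    total = correct + other
    return total, round(100 * correct / total, 1), round(100 * other / total, 1)


# Occurrence counts as reported on the slides: (correct forms, other forms).
CORPORA = {
    "1978-1994": (13861, 5608),
    "2014-2018": (15989, 2035),
}

if __name__ == "__main__":
    for period, (correct, other) in CORPORA.items():
        total, pct_correct, pct_other = shares(correct, other)
        print(f"{period}: {total} occurrences, "
              f"{pct_correct}% correct, {pct_other}% other")
```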
3. 2DITE
1978 – 1994 corpus
Wide variety of cases:
▪ Typographical errors
▪ Dubious equivalences: bezero (client) ≠ kontsumitzaile (consumer)
▪ Not adapted to general lexicon norms: giza- (“human” in compound nouns)
3. 2DITE
1978 – 1994 corpus
▪ Forms that do not meet general lexicon criteria established later on:
▪ dekreto (24%) / dekretu (76%)
▪ dirulaguntza (30%) / diru(-)laguntza (70%)
▪ eraskin (47%) / eranskin (53%)
▪ funtzionari (49%) / funtzionario (51%)
Many forms found in the corpus were standardized later on
Example: funtzionario (-ario, standardized in 1995)
3. 2DITE
1978 – 1994 corpus
▪ Terminology standardization: in some cases, variation is greater than it is today
▪ bizisari,… (41%) / pentsio (59%)
▪ aldibaterako xedapen,… (45%) / xedapen iragankor (55%)
▪ arduralari batzorde, eraentza batzorde,… (51%) / administrazio kontseilu (49%)
3. 2DITE
1978 – 1994 corpus
▪ Terminology standardization: an example with an extremely large variation
▪ aurrekontu orokorrak (64%)
▪ guztizko diruegitamuak (21%)
▪ diruegitamu nagusiak (7%)
▪ guztizko diru-egitamuak (2.6%)
▪ aurrekontu nagusiak (2.2%)
▪ guztizko aurrekontuak (1.1%)
▪ erabateko aurrekontuak (0.7%)
▪ guztizko diruizendapena (0.4%)
▪ orotariko diruegitamuak (0.4%)
3. 2DITE
2014 – 2018 corpus
Far less variation:
▪ Few typographical errors
▪ General lexicon criteria are followed far more consistently
dekretu, eranskin, funtzionario (100%)
dekreto, eraskin, funtzionari (0%)
3. 2DITE
2014 – 2018 corpus
▪ Terminology standardization: significantly less formal variation
▪ pentsio, xedapen iragankor (100%)
▪ aurrekontu orokorrak (100%)
▪ Even so, usage still diverges in some cases
▪ falta (oso) larri (46%) / falta (oso) astun (54%)
▪ gizarte-arloko epaitegi (42%) / lan-arloko epaitegi (58%)
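The diachronic comparison can be summarized as the change, in percentage points, in the share of each form that reaches 100% in the later corpus. The sketch below uses only the percentages given on the slides. Treating the form that reaches 100% as the recommended one is an assumption of this example, as are the names used in the code.

```python
# Share (%) of each form in the 1978-1994 and 2014-2018 corpora,
# taken from the percentages on the slides.
FORM_SHARE = {
    "dekretu": (76, 100),
    "eranskin": (53, 100),
    "funtzionario": (51, 100),
    "pentsio": (59, 100),
    "xedapen iragankor": (55, 100),
    "aurrekontu orokorrak": (64, 100),
}


def change_in_points(form_share):
    """Percentage-point change between the earlier and the later corpus."""
    return {form: later - earlier for form, (earlier, later) in form_share.items()}


if __name__ == "__main__":
    changes = change_in_points(FORM_SHARE)
    # List the forms that gained the most ground first.
    for form, delta in sorted(changes.items(), key=lambda item: -item[1]):
        print(f"{form}: +{delta} points")
```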
3. 2DITE
OVERVIEW
▪ Logical evolution of implantation
▪ Standardization in practice → standardized terminology
▪ Lexical prescriptions and terminology standardization processes → updating and adapting terms in practice
▪ 2DITE, a complementary tool for terminometry
4. LOOKING TO THE FUTURE
▪ Terminometry research is strategic
▪ In Basque terminology, it is set to be the next logical phase of language planning
▪ Implantation of the proposals of the Terminology Committee
▪ Adapting and improving the terminological proposals for Basque
Iker Etxebeste