Language Resources in Indonesia

Language Technology & Applied Information Laboratory

Directorate for Information Technology and Electronics

Agency for the Assessment & Application of Technology (BPPT)

Indonesia


TBIT Laboratory - BPPT

- Apply, assess and develop language technology and applied information technology in support of the Government's program for developing IT and electronics in Indonesia

- Advise on and set up national government policy for developing language technology and information technology

- Develop and deploy language technologies in language processing, text analysis and generation, information retrieval and extraction, and machine translation

- Develop and maintain language resources, i.e., grammar rules, electronic dictionaries and annotated corpora

- Develop an Electronic Data Interchange (EDI) and electronic commerce suite for SMEs

Project Portfolio

- Multilingual Machine Translation System (CICC-MMTS)

- KEBI (Indonesian Electronic Dictionaries)

- UNL (Universal Networking Language)

- INCI (Indonesian National Corpus Initiative)

- Online Indonesian-English dictionary on a news portal (Detik.com)

- Multimedia Dictionary (including a speech synthesizer)

- Yanetra (NLP tools for the blind)

- Others:

  - Manufacturing Technology supported by advanced and integrated information system through International Cooperation (MATIC) for Automotive, Apparel, and Electronics

  - Web Information Gateway for Apparel Electronic Commerce Projects

Indonesian Electronic Dictionaries - KEBI

- Word dictionary (50K root words, ~250K derived words)

- Concept dictionary

- Co-occurrence dictionary

- Terminology dictionary (15K terms)
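The word dictionary lists derived forms under their root words. The sketch below shows one way such a root-to-derived-word mapping could be stored so that a derived form such as "mengajar" resolves to its root "ajar". The class name, its methods and the sample entries are hypothetical illustrations; KEBI's actual storage format is not described in these slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WordDictionary:
    """Toy root/derived-word dictionary (hypothetical layout, not KEBI's format)."""

    # root word -> derived forms listed under that root
    entries: dict[str, set[str]] = field(default_factory=dict)
    # any known form (root or derived) -> its root, kept as an inverse index
    root_of: dict[str, str] = field(default_factory=dict)

    def add(self, root: str, derived: list[str]) -> None:
        """Register a root word together with its derived forms."""
        root = root.lower()
        forms = {form.lower() for form in derived}
        self.entries.setdefault(root, set()).update(forms)
        self.root_of[root] = root
        for form in forms:
            self.root_of[form] = root

    def lookup_root(self, word: str) -> str | None:
        """Return the root of a known word form, or None if it is not listed."""
        return self.root_of.get(word.lower())

    def family(self, word: str) -> set[str]:
        """Return the root plus every derived form listed under it."""
        root = self.lookup_root(word)
        if root is None:
            return set()
        return {root} | self.entries[root]


if __name__ == "__main__":
    dictionary = WordDictionary()
    dictionary.add("ajar", ["belajar", "mengajar", "pelajaran", "pengajar"])
    dictionary.add("tulis", ["menulis", "tulisan", "penulis"])
    print(dictionary.lookup_root("Mengajar"))    # ajar
    print(sorted(dictionary.family("penulis")))  # ['menulis', 'penulis', 'tulis', 'tulisan']
```

Listing derived forms explicitly, rather than stripping affixes at lookup time, sidesteps Indonesian morphophonemic changes such as the nasal assimilation by which meN- plus "tulis" gives "menulis".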

Indonesian Dictionary Online (KEBI Online): http://nlp.aia.bppt.go.id

Indonesian-English Online Dictionary

Indonesian-English online dictionary on the Detik.com portal (number 1 for online breaking news)

[Figure: system diagram with the components News Article, Content Management System, Dynamic HTML Generator, Online Dictionary, English Summarization, (Mini) Web Pages with English word links, and User]
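The Detik.com service combines news pages with the online dictionary and produces web pages carrying English word links. As a hedged illustration only, the function below turns the words of an Indonesian news text that have a dictionary entry into HTML links whose tooltip shows an English gloss. The toy glossary, the lookup URL pattern and the function name are assumptions; the slides do not describe how the service actually generates its links.

```python
import html
import re
from urllib.parse import quote

# Hypothetical Indonesian -> English glossary standing in for the online dictionary.
GLOSSARY = {
    "jaksa": "prosecutor",
    "keuangan": "finance",
    "korupsi": "corruption",
    "menteri": "minister",
}

# Hypothetical lookup URL pattern; not the real Detik.com or BPPT endpoint.
LOOKUP_URL = "https://dictionary.example/lookup?q={}"

# Words are runs of letters, optionally joined by hyphens (e.g. "Eselon-I").
WORD_RE = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def link_words(text: str, glossary: dict[str, str]) -> str:
    """Return escaped HTML in which every glossary word links to its entry."""
    pieces = []
    last = 0
    for match in WORD_RE.finditer(text):
        pieces.append(html.escape(text[last:match.start()]))  # text between words
        word = match.group(0)
        gloss = glossary.get(word.lower())
        if gloss is None:
            pieces.append(html.escape(word))
        else:
            href = LOOKUP_URL.format(quote(word.lower()))
            pieces.append(f'<a href="{html.escape(href)}" title="{html.escape(gloss)}">'
                          f"{html.escape(word)}</a>")
        last = match.end()
    pieces.append(html.escape(text[last:]))  # trailing text after the last word
    return "".join(pieces)


if __name__ == "__main__":
    print(link_words("Menteri Keuangan hadir.", GLOSSARY))
```

Escaping every non-link fragment keeps arbitrary article text from injecting markup into the generated page.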

Indonesian National Corpus Initiative (INCI/KNBI)

- Source: national news agency LKBN ANTARA

- 50,000 sentences (~1 million words)

- Ambiguous word types

- Ambiguous word tokens

- POS and phrase-attachment ambiguity
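The difference between ambiguous word types and ambiguous word tokens can be measured directly on a tagged corpus: a type is ambiguous if it occurs with more than one tag, and a token is ambiguous if its type is. The sketch below computes both ratios; the function name and the tag labels in the example are hypothetical, not the INCI tagset.

```python
from collections import Counter


def ambiguity_rates(tagged_tokens: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (ambiguous-type ratio, ambiguous-token ratio) for (word, tag) pairs."""
    if not tagged_tokens:
        return 0.0, 0.0
    tags_per_type: dict[str, set[str]] = {}
    for word, tag in tagged_tokens:
        tags_per_type.setdefault(word.lower(), set()).add(tag)
    # A word type is ambiguous if the corpus assigns it more than one tag.
    ambiguous_types = {w for w, tags in tags_per_type.items() if len(tags) > 1}
    # Every running occurrence of an ambiguous type counts as an ambiguous token.
    token_counts = Counter(word.lower() for word, _ in tagged_tokens)
    ambiguous_tokens = sum(n for w, n in token_counts.items() if w in ambiguous_types)
    return (
        len(ambiguous_types) / len(tags_per_type),
        ambiguous_tokens / len(tagged_tokens),
    )


if __name__ == "__main__":
    # Toy sample: "akan" is used both as an auxiliary and as a preposition.
    sample = [("dia", "PRON"), ("akan", "AUX"), ("datang", "VERB"),
              ("akan", "PREP"), ("dia", "PRON")]
    print(ambiguity_rates(sample))  # (0.3333333333333333, 0.4)
```

Token ambiguity is usually the more telling figure for tagging effort, since a few very frequent ambiguous types can dominate running text.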

[NP <JAKSA:IDNCC$IN11135> <AGUNG:IDAJGP$IAJVA> NP] [VP <BERIKAN:IDVT/IDVBT$IVPBN> [NP <CERAMAH:IDNCA$INCAEV> NP] [PP <DI:IDPP$IPPLA> [NP <DEPARTEMEN:IDNCA$INCAOR> <KEUANGAN:IDNCA$INCACT> NP] VP]

[NP <Jaksa:IDNCC$IN11135> <Agung:IDAJGP$IAJVA> <Sukarton:IDNM$null> <Marmosudjono:IDNM$null> <SH:IDNM$INMTL> NP] [ADP <hari:IDNCA$INCATM> <Jumat:IDNM$INMDY> ADP] [PP <di:IDPP$IPPLA> <hadapan:IDNCA$INCALC> [NP <Menteri:IDNCC$IN11135> <Keuangan:IDNCA$INCACT> <Menteri:IDNCC$IN11135> <Muda:IDAJGP$IAJST> <Keuangan:IDNCA$INCACT> <Menteri:IDNCC$IN11135> <Perdagangan:IDNCA$INCACT> <dan:IDCJCO$ICJCOAD> <para:IDPP$IPPACC> <pejabat:IDNCA$INCACT> <Eselon-I:IDNM$null> <lingkup:IDNCA$INACC> <Departemen:IDNCA$INCAOR> <Keuangan:IDNCA$INCACT> NP] PP] [VP <mengadakan:IDVT/IDVBT$IVABS> [NP <pemaparan:IDNCA$INCAAC> <tentang:IDAJGP$IAJGT> <kejahatan:IDNCA$INCACT> <korupsi:IDNCA$INCASD> <dan:IDCJCO$ICJCOAD> <penyelundupan:IDNCA$INCAAC> NP] VP]
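Each token in the annotated sample is written as <form:POS$code>, and phrases are delimited by [LABEL ... LABEL]. A minimal reader for this notation is sketched below. Reading the part before "$" as one or more POS tags separated by "/" and the part after "$" as a secondary code (with "null" meaning none) is an assumption drawn from the sample, not a documented specification; the class and function names are hypothetical. Because the sample's headline opens a PP that is never explicitly closed, the reader closes any inner phrases still open when an outer closing label arrives.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# "[NP" opens a phrase, "NP]" closes it, "<form:POS$code>" is an annotated word.
TOKEN_RE = re.compile(r"\[([A-Z]+)|([A-Z]+)\]|<([^<>]+)>")


@dataclass
class Word:
    form: str
    pos: list[str]    # assumed: alternative POS tags before "$", split on "/"
    code: str | None  # assumed: secondary code after "$"; "null" becomes None


@dataclass
class Phrase:
    label: str
    children: list = field(default_factory=list)


def parse_word(body: str) -> Word:
    form, _, annotation = body.partition(":")
    pos, _, code = annotation.partition("$")
    return Word(form, pos.split("/"), None if code in ("", "null") else code)


def parse_annotated(text: str) -> Phrase:
    """Build a phrase tree (under an artificial ROOT) from the bracket notation."""
    root = Phrase("ROOT")
    stack = [root]
    for match in TOKEN_RE.finditer(text):
        open_label, close_label, word = match.groups()
        if open_label:
            phrase = Phrase(open_label)
            stack[-1].children.append(phrase)
            stack.append(phrase)
        elif close_label:
            if close_label not in [p.label for p in stack[1:]]:
                raise ValueError(f"unmatched closing label {close_label}]")
            # Pop up to and including the innermost open phrase with this label,
            # implicitly closing any inner phrase whose "X]" is missing.
            while stack.pop().label != close_label:
                pass
        else:
            stack[-1].children.append(parse_word(word))
    return root


if __name__ == "__main__":
    headline = ("[NP <JAKSA:IDNCC$IN11135> <AGUNG:IDAJGP$IAJVA> NP] "
                "[VP<BERIKAN:IDVT/IDVBT$IVPBN> [NP <CERAMAH:IDNCA$INCAEV> NP] "
                "[PP<DI:IDPP$IPPLA> [NP <DEPARTEMEN:IDNCA$INCAOR> "
                "<KEUANGAN:IDNCA$INCACT>NP] VP]")
    tree = parse_annotated(headline)
    print([child.label for child in tree.children])  # ['NP', 'VP']
```

The tokenizer does not depend on whitespace, so both the spaced form and the original run-together form such as "[VP<BERIKAN" parse identically.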

BIAS (Bahasa Indonesia Analysis System)

- Part of CICC-MMTS

- Improvement using a stochastic-symbolic approach

- Supervised and unsupervised learning

- 15,000 sentences of annotated corpus (based on the GDA tagset)

- ISTAG (POS tagger)

- ISPARSE (skeleton parser)
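The slides describe ISTAG only as a POS tagger inside a stochastic-symbolic system trained with supervised and unsupervised learning. To illustrate the stochastic, supervised side in general terms, here is a minimal bigram hidden Markov model tagger with add-one smoothing and Viterbi decoding. The model choice, the smoothing, the function names, the toy sentences and the tag names are all assumptions, not ISTAG's actual design or the GDA tagset.

```python
import math
from collections import Counter, defaultdict

START, END = "<s>", "</s>"


def train_hmm(tagged_sentences):
    """Count tag-bigram transitions and tag-to-word emissions (supervised)."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
        trans[prev][END] += 1
    return trans, emit


def log_prob(counter, key, outcomes):
    """Add-one smoothed log probability of `key` given the counts in `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + outcomes))


def viterbi(words, trans, emit):
    """Return the most probable tag sequence for `words` under the bigram HMM."""
    if not words:
        return []
    tags = list(emit)
    n_words = len({w for c in emit.values() for w in c}) + 1  # +1 for unknown words
    n_next = len(tags) + 1                                    # every tag plus END
    # best[i][tag] = (log score of best path ending in tag at position i, backpointer)
    best = [{t: (log_prob(trans[START], t, n_next)
                 + log_prob(emit[t], words[0].lower(), n_words), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max((best[-1][p][0] + log_prob(trans[p], t, n_next), p)
                              for p in tags)
            column[t] = (score + log_prob(emit[t], word.lower(), n_words), prev)
        best.append(column)
    # Include the transition into END, then follow backpointers right to left.
    last = max(tags, key=lambda t: best[-1][t][0] + log_prob(trans[t], END, n_next))
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return path[::-1]


if __name__ == "__main__":
    training = [
        [("saya", "PRON"), ("membaca", "VERB"), ("buku", "NOUN")],
        [("dia", "PRON"), ("membaca", "VERB"), ("koran", "NOUN")],
        [("saya", "PRON"), ("menulis", "VERB"), ("surat", "NOUN")],
    ]
    trans, emit = train_hmm(training)
    print(viterbi(["dia", "menulis", "buku"], trans, emit))  # ['PRON', 'VERB', 'NOUN']
```

Working in log space avoids numeric underflow on long sentences; a symbolic component, as in a stochastic-symbolic design, would typically constrain or post-correct these statistical choices.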

UNL Project


Universal Networking Language (UNL).

- Deconverter & Enconverter System

- UNL graph displayer System

- UNL editor System

- Indonesian Language Server: http://unlserver.aia.bppt.go.id
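UNL represents the meaning of a sentence as a graph of binary relations between Universal Words (UWs), which is what a graph displayer or editor operates on. The sketch below reads simple relation lines of the form relation(UW1, UW2) into an edge list and adjacency map. It is a hypothetical illustration assuming only that simple line form; full UNL documents also carry headers, compound-UW scopes and other syntax this sketch ignores, and the example UWs and their constraints are illustrative rather than taken from a UNL knowledge base.

```python
import re
from collections import defaultdict

# relation label, an optional ":scope" id, then the parenthesized argument list
RELATION_RE = re.compile(r"^\s*([a-z]{2,4})(?::\w+)?\((.*)\)\s*$")


def split_top_level(args: str) -> list[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(args):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(args[start:i].strip())
            start = i + 1
    parts.append(args[start:].strip())
    return parts


def parse_unl(lines):
    """Return edges (relation, source UW, target UW) and an adjacency map."""
    edges = []
    graph = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        match = RELATION_RE.match(line)
        if match is None:
            raise ValueError(f"not a UNL relation line: {line!r}")
        relation, args = match.groups()
        source, target = split_top_level(args)  # exactly two UWs expected
        edges.append((relation, source, target))
        graph[source].append((relation, target))
    return edges, dict(graph)


if __name__ == "__main__":
    example = [
        "agt(give(icl>do).@entry.@past, prosecutor(icl>person))",
        "obj(give(icl>do).@entry.@past, lecture(icl>speech))",
        "plc(give(icl>do).@entry.@past, ministry(icl>organization))",
    ]
    edges, graph = parse_unl(example)
    for relation, target in graph["give(icl>do).@entry.@past"]:
        print(relation, "->", target)
```

Splitting only on top-level commas matters because UW constraint lists such as "(icl>do)" may themselves contain commas.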

Other resources

- Speech recognition system (Bandung Institute of Technology)

- Indonesian spelling checker for Microsoft Word (Gadjah Mada University)

- Computational lexicon research (National Language Center)

- Computational morphology (Atma Jaya University)