Recent Activities of Speech Corpora and Assessment in Korea
2000. 10. 16
Yong-Ju Lee
Wonkwang University
Korea
Recent trends in the commercial speech technology market in Korea
The number of newly established venture companies has increased rapidly in recent years
These include not only speech technology solution companies but also value-added speech application system and service companies: CTI, UMS, VoIP, voice portals, GPS, mobile phones, toy companies, ...
Reasons
The government's venture business promotion policy
Speech technology is an attractive item for new technology businesses
Many researchers and engineers have spun off from government-supported research institutes and large companies (ETRI, LG, SAMSUNG, ...)
Many university labs have also taken part in the business
Some major international (multinational) corporations have also entered the Korean speech technology market, which has further promoted the market
Common requirements of speech technology companies
Shared use of language resources
An objective methodology for speech I/O assessment
All of them are deeply interested in establishing a proper organization to address these needs
Spoken Language Resources and Assessment Center (SLRAC)
Role
Systematic construction and distribution of Korean language resources (speech only at the initial stage)
An active role in preparing speech I/O assessment methodologies
A technical information center for speech technology
Status
SLRAC will first be set up at the Speech And Language Science Laboratory (SALL) at Wonkwang University and will start its work late this year
Distribution items
First stage
Results of government-funded speech-related projects
Results of joint construction by educational-industrial consortia
Next stage
Private results of each organization (industries, universities, etc.)
A long-term construction and administration program supported by the government
Distribution Policy, ...
Distribution policy
Resources will be released to domestic organizations first, and international use will follow after a short time lag
Others
We hope to keep in contact with LDC and ELRA
Details will be presented soon!
Brief introduction to projects related to speech and language corpora and assessment
Construction of speech and language corpora and assessment methodologies (2000 ~ 2001, 2 years)
Supported by the Ministry of Science and Technology
Goal
Language part (carried out by KAIST)
Design of a performance evaluation methodology for machine translation systems
Design of a performance evaluation methodology for information retrieval systems
Speech part (carried out by Wonkwang Univ.)
Construction of a 2,000-speaker telephone speech corpus: isolated and connected digits, phonetically rich words
Setup and modification of the K-ToBI transcription system and prototyping of a prosody DB
Design and construction of speech and language corpora for a Korean dictation system
Continued
The results (speech corpora) will be distributed through SLRAC
Details will be presented at the next meeting
Continued
Research on a basic platform for dialog systems (late 2001 ~ 2003, 3 years)
Supported by the Ministry of Commerce, Industry and Energy
Carried out by Seogang Univ. & Wonkwang Univ.
Various kinds of dialog speech corpora will be produced for various tasks and environments
Continued
Automatic captioning of TV programs using speech recognition (2000 ~ 2001)
Supported by the Ministry of Information and Communication
Carried out by ETRI
A broadcast news speech corpus will be produced
Continued
Speech interface for Internet applications (2000 ~ 2002)
Supported by the Ministry of Information and Communication
Carried out by Korea Telecom
Various speech corpora are now being prepared: word, phrase and sentence speech corpora for web applications
Continued
Brain science research (1998 ~ 2007)
Supported by the Ministry of Science and Technology
Carried out by KAIST
Some speech corpora will be designed for language perception studies
Continued
Other private industries
Industries and other organizations are preparing (or have already prepared) various speech corpora
Telephone speech (wired and mobile)
Various kinds of speech corpora for the PC environment
Command words for the car environment (voice dialing, GPS, etc.)
etc.
A new approach to building Korean speech corpora
Several industries share the expense of corpus construction
An experienced group produces the corpus
SLRAC hosts the project and sells the results
Benefits are given to the first participants
First trial
Korean DIGIT speech corpus
A difficult but essential item for Korean speech recognition
Korean digits are monosyllables
First stage: 500 speakers
Contents
Isolated words, connected 4-digit strings, digit strings of various lengths
(telephone numbers, ID numbers, dates and times, credit card numbers, etc.)
Various kinds of collection environments
Will be released early next year
Next candidates: names, geographical names, etc.
Korean COCOSDA
Consulting
Establishment of a speech and language resource distribution center (SLRAC, etc.)
Planning of speech-related national projects
Sponsoring
Technical meetings on speech I/O assessment and speech corpora under the auspices of the academic society
The Korean Science Foundation (KOSEF) has started to support this work through its Special Interest Group promotion program (2000 ~ 2004): “SIG for speech I/O assessment and speech corpora”
This support will help make these activities more active
Hosting
The 2001 Oriental COCOSDA workshop in Korea (24 Aug. 2001)
Oriental COCOSDA 2001