Notes on ICASSP 2004 Arthur Chan May 24, 2004


Notes on ICASSP 2004

Arthur Chan, May 24, 2004


This Presentation (5 pages)

Brief note of ICASSP 2004
NIST RT 04 Evaluation results
Other interesting things related to CALO


NIST RT 04 Meeting Transcription: Headlines

Meeting transcription is a challenge to core technology, evaluation and resource preparation.

Core technology
  Speaker segmentation
  Speech to Text (STT)
Evaluation
  A new evaluation scheme is devised for overlapped speech.
Resource preparation
  LDC has a big headache in preparing the data.


Speaker Segmentation

Segmenting the speech
  Search for the number of speakers.
  Get speaker turns.
  Measured by the diarization error rate.
Insights (from ISL)
  The more speakers, the harder the task.
  A new measure called speaker speaking time entropy is proposed.
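The notes do not spell out how speaker speaking time entropy is computed. A minimal sketch, assuming it is the Shannon entropy of each speaker's share of the total speaking time; the function name, speaker labels and durations below are hypothetical and not taken from the ISL system:

```python
import math


def speaking_time_entropy(speaking_seconds):
    """Shannon entropy (in bits) of the speaking-time distribution.

    speaking_seconds maps a speaker label to that speaker's total
    speaking time in seconds. This definition is an assumption; the
    notes only name the measure.
    """
    total = sum(speaking_seconds.values())
    if total <= 0:
        raise ValueError("total speaking time must be positive")
    entropy = 0.0
    for seconds in speaking_seconds.values():
        if seconds > 0:  # a silent speaker contributes nothing
            p = seconds / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical meetings: four equally active speakers vs. one dominant speaker.
balanced = speaking_time_entropy(
    {"spk_a": 300.0, "spk_b": 300.0, "spk_c": 300.0, "spk_d": 300.0}
)
dominated = speaking_time_entropy(
    {"spk_a": 1140.0, "spk_b": 20.0, "spk_c": 20.0, "spk_d": 20.0}
)
print(balanced, dominated)
```

Under this assumed definition, a meeting where everyone talks equally scores highest, and a meeting dominated by one speaker scores close to zero.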


STT

Very hard task
  ICSI and ISL use state-of-the-art technology:
    + Constrained linear transform
    + Discriminative training (DT-MAP)
    + Speaker adaptive training
  Individual headphone results: WER of 34.8% for non-overlapping speech.
  Some meetings are very hard: many people speak at the same time.
  Trained on 4 different subsets of data; ICSI data is just one of them (70% of the total).
Insights (from ICSI)
  Feature-based techniques do not help much.
  Multiple distant microphones and microphone array techniques help.
Conclusion: we will also have a hard time.
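For reference, the WER quoted above is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. A minimal sketch of that standard definition follows; it is not the NIST scoring tool, it does no text normalization, and the example sentences are made up:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution plus one insertion over 5 reference words.
wer = word_error_rate(
    "the meeting starts at noon",
    "the meeting start at at noon",
)
print(f"WER = {wer:.1%}")
```

Because insertions count as errors, WER can exceed 100%, which is one reason overlapped speech is awkward to score with a single reference string.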


Evaluation and Resource Preparation

Evaluation
  Overlapped speech requires different schemes for evaluation.
  It will require multiple string matching. (Details unknown yet.)
Resource preparation
  Currently, no tool can satisfy the need of transcribing multiple channels of speech with interaction.
  Professional transcribers failed.


Other interesting news from ICASSP related to CALO

Project EARS: lightly supervised training
3000 hours of closed-captioned speech are used.
Discriminative training is found to be useful for some sites.

Others