Speech tools Jean-Philippe Goldman 03.03.2004


Two questions

What kind of data?

Which task?


What kind of data?

Speech content (noise, multivoice, …)

Data file: sound / transcription / pitch curve

Sampling/quantization: 16k, 12k, 8k, 4k; 8 bit

Size: 16 kHz, 16 bit = 256 kbps = 1.9 MB/min = 115 MB/h

Format

Sound: wav, wma, mp3, ogg, aiff, aifc, au, vox, raw, sd, CSL, Ogg/Vorbis, NIST/Sphere

Transcription: HTK, TIMIT, TextGrid, Phondat

Number of files
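The size figures for 16 kHz, 16-bit sound can be checked with a few lines of arithmetic. The sketch below assumes uncompressed mono PCM and decimal units (1 MB = 10^6 bytes); the function name `pcm_size` is only illustrative.

```python
def pcm_size(sample_rate_hz, bits_per_sample, channels=1):
    """Return (kbit/s, MB per minute, MB per hour) for uncompressed PCM audio.

    Assumes decimal units: 1 kbit = 1000 bits, 1 MB = 10**6 bytes.
    """
    bits_per_second = sample_rate_hz * bits_per_sample * channels
    bytes_per_second = bits_per_second / 8
    kbps = bits_per_second / 1000
    mb_per_min = bytes_per_second * 60 / 1e6
    mb_per_hour = bytes_per_second * 3600 / 1e6
    return kbps, mb_per_min, mb_per_hour


kbps, per_min, per_hour = pcm_size(16000, 16)
print(f"{kbps:.0f} kbps, {per_min:.2f} MB/min, {per_hour:.1f} MB/h")
# prints: 256 kbps, 1.92 MB/min, 115.2 MB/h
```

Calling `pcm_size(8000, 8)` gives the figures for the smallest common telephone-quality setting in the same way.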


Which task?

Visualization and editing: record, play, edit, mix, add effects

Analysis: spectral, pitch

Speech manipulation: Filtering, mixing, adding effects, prosodic manipulation

Annotation: segmentation, labeling

Scripting: Batch, communication with outside

Plotting


Examples of tasks

build stimuli for an experiment (e.g. cross-splicing)

manage a speech database for a TTS engine

create a prosodic database

analyze a speech corpus from experiment recordings

verify/correct an automatic segmentation
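As an illustration of the first task, cross-splicing joins the beginning of one recording to the end of another. This is a minimal sketch, assuming both signals are plain sequences of samples at the same sampling rate and that the cut times are already known (for instance from a segmentation); `cross_splice` is a hypothetical helper, not a function of any tool listed here.

```python
def cross_splice(samples_a, samples_b, sample_rate_hz, cut_a_s, cut_b_s):
    """Join the start of A (up to cut_a_s) with the end of B (from cut_b_s).

    Cut times are in seconds; both signals must share sample_rate_hz.
    """
    cut_a = int(round(cut_a_s * sample_rate_hz))
    cut_b = int(round(cut_b_s * sample_rate_hz))
    if not 0 <= cut_a <= len(samples_a):
        raise ValueError("cut point outside signal A")
    if not 0 <= cut_b <= len(samples_b):
        raise ValueError("cut point outside signal B")
    return list(samples_a[:cut_a]) + list(samples_b[cut_b:])


# Toy signals at 8 samples per second, just to show the indexing.
a = [1] * 8
b = [2] * 8
stimulus = cross_splice(a, b, 8, cut_a_s=0.5, cut_b_s=0.25)
print(stimulus)  # [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
```

With real recordings a hard cut like this can produce an audible click, so in practice the cut points would probably be placed at zero crossings or smoothed with a short fade.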


Two questions

What kind of data?

Which task?

Two rules

There is no single tool that does everything

There are plenty of ways to do one thing


Tool features

Visualization/editing

Analysis

Speech manipulation

Annotation

Scripting

Plotting

Supported formats

Platform/installation

Evolution/community

Accessibility

Price


Software

Goldwave (audio editor)

Esps Xwaves (routines + visualization)

Praat (speech analysis)

WaveSurfer (speech editor)

Transcriber (annotation tool)

Matlab (general-purpose software)

OGI speech tools (routines + application development)

… WinPitch, PitchWorks, Phonedit, CoolEdit …


Goldwave

self-defined as “top rated, professional digital audio editor”


Goldwave

pros: editing (good memory management for big files), many effects, noise reduction, real-time spectrum and VU meters, various formats, batch conversion, effect chains, easy interface

cons: nothing for speech (pitch, formants), Windows only, no scripting

Good for file editing, not for speech


Esps - Waves

Developed by Entropic + AT&T. Now public.

The comp.speech FAQ says:

Esps: a comprehensive set of speech analysis/processing tools

Waves: a graphical front-end for speech processing (waveforms, spectrograms, pitch); includes a signal labeling utility


Esps – waves

pros: powerful, designed for big files

cons: UNIX only (free BSD), non-standard formats, requires programming skills, development has stopped


Praat

Developed by P. Boersma and D. Weenink at the Institute of Phonetic Sciences, University of Amsterdam

general-purpose speech tool: editing, segmentation and labeling, prosodic manipulation


Praat

pros: designed for speech analysis (not only sound editing or spectrogram visualization), nice GUI, scripting, active development and community, prosodic manipulation

cons: limited scripting language, its own native formats for transcription and pitch files


WaveSurfer

Open Source tool for sound visualization and manipulation

speech/sound analysis and sound annotation/transcription

platform for more advanced/specialized applications: extending WaveSurfer with new custom plug-ins or embedding WaveSurfer visualization components in other applications

Requires the Snack toolkit


Transcriber

Authors: C. Barras, E. Geoffrois

Relies on Snack (Tcl/Tk)

Good for annotation

Nice, simple GUI

No speech analysis


Matlab (Mathworks)

Mathematical environment

Signal Processing Toolbox: filter design, spectral analysis, waveform generation, linear prediction

voicebox (2002) [email protected]

pitch determination algorithm (2002) Xuejing Sun [email protected]

colea speech editor (1998) Philip Loizou [email protected] Univ of Texas-Dallas


Matlab (Mathworks)

pros: open, powerful, scripting, excellent plotting

cons: poor speech community, standards, not designed for big files


OGI speech tools/CSLU Toolkit

development started in 1992 in C on Unix, at the Center for Spoken Language Understanding (CSLU) at OGI

Includes:

an X Window display tool (LYRE): display and edit speech signals, spectrograms, phoneme labels, and other information

a set of C library routines (LIBNSPEECH): utilities for converting file formats, filtering, neural network training, vector quantizer, database utility to automate speech database related enquiries

a set of Perl scripts which have been used mainly to automate the use of the OGI Speech Tools

man pages

RAD: rapid application development

points of entry: package (C), script (Tcl), GUI (Tk) levels

free for research use


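Summary

Table of tools by feature (only the text entries are recoverable, listed per tool in their original order)

Columns: Edit, Anal., Manip., Annot., Script, Plot, Format, OS, Evolut., Comm., Price

Goldwave: win, $40

Esps/Waves: C, sh, Unix, free

Praat: yes, native, console, sendpraat, src, free

WaveSurfer + Snack: C, tcl/tk, python, src, free

Transcriber: xml, free

OGI Toolkit: free

Matlab + Sigproc + packages: native, no, BSD, stud. $100, $40/tbx

Legend: one of the cell marks means "yes, but requires some development"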


Expect to do conversions

Sound files: goldwave (win), sox (unix)

Transcription files: scripts to convert text-formatted label files
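A label-conversion script of this kind is usually short. The sketch below assumes the input is an HTK-style label file (one `start end label` triple per line, times in units of 100 ns) and the target is a TIMIT-style file where boundaries are sample indices; the function name `htk_to_samples` and the 16 kHz default are assumptions made for the example.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in 100 ns units


def htk_to_samples(htk_text, sample_rate_hz=16000):
    """Convert 'start end label' lines from 100 ns units to sample indices."""
    lines_out = []
    for line in htk_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        start, end, label = int(fields[0]), int(fields[1]), fields[2]
        start_sample = round(start * sample_rate_hz / HTK_UNITS_PER_SECOND)
        end_sample = round(end * sample_rate_hz / HTK_UNITS_PER_SECOND)
        lines_out.append(f"{start_sample} {end_sample} {label}")
    return "\n".join(lines_out)


htk_labels = "0 2500000 sil\n2500000 4000000 ah"
print(htk_to_samples(htk_labels))
# prints:
# 0 4000 sil
# 4000 6400 ah
```

Other text-formatted label files differ mainly in the time unit and column order, so the same loop can usually be adapted by changing the parsing and the scaling factor.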


Links

www.goldwave.com

www.speech.kth.se/software/#esps

www.praat.org

www.speech.kth.se/software/#wavesurfer

www.cse.ogi.edu/toolkit

www.mathworks.com (Matlab)

www.lpl.univ-aix.fr/~sqlab/ (phonedit)

www.sciconrd.com/pworks.htm (PitchWorks)

www.winpitch.com (WinPitch)

www.adobe.com (CoolEdit > Audition)