MPEG-4 Speech coding · 2000-03-08


1

MPEG-4 Speech Coding

Masayuki Nishiguchi

Audio and Speech Group, HomeNet Processing Lab, HomeNet Laboratories, Sony Corporation

2

Outline

• Background
• MPEG-4 Speech coding - features
• MPEG-4 CELP: Algorithm, Performance, Demonstration
• MPEG-4 HVXC: Algorithm, Performance, Demonstration
• Summary

3

Background

Most existing speech coding standards support only a single functionality: compression.

Speech coding algorithms that combine high coding efficiency with multiple functionalities play an important role in the efficient use of bandwidth and in emerging multimedia applications.

4

MPEG-4 Speech Coding - features

• Two basic algorithms
  - HVXC (Harmonic Vector eXcitation Coding)
  - CELP (Code Excited Linear Prediction)

• Multiple bit-rates: 1.5 to 24 kbps

• Narrow-band and wide-band: CELP

• Lowest bit-rate as an international coding standard: HVXC at 2.0 kbps (fixed), 1.5 kbps average (variable)

• New functionalities
  - Speed / pitch change: HVXC
  - Bit-rate scalability: HVXC, CELP
  - Bandwidth scalability: CELP

5

[Figure: MPEG-4 version-1 Natural Audio. Coders plotted by quality (cellular phone, telephone, AM, FM, CD) against bit-rate (2, 4, 8, 16, 32, 64 kbps): HVXC, CELP (NB-CELP, WB-CELP), GA (AAC, TwinVQ).]

6

MPEG-4 CELP

• Narrow band: 3.85-12.2 kbps, 10-40 ms frame
• Wide band: 10.9-23.8 kbps, 10-20 ms frame
• Multi-rate: 200-800 bps step
• Bit-rate scalability: 2.0 kbps (NB), 4.0 kbps (WB) step
• Bandwidth scalability
• Fine rate control
• Regular pulse (RPE), WB: low complexity
• Multi pulse (MPE), WB and NB: high coding efficiency
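Bit-rate and frame length together fix the bit budget of each frame. The sketch below is only illustrative arithmetic: the function name is made up, and pairing the lowest narrow-band rate with the longest frame (and the highest rate with the shortest frame) is an assumption, since the slide gives only the ranges.

```python
def bits_per_frame(bit_rate_bps: int, frame_ms: int) -> float:
    """Number of bits available for one frame at a constant bit-rate."""
    return bit_rate_bps * frame_ms / 1000


# Assumed pairings of the narrow-band range endpoints (not stated on the slide).
print(bits_per_frame(3850, 40))   # 154.0
print(bits_per_frame(12200, 10))  # 122.0
```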

7

[Figure: Block diagram of the CELP encoder. Inputs: speech, bit-rate. Blocks: LPC analysis, LSP VQ, coefficient interpolation, LPC synthesis filter, long-term synthesis filter, MPE/RPE codebook, codebook control, weighted error calculation. Outputs: LPC parameters, excitation parameters.]

8

[Figure: Structure of the bit-rate scalable coding. One encoder feeds Decoder-1 to Decoder-4. Rate labels: 2, 6, 8, 10, 12 and 22 kbps. Output labels: basic speech, high quality speech (twice), wideband speech.]
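One way to read the scalable structure: the encoder emits a base layer plus enhancement layers, and each decoder uses the base layer together with as many enhancement layers as it receives. The sketch below only adds up layer rates under that reading. The layer list (a 6 kbps base, three 2 kbps steps, one 10 kbps bandwidth layer) is an assumption pieced together from the figure labels, not a layout stated on the slide, and the function names are hypothetical.

```python
from itertools import accumulate

# Assumed layer rates in kbps: base layer first, then enhancement layers.
LAYERS_KBPS = [6, 2, 2, 2, 10]


def decodable_rates(layers):
    """Total bit-rate reached after each successive layer."""
    return list(accumulate(layers))


def rate_with_layers(layers, n_received):
    """Bit-rate a decoder works at when it receives the first n layers."""
    if not 1 <= n_received <= len(layers):
        raise ValueError("n_received must be between 1 and len(layers)")
    return sum(layers[:n_received])


print(decodable_rates(LAYERS_KBPS))      # [6, 8, 10, 12, 22]
print(rate_with_layers(LAYERS_KBPS, 2))  # 8
```

Dropping trailing layers never invalidates the earlier ones, which is what lets a single bit-stream serve decoders at several rates.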

9

Performance

Speech quality of the MPEG-4 speech codecs was evaluated in the official MPEG-4 verification tests in August 1998 at two European labs and one Japanese lab*.

15 Japanese items were evaluated by 16 Japanese listeners in the Japanese Lab.

15 English/German/Swedish items were evaluated by 18 German and 16 Finnish listeners in the European Labs.

* ISO/IEC JTC1/SC29/WG11 MPEG98/N2424 “Report on the MPEG-4 speech codec verification tests,” Oct. 1998

10

Performance

[Figure: Narrow band CELP, Japanese. MOS with 95% CI on a 1 to 5 scale. Conditions: CELP and Scalable CELP (rate labels 6.0, 8.3, 12.0, 8.0 and 12.0 kbps), G.723.1 6.3 kbps, G.729 8.0 kbps, GSM-EFR 12.2 kbps, MNRU 10/20/30/40 dB. Source: ISO/IEC JTC1/SC29/WG11 MPEG98/N2424, “Report on the MPEG-4 speech codec verification tests,” Oct. 1998]
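The charts report a mean opinion score (MOS) with a 95% confidence interval. As a rough sketch of how such a pair of numbers can be computed from listener ratings, the snippet below uses the normal approximation (mean plus or minus 1.96 standard errors). The slides do not say which interval method the verification report used, so the method, the function name and the sample ratings are all assumptions.

```python
import math
import statistics


def mos_with_ci(scores, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    mean = statistics.mean(scores)
    # Sample standard deviation (n - 1 in the denominator).
    sd = statistics.stdev(scores)
    half_width = z * sd / math.sqrt(len(scores))
    return mean, half_width


# Hypothetical ratings on the 1-5 opinion scale, not data from the tests.
mos, hw = mos_with_ci([3, 4, 4, 5, 4])
print(f"MOS = {mos:.2f} +/- {hw:.2f}")  # MOS = 4.00 +/- 0.62
```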

11

[Figure: Wideband CELP, Japanese. MOS with 95% CI on a 1 to 5 scale. Conditions: BW Scalable 16.0 kbps, MPE 17.9 kbps, RPE 18.1 kbps, G.722 48.0 kbps, G.722 56.0 kbps, Layer III 24 kbps, MNRU 10/20/30/40 dB. Source: ISO/IEC JTC1/SC29/WG11 MPEG98/N2424, “Report on the MPEG-4 speech codec verification tests,” Oct. 1998]

12

MPEG-4 CELP demonstration

• 6 kbps NB CELP
• 12 kbps NB CELP
• 22 kbps WB CELP (BW-scalable)

CELP demo samples were generated by NEC.

13

MPEG-4 HVXC

• Low bit-rate / good quality
  - 2.0 / 4.0 kbps (fixed), 1.5 / 3.0 kbps average (variable)
  - HVXC at 2.0 kbps has higher quality than FS1016 CELP at 4.8 kbps

• Bit-rate scalability
  - 2.0 kbps decoding is possible using the 4.0 kbps bit-stream

• Speed change & pitch change
  - Attractive for fast speech database search & browsing

14

Approach

Two different coding schemes are combined: one suited to voiced segments and the other to unvoiced segments.

Voiced: the power spectrum of the LPC residual is represented by its harmonics, which discards phase information. Frequency domain analysis / synthesis.

Unvoiced: CELP coding preserves crisp consonants. Time domain analysis / synthesis.

15

Encoder

[Figure: HVXC encoder block diagram. Blocks: LPC analysis and LSP VQ, LPC inverse filter, FFT, pitch detection, harmonic magnitudes estimation, dimension conversion, weighted VQ, V/UV/MV decision, CELP coding. Outputs: LSP, V/UV/MV, pitch, spectral shape & gain (voiced), stochastic codebook shape & gain (unvoiced).]


20

[Figure: Harmonic spectral magnitudes and fine pitch estimation. Magnitude against frequency, showing the harmonic spectral envelope and the pitch frequency.]
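To make the idea of harmonic spectral magnitudes concrete, the sketch below measures the magnitude at each multiple of a given pitch frequency with a direct DFT sum. It is a simplified stand-in, not the HVXC procedure: the pitch is taken as already known (no fine pitch search), no analysis window is applied, and the function name and test signal are invented for illustration.

```python
import cmath
import math


def harmonic_magnitudes(frame, f0_hz, fs_hz):
    """Magnitude at each harmonic k * f0 below fs / 2, via a direct DFT sum."""
    n = len(frame)
    num_harmonics = int((fs_hz / 2 - 1e-9) // f0_hz)
    mags = []
    for k in range(1, num_harmonics + 1):
        w = 2 * math.pi * k * f0_hz / fs_hz  # radians per sample
        acc = sum(x * cmath.exp(-1j * w * i) for i, x in enumerate(frame))
        mags.append(2 * abs(acc) / n)  # scaled so a unit cosine gives 1.0
    return mags


# Synthetic frame: 200 Hz fundamental (amplitude 1.0) plus its second
# harmonic (amplitude 0.5), 160 samples at 8 kHz.
fs, f0 = 8000, 200
frame = [math.cos(2 * math.pi * f0 * i / fs)
         + 0.5 * math.cos(2 * math.pi * 2 * f0 * i / fs)
         for i in range(160)]
mags = harmonic_magnitudes(frame, f0, fs)
print([round(m, 3) for m in mags[:3]])
```

Only the magnitudes are kept, which matches the statement that phase information is thrown away for voiced segments.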


22

[Figure: Dimension conversion of harmonic spectral magnitudes. Three magnitude against frequency plots.]
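The number of harmonics changes with pitch, while a vector quantizer needs a fixed input dimension, so the magnitudes have to be resampled to a fixed length. The sketch below does this with plain linear interpolation. That choice is an assumption made for brevity (the slides do not specify the interpolation used by the codec), and the function name is hypothetical.

```python
def convert_dimension(values, target_dim):
    """Resample `values` to `target_dim` points by linear interpolation."""
    if not values or target_dim < 1:
        raise ValueError("need a non-empty input and target_dim >= 1")
    if len(values) == 1 or target_dim == 1:
        return [float(values[0])] * target_dim
    last = len(values) - 1
    out = []
    for i in range(target_dim):
        pos = i * last / (target_dim - 1)  # position on the input grid
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1 - frac) * values[lo] + frac * values[hi])
    return out


print(convert_dimension([0.0, 10.0], 5))  # [0.0, 2.5, 5.0, 7.5, 10.0]
```

The decoder would apply the same kind of conversion in the opposite direction, from the fixed dimension back to the number of harmonics implied by the pitch.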


24

[Figure: Vector quantization of harmonic spectral envelope, base layer. The fixed dimension harmonic spectrum is quantized with Shape Codebook-0, Shape Codebook-1 and a gain, with weighting and energy estimation.]
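A minimal sketch of a shape-gain search with two shape codebooks follows. It assumes the reconstruction is gain × (shape0[i] + shape1[j]) and that the error is a weighted squared error; both are assumptions about how the blocks in the figure combine. The search is exhaustive and the gain is left unquantized for simplicity. All names and the toy codebooks are hypothetical.

```python
def search_shape_gain(x, weights, cb0, cb1):
    """Exhaustive search; returns (i, j, gain, weighted_error)."""
    best = None
    for i, s0 in enumerate(cb0):
        for j, s1 in enumerate(cb1):
            shape = [a + b for a, b in zip(s0, s1)]
            num = sum(w * xv * sv for w, xv, sv in zip(weights, x, shape))
            den = sum(w * sv * sv for w, sv in zip(weights, shape))
            # Gain that minimizes the weighted error for this shape pair.
            gain = num / den if den > 0 else 0.0
            err = sum(w * (xv - gain * sv) ** 2
                      for w, xv, sv in zip(weights, x, shape))
            if best is None or err < best[3]:
                best = (i, j, gain, err)
    return best


# Toy codebooks and target, chosen so the best match is exact.
cb0 = [[1, 0, 0], [0, 1, 0]]
cb1 = [[0, 0, 1], [0, 0, -1]]
print(search_shape_gain([0, 2, 2], [1, 1, 1], cb0, cb1))  # (1, 0, 2.0, 0.0)
```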

25

[Figure: Scalable vector quantization of spectral envelope, base & enhancement layer. Blocks: dimension conversion (three instances), SE gain, VQ of SE with shape codebooks Shape0, Shape1, Shape2, Shape3, Shape5, ..., weighted distortion. Each quantizer outputs an index.]
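The base and enhancement layers can be pictured as a two-stage quantizer in which the enhancement layer encodes what the base layer left over. The sketch below shows only that residual idea with unweighted nearest-neighbour search; the dimension conversions and weighted distortion shown in the figure are omitted, and every name and codebook here is hypothetical.

```python
def nearest(codebook, x):
    """Index of the codebook vector closest to x (squared error)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], x)))


def encode_two_layers(x, base_cb, enh_cb):
    """Quantize x with the base codebook, then quantize the residual."""
    i = nearest(base_cb, x)
    residual = [a - b for a, b in zip(x, base_cb[i])]
    j = nearest(enh_cb, residual)
    return i, j


def decode(base_cb, enh_cb, i, j=None):
    """Reconstruct from the base index alone, or from both indices."""
    y = list(base_cb[i])
    if j is not None:
        y = [a + b for a, b in zip(y, enh_cb[j])]
    return y


base_cb = [[0.0, 0.0], [4.0, 4.0]]
enh_cb = [[0.5, -0.5], [-0.5, 0.5]]
i, j = encode_two_layers([4.5, 3.5], base_cb, enh_cb)
print(decode(base_cb, enh_cb, i))     # base layer only: [4.0, 4.0]
print(decode(base_cb, enh_cb, i, j))  # base + enhancement: [4.5, 3.5]
```

A decoder that ignores the enhancement index still gets a valid, coarser envelope, which is the property bit-rate scalability relies on.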


27

[Figure: Scalable CELP encoder for unvoiced segments, base and enhancement layer. Blocks: LPC analysis, VQ of LSP, perceptual weighting filter W(z) with subtraction of the zero-input response of H(z), and two stages that each have a stochastic codebook, a gain codebook, a perceptually weighted LPC synthesis filter H(z) and an error calculation; the second stage codes the quantization error of the first. Bit labels: 6 bits, 4 bits, 5 bits, 3 bits.]

28

Decoder

[Figure: HVXC decoder block diagram. Inputs: LSP, V/UV/MV, pitch, spectral shape & gain, stochastic shape, stochastic gain. Blocks: parameter interpolation for speed control, LSP inverse VQ, inverse VQ and dimension conversion, harmonic synthesis, noise generation, stochastic codebook, windowing, LPC synthesis filter, postfilter. Output: decoded signal.]


33

Harmonic synthesis for voiced excitation

$$ f_m(t) = A_m(t)\cos\bigl(\theta_m(t)\bigr) $$

$$ \theta_m(t) = \int_0^t \omega_m(\tau)\,d\tau + \phi_0 $$

[Figure: plots with axes labelled f and t.]
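A minimal sketch of sinusoidal synthesis over one frame follows: each harmonic is a cosine whose amplitude A_m(t) varies over the frame and whose phase θ_m(t) is the running integral of its angular frequency ω_m plus an initial phase. The sketch assumes harmonic m sits at m times the fundamental, interpolates amplitude and fundamental linearly across the frame, and replaces the integral with a per-sample sum. Those choices and all names are assumptions for illustration, not the codec's specified procedure.

```python
import math


def synthesize_voiced(a_start, a_end, w0_start, w0_end, n_samples, phi0=0.0):
    """Sum of harmonics with interpolated amplitude and accumulated phase.

    a_start / a_end: harmonic amplitudes at the frame boundaries.
    w0_start / w0_end: fundamental frequency in radians per sample.
    """
    out = [0.0] * n_samples
    for m in range(1, len(a_start) + 1):
        theta = phi0  # initial phase of this harmonic
        for t in range(n_samples):
            frac = t / n_samples
            amp = (1 - frac) * a_start[m - 1] + frac * a_end[m - 1]
            out[t] += amp * math.cos(theta)
            # Discrete version of the phase integral: add this sample's
            # angular frequency for harmonic m.
            w0 = (1 - frac) * w0_start + frac * w0_end
            theta += m * w0
    return out


# One harmonic at 1 kHz with 8 kHz sampling: a quarter turn every two samples.
w0 = 2 * math.pi * 1000 / 8000
samples = synthesize_voiced([1.0], [1.0], w0, w0, 8)
print([round(s, 3) for s in samples])
```

Accumulating phase rather than recomputing it from the frequency keeps the waveform continuous when the pitch changes from frame to frame.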


35

Parameter interpolation for speed control

arrays of original parameters: param[n]
arrays of interpolated parameters: mdf_param[m]
time index before the time scale modification: n
time index after the time scale modification: m
ratio of speed change: spd (spd > 1: speed up, spd < 1: speed down)

define:

$$ fr_0 = \lceil m \cdot spd \rceil - 1, \qquad fr_1 = fr_0 + 1 $$

define:

$$ l = m \cdot spd - fr_0, \qquad r = fr_1 - m \cdot spd $$

time scale modified parameters are approximated as:

$$ \mathit{mdf\_param}[m] = \mathit{param}[fr_0] \cdot r + \mathit{param}[fr_1] \cdot l $$
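The interpolation can be sketched as follows: output frame m reads the original parameter track at position m * spd and blends the two neighbouring frames linearly. The sketch takes the lower frame as the integer part of m * spd and clamps indices at the end of the track. Both are assumptions made to keep the example self-contained. Because the blend is linear, any choice of adjacent frames that bracket m * spd gives the same value. The function name is hypothetical and parameters are plain floats here.

```python
import math


def time_scale(param, spd):
    """Linearly interpolate a parameter track for speed ratio `spd`.

    spd > 1 shortens the track (speed up); spd < 1 lengthens it (speed down).
    """
    if spd <= 0:
        raise ValueError("spd must be positive")
    last = len(param) - 1
    out = []
    m = 0
    while m * spd <= last:
        pos = m * spd                    # position on the original time axis
        fr0 = min(int(math.floor(pos)), last)
        fr1 = min(fr0 + 1, last)
        l = pos - fr0                    # distance from the lower frame
        r = 1.0 - l                      # equals fr1 - pos when fr1 = fr0 + 1
        out.append(param[fr0] * r + param[fr1] * l)
        m += 1
    return out


track = [0, 10, 20, 30]
print(time_scale(track, 0.5))  # [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
print(time_scale(track, 2.0))  # [0.0, 20.0]
```

Because only decoder-side parameters are interpolated, the speed can be changed without re-encoding the bit-stream.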

36

[Figure: Harmonics spectra interpolation for speed control, comparing normal speed with speed up.]

37

[Figure: Harmonics spectra interpolation for speed control, speed down.]


39

[Figure: HVXC, Japanese. MOS with 95% CI on a 1 to 5 scale. Conditions: HVXC 2.0 kbps, HVXC 4.0 kbps, FS1016 4.8 kbps, MNRU 10/20/30/40 dB. Source: ISO/IEC JTC1/SC29/WG11 MPEG98/N2424, “Report on the MPEG-4 speech codec verification tests,” Oct. 1998]

40

MPEG-4 HVXC Demonstration

• FS1016 4.8 kbps CELP
• 2 kbps HVXC
• 4 kbps HVXC

41

Demonstration

• 4 kbps HVXC pitch change
• 4 kbps HVXC speed change

Real-time software decoding on a PC

42

Summary

• HVXC at 2.0 kbps and 4.0 kbps > FS1016 CELP at 4.8 kbps.

• NB CELP vs. existing standards at the same bit-rate ranges, providing flexible bit-rate controllability and scalability.

• WB CELP at 18 kbps vs. G.722 at 48 to 56 kbps.

• MPEG-4 speech coding provides new functionalities
  - speed and pitch change
  - bit-rate / bandwidth scalability
  - bit-rate controllability

• International Standard in November 1999

43

References

[1] ISO/IEC JTC1/SC29/WG11 N2503, "Final Draft International Standard of ISO/IEC 14496-3," Dec. 1998
[2] M. Nishiguchi, K. Iijima, J. Matsumoto, "Harmonic Vector Excitation Coding of Speech at 2.0 kbps," IEEE Workshop on Speech Coding, Sep. 1997
[3] T. Nomura, M. Iwadare, M. Serizawa, K. Ozawa, "A Bit rate and Bandwidth Scalable CELP coder," Proc. ICASSP-98, pp. I-341-344, May 1998
[4] T. Nomura, M. Iwadare, N. Tanaka, "MPEG-4/CELP speech coding Algorithm," Tech. Report of IEICE, SP98-89, Nov. 1998
[5] M. Nishiguchi, A. Inoue, Y. Maeda, J. Matsumoto, "Parametric Speech Coding – HVXC at 2.0-4.0 kbps," IEEE Workshop on Speech Coding, June 1999
[6] N. Tanaka, et al., "A Multi-mode Variable Rate Speech Coder for CDMA Cellular Systems," Proc. IEEE VTC, pp. 198-202, Apr. 1996
[7] D. W. Griffin and J. S. Lim, "Multiband Excitation Vocoder," IEEE Trans. ASSP, Vol. 36, pp. 1223-1235, Aug. 1988
[8] M. Nishiguchi, J. Matsumoto, S. Ono, R. Wakatsuki, "Vector Quantized MBE with Simplified V/UV Division at 3.0 Kbps," Proc. ICASSP-93, pp. II-151-154, Apr. 1993
[9] M. Nishiguchi, J. Matsumoto, "Harmonic and Noise Coding of LPC Residuals with Classified Vector Quantization," Proc. ICASSP-95, pp. I-484-487, May 1995
[10] M. Nishiguchi, K. Iijima, J. Matsumoto, "Low bit rate speech coding by Harmonic Vector Excitation Coding," Proc. ASJ 1-2-4, Sep. 1997
[11] ISO/IEC JTC1/SC29/WG11 MPEG98/N2424, "Report on the MPEG-4 speech codec verification tests," Oct. 1998

44

END
