MPEG-4 Speech coding · 2000-03-08 · 9 Performance Speech quality of MPEG-4 speech codecs were...

MPEG-4 Speech Coding

Masayuki Nishiguchi

Audio and Speech Group, HomeNet Processing Lab, HomeNet Laboratories, Sony Corporation

Outline

• Background
• MPEG-4 Speech coding - features
• MPEG-4 CELP: Algorithm, Performance, Demonstration
• MPEG-4 HVXC: Algorithm, Performance, Demonstration
• Summary

Background

Most of the existing speech coding standards support only a single “compression” functionality.

Speech coding algorithms with high coding efficiency and multiple functionalities play an important role in the efficient use of bandwidth and in emerging new applications of multimedia systems.

MPEG-4 Speech Coding - features

• Two basic algorithms
  - HVXC (Harmonic Vector eXcitation Coding)
  - CELP (Code Excited Linear Prediction)

• Multi bit-rates: 1.5 ~ 24 kbps

• Narrow-band and wide-band: CELP

• Lowest bit-rate as an international standard coding: HVXC
  - 2.0 kbps (fixed)
  - 1.5 kbps average (variable)

• New functionalities
  - Speed / pitch change: HVXC
  - Bit-rate scalability: HVXC, CELP
  - Bandwidth scalability: CELP

[Figure: MPEG-4 version-1 Natural Audio. Quality versus bit-rate (kbps; axis ticks 2, 4, 8, 16, 32, 64) for CELP (NB-CELP, WB-CELP) and GA (AAC, TwinVQ), with telephone and cellular phone marked as reference points.]

MPEG-4 CELP

• Narrow band: 3.85-12.2 kbps, 10-40 ms frame
• Wide band: 10.9-23.8 kbps, 10-20 ms frame
• Multi-rate: 200-800 bps step
• Bit-rate scalability: 2.0 kbps (NB), 4.0 kbps (WB) step
• Bandwidth scalability
• Fine rate control
• Regular pulse (WB): low complexity
• Multi pulse (WB, NB): high coding efficiency

Block diagram of the CELP encoder

[Figure: CELP encoder block diagram. Inputs: speech input, bit-rate input. Blocks: LPC analysis, coefficient interpolation, codebook control, MPE/RPE codebook, long-term synthesis filter, LPC synthesis filter, weighted error calculation. Outputs: LPC parameters, excitation parameters.]

Structure of the bit-rate scalable coding

[Figure: One encoder (speech input) feeding Decoder-1 to Decoder-4. Bit-rate labels: 10 kbps, 12 kbps, 22 kbps. Decoder output labels: basic speech, high quality speech, high quality speech, wideband speech.]

Performance

The speech quality of the MPEG-4 speech codecs was evaluated in the official MPEG-4 verification tests in August 1998 at two European labs and one Japanese lab*.

15 Japanese items were evaluated by 16 Japanese listeners in the Japanese Lab.

15 English/German/Swedish items were evaluated by 18 German and 16 Finnish listeners in the European Labs.

* ISO/IEC JTC1/SC29/WG11 MPEG98/N2424 “Report on the MPEG-4 speech codec verification tests,” Oct. 1998

Performance

[Figure: Narrow band CELP - Japanese. MOS with 95% CI. Condition labels, in order: 6.0 kbps, 8.3 kbps, 12.0 kbps; Scalable CELP 8.0 kbps and 12.0 kbps; G.723.1 6.3 kbps; G.729 8.0 kbps; GSM-EFR 12.2 kbps; MNRU 10 dB, 20 dB, 30 dB, 40 dB. Source: ISO/IEC JTC1/SC29/WG11 MPEG98/N2424 “Report on the MPEG-4 speech codec verification tests,” Oct. 1998]

[Figure: Wideband CELP - Japanese. MOS with 95% CI. Condition labels, in order: BW Scalable; 16.0 kbps; MPE 17.9 kbps; RPE 18.1 kbps; G.722 48.0 kbps; G.722 56.0 kbps; Layer III 24 kbps; MNRU 10 dB, 20 dB, 30 dB, 40 dB.]

MPEG-4 CELP demonstration

6 kbps NB CELP

12 kbps NB CELP

22 kbps WB CELP (BW-scalable)

CELP demo samples were generated by NEC

MPEG-4 HVXC

• Low bit-rate / good quality
  - 2.0 / 4.0 kbps (fixed), 1.5 / 3.0 kbps average (variable)
  - HVXC at 2.0 kbps has higher quality than FS1016 CELP at 4.8 kbps

• Bit-rate scalability
  - 2.0 kbps decoding is possible using the 4.0 kbps bit-stream

• Speed change & pitch change
  - Attractive for fast speech database search & browsing

Approach

Two different types of coding schemes are combined: one suited to voiced segments and the other to unvoiced segments.

Voiced: The power spectrum of the LPC residual is represented by its harmonics, so phase information is discarded. Frequency-domain analysis / synthesis.

Unvoiced: CELP coding is used to obtain crisp consonants. Time-domain analysis / synthesis.

Encoder

[Figure: HVXC encoder block diagram. Blocks: LPC analysis and LSP VQ, LPC inverse filter, FFT, pitch detection, V/UV/MV decision, harmonic magnitudes estimation, dimension conversion, weighted VQ, CELP coding. Outputs: V/UV/MV; spectral shape & gain (voiced); stochastic codebook shape & gain (unvoiced).]

Harmonic spectral magnitudes and fine pitch estimation

[Figure: Spectrum plot (vertical axis: magnitude) showing the harmonic spectral envelope and the pitch frequency.]

Dimension conversion of harmonic spectral magnitudes

[Figure: Magnitude versus frequency plots illustrating the dimension conversion of the harmonic spectral magnitudes.]
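The number of harmonics in a frame depends on the pitch, whereas a vector quantizer needs a fixed-length input; dimension conversion bridges the two. The slides do not state the interpolation method, so the Python sketch below uses plain linear interpolation purely as a stand-in. The function name `convert_dimension` and the parameter `target_dim` are hypothetical and are not taken from the standard.

```python
def convert_dimension(magnitudes, target_dim):
    """Resample a variable-length list of harmonic magnitudes to target_dim points.

    Assumption: linear interpolation between neighbouring harmonics.
    This illustrates the idea only; it is not the HVXC method.
    """
    n = len(magnitudes)
    if n == 0 or target_dim <= 0:
        raise ValueError("need at least one magnitude and a positive target_dim")
    if n == 1 or target_dim == 1:
        # Nothing to interpolate between: repeat the first value.
        return [float(magnitudes[0])] * target_dim

    converted = []
    for k in range(target_dim):
        # Position of output point k on the input index axis [0, n - 1].
        pos = k * (n - 1) / (target_dim - 1)
        lower = int(pos)
        upper = min(lower + 1, n - 1)
        frac = pos - lower
        converted.append(magnitudes[lower] * (1.0 - frac) + magnitudes[upper] * frac)
    return converted


if __name__ == "__main__":
    # Three harmonics stretched to a five-point vector.
    print(convert_dimension([0.0, 2.0, 4.0], 5))
```

The same function can be run in the opposite direction at the decoder side of such a sketch, mapping the fixed-length vector back to the number of harmonics implied by the decoded pitch.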

Vector quantization of harmonic spectral envelope - base layer -

[Figure: Block diagram. Labels: fixed dimension harmonic spectrum, weighting, energy estimation, Shape Codebook-0, Shape Codebook-1, SE gain.]

Scalable vector quantization of spectral envelope - base & enhancement layer -

[Figure: Block diagram. Labels: dimension conversion (twice), VQ of SE with Shape0, Shape1, Shape2, Shape3, Shape5 . . . ., weighted distortion, adder (+).]

Scalable CELP encoder for unvoiced segments - base and enhancement layer -

[Figure: Block diagram. Labels: input speech, LPC analysis, VQ of LSP, W(z), H(z), 6 bits, gain codebook and stochastic codebook (one pair per layer), perceptual weighting filter and subtraction of zero-input response of H(z), perceptually weighted LPC synthesis filter (one per layer), quantization error.]

Decoder

[Figure: HVXC decoder block diagram. Inputs: V/UV/MV, spectral shape & gain, stochastic shape, stochastic gain. Blocks: LSP inverse VQ, inverse VQ and dimension conversion, harmonic synthesis, noise generation, stochastic codebook, windowing, LPC synthesis filter, postfilter. Output: decoded speech.]

Harmonic synthesis for voiced excitation

$$ f(t) = A(t) \cos\big(\theta(t)\big) $$

$$ \theta(t) = \int_0^t \omega(\tau)\, d\tau + \phi_0 $$
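The synthesis relation on this slide reads as f(t) = A(t) cos θ(t), where the phase θ(t) is the integral of the instantaneous frequency ω from 0 to t plus an initial phase φ0. A minimal discrete-time sketch in Python follows. It assumes the integral is approximated by a running sum of per-sample frequencies in radians per sample; the function name and the list-based interface are hypothetical, and only a single sinusoidal component is generated.

```python
import math


def synthesize_harmonic(amplitudes, omegas, phi0=0.0):
    """Generate f[n] = A[n] * cos(theta[n]) for one sinusoidal component.

    amplitudes: per-sample amplitude track A[n]
    omegas:     per-sample instantaneous frequency in radians per sample
    phi0:       initial phase in radians

    Assumption: theta[n] = phi0 + sum of omegas[0..n-1], a rectangular
    approximation of the integral of omega over time.
    """
    if len(amplitudes) != len(omegas):
        raise ValueError("amplitudes and omegas must have the same length")

    samples = []
    theta = phi0
    for amplitude, omega in zip(amplitudes, omegas):
        samples.append(amplitude * math.cos(theta))
        theta += omega  # accumulate phase for the next sample
    return samples


if __name__ == "__main__":
    # Constant amplitude and constant frequency of a quarter cycle per sample.
    print(synthesize_harmonic([1.0] * 4, [math.pi / 2] * 4))
```

Because the phase is accumulated from the frequency track rather than transmitted, amplitude and frequency can vary smoothly from sample to sample without phase discontinuities, which fits the earlier statement that phase information is discarded for voiced segments.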

Parameter interpolation for speed control

arrays of original parameters: param[n]

arrays of interpolated parameters: mdf_param[m]

time index before the time scale modification: n

time index after the time scale modification: m

ratio of speed change: spd (spd > 1: speed up; spd < 1: speed down)

define:

$$ fr_0 = \lfloor m \cdot spd \rfloor, \qquad fr_1 = fr_0 + 1 $$

define:

$$ r = fr_1 - m \cdot spd, \qquad l = m \cdot spd - fr_0 $$

time scale modified parameters are approximated as:

$$ \mathrm{mdf\_param}[m] = \mathrm{param}[fr_0] \cdot r + \mathrm{param}[fr_1] \cdot l $$
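The interpolation above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the standard's reference code: parameters are treated as one scalar per frame, the output stops once m * spd passes the last original frame, and the upper index is clamped at the final frame. The function name `time_scale_parameters` is hypothetical.

```python
import math


def time_scale_parameters(param, spd):
    """Linearly interpolate per-frame parameters for a speed ratio spd.

    param: list of original per-frame parameter values, param[n]
    spd:   ratio of speed change (spd > 1 speeds up, spd < 1 slows down)

    Returns mdf_param[m] = param[fr0] * r + param[fr1] * l with
    fr0 = floor(m * spd), r = (fr0 + 1) - m * spd, l = m * spd - fr0.
    """
    if spd <= 0:
        raise ValueError("spd must be positive")
    if not param:
        return []

    last = len(param) - 1
    mdf_param = []
    m = 0
    while m * spd <= last:
        position = m * spd
        fr0 = math.floor(position)
        fr1 = min(fr0 + 1, last)  # assumption: clamp at the final frame
        left = position - fr0
        right = (fr0 + 1) - position
        mdf_param.append(param[fr0] * right + param[fr1] * left)
        m += 1
    return mdf_param


if __name__ == "__main__":
    frames = [0.0, 10.0, 20.0, 30.0]
    print(time_scale_parameters(frames, 2.0))  # fewer frames: speed up
    print(time_scale_parameters(frames, 0.5))  # more frames: speed down
```

With spd = 2.0 every second original frame is kept, and with spd = 0.5 a new frame is interpolated halfway between each pair of neighbours. Vector-valued parameters would be handled by applying the same weights to each element.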

Harmonics spectra interpolation for speed control

[Figure: Harmonic spectra at normal speed compared with the interpolated spectra for speed up and for speed down.]

Performance

[Figure: HVXC - Japanese. MOS with 95% CI, from the same MPEG-4 verification tests (MPEG98/N2424). Condition labels, in order: 2.0 kbps; HVXC 4.0 kbps; FS1016 4.8 kbps; MNRU 10 dB, 20 dB, 30 dB, 40 dB.]

MPEG-4 HVXC Demonstration

FS1016 4.8kbps CELP

2kbps HVXC

4kbps HVXC

4kbps HVXC pitch change

4kbps HVXC speed change

Real-time software decoding on a PC

Demonstration

Summary

• HVXC at 2.0kbps and 4.0kbps > FS1016 CELP at 4.8 kbps.

• NB CELP ≈ existing standards at the same bit-rate ranges, while providing flexible bit-rate controllability and scalability.

• WB CELP at 18 kbps ≈ G.722 at 48 to 56 kbps.

• MPEG-4 speech coding provides new functionalities
  - speed and pitch change
  - bit-rate / bandwidth scalability
  - bit-rate controllability

• International Standard in November 1999

References

[1] ISO/IEC JTC1/SC29/WG11 N2503, "Final Draft International Standard of ISO/IEC 14496-3," Dec. 1998
[2] M. Nishiguchi, K. Iijima, J. Matsumoto, "Harmonic Vector Excitation Coding of Speech at 2.0 kbps," IEEE Workshop on Speech Coding, Sep. 1997
[3] T. Nomura, M. Iwadare, M. Serizawa, K. Ozawa, "A Bit rate and Bandwidth Scalable CELP coder," Proc. ICASSP-98, pp. I-341-344, May 1998
[4] T. Nomura, M. Iwadare, N. Tanaka, "MPEG-4/CELP speech coding Algorithm," Tech. Report of IEICE, SP98-89, Nov. 1998
[5] M. Nishiguchi, A. Inoue, Y. Maeda, J. Matsumoto, "Parametric Speech Coding - HVXC at 2.0-4.0 kbps," IEEE Workshop on Speech Coding, June 1999
[6] N. Tanaka, et al., "A Multi-mode Variable Rate Speech Coder for CDMA Cellular Systems," Proc. IEEE VTC, pp. 198-202, Apr. 1996
[7] D. W. Griffin and J. S. Lim, "Multiband Excitation Vocoder," IEEE Trans. ASSP, Vol. 36, pp. 1223-1235, Aug. 1988
[8] M. Nishiguchi, J. Matsumoto, S. Ono, R. Wakatsuki, "Vector Quantized MBE with Simplified V/UV Division at 3.0 Kbps," Proc. ICASSP-93, pp. II-151-154, Apr. 1993
[9] M. Nishiguchi, J. Matsumoto, "Harmonic and Noise Coding of LPC Residuals with Classified Vector Quantization," Proc. ICASSP-95, pp. I-484-487, May 1995
[10] M. Nishiguchi, K. Iijima, J. Matsumoto, "Low bit rate speech coding by Harmonic Vector Excitation Coding," Proc. ASJ, 1-2-4, Sep. 1997
[11] ISO/IEC JTC1/SC29/WG11 MPEG98/N2424, "Report on the MPEG-4 speech codec verification tests," Oct. 1998
