MPEG-4 Speech Coding
Masayuki Nishiguchi
Audio and Speech Group, HomeNet Processing Lab, HomeNet Laboratories, Sony Corporation
Outline
• Background
• MPEG-4 Speech coding - features
• MPEG-4 CELP: Algorithm, Performance, Demonstration
• MPEG-4 HVXC: Algorithm, Performance, Demonstration
• Summary
Background
Most existing speech coding standards support only a single “compression” functionality.
Speech coding algorithms with high coding efficiency and multiple functionalities play an important role in the efficient use of bandwidth and in emerging applications of multimedia systems.
MPEG-4 Speech Coding - features
• Two basic algorithms
  - HVXC (Harmonic Vector eXcitation Coding)
  - CELP (Code Excited Linear Prediction)
• Multiple bit-rates: 1.5 to 24 kbps
• Narrow-band and wide-band: CELP
• Lowest bit-rate of any international coding standard: HVXC at 2.0 kbps (fixed), 1.5 kbps average (variable)
• New functionalities
  - Speed / pitch change: HVXC
  - Bit-rate scalability: HVXC, CELP
  - Bandwidth scalability: CELP
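[Figure: MPEG-4 version-1 Natural Audio coders on a quality vs. bit-rate plane. Horizontal axis: Bit-rate (kbps), marked 2, 4, 8, 16, 32, 64. Vertical axis: Quality, marked Cellular phone, Telephone, AM, FM, CD. Coders shown: HVXC, CELP (NB-CELP, WB-CELP), GA (AAC, TwinVQ).]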
MPEG-4 CELP
• Narrow band: 3.85-12.2 kbps, 10-40 ms frame
• Wide band: 10.9-23.8 kbps, 10-20 ms frame
• Multi-rate: 200-800 bps step
• Bit-rate scalability: 2.0 kbps (NB), 4.0 kbps (WB) step
• Bandwidth scalability
• Fine rate control
• Regular pulse (WB): low complexity
• Multi pulse (WB, NB): high coding efficiency
[Figure: Block diagram of the CELP encoder. Inputs: Speech Input, Bit-rate Input. Blocks: LPC Analysis, LSP VQ, Coefficient Interpolation, Codebook Control, MPE/RPE Codebook, Long-term Synthesis Filter, LPC Synthesis Filter, Weighted Error Calculation. Outputs: LPC parameters, Excitation parameters.]
[Figure: Structure of the bit-rate scalable coding. A single Encoder takes the Speech Input and emits a 6 kbps layer, three 2 kbps layers and a 10 kbps layer. Decoder-1 (6 kbps): basic speech. Decoder-2 (8 kbps): high quality speech. Decoder-3 (12 kbps): high quality speech. Decoder-4 (22 kbps): wideband speech.]
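The layered structure in this figure can be illustrated with a few lines of arithmetic. The sketch below is only an illustration: the layer names and the list layout are assumptions, and only the rates (6, 2, 2, 2 and 10 kbps) are taken from the figure labels. It shows that a decoder reading the first k layers works at the running sum of their rates.

```python
from itertools import accumulate

# Layer rates in kbps as labelled in the figure; the names are illustrative only.
LAYERS = [
    ("core (narrow band)", 6.0),
    ("enhancement 1", 2.0),
    ("enhancement 2", 2.0),
    ("enhancement 3", 2.0),
    ("bandwidth extension", 10.0),
]


def decodable_rates(layers):
    """Return the total bit-rate available after each successive layer."""
    return list(accumulate(rate for _, rate in layers))


if __name__ == "__main__":
    for (name, _), total in zip(LAYERS, decodable_rates(LAYERS)):
        print(f"up to {name}: {total} kbps")
```

The running totals include the 6, 8, 12 and 22 kbps operating points shown for the four decoders.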
Performance
The speech quality of the MPEG-4 speech codecs was evaluated in the official MPEG-4 verification tests in Aug. 1998 at two European labs and one Japanese lab*.
15 Japanese items were evaluated by 16 Japanese listeners in the Japanese lab.
15 English/German/Swedish items were evaluated by 18 German and 16 Finnish listeners in the European labs.
* ISO/IEC JTC1/SC29/WG11 MPEG98/N2424 “Report on the MPEG-4 speech codec verification tests,” Oct. 1998
[Figure: Performance, Narrow band CELP - Japanese. Vertical axis: MOS (1 to 5) with 95% CI. Bars: MPEG-4 CELP and Scalable CELP (labelled 6.0, 8.3, 12.0, 8.0 and 12.0 kbps), G.723.1 at 6.3 kbps, G.729 at 8.0 kbps, GSM-EFR at 12.2 kbps, and MNRU references at 10, 20, 30 and 40 dB. Source: ISO/IEC JTC1/SC29/WG11 MPEG98/N2424.]
[Figure: Performance, Wideband CELP - Japanese. Vertical axis: MOS (1 to 5) with 95% CI. Bars: BW Scalable at 16.0 kbps, MPE at 17.9 kbps, RPE at 18.1 kbps, G.722 at 48.0 kbps, G.722 at 56.0 kbps, Layer III at 24 kbps, and MNRU references at 10, 20, 30 and 40 dB. Source: ISO/IEC JTC1/SC29/WG11 MPEG98/N2424.]
MPEG-4 CELP demonstration
• 6 kbps NB CELP
• 12 kbps NB CELP
• 22 kbps WB CELP (BW-scalable)
CELP demo samples were generated by NEC.
MPEG-4 HVXC
• Low bit-rate / good quality
  - 2.0 / 4.0 kbps (fixed), 1.5 / 3.0 kbps average (variable)
  - HVXC at 2.0 kbps has higher quality than FS1016 CELP at 4.8 kbps
• Bit-rate scalability
  - 2.0 kbps decoding is possible using the 4.0 kbps bit-stream
• Speed change & pitch change
  - Attractive for fast speech database search & browsing
Approach
Two different types of coding scheme are combined: one suited to voiced segments and the other to unvoiced segments.
Voiced: Phase information is discarded by representing the power spectrum of the LPC residual harmonically. Frequency-domain analysis / synthesis.
Unvoiced: Crisp consonants are obtained by CELP coding. Time-domain analysis / synthesis.
[Figure: HVXC encoder block diagram. Blocks: LPC analysis and LSP VQ, LPC inverse filter, FFT, pitch detection, V / UV / MV decision, harmonic magnitudes estimation, dimension conversion, weighted VQ (voiced path), CELP coding (unvoiced path). Output parameters: LSP, V / UV / MV, pitch, spectral shape & gain (voiced), stochastic codebook shape & gain (unvoiced).]
[Figure: Harmonic spectral magnitudes and fine pitch estimation. Axes: Freq (horizontal), Magnitude (vertical). Labels: Harmonic spectral envelope, Pitch frequency.]
[Figure: Dimension conversion of harmonic spectral magnitudes. Three panels, each with axes Frequency (horizontal) and Magnitude (vertical).]
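The number of harmonics changes with the pitch, so the harmonic magnitude vector has to be mapped to a fixed length before vector quantization. The sketch below illustrates that idea only: it swaps in plain linear interpolation via `numpy.interp`, which is an assumption and not necessarily the interpolation used by the codec, and the function name `convert_dimension` and the parameter `fixed_dim` are hypothetical.

```python
import numpy as np


def convert_dimension(mags, fixed_dim):
    """Resample a harmonic magnitude vector to `fixed_dim` points.

    Assumption: linear interpolation on a normalised frequency axis;
    this is a stand-in for whatever interpolation the codec really uses.
    """
    mags = np.asarray(mags, dtype=float)
    if mags.size == 0:
        raise ValueError("need at least one harmonic magnitude")
    if mags.size == 1:
        # A single harmonic carries no shape, so repeat its value.
        return np.full(fixed_dim, mags[0])
    src = np.linspace(0.0, 1.0, mags.size)   # positions of the input harmonics
    dst = np.linspace(0.0, 1.0, fixed_dim)   # positions of the fixed-dimension grid
    return np.interp(dst, src, mags)


if __name__ == "__main__":
    variable = [0.0, 2.0, 4.0]               # three harmonics in this frame
    fixed = convert_dimension(variable, 5)   # encoder side: variable -> fixed
    restored = convert_dimension(fixed, 3)   # decoder side: fixed -> variable
    print(fixed, restored)
```

The same routine is used in both directions here, which mirrors the dimension conversion blocks that appear in both the encoder and the decoder diagrams.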
[Figure: Vector quantization of harmonic spectral envelope - base layer. Input: fixed dimension harmonic spectrum. Blocks: Weighting, Energy estimation, Shape Codebook-0, Shape Codebook-1, Gain.]
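A weighted shape-gain search over two shape codebooks and a gain codebook, as the blocks in this figure suggest, might look like the sketch below. It is a minimal illustration under stated assumptions: the exhaustive joint search, the additive combination of the two shape vectors, the squared weighted error measure and every name in the code are hypothetical choices, not the codec's documented procedure.

```python
import numpy as np


def shape_gain_vq(x, weights, shape_cb0, shape_cb1, gain_cb):
    """Pick (i, j, k) minimising ||weights * (x - gain_k * (s0_i + s1_j))||^2.

    Assumption: exhaustive joint search; a real coder would normally use a
    faster pre-selection, which is not modelled here.
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    best = (np.inf, -1, -1, -1)
    for i, s0 in enumerate(np.asarray(shape_cb0, dtype=float)):
        for j, s1 in enumerate(np.asarray(shape_cb1, dtype=float)):
            shape = s0 + s1
            for k, gain in enumerate(gain_cb):
                err = weights * (x - gain * shape)
                dist = float(np.dot(err, err))
                if dist < best[0]:
                    best = (dist, i, j, k)
    dist, i, j, k = best
    return i, j, k, dist


if __name__ == "__main__":
    cb0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    cb1 = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
    gains = [0.5, 1.0, 2.0]
    target = [0.0, 2.0, 2.0]
    print(shape_gain_vq(target, np.ones(3), cb0, cb1, gains))
```

Only the three indices would need to be transmitted; the decoder rebuilds the envelope as the selected gain times the sum of the two selected shape vectors.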
[Figure: Scalable vector quantization of spectral envelope - base & enhancement layer. Blocks: Dimension Conversion (three instances), VQ of SE stages labelled Shape0, Shape1, Shape2, Shape3, Shape5, ..., SE Gain, Weighted distortion. Each VQ stage outputs an Index.]
[Figure: Scalable CELP encoder for unvoiced segments - base and enhancement layer. Blocks: LPC Analysis, VQ of LSP, Perceptual Weighting Filter W(z) with subtraction of the zero-input response of H(z), two Stochastic Codebook and Gain Codebook pairs, each followed by a Perceptually Weighted LPC Synthesis Filter H(z) and a Calculation of Error block. Index sizes shown: 6 bits, 4 bits, 5 bits, 3 bits. Labels: Input Speech, Quantization Error.]
[Figure: HVXC decoder block diagram. Input parameters: LSP, V / UV / MV, Pitch, Spectral shape & gain, Stochastic shape, Stochastic gain. Blocks: Parameter interpolation for speed control, inverse VQ of LSP, inverse VQ and dimension conversion, Harmonic synthesis, Noise generation, Stochastic codebook, Windowing, LPC synthesis filter, Postfilter. Output: decoded speech.]
Harmonic synthesis for voiced excitation

$$ f(t) = \sum_{m} A_m(t)\,\cos\bigl(\theta_m(t)\bigr), \qquad \theta_m(t) = \int_0^t \omega_m(\tau)\,d\tau + \phi_0 $$
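The harmonic synthesis of the voiced excitation is a sum of cosines whose phases are the running integral of each harmonic's frequency. A discrete-time sketch of one frame follows. Several details are assumptions made for illustration: the m-th harmonic is taken to sit at m times the pitch frequency, amplitudes and pitch are interpolated linearly across the frame, the integral is approximated by a rectangle-rule running sum, and the function and parameter names are hypothetical.

```python
import numpy as np


def synthesize_voiced_frame(amps_start, amps_end, w0_start, w0_end,
                            frame_len, phi0=0.0):
    """Return frame_len samples of sum_m A_m[n] * cos(theta_m[n]).

    amps_start / amps_end: harmonic amplitudes at the frame boundaries.
    w0_start / w0_end: pitch frequency in radians per sample at the boundaries.
    """
    amps_start = np.asarray(amps_start, dtype=float)
    amps_end = np.asarray(amps_end, dtype=float)
    if amps_start.shape != amps_end.shape:
        raise ValueError("amplitude vectors must have the same length")

    frac = np.arange(frame_len) / frame_len
    # Assumption: linear interpolation of pitch and amplitudes within the frame.
    w0 = (1.0 - frac) * w0_start + frac * w0_end
    amps = amps_start[:, None] * (1.0 - frac) + amps_end[:, None] * frac

    # Rectangle-rule approximation of the integral of w0 from 0 to n.
    base_phase = np.concatenate(([0.0], np.cumsum(w0)[:-1]))

    # Assumption: harmonic m has frequency m * w0, so its phase is m * base_phase.
    m = np.arange(1, amps_start.size + 1)[:, None]
    theta = m * base_phase + phi0
    return np.sum(amps * np.cos(theta), axis=0)


if __name__ == "__main__":
    w0 = 2.0 * np.pi / 8.0
    frame = synthesize_voiced_frame([1.0], [1.0], w0, w0, 16)
    print(np.round(frame, 3))
```

With a single harmonic of constant amplitude and constant pitch, the output reduces to a plain cosine at the pitch frequency, which gives a simple sanity check.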
Parameter interpolation for speed control
Arrays of original parameters: param[n]
Arrays of interpolated parameters: mdf_param[m]
Time index before the time scale modification: n
Time index after the time scale modification: m
Ratio of speed change: spd (spd > 1: speed up; spd < 1: speed down)

Define:

$$ fr_0 = \lceil m \cdot spd \rceil - 1, \qquad fr_1 = fr_0 + 1 $$

$$ l = m \cdot spd - fr_0, \qquad r = fr_1 - m \cdot spd $$

The time-scale-modified parameters are approximated as:

$$ \mathit{mdf\_param}[m] = \mathit{param}[fr_0] \cdot r + \mathit{param}[fr_1] \cdot l $$
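The parameter interpolation for speed control amounts to linear interpolation between the two original frames that bracket the position m * spd. The sketch below assumes the lower frame index is ceil(m * spd) - 1. The clamping of indices at the array ends, the rule for the number of output frames, and the function name are illustrative assumptions, not part of the slide.

```python
import math


def time_scale_params(param, spd):
    """Interpolate per-frame parameters for a speed-change ratio `spd`.

    spd > 1 speeds playback up (fewer output frames);
    spd < 1 slows it down (more output frames).
    """
    if spd <= 0:
        raise ValueError("spd must be positive")
    n_in = len(param)
    if n_in == 0:
        return []
    # Assumption: produce output frames while m * spd stays inside the input.
    n_out = math.floor((n_in - 1) / spd) + 1
    out = []
    for m in range(n_out):
        pos = m * spd
        fr0 = math.ceil(pos) - 1
        fr1 = fr0 + 1
        left = pos - fr0    # distance from the earlier frame
        right = fr1 - pos   # distance to the later frame
        # Assumption: clamp indices that fall outside the array.
        i0 = min(max(fr0, 0), n_in - 1)
        i1 = min(max(fr1, 0), n_in - 1)
        out.append(param[i0] * right + param[i1] * left)
    return out


if __name__ == "__main__":
    pitch_track = [0.0, 10.0, 20.0, 30.0]
    print(time_scale_params(pitch_track, 0.5))  # speed down
    print(time_scale_params(pitch_track, 2.0))  # speed up
```

Because only the parameter tracks are resampled, the decoder can change playback speed without re-encoding the speech.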
[Figure: Harmonics spectra interpolation for speed control. Panels: Normal speed, Speed up, Speed down.]
[Figure: Performance, HVXC - Japanese. Vertical axis: MOS (1 to 5) with 95% CI. Bars: HVXC at 2.0 kbps, HVXC at 4.0 kbps, FS1016 at 4.8 kbps, and MNRU references at 10, 20, 30 and 40 dB. Source: ISO/IEC JTC1/SC29/WG11 MPEG98/N2424.]
MPEG-4 HVXC Demonstration
• FS1016 4.8 kbps CELP
• 2 kbps HVXC
• 4 kbps HVXC
• 4 kbps HVXC pitch change
• 4 kbps HVXC speed change
Real-time software decoding on a PC
Summary
• HVXC at 2.0 kbps and 4.0 kbps > FS1016 CELP at 4.8 kbps.
• NB CELP ≈ existing standards at the same bit-rate ranges, while providing flexible bit-rate controllability and scalability.
• WB CELP at 18 kbps ≈ G.722 at 48 to 56 kbps.
• MPEG-4 speech coding provides new functionalities
  - speed and pitch change
  - bit-rate / bandwidth scalability
  - bit-rate controllability
• International Standard in November 1999
References
[1] ISO/IEC JTC1/SC29/WG11 N2503, "Final Draft International Standard of ISO/IEC 14496-3," Dec. 1998
[2] M. Nishiguchi, K. Iijima, J. Matsumoto, "Harmonic Vector Excitation Coding of Speech at 2.0 kbps," IEEE Workshop on Speech Coding, Sep. 1997
[3] T. Nomura, M. Iwadare, M. Serizawa, K. Ozawa, "A Bit rate and Bandwidth Scalable CELP coder," Proc. ICASSP-98, pp. I-341-344, May 1998
[4] T. Nomura, M. Iwadare, N. Tanaka, "MPEG-4/CELP speech coding Algorithm," Tech. Report of IEICE, SP98-89, Nov. 1998
[5] M. Nishiguchi, A. Inoue, Y. Maeda, J. Matsumoto, "Parametric Speech Coding – HVXC at 2.0-4.0 kbps," IEEE Workshop on Speech Coding, June 1999
[6] N. Tanaka, et al., "A Multi-mode Variable Rate Speech Coder for CDMA Cellular Systems," Proc. IEEE VTC, pp. 198-202, Apr. 1996
[7] D. W. Griffin and J. S. Lim, "Multiband Excitation Vocoder," IEEE Trans. ASSP, Vol. 36, pp. 1223-1235, Aug. 1988
[8] M. Nishiguchi, J. Matsumoto, S. Ono, R. Wakatsuki, "Vector Quantized MBE with Simplified V/UV Division at 3.0 Kbps," Proc. ICASSP-93, pp. II-151-154, Apr. 1993
[9] M. Nishiguchi, J. Matsumoto, "Harmonic and Noise Coding of LPC Residuals with Classified Vector Quantization," Proc. ICASSP-95, pp. I-484-487, May 1995
[10] M. Nishiguchi, K. Iijima, J. Matsumoto, "Low bit rate speech coding by Harmonic Vector Excitation Coding," Proc. ASJ 1-2-4, Sep. 1997
[11] ISO/IEC JTC1/SC29/WG11 MPEG98/N2424, "Report on the MPEG-4 speech codec verification tests," Oct. 1998