View
253
Download
0
Category
Preview:
Citation preview
Figure 4.1 DPCM principles: (a) encoder/decoder schematic; (b) encoder timing.
© Pearson Education Limited 2001
Bandlimitingfilter ADC Subtractor
Timing+
control
RegisterR
Adder
DPCM signal encoder PCM DPCM
Speechinputsignal
Parallel-to-serial
converter
DAC
Timing+
control
RegisterR
Adder
DPCM signal decoder PCM DPCM
Speechoutputsignal
Serial-to-parallel
converter
Network
Low-passfilter
(a)
(b) Sample N
T0 T1
DPCM = PCM – R0 R1 = R0 + DPCMTime
R0 = current contents of register R and R1 = new/updated contents
Figure 4.2 Third-order predictive DPCM signal encoder anddecoder schematic.
© Pearson Education Limited 2001
Bandlimitingfilter ADC Subtractor
Timing+
control
Adder
Predictive DPCM signal encoder PCM DPCM
Speechinputsignal
Parallel-to-serial
converter
DAC
Timing+
control
Predictive DPCM signal decoder PCM DPCM
Speechoutputsignal
Serial-to-parallel
converter
Network
Low-passfilter
R1 R2 R3
C1 C2 C3
Adder
R1 R2 R3
C1 C2 C3
C1, C2, C3 = predictor coefficients
Figure 4.3 ADPCM subband encoder and decoder schematic.
© Pearson Education Limited 2001
ADPCM subband decoder
Speechoutputsignal
48kbps
16kbps
ADPCM subband encoder
Speechinputsignal
Multiplexer
Network
Lower subbandbandlimiting filter(50Hz – 3.5kHz
Lower subbandADPCM encoder
48kbps
8ksps
Upper subbandbandlimiting filter(3.5kHz – 7kHz
Upper subbandADPCM encoder
16kbps
16ksps64kbps
Lower subbandlow-pass filter
(50Hz – 3.5kHz)
Upper subbandlow-pass filter
(3.5kHz – 7kHz)
Demultiplexer
Lower subbandADPCM decoder
Upper subbandADPCM decoder
Figure 4.4 Linear predictive coding (LPC) signal encoder anddecoder schematic.
© Pearson Education Limited 2001
LPC signal decoder
Speechoutputsignal
LPC signal encoder
Speechinputsignal
Network
Voiced/unvoicedPitchLoudness
LPC vocal tractmodel coefficients
Vocaltract
analysis
Waveformfeature
extraction
Voiced signalsynthesizer
Unvoiced signalsynthesizer
LoudnessVoiced/unvoiced
Pitch
LPC vocal tractmodel coefficients
Vocal tractmodel
Digitizer
Digitized segments of input signal
Figure 4.5 Perceptual properties of the human ear:(a) sensitivity as a function of frequency; (b) frequency masking.
© Pearson Education Limited 2001
00.01 0.02 0.05 0.1 0.2 0.5 1.0 2.0 5.0 10 20
Frequency (kHz)
Sign
al a
mpl
itude
rela
tive
to th
e m
inim
umse
nsiti
vity
leve
l of t
he e
ar (d
B)
20
40
60
80
100
B A
(a)
00.01 0.02 0.05 0.1 0.2 0.5 1.0 2.0 5.0 10 20
Frequency (kHz)
Rela
tive
signa
l am
plitu
dele
vel (
dB)
20
40
60
80
100
B
A
(b)
= Hearing sensitivity of the human ear
Figure 4.6 Variation with frequency of effect of frequencymasking.
© Pearson Education Limited 2001
00 1 2 3 4 5 6 7 8 10 11
Frequency (kHz)
Rela
tive
signa
l am
plitu
de le
vel (
dB)
20
40
60
80
100
9
Figure 4.7 Temporal masking caused by a loud signal.
© Pearson Education Limited 2001
00 T T + 50 Time (ms)
Rela
tive
signa
l am
plitu
de le
vel (
dB)
20
40
60
80
100Signal ON Signal OFF
= Inaudible signal amplitudes
Figure 4.8 MPEG perceptual coder schematic: (a) encoder/decoder implementation schematic (b) exampleframe format.
© Pearson Education Limited 2001
Single-to-maskratios (SMRs)
+Bit allocations
Maskingthresholds
Psychoacoustic model
Q1
Q2
Q32
Bit allocations
Analysisfilterbank(DFT)
PCMencoder
1
2
32
DQ1
DQ2
DQ32
Bit allocations
PCMdecoder
1
2
32
Demultiplexframe into12 × 32subbandsamples
+bit
allocations
Synthesisfilterbank(IDFT)
1
2
32
Audioinputsignal
Formatframe
fortransmission
Encodedbitstream
PCMtime samples
Subbandsamples
Quantizedsubbandsamples
Network
Audiooutputsignal
Perceptual encoder
Perceptual decoder
(a)
(b)
DFT = discrete Fourier transformQ = quantizer DQ = dequantizerIDFT = Inverse DFT
Header SBS format Ancillary data12 × 32 subband samples (SBS)
Minimum encoding/decoding delay
Figure 4.9 Perceptual coder schematics: (a) forward adaptivebit allocation (MPEG); (b) fixed bit allocation (Dolby AC-1).
© Pearson Education Limited 2001
Demux DQ SFBQB FFAFB
PM
SMRs+
bit allocations
Bit allocations
PCMinput
Quantizedsubband samples
Subbandsamples
Encodedbitstream
Dequantizedsubband samples
Quantized samples+
bit allocations
PCMoutput
(a)
Demux DQ SFBQB FFAFB
PM
SMRs
PCMinput
Encodedbitstream
PCMoutput
(b)
Fixed bitallocations
Fixed bitallocations
AFB = analysis filter bankSFB = synthesis filter bankSMRs = signal-to-mask ratios
QB = quantization blocksDQ = dequantization blocks
PM = psychacoustic modelFF = frame formatter
Figure 4.10 Perceptual coder schematic: (a) backward adaptivebit allocation (Dolby AC-2); (b) hybrid backward/forwardadaptive bit allocation (Dolby AC-s).
© Pearson Education Limited 2001
Demux DQ SFBQB FFAFB
PM
SMRs+
bit allocations
PCMinput
Quantizedsamples
Subbandsamples
Encodedbitstream
Dequantizedsamples
PCMoutput
(a)
SEE
PM
SED
Bit allocations
Decoded spectralenvelope
Encodedspectralenvelope
Demux DQ SFBQB FFAFB
PMB
PCMinput
Encodedbitstream
PCMoutput
(b)
SEE
PMB
SEDEncodedspectralenvelope
PMF
Modificationinformation
Modificationinformation
AFB = analysis filter bankSFB = synthesis filter bankSEE = spectral envelope encoder
QB = quantization blocksDQ = dequantization blocksSED = spectral envelope decoder
PM = psychoacoustic modelFF = frame formatter
Figure 4.11 Example frame sequences with: (a) I- and P-framesonly; (b) I-, P- and B-frames; (c) PB-frames.
© Pearson Education Limited 2001
IEncodedframe sequence
P
Prediction
P
Prediction
I P P I
M = 1
N = 3
(a)
IEncodedframe sequence
B B
Prediction
P B B I
M = 3
N = 6
(b)
Bidirectional predictions
M = prediction span N = group of pictures (GOP) span
P B P
Prediction(c)
Bidirectional predictionsPB-frame
(encoded as a single frame)
Figure 4.12 P-frame encoding: (a) macroblock structure; (b) encoding procedure.
© Pearson Education Limited 2001
Video frame format (4:1:1)(a)
M × 16
N × 16
8
8 Cb
88
8
8
Macroblock contents
8
8 Cr
Y
Cb
Cr
1 macroblock = 4 (8 × 8) blocks for Y+ 1 (8 × 8) block for Cb+ 1 (8 × 8) block for Cr
Search region in target frame:(b)
Best matchmacroblock, MR
DCT+
Quantization+
Run-lengthencoding
Differentialencoding
Huffman
Encoded bitstream
MD VMT – MR
Macroblock tobe encoded, MT
Motion vector, V
Same search region in preceding(I or P) reference frame:
Figure 4.13 B-frame encoding procedure.
© Pearson Education Limited 2001
Best matchmacroblock, MP
MD, VPMT – MP
Motion vector, VP
Same search region in preceding(I or P) reference frame:
DCT+
Quantization+
Run-lengthencoding
Differentialencoding
Huffman
Encoded bitstream
MD'
MD", VP, VSMT – [MD + MD']1
2
MD', VSMT – MS
Search region in target frame:
Macroblock tobe encoded, MT
Best matchmacroblock, MS
Motion vector, VS
Same search region in succeeding(P or I) reference frame:
MD MD"oror VP VSand/or
Figure 4.14 Implementation schematics: (a) I-frames; (b) P-frames; (c) B-frames; (d) example macroblock encodedbitstream format.
© Pearson Education Limited 2001
Entropy encoding(EE)
Formatter Encodedbitstream
Quantization(Q)
Forward DCT(FDCT)
Target framecontents
(a) I-frames:Macroblocks
Q Formatter Encodedbitstream
FDCTDifferencecomputation
Target framecontents
(b) P-frames:Macroblocks
EE
DQ
Referenceframe contents
Motionestimation
IDCT
Macroblock address
Motion vector Differentialencoding
DQ = dequantizer IDCT = inverse DCT
Q Formatter Encodedbitstream
FDCTDifferencecomputation
Target framecontents
(c) B-frames:Macroblocks
EE
DQ
Referenceframe contents
IDCT
Macroblock address
Motion vectors Differentialencoding
Succeedingframe contents
Quantizationvalue
AddressType Motionvector
Blockspresent
B1 B2 B3 B6
Skip,value
End ofblock
Skip,value
DC
(d)
Motionestimation
Figure 4.15 H.261 encoding formats: (a) macroblock format; (b) frame/picture format; (c) GOB structure.
© Pearson Education Limited 2001
Quantizationvalue
Address Type Motionvector
Coded blockpattern
B1 B2 B3 B6
Skip,value
End ofblock
Skip,value
DC
(a)
Picturetype
Picturestart code
Temporalreference
GOB1 GOB2 GOB3 GOB12
MB MBMBQuantizationparameter
(b)
Groupnumber
GOBstart code
MB1
(c)
2 3 4 5 6 7 8 9 10 11
12 13 14 15 16 17 18 19 20 21 22
23 24 25 26 27 28 29 30 31 32 33
48 pixels1 GOB:
176 pixels
1MB = 16 × 16 pixels(luminance)
GOB1 2
3 4
5 6
7 8
9 10
11 12
352 pixels
288 pixels
CIF resolution
GOB1
3
5
176 pixels
144 pixels
QCIF resolution
GOB = Group of (macro) blocks
QCIFCIF
Figure 4.16 H.261 video encoder principles: (a) implementationschematic; (b) FIFO buffer operation.
© Pearson Education Limited 2001
Encodedbitstream
FDCTDifferencecomputation
Target framecontents
(a)
EE
IDCT
Macroblock address
Motion vector Differentialencoding
(b)
Reference framememory
+Motion estimation
DQ
Frameformatter
Q FIFObuffer
Quantization control
Variable inputbit rate
(from frame formatter)
Constant output bit rate(to transmission channel)
FIFO buffer
Highthreshold
Lowthreshold
Increase quantizationthreshold
Decrease quantizationthreshold
Figure 4.17 H.263 error tracking scheme: (a) example errorpropagation; (b) same example with error tracking applied.
© Pearson Education Limited 2001
Frame1
Encoder
GOB 1
3
5
2 3 4 5
1
Decoder
2 3 4 5
Time
(a)
Frame1
Encoder
GOB 1
3
5
2 3 4 5
1
Decoder
2 3 4 5
Time
(b)
NAK
(1, 3
)
= corrupted macroblocks = intracoded macroblocks
= frame contents incur transmission errors
GOB = group of (macro)blocks NAK (1, 3) = GOB3 in frame1 corrupted
Figure 4.18 Independent segment decoding: (a) effect of a GOBbeing corrupted; (b) when used with error tracking.
© Pearson Education Limited 2001
Frame1
Encoder
GOB 1
3
5
2 3 4 5
1
Decoder
2 3 4 5
Time
(a)
Frame1
Encoder
GOB 1
3
5
2 3 4 5
1
Decoder
2 3 4 5
Time
(b)
NAK
(2, 3
)
= corrupted GOB = intracoded GOB
= frame contents incur transmission errors
GOB = group of (macro)blocks NAK (2, 3) = GOB3 in frame2 corrupted
Figure 4.19 Reference picture selection with independentsegment decoding: (a) NAK mode; (b) ACK mode.
© Pearson Education Limited 2001
ACK
(4)
ACK
(1)
NAK
(2, 3
)
Frame1(I)
Encoder
GOB 1
3
5
(a)
I = intracoded frame P = predicted/intercoded frame
Decoder
2(P) 3(P) 4(P) 5(P) 6(P)
1 2 3 4 5 6
Time
Intercoding sequence
ACK
(2)
Frame1(I)
Encoder
(b)
Decoder
2(P) 3(P) 4(P) 5(P) 6(P)
1 2 3 4 5 6
Time
Intercoding sequence
7(P)
Interdecoding sequence
Interdecoding sequence
GOB 1
3
5
Figure 4.20 MPEG-1 example frame sequence.
© Pearson Education Limited 2001
I B B
Predictions
P B B I
Bidirectional predictions
B P B B P B
Figure 4.21 MPEG-1 video bitstream structure: (a) composition;(b) format.
© Pearson Education Limited 2001
B1 B2
B3 B4
B5
B6
Y Cb, Cr
MB
8
8
B1–6MB22MB2MB1
Slice 2
Slice N
Slice 1
I B B P B B P B B P B B I
GOP1 GOP2 GOP3 GOP4 GOPN
Sequence
GOP format:
Multiple GOPs:
One (or more) stored video(s):
I, P or Bpictures/frames:
(a)
GOP1 GOP2 GOP3 GOPNBitstream
parametersVideo
parametersSequencestart code
Picture 2 Picture NPicture
(frame) 1GOP
parametersTime-stamp
GOPstart code
Slice 2 Slice NEncode
parametersBuffer
parametersTypePicture
start code
MB 3 MB NQuantizationparameters
Verticalposition
Slicestart code
Slice 1
MB 2MB 1
(b)
Quantizationparameters
Figure 4.22 MPEG-2 DCT block derivation with I-frames: (a) effect of interlaced scanning; (b) field mode; (c) framemode.
© Pearson Education Limited 2001
1 2 3 4 720
Line 1
3
N
T field
Field 1
1 2 3 4 720
2
4
N
T field
Field 2
1 2 3 4 720
1
3
N
T field
Field 1
Frame 1 Frame 2
T frame T frame
Time
N = 480 (NTSC), 576 (PAL)
(a)
1
(b)
1
One 16 × 16 macroblock
23456
16
2 3 4 5 6 161
1
Four 8 × 8 DCT blocks
357
2 3 84
15
9 11 161210
Field 1
2468
16
Field 2
1
(c)
1
One 16 × 16 macroblock
23456
16
2 3 4 5 6 161
1
Four 8 × 8 DCT blocks
234
2 3 84
8
9 11 161210
Frame 19
101112
16
Figure 4.23 Content-based video coding principles showing howa frame/scene is defined in the form of multiple video objectplanes.
© Pearson Education Limited 2001
Original frame/scene
VOP 2
VOP = video object plane
VOP 1
VOP 0
Figure 4.24 MPEG-4 coding principles: (a) encoder/decoderschematics; (b) VOP encoder schematic.
© Pearson Education Limited 2001
VOP 0encoding
VOP 1encoding
VOP 2encoding
Audio encoding
Scene and object descriptors
VOPidentification
+definition
Audioinput
Video frameinput
Encoder
Transmissionmultiplexer
Scene and object descriptors
Audiooutput
Videooutput
Decoder
Audio decoding
VOP 0decoding
VOP 1decoding
VOP 2decoding
Scenecomposition
+rendering
Transmissiondemultiplexer
Network
Reference VOP memory
Motion estimation/compensation
Shape coding
Difference computationTargetVOP Ncontents
Texturecoding/decoding
Texture
Motion
Shape
VOP Nmultiplex
VOP Nencoding
VOP Nvideo input
VOP Nencoder
(a)
(b)
Figure 4.25 MPEG-4 decoder schematic.
© Pearson Education Limited 2001
VideoAudio
Decompressionlayer
Interactions
Elementarystreams (ESs)
Packetizedelementary
streams (PESs)
Transportstream (TS)
Objectdescriptors
Scenedescriptor VOP 1 VOP 2
Composition and rendering
Multimedia PCor
television
Synchronization layer
FlexMux layer
Transmissionnetwork
Audioobject VOP 0
Figure 4.26 MPEG-4 encoding: (a) conventional GOB approach;(b) using fixed-length video packets; (c) video packet format.
© Pearson Education Limited 2001
MBnumber
Resyncmarker QP Motion vectors MBM DCT informationVP format:
GOB = group of (macro) blocksQP = quantization parametersX = bit errors
MB = macroblock VP = video packetMBM = motion boundary marker
VP 1 VP 2 VP 3 VP 4 VP 5 VP 6 VP 7 VP 8 VP 9
(c)
Encoded using VPs(equal numberof bits per VP)
(b)
GOB 1 GOB 5Encoded using GOBs
(equal numberof MBs per GOB)
(a)
GOB 3
Intercoded frame contents/bitstream
GOB 1
GOB 3
GOB 5
Uncoded frame contents (QCIF)
Figure 4.27 Reversible VLCs: (a) example codeword set; (b) effect of transmission errors on decoding procedure.
© Pearson Education Limited 2001
111101110011100011
RVLC
1010010001
VLC
Maximum codeword length = 6 bits
(a)
111100111111011100011111100011
Forward scan
(b)
VLC = variable length codewordRVLC = reversible VLC
0 = bit error
111100111010011100011111100011
Bitstream without errors:
Forward scan
Bitstream with errors:
Errordetected
Backward scan
111Useable codewords:
Errordetected
100011111100011
Example 4.1
© Pearson Education Limited 2001
An MPEG-1 system uses the frame sequence shown in Figure 4.20.
(i) Define the terms M and N and hence determine their values for thesequence shown in the figure.
(ii) Derive a suitable reordered sequence that ensures firstly, only twoframes must be stored in the decoder, and secondly, the required I-and/or P-frames are available to decode each P- and B-frame asthey are received.
Answer:
(i) As we described earlier in Section 4.3.1 under the subheading of“Frame types”, M is the distance (in frames) between a P-frame andthe immediately preceding I- or P- frame, and N is the number offrames between two successive I-frames. The latter is known as agroup of pictures or GOP. Hence for the frame sequence shown inFigure 4.20, M = 3 and N = 12.
(ii) A suitable reordered frame sequence that meets the definedrequirements is:
IPBBPBBPBBIBBPBB ...
Example 4.2
© Pearson Education Limited 2001
A digitized video is to be compressed using the MPEG-1 standard.Assuming a frame sequence of:
IBBPBBPBBPBBI...
and average compression ratios of 10:1 (I), 20:1 (P) and 50:1 (B),derive the average bit rate that is generated by the encoder for both theNTSC and PAL digitization formats.
Answer:
Frame sequence = IBBPBBPBBPBBI...
Hence: 1/12 of frames are I-frames, 3/12 are P-frames, and8/12 are B-frames.
and Average compression ratio = (1 × 0.1 + 3 × 0.05 + 8 × 0.02)/12
= 0.0342 or 29.24:1
NTSC frame size:
Without compression= 352 × 240 × 8 + 2 (176 × 120 × 8)
= 1.013760Mbits per frame
With compression = 1.01376 × 1/29.24
= 34.670kbits per frame
Hence bit rate generated at 30 fps = 1.040Mbps
4.2 Continued
© Pearson Education Limited 2001
PAL frame size:
Without compression= 352 × 288 × 8 + 2 (176 × 144 × 8)
= 1.216512Mbits per frame
With compression = 1.216512 × 1/29.24
= 41.604kbits per frame
Hence bit rate generated at 25 fps = 1.040Mbps
Normally, allowing for packetization and multiplexing overheads, abandwidth of 1.2 Mbps is allocated for the video. Hence, assuming amaximum bit rate of 1.5 Mbps, this leaves 300 kbps for the compressedaudio stream.
Recommended