Speech codecs and DCCP with TFRC VoIP mode
Magnus Westerlund
magnus.westerlund@ericsson.com
Important Features of TFRC VoIP mode
• Minimum packet interval of 10 ms
• Packet rate is penalized:
– X = X * S_true / (S_true + H)
– H = 40; header size in bytes
– S_true is the complete RTP packet size, i.e. RTP header + payload
• It is still TFRC: sending is delayed if sufficient bit-rate is not available.
• Slow start of 4 packets; the size limitation is not an issue for the discussed codecs.
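The rate penalty above can be sketched in a few lines of Python. This is only an illustration, assuming X is expressed in bytes per second; the function name `voip_mode_rate` and the example values are hypothetical and not taken from any TFRC implementation.

```python
HEADER_BYTES = 40  # H from the slide: header size in bytes


def voip_mode_rate(x: float, s_true: float, h: float = HEADER_BYTES) -> float:
    """Scale the allowed sending rate X by S_true / (S_true + H)."""
    if s_true <= 0:
        raise ValueError("s_true must be positive")
    return x * s_true / (s_true + h)


# Hypothetical example: a 44-byte RTP packet (12-byte RTP header + 32-byte payload)
# and an allowed rate of 8400 bytes per second before the penalty.
print(voip_mode_rate(8400.0, 44))
```

The smaller the RTP packet, the larger the share of the rate that is removed by the penalty, which is why aggregation of several frames per packet matters for low-rate codecs.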
System overview
• Contributors to system delay are:
– Sampling buffering
– Encoding delay
– Packetization delay
– Transmission delay
– Transport delay (Internet)
– Receiver buffering delay
– Decoding delay
– Playout delay
• Sum of delays less than 200 ms for high quality conversational, less than 400 ms to be usable for conversational VoIP
[Figure: Sender (MIC, Codec, Payload Packetization, DCCP), Internet, Receiver (DCCP, Jitter Buffer, Codec, Speaker)]
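The delay budget can be checked with simple arithmetic. The sketch below sums per-stage delays and compares the total with the 200 ms and 400 ms limits from the slide; the function name, the category labels, and all per-stage values are hypothetical placeholders, not measurements.

```python
HIGH_QUALITY_MS = 200  # limit for high quality conversation, from the slide
USABLE_MS = 400        # limit for usable conversational VoIP, from the slide


def classify_delay(contributions_ms: dict) -> str:
    """Sum the per-stage delays and classify the total against the two limits."""
    total = sum(contributions_ms.values())
    if total < HIGH_QUALITY_MS:
        return "high quality"
    if total < USABLE_MS:
        return "usable"
    return "unusable"


# Hypothetical per-stage delays in milliseconds (illustrative only).
example = {
    "sampling buffering": 20,
    "encoding": 5,
    "packetization": 20,
    "transmission": 5,
    "transport": 80,
    "receiver buffering": 40,
    "decoding": 5,
    "playout": 10,
}
print(sum(example.values()), classify_delay(example))
```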
Problems with TFRC style packet rate penalties
• Varying the packetization directly affects the system delay seen at the receiver.
• Requires a jitter buffer that is capable of handling the increased or decreased system delay.
• Frequent changes make it harder for adaptive buffers to estimate the jitter correctly.
• Buffer under-runs need to be handled with little impact on voice quality, so inserting audio data or invoking error concealment becomes necessary.
Speech and Audio Codecs with RTP Payload formats
• Narrowband codecs:
– G.711 (PCMA or PCMU)
– G.723
– G.726
– G.728
– G.729
– GSM
– GSM-EFR
– AMR
– EVRC
– SMV
– QCELP
– BroadVoice 16
– iLBC
• Wideband codecs:
– AMR-WB
– VMR-WB
– BroadVoice 32
– G.722
• Variable sampling rate:
– DVI4
– VDVI
– L8
– L16
– PCMA
– PCMU
Codec and RTP payload properties
• Bit-rate of encoded content
• Sample or frame based
• Frame lengths: 2.5, 5, 10, 20, 30 ms, etc.
• Basically all payload formats support aggregation; however, some have modes where it is restricted.
DTX and Comfort Noise
• DTX is Discontinuous Transmission.
• A voice activity detector (VAD) detects whether there is active speech or not.
• When there is no active speech, different DTX procedures can be used:
– No transmission at all
– Comfort Noise (CN) using RFC 3389
– Codec built-in CN, like the AMR SID (Silence Descriptor)
• The frequency of Comfort Noise packets varies but is usually some fraction of the normal packet rate.
Sample based codecs
• Speech bandwidth depends on sampling rate.
• Sample based, and can usually handle any number of samples per packet.
• Usually no adaptivity other than packetization. Some can vary quantization, like G.726.
• Bit-rate depends on sampling rate and sample quantization.
• Example: G.711 uses 8 bits per sample and 8 kHz sampling, resulting in a 64 kbps audio data rate.
• Comfort noise may be supported using RFC 3389.
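The G.711 figure follows directly from sampling rate times bits per sample. A minimal sketch, with a hypothetical function name:

```python
def sample_codec_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Audio data rate of a sample-based codec in kbps (no packet headers)."""
    return sample_rate_hz * bits_per_sample / 1000


print(sample_codec_kbps(8000, 8))  # G.711: prints 64.0
```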
AMR
• 3GPP defined, mandatory speech codec in UMTS 3G networks
• Narrowband codec (8 kHz audio sampling rate)
• Frame-based with 20 ms frames
• Multi-rate: has 8 encoding modes with bit-rates between 4.75 and 12.2 kbps.
• Has comfort noise generation (SID) and DTX.
• The SID (Silence Descriptor) is sent in every 8th frame and is 5 bytes in size.
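To get a feel for what DTX does to the packet flow, the SID numbers above can be turned into a packet rate and a payload bit-rate. This sketch assumes one SID per packet and ignores all packet headers; the function names are hypothetical.

```python
FRAME_MS = 20       # AMR frame length, from the slide
SID_INTERVAL = 8    # a SID is sent in every 8th frame, from the slide
SID_BYTES = 5       # SID size in bytes, from the slide


def sid_packets_per_second(frame_ms: float = FRAME_MS, interval: int = SID_INTERVAL) -> float:
    """Packet rate during silence, assuming one SID per packet."""
    return 1000 / (frame_ms * interval)


def sid_payload_bps(sid_bytes: int = SID_BYTES) -> float:
    """Payload bit-rate during silence, excluding packet headers."""
    return sid_bytes * 8 * sid_packets_per_second()


print(sid_packets_per_second())  # prints 6.25
print(sid_payload_bps())         # prints 250.0
```

Compared with 50 packets per second for one 20 ms frame per packet during active speech, the packet rate during silence is one eighth.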
EVRC and SMV
• 3GPP2 defined, required in CDMA networks
• Narrowband codecs (8 kHz audio sampling rate)
• Frame-based with 20 ms frames
• Encodes at 3 (EVRC) or 4 (SMV) different rates, varying from 8.55 to 0.8 kbps depending on audio input. Thus highly variable packet sizes.
• The average bit-rate depends on the codec mode; each mode selects the encoding rates differently to provide a different average rate.
• Lacks DTX and needs to transmit all frames.
• One mode in the payload format requires a single frame per packet.
BroadVoice 16
• Broadcom-defined codec, used in voice over cable
• Narrowband codec (8 kHz audio sampling rate)
• Frame-based with 5 ms frames, thus needing aggregation of at least 2 frames per packet for TFRC VoIP mode.
• No rate adaptation, fixed encoding at 16 kbps.
• No built-in comfort noise or DTX.
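The "at least 2 frames per packet" requirement follows from the 10 ms minimum packet interval of TFRC VoIP mode. A minimal sketch of that calculation, with a hypothetical function name:

```python
import math

MIN_PACKET_INTERVAL_MS = 10.0  # minimum packet interval of TFRC VoIP mode, from the slides


def min_frames_per_packet(frame_ms: float,
                          min_interval_ms: float = MIN_PACKET_INTERVAL_MS) -> int:
    """Smallest number of frames per packet that keeps the packet interval at or above the minimum."""
    if frame_ms <= 0:
        raise ValueError("frame_ms must be positive")
    return max(1, math.ceil(min_interval_ms / frame_ms))


print(min_frames_per_packet(5))   # 5 ms frames (BroadVoice 16): prints 2
print(min_frames_per_packet(20))  # 20 ms frames (AMR): prints 1
```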
BroadVoice 32
• Broadcom-defined codec, used in voice over cable
• Wideband codec (16 kHz audio sampling rate)
• Frame-based with 5 ms frames, thus needing aggregation of at least 2 frames per packet for TFRC VoIP mode.
• No rate adaptation, fixed encoding at 32 kbps.
• No built-in comfort noise or DTX.
AMR-WB
• 3GPP specified codec, mandatory in UMTS 3G if wideband is supported
• Wideband codec (16 kHz audio sampling rate)
• Frame-based with 20 ms frames
• Multi-rate encoding at 9 different rates between 6.6 and 23.85 kbps
• Has built-in support for DTX and comfort noise (SID)
• SID (Silence Descriptor) is sent every 8th frame and is 5 bytes in size
VMR-WB
• 3GPP2 defined
• Wideband codec (16 kHz audio sampling rate)
• Frame-based with 20 ms frames
• Encodes using 4 different rates (13.3-1.0 kbps)
• Has a compatibility mode with AMR-WB (12.6, 8.85, 6.60)
• Has a DTX mode
Summary of codecs
                           AMR         EVRC           SMV            BV16   BV32   AMR-WB      VMR-WB
Sampling rate (Hz)         8k          8k             8k             8k     16k    16k         16k
Frame size (ms)            20          20             20             5      5      20          20
Bit-rate (kbps)            4.75-12.2   0.8-8.8 (4.2)  0.8-8.8 (4.2)  16     32     6.6-23.85   1.0-13.3
Runtime codec adaptation   Y           Y              Y              N      N      Y           Y
DTX                        Y           N              N              N      N      Y           Y
The effects of codec bit-rate adaptation
• Reduction of codec bit-rate always means lower quality
• The switching itself affects user-perceived quality:
– Codec transition effects (varying)
– The change in quality can be noticeable
• Switching to a higher codec rate may not improve user experience.
– Flapping between modes can be more annoying than constant lower quality
Other codec developments
• Audio encoding, rather than speech:
– Greater bit-rate span, 10-300 kbps
• Variable frame-rate, depending on codec mode (AMR-WB+), which is problematic in RTP
• Currently scalability is hot:
– For audio, usually not speech
– MPEG is doing something
– European Union research project assuming arbitrary truncation of packets
Effects of packetization
• The AMR codec bit-rate adaptation has less impact on total bandwidth than the choice of packetization.
• Calculated using IP (20) + DCCP (12) + RTP (12) header bytes for each packet
• Not unexpected, considering that a speech frame including payload overhead is 13, 18 and 32 bytes for the 4.75, 6.7 and 12.2 kbps modes.

Codec mode (kbps)   Frames per packet   Total (kbps)
4.75                3                   11.2
6.7                 3                   13.2
4.75                2                   14.2
6.7                 2                   16.2
12.2                3                   18.8
12.2                2                   21.8
4.75                1                   23.2
6.7                 1                   25.2
12.2                1                   30.8
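The totals in the table can be reproduced with a short calculation. The sketch below uses the header sizes and per-frame sizes stated on the slide; the additional one byte per packet is an assumption, inferred because it makes the arithmetic match the table (it is not stated on the slide). All names are hypothetical.

```python
HEADER_BYTES = 20 + 12 + 12   # IP + DCCP + RTP headers, from the slide
PAYLOAD_HEADER_BYTES = 1      # ASSUMPTION: one extra byte per packet, inferred from the table
FRAME_MS = 20                 # AMR frame length

# Bytes per speech frame including payload overhead, from the slide.
FRAME_BYTES = {4.75: 13, 6.7: 18, 12.2: 32}


def total_kbps(mode: float, frames_per_packet: int) -> float:
    """Total bandwidth in kbps for an AMR mode and a number of frames per packet."""
    packet_bytes = (HEADER_BYTES + PAYLOAD_HEADER_BYTES
                    + frames_per_packet * FRAME_BYTES[mode])
    packet_interval_ms = frames_per_packet * FRAME_MS
    # Bits per millisecond is numerically the same as kilobits per second.
    return round(packet_bytes * 8 / packet_interval_ms, 1)


for mode in FRAME_BYTES:
    for n in (3, 2, 1):
        print(mode, n, total_kbps(mode, n))
```

With one frame per packet the 44 header bytes exceed every frame size in the table, which is why packetization dominates the total.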
System Delay Overview
• Contributors to system delay are:
– Sampling buffering
– Encoding delay
– Packetization delay
– Transmission delay
– Transport delay (Internet)
– Receiver buffering delay
– Decoding delay
– Playout delay
• Sum of delays less than 200 ms for high quality conversational, less than 400 ms to be usable for conversational VoIP
[Figure: Sender (MIC, Codec, Payload Packetization, DCCP), Internet, Receiver (DCCP, Jitter Buffer, Codec, Speaker)]
Delay and Robustness Effects
• Although it seems tempting to use 3 frames per packet to save bandwidth, it costs considerable delay.
• For optimal quality, the quality reduction from lower bit-rate modes must be traded off against the expected system delay.
• For a system that already has a large delay, reduce the codec mode.
• For a system with small delays, changing the packetization to use more frames per packet can be done without much quality cost.
• More frames per packet also reduces robustness, since each lost packet removes more speech.
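The cost of aggregation can be made concrete with a small calculation. This sketch assumes the sender waits for the last frame before sending a packet and ignores every other delay contributor; the function name is hypothetical.

```python
def packetization_tradeoff(frames_per_packet: int, frame_ms: float = 20.0):
    """Return (extra packetization delay, speech lost per lost packet), both in ms."""
    if frames_per_packet < 1:
        raise ValueError("frames_per_packet must be at least 1")
    # The first frame in a packet waits for the remaining frames to be produced.
    extra_delay_ms = (frames_per_packet - 1) * frame_ms
    # Losing one packet loses every frame it carries.
    speech_lost_ms = frames_per_packet * frame_ms
    return extra_delay_ms, speech_lost_ms


for n in (1, 2, 3):
    print(n, packetization_tradeoff(n))
```

For 20 ms frames, going from 1 to 3 frames per packet adds 40 ms of delay under these assumptions and triples the amount of speech removed by a single packet loss.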
Questions for future studies
• How hard is it to maintain a periodic transmission with TFRC VoIP mode? Without it, extra jitter is introduced, which requires more receiver buffering.
• What are the effects of DTX, as in the AMR case, where the packet rate drops to 1/8 of the rate during active speech?