
Speech codecs and DCCP with TFRC VoIP mode

Magnus Westerlund

magnus.westerlund@ericsson.com

Important Features of TFRC VoIP mode

• Minimum packet interval: 10 ms
• Packet rate is penalized:
  – X = X * S_true / (S_true + H)
  – H = 40 bytes (header size)
  – S_true is the complete RTP packet size, i.e. RTP header + payload

• It is still TFRC: sending is delayed if sufficient bit-rate is not available.

• Slow start of 4 packets; the size limitation is not an issue for the codecs discussed here.
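The rate penalty listed above can be sketched numerically. This is a minimal illustration, assuming X and the packet sizes use consistent units (bytes per second and bytes here). The function name `penalized_rate` and the example numbers are hypothetical and not taken from any DCCP implementation.

```python
HEADER_BYTES = 40  # H from the slide: per-packet header size in bytes


def penalized_rate(x: float, s_true: float, h: float = HEADER_BYTES) -> float:
    """Return X * S_true / (S_true + H), the header-penalized rate."""
    if s_true <= 0:
        raise ValueError("s_true must be positive")
    return x * s_true / (s_true + h)


# Illustrative values only: a 45-byte RTP packet keeps 45/85 of the nominal rate.
nominal_rate = 8500.0  # bytes per second (assumed for the example)
print(penalized_rate(nominal_rate, 45))
```

The smaller the RTP packet, the larger the share of the allowed rate that is consumed by the 40 header bytes.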

System overview

• Contributors to system delay are:
  – Sampling buffering
  – Encoding delay
  – Packetization delay
  – Transmission delay
  – Transport delay (Internet)
  – Receiver buffering delay
  – Decoding delay
  – Playout delay
• The sum of delays should be less than 200 ms for high-quality conversational use, and less than 400 ms to be usable for conversational VoIP.

[Figure: media path from sender to receiver. Recoverable labels: Sender, Payload, Packetization, Internet, Jitter Buffer, Speaker, Receiver.]
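The delay budget can be checked with a few lines of arithmetic. The sketch below sums the listed contributors and compares the total against the 200 ms and 400 ms bounds. All per-component values are invented for illustration and are not measurements from the presentation. The name `classify_delay` is hypothetical.

```python
HIGH_QUALITY_MS = 200  # bound for high-quality conversation (from the slide)
USABLE_MS = 400        # bound for usable conversational VoIP (from the slide)


def classify_delay(contributions_ms: dict) -> str:
    """Sum the delay contributors and classify the total against the bounds."""
    total = sum(contributions_ms.values())
    if total < HIGH_QUALITY_MS:
        return "high quality"
    if total < USABLE_MS:
        return "usable"
    return "not usable"


# Illustrative numbers only (assumptions, in milliseconds).
example = {
    "sampling buffering": 20,
    "encoding": 5,
    "packetization": 20,
    "transmission": 5,
    "transport": 80,
    "receiver buffering": 40,
    "decoding": 5,
    "playout": 10,
}
print(sum(example.values()), classify_delay(example))
```

With these assumed numbers, raising the packetization delay from 20 ms to 60 ms would move the total from 185 ms to 225 ms, past the high-quality bound.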

Problems with TFRC style packet rate penalties

• Varying the packetization directly affects the system delay seen at the receiver.

• This requires a jitter buffer that is capable of handling the increased or decreased system delay.

• Frequent changes make it harder for adaptive buffers to correctly parameterize the jitter.

• Buffer under-runs need to be handled with little impact on voice quality. Thus inserting audio data or invoking error concealment becomes necessary.

Speech and Audio Codecs with RTP Payload formats

• Narrowband codecs:
  – G.711 (PCMA or PCMU)
  – G.723
  – G.726
  – G.728
  – G.729
  – GSM
  – GSM-EFR
  – AMR
  – EVRC
  – SMV
  – QCELP
  – BroadVoice 16
  – iLBC
• Wideband codecs:
  – AMR-WB
  – VMR-WB
  – BroadVoice 32
  – G.722
• Variable sampling rate:
  – DVI4
  – VDVI
  – L8
  – L16
  – PCMA
  – PCMU

Codec and RTP payload properties

• Bit-rate of encoded content

• Sample or frame based

• Frame lengths: 2.5, 5, 10, 20, 30 ms, etc.

• Basically all payload formats support aggregation; however, some have modes where it is restricted.

DTX and Comfort Noise

• DTX is Discontinuous Transmission.
• A voice activity detector (VAD) detects whether there is active speech or not.
• When there is no active speech, different DTX procedures can be used:
  – No transmission at all
  – Comfort Noise (CN) using RFC 3389
  – Codec built-in CN, like the AMR SID (Silence Descriptor)
• The frequency of Comfort Noise packets varies, but is usually some fraction of the normal packet rate.

Sample based codecs

• Speech bandwidth depends on sampling rate.
• Sample based, and can usually handle any number of samples per packet.
• Usually no adaptivity other than packetization. Some can vary quantization, like G.726.
• Bit-rate depends on sampling rate and sample quantization.
• Example: G.711 uses 8 bits per sample and 8 kHz sampling, resulting in a 64 kbps audio data rate.
• Comfort noise may be supported using RFC 3389.
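The G.711 figure follows directly from sampling rate times bits per sample. A minimal sketch of that arithmetic, with hypothetical helper names, also shows how the payload size scales with the chosen packet interval:

```python
def sample_codec_bitrate_kbps(sampling_rate_hz: int, bits_per_sample: int) -> float:
    """Bit-rate of a sample-based codec in kbps."""
    return sampling_rate_hz * bits_per_sample / 1000


def payload_bytes(sampling_rate_hz: int, bits_per_sample: int,
                  packet_interval_ms: float) -> float:
    """Audio payload carried by one packet covering packet_interval_ms."""
    bits = sampling_rate_hz * bits_per_sample * packet_interval_ms / 1000
    return bits / 8


# G.711: 8 kHz sampling, 8 bits per sample.
print(sample_codec_bitrate_kbps(8000, 8))  # 64.0 kbps
print(payload_bytes(8000, 8, 20))          # 160.0 bytes per 20 ms packet
```

Because any number of samples can go into a packet, packetization is the only knob such a codec offers once the quantization is fixed.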

AMR

• 3GPP defined, mandatory speech codec in UMTS 3G networks
• Narrowband codec (8 kHz audio sampling rate)
• Frame-based with 20 ms frames
• Multi-rate: has 8 encoding modes with bit-rates between 12.2 and 4.75 kbps.
• Has comfort noise generation (SID) and DTX.
• The SID (Silence Descriptor) is sent in every 8th frame and is 5 bytes in size.
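To make the DTX numbers concrete, the sketch below derives the packet rate during active speech and during silence from the 20 ms frame length and the one-SID-per-8-frames rule. It assumes one frame per packet during active speech. The constant and function names are hypothetical.

```python
FRAME_MS = 20           # AMR frame length (from the slide)
SID_EVERY_N_FRAMES = 8  # one SID per 8 frames during silence (from the slide)
SID_BYTES = 5           # SID size (from the slide)


def packets_per_second(frame_ms: float, frames_between_packets: int = 1) -> float:
    """Packet rate when one packet is sent every frames_between_packets frames."""
    return 1000 / (frame_ms * frames_between_packets)


active_pps = packets_per_second(FRAME_MS)                       # one frame per packet
silence_pps = packets_per_second(FRAME_MS, SID_EVERY_N_FRAMES)  # SID packets only
sid_payload_bps = silence_pps * SID_BYTES * 8                   # excludes all headers

print(active_pps, silence_pps, sid_payload_bps)
```

Under these assumptions the packet rate drops from 50 to 6.25 packets per second when the speaker goes silent.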

EVRC and SMV

• 3GPP2 defined, required in CDMA networks
• Narrowband codecs (8 kHz audio sampling rate)
• Frame-based with 20 ms frames
• Encode at 3 (EVRC) or 4 (SMV) different rates, varying from 8.55 to 0.8 kbps depending on audio input. Thus highly variable packet sizes.
• The average bit-rate depends on the codec mode; each mode selects the encoding rates differently to provide a different average rate.
• Lack DTX and need to transmit all frames.
• One mode in the payload format requires a single frame per packet.

BroadVoice 16

• Broadcom defined codec, used in voice over cable
• Narrowband codec (8 kHz audio sampling rate)
• Frame-based with 5 ms frames, thus needing aggregation of at least 2 frames per packet for TFRC VoIP mode.
• No rate adaptation, fixed encoding at 16 kbps.
• No built-in comfort noise or DTX.
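The "at least 2 frames per packet" requirement follows from the 10 ms minimum packet interval of TFRC VoIP mode. A minimal sketch of that calculation, using a hypothetical helper name:

```python
import math


def min_frames_per_packet(frame_ms: float, min_interval_ms: float = 10.0) -> int:
    """Smallest number of frames per packet that respects the minimum interval."""
    return max(1, math.ceil(min_interval_ms / frame_ms))


print(min_frames_per_packet(5))    # 5 ms frames (BroadVoice)
print(min_frames_per_packet(20))   # 20 ms frames (AMR, EVRC, SMV)
```

Codecs with 20 ms frames already satisfy the interval with a single frame per packet, so the constraint only affects short-frame codecs.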

BroadVoice 32

• Broadcom defined codec, used in voice over cable
• Wideband codec (16 kHz audio sampling rate)
• Frame-based with 5 ms frames, thus needing aggregation of at least 2 frames per packet for TFRC VoIP mode.
• No rate adaptation, fixed encoding at 32 kbps.
• No built-in comfort noise or DTX.

AMR-WB

• 3GPP specified codec, mandatory in UMTS 3G if wideband is supported
• Wideband codec (16 kHz audio sampling rate)
• Frame-based with 20 ms frames
• Multi-rate encoding at 9 different rates between 23.85 and 6.6 kbps
• Has built-in support for DTX and comfort noise (SID)
• SID (Silence Descriptor) is sent every 8th frame and is 5 bytes in size

VMR-WB

• 3GPP2 defined
• Wideband codec (16 kHz audio sampling rate)
• Frame-based with 20 ms frames
• Encodes using 4 different rates (13.3-1.0 kbps)
• Has compatibility mode with AMR-WB (12.6, 8.85 and 6.60 kbps)
• Has DTX mode

Summary of codecs

                          AMR        EVRC           SMV            BV16  BV32  AMR-WB     VMR-WB
Sampling rate (Hz)        8k         8k             8k             8k    16k   16k        16k
Frame size (ms)           20         20             20             5     5     20         20
Bit-rate (kbps)           4.75-12.2  0.8-8.8 (4.2)  0.8-8.8 (4.2)  16    32    6.6-23.85  1.0-13.3
Runtime codec adaptation  Y          Y              Y              N     N     Y          Y
DTX                       Y          N              N              N     N     Y          Y

The effects of codec bit-rate adaptation

• Reduction of codec bit-rate always means lower quality

• The actual switching does affect user-perceived quality:
  – Codec transition effects (varying)
  – The change in quality can be noticeable
• Switching to a higher codec rate may not improve the user experience.
  – Flapping between modes can be more annoying than constant lower quality

Other codec developments

• Audio encoding, rather than speech:
  – Greater bit-rate span, 10-300 kbps
• Variable frame-rate, depending on codec mode (AMR-WB+), which is problematic in RTP
• Currently scalability is hot:
  – For audio, usually not speech
  – MPEG is doing something
  – European Union research project assuming arbitrary truncation of packets

Effects of packetization

• The choice of packetization has more impact on total bandwidth than the AMR codec bit-rate adaptation.
• Calculated using IP (20) + DCCP (12) + RTP (12) header bytes for each packet
• Not unexpected, considering that a speech frame including payload overhead is 13, 18 and 32 bytes for the 4.75, 6.7 and 12.2 kbps modes respectively.

Codec mode (kbps)  Frames per packet  Total (kbps)
4.75               3                  11.2
6.7                3                  13.2
4.75               2                  14.2
6.7                2                  16.2
12.2               3                  18.8
12.2               2                  21.8
4.75               1                  23.2
6.7                1                  25.2
12.2               1                  30.8
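The totals in the table can be reproduced from the stated header sizes and frame sizes. The sketch below is a reconstruction, not the author's calculation: it assumes one extra payload-header byte per packet, which is not stated on the slide but is what makes the results match the table, and it assumes the 13, 18 and 32 byte frames belong to the 4.75, 6.7 and 12.2 kbps modes respectively.

```python
HEADER_BYTES = 20 + 12 + 12  # IP + DCCP + RTP, as stated on the slide
PAYLOAD_HEADER_BYTES = 1     # ASSUMPTION: one payload-header byte per packet
FRAME_MS = 20                # AMR frame length
FRAME_BYTES = {4.75: 13, 6.7: 18, 12.2: 32}  # frame size incl. payload overhead


def total_kbps(mode: float, frames_per_packet: int) -> float:
    """Total bit-rate including headers, rounded to one decimal."""
    packet_bytes = (HEADER_BYTES + PAYLOAD_HEADER_BYTES
                    + frames_per_packet * FRAME_BYTES[mode])
    interval_ms = frames_per_packet * FRAME_MS
    return round(packet_bytes * 8 / interval_ms, 1)  # bits per ms equals kbps


for mode in FRAME_BYTES:
    for frames in (1, 2, 3):
        print(mode, frames, total_kbps(mode, frames))
```

At one frame per packet, the 45 bytes of headers outweigh even the largest 32-byte frame, which is why packetization dominates the total.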

System Delay Overview

• Contributors to system delay are:
  – Sampling buffering
  – Encoding delay
  – Packetization delay
  – Transmission delay
  – Transport delay (Internet)
  – Receiver buffering delay
  – Decoding delay
  – Playout delay
• The sum of delays should be less than 200 ms for high-quality conversational use, and less than 400 ms to be usable for conversational VoIP.

[Figure: media path from sender to receiver. Recoverable labels: Sender, Payload, Packetization, Internet, Jitter Buffer, Speaker, Receiver.]

Delay and Robustness Effects

• Although it seems tempting to use 3 frames per packet to save bandwidth, it costs a lot of delay.

• For optimal quality, the quality reduction from lower bit-rate modes needs to be traded off against the expected system delay.

• For a system that already has a large delay: reduce the codec mode.

• For a system with small delays, changing packetization to use more frames per packet can be done without much quality cost.

• More frames per packet also reduces robustness.
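The delay and robustness cost of aggregation can be quantified with simple arithmetic. The sketch below assumes 20 ms frames and that a packet is sent only once its last frame is complete, so the first frame waits for all the others. The helper names are hypothetical.

```python
def extra_packetization_delay_ms(frames_per_packet: int, frame_ms: float = 20) -> float:
    """Extra delay of the first frame compared with one frame per packet."""
    return (frames_per_packet - 1) * frame_ms


def speech_lost_per_packet_ms(frames_per_packet: int, frame_ms: float = 20) -> float:
    """Amount of speech lost when a single packet is dropped."""
    return frames_per_packet * frame_ms


for n in (1, 2, 3):
    print(n, extra_packetization_delay_ms(n), speech_lost_per_packet_ms(n))
```

Going from 1 to 3 frames per packet adds 40 ms to the delay budget and triples the amount of speech lost with each dropped packet.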

Questions for future studies

• How hard is it to maintain a periodic transmission with TFRC VoIP mode? A non-periodic transmission introduces extra jitter, which requires more receiver buffering.

• What are the effects of DTX, as in the AMR case, where the packet rate drops to 1/8th of that during active speech?
