HD Voice Codecs


  • 7/27/2019 HD Voice Codecs

    1/21

    http://www.voipsupply.com/hd-voice-codecs

    HD voice codecs

    What is a codec?

    The word codec comes from mashing together the functions of compressing (co) and decompressing (dec) analog sound into digital bits

    for use by computers and networks. There are literally hundreds of audio

    codecs -- pieces of computer code -- available today and embedded in

    any device that plays sound, from a simple MP3 player to the hottest

    smart phones. Some are open source and free while others are

    proprietary and/or patented, requiring licensing fees.

    Why are there so many different codecs? Over the years, people have

    created and optimized codecs for the specific environments they were

    going to be used in, so the cellular community built codecs that

    optimized the use of radio frequency (RF) bandwidth while others

    wanted adaptable bit-rate codecs suitable for a wired broadband

    environment that would adjust sound quality depending on how much

    bandwidth was available -- compress a little if there's a lot of bandwidth,

    crunch harder if there's less.

    More recently, developers have been leveraging more efficient computer

    processors to develop better codecs. The tradeoff for using more CPU cycles is, of course, more power required to run them -- not an issue at a

    desktop, but definitely a concern for mobile devices.

    A number of codecs are ITU (International Telecommunications Union)

    standards, formalized for international use and incorporation into


    devices. If a codec name starts with a G and a period, such as G.711 or

    G.722, it's an ITU standard.

    Popular HD voice codecs - G.722, AMR-WB, SILK, iSAC

    You can't talk about HD voice codecs without first talking about baseline analog and digital voice quality. Established way back in 1972, G.711 is

    the standard for stock VoIP voice quality and equal to what you get out

    of a POTS analog phone call. It captures speech in a range of 3.4 kHz, has

    a sampling rate of 8 kHz, and needs 64 kbit/s of bandwidth to deliver a

    call.

    G.722 is Old School when it comes to HD voice, formalized back in 1988.

    It captures sound in a range of 7 kHz and samples audio at a rate of 16 kHz -- double that of G.711. The result is superior quality and clarity far

    above a POTS analog phone call. Taking advantage of CPU processing

    speeds, G.722 can deliver double the quality of a G.711 phone session in

    the same amount of bandwidth -- 64 kbit/s.
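The sampling and bit-rate figures quoted for G.711 and G.722 can be checked with simple arithmetic. The following is a minimal sketch, assuming uniform sampling and the Nyquist limit (usable audio bandwidth sits below half the sampling rate); the function names are illustrative only.

```python
def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Upper bound on the audio frequency a given sampling rate can represent."""
    return sample_rate_hz / 2


def bits_per_sample(bit_rate_bps: int, sample_rate_hz: int) -> float:
    """Average number of bits the codec spends on each input sample."""
    return bit_rate_bps / sample_rate_hz


# (codec name, sampling rate in Hz, bit rate in bit/s) as quoted in the text
for name, rate_hz, bit_rate in [("G.711", 8_000, 64_000), ("G.722", 16_000, 64_000)]:
    print(name, nyquist_limit_hz(rate_hz), bits_per_sample(bit_rate, rate_hz))
```

G.711 spends 8 bits on each of 8,000 samples per second, while G.722 averages 4 bits across 16,000 samples per second. That is how it fits a 7 kHz range into the same 64 kbit/s, at the cost of extra processing.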

    You'll find G.722 built into pretty much every desktop VoIP handset built today

    (2010), regardless of manufacturer or model of phone -- yes, even the

    modest-looking $129 list price entry models support G.722. Patents on

    G.722 have expired so there are no licensing fees and the processing

    requirements are minimal on today's chips. At least one software shop

    (D2 Technologies) has implemented G.722 for the Android mobile

    operating system. Handset manufacturers who support G.722 include

    Aastra, ADTRAN, Allworx, AudioCodes, Avaya, Cisco, Panasonic, Polycom,

    Siemens and Snom.

    Coming strong out of Europe and the mobile community is AMR-WB, also

    known as G.722.2. Mobile operators wanted better sound quality delivered in less bandwidth, so AMR-WB should deliver G.722

    quality at around 24 kbit/s. France Telecom and Ericsson have been

    leaders in promoting AMR-WB for mobile HD voice -- in part, because

    they hold some of the patents in the standard -- and they would like to

    see AMR-WB appear in desktop phones and software clients so users can


    make end-to-end calls in AMR-WB, rather than having to translate

    (transcode) between G.722 and AMR-WB. You'll see more AMR-WB buzz

    for desktop handsets later in 2010 and into 2011.

    SILK is Skype's "super wideband" voice codec. Optimized for real-time

    communications on the Internet, SILK is an adaptive bit-rate codec that

    supports multiple sampling rates ranging from 8 kHz narrowband to 24

    kHz or more. If you have the CPU cycles and bandwidth of 40 kbit/s, SILK

    gives you the best performance possible. On a lower-powered machine

    and/or with less available bandwidth, SILK drops down and adjusts to the

    conditions involved. Unlike AMR-WB, SILK is available royalty-free. A few

    manufacturers, including AudioCodes, have discussed incorporating SILK

    into their products.

    Finally, Global IP Solution (GIPS) offers a proprietary wideband speech

    codec that has been incorporated into a large number of soft clients and

    applications, including AIM, Citrix Online, CommuniGate, Gizmo5, Google

    Talk, IBM Lotus, NimBuzz, QQ, WebEx, and Yahoo!

    The problem with too many different codecs

    In order to have a successful HD voice call, both parties (or nearly all in a conference) need to use the same codec. If both sides are using different

    HD codecs either one side has to be transcoded -- translated -- into the

    same codec type or both sides have to shift to a mutually agreeable

    codec.

    Transcoding already takes place in the VoIP world on a daily basis, with

    calls being compressed before being sent out long distance and translations

    taking place between the POTS network and VoIP transport. The issues with transcoding between HD codecs are that it takes more horsepower

    (processing cycles) than with vanilla VoIP/POTS networks and nobody is

    willing to say the end translation product is as good as a "pure" end-to-

    end HD voice call using a single codec.


    If both sides can't find a mutually agreeable HD voice codec, they end up

    dropping down to the lowest common denominator -- G.711 -- which kills

    the primary point of using HD in the first place.
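The fallback behaviour described above can be sketched as a simple intersection of preference lists. This is an illustrative model only, not the actual signaling procedure used by any phone or PBX; the function name `negotiate_codec` and the endpoint codec lists are assumptions made for the example.

```python
from typing import Optional, Sequence


def negotiate_codec(caller: Sequence[str], callee: Sequence[str]) -> Optional[str]:
    """Return the first codec in the caller's preference order that the callee also supports."""
    callee_set = set(callee)
    for codec in caller:
        if codec in callee_set:
            return codec
    return None  # no common codec: the call needs transcoding or cannot be set up


# Hypothetical endpoints, each listing codecs in order of preference.
desk_phone = ["G.722", "G.711"]
mobile_client = ["AMR-WB", "G.711"]
other_desk_phone = ["G.722", "G.711"]

print(negotiate_codec(desk_phone, mobile_client))     # G.711
print(negotiate_codec(desk_phone, other_desk_phone))  # G.722
```

The first call ends up in narrowband G.711 even though both endpoints are HD-capable, because their HD codecs do not overlap.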

    What is HD Voice?

    HD voice is a technology that delivers at least twice the sound as

    compared to a typical voice phone call (i.e. "Plain old phone service" or

    POTS to be hip; Public switched telephone network or PSTN if you're

    more formal) delivered on a landline through the world's analog circuit-

    switched phone network.

    Real world benefits from HD voice include:

    1. Better comprehension and clarity, especially in long, detailed and/or technical discussions

    2. Clarity in understanding acronyms

    3. The ability to differentiate between and clearly identify others on a conference call

    4. Clarity and easier understanding in multi-national/multi-lingual conversation where you have non-native speakers and native speakers communicating in one or more languages

    5. More accurate transcriptions (both human and automated)

    In short, everything involving voice is better in HD voice, be it a simple

    person-to-person call, a 20-person international conference discussion,

    or a speech-to-text process.

    The technology of HD voice

    Sound is measured in hertz, or Hz. The human ear can typically hear

    everything between 20Hz and 20,000Hz. The higher the number, the


    higher (squeakier) the sound is until you move past 20,000Hz and into

    ultrasound frequencies only a dog can pick up.

    A landline phone call captures and delivers sound in a range of 300Hz to

    3400Hz, so there's a lot of sound information chopped off on both the low

    and high-ends of the scale. For simplicity's sake, a POTS call has a range

    of about 3.4 kHz (3400Hz). POTS calls are often called "narrowband" calls

    because they have such a restricted range as compared to what the

    human ear can actually hear and process.

    Since an HD voice call is defined as delivering at least twice the sound

    range of a traditional phone call, an HD voice call will have a range of

    about 7 kHz -- or more. Wideband voice and HD voice are often used

    interchangeably since an HD voice call is "wider" -- more of a Hz range

    than a narrowband call.

    In order to deliver twice or better sound than a POTS call, the first thing

    you need is a phone acoustically built to capture and deliver that extra

    information, so both the microphone(s) and a speaker/handset must be

    capable of receiving and delivering across a 7 kHz or greater range.

    Once sound is captured, it needs to be processed into digital form with a

    codec. The G.722 codec (more on codecs later) is generally considered

    the baseline for HD voice; it captures and delivers sound between 50Hz

    and 7000Hz.

    Interestingly, an HD voice call using G.722 can be delivered on the same

    amount of bandwidth as its digital POTS equivalent of G.711 -- 64 kbit/s.

    If you are currently using G.711 in a VoIP phone system, you can switch to HD

    voice without needing more bandwidth.

    Finally, two (or more, if it's a conference call) parties need to be able to

    talk to each other using the same voice encoding (codec) scheme. Within

    an organization/PBX domain, this is pretty easy -- turn on G.722, reboot the phones, and you're done. Communicating between different HD voice

    groups, or "islands," is more difficult because there are some Internet

    peering and interconnection issues involved, but service providers are

    working out the details to transparently provide HD voice calling.


    What phones support HD voice?

    A better question might be "What phones don't support HD voice?" All the

    major IP telephone handset manufacturers -- Aastra, Allworx, AudioCodes,

    Avaya, Cisco, ShoreTel, Panasonic, Polycom, Siemens, and snom

    -- support G.722 in their current (2010) phone lines going all the way

    down to the entry-level (i.e. cheapest) model.

    Benefits of HD Voice

    Simply put, HD voice makes everything (voice) better. With an HD voice

    call delivering twice the sound as a narrowband one, there's much more audio information provided for the brain to process, resulting in less

    fatigue and better comprehension. Computer-based processes like voice

    recognition and speech-to-text also gain from HD voice, with better

    accuracy.

    Advocates of HD voice use the cliché of a call sounding as clear and

    natural as if you are talking to someone in the same room, and there's a

    laundry list of reasons for wideband goodness ranging from being able to

    understand a three year old (higher voices get clipped) to public safety.

    Specific HD Voice benefits for businesses include:

    Reduction of fatigue

    During a narrowband call, your brain is quietly working to "fill in the

    blanks" by interpreting word sounds that have been clipped to fit into a sound range of 300Hz to 3400Hz. All the information you normally hear

    between 20Hz and 300Hz and between 3400Hz and 20,000Hz is gone, so your brain

    has to figure out what is being said by using contextual clues.

    For short and clear calls, this isn't a big headache, but the longer the

    call, the more work your brain ends up doing without you thinking about


    it. (Yes, there's a reason why you dread hour-long calls and long for them

    to be over).

    Better comprehension and clarity

    Acronyms are "notorious" for being hard to understand during a narrowband call, says HD voice expert and Polycom CTO Jeff Rodman. In

    addition, similar-sounding words like "sail" and "fail" also cause confusion.

    A narrowband call can result in a lot of repetition and additional

    explanation -- or people just don't get it the first time through and have

    to get clarification via email... or another phone call.

    Because HD voice provides more sound information, it's easy to

    understand the difference between FEC, FCC, SEC, and FTC on a call. In professions where accuracy and speed count -- such as medical, legal,

    and financial -- HD voice is a clear winner because information is

    communicated more accurately the first time around. Technical

    conversations are easier because terms can be clearly understood.

    As a result, it is rare that speakers are asked to repeat themselves -- an

    occurrence that happens all too often in narrowband.

    In addition, individual voices -- people -- are highlighted in HD voice,

    making it easier to know who is talking during a call.

    Conference calls rock!

    If there's just one "must have" app for HD voice, it is conferencing. The

    combination of reduced fatigue and better comprehension and clarity

    enables people to worry more about the content of discussions, rather

    than struggling to understand what is being said.

    Executives at Fortune 500 companies -- the "C-Level" guys -- are starting

    to insist upon conducting conference calls in HD voice for the efficiency

    it brings. Time is money, and HD voice enables people to focus on the job at hand and to get it done more quickly.

    Improved multi-national/multi-cultural communication


    HD voice is a clear winner when it comes to international calls and

    another "must have" for businesses regularly doing business with non-

    native speakers of another language.

    For most of us, it is a challenge to speak another language. There are

    accent issues, vocabulary issues, and even tone can be used differently

    to communicate nuances. Put all of those factors into a narrowband call

    and the ability to clearly communicate between offices in Europe and

    Asia becomes much more difficult.

    Using HD voice, non-native speakers will be able to more clearly

    understand what is being said and be more clearly understood when they

    speak. And if everyone is a non-native speaker of the common language

    being used during a call, HD voice might make the difference between

    communication and confusion.

    More accurate transcriptions

    HD voice provides much better raw information for both humans and

    computers to process when it comes to creating transcriptions. Human

    beings can more easily hear what is being said in a recording, saving

    time. Any automated speech-to-text process -- ranging from transcription

    to emailing phone messages -- benefits.


    http://news.techeye.net/mobile/the-two-minute-guide-to-mobile-hd-voice

    VoIP bandwidth fundamentals

    Bandwidth requirements for Voice over IP can be a tricky beast to tame until you look at the method and factors involved. This guide investigates what bandwidth means for VoIP, how to calculate bandwidth consumption for a VoIP network and how bandwidth can be saved by using voice compression.

    Table of contents

    1. What about bandwidth for VoIP?

    -- An introduction to bandwidth issues for Voice over IP and its different components.

    2. Calculating bandwidth consumption for VoIP

    -- This section discusses how bandwidth can be calculated for VoIP transmissions and what strategies work best for the majority of situations.

    3. How can voice compression save bandwidth?

    -- Using voice compression can be one of the best strategies when trying to save bandwidth. This section discusses how these 'savings' can be achieved.

    What about bandwidth for VoIP?

    Voice over IP (VoIP) is the descriptor for the technology used to carry digitized voice over an IP data network. VoIP requires two classes of protocols: a signaling protocol such as SIP, H.323 or MGCP that is used to set up, disconnect and control the calls and telephony features; and a protocol to carry speech packets. The Real-Time Transport Protocol (RTP) carries speech transmission. RTP is an IETF standard introduced in 1995 when H.323 was standardized. RTP will work with any signaling protocol. It is the commonly used protocol among IP PBX vendors.


    An IP phone or softphone generates a voice packet every 10, 20, 30 or 40 ms, depending on the vendor's implementation. The 10 to 40 ms of digitized speech can be uncompressed, compressed and even encrypted. This does not matter to the RTP protocol. As you have already figured out, it takes many packets to carry one word.

    The shorter the packet, the shorter the delay

    End-to-end (phone-to-phone) delay needs to be limited. The shorter the packet creation delay, the more network delay the VoIP call can tolerate. Shorter packets cause less of a problem if the packet is lost. Short packets require more bandwidth, however, because of increased packet overhead (this is discussed below). Longer packets that contain more speech bytes reduce the bandwidth requirements but produce a longer construction delay and are harder to fix if lost. Many vendors have chosen 20 or 30 ms size packets.

    RTP packet format

    The RTP header carries the time stamp and sequence number of the digitized speech sample (20 or 30 ms of a word) and identifies the content of each voice packet. The content descriptor defines the compression technique (if there is one) used in the packet. The RTP packet format for VoIP over Ethernet is shown below.

    Ethernet Trailer | Digitized Voice | RTP Header | UDP Header | IP Header | Ethernet Header

    RTP can be carried on frame relay, ATM, PPP and other networks with only the far right header and left trailer varying by protocol. The digitized voice field, RTP, UDP and IP headers remain the same.
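To make the header sizes concrete, here is a minimal sketch that packs the fixed 12-byte RTP header with Python's `struct` module. The field layout and the static payload type numbers (0 for G.711 mu-law, 9 for G.722, 18 for G.729) follow the RTP specification and its audio/video profile. The function name, the SSRC value and the sequence and timestamp values are arbitrary choices for illustration, not taken from the article.

```python
import struct

# Static RTP payload types from the RTP audio/video profile.
PAYLOAD_TYPES = {"PCMU": 0, "G722": 9, "G729": 18}

RTP_HEADER_LEN = 12   # fixed header, no CSRC entries or extensions
UDP_HEADER_LEN = 8
IPV4_HEADER_LEN = 20  # IPv4 header without options


def build_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                     marker: bool = False) -> bytes:
    """Pack the fixed RTP header: version 2, no padding, no extension, no CSRCs."""
    first_byte = 2 << 6  # V=2, P=0, X=0, CC=0
    second_byte = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", first_byte, second_byte,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


header = build_rtp_header(PAYLOAD_TYPES["G729"], seq=1, timestamp=160, ssrc=0x12345678)
print(len(header))                                        # 12
print(RTP_HEADER_LEN + UDP_HEADER_LEN + IPV4_HEADER_LEN)  # 40
```

The three header lengths add up to 40 bytes of RTP, UDP and IP overhead carried by every voice packet, regardless of how much speech it contains.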

    Each of these packets will contain part of a digitized spoken word. The packet rate is 50 packets per second for 20 ms and 33.3 packets per second for 30 ms voice samples. The voice packets are transmitted at these fixed rates. The digitized voice field can contain as few as 10 bytes of compressed voice or as many as 320 bytes of uncompressed voice.
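The packet rates and payload sizes above follow directly from the sample time and the codec bit rate. A minimal sketch, with illustrative function names, assuming 64 Kbps for uncompressed voice and 8 Kbps for compressed voice:

```python
def packets_per_second(sample_time_ms: float) -> float:
    """One packet is sent per sample interval."""
    return 1000.0 / sample_time_ms


def payload_bytes(codec_bit_rate_bps: int, sample_time_ms: float) -> float:
    """Bytes of digitized voice produced during one sample interval."""
    return codec_bit_rate_bps * sample_time_ms / 1000.0 / 8.0


for ms in (20, 30):
    print(ms, round(packets_per_second(ms), 1),
          payload_bytes(64_000, ms), payload_bytes(8_000, ms))
```

The two extremes quoted in the text correspond to `payload_bytes(8_000, 10)`, which gives 10 bytes, and `payload_bytes(64_000, 40)`, which gives 320 bytes.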

    The UDP header carries the sending and receiving port numbers for the call. The IP header carries the sending and receiving IP addresses for the call plus other control information. The Ethernet header carries the


    LAN MAC addresses of the sending and receiving devices. The Ethernet trailer is used for error detection purposes. The Ethernet header is replaced with a frame relay, ATM or PPP header and trailer when the packet enters a WAN.

    'Shipping and handling'

    In reality, there is no Voice over IP. It is really voice over RTP, over UDP, over IP and usually over Ethernet. The headers and trailers are required fields for the networks to carry the packets. The header and trailer overhead can be called the shipping and handling cost.

    The RTP plus UDP plus IP headers will add on 40 bytes. The Ethernet header and trailer account for another 18 bytes of overhead, for a total of at least 58 bytes of overhead before there are any voice bytes in the packet. These headers, plus the Ethernet header, produce the overhead for shipping the packets. This overhead can range from 20% to 80% of the bandwidth consumed over the LAN and WAN. Many implementations of RTP have no encryption, or the vendor has provided its own encryption facilities. An IP PBX vendor may offer a standardized secure version of RTP (SRTP).

    Shorter packets have higher overhead. There are 58 bytes of overhead carrying the voice bytes. As the size of the voice field gets larger with longer packets, the percentage of overhead decreases -- therefore the needed bandwidth decreases. In other words, bigger packets are more efficient than smaller packets.
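How quickly the overhead share falls as the voice field grows can be worked out from the byte counts given above. This is a minimal sketch using the 40-byte RTP/UDP/IP and 18-byte Ethernet figures from the text; the function name and the sample payload sizes are chosen for illustration.

```python
RTP_UDP_IP_OVERHEAD = 40  # bytes, from the text
ETHERNET_OVERHEAD = 18    # bytes, header plus trailer, from the text


def overhead_share(payload_bytes: int,
                   overhead_bytes: int = RTP_UDP_IP_OVERHEAD + ETHERNET_OVERHEAD) -> float:
    """Fraction of each packet's bytes that is header/trailer rather than voice."""
    return overhead_bytes / (payload_bytes + overhead_bytes)


for payload in (10, 20, 160, 320):
    print(f"{payload:>3} voice bytes -> {overhead_share(payload):.0%} overhead")
```

With these figures the smallest compressed packets are mostly overhead, while the largest uncompressed packets carry mostly voice, in line with the rough 20% to 80% range quoted above.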

    Header compression

    Cisco has created a header compression technique that is now the standard called RTP header compression. This technique actually compresses the RTP, UDP and IP headers and significantly reduces the RTP, UDP and IP overhead from 40 bytes to between 4 and 6 bytes. The bandwidth consumption for compressed voice packets can be reduced by nearly 60%. This technique has less value for large uncompressed voice packets. The header compression technique is not recommended for the LAN implementations because there is typically more than enough bandwidth for voice calls. The header compression technique should be considered for the WAN implementations, where bandwidth is limited and much more expensive.
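The difference in benefit between compressed and uncompressed voice can be estimated with a short calculation. The sketch below assumes a 4-byte compressed header and 6 bytes of PPP/Frame Relay link overhead per packet; both values are assumptions for this example, and the function name is illustrative.

```python
def crtp_saving(payload_bytes: int, packets_per_second: float,
                layer2_bytes: int = 6, full_header: int = 40,
                compressed_header: int = 4) -> float:
    """Fractional bandwidth reduction from compressing the RTP/UDP/IP headers."""
    full_bps = (payload_bytes + full_header + layer2_bytes) * 8 * packets_per_second
    compressed_bps = (payload_bytes + compressed_header + layer2_bytes) * 8 * packets_per_second
    return 1 - compressed_bps / full_bps


# 20 ms packets: 20 bytes of 8 Kbps compressed voice vs. 160 bytes of 64 Kbps voice
print(f"compressed voice:   {crtp_saving(20, 50):.1%}")
print(f"uncompressed voice: {crtp_saving(160, 50):.1%}")
```

Under these assumptions the saving is a little over half for small compressed-voice packets and well under a fifth for large uncompressed ones, which is why the technique matters mainly on the WAN.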


    Calculating bandwidth consumption for VoIP

    The bandwidth needed for VoIP transmission will depend on a few factors: the compression technology, packet overhead, network protocol used and whether silence suppression is used. This tip investigates the first three considerations. Silence suppression will be covered in a later tip.

    There are two primary strategies for improving IP network performance for voice: Allocate more VoIP bandwidth (reduce utilization) or implement QoS.

    How much bandwidth to allocate depends on:

    Packet size for voice (10 to 320 bytes of digital voice)

    CODEC and compression technique (G.711, G.729, G.723.1, G.722, proprietary)

    Header compression (RTP + UDP + IP), which is optional

    Layer 2 protocols, such as point-to-point protocol (PPP), Frame Relay and Ethernet

    Silence suppression/voice activity detection

    Calculating the bandwidth for a VoIP call is not difficult once you know the method and the factors to include. The chart below, "Calculating one-way voice bandwidth," demonstrates the overhead calculation for 20 and 40 byte compressed voice (G.729) being transmitted over a Frame Relay WAN connection. Twenty bytes of G.729 compressed voice is equal to 20 ms of a word. Forty bytes of G.729 compressed voice is equal to 40 ms of a word.


    The results of this method of calculation are contained in the next table, "Packet voice transmission requirements." The table demonstrates these points:

    Bandwidth requirements reduce with compression, G.711 vs. G.729.

    Bandwidth requirements reduce when longer packets are used, thereby reducing overhead.

    Even though the voice compression is an 8 to 1 ratio, the bandwidth reduction is about 3 or 4 to 1. The overhead negates some of the voice compression bandwidth savings.

    Compressing the RTP, UDP and IP headers (cRTP) is most valuable when the packet also carries compressed voice.

    Packet voice transmission requirements

    (Bits per second per voice channel)

    Codec    Voice bit rate  Sample time  Voice payload  Packets per second  Ethernet   PPP or Frame Relay (RTP)  PPP or Frame Relay (cRTP)
    G.711    64 Kbps         20 msec      160 bytes      50                  87.2 Kbps  82.4 Kbps                 68.0 Kbps
    G.711    64 Kbps         30 msec      240 bytes      33.3                79.4 Kbps  76.2 Kbps                 66.6 Kbps
    G.711    64 Kbps         40 msec      320 bytes      25                  75.6 Kbps  73.2 Kbps                 66.0 Kbps
    G.729A   8 Kbps          20 msec      20 bytes       50                  31.2 Kbps  26.4 Kbps                 12.0 Kbps
    G.729A   8 Kbps          30 msec      30 bytes       33.3                23.4 Kbps  20.2 Kbps                 10.7 Kbps
    G.729A   8 Kbps          40 msec      40 bytes       25                  19.6 Kbps  17.2 Kbps                 10.0 Kbps

    Note: RTP assumes 40-octets RTP/UDP/IP overhead per packet

    Compressed RTP (cRTP) assumes 4-octets RTP/UDP/IP overhead per packet

    Ethernet overhead adds 18-octets per packet

    PPP/Frame Relay overhead adds 6-octets per packet

    This table provided courtesy of Michael Finneran.
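The table values can be reproduced from the per-packet overheads listed in the notes. The following is a sketch under those stated assumptions (40 or 4 octets of RTP/UDP/IP, 18 octets for Ethernet, 6 octets for PPP/Frame Relay); the function and variable names are illustrative.

```python
def bandwidth_kbps(codec_bps: int, sample_ms: float,
                   l3_overhead: int, l2_overhead: int) -> float:
    """One-way bandwidth per call, in Kbps, including per-packet overhead."""
    payload = codec_bps * sample_ms / 1000 / 8   # voice bytes per packet
    pps = 1000 / sample_ms                       # packets per second
    return (payload + l3_overhead + l2_overhead) * 8 * pps / 1000


# (label, RTP/UDP/IP octets, layer 2 octets), taken from the table notes
LINKS = [("Ethernet RTP", 40, 18), ("PPP/FR RTP", 40, 6), ("PPP/FR cRTP", 4, 6)]

for codec, bps in (("G.711", 64_000), ("G.729A", 8_000)):
    for ms in (20, 30, 40):
        row = ", ".join(f"{label} {bandwidth_kbps(bps, ms, l3, l2):.1f}"
                        for label, l3, l2 in LINKS)
        print(f"{codec} {ms} ms: {row}")
```

The results agree with the table to within 0.1 Kbps. Most of the 30 msec entries in the table appear to be truncated rather than rounded, so the last digit can differ by one.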

    The varying designs of packet size, voice compression choice and header compression make it difficult to determine the bandwidth to calculate for a continuous speech voice call. The IP PBX or IP phone vendor should be able to provide tables like the one above for their products. Many vendors have selected 30 ms for the payload size of their VoIP implementations. A good rule of thumb is to reserve 24 Kbps of IP network bandwidth per call for 8 Kbps (G.729-like) compressed voice. If G.711 is used, then reserve 80 Kbps of bandwidth.

    If silence suppression/voice activity detection is used, the bandwidth consumption may drop 50% -- to 8 Kbps total per VoIP call. But the assumption that everyone will alternate between voice and silence without conflicting with each other is not always realistic. Silence suppression will be discussed in a later tip.

    Most enterprise designers do not perform these calculations. The vendor provides the necessary information. The designer does have


    some freedom, such as selecting the compression technique for voice payloads and headers, and may be able to vary the packet size.

    How can voice compression save bandwidth?

    The Public Switched Telephone Network (PSTN) started with the transmission of analog speech. This worked well for decades until the areas under city streets became saturated with copper cables, one copper pair per call. Starting in the 1950s, AT&T Bell Labs developed a technique to carry more voice calls over copper wire. They developed digitized voice technology through which 24 digital calls can be carried on two pairs of copper wire, thereby increasing the carrying capacity of the cables twelvefold. The voice is digitized into streams of 64,000 bps per call. The technology is called a T1 circuit and the bandwidth for the 24 calls is 1.544 Mbps. This worked well for domestic connections. The T1 technology then became the mechanism for long-distance domestic transmission.
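The 64,000 bps and 1.544 Mbps figures can be derived from the T1 frame structure. This sketch assumes 8-bit samples taken 8,000 times per second and one framing bit per frame; the framing bit is standard T1 framing but is a detail not mentioned in the paragraph above.

```python
CHANNELS = 24
BITS_PER_SAMPLE = 8
FRAMES_PER_SECOND = 8_000      # one frame per sampling interval
FRAMING_BITS_PER_FRAME = 1     # assumption: standard T1 framing bit

channel_rate_bps = BITS_PER_SAMPLE * FRAMES_PER_SECOND
voice_rate_bps = CHANNELS * channel_rate_bps
line_rate_bps = (CHANNELS * BITS_PER_SAMPLE + FRAMING_BITS_PER_FRAME) * FRAMES_PER_SECOND

print(channel_rate_bps)  # 64000
print(voice_rate_bps)    # 1536000
print(line_rate_bps)     # 1544000
```

The 24 voice channels account for 1.536 Mbps, and the framing bit adds the remaining 8 Kbps to reach 1.544 Mbps.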

    Most of the early voice compression technologies were designed for undersea cables, where bandwidth was limited and expensive. Voice compression technologies were created to reduce this bandwidth requirement. Voice compression is also used for digital cell calls, operating at about 8 Kbps instead of 64 Kbps. So voice compression is not new.

    As the PBX market has moved into an IP-based environment, voice compression has become attractive for WAN transmission. Voice compression can be used on a LAN, but since LANs have so much available bandwidth, it is not commonly applied to the LAN.

    The quality of a PSTN voice call provides enough analog bandwidth to understand the speaker in any language. It is also enough bandwidth for speaker recognition. The analog bandwidth delivered by the PSTN is about 3.4 kHz. This is considered toll quality. Voice compression can reduce the speech quality and may affect speaker recognition, so there is a limit to how much bandwidth reduction is possible before callers complain about voice quality.


    The CODEC (COder/DECoder) is the component in an IP phone that digitizes the voice and converts it back into an analog stream of speech. The CODEC is the analog-to-digital-to-analog converter. The CODEC may also perform the voice compression and decompression.

    There are several voice digitization standards and some proprietary techniques in use for VoIP transmission. Most vendors support one or more of the following ITU standards and avoid proprietary solutions:

    G.711 is the default standard for IP PBX vendors, as well as for the PSTN. This standard digitizes voice into 64 Kbps. There is no voice compression.

    G.729 is supported by many vendors for compressed voice operating at 8 Kbps, 8 to 1 compression. With quality just below that of G.711, it is the second most commonly implemented standard.

    G.723.1 was once the recommended compression standard. It operates at 6.3 Kbps and 5.3 Kbps. Although this standard further reduces bandwidth consumption, voice is noticeably poorer than with G.729, so it is not very popular for VoIP.

    G.722 operates at 64 Kbps, but offers high-fidelity speech. Whereas the three previously described standards deliver an analog sound range of 3.4 kHz, G.722 delivers 7 kHz. This version of digitized speech has been announced by several vendors and will become common in the future.

    It is important to note that all of the voice digitization transmission speeds are for voice only. The actual transmission speed required must include the packet protocol overhead.
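
To make the overhead point concrete, here is a rough sketch of the arithmetic. It assumes uncompressed IPv4, UDP and RTP headers (20, 8 and 12 bytes) and a 20 ms packetization interval; neither figure comes from the article, and link-layer framing such as Ethernet or PPP would add more on top. The function name `ip_bandwidth_kbps` is made up for illustration.

```python
# Assumed header sizes in bytes (IPv4 + UDP + RTP, no header compression).
IP_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12


def ip_bandwidth_kbps(codec_kbps: float, packet_ms: float = 20.0) -> float:
    """Return the IP-layer bandwidth in kbit/s for one direction of a call.

    codec_kbps is the voice-only rate quoted for the codec.
    packet_ms is the amount of speech carried per packet (assumed 20 ms).
    """
    # Voice payload carried in each packet, in bytes.
    payload_bytes = codec_kbps * 1000 / 8 * packet_ms / 1000
    packets_per_second = 1000 / packet_ms
    total_bytes = payload_bytes + IP_HEADER + UDP_HEADER + RTP_HEADER
    return total_bytes * 8 * packets_per_second / 1000


if __name__ == "__main__":
    for name, rate in [("G.711", 64), ("G.729", 8), ("G.722", 64)]:
        print(f"{name}: {rate} kbit/s voice, "
              f"{ip_bandwidth_kbps(rate):.1f} kbit/s with IP/UDP/RTP headers")
```

Under these assumptions the fixed 40 bytes of headers per packet weigh far more heavily on a low-rate codec than on G.711, which is why the voice-only speeds understate the real WAN requirement.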

    The quality of a voice call is defined by the Mean Opinion Score (MOS). A score of 4.4 to 4.5 out of a possible 5.0 is considered to be toll quality. Voice compression will affect the MOS. An MOS below 4.0 will usually produce complaints from the callers. Cell phone calls average about 3.8 to 4.0 for the MOS. The following table presents the voice MOS for different standard CODECs:

    Standard   Speed      MOS   Sampling delay per phone
    G.711      64 Kbps    4.4   0.75 ms
    G.729      8 Kbps     4.2   10 ms
    G.723.1    6.3 Kbps   4.0   30 ms
               5.3 Kbps   3.5   30 ms

    This table illustrates two points. First, as the voice is compressed, the voice quality (MOS) decreases. The MOS in the table does not include network impairments such as jitter and packet loss. These impairments will further reduce the voice quality. The VoIP network designer should choose a compression technique with a higher MOS so the network impairments will not reduce the voice quality to an unacceptable level.

    Second, voice compression also adds delay to the end-to-end call. The table shows the sampling delay for one phone. This delay is doubled for the two phones of a call. This end-to-end delay needs to be limited. As compression increases, the delay experienced in the IP network needs to decrease, which increases the cost of transmission over the WAN, but not the LAN. The delays shown in the table are the theoretical minimum. The actual delays experienced will probably exceed 30 ms, no matter what compression technology is implemented. This delay will vary by vendor.

    The conclusion is that digital voice compression is worth pursuing for VoIP transmission on a WAN, but it comes with some costs in voice quality reduction and increased end-to-end delay.

    For more information, view this VoIP over WAN tutorial.

    About the author: Gary Audin has more than 40 years of computer, communications and security experience. He has planned, designed, specified, implemented and operated data, LAN and telephone networks. These have included local area, national and international networks, as well as VoIP and IP convergent networks in the U.S., Canada, Europe, Australia and Asia.

    About G.711

    ISDN audio telecommunication may in principle be accomplished in many ways, but most regular calls are encoded according to the G.711 recommendation of the CCITT (Comité Consultatif International Téléphonique et Télégraphique, which has since been integrated into the ITU). G.711 compresses by logarithmic companding, which reduces 13-bit (A-law) or 14-bit (μ-law) linear samples to 8 bits. As the sampling rate is 8 kHz, the transmission rate equals the 64 kbps offered by one ISDN B-channel. There are two variants of G.711: A-law is dominant in Europe, whereas the United States and Japan commonly use μ-law.
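
The logarithmic idea can be sketched with the textbook continuous μ-law curve. This is only an illustration: the real G.711 encoder uses a piecewise-linear segment table rather than this formula, so the code below is not bit-exact with the standard, and the function names are made up for the example.

```python
import math

MU = 255  # mu-law parameter; the continuous curve below is an approximation of G.711


def mu_law_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize_8bit(y: float) -> int:
    """Quantize a companded value in [-1, 1] to one of 256 levels (0..255)."""
    return min(255, int((y + 1.0) / 2.0 * 256))


# Bit rate: 8000 samples per second at 8 bits each.
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
bit_rate_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000

if __name__ == "__main__":
    for x in (0.01, 0.1, 0.5, 1.0):
        y = mu_law_compress(x)
        print(f"x={x:<5} companded={y:.3f} code={quantize_8bit(y)}")
    print(f"bit rate: {bit_rate_kbps} kbit/s")
```

The curve spends most of the 256 codes on quiet samples, where the ear is most sensitive to quantization error, which is how 8 bits can stand in for a wider linear sample.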

    What is HD voice?

    Your standard POTS call captures and delivers sound in an audio range of 300 hertz to 3400 hertz with standards set back in 1937. The VoIP equivalent of POTS is G.711 and takes up 64 kbit/s of bandwidth.

    The baseline definition for wideband voice, typically called HD voice, is G.722. It delivers audio in the range of 50 to 7000 hertz, about twice the range of a typical POTS call or G.711. Due to a little data compression on the fly, a G.722 phone call only takes up 64 kbit/s of bandwidth.
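
The "little data compression" can be put in numbers. The figures below are recalled from the G.722 recommendation rather than taken from this article, so treat them as assumptions: 16 kHz sampling of 14-bit input, split into two sub-bands that are each coded at 8 kHz, with 6 bits for the low band and 2 bits for the high band in the 64 kbit/s mode.

```python
# Assumed G.722 parameters (64 kbit/s mode), not stated in the article.
INPUT_RATE_HZ = 16000    # wideband sampling rate
INPUT_BITS = 14          # linear bits per input sample
SUBBAND_RATE_HZ = 8000   # each sub-band runs at half the input rate
LOW_BAND_BITS = 6        # ADPCM bits per low-band sample
HIGH_BAND_BITS = 2       # ADPCM bits per high-band sample


def g722_rates_kbps() -> tuple[float, float]:
    """Return (uncompressed, coded) bit rates in kbit/s."""
    raw = INPUT_RATE_HZ * INPUT_BITS / 1000
    coded = SUBBAND_RATE_HZ * (LOW_BAND_BITS + HIGH_BAND_BITS) / 1000
    return raw, coded


if __name__ == "__main__":
    raw, coded = g722_rates_kbps()
    print(f"uncompressed: {raw} kbit/s, G.722: {coded} kbit/s, "
          f"ratio {raw / coded:.1f}:1")
```

On these assumptions the codec fits a signal with twice the sampling rate of G.711 into the same 64 kbit/s channel.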

    The combination of upper and lower frequency sounds gives a much clearer and "richer" experience on voice calls with the key marketing-speak phrase used to describe it as, "Conversations sound as clear and natural as if talking to someone in the same room."

    Additional/complementary buzz-phrases include "a dramatically improved communications experience" and "Conference calls will be easy to follow and much less exhausting."

    Why is HD voice such a big deal?

    Current phone call quality sucks compared to FM radio or CDs, and mobile calls suck more. Cellular tech heads started with a 1937-era audio standard and then ran the quality of experience through a data compression blender to cram more calls into radio frequency (RF) spectrum.

    Implementing HD voice should make everything revolving around voice (conference calls, IVR, speech-to-text, calls to Mum and the wee ones) a much better experience.

    How do you deliver mobile HD voice?

    First, forget about the POTS network and all that legacy analogue crap. You need an all-IP network with low latency and enough bandwidth to transport a wideband voice call, so you need the latest hot-rocking 3G and 4G-esque data networks.

    France Telecom is delivering HD voice over the latest GSM/HSPA-alphabet-soup via a soft client, but you can do the same thing on a fast enough WiMAX or LTE network. Qualcomm has done some demos over CDMA, but given the worldwide love of 4G, mobile HD on that tech might be some wishful thinking.

    You also need end-user devices (i.e. phones) with a quality microphone to capture 7 kHz of sound, enough CPU horsepower to encode and decode that information on the fly, and a speaker/headphone to deliver the sound to the human ear.

    Nokia and SonyEricsson have announced phones that support AMR-WB, the de facto standard of mobile HD voice. You can also do mobile HD voice with a soft client and a sufficiently powerful smartphone; expect to see HD voice clients for the iPhone being demoed by Global IP Solutions (GIPS) and Fraunhofer using codecs other than AMR-WB.

    AMR-WB: what the hell?

    AMR-WB (AMR-wideband) is the codec and heir-apparent replacement for AMR used in "standard" GSM calls to provide mobile HD voice. Also called G.722.2, it is designed to provide an HD voice experience in 24 kbit/s, a big deal to the cellular world that wants to conserve both RF and network bandwidth.

    But there's no free beer when compared to G.722. AMR-WB requires more CPU cycles and number crunching for efficient compression, which translates to shorter battery life. Further, AMR-WB is a patented codec with intellectual property contributed by France Telecom/Orange, Nokia, Ericsson and VoiceAge.

    Alternatives to AMR-WB have been floated, ranging from implementing G.722 to Skype's SILK to Fraunhofer providing an "AAC Enhanced Low Delay" codec based on MPEG.

    G.722 has the advantages of being royalty-free and not such a CPU devourer, but it takes up 64 kbit/s. For the cellular RF heads, this is a theoretical show stopper, but since the mobile people are pimping their data networks to support two-way video calling with HD voice, the whole "conserve RF/conserve network bandwidth" argument is crap. Device manufacturers also like the fact that G.722 is a simple piece of code to implement relative to all the different profile flavors for AMR-WB.

    Skype wants everyone to use SILK and offers it as royalty-free and open source, but after the skeletons as to who owned what IP after eBay bought Skype, well... It doesn't stop people loading Skype clients on mobile phones and running SILK "natively."

    And it works just like normal phone calling, eh? If I have an HD voice phone and my bud does...

    Well... not really, not yet.

    Carriers and businesses running HD voice currently operate as islands: you can communicate with someone within your network, but if carrier A has HD voice and carrier B has HD voice, you aren't going to be able to connect an HD voice phone call because the higher level SIP/IP connectivity isn't set up if you are using those old-fashioned phone numbers to "dial" another person.

    Some sort of HD voice interoperability / interconnection announcement at Mobile World Congress is purportedly going to take place, where a group of mobile carriers have agreed to exchange AMR-WB calls among themselves, and if so, this is one of those Key Announcements which will get HD voice moving faster.

    HD voice interoperability is not technically hard, since mobile carriers already have ways to exchange MMS and picture mail and all those other multimedia-loaded services via IP; supporting AMR-WB calls is just another data type to exchange via IP. But the politics is another story.

    Speaking of ugly, how do calls move between the PSTN, HD voice, AMR-WB, G.722,SILK, and whatever flavor-of-the-day codecs pop up?

    Calls need to be transcoded (translated between codecs). For example, France Telecom already has to transcode between mobile HD voice users and its own PSTN connections to the rest of the world. And you have to transcode between AMR-WB and G.722 (mobile HD voice and broadband HD voice), plus SILK since Skype wants its due for HD voice.
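
The transcoding decision can be sketched as a small codec-matching routine. Everything here is hypothetical: the preference order, the `plan_call` name and the returned fields are illustrative and do not describe how any carrier or gateway actually negotiates codecs.

```python
# Preference order is an illustrative assumption, not a standard ranking.
PREFERENCE = ["AMR-WB", "G.722", "SILK", "G.711"]
WIDEBAND = {"AMR-WB", "G.722", "SILK"}


def plan_call(caller: set[str], callee: set[str]) -> dict:
    """Decide whether a gateway has to transcode between two endpoints.

    Each argument is the set of codec names an endpoint supports.
    Raises StopIteration if an endpoint supports none of the known codecs.
    """
    shared = [c for c in PREFERENCE if c in caller and c in callee]
    if shared:
        # Both ends speak the same codec: media passes through untouched.
        codec = shared[0]
        return {"transcode": False, "legs": (codec, codec),
                "hd": codec in WIDEBAND}
    # No common codec: each leg uses its own best codec and a gateway
    # translates between them.
    a = next(c for c in PREFERENCE if c in caller)
    b = next(c for c in PREFERENCE if c in callee)
    return {"transcode": True, "legs": (a, b),
            "hd": a in WIDEBAND and b in WIDEBAND}


if __name__ == "__main__":
    print(plan_call({"AMR-WB", "G.711"}, {"G.722", "G.711"}))
    print(plan_call({"AMR-WB"}, {"G.722"}))
    print(plan_call({"AMR-WB"}, {"G.711"}))
```

In this sketch a call stays HD end to end only when both legs use a wideband codec; as soon as one leg falls back to G.711, the whole call is limited to narrowband quality.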

    Doug Mohney is Editor-in-Chief of HD Voice News (www.hdvoicenews.com) and is happy to cause heartburn in league with Mike Magee whenever he can.

    Read more: http://news.techeye.net/mobile/the-two-minute-guide-to-mobile-hd-voice#ixzz2cO4dkViL
