8/12/2019 Voice Fundamentals Book
Foreword
After more than 100 years of experience in providing global telecommunications
solutions, Nortel (Northern Telecom) has acquired Bay Networks, Inc., adding world-
class, IP-based data communications capabilities that complement and expand Nortel's
acknowledged strengths. This precedent-setting union creates Nortel Networks, a new
company with a widely respected heritage and a unique market position: Unified
Networks.
Unified Networks create greater value for customers worldwide through network
solutions that integrate data networking and telephony. The Unified Networks strategy
extends to solutions, products, and services, delivering new economics in networking by
reducing costs, introducing higher revenue services, and delivering new value derived
through networking.
The emergence of the World Wide Web and increasing market deregulation have created
a strong demand for networks that provide increased profitability and higher service
levels for organizations of all types. Unified Networks from Nortel Networks deliver
cutting-edge solutions to reach these new levels of economics.
To meet increased needs, Nortel Networks is delivering a new class of customer
relationships. Extranets, intranets, Web access, e-mail, call centers, and old-fashioned
personal attention are combined to help customers deal with a wide range of new,
challenging, and potentially confusing issues. Whether a customer is at the enterprise
level, a service provider, or a small business, Nortel Networks delivers Unified
Networks solutions designed to meet their unique business challenges.
Solutions based on Unified Networks strategies can take many forms, involving many
different products and technologies - including those that existed prior to the merger with
Bay Networks and many launched since. Unified Networks solutions are differentiated
only by their size, scope, and ambition.
Each solution is tailored to the unique needs of the customer, and integrates a variety of
products, technologies, and services, some of which are described below.
Accelar brings together switching and routing into a low-cost, very high-performance
package.
CallPilot unifies disparate messaging systems, user interfaces, and presentation
formats, making messaging more intuitive and easier to use.
#1 FRADs - Dataquest 1998
#1 FRAD revenues - Dell'Oro 1H98
#1 Packet Switch - Dataquest 1998
#1 in PADs - Dataquest 1998
Voice Communication

Voice communication has long existed in the world of analog and digital telephone
exchanges. Fixed, dedicated, switched services have provided the user with the ability
to place telephone calls to practically anywhere in the world. The way that voice is
carried and switched within both public and private networks is changing with the
evolution towards the Broadband Integrated Services Digital Network (B-ISDN), based
upon Asynchronous Transfer Mode (ATM) technology. This evolution is accelerating
the shift to enterprise networks capable of handling voice, video, and data
transmissions over a single, integrated infrastructure.
This evolution delivers numerous benefits, including more efficient use of network
bandwidth and the ability to offer many different types of voice service. However, there
are many issues that first need to be understood, and then overcome, before these
benefits can be realized.
This booklet examines these issues, highlighting voice communication techniques in
use today and investigating those that may become commonplace in the future. Of
course, with the limited space available, not every technological aspect can be covered
in depth. This booklet is designed to serve as a useful introduction to the main subject
areas, and additional references have been included for individuals interested in
learning more.

References are given by a number in square brackets and listed in Appendix D, e.g. [3].
Intended Audience

This booklet is intended for a broad range of readers who are involved in the design,
implementation, operation, management, or support of enterprise networks carrying
both voice and data traffic. It has particular relevance to those with a background in data
and/or limited experience in voice technologies, although it will also serve as a useful
reference for anyone involved in voice networks today.
Table of Contents

1 Introduction
  1.1 A Short History of the Telephone
2 The Telephone
  2.1 Key Voice Fundamentals
    2.1.1 Frequency
    2.1.2 Levels
  2.2 Basic Operation of the Telephone
    2.2.1 Basic Telephony - Signaling
    2.2.2 Basic Telephony - The Speech Path
  2.3 Other Types of Telephones
    2.3.1 "Two-wire" vs. "Four-wire" Telephony
3 PBX Phone Systems
  3.1 Introduction to the PBX
  3.2 Call Routing in a PBX
  3.3 Voice Interfaces on a PBX
4 Introduction to Digital Voice
  4.1 The Channel Bank
  4.2 Digital Voice - Pulse Code Modulation (PCM) G.711
    4.2.1 A-law and µ-law PCM
    4.2.2 Power of a Digital Signal
    4.2.3 Distortion Resulting from the Digitization Process
  4.3 The Digital 1.544 Mbps PBX Interface (DS-1)
    4.3.1 Physical Interface
    4.3.2 Framing - D4
    4.3.3 Framing - Extended Superframe (ESF)
    4.3.4 Channel Associated Signaling (CAS) on DS-1
    4.3.5 Common Channel Signaling on DS-1
    4.3.6 DS-1 Alarms
  4.4 The Digital 2.048 Mbps PBX Interface (E1)
    4.4.1 Physical Interface - G.703
    4.4.2 Framing Structure - G.704
    4.4.3 Channel Associated Signaling (CAS) on E1
    4.4.4 Common Channel Signaling (CCS) on E1
    4.4.5 E1 Alarms
  4.5 The Need for PBX Synchronization
    4.5.1 PBX Systems Without Synchronization
    4.5.2 PBX Systems With Synchronization
5 Speech Compression
  5.1 Different Coding Types
  5.2 Adaptive Differential Pulse Code Modulation (ADPCM)
  5.3 Code Excited Linear Prediction (CELP)
  5.4 Low Delay-CELP (LD-CELP) ITU-T G.728
  5.5 Conjugate Structure-Algebraic CELP (CS-ACELP) ITU-T G.729
  5.6 Other Compression Techniques
  5.7 Speech Compression Impairments
    5.7.1 Mean Opinion Score (MOS)
    5.7.2 Quantization Distortion Units (QDUs)
    5.7.3 Speech Compression and Voice-Band Data
  5.8 Fax Relay
6 Echo and Echo Control
  6.1 What is Echo?
    6.1.1 Causes of Echo
  6.2 Echo Control Devices
    6.2.1 When Is Echo Control Required?
    6.2.2 Echo Control Devices
  6.3 Echo Suppressors
  6.4 Echo Cancellers
    6.4.1 Nonlinear Processor
    6.4.2 Tail Circuit Considerations
    6.4.3 Types of Echo Cancellers
    6.4.4 Tone Disabling of Echo Cancellers and Echo Suppressors
    6.4.5 G.168 (Improved Echo Canceller)
7 Introduction to Signaling Systems
  7.1 Analog Signaling Systems
    7.1.1 Ground Start: 2-Way PBX to Public Exchange Trunk Circuit
    7.1.2 E&M Trunk
    7.1.3 AC Signaling Schemes
    7.1.4 Manual Signaling
  7.2 Digital Signaling Systems
    7.2.1 Channel Associated Signaling (CAS)
    7.2.2 Common Channel Signaling (CCS)
      7.2.2.1 Private-to-Public Networking Protocols
      7.2.2.2 Public-to-Public Networking Protocol
      7.2.2.3 Private Networking Protocols
      7.2.2.4 How Does CCS Work?
8 Voice Within the Enterprise Network
  8.1 What is an Enterprise Network?
    8.1.1 Different Types of Enterprise Networks
  8.2 Time Division Multiplexers (TDM)
    8.2.1 TDM Synchronization
9 Voice Over Asynchronous Transfer Mode (ATM)
  9.1 Introduction to ATM
    9.1.1 The ATM Cell
    9.1.2 The ATM Adaptation Layers
    9.1.3 ATM Service Categories
    9.1.4 Statistical Multiplexing with ATM
  9.2 Voice and Telephony Over ATM (VTOA)
    9.2.1 Voice Sample Cellification
    9.2.2 Speech Activity Detection
  9.3 PBX Synchronization Across an ATM Network
    9.3.1 Synchronous Residual Time Stamp (SRTS)
    9.3.2 Adaptive Clock Recovery (ACR)
    9.3.3 Independent Timing
10 Voice Over Frame Relay
  10.1 Introduction to Frame Relay
  10.2 Voice Over Frame Relay (VoFR)
    10.2.1 Delay on Frame Relay Networks
    10.2.2 VoFR Standards
  10.3 Benefits and Issues of VoFR
    10.3.1 Benefits
    10.3.2 Issues
11 Voice Over IP
  11.1 What is IP?
    11.1.1 How Voice Over IP Works
    11.1.2 Benefits and Issues of Voice Over IP
    11.1.3 Standards
12 Voice Switching in the Enterprise Network
  12.1 The Evolution of Voice Networks
  12.2 What is Voice Switching?
  12.3 Why Perform Voice Switching?
APPENDIX A - Company and Product Overview
APPENDIX B - Introduction to Decibels
APPENDIX C - Glossary of Terms
APPENDIX D - References
1 Introduction
This booklet provides an introduction to many of the fundamentals of voice
communication, beginning with analog techniques and concluding with voice over ATM.
It discusses some of the ways that enterprise networks can be created, and how to
make the most efficient use of available network bandwidth through techniques such as
speech compression and speech activity detection (also known as silence suppression).
Many other issues are covered, including how and why echo occurs and the techniques
that can be used to overcome it, allowing network managers to migrate their voice
networks onto ATM.
1.1 A Short History of the Telephone

In the mid-1870s, while trying to understand sound and sound communications,
Scottish-born inventor Alexander Graham Bell had an idea for a device that would
transmit sound over long distances by converting the sound to an electrical signal. This
device was later called the telephone, derived from the Greek words meaning 'far'
(tele) and 'sound' (phone). Bell was not the only person of the time developing a
telephone device; however, he was the first to patent the device, in 1876.
Further developments were made to the telephone during the late 1870s. Bell created
the induction-based earpiece, and Thomas Edison was responsible for the design of the
carbon microphone. The incorporation of these enhancements produced a truly practical
instrument.
Initially, the telephone had no mechanism for dialing another number. To make a call,
the user turned a handle, which generated an electric current. This current signaled
the operator at the local exchange. To connect a caller to the called party, the operator
would manually insert a jack plug into the corresponding jack socket.
It wasn't until 1889 that Almon B. Strowger developed the automatic telephone
exchange. One of the most unlikely people to be involved in telephony, Strowger
developed the exchange as a way of beating his business rival in Kansas City, USA.
The wife of Strowger's main competitor was the operator of the local exchange, and
whenever a call came in asking for an undertaker, she naturally passed it on to her
husband. To overcome this problem, Strowger developed the first automatic telephone
exchange and the dial telephone, eliminating the need for an operator.
Telephone networks have undergone many changes since those early days. However,
many of the underlying principles remain the same. The basic "two-wire" telephone used
in most domestic homes today still operates in essentially the same way as the
telephones of over 100 years ago.
2 The Telephone
In this section, the basic operation of the telephone is examined with a look at the two
basic functions that it offers: signaling and speech transmission. To better understand
this critical piece of equipment, it is important to appreciate how human voice and
hearing function. To complete this section, other types of telephones will be examined,
including proprietary designs and digital telephone sets.
2.1 Key Voice Fundamentals
2.1.1 Frequency
Human speech occurs as a result of air being forced from the lungs, through the vocal
cords and along the vocal tract, which extends from an opening in the vocal cords to
the mouth and nose. Speech consists of a number of different types of sounds, including
voiced, unvoiced, and plosive sounds. Voiced sounds result from the vocal cords
vibrating, interrupting the flow of air from the lungs and producing sounds in the
frequency range of approximately 50 to 500 Hertz (Hz). Unvoiced sounds result when
the air passes some obstacle in the mouth or a constriction in the vocal tract. Finally,
plosive sounds result from air being let out with a sudden burst, for example when the
vocal tract is closed and then suddenly released, or when the mouth is closed and
suddenly opened. A person's nasal cavities and sinuses also modify all of these sounds,
and all contribute to what we know as normal human speech.
The range of frequencies that results from these sound sources, combined with the
structure of the vocal tract, nasal cavities, and sinuses, varies depending upon who is
actually speaking. The resulting mix of frequencies determines the unique sound of a
person's voice.
The range of frequencies produced by speech varies significantly from one person to
another as explained above. Normally, frequencies in the range of about 50 Hz upward
are generated, with the majority of the energy concentrated between about 300 Hz and
3 kilohertz (kHz). The human ear, on the other hand, can detect sounds over a range of
frequencies from around 20 Hz to 20 kHz, with maximum sensitivity in the region
between 300 Hz and 10 kHz.
Taking these two factors into account, as well as the results of practical testing, the
frequency band of 300 Hz to 3.4 kHz has been found to be the key sonic range for
speech intelligibility and voice recognition. Reducing this bandwidth quickly reduces
intelligibility, while increasing it adds quality yet does not significantly improve
intelligibility or voice recognition.
As a result, the frequency band used in telephone systems is limited to between 300 Hz
and 3.4 kHz, delivering a system that provides speech transmission that is quickly
recognized and easily understood.
2.1.2 Levels

It is important to ensure that voice signals are transmitted at the correct level across a
network, so that end-to-end performance is maintained. Too low a level can result in
speech merging into background noise, creating an environment where the listener finds
it hard to hear the talker and is encouraged to talk loudly. On the other hand, too high a
level will encourage the listener to talk too quietly.
Today, international voice communication is part of everyday life. People need to be able
to communicate with others anywhere in the world as effectively as if they were in their
own country, or even in their own office. This goal is complicated by the way telephone
systems have evolved differently in different countries. For example, an analog
telephone (the term analog is described in Section 4.2) from North America transmits a
lower-level electrical signal for a given acoustic volume than a telephone in the UK.
Signal levels will be discussed in this booklet in terms of decibels (dB), and related
terms such as dBm, dBm0, and dBr. Readers unfamiliar with these terms, or who simply
need a refresher, should refer to Appendix B.
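As a small taste of the decibel arithmetic covered in Appendix B, the sketch below (an illustration added here, not material from the booklet itself) converts an absolute power to dBm and a power ratio to dB:

```python
import math

def power_to_dbm(power_mw):
    """Absolute power level in dBm: 10 * log10(P / 1 mW)."""
    return 10 * math.log10(power_mw / 1.0)

def gain_db(p_out_mw, p_in_mw):
    """Gain (positive) or loss (negative) of a circuit in dB."""
    return 10 * math.log10(p_out_mw / p_in_mw)

print(power_to_dbm(1.0))            # 0.0 -> 1 mW is the 0 dBm reference
print(round(power_to_dbm(2.0), 1))  # 3.0 -> doubling power adds about 3 dB
print(round(gain_db(0.5, 1.0), 1))  # -3.0 -> halving power loses about 3 dB
```

The logarithmic scale is what makes end-to-end level planning tractable: gains and losses along a connection simply add in dB.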
2.2 Basic Operation of the Telephone
Telephones come in many varieties, yet they fall into two main categories: analog and
digital. The original sets designed by A. G. Bell were analog. In fact, most telephones
used in domestic environments are still analog.
The simplest form of telephone today is the two-wire "loop-disconnect" telephone. It is
also known by various other names, including "loop-start" and "POTS" (Plain Old
Telephone Service) telephone. It connects to the telephone exchange via two wires that
carry the voice signals in both directions, hence the term two-wire telephone. The wires
also carry the dialed digits to the exchange and the incoming ringing voltage to the
phone. The exchange places a voltage of about 48 volts across the pair of wires to
power the telephone and monitor the on-hook, off-hook, and pulse dialing activity.
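The battery feed described above can be caricatured as a simple series circuit; the sketch below is an illustration only, and the resistance values are assumptions chosen for the example, not figures from this booklet:

```python
def loop_current_ma(line_ohms, phone_ohms=400, feed_ohms=400, volts=48):
    """DC loop current (mA) for a series model of the exchange battery,
    its feed resistance, the line resistance, and an off-hook telephone.
    All resistance values here are illustrative assumptions."""
    return 1000.0 * volts / (feed_ohms + line_ohms + phone_ohms)

# On-hook the loop is open and no current flows; off-hook the exchange
# sees tens of milliamps and interprets that as a request for service.
print(round(loop_current_ma(200), 1))   # 48.0 mA on a short line
print(round(loop_current_ma(1200), 1))  # 24.0 mA on a longer line
```

The same current that powers the set is what the exchange monitors for on-hook, off-hook, and pulse dialing activity.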
2.2.1 Basic Telephony - Signaling
To initiate a call, the user lifts the handset. This action closes a switch in the telephone
and causes current to flow in a loop, hence the term "loop-start." The exchange detects
this current flow as a request for service and provides a dial tone to the line. The dial tone signals
to the user that they may now start to dial. Dialing before hearing the dial tone may
result in digits being missed by the exchange; however, modern exchanges will usually
return dial tone immediately after detecting current flow. Upon hearing the dial tone, the
user begins to dial the called number. If the telephone is set to pulse dial, it rapidly
opens and closes the loop at a rate of approximately 10 or 20 PPS (Pulses Per
Second). This is also referred to as loop-disconnect dialing. Figure 2.1 shows the
progress of a call from the handset being lifted and dial tone being returned, to the first
digit being dialed (a 3 in this case).
The dial speed and the make/break ratio are standards that were set in the past. They
reflect the characteristics of switching equipment and direct control switches. The
make/break ratio varies according to the different dial pulse receivers used in different
countries (e.g. North America: 61/39, UK: 67/33, Germany: 60/40). A 50/50 ratio was
not chosen because it did not match the characteristics of the mechanical relays and
switches in the switching systems.
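The pulse timing above can be sketched as follows. This is a simplified model for illustration only: real dials also insert an inter-digit pause, which is omitted here.

```python
def pulse_train(digit, pps=10, break_pct=61):
    """Model loop-disconnect dialing of one digit as (state, ms) events.

    digit 1-9 sends that many loop breaks; 0 sends ten. break_pct is the
    percentage of each pulse period the loop is held open (61/39
    break/make is the North American figure quoted above)."""
    pulses = 10 if digit == 0 else digit
    period_ms = 1000.0 / pps            # 100 ms per pulse at 10 PPS
    events = []
    for _ in range(pulses):
        events.append(("break", period_ms * break_pct / 100))
        events.append(("make", period_ms * (100 - break_pct) / 100))
    return events

train = pulse_train(3)
print(len(train) // 2)   # 3 -> three break/make pulses for the digit 3
print(train[0])          # ('break', 61.0) -> loop held open for 61 ms
```

Changing break_pct to 67 models the UK receiver timing from the same list.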
An alternative way of sending dialing information, called Dual Tone Multi-Frequency
(DTMF), is much more common today. In this form of signaling, each number is
represented by two tones that are transmitted simultaneously on the voice path for a
short period of time.
Figure 2.1 Operation of Loop Disconnect Dialling
Figure 2.2 DTMF Frequencies
The frequencies used are shown in Figure 2.2 and defined in ITU-T Recommendation
Q.23 [1].
DTMF transmits digits much faster than pulse dialing, and the time taken to send each
digit is independent of the digit being sent. An additional benefit of DTMF is that once
the call is established, pressing a key on the phone will transmit the tones over the voice
path, enabling DTMF to be used to access voice mail, home banking systems, and other
tone-based systems.
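Since Figure 2.2 is not reproduced here, the Q.23 keypad grid can be sketched in code: each key sends the sum of one low-group (row) tone and one high-group (column) tone. The sample generator below is a simple illustration, not production telephony code.

```python
import math

# ITU-T Q.23 DTMF grid: one low-group (row) and one high-group (column)
# frequency per key, transmitted simultaneously.
ROW_HZ = [697, 770, 852, 941]
COL_HZ = [1209, 1336, 1477, 1633]
KEYS = ["123A", "456B", "789C", "*0#D"]

def dtmf_pair(key):
    """Return the (low, high) frequency pair in Hz for a keypad key."""
    for r, row in enumerate(KEYS):
        if key in row:
            return ROW_HZ[r], COL_HZ[row.index(key)]
    raise ValueError("unknown DTMF key: %r" % key)

def dtmf_samples(key, ms=40, rate=8000):
    """Generate `ms` milliseconds of the two summed sinusoids at `rate` Hz."""
    f1, f2 = dtmf_pair(key)
    n = int(rate * ms / 1000)
    return [0.5 * math.sin(2 * math.pi * f1 * i / rate) +
            0.5 * math.sin(2 * math.pi * f2 * i / rate) for i in range(n)]

print(dtmf_pair("5"))          # (770, 1336)
print(len(dtmf_samples("5")))  # 320 samples: 40 ms at 8000 samples/s
```

Because both tones sit inside the 300 Hz to 3.4 kHz speech band, they pass over any established voice path, which is why in-call DTMF works.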
When an incoming call arrives at the telephone set, the exchange applies an AC ringing
voltage to the pair of wires. To answer the incoming call, the user picks up the handset.
This action applies a loop to the line that is detected by the exchange, which then
removes the ringing and connects the voice path through.
Recall - Recall is a function available on most simple two-wire analog telephones
(except for some older models). It is often accessed with a button marked "R", and can
be used for a number of functions, such as accessing additional features from a
telephone exchange or swapping between calls on the same line. There are two types
of Recall, namely Timed Break Recall (TBR) and Earth Recall (ER). With TBR, pressing
the Recall button while the handset is off-hook causes the phone to put a timed break
on the line (similar to dialing a "1"). With the phone set to Earth Recall, the phone
momentarily applies a ground (earth) to one of its leads, known as the B lead.
Figure 2.3 Two-wire Telephone Set Interface
2.2.2 Basic Telephony - The Speech Path
Apart from transmitting dialing information, the main function of the telephone is to
provide voice communications. As already mentioned, the simple telephone has to
provide simultaneous voice paths in both directions even though there are only two
wires. It achieves this through the use of a hybrid circuit, the purpose of which is to take
four-wire speech (i.e. separate paths for transmit and receive) and to combine the two
onto a single two-wire path. Figure 2.3 shows a simplified view of the interface between
a two-wire telephone and the exchange. Speech from the mouthpiece will pass through
the hybrid and onto the telephone line. However, for reasons described below, a certain
amount of the speech signal will be reflected back to the earpiece, which is referred to
as sidetone.
The telephone, the telephone line, and the exchange interface each present an
impedance that determines the relationship between the voltage and current on the
line. To enable the maximum amount of power to be transferred from the telephone to
the line and into the exchange interface (and vice versa), the impedances should be as
close as possible to one another. This is achieved through the use of a balance
impedance.
As long as the balance impedance closely matches the impedance presented by the line
and the exchange interface, minimal amounts of signal will be reflected back to the
telephone earpiece. If there is significant impedance "mismatch", then sidetone
increases as more signal is reflected back to the earpiece. At the exchange interface,
the speech signal originating from the telephone mouthpiece is directed towards caller
"Y" by the hybrid.
In the opposite direction, a speech signal from caller "Y" at the exchange is transmitted
to the two-wire line, and as long as the exchange balance impedance matches the line
and telephone set, then little of this signal is reflected back to caller "Y".
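The effect of an impedance mismatch can be illustrated numerically. The sketch below treats the impedances as purely resistive for simplicity (real line impedances are complex, and the 600 and 900 ohm figures are common textbook values, not taken from this booklet):

```python
import math

def return_loss_db(z_actual, z_balance):
    """Return loss (dB) when a hybrid's balance impedance z_balance meets
    an actual terminating impedance z_actual (both resistive, in ohms).
    Higher return loss means less signal reflected back as sidetone or echo."""
    reflection = abs(z_actual - z_balance) / (z_actual + z_balance)
    if reflection == 0:
        return float("inf")            # perfect balance: nothing reflected
    return -20.0 * math.log10(reflection)

print(return_loss_db(600, 600))            # inf -> perfect match
print(round(return_loss_db(900, 600), 1))  # 14.0 dB -> modest mismatch
```

The same mismatch mechanism at a distant hybrid is what produces echo, the subject of Section 6.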
2.3 Other Types of Telephones

While the simple two-wire telephone set is used extensively in the domestic
environment, it is less common in private networks based on Private Branch Exchange
(PBX) systems. In this environment, it is common to find feature telephone sets that
offer a wider range of facilities than the basic telephone. These feature sets vary from
manufacturer to manufacturer, each usually being of a proprietary design. Some are
analog, while others are digital. Most PBX systems today are of the digital variety,
although this does not necessarily imply the use of a digital telephone set, and most
allow both analog and digital phones to be used. With an analog phone, the interface in the
PBX converts the analog signals to digital PCM (see Section 4.2). A digital telephone
performs the analog/digital and digital/analog conversions within the set.
Some digital telephones are of a proprietary nature, where the format of the digital data
is manufacturer-specific. Many digital sets, however, conform to the Basic Rate ISDN
standards as outlined in ITU-T Recommendation I.420 [2]. This defines a digital
interface that carries two channels operating at 64 kilobits per second (Kbps), known
as the B or Bearer channels, and a 16 Kbps signaling channel, known as the D or
Delta channel.
The B channels can be used to carry data or digitized voice in a similar manner to the
Primary Rate interfaces described in Sections 4.3 and 4.4. The main difference is that
on a Basic Rate interface only two B channels are available. The D channel is normally
used to carry Common Channel Signaling (CCS) information for call control, in a similar
manner to CCS on Primary Rate interfaces (see Section 7.2). However, the
specifications also allow the D channel to carry user data, including packet-switched
and frame-based data.
2.3.1 "Two-wire" vs. "Four-wire" Telephony

References are frequently made to two-wire or four-wire telephone sets. Care should
be taken when interpreting these terms, since they can be confusing. A two-wire
telephone means that speech is carried in both directions on the same pair of wires,
and requires hybrid circuits to split the two paths into separate transmit and receive
functions at the telephone set. It is also possible, particularly in some proprietary
telephone sets, to have additional wires for signaling purposes. In this case, a two-wire
telephone may actually be connected to the exchange by more than two wires.
A four-wire telephone strictly means that speech is carried in each direction on separate
pairs of wires and no hybrid circuit is necessary. However, a better definition of four-wire
telephony is that the speech is carried on separate paths, which may be pairs of wires,
or might even be separate channels in a digital system. For example, once inside a
digital telephone exchange, speech is carried "four-wire" even though it is carried in
timeslots in high-speed digital signals rather than actually on four wires.
Voice Fundamentals
Section 2 - The Telephone
3 PBX Phone Systems
This section is not intended to be a detailed description of the operation of PBX phone
systems. However, some of their key aspects will be covered in order to provide useful
background for other sections of this booklet.
The PBX system will be examined followed by a quick look at how telephone calls are
routed within a PBX network. Finally, the different types of voice interfaces normally
available on a PBX will be introduced.
3.1 Introduction to the PBX
Simply put, a PBX is a telephone exchange privately owned by an organization.
Its objective is to provide voice (and often data) communications to its users. In addition
to offering simple call setup facilities, it also offers many other features and facilities to
make life easier for its users. Users can place calls to others in their organization by
simply dialing their extension number. To place a call to a person not connected to the
same PBX network requires the call to be routed via the Public Switched Telephone
Network (PSTN). This usually involves dialing an access code, such as a "9" or "0",
followed by the complete destination number including country code and area code
if appropriate.
Figure 3.1 Typical Components of a Digital PBX
Most larger PBX systems today are digital. This means that they route connections in a
digital form, with speech first being converted from analog into PCM (see Section 4.2).
Figure 3.1 shows the typical components used within a digital PBX.
The "core" of the PBX is the common control and the switching matrix. The common
control acts as the "brains", and controls the overall operation of the PBX. It performs
functions such as recognizing that a phone has been taken off hook and connecting a
dial-tone generator to the phone, interpreting the dialed digits and routing the call to a
particular trunk or line interface, and so on. The switching matrix takes in 1.544 Mbps bit
streams comprising multiple 64 Kbps channels and allows them to be connected, or
switched, to any other 64 Kbps channel on any other interface.
Interfaces to the PBX come in two main types: lines and trunks. Lines connect user
devices such as analog or digital telephone sets, or other devices such as data
terminals. Trunks are shared links and can carry connections originating from line inter-
faces on the same PBX or from other trunks also connected to the PBX. Analog trunks
can only support one connection at a time, while digital trunks can support many
connections simultaneously (see Sections 4.3 and 4.4). Trunks can be further divided
into two types: PSTN trunks (also called Central Office trunks) and
private trunks. PSTN trunks connect the PBX to the public telephone network, and
private trunks (also called "Tie lines" or "Tie trunks") connect the PBX to other PBXs as
part of an overall private network.
3.2 Call Routing in a PBX
When a user dials a destination number, the PBX needs to determine how to route the
connection in the most efficient manner. The PBX needs to consider many factors,
including: Is the number valid? Is this user allowed to connect to the specified destina-
tion? Which is the cheapest trunk to use? Is there a trunk free?
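The routing checks above can be sketched in a few lines. This is a hypothetical illustration only: the route table, trunk names, and barred-prefix rule are invented for the example, and real PBX routing tables are far richer (time-of-day rules, digit manipulation, alternate routing, and so on).

```python
# Hypothetical sketch of the route-selection checks a PBX might apply.
# All table entries and names below are invented for illustration.

ROUTE_TABLE = {
    # leading digits -> trunks ordered from cheapest to most expensive
    "3": ["tie_trunk_1", "tie_trunk_2"],    # extensions reached via tie trunks
    "9": ["pstn_trunk_1", "pstn_trunk_2"],  # external calls via the PSTN
}

BUSY_TRUNKS = {"tie_trunk_1"}     # trunks currently carrying calls
BARRED_PREFIXES = {"900"}         # destinations this user may not dial

def route_call(dialed: str):
    """Return the first free trunk for the dialed digits, or None."""
    if any(dialed.startswith(p) for p in BARRED_PREFIXES):
        return None               # user not allowed to reach this destination
    for prefix, trunks in ROUTE_TABLE.items():
        if dialed.startswith(prefix):
            for trunk in trunks:  # try the cheapest trunk first
                if trunk not in BUSY_TRUNKS:
                    return trunk
    return None                   # number not valid, or no trunk free

print(route_call("3415"))    # -> tie_trunk_2 (tie_trunk_1 is busy)
print(route_call("900123"))  # -> None (barred destination)
```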
Figure 3.2 shows a network of PBX systems connected together with inter-PBX trunks,
which could be analog, digital, or both. To place a call the user takes Telephone A "off-
hook" and dials the number for Telephone B. PBX 1 inspects the dialed digits and makes
a decision as to which trunk to route the call on. In this case it chooses Trunk 1. PBX 1
seizes Trunk 1 (or a single timeslot of a digital trunk) and passes dialing information
across it. The method used for dialing will depend upon what type of trunk it is (see
Section 7 for more details on signaling). PBX 2 receives the call, inspects the dialed
number, and routes the call accordingly onto PBX 3. PBX 3 inspects the digits, identifies
that the destination is local to itself, and alerts the user to the incoming call by
making the phone ring.
3.3 Voice Interfaces on a PBX
The types of voice interface available on PBX systems are many and varied. They
typically fall into three categories:
Line Interfaces - These are the interfaces on the PBX that connect to desktop tele-
phones. Line interfaces can include any of the types discussed in Sections 2.2 and 2.3,
including:
2-wire analog loop disconnect/loop start
2-or 4-wire analog proprietary feature set
4-wire digital set, either proprietary or conforming to Basic Rate ISDN
standards
Private Trunk Interfaces - These interfaces provide the links between PBXs within a
multi-PBX private network. They allow calls to be routed from one PBX to another
without the need to involve the public network, avoiding extra call cost. Private trunk
interfaces typically include:
2- or 4-wire analog with Ear and Mouth (E&M) signaling
4-wire analog with AC15 signaling
Digital trunk supporting CAS or CCS signaling
Public Trunk Interfaces - These provide access from the PBX to the PSTN (Public
Switched Telephone Network) for outgoing and/or incoming calls. Public trunk interfaces
typically include:
Ground Start analog trunk: 2-wire, both-way calling
Analog Direct Dial In (DDI): 2-wire, typically incoming calls only
Digital trunk supporting CAS or CCS signaling
Figure 3.2 Call Routing in a PBX Private Network
4 Introduction to Digital Voice
As previously discussed, many of today's voice interfaces rely on analog technology.
However, once received into a PBX or a wide area network, analog voice is converted
to a digital format in order to derive all the available benefits. This section looks at the
background to digital voice, followed by a more detailed look at how analog speech is
actually converted to a digital format. The integration of digital voice channels into
primary rate digital interfaces is then presented, and finally the need for synchronization
in a PBX network is examined.
4.1 The Channel Bank
The channel bank was one of the first devices to make use of digital voice in a practical
environment. It is a device that takes multiple analog voice channels, digitizes them, and
then multiplexes them onto a high-speed digital link. Two main types of channel bank
exist today: one multiplexing up to 24 voice channels onto a 1.544 Mbps link, and
another multiplexing 30/31 channels onto a 2.048 Mbps link.
The first channel bank was developed in North America in 1962, and was known as the
D1 channel bank. It provided 24 analog inputs, each of which was converted to 8-bit
PCM, although the least significant bit was then ignored: seven bits carried each voice
sample and one bit carried signaling, giving a combined data rate of 1.544 Mbps. It was
later found that this seven-bit PCM gave unsatisfactory
voice quality. Subsequent generations have used eight-bit PCM with "robbed bit"
signaling (see Section 4.4.4).
Channel banks have evolved beyond simply supporting analog voice circuits. Today, it
is common for a channel bank to support multiple interface types other than just the
analog voice interface. For example:
2-wire speech with loop disconnect signaling (incoming and outgoing)
2-/4-wire speech with E&M signaling
Data 0 - 64 Kbps
Data n x 64 Kbps
4.2 Digital Voice - Pulse Code Modulation (PCM) G.711
Digital voice is a representation of analog voice signals that uses binary "1"s and "0"s,
also known as bits.
Figure 4.1 shows the progress of a speech signal entering the telephone, being
converted into an analog electrical signal, and then being converted into a digital form.
When the talker speaks, they create variations of pressure in the air. The telephone
picks up these pressure changes and turns them into an electrical signal that is
analogous to the acoustic signal from the talker (hence the term analog). This analog
signal is then converted into a digital stream of data bits which represents the digital
voice signal.
But why transport voice in a digital format? There are a number of reasons, including:
Digital transmission is independent of distance - When an analog
signal is transmitted over a transmission channel, it is attenuated by
signal losses in the cable. In addition, noise is picked up
that will affect the quality of the voice transmission. The signal that arrives
at the destination is made up of a combination of the original signal and
line noise. Amplifiers can be used to boost the signal back to the original
level, but there is no easy way for the amplifier to distinguish between the
original signal and line noise, so the noise is also amplified.
Digital signals take on one of two levels, represented by binary "0"s and
binary "1"s. When noise is introduced into the digital signal, it can be
easily removed by regenerating equipment. Thus, the signal that arrives
at the destination is an identical replica of the signal transmitted from the
source.
Multiplexing of voice and data - Since the digitized PCM voice is essen-
tially a data stream running at a bit rate of 64 Kbps, it can be readily inte-
grated with other PCM channels to make up an aggregate connection
combining many voice channels in one physical connection. Since it is
essentially data, it can be combined with other 'real' data and transmitted
over a common transmission medium. This approach is used for the
digital representation of analog voice signals in telephone systems and is
defined in ITU-T Recommendation G.711 [3]. A simplified view of the
process can be seen in Figure 4.2.
Figure 4.1 Progress of speech signal - Analogue to Digital
The analog signal is sampled at a rate of 8000 times per second. This rate follows from
the Nyquist theorem, which states that the sampling rate must be at least twice the
maximum frequency of the signal being sampled (i.e. at least 2 x 3.4 kHz = 6.8 kHz for
telephone speech).
This results in Pulse Amplitude Modulation (PAM), which is simply a series of pulses that
represent the amplitude of the analog signal at each sample time. Each PAM sample is
compared to a range of fixed quantization levels, each of which is represented by a fixed
binary pattern. The binary pattern of the closest quantization level is then used to
represent the PAM sample.
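As a rough illustration of the sampling and quantization steps just described, the sketch below quantizes one cycle of a 1 kHz test tone using linear (uniform) quantization. The 4-bit sample size is chosen only to keep the numbers readable; real G.711 PCM uses 8-bit companded samples.

```python
import math

# Illustrative sketch of sampling and linear quantization.
# 4-bit samples are an assumption for readability, not the G.711 format.

SAMPLE_RATE = 8000          # samples per second (> 2 x 3.4 kHz)
BITS = 4                    # bits per sample -> 16 quantization levels
LEVELS = 2 ** BITS

def quantize(sample: float) -> int:
    """Map a PAM sample in [-1.0, 1.0] to the nearest quantization level."""
    level = round((sample + 1.0) / 2.0 * (LEVELS - 1))
    return max(0, min(LEVELS - 1, level))   # clamp to the valid range

# One cycle of a 1 kHz tone sampled at 8 kHz gives 8 PAM samples.
pam = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(8)]
pcm = [quantize(s) for s in pam]
print(pcm)
```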
Figure 4.2 Analogue to PCM Conversion
Figure 4.3 Linear/Uniform and Non-linear/Non-uniform Quantisation
Because only a finite number of quantization levels is available, this process introduces
an error into the digital representation of the analog signal. Using more bits per sample
provides more quantization levels and therefore less error. To achieve reasonable quality
over the range of speech amplitudes found in networks requires a minimum of 12-bit
PCM samples assuming linear quantization (also known as uniform quantization). A
view of linear quantization is given in Figure 4.3a.
In practice, this number of levels is unnecessary for two reasons. First, average signal
levels are normally small, and only the lower quantization levels actually get used.
Second, the human ear operates in a logarithmic manner, being more sensitive to distor-
tion in low-level signals than in high-level signals.
As a result, a technique known as companding is used. This reduces the total number
of quantization levels needed by keeping many closely spaced levels at low signal
amplitudes and fewer, more widely spaced levels at high amplitudes. This process of
companding is shown in Figure 4.3b.
4.2.1 A-law and µ-law PCM
There are two common types of PCM, µ-law and A-law, each of which uses a different
rule for the companding process. North America and Japan mostly use µ-law, whereas
other areas of the world use A-law. Both types are defined in the G.711
Recommendation, yet differ in a number of ways: the companding laws themselves differ,
and the allocation of PCM codes differs in relation to the amplitude of the PAM samples.
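The two companding rules can be written in their continuous forms as below. This is a sketch of the underlying curves only: the actual G.711 encoders use piecewise-linear (segmented) approximations of these curves to produce 8-bit codes, with µ = 255 and A = 87.6.

```python
import math

# Continuous forms of the G.711 companding curves (the standard itself
# uses segmented approximations; this sketch shows only the curve shapes).

MU = 255.0
A = 87.6

def mu_law(x: float) -> float:
    """Compress a sample in [-1, 1] with the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def a_law(x: float) -> float:
    """Compress a sample in [-1, 1] with the continuous A-law curve."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))
    return math.copysign(y, x)

# A quiet signal at 1% of full scale is mapped to over 20% of the output
# range, so low-level signals get proportionally more quantization levels.
print(round(mu_law(0.01), 3))
print(round(mu_law(1.0), 3))
```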
With A-law, after converting from PAM to PCM, the even bits of each sample are inverted
before being transmitted onto the digital transmission path. This bit inversion was origi-
nally used to ensure that a sufficient number of "1"s existed in the digital stream,
because any channel that was idle would otherwise produce a pattern of only "0"s. In
fact, on a 2.048 Mbps PBX interface this inversion is unnecessary, since the problem of
too many "0"s is dealt with at the physical layer (see Section 4.3.1).
Because of these differences, when operating an international network where both µ-law
and A-law PCM systems are used, it is important to perform a proper conversion between
the two. The conversion process is given in G.711, which defines that digital paths
between countries should carry signals encoded using A-law PCM. Where both countries
use the same law, that law should be used on digital paths between them. Any necessary
conversion is performed in the countries using µ-law PCM.
4.2.2 Power of a Digital Signal
It is easy to quantify the power levels for an analog interface, since this is something that
can be measured directly with a power meter. In the PCM world, there is no equivalent
direct method of measurement. Instead, a specific relationship is defined between an
analog signal and a digital sequence. The relationship of power to the digital signal is
defined in G.711 through two tables (one for µ-law and one for A-law) that define a
sequence of PCM samples. When decoded, these samples result in a 1 kHz sine wave
at a nominal level of 0 dBm0. This gives a theoretical maximum level of
+3.17 dBm0 for µ-law and +3.14 dBm0 for A-law. Any attempt to exceed these levels
will result in distortion of the signal, simply because there are no more quantization
levels.
4.2.3 Distortion Resulting from the Digitization Process
When PCM values are allocated to the PAM samples, a certain amount of distortion
results because of the finite number of quantization levels available to quantize the
analog signal. Distortion is covered in more depth in Section 7.7.
4.3 The Digital 1.544 Mbps PBX Interface (DS-1)
The 1.544 Mbps PBX interface is common to North America and Japan, and is often
referred to as a "T1" or a "DS-1" interface. (In practice these two terms are often used
interchangeably, although strictly this is incorrect: DS-1 refers to the 1.544 Mbps signal
format, while T1 refers to the digital transmission system that carries it.) It offers 23 or
24 traffic timeslots depending upon the type of signaling being used.
In countries supporting DS-1 interfaces, such as in North America, various types of T1
transmission facilities are offered. AMI facilities expect the attached device (e.g. PBX) to
provide an AMI electrical signal as described in Section 4.3.1. The main problem with
this is that long strings of "0"s do not provide any electrical voltage transitions. This can
result in loss of repeater synchronization on the transmission facility. It is therefore the
responsibility of the attached equipment to ensure that sufficient "1"s exist to maintain
synchronization. The proportion of "1"s to "0"s is known as the "1"s density.
An alternative type of facility supports Bipolar Eight Zeroes Substitution (B8ZS), where
violation pulses are introduced into the user data stream upon the detection of an
excess "0"s count. This technique is similar in principle to the HDB3 process used on
E1 (see Section 4.4.1), and is described below.
4.3.1 Physical Interface
DS-1 is supported via twisted pair cable only, unlike E1, which is supported on both
unbalanced coaxial cable and balanced twisted pair cable. DS-1 uses Alternate Mark
Inversion (AMI) line coding to electrically encode the signal on the line. However, to
overcome any problem of low "1"s density, a process called B8ZS is normally used
instead of the E1 HDB3 process. B8ZS, shown in Figure 4.7, works by replacing each
string of eight consecutive binary zeroes with a code that introduces bipolar violations
into the 4th and 7th bit positions. This ensures that a sufficient number of voltage
transitions exist, while retaining the DC-balanced nature of the signal.
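The substitution can be sketched using +1/-1 for pulses and 0 for spaces. This is a simplified list-of-integers model of the line coding, not a real line driver, but the 000VB0VB pattern it emits matches the rule described above.

```python
# Simplified sketch of B8ZS substitution on an AMI pulse stream.
# Pulses are modeled as +1/-1 and spaces as 0.

def b8zs_encode(bits):
    """AMI-encode a 0/1 bit list, substituting each run of eight zeros."""
    out = []
    last = -1                   # polarity of the most recent pulse
    zeros = 0
    for bit in bits:
        if bit:
            last = -last        # AMI: successive marks alternate polarity
            out.append(last)
            zeros = 0
        else:
            out.append(0)
            zeros += 1
            if zeros == 8:
                # 000VB0VB: violations (V) land in positions 4 and 7,
                # and the two B pulses keep the code DC-balanced.
                out[-8:] = [0, 0, 0, last, -last, 0, -last, last]
                zeros = 0
    return out

stream = b8zs_encode([1] + [0] * 8 + [1])
print(stream)   # [1, 0, 0, 0, 1, -1, 0, -1, 1, -1]
```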
4.3.2 Framing - D4
There are two common types of framing used on a DS-1 interface, D4 and Extended
Superframe (ESF).
D4 framing is shown in Figure 4.8. This consists of a frame of 193 bits with a repetition
rate of 8000 frames per second, giving a data rate of 1.544 Mbps and a 125 µs frame
duration. Each frame contains 24 eight-bit timeslots, named timeslot 1 through timeslot
24, and a single bit called the F or framing bit. All 24 timeslots are normally available for
traffic except for when CCS is carried. In this case, timeslot 24 is reserved for the
signaling channel.
Framing is achieved using the F bit over a sequence of 12 frames, which is also called
a superframe. In odd-numbered frames the F bit is called Ft for terminal framing, and
performs frame alignment. In even-numbered frames the F bit is called Fs and performs
superframe alignment.
4.3.3 Framing - Extended Superframe (ESF)
Today, ESF is more common than D4 due to its capability for monitoring the performance
of an in-service T1 link. This was not easily possible with D4, since the link would
need to be taken out of service in order for performance testing to be carried out.
The extended superframe, as shown in Figure 4.9, does exactly what its name implies
and extends the 12-frame superframe to 24 frames. The use of the F bit is also changed.
Figure 4.4 HDB3 Coding
Only 6 out of the 24 frames in the ESF are now used for synchronization purposes. Of
the remaining 18 F bits, six are utilized for CRC checking to verify the integrity of the
ESF, and 12 make up a Facility Data Link (FDL). The FDL is also known as the Data
Link (DL) and is sometimes called the Embedded Operation Channel (EOC). The FDL
is available for the communication of alarms, loop backs and general performance infor-
mation between terminating devices such as Customer Service Units (CSUs), which
terminate the T1s at the customer premises.
4.3.4 Channel Associated Signaling (CAS) on DS-1
The technique for CAS on a DS-1 is shown in Figures 4.8 and 4.9. The basic process is
the same for both D4 and ESF framing. However, with D4 only two signaling bits, A and
B, are used for each traffic timeslot, while with ESF four bits are used: A, B, C, and D.
The process used is called bit robbing, because the least significant bit of each traffic
timeslot in every sixth frame is taken away to carry signaling information rather than
traffic. Meanwhile, the other seven bits are left alone and continue to carry traffic such
as PCM. Any distortion introduced to PCM voice traffic by this bit-robbing technique is
negligible and can be ignored. However, for data the distortion can be significant. This
is why data support is typically only 56 Kbps rather than 64 Kbps, with only the seven
most significant bits being used.
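The bit-robbing operation on one frame can be sketched as follows. The frame-numbering convention and the helper function are illustrative (in a D4 superframe the robbed frames are frames 6 and 12); the point is that the seven most significant bits pass through untouched.

```python
# Sketch of robbed-bit CAS: in every sixth frame, the least significant
# bit of each eight-bit traffic timeslot is replaced by a signaling bit.

def rob_bits(frame_number, timeslots, signaling_bit=0):
    """Return one frame's 24 timeslots, robbing LSBs in every 6th frame."""
    if frame_number % 6 != 0:
        return list(timeslots)      # ordinary frame: traffic untouched
    # Robbed frame: keep the 7 most significant bits, overwrite the LSB.
    return [(ts & 0xFE) | (signaling_bit & 1) for ts in timeslots]

samples = [0b10101011] * 24         # 24 identical eight-bit PCM samples

print(rob_bits(5, samples)[0])      # 171: traffic frame, sample unchanged
print(rob_bits(6, samples, 0)[0])   # 170: LSB robbed for signaling
```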
Figure 4.5 G.704 Frame Structure for 2.048 Mbit/s E1
4.3.5 Common Channel Signaling on DS-1
Common Channel Signaling utilizes timeslot 24 to carry signaling information as HDLC-
based data messages. See section 5.2.2 for more details and examples.
4.3.6 DS-1 Alarms
DS-1 provides the same alarm conditions as E1, but in a different manner and with a
different naming convention. Figure 4.10 gives a comparison between DS-1 and E1 alarm
conditions and the naming conventions associated with each.
The method used by DS-1 to provide a remote alarm indication differs depending on
whether D4 or ESF framing is being used.
With D4 trunks, a remote alarm indication, also called a yellow alarm, is given by trans-
mitting a "0" in the bit 2 position of every timeslot. Putting this alarm indication in the
traffic timeslots has two implications. First, it destroys any valid information carried in the
traffic timeslots. Second, the receiver must validate the indication for a period of time
(typically about 600 milliseconds (ms)) before taking any action, since it is possible that
normal traffic could briefly mimic it.
With ESF trunks, a remote alarm indication (yellow alarm) is given by using the F bit
Facility Data Link to transmit an alternating pattern of eight "0"s, followed by eight "1"s,
then eight "0"s, and so on.
4.4 The Digital 2.048 Mbps PBX Interface (E1)
A digital PBX interface running at 2.048 Mbps, sometimes called the "E1" interface, is
designed to conform to ITU-T Recommendation G.732 "Characteristics of Primary PCM
Multiplex Equipment Operating at 2.048 Mbps" [4]. This in turn refers to the following
recommendations:
G.703: "Physical/Electrical Characteristics of Hierarchical Digital Interfaces"
G.704: "Synchronous Frame Structures Used at Primary and Secondary
Hierarchical Levels"
G.711: "Pulse Code Modulation (PCM) of Voice Frequencies"
4.4.1 Physical Interface - G.703
ITU-T Recommendation G.703 [5] defines the electrical characteristics for many types
and speeds of interfaces, including 64 Kbps, 1.544 Mbps, 6.312 Mbps, 32.064 Mbps,
44.736 Mbps, 2.048 Mbps, 8.448 Mbps, 34.368 Mbps, 139.264 Mbps, 97.728 Mbps, and
155.52 Mbps. European voice applications primarily use the 2.048 Mbps interface.
One of two types of physical interface may be used: the 75-ohm unbalanced coaxial
interface, or the 120-ohm balanced twisted pair interface. Parameters such as pulse
voltage, voltage tolerance, and frequency tolerance are specified in G.703.
The actual data bits are transmitted using Alternate Mark Inversion (AMI) with High
Density Bipolar 3 (HDB3) encoding. The objective of using these is twofold: first, to
remove any DC component from the transmitted signal (AMI performs this function), and
secondly, to ensure that there are a sufficient number of voltage transitions in the signal.
This is referred to as "1"s density, and is important so that the receiving device can
derive synchronization, or timing from the signal (HDB3 performs this function).
Figure 4.4 shows AMI and HDB3 encoding. AMI employs a three-level signal in which
binary zeroes are encoded as 0 volts, and successive binary "1"s (marks) are
encoded using alternating voltages of ±2.37 V for the unbalanced interface and ±3 V for
the balanced interface.
HDB3 is defined in G.703, and works by replacing each block of 4 successive zeroes by
a pattern of either 000V or B00V, where B represents an inserted pulse that conforms to
the AMI rule and V represents an AMI violation. A violation is where two successive "1"s
use the same electrical polarity. The choice of which pattern is used ensures that the
number of B pulses between consecutive V pulses is odd, thus retaining the DC-
balanced nature of the signal. This is important to help ensure error-free transmission.
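The HDB3 rule can be sketched in the same list-of-pulses style used for line coding above. This is a simplification: the choice between 000V and B00V is made so that successive violation pulses alternate in polarity, which is what the "odd number of B pulses" condition achieves.

```python
# Simplified sketch of HDB3 substitution on an AMI pulse stream.
# Pulses are modeled as +1/-1 and spaces as 0.

def hdb3_encode(bits):
    """AMI-encode a 0/1 bit list, substituting each run of four zeros."""
    out = []
    last = -1        # polarity of the most recent pulse
    last_v = -1      # polarity of the most recent violation pulse
    zeros = 0
    for bit in bits:
        if bit:
            last = -last            # AMI: successive marks alternate
            out.append(last)
            zeros = 0
        else:
            out.append(0)
            zeros += 1
            if zeros == 4:
                if last == last_v:
                    # even number of marks since the last V: use B00V
                    b = -last
                    out[-4:] = [b, 0, 0, b]
                    last = last_v = b
                else:
                    # odd number of marks since the last V: use 000V
                    out[-4:] = [0, 0, 0, last]
                    last_v = last
                zeros = 0
    return out

print(hdb3_encode([1, 0, 0, 0, 0]))      # 000V: [1, 0, 0, 0, 1]
print(hdb3_encode([1, 1, 0, 0, 0, 0]))   # B00V: [1, -1, 1, 0, 0, 1]
```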
4.4.2 Framing Structure - G.704
ITU-T Recommendation G.704 [6] defines the frame structures for a number of different
speed links, including 1.544 Mbps, 6.312 Mbps, 2.048 Mbps, and 8.448 Mbps.
As shown in Figure 4.5, a frame of 256 bits is defined with a repetition rate of 8000
frames per second, giving a data rate of 2.048 Mbps and a 125 µs frame duration. Each
Figure 4.6 E1 Alarms
frame comprises 32 eight-bit timeslots named timeslot 0 through to timeslot 31. Timeslot
0 is used for a number of purposes, including frame synchronization and alarm
reporting. Timeslot 16 is normally used to carry signaling information, although in some
circumstances it may be used to carry a traffic channel. Timeslots 1 to 15 and 17 to 31
are used to carry 30 traffic channels, normally PCM in the case of PBX systems.
However, since these timeslots simply represent a 64 Kbps channel they can be used
to carry any form of traffic, including data.
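The arithmetic behind this frame layout can be checked directly:

```python
# Arithmetic check of the G.704 E1 frame layout described above.

TIMESLOTS = 32              # timeslot 0 .. timeslot 31
BITS_PER_TS = 8
FRAMES_PER_SECOND = 8000

frame_bits = TIMESLOTS * BITS_PER_TS
line_rate = frame_bits * FRAMES_PER_SECOND
traffic_ts = [n for n in range(32) if n not in (0, 16)]

print(frame_bits)                        # 256 bits per frame
print(line_rate)                         # 2,048,000 bit/s
print(len(traffic_ts))                   # 30 traffic channels
print(BITS_PER_TS * FRAMES_PER_SECOND)   # 64,000 bit/s per timeslot
```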
Alternate timeslot 0s carry different information: as shown in Figure 4.5, even frames
carry the frame alignment signal 0011011, while odd frames carry the A (alarm) bit. The
purpose of the frame alignment signal is to allow the devices at each end of a link to
Figure 4.7 Bipolar Eight Zeroes Substitution (B8ZS) Coding
Figure 4.8 D4 Framing
synchronize to the frame, enabling them to know where the frame starts and which bits
refer to which timeslot. The A-bit, also known as the Remote Alarm Indication (RAI), is
set to "0" in normal operation. In the event of a fault condition the A-bit is set to
binary "1". See Section 4.4.5 for more details on alarms.
The other bits, Si and Sa4-Sa8, are of less significance, although still important. Si is
reserved for international use. One specific use, as given in G.704, is to carry a Cyclic
Redundancy Check that can be used for enhanced error monitoring on the link. It is
important to note that both ends of a link must be configured in the same way, either with
CRC enabled or CRC disabled.
Sa4 to Sa8 are additional spare bits that can be used for a number of purposes as
defined in G.704. For example, Sa4 can be used as a message-based link for opera-
tions, maintenance and performance monitoring.
4.4.3 Channel Associated Signaling (CAS) on E1
Figure 4.5 demonstrates a form of signaling known as Channel Associated Signaling
(CAS). With CAS, specific bits of data within the frame are defined to carry signaling
information for each of the traffic timeslots. The information for each timeslot is trans-
mitted as a group of four bits, designated A, B, C and D. Since timeslot 16 only has eight
bits available to support 30 traffic timeslots, a multiframe structure of 16 frames (desig-
nated frame 0 to frame 15) is defined to allow A, B, C, and D bits to be carried for all 30
timeslots. Frame 0 carries a Multiframe Alignment Signal (MFAS) of four zeroes. This
allows the receiving system to identify which frame is which, and associate a traffic
timeslot with its correct signaling bits. Frame 0 also carries a remote alarm indicator to
Figure 4.9 ESF Framing
signify loss of multiframe alignment. Timeslot 16 of frame 1 then carries A, B, C, and D
bits for timeslots 1 and 17, timeslot 16 of frame 2 carries A, B, C, and D bits for timeslots
2 and 18, and so on up to frame 15 of the multiframe, which carries A, B, C, and D bits
for timeslots 15 and 31, after which the sequence repeats.
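The frame-to-timeslot mapping in the multiframe reduces to a one-line rule, sketched here:

```python
# Sketch of the timeslot-16 CAS multiframe mapping described above:
# frame n (1-15) carries the ABCD bits for traffic timeslots n and n + 16,
# while frame 0 carries the multiframe alignment signal.

def cas_timeslots(frame):
    """Traffic timeslots signaled in timeslot 16 of the given frame."""
    if frame == 0:
        return None     # frame 0: multiframe alignment signal, no ABCD bits
    return (frame, frame + 16)

for f in (0, 1, 2, 15):
    print(f, cas_timeslots(f))
```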
A common problem arises when the A, B, C, and D bits in timeslot 16 associated with any
of the traffic channels 1 to 15 are set to 0000. This produces a false multiframe
alignment signal, causing the whole signaling mechanism to fail. This situation
is most common in a configuration where the PBX system is connected to a multiplexer
network that provides an idle pattern of 0000 to the PBX for channels that are not routed.
In fact, ITU-T G.704 recommends against the use of 0000 for any signaling purposes for
timeslots 1 to 15. It also recommends that if B, C, and D are not used, then they should
be set to B=1, C=0 and D=1.
4.4.4 Common Channel Signaling (CCS) on E1
Common Channel Signaling utilizes timeslot 16, as does CAS. However, rather than
defining specific bits to carry signaling information for each of the traffic timeslots,
signaling information is sent as High Level Data Link Control (HDLC)-based data
messages. See section 5.2.2 for more details and examples.
4.4.5 E1 Alarms
Recommendation G.732 describes various fault conditions and the subsequent actions
that should be taken. It includes conditions such as power supply failure and codec
failure, although this booklet describes only problems associated with the link itself.
Frame Level Alarms - (in reference to Figure 4.6) In the event of one of the following
problems occurring on the received signal (Rx), bit 3 in timeslot 0 of odd numbered
frames should be set to "1" on the transmit (Tx) signal of that PBX system.
Loss of incoming signal
Loss of frame alignment
Excessive error ratio (worse than 1 x 10^-3)
Figure 4.10 DS-1 and E1 Alarm Comparison
Loss of frame alignment will occur for a number of reasons, including cabling or
equipment faults. In the event of a failure in the transmission system between the two
PBXs, the transmission system should automatically apply an Alarm Indication Signal
(AIS), a continuous stream of "1"s, to the line.
Multiframe Alarm - When running Channel Associated Signaling, bit 6 of timeslot 16 in
frame 0 of the multiframe is used to indicate loss of multiframe alignment. If multiframe
alignment is lost on the receive signal, the PBX sets bit 6 in its transmit signal to alert
the far end.
Loss of multiframe alignment is a rare situation, since in a normal failure condition loss
of frame alignment is more likely. A common cause of such a condition is
misconfiguration of one end of the link, as described in Section 4.4.3 above.
4.5 The Need for PBX Synchronization
Similar to many digital systems, digital PBX systems operate internally in a synchronous
manner where data is moved from one place to another using a common clock, or timing
source. When two PBXs are connected together via a digital link, they must be synchro-
nized in order to avoid bit slips and loss of frame synchronization.
Digital PBX interfaces typically have some form of buffering mechanism in order to
overcome problems associated with jitter, wander, and slight timing inaccuracies.
However, in the event that the timing mismatch between two PBXs is too great, the
buffer will fill. Once full, it must be emptied, resulting in loss of data and loss of frame
synchronization. Frame synchronization should be regained rapidly, so long as the
timing mismatch is not too great.
Figure 4.11 Lack of Synchronisation (PBX 1 transmitting at 1.544001 Mbit/s, PBX 2 receiving at 1.543999 Mbit/s)
To understand why synchronization is required, consider the following two examples.
The first example shows two PBX systems connected without synchronization, while the
second shows PBX 2 synchronized to PBX 1.
4.5.1 PBX Systems Without Synchronization
(With reference to Figure 4.11) The two PBXs are connected via a 1.544 Mbps link. This
is only a nominal speed and some deviation is inevitable. It is quite possible that the
transmit clock from PBX 1 is running slightly fast, say 1.544001 Mbps. This represents
an accuracy of about 6 parts in 10 million, which is not uncommon. PBX 2, on the other
hand, is running slightly slow at 1.543999 Mbps, again an accuracy of about 6 parts in
10 million.
Every second, PBX 1 transmits 1,544,001 bits of data onto the trunk, and PBX 2
receives 1,543,999 bits, leaving two bits to be absorbed in a buffer at the input to PBX
2. This will continue with two bits being absorbed in the buffer every second until the
buffer is full, at which point it will be emptied causing loss of frame synchronization. The
effect of this is hard to predict, but will probably cause clicks on any voice call currently
in progress across the trunk, or disconnection.
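The two-bits-per-second arithmetic above can be sketched directly. The buffer depth used below is a hypothetical example value, not a figure from any particular PBX interface:

```python
# Illustration of the buffer-fill arithmetic above. BUFFER_BITS is an
# assumed example depth; real interface buffers vary by equipment.
TX_RATE = 1_544_001   # bits per second transmitted by PBX 1
RX_RATE = 1_543_999   # bits per second consumed by PBX 2
BUFFER_BITS = 256     # hypothetical input buffer depth at PBX 2

surplus = TX_RATE - RX_RATE            # bits absorbed by the buffer per second
seconds_to_overflow = BUFFER_BITS / surplus

print(surplus)               # 2 bits per second
print(seconds_to_overflow)   # 128.0 seconds until the buffer fills and a slip occurs
```

A deeper buffer only postpones the overflow; without synchronization the slip is inevitable.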
4.5.2 PBX Systems With Synchronization
(With reference to Figure 4.12) The system has been rearranged to allow PBX 2 to
synchronize its internal clock to the data arriving at the digital trunk port. PBX 1 still
transmits data a bit fast, at 1.544001 Mbps, but now PBX 2 also receives data at
1.544001 Mbps. The buffer does not fill, and frame synchronization is preserved.
This approach is effective in a point-to-point situation, but synchronization also needs to
be maintained in a full network scenario such as shown in Figure 4.13.
Figure 4.12 PBX Synchronisation (both ends running at 1.544001 Mbit/s)
There are a number of rules to follow with synchronization, one of which is that the whole
network should be synchronized back to the same clock source. Figure 4.13 shows a
small network of five PBX systems with PBX 1 taking a high-accuracy clock from an
ISDN connection to the public network. An ideal clocking arrangement based upon this
network would be for PBX 2 to synchronize its system clock to the data on Link 1 coming
from PBX 1. PBX 3 would then synchronize to the data coming from PBX 2 via Link 2.
PBX 4 would synchronize to PBX 1 via Link 3 and PBX 5 to PBX 4 via Link 6. In this
way, all PBX systems are synchronized together.
However, this scenario alone would not compensate for a failure. If link 6 were to fail,
PBX 5 would lose the clock to which it is synchronized. To overcome this problem, it is
normal to have a clock fallback list in each PBX, and in the event of a problem the PBX
will search for another valid source.
However, one key criterion to follow when creating clock fallback lists is to ensure that the
network cannot get into a clocking loop - for example, where PBX 4 takes its
clocking signal from PBX 3, PBX 3 from PBX 5, and PBX 5 from PBX 4. In this scenario
it is likely that errors would occur, causing loss of synchronization on the links. The error
level could reach the point where synchronization cannot be regained, resulting in a
complete loss of one or more trunks. For this reason, it is very important to follow the
manufacturer's guidelines when configuring network clocking.
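The clocking-loop hazard above lends itself to a simple check. The sketch below is illustrative only (the PBX names and the routine are hypothetical, not a feature of any product): given a map of which PBX each PBX takes its clock from, it walks each chain and reports any loop it finds.

```python
# A sketch of checking a network clocking plan for loops. Each PBX maps to
# the PBX it currently takes its clock from (None = external/primary source).
def find_clock_loop(source_of):
    """Return the members of a clocking loop, or None if the plan is loop-free."""
    for start in source_of:
        seen = []
        node = start
        while node is not None:
            if node in seen:                 # revisited a node: loop found
                return seen[seen.index(node):]
            seen.append(node)
            node = source_of.get(node)       # follow the clock chain upstream
    return None

# The faulty scenario from the text: PBX 4 <- PBX 3 <- PBX 5 <- PBX 4
plan = {"PBX4": "PBX3", "PBX3": "PBX5", "PBX5": "PBX4",
        "PBX1": None, "PBX2": "PBX1"}
print(find_clock_loop(plan))   # ['PBX4', 'PBX3', 'PBX5']
```

A loop-free plan, where every chain eventually terminates at a PBX with an external reference, makes the function return None.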
Figure 4.13 Synchronisation in a Network of PBXs
5 Speech Compression
Speech compression, also referred to as voice compression, describes the process of
digitizing speech to a bit rate of less than 64 Kbps. It is normal, however, to start with PCM
at 64 Kbps and compress it to a lower rate. Ideally, the resulting speech quality will not
be affected; in practice, however, there will be some degradation that may or may not be
apparent to the users. There are many different techniques with different characteristics
used for speech compression, which result in bit rates from a few Kbps up to 40 Kbps.
PCM speech can be compressed because a large portion of the 64 Kbps bit stream is
redundant. Furthermore, it is thought that speech of reasonable quality can be provided
at rates as low as 1 Kbps. This has not yet been achieved, in large part because current
understanding of the way speech works is less than complete. As time goes by, new and
more efficient techniques are being developed to drive the bit rate lower and lower while
maintaining acceptable quality.
This section looks at a number of speech compression techniques in common usage
today, and other new systems poised for entry into the marketplace. The section
concludes with information on speech compression impairments, including the negative
effects that are introduced into voice telephony when speech compression is used.
5.1 Different Coding Types
Speech compression schemes can be classified into one of three categories: Waveform
Coding, Source Coding, and Hybrid Coding.
Waveform Coding - Waveform coders attempt to reconstruct a waveform in a form as
close to the original signal as possible, based on samples of the original waveform. In
theory, this means that waveform coders are signal-independent and should work with non-
voice signals such as modem and fax traffic. Typically waveform coders are relatively
simple to implement and produce acceptable quality speech at rates above 16 Kbps.
Below this, the reconstructed speech quality degrades rapidly.
PCM is an example of a waveform coding technique. If linear quantization is used, then
at least 12 bits per PCM sample are needed to reproduce good quality speech. This
results in a bit rate of 96 Kbps (8000 samples per second x 12). However, the nature of
speech and human hearing does not tend to follow a linear pattern. Much of a speech
signal is at low levels, and human ears are not sensitive to the absolute amplitude of
sounds, but instead to the log of the waveform amplitude. Therefore, in representing
speech digitally, more bits are allocated at the lower levels than at the higher levels. This
process, called companding, results in digitized speech of 64 Kbps. Therefore, even
PCM effectively represents a compressed digital speech form. Companding is
discussed in more detail in Section 4.2.
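The logarithmic allocation of bits described above can be illustrated with the textbook µ-law compression formula. This is a hedged sketch: G.711 actually uses a segmented approximation of this curve, and µ = 255 is the North American convention:

```python
import math

# Textbook (continuous) mu-law companding: quiet signals are expanded,
# loud ones compressed, matching the ear's logarithmic sensitivity.
MU = 255

def mu_law_compress(x):
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

# A 1% amplitude input occupies ~23% of the companded output range,
# so low-level speech gets many more quantization levels than it
# would under linear quantization.
print(round(mu_law_compress(0.01), 3))   # 0.228
print(round(mu_law_compress(0.5), 3))    # 0.876
```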
Other common waveform coding techniques include Adaptive Differential Pulse Code
Modulation (ADPCM), and Continuously Variable Slope Delta (CVSD).
Source Coding - Source coders (also known as Vocoders) are more complex than
waveform coders, but can compress speech to bit rates of 2.4 Kbps and below. To
achieve these compression rates, knowledge of the speech generation process is
required. In principle, the speech signal is analyzed and a model of the source is
generated. A number of parameters are then transmitted to the destination to allow it to
rebuild the model and thus recreate the speech. These parameters include such infor-
mation as whether a sound is voiced (such as vowels) or unvoiced (such as most conso-
nants), amplitude information, and pitch. While source coders provide very low bit rates,
the subjective quality of the regenerated speech can be poor. Often, although the
speech is understood, recognition of who is speaking, referred to as talker recognition,
is very poor. Furthermore, source coders do not carry non-speech signals very well,
such as modem or fax signals.
Because of these factors, source coders are not typically used in commercial applica-
tions. Their main use has been in military applications where natural sounding speech
is not as important as a very low bit rate, which can then be encrypted for security
purposes.
Hybrid Coding - As the name suggests, hybrid coding uses aspects of both waveform
and source coding, bringing together the benefits of high-quality speech from waveform
coders and low bit rates of source coders.
Hybrid coders operate in a similar manner to source coders, in that a model is built on
the parameters of the speech signal. Rather than transmitting these parameters directly to
the destination, however, hybrid coders use them to synthesize a number of new signals. These
new signals are then compared with the original signal to find the best match. The
modeled parameters, along with the excitation signal, which represents how the synthe-
sized signal was produced, are then transmitted to the destination where the speech is
reproduced. Examples of hybrid coders include Code Excited Linear Prediction (CELP)
and its derivatives, some of which are described later in this section.
5.2 Adaptive Differential Pulse Code Modulation (ADPCM)
Figure 5.1 shows DPCM in action, where only the difference between one sample and
the next is transmitted from the source to the destination.
For PCM, the analog speech signal is sampled at regular points, and the absolute level
of the sample is encoded. With DPCM, the difference between one sample and the next
is encoded. By doing this, fewer bits are needed to encode the signal. In practice, speech
samples do not usually change rapidly, but when they do, the changes in signal level are
often greater than can be encoded using DPCM. This results in distortion of the signal,
as shown in Figure 5.1.
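The slope-overload distortion described above can be sketched with a toy DPCM round trip. The clipping range below is an arbitrary illustrative value, not taken from any standard:

```python
# A toy DPCM encoder/decoder showing slope-overload distortion: the
# per-sample difference is clipped to a fixed range before transmission.
MAX_DELTA = 4   # largest difference the fixed step range can represent (example value)

def dpcm_round_trip(samples):
    """Encode differences (clipped to +/-MAX_DELTA) and rebuild the signal."""
    rebuilt, prev = [], 0
    for s in samples:
        delta = max(-MAX_DELTA, min(MAX_DELTA, s - prev))  # clip the difference
        prev += delta                                      # decoder tracks the same value
        rebuilt.append(prev)
    return rebuilt

slow = [1, 2, 3, 4, 5]          # small changes: reproduced exactly
fast = [0, 10, 20, 10, 0]       # rapid changes: distorted by clipping
print(dpcm_round_trip(slow))    # [1, 2, 3, 4, 5]
print(dpcm_round_trip(fast))    # [0, 4, 8, 10, 6] (cannot keep up with the slope)
```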
However, with ADPCM the amplitude range over which a given number of bits are used
to encode a sample varies, or adapts, depending upon the range of amplitudes
occurring at the time. This can be seen in Figure 5.2, which shows the principle of
encoding the difference between the actual signal and a prediction of what it will be,
based on the prediction that the next sample will be the same as the current one.
The general rule that is used by ADPCM is as follows:
"When signals are quantized towards the limit of the current range, the
range used to quantize the next sample is changed."
During the first few samples, the difference between one and the next will be relatively
small. These differences are encoded using the full range of quantization levels that are
available, given the number of bits available for each sample. For example, at 32 Kbps
the difference is encoded in four bits, one for the sign (whether the next sample is more
positive or negative than that predicted) and three for the magnitude of the difference. A
few samples further on, the difference between one and the next is greater, yet can still
be encoded using the full range of quantization levels. This is achieved through the
Figure 5.2 Adaptive Differential Pulse Code Modulation (ADPCM)
process of adaptation, where the amplitude range that can be encoded with four bits
changes depending upon the amplitudes at the time.
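The adaptation rule quoted above can be sketched as follows. The growth and shrink factors here are invented for illustration; real ADPCM coders (e.g. ITU-T G.726) use standardized adaptation tables:

```python
# A simplified sketch of the ADPCM adaptation rule: quantizing near the
# limit of the current range widens the range used for the next sample,
# while small codes let it shrink again. Factors are illustrative only.
def adapt_step(step, code, bits=3, grow=2.0, shrink=0.9):
    top = (1 << bits) - 1            # largest magnitude code, e.g. 7 for 3 bits
    if abs(code) >= top - 1:         # quantized toward the range limit
        return step * grow           # widen the range for the next sample
    return max(1.0, step * shrink)   # otherwise narrow it gradually

step = 8.0
for code in [7, 7, 2, 1]:            # two near-limit codes, then small ones
    step = adapt_step(step, code)
print(step)   # ~25.92: the range grew twice (8 -> 16 -> 32), then shrank
```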
As previously mentioned, ADPCM supports four different
bit rates. The different rates are achieved through the
number of bits used to encode the difference between
one sample and the next, as seen in Figure 5.3.
5.3 Code Excited Linear Prediction (CELP)
At bit rates of around 16 Kbps and below, the quality of waveform coders falls rapidly.
We have also seen that source coders, while operating at very low bit rates, tend to
reduce talker recognition substantially. Therefore, hybrid schemes, especially CELP
coders and their derivatives, tend to be used today for sub-16 Kbps compression. Many
CELP implementations are proprietary to the manufacturer, although we shall also
discuss two standardized versions known as LD-CELP as defined in ITU-T G.728 and
CS-ACELP as defined in ITU-T G.729.
The essence of a CELP encoder is to analyze the incoming speech, and then transmit
a number of parameters to the decoder so that the original speech can be reproduced
as accurately as possible. These parameters include the mathematical model of a filter
which simulates the talker's vocal tract (the vocal characteristics that make a person
sound unique), gain information giving the level of the speech, and a codebook index.
The codebook index is used to point to a sequence of pre-defined speech samples,
known as vectors, which is common to both the transmitter and the receiver. The
number of codebook entries and the number of samples within each entry are dependent
upon the actual CELP implementation.
At the transmitter, groups of PCM speech samples (vectors) from the input speech,
typically up to 20 ms in length, are compared to the vectors stored in the codebook.
This is done by generating a synthetic speech signal for every entry in the codebook
and comparing it to the actual speech input vector. The index for the vector that produces the
best match with the input speech waveform is then transmitted to the receiver. At the
receiver, this waveform is then extracted from the codebook and filtered, using the math-
ematical model of the original talker's vocal tract. This produces highly recognizable,
high-quality speech transmissions.
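At its core, the codebook search described above reduces to a nearest-vector match. The sketch below uses a toy three-entry codebook; real CELP coders search far larger codebooks and compare synthesized (filtered) signals rather than raw vectors:

```python
# A sketch of the CELP codebook search: every codebook vector is compared
# against the input vector, and only the index of the best match (by
# squared error) is transmitted. The tiny codebook is illustrative.
def best_codebook_index(input_vec, codebook):
    def sq_error(cand):
        return sum((a - b) ** 2 for a, b in zip(input_vec, cand))
    return min(range(len(codebook)), key=lambda i: sq_error(codebook[i]))

codebook = [
    [0.0, 0.0, 0.0, 0.0],     # entry 0: silence
    [0.9, -0.9, 0.9, -0.9],   # entry 1: rapidly alternating excitation
    [0.5, 0.6, 0.5, 0.4],     # entry 2: slowly varying excitation
]
frame = [0.4, 0.7, 0.6, 0.3]  # a short input vector
print(best_codebook_index(frame, codebook))   # 2 (only this index is sent)
```

Transmitting a short index instead of the samples themselves is where the large bit-rate reduction comes from.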
Due to the complexities of speech and the wide range of different human voices, the
processing required for CELP is very intensive, typically of the order of 15 million instruc-
tions per second (MIPS) or more for one voice channel. However, CELP is a very
Figure 5.3 ADPCM bits
popular form of speech compression because of its high speech quality and low bit
rates, typically between 4.8 Kbps and 16 Kbps. The practical drawbacks of CELP are
evident in two main areas. First, CELP often produces end-to-end delays in the order of
50 to 100 ms. This is due to a combination of the processing overhead and the number
of speech samples that are buffered for analysis. Such high delays can cause trans-
mission problems (see Section 6 on echo). Second, since CELP is tuned to human
speech, it does not support voice band data well and can cause problems with modems
and fax machines, as well as the transmission of DTMF tones.
5.4 Low Delay-Code Excited Linear Prediction (LD-CELP) ITU-T G.728
LD-CELP is based upon CELP, and provides speech quality similar to that of 32 Kbps ADPCM
at a rate of 16 Kbps. It also incurs smaller levels of delay, typically less than 2 ms, as
compared with normal CELP delay levels of 50 to 100 ms. LD-CELP uses backward
adaptation to produce its filtering characteristics, which means that the filter is produced
from previously reconstructed speech.
At the encoder, A-law or µ-law PCM is first converted to linear PCM. The input signal is
then partitioned into blocks of five consecutive input signal samples. The encoder then
compares each of 1024 codebook vectors with each input block. The 10-bit codebook
index of the best match codebook vector is then transmitted to the decoder.
The decoding operation is also performed on a block-by-block basis. Upon receiving
each 10-bit index, the decoder performs a table look-up to extract the corresponding
code vector from the codebook. The extracted code vector is then filtered to produce the
current decoded signal vector. The five samples of the post filter signal vector are then
converted to five A-law or µ-law PCM output samples.
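As a quick sanity check, the numbers above do account for the 16 Kbps rate: a 10-bit index for every block of five 8 kHz samples yields exactly 16,000 bits per second:

```python
# Arithmetic check of the LD-CELP figures quoted in the text.
SAMPLE_RATE = 8000         # samples per second
BLOCK = 5                  # samples per code vector
INDEX_BITS = 10            # one of 1024 codebook entries

blocks_per_second = SAMPLE_RATE // BLOCK
bit_rate = blocks_per_second * INDEX_BITS
print(blocks_per_second)   # 1600 code vectors per second
print(bit_rate)            # 16000 bits per second, i.e. 16 Kbps
```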
Please note that this is a somewhat simplified description of the operation of LD-CELP.
A more detailed description can be found in G.728 [15].
5.5 Conjugate Structure-Algebraic Code Excited Linear Prediction (CS-ACELP) ITU-T G.729
CS-ACELP is another speech compression technique that is based upon CELP. It was
originally designed for packetized voice support on mobile networks, although a different
scheme known as RPE-LTP has been adopted on the GSM mobile telephone system
(see Section 5.6).
CS-ACELP operates at a rate of only 8 Kbps, yet still provides speech quality similar to
that of ADPCM at 32 Kbps. Furthermore, it has been shown to operate well in tests even
when packets are lost.
The coder operates on speech frames of 10 ms, corresponding to 80 samples at a
sampling rate of 8000 samples per second. For every 10 ms frame, the speech signal
is analyzed to extract the parameters of the CELP model (linear prediction filter coeffi-
cients, adaptive and fixed codebook indices and gains). These parameters are then
transmitted to the destination in a specified frame format. At the decoder, the filter and
gain parameters are used to retrieve the filter information and simulate the filter. The
speech is then reconstructed by taking the codebook entry of 80 samples and passing
it through the reconstructed filter. Speech is then converted to A-law or µ-law PCM and
transmitted to the interface.
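A similar arithmetic check works for CS-ACELP's framing: at 8 Kbps, each 10 ms frame of 80 input samples must be represented by just 80 bits of model parameters:

```python
# Arithmetic check of the CS-ACELP framing quoted in the text.
BIT_RATE = 8000            # bits per second at the G.729 rate
FRAME_MS = 10              # frame length in milliseconds
SAMPLE_RATE = 8000         # samples per second

samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000
bits_per_frame = BIT_RATE * FRAME_MS // 1000
print(samples_per_frame)   # 80 samples in
print(bits_per_frame)      # 80 bits out (a 64 Kbps PCM frame would need 640)
```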
Please note that this is a somewhat simplified description of the operation of CS-ACELP.
A more detailed description can be found in G.729 [16].
5.6 Other Compression Techniques
Continuously Variable Slope Delta (CVSD) - CVSD was one of the earlier speech
compression schemes designed into TDM multiplexers and was very popular in the
early 1980s. It is a waveform coder operating directly on the waveform of the signal.
However, rather than starting with PCM, as is the case with many other compression
schemes, it often relies on analog rather than digital techniques. With CVSD coding, the
sending end compares the analog input voltage to an internal reference voltage. If the
input signal is greater than the reference, a "1" is transmitted and the reference voltage
is increased. If the input signal is less than the reference, a "0" is transmitted and the
reference voltage is decreased.
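The encoding rule just described can be sketched as a simple delta modulator. This sketch omits the slope adaptation that gives CVSD its name, and the step size is an arbitrary example value:

```python
# A minimal delta-modulation sketch of the CVSD rule above: each output bit
# says whether the input is above or below an internal reference, and the
# reference is nudged in that direction. Slope adaptation is omitted.
def cvsd_encode(samples, step=0.1):
    bits, ref = [], 0.0
    for s in samples:
        if s > ref:
            bits.append(1)   # input above reference: send "1", raise reference
            ref += step
        else:
            bits.append(0)   # input below reference: send "0", lower reference
            ref -= step
    return bits

print(cvsd_encode([0.05, 0.2, 0.3, 0.1, -0.1]))   # [1, 1, 1, 0, 0]
```

Because only one bit is sent per sample, the channel rate equals the sampling rate, which is how CVSD achieves compression without ever forming PCM words.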