8/12/2019 Voice Fundamentals Book
Foreword
After more than 100 years of experience in providing global telecommunications
solutions, Nortel (Northern Telecom) has acquired Bay Networks, Inc., adding world-
class, IP-based data communications capabilities that complement and expand Nortel's
acknowledged strengths. This precedent-setting union creates Nortel Networks, a new
company with a widely respected heritage and a unique market position: Unified
Networks.
Unified Networks create greater value for customers worldwide through network
solutions that integrate data networking and telephony. The Unified Networks strategy
extends to solutions, products, and services, delivering new economics in networking by
reducing costs, introducing higher revenue services, and delivering new value derived
through networking.
The emergence of the World Wide Web and increasing market deregulation have created
a strong demand for networks that provide increased profitability and higher service
levels for organizations of all types. Unified Networks from Nortel Networks deliver
cutting-edge solutions to reach these new levels of economics.
To meet increased needs, Nortel Networks is delivering a new class of customer
relationships. Extranets, intranets, Web access, e-mail, call centers, and old-fashioned
personal attention are combined to help customers deal with a wide range of new,
challenging, and potentially confusing issues. Whether a customer is at the enterprise
level, a service provider, or a small business, Nortel Networks delivers Unified
Networks solutions designed to meet their unique business challenges.
Solutions based on Unified Networks strategies can take many forms, involving many
different products and technologies - including those that existed prior to the merger with
Bay Networks and many launched since. Unified Networks solutions are differentiated
only by their size, scope, and ambition.
Each solution is tailored to the unique needs of the customer, and integrates a variety of
products, technologies, and services, some of which are described below.
Accelar brings together switching and routing into a low-cost, very high-performance
package.
CallPilot unifies disparate messaging systems, user interfaces, and presentation
formats, making messaging more intuitive and easier to use.
#1 FRADs - Dataquest 1998
#1 FRAD revenues - Dell'Oro 1H98
#1 Packet Switch - Dataquest 1998
#1 in PADs - Dataquest 1998
Voice Communication

Voice communication has long existed in the world of analog and digital telephone
exchanges. Fixed, dedicated, switched services have provided the user with the ability
to place telephone calls to practically anywhere in the world. The way that voice is
carried and switched within both public and private networks is changing with the
evolution towards the Broadband Integrated Services Digital Network (B-ISDN), based
upon Asynchronous Transfer Mode (ATM) technology. This evolution is accelerating
the shift to enterprise networks capable of handling voice, video, and data
transmissions over a single, integrated infrastructure.
This evolution delivers numerous benefits, including more efficient use of network
bandwidth and the ability to offer many different types of voice service. However, there
are many issues that first need to be understood, and then overcome, before these
benefits can be realized.
This booklet examines these issues, highlighting voice communication techniques in
use today and investigating those that may become commonplace in the future. Of
course, with the limited space available, not every technological aspect can be covered
in depth. This booklet is designed to serve as a useful introduction to the main subject
areas, and additional references have been included for individuals interested in
learning more.

References are given by a number in square brackets and listed in Appendix D, e.g. [3].
Intended Audience

This booklet is intended for a broad range of readers who are involved in the design,
implementation, operation, management, or support of enterprise networks carrying
both voice and data traffic. It has particular relevance to those with a background in data
and/or limited experience in voice technologies, although it will also serve as a useful
reference for anyone involved in voice networks today.
Table of Contents

1 Introduction
  1.1 A Short History of the Telephone
2 The Telephone
  2.1 Key Voice Fundamentals
    2.1.1 Frequency
    2.1.2 Levels
  2.2 Basic Operation of the Telephone
    2.2.1 Basic Telephony - Signaling
    2.2.2 Basic Telephony - The Speech Path
  2.3 Other Types of Telephones
    2.3.1 "Two-wire" vs. "Four-wire" Telephony
3 PBX Phone Systems
  3.1 Introduction to the PBX
  3.2 Call Routing in a PBX
  3.3 Voice Interfaces on a PBX
4 Introduction to Digital Voice
  4.1 The Channel Bank
  4.2 Digital Voice - Pulse Code Modulation (PCM) G.711
    4.2.1 A-law and µ-law PCM
    4.2.2 Power of a Digital Signal
    4.2.3 Distortion Resulting from the Digitization Process
  4.3 The Digital 1.544 Mbps PBX Interface (DS-1)
    4.3.1 Physical Interface
    4.3.2 Framing - D4
    4.3.3 Framing - Extended Superframe (ESF)
    4.3.4 Channel Associated Signaling (CAS) on DS-1
    4.3.5 Common Channel Signaling on DS-1
    4.3.6 DS-1 Alarms
  4.4 The Digital 2.048 Mbps PBX Interface (E1)
    4.4.1 Physical Interface - G.703
    4.4.2 Framing Structure - G.704
    4.4.3 Channel Associated Signaling (CAS) on E1
    4.4.4 Common Channel Signaling (CCS) on E1
    4.4.5 E1 Alarms
  4.5 The Need for PBX Synchronization
    4.5.1 PBX Systems Without Synchronization
    4.5.2 PBX Systems With Synchronization
5 Speech Compression
  5.1 Different Coding Types
  5.2 Adaptive Differential Pulse Code Modulation (ADPCM)
  5.3 Code Excited Linear Prediction (CELP)
  5.4 Low Delay-CELP (LD-CELP) ITU-T G.728
  5.5 Conjugate Structure-Algebraic CELP (CS-ACELP) ITU-T G.729
  5.6 Other Compression Techniques
  5.7 Speech Compression Impairments
    5.7.1 Mean Opinion Score (MOS)
    5.7.2 Quantization Distortion Units (QDUs)
    5.7.3 Speech Compression and Voice-Band Data
  5.8 Fax Relay
6 Echo and Echo Control
  6.1 What is Echo?
    6.1.1 Causes of Echo
  6.2 Echo Control Devices
    6.2.1 When Is Echo Control Required?
    6.2.2 Echo Control Devices
  6.3 Echo Suppressors
  6.4 Echo Cancellers
    6.4.1 Nonlinear Processor
    6.4.2 Tail Circuit Considerations
    6.4.3 Types of Echo Cancellers
    6.4.4 Tone Disabling of Echo Cancellers and Echo Suppressors
    6.4.5 G.168 (Improved Echo Canceller)
7 Introduction to Signaling Systems
  7.1 Analog Signaling Systems
    7.1.1 Ground Start: 2-Way PBX to Public Exchange Trunk Circuit
    7.1.2 E&M Trunk
    7.1.3 AC Signaling Schemes
    7.1.4 Manual Signaling
  7.2 Digital Signaling Systems
    7.2.1 Channel Associated Signaling (CAS)
    7.2.2 Common Channel Signaling (CCS)
      7.2.2.1 Private-to-Public Networking Protocols
      7.2.2.2 Public-to-Public Networking Protocol
      7.2.2.3 Private Networking Protocols
      7.2.2.4 How Does CCS Work?
8 Voice Within the Enterprise Network
  8.1 What is an Enterprise Network?
    8.1.1 Different Types of Enterprise Networks
  8.2 Time Division Multiplexers (TDM)
    8.2.1 TDM Synchronization
9 Voice Over Asynchronous Transfer Mode (ATM)
  9.1 Introduction to ATM
    9.1.1 The ATM Cell
    9.1.2 The ATM Adaptation Layers
    9.1.3 ATM Service Categories
    9.1.4 Statistical Multiplexing with ATM
  9.2 Voice and Telephony Over ATM (VTOA)
    9.2.1 Voice Sample Cellification
    9.2.2 Speech Activity Detection
  9.3 PBX Synchronization Across an ATM Network
    9.3.1 Synchronous Residual Time Stamp (SRTS)
    9.3.2 Adaptive Clock Recovery (ACR)
    9.3.3 Independent Timing
10 Voice Over Frame Relay
  10.1 Introduction to Frame Relay
  10.2 Voice Over Frame Relay (VoFR)
    10.2.1 Delay on Frame Relay Networks
    10.2.2 VoFR Standards
  10.3 Benefits and Issues of VoFR
    10.3.1 Benefits
    10.3.2 Issues
11 Voice Over IP
  11.1 What is IP?
    11.1.1 How Voice Over IP Works
    11.1.2 Benefits and Issues of Voice Over IP
    11.1.3 Standards
12 Voice Switching in the Enterprise Network
  12.1 The Evolution of Voice Networks
  12.2 What is Voice Switching?
  12.3 Why Perform Voice Switching?
APPENDIX A - Company and Product Overview
APPENDIX B - Introduction to Decibels
APPENDIX C - Glossary of Terms
APPENDIX D - References
1 Introduction
This booklet provides an introduction to many of the fundamentals of voice
communication, beginning with analog techniques and concluding with voice over ATM.
It discusses some of the ways that enterprise networks can be created, and how to
make the most efficient use of available network bandwidth through techniques such as
speech compression and speech activity detection (also known as silence suppression).
Many other issues are covered, including how and why echo occurs and the techniques
that can be used to overcome it, allowing network managers to migrate their voice
networks onto ATM.
1.1 A Short History of the Telephone

In the mid-1870s, while trying to understand sound and sound communications,
Scottish-born inventor Alexander Graham Bell had an idea for a device that would
transmit sound over long distances by converting the sound to an electrical signal. This
device was later called the telephone, derived from the Greek words meaning 'far'
(tele) and 'sound' (phone). Bell was not the only person of the time developing a
telephone device; however, he was the first to patent the device, in 1876.
Further developments were made to the telephone during the late 1870s. Bell created
the induction-based earpiece, and Thomas Edison was responsible for the design of the
carbon microphone. The incorporation of these enhancements produced a truly practical
instrument.
Initially, the telephone had no mechanism for dialing another number. To make a call,
the user turned a handle, which generated an electric current. This current signaled
the operator at the local exchange. To connect a caller to the called party, the operator
would manually insert a jack plug into the corresponding jack socket.
It wasn't until 1889 that Almon B. Strowger developed the automatic telephone
exchange. One of the most unlikely people to be involved in telephony, Strowger
developed the exchange as a way of beating his business rival in Kansas City, USA.
The wife of Strowger's main competitor was the operator of the local exchange, and
whenever a call came in asking for an undertaker, she naturally passed it on to her
husband. To overcome this problem, Strowger developed the first automatic telephone
exchange and the dial telephone, eliminating the need for an operator.
Telephone networks have undergone many changes since those early days. However,
many of the underlying principles remain the same. The basic "two-wire" telephone used
in most domestic homes today still operates in essentially the same way as the
telephones of over 100 years ago.
2 The Telephone
In this section, the basic operation of the telephone is examined with a look at the two
basic functions that it offers: signaling and speech transmission. To better understand
this critical piece of equipment, it is important to appreciate how human voice and
hearing function. To complete this section, other types of telephones will be examined,
including proprietary designs and digital telephone sets.
2.1 Key Voice Fundamentals
2.1.1 Frequency
Human speech occurs as a result of air being forced from the lungs, through the vocal
cords and along the vocal tract, which extends from an opening in the vocal cords to
the mouth and nose. Speech consists of a number of different types of sounds, including
voiced, unvoiced, and plosive sounds. Voiced sounds result from the vocal cords
vibrating, interrupting the flow of air from the lungs and producing sounds in the
frequency range of approximately 50 to 500 Hertz (Hz). Unvoiced sounds result when
the air passes some obstacle in the mouth or a constriction in the vocal tract. Finally,
plosive sounds result from air being let out with a sudden burst, for example when the
vocal tract is closed and then suddenly released, or when the mouth is closed and
suddenly opened. A person's nasal cavities and sinuses also modify all of these sounds,
and all contribute to what we know as normal human speech.
The range of frequencies that results from these sound sources, combined with the
structure of the vocal tract, nasal cavities, and sinuses, varies depending upon who is
actually speaking. The resulting mix of frequencies determines the unique sound of a
person's voice.
The range of frequencies produced by speech varies significantly from one person to
another as explained above. Normally, frequencies in the range of about 50 Hz upward
are generated, with the majority of the energy concentrated between about 300 Hz and
3 kilohertz (kHz). The human ear, on the other hand, can detect sounds over a range of
frequencies from around 20 Hz to 20 kHz, with maximum sensitivity in the region
between 300 Hz and 10 kHz.
Taking these two factors into account, as well as the results of practical testing, the
frequency band of 300 Hz to 3.4 kHz has been found to be the key sonic range for
speech intelligibility and voice recognition. Reducing this bandwidth quickly reduces
intelligibility, while increasing it adds quality yet does not significantly improve
intelligibility or voice recognition.
As a result, the frequency band used in telephone systems is limited to between 300 Hz
and 3.4 kHz, delivering a system that provides speech transmission that is quickly
recognized and easily understood.
2.1.2 Levels

It is important to ensure that voice signals are transmitted at the correct level across a
network, so that end-to-end performance is maintained. Too low a level can result in
speech merging into background noise, creating an environment where the listener finds
it hard to hear the talker and is encouraged to talk loudly. On the other hand, too high a
level will encourage the listener to talk too quietly.
Today, international voice communication is part of everyday life. People need to be able
to communicate with others anywhere in the world as effectively as if they were in their
own country, or even in their own office. This goal is complicated by the way telephone
systems have evolved differently in different countries. For example, an analog
telephone (the term analog is described in Section 4.2) from North America transmits a
lower-level electrical signal for a given acoustic volume than a telephone in the UK.
Signal levels will be discussed in this booklet in terms of decibels (dB), and related
terms such as dBm, dBm0, and dBr. Readers unfamiliar with these terms, or who simply
need a refresher, should refer to Appendix B.
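As a small taste of the decibel arithmetic covered in Appendix B, the sketch below (an illustration added here, not material from the booklet itself) converts an absolute power to dBm and a power ratio to dB:

```python
import math

def power_to_dbm(power_mw):
    """Absolute power level in dBm: 10 * log10(P / 1 mW)."""
    return 10 * math.log10(power_mw / 1.0)

def gain_db(p_out_mw, p_in_mw):
    """Gain (positive) or loss (negative) of a circuit in dB."""
    return 10 * math.log10(p_out_mw / p_in_mw)

print(power_to_dbm(1.0))            # 0.0 -> 1 mW is the 0 dBm reference
print(round(power_to_dbm(2.0), 1))  # 3.0 -> doubling power adds about 3 dB
print(round(gain_db(0.5, 1.0), 1))  # -3.0 -> halving power loses about 3 dB
```

The logarithmic scale is what makes end-to-end level planning tractable: gains and losses along a connection simply add in dB.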
2.2 Basic Operation of the Telephone
Telephones come in many varieties, yet they fall into two main categories: analog and
digital. The original sets designed by A. G. Bell were analog. In fact, most telephones
used in domestic environments are still analog.
The simplest form of telephone today is the two-wire "loop-disconnect" telephone. It is
also known by various other names, including "loop-start" and "POTS" (Plain Old
Telephone Service) telephone. It connects to the telephone exchange via two wires that
carry the voice signals in both directions, hence the term two-wire telephone. The wires
also carry the dialed digits to the exchange and the incoming ringing voltage to the
phone. The exchange places a voltage of about 48 volts across the pair of wires to
power the telephone and monitor the on-hook, off-hook, and pulse dialing activity.
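The battery feed described above can be caricatured as a simple series circuit; the sketch below is an illustration only, and the resistance values are assumptions chosen for the example, not figures from this booklet:

```python
def loop_current_ma(line_ohms, phone_ohms=400, feed_ohms=400, volts=48):
    """DC loop current (mA) for a series model of the exchange battery,
    its feed resistance, the line resistance, and an off-hook telephone.
    All resistance values here are illustrative assumptions."""
    return 1000.0 * volts / (feed_ohms + line_ohms + phone_ohms)

# On-hook the loop is open and no current flows; off-hook the exchange
# sees tens of milliamps and interprets that as a request for service.
print(round(loop_current_ma(200), 1))   # 48.0 mA on a short line
print(round(loop_current_ma(1200), 1))  # 24.0 mA on a longer line
```

The same current that powers the set is what the exchange monitors for on-hook, off-hook, and pulse dialing activity.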
2.2.1 Basic Telephony - Signaling
To initiate a call, the user lifts the handset. This action closes a switch in the telephone
and causes current to flow in a loop, hence the term "loop-start." The exchange detects
this current flow as a request for service and provides a dial tone to the line. The dial tone signals
to the user that they may now start to dial. Dialing before hearing the dial tone may
result in digits being missed by the exchange; however, modern exchanges will usually
return dial tone immediately after detecting current flow. Upon hearing the dial tone, the
user begins to dial the called number. If the telephone is set to pulse dial, it rapidly
opens and closes the loop at a rate of approximately 10 or 20 PPS (Pulses Per
Second). This is also referred to as loop-disconnect dialing. Figure 2.1 shows the
progress of a call from the handset being lifted and dial tone being returned, to the first
digit being dialed (a 3 in this case).
The dial speed and the make/break ratio are standards that were set in the past. They
reflect the characteristics of switching equipment and direct control switches. The
make/break ratio varies according to the different dial pulse receivers used in different
countries (e.g. North America: 61/39, UK: 67/33, Germany: 60/40). A 50/50 ratio was
not chosen because it did not match the characteristics of the mechanical relays and
switches in the switching systems.
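The pulse timing above can be sketched as follows. This is a simplified model for illustration only: real dials also insert an inter-digit pause, which is omitted here.

```python
def pulse_train(digit, pps=10, break_pct=61):
    """Model loop-disconnect dialing of one digit as (state, ms) events.

    digit 1-9 sends that many loop breaks; 0 sends ten. break_pct is the
    percentage of each pulse period the loop is held open (61/39
    break/make is the North American figure quoted above)."""
    pulses = 10 if digit == 0 else digit
    period_ms = 1000.0 / pps            # 100 ms per pulse at 10 PPS
    events = []
    for _ in range(pulses):
        events.append(("break", period_ms * break_pct / 100))
        events.append(("make", period_ms * (100 - break_pct) / 100))
    return events

train = pulse_train(3)
print(len(train) // 2)   # 3 -> three break/make pulses for the digit 3
print(train[0])          # ('break', 61.0) -> loop held open for 61 ms
```

Changing break_pct to 67 models the UK receiver timing from the same list.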
An alternative way of sending dialing information, called Dual Tone Multi-Frequency
(DTMF), is much more common today. In this form of signaling, each number is
represented by two tones that are transmitted simultaneously on the voice path for a
short period of time.
Figure 2.1 Operation of Loop Disconnect Dialling
Figure 2.2 DTMF Frequencies
The frequencies used are shown in Figure 2.2 and defined in ITU-T Recommendation
Q.23 [1].
DTMF transmits digits much faster than pulse dialing, and the time taken to send each
digit is independent of the digit being sent. An additional benefit of DTMF is that once
the call is established, pressing a key on the phone will transmit the tones over the voice
path, enabling DTMF to be used to access voice mail, home banking systems, and other
tone-based systems.
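Since Figure 2.2 is not reproduced here, the Q.23 keypad grid can be sketched in code: each key sends the sum of one low-group (row) tone and one high-group (column) tone. The sample generator below is a simple illustration, not production telephony code.

```python
import math

# ITU-T Q.23 DTMF grid: one low-group (row) and one high-group (column)
# frequency per key, transmitted simultaneously.
ROW_HZ = [697, 770, 852, 941]
COL_HZ = [1209, 1336, 1477, 1633]
KEYS = ["123A", "456B", "789C", "*0#D"]

def dtmf_pair(key):
    """Return the (low, high) frequency pair in Hz for a keypad key."""
    for r, row in enumerate(KEYS):
        if key in row:
            return ROW_HZ[r], COL_HZ[row.index(key)]
    raise ValueError("unknown DTMF key: %r" % key)

def dtmf_samples(key, ms=40, rate=8000):
    """Generate `ms` milliseconds of the two summed sinusoids at `rate` Hz."""
    f1, f2 = dtmf_pair(key)
    n = int(rate * ms / 1000)
    return [0.5 * math.sin(2 * math.pi * f1 * i / rate) +
            0.5 * math.sin(2 * math.pi * f2 * i / rate) for i in range(n)]

print(dtmf_pair("5"))          # (770, 1336)
print(len(dtmf_samples("5")))  # 320 samples: 40 ms at 8000 samples/s
```

Because both tones sit inside the 300 Hz to 3.4 kHz speech band, they pass over any established voice path, which is why in-call DTMF works.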
When an incoming call arrives at the telephone set, the exchange applies an AC ringing
voltage to the pair of wires. To answer the incoming call, the user picks up the handset.
This action applies a loop to the line that is detected by the exchange, which then
removes the ringing and connects the voice path through.
Recall - Recall is a function available on most simple two-wire analog telephones
(except for some older models). It is often accessed with a button marked "R", and can
be used for a number of functions, such as accessing additional features from a
telephone exchange or swapping between calls on the same line. There are two types
of Recall, namely Timed Break Recall (TBR) and Earth Recall (ER). With TBR, pressing
the Recall button while the handset is off-hook causes the phone to put a timed break
on the line (similar to dialing a "1"). With the phone set to Earth Recall, the phone
momentarily applies a ground (earth) to one of its leads, known as the B lead.
Figure 2.3 Two-wire Telephone Set Interface
2.2.2 Basic Telephony - The Speech Path
Apart from transmitting dialing information, the main function of the telephone is to
provide voice communications. As already mentioned, the simple telephone has to
provide simultaneous voice paths in both directions even though there are only two
wires. It achieves this through the use of a hybrid circuit, the purpose of which is to take
four-wire speech (i.e. separate paths for transmit and receive) and to combine the two
onto a single two-wire path. Figure 2.3 shows a simplified view of the interface between
a two-wire telephone and the exchange. Speech from the mouthpiece will pass through
the hybrid and onto the telephone line. However, for reasons described below, a certain
amount of the speech signal will be reflected back to the earpiece, which is referred to
as sidetone.
The telephone, the telephone line, and the exchange interface each present an
impedance that determines the relationship between the voltage and current on the
line. To enable the maximum amount of power to be transferred from the telephone to
the line and into the exchange interface (and vice versa), the impedances should be as
close as possible to one another. This is achieved through the use of a balance
impedance.
As long as the balance impedance closely matches the impedance presented by the line
and the exchange interface, minimal amounts of signal will be reflected back to the
telephone earpiece. If there is significant impedance "mismatch", then sidetone
increases as more signal is reflected back to the earpiece. At the exchange interface,
the speech signal originating from the telephone mouthpiece is directed towards caller
"Y" by the hybrid.
In the opposite direction, a speech signal from caller "Y" at the exchange is transmitted
to the two-wire line, and as long as the exchange balance impedance matches the line
and telephone set, then little of this signal is reflected back to caller "Y".
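The effect of an impedance mismatch can be illustrated numerically. The sketch below treats the impedances as purely resistive for simplicity (real line impedances are complex, and the 600 and 900 ohm figures are common textbook values, not taken from this booklet):

```python
import math

def return_loss_db(z_actual, z_balance):
    """Return loss (dB) when a hybrid's balance impedance z_balance meets
    an actual terminating impedance z_actual (both resistive, in ohms).
    Higher return loss means less signal reflected back as sidetone or echo."""
    reflection = abs(z_actual - z_balance) / (z_actual + z_balance)
    if reflection == 0:
        return float("inf")            # perfect balance: nothing reflected
    return -20.0 * math.log10(reflection)

print(return_loss_db(600, 600))            # inf -> perfect match
print(round(return_loss_db(900, 600), 1))  # 14.0 dB -> modest mismatch
```

The same mismatch mechanism at a distant hybrid is what produces echo, the subject of Section 6.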
2.3 Other Types of Telephones

While the simple two-wire telephone set is used extensively in the domestic
environment, it is less common in private networks based on Private Branch Exchange
(PBX) systems. In this environment, it is common to find feature telephone sets that
offer a wider range of facilities than the basic telephone. These feature sets vary from
manufacturer to manufacturer, each usually being of a proprietary design. Some are
analog, while others are digital. Most PBX systems today are of the digital variety,
although this does not necessarily imply the use of a digital telephone set, and most
allow both analog and digital phones to be used. With an analog phone, the interface in the
PBX converts the analog signals to digital PCM (see Section 4.2). A digital telephone
performs the analog/digital and digital/analog conversions within the set.
Some digital telephones are of a proprietary nature, where the format of the digital data
is manufacturer-specific. Many digital sets, however, conform to the Basic Rate ISDN
standards as outlined in ITU-T Recommendation I.420 [2]. This defines a digital
interface that carries two channels operating at 64 kilobits per second (Kbps), known
as the B or Bearer channels, and a 16 Kbps signaling channel, known as the D or
Delta channel.
The B channels can be used to carry data or digitized voice in a similar manner to the
Primary Rate interfaces described in Sections 4.3 and 4.4. The main difference is that
on a Basic Rate interface only two B channels are available. The D channel is normally
used to carry Common Channel Signaling (CCS) information for call control, in a similar
manner to CCS on Primary Rate interfaces (see Section 7.2). However, the
specifications also allow the D channel to carry user data, including packet-switched
and frame-based data.
2.3.1 "Two-wire" vs. "Four-wire" Telephony

References are frequently made to two-wire or four-wire telephone sets. Care should
be taken when interpreting these terms, since they can be confusing. A two-wire
telephone means that speech is carried in both directions on the same pair of wires,
and requires hybrid circuits to split the two paths into separate transmit and receive
functions at the telephone set. It is also possible, particularly in some proprietary
telephone sets, to have additional wires for signaling purposes. In this case, a two-wire
telephone may actually be connected to the exchange by more than two wires.
A four-wire telephone strictly means that speech is carried in each direction on separate
pairs of wires and no hybrid circuit is necessary. However, a better definition of four-wire
telephony is that the speech is carried on separate paths, which may be pairs of wires,
or might even be separate channels in a digital system. For example, once inside a
digital telephone exchange, speech is carried "four-wire" even though it is carried in
timeslots in high-speed digital signals rather than actually on four wires.
Voice Fundamentals
Section 2 - The Telephone
3 PBX Phone Systems
This section is not intended to be a detailed description of the operation of PBX phone
systems. However, some of their key aspects will be covered in order to provide useful
background for other sections of this booklet.
The PBX system will be examined followed by a quick look at how telephone calls are
routed within a PBX network. Finally, the different types of voice interfaces normally
available on a PBX will be introduced.
3.1 Introduction to the PBX
Simply put, a PBX is a telephone exchange privately owned by an organization.
Its objective is to provide voice (and often data) communications to its users. In addition
to offering simple call setup facilities, it also offers many other features and facilities to
make life easier for its users. Users can place calls to others in their organization by
simply dialing their extension number. To place a call to a person not connected to the
same PBX network requires the call to be routed via the Public Switched Telephone
Network (PSTN). This usually involves dialing an access code, such as a "9" or "0",
followed by the complete destination number including country code and area code
if appropriate.
Figure 3.1 Typical Components of a Digital PBX
Most larger PBX systems today are digital. This means that they route connections in a
digital form, with speech first being converted from analog into PCM (see Section 4.2).
Figure 3.1 shows the typical components used within a digital PBX.
The "core" of the PBX is the common control and the switching matrix. The common
control acts as the "brains", and controls the overall operation of the PBX. It performs
functions such as recognizing that a phone has been taken off hook and connecting a
dial-tone generator to the phone, interpreting the dialed digits and routing the call to a
particular trunk or line interface, and so on. The switching matrix takes in 1.544 Mbps bit
streams comprising multiple 64 Kbps channels and allows them to be connected, or
switched, to any other 64 Kbps channel on any other interface.
Interfaces to the PBX come in two main types: lines and trunks. Lines connect user
devices such as analog or digital telephone sets, or other devices such as data
terminals. Trunks are shared links and can carry connections originating from line inter-
faces on the same PBX or from other trunks also connected to the PBX. Analog trunks
can only support one connection at a time, while digital trunks can support many
connections simultaneously (see Sections 4.3 and 4.4). Trunks can be further divided
into two types: PSTN trunks (also called Central Office trunks) and
private trunks. PSTN trunks connect the PBX to the public telephone network, and
private trunks (also called "Tie lines" or "Tie trunks") connect the PBX to other PBXs as
part of an overall private network.
3.2 Call Routing in a PBX
When a user dials a destination number, the PBX needs to determine how to route the
connection in the most efficient manner. The PBX needs to consider many factors,
including: Is the number valid? Is this user allowed to connect to the specified destina-
tion? Which is the cheapest trunk to use? Is there a trunk free?
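The routing checks above can be sketched in a few lines. This is a hypothetical illustration only: the route table, trunk names, and barred-prefix rule are invented for the example, and real PBX routing tables are far richer (time-of-day rules, digit manipulation, alternate routing, and so on).

```python
# Hypothetical sketch of the route-selection checks a PBX might apply.
# All table entries and names below are invented for illustration.

ROUTE_TABLE = {
    # leading digits -> trunks ordered from cheapest to most expensive
    "3": ["tie_trunk_1", "tie_trunk_2"],    # extensions reached via tie trunks
    "9": ["pstn_trunk_1", "pstn_trunk_2"],  # external calls via the PSTN
}

BUSY_TRUNKS = {"tie_trunk_1"}     # trunks currently carrying calls
BARRED_PREFIXES = {"900"}         # destinations this user may not dial

def route_call(dialed: str):
    """Return the first free trunk for the dialed digits, or None."""
    if any(dialed.startswith(p) for p in BARRED_PREFIXES):
        return None               # user not allowed to reach this destination
    for prefix, trunks in ROUTE_TABLE.items():
        if dialed.startswith(prefix):
            for trunk in trunks:  # try the cheapest trunk first
                if trunk not in BUSY_TRUNKS:
                    return trunk
    return None                   # number not valid, or no trunk free

print(route_call("3415"))    # -> tie_trunk_2 (tie_trunk_1 is busy)
print(route_call("900123"))  # -> None (barred destination)
```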
Figure 3.2 shows a network of PBX systems connected together with inter-PBX trunks,
which could be analog, digital, or both. To place a call the user takes Telephone A "off-
hook" and dials the number for Telephone B. PBX 1 inspects the dialed digits and makes
a decision as to which trunk to route the call on. In this case it chooses Trunk 1. PBX 1
seizes Trunk 1 (or a single timeslot of a digital trunk) and passes dialing information
across it. The method used for dialing will depend upon what type of trunk it is (see
Section 7 for more details on signaling). PBX 2 receives the call, inspects the dialed
number, and routes the call accordingly onto PBX 3. PBX 3 inspects the digits, identifies
that the destination is local to itself, and alerts the user to the incoming call by
making the phone ring.
3.3 Voice Interfaces on a PBX
The types of voice interface available on PBX systems are many and varied. They
typically fall into three categories:
Line Interfaces - These are the interfaces on the PBX that connect to desktop tele-
phones. Line interfaces can include any of the types discussed in Sections 2.2 and 2.3,
including:
2-wire analog loop disconnect/loop start
2-or 4-wire analog proprietary feature set
4-wire digital set, either proprietary or conforming to Basic Rate ISDN
standards
Private Trunk Interfaces - These interfaces provide the links between PBXs within a
multi-PBX private network. They allow calls to be routed from one PBX to another
without the need to involve the public network, avoiding extra call cost. Private trunk
interfaces typically include:
2- or 4-wire analog with Ear and Mouth (E&M) signaling
4-wire analog with AC15 signaling
Digital trunk supporting CAS or CCS signaling
Public Trunk Interfaces - These provide access from the PBX to the PSTN (Public
Switched Telephone Network) for outgoing and/or incoming calls. Public trunk interfaces
typically include:
Ground Start analog trunk: 2-wire, both-way calling
Analog Direct Dial In (DDI): 2-wire, typically incoming calls only
Digital trunk supporting CAS or CCS signaling
Figure 3.2 Call Routing in a PBX Private Network
4 Introduction to Digital Voice
As previously discussed, many of today's voice interfaces rely on analog technology.
However, once received into a PBX or a wide area network, analog voice is converted
to a digital format in order to derive all the available benefits. This section looks at the
background to digital voice, followed by a more detailed look at how analog speech is
actually converted to a digital format. The integration of digital voice channels into
primary rate digital interfaces is then presented, and finally the need for synchronization
in a PBX network is examined.
4.1 The Channel Bank
The channel bank was one of the first devices to make use of digital voice in a practical
environment. It is a device that takes multiple analog voice channels, digitizes them, and
then multiplexes them onto a high-speed digital link. Two main types of channel bank
exist today: one multiplexing up to 24 voice channels onto a 1.544 Mbps link, and
another multiplexing 30/31 channels onto a 2.048 Mbps link.
The first channel bank was developed in North America in 1962, and was known as the
D1 channel bank. It provided 24 analog inputs, each of which was converted to 8-bit
PCM, although the least significant bit was then ignored: seven bits carried each voice
sample and one bit carried signaling, giving a combined data rate of 1.544 Mbps. It was
later found that this seven-bit PCM gave unsatisfactory
voice quality. Subsequent generations have used eight-bit PCM with "robbed bit"
signaling (see Section 4.4.4).
Channel banks have evolved beyond simply supporting analog voice circuits. Today, it
is common for a channel bank to support multiple interface types other than just the
analog voice interface. For example:
2-wire speech with loop disconnect signaling (incoming and outgoing)
2-/4-wire speech with E&M signaling
Data 0 - 64 Kbps
Data n x 64 Kbps
4.2 Digital Voice - Pulse Code Modulation (PCM) G.711
Digital voice is a representation of analog voice signals that uses binary "1"s and "0"s,
also known as bits.
Figure 4.1 shows the progress of a speech signal entering the telephone, being
converted into an analog electrical signal, and then being converted into a digital form.
When the talker speaks, they create variations of pressure in the air. The telephone
picks up these pressure changes and turns them into an electrical signal that is
analogous to the acoustic signal from the talker (hence the term analog). This analog
signal is then converted into a digital stream of data bits which represents the digital
voice signal.
But why transport voice in a digital format? There are a number of reasons, including:
Digital transmission is independent of distance - When an analog
signal is transmitted over a transmission channel, it is attenuated by
signal losses in the cable. In addition, noise is picked up
that will affect the quality of the voice transmission. The signal that arrives
at the destination is made up of a combination of the original signal and
line noise. Amplifiers can be used to boost the signal back to the original
level, but there is no easy way for the amplifier to distinguish between the
original signal and line noise, so the noise is also amplified.
Digital signals take on one of two levels, represented by binary "0"s and
binary "1"s. When noise is introduced into the digital signal, it can be
easily removed by regenerating equipment. Thus, the signal that arrives
at the destination is an identical replica of the signal transmitted from the
source.
Multiplexing of voice and data - Since the digitized PCM voice is essen-
tially a data stream running at a bit rate of 64 Kbps, it can be readily inte-
grated with other PCM channels to make up an aggregate connection
combining many voice channels in one physical connection. Since it is
essentially data, it can be combined with other 'real' data and transmitted
over a common transmission medium. This approach is used for the
digital representation of analog voice signals in telephone systems and is
defined in ITU-T Recommendation G.711 [3]. A simplified view of the
process can be seen in Figure 4.2.
Figure 4.1 Progress of speech signal - Analogue to Digital
The analog signal is sampled at a rate of 8000 times per second. This rate follows from
the Nyquist theorem, which states that the sampling rate must be at least twice the
maximum frequency of the signal being sampled (i.e. at least 2 x 3.4 kHz = 6.8 kHz for
telephone speech).
This results in Pulse Amplitude Modulation (PAM), which is simply a series of pulses that
represent the amplitude of the analog signal at each sample time. Each PAM sample is
compared to a range of fixed quantization levels, each of which is represented by a fixed
binary pattern. The binary pattern of the closest quantization level is then used to
represent the PAM sample.
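As a rough illustration of the sampling and quantization steps just described, the sketch below quantizes one cycle of a 1 kHz test tone using linear (uniform) quantization. The 4-bit sample size is chosen only to keep the numbers readable; real G.711 PCM uses 8-bit companded samples.

```python
import math

# Illustrative sketch of sampling and linear quantization.
# 4-bit samples are an assumption for readability, not the G.711 format.

SAMPLE_RATE = 8000          # samples per second (> 2 x 3.4 kHz)
BITS = 4                    # bits per sample -> 16 quantization levels
LEVELS = 2 ** BITS

def quantize(sample: float) -> int:
    """Map a PAM sample in [-1.0, 1.0] to the nearest quantization level."""
    level = round((sample + 1.0) / 2.0 * (LEVELS - 1))
    return max(0, min(LEVELS - 1, level))   # clamp to the valid range

# One cycle of a 1 kHz tone sampled at 8 kHz gives 8 PAM samples.
pam = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(8)]
pcm = [quantize(s) for s in pam]
print(pcm)
```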
Figure 4.2 Analogue to PCM Conversion
Figure 4.3 Linear/Uniform and Non-linear/Non-uniform Quantisation
Because only a finite number of quantization levels is available, this process introduces
an error into the digital representation of the analog signal. Using more bits per sample
provides more quantization levels and therefore less error. To achieve reasonable quality
over the range of speech amplitudes found in networks requires a minimum of 12-bit
PCM samples assuming linear quantization (also known as uniform quantization). A
view of linear quantization is given in Figure 4.3a.
In practice, this number of levels is unnecessary for two reasons. First, average signal
levels are normally small, and only the lower quantization levels actually get used.
Second, the human ear operates in a logarithmic manner, being more sensitive to distor-
tion in low-level signals than in high-level signals.
As a result, a technique known as companding is used. This reduces the total number
of quantization levels needed by keeping many closely spaced levels at low signal
amplitudes and fewer, more widely spaced levels at high amplitudes. This process of
companding is shown in Figure 4.3b.
4.2.1 A-law and µ-law PCM
There are two common types of PCM, µ-law and A-law, each of which uses a different
rule for the companding process. North America and Japan mostly use µ-law, whereas
other areas of the world use A-law. Both types are defined in the G.711
Recommendation, yet differ in a number of ways: the companding laws themselves differ,
and the allocation of PCM codes differs in relation to the amplitude of the PAM samples.
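The two companding rules can be written in their continuous forms as below. This is a sketch of the underlying curves only: the actual G.711 encoders use piecewise-linear (segmented) approximations of these curves to produce 8-bit codes, with µ = 255 and A = 87.6.

```python
import math

# Continuous forms of the G.711 companding curves (the standard itself
# uses segmented approximations; this sketch shows only the curve shapes).

MU = 255.0
A = 87.6

def mu_law(x: float) -> float:
    """Compress a sample in [-1, 1] with the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def a_law(x: float) -> float:
    """Compress a sample in [-1, 1] with the continuous A-law curve."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))
    return math.copysign(y, x)

# A quiet signal at 1% of full scale is mapped to over 20% of the output
# range, so low-level signals get proportionally more quantization levels.
print(round(mu_law(0.01), 3))
print(round(mu_law(1.0), 3))
```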
With A-law, after converting from PAM to PCM, the even bits of each sample are inverted
before being transmitted onto the digital transmission path. This bit inversion was origi-
nally used to ensure that a sufficient number of "1"s existed in the digital stream,
because any channel that was idle would otherwise produce a pattern of only "0"s. In
fact, on a 2.048 Mbps PBX interface this inversion is unnecessary, since the problem of
too many "0"s is dealt with at the physical layer (see Section 4.3.1).
Because of these differences, when operating an international network where both µ-law
and A-law PCM systems are used, it is important to perform a proper conversion between
the two. The conversion process is given in G.711, which defines that digital paths
between countries should carry signals encoded using A-law PCM. Where both countries
use the same law, that law should be used on digital paths between them. Any necessary
conversion is performed in the countries using µ-law PCM.
4.2.2 Power of a Digital Signal
It is easy to quantify the power levels for an analog interface, since this is something that
can be measured directly with a power meter. In the PCM world, there is no equivalent
direct method of measurement. Instead, a specific relationship is defined between an
analog signal and a digital sequence. The relationship of power to the digital signal is
defined in G.711 through two tables (one for µ-law and one for A-law) that define a
sequence of PCM samples. When decoded, these samples result in a 1 kHz sine wave
at a nominal level of 0 dBm0. This gives a theoretical maximum level of
+3.17 dBm0 for µ-law and +3.14 dBm0 for A-law. Any attempt to exceed these levels
will result in distortion of the signal, simply because there are no more quantization
levels.
4.2.3 Distortion Resulting from the Digitization Process
When PCM values are allocated to the PAM samples, a certain amount of distortion
results because of the finite number of quantization levels available to quantize the
analog signal. Distortion is covered in more depth in Section 7.7.
4.3 The Digital 1.544 Mbps PBX Interface (DS-1)
The 1.544 Mbps PBX interface is common to North America and Japan, and is often
referred to as a "T1" or a "DS-1" interface. (In practice these two terms are often used
interchangeably, although strictly this is incorrect: DS-1 refers to the 1.544 Mbps signal
format, while T1 refers to the digital transmission system that carries it.) It offers 23 or
24 traffic timeslots depending upon the type of signaling being used.
In countries supporting DS-1 interfaces, such as in North America, various types of T1
transmission facilities are offered. AMI facilities expect the attached device (e.g. PBX) to
provide an AMI electrical signal as described in Section 4.3.1. The main problem with
this is that long strings of "0"s do not provide any electrical voltage transitions. This can
result in loss of repeater synchronization on the transmission facility. It is therefore the
responsibility of the attached equipment to ensure that sufficient "1"s exist to maintain
synchronization. The proportion of "1"s to "0"s is known as the "1"s density.
An alternative type of facility supports Bipolar Eight Zeroes Substitution (B8ZS), where
violation pulses are introduced into the user data stream upon the detection of an
excess "0"s count. This technique is similar in principle to the HDB3 process used on
E1 (see Section 4.4.1), and is described below.
4.3.1 Physical Interface
DS-1 is supported via twisted pair cable only, unlike E1, which is supported on both
unbalanced coaxial cable and balanced twisted pair cable. DS-1 uses Alternate Mark
Inversion (AMI) line coding to electrically encode the signal on the line. However, to
overcome any problem of low "1"s density, a process called B8ZS is normally used
instead of the E1 HDB3 process. B8ZS, shown in Figure 4.7, works by replacing each
string of eight consecutive binary zeroes with a code that introduces bipolar violations
into the 4th and 7th bit positions. This ensures that a sufficient number of voltage
transitions exist, while retaining the DC-balanced nature of the signal.
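The substitution can be sketched using +1/-1 for pulses and 0 for spaces. This is a simplified list-of-integers model of the line coding, not a real line driver, but the 000VB0VB pattern it emits matches the rule described above.

```python
# Simplified sketch of B8ZS substitution on an AMI pulse stream.
# Pulses are modeled as +1/-1 and spaces as 0.

def b8zs_encode(bits):
    """AMI-encode a 0/1 bit list, substituting each run of eight zeros."""
    out = []
    last = -1                   # polarity of the most recent pulse
    zeros = 0
    for bit in bits:
        if bit:
            last = -last        # AMI: successive marks alternate polarity
            out.append(last)
            zeros = 0
        else:
            out.append(0)
            zeros += 1
            if zeros == 8:
                # 000VB0VB: violations (V) land in positions 4 and 7,
                # and the two B pulses keep the code DC-balanced.
                out[-8:] = [0, 0, 0, last, -last, 0, -last, last]
                zeros = 0
    return out

stream = b8zs_encode([1] + [0] * 8 + [1])
print(stream)   # [1, 0, 0, 0, 1, -1, 0, -1, 1, -1]
```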
4.3.2 Framing - D4
There are two common types of framing used on a DS-1 interface, D4 and Extended
Superframe (ESF).
D4 framing is shown in Figure 4.8. This consists of a frame of 193 bits with a repetition
rate of 8000 frames per second, giving a data rate of 1.544 Mbps and a 125 µs frame
duration. Each frame contains 24 eight-bit timeslots, named timeslot 1 through timeslot
24, and a single bit called the F or framing bit. All 24 timeslots are normally available for
traffic except for when CCS is carried. In this case, timeslot 24 is reserved for the
signaling channel.
Framing is achieved using the F bit over a sequence of 12 frames, which is also called
a superframe. In odd-numbered frames the F bit is called Ft for terminal framing, and
performs frame alignment. In even-numbered frames the F bit is called Fs and performs
superframe alignment.
4.3.3 Framing - Extended Superframe (ESF)
Today, ESF is more common than D4 due to its capability for monitoring the performance
of an in-service T1 link. This was not easily possible with D4, since the link would
need to be taken out of service in order for performance testing to be carried out.
The extended superframe, as shown in Figure 4.9, does exactly what its name implies
and extends the 12-frame superframe to 24 frames. The use of the F bit is also changed.
Figure 4.4 HDB3 Coding
Only 6 out of the 24 frames in the ESF are now used for synchronization purposes. Of
the remaining 18 F bits, six are utilized for CRC checking to verify the integrity of the
ESF, and 12 make up a Facility Data Link (FDL). The FDL is also known as the Data
Link (DL) and is sometimes called the Embedded Operation Channel (EOC). The FDL
is available for the communication of alarms, loop backs and general performance infor-
mation between terminating devices such as Customer Service Units (CSUs), which
terminate the T1s at the customer premises.
4.3.4 Channel Associated Signaling (CAS) on DS-1
The technique for CAS on a DS-1 is shown in Figures 4.8 and 4.9. The basic process is
the same for both D4 and ESF framing. However, with D4 only two signaling bits, A and
B, are used for each traffic timeslot, while with ESF four bits are used: A, B, C, and D.
The process used is called bit robbing, because the least significant bit of each traffic
timeslot in every sixth frame is taken away to carry signaling information rather than
traffic. Meanwhile, the other seven bits are left alone and continue to carry traffic such
as PCM. Any distortion introduced to PCM voice traffic by this bit-robbing technique is
negligible and can be ignored. However, for data the distortion can be significant. This
is why data support is typically only 56 Kbps rather than 64 Kbps, with only the seven
most significant bits being used.
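The bit-robbing operation on one frame can be sketched as follows. The frame-numbering convention and the helper function are illustrative (in a D4 superframe the robbed frames are frames 6 and 12); the point is that the seven most significant bits pass through untouched.

```python
# Sketch of robbed-bit CAS: in every sixth frame, the least significant
# bit of each eight-bit traffic timeslot is replaced by a signaling bit.

def rob_bits(frame_number, timeslots, signaling_bit=0):
    """Return one frame's 24 timeslots, robbing LSBs in every 6th frame."""
    if frame_number % 6 != 0:
        return list(timeslots)      # ordinary frame: traffic untouched
    # Robbed frame: keep the 7 most significant bits, overwrite the LSB.
    return [(ts & 0xFE) | (signaling_bit & 1) for ts in timeslots]

samples = [0b10101011] * 24         # 24 identical eight-bit PCM samples

print(rob_bits(5, samples)[0])      # 171: traffic frame, sample unchanged
print(rob_bits(6, samples, 0)[0])   # 170: LSB robbed for signaling
```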
Figure 4.5 G.704 Frame Structure for 2.048 Mbit/s E1
4.3.5 Common Channel Signaling on DS-1
Common Channel Signaling utilizes timeslot 24 to carry signaling information as HDLC-
based data messages. See section 5.2.2 for more details and examples.
4.3.6 DS-1 Alarms
DS-1 provides the same alarm conditions as E1, but in a different manner and with a
different naming convention. Figure 4.10 gives a comparison between DS-1 and E1 alarm
conditions and the naming conventions associated with each.
The method used by DS-1 to provide a remote alarm indication differs depending on
whether D4 or ESF framing is being used.
With D4 trunks, a remote alarm indication, also called a yellow alarm, is given by trans-
mitting a "0" in the bit 2 position of every timeslot. Putting this alarm indication in the
traffic timeslots has two implications. First, it destroys any valid information carried in the
traffic timeslots. Second, the receiver must validate the indication for a period of time
(typically about 600 milliseconds (ms)) before taking any action, since it is possible that
normal traffic could briefly mimic it.
With ESF trunks, a remote alarm indication (yellow alarm) is given by using the F bit
Facility Data Link to transmit an alternating pattern of eight "0"s, followed by eight "1"s,
then eight "0"s, and so on.
4.4 The Digital 2.048 Mbps PBX Interface (E1)
A digital PBX interface running at 2.048 Mbps, sometimes called the "E1" interface, is
designed to conform to ITU-T Recommendation G.732 "Characteristics of Primary PCM
Multiplex Equipment Operating at 2.048 Mbps" [4]. This in turn refers to the following
recommendations:
G.703: "Physical/Electrical Characteristics of Hierarchical Digital Interfaces"
G.704: "Synchronous Frame Structures Used at Primary and Secondary
Hierarchical Levels"
G.711: "Pulse Code Modulation (PCM) of Voice Frequencies"
4.4.1 Physical Interface - G.703
ITU-T Recommendation G.703 [5] defines the electrical characteristics for many types
and speeds of interfaces, including 64 Kbps, 1.544 Mbps, 6.312 Mbps, 32.064 Mbps,
44.736 Mbps, 2.048 Mbps, 8.448 Mbps, 34.368 Mbps, 139.264 Mbps, 97.728 Mbps, and
155.52 Mbps. European voice applications primarily use the 2.048 Mbps interface.
One of two types of physical interface may be used: the 75-ohm unbalanced coaxial
interface, or the 120-ohm balanced twisted pair interface. Parameters such as pulse
voltage, voltage tolerance, and frequency tolerance are specified in G.703.
The actual data bits are transmitted using Alternate Mark Inversion (AMI) with High
Density Bipolar 3 (HDB3) encoding. The objective of using these is twofold: first, to
remove any DC component from the transmitted signal (AMI performs this function), and
secondly, to ensure that there are a sufficient number of voltage transitions in the signal.
This is referred to as "1"s density, and is important so that the receiving device can
derive synchronization, or timing from the signal (HDB3 performs this function).
Figure 4.4 shows AMI and HDB3 encoding. AMI employs a three-level signal in which
binary zeroes are encoded as 0 volts, and successive binary "1"s (marks) are
encoded using alternating voltages of ±2.37 V for the unbalanced interface and ±3 V for
the balanced interface.
HDB3 is defined in G.703, and works by replacing each block of 4 successive zeroes by
a pattern of either 000V or B00V, where B represents an inserted pulse that conforms to
the AMI rule and V represents an AMI violation. A violation is where two successive "1"s
use the same electrical polarity. The choice of which pattern is used ensures that the
number of B pulses between consecutive V pulses is odd, thus retaining the DC-
balanced nature of the signal. This is important to help ensure error-free transmission.
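The HDB3 rule can be sketched in the same list-of-pulses style used for line coding above. This is a simplification: the choice between 000V and B00V is made so that successive violation pulses alternate in polarity, which is what the "odd number of B pulses" condition achieves.

```python
# Simplified sketch of HDB3 substitution on an AMI pulse stream.
# Pulses are modeled as +1/-1 and spaces as 0.

def hdb3_encode(bits):
    """AMI-encode a 0/1 bit list, substituting each run of four zeros."""
    out = []
    last = -1        # polarity of the most recent pulse
    last_v = -1      # polarity of the most recent violation pulse
    zeros = 0
    for bit in bits:
        if bit:
            last = -last            # AMI: successive marks alternate
            out.append(last)
            zeros = 0
        else:
            out.append(0)
            zeros += 1
            if zeros == 4:
                if last == last_v:
                    # even number of marks since the last V: use B00V
                    b = -last
                    out[-4:] = [b, 0, 0, b]
                    last = last_v = b
                else:
                    # odd number of marks since the last V: use 000V
                    out[-4:] = [0, 0, 0, last]
                    last_v = last
                zeros = 0
    return out

print(hdb3_encode([1, 0, 0, 0, 0]))      # 000V: [1, 0, 0, 0, 1]
print(hdb3_encode([1, 1, 0, 0, 0, 0]))   # B00V: [1, -1, 1, 0, 0, 1]
```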
4.4.2 Framing Structure - G.704
ITU-T Recommendation G.704 [6] defines the frame structures for a number of different
speed links, including 1.544 Mbps, 6.312 Mbps, 2.048 Mbps, and 8.448 Mbps.
As shown in Figure 4.5, a frame of 256 bits is defined with a repetition rate of 8000
frames per second, giving a data rate of 2.048 Mbps and a 125 µs frame duration. Each
Figure 4.6 E1 Alarms
frame comprises 32 eight-bit timeslots named timeslot 0 through to timeslot 31. Timeslot
0 is used for a number of purposes, including frame synchronization and alarm
reporting. Timeslot 16 is normally used to carry signaling information, although in some
circumstances it may be used to carry a traffic channel. Timeslots 1 to 15 and 17 to 31
are used to carry 30 traffic channels, normally PCM in the case of PBX systems.
However, since these timeslots simply represent a 64 Kbps channel they can be used
to carry any form of traffic, including data.
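The arithmetic behind this frame layout can be checked directly:

```python
# Arithmetic check of the G.704 E1 frame layout described above.

TIMESLOTS = 32              # timeslot 0 .. timeslot 31
BITS_PER_TS = 8
FRAMES_PER_SECOND = 8000

frame_bits = TIMESLOTS * BITS_PER_TS
line_rate = frame_bits * FRAMES_PER_SECOND
traffic_ts = [n for n in range(32) if n not in (0, 16)]

print(frame_bits)                        # 256 bits per frame
print(line_rate)                         # 2,048,000 bit/s
print(len(traffic_ts))                   # 30 traffic channels
print(BITS_PER_TS * FRAMES_PER_SECOND)   # 64,000 bit/s per timeslot
```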
Alternate timeslot 0s carry different information: as shown in Figure 4.5, even frames
carry the frame alignment signal 0011011, while odd frames carry the A (alarm) bit. The
purpose of the frame alignment signal is to allow the devices at each end of a link to
Figure 4.7 Bipolar Eight Zeroes Substitution (B8ZS) Coding
Figure 4.8 D4 Framing
synchronize to the frame, enabling them to know where the frame starts and which bits
refer to which timeslot. The A-bit, also known as the Remote Alarm Indication (RAI), is
set to "0" in normal operation. In the event of a fault condition the A-bit is set to
binary "1". See Section 4.4.5 for more details on alarms.
The other bits, Si and Sa4-Sa8, are of less significance, although still important. Si is
reserved for international use. One specific use, as given in G.704, is to carry a Cyclic
Redundancy Check that can be used for enhanced error monitoring on the link. It is
important to note that both ends of a link must be configured in the same way, either with
CRC enabled or CRC disabled.
Sa4 to Sa8 are additional spare bits that can be used for a number of purposes as
defined in G.704. For example, Sa4 can be used as a message-based link for opera-
tions, maintenance and performance monitoring.
4.4.3 Channel Associated Signaling (CAS) on E1
Figure 4.5 demonstrates a form of signaling known as Channel Associated Signaling
(CAS). With CAS, specific bits of data within the frame are defined to carry signaling
information for each of the traffic timeslots. The information for each timeslot is trans-
mitted as a group of four bits, designated A, B, C and D. Since timeslot 16 only has eight
bits available to support 30 traffic timeslots, a multiframe structure of 16 frames (desig-
nated frame 0 to frame 15) is defined to allow A, B, C, and D bits to be carried for all 30
timeslots. Frame 0 carries a Multiframe Alignment Signal (MFAS) of four zeroes. This
allows the receiving system to identify which frame is which, and associate a traffic
timeslot with its correct signaling bits. Frame 0 also carries a remote alarm indicator to
Figure 4.9 ESF Framing
signify loss of multiframe alignment. Timeslot 16 of frame 1 then carries A, B, C, and D
bits for timeslots 1 and 17, timeslot 16 of frame 2 carries A, B, C, and D bits for timeslots
2 and 18, and so on up to frame 15 of the multiframe, which carries A, B, C, and D bits
for timeslots 15 and 31, after which the sequence repeats.
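The frame-to-timeslot mapping in the multiframe reduces to a one-line rule, sketched here:

```python
# Sketch of the timeslot-16 CAS multiframe mapping described above:
# frame n (1-15) carries the ABCD bits for traffic timeslots n and n + 16,
# while frame 0 carries the multiframe alignment signal.

def cas_timeslots(frame):
    """Traffic timeslots signaled in timeslot 16 of the given frame."""
    if frame == 0:
        return None     # frame 0: multiframe alignment signal, no ABCD bits
    return (frame, frame + 16)

for f in (0, 1, 2, 15):
    print(f, cas_timeslots(f))
```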
A common problem arises when the A, B, C, and D bits in timeslot 16 associated with any
of the traffic channels 1 to 15 are set to 0000. This produces a false multiframe
alignment signal, causing the whole signaling mechanism to fail. This situation
is most common in a configuration where the PBX system is connected to a multiplexer
network that provides an idle pattern of 0000 to the PBX for channels that are not routed.
In fact, ITU-T G.704 recommends against the use of 0000 for any signaling purposes for
timeslots 1 to 15. It also recommends that if B, C, and D are not used, then they should
be set to B=1, C=0 and D=1.
4.4.4 Common Channel Signaling (CCS) on E1
Common Channel Signaling utilizes timeslot 16, as does CAS. However, rather than
defining specific bits to carry signaling information for each of the traffic timeslots,
signaling information is sent as High Level Data Link Control (HDLC)-based data
messages. See section 5.2.2 for more details and examples.
4.4.5 E1 Alarms
Recommendation G.732 describes various fault conditions and the subsequent actions
that should be taken. It includes conditions such as power supply failure and codec
failure, although this booklet describes only problems associated with the link itself.
Frame Level Alarms - (in reference to Figure 4.6) In the event of one of the following
problems occurring on the received signal (Rx), bit 3 in timeslot 0 of odd numbered
frames should be set to "1" on the transmit (Tx) signal of that PBX system.
Loss of incoming signal
Loss of frame alignment
Excessive error ratio (worse than 1 x 10^-3)
Figure 4.10 DS-1 and E1 Alarm Comparison
Loss of frame alignment will occur for a number of reasons, including cabling or
equipment faults. In the event of a failure in the transmission system between the two
PBXs, the transmission system should automatically apply an Alarm Indication Signal
(AIS), a continuous stream of "1"s, to the line.
Multiframe Alarm - When running Channel Associated Signaling, bit 6 of timeslot 16 in
frame 0 of the multiframe is used to indicate loss of multiframe alignment. If multiframe
alignment is lost on the receive signal, the PBX sets bit 6 in its transmit signal to alert
the far end.
Loss of multiframe alignment is a rare situation, since in a normal failure condition loss
of frame alignment is more likely. A common cause of such a condition is
misconfiguration of one end of the link, as described in Section 4.4.3 above.
4.5 The Need for PBX Synchronization
Similar to many digital systems, digital PBX systems operate internally in a synchronous
manner where data is moved from one place to another using a common clock, or timing
source. When two PBXs are connected together via a digital link, they must be synchro-
nized in order to avoid bit slips and loss of frame synchronization.
Digital PBX interfaces typically have some form of buffering mechanism in order to
overcome problems associated with jitter, wander, and slight timing inaccuracies.
However, in the event that the timing mismatch between two PBXs is too great, the
buffer will fill. Once full, it must be emptied, resulting in loss of data and loss of frame
synchronization. Frame synchronization should be regained rapidly, so long as the
timing mismatch is not too great.
Figure 4.11 Lack of Synchronisation (PBX 1 transmitting at 1.544001 Mbit/s, PBX 2 receiving at 1.543999 Mbit/s)
To understand why synchronization is required, consider the following two examples.
The first example shows two PBX systems connected without synchronization, while the
second shows PBX 2 synchronized to PBX 1.
4.5.1 PBX Systems Without Synchronization
(With reference to Figure 4.11) The two PBXs are connected via a 1.544 Mbps link. This
is only a nominal speed and some deviation is inevitable. It is quite possible that the
transmit clock from PBX 1 is running slightly fast, say 1.544001 Mbps. This represents
an accuracy of about 6 parts in 10 million, which is not uncommon. PBX 2, on the other
hand, is running slightly slow at 1.543999 Mbps, again an accuracy of about 6 parts in
10 million.
Every second, PBX 1 transmits 1,544,001 bits of data onto the trunk, and PBX 2
receives 1,543,999 bits, leaving two bits to be absorbed in a buffer at the input to PBX
2. This will continue with two bits being absorbed in the buffer every second until the
buffer is full, at which point it will be emptied causing loss of frame synchronization. The
effect of this is hard to predict, but will probably cause clicks on any voice call currently
in progress across the trunk, or disconnection.
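The two-bits-per-second arithmetic above can be sketched directly. The buffer depth used below is a hypothetical example value, not a figure from any particular PBX interface:

```python
# Illustration of the buffer-fill arithmetic above. BUFFER_BITS is an
# assumed example depth; real interface buffers vary by equipment.
TX_RATE = 1_544_001   # bits per second transmitted by PBX 1
RX_RATE = 1_543_999   # bits per second consumed by PBX 2
BUFFER_BITS = 256     # hypothetical input buffer depth at PBX 2

surplus = TX_RATE - RX_RATE            # bits absorbed by the buffer per second
seconds_to_overflow = BUFFER_BITS / surplus

print(surplus)               # 2 bits per second
print(seconds_to_overflow)   # 128.0 seconds until the buffer fills and a slip occurs
```

A deeper buffer only postpones the overflow; without synchronization the slip is inevitable.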
4.5.2 PBX Systems With Synchronization
(With reference to Figure 4.12) The system has been rearranged to allow PBX 2 to
synchronize its internal clock to the data arriving at the digital trunk port. PBX 1 still
transmits data a bit fast, at 1.544001 Mbps, but now PBX 2 also receives data at
1.544001 Mbps. The buffer does not fill, and frame synchronization is preserved.
This approach is effective in a point-to-point situation, but synchronization also needs to
be maintained in a full network scenario such as shown in Figure 4.13.
Figure 4.12 PBX Synchronisation (both ends running at 1.544001 Mbit/s)
There are a number of rules to follow with synchronization, one of which is that the whole
network should be synchronized back to the same clock source. Figure 4.13 shows a
small network of five PBX systems with PBX 1 taking a high-accuracy clock from an
ISDN connection to the public network. An ideal clocking arrangement based upon this
network would be for PBX 2 to synchronize its system clock to the data on Link 1 coming
from PBX 1. PBX 3 would then synchronize to the data coming from PBX 2 via Link 2.
PBX 4 would synchronize to PBX 1 via Link 3 and PBX 5 to PBX 4 via Link 6. In this
way, all PBX systems are synchronized together.
However, this scenario alone would not compensate for a failure. If link 6 were to fail,
PBX 5 would lose the clock to which it is synchronized. To overcome this problem, it is
normal to have a clock fallback list in each PBX, and in the event of a problem the PBX
will search for another valid source.
However, one key criterion to follow when creating clock fallback lists is to ensure that the
network cannot get into a clocking loop - for example, where PBX 4 takes its
clocking signal from PBX 3, PBX 3 from PBX 5, and PBX 5 from PBX 4. In this scenario
it is likely that errors would occur, causing loss of synchronization on the links. The error
level could reach the point where synchronization cannot be regained, resulting in a
complete loss of one or more trunks. For this reason, it is very important to follow the
manufacturer's guidelines when configuring network clocking.
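The clocking-loop hazard above lends itself to a simple check. The sketch below is illustrative only (the PBX names and the routine are hypothetical, not a feature of any product): given a map of which PBX each PBX takes its clock from, it walks each chain and reports any loop it finds.

```python
# A sketch of checking a network clocking plan for loops. Each PBX maps to
# the PBX it currently takes its clock from (None = external/primary source).
def find_clock_loop(source_of):
    """Return the members of a clocking loop, or None if the plan is loop-free."""
    for start in source_of:
        seen = []
        node = start
        while node is not None:
            if node in seen:                 # revisited a node: loop found
                return seen[seen.index(node):]
            seen.append(node)
            node = source_of.get(node)       # follow the clock chain upstream
    return None

# The faulty scenario from the text: PBX 4 <- PBX 3 <- PBX 5 <- PBX 4
plan = {"PBX4": "PBX3", "PBX3": "PBX5", "PBX5": "PBX4",
        "PBX1": None, "PBX2": "PBX1"}
print(find_clock_loop(plan))   # ['PBX4', 'PBX3', 'PBX5']
```

A loop-free plan, where every chain eventually terminates at a PBX with an external reference, makes the function return None.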
Figure 4.13 Synchronisation in a Network of PBXs
5 Speech Compression
Speech compression, also referred to as voice compression, describes the process of
digitizing speech to a bit rate of less than 64 Kbps. It is normal, however, to start with PCM
at 64 Kbps and compress it to a lower rate. Ideally, the resulting speech quality will not
be affected; in practice, however, there will be some degradation that may or may not be
apparent to the users. There are many different techniques with different characteristics
used for speech compression, which result in bit rates from a few Kbps up to 40 Kbps.
PCM speech can be compressed because a large portion of the 64 Kbps bit stream is
redundant. Furthermore, it is thought that speech of reasonable quality can be provided
at rates as low as 1 Kbps. This has not yet been achieved, in large part because current
understanding of the way speech works is less than complete. As time goes by, new and
more efficient techniques are being developed to drive the bit rate lower and lower while
maintaining acceptable quality.
This section looks at a number of speech compression techniques in common usage
today, and other new systems poised for entry into the marketplace. The section
concludes with information on speech compression impairments, including the negative
effects that are introduced into voice telephony when speech compression is used.
5.1 Different Coding Types
Speech compression schemes can be classified into one of three categories: Waveform
Coding, Source Coding, and Hybrid Coding.
Waveform Coding - Waveform coders attempt to reconstruct a waveform in a form as
close to the original signal as possible, based on samples of the original waveform. In
theory, this means that waveform coders are signal-independent and should work with non-
voice signals such as modem and fax traffic. Typically waveform coders are relatively
simple to implement and produce acceptable quality speech at rates above 16 Kbps.
Below this, the reconstructed speech quality degrades rapidly.
PCM is an example of a waveform coding technique. If linear quantization is used, then
at least 12 bits per PCM sample are needed to reproduce good quality speech. This
results in a bit rate of 96 Kbps (8000 samples per second x 12). However, the nature of
speech and human hearing does not tend to follow a linear pattern. Much of a speech
signal is at low levels, and human ears are not sensitive to the absolute amplitude of
sounds, but instead to the log of the waveform amplitude. Therefore, in representing
speech digitally, more bits are allocated at the lower levels than at the higher levels. This
process, called companding, results in digitized speech of 64 Kbps. Therefore, even
PCM effectively represents a compressed digital speech form. Companding is
discussed in more detail in Section 4.2.
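The logarithmic allocation of bits described above can be illustrated with the textbook µ-law compression formula. This is a hedged sketch: G.711 actually uses a segmented approximation of this curve, and µ = 255 is the North American convention:

```python
import math

# Textbook (continuous) mu-law companding: quiet signals are expanded,
# loud ones compressed, matching the ear's logarithmic sensitivity.
MU = 255

def mu_law_compress(x):
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

# A 1% amplitude input occupies ~23% of the companded output range,
# so low-level speech gets many more quantization levels than it
# would under linear quantization.
print(round(mu_law_compress(0.01), 3))   # 0.228
print(round(mu_law_compress(0.5), 3))    # 0.876
```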
Other common waveform coding techniques include Adaptive Differential Pulse Code
Modulation (ADPCM), and Continuously Variable Slope Delta (CVSD).
Source Coding - Source coders (also known as Vocoders) are more complex than
waveform coders, but can compress speech to bit rates of 2.4 Kbps and below. To
achieve these compression rates, knowledge of the speech generation process is
required. In principle, the speech signal is analyzed and a model of the source is
generated. A number of parameters are then transmitted to the destination to allow it to
rebuild the model and thus recreate the speech. These parameters include such infor-
mation as whether a sound is voiced (such as vowels) or unvoiced (such as most conso-
nants), amplitude information, and pitch. While source coders provide very low bit rates,
the subjective quality of the regenerated speech can be poor. Often, although the
speech is understood, recognition of who is speaking, referred to as talker recognition,
is very poor. Furthermore, source coders do not carry non-speech signals very well,
such as modem or fax signals.
Because of these factors, source coders are not typically used in commercial applica-
tions. Their main use has been in military applications where natural sounding speech
is not as important as a very low bit rate, which can then be encrypted for security
purposes.
Hybrid Coding - As the name suggests, hybrid coding uses aspects of both waveform
and source coding, bringing together the benefits of high-quality speech from waveform
coders and low bit rates of source coders.
Hybrid coders operate in a similar manner to source coders, in that a model is built on
the parameters of the speech signal. Rather than transmitting these parameters directly to
the destination, however, hybrid coders use them to synthesize a number of new signals. These
new signals are then compared with the original signal to find the best match. The
modeled parameters, along with the excitation signal, which represents how the synthe-
sized signal was produced, are then transmitted to the destination where the speech is
reproduced. Examples of hybrid coders include Code Excited Linear Prediction (CELP)
and its derivatives, some of which are described later in this section.
5.2 Adaptive Differential Pulse Code Modulation (ADPCM)
Figure 5.1 shows DPCM in action, where only the difference between one sample and
the next is transmitted from the source to the destination.
For PCM, the analog speech signal is sampled at regular points, and the absolute level
of the sample is encoded. With DPCM, the difference between one sample and the next
is encoded. By doing this, fewer bits are needed to encode the signal. In practice, speech
samples do not usually change rapidly, but when they do, the changes in signal level are
often greater than can be encoded using DPCM. This results in distortion of the signal,
as shown in Figure 5.1.
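The slope-overload distortion described above can be sketched with a toy DPCM round trip. The clipping range below is an arbitrary illustrative value, not taken from any standard:

```python
# A toy DPCM encoder/decoder showing slope-overload distortion: the
# per-sample difference is clipped to a fixed range before transmission.
MAX_DELTA = 4   # largest difference the fixed step range can represent (example value)

def dpcm_round_trip(samples):
    """Encode differences (clipped to +/-MAX_DELTA) and rebuild the signal."""
    rebuilt, prev = [], 0
    for s in samples:
        delta = max(-MAX_DELTA, min(MAX_DELTA, s - prev))  # clip the difference
        prev += delta                                      # decoder tracks the same value
        rebuilt.append(prev)
    return rebuilt

slow = [1, 2, 3, 4, 5]          # small changes: reproduced exactly
fast = [0, 10, 20, 10, 0]       # rapid changes: distorted by clipping
print(dpcm_round_trip(slow))    # [1, 2, 3, 4, 5]
print(dpcm_round_trip(fast))    # [0, 4, 8, 10, 6] (cannot keep up with the slope)
```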
However, with ADPCM the amplitude range over which a given number of bits are used
to encode a sample varies, or adapts, depending upon the range of amplitudes
occurring at the time. This can be seen in Figure 5.2, which shows the principle of
encoding the difference between the actual signal and a prediction of what it will be,
based on the prediction that the next sample will be the same as the current one.
The general rule that is used by ADPCM is as follows:
"When signals are quantized towards the limit of the current range, the
range used to quantize the next sample is changed."
During the first few samples, the difference between one and the next will be relatively
small. These differences are encoded using the full range of quantization levels that are
available, given the number of bits available for each sample. For example, at 32 Kbps
the difference is encoded in four bits, one for the sign (whether the next sample is more
positive or negative than that predicted) and three for the magnitude of the difference. A
few samples further on, the difference between one and the next is greater, yet can still
be encoded using the full range of quantization levels. This is achieved through the
Figure 5.2 Adaptive Differential Pulse Code Modulation (ADPCM)
process of adaptation, where the amplitude range that can be encoded with four bits
changes depending upon the amplitudes at the time.
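The adaptation rule quoted above can be sketched as follows. The growth and shrink factors here are invented for illustration; real ADPCM coders (e.g. ITU-T G.726) use standardized adaptation tables:

```python
# A simplified sketch of the ADPCM adaptation rule: quantizing near the
# limit of the current range widens the range used for the next sample,
# while small codes let it shrink again. Factors are illustrative only.
def adapt_step(step, code, bits=3, grow=2.0, shrink=0.9):
    top = (1 << bits) - 1            # largest magnitude code, e.g. 7 for 3 bits
    if abs(code) >= top - 1:         # quantized toward the range limit
        return step * grow           # widen the range for the next sample
    return max(1.0, step * shrink)   # otherwise narrow it gradually

step = 8.0
for code in [7, 7, 2, 1]:            # two near-limit codes, then small ones
    step = adapt_step(step, code)
print(step)   # ~25.92: the range grew twice (8 -> 16 -> 32), then shrank
```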
As previously mentioned, ADPCM supports four different
bit rates. The different rates are achieved through the
number of bits used to encode the difference between
one sample and the next, as seen in Figure 5.3.
5.3 Code Excited Linear Prediction (CELP)
At bit rates of around 16 Kbps and below, the quality of waveform coders falls rapidly.
We have also seen that source coders, while operating at very low bit rates, tend to
reduce talker recognition substantially. Therefore, hybrid schemes, especially CELP
coders and their derivatives, tend to be used today for sub-16 Kbps compression. Many
CELP implementations are proprietary to the manufacturer, although we shall also
discuss two standardized versions known as LD-CELP as defined in ITU-T G.728 and
CS-ACELP as defined in ITU-T G.729.
The essence of a CELP encoder is to analyze the incoming speech, and then transmit
a number of parameters to the decoder so that the original speech can be reproduced
as accurately as possible. These parameters include the mathematical model of a filter
which simulates the talker's vocal tract (the vocal characteristics that make a person
sound unique), gain information giving the level of the speech, and a codebook index.
The codebook index is used to point to a sequence of pre-defined speech samples,
known as vectors, which is common to both the transmitter and the receiver. The
number of codebook entries and the number of samples within each entry are dependent
upon the actual CELP implementation.
At the transmitter, groups of PCM speech samples (vectors) from the input speech,
typically up to 20 ms in length, are compared to the vectors stored in the codebook.
This is done by generating a synthetic speech signal for every entry in the codebook
and comparing it to the actual speech input vector. The index for the vector that produces the
best match with the input speech waveform is then transmitted to the receiver. At the
receiver, this waveform is then extracted from the codebook and filtered, using the math-
ematical model of the original talker's vocal tract. This produces highly recognizable,
high-quality speech transmissions.
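At its core, the codebook search described above reduces to a nearest-vector match. The sketch below uses a toy three-entry codebook; real CELP coders search far larger codebooks and compare synthesized (filtered) signals rather than raw vectors:

```python
# A sketch of the CELP codebook search: every codebook vector is compared
# against the input vector, and only the index of the best match (by
# squared error) is transmitted. The tiny codebook is illustrative.
def best_codebook_index(input_vec, codebook):
    def sq_error(cand):
        return sum((a - b) ** 2 for a, b in zip(input_vec, cand))
    return min(range(len(codebook)), key=lambda i: sq_error(codebook[i]))

codebook = [
    [0.0, 0.0, 0.0, 0.0],     # entry 0: silence
    [0.9, -0.9, 0.9, -0.9],   # entry 1: rapidly alternating excitation
    [0.5, 0.6, 0.5, 0.4],     # entry 2: slowly varying excitation
]
frame = [0.4, 0.7, 0.6, 0.3]  # a short input vector
print(best_codebook_index(frame, codebook))   # 2 (only this index is sent)
```

Transmitting a short index instead of the samples themselves is where the large bit-rate reduction comes from.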
Due to the complexities of speech and the wide range of different human voices, the
processing required for CELP is very intensive, typically of the order of 15 million instruc-
tions per second (MIPS) or more for one voice channel. However, CELP is a very
Figure 5.3 ADPCM bits
popular form of speech compression because of its high speech quality and low bit
rates, typically between 4.8 Kbps and 16 Kbps. The practical drawbacks of CELP are
evident in two main areas. First, CELP often produces end-to-end delays in the order of
50 to 100 ms. This is due to a combination of the processing overhead and the number
of speech samples that are buffered for analysis. Such high delays can cause trans-
mission problems (see Section 6 on echo). Second, since CELP is tuned to human
speech, it does not support voice band data well and can cause problems with modems
and fax machines, as well as the transmission of DTMF tones.
5.4 Low Delay-Code Excited Linear Prediction (LD-CELP) ITU-T G.728
LD-CELP is based upon CELP, and provides speech quality similar to that of 32 Kbps ADPCM
at a rate of 16 Kbps. It also incurs smaller levels of delay, typically less than 2 ms, as
compared with normal CELP delay levels of 50 to 100 ms. LD-CELP uses backward
adaptation to produce its filtering characteristics, which means that the filter is produced
from previously reconstructed speech.
At the encoder, A-law or µ-law PCM is first converted to linear PCM. The input signal is
then partitioned into blocks of five consecutive input signal samples. The encoder then
compares each of 1024 codebook vectors with each input block. The 10-bit codebook
index of the best match codebook vector is then transmitted to the decoder.
The decoding operation is also performed on a block-by-block basis. Upon receiving
each 10-bit index, the decoder performs a table look-up to extract the corresponding
code vector from the codebook. The extracted code vector is then filtered to produce the
current decoded signal vector. The five samples of the post filter signal vector are then
converted to five A-law or µ-law PCM output samples.
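As a quick sanity check, the numbers above do account for the 16 Kbps rate: a 10-bit index for every block of five 8 kHz samples yields exactly 16,000 bits per second:

```python
# Arithmetic check of the LD-CELP figures quoted in the text.
SAMPLE_RATE = 8000         # samples per second
BLOCK = 5                  # samples per code vector
INDEX_BITS = 10            # one of 1024 codebook entries

blocks_per_second = SAMPLE_RATE // BLOCK
bit_rate = blocks_per_second * INDEX_BITS
print(blocks_per_second)   # 1600 code vectors per second
print(bit_rate)            # 16000 bits per second, i.e. 16 Kbps
```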
Please note that this is a somewhat simplified description of the operation of LD-CELP.
A more detailed description can be found in G.728 [15].
5.5 Conjugate Structure-Algebraic Code Excited Linear Prediction (CS-ACELP) ITU-T G.729
CS-ACELP is another speech compression technique that is based upon CELP. It was
originally designed for packetized voice support on mobile networks, although a different
scheme known as RPE-LTP has been adopted on the GSM mobile telephone system
(see Section 5.6).
CS-ACELP operates at a rate of only 8 Kbps, yet still provides speech quality similar to
that of ADPCM at 32 Kbps. Furthermore, it has been shown to operate well in tests even
when packets are lost.
The coder operates on speech frames of 10 ms, corresponding to 80 samples at a
sampling rate of 8000 samples per second. For every 10 ms frame, the speech signal
is analyzed to extract the parameters of the CELP model (linear prediction filter coeffi-
cients, adaptive and fixed codebook indices and gains). These parameters are then
transmitted to the destination in a specified frame format. At the decoder, the filter and
gain parameters are used to retrieve the filter information and simulate the filter. The
speech is then reconstructed by taking the codebook entry of 80 samples and passing
it through the reconstructed filter. Speech is then converted to A-law or µ-law PCM and
transmitted to the interface.
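A similar arithmetic check works for CS-ACELP's framing: at 8 Kbps, each 10 ms frame of 80 input samples must be represented by just 80 bits of model parameters:

```python
# Arithmetic check of the CS-ACELP framing quoted in the text.
BIT_RATE = 8000            # bits per second at the G.729 rate
FRAME_MS = 10              # frame length in milliseconds
SAMPLE_RATE = 8000         # samples per second

samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000
bits_per_frame = BIT_RATE * FRAME_MS // 1000
print(samples_per_frame)   # 80 samples in
print(bits_per_frame)      # 80 bits out (a 64 Kbps PCM frame would need 640)
```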
Please note that this is a somewhat simplified description of the operation of CS-ACELP.
A more detailed description can be found in G.729 [16].
5.6 Other Compression Techniques
Continuously Variable Slope Delta (CVSD) - CVSD was one of the earlier speech
compression schemes designed into TDM multiplexers and was very popular in the
early 1980s. It is a waveform coder operating directly on the waveform of the signal.
However, rather than starting with PCM, as is the case with many other compression
schemes, it often relies on analog rather than digital techniques. With CVSD coding, the
sending end compares the analog input voltage to an internal reference voltage. If the
input signal is greater than the reference, a "1" is transmitted and the reference voltage
is increased. If the input signal is less than the reference, a "0" is transmitted and the
reference voltage is decreased.
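The encoding rule just described can be sketched as a simple delta modulator. This sketch omits the slope adaptation that gives CVSD its name, and the step size is an arbitrary example value:

```python
# A minimal delta-modulation sketch of the CVSD rule above: each output bit
# says whether the input is above or below an internal reference, and the
# reference is nudged in that direction. Slope adaptation is omitted.
def cvsd_encode(samples, step=0.1):
    bits, ref = [], 0.0
    for s in samples:
        if s > ref:
            bits.append(1)   # input above reference: send "1", raise reference
            ref += step
        else:
            bits.append(0)   # input below reference: send "0", lower reference
            ref -= step
    return bits

print(cvsd_encode([0.05, 0.2, 0.3, 0.1, -0.1]))   # [1, 1, 1, 0, 0]
```

Because only one bit is sent per sample, the channel rate equals the sampling rate, which is how CVSD achieves compression without ever forming PCM words.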