
A Theory of Loss-less Compression of High Quality Speech Signals with Comparison

Hussain Mohammed Dipu Kabir, Syed Bahauddin Alam, Md. Isme Azam, Rishad Ahmed,

Md. Riazul Islam, M. Asif Ul Hoque, Md. Abdul Matin

Bangladesh University of Engineering and Technology, Dhaka-1000, Bangladesh

Abstract—The field of speech compression has advanced rapidly due to cost-effective digital technology and diverse commercial applications. In voice communication, a real-time system should be considered, and it is still not possible to compress signals without any loss in a real-time system. This paper presents a theory of loss-less digital compression for storing high-quality speech signals, with emphasis on the quality of the speech signal. When listening to music, high quality is always desired, while consuming as little memory as possible. In this compression, an 8-bit PCM speech signal is compressed. Where sample values are changing, the samples are kept unchanged; where they are not changing, only the number of samples holding the same value is stored. After compression the signal is still an 8-bit PCM signal, but it must be expanded before playback. This technique may also be used in real-time systems.

Index Terms—Compression, Lossless, Probability, Identifier, Repetition, LPC, MPEG-4.

I. INTRODUCTION

While modern lossy coding standards such as MP3 or AAC can achieve high compression ratios with transparent subjective quality, they do not preserve every single bit of the original audio data. Thus, lossy coding methods are not suited for editing or archiving applications, since multiple coding or post-processing can reveal originally masked distortions. Applying lossless entropy coding methods such as Lempel-Ziv, Huffman or arithmetic coding directly to the audio signal is not very efficient due to the long-time correlations and the high range of values. Therefore, conventional data compression tools such as WinZip or gzip fail in the case of digital audio data.

Lossless audio coding enables the compression of digital audio data without any loss in quality due to a perfect (i.e. bit-identical) reconstruction of the original signal. Entropy coding is used for compressing data; Huffman coding is an example of entropy coding [1]. Such codes are used for zipping files. These algorithms do not always compress data, especially for small frames. Lossless compression has many applications in recording and distribution of audio. For instance, bit-exact representation is strongly desirable in archival systems, studio operations, collaborative work, or music distribution over broadband links. This paper presents a loss-less compression technique for storing speech signals. Its real-time applications are also discussed in this paper.
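The remark that general-purpose coders do not always compress small frames can be checked with Python's standard `zlib` module, used here only as a stand-in (DEFLATE combines LZ77 with Huffman coding; it is not the specific coder of [1]). A minimal sketch, assuming a hypothetical 20-sample 8-bit frame with no repeated values, contrasted with a long silent segment:

```python
import zlib

def deflate_size(frame: bytes) -> int:
    """Size in bytes of the zlib (DEFLATE) encoding of `frame` at maximum level."""
    return len(zlib.compress(frame, 9))

# Hypothetical 20-sample frame with all-distinct values: nothing to exploit.
small_frame = bytes(range(20))
# Hypothetical long silent segment: 1000 zero-valued samples.
long_silence = bytes(1000)

print(len(small_frame), deflate_size(small_frame))    # output is larger than 20
print(len(long_silence), deflate_size(long_silence))  # output is far smaller than 1000
```

The zlib container alone adds a 2-byte header and a 4-byte Adler-32 trailer, so a short frame without redundancy grows instead of shrinking.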

PCM or DPCM is used in cellular communication. Before sampling, an analog low-pass (anti-aliasing) filter is used to prevent higher-frequency components from folding back into the signal band. DPCM is the better technique [2] because less data needs to be transferred. In speech compression, Linear Predictive Coding (LPC) is used. LPC is based on autoregressive (AR) signal modeling and is the basis of speech compression for cell phones and digital answering machines. LPC reduces the transmitted data by a factor of more than twelve [3]. LPC is a lossy compression scheme specifically tailored for speech; it does not work well for audio in general. In the adaptive-rate sampling technique using the Level Crossing Sampling Scheme (LCSS) [4], the bit rate varies with the input and with time. The LCSS technique does not always compress data, so a network will face difficulties in implementing the LCSS algorithm. The sampling rate can be reduced to reduce the bit rate of data transfer, but speech quality degrades as the sampling rate is reduced. Non-uniform sampling methods can be efficient for data compression [5]. The proposed technique is better than non-uniform sampling methods. In the silence detection algorithm of [6], speech and silence are separated by efficient coding: each frame is declared either silence or speech. This is a lossy technique, and for the same output quality its compression is lower than that of the proposed algorithm. Wideband audio compression is generally aimed at a quality that is nearly indistinguishable from consumer compact-disc audio [7]. Speech compression algorithms in mobile satellite systems are lossy [8].

The proposed method is similar to PCM when the signal contains high-frequency content, and similar to the level crossing sampling scheme when the change in the signal is small. Here "small" means that the change does not show up when the signal is sampled as PCM, i.e. consecutive samples are quantized to the same value. This compression keeps the signal in PCM form without adding another identifier bit for repetition. Since no extra identifier bit is used, one sample is used as the identifier of repetition. As one sample holds one value, the value used as the identifier should not otherwise occur in the signal. If a sample does hold the same value as the identifier, its value is changed slightly.

An experimental implementation of the polynomial approximation method for speech compression was integrated into the 2400 b/s mixed excitation linear prediction (MELP) speech coder standard [9], [10]. The version of this standard used for the implementation was an enhanced MELP coder [11]. In this system, the frequency range is divided into five contiguous frequency bands, and synthesis is based on voiced/unvoiced excitations for each band. Using the polynomial approximation method, a new speech coder is proposed which operates at a transmission rate of 1533 b/s and, for all noisy conditions tested, performs better than the 2400 b/s standard speech coder. However, these modern techniques are also lossy and are not applicable to high-quality music compression.

UKSim Fourth European Modelling Symposium on Computer Modelling and Simulation

978-0-7695-4308-6/10 $26.00 © 2010 IEEE

DOI 10.1109/EMS.2010.33

136

For a detailed introduction to the basic principles of loss-less audio compression and an historical overview of the first loss-less audio compressors, readers are directed to [12].

In practice, a signal contains a large number of zeros. This occurs when a person talks in a low-noise environment. In music, some samples have very sharp changes and some samples contain zeros. The proposed compression technique is effective for such signals containing a large number of zeros.

Using the proposed compression technique, 10% to 85% compression is obtained, which means less memory is needed to store the signal. The compression ratio depends on the speech signal. This compression places no constraints on the sampling rate or the quality of speech. A tolerance level can also be used to obtain better compression, at the cost of degrading the signal quality. A tolerance level of 0.02 means that when two samples differ in value by 0.02, they are leveled to the same value. The proposed technique can also be combined with entropy coding for better compression.
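One way to read the tolerance rule is sketched below: a sample that lies within the tolerance of the previously kept value is replaced by that value, which lengthens runs at the cost of a bounded error. This interpretation, the name `apply_tolerance`, and the use of amplitudes normalized to [-1, 1] are assumptions for illustration; the exact procedure is not specified in the text.

```python
from typing import List

def apply_tolerance(samples: List[float], tol: float = 0.02) -> List[float]:
    """Level samples lying within `tol` of the previously kept value (lossy)."""
    if not samples:
        return []
    leveled = [samples[0]]
    for x in samples[1:]:
        prev = leveled[-1]
        # Within tolerance: repeat the previous value so the run grows.
        leveled.append(prev if abs(x - prev) <= tol else x)
    return leveled

# Hypothetical normalized samples.
signal = [0.10, 0.11, 0.095, 0.30, 0.31, 0.29]
print(apply_tolerance(signal))  # [0.1, 0.1, 0.1, 0.3, 0.3, 0.3]
```

Every output sample differs from its input by at most `tol`, which is the quality degradation traded for longer runs.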

II. MATHEMATICAL BACKGROUND

In the proposed technique, PCM signals are compressed. In practice, a signal contains a large number of samples with the same value. Suppose samples are taken as 8-bit PCM; the sample values then range from -128 to +127 in two's complement representation. A sample value of 120 or more is rare; most samples have a value near zero. The sample values follow a normal distribution with standard deviation σ = 6 to 25, depending on the signal source [13].

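The normal distribution is given by

$$ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \tag{1} $$

Here,

σ² = variance, μ = mean (equal to the median for a normal distribution), x = value of a sample.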

So a sample having the value 125 is very rare. According to this equation, the probability of this value, evaluated as f(125), is 3.7574 × 10^-96 (calculated using MATLAB with σ = 6) to 5.9626 × 10^-8 (with σ = 25). In a signal of 14,001 samples, no sample has the value 125. The value 127 is not so rare, because any sample with a value of 127 or more is saved as 127.
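The two density values above can be re-evaluated from Eq. (1) with μ = 0. A minimal sketch in Python rather than MATLAB; the function name is ours:

```python
import math

def normal_pdf(x: float, sigma: float, mu: float = 0.0) -> float:
    """Evaluate the normal density of Eq. (1) at x."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

for sigma in (6, 25):
    print(sigma, normal_pdf(125, sigma))
```

For σ = 6 this gives about 3.76 × 10^-96; for σ = 25 it gives about 5.95 × 10^-8, close to the value quoted above.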

Here, the value 125 is used as the identifier of repetition. Although a sample with this value is very rare, this case must be handled: if any sample has the value 125, its value is changed to 126. The value 126 is very close to 125. The energy of a signal is proportional to the square of its amplitude, so the change in energy due to this substitution is |1 − 125²/126²| ≈ 1.58%, and only for this sample.
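The escape rule can be sketched as follows, assuming 8-bit samples held in a Python list; the function names are illustrative. The second helper reproduces the 1.58% energy figure for one altered sample.

```python
from typing import List

IDENTIFIER = 125  # repetition identifier for 8-bit PCM, as in the text

def escape_identifier(samples: List[int]) -> List[int]:
    """Replace every sample equal to the identifier with identifier + 1 (126)."""
    return [IDENTIFIER + 1 if s == IDENTIFIER else s for s in samples]

def energy_change(old: int, new: int) -> float:
    """Relative change in the energy of one sample, |1 - old^2 / new^2|."""
    return abs(1.0 - old ** 2 / new ** 2)

print(escape_identifier([3, 125, -7, 126]))     # [3, 126, -7, 126]
print(round(100 * energy_change(125, 126), 2))  # 1.58
```

After this step an original 125 and an original 126 are indistinguishable, so these rare samples are not restored bit-exactly.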

According to Weber's law, equally perceived differences have values proportional to absolute levels:

$$ \Delta\text{Response} \propto \frac{\Delta\text{Stimulus}}{\text{Stimulus}} \tag{2} $$

From this law the following equation is derived:

$$ r = k \ln\left(\frac{s}{s_0}\right) \tag{3} $$

Here,

r = response, s = stimulus, s0 = reference (threshold) stimulus, k = constant.

For this reason, signal intensity is measured in dB (decibels), and signals are compressed by μ-law or A-law before sampling. When the signal has previously been compressed by μ-law or A-law, the change in response due to replacing 125 with 126 is (126 − 125)/125 = 0.8%. When the signal is not compressed by μ-law or A-law, the change in response will be very small.

Assume 20 consecutive samples have the value 0. To represent them, 3 consecutive samples are needed:

s1 = 0 //value of sample

s2 = 125 //identifier of repetition

s3 = 19 //number of repetitions (s1 itself is the first of the 20 zeros, so 20 − 1 = 19)

Here, s3 is an unsigned 8-bit integer and may hold values from 0 to 255.

A problem may arise when the number of repetitions is more than 255. Assume 540 consecutive samples have the value 0; to represent them, 5 consecutive samples are needed:

s1 = 0 //value of sample

s2 = 125 //identifier of repetition

s3 = 255

s4 = 255

s5 = 29

From these values, the number of repetitions is 255 + 255 + 29 = 539 (= 540 − 1). When the number of repetitions is exactly 255, s3 = 255 and s4 = 0. In general, s3 = 255 means the repetition count continues into s4, s4 = 255 means it continues into s5, and so on.
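The repetition-count field described above, where a field of 255 means "the count continues in the next sample", can be sketched as a pair of helpers. The names `encode_count` and `decode_count` are hypothetical:

```python
from typing import List, Tuple

def encode_count(repetitions: int) -> List[int]:
    """Split a repetition count into unsigned 8-bit fields; 255 means 'continue'."""
    fields = []
    while repetitions >= 255:
        fields.append(255)
        repetitions -= 255
    fields.append(repetitions)  # the final field is always < 255
    return fields

def decode_count(fields: List[int], start: int) -> Tuple[int, int]:
    """Read a count beginning at fields[start]; return (count, index after it)."""
    total, i = 0, start
    while fields[i] == 255:
        total += 255
        i += 1
    return total + fields[i], i + 1

print(encode_count(19))   # [19]
print(encode_count(539))  # [255, 255, 29]
print(encode_count(255))  # [255, 0]
```

A run of 540 zeros is thus stored as [0, 125] followed by encode_count(540 − 1).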

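For 16- or 32-bit PCM data, the identifier of repetition is chosen in the same way, as 2^(N−1) − 3 for N-bit signed samples (for N = 8 this gives the value 125 used above). If this value occurs in a sample, the sample value is changed by adding 1.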

The compression ratio is defined as

$$ C_r = \frac{B_1}{B_o} \tag{4} $$

Here,

Cr = compression ratio, Bo = size before compression, B1 = size after compression.

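This compression should be applied when the number of repetitions is 3 or more, because a run of r + 1 equal samples is replaced by 3 samples (for r < 255), saving r − 2 samples. When the number of repetitions is less than 3, the signal is not compressed by this algorithm. A 9th bit could be used as the repetition identifier, but this technique would not be efficient for all signals, since it adds one bit to every sample (12.5% overhead for 8-bit PCM). The identifier-sample approach is always more efficient.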



Fig. 1. Sampling of an analog speech signal

MPEG-4 ALS (Audio Lossless Coding) is the most popular loss-less compression system. The exact formula used in MPEG-4 ALS to predict y_i from the previous samples y_{i−k} is shown in Eq. (5):

$$ \hat{y}_i = \left\lfloor \frac{\sum_{k=1}^{M} a_k\, y_{i-k} + 2^{Q-1}}{2^{Q}} \right\rfloor \tag{5} $$

Here, M = order of the predictor, a_k = linear prediction coefficients, Q = number of bits representing a_k.

Direct quantization of the predictor coefficients is not very efficient for transmission, since small errors in quantization can produce spectral errors and might lead to instability.
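Eq. (5) can be sketched with integer arithmetic: the coefficients a_k are integers scaled by 2^Q, the term 2^(Q−1) provides rounding, and the division by 2^Q becomes a right shift. This is a simplified illustration of the prediction step only, with made-up coefficients; it assumes the first M samples are passed through unpredicted and omits the coefficient quantization, block handling and entropy coding of the actual MPEG-4 ALS standard.

```python
from typing import List

def predict(y: List[int], i: int, a: List[int], q: int) -> int:
    """Eq. (5): predict y[i] from y[i-1], ..., y[i-M].

    a[k-1] holds the integer coefficient a_k, i.e. the real coefficient
    scaled by 2**q. Adding 2**(q-1) and shifting right by q rounds the
    fixed-point sum (Python's >> floors negative values as well).
    """
    acc = sum(a[k - 1] * y[i - k] for k in range(1, len(a) + 1))
    return (acc + (1 << (q - 1))) >> q

def residuals(y: List[int], a: List[int], q: int) -> List[int]:
    """Keep the first M samples as-is, then store prediction errors."""
    m = len(a)
    return y[:m] + [y[i] - predict(y, i, a, q) for i in range(m, len(y))]

# Hypothetical 2nd-order predictor y[i] ~ 2*y[i-1] - y[i-2] with Q = 8.
Q = 8
A = [2 * 2**Q, -1 * 2**Q]  # [512, -256]
print(residuals([0, 3, 6, 9, 12, 15], A, Q))  # [0, 3, 0, 0, 0, 0]
```

For a smooth ramp the residuals vanish after the first M samples; for a random signal they stay large, so entropy coding the residual gains little.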

III. ALGORITHM

Consider the signal shown in Fig. 1. The signal is sampled at a fixed interval Ts (the sampling period). After sampling, the sample values are s1, s2, s3, ..., s9 = 3, 0, 0, 0, 0, 4, -4, 1, 0. After applying the proposed compression technique, the number of samples is 8: s1, s2, s3, ..., s8 = 3, 0, 125, 3, 4, -4, 1, 0.

Here, s3 = 125 means that s2 is repeated, and s4 = 3 means that the number of repetitions is 3.

Only 1 sample is saved, since the number of repetitions is 3. When the number of repetitions is 100, (100 − 2) = 98 samples are saved.

So after compression, the signal is a PCM signal where sample values are always changing, and a level-crossing sampled signal where sample values are not changing.
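The compression rule can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB implementation. It assumes that the samples are 8-bit integers in a list, that the escape step for the value 125 is applied first, and that a run is encoded only when the number of repetitions is 3 or more.

```python
from typing import List

IDENTIFIER = 125     # repetition identifier for 8-bit PCM
MIN_REPETITIONS = 3  # runs with fewer repetitions are copied unchanged

def compress(samples: List[int]) -> List[int]:
    """Run-length compress 8-bit PCM samples using an identifier sample."""
    # Escape step: move any genuine 125 to 126 so the identifier is unambiguous.
    x = [IDENTIFIER + 1 if s == IDENTIFIER else s for s in samples]
    out: List[int] = []
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[j + 1] == x[i]:
            j += 1
        repetitions = j - i  # equal samples after the first one
        if repetitions >= MIN_REPETITIONS:
            out += [x[i], IDENTIFIER]
            while repetitions >= 255:  # a field of 255 means "count continues"
                out.append(255)
                repetitions -= 255
            out.append(repetitions)
        else:
            out += x[i:j + 1]  # short run: keep samples as plain PCM
        i = j + 1
    return out

print(compress([3, 0, 0, 0, 0, 4, -4, 1, 0]))  # [3, 0, 125, 3, 4, -4, 1, 0]
```

With this threshold, a run of r + 1 equal samples shrinks to 3 + ⌊r/255⌋ samples, so 100 repetitions save 98 samples.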

In commercial use of this software, the user is given the option of ignoring changes in the nibble bit. When the speaker is not saying anything and there is no background music, only the nibble bit (the rightmost, i.e. least significant, bit) may alter due to noise or the surroundings. Fig. 2 shows the algorithm flowchart for the proposed compression system. Although the sample value 125 is very rare, it must be handled; Fig. 3 shows the solution to this rare problem. The operations of Fig. 3 should be performed before the operations of Fig. 2.
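One plausible reading of "ignoring changes in the nibble bit" is to clear the rightmost bit of every sample before run detection, so that samples differing only in that bit form equal runs. The sketch below assumes that reading; the function name and the sample values are hypothetical, and the step is lossy.

```python
from typing import List

def clear_nibble_bit(samples: List[int]) -> List[int]:
    """Set the rightmost bit of every two's-complement sample to 0 (lossy)."""
    return [s & ~1 for s in samples]

# Hypothetical quiet segment where noise toggles only the rightmost bit.
quiet = [0, 1, 0, 1, 1, 0, 4, 5]
print(clear_nibble_bit(quiet))  # [0, 0, 0, 0, 0, 0, 4, 4]
```

In two's complement a change between 0 and −1 flips every bit, not only the rightmost one, so this masking merges 0/1 jitter but maps −1 to −2.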

Fig. 2. Flowchart of proposed compression system

Fig. 3. Flowchart for changing the value of a sample when any sample contains the value 125

IV. SIMULATION RESULTS

A MATLAB model was built to simulate and implement this theory. The theory was tested on 5 male and 5 female voices, and the percentage of compression varies from 10% to 28%.

In the following simulation (first signal), the original signal's length is 17501 samples and the compressed length is 13591. The percentage of compression is 22.34%, compression ratio = 0.7766. Waveforms of the original signal, the compressed signal and the signal after expansion of the compressed signal are shown in Figs. 4-6.

Fig. 4. Original PCM signal (first signal)

Fig. 5. Signal after compression (first signal)

In the following simulation (second signal), the original signal's length is 13501 samples and the compressed length is 10636. The percentage of compression is 21.22%, compression ratio = 0.7878. Waveforms of the original signal, the compressed signal and the signal after expansion of the compressed signal are shown in Figs. 7-9.

This theory presents a loss-less, one-to-one compression technique. After expansion, the original signal used as the input to compression is recovered as the output.
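Expansion reverses the process: whenever the identifier 125 appears, the preceding sample is repeated by the count stored in the following field(s). A minimal sketch, assuming 8-bit samples in a Python list and count fields in which 255 means "the count continues"; the name `expand` is illustrative.

```python
from typing import List

IDENTIFIER = 125  # repetition identifier for 8-bit PCM

def expand(compressed: List[int]) -> List[int]:
    """Rebuild PCM samples from a stream in the identifier/count format."""
    out: List[int] = []
    i = 0
    while i < len(compressed):
        s = compressed[i]
        if s == IDENTIFIER:
            # Sum the count fields; a field of 255 means the count continues.
            count = 0
            i += 1
            while compressed[i] == 255:
                count += 255
                i += 1
            count += compressed[i]
            out.extend([out[-1]] * count)  # repeat the preceding sample
        else:
            out.append(s)
        i += 1
    return out

print(expand([3, 0, 125, 3, 4, -4, 1, 0]))  # [3, 0, 0, 0, 0, 4, -4, 1, 0]
```

Applied to a compressed stream this returns the original samples, except that a sample originally equal to 125 comes back as 126.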

In the commercial case, the user is given the option of ignoring the nibble bit. If the user ignores the nibble bit, the compressed signal and the signal after expansion are as shown in Figs. 10 and 11, where the original signal is shown in Fig. 7.

Here, the original signal's length is 13501 and the compressed length is 8172. The percentage of compression is 39.47%, compression ratio = 0.6053. That is, merely ignoring changes in the nibble bit raises the percentage of compression from 21.22% to 39.47%, by a factor of almost two.

Fig. 6. Signal after expansion (first signal)

Fig. 7. Original PCM signal (second signal)

Fig. 8. Signal after compression (second signal)

Fig. 9. Signal after expansion (second signal)

Fig. 10. Signal after compression when tolerance is considered (second signal)

Fig. 11. Signal after expansion when tolerance is considered (second signal)

The reconstructed signal is almost identical to the original signal. The only difference occurs where the original signal is zero and only the rightmost bit (nibble bit) is changing; such changes are ignored here.
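The reported ratios and percentages follow directly from Eq. (4). The short check below recomputes them from the sample counts quoted in this section; the function names are ours.

```python
def compression_ratio(size_before: int, size_after: int) -> float:
    """Eq. (4): Cr = B1 / Bo (size after over size before compression)."""
    return size_after / size_before

def percent_compression(size_before: int, size_after: int) -> float:
    """Percentage of size saved, 100 * (1 - Cr)."""
    return 100.0 * (1.0 - compression_ratio(size_before, size_after))

cases = {
    "first signal": (17501, 13591),
    "second signal": (13501, 10636),
    "second signal, nibble bit ignored": (13501, 8172),
}
for name, (before, after) in cases.items():
    print(f"{name}: Cr = {compression_ratio(before, after):.4f}, "
          f"{percent_compression(before, after):.2f}%")
```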

V. DISCUSSION

Lossy compression systems perform much better than loss-less systems: their percentage of compression, cost of compression and time needed for compression are all much better [14]. Examples of loss-less compression algorithms are Adaptive Arithmetic Compress, Adaptive Huffman Compress, LZAH Compress and LZWAH Compress. In all of these algorithms, as in the proposed algorithm, the compression ratio depends on the input signal. They are completely loss-less but cannot always compress data: for small frame sizes, the size of the data after applying these operations can be larger than the original. MPEG-4 ALS is a recent loss-less-only audio compression standard, providing loss-less compression for multichannel PCM audio signals; an introductory description is available in [15]. Among all recent algorithms, MPEG-4 ALS 19 with optimum compression is the best [16]-[18].

A. Comparison

Compression algorithms are of two types: lossy and loss-less.

1) Comparing with Lossy Algorithms: A number of good lossy compression algorithms are currently available. For example, LPC reduces the transmitted data by a factor of more than twelve, and the mixed excitation linear prediction (MELP) algorithm also reduces data by a high factor [9]. These are good techniques for sending normal speech signals, but they are not suitable for storing high-quality music signals, as they are lossy. The modern goal of downsampling is to maintain good quality using less memory space [19]. As the proposed algorithm is loss-less, its compression ratio should be compared with those of current loss-less speech compression algorithms.

2) Comparing with Loss-less Algorithms: Recent popular lossless algorithms include Free Lossless Audio Codec (FLAC) [20], The True Audio (TTA), Apple Lossless, MPEG-4 ALS, Monkey's Audio, La, Shorten and Windows Media 9. Among them, the proposed compression technique is the simplest. The compression ratios of these lossless compression techniques are shown in Table I.

TABLE I
AVERAGE COMPRESSION RATIO OF THE LOSSLESS COMPRESSION TECHNIQUES

Technique          Average compression ratio
MPEG-4 ALS         0.543
FLAC               0.611
Apple Lossless     0.610
Windows Media 9    0.594
Monkey's Audio     0.583
Proposed system    0.740

Although the compression ratio of the proposed system is higher (i.e. worse, since Cr = B1/Bo), it can be decreased by ignoring changes in the nibble bit. However, data loss occurs in doing so, so this is not a good technique for archiving.

The proposed algorithm is compared with MPEG-4 ALS, the most popular lossless algorithm. The advantages of the proposed algorithm are listed below:

• The MPEG-4 ALS algorithm is more complex and time-consuming than the proposed algorithm.

• More complex hardware is needed to implement MPEG-4 ALS.

• MPEG-4 ALS cannot compress every signal: for a random signal, the compressed length is greater than the original length.

According to the block diagrams of the two systems, the proposed algorithm is faster and its implementation needs smaller and simpler hardware. In MPEG-4 ALS, each frame is predicted and then quantized according to Eq. (5), and the prediction coefficients are also quantized. A frame may contain values which cannot be predicted from previous samples; for a totally random signal, the error will be high for any prediction. In MPEG-4 ALS, both the error and the quantized coefficients are sent, and the error signal is entropy coded. For a random signal the error is also random and entropy coding is not efficient, so the size of the data increases. In such a case, the proposed compression system works faster and confirms that compression is not possible.
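The quick "compression is not possible" decision can be made without building the compressed stream, by counting the samples each run would save. A minimal sketch, assuming the identifier/count format of Section II (a run of r + 1 equal samples costs 3 + ⌊r/255⌋ samples and is encoded only when r ≥ 3) and applying the 125-to-126 escape first; the function names are hypothetical.

```python
from itertools import groupby
from typing import List

def samples_saved(samples: List[int]) -> int:
    """Number of samples the run-length scheme would remove."""
    x = [126 if s == 125 else s for s in samples]  # escape step first
    saved = 0
    for _, group in groupby(x):
        run = len(list(group))
        r = run - 1  # repetitions after the first sample
        if r >= 3:
            saved += run - (3 + r // 255)
    return saved

def is_compressible(samples: List[int]) -> bool:
    """True if at least one sample would be saved."""
    return samples_saved(samples) > 0

print(samples_saved([3, 0, 0, 0, 0, 4, -4, 1, 0]))  # 1
print(is_compressible([5, -3, 7, 1, 0, 2]))         # False
```

A single linear pass suffices, so a signal without runs is rejected immediately.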

B. Real-time Applications

The proposed technique can be adopted for real-time applications. In a real-time system such as cellular communication, the signal is always sampled and sent frame by frame. Suppose a frame contains 20 samples and, after applying this compression technique, the frame contains 12 samples. The 8 unused samples amount to 64 bits in an 8-bit PCM system. These bits need not carry signal data; other information can be sent in them, as shown in Fig. 12 and Fig. 13. Instead of sending extra samples, security data can be sent, which is called watermarking. In this case, good transparency and security are obtained. In loss-less compression systems, the transparency of watermarking is normally not good [21]. Watermarking cannot ensure security without encryption [22], [23].
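The spare capacity, and one way a receiver could locate it, are sketched below. The assumptions here are ours, not a description of Figs. 12 and 13: each frame of 20 original samples is compressed on its own, and the receiver walks the stream, counting how many original samples each field stands for, until all 20 are accounted for. The remaining slots are then treated as side data such as a watermark. The names and the example frame are hypothetical.

```python
from typing import List, Tuple

IDENTIFIER = 125  # repetition identifier for 8-bit PCM
FRAME = 20        # original samples per frame, as in the example above

def spare_bits(compressed_len: int, frame: int = FRAME, bits: int = 8) -> int:
    """Bits left over in a frame of `frame` slots after compression."""
    return (frame - compressed_len) * bits

def split_frame(slots: List[int], frame: int = FRAME) -> Tuple[List[int], List[int]]:
    """Separate compressed samples from side data in a received frame."""
    decoded, i = 0, 0
    while decoded < frame:
        if slots[i] == IDENTIFIER:
            i += 1
            while slots[i] == 255:  # 255 means the count continues
                decoded += 255
                i += 1
            decoded += slots[i]     # repetitions of the preceding sample
        else:
            decoded += 1            # an ordinary PCM sample
        i += 1
    return slots[:i], slots[i:]

# 20 original samples (3, eleven 0s, 4, -4, 1, 2, 2, 5, 6, 7) compressed to 12 slots,
# followed by 8 hypothetical side-data slots.
frame = [3, 0, 125, 10, 4, -4, 1, 2, 2, 5, 6, 7] + [1, 0, 1, 1, 0, 0, 1, 0]
data, side = split_frame(frame)
print(len(data), spare_bits(len(data)), side)  # 12 64 [1, 0, 1, 1, 0, 0, 1, 0]
```

Because the scheme's count fields determine exactly how many samples each part of the stream represents, no extra length header is needed under these assumptions.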



Fig. 12. Flowchart of sending end operation for RT application

Fig. 13. Flowchart of receiving end operation for RT application after receiving frame

C. Transmission Technique

In the proposed technique, speech signals are compressed without any loss of quality and without missing any bit. The compressed signal is not a normal speech signal: a change in one bit of the compressed signal may cause a huge change after decompression. Data should therefore be transmitted over a loss-less medium, in which errors are detected, for example by the checksum method [24], [25].
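A minimal sketch of error detection for a compressed frame, using CRC-32 from Python's standard `zlib` module as one example of a check value. The text refers to checksum methods only in general terms. Signed samples are mapped to bytes with `& 0xFF`, and appending a 4-byte CRC to each frame is an assumption for illustration.

```python
import zlib
from typing import List

def to_bytes(samples: List[int]) -> bytes:
    """Map signed 8-bit samples and unsigned count fields to raw bytes."""
    return bytes(s & 0xFF for s in samples)

def protect(samples: List[int]) -> bytes:
    """Append a big-endian CRC-32 of the compressed samples."""
    payload = to_bytes(samples)
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check(frame: bytes) -> bool:
    """Return True if the CRC-32 trailer matches the payload."""
    payload, trailer = frame[:-4], frame[-4:]
    return zlib.crc32(payload) == int.from_bytes(trailer, "big")

frame = protect([3, 0, 125, 3, 4, -4, 1, 0])
corrupted = bytearray(frame)
corrupted[3] ^= 0x08  # flip one bit of the count field: 3 becomes 11 (12 zeros, not 4)
print(check(frame), check(bytes(corrupted)))  # True False
```

CRC-32 detects every single-bit error, which matters here because a single flipped count bit changes the length of the expanded signal.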

VI. CONCLUSION AND FUTURE WORK

In this paper we presented a very efficient technique for compressing high-quality voice signals. Previously proposed loss-less compression algorithms are complex, and sometimes the compressed size is larger than the original size. We proposed a simpler lossless algorithm that quickly returns a notification when compression is not possible. In future work we will try to implement it on an FPGA board. It will also be used in real-time applications, where some frames will contain fewer samples than normal frames. Instead of sending extra samples, security data can be sent, which is called watermarking. In this case, good transparency and security will be obtained.

REFERENCES

[1] C. Giri, B. M. Rao, and S. Chattopadhyay, "Split Variable-length Input Huffman Code With Application to Test Data Compression for Embedded Cores in SOCs," International Journal of Electronics, vol. 96, no. 9, pp. 935-942, September 2009.

[2] S. Haykin, An Introduction to Analog and Digital Communications, 2nd ed., 1989.

[3] A. M. M. A. Najih, A. R. Ramli, A. Ibrahim, and Syed A. R., "Comparing Speech Compression Using Wavelets With Other Speech Compression Schemes," in Proc. IEEE SCOReD, Putrajaya, Malaysia, 2003.

[4] S. M. Qaisar, L. Fesquet, and M. Renaudin, "Computationally Efficient Adaptive Rate Sampling and Filtering," in Proc. EUSIPCO, Poznan, 2007.

[5] J. W. Mark and T. D. Todd, "A Nonuniform Sampling Approach to Data Compression," IEEE Transactions on Communications, vol. COM-29, pp. 24-32, January 1981.

[6] C. Rose and R. W. Donaldson, "Real-time Implementation and Evaluation of an Adaptive Silence Deletion Algorithm for Speech Compression," in Proc. IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, May 9-10, 1991.

[7] A. Gersho, "Advances in Speech and Audio Compression," Proceedings of the IEEE, vol. 82, no. 6, June 1994.

[8] M. Markovic and B. Boskovic, "Speech Compression Algorithms in Mobile Satellite Systems," in IEEE Proc., pp. 13-15, Nis, Yugoslavia, October 1999.

[9] S. Dusan, L. Flanagan, A. Karve, and M. Balaraman, "Speech Compression by Polynomial Approximation," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 2, February 2007.

[10] L. M. Supplee, R. P. Cohn, J. S. Collura, and A. V. McCree, "MELP: The New Federal Standard at 2400 bps," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 1997, pp. 1591-1594.

[11] T. Wang, K. Koishida, V. Cuperman, A. Gersho, and J. S. Collura, "A 1200/2400 b/s Coding Suite Based on MELP," in Proc. IEEE Workshop on Speech Coding, Ibaraki, Japan, 2002.

[12] M. Hans and R. W. Schafer, "Lossless Compression of Digital Audio," IEEE Signal Processing Magazine, vol. 18, no. 4, pp. 21-32, July 2001.

[13] R. E. Walpole and R. H. Myers, Probability & Statistics for Engineers and Scientists, 2nd ed.

[14] K. W. Ng, A. J. Pollard, L. R. Dacombe, R. D. McLeod, and H. C. Card, "Performance of Lossless Compression Algorithms on Voiceband Data," in Proc. IEEE CCECE'96, 1996.

[15] T. Liebchen and Y. A. Reznik, "MPEG-4 ALS: an Emerging Standard for Lossless Audio Coding," in Proc. Data Compression Conference, pp. 439-448, Snowbird, Utah, March 2004.

[16] F. Ghido and I. Tabus, "Benchmarking of Compression and Speed Performance for Lossless Audio Compression Algorithms," in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), March-April 2008.

[17] T. Liebchen, T. Moriya, N. Harada, Y. Kamamoto, and Y. A. Reznik, "The MPEG-4 Audio Lossless Coding (ALS) Standard - Technology and Applications," in 119th Convention, New York, NY, USA, October 7-10, 2005.

[18] T. Liebchen, "MPEG-4 ALS - The Standard for Lossless Audio Coding," The Journal of the Acoustical Society of Korea, vol. 28, no. 7, October 2009.

[19] L. Fang, O. C. Au, X. Wen, Y. Yang, and W. Tang, "An LMMSE-based Merging Approach for Subpixel-based Downsampling," in Proc. 17th European Signal Processing Conference (EUSIPCO), August 24-28, 2009.

[20] FLAC open-source audio compression program. Available: http://flac.sourceforge.net

[21] J. Dittmann, D. Megias, A. Lang, and J. Herrera-Joancomartí, "Theoretical Framework for a Practical Evaluation and Comparison of Audio Watermarking Schemes in the Triangle of Robustness, Transparency and Capacity," in Transactions on Data Hiding and Multimedia Security I, Springer LNCS 4300, Y. Q. Shi, Ed., pp. 1-40, 2006, ISBN 978-3-540-49071-5.

[22] P. C. van Oorschot, A. J. Menezes, and S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, Florida, 1996.

[23] M. A. Qadeer, R. Kasana, and S. Sayeed, "Encrypted Voice Calls with IP enabled Wireless Phones over GSM/CDMA/WiFi Networks," in Proc. IEEE International Conference on Computer Engineering and Technology (ICCET), pp. 218-222, 2009.

[24] D. V. Hall, Microprocessor and Interfacing, Programming and Hardware, 2nd ed.

[25] A. S. Tanenbaum, Computer Networks, 4th ed., 2002.
