A Theory of Loss-less Compression of High Quality Speech Signals with Comparison
Hussain Mohammed Dipu Kabir, Syed Bahauddin Alam, Md. Isme Azam, Rishad Ahmed,
Md. Riazul Islam, M. Asif Ul Hoque, Md. Abdul Matin
Bangladesh University of Engineering and Technology, Dhaka-1000, Bangladesh, [email protected]
Abstract: The field of speech compression has advanced rapidly due to cost-effective digital technology and diverse commercial applications. Voice communication requires a real-time system, and it is still not possible to compress signals in a real-time system without some loss. This paper presents a theory of loss-less digital compression for storing high quality speech signals, with emphasis on the quality of the speech signal. Listening to music likewise calls for high quality while consuming little memory. The method compresses an 8-bit PCM speech signal. Where sample values vary, they are kept unchanged; where they do not vary, only the number of samples holding the same value is saved. The compressed signal is also 8-bit PCM, but it must be expanded before it can be heard. This technique may also be used in real-time systems.
Index Terms: Compression, Lossless, Probability, Identifier, Repetition, LPC, MPEG-4.
I. INTRODUCTION
While modern lossy coding standards such as MP3 or
AAC can achieve high compression ratios with transparent
subjective quality, they do not preserve every single bit of
the original audio data. Thus, lossy coding methods are not
suited for editing or archiving applications, since multiple
coding or post-processing can reveal originally masked
distortions. Applying lossless entropy coding methods such
as Lempel-Ziv, Huffman or arithmetic coding directly to
the audio signal is not very efficient due to the long-
time correlations and the high range of values. Therefore,
conventional data compression tools such as Winzip or gzip
fail in the case of digital audio data.
Lossless audio coding enables the compression of digital
audio data without any loss in quality, through a perfect (i.e.
bit-identical) reconstruction of the original signal. Entropy
coding, of which Huffman coding is an example [1], is used to
compress data and underlies common file-zipping tools. These
algorithms do not always reduce the data size, especially for
small frames. Lossless compression has many applications in
the recording and distribution of audio. For instance, bit-exact
representation is strongly desirable in archival systems, studio
operations, collaborative work and music distribution over
broadband links. This paper presents a loss-less compression
technique for storing speech signals. Its real-time applications
are also discussed.
PCM or DPCM is used in cellular communication. Before
sampling, an analog low-pass filter is applied to avoid folded
(aliased) signals from higher frequencies. DPCM is the better
technique [2] because less data needs to be transferred. In
speech compression, Linear Predictive Coding (LPC) is used.
LPC is based on autoregressive (AR) signal modeling and is the
basis of speech compression for cell phones and digital
answering machines. LPC reduces the transmitted data by a
factor of more than twelve [3]. It is a lossy compression scheme
tailored specifically for speech and does not work well for
audio in general. In the adaptive rate sampling technique using
the Level Crossing Sampling Scheme (LCSS) [4], the bit rate
varies with the input and with time. LCSS does not always
compress the data, so a network will face difficulties in
implementing it. The sampling rate can be lowered to reduce the
bit rate of data transfer, but speech quality then degrades.
Non-uniform sampling methods can be efficient for compressing
stored data [5]. The proposed technique is better than
non-uniform sampling methods. In the silence detection
algorithm for speech [6], speech and silence are separated by
efficient coding, and each frame is declared as either silence
or speech. This is a lossy technique, and for the same output
quality it compresses less than the proposed algorithm.
Wideband audio compression is generally aimed at a quality that
is nearly indistinguishable from consumer compact-disc audio
[7]. Speech compression algorithms in mobile satellite systems
are also lossy [8].
The proposed method is similar to PCM when the signal
contains high frequencies, and similar to the level crossing
sampling scheme when the change in the signal is small. The
change must be so small that it is not noticed when the signal
is sampled as PCM. The compressed signal remains PCM, and no
additional identifier bit is added to mark repetition. Instead,
one sample is used as the identifier of repetition. One sample
holds one value, and the value used as the identifier should
not occur in the signal. If a sample holds the same value as
the identifier, that value is changed slightly.
An experimental implementation of the polynomial ap-
proximation method for speech compression was integrated
into the 2400 b/s mixed excitation linear prediction (MELP)
speech coder standard [9], [10]. The version of this standard
which was used for implementation was an enhanced MELP
coder [11]. In this system, the frequency range is divided into
five contiguous frequency bands and synthesis is based on
UKSim Fourth European Modelling Symposium on Computer Modelling and Simulation
978-0-7695-4308-6/10 $26.00 © 2010 IEEE
DOI 10.1109/EMS.2010.33
voiced/unvoiced excitations for each band. In speech
compression by the polynomial approximation method, a new
speech coder is proposed that operates at a transmission rate
of 1533 b/s and, for all noisy conditions tested, performs
better than the 2400 b/s standard speech coder. However, these
modern techniques are also lossy and are not applicable to
high quality music compression.
For a detailed introduction to the basic principles of loss-
less audio compression and an historical overview of the first
loss-less audio compressors, readers are directed to [12].
In practice, a signal contains a large number of zeros. This
occurs when a person talks in a low-noise environment. In
music, some samples change very sharply while others are
zero. The proposed compression technique is effective for
such signals containing a large number of zeros.
Using the proposed compression technique, 10% to 85%
compression occurs, so less memory is needed to store the
signal. The compression ratio depends on the speech signal.
The method places no constraints on the sampling rate or the
quality of speech. A tolerance level can also be used to obtain
better compression, at the cost of degraded signal quality. A
tolerance level of 0.02 means that when two samples differ by
0.02 in value, their values are leveled. The proposed technique
can also be combined with entropy coding for better compression.
II. MATHEMATICAL BACKGROUND
The proposed technique compresses PCM signals. In practice,
a signal contains a large number of samples with the same
value. Suppose the samples are taken as 8-bit PCM; sample
values then range from -128 to +127 in 2's complement
representation. A sample value of 120 or more is rare, and most
samples hold a value near zero. The sample values follow a
normal distribution with standard deviation σ = 6 to 25,
depending on the signal source [13].
The normal distribution is given by
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \tag{1}$$
Here,
σ^2 = variance, μ = mean, x = value of a sample.
So a sample having the value 125 is very rare. According to
Eq. (1), the probability density at 125 ranges from
3.7574 × 10^-96 (calculated using Matlab with σ = 6) to
5.9626 × 10^-8 (with σ = 25). In a signal of 14,001 samples, no
sample has the value 125. The value 127 is not as rare, because
any sample with a value of 127 or more is saved as 127.
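The density values quoted above can be checked with a few lines of Python. This is a minimal sketch that assumes a zero mean (μ = 0), which the text implies but does not state; the function name `normal_pdf` is chosen here for illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution, written as in Eq. (1)."""
    variance = sigma ** 2
    return math.exp(-((x - mu) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

# Evaluate the density at the identifier value 125 for both ends of the
# standard-deviation range given in the text (mu = 0 is an assumption).
for sigma in (6.0, 25.0):
    print(f"sigma = {sigma:4.1f}: f(125) = {normal_pdf(125, 0.0, sigma):.4e}")
```

Both results are far below 1/14001, which agrees with the observation that the value 125 did not occur in a signal of 14,001 samples.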
Here, the value 125 is used as the identifier of repetition.
Although a sample is very unlikely to have this value, the case
must be handled. If any sample has the value 125, it is changed
to 126, which is very close to 125. The energy of a signal is
proportional to the square of its amplitude, so the percentage
change in energy due to this change is (1 - 125^2/126^2) = 1.58%
for this sample only.
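The substitution of 126 for 125 and the resulting energy change can be sketched as follows. The helper names `free_identifier` and `energy_change` are illustrative and do not come from the paper.

```python
IDENTIFIER = 125  # value reserved as the repetition identifier (from the text)

def free_identifier(samples, identifier=IDENTIFIER):
    """Return a copy in which every sample equal to the identifier becomes identifier + 1."""
    return [s + 1 if s == identifier else s for s in samples]

def energy_change(old, new):
    """Relative energy change of one sample, taking energy as amplitude squared."""
    return 1.0 - (old ** 2) / (new ** 2)

signal = [3, 125, -4, 127]
cleaned = free_identifier(signal)            # the 125 is moved to 126
print(cleaned)
print(f"{100 * energy_change(125, 126):.2f}%")
```

After this step the value 125 no longer occurs in the signal, so it can safely serve as a marker.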
According to Weber's law, equally perceived differences
have values proportional to absolute levels:
$$\Delta\text{Response} \propto \frac{\Delta\text{Stimulus}}{\text{Stimulus}} \tag{2}$$
From this law the following equation is derived:
$$r = k \ln\left(\frac{s}{s_0}\right) \tag{3}$$
Here,
r = response, s = stimulus, s0 = reference stimulus, k = constant.
For this reason, signal intensity is measured in dB (decibel),
and signals are compressed by μ-law or A-law before sampling.
When the signal has previously been compressed by μ-law or
A-law, the change in response will be (126 - 125)/125 = 0.8%.
When it has not, the change in response will be very small.
Suppose 20 consecutive samples have the value 0. Three
consecutive samples are then needed to represent them:
s1 = 0 //value of sample
s2 = 125 //identifier of repetition
s3 = 19 //number of repetitions
Here, s3 is an unsigned 8-bit integer and may hold a value from
0 to 255.
A problem may arise when the number of repetitions exceeds
255. Suppose 540 consecutive samples have the value 0. Five
consecutive samples are then needed to represent them:
s1 = 0 //value of sample
s2 = 125 //identifier of repetition
s3 = 255
s4 = 255
s5 = 29
From these samples, the number of repetitions is 255 + 255 + 29 = 539.
When the number of repetitions is exactly 255, s3 = 255 and
s4 = 0. s3 = 255 means that the repetition count continues in
s4, s4 = 255 means that it continues in s5, and so on.
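The continuation rule for repetition counts can be illustrated with a short sketch. The function names `encode_count` and `decode_count` are hypothetical; the behaviour follows the examples in the text, where a count sample of 255 means that another count sample follows.

```python
def encode_count(repetitions):
    """Split a repetition count into unsigned 8-bit count samples."""
    if repetitions < 0:
        raise ValueError("repetition count must be non-negative")
    counts = []
    while repetitions >= 255:
        counts.append(255)        # 255 signals that the count continues
        repetitions -= 255
    counts.append(repetitions)    # final sample is always below 255
    return counts

def decode_count(stream, pos):
    """Read count samples from stream[pos]; return (repetitions, next position)."""
    total = 0
    while stream[pos] == 255:
        total += 255
        pos += 1
    total += stream[pos]
    return total, pos + 1

print(encode_count(19))    # 20 equal samples: one value plus 19 repetitions
print(encode_count(539))   # 540 equal samples
print(encode_count(255))   # exactly 255 repetitions needs a trailing 0
```

Because the last count sample is always smaller than 255, the decoder knows where the count ends without any extra marker.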
For 16- or 32-bit PCM data, (2^N - 3) is used as the
identifier of repetition; taking N as the number of magnitude
bits, N = 7 reproduces the 8-bit identifier 125. If this number
occurs as a sample value, the value is changed by adding 1.
The compression ratio is defined as
$$C_r = \frac{B_1}{B_o} \tag{4}$$
Here,
Cr = compression ratio, Bo = size before compression,
B1 = size after compression.
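As a worked instance of Eq. (4), the sketch below computes the compression ratio and the corresponding percentage of compression. The sizes are those reported for the first test signal in Section IV; the function names are illustrative.

```python
def compression_ratio(size_before, size_after):
    """Eq. (4): size after compression divided by size before compression."""
    return size_after / size_before

def percentage_compression(size_before, size_after):
    """Share of the original size that is saved, in percent."""
    return 100.0 * (1.0 - size_after / size_before)

before, after = 17501, 13591   # first test signal, Section IV
print(round(compression_ratio(before, after), 4))
print(round(percentage_compression(before, after), 2))
```

With this definition a smaller ratio means better compression, which is how the values in Table I should be read.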
This compression should be applied when the number of
repetitions is 3 or more. When it is less than 3, the signal is
not compressed by this algorithm.
A 9th bit could be used as the repetition identifier, but that
approach would not be efficient for all signals. The
identifier-sample scheme is always more efficient.
Fig. 1. Sampling of an analog speech signal
MPEG-4 ALS is the most popular loss-less compression
system. The exact formula used in MPEG-4 ALS to predict yi
from the previous samples yi-k is shown in Eq. (5).
$$y_i = \left( \sum_{k=1}^{M} a_k\, y_{i-k} + 2^{Q-1} \right) / 2^{Q} \tag{5}$$
where M = order of the predictor, ak = linear prediction
coefficients, Q = number of bits representing ak.
Direct quantization of the predictor coefficients is not very
efficient for transmission, since small quantization errors
can produce spectral errors and may lead to instability.
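Eq. (5) can be read as fixed-point arithmetic: the coefficients are integers scaled by 2^Q, and adding 2^(Q-1) before dividing rounds the result. The sketch below follows that reading. Treating the division as an arithmetic right shift is an assumption made here, and `predict` is an illustrative name, not part of any MPEG-4 ALS reference software.

```python
def predict(history, coeffs, q):
    """Predict the next sample from past samples as in Eq. (5).

    history[-1] is y_{i-1}, history[-2] is y_{i-2}, and so on.
    coeffs[k-1] is the integer coefficient a_k, scaled by 2**q (assumption).
    """
    order = len(coeffs)                       # M in Eq. (5)
    acc = sum(coeffs[k - 1] * history[-k] for k in range(1, order + 1))
    return (acc + (1 << (q - 1))) >> q        # add 2^(Q-1), then divide by 2^Q

# Second-order example with Q = 4: a_1 = 2.0 and a_2 = -1.0 scaled by 16,
# which extrapolates a straight line through the last two samples.
coeffs = [32, -16]
predicted = predict([10, 12], coeffs, 4)
residual = 14 - predicted                     # error signal if the true sample is 14
print(predicted, residual)
```

Only the residual and the coefficients need to be coded, which is why a poorly predictable signal leaves little room for compression.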
III. ALGORITHM
Consider the signal shown in Fig. 1.
The signal is sampled at a fixed interval Ts (the sampling
period). After sampling, the sample values are s1, s2,
s3, ..., s9 = 3, 0, 0, 0, 0, 4, -4, 1, 0.
After applying the proposed compression technique, the number
of samples is 8: s1, s2, s3, ..., s8 = 3, 0, 125, 3, 4, -4, 1, 0.
Here s3 = 125 means that s2 is repeated.
s4 = 3 means that the number of repetitions is 3.
Only 1 sample is saved, as the number of repetitions is 3.
When the number of repetitions is 100, (100 - 2) = 98 samples
are saved.
After compression, the signal therefore behaves as a PCM
signal where the sample values keep changing, and as a
level-crossing sampled signal where they do not change.
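The compression and expansion steps described in this section can be sketched in Python as below. This is an illustrative reading of the text, not the authors' Matlab code: the names `compress` and `expand` are chosen here, samples are held in a plain list of integers, and the threshold of 3 repetitions is taken from Section II.

```python
IDENTIFIER = 125        # repetition identifier for 8-bit PCM (from the text)
MIN_REPETITIONS = 3     # runs with fewer repetitions are stored unchanged

def compress(samples):
    """Replace each long run by: value, IDENTIFIER, count sample(s)."""
    # Free the identifier value first (125 becomes 126).
    samples = [IDENTIFIER + 1 if s == IDENTIFIER else s for s in samples]
    out, i = [], 0
    while i < len(samples):
        run = 1
        while i + run < len(samples) and samples[i + run] == samples[i]:
            run += 1
        repetitions = run - 1
        if repetitions >= MIN_REPETITIONS:
            out += [samples[i], IDENTIFIER]
            while repetitions >= 255:         # 255 means the count continues
                out.append(255)
                repetitions -= 255
            out.append(repetitions)
        else:
            out += [samples[i]] * run
        i += run
    return out

def expand(stream):
    """Invert compress(): rebuild every run from its value and count."""
    out, i = [], 0
    while i < len(stream):
        value = stream[i]
        i += 1
        if i < len(stream) and stream[i] == IDENTIFIER:
            i += 1
            repetitions = 0
            while stream[i] == 255:
                repetitions += 255
                i += 1
            repetitions += stream[i]
            i += 1
            out += [value] * (repetitions + 1)
        else:
            out.append(value)
    return out

original = [3, 0, 0, 0, 0, 4, -4, 1, 0]       # the nine samples of Fig. 1
packed = compress(original)
print(packed)
print(expand(packed) == original)
```

Since a run is replaced only when it shortens the data, the output of this sketch is never longer than its input, and a signal without runs passes through unchanged.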
In commercial use of this software, the user is given the
option of ignoring changes in the nibble bit (the rightmost
bit). When the speaker is silent and there is no background
music, only the nibble bit may alter, due to noise or the
surroundings. Fig. 2 shows the algorithm flowchart for the
proposed compression system. Though the sample value 125 is
very rare, it must be considered. Fig. 3 shows the solution to
this rare problem. The operations of Fig. 3 should be performed
before the operations of Fig. 2.
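One plausible reading of "avoiding change of nibble bit" is to clear the rightmost (least significant) bit of every sample before compression, so that noise which only toggles that bit no longer breaks a run. The sketch below follows this reading; it is an assumption rather than the authors' stated procedure, and `clear_lsb` is an illustrative name. Note that this step is lossy.

```python
def clear_lsb(samples):
    """Set the least significant bit of every sample to zero."""
    # In Python, ~1 is ...11111110, so the mask also works for negative values.
    return [s & ~1 for s in samples]

noisy_silence = [0, 1, 0, 1, 1, 0, 5, -3]
leveled = clear_lsb(noisy_silence)
print(leveled)
```

All resulting values are even, so the flickering stretch becomes a run of equal samples, and the odd identifier value 125 cannot occur afterwards.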
Fig. 2. Flowchart of proposed compression system
Fig. 3. Flowchart for changing value of sample, when any sample contains value 125
IV. SIMULATION RESULTS
A model Matlab code was built to simulate and implement
this theory. The theory was tested on 5 male and 5 female
voices. The percentage of compression varies from 10% to 28%.
In the following simulation (first signal), the original
signal's length is 17501 and the compressed length is 13591.
The percentage of compression is 22.34%, and the compression
ratio is 0.7766. Waveforms of the original signal, the
compressed signal and the signal after expansion of the
compressed signal are shown in Fig. 4-6.
In the following simulation (second signal), the original
signal's length is 13501 and the compressed length is 10636.
The percentage of compression is 21.22%, and the compression
ratio is 0.7878. Waveforms of the original signal, the
compressed signal and the signal after expansion of the
compressed signal are shown in Fig. 7-9.
Fig. 4. Original PCM signal (First signal)
Fig. 5. Signal after compression (First signal)
This theory presents a loss-less, one-to-one compression
technique. After expansion, the output is identical to the
original signal used as input to the compression.
In the commercial case, the user is given the option of
ignoring the nibble bit. If the user does so, the compressed
signal and the signal after expansion of the compressed signal
are as shown in Fig. 10 and Fig. 11, where the original signal
is the one shown in Fig. 7.
Here, the original signal's length is 13501 and the compressed
length is 8172. The percentage of compression is 39.47%, and
the compression ratio is 0.6053. That is, merely ignoring
changes in the nibble bit raises the percentage of compression
from 21.22% to 39.47%, a factor of about 1.86.
In the reconstructed signal, it is seen that the signal is
almost identical to the original signal. The only difference
arises where the original signal is zero and only the rightmost
bit (nibble bit) is changing. Here this change is not considered.
Fig. 6. Signal after expansion (First signal)
Fig. 7. Original PCM signal (second signal)
Fig. 8. Signal after compression (second signal)
Fig. 9. Signal after expansion (second signal)
Fig. 10. Signal after compression when tolerance is considered (second signal)
Fig. 11. Signal after expansion when tolerance is considered (second signal)
V. DISCUSSION
The performance of lossy compression systems is much better
than that of loss-less systems: their percentage of
compression, cost of compression and compression time are all
much better [14]. Loss-less compression algorithms include
Adaptive Arithmetic Compress, Adaptive Huffman Compress, LZAH
Compress and LZWAH Compress. In all of these algorithms, as in
the proposed one, the compression ratio depends on the input
signal. They are completely loss-less but cannot always
compress the data: for signals with a small frame size, the
data after applying them is larger than the original data.
MPEG-4 ALS is a recent loss-less-only audio compression
standard, providing loss-less compression for PCM multichannel
audio signals, with an introductory description available in
[15]. Among all recent algorithms, MPEG-4 ALS 19 with optimum
compression is the best [16]-[18].
A. Comparison
Compression algorithms are of two types: lossy and loss-less.
1) Comparing with Lossy Algorithms: A number of good
lossy compression algorithms are available at present. For
example, LPC reduces the transmitted data by a factor of more
than twelve, and the mixed excitation linear prediction (MELP)
algorithm also reduces data by a high factor [9]. These are
good techniques for sending normal speech signals, but because
they are lossy they are not suitable for storing high quality
music signals. The modern goal of downsampling is to maintain
good quality using less memory [19]. As the proposed algorithm
is loss-less, its compression ratio should be compared with
those of current loss-less speech compression algorithms.
2) Comparing with Loss-less Algorithms: Recent popular
lossless algorithms include Free Lossless Audio Codec (FLAC)
[20], The True Audio (TTA), Apple Lossless, MPEG-4 ALS,
Monkey's Audio, La, Shorten and Windows Media 9. Among them,
the proposed compression technique is the simplest. The
compression ratios of the lossless compression techniques are
shown in Table I. Although the compression ratio of the
proposed system is higher, it can be decreased by ignoring
changes in the nibble bit. Doing so causes data loss, however,
so it is not a good technique for archiving.
TABLE I
AVERAGE COMPRESSION RATIO OF THE LOSSLESS COMPRESSION TECHNIQUES
Technique          Average compression ratio
MPEG-4 ALS         0.543
FLAC               0.611
Apple Lossless     0.610
Windows Media 9    0.594
Monkey's Audio     0.583
Proposed system    0.740
The proposed algorithm is compared with MPEG-4 ALS, the
most popular lossless algorithm. The advantages of the
proposed algorithm are listed below:
• The MPEG-4 ALS algorithm is more complex and time
consuming than the proposed algorithm.
• More complex hardware is needed to implement MPEG-4 ALS.
• MPEG-4 ALS cannot compress every signal. For a random
signal, the compressed length is greater than the original
length.
According to the block diagrams of the two systems, the
proposed algorithm is faster and its implementation needs
smaller and simpler hardware. In MPEG-4 ALS, a frame is
predicted and then quantized according to Eq. (5). The
prediction coefficients are also quantized. A frame may contain
values that cannot be predicted from previous samples. For a
totally random signal, the error will be high for any
prediction. In MPEG-4 ALS both the error and the quantized
coefficients are sent, and the error signal is entropy coded.
For a random signal the error is also random, entropy coding is
not efficient, and the size of the data increases. In such a
case the proposed compression system works faster and confirms
that compression is not possible.
B. Real-time Applications
The proposed technique can be adapted for real-time
applications. In a real-time system such as cellular
communication, the signal is always sampled and sent frame by
frame. Suppose a frame contains 20 samples and, after this
compression technique is applied, it contains 12 samples. The
8 unused samples amount to 64 bits in an 8-bit PCM system.
These bits need not carry speech data, and other information
can be sent in them, as shown in Fig. 12 and Fig. 13. Instead
of sending extra samples, security data can be sent, which is
called watermarking. In this case good transparency and
security are obtained. In loss-less compression systems, the
transparency of watermarking is not normally good [21].
Watermarking cannot ensure security without encryption [22],
[23].
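The frame-by-frame idea can be sketched as follows. The flowcharts of Fig. 12 and Fig. 13 are not reproduced in the text, so the details here are assumptions: the sender fills the freed slots of a fixed 20-sample frame with auxiliary bytes (zero-padded), and the receiver expands samples until the frame is complete and treats the remainder as auxiliary data. The names `pack_frame` and `unpack_frame` are hypothetical.

```python
IDENTIFIER = 125   # repetition identifier from the text
FRAME_LEN = 20     # frame size used in the example above

def compress(samples):
    """Run-length compress one frame (runs with 3 or more repetitions only)."""
    samples = [IDENTIFIER + 1 if s == IDENTIFIER else s for s in samples]
    out, i = [], 0
    while i < len(samples):
        run = 1
        while i + run < len(samples) and samples[i + run] == samples[i]:
            run += 1
        repetitions = run - 1
        if repetitions >= 3:
            out += [samples[i], IDENTIFIER]
            while repetitions >= 255:
                out.append(255)
                repetitions -= 255
            out.append(repetitions)
        else:
            out += [samples[i]] * run
        i += run
    return out

def pack_frame(samples, aux):
    """Sender: compress the frame, then fill the freed slots with auxiliary bytes."""
    body = compress(samples)
    free = len(samples) - len(body)
    filler = list(aux[:free])
    filler += [0] * (free - len(filler))      # zero padding is an assumption
    return body + filler

def unpack_frame(frame, frame_len=FRAME_LEN):
    """Receiver: expand until frame_len samples exist; the rest is auxiliary data."""
    samples, i = [], 0
    while len(samples) < frame_len:
        value = frame[i]
        i += 1
        # A run adds at least 4 samples, so the sample that completes the
        # frame is always a literal and the next slot must not be inspected.
        if len(samples) + 1 < frame_len and frame[i] == IDENTIFIER:
            i += 1
            repetitions = 0
            while frame[i] == 255:
                repetitions += 255
                i += 1
            repetitions += frame[i]
            i += 1
            samples += [value] * (repetitions + 1)
        else:
            samples.append(value)
    return samples, frame[i:]

speech = [3] + [0] * 10 + [4, -4, 1] + [0] * 5 + [2]   # one 20-sample frame
frame = pack_frame(speech, [125, 7, 7])                # auxiliary bytes are arbitrary
decoded, extra = unpack_frame(frame)
print(len(frame), decoded == speech, extra)
```

The frame length on the channel stays fixed, so no length field is needed; the receiver finds the boundary by counting reconstructed samples.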
Fig. 12. Flowchart of sending end operation for RT application
Fig. 13. Flowchart of receiving end operation for RT application after receiving frame
C. Transmission Technique
In the proposed technique, speech signals are compressed
without any loss of quality and without missing any bit. The
compressed signal is not a normal speech signal: a change in
one bit may cause a huge change after decompression. Data
should therefore be transmitted through a loss-less medium, in
which errors are identified, for example by the check-sum
method [24], [25].
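A simple way to detect such errors is to append a checksum to each compressed frame. The sketch below uses an 8-bit additive checksum purely as an illustration; the paper does not say which check-sum variant is intended, and the function names are hypothetical.

```python
def checksum8(data):
    """Sum of all bytes modulo 256 (negative samples are taken as two's complement)."""
    return sum(b & 0xFF for b in data) & 0xFF

def add_checksum(frame):
    """Sender: append the checksum as one extra byte."""
    return list(frame) + [checksum8(frame)]

def verify(frame_with_sum):
    """Receiver: recompute the checksum and compare with the received byte."""
    *payload, received = frame_with_sum
    return checksum8(payload) == received

protected = add_checksum([3, 0, 125, 3, 4, -4, 1, 0])
corrupted = list(protected)
corrupted[3] ^= 0x01          # one flipped bit turns the count 3 into 2
print(verify(protected), verify(corrupted))
```

A flipped bit in a count sample would otherwise change the length of the expanded signal, which is why a damaged frame should be rejected or retransmitted rather than expanded.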
VI. CONCLUSION AND FUTURE WORK
In this paper we presented a very efficient technique for
compressing high quality voice signals. Previously proposed
loss-less compression algorithms are complex, and sometimes
the compressed size is larger than the original size. We
proposed a simpler lossless algorithm that quickly reports when
compression is not possible. In future we will try to implement
it on an FPGA board. It will also be used in real-time
applications, where some frames will contain fewer samples than
normal frames. Instead of sending extra samples, security data
can be sent, which is called watermarking. In this case good
transparency and security will be obtained.
REFERENCES
[1] C. Giri, B. M. Rao, S. Chattopadhyay, "Split Variable-length Input Huffman Code With Application to Test Data Compression for Embedded Cores in SOCs," in International Journal of Electronics, vol. 96, issue 9, pp. 935-942, September 2009.
[2] Simon Haykin, An Introduction to Analog and Digital Communications, 2nd ed., 1989.
[3] A. M. M. A. Najih, A. R. Ramli, A. Ibrahim, Syed A. R., "Comparing Speech Compression Using Wavelets With Other Speech Compression Schemes," in (SCOReD) IEEE Proceedings, 2003, Putrajaya, Malaysia.
[4] S. M. Qaisar, L. Fesquet, M. Renaudin, "Computationally Efficient Adaptive Rate Sampling and Filtering," in EUSIPCO, Poznan, 2007.
[5] J. W. Mark and T. D. Todd, "A Nonuniform Sampling Approach to Data Compression," IEEE Transactions on Communications, vol. COM-29, pp. 24-32, January 1981.
[6] C. Rose, R. W. Donaldson, "Real-time Implementation and Evaluation of an Adaptive Silence Deletion Algorithm for Speech Compression," in IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, May 9-10, 1991.
[7] A. Gersho, "Advances in Speech and Audio Compression," in Proceedings of the IEEE, vol. 82, no. 6, June 1994.
[8] M. Markovic, B. Boskovic, "Speech Compression Algorithms in Mobile Satellite Systems," in IEEE Proc., pp. 13-15, Nis, Yugoslavia, October 1999.
[9] S. Dusan, L. Flanagan, A. Karve, M. Balaraman, "Speech Compression by Polynomial Approximation," in IEEE Trans. on Audio, Speech and Language Processing, vol. 15, no. 2, February 2007.
[10] L. M. Supplee, R. P. Cohn, J. S. Collura, and A. V. McCree, "MELP: The new federal standard at 2400 bps," in IEEE Int. Conf. Acoust., Speech, Signal Process., 1997, pp. 1591-1594.
[11] T. Wang, K. Koishida, V. Cuperman, A. Gersho, and J. S. Collura, "A 1200/2400 b/s coding suite based on MELP," in IEEE Workshop Speech Coding, Ibaraki, Japan, 2002.
[12] M. Hans and R. W. Schafer, "Lossless Compression of Digital Audio," IEEE Signal Processing Magazine, vol. 18, issue 4, pp. 21-32, July 2001.
[13] R. E. Walpole, R. H. Myers, Probability & Statistics for Engineers and Scientists, 2nd ed.
[14] K. W. Ng, A. J. Pollard, L. R. Dacombe, R. D. McLeod, and H. C. Card, "Performance of Lossless Compression Algorithms on Voiceband Data," in CCECE'96, IEEE, 1996.
[15] T. Liebchen and Y. A. Reznik, "MPEG-4 ALS: an Emerging Standard for Lossless Audio Coding," in Proceedings of the Data Compression Conference, pp. 439-448, Snowbird, Utah, March 2004.
[16] F. Ghido and I. Tabus, "Benchmarking of Compression and Speed Performance for Lossless Audio Compression Algorithms," in International Conference on Acoustics, Speech and Signal Processing, ICASSP, March-April 2008.
[17] T. Liebchen, T. Moriya, N. Harada, Y. Kamamoto, and Y. A. Reznik, "The MPEG-4 Audio Lossless Coding (ALS) Standard - Technology and Applications," in 119th Convention, October 7-10, 2005, New York, NY, USA.
[18] T. Liebchen, "MPEG-4 ALS - The Standard for Lossless Audio Coding," in The Journal of the Acoustical Society of Korea, vol. 28, no. 7, October 2009.
[19] L. Fang, O. C. Au, X. Wen, Y. Yang, W. Tang, "An LMMSE-based Merging Approach for Subpixel-based Downsampling," in 17th European Signal Processing Conference (EUSIPCO), August 24-28, 2009.
[20] FLAC open-source audio compression program. Available: http://flac.sourceforge.net
[21] J. Dittmann, D. Megias, A. Lang, J. Herrera-Joancomartí, "Theoretical Framework for a Practical Evaluation and Comparison of Audio Watermarking Schemes in the Triangle of Robustness, Transparency and Capacity," in Transactions on Data Hiding and Multimedia Security I, Springer LNCS 4300, Editor Yun Q. Shi, pp. 1-40, ISBN 978-3-540-49071-5, 2006.
[22] P. C. van Oorschot, A. J. Menezes, and S. A. Vanstone, Handbook of Applied Cryptography, CRC Press Inc., Florida, 1996.
[23] M. A. Qadeer, R. Kasana, S. Sayeed, "Encrypted Voice Calls with IP enabled Wireless Phones over GSM/CDMA/WiFi Networks," IEEE Proc., International Conference on Computer Engineering and Technology (ICCET), pp. 218-222, 2009.
[24] D. V. Hall, Microprocessor and Interfacing, Programming and Hardware, 2nd ed.
[25] A. S. Tanenbaum, Computer Networks, 4th ed., 2002.