UNIVERSITY OF VICTORIA Department of Electrical and Computer Engineering ELEC 484 – Audio Signal Processing Final Project Direct FFT/IFFT based Phase Vocoder for Implementing Digital Audio Effects Prepared by: Tim I. Perry V00213455 May, 2009 Prepared for: Dr. Peter Driessen ELEC 484 University of Victoria

Tim Perry V00213455 2009-05-21

Contents

1. Design Overview
  1.1. Analysis Portion
    1.1.1. Windowing
    1.1.2. Circular Shift and FFT
    1.1.3. Overview of Analysis Stage
  1.2. Resynthesis Portion
  1.3. Phase Unwrapping
2. Testing Hop Size with Vectors
3. Testing with Cosine Waves
  3.1. Cosine wave input with integer # of samples/cycle
  3.2. Cosine wave input with non-integer # of samples/cycle
  3.3. Non-integer windowing with cosine wave input
4. Waterfall Plots
  4.1. Phase vs. Time, Instantaneous Frequency, and Frequency Resolution
  4.2. Amplitude, Magnitude, and Phase in Time-Frequency Plane
    Non-Integer Samples per Cycle, Fractional Cycle Per Segment
5. Cyclic Shift
  5.1. Cyclic Shift with Cosine Wave Input
    Without Cyclic Shift
  5.2. Cyclic Shift with Kronecker Delta Function Input
    With Cyclic Shift
    Without Cyclic Shift
6. Preliminary Audio Testing
7. Implementing Audio Effects
  7.1. Time Stretching
  7.2. Pitch Shifting
  7.3. Stable/Transient Components Separation
  7.4. Robotization
  7.5. Whisperization
  7.6. Denoising
  7.7. Wha-Wha Filter in Freq Domain
    Preliminary BPF Design
    Frequency Domain Wha-Wha Implementation
8. Audio Compression Using Phase Vocoder
  8.1. Data Compression using a Threshold Amplitude for Eliminating Bins
  8.2. Data Compression by Keeping N Strongest Frequency Components
9. Conclusions
REFERENCES
APPENDIX A – List of MATLAB files
APPENDIX B – Two Example Implementations
  1. pv_WhaBPF.m
  2. pv_Pitchshift.m
APPENDIX C – waterfall_Plot.m


ELEC 484 Final Project

Phase Vocoder Implementation

PROJECT GUIDELINES FOR PHASE 1

Implement a phase vocoder using the ideas in DAFX chapter 8.

a) Review Assignment 5 question 1c windowed overlapping segments, raised cosine windows, cyclic shift, and overlap-add (DAFX figure 8.5)

b) Test with a cosine wave with an integer and a non-integer number of samples per cycle, verify that after the overlap add the output is the same as the input. Choose the frequency such that there are many cycles per segment, about 1 cycle per segment and a small fraction of a cycle per segment. (6 frequencies total).

c) Plot the amplitudes in the time/frequency plane (3D or color coded plot of amplitude vs time vs frequency) for each case and interpret the result

d) Plot the phases in the time/frequency plane (3D or color coded plot of amplitude vs time vs frequency) for each case. Also plot the phase versus time for frequency bins close to the frequency of the cosine wave. Explain why the phases change with time for cosine waves of different frequencies. Identify the amount of phase change as a function of frequency and explain why it is so. Hint: consider the idea of instantaneous frequency.

e) Investigate the effect of the cyclic shift. Test with and without cyclic shift using a cosine wave and an impulse signal, and explain the results.

f) Test also with the Tom_diner signal and verify the output is the same as the input.

g) Implement the following audio effects by manipulating the amplitudes and phases, as explained in DAFX chapter 8: time stretching, pitch shifting, stable/transient components separation, robotization, whisperization, denoising. Also implement and test the filter of assignment 2 in the frequency domain. Test with the Tom_diner signal, cosine wave signals and other signals which demonstrate the effect clearly.


1. Design Overview

For practical use with digital audio effects, a phase vocoder was implemented based on a block-by-block FFT/IFFT approach. This technique makes use of a kernel algorithm that uses a sliding window to perform FFTs on successive segments of audio. Phase and amplitude values are computed in the frequency domain, along with any additional processing, such as the implementation of audio effects. Next, the IFFT is performed on the segment, which is re-windowed and overlap-added in the time domain to the preceding segments, recovering the signal (or a processed version of the signal). This process can conceptually be broken into three stages:

1. Analysis
2. Frequency Domain Processing (implementation of specific DAFX)
3. Synthesis

This documentation will frequently refer to these stages; however, they are not separate entities in the implementation used. To allow for a shorter computation time, the kernel algorithm iterates through each stage on a segment of audio, all within the same loop. With each successive windowing of a block of audio, the kernel algorithm executes one iteration of analysis-processing-synthesis.

Figure 1: Abstracted phase vocoder design.


Figure 2: Direct FFT/IFFT (Block-by-Block) Analysis/Synthesis Phase Vocoder model, as illustrated in Zölzer’s DAFX [1].

While computation speed was the main factor for choosing the block-by-block phase vocoder approach, the intention here is not so much to develop an efficient phase vocoder. Instead, the goal was to develop a robust design that can easily be tested, and easily be modified to accommodate various frequency domain processing schemes. Keeping the conceptual segregation between the three stages of Analysis, Processing, and Synthesis is helpful when designing for flexibility. Additionally, for the purpose of learning, it is important to be able to abstract certain elements of the phase vocoder while focusing on the details of a specific element, such as the frequency domain implementation of an effect.


In the spirit of robustness, and with heavy emphasis on analysis, this phase vocoder loses some of its elegance and simplicity due to input data type checking and real-time plot generation. Having spent significant time testing and troubleshooting, I found it useful to be able to throw multiple data types (numerical vectors or wavfile names) into the same phase vocoder, and have it generate appropriate plots and feedback catered to that data type. More than anything else, I have found myself using this phase vocoder as a tool for time-frequency analysis on audio signals.

1.1. Analysis Portion

The analysis portion of the kernel algorithm serves the purpose of bringing a windowed segment of audio into the frequency domain. Segments, or blocks, are windowed in such a way that they overlap (share some samples) with their neighbouring windows in the time domain. This overlap is defined by the hop size, Ra. The effect is a “sliding FFT” (actually a hopping FFT, but an appropriate hop size and windowing scheme can be used to obtain smooth-sounding results, despite the discrete steps between analysis windows). The analysis part can be broken up into 3 steps:

1. Window the current block, forming the analysis grain.
2. Perform a circular shift on the analysis grain.
3. Take the FFT of the circularly shifted grain.

1.1.1. Windowing 

Several different windowing schemes were tried. Windowing was applied to the block during the analysis stage, and inverse windowing was applied to the IFFT during the resynthesis stage. The goal was to minimize spectral leakage, and obtain an accurate reconstruction of the original signal. With rectangular windowing, truncation is abrupt, and significant spectral leakage occurs. For a smooth transition between blocks, raised cosine windowing of overlapping segments was used. Three potential raised cosine windows are outlined in Figure 3. Typically, when frequency selectivity is important, the use of a Hamming or Blackman window would be preferred over the Hann window. However, these windowing functions produce more spectral leakage than the Hann window. Also, for successful overlap adding with this phase vocoder implementation, the standard raised cosine window has to be modified.

Three raised cosine window forms based on the Hann window shape are defined as follows:

    %----Hann Window: "Hanning" removes multiplications by zero--------
    w1    = .5*(1 - cos(2*pi*(0:WLen-1)'/WLen));       % periodic Hann for overlap add
    w_han = .5*(1 - cos(2*pi*(0:WLen-1)'/(WLen-1)));   % Raised Cosine (Hann) window
    w_hng = .5*(1 - cos(2*pi*(1:WLen)'/(WLen+1)));     % Raised Cosine (Hanning) window


Figure 3: Three potential raised cosine windows to use. The blue is a Hann window, the green is a Hanning window (removes multiplications by zero), and the red is a periodic Hann window, designed for overlap-adding of successive blocks. The periodic Hann window was used.

The Hanning window was modified according to the recommendations in [1] to facilitate smooth overlap adding. The resulting window, w1, is the periodic Hann, or “hanningz” as it is referred to in [1] and [2]. This window is designed to begin with a sample of value zero and to end with a non-zero sample that has the same value as the second sample [2]. That is, for a window w(n) of length N:

$$w(0) = 0, \qquad w(N-1) = w(1)$$

Applying this requirement to the raised cosine window, we get the general form shown below.

    %==============================================================
    % Create framing window (modified hanning window for OLA [2])
    % w = [0, k_1, k_2,..., k_n-1 = k1]
    %==============================================================
    w1 = 0.5*(1-cos(2*pi*(0:WLen-1)'/(WLen))); % analysis window
    w2 = w1;                                   % synthesis window
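As a quick check of the endpoint requirement above, the following minimal sketch compares the periodic window w1 with the symmetric form w_han for WLen = 8. The flag names (starts_at_zero, end_matches_second, sym_ends_at_zero) are illustrative only and are not taken from the project code.

```matlab
WLen  = 8;                                  % window length used for vector testing
n     = (0:WLen-1)';
w1    = 0.5*(1 - cos(2*pi*n/WLen));         % periodic Hann ("hanningz")
w_han = 0.5*(1 - cos(2*pi*n/(WLen-1)));     % symmetric Hann, for comparison

starts_at_zero     = (w1(1) == 0);                  % first sample is zero
end_matches_second = abs(w1(end) - w1(2)) < 1e-12;  % last sample equals second sample
sym_ends_at_zero   = abs(w_han(end)) < 1e-12;       % symmetric form returns to zero instead

fprintf('%d %d %d\n', starts_at_zero, end_matches_second, sym_ends_at_zero);
```

The periodic form stops one sample short of returning to zero, which is the property the overlap-add scheme relies on; the symmetric form ends on a zero sample.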

Window sizes and hop sizes are discussed later, with accompanying examples. A larger analysis window size typically results in better frequency precision [2]. Smaller analysis windows, however, offer superior tracking of rapid spectral changes, such as bursts of high frequency energy during transients. A workaround that allows for enhanced frequency precision with a smaller window size is to zero pad the windowed grain:


grain_zp = [zeros(1.5*WLen, 1); grain; zeros(1.5*WLen, 1)]; % zero pad analysis grain for greater frequency precision

In the end, this was not implemented. The reason is that time-frequency plots were generated for many audio examples, and the computer system was running out of memory. Also, having to make design decisions over what particular window size should be used for each application was constructive in general. There is a trade-off between frequency precision and representation of transients, which will be frequently discussed.

Figure 4: Modified hanning window for WLen = 8 (used for vector testing)

Figure 5: Modified hanning window for WLen = 256 (used for cosine wave testing)


1.1.2. Circular Shift and FFT 

Circular (cyclic) shift centers the window maximum at the origin, giving us zero phase at the middle of the analysis grain, and a centered FFT. A simple phase relationship can be achieved in this way, which is important when it comes time for phase unwrapping (discussed later). Without cyclic shift, an impulse that is centered in the window will have phase values that oscillate between consecutive frequency bins, and as a result, the phases will unwrap in opposite directions [2]. By default, a raised cosine window is zero at the origin. If, however, we shift the windowed segment (grain) such that it is centered at the origin, we now have a grain with zero phase at its center. When we perform successive FFTs on grains centered about the origin, we obtain a phase relationship that can be measured with a simple, systematic phase unwrapping algorithm.
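The effect described above can be illustrated with a small sketch, assuming an even grain length and using MATLAB's fftshift for the cyclic shift. The variable names are illustrative; the impulse is placed at the centre of an otherwise empty grain.

```matlab
WLen  = 8;
grain = zeros(WLen, 1);
grain(WLen/2 + 1) = 1;                 % impulse at the centre of the grain

X_noshift = fft(grain);                % impulse sits WLen/2 samples from the origin
X_shift   = fft(fftshift(grain));      % cyclic shift moves the impulse to the origin

phase_noshift = angle(X_noshift);      % alternates between 0 and +/-pi from bin to bin
phase_shift   = angle(X_shift);        % zero in every bin

disp([phase_noshift phase_shift]);
```

Without the shift, the half-window delay multiplies bin k by (-1)^k, which is the bin-to-bin phase oscillation mentioned in the text. With the shift, every bin has zero phase.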

1.1.3. Overview of Analysis Stage 

The analysis stage, by taking the FFT of a windowed, circular shifted segment, reveals the measured phase and amplitude in each frequency bin. The next stage is either direct resynthesis, or frequency domain processing followed by resynthesis. Many frequency domain effects require phase unwrapping to be performed after the analysis stage, which will be discussed later. Figure 6 highlights the analysis stage of the phase vocoder in context with the other stages. The Matlab code for the basic operations of the analysis stage is included.


Figure 6: Phase Vocoder implementation with analysis stage documented.


1.2.  Resynthesis Portion 

The resynthesis portion reconstructs the signal, or a processed version of the signal, such that it regains a time domain representation suitable for audio playback. For the definition of the “resynthesis stage” that is applied in this design, it is assumed that the target phase, denoted phi_t, is known. For the basic I/O functionality of the phase vocoder without effects, phi_t is simply set equal to the measured phase from the analysis section, phi. In cases where FX processing has occurred in the frequency domain, we will assume for now that phi_t (as well as any amplitude scaling on the FFT) has been calculated specifically for that effect. This is done so that we can hold off on an explanation of phase unwrapping until it becomes relevant, when we dive deeper into the specifics of phase vocoder effects processing.

The resynthesis part can be broken up into five steps for now (the fourth applies only to certain effects):

1. Take the IFFT of the current frame, which is the analysis FFT frame with any newly calculated target phase/amplitude values.
2. Compensate for the earlier circular shift by un-shifting the segment.
3. Apply inverse windowing (window tapering) to the segment, to correct for phase discontinuities that may have occurred at the edges of a frame [2]. We will call the result the synthesis grain, grain_t.
4. *FX specific: for certain effects, such as pitch shifting, an interpolation scheme/resampling may be implemented, which requires redefined synthesis grains. This will be discussed later when it is applied.
5. Overlap-add the synthesis grains back into the time domain.
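The analysis and resynthesis steps listed so far can be sketched as a single loop. This is a minimal illustration with no frequency domain effect (phi_t = phi), not the project's pVocoder_FFT code: the zero padding scheme, the frame count and the final gain normalisation by sum(w1.*w2)/Rs are assumptions made here so that the example is self-contained.

```matlab
fs   = 8000;                               % sampling frequency (Hz)
WLen = 256;                                % window length
Ra   = WLen/4;                             % analysis hop size
Rs   = Ra;                                 % synthesis hop size (no time stretching)
x    = cos(2*pi*500*(0:999)'/fs);          % 500 Hz test cosine, 1000 samples

w1 = 0.5*(1 - cos(2*pi*(0:WLen-1)'/WLen)); % analysis window (periodic Hann)
w2 = w1;                                   % synthesis window

xp = [zeros(WLen,1); x; zeros(WLen,1)];    % assumed padding: one window each side
y  = zeros(size(xp));
nFrames = floor((length(xp) - WLen)/Ra) + 1;

for i = 0:nFrames-1
    pin   = i*Ra;
    grain = xp(pin+1:pin+WLen) .* w1;          % analysis: window the block
    X     = fft(fftshift(grain));              % analysis: cyclic shift, then FFT
    r     = abs(X);                            % measured amplitude
    phi   = angle(X);                          % measured phase
    phi_t = phi;                               % processing: none, target = measured
    Y       = r .* exp(1i*phi_t);
    grain_t = fftshift(real(ifft(Y))) .* w2;   % synthesis: IFFT, un-shift, window
    pout    = i*Rs;
    y(pout+1:pout+WLen) = y(pout+1:pout+WLen) + grain_t;   % overlap-add
end

y   = y / (sum(w1.*w2)/Rs);                % assumed normalisation of the OLA gain
y   = y(WLen+1:WLen+length(x));            % remove the padding
err = max(abs(y - x));                     % reconstruction error
```

With both windows applied and a hop of WLen/4, the overlapped window products sum to a constant, so err is expected to be at the level of floating point rounding.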


Figure 7: Phase Vocoder implementation with resynthesis stage documented.


1.3.  Phase Unwrapping 

Although it appears in the resynthesis portion of the analysis-synthesis based phase vocoder of [2], phase unwrapping will instead be conceptually placed before resynthesis, in the “frequency domain processing stage”. This is because phase unwrapping is a precursor to many frequency domain audio effects.

Taking Ω as the phase variation, the phase can be represented in a general form as follows:

From the above expression, we will conceptually represent the phase in a single frequency bin k, and express phase purely as a function of n (but in application, each expression below will be applied to every frequency bin). For specific phase values, n will be expressed explicitly in terms of the block index s (represented as i in the Matlab implementations) and the hop size $R_a$.

In order to obtain the exact phase value for each bin, phase unwrapping was performed. The difference between the measured phase phi0 and the target phase phi_t, which corresponds to each bin’s nominal frequency, was first computed. Next, the phase increment was calculated relative to one sample. The vector delta_phi contains the phase difference between two adjacent frames for each bin, and the nominal phase of the bin [2]. The following computations are based on the measured phase values of two consecutive FFT frames.

If $\Omega_k$ corresponds to the frequency of a stable sinusoid in bin k, then the target phase phi_t can be computed from the previous measured phase phi0 [1] by adding $\Omega_k R_a$, where $R_a$ is the hop size:

$$\varphi_t((s+1)R_a) = \varphi(sR_a) + \Omega_k R_a \qquad (1)$$

The unwrapped phase can be computed from the target phase phi_t and deviation phase phi_d as:

$$\varphi_u((s+1)R_a) = \varphi_t((s+1)R_a) + \varphi_d((s+1)R_a) \qquad (2)$$

The principal argument function is used in the calculation of the deviation phase [1]. princarg returns the nominal initial phase of each bin, placing it in the range $[-\pi, \pi]$. The deviation phase phi_d was computed as the principal argument of the difference between the measured phase phi and the target phase phi_t:

$$\varphi_d((s+1)R_a) = \mathrm{princarg}\left[\varphi((s+1)R_a) - \varphi_t((s+1)R_a)\right] \qquad (3)$$

    %============================================================
    % princarg.m
    %
    % Function that returns the principal argument of the nominal
    % initial phase of each frame (for use with pVocoder_FFT). [2]
    %============================================================
    function Phase = princarg(Phasein)
    a = Phasein/(2*pi);         % phase expressed in cycles
    k = round(a);               % nearest whole number of cycles
    Phase = Phasein - k*2*pi;   % remove whole cycles, leaving a value in [-pi, pi]
    end

Figure 8: Phase computations for frequency bin k [1].

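From Figure 8, the unwrapped phase difference between two consecutive frames is the difference between the unwrapped phase $\varphi_u$ and the previous measured phase value:

$$\Delta\varphi((s+1)R_a) = \varphi_u((s+1)R_a) - \varphi(sR_a) \qquad (4)$$

The instantaneous frequency for frequency bin k can be calculated at time $(s+1)R_a$ as:

$$f_i((s+1)R_a) = \frac{1}{2\pi}\,\frac{\Delta\varphi((s+1)R_a)}{R_a}\,f_s \qquad (5)$$

where $R_a$ is the analysis hop size and $f_s$ is the sampling frequency.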

The unwrapped phase and instantaneous frequency values will be important for the implementation of many frequency domain FX, which will be shown later.
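The phase unwrapping steps of this section can be sketched for two consecutive frames as follows. This is a minimal illustration rather than the project code: the 300 Hz test frequency, the variable names omega, kmax and f_inst, and the use of an anonymous function in place of princarg.m are choices made here so the example runs on its own.

```matlab
fs = 8000; WLen = 256; Ra = WLen/4;
f0 = 300;                                      % test cosine, between bins (9.6 bins)
x  = cos(2*pi*f0*(0:1023)'/fs);
w1 = 0.5*(1 - cos(2*pi*(0:WLen-1)'/WLen));     % periodic Hann analysis window
princarg = @(p) p - 2*pi*round(p/(2*pi));      % same mapping as princarg.m

omega = 2*pi*Ra*(0:WLen-1)'/WLen;              % nominal phase advance per hop, per bin

X0   = fft(fftshift(x(1:WLen) .* w1));         % frame s
X1   = fft(fftshift(x(Ra+1:Ra+WLen) .* w1));   % frame s+1
phi0 = angle(X0);                              % previous measured phase
phi  = angle(X1);                              % current measured phase

phi_t     = phi0 + omega;                      % target phase
phi_d     = princarg(phi - phi_t);             % deviation phase
delta_phi = omega + phi_d;                     % unwrapped phase difference
f_inst    = delta_phi/(2*pi*Ra)*fs;            % instantaneous frequency per bin (Hz)

[~, kmax] = max(abs(X1(1:WLen/2)));            % strongest bin below fs/2
fprintf('bin %d: f_inst = %.3f Hz\n', kmax-1, f_inst(kmax));
```

The strongest bin is centred on 312.5 Hz, but the instantaneous frequency computed from its phase advance should come out very close to the 300 Hz of the input, which is the reason for unwrapping the phase rather than relying on the bin centre frequency.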

2. Testing Hop Size with Vectors

Preliminary and intermediate testing was conducted on vectors. The function kernalPlot.m is automatically called to plot the analysis grains of each iteration of the kernel algorithm if the total number of windowed grains is small enough to view on a single plot (i.e. the input is a test vector, not an audio signal). This was useful when experimenting with various windowing and hop size schemes. Since fftshift is used to perform the cyclic shift, kernelPlot plots the grain before it has been centered about the origin.

With the modified hanning window (called hanningz in [1] and [2]), it was found that the ratio of analysis hop size to window length had to be at most 1/4 (an overlap of at least 75%) for the signal to be properly reconstructed during overlap-adding (OLA). For the basic phase vocoder implementation (without effects), the ratio that produced the best OLA results was:

$$\frac{R_a}{WLen} = \frac{1}{4} \qquad (6)$$
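One way to see why the hop/window ratio matters is to sum shifted copies of the combined analysis and synthesis window. The sketch below assumes w2 = w1 (as in the window definition earlier) and uses circular shifts to emulate steady-state overlap; the names ww, ola_half and ola_quarter are illustrative.

```matlab
WLen = 256;
w1 = 0.5*(1 - cos(2*pi*(0:WLen-1)'/WLen));   % analysis window
w2 = w1;                                     % synthesis window
ww = w1 .* w2;                               % each grain is windowed twice

ola_half = zeros(WLen, 1);                   % hop = WLen/2 (two overlapping grains)
for shift = 0:WLen/2:WLen-1
    ola_half = ola_half + circshift(ww, shift);
end

ola_quarter = zeros(WLen, 1);                % hop = WLen/4 (four overlapping grains)
for shift = 0:WLen/4:WLen-1
    ola_quarter = ola_quarter + circshift(ww, shift);
end

ripple_half    = max(ola_half) - min(ola_half);        % non-zero: gain varies in time
ripple_quarter = max(ola_quarter) - min(ola_quarter);  % essentially zero: constant gain
fprintf('ripple at 1/2: %.3f, ripple at 1/4: %.3g\n', ripple_half, ripple_quarter);
```

At a ratio of 1/2 the summed gain swings between 0.5 and 1, which corresponds to the output modulation seen in the vector tests. At 1/4 the sum is a constant 1.5, which can be divided out after overlap-adding.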


Figure 9: Successive analysis grains generated by the kernel algorithm (only non-zero grains displayed). WLen = 8, Ra = 4.

Figure 10: With a hop/win ratio of Ra/WLen = 1/2, modulation of the output signal occurred during the synthesis overlap-add process.


Figure 11: Successive analysis grains generated by the kernel algorithm. WLen = 8, Ra = 2.

Figure 12: With a hop/win ratio of Ra/WLen = 1/4, no output modulation occurred during the synthesis overlap-add process.


3. Testing with Cosine Waves

In order to test the I/O operation of the basic phase vocoder at various frequencies, cosine wave testing was employed. Testing was conducted on sampled cosine wave segments with both integer and non-integer numbers of samples per cycle. Several different input signal lengths were used in order to allow for plots with easy visibility at all frequencies tested. The frequencies and input parameters used for the majority of the tests are shown below:

    %=========================================================================
    % Testing PV on cosine wave with int/non-int # of samples/cycle
    %=========================================================================
    fs = 8000;            % sampling frequency
    Ts = 1/fs;            % sampling period
    Nx = 1000;            % duration of signal (samples)
    nT = (0:Nx-1)*Ts;     % Nx length time vector

    % integer # of samples/cycle
    f1 = 4;               % 2000 samples/cycle
    f2 = 31.25;           % 256 samples/cycle
    f3 = 500;             % 16 samples/cycle
    f4 = 2000;            % 4 samples/cycle

    % non-integer # of samples/cycle
    f5 = 7;
    f6 = 33;
    f7 = 300;
    f8 = 1500;

    %----------- cosine wave input parameters---------
    f = f3;               % choose freq of input
    x = cos(2*pi*f*nT);   % cosine wave input
    WLen = 256;           % window length
    Ra = WLen/4;          % analysis hop size
    Rs = WLen/4;          % synthesis hop size

3.1.  Cosine wave input with integer # of samples/cycle 

The following universal parameters were used for testing the phase vocoder with cosine waves having an integer number of samples per cycle:

Sampling frequency: fs = 8000 Hz
Window length: WLen = 256 samples
Duration of sinusoid (unpadded): N_orig = 1000 samples


The testing results are visible in the plots below. For each figure, the details of the test are outlined in the caption. Plots of the analysis grains (prior to cyclic shift) are also included, as they are helpful for illustrating the windowing at each frequency. The I/O plots demonstrate that the reconstructed waveform is identical to the input waveform at all frequencies tested (both integer and non-integer number of samples per cycle).

Figure 13: Successive analysis grains for integer sampled cosine wave with freq 4Hz (2000 samples per cycle, a fraction of a cycle per segment). WLen = 256, Ra = WLen/4.

Figure 14: Input and output for 4 Hz cosine wave.


Figure 15: Successive analysis grains for integer sampled cosine wave with freq 31.25Hz (256 samples per cycle, 1 cycle per segment). WLen = 256, Ra = WLen/4.

Figure 16: Input and output for 31.25Hz cosine wave (256 samples/cycle).


Figure 17: Successive analysis grains for integer sampled cosine wave with freq 100Hz. WLen = 256, Ra = WLen/4.

Figure 18: Input and output for 100Hz cosine wave (80 samples/cycle).


Figure 19: Successive analysis grains for integer sampled cosine wave with freq 500Hz. WLen = 256, Ra = WLen/4.

Figure 20: Input and output for 500Hz cosine wave (16 samples/cycle).


Figure 21: Successive analysis grains for integer sampled cosine wave with freq 1000Hz. WLen = 256, Ra = WLen/4.

Figure 22: Input and output for 1000Hz (8 samples/cycle) cosine wave, zoomed in for clarity.


3.2. Cosine wave input with non-integer # of samples/cycle

The analysis grain and I/O plots below reveal a perfect resynthesis for cosine waves with a fractional number of samples per cycle.

Figure 23: Successive analysis grains for non-integer sampled cosine wave with freq 7Hz (8000/7 ~ 1142.86 samples per cycle, a fraction of a cycle per segment). WLen = 256, Ra = WLen/4.

Figure 24: Input and output for 7 Hz cosine wave.


Figure 25: Successive analysis grains for non-integer sampled cosine wave with freq 33Hz (242.424 samples per cycle, about 1 cycle per segment). WLen = 256, Ra = WLen/4.

Figure 26: Input and output for 33 Hz cosine wave (242.424 samples per cycle, 1.056 cycles/segment).


Figure 27: Successive analysis grains for non-integer sampled cosine wave with freq 300Hz. WLen = 256, Ra = WLen/4.

Figure 28: Input and output for 300Hz cosine wave (8000/300 ~ 26.667 samples/cycle).


3.3. Non-integer windowing with cosine wave input

For contrast, non-integer windowing is demonstrated below. Here, we see the results of truncation, which leads to spectral leakage in the frequency domain. The resulting signal is not reconstructed accurately. This is most apparent when viewing the amplitude envelope of the output signal. The envelope is no longer rectangular, as it has undergone a small amount of amplitude modulation. Interestingly, the spectral leakage bears resemblance to a simple amplitude-modulated waveform when viewed in the frequency domain: a carrier wave/band with two sidebands. For the phase vocoder, however, we want to avoid this, which means no non-integer windowing (we will, however, apply non-integer hop sizes for certain effects).

Figure 29: Successive analysis grains for non-integer windowing of 100Hz (80 samples/cycle) cosine wave (top). Input and output of PV (bottom). Nx_in = 874.3, WLen = 293.7.


Figure 30: Successive analysis grains for non-integer windowing of 300Hz (~ 26.667 samples/cycle) cosine wave (top). Input and output of PV (center). Zoomed view of output, showing the results of truncation, which leads to spectral leakage (bottom). Nx_in = 874.3, WLen = 293.7.


4. Waterfall Plots

The time-frequency (waterfall) representation of a signal is useful for visualizing and analyzing the spectral content as it changes with time. Also, the relationship between frequency resolution and window size becomes very apparent when it can be seen, and not only heard. Waterfall plots represent a signal in terms of its successive FFT frames; in other words, as a function of the sample index of windowed time segments, and of the frequency bins. For a window length of WLen samples, we have an FFT size of WLen frequency bins. The waterfall plots throughout this paper evolved along with the waterfall plot function, which has undergone frequent modifications. Four different time-frequency representations will be used for analysing this phase vocoder design and operation:

1. Amplitude Waterfall (both linear and logarithmic frequency scales are used)
2. Phase Waterfall
3. Magnitude Waterfall (both linear and logarithmic frequency scales are used)
4. Phase vs. Time plot of frequency bins near the signal's maximum amplitude in the frequency domain. This type of plot is essentially a phase waterfall that has been zoomed in to a typical area of interest. In the case of cosine waves, the center bin is the bin closest to the wave's frequency. In the case of more complex signals, the plot will often be centered close to the fundamental.

Before plotting the 3D time-frequency representations, a test plot of the FFT frames was completed, as shown in Figure 31. Each subplot corresponds to the FFT of each analysis grain. The corresponding waterfall plot will simply involve linearly orienting the FFT frames along the time axis, according to the index of each window (the block index). The spacing (number of samples) between frames in the time domain is determined by the hop size.
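The arrangement of FFT frames along the time axis can be sketched as a matrix with one column per block index. This is not the project's waterfall_Plot.m; it is a minimal illustration that keeps only bins 0 to WLen/2, uses full frames without zero padding, and leaves the plotting call commented out. The names amp, f_axis, t_axis and kpk are assumptions made here.

```matlab
fs = 8000; WLen = 256; Ra = WLen/4;
x  = cos(2*pi*500*(0:999)'/fs);                % 500 Hz cosine, 1000 samples
w1 = 0.5*(1 - cos(2*pi*(0:WLen-1)'/WLen));     % periodic Hann analysis window

nFrames = floor((length(x) - WLen)/Ra) + 1;    % full frames only
amp = zeros(WLen/2 + 1, nFrames);              % rows: bins 0..WLen/2, columns: block index
for i = 0:nFrames-1
    X = fft(fftshift(x(i*Ra+1:i*Ra+WLen) .* w1));
    amp(:, i+1) = abs(X(1:WLen/2 + 1));        % amplitude of each bin in this frame
end

f_axis = (0:WLen/2)' * fs/WLen;                % bin centre frequencies (Hz)
t_axis = (0:nFrames-1) * Ra/fs;                % frame start times (s)
[~, kpk] = max(amp(:, 1));                     % strongest bin in the first frame

% waterfall(f_axis, t_axis, amp.');            % uncomment to draw the amplitude waterfall
```

For this input the strongest bin lies on f_axis = 500 Hz in every frame, since 500 Hz is an exact multiple of the bin spacing fs/WLen = 31.25 Hz.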


Figure 31: FFTs of each frame for a 500Hz cosine wave 1000 samples in length. WLen = 256, Ra = WLen/4.

4.1. Phase vs. Time, Instantaneous Frequency, and Frequency Resolution

To properly interpret the phase vs. time plots for frequency bins close to the frequency of a cosine wave, we will first derive an expression for instantaneous frequency. The goal is to understand why the phases change with time for cosine waves of different frequencies. Let the input cosine waveform be expressed by:

$$x(t) = \cos(\varphi(t)) = \cos(2\pi f t)$$

Instantaneous frequency is defined as the rate of change of the phase. From this we can conclude that when the phase advances at a constant rate, the instantaneous frequency $f_i(t)$ is constant and equal to the frequency of the cosine wave:

$$f_i(t) = \frac{1}{2\pi}\,\frac{d\varphi(t)}{dt}$$

After taking the FFT of a segment of x, we will focus on the instantaneous frequency of one frequency bin k:

$$f_i(n,k) = \frac{1}{2\pi}\,\frac{d\varphi(n,k)}{dt}$$

This can be expressed in terms of the unwrapped phase difference, bringing us back to equation (5). The instantaneous frequency for bin k at time $(s+1)R_a$ is:

$$f_i((s+1)R_a) = \frac{1}{2\pi}\,\frac{\Delta\varphi((s+1)R_a)}{R_a}\,f_s$$

where $\Delta\varphi$ is the unwrapped phase difference between two consecutive frames, $R_a$ is the analysis hop size and $f_s$ is the sampling frequency.

Since frequency bins are spaced according to the frequency resolution $f_0 = f_s/\mathrm{WLen}$, some frequencies will be directly centered on a bin, and some will lie between two bins. For maximum frequency resolution and the most accurate measurement of a sinusoid's phase and amplitude, we could use rectangular windowing. However, this phase vocoder will not simply be used for analysis of sinusoids; it will be used for audio effects. Rectangular windowing causes spectral leakage, which we would like to minimize.

For cosine wave frequencies that have an integer number of samples per cycle and also lie precisely on a multiple of $f_0$, the Phase vs. Time plot shows constant zero phase in the frequency bin where maximum amplitude occurs. This makes sense, considering that the frequency of the cosine is not changing; that is, the instantaneous frequency $f_i = 0$. The exception is at the truncation points, where the input cosine has effectively been rectangular windowed. At these points in time, spectral leakage is visibly occurring, which is indicated by the presence of visible sidebands. Even with the modified Hann window tapering specifically catered to OLA, we notice spectral leakage at the end points, where no overlap adding is occurring.

When $f_1 \neq k f_0$ (the cosine frequency does not lie directly on a bin), the plots below show that we do not get perfectly linear phase in the nearest bin; the phase is time varying. This is because we are not actually measuring the phase of the input cosine wave; we are measuring the phase of the nearest frequency bin. The nearest frequency bin is centered on a slightly different frequency than the cosine wave. Since it is a slightly different frequency, it will sometimes be in phase with our cosine wave (having 0 phase at these points), and sometimes be out of phase (the relative phase changes with time). The closer the center of the measured bin is to the cosine frequency, the slower the phase will change with time.

This confirms that for the most accurate phase measurements, we want a high frequency resolution. The above is analogous to beat frequencies: two nearby tones exhibit a beat frequency when they are slightly off in frequency. As we tune one tone to the other, the beat frequency slows, and when the signals have exactly the same pitch, it stops. In terms of musical applications, a higher frequency resolution can be compared to a better “tuner”. With a better tuner, the frequency of an out-of-tune tone can be measured more precisely.
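The bin-phase behaviour described above can be reproduced with a short MATLAB sketch. It is an illustration rather than the project code: it assumes a periodic Hann window (not the modified Hann window used in this design), uses fftshift for the cyclic shift, and all variable names are hypothetical. Under these assumptions, the 2000 Hz cosine (centered on bin 64) should show a near-constant phase in that bin across frames, while a 2010 Hz cosine should show a phase that keeps changing from frame to frame.

```matlab
fs = 8000; WLen = 256; Ra = WLen/4;           % parameters quoted in this report
n = (0:4*WLen-1)';                            % sample index (column vector)
w = 0.5*(1 - cos(2*pi*(0:WLen-1)'/WLen));     % periodic Hann window (assumption)
k = 64;                                       % bin centered on 2000 Hz (k*fs/WLen)
freqs = [2000 2010];                          % on-bin and slightly off-bin cosines

numFrames = floor((length(n) - WLen)/Ra) + 1;
binPhase = zeros(numFrames, numel(freqs));    % phase of bin k, one column per cosine

for c = 1:numel(freqs)
    x = cos(2*pi*freqs(c)*n/fs);
    for s = 0:numFrames-1
        grain = x(s*Ra + (1:WLen)').*w;       % windowed analysis grain
        X = fft(fftshift(grain));             % cyclic shift, then FFT
        binPhase(s+1, c) = angle(X(k+1));     % MATLAB indices start at 1
    end
end

disp(binPhase);                               % column 1: 2000 Hz, column 2: 2010 Hz
```

In the off-bin case the phase advances by about 2*pi*10*Ra/fs (roughly 0.5 rad) per hop before wrapping, which is the slow "beating" against the bin center described in the text.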

4.2. Amplitude, Magnitude, and Phase in Time-Frequency Plane

The following time-frequency plots coincide with the earlier testing from section 3. The constant parameters are:

Sampling frequency: fs = 8000Hz
Window length: WLen = 256 samples
Frequency resolution: f0 = 31.25Hz
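As a quick cross-check of the test cases that follow, this MATLAB sketch computes, for each cosine frequency used in this section, the samples per cycle, the cycles per analysis segment, and whether the frequency is bin centered. The variable names are illustrative; the frequencies and parameters are the ones quoted in the text.

```matlab
fs   = 8000;                          % sampling frequency [Hz]
WLen = 256;                           % window length = FFT size [samples]
f0   = fs/WLen;                       % frequency resolution [Hz]

f1 = [2000 31.25 4 1500 33 7]';       % test frequencies used in this section [Hz]
samplesPerCycle  = fs./f1;
cyclesPerSegment = WLen*f1/fs;        % equals f1/f0, the fractional bin position
binCentered      = abs(cyclesPerSegment - round(cyclesPerSegment)) < 1e-9;

% columns: f1, samples per cycle, cycles per segment, bin centered (1/0)
disp([f1 samplesPerCycle cyclesPerSegment binCentered]);
```

A frequency is bin centered when f1/f0 is an integer (2000 Hz, 31.25 Hz and 1500 Hz here); the 4 Hz, 33 Hz and 7 Hz cases fall between bins.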

Integer Samples per Cycle, Many Cycles per Segment

In the example below, the 2000 Hz cosine wave input corresponds to 4 samples per cycle. With the exception of spectral leakage caused by the discontinuous cosine wave, the amplitude spectrum is well represented, with a peak at 1. The cosine wave frequency f1 = 2000Hz is centered on one of the FFT bins. As a result, we have a 0 phase representation in the phase waterfall. The phase appears to be random everywhere else in the spectrum (and appears even more so with a higher frequency resolution). The phase at all frequencies other than f1 should be ignored, as these points were calculated from round-off errors at frequencies where the spectrum should be zero. This problem can be largely solved if a centered Hann window is used, but we have already discussed the need for an offset window for best results with the overlap-add process.

Figure 32: Amplitude and Phase waterfall representations 2000Hz Cosine Wave input (Nx = 5000 samples)

The plot of the bins near the maximum amplitude bin visually confirms the discussion in the previous section. Here, f1 is bin centered, and the phase is constantly 0. In the directly neighbouring bins, which are slightly off in frequency, the phase is still constant up to the points where truncation occurs, where sidebands cause the phase to jump from zero to ±180 degrees (in opposite directions for the upper and lower sidebands).

Figure 33: Phases of nearby bins for 2000Hz Cosine Wave input (signal length = 5000 samples)

[Plot: Phase vs Time for Freq Bins Near Fundamental (2000Hz-CosineWave). Axes: n [samples], f [Hz], Arg X_w(f) [rad].]

Figure 34: Phases of nearby bins for 2000Hz Cosine Wave input (signal length = 1000 samples).

[Plot: Phase vs Time for Freq Bins Near Fundamental (2000Hz-CosineWave). Axes: n [samples], f [Hz], Arg X_w(f) [rad]. Data cursor: n = 256, f = 2000 Hz, phase = 0 rad.]

Integer Samples per Cycle, 1 Cycle per Segment

For the WLen = 256 samples and fs = 8000Hz, we achieve 1 cycle per segment for:

f1 = 31.25Hz (256 samples per cycle)

31.25Hz is again centered on an FFT bin (the first bin). The amplitude spectrum between the truncated edges of the cosine wave segment is slightly misshapen, and we do not have zero phase. The phase is changing with time in the 31.25Hz bin. This is because successive windows are out of phase with one another. We need to go through four successive windowings (since Ra = WLen/4) to obtain two identical phases. The phase starts at 0, and is again 0 after every 4th successive windowing. The kernel plot in Figure 37 helps to illustrate this.
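The four-hop repetition can be checked numerically. The sketch below is a simplified stand-in for the analysis stage (periodic Hann window, fftshift as the cyclic shift, hypothetical variable names), not the project's kernel. It measures the phase of bin 1 for a 31.25 Hz cosine in successive frames; under these assumptions the phase should advance by 2*pi*Ra/WLen = pi/2 per hop and therefore repeat every fourth frame. The absolute starting phase depends on the window and on where the first frame begins, so only phase differences are examined.

```matlab
fs = 8000; WLen = 256; Ra = WLen/4;
n = (0:2047)';
x = cos(2*pi*31.25*n/fs);                     % exactly one cycle per window
w = 0.5*(1 - cos(2*pi*(0:WLen-1)'/WLen));     % periodic Hann window (assumption)
wrap = @(p) mod(p + pi, 2*pi) - pi;           % principal argument, [-pi, pi)

numFrames = floor((length(x) - WLen)/Ra) + 1;
ph = zeros(numFrames, 1);
for s = 0:numFrames-1
    X = fft(fftshift(x(s*Ra + (1:WLen)').*w));
    ph(s+1) = angle(X(2));                    % bin 1 is MATLAB index 2
end

hopAdvance  = wrap(diff(ph));                 % phase change between adjacent frames
fourHopDiff = wrap(ph(5:end) - ph(1:end-4));  % phase change across four hops
disp([hopAdvance(1:4) fourHopDiff(1:4)]);
```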

Figure 35: Amplitude and Phase waterfall representations of 31.25Hz Cosine Wave input (Nx = 2048 samples)

Figure 36: Amplitude and Phase waterfall representations of 31.25Hz Cosine Wave input (Nx = 2048 samples)

Figure 37: Successive analysis grains (prior to cyclic shift) for 31.25Hz Cosine Wave input (Nx = 2048 samples)

Integer Samples per Cycle, Fractional Cycle Per Segment

For WLen = 256 samples and fs = 8000Hz, we can achieve a fraction of a cycle per segment, with an integer number of samples per cycle, for a number of frequencies. The chosen frequency is:

f1 = 4Hz (2000 samples per cycle)

Since the period of this cosine wave is much larger than the window size, the actual amplitude of the cosine wave (its maximum or minimum value) will only be measured once for every 4 full window lengths traversed by the kernel algorithm (corresponding to 16 hops, or successive windowings). The trend is related, but different, for the phase. The phase readings are constant between consecutive hops, but after every 4 full window lengths the value jumps between 0 and pi. Each time we hit zero phase in the bin closest to f1, we see a recurring pattern across the spectrum that results from measured phase values in the sidebands, which are the result of spectral leakage. Since f1 = 4Hz is below the lowest frequency bin, a similar trend is also occurring just below the Nyquist frequency. In an attempt to represent the signal with the frequency bin below f1, which does not exist, the phases are being folded back into the bin at the top end of the spectrum.

Figure 38: Amplitude and Phase waterfall representations for 4Hz Cosine Wave input (Nx = 5 000 samples).

Figure 39: Magnitude Spectrum in dB for 4Hz Cosine Wave input (signal length = 5 000 samples).

Figure 40: Phases of nearby bins for 4Hz Cosine Wave input (Nx = 5000 samples).

[Plot: Phase vs Time for Freq Bins Near Fundamental (4Hz-CosineWave). Axes: n [samples], f [Hz], Arg X_w(f) [rad]. Data cursors: n = 4544, f = 0 Hz, phase = 0 rad; n = 4928, f = 0 Hz, phase = 3.142 rad.]

Non-Integer Samples per Cycle, Many Cycles per Segment

With a frequency f1 = 1500 Hz, we have an example using a non-integer number of samples per cycle that lines up perfectly with the center of a bin. As in the case with integer samples per cycle, the phase is constantly 0 in the center bin. Neighbouring bins experience deviation when the sidebands leak into them at points of truncation.

Figure 41: Amplitude and Phase waterfall representations for 1500Hz Cosine Wave input (Nx = 1025 samples).

Figure 42: Phases of nearby bins for 1500Hz Cosine Wave input (Nx = 1024 samples).

Non-Integer Samples per Cycle, ~1 Cycle Per Segment

With a frequency f1 = 33 Hz, we have 242.424 samples per cycle, and 1.056 cycles per segment. Viewing the amplitude spectrum, the trend resembles what we saw with 1 cycle per segment. There are some inconsistencies in the shape of the amplitude spectrum, as some of the energy from the fundamental has leaked into other parts of the spectrum. The phase jumps between 0 and 180 degrees after every 2 successive windowings (nonlinear phase). This jump in phase means that the instantaneous frequency measured for the cosine wave is not constant. If we zoom in closely on the magnitude spectrum, this is apparent.

Figure 43: Amplitude and Phase waterfall representations for 33 Hz Cosine Wave input (Nx = 2048 samples).

Figure 44: Phase waterfall for 33Hz Cosine Wave input (Nx = 2048 samples).

Figure 45: Phases of nearby bins for 33Hz Cosine Wave input (Nx = 2048 samples).

Figure 46: Zoomed view of a portion of the magnitude spectrum [dB] that displays the peak amplitude of the bins closest to f1. The instantaneous frequency changes with the phase.

Non-Integer Samples per Cycle, Fractional Cycle Per Segment

A frequency of f1 = 7 Hz was chosen for a cosine wave that has a non-integer number of samples per cycle and undergoes less than one cycle per segment. The general trend is similar to the integer samples per cycle example (correct amplitude values are only measured when a maximum or minimum of the cosine wave is windowed); however, spectral leakage is more severe in this case. Since we are no longer centered on a frequency bin, the phase exhibits nonlinear behaviour and creates sidebands that fluctuate in intensity (the energy that they tap from the fundamental is greatest when the amplitude of the center bin is lowest). At the first truncation point, the sidebands peak at -20dB in neighbouring bins.

Figure 47: Amplitude and Phase waterfall representations for 7 Hz Cosine Wave input (Nx = 2048 samples).

Figure 48: Magnitude spectrum of 7Hz cosine wave input (Nx = 2048 samples)

Figure 49: Phases of nearby bins for 7Hz Cosine Wave input (Nx = 2048 samples).

5. Cyclic Shift

5.1. Cyclic Shift with Cosine Wave Input

All previous signals were tested with a cyclic shift applied in order to center the FFT (fftshift was used). A 500 Hz cosine wave will be used here, first represented with cyclic shift applied:

Figure 50: 500 Hz cosine wave amplitude and phase waterfalls with cyclic shift applied.

Figure 51: cosine wave phase waterfalls with cyclic shift applied (alternative view)

Figure 52: 500Hz cosine wave phase magnitude waterfall with cyclic shift applied.

Figure 53: Phases of nearby bins for 500Hz Cosine Wave input (cyclic shift applied).

Without Cyclic Shift   

When we remove the cyclic shift, the phase response in bins that don’t have zero phase is altered quite significantly. It is difficult to interpret the relationship with a cosine wave, however. We have removed the common 0 phase reference at the window center, and the resulting FFTs are not centered with a phase of 0 in frequency bin 0. Phase unwrapping can still be performed, but it will require a more complicated algorithm than the one outlined here. Where we would likely have significant problems is with frequencies that lie between frequency bin centers, in schemes with limited frequency precision (such as the one used here, with the modified Hann window).

Figure 54 shows a drastic change in the phase relationship between the bins directly next to the center bin, which has for practical purposes retained its 0 phase. The neighbouring bins, however, are 180 degrees out of phase, whereas with a cyclic shift applied they lie at zero between the truncation regions. The phase relationship seems to be the same as the cyclic shifted phase relationship in every second bin out from the center bin (which has 0 phase). However, every odd bin out from this reference point has a very different phase relationship.
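The even/odd pattern has a simple explanation in the DFT shift theorem: for an even window length, swapping the two halves of a grain (which is what fftshift does) multiplies bin k of its FFT by (-1)^k, so even bins keep their phase and odd bins are rotated by 180 degrees. The following MATLAB sketch checks this on an arbitrary windowed grain; the test signal, the periodic Hann window and the variable names are illustrative assumptions, not taken from the project code.

```matlab
WLen = 256;
m = (0:WLen-1)';
w = 0.5*(1 - cos(2*pi*m/WLen));               % periodic Hann window (assumption)
grain = cos(2*pi*500*m/8000 + 0.3).*w;        % arbitrary windowed test grain

Xshift   = fft(fftshift(grain));              % cyclic shift applied
XnoShift = fft(grain);                        % cyclic shift omitted

k = (0:WLen-1)';                              % bin indices
predicted = XnoShift.*(-1).^k;                % shift theorem: factor exp(j*pi*k)
maxErr = max(abs(Xshift - predicted));
disp(maxErr);                                 % expected to be at round-off level
```

For the 500 Hz example the bin of maximum amplitude is bin 16, which is even and therefore unaffected, while its immediate neighbours (bins 15 and 17) are flipped by 180 degrees, in line with the observation above.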

Figure 54: Phases of nearby bins for 500Hz Cosine Wave input (Cyclic shift not applied)

5.2. Cyclic Shift with Kronecker Delta Function Input 

%=========================================================================
% Testing Phase Vocoder on Unit Impulse
%[y,ModIn,PhasesIn,ModOut,PhasesOut]=PVOCODER_FFT(x,fs,WLen,Ra,Rs,TAG)
%=========================================================================
x_imp = zeros(1000, 1);
x_imp(1) = 1;     % impulse input
WLen = 256;       % window length
Ra = WLen/4;      % analysis hop size
Rs = WLen/4;      % synthesis hop size

By analyzing a burst of broadband energy, it is somewhat easier to observe what is happening across the frequency spectrum when cyclic shift is or is not applied. A Kronecker delta function will be used as the input, as defined above.

With Cyclic Shift 

After framing the input segment, a cyclic (circular) shift is applied, centering the analysis grain at the time origin and thereby assigning 0 phase to the window center. The resulting FFT has 0 phase associated with bin 0.
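A minimal MATLAB sketch of this zero-phase reference, under the simplifying assumption that the grain is a unit impulse located exactly at the window center (the windowing itself is omitted, and the variable names are illustrative): after fftshift the impulse sits at the time origin, so every bin has zero phase, whereas without the shift the phase alternates between 0 and ±pi from bin to bin.

```matlab
WLen = 256;
grain = zeros(WLen, 1);
grain(WLen/2 + 1) = 1;                    % unit impulse at the window center

phShift   = angle(fft(fftshift(grain)));  % with cyclic shift: impulse at time origin
phNoShift = angle(fft(grain));            % without cyclic shift

disp(max(abs(phShift)));                  % zero phase in every bin
disp(abs(phNoShift(1:4))');               % alternates between 0 and pi
```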

Figure 55: Amplitude (top) and phase (bottom) waterfall representations of delta function (cyclic shift is used)

Figure 56: Time-Frequency representations of amplitude and phase

 

Without Cyclic Shift 

Without cyclic shift, there is again a change in the phase spectrum. With the simple input signal used, we can see that the bin closest to the origin has a maximum value reaching toward 180 degrees, while in the cyclic shifted example the same bin has a value closer to zero phase, which is the value at the origin. The phase values exhibit more erratic behaviour along both the sample index axis and the frequency bin axis when cyclic shift is not used. Phase unwrapping will certainly be more complex, and the correct target phase values may be ambiguous.

Figure 57: Input and Output signals for Impulse input (top), selected analysis grains (bottom).

[Plot: Phase vs Time for Freq Bins Near Fundamental (Kronecker-Delta). Axes: n [samples], f [Hz], Arg X_w(f) [rad].]

6. Preliminary Audio Testing

Before implementing any effects, the basic phase vocoder I/O operation was tested to verify that the analysis-resynthesis stages perform correctly and do not introduce significant artefacts to the sound. Testing was performed on various audio signals with different timbres and attack styles.

For signals that are defined largely by transient components, such as percussion recordings, it was found that a smaller window size (WLen = 1024 samples seemed to work) was more effective than a large window. This is because with a smaller window, FFTs are taken more frequently, and hence the rapidly changing transients are better tracked. With a larger window size on percussive material, transients were noticeably distorted, with a smeared and less percussive attack.

On the other hand, for pitched instruments with more subtle transients, a larger window size is preferred (WLen = 4096 samples was frequently used). The higher frequency precision that a larger window/FFT size provides is particularly well suited to largely stable signals with a complex timbre (such as a bowed violin). The catch here, however, is that a “rich” timbre is typically characterized as being abundant in harmonic partials and broadband spectral content, much of which is high frequency energy that decays quickly. Still, for slowly changing signals in general, the larger FFT size facilitates a reconstructed signal with a spectral representation that is more faithful to that of the original signal. For many test signals, a reasonable compromise between transient tracking and frequency resolution was achieved with a window size of 2048 samples.

The file flute2.wav was chosen to demonstrate the input/output performance of the phase vocoder, as the waveform is quite simple in shape (a single note with a volume swell and changing dynamics). The flute “vibrato” is actually more of a tremolo, as it consists mainly of a rapid variation in amplitude and timbre, not in pitch. This shows up in the envelope of the waveform, and seems to gain emphasis when certain effects are applied, such as time stretching. Figure 58 shows an apparently clean re-synthesis of the input signal. This audio example and the topic of frequency resolution will be explored in the wah-wah filter implementation section.
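The trade-off can be put in numbers. This MATLAB sketch tabulates the bin spacing and the time span of one window for the three window lengths mentioned above. The sampling rate of 44100 Hz is an assumption made for illustration (the sampling rate of the audio test files is not stated here), and the variable names are hypothetical.

```matlab
fs   = 44100;                    % assumed sampling rate [Hz]
WLen = [1024 2048 4096]';        % window lengths discussed in the text

binSpacingHz = fs./WLen;         % frequency resolution (bin spacing) [Hz]
windowMs     = 1000*WLen/fs;     % time span covered by one window [ms]

% columns: window length, bin spacing [Hz], window duration [ms]
disp([WLen binSpacingHz windowMs]);
```

Doubling the window length halves the bin spacing but doubles the time span over which a transient is averaged, which is the smearing described above.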

Figure 58: IO of flute2.wav (WLen = 4096).

Figure 59: Magnitude [dB] waterfall of flute2.wav.

Figure 60: Diner Input Amplitude Waterfall

Figure 61: Diner Input Magnitude Waterfall [dB]

7. Implementing Audio Effects

For the most part, audio effects were implemented in the frequency domain between the analysis and resynthesis stages of the kernel algorithm. Certain effects, such as pitch shifting, required additional considerations during resynthesis (interpolation/resampling in the case of pitch shifting, which was also used in robotization). In such cases, having all stages of the processing in a single kernel loop was convenient. An alternative phase vocoder implementation [2] that uses separate analysis/synthesis functions was very helpful as a learning tool, but less convenient in practice when effects processing required special considerations during the analysis and synthesis stages.

7.1. Time Stretching

To implement time stretching, a time stretch ratio was defined in the kernel algorithm initialization stage as follows.

tStretch = Rs/Ra   % time stretch ratio

Ra is the analysis hop size, and Rs is the synthesis hop size. The time stretching function depends on using different hop sizes for analysis and synthesis. For the basic I/O phase vocoder outlined earlier, tStretch was equal to 1 (no time stretching). By using a different synthesis hop size, we are essentially resampling the signal during synthesis at a different sampling rate. This will produce a time stretch, but on its own, it will also produce pitch shifting. In order to time stretch the signal without applying pitch shifting, we have to do some frequency domain processing on the phase. Finally, phase unwrapping is put to use.

The frequency domain processing portion of the code for the time stretching implementation of the phase vocoder is shown below. This code segment fits between the analysis and synthesis portions of the kernel algorithm.

%==========================================================
% Frequency Domain Processing
%==========================================================
%----------- Phase Unwrapping ---------------
phi_d = princarg(phi-phi0-omega);   % deviation phase (3)
% phase increment delta_phi: the phase difference between two adjacent
% frames for each bin, added to nominal phase of the bin
delta_phi = omega + phi_d;
phi0 = phi;                         % measured phase
%--------- Target Phase Calculation ---------
% implements time stretching by ratio Rs/Ra
phi_t = princarg(phi_t + delta_phi*tStretch);   % target phase

The phase unwrapping section was described earlier. The target phase calculation, however, will make use of the time stretching ratio. By scaling the phase increment with the stretch ratio, we calculate new target phase values that preserve the instantaneous frequency of each bin during resynthesis. This results in the same time stretching provided by the different Ra and Rs hop sizes, but the pitch of the original signal is retained.

Window size is important (WLen should be the same size as the FFT), and should ideally be selected based on the signal type (for example, a smaller window size for signals that have a significant transient component will allow rapid spectral changes to be tracked, at the expense of frequency precision: listen to 'EidolDrum-TimeStretch2-L1024-Ra128Rs256.wav'). Also, sufficient overlap must be preserved between window segments.

The synthesis hop size in relation to the window size is critical. For the moderate time stretching that was performed, a suitable analysis hop size/window length ratio was found to be:

$$R_a / \mathrm{WLen} = 1/8$$

If this ratio is too large, for example, and the signal is stretched to several times its length, a situation can arise where there is insufficient overlap to perform overlap adding in the resynthesis stage, resulting in a butchered output signal.
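To make the interaction between these pieces concrete, here is a minimal, self-contained MATLAB sketch of the time-stretching loop applied to a 500 Hz cosine. It is not the project's PVOCODER_FFT function: it assumes a periodic Hann window for both analysis and synthesis, defines princarg inline, does not normalize the overlap-add gain, and the test signal and remaining variable names are hypothetical. With Ra = WLen/8 and Rs = 2*Ra, the output should be roughly twice as long as the input while its dominant frequency stays near 500 Hz.

```matlab
fs = 8000; WLen = 256;
Ra = WLen/8; Rs = 2*Ra;                       % analysis / synthesis hop sizes
tStretch = Rs/Ra;                             % time stretch ratio (2 here)
princarg = @(p) mod(p + pi, 2*pi) - pi;       % wrap phase to [-pi, pi)

n = (0:4095)';
x = cos(2*pi*500*n/fs);                       % test input (assumption)
w = 0.5*(1 - cos(2*pi*(0:WLen-1)'/WLen));     % periodic Hann window (assumption)
omega = 2*pi*Ra*(0:WLen-1)'/WLen;             % nominal phase advance per analysis hop

numFrames = floor((length(x) - WLen)/Ra) + 1;
y = zeros((numFrames-1)*Rs + WLen, 1);        % stretched output
phi0  = zeros(WLen, 1);                       % previous measured phase
phi_t = zeros(WLen, 1);                       % accumulated target phase

for s = 0:numFrames-1
    % analysis: window, cyclic shift, FFT
    f = fft(fftshift(x(s*Ra + (1:WLen)').*w));
    r = abs(f); phi = angle(f);
    % phase unwrapping and target phase, as in the listing above
    phi_d = princarg(phi - phi0 - omega);
    delta_phi = omega + phi_d;
    phi0 = phi;
    phi_t = princarg(phi_t + delta_phi*tStretch);
    % resynthesis: IFFT, undo cyclic shift, window, overlap-add at hop Rs
    grain_t = fftshift(real(ifft(r.*exp(1i*phi_t)))).*w;
    idx = s*Rs + (1:WLen)';
    y(idx) = y(idx) + grain_t;
end

% locate the dominant frequency of the stretched output
Y = abs(fft(y));
[~, iPeak] = max(Y(1:floor(length(y)/2)));
fPeak = (iPeak - 1)*fs/length(y);
disp([length(y)/length(x) fPeak]);
```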

Figure 62: Time stretching on a cosine wave by a factor of 2. The envelope of the reconstructed signal is not ideal at the ends of the signal, as the original cosine wave was rectangular windowed before being applied to the phase vocoder. Overlap adding in the middle of the signal, however, is preserved. With real audio signals, we typically have a smoother transition to and from silence.

Figure 63: Phases near max amplitude bins for time stretching 2 cosine waves, 500Hz (top) and 2000Hz (bottom).

Figure 64: diner.wav time stretched by a factor of 2.

Figure 65: Closer look at the end of both input and output waveforms for diner.wav time stretched by a factor of 2.

7.2. Pitch Shifting

To perform pitch shifting without affecting the duration of the signal, a system of integrated resampling was combined with a linear interpolation scheme, as outlined in [1]. The two basic steps are as follows:

1. For each grain, a time stretching is performed with a stretch ratio of Rs/Ra. This changes the pitch and duration.

2. To retain the new pitch, but keep the original signal duration, each synthesis grain is resampled to a length $N_{FFT} \cdot R_a / R_s$, where $N_{FFT}$ is the FFT size. The integrated resampling is conducted using an interpolation scheme, and the resulting interpolated grains are overlap added in the time domain.

Figure 66: Overview of block-by-block pitch shifting with integrated resampling (an interpolation scheme) [1].

A linear interpolation scheme was used for integrated resampling, and has also found its way into alternative implementations of some of the other audio effects that were tested. Since pitch shifting involved many initializations for the interpolation scheme, and interpolation was performed in the resynthesis stage of the kernel algorithm, the entire kernel algorithm of the function pv_Pitchshift.m will be included here (the full phase vocoder function is included in the appendix). First, the pitch shifting specific initializations will be highlighted:

%--------------------------------------------------
% FX-specific initializations for pitch shifting
%--------------------------------------------------
tStretch = Rs/Ra                                 % time stretch ratio
%-------Linear Interpolation Parameters---------
Lresamp = floor(WLen/tStretch);                  % length of resampled/interpolated grain
nInterpSpace = linspace(0,Lresamp-1,Lresamp)';   % linearly spaced time column vector
nfracInterp = 1 + nInterpSpace*WLen/Lresamp;
nInterp0 = floor(nfracInterp);    % Lresamp length vector of sample integer
                                  % values between 1 and WLen
nInterp1 = nInterp0 + 1;          % Lresamp length vector of sample integer
                                  % values between 2 and WLen+1
frac0 = nfracInterp - nInterp0;   % fractional distances of integer samples
                                  % below interpolation points
frac1 = 1-frac0;                  % fractional distances of integer samples
                                  % above interpolation points
Output = zeros(Lresamp+Nx,1);     % initialize output vector (overlap-added
                                  % interpolated synthesis grains)

The analysis portion is shared with the original implementation that was outlined. In the frequency domain, identical processing is performed as with time stretching. However, the resynthesis portion introduces linear interpolation between successive grains, using the frac0 and frac1 fractional distances (which compare integer sample points to interpolation points) that are denoted above. The resynthesis portion of the pitch shifting kernel algorithm is included below:

%==========================================================
% Resynthesis Portion with Linear Interpolation
%==========================================================
ft = r.*exp(j*phi_t);                 % FFT with ith grain target phase
rt = abs(ft);                         % output amplitude
ModuliOut(1:win_end, i+1) = rt;       % store output moduli (same as input)
PhasesOut(1:win_end, i+1) = phi_t;    % build matrix of output phases
%-------------- Inverse FFT & Windowing ------------
tIFFT = fftshift(real(ifft(ft)));     % shifted IFFT
grain_t = tIFFT.*w2(1:win_end);       % inverse windowing (tapering)
%----------------- Interpolation --------------------
grain_t2 = [grain_t;0];               % pad w/ single zero to allow interpolation
                                      % between successive grains
% apply linear interpolation (integrated resampling):
grain_t3 = grain_t2(nInterp0).*frac1 + grain_t2(nInterp1).*frac0;
if (numWin_s <= 24)                   % plot this grain
    kernalPlot(grain_t3,WLen,i,numWin_s,grainFIGs);
end
%----------Overlap Adding of Resampled Grains---------
Output(vOut:vOut+Lresamp-1) = Output(vOut:vOut+Lresamp-1) + grain_t3;
vIn = vIn + Ra;                       % sample index for start of next block
vOut = vOut + Ra;
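The interpolation step can be exercised on its own. The sketch below applies the same index and weight construction to a synthetic grain and compares the result with MATLAB's interp1 (which is linear by default). The hop sizes and the test grain are illustrative assumptions, and the names grain2, resampled and reference are hypothetical.

```matlab
WLen = 256; Ra = 32; Rs = 48;                  % illustrative hop sizes (ratio 3/2)
tStretch = Rs/Ra;

Lresamp     = floor(WLen/tStretch);            % length of the resampled grain
nfracInterp = 1 + (0:Lresamp-1)'*WLen/Lresamp; % fractional read positions in the grain
nInterp0    = floor(nfracInterp);              % integer sample at or below each position
nInterp1    = nInterp0 + 1;                    % integer sample above each position
frac0       = nfracInterp - nInterp0;          % weight applied to the upper sample
frac1       = 1 - frac0;                       % weight applied to the lower sample

grain  = sin(2*pi*4*(0:WLen-1)'/WLen);         % synthetic grain (assumption)
grain2 = [grain; 0];                           % zero pad so that index WLen+1 is valid
resampled = grain2(nInterp0).*frac1 + grain2(nInterp1).*frac0;

% compare against the built-in linear interpolation
reference = interp1((1:WLen+1)', grain2, nfracInterp);
maxErr = max(abs(resampled - reference));
disp([Lresamp maxErr]);
```

Shortening each grain to Lresamp samples while keeping the overlap-add hop at Ra is what raises the pitch without changing the overall duration.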

A rudimentary harmonizer is built into the pv_Pitchshift.m function, which keeps the left channel of the original signal and combines it with a pitch shifted version of the signal, stored in the right channel. The harmonizer is not key centered; the signal is simply harmonized based on a specified interval. While this is not very useful in practice, it was interesting to experiment with adding specific harmonies to percussion lines and pitched instruments. Also, I did some experimentation with microtonal pitch shifting and harmonies. This, of course, is highly dependent on the frequency resolution used. The various intervals and input parameters are listed below for interest. Most of the audio examples are minor third and tritone intervals. Of course, few of the intervals are obtained exactly with the rounding scheme used. An exception is the perfect fifth, a ratio of 1.5.

%-----choose parameters by ~pitch shifting interval------
Ra = WLen/8;               % analysis hop size
%interval = 32805/32768;   % skhisma (results in bad approximation)
%interval = 2048/2025      % diaschisma
%interval = 81/80          % syntonic comma
%interval = 1/4            % 2 octaves below
%interval = 1/2            % octave below
%interval = 5/7            % tritone (septimal/Huygens) below
%interval = 3/4            % perfect fourth below
%interval = 5/6            % minor third below
%interval = 7/6            % augmented second (septimal minor 3rd)
%interval = 6/5            % minor 3rd
interval = 7/5             % tritone (septimal/Huygens)
%interval = 10/7           % diminished 5th
%interval = 3/2            % perfect 5th
%interval = 8/5            % minor 6th
%interval = 13/7           % tridecimal minor third
%interval = 7/4            % harmonic seventh (septimal min/subminor)
%interval = 16/9           % minor 7th
Rs = round(Ra*interval)    % synthesis hop size for ~tStretchRatio
tStretchRatio = Rs/Ra      % time stretching ratio
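How closely an interval is reproduced depends on the rounding of Rs. As an illustration, this MATLAB sketch computes the achieved ratio Rs/Ra and its error in cents for a few of the intervals listed above, assuming WLen = 2048 (so that Ra = 256); the variable names are hypothetical.

```matlab
WLen = 2048;                              % assumed window length
Ra   = WLen/8;                            % analysis hop size, as in the listing above

intervals = [5/6 6/5 7/5 3/2 16/9];       % a selection of the target intervals
Rs        = round(Ra*intervals);          % rounded synthesis hop sizes
achieved  = Rs/Ra;                        % ratios actually realized
centsErr  = 1200*log2(achieved./intervals);   % deviation from target in cents

% rows: target ratio, achieved ratio, error in cents
disp([intervals; achieved; centsErr]);
```

With Ra = 256, the perfect fifth is reproduced exactly, while the other intervals in this selection land within a few cents of their targets. A larger Ra reduces the rounding error, at the cost of less overlap for a given window length.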

%==============================================================
% Create Harmony (occurs after kernel loop of pv_Pitchshift.m)
%==============================================================
harmony = zeros(Ny,2);                      % stereo output file
% assign input to left channel and zero pad to output length
harmony(:,1)=[Input; zeros(Ny-Nx,1)]*0.9;
harmony(:,2)=y*0.9;                         % assign shifted signal to right channel
harmTAG = [wavfile(1:length(wavfile)-4),'-Harmony',num2str(tStretch)];
harmTAG = strcat(harmTAG,'-L',num2str(WLen)...
    ,'-Ra',num2str(Ra),'Rs',num2str(Rs),'.wav');   % for file naming
wavwrite(harmony,fs,harmTAG);

7.3. Stable/Transient Components Separation 

The separation of stable and transient components results not so much in a complete separation, but in two distinctively different sounds. The transient portion can be considered a ‘fractalization’ of the sound, and the stable component is an ‘etherization’ of the sound [1]. If the process is performed correctly, these two signals should be able to reproduce the original signal if added back together. The basic operation was achieved as follows:

1. The instantaneous frequency in each bin (the derivative of the phase with respect to time) was calculated.

2. The instantaneous frequency was checked against a threshold value, which was used to define whether or not it was in a stable range.

3. In the case of stable component preservation, bins defined as ‘stable’ were kept and used to reconstruct the stable part of the signal (the ethereal part). In the case of transient component preservation, the unstable components of the signal were kept for reconstruction (the fractal part).

The conditional part of this algorithm for defining ‘stability’ can be mathematically expressed as [1]:

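$$\left|\,\mathrm{princarg}\left[\phi(sR_a,k) - 2\,\phi((s-1)R_a,k) + \phi((s-2)R_a,k)\right]\right| < df \cdot R_a \qquad \text{(DAFX 8.47)} \qquad (7)$$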

What is happening here is that we are monitoring the instantaneous frequency over time. In the case of a pure sinusoid, the instantaneous frequency does not change with time (the phase remains 0); this is a stable signal. In the case of a snare drum blast, there is a rapid impulse of broadband energy that decays quickly; this is a transient, or unstable, component. Equation 7 maps to the diagram below, and to the phase unwrapping notation that was used earlier. The area enclosed by the angle df·Ra is where we define stability, which is calculated with reference to the expected target phase.

Figure 67: Defining the stable range for transient component separation [1].

The implementation of this algorithm in Matlab can be used to provide a less conceptual explanation. Referring to the kernel algorithm block diagram that was developed earlier, two parts of the code are modified: the FX-specific initializations, and the frequency domain processing:

%----------------------------------------------
% FX-specific initializations
%----------------------------------------------
df = weight*2*pi/WLen;    % preset threshold value (corresponds to (8.46), DAFX)
threshAngle = df*Ra;      % angular range about target phase defined as stable
phi1 = zeros(WLen,1);
phi2 = zeros(WLen,1);
grain = zeros(WLen,1);
Output = zeros(Nx,1);     % initialize output vector

%======================================================================
% Frequency Domain Processing (FX-specific for Transient-Stable separation)
%======================================================================
%-----Find Phase increment per sample for all frequency bins---
phi_d = princarg(phi-2*phi1+phi2);    % deviation phase
phi_t = phi;                          % target phase = measured phase

%-----Check if frequency within stable range for each bin------
% -if not stable & mode = 0, set amplitude to 0 in bin (keep stable)
% -if stable & mode = 1, set amplitude to 0 in bin (keep transients)
if mode == 0
    r = r.*(abs(phi_d) < threshAngle);     % remove transients, keep stable
else
    r = r.*(abs(phi_d) >= threshAngle);    % remove stable, keep transients
end
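The stability test can be tried in isolation on synthetic phase values. The following is a minimal sketch and not the project code: it assumes a `princarg` helper that wraps a phase into [-π, π) (the project's own princarg.m is not reproduced in this report), and the values of WLen, Ra, weight and the test frequencies are chosen purely for illustration.

```matlab
% Sketch: second-order phase difference as a stability measure.
% princarg here is an assumed helper that wraps phase into [-pi, pi).
princarg = @(p) mod(p + pi, 2*pi) - pi;

WLen = 1024;  Ra = 128;  weight = 2;    % illustrative values only
df = weight*2*pi/WLen;                  % threshold on frequency change (rad/sample)
threshAngle = df*Ra;                    % stable angular range (here pi/2)

% Phase of one bin over three frames for a steady sinusoid of normalized
% frequency w0 (rad/sample): the phase advances by w0*Ra on every hop.
w0   = 0.3;
phi2 = 0.7;                             % phase two frames ago (arbitrary)
phi1 = phi2 + w0*Ra;                    % phase one frame ago
phi  = phi1 + w0*Ra;                    % current phase
phi_d_stable = princarg(phi - 2*phi1 + phi2);      % approximately 0

% Same bin when the frequency jumps by 0.02 rad/sample over the last hop
phi_jump   = phi1 + (w0 + 0.02)*Ra;
phi_d_jump = princarg(phi_jump - 2*phi1 + phi2);   % 0.02*Ra = 2.56 rad

isStable     = abs(phi_d_stable) < threshAngle;    % true: bin kept in mode 0
isStableJump = abs(phi_d_jump)   < threshAngle;    % false: bin kept in mode 1
```

Because the deviation is wrapped, a frequency jump large enough to move the phase by close to a multiple of 2π over one hop would wrap back towards zero and be classed as stable, which may be one reason a clean separation is hard to obtain.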

The results of this effect were less inspiring than most of the other effects. It is rather difficult to obtain anything close to what one would expect as a clean separation of components, even for relatively simple signals. Regardless of the quality of the effect, however, this was a very interesting and productive exercise that yielded some unexpected results when exploring stability thresholds.

7.4.  Robotization 

Robotization is inherently a very simple effect as far as the phase vocoder is concerned. The basic effect has one operation in the frequency domain: the target phase in every bin of every FFT is set to zero. This forces the sound to become periodic, that is, mostly stable. The reason that we get periodicity is that each IFFT (synthesis grain) is essentially a burst of sound. When we overlap-add these pulses back into the time domain, the result is a periodic train of pulsed sounds. This will impose a pitch on the output signal, which is defined by the hop size Ra and the sampling rate:

$$f_0 = \frac{f_s}{R_a}$$

The result of robotization is a very inorganic, synthesized version of the input sound, lacking expression from the original performance. Transient based signals such as drums are completely transformed. For the basic implementation, the single line of code placed in the frequency domain processing portion of the kernel algorithm is:

%==========================================================
% Frequency Domain Processing (FX-specific for Roboto)
%==========================================================
%----Put 0 phase values on every FFT for robotization----
phi_t = 0;    % set target phase as zero
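Since one zero-phase grain is overlap-added every Ra samples, the imposed pitch is fs/Ra. The sketch below is a stripped-down illustration of the idea, not the project's pv_Robotize.m: the 1 kHz test tone, the signal length and the variable names are assumptions, and the plotting, zero padding and storage of moduli and phases from the kernel algorithm are left out.

```matlab
% Sketch: zero-phase robotization of a test tone (illustrative parameters)
fs = 44100;  WLen = 1024;  Ra = 128;
f0 = fs/Ra;                                  % pitch imposed by the hop size (Hz)

n = (0:fs/4-1)';                             % 0.25 s of signal
x = sin(2*pi*1000*n/fs);                     % hypothetical input: 1 kHz tone
w = 0.5*(1 - cos(2*pi*(0:WLen-1)'/WLen));    % Hann window
y = zeros(size(x));

for vIn = 1:Ra:(length(x) - WLen + 1)
    grain = x(vIn:vIn+WLen-1).*w;            % windowed analysis grain
    r = abs(fft(grain));                     % keep moduli, set all phases to zero
    g = fftshift(real(ifft(r))).*w;          % zero-phase grain, centred and tapered
    y(vIn:vIn+WLen-1) = y(vIn:vIn+WLen-1) + g;    % overlap-add
end
y = y/max(abs(y));                           % normalize output
```

With fs = 44,100 Hz and Ra = 128 the imposed pitch works out to about 344.5 Hz; halving the hop size raises it by an octave.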

However, we can expand on this. The basic Robotization scheme requires that Ra is an integer. For added flexibility, with the goal of calculating the hop size from a defined frequency value, linear interpolation was employed in the function pv_RobotoLinterp.m. In this way, audio was robotized using fractional hop size values. Referring to the long list of linear interpolation initializations provided in the pitch shifting code overview, the resynthesis portion becomes:

%==========================================================
% Resynthesis Portion with Linear Interpolation
%==========================================================
ft = r.*exp(j*phi_t);                 % FFT with ith grain target phase
rt = abs(ft);                         % output amplitude
ModuliOut(1:win_end, i+1) = rt;       % store output moduli (same as input)
PhasesOut(1:win_end, i+1) = phi_t;    % build matrix of output phases
%-------------- Inverse FFT & Windowing ------------
tIFFT = fftshift(real(ifft(ft)));     % shifted IFFT
grain_t = tIFFT.*w2(1:win_end);       % inverse windowing (tapering)
%----------------- Interpolation --------------------
grain_t2 = [grain_t;0];               % pad w/ single zero to allow interpolation
                                      % between successive grains
% apply linear interpolation (integrated resampling):
grain_t3 = grain_t2(nInterp0).*frac1 + grain_t2(nInterp1).*frac0;
if (numWin_s <= 24)                   % plot this grain
    kernalPlot(grain_t3,WLen,i,numWin_s,grainFIGs);
end
%----------Overlap Adding of Resampled Grains---------
Output(vOut:vOut+Lresamp-1) = Output(vOut:vOut+Lresamp-1) + grain_t3;
vIn = vIn + Ra;                       % sample index for start of next block
vOut = vOut + Ra;

Figure 68: Robotization applied to the input signal TranSiberianDrum.wav.

7.5.  Whisperization 

Whisperization is another effect that is simple to implement within the existing phase vocoder structure. The effect contrasts with Robotization, in that a random phase is imposed on each FFT frame before resynthesis. The effect removes periodicity (pitched components) and harmonic relationships from a signal. The degree to which a sound is “Whisperized” is controlled by the window length and hop size. For sufficient removal of nearly all pitched components, a window size of 512 or less and a hop size of WLen/8 was effective. The single FX processing operation required between the analysis and resynthesis stages of the outlined phase vocoder model is shown below.

%==========================================================
% Frequency Domain Processing (FX-specific for Whisperization)
%==========================================================
%----Randomize phase values on every FFT for whisperization----
phi_t = 2*pi*rand(WLen,1);    % set target phase as pseudorandom
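To see what this single line does to one frame, the following sketch randomizes the phase of a single pitched grain. It is only an illustration under assumed values (a 512-sample Hann-windowed test grain); it is not taken from pv_Demonize.m.

```matlab
% Sketch: random-phase resynthesis of one grain (illustrative values)
WLen = 512;
n = (0:WLen-1)';
w = 0.5*(1 - cos(2*pi*n/WLen));       % Hann window
grain = sin(2*pi*40*n/WLen).*w;       % hypothetical pitched grain

f  = fft(grain);
r  = abs(f);                          % moduli are kept unchanged
phi_t = 2*pi*rand(WLen,1);            % pseudorandom target phase for every bin
ft = r.*exp(1i*phi_t);                % same moduli, scrambled phases

% ft is no longer conjugate-symmetric, so its IFFT is complex;
% taking the real part discards the imaginary component.
grain_t = real(ifft(ft)).*w;          % tapered synthesis grain
```

The amplitude spectrum of each frame is untouched; only the phase relationships between bins, and between successive frames, are destroyed.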

An interesting effect was achieved by using whisperization on the audio file TranSiberianDrum.wav, a percussive audio sample that was already somewhat lacking in pitched components. The whisperization function is called pv_Demonize.m, as the “whispering” that is produced on vocals typically sounds sinister.

7.6.  Denoising

Denoising was originally attempted by applying a time domain compressor concept, but along the frequency bins of each FFT. The idea was that attack and release time could conceptually be replaced by gain ramps (in essence, resulting in a filter that is dynamically applied to scale the FFTs). The purpose would be to shape the spectrum when a defined threshold is reached in a particular range of bins, removing noise from the overall signal. This idea was simpler in concept than in application, and a working model was not successfully implemented.

Instead of a brute force (and highly computationally complex) denoising algorithm, a more elegant approach was taken, as suggested in [1]. A noise threshold was roughly tuned with a scalar coefficient (taken into the phase vocoder as an input parameter), and a nonlinear function of the FFT magnitudes and this coefficient was used as a simple noise gate. The basic frequency domain portion of the code is below, where:

- the FFT of the current frame is defined by f
- the FFT amplitude is defined by r = abs(f)
- the output FFT with noise gating applied is ft, which gets its IFFT taken in the resynthesis stage in preparation for OLA
- the nonlinear filter coefficient is coef [a typical value was found to be coef = 0.0002]

%==========================================================
% Frequency Domain Processing (FX-specific for denoising)
%==========================================================
%--------------Apply denoising-------------
r_mag = 2*r/WLen;                 % magnitude of f in quantity peak
ft = f.*r_mag./(r_mag + coef);    % nonlinear function for noise gate
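The behaviour of the gate is easiest to see from the per-bin gain r_mag/(r_mag + coef) on its own. The short sketch below evaluates it for a few magnitudes chosen relative to coef; the test magnitudes are illustrative, not measured values.

```matlab
% Sketch: per-bin gain of the nonlinear noise gate
coef  = 0.0002;                       % typical value quoted in the text
r_mag = coef*[0.01 0.1 1 10 100]';    % bin magnitudes relative to coef
gain  = r_mag./(r_mag + coef);        % factor applied to each FFT bin

% Bins far below coef are almost removed, a bin equal to coef is halved,
% and bins far above coef pass almost unchanged.
disp([r_mag gain]);
```

The gain rises smoothly from 0 towards 1 rather than switching at a hard threshold, so quiet partials of a rich timbre are attenuated along with the noise.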

Denoising modifies the timbre of a sound, typically by smoothing out the spectrum and removing a significant portion of the high frequency energy. This very simple algorithm is quite effective at removing noise; as a consequence, instruments with rich timbres undergo significant colouration. The audio examples provided are based more on the sound colouration attribute of the denoising than on the act of removing unwanted noise. In one example, the audio file Tristram2.wav receives denoising in an attempt to remove some guitar fret noise, which was masking the vibrato that was being applied on a double stop. Of course, the resulting audio has undergone a drastic change in spectral content, so this is not a truly practical application for the basic denoiser.

7.7.  Wha-Wha Filter in Freq Domain

Preliminary BPF Design 

To achieve a frequency domain implementation of a wha-wha filter using a phase vocoder, the bandpass filter from Assignment 2 was first redesigned to new, realistic specifications (based on Assignment 3). The specifications for the bandpass filter to be used to create a wha-wha effect were chosen to be:

- sampling rate fs = 44,100 Hz
- center frequency f1 = 44,100/64 = 689 Hz
- 3dB bandwidth B = 100 Hz
- implementation uses 2 poles and 2 zeros

The structure for the preliminary BPF design is based on the allpass filter (page 41 of DAFX). The center frequency is altered often enough to make a smooth sound without noticeable transients.

Figure 69 is the block diagram for a tuneable second order allpass filter, denoted A(z). From this structure, the tuning parameters c and d are used to implement a second order BPF.

Figure 69: Second-order allpass filter block diagram [1]

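For the allpass filter of Figure 69:

$$A(z) = \frac{-c + d(1-c)z^{-1} + z^{-2}}{1 + d(1-c)z^{-1} - cz^{-2}} \qquad (8)$$

For a second-order bandpass/bandreject filter (minus sign for bandpass, plus sign for bandreject):

$$H(z) = \frac{1}{2}\left[1 \mp A(z)\right] \qquad (9)$$

$$c = \frac{\tan(\pi B/f_s) - 1}{\tan(2\pi B/f_s) + 1} \qquad (10)$$

$$d = -\cos(2\pi f_c/f_s) \qquad (11)$$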

Figure 70: Second-order bandpass and bandreject implemented with allpass filter [1]

Therefore, the second order bandpass transfer function can be expressed as:

$$H(z) = \frac{1}{2}\left[1 - A(z)\right] \qquad (12)$$

- the cut-off frequency (phase = -180°, group delay is maximum) is controlled by the coefficient d
- the bandwidth is controlled by the coefficient c

Expanding this, we can find the a and b parametric filter coefficients. The $z^{-1}$ terms in the numerator cancel out, and the resulting form is analogous to the BPF seen in Assignment 2:

$$H(z) = \frac{1}{2}\cdot\frac{(1+c)\,(1 - z^{-2})}{1 + d(1-c)z^{-1} - cz^{-2}}$$

For constructing the filter in Matlab using the filter function, the coefficient arrays would be implemented as:

b = [(c+1) 0 -(c+1)];
a = 2*[1 d*(1-c) -c];

However, since the BPF will be implemented as a wha filter in the phase vocoder, it will be represented manually for convenient manipulation in the frequency domain:

kBins = linspace(0,fs/2,WLen/2 + 1);    % lin spaced freq bins up to Nyquist
z = exp(2*pi*kBins/fs*j);               % for manually expressing filter transfer function
% ... c and d defined in (10) and (11)
Num = (((c+1).*z.*z)-(c+1));
Den = 2*(z.*z + d*(1-c).*z - c);
H = Num./Den;

Figure 71: Preliminary BPF design.

The frequency domain implementation will apply the appropriate scaling to the amplitudes and phases (or optionally, only the amplitudes) of each FFT frame based on this filter shape, and move the center frequency with each successive window.
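As a quick check of the preliminary design, the transfer function can be evaluated on the FFT bin frequencies using the specifications listed earlier. This is a minimal sketch: the variable names Hpeak, kPeak and fPeak are introduced here for illustration, and WLen = 1024 is assumed.

```matlab
% Sketch: evaluate the preliminary BPF on the FFT bin frequencies
fs = 44100;  WLen = 1024;
f1 = fs/64;  B = 100;                           % specs from the text
d = -cos(2*pi*f1/fs);                           % sets the centre frequency
c = (tan(pi*B/fs) - 1)/(tan(2*pi*B/fs) + 1);    % sets the bandwidth

kBins = linspace(0, fs/2, WLen/2 + 1);          % bin frequencies up to Nyquist
z = exp(1i*2*pi*kBins/fs);                      % points on the unit circle
H = ((c+1).*z.^2 - (c+1)) ./ (2*(z.^2 + d*(1-c).*z - c));

[Hpeak, kPeak] = max(abs(H));                   % peak gain and its bin index
fPeak = kBins(kPeak);                           % frequency of the peak (Hz)
```

Because |A(z)| = 1 on the unit circle and A = -1 at the centre frequency, the bandpass gain is 1 at f1 and falls to 0 at DC and at Nyquist. Here f1 = fs/64 lands exactly on a bin (index 17 for WLen = 1024), so the sampled peak coincides with the design frequency.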

Frequency Domain Wha-Wha Implementation

%----------------------------------------------
% FX-specific initializations (wha filter)
%----------------------------------------------
whaSpeed = WLen/whaRate;             % Multiplier for filter sweep frequency
kBins = linspace(0,fs/2,WLen);       % linear spaced freq bins up to Nyquist
z = exp(2*pi*kBins/fs*j);            % for manually expressing filter transfer function
Hframes = zeros(WLen,numWin_a+1);    % Hframes(i*Ra,k) to store each block's filter H(k)
Output = zeros(Nx,1);                % initialize output vector

%==========================================================
% Frequency Domain Processing (FX-specific for Wha BPF)
%==========================================================
fc = f1*(1 + sweepRange*cos(2*pi*whaSpeed*i/fs));    % change center freq
%-----------------------------------------------
% Build Transfer Function w/ Filter Coefficients
% allpass filter form (Pg 41-43, DAFX)
d = -cos(2*pi*fc/fs);                          % apply new f_c to d param
c = (tan(pi*B/fs)-1)/(tan(2*pi*B/fs)+1);       % controls the bandwidth
Num = (((c+1).*z.*z)-(c+1));                   % b = [(c+1) 0 -(c+1)]
Den = 2*(z.*z + d*(1-c).*z - c);               % a = 2*[1 d*(1-c) -c]
H = Num./Den;                                  % BPF transfer function H(k)
Hframes(1:win_end, i+1) = H;                   % store filter H(k) in Hframes(k,i*Ra)
%------------------------------------------------------
% Filter this block by multiplication with its FFT
% choose to keep current phase, or let filter change phase response
keepPhase = 0;
if keepPhase == 1
    rt = r.*abs(H.');          % scale amplitudes with filter
    phi_t = phi;               % keep original phase
    ft = rt.*exp(j*phi_t);     % FFT filtered with ith grain BPF
else
    ft = f.*H.';               % filter the current block (.' transposes without conjugating H)
    phi_t = angle(ft);         % output phase
    rt = abs(ft);              % output amplitude
end
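The sweep expression can be examined on its own by evaluating fc over a run of frame indices. The sketch below assumes the hop size Ra = WLen/8 used in Appendix B and the 'rate8rng0.8' settings that appear in the example file names; numFrames and sweepHz are names introduced only for this illustration.

```matlab
% Sketch: centre-frequency trajectory of the wha sweep
fs = 44100;  WLen = 1024;  Ra = WLen/8;      % Ra = WLen/8 as in Appendix B
f1 = fs/64;  whaRate = 8;  sweepRange = 0.8;
whaSpeed = WLen/whaRate;

numFrames = 2000;                            % illustrative number of frames
i  = (0:numFrames-1)';                       % frame index
fc = f1*(1 + sweepRange*cos(2*pi*whaSpeed*i/fs));
d  = -cos(2*pi*fc/fs);                       % per-frame filter parameter

% Frame i starts at time t = i*Ra/fs, so the cosine argument equals
% 2*pi*(whaSpeed/Ra)*t and the sweep repeats whaSpeed/Ra times per second.
sweepHz = whaSpeed/Ra;
fcRange = [min(fc) max(fc)];                 % stays within f1*(1 -/+ sweepRange)
```

With Ra = WLen/8 the sweep frequency is 8/whaRate Hz whatever the window size, which is consistent with whaRate controlling the rate independently of WLen. For sweepRange greater than 1 the expression for fc goes negative over part of each cycle; since d depends on fc only through a cosine, the filter then behaves as if centred at |fc|.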

Figure 72: Amplitude waterfall of Wha filter applied to white noise.

Figure 73: Magnitude waterfall of Wha filter applied to white noise; log frequency scale (top), linear freq scale (bottom). Axis with label not visible is freq. Output file: whitenoise-Wha-fc689B100rate8rng1-L1024-Ra128Rs128.wav

Figure 74: Input (top) and output (bottom) magnitude waterfall of flute.wav. Output file is flute2-Wha-fc689B100rate8rng0.8-L1024-Ra128Rs128.wav. Window size is 1024 samples.

Comparing Figure 74 with Figure 75 shows the effect of window size on frequency resolution. The harmonics in the flute spectrum are better represented with the larger FFT size of 4096. With a smaller window size, we get smoother tracking for the wha sweeps, as there are smaller increments in filter center frequency due to more frequent windowing and more FFTs. However, with a larger window size, the bandpass filter specs are better approximated due to the higher frequency resolution (more frequency bins in each FFT, with closer spacing between bins). Without applying an interpolation scheme, however, a wide sweeping wha filter with a 4096 window size will not sound as smooth to the listener’s ear. This is due to less frequent (and as a result, larger) increments of the BPF center frequency.
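The trade-off can be put into numbers. The sketch below assumes fs = 44,100 Hz, the 100 Hz bandwidth from the filter specification, and a hop size of WLen/8 as set at the top of the Appendix B listing; the variable names are illustrative.

```matlab
% Sketch: frequency resolution versus filter update interval
fs   = 44100;
WLen = [1024 4096];             % the two window sizes compared
binSpacing = fs./WLen;          % Hz between adjacent FFT bins
binsInBand = 100./binSpacing;   % bins spanning the 100 Hz bandwidth B
hop        = WLen/8;            % assumed hop size Ra = WLen/8
updateMs   = 1000*hop/fs;       % ms between centre-frequency updates

disp([WLen' binSpacing' binsInBand' updateMs']);
```

Under these assumptions the 4096-point window places about four times as many bins inside the pass band (roughly 9 instead of 2), but the center frequency is updated only a quarter as often (roughly every 11.6 ms instead of every 2.9 ms).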

Figure 75: Input (top) and output (bottom) magnitude waterfall of flute.wav. Output file is flute2-Wha-fc689B100rate8rng0.8-L4096-Ra128Rs128.wav. Window size is 4096 samples.

Figure 76: flute2.wav I/O of pv_whaBPF for window size 4096 (left) and 1024 (right)

Figure 77: Wha filtering on TyrSmidir1.wav. Output audio file is TyrSmidur1-Whaf1689B689spd130rng0.8-L1024-Ra128Rs128.wav.

Figure 78: Magnitude waterfall plots on log freq scale (input top, output bottom) of Wha filtering on TyrSmidir1.wav. Output audio file is TyrSmidur1-Whaf1689B689spd130rng0.8-L1024-Ra128Rs128.wav.

Applying the wha filter implementation to TyrSmidir1.wav produced interesting results, as there is a timbre change when the distorted guitars come in. Since there is more high frequency noise after this mark, we can hear the wha filter bringing out distortion harmonics in the upper end of the wha sweep range. This is particularly noticeable with a large window size, as the frequency resolution is higher and therefore the filter pass band is narrower. The effect of bringing out harmonics is made more dramatic with a wider sweep that reaches into the higher frequency part of the spectrum (reaching over 8 kHz, for example) or a higher default center frequency.

Figure 79: Wha filtering on TyrSmidir1.wav. Output audio file is TyrSmidur1-Wha-fc689B100spd520rng5-L1024-Ra128Rs128.wav.

8. Audio Compression Using Phase Vocoder

Two different algorithms for data compression were used in the frequency domain. Aside from one initialization, as documented below, the entire code fits between the analysis and synthesis portions of the phase vocoder kernel algorithm that was outlined earlier. The two methods of compression are discussed on the following pages.

Figure 80: Frequency domain processing portion for data compression. Occurs between the Analysis and Synthesis stages of the phase vocoder. Mode = 0 corresponds to compression based on a Threshold; Mode = 1 corresponds to compression based on a specified number of bins to keep.

8.1. Data Compression using a Threshold Amplitude for Eliminating Bins

Data compression on audio was first implemented by setting all frequency components (corresponding to the bins of the FFT) below a threshold amplitude to 0. Higher threshold values correspond to a higher level of compression. The threshold is expressed relative to each frame: a threshold of 1 corresponds to the maximum amplitude value of that particular windowed FFT. With a threshold of 1, every component of the signal that is kept is one of the maximum amplitude values in its FFT. With a threshold of 0, the output is identical to the input. This technique required a hop size of 1/8 of the window length to produce less than atrocious results. With medium levels of compression using this method (a threshold value of about 0.5), the reconstructed signal has significant noise components. With very high levels of compression, however (thresholds closer to 1), the output sound consisted primarily of the fundamental, harmonics, and other key frequencies (such as those in the formants for vocals). This method produces unpredictable results, as each successive frame will often be represented by a different number of bins. Over the duration of several frames, it is possible to jump from a signal represented by many bins to a single sinusoid.
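The thresholding step on a single frame can be sketched as follows. This is an illustration under assumed values rather than the code of pv_DataComp.m (which appears only as Figure 80): the test grain is a pair of bin-centred cosines with no window applied, and thresh, keep and nKept are hypothetical names.

```matlab
% Sketch: threshold-based bin elimination on one frame
WLen = 1024;  thresh = 0.5;            % threshold relative to the frame maximum
n = (0:WLen-1)';
grain = cos(2*pi*50*n/WLen) + 0.3*cos(2*pi*120*n/WLen);   % strong + weak partial

f = fft(grain);
r = abs(f);                            % bin amplitudes
keep = r >= thresh*max(r);             % logical mask of bins to keep
ft = f.*keep;                          % bins below the threshold set to 0
y  = real(ifft(ft));                   % frame rebuilt from the kept bins
nKept = nnz(keep);                     % number of bins that survive
```

In this example only the two bins of the strong partial (positive and negative frequency) survive, so the weaker partial disappears completely. The number of surviving bins depends entirely on the content of each frame, which is the source of the frame-to-frame inconsistency described above.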

Figure 81: Compression using a threshold of 0.5 times the maximum FFT amplitude of each frame on Diner.wav. Output audio is diner-DataCompThresh0.2-L4096-Ra512Rs512. The large difference in signal envelope scaling is due primarily to normalization of the output signal to the peak level of a noise artefact in the waveform. FFT size is 4096.

Figure 82: Amplitude waterfall representations (input top, output bottom) of compression using a threshold of 0.5 times the maximum FFT amplitude of each frame. Output audio is Tristram2-DataCompThresh0.5-L4096-Ra512Rs512.wav. FFT size is 4096.

Figure 83: Magnitude waterfall representations (input top, output bottom) of compression using a threshold of 0.5 times the maximum FFT amplitude of each frame. Output audio is Tristram2-DataCompThresh0.5-L4096-Ra512Rs512.wav. FFT size is 4096.

8.2.  Data Compression by Keeping N Strongest Frequency Components

By keeping only a specified number of bins, Nkeep, which have the highest amplitude values in each FFT, a more consistent and slightly better form of compression was obtained. Nkeep was chosen based on a scalar value that we can call keepRatio, which is multiplied by the window size. Alternatively, the number of bins to keep can be manually entered, but this is less consistent if window sizes are being changed. The acceptable values of floor(keepRatio*WLen), those that compress the audio without creating noticeable artefacts, vary. The value depends primarily on the timbre of the audio. For example, a single sinusoid will sound fine if it is represented by a single frequency bin (in this case, keepRatio = 1/WLen). For most audio, the minimum value of keepRatio for a reasonable quality compressed reproduction was found to be 0.7. If it is set to be less than 0.7, audible artefacts become apparent.

True perceptual encoders can achieve much higher ratios than this by exploiting the masking effects of nearby frequency bins and applying the most compression at frequencies where we have the least sensitive hearing. Additionally, perceptual encoders exploit pre and post masking techniques to hide quantization noise around transients. However, the primitive model used here did achieve success with keepRatio values as low as 0.7, which I did not expect. Most of the frequency bins that were scrapped were in the high end of the spectrum, at the edge or outside of our hearing range. As soon as frequency bins that are well perceived by human hearing are cut out, artefacts are noticeable. One reason that these artefacts sound unpleasant is that the subtle richness of the timbre is being removed, but some of the high frequency harmonics are kept, sticking out in the spectrum. An alternative (but still simple) scheme is to scale the weaker bins to lesser values, and allocate fewer bits to them.

A potentially better scheme is to simply allocate fewer bits to these values by increasing the discrete step between quantization levels, rather than truncating the dynamic range. For example, in the extreme case, a frequency bin that is rather unimportant for human perception could have 2 values, on (1) and off (0).
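Selecting the Nkeep strongest bins of one frame can be sketched with a descending sort of the bin amplitudes. As before, this is an assumed illustration and not the pv_DataComp.m code: the test grain and the names idx and keep are hypothetical.

```matlab
% Sketch: keep only the Nkeep strongest bins of one frame
WLen = 1024;  keepRatio = 0.7;
Nkeep = floor(keepRatio*WLen);           % number of bins to keep (716 here)

n = (0:WLen-1)';
w = 0.5*(1 - cos(2*pi*n/WLen));          % Hann window
grain = (sin(2*pi*50*n/WLen) + 0.1*sin(2*pi*333.3*n/WLen)).*w;   % test grain

f = fft(grain);
r = abs(f);                              % bin amplitudes
[~, idx] = sort(r, 'descend');           % bin indices, strongest first
keep = false(WLen,1);
keep(idx(1:Nkeep)) = true;               % mark the Nkeep strongest bins
ft = f.*keep;                            % all other bins set to 0
y  = real(ifft(ft));                     % real part: ties can break conjugate symmetry
```

Unlike the threshold method, every frame is represented by exactly Nkeep bins, which is why the results are more consistent from frame to frame.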

Figure 84: Data compression on Tristram2.wav, keeping 0.7*WLen frequency bins (Len = 1024)

Figure 85: Input (top) and Output (bottom) amplitude waterfalls for data compression on Tristram2.wav, keeping 0.7*WLen frequency bins (Len = 1024). The amplitude spectrum appears unchanged (even with somewhat higher levels of compression), as there is less energy in the frequencies where compression occurred (high frequencies). Output file is Tristram2-DataCompN716.8-L1024-Ra128Rs128.wav

Figure 86: Input (top) and Output (bottom) Magnitude waterfalls for data compression on Tristram2.wav, keeping 0.2*WLen frequency bins (Len = 1024). The upper end of the magnitude spectrum shows that many bins have been set to zero.

Figure 87: Input (top) and Output (bottom) amplitude waterfalls for data compression on diner.wav, using floor(0.1*WLen) frequency bins (Len = 1024).

9. Conclusions

A robust phase vocoder model, outlined in the block diagram of Figure 1, was implemented based on the block-by-block FFT approach in [1]. The operation of the phase vocoder evolved over time; the last two implementations (wha filter and data compression) represent the most up to date version of the phase vocoder framework. These phase vocoders call the function waterfall_Plot.m, according to their input parameters, to produce time-frequency representations of the input and output audio or vectors. This functionality can easily be extended to the other phase vocoders, which were implemented with an older version of the waterfall plotting function. The pitch shifting implementation, on the other hand, is the most full featured of the basic effects, incorporating linear interpolation and a rudimentary harmonizer.

This project has helped develop my practical and theoretical experience with digital audio effects, and DSP in general. The time-frequency representation functions that were developed for this phase vocoder have proven to be useful tools outside of the project, for analysis of audio signals and of signals in general. Had additional time been available, many of the effects that were implemented could have been expanded. With the fundamental framework in place, creative effects that are based on frequency-domain processing can be designed and easily integrated into the phase vocoder.

REFERENCES

[1] Udo Zölzer, DAFX. John Wiley & Sons, 2002.

[2] A. Götzen, N. Bernardini, D. Arfib, “Traditional (?) Implementations of a Phase-Vocoder: The Tricks of the Trade,” COST G-6 Conference Proceedings (DAFX-00), 2000.

APPENDIX A – List of MATLAB files

LIST OF MATLAB FILES

M-FILE DESCRIPTION

PhaseVocode_wav.m Tests implementations on audio

PhaseVocode_vec.m Tests implementations on vectors

pVocoder_FFT.m Original I/O operation

pv_Timestretch.m Time Stretching

pv_Pitchshift.m Pitch Shifting/Basic Harmonizer

pv_Transient.m Stable/Transient component separation

pv_Robotize.m Basic robotization

pv_RobotizeLinterp.m Robotization with linear interpolation

pv_Demonize.m Whisperization

pv_Denoise.m Denoising

pv_WhaBPF.m Wha wha filter (freq domain implementation)

pv_DataComp.m Audio data compression experimentation

kernalPlot.m Plots analysis grains

waterfall_Plot.m Plots time-frequency representations

waterfall_surf.m Plots time-frequency representations (old version)

design_whaBPF.m Bandpass filter plots for pv_WhaBPF

princarg.m Computes principal argument for phase unwrapping

APPENDIX B – Two Example Implementations

1. pv_WhaBPF.m

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% pv_WhaBPF                                    Author: Tim Perry
% Elec484: DAFX                                V00213455
% Final Project, Phase 1                       2009-07-18
%
% FFT/IFFT implementation of the Phase Vocoder (Block-by-Block Approach)
% to be used for implementing a wha-wha bandpass filter in the freq domain.
%   -based on concept of the direct FFT/IFFT method used in Zolzer's DAFX
%
% [y,ModuliIn,PhasesIn,ModuliOut,PhasesOut,waterFIG_i,waterFIG_o]...
%       =PV_WHABPF(x,WLen,f1,B,whaRate,sweepRange,TAG,waterplot,plotCODE)
%
% x = input vector or .wav file
% WLen = analysis & synthesis window size
% f1 = Default center freq (sweep fc w/ respect to f1) (ex. 689 Hz)
% B - 3dB bandwidth (recommend 100 Hz)
% whaRate - Controls wha-wha rate independent of WLen
% sweepRange - Multiplier for range of filter sweep
% TAG - 'String-For-Naming-Plots'
% waterplot = [0 0 0] to plot no time-freq representations
%             [1 0 0] to plot input waterfalls on lin freq scale
%             [0 1 1] to plot output waterfalls on log scale
%             [1 1 1] to plot I/O waterfalls on log scale, etc.
% plotCODE - [1 1 1 1] plots Amp, Phase, Mag, Phase @ maxAmp Bins
%          - [0 0 0 0] plots nothing
%
% y = output vector
% ModuliIn,ModuliOut: input & output moduli (amplitude) matrices
% PhaseIn,PhaseOut: input & output phase matrices
% waterFIG_i = Waterfall fig handles for input.
%              [waterAmpFIG waterPhaseFIG waterMagFIG pMaxBinFIG]
% waterFIG_o = Waterfall fig handles but for output.
%
% FILTER SPECS:
%   sampling rate fs = 44,100 Hz
%   center frequency example: f_1 = 44,100/64 = 689 Hz.
%   3dB bandwidth example: B = 100 Hz
%   implementation uses 2 poles and 2 zeros
%
% REFERENCES:
%   [1] Udo Zölzer, DAFX. John Wiley & Sons, 2002.
%   [2] A. Götzen, N. Bernardini, D. Arfib, 2000. (see PDF documentation)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,ModuliIn,PhasesIn,ModuliOut,PhasesOut,waterFIG_i,waterFIG_o]...
    = pv_WhaBPF(x,WLen,f1,B,whaRate,sweepRange,TAG,waterplot,plotCODE)
% x = 'white_noise.wav'; %for testing
% WLen = 1024;
% TAG = 'WhaTest';
% waterplot = 0;
Ra = WLen/8;    % analysis hop size
Rs = Ra;        % synthesis hop size

%==============================================================
% Input processing & zero padding
%==============================================================
figNum = get(0,'CurrentFigure');
%------------------------check input type--------------------
[xtype xinfo] = wavfinfo(x);    % input data info
disp(xinfo)
if length(xtype) == 0
    iswav = 0;                  % vector input
    fs = 8000;
    grainFIGa = figure('Name','Analysis Grains (kernalPlot)', ...
        'Position',[100,100,1000,800]);
    clear figure(figNum);
    figNum = figNum + 1;
    grainFIGs = figure('Name','Synthesis Grains (kernalPlot)', ...
        'Position',[100,100,1000,800]);
    clear figure(figNum);
    figNum = figNum + 1;
else
    iswav = 1;                  % wav file input
    wavfile = x;
    TAG=strcat(wavfile(1:length(wavfile)-4),'-',TAG);
    [x fs nbits] = wavread(wavfile);    % read input audio
end
%------use left channel for mono treatment of stereo input-----
x = x(:,1);                     % left channel (col vector)
N_orig = length(x);             % original length of input vector
if size(x,2)>1,                 % ensure column vectors for matrix operations
    x=x';
end
%----------------zero-pad Input----------------------
% WLen zeros before x,
% WLen-(N_orig-n.*Ra) zeros after x, where n = floor(X./Y)
Input = [zeros(WLen, 1); x; zeros(WLen-mod(N_orig,Ra),1)];
Input = Input/max(abs(x));      % normalize Input
Nx = length(Input);             % zero-padded input length
fileTAG = strcat(TAG,'-L',num2str(WLen)...
    ,'-Ra',num2str(Ra),'Rs',num2str(Rs),'.wav');    % for file naming
plotTAG = strcat(TAG,': WLen=',num2str(WLen)...
    ,' Ra=',num2str(Ra),' Rs=',num2str(Rs));        % for plot naming
%HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
%==============================================================
% Create framing window (modified hanning window for OLA [2])
% w = [0, k_1, k_2,..., k_n-1 = k1]
%==============================================================

w1 = 0.5*(1-cos(2*pi*(0:WLen-1)'/(WLen)));    % analysis window
w2 = w1;                                      % synthesis window
%winview = wvtool(w1,w2);                     % plot windows
numWin_a = ceil((Nx-WLen+Ra)/Ra);             % # of analysis windows
numWin_s = ceil((Nx-WLen+Rs)/Rs);
%==============================================================
% Initializations for Kernel Algorithm
%==============================================================
%----------------------------------------------
% FX-specific initializations (wha filter)
%----------------------------------------------
% f1 = fs/64;        % Default center frequency (sweep fc with respect to f1)
% B = 100;           % 3dB bandwidth is 100 Hz
% whaSpeed = 50;     % Multiplier for filter sweep frequency
% sweepRange = 3;    % Multiplier for range of filter sweep
whaSpeed = WLen/whaRate;              % Multiplier for filter sweep frequency
kBins = linspace(0,fs/2,WLen);        % lin spaced freq bins up to Nyquist
z=exp(2*pi*kBins/fs*j);               % for manually expressing filter transfer function
Hframes = zeros(WLen, numWin_a+1);    % Hframes(i*Ra,k) to store each block's filter H(k)
Output = zeros(Nx,1);                 % initialize output vector
%----------------------------------------------
% General PV initializations
%----------------------------------------------
%omega = 2*pi*Ra*[0:WLen-1]'/WLen;    % nominal phase increment for Ra
%phi0 = zeros(WLen,1);                % previous measured phase
phi_t = zeros(WLen,1);                % target phase
ModuliIn = zeros(WLen, numWin_a+1);
PhasesIn = ModuliIn;
ModuliOut = zeros(WLen, numWin_s+1);
PhasesOut = ModuliOut;
nPlotIndices_a = zeros(numWin_a+1,1); % to store sample indexes for plots
nPlotIndices_s = zeros(numWin_s+1,1);
tic
%HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
%=======================================================================
%------------------------Kernel Algorithm-----------------------
% -performs FFTs, IFFTs, and overlap-add of successive grains
% -implements analysis and resynthesis frame-by-frame
% -stores phase and moduli for input and output signals
%=======================================================================
vIn = 1;     % analysis sample index @ frame start
vOut = 1;    % synthesis sample index @ frame start
for i = 0:numWin_a - 2    % processing on ith frame (i = 0:floor((Nx - WLen)/Ra))
    %==========================================================

    % Analysis Portion
    %==========================================================
    if ((vIn + WLen - 1) >= Nx)
        frame_end = Nx - vIn;       % end index of ith frame (if last)
    else
        frame_end = WLen - 1;       % end index offset of ith frame
    end
    win_end = frame_end + 1;        % end index offset of window
    %--------Window input, forming kernel--------
    grain = Input(vIn : vIn + frame_end).*w1(1:win_end);
    %grain = [zeros(1.5*WLen, 1); grain; zeros(1.5*WLen, 1)];
    % zero pad analysis grain for greater frequency precision
    if (numWin_a <= 24)             % plot this grain
        kernalPlot(grain,WLen,i,numWin_a,grainFIGa);
    end
    %--------FFT on circular shifted grain --------
    f = fft(fftshift(grain));       % FFT of ith grain
    r = abs(f);                     % amplitude
    phi = angle(f);                 % phase
    %----------Store analysis results-----------
    nPlotIndices_a(i+1) = vIn + frame_end;    % store sample index for FFT frame plot
    ModuliIn(1:win_end, i+1) = r;             % store analysis results
    PhasesIn(1:win_end, i+1) = phi;
    %==========================================================
    % Frequency Domain Processing (FX-specific for Wha BPF)
    %==========================================================
    fc = f1*(1 + sweepRange*cos(2*pi*whaSpeed*i/fs));    % change center freq
    %-----------------------------------------------
    % Build Transfer Function w/ Filter Coefficients
    % allpass filter form (Pg 41-43, DAFX)
    d = -cos(2*pi*fc/fs);                        % apply new f_c to d param
    c = (tan(pi*B/fs)-1)/(tan(2*pi*B/fs)+1);     % controls the bandwidth
    Num = (((c+1).*z.*z)-(c+1));                 % b = [(c+1) 0 -(c+1)]
    Den = 2*(z.*z + d*(1-c).*z - c);             % a = 2*[1 d*(1-c) -c]
    H = Num./Den;                                % BPF transfer function H(k)
    % Hframes(1:win_end, i+1) = H;               % store filter H(k) in Hframes(k,i*Ra)
    %------------------------------------------------------
    % Filter this block by multiplication with its FFT
    % choose to keep current phase, or let filter change phase response
    keepPhase = 0;
    if keepPhase == 1
        rt = r.*abs(H.');           % scale amplitudes with filter
        phi_t = phi;                % keep original phase
        ft = rt.*exp(j*phi_t);      % FFT filtered with ith grain BPF
    else
        ft = f.*H.';                % filter the current block (.' transposes without conjugating H)
        phi_t = angle(ft);          % output phase
        rt = abs(ft);               % output amplitude
    end

    %==========================================================
    % Resynthesis Portion
    %==========================================================
    ModuliOut(1:win_end, i+1) = rt;     % store output moduli (filtered)
    PhasesOut(1:win_end, i+1) = phi_t;  % build matrix of output phases

    %----------- Inverse FFT & Windowing ---------
    tIFFT = fftshift(real(ifft(ft)));   % shifted IFFT
    grain_t = tIFFT.*w2(1:win_end);     % inverse windowing (tapering)

    %------------ Overlap Adding ---------------
    Output(vOut:vOut+frame_end) = Output(vOut:vOut+frame_end) + grain_t;
    vIn = vIn + Ra;                     % sample index for start of next block
    vOut = vOut + Rs;
end
%HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
toc

%======================================================================
%------------ Output Processing & Plotting------------------
%======================================================================
y = Output/max(abs(Output));            % normalize output vector
plotTAG_in = ['INPUT for', plotTAG];
plotTAG_out = ['OUTPUT', plotTAG];

if iswav == 1
    %=====================================================
    % Time Domain Plots for .WAV Input
    %=====================================================
    Ny = length(y);
    Ts = 1/fs;                          % sampling period
    nT_in = (0:Nx-1)*Ts;                % time vector (zero-padded input)
    nT_out = (0:Ny-1)*Ts;               % time vector (output)
    axis_time = [0, max(Nx,Ny), -1.2, 1.2];
    figure('Name','I/O');
    clear figure(figNum);
    figNum = figNum + 1;
    colordef white;

    %-------input (integer samples/cycle)---------
    subplot(2,1,1)
    hold on;
    plot(nT_in*fs,Input);
    %plot(nT_in*fs,Input, 'b.');
    %stem(nT(1:WLen2)*fs,x2(1:WLen2),'.','MarkerSize',13);
    grid on;

    axis(axis_time);
    title(['Input x[n] (', wavfile, ') , normalized & padded']);
    ylabel('x[n]')
    xlabel('n (samples)')
    hold off;

    %-------output (integer samples/cycle)---------
    subplot(2,1,2)
    hold on;
    plot(nT_out*fs,y, 'r');
    %plot(nT_out*fs,y, 'r.');
    %stem(nT(1:WLen2)*fs,x2(1:WLen2),'.','MarkerSize',13);
    grid on;
    axis(axis_time)
    title(['Output y[n] (', plotTAG, ') , normalized']);
    ylabel('y[n]')
    xlabel('n (samples)')
    hold off;

    %==================================================================
    % Time-Frequency Plots
    % Plots the following:
    %   - Amplitude & Phase Waterfalls
    %   - Magnitude Waterfall (dB scale)
    %   - Phase vs. Time plot of Freq Bins near max amplitude
    %     (in the case of a single tone sinusoid, will be centered
    %     around pitch)
    %==================================================================
    freqScale = waterplot(3);       % linear or log frequency scale
    %plotCODE = [1 1 1 0];          % plot amp, phase, and mag waterfalls
    if waterplot(1) == 1            % input waterfall plots
        waterFIG_i = waterfall_Plot(ModuliIn,PhasesIn,nPlotIndices_a,...
            fs,WLen,Ra,Nx,plotCODE,freqScale,wavfile);
    end
    if waterplot(2) == 1            % output waterfall plots
        waterFIG_o = waterfall_Plot(ModuliOut,PhasesOut,nPlotIndices_a,...
            fs,WLen,Ra,Ny,plotCODE,freqScale,plotTAG);
    end

    %==============================================================
    % Output Audio File
    %==============================================================
    y = y*0.999;        % lower level a bit more to remove wavwrite clip warning
    x = x/max(abs(x));  % normalize input for comparison playback
    wavwrite(y, fs, nbits, fileTAG);    % write output audio file
    %wavplay(x, fs);                    % play PV input
    wavplay(y, fs);                     % play PV output
%HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
else
    %======================================================
    % Time Domain Plots for Vector Input
    %======================================================
    Ny = length(y);
    Nsamples_in = linspace(0,Nx,Nx);
    Nsamples_out = linspace(0,Ny,Ny);

    axis_time = [0, max(Nx,Ny), -1.2, 1.2];
    figure (figNum);
    clear figure(figNum);

    %------------normalized input plot--------------
    subplot(2,1,1);
    hold on;
    plot(Nsamples_in, Input,'b-');
    stem(Nsamples_in, Input,'.','b-','MarkerSize',9);
    axis(axis_time);
    grid on;
    title(['Input x[n] (', plotTAG, ') , normalized']);
    ylabel('x[n]');
    xlabel('n');
    hold off;

    %------------normalized output plot--------------
    subplot(2,1,2);
    hold on;
    plot(Nsamples_out, y,'b-');
    stem(Nsamples_out, y,'.','b-','MarkerSize',9);
    axis(axis_time);
    grid on;
    title(['Output y[n] (', plotTAG, ') , normalized']);
    ylabel('y[n]');
    xlabel('n');
    hold off;

    %==================================================================
    % Time-Frequency Plots for vector input
    % Plots the following:
    %   - Amplitude & Phase Waterfalls
    %   - Magnitude Waterfall (dB scale)
    %   - Phase vs. Time plot of Freq Bins near max amplitude
    %     (in the case of a single tone sinusoid, will be centered
    %     around pitch)
    %==================================================================
    if waterplot == 1
        waterfallFIG_in = waterfall_vecPlot(ModuliIn,PhasesIn,nPlotIndices_a,fs,WLen,Ra,Rs,Nx,plotTAG_in);
        waterfallFIG_out = waterfall_vecPlot(ModuliOut,PhasesOut,nPlotIndices_a,fs,WLen,Ra,Rs,Ny,plotTAG_out);
    end
end
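
The band-pass transfer function built in the frequency domain processing step of the listing above can be checked in isolation. The following is a minimal sketch rather than part of the project code: it evaluates the same Num and Den expressions on the FFT bin frequencies, under the assumption that z is the unit-circle vector exp(j*2*pi*k/WLen) (this part of the listing does not show how z is defined). The values chosen for fs, WLen, fc and B are illustrative only.

```matlab
% Sketch only: evaluate the allpass-based BPF on the FFT bins.
% ASSUMPTIONS (not taken from the project code): z is the unit-circle
% vector exp(j*2*pi*k/WLen); fs, WLen, fc and B are illustrative values.
fs   = 8000;                        % sampling rate [Hz]
WLen = 256;                         % FFT length
fc   = 1000;                        % center frequency [Hz] (bin 32 here)
B    = 200;                         % bandwidth parameter [Hz]

k = (0:WLen-1)';                    % bin indices
z = exp(1j*2*pi*k/WLen);            % assumed definition of z

d = -cos(2*pi*fc/fs);               % sets the center frequency
c = (tan(pi*B/fs)-1)/(tan(2*pi*B/fs)+1);    % bandwidth coefficient, as in the listing
Num = (c+1).*z.*z - (c+1);          % b = [(c+1) 0 -(c+1)]
Den = 2*(z.*z + d*(1-c).*z - c);    % a = 2*[1 d*(1-c) -c]
H   = Num./Den;                     % H(k) on the FFT bins

% Locate the peak of |H| over bins 0..WLen/2
[Hmax, idx] = max(abs(H(1:WLen/2+1)));
peak_bin  = idx - 1;                % convert to zero-based bin number
peak_freq = peak_bin*fs/WLen;       % corresponding frequency [Hz]
fprintf('peak |H| = %.4f at %.1f Hz\n', Hmax, peak_freq);
```

Since H = (1 - A)/2 with A an allpass section, |H| cannot exceed 1, and it reaches 1 where A = -1. The d coefficient places that point at fc, so sweeping fc moves the pass band while c only controls how narrow it is.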

2. pv_Pitchshift.m

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% pv_Pitchshift.m                           Author: Tim Perry
% Elec484: DAFX                             V00213455
% Final Project, Phase 1                    2009-07-13
%
% FFT/IFFT implementation of the Phase Vocoder (Block-by-Block Approach)
% to be used for pitch shifting, using integrated resampling.
%   -based on concept of the direct FFT/IFFT method used in Zolzer's DAFX
%   -uses different hop sizes for analysis and resynthesis
%   -time stretch ratio tStretch = Rs/Ra
%   -for each grain, a time stretching and resampling is performed
%
% [y,ModuliIn,PhasesIn,ModuliOut,PhasesOut]...
%           =PV_PITCHSHIFT(x,WLen,Ra,Rs,TAG,waterplot)
%
%   x = input vector or .wav file
%   WLen = analysis window/grain size & synthesis window size
%   Ra = analysis hop size (Ra <= WLen/4)
%   Rs = synthesis hop size (Rs <= WLen/4)
%   TAG = 'String-For-Naming-Plots'
%   waterplot = 1 to plot time-freq representations, 0 otherwise
%
%   y = output vector
%   ModuliIn,ModuliOut: input & output moduli (amplitude) matrices
%   PhasesIn,PhasesOut: input & output phase matrices
%
% REFERENCES:
%   [1] Udo Zölzer, DAFX. John Wiley & Sons, 2002.
%   [2] A. Götzen, N. Bernardini, D. Arfib, 2000. (see PDF documentation)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,ModuliIn,PhasesIn,ModuliOut,PhasesOut] ...
    = pv_Pitchshift(x,WLen,Ra,Rs,TAG,waterplot)

%==============================================================
% Input processing & zero padding
%==============================================================
figNum = get(0,'CurrentFigure');

%------------------------check input type--------------------
[xtype xinfo] = wavfinfo(x);        % input data info
disp(xinfo)
if length(xtype) == 0
    iswav = 0;                      % vector input
    fs = 8000;
    grainFIGa = figure('Name','Analysis Grains (kernalPlot)', ...
        'Position',[100,100,1000,800]);
    clear figure(figNum);
    figNum = figNum + 1;
    grainFIGs = figure('Name','Synthesis Grains (kernalPlot)', ...
        'Position',[100,100,1000,800]);
    clear figure(figNum);

    figNum = figNum + 1;
else
    iswav = 1;                      % wav file input
    wavfile = x;
    TAG=strcat(wavfile(1:length(wavfile)-4),'-',TAG);
    [x fs nbits] = wavread(wavfile);    % read input audio
end

%------use left channel for mono treatment of stereo input-----
if size(x,1) == 1       % row vector input: convert to column vector first,
    x = x';             % otherwise x(:,1) would keep only one sample
end
x = x(:,1);             % left channel (col vector)
N_orig = length(x);     % original length of input vector

%----------------zero-pad Input----------------------
% WLen zeros before x,
% WLen-(N_orig-n.*Ra) zeros after x, where n = floor(N_orig./Ra)
Input = [zeros(WLen, 1); x; zeros(WLen-mod(N_orig,Ra),1)];
Input = Input/max(abs(x));          % normalize Input
Nx = length(Input);                 % zero-padded input length

fileTAG = strcat(TAG,'-L',num2str(WLen)...
    ,'-Ra',num2str(Ra),'Rs',num2str(Rs),'.wav');    %for file naming
plotTAG = strcat(TAG,': WLen=',num2str(WLen)...
    ,' Ra=',num2str(Ra),' Rs=',num2str(Rs));        %for plot naming
%HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

%==============================================================
% Create framing window (modified hanning window for OLA [2])
%   w = [0, k_1, k_2,..., k_n-1 = k1]
%==============================================================
w1 = 0.5*(1-cos(2*pi*(0:WLen-1)'/(WLen)));  % analysis window
w2 = w1;                                    % synthesis window
%winview = wvtool(w1,w2);                   % plot windows
numWin_a = ceil((Nx-WLen+Ra)/Ra);           % # of analysis windows
numWin_s = ceil((Nx-WLen+Rs)/Rs);           % # of synthesis windows

%==============================================================
% Initializations for Kernel Algorithm
%==============================================================
%--------------------------------------------------
% FX-specific initializations for pitch shifting
%--------------------------------------------------
tStretch = Rs/Ra                    % time stretch ratio

%-------Linear Interpolation Parameters---------

Lresamp = floor(WLen/tStretch);     % length of resampled/interpolated grain
nInterpSpace = linspace(0,Lresamp-1,Lresamp)';  % linear spaced time column vec
nfracInterp = 1 + nInterpSpace*WLen/Lresamp;
nInterp0 = floor(nfracInterp);      % Lresamp length vector of sample integer
                                    % values between 1 and WLen
nInterp1 = nInterp0 + 1;            % Lresamp length vector of sample integer
                                    % values between 2 and WLen+1
frac0 = nfracInterp - nInterp0;     % fractional distances of integer samples
                                    % below interpolation points
frac1 = 1-frac0;                    % fractional distances of integer samples
                                    % above interpolation points
Output = zeros(Lresamp+Nx,1);       % initialize output vector (overlap-added
                                    % interpolated synthesis grains)

%----------------------------------------------
% General PV initializations
%----------------------------------------------
omega = 2*pi*Ra*[0:WLen-1]'/WLen;   % nominal phase increment for Ra
phi0 = zeros(WLen,1);               % previous measured phase
phi_t = zeros(WLen,1);              % target phase
ModuliIn = zeros(WLen, numWin_a+1);
PhasesIn = ModuliIn;
ModuliOut = zeros(WLen, numWin_s+1);
PhasesOut = ModuliOut;
nPlotIndices_a = zeros(numWin_a+1,1);   % to store sample indexes for plots
nPlotIndices_s = zeros(numWin_s+1,1);
tic
%HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

%=======================================================================
%------------------------Kernel Algorithm-----------------------
%   -performs FFTs, IFFTs, and overlap-add of successive grains
%   -implements analysis and resynthesis frame-by-frame
%   -stores phase and moduli for input and output signals
%=======================================================================
vIn = 1;                            % analysis sample index @ block start
vOut = 1;                           % synthesis sample index @ block start
for i = 0:numWin_a - 2
    % processing on ith block (i = 0:floor((Nx - WLen)/Ra))
    %==========================================================
    % Analysis Portion
    %==========================================================
    if ((vIn + WLen - 1) >= Nx)
        frame_end = Nx - vIn;       % end index of ith frame (if last)
    else
        frame_end = WLen - 1;       % end index offset of ith frame
    end
    win_end = frame_end + 1;        % end index offset of window

    %--------Window input, forming grain--------
    grain = Input(vIn : vIn + frame_end).*w1(1:win_end);
    % zero pad analysis grain for greater frequency precision
    %grain = [zeros(1.5*WLen, 1); grain; zeros(1.5*WLen, 1)];
    if (numWin_a <= 24)             % plot this grain
        kernalPlot(grain,WLen,i,numWin_a,grainFIGa);
    end

    %--------FFT on circular shifted grain --------
    f = fft(fftshift(grain));       % FFT of ith grain
    r = abs(f);                     % amplitude
    phi = angle(f);                 % phase

    %----------Store analysis results-----------
    nPlotIndices_a(i+1) = vIn + frame_end;  % store sample index for FFT frame plot
    ModuliIn(1:win_end, i+1) = r;           % store analysis results
    PhasesIn(1:win_end, i+1) = phi;

    %==========================================================
    % Frequency Domain Processing
    %==========================================================
    %----------- Phase Unwrapping ---------------
    phi_d = princarg(phi-phi0-omega);   % deviation phase (3)
    delta_phi = omega + phi_d;          % phase difference between two
                                        % adjacent frames for each bin: the
                                        % deviation added to the nominal
                                        % phase increment of the bin
    phi0 = phi;                         % measured phase

    %--------- Target Phase Calculation ---------
    %   -implements time stretching by ratio Rs/Ra
    %   -phase increment is stretched, then added to previous phase
    phi_t = princarg(phi_t + delta_phi*tStretch);

    %==========================================================
    % Resynthesis Portion
    %==========================================================
    ft = r.*exp(j*phi_t);               % FFT with ith grain target phase
    rt = abs(ft);                       % output amplitude
    ModuliOut(1:win_end, i+1) = rt;     % store output moduli (same as input)
    PhasesOut(1:win_end, i+1) = phi_t;  % build matrix of output phases

    %-------------- Inverse FFT & Windowing ------------
    tIFFT = fftshift(real(ifft(ft)));   % shifted IFFT
    grain_t = tIFFT.*w2(1:win_end);     % inverse windowing (tapering)

    %----------------- Interpolation --------------------
    grain_t2 = [grain_t;0];     % pad w/ single zero to allow interpolation
                                % between successive grains
    grain_t3 = grain_t2(nInterp0).*frac1 + ...
        grain_t2(nInterp1).*frac0;      % linear interp

    if (numWin_s <= 24)             % plot this grain
        kernalPlot(grain_t3,WLen,i,numWin_s,grainFIGs);
    end

    %----------Overlap Adding of Resampled Grains---------
    Output(vOut:vOut+Lresamp-1) = Output(vOut:vOut+Lresamp-1) + grain_t3;
    vIn = vIn + Ra;                 % sample index for start of next block
    vOut = vOut + Ra;
end
%HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
toc

%======================================================================
%------------ Output Processing & Plotting------------------
%======================================================================
y = Output/max(abs(Output));        % normalize output vector
plotTAG_in = ['INPUT for', plotTAG];
plotTAG_out = ['OUTPUT', plotTAG];

if iswav == 1
    %=====================================================
    % Time Domain Plots for .WAV Input
    %=====================================================
    Ny = length(y);
    Ts = 1/fs;                      % sampling period
    nT_in = (0:Nx-1)*Ts;            % time vector (zero-padded input)
    nT_out = (0:Ny-1)*Ts;           % time vector (output)
    axis_time = [0, max(Nx,Ny), -1.2, 1.2];
    figure('Name','I/O');
    clear figure(figNum);
    figNum = figNum + 1;

    %-------input---------
    subplot(2,1,1)
    hold on;
    plot(nT_in*fs,Input);
    %plot(nT_in*fs,Input, 'b.');
    %stem(nT(1:WLen2)*fs,x2(1:WLen2),'.','MarkerSize',13);
    grid on;
    axis(axis_time);
    title(['Input x[n] (', plotTAG, ') , normalized & padded']);
    ylabel('x[n]')
    xlabel('n (samples)')
    hold off;

    %-------output ---------
    subplot(2,1,2)
    hold on;
    plot(nT_out*fs,y, 'r');
    %plot(nT_out*fs,y, 'r.');
    %stem(nT(1:WLen2)*fs,x2(1:WLen2),'.','MarkerSize',13);

    grid on;
    axis(axis_time)
    title(['Output y[n] (', plotTAG, ') , normalized']);
    ylabel('y[n]')
    xlabel('n (samples)')
    hold off;

    %==================================================================
    % Time-Frequency Plots
    % Plots the following:
    %   - Amplitude & Phase Waterfalls
    %   - Magnitude Waterfall (dB scale)
    %   - Phase vs. Time plot of Freq Bins near max amplitude
    %     (in the case of a single tone sinusoid, will be centered
    %     around pitch)
    %==================================================================
    if waterplot == 1
        waterfallFIG = waterfall_surf(ModuliIn,PhasesIn,nPlotIndices_a,fs,WLen,Ra,Rs,Nx,plotTAG);
    end

    %==============================================================
    % Output Audio File
    %==============================================================
    y = y*0.999;        % lower level a bit more to remove wavwrite clip warning
    x = x/max(abs(x));  % normalize input for comparison playback
    wavwrite(y, fs, nbits, fileTAG);    % write output audio file
    wavplay(x, fs);                     % play PV input
    wavplay(y, fs);                     % play PV output

    %==============================================================
    % Create Harmony
    %==============================================================
    harmony = zeros(Ny,2);                      % stereo output file
    harmony(:,1)=[Input; zeros(Ny-Nx,1)]*0.9;   % assign input to left channel
                                                % and zero pad to output length
    harmony(:,2)=y*0.9;                         % assign shifted to right channel
    harmTAG = [wavfile(1:length(wavfile)-4),'-Harmony',num2str(tStretch)];
    harmTAG = strcat(harmTAG,'-L',num2str(WLen)...
        ,'-Ra',num2str(Ra),'Rs',num2str(Rs),'.wav');    %for file naming
    wavwrite(harmony,fs,harmTAG);
else
    %======================================================
    % Time Domain Plots for Vector Input
    %======================================================
    Ny = length(y);
    Nsamples_in = linspace(0,Nx,Nx);
    Nsamples_out = linspace(0,Ny,Ny);
    axis_time = [0, max(Nx,Ny), -1.2, 1.2];
    figure (figNum);
    clear figure(figNum);

    %------------normalized input plot--------------
    subplot(2,1,1);
    hold on;
    plot(Nsamples_in, Input,'b-');
    stem(Nsamples_in, Input,'.','b-','MarkerSize',9);
    axis(axis_time);
    grid on;
    title(['Input x[n] (', plotTAG, ') , normalized']);
    ylabel('x[n]');
    xlabel('n');
    hold off;

    %------------normalized output plot--------------
    subplot(2,1,2);
    hold on;
    plot(Nsamples_out, y,'b-');
    stem(Nsamples_out, y,'.','b-','MarkerSize',9);
    axis(axis_time);
    grid on;
    title(['Output y[n] (', plotTAG, ') , normalized']);
    ylabel('y[n]');
    xlabel('n');
    hold off;

    %==================================================================
    % Time-Frequency Plots for vector input
    % Plots the following:
    %   - Amplitude & Phase Waterfalls
    %   - Magnitude Waterfall (dB scale)
    %   - Phase vs. Time plot of Freq Bins near max amplitude
    %     (in the case of a single tone sinusoid, will be centered
    %     around pitch)
    %==================================================================
    if waterplot == 1
        waterfallFIG_in = waterfall_vecPlot(ModuliIn,PhasesIn,nPlotIndices_a,fs,WLen,Ra,Rs,Nx,plotTAG_in);
        waterfallFIG_out = waterfall_vecPlot(ModuliOut,PhasesOut,nPlotIndices_a,fs,WLen,Ra,Rs,Ny,plotTAG_out);
    end
end
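
The phase unwrapping and target phase steps of pv_Pitchshift.m can be followed for a single frequency bin with known numbers. The sketch below is illustrative only and is not taken from the project code: wrap_pi is a hypothetical stand-in for princarg (whose source is not reproduced in this listing), and the window length, hop sizes and test frequency are assumed values. It shows that adding the wrapped deviation to the nominal increment recovers the true phase advance of a sinusoid lying between two bins.

```matlab
% Sketch only: phase unwrapping and target phase for a single bin.
% ASSUMPTIONS: wrap_pi is a hypothetical stand-in for princarg (whose
% source is not shown here); the numbers below are illustrative.
wrap_pi = @(p) mod(p + pi, 2*pi) - pi;      % wrap phase to [-pi, pi)

WLen = 64;  Ra = 16;  Rs = 24;      % window length and hop sizes
tStretch = Rs/Ra;                   % time stretch ratio
kbin   = 5;                         % bin under observation
k_true = 5.3;                       % true frequency in (fractional) bins

omega    = 2*pi*Ra*kbin/WLen;       % nominal phase increment of bin kbin
true_inc = 2*pi*Ra*k_true/WLen;     % actual phase advance over one hop

phi0 = 0.4;                         % measured phase, previous frame
phi  = wrap_pi(phi0 + true_inc);    % measured phase, current frame (wrapped)

phi_d     = wrap_pi(phi - phi0 - omega);    % deviation from nominal increment
delta_phi = omega + phi_d;                  % unwrapped phase increment

phi_t_prev = 0;                             % previous target phase (zero start)
phi_t = wrap_pi(phi_t_prev + delta_phi*tStretch);   % stretched target phase

f_est = delta_phi*WLen/(2*pi*Ra);   % estimated frequency in bins
fprintf('delta_phi = %.4f rad, estimated bin = %.3f\n', delta_phi, f_est);
```

The recovery only works while the deviation stays within half a cycle of the nominal increment, which is one reason the function header restricts the hop sizes to WLen/4 or less.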

APPENDIX C – waterfall_Plot.m

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% waterfall_Plot.m                          Author: Tim Perry
% Elec484: DAFX                             V00213455
% Final Project, Phase 1 part c             2009-07-08
%
% Function to perform time-frequency representation plots (waterfall plots)
% from the phase and moduli data provided by a phase vocoder.
% Plots the following:
%   - Amplitude Waterfall (linear scale)
%   - Phase Waterfall
%   - Magnitude Waterfall (dB scale)
%   - Phase vs. Time plot of Freq Bins near max amplitude
%     (in the case of a single tone sinusoid, will be centered
%     around pitch)
%
% FIGS = WATERFALL_PLOT(Moduli,Phases,nPlotIndices,fs,WLen,Ra,...
%                           N,plotCODE,LogFreq,plotTAG)
%
%   Moduli       - amplitude matrix from successive FFT frames
%   Phases       - phase matrix from successive FFT frames
%   nPlotIndices - sample indexes for FFT frame plots
%   fs           - sampling frequency
%   WLen         - analysis window size
%   Ra           - analysis hop size
%   N            - time axis length [samples] (ex: signal length)
%   plotCODE     - [1 1 1 1] plots Amp, Phase, Mag, Phase @ maxAmp Bins
%                - [0 0 0 0] plots nothing
%   LogFreq      - 1 for logarithmic freq scale, 0 for linear freq scale
%   plotTAG      - 'String-For-Naming-Plots'
%
%   FIGS = fig handles [waterAmpFIG waterPhaseFIG waterMagFIG pMaxBinFIG]
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [waterAmpFIG waterPhaseFIG waterMagFIG pMaxBinFIG]= waterfall_Plot...
    (Moduli,Phases,nPlotIndices,fs,WLen,Ra,N,plotCODE,LogFreq,plotTAG)

colordef black;
figNumStart = get(0,'CurrentFigure');
%clear figure(figNumStart);
figNum = figNumStart + 1;
waterAmpFIG = [];       %return empty array if figures not created
waterPhaseFIG = [];
waterMagFIG = [];
pMaxBinFIG = [];

%==============================================================
% Time-Frequency Plots Parameters
%==============================================================
%ModuliIn_norm = 2*ModuliIn/(WLen/2);   %Amp spectrum in quantity peak

Amp_norm = Moduli/Ra;                       % Amp spectrum in quantity peak
Mag_dB = 20*log10(Moduli/max(max(Moduli)));
f0 = fs/WLen;                               % frequency resolution
numBins = WLen;                             % # of frequency bins
%kBins = linspace(0,fs/2,numBins/2 + 1);    %lin spaced freq bins up to Nyquist
kBins = linspace(0,fs/2,numBins/2 + 1);     %lin spaced freq bins up to Nyquist
[n, k] = meshgrid(nPlotIndices,kBins);      % rectangular domain

if plotCODE(1) == 1;
    %=================================================================
    %-----------------Amplitude Waterfall---------------
    %=================================================================
    waterAmpFIG = figure('Colormap',jet(128),'Name',...
        'Waterfall Amplitude Plot','Position',[10,20,1500,950]);
    clear figure(figNum);
    figNum = figNum + 1;
    hold on
    %C = del2(n,k,AmpIn_norm(1:(numBins/2+1), :));          %colour mapping
    %waterAmp = surf(n,k,Amp_norm(1:(numBins/2 + 1), :));   %plot input amps
    waterAmp = meshz(n,k,Amp_norm(1:(numBins/2 + 1), :));
    set(waterAmp,'MeshStyle','col')
    %colormap winter
    colorbar
    axis([0,N,0,fs/2,0,max(max(Amp_norm))])
    title(['Amplitude Waterfall of Short-time FFTs (', plotTAG, ')']);
    grid on
    ylabel('f [Hz]')
    xlabel('n [samples]')
    zlabel('|X_w(f)|')
    hold off
    if LogFreq == true
        set(gca,'yscale','log')
        %set(waterAmp,'MeshStyle','both')
        %view(viewmtx(-70,8,10));   % set viewpoint
    end
    view(viewmtx(-70,25,25));       % set viewpoint
    %view(viewmtx(-55,25,25));      % set viewpoint
    ylabh = get(gca,'YLabel');
    set(ylabh,'Position',get(ylabh,'Position') + [-1*fs 0 0])
end

if plotCODE(2) == 1;
    %=================================================================
    %-----------------Phase Waterfall-------------------
    %=================================================================
    waterPhaseFIG = figure('Colormap',jet(128),'Name',...
        'Waterfall Phase Plot','Position',[20,20,1800,950]);
    clear figure(figNum);
    figNum = figNum + 1;
    hold on
    % waterPhase = surf(n,k,Phases(1:(numBins/2 + 1), :));  %plot phases
    waterPhase = meshz(n,k,Phases(1:(numBins/2 + 1), :));

    set(waterPhase,'MeshStyle','col')
    colormap('jet')
    colorbar
    axis([0,N,0,fs/2,-3.2,3.2])
    title(['Phase Waterfall of Short-time FFTs (', plotTAG, ')']);
    grid on
    ylabel('f [Hz]')
    xlabel('n [samples]')
    zlabel('Arg{X_w_(f)} [rad]')
    hold off
    if LogFreq == true
        set(gca,'yscale','log')
    end
    view(viewmtx(-55,75,25));       % set viewpoint
end

if plotCODE(3) ==1;
    %=================================================================
    % Magnitude Waterfall (dB)
    %=================================================================
    waterMagFIG = figure('Name','Magnitude Waterfall [dB]','Position'...
        ,[300,300,1200,650]);
    clear figure(figNum);
    figNum = figNum + 1;
    % Mag_dB(numBins/2 + 1, numWin_a) = -190;   % set to expand colourmap scale
    hold on
    %surf(n*Ts,k,Mag_dB(1:(numBins/2 + 1), :));         %plot input mag
    %waterfall(k,n*Ts,Mag_dB(1:(numBins/2 + 1), :));    %plot input mag
    waterMag = meshz(n,k,Mag_dB(1:(numBins/2 + 1), :));
    set(waterMag,'MeshStyle','col')
    colormap('jet')
    colorbar('location','East')
    axis([0,N,0,fs/2,1.2*min(min(Mag_dB)),0])
    title(['Magnitude [dB] Waterfall of Short-time FFTs (', plotTAG, ')']);
    grid on
    ylabel('f [Hz]')
    xlabel('n [samples]')
    zlabel('20log|X_w(f)| [dB]')
    hold off
    if LogFreq == true
        set(gca,'yscale','log')
        colormap('winter')
        %set(waterMag,'MeshStyle','both')
        %view(viewmtx(-70,8,10));
    end
    view(viewmtx(-75,20,10));       % set viewpoint
    ylabh = get(gca,'YLabel');      % make freq axis label visible
    set(ylabh,'Position',get(ylabh,'Position') + [-1*fs*(N/100000) 0 0])
end

if plotCODE(4) ==1;
    %=================================================================
    %----------Phase vs. Time (@ Freqs Bins near max amplitude)-------
    %=================================================================
    pMaxBinFIG = figure('Name','Phase vs. Time','Position',...
        [800,50,850,450]);
    clear figure(figNum);
    %figNum = figNum + 1;

    %------find freq bins where max amplitude occurs-----
    [maxAmps, bins_max]=max(Moduli, [ ], 1);    % get indices @ max
    center_bin = median(bins_max) - 1;          % bin w/ most peaks
    bins_around = 5;                    % # bins above and below to include

    %--------define bins to plot (close to center_bin)-------
    if (center_bin <= bins_around)      % freq bins above f
        k_closeBins = [center_bin:center_bin+bins_around - 1];
    elseif (center_bin >= numBins/2 - bins_around)  % freq bins below f
        k_closeBins = [center_bin-bins_around:center_bin];
    else                                % freq bins surrounding f
        k_closeBins = [center_bin-bins_around:center_bin+bins_around - 1];
    end
    [n, k_close] = meshgrid(nPlotIndices,k_closeBins*f0);   % rect domain

    hold on
    waterfall(n,k_close,Phases(k_closeBins + 1, :));
    %stem3(n,k_close,PhasesIn(k_closeBins, :),'.');
    colorbar
    axis([0,N,min(min(k_close)),max(max(k_close)),-3.2,3.2])
    title(['Phase vs Time for Freq Bins Near Fundamental (', plotTAG, ')']);
    grid on
    ylabel('f [Hz]')
    xlabel('n [samples]')
    zlabel('Arg{X_w_(f)} [rad]')
    hold off
    view(114,77)                        % set viewpoint (azimuth, elevation)
end
colordef white;
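
The way the phase-vs-time plot picks its center bin can be tried on a synthetic tone. The following is a minimal sketch under stated assumptions, not the project's code: the test signal, sampling rate, window length and hop size are illustrative, and the peak search is restricted to bins 0 to WLen/2, whereas the listing above searches all WLen bins.

```matlab
% Sketch only: locate the dominant frequency bin from a moduli matrix.
% ASSUMPTIONS: illustrative signal and parameters; the search is limited
% to bins 0..WLen/2, unlike the listing, which searches all WLen bins.
fs = 8000;  WLen = 64;  Ra = 16;    % sampling rate, window length, hop
f_tone = 1000;                      % test tone [Hz] (bin 8 for these values)
n = (0:255)';
x = cos(2*pi*f_tone*n/fs);          % test signal

w = 0.5*(1 - cos(2*pi*(0:WLen-1)'/WLen));   % Hann window as in the listings
numFrames = floor((length(x) - WLen)/Ra) + 1;
Moduli = zeros(WLen, numFrames);
for m = 0:numFrames-1
    grain = x(m*Ra + (1:WLen)).*w;          % windowed grain
    Moduli(:, m+1) = abs(fft(grain));       % magnitude spectrum
end

f0 = fs/WLen;                               % frequency resolution [Hz]
[maxAmps, bins_max] = max(Moduli(1:WLen/2+1, :), [], 1);
center_bin = median(bins_max) - 1;          % zero-based bin index
f_peak = center_bin*f0;                     % frequency of dominant bin [Hz]
fprintf('dominant bin %d -> %.1f Hz\n', center_bin, f_peak);
```

Limiting the search to the lower half of the spectrum is a design choice of this sketch. For a real input the upper half mirrors the lower half, so a search over all bins could in principle land on the mirrored peak if rounding makes it marginally larger.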