Time-Scale Modification of Speech Signal

SOLAFS (Synchronized Overlap-Add, Fixed Synthesis)

Overview

- Introduction
- Overview of the methods
- Basic Idea
- SOLAFS Method
- Matlab Code
- The results
- Conclusion

Introduction

Many applications need to modify the time scale of speech, music, or other acoustic material.

- The goal is to speed up or slow down the speech.
- The pitch must not be modified.
- There should be no "Donald Duck" or "Minnie Mouse" effects.


Time-scale modification (TSM) refers to changing the reproduction rate of a signal. Two primary operations are involved:

- time-scale expansion (slow down)
- time-scale compression (speed up)


[Figure: waveforms of the original, expanded, and compressed signals]

Overview of methods

Time-scale modification relies on three basic families of methods:

- frequency-domain processing methods
- analysis/synthesis methods
- time-domain processing methods

SOLAFS is a time-domain processing method.

Basic Idea

SOLAFS is an improvement on the earlier SOLA (Synchronized Overlap-Add) method.

SOLA consists of:

- shifting the beginning of a new speech segment over the end of the preceding segment to find the point of highest cross-correlation;
- once that point is found, overlapping the frames and averaging them together.
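The alignment step can be illustrated with a toy sketch. The sinusoidal test signal, the segment lengths, and all variable names below are invented for illustration and are not taken from the SOLA papers; the loop simply scores every candidate lag with a normalized cross-correlation, keeps the best one, and averages the aligned frames.

```matlab
% Toy signal: a sinusoid with a period of 20 samples (illustrative only).
n = 0:199;
x = sin(2*pi*n/20);

ovLen   = 40;            % length of the overlap region (assumed)
prevEnd = x(61:100);     % last ovLen samples of the preceding segment
maxLag  = 30;            % largest shift that is tried (assumed)
nominal = 108;           % nominal, unaligned start of the new segment

% Slide the start of the new segment and score each lag.
score = zeros(1, maxLag + 1);
for lag = 0:maxLag
    cand = x(nominal + lag + (0:ovLen-1));
    score(lag + 1) = (prevEnd * cand') / (norm(prevEnd) * norm(cand));
end
[bestScore, idx] = max(score);
bestLag = idx - 1;

% Overlap the aligned frames and average them together.
aligned = x(nominal + bestLag + (0:ovLen-1));
blended = 0.5 * (prevEnd + aligned);
```

Because the test signal is periodic, the best lag is the one that restores the phase of the preceding segment, and the averaged overlap then matches that segment closely.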

SOLAFS

SOLAFS has four parameters:

- Window length (W): the smallest unit of the input signal that is manipulated by the method.
- Analysis shift (S_a): the inter-frame interval between successive search ranges for analysis windows along the input signal.
- Synthesis shift (S_s): the inter-frame interval between successive analysis windows along the output signal.
- Shift search interval (k_max): the duration of the interval over which an analysis window may be shifted in order to align it with the region of the output signal it will overlap.
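As a rough sketch of how these parameters relate, the values below reuse the settings from the MATLAB listing later in the deck (W = 400, W_ov = W/2, K_max = 2W). The sampling rate of 8 kHz and the relation S_a = F * S_s are assumptions: the listing computes the ideal input position as F times the output position, which implies that relation, but the slides never state it or the sampling rate explicitly.

```matlab
fs   = 8000;          % sampling rate in Hz (assumed, not given in the slides)
W    = 400;           % window length in samples
Wov  = W/2;           % overlap length in samples
Ss   = W - Wov;       % synthesis shift in samples
F    = 2;             % speed factor: F > 1 compresses, F < 1 expands
Sa   = round(F*Ss);   % analysis shift implied by the speed factor (assumed)
Kmax = 2*W;           % shift search interval in samples

% Convert sample counts to milliseconds for a feel of the time spans.
toMs = @(samples) 1000*samples/fs;
fprintf('W = %d samples (%.1f ms)\n', W, toMs(W));
fprintf('Ss = %d, Sa = %d, Kmax = %d samples\n', Ss, Sa, Kmax);

% Output length expected for an input of N samples.
N      = 32000;
outLen = round(N / F);
```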

[Figure: the four parameters used in SOLAFS]

Analysis

The analysis windows are chosen as follows:

$$x_m[n] = \begin{cases} x[mS_a + k_m + n] & \text{for } n = 0, \ldots, W-1 \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

where

- m = a window index, i.e. it refers to the m-th window
- n = a sample index in an input buffer for the input signal; the buffer is W samples long
- k_m = the number of samples of shift for the m-th window
- x_m[n] = the n-th sample in the m-th analysis window
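A minimal sketch of extracting one analysis window, x_m[n] = x[m S_a + k_m + n] for n = 0, ..., W-1. The stand-in signal and the parameter values are made up so that the selected samples are easy to read off.

```matlab
x  = 1:1000;   % stand-in input signal: each sample equals its own index
W  = 8;        % window length (illustrative)
Sa = 20;       % analysis shift (illustrative)
m  = 3;        % window index
km = 5;        % shift chosen for this window

% The equation uses a 0-based n; MATLAB indices are 1-based, hence the +1.
n     = 0:W-1;
start = m*Sa + km;
xm    = x(start + n + 1);
```

With these values the window starts 65 samples into the signal, so xm holds the samples 66 through 73.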

The analysis windows are then used to form the output signal y[i] recursively, according to:

$$y[mS_s + n] = \begin{cases} b[n]\, y[mS_s + n] + (1 - b[n])\, x_m[n] & \text{for } n = 0, \ldots, W_{ov}-1 \\ x_m[n] & \text{for } n = W_{ov}, \ldots, W-1 \end{cases} \qquad (2)$$

where:

- W_ov = W - S_s is the number of points in the overlap region
- b[n] = an overlap-add weighting function, referred to as a fading factor: an averaging function, a linear fade function, and so forth.
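The overlap-add recursion can be sketched on constant signals, where the cross-fade is easy to check by hand. The sketch assumes that b[n] weights the existing output and fades out linearly; under that assumption the listing's growing cross-fade window xfwin plays the role of 1 - b[n]. All values are illustrative.

```matlab
W   = 8;
Wov = 4;
Ss  = W - Wov;
m   = 2;

y  = ones(1, 20);        % output so far (constant 1, illustrative)
xm = 3 * ones(1, W);     % m-th analysis window (constant 3, illustrative)

% Linear fade-out applied to the existing output (assumed fading factor).
b = 1 - (1:Wov) / (Wov + 1);

% Overlap region, n = 0..Wov-1 (shifted by +1 for 1-based indexing).
ov = m*Ss + (0:Wov-1) + 1;
y(ov) = b .* y(ov) + (1 - b) .* xm(1:Wov);

% Remaining samples, n = Wov..W-1, are copied directly.
rest = m*Ss + (Wov:W-1) + 1;
y(rest) = xm(Wov+1:W);
```

The overlap region ramps from the old level towards the new one, and the samples after it take the value of the analysis window unchanged.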

Calculation of k_m

k_m is the optimal shift, determined by the normalized cross-correlation between x and y in the overlap region:

$$k_m = \arg\max_{0 \le k \le k_{max}} R^m_{xy}[k] \qquad (3)$$

where k_max is the maximum allowable shift from the initial starting position of the analysis window.
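The slides do not spell out the normalized cross-correlation. One common form over the overlap region, given here as an assumption rather than as the authors' exact definition, is:

```latex
\[
R^m_{xy}[k] =
\frac{\displaystyle\sum_{n=0}^{W_{ov}-1} y[mS_s+n]\, x[mS_a+k+n]}
     {\sqrt{\displaystyle\sum_{n=0}^{W_{ov}-1} y^2[mS_s+n]\;
            \sum_{n=0}^{W_{ov}-1} x^2[mS_a+k+n]}}
\]
```

The energy of y in the denominator does not depend on k, so dropping it leaves the maximizing shift unchanged. The MATLAB listing does this: it divides the inner product only by the norm of the input segment.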

k_m can often be predicted without computing the similarity. The m-th shift, k_m, should be determined by:

$$k_m = \begin{cases} t_m = k_{m-1} + (S_s - S_a) & \text{if } 0 \le t_m \le k_{max} \\ \arg\max_{0 \le k \le k_{max}} R^m_{xy}[k] & \text{otherwise} \end{cases}$$
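A small simulation shows how rarely the search is needed when the prediction t_m = k_{m-1} + (S_s - S_a) is used and accepted whenever 0 <= t_m <= k_max. The shift values are illustrative, and the search itself is replaced by a hypothetical stand-in that always returns 0, so only the bookkeeping is demonstrated.

```matlab
Ss   = 200;     % synthesis shift (illustrative)
Sa   = 150;     % analysis shift; Sa < Ss means expansion (illustrative)
Kmax = 800;     % shift search interval
numFrames = 40;

k = 0;                        % shift of the previous window
numSearches = 0;
kHist = zeros(1, numFrames);
for m = 1:numFrames
    tm = k + (Ss - Sa);       % predicted shift: continue the previous copy
    if tm >= 0 && tm <= Kmax
        k = tm;               % prediction accepted, no search needed
    else
        k = 0;                % hypothetical stand-in for the search result
        numSearches = numSearches + 1;
    end
    kHist(m) = k;
end
```

With these numbers the predicted shift grows by 50 samples per frame, so a search is triggered only once every 17 frames.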

Implementation in MATLAB

There are 7 steps, as follows:

1. As an initialization step, take W samples from the input signal, which are stored in an input signal buffer, and place them in an output sample buffer for the output signal.
2. Find the start of the first analysis window, mS_a.
3. Find the maximum similarity between the first W_ov samples at the start of the analysis window and the last W_ov samples at the end of the output signal, by computing the cross-correlation between the two sets of samples.
4. Shift the start of the analysis window by one or two samples and repeat step 3.
5. Repeat steps 3 and 4 until the analysis window has been shifted by the maximum allowed amount, k_max.
6. At the shift where the cross-correlation is maximal, overlap-add the last W_ov samples of the output signal and the first W_ov samples of the shifted analysis window, and transfer the W - W_ov further samples into the output buffer.
7. Repeat steps 2 to 6 with the next analysis window, until the input signal reaches its end.

Parameter choices

- The smallest useful synthesis shift is S_s = W_ov.
- The smallest useful window length is W = 2 W_ov.
- k_max = 2W.

MATLAB

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Project Spring 2005
% Rachan Fugcharoen ECE5525
%
% Do SOLAFS timescale mod'n
% Y is X scaled to run F x faster. X is added-in in windows
% W pts long, overlapping by Wov points with the previous output.
% The similarity is calculated over the last Wsim points of output.
% Maximum similarity skew is Kmax pts.
% Each xcorr calculation is decimated by xdecim (8)
% The skew axis sampling is decimated by kdecim (2)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Read the wave file (audioread/audiowrite are used in place of
% wavread/wavwrite, which recent MATLAB releases no longer provide)
[d, fs] = audioread('we.wav');
W = 400;          % window length
Wov = W/2;        % length of the overlap region
Kmax = 2*W;       % maximum shift
Wsim = Wov;       % number of output points used for the similarity
xdecim = 8;       % decimation of each xcorr
kdecim = 2;       % decimation of the skew axis sampling
X = d(:,1)';      % first channel, as a row vector

% Factor to run x faster or slower
F = 4;
Ss = W - Wov;     % synthesis shift
xpts = size(X,2);
ypts = round(xpts / F);
Y = zeros(1, ypts);
% Cross-fade win is Wov pts long - it grows
xfwin = (1:Wov)/(Wov+1);
% Index to add to ypos to get the overlap region
ovix = (1-Wov):0;
% Index for non-overlapping bit
newix = 1:(W-Wov);
% Index for similarity chunks
% decimate the cross-correlation
simix = (1:xdecim:Wsim) - Wsim;
% prepad X for extraction
padX = [zeros(1, Wsim), X, zeros(1, Kmax+W-Wov)];
% Startup - just copy first bit
Y(1:Wsim) = X(1:Wsim);

xabs = 0;
lastxpos = 0;
km = 0;

for ypos = Wsim:Ss:(ypts-W)
  % Ideal X position (rounded so that it is a valid index offset)
  xpos = round(F * ypos);
  % Overlap prediction - assume all of overlap from last copy:
  % the copy point advances by Ss while the ideal position advances
  % by (xpos - lastxpos), so the skew changes by the difference
  kmpred = km + Ss - (xpos - lastxpos);
  lastxpos = xpos;
  if (kmpred >= 0 && kmpred <= Kmax)
    km = kmpred;   % no need to search
  else
    % Calculate the skew, km
    % .. by first figuring the cross-correlation
    ysim = Y(ypos + simix);
    % Clear the Rxy array
    rxy = zeros(1, Kmax+1);
    rxx = zeros(1, Kmax+1);
    Kmin = 0;
    for k = Kmin:kdecim:Kmax
      xsim = padX(Wsim + xpos + k + simix);
      rxx(k+1) = norm(xsim);
      rxy(k+1) = (ysim * xsim');
    end
    % Zero the pts where rxx was zero
    Rxy = (rxx ~= 0).*rxy./(rxx+(rxx==0));
    % Local max gives skew
    km = min(find(Rxy == max(Rxy))-1);
  end
  xabs = xpos + km;
  % Cross-fade some points
  Y(ypos+ovix) = ((1-xfwin).*Y(ypos+ovix)) + (xfwin.*padX(Wsim+xabs+ovix));
  % Add in remaining points
  Y(ypos+newix) = padX(Wsim+xabs+newix);
end

% Plot the result
subplot(211); plot(X); grid; original = axis;
subplot(212); plot(Y); grid; change = axis;

if F > 1
  subplot(211); title('Original wave file'); axis(original)
  subplot(212); title(['Modified wave file (Speed=', num2str(F), 'X)']); axis(original)
else
  subplot(211); title('Original wave file'); axis(change)
  subplot(212); title(['Modified wave file (Speed=', num2str(F), 'X)']); axis(change)
end

% Play the wave file and save the wave file
sound(Y, fs);
audiowrite(['we_new_', num2str(F), 'X.wav'], Y, fs, 'BitsPerSample', 8);

Results

[Figure: original wave file (top) and modified wave file (bottom), Speed = 2X; amplitude against sample index]

[Figure: original wave file (top) and modified wave file (bottom), Speed = 4X; amplitude against sample index]

[Figure: original wave file (top) and modified wave file (bottom), Speed = 0.75X; amplitude against sample index]

[Figure: original wave file (top) and modified wave file (bottom), Speed = 0.5X; amplitude against sample index]

Conclusion

- The results are acceptable given a proper choice of the parameters.
- The SOLAFS algorithm provides time-scale modified speech over a wide range of compression and expansion.
- It requires significantly less computation than many other methods.
- The MATLAB code needs a large amount of buffer space to hold the samples, which will cause difficulties in real-time applications.
- Real-time applications have to process everything as fast as possible. If the data is stored in compressed form or the storage units are slow, the signal will be difficult to process in time.

References

- D. J. Hejna. Real-time time-scale modification of speech via the synchronized overlap-add algorithm. Master's thesis, M.I.T., 1990.
- Don Hejna and Bruce R. Musicus. The SOLAFS Time-Scale Modification Algorithm. Research. 1991.
