Time-Scale Modification of Speech Signal
SOLAFS (Synchronized Overlap-Add, Fixed Synthesis)


Uploaded by finley



Overview

- Introduction
- Overview of the methods
- Basic Idea
- SOLAFS Method
- MATLAB Code
- Results
- Conclusion

Introduction

There are many applications for modifying the time scale of speech, music, or other acoustic material.

The aim is to speed up or slow down speech without modifying its pitch, so that there are no "Donald Duck" or "Minnie Mouse" effects (the raised pitch heard when a recording is simply played back faster).

TSM (time-scale modification) refers to changing the reproduction rate of a signal. Two primary operations are involved:

- time-scale expansion (slowing down)

- time-scale compression (speeding up)

[Figure: an original signal, its time-scale expansion, and its time-scale compression]

Overview of methods

Time-scale modification uses three basic classes of methods:

- frequency-domain processing methods

- analysis/synthesis methods

- time-domain processing methods

SOLAFS is a time-domain processing method.

Basic Idea

SOLAFS is an improvement on the earlier SOLA (Synchronized Overlap-Add) method.

SOLA consists of:

- shifting the beginning of a new speech segment over the end of the preceding segment to find the point of highest cross-correlation;

- once that point is found, overlapping the frames and averaging them together.
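The SOLA step can be illustrated with a short Python sketch. Python and the names `best_overlap_lag` and `sola_join` are our own choices, not part of the original slides. Scoring each candidate lag with a normalized cross-correlation is an assumption (one common similarity measure); the overlapping samples are then averaged with equal weights, as described above.

```python
import math


def best_overlap_lag(y_tail, segment, max_lag):
    """Lag in 0..max_lag at which segment[lag:] best matches the output tail."""
    L = len(y_tail)
    best_lag, best_score = 0, -math.inf
    for lag in range(max_lag + 1):
        chunk = segment[lag:lag + L]
        if len(chunk) < L:
            break  # not enough samples left for a full comparison
        num = sum(a * b for a, b in zip(y_tail, chunk))
        den = math.sqrt(sum(a * a for a in y_tail) * sum(b * b for b in chunk))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def sola_join(y, segment, overlap, max_lag):
    """One SOLA step: align `segment` with the end of `y`, then average the overlap."""
    lag = best_overlap_lag(y[-overlap:], segment, max_lag)
    aligned = segment[lag:]
    averaged = [(a + b) / 2 for a, b in zip(y[-overlap:], aligned[:overlap])]
    return y[:-overlap] + averaged + aligned[overlap:]


# A sine "output" and a new segment that starts 5 samples before the output's tail.
period = 20
y = [math.sin(2 * math.pi * n / period) for n in range(100)]
segment = [math.sin(2 * math.pi * n / period) for n in range(75, 175)]
joined = sola_join(y, segment, overlap=20, max_lag=15)
print(best_overlap_lag(y[-20:], segment, 15), len(joined))  # prints: 5 175
```

Because the segment is aligned before averaging, the joined signal continues the sine without a phase jump.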

SOLAFS

There are four parameters:

- Window length ($W$): the smallest unit of the input signal that is manipulated by the method.

- Analysis shift ($S_a$): the inter-frame interval between successive search ranges for analysis windows along the input signal.

- Synthesis shift ($S_s$): the inter-frame interval between successive analysis windows along the output signal.

- Shift search interval ($k_{\max}$): the length of the interval over which an analysis window may be shifted in order to align it with the region of the output signal it will overlap.
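The four parameters can be kept together in a small Python container. The class name `SolafsParams` and its validation rules are illustrative assumptions. Two derived quantities follow from the definitions: the overlap $W_{ov} = W - S_s$ used on the following slides, and the speed factor $S_a / S_s$, since the input advances by $S_a$ samples for every $S_s$ samples of output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SolafsParams:
    W: int     # window length in samples
    Sa: int    # analysis shift: step between search ranges along the input
    Ss: int    # synthesis shift: step between windows along the output
    kmax: int  # shift search interval in samples

    def __post_init__(self):
        # Consecutive output windows must overlap for a cross-fade to exist.
        if not 0 < self.Ss < self.W:
            raise ValueError("need 0 < Ss < W")
        if self.Sa <= 0 or self.kmax < 0:
            raise ValueError("need Sa > 0 and kmax >= 0")

    @property
    def Wov(self) -> int:
        """Number of points in the overlap region, W - Ss."""
        return self.W - self.Ss

    @property
    def speed(self) -> float:
        """Input samples consumed per output sample: > 1 compresses, < 1 expands."""
        return self.Sa / self.Ss


p = SolafsParams(W=400, Sa=800, Ss=200, kmax=800)
print(p.Wov, p.speed)  # 200 4.0
```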

[Figure: the four parameters used in SOLAFS]

Analysis

The analysis windows are chosen as follows:

$$ x_m[n] = \begin{cases} x[mS_a + k_m + n], & n = 0, 1, \ldots, W-1 \\ 0, & \text{otherwise} \end{cases} \qquad (1) $$

where

- $m$ = the window index, i.e., it refers to the $m$-th window
- $n$ = a sample index in the input buffer for the input signal; the buffer is $W$ samples long
- $k_m$ = the number of samples of shift for the $m$-th window
- $x_m[n]$ = the $n$-th sample in the $m$-th analysis window
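Equation (1) amounts to slicing $W$ samples out of the input, starting $mS_a + k_m$ samples in. A minimal Python sketch follows; the function name is hypothetical, and returning zeros for positions past the end of the input is an assumption that mirrors the zero-padded `padX` array in the MATLAB code.

```python
def analysis_window(x, m, Sa, k_m, W):
    """x_m[n] = x[m*Sa + k_m + n] for n = 0..W-1 (equation (1)).

    Positions outside the input are read as 0.0, as if x were zero-padded.
    """
    start = m * Sa + k_m
    return [x[start + n] if 0 <= start + n < len(x) else 0.0 for n in range(W)]


x = [float(i) for i in range(10)]
print(analysis_window(x, m=1, Sa=3, k_m=1, W=4))  # [4.0, 5.0, 6.0, 7.0]
print(analysis_window(x, m=2, Sa=3, k_m=2, W=4))  # [8.0, 9.0, 0.0, 0.0]
```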

The analysis windows are then used to form the output signal $y[i]$ recursively, as follows:

$$ y[mS_s + n] = \begin{cases} (1 - b[n])\, y[mS_s + n] + b[n]\, x_m[n], & n = 0, 1, \ldots, W_{ov} - 1 \\ x_m[n], & n = W_{ov}, \ldots, W - 1 \end{cases} \qquad (2) $$

where

- $W_{ov} = W - S_s$ is the number of points in the overlap region
- $b[n]$ is an overlap-add weighting function, referred to as a fading factor: an averaging function, a linear fade function, and so forth.
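Equation (2) can be sketched in Python as follows, assuming a linear fade-in $b[n] = (n + 1)/(W_{ov} + 1)$. This is the same ramp as `xfwin` in the MATLAB code, but any fading factor could be substituted; the function name and the list-based output buffer are illustrative.

```python
def overlap_add(y, x_m, m, Ss, Wov):
    """Write analysis window x_m into the output list y at m*Ss (equation (2))."""
    start = m * Ss
    if len(y) < start + len(x_m):
        y.extend([0.0] * (start + len(x_m) - len(y)))  # grow the output buffer
    for n, sample in enumerate(x_m):
        if n < Wov:
            b = (n + 1) / (Wov + 1)  # fading factor: rises from near 0 towards 1
            y[start + n] = (1 - b) * y[start + n] + b * sample
        else:
            y[start + n] = sample    # outside the overlap: copy directly
    return y


y = [1.0, 1.0, 1.0, 1.0]
overlap_add(y, [0.0, 0.0, 0.0, 0.0], m=1, Ss=2, Wov=2)
print([round(v, 3) for v in y])  # [1.0, 1.0, 0.667, 0.333, 0.0, 0.0]
```

The old output fades out while the new window fades in, so the overlap region moves smoothly from one to the other.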

Calculation of $k_m$

$k_m$ is the optimal shift, determined by the normalized cross-correlation $R^m_{xy}[k]$ between $x$ and $y$ in the overlap region:

$$ k_m = \arg\max_{0 \le k \le k_{\max}} R^m_{xy}[k] \qquad (3) $$

where $k_{\max}$ is the maximum allowable shift from the initial starting position of the analysis window.
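The slides do not spell out the normalization of $R^m_{xy}[k]$. The Python sketch below assumes the common choice of dividing the inner product by the square root of the product of the two segments' energies, computed over the $W_{ov}$ overlap samples $y[mS_s + j]$ and $x[mS_a + k + j]$. The MATLAB code later in this deck divides only by the norm of the $x$ segment; since the $y$ segment does not depend on $k$, that choice has the same maximizer. The function names are hypothetical.

```python
import math


def normalized_xcorr(y, x, m, Sa, Ss, Wov, k):
    """R_xy^m[k] between y[m*Ss + j] and x[m*Sa + k + j], j = 0..Wov-1."""
    ys = [y[m * Ss + j] for j in range(Wov)]
    xs = [x[i] if i < len(x) else 0.0 for i in (m * Sa + k + j for j in range(Wov))]
    num = sum(a * b for a, b in zip(ys, xs))
    den = math.sqrt(sum(a * a for a in ys) * sum(b * b for b in xs))
    return num / den if den > 0 else 0.0


def optimal_shift(y, x, m, Sa, Ss, Wov, kmax):
    """Equation (3): argmax of R_xy^m[k] over 0 <= k <= kmax (first maximum wins)."""
    return max(range(kmax + 1), key=lambda k: normalized_xcorr(y, x, m, Sa, Ss, Wov, k))


period = 16
x = [math.sin(2 * math.pi * n / period) for n in range(200)]
y = x[:40]  # output so far: the first 40 input samples
# Window m = 1 overlaps y[24:40]; x[30 + k:46 + k] matches it when 30 + k = 24 + 16.
print(optimal_shift(y, x, m=1, Sa=30, Ss=24, Wov=16, kmax=15))  # 10
```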

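$k_m$ can often be predicted without computing the similarity. The $m$-th shift, $k_m$, is determined by:

$$ k_m = \begin{cases} t_m = k_{m-1} + (S_s - S_a), & \text{if } 0 \le t_m \le k_{\max} \\ \arg\max_{0 \le k \le k_{\max}} R^m_{xy}[k], & \text{otherwise} \end{cases} $$

The prediction $t_m$ is the shift at which window $m$ continues exactly where window $m-1$ left off in the input: window $m-1$ started at input sample $(m-1)S_a + k_{m-1}$ and was written at output sample $(m-1)S_s$, so the input sample that belongs at output sample $mS_s$ is $(m-1)S_a + k_{m-1} + S_s = mS_a + t_m$.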
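The decision rule above can be sketched in Python. The function name `next_shift` and the `search` callback are hypothetical; the point is that the cross-correlation search of equation (3) runs only when the predicted shift $t_m$ falls outside $[0, k_{\max}]$.

```python
def next_shift(k_prev, Sa, Ss, kmax, search):
    """Return (k_m, searched) using the prediction t_m = k_prev + (Ss - Sa).

    `search` is a zero-argument callable that evaluates equation (3); it is
    called only when the prediction lies outside [0, kmax].
    """
    t_m = k_prev + (Ss - Sa)
    if 0 <= t_m <= kmax:
        return t_m, False
    return search(), True


# Slowing down to half speed (Sa = 100, Ss = 200, kmax = 800): the predicted
# shift grows by 100 per window, so 8 windows reuse it before a search is needed.
# lambda: 0 stands in for a real cross-correlation search.
k, searched_at = 0, []
for m in range(1, 13):
    k, searched = next_shift(k, Sa=100, Ss=200, kmax=800, search=lambda: 0)
    if searched:
        searched_at.append(m)
print(searched_at)  # [9]
```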

Implementation in MATLAB

There are seven steps, as follows:

1. As an initialization step, take $W$ samples from the input signal (held in an input signal buffer) and place them in an output buffer for the output signal.

2. Find the start of the first analysis window, $mS_a$.

3. Find the maximum similarity between the first $W_{ov}$ samples at the start of the analysis window and the last $W_{ov}$ samples at the end of the output signal, by computing the cross-correlation between them.

4. Shift the start of the analysis window by one or two samples (the MATLAB code uses steps of kdecim = 2) and repeat step 3.

5. Repeat steps 3 and 4 until the analysis window has been shifted by the maximum allowed amount, $k_{\max}$.

6. Once the shift giving the maximum cross-correlation is found, overlap-add the last $W_{ov}$ samples of the output signal and the first $W_{ov}$ samples of the shifted analysis window, then transfer the remaining $W - W_{ov}$ samples into the output buffer.

7. Repeat steps 2–6 with the next analysis window until the end of the input signal is reached.
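Putting the seven steps together, a compact Python sketch might look like the following. It is not a transcription of the MATLAB code: it follows the equations on the earlier slides (window $m$ written at output sample $mS_s$, nominal analysis start $mS_a$ with $S_a = F S_s$, the prediction rule for $k_m$, and a linear fade), tries every shift rather than every second one, and correlates at full resolution. The function name `solafs` and its defaults are assumptions.

```python
import math


def solafs(x, F, W=400, kmax=None):
    """Return x time-scaled to play F times faster (F > 1 speeds up, F < 1 slows down).

    Uses Wov = W // 2 and Ss = W - Wov; kmax defaults to 2 * W.
    """
    Wov = W // 2
    Ss = W - Wov
    kmax = 2 * W if kmax is None else kmax
    n_out = round(len(x) / F)
    xp = list(x) + [0.0] * (kmax + W)  # zero-pad so every candidate window is complete

    y = xp[:W]                  # step 1: copy the first W input samples to the output
    k, prev_pos, m = 0, 0, 1
    while m * Ss + W <= n_out:
        pos = round(m * F * Ss)          # step 2: nominal start m*Sa of window m
        out = m * Ss                     # window m is written at output sample m*Ss
        t = k + Ss - (pos - prev_pos)    # shift that continues window m-1 seamlessly
        if 0 <= t <= kmax:
            k = t                        # prediction usable: skip the similarity search
        else:
            # Steps 3-5: try every shift 0..kmax, keep the best normalized correlation.
            ov = y[out:out + Wov]
            best = -math.inf
            for cand in range(kmax + 1):
                seg = xp[pos + cand:pos + cand + Wov]
                num = sum(a * b for a, b in zip(ov, seg))
                den = math.sqrt(sum(a * a for a in ov) * sum(b * b for b in seg))
                score = num / den if den > 0 else 0.0
                if score > best:
                    best, k = score, cand
        win = xp[pos + k:pos + k + W]
        # Step 6: cross-fade the Wov overlapping samples, then append the rest.
        for n in range(Wov):
            b = (n + 1) / (Wov + 1)
            y[out + n] = (1 - b) * y[out + n] + b * win[n]
        y.extend(win[Wov:])
        prev_pos = pos
        m += 1                           # step 7: continue with the next analysis window
    return y


tone = [math.sin(2 * math.pi * n / 25) for n in range(800)]
print(len(solafs(tone, 2.0, W=40)), len(solafs(tone, 0.5, W=40)))  # 400 1600
```

With F = 1 the prediction is always in range, so the sketch reproduces its input, which is a quick sanity check on the indexing.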

Parameter choices

- The smallest useful synthesis shift is $S_s = W_{ov}$.

- The smallest useful window length is $W = 2W_{ov}$.

- $k_{\max} = 2W$.
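As a worked example of these rules, the Python sketch below derives all parameters from $W$ and the speed factor $F$. It applies $W = 2W_{ov}$, $S_s = W_{ov}$ and $k_{\max} = 2W$ from this slide, plus $S_a = F S_s$, which follows from `xpos = F * ypos` in the MATLAB code. The function name is hypothetical.

```python
def choose_parameters(W, F):
    """SOLAFS parameters for window length W and speed factor F.

    Applies W = 2*Wov, Ss = Wov and kmax = 2*W, plus Sa = F*Ss
    (the input advances F times as fast as the output).
    """
    if W % 2:
        raise ValueError("W must be even so that W = 2*Wov")
    Wov = W // 2
    Ss = Wov
    return {"W": W, "Wov": Wov, "Ss": Ss, "Sa": F * Ss, "kmax": 2 * W}


print(choose_parameters(400, 4))
# {'W': 400, 'Wov': 200, 'Ss': 200, 'Sa': 800, 'kmax': 800}
print(choose_parameters(400, 0.75)["Sa"])  # 150.0
```

With $W = 400$, the value used in the MATLAB code, a 4X speed-up advances the input by 800 samples for every 200 output samples.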

MATLAB

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Project Spring 2005
%% Rachan Fugcharoen ECE5525
%
% Do SOLAFS time-scale modification.
% Y is X scaled to run F times faster. X is added in windows
% W points long, overlapping by Wov points with the previous output.
% The similarity is calculated over the last Wsim points of output.
% Maximum similarity skew is Kmax points.
% Each xcorr calculation is decimated by xdecim (8).
% The skew-axis sampling is decimated by kdecim (2).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Read the wave file
[d, fs] = audioread('we.wav');
W = 400;        % window length
Wov = W/2;      % number of overlapping points
Kmax = 2*W;     % maximum shift (skew)
Wsim = Wov;     % number of output points used for the similarity
xdecim = 8;     % decimation of each xcorr
kdecim = 2;     % decimation of the skew-axis sampling
X = d(:, 1)';   % first channel, as a row vector

% Factor to run X faster (F > 1) or slower (F < 1)
F = 4;
Ss = W - Wov;
xpts = size(X, 2);
ypts = round(xpts / F);
Y = zeros(1, ypts);
% Cross-fade window is Wov points long; it grows from near 0 towards 1
xfwin = (1:Wov)/(Wov+1);
% Index to add to ypos to get the overlap region
ovix = (1-Wov):0;
% Index for the non-overlapping part
newix = 1:(W-Wov);
% Index for similarity chunks (decimates the cross-correlation)
simix = (1:xdecim:Wsim) - Wsim;
% Pre-pad X for extraction
padX = [zeros(1, Wsim), X, zeros(1, Kmax+W-Wov)];
% Startup: just copy the first bit
Y(1:Wsim) = X(1:Wsim);

xabs = 0;
lastxpos = 0;
lastypos = 0;
km = 0;
for ypos = Wsim:Ss:(ypts-W)
    % Ideal X position (rounded so it is a valid index when F is not an integer)
    xpos = round(F * ypos);
    % Overlap prediction: the shift that continues the last copy seamlessly,
    % k_m = k_(m-1) + (Ss - Sa), with Ss = ypos - lastypos and Sa = xpos - lastxpos
    kmpred = km + (ypos - lastypos) - (xpos - lastxpos);
    lastxpos = xpos;
    lastypos = ypos;
    if (kmpred >= 0 && kmpred <= Kmax)
        km = kmpred;   % no need to search
    else

        % Calculate the skew, km,
        % .. by first figuring the cross-correlation
        ysim = Y(ypos + simix);
        % Clear the Rxy array
        rxy = zeros(1, Kmax+1);
        rxx = zeros(1, Kmax+1);
        Kmin = 0;

        ks = Kmin:kdecim:Kmax;   % shifts that are actually evaluated
        for k = ks
            xsim = padX(Wsim + xpos + k + simix);
            rxx(k+1) = norm(xsim);
            rxy(k+1) = (ysim * xsim');
        end
        % Zero the points where rxx was zero
        Rxy = (rxx ~= 0).*rxy./(rxx+(rxx==0));
        % The maximum over the evaluated shifts gives the skew
        % (skipped shifts keep Rxy = 0 and must not be selected)
        [~, imax] = max(Rxy(ks+1));
        km = ks(imax);
    end
    xabs = xpos + km;

    % Cross-fade some points
    Y(ypos+ovix) = ((1-xfwin).*Y(ypos+ovix)) + (xfwin.*padX(Wsim+xabs+ovix));
    % Add in the remaining points
    Y(ypos+newix) = padX(Wsim+xabs+newix);
end
% Plot the result
subplot(211); plot(X); grid; original = axis;
subplot(212); plot(Y); grid; change = axis;

if F > 1
    subplot(211); title('Original wave file'); axis(original)
    subplot(212); title(['Modified wave file (Speed = ', num2str(F), 'X)']); axis(original)
else
    subplot(211); title('Original wave file'); axis(change)
    subplot(212); title(['Modified wave file (Speed = ', num2str(F), 'X)']); axis(change)
end
% Play the result and save it as an 8-bit WAV file named after the speed factor
sound(Y, fs);
audiowrite(sprintf('we_new_%gX.wav', F), Y(:), fs, 'BitsPerSample', 8);

Results

[Figure: Speed 2X. Top panel "Original wave file", bottom panel "Modified wave file (Speed = 2X)"; both panels share a horizontal axis of 0 to 3.5 x 10^4 samples and an amplitude axis of -1 to 1.]

[Figure: Speed 4X. Top panel "Original wave file", bottom panel "Modified wave file (Speed = 4X)"; both panels share a horizontal axis of 0 to 3.5 x 10^4 samples and an amplitude axis of -1 to 1.]

[Figure: Speed 0.75X. Top panel "Original wave file", bottom panel "Modified wave file (Speed = 0.75X)"; both panels share a horizontal axis of 0 to 4.5 x 10^4 samples and an amplitude axis of -1 to 1.]

[Figure: Speed 0.5X. Top panel "Original wave file", bottom panel "Modified wave file (Speed = 0.5X)"; both panels share a horizontal axis of 0 to 7 x 10^4 samples and an amplitude axis of -1 to 1.]

Conclusion

The results are acceptable with a proper choice of parameters.

The SOLAFS algorithm provides time-scale-modified speech over a wide range of compression and expansion.

It requires significantly less computation than many other methods, in part because the shift $k_m$ can often be predicted without computing the similarity.

As written, the MATLAB code reads the entire input file into memory and keeps full-length input and output buffers, which makes it difficult to use in real-time applications.

Real-time applications have to process everything as fast as possible; if the data is stored in compressed form or the storage units are slow, processing becomes difficult.

References

D. J. Hejna. Real-time time-scale modification of speech via the synchronized overlap-add algorithm. Master's thesis, M.I.T., 1990.

D. Hejna and B. R. Musicus. The SOLAFS time-scale modification algorithm. Research, 1991.