Dynamic time warping and PIC 16F676 for control of devices

1st

• Introduction

• Proposed System Overview

• A Simple Speech Recognition System and its Types

• Acquisition of Speech Signal and its Analysis

• Dynamic Time Warping Algorithm for Digit Recognition

2nd

• Introduction

• RS-232-C and Serial Communication with Matlab R2011b

• Serial Communications with PIC 16F676 for Device Control

• Interfacing Circuit Schematics and Design

3rd

• Summary

• Conclusion and Results

• Future Work

Part 1

Introduction

Proposed System Overview

Speech Recognition and its Types

Acquisition of Speech Signal and its Analysis

Dynamic Time Warping (DTW)

DTW for Digit Recognition

The discussion so far has covered the implementation of speaker recognition for user authentication.

The goal of the project is to give the authenticated user access to control the devices connected to the system.

Speaker Recognition → Speech Recognition → Device Control

The devices are controlled by recognizing the device ID (a digit from 1 to 8) of a device connected to the system.

Recognition of the device ID is accomplished using DTW-based, speaker-independent isolated-word recognition.

Block diagram of the proposed system:

Recording Training Sequences → MFCC Feature Extraction → Speaker Model

Monitoring Microphone → Calculate VQ → Make Decision and Display Results

Monitoring Microphone → DTW-Based Matching → Toggle Device Status

The DTW algorithm is based on dynamic programming: it is a systematic procedure for comparing two sequences of acoustic feature vectors.

It is used to measure the similarity between two time series that may vary in time or speed.

Speech is represented by a series of feature vectors that are computed every 10 ms.

The technique finds the optimal assignment between two time series of acoustic feature vectors.

If one of the time series is “warped” non-linearly by stretching or shrinking it along its time axis, this technique of obtaining time frames of comparable length is called “time warping”.

A whole word comprises dozens of feature vectors. The number of vectors depends on how fast we speak.
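As a rough illustration of why the number of vectors depends on the speaking rate, the sketch below counts the feature vectors for two utterance lengths. It assumes one vector per 10 ms step and ignores the analysis window length; the function name and the two durations are hypothetical values chosen for the example.

```python
FRAME_STEP_MS = 10  # one feature vector every 10 ms, as stated above


def num_feature_vectors(duration_ms: int, step_ms: int = FRAME_STEP_MS) -> int:
    """Approximate number of feature vectors for an utterance of the given length."""
    return duration_ms // step_ms


# The same word spoken quickly and slowly (durations are made-up examples).
fast = num_feature_vectors(300)
slow = num_feature_vectors(500)
print(fast, slow)  # 30 50
```

The two sequences describe the same word but differ in length, which is the mismatch that time warping has to compensate for.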

Let us consider an example: an utterance with vector sequence x is to be compared with the known vector sequence w of a word.

We need to measure the distance between these vector sequences to determine their similarity.

While computing the distance we need to find an “optimal assignment” between the individual vector pairs and compute the distances between the paired vectors.

However, words whose vector sequences differ in length need to be taken into account; for that purpose, consider the following diagram.

• The length Lp of the path P is determined by the maximum number of vectors in x and w

• The assignment between x and w is given by P, which can be interpreted as a time warping between the time axes of x and w

• Thus time warping compensates for vector sequences of different lengths

• For a given path P, the distance between the vector sequences can now be computed as the sum of the distances between the individual vectors

• d(gl) denotes the vector distance for the time indices i and j defined by the grid point gl = (i, j); this distance would be the Euclidean distance

• The criterion for finding the optimal path Popt is to minimize the distance D(x, w, P)

• However, it is not necessary to compute all the paths P and the corresponding distances D to determine which one is the optimum

• Since feature vectors are measured at short time intervals, we restrict time warping to reasonable boundaries. For this purpose we need to understand local path alternatives

• The first and last vectors of x and w should be assigned to each other

• Time warping is restricted to “reusing” the preceding vectors to locally warp the duration of the speech signal; with these restrictions we can draw the local path alternatives

• The grid point (i, j) can have the possible predecessors (i − 1, j), (i − 1, j − 1) and (i, j − 1)

• Popt will be the concatenation of these local path alternatives

• Now that we have defined the local path alternatives, we can use Bellman’s principle to find the optimal path Popt

• Bellman’s principle states the following: if Popt is the optimal path through the matrix of grid points beginning at (0, 0) and ending at (Tx − 1, Tw − 1), and the grid point (i, j) is part of path Popt, then the partial path from (0, 0) to (i, j) is also part of Popt.

• There are only 3 possible predecessor paths: (i − 1, j), (i − 1, j − 1) and (i, j − 1)

• Now let us assume we have calculated the optimal paths to the above 3 predecessors and their corresponding accumulated distances

• We can now find the optimal path from (0, 0) to grid point (i, j) by selecting exactly the one path hypothesis which minimizes the accumulated distance

• Since the decision for the best predecessor path hypothesis reduces the number of paths leading to grid point (i, j) to exactly one, it is also said that the possible path hypotheses are recombined during the optimization step.

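• Accumulated distance at grid point (i, j): 𝛿(i, j) = d(i, j) + min{ 𝛿(i − 1, j), 𝛿(i − 1, j − 1), 𝛿(i, j − 1) }

• Initialization: 𝛿(0, 0) = d(0, 0)

• Iteration: compute 𝛿(i, j) for each grid point from its three possible predecessors

• Termination: D(x, w) = 𝛿(Tx − 1, Tw − 1)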
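The initialization, iteration and termination steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Matlab implementation used in the project: the names euclidean, dtw and recognize are hypothetical, the three local path alternatives are given equal weight, and the toy templates use one-dimensional "feature vectors" instead of real MFCC vectors.

```python
import math


def euclidean(a, b):
    """Local distance d(i, j) between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw(x, w):
    """Return (accumulated distance, optimal path) between sequences x and w."""
    tx, tw = len(x), len(w)
    inf = float("inf")
    delta = [[inf] * tw for _ in range(tx)]   # accumulated distances
    back = [[None] * tw for _ in range(tx)]   # best predecessor of each grid point

    for i in range(tx):
        for j in range(tw):
            d = euclidean(x[i], w[j])
            if i == 0 and j == 0:
                delta[i][j] = d               # initialization at (0, 0)
                continue
            # The local path alternatives that stay inside the grid.
            candidates = []
            if i > 0:
                candidates.append((delta[i - 1][j], (i - 1, j)))
            if i > 0 and j > 0:
                candidates.append((delta[i - 1][j - 1], (i - 1, j - 1)))
            if j > 0:
                candidates.append((delta[i][j - 1], (i, j - 1)))
            best, back[i][j] = min(candidates)  # recombination: keep one hypothesis
            delta[i][j] = d + best

    # Termination: backtrack from the last grid point to recover the optimal path.
    path, point = [], (tx - 1, tw - 1)
    while point is not None:
        path.append(point)
        point = back[point[0]][point[1]]
    path.reverse()
    return delta[tx - 1][tw - 1], path


def recognize(x, templates):
    """Pick the template label with the smallest DTW distance to x."""
    return min(templates, key=lambda label: dtw(x, templates[label])[0])


# Toy templates for two digits and a slower utterance of the first one.
templates = {"1": [[0.0], [1.0], [2.0], [1.0]],
             "2": [[3.0], [3.0], [0.0], [0.0]]}
utterance = [[0.0], [0.0], [1.0], [2.0], [2.0], [1.0]]
distance, path = dtw(utterance, templates["1"])
print(distance, recognize(utterance, templates))  # 0.0 1
```

Because only the best predecessor is stored for each grid point, the table is filled in a single pass over the grid and no complete path ever has to be enumerated.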

1. Introduction

2. RS-232-C Serial Communications with Matlab

3. Serial Communication with PIC16F676 for Device Control

4. Interfacing Circuits and Schematics

• The RS-232-C convention specifies that, with respect to ground, a voltage more negative than -3 V is interpreted as a 1 bit and a voltage more positive than +3 V as a 0 bit.

• Serial communications, according to RS-232-C, require that transmitter and receiver agree on a communications protocol.
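The voltage convention stated above can be written as a small decision rule. The sketch below is only an illustration: the function name and the sample voltages are hypothetical, and voltages between −3 V and +3 V are reported as undefined because the rule above assigns them no bit value.

```python
def rs232_bit(voltage: float):
    """Map a line voltage (relative to ground) to a logic bit, or None if undefined."""
    if voltage < -3.0:
        return 1   # more negative than -3 V is read as a 1 bit
    if voltage > 3.0:
        return 0   # more positive than +3 V is read as a 0 bit
    return None    # between -3 V and +3 V the rule above defines no bit


samples = [-12.0, 12.0, -5.0, 0.5]   # example voltages, chosen for illustration
print([rs232_bit(v) for v in samples])  # [1, 0, 1, None]
```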

Serial communication in Matlab R2011b is possible by writing scripts that initialize a special variable to keep track of a serial connection: the serial object.

Unlike normal variables, which have a single value, objects have many "attributes" or parameters that can be set (e.g. port number, baud rate, buffer size). The port number is a label that identifies the port your device is connected to.

In order to send or receive data through the serial port object it must be open. When not in use it can be closed (not the same as deleting it). We can have many different serial objects in memory.

They can all send and receive data at the same time as long as they are each on a different port. There can even be several objects associated with the same physical port. However, only one of those objects associated with a given port can actually be open (sending or receiving data) at any time.

a. Creating a Serial Port Object:

    serialPort = serial('com1')

Resulting initializations:

    Serial Port Object : Serial-COM1

    Communication Settings
        Port:            COM1
        BaudRate:        9600
        Terminator:      'LF'

    Communication State
        Status:          closed
        RecordStatus:    off

    Read/Write State
        TransferStatus:  idle
        BytesAvailable:  0
        ValuesReceived:  0
        ValuesSent:      0

b. Setting the Parameters:

    get(serialPort, 'BaudRate')
    ans = 9600

    set(serialPort, 'BaudRate', 19200)
    get(serialPort, 'BaudRate')
    ans = 19200

Setting properties one at a time is cumbersome if there are many things to change. A better way is to set them when you create the serial object:

    serialPort_new = serial('com1', 'BaudRate', 19200, 'Terminator', 'CR')

• Writing To The Serial Port: before you can write to the serial port, you need to open it by passing the serial object (not the port name) to fopen:

    fopen(serialPort_new)

• Writing Binary Data: use the command fwrite to send four bytes of binary data:

    fwrite(serialPort_new, [0, 12, 117, 251]);

• Reading From The Serial Port: you can use fread to read in data (not text). It can automatically format the data for you. For example, say the buffer currently has 2 bytes of data in it:

    a = fread(serialPort_new, 2); % reads two bytes and returns them as a vector

Overview of the system:

1. Establish serial port communication with Matlab

2. Acquire the results of user authentication

3. Display the results for the authenticated user

4. Display the speech recognition menu and accept the device ID uttered by the authenticated user

5. Send the identified device ID via the serial port to the PIC to toggle the current status of the device
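The toggling behaviour on the receiving side can be illustrated with a short simulation. This is a hedged sketch in Python rather than PIC firmware: the function name, the list used to hold the device states and the sequence of received IDs are all assumptions made for the example.

```python
NUM_DEVICES = 8  # device IDs are the digits 1 to 8


def toggle_device(status, device_id):
    """Return a copy of the status list with the given device toggled."""
    if not 1 <= device_id <= NUM_DEVICES:
        raise ValueError(f"device ID must be 1..{NUM_DEVICES}, got {device_id}")
    updated = list(status)
    updated[device_id - 1] = not updated[device_id - 1]
    return updated


status = [False] * NUM_DEVICES          # all devices start switched off
for received_id in (3, 5, 3):           # IDs as they might arrive over the serial link
    status = toggle_device(status, received_id)
print(status)  # only device 5 is on: device 3 was toggled twice
```

Because each ID toggles rather than sets the state, uttering the same device ID twice returns that device to its previous status.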

Registers used in Asynchronous Mode

1. The SPBRG register is set up for the selected baud rate.

2. Asynchronous reception is enabled by clearing the SYNC bit in the TXSTA register and setting the SPEN bit in the RCSTA register.

3. To enable the receive data interrupt, the RCIE, GIE, and PEIE bits must be set.

4. Reception is activated by setting the CREN bit in RCSTA.

5. When reception has concluded, the RCIF bit in the PIR1 register is set.

6. Received data is retrieved by reading RCREG.

7. If any error occurred, the CREN bit must be cleared.
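Step 1 requires a value for SPBRG. The sketch below shows how that value could be estimated, assuming the asynchronous baud-rate formula given in Microchip's mid-range USART documentation (baud = Fosc / (16 · (SPBRG + 1)) in high-speed mode, divisor 64 in low-speed mode). The 4 MHz oscillator frequency and the function name are assumptions for illustration; the slides do not state the clock used.

```python
def spbrg_value(fosc_hz: int, baud: int, brgh: bool = True):
    """Return (SPBRG value, actual baud rate, percent error) for asynchronous mode."""
    divisor = 16 if brgh else 64          # high-speed vs low-speed baud generator
    spbrg = round(fosc_hz / (divisor * baud)) - 1
    if not 0 <= spbrg <= 255:
        raise ValueError("baud rate not reachable with an 8-bit SPBRG")
    actual = fosc_hz / (divisor * (spbrg + 1))
    error_pct = (actual - baud) / baud * 100
    return spbrg, actual, error_pct


# Assumed 4 MHz oscillator, 19200 baud as in the Matlab serial object settings.
spbrg, actual, error = spbrg_value(4_000_000, 19200)
print(spbrg, round(actual), round(error, 2))  # 12 19231 0.16
```

The small residual error arises because SPBRG can only hold an integer, so the transmitter and receiver rates agree only approximately.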

pic project/Relay Board SCH K619A.pdf

This presentation has discussed the aspects involved in speaker and speech recognition and the various techniques used to achieve them.

The acquisition of acoustic feature vectors, the matching of those vectors against the existing models in the database using vector quantization (optimized with the LBG algorithm), and word identification using DTW have been dealt with.

Serial communication between Matlab and the PIC via the serial port using the RS-232-C standard has also been presented, along with the process of granting the authenticated user access for device control.

Speaker ID   Speaker recognition      Speech recognition       Accuracy % (speaker/speech)
             Attempts   Correct       Attempts   Correct
1            10         8             10         9             80/90
2            10         9             10         8             90/80
3            10         8             10         9             80/90
4            10         9             10         9             90/90
Total        40         34            40         35            85/87.5
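The overall accuracies follow from the per-speaker counts in the table. The short sketch below recomputes them; the dictionary layout and the function name are chosen only for this illustration.

```python
# (speaker attempts, speaker correct, speech attempts, speech correct) per speaker ID
results = {
    1: (10, 8, 10, 9),
    2: (10, 9, 10, 8),
    3: (10, 8, 10, 9),
    4: (10, 9, 10, 9),
}


def accuracy(correct: int, attempts: int) -> float:
    """Recognition accuracy as a percentage."""
    return 100.0 * correct / attempts


speaker_total = accuracy(sum(r[1] for r in results.values()),
                         sum(r[0] for r in results.values()))
speech_total = accuracy(sum(r[3] for r in results.values()),
                        sum(r[2] for r in results.values()))
print(speaker_total, speech_total)  # 85.0 87.5
```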

Speaker recognition, training phase:

• Insert a class ID

• Set the speech signal duration, fs and number of bits per sample

• Acquire the speech signal via the microphone using the audiorecorder function

• Feature extraction using Mfcc(s, fs): frame blocking using a Hamming window, mel-frequency filter bank

• Feature matching using Vqlbg(d, k)

• VQ codebook

Speaker recognition, testing phase:

• Set the speech signal duration, fs and number of bits per sample

• Acquire the speech signal via the microphone using the audiorecorder function

• Feature extraction using Mfcc(s, fs): frame blocking using a Hamming window, mel-frequency filter bank

• Feature matching using Vqlbg(d, k)

• VQ codebook

Authentication:

• The VQ codebook from the training phase and the VQ codebook from the testing phase are compared using Euclidean distances

• The user ID with the lowest Euclidean distance is authenticated

Speech recognition and device control:

• Creation of reference templates: a path is provided to a separate folder that holds all the words to be recognized

• Feature extraction

• Calculation of the lowest total cost

• Comparison of the local distance with all the stored words

• Selection of the optimal path

• The recognized word is sent to the COM port

• The signal (device ID) is received by the PIC and the corresponding device is toggled
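The authentication step, in which the user ID with the lowest Euclidean distance is selected, might look roughly like the sketch below. It is an assumption-laden illustration: the average distance from each test vector to its nearest codeword is used as the comparison measure, the function names are hypothetical, and the two-dimensional codebooks are toy data rather than real MFCC codebooks.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def average_distortion(test_vectors, codebook):
    """Mean distance from each test vector to its nearest codeword."""
    total = sum(min(euclidean(v, c) for c in codebook) for v in test_vectors)
    return total / len(test_vectors)


def authenticate(test_vectors, codebooks):
    """Return the user ID whose trained codebook gives the lowest distortion."""
    return min(codebooks, key=lambda uid: average_distortion(test_vectors, codebooks[uid]))


# Toy training codebooks for two users and a test-phase codebook to compare.
codebooks = {1: [[0.0, 0.0], [1.0, 1.0]], 2: [[5.0, 5.0], [6.0, 6.0]]}
test_codebook = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.9]]
print(authenticate(test_codebook, codebooks))  # 1
```

A practical system would also need a rejection threshold, since this rule always returns some user ID even for an unknown speaker; that threshold is not described in the slides.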

• The proposed system could be improved considerably by implementing more efficient models for speaker identification, such as Hidden Markov Models (HMMs). An HMM uses statistical theory to arrange the feature vectors into a Markov chain whose matrix stores the probabilities of state transitions.

• Along with speaker recognition, an added level of voice-based biometric security could be provided using speech recognition: after verifying who the user is, the system would acquire a specific keyword unique to the system. Integrating mobile-phone-based system access would also mean that a system could be controlled from almost anywhere in the world.

• The fuzzy c-means clustering technique improves VQ performance at the classification stage. The FVQ performance can be improved further by using the fuzzy-based hierarchical clustering approach proposed by Haipeng.

• The performance of GMM is better than that of the other classifiers, even though FVQ improves ASR performance significantly compared with the other VQ techniques. Additional work in the area of enhanced or alternative fuzzy clustering techniques is appropriate.
