roger-gomes
DESCRIPTION
A presentation prepared as part of the final-year project in Semester 8 of the undergraduate degree course in engineering. It explains one of the modules of the project "Speaker and Speech Recognition based Embedded System Design for User Authentication and remote Device Control", namely the Speech Recognition Module. It explains the Dynamic Time Warping (DTW) algorithm used for speech recognition and how the recognition result is used, together with the PIC 16F676 microcontroller, to control remote devices connected to the system.
1st
• Introduction
• Proposed System Overview
• A Simple Speech Recognition System and its Types
• Acquisition of Speech Signal and its Analysis
• Dynamic Time Warping Algorithm for Digit Recognition
2nd
• Introduction
• RS-232-C and Serial Communication with MATLAB R2011b
• Serial Communications with PIC 16F676 for Device Control
• Interfacing Circuit Schematics and Design
3rd
• Summary
• Conclusion and Results
• Future Work
Part 1
1. Introduction
2. Proposed System Overview
3. Speech Recognition and its Types
4. Acquisition of Speech Signal and its Analysis
5. Dynamic Time Warping (DTW)
6. DTW for Digit Recognition
The discussion so far has concerned the implementation of speaker recognition for user authentication.
The goal of the project is to let the authenticated user control the devices connected to the system.
Speaker Recognition → Speech Recognition → Device Control
The devices are controlled by recognizing the spoken device ID (a digit from 1 to 8) of a device connected to the system.
The device ID is recognized using speaker-independent isolated-word recognition based on the DTW algorithm.
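Figure: block diagrams of the system phases
Training (speaker recognition): Recording Training Sequences → MFCC Feature Extraction → Speaker Model
Testing (speaker recognition): Monitoring Microphone → MFCC Feature Extraction → Calculate VQ → Make Decision and Display Results
Speech recognition (device control): Monitoring Microphone → MFCC Feature Extraction → DTW-based Matching → Toggle Device Status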
The DTW algorithm is based on dynamic programming; here it provides a systematic way of comparing two sequences of acoustic feature vectors.
It measures the similarity between two time series that may vary in timing or speed.
Our speech is represented by a series of feature vectors computed every 10 ms.
The technique finds the optimal assignment (alignment) between two time series of acoustic feature vectors.
If one of the time series is “warped” non-linearly, by stretching or shrinking it along its time axis so that the time frames of the two series become comparable, the technique is called “time warping”.
A whole word comprises dozens of feature vectors, and the number of vectors depends on how fast we speak.
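To see why the number of vectors depends on speaking rate, here is a minimal Python sketch of the frame-blocking step: Hamming-windowed frames taken every 10 ms, one frame per future feature vector. The project itself worked in MATLAB; Python is used only for illustration. The 25 ms frame length and the 8000 Hz sampling rate in the example are assumptions, and `hamming` and `frame_signal` are hypothetical helpers, not part of the project's code.

```python
import math


def hamming(n_len):
    """Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)) for n = 0..N-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_len - 1)) for n in range(n_len)]


def frame_signal(samples, fs, frame_ms=25.0, hop_ms=10.0):
    """Split a signal into overlapping Hamming-windowed frames.

    frame_ms (25 ms) is an assumed frame length; hop_ms (10 ms) is the step
    between successive feature vectors. One MFCC vector would later be
    computed per frame, so the frame count grows with utterance length.
    """
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    if len(samples) < frame_len:
        return []  # too short for even one frame
    window = hamming(frame_len)
    n_frames = 1 + (len(samples) - frame_len) // hop
    frames = []
    for f in range(n_frames):
        start = f * hop
        chunk = samples[start:start + frame_len]
        # Multiply each sample by the window to taper the frame edges.
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames
```

With fs = 8000 Hz, a 0.5 s utterance gives 48 frames and a 1.0 s utterance gives 98, so the same word spoken twice as slowly yields roughly twice as many feature vectors.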
Consider an unknown word with vector sequence X that is to be compared with the vector sequence W of a known (reference) word.
We need to measure the distance between these vector sequences to determine their similarity.
To compute this distance, we must find an “optimal assignment” between the individual vectors of the two sequences and then compute the distances between the assigned vector pairs.
However, words whose vector sequences differ in length must also be taken into account; for that purpose, consider the following diagram.
• The length L_P of the path is determined by the maximum number of vectors in X and W.
• The assignment between X and W given by P can be interpreted as a time warping between the time axes of X and W.
• Thus, by time warping, vector sequences of different lengths can be compensated.
• For a given path P, the distance between the vector sequences can now be computed as the sum of the distances between the individual vector pairs.
• d(g_l) denotes the vector distance for the time indices i and j defined by the grid point g_l = (i, j); this distance is the Euclidean distance.
• The criterion for finding the optimal path P_opt is to minimize the distance D(X, W, P).
• However, it is not necessary to compute all paths P and their corresponding distances D to determine the optimum.
• Since feature vectors are measured at short time intervals, we restrict time warping to reasonable boundaries. For this purpose we need to understand the local path alternatives.
• The first and last vectors of X and W should be assigned to each other.
• To locally warp the duration of the speech signal, preceding vectors may be “reused” (one vector can be assigned to several consecutive vectors of the other sequence); with these restrictions we can draw the local path alternatives.
• The grid point (i, j) can have the possible predecessors (i − 1, j), (i − 1, j − 1) and (i, j − 1).
• P_opt is a concatenation of these local path alternatives.
• Now that we have defined the local path alternatives, we can use Bellman’s principle to find the optimal path P_opt.
• Bellman’s principle states: if P_opt is the optimal path through the matrix of grid points beginning at (0, 0) and ending at (T_X − 1, T_W − 1), and the grid point (i, j) is part of P_opt, then the partial path of P_opt from (0, 0) to (i, j) is itself the optimal path from (0, 0) to (i, j).
• Only three predecessor paths are possible: (i − 1, j), (i − 1, j − 1) and (i, j − 1).
• Now assume we have already calculated the optimal paths to these three predecessors and their corresponding accumulated distances.
• We can now find the optimal path from (0, 0) to the grid point (i, j) by selecting exactly the one path hypothesis that minimizes the accumulated distance.
• Since the decision for the best predecessor reduces the number of paths leading to the grid point (i, j) to exactly one, the possible path hypotheses are said to be recombined during the optimization step.
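The path distance and the optimization criterion from the bullets above can be written compactly as follows. The notation is ours: X = (x_0, …, x_{T_X−1}) is the unknown sequence, W = (w_0, …, w_{T_W−1}) the reference sequence, each feature vector has K components, and the path P consists of the grid points g_1, …, g_{L_P}.

```latex
D(X, W, P) = \sum_{l=1}^{L_P} d(g_l), \qquad g_l = (i_l, j_l)

d(g_l) = \lVert x_{i_l} - w_{j_l} \rVert
       = \sqrt{\sum_{k=1}^{K} \bigl(x_{i_l}[k] - w_{j_l}[k]\bigr)^2}

P_{\mathrm{opt}} = \arg\min_{P} D(X, W, P)
```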
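Accumulated distance δ(i, j) of the optimal partial path from (0, 0) to (i, j), where d(i, j) is the local distance at grid point (i, j):
Initialization: $\delta(0, 0) = d(0, 0)$
Iteration: $\delta(i, j) = d(i, j) + \min\{\delta(i-1, j),\ \delta(i-1, j-1),\ \delta(i, j-1)\}$
Termination: $D(X, W) = \delta(T_X - 1, T_W - 1)$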
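The initialization, iteration and termination steps above can be sketched in Python as follows. This is a minimal illustration, not the project's MATLAB code: it assumes feature vectors are plain lists of floats, uses the Euclidean local distance, treats predecessors outside the grid as infinitely expensive, and adds a hypothetical `recognize` helper that picks the reference word (for example a device ID digit) with the lowest accumulated distance.

```python
import math


def euclidean(a, b):
    """Local distance d(i, j) between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw_distance(x, w):
    """Accumulated DTW distance D(X, W) = delta(T_X - 1, T_W - 1).

    delta(0, 0) = d(0, 0)
    delta(i, j) = d(i, j) + min(delta(i-1, j), delta(i-1, j-1), delta(i, j-1))
    Predecessors outside the grid are treated as infinite cost.
    """
    tx, tw = len(x), len(w)
    inf = float("inf")
    delta = [[inf] * tw for _ in range(tx)]
    for i in range(tx):
        for j in range(tw):
            d = euclidean(x[i], w[j])
            if i == 0 and j == 0:
                delta[i][j] = d  # initialization
                continue
            best = min(
                delta[i - 1][j] if i > 0 else inf,
                delta[i - 1][j - 1] if i > 0 and j > 0 else inf,
                delta[i][j - 1] if j > 0 else inf,
            )
            # Recombination: only the best predecessor hypothesis survives.
            delta[i][j] = d + best
    return delta[tx - 1][tw - 1]  # termination


def recognize(x, templates):
    """Return the label of the reference template with the lowest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(x, templates[label]))
```

For instance, with one-dimensional toy "feature vectors", the sequence [[1], [1], [2], [2], [3]] (a slowly spoken version of the template [[1], [2], [3]]) aligns with that template at an accumulated distance of 0.0, so `recognize` selects it over any template that differs.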
Part 2
1. Introduction
2. RS-232-C Serial Communications with MATLAB
3. Serial Communication with PIC16F676 for Device Control
4. Interfacing Circuits and Schematics
• The RS-232-C convention specifies that, with respect to ground, a voltage more negative than -3 V is interpreted as a 1 bit and a voltage more positive than +3 V as a 0 bit.
• Serial communications according to RS-232-C require that the transmitter and receiver agree on a communications protocol (for example baud rate, number of data bits, parity and number of stop bits).
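As an illustration of both points, the sketch below maps RS-232-C line voltages to bits and builds the bit sequence of one asynchronous character frame. It assumes the common 8N1 protocol (one start bit, eight data bits sent least-significant bit first, no parity, one stop bit) and a ±12 V output swing; both are assumptions for illustration, not settings taken from the project.

```python
def rs232_level_to_bit(volts):
    """Interpret an RS-232-C line voltage (relative to ground) as a bit.

    More negative than -3 V -> 1 (mark); more positive than +3 V -> 0 (space);
    anything in between is undefined and returns None.
    """
    if volts < -3.0:
        return 1
    if volts > 3.0:
        return 0
    return None


def bit_to_level(bit, swing=12.0):
    """Drive a bit onto the line; the +/-12 V swing is an assumed typical value."""
    return -swing if bit else swing


def frame_8n1(byte):
    """Bits of one 8N1 character frame: start bit (0), 8 data bits LSB first, stop bit (1)."""
    if not 0 <= byte <= 255:
        raise ValueError("byte must be in 0..255")
    data_bits = [(byte >> k) & 1 for k in range(8)]
    return [0] + data_bits + [1]
```

The ASCII character '3' (0x33) becomes [0, 1, 1, 0, 0, 1, 1, 0, 0, 1]. Because each 8N1 frame occupies 10 bit times, a 9600 baud link carries at most 960 bytes per second.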
Serial communication in MATLAB R2011b is possible by writing scripts that initialize a special variable to keep track of serial connections: the serial port object.
Unlike normal variables, which have a single value, objects have many "attributes" or parameters that can be set (e.g. port number, baud rate, buffer size). One of those attributes is the port: a label that corresponds to the port your device is connected to.
In order to send or receive data through the serial port object it must be open. When not in use it can be closed (not the same as deleting it). We can have many different serial objects in memory.
They can all send and receive data at the same time as long as they are each on a different port. There can even be several objects associated with the same physical port. However, only one of those objects associated with a given port can actually be open (sending or receiving data) at any time.
a. Creating a Serial Port Object:
serialPort = serial('COM1')

Resulting initializations (MATLAB's display of the new object):

   Serial Port Object : Serial-COM1

   Communication Settings
      Port:               COM1
      BaudRate:           9600
      Terminator:         'LF'

   Communication State
      Status:             closed
      RecordStatus:       off

   Read/Write State
      TransferStatus:     idle
      BytesAvailable:     0
      ValuesReceived:     0
      ValuesSent:         0

b. Setting the Parameters:

get(serialPort, 'BaudRate')
ans = 9600
set(serialPort, 'BaudRate', 19200)
get(serialPort, 'BaudRate')
ans = 19200

The method described above is cumbersome if we have many parameters to change. A better way is to set them when creating the serial object:

serialPort_new = serial('COM1', 'BaudRate', 19200, 'Terminator', 'CR')

• Writing to the Serial Port
Before we can write to the serial port, we need to open it:

fopen(serialPort)

• Writing Binary Data
Use the command fwrite to send four bytes of binary data:

fwrite(serialPort, [0, 12, 117, 251]);

• Reading from the Serial Port
You can use fread to read in binary data (not text); it can automatically format the data for you. For example, if the buffer currently holds 2 bytes of data:

a = fread(serialPort, 2);   % reads two bytes and returns them as a vector

When the port is no longer needed, close it and remove the object:

fclose(serialPort);
delete(serialPort);
clear serialPort
Overview of the system:
1. Establish serial port communication with MATLAB
2. Acquire the results of user authentication
3. Display the results for the authenticated user
4. Display the speech recognition menu and accept the device ID uttered by the authenticated user
5. Send the identified device ID via the serial port to the PIC to toggle the current status of the device
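The final step, toggling the addressed device, can be sketched as bit manipulation on an 8-bit image of the output pins. This is a hypothetical Python model of the PIC-side logic: the wiring (device n on output bit n − 1) and the byte format (a raw value 1 to 8 or its ASCII digit) are assumptions, since the slides do not specify them.

```python
def decode_device_id(byte):
    """Map a received byte to a device ID 1..8.

    Hypothetical protocol: accepts either the raw value 1..8 or the ASCII
    digits '1'..'8' (0x31..0x38).
    """
    if 0x31 <= byte <= 0x38:
        return byte - 0x30
    if 1 <= byte <= 8:
        return byte
    raise ValueError("not a valid device id: %r" % byte)


def toggle_device(port_state, device_id):
    """Flip the output bit of device_id in an 8-bit port image.

    Assumes (hypothetically) that device n is wired to output bit n - 1.
    """
    if not 1 <= device_id <= 8:
        raise ValueError("device id must be 1..8")
    return (port_state ^ (1 << (device_id - 1))) & 0xFF


def apply_commands(received_bytes, port_state=0):
    """Process a stream of received bytes, toggling one device per byte."""
    for b in received_bytes:
        port_state = toggle_device(port_state, decode_device_id(b))
    return port_state
```

Using XOR means the same command both switches a device on and, when repeated, switches it off again, which matches "toggle the current status of the device".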
Registers Used in Asynchronous Mode
1. The SPBRG register is set up for the selected baud rate.
2. Asynchronous reception is enabled by clearing the SYNC bit in the TXSTA register and setting the SPEN bit in the RCSTA register.
3. To enable the receive interrupt, the RCIE, GIE and PEIE bits must be set.
4. Reception is enabled by setting the CREN bit in RCSTA.
5. When a character has been received, the RCIF flag bit in the PIR1 register is set.
6. The received data is retrieved by reading RCREG.
7. If an overrun error occurred, the CREN bit must be cleared (and then set again to resume reception).
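Step 1 requires choosing an SPBRG value. The sketch below assumes the baud-rate formulas of the standard mid-range PIC USART that provides these registers (Baud = Fosc / (64 (SPBRG + 1)) with BRGH = 0, and Baud = Fosc / (16 (SPBRG + 1)) with BRGH = 1); check them against the datasheet of the actual part before use. The function name `spbrg_value` and the 4 MHz clock in the example are illustrative assumptions.

```python
def spbrg_value(fosc_hz, baud, brgh):
    """Compute SPBRG for asynchronous mode (assumed mid-range PIC USART formulas).

    BRGH = 0: baud = Fosc / (64 * (SPBRG + 1))
    BRGH = 1: baud = Fosc / (16 * (SPBRG + 1))
    Returns (spbrg, actual_baud, error_percent).
    """
    divisor = 16 if brgh else 64
    spbrg = round(fosc_hz / (divisor * baud)) - 1  # nearest achievable divider
    if not 0 <= spbrg <= 255:
        raise ValueError("baud rate not reachable with an 8-bit SPBRG")
    actual = fosc_hz / (divisor * (spbrg + 1))
    error = (actual - baud) / baud * 100.0
    return spbrg, actual, error
```

At an assumed Fosc = 4 MHz and 9600 baud, BRGH = 1 gives SPBRG = 25 (actual rate about 9615 baud, +0.16 % error), while BRGH = 0 gives SPBRG = 6 with an error of about −7 %, which suggests using the high-speed setting at this clock. The same calculation for the 19200 baud used in the MATLAB example gives SPBRG = 12 with BRGH = 1.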
In this presentation, the aspects involved in speaker and speech recognition and the techniques used to achieve them have been discussed.
The acquisition of acoustic feature vectors, the matching of those vectors against the models in the database using vector quantization (VQ) with codebooks optimized by the LBG algorithm, and word identification using DTW have been covered.
Serial communication between MATLAB and the PIC over the serial port using the RS-232-C standard was also presented, followed by the process of granting the authenticated user access for device control.
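User (Speaker Id) | Speaker recognition: attempts | Speaker recognition: correctly recognized | Speech recognition: attempts | Speech recognition: correctly recognized | Accuracy % (speaker/speech)
1     | 10 | 8  | 10 | 9  | 80/90
2     | 10 | 9  | 10 | 8  | 90/80
3     | 10 | 8  | 10 | 9  | 80/90
4     | 10 | 9  | 10 | 9  | 90/90
Total | 40 | 34 | 40 | 35 | 85/87.5
Mean of the speaker and speech recognition accuracies: 86.25 %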
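The totals and percentages in the table can be checked with a few lines of Python; the counts below are copied from the table, and only the summation is added.

```python
# (correctly recognized, attempts) per speaker, copied from the table
speaker_results = {1: (8, 10), 2: (9, 10), 3: (8, 10), 4: (9, 10)}
speech_results = {1: (9, 10), 2: (8, 10), 3: (9, 10), 4: (9, 10)}


def accuracy(results):
    """Overall accuracy in percent: total correct / total attempts."""
    correct = sum(c for c, _ in results.values())
    attempts = sum(a for _, a in results.values())
    return 100.0 * correct / attempts


speaker_acc = accuracy(speaker_results)   # 34 / 40 -> 85.0
speech_acc = accuracy(speech_results)     # 35 / 40 -> 87.5
overall = (speaker_acc + speech_acc) / 2  # 86.25
```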
Training phase (speaker recognition)
1. Insert a class ID
2. Set the speech signal parameters: duration, fs, number of bits per sample
3. Acquire the speech signal via the microphone using the audiorecorder function
4. Feature extraction using mfcc(s, fs): frame blocking, Hamming window, mel-frequency filter bank
5. Feature matching using vqlbg(d, k)
6. VQ codebook
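The codebook step `vqlbg(d, k)` refers to the LBG algorithm. A minimal Python sketch of LBG by binary splitting is shown below, assuming the feature vectors are lists of floats, a splitting factor eps = 0.01, squared Euclidean distortion, and a codebook size k that is a power of two; it illustrates the technique and is not the project's MATLAB `vqlbg` function.

```python
import math


def _sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def _centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def lbg(vectors, k, eps=0.01, tol=1e-6, max_iter=100):
    """Train a k-codeword VQ codebook by LBG binary splitting.

    Each split replaces codeword y with y*(1+eps) and y*(1-eps); the codebook
    is then refined by nearest-neighbour assignment and centroid updates until
    the average distortion stops decreasing by more than a relative tol.
    """
    if k < 1 or k & (k - 1):
        raise ValueError("k must be a power of two")
    codebook = [_centroid(vectors)]
    while len(codebook) < k:
        codebook = ([[c * (1 + eps) for c in y] for y in codebook]
                    + [[c * (1 - eps) for c in y] for y in codebook])
        prev = math.inf
        for _ in range(max_iter):
            clusters = [[] for _ in codebook]
            total = 0.0
            for v in vectors:
                idx = min(range(len(codebook)), key=lambda m: _sq_dist(v, codebook[m]))
                clusters[idx].append(v)
                total += _sq_dist(v, codebook[idx])
            distortion = total / len(vectors)
            # Centroid update; an empty cell keeps its old codeword.
            codebook = [_centroid(cl) if cl else y for cl, y in zip(clusters, codebook)]
            if prev - distortion <= tol * distortion:
                break
            prev = distortion
    return codebook
```

For two well-separated clouds of 2-D points, around (1, 1) and (5, 5), `lbg(data, 2)` returns one codeword near the centre of each cloud.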
Testing phase (speaker recognition)
1. Set the speech signal parameters: duration, fs, number of bits per sample
2. Acquire the speech signal via the microphone using the audiorecorder function
3. Feature extraction using mfcc(s, fs): frame blocking, Hamming window, mel-frequency filter bank
4. Feature matching using vqlbg(d, k)
5. VQ codebook

Decision
VQ codebook from the training phase + VQ codebook from the testing phase → comparison of Euclidean distances → the user ID with the lowest Euclidean distance is authenticated
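The decision step can be sketched as follows, assuming the usual VQ distortion measure: each test vector is matched to its nearest codeword, the Euclidean distances are averaged, and the user with the smallest average distortion is selected. The slides compare the codebook from the testing phase with the training codebooks; passing the test codebook's codewords as `test_vectors` gives that variant. The function names are hypothetical.

```python
import math


def avg_distortion(test_vectors, codebook):
    """Mean Euclidean distance from each test vector to its nearest codeword."""
    total = 0.0
    for v in test_vectors:
        total += min(math.dist(v, c) for c in codebook)
    return total / len(test_vectors)


def identify_speaker(test_vectors, codebooks):
    """Return the user ID whose training codebook gives the lowest distortion."""
    return min(codebooks, key=lambda uid: avg_distortion(test_vectors, codebooks[uid]))
```

In a real system a threshold on the winning distortion would typically also be needed to reject unknown speakers; the slides only describe choosing the lowest distance.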
Speech recognition (device control)
1. Creation of reference templates: a path is provided to a separate folder containing all the words to be recognized
2. Feature extraction
3. Calculation of the lowest total cost
4. Comparison of local distances with all the stored words
5. Selection of the optimal path
6. The recognized word is sent to the COM port
7. The signal (device ID) is received by the PIC and the corresponding device is toggled
• The proposed system could be improved considerably by implementing more efficient models for speaker identification, such as Hidden Markov Models (HMMs). These use statistical theory to model sequences of feature vectors as Markov chains, whose transition matrices store the probabilities of state transitions.
• Along with speaker recognition, an additional level of voice-based biometric security could be provided using speech recognition: after verifying who the user is, the system would ask for a specific keyword unique to the system. Integrating mobile-phone-based system access would also allow the system to be controlled from almost anywhere in the world.
• The fuzzy c-means clustering technique improves VQ performance at the classification stage. The performance of fuzzy VQ (FVQ) can be improved further using the fuzzy-based hierarchical clustering approach proposed by Haipeng.
• The Gaussian mixture model (GMM) performs better than the other classifiers, even though FVQ improves ASR performance significantly compared with the other VQ techniques. Additional work on enhanced or alternative fuzzy clustering techniques is appropriate.