
EMG Vocalizer

Capstone Design: Final Report

Matthew Banks, Jennifer Padgett, Sophie Tsalkhelishvi, Kristin Weidmann

5/2/2012

Advisor: Professor Rose


Abstract

The initial goal of this project was to design a system that converts subvocal signals into

speech. Subvocal signals are captured by taking EMGs at the throat and mouth, since when people "think out loud" their vocal cords vibrate ever so slightly. With enough low-noise amplification these signals can be observed and processed so that a person's thoughts can be

interpreted. However, this initial objective was not achievable.

The goal of this project changed to recognition of the motions made when a person

speaks. The movements at the mouth were captured with an EMG, and six words were used as

part of a recognition system. The words, when recognized, were converted back into speech by a

vocal synthesizer.

Figure 1: The Vocalizer System

Introduction

In many situations, high noise makes communication over microphones difficult, which can result in vital time spent repeating information. Under certain circumstances, this time could mean the difference between life and death. The EMG Vocalizer, however, will allow

communication to continue without environmental noise affecting the signal, therefore reducing

the need to repeat information. A microphone system supplemented with the EMG Vocalizer

would allow for seamless communication.

The beauty of this system is that it has the ability to recognize any key words that you

train it with. For tests, the words alpha, omega, left, right, forward, and reverse were used.

Given the ability to program for any six facial movements, the system can be trained with useful

words for a given situation. For example, if one were to use this system for robotic control, the words stop, go, up, down, left, and right might be used to train the system.


Since the system relies on movement instead of verbal communication, the system could

also be trained to recognize facial expressions or any bodily gestures with different electrode

placements. The Vocalizer could be used to help mute people communicate with those who do

not know sign language. It could also be used to detect eye movement and facial twitches to aid

in lie detection. The system has many potential applications. However, in this report the

application of more reliable audio communication is explored.

Hardware

Early stages:

Initially, the plan for the project was to use the Emotiv EPOC EEG headset to acquire the subvocal signals at the neck/jaw. The plan was to redesign the headset for use as an EMG, filter the signal using analog hardware, and then send it to the computer wirelessly using a USB transmitter/receiver. The EPOC headset was received and tested as an EEG; it worked well at the head, but unfortunately it was not able to acquire significant data at the throat. This was because the software that was provided with the headset

was created to use signals coming from specific points on the head. In order to get around this

problem, the drivers for the headset would need to be re-written so that the raw signal data could

be sent directly to MATLAB. This proved to be a much more difficult task than expected, and

would consume a large amount of time. Since it was still unclear how well the EPOC headset

would work as a sensor for EMG signals, we concluded that it would not be wise to risk time

writing drivers for a system which was not guaranteed to work.

Introduction of ECG/EMG:

After ruling out the Emotiv headset, a new plan was created to use an ECG

machine/circuit to gather EMG signals. Since an ECG machine can be expensive and difficult to

acquire, one was made. A basic ECG circuit was implemented following a schematic similar to the one in [10]; the schematic of the actual implementation can be seen below.


Figure 2: Schematic used, following a design similar to [10]

The circuit was constructed using the following parts:

2 x LM324AN Quad op-amp chips

6 x 10k resistors

12 x 100k resistors

1 x 1µF capacitor

1 x 0.1µF capacitor

6 x diodes

3 x alligator clips

1 x 1MΩ resistor


Figure 3: Constructed circuit following the schematic shown above (diodes used to protect the circuit)

The parts listed were soldered onto a breadboard, and the alligator clips were used as

electrode leads. Unfortunately, this circuit was very sensitive to noise, and thus not able to

acquire very usable signals. Even a large EMG signal, like from a flexing bicep, was hard to

distinguish from noise. This noise was due to the lengthy wires soldered onto the circuit, along

with op-amps that have a low CMRR (Common-Mode Rejection Ratio) of about 85dB.

This circuit was then reconstructed using a solderless breadboard, so that interchanging

components would be easier without having to re-build the whole circuit. Furthermore, using a

solderless breadboard eliminated most of the lengthy wires, which reduced the noise drastically.

Op-amps with a higher CMRR of about 100dB (LF353Ns) were also used in the new circuit to

reduce noise. Additionally, the diodes were removed to simplify the circuit since there is no

voltage large enough to damage the circuit coming from the body. For this circuit, the DC supply was recommended to be about +5V and ground. In the design following [10], there is an offset circuit set at half of the supply voltage, so in this case about 2.5V. This section of the circuit made 2.5V the new ground reference for the circuit, which makes it easier to supply the circuit with only 5V and not require a ±5V dual supply. Unfortunately, this generated many restrictions, as a separate DC supply would be required if a subsequent gain stage was to be added to the circuit. This is because with another gain stage, the new op-amp would have 0V as the ground reference,


which caused the signal to hit the supply rail when a gain stage was added. Although the circuit was able to get a better EMG signal at the bicep than the first circuit did, the signal-to-noise ratio at the throat was about 1-to-1, so it essentially just picked up noise. Since getting a separate DC supply just for the gain stages would be excessive and impractical, and since the SNR needed to be improved, several changes to the circuit needed to be made.

A circuit called a right-leg driver was added to the circuit, and the rest of the circuit was simplified so that the signal from the two electrodes could be fed into a differential amplifier, with the right-leg driver creating a floating ground, before the signal was amplified. What this essentially left was an instrumentation amplifier circuit with a right-leg driver circuit.

driver circuit. According to [2], a right-leg driver maintains a known potential on the body, with

reference to the circuit ground. This reduces the common-mode DC offset of the circuit, and

cancels out any deviations on any of the circuit's channels. The schematic of the circuit used, created using Multisim software, can be seen below.


Figure 4: Schematic of EMG Circuit with Right-leg Driver using Multisim (with external gain resistance of 44Ω)

This circuit was created using the following parts:

2 x LF353Ns

1 x INA129P

2 x 10kΩ resistor

1 x 1MΩ resistor

2 x 1nF capacitors

2 x 22Ω resistors

Shielded wire (For the leads of Electrode 1 & 2)


Figure 5: Results of simulation

Looking at the simulation results, it can be seen that the difference between the two inputs (Electrode 1 and Electrode 2) was about 0.3mV; the output of the circuit is around 1000× that input, with an amplitude of 336.1mV peak-to-peak.


Figure 6: The Constructed Circuit of the Schematic in Fig. 4

(Additional 0.1µF Capacitors used to reduce supply noise)

The actual instrumentation amplifier used was an INA129P, and the right-leg driver circuit was implemented using the LF353Ns. In the schematic above, an AD620AN instrumentation amplifier is used for simulation, but its characteristics are very similar (almost identical) to the INA129P's. The INA129P was used for its very high CMRR of 120dB and a potential gain of 10,000, which can be adjusted using a single external resistor (or, in the current circuit design, the sum of two resistors in series). Furthermore, in the right-leg driver, 1nF capacitors were added to the feedback loop of the op-amp whose output goes to the body, along with one 1MΩ resistor at the output. The 1MΩ resistor is in parallel with one of the 1nF capacitors at the output; the output of this network goes to the body and into negative feedback with the other 1nF capacitor. This resistance at the output of the op-amp prevents oscillation of the signal.

With the new design, the supply can be up to ±15V, which means that hitting the supply rail will no longer be a problem. Furthermore, with a potential gain of 10,000, subsequent gain stages may not even be necessary, reducing the number of components in the circuit. Another significant change made to the circuit was the use of shielded wire for the lead wires, which eliminates most of the noise being picked up by the wires themselves. With these changes, the

gain can be selected using the following equation given in the data sheet for the INA129 [8]:

G = 1 + (49.4kΩ / RG)

where RG is the external resistor chosen. In this case, RG was first chosen to be around 50Ω so that the gain would be around 1000. Using the resistors that were available, two 22Ω resistors were used together for 44Ω. This gave a gain of about 1123, which is even better. This was not quite enough gain to get a noticeable subvocal signal, though, at least without filtering out the noise. It did, however, pick up the EMG signal when a word was mouthed. The waveform below is an example of the word "alpha" being mouthed.

Figure 7: Example of EMG signal at throat generated by mouthing the word alpha

The raw data waveform received from just thinking about the word "alpha" showed that the current hardware is not quite sensitive enough to pick up a signal of that small a magnitude. It is possible that the signal can be picked up, but the SNR is currently not high enough to see it, and the equipment and resources required to attain such a signal could not be obtained while maintaining a reasonable budget and staying within the time constraints of this project. This resulted in a change in the objective for the project. The new goal was to take the EMG signal from saying or mouthing a word, amplify it while maintaining a high SNR, send it to the computer to be processed, and then use a vocal synthesizer to output the same word.


With this new goal, the circuit at the time was adequate for completing the objective, but it needed improvements. The interface from the circuit to the computer was to output the signal to an oscilloscope, and from there to the computer. This is very impractical, so the plan was to make the interface wireless. Filters needed to be added to the circuit to notch out 60Hz, in order to remove the noise that resides on the body's surface, and to attenuate signals above 100Hz, both to eliminate unwanted higher frequencies from the signal and to prevent aliasing. It was also planned for the board to be condensed and then printed as a PCB (Printed Circuit Board). Finally, the DC supply may be changed over to batteries in order to reduce supply noise and to increase the portability of the circuit.

Analog Filters:

Two filters are being used to reduce the noise, and to prevent aliasing in the signal. A

Twin-T notch filter is being used to notch out the 60Hz noise that is present on the body, and a

5th-Order Butterworth Low-Pass Filter is being used to attenuate signals above 100Hz. These

filters were necessary to increase the SNR of the circuit before the signal is sent to the oscilloscope, and then to the computer. If the interface changes completely to a wireless hardware/computer interface, these analog filters will no longer be needed, as noise will be placed back onto the signal during transmission and will have to be filtered out using digital filters. Currently, the filters have been built and are in use, since the wireless interface is not complete.

Twin-T Notch Filter:

The Twin-T notch filter was chosen because it is capable of very high attenuation of a

signal, and a very small notch width, making it much more accurate for elimination of a small

range of frequencies than a standard notch filter. To construct this filter, the method described in [6] was used, where the transfer function of the Twin-T notch filter is

H(s) = (s² + ω0²) / (s² + B·s + ω0²)

where ω0 = 1/(RC) is the notch frequency and B is the notch bandwidth in rad/s.

Choosing C = 0.1µF and f0 = 60Hz → ω0 = 2π(60) ≈ 377 rad/s, and a notch band of about 20Hz → B = 2π(20) ≈ 125.66 rad/s:

R = 1/(ω0·C) = 26.525kΩ, and using commercial resistor values R = 27kΩ

R/2 = 13.26kΩ → 13kΩ (using commercial resistor values)

With these values, ω0 = 1/(RC) = 370.37 rad/s (58.9Hz), so H(s) = (s² + ω0²)/(s² + 125.66·s + ω0²).

Below is a schematic of the notch filter with the appropriate component values.

Figure 8: Schematic of Twin-T Notch Filter

In addition to the passive components (resistors and capacitors), two LM741 op-amps were used to make this filter. These were chosen because they are cheap and readily available in the lab, and for the purpose of building filters the LM741 is an acceptable op-amp: it can work from a ±15V supply while consuming only about 50-85mW.

This was then tested and simulated using the following Matlab code:

%60Hz Twin-T notch filter simulation
w0=370.37; %rad/s 58.9Hz
B=125.66; %rad/s 19.99Hz
Hs=tf([1 0 w0^2],[1 B w0^2]);
bode(Hs)

This yielded the following Bode plot:

Figure 9: Bode Plot of 60Hz Twin-T Notch Filter


Looking at the simulation, it can be seen that the signal at 370 rad/s (58.9Hz) has a magnitude of -285dB, which means that frequency is strongly attenuated. Furthermore, the notching starts to occur at about 304 rad/s and stops at about 452 rad/s, which is about a 23Hz notch band. Therefore, this design satisfies both of the specifications needed for the notch filter.

5th Order Butterworth Low-Pass Filter:

The 5th-order Butterworth low-pass filter was chosen because a filter with a fairly sharp slope (small transition band) is ideal, as this reduces noise above 100Hz at a much faster rate. According to [6], increasing the order of the filter to n causes the transfer function of the filter to have n poles, and the final slope of the transition band will be -20n dB/dec. Unfortunately, a higher order requires more components, and therefore takes up more space on the circuit board and would increase production costs (especially if this circuit were to be mass-produced). A compromise had to be made, and since filtering had initially been done digitally using a 5th-order Butterworth low-pass filter in Matlab, which worked well, this was the type of filter that was built.

The method used to create this filter was to take the fifth-order Butterworth polynomial from [6], which is

B5(s) = (s + 1)(s² + 0.618s + 1)(s² + 1.618s + 1)

What this polynomial actually means is that the 5th-order filter will be created using a first-order filter cascaded with two second-order filters. A prototype of the filter was then created using the coefficients in this polynomial and setting R = 1Ω and ωc = 1 rad/s, so that the component values are normalized.

For the first-order filter, H(s) = (1/(R·Ca)) / (s + 1/(R·Ca)), and using 1/(R·Ca) = 1 rad/s gives Ca = 1F.

For the two second-order filters, H(s) = (1/(R²C1C2)) / (s² + (2/(R·C1))s + 1/(R²C1C2)), where for one of the second-order filters (say B) 2/C1 = 0.618 and 1/(C1C2) = 1, and for the other (say C) 2/C3 = 1.618 and 1/(C3C4) = 1, so

for filter B: C1 = 3.236F and C2 = 0.309F

for filter C: C3 = 1.236F and C4 = 0.809F

Scaling:

Now that the prototype is designed, the components can be scaled to make the filter a 100Hz low-pass filter. Since the cutoff frequency will be fc = 100Hz → ωc = 2π(100) ≈ 628.3 rad/s, every capacitor is divided by the product of the frequency scale factor (628.3) and the impedance scale factor. Since it is ideal to keep the capacitors small, R was chosen to be 10kΩ, so the impedance scale factor is 10⁴.

Following the equation Cnew = Cprototype/(628.3 × 10⁴), the ideal values are Ca ≈ 159nF, C1 ≈ 515nF, C2 ≈ 49nF, C3 ≈ 197nF, and C4 ≈ 129nF; using values that were available in the labs, these values became Ca = 147nF, C1 = 533nF, C2 = 47nF, C3 = 183nF, and C4 = 122nF.

Figure 10: Schematic of 5th-Order Butterworth Low-Pass Filter

For this filter, three LM741 op-amps were used in addition to the passive components described above.

This circuit was then tested and simulated in Matlab using the following code:

%Fifth-order Butterworth Low-Pass Filter fc=100Hz
R=10000;
Ca=147*10^(-9);
C1=533*10^(-9);
C2=47*10^(-9);
C3=183*10^(-9);
C4=122*10^(-9);
Hs=tf([1/(R*Ca)],[1 1/(R*Ca)]);
Hs1=tf([1/(R*R*C1*C2)],[1 2/(R*C1) 1/(R*R*C1*C2)]);
Hs2=tf([1/(R*R*C3*C4)],[1 2/(R*C3) 1/(R*R*C3*C4)]);
bode(Hs*Hs1*Hs2)


This yielded the following Bode plot:

Figure 11: Bode Plot of 5th-Order Butterworth Low-Pass Filter

Looking at the resulting simulation, it can be seen that at about 617 rad/s (98.2Hz) the magnitude is about -1.5 dB, and at about 6010 rad/s (about 956.5Hz) the magnitude is about -96.2 dB. This corresponds well to the design specifications of a low-pass filter with a cutoff frequency of 100Hz and the slope of a 5th-order filter (-100dB/dec).


These two filters were then constructed using real hardware components, as seen below.

Figure 12: Final Implementation of the EMG circuit

Printing the Circuit Board:

Unfortunately, due to time constraints the circuit could not be sent to a board house and printed, as this process takes about 2-3 weeks. The schematic and the board layout were, however, constructed using Eagle software and can be seen below.


Figure 13: The schematic of the EMG with the Right-Leg Driver, the 100Hz Butterworth LPF,

and the 60Hz Twin-T Notch Filter


Figure 14: Board Layout of the EMG Circuit, RLD, LPF, and Notch Filter

Data and Hardware-Software Interface

Electrodes:

After experimentation and research, it is evident that the type of electrode, its placement, and how the skin surface is prepared can make all the difference between getting a very good signal and a useless one. Initially, pennies were used, since copper is a good conductor, but most pennies are covered with impurities and a layer of oxide, which reduces the quality of the signal and increases the impedance of the skin/electrode interface, the opposite of what is needed. After this, many other types of conductors were tested, from cans to aluminum foil, but none gave a very clean signal. Finally, ECG electrodes were acquired


and were found to give much better results. Following [1], proper preparation of the skin and electrodes was taken into consideration, as this can make or break signal quality, especially at the level of EMG vocal signals. Some of these preparations include washing the surface area with water, using isotonic gel, and attaching the electrodes 5-10 minutes before testing. This is imperative because, without letting the isotonic gel set in and the electrode make its best contact, it is easy to pick up no signal at all. During one experiment, a group member was hooked up to the circuit and no signal was shown on the oscilloscope. About eight minutes after the electrodes were attached to the subject, a signal became visible and the circuit worked as expected.

The most commonly used electrodes in the development of the system were AgCl ECG

electrodes. While these electrodes are very conductive and help acquire clear signals, they do

not stay in place for more than thirty minutes to one hour. To counter electrode slip, medical

tape was used on top of the electrodes, but this did not extend the amount of time the electrodes

were stationary by much. When the electrodes slip, the signals acquired change from what they

were when the electrodes were freshly applied. This means that the system needs to be

recalibrated and the recognition system retrained every hour or so, to ensure reliable recognition

and reliable signals. In a commercial implementation of this system, the problem could be fixed by either using subcutaneous electrodes or electrodes with better adhesive.

In the initial stages of the project the team had to experiment to find the best electrode

placement on the face. Many different placements were tried. For example, one electrode on the

side of the lips and one on the upper lip was tried. The picture below shows many of the

different pairings and placements that were tried.

Figure 15: Various Electrode Placements


The team found the best placement to gather information about the words a person is

speaking was to have one electrode at the chin, just below the lips, and another on the cheek,

about one inch forward from the ear. This produced data about lip, cheek, and jaw movement.

Ground was placed at the base of the throat. This ground placement helped eliminate artifacts from

heartbeats and swallowing. The placement can be seen on an individual below:

Figure 16: Electrode placement example.

Hardware/Computer Interface:

Initially, the data was imported into Matlab for processing through the audio jack of a laptop. Even though this approach worked at first, the team found that the data was noisy and switched to a different approach. Additionally, the Matlab audio recorder behaved differently on

several computers used to collect data. Now data is gathered by connecting the EMG circuit to a

Tektronix oscilloscope so the waveform can be viewed, and interfacing the oscilloscope with

Matlab, using tools from the Instrument Control Toolbox. This approach is less noisy and more

reliable. It also has the advantage of allowing the team to see the data as it is being gathered,

which is great for analytical purposes, but this requires bringing an oscilloscope around wherever

the device is being used, not to mention an outlet must be nearby to plug the oscilloscope in.

The code written to support data gathering was constructed in a way that allowed the observers to see the data and selectively capture what was on the screen by hitting "enter". This

made it easy to gather good training data since any noise from coughing or smiling could be seen

on the oscilloscope before the waveform was entered into the training set. The ability to be

selective about data used increased the success of recognition systems later discussed.
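A minimal sketch of this capture loop is shown below, assuming a VISA-connected Tektronix scope and the Instrument Control Toolbox; the resource string, channel, and transfer settings are illustrative placeholders, not the exact code used:

%Sketch of the selective capture loop (resource string is hypothetical)
scope = visa('ni','USB0::0x0699::0x0363::C000000::INSTR');
scope.InputBufferSize = 100000;
fopen(scope);
fprintf(scope,'DATA:SOURCE CH1'); %channel to transfer (assumed)
fprintf(scope,'DATA:ENCDG ASCII'); %ASCII transfer for simplicity
training = {};
while true
    s = input('Press enter to capture this screen (q to quit): ','s');
    if strcmpi(s,'q'), break; end
    fprintf(scope,'CURVE?'); %request the on-screen waveform
    training{end+1} = str2num(fscanf(scope)); %comma-separated samples
end
fclose(scope);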


Currently, a wireless interface is being constructed using an Arduino Uno Rev3 and an XBee transmitter, along with the shield that allows the XBee to attach to the Arduino. This method was chosen because, in order to send the signal from the hardware to the computer wirelessly, the signal must first be converted from analog to digital. The Arduino has a built-in ADC, which takes an analog signal between 0 and 5V and discretizes it. The voltage range from 0 to 5V is quantized into 1024 "bins" that each sample can be sorted into. Unfortunately, if a signal only changes by millivolts, most of this range goes unused and the effective resolution drops. Furthermore, if the signal has any negative components, the Arduino will not pick them up. To fix this problem, a DC bias of about 2.5V was added to the signal and then removed during processing, and the signal is amplified a little more as well. This ensures that the signal stays primarily within the 0 to 5V range and uses as many of the 1024 "bins" as possible, increasing the resolution.

In order to create this DC bias, a voltage divider had to be implemented. A second EMG circuit was created as in Fig. 6, since analog filters would not be necessary for this interface, and the voltage divider was then constructed on the breadboard. Assuming that a 9V battery will

eventually run the hardware, a 10kΩ resistor is placed in series with two 6.8kΩ resistors, and the potential across all three resistors is 9V. The voltage between the 10kΩ and the first

6.8KΩ resistor was about 5.186V, which can be used to power the Arduino. The voltage

between the two 6.8KΩ resistors was about 2.593V, which will be the DC offset added to the

output signal of the circuit.
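On the processing side, undoing the bias is simple; below is a minimal sketch, assuming the Arduino's 10-bit readings arrive in Matlab as integer counts from 0 to 1023:

%Sketch: convert 10-bit ADC counts back to volts and remove the bias
v = double(counts)*(5/1023); %map counts onto the 0-5V range
v = v - 2.593; %subtract the measured divider offset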

Unfortunately, after spending a day on just programming the XBee transmitter, it seems there was a compatibility problem between the hardware and the operating system of the laptop being used. The program used to configure the XBee antenna was X-CTU, and it did not seem to communicate with or recognize the XBee properly under Windows Vista. Given the time constraints, it did not seem efficient to spend more time or resources getting a new laptop or installing Windows 7 onto the current laptop just to program the antenna; doing so would not even ensure that the output of the circuit would be transmitted properly or with usable resolution. So, even though the XBee wireless interface could be a good way to communicate from the circuit to the computer given more time, the old interface with the oscilloscope will suffice for now.

Pre-Processing:

Even though the oscilloscope-to-computer interface was much better than the initial audio-jack-to-computer interface, some residual noise was still present. While the circuit had a 60Hz notch filter and a low-pass filter, the signal acquired by the computer still contained noise at 60Hz and above. The noise from the interface was therefore filtered out in software. Before being used for word recognition, a low-pass filter at 25Hz was applied to each signal, since it was determined that the signals contained no useful information for recognition above this frequency.
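A minimal sketch of this software filtering stage, assuming the Signal Processing Toolbox and a known sampling rate fs for the captured signal x:

%Sketch: 5th-order digital Butterworth low-pass at 25Hz
fc = 25; %cutoff frequency in Hz
[b,a] = butter(5,fc/(fs/2)); %low-pass by default
xFilt = filtfilt(b,a,x); %zero-phase filtering avoids added delay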


Figure 17: Filtered and Unfiltered “Left”


Figure 18: Filtered and Unfiltered “Right”

Recognition Algorithms

Word Recognition:

Initially, a simple distance based decision rule was used to detect words. Training sets

consisted of ten examples of each word. The ten examples were each centered based on peak

energy and normalized, then the ten samples were averaged together to create an approximation

of an ideal word. The six “ideal” words were then used to create six orthonormal basis

functions, using the Gram-Schmidt orthonormalization procedure. The original “ideal” signals

were then projected onto the basis, and those projections were used as ideal points in the signal

space with which to compare new signals. This approach, however, only yielded about 50% recognition, and only when the signals were very distinct.
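For illustration, a minimal sketch of this construction, assuming S is a matrix whose six columns are the centered, normalized "ideal" words and x is a new waveform of the same length:

%Sketch: Gram-Schmidt basis, projections, and nearest-ideal decision
B = zeros(size(S));
for k = 1:6
    v = S(:,k);
    for j = 1:k-1
        v = v - (B(:,j)'*v)*B(:,j); %remove components along earlier bases
    end
    B(:,k) = v/norm(v); %normalize to get an orthonormal basis vector
end
idealPts = B'*S; %ideal points in the 6-D signal space
p = B'*x; %projection of the incoming waveform
[~,word] = min(sum((idealPts - repmat(p,1,6)).^2)); %closest ideal point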


Figure 19: Example of "ideal" signals

Figure 20: Example of basis functions generated by Gram-Schmidt, corresponding to Figure 19


Figure 21: Another example of “ideal” signals

Figure 22: Example of basis functions generated by Gram-Schmidt, corresponding to Figure 21


Next, detection based on correlation with the "ideal" signals was tested. The parameter used was the value at zero lag (where the signals best match). Using this approach, better recognition was achieved. Then the correlation and distance decision rules were combined, and this approach yielded up to eighty-five percent accuracy in experiments. It should be noted that the system is only this accurate for about thirty minutes after the electrodes are initially applied to the skin, since after that amount of time the electrodes start to move on the face and the signals acquired differ from the signals in the original training set.
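A minimal sketch of the zero-lag correlation rule, assuming ideals is a matrix whose six columns are the "ideal" words and x is the centered, normalized incoming waveform of the same length:

%Sketch: pick the word whose ideal correlates best with x at zero lag
c = zeros(1,6);
for k = 1:6
    r = xcorr(x,ideals(:,k));
    c(k) = r((length(r)+1)/2); %cross-correlation value at zero lag
end
[~,word] = max(c); %the most correlated ideal wins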

In addition to comparing new waveforms with the "ideal" waveforms acquired in the training set, a detection scheme was tried in which the incoming waveform was compared to every waveform in the training set. The incoming signal was correlated with each of the

known 60 signals. Whichever set of signals had the highest aggregate correlation was

considered to be the set to which the incoming waveform belonged. While it was initially

thought that this approach might perform better than comparison with the six “ideal” signals, it

was surprisingly not an improvement. In fact, the number of correct assignments decreased

when compared to the number of correct assignments when the ideal signals were used for

comparison.

The initial use of the correlation and distance based decision rules was not as successful

as the later uses. The initial success rate was only about fifty to sixty percent. Therefore,

alternative approaches were explored, and a neural network was created to recognize the

waveforms.

A feed-forward network with back-propagation was created to recognize waveforms. Even though, according to the universal approximation theorem, only one hidden layer is needed to approximate any function, the initial networks had several hidden layers and several

nodes in each layer. Many of these networks had anywhere between two and five hidden layers

with ten to forty nodes in each layer. The initial biases were set to zero and the weights to one.

The transfer function used for all nodes was a hyperbolic tangent sigmoid. The network with the

best accuracy turned out to be one with forty-two nodes in its only hidden layer. That network

achieved up to 70% accuracy. However, the networks had the same problem with electrode slip

as did the distance and correlation based detection.

A back propagation algorithm was used to adjust the weights of the network during

training to achieve the desired network output. While experimenting with the networks, data sets

from the same individual taken in one sitting, data sets from the same individual taken during

multiple sittings, and data sets from other individuals were used to train the network. When

training with sets from multiple individuals the network could not reach its mean squared error

performance goals (which were loosely set). Therefore, later networks were trained using only

data from one individual. However, this had the drawback of creating the need to personalize

networks and train for each use.
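A minimal sketch of such a network using Matlab's Neural Network Toolbox is shown below, assuming X holds one feature vector per column and T holds one-hot targets for the six words (the variable names are illustrative):

%Sketch: single-hidden-layer feed-forward net trained by back-propagation
net = feedforwardnet(42); %42 nodes in the single hidden layer
net.layers{1}.transferFcn = 'tansig'; %hyperbolic tangent sigmoid nodes
net.trainParam.goal = 0.1; %loose mean-squared-error goal
net = train(net,X,T); %back-propagation training
[~,word] = max(net(xNew)); %recognized word for a new feature vector xNew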


The features fed into the network took some time to develop. First, averaged frequencies

over time were used as features for the network. However, it was quickly determined that

frequency didn’t make much sense as a feature since the signals being detected are at most 25Hz.

Next, the average energy in an 80ms block was used as a feature. This was dramatically more

effective than the frequency features, but only yielded a result of up to fifty percent recognition.

The next features tried were linear predictive coding coefficients. These were added to the

feature vector in addition to the energy features. This did not improve the recognition

noticeably. Polynomial-fit coefficients were also tried as features, but the coefficients for each signal were so similar that they did not make a good feature.

Figure 23: Examples of Energy Feature Vectors
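A minimal sketch of the block-energy feature computation, assuming the waveform x and its sampling rate fs are known:

%Sketch: average energy in consecutive 80ms blocks as a feature vector
blk = round(0.080*fs); %samples per 80ms block
nBlk = floor(length(x)/blk);
feat = zeros(1,nBlk);
for k = 1:nBlk
    seg = x((k-1)*blk+1:k*blk);
    feat(k) = mean(seg.^2); %average energy of the block
end
feat = feat/max(feat); %normalize, as in Figure 23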

One of the main challenges when working with the neural network was determining how

strict the training goal should be. If too small an error was set as the goal, the network would not be able to generalize when presented with different data. If, however, the error goal was too large,

then the network would not perform well, even for known data. The best results were obtained

using a mean square error goal of 0.1 for the normalized signals.

An interesting exercise was done in which a person was asked to group signals by how

alike they were. The subject was given thirty waveforms (five instances of each word) and told

that there were six words but was not told how many were in each group. The same experiment

was then run through the neural network. The outcome of the experiment is shown below:


Figure 24: Outcome of human and computer grouping experiment

The experiment suggests two things: the signals are not clearly distinguishable to either humans or the computer, and the feature vectors being used are still not a good match for the system. Therefore, after observing the later success of the correlation-and-distance approach, the features used for the neural network were changed to correlations with the ideal waveforms. Surprisingly,

adding the distances in signal space as features degraded the performance of the network to

about 60%. The scheme with only correlation values at zero lag produced 70% correct

recognitions, a great improvement over the earlier features used with the network.

Recognition Scheme                                         Best Detection Achieved
Combined Distance and Correlation                          85%
Distance                                                   70%
Correlation                                                80%
Neural Network with Frequency Features                     30%
Neural Network with Energy Features                        50%
Neural Network with Energy and LPC Coefficient Features    50%
Neural Network with Correlation Features                   70%
Neural Network with Correlation and Distance Features      60%

The main outcome of all recognition testing is that so far, simple correlation and signal

space distances work best for this system. While neural networks do better than randomly


guessing among six words, correlation and distance still work better. However, for a large number of signals the distance-and-correlation technique takes a long time to compute, so given many more words for the system to recognize, a neural network might be the best choice for word recognition, since computing its output is much quicker.

Text-to-Speech Voice Synthesizer

A voice synthesizer artificially produces human speech. We are implementing a voice synthesizer in software that translates text to speech. The goal of text-to-speech synthesis is to convert text into an acoustic signal that is indistinguishable from human speech, transmitting information from the machine as spoken words. The synthesizer first converts the input text into its equivalent words; once that is accomplished, it converts those words into sound. The system diagram of the text-to-speech voice synthesizer is as follows:

Figure 25: Diagram of Text to Speech Voice Synthesizer System [11]

There are many different ways to implement a voice synthesizer: concatenative synthesis, formant synthesis, articulatory synthesis, HMM-based synthesis, and sinewave synthesis. Concatenative synthesis is based on stringing together segments of recorded speech, usually short sequences of phonemes. Generally, concatenative synthesis produces the most


natural-sounding synthesized speech and is used in most high-quality voice synthesizers today. However,

differences between natural variations in speech and the nature of the automated techniques for

segmenting the waveforms sometimes result in audible glitches in the output.

Articulatory synthesis refers to computational techniques for synthesizing speech based on models of the human vocal tract and the articulation processes occurring there. HMM-based

synthesis is a synthesis method based on Hidden Markov Models, also called Statistical

Parametric Synthesis. In this system, the frequency spectrum (vocal tract), fundamental

frequency (vocal source), and duration (prosody) of speech are modeled simultaneously by

HMMs. Speech waveforms are generated from the HMMs themselves based on the maximum

likelihood criterion.

Sinewave synthesis is a technique for synthesizing speech by replacing the formants (the main bands of energy) with pure-tone whistles. Sinewave synthesis does not use human speech samples at runtime. Instead, the synthesized speech output is created using additive synthesis and an acoustic model. Parameters such as fundamental frequency, voicing, and noise levels are varied over time to create an artificial speech waveform.

There are three main subtypes of concatenative synthesis: unit selection synthesis, diphone synthesis, and domain-specific synthesis. Unit selection synthesis requires enormous databases to build models of speech sounds that can readily be concatenated into a decent-sounding utterance with few flaws at the boundaries between speech sounds. Recorded utterances are divided into individual phones, diphones, syllables, words, and/or sentences, often with the help of visual representations such as waveforms. Databases are created based on the division of the segments and parameters like pitch, duration, and position of the phonemes.

Diphone synthesis does not use enormous databases; it uses a minimal speech database containing all of the sound-to-sound transitions (diphones). Diphone synthesis suffers from the sonic glitches of concatenative synthesis. Domain-specific synthesis concatenates pre-recorded words and phrases to create a complete utterance.

For our project we are using concatenative synthesis, specifically unit selection synthesis. A large amount of data was stored in a database, organized around the words to be spoken. The group recorded the sound of each phoneme of each word and saved it in an audio file format; we then collected the speech waveform signals and concatenated the individual segments to construct a new utterance.

We are synthesizing six words: alpha, omega, left, right, forward, and reverse. Each word is recorded by creating an audio recorder object with a sampling rate of 44100Hz, 16 bits, and 2 channels. The speech is sampled with the microphone over samples 1 to 2000, and the microphone test signal is formulated as sin((2π·500·t)/Fs). Each phoneme is recorded separately and saved into the database. We then plot the waveforms of each segment, crop them to the desired frequency ranges, and plot the new waveforms. The cropped sounds are then


concatenated and written as one single sound. The following paragraphs present all six words with the waveform plots, the cropped plots, and the concatenated sound plots.
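A minimal sketch of this record-crop-concatenate flow for one word is shown below; the file name and crop indices are illustrative, not the exact values used:

%Sketch: record phonemes, crop, concatenate, and save one word
rec = audiorecorder(44100,16,2); %44.1kHz, 16-bit, 2-channel recorder
recordblocking(rec,1); %record the "ae" phoneme for one second
ae = getaudiodata(rec); %pull the samples into the workspace
%...repeat for the l, f, and ah phonemes, then crop each segment:
ae = ae(2000:20000,:); %hypothetical crop of the useful region
word = [ae; l; f; ah]; %concatenate the segments into "alpha"
wavwrite(word,44100,'alpha.wav'); %save the synthesized word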

Alpha is split into four phonemes: ae/l/f/ah. Each phoneme is recorded separately and the

waveforms are plotted:

Figure 26: Alpha (ae-l-f-ah) waveform.

After this, we crop each phoneme waveform to the desired frequency ranges:


Figure 27: Cropped Alpha (ae-l-f-ah) waveform.

The concatenated plot of Alpha looks as follows:

Figure 28: Concatenated Alpha (ae-l-f-ah) waveform.


Omega is split into five phonemes: ow/m/eh/g/ah. Each phoneme is recorded separately

and the waveforms are plotted:

Figure 29: Omega (ow-m-eh-g-ah) waveform.

Omega waveform after cropping each phoneme waveform into desired frequency ranges:

Figure 30: Cropped Omega (ow-m-eh-g-ah) waveform.

The concatenated plot of Omega:


Figure 31: Concatenated Omega (ow-m-eh-g-ah) waveform.

Left is split into four phonemes: l/eh/f/t. Each phoneme is recorded separately and the

waveforms are plotted:

Figure 32: Left (l-eh-f-t) waveform.


Left waveform after cropping each phoneme waveform into desired frequency ranges:

Figure 33: Cropped Left (l-eh-f-t) waveform.

The concatenated plot of Left:

Figure 34: Concatenated Left (l-eh-f-t) waveform.


Right is split into three phonemes: r/ay/t. Each phoneme is recorded separately and the

waveforms are plotted:

Figure 35: Right (r-ay-t) waveform.

Right waveform after cropping each phoneme waveform into desired frequency ranges:

Figure 36: Cropped Right (r-ay-t) waveform.


The concatenated plot of Right:

Figure 37: Concatenated Right (r-ay-t) waveform.

Forward is split into six phonemes: f/uh/r/w/axr/d. Each phoneme is recorded separately and the

waveforms are plotted:

Figure 38: Forward (f-uh-r-w-axr-d) waveform.


Forward waveform after cropping each phoneme waveform into desired frequency

ranges:

Figure 39: Cropped Forward (f-uh-r-w-axr-d) waveform.

The concatenated plot of Forward:


Figure 40: Concatenated Forward (f-uh-r-w-axr-d) waveform.

Reverse is split into five phonemes: r/iy/v/axr/s. Each phoneme is recorded separately and the

waveforms are plotted:

Figure 41: Reverse (r-iy-v-axr-s) waveform.

Reverse waveform after cropping each phoneme waveform into desired frequency ranges:


Figure 42: Cropped Reverse (r-iy-v-axr-s) waveform.

The concatenated plot of Reverse:

Figure 43: Concatenated Reverse (r-iy-v-axr-s) waveform.

Although the concatenation process is very straightforward, large databases may require complex search algorithms, and the signal processing can be modified to achieve desired speaker characteristics. The final speech sounds natural and more recognizable after concatenative synthesis. Some speech synthesizers produce continuous speech by selecting waveform segments from databases containing a very large number of segments; these segments are usually not recorded by the user but come from generic speech. Concatenating large database segments may give very good quality speech, but those techniques are costly in terms of data collection, organization, and memory storage.

Filtering and Smoothing the Speech Sound:

Filtering and smoothing a pre-recorded sound is a very important step in signal processing. The main idea of the filtering here is to average over a large window of points and calculate a least-squares fit. In speech synthesis, signal processing is used to smooth errors out of the existing waveform. Sometimes linear interpolation in the frequency domain does not give a good output, and other algorithms that provide natural transitions must be sought. Spectral smoothing helps modify existing audio frames, and interpolation helps add more frames as needed.

When no spectral smoothing is applied to the audio files, the output sounds unnatural. Therefore, to eliminate this obstacle, we are using the Savitzky-Golay smoothing filter. The Savitzky-Golay filter is a digital polynomial smoothing filter, and such filters are among the most frequently used digital smoothing filters in spectrometry.

The Savitzky-Golay filter SG(N,n) is linear and shift-invariant, and acts on a vector of input samples x(k) to produce the smoothed vector y(k). When the window N=2M+1 is taken over the samples x(-M), …, x(M), the filter computes the best least-squares fit by a polynomial p(-M), …, p(M) of even degree n. The same operation applies when the window is shifted by k, over the samples x(k-M), …, x(k+M). The filter output y(k) is the center of the least-squares fit of degree n to the 2M+1 samples. The Savitzky-Golay filter value formula is

y(k) = Σ (n = -nL to nR) cn·x(k+n)

Figure 44: Savitzky-Golay Smoothing Filter Formula [8]

where nL and nR are the number of samples to the left and to the right, respectively, and the cn are the fitted polynomial coefficients.
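In Matlab, this filter is available as sgolayfilt in the Signal Processing Toolbox; a minimal sketch follows, where the polynomial order and window length are illustrative choices rather than the report's exact parameters:

%Sketch: Savitzky-Golay smoothing of a concatenated word
[x,fs] = wavread('alpha.wav'); %concatenated word from the database
y = sgolayfilt(x,3,41); %cubic fit over a 41-sample window
sound(y,fs); %listen to the smoothed result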

When the Savitzky-Golay smoothing filter was applied to our already-concatenated sound waveforms, they looked as follows:


Figure 45: Filtered Alpha using The Savitzky-Golay Filter.

Figure 46: Filtered Omega using The Savitzky-Golay Filter.


Figure 47: Filtered Left using The Savitzky-Golay Filter.

Figure 48: Filtered Right using The Savitzky-Golay Filter.


Figure 49: Filtered Forward using The Savitzky-Golay Filter.

Figure 50: Filtered Reverse using The Savitzky-Golay Filter.


Spectral smoothing is the most common technique used for speech and audio coding. Studies show that spectral smoothing performs best when the original spectra are similar to each other, as in concatenative synthesis with large databases.

Spectral interpolation, or waveform interpolation, is a technique that helps to shape pitch-period waveforms. It operates on a frame-by-frame basis, and within each segment the pitch period and/or waveform is interpolated. The waveform is extracted from the original sound or signal at some time interval. To produce interpolated waveforms, the pitch period and the signal must be interpolated in either the time domain or the frequency domain. This provides smoother, more natural results for a large number of interpolated pitch periods and calculates smoothed speech frames.
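A minimal sketch of the factor-of-two version used by the interpolation scripts in Appendix C, which makes room for the new frames by inserting a zero between every pair of samples of a filtered word (the zeros mark where interpolated frames would go), is:

% Factor-of-two waveform interpolation, as in interpAlpha.m (Appendix C).
[x, Fs, bits] = wavread('FiltAlpha.wav'); % a filtered, concatenated word
intX = zeros(1, 2*length(x));             % twice the samples
intX(1:2:2*length(x)) = x;                % originals at the odd indices
plot(intX)                                % even indices hold the new frames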

The following graphs represent the interpolated waveforms of all six words that we have been using for the project:

Figure 51: Waveform Interpolation of Alpha.


Figure 52: Waveform Interpolation of Omega.

Figure 53: Waveform Interpolation of Left.


Figure 54: Waveform Interpolation of Right.

Figure 55: Waveform Interpolation of Forward.


Figure 56: Waveform Interpolation of Reverse.

After applying spectral smoothing the speech is smoother and sounds more natural; it definitely improved the concatenative speech synthesis. Once smoothing is done and all the filtered sounds are recorded and saved, we produce speech from the words: we simply type in the word we would like to speak, and our voice synthesizer outputs the speech sound.
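A minimal sketch of that last lookup step (the full script, TextToSpeech.m, appears in Appendix C) maps the typed word to its synthesized .wav file:

% Text-to-speech lookup, as in TextToSpeech.m (Appendix C).
word = input('What word would you like to speak? ', 's');
switch word
    case 'alpha'
        sound(wavread('NewAlpha.wav'));
    case 'reverse'
        sound(wavread('NewReverse.wav'));
    % ...the other four key words follow the same pattern
end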

Conclusion

After several demos, it can be concluded that the EMG Vocalizer is successful in taking the EMG signal of a spoken or mouthed word, classifying it as one of the six trained words, and then outputting a synthesized version of that word. This can be done with 85% accuracy, but after about a half hour to an hour the electrodes start to move out of place, and the acquired signals differ significantly from the signals used to train the system. This movement is due to the lack of good adhesive material on the electrodes and the weight of the alligator clips that were attached. As a result, the system had to be re-trained every hour or so. The EMG Vocalizer could perform better given electrodes that do not slip over time.

Other than the unreliability of the electrodes, all other aspects of the project worked properly. The EMG circuit that was constructed successfully picked up the EMG signals when a word was mouthed or spoken. This signal was amplified and filtered with analog circuitry to attain a high signal-to-noise ratio. The signal was then sent to the computer and processed in Matlab, where a combination of distance- and correlation-based recognition schemes was used to classify the incoming signal as one of the six keywords trained into the system. The output of this stage is a text word that is input into the text-to-speech voice synthesizer. Using the unit-selection type of concatenative synthesis, the text string was translated into an audible representation of the word. Therefore the EMG Vocalizer is successful in its objective: taking the EMG signal of a spoken or mouthed word, deciding what the word is, and then outputting the word as an audio signal that is not disrupted by environmental noise.


References

[1] Biopac Systems, Inc. "EDA (GSR) Subject Preparation." <http://www.biopac.com/eda-gsr-subject-preparation>

[2] Driessen, Peter F. "The Experimental Portable EEG/EMG Amplifier." University of Victoria, August 1, 2003. <http://www.ece.uvic.ca/~elec499/2003a/group11/thereport.pdf>

[3] Haykin, Simon S. Communication Systems. Wiley, 2009.

[4] Haykin, Simon S. Neural Networks. Michigan: Prentice Hall, 1999.

[5] Jonsson, Fredrik. "Jonsson.eu." Web site of Fredrik Jonsson. 4 Dec. 2011. Web. Apr. 2012. <http://www.jonsson.eu/>

[6] Nilsson, James W., and Riedel, Susan A. Electric Circuits. 8th ed. Upper Saddle River, New Jersey: Pearson Education, Inc., 2008. (pp. 606-640)

[7] Persson, Per-Olof, and Gilbert Strang. "Smoothing by Savitzky-Golay and Legendre Filters." Web.

[8] "Precision, Low Power INSTRUMENTATION AMPLIFIERS." Burr-Brown Corporation, 1995. <http://pdf1.alldatasheet.com/datasheet-pdf/view/56692/BURR-BROWN/INA129.html>

[9] Rabiner, Lawrence R., and Stephen E. Levinson. "Isolated and Connected Word Recognition - Theory and Selected Applications." IEEE Transactions on Communications, Vol. COM-29, No. 5, 1981. 621-650.

[10] "The Story of My ECG." <http://www.eng.utah.edu/~jnguyen/ecg/long_story_3.html>

[11] "Text-to-speech." MATLAB Central File Exchange. Web. 19 Mar. 2012. <http://www.mathworks.com/matlabcentral/fileexchange/18091-text-to-speech>

[12] Utama, Robert J., Ann K. Syrdal, and Alistair Conkie. "Six Approaches to Limited Domain Concatenative Speech Synthesis." 17 Sept. 2006. Web. Mar. 2012.

[13] "Speech Synthesis Markup Language (SSML) Version 1.0." World Wide Web Consortium (W3C). Web. 19 Mar. 2012. <http://www.w3.org/TR/speech-synthesis/>


Appendix A- Costs

Item                                                            Cost
Arduino Uno                                                     $23.00
SainSmart Xbee Shield Module for Arduino UNO MEGA Duemilanove   $14.95
Xbee 1mW antenna                                                $66.90
Burr-Brown INA129P                                              $7.70
Total                                                           $112.55


Appendix B- Code for Word Recognition

function result = bigcompare(wave, M)
% Should have ten instances of the training waves. Compares an input to
% each and every one of the waves in the training set (60 comparisons);
% tests whether this does better than comparing against the "ideal" signal.
[r, c, d] = size(M);           % r training sets, c samples, d words
W1 = zeros(c, d, r);
for t = 1:r                    % reshape so each page holds one training set
    for k = 1:d
        W1(:,k,t) = M(t,:,k);
    end
end
cvec = zeros(r, d);
for k = 1:r                    % correlate the input against every set
    cvec(k,:) = corrs(wave', W1(:,:,k)');
end
c = mean(cvec);                % average correlation per word
[val, ind] = max(c);
resvec = zeros(d, 1);          % vote for the best-matching word
for t = 1:length(ind)
    resvec(ind(t)) = resvec(ind(t)) + 1;
end
[val, result] = max(resvec);


% This function outputs "centered" waveforms for the matched filter.
% This installment takes the max energy point as the center,
% then circularly shifts the waveform.
function newform = centerwave(signal)
N = length(signal);
center = floor(N/2);
S = abs(signal).^2;            % instantaneous energy
[val, ind] = max(S);           % take max energy as center
diff = ind - center;           % offset of the peak from the middle
if diff > 0
    newform = modshift(signal, 'left', diff);
elseif diff < 0
    % diff is negative here; modshift (helper, not listed) is assumed
    % to shift right by its magnitude
    newform = modshift(signal, 'right', diff);
else
    newform = signal;
end


% This creates the groupings of waves to compare against the human test:
% builds a confusion matrix for the trained network using half of the
% stored training examples per word.
load('KWeidmann.mat');         % provides the training captures M
load('KWeidmannnet.mat');      % provides the trained network netn
[r, c, d] = size(M);
confuse = zeros(6, 6);
for k = 1:d                    % for each word...
    for t = 1:r/2              % ...test on half of its examples
        s = sim(netn, createfv(M(t,:,k), 'norm'));
        [val, ind] = max(s);   % the network's chosen word
        confuse(k, ind) = confuse(k, ind) + 1;
    end
end


function c = condense(waveform)
% Takes a length-10000 waveform (provided by the oscilloscope) and
% condenses it to a length-50 vector of mean energy values, each taken
% over an 80 ms window.
numrun = 10000/50;                      % samples per window
c = zeros(1, 50);
waveform = waveform - mean(waveform);   % remove the DC offset
for f = 1:50
    c(f) = mean(waveform((numrun*(f-1)+1):numrun*f).^2);
end
c = normalize(c);


% signal is the input signal, and M is a matrix of ideal vectors;
% the rows of M are signals and the columns are points.
% All signals should be normalized. For the correlation comparison:
function cor = corrs(signal, M)
[r, c] = size(M);
cor = zeros(1, r);
for k = 1:r
    cor(k) = max(xcorr(signal, M(k,:), 'coeff'));
end


function [Basis, Ideal, Ideal_projections] = createbasis(M)
% M is row-by-col-by-depth, where depth is the number of bases to create
% and row is the number of training sets for each utterance.
[row, col, depth] = size(M);
Ideal = zeros(depth, col);
for t = 1:depth
    for k = 1:row
        M(k,:,t) = normalize(M(k,:,t));
        M(k,:,t) = centerwave(M(k,:,t));
    end
    Ideal(t,:) = normalize(mean(M(:,:,t)));    % create the ideal waveforms
end
Basis = gs(Ideal);                             % compute the basis functions using Gram-Schmidt
Ideal_projections = projections(Ideal, Basis); % get the ideal projections onto the basis functions


% This is a function to create feature vectors that are part energy and
% part frequency information about a signal.
function fv = createfv(sig, strng)
[num, den] = butter(4, 100/2500, 'low');   % remove the high-frequency jitters
sig = filter(num, den, sig);
a1 = normalize(abs(tess(sig)));   % frequency-type feature (tess: helper, not listed)
a2 = condense(sig);               % energy feature
fv = [a1'; a2'];
if strcmp('norm', strng)
    fv = fv/max(fv);              % optionally normalize the feature vector
end


% This is a function to create feature vectors that are part energy and
% part frequency information about a signal. This version also includes
% linear predictive coding as a feature.
function fv = createfv2(sig)
[num, den] = butter(4, 25/2500, 'low');   % remove the high-frequency jitters
sig = filter(num, den, sig);
a1 = abs(tess(sig));
a1 = a1/max(abs(a1));
a2 = condense(sig);
a2 = a2/max(abs(a2));
a3 = lpc(sig, 10);                        % 10th-order linear predictive coefficients
fv = [a1'; a2'; a3'];


function c = createfv3(sig, waveforms, basis, idp)
% Creates a feature vector of correlation values and (perhaps) points in
% signal space; the signal is compared to the ideal waveforms.
sig = sig';
c = corrs(sig, waveforms);
% Optional signal-space distances (require the basis and idp arguments):
% p = projections(sig, basis);
% d = distance(idp, p);
% c = [c d];


% A VISA-USB device object (deviceObj) is created by the caller.
function [Basis, Waveforms, IdP] = data_gather(name, deviceObj)
% Gathers training captures from the oscilloscope for each of the six key
% words, then builds and saves the recognition basis.
connect(deviceObj);                 % connect device object to hardware
row = 10;                           % training captures per word
col = 10000;                        % samples per capture
depth = 6;                          % number of words
M = zeros(row, col, depth);
keywords = {'Alpha','Omega','Left','Right','Forward','Reverse'};
for w = 1:depth
    input(['Press to Start Training ' keywords{w}]);
    A = zeros(col, row);
    for k = 1:row
        input('Press Enter');
        % Execute device object function(s): read one capture.
        groupObj = get(deviceObj, 'Waveform');
        groupObj = groupObj(1);
        [A(:,k), X] = invoke(groupObj, 'readwaveform', 'channel1');
    end
    M(:,:,w) = A';
end
name = strcat(name, '.mat');
[Basis, Waveforms, IdP] = createbasis(M);
save(name, 'Basis', 'Waveforms', 'IdP');


function [wave, x] = data_single()
% Captures a single waveform from the oscilloscope.
deviceObj = start();
% Execute device object function(s).
r = input('Press Any Key');
groupObj = get(deviceObj, 'Waveform');
groupObj = groupObj(1);
[wave, x] = invoke(groupObj, 'readwaveform', 'channel1');


% Simple distance-based decision rule :) with correlation added on.
function [d, ind] = decide(distance, cor)
decvect = .3*(1 - distance) + .7*cor;  % weighted distance/correlation rule
decvect = cor;                         % override: correlation alone is used
[val, ind] = max(decvect);
switch (ind)
    case 1
        d = 'alpha';
    case 2
        d = 'omega';
    case 3
        d = 'left';
    case 4
        d = 'right';
    case 5
        d = 'forward';
    case 6
        d = 'reverse';
end
%% and so on :)


% Demo for Professor Rose on 4/24 @ 10 AM.
% load('netinuse.mat');              % trained neural network (optional branch below)
k = 1;
load('KWeidmannnabs.mat');           % provides the training captures M
[num, den] = butter(4, 25/2500, 'low');   % remove the high-frequency jitters
for d = 1:6
    for r = 1:10
        M(r,:,d) = filter(num, den, M(r,:,d));
    end
end
Mtest = M;
[Basis, Waveforms, IdP] = createbasis(M);
while k ~= 0
    k = input('To end, press 0');
    [wave, x] = data_single();
    % Optional neural-network branch (needs netn from netinuse.mat):
    % d = createfv(wave, 'norm');
    % res = sim(netn, d);
    % [y, ind] = max(res);
    wave = filter(num, den, wave);
    wave = normalize(wave);
    wave = centerwave(wave);
    % Begin reco based on distance and correlation.
    p = projections(wave, Basis);
    d = distance(IdP, p);
    c = corrs(wave, Waveforms);
    [dec, ind] = decide(d, c);   % decide maps ind to the word string
    dec                          % display the recognized word
    %%% Kristin and Sophie's Code goes here!!!!
end


% Computes the distance of one signal from all ideal comparisons.
function d = distance(ideal_projections, projections)
[r, c] = size(ideal_projections);
d = zeros(1, r);
for k = 1:r
    d(k) = sqrt(sum((ideal_projections(k,:) - projections).^2));
end


% This function does Gram-Schmidt orthonormalization.
function basis = gs(signals)
[r, c] = size(signals);
basis = zeros(r, c);
basis(1,:) = normalize(signals(1,:));
for t = 2:r
    v = signals(t,:);
    for k = 1:t-1
        g = sum(v.*conj(basis(k,:)));   % inner product to get the projection
        v = v - g.*basis(k,:);          % subtract the projection
    end
    basis(t,:) = normalize(v);
end


% The main program for the Capstone classifier.
% Initializes either training or use of the system.
% Note: this is an old version.
t = 1;
r = input('To train a new sequence press 0, to test press 1 ');
if r == 0
    % Gather training vectors.
    name = input('First Initial Last Name ', 's');
    deviceObj = start;
    [Basis, Waveforms, IdP] = data_gather(name, deviceObj);
end
if r == 1
    % Use a previously defined training set and do recognition.
    name = input('First Initial and Last name ', 's');
    name = strcat(name, '.mat');
    load(name)                            % provides Basis, Waveforms, IdP
    while t == 1
        deviceObj = start;
        wave = get_waveform(deviceObj);   % capture helper, similar to data_single (not listed)
        p = projections(wave, Basis);
        d = distance(IdP, p);
        c = corrs(wave, Waveforms);
        decide(d, c)
        t = input('To Continue Press 1, to Exit press 0');
    end
end


% This function finds the projections of all signals onto all bases;
% signals are on the rows, basis projection values on the columns.
function ps = projections(signals, basis)
[r1, c1] = size(signals);
[r2, c2] = size(basis);
ps = zeros(r1, r2);
for q = 1:r1
    for t = 1:r2
        ps(q,t) = sum(signals(q,:).*conj(basis(t,:)));
    end
end


% Neural network training: the feature vectors and the functions to
% create them changed frequently during development.
numset = 20;                 % number of example sets used to train the net
W = zeros(6, 6, numset);     % feature vectors: 6 features x 6 words x sets
load('JPadgettnabs.mat');  M1 = M;   % first ten captures per word
load('JPadgettnabs2.mat');           % second ten captures per word (as M)
[Basis, Waveforms, IdP] = createbasis(M);
[r, c, dpth] = size(M);
% Reshape the captures into a 10000 x 6 x numset stack.
W1 = zeros(10000, 6, numset);
for d = 1:numset
    for k = 1:6
        if d <= 10
            W1(:,k,d) = M(d,:,k);
        elseif d > 10 && d < 21
            W1(:,k,d) = M1(d-10,:,k);
        end
        % A third training set (M2) could extend this:
        % if d >= 21, W1(:,k,d) = M2(d-20,:,k); end
    end
end
[r1, c1, depth] = size(W1);
for d = 1:depth
    for k = 1:dpth
        W(:,k,d) = createfv3(W1(:,k,d), Waveforms);   % correlation features
    end
end
% W = W/max(max(max(W)));
% W = abs(W);
targets = eye(6);            % one-hot target for each word
alphabet = W(:,:,1);
% [alphabet, targets] = prprob;
[R, Q] = size(alphabet);
[S2, Q] = size(targets);
netn = newff(minmax(alphabet), [41 S2], {'tansig','tansig'}, 'traingdx');
% tansig appears to work best in this case :)
netn.trainParam.goal = .04;    % mean-squared error goal; 0.03 worked well
                               % until another set was used to train the network
netn.trainParam.epochs = 800;  % maximum number of epochs to train
T = targets;
for pass = 1:depth             % alternate from the front and back of the stack
    k = mod(pass, 2);
    fprintf('Pass = %.0f\n', pass);
    % Noise-augmented alternative:
    % P = [alphabet, alphabet, ...
    %      (alphabet + randn(R,Q)*0.1), ...
    %      (alphabet + randn(R,Q)*.02)];
    if k == 0
        P = W(:,:,floor(pass/2));
    else
        P = W(:,:,depth-floor(pass/2));
    end
    [netn, tr] = train(netn, P, T);
end
% Final polishing pass on the base set.
netn.trainParam.goal = .1;      % mean-squared error goal
netn.trainParam.epochs = 500;   % maximum number of epochs to train
netn.trainParam.show = 5;       % frequency of progress displays (in epochs)
P = alphabet;
T = targets;
[netn, tr] = train(netn, P, T);


% Normalizes a signal to unit energy.
function s = normalize(sig)
s = sig/sqrt(sum(abs(sig).^2));




Appendix C- Speech Synthesis Code

NewRecord.m

% Create an audiorecorder object for CD-quality audio in stereo,
% and view its properties:
recObj = audiorecorder(44100, 16, 2);
get(recObj)
% Collect a sample of your speech with a microphone, and plot the
% signal data. First build a 500 Hz beep to cue the speaker:
fs = 2000;
t = 1:2000;
s = sin(2*pi*500*t./fs);
% Record your voice for 10 seconds (the default recorder is used
% for the actual capture).
recObj = audiorecorder;
disp('Say Phoneme at the Beep')
sound(s, 2000);
pause(1);
recordblocking(recObj, 10);
disp('End of Recording.');
% Play back the recording.
play(recObj);
% Store data in a double-precision array.
myRecording = getaudiodata(recObj);
% Plot the waveform.
figure(1)
plot(myRecording);
% Save as a .wav file.
wavwrite(myRecording, 'alpha.wav');


concatAlpha.m

% Concatenating Alpha.
% Read and plot 'ae'.
a = wavread('ae');
figure(1)
plot(a)
title('ae'), xlabel('Freq (Hz)'), ylabel('dB')
% Crop 'ae' and save the cropped segment.
b = a(7100:8600);
figure(2)
plot(b)
title('Cropped ae'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(b, 'CropAe.wav');
cAe = wavread('CropAe');
% Read, plot, crop, and save 'l'.
c = wavread('l');
figure(3)
plot(c)
title('l'), xlabel('Freq (Hz)'), ylabel('dB')
d = c(4300:6400);
figure(4)
plot(d)
title('Cropped l'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(d, 'CropL.wav');
cL = wavread('CropL');
% Read, plot, crop, and save 'f'.
e = wavread('f');
figure(5)
plot(e)
title('f'), xlabel('Freq (Hz)'), ylabel('dB')
f = e(4000:6500);
figure(6)
plot(f)
title('Cropped f'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(f, 'CropF.wav');
cF = wavread('CropF');
% Read, plot, crop, and save 'ah'.
g = wavread('ah');
figure(7)
plot(g)
title('ah'), xlabel('Freq (Hz)'), ylabel('dB')
h = g(7500:9000);
figure(8)
plot(h)
title('Cropped ah'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(h, 'CropAh.wav');
cAh = wavread('CropAh');
% Concatenate the cropped sounds into Alpha, play it, and save it.
alpha = [cAe; cL; cF; cAh];
sound(alpha)
wavwrite(alpha, 'Alpha');
figure(9)
plot(alpha)
title('Alpha'), xlabel('Freq (Hz)'), ylabel('dB')
% Apply the Savitzky-Golay filter to Alpha and save the filtered sound.
frame = 9;
degree = 0;
y = sgolayfilt(alpha, degree, frame);
figure(10)
plot(y)
title('Filtered Alpha'), xlabel('Freq (Hz)'), ylabel('dB')
sound(y)
% Write the filtered waveform.
wavwrite(y, 'FiltAlpha.wav');


concatOmega.m

% Concatenating Omega.
% Read, plot, crop, and save 'ow'.
a = wavread('ow');
figure(1)
plot(a)
title('ow'), xlabel('Freq (Hz)'), ylabel('dB')
b = a(4100:6200);
figure(2)
plot(b)
title('Cropped ow'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(b, 'CropOw.wav');
cO = wavread('CropOw');
% Read, plot, crop, and save 'm'.
c = wavread('m');
figure(3)
plot(c)
title('m'), xlabel('Freq (Hz)'), ylabel('dB')
d = c(3800:7000);
figure(4)
plot(d)
title('Cropped m'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(d, 'CropM.wav');
cM = wavread('CropM');
% Read, plot, crop, and save 'eh'.
e = wavread('eh');
figure(5)
plot(e)
title('eh'), xlabel('Freq (Hz)'), ylabel('dB')
f = e(6000:7800);
figure(6)
plot(f)
title('Cropped eh'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(f, 'CropEh.wav');
cEh = wavread('CropEh');
% Read, plot, crop, and save 'g'.
g = wavread('g');
figure(7)
plot(g)
title('g'), xlabel('Freq (Hz)'), ylabel('dB')
h = g(6800:7600);
figure(8)
plot(h)
title('Cropped g'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(h, 'CropG.wav');
cG = wavread('CropG');
% Read, plot, crop, and save 'ah'.
i = wavread('ah');
figure(9)
plot(i)
title('ah'), xlabel('Freq (Hz)'), ylabel('dB')
j = i(7500:9000);
figure(10)
plot(j)
title('Cropped ah'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(j, 'CropAh.wav');
cAh = wavread('CropAh');
% Concatenate the cropped sounds into Omega, play it, and save it.
omega = [cO; cM; cEh; cG; cAh];
sound(omega)
wavwrite(omega, 'Omega');
figure(11)
plot(omega)
title('Omega'), xlabel('Freq (Hz)'), ylabel('dB')
% Apply the Savitzky-Golay filter to Omega and save the filtered sound.
frame = 9;
degree = 0;
y = sgolayfilt(omega, degree, frame);
figure(12)
plot(y)
title('Filtered Omega'), xlabel('Freq (Hz)'), ylabel('dB')
sound(y)
% Write the filtered waveform.
wavwrite(y, 'FiltOmega.wav')


concatLeft.m

% Concatenating Left.
% Read, plot, crop, and save 'l'.
a = wavread('l');
figure(1)
plot(a)
title('l'), xlabel('Freq (Hz)'), ylabel('dB')
b = a(4300:6400);
figure(2)
plot(b)
title('Cropped l'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(b, 'CropL.wav');
cL = wavread('CropL');
% Read, plot, crop, and save 'eh'.
c = wavread('eh');
figure(3)
plot(c)
title('eh'), xlabel('Freq (Hz)'), ylabel('dB')
d = c(6000:7800);
figure(4)
plot(d)
title('Cropped eh'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(d, 'CropEh.wav');
cEh = wavread('CropEh');
% Read, plot, crop, and save 'f'.
e = wavread('f');
figure(5)
plot(e)
title('f'), xlabel('Freq (Hz)'), ylabel('dB')
f = e(4000:6500);
figure(6)
plot(f)
title('Cropped f'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(f, 'CropF.wav');
cF = wavread('CropF');
% Read, plot, crop, and save 't'.
g = wavread('t');
figure(7)
plot(g)
title('t'), xlabel('Freq (Hz)'), ylabel('dB')
h = g(5900:7000);
figure(8)
plot(h)
title('Cropped t'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(h, 'CropT.wav');
cT = wavread('CropT');
% Concatenate the cropped sounds into Left, play it, and save it.
left = [cL; cEh; cF; cT];
sound(left)
wavwrite(left, 'Left')
figure(9)
plot(left)
title('Left'), xlabel('Freq (Hz)'), ylabel('dB')
% Apply the Savitzky-Golay filter to Left and save the filtered sound.
frame = 9;
degree = 0;
y = sgolayfilt(left, degree, frame);
figure(10)
plot(y)
title('Filtered Left'), xlabel('Freq (Hz)'), ylabel('dB')
sound(y)
% Write the filtered waveform.
wavwrite(y, 'FiltLeft.wav');


concatRight.m

% Concatenating Right.
% Read, plot, crop, and save 'r'.
a = wavread('r');
figure(1)
plot(a)
title('r'), xlabel('Freq (Hz)'), ylabel('dB')
b = a(4500:6500);
figure(2)
plot(b)
title('Cropped r'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(b, 'CropR.wav');
cR = wavread('CropR');
% Read, plot, crop, and save 'ay'.
c = wavread('ay');
figure(3)
plot(c)
title('ay'), xlabel('Freq (Hz)'), ylabel('dB')
d = c(6000:7000);
figure(4)
plot(d)
title('Cropped ay'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(d, 'CropAy.wav');
cAy = wavread('CropAy');
% Read, plot, crop, and save 't'.
e = wavread('t');
figure(5)
plot(e)
title('t'), xlabel('Freq (Hz)'), ylabel('dB')
f = e(5900:7000);
figure(6)
plot(f)
title('Cropped t'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(f, 'CropT.wav');
cT = wavread('CropT');
% Concatenate the cropped sounds into Right, play it, and save it.
right = [cR; cAy; cT];
sound(right)
wavwrite(right, 'Right')
figure(7)
plot(right)
title('Right'), xlabel('Freq (Hz)'), ylabel('dB')
% Apply the Savitzky-Golay filter to Right and save the filtered sound.
frame = 9;
degree = 0;
y = sgolayfilt(right, degree, frame);
figure(8)
plot(y)
title('Filtered Right'), xlabel('Freq (Hz)'), ylabel('dB')
sound(y)
% Write the filtered waveform.
wavwrite(y, 'FiltRight.wav');


concatForward.m

% Concatenating Forward.
% Read, plot, crop, and save 'f'.
a = wavread('f');
figure(1)
plot(a)
title('f'), xlabel('Freq (Hz)'), ylabel('dB')
b = a(4000:6500);
figure(2)
plot(b)
title('Cropped f'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(b, 'CropF.wav');
cF = wavread('CropF');
% Read, plot, crop, and save 'uh'.
c = wavread('uh');
figure(3)
plot(c)
title('uh'), xlabel('Freq (Hz)'), ylabel('dB')
d = c(5900:7000);
figure(4)
plot(d)
title('Cropped uh'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(d, 'CropUh.wav');
cUh = wavread('CropUh');
% Read, plot, crop, and save 'r'.
e = wavread('r');
figure(5)
plot(e)
title('r'), xlabel('Freq (Hz)'), ylabel('dB')
f = e(4500:6500);
figure(6)
plot(f)
title('Cropped r'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(f, 'CropR.wav');
cR = wavread('CropR');
% Read, plot, crop, and save 'w'.
g = wavread('w');
figure(7)
plot(g)
title('w'), xlabel('Freq (Hz)'), ylabel('dB')
h = g(3900:6600);
figure(8)
plot(h)
title('Cropped w'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(h, 'CropW.wav');
cW = wavread('CropW');
% Read, plot, crop, and save 'axr'.
i = wavread('axr');
figure(9)
plot(i)
title('axr'), xlabel('Freq (Hz)'), ylabel('dB')
j = i(6100:8100);
figure(10)
plot(j)
title('Cropped axr'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(j, 'CropAxr.wav');
cAxr = wavread('CropAxr');
% Read, plot, crop, and save 'd'.
k = wavread('d');
figure(11)
plot(k)
title('d'), xlabel('Freq (Hz)'), ylabel('dB')
l = k(6050:7800);
figure(12)
plot(l)
title('Cropped d'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(l, 'CropD.wav');
cD = wavread('CropD');
% Concatenate the cropped sounds into Forward, play it, and save it.
forward = [cF; cUh; cR; cW; cAxr; cD];
sound(forward)
wavwrite(forward, 'Forward.wav');
figure(13)
plot(forward)
title('Forward'), xlabel('Freq (Hz)'), ylabel('dB')
% Apply the Savitzky-Golay filter to Forward and save the filtered sound.
frame = 9;
degree = 0;
y = sgolayfilt(forward, degree, frame);
figure(14)
plot(y)
title('Filtered Forward'), xlabel('Freq (Hz)'), ylabel('dB')
sound(y)
% Write the filtered waveform.
wavwrite(y, 'FiltForward.wav');


concatReverse.m

% Concatenating Reverse.
% Read, plot, crop, and save 'r'.
a = wavread('r');
figure(1)
plot(a)
title('r'), xlabel('Freq (Hz)'), ylabel('dB')
b = a(5000:7300);
figure(2)
plot(b)
title('Cropped r'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(b, 'CropR.wav');
cR = wavread('CropR');
% Read, plot, crop, and save 'iy'.
c = wavread('iy');
figure(3)
plot(c)
title('iy'), xlabel('Freq (Hz)'), ylabel('dB')
d = c(7800:9100);
figure(4)
plot(d)
title('Cropped iy'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(d, 'CropIy.wav');
cIy = wavread('CropIy');
% Read, plot, crop, and save 'v'.
e = wavread('v');
figure(5)
plot(e)
title('v'), xlabel('Freq (Hz)'), ylabel('dB')
f = e(14000:15500);
figure(6)
plot(f)
title('Cropped v'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(f, 'CropV.wav');
cV = wavread('CropV');
% Read, plot, crop, and save 'axr'.
g = wavread('axr');
figure(7)
plot(g)
title('axr'), xlabel('Freq (Hz)'), ylabel('dB')
h = g(6100:8100);
figure(8)
plot(h)
title('Cropped axr'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(h, 'CropAxr.wav');
cAxr = wavread('CropAxr');
% Read, plot, crop, and save 's'.
i = wavread('s');
figure(9)
plot(i)
title('s'), xlabel('Freq (Hz)'), ylabel('dB')
j = i(5100:6000);
figure(10)
plot(j)
title('Cropped s'), xlabel('Freq (Hz)'), ylabel('dB')
wavwrite(j, 'CropS.wav');
cS = wavread('CropS');
% Concatenate the cropped sounds into Reverse, play it, and save it.
reverse = [cR; cIy; cV; cAxr; cS];
sound(reverse)
wavwrite(reverse, 'Reverse');
figure(11)
plot(reverse)
title('Reverse'), xlabel('Freq (Hz)'), ylabel('dB')
% Apply the Savitzky-Golay filter to Reverse and save the filtered sound.
frame = 9;
degree = 0;
y = sgolayfilt(reverse, degree, frame);
figure(12)
plot(y)
title('Filtered Reverse'), xlabel('Freq (Hz)'), ylabel('dB')
sound(y)
% Write the filtered waveform.
wavwrite(y, 'FiltReverse.wav');


interpAlpha.m

clear;
clc;
% Read the filtered word and play it back.
[x, Fs, bits] = wavread('FiltAlpha.wav');
n = length(x);
k1 = wavread('FiltAlpha');
sound(k1)
% Interpolation of the signal: zero-insertion doubles the length, leaving
% the even indices as the frames to be interpolated. Note that the original
% samples x, not intX, are what get written to NewAlpha.wav.
intX = zeros(1, 2*length(x));
intX(1:2:2*length(x)) = x;
wavwrite(x, Fs, bits, 'NewAlpha.wav');
s1 = wavread('NewAlpha');
sound(s1)
figure(1)
plot([1:n], x)
title('Original Alpha Signal'), xlabel('Freq (Hz)'), ylabel('dB')
figure(2)
plot([1:2*n], intX)
title('Interpolated Alpha Signal'), xlabel('Freq (Hz)'), ylabel('dB')


interpOmega.m

clear;
clc;
% Read the filtered word and play it back.
[x, Fs, bits] = wavread('FiltOmega.wav');
n = length(x);
k1 = wavread('FiltOmega');
sound(k1)
% Interpolation of the signal: zero-insertion doubles the length, leaving
% the even indices as the frames to be interpolated. Note that the original
% samples x, not intX, are what get written to NewOmega.wav.
intX = zeros(1, 2*length(x));
intX(1:2:2*length(x)) = x;
wavwrite(x, Fs, bits, 'NewOmega.wav');
s1 = wavread('NewOmega');
sound(s1)
figure(1)
plot([1:n], x)
title('Original Omega Signal'), xlabel('Freq (Hz)'), ylabel('dB')
figure(2)
plot([1:2*n], intX)
title('Interpolated Omega Signal'), xlabel('Freq (Hz)'), ylabel('dB')


interpLeft.m

clear;
clc;
% Read the filtered word and play it back.
[x, Fs, bits] = wavread('FiltLeft.wav');
n = length(x);
k1 = wavread('FiltLeft');
sound(k1)
% Interpolation of the signal: zero-insertion doubles the length, leaving
% the even indices as the frames to be interpolated. Note that the original
% samples x, not intX, are what get written to NewLeft.wav.
intX = zeros(1, 2*length(x));
intX(1:2:2*length(x)) = x;
wavwrite(x, Fs, bits, 'NewLeft.wav');
s1 = wavread('NewLeft');
sound(s1)
figure(1)
plot([1:n], x)
title('Original Left Signal'), xlabel('Freq (Hz)'), ylabel('dB')
figure(2)
plot([1:2*n], intX)
title('Interpolated Left Signal'), xlabel('Freq (Hz)'), ylabel('dB')


interpRight.m

clear;
clc;
% Read the filtered word and play it back.
[x, Fs, bits] = wavread('FiltRight.wav');
n = length(x);
k1 = wavread('FiltRight');
sound(k1)
% Interpolation of the signal: zero-insertion doubles the length, leaving
% the even indices as the frames to be interpolated. Note that the original
% samples x, not intX, are what get written to NewRight.wav.
intX = zeros(1, 2*length(x));
intX(1:2:2*length(x)) = x;
wavwrite(x, Fs, bits, 'NewRight.wav');
s1 = wavread('NewRight');
sound(s1)
figure(1)
plot([1:n], x)
title('Original Right Signal'), xlabel('Freq (Hz)'), ylabel('dB')
figure(2)
plot([1:2*n], intX)
title('Interpolated Right Signal'), xlabel('Freq (Hz)'), ylabel('dB')


interpForward.m

clear;
clc;
% Read the filtered word and play it back.
[x, Fs, bits] = wavread('FiltForward.wav');
n = length(x);
k1 = wavread('FiltForward');
sound(k1)
% Interpolation of the signal: zero-insertion doubles the length, leaving
% the even indices as the frames to be interpolated. Note that the original
% samples x, not intX, are what get written to NewForward.wav.
intX = zeros(1, 2*length(x));
intX(1:2:2*length(x)) = x;
wavwrite(x, Fs, bits, 'NewForward.wav');
s1 = wavread('NewForward');
sound(s1)
figure(1)
plot([1:n], x)
title('Original Forward Signal'), xlabel('Freq (Hz)'), ylabel('dB')
figure(2)
plot([1:2*n], intX)
title('Interpolated Forward Signal'), xlabel('Freq (Hz)'), ylabel('dB')


interpReverse.m

clear;
clc;
% Read the filtered word and play it back.
[x, Fs, bits] = wavread('FiltReverse.wav');
n = length(x);
k1 = wavread('FiltReverse');
sound(k1)
% Interpolation of the signal: zero-insertion doubles the length, leaving
% the even indices as the frames to be interpolated. Note that the original
% samples x, not intX, are what get written to NewReverse.wav.
intX = zeros(1, 2*length(x));
intX(1:2:2*length(x)) = x;
wavwrite(x, Fs, bits, 'NewReverse.wav');
s1 = wavread('NewReverse');
sound(s1)
figure(1)
plot([1:n], x)
title('Original Reverse Signal'), xlabel('Freq (Hz)'), ylabel('dB')
figure(2)
plot([1:2*n], intX)
title('Interpolated Reverse Signal'), xlabel('Freq (Hz)'), ylabel('dB')


TextToSpeech.m

% Produce speech from the word.
word = input('What word would you like to speak? ', 's')
switch word
    case 'alpha'
        a = wavread('NewAlpha.wav');
        sound(a)
    case 'omega'
        b = wavread('NewOmega.wav');
        sound(b)
    case 'left'
        c = wavread('NewLeft.wav');
        sound(c)
    case 'right'
        d = wavread('NewRight.wav');
        sound(d)
    case 'forward'
        e = wavread('NewForward.wav');
        sound(e)
    case 'reverse'
        f = wavread('NewReverse.wav');
        sound(f)
end