VISUALIZATION OF THE MULTI-DIMENSIONAL SPEECI-I … · 2016-12-28 · VISUALIZATION OF THE MULTI-DIMENSIONAL SPEECI-I PARAMETER SPACE by Andrew Poon Ngai Ho l ... Despite some initial

VISUALIZATION OF THE MULTI-DIMENSIONAL

SPEECI-I PARAMETER SPACE

by

Andrew Poon Ngai Ho

l.. .~ t

A thesis submitted in ' partial fulfilment of the requirements

for the degree of Master of Science

Division of Information Engineering

The Chinese University of Hong Kong

May 1993

,-

~es~5

Tl< 78~:2-S'£Sf'6 b Vi03

ABSTRACT

Speech data has long been considered as complex,· difficult

to comprehend and manipulate. The parameters describing

these complicated data are in general multi-dimensional.

The main purpose of the work reported in this thesis is an

attempt to literally visualize these multi-dimensional

speech parameter. It is hoped that interesting features can

be spotted, and hence provide additional insight to the

property of speech. What is more important, it may help to

reduce the dimensionality of the speech parameter space.

The thesis provides a comprehensive background of the

subject. Various speech data representations are presented.

various methods for visualization of multi-dimensional data

and dimension reduction are also thoroughly con~idered. The

choices for speech data representations, LPC coefficients

and LPC cepstral coefficients, and the final - choice for .

dimensional reduction, Sliced Inverse Regression, used in

the thesis will be discussed in more detail. Computer

programs were developed to do the transformation of the raw

speech data and to do the visualization.

The results obtained were indicative. Using Sliced Inverse

Regression, the LPC coefficients were found to be lying on

an non-linear surface. Moreover, though the LPC cepstral

coeff icients did not provide any interesting feature , it

indicates the inappropriateness of using LPC cepstral

coefficients for the analysis in this thesis.

i

ACKNOWLEDGMENTS

I would like to express my special thanks to Dr. Edmund Lai

for his continual support and guidance on this project. It

was Dr. Lai's original idea to investigate on the topic in

this thesis. without his initial inspiration, this research

will not be started at all. During the lengthy and painful

research process, Dr. Lai has offered his grateful

assistance in suggesting directions and raising possible

pitfalls when something seems to go wrong.

Also, I would also like to express my thanks to the Computer

Technicians in the Information Engineering Department,

especially Andy Lai. They had given me many helps in

operating the computers in the Universities. If not assisted

by them, it would have taken much more time to get

accustomed to the computers (especially the PC to host

connection) before fully mastering them for the real work.

Last but not least, I would like" to t:hank my wife Flora

Choi. She had to tolerate the times of her husband's

negligence of her in favour of his project. Practically, she

even helped me prepare the tables and some of the graphic

formatting in this thesis. Moreover, it is her lovingly

moral support that prevented me from quitting when seemingly

unsolvable difficulties were encountered. without her, this

thesis would never have been completed.

ii

TABLE OF CONTENTS

ABSTRACT

ACKNOWLEDGMENTS

1. INTRODUCTION

2. REPRESENTATION OF SPEECH DATA

2.1 SAMPLE DATA REPRESENTATION

2.2 ANALOG LINEAR SYSTEM MODEL

2.3 DISCRETE FOURIER TRANSFORM

2.4 FILTER BAND REPRESENTATION

· •

· · • •

· · 2.5 LINEAR PREDICTIVE CODING ,(LPC)

3.

2.2 LPC CEPSTRAL COEFFICIENT . ·

MULTI-DIMENSIONAL ANALYSIS • •

3.1 PURE GRAPHICAL TOOLS ....

·

· . . . . • · · · •

· · · · · • • • · • • •

· · • · · · · · • · · •

• · · · · • •

· . . . . · . . . .

3.1.1 MULTI-HISTOGRAM · . . . . . . . . 3.1.2 STARS . . . . · . . . . . . . . 3.1.3 SPIKED SCATTERPLOT · . . . . . . 3.1.4 GLYPHS . . . . · . . . .

4

4

7

8

8

10

13

18

18

18

19

19

22

3.1.5 BOXES. . • . • • • • • • • • • • •• 22

3.1.6 LIMITATIONS OF THE BASIC METHODS

3.1.7 CHERNOFF FACES

3.1.8 ANDREW'S CURVE

3.1.9 LIMITATIONS

ANDREW'S CURVE

OF

. .

· . . . . . . . . · . . . . . . . .

CHERNOFF FACES AND

. · · · · • · · 3.1.10 SCATTERED PLOT MATRIX · · • · · 3.1.11 PARALLEL-AXIS SYSTEM · . . · • • · •

3.1.12 COMMON BASIC PITFALL · . . • • • · •

22

26

27

30

30

32

33

4

3.2 PURE PROJECTION METHODS . . . . . . . . 3.2.1 PRINCIPAL COMPONENTS ANALYSIS ••

3.2.2 PRINCIPLE CO-ORDINATES ANALYSIS.

· . • •

3.2.3 REGRESSION ANALYSIS .•••• · . · . 3.3 SLICED INVERSE REGRESSION (SIR) · . · .

DATA ANALYSIS . . . . . . . . . . . . . . . . . 4.1 PROGRAMS AND TEST DATA · . . . . . . 4.2 ACTUAL SPEECH DATA RESULTS · . . . .

4.2.1 SINGLE UTTERANCE OF "4" BY SPEAKER A

ONLY . . • . . • . • • · · • • . · •

4.2.2 TWELVE UTTERANCES OF "4" BY SPEAKER A

4.2.3 THREE UTTERANCES PER SPEAKER OF "4" BY

SPEAKER A, BAND C · . · · · · . • . 4.2.4 TWO UTTERANCES PER DIGIT OF "1" TO "9"

BY SPEAKER A · . . . . 4.2.5 ONE UTTERANCE PER DIGIT PER SPEAKER OF

36

36

37

38

41

50

50

63

66

72

78

83

"1" TO "9" BY SPEAKER A,B,C . . . •• 86

CONCLUSION AND FURTHER WORKS . . . . . . . • • . •• 93

5.1 CONCLUSION •• . . . . . . . . . . . . . 5.2 FURTHER WORKS ..•. • · · . . . . . .

REFERENCES

93

94

APPENDIX I

APPENDIX 2

APPENDIX 3

APPENDIX 4

APPENDIX 5

APPENDIX 6

APPENDIX 7

APPENDIX 8

MATLAB PROGRAM LISTING FOR SIR

C PROGRAM LISTING FOR ROTATIONAL VIEW

C PROGRAM LISTING FOR LPC AND CEPSTRAL

TRANSFORMS

ALL VIEWS, EIGENVALUES AND EIGENVECTORS FOR

SINGLE UTTERANCE OF "4 11 BY SPEAKER A


12 UTTERANCES OF "4" BY SPEAKER A


5 UTTERANCES PER SPEAKER ,"OF "4" BY SPEAKER

A,B,C.


2 UTTERANCES PER DIGIT OF DIGIT "l" TO "9 11 BY

SPEAKER A


1 UTTERANCE PER SPEAKER PER DIGIT OF "1" TO

"9" BY SPEAKER A,B,C

CHAPTER 1 INTRODUCTION

Speech processing has a history long before the invention of

the telephone in 1876. Beginning in the late 18th century,

researches had already been done on the engineering aspect

of speech. Ini tial interests concentrated on mechanical

speech synthesizers which tried to produce human voice

mechanically. [Furui 89J This led to the further and deeper

research on vocal vibration and hearing mechanisms. Today,

speech recognition ~s one of the hottest topics in

artificial intelligence research. Also, speech synthesis and

storage is beginning to move from academic research to

commercial applications (eg voice mails, tapeless recording

phones, electronic dictionaries) .

Despite some initial success, the complexity of sound

waveform is still posing great difficulties in the research

and development on speech. To cope with the problem,

scientists and engineers thus used a lot . of other methods to

represent human voice with finite discrete parameters

instead of dealing directly with the complex and continuous

sound waveform. Many models exist. However, no matter which

kind of modelling is used, a considerable number of

parameters are still needed to closely represent the actual

human voice. Human's visual experience limits our ability to

deal with dimensions higher than three. Therefore,

difficulties in analyzing the human voice data still exist.

It is still difficult for human to deal with the multi-

1

dimensional parameters di~ectly. Though some advances have

been made recently, statistical analysis on mult.i-variant

data still has not reach to its maturity yet. Multi

dimensional data remains an obstacle to human for further

analysis.

On the other hand, even at the early stage of speech

research ( eg. Flanagan 1965 ), researches have long

suggested that the actual speech data may be close to 2

dimensional in nature. Is it possible that the speech data

lie on a hyper-surface in the original multi-dimensional

space ? How can we reduce this multi-dimensional data to a

2 or 3 dimensional ? How can we find the distribution and

characteristics ? These are the main questions which a

proper visualization of the speech parameter space may

enable us to answer.

If speech parameters can basically be represented in 2

dimensions, say, they lie ON a hyper-surface, it will be a

major discovery in speech research. Speech data can

theoretically be compressed to 2 parameters only. It will

save a lot of storage spaces and transmission times for

speech data. 2-dimensional data are much easier to

manipulate in speech synthesis and in other analytical work.

Also, a 2-dimensional data space will makes speech

recognition a much simpler job. The system only has to

recognise the 2 dimensional point only. Therefore, a major

breakthrough in speech research will definitely emerge.

2

This thesis is divided into 8 chapters. As already seen,

Chapter 1 provides · the basic background of the· problem.

There are many possible ways to represent speech data in

discrete parameters. These methods will be presented in

Chapter 2. The final chosen forms of speech representation

parameters - namely LPC (Linear Predictive Coding) and LPC

Cepstral Coefficients are discussed in more detail. General

methods to handle multi-dimensional data are discussed in

Chapter 3. The methods for reducing dirnensionality for

visualization used in this thesis is SIR (Sliced Inverse

Regression), which will also be discussed in detail. Further

spotting of other features in speech data will be aided by

the human eye. The actual programs and test data are briefly

explained in Chapter 4. The actual results using these

programs and test data are presented later in the chapter.

Discussion of the results will also be included. A final

review of the whole thesis will be concluded in the last

chapter - Chapter 5. Suggestion on further works will be

raised in this last chapter.

3

CHAPTER 2 REPRESENTATION OF SPEECH DATA

Human voice signals are very complex, even for a single

vowel sound. Our vocal system is continuously changing

during our utterance of a simple voice. This can be

considered as a complex time variant system. For the sake of

simplicity in the initial phase, researchers approximate

human voice production as a time invariant system for short

intervals. considering the actual data of voice signals, we

can see that signals change very little within approximately

20 ms in general. within 20 ms, the system can be

approximate by a time invariant one. Many researchers also

use this frame width of 20 ms for analysis. However I in

practical situation, a sharp edge window will not be used.

Abrupt changes at the frame boundaries will usually cause

large inaccuracies in the mathematical analysis. Therefore,

researcher will usually slightly increase the frame length,

but keep approximately the same frame advance, and add a

belt shaped , weighing window to smooth the frame

boundary. [FIG. 1)

2.1 SAMPLE DATA REPRESENTATION

A direct way to represent speech data will be simply use the

sampled value of the speech data. [FIG 2J The sampling rate

being the Nyquist rate. For a 20ms frame and a sampling rate

of 8KHZ, for instance, the number of parameters (sampling

points) will be 160. This is obviously excessive. Moreover,

the dimension will be dependent on the frame length and the

4

VO

ICE

SI

GN

AL

FR

AM

E

ABR

UPT

W

IND

OW

ACTU

AL

WIN

DO

W

20 m

s 2

0m

s

-..1

·"·

L

FIG

. 1

FRA

MIN

G

AN

D

WIN

DO

WIN

G

OF

TH

E

SPE

EC

H

DA

TA

IN

TIM

E

DO

MA

IN

5

sampling rate. This raw' data approach is definitely not a

good representation of the speech for further analysis.

2.2 ANALOG LINEAR SYSTEM MODEL

Before the invention of PCM and digital signal processing,

researches are limited to analog systems. Early systems had

tried to break down the voice production system and imitate

their functions separately. Using mechanical and/or

electrical means, researchers tried to model the lung in

analog form, vocal tract, vocal cord, aural and nasal

cavities, lips and tongues etc. and produce human voices

correspondingly. In actual human voice production, air flow,

glottal contraction rate and tension, resonance in the

cavities, and other effect on lips and tongues all together

make up an extremely complex system. Many different models

exist - Linear Separable Equivalent Circuit Model, Vocal

Tract Transmission Model, Vocal Cord Model, Articulatory

Model etc. Each of them views the system on a di.fferent

perspective and emphasizes different aspect of the actual

system. When modelling anyone of them, quite a few number

of parameters are involved, eg. air-flow rate, dimension of

the glottis etc .. Initially, some apparent success has been

accomplished in speech· production, though the resultant

artificial voice produced are still not very good. To

improve the result, more parameters and more fine tuning on

parameters are needed. However ,the improvement attained is

far from proportional to the effort put in. As the number of

parameters is increased, it becomes difficult to fine tune

7

them because the outcome voice is not independently

correlated with one parameter. Adjusting one parameter will

affect the influence of some other parameters. Thus the

system becomes very difficult to fine tune.

Recently, many researchers put aside the direct modelling

of the human vocal system due to its gaining complexity.

Instead they concentrate on representing the overall

resultant speech data. Mathematical tools are used instead

of direct analog means. Examples of such are OFT, band-pass

filtering, LPC and Cepstral transforms etc.

2.3 DISCRETE FOURIER TRANSFORM

The simple Discrete Fourier Transform (OFT) faces the same

problem with the raw data approach - too many dimensions and

dependency on frame length and sampling rate. The difference

is that one is in the time domain and one is in the spectral

domain.

2.4 FILTER BAND REPRESENTATION

Band-pass filtering is an example widely used commercially

in vocoder due to its simplicity and low cost. Each band

fil~ers out a band spectrum. A value corresponding to the

energy within that band is thus read as a parameter. The

number of bands will determine the number of parameters.

Both analog or digital filters can be used.[FIG.3] However,

to be accurate, a lot of bands are still needed. Moreover,

as the band gets narrower, the band-pass filters have to be

8

l..-

Q

"' '- Q) c:

o -g

--0

(]

) "- c..

en

::E

Cl:

1.0

0.8

0.6

0.4

0.2

0'

4 8

U n

vo

ice

d s

pe

ech

----

----

----

----

----

----

----

----

----

----

----

_.

Vo

ice

d s

pe

ech

12

P

16

2

0

FIG

.)

BLO

CK

D

IAG

RA

M

OF

BA

ND

-FIL

TE

RIN

G

9

steeper in their charaeteristics. This poses a practical

problem in this approach. If digital filter is used, there

will be a conflict betwe'en the time resolution and the

frequency resolution.

Besides DFT, other forms of digital signal processing are

also popular. DSP becomes popular recently mainly because of

the invention of PCM and the advances in digital technology.

One of the most popular method used in representing voice

data in recent years is the Linear Predictive Coding

coefficients.

coefficients.

Another

These

choice which

two methods

is the Cepstral

of speech data

representation are used in this research and will be

discussed in more detail in the following sessions.

2.5 LINEAR PREDICTIVE CODING (LPC)

The two most important papers to appear since 1960 were

presented at the 6th International Congress on Acoustics

held in Tokyo, Japan, in 1968

analysis-synthesis system based

method presented by NTT I s

the paper on a speech

on the maximum likelihood

Electrical communications

Laboratories, and the paper on predictive coding presented

by Bell Laboratories. Th~se papers essentially produced the

greatest thrust to progress in speech information processing

technology; in other words, they opened the way to digital

speech processing technology. specifically, both papers deal

with the information compression technique using the linear

prediction of speech waves and are based on mathematical

10

techniques for stochast±c processes. These techniques gave

rise to Linear Predictive Coding (LPC) . [Furui 89]' Now, LPC

is a very widely used methods to represent voice data both

in research areas and commercial applications. A lot of

speech recording / talking chips in the market employ LPC or

its variations in their algorithms. (eg Analog Device Inc.,

DSP Group Inc., Zilog Inc. etc.) [HKPC92] The basic

principle of LPC works as follows.

Assume discrete signal samples Xl' X2 ,X3 , X4 •••••• where

sampling frequency Fs > 2Fb ,the signal bandwidth. Let us

assume the following first-order linear combination between

the present sample value Xl and the previous p samples,

where € is an uncorrelated statistical variable having a

mean value of 0 and a variance of 02.- The sig:nal Xl can thus

. be thought of "linearly predicted" by the previous values Xl - i

with an error term of E. By adjusting the weighting

coefficients ai' it is possible to minimize the rms value of

€. The a· thus obtained are the LPC coefficients. The diagram I

of a LPC synthesizer is depicted in Fig 4. The value of p is

called the order of the LPC analysis. The greater the p, the

smaller the residual error €. However, for band limited

data, the values of the are significant for the first 10 -

14 LPC coefficients. Even for much larger value of p, the

coefficients become relatively insignificant after the 12 th

11

Cl)

H Cl)

X ~ ::c E-i Z >t Cl)

U • • ~ ~

~ 0 N

U M

H E-i

~ ~ ::c u Cl)

~

W t!) H ~

coefficient. Usually these coefficients will be at least an

order less than the first few coefficients. Moreover, the

reduction of residual error is also relatively insignificant

for the order larger than 12.[FIG. 5][TI 1987] Therefore,

for practical purpose ,many researchers will use the ' order

of 10-14 for analysis. The order that we use in this

research is 12. These 12 coefficients will then be the

parameters of a short interval (assuming time invariant) of

human voice. In other words, we will be dealing with 12-

dimensional data.

To obtain the LPC coefficients, ie obtaining aI' a2 , ••• ap

which minimize the residual error €, a set of linear

simultaneous equation has to be solved. These can be solved

by covariance and autocorrelation method by using Cholesky

decomposition and Durbin's recursive solution respectively.

The former method is more efficient than the later one. But

the former one is less accurate when the number of samples

are small and temporal variation large. [Furui 1989]

Therefore, ln our research, we will use ' the later method for

our analysis.

2.2 LPC CEPSTRAL COEFFICIENT

Cepstral transform is technique used to separate spectral

envelope and spectral fine structure. It is defined as the

inverse Fourier transform of the short-time logarithmic

amplitude spectrum. [Bogert et al., 1963; Noll, 1964, NolI

1967] The block diagram of the cepstral transform is

depicted in Fig 6.

13

'- e ~ <D

c:

o ~

"0

<D

'- 0

-C

/)

~

0:

1.0

1\

\ , \ \ \

0.8

\

0.6

0.4

0.2

0'

\ \ \ \ \ \ '\ "',

'"

4

" "

Unv

oice

d sp

eech

-'---

----

----

----

----

----

----

----

----

----

-

8

Voi

ced

spee

ch

12

P 1

6

20

FIG

. 5

VA

RIA

TIO

N

OF

TH

E

RMS

PR

ED

ICT

ION

ER

RO

R

WIT

H

TH

E

NU

MB

ER

OF

PRE

DIC

TO

R

CO

EF

FIC

IEN

TS

, P

" 1

4

-I « z (!J -Cl')

z - . - -

--I-

'- Ll.. '-/'

0 /

--

CD tL: 0 . " / 0 -1 -" /

\D .

to r-f

The basic concept of cepstral transform is as follows. If

the magnitude of the Fourier transform of a signal is a

multiplication of two functions, an envelope component and

a fine structure component, then taking the logarithm of the

spectrum will convert the multiplication relationship to an

additive one. The inverse Fourier Transform will then

separate the two functions.

x(t) = a(t) * b(t) (convolution)

F(x(t)) = X(w)

= F(a(t)) F(b(t)) (multiplication)

= A(w) B(w)

log(X(w)) = log(A(w)) + log(B(w))

F-1 ( log (X (w) )) = F-1

( log (A (w) )) + F-1 ( log (B (w) ) )

Thus the cepstral transform will separate these two spectral

components. The spectral envelope portion will correspond to

significant coefficients on the first few cepstrum

coefficients. On the other hand, the smaller spectral'

variation will be represented by later smaller coefficients.

Thus the cepstral transform essentially separate two

convolutionally related properties by transforming the

relationship into a summation. This is a kind of homomorphic

analysis or filtering. [Oppenheim and Schafer 1968]

Actually, the term cepstrum is obviously includes the

meaning of the inverse transform of the spectrum. The

independent parameter for the cepstrum is called quefrency,

16

which is also obviously' formed from the word frequency.

Since the cepstrum is the inverse transform of the frequency

domain function, the quefrency becomes the time-domain

parameter. The separation of the spectral envelope and the

fine structure is termed liftering, obviously derived ' from

filtering.

Now, instead of cepstral transforming a time-domain signal,

we transform the LPC coefficients. The coefficients obtained

are the LPC cepstral coefficients. Perhaps only the first 3

or 4 coefficients are significant due to the envelope

extraction property of cepstral transform.

The cepstral coefficients must be differentiated with the

LPC cepstrum transform, which is a cepstral transform

applied to time-domain speech signal using z-transform of

the impulse response of an all-pole speech production system

estimated by the LPC analysis instead of using Fourier

transform as abovementioned.[Furui 1989).

For simplicity, in the following part of the thesis, the LPC

cepstral coefficients will be simply called cepstral

coefficients and the transform will be simplY called

cepstral transform.

17

CHAPTER 3 l\1ULTI-DIMENSIONAL ANALYSIS

Given the 12 LPC coefficients or the cepstral coefficients,

How can we analyze them ?

Are there any interesting features that lies

within these 12 dimensional data ?

How can we know if all these points may lie on

some hyper-surface ?

3.1 PURE GRAPHICAL TOOLS

Human eye has long been the most useful tool in spotting

distribution characteristic, provided that the data are

presented well. A graph is often a better summary than

thousands of words or thousands of numerical figures in

tables. Line graphs, bar charts-, pie charts, etc. have long

been used to depict 2 dimensional data. Many ways have also

been devised to present multi-dimensional data. These

methods include multi-histograms, stars, spiked scatterplot

[Gower'1967] , glyphs [Anderson 1960], boxes [Hartigan 1975],

Andrew's curve [Andrew 1972], and Chernoff's faces [Chernoff

1973].

3.1.1 MULTI-HISTOGRAM

The most natural and the simplest way of representing a p-

dimensional observation is by using p vertical columns or

bars. The bars are placed next to each other with the

heights respectively proportional to the p-th dimension

18

observations. Such a bar diagram is constructed for every

observation vector. Alternatively the midpoints of the bars

can be joined to give a more continuous appearance to the

profiles. An example in 6 dimension is depicted in FIG 7.

3.1.2 STARS

Star or polygons represent the p measurements of a case on

equally spaced radii extending from the centre of a circle.

Any negative measurements can be adjusted by a suitable

transformation. The mea$ur~me~ts ·~re · then linked to form a

star. Such a star can now be formed for every observation

vector. This star representation is also called circle

diagram or snowflakes. The same 6 dimensional points are

shown in Figure 8. [Everitt 1978]

3.1.3 SPIKED SCATTERPLOT

The 2-D scatterplot has long been a standard representation

for 2 dimensional data. Therefore, modifying the 2-D

scatterplot diagram to accommodate more dimensions seems to

be an obvious choice. For example, [Gower 1967] reports a

method due to Ross in which the first two variable values

for each observation are plotted in the usual way to give,

for a particular observation, a point R, say. The third

variable value for this observation is now represented as a

line with length proportional to this value originating from

R and directed eastward if the value is positive and

westwards if negative. Similarly the fourth dimension uses

the north and south directions, and if five and six

19

,.--

,.--

,.--

r---

-r-

--r-

--r--

~

r--

'--

r--

---~

r--

r---

-r-

--r----

r--

r--

r--

--

r--

r--

f--

r---

r--

x , X

1X aX

4X

sX ,

X ,X!X~4X5X$

X, X

tX,x

4X V

C,

x , x

zX ,x

"x ,X

, X

1X IX

aX"X

sX

.

X,X

1XsX

4X

,X,

X, X

!X,x

.X ,x

, X

, X zX

sX..x

,X,

X , x

zX sX

.J( ,X

, X

,X tX

sX .. X~.

FIG

7

MU

LT

I-H

IST

OG

RA

MS

(PR

OF

ILE

S)

OF

FIV

E

6-D

IME

NS

ION

AL

P

OIN

TS

20

1 2

3 4

5 X

2 X

, r-----..

. X e

(--1::

---7

(-----

" )

~

/',--

--(-

---~

---7

'

,

/ \

--\-

~

~

x ..

x'

~\J

~

5

FIG

8

STA

R

RE

PR

ESE

NT

AT

ION

O

F F

IVE

6-

DIM

EN

SIO

NA

L

PO

INT

S

21

dimensions

directions

are required

could be used.

the N . E. / S . W . and N • W. / S . E.

Figure 9 shows a set of 6

dimensional data as in above.

3.1.4 GLYPHS

Anderson (1960) proposed that the measurements of each of

the p variables in a specific case be represented as rays

from a circle. These representations are known a glyphs.

This representation differs from the star representation

as discussed ·p~evi.ously .. Firstly, all the rays are drawn

from the top of the circle to the outside and secondly, the

end points of the rays are not linked. Such a glyph is then

drawn for every observation vector. An example is depicted

in Figure 10.

3.1.5 BOXES

Hartigan (1975) recommended that boxes be used for

representing multivariate data. If p = 3, each observation

vector is represented by one box with the length, width and

height corresponding to the Xli X2 ,and X3 observations. All

negative values are made posi ti ve by the addi tion of an

appropriate constant. For p > 3 a single side of the box can

be used to represent more than one variable by dividing the

side into segments. This is shown in Figure 11.

3.1.6 LIMITATIONS OF THE BASIC METHODS

All the above methods are relatively naive methods and can

be applied only if the total dimension is small, say 5 to

22

X2

~

L .

1 ~

~

X1

Xs

X6~ X4

~

XS

FIG

9

SP

IKE

D

SCA

TT

ER

PLO

T

OF

FIV

E

6-D

IME

NS

ION

AL

P

OIN

TS

23

x x

Xa

)(sI

I'

X3

~I

X1 FIG

1

1

BO

XES

R

EPR

ESE

NT

AT

ION

O

F F

IVE

6-

DIM

EN

SIO

NA

L

DA

TA

25

6, and are mostly used in' observing simple tendencies or for

simple comparisons. For dimensions larger than 8, it is

extremely difficult if not impossible to observe any

tendencies or clustering. Moreover, the result depends

heavily on the ordering of the dimensions. If the dimension

ordering is chosen well, one may observe interesting feature

easily. On the other hand, if a poor ordering is used, even

a simple clustering will be difficult to spot. Anderson

suggested that his variables in glyphs should be ordered

such that highly correlated ones should be as far as

possible with adjacent rays (as depicted in Figure 10). For

box representation it is also clear that grouping together

of highly correlated variables on one side of a box reduces

the variation between segment lengths of a side for

individual boxes.

More complex representation which can accommodate higher

number of dimensions and less succeptable to the ordering of

the dimensions are Chernoff Faces and Andrew's Curve.

3.1.7 CHERNOFF FACES

Considerable interest is being shown in the representation

of multi-variate data points by means of faces (or rather

characteristics of the human face). It is suggested that

human are accustomed to distinguishing accurately between

faces with different features and widely divergent facial

features can be used to associated with different variables.

For example, Xl can be associated with the size of the mouth,

26

X2 with the size of the "nose ........ Computer program can

easily be written to generate these faces corresponding to

different multi-dimensional vector. A simple example is

shown in Figure 12.

3.1.8 ANDREW'S CURVE

Instead of using direct association, Andrews (1972) proposed

the use of more complex harmonic functions for presenting

mUlti-variate data points. He introduced the function

Two important properties of the Andrew's function are

1) The function representation preserves means. This

implies that the function of mean vector of the

observed vectors Xl ,X2 , •••• Xn is the pointwise mean

of the functions fl (t) ,f2 (t) .... f3 (t) .

2) The function representation preserves distances.

It can · be proved that the distance between two

functions fl(t) and f 2 (t) measured by

i: [f1 (t) -f2 (t)]2 dt

is proportional to the Euclidean distance between

the observa tion vector XI and X2 •

These two properties give rise to the fact that Andrews'

curves are useful in clustering observation points in

homogeneous groups or to compare individual functions with

27

Var

iabl

e F

eatu

re

1 N

W a

ngle

of f

ace

c:=

J

I~

2 N

E

angl

e of

face

3

Left

eyeb

row

o 0

4 R

ight

eye

brow

5

So

cke

t o

f le

ft e

ye

6 P

upil

of le

ft e

ye

A

7 S

ock

et

of

rig

ht

eye

8

Pup

il o

f ri

gh

t eye

N

9 S

ha

pe

of no

se

10

Sh

ap

e o

f m

ou

th

/j\

11

SW

an

gle

of f

ace

1

2

SE

an

gle

of f

ace

FIG

12

AN

EX

AM

PLE

OF

A C

HER

NO

FF

FAC

E

28

110

100 I

J( •

90

eo

70

60

60

<40

30

20

10 0

-10

-20

-30

....«

)

-s.o

-2

.0

-1.0

0.

0 1.

0 2.

0 3.

0

FIG

1

3

AN

DR

EW

S'

CU

RV

E FO

R

FIV

E

6-D

IME

NS

ION

AL

P

OIN

TS

29

the mean function. An example of which is shown in Figure

13.

3.1.9 LIMITATIONS OF CHERNOFF FACES AND ANDREW'S CURVE

still, both Chernoff faces and Andrews curves suffer a major

drawback as all the abovementioned representations. When the

number of data points are in the order of hundreds or

thousands, it become almost impossible for human eye to deal

with so many figures, whatever it may be. Also, all these

representations are good for clustering analysis. For other

characteristics such as whether the points lies on a

hypersurface, all these methods seems useless. Some other

method should be investigated.

3.1.10 SCATTERED PLOT MATRIX

The famous scattered plot matrix is one of the most widely

used practical tools for multi-dimensional data analysis.

[Cleveland 1987] It simply arrange each pairs of 2-

dimensional scatterplot in a matrix as in Figure 14. It is

good for viewing clustering and is handy for data

manipulation. However, for viewing the characteristic of the

data other than clustering, scatterplot matrix becomes less

useful. Imagine what would be seen in the scatterplot matrix

if the data lie roughly on the surface of a hypersphere.

Every plot will look similar as a single round cluster. A

great deal of intelligent (and lucky) data manipulation

guesses has to be done to figure out this hypersphere

30

-..,

• D1

•

• •

• D

2 •

• •

• 0

3

• •

• • •

04

•

• •

• D

5 •

•

FIG

1

4

SCA

TT

ER

PLO

T

MA

TR

IX

OF

A S

ING

LE

5-

DIM

EN

SIO

NA

L

PO

INT

31

characteristic. Moreover,' for interactive data manipulation

using scatterplot matrix, the total dimension should be less

than 7 or 8. The total plots will be 64 when the dimension

is 8. Even if we neglect the upper diagonal plots, (because

they are the transpose of the lower diagonal) the total

plots remains 28. Our human vision and mind is almost

impossible to handle so many plots at one time, even though

we have a large computer screen to accommodate tens or even

hundreds of scatterplots, not to mention the use of screen

scrolling to viewtn practice. For our speech data analysis I

the total dimensions in our case is 12. Therefore

scatterplot is not a suitable tool for our analysis.

3.1.11 PARALLEL-AXIS SYSTEM

Another mUlti-variant visual tool is the parallel-axis (or

parallel-coordinate) system. [Inselberg & Dimsdate 1987]

Dimension are represented by evenly spaced vertical axis.

The value (projection) on that dimension is represented by

an intersection on that particular axis. Lines are then used

to connect all those intersections. Therefore, a real point

in the actual multi-dimensional space will be represented by

a line pattern in the parallel coordinate system. (FIG 15)

This system is a compact representation system which

compresses the multi-dimensional information into a single

2-dimensional plot. However, the plot becomes messy if the

number of data points is large, say a few hundred. It will

then be difficult to spot interesting characteristics on one

32

screen even though enlargement of a small window is

allowed. Moreover, the characteristic which can be spotted

from this parallel coordinate system is limited. For

example, you may be able to spot that the characteristic of

the data points lies on a hypersurface provided that you are

sure that the hypersurface is a convex conic. For a convex

conic, the envelope of the plot will be an open ended

conic. (FIG 16) However, if the hypersurface is not known to

be a convex conic, then even we have an open ended conic

envelope, no conclusion can be made. We have to dig into

those individual lines instead of simply the envelope for

more detail. The parallel axis representation may find its

usefulness in other application area such as air traffic

control (including the dimension of time, the total

dimension is 4.) The research of the parallel axis system is

not so mature as to provide a simple and general viewing

method for spotting interesting characteristics yet.

Therefore, this tool is still not a good enough good choice

for our purpose.

3.1.12 COMMON BASIC PITFALL

The main basic problem with the above methods is that they

try to preserve all multi-dimensional details in their 2-

dimensional representation. Too much information will appear

on the 2-dimensional representation and it will become

difficult to spot any interesting characteristics. Some sort

of information compression should be used.

33

D1

D2

D3

D4

D5

D6

/

FIG

1

5

A S

ING

LE

6-

DIM

EN

SIO

NA

L

PO

INT

U

SIN

G

PAR

AL

LE

L

CO

-OR

DIN

AT

E

RE

PRE

SEN

TA

TIO

N

34

01

02

0

3

04

0

5

06

FIG

1

6

6-D

IME

NS

ION

AL

PO

INT

S O

N

A C

ON

VEX

SU

RFA

CE

U

SIN

G

PAR

AL

LE

L

CO

-OR

DIN

AT

E

RE

PRE

SEN

TA

TIO

N

35

3.2 PURE PROJECTION METHODS

Another way to analyze multi-dimensional data is by the use

of numerical statistics. The usual numerical tools in

statistics are in some form or another of a projection

pursuit. Higher dimensional data are transformed to lower

dimensional through some numerical statistical

manipulations. Through the transformation, some sort of

information compression is achieved. This prevents the

pitfall of _putting all detail multi-dimensional information

to a 2 dimensional plane. Principal components analysis,

principal co-ordinates analysis and regressions are the most

widely used statistic tools.

3.2.1 PRINCIPAL COMPONENTS ANALYSIS

Principal components analysis is described in detail in many

textbooks of mUltivariate analysis. It is usually used to

obtain a low-dimensional (usually two to three dimensional)

representation of mUltivariate data so that the data may be

examined visually and any structu~e identified. Essentially,

principal components analysis consists of finding linear

transformations, Yl' Y2' •••• Yp of the original variable

Xl,X2' .. ~.Xp that have the property of being uncorrrelated.

The y-variables are chosen in such a way that Yl has maximum

variance'Y2 has maximum variance subject to being

uncorrelated with Yl and so on. The transformation is

obtained by finding the latent roots and vectors of either

the covariance or the correlation matrix. The latent roots,

arranged in descending order of magnitude, are equal to the

36

variances of the corresponding y-variates, these being of

course, the principal components. The proof of this can be

found in many text books such as [KENDALL 1975].

An alternative method of viewing principal component

analysis is that given by Gower (1967) who shows that the

first principal component may be regarded as the line of

best fit (in the least squares sense) to the n,p-dimensional

observation in the sample. These observations may therefore

be represented in one dimension by taking their projection

onto this line. A derived and improved representation is

given by the projection of the observations onto the plane

defined by the first two latent vectors of the covariance or

correlation matrix. Similarly, the first p latent vectors

give the best fit in the p-dimension. In each case the

original observation, may be projected into the lower

dimensional space to give a set of points which acts as an

approximation to the configuration of the observation. Once

the first 2 principal components are found, the data are

transformed accordingly and can be easily plotted on the 2-D

graph.

3.2.2 PRINCIPLE CO-ORDINATES ANALYSIS

Gower (1966) describes a ·procedure which he terms principal

co-ordinates analysis, which is similar in some respects to

principal components analysis, but leads to a set of

projected points whose distances apart may be allowed to

reflect non-Euclidean distances between the original

37

observations. In situation where the appropriate distance

measure between observations is considered to be something

other than Euclidean this would obviously be a very useful

procedure. Therefore, principal component analysis can be

thought of a special case in principal co-ordinate analysis.

Both methods described above (principal components and

principle co-ordinates) are excellent if the data are nearly

collinear or coplanar. For curved data, these methods may

still suggest a good transform for viewing data. One problem

of these method is that they are quite static. Either you

spot something interesting or you does not. Not much

suggestion is provided on what the final hypersuface is even

though it is spotted. Moreover, not much space is open for

further analysis.

3.2.3 REGRESSION ANALYSIS

Regression is among the most widely used methods in

statistics. It- is hardly anything new. Linear regression

appears in many college text books. The basic concept of

regression is that it tries to represent one of the

dimension in terms of the others.

xk = f (X1 'X2 - •••• Xk_1,Xk + 1 , _ •• xn ) + 0

where 0 is a random error term.

The chosen xk is usually denoted as y, the regressand. The

remaining dimensions of x, usually called the regressors,

are then grouped as vector X. The y value is then

"predicted" by the vector X (providing that 6 is small).

38

The process of finding the function is usually transformed

to a minimization problem. A typical least square method is

to minimize the variation of predicted y and the actual y,

ie LS2.

For parametric regression, some form of the function is

assumed. For example, in linear regression, y is expressed

in terms of linear combination of x.

or y= AX (where A is a vector)

In non-linear parametric regression, we need to assume

beforehand some predefined orthogonal functions. Eeg Box and

Cox 1964] Needless to say, the result depends heavily on the

choice of the function. In our case of complex speech data,

we may then need to blindly try many different functional

models in search for interesting characteristics.

Non-parametric regression make no assumption of a parameter

model. Instead, it generally only assumes that the function

belongs to some infinite dimensional collection of

functions. Examples of such are polynomial, Fourier series,

and smoothing spline estimator [Eubank 1988].

However, usually the non-parametric regressions are

extremely computational intensive. Usually many iterations

are needed for the function estimation (potentially infinite

number of components). To make the matter worse, usually a

39

large number of data points are needed. This is time

consuming even if the aid of modern computer and may not be

applicable for small data set such as one single person

speaking a single word. Moreover, during the estimation,

some smoothing parameter has to be carefully chosen in order

to be accurate. Finally, there are still basic assumptions

that lie behind non-parametric regression. The curve are

usually be assumed to be continuous, differentiable or

differentiable with a square integrable second derivative.

This assumption may be doubtful in speech data where

traditionally a clear distinction is made on frictitious

sound and non-frictitious vowels.

The advantage of using regression is that it offer some

information of the modelling function. Moreover, it has the

flexibility to choose any dimension as regressand and can do

some manipulation on regressand and regressor separately

(such as taking the logarithm of either the regressand or

the regressors) .

Pure graphical methods tries to presents all information

contents in 2-dimension. Parallel axis presents all

projections along the axis. They are hard to analyze in the

context of this thes is. Each of the proj ection methods

discussed above, principle component, parametric and non

parametric regression has its imperfection. Is it possible

to have a method which is applicable to our speech data

while skipping the disadvantages of the above methods ? This

40

problem was answered by Ker-Chau Li in his Sliced Inverse

Regression method [Ker-Chau Li 1989].

3.3 SLICED INVERSE REGRESSION (SIR)

The usual method used in classical regressions is the

Forward Regression, with the goal being the approximation of

the response surface E(ylx). [Friedman and Stuetzle 1981,

Donoho and Johnstone 1985,Hall 1988, Chen 1988] The

statistics manipulations is to find the best projected

variables for predicting y. There is ,another relatively new

regression method called the Inverse Regression [Li 1988, Li

1989]. It works from y backward and towards its relationship

with each dimension in X and tries to estimates the inverse

regression curve E(Xly). Basically, it searches for the best

two or three uncorrelated projected variables that can be

predicted most accurately from y. It does not try to obtain

the regression function as in ei ther parametric or non

parametric functions. It simply try to obtain the best view.

SIR analysis procedure is as follows

1) Choose an arbitrary dimension y, the remaining

dimensions will be grouped in vector x.

2) Divide the range of y into H slices.

3) within each slice compute the sample mean of x,

denoted by Xh , h=l, ... , H

4) Compute the sample covariance of x,

41

n ~xx = L

i=l

and the weighted covariance for the slice means .

H

~1'\1'\ = L Ph (Xh-X) (Xh-X) t h=l

where Ph is the proportion of cases that fall into

the slice h, and x is the sample average of Xi'S.

5) Conduct an eigenvalue decomposition of the

weighted covariance L ~~ for the slice means with

respect to the sample covariance Lxx.

6) Order the eigenvectors according to the descending

order of the corresponding eigenvalues and pick

the first two ( ie with largest eigenvalues).

7) The two eigenvectors picked will be the necessary

transformations to the original dimensions.

Basically, the method first obtain a rough estimation of the

response curve be slicing y and obtain the E(X) for each

slide (step 1-3). It then proceed to perform a principle

component analysis on X (step 5-7). The sample covariance in

step 4 and 5 is for standardization of X and then transform

back to the original dimension after the eigenvectors are

42

obtained. (Without standardization, the resultant principle

components will be scale variant of the original

dimensions.) with the regressand and two transformation

axis, the original data can be transformed and be view in 3-

D graphics, which can be observed directly by human eye. The

resultant view will be the best view for that particular

regressand.

with the principle component analysis as background, the

eigenvalues obtained will correspond to the variances of the

original data points to the corresponding edr directions

(the eigenvectors). Therefore, they serve as indication to

the significance of the resultant transformation. If the

largest two eigenvalues are not very distinctively larger

than the rest, the resultant best view may not be as good as

expected. Therefore, the relative sizes of the largest two

eigenvalues with respect to the others serve as an

indication of meaningfulness. Further analysis is needed

should the eigenvalues are close to each other.

Li had done extensive work on mathematically proving that

SIR works as it should and had provided the necessary

mathematical foundation. He also demonstrate by using many

sets of sample and real life data [Li 1988,1989].

SIR combines the advantages of numerical methods and

graphical tools in spotting interesting characteristics of

speech data. On the numerical analysis side, SIR has the

43

flexibility and informativeness of regression analysis. All

the dimensions can be chosen as regressand for analysis.

Regression analysis provide more information about the

actual modelling function than pure principle component

analysis. Moreover, further transformation on either the

regressors or the regressand can be done separately. This

will facilitate further analysis of the speech data. It does

not need to assume the modelling function beforehand as in

parametric regression.

Comparing to most other non-parametric regression, the best

feature of SIR is the computational simplicity. Looking at

the algorithm, one easily finds that the most expensive

steps are just one round of sorting of n values and one

round of eigenvalue decomposition of a p by P matrix with

respect to another. In contrast to many other non-parametric

regressions, huge number of iteration is not needed. As a

result, this can be easily implemented on a very small

machine, even pc.

Another advantage of SIR is that it is less dependent on the

H, the number of slices on y. The choice of H acts like a

smoothing parameter in the nonparametric curve fitting

problem. [c.f. Eubank 1988, Li 1984,1985,1986,1987, Hardle,

Hall and Marron 1988]. For the nonparametric regression or

density estimation, if the smoothing parameter is not

carefully selected one may have inconsistency in estimation.

But for SIR, the result is less sensitive the choice of H.

44

A large range of H can be used and still have root n

consistent. This is shown both in Li's paper [Li 1989] and

in our later test data in next chapter.

The SIR does not predefined an exact form of the modelling

function as the parametric regression. It only assumes that

the function is in the form

which is a much less stringent condition on the modelling

function.

with the smoothing parameter introduced in SIR, the

resultant view is better if the final formula is actually in

the form

comparing with pur~ principle component analysis. Principle

component tends to manipulate all data point at once. The

local smoothin9 effect of slicing in SIR will provide a more

subtle result. This is actually conf irmed on some sample

data. with the same set of data points, the best view

obtained by principle component analysis tends to have a

larger spread than using SIR.

SIR is an inverse regression algorithm. There is also

advantage of the inverse regression over the forward one.

45

When the total dimensions is large, a huge number of the

data points are need in order not to have the points

sparsely located in X. Since forward regression, which tries

to obtain E(yIX), basically works on X, a large number of

points are needed to make the statistical manipulation

meaningful. On the other hand, inverse regression, which

tries to obtain E(Xly), works only from y to each individual

dimension in X, the requirement on the number of data is

much less.

Traditional regressions tend to dig into the detail of the

underlying formula [eg Van Rijckervorsel and de Leeuw 1988,

de Leeuw 1984, Breiman L. and Friedman J. 1985, Koyak 1987].

In parametric regression, trail-and-error assumptions have

to be made in order to obtain the detail formula. (try to

choose some orthogonal functions) . Non-parametric

regressions usually assumes infinite series functions and

involves complex calculations. On the other hand, the SIR

method only attempt to obtain the 2 best projections (to get

the best view on ~ne data). What the actual detail formula

is will be of no concern. This will save a lot of computing

time. And most important, the object of our experiment is to

find the best view only_ We want to see visually the shape

of the resultant points 'under those projected axes. The

detail formula is of lesser concern for the moment. If we

can visually see some interesting features or shape under

the projected axis, obtaining the detail expression will be

a much easier job. We will have a clearer picture of where

46

we should head and less blind assumptions will have to be

made. Therefore, SIR is chosen as our initial tool on the

speech data.

There is one theoretical pitfall in SIR. The major problem

of this algorithm occurs when the resultant 3D points is

more or less symmetr ical about the y axis. Under this

circumstance, the algorithm could not guarantee to project

the best view. For symmetry along the regressor axis, the

mean value on the transformed axis will be closed to zero.

This mean value will be used in the statistical process in

SIR. Therefore, small perturbation in the mean value will

induce large error. The resultant transformed axis will be

quite off from the actual best projections. This

disadvantage of SIR is confirmed when symmetrical data is

used. However, this does not

explained below.

affect our analysis as

First of all, ' this problem can easily be spotted by

reviewing the eigenvalues. If the data points are

symmetrical about the regressor axis, the resultant

eigenvalues will not show 2 distinctively larger values.

This indicate that either the data does not have a good

projection, or there is a possibly symmetry on the y axis.

Secondly, for real life speech data, it is highly unlikely

that the data points are symmetrical about the regressor

axis for every dimension as the y-coordinate. Therefore, it

47

is highly improbable that suitable projection cannot be

obtained no matter which dimension is chosen as the y co

ordinate. In the actual tests, every dimension in our speech

data will be used as the y-coordinates for SIR. In

conclusion, this problem in SIR should not pose any problem

in our analysis.

On the other hand, if it is fortunate, the resultant data

may be have only one significant eigenvalue and the

remaining eigenvalues are insignificant. Then the case will

resemble the simpler linear regression. Therefore, what SIR

can accomplish is a superset of linear regression, ie always

more powerful.

After obtaining the regression projections, the data points

are transformed according to the first two eigenvectors.

Including the regressor y, the data is then transformed to

a total of 3 dimensional. Viewing 3 dimensional data on 2

dimensional surface is not - a simple job. Li used the

rotating method to view the 3-D points on the computer

screen. This will let the viewer to see all possible views

and therefore will not cause any loss of information in 3-D.

The commercial product Macspin, as an instance, has provided

a user-friendly environment for graphical analysis of three

variables by rotation. Li used XLISP-STAT to perform his

rotation, which is more flexible than the commercial

Macspin. We will use the C programming to do job. As

described in the next chapter. This will allow almost

48

limitless flexibility for further investigation.

As a result, the SIR effectively combined the essence of the

many approaches - pure visual representation, dry numerical

analysis, principle component and regression analysis of

multi-dimensional data. with the aid of human eye, it is

hoped that interesting features can be obtained on the

speech data.

\

49

CHAPTER 4 DAT A ANALYSIS

4.1 PROGRAMS AND TEST DATA

SIR program has been implemented using MATLAB, which provide

many built-in functions for vector, matrix and statistic

manipulations. The hardware platform is the DEe station

2000. The program listing is attached in Appendix I. Due to

the power and compactness of MATLAB, and the simplicity of

the SIR algorithm, the program lis~ing is not that lengthy.

A set of points on a monkey saddle is used for testing. The

equation of the monkey saddle is

where

B: = (0.5,0.5,0.5,0.5,0,0,0,0),

B: = (0,0,0,0,0.5,0.5,0.5,0.5), Q = 2 , € = random (-50, +50)

This monkey saddle was also used by [Li 1989] and therefore

can be used as a cross check.

A small set of data (no. of data n=10,no. of slices h=5,no.

of dimensions d=9) had been tried on the program to test the

steps and results first. The resultant eigenvalues and

eigenvectors are listed in Table I. Each step in the program

50

was followed manually to ensure correct results. Due to the

small amount of data points involved, the eigenvalues and

eigenvectors obtained are not quite meaningful. However,

this serves the purpose of manually checking out the

program.

After checking that every step is correct for these 10 data

points, another 500 data points on the monkey saddle are

used. Dimension is kept to be 9. Various values for h has

been tried. The results for h=10, 30, 50, 100 are listed in

Table 2,3,4,5 respectively. The result agrees quite well

with the expected. The two eigenvectors corresponding to the

largest eigenvalues are closed to the original Bs. Also, the

result agrees with Li's proposal that h need not be large

for smoothing y. Only slight improvement is seen when h is

increased from 10 to 30. For h =50, the result is similar to

h=30 with a slight deterioration you may say. For h=100, the

result seems to deteriorate slightly more. It may be due to

the small number of points in each slice and hence affect

the statistical calculation a bit. It seems that h=30 for

n=500 is a reasonably good choice. This matches with Li's

51

EIGENVALUES

* *

D V1 V2 V3 V4 V5 V6 V7 V8

.542 .900 .900 5.66 -3.47 -1.53 -4.03 -6.66

E-15 E-15 E-16 E-17 E-18

EIGENVECTORS

* *

D V1 V2 V3 V4 V5 V6 V7 V8

Xl .139 .170 -.022 .162 .069 -.428 -.832 .421

X2 -.405 -.316 -.287 -.420 -.137 .320 .338 -.623

X3 -.358 -.121 -.317 -.229 -.078 -.137 -.025 .415

X4 .497 .632 .200 .506 .163 .082 . 133 -.017

X5 .386 .197 .198 .216 .129 -.333 -.309 .387

X6 -.082 -.142 -.706 .010 -.043 -.082 -.013 .025

X7 .174 .224 .270 .234 .106 .150 .154 .193

X8 -.506 -.589 -.374 -.622 -.245 .428 .020 -.267

TABLE 1 EIGENVALUES AND EIGENVECTORS

FOR THE TESTING MONKEY SADDLE WITH n=10,h=5.

52

EIGENVALUES

* *

D V1 V2 V3 V4 V5 V6 V7

.732 .279 .020 .014 .010 .006 7.61

E-4

EIGENVECTORS

* *

D V1

Xl .489

X2 .474

X3 .523

X4 .511

X5 -.004

X6 -.024

X7 -.015

X8 -.031

V2 V3 V4 V5 V6 V7

.007 -.093 .319 .418 .694 .156

.010 .077 -.331 -.482 .158 -.492

-.008 .125 .002 -.327 -.090 .253

-.087 -.000 .098 .362 -.659 .060

-.475 -.066 -.172 ·.158 -.068 -.428

-.537 .722- .046 .197 -.073 -.393

-.482 .002 -.605 -.104 .197 .545

-.485 -.119 .618 -.530 .032 .177


FOR THE TESTING MONKEY SADDLE WITH

n=500, h=10.

53

V8

.002

V8

.081

.361

-.746

.348

-.330

-.127

.177

.170

EIGENVALUES

* *

D V1 V2 V3 V4 V5 V6 V7 V8

D .831 .306 .071 .013 .019 .032 .051 .043

EIGENVECTORS

* *

D V1 V2 V3 V4 V5 V6 V7 V8

Xl .497 .010 -.201 .444 .403 -.623 .070 .073

X2 .486 -.095 .021 -.402 -.194 .130 .725 .101

X3 .517 .000 .307 .018 .398 .517 -.366 .332

X4 .498 .071 -.248 -.033 -.546 -.013 -.388 -.470

X5 -.008 .484 .580 -.444 .114 -.381 -.142 -.171

X6 -.031 .535 -.575 -.324 .402 .288 .001 -.219

X7 -.026 .474 .334 .577 -.095 .308 .391 -.190

X8 -.011 .491 -.158 .074 -.398 -.063 -.110 .735



n=50Q, h=30.

54

EIGENVALUES

* *

D Vl V2 V3 V4 V5 V6 V7 V8

D .845 .336 .037 .043 .055 .090 .072 .077

EIGENVECTORS

* *

D Vl V2 V3 V4 V5 V6 V7 V8

Xl .487 .019 .749 .166 -.292 .055 .205 -.180

X2 .488 -.070 -.478 .419 -.462 -.032 -.348 .078

X3 .517 .014 -.069 .126 .569 -.277 .321 .503

X4 .505 .048 -.138 -.630 .152 .346 -.158 -.359

X5 -.018 .454 -.269 -.097 -.127 -.543 .423 -.463

X6 -.027 .551 -.236 -.072 -.199 .574 .393 .338

X7 -.025 .512 .223 -.221 -.119 -.379 -.545 .415

X8 -.22 .469 .089 .566 .533 .176 -.280 -.282



n=500. h=50.

55

EIGENVALUES

* *

D V1 V2 V3 V4 V5 V6 V7 V8

D .856 .352 .212 .081 .141 .095 .113 .110

EIGENVECTORS

* *

D V1 V2 V3 V4 V5 V6 V7 V8

Xl .494 .0155 -.195 .807 .245 -.119 .185 -.072

X2 .480 -.100 -.122 -.321 -.087 .532 -.017 -.569

X3 .525 .038 .551 -.286 -.128 -.418 .423 .048

X4 .498 .049 -.321 - .. 136 -.109 .004 -.52 D. .550

X5 -.017 .453 .440 -.014 .432 .545 .059 .320

X6 -.037 .526 -.124 .096 -.765 .222 .222 .151

X7 -.020 .564 .226 .024 .069 -.319 -.509 -.488

X8 -.016 .431 -.530 -.367 .358 -.283 .452 -.058



n=500, h=100.

56

extensively using h=10 to 20 for n around 300. Since the

eigenvalues and the eigenvectors are meaningfully obtained, the

later calculations are also checked manually to ensure

correctness.

In additional to the MATLAB program, a Turbo-C program has been

written for rotational view of the 3-D data. It reads all the 3-D

data points and then rotates it about the z-axis. The 3-D points

will be the transformed points of the 12 dimensional SIR/Cepstral

coefficients after the SIR analysis. Using this rotational view,

any interesting characteristic may be spotted by the human eye -

the most powerful analyst tool. The complete source program is

list in Appendix II.

Turbo-C is employed due to its extensive library of graphic

functions. These graphical functions can be used easily. This

will save much time comparing to programming in x-window for the

DEC workstation. The program will ensure that the data will be

viewed at all angles for possible interesting features. To

enhance viewing, perspecti ve is used. [FIG 17] Though some

graphic theorists and psychologists may be argued that

perspective does not improve much in viewing 3-D data, the cost

of perspective in this program is relatively little. The program

had actually be tested both by employing perspective and without

perspective calculations fo~ 500 points. Using perspective does

not noticeably slow down the rotation for 500 data points.

Therefore, perspective is still employed in our program.

57

with

out

Pers

pect

ive

Far

obje

ct

Nea

r ob

ject

-'-t

I __

_ ~-,--:~ _

_ ",:-

Scr

een

'9

View

poin

t

With

Per

spec

tive Far o

bjec

t

Nea

r ob

ject

'V

View

poin

t

FIG

1

7

WIT

H

PE

RS

PE

CT

IVE

, N

EAR

ER

OB

JEC

TS

A

PPE

AR

L

AR

GE

R,

FUR

TH

ER

O

BJE

CT

S

APP

EA

R

SMA

LLER

58

In the 3D rotational program, the rotational increments of yaw

and roll can be adjusted. [FIG 18] Too large the increment may

miss some interesting views and the motion will look too

discrete. Too small the increment will slow the rotation. For a

balance, an increment of 5 degree is used in the actual data for

both roll and yaw increment. Yaw ranges from -90° to 90°, which

allow virtually every possible angle to be viewed.

As seen from the listing, a dump-screen function is included for

hard copy record of results. However, in testing the actual

printed result, it is relatively disappointing. without the

rotational motion, data points usually look like a random cluster

even though there are some interesting features in the figure.

Characteristic can hardly be spotted with one single static

image. Quite a number of consecutive plots are needed to review

the actual shape. As a balance, 2 images per rotation will be

included in this thesis for displaying the result.

The resulting 2 eigenvectors obtained in SIR will be used to

multiplied with the X in the raw data respectively. Together with

y, the raw data will then be transformed to a new set of 3

dimensional data. These 3-D data will be feed into this

rotational program for viewing. For initial try, some random

points around a plane and a monkey saddle are generated and are

rotated by the program respectively. The result works good. Both

the plane and the monkey saddle were easily spotted in the

rotational view. (FIG 19,20) Then, the SIR transformation obtained

for h=30 is then employed to the data. The resultant monkey

59

x

~ Z H ~ ~ H :> Cl I

M

Z H

~ ~ ~

0 \D

Cl Z ~

..................................................... ................. ;- ......... .l .... :' .:

/:::~;.,/

~ ~ 0 p::

ex)

M

. ~

N~ _____ ~/i/J & H ~

>-

o Cl ~

o o

, '

, ."

. ", t' "

", , , ..... · :......

.' "

, " " • ". I.

:- . ", '::-

• I ••

: : .. ; :" .

.. : .' :',::.;. "

" ,

------------------~-.-~ ::---l __ ci :·,.: ....... ~'. , ,.,' ~.. .'

. I ,\. , . . , .. I " .. :.

, ' . . ..... :.. . '.' .. ,'.' .,' . '

o o

. i": :- \ .. " , . ./.':- '.: ~ \ .~'" . " ... . . "- .: i .'

. , :' " I

, :

'"

. . r .

~.

'I

11

-' '. " .... ", .

, - .:,' ,', , " · " ..

t : • ..... • •••

...... . ,', ....... ··:.t ','

:'"

... ,

" . ,

'. ~ z , '. ,c(

~ At

,c(

Z 0

Cl)

8 Z H 0 At

~ 0 0 Z

C2 0

M 0 \0

~

~ 0

Z 0 H 8 ,c( 8 0 ~

0 I

C"'1

Cl' r-i

~ H ~

·'

VA~

o 1

0

ROLL

: o

&0.

0

: ... , .

.. :: ..

=.:.:

' .. '.'

:'"

... :!~:

., .:.'

.. .. '.

' ':. if

__

-~~~~~. ;:

"2~)··:·· '

~~~~-:(~.~;:

~j:;. ~.;'~: :::,,: :

:. .~

":"~

..:.

~.:.

.'

_..

..:..

_:.,

I •

•

.-

.'.

YRU

o 1

0

ROLL

· o

90

.0-

'. . .,

'., ";."

':.:.:-'

, .. .' ",

t·· ':. .

' ; .':.<

..... ·

.'

'; .. '...

. "

.:; .. : :rr"

'~""'-:"-:::

' :':: .. '.;,

: .;;: , .

.

.: ..

... : .. : .. .,-:

.,,-;

-. .

",.::,:,.

:.:';.:

'.

• ".

J ..

. .

"

.:,

.'

FIG

. 2

0

3-D

R

OT

AT

ION

O

F 4

00

R

AN

DO

M

PO

INT

S

ON

A

T

EST

M

ON

KEY

SA

DD

LE

62

saddle is fed into this 3D rotating program. The original monkey

saddle and the transformed monkey saddle are both displayed. (Fig

21,22). It is seen that the best view shows the monkey saddle

shape as in the original data.

A planar structure can be easily spotted on the rotational· view.

In some viewing angles, the transformed data points resemble to

a line while in some other viewing angles the data points look

scattered in the 3D space. When viewing at the video monitor, the

plane can be easily spotted due to the continuous motion. The

static plots do have difficulties in showing this planar feature.

It still serve as indications though.

4.2 ACTUAL SPEECH DATA RESULTS

The raw data at hand are digitized samples of single words spoken

by 3 different persons. For simplicity, they are called speaker

A,B,and C respectively. Each of them spoke the English digits

"One" to "Nine". The sampling rate is 8000 samples per second

with 12-bit accuracy. The LPC and cepstral transform programs are

modified from the speech library of Dr. Edmund Lai (Department

of Information Engineering, Chinese University of Hong Kong) . The

resultant listings are listed in Appendix Ill. The order used for

LPC and cepstral analysis is 12. Each analysis frame contains 256

sample points, which correspond to 32 ms. Frame advance of 100

sample points are used, ie. 12.5 ms. Hamming window is used as

discussed in Chapter 2. All these parameters' used are reasonable

and common in most research environments.

63

-._---

--._-

-._---

--.

.'

'.,

o' --. ....

Io

.

:-:. .-

-...

.... _

-_

. ~p •••

_---------------------"-~.'=i'-' ....

::.

.:..'

' . • :'

1-';'"

' . .

. .,

~~

......

:.:.:.

"r'

.':1,:

. t .. -.!

-

:0 ..•..

':

" .' '.'

FIG

. 2

1

'l,lf-)

1.·,;

i(]

--.1

0

Ra~ L

I[

]

,1.;'2

:t) .~]

J=1

J_l::

1"

,:1 ('

I;;.~5:[t 1

. (]I

I"-~

J

.' " ' .. .' o ~ , ., . -l- ,P- ;:1

.' .. ';

'0:/

···.1

., .-;

. . it',·

':'l

;~:"

.~-.. -... --

-<~¥.~

~~--~-

-----~

----

--~---

--.. -~

. :'

::~::.:.

---

---~--

--.

-' .. y

': ....

, "', '. P

. ,

1-:

:.-

.0 .'

.'

3-D

R

OT

AT

ION

O

N

TH

E

OR

IGIN

AL

M

ON

KEY

SA

DD

LE

BY

LI

64

'\;A

J.o~

a -2

0.0

fJoJ

JLL

o

23

0.0

FIL

E

.·!:

4(y

sd 1

.o

.'-g

'"--

--

'.

o· . " :: 1

"

: ......

~I.'

•

i'

-----~--

--------

~ .~[;~~~~

: -~~

, .

. -r.' e

r _

_-----------

.. ---

----

-l--

.----

-·

'.", i.~

:·i~::

· .: -

:·~"::1

i', ~ •

./::::~.~:

:: .:

, ....

, , '.-'

. '.:

:' .'

.',

:.

\·~~

~ii·

~

o --

1.0

~::t

fJi_

t_

\..J

~::u

) . (

)

~.=··r!

:c

c .i

. I_

A..

_

~

.~ ..

vu~~.jsc .. l

. s· 1

. i·-

• I

' ••

" .

,e,: .,;

. ...

'11

••••

••

·A !

....

-----

-~ .[ !

:.:;'

.

-----~

-;: ··it/

: .,: .. _

---.

----

----

--f";

:.~"".

:. '-

('

01'

.::~(

, ,., ..

. : :"

,' .....

. 'E~

' ,,l'

•...

.-:

. "

"

FIG

. 2

2

3-D

R

OT

AT

ION

O

F T

HE

S

IR

TRA

NSF

OR

MED

M

ON

KEY

SA

DD

LE

BY

LI

65

\r-I

!.J

o -.

10

i:::D

I-L

P 1

-30

.0

!='I

LE

rill(~5d I

. s i

t-

After the LPC transform (or Cepstral transform) of a single set

of raw data, the resultant }2-dimensional data are fed in SIR for

analysis. Then the transformed 3-D data is displayed by the

rotational program.

Five sets of data are tested.

1)

2)

3)

4)

one single utterance

12 utterances of "4"

3 utterances/speaker

2 utterances/digits

of

by

of

"1"

"4" by speaker A.

speaker A.

"4" by speakers A,B and C.

to "9" by speaker A.

5) 1 utterance per speaker per digit of "1" to "9" by

speakers A,B and c.

The results of each test are described in the following sessions.

4.2.1 SINGLE UTTERANCE OF "4" BY SPEAKER A ONLY

Initially, the data of a single utterance of "4" is used. The

result shows a thin threading in the 3-D views. No matter which

dimension is chosen as regressor, a thin thread feature still

appears. A typical figure is shown in Figure 23. This is somewhat

expected because this is only a single utterance of "4" by one

person, which does not contain much variation in sound. Besides

the frictitious "f" sound in "four", the "our" is a quite

constant sound with only slight and gradual change. Therefore,

the data points are expected to be a thin thread shape. Anyway,

it serves to prove that our experiment setting works as it is

. planned.

66

~ ..

, , \ \ 'l " '1 . I \ \. .... , \ '1, .. ... .... ... " "

", \. '. \1'··

.-. ~

,,-

-.,.

_-'_

I\.

_.

. ----

.. --.---.

-.--~--

. _-.

. ---J

.,

... -

I

"

#ii

o -5

0,0

RO

LL

: a 1

15

.0

FIL

E

: x

frM

4.3

d

... :.

,.,.F

· ..

.. -,'

'" .'

.... 1

"".'

.,.'

.' r···

· .. .' ..,- .'

•. 1··· ..

.r··

.' , . . '" .. "

... r

1"-

: "

" ."

'''''-' .. ~

,

, .... , .....

'. -·0

", "'~

...... , .. '.

'.~

L ..

... ., •• ,

•••• 1-

. ... ".',\

., .":

. ..

FIG

23

A

TY

PIC

AL

V

IEW

W

ITH

S

ING

LE

U

TTER

AN

CE

"4"

OF

SPE

AK

ER

A

ON

L

PC

CO

EF

FIC

IEN

TS

67

...

_,

..If

•

~a·~

. a

-50

.0

RC

LL

o

31

5.0

FIL

E

xft

-M4

.3d

SOR

TED

EI

GEN

VA

LUES

1 2

3 4

5 6

7 8

9 1

0

11

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

-0

.98

4

0.9

42

0

.67

3

0.5

72

0

.50

8

0.4

20

0

.35

3

0.1

92

0.

1'19

0

.05

1

0.0

19

CO

RR

ESPO

ND

ING

EI

GEN

VEC

TOR

S

1 2

3 4

5 6

7 8

9 1

0

11

--

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

---

-0.1

88

-0

.00

3

0.1

19

-0

.01

7

0.09

'1

-0.0

22

-O

.U'H

i -0

.12

3

-0.0

10

-0

.19

7

0.1

25

-0

.14

0

-0.0

02

0

.46

3

0.0

16

0

.17

0

-0.2

30

-0

.21

2

-0.3

43

-0

.03

4

-0.0

56

0

.03

3'

-0.1

88

0

.01

9

0.3

16

-0

.05

5

0.0

80

-O

.37

ti -O.~HO

-0.2

7,1

-O.03~

0.1t

13

0.0

33

-0

.29

1

0.0

12

0

.19

6

-0.1

96

0.

Ofj

5 -0

.29

0

O. 2~\

5 -0

.07

9

-0.0

07

-0

.30

0

-0.0

67

-0

.28

5

0.2

63

0

.37

3

-0.4

81

0

.27

9

-0.1

.187

0.~5~

-0.3

11

-l).1~~

-0.3

22

-0

.38

9

-0.0

55

-0

.21

0

-0.3

83

0

.41

1

0.3

27

O

. '18

t1 -

O. 1

55

-

O. 1

62

0

.06

4

-0.3

77

-0

.13

8

0.1

98

-0

.26

6

0.'1

53

0.2

38

O

. {j0

2 D

.l :j

7 -0

.02

5

-0.0

05

-0

.35

3

-0.5

11

0

.29

5

-0.4

3'1

0

.57

0

-0.4

83

0

.33

3

-0.1

39

0

.04

3

-0.5

03

-0

.33

5

-0.7

19

0

.20

9

·-0

.5'1

1

0.3

73

0.

0,13

0.

00,1

-0

.51

2

0.1

23

-0

.12

8

-0.2

80

4-0

.21

8

0.2

56

-Q

.15

6

0.0

55

-0

.07

'1

0.06

,1

-0. '

1 65

0.

,126

0

.38

9

-0.3

68

0

.28

8

0.0

77

0

.05

9

-0.1

82

0

.28

1

-0.1

93

-0

.37

9

O. 8(j

~ -0

.53

0

EIG

EN

VA

LU

ES

AN

D

CO

RR

ESP

ON

DIN

G

EIG

EN

VE

CT

OR

S FR

OM

S

IR

ON

L

PC

CO

EF

FIC

IEN

TS

O

F S

ING

LE

U

TT

ER

AN

CE

1/

'1"

BY

SP

EA

KE

R

A

WIT

H

THE

4TH

C

OE

FF

ICIE

NT

A

S Y

0.0

40

0

.20

6

0.2

61

0

.37

7

0.5

49

0

.54

0

0.3

59

TA

BL

E

6 T

YP

ICA

L

EIG

EN

VA

LU

ES

A

ND

E

IGE

NV

EC

TO

RS

FOR

S

ING

LE

U

TT

ER

AN

CE

O

F 11

4"

BY

SP

EA

KE

R

A O

N

LP

C

CO

EF

FIC

IEN

TS

68

Upon inspection on the eigenvalues for the LPC coefficients, it

is seen that the first two eigenvalues are not signif icantly

larger than the rest. For example, the eigenvalues and

eigenvectors corresponding to 'the result above is shown in TABLE

6. Therefore, it seems that the transformation obtained may not

be the actual best views. It may be due to the relatively small

number of points in the raw data. Anyway, it still provide a thin

thread result as we expected. And it serves as reference for

future comparison.

The cepstral coefficient result is rather disappointing. It shows

a similar shape as the LPC coefficients, but with much greater

dispersion in points. An example is shown in Figure 24, which

correspond to the LPC resul ts in the previous figure. Upon

inspection of the first 2 eigenvalues, they are also not

distinctively larger than the rest [Table 7]. Therefore, no

conclusion should be made at this moment of time concerning the

worsening result of the cepstral transform. It may be caused by

~IR instead of problem of the transform itself.

Details of all 3-D views,eigenvalues and eigenvectors are shown

in appendix IV.

69

...... •• 1

, ..

.:.:

:

{ / t I t j I I . ! I I I , J I I t I ! ! ---"'-

.- '"-----

-- --""

'---- . --~

-.---.

-----.

~

o -4

5.0

ROLL

: '0

. 33

5.0

FIL

E

: x

f 1-1

,,4. 3

d

. . . .

-~--

----

--~-

.---

--.'

.I e ••••• :

,I / , / l

FIG

24

A

TY

PIC

AL

V

IEW

W

ITH

S

ING

LE

U

TTER

AN

CE

"411

'OF

SPE

AK

ER

A

ON

C

EPS

TR

AL

C

OE

FF

ICIE

NT

S

70

~f,j

(]

-40

.0

RO

LL

: o

17

0.0

FIL

E

: xft,~.3d

SOR

TE

D

EIG

ENV

ALU

ES

1 2

3 4

5 6

7 8

9 10

11

--

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

---

0.92

8 0.

715

0.59

1 0.

556

0.50

6 0.

395

0.3

/ 14

0.26

6 0.

194

0.11

7 0.

050

CO

RR

ESP

ON

DIN

G

EIG

EN

VE

CT

OR

S

1 2

3 4

5 6

7 8

9 1

0

11

--

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

---

-0.1

48

-0

.24

0

-0.1

53

0

.01

1

0.1

39

0

.04

5

0.1

07

0

.07

2

0.0

51

-0

.01

3

-0.1

96

0.0

37

0

.25

4

0.1

65

0

.05

2

-0.0

'15

0

.10

1

0.1

57

0

.02

0 -O.O~)O

-0.0

39

0

.07

4

0.2

32

-0

.07

7

0 .. 3

17

0.0

27

-0

.1 (j

/l

-0.2

'11

0

.01

2

-0.3

09

-O

.06

fj

0.0

12

-0

.04

2

0.2

25

-0

.72

0

0.0

35

0.

'119

0.

2U8

0.0'

11

0.0

32

-0

.07

6

0.0

89

0

.07

3

-0.0

70

0.1

31

-0

.01

2

0.4

66

0

.2(j

8

0.0

06

-(

).'1

96

0.1

11

0.

3(jf

) -0

.]7

3

-0.1

19

-0

.22

1

0.0

25

0

.09

7

0.07

'1

-0.1

18

-0.13~

() . 0

~ '7

-

0 . 5

(; 0

-

(). 1

1 :1 H

-

O. ,I

J H

-

O. 1

02

-

O. 1

21

0

.42

8

0.3

81

-0

.27

8

0.2

83

-0

.70

/1

0.19

H

0.:3

1'1

-0.0

22

0

.21

5

-0.4

30

-0

.00

5

0.3

21

-0

.33

0

0.2

36

-0~607

-0.1

99

0

.27

0

O.2

H3

-0.1

2'1

-0

.24

2

0.3

70

-0

.22

0

0.1

52

-0

.08

9

0.4

65

-0

.39

0

0.3

20

-0

.36

0

0.1

94

-0

.57

5

0.7

61

-0

.10

H

-0.2

46

-0

.04

3

'.-0

.00

7

-0.5

00

-(

). 2

15

0.

'1 fl

2 -0

.5]

,I -0

. 25

0

-().

/15~

-0

.23

1

0.'1

57

-0.0

11

,

0.7

37

-0

.28

6

-0.1

62

-0

.26

9

-0.0

86

-0

.40

9

-0.5

92

0

.10

1

-0.1

35

-0

.65

2

0.8

81

,EIG

EN

VA

LU

ES

AN

D

CO

RR

ESP

ON

DT

NG

E

IGE

NV

EC

TO

RS

FRO

M

SIR

ON

C

EP

S'I

'RA

L

CO

EF

FIC

IEN

TS

O

F

SIN

GL

E

UT

TE

RA

NC

E

"'1"

B

Y

SP

EA

KE

R

A

WIT

H

TH

E

4TH

C

OE

FF

ICIE

NT

A

S

Y

TA

BL

E

7 T

YP

ICA

L

EIG

EN

VA

LU

ES

AN

D

EIG

EN

VE

CT

OR

S FO

R

SIN

GL

E

UTT

ERA

NC

E O

F 11

4"

BY

SP

EA

KE

R

A

ON

CE

PS

'rR

AL

C

OE

FF

ICIE

NT

S

71

4.2.2 TWELVE UTTERANCES OF "4" BY SPEAKER A

All the 12 utterances of "4" by speaker A are used. Among the raw

data, speaker A has the most utterances per digits. The other 2

speaker have only 5 utterances per digits. Therefore, we used

speaker A in this and the previous experiment.

In viewing the resultant transformed data, the view produce quite

good results for LPC coefficients. A typical one is depicted in

Figure 25. - When using the first few dimensions as the y

regressors, a clear view of two clusters are obtained. One of

them should be the voiced sound in "-our" while the other cluster

should correspond to the unvoiced sound "f". The resultant views

show that the data cluster quite compactly. A strip shape cluster

should correspond the voiced sound. variations should be caused

by the difference between different utterances. The elongated

shape should reflect the gradual transition of the voiced sound

to contain the "r" sound effect. Obviously, a more detail

separation of points is needed to actually confirm this.

As the y regressor reaches the last few dimensions, the

points began to spread out. This is somewhat expected because the

last few dimensions are relatively small value. Using this

dimension as the regressor will be much more sensitive to small

errors, inaccuracies, and other variations.

72

.1'.-

' .. '

-... -~

... -....•.

... -....

... ~

_ •.••

.•. 1'

·"·

-.--.~".

~ . ...-, .....

.. ..

-:-- >':':":'

:.:.:.:.

"::;:'~"

" .-.. .....

. , ... , .•.

. , •.....•.. ••.. ~ ...

..• •...•.•...

.-: .. ,

YAW

a

-50

.0

RO

LL

: o

31

0.0

FIL

E

: lP

C4

S4

.3D

:. '.

;1

.__

__

__

__

__

__

ii

-off

I ..t

.

r

...••

·.r-.

. IT

"'- t:'

r;~

YAW

: o

-4-0

.0

RO

LL

(]

90

.0

FIL

E

: L

PC

4S

4.3

D

FIG

2

5

A T

YPI

CA

L

VIE

W

WIT

H

12

UTT

ERA

NC

E "4

" O

F SP

EA

KE

R

A O

N L

PC

CO

EF

FIC

IEN

TS

73

In contrast to the previous single utterance case, the first two

largest eigenvalues for LPC coefficients are quite significantly

larger than the rest in general. (Refer to Table 8) One exception

is that the second largest eigenvalues, though still quite larger

than the most of the others, are usually relatively close to the

third largest eigenvalues. As from the theory of SIR, this may

still spawn doubt of the accuracy of the best views as the

previous single utterance of 11 four" by the same speaker. However,

the views obtained are still encouraging and provide reasonable

results.

As in the previous case, the cepstral transform failed to obtain

any better result. Instead, it seems again to worsen the view.

A typical example is shown in Figure 28. The best view on the

cepstral coefficients after the transform does not show any

interesting features. Instead, at the very best, only clustering

is observed. According to the eigenvalues, the view obtained by

SIR are similar to the one found in the LPC coefficients. Table

9 shown the eigenvalues and corresponding eigenvectors for Figure

28. The largest two eigenvalues are quite distinctively larger

than the rest. still," the second largest eigenvalue is somewhat

close to the third largest eigenvalue. There may still leave a

possibility that the worsening of cepstral transform may be

caused by SIR. Therefore, the value of cepstral transform cannot

be discarded yet.

Details of all 3-D views,eigenvalues and eigenvectors are shown

in appendix V.

74

SOR

TED

EI

GEN

VA

LUES

:

1 2

3 4

5 6

7 8

9 1

0

11

0.9

94

0

.91

00

.35

2

0.1

96

0

.14

7

0.0

68

0

.04

1

0.0

30

0

.02

3

0.0

18

0

.00

3

CO

RR

ESPO

ND

ING

EIG

ENV

ECTO

RS

1 2

3 4

5 6

7 8

9 1

0

11

.---

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

--0

.19

9

0.4

35

0

.55

0

-0.1

08

-0

.06

5

0.1

44

-0

.10

6

0.1

20

-0

.02

2

-0.1

97

-0

.05

5

-0.0

01

0

.01

2

-0.0

91

-0

.. 143

0

.04

4

0.1

65

-0

.05

5

0.2

37

-0

.06

4

-0.4

65

0

.10

8

0.4

64

-0

.12

2

-0.4

91

-0

.09

3

0.1

27

-0

.25

7

0.2

49

-0

.30

9

-0.0

17

-0

.26

7·

0.1

47

0

.31

5

-0.4

51

-0

.01

1

0.0

40

-0

.16

6

0.4

38

-0

.50

2

0.1

86

-0

.23

6

-0.2

71

-0

.13

9

0.0

48

-0

.07

8

-0.2

61

0

.21

9

0.0

10

0

.08

2

-0.2

83

-0

.33

3

-0.1

44

-0

.44

2

-0.2

18

0

.44

9

0.1

75

0

.01

5

0.2

44

0

.08

1

0.1

38

0

.03

5

-0.0

00

0

.36

7

-0.1

31

-0

.38

2

0.3

23

0

.57

4

-0.0

94

-0

.15

5

-0.2

62

0

.60

0

-0.1

48

0.

l1S7

0

.10

9

-0.0

93

-0

.58

2

-0.1

74

0

.24

7

0.0

73

-0

.28

0

-0.1

46

-0

.18

7

-0.0

55

0

.01

3

-0.4

04

-0

.45

2

-0.5

56

-0

.05

6

-0.3

92

0

.54

4

-0.2

48

0

.32

5

-0.4

07

0

.44

9

-0.5

40

0

.19

2

-0.4

12

-0

.26

9

0.3

25

0

.05

0

-0.0

37

-0

.60

8

0.4

03

0

.21

9

0.1

02

-0

.30

4

0.5

24

-0

.06

6

-0.0

25

0

.44

6

-0.0

98

-0

.26

3

-0.5

63

0

.76

5

0.2

'14

-0

.59

/ 1

0.2

60

-0

.54

5

0.0

02

0

.17

3

EIG

EN

VA

LU

ES

AN

D

CO

RR

ESP

ON

DIN

G

EIG

EN

VE

CT

OR

S FR

OM

S

IR O

N

LP

C

CO

EF

FIC

IEN

TS

O

F

12

UT

TE

RA

NC

ES

O

F

"4"

BY

S

PE

AK

ER

A

W

ITH

TH

E 4T

H

CO

EFF

ICIE

NT

A

S Y

TA

BL

E

8 T

YP

ICA

L

EIG

EN

VA

LU

ES

AN

D

EIG

EN

VE

CT

OR

S FO

R

12

UTT

ERA

NC

E O

F 11

4"

BY

SPE

AK

ER

A

O

N

LPC

CO

EF

FIC

IEN

TS

75

....

... -

0: o

... :

~~ ~.. ,

::;):: ...

:.:

° 0

00

.., •••••

: •••• r.

.1' ::

.~: .

: .':

.~.: .•

--:=' ~

h-i:··

.'::·:

....

' s. ~ ...

~ l.f

1:"..

.. °

'/_

"_'I

-"-'

I-"-

'I_

/-.. -·~~~

~~?F<:: .. :,:.: ...... .

"1

.•.. "'. '. ". ... '1 • . , ....

•. ; ..•.

=.J=

-~W

i -

--

a -5

0.0

RO

LL

a

2iO

.O

Fli

_E

x

r t-t

'14

. 3d

.... -

. -.

. -:"'':

:

.~:~.~:~

• L

.: . ...:

.. ,A

-::'

!'.

_ L

· .J.

. ..

... ...

•... _ ...

• I'

....•

. . --.

...• .... . -

. l~~~~· : ..........

..... .... ! :.

:.~.

. '::

,1.'

. ...

. .t1

;~

......

. ••

-L

. e •

. ··t· .

I ••

• ~ •.••

· •••. ~~

.:

.: .. -:

-..

. ;....

-

::: ..

.: .: ~~U;i~

~;L~ .... .' :.:~.

' ••

"e __ ••

• -..... -.

"'--

:..-.

,-- ." ..

'-. -.... ' ..

".'-:

. ....

'tAW

(]

-50

.0

RO

LL

o

32

0.0

FIL

E

x"ft~.3d

FIG

2

6

A

TY

PIC

AL

V

IEW

W

ITH

1

2

UT

TE

RA

NC

E

"4

"

OF

S

PE

AK

ER

A

O

N

CE

PS

rrR

AL

C

OE

FF

ICIE

NT

S

76

SORT

ED

EIG

ENV

ALU

ES

1 2

3 4

5 ·6

7

8 9

10

11

-~------~--------------------------------------------------------------------

0.8

14

0

.45

0

0.1

44

0

.11

9

0.0

87

0

.07

3

0.0

64

0

.05

2

0.0

46

0

.03

3

0.0

12

CO

RR

ESPO

ND

ING

EI

GEN

VEC

TOR

S

1 2

3 4

5 6

7 8

9 1

0

11

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

-0

.18

8

-0.0

88

0

.01

0·

0.0

51

0

.30

8

-0.1

03

-0

.16

9

0.0

77

-0

.13

8

-0.0

31

-0

.01

5

0.0

94

-0

.07

7

0.0

37

0

.00

7 .

~0.248

0.1

46

0

.02

4

0.2

13

-0

.00

9

-0.2

06

-0

.02

4

-0.2

41

0

.30

1

-0.0

37

-0

.19'

1 -0

.02

6

0.3

86

0

.19

7

0.0

11

-0

.05

7

0.0

68

0

.02

1

-0.3

73

0

.39

3

0.0

44

-0

.12

9

0.2

32

0

.22

8

-0.5

21

0

.19

0

-0.0

22

0

.19

0

0.0

47

0

.26

6

-0.5

13

-0

.28

6

-0.2

99

-0

.01

9

0.'1

46

-0.2

32

-0

.02

4

-0.3

29

0

.03

8

0.0

16

0

.15

2

-0.0

14

-0

.39

0

-0.0

63

0

.65

7

-0.0

65

0

.28

9

0.4

52

0

.13

9

-0.2

96

0

.14

7

. -0

.14

6

-0.2

45

0

.24

3

0.5

68

0

.18

1

0.5

09

0

.52

6

0.0

37

-0

.18

4

-0.3

18

0

.55

3

-0.2

26

0

.58

0

0.2

69

-0

.30

0

0.2

75

-0

.02

4

0.3

64

-0

.25

4

0.1

64

-0

.31

7

-0.3

95

-0

.53

3

0.1

14

0

.07

2.

-0.1

84

'-

0.4

02

-0

.38

9

0.1

80

0

.00

4

-0.8

82

0

.22

6

-0.0

13

-0

.17

7

0.0

40

0

.31

9

0.2

14

-0

.01

8

-0.0

78

0

.02

0

0.8

00

0

.10

5

0.7

14

-0

.67

3

. -0

.52

7

-'0

.26

0

-0.7

23

0

.59

8

-0.2

93

0

.38

3 .

-0.2

85

0

.03

4

-0.0

18

0

.25

3

-0.2

45

EIG

ENV

ALU

ES

AND

CO

RR

ESPO

ND

ING

EI

GEN

VEC

TOR

S FR

OM

SI

R O

N C

EP

ST

RA

L

CO

EF

FIC

IEN

TS

O

F

12

UT

TE

RA

NC

ES

O

F "4

11

BY

S

PE

AK

ER

A

WIT

H

THE

4TH

C

OE

FFIC

IEN

T A

S Y

TA

BL

E

9 T

YP

ICA

L

EIG

EN

VA

LU

ES

A

ND

E

IGE

NV

EC

TO

RS

F

OR

1

2

UT

TE

RA

NC

E

OF

"4

"

BY

S

PE

AK

ER

A O

N C

EP

ST

RA

L

CO

EFF

ICIE

NT

S

77

4.2.3 THREE UTTERANCES PER SPEAKER OF "4" BY SPEAKER A, B AND C

For further investigation, 3 utterances of the utterance "4" by

all 3 speakers are taken as raw data. The data can be used to

compare with the past two cases.

The LPC views continue to provide encouraging results. Many views

review a plane structure of the data points. An example is shown

in FIG 27. Points basically concentrate within a strip-like

structure. However, comparing the resultant shape with the

previous case, it seems that the shape does not resemble with

each other a lot. The double clustering features is missing in

this test. The variation between different persons seems to blur

up the clear distinction of the "f" and "-our" sound.

On the other hand, the eigenvalues obtained in this case are far

more indicative. The first two eigenvalues are quite larger than

the rest, even the third one. Therefore, the data obtained in

this case should be more certain and more indicative. The

eigenvalues and eigenvectors corresponding to the view in Figure

27 is shown in Table 10.

The cepstral transform continue to provide poor results. The

points in the best view look as if it is a random cluster only.

No interesting feature is spotted. A typical one is shown in FIG

28. This time, the eigenvalues are more indicative as the LPC

transform. [Table 11] The first two eigenvalues are quite

distinctively larger than the rest. Therefore, up to this point,

78

------

------

------

------

------

-

" ,l l I \

t.

/l

./l

(' . ( .l .' ...• / /

~t '/'

,:1 •

• '

'f'AI..1

a

-40

.0

RO

LL

: a 7

0.0

FIL

E

: )(f .,,\

4. 3

d

-.-----~----

----.--.----

--------...:

:=.ki;-: .. : .. :'.

r. ['L

I".'J

•

• ' r,

..,.

d:.~

."

... ··.·

h~r·,:

.: /1

.. ;~t

}:.

,I • ':

!~~.

_.,

... : •.

::·,'~·

:··r'r.

" ~··

.. r-t.,.

:. ,I

• ~'.

r"_'

,/ . ,'.

;'.!~,df,:

I

..

... '.

.

, 't

i' -~ .

. ./

~"'\."':

~ ,

-'I'f.

:.,

..' :,

:.!'?"

i

'

... . :·

:<t-·:

',.':'.~

.:.:.:

'

" 1

.. : .. "; .

.

.. ' ,', , . '. ' .

VAW

o

-40

.0

RO

LL

Cl

16

0.0

FIL

E

: X

ft"1

14.3

d

FIG

27

A

TY

PIC

AL

V

IEW

W

ITH

3

UT

TE

RA

NC

E

PER

SP

EA

KE

R

OF

11

4"

OF

SPE

AK

ER

A

, B

, C

ON

L

PC

CO

EF

FIC

IEN

TS

79

___

__

__

__

• _

___

__

___

_ • _

__

__

__

• __

__

_ 0

••

__

. __

_ •

__

_ .

__

_ _

SOR

TED

EI

GEN

VA

LUES

1 2

3 4

5 6

7 8

9 10

11

0.9

92

0

.63

3.

0.3

66

0

.26

9

0.0

78

0

.06

2

0.0

55

0

.02

9

0.0

20

0

.01

6

0.0

05

CO

RR

ESPO

ND

ING

EI

GEN

VEC

TOR

S

1 2

3 4

5 6

7 8

9 1

0

11

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

-0

.21

3

-0.4

39

-0

.24

0

-0.1

67

0

.01

1

-0.1

02

0

.08

5

-0.0

43

-0

.03

3

0.0

26

-0

.05

2

0.2

55

-0

.10

5

-0.1

10

-9

.17

2

-0.1

50

0.

U'1

B

-O.O

BB

-0.0~}9 ~O.lJJ

-0.0

59

-0

.03

4

0.3

12

0

.19

4

0.0

43

-0

.15

5

-0.0

23

0

.29

7

-0.0

54

-0

.01

3

-0.1

24

-0

.00

3

-0.0

29

0

.37

3

-0.1

21

-0

.00

9

0.0

94

-0

.06

0

0.2

91

-0

.22

0

-0.0

12

0

.04

5

0.0

75

-0

.07

6

0.3

73

-0

.41

4

0.2

39

-0

.05

1

-0.0

05

0

.26

0

-0.0

85

-0

.0:3

,0,

0.1

97

0

.28

3

-0.2

01

0

.36

1

-0.1

42

-0

.01

7

-O.O

UO

-0

.10

5

-0.5

1H

O

.OlU

-U.07~

0.'1

26

0.2

00

-0

.27

8

0.3

21

-0

.38

7

-0.2

56

-0

.16

8

-0.1

48

-0

.14

3

-0.3

92

-0

.01

9

0.3f

j··l

0.5

65

-0

.37

8

0.2B

O

-0.3

51

-0

.28

2

-0.3

UO

-0

.31

5

0.11

B

-0. 0

~~4

0.1

53

0.

26'1

0

.67

3

-0.4

74

0

.26

2

0.3

70

-0

.53

7

-0.4

69

-0

.34

6

-0.'1

80

-0

.24

0

0.4

16

0

.39

1

0.1

60

-0

.32

6

0.2

55

0

.17

8

-0.6

2H

-9

.51

5

-0.t

l1{j

0.

'111

-0

.55

7

0.,1

72

0.5

77

0

.16

3

0.1

57

0

.26

3

-0.3

33

-0

.20

5

-0.5

10

-0

.72

(j

0.2

14

-0

.62

9

0.7

49

0

.23

5

0.1

35

0

.61

1

EIG

EN

VA

LU

ES

. AN

D

CO

HH

ES

PO

ND

ING

E

IGE

NV

EC

'L'O

HS

FH

OM

S

IR

ON

L

PC

C

OE

FF

ICIE

NT

S

OF

:3

UT

TE

RA

NC

ES

P

ER

SP

EA

KE

R

OF

11

/1"

BY

S

PE

AK

ER

A

,B,C

W

ITH

T

HE

4T

H

CO

EF

FIC

IEN

T

AS

Y

TA

BL

E

10

T

YP

ICA

L

EIG

EN

VA

LU

ES

AN

D

EIG

EN

VE

CT

OR

S FO

R

3 U

TT

ER

AN

CE

PE

R

SPE

AK

ER

O

F "4

" BY

SP

EA

KE

R

A,

B,

C

ON

L

PC

CO

EF

FIC

IEN

TS

80

FIG

2

8

.... '

: •• 1

".

:.

.':.,

. ', '

.~ ......

.. ·r

~.

. .. :

.. ; f.. ..

: :~~!

. .': ~:

.: 0.-"

.:r;

,~,(:..:

~~'~'

;'~r~~

::-i''

:~ ,

• I :~

J -,·

-·t.·l

~·I' .•

_, 0

•

-.1 ~"

.'

~: ...

....

....

.. _

. 0

\ 0 .

.. J~

, ''

'\1~

" ":

;.'.

0

00

0

:-',

L.

i.1

~ ?~r·

.~'.~"

0 ,

----

-,-/""

,~,~

, ... ',

.. ; .

.. ~.

. 1

"'''-

', "

11

••• ., ...

: :.

1 I'

' •

• '.:

:. JL

""

~"''

''~A

-0

'~'..

.. ·t

I

L" -

" 0

'

0 ' ... 0

+,

• ,;:

,~.:.)

.. 0

0:

: '.

o ., ....

, ~ ..

..

I .1

' •

,. . ; ~'

..

...

I

. '.

~

o -4

0.0

RO

LL

o 1

80

.0

FIL

E

xf

l"1

r)4

. 3

d

.. ',' .

.: I'

: "

I. ..

, ,

''',

-J

.,-.

.. ' ..

I '.: :.:

..

l I l I •

J

. :~>:'::

·::!:;tL

.:;, ,

•.

~'~l

' I.

' .. :

.. \~:~

'~~:::'.

' :

",,: i 1

,.

•

;: .:: '{f .

~iii;~j;::

: :' ....

.. ~:·I ~:

;:·1::,:

:· :::

.

. "j.~f:t1

~i'W~~--

---~--___

_ ,'_'

I:' J!"

,':

•••

. ..• "

'.::

.

A T

YP

ICA

L

VIE

W

WIT

H

3 U

TT

ER

AN

CE

PE

R

SPE

AK

ER

O

F "4

11

OF

SPE

AK

ER

A

, B

, C

81

~

a -4

0.0

RO

LL

o 3

50

.0

FIL

E

xft

-wl.

3d

ON

L

PC

CO

EF

FIC

IEN

TS

SO

RTE

D

EIG

EN

VA

LUE

S

1 2

3 4

5 6

7 8

9 10

11

--

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

---

0.4

67

0

.23

4

0.1

32

0

.09

7

0.0

63

0

.03

0

0.0

26

0

.01

8

0.0

14

0

.00

9

0.0

03

CO

RR

ESPO

ND

ING

EI

GEN

VEC

TOR

S

1 2

3 4

5 6

7 8

9 1

0

11

--

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

---

-0.0

43

-0

.06

0

0.0

00

-0

.05

1

0.2

49

0

.03

8

-0.1

56

0

:02

8

-0.0

80

0

.04

8

0.0

95

-0

.19

0

-0.1

63

0

.01

9

0.0

02

-0

.21

6

-0.0

52

-0

.02

1

0.0

42

0

.07

6

0.0

82

0

.05

9

0.4

95

0

.16

1

0.2

27

0

.22

8

-0.0

14

0

.02

0

0.1

51

-0

.07

8

-0.1

09

0

.08

0

0.0

26

-0

.11

3

0.5

90

-0

.25

5

0.2

04

0

.10

9

0.0

81

-0

.14

0

0.2

78

-0

.25

2

0.0

70

-0

.13

4

0.1

09

-0

.15

5

0.3

62

-0

.07

3

0.2

62

0

.35

1

-O.l

l:i

-0.1

36

0

.42

2

-0.0

54

-O

.lB

H

0.7

29

-0

.27

6

0.2

55

-0

.03

8

-0.2

5'1

0

.11

1

0.5

60

0

.15

2

-0.0

75

0

.28

3

0.4

90

-0

.13

6

-0.3

30

0

.49

8

-0.3

44

0

.55

3

-0.3

28

0

.40

8

0.1

54

-0

.51

2

0.4

03

-0

.35

9

-0.0

79

0.

45'1

0

.18

5

0.2

02

0

.09

1

-0.5

17

-0

.05

4

0.0

05

()

. 1 1'

1 -

O. 1

17

0.7

41

-0

.29

8

0.0

69

0

.16

5

0.3

78

0

.13

6

0.5

21

0

.25

5

-0.6

41

-0

.32

1

-0.4

38

-0

.07

2

0.2

11

. 0

.21

8

-0.5

7:1

0.2

53

-0

.05

3

-0.0

06

-0

.20

7

-0.6

22

0.

1'16

0

.72

5

0.0

56

0

.05

0 '

0

.35

5

0.2

20

-0

.72

9

-0.6

41

0

.45

5

-0.5

49

-0

.22'

1 -0

.57:

1 -0

.04

8

-0.0

85

EIG

EN

VA

LU

ES

AN

D

CO

RR

ESP

ON

DIN

G

EIG

EN

VE

CT

OR

S FR

OM

S

IR O

N

CE

PS

TR

AL

C

OE

FF

ICIE

NT

S

OF

3 U

TT

ER

AN

CE

S

PE

R

SPE

AK

ER

O

F "1

1'1

BY

SP

EA

KE

R

A,B

.C

WIT

H

THE

4TH

C

OE

FFIC

IEN

T

AS

Y

TA

BL

E

11

T

YP

ICA

L

EIG

EN

VA

LU

ES

AN

D

EIG

EN

VE

CT

OR

S FO

R

3 U

TT

ER

AN

CE

PE

R

SPE

AK

ER

O

F "4

" BY

SP

EA

KE

R

A

ON

C

EPST

RA

L C

OE

FFIC

IEN

TS

')

'.

82

it can be concluded that cepstral transform is not a suitable

method for dimensional reduction of speech data.

In fact, in the next two cases, the cepstral transform still

failed to provide any better view than LPC. This reinforces the

negative conclusion on the cepstral transform.

All views, eigenvalues and eigenvectors for this case are

depicted in Appendix VI.

4.2.4 TWO UTTERANCES PER DIGIT OF "1" TO "9" BY ' SPEAKER A

For further test, the raw data of the utterances digit "1" to "g"

are combined. These data includes a wide range of voice sound

instances and unvoiced sound instances. Therefore, it serves as

a real test on the dimension of speech data.

Once again, the LPC transformed data shows planer feature in

strip form. An example is shown in Figure 28. This is a highly

encourag ing result. utterances of 11 111 to 11 nine" contain many

voiced sound and unvoiced sound. still, its SIR transformed LPC

coefficients can be observed to be 2 dimensional in structure.

It seems that the reduction of speech dimensions to 2 dimensions

is quite possible. However, this should be double confirmed in

the next case.

The eigenvalues for the LPC transform remain indicative [Table

12]. The first two eigenvalues are quite larger than the rest.

This shows that the encouraging result does not come by chance.

83

.-...... ~ --

----

-'---

----

---'"

---'"

"---

----

--. iiJ'J

r~

.~ .

.f:J-

.. ' :t £ ~.

.. " .:~

" ,,/

, "l

-"

IiI

..'

" " /

,ill

" ./ " I / " l

~

a -5

0.0

RO

LL

o 7

0.0

FIL

E

xf"

... 4

.3d

.'

,""

~ . o

-40

.0

Rt1

.L

: a 1

80

.0

FIL

E

: xft'~. 3

d

FIG

28

A

TY

PIC

AL

V

IEW

W

ITH

2

UT

TE

RA

NC

E

PER

D

IGIT

PE

R

SPE

AK

ER

O

F 11

1"

TO

"9"

OF

SPE

AK

ER

A

, B

c

C

ON

L

PC

CO

EF

FIC

IEN

TS

84

' ..

' "S

OR

TE

D

EIG

EN

VA

L U

ES

1 2

3 4

5 6

7 8

9 1

0

11

...

. ------------------------------------~-----------------~--------------~-------

0.9

89

0

.35

8

0.1

23

0

.05

3

0.0

44

0

.03

3

0.0

31

0

.01

5

0.0

13

0

.00

7

0.0

02

CO

RR

ESPO

ND

ING

EI

GEN

VEC

TOR

S ..

~ .. '

. :

"

", .

..

.' .'

..

i'·

1 2

3 4

5 6

7 8

9 1

0

11

-----------------------------------------------------.----------------~------

0.2

33

0

.10

3

-0.1

80

-0

.14

4

-0.0

74

0

.04

7

-0.0

09

0

.03

9

-0.0

47

-0

.00

5

-0.2

05

0

.22

5

-0.1

33

-0

.20

7

-0.0

60

-0

.10

9

0.2

29

-0

.06

6

-O.J

12

-0

.11

7

-0.1

25

-0

.30

7

0.3

32

0

.31

5

-0.0

85

0

.16

5

0.1

76

0

.18

0

-0.2

25

-0

.27

1

0.0

52

-0

.13

5

-0.1

58

0

.32

0

-0.2

92

" -0

.18

3

0.1

22

-0

.38'

1 0

.23

3

Cl.'1

E'7

0.2

83

-0

.05

/ 1

-0.2

12

-0

.15

6

0.2

64

0

.06

7

-0.3

06

0

.11

0

0.1

94

0

.31

3

0.1

51

.0

.'139

0

.12

8

-0.4

09

-0

.15

9

0.3

97

-0

.44

9

-0.1

97

-0

.01

5

0.35

:3

O.1

fj8

-0./

129

0.2

19

0.

00,1

0

.11

8

0.0

24

0

.38

9

-0.2

57

-0

.1/1

'1

0.0

80

-0

.30

H

0.'1

33

-0.2

09

-0

.08

7

0.1

77

0

.50

8

-0.0

01

0

.29

2

0.0

57

' -O.~~19

-0.2

07

-O

.:lO

2 O

. :l'l

'l O

.'15G

-0

.27

1

O.7

J6

0.2

26

-0

.01

2

0.2

53

-0

.03

1

-0.5

30

0

.11

7

-0.0

33

0

.04

6

0.2

85

-0

.16

6

0.2,

15

0.2

08

0

.19

8

0.2

72

( 0

.56

9

-0.'1

53

~O.086

0.19

,1

0.5

76

0

.12

1

-0.2

01

-0

.'19

3

0.1

92

0

.43

7

0.28

4 0

.43

3

-0.3

77

-0

.92

3

-0.6

46

O

. 30

5

-(l

. '11

5 -

0.5

H 8

-O

. G H

5 -

O. 5

92

0

.71

7

EIG

EN

VA

LU

ES

A

ND

C

OR

RE

SP

ON

DIN

G

EIG

EN

VE

CT

OR

S

FRO

t'wt

SIR

O

N

LP

C

CO

EF

FIC

IEN

TS

O

F

2 U

TT

ER

AN

CE

S

PE

R

DIG

IT

OF

"I"

T

O

"9/1

B

Y

SP

EA

KE

R

A

WIT

H

TH

E

4T

H

CO

EF

FIC

IEN

T

AS

Y

TA

BL

E

12

' T

YP

ICA

L

EIG

EN

VA

LU

ES

AN

D

EIG

EN

VE

CT

OR

S FO

R

2 U

TT

ER

AN

CE

S P

ER

SP

EA

KE

R

PE

R

DIG

IT

OF

"1"

TO

"9"

BY

SPE

AK

ER

A

O

N

LPC

C

OE

FF

ICIE

NT

S

'0·

;

85

The cepstral transform continue to provide random clustering

only. This result simply reconfirm our previous conclusion about

the poor performance of cepstral transform. [FIG 29 and Table 13]

All views, eigenvalues and eigenvectors are depicted in Appendix

VII.

4.2.5 ONE UTTERANCE PER DIGIT PER SPEAKER OF "1" TO "9" BY

SPEAKER A,B,C

As in the previous case of single speaker uttering "1" to "9",

3 speakers speaking the same words still provide good result in

LPC coefficients. Transformed data points still seems to lies on

a 2 dimensional surface. Comparing to the previous case of single

speaker, this multiple speaker case seems to have a slightly

wider spread and more dispersed points. This may be due to the

deviations of multiple speakers. A typical example ~ is shown in

Figure 30.

The eigenvalues are indicative as all the previous cases [Table

14J. This again confirms the significance of the results.

Cepstral transform does not provide any surprise. The result

simply looks like a random cluster. An example is shown in Figure

31 and Table 15.

All views, eigenvalues and eigenvectors are depicted in Appendix

VIII.

86

J'~~~~ ~~i~tf

~~:.';-'. /:-{

:: . ....... L

.' .f

l "l'

~''''

,".'

' ••

• ..

.:. ,

I, t. ".

:~~

.:r !~

·i·

.' oe-

•

. ' .=-:~:~:~

;i:~~~,~

~.~:~:(.-

.. .

:.

~~~

a -5

0.0

.t-R

OLL

a

0.0

FILE

b

: xft

,...

4. 3

d

--,--

.. I. •

• '.

.. " ... '."

..

, .

...

. : . .\~.

:~·~.;

::,~s:

:·< .:'

. ,y-,--._/~<)~~~

~~:.:::?" ' ;'

,--;. ... ><

~: .

~

o -5

0.0

RO

LL

a 1

20

.0

FIL

E

b :'x

ft-.

.-.4

. 3d

FIG

2

9

A T

YP

ICA

L

VIE

W

WIT

H

2 U

TT

ER

AN

CE

P

ER

D

IGIT

PE

R

SPE

AK

ER

O

F 1t

1"

TO

"9"

OF

SPE

AK

ER

A

I B

I

C

ON

C

EPS

TR

AL

C

OE

FF

ICIE

NT

S

87

.:. ~~~\r

< :. >. ~.

SOR

TED

EI

GEN

VA

LUES

: ;.

lo:.~.

\.,

,.'

1

2 3

4 5

6 7

8 9

10

11

-~---------------------------------------------------------------------------

0.54

7 0.

191

0.09

2 0

.07

4

0.04

7 0.

032

0.02

5 0.

016

0.01

3 0.

011

0.00

9

.; C

OR

RES

PON

DIN

G

EIG

ENV

ECTO

RS

.. "

: ..

• •

I.".

~. "

, ~

:,l·~.

)i:.

.......

... ..

:' . -,"

. .

'.~."

,:

1 2

.~ .

3 4

5 6

7 8

9 1

0 .

1

1

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

-0

.11

9

-0.2

43

-0

.21

7

-0.2

69

-0

.39

6

0.0

15

0

.67

1

0.1

60

-0

.34

5

-0.0

56

-0

.23

1

-0.0

60

0

.12

3

0.0

97

·0

.13

1

0.5

28

' -0

.01

4

-0.3

81

. 0

.34

9

0.0

80

-0

.05

8

0.3

76

-0

.04

5

-0.0

21

....

0.5

31

-0

.07

9

0.0

30

-0

.13

2

-0.4

62

0

.59

1

-0.4

10

0

.20

5

0.4

19

-0.0

15

-0

.05

1

0.1

87

0

.25

0

-0.1

5'1

-0

.09

8

0.2

65

0

.55

3

-0.08~

0.2H

1 0

.63

6

0.0

45

-0

.06

5

0.0

80

0

.17

9

0.3

18

-0

.05

7

0.4

74

-0

.65

0

-0.0

56

0

.44

3

0.0

70

-0.1

40

-0

.0/ 1

2 0

.18

1

-0.1

02

0

.32

1

0.2,

11

0.2

38

0.

12'1

-0

.67

2

-O.4

HO

0

.06

3

0.1

36

0

.13

7

-0.0

47

-0

.29

5

0.4

97

-0

.08

4

0.0

70

0

.12

6

-0.0

32

0

.36

8

-0.6

80

0.0

41

-0

.01

8

0.0

82

-0

.07

7

0.1

23

0

.27

1

0.3

27

-0

.2'1

0

0.4

15

-0

.5'1

4

0.4

88

-0.0

24

0.2~1

0.0

35

0

.lB

6

-0.1

63

-0

.19

1

0.5

96

-0

.33

8

-0.1

52

-0

. J'1

6 -0

.45

2

-0.0

03

0

.02

2

0.0

14

0

.12

9

-0.0

35

-0

.12

7

-0.1

21

. -

0.1

46

0

.22

0

0.0

54

-0

.51

9

0.3

14

0

.00

0

-0.1

63

-0

.21

1

0.0

90

0

.28

9

-0.6

81

-0

.21

9

0.5

06

0

.70

0

0.2

99

EIG

ENV

ALU

ES

AND

CO

RR

ESPO

ND

ING

EI

GEN

VEC

TOR

S FR

OM

SI

R

ON

CEP

STR

AL

CO

EF

FIC

IEN

TS

O

F

2 U

TT

ER

AN

CE

S

PE

H

DIG

IT

OF

"1

" T

O

1,19

" B

Y

SPEA

KER

A

WIT

H

THE

4TH

C

OE

FF

ICIE

NT

A

S Y

TAB

LE

13

TY

PIC

AL

EI

GEN

VA

LUES

AN

D EI

GEN

VEC

TOR

S FO

R 2

UTT

ERA

NC

ES

PER

SP

EAK

ER

PER

D

IGIT

O

F 1

11

11

TO

"911

BY

SP

EA

KE

R

A

ON

C

EPS

TR

AL

C

OE

FF

ICIE

NT

S

88

:-.. ~: ~

. . .

• "

••

,a

'.

,,1

-~ I I I ,1

.j ) ~ . . ! ,! : i -, .i :;

r '

••

SOR

TE

D

EIG

ENV

ALU

ES

1 2

3 4

5 6

7 8

9 10

-

11

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

----

-. ,-

0.9

24

0

.34

6

0.1

92

0

.12

1

0.0

52

0

.02

2

0.0

17

0

.01

1

0.0

09

0

.00

4

0.0

02

,CO

RR

ESPO

ND

ING

EIG

ENV

ECTO

RS

--,

..

. '

' ..

...

' -'

, -,

t.: :'

..

--<~i ~~

~~~t"

;',>

1 2

3 4

5 6

7 8

9 1

0

11

----------------------------------------------------------------~------------

0.2

00

-0

.44

1

0.0

30

0.

0'13

0

.04

2

0.0:

37

-0.0

30

-0

.02

5

-0.0

78

0

.00

0

-0.0

50

-0

.11

5

-0.0

76

-0

.16

3

0.1

00

0

.12

7 -O.O5~

O.D

!)7

-0.2

07

-0

.25

7

-0.1

47

0

.08

6

0.3

42

-0

.00

3

0.3

16

o.

O:l~

} -0

.01

'1

O. 1

~ B

-

O. :

H} 7

-

(). 1

02

-

0.2

05

-

(). 0

80

0.

1/17

0

.49

0

-0.0

04

0

.02

9

-0.0

36

-0

.15

0

-0.2

88

O

. :\5

:1

-0.1

91

-0

.12

0

-0.1

28

0

.16

1

0.2

80

-0

.44

5

-0.0

24

0

.17

8

-0.1

73

-0

.OB

5 -O.

-15:

_~ -O.1~2

-0.1

20

-0

.10

0

0.3

4/ 1

0

.53

3

-0.1

59

-0

.72

4

0.0'

17

0.0

15

0.

1:30

-0

.12

2

0.0

49

0

.09

6

-0.1

26

0

.19

7

0.3

96

-0

.13

0

-0.0

26

0

.57

0

-0.2

25

0

.09

11

0.5

77

-0

.34

7

O~14

2 0

.18

0

0.2

31

0

.20

3

-0.3

33

-0

.40

3

-0.0

'12

-0

.12

7

-0.l

B9

0.

08'1

-0

.09

2

O.1

2G

0.7

76

0.

'148

0

.16

3

-0.2

62

-0

.03

0

0.3

'12

0

.21

6

0.2

01

0

.28

0

O •.

1911

0.2

79

0.

1H7

0.'1

B5

-0.0

51

-p

.51

6

-0.2

74

0

.11

2

0.32

'1

0.'1

91

O. ~\1

2 -0

.'1 tl

t1 O

. fi5

/ 1

-0.2

29

0

.41

7

-0.0

07

-0

.33

9

0.2

95

0

.27

9

0.8

43

-0

.72

3

-0.1

49

-0

.56

8

0.6

53

0

.45

6

0.3

49

EIG

ENV

ALU

ES

AN

D

CO

RR

ESPO

ND

ING

EI

GEN

VEC

TOR

S FR

OM

SI

R

ON

LP

C

CO

EFF

ICIE

NT

S O

F 1

UTT

ERA

NC

ES

PER

D

IGIT

PE

R

SPE

AK

ER

O

F "1

" TO

"9

11

BY

S

PE

AK

ER

A,B

,C

WIT

H

TH

E

4TH

C

OE

FF

ICIE

NT

A

S

Y

TA

BL

E

14

TY

PIC

AL

E

IGE

NV

AL

UE

S A

ND

E

IGE

NV

EC

TO

RS

FOR

1

UTT

ERA

NC

E P

ER

D

IGIT

PE

R

SPE

AK

ER

O

F "1

"

TO

"9"

BY

SPE

AK

ER

A

, B

, C

ON

L

PC

CO

EF

FIC

IEN

TS

90

, "

-I

I

~i 'J '0]

-:. '. ';:,

:=J :'1 :.1 .. .:'~ :I~ . ~ .. ;. I. '. :::

.....

" ~ .•.. . " ... '.\ " .~ :.; I'

.:

.:.

';

'. " " ..•. \. ... \"" "

. .'

\ \ '. t, •

I.. ..,.-

__ \

'.'

'"

------

<-".

. . ,

',' :~

. I:

.~.·

..,

._--

-

' .. .' .'

.. .

-: t, .

..: .. ,:.:.

·\:!Y'

.'::·

. _-

----

-.:

•• '

~;'I'~

. . ~i'~ .

. ~~-

•

• ;·'

1'.1

.' "'~':~"

• ..

i, ::

': -"

~I·!

.':"

: .'

. ·· .. ::

.~'r·~

.·t·:

. \~

~.t.

~.

... :

.:'

.... ~

'~':'l

: "~

~:.:

' : ~.

:.:

. . ' .

. '1 ;

.• 11'

, ~~} ,.:

F.

. '

.' "

""}'.

"'-t:\

.. , "

,.),.,

,'(.

.' ,

.:.

t '1

:":(.' ~~ ... f

·····.' •

••• ,

'. '

•. i'

uJ

,,',

•

.:"

• ••

_"It

'· •••

,,'.

' .

... 1

". ::,r~':

. :.;;: ::

:" ';~~

,",

. ,: ..

.. I,

,"

'\

~

o -4

0.0

liO

LL

o

20

.0

FIL

E

Xff

ri4

.3d

::" , ..

I, I,

... -....

... :.

',:'" :':,

.~;. : ~::

:. , .

.... :. ·, .. ~l :

.:}{? ···.

0 '.

.: "

. ..

·~.'

.t .

j .•.•

, ...

... ~.

~:: ··

~~~;

"i.~

:: .

. ....

:.: ..

.. ~ .:'

ii:~

~::,

:'; ..

.'

':.,:.

~

.:. :·

,1)7

: ...

. ~ ..

I'

.• ,.

,','

" ~

.• ..:

.:.~

1'1

•

.. . ,'

. .7,

1 ......

1~·.

n 11

', .. :

. •

" ••

,o.,

~ ~ \

. ' "

._,

:.

.:.

'. :

~ ~,:

. ~~

... :'~

:-.:

.

, . " ~,

.. "

. :r."

.~ . I

. __

__

__

_ . _

__

__

__

.. _::_

·:_:·:_:

· ... ;~"..r,..~"":~.._j·: :

~~!;fj!:.!: .:: :

': ;:: : "

::: :.,

•.

....

. ! .

, .

I ..

~

: '

, ... ;

" ..

t •• ' .'::

/J'.:

• .

~

'.

It

......

: •

I •

' ..

~

a -4

0.0

RO

LL

o 1

80

.0

FIL

E

Xft

, ... 4

.3d

FIG

3

1

A T

YP

ICA

L

VIE

W

WIT

H

1 U

TT

ER

AN

CE

P

ER

D

IGIT

PE

R

SPE

AK

ER

O

F "1

11

TO

"9"

BY

SPE

AK

ER

A

, B

,

C

ON

C

EPS

TR

AL

C

OE

FF

ICIE

NT

S

91

5.2 FURTHER WORKS

As already introduced in the previous chapter, some further works

have to be done. Some of them are research oriented and some are

technically oriented.

i) A finer distinction of data points should be employed. In

the present initial analysis, all data points from every

person of each frame has the same color on the 3D rotating

plot. For further and deeper analysis, different person and

different set of frames should have different color on the

3D rotating plot so as to spot more information. Different

colors of points are assigned to different person (eg red

for the first person, orange for the second, and so on ),

and the transition of frames should be represented by a

gradual change of color tone. For example, for the data in

red, the initial frames are in light red while the last

frames are dark red. Gradual change of redness is used for

intermediate frames. In doing so, the person and the

approximate frame location can be spotted on the 3D

rotation. A re-programming of the 3D rotation program under

X-window or on true color PC is recommended. This will

enable higher color resolution and pixel resolution compare

with using VGA resolution at the present stage.

ii) More voice data should be used. For initial attempt, we had

use the utterances of digit "1" to "9". The LPC results

obtained are quite satisfactory. However, for double

94

confirmation on the power of LPC and SIR, additional voice

should be added, extending the variety of sound produced.

iii) The best views obtained between single utterance and multi

utterances, or between single speaker and multiple speaker

of the same utterances does not resemble to each other.

Although planar views were obtained for all cases, the

figures does not match with each other. The shapes differ.

The 2 transformation eigenvectors for each corresponding

case also differ. The reason for this should be further

investigated. The 11 dimensional function describing the

approximate planer shape should be found for comparison and

further investigation.

iv) The failure of cepstral transform on LPC coefficients

should be further investigated for confirmation. However,

an interesting alternative that may worth a try is to apply

the cepstral transform on the original speech data instead

of the LPC coefficients. This may yield some further

insight on the speech data.

v) The data points in 3D may be reduce to save computation

time and memory space in the rotation program. Points which

are very close with each other cannot be distinguished in

the computer screen. Those points will all be overlapped

within one pixel on the screen. However, in the 3D rotation

program, . all those overlapping points are considered

95

separate and are separately calculated for rotation. This

will slow down the rotation and acquire more memory space.

This is not efficient and should be improved. A

preprocessing of the 3D data points should be added to

eliminate duplicate and close-with-each-other points.

vi) There is still an extremely small chance that the data

points are symmetrical along the regressor axis for each

dimension as regressor for cepstral transform. Therefore,

to ensure that this is or is not the case, slightly more

works has to be done. To solve the problem, a common offset

can be added to the dimensions. (The regressor dimension

can be excluded). Then the resultant multi-dimensional data

should not be symmetrical along the regressor axis any

more. This offset data can then be fed into the SIR for

analysis again.

96

REFERENCES :

Inselberg,Alfred and Dimsdate,Bernard 1987 Coordinates for visualizing Multi-Dimensional Proceeding of Computer Graphics International.

"Parallel Geometry" in

Everitt ,B.S. 1978 : "Graphical Techniques and Multivariate Data" Heinemann Education Books

Andrew D.F. 1988 : "Some approaches to Interactive Statistical Graphics" in Dynamic Graphics for Statistics, Wadsworth.

Ratkowsky, David A, 1983 : "Non Linear Regression Modelling - A Unified Practical Approach", Dekker

Flanagan, James L, 1965 Perception", Springer

"Speech Analysis Synthesis and

HKPC 1992 EB Features in Electronics Bulletin by Hong Kong Productivity Council Vol 11 No. 1.

Friedman, Jerome H, 1982 IIAn Introduction To Real Time Graphical Techniques for Analyzing Multivariate Data" in Proceedings of the Third Annual Conference and Exposition of National Computer Graphics Association.

Cater, John P, 1983 : "Electronically Speaking - Computer Speech Generation" , Howard W. San & Co ..

Keithweiskamp, Loren Heiney and Namir Shamnas using Turbo C", WILEY.

"Power Graphics

Li, K.C. 1984 "consistency for cross-validated nearest neighbour estimates in nonparametric regression" Ann. state 12.

Li, K.C. 1985 "From stein's unbiased risk estimates to the method of generalized cross-validation" Ann. State 13.

Li, K.C. 1986 : "Asymptotic optimality of CL and generalized cross-validation in ridge regression and application to spline smoothing" Ann. State 14.

Li, K.C. 1987 : "Asymptotic optimality validation and generalized cross-validation Ann. State 15.

of Cp , CL , crossDiscrete index set"

Li, K.C. 1988 : "Sliced Inverse Regression: Dimension Reduction Via Inverse Regression" U.C.L.A. statistic Series, No. 3 (revised) .

Li, K.C. 1989 : "Sight-seeing with SIR, a transformation-based projection pursuit method".

Mongomery Peck 1992 "Introduction to Linear regression Analysis" 2nd Edition, Wiley.

Morgan, Nelson 1984 : "Talking Chips" ,Mcgraw-Hill.

Stevens, Roger : "Graphic Programming in C" , M&T Books.

Roman Kue : "Introduction to Digital Signal Processing", McGrawHill.

Furui, Sadaoki 1989 Recognition".

"Digital Speech Processing, Synthe'sis, and

Veltri,stevent 1985 : "How To Make Your Computer Talk" McGraw Hill. '

TI 1987 Texas Instrument and HKPC "Fundamentals of Digital Signal Processing U , Seminar Handout.

Daulton, Tyrone 1987 : "Thr'ee DiIhensio.nal Perspective Plotting" , Byte December 1987 .

. Cleverland, William S., 1988 : "Dynamic Graphics For Statistics", Wadsworth & Books/Cole.

Cleveland,William S. and Wilks, Allan R. 1987 "Dynamic Graphics For Data Analysis" by Richard A. Becker, in statistical Science.

L.A. van Rijckevorsel, J, and de Leeuw, J. 1988 correspondence analysis" Wiley.

"Component and

De Leeuw 1984 "The Gifi-system of nonlinear mUltivariate analysis, in Data analysis and informatics II It, edited by E. Diday, Amsterdam, North Holland.

Breiman, L., and Friedman, - J. 1985 transformations for multiple regression Amer. statist. Assoc. 80.

"Estimating optimal and correlation." J.

Koyak, R. 1987 : "On measuring internal dependence in a set of random variables" Ann. statist. 15.

Box, G.E. and Cox, D.R. 1964 : "An analysis of transformations." Journal of Royal statistical Society, 26.

Eubank R. L. 1988 "Spline Smoothing and nonparametric regression" Marcel Dekker, New York.

Hardle, W., Hall, P. and Marron, S 1988 "How far are automatically chosen regression smoothing parameters from their optimum. with Discussion." J. Amer. statist. Assoc. 33.

Kendall, Sir Maurice, 1975 "Multivariate Analysis" Charles Griffin & company Ltd., London and High Wycombe.

000388813 .

Documents

VISUALIZATION OF THE MULTI-DIMENSIONAL SPEECI-I … · 2016-12-28 · VISUALIZATION OF THE MULTI-DIMENSIONAL SPEECI-I PARAMETER SPACE by Andrew Poon Ngai Ho l ... Despite some initial