VISUALIZATION OF THE MULTI-DIMENSIONAL
SPEECH PARAMETER SPACE
by
Andrew Poon Ngai Ho
A thesis submitted in partial fulfilment of the requirements
for the degree of Master of Science
Division of Information Engineering
The Chinese University of Hong Kong
May 1993
ABSTRACT
Speech data has long been considered complex and difficult to comprehend and manipulate. The parameters describing these complicated data are in general multi-dimensional. The main purpose of the work reported in this thesis is to visualize, literally, these multi-dimensional speech parameters. It is hoped that interesting features can be spotted, providing additional insight into the properties of speech. More importantly, this may help to reduce the dimensionality of the speech parameter space.
The thesis provides a comprehensive background to the subject. Various speech data representations are presented. Various methods for the visualization of multi-dimensional data and for dimension reduction are also considered. The chosen speech data representations, LPC coefficients and LPC cepstral coefficients, and the chosen method of dimension reduction, Sliced Inverse Regression, are discussed in more detail. Computer programs were developed to transform the raw speech data and to perform the visualization.
The results obtained were indicative. Using Sliced Inverse Regression, the LPC coefficients were found to lie on a non-linear surface. The LPC cepstral coefficients, on the other hand, did not reveal any interesting features, which indicates that they are inappropriate for the analysis in this thesis.
ACKNOWLEDGMENTS
I would like to express my special thanks to Dr. Edmund Lai for his continual support and guidance on this project. It was Dr. Lai's original idea to investigate the topic of this thesis. Without his initial inspiration, this research would not have been started at all. During the lengthy and painful research process, Dr. Lai generously offered his assistance in suggesting directions and pointing out possible pitfalls when something seemed to go wrong.
I would also like to express my thanks to the computer technicians in the Information Engineering Department, especially Andy Lai. They gave me much help in operating the computers in the University. Without their assistance, it would have taken much more time to become accustomed to the computers (especially the PC-to-host connection) before fully mastering them for the real work.
Last but not least, I would like to thank my wife, Flora Choi. She had to tolerate the times when her husband neglected her in favour of his project. On the practical side, she even helped me prepare the tables and some of the graphic formatting in this thesis. Moreover, it was her loving moral support that prevented me from quitting when seemingly unsolvable difficulties were encountered. Without her, this thesis would never have been completed.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGMENTS
1. INTRODUCTION
2. REPRESENTATION OF SPEECH DATA
   2.1 SAMPLE DATA REPRESENTATION
   2.2 ANALOG LINEAR SYSTEM MODEL
   2.3 DISCRETE FOURIER TRANSFORM
   2.4 FILTER BAND REPRESENTATION
   2.5 LINEAR PREDICTIVE CODING (LPC)
   2.6 LPC CEPSTRAL COEFFICIENT
3. MULTI-DIMENSIONAL ANALYSIS
   3.1 PURE GRAPHICAL TOOLS
       3.1.1 MULTI-HISTOGRAM
       3.1.2 STARS
       3.1.3 SPIKED SCATTERPLOT
       3.1.4 GLYPHS
       3.1.5 BOXES
       3.1.6 LIMITATIONS OF THE BASIC METHODS
       3.1.7 CHERNOFF FACES
       3.1.8 ANDREWS' CURVE
       3.1.9 LIMITATIONS OF CHERNOFF FACES AND ANDREWS' CURVE
       3.1.10 SCATTERPLOT MATRIX
       3.1.11 PARALLEL-AXIS SYSTEM
       3.1.12 COMMON BASIC PITFALL
   3.2 PURE PROJECTION METHODS
       3.2.1 PRINCIPAL COMPONENTS ANALYSIS
       3.2.2 PRINCIPAL CO-ORDINATES ANALYSIS
       3.2.3 REGRESSION ANALYSIS
   3.3 SLICED INVERSE REGRESSION (SIR)
4. DATA ANALYSIS
   4.1 PROGRAMS AND TEST DATA
   4.2 ACTUAL SPEECH DATA RESULTS
       4.2.1 SINGLE UTTERANCE OF "4" BY SPEAKER A ONLY
       4.2.2 TWELVE UTTERANCES OF "4" BY SPEAKER A
       4.2.3 THREE UTTERANCES PER SPEAKER OF "4" BY SPEAKER A, B AND C
       4.2.4 TWO UTTERANCES PER DIGIT OF "1" TO "9" BY SPEAKER A
       4.2.5 ONE UTTERANCE PER DIGIT PER SPEAKER OF "1" TO "9" BY SPEAKER A, B, C
5. CONCLUSION AND FURTHER WORKS
   5.1 CONCLUSION
   5.2 FURTHER WORKS
REFERENCES
APPENDIX 1  MATLAB PROGRAM LISTING FOR SIR
APPENDIX 2  C PROGRAM LISTING FOR ROTATIONAL VIEW
APPENDIX 3  C PROGRAM LISTING FOR LPC AND CEPSTRAL TRANSFORMS
APPENDIX 4  ALL VIEWS, EIGENVALUES AND EIGENVECTORS FOR SINGLE UTTERANCE OF "4" BY SPEAKER A
APPENDIX 5  ALL VIEWS, EIGENVALUES AND EIGENVECTORS FOR 12 UTTERANCES OF "4" BY SPEAKER A
APPENDIX 6  ALL VIEWS, EIGENVALUES AND EIGENVECTORS FOR 5 UTTERANCES PER SPEAKER OF "4" BY SPEAKER A, B, C
APPENDIX 7  ALL VIEWS, EIGENVALUES AND EIGENVECTORS FOR 2 UTTERANCES PER DIGIT OF DIGIT "1" TO "9" BY SPEAKER A
APPENDIX 8  ALL VIEWS, EIGENVALUES AND EIGENVECTORS FOR 1 UTTERANCE PER SPEAKER PER DIGIT OF "1" TO "9" BY SPEAKER A, B, C
CHAPTER 1 INTRODUCTION
Speech processing has a history that began long before the invention of the telephone in 1876. As early as the late 18th century, research was already being done on the engineering aspects of speech. Initial interest concentrated on mechanical speech synthesizers, which tried to produce the human voice mechanically [Furui 89]. This led to further and deeper research on vocal vibration and hearing mechanisms. Today, speech recognition is one of the hottest topics in artificial intelligence research. Speech synthesis and storage are also beginning to move from academic research to commercial applications (e.g. voice mail, tapeless recording phones, electronic dictionaries).
Despite some initial success, the complexity of the sound waveform still poses great difficulties for research and development on speech. To cope with the problem, scientists and engineers have used many other methods to represent the human voice with a finite number of discrete parameters instead of dealing directly with the complex and continuous sound waveform. Many models exist. However, no matter which kind of modelling is used, a considerable number of parameters is still needed to represent the actual human voice closely. Human visual experience limits our ability to deal with more than three dimensions. Difficulties in analyzing human voice data therefore remain: it is still hard for a human to deal with the multi-dimensional parameters directly. Though some advances have been made recently, statistical analysis of multivariate data has not yet reached maturity. Multi-dimensional data remain an obstacle to further analysis.
On the other hand, even at an early stage of speech research (e.g. Flanagan 1965), researchers suggested that actual speech data may be close to 2-dimensional in nature. Is it possible that the speech data lie on a hyper-surface in the original multi-dimensional space? How can we reduce these multi-dimensional data to 2 or 3 dimensions? How can we find their distribution and characteristics? These are the main questions which a proper visualization of the speech parameter space may enable us to answer.
If speech parameters can basically be represented in 2 dimensions, that is, if they lie on a hyper-surface, it would be a major discovery in speech research. Speech data could theoretically be compressed to 2 parameters only, which would save a great deal of storage space and transmission time. 2-dimensional data are also much easier to manipulate in speech synthesis and in other analytical work. In addition, a 2-dimensional data space would make speech recognition a much simpler job, since the system would only have to recognise a 2-dimensional point. A major breakthrough in speech research would therefore follow.
This thesis is divided into five chapters. Chapter 1, as already seen, provides the basic background of the problem. There are many possible ways to represent speech data in discrete parameters; these methods are presented in Chapter 2. The forms of speech representation finally chosen, namely LPC (Linear Predictive Coding) coefficients and LPC cepstral coefficients, are discussed in more detail. General methods for handling multi-dimensional data are discussed in Chapter 3. The method of reducing dimensionality for visualization used in this thesis is SIR (Sliced Inverse Regression), which is also discussed in detail. The spotting of further features in the speech data is left to the human eye. The actual programs and test data are briefly explained in Chapter 4, and the results obtained with them are presented and discussed later in the same chapter. The last chapter, Chapter 5, reviews the whole thesis and suggests further work.
CHAPTER 2 REPRESENTATION OF SPEECH DATA
Human voice signals are very complex, even for a single vowel sound. The vocal system changes continuously during the utterance of even a simple sound, so it can be regarded as a complex time-variant system. For simplicity, researchers approximate human voice production as a time-invariant system over short intervals. Actual voice signals generally change very little within approximately 20 ms, so within 20 ms the system can be approximated by a time-invariant one. Many researchers therefore use a frame width of 20 ms for analysis. In practice, however, a sharp-edged window is not used, because abrupt changes at the frame boundaries usually cause large inaccuracies in the mathematical analysis. Researchers therefore usually increase the frame length slightly, keep approximately the same frame advance, and apply a bell-shaped weighting window to smooth the frame boundaries [FIG. 1].
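The framing and windowing step can be sketched as follows. This is a minimal illustration, not the thesis program: the Hamming window, the 30 ms frame length and the function name `frame_signal` are assumptions chosen for the example; only the 20 ms frame advance and the 8 kHz sampling rate come from the text.

```python
import math

def frame_signal(samples, rate_hz=8000, advance_ms=20.0, length_ms=30.0):
    """Split samples into overlapping frames, each weighted by a Hamming window.

    The Hamming window and the 30 ms length are illustrative assumptions.
    """
    advance = int(round(rate_hz * advance_ms / 1000.0))  # samples between frame starts
    length = int(round(rate_hz * length_ms / 1000.0))    # samples per frame
    # Bell-shaped weighting that tapers towards both frame edges.
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
              for n in range(length)]
    frames = []
    start = 0
    while start + length <= len(samples):
        frames.append([samples[start + n] * window[n] for n in range(length)])
        start += advance
    return frames

# One second of a 100 Hz tone sampled at 8 kHz.
signal = [math.sin(2.0 * math.pi * 100.0 * n / 8000.0) for n in range(8000)]
frames = frame_signal(signal)
```

Each frame overlaps its neighbour by the difference between the frame length and the frame advance, so every sample near a frame boundary is also seen near the centre of an adjacent frame.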
2.1 SAMPLE DATA REPRESENTATION
A direct way to represent speech data is simply to use the sampled values of the speech signal [FIG. 2], with the sampling rate being the Nyquist rate. For a 20 ms frame and a sampling rate of 8 kHz, for instance, the number of parameters (sampling points) is 160. This is obviously excessive. Moreover, the dimension depends on the frame length and the sampling rate. This raw data approach is definitely not a good representation of speech for further analysis.
[FIG. 1 FRAMING AND WINDOWING OF THE SPEECH DATA IN TIME DOMAIN. Labels: voice signal, frame, abrupt window, actual window; frame length 20 ms.]
2.2 ANALOG LINEAR SYSTEM MODEL
Before the invention of PCM and digital signal processing, research was limited to analog systems. Early systems tried to break down the voice production system and imitate its functions separately. Using mechanical and/or electrical means, researchers tried to model in analog form the lungs, vocal tract, vocal cords, oral and nasal cavities, lips, tongue, etc., and to produce human voices correspondingly. In actual human voice production, air flow, glottal contraction rate and tension, resonance in the cavities, and the effects of the lips and tongue together make up an extremely complex system. Many different models exist: the Linear Separable Equivalent Circuit Model, the Vocal Tract Transmission Model, the Vocal Cord Model, the Articulatory Model, etc. Each of them views the system from a different perspective and emphasizes a different aspect of the actual system. Modelling any one of them involves quite a number of parameters, e.g. air-flow rate, dimensions of the glottis, etc. Initially, some apparent success was achieved in speech production, although the resulting artificial voices were still not very good. To improve the results, more parameters and finer tuning of the parameters are needed. However, the improvement attained is far from proportional to the effort put in. As the number of parameters increases, fine tuning becomes difficult because the effects of the parameters on the output voice are not independent: adjusting one parameter changes the influence of some other parameters. The system thus becomes very difficult to fine tune.
Recently, many researchers have put aside the direct modelling of the human vocal system because of its growing complexity. Instead, they concentrate on representing the overall resultant speech data. Mathematical tools are used instead of direct analog means. Examples are the DFT, band-pass filtering, LPC and cepstral transforms.
2.3 DISCRETE FOURIER TRANSFORM
The simple Discrete Fourier Transform (DFT) faces the same problems as the raw data approach: too many dimensions, and dependency on the frame length and the sampling rate. The only difference is that one representation is in the time domain and the other is in the spectral domain.
2.4 FILTER BAND REPRESENTATION
Band-pass filtering is widely used commercially in vocoders because of its simplicity and low cost. Each filter passes one band of the spectrum, and a value corresponding to the energy within that band is read as a parameter. The number of bands therefore determines the number of parameters. Either analog or digital filters can be used [FIG. 3]. However, to be accurate, a large number of bands is still needed. Moreover, as the bands get narrower, the band-pass filters have to be steeper in their characteristics, which poses a practical problem for this approach. If digital filters are used, there is a conflict between the time resolution and the frequency resolution.
[FIG. 3 BLOCK DIAGRAM OF BAND-FILTERING]
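The idea of reading one energy value per band can be sketched digitally as follows. This is only a stand-in for a real filter bank: it sums DFT power over equal-width bands rather than running band-pass filters, and the function name `band_energies`, the equal-width band layout and the 8-band example are assumptions made for illustration.

```python
import cmath
import math

def band_energies(frame, num_bands):
    """Return the spectral energy in num_bands equal-width bands from 0 Hz to half the sampling rate."""
    n = len(frame)
    half = n // 2
    # Power of the non-negative frequency bins (direct O(n^2) DFT, adequate for a sketch).
    power = []
    for k in range(half):
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(coeff) ** 2)
    energies = [0.0] * num_bands
    for k, value in enumerate(power):
        band = min(k * num_bands // half, num_bands - 1)  # band index of DFT bin k
        energies[band] += value
    return energies

# A 1 kHz tone in a 20 ms frame sampled at 8 kHz (160 samples).
tone = [math.sin(2.0 * math.pi * 1000.0 * t / 8000.0) for t in range(160)]
energies = band_energies(tone, 8)
```

With 8 bands over 0 to 4 kHz each band is 500 Hz wide, so the tone's energy falls in the band covering 1000 to 1500 Hz. The conflict mentioned above shows up directly: narrower bands need a longer frame (more DFT bins), which coarsens the time resolution.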
Besides the DFT, other forms of digital signal processing are also popular. DSP has become popular recently mainly because of the invention of PCM and the advances in digital technology. One of the most popular methods of representing voice data in recent years is the use of Linear Predictive Coding coefficients. Another choice is the cepstral coefficients. These two methods of speech data representation are used in this research and are discussed in more detail in the following sections.
2.5 LINEAR PREDICTIVE CODING (LPC)
The two most important papers to appear since 1960 were presented at the 6th International Congress on Acoustics held in Tokyo, Japan, in 1968: the paper on a speech analysis-synthesis system based on the maximum likelihood method, presented by NTT's Electrical Communications Laboratories, and the paper on predictive coding, presented by Bell Laboratories. These papers essentially gave the greatest thrust to progress in speech information processing technology; in other words, they opened the way to digital speech processing technology. Specifically, both papers deal with information compression techniques using the linear prediction of speech waves and are based on mathematical techniques for stochastic processes. These techniques gave rise to Linear Predictive Coding (LPC) [Furui 89]. LPC is now a very widely used method of representing voice data, both in research and in commercial applications. Many speech recording and talking chips on the market employ LPC or its variations in their algorithms (e.g. those of Analog Devices Inc., DSP Group Inc., Zilog Inc., etc.) [HKPC92]. The basic principle of LPC is as follows.
Assume discrete signal samples $x_1, x_2, x_3, x_4, \ldots$, where the sampling frequency $F_s > 2F_b$ and $F_b$ is the signal bandwidth. Let us assume the following first-order linear combination between the present sample value $x_t$ and the previous p samples,
$$x_t = \sum_{i=1}^{p} a_i x_{t-i} + \varepsilon_t$$
where $\varepsilon_t$ is an uncorrelated statistical variable having a mean value of 0 and a variance of $\sigma^2$. The signal $x_t$ can thus be thought of as being "linearly predicted" from the previous values $x_{t-i}$ with an error term $\varepsilon_t$. By adjusting the weighting coefficients $a_i$, it is possible to minimize the rms value of $\varepsilon_t$. The $a_i$ thus obtained are the LPC coefficients. The diagram of an LPC synthesizer is depicted in Fig 4. The value of p is called the order of the LPC analysis. The greater the p, the smaller the residual error. However, for band-limited data, only the first 10 to 14 LPC coefficients are significant. Even for a much larger value of p, the coefficients become relatively insignificant after the 12th coefficient; usually they are at least an order of magnitude smaller than the first few coefficients. Moreover, the reduction of the residual error is also relatively insignificant for orders larger than 12 [FIG. 5][TI 1987]. Therefore, for practical purposes, many researchers use an order of 10 to 14 for analysis. The order used in this research is 12. These 12 coefficients are then the parameters of a short interval (assumed time invariant) of human voice. In other words, we will be dealing with 12-dimensional data.
[FIG. 4: diagram of an LPC synthesizer]
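As a worked step, the minimization can be written out under two assumptions that are made here for illustration: the prediction model is taken in the form $x_t = \sum_i a_i x_{t-i} + \varepsilon_t$, and the signal is treated as stationary within the frame with autocorrelation $r(k) = \mathrm{E}[x_t x_{t-k}]$ (a symbol introduced for this sketch). Setting the derivative of the mean-square error with respect to each coefficient to zero gives a set of p linear equations:

```latex
E = \mathrm{E}\big[\varepsilon_t^2\big]
  = \mathrm{E}\Big[\Big(x_t - \sum_{i=1}^{p} a_i x_{t-i}\Big)^2\Big]

\frac{\partial E}{\partial a_j}
  = -2\,\mathrm{E}\Big[\Big(x_t - \sum_{i=1}^{p} a_i x_{t-i}\Big)\,x_{t-j}\Big] = 0,
  \qquad j = 1, \ldots, p

\sum_{i=1}^{p} a_i\, r(|i-j|) = r(j), \qquad j = 1, \ldots, p
```

For the order p = 12 used in this research, this is a system of 12 equations in the 12 unknown coefficients.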
To obtain the LPC coefficients, i.e. the $a_1, a_2, \ldots, a_p$ which minimize the residual error, a set of simultaneous linear equations has to be solved. These can be solved by the covariance method, using Cholesky decomposition, or by the autocorrelation method, using Durbin's recursive solution. The former method is more efficient than the latter, but it is less accurate when the number of samples is small and the temporal variation is large [Furui 1989]. Therefore, in our research, we use the latter method for our analysis.
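The autocorrelation method with Durbin's recursion can be sketched as follows. This is an illustrative version, not the C program of the thesis: the function names are hypothetical, no window is applied, and the sign convention assumes the predictor $x_t \approx \sum_i a_i x_{t-i}$.

```python
def autocorrelation(frame, max_lag):
    """Short-time autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations recursively.

    Returns (a[1..order], residual energy) for the predictor x_t ~ sum(a_i * x_{t-i}).
    """
    a = [0.0] * (order + 1)   # a[0] is unused so that a[i] matches the text
    error = r[0]              # prediction error energy of the order-0 predictor
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error       # reflection coefficient (assumes a non-silent frame)
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= (1.0 - k * k)
    return a[1:], error

# A decaying exponential is the impulse response of a first-order all-pole system.
frame = [0.9 ** t for t in range(200)]
coeffs, residual = levinson_durbin(autocorrelation(frame, 2), 2)
```

For this test frame the first coefficient comes out close to 0.9 and the second close to 0, since one previous sample already predicts the signal almost perfectly. For the analysis in this thesis the order would be 12.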
2.6 LPC CEPSTRAL COEFFICIENT
The cepstral transform is a technique used to separate the spectral envelope from the spectral fine structure. It is defined as the inverse Fourier transform of the short-time logarithmic amplitude spectrum [Bogert et al., 1963; Noll, 1964; Noll, 1967]. The block diagram of the cepstral transform is depicted in Fig 6.
[FIG. 5 VARIATION OF THE RMS PREDICTION ERROR WITH THE NUMBER OF PREDICTOR COEFFICIENTS, P. Curves: unvoiced speech, voiced speech. Horizontal axis p from 0 to 20; vertical axis from 0 to 1.0.]
[FIG. 6: block diagram of the cepstral transform]
The basic concept of cepstral transform is as follows. If
the magnitude of the Fourier transform of a signal is a
multiplication of two functions, an envelope component and
a fine structure component, then taking the logarithm of the
spectrum will convert the multiplication relationship to an
additive one. The inverse Fourier Transform will then
separate the two functions.
$$x(t) = a(t) * b(t) \quad \text{(convolution)}$$
$$F(x(t)) = X(\omega) = F(a(t))\,F(b(t)) = A(\omega)\,B(\omega) \quad \text{(multiplication)}$$
$$\log X(\omega) = \log A(\omega) + \log B(\omega)$$
$$F^{-1}(\log X(\omega)) = F^{-1}(\log A(\omega)) + F^{-1}(\log B(\omega))$$
Thus the cepstral transform separates these two spectral components. The spectral envelope corresponds to significant values in the first few cepstral coefficients, while the finer spectral variation is represented by the later, smaller coefficients. The cepstral transform therefore essentially separates two convolutionally related properties by transforming the relationship into a summation. This is a kind of homomorphic analysis or filtering [Oppenheim and Schafer 1968].
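The additive property can be checked numerically with a small sketch. The code below is an illustration under stated assumptions, not the thesis program: it uses a direct DFT, the log of the magnitude spectrum, circular rather than linear convolution, and a small `floor` value to avoid taking the logarithm of zero; all function names are hypothetical.

```python
import cmath
import math

def dft(values, inverse=False):
    """Direct discrete Fourier transform (or its inverse) of a short sequence."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    out = [sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def real_cepstrum(frame, floor=1e-12):
    """Inverse DFT of the log magnitude spectrum of one frame."""
    log_magnitude = [math.log(max(abs(v), floor)) for v in dft(frame)]
    return [v.real for v in dft(log_magnitude, inverse=True)]

def circular_convolve(a, b):
    """Circular convolution, whose DFT is exactly the product of the two DFTs."""
    n = len(a)
    return [sum(a[j] * b[(t - j) % n] for j in range(n)) for t in range(n)]

a = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
b = [1.0, 0.0, 0.0, -0.3, 0.0, 0.0, 0.0, 0.0]
combined = real_cepstrum(circular_convolve(a, b))
summed = [ca + cb for ca, cb in zip(real_cepstrum(a), real_cepstrum(b))]
```

Here `combined` and `summed` agree up to rounding error, which is the convolution-to-summation property described above.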
The term cepstrum, formed by rearranging the letters of spectrum, conveys the idea of an inverse transform of the spectrum. The independent parameter of the cepstrum is called quefrency, which is likewise formed from the word frequency. Since the cepstrum is the inverse transform of a frequency-domain function, the quefrency is a time-domain parameter. The separation of the spectral envelope and the fine structure is termed liftering, a word derived from filtering.
Now, instead of applying the cepstral transform to a time-domain signal, we apply it to the LPC coefficients. The coefficients obtained are the LPC cepstral coefficients. Perhaps only the first 3 or 4 coefficients are significant, owing to the envelope extraction property of the cepstral transform.
These cepstral coefficients must be distinguished from the LPC cepstrum transform, which is a cepstral transform applied to the time-domain speech signal using the z-transform of the impulse response of an all-pole speech production system estimated by LPC analysis, instead of the Fourier transform mentioned above [Furui 1989].
For simplicity, in the rest of this thesis the LPC cepstral coefficients are simply called cepstral coefficients and the transform is simply called the cepstral transform.
CHAPTER 3 MULTI-DIMENSIONAL ANALYSIS
Given the 12 LPC coefficients or the cepstral coefficients,
How can we analyze them?
Are there any interesting features that lie
within these 12-dimensional data?
How can we know whether all these points lie on
some hyper-surface?
3.1 PURE GRAPHICAL TOOLS
The human eye has long been the most useful tool for spotting the characteristics of a distribution, provided that the data are presented well. A graph is often a better summary than thousands of words or thousands of numerical figures in tables. Line graphs, bar charts, pie charts, etc. have long been used to depict 2-dimensional data. Many ways have also been devised to present multi-dimensional data. These methods include multi-histograms, stars, spiked scatterplots [Gower 1967], glyphs [Anderson 1960], boxes [Hartigan 1975], Andrews' curves [Andrews 1972], and Chernoff's faces [Chernoff 1973].
3.1.1 MULTI-HISTOGRAM
The most natural and the simplest way of representing a p-dimensional observation is by using p vertical columns or bars. The bars are placed next to each other, with the height of each bar proportional to the observed value in the corresponding dimension. Such a bar diagram is constructed for every observation vector. Alternatively, the midpoints of the tops of the bars can be joined to give the profiles a more continuous appearance. An example in 6 dimensions is depicted in FIG 7.
3.1.2 STARS
Stars or polygons represent the p measurements of a case on equally spaced radii extending from the centre of a circle. Any negative measurements can be adjusted by a suitable transformation. The measurements are then linked to form a star. Such a star can be formed for every observation vector. This star representation is also called a circle diagram or snowflake. The same 6-dimensional points are shown in Figure 8 [Everitt 1978].
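The construction of one star can be sketched as a small computation. The function name `star_vertices` and the choice of starting the first radius along the positive horizontal axis are assumptions made for illustration; the values are assumed to have been made non-negative beforehand.

```python
import math

def star_vertices(observation):
    """Return the (x, y) vertices of the star for one observation with non-negative values.

    Variable i is drawn on the radius at angle 2*pi*i/p from the circle centre.
    """
    p = len(observation)
    return [(value * math.cos(2.0 * math.pi * i / p),
             value * math.sin(2.0 * math.pi * i / p))
            for i, value in enumerate(observation)]

# One 6-dimensional observation; joining consecutive vertices (and the last
# back to the first) gives the star outline.
vertices = star_vertices([3.0, 1.0, 2.0, 4.0, 2.5, 1.5])
```

Plotting one such closed polygon per observation vector gives the snowflake display described above.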
3.1.3 SPIKED SCATTERPLOT
The 2-D scatterplot has long been a standard representation for 2-dimensional data. Modifying the 2-D scatterplot to accommodate more dimensions therefore seems an obvious choice. For example, [Gower 1967] reports a method due to Ross in which the first two variable values of each observation are plotted in the usual way to give, for a particular observation, a point R, say. The third variable value for this observation is then represented as a line originating from R, with length proportional to this value, directed eastwards if the value is positive and westwards if negative. Similarly, the fourth dimension uses the north and south directions, and if five and six dimensions are required, the N.E./S.W. and N.W./S.E. directions could be used. Figure 9 shows the same set of 6-dimensional data as above.
[FIG 7 MULTI-HISTOGRAMS (PROFILES) OF FIVE 6-DIMENSIONAL POINTS]
[FIG 8 STAR REPRESENTATION OF FIVE 6-DIMENSIONAL POINTS]
3.1.4 GLYPHS
Anderson (1960) proposed that the measurements of each of the p variables in a specific case be represented as rays from a circle. These representations are known as glyphs. This representation differs from the star representation discussed previously in two ways. Firstly, all the rays are drawn from the top of the circle to the outside, and secondly, the end points of the rays are not linked. Such a glyph is drawn for every observation vector. An example is depicted in Figure 10.
3.1.5 BOXES
Hartigan (1975) recommended that boxes be used for representing multivariate data. If p = 3, each observation vector is represented by one box whose length, width and height correspond to the $x_1$, $x_2$ and $x_3$ observations. All negative values are made positive by the addition of an appropriate constant. For p > 3, a single side of the box can be used to represent more than one variable by dividing the side into segments. This is shown in Figure 11.
3.1.6 LIMITATIONS OF THE BASIC METHODS
All the above methods are relatively naive and can be applied only if the total dimension is small, say 5 to 6; they are mostly used for observing simple tendencies or for simple comparisons. For dimensions larger than 8, it is extremely difficult, if not impossible, to observe any tendencies or clustering. Moreover, the result depends heavily on the ordering of the dimensions. If the ordering is chosen well, interesting features may be observed easily. If a poor ordering is used, even a simple clustering will be difficult to spot. Anderson suggested that the variables in glyphs should be ordered so that highly correlated ones are, as far as possible, assigned to adjacent rays (as depicted in Figure 10). For the box representation it is also clear that grouping highly correlated variables together on one side of a box reduces the variation between the segment lengths of a side for individual boxes.
[FIG 9 SPIKED SCATTERPLOT OF FIVE 6-DIMENSIONAL POINTS]
[FIG 11 BOXES REPRESENTATION OF FIVE 6-DIMENSIONAL DATA]
More complex representations, which can accommodate a higher number of dimensions and are less susceptible to the ordering of the dimensions, are Chernoff faces and Andrews' curves.
3.1.7 CHERNOFF FACES
Considerable interest has been shown in the representation of multivariate data points by means of faces (or rather, characteristics of the human face). It is suggested that humans are accustomed to distinguishing accurately between faces with different features, so that widely divergent facial features can be associated with different variables. For example, $x_1$ can be associated with the size of the mouth, $x_2$ with the size of the nose, and so on. A computer program can easily be written to generate the faces corresponding to different multi-dimensional vectors. A simple example is shown in Figure 12.
3.1.8 ANDREWS' CURVE
Instead of using direct association, Andrews (1972) proposed the use of more complex harmonic functions for presenting multivariate data points. For an observation vector $\mathbf{x} = (x_1, x_2, \ldots, x_p)$ he introduced the function
$$f_{\mathbf{x}}(t) = \frac{x_1}{\sqrt{2}} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \cdots, \qquad -\pi < t < \pi$$
Two important properties of Andrews' function are:
1) The function representation preserves means. This implies that the function of the mean vector of the observed vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ is the pointwise mean of the functions $f_1(t), f_2(t), \ldots, f_n(t)$.
2) The function representation preserves distances. It can be proved that the distance between two functions $f_1(t)$ and $f_2(t)$ measured by
$$\int_{-\pi}^{\pi} [f_1(t) - f_2(t)]^2 \, dt$$
is proportional to the squared Euclidean distance between the observation vectors $\mathbf{x}_1$ and $\mathbf{x}_2$.
These two properties make Andrews' curves useful for clustering observation points into homogeneous groups and for comparing individual functions with the mean function. An example is shown in Figure 13.
[FIG 12 AN EXAMPLE OF A CHERNOFF FACE]
Variable  Feature
1         NW angle of face
2         NE angle of face
3         Left eyebrow
4         Right eyebrow
5         Socket of left eye
6         Pupil of left eye
7         Socket of right eye
8         Pupil of right eye
9         Shape of nose
10        Shape of mouth
11        SW angle of face
12        SE angle of face
[FIG 13 ANDREWS' CURVE FOR FIVE 6-DIMENSIONAL POINTS. Horizontal axis from -3.0 to 3.0; vertical axis from -30 to 110.]
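Both properties can be checked numerically with a short sketch. It assumes the usual form of Andrews' function, $x_1/\sqrt{2} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + \cdots$ on $(-\pi, \pi)$; the function names, the two example vectors and the rectangle-rule integration are choices made for this illustration only.

```python
import math

def andrews(x, t):
    """Andrews' function f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + ..."""
    total = x[0] / math.sqrt(2.0)
    for i in range(1, len(x)):
        harmonic = (i + 1) // 2   # 1, 1, 2, 2, 3, ...
        if i % 2 == 1:
            total += x[i] * math.sin(harmonic * t)
        else:
            total += x[i] * math.cos(harmonic * t)
    return total

def squared_distance(x, y, steps=2000):
    """Rectangle-rule estimate of the integral of [f_x(t) - f_y(t)]^2 over (-pi, pi)."""
    h = 2.0 * math.pi / steps
    return h * sum((andrews(x, -math.pi + k * h) - andrews(y, -math.pi + k * h)) ** 2
                   for k in range(steps))

x1 = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0]
x2 = [0.0, 1.0, 1.5, 2.0, -1.0, 1.0]
euclid_sq = sum((a - b) ** 2 for a, b in zip(x1, x2))
ratio = squared_distance(x1, x2) / euclid_sq   # constant of proportionality
mean_vector = [(a + b) / 2.0 for a, b in zip(x1, x2)]
```

The value of `ratio` comes out as pi for any pair of vectors, and `andrews(mean_vector, t)` equals the average of `andrews(x1, t)` and `andrews(x2, t)` at every t, because the function is linear in the observation vector.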
3.1.9 LIMITATIONS OF CHERNOFF FACES AND ANDREWS' CURVE
Still, both Chernoff faces and Andrews' curves suffer from the same major drawback as all the representations mentioned above. When the number of data points is in the order of hundreds or thousands, it becomes almost impossible for the human eye to deal with so many figures, whatever they may be. Also, all these representations are good for clustering analysis only. For other characteristics, such as whether the points lie on a hypersurface, all these methods seem useless. Some other method should be investigated.
3.1.10 SCATTERPLOT MATRIX
The well-known scatterplot matrix is one of the most widely used practical tools for multi-dimensional data analysis [Cleveland 1987]. It simply arranges the 2-dimensional scatterplots of every pair of variables in a matrix, as in Figure 14. It is good for viewing clustering and is handy for data manipulation. However, for viewing characteristics of the data other than clustering, the scatterplot matrix becomes less useful. Imagine what would be seen in the scatterplot matrix if the data lay roughly on the surface of a hypersphere: every plot would look like a similar single round cluster. A great deal of intelligent (and lucky) guesswork in data manipulation would be needed to figure out this hypersphere characteristic. Moreover, for interactive data manipulation using a scatterplot matrix, the total dimension should be less than 7 or 8. The total number of plots is 64 when the dimension is 8. Even if we neglect the diagonal and the plots above it (which are the transposes of those below it), 28 plots remain. It is almost impossible for human vision and the human mind to handle so many plots at one time, even with a computer screen large enough to accommodate tens or even hundreds of scatterplots, let alone when screen scrolling has to be used in practice. For our speech data analysis, the total dimension is 12. Therefore the scatterplot matrix is not a suitable tool for our analysis.
[FIG 14 SCATTERPLOT MATRIX OF A SINGLE 5-DIMENSIONAL POINT. Axes D1 to D5.]
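The panel counts quoted for the scatterplot matrix follow from simple arithmetic, sketched here with a hypothetical helper: a p-dimensional data set gives p × p panels in the full matrix and p(p − 1)/2 panels strictly below the diagonal.

```python
def scatterplot_panels(p):
    """Return (all panels, panels strictly below the diagonal) for p dimensions."""
    return p * p, p * (p - 1) // 2

for p in (5, 8, 12):
    total, lower = scatterplot_panels(p)
    print(p, total, lower)
```

For p = 8 this gives 64 and 28 panels; for the 12-dimensional LPC data it gives 144 and 66, which is well beyond what can be inspected at one time.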
3.1.11 PARALLEL-AXIS SYSTEM
Another multivariate visual tool is the parallel-axis (or
parallel-coordinate) system [Inselberg & Dimsdale 1987].
Dimensions are represented by evenly spaced vertical axes.
The value (projection) on a dimension is represented by
an intersection on that particular axis. Lines are then used
to connect all those intersections. Therefore, a point
in the actual multi-dimensional space is represented by
a line pattern in the parallel coordinate system (FIG 15).
This system is a compact representation which
compresses the multi-dimensional information into a single
2-dimensional plot. However, the plot becomes messy if the
number of data points is large, say a few hundred. It is
then difficult to spot interesting characteristics on one
screen even if enlargement of a small window is
allowed. Moreover, the characteristics which can be spotted
from the parallel coordinate system are limited. For
example, you may be able to spot that the data points lie
on a hypersurface provided that you are sure that the
hypersurface is a convex conic. For a convex conic, the
envelope of the plot will be an open-ended conic (FIG 16).
However, if the hypersurface is not known to be a convex
conic, then even if we have an open-ended conic envelope,
no conclusion can be made. We have to dig into the
individual lines instead of simply the envelope for more
detail. The parallel-axis representation may be useful in
other application areas such as air traffic control
(including the dimension of time, the total dimension is
4). Research on the parallel-axis system is not yet mature
enough to provide a simple and general viewing method for
spotting interesting characteristics. Therefore, this tool
is still not a good enough choice for our purpose.
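The mapping from a multi-dimensional point to its line pattern can be sketched as follows. This is a minimal Python illustration, not the implementation of Inselberg and Dimsdale; the function name, the unit axis spacing and the rescaling of each axis to the range 0 to 1 are assumptions made here for the example.

```python
def parallel_polyline(point, lows, highs, spacing=1.0):
    """Map one d-dimensional point to the vertices of its polyline.

    Axis k is drawn as the vertical line x = k * spacing. The value on
    dimension k is rescaled to [0, 1] using that axis's (low, high) range.
    """
    vertices = []
    for k, (value, low, high) in enumerate(zip(point, lows, highs)):
        # A degenerate axis (high == low) is drawn at mid-height.
        height = (value - low) / (high - low) if high > low else 0.5
        vertices.append((k * spacing, height))
    return vertices


# One 3-dimensional point; joining the vertices in order gives its pattern.
print(parallel_polyline([1, 5, 0], lows=[0, 0, 0], highs=[2, 10, 4]))
```

Every data point produces one such polyline, which is why a few hundred points already clutter the plot.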
3.1.12 COMMON BASIC PITFALL
The main basic problem with the above methods is that they
try to preserve all multi-dimensional details in their 2-
dimensional representation. Too much information will appear
on the 2-dimensional representation and it will become
difficult to spot any interesting characteristics. Some sort
of information compression should be used.
FIG 15 A SINGLE 6-DIMENSIONAL POINT USING PARALLEL CO-ORDINATE REPRESENTATION

FIG 16 6-DIMENSIONAL POINTS ON A CONVEX SURFACE USING PARALLEL CO-ORDINATE REPRESENTATION
3.2 PURE PROJECTION METHODS
Another way to analyze multi-dimensional data is by the use
of numerical statistics. The usual numerical tools in
statistics are some form or another of projection
pursuit. Higher-dimensional data are transformed to a lower
dimension through some numerical statistical
manipulation. Through the transformation, some sort of
information compression is achieved. This avoids the
pitfall of putting all the detailed multi-dimensional
information onto a 2-dimensional plane. Principal components
analysis, principal co-ordinates analysis and regression are
the most widely used statistical tools.
3.2.1 PRINCIPAL COMPONENTS ANALYSIS
Principal components analysis is described in detail in many
textbooks of multivariate analysis. It is usually used to
obtain a low-dimensional (usually two or three dimensional)
representation of multivariate data so that the data may be
examined visually and any structure identified. Essentially,
principal components analysis consists of finding linear
transformations $y_1, y_2, \ldots, y_p$ of the original
variables $x_1, x_2, \ldots, x_p$ that have the property of
being uncorrelated. The y-variables are chosen in such a way
that $y_1$ has maximum variance, $y_2$ has maximum variance
subject to being uncorrelated with $y_1$, and so on. The
transformation is obtained by finding the latent roots and
vectors of either the covariance or the correlation matrix.
The latent roots, arranged in descending order of magnitude,
are equal to the variances of the corresponding y-variates,
these being, of course, the principal components. The proof
of this can be found in many textbooks such as [KENDALL 1975].
An alternative way of viewing principal component
analysis is that given by Gower (1967), who shows that the
first principal component may be regarded as the line of
best fit (in the least squares sense) to the n p-dimensional
observations in the sample. These observations may therefore
be represented in one dimension by taking their projections
onto this line. An improved representation is
given by the projection of the observations onto the plane
defined by the first two latent vectors of the covariance or
correlation matrix. Similarly, the first k latent vectors
(k < p) give the best fit in k dimensions. In each case the
original observations may be projected into the lower
dimensional space to give a set of points which acts as an
approximation to the configuration of the observations. Once
the first 2 principal components are found, the data are
transformed accordingly and can easily be plotted on a 2-D
graph.
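The idea of the first principal component as the latent vector with the largest latent root can be sketched in a few lines. The Python below is only an illustration using plain power iteration on the covariance matrix; the function names, the starting vector and the iteration count are assumptions of this sketch, and a production program would normally call a library eigenvalue routine instead.

```python
def covariance(rows):
    """Sample mean vector and covariance matrix of a list of observations."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    return mean, cov


def first_component(cov, iters=500):
    """Largest latent root and its unit latent vector, by power iteration."""
    p = len(cov)
    v = [1.0 / (k + 1) for k in range(p)]      # arbitrary non-zero start
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(t * t for t in w) ** 0.5     # assumes cov is not all zero
        v = [t / norm for t in w]
    root = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return root, v


# Points lying exactly on the line x2 = 2 * x1: the first component
# should point along (1, 2) and carry all of the variance.
data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mean, cov = covariance(data)
root, vector = first_component(cov)
print(root, vector)
```

Projecting each centred observation onto the returned vector gives its one-dimensional representation; repeating the procedure on the remaining variance would give the second component needed for a 2-D plot.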
3.2.2 PRINCIPAL CO-ORDINATES ANALYSIS
Gower (1966) describes a procedure which he terms principal
co-ordinates analysis, which is similar in some respects to
principal components analysis, but leads to a set of
projected points whose distances apart may be allowed to
reflect non-Euclidean distances between the original
observations. In situations where the appropriate distance
measure between observations is considered to be something
other than Euclidean, this would obviously be a very useful
procedure. Therefore, principal components analysis can be
thought of as a special case of principal co-ordinates
analysis.
Both methods described above (principal components and
principal co-ordinates) are excellent if the data are nearly
collinear or coplanar. For curved data, these methods may
still suggest a good transform for viewing the data. One
problem of these methods is that they are quite static.
Either you spot something interesting or you do not. Not
much suggestion is provided on what the final hypersurface
is even when it is spotted. Moreover, not much room is left
for further analysis.
3.2.3 REGRESSION ANALYSIS
Regression is among the most widely used methods in
statistics. It is hardly anything new: linear regression
appears in many college textbooks. The basic concept of
regression is that it tries to represent one of the
dimensions in terms of the others,

$$x_k = f(x_1, x_2, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n) + \delta$$

where $\delta$ is a random error term.
The chosen $x_k$ is usually denoted as y, the regressand. The
remaining dimensions of x, usually called the regressors,
are then grouped as the vector X. The y value is then
"predicted" by the vector X (provided that $\delta$ is small).
The process of finding the function is usually transformed
into a minimization problem. A typical least squares method
is to minimize the variation between the predicted y and the
actual y, i.e. $\sum \delta^2$.
For parametric regression, some form of the function is
assumed. For example, in linear regression, y is expressed
as a linear combination of the components of X,
y = AX (where A is a vector).
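As a concrete instance of the least squares idea, the sketch below fits a straight line with a single regressor. It is a deliberately simplified Python illustration: the restriction to one regressor, the added intercept term and the function name are assumptions of this example and not the general y = AX case.

```python
def fit_line(x, y):
    """Least squares fit of y = a * x + b; returns (a, b, sum of squared errors)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx                      # assumes the x values are not all equal
    b = mean_y - a * mean_x
    # The quantity being minimised: the sum of squared residuals.
    sse = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    return a, b, sse


print(fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))
```

With several regressors the same criterion leads to a set of linear equations in the elements of A, which is why a functional form has to be fixed before the fit can be carried out.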
In non-linear parametric regression, we need to assume
beforehand some predefined orthogonal functions [e.g. Box and
Cox 1964]. Needless to say, the result depends heavily on the
choice of the function. In our case of complex speech data,
we might then need to blindly try many different functional
models in search of interesting characteristics.
Non-parametric regression makes no assumption of a
parametric model. Instead, it generally only assumes that the
function belongs to some infinite-dimensional collection of
functions. Examples are polynomial, Fourier series and
smoothing spline estimators [Eubank 1988].
However, non-parametric regressions are usually
extremely computationally intensive. Many iterations are
usually needed for the function estimation (with a
potentially infinite number of components). To make matters
worse, a large number of data points is usually needed. This
is time consuming even with the aid of a modern computer,
and may not be applicable to a small data set such as one
single person speaking a single word. Moreover, during the
estimation, some smoothing parameter has to be carefully
chosen in order to be accurate. Finally, there are still
basic assumptions that lie behind non-parametric regression.
The curve is usually assumed to be continuous,
differentiable, or differentiable with a square-integrable
second derivative. This assumption may be doubtful for
speech data, where traditionally a clear distinction is made
between fricative sounds and non-fricative vowels.
The advantage of using regression is that it offers some
information about the modelling function. Moreover, it has
the flexibility to choose any dimension as regressand and
can do some manipulation on regressand and regressors
separately (such as taking the logarithm of either the
regressand or the regressors).
Pure graphical methods try to present all the information
content in 2 dimensions. The parallel-axis system presents
all projections along the axes. They are hard to analyze in
the context of this thesis. Each of the projection methods
discussed above (principal components, parametric regression
and non-parametric regression) has its imperfections. Is it
possible to have a method which is applicable to our speech
data while avoiding the disadvantages of the above methods?
This question was answered by Ker-Chau Li in his Sliced
Inverse Regression method [Ker-Chau Li 1989].
3.3 SLICED INVERSE REGRESSION (SIR)
The usual method used in classical regression is forward
regression, with the goal being the approximation of
the response surface E(y|x) [Friedman and Stuetzle 1981,
Donoho and Johnstone 1985, Hall 1988, Chen 1988]. The
statistical manipulation is to find the best projected
variables for predicting y. There is another, relatively new
regression method called inverse regression [Li 1988, Li
1989]. It works backward from y towards its relationship
with each dimension in X and tries to estimate the inverse
regression curve E(X|y). Basically, it searches for the best
two or three uncorrelated projected variables that can be
predicted most accurately from y. It does not try to obtain
the regression function as parametric or non-parametric
regression does. It simply tries to obtain the best view.
The SIR analysis procedure is as follows:
1) Choose an arbitrary dimension as y; the remaining
dimensions are grouped in the vector x.
2) Divide the range of y into H slices.
3) Within each slice, compute the sample mean of x,
denoted by $\bar{x}_h$, $h = 1, \ldots, H$.
4) Compute the sample covariance of x,

$$\hat{\Sigma}_{xx} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^t$$

and the weighted covariance for the slice means,

$$\hat{\Sigma}_{\eta\eta} = \sum_{h=1}^{H} p_h (\bar{x}_h - \bar{x})(\bar{x}_h - \bar{x})^t$$

where $p_h$ is the proportion of cases that fall into
slice h, and $\bar{x}$ is the sample average of the $x_i$'s.
5) Conduct an eigenvalue decomposition of the
weighted covariance $\hat{\Sigma}_{\eta\eta}$ for the slice
means with respect to the sample covariance $\hat{\Sigma}_{xx}$.
6) Order the eigenvectors according to the descending
order of the corresponding eigenvalues and pick
the first two (i.e. those with the largest eigenvalues).
7) The two eigenvectors picked are the necessary
transformations of the original dimensions.
Basically, the method first obtains a rough estimate of the
inverse regression curve by slicing y and obtaining E(X)
within each slice (steps 1-3). It then proceeds to perform a
principal component analysis on these slice means (steps
5-7). The sample covariance in steps 4 and 5 serves to
standardize X and then to transform back to the original
dimensions after the eigenvectors are obtained. (Without
standardization, the resultant principal components would
depend on the scales of the original dimensions.) With the
regressand and two transformation axes, the original data
can be transformed and viewed in 3-D graphics, which can be
observed directly by the human eye. The resultant view will
be the best view for that particular regressand.
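The seven steps can be sketched compactly in Python. This is a minimal illustration and not the MATLAB program of Appendix I: the function names are hypothetical, slices are formed with equal numbers of sorted cases rather than equal widths of y, the covariance uses a 1/n normalisation, and only the leading direction is extracted, by power iteration instead of a full eigenvalue decomposition.

```python
import random


def mean_vec(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))
        z[r] = s / m[r][r]
    return z


def sir_leading_direction(x, y, n_slices, iters=200):
    """Return (largest eigenvalue, unit eigenvector) of the SIR problem."""
    n, p = len(x), len(x[0])
    xbar = mean_vec(x)
    # Step 4: sample covariance of x (1/n normalisation assumed).
    sxx = [[sum((r[i] - xbar[i]) * (r[j] - xbar[j]) for r in x) / n
            for j in range(p)] for i in range(p)]
    # Steps 2-4: slice the cases sorted by y into H groups of near-equal
    # size and accumulate the weighted covariance of the slice means.
    order = sorted(range(n), key=lambda i: y[i])
    seta = [[0.0] * p for _ in range(p)]
    for h in range(n_slices):
        idx = order[h * n // n_slices:(h + 1) * n // n_slices]
        if not idx:
            continue
        ph = len(idx) / n
        dev = [m - c for m, c in zip(mean_vec([x[i] for i in idx]), xbar)]
        for i in range(p):
            for j in range(p):
                seta[i][j] += ph * dev[i] * dev[j]
    # Steps 5-6: power iteration on inv(sxx) * seta for the top direction.
    b = [1.0] * p
    for _ in range(iters):
        v = [sum(seta[i][j] * b[j] for j in range(p)) for i in range(p)]
        w = solve(sxx, v)
        norm = sum(t * t for t in w) ** 0.5
        b = [t / norm for t in w]
    num = sum(b[i] * seta[i][j] * b[j] for i in range(p) for j in range(p))
    den = sum(b[i] * sxx[i][j] * b[j] for i in range(p) for j in range(p))
    return num / den, b


# Synthetic check: y depends on x only through x1 + x2.
rng = random.Random(0)
x = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(600)]
y = [r[0] + r[1] + 0.1 * rng.gauss(0, 1) for r in x]
eigenvalue, direction = sir_leading_direction(x, y, n_slices=10)
print(eigenvalue, direction)
```

On this synthetic data the returned direction should lie close to (1, 1, 0) up to scale and sign. A second direction could be obtained by deflation or, more simply, by handing both matrices to a generalized eigenvalue routine.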
With principal component analysis as background, the
eigenvalues obtained correspond to the variances of the
original data points along the corresponding edr directions
(the eigenvectors). Therefore, they serve as an indication of
the significance of the resultant transformation. If the
largest two eigenvalues are not distinctly larger
than the rest, the resultant best view may not be as good as
expected. Therefore, the relative sizes of the largest two
eigenvalues with respect to the others serve as an
indication of meaningfulness. Further analysis is needed
should the eigenvalues be close to each other.
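One simple way to quantify how distinctly the two largest eigenvalues stand out is the ratio between the second and third largest values. The Python sketch below is only an illustration of such a check: the function name is hypothetical, no particular threshold is implied by the thesis, and the example values are the eigenvalues reported for the monkey saddle test with n=500, h=30 in the next chapter.

```python
def leading_gap(eigenvalues, keep=2):
    """Ratio of the smallest kept eigenvalue to the largest discarded one."""
    ordered = sorted(eigenvalues, reverse=True)
    # Assumes the largest discarded eigenvalue is non-zero.
    return ordered[keep - 1] / ordered[keep]


# Eigenvalues of the monkey saddle test with n=500, h=30.
table3 = [0.831, 0.306, 0.071, 0.013, 0.019, 0.032, 0.051, 0.043]
print(leading_gap(table3))
```

A ratio well above 1 suggests that the two retained directions are meaningful, while a ratio near 1 signals that the eigenvalues are close to each other and further analysis is needed.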
Li has done extensive work on proving mathematically that
SIR works as it should and has provided the necessary
mathematical foundation. He also demonstrated it using many
sets of sample and real-life data [Li 1988, 1989].
SIR combines the advantages of numerical methods and
graphical tools in spotting interesting characteristics of
speech data. On the numerical analysis side, SIR has the
flexibility and informativeness of regression analysis. Any
of the dimensions can be chosen as regressand for analysis.
Regression analysis provides more information about the
actual modelling function than pure principal component
analysis. Moreover, further transformation of either the
regressors or the regressand can be done separately. This
facilitates further analysis of the speech data. SIR does
not need to assume the modelling function beforehand as in
parametric regression.
Compared with most other non-parametric regressions, the
best feature of SIR is its computational simplicity. Looking
at the algorithm, one easily finds that the most expensive
steps are just one round of sorting of n values and one
round of eigenvalue decomposition of a p by p matrix with
respect to another. In contrast to many other non-parametric
regressions, a huge number of iterations is not needed. As a
result, SIR can easily be implemented on a very small
machine, even a PC.
Another advantage of SIR is that it is less dependent on
H, the number of slices on y. The choice of H acts like a
smoothing parameter in the nonparametric curve fitting
problem [c.f. Eubank 1988, Li 1984, 1985, 1986, 1987, Hardle,
Hall and Marron 1988]. For nonparametric regression or
density estimation, if the smoothing parameter is not
carefully selected, one may have inconsistency in estimation.
But for SIR, the result is less sensitive to the choice of H.
A large range of H can be used while retaining root-n
consistency. This is shown both in Li's paper [Li 1989] and
in our test data in the next chapter.
SIR does not predefine an exact form of the modelling
function as parametric regression does. It only assumes that
the function is in the form
which is a much less stringent condition on the modelling
function.
With the smoothing parameter introduced in SIR, the
resultant view is better if the final formula is actually in
the form
compared with pure principal component analysis. Principal
component analysis tends to manipulate all data points at
once. The local smoothing effect of slicing in SIR provides a
more subtle result. This is actually confirmed on some sample
data. With the same set of data points, the best view
obtained by principal component analysis tends to have a
larger spread than that obtained using SIR.
SIR is an inverse regression algorithm. Inverse regression
also has an advantage over forward regression.
When the total dimension is large, a huge number of
data points is needed in order not to have the points
sparsely located in X. Since forward regression, which tries
to obtain E(y|X), basically works on X, a large number of
points is needed to make the statistical manipulation
meaningful. On the other hand, inverse regression, which
tries to obtain E(X|y), works only from y to each individual
dimension in X, so the requirement on the number of data
points is much lower.
Traditional regressions tend to dig into the detail of the
underlying formula [e.g. Van Rijckervorsel and de Leeuw 1988,
de Leeuw 1984, Breiman L. and Friedman J. 1985, Koyak 1987].
In parametric regression, trial-and-error assumptions have
to be made in order to obtain the detailed formula (trying
to choose some orthogonal functions). Non-parametric
regressions usually assume infinite series of functions and
involve complex calculations. On the other hand, the SIR
method only attempts to obtain the 2 best projections (to get
the best view of the data). What the actual detailed formula
is will be of no concern. This saves a lot of computing
time. Most importantly, the objective of our experiment is to
find the best view only. We want to see visually the shape
of the resultant points under those projected axes. The
detailed formula is of lesser concern for the moment. If we
can visually see some interesting features or shapes under
the projected axes, obtaining the detailed expression will be
a much easier job. We will have a clearer picture of where
we should head, and fewer blind assumptions will have to be
made. Therefore, SIR is chosen as our initial tool for the
speech data.
There is one theoretical pitfall in SIR. The major problem
of this algorithm occurs when the resultant 3-D points are
more or less symmetrical about the y axis. Under this
circumstance, the algorithm cannot guarantee to project
the best view. For symmetry about the regressand axis, the
mean value on the transformed axis will be close to zero.
This mean value is used in the statistical process in
SIR. Therefore, a small perturbation in the mean value will
induce a large error. The resultant transformed axes will be
quite far from the actual best projections. This
disadvantage of SIR is confirmed when symmetrical data are
used. However, this does not affect our analysis, as
explained below.
First of all, this problem can easily be spotted by
reviewing the eigenvalues. If the data points are
symmetrical about the regressand axis, the resultant
eigenvalues will not show 2 distinctly larger values.
This indicates that either the data do not have a good
projection, or there is possibly a symmetry about the y axis.
Secondly, for real-life speech data, it is highly unlikely
that the data points are symmetrical about the regressand
axis for every dimension chosen as the y-coordinate.
Therefore, it is highly improbable that a suitable
projection cannot be obtained no matter which dimension is
chosen as the y-coordinate. In the actual tests, every
dimension in our speech data will be used as the
y-coordinate for SIR. In conclusion, this pitfall of SIR
should not pose any problem in our analysis.
On the other hand, with luck, the resultant data
may have only one significant eigenvalue while the
remaining eigenvalues are insignificant. The case then
resembles the simpler linear regression. Therefore, what SIR
can accomplish is a superset of linear regression, i.e. it is
always at least as powerful.
After obtaining the regression projections, the data points
are transformed according to the first two eigenvectors.
Including the regressand y, the data are thus transformed to
a total of 3 dimensions. Viewing 3-dimensional data on a
2-dimensional surface is not a simple job. Li used the
rotating method to view the 3-D points on the computer
screen. This lets the viewer see all possible views
and therefore does not cause any loss of information in 3-D.
The commercial product Macspin, for instance, provides
a user-friendly environment for graphical analysis of three
variables by rotation. Li used XLISP-STAT to perform his
rotation, which is more flexible than the commercial
Macspin. We will use the C programming language to do the
job, as described in the next chapter. This allows almost
limitless flexibility for further investigation.
As a result, SIR effectively combines the essence of
many approaches: pure visual representation, dry numerical
analysis, principal component analysis and regression
analysis of multi-dimensional data. With the aid of the
human eye, it is hoped that interesting features can be
found in the speech data.
CHAPTER 4 DATA ANALYSIS
4.1 PROGRAMS AND TEST DATA
The SIR program has been implemented using MATLAB, which
provides many built-in functions for vector, matrix and
statistical manipulations. The hardware platform is the DEC
station 2000. The program listing is attached in Appendix I.
Owing to the power and compactness of MATLAB and the
simplicity of the SIR algorithm, the program listing is not
lengthy.
A set of points on a monkey saddle is used for testing. The
equation of the monkey saddle is
where
B1 = (0.5, 0.5, 0.5, 0.5, 0, 0, 0, 0),
B2 = (0, 0, 0, 0, 0.5, 0.5, 0.5, 0.5), Q = 2, ε = random(-50, +50)
This monkey saddle was also used by [Li 1989] and therefore
can be used as a cross check.
A small set of data (number of data points n=10, number of
slices h=5, number of dimensions d=9) was first tried on the
program to test the steps and results. The resultant
eigenvalues and eigenvectors are listed in Table 1. Each step
in the program was followed manually to ensure correct
results. Owing to the small number of data points involved,
the eigenvalues and eigenvectors obtained are not very
meaningful. However, this serves the purpose of manually
checking the program.
After checking that every step is correct for these 10 data
points, another 500 data points on the monkey saddle were
used. The dimension was kept at 9. Various values of h were
tried. The results for h=10, 30, 50 and 100 are listed in
Tables 2, 3, 4 and 5 respectively. The results agree quite
well with expectation. The two eigenvectors corresponding to
the largest eigenvalues are close to the original Bs. Also,
the results agree with Li's proposal that h need not be large
for smoothing y. Only a slight improvement is seen when h is
increased from 10 to 30. For h=50, the result is similar to
that for h=30, with arguably a slight deterioration. For
h=100, the result seems to deteriorate slightly more. This
may be due to the small number of points in each slice,
which affects the statistical calculation a little. It seems
that h=30 for n=500 is a reasonably good choice. This matches
Li's extensive use of h=10 to 20 for n around 300. Since the
eigenvalues and the eigenvectors are meaningfully obtained,
the later calculations were also checked manually to ensure
correctness.

EIGENVALUES
          V1         V2         V3         V4         V5         V6         V7         V8
D       .542       .900       .900   5.66E-15  -3.47E-15  -1.53E-16  -4.03E-17  -6.66E-18
EIGENVECTORS
        V1     V2     V3     V4     V5     V6     V7     V8
X1    .139   .170  -.022   .162   .069  -.428  -.832   .421
X2   -.405  -.316  -.287  -.420  -.137   .320   .338  -.623
X3   -.358  -.121  -.317  -.229  -.078  -.137  -.025   .415
X4    .497   .632   .200   .506   .163   .082   .133  -.017
X5    .386   .197   .198   .216   .129  -.333  -.309   .387
X6   -.082  -.142  -.706   .010  -.043  -.082  -.013   .025
X7    .174   .224   .270   .234   .106   .150   .154   .193
X8   -.506  -.589  -.374  -.622  -.245   .428   .020  -.267
TABLE 1 EIGENVALUES AND EIGENVECTORS
FOR THE TESTING MONKEY SADDLE WITH n=10, h=5.

EIGENVALUES
          V1         V2         V3         V4         V5         V6         V7         V8
D       .732       .279       .020       .014       .010       .006    7.61E-4       .002
EIGENVECTORS
        V1     V2     V3     V4     V5     V6     V7     V8
X1    .489   .007  -.093   .319   .418   .694   .156   .081
X2    .474   .010   .077  -.331  -.482   .158  -.492   .361
X3    .523  -.008   .125   .002  -.327  -.090   .253  -.746
X4    .511  -.087  -.000   .098   .362  -.659   .060   .348
X5   -.004  -.475  -.066  -.172   .158  -.068  -.428  -.330
X6   -.024  -.537   .722   .046   .197  -.073  -.393  -.127
X7   -.015  -.482   .002  -.605  -.104   .197   .545   .177
X8   -.031  -.485  -.119   .618  -.530   .032   .177   .170
TABLE 2 EIGENVALUES AND EIGENVECTORS
FOR THE TESTING MONKEY SADDLE WITH n=500, h=10.

EIGENVALUES
        V1     V2     V3     V4     V5     V6     V7     V8
D     .831   .306   .071   .013   .019   .032   .051   .043
EIGENVECTORS
        V1     V2     V3     V4     V5     V6     V7     V8
X1    .497   .010  -.201   .444   .403  -.623   .070   .073
X2    .486  -.095   .021  -.402  -.194   .130   .725   .101
X3    .517   .000   .307   .018   .398   .517  -.366   .332
X4    .498   .071  -.248  -.033  -.546  -.013  -.388  -.470
X5   -.008   .484   .580  -.444   .114  -.381  -.142  -.171
X6   -.031   .535  -.575  -.324   .402   .288   .001  -.219
X7   -.026   .474   .334   .577  -.095   .308   .391  -.190
X8   -.011   .491  -.158   .074  -.398  -.063  -.110   .735
TABLE 3 EIGENVALUES AND EIGENVECTORS
FOR THE TESTING MONKEY SADDLE WITH n=500, h=30.

EIGENVALUES
        V1     V2     V3     V4     V5     V6     V7     V8
D     .845   .336   .037   .043   .055   .090   .072   .077
EIGENVECTORS
        V1     V2     V3     V4     V5     V6     V7     V8
X1    .487   .019   .749   .166  -.292   .055   .205  -.180
X2    .488  -.070  -.478   .419  -.462  -.032  -.348   .078
X3    .517   .014  -.069   .126   .569  -.277   .321   .503
X4    .505   .048  -.138  -.630   .152   .346  -.158  -.359
X5   -.018   .454  -.269  -.097  -.127  -.543   .423  -.463
X6   -.027   .551  -.236  -.072  -.199   .574   .393   .338
X7   -.025   .512   .223  -.221  -.119  -.379  -.545   .415
X8   -.022   .469   .089   .566   .533   .176  -.280  -.282
TABLE 4 EIGENVALUES AND EIGENVECTORS
FOR THE TESTING MONKEY SADDLE WITH n=500, h=50.

EIGENVALUES
        V1     V2     V3     V4     V5     V6     V7     V8
D     .856   .352   .212   .081   .141   .095   .113   .110
EIGENVECTORS
        V1     V2     V3     V4     V5     V6     V7     V8
X1    .494  .0155  -.195   .807   .245  -.119   .185  -.072
X2    .480  -.100  -.122  -.321  -.087   .532  -.017  -.569
X3    .525   .038   .551  -.286  -.128  -.418   .423   .048
X4    .498   .049  -.321  -.136  -.109   .004  -.520   .550
X5   -.017   .453   .440  -.014   .432   .545   .059   .320
X6   -.037   .526  -.124   .096  -.765   .222   .222   .151
X7   -.020   .564   .226   .024   .069  -.319  -.509  -.488
X8   -.016   .431  -.530  -.367   .358  -.283   .452  -.058
TABLE 5 EIGENVALUES AND EIGENVECTORS
FOR THE TESTING MONKEY SADDLE WITH n=500, h=100.
In addition to the MATLAB program, a Turbo-C program has been
written for a rotational view of the 3-D data. It reads all the
3-D data points and then rotates them about the z-axis. The 3-D
points are the transformed points of the 12-dimensional
LPC/cepstral coefficients after the SIR analysis. Using this
rotational view, any interesting characteristic may be spotted
by the human eye, the most powerful analysis tool. The complete
source program is listed in Appendix II.
Turbo-C is employed because of its extensive library of graphics
functions, which are easy to use. This saves much time compared
with programming in X Window on the DEC workstation. The program
ensures that the data are viewed at all angles for possible
interesting features. To enhance viewing, perspective is used
[FIG 17]. Though some graphics theorists and psychologists may
argue that perspective does not improve the viewing of 3-D data
much, the cost of perspective in this program is relatively
small. The program has actually been tested both with and without
perspective calculations for 500 points. Using perspective does
not noticeably slow down the rotation for 500 data points.
Therefore, perspective is still employed in our program.
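The perspective calculation itself is a single scaling per point, which is consistent with its low cost. The Python sketch below is an illustration only and is not the Turbo-C code of Appendix II; the function name, the screen at z = 0 and the viewpoint placed on the negative z axis are assumptions of this example.

```python
def project(point, viewer_distance, perspective=True):
    """Project a 3-D point onto the screen plane z = 0.

    The viewpoint is assumed to sit at distance viewer_distance in front
    of the screen; larger z means farther from the viewer. Points must
    satisfy viewer_distance + z > 0.
    """
    x, y, z = point
    if not perspective:
        return (x, y)                       # parallel projection: drop z
    scale = viewer_distance / (viewer_distance + z)
    return (x * scale, y * scale)           # farther points shrink


near = project((1.0, 1.0, 0.0), viewer_distance=4.0)
far = project((1.0, 1.0, 4.0), viewer_distance=4.0)
print(near, far)
```

A point on the screen plane keeps its size, while a point one viewing distance behind the screen is drawn at half scale.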
FIG 17 WITH PERSPECTIVE, NEARER OBJECTS APPEAR LARGER, FURTHER OBJECTS APPEAR SMALLER
(panels: Without Perspective, With Perspective; labels: Far object, Near object, Screen, Viewpoint)
In the 3D rotational program, the rotational increments of yaw
and roll can be adjusted [FIG 18]. Too large an increment may
miss some interesting views, and the motion will look too
discrete. Too small an increment will slow the rotation. As a
balance, an increment of 5 degrees is used for both roll and yaw
with the actual data. Yaw ranges from -90° to 90°, which
allows virtually every possible angle to be viewed.
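The stepping through yaw and roll can be sketched as below. This Python fragment is an illustration and not the Turbo-C program: the axis conventions (roll about the z axis, yaw about the x axis), the roll range of 0° to 355° and the function names are assumptions made here, since the definitions of FIG 18 are not reproduced in the text.

```python
import math


def rotate(point, yaw_deg, roll_deg):
    """Rotate a point by roll about the z axis, then by yaw about the x axis."""
    x, y, z = point
    r = math.radians(roll_deg)
    x, y = x * math.cos(r) - y * math.sin(r), x * math.sin(r) + y * math.cos(r)
    w = math.radians(yaw_deg)
    y, z = y * math.cos(w) - z * math.sin(w), y * math.sin(w) + z * math.cos(w)
    return (x, y, z)


def orientations(step=5):
    """All (yaw, roll) pairs visited with the given increment in degrees."""
    return [(yaw, roll)
            for yaw in range(-90, 91, step)
            for roll in range(0, 360, step)]


print(len(orientations(5)))
print(rotate((1.0, 0.0, 0.0), yaw_deg=0, roll_deg=90))
```

Halving the increment roughly quadruples the number of views to draw, which is the trade-off between smooth motion and rotation speed described above.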
As seen from the listing, a dump-screen function is included for
a hard-copy record of results. However, the actual printed
result is relatively disappointing. Without the rotational
motion, the data points usually look like a random cluster
even though there are some interesting features in the figure.
Characteristics can hardly be spotted in one single static
image. Quite a number of consecutive plots are needed to reveal
the actual shape. As a balance, 2 images per rotation will be
included in this thesis for displaying the results.
FIG 18 (axes X, Y, Z)

FIG 19 3-D ROTATION OF RANDOM POINTS ON A PLANE

FIG 20 3-D ROTATION OF 400 RANDOM POINTS ON A TEST MONKEY SADDLE

The two eigenvectors obtained from SIR are each multiplied
with the X of the raw data. Together with y, the raw data are
thereby transformed to a new set of 3-dimensional data. These
3-D data are fed into the rotational program for viewing. As an
initial trial, some random points around a plane and on a monkey
saddle were generated and rotated by the program. The results
are good: both the plane and the monkey saddle were easily
spotted in the rotational view (FIG 19, 20). Then the SIR
transformation obtained for h=30 was applied to the data, and
the resultant monkey saddle was fed into the 3D rotating
program. The original monkey saddle and the transformed monkey
saddle are both displayed (FIG 21, 22). It is seen that the best
view shows the monkey saddle shape as in the original data.
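The transformation to 3-D described above amounts to two dot products per observation. The Python sketch below is an illustration with a hypothetical function name; the two direction vectors used in the example are the B vectors of the monkey saddle test, standing in for the eigenvectors that SIR would return.

```python
def to_3d(x_rows, y, b1, b2):
    """Return (b1 . x, b2 . x, y) for every observation."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    return [(dot(b1, row), dot(b2, row), yi) for row, yi in zip(x_rows, y)]


# Direction vectors of the monkey saddle test data.
b1 = (0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0)
b2 = (0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 0.5)
print(to_3d([[1.0] * 8], [7.0], b1, b2))
```

The resulting triples are what the rotational program reads as its 3-D input.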
A planar structure can easily be spotted in the rotational view.
At some viewing angles, the transformed data points resemble
a line, while at other viewing angles the data points look
scattered in the 3D space. When viewed on the video monitor, the
plane can easily be spotted owing to the continuous motion. The
static plots do have difficulty in showing this planar feature.
They still serve as indications, though.
4.2 ACTUAL SPEECH DATA RESULTS
The raw data at hand are digitized samples of single words spoken
by 3 different persons. For simplicity, they are called speakers
A, B and C respectively. Each of them spoke the English digits
"One" to "Nine". The sampling rate is 8000 samples per second
with 12-bit accuracy. The LPC and cepstral transform programs are
modified from the speech library of Dr. Edmund Lai (Department
of Information Engineering, Chinese University of Hong Kong). The
resultant listings are given in Appendix III. The order used for
LPC and cepstral analysis is 12. Each analysis frame contains 256
sample points, which corresponds to 32 ms. A frame advance of 100
sample points is used, i.e. 12.5 ms. A Hamming window is used as
discussed in Chapter 2. All the parameters used are reasonable
and common in most research environments.
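The framing parameters above can be checked with a short sketch. The Python below is an illustration and not the program of Appendix III; the function name is hypothetical, and the window uses the standard Hamming form 0.54 - 0.46 cos(2πk/(N-1)), which is assumed here to match the definition in Chapter 2.

```python
import math


def frames(samples, size=256, advance=100):
    """Split samples into overlapping frames, each weighted by a Hamming window."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * k / (size - 1))
              for k in range(size)]
    out = []
    # Only complete frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(samples) - size + 1, advance):
        chunk = samples[start:start + size]
        out.append([s * w for s, w in zip(chunk, window)])
    return out


rate = 8000                                   # samples per second
print(1000 * 256 / rate, 1000 * 100 / rate)   # frame length and advance in ms
print(len(frames([1.0] * rate)))              # frames in one second of signal
```

Each frame produced this way would then be passed to the order-12 LPC or cepstral analysis, giving one 12-dimensional point per frame.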
FIG 21 3-D ROTATION ON THE ORIGINAL MONKEY SADDLE BY LI

FIG 22 3-D ROTATION OF THE SIR TRANSFORMED MONKEY SADDLE BY LI
After the LPC transform (or cepstral transform) of a single set
of raw data, the resultant 12-dimensional data are fed into SIR
for analysis. Then the transformed 3-D data are displayed by the
rotational program.
Five sets of data are tested.
1) one single utterance of "4" by speaker A.
2) 12 utterances of "4" by speaker A.
3) 3 utterances/speaker of "4" by speakers A, B and C.
4) 2 utterances/digit of "1" to "9" by speaker A.
5) 1 utterance per speaker per digit of "1" to "9" by
speakers A, B and C.
The results of each test are described in the following sections.
4.2.1 SINGLE UTTERANCE OF "4" BY SPEAKER A ONLY
Initially, the data of a single utterance of "4" are used. The
result shows a thin thread in the 3-D views. No matter which
dimension is chosen as the regressor, the thin thread feature still
appears. A typical view is shown in Figure 23. This is somewhat
expected because this is only a single utterance of "4" by one
person, which does not contain much variation in sound. Apart from
the fricative "f" sound in "four", the "our" is a fairly
constant sound with only slight and gradual change. Therefore,
the data points are expected to form a thin thread. In any case,
it serves to show that our experimental setting works as
planned.

FIG 23  A TYPICAL VIEW WITH SINGLE UTTERANCE "4" OF SPEAKER A ON LPC COEFFICIENTS

SORTED EIGENVALUES
      1      2      3      4      5      6      7      8      9     10     11
-----------------------------------------------------------------------------
  0.984  0.942  0.673  0.572  0.508  0.420  0.353  0.192  0.1?9  0.051  0.019

CORRESPONDING EIGENVECTORS
      1      2      3      4      5      6      7      8      9     10     11
-----------------------------------------------------------------------------
 -0.188 -0.003  0.119 -0.017  0.09? -0.022 -0.??? -0.123 -0.010 -0.197  0.125
 -0.140 -0.002  0.463  0.016  0.170 -0.230 -0.212 -0.343 -0.034 -0.056  0.033
 -0.188  0.019  0.316 -0.055  0.080 -0.37? -0.??0 -0.27? -0.03?  0.?13  0.033
 -0.291  0.012  0.196 -0.196  0.0?5 -0.290  0.2?5 -0.079 -0.007 -0.300 -0.067
 -0.285  0.263  0.373 -0.481  0.279 -0.1??  0.?5? -0.311 -0.1?? -0.322 -0.389
 -0.055 -0.210 -0.383  0.411  0.327  0.?8? -0.155 -0.162  0.064 -0.377 -0.138
  0.198 -0.266  0.?53  0.238  0.?02  0.1?7 -0.025 -0.005 -0.353 -0.511  0.295
 -0.43?  0.570 -0.483  0.333 -0.139  0.043 -0.503 -0.335 -0.719  0.209 -0.5?1
  0.373  0.0?3  0.00? -0.512  0.123 -0.128 -0.280 -0.218  0.256 -0.156  0.055
 -0.07?  0.06? -0.?65  0.?26  0.389 -0.368  0.288  0.077  0.059 -0.182  0.281
 -0.193 -0.379  0.8?? -0.530  0.040  0.206  0.261  0.377  0.549  0.540  0.359

EIGENVALUES AND CORRESPONDING EIGENVECTORS FROM SIR ON LPC COEFFICIENTS
OF SINGLE UTTERANCE "4" BY SPEAKER A WITH THE 4TH COEFFICIENT AS Y

TABLE 6  TYPICAL EIGENVALUES AND EIGENVECTORS FOR SINGLE UTTERANCE OF "4"
BY SPEAKER A ON LPC COEFFICIENTS

Upon inspection of the eigenvalues for the LPC coefficients, it
is seen that the first two eigenvalues are not significantly
larger than the rest. For example, the eigenvalues and
eigenvectors corresponding to the result above are shown in Table
6. Therefore, it seems that the transformation obtained may not
give the actual best views. This may be due to the relatively small
number of points in the raw data. Nevertheless, it still provides
the thin thread result that we expected, and it serves as a
reference for future comparison.
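
One simple way to make "not significantly larger than the rest" concrete is to compare the second eigenvalue with the third. The sketch below is only an illustrative measure, not one used in the thesis; the function name is hypothetical, and the numbers are the leading sorted eigenvalues listed in Table 6 and Table 12.

```python
def gap_ratio(eigenvalues, k=2):
    """Ratio of the k-th largest eigenvalue to the (k+1)-th largest.

    A ratio close to 1 means the k leading SIR directions are not clearly
    separated from the remaining ones.
    """
    ev = sorted(eigenvalues, reverse=True)
    return ev[k - 1] / ev[k]

# Leading sorted eigenvalues as listed in the tables.
table6_leading = [0.984, 0.942, 0.673, 0.572]    # single utterance, LPC
table12_leading = [0.989, 0.358, 0.123, 0.053]   # digits "1" to "9", speaker A, LPC

single = gap_ratio(table6_leading)
digits = gap_ratio(table12_leading)
```

For the single utterance the ratio is about 1.4, whereas for the digit set of Table 12 it is about 2.9, which agrees with the qualitative reading of the tables.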
The cepstral coefficient result is rather disappointing. It shows
a shape similar to that of the LPC coefficients, but with much
greater dispersion of the points. An example is shown in Figure 24,
which corresponds to the LPC results in the previous figure. Upon
inspection, the first two eigenvalues are also not
distinctively larger than the rest [Table 7]. Therefore, no
conclusion should be drawn at this moment concerning the
worse result of the cepstral transform. It may be caused by
SIR rather than by a problem of the transform itself.
Details of all 3-D views, eigenvalues and eigenvectors are shown
in Appendix IV.
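
For orientation, LPC cepstral coefficients are commonly obtained from the LPC coefficients by the recursion sketched below. The sketch assumes the predictor convention x[n] ~ sum_k a_k x[n-k] and omits the gain term; whether the program used in this work follows exactly this form is not confirmed here, and the function name is hypothetical.

```python
def lpc_to_cepstrum(a, n_ceps=None):
    """Cepstral coefficients c_1..c_n of the all-pole model 1 / (1 - sum_k a_k z^-k).

    Recursion: c_n = a_n + sum_{k=1}^{n-1} (k / n) * c_k * a_{n-k},
    with a_n taken as 0 for n greater than the LPC order.
    """
    p = len(a)
    n_ceps = p if n_ceps is None else n_ceps
    c = []
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c
```

For a first-order model with a_1 = a, the recursion reproduces the series a, a^2/2, a^3/3, ..., which is the expansion of -ln(1 - a z^-1) and gives a quick check of the sign convention.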

FIG 24  A TYPICAL VIEW WITH SINGLE UTTERANCE "4" OF SPEAKER A ON CEPSTRAL COEFFICIENTS

SORTED EIGENVALUES
      1      2      3      4      5      6      7      8      9     10     11
-----------------------------------------------------------------------------
  0.928  0.715  0.591  0.556  0.506  0.395  0.3?4  0.266  0.194  0.117  0.050

CORRESPONDING EIGENVECTORS
      1      2      3      4      5      6      7      8      9     10     11
-----------------------------------------------------------------------------
 -0.148 -0.240 -0.153  0.011  0.139  0.045  0.107  0.072  0.051 -0.013 -0.196
  0.037  0.254  0.165  0.052 -0.0?5  0.101  0.157  0.020 -0.0?0 -0.039  0.074
  0.232 -0.077  0.317  0.027 -0.1?? -0.2?1  0.012 -0.309 -0.06?  0.012 -0.042
  0.225 -0.720  0.035  0.?19  0.2?8  0.0?1  0.032 -0.076  0.089  0.073 -0.070
  0.131 -0.012  0.466  0.2?8  0.006 -0.?96  0.111  0.3?? -0.?73 -0.119 -0.221
  0.025  0.097  0.07? -0.118 -0.13?  0.0?? -0.5?0 -0.1?? -0.??? -0.102 -0.121
  0.428  0.381 -0.278  0.283 -0.70?  0.19?  0.3?? -0.022  0.215 -0.430 -0.005
  0.321 -0.330  0.236 -0.607 -0.199  0.270  0.2?3 -0.12? -0.242  0.370 -0.220
  0.152 -0.089  0.465 -0.390  0.320 -0.360  0.194 -0.575  0.761 -0.10? -0.246
 -0.043 -0.007 -0.500 -0.215  0.??2 -0.5?? -0.250 -0.?5? -0.231  0.?57 -0.011
  0.737 -0.286 -0.162 -0.269 -0.086 -0.409 -0.592  0.101 -0.135 -0.652  0.881

EIGENVALUES AND CORRESPONDING EIGENVECTORS FROM SIR ON CEPSTRAL COEFFICIENTS
OF SINGLE UTTERANCE "4" BY SPEAKER A WITH THE 4TH COEFFICIENT AS Y

TABLE 7  TYPICAL EIGENVALUES AND EIGENVECTORS FOR SINGLE UTTERANCE OF "4"
BY SPEAKER A ON CEPSTRAL COEFFICIENTS

4.2.2 TWELVE UTTERANCES OF "4" BY SPEAKER A
All the 12 utterances of "4" by speaker A are used. Among the raw
data, speaker A has the most utterances per digit. The other 2
speakers have only 5 utterances per digit. Therefore, we used
speaker A in this and the previous experiment.
In viewing the resultant transformed data, the views produce quite
good results for LPC coefficients. A typical one is depicted in
Figure 25. When the first few dimensions are used as the y
regressor, a clear view of two clusters is obtained. One of
them should be the voiced sound in "-our" while the other cluster
should correspond to the unvoiced sound "f". The resultant views
show that the data cluster quite compactly. A strip-shaped cluster
should correspond to the voiced sound. Variations should be caused
by the differences between utterances. The elongated
shape should reflect the gradual transition of the voiced sound
to contain the "r" sound effect. Obviously, a more detailed
separation of points is needed to actually confirm this.
As the y regressor reaches the last few dimensions, the
points begin to spread out. This is somewhat expected because the
last few dimensions are relatively small in value. Using these
dimensions as the regressor will be much more sensitive to small
errors, inaccuracies, and other variations.

FIG 25  A TYPICAL VIEW WITH 12 UTTERANCE "4" OF SPEAKER A ON LPC COEFFICIENTS

In contrast to the previous single utterance case, the two
largest eigenvalues for the LPC coefficients are in general
significantly larger than the rest (refer to Table 8). One exception
is that the second largest eigenvalue, though still considerably
larger than most of the others, is usually relatively close to the
third largest eigenvalue. According to the theory of SIR, this may
still cast doubt on the accuracy of the best views, as in the
previous case of a single utterance of "four" by the same speaker.
However, the views obtained are still encouraging and provide
reasonable results.
As in the previous case, the cepstral transform failed to obtain
any better result. Instead, it again seems to worsen the view.
A typical example is shown in Figure 26. The best view of the
cepstral coefficients after the transform does not show any
interesting features. At the very best, only clustering
is observed. According to the eigenvalues, the views obtained by
SIR are similar to those found for the LPC coefficients. Table
9 shows the eigenvalues and corresponding eigenvectors for Figure
26. The largest two eigenvalues are quite distinctively larger
than the rest. Still, the second largest eigenvalue is somewhat
close to the third largest eigenvalue. There remains a
possibility that the worsening under the cepstral transform is
caused by SIR. Therefore, the value of the cepstral transform cannot
be discarded yet.
Details of all 3-D views, eigenvalues and eigenvectors are shown
in Appendix V.

SORTED EIGENVALUES
      1      2      3      4      5      6      7      8      9     10     11
-----------------------------------------------------------------------------
  0.994  0.910  0.352  0.196  0.147  0.068  0.041  0.030  0.023  0.018  0.003

CORRESPONDING EIGENVECTORS
      1      2      3      4      5      6      7      8      9     10     11
-----------------------------------------------------------------------------
 -0.199  0.435  0.550 -0.108 -0.065  0.144 -0.106  0.120 -0.022 -0.197 -0.055
 -0.001  0.012 -0.091 -0.143  0.044  0.165 -0.055  0.237 -0.064 -0.465  0.108
  0.464 -0.122 -0.491 -0.093  0.127 -0.257  0.249 -0.309 -0.017 -0.267  0.147
  0.315 -0.451 -0.011  0.040 -0.166  0.438 -0.502  0.186 -0.236 -0.271 -0.139
  0.048 -0.078 -0.261  0.219  0.010  0.082 -0.283 -0.333 -0.144 -0.442 -0.218
  0.449  0.175  0.015  0.244  0.081  0.138  0.035 -0.000  0.367 -0.131 -0.382
  0.323  0.574 -0.094 -0.155 -0.262  0.600 -0.148  0.??7  0.109 -0.093 -0.582
 -0.174  0.247  0.073 -0.280 -0.146 -0.187 -0.055  0.013 -0.404 -0.452 -0.556
 -0.056 -0.392  0.544 -0.248  0.325 -0.407  0.449 -0.540  0.192 -0.412 -0.269
  0.325  0.050 -0.037 -0.608  0.403  0.219  0.102 -0.304  0.524 -0.066 -0.025
  0.446 -0.098 -0.263 -0.563  0.765  0.2?4 -0.59?  0.260 -0.545  0.002  0.173

EIGENVALUES AND CORRESPONDING EIGENVECTORS FROM SIR ON LPC COEFFICIENTS
OF 12 UTTERANCES OF "4" BY SPEAKER A WITH THE 4TH COEFFICIENT AS Y

TABLE 8  TYPICAL EIGENVALUES AND EIGENVECTORS FOR 12 UTTERANCE OF "4"
BY SPEAKER A ON LPC COEFFICIENTS

FIG 26  A TYPICAL VIEW WITH 12 UTTERANCE "4" OF SPEAKER A ON CEPSTRAL COEFFICIENTS

SORTED EIGENVALUES
      1      2      3      4      5      6      7      8      9     10     11
-----------------------------------------------------------------------------
  0.814  0.450  0.144  0.119  0.087  0.073  0.064  0.052  0.046  0.033  0.012

CORRESPONDING EIGENVECTORS
      1      2      3      4      5      6      7      8      9     10     11
-----------------------------------------------------------------------------
  0.188 -0.088  0.010  0.051  0.308 -0.103 -0.169  0.077 -0.138 -0.031 -0.015
  0.094 -0.077  0.037  0.007 -0.248  0.146  0.024  0.213 -0.009 -0.206 -0.024
 -0.241  0.301 -0.037 -0.19? -0.026  0.386  0.197  0.011 -0.057  0.068  0.021
 -0.373  0.393  0.044 -0.129  0.232  0.228 -0.521  0.190 -0.022  0.190  0.047
  0.266 -0.513 -0.286 -0.299 -0.019  0.?46 -0.232 -0.024 -0.329  0.038  0.016
  0.152 -0.014 -0.390 -0.063  0.657 -0.065  0.289  0.452  0.139 -0.296  0.147
 -0.146 -0.245  0.243  0.568  0.181  0.509  0.526  0.037 -0.184 -0.318  0.553
 -0.226  0.580  0.269 -0.300  0.275 -0.024  0.364 -0.254  0.164 -0.317 -0.395
 -0.533  0.114  0.072 -0.184 -0.402 -0.389  0.180  0.004 -0.882  0.226 -0.013
 -0.177  0.040  0.319  0.214 -0.018 -0.078  0.020  0.800  0.105  0.714 -0.673
 -0.527 -0.260 -0.723  0.598 -0.293  0.383 -0.285  0.034 -0.018  0.253 -0.245

EIGENVALUES AND CORRESPONDING EIGENVECTORS FROM SIR ON CEPSTRAL COEFFICIENTS
OF 12 UTTERANCES OF "4" BY SPEAKER A WITH THE 4TH COEFFICIENT AS Y

TABLE 9  TYPICAL EIGENVALUES AND EIGENVECTORS FOR 12 UTTERANCE OF "4"
BY SPEAKER A ON CEPSTRAL COEFFICIENTS

4.2.3 THREE UTTERANCES PER SPEAKER OF "4" BY SPEAKER A, B AND C
For further investigation, 3 utterances of "4" by each of the
3 speakers are taken as raw data. The data can be
compared with the past two cases.
The LPC views continue to provide encouraging results. Many views
reveal a planar structure of the data points. An example is shown
in FIG 27. The points basically concentrate within a strip-like
structure. However, comparing the resultant shape with that of the
previous case, the shapes do not resemble each other much.
The double clustering feature is missing in
this test. The variation between different persons seems to blur
the clear distinction between the "f" and "-our" sounds.
On the other hand, the eigenvalues obtained in this case are far
more indicative. The first two eigenvalues are considerably larger
than the rest, even the third one. Therefore, the results obtained
in this case should be more certain and more indicative. The
eigenvalues and eigenvectors corresponding to the view in Figure
27 are shown in Table 10.
The cepstral transform continues to provide poor results. The
points in the best view look like a random cluster only.
No interesting feature is spotted. A typical one is shown in FIG
28. This time, the eigenvalues are as indicative as those of the
LPC transform [Table 11]. The first two eigenvalues are quite
distinctively larger than the rest. Therefore, up to this point,

FIG 27  A TYPICAL VIEW WITH 3 UTTERANCE PER SPEAKER OF "4" OF SPEAKER A, B, C
ON LPC COEFFICIENTS

SORTED EIGENVALUES
      1      2      3      4      5      6      7      8      9     10     11
-----------------------------------------------------------------------------
  0.992  0.633  0.366  0.269  0.078  0.062  0.055  0.029  0.020  0.016  0.005

CORRESPONDING EIGENVECTORS
      1      2      3      4      5      6      7      8      9     10     11
-----------------------------------------------------------------------------
  0.213 -0.439 -0.240 -0.167  0.011 -0.102  0.085 -0.043 -0.033  0.026 -0.052
  0.255 -0.105 -0.110 -0.172 -0.150  0.??? -0.0?? -0.0?9 -0.1?? -0.059 -0.034
  0.312  0.194  0.043 -0.155 -0.023  0.297 -0.054 -0.013 -0.124 -0.003 -0.029
  0.373 -0.121 -0.009  0.094 -0.060  0.291 -0.220 -0.012  0.045  0.075 -0.076
  0.373 -0.414  0.239 -0.051 -0.005  0.260 -0.085 -0.03?  0.197  0.283 -0.201
  0.361 -0.142 -0.017 -0.0?0 -0.105 -0.51?  0.01? -0.07?  0.?26  0.200 -0.278
  0.321 -0.387 -0.256 -0.168 -0.148 -0.143 -0.392 -0.019  0.3??  0.565 -0.378
  0.2?0 -0.351 -0.282 -0.3?0 -0.315  0.11? -0.0?4  0.153  0.26?  0.673 -0.474
  0.262  0.370 -0.537 -0.469 -0.346 -0.?80 -0.240  0.416  0.391  0.160 -0.326
  0.255  0.178 -0.62? -0.515 -0.???  0.?11 -0.557  0.?72  0.577  0.163  0.157
  0.263 -0.333 -0.205 -0.510 -0.72?  0.214 -0.629  0.749  0.235  0.135  0.611

EIGENVALUES AND CORRESPONDING EIGENVECTORS FROM SIR ON LPC COEFFICIENTS
OF 3 UTTERANCES PER SPEAKER OF "4" BY SPEAKER A,B,C WITH THE 4TH COEFFICIENT AS Y

TABLE 10  TYPICAL EIGENVALUES AND EIGENVECTORS FOR 3 UTTERANCE PER SPEAKER
OF "4" BY SPEAKER A, B, C ON LPC COEFFICIENTS

FIG 28  A TYPICAL VIEW WITH 3 UTTERANCE PER SPEAKER OF "4" OF SPEAKER A, B, C
ON LPC COEFFICIENTS

SORTED EIGENVALUES
      1      2      3      4      5      6      7      8      9     10     11
-----------------------------------------------------------------------------
  0.467  0.234  0.132  0.097  0.063  0.030  0.026  0.018  0.014  0.009  0.003

CORRESPONDING EIGENVECTORS
      1      2      3      4      5      6      7      8      9     10     11
-----------------------------------------------------------------------------
 -0.043 -0.060  0.000 -0.051  0.249  0.038 -0.156  0.028 -0.080  0.048  0.095
 -0.190 -0.163  0.019  0.002 -0.216 -0.052 -0.021  0.042  0.076  0.082  0.059
  0.495  0.161  0.227  0.228 -0.014  0.020  0.151 -0.078 -0.109  0.080  0.026
 -0.113  0.590 -0.255  0.204  0.109  0.081 -0.140  0.278 -0.252  0.070 -0.134
  0.109 -0.155  0.362 -0.073  0.262  0.351 -0.1?? -0.136  0.422 -0.054 -0.1??
  0.729 -0.276  0.255 -0.038 -0.25?  0.111  0.560  0.152 -0.075  0.283  0.490
 -0.136 -0.330  0.498 -0.344  0.553 -0.328  0.408  0.154 -0.512  0.403 -0.359
 -0.079  0.45?  0.185  0.202  0.091 -0.517 -0.054  0.005  0.1?? -0.117  0.741
 -0.298  0.069  0.165  0.378  0.136  0.521  0.255 -0.641 -0.321 -0.438 -0.072
  0.211  0.218 -0.57?  0.253 -0.053 -0.006 -0.207 -0.622  0.1?6  0.725  0.056
  0.050  0.355  0.220 -0.729 -0.641  0.455 -0.549 -0.22? -0.57? -0.048 -0.085

EIGENVALUES AND CORRESPONDING EIGENVECTORS FROM SIR ON CEPSTRAL COEFFICIENTS
OF 3 UTTERANCES PER SPEAKER OF "4" BY SPEAKER A,B,C WITH THE 4TH COEFFICIENT AS Y

TABLE 11  TYPICAL EIGENVALUES AND EIGENVECTORS FOR 3 UTTERANCE PER SPEAKER
OF "4" BY SPEAKER A ON CEPSTRAL COEFFICIENTS

it can be concluded that the cepstral transform is not a suitable
method for dimensional reduction of speech data.
In fact, in the next two cases, the cepstral transform still
failed to provide any better view than LPC. This reinforces the
negative conclusion on the cepstral transform.
All views, eigenvalues and eigenvectors for this case are
depicted in Appendix VI.
4.2.4 TWO UTTERANCES PER DIGIT OF "1" TO "9" BY SPEAKER A
As a further test, the raw data of the utterances of the digits
"1" to "9" are combined. These data include a wide range of voiced
sound instances and unvoiced sound instances. Therefore, they serve
as a real test of the dimension of speech data.
Once again, the LPC transformed data show a planar feature in
strip form. An example is shown in Figure 28. This is a highly
encouraging result. The utterances of "1" to "nine" contain many
voiced and unvoiced sounds. Still, their SIR transformed LPC
coefficients can be observed to be 2-dimensional in structure.
It seems that the reduction of speech dimensions to 2 dimensions
is quite possible. However, this should be confirmed again in
the next case.
The eigenvalues for the LPC transform remain indicative [Table
12]. The first two eigenvalues are considerably larger than the rest.
This shows that the encouraging result does not come by chance.
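
The visual impression of a planar structure could be checked numerically. The sketch below fits a plane to the transformed 3-D points by least squares and reports the RMS residual; it is an illustrative check only, not part of the programs developed in this work. It assumes the plane is not vertical in the chosen coordinates, and the function names are hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to (x, y, z) points.

    Returns (a, b, c, rms), where rms is the root-mean-square residual in z.
    """
    n = len(points)
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    # Normal equations of the least-squares problem, solved by Cramer's rule.
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(n)]]
    rhs = [sxz, syz, sz]
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("points are degenerate in x and y")
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        coeffs.append(det3(mc) / d)
    a, b, c = coeffs
    rms = math.sqrt(sum((z - (a * x + b * y + c)) ** 2
                        for x, y, z in points) / n)
    return a, b, c, rms
```

A small residual relative to the spread of the points would support the planar reading; a large one would suggest a curved surface or a cluster without planar structure.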

FIG 28  A TYPICAL VIEW WITH 2 UTTERANCE PER DIGIT PER SPEAKER OF "1" TO "9"
OF SPEAKER A, B, C ON LPC COEFFICIENTS

SORTED EIGENVALUES
      1      2      3      4      5      6      7      8      9     10     11
-----------------------------------------------------------------------------
  0.989  0.358  0.123  0.053  0.044  0.033  0.031  0.015  0.013  0.007  0.002

CORRESPONDING EIGENVECTORS
      1      2      3      4      5      6      7      8      9     10     11
-----------------------------------------------------------------------------
  0.233  0.103 -0.180 -0.144 -0.074  0.047 -0.009  0.039 -0.047 -0.005 -0.205
  0.225 -0.133 -0.207 -0.060 -0.109  0.229 -0.066 -0.?12 -0.117 -0.125 -0.307
  0.332  0.315 -0.085  0.165  0.176  0.180 -0.225 -0.271  0.052 -0.135 -0.158
  0.320 -0.292 -0.183  0.122 -0.38?  0.233  0.???  0.283 -0.05? -0.212 -0.156
  0.264  0.067 -0.306  0.110  0.194  0.313  0.151  0.?39  0.128 -0.409 -0.159
  0.397 -0.449 -0.197 -0.015  0.353  0.1?8 -0.?29  0.219  0.00?  0.118  0.024
  0.389 -0.257 -0.1??  0.080 -0.30?  0.?33 -0.209 -0.087  0.177  0.508 -0.001
  0.292  0.057 -0.?19 -0.207 -0.?02  0.???  0.?5? -0.271  0.7?6  0.226 -0.012
  0.253 -0.031 -0.530  0.117 -0.033  0.046  0.285 -0.166  0.2?5  0.208  0.198
  0.272  0.569 -0.?53 -0.086  0.19?  0.576  0.121 -0.201 -0.?93  0.192  0.437
  0.284  0.433 -0.377 -0.923 -0.646  0.305 -0.?15 -0.5?8 -0.??5 -0.592  0.717

EIGENVALUES AND CORRESPONDING EIGENVECTORS FROM SIR ON LPC COEFFICIENTS
OF 2 UTTERANCES PER DIGIT OF "1" TO "9" BY SPEAKER A WITH THE 4TH COEFFICIENT AS Y

TABLE 12  TYPICAL EIGENVALUES AND EIGENVECTORS FOR 2 UTTERANCES PER SPEAKER
PER DIGIT OF "1" TO "9" BY SPEAKER A ON LPC COEFFICIENTS

The cepstral transform continues to provide random clustering
only. This result simply reconfirms our previous conclusion about
the poor performance of the cepstral transform [FIG 29 and Table 13].
All views, eigenvalues and eigenvectors are depicted in Appendix
VII.
4.2.5 ONE UTTERANCE PER DIGIT PER SPEAKER OF "1" TO "9" BY
SPEAKER A,B,C
As in the previous case of a single speaker uttering "1" to "9",
3 speakers speaking the same words still provide good results for
LPC coefficients. The transformed data points still seem to lie on
a 2-dimensional surface. Compared with the previous single
speaker case, this multiple speaker case seems to have a slightly
wider spread and more dispersed points. This may be due to the
deviations between multiple speakers. A typical example is shown in
Figure 30.
The eigenvalues are as indicative as in all the previous cases
[Table 14]. This again confirms the significance of the results.
The cepstral transform does not provide any surprise. The result
simply looks like a random cluster. An example is shown in Figure
31 and Table 15.
All views, eigenvalues and eigenvectors are depicted in Appendix
VIII.

FIG 29  A TYPICAL VIEW WITH 2 UTTERANCE PER DIGIT PER SPEAKER OF "1" TO "9"
OF SPEAKER A, B, C ON CEPSTRAL COEFFICIENTS

SORTED EIGENVALUES
      1      2      3      4      5      6      7      8      9     10     11
-----------------------------------------------------------------------------
  0.547  0.191  0.092  0.074  0.047  0.032  0.025  0.016  0.013  0.011  0.009

CORRESPONDING EIGENVECTORS
      1      2      3      4      5      6      7      8      9     10     11
-----------------------------------------------------------------------------
  0.119 -0.243 -0.217 -0.269 -0.396  0.015  0.671  0.160 -0.345 -0.056 -0.231
 -0.060  0.123  0.097  0.131  0.528 -0.014 -0.381  0.349  0.080 -0.058  0.376
 -0.045 -0.021  0.531 -0.079  0.030 -0.132 -0.462  0.591 -0.410  0.205  0.419
 -0.015 -0.051  0.187  0.250 -0.15? -0.098  0.265  0.553 -0.08?  0.2?1  0.636
  0.045 -0.065  0.080  0.179  0.318 -0.057  0.474 -0.650 -0.056  0.443  0.070
 -0.140 -0.0?2  0.181 -0.102  0.321  0.2?1  0.238  0.12? -0.672 -0.4?0  0.063
  0.136  0.137 -0.047 -0.295  0.497 -0.084  0.070  0.126 -0.032  0.368 -0.680
  0.041 -0.018  0.082 -0.077  0.123  0.271  0.327 -0.2?0  0.415 -0.5?4  0.488
 -0.024  0.2?1  0.035  0.1?6 -0.163 -0.191  0.596 -0.338 -0.152 -0.??6 -0.452
 -0.003  0.022  0.014  0.129 -0.035 -0.127 -0.121 -0.146  0.220  0.054 -0.519
  0.314  0.000 -0.163 -0.211  0.090  0.289 -0.681 -0.219  0.506  0.700  0.299

EIGENVALUES AND CORRESPONDING EIGENVECTORS FROM SIR ON CEPSTRAL COEFFICIENTS
OF 2 UTTERANCES PER DIGIT OF "1" TO "9" BY SPEAKER A WITH THE 4TH COEFFICIENT AS Y

TABLE 13  TYPICAL EIGENVALUES AND EIGENVECTORS FOR 2 UTTERANCES PER SPEAKER
PER DIGIT OF "1" TO "9" BY SPEAKER A ON CEPSTRAL COEFFICIENTS

SORTED EIGENVALUES
      1      2      3      4      5      6      7      8      9     10     11
-----------------------------------------------------------------------------
  0.924  0.346  0.192  0.121  0.052  0.022  0.017  0.011  0.009  0.004  0.002

CORRESPONDING EIGENVECTORS
      1      2      3      4      5      6      7      8      9     10     11
-----------------------------------------------------------------------------
  0.200 -0.441  0.030  0.0?3  0.042  0.037 -0.030 -0.025 -0.078  0.000 -0.050
 -0.115 -0.076 -0.163  0.100  0.127 -0.05?  0.??7 -0.207 -0.257 -0.147  0.086
  0.342 -0.003  0.316  0.0?? -0.01?  0.1?? -0.??7 -0.102 -0.205 -0.080  0.1?7
  0.490 -0.004  0.029 -0.036 -0.150 -0.288  0.?5? -0.191 -0.120 -0.128  0.161
  0.280 -0.445 -0.024  0.178 -0.173 -0.0?5 -0.?5? -0.1?2 -0.120 -0.100  0.34?
  0.533 -0.159 -0.724  0.0?7  0.015  0.130 -0.122  0.049  0.096 -0.126  0.197
  0.396 -0.130 -0.026  0.570 -0.225  0.09?  0.577 -0.347  0.142  0.180  0.231
  0.203 -0.333 -0.403 -0.0?2 -0.127 -0.1?9  0.08? -0.092  0.12?  0.776  0.?48
  0.163 -0.262 -0.030  0.3?2  0.216  0.201  0.280  0.19?  0.279  0.1?7  0.??5
 -0.051 -0.516 -0.274  0.112  0.32?  0.?91  0.??2 -0.???  0.?5? -0.229  0.417
 -0.007 -0.339  0.295  0.279  0.843 -0.723 -0.149 -0.568  0.653  0.456  0.349

EIGENVALUES AND CORRESPONDING EIGENVECTORS FROM SIR ON LPC COEFFICIENTS
OF 1 UTTERANCES PER DIGIT PER SPEAKER OF "1" TO "9" BY SPEAKER A,B,C
WITH THE 4TH COEFFICIENT AS Y

TABLE 14  TYPICAL EIGENVALUES AND EIGENVECTORS FOR 1 UTTERANCE PER DIGIT
PER SPEAKER OF "1" TO "9" BY SPEAKER A, B, C ON LPC COEFFICIENTS

FIG 31  A TYPICAL VIEW WITH 1 UTTERANCE PER DIGIT PER SPEAKER OF "1" TO "9"
BY SPEAKER A, B, C ON CEPSTRAL COEFFICIENTS

5.2 FURTHER WORKS
As already introduced in the previous chapter, some further works
have to be done. Some of them are research oriented and some are
technically oriented.
i) A finer distinction of data points should be employed. In
the present initial analysis, all data points from every
speaker and every frame have the same color on the 3D
rotating plot. For deeper analysis, different speakers and
different sets of frames should have different colors so
that more information can be spotted. A different color is
assigned to each speaker (e.g. red for the first speaker,
orange for the second, and so on), and the transition of
frames is represented by a gradual change of color tone.
For example, for the data in red, the initial frames are
light red and the last frames are dark red, with a gradual
change of redness for the intermediate frames. In this way,
the speaker and the approximate frame location can be
identified on the 3D rotating plot. Re-programming the 3D
rotation program under X-window or on a true color PC is
recommended. This would give higher color resolution and
pixel resolution than the VGA resolution used at the
present stage.
ii) More voice data should be used. For this initial attempt,
we used utterances of the digits "1" to "9". The LPC results
obtained are quite satisfactory. However, to further confirm
the power of LPC and SIR, additional voice data should be
added, extending the variety of sounds covered.
iii) The best views obtained for a single utterance and for
multiple utterances, or for a single speaker and for
multiple speakers of the same utterances, do not resemble
each other. Although planar views were obtained in all
cases, the figures do not match: the shapes differ, and the
2 transformation eigenvectors for each corresponding case
also differ. The reason for this should be further
investigated. The 11 dimensional function describing the
approximate planar shape should be found for comparison and
further investigation.
iv) The failure of the cepstral transform on LPC coefficients
should be further investigated for confirmation. However,
an interesting alternative that may be worth a try is to
apply the cepstral transform to the original speech data
instead of the LPC coefficients. This may yield some
further insight into the speech data.
v) The data points in 3D may be reduced to save computation
time and memory space in the rotation program. Points which
are very close to each other cannot be distinguished on the
computer screen; they all overlap within one pixel.
However, in the 3D rotation program, all those overlapping
points are treated as separate and are rotated separately.
This slows down the rotation and requires more memory
space. This is not efficient and should be improved. A
preprocessing step for the 3D data points should be added
to eliminate duplicate and near-coincident points.
vi) There is still an extremely small chance that, for the
cepstral transform, the data points are symmetrical along
the regressor axis for each dimension used as regressor. To
establish whether or not this is the case, slightly more
work has to be done. A common offset can be added to the
dimensions (the regressor dimension can be excluded). The
resultant multi-dimensional data should then no longer be
symmetrical along the regressor axis. This offset data can
then be fed into SIR for analysis again.
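
The color scheme proposed in item (i) might be sketched as follows. This is
only an illustration in Python, not the rotation program itself. The function
name point_color, its parameters, the even spacing of hues between speakers
and the lightness range from 0.75 down to 0.25 are all assumptions made for
the example; the text asks only for one color per speaker and a light-to-dark
change across frames.

```python
import colorsys


def point_color(speaker_index, frame_index, n_speakers, n_frames):
    """Return an (r, g, b) tuple of 0-255 integers for one data point.

    Hue identifies the speaker; lightness falls from light (first frame)
    to dark (last frame). All numeric choices here are illustrative.
    """
    # One hue per speaker, spread evenly around the color wheel (assumption).
    hue = speaker_index / n_speakers
    # Position of the frame within the utterance, from 0.0 to 1.0.
    t = frame_index / (n_frames - 1) if n_frames > 1 else 0.0
    # Lightness goes from 0.75 (light) down to 0.25 (dark) (assumption).
    lightness = 0.75 - 0.5 * t
    r, g, b = colorsys.hls_to_rgb(hue, lightness, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))
```

With this mapping the first speaker is drawn in shades of red, and a point
from an early frame is visibly lighter than a point from a late frame of the
same speaker. A display with more than the 16 colors of standard VGA is
needed for the gradation to be visible, which is the reason given in item (i)
for moving to X-window or a true color PC.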
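
The preprocessing proposed in item (v) might be sketched as follows, again in
Python and only as an illustration. The function name thin_points and the
parameter cell_size are hypothetical. The sketch divides the 3D space into
cubic cells and keeps the first point found in each cell, which is one simple
way of removing duplicate and near-coincident points; it is not necessarily
the method that would suit the rotation program best.

```python
import math


def thin_points(points, cell_size):
    """Keep one representative point per cubic cell of side cell_size.

    points is a list of (x, y, z) tuples. The first point that falls in
    a cell is kept and later points in the same cell are dropped.
    """
    seen = set()
    kept = []
    for x, y, z in points:
        # Integer index of the cell that contains this point.
        key = (
            math.floor(x / cell_size),
            math.floor(y / cell_size),
            math.floor(z / cell_size),
        )
        if key not in seen:
            seen.add(key)
            kept.append((x, y, z))
    return kept
```

The cell size would be chosen in data units so that a cell is no larger than
one screen pixel at the scale used for display. Two points that lie very close
together but on opposite sides of a cell boundary are both kept, so the
reduction is approximate.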