Multivariate Normal Distribution
Shyh-Kang Jeng
Department of Electrical Engineering / Graduate Institute of Communication / Graduate Institute of Networking and Multimedia
Outline
Introduction
The Multivariate Normal Density and Its Properties
Sampling from a Multivariate Normal Distribution and Maximum Likelihood Estimation
The Sampling Distribution of $\bar{\mathbf{X}}$ and $\mathbf{S}$
Large-Sample Behavior of $\bar{\mathbf{X}}$ and $\mathbf{S}$
Assessing the Assumption of Normality
Detecting Outliers and Cleaning Data
Transformations to Near Normality
Questions
What is the univariate normal distribution?
What is the multivariate normal distribution?
Why study the multivariate normal distribution?
Multivariate Normal Distribution
Generalized from the univariate normal density
Basis of many multivariate analysis techniques
Useful approximation to the "true" population distribution
Central limit distribution of many multivariate statistics
Mathematically tractable
Questions
What is the formula for the probability density function of a univariate normal distribution?
What is the probabilistic meaning of the parameters $\mu$ and $\sigma^2$?
How much probability lies in the intervals $(\mu-\sigma, \mu+\sigma)$ and $(\mu-2\sigma, \mu+2\sigma)$?
How do you look up the cumulative univariate normal probability in Table 1 of the Appendix?
Questions
What is the Mahalanobis distance for a univariate normal distribution?
What is the Mahalanobis distance for a multivariate normal distribution?
What are the symbol and the formula for the probability density of a p-dimensional multivariate normal distribution?
Questions
What are the possible shapes in a surface diagram of a bivariate normal density?
What is the constant probability density contour for a p-dimensional multivariate normal distribution?
What are the eigenvalues and eigenvectors of the inverse of $\boldsymbol{\Sigma}$? (Result 4.1)
Questions
What is the region whose total probability equals $1-\alpha$?
What is the probability distribution of a linear combination of p random variables that are jointly multivariate normal? (Result 4.2)
How do you find the marginal distributions of a multivariate normal distribution using Result 4.2?
Questions
What is the probability distribution of a random vector obtained by multiplying a matrix by a multivariate normal random vector of p variables? (Result 4.3)
What is the probability distribution of a multivariate normal random vector plus a constant vector? (Result 4.3)
Questions
Given the mean and covariance matrix of a multivariate random vector that is partitioned into two parts, how do you find the mean and covariance matrix of each part? (Result 4.4)
Questions
What are the if-and-only-if conditions for two multivariate normal vectors $\mathbf{X}_1$ and $\mathbf{X}_2$ to be independent? (Result 4.5)
If two multivariate normal vectors $\mathbf{X}_1$ and $\mathbf{X}_2$ are independent, what is the probability distribution of the random vector partitioned into $\mathbf{X}_1$ and $\mathbf{X}_2$? (Result 4.5)
Questions
If a random vector $\mathbf{X}$ is partitioned into $\mathbf{X}_1$ and $\mathbf{X}_2$, what is the conditional probability distribution of $\mathbf{X}_1$ given $\mathbf{X}_2 = \mathbf{x}_2$? (Result 4.6)
What is the probability distribution of the squared Mahalanobis distance for a multivariate normal vector? (Result 4.7)
Questions
How do you find the value of the Mahalanobis distance for a multivariate normal vector when the probability inside the corresponding ellipsoid is specified? (Result 4.7)
Questions
What is the shape of a chi-square distribution curve?
How do you look up the cumulative chi-square probability in Table 3 of the Appendix?
What is the joint distribution of two random vectors that are two linear combinations of n different multivariate random vectors? (Result 4.8)
Univariate Normal Distribution
$$N(\mu,\sigma^2):\quad f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-[(x-\mu)/\sigma]^2/2}, \qquad -\infty < x < \infty$$
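As an illustration of the density above, the following minimal Python sketch evaluates $f(x)$ and the probability within one and two standard deviations of the mean. The function names `normal_pdf`, `normal_cdf`, and `prob_within` are hypothetical helpers, not part of the slides; the cumulative probability is obtained from the standard-library error function instead of a table lookup.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def normal_cdf(x, mu, sigma2):
    """P[X <= x] for X ~ N(mu, sigma2), computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * sigma2)))

def prob_within(k, mu=0.0, sigma2=1.0):
    """P[mu - k*sigma <= X <= mu + k*sigma]."""
    sigma = math.sqrt(sigma2)
    return normal_cdf(mu + k * sigma, mu, sigma2) - normal_cdf(mu - k * sigma, mu, sigma2)

print(round(prob_within(1), 3))  # roughly 0.683
print(round(prob_within(2), 3))  # roughly 0.954
```

The two printed values correspond to the intervals $(\mu-\sigma,\mu+\sigma)$ and $(\mu-2\sigma,\mu+2\sigma)$; they do not depend on the particular $\mu$ and $\sigma^2$ chosen.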
Table 1, Appendix
Square of Distance (Mahalanobis distance)
$$\left(\frac{x-\mu}{\sigma}\right)^2 = (x-\mu)(\sigma^2)^{-1}(x-\mu)$$
$$(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})$$
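p-dimensional Normal Density
$$N_p(\boldsymbol{\mu},\boldsymbol{\Sigma}):\quad f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}}\, e^{-(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})/2}$$
$$-\infty < x_i < \infty,\quad i = 1, 2, \ldots, p$$
$\mathbf{x}$ is a sample from the random vector $\mathbf{X}' = [X_1, X_2, \ldots, X_p]$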
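A small sketch of the squared distance and the density for the special case $p = 2$ may help make the formula concrete. The helper names and the numerical values of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ below are assumptions chosen for illustration only; the $2\times 2$ inverse is written out in closed form so that no linear-algebra library is needed.

```python
import math

def squared_distance_2d(x, mu, sigma):
    """(x - mu)' Sigma^{-1} (x - mu) for p = 2, using the closed-form 2x2 inverse."""
    (s11, s12), (s21, s22) = sigma
    det = s11 * s22 - s12 * s21
    d1, d2 = x[0] - mu[0], x[1] - mu[1]
    return (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det

def bivariate_normal_pdf(x, mu, sigma):
    """N_2(mu, Sigma) density: exp(-distance/2) / ((2*pi)^{p/2} |Sigma|^{1/2}) with p = 2."""
    (s11, s12), (s21, s22) = sigma
    det = s11 * s22 - s12 * s21
    return math.exp(-0.5 * squared_distance_2d(x, mu, sigma)) / (2.0 * math.pi * math.sqrt(det))

# Hypothetical parameters (not from the slides).
mu = [0.0, 0.0]
sigma = [[1.0, 0.75], [0.75, 1.0]]
print(squared_distance_2d([1.0, 1.0], mu, sigma))
print(bivariate_normal_pdf(mu, mu, sigma))
```

With an identity covariance matrix the squared distance reduces to the ordinary squared Euclidean distance, which is a quick sanity check on the implementation.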
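Example 4.1 Bivariate Normal
$$\mu_1 = E(X_1),\quad \mu_2 = E(X_2)$$
$$\sigma_{11} = \mathrm{Var}(X_1),\quad \sigma_{22} = \mathrm{Var}(X_2)$$
$$\rho_{12} = \sigma_{12}/(\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}) = \mathrm{Corr}(X_1, X_2)$$
$$\boldsymbol{\Sigma} = \begin{bmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{22}\end{bmatrix},\qquad \boldsymbol{\Sigma}^{-1} = \frac{1}{\sigma_{11}\sigma_{22}-\sigma_{12}^2}\begin{bmatrix}\sigma_{22} & -\sigma_{12}\\ -\sigma_{12} & \sigma_{11}\end{bmatrix}$$
$$\sigma_{11}\sigma_{22}-\sigma_{12}^2 = \sigma_{11}\sigma_{22}(1-\rho_{12}^2)$$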
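Example 4.1 Squared Distance
$$(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}) = [x_1-\mu_1,\; x_2-\mu_2]\,\frac{1}{\sigma_{11}\sigma_{22}(1-\rho_{12}^2)}\begin{bmatrix}\sigma_{22} & -\rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}\\ -\rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}} & \sigma_{11}\end{bmatrix}\begin{bmatrix}x_1-\mu_1\\ x_2-\mu_2\end{bmatrix}$$
$$= \frac{1}{1-\rho_{12}^2}\left[\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2 + \left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2 - 2\rho_{12}\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)\right]$$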
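Example 4.1 Density Function
$$f(x_1,x_2) = \frac{1}{2\pi\sqrt{\sigma_{11}\sigma_{22}(1-\rho_{12}^2)}}\exp\left\{-\frac{1}{2(1-\rho_{12}^2)}\left[\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2 + \left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2 - 2\rho_{12}\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)\right]\right\}$$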
Example 4.1 Bivariate Distribution
$\sigma_{11} = \sigma_{22}$, $\rho_{12} = 0$
Example 4.1 Bivariate Distribution
$\sigma_{11} = \sigma_{22}$, $\rho_{12} = 0.75$
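Contours
Constant probability density contour
$$= \{\text{all } \mathbf{x} \text{ such that } (\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}) = c^2\}$$
= surface of an ellipsoid centered at $\boldsymbol{\mu}$
axes: $\pm c\sqrt{\lambda_i}\,\mathbf{e}_i$, where $\boldsymbol{\Sigma}\mathbf{e}_i = \lambda_i\mathbf{e}_i$, $i = 1, 2, \ldots, p$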
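Result 4.1
$\boldsymbol{\Sigma}$: positive definite
$$\boldsymbol{\Sigma}\mathbf{e} = \lambda\mathbf{e} \;\Rightarrow\; \boldsymbol{\Sigma}^{-1}\mathbf{e} = \frac{1}{\lambda}\mathbf{e}$$
$(\lambda, \mathbf{e})$ for $\boldsymbol{\Sigma}$ $\Rightarrow$ $(1/\lambda, \mathbf{e})$ for $\boldsymbol{\Sigma}^{-1}$
$\boldsymbol{\Sigma}^{-1}$: positive definite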
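Example 4.2 Bivariate Contour
Bivariate normal, $\sigma_{11} = \sigma_{22}$
eigenvalues and eigenvectors:
$$\lambda_1 = \sigma_{11} + \sigma_{12},\quad \mathbf{e}_1' = \left[\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right]$$
$$\lambda_2 = \sigma_{11} - \sigma_{12},\quad \mathbf{e}_2' = \left[\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right]$$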
Example 4.2 Positive Correlation
Probability Related to Squared Distance
The solid ellipsoid of $\mathbf{x}$ values satisfying
$$(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}) \le \chi_p^2(\alpha)$$
has probability $1-\alpha$
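Result 4.2
$$\mathbf{X}: N_p(\boldsymbol{\mu},\boldsymbol{\Sigma}) \;\Rightarrow\; \mathbf{a}'\mathbf{X} = a_1X_1 + a_2X_2 + \cdots + a_pX_p : N(\mathbf{a}'\boldsymbol{\mu},\, \mathbf{a}'\boldsymbol{\Sigma}\mathbf{a})$$
$$\mathbf{a}'\mathbf{X}: N(\mathbf{a}'\boldsymbol{\mu},\, \mathbf{a}'\boldsymbol{\Sigma}\mathbf{a}) \text{ for every } \mathbf{a} \;\Rightarrow\; \mathbf{X} \text{ must be } N_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$$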
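Example 4.3 Marginal Distribution
$$\mathbf{X} = [X_1, X_2, \ldots, X_p]' : N_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$$
$$\mathbf{a}' = [1, 0, \ldots, 0],\quad \mathbf{a}'\mathbf{X} = X_1,\quad \mathbf{a}'\boldsymbol{\mu} = \mu_1,\quad \mathbf{a}'\boldsymbol{\Sigma}\mathbf{a} = \sigma_{11}$$
$$\mathbf{a}'\mathbf{X}: N(\mathbf{a}'\boldsymbol{\mu},\, \mathbf{a}'\boldsymbol{\Sigma}\mathbf{a}) = N(\mu_1, \sigma_{11})$$
Marginal distribution of $X_i$ in $\mathbf{X}$: $N(\mu_i, \sigma_{ii})$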
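Result 4.3
$$\mathbf{X}: N_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$$
$$\mathbf{A}\mathbf{X} = \begin{bmatrix} a_{11}X_1 + \cdots + a_{1p}X_p\\ a_{21}X_1 + \cdots + a_{2p}X_p\\ \vdots\\ a_{q1}X_1 + \cdots + a_{qp}X_p\end{bmatrix} : N_q(\mathbf{A}\boldsymbol{\mu},\, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}'),\quad \mathbf{A}: q\times p$$
$$\mathbf{X} + \mathbf{d}: N_p(\boldsymbol{\mu}+\mathbf{d},\, \boldsymbol{\Sigma}),\quad \mathbf{d}: p\times 1$$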
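Proof of Result 4.3: Part 1
Any linear combination $\mathbf{b}'(\mathbf{A}\mathbf{X}) = \mathbf{a}'\mathbf{X}$, with $\mathbf{a} = \mathbf{A}'\mathbf{b}$
$$(\mathbf{A}'\mathbf{b})'\mathbf{X} : N\big((\mathbf{A}'\mathbf{b})'\boldsymbol{\mu},\; (\mathbf{A}'\mathbf{b})'\boldsymbol{\Sigma}(\mathbf{A}'\mathbf{b})\big)$$
$$\mathbf{b}'(\mathbf{A}\mathbf{X}) : N\big(\mathbf{b}'(\mathbf{A}\boldsymbol{\mu}),\; \mathbf{b}'(\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}')\mathbf{b}\big)$$
valid for every $\mathbf{b}$ $\Rightarrow$ $\mathbf{A}\mathbf{X}: N_q(\mathbf{A}\boldsymbol{\mu},\, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}')$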
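Proof of Result 4.3: Part 2
$$\mathbf{a}'(\mathbf{X}+\mathbf{d}) = \mathbf{a}'\mathbf{X} + \mathbf{a}'\mathbf{d}$$
$$\mathbf{a}'\mathbf{X}: N(\mathbf{a}'\boldsymbol{\mu},\, \mathbf{a}'\boldsymbol{\Sigma}\mathbf{a})$$
$$\mathbf{a}'\mathbf{X} + \mathbf{a}'\mathbf{d}: N(\mathbf{a}'\boldsymbol{\mu} + \mathbf{a}'\mathbf{d},\, \mathbf{a}'\boldsymbol{\Sigma}\mathbf{a})$$
$\mathbf{a}$ is arbitrary $\Rightarrow$ $\mathbf{X}+\mathbf{d}: N_p(\boldsymbol{\mu}+\mathbf{d},\, \boldsymbol{\Sigma})$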
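Example 4.4 Linear Combinations
$$\mathbf{X}: N_3(\boldsymbol{\mu},\boldsymbol{\Sigma})$$
$$\begin{bmatrix}X_1 - X_2\\ X_2 - X_3\end{bmatrix} = \begin{bmatrix}1 & -1 & 0\\ 0 & 1 & -1\end{bmatrix}\begin{bmatrix}X_1\\ X_2\\ X_3\end{bmatrix} = \mathbf{A}\mathbf{X}$$
$$\mathbf{A}\boldsymbol{\mu} = \begin{bmatrix}\mu_1-\mu_2\\ \mu_2-\mu_3\end{bmatrix}$$
$$\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}' = \begin{bmatrix}\sigma_{11} - 2\sigma_{12} + \sigma_{22} & \sigma_{12} + \sigma_{23} - \sigma_{22} - \sigma_{13}\\ \sigma_{12} + \sigma_{23} - \sigma_{22} - \sigma_{13} & \sigma_{22} - 2\sigma_{23} + \sigma_{33}\end{bmatrix}$$
$$\mathbf{A}\mathbf{X}: N_2(\mathbf{A}\boldsymbol{\mu},\, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}')$$
can be verified with $Y_1 = X_1 - X_2$, $Y_2 = X_2 - X_3$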
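The matrix products in this example can be checked numerically. The sketch below uses plain nested lists and two hypothetical helpers, `matmul` and `transpose`; the values of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are assumptions made up for the illustration, while $\mathbf{A}$ is the matrix that forms $X_1 - X_2$ and $X_2 - X_3$.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]

# A forms the contrasts X1 - X2 and X2 - X3.
A = [[1, -1, 0],
     [0, 1, -1]]

# Hypothetical mean vector (as a column) and covariance matrix.
mu = [[1.0], [2.0], [4.0]]
sigma = [[4.0, 1.0, 0.0],
         [1.0, 3.0, 1.0],
         [0.0, 1.0, 2.0]]

mean_AX = matmul(A, mu)                          # A mu
cov_AX = matmul(matmul(A, sigma), transpose(A))  # A Sigma A'

print(mean_AX)
print(cov_AX)
```

The entries of `cov_AX` can be compared by hand with the symbolic expressions, for example the first diagonal entry against $\sigma_{11} - 2\sigma_{12} + \sigma_{22}$.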
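Result 4.4
$$\mathbf{X}: N_p(\boldsymbol{\mu},\boldsymbol{\Sigma}),\quad \mathbf{X} = \begin{bmatrix}\mathbf{X}_1\ (q\times 1)\\ \mathbf{X}_2\ ((p-q)\times 1)\end{bmatrix},\quad \boldsymbol{\mu} = \begin{bmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{bmatrix},\quad \boldsymbol{\Sigma} = \begin{bmatrix}\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}\end{bmatrix}$$
$$\Rightarrow\; \mathbf{X}_1: N_q(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11})$$
Proof: Set $\mathbf{A}_{(q\times p)} = [\,\mathbf{I}_{(q\times q)} \mid \mathbf{0}_{(q\times(p-q))}\,]$ in Result 4.3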
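Example 4.5 Subset Distribution
$$\mathbf{X}: N_5(\boldsymbol{\mu},\boldsymbol{\Sigma})$$
$$\mathbf{X}_1 = \begin{bmatrix}X_2\\ X_4\end{bmatrix},\quad \boldsymbol{\mu}_1 = \begin{bmatrix}\mu_2\\ \mu_4\end{bmatrix},\quad \boldsymbol{\Sigma}_{11} = \begin{bmatrix}\sigma_{22} & \sigma_{24}\\ \sigma_{24} & \sigma_{44}\end{bmatrix}$$
$$\mathbf{X}_1: N_2(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11}) = N_2\left(\begin{bmatrix}\mu_2\\ \mu_4\end{bmatrix}, \begin{bmatrix}\sigma_{22} & \sigma_{24}\\ \sigma_{24} & \sigma_{44}\end{bmatrix}\right)$$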
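Result 4.5
(a) $\mathbf{X}_1\ (q_1\times 1)$, $\mathbf{X}_2\ (q_2\times 1)$: independent $\Rightarrow$ $\mathrm{Cov}(\mathbf{X}_1, \mathbf{X}_2) = \mathbf{0}\ (q_1\times q_2)$
(b) $$\begin{bmatrix}\mathbf{X}_1\\ \mathbf{X}_2\end{bmatrix}: N_{q_1+q_2}\left(\begin{bmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{bmatrix}, \begin{bmatrix}\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}\end{bmatrix}\right)$$
$\Rightarrow$ $\mathbf{X}_1, \mathbf{X}_2$: independent if and only if $\boldsymbol{\Sigma}_{12} = \mathbf{0}$
(c) $\mathbf{X}_1: N_{q_1}(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11})$, $\mathbf{X}_2: N_{q_2}(\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_{22})$, independent
$$\Rightarrow\; \begin{bmatrix}\mathbf{X}_1\\ \mathbf{X}_2\end{bmatrix}: N_{q_1+q_2}\left(\begin{bmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{bmatrix}, \begin{bmatrix}\boldsymbol{\Sigma}_{11} & \mathbf{0}\\ \mathbf{0}' & \boldsymbol{\Sigma}_{22}\end{bmatrix}\right)$$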
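Example 4.6 Independence
$$\mathbf{X}: N_3(\boldsymbol{\mu},\boldsymbol{\Sigma}),\quad \boldsymbol{\Sigma} = \begin{bmatrix}4 & 1 & 0\\ 1 & 3 & 0\\ 0 & 0 & 2\end{bmatrix}$$
$X_1, X_2$: not independent
$(X_1, X_2)$ and $X_3$ are independent
($X_3$ is independent of $X_1$ and also of $X_2$)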
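Result 4.6
$$\mathbf{X} = \begin{bmatrix}\mathbf{X}_1\\ \mathbf{X}_2\end{bmatrix}: N_p(\boldsymbol{\mu},\boldsymbol{\Sigma}),\quad \boldsymbol{\mu} = \begin{bmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{bmatrix},\quad \boldsymbol{\Sigma} = \begin{bmatrix}\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}\end{bmatrix},\quad |\boldsymbol{\Sigma}_{22}| > 0$$
$\Rightarrow$ the conditional distribution of $\mathbf{X}_1$ given $\mathbf{X}_2 = \mathbf{x}_2$ is normal with
mean $\boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)$ and
covariance $\boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}$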
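Proof of Result 4.6
$$\mathbf{A} = \begin{bmatrix}\mathbf{I} & -\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\\ \mathbf{0} & \mathbf{I}\end{bmatrix},\qquad \mathbf{A}(\mathbf{X}-\boldsymbol{\mu}) = \begin{bmatrix}\mathbf{X}_1 - \boldsymbol{\mu}_1 - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{X}_2 - \boldsymbol{\mu}_2)\\ \mathbf{X}_2 - \boldsymbol{\mu}_2\end{bmatrix}$$
: joint normal with covariance
$$\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}' = \begin{bmatrix}\boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21} & \mathbf{0}\\ \mathbf{0}' & \boldsymbol{\Sigma}_{22}\end{bmatrix}$$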
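Proof of Result 4.6
$\mathbf{X}_1 - \boldsymbol{\mu}_1 - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{X}_2 - \boldsymbol{\mu}_2)$ and $\mathbf{X}_2 - \boldsymbol{\mu}_2$ are independent
$A, B$ independent $\Rightarrow$ $P(A|B) = P(A, B)/P(B) = P(A)P(B)/P(B) = P(A)$
$$f\big(\mathbf{X}_1 - \boldsymbol{\mu}_1 - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{X}_2 - \boldsymbol{\mu}_2) = \mathbf{x}_1 - \boldsymbol{\mu}_1 - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2) \,\big|\, \mathbf{X}_2 = \mathbf{x}_2\big)$$
$$= f\big(\mathbf{X}_1 - \boldsymbol{\mu}_1 - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{X}_2 - \boldsymbol{\mu}_2) = \mathbf{x}_1 - \boldsymbol{\mu}_1 - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)\big)$$
$$\mathbf{X}_1 - \boldsymbol{\mu}_1 - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{X}_2 - \boldsymbol{\mu}_2) : N_q(\mathbf{0},\, \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21})$$
given $\mathbf{X}_2 = \mathbf{x}_2$:
$$\mathbf{X}_1 : N_q\big(\boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2),\; \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\big)$$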
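Example 4.7 Conditional Bivariate
$$\begin{bmatrix}x_1\\ x_2\end{bmatrix}: N_2\left(\begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix}, \begin{bmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{22}\end{bmatrix}\right)$$
show that
$$f(x_1|x_2) = N\left(\mu_1 + \frac{\sigma_{12}}{\sigma_{22}}(x_2 - \mu_2),\; \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}}\right)$$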
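Example 4.1 Density Function
$$f(x_1,x_2) = \frac{1}{2\pi\sqrt{\sigma_{11}\sigma_{22}(1-\rho_{12}^2)}}\exp\left\{-\frac{1}{2(1-\rho_{12}^2)}\left[\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2 + \left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2 - 2\rho_{12}\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)\right]\right\}$$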
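Example 4.7
$$-\frac{1}{2(1-\rho_{12}^2)}\left[\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2 + \left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2 - 2\rho_{12}\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)\right]$$
$$= -\frac{1}{2\sigma_{11}(1-\rho_{12}^2)}\left(x_1 - \mu_1 - \rho_{12}\frac{\sqrt{\sigma_{11}}}{\sqrt{\sigma_{22}}}(x_2-\mu_2)\right)^2 - \frac{1}{2}\frac{(x_2-\mu_2)^2}{\sigma_{22}}$$
$$f(x_1|x_2) = f(x_1,x_2)/f(x_2) = \frac{1}{\sqrt{2\pi}\sqrt{\sigma_{11}(1-\rho_{12}^2)}}\; e^{-\left(x_1 - \mu_1 - (\sigma_{12}/\sigma_{22})(x_2-\mu_2)\right)^2 / 2\sigma_{11}(1-\rho_{12}^2)}$$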
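The conditional mean and variance of Example 4.7 are easy to compute directly. The following sketch assumes a hypothetical helper `conditional_x1_given_x2` and made-up parameter values; it also checks numerically that $\sigma_{11} - \sigma_{12}^2/\sigma_{22}$ equals $\sigma_{11}(1-\rho_{12}^2)$.

```python
def conditional_x1_given_x2(mu, sigma, x2):
    """Mean and variance of X1 given X2 = x2 for a bivariate normal (Result 4.6 with p = 2)."""
    (s11, s12), (_, s22) = sigma
    cond_mean = mu[0] + (s12 / s22) * (x2 - mu[1])
    cond_var = s11 - s12 * s12 / s22
    return cond_mean, cond_var

# Hypothetical parameters (not from the slides).
mu = [1.0, 2.0]
sigma = [[4.0, 1.0], [1.0, 2.0]]

mean, var = conditional_x1_given_x2(mu, sigma, x2=4.0)
rho2 = sigma[0][1] ** 2 / (sigma[0][0] * sigma[1][1])  # squared correlation
print(mean, var, sigma[0][0] * (1.0 - rho2))
```

Note that the conditional variance does not depend on the observed value $x_2$; only the conditional mean shifts with it.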
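Result 4.7
$$\mathbf{X}: N_p(\boldsymbol{\mu},\boldsymbol{\Sigma}),\quad |\boldsymbol{\Sigma}| > 0$$
(a) $(\mathbf{X}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{X}-\boldsymbol{\mu}) : \chi_p^2$
(b) The probability inside the solid ellipsoid $\{\mathbf{x}: (\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}) \le \chi_p^2(\alpha)\}$ is $1-\alpha$,
where $\chi_p^2(\alpha)$ denotes the upper $(100\alpha)$th percentile of the $\chi_p^2$ distribution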
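Part (b) can be checked by simulation in the special case $p = 2$, where the chi-square distribution has the closed-form tail probability $P[\chi_2^2 > c] = e^{-c/2}$, so that $\chi_2^2(\alpha) = -2\ln\alpha$. The sketch below is only an illustration under that assumption; it draws from a standard bivariate normal ($\boldsymbol{\mu} = \mathbf{0}$, $\boldsymbol{\Sigma} = \mathbf{I}$), and the function names are hypothetical.

```python
import math
import random

def chi2_upper_percentile_p2(alpha):
    """Upper (100*alpha)th percentile of chi-square with 2 d.f.: solves exp(-c/2) = alpha."""
    return -2.0 * math.log(alpha)

def fraction_inside(alpha, n_draws=20000, seed=0):
    """Fraction of standard bivariate normal draws whose squared distance is <= chi2_2(alpha)."""
    rng = random.Random(seed)
    c2 = chi2_upper_percentile_p2(alpha)
    inside = 0
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # With mu = 0 and Sigma = I the squared distance is z1^2 + z2^2.
        if z1 * z1 + z2 * z2 <= c2:
            inside += 1
    return inside / n_draws

print(chi2_upper_percentile_p2(0.5))  # close to 1.39
print(fraction_inside(0.05))          # should be close to 0.95
```

For general $p$ no such closed form exists, and the percentile would be taken from a chi-square table or a statistics library.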
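$\chi^2$ Distribution
$$X_1: N(\mu_1,\sigma_1^2),\; X_2: N(\mu_2,\sigma_2^2),\; \ldots,\; X_\nu: N(\mu_\nu,\sigma_\nu^2),\ \text{independent}$$
$$Z_i = \frac{X_i - \mu_i}{\sigma_i} : N(0,1);\qquad \chi^2 = \sum_{i=1}^{\nu} Z_i^2 = \sum_{i=1}^{\nu}\left(\frac{X_i-\mu_i}{\sigma_i}\right)^2 : \chi_\nu^2$$
$\nu$: degrees of freedom (d.f.)
$$f(\chi^2) = \begin{cases}\dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}(\chi^2)^{\nu/2-1}e^{-\chi^2/2}, & \chi^2 > 0\\[2mm] 0, & \chi^2 \le 0\end{cases}$$
(Gamma distribution with $\alpha = \nu/2$, $\beta = 2$)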
$\chi^2$ Distribution Curves
Table 3, Appendix
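Proof of Result 4.7 (a)
$$(\mathbf{X}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{X}-\boldsymbol{\mu}) = \sum_{i=1}^{p}\frac{1}{\lambda_i}(\mathbf{X}-\boldsymbol{\mu})'\mathbf{e}_i\mathbf{e}_i'(\mathbf{X}-\boldsymbol{\mu}) = \sum_{i=1}^{p}\left[\frac{1}{\sqrt{\lambda_i}}\mathbf{e}_i'(\mathbf{X}-\boldsymbol{\mu})\right]^2 = \sum_{i=1}^{p} Z_i^2$$
$$\mathbf{Z} = \mathbf{A}(\mathbf{X}-\boldsymbol{\mu}) : N_p(\mathbf{0},\, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}'),\qquad \mathbf{A} = \begin{bmatrix}\mathbf{e}_1'/\sqrt{\lambda_1}\\ \mathbf{e}_2'/\sqrt{\lambda_2}\\ \vdots\\ \mathbf{e}_p'/\sqrt{\lambda_p}\end{bmatrix}$$
$$\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}' = \mathbf{A}\left[\sum_{i=1}^{p}\lambda_i\mathbf{e}_i\mathbf{e}_i'\right]\mathbf{A}' = \begin{bmatrix}\sqrt{\lambda_1}\,\mathbf{e}_1'\\ \sqrt{\lambda_2}\,\mathbf{e}_2'\\ \vdots\\ \sqrt{\lambda_p}\,\mathbf{e}_p'\end{bmatrix}\left[\frac{\mathbf{e}_1}{\sqrt{\lambda_1}}, \frac{\mathbf{e}_2}{\sqrt{\lambda_2}}, \ldots, \frac{\mathbf{e}_p}{\sqrt{\lambda_p}}\right] = \mathbf{I}$$
$$Z_i : N(0,1),\ \text{independent} \;\Rightarrow\; (\mathbf{X}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{X}-\boldsymbol{\mu}) = \sum_{i=1}^{p} Z_i^2 : \chi_p^2$$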
Proof of Result 4.7 (b)Proof of Result 4.7 (b)
1)()()'(
by ddistribute
variablerandom new )()'(
),(:by ellipsoid the toassigned
yprobabilit theis)()'(
21
2
1
21
p
p
p
P
N
cP
μXΣμX
μXΣμX
ΣμX
μXΣμX
10
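Result 4.8
$\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$: mutually independent, $\mathbf{X}_j : N_p(\boldsymbol{\mu}_j, \boldsymbol{\Sigma})$
$$\mathbf{V}_1 = c_1\mathbf{X}_1 + c_2\mathbf{X}_2 + \cdots + c_n\mathbf{X}_n : N_p\left(\sum_{j=1}^{n} c_j\boldsymbol{\mu}_j,\; \Big(\sum_{j=1}^{n} c_j^2\Big)\boldsymbol{\Sigma}\right)$$
$\mathbf{V}_1$ and $\mathbf{V}_2 = b_1\mathbf{X}_1 + b_2\mathbf{X}_2 + \cdots + b_n\mathbf{X}_n$ are joint normal with covariance matrix
$$\begin{bmatrix}\Big(\sum_{j=1}^{n} c_j^2\Big)\boldsymbol{\Sigma} & (\mathbf{b}'\mathbf{c})\boldsymbol{\Sigma}\\ (\mathbf{b}'\mathbf{c})\boldsymbol{\Sigma} & \Big(\sum_{j=1}^{n} b_j^2\Big)\boldsymbol{\Sigma}\end{bmatrix}$$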
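Proof of Result 4.8
$$\mathbf{X}' = [\mathbf{X}_1', \mathbf{X}_2', \ldots, \mathbf{X}_n'] : N_{np}(\boldsymbol{\mu}, \boldsymbol{\Sigma}_{\mathbf{x}}),\quad \boldsymbol{\mu} = \begin{bmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\\ \vdots\\ \boldsymbol{\mu}_n\end{bmatrix},\quad \boldsymbol{\Sigma}_{\mathbf{x}} = \begin{bmatrix}\boldsymbol{\Sigma} & \mathbf{0} & \cdots & \mathbf{0}\\ \mathbf{0} & \boldsymbol{\Sigma} & \cdots & \mathbf{0}\\ \vdots & & \ddots & \vdots\\ \mathbf{0} & \mathbf{0} & \cdots & \boldsymbol{\Sigma}\end{bmatrix}$$
$$\mathbf{A} = \begin{bmatrix}c_1\mathbf{I} & c_2\mathbf{I} & \cdots & c_n\mathbf{I}\\ b_1\mathbf{I} & b_2\mathbf{I} & \cdots & b_n\mathbf{I}\end{bmatrix},\qquad \mathbf{A}\mathbf{X} = \begin{bmatrix}\mathbf{V}_1\\ \mathbf{V}_2\end{bmatrix} : N_{2p}(\mathbf{A}\boldsymbol{\mu},\, \mathbf{A}\boldsymbol{\Sigma}_{\mathbf{x}}\mathbf{A}')$$
block diagonal terms of $\mathbf{A}\boldsymbol{\Sigma}_{\mathbf{x}}\mathbf{A}'$: $\Big(\sum_{j=1}^{n} c_j^2\Big)\boldsymbol{\Sigma}$, $\Big(\sum_{j=1}^{n} b_j^2\Big)\boldsymbol{\Sigma}$
off-diagonal terms of $\mathbf{A}\boldsymbol{\Sigma}_{\mathbf{x}}\mathbf{A}'$: $\Big(\sum_{j=1}^{n} c_j b_j\Big)\boldsymbol{\Sigma}$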
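Example 4.8 Linear Combinations
$\mathbf{X}_1, \mathbf{X}_2, \mathbf{X}_3, \mathbf{X}_4$: independent, identical $N_3(\boldsymbol{\mu},\boldsymbol{\Sigma})$
$$\boldsymbol{\mu} = \begin{bmatrix}3\\ -1\\ 1\end{bmatrix},\quad \boldsymbol{\Sigma} = \begin{bmatrix}3 & -1 & 1\\ -1 & 1 & 0\\ 1 & 0 & 2\end{bmatrix}$$
$$\mathbf{a}'\mathbf{X}_1 : N(\mathbf{a}'\boldsymbol{\mu},\, \mathbf{a}'\boldsymbol{\Sigma}\mathbf{a})$$
$$\mathbf{a}'\boldsymbol{\mu} = 3a_1 - a_2 + a_3$$
$$\mathbf{a}'\boldsymbol{\Sigma}\mathbf{a} = 3a_1^2 + a_2^2 + 2a_3^2 - 2a_1a_2 + 2a_1a_3$$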
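Example 4.8 Linear Combinations
$$\mathbf{V}_1 = \tfrac{1}{2}\mathbf{X}_1 + \tfrac{1}{2}\mathbf{X}_2 + \tfrac{1}{2}\mathbf{X}_3 + \tfrac{1}{2}\mathbf{X}_4 : N_3(\boldsymbol{\mu}_{\mathbf{V}_1}, \boldsymbol{\Sigma}_{\mathbf{V}_1})$$
$$\boldsymbol{\mu}_{\mathbf{V}_1} = \sum_{j=1}^{4} c_j\boldsymbol{\mu}_j = 2\boldsymbol{\mu} = \begin{bmatrix}6\\ -2\\ 2\end{bmatrix},\qquad \boldsymbol{\Sigma}_{\mathbf{V}_1} = \Big(\sum_{j=1}^{4} c_j^2\Big)\boldsymbol{\Sigma} = \boldsymbol{\Sigma} = \begin{bmatrix}3 & -1 & 1\\ -1 & 1 & 0\\ 1 & 0 & 2\end{bmatrix}$$
$$\mathbf{V}_2 = \mathbf{X}_1 + \mathbf{X}_2 + \mathbf{X}_3 - 3\mathbf{X}_4,\qquad \mathrm{Cov}(\mathbf{V}_1, \mathbf{V}_2) = \Big(\sum_{j=1}^{4} c_j b_j\Big)\boldsymbol{\Sigma} = \mathbf{0}$$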
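The scalar factors in Example 4.8 can be reproduced with a few lines of Python. This is a sketch that assumes $\boldsymbol{\mu}' = [3, -1, 1]$ and the $\boldsymbol{\Sigma}$ of the example, with coefficient vectors $\mathbf{c} = [\tfrac12, \tfrac12, \tfrac12, \tfrac12]$ for $\mathbf{V}_1$ and $\mathbf{b} = [1, 1, 1, -3]$ for $\mathbf{V}_2$; the variable names are illustrative only.

```python
# Coefficients of the two linear combinations in Example 4.8.
c = [0.5, 0.5, 0.5, 0.5]   # V1 = (X1 + X2 + X3 + X4) / 2
b = [1.0, 1.0, 1.0, -3.0]  # V2 = X1 + X2 + X3 - 3 X4

mu = [3.0, -1.0, 1.0]
sigma = [[3.0, -1.0, 1.0],
         [-1.0, 1.0, 0.0],
         [1.0, 0.0, 2.0]]

sum_c = sum(c)                                # multiplies mu in the mean of V1
sum_c2 = sum(cj * cj for cj in c)             # multiplies Sigma in Cov(V1)
sum_b2 = sum(bj * bj for bj in b)             # multiplies Sigma in Cov(V2)
sum_cb = sum(cj * bj for cj, bj in zip(c, b)) # multiplies Sigma in Cov(V1, V2)

mean_V1 = [sum_c * m for m in mu]
cov_V1 = [[sum_c2 * s for s in row] for row in sigma]
cov_V1_V2 = [[sum_cb * s for s in row] for row in sigma]

print(mean_V1, sum_c2, sum_b2, sum_cb)
```

Because $\sum_j c_j b_j = 0$, the cross-covariance matrix vanishes, so by Result 4.5 the two jointly normal combinations are independent.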
Questions
What are random samples?
What is the likelihood?
How do you estimate the mean and variance of a univariate normal distribution by the maximum-likelihood technique? (point estimates)
What is the multivariate normal likelihood?
Questions
What is the trace of a matrix?
How do you compute a quadratic form using the trace of a matrix? (Result 4.9)
How do you express the trace of a matrix in terms of its eigenvalues? (Result 4.9)
Result 4.10
Questions
How do you estimate the mean and covariance matrix of a multivariate normal vector? (Result 4.11)
What is the invariance property of maximum likelihood estimates?
What are sufficient statistics?
Maximum-likelihood Estimation
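Multivariate Normal Likelihood
$\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$: random sample from $N_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$
Joint density of $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$:
$$\prod_{j=1}^{n}\left\{\frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}}\, e^{-(\mathbf{x}_j-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}_j-\boldsymbol{\mu})/2}\right\} = \frac{1}{(2\pi)^{np/2}|\boldsymbol{\Sigma}|^{n/2}}\, e^{-\sum_{j=1}^{n}(\mathbf{x}_j-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}_j-\boldsymbol{\mu})/2}$$
as a function of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ for fixed $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$: likelihood
Maximum likelihood estimation
Maximum likelihood estimates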
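Trace of a Matrix
$$\mathbf{A}_{k\times k} = \{a_{ij}\},\quad \mathrm{tr}(\mathbf{A}) = \sum_{i=1}^{k} a_{ii};\quad c \text{ is a scalar}$$
(a) $\mathrm{tr}(c\mathbf{A}) = c\,\mathrm{tr}(\mathbf{A})$
(b) $\mathrm{tr}(\mathbf{A} \pm \mathbf{B}) = \mathrm{tr}(\mathbf{A}) \pm \mathrm{tr}(\mathbf{B})$
(c) $\mathrm{tr}(\mathbf{A}\mathbf{B}) = \mathrm{tr}(\mathbf{B}\mathbf{A})$
(d) $\mathrm{tr}(\mathbf{B}^{-1}\mathbf{A}\mathbf{B}) = \mathrm{tr}(\mathbf{A})$
(e) $\mathrm{tr}(\mathbf{A}\mathbf{A}') = \sum_{i=1}^{k}\sum_{j=1}^{k} a_{ij}^2$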
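Result 4.9
$\mathbf{A}$: $k\times k$ symmetric matrix
$\mathbf{x}$: $k\times 1$ vector
(a) $\mathbf{x}'\mathbf{A}\mathbf{x} = \mathrm{tr}(\mathbf{x}'\mathbf{A}\mathbf{x}) = \mathrm{tr}(\mathbf{A}\mathbf{x}\mathbf{x}')$
(b) $\mathrm{tr}(\mathbf{A}) = \sum_{i=1}^{k}\lambda_i$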
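Proof of Result 4.9 (a)
$\mathbf{B}$: $m\times k$ matrix, $\mathbf{C}$: $k\times m$ matrix
$$\mathrm{tr}(\mathbf{B}\mathbf{C}) = \sum_{i=1}^{m}\sum_{j=1}^{k} b_{ij}c_{ji}$$
$$\mathrm{tr}(\mathbf{C}\mathbf{B}) = \sum_{j=1}^{k}\sum_{i=1}^{m} c_{ji}b_{ij} = \sum_{i=1}^{m}\sum_{j=1}^{k} b_{ij}c_{ji} = \mathrm{tr}(\mathbf{B}\mathbf{C})$$
$$\mathrm{tr}(\mathbf{x}'\mathbf{A}\mathbf{x}) = \mathrm{tr}(\mathbf{x}'(\mathbf{A}\mathbf{x})) = \mathrm{tr}((\mathbf{A}\mathbf{x})\mathbf{x}') = \mathrm{tr}(\mathbf{A}\mathbf{x}\mathbf{x}')$$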
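Proof of Result 4.9 (b)
$$\mathbf{A} = \mathbf{P}'\boldsymbol{\Lambda}\mathbf{P},\quad \mathbf{P}\mathbf{P}' = \mathbf{I},\quad \boldsymbol{\Lambda} = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_k\}$$
$$\mathrm{tr}(\mathbf{A}) = \mathrm{tr}(\mathbf{P}'\boldsymbol{\Lambda}\mathbf{P}) = \mathrm{tr}(\boldsymbol{\Lambda}\mathbf{P}\mathbf{P}') = \mathrm{tr}(\boldsymbol{\Lambda}) = \sum_{i=1}^{k}\lambda_i$$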
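Likelihood Function
$$\sum_{j=1}^{n}(\mathbf{x}_j-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}_j-\boldsymbol{\mu}) = \sum_{j=1}^{n}\mathrm{tr}\left[\boldsymbol{\Sigma}^{-1}(\mathbf{x}_j-\boldsymbol{\mu})(\mathbf{x}_j-\boldsymbol{\mu})'\right] = \mathrm{tr}\left[\boldsymbol{\Sigma}^{-1}\sum_{j=1}^{n}(\mathbf{x}_j-\boldsymbol{\mu})(\mathbf{x}_j-\boldsymbol{\mu})'\right]$$
$$\sum_{j=1}^{n}(\mathbf{x}_j-\boldsymbol{\mu})(\mathbf{x}_j-\boldsymbol{\mu})' = \sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}}+\bar{\mathbf{x}}-\boldsymbol{\mu})(\mathbf{x}_j-\bar{\mathbf{x}}+\bar{\mathbf{x}}-\boldsymbol{\mu})' = \sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})' + n(\bar{\mathbf{x}}-\boldsymbol{\mu})(\bar{\mathbf{x}}-\boldsymbol{\mu})'$$
$$L(\boldsymbol{\mu},\boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{np/2}|\boldsymbol{\Sigma}|^{n/2}}\exp\left\{-\mathrm{tr}\left[\boldsymbol{\Sigma}^{-1}\left(\sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})' + n(\bar{\mathbf{x}}-\boldsymbol{\mu})(\bar{\mathbf{x}}-\boldsymbol{\mu})'\right)\right]/2\right\}$$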
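Result 4.10
$\mathbf{B}$: $p\times p$ symmetric positive definite matrix
$b$: positive scalar
$$\frac{1}{|\boldsymbol{\Sigma}|^{b}}\, e^{-\mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{B})/2} \le \frac{1}{|\mathbf{B}|^{b}}(2b)^{pb}e^{-bp}$$
for all positive definite $\boldsymbol{\Sigma}_{(p\times p)}$, with equality holding only for $\boldsymbol{\Sigma} = (1/2b)\mathbf{B}$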
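Proof of Result 4.10
$$\mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{B}) = \mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{B}^{1/2}\mathbf{B}^{1/2}) = \mathrm{tr}(\mathbf{B}^{1/2}\boldsymbol{\Sigma}^{-1}\mathbf{B}^{1/2})$$
$\eta_i$: eigenvalues of $\mathbf{B}^{1/2}\boldsymbol{\Sigma}^{-1}\mathbf{B}^{1/2}$, all positive
$$\mathrm{tr}(\mathbf{B}^{1/2}\boldsymbol{\Sigma}^{-1}\mathbf{B}^{1/2}) = \sum_{i=1}^{p}\eta_i,\qquad |\mathbf{B}^{1/2}\boldsymbol{\Sigma}^{-1}\mathbf{B}^{1/2}| = \prod_{i=1}^{p}\eta_i = |\mathbf{B}|/|\boldsymbol{\Sigma}|$$
$$\frac{1}{|\boldsymbol{\Sigma}|^{b}}\, e^{-\mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{B})/2} = \frac{\left(\prod_{i=1}^{p}\eta_i\right)^{b}}{|\mathbf{B}|^{b}}\, e^{-\sum_{i=1}^{p}\eta_i/2} = \frac{1}{|\mathbf{B}|^{b}}\prod_{i=1}^{p}\eta_i^{b}e^{-\eta_i/2}$$
$\eta^{b}e^{-\eta/2}$ has a maximum $(2b)^{b}e^{-b}$ at $\eta = 2b$
$$\frac{1}{|\boldsymbol{\Sigma}|^{b}}\, e^{-\mathrm{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{B})/2} \le \frac{1}{|\mathbf{B}|^{b}}(2b)^{pb}e^{-bp}$$
upper bound is attained when $\boldsymbol{\Sigma} = (1/2b)\mathbf{B}$, such that $\mathbf{B}^{1/2}\boldsymbol{\Sigma}^{-1}\mathbf{B}^{1/2} = 2b\,\mathbf{I}$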
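Result 4.11 Maximum Likelihood Estimators of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$
$\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$: random sample from $N_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$
$$\hat{\boldsymbol{\mu}} = \bar{\mathbf{X}}$$
$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\sum_{j=1}^{n}(\mathbf{X}_j-\bar{\mathbf{X}})(\mathbf{X}_j-\bar{\mathbf{X}})' = \frac{n-1}{n}\mathbf{S}$$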
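The estimators of Result 4.11 can be computed from a data matrix in a few lines. The sketch below is an illustration with a made-up three-observation, two-variable sample; `mle_mean_cov` is a hypothetical helper that returns $\hat{\boldsymbol{\mu}}$, $\hat{\boldsymbol{\Sigma}}$ (divisor $n$), and $\mathbf{S}$ (divisor $n-1$) so that the relation $\hat{\boldsymbol{\Sigma}} = \frac{n-1}{n}\mathbf{S}$ can be seen directly.

```python
def mle_mean_cov(data):
    """Return (xbar, sigma_hat, s) for observations stored as rows of a list of lists."""
    n, p = len(data), len(data[0])
    xbar = [sum(row[k] for row in data) / n for k in range(p)]
    # Sum of outer products of deviations from the sample mean.
    scatter = [[sum((row[i] - xbar[i]) * (row[k] - xbar[k]) for row in data)
                for k in range(p)] for i in range(p)]
    sigma_hat = [[v / n for v in r] for r in scatter]   # maximum likelihood estimate
    s = [[v / (n - 1) for v in r] for r in scatter]     # sample covariance matrix S
    return xbar, sigma_hat, s

# Hypothetical sample with n = 3 observations on p = 2 variables.
data = [[2.0, 1.0], [4.0, 3.0], [6.0, 2.0]]
xbar, sigma_hat, s = mle_mean_cov(data)
print(xbar)
print(sigma_hat)
print(s)
```

The maximum likelihood estimate of the covariance matrix is therefore slightly smaller than $\mathbf{S}$, with the difference vanishing as $n$ grows.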
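Proof of Result 4.11
Exponent of $L(\boldsymbol{\mu},\boldsymbol{\Sigma})$:
$$-\frac{1}{2}\mathrm{tr}\left[\boldsymbol{\Sigma}^{-1}\sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})'\right] - \frac{n}{2}(\bar{\mathbf{x}}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{x}}-\boldsymbol{\mu})$$
The second term is largest (zero) at $\hat{\boldsymbol{\mu}} = \bar{\mathbf{x}}$
$$L(\hat{\boldsymbol{\mu}},\boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{np/2}|\boldsymbol{\Sigma}|^{n/2}}\exp\left\{-\mathrm{tr}\left[\boldsymbol{\Sigma}^{-1}\sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})'\right]/2\right\}$$
Applying Result 4.10 with $b = n/2$ and $\mathbf{B} = \sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})'$:
$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})' = \frac{n-1}{n}\mathbf{S}$$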
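Invariance Property
$\hat{\theta}$: maximum likelihood estimator of $\theta$ $\Rightarrow$ $h(\hat{\theta})$: maximum likelihood estimator of $h(\theta)$
Examples:
MLE of $\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$ = $\hat{\boldsymbol{\mu}}'\hat{\boldsymbol{\Sigma}}^{-1}\hat{\boldsymbol{\mu}}$
MLE of $\sqrt{\sigma_{ii}}$ = $\sqrt{\hat{\sigma}_{ii}}$
$$\hat{\sigma}_{ii} = \frac{1}{n}\sum_{j=1}^{n}(X_{ij}-\bar{X}_i)^2 = \text{MLE of } \mathrm{Var}(X_i)$$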
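Sufficient Statistics
Joint density of $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$:
$$\frac{1}{(2\pi)^{np/2}|\boldsymbol{\Sigma}|^{n/2}}\exp\left\{-\mathrm{tr}\left[\boldsymbol{\Sigma}^{-1}\left(\sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})' + n(\bar{\mathbf{x}}-\boldsymbol{\mu})(\bar{\mathbf{x}}-\boldsymbol{\mu})'\right)\right]/2\right\}$$
depends on the whole set of observations $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ only through $\bar{\mathbf{x}}$ and $\mathbf{S}$
$\bar{\mathbf{x}}$ and $\mathbf{S}$ are sufficient statistics of a multivariate normal population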
Questions
What is the distribution of the sample mean for multivariate normal samples?
What is the distribution of the sample covariance matrix for multivariate normal samples?
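Distribution of Sample Mean
$\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$: random sample from $N_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$
Univariate case ($p = 1$): $\bar{X} : N(\mu, \sigma^2/n)$
Multivariate case: $\bar{\mathbf{X}} : N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}/n)$
cf. Result 4.8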
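Sampling Distribution of $\mathbf{S}$
$\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$: random sample from $N_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$
Univariate case ($p = 1$):
$$(n-1)s^2 = \sum_{j=1}^{n}(X_j-\bar{X})^2 : \sigma^2\chi_{n-1}^2$$
$(n-1)s^2$ is distributed as $\sum_{j=1}^{n-1} Z_j^2$, $Z_j : N(0, \sigma^2)$
Multivariate case:
$(n-1)\mathbf{S}$ is distributed as $\sum_{j=1}^{n-1}\mathbf{Z}_j\mathbf{Z}_j'$, $\mathbf{Z}_j : N_p(\mathbf{0}, \boldsymbol{\Sigma})$
$(n-1)\mathbf{S}$: Wishart distribution $W_{n-1}((n-1)\mathbf{S} \mid \boldsymbol{\Sigma})$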
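Wishart Distribution
$$w_{n-1}(\mathbf{A} \mid \boldsymbol{\Sigma}) = \frac{|\mathbf{A}|^{(n-p-2)/2}\, e^{-\mathrm{tr}[\mathbf{A}\boldsymbol{\Sigma}^{-1}]/2}}{2^{p(n-1)/2}\,\pi^{p(p-1)/4}\,|\boldsymbol{\Sigma}|^{(n-1)/2}\prod_{i=1}^{p}\Gamma\left(\frac{1}{2}(n-i)\right)}$$
$\mathbf{A}$: positive definite
Properties:
$\mathbf{A}_1 : W_{m_1}(\mathbf{A}_1 \mid \boldsymbol{\Sigma})$, $\mathbf{A}_2 : W_{m_2}(\mathbf{A}_2 \mid \boldsymbol{\Sigma})$, independent $\Rightarrow$ $\mathbf{A}_1 + \mathbf{A}_2 : W_{m_1+m_2}(\mathbf{A}_1 + \mathbf{A}_2 \mid \boldsymbol{\Sigma})$
$\mathbf{A} : W_m(\mathbf{A} \mid \boldsymbol{\Sigma})$ $\Rightarrow$ $\mathbf{C}\mathbf{A}\mathbf{C}' : W_m(\mathbf{C}\mathbf{A}\mathbf{C}' \mid \mathbf{C}\boldsymbol{\Sigma}\mathbf{C}')$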
Questions
What is the univariate central limit theorem?
What is the law of large numbers in the univariate case and in the multivariate case? (Result 4.12)
What is the multivariate central limit theorem? (Result 4.13)
Questions
What is the limit distribution of the squared statistical distance?
Univariate Central Limit Theorem
$X$: determined by a large number of independent causes $V_1, V_2, \ldots, V_n$
$V_i$: random variables having approximately the same variability
$X = V_1 + V_2 + \cdots + V_n$ has a nearly normal distribution
$\bar{X}$ is also nearly normal for large sample size
Result 4.12 Law of Large Numbers
$Y_1, Y_2, \ldots, Y_n$: independent observations from a population (may not be normal) with $E(Y_i) = \mu$
$$\bar{Y} = \frac{Y_1 + Y_2 + \cdots + Y_n}{n}$$
converges in probability to $\mu$
That is, for any prescribed $\varepsilon > 0$,
$$P[-\varepsilon < \bar{Y} - \mu < \varepsilon] \to 1 \text{ as } n \to \infty$$
Result 4.12 Multivariate Cases
$\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$: independent observations from a population (may not be multivariate normal) with mean $E(\mathbf{X}_i) = \boldsymbol{\mu}$
$\bar{\mathbf{X}}$ converges in probability to $\boldsymbol{\mu}$
$\mathbf{S}$ converges in probability to $\boldsymbol{\Sigma}$
Result 4.13 Central Limit Theorem
$\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$: independent observations from a population with mean $\boldsymbol{\mu}$ and finite covariance $\boldsymbol{\Sigma}$
$\sqrt{n}(\bar{\mathbf{X}} - \boldsymbol{\mu})$ is approximately $N_p(\mathbf{0}, \boldsymbol{\Sigma})$ for large sample size $n - p$
(quite good approximation for moderate $n$ when the parent population is nearly normal)
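Limit Distribution of Statistical Distance
$\bar{\mathbf{X}}$: nearly $N_p(\boldsymbol{\mu}, \frac{1}{n}\boldsymbol{\Sigma})$ for large sample size $n - p$
$n(\bar{\mathbf{X}}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{X}}-\boldsymbol{\mu})$: approximately $\chi_p^2$ for large $n - p$
$\mathbf{S}$ is close to $\boldsymbol{\Sigma}$ with high probability when $n$ is large
$n(\bar{\mathbf{X}}-\boldsymbol{\mu})'\mathbf{S}^{-1}(\bar{\mathbf{X}}-\boldsymbol{\mu})$: approximately $\chi_p^2$ for large $n - p$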
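A small simulation can illustrate the chi-square limit for a non-normal parent population. The sketch below makes several assumptions for the sake of a self-contained example: $p = 2$, the two components are independent Uniform(0, 1) variables (so $\boldsymbol{\mu} = [0.5, 0.5]'$ and $\boldsymbol{\Sigma} = \frac{1}{12}\mathbf{I}$), and the known $\boldsymbol{\Sigma}$ is used in the distance. It counts how often the statistic falls below the median $\chi_2^2(0.5) = -2\ln 0.5$.

```python
import math
import random

def fraction_below_median(n=30, reps=4000, seed=1):
    """Fraction of replications with n (xbar - mu)' Sigma^{-1} (xbar - mu) <= chi2_2(0.5)."""
    rng = random.Random(seed)
    median_chi2 = -2.0 * math.log(0.5)  # closed form, valid for 2 degrees of freedom only
    count = 0
    for _ in range(reps):
        xbar1 = sum(rng.random() for _ in range(n)) / n
        xbar2 = sum(rng.random() for _ in range(n)) / n
        # Sigma = (1/12) I, so Sigma^{-1} = 12 I.
        stat = n * 12.0 * ((xbar1 - 0.5) ** 2 + (xbar2 - 0.5) ** 2)
        if stat <= median_chi2:
            count += 1
    return count / reps

print(fraction_below_median())  # expected to be near 0.5
```

A fraction near one half is what the limit distribution predicts, even though the individual observations are uniform and not normal.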
Questions
How do you determine whether the samples follow a normal distribution?
What is the Q-Q plot? Why is it valid?
How do you measure the straightness of a Q-Q plot?
Questions
How do you use Result 4.7 to check whether the samples are taken from a multivariate normal population?
What is the chi-square plot? How is it used?
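Q-Q Plot
$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$: observations on $X$
Let $x_{(j)}$ be distinct and $n$ moderate to large, e.g., $n \ge 20$
Portion of $x \le x_{(j)}$: $j/n \to (j - \frac{1}{2})/n$
$$P[Z \le q_{(j)}] = \int_{-\infty}^{q_{(j)}}\frac{1}{\sqrt{2\pi}}e^{-z^2/2}\,dz = p_{(j)} = \frac{j - \frac{1}{2}}{n}$$
Plot $(q_{(j)}, x_{(j)})$ to see if they are approximately linear, since $x_{(j)} \approx \sigma q_{(j)} + \mu$ if the data are from a normal distribution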
Example 4.9
Histogram of Midterm Scores of Students of This Course in 2006
Q-Q Plot of Midterm Scores of Students of This Course in 2006
$n = 33$, $r_Q = 0.946652$
Example 4.10 Radiation Data of Closed-Door Microwave Oven
Measurement of Straightness
$$r_Q = \frac{\sum_{j=1}^{n}(x_{(j)}-\bar{x})(q_{(j)}-\bar{q})}{\sqrt{\sum_{j=1}^{n}(x_{(j)}-\bar{x})^2}\,\sqrt{\sum_{j=1}^{n}(q_{(j)}-\bar{q})^2}}$$
Reject the normality hypothesis at level of significance $\alpha$ if $r_Q$ falls below the appropriate value in Table 4.2
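The Q-Q coordinates and the straightness measure $r_Q$ can be computed with the Python standard library alone. In the sketch below, `normal_quantiles` and `qq_correlation` are hypothetical helper names, and the sample is made up; the standard normal quantiles $q_{(j)}$ come from `statistics.NormalDist().inv_cdf`, evaluated at the probability levels $(j - \frac12)/n$.

```python
import math
from statistics import NormalDist

def normal_quantiles(n):
    """Standard normal quantiles q_(j) at probability levels (j - 1/2)/n, j = 1..n."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]

def qq_correlation(sample):
    """Correlation coefficient r_Q between ordered observations and normal quantiles."""
    x = sorted(sample)
    q = normal_quantiles(len(x))
    xbar = sum(x) / len(x)
    qbar = sum(q) / len(q)
    num = sum((xi - xbar) * (qi - qbar) for xi, qi in zip(x, q))
    den = (math.sqrt(sum((xi - xbar) ** 2 for xi in x))
           * math.sqrt(sum((qi - qbar) ** 2 for qi in q)))
    return num / den

# Hypothetical sample of 10 observations.
sample = [-1.0, -0.1, 0.16, 0.41, 0.62, 0.80, 1.26, 1.54, 1.71, 2.30]
print(list(zip(normal_quantiles(len(sample)), sorted(sample))))  # points of the Q-Q plot
print(qq_correlation(sample))
```

The resulting $r_Q$ would then be compared with the critical value for the given $n$ and $\alpha$ in the correlation coefficient test table; those critical values are not reproduced here.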
Table 4.2 Q-Q Plot Correlation Coefficient Test
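Example 4.11
For data from Example 4.9, $\bar{x} = 0.770$, $\bar{q} = 0$
$$\sum_{j=1}^{10}(x_{(j)}-\bar{x})q_{(j)} = 8.584,\quad \sum_{j=1}^{10}(x_{(j)}-\bar{x})^2 = 8.472,\quad \sum_{j=1}^{10}q_{(j)}^2 = 8.795$$
$$r_Q = \frac{8.584}{\sqrt{8.472}\sqrt{8.795}} = 0.994$$
$n = 10$, $\alpha = 0.10$: $r_Q > 0.9351$ $\Rightarrow$ do not reject the normality hypothesis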
Evaluating Bivariate Normality
Check if roughly 50% of the sample observations lie in the ellipse given by
$$\{\text{all } \mathbf{x} \text{ such that } (\mathbf{x}-\bar{\mathbf{x}})'\mathbf{S}^{-1}(\mathbf{x}-\bar{\mathbf{x}}) \le \chi_2^2(0.5)\}$$
Example 4.12
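Example 4.12
$$\bar{\mathbf{x}} = \begin{bmatrix}62309\\ 2927\end{bmatrix},\quad \mathbf{S} = \begin{bmatrix}10005.20 & 255.76\\ 255.76 & 14.30\end{bmatrix}\times 10^{5}$$
$$\mathbf{S}^{-1} = \begin{bmatrix}0.000184 & -0.003293\\ -0.003293 & 0.128831\end{bmatrix}\times 10^{-5},\quad \chi_2^2(0.5) = 1.39$$
For $\mathbf{x}' = [x_1, x_2] = [126974, 4224]$:
$$d^2 = \begin{bmatrix}126974 - 62309\\ 4224 - 2927\end{bmatrix}'\mathbf{S}^{-1}\begin{bmatrix}126974 - 62309\\ 4224 - 2927\end{bmatrix} = 4.34 > 1.39$$
Seven out of 10 observations have $d^2 < 1.39$
Greater than 50% $\Rightarrow$ reject bivariate normality
However, the sample size ($n = 10$) is too small to reach this conclusion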
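The squared distance quoted in Example 4.12 can be verified by direct arithmetic. The sketch below takes the sample mean as $[62309, 2927]'$, the observation as $[126974, 4224]'$, and the inverse covariance matrix entries as listed in the example (scaled by $10^{-5}$); the median $\chi_2^2(0.5)$ is computed from the closed form $-2\ln 0.5$, which holds for 2 degrees of freedom.

```python
import math

# Values taken from Example 4.12.
xbar = [62309.0, 2927.0]
s_inv = [[0.000184e-5, -0.003293e-5],
         [-0.003293e-5, 0.128831e-5]]
x = [126974.0, 4224.0]

# Deviation from the sample mean and the quadratic form d' S^{-1} d.
d = [x[0] - xbar[0], x[1] - xbar[1]]
d2 = sum(d[i] * s_inv[i][k] * d[k] for i in range(2) for k in range(2))

chi2_median = -2.0 * math.log(0.5)  # chi-square_2(0.5), about 1.39

print(d2)                 # about 4.34
print(d2 > chi2_median)   # this observation lies outside the 50% ellipse
```

Repeating the same computation for every observation gives the count of points inside the 50% contour that the example uses.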
Chi-Square Plot
$d_j^2 = (\mathbf{x}_j-\bar{\mathbf{x}})'\mathbf{S}^{-1}(\mathbf{x}_j-\bar{\mathbf{x}})$: squared distance
Order the squared distances: $d_{(1)}^2 \le d_{(2)}^2 \le \cdots \le d_{(n)}^2$
$q_{c,p}((j-\frac{1}{2})/n)$: $100(j-\frac{1}{2})/n$ quantile of the chi-square distribution with $p$ degrees of freedom
Graph all $(q_{c,p}((j-\frac{1}{2})/n),\; d_{(j)}^2)$
The plot should resemble a straight line through the origin having slope 1
Note that $q_{c,p}((j-\frac{1}{2})/n) = \chi_p^2(1-(j-\frac{1}{2})/n)$
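The construction of a chi-square plot can be sketched for the bivariate case, where the chi-square quantile has the closed form $q_{c,2}(u) = -2\ln(1-u)$. The data set and the helper names below are assumptions made for illustration; for $p > 2$ the quantiles would come from a chi-square table or a statistics library.

```python
import math

def squared_distances_2d(data):
    """d_j^2 = (x_j - xbar)' S^{-1} (x_j - xbar) for bivariate observations."""
    n = len(data)
    m1 = sum(r[0] for r in data) / n
    m2 = sum(r[1] for r in data) / n
    s11 = sum((r[0] - m1) ** 2 for r in data) / (n - 1)
    s22 = sum((r[1] - m2) ** 2 for r in data) / (n - 1)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in data) / (n - 1)
    det = s11 * s22 - s12 * s12
    return [(s22 * (r[0] - m1) ** 2 - 2.0 * s12 * (r[0] - m1) * (r[1] - m2)
             + s11 * (r[1] - m2) ** 2) / det for r in data]

def chi_square_plot_points(data):
    """Pairs (q_{c,2}((j - 1/2)/n), d^2_(j)) for the chi-square plot with p = 2."""
    d2 = sorted(squared_distances_2d(data))
    n = len(d2)
    q = [-2.0 * math.log(1.0 - (j - 0.5) / n) for j in range(1, n + 1)]
    return list(zip(q, d2))

# Hypothetical bivariate sample.
data = [[1.0, 2.0], [2.0, 1.5], [3.0, 3.5], [4.0, 3.0], [5.0, 5.5], [6.0, 4.0]]
for q, d2 in chi_square_plot_points(data):
    print(round(q, 3), round(d2, 3))
```

A useful check on the distances is that they always sum to $(n-1)p$, because $\sum_j (\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})' = (n-1)\mathbf{S}$.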
Example 4.13 Chi-Square Plot for Example 4.12
Chi-Square Plot for Computer Generated 4-variate Normal Data
Steps for Detecting Outliers
Make a dot plot for each variable
Make a scatter plot for each pair of variables
Calculate the standardized values. Examine them for large or small values
Calculate the squared statistical distance. Examine it for unusually large values. In a chi-square plot, these would be the points farthest from the origin.
Questions
How do you transform sample counts, proportions, and correlations so that the new variable is closer to a univariate normal distribution?
What is Box and Cox's univariate transformation?
How do you extend Box and Cox's transformation to the multivariate case?
Questions
How do you deal with data that include large negative values?
Helpful Transformation to Near Normality
Original Scale: Transformed Scale
Counts, $y$: $\sqrt{y}$
Proportions, $\hat{p}$: $\mathrm{logit}(\hat{p}) = \frac{1}{2}\log\left(\frac{\hat{p}}{1-\hat{p}}\right)$
Correlations, $r$: Fisher's $z(r) = \frac{1}{2}\log\left(\frac{1+r}{1-r}\right)$
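The three transformations in the table translate directly into code. The function names below are hypothetical; note that the logit here carries the factor $\frac12$ used on the slide, and that Fisher's $z$ coincides with the inverse hyperbolic tangent.

```python
import math

def sqrt_count(y):
    """Square-root transformation for counts y >= 0."""
    return math.sqrt(y)

def logit_half(p_hat):
    """(1/2) log(p / (1 - p)) for a proportion 0 < p < 1."""
    return 0.5 * math.log(p_hat / (1.0 - p_hat))

def fisher_z(r):
    """Fisher's z(r) = (1/2) log((1 + r) / (1 - r)) for a correlation -1 < r < 1."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

print(sqrt_count(9))
print(logit_half(0.8))
print(fisher_z(0.5), math.atanh(0.5))  # the two values agree
```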
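Box and Cox's Univariate Transformations
$$x^{(\lambda)} = \begin{cases}\dfrac{x^{\lambda}-1}{\lambda}, & \lambda \ne 0\\[2mm] \ln x, & \lambda = 0\end{cases}$$
Choose $\lambda$ to maximize
$$\ell(\lambda) = -\frac{n}{2}\ln\left[\frac{1}{n}\sum_{j=1}^{n}\left(x_j^{(\lambda)} - \overline{x^{(\lambda)}}\right)^2\right] + (\lambda - 1)\sum_{j=1}^{n}\ln x_j$$
$$\overline{x^{(\lambda)}} = \frac{1}{n}\sum_{j=1}^{n}x_j^{(\lambda)}$$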
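A simple way to choose $\lambda$ is to evaluate $\ell(\lambda)$ on a grid and take the maximizer. The sketch below assumes strictly positive data and a grid from $-2$ to $2$ in steps of $0.1$; the data values, the grid, and the function names are illustrative choices, not part of the slides.

```python
import math

def box_cox(x, lam):
    """Box-Cox power transformation of a positive value x."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def log_likelihood(data, lam):
    """l(lambda) = -(n/2) ln[(1/n) sum (x^(lam) - mean)^2] + (lam - 1) sum ln x."""
    n = len(data)
    t = [box_cox(x, lam) for x in data]
    mean_t = sum(t) / n
    var_t = sum((v - mean_t) ** 2 for v in t) / n
    return -0.5 * n * math.log(var_t) + (lam - 1.0) * sum(math.log(x) for x in data)

def best_lambda(data, grid=None):
    """Grid value of lambda that maximizes l(lambda)."""
    if grid is None:
        grid = [i / 10.0 for i in range(-20, 21)]  # -2.0, -1.9, ..., 2.0
    return max(grid, key=lambda lam: log_likelihood(data, lam))

# Hypothetical right-skewed positive measurements.
data = [0.15, 0.09, 0.18, 0.10, 0.05, 0.12, 0.08, 0.05, 0.08, 0.10, 0.30, 0.40]
lam_hat = best_lambda(data)
print(lam_hat, log_likelihood(data, lam_hat))
```

In practice a convenient value near the maximizer (such as $0$, $\frac14$, or $\frac12$) is often used, and a Q-Q plot of the transformed data is examined afterwards.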
Example 4.16 $\ell(\lambda)$ vs. $\lambda$
Example 4.16 Q-Q Plot
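Transforming Multivariate Observations
$\lambda_1, \lambda_2, \ldots, \lambda_p$: power transformations for the $p$ characteristics
Select $\lambda_k$ to maximize
$$\ell_k(\lambda_k) = -\frac{n}{2}\ln\left[\frac{1}{n}\sum_{j=1}^{n}\left(x_{jk}^{(\lambda_k)} - \overline{x_k^{(\lambda_k)}}\right)^2\right] + (\lambda_k - 1)\sum_{j=1}^{n}\ln x_{jk}$$
$$\overline{x_k^{(\lambda_k)}} = \frac{1}{n}\sum_{j=1}^{n}x_{jk}^{(\lambda_k)}$$
$$\mathbf{x}_j^{(\hat{\boldsymbol{\lambda}})} = \left[\frac{x_{j1}^{\hat{\lambda}_1}-1}{\hat{\lambda}_1},\; \frac{x_{j2}^{\hat{\lambda}_2}-1}{\hat{\lambda}_2},\; \ldots,\; \frac{x_{jp}^{\hat{\lambda}_p}-1}{\hat{\lambda}_p}\right]'$$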
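More Elaborate Approach
$\lambda_1, \lambda_2, \ldots, \lambda_p$: power transformations for the $p$ characteristics
Select $\boldsymbol{\lambda}' = [\lambda_1, \lambda_2, \ldots, \lambda_p]$ to maximize
$$\ell(\lambda_1, \lambda_2, \ldots, \lambda_p) = -\frac{n}{2}\ln|\mathbf{S}(\boldsymbol{\lambda})| + \sum_{k=1}^{p}(\lambda_k - 1)\sum_{j=1}^{n}\ln x_{jk}$$
$\mathbf{S}(\boldsymbol{\lambda})$ is computed from
$$\mathbf{x}_j^{(\boldsymbol{\lambda})} = \left[\frac{x_{j1}^{\lambda_1}-1}{\lambda_1},\; \frac{x_{j2}^{\lambda_2}-1}{\lambda_2},\; \ldots,\; \frac{x_{jp}^{\lambda_p}-1}{\lambda_p}\right]'$$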
Example 4.17 Original Q-Q Plot for Open-Door Data
Example 4.17 Q-Q Plot of Transformed Open-Door Data
Example 4.17 Contour Plot of $\ell(\lambda_1, \lambda_2)$ for Both Radiation Data
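Transform for Data Including Large Negative Values
$$x^{(\lambda)} = \begin{cases}\{(x+1)^{\lambda} - 1\}/\lambda, & x \ge 0,\ \lambda \ne 0\\ \log(x+1), & x \ge 0,\ \lambda = 0\\ -\{(-x+1)^{2-\lambda} - 1\}/(2-\lambda), & x < 0,\ \lambda \ne 2\\ -\log(-x+1), & x < 0,\ \lambda = 2\end{cases}$$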
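The four-branch transformation above, which has the form usually attributed to Yeo and Johnson, can be written as a single function. The name `yeo_johnson` is a label chosen here for illustration; the sketch simply mirrors the four cases and makes no claim about how $\lambda$ should be selected.

```python
import math

def yeo_johnson(x, lam):
    """Power transformation defined for all real x, following the four cases above."""
    if x >= 0:
        if lam == 0:
            return math.log(x + 1.0)
        return ((x + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log(-x + 1.0)
    return -((-x + 1.0) ** (2.0 - lam) - 1.0) / (2.0 - lam)

# With lambda = 1 the transformation leaves the data unchanged.
print([yeo_johnson(v, 1.0) for v in (-3.0, 0.0, 3.0)])
# Other values of lambda compress or stretch the two tails differently.
print([round(yeo_johnson(v, 0.5), 4) for v in (-3.0, 0.0, 3.0)])
```

For nonnegative $x$ this is the Box-Cox transformation applied to $x + 1$, which is why zero is mapped to zero for every $\lambda$.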