21
1 1 Multivariate Normal Distribution Multivariate Normal Distribution Shyh Shyh-Kang Kang Jeng Jeng Department of Electrical Engineering/ Department of Electrical Engineering/ Graduate Institute of Communication/ Graduate Institute of Communication/ Graduate Institute of Networking and Graduate Institute of Networking and Multimedia Multimedia Outline Introduction The Multivariate Normal Density and Its Properties Sampling from a Multivariate Normal Distribution and Maximum Likelihood Estimation The Sampling Distribution of and S Large-Sample Behavior of and S X X Outline Assessing the Assumption of Normality Detecting Outliers and Cleaning Data Transformations to Near Normality Outline Introduction The Multivariate Normal Density and Its Properties Sampling from a Multivariate Normal Distribution and Maximum Likelihood Estimation The Sampling Distribution of and S Large-Sample Behavior of and S X X Questions What is the univariate normal distribution? What is the multivariate normal distribution? Why to study multivariate normal distribution? 6 Multivariate Normal Distribution Multivariate Normal Distribution Generalized from Generalized from univariate univariate normal normal density density Base of many multivariate analysis Base of many multivariate analysis techniques techniques Useful approximation to Useful approximation to true truepopulation distribution population distribution Central limit distribution of many Central limit distribution of many multivariate statistics multivariate statistics Mathematical tractable Mathematical tractable

Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

1

11

Multivariate Normal DistributionMultivariate Normal Distribution

ShyhShyh--Kang Kang JengJengDepartment of Electrical Engineering/Department of Electrical Engineering/Graduate Institute of Communication/Graduate Institute of Communication/Graduate Institute of Networking and Graduate Institute of Networking and

MultimediaMultimedia

OutlineIntroductionThe Multivariate Normal Density and Its PropertiesSampling from a Multivariate Normal Distribution and Maximum Likelihood EstimationThe Sampling Distribution of and SLarge-Sample Behavior of and S

X

X

Outline

Assessing the Assumption of NormalityDetecting Outliers and Cleaning DataTransformations to Near Normality

OutlineIntroductionThe Multivariate Normal Density and Its PropertiesSampling from a Multivariate Normal Distribution and Maximum Likelihood EstimationThe Sampling Distribution of and SLarge-Sample Behavior of and S

X

X

Questions

What is the univariate normal distribution?What is the multivariate normal distribution?Why to study multivariate normal distribution?

66

Multivariate Normal DistributionMultivariate Normal Distribution

Generalized from Generalized from univariateunivariate normal normal densitydensityBase of many multivariate analysis Base of many multivariate analysis techniquestechniquesUseful approximation to Useful approximation to ““truetrue””population distributionpopulation distributionCentral limit distribution of many Central limit distribution of many multivariate statistics multivariate statistics Mathematical tractableMathematical tractable

Page 2: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

2

OutlineIntroductionThe Multivariate Normal Density and Its PropertiesSampling from a Multivariate Normal Distribution and Maximum Likelihood EstimationThe Sampling Distribution of and SLarge-Sample Behavior of and S

X

X

QuestionsWhat is the formula for the probability density function of a univariate normal distribution?What are the probability meaning of parameters and ?How much probability are in the intervals (-, +) and (-2, +2)?How to look up the accumulated univariate normal probability in Table 1, Appendix?

Questions

What is the MahalanobisMahalanobis distance for distance for univariateunivariate normal distribution?normal distribution?What is the What is the MahalanobisMahalanobis distance for distance for multivariate normal distribution?multivariate normal distribution?What are the symbol for and the What are the symbol for and the formula of the probability density of formula of the probability density of a a pp--dimensional multivariate normal dimensional multivariate normal distribution?distribution?

QuestionsWhat are the possible shapes in a surface diagram of a bivariatenormal density?What is the constant probability density contour for a pp--dimensional dimensional multivariate normal distribution?multivariate normal distribution?What are the What are the eigenvalueseigenvalues and and eigenvectors of the inverse of eigenvectors of the inverse of ? ? (Result 4.1)(Result 4.1)

QuestionsWhat is the region that the total probability inside equals 1-? What is the probability distribution for a linear combination of p random variables with the same multivariate-normal distribution? (Result 4.2)How to find the marginal distribution of a multivariate-normal distribution by Result 4.2?

QuestionsWhat is the probability distribution for a random vector obtained by multiplying a matrix to a random vector of p random variables with the same multivariate-normal distribution? (Result 4.3)What is the probability distribution of a random vector of multivariate normal distribution plus a constant vector? (Result 4.3)

Page 3: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

3

Questions

Given the mean and covariance matrix of a multivariate random vector, and the random vector is partitioned, how to find the mean and covariance matrix of the two parts of the partitioned random vector? (Result 4.4)

QuestionsWhat are the if-and-only-if conditions for two multivariate normal vectors X1 and X2 to be independent? (Result 4.5)If two multivariate normal vectors X1and X2 are independent, what will be the probability distribution of the random vector partitioned into X1and X2? (Result 4.5)

Questions

A random vector X is partitioned into X1 and X2, then what is the conditional probability distribution odX1 given X2 = x2? (Result 4.6)What is the probability distribution for the square of the MahalanobisMahalanobisdistance for a multivariate normal distance for a multivariate normal vector? (Result 4.7)vector? (Result 4.7)

Questions

How to find the value of the MahalanobisMahalanobis distance for a distance for a multivariate normal vector when the multivariate normal vector when the probability inside the corresponding probability inside the corresponding ellipsoid is specified? (Result 4.7)ellipsoid is specified? (Result 4.7)

QuestionsWhat is the shape of a chi-square distribution curve? How to look up the accumulated chi-square probability from Table 3, Appendix?What is the joint distribution of two random vectors which are two linear combinations of n different multivariate random vectors? (Result 4.8)

1818

UnivariateUnivariate Normal DistributionNormal Distribution

xexf

N

x 2/]/)[(

2

2

2

2

1)(

),(

Page 4: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

4

1919

Table 1, AppendixTable 1, Appendix

2020

Square of DistanceSquare of Distance((MahalanobisMahalanobis distance) distance)

)('

)())((

1

122

μxμx

xxx

2121

pp--dimensional Normal Densitydimensional Normal Density

],,,['

vector random from sample a is

,,2,1,

2

1)(

),(

21

2/)()'(2/12/

1

p

i

p

p

XXX

pix

ef

N

X

x

Σx

Σμ

μxΣμx

2222

Example 4.1 Example 4.1 BivariateBivariate NormalNormal

)1(

1,

),Corr(/

)Var(),Var(

)(),(

2122211

2122211

1112

1222

2122211

1

2221

1211

2122111212

222111

2211

ΣΣ

XX

XX

XEXE

2323

Example 4.1 Squared DistanceExample 4.1 Squared Distance

22

22

11

1112

2

22

22

2

11

112

12

22

11

11221112

22111222

2122211

2211

1

21

1

)1(

1],[

)('

xxxx

x

x

xx

μxΣμx

2424

Example 4.1 Density FunctionExample 4.1 Density Function

]}2

[)1(2

1exp{

)1(2

1),(

22

22

11

1112

2

22

22

2

11

112

12

2122211

21

xx

xx

xxf

Page 5: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

5

2525

Example 4.1 Example 4.1 BivariateBivariate DistributionDistribution

11 = 22, 12 = 02626

Example 4.1 Example 4.1 BivariateBivariate DistributionDistribution

11 = 22, 12 = 0.75

2727

ContoursContours

pi

c

c

contourdensityyprobabilitConstant

iii

i

,,2,1,

:axes

at centered ellipsoidan of surface

)-()'(such that all

i

21

eΣe

e

μ

μxΣμxx

2828

Result 4.1Result 4.1

definite positive

for ),(1/for ),(

1

definite positive :

1

1-

1-

Σ

ΣeΣe

eeΣeΣe

Σ

2929

Example 4.2 Example 4.2 BivariateBivariate ContourContour

]2

1,

2

1[',

]2

1,

2

1[',

rseigenvecto and seigenvalue

normal, Bivariate

212112

112111

2211

e

e

3030

Example 4.2 Positive CorrelationExample 4.2 Positive Correlation

Page 6: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

6

3131

Probability Related to Probability Related to Squared DistanceSquared Distance

1y probabilit has

)()()'(

satisfying values of ellipsoid Solid21pμxΣμx

x

3232

Probability Related to Probability Related to Squared DistanceSquared Distance

3333

Result 4.2Result 4.2

),( bemust

every for )','(:'

)','(

:'

),(:

2211

ΣμX

aΣaaμaXa

Σaaμa

Xa

ΣμX

p

pp

p

N

N

N

XaXaXa

N

3434

Example 4.3 Marginal DistributionExample 4.3 Marginal Distribution

),(

:in ofon distributi Marginal

),()','(:'

','

'],0,,0,1['

),(:]',,,[

111

111

1

21

iii

i

pp

N

X

NN

X

NXXX

X

ΣaaμaXa

Σaaμa

Xaa

ΣμX

3535

Result 4.3Result 4.3

),(:

)',(:

),(:

11

2121

1111

ΣdμdX

AΣAAμAX

ΣμX

p

q

pqpq

pp

pp

p

N

N

XaXa

XaXa

XaXa

N

3636

Proof of Result 4.3: Part 1Proof of Result 4.3: Part 1

)',(:every for valid

))'('),('(:)('

))'()'(,)'((:)'(

'

,')(' ncombinatiolinear Any

AAΣAμAXb

bAAΣbAμbAXb

bAΣAbμAbXAb

bAa

XaAXb

qN

N

N

Page 7: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

7

3737

Proof of Result 4.3: Part 2Proof of Result 4.3: Part 2

),(:

arbitrary is

)',''(:''

)','(:'

'')('

ΣdμdX

a

ΣaadaμadaXa

ΣaaμaXa

daXadXa

pN

N

N

3838

Example 4.4 Linear CombinationsExample 4.4 Linear Combinations

322211

2

33232213222312

13222312221211

32

21

3

2

1

32

21

3

, with verifiedbe can

)',(:

2

2'

110

011

),(:

XXYXXY

N

X

X

X

XX

XX

N

AAΣAμAX

AAΣ

AX

ΣμX

3939

Result 4.4Result 4.4

),(:

|

|

,,

),(:

1111

2221

1211

2

1

)1)((2

)1(1

ΣμX

ΣΣ

ΣΣ

Σ

μ

μ

μ

X

X

X

ΣμX

q

qp

q

p

N

N

4.3Result in |Set :Proof))(()()(

qpqqqpq0IA

4040

Example 4.5 Subset DistributionExample 4.5 Subset Distribution

4424

2422

4

221

4424

242211

4

21

4

21

5

,:

,,

),(:

N

X

X

N

X

ΣμX

ΣμX

4141

Result 4.5Result 4.5

22

11

2

1

2

1

2222211111

1221

2221

1211

2

1

2

1

)(21

)1(2

)1(1

|'

|

,:

tindependen ),(:),,(: (c)

0 ifonly and ift independen:,

|

|

,:--- (b)

),Cov( t,independen :, (a)

21

21

2121

Σ0

μ

μ

X

X

ΣμXΣμX

ΣXX

ΣΣ

ΣΣ

μ

μ

X

X

0XXXX

qq

qq

qq

qqqq

N

NN

N

4242

Example 4.6 IndependenceExample 4.6 Independence

) also and oft independen is (

tindependen are and

tindependennot :,

200

031

014

),(:

213

32

11

21

3

XXX

XX

X

XX

N

X

Σ

ΣμX

Page 8: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

8

4343

Result 4.6Result 4.6

211

221211

221

2211

221

22

2221

1211

2

1

2

1

covariance

and )(mean with normal

is given ofon distributi lconditiona

0,

|

|

,),,(:

ΣΣΣΣ

μxΣΣμ

xXX

Σ

ΣΣ

ΣΣ

Σ

μ

μ

μΣμ

X

X

X

pN

4444

Proof of Result 4.6Proof of Result 4.6

22

211

221211

22

221

221211

12212

|

'|

'

covariance with normaljoint

:

)(

)(

,

|0

|

Σ0

0ΣΣΣΣ

AAΣ

μX

μXΣΣμX

μXA

I

ΣΣI

A

4545

Proof of Result 4.6Proof of Result 4.6

)),((

:given

),0(:)(

))()((

)

|)()((

)()(/),()|(t independen ,

tindependen are and )(

211

221211221

22121

221

211

221211221

221211

221

221211221

221211

22

221

221211221

221211

22221

221211

ΣΣΣΣμxΣΣμ

xXX

ΣΣΣΣμXΣΣμX

μxΣΣμxμXΣΣμX

xX

μxΣΣμxμXΣΣμX

μXμXΣΣμX

q

q

N

N

f

f

APBPBAPBAPBA

4646

Example 4.7 Conditional Example 4.7 Conditional BivariateBivariate

)),(()|(

thatshow

),(:

22

212

112222

12121

2212

1211

2

12

2

1

xNxxf

Nx

x

4747

Example 4.1 Density FunctionExample 4.1 Density Function

]}2

[)1(2

1exp{

)1(2

1),(

22

22

11

1112

2

22

22

2

11

112

12

2122211

21

xx

xx

xxf

4848

Example 4.7Example 4.7

)1(2/))(/(

21211

22121

2221211

2122211

22

222

2

2222

12112

1211

22

22

11

1112

2

22

22

2

11

11212

21211

222221211

)1(2

1

)(/),()|(

2)1(2)1(2

)(

2

1)(

)1(2

1

]2[)1(2

1

xxe

xfxxfxxf

xxx

xxxx

Page 9: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

9

4949

Result 4.7Result 4.7

ondistributi theof percentile

)th (100upper thedenotes )( where

,1 is )}()()'(:{

ellipsoid solid theinsidey probabilit The (b)

: )()'( (a)

0),,(:

2p

2p

2p

1

2p

1

μXΣμXx

μXΣμX

ΣΣμX pN

5050

22 DistributionDistribution

)2/on with distributi (Gamma

0,0

0,)2/(2

1)(

(d.f.) freedom of degrees :,

)1,0(:);,(:

,),,(:),,(:

2

22/12/22/2

2

1

2

2

2222

2111

2

ef

x

NX

ZNX

NXNX

i i

ii

i

iii

5151

22 Distribution CurvesDistribution Curves

5252

Table 3, AppendixTable 3, Appendix

5353

Proof of Result 4.7 (a)Proof of Result 4.7 (a)

2

1

21

2

2

1

1

1

'

'

2'2

1'1

1

2

2

1

'

'

1

1

:)()'(),1,0(:

/

/

/

'

)',0(:)(,)(1

)()'(1

)()'(

p

p

iii

p

pp

iiii

pp

p

p

ii

p

ii

i

ii

p

i i

ZNZ

NZ

μXΣμX

Ieee

ee

e

e

e

AAΣ

AAΣμXAZμXe

μXeeμXμXΣμX

5454

Proof of Result 4.7 (b)Proof of Result 4.7 (b)

1)()()'(

by ddistribute

variablerandom new )()'(

),(:by ellipsoid the toassigned

yprobabilit theis)()'(

21

2

1

21

p

p

p

P

N

cP

μXΣμX

μXΣμX

ΣμX

μXΣμX

Page 10: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

10

5555

Result 4.8Result 4.8

n

jj

n

jj

nn

n

j

n

jjjjpnn

jp

n

b

c

bbb

ccNccc

N

1

2

1

2

2122112

1 1

222111

j

21

)()'(

)'()(

matrix covariancewith

normaljoint are and and

)(,:

),(:

tindependenmutually :,,,

ΣΣcb

ΣcbΣ

VVXXXV

ΣμXXXV

ΣμX

XXX

5656

Proof of Result 4.8Proof of Result 4.8

ΣAAΣ

ΣΣAAΣ

AAΣAμV

VAX

III

IIIA

Σ00

0Σ0

00Σ

Σ

μ

μ

μ

μ

ΣμXXXX

X

X

X

X

X

)(:' rmsofdiagonalteoff

)(,)(:' of termsdiagonalblock

)',(:,

,

),(:],,,['

1

1

2

1

2

22

1

21

21

2

1

''2

'1

n

jjj

n

jj

n

jj

pn

n

n

npn

bc

bc

Nbbb

ccc

N

5757

Example 4.8 Linear CombinationsExample 4.8 Linear Combinations

312123

22

21

321

1

34321

2223'

3'

)','(:'

201

011

113

,

1

1

3

),( identicalt independen:,,,

aaaaaaa

aaa

N

N

Σaa

μa

ΣaaμaXa

Σμ

ΣμXXXX

5858

Example 4.8 Linear CombinationsExample 4.8 Linear Combinations

0ΣVVXXXXV

ΣΣΣ

μμμ

ΣμXXXXV

V

V

VV

4

12143212

4

1

2

4

1

343211

)(),Cov(,3

201

011

113

)(

2

2

6

2

),(:2

1

2

1

2

1

2

1

1

1

11

jjj

jj

jjj

bc

c

c

N

OutlineIntroductionThe Multivariate Normal Density and Its PropertiesSampling from a Multivariate Normal Distribution and Maximum Likelihood EstimationThe Sampling Distribution of and SLarge-Sample Behavior of and S

X

X

QuestionsWhat are random samples?What is the likelihood?How to estimate the mean and variance of a univariate normal distribution by the maximum-likelihood technique? (point estimates)What is the multivariate normal likelihood?

Page 11: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

11

Questions

What is the trace of a matrix?How to compute the quadratic form using the trace of the matrix? (Result 4.9)How to express the trace of a matrix by its eigenvalues? (Result 4.9)Result 4.10

Questions

How to estimate the mean and covariance matrix of a multivariate normal vector? (Result 4.11)What is the invariance property of the maximum likelihood estimates?What is the sufficient statistics?

6363

MaximumMaximum--likelihood Estimationlikelihood Estimation

6464

Multivariate Normal LikelihoodMultivariate Normal Likelihood

estimateslikelihoodMaximum

estimation likelihood Maximum

likelihood

,,, fixedfor and offunction a as

2

1

,,,

ofdensity Joint

),( from sample random:,,,

21

2/)()'(

2/2/21

21

1

1

n

nnpn

pn

n

jjj

e

N

xxxΣμ

ΣXXX

ΣμXXX

μxΣμx

6565

Trace of a MatrixTrace of a Matrix

k

i

k

jij

k

iiiij

kk

a

cc

caa

1 1

2

1

1)(

)'tr()e(

)tr()tr()d(

)tr()tr()c(

)tr()tr()(tr)b(

)tr()tr()a(

scalar a is ;)tr(

AA

AABB

BAAB

BABA

AA

AA

6666

Result 4.9Result 4.9

k

ii

k

kk

1

)tr( b)(

)'tr()'tr(' (a)

vector1 :

matrix symetric :

A

AxxAxxAxx

x

A

Page 12: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

12

6767

Proof of Result 4.9 (a)Proof of Result 4.9 (a)

)'tr()')tr(()'tr(

)tr()tr(

)tr(

)tr()tr(

matrix :,matrix :

1 11 1

1 1

AxxxAxAxx

BCCB

BC

CBBC

CB

m

i

k

jjiij

k

j

m

iijji

m

i

k

jjiij

cbbc

cb

mkkm

6868

Proof of Result 4.9 (b)Proof of Result 4.9 (b)

k

ii

kdiag

1

21

)tr()'tr(

)'tr()tr(

,,,

','

ΛΛPP

ΛPPA

Λ

IPPΛPPA

6969

Likelihood FunctionLikelihood Function

2/''tr

2/2/

1

1

1

1

1

1

1

1

1

2

1),(

''

'

'

'tr'

n

jjj n

nnp

n

jjj

n

jjj

n

jjj

n

jjj

n

jjj

eL

n

μxμxxxxxΣ

ΣΣμ

μxμxxxxx

μxxxμxxx

μxμx

μxμxΣμxΣμx

7070

Result 4.10Result 4.10

Σ

B

b

ebe

b

pp

pp

bppb

bb

2/1for only holding

equality with , definite positive allfor

211

scalar positive :

matrix definite positivesymmetric:

)(

2/)tr( 1

7171

Proof of Result 4.10Proof of Result 4.10

IBΣBBΣ

BBΣ

ΣBBΣBΣ

BΣB

BΣBBBΣBΣ

bb

ebebebe

eee

bppb

bbbbb

p

i

bipb

bp

ii

b

p

ii

p

i

i

i

p

ii

2such that )2/1( when attained is boundupper

211

2at 2 maximum a has

11

/,tr

positive all , of seigenvalue:

trtrtr

2/112/1

2/)tr(2/

1

2/2/

12/)tr(

11

1i

1

2/112/1

2/112/12/12/111

1

11

7272

Result 4.11 Maximum Likelihood Result 4.11 Maximum Likelihood Estimators of Estimators of and and

SXXXXΣ

ΣμXXX

n

n

n

N

j

n

jj

pn

1'

ˆ

),( from sample random:,,,

1

21

Page 13: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

13

7373

Proof of Result 4.11Proof of Result 4.11

SxxxxΣ

ΣΣμ

μxΣμxxxxxΣ

Σμ

xxxxΣ

n

n

n

eL

n

L

n

jjj

nnp

n

jjj

n

jjj

1'

2

1),ˆ(

ˆ

'2

1'tr

2

1

:),( ofExponent

1

'tr

2/2/

1

1

1

1

1

7474

Invariance PropertyInvariance Property

)Var( of MLE1

ˆ

ˆ of MLE

ˆˆ'ˆ' of MLE

:Examples

)( ofestimator likelihood maximum:)ˆ(

ofestimator likelihood maximum:ˆ

1

2

11

i

n

jijiii

iiii

XXXn

hh

μΣμμΣμ

7575

Sufficient StatisticsSufficient Statistics

population normal

temultivaria a of statistics sufficient are and

and through ,,,

nsobservatio ofset wholeon the depends

2

1

,,,

ofdensity Joint

21

2/''tr

2/2/

21

1

1

Sx

Sxxxx

Σ

XXX

μxμxxxxxΣ

n

n

nnp

n

n

jjj

e

OutlineIntroductionThe Multivariate Normal Density and Its PropertiesSampling from a Multivariate Normal Distribution and Maximum Likelihood EstimationThe Sampling Distribution of and SLarge-Sample Behavior of and S

X

X

Questions

What is the distribution of sample mean for multivariate normal samples?What is the distribution of sample covariance matrix for multivariate normal samples?

7878

Distribution of Sample MeanDistribution of Sample Mean

4.8Result cf.

)/,(:

:case teMultivaria

)/,(:

1 :case Univariate

),( from sample random:,,,

2

21

nN

nNX

p

N

p

pn

ΣμX

ΣμXXX

Page 14: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

14

7979

Sampling Distribution of Sampling Distribution of SS

)|)1((on distributiWishart :)1(

),(:

:case teMultivaria

),0(:,)1(

:)1(

1 :case Univariate

),( from sample random:,,,

11

'

2

1

222

21

2

1

22

21

ΣSZZS

Σ0XXZ

ΣμXXX

n-Wn-

N

NZZsn

XXsn

p

N

n

n

jjj

pjj

j

n

jj

n

n

jj

pn

8080

WishartWishart DistributionDistribution

)'|'(:')|(:

)|(:

)|(:),|(:

:Properties

definite positive:

21

2)|(

2121

2211

1

2/)1(4/)1(2/)1(

2/tr2/)2(

1

21

21

1

CCΣCACCACΣAA

ΣAAAA

ΣAAΣAA

A

Σ

AΣA

mm

mm

mm

p

i

nppnp

pn

n

WW

W

WW

in

ew

OutlineIntroductionThe Multivariate Normal Density and Its PropertiesSampling from a Multivariate Normal Distribution and Maximum Likelihood EstimationThe Sampling Distribution of and SLarge-Sample Behavior of and S

X

X

Questions

What is the univariate central limit theorem?What is the law of large numbers, for the univariate case and the multivariate case? (Result 4.12)What is the multivariate central limit theorem? (Result 4.13)

Questions

What is the limit distribution for the square of statistical distance?

8484

UnivariateUnivariate Central Limit TheoremCentral Limit Theorem

size sample largefor normalnearly also is

on distributi normalnearly a has

ty variabilisame the

ely approximat having variablesrandom :

,,, causes

tindependen ofnumber largeaby determined:

21

21

X

X

VVVX

V

VVV

X

n

i

n

Page 15: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

15

8585

Result 4.12 Law of Large NumbersResult 4.12 Law of Large Numbers

nε-μY-εP

n

YYYY

YE

YYY

n

i

n

as 1][

0, prescribedany for is,That

to

yprobabilitin converges

)( with normal) benot (may population

a from nsobservatiot independen:,.,

21

21

8686

Result 4.12 Multivariate CasesResult 4.12 Multivariate Cases

ΣS

μX

μX

XXX

y toprobabilitin converges

y toprobabilitin converges

)(mean with

normal) temultivaria benot (may population

from nsobservatiot independen ,,,

i

21

E

n

8787

Result 4.13 Central Limit TheoremResult 4.13 Central Limit Theorem

normal)nearly is populationparent the

when moderatefor ion approximat good (quite

size sample largefor

),(ely approximat is )(

covariance

finite and mean with population

a fromn observatiot independen:,,, 21

n

pn

Nn p

n

Σ0μX

Σ

μ

XXX

8888

Limit Distribution of Limit Distribution of Statistical DistanceStatistical Distance

n-p

n

n

n-p

n

pnn

N

p

p

p

largefor

ely approximat:)()'(

large is

y whenprobabilithigh with toclose

largefor

ely approximat :)()'(

size sample largefor )1

,(nearly :

21

21

μXSμX

ΣS

μXΣμX

ΣμX

Outline

Assessing the Assumption of NormalityDetecting Outliers and Cleaning DataTransformations to Near Normality

Questions

How to determine if the samples follow a normal distribution?What is the Q-Q plot? Why is it valid?How to measure the straightness in a Q-Q plot?

Page 16: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

16

Questions

How to use Result 4.7 to check if the samples are taken from a multivariate normal population?What is the chi-square plot? How to use it?

9292

QQ--QQ PlotPlot

ondistributi normal a

from are data theif since linear,

ely approximat are they if see to,Plot

2/1

2

1][

/)2

1(/: ofPortion

20 e.g., large, tomoderate anddistict be Let

on nsobservatio:

)()(

)()(

2/)(

)(

)(

)()2()1(

2)(

jj

jj

zq

j

j

j

in

qx

xq

n

jdzeqZP

njnjxx

nnx

Xxxx

j

9393

Example 4.9Example 4.9

9494

Example 4.9Example 4.9

Histogram of Histogram of MidTermMidTerm Scores of Scores of Students of This Course in 2006 Students of This Course in 2006

9595

QQ--Q Plot of Q Plot of MidTermMidTerm Scores of Scores of Students of This Course in 2006Students of This Course in 2006

9696n = 33, rQ = 0.946652

Page 17: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

17

9797

Example 4.10 Radiation Data of Example 4.10 Radiation Data of ClosedClosed--Door Microwave OvenDoor Microwave Oven

9898

Measurement of StraightnessMeasurement of Straightness

4.2 Tablein value

eappropriat thebelow falls if cesignifican

of levelat hypothesisnormality Reject the

)()(

))((

1

2)(

1

2)(

1)()(

Q

n

jj

n

jj

n

jjj

Q

r

qqxx

qqxx

r

9999

Table 4.2 Table 4.2 QQ--QQ Plot Correlation Plot Correlation Coefficient TestCoefficient Test

100100

Example 4.11Example 4.11

hypothesisnormality reject not Do9351.0

10.0,10

994.0,795.8

472.8,584.8

0,770.0 4.9, Example from dataFor

10

1

2)(

10

1

2)()(

10

1)(

Q

Qj

j

jjj

jj

r

n

rq

xxqxx

qx

101101

Evaluating Evaluating BivariateBivariate NormalityNormality

)5.0()()'(such that all

bygiven ellipse in the lie

nsobservatio sample of 50%roughly ifCheck

22

1 xxSxxx

102102

Example 4.12Example 4.12

Page 18: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

18

103103

Example 4.12Example 4.12

conclusion reach the tosmall toois )10( size sample However,

normality bivariatereject 50%an Greater th

39.1 with are nsobservatio 10 ofout Seven

39.134.4]4224,974.126[',

102927

309.62

128831.0003293.0

003293.0000184.0'

2927

309.62

39.1)5.0(

1030.1476.255

76.25520.005,10,

2927

309.62

2

221

5

2

1

2

12

22

5

n

d

dxx

x

x

x

xd

Sx

104104

ChiChi--Square PlotSquare Plot

)/)2

1(1()/)

2

1(( that Note

1 slope havingorigin the

throughlinestraight a resemble shouldplot The

)),/)2

1((( allGraph

freedom of degrees on with distributi square-chi

theof quantile /)2

1(100:)/)

2

1((

distance squared Order the

distance squared :)()(

2,

2)(,

,

2)(

2)2(

2)1(

12

njnjq

dnjq

p

njnjq

ddd

xxSxxd

ppc

jpc

pc

n

105105

Example 4.13 ChiExample 4.13 Chi--Square Plot Square Plot for Example 4.12for Example 4.12

106106

Example 4.13 ChiExample 4.13 Chi--Square Plot Square Plot for Example 4.12for Example 4.12

107107

ChiChi--Square Plot for Computer Square Plot for Computer Generated 4Generated 4--variate Normal Datavariate Normal Data Outline

Assessing the Assumption of NormalityDetecting Outliers and Cleaning DataTransformations to Near Normality

Page 19: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

19

109109

Steps for Detecting OutliersSteps for Detecting OutliersMake a dot plot for each variableMake a dot plot for each variableMake a scatter plot for each pair of Make a scatter plot for each pair of variablesvariablesCalculate the standardized values. Calculate the standardized values. Examine them for large or small Examine them for large or small valuesvaluesCalculated the squared statistical Calculated the squared statistical distance. Examine for unusually distance. Examine for unusually large values. In chilarge values. In chi--square plot, square plot, these would be points farthest from these would be points farthest from the origin.the origin.

Outline

Assessing the Assumption of NormalityDetecting Outliers and Cleaning DataTransformations to Near Normality

QuestionsHow to transform sample counts, proportion, and correlation, such that the new variable is more near to a univariate normal distribution?What is Box and Cox’s univariatetransformation?How to extend Box and Cox’s transformation to the multivariate case?

Questions

How to deal with data including large negative values?

113113

Helpful Transformation to Helpful Transformation to Near NormalityNear Normality

Original ScaleOriginal Scale Transformed ScaleTransformed Scale

Counts, Counts, yy

Proportions, Proportions,

Correlations, Correlations, rr

p

pp

ˆ1

ˆlog

2

1)ˆlogit(

y

r

rrz

1

1log

2

1)( sFisher'

114114

Box and CoxBox and Cox’’s s UnivariateUnivariate TransformationsTransformations

n

jj

n

jj

n

jj

xn

x

xxxn

n

λ

x

xx

1

)(_____

)(

11

2_____)()(

)(

1

ln11

ln2

)(

maximize to Choose

0,ln

0,1

Page 20: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

20

115115

Example 4.16 (Example 4.16 () vs. ) vs.

116116

Example 4.16 Example 4.16 QQ--QQ PlotPlot

117117

Transforming Multivariate Transforming Multivariate ObservationsObservations

p

jpjjj

n

jjkk

n

jjk

n

jkjk

k

p

pxxx

xn

x

xxxn

n

p

ˆ1

ˆ1

ˆ1

'

1

ln11

ln2

maximize toSelect

sticscharacteri the

for ssformationpower tran:,,,

)ˆ(

2

)ˆ(2

1

)ˆ(1)ˆ(

1

)(_____

)(

11

2_____)()(

k

21

21

λx

118118

More Elaborate ApproachMore Elaborate Approach

p

jpjjj

p

k

n

jjkkp

p

p

pxxx

xn

p

111'

from computed is )(

ln1)(ln2

,,,

maximize to',,,Select

sticscharacteri the

for ssformationpower tran:,,,

)(

2

)(2

1

)(1)(

1 121

21

21

21

λx

λS

λS

λ

119119

Example 4.17 Original Example 4.17 Original QQ--QQ Plot for Plot for OpenOpen--Door DataDoor Data

120120

Example 4.17 Example 4.17 QQ--QQ Plot of Plot of Transformed OpenTransformed Open--Door DataDoor Data

Page 21: Multivariate Normal Distributioncc.ee.ntu.edu.tw/.../MultivariateNormalDistribution6s.pdf · 2012. 9. 19. · Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical

21

121121

Example 4.17 Contour Plot ofExample 4.17 Contour Plot offor Both Radiation Datafor Both Radiation Data),( 21

122122

Transform for Data Including Transform for Data Including Large Negative ValuesLarge Negative Values

2,01log

2,0)2/(11

0,0)1log(

0,0/11

2)(

xx

xx

xx

xx

x