The Intersection of Statistics with Geometry, Information, and Riemannian Manifolds by Roger Bilisoly, PhD, Department of Mathematical Sciences, Central Connecticut State University, September 2nd, 2011


Page 1: The Intersection of Statistics with Geometry, Information, and Riemannian Manifolds

8/4/2019 The Intersection of Statistics with Geometry, Information, and Riemannian Manifolds

http://slidepdf.com/reader/full/the-intersection-of-statistics-with-geometry-information-and-riemannian-manifolds 1/52

The Intersection of Statistics with Geometry, Information, and Riemannian Manifolds

by Roger Bilisoly, PhD

Department of Mathematical Sciences

Central Connecticut State University
September 2nd, 2011


Overview of Talk

1. Geometric view of 1st year statistics: summary statistics, t-test, two sample t-test, correlation

2. Statistical inference, Fisher information, exponential families of PDFs, and Riemannian manifolds

3. Sampling data from manifolds


Geometry of 1st Year Topics: Summary Statistics

• Idea: represent data as vectors in Rn, where n = sample size. Many basic statistical ideas can then be cast geometrically. Let's look at summary statistics:

• The Mean
 – Let data = {x1, x2, x3, …, xn}.
 – We want to summarize this data set by one number, say x.
 – So model = {x, x, x, …, x}.
 – Use the L2 norm to minimize the distance between data and model.
 – It is enough to minimize distance^2.
 – distance^2 = (data – model)^2 = Sum[(x – xi)^2, {i, 1, n}]
 – Solution: x = mean of the data (just solve D[distance^2, x] == 0), which is an example of least squares.

• The Standard Deviation
 – Use the L2 norm again on data and model.
 – Variance = SSE/df = Sum[(xi – m)^2]/(n – 1) = distance^2/(n – 1), which is an example of a sum of squares (SS).
 – We'll see more about SS in ANOVA and regression.
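The slide's computations are in Mathematica; the same least-squares idea can be checked numerically in Python. A minimal sketch (the data values below are made up for illustration, not from the talk):

```python
import numpy as np

# Illustrative data (an assumption for this sketch; any numbers work)
data = np.array([2.0, 3.0, 5.0, 7.0])

# Model {x, x, ..., x}: squared L2 distance between data and model, as a function of x
def sq_distance(x):
    return np.sum((data - x) ** 2)

# Minimizing distance^2 over a fine grid recovers the sample mean
grid = np.linspace(data.min(), data.max(), 100001)
x_best = grid[np.argmin([sq_distance(x) for x in grid])]

print(x_best, data.mean())  # the L2 minimizer is the mean
```

Solving D[distance^2, x] == 0 analytically gives the same answer; the grid search just makes the "mean = closest constant vector" picture concrete.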


Geometry of 1st Year Topics: Summary Statistics (Cont.)

• The Median
 – Let data = {x1, x2, x3, …, xn}.
 – Let model = {x, x, x, …, x}.
 – Use the L1 norm to minimize the distance between data and model.
 – The solution: x = median (n odd).
 – What about n even? Any x between the two middle order statistics minimizes the L1 distance, so the usual convention is to take their average.
• The Midrange
 – As above, but use the L∞ norm to minimize the distance between data and model.
 – The solution: x = midrange = (mini xi + maxi xi)/2.

• L2 has an inner product
 – This allows the definition of cos(θ).
 – As seen on the last slide, L2 arises in sums of squares and least squares, which makes it ubiquitous in statistics.
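A quick numerical check of the median and midrange claims, sketched in Python rather than the deck's Mathematica (the data values are an assumption for illustration):

```python
import numpy as np

# Illustrative data, n odd so the L1 minimizer (median) is unique
data = np.array([1.0, 2.0, 4.0, 8.0, 9.0])
grid = np.linspace(0.0, 10.0, 100001)

# L1 distance as a function of the constant model value x -> minimized at the median
l1 = [np.sum(np.abs(data - x)) for x in grid]
# L-infinity distance -> minimized at the midrange
linf = [np.max(np.abs(data - x)) for x in grid]

x_l1 = grid[np.argmin(l1)]      # ~4.0, the median
x_linf = grid[np.argmin(linf)]  # ~5.0, the midrange (1 + 9)/2
print(x_l1, x_linf)
```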


1 Sample T-Test

• A subsample of 10 human body temperatures was taken from: http://www.amstat.org/publications/jse/v4n2/datasets.shoemaker.html

• H0: μ = 98.6 °F vs. H1: μ ≠ 98.6 °F
 – temps = {97.2, 98.7, 98.4, 98.2, 98.4, 98.2, 97.1, 96.7, 98.3, 98.0}
 – model = {98.6, 98.6, 98.6, 98.6, 98.6, 98.6, 98.6, 98.6, 98.6, 98.6}
 – Let u = {1,1,1,1,1,1,1,1,1,1}/Sqrt[10], so |u| = 1.
 – Projection of temps onto model = (temps.u)u = {1,1,1,1,1,1,1,1,1,1}*97.92
 – error = temps – (temps.u)u = {-0.72, 0.78, 0.48, 0.28, 0.48, 0.28, -0.82, -1.22, 0.38, 0.08}
 – s = Sqrt[SSE/df] = Sqrt[Fold[Plus, 0, error^2]/9] = 0.671317

The TTEST Procedure

Statistics
              Lower CL           Upper CL    Lower CL             Upper CL
Variable   N      Mean    Mean       Mean     Std Dev    Std Dev   Std Dev    Std Err
temps     10     97.44   97.92       98.4      0.4618     0.6713    1.2256     0.2123

T-Tests
Variable   DF   t Value   Pr > |t|
temps       9     -3.20     0.0108
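The SAS numbers above follow directly from the projection picture. A sketch in Python (standing in for the talk's Mathematica/SAS):

```python
import math

temps = [97.2, 98.7, 98.4, 98.2, 98.4, 98.2, 97.1, 96.7, 98.3, 98.0]
mu0 = 98.6
n = len(temps)

# u = (1,...,1)/sqrt(n); the projection of temps onto u is the constant vector of the mean
mean = sum(temps) / n
error = [t - mean for t in temps]   # temps - (temps.u)u
sse = sum(e * e for e in error)
s = math.sqrt(sse / (n - 1))        # 0.671317, the Std Dev in the SAS output

t_stat = (mean - mu0) / (s / math.sqrt(n))
print(mean, s, t_stat)  # 97.92, 0.671317, -3.20 as in the SAS output
```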


Picture of T-Test

Vectors labeled in the figure: temps, (temps.u)u, and error.

Angle between temps and u: ArcCos[temps.u/Norm[temps]]/Pi*180 = 0.373°

• Human body temp data
 – temps = {97.2, 98.7, 98.4, 98.2, 98.4, 98.2, 97.1, 96.7, 98.3, 98.0}
 – model = {98.6, 98.6, 98.6, 98.6, 98.6, 98.6, 98.6, 98.6, 98.6, 98.6}
 – Projection of temps on model = (temps.u)u = {1,1,1,1,1,1,1,1,1,1}*97.92
 – error = temps – (temps.u)u = {-0.72, 0.78, 0.48, 0.28, 0.48, 0.28, -0.82, -1.22, 0.38, 0.08}
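The ArcCos line above is a one-liner to verify; in Python:

```python
import math

temps = [97.2, 98.7, 98.4, 98.2, 98.4, 98.2, 97.1, 96.7, 98.3, 98.0]
n = len(temps)

# cos of the angle between temps and the unit vector u = (1,...,1)/sqrt(n)
temps_dot_u = sum(temps) / math.sqrt(n)
norm_temps = math.sqrt(sum(t * t for t in temps))
angle_deg = math.degrees(math.acos(temps_dot_u / norm_temps))
print(angle_deg)  # about 0.373 degrees
```

The tiny angle reflects the fact that body temperatures are nearly constant relative to their overall magnitude.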


Geometry of Comparing Body Temperatures by Gender with ANOVA (Long Example)

• Sample of 5 male and 5 female body temperatures taken from: http://www.amstat.org/publications/jse/v4n2/datasets.shoemaker.html
• The standard parametric test is the 2-sample t-test.
• data = {98.7, 99.4, 98.8, 98.2, 99.2, 98.0, 99.1, 98.8, 98.0, 98.2}
 – The first 5 values are females, the last 5 are males.
• ANOVA table from SAS given below (PROC ANOVA).
• Geometric aspect here: the first two rows of the table correspond to orthogonal subspaces of R10.

The ANOVA Procedure
Dependent Variable: temp

                            Sum of
Source             DF      Squares    Mean Square    F Value    Pr > F
Model               1   0.48400000     0.48400000       2.06    0.1892
Error               8   1.88000000     0.23500000
Corrected Total     9   2.36400000

(The Error sum of squares in the table is the SSE.)
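The whole ANOVA table can be rebuilt from the projection picture. A Python sketch (in place of the deck's SAS/Mathematica):

```python
data = [98.7, 99.4, 98.8, 98.2, 99.2, 98.0, 99.1, 98.8, 98.0, 98.2]
females, males = data[:5], data[5:]

grand = sum(data) / 10
f_mean = sum(females) / 5
m_mean = sum(males) / 5

# Squared length of the projection onto the gender-contrast subspace
ss_model = 5 * (f_mean - grand) ** 2 + 5 * (m_mean - grand) ** 2  # 0.484
# Squared length of the residual (error subspace) component
sse = (sum((x - f_mean) ** 2 for x in females)
       + sum((x - m_mean) ** 2 for x in males))                   # 1.88

f_value = (ss_model / 1) / (sse / 8)
print(ss_model, sse, f_value)  # 0.484, 1.88, F about 2.06 as in PROC ANOVA
```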


A Basis for R10

The first row of basis represents the overall mean.
The second row of basis represents the contrast of males vs. females.
The last 8 rows span the error subspace.

Next we'll perform Gram-Schmidt via Mathematica's Orthogonalize[].


An Orthonormal Basis for R10 

As before, the first row of orthonormal represents the overall mean.

The second row of orthonormal represents the contrast of males vs. females.

The last 8 rows span the error subspace.
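The same orthonormalization can be sketched in Python, with QR factorization playing the role of Orthogonalize[] (the 8 fill vectors below are an arbitrary choice completing the spanning set, an assumption of this sketch):

```python
import numpy as np

# Rows: overall-mean direction, male-vs-female contrast, then 8 fill vectors
basis = np.vstack([
    np.ones(10),
    np.array([1, 1, 1, 1, 1, -1, -1, -1, -1, -1], dtype=float),
    np.eye(10)[1:9],   # e2..e9; together with the first two rows these span R^10
])

# QR orthonormalizes the columns in order, Gram-Schmidt style
Q, _ = np.linalg.qr(basis.T)

data = np.array([98.7, 99.4, 98.8, 98.2, 99.2, 98.0, 99.1, 98.8, 98.0, 98.2])
# Squared length of the projection onto the (orthonormalized) contrast direction
ss_model = float(np.dot(data, Q[:, 1]) ** 2)
print(ss_model)  # 0.484, the Model sum of squares from the ANOVA table
```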


Projecting into the following three subspaces: mean, model, and error

Sample of 5 male and 5 female body temp’s taken from:

http://www.amstat.org/publications/jse/v4n2/datasets.shoemaker.html  

We have modeled:

yij = μ + genderi + errorij

where R10 is broken into three orthogonal subspaces, one for each term in the model.

y11 = 98.7 = 98.64 + 0.22 + (0.015 + 0.225 – 0.05 – 0.35), and so forth.


The ANOVA Table Represents an Orthogonal Decomposition

The ANOVA Procedure
Dependent Variable: temp

                            Sum of
Source             DF      Squares    Mean Square    F Value    Pr > F
Model               1   0.48400000     0.48400000       2.06    0.1892
Error               8   1.88000000     0.23500000
Corrected Total     9   2.36400000

ssModel = sum of the squares of the data projected into the model subspace.
Since the projection = {0.22, 0.22, 0.22, 0.22, 0.22, -0.22, -0.22, -0.22, -0.22, -0.22},
ssModel = 0.22^2 + 0.22^2 + … + (-0.22)^2 = 10*0.0484 = 0.484.


The Schematic Picture for the Gender Body Temperatures ANOVA Model

See Figure 6.3 on page 105 of Saville and Wood (1991) for another example of ANOVA. It also has examples of more complex experimental designs, regression, and ANCOVA.

Remember that:
data = {98.7, 99.4, 98.8, 98.2, 99.2, 98.0, 99.1, 98.8, 98.0, 98.2}
mean = {1,1,1,1,1,1,1,1,1,1}/Sqrt[10]
model = {1,1,1,1,1,-1,-1,-1,-1,-1}/Sqrt[10]

Vectors labeled in the schematic: data, (data.mean)mean, (data.model)model, and the error vector.


Regression of Log(Distance) of Planets vs. Rank

• Bode's Law
 – Claim that log(distance) can be predicted by rank.
 – Includes Ceres (now a dwarf planet).
 – Excludes Pluto (now a dwarf planet).


Least Squares Solution = Minimizing Distance(data, model) in R9

Figure labels: Log(Dist) data, x values, Model, Objective function.

Regression model: Log(distance) = a0 + a1 x
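The fit can be sketched in Python. The distances below are standard semi-major axes in AU for the nine bodies (Mercury through Neptune, with Ceres and without Pluto) — an assumption of this sketch, since the slide's numeric table is a figure:

```python
import math

# Assumed semi-major axes in AU (Mercury..Mars, Ceres, Jupiter..Neptune)
dist = [0.39, 0.72, 1.00, 1.52, 2.77, 5.20, 9.54, 19.2, 30.1]
rank = list(range(1, 10))
y = [math.log10(d) for d in dist]

n = len(y)
xbar = sum(rank) / n
ybar = sum(y) / n
sxy = sum((x - xbar) * (yi - ybar) for x, yi in zip(rank, y))
sxx = sum((x - xbar) ** 2 for x in rank)
syy = sum((yi - ybar) ** 2 for yi in y)

a1 = sxy / sxx             # slope in Log(distance) = a0 + a1 x
a0 = ybar - a1 * xbar
r = sxy / math.sqrt(sxx * syy)
print(a0, a1, r)           # r comes out near the slide's 0.9966
```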


2nd Point of View: Project Data onto Model in R9

Figure labels: Model direction, Log(Dist) data, Projection, Prediction.


Correlation of LogDist vs. Rank (Bode's Law Data)

R = 0.9966


Correlation = cos(θ) in R9

r = Σ (xi – x̄)(yi – ȳ) / ((n – 1) sx sy) = (1/(n – 1)) Σ [(xi – x̄)/sx] [(yi – ȳ)/sy]

Note that the RHS is almost the mean of the products of the z-scores of the data.

One can show that, as vectors in Rn, the z-score vectors have norm Sqrt[df] = Sqrt[n – 1], so r = the dot product in Rn of the corresponding unit vectors = cos(θ):

r = Σ [(xi – x̄)/(Sqrt[n – 1] sx)] [(yi – ȳ)/(Sqrt[n – 1] sy)] = cos(θ)


Apply r = cos(θ) to Bode's Law

Pearson correlation = (dot product of the z-scores)/(n – 1) = cos(θ)

θ = angle between the LogDist and rank vectors
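The r = cos(θ) identity is easy to verify numerically; this Python sketch reuses the same assumed Bode's-law distances as before:

```python
import math

# Assumed semi-major axes in AU, as in the earlier regression sketch
dist = [0.39, 0.72, 1.00, 1.52, 2.77, 5.20, 9.54, 19.2, 30.1]
x = list(range(1, 10))
y = [math.log10(d) for d in dist]

def unit_zscores(v):
    m = sum(v) / len(v)
    s = math.sqrt(sum((vi - m) ** 2 for vi in v) / (len(v) - 1))
    # z-score vector has norm sqrt(n-1); divide by it to get a unit vector
    return [(vi - m) / (s * math.sqrt(len(v) - 1)) for vi in v]

ux, uy = unit_zscores(x), unit_zscores(y)
cos_theta = sum(a * b for a, b in zip(ux, uy))  # this IS the Pearson r
theta_deg = math.degrees(math.acos(cos_theta))
print(cos_theta, theta_deg)  # r near 0.9966, a small angle of a few degrees
```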


And so forth … 

• The above examples don't exhaust the connections between statistics and geometry.
 – For example, PCA rotates point clouds in n-dimensional space, where n = number of variables.
• But let us consider another geometric point of view …


Statistical Inference, Fisher Information, Exponential Families of PDFs, and Riemannian Manifolds

• It turns out that one can make deeper connections between statistics and geometry.
 – Probability distributions can be represented by differentiable manifolds.
 – Some types of data can be represented by differentiable manifolds.
• However, this connection requires some theory.
• Let's start with an overview to see how this connection is going to be made.


The Big Picture

• Hypothesis testing projects a data set onto a parameterized family of probability distribution functions (pdf's), say F.
 – Call the best fit f.
• A subset of F, say F0, is called the null hypothesis, H0.
• If f is close enough to F0, then we don't reject H0; otherwise we reject H0.

The above outline raises some questions:
 What families F are good to use?
 What subsets F0 are interesting?
 How do we project data onto F?
 How do we measure the distance between two pdf's?


1. Statistical Estimation

• One approach is to believe that the data are sampled from a pdf that is parameterized.
 – Example: one can assume the data come from a normal distribution with mean μ and variance σ^2.
• Statistics are functions of the sample that estimate parameters.
• What properties should a summary statistic have?
 – There is no agreement on this.
• Consider unbiasedness:
 – For a normal population, E(sample mean) = μ.
 – However, E(s) = c4(n)σ, where c4(n) ≈ 1 – 1/(4n).*
 – Let Xi ~ Binomial(n, p); then there is no unbiased estimator of 1/p.

*See http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation 
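The exact bias factor, from the Wikipedia article cited above, is c4(n) = Sqrt[2/(n – 1)] Γ(n/2)/Γ((n – 1)/2); a quick Python check of how close the 1 – 1/(4n) approximation is:

```python
import math

# Exact bias factor for the sample standard deviation of normal data:
# E(s) = c4(n) * sigma (formula from the cited Wikipedia article)
def c4(n):
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

for n in (5, 10, 100):
    print(n, c4(n), 1 - 1 / (4 * n))  # the approximation is close even for small n
```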


Maximum Likelihood (ML) Methods

• Here we'll focus on ML, which has nice mathematical properties and also performs well in practice.
• Assume that independent data come from the pdf f(x|θ).
 – θ = vector of parameters
 – xi ~ f(x|θ)
 – x = (x1, x2, …, xn) ~ ∏ f(xi|θ) = joint pdf
• Likelihood = L(θ|x) = ∏ f(xi|θ)
 – The likelihood is the joint pdf considered as a function of θ.
 – L(θ|x) can be viewed as a manifold, but more on this later.
• ML finds the θmax that maximizes L(θ|x).
 – θmax is the most likely value to produce the observed data.
• θmax is our estimate of the population parameters θ.


Example: Likelihood for Normal Human Body Temp.

Figure: the normal likelihood for the body temperature data (same data as before). The MLE is at the peak of the likelihood curve.


MLE: μ=97.92, σ=0.637 

For the normal distribution, the MLEs for μ and σ are the sample mean and the population standard deviation (i.e., the version that divides by n rather than n – 1).

Solving this numerically involves the geometry of hill-climbing. Gradient ascent iterates toward the maximum by moving in the direction of the gradient.

In practice, gradient ascent can converge slowly. Here a second-order method would converge much more quickly. See http://en.wikipedia.org/wiki/Gradient_descent


Good Properties of MLEs

• Below, θ is a vector of parameters for a family of pdf's.
• Theorem. If θmax is the MLE of θ, then g(θmax) is an MLE of g(θ).
• Theorem. If f(x|θ) is "well behaved," then θmax is consistent. That is, θmax → θtrue almost surely.
• Theorem. If f(x|θ) is "well behaved," then θmax is asymptotically normal: Sqrt[n](θmax – θtrue) → N(0, I^-1) as n → ∞, where I is the Fisher information matrix.
 – Because this normal distribution has mean 0, θmax is asymptotically unbiased.

See Wikipedia.org for more info: http://en.wikipedia.org/wiki/Maximum_likelihood 


2. Fisher Information

• The last slide said that Sqrt[n](θmax – θtrue) → N(0, I^-1), where I is the Fisher information. So what is I? Why is it "informative"? For simplicity, let θ be a scalar:

I(θ) = Var(∂/∂θ log f(X|θ))
     = E[(∂/∂θ log f(X|θ))^2]
     = –E[∂^2/∂θ^2 log f(X|θ)]
     = –∫ (∂^2/∂θ^2 log f(x|θ)) f(x|θ) dx


• As defined in information theory, H(X) is the entropy of the random variable X:

H(X) = –∫ f(x|θ) log f(x|θ) dx

• So I(X) = –∂^2/∂θ^2 H(X).
• However, why is H(X) a reasonable measure of information?
 – Let's consider the case of a discrete random variable, X.
 – Then H(X) = –Σi pi log(pi).
 – Information theory uses an axiomatic approach and shows that entropy gives a unique functional form. See Section 6 of Shannon (1948).
 – Statistical mechanics derives H(X) from counting microstates using the multinomial coefficient and Stirling's approximation.
   • log(C) = log(N!/∏i (Npi)!) ≈ –N Σi pi log(pi). See the section on entropy in Chapter 8 of Ambegaokar (1996).
   • Applying this to real physics (e.g., gases) requires ergodic theory.


Cramér-Rao Lower Bound (CRLB): Another Way to View I as Informative

• Let γ(X) be an unbiased estimator of g(θ).
• Let αi = ∂/∂θi E(γ(X)).
• Let Iij = –E(∂^2/∂θi ∂θj log f(x|θ)), where x = the sample vector.
 – The matrix I is called the Fisher information matrix.
• Then Var(γ(X)) ≥ αᵀ I^-1 α is the CRLB.
• Variance measures uncertainty: the lower the variance, the better the accuracy of the estimator γ(X) of g(θ).
 – The closer Var(γ(X)) is to the CRLB, the better γ(X) is.


Example of the CRLB: Exponential Distribution

• Let Xi ~ Exponential(θ), with pdf f(x|θ) = e^(-x/θ)/θ, and suppose we want to estimate θ.
• log f(x|θ) = -x/θ – log(θ)
• ∂/∂θ log f(x|θ) = x/θ^2 – 1/θ = (x – θ)/θ^2
• I1 = E[((x – θ)/θ^2)^2] = (1/θ^4) E[(x – θ)^2] = θ^2/θ^4 = 1/θ^2, which is the information contained in 1 data point. So I = nI1 = n/θ^2.
• Hence the CRLB = I^-1 = θ^2/n.
• Since the sample mean is unbiased with variance θ^2/n, it attains the bound and is the uniformly minimum variance unbiased estimator (UMVUE) of θ.
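The identity I1 = 1/θ^2 can be checked by Monte Carlo; a Python sketch (sample size and seed are arbitrary choices for this illustration):

```python
import random
import statistics

# Monte Carlo check that I_1 = E[((x - theta)/theta^2)^2] = 1/theta^2
# for the Exponential(theta) density e^(-x/theta)/theta (mean theta)
random.seed(42)
theta = 2.0
N = 200_000
scores = [((random.expovariate(1 / theta) - theta) / theta**2) ** 2
          for _ in range(N)]
i1_hat = statistics.fmean(scores)
print(i1_hat, 1 / theta**2)  # the estimate should be close to 0.25
```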


3. The Exponential Family of PDFs

• All the usual parametric distributions seen in a first-year stats course come from the exponential family of pdf's.
• This family has special properties, including special geometric properties.
• Define f(x|θ) = exp(Σi ηi(θ) Ti(x) – A(θ)) g(x), where the sum is over the parameters.
• Example: Binomial distribution
 – For n fixed, the Binomial(n, p) pmf is nCx p^x (1 – p)^(n – x)
   = nCx (p/(1 – p))^x (1 – p)^n = exp(x log(p/(1 – p)) + n log(1 – p)) nCx
 – So set T(x) = x, η(p) = log(p/(1 – p)), A(p) = -n log(1 – p), g(x) = nCx.
• Example: Normal distribution
 – We can write the iid normal pdf as a 2-parameter exponential family:
 – exp((μ/σ^2) Σ xi – (1/(2σ^2)) Σ xi^2 – nμ^2/(2σ^2)) · (1/(Sqrt[2π] σ))^n
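The binomial algebra above is easy to verify numerically; a Python check (n and p are arbitrary values chosen for the test):

```python
import math

# Check that nCx p^x (1-p)^(n-x) equals the exponential-family form
# exp(x*log(p/(1-p)) + n*log(1-p)) * nCx for every x
n, p = 12, 0.3
for x in range(n + 1):
    pmf = math.comb(n, x) * p**x * (1 - p) ** (n - x)
    expfam = (math.exp(x * math.log(p / (1 - p)) + n * math.log(1 - p))
              * math.comb(n, x))
    assert math.isclose(pmf, expfam)
print("binomial pmf matches its exponential-family form")
```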


Sufficient Statistics

• The natural parameterization of an exponential family pdf is:
 – f(x|η) = exp(Σi ηi Ti(x) – A(η)) g(x)   (*)
 – That is, the ηi themselves are the parameters.
• For a parametric family of pdf's parameterized by θ, a statistic T(X) is sufficient for θ if the distribution of X given T(X) is independent of θ.
 – The intuition is that T(X) is as informative as X with respect to the parameters θ.
• Theorem. For X with a full-rank exponential family pdf with natural parameterization (*), T = (T1, T2, …, Ts) is sufficient.
 – Σ xi is sufficient for p for iid binomial data.
 – Σ xi and Σ xi^2 are sufficient for (μ, σ^2) for iid normal data.


4. Combining 1, 2, and 3 to Create Riemannian Manifolds

• In part 1, we considered the likelihood L(θ|x) = ∏ f(xi|θ) as a function to optimize over θ given the data {xi}.
• In part 2, we introduced the Fisher information matrix with entries Iij = –E(∂^2/∂θi ∂θj log f(x|θ)).
• In part 3, we noted that the exponential family of pdf's is important: f(x|θ) = exp(Σi ηi(θ) Ti(x) – A(θ)) g(x).
• In this part, we consider likelihoods of exponential family pdf's as Riemannian manifolds, with the metric given by the Fisher information matrix.
• Why Riemannian manifolds?
 – The metric allows us to measure distances intrinsically.
 – Hypothesis testing boils down to measuring distances to competing hypotheses.
 – All the theory has been worked out.


Example: Extrinsic View of the Unit Sphere

Here we map (-π/2, π/2) × (0, 2π) → R3 using spherical coordinates: r = 1, latitude, and longitude.

The surface is 2-dimensional: (latitude, longitude), but the graphical representation is 3-dimensional.

The intrinsic point of view means imagining a 2-dimensional space where distances and areas are not Euclidean.


Example: Intrinsic View of the Unit Sphere

Sphere = [0, 2π] × [-π/2, π/2] = a closed rectangle.

Let v = latitude and u = longitude.

Metric for the sphere: ds^2 = E du^2 + 2F du dv + G dv^2 = Cos[v]^2 du^2 + dv^2.

Area element for the sphere: dA = Sqrt[EG – F^2] du dv = Cos[v] du dv.

What is the surface area of the sphere?
∫ (u = 0 to 2π) ∫ (v = -π/2 to π/2) cos(v) dv du = 4π

What is the length of the equator? The equator is defined by v = 0, so dv = 0:
∫ (u = 0 to 2π) du = 2π
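The area integral is a nice sanity check to run numerically; a midpoint-rule sketch in Python:

```python
import math

# Midpoint-rule integration of the area element dA = cos(v) du dv
# over u in [0, 2*pi), v in [-pi/2, pi/2]; the answer should be 4*pi
nu, nv = 400, 400
du, dv = 2 * math.pi / nu, math.pi / nv
area = 0.0
for i in range(nu):
    for j in range(nv):
        v = -math.pi / 2 + (j + 0.5) * dv
        area += math.cos(v) * du * dv
print(area, 4 * math.pi)
```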


Example of a Statistical Manifold

• The normal distribution's likelihood is
 – exp((μ/σ^2) Σ xi – (1/(2σ^2)) Σ xi^2 – nμ^2/(2σ^2)) · (1/(Sqrt[2π] σ))^n
• The manifold is a half-plane:
 – (μ, σ^2) ∈ (-∞, ∞) × (0, ∞)
• Metric = Fisher information (n = 1):
 – I(μ, σ) = (1/σ^2) [[1, 0], [0, 2]]
• Let var = σ^2. Then:
 – ds^2 = dμ^2/var + d(var)^2/(2 var^2)
 – This has constant Riemannian curvature.
 – See Skovgaard (1984).
• According to p. 553 of Gray (1993), this is called the generalized Poincaré metric on the upper half-plane.


5. Example 1, p. 1193, of Efron (1975): An Application of Statistical Curvature

Example 1 of Efron (1975): bivariate normal with a fixed coefficient of variation.

Let X ~ N(η, I), where η = (θ, (c/2)θ^2). Efron defines the statistical curvature γθ as a function of a matrix M built from the first two derivatives of η with respect to θ; the explicit formula, a function of c and θ, is given on p. 1193 of Efron (1975).


Sampling Data from Riemannian Manifolds

• Not all data is Euclidean!
• For instance, directional data isn't Euclidean.
 – Angles are cyclic: 350° + 20° = 10°.
• A small range of angles can be treated with the usual techniques.
 – Example: geological faults can be strongly aligned.
• A large range of angles requires a different approach.
 – Example: wind directions vary greatly over a large area.


http://earthquake.usgs.gov/earthquakes/recenteqscanv/FaultMaps/122-38.html 

Accessed 8/19/2011.

http://www.iwindsurf.com/windandwhere.iws?regionID=193&regionProductID=2&timeoffset=1 

Accessed 8/19/2011.

Two examples of directional data.


N. Fisher and Circular Data

• The following is from section 2.3.1 of Fisher (1993).
• Suppose data = {θ1, θ2, …, θn}, where the angles are measured in radians.
• He defines the mean direction, θmean, as follows:
 – Let C = Σ cos(θi) and S = Σ sin(θi).
 – Let R^2 = C^2 + S^2.
 – Let cos(θmean) = C/R and sin(θmean) = S/R.
 – Then θmean = arctan(S/C) for S > 0, C > 0, or
 – θmean = arctan(S/C) + π for C < 0, or
 – θmean = arctan(S/C) + 2π for S < 0, C > 0.
• The above computes the direction of the vector sum of the unit vectors (cos(θi), sin(θi)).
 – This makes this approach extrinsic.


Intrinsic Approach as stated in Krakowski (2002)

From section 4.1, page 47, of Krzysztof Krakowski's dissertation from the Department of Mathematics and Statistics at The University of Western Australia:

Let M be a complete Riemannian manifold with distance function d. Let Q be a finite sample of points from M.

Definition 4.1.1. Let ΨQ : M → R be the function

ΨQ(x) = (1/|Q|) Σ (over q in Q) d(x, q)^2

Define the Riemannian variance σ^2(Q) as the global minimum of ΨQ. The Riemannian mean is the set of points at which ΨQ = σ^2(Q). It can be shown that the Riemannian mean is always non-empty.


Application to the Movements of Sea Stars

• Let's compare Fisher's θmean to the Riemannian mean using the following angle data (in degrees):
 – {0, 1, 3, 3, 8, 13, 16, 18, 30, 31, 43, 45, 147, 298, 329, 332, 335, 340, 350, 354, 356, 357}
• θmean = 3.1°
• ListPlot of {Cos[θ], Sin[θ]} below.

Values are from data set B.11 on p. 245 of Fisher (1993).
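Fisher's mean direction for this data set is a few lines of Python:

```python
import math

# Sea-star movement directions in degrees (data set B.11 of Fisher (1993))
angles_deg = [0, 1, 3, 3, 8, 13, 16, 18, 30, 31, 43, 45, 147,
              298, 329, 332, 335, 340, 350, 354, 356, 357]

C = sum(math.cos(math.radians(a)) for a in angles_deg)
S = sum(math.sin(math.radians(a)) for a in angles_deg)
# atan2 handles Fisher's three quadrant cases in one call
theta_mean = math.degrees(math.atan2(S, C)) % 360
print(theta_mean)  # about 3.1 degrees, as on the slide
```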


Riemannian Mean

• For the Riemannian mean we minimize the following (angles in radians):
 – f[x_, data_] := Mean[Map[If[# > Pi, (2 Pi - #)^2, #^2] &, Abs[data - x]]]

The minimum is at 7.6° (about 0.134 radians), compared to 3.1° for the extrinsic mean.
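The Mathematica function above translates directly to Python; a grid-search sketch (the grid resolution is an arbitrary choice):

```python
import math

angles_deg = [0, 1, 3, 3, 8, 13, 16, 18, 30, 31, 43, 45, 147,
              298, 329, 332, 335, 340, 350, 354, 356, 357]
data = [math.radians(a) for a in angles_deg]

# Mean squared geodesic (arc-length) distance on the circle,
# mirroring the slide's f[x_, data_]
def mean_sq_dist(x):
    total = 0.0
    for t in data:
        d = abs(t - x) % (2 * math.pi)
        d = min(d, 2 * math.pi - d)
        total += d * d
    return total / len(data)

# Grid search for the minimizer (the Riemannian/intrinsic mean)
grid = [k * 2 * math.pi / 72000 for k in range(72000)]  # 0.005-degree steps
best = min(grid, key=mean_sq_dist)
print(math.degrees(best))  # near 7.6-7.7 degrees, vs. 3.1 for the extrinsic mean
```

The gap between the two means comes from the outlier at 147°, which the intrinsic distance penalizes along the circle rather than through it.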

Page 44: The Intersection of Statistics with Geometry, Information, and Riemannian Manifolds


Last Example: Applying Manifolds to Multivariate Data

• Suppose there is a multivariate data set, say body measurements of a sample of n men (see next slide).

• Goal: Analyze typical body shape.

• The vector of averages of each variable need not be a typical body shape.

• Let’s apply a Riemannian manifold approach based on

the work of David Kendall.

 – See Kendall (1984) and Kendall (1989).

Page 45: The Intersection of Statistics with Geometry, Information, and Riemannian Manifolds


Transforming the Data

• Since analyzing overall shape is the goal, for each male’s data vector, translate so that the mean is zero and rescale so that the variance is 1.
 – (x_i1, x_i2, …, x_ik) → (z_i1, z_i2, …, z_ik) s.t. Σ_j z_ij = 0 and Σ_j z_ij² = 1, for all i.
• Let z_i = (z_i1, z_i2, …, z_ik); then {z_i} is a sample of n points on the intersection of the (k-1)-sphere S^(k-1) and the plane Σ_j z_ij = 0, which is a (k-2)-sphere, S^(k-2).
• S^(k-2) is a manifold, and so we can say a typical body is given by the Riemannian mean, with variability measured by the Riemannian standard deviation.
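The transformation can be sketched in Python (not from the talk; the sample measurement vector is made up for illustration):

```python
import numpy as np

def to_preshape(x):
    """Center a measurement vector to mean 0, then scale it to unit norm.
    The result lies on the intersection of the unit sphere with the
    plane sum(z) = 0."""
    z = np.asarray(x, dtype=float)
    z = z - z.mean()             # sum_j z_j = 0
    z = z / np.linalg.norm(z)    # sum_j z_j^2 = 1
    return z

# Hypothetical neck/abdomen/thigh measurements (cm) for one subject.
z = to_preshape([38.0, 92.0, 59.0])
print(z.sum())           # ~0
print((z ** 2).sum())    # ~1
```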

Page 46: The Intersection of Statistics with Geometry, Information, and Riemannian Manifolds


We Take a 3-Variable Subset of the Full Data Set

We restrict ourselves to just the neck, abdomen and thigh measurements.

Page 47: The Intersection of Statistics with Geometry, Information, and Riemannian Manifolds


Plot of Transformed Variables vs. Plots of Original Variables

Since we started with k = 3 variables, the result is a circle (S^1), so we can use circular data methods for an approximate answer.

Two 3D scatterplots of the original measurements.

Page 48: The Intersection of Statistics with Geometry, Information, and Riemannian Manifolds


References

• Ambegaokar (1996). Reasoning about Luck: Probability and Its Uses in Physics, Cambridge.
• Bradley Efron (1975). “Defining the Curvature of a Statistical Problem (with Applications to Second Order Efficiency),” The Annals of Statistics, 3, 1189-1242.
• N. I. Fisher (1993). Statistical Analysis of Circular Data, Cambridge.
• Alfred Gray (1993). Modern Differential Geometry of Curves and Surfaces, CRC.
• David Kendall (1984). “Shape Manifolds, Procrustean Metrics, and Complex Projective Spaces,” Bulletin of the London Mathematical Society, 16, 81-121.
• David Kendall (1989). “A Survey of the Statistical Theory of Shape,” Statistical Science, 4, 87-120.
• Krzysztof Krakowski (2002). Geometrical Methods of Inference. Dissertation, Department of Mathematics and Statistics, The University of Western Australia.
• E. L. Lehmann (1983). Theory of Point Estimation, Wiley.
• David Saville and Graham Wood (1991). Statistical Methods: The Geometric Approach, Springer.
• C. E. Shannon (1948). “A Mathematical Theory of Communication,” The Bell System Technical Journal, 27, 379-423.
• Lene Skovgaard (1984). “A Riemannian Geometry of the Multivariate Normal Model,” Scandinavian Journal of Statistics, 11, 211-223.

Page 49: The Intersection of Statistics with Geometry, Information, and Riemannian Manifolds


References: Web Pages

• Normal human body temperature data
 – http://www.amstat.org/publications/jse/v4n2/datasets.shoemaker.html
 – http://en.wikipedia.org/wiki/Normal_human_body_temperature
• Bias of the sample standard deviation
 – http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation
• Gradient ascent
 – http://en.wikipedia.org/wiki/Gradient_descent
• Properties of MLEs
 – http://en.wikipedia.org/wiki/Maximum_likelihood
• Examples of directional data
 – http://earthquake.usgs.gov/earthquakes/recenteqscanv/FaultMaps/122-38.html
 – http://www.iwindsurf.com/windandwhere.iws?regionID=193&regionProductID=2&timeoffset=1
• Examples of circular distributions
 – http://en.wikipedia.org/wiki/Wrapped_normal_distribution
 – http://en.wikipedia.org/wiki/Von_Mises_distribution

Page 50: The Intersection of Statistics with Geometry, Information, and Riemannian Manifolds


Differential Geometry and Statistics … 

• Interest in this connection is old:
 – Dates back to a paper by Rao (1945), “Information and accuracy attainable in the estimation of statistical parameters.”
• Research took off in the late 1970s and 1980s:
 – Shun-ichi Amari and Ole Barndorff-Nielsen were active.
• Books have been written:
 – Differential Geometry in Statistical Inference by Amari (1987)
 – Differential Geometry and Statistics by Murray and Rice (1993)
• The field, called information geometry, is active now:
 – Methods of Information Geometry by Amari and Nagaoka (2007)
• BUT the results don’t seem to be applied to practical problems.

Page 51: The Intersection of Statistics with Geometry, Information, and Riemannian Manifolds


What Distribution is Best for Circular Data?

• Fisher (1993) suggests several distributions, but the following two are the most popular:

 – Wrapped normal distribution

 – von Mises distribution

Wrapped normal density:

 f(θ; μ, σ) = (1/(σ√(2π))) Σ_{k=−∞}^{∞} exp(−(θ − μ + 2πk)² / (2σ²))

von Mises density:

 f(θ; μ, κ) = exp(κ cos(θ − μ)) / (2π I₀(κ)), where I₀ is the modified Bessel function of order 0.

See http://en.wikipedia.org/wiki/Wrapped_normal_distribution and
http://en.wikipedia.org/wiki/Von_Mises_distribution.

However, neither of these has all the properties one expects from working with the normal distribution with Euclidean data.
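A quick numerical check of the two densities above (a sketch, not from the talk): each should integrate to 1 over [0, 2π). The wrapped-normal sum is truncated at |k| ≤ 10, which is ample for σ = 1, and I₀ is computed from its standard power series.

```python
import math

def wrapped_normal_pdf(theta, mu=0.0, sigma=1.0, kmax=10):
    """Wrapped normal density: sum the N(mu, sigma^2) density over 2*pi shifts."""
    c = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return c * sum(math.exp(-(theta - mu + 2 * math.pi * k) ** 2 / (2 * sigma ** 2))
                   for k in range(-kmax, kmax + 1))

def von_mises_pdf(theta, mu=0.0, kappa=1.0):
    """von Mises density with normalizing constant 2*pi*I_0(kappa)."""
    i0 = sum((kappa / 2) ** (2 * m) / math.factorial(m) ** 2 for m in range(30))
    return math.exp(kappa * math.cos(theta - mu)) / (2 * math.pi * i0)

# Riemann-sum check that each density integrates to ~1 over [0, 2*pi).
n = 10000
h = 2 * math.pi / n
wn_total = sum(wrapped_normal_pdf(i * h) for i in range(n)) * h
vm_total = sum(von_mises_pdf(i * h) for i in range(n)) * h
print(round(wn_total, 4), round(vm_total, 4))  # both ~1.0
```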

Page 52: The Intersection of Statistics with Geometry, Information, and Riemannian Manifolds


Computation of Fisher Information for the Normal Distribution