Vector geometry: A visual tool for statistics Sylvain Chartier Laboratory for Computational Neurodynamics and Cognition Centre for Neural Dynamics



Vector geometry

• How a single vector (arrow) can represent the concepts of
– Mean, variance (standard deviation), normalization and standardization.
• How two vectors can represent the concepts of
– Correlation and regression.


A datum

Figure: a single datum at (16), with the origin at (0).


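Two data

Principle of independence of observations: perfectly opposed (perpendicular) directions.

Figure: two axes starting at the origin (0), one carrying the datum (16) and the other the datum (8).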


Two data

Figure: the data (16) and (8) combine into the single point (16, 8); the origin is (0, 0).


Two data

Figure: the point (16, 8) and the origin (0, 0).


Starting point: Zero

Figure: a vector from the starting point (0, 0) to the finish point (16, 8).


Starting point: Mean

Figure: a vector from the starting point $(\bar{x}, \bar{x})$ to the finish point $x = (x_1, x_2)$.


Starting point: Mean

Figure: a vector from the starting point (12, 12), the mean, to the finish point x = (16, 8).


One group


Many groups


Degrees of freedom


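We remove the effect of the mean: we center the data

$$x - \bar{x} = (16, 8) - (12, 12) = (4, -4)$$

Figure: the vector from the starting point (mean) (12, 12) to the finish point x = (16, 8), redrawn from the origin (0, 0) as $x - \bar{x} = (4, -4)$.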
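As a minimal sketch of the centering step, the snippet below subtracts the mean from each datum of the slide's example x = (16, 8). The helper name `center` is an illustrative choice, not something taken from the slides.

```python
def center(data):
    """Subtract the mean from every datum so the arrow starts at the origin."""
    mean = sum(data) / len(data)
    return [value - mean for value in data]


x = [16, 8]            # the two data from the slide
centered = center(x)   # the mean is 12, so the result is (4, -4)
print(centered)        # [4.0, -4.0]
```

The centered vector keeps the same direction and length as the arrow drawn from (12, 12) to (16, 8); only its starting point moves.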


We remove the effect of the mean (many groups)


What is the real dimensionality?



We remove the effect of the mean

• If we have two data, we get one dimension.
• If we have three data, we get two dimensions.
  ...
• If we have n data, we get n-1 dimensions.

In other words, degrees of freedom represent the true dimensionality of the data.
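One way to see why centering removes a dimension: the deviations from the mean always sum to zero, so once n-1 of them are known the last one is fixed. The sketch below illustrates this with a made-up sample of four values; the data and the helper names are assumptions for illustration only.

```python
def center(data):
    """Return the deviations of each datum from the mean."""
    mean = sum(data) / len(data)
    return [value - mean for value in data]


def last_deviation(known_deviations):
    """Deviations sum to zero, so the last one is determined by the others."""
    return -sum(known_deviations)


data = [16, 8, 12, 4]               # hypothetical sample, n = 4
deviations = center(data)           # [6.0, -2.0, 2.0, -6.0]
free = deviations[:-1]              # n - 1 = 3 values that can vary freely
recovered = last_deviation(free)    # -6.0, identical to deviations[-1]
print(len(free), recovered)
```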


Variance


What is the difference between these three groups (composed of two data each)?

(1.5, -1.5)   (-0.5, 0.5)   (2.5, -2.5)

Length (distance): the higher the variability, the longer the vector.
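To make the comparison concrete, the following sketch computes the length of each of the three centered vectors listed on the slide. The function name `length` is an illustrative choice.

```python
import math


def length(vector):
    """Euclidean length of a vector (distance from its starting point)."""
    return math.sqrt(sum(component ** 2 for component in vector))


groups = [(1.5, -1.5), (-0.5, 0.5), (2.5, -2.5)]   # the three groups from the slide
lengths = [length(group) for group in groups]

for group, size in zip(groups, lengths):
    print(group, round(size, 3))
```

The group with the largest deviations, (2.5, -2.5), gives the longest vector, and (-0.5, 0.5) the shortest.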


What is the difference between these three groups?

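How do we measure the length (distance)? Pythagoras: the hypotenuse of a triangle.

$$? = \sqrt{4^2 + 3^2} = \sqrt{25} = 5$$

Figure: a right triangle with sides 4 and 3; its hypotenuse (?) ends at the point (4, 3) and has length 5.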


What is the difference between these three groups?

Therefore, the point (4,3) is at a distance of 5 from its starting point.

Figure: the vector to the point (4, 3), of length 5.

$$5^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \text{sum of squares} = \text{variance} \times (n-1)$$
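The identity between the squared length, the sum of squares, and variance × (n-1) can be checked numerically. This sketch reuses the earlier example x = (16, 8) rather than the point (4, 3), and relies on `statistics.variance` from the Python standard library, which divides by n-1.

```python
import math
import statistics

x = [16, 8]                                         # example data from the earlier slides
mean = statistics.mean(x)                           # 12
deviations = [value - mean for value in x]          # (4, -4)

sum_of_squares = sum(d ** 2 for d in deviations)    # 4^2 + (-4)^2 = 32
length = math.sqrt(sum_of_squares)                  # length of the centered vector
variance_times_df = statistics.variance(x) * (len(x) - 1)

print(sum_of_squares, length ** 2, variance_times_df)
```

All three printed quantities agree up to floating-point rounding.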


What is the difference between these three groups?

What is the length of these three lines?

A) One dimension, (1): length $\sqrt{1} = 1$
B) Two dimensions, (1, 1): length $\sqrt{2}$
C) Three dimensions, (1, 1, 1): length $\sqrt{3}$

The dimensionality inflates the variability.

To obtain a measure that takes the dimensionality into account, what do we need to do?
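A short sketch of the inflation effect: a vector whose components are all 1 gets longer as dimensions are added, even though each component stays the same. The function name is illustrative.

```python
import math


def length(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(component ** 2 for component in vector))


# A vector of ones in 1, 2 and 3 dimensions (cases A, B and C)
lengths = {n: length([1] * n) for n in (1, 2, 3)}

for n, size in lengths.items():
    print(n, round(size, 4))   # the length equals sqrt(n)
```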


What is the difference between these three groups?

• We divide the (squared) length of the data set by its true dimensionality

$$\text{Variance} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

= (quadratic) distance (from the mean) corrected by the (true) dimensionality of the data.
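The sketch below contrasts the raw sum of squares with the variance on a made-up pattern (10, 14) repeated one, two and four times. The data are hypothetical; the point is only that the squared length keeps growing with n, while dividing by n-1 keeps the result on a comparable scale.

```python
def sum_of_squares(data):
    """Squared length of the centered data vector."""
    mean = sum(data) / len(data)
    return sum((value - mean) ** 2 for value in data)


def variance(data):
    """Squared length divided by the true dimensionality, n - 1."""
    return sum_of_squares(data) / (len(data) - 1)


pattern = [10, 14]   # hypothetical data with the same spread at every size
for repeats in (1, 2, 4):
    data = pattern * repeats
    print(len(data), sum_of_squares(data), round(variance(data), 3))
```

With this pattern the sum of squares doubles each time the sample doubles, whereas the variance stays between 4 and 8.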


Normalization and standardization


Normalization vs Standardization

• To normalize is to bring a given centered vector x (arrow, mean = 0) to a length of 1.

• Normalization: $z = x / \|x\|$ (x divided by its length), so that $z^T z = 1$

• Standardization: $z_x = x / SD$, so that $z_x^T z_x = n-1$

=> $z_x = z \sqrt{n-1}$
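The relations between the two rescalings can be checked on a small example. The sketch below assumes a made-up sample of four values, centers it first, and takes SD to be the sample standard deviation (length divided by the square root of n-1).

```python
import math


def center(data):
    """Return the deviations of each datum from the mean."""
    mean = sum(data) / len(data)
    return [value - mean for value in data]


def dot(a, b):
    """Inner product a^T b."""
    return sum(p * q for p, q in zip(a, b))


x = center([16, 8, 12, 4])          # hypothetical data, centered so that mean = 0
n = len(x)
norm = math.sqrt(dot(x, x))         # length of x
sd = norm / math.sqrt(n - 1)        # sample standard deviation

z = [value / norm for value in x]   # normalization: length 1
zx = [value / sd for value in x]    # standardization: SD 1

print(dot(z, z))                    # approximately 1
print(dot(zx, zx))                  # approximately n - 1 = 3
```

Each component of `zx` equals the matching component of `z` multiplied by the square root of n-1.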


Two groups


One group of three participants


Two groups of three participants


Two groups of three participants

• They can be represented by a plane


• This is true whatever the number of participants


Correlation and regression


Relation between two vectors

• If two groups (u and v) have the same data, then the two vectors are superposed on each other.
• As the two vectors differ more from each other, the angle between them increases.


Relation between two vectors

• If the angle reaches 90 degrees, then they share nothing in common.


Relation between two vectors

• The cosine of the angle is the coefficient of correlation

$$\cos\theta = \frac{u^T v}{\|u\|\,\|v\|} = \frac{\sum_{i=1}^{n} u_i v_i}{\|u\|\,\|v\|} = \frac{\mathrm{cov}_{uv}}{s_u s_v} = r_{uv}$$
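As a rough numerical check, the sketch below computes the cosine of the angle between two centered vectors and compares it with the correlation obtained from the covariance and standard deviations. The two groups are made-up data, and the equality assumes both vectors have been centered first.

```python
import math


def center(data):
    """Return the deviations of each datum from the mean."""
    mean = sum(data) / len(data)
    return [value - mean for value in data]


def dot(a, b):
    """Inner product a^T b."""
    return sum(p * q for p, q in zip(a, b))


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def pearson_r(a, b):
    """Correlation as covariance divided by the two standard deviations."""
    n = len(a)
    ca, cb = center(a), center(b)
    cov = dot(ca, cb) / (n - 1)
    s_a = math.sqrt(dot(ca, ca) / (n - 1))
    s_b = math.sqrt(dot(cb, cb) / (n - 1))
    return cov / (s_a * s_b)


group_u = [1, 2, 3, 4, 5]   # hypothetical data
group_v = [2, 1, 4, 3, 5]   # hypothetical data

cos_theta = cosine(center(group_u), center(group_v))
r = pearson_r(group_u, group_v)
angle = math.degrees(math.acos(cos_theta))
print(cos_theta, r, angle)
```

For these data both routes give 0.8, an angle of roughly 37 degrees. Two centered vectors at 90 degrees, such as (-1, 0, 1) and (1, -2, 1), give a cosine of 0.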


Relation between two vectors

• Regression: $\hat{v} = b_0 + b_1 u$
– The shortest distance is the one that crosses the vector u at 90°

Figure: the vector v, its projection b on u, and the error e, which meets u at 90°.


Relation between two vectors

• Regression: The formula to obtain the regression coefficients can be obtained directly from the geometry
– By substitution, we can isolate the b1 coefficient.

$$
\begin{aligned}
u^T e &= 0 \\
u^T (v - b_1 u) &= 0 \\
u^T v - b_1 u^T u &= 0 \\
u^T v &= b_1 u^T u \\
(u^T u)^{-1} u^T v &= b_1 (u^T u)^{-1} u^T u \\
(u^T u)^{-1} u^T v &= b_1
\end{aligned}
$$

If we generalize to any situation (multiple, multivariate):

$$B = (X^T X)^{-1} X^T Y$$
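For a single predictor, the geometric result reduces to a ratio of two inner products. The sketch below, using made-up centered data, computes b1 that way and checks that the resulting error vector e is perpendicular to u. It covers only this one-predictor case, not the general matrix formula.

```python
def center(data):
    """Return the deviations of each datum from the mean."""
    mean = sum(data) / len(data)
    return [value - mean for value in data]


def dot(a, b):
    """Inner product a^T b."""
    return sum(p * q for p, q in zip(a, b))


u = center([1, 2, 3, 4, 5])   # hypothetical predictor, centered
v = center([2, 1, 4, 3, 5])   # hypothetical criterion, centered

# With one predictor, (u^T u)^{-1} u^T v is a ratio of two scalars
b1 = dot(u, v) / dot(u, u)

# Error vector: what remains of v after removing its projection on u
e = [v_i - b1 * u_i for u_i, v_i in zip(u, v)]

print(b1)          # 0.8 for these data
print(dot(u, e))   # approximately 0: e crosses u at 90 degrees
```

Because both vectors are centered, the slope here equals the correlation scaled by the ratio of the two standard deviations, which are equal for these data.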