CE-725: Statistical Pattern Recognition
Sharif University of Technology, Spring 2013
M. Soleymani

Review (Probability & Linear Algebra)


Page 1: Review (probability & linear algebra)

CE-725: Statistical Pattern Recognition, Sharif University of Technology, Spring 2013

M. Soleymani

Review (Probability & Linear Algebra)

Page 2: Review (probability & linear algebra)

Outline
• Axioms of probability theory
• Conditional probability, joint probability, Bayes rule
• Discrete and continuous random variables
• Probability mass and density functions
• Expected value, variance, standard deviation
• Expectation for two variables: covariance, correlation
• Some probability distributions: Gaussian distribution
• Linear algebra

Page 3: Review (probability & linear algebra)

Basic Probability Elements

Sample space (Ω): the set of all possible outcomes (or worlds). Outcomes are assumed to be mutually exclusive.

An event is a set of outcomes (i.e., a subset of Ω).

A random variable is a function defined over the sample space, e.g., Gender: Ω → {male, female}, Height: Ω → ℝ

Page 4: Review (probability & linear algebra)

Probability Space
A probability space is defined as a triple (Ω, F, P):

A sample space Ω ≠ ∅ that contains the set of all possible outcomes (outcomes are also called states of nature)

A set F whose elements are called events. The events are subsets of Ω; F should be a “Borel field”.

P represents the probability measure that assigns probabilities to events.

Page 5: Review (probability & linear algebra)

Probability and Logic
Probability theory can be viewed as a generalization of propositional logic: probabilistic logic.

a is a propositional logic sentence. The agent's belief in a is no longer restricted to true, false, or unknown: P(a) can range from 0 to 1.

Page 6: Review (probability & linear algebra)

Probability Axioms (Kolmogorov)

Axioms define a reasonable theory of uncertainty

Kolmogorov's probability axioms (propositional notation):
• P(a) ≥ 0 (∀a)
• If a is a tautology, P(a) = 1
• P(a ∨ b) = P(a) + P(b) − P(a ∧ b) (∀a, b)

[Figure: Venn diagram of events a and b, with a ∧ b as their intersection]

Page 7: Review (probability & linear algebra)

Random Variables
Random variables are the variables of probability theory.

Domain of random variables: Boolean, discrete, or continuous.

Probability distribution: the function describing the probabilities of the possible values of a random variable, e.g., P(X = x_1) = p_1, P(X = x_2) = p_2, …

Page 8: Review (probability & linear algebra)

Random Variables
A random variable is a function that maps every outcome in Ω to a real (or complex) number. This lets us define probabilities as functions on (real) numbers, and hence expectation, variance, …

[Figure: coin-toss example mapping Tail and Head to the points 0 and 1 on the real line]

Page 9: Review (probability & linear algebra)

Probabilistic Inference

Joint probability distribution: specifies the probability of every atomic event; every other probability can be found from it (by summing over atomic events).

Prior and posterior probabilities: belief in the absence or presence of evidence.

Bayes' rule: used when it is difficult to compute P(a | b) but we have information about P(b | a).

Independence: new evidence may be irrelevant.

Page 10: Review (probability & linear algebra)

Joint Probability Distribution

Probability of all combinations of the values for a set of random variables.

If two or more random variables are considered together, they can be described in terms of their joint probability.

Example: joint probability of features P(X_1, X_2, …, X_d)

Page 11: Review (probability & linear algebra)

Prior and Posterior Probabilities
Prior or unconditional probabilities of propositions: belief in the absence of any other evidence, e.g., P(Cavity = true) = 0.14, P(Weather = sunny) = 0.6

Posterior or conditional probabilities: belief in the presence of evidence. P(a | b) is the probability of a given that b is true, e.g., P(Cavity = true | Toothache = true)

Page 12: Review (probability & linear algebra)

Conditional Probability

P(a | b) = P(a ∧ b) / P(b), if P(b) > 0

Product rule (an alternative formulation): P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)

P(· | b) obeys the same rules as probabilities, e.g., P(a | b) + P(¬a | b) = 1

Page 13: Review (probability & linear algebra)

Conditional Probability
For statistically dependent variables, knowing the value of one variable may allow us to better estimate the other.

All probabilities are in effect conditional probabilities, e.g., P(a) = P(a | Ω).

Conditioning on b renormalizes the probabilities of the events that occur jointly with b.

[Figure: conditioning maps the sample space Ω to the event b, renormalizing P(· | b)]

Page 14: Review (probability & linear algebra)

Conditional Probability: Example
Rolling a fair die. a: the outcome is an even number; b: the outcome is a prime number.

P(a | b) = P(a ∧ b) / P(b) = (1/6) / (1/2) = 1/3

Page 15: Review (probability & linear algebra)

Chain rule

The chain rule is derived by successive application of the product rule:
P(x_1, …, x_n) = P(x_1, …, x_{n−1}) P(x_n | x_1, …, x_{n−1})
= P(x_1, …, x_{n−2}) P(x_{n−1} | x_1, …, x_{n−2}) P(x_n | x_1, …, x_{n−1})
= … = P(x_1) ∏_{i=2..n} P(x_i | x_1, …, x_{i−1})

Page 16: Review (probability & linear algebra)

Law of Total Probability
If B_1, …, B_n are mutually exclusive events that partition Ω and A is an event (subset of Ω):

P(A) = P(A | B_1) P(B_1) + P(A | B_2) P(B_2) + ⋯ + P(A | B_n) P(B_n)

Page 17: Review (probability & linear algebra)

Independence
Propositions a and b are independent iff

P(a | b) = P(a) ⟺ P(a, b) = P(a) P(b)

Knowing b tells us nothing about a (and vice versa).

Page 18: Review (probability & linear algebra)

Bayes Rule

Bayes rule, obtained from the product rule P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a):

In some problems, it may be difficult to compute P(a | b) directly, yet we might have information about P(b | a):

P(a | b) = P(b | a) P(a) / P(b)

Page 19: Review (probability & linear algebra)

Bayes Rule

Often it is useful to derive the rule a bit further, expanding the denominator with the law of total probability:

P(a | b) = P(b | a) P(a) / Σ_{a′} P(b | a′) P(a′)

Page 20: Review (probability & linear algebra)

Bayes Rule: Example

Meningitis (m) & stiff neck (s)

P(s) = 0.01, P(m) = 0.00002, P(s | m) = 0.7

P(m | s) = P(s | m) P(m) / P(s) = 0.7 × 0.00002 / 0.01 = 0.0014

Page 21: Review (probability & linear algebra)

Bayes Rule: General Form


Page 22: Review (probability & linear algebra)

Calculating Probabilities from Joint Distribution

Useful techniques:

Marginalization: P(X = x) = Σ_{y ∈ Y} P(X = x, Y = y)

Conditioning: P(X = x) = Σ_{y ∈ Y} P(X = x | Y = y) P(Y = y)

Normalization: P(X | Y = y) = P(X, Y = y) / P(Y = y) ∝ P(X, Y = y)
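A minimal numpy sketch of these three operations on a small joint table (the joint values are made up for illustration):

    import numpy as np

    # Hypothetical joint distribution P(X, Y); rows: X in {0, 1}, columns: Y in {0, 1}
    P_xy = np.array([[0.30, 0.10],
                     [0.20, 0.40]])

    # Marginalization: P(X = x) = sum_y P(X = x, Y = y)
    P_x = P_xy.sum(axis=1)                        # -> [0.4, 0.6]

    # Conditioning: P(X = x) = sum_y P(X = x | Y = y) P(Y = y)
    P_y = P_xy.sum(axis=0)
    P_x_given_y = P_xy / P_y                      # column j holds P(X | Y = j)
    P_x_again = (P_x_given_y * P_y).sum(axis=1)   # recovers P_x

    # Normalization: P(X | Y = 1) = P(X, Y = 1) / P(Y = 1)
    P_x_given_y1 = P_xy[:, 1] / P_xy[:, 1].sum()  # -> [0.2, 0.8]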

Page 23: Review (probability & linear algebra)

Probability Mass Function (PMF)
A probability mass function (PMF) shows the probability for each value of a discrete random variable. Each impulse magnitude is equal to the probability of the corresponding outcome.

P(X = x) ≥ 0, Σ_{x ∈ X} P(X = x) = 1

Example: PMF of a fair die

[Figure: PMF of a fair die: six impulses of height 1/6 at x = 1, …, 6]

Page 24: Review (probability & linear algebra)

Probability Density Function (PDF)
A probability density function (PDF) is defined for continuous random variables. The probability that X lies in a small interval of width Δ around x_0 is approximately p_X(x_0) × Δ:

p_X(x_0) = lim_{Δ→0} (1/Δ) P(x_0 − Δ/2 < X ≤ x_0 + Δ/2)

p_X(x) ≥ 0, ∫ p_X(x) dx = 1

[Figure: a PDF curve p_X(x)]

Page 25: Review (probability & linear algebra)

Cumulative Distribution Function (CDF)
The cumulative distribution function (CDF) is defined as the integral of the PDF; it is similarly defined for discrete variables (summation instead of integration):

CDF_X(x) = F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} p_X(τ) dτ

Properties:
• Non-decreasing
• Right continuous
• F_X(−∞) = 0, F_X(∞) = 1
• p_X(x) = dF_X(x)/dx
• P(u < X ≤ v) = F_X(v) − F_X(u)

[Figure: a CDF curve F_X(x)]

Page 26: Review (probability & linear algebra)

Distribution Statistics
Basic descriptors of distributions:
• Mean value
• Variance & standard deviation
• Moments
• Covariance & correlation

Page 27: Review (probability & linear algebra)

Expected Value
Expected (or mean) value: the weighted average of all possible values of the random variable.

Expectation of a discrete random variable X: E[X] = Σ_x x P(x)

Expectation of a function of a discrete random variable: E[g(X)] = Σ_x g(x) P(x)

Expected value of a continuous random variable X: E[X] = ∫ x p_X(x) dx

Expectation of a function of a continuous random variable: E[g(X)] = ∫ g(x) p_X(x) dx
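As a quick illustration, a minimal numpy sketch computing E[X] and E[g(X)] for the fair-die PMF above (the last line previews the variance of the next slide):

    import numpy as np

    x = np.arange(1, 7)          # faces of a fair die
    P = np.full(6, 1/6)          # PMF: P(X = x) = 1/6

    E_X = np.sum(x * P)          # E[X] = 3.5
    E_X2 = np.sum(x**2 * P)      # E[g(X)] with g(x) = x^2 -> 91/6, about 15.17
    var_X = E_X2 - E_X**2        # E[X^2] - E[X]^2 -> about 2.92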


Page 28: Review (probability & linear algebra)

Variance
Variance: a measure of how far the values of a random variable are spread out around its expected value:

Var(X) = σ_X² = E[(X − μ_X)²] = E[X²] − μ_X², where μ_X = E[X]

Standard deviation: the square root of the variance, σ_X = √Var(X)

Page 29: Review (probability & linear algebra)

Moments
nth-order moment of a random variable X: E[Xⁿ]

Normalized (central) nth-order moment: M_n = E[(X − μ_X)ⁿ]

The first-order moment is the mean value. The second-order moment is the variance plus the square of the mean: E[X²] = σ_X² + μ_X².

Page 30: Review (probability & linear algebra)

Correlation & Covariance
Correlation: Crr(X, Y) = E[XY]

For discrete RVs: E[XY] = Σ_{x,y} x y P(x, y)

Covariance is the correlation of the mean-removed variables:

Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E[XY] − μ_X μ_Y

Page 31: Review (probability & linear algebra)

Covariance: Example

[Figure: two scatter plots of samples, one with ρ(X, Y) = 0 and one with ρ(X, Y) = 0.9]
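A hedged numpy sketch that generates two such samples and estimates their covariance and correlation (the seed and sample size are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)

    # Bivariate Gaussian samples with correlation 0 and 0.9
    S0 = rng.multivariate_normal([0, 0], [[1, 0.0], [0.0, 1]], size=5000)
    S9 = rng.multivariate_normal([0, 0], [[1, 0.9], [0.9, 1]], size=5000)

    print(np.cov(S0.T))        # approximately [[1, 0], [0, 1]]
    print(np.corrcoef(S9.T))   # off-diagonal approximately 0.9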

Page 32: Review (probability & linear algebra)

Covariance Properties

The covariance value shows the tendency of the pair of RVs to vary together:
• Cov(X, Y) > 0: X and Y tend to increase together
• Cov(X, Y) < 0: Y tends to decrease when X increases
• Cov(X, Y) = 0: no linear correlation between X and Y

Page 33: Review (probability & linear algebra)

Orthogonal, Uncorrelated & Independent RVs
Orthogonal random variables: Crr(X, Y) = E[XY] = 0

Uncorrelated random variables: Cov(X, Y) = 0

Independent random variables: P(X, Y) = P(X) P(Y)

Independence implies Cov(X, Y) = 0, but Cov(X, Y) = 0 ⇏ independence

Page 34: Review (probability & linear algebra)

Pearson’s Product Moment Correlation

ρ_{X,Y} = Cov(X, Y) / (σ_X σ_Y)

Defined only if both σ_X and σ_Y are finite and nonzero.

ρ shows the degree of linear dependence between X and Y.

−1 ≤ ρ ≤ 1, since |E[(X − μ_X)(Y − μ_Y)]| ≤ σ_X σ_Y according to the Cauchy-Schwarz inequality (|E[UV]|² ≤ E[U²] E[V²]).

ρ = 1 shows a perfect positive linear relationship and ρ = −1 shows a perfect negative linear relationship.

Page 35: Review (probability & linear algebra)

Pearson’s Correlation: Examples

[Figure: scatter plots for various values of ρ = ρ(X, Y)] [Wikipedia]

Page 36: Review (probability & linear algebra)

Covariance Matrix
If X = [X_1, …, X_d]ᵀ is a vector of random variables (a d-dimensional random vector) with mean μ = E[X] = [E(X_1), …, E(X_d)]ᵀ, the covariance matrix indicates the tendency of each pair of RVs to vary together:

Σ = E[(X − μ)(X − μ)ᵀ] =
[ E[(X_1 − μ_1)(X_1 − μ_1)] ⋯ E[(X_1 − μ_1)(X_d − μ_d)] ]
[ ⋮ ⋱ ⋮ ]
[ E[(X_d − μ_d)(X_1 − μ_1)] ⋯ E[(X_d − μ_d)(X_d − μ_d)] ]

Page 37: Review (probability & linear algebra)

Covariance Matrix: Two Variables

Σ = [[σ_1², σ_12], [σ_21, σ_2²]], with σ_12 = σ_21 = Cov(X, Y)

For the earlier example: Σ = [[1, 0], [0, 1]] and Σ = [[1, 0.9], [0.9, 1]]

Page 38: Review (probability & linear algebra)

Covariance Matrix

σ_ij shows the covariance of X_i and X_j:

σ_ij = Cov(X_i, X_j) = E[(X_i − μ_i)(X_j − μ_j)]

Σ = [[σ_11, σ_12, ⋯, σ_1d], [σ_21, σ_22, ⋯, σ_2d], ⋮, [σ_d1, σ_d2, ⋯, σ_dd]]

Page 39: Review (probability & linear algebra)

Sums of Random Variables
Z = X + Y

Mean: E[Z] = E[X] + E[Y]

Variance: Var(Z) = Var(X) + Var(Y) + 2 Cov(X, Y); if X, Y are independent, Var(Z) = Var(X) + Var(Y)

Distribution:

p_Z(z) = ∫ p_{X,Y}(x, z − x) dx

By independence: p_Z(z) = ∫ p_X(x) p_Y(z − x) dx, i.e., p_Z = p_X ∗ p_Y (convolution)
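For discrete RVs the convolution becomes a sum; a small numpy sketch with two fair dice (our own example, not from the slides):

    import numpy as np

    p_die = np.full(6, 1/6)              # PMF of one fair die on {1, ..., 6}
    p_sum = np.convolve(p_die, p_die)    # PMF of Z = X + Y on {2, ..., 12}

    z = np.arange(2, 13)
    print(np.sum(z * p_sum))             # E[Z] = E[X] + E[Y] = 7.0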

Page 40: Review (probability & linear algebra)

Some Famous Probability Mass and Density Functions
Uniform: X ~ U(a, b)

p(x) = (1/(b − a)) [U(x − a) − U(x − b)], where U(·) is the unit step

Gaussian (Normal): X ~ N(μ, σ²)

p(x) = (1/(√(2π) σ)) e^(−(x − μ)²/(2σ²))

Page 41: Review (probability & linear algebra)

Some Famous Probability Mass and Density Functions

Binomial: X ~ Bin(n, p)

P(X = k) = C(n, k) pᵏ (1 − p)ⁿ⁻ᵏ

Exponential:

p(x) = λ e^(−λx) U(x), i.e., λ e^(−λx) for x ≥ 0 and 0 for x < 0
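As a hedged illustration, all four distributions are available in scipy.stats (the parameter values below are arbitrary):

    from scipy import stats

    print(stats.uniform(loc=0, scale=2).pdf(1.0))   # U(0, 2) density at x = 1 -> 0.5
    print(stats.norm(loc=0, scale=1).pdf(0.0))      # N(0, 1) density at x = 0 -> 0.3989
    print(stats.binom(n=10, p=0.3).pmf(3))          # Bin(10, 0.3): P(X = 3) -> 0.2668
    print(stats.expon(scale=1/2.0).pdf(0.0))        # exponential with rate 2 -> 2.0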

Page 42: Review (probability & linear algebra)

Gaussian (Normal) Distribution

68% of the probability mass lies within [μ − σ, μ + σ], and 95% within [μ − 2σ, μ + 2σ].

It is widely used to model the distribution of continuous variables.

Standard normal distribution: N(0, 1)

Page 43: Review (probability & linear algebra)

Multivariate Gaussian Distribution
x is a vector of Gaussian variables:

p(x) = N(x; μ, Σ) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp(−(1/2)(x − μ)ᵀ Σ⁻¹ (x − μ))

μ = E[x], Σ = E[(x − μ)(x − μ)ᵀ]

Mahalanobis distance: r² = (x − μ)ᵀ Σ⁻¹ (x − μ)
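A minimal sketch evaluating this density and the Mahalanobis distance with numpy/scipy (μ, Σ, and x below are made-up values):

    import numpy as np
    from scipy.stats import multivariate_normal

    mu = np.array([0.0, 0.0])
    Sigma = np.array([[1.0, 0.9],
                      [0.9, 1.0]])
    x = np.array([1.0, 2.0])

    # Mahalanobis distance r^2 = (x - mu)^T Sigma^{-1} (x - mu)
    d = x - mu
    r2 = d @ np.linalg.solve(Sigma, d)

    # Density p(x) = N(x; mu, Sigma)
    p = multivariate_normal(mean=mu, cov=Sigma).pdf(x)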

[Figure: 3D surface plot of a bivariate Gaussian probability density over (x1, x2)]

Page 44: Review (probability & linear algebra)

Multivariate Gaussian Distribution

The covariance matrix is always symmetric and positive semi-definite.

The multivariate Gaussian is completely specified by d + d(d + 1)/2 parameters.

Special cases of Σ:
• Σ = σ²I: independent random variables with the same variance (circularly symmetric Gaussian)
• Diagonal Σ = diag(σ_1², …, σ_d²): independent random variables with different variances

Page 45: Review (probability & linear algebra)

Multivariate Gaussian Distribution: Level Surfaces
The Gaussian density is constant on surfaces (hyper-ellipsoids) in x-space for which:

(x − μ)ᵀ Σ⁻¹ (x − μ) = constant

The principal axes of the hyper-ellipsoids are the eigenvectors of Σ.

Bivariate Gaussian: curves of constant density are ellipses.

Page 46: Review (probability & linear algebra)

Bivariate Gaussian Distribution
λ_1 and λ_2 are the eigenvalues of Σ, and u_1 and u_2 are the corresponding eigenvectors.

[Figure: constant-density ellipse with principal axes along u_1 and u_2]

Page 47: Review (probability & linear algebra)

Linear Transformations on Multivariate Gaussian
If x ~ N(μ, Σ) and y = Aᵀx, then y ~ N(Aᵀμ, AᵀΣA).

Whitening transform: A_w = U Λ^(−1/2), where Σ = U Λ Uᵀ, so that y = A_wᵀ x has covariance equal to the identity matrix I.
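A sketch of whitening via the eigendecomposition Σ = UΛUᵀ, assuming the A_w = UΛ^(−1/2) convention above (the data is synthetic):

    import numpy as np

    rng = np.random.default_rng(0)
    Sigma = np.array([[2.0, 1.2],
                      [1.2, 1.0]])
    X = rng.multivariate_normal([0, 0], Sigma, size=10000)

    # Eigendecomposition of the (estimated) covariance: Sigma = U diag(lam) U^T
    lam, U = np.linalg.eigh(np.cov(X.T))

    # Whitening transform A_w = U Lam^{-1/2}; y = A_w^T x
    A_w = U / np.sqrt(lam)          # scales column j of U by lam_j^{-1/2}
    Y = X @ A_w

    print(np.cov(Y.T))              # approximately the identity matrix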

Page 48: Review (probability & linear algebra)

Gaussian Distribution Properties
Some attractive properties of the Gaussian distribution:
• Marginal and conditional distributions of a Gaussian are also Gaussian
• After a linear transformation, a Gaussian distribution is again Gaussian
• There exists a linear transformation that diagonalizes the covariance matrix (whitening transform); it converts the multivariate normal distribution into a spherical one
• Gaussian is the distribution that maximizes the entropy (for a given mean and variance)
• Gaussian is stable and infinitely divisible
• Central Limit Theorem
• Some distributions can be approximated by a Gaussian when their parameter value is sufficiently large (e.g., Binomial)

Page 49: Review (probability & linear algebra)

Central Limit Theorem (under mild conditions)
Suppose X_i (i = 1, …, N) are i.i.d. (independent, identically distributed) RVs with finite variances.

Let S_N be the sum of these RVs: S_N = Σ_{i=1}^{N} X_i

The distribution of S_N converges to a normal distribution as N increases, regardless of the distribution of the RVs.

Example: X_i ~ uniform, i = 1, …, N

[Figure: histograms of S_N for increasing N, approaching a Gaussian shape]
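A quick simulation sketch of the uniform example (the choice N = 12 and the sample count are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)

    N = 12                                   # number of uniforms per sum
    S = rng.uniform(0, 1, size=(100000, N)).sum(axis=1)

    # CLT: S is approximately N(N/2, N/12)
    print(S.mean(), S.var())                 # about 6.0 and about 1.0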

Page 50: Review (probability & linear algebra)

Linear Algebra: Basic Definitions

Matrix A (m × n):
A = [a_ij]_{m×n} =
[ a_11 a_12 ... a_1n ]
[ a_21 a_22 ... a_2n ]
[ ...  ...  ... ...  ]
[ a_m1 a_m2 ... a_mn ]

Matrix transpose: B = Aᵀ, where b_ij = a_ji (1 ≤ i ≤ n, 1 ≤ j ≤ m)

Symmetric matrix: Aᵀ = A

Vector: a = [a_1, …, a_n]ᵀ

Page 51: Review (probability & linear algebra)

Linear Mapping
Linear function f:
f(x + y) = f(x) + f(y) ∀x, y ∈ V
f(αx) = α f(x) ∀x ∈ V, α ∈ ℝ

In general, a matrix A_{m×n} = [a_1 ⋯ a_n] can be used to denote a linear map f: ℝⁿ → ℝᵐ, where f(x) = x_1 a_1 + ⋯ + x_n a_n = Ax

Page 52: Review (probability & linear algebra)

Linear Algebra: Basic Definitions

Inner (dot) product: ⟨a, b⟩ = aᵀb = Σ_{i=1}^{n} a_i b_i

Matrix multiplication: for A = [a_ij]_{m×p} and B = [b_ij]_{p×n},
AB = C = [c_ij]_{m×n}, where c_ij is the inner product of the i-th row of A and the j-th column of B

Page 53: Review (probability & linear algebra)

Inner Product
Inner (dot) product: aᵀb = Σ_{i=1}^{n} a_i b_i

Length (Euclidean norm) of a vector: ‖a‖ = √(aᵀa) = √(Σ_{i=1}^{n} a_i²); a is normalized iff ‖a‖ = 1

Angle between vectors a and b: cos θ = aᵀb / (‖a‖ ‖b‖)

Orthogonal vectors a and b: aᵀb = 0

Orthonormal set of vectors a_1, a_2, …, a_n: ∀i, j: a_iᵀa_j = 1 if i = j, 0 otherwise
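A short numpy sketch of these definitions (the vectors are arbitrary):

    import numpy as np

    a = np.array([1.0, 2.0, 2.0])
    b = np.array([2.0, 0.0, 1.0])

    inner = a @ b                                   # a^T b = 4.0
    norm_a = np.linalg.norm(a)                      # ||a|| = 3.0
    cos_theta = inner / (norm_a * np.linalg.norm(b))
    a_hat = a / norm_a                              # normalized: ||a_hat|| = 1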


Page 54: Review (probability & linear algebra)

Linear Independence
A set of vectors is linearly independent if no vector is a linear combination of the other vectors:

c_1 a_1 + c_2 a_2 + … + c_n a_n = 0 ⇒ c_1 = c_2 = … = c_n = 0

Page 55: Review (probability & linear algebra)

Matrix Determinant and Trace

Determinant: det(A) = Σ_{j=1}^{n} a_ij A_ij (for any i = 1, …, n), where the cofactor is A_ij = (−1)^(i+j) det(M_ij) and M_ij is the minor of a_ij

det(AB) = det(A) × det(B)

Trace: tr[A] = Σ_{j=1}^{n} a_jj

Page 56: Review (probability & linear algebra)

Matrix Inversion

Inverse of A_{n×n}: AB = BA = I_n ⟺ B = A⁻¹

A⁻¹ exists iff det(A) ≠ 0 (A is nonsingular)
Singular: det(A) = 0
Ill-conditioned: A is nonsingular but close to being singular

Pseudo-inverse for a non-square matrix A: A# = (AᵀA)⁻¹Aᵀ (when AᵀA is not singular), which satisfies A#A = I
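A numpy sketch contrasting the exact inverse with the pseudo-inverse (the matrices are arbitrary):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    A_inv = np.linalg.inv(A)              # exists since det(A) = 5 != 0

    B = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])            # non-square (3 x 2), full column rank
    B_pinv = np.linalg.pinv(B)            # equals (B^T B)^{-1} B^T here
    print(B_pinv @ B)                     # approximately the 2 x 2 identity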


Page 57: Review (probability & linear algebra)

Matrix Rank
rank(A): the maximum number of linearly independent columns or rows of A.

For A_{m×n}: rank(A) ≤ min(m, n)

A_{n×n} is full rank (rank(A) = n) iff A is nonsingular (det(A) ≠ 0)

Page 58: Review (probability & linear algebra)

Eigenvectors and Eigenvalues
Au = λu, u ≠ 0

Characteristic equation: det(A − λI_n) = 0, an n-th order polynomial with n roots (the eigenvalues λ_1, …, λ_n)

tr(A) = Σ_{j=1}^{n} λ_j

det(A) = ∏_{j=1}^{n} λ_j
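A numpy check of the trace and determinant identities (the matrix is arbitrary):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    lam = np.linalg.eigvals(A)

    print(lam.sum(), np.trace(A))        # both 5.0: tr(A) = sum of eigenvalues
    print(lam.prod(), np.linalg.det(A))  # both 5.0: det(A) = product of eigenvalues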


Page 59: Review (probability & linear algebra)

Eigenvectors and Eigenvalues: Symmetric Matrix
For a symmetric matrix, the eigenvectors corresponding to distinct eigenvalues are orthogonal.

These eigenvectors can be used to form an orthonormal set (UᵀU = I and UUᵀ = I, i.e., U⁻¹ = Uᵀ)

Page 60: Review (probability & linear algebra)

Eigen Decomposition: Symmetric Matrix

A u_i = λ_i u_i ⇒ AU = UΛ, where U = [u_1 ⋯ u_n] and Λ = diag(λ_1, …, λ_n)

Eigen decomposition of a symmetric matrix: A = UΛUᵀ = Σ_{i=1}^{n} λ_i u_i u_iᵀ
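A numpy sketch using eigh (intended for symmetric matrices) to verify the decomposition:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    lam, U = np.linalg.eigh(A)               # eigh: for symmetric/Hermitian matrices

    # Reconstruct A = U Lam U^T
    A_rec = U @ np.diag(lam) @ U.T
    print(np.allclose(A, A_rec))             # True
    print(np.allclose(U.T @ U, np.eye(2)))   # True: U is orthonormal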


Page 61: Review (probability & linear algebra)

Positive Definite Matrix
A symmetric matrix A_{n×n} is positive definite iff xᵀAx > 0 for all x ≠ 0.

The eigenvalues of a positive definite matrix are positive: ∀i, λ_i > 0

Page 62: Review (probability & linear algebra)

Simple Vector Derivatives
