Bayesian Networks, Unit 2: Statistics Review
Wang, Yuan-Kai (王元凱), [email protected], http://www.ykwang.tw
Department of Electrical Engineering, Fu Jen University (輔仁大學電機工程系), 2006~2011
Copyright: Wang, Yuan-Kai, Department of Electrical Engineering, Fu Jen University
Reference this document as: Wang, Yuan-Kai, "Statistics Review," Lecture Notes of Wang, Yuan-Kai, Fu Jen University, Taiwan, 2011.
2. Unit - Statistics Review p. 2 Goal of this Unit
Review basic concepts of statistics in terms of
- Image processing
- Pattern recognition
3. Unit - Statistics Review p. 3 Related Units
Previous unit(s)
- Probability Review
Next units
- Uncertainty Inference (Discrete)
- Uncertainty Inference (Continuous)
4. Unit - Statistics Review p. 4 Self-Study
- Russell & Norvig, Artificial Intelligence: A Modern Approach, 2nd ed., Prentice Hall, 2003. pp. 462~474, Chapter 13, Sec. 13.1~13.3
- , 2002
- D. Griffiths, 2009, O'Reilly
5. Unit - Statistics Review p. 5 Contents
1. Introduction, p. 6
2. Histogram, p. 12
3. Central Tendency, p. 18
4. Variance, p. 26
5. Frequency Distribution, p. 34
6. Covariance, p. 52
7. Covariance Matrix, p. 57
8. Correlation, p. 64
9. Chart and Graph, p. 68
10. References, p. 79
6. Unit - Statistics Review p. 6 1. Introduction
- Probability and statistics are about uncertainty
- The world is full of uncertainty
- Our hardware/software implementation needs to consider uncertainty
7. Unit - Statistics Review p. 7 Uncertainty by Probability
Probability summarizes the uncertainty that arises from laziness and ignorance.
An example: P(your toothache is caused by a cavity) = 0.8
The remaining 20% represents your laziness and ignorance: it covers all other possible causes.
8. Unit - Statistics Review p. 8 Uncertainty by Statistics
Statistics derives probabilistic facts from a set of data:
- Derive an actual probability number, e.g. P(your toothache is caused by a cavity) = 0.8
- Describe characteristics of the data: mean, variance, moment, ...
- Build a statistical model of the data: Gaussian, Gaussian mixture
- Infer new facts from the data
9. Unit - Statistics Review p. 9 What Is Statistics
Given a set of data from a random variable, a statistic is a number that provides information about the data.
Descriptive statistics: two ways to describe data
- Measures of central tendency
- Measures of dispersion
10. Unit - Statistics Review p. 10 Measures of Central Tendency
- Mean: average, expected value of the random variable
- Median: middle value of the R.V.
- Mode: the variable value at the peak of the pmf/pdf
11. Unit - Statistics Review p. 11 Measures of Dispersion
- Dispersion
- Variance
- Covariance
- Correlation
- Moment
- Others: range, percentiles, 95% percentile
12. Unit - Statistics Review p. 12 2. Histogram
This course has 15 students.
Every student has a score with values: 0, 10, 20, ..., 100
Random variable X = student's score
Scores of the 15 students: {20, 90, 90, 100, 50, 60, 70, 60, 80, 70, 80, 90, 80, 70, 70}
Counts (score x number of students): 20x1; 50x1; 60x2; 70x4; 80x3; 90x3; 100x1
13. Unit - Statistics Review p. 13 The Histogram
Counts: 20x1; 50x1; 60x2; 70x4; 80x3; 90x3; 100x1
[Figure: histogram of the scores; horizontal axis X (10 to 100), vertical axis No. of X]
Counts divided by 15: 20x1/15; 50x1/15; 60x2/15; 70x4/15; 80x3/15; 90x3/15; 100x1/15
[Figure: normalized histogram; horizontal axis X (10 to 100), vertical axis P(X)]
The normalized histogram is the P.D.F.
14. Unit - Statistics Review p. 14 Definitions
For a random variable X
- X has n possible values {$x_1, x_2, \ldots, x_n$}
- Now there are N random data of X: $x_1, x_2, \ldots, x_N$
Histogram & Distribution
- The number of $x_i$: $N(x_i)$
- The probability of $x_i$: $p(x_i) = N(x_i)/N$
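The definitions above can be tried out on the 15 scores from slide 12. The following is a minimal sketch, not the lecture's own code; the function names `histogram` and `distribution` are illustrative choices, and exact fractions are used so that the probabilities sum to exactly 1.

```python
from collections import Counter
from fractions import Fraction

# Scores of the 15 students (slide 12).
scores = [20, 90, 90, 100, 50, 60, 70, 60, 80, 70, 80, 90, 80, 70, 70]

def histogram(data):
    """Return {x_i: N(x_i)}, the number of occurrences of each value."""
    return dict(sorted(Counter(data).items()))

def distribution(data):
    """Return {x_i: p(x_i)} with p(x_i) = N(x_i) / N."""
    n_total = len(data)
    return {x: Fraction(count, n_total) for x, count in histogram(data).items()}

counts = histogram(scores)
probs = distribution(scores)
for x in counts:
    # Fractions are printed in lowest terms, e.g. 3/15 appears as 1/5.
    print(x, counts[x], probs[x])
```

Values that never occur (such as 0 or 10) are simply absent from the dictionaries; their count and probability are 0.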
15. Unit - Statistics Review p. 15 Histogram vs. P.D.F.
16. Unit - Statistics Review p. 16 2D Gaussian
[Figure panels: Histogram; pdf]
17. Unit - Statistics Review p. 17 Histogram of an Image
- Random variable X (gray level) has n possible values {$x_1, x_2, \ldots, x_n$}, n = 256
- N random data $x_1, x_2, \ldots, x_N$ of X, N = Width*Height
- Histogram: $N(x_i)$
- Distribution: $P(x_i) = N(x_i)/N$
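As a rough illustration of the image case, the sketch below counts gray levels in a tiny made-up image stored as a list of rows. The image data and the function names are assumptions for demonstration only; a real image would normally come from an image library.

```python
def image_histogram(image, levels=256):
    """Return a list where entry g is N(g), the number of pixels with gray level g."""
    counts = [0] * levels
    for row in image:
        for gray in row:
            counts[gray] += 1
    return counts

def image_distribution(image, levels=256):
    """Return a list where entry g is P(g) = N(g) / N, with N = Width*Height."""
    counts = image_histogram(image, levels)
    n_pixels = sum(counts)
    return [c / n_pixels for c in counts]

# Hypothetical 3x2 image (Width = 3, Height = 2), gray levels in 0..255.
toy_image = [
    [0, 0, 255],
    [128, 128, 128],
]

hist = image_histogram(toy_image)
dist = image_distribution(toy_image)
print(hist[0], hist[128], hist[255])
print(dist[128])
```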
18. Unit - Statistics Review p. 18 3. Central Tendency
Random variable X = student's score
Scores of the 15 students: {20, 90, 90, 100, 50, 60, 70, 60, 80, 70, 80, 90, 80, 70, 70}
Counts: 20x1; 50x1; 60x2; 70x4; 80x3; 90x3; 100x1
[Figure: histogram of the scores; horizontal axis X (10 to 100), vertical axis P(X)]
Mean = ?
19. Unit - Statistics Review p. 19 Mean
Mean from the set of data: $E[X] = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
Mean from the p.d.f.: $E[X] = \bar{x} = \sum_{i=1}^{n} x_i\, p(x_i)$
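The two ways of computing the mean give the same number when $p(x_i) = N(x_i)/N$ is taken from the data itself. A small sketch under that assumption, using the 15 scores; the function names are illustrative.

```python
from collections import Counter

scores = [20, 90, 90, 100, 50, 60, 70, 60, 80, 70, 80, 90, 80, 70, 70]

def mean_from_data(data):
    """Mean as (1/N) * sum of the N data values."""
    return sum(data) / len(data)

def mean_from_pmf(data):
    """Mean as sum over distinct values x_i of x_i * p(x_i), with p(x_i) = N(x_i)/N."""
    n_total = len(data)
    pmf = {x: count / n_total for x, count in Counter(data).items()}
    return sum(x * p for x, p in pmf.items())

print(mean_from_data(scores))
print(mean_from_pmf(scores))  # equal to the line above up to floating-point rounding
```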
20. Unit - Statistics Review p. 20 Mean of an Image
Mean from the set of data (pixels): $E[X] = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
Mean from the p.d.f. (histogram): $E[X] = \bar{x} = \sum_{i=1}^{n} x_i\, p(x_i)$
21. Unit - Statistics Review p. 21 Disadvantage of Mean
- The mean is easily influenced by outliers (extreme values)
- The mean may not be a value that actually occurs: here $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = 72$, which is not a possible score
[Figure: histogram of the scores; horizontal axis X (10 to 100), vertical axis P(X)]
22. Unit - Statistics Review p. 22 Median & Mode
Median and mode are two other measures of central tendency.
Median: (1) sort the scores, (2) select the middle one
{20, 50, 60, 60, 70, 70, 70, 70, 80, 80, 80, 90, 90, 90, 100}
Mode: select the score with the maximum N(X) or P(X)
20x1; 50x1; 60x2; 70x4; 80x3; 90x3; 100x1
[Figure: histogram of the scores; horizontal axis X (10 to 100), vertical axis P(X)]
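The two recipes can be followed literally on the 15 scores. This is a sketch using only the Python standard library; the manual steps are cross-checked against `statistics.median` and `statistics.mode`.

```python
import statistics
from collections import Counter

scores = [20, 90, 90, 100, 50, 60, 70, 60, 80, 70, 80, 90, 80, 70, 70]

# Median: (1) sort the scores, (2) select the middle one.
ordered = sorted(scores)
middle_index = len(ordered) // 2       # N = 15 is odd, so index 7 is the 8th value
median_by_hand = ordered[middle_index]

# Mode: the score with the maximum count N(X).
counts = Counter(scores)
mode_by_hand = max(counts, key=counts.get)

print(median_by_hand, statistics.median(scores))
print(mode_by_hand, statistics.mode(scores))
```

For an even number of data there is no single middle element; `statistics.median` then returns the average of the two middle values.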
23. Unit - Statistics Review p. 23 Advantage of Median & Mode
- The median and mode are much less influenced by outliers than the mean
- The median (for an odd number of data) and the mode are values that actually occur in the data
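A quick way to see the difference is to make the outlier more extreme and recompute all three measures. In the sketch below the altered data set, where the lowest score drops from 20 to 0, is a hypothetical variation and not part of the lecture data.

```python
import statistics

scores = [20, 90, 90, 100, 50, 60, 70, 60, 80, 70, 80, 90, 80, 70, 70]
# Hypothetical variation: the lowest score (the outlier) becomes 0 instead of 20.
altered = [0 if s == 20 else s for s in scores]

def summary(data):
    """Return (mean, median, mode) of the data."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

mean_a, median_a, mode_a = summary(scores)
mean_b, median_b, mode_b = summary(altered)
print("original:", mean_a, median_a, mode_a)
print("altered: ", mean_b, median_b, mode_b)
```

Only the mean moves when the outlier changes; the median and mode stay at 70.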
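24. Unit - Statistics Review p. 24 Expected Value
E[X]: mean (written $\bar{x}$ or $\mu$)
- Discrete: $E[X] = \bar{x} = \sum_{i=1}^{n} x_i\, p(x_i)$
- Continuous: $E[X] = \bar{x} = \int_{-\infty}^{\infty} x\, p(x)\, dx$
$E[X^r]$: r-th moment of X
- Discrete: $E[X^r] = \sum_{i=1}^{n} x_i^r\, p(x_i)$
- Continuous: $E[X^r] = \int_{-\infty}^{\infty} x^r\, p(x)\, dx$
$E[(X-\mu)^r]$: r-th central moment of X
- Discrete: $E[(X-\mu)^r] = \sum_{i=1}^{n} (x_i-\mu)^r\, p(x_i)$
- Continuous: $E[(X-\mu)^r] = \int_{-\infty}^{\infty} (x-\mu)^r\, p(x)\, dx$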
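For the discrete case the moment formulas translate directly into sums over the p.m.f. The following sketch, with illustrative function names, uses the distribution of the 15 scores and exact fractions so the results can be compared without rounding.

```python
from collections import Counter
from fractions import Fraction

scores = [20, 90, 90, 100, 50, 60, 70, 60, 80, 70, 80, 90, 80, 70, 70]

def pmf(data):
    """Return {x_i: p(x_i)} with p(x_i) = N(x_i) / N as exact fractions."""
    n_total = len(data)
    return {x: Fraction(count, n_total) for x, count in Counter(data).items()}

def raw_moment(p, r):
    """r-th moment E[X^r] = sum of x_i^r * p(x_i)."""
    return sum(x ** r * px for x, px in p.items())

def central_moment(p, r):
    """r-th central moment E[(X - mu)^r] = sum of (x_i - mu)^r * p(x_i)."""
    mu = raw_moment(p, 1)
    return sum((x - mu) ** r * px for x, px in p.items())

p = pmf(scores)
mean = raw_moment(p, 1)               # first moment: the mean
second_central = central_moment(p, 2) # second central moment
print(mean, central_moment(p, 1), second_central)
```

The first central moment is always 0, and the second central moment is the variance computed with a 1/N factor.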
25. Unit - Statistics Review p. 25 Deviation about the Mean
$x_i - \bar{x}$
- It indicates how far a value is from the center
- It is the key quantity for measuring dispersion, i.e. how far a distribution spreads out
26. Unit - Statistics Review p. 26 4. Variance
Variance and standard deviation come from the deviation.
Average Deviation
- Calculate all of the deviations and find their average
- It is a measure of the typical amount any given data point might vary
$AD = \frac{\sum_{i=1}^{N}(x_i-\bar{x})}{N}$
27. Unit - Statistics Review p. 27 We need Absolute Deviation
Data: 1, 2, 3, 4, 5 (N = 5), $\bar{x} = 15/5 = 3.0$

x_i     x_i - mean     |x_i - mean|
1       1-3 = -2       |1-3| = 2
2       2-3 = -1       |2-3| = 1
3       3-3 = 0        |3-3| = 0
4       4-3 = 1        |4-3| = 1
5       5-3 = 2        |5-3| = 2
Sum     15             ?               6

$AD = \frac{\sum_{i=1}^{N}(x_i-\bar{x})}{N} = ?$ (the signed deviations cancel, so the sum is 0)
$AAD = \frac{\sum_{i=1}^{N}|x_i-\bar{x}|}{N} = 6/5 = 1.2$
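The table can be reproduced in a few lines. This is a sketch with illustrative function names; it shows why the signed average deviation is useless as a spread measure while the average absolute deviation is not.

```python
data = [1, 2, 3, 4, 5]

def average_deviation(values):
    """AD: average of the signed deviations (x_i - mean)."""
    mean = sum(values) / len(values)
    return sum(x - mean for x in values) / len(values)

def average_absolute_deviation(values):
    """AAD: average of the absolute deviations |x_i - mean|."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

ad = average_deviation(data)
aad = average_absolute_deviation(data)
print(ad, aad)
```

The signed deviations cancel for any data set, so AD carries no information about spread.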
28. Unit - Statistics Review p. 28 Or Square of the Deviation
- Square the deviations to remove the minus signs
- Take the square root to return to the original scale
Mean: $\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$
Variance: $\frac{\sum_{i=1}^{N}(x_i-\bar{x})^2}{N}$
AAD: $\frac{\sum_{i=1}^{N}|x_i-\bar{x}|}{N}$
AD: $\frac{\sum_{i=1}^{N}(x_i-\bar{x})}{N}$
29. Unit - Statistics Review p. 29 Sample Mean and Sample Variance
We can approximate the expected value by the sample mean:
$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
Sample variance:
$s_N^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2$
But, strangely enough, if you want a good approximation of the true variance, you should use (dividing by N-1 makes the estimate unbiased):
$s_{N-1}^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2$
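The two estimators differ only in the divisor. A minimal sketch on the data 1, 2, 3, 4, 5, with hand-written versions cross-checked against `statistics.pvariance` (divides by N) and `statistics.variance` (divides by N-1) from the Python standard library; the function names are illustrative.

```python
import statistics

data = [1, 2, 3, 4, 5]

def variance_n(values):
    """Variance with divisor N."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def variance_n_minus_1(values):
    """Variance with divisor N - 1 (unbiased estimate of the true variance)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

print(variance_n(data), statistics.pvariance(data))
print(variance_n_minus_1(data), statistics.variance(data))
```

The squared deviations sum to 10, so the results are 10/5 and 10/4. The gap between the two shrinks as N grows.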
30. Unit - Statistics Review p. 30 Variance of an Image
From the histogram: $\sigma^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2\, p(x_i)$, n = 256, with $\bar{x} = \sum_{i=1}^{n} x_i\, p(x_i)$
From the pixels: $\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2$
Mean and variance are moments of the distribution.
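The histogram form and the pixel form are closely related. The short derivation below is a sketch that writes the k-th pixel value as $y_k$ (a symbol introduced here only to keep pixels and gray levels apart; it is not in the slides) and groups the N pixels by gray level, using $p(x_i) = N(x_i)/N$.

```latex
\begin{align}
\frac{1}{N}\sum_{k=1}^{N}(y_k-\bar{x})^2
  &= \frac{1}{N}\sum_{i=1}^{n} N(x_i)\,(x_i-\bar{x})^2
   = \sum_{i=1}^{n}(x_i-\bar{x})^2\,p(x_i) \\
\frac{1}{N-1}\sum_{k=1}^{N}(y_k-\bar{x})^2
  &= \frac{N}{N-1}\sum_{i=1}^{n}(x_i-\bar{x})^2\,p(x_i)
\end{align}
```

So the histogram form equals the 1/N pixel form exactly, and differs from the 1/(N-1) form only by the factor N/(N-1), which is very close to 1 when N = Width*Height is large.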
31. Unit - Statistics Review p. 31 An Example of Variance
Variance of the scores of the 15 students in this course = ?
{20, 90, 90, 100, 50, 60, 70, 60, 80, 70, 80, 90, 80, 70, 70}
$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = 72$
$\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2 = 388.6$
[Figure: histogram of the scores; horizontal axis X (10 to 100), vertical axis P(X)]
32. Unit - Statistics Review p. 32 Ex. of Standard Deviation
Standard deviation (SD): $\sigma = (Var)^{1/2} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2} = 19.7$
Average absolute deviation: $\frac{1}{N}\sum_{i=1}^{N}|x_i-\bar{x}|$
[Figure: histogram of the scores; horizontal axis X (10 to 100), vertical axis P(X); marks at 52.3, 72 and 91.7, i.e. $\bar{x}-\sigma$, $\bar{x}$ and $\bar{x}+\sigma$]
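The numbers in the variance and standard deviation examples can be checked with the Python standard library. This is a sketch; `statistics.variance` uses the N-1 divisor, matching the formula on the slides.

```python
import math
import statistics

scores = [20, 90, 90, 100, 50, 60, 70, 60, 80, 70, 80, 90, 80, 70, 70]

mean = statistics.mean(scores)
variance = statistics.variance(scores)   # divisor N - 1
sd = math.sqrt(variance)

# Average absolute deviation, for comparison with the standard deviation.
aad = sum(abs(x - mean) for x in scores) / len(scores)

print(mean, round(variance, 1), round(sd, 1))
print(round(mean - sd, 1), round(mean + sd, 1))
print(round(aad, 1))
```

Rounded to one decimal, the first two lines reproduce the values 72, 388.6, 19.7, 52.3 and 91.7 shown on the slides.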
33. Unit - Statistics Review p. 33 Formal Definition of Variance
Discrete: $Var(X) = \sigma_x^2 = E[(X-E[X])^2] = \sum_{i}(x_i-\bar{x})^2\, p(x_i)$
Continuous: $Var(X) = \sigma_x^2 = E[(X-E[X])^2] = \int_{-\infty}^{\infty}(x-\bar{x})^2\, p(x)\, dx$
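Expanding the square in the definition links the variance to the first and second moments. The steps below are a standard identity, written with $\bar{x} = E[X]$ and using the linearity of the expected value.

```latex
\begin{align}
Var(X) &= E[(X-\bar{x})^2] \\
       &= E[X^2 - 2\bar{x}X + \bar{x}^2] \\
       &= E[X^2] - 2\bar{x}\,E[X] + \bar{x}^2 \\
       &= E[X^2] - (E[X])^2
\end{align}
```

In words: the variance is the second moment minus the square of the first moment.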
34. Unit - Statistics Review p. 34 5. Frequency Distributions
A graph or chart that shows the number of observations of a given value, or class interval.
35. Unit - Statistics Review p. 35 Shape of Distribution
- Modality: the number of peaks in the curve
- Skewness: an asymmetry in a distribution, where values are shifted to one extreme or the other
- Kurtosis: the degree of peakedness/flatness in the curve
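Skewness and kurtosis can be put into numbers with central moments. The sketch below assumes the common standardized definitions, skewness = m3 / m2^(3/2) and kurtosis = m4 / m2^2, where m_r is the r-th central moment computed with a 1/N factor; the slides may use a different normalization. The function names are illustrative.

```python
scores = [20, 90, 90, 100, 50, 60, 70, 60, 80, 70, 80, 90, 80, 70, 70]

def central_moment(data, r):
    """r-th central moment of the data, with a 1/N factor."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** r for x in data) / len(data)

def skewness(data):
    """Standardized third central moment (assumed definition)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Standardized fourth central moment (assumed definition; 3 for a normal distribution)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

print(skewness(scores))           # negative: the low score 20 stretches the left tail
print(kurtosis(scores))
print(skewness([1, 2, 3, 4, 5]))  # symmetric data give zero skewness
```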
36. Unit - Statistics Review p. 36 Modality
[Figure panels: Unimodal; Bimodal; Multimodal]
37. Unit - Statistics Review p. 37 Skewness (1/3)
Skewness is based on the third moment about the mean.
- = 0: symmetric distribution (normal distribution)
- > 0: right skew (positive skew)
3: sharp peak K0: sharp peak K0 If X tends to decrease when Y increases, then XY