Upload
others
View
13
Download
0
Embed Size (px)
Citation preview
Central Limit Theorem
ยง number of measurements needed to estimate the mean
ยง convergence of the estimated mean ยง a better approximation than the
Chebyshev Inequality (LLN).
Xingru [email protected]
XC 2020
Central Limit Theorem
๐! ๐"
๐# ๐$
๐% โฏ
๐& ๐'normal
density function
+
+
+
+
+
XC 2020
Central Limit Theorem
๐! ๐"
๐# ๐$
๐% โฏ
๐& ๐'normal
density function
++
+
+
+
If ๐' is the sum of ๐ mutually independent random variables, then the distribution function of ๐' is well-approximated by a normal density function.
XC 2020
CENTRAL LIMIT THEOREM FOR BERNOULLI TRIALS
binomial distribution
XC 2020
Bernoulli Trials
A Bernoulli trials process is a sequence of ๐ chance experiments such that
ยง Each experiment has two possible outcomes, which we may call success and failure.
ยง The probability ๐ of success on each experiment is the same for each experiment, and this probability is not affected by any knowledge of previous outcomes. The probability ๐ of failure is given by ๐ = 1 โ ๐.
50%Toss a Coinhead & tail
60%Weather Forecast
not rain & rain
12.5%Wheel of Fortune50 points & others
๐Bernoulli Trialsoutcome 1 & 2
XC 2020
Binomial Distribution
ยง Let ๐ be a positive integer and let ๐ be a real number between 0 and 1.
ยง Let ๐ต be the random variable which counts the number of successes in a Bernoulli trials process with parameters ๐ and ๐.
ยง Then the distribution ๐(๐, ๐, ๐) of ๐ต is called the binomial distribution.
Binomial Distribution
๐ ๐, ๐, ๐ =๐๐๐(๐')(
1 2 3 4 5
โ โ โ โ โ
XC 2020
Binomial Distribution
ยง Consider a Bernoulli trials process with probability ๐ for success on each trial.
ยง Let ๐* = 1 or 0 according as the ๐th outcome is a success or failure.
ยง Let ๐' = ๐& + ๐% +โฏ+ ๐'. Then ๐' is the number of successes in ๐ trials.
ยง We know that ๐' has as its distribution the binomial probabilities ๐(๐, ๐, ๐).
XC 2020
๐ increases
drifts off to the right
flatten out
XC 2020
The Standardized Sum of ๐4
ยง Consider a Bernoulli trials process with probability ๐ for success on each trial.
ยง Let ๐* = 1 or 0 according as the ๐th outcome is a success or failure.
ยง Let ๐' = ๐& + ๐% +โฏ+ ๐'. Then ๐' is the number of successes in ๐ trials.
ยง We know that ๐' has as its distribution the binomial probabilities ๐(๐, ๐, ๐).
ยง The standardized sum of ๐' is given by
๐'โ =,!)'-'-.
.
๐ = โฏ ๐% = โฏ
XC 2020
The Standardized Sum of ๐4
ยง Consider a Bernoulli trials process with probability ๐ for success on each trial.
ยง Let ๐* = 1 or 0 according as the ๐th outcome is a success or failure.
ยง Let ๐' = ๐& + ๐% +โฏ+ ๐'. Then ๐' is the number of successes in ๐ trials.
ยง We know that ๐' has as its distribution the binomial probabilities ๐(๐, ๐, ๐).
ยง The standardized sum of ๐' is given by
๐'โ =,!)'-'-.
.
ยง ๐'โ always has expected value 0 and variance 1.๐ = 0
๐% = 1
XC 2020
The Standardized Sum of ๐4
=๐'โ =๐' โ ๐๐๐๐๐
๐(๐, ๐, 1)๐(๐, ๐, 2)
๐(๐, ๐, 3)๐(๐, ๐, 4)
๐(๐, ๐, 5)
๐(๐, ๐, 0)
โฏ๐(๐, ๐, ๐)
๐(๐, ๐, ๐)
0 1 2 3 4 5 โฆ ๐
0 โ ๐๐๐๐๐
1 โ ๐๐๐๐๐
2 โ ๐๐๐๐๐
3 โ ๐๐๐๐๐
4 โ ๐๐๐๐๐
5 โ ๐๐๐๐๐
โฆ ๐ โ ๐๐๐๐๐
๐
๐ฅXC 2020
๐'โ =๐' โ ๐๐๐๐๐
๐ = 0
๐% = 1
XC 2020
standardized sum
standard normal distribution
shapes: sameheights: different
why?
XC 2020
standardized sum standard normal distribution
shapes: sameheights: different why?
standardized sum: โ(/0' โ* = 1
standard normal distribution: โซ)1
21๐ ๐ฅ ๐๐ฅ = 1
XC 2020
standardized sum: โ(/0' โ* = 1
standard normal distribution: โซ)1
21๐ ๐ฅ ๐๐ฅ = 1 โ โ(/0' ๐ ๐ฅ* ๐๐ฅ
C(/0
'
โ* = 1 C(/0
'
๐ ๐ฅ* ๐๐ฅ โ 1
=โ( = ๐ ๐ฅ( ๐๐ฅ
=๐ ๐ฅ( =โ(๐๐ฅ
XC 2020
=โ) = ๐(๐, ๐, ๐)=๐'โ =๐' โ ๐๐๐๐๐ =๐๐ฅ = ๐ฅ)*& โ ๐ฅ) =
1๐๐๐
๐(๐, ๐, 1)๐(๐, ๐, 2)
๐(๐, ๐, 3)๐(๐, ๐, 4)
๐(๐, ๐, 5)
๐(๐, ๐, 0)
โฏ๐(๐, ๐, ๐)
0 1 2 3 4 5 โฆ ๐
0 โ ๐๐๐๐๐
1 โ ๐๐๐๐๐
2 โ ๐๐๐๐๐
3 โ ๐๐๐๐๐
4 โ ๐๐๐๐๐
5 โ ๐๐๐๐๐
โฆ ๐ โ ๐๐๐๐๐
=๐ = ๐๐๐๐ฅ) + ๐๐ ๐ : the integer nearest to ๐
XC 2020
โ( = ๐(๐, ๐, ๐)
๐๐ฅ = ๐ฅ(2& โ ๐ฅ( =1๐๐๐
๐ = ๐๐๐๐ฅ( + ๐๐
=๐ ๐ฅ( = 3"45= ๐๐๐ ๐(๐, ๐, ๐๐๐๐ฅ( + ๐๐ )
XC 2020
Central Limit Theorem for Binomial Distributions
ยง For the binomial distribution ๐(๐, ๐, ๐) we have
lim'โ1
๐๐๐๐ ๐, ๐, ๐๐ + ๐ฅ ๐๐๐ = ๐(๐ฅ),
where ๐(๐ฅ) is the standard normal density.
ยง The proof of this theorem can be carried out using Stirlingโs approximation.
Stirlingโs FormulaThe sequence ๐! is asymptotically equal
to ๐'๐)' 2๐๐.
XC 2020
Approximating Binomial Distributions
=lim'โ1
๐๐๐๐ ๐, ๐, ๐๐ + ๐ฅ ๐๐๐ = ๐(๐ฅ)
?๐ ๐, ๐, ๐ โ โฏ
normalbinomial
binomial normal
XC 2020
Approximating Binomial Distributions
=lim'โ,
๐๐๐๐ ๐, ๐, ๐๐ + ๐ฅ ๐๐๐ = ๐(๐ฅ) ?๐ ๐, ๐, ๐ โ โฏ
๐ = ๐๐ + ๐ฅ ๐๐๐
๐ฅ =๐ โ ๐๐๐๐๐
๐ ๐, ๐, ๐ โ๐(๐ฅ)๐๐๐
๐(๐, ๐, ๐) โ1๐๐๐
๐(๐ โ ๐๐๐๐๐
)
XC 2020
Example
Toss a coin
Find the probability of exactly 55 heads in 100 tosses of a coin.
=lim'โ,
๐๐๐๐ ๐, ๐, ๐๐ + ๐ฅ ๐๐๐ = ๐(๐ฅ)
๐(๐, ๐, ๐) โ1๐๐๐
๐(๐ โ ๐๐๐๐๐
)
XC 2020
Example
Toss a coin
Find the probability of exactly 55 heads in 100 tosses of a coin.
๐(๐, ๐, ๐) โ1๐๐๐
๐(๐ โ ๐๐๐๐๐
)๐ = 100 ๐ =12
๐๐ = 50 ๐๐๐ = 5
๐ = 55
๐ฅ =๐ โ ๐๐๐๐๐ = 1 ๐(100,
12, 55) โ
15๐(1)
XC 2020
Example
Toss a coin
Find the probability of exactly 55 heads in 100 tosses of a coin.
๐(๐, ๐, ๐) โ1๐๐๐ ๐(
๐ โ ๐๐๐๐๐ )
๐ 100,12, 55 โ
15๐ 1 =
15(12๐
๐)&/%)
Standard normal distribution ๐
๐ = 0 and ๐ = 1
๐8 ๐ฅ = &%9:
๐) 5); #/%:# = &%9๐)5#/%.
XC 2020
๐(๐, ๐, 1)๐(๐, ๐, 2)
๐(๐, ๐, 3)๐(๐, ๐, 4)
๐(๐, ๐, 5)
๐(๐, ๐, 0)
โฏ๐(๐, ๐, ๐)
0 1 2 3 4 5 โฆ ๐
0 โ ๐๐๐๐๐
1 โ ๐๐๐๐๐
2 โ ๐๐๐๐๐
3 โ ๐๐๐๐๐
4 โ ๐๐๐๐๐
5 โ ๐๐๐๐๐
โฆ ๐ โ ๐๐๐๐๐
A specific outcome
XC 2020
๐(๐, ๐, 1)๐(๐, ๐, 2)
๐(๐, ๐, 3)๐(๐, ๐, 4)
๐(๐, ๐, 5)
๐(๐, ๐, 0)
โฏ๐(๐, ๐, ๐)
0 1 2 3 4 5 โฆ ๐
0 โ ๐๐๐๐๐
1 โ ๐๐๐๐๐
2 โ ๐๐๐๐๐
3 โ ๐๐๐๐๐
4 โ ๐๐๐๐๐
5 โ ๐๐๐๐๐
โฆ ๐ โ ๐๐๐๐๐
Outcomes in an Interval
XC 2020
Central Limit Theorem for Binomial Distributions
ยง Let ๐' be the number of successes in n Bernoulli trials with probability ๐ for success, and let ๐ and ๐ be two fixed real numbers.
ยง Then
lim'โ21
๐ ๐ โค ,!)'-'-.
โค ๐ = โซ<=๐ ๐ฅ ๐๐ฅ = NA(๐, ๐).
ยง We denote this area by NA(๐, ๐).
๐ โค๐' โ ๐๐๐๐๐
= ๐'โ โค ๐ ๐ ๐๐๐ + ๐๐ โค ๐' โค ๐ ๐๐๐ + ๐๐
XC 2020
lim'โ21
๐ ๐ โค ,!)'-'-.
โค ๐ = NA(๐, ๐).
๐ โค๐' โ ๐๐๐๐๐
= ๐'โ โค ๐ ๐ ๐๐๐ + ๐๐ โค ๐' โค ๐ ๐๐๐ + ๐๐
๐ โค ๐' โค ๐ ๐ =๐ โ ๐๐๐๐๐
โค ๐'โ โค๐ โ ๐๐๐๐๐
= ๐
lim'โ21
๐ ๐ โค ๐' โค ๐ = lim'โ21
๐ *)'-'-.
โค ๐'โ โค>)'-'-.
= NA(*)'-'-.
, >)'-'-.
).
Approximating Binomial Distributions
XC 2020
๐ โค ๐' โค ๐ ๐ =๐ โ ๐๐๐๐๐
โค ๐'โ โค๐ โ ๐๐๐๐๐
= ๐
lim'โ21
๐ ๐ โค ๐' โค ๐ = lim'โ21
๐ *)'-'-.
โค ๐'โ โค>)'-'-.
= NA(*)'-'-.
, >)'-'-.
).
Approximating Binomial Distributions (more accurate)
lim@โBC
๐ ๐ โค ๐@ โค ๐ = NA(DE!"E@F
@FG,HB!"E@F
@FG).
๐ ๐
XC 2020
Example
Toss a coin
A coin is tossed 100 times. Estimate the probability that the number of heads lies between 40 and 60 (including the end points).
=lim'โ21
๐ ๐ โค ๐' โค ๐ = NA(๐ โ 12 โ ๐๐
๐๐๐ ,๐ + 12 โ ๐๐
๐๐๐ )
XC 2020
Toss a coin
A coin is tossed 100 times. Estimate the probability that the number of heads lies between 40 and 60 (including the end points).
๐ = 100 ๐ =12
๐๐ = 50 ๐๐๐ = 5
๐ = 40
๐ โ 12 โ ๐๐๐๐๐
= โ2.1
๐ = 60
๐ + 12 โ ๐๐๐๐๐
= 2.1
the outcome will not deviate by more than two standard deviations from the expected value
XC 2020
Toss a coin
A coin is tossed 100 times. Estimate the probability that the number of heads lies between 40 and 60 (including the end points).
๐ = 40
๐ โ 12 โ ๐๐๐๐๐ = โ2.1
๐ = 60
๐ + 12 โ ๐๐๐๐๐ = 2.1
=lim$โ&'
๐ ๐ โค ๐$ โค ๐ = NA(๐ โ 12 โ ๐๐
๐๐๐,๐ + 12 โ ๐๐
๐๐๐)
=lim$โ&'
๐ 40 โค ๐$ โค 60 = NA โ2.1,2.1= 2NA 0, 2.1
2๐ 2.1๐
XC 2020
Approximating Binomial Distribution
๐บ๐โ = ๐lim$โ'
๐๐๐๐ ๐, ๐, ๐๐ + ๐ฅ ๐๐๐ = ๐(๐ฅ)
๐บ๐ = ๐
๐(๐, ๐, ๐) โ1๐๐๐ ๐(
๐ โ ๐๐๐๐๐ )
๐ โค ๐บ๐โ โค ๐
lim'โ*,
๐ ๐ โค -!.'/'/0
โค ๐ = NA(๐, ๐).
๐ โค ๐บ๐ โค ๐
lim$โ&'
๐ ๐ โค ๐$ โค ๐ = NA(()!")$*
$*+,,&!")$*
$*+).
CLT
XC 2020
CENTRAL LIMIT THEOREM FOR DISCRETE INDEPENDENT TRIALS
any independent trials process such that the individual trials have finite variance
XC 2020
The Standardized Sum of ๐4
ยง Consider an independent trials process with common distribution function ๐ ๐ฅdefined on the integers, with expected value ๐ and variance ๐%.
ยง Let ๐' = ๐& + ๐% +โฏ+ ๐' be the sum of ๐ independent discrete random variables of the process.
ยง The standardized sum of ๐' is given by
๐'โ =,!)';':#
.
ยง ๐'โ always has expected value 0 and variance 1.
๐ธ(๐') = โฏ
๐(๐') = โฏ
XC 2020
independent trials process
๐(๐' = ๐)
๐ ๐ฅ)=๐ โ ๐๐๐๐%
=๐'โ =๐' โ ๐๐๐๐%
normal
๐๐-๐(๐$ = ๐)
XC 2020
Approximation Theorem
ยง Let ๐&, ๐%, โฏ, ๐' be an independent trials process and let ๐' = ๐& + ๐% +โฏ+ ๐'.
ยง Assume that the greatest common divisor of the differences of all the values that the ๐( can take on is 1.
ยง Let ๐ธ ๐( = ๐ and ๐ ๐( = ๐%.
ยง Then for ๐ large,
๐(๐' = ๐) โ &':#
๐(()';':#
).
ยง Here ๐(๐ฅ) is the standard normal density.
XC 2020
Central Limit Theorem for a Discrete Independent Trials Process
ยง Let ๐' = ๐& + ๐% +โฏ+ ๐' be the sum of n discrete independent random variables with common distribution having expected value ๐ and variance ๐%.
ยง Then, for ๐ < ๐,
lim'โ21
๐ ๐ โค ,!)';':#
โค ๐ = &%9โซ<
= ๐)5#/%๐๐ฅ = NA(๐, ๐).
XC 2020
Approximating a Discrete Independent Trials Process
๐(๐' = ๐) โ1๐๐%
๐(๐ โ ๐๐๐๐%
)
lim'โ*,
๐ ๐ โค ๐' โค ๐ =
NA(1.'2'3"
, 4.'2'3"
).
๐บ๐ = ๐
๐ โค ๐บ๐ โค ๐
CLT
XC 2020
Approximating a Discrete Independent Trials Process
lim'โ*,
๐ ๐ โค ๐' โค ๐ =
NA(5.#".'/
'/0,6*#".'/
'/0).
๐ โค ๐บ๐ โค ๐
๐(๐' = ๐) โ1๐๐%
๐(๐ โ ๐๐๐๐%
)
๐บ๐ = ๐
๐(๐, ๐, ๐) โ1๐๐๐
๐(๐ โ ๐๐๐๐๐
)
๐บ๐ = ๐
lim'โ*,
๐ ๐ โค ๐' โค ๐ =
NA(1.'2'3"
, 4.'2'3"
).
๐ โค ๐บ๐ โค ๐
Bernoulli
XC 2020
ExampleA surveying instrument makes an error of -2, -1, 0, 1, or 2 feet with equal probabilities when measuring the height of a 200-foot tower.
XC 2020
Find the expected value and the variance for the height obtained using this instrument once.
Example
height = error + 200
error -2 -1 0 1 2
probability15
15
15
15
15
๐ธ error =15โ2 โ 1 + 0 + 1 + 2 = 0
๐ error =15(โ2)%+(โ1)%+0% + 1% + 2% = 2
๐ธ height= 0 + 200 = 200
๐ height = 2
XC 2020
Estimate the probability that in 18 independent measurements of this tower, the average of the measurements is between 199 and 201, inclusive.
Example
๐ = 200 ๐% = 2
=lim'โ21
๐ ๐ โค ๐' โค ๐ = NA(๐ โ ๐๐๐๐%
,๐ โ ๐๐๐๐%
).
199 โค๐'๐=๐'18
โค 201๐ = 18
XC 2020
Example
๐ = 200 ๐% = 2
=lim'โ*,
๐ ๐ โค ๐' โค ๐ = NA(๐ โ ๐๐๐๐%
,๐ โ ๐๐๐๐%
).
199 โค๐'๐=๐'18
โค 201๐ = 18
๐ 199 โค๐'18
โค 201 = ๐ 3582 โค ๐' โค 3618
โ NA3582 โ 3600
36,3618 โ 3600
36= NA(โ3, 3).
XC 2020
03
01
02
ยง independent
Discrete Random Variables
ยง independentยง identical
Bernoulli Trials
ยง independentยง identical
Discrete Trials
Central Limit Theorem
XC 2020
Central Limit Theorem
ยง Let ๐&, ๐%, โฏ, ๐' be a sequence of independent discrete random variables. There exist a constant ๐ด, such that ๐* โค ๐ด for all ๐.
ยง Let ๐' = ๐& + ๐% +โฏ+ ๐' be the sum. Assume that ๐' โ โ.
ยง For each ๐, denote the expected value and variance of ๐* by ๐* and ๐*%, respectively.
ยง Define the expected value and variance of ๐' to be ๐' and ๐ '%, respectively.
ยง For ๐ < ๐,
lim'โ21
๐ ๐ โค ,!)@!A!
โค ๐ = &%9โซ<
= ๐)5#/%๐๐ฅ = NA(๐, ๐).
XC 2020
CENTRAL LIMIT THEOREM FOR CONTINUOUS INDEPENDENT TRIALScontinuous random variables with a common density function
XC 2020
Central Limit Theorem
ยง Let ๐' = ๐& + ๐% +โฏ+ ๐' be the sum of ๐ independent continuous random variables with common density function ๐ having expected value ๐ and variance ๐%.
ยง Let
๐'โ =,!)';':#
.
ยง Then we have, for all ๐ < ๐,
lim'โ21
๐ ๐ โค ๐'โ โค ๐ = &%9โซ<
= ๐)5#/%๐๐ฅ = NA(๐, ๐).
XC 2020
Example
ยง Suppose a surveyor wants to measure a known distance, say of 1 mile, using a transit and some method of triangulation.
ยง He knows that because of possible motion of the transit, atmospheric distortions, and human error, any one measurement is apt to slightly in error.
ยง He plans to make several measurements and take an average.
ยง He assumes that his measurements are independent random variables with common distribution of mean ๐ = 1 and standard deviation ๐ =0.0002.
XC 2020
Example
ยง He can say that if ๐ is large, the average ,!
'has a density function that
is approximately normal, with mean 1mile, and standard deviation 0.000%
'miles.
ยง How many measurements should he make to be reasonably sure that his average lies within 0.0001 of the true value?
XC 2020
Example
๐ = 1
Expected Value
๐ = 0.0002
Standard Deviation๐' = ๐& + ๐% +โฏ+ ๐'
Sum of ๐ independent measurements
ยง He can say that if ๐ is large, the average ,!' has a density function that is approximately normal, with mean 1 mile, and standard deviation 0.000%
'miles.
๐U = (0.0002)UVariance ๐ด' =
๐& + ๐% +โฏ+ ๐'๐
๐ธ ๐ด' = ๐๐ท ๐ด' =
๐๐
๐ ๐ด' =๐%
๐
Average of ๐ independent measurements
XC 2020
Example
Chebyshev inequality
LLN
๐'โ =๐' โ ๐๐๐๐%
CLT
ยง How many measurements should he make to be reasonably sure that his average lies within 0.0001 of the true value?
XC 2020
๐๐
๐บ๐
๐ท(|๐ฟ โ ๐|โฅ ๐บ)
inequalityChebyshev๐ท(|๐ฟ โ ๐| โฅ ๐บ) โค
๐๐
๐บ๐
This is a sample text. Insert your desired text
here.
This is a sample text. Insert your desired text here.
Let ๐ be a discrete random variable with expected value ๐ = ๐ธ(๐) and variance ๐% = ๐(๐), and let ๐ > 0 be any positive number. Then
not necessarily positive
XC 2020
Example
Chebyshev inequality
LLN
ยง How many measurements should he make to be reasonably sure that his average lies within 0.0001 of the true value?
๐ = 1 ๐% = (0.0002)%
=๐(|๐ โ ๐| โฅ ๐) โค๐%
๐%
๐ ๐ด' โ ๐ โฅ 0.0001 โค๐%
๐๐%=
0.0002 %
๐ 0.0001 % =4๐
XC 2020
Example
Chebyshev inequality
LLN
ยง How many measurements should he make to be reasonably sure that his average lies within 0.0001 of the true value?
=๐(|๐ โ ๐| โฅ ๐) โค๐%
๐%
๐ ๐ด' โ ๐ โฅ 0.0001 โค๐%
๐๐%=4๐= 0.05 ๐ = 80
XC 2020
Example
๐'โ =๐' โ ๐๐๐๐%
CLT
=lim'โ21
๐ ๐ โค ๐'โ โค ๐ = NA(๐, ๐)
๐ ๐ด' โ ๐ โค 0.0001 = ๐ ๐' โ ๐๐ โค 0.5๐๐
= ๐๐' โ ๐๐๐๐%
โค 0.5 ๐ = ๐ โ0.5 ๐ โค ๐'โ โค 0.5 ๐
โ NA โ0.5 ๐, 0.5 ๐
๐ = 1 ๐% = (0.0002)%
XC 2020
Example
๐'โ =๐' โ ๐๐๐๐%
CLT
=lim'โ21
๐ ๐ โค ๐'โ โค ๐ = NA(๐, ๐)
๐ ๐ด' โ ๐ โค 0.0001 โ NA โ0.5 ๐, 0.5 ๐ = 0.95
0.5 ๐ = 2 ๐ = 16
XC 2020
Example
Chebyshev inequality
LLN
๐'โ =๐' โ ๐๐๐๐%
CLT
ยง How many measurements should he make to be reasonably sure that his average lies within 0.0001 of the true value?
๐ = 80 ๐ = 16
XC 2020