Introduction
Introduction To Laws of Large Numbers
Weak Law of Large Numbers
Strong Law
Strongest Law
Examples
Information Theory
Statistical Learning
Appendix
Random Variables
Working with R.V.’s
Independence
Limits of Random Variables
Modes of Convergence
Chebyshev
An Introduction to Laws of Large Numbers
John, CVGMI Group
September 13, 2012
Contents
1 Introduction
2 Introduction To Laws of Large Numbers
   Weak Law of Large Numbers
   Strong Law
   Strongest Law
3 Examples
   Information Theory
   Statistical Learning
4 Appendix
   Random Variables
   Working with R.V.’s
   Independence
   Limits of Random Variables
   Modes of Convergence
   Chebyshev
Intuition
We’re working with random variables. What could we observe? A sequence of random variables {X_n}_{n=1}^∞ ... More specifically:
A Bernoulli sequence, looking at the probability that the average of the sum of N random events lands in an interval: P(x_1 < (1/N) ∑_{i=1}^N X_i < x_2).
Coin flipping, anybody? As N increases we see that the observed proportions of 0s and 1s become approximately equal.
This is intuitive: as the number of samples increases, the average observation should tend toward the theoretical mean.
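The slides contain no code, so here is a minimal simulation sketch of the coin-flipping intuition (the function name, seed, and sample sizes are my own choices, not from the slides): the sample average of fair-coin flips settles near the theoretical mean 1/2 as N grows.

```python
import random

def running_mean(n_flips, seed=0):
    """Average of n_flips fair-coin flips (heads = 1, tails = 0)."""
    rng = random.Random(seed)
    return sum(rng.randint(0, 1) for _ in range(n_flips)) / n_flips

# As N grows, the sample average should settle near the theoretical mean 1/2.
for n in (10, 1_000, 100_000):
    print(n, running_mean(n))
```

Running this shows the deviation from 1/2 shrinking as the number of flips increases, which is exactly the behavior the laws below make precise.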
Weak Law
Let’s try to work out this most basic fundamental theorem of random variables arising from repeatedly observing a random event. How do we build such a theorem?
In a black-box scenario we want to be able to use independence.
We want to be able to use variance and expectation: so let’s take X_n ∈ L² for all n, and associate with each X_n its mean, denoted µ_n, and its variance, denoted σ_n².
We will create new random variables from the sequence and work with these; aim for results in terms of them.
This seems like a start; let’s try to prove a theorem...
Weak Law
Theorem
Let (X_n) be a sequence of independent L² random variables with means µ_n and variances σ_n². Then ∑_{i=1}^n (X_i − µ_i) → 0.
So a sequence of functions on Ω, the sample space, is going toward the function that is identically zero...
Can we prove this?
We haven’t used (σ_n): we’ll clearly need these, since otherwise the X_i are wildly unpredictable.
IDEA: Constrain lim_n ∑_{i=1}^n σ_i² = 0.
Counterexample
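The figure for this slide did not survive extraction. As a hedged sketch (my own construction, not necessarily the slide’s figure) of one natural counterexample to the unnormalized claim: for fair ±1 coins the centered sum ∑_{i=1}^n (X_i − µ_i) is a random walk whose typical magnitude grows like √n, so it does not tend to 0 without the 1/n normalization.

```python
import random

def centered_sum(n, seed=0):
    """Sum of n centered fair coin flips: X_i in {-1, +1} with mean 0."""
    rng = random.Random(seed)
    return sum(rng.choice((-1, 1)) for _ in range(n))

def mean_abs(n, trials=100):
    """Average |sum| over independent runs; grows roughly like sqrt(n)."""
    return sum(abs(centered_sum(n, seed=s)) for s in range(trials)) / trials

# The unnormalized centered sum is a random walk: its typical magnitude
# grows with n instead of shrinking to 0, while the 1/n-normalized
# average does go to 0.
print(mean_abs(100), mean_abs(10_000))
```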
Weak Law: Pitfalls
We haven’t specified a mode of convergence: this is a technical point, but if you recall uniform, pointwise, a.e., and normed convergence from analysis, the goal of the proof can change radically.
IDEA: We want our rule to hold with high probability, that is, it should hold for essentially all values of ω ∈ Ω. Convergence in the normed space is pretty ambitious at this point.
We haven’t specified a rate of convergence.
The rate of convergence of the σ_n should imply something about the rate of convergence of the X_n − µ_n. The X_n − µ_n should converge more slowly than the σ_n.
IDEA: Weaken the constraint to lim_n n⁻² ∑_{i=1}^n σ_i² = 0.
The X_n − µ_n should converge on average.
Law of Large Numbers 2.0
Theorem
Let (X_n) be a sequence of independent L² random variables with means µ_n and variances σ_n². Then (1/n) ∑_{i=1}^n (X_i − µ_i) → 0 in measure if lim_n (1/n²) ∑_{i=1}^n σ_i² = 0.
Let Y_n(ω) = (1/n) ∑_{i=1}^n (X_i(ω) − µ_i). Then E(Y_n) = 0 and σ²(Y_n) = (1/n²) ∑_{i=1}^n σ_i² by independence. Now we can use the limit of σ²(Y_n) from the hypothesis, and need to prove
P(ω : |Y_n(ω)| > ε) ≤ σ²(Y_n)/ε² → 0 as n → ∞.
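As a quick numerical sanity check (my own sketch; the distribution, sample sizes, and seed are assumptions), we can compare the empirical frequency of |Y_n| > ε against the Chebyshev-style bound σ²(Y_n)/ε² for centered fair ±1 coins, where σ²(Y_n) = 1/n.

```python
import random

def freq_exceeds(n, eps, trials=2000, seed=0):
    """Empirical P(|Y_n| > eps) for Y_n = (1/n) * sum of centered fair coins."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y = sum(rng.choice((-1, 1)) for _ in range(n)) / n
        if abs(y) > eps:
            hits += 1
    return hits / trials

# For X_i in {-1, +1}, sigma^2(Y_n) = 1/n, so the bound is 1/(n * eps^2).
n, eps = 400, 0.2
print(freq_exceeds(n, eps), 1 / (n * eps * eps))
```

The empirical frequency sits well below the bound, which is loose but sufficient for the proof.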
Aside: Lp bounds
What does
P(ω : |Y_n(ω)| > ε) ≤ σ²(Y_n)/ε² → 0 as n → ∞
actually say?
The measure of the part of Ω where |Y_n| > ε is bounded by the integral of Y_n² divided by ε². Thinking about this makes the inequality transparent: it is Chebyshev’s inequality.
Better bounds can be obtained: Chernoff bounds.
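To make “better bounds” concrete, here is a small comparison sketch (my own example, not from the slides) for the mean of n fair {0,1} coins: Chebyshev gives a polynomial-in-n bound, while Hoeffding’s inequality, a Chernoff-type bound, is exponential in n.

```python
import math

def chebyshev_bound(n, eps):
    """Chebyshev: P(|mean - 1/2| >= eps) <= Var/eps^2 = 1/(4 n eps^2) for fair {0,1} coins."""
    return 1 / (4 * n * eps * eps)

def hoeffding_bound(n, eps):
    """Hoeffding's inequality, a Chernoff-type exponential bound: 2 exp(-2 n eps^2)."""
    return 2 * math.exp(-2 * n * eps * eps)

# The exponential bound eventually crushes the polynomial one as n grows.
for n in (100, 1_000, 10_000):
    print(n, chebyshev_bound(n, 0.1), hoeffding_bound(n, 0.1))
```

Note that for small n the Chebyshev bound can actually be tighter; the Chernoff-type bound wins asymptotically.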
Law of Large Numbers 2.1
Theorem
Let (X_n) be a sequence of independent L² random variables with means µ_n and variances σ_n². Then (1/n) ∑_{i=1}^n (X_i − µ_i) → 0 in measure if lim_n n⁻² ∑_{i=1}^n σ_i² = 0.
⇓ weakened; use Kolmogorov’s maximal inequality P(max_k |∑_{i=1}^k (X_i − µ_i)| ≥ ε) ≤ ε⁻² ∑_{i=1}^n σ_i²
Theorem
Let (X_n) be a sequence of independent L² random variables with means µ_n and variances σ_n². Then (1/n) ∑_{i=1}^n (X_i − µ_i) → 0 a.e. if ∑_{i=1}^∞ σ_i²/i² < ∞.
Khinchine’s Law
Theorem
Let (X_n) be a sequence of independent, identically distributed L¹ random variables with common mean µ. Then (1/n) ∑_{i=1}^n (X_i − µ) → 0 a.e. as n → ∞.
Large Sample Size in Coding
Now we need to use the laws. Start with an example in source coding.
C : X(Ω) → Σ*, encoding events as strings over an alphabet Σ.
It is interesting to know how complex C must be to encode (Y_i)_{i=1}^N.
Entropy: H(X) = E[−lg p(X)]; a representation of how uncertain a r.v. is.
Problem: p(·) is unknown to the encoder and the distribution needs to be learned. The WLLN can be used to answer: how uncertain is the distribution?
Asymptotic Equipartition Property
Theorem
(Asymptotic Equipartition Property) If X_1, ..., X_n are i.i.d. ∼ p(x), then
−(1/n) lg p(X_1, ..., X_n) → H(X).
The X_i are i.i.d. ∼ p(x), so the lg p(X_i) are i.i.d. The weak law of large numbers says that
P(| −(1/n) lg(∏_i p(X_i)) − E[−lg p(X)] | > ε) → 0
P(| −(1/n) ∑_i lg p(X_i) − E[−lg p(X)] | > ε) → 0
so the sample entropy approaches the true entropy in probability as the sample size increases.
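A minimal simulation sketch of the AEP (the Bernoulli parameter, seed, and sample size are my own choices): the per-symbol log-loss of an i.i.d. Bernoulli source converges to its entropy.

```python
import math
import random

def sample_entropy(n, p=0.3, seed=0):
    """-(1/n) lg p(x_1, ..., x_n) for n i.i.d. Bernoulli(p) draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = 1 if rng.random() < p else 0
        total += -math.log2(p if x == 1 else 1 - p)
    return total / n

# True entropy of a Bernoulli(0.3) source, H = E[-lg p(X)].
H = -(0.3 * math.log2(0.3) + 0.7 * math.log2(0.7))
print(sample_entropy(100_000), H)
```

For large n the two numbers agree closely, which is the AEP in action.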
Statistical Learning
We want to minimize R(f) = E(l(x, y, f(x))), e.g. l : (x, y, f) → (1/2)|f(x) − y|. We are stuck minimizing over all f, under a distribution we don’t know... hopeless...
IDEA: Take the Law of Large Numbers and apply it to this framework, and hopefully R_emp(f) = (1/m) ∑_i l(x_i, y_i, f(x_i)) → R(f).
Use Lp bounds to prove convergence results on testing error.
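A toy sketch of R_emp(f) → R(f) (entirely my own setup: the predictor f ≡ 0, the label distribution y ~ Uniform(−1, 1), and the seed are assumptions chosen so the true risk is computable by hand):

```python
import random

def empirical_risk(m, seed=0):
    """R_emp(f) = (1/m) sum_i (1/2)|f(x_i) - y_i| for the toy predictor f == 0."""
    rng = random.Random(seed)
    return sum(0.5 * abs(0.0 - rng.uniform(-1, 1)) for _ in range(m)) / m

# With y ~ Uniform(-1, 1) and f(x) = 0, the true risk is E[(1/2)|y|] = 1/4;
# by the law of large numbers R_emp(f) should approach it as m grows.
print(empirical_risk(100), empirical_risk(100_000), 0.25)
```

This is the LLN for a single fixed f; uniform convergence over a class of f’s is the harder question that statistical learning theory addresses.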
Mini Appendix: Measure Theory
I won’t go into too much detail in this regard. If we’ve made it this far, then great.
(X, A) is a measurable space if A ⊂ 2^X is closed under countable unions and complements. These correspond to ’measurable events’.
(X, A) with µ is a measure space if µ : A → [0, ∞) is countably additive over disjoint sets: µ(∪_i A_i) = ∑_i µ(A_i) if A_i ∩ A_j = ∅ for i ≠ j.
More great properties fall out of this quickly: the measure of the empty set is zero, measures are monotonic under the containment (partial) ordering, and even dominated and monotone convergence (of sets) come out of this.
Chebyshev’s Inequality: let f ∈ Lp(R) and denote E_α = {x : |f(x)| > α}; then ||f||_p^p ≥ ∫_{E_α} |f|^p ≥ α^p µ(E_α) by monotonicity and positivity of measures.
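The chain ||f||_p^p ≥ ∫_{E_α} |f|^p ≥ α^p µ(E_α) can be checked mechanically on a finite sample space with uniform measure (a sketch with made-up sample values; the function name is mine):

```python
def chebyshev_sides(values, alpha, p=2):
    """Return (alpha^p * mu(E_alpha), ||f||_p^p) for f on a uniform finite measure."""
    n = len(values)
    lhs = (alpha ** p) * sum(1 for v in values if abs(v) > alpha) / n
    rhs = sum(abs(v) ** p for v in values) / n
    return lhs, rhs

# The inequality alpha^p * mu(E_alpha) <= ||f||_p^p holds for any sample:
# on E_alpha, |f|^p > alpha^p pointwise, and extending the integral to all
# of X only adds nonnegative mass.
vals = [0.1, -0.4, 2.0, 3.5, -1.2, 0.0, 5.0]
lhs, rhs = chebyshev_sides(vals, alpha=1.0)
print(lhs, rhs)
```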
L2 R.V.’s: Definition
We say that a function f : X → R belongs to the function space Lp(X) if ||f||_p = (∫ |f(x)|^p dx)^{1/p} < ∞, and say f has finite Lp norm.
Definition
The fundamental object being considered in statistics is the random variable. Given a measure space (Ω, B, µ), X : Ω → R is an L² random variable if
{X < r} ∈ B ∀r ∈ R,
µ(Ω) = 1, and (∫ X(ω)² dµ(ω))^{1/2} < ∞.
We may also say that a random variable with finite 2nd moment is a Borel-measurable real-valued function in L²(Ω).
L2 R.V.’s: Questions
Why B? Physical paradoxes, e.g. Banach-Tarski. We want to talk about physical events ∼ measurable sets, and B is as descriptive as we’d like most of the time.
What does µ do here? µ weights the events ∼ measurable sets; it puts values to {ω ∈ Ω : X(ω) < r}, where Ω is the sample space. X maps random events (patterns occurring in data) to the reals, which have a good standard notion of measure. So X induces the distribution P(B) = µ(X⁻¹(B)), which gives
P(X(ω) ∈ B) = µ(X⁻¹(B))
as expected. This is usually written without the sample variable.
For more sophisticated questions/answers see MAA 6616 or STA 7466, but discussion is encouraged.
Statistics: Functions of Measured Events
Now we can start building.
Theorem
If f : R → R is measurable, then ∫ f(x) dP = ∫ f(X(ω)) dµ,
i.e. the random variable pushes forward a measure onto R, and the integrals of measurable functions of random variables are therefore ultimately determined by real integration.
This may be proved using Dirac measures and measure properties. Following these lines we may develop the basics of the theory of probability from measure theory. (OK, homework.)
Statistics: Moments
Definition
Define the first moment as the linear functional
E(X) = ∫ t dP.
Then the pth central moment is a functional given by
m_p(X) = ∫ (t − E(X))^p dP.
Note that when X ∈ L² these are well defined (for p ≤ 2) by the theorem above. Pullbacks are omitted. Since we’re talking about L², let’s define the second central moment as σ² for convenience. If we need higher moments, sadly, mathematics says no: an L² variable need not have them.
Statistics: Independence
Now that we know what random variables are, let’s try to define how a collection of them interact in the most basic way possible.
Definition
Let X = {X_α} be a collection of random variables. Then X is independent if
P(∩_α X_α⁻¹(B_α)) = P((X_1, ..., X_n)⁻¹(B_1 × ... × B_n))
= (µ ∘ X_1⁻¹ × ... × µ ∘ X_n⁻¹)(B_1 × ... × B_n).
(Head explodes.) Really this just means that the X_α are independent if their joint distribution is given by the product measure of the induced measures.
Limits: Repeating Independent Trials
The significance of the belabored definition of independence is that a joint distribution (just a distribution over several random variables) of independent variables contains zero information about how the variables are related.
If we have an infinite collection of random variables, independent of each other, then in spite of independence we would like to be able to infer information about the lower order moments from the higher order moments and vice versa.
If we have ideal conditions on the random variables, then we should be able to deduce information about moments from a very large sequence.
The type of convergence we get should be sensitive to the hypotheses.
Modes of Convergence
We clearly need to specify how the means are converging, right? Why?
So how could ∑_{i=1}^N (X_i − µ_i)/N → 0?
In measure: lim_N P(|∑_{i=1}^N (X_i − µ_i)/N| ≥ ε) → 0
In norm: lim_N (∫ |∑_{i=1}^N (X_i − µ_i)/N|^p dµ(ω))^{1/p} → 0
A.E.: P({ω : lim_N |∑_{i=1}^N (X_i(ω) − µ_i)/N| > 0}) = 0
[Diagram: implications among the modes of convergence (a.e., in measure µ, Lp); convergence in measure yields an a.e.-convergent subsequence f_{n_k}.]
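The distinction between convergence in measure and a.e. convergence is captured by the classical “typewriter” sequence of indicator functions on [0, 1) (a standard example, coded as my own sketch; function names are mine):

```python
def typewriter_interval(n):
    """n-th block of the typewriter sequence: write n = 2^k + j, use [j/2^k, (j+1)/2^k)."""
    k = n.bit_length() - 1
    j = n - 2 ** k
    return j / 2 ** k, (j + 1) / 2 ** k

def f(n, x):
    """Indicator of the n-th typewriter interval, a function on [0, 1)."""
    a, b = typewriter_interval(n)
    return 1.0 if a <= x < b else 0.0

# The support has measure 1/2^k -> 0, so f_n -> 0 in measure ...
print([typewriter_interval(n)[1] - typewriter_interval(n)[0] for n in (1, 2, 4, 8, 16)])
# ... yet at any fixed x the sequence hits 1 infinitely often, so f_n(x)
# does not converge pointwise; only a subsequence f_{n_k} converges a.e.
print([n for n in range(1, 64) if f(n, 0.5) == 1.0])
```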
Chebyshev