Econ 311 – Spring 2016, Weeks 2-3: Review of Probability and Statistics
İnsan TUNALI, 9 February 2016
Econ 311 – Econometrics I, Lectures 3-5
REVIEW OF PROBABILITY AND STATISTICS
Draws on:
Stock & Watson, Ch.2, Sections 2.1-2.3Goldberger, Chs. 3-4.
From the syllabus: “The prerequisites for ECON 311 include MATH 201
(Statistics), and ECON 201 (Intermediate Microeconomics). Students who got a grade of C− or below in MATH 201 are strongly advised to work independently to
make up for any deficiency they have during the first two weeks of the semester.”
The probability framework for statistical inference
(a) Population, random variable, and distribution
Random variables and distributions can be classified as:
• Univariate, bivariate, trivariate …
• Discrete, continuous, mixed.
(b) Moments of a distribution (mean, variance, standard deviation,
covariance, correlation)
(c) Conditional distributions and conditional means
(d) Distribution of a sample of data drawn randomly from a population: Y1, …, Yn (subject of another handout).
(a) Population, random variable, and distribution
Population
• The group or collection of all possible entities of interest
• We will think of populations as being “very big” (∞ is an
approximation to “very big”)
Outcomes... sample space... events... MATH 201
Random variable Y
• Numerical summary of a random outcome
Population distribution of Y
• Discrete case: The probabilities of different values of Y that
occur in the population
• Continuous case: Likelihood of particular ranges of Y
How to envision Discrete Probability Distributions
Urn model: Population = Balls in an urn
Each ball has a value (Y ) written on it;
Y has K distinct values: y1, y2, ..., yi, ..., yK .
Suppose we were to sample from this (univariate) population, with
replacement, infinitely many times…
(Population) Distribution of Y :
pi = Pr(Y = yi), i = 1, 2, …, K .
Gives the proportion of times we encounter a ball with value Y = yi, i = 1, 2, …, K.
Alternate notation: f ( y) = Pr(Y = y); “probability function (p.f.) of Y .”
Clearly pi ≥ 0 for all i, and Σi pi = 1.
Convention: Σi denotes summation over i = 1, 2, …, K.
Examples:
> Gender: M (=0)/F (=1). We prefer the numerical representation…
> Standing: freshman (=1), sophomore (=2), junior (=3), senior (=4).
> Ranges of wages (group them in intervals first).
Cumulative Distribution Function (c.d.f) of Y :
F(y) = Pr(Y ≤ y) = Σ_{yi ≤ y} f(yi) = Σ_{yi ≤ y} pi.

That is, to find F(y) we sum the pi's over all values yi that do not exceed y.
Use of the c.d.f.:
Pr(a < Y ≤ b) = F (b) – F (a).
Important features of the distribution of Y :
(Population) Mean of Y :
μY = E(Y) = Σi yi pi. (Here, again, Σi denotes summation over i = 1, 2, …, K.)
Also known as “the expected value of Y” or simply “expectation of Y.”
Remark: Expectation is a weighted average of the values of Y whereweights are the probabilities with which distinct values occur.
The idea of “weighted averaging” can be extended to functions of Y .
Suppose Z = h(Y ), any function of Y . Then the expected value of h(Y )
is: E ( Z ) = E [h(Y )]
= Σi h( yi) pi.
Thus knowledge of the probability distribution of Y is sufficient for
calculating the expectation of functions of Y as well.

Examples:
(i) Take Z = Y 2. Then
E(Z) = E(Y²) = Σi yi² pi.
(ii) Take Z = (Y– μY )2. Then
E ( Z ) = E [(Y– μY )2] = Σi ( yi – μY )2 pi.
With this choice of h(Y ), we get the:
(Population) Variance of Y :
σY² = V(Y) = E[(Y – μY)²] = Σi (yi – μY)² pi.
In words, variance equals the expected value of (or the expectation
of) “the squared deviation of Y from its mean.”
Example: Suppose random variable Y can take on one of two values, y0 = 0 and y1 = 1 with probabilities p0 and p1.
Since p0 + p1 =1, we may take
Pr (Y = 1) = p and Pr (Y = 0) = 1 – p, 0 < p < 1.
We say Y has a “Bernoulli distribution with parameter p” and
write: Y ~ Bernoulli ( p).
For Y ~ Bernoulli ( p):
μY = E (Y ) = Σi yi pi = (0)(1 – p) + (1)( p) = p;
E(Y²) = Σi yi² pi = (0)²(1 – p) + (1)²(p) = p;

σY² = V(Y) = Σi (yi – μY)² pi
= (0 – p)²(1 – p) + (1 – p)²(p) = … = p(1 – p). ///
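These Bernoulli moments are easy to check numerically with the generic weighted-average formulas for the mean and variance. A minimal sketch (the value p = 0.3 is an arbitrary choice of ours):

```python
# Verify E(Y) = p and V(Y) = p(1 - p) for Y ~ Bernoulli(p),
# using the weighted-average formulas from the text.
p = 0.3                      # arbitrary parameter, 0 < p < 1
values = [0, 1]
probs = [1 - p, p]           # Pr(Y = 0), Pr(Y = 1)

mu = sum(y * q for y, q in zip(values, probs))              # E(Y)
var = sum((y - mu) ** 2 * q for y, q in zip(values, probs)) # V(Y)

print(mu, var)
```

Running the check for other values of p in (0, 1) gives the same agreement with the algebra above.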
(iii) Linear functions: Take Z = a + bY, where a and b are constants.

E(Z) = E(a + bY)
= Σk (a + byk ) pk = a Σk pk + b Σk yk pk
= a + b E (Y ).
In words, expectation of a linear function (of Y) equals the linear function of the expectation (of Y).
Useful algebra: Let Y * = Y – μY , deviation of Y from its population
mean.
This function is linear in Y , as in (iii), where a = – μY , and b = 1.
E (Y *) = – μY + (1) E (Y ) = 0.
In words, expectation of a deviation around the mean is zero.
Next, examine Y*² = (Y – μY)² = Y² + μY² – 2YμY; this function is not linear in Y.

E(Y*²) = E(Y² + μY² – 2YμY)
= E(Y²) + μY² – 2μY E(Y)    (*)
= E(Y²) + μY² – 2μY²
= E(Y²) – [E(Y)]².
In line (*) we exploited the fact that E(.), which involves weighted averaging, is a “linear” operator; thus the expectation of a sum equals the sum of expectations.
From (ii), V (Y ) = E (Y *2); thus
V (Y ) = E (Y 2) – [ E (Y )]2.
In words, the variance of Y equals the “expectation of squared Y” minus the “square of expected Y.”
Finally, let Z = a + bY as in (iii), and consider the deviation of Z from its mean:

Z* = Z – E(Z) = a + bY – [a + bE(Y)] = bY*.
It follows that the variance of Z is related to the variance of Y via:
V(Z) = E(Z*²) = E[(bY*)²] = E(b²Y*²) = b² E(Y*²) = b² V(Y).

In words, the variance of a linear function equals the slope squared times the variance.
Exercise: Well-drilling project.

Based on previous experience, a contractor believes he will find water within 1-5 days, and attaches a probability to each possible outcome. Let T denote the (random amount of) time it takes to complete drilling. The probability distribution (p.f.) of T is:

t = time (days)        1     2     3     4     5
Pr(T = t) = fT(t)     0.1   0.2   0.3   0.3   0.1

(i) Find the cumulative distribution function (c.d.f.) of T and interpret it.

t = time (days)        1     2     3     4     5
FT(t) =
(ii) Find the expected duration of the project and interpret the number you find.

The contractor's total project cost is made up of two parts: a fixed cost of TL 2,000, plus TL 500 for each day taken to complete the drilling.

(iii) Find the expected total project cost.
(iv) Find the variance of the project cost.
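One way to work through parts (i)-(iv) is a short script that applies the weighted-average formulas and the linear-function rules above. This is our sketch, not part of the original exercise; the variable names are ours.

```python
# Well-drilling exercise: c.d.f., E(T), E(cost), V(cost).
t_vals = [1, 2, 3, 4, 5]
probs  = [0.1, 0.2, 0.3, 0.3, 0.1]

# (i) c.d.f.: cumulative sums of the probabilities
cdf = []
running = 0.0
for q in probs:
    running += q
    cdf.append(round(running, 10))

# (ii) expected duration E(T) as a weighted average
ET = sum(t * q for t, q in zip(t_vals, probs))

# (iii) cost C = 2000 + 500*T is linear in T, so E(C) = 2000 + 500*E(T)
EC = 2000 + 500 * ET

# (iv) variance of a linear function equals slope squared times
# the variance, so V(C) = 500**2 * V(T)
VT = sum((t - ET) ** 2 * q for t, q in zip(t_vals, probs))
VC = 500 ** 2 * VT

print(cdf, ET, EC, VC)
```

The c.d.f. answers questions such as Pr(T ≤ 3) directly, and Pr(a < T ≤ b) = FT(b) – FT(a) as noted earlier.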
Prediction: Consider the urn model, where population consists of
balls in an urn. A ball is picked at random. Your task is to guess the
value Y written on it. What would your guess be?

Example: Suppose you had to predict how long a particular well-drilling project would take. What would your guess be?
One of the possible values of T ?
Some other number?
We need more structure. Clearly, prediction is subject to error.
Errors can be costly, and large errors can be more costly.
What is the cost of a poor prediction?
Let “c” be your guess (a number).
Define prediction error as U = Y – c.
We would like to make U small. Since Y is a random variable, U is also a
random variable.
More definitions:
E (U ) = E (Y ) – c = bias of your guess (“c”).
E(U²) = E[(Y – c)²] = mean (expected) squared error of guess c.
Mean Squared Prediction Error criterion: Suppose the objective is to
minimize E (U 2). Then the best predictor (guess) is c = μY = E (Y ).
Proof: Can use calculus.
E(U²) = E[(Y – c)²] = Σi (yi – c)² pi.
Differentiation yields:
∂ E (U 2)/∂c = Σi ∂[( yi – c)2 pi]/∂c = Σi [–2( yi – c) pi].
Setting the derivative to zero yields the first order condition (F.O.C.)
for a minimum: Σi [–2( yi – c) pi] = 0.
That is,
Σi yi pi = c Σi pi.
We know Σi pi = 1 and Σi yi pi = E(Y), so the solution is c = μY. (Check the second-order condition to verify that we located a minimum.) ///
Non-Calculus proof: For brevity let μ = μY and reexamine the
prediction error:
U = Y – c = Y – μ – (c – μ) = Y * – (c – μ),
where Y * = Y – μ as usual. Square both sides and expand:
U 2 = [Y * – (c – μ)]2 = Y *2 + (c – μ)2 – 2Y *(c – μ).
Take expectations, and recall “useful algebra”:
E (U 2) = E [Y *2 + (c – μ)2 – 2Y *(c – μ)]
= E (Y *2) + (c – μ)2 – 2(c – μ) E (Y *)
= V(Y) + (c – μ)².

Since V(Y) > 0 and (c – μ)² ≥ 0, the minimum of E(U²) is obtained by setting c = μ. ///
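The claim can also be illustrated numerically: for a made-up distribution, no guess on a fine grid of candidates beats c = μY. A sketch of ours (the probabilities reuse the well-drilling example; nothing here is from the original proof):

```python
# Numerical illustration that c = E(Y) minimizes E[(Y - c)^2].
y_vals = [1, 2, 3, 4, 5]
probs  = [0.1, 0.2, 0.3, 0.3, 0.1]

mu = sum(y * q for y, q in zip(y_vals, probs))   # E(Y)

def mse(c):
    """E[(Y - c)^2] for guess c, as a weighted average."""
    return sum((y - c) ** 2 * q for y, q in zip(y_vals, probs))

# Scan a grid of candidate guesses: none should beat c = mu
grid = [i / 100 for i in range(0, 601)]
best = min(grid, key=mse)
print(mu, best, mse(mu))
```

Note that mse(mu) equals V(Y), in line with Remark (ii) below.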
Remarks:
(i) If we use the mean squared prediction error, then the population mean (that is, the expectation of the random variable) is the best guess (predictor) of a draw from that population (distribution).

(ii) Variance equals the value of the expected squared prediction error when the population mean is used as the predictor.

(iii) Other criteria may yield different choices of best predictor.
For example, if the criterion were minimization of the
expected absolute prediction error, namely E (|U |), then the
population median would be the best predictor.
How to envision Joint, Marginal, and Conditional Probability Distributions
Urn model: Population = Balls in an urn
Bivariate population: Each ball has a pair of values ( X , Y ) written on it.
X has J distinct values: x1, x2, ..., x j, ..., x J .
Y has K distinct values: y1, y2, ..., yk , ..., yK .
Joint (population) distribution of X and Y :
p jk = Pr( X = x j, Y = yk ), j = 1, 2, …, J ; k = 1, 2, …, K .
Gives the proportion of times we encounter a ball with paired values (xj, yk), j = 1, 2, …, J; k = 1, 2, …, K.
The joint distribution classifies the balls according to values of both
X and Y . To obtain a “marginal” distribution, we reclassify the balls
in the urn according to the distinct values of one “margin”. We
ignore the distinct values of the second margin.
Marginal (population) distribution of X :
p j = Pr( X = x j), j = 1, 2,…, J .
Here we ignore the values of Y, and examine the proportion of times we encounter a ball with value xj, j = 1, 2, …, J.

How to obtain the marginal distribution of X from the joint distribution of X and Y:
p j = Σk p jk , j = 1, 2,…, J .
(Population) Mean of X :
μX = E(X) = Σj xj pj.

(Population) Variance of X:

σX² = V(X) = Σj (xj – μX)² pj.
The marginal distribution of Y, its mean and variance may be
obtained in analogous fashion (write down the formula!).
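The recipe pj = Σk pjk can be sketched in a few lines. The joint probabilities below are made up for illustration (they are not the numbers in S&W Table 2.3):

```python
# Recover marginal distributions, means, and variances from a joint table.
x_vals = [0, 1]          # J = 2 distinct values of X
y_vals = [10, 20, 30]    # K = 3 distinct values of Y
joint = {                # p_jk = Pr(X = x_j, Y = y_k), made-up numbers
    (0, 10): 0.10, (0, 20): 0.20, (0, 30): 0.10,
    (1, 10): 0.15, (1, 20): 0.25, (1, 30): 0.20,
}

# Marginal of X: p_j = sum over k of p_jk (and analogously for Y)
p_x = {x: sum(joint[(x, y)] for y in y_vals) for x in x_vals}
p_y = {y: sum(joint[(x, y)] for x in x_vals) for y in y_vals}

# Mean and variance of X from its marginal distribution
mu_x = sum(x * q for x, q in p_x.items())
var_x = sum((x - mu_x) ** 2 * q for x, q in p_x.items())

print(p_x, p_y, mu_x, var_x)
```

Each marginal sums to 1, as it must for a probability distribution.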
Exercise: Consider S&W Table 2.3, Panel A (see next page).
(i) Verify the derivation of the marginal distributions of A and M .
(ii) Find the means E(A), E(M) and variances V(A), V(M).
[S&W Table 2.3 (Stock & Watson) — table not reproduced here]
To obtain a “conditional” distribution, we first sort the balls
according to one of the two values, and put them in different urns.
We then examine the contents of a specific urn.

To obtain the conditional distributions of Y given X we sort on
distinct values x j:
POPULATION → SUBPOPULATIONS: X = x1, X = x2, …, X = xj, …, X = xJ

Each urn has a distribution of values of Y!
These conditional distributions may be different (hence each
subpopulation may have a different mean and variance). We can
distinguish between them as long as we record the distinct value of X for that urn.
Conditional (population) distribution of Y given X = x j:
pk|j = Pr(Y = yk | X = xj) = pjk / pj,  k = 1, 2, …, K.

The derivation requires pj > 0.
Conditional (population) mean of Y given X = x j:
μY | j = E (Y | X = x j) = Σk yk pk | j.
Conditional (population) variance of Y given X = xj:

σ²Y|j = V(Y | X = xj) = Σk (yk – μY|j)² pk|j.
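The division pk|j = pjk / pj translates directly into code. The joint table below is made up for illustration; the function names are ours.

```python
# Conditional distribution and conditional mean of Y given X = x_j.
x_vals = [0, 1]
y_vals = [10, 20, 30]
joint = {                # made-up joint probabilities
    (0, 10): 0.10, (0, 20): 0.20, (0, 30): 0.10,
    (1, 10): 0.15, (1, 20): 0.25, (1, 30): 0.20,
}
p_x = {x: sum(joint[(x, y)] for y in y_vals) for x in x_vals}

def cond_dist(x):
    """Pr(Y = y | X = x) for each y; requires Pr(X = x) > 0."""
    return {y: joint[(x, y)] / p_x[x] for y in y_vals}

def cond_mean(x):
    """E(Y | X = x): weighted average under the conditional distribution."""
    return sum(y * q for y, q in cond_dist(x).items())

print(cond_dist(0), cond_mean(0), cond_mean(1))
```

Note that the conditional means differ across the two "urns" X = 0 and X = 1, exactly as the text describes.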
The conditional distributions of X given Y = yk , and their conditional
means and variances may be obtained in analogous fashion (write
down the formula you would use!).

Exercises: Consider S&W Table 2.3, Panel B (see page 22 above).
(i) Verify the derivation in Panel B.
(ii) Find E ( M | A) for A = 0, 1 and interpret them.
Practical uses of conditional expectations:
• Consider the conditional distributions given in S&W Table 2.3
(p.70). Suppose you have an old computer. How would you justify buying a new computer?
Hint: Calculate the benefit (reduction in expected crashes) of
switching from an old computer to a new one.
Practical uses of conditional expectations, cont’d:
• Consider the urn model. We obtain a random draw from the joint
distribution of ( X , Y ). We tell you the value of X . What is your best guess of the value of Y ?
Hint: Suppose X = x j. Then an equivalent way of stating the problem
is that Y has been drawn from the urn labeled X = x j.
We saw that “expectation” is a weighted average. In the urn model, if
we focus on urn labelled X = x j, we find the conditional mean using
μY | j = E (Y | X = x j) = Σk yk pk | j, j = 1,2,…, J .
In classifying the balls, the urn labelled X = x j is used with probability
p j = Pr( X = x j). As a consequence:
μY = E(Y) = Σj E(Y | X = xj) Pr(X = xj).

Thus, the expectation (mean) of Y is a weighted average of the
conditional expectations of Y given X = x j , weighted by Pr( X = x j).
We may write: E (Y ) = E X [ E (Y | X )].
This result is known as the Law of Iterated Expectations.
Law of Iterated Expectations:
E (Y ) = E X [ E (Y | X )].
Observe that:
• The “inner” expectation E (Y | X ) is a weighted average of the
different values of y, weighted by the conditional probabilities
Pr(Y = yk | X = x j) (here X is “given”, we know which urn the balls
come from).
• The “outer” expectation E X [.] is a weighted average of the
different values of E(Y | X = xj), weighted by the probabilities Pr(X = xj).
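The Law of Iterated Expectations is easy to verify on any small joint table. A sketch of ours (the joint probabilities are made up):

```python
# Check E(Y) = E_X[E(Y | X)] on a made-up joint distribution.
x_vals = [0, 1]
y_vals = [10, 20, 30]
joint = {
    (0, 10): 0.10, (0, 20): 0.20, (0, 30): 0.10,
    (1, 10): 0.15, (1, 20): 0.25, (1, 30): 0.20,
}
p_x = {x: sum(joint[(x, y)] for y in y_vals) for x in x_vals}

# Marginal route: E(Y) directly from the joint distribution
EY_marginal = sum(y * p for (x, y), p in joint.items())

# Iterated route: weight the conditional means E(Y | X = x) by Pr(X = x)
def cond_mean(x):
    return sum(y * joint[(x, y)] for y in y_vals) / p_x[x]

EY_iterated = sum(p_x[x] * cond_mean(x) for x in x_vals)

print(EY_marginal, EY_iterated)
```

The two routes agree, as the law requires.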
Exercise: Earlier we used the marginal distribution of M to calculate E ( M ). Can you think of another way to compute E ( M )? (see S&W:72)
Functions of jointly distributed random variables:
Let Z = h( X , Y ), a function of two random variables, X and Y .
Suppose the joint distribution of X and Y is known.
Then the expectation of Z can be computed in the usual manner,
as a weighted average:
E(Z) = E[h(X, Y)] = Σi Σj h(xi, yj) Pr(X = xi, Y = yj) = Σi Σj h(xi, yj) pij,
where the probability weights pij, i = 1, 2,…, I and j = 1, 2,…, J are
obtained from the joint distribution.
Exercise: Use S&W Table 2.3 to compute E ( MA).
(Population) covariance:
In a joint distribution, the degree to which two random variables
are related may be measured with the help of covariance:
Cov( X , Y ) = σ XY = E ( X *Y *) = E [( X – μ X )( Y – μY )]
= Σ j Σk ( x j – μ X )( yk – μY ) Pr( X = x j, Y = yk ).
Remark: We took Z = h( X , Y ) = ( X – μ X )(Y – μY ) and found E ( Z )…
Useful algebra:
E ( X *Y *) = E [( X – μ X )( Y – μY )] = …
= E(XY) – E(X)E(Y).

In words, covariance equals the expected value of the product, minus the product of the expectations.
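Both routes to the covariance, the definition and the shortcut just derived, can be checked on a small joint table. The numbers below are made up for illustration:

```python
# Check Cov(X, Y) = E[(X - muX)(Y - muY)] = E(XY) - E(X)E(Y)
# on a made-up joint distribution.
joint = {
    (0, 10): 0.10, (0, 20): 0.20, (0, 30): 0.10,
    (1, 10): 0.15, (1, 20): 0.25, (1, 30): 0.20,
}
EX = sum(x * p for (x, y), p in joint.items())
EY = sum(y * p for (x, y), p in joint.items())
EXY = sum(x * y * p for (x, y), p in joint.items())

# Definition: weighted average of the product of deviations
cov_definition = sum((x - EX) * (y - EY) * p for (x, y), p in joint.items())
# Shortcut: expected product minus product of expectations
cov_shortcut = EXY - EX * EY

print(cov_definition, cov_shortcut)
```

Here the covariance is positive: above-average X values tend to go with above-average Y values in this table, which anticipates the "sign" discussion below.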
(Population) covariance cont’d:
The “sign” of covariance is informative about the nature of the
relation:
If above average values of X go together with above average
values of Y (so that below average values of X go together with
below average values of Y ) covariance will be positive.
If above average values of one variable go together with below
average values of the other, covariance will be negative.
Exercise: Suppose X = weight, Y = height of individuals in a
population. Can you guess the sign of Cov( X , Y ) = ?
(Population) correlation:
The magnitude of covariance is affected by the units of measurement
of the variables. For a unit-free measure, we turn to correlation:
Corr(X, Y) = ρXY = Cov(X, Y) / √[V(X)V(Y)] = σXY / (σX σY).
It can be shown that – 1 ≤ ρ XY ≤ 1.
Random variables are said to be uncorrelated if ρXY = 0. Clearly, for this to happen, σXY = 0 must hold.
Recall that in general E (Y | X ) is a function of X ; it tells us how the
conditional mean of Y given X = x j changes with x j, j = 1, 2, …, J.
Remark: Think about the urn model. Think about prediction.
Suppose E (Y | X ) = E (Y ) = μY , a constant. To describe this case, we say
Y is mean-independent of X .
Claim 1: If Y is mean-independent of X , then σ XY = 0 ( ρ XY = 0).
Proof: E ( XY ) = E (YX ) = E X [ E (YX | X )] = E X [ E (Y | X ) X ];
*When we “condition” on X , we set it equal to a particular value.
If E (Y | X ) = E (Y ), the last expression simplifies:
= E X [ E (Y ) X ] = E (Y ) E ( X ).
We showed:
If Y is mean-independent of X , then E ( XY ) = E (Y ) E ( X ).
Return to the “useful algebra” result for covariance and note that σXY = 0 iff E(XY) = E(X)E(Y). Thus σXY = 0… ///
CAUTION: If σ XY = 0, it does not follow that E (Y | X ) = constant.
Covariance/correlation capture the linear relation between X and Y .
It could be that the relation is non-linear, so that E(Y | X) varies with X, and yet σXY = 0.
Example: Modify the joint distribution in Assignment 2 Part II as:

f(x, y)    x = –1   x = 0   x = 1
y = 1       0.20     0.10    0.20
y = 2       0.10     0.30    0.10

and (re)calculate Cov(X, Y).
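Working this example numerically (our script, using the table above) shows that Cov(X, Y) = 0 even though the conditional mean E(Y | X) clearly varies with X:

```python
# Zero covariance does NOT imply a constant conditional mean.
joint = {(-1, 1): 0.20, (0, 1): 0.10, (1, 1): 0.20,
         (-1, 2): 0.10, (0, 2): 0.30, (1, 2): 0.10}

EX = sum(x * p for (x, y), p in joint.items())
EY = sum(y * p for (x, y), p in joint.items())
# Covariance via the shortcut E(XY) - E(X)E(Y)
cov = sum(x * y * p for (x, y), p in joint.items()) - EX * EY

def cond_mean_y(x0):
    """E(Y | X = x0) from the joint table."""
    px = sum(p for (x, y), p in joint.items() if x == x0)
    return sum(y * p for (x, y), p in joint.items() if x == x0) / px

print(cov, cond_mean_y(-1), cond_mean_y(0), cond_mean_y(1))
```

The symmetry of the table around x = 0 is what kills the covariance: the relation between X and Y is real but non-linear.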
Independence: Random variables X and Y are (statistically)
independent , if knowledge of the value of one of the variables
provides no information about the other. Formally:
From the definition of conditional probabilities,
Pr(Y = y, X = x) = Pr(Y = y | X = x)Pr( X = x).
Thus an equivalent condition for, and implication of independence is:
I.1. X and Y are independently distributed if, for all values of x and y,
Pr(Y = y | X = x) = Pr(Y = y).
I.2. Pr(Y = y, X = x) = Pr(Y = y)Pr( X = x), for all values of x and y.
Claim 2: If X and Y are independently distributed, then
E ( X |Y ) = E ( X ) and E (Y | X ) = E (Y ).
Proof: E(Y | X = xj) = Σk yk pk|j = Σk yk (pjk / pj) = Σk yk (pj pk / pj) = Σk yk pk = E(Y). ///
SUMMARY:
Independence ⇒ Mean-independence ⇒ Zero correlation.
However: We cannot go from right to left!
Stronger condition implies the weaker condition; not the other wayaround.
Additional Linear Function Rules: (S&W Appendix 2.1)
Suppose Z = X + Y. Then, using the weighted-average formula for E[h(X, Y)], it is easy to show
E ( Z ) = E ( X ) + E (Y ).
In words, expectation of a sum equals the sum of expectations.
Continuing, if Z = X + Y , then Z * = X * + Y *, and Z *2 = X *2 + Y *2 + 2 X *Y *,
where the asterisk denotes the deviation from the expectation. So
V ( Z ) = E ( Z *2) = E ( X *2) + E (Y *2) + 2E ( X *Y *)
= V ( X ) + V (Y ) + 2C ( X ,Y ).
In words, variance of a sum equals the sum of the variances plus twice
the covariance.

Exercise: Use the same logic to find the variance of a difference.
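The sum rule, and the analogous rule the exercise asks for, can be checked numerically on any joint table. A sketch of ours (made-up probabilities; the difference rule in the assertions, V(X – Y) = V(X) + V(Y) – 2C(X, Y), is the answer the same logic yields):

```python
# Numerical check of V(X + Y) = V(X) + V(Y) + 2C(X, Y), and of the
# analogous rule for a difference, on a made-up joint distribution.
joint = {
    (0, 10): 0.10, (0, 20): 0.20, (0, 30): 0.10,
    (1, 10): 0.15, (1, 20): 0.25, (1, 30): 0.20,
}

def E(h):
    """Expectation of h(X, Y) as a weighted average over the joint p.f."""
    return sum(h(x, y) * p for (x, y), p in joint.items())

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
VX = E(lambda x, y: (x - EX) ** 2)
VY = E(lambda x, y: (y - EY) ** 2)
CXY = E(lambda x, y: (x - EX) * (y - EY))

# Variances of the sum and the difference, computed directly
V_sum = E(lambda x, y: ((x + y) - (EX + EY)) ** 2)
V_diff = E(lambda x, y: ((x - y) - (EX - EY)) ** 2)

print(V_sum, VX + VY + 2 * CXY)
print(V_diff, VX + VY - 2 * CXY)
```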
Generalizing to linear functions, if
Z = a + bX + cY
where a, b and c are constants, then
E ( Z ) = a + bE ( X ) + cE (Y ),
so the deviation from the expectation is Z * = bX * + cY *, and the
variance of Z is
V ( Z ) = E ( Z *2) = b2V ( X ) + c2V (Y ) + 2bcC ( X ,Y ).
Still more generally, for a pair of random variables
Z 1 = a1 + b1 X + c1Y , Z 2 = a2 + b2 X + c2Y ,
where the a’s, b’s, and c’s are constants, the covariance of Z1 and Z2 is
C ( Z 1, Z 2) = b1b2V ( X ) + c1c2V (Y ) + (b1c2 + b2c1)C ( X ,Y ).
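A final sanity check of this covariance rule, on a made-up joint table with arbitrary constants (all names and numbers are ours):

```python
# Check C(Z1, Z2) = b1*b2*V(X) + c1*c2*V(Y) + (b1*c2 + b2*c1)*C(X, Y).
joint = {
    (0, 10): 0.10, (0, 20): 0.20, (0, 30): 0.10,
    (1, 10): 0.15, (1, 20): 0.25, (1, 30): 0.20,
}

def E(h):
    """Expectation of h(X, Y) over the joint distribution."""
    return sum(h(x, y) * p for (x, y), p in joint.items())

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
VX = E(lambda x, y: (x - EX) ** 2)
VY = E(lambda x, y: (y - EY) ** 2)
CXY = E(lambda x, y: (x - EX) * (y - EY))

# Arbitrary constants for the two linear functions
a1, b1, c1 = 1.0, 2.0, -1.0
a2, b2, c2 = -3.0, 0.5, 4.0

def z1(x, y): return a1 + b1 * x + c1 * y
def z2(x, y): return a2 + b2 * x + c2 * y

EZ1, EZ2 = E(z1), E(z2)
# Covariance computed directly from the definition...
cov_direct = E(lambda x, y: (z1(x, y) - EZ1) * (z2(x, y) - EZ2))
# ...and via the linear-function rule above
cov_formula = b1 * b2 * VX + c1 * c2 * VY + (b1 * c2 + b2 * c1) * CXY

print(cov_direct, cov_formula)
```

The intercepts a1 and a2 drop out, as they must: adding a constant shifts a variable but not its deviations from the mean.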