
Econ 311 – Econometrics I, Spring 2016
İnsan TUNALI, 9 February 2016

Weeks 2-3, Lectures 3-5: REVIEW OF PROBABILITY AND STATISTICS

Draws on: Stock & Watson, Ch. 2, Sections 2.1-2.3; Goldberger, Chs. 3-4.

From the syllabus: “The prerequisites for ECON 311 include MATH 201 (Statistics), and ECON 201 (Intermediate Microeconomics). Students who got a grade of C– or below in MATH 201 are strongly advised to work independently to make up for any deficiency they have during the first two weeks of the semester.”


The probability framework for statistical inference

(a) Population, random variable, and distribution

Random variables and distributions can be classified as:

• Univariate, bivariate, trivariate …
• Discrete, continuous, mixed.

(b) Moments of a distribution (mean, variance, standard deviation,

covariance, correlation)

(c) Conditional distributions and conditional means
(d) Distribution of a sample of data drawn randomly from a population: Y1, …, Yn (subject of another handout).


(a) Population, random variable, and distribution

 Population

• The group or collection of all possible entities of interest
• We will think of populations as being “very big” (∞ is an approximation to “very big”)

Outcomes... sample space... events... MATH 201

 Random variable Y

•  Numerical summary of a random outcome

 Population distribution of Y

• Discrete case: The probabilities of different values of Y  that

occur in the population

• Continuous case: Likelihood of particular ranges of Y


How to envision Discrete Probability Distributions

Urn model: Population = Balls in an urn

Each ball has a value (Y ) written on it;

Y  has K  distinct values:  y1, y2, ..., yi, ..., yK . 

Suppose we were to sample from this (univariate) population, with

replacement, infinitely many times… 
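To make the urn story concrete, here is a minimal Python sketch (not part of the original slides; the values and probabilities are made up) showing that, under sampling with replacement, the sample frequencies settle near the population probabilities:

```python
import random

# Hypothetical urn: K = 4 distinct values y_i with probabilities p_i (made up)
values = [0, 1, 2, 3]
probs  = [0.1, 0.2, 0.3, 0.4]

# Sample with replacement many times; frequencies should approach the p_i's
n = 100_000
draws = random.choices(values, weights=probs, k=n)
for y, p in zip(values, probs):
    freq = draws.count(y) / n
    print(f"y = {y}: population p_i = {p:.2f}, sample frequency = {freq:.4f}")
```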


(Population) Distribution of Y :

 pi = Pr(Y  = yi), i = 1, 2, …, K .

Gives the proportion of times we encounter a ball with value Y = yi, i = 1, 2, …, K.

Alternate notation:  f ( y) = Pr(Y  = y); “probability function (p.f.) of  Y .”

Clearly pi  ≥ 0 for all i, and Σi  pi = 1.

Convention: Σi is shorthand for the sum over i = 1, 2, …, K.

Examples:

> Gender: M (=0)/F (=1). We prefer the numerical representation…

> Standing: freshman (=1), sophomore (=2), junior (=3), senior (=4).

> Ranges of wages (group them in intervals first).


Cumulative Distribution Function (c.d.f.) of Y:

F(y) = Pr(Y ≤ y) = Σ{i: yi ≤ y} f(yi) = Σ{i: yi ≤ y} pi.

That is, to find F(y) we sum the pi's over all values yi that do not exceed y.

Use of the c.d.f.:

Pr(a < Y ≤ b) = F (b) – F (a).

Important features of the distribution of Y :

  (Population) Mean of Y :

μY = E(Y) = Σi yi pi. (Here Σi is the sum over i = 1, …, K.)

Also known as “the expected value of Y” or simply “expectation of Y.”

Remark: Expectation is a weighted average of the values of Y, where the weights are the probabilities with which the distinct values occur.
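A small sketch (made-up numbers, not from the text) of how the c.d.f. and the mean follow mechanically from a p.f.:

```python
# Made-up p.f.: values y_i with probabilities p_i
ys = [1, 2, 3, 4]
ps = [0.1, 0.4, 0.3, 0.2]

def F(y):
    """c.d.f.: sum the p_i's over all values y_i that do not exceed y."""
    return sum(p for yi, p in zip(ys, ps) if yi <= y)

mean = sum(yi * p for yi, p in zip(ys, ps))   # E(Y): probability-weighted average

print(F(2))          # Pr(Y <= 2) = 0.5
print(F(4) - F(1))   # Pr(1 < Y <= 4) = F(b) - F(a) = 0.9
print(mean)          # E(Y) = 2.6
```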


The idea of “weighted averaging” can be extended to functions of Y .

Suppose Z  = h(Y ), any function of Y . Then the expected value of h(Y )

is: E ( Z ) = E [h(Y )]

= Σi h( yi) pi.

Thus knowledge of the probability distribution of Y  is sufficient for

calculating the expectation of functions of Y as well.

Examples:

(i) Take Z = Y². Then

E(Z) = E(Y²) = Σi yi² pi.

(ii) Take Z = (Y – μY)². Then

E(Z) = E[(Y – μY)²] = Σi (yi – μY)² pi.


With this choice of h(Y ), we get the:

  (Population) Variance of Y :

σY² = V(Y) = E[(Y – μY)²] = Σi (yi – μY)² pi.

In words, variance equals the expected value of (or the expectation of) “the squared deviation of Y from its mean.”

Example: Suppose random variable Y  can take on one of two values, y0 = 0 and y1 = 1 with probabilities p0 and p1.

Since p0 + p1 =1, we may take

Pr (Y  = 1) = p and Pr (Y  = 0) = 1 – p, 0 < p < 1.

We say Y  has a “Bernoulli distribution with parameter p” and

write: Y  ~ Bernoulli ( p).


For Y  ~ Bernoulli ( p):

 μY  = E (Y ) = Σi  yi  pi = (0)(1 – p) + (1)( p) = p;

E(Y²) = Σi yi² pi = (0)²(1 – p) + (1)²(p) = p;

σY² = V(Y) = Σi (yi – μY)² pi
= (0 – p)²(1 – p) + (1 – p)²(p) = … = p(1 – p). ///

(iii) Linear functions: Take Z = a + bY, where a and b are constants. Then

E(Z) = E(a + bY) = Σi (a + byi) pi = a Σi pi + b Σi yi pi = a + bE(Y).

In words, the expectation of a linear function (of Y) equals the linear function of the expectation (of Y).
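Both the Bernoulli algebra and the linear-function rule can be checked by direct enumeration. A sketch, with an arbitrary choice of p and of the constants a, b:

```python
p = 0.3                       # arbitrary Bernoulli parameter, 0 < p < 1
ys, ws = [0, 1], [1 - p, p]   # values and probabilities

EY  = sum(y * w for y, w in zip(ys, ws))             # should equal p
EY2 = sum(y**2 * w for y, w in zip(ys, ws))          # should equal p
VY  = sum((y - EY)**2 * w for y, w in zip(ys, ws))   # should equal p(1 - p)
print(EY, EY2, VY, p * (1 - p))

# Linear-function rule: E(a + bY) = a + b E(Y), for arbitrary constants
a, b = 2.0, 5.0
EZ = sum((a + b * y) * w for y, w in zip(ys, ws))
print(EZ, a + b * EY)        # both 3.5
```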


Useful algebra: Let Y * = Y  – μY , deviation of Y  from its population

mean.

This function is linear in Y , as in (iii), where a = – μY , and b = 1.

 E (Y *) = – μY  + (1) E (Y ) = 0.

In words, expectation of a deviation around the mean is zero.

Next, examine Y*² = (Y – μY)² = Y² + μY² – 2YμY; this function is not linear in Y.

E(Y*²) = E(Y² + μY² – 2YμY)
= E(Y²) + μY² – 2μY E(Y)   (*)
= E(Y²) + μY² – 2μY²
= E(Y²) – [E(Y)]².

In line (*) we exploited the fact that E(.), which involves weighted averaging, is a “linear” operator; thus the expectation of a sum equals the sum of expectations.


From (ii), V(Y) = E(Y*²); thus

V(Y) = E(Y²) – [E(Y)]².

In words, the variance of Y equals “expectation of squared Y” minus “square of expected Y”.

Finally, let Z = a + bY as in (iii), and consider the deviation of Z from its mean:

Z* = Z – E(Z) = a + bY – [a + bE(Y)] = bY*.

It follows that the variance of Z  is related to the variance of Y  via:

V(Z) = E(Z*²) = E[(bY*)²] = E(b²Y*²) = b²E(Y*²) = b²V(Y).

In words, the variance of a linear function equals slope squared times the variance.
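The two variance results (the shortcut V(Y) = E(Y²) – [E(Y)]² and V(a + bY) = b²V(Y)) are easy to confirm numerically. A sketch with a made-up p.f. and arbitrary constants:

```python
# Made-up p.f. for Y
ys = [0, 1, 2]
ps = [0.2, 0.5, 0.3]

EY  = sum(y * p for y, p in zip(ys, ps))
EY2 = sum(y**2 * p for y, p in zip(ys, ps))
V_def      = sum((y - EY)**2 * p for y, p in zip(ys, ps))   # definition
V_shortcut = EY2 - EY**2                                    # E(Y^2) - [E(Y)]^2
print(V_def, V_shortcut)     # same number both ways (0.49)

# V(a + bY) = b^2 V(Y), for arbitrary constants a, b
a, b = 10.0, -2.0
zs = [a + b * y for y in ys]         # Z inherits Y's probabilities
EZ = sum(z * p for z, p in zip(zs, ps))
VZ = sum((z - EZ)**2 * p for z, p in zip(zs, ps))
print(VZ, b**2 * V_def)      # both 1.96
```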


Exercise: Well-drilling project . 

Based on previous experience, a contractor believes he will find water

within 1-5 days, and attaches a probability to each possible outcome. Let T denote the (random amount of) time it takes to complete drilling. The probability distribution (p.f.) of T is:

t = time (days)        1     2     3     4     5
Pr(T = t) = fT(t)      0.1   0.2   0.3   0.3   0.1

(i) Find the cumulative distribution function (c.d.f.) of T and interpret it.

t = time (days)        1     2     3     4     5
FT(t)                  __    __    __    __    __


(ii) Find the expected duration of the project and interpret the number you find.

The contractor's total project cost is made up of two parts: a fixed cost of TL 2,000, plus TL 500 for each day taken to complete the drilling.

(iii) Find the expected total project cost.

(iv) Find the variance of the project cost.
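After attempting (i)-(iv) by hand, a short Python sketch can check the arithmetic; it simply applies the definitions and the linear-function rules above to the table (the cost function 2000 + 500·T is taken from the problem statement):

```python
ts = [1, 2, 3, 4, 5]
ps = [0.1, 0.2, 0.3, 0.3, 0.1]

# (i) c.d.f.: running sum of the p_i's
F, cum = {}, 0.0
for t, p in zip(ts, ps):
    cum += p
    F[t] = round(cum, 10)
print(F)                                  # {1: 0.1, 2: 0.3, 3: 0.6, 4: 0.9, 5: 1.0}

# (ii) expected duration
ET = sum(t * p for t, p in zip(ts, ps))   # 3.1 days

# (iii) cost is linear in T, so E(cost) = 2000 + 500 * E(T)
E_cost = 2000 + 500 * ET                  # TL 3,550

# (iv) V(cost) = 500**2 * V(T), by the variance rule for linear functions
VT = sum((t - ET)**2 * p for t, p in zip(ts, ps))
print(ET, E_cost, VT, 500**2 * VT)        # 3.1, 3550.0, 1.29, 322500.0
```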


Prediction: Consider the urn model, where population consists of

 balls in an urn. A ball is picked at random. Your task is to guess the

value Y written on it. What would your guess be?

Example: Suppose you had to predict how long a particular well

drilling project would take. What would your guess be?

One of the possible values of T ?

Some other number?


We need more structure. Clearly, prediction is subject to error.

Errors can be costly, and large errors can be more costly.

What is the cost of a poor prediction?

Let “c”  be your guess (a number). 

Define prediction error  as U  = Y – c.

We would like to make U  small. Since Y is a random variable, U is also a

random variable.

More definitions:

E(U) = E(Y) – c = bias of your guess (“c”).

E(U²) = E[(Y – c)²] = mean (expected) squared error of guess c.

Mean Squared Prediction Error criterion: Suppose the objective is to minimize E(U²). Then the best predictor (guess) is c = μY = E(Y).


Proof: Can use calculus.

E(U²) = E[(Y – c)²] = Σi (yi – c)² pi.

Differentiation yields:

∂E(U²)/∂c = Σi ∂[(yi – c)² pi]/∂c = Σi [–2(yi – c) pi].

Setting the derivative to zero yields the first order condition (F.O.C.) for a minimum: Σi [–2(yi – c) pi] = 0.

That is,

Σi yi pi = c Σi pi.

We know Σi pi = 1 and Σi yi pi = E(Y), so the solution is c = μY. (Check the second order condition to verify that we located a minimum.) ///


Non-Calculus proof: For brevity let μ = μY and reexamine the prediction error:

U = Y – c = Y – μ – (c – μ) = Y* – (c – μ),

where Y* = Y – μ as usual. Square both sides and expand:

U² = [Y* – (c – μ)]² = Y*² + (c – μ)² – 2Y*(c – μ).

Take expectations, and recall “useful algebra”:

E(U²) = E[Y*² + (c – μ)² – 2Y*(c – μ)]
= E(Y*²) + (c – μ)² – 2(c – μ)E(Y*)
= V(Y) + (c – μ)².

Since V(Y) > 0 and (c – μ)² ≥ 0, the minimum of E(U²) is obtained by setting c = μ. ///


Remarks:

(i) If we use the mean squared prediction error, then the population mean (that is, the expectation of the random variable) is the best guess (predictor) of a draw from that population (distribution).

(ii) Variance equals the value of the expected squared prediction error when the population mean is used as the predictor.

(iii) Other criteria may yield different choices of best predictor.

For example, if the criterion were minimization of the

expected absolute prediction error, namely E (|U |), then the

 population median would be the best predictor.
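Remarks (i) and (iii) can be illustrated numerically: scanning over candidate guesses c, the mean squared error is smallest at the population mean, while the mean absolute error is smallest at the population median. A sketch with a made-up, deliberately skewed p.f.:

```python
# Made-up, skewed p.f. so the mean (2.8) and median (2) differ
ys = [1, 2, 3, 10]
ps = [0.3, 0.3, 0.3, 0.1]

def mspe(c):   # E[(Y - c)^2], the mean squared prediction error of guess c
    return sum((y - c)**2 * p for y, p in zip(ys, ps))

def mae(c):    # E|Y - c|, the mean absolute prediction error of guess c
    return sum(abs(y - c) * p for y, p in zip(ys, ps))

grid = [i / 100 for i in range(0, 1101)]   # candidate guesses 0.00, ..., 11.00
print(min(grid, key=mspe))   # 2.8 = population mean
print(min(grid, key=mae))    # 2.0 = population median
```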


How to envision Joint, Marginal, and Conditional Probability Distributions

Urn model: Population = Balls in an urn

Bivariate population: Each ball has a pair of values ( X , Y ) written on it.

 X has J  distinct values:  x1, x2, ..., x j, ..., x J . 

Y  has K  distinct values:  y1, y2, ..., yk , ..., yK . 


Joint (population) distribution of X  and Y :

 p jk  = Pr( X  = x j, Y = yk ),  j = 1, 2, …, J ; k  = 1, 2, …, K .

Gives the proportion of times we encounter a ball with paired values( x j, yk ),  j = 1, 2, …, J ; k  = 1, 2, …, K .

The joint distribution classifies the balls according to values of both

 X and Y . To obtain a “marginal” distribution, we reclassify the balls

in the urn according to the distinct values of one “margin”. We

ignore the distinct values of the second margin.

Marginal (population) distribution of X :

 p j = Pr( X  = x j), j = 1, 2,…, J .


Here we ignore the values of Y, and examine the proportion of times we encounter a ball with values xj, j = 1, 2,…, J.

How to obtain a marginal distribution of X  from the joint distributionof X  and Y :

 p j = Σk   p jk , j = 1, 2,…, J .

(Population) Mean of X :

μX = E(X) = Σj xj pj.

(Population) Variance of X:

σX² = V(X) = Σj (xj – μX)² pj.

The marginal distribution of Y, its mean and variance may be

obtained in analogous fashion (write down the formula!).
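A sketch of the joint-to-marginal computation, using a made-up 2×3 joint table (rows index the xj's, columns the yk's):

```python
# Made-up joint distribution p_jk: rows indexed by x_j, columns by y_k
xs = [0, 1]
ys = [1, 2, 3]
p  = [[0.10, 0.20, 0.10],    # Pr(X = 0, Y = y_k)
      [0.20, 0.25, 0.15]]    # Pr(X = 1, Y = y_k)

pX = [sum(row) for row in p]                       # marginal of X: sum over k
pY = [sum(row[k] for row in p) for k in range(3)]  # marginal of Y: sum over j

EX = sum(x * w for x, w in zip(xs, pX))
VX = sum((x - EX)**2 * w for x, w in zip(xs, pX))
print(pX)        # [0.4, 0.6]
print(pY)        # [0.3, 0.45, 0.25]
print(EX, VX)    # 0.6, 0.24
```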


Exercise: Consider S&W Table 2.3, Panel A (see below).

(i) Verify the derivation of the marginal distributions of A and M.
(ii) Find the means E(A), E(M) and variances V(A), V(M).


[Stock & Watson, Table 2.3 appears here: joint and conditional distributions of M (computer crashes) and A (computer age); table not reproduced.]


To obtain a “conditional” distribution, we first sort the balls

according to one of the two values, and put them in different urns.

We then examine the contents of a specific urn.

To obtain the conditional distributions of Y given X we sort on distinct values xj:

POPULATION → SUBPOPULATIONS (urns labeled X = x1, X = x2, …, X = xj, …, X = xJ).

Each urn has a distribution of values of Y!


These conditional distributions may be different (hence each

subpopulation may have a different mean and variance). We can

distinguish between them as long as we record the distinct value of X for that urn.

Conditional (population) distribution of Y given X = xj:

pk|j = Pr(Y = yk | X = xj) = pjk / pj,  k = 1, 2, …, K.

The derivation requires pj > 0.

Conditional (population) mean of Y given X  = x j:

 μY | j = E (Y | X = x j) = Σk   yk   pk | j.

Conditional (population) variance of Y given X = xj:

σY|j² = V(Y | X = xj) = Σk (yk – μY|j)² pk|j.
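Continuing the sketch from the marginal-distribution example above (same made-up joint table): each conditional distribution of Y given X = xj is a row of the joint table rescaled by its row total pj:

```python
# Same made-up joint table as above
xs = [0, 1]
ys = [1, 2, 3]
p  = [[0.10, 0.20, 0.10],
      [0.20, 0.25, 0.15]]

for j, x in enumerate(xs):
    pj   = sum(p[j])                                 # marginal weight; needs p_j > 0
    cond = [pjk / pj for pjk in p[j]]                # p_{k|j} = p_jk / p_j
    mu   = sum(y * w for y, w in zip(ys, cond))      # E(Y | X = x_j)
    var  = sum((y - mu)**2 * w for y, w in zip(ys, cond))
    print(f"X = {x}: p(y|x) = {cond}, E(Y|X) = {mu:.3f}, V(Y|X) = {var:.3f}")
```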


The conditional distributions of X given Y = yk  , and their conditional

means and variances may be obtained in analogous fashion (write

down the formula you would use!).

Exercises: Consider S&W Table 2.3, Panel B (see the table above).

(i)  Verify the derivation in Panel B.

(ii)  Find E ( M  | A) for A = 0, 1 and interpret them.


Practical uses of conditional expectations:

•  Consider the conditional distributions given in S&W Table 2.3

(p.70). Suppose you have an old computer. How would you justify buying a new computer?

Hint: Calculate the benefit (reduction in expected crashes) of

switching from an old computer to a new one.


Practical uses of conditional expectations, cont’d:

•  Consider the urn model. We obtain a random draw from the joint

distribution of ( X , Y ). We tell you the value of X . What is your best guess of the value of Y ?

Hint: Suppose X  = x j. Then an equivalent way of stating the problem

is that Y  has been drawn from the urn labeled X  = x j.


We saw that “expectation” is a weighted average. In the urn model, if

we focus on urn labelled X = x j, we find the conditional mean using

 μY | j = E (Y | X = x j) = Σk   yk   pk | j, j = 1,2,…, J .

In classifying the balls, the urn labelled X = x j is used with probability

 p j = Pr( X  = x j). As a consequence:

μY = E(Y) = Σj E(Y | X = xj) Pr(X = xj).

Thus, the expectation (mean) of Y is a weighted average of the conditional expectations of Y given X = xj, weighted by Pr(X = xj).

We may write:  E (Y ) = E  X [ E (Y | X )].

This result is known as the Law of Iterated Expectations.


Law of Iterated Expectations:

 E (Y ) = E  X [ E (Y | X )].

Observe that:

•  The “inner” expectation E (Y | X ) is a weighted average of the

different values of y, weighted by the conditional probabilities

Pr(Y  = yk | X  = x j) (here X  is “given”, we know which urn the balls

come from).

•  The “outer” expectation E  X [.] is a weighted average of the

different values of E(Y | X = xj), weighted by the probabilities Pr(X = xj).

Exercise: Earlier we used the marginal distribution of M to calculate E ( M ). Can you think of another way to compute E ( M )? (see S&W:72)
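The Law of Iterated Expectations can be verified on the same made-up joint table used earlier: computing E(Y) from the marginal of Y, and as the pj-weighted average of the conditional means, gives the same number:

```python
# Same made-up joint table: E(Y) directly vs. by iterated expectations
ys = [1, 2, 3]
p  = [[0.10, 0.20, 0.10],
      [0.20, 0.25, 0.15]]

pY = [sum(row[k] for row in p) for k in range(3)]
EY_direct = sum(y * w for y, w in zip(ys, pY))       # from the marginal of Y

EY_iterated = 0.0
for row in p:                                        # one urn per value of X
    pj = sum(row)                                    # Pr(X = x_j)
    EY_cond = sum(y * pjk / pj for y, pjk in zip(ys, row))
    EY_iterated += EY_cond * pj                      # weight by Pr(X = x_j)
print(EY_direct, EY_iterated)                        # both 1.95
```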


Functions of jointly distributed random variables:

Let Z  = h( X , Y ), a function of two random variables, X  and Y .

Suppose the joint distribution of X  and Y is known.

Then the expectation of Z  can be computed in the usual manner,

as a weighted average:

E(Z) = E[h(X, Y)] = Σj Σk h(xj, yk) Pr(X = xj, Y = yk)
= Σj Σk h(xj, yk) pjk,   (★)

where the probability weights pjk, j = 1, 2,…, J; k = 1, 2,…, K, are obtained from the joint distribution.

Exercise: Use S&W Table 2.3 to compute E ( MA).


(Population) covariance:

In a joint distribution, the degree to which two random variables

are related may be measured with the help of covariance:

Cov(X, Y) = σXY = E(X*Y*) = E[(X – μX)(Y – μY)]
= Σj Σk (xj – μX)(yk – μY) Pr(X = xj, Y = yk).

Remark: We took Z  = h( X , Y ) = ( X  – μ X )(Y  – μY ) and found E ( Z )…

Useful algebra:

E(X*Y*) = E[(X – μX)(Y – μY)] = …
= E(XY) – E(X)E(Y).   (★★)

In words, covariance equals the expected value of the product, minus the product of the expectations.
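A sketch checking, on the made-up joint table used earlier, that the definition of covariance and the shortcut (★★) agree:

```python
# Same made-up joint table: covariance by definition vs. shortcut
xs, ys = [0, 1], [1, 2, 3]
p = [[0.10, 0.20, 0.10],
     [0.20, 0.25, 0.15]]

pX = [sum(row) for row in p]
pY = [sum(row[k] for row in p) for k in range(3)]
EX = sum(x * w for x, w in zip(xs, pX))
EY = sum(y * w for y, w in zip(ys, pY))

cov_def = sum((xs[j] - EX) * (ys[k] - EY) * p[j][k]
              for j in range(2) for k in range(3))
EXY = sum(xs[j] * ys[k] * p[j][k] for j in range(2) for k in range(3))
print(cov_def, EXY - EX * EY)    # same (small, negative) number both ways
```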


(Population) covariance cont’d:

The “sign” of covariance is informative about the nature of the

relation:

If above average values of X  go together with above average 

values of Y (so that below average values of X  go together with

below average values of Y ) covariance will be positive.

If above average values of one variable go together with below

average values of the other, covariance will be negative.

Exercise: Suppose X  = weight, Y  = height of individuals in a

population. Can you guess the sign of Cov(X, Y)?


(Population) correlation:

The magnitude of covariance is affected by the units of measurement

of the variables. For a unit-free measure, we turn to correlation:

Corr(X, Y) = ρXY = Cov(X, Y) / √[V(X)·V(Y)] = σXY / (σX σY).

It can be shown that – 1 ≤ ρ XY  ≤ 1.

Random variables are said to be uncorrelated if ρXY = 0. Clearly for this to happen, σXY = 0 must hold.

Recall that in general E (Y | X ) is a function of X ; it tells us how the

conditional mean of Y  given X  = x j changes with x j, j = 1, 2, …, J. 

Remark: Think about the urn model. Think about prediction.


Suppose E (Y | X ) = E (Y ) = μY , a constant. To describe this case, we say

Y  is mean-independent of X .

Claim 1: If Y  is mean-independent of X , then σ  XY  = 0 (  ρ XY  = 0).

Proof: E(XY) = E(YX) = EX[E(YX | X)] = EX[E(Y | X) X].

(When we “condition” on X, we set it equal to a particular value; that is why X can be moved outside the inner expectation.)

If E(Y | X) = E(Y), the last expression simplifies:

EX[E(Y) X] = E(Y) E(X).

We showed:

If Y is mean-independent of X , then E ( XY ) = E (Y ) E ( X ).

Return to (★★) and note that σXY = 0 iff E(XY) = E(X)E(Y). Thus σXY = 0… ///


CAUTION: If σ  XY  = 0, it does not follow that E (Y | X ) = constant.

Covariance/correlation capture the linear  relation between X  and Y .

It could be that the relation is non-linear, so that E(Y | X) varies with X, and yet σXY = 0.

Example: Modify the joint distribution in Assignment 2 Part II as follows and (re)calculate Cov(X, Y):

f(x, y)     x = –1    x = 0    x = 1
y = 1       0.20      0.10     0.20
y = 2       0.10      0.30     0.10
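A sketch confirming the point of this example: for the table above, the covariance is zero even though E(Y | X) varies with x, so zero correlation does not imply mean-independence. (Note the table's layout: rows are y-values, columns are x-values.)

```python
# The modified table: rows are y, columns are x, matching the layout above
xs, ys = [-1, 0, 1], [1, 2]
p = [[0.20, 0.10, 0.20],    # y = 1
     [0.10, 0.30, 0.10]]    # y = 2

EX  = sum(x * sum(p[k][j] for k in range(2)) for j, x in enumerate(xs))
EY  = sum(y * sum(p[k]) for k, y in enumerate(ys))
EXY = sum(x * y * p[k][j] for j, x in enumerate(xs) for k, y in enumerate(ys))
print(EXY - EX * EY)        # 0.0 (up to rounding): X and Y are uncorrelated

for j, x in enumerate(xs):  # ...yet E(Y | X = x) is not constant
    pj = sum(p[k][j] for k in range(2))
    print(x, sum(y * p[k][j] / pj for k, y in enumerate(ys)))
```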


Independence: Random variables X  and Y  are (statistically)

independent , if knowledge of the value of one of the variables

 provides no information about the other. Formally:

From the definition of conditional probabilities,

Pr(Y  = y, X  = x) = Pr(Y  = y | X  = x)Pr( X  = x).

Thus an equivalent condition for, and implication of, independence is:

I.1. X  and Y  are independently distributed  if, for all values of x and y,

Pr(Y  = y | X  = x) = Pr(Y  = y). 

I.2. Pr(Y  = y, X  = x) = Pr(Y  = y)Pr( X  = x), for all values of x and y. 


Claim 2: If  X and Y  are independently distributed, then

 E ( X |Y ) = E ( X ) and E (Y | X ) = E (Y ).

Proof: E(Y | X = xj) = Σk yk pk|j = Σk yk (pjk / pj)
= Σk yk (pj pk / pj)   (using I.2: pjk = pj pk)
= Σk yk pk = E(Y). ///

SUMMARY:

Independence ⇒ Mean-independence ⇒ Zero correlation.

However: We cannot go from right to left! The stronger condition implies the weaker condition; not the other way around.


Additional Linear Function Rules: (S&W Appendix 2.1)

Suppose Z = X + Y. Then using (★), it is easy to show

 E ( Z ) = E ( X ) + E (Y ).

In words, expectation of a sum equals the sum of expectations. 

Continuing, if Z = X + Y, then Z* = X* + Y*, and Z*² = X*² + Y*² + 2X*Y*, where the asterisk denotes the deviation from the expectation. So

V(Z) = E(Z*²) = E(X*²) + E(Y*²) + 2E(X*Y*)
= V(X) + V(Y) + 2C(X, Y).

In words, variance of a sum equals the sum of the variances plus twice

the covariance.

Exercise: Use the same logic to find the variance of a difference.
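A numerical check of the variance-of-a-sum rule on the made-up joint table from earlier; the helper E implements the weighted average (★):

```python
# Same made-up joint table as earlier; E(h) implements the weighted average
xs, ys = [0, 1], [1, 2, 3]
p = [[0.10, 0.20, 0.10],
     [0.20, 0.25, 0.15]]
pairs = [(x, y, p[j][k]) for j, x in enumerate(xs) for k, y in enumerate(ys)]

def E(h):
    return sum(h(x, y) * w for x, y, w in pairs)

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
VX  = E(lambda x, y: (x - EX)**2)
VY  = E(lambda x, y: (y - EY)**2)
CXY = E(lambda x, y: (x - EX) * (y - EY))

VZ_direct = E(lambda x, y: (x + y - (EX + EY))**2)   # V(X + Y) from scratch
print(VZ_direct, VX + VY + 2 * CXY)                  # same number both ways
```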


Generalizing to linear functions, if

 Z  = a + bX + cY  

where a, b and c are constants, then

 E ( Z ) = a + bE ( X ) + cE (Y ),

so the deviation from the expectation is Z * = bX * + cY *, and the

variance of Z  is

V(Z) = E(Z*²) = b²V(X) + c²V(Y) + 2bcC(X, Y).

Still more generally, for a pair of random variables

 Z 1 = a1 + b1 X + c1Y ,  Z 2 = a2 + b2 X + c2Y ,

where the a's, b's, and c's are constants, the covariance of Z1 and Z2 is

C ( Z 1, Z 2) = b1b2V ( X ) + c1c2V (Y ) + (b1c2 + b2c1)C ( X ,Y ).
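Finally, a sketch checking the covariance rule for two linear functions on the same made-up joint table, with arbitrary constants a, b, c:

```python
# Check C(Z1, Z2) = b1*b2*V(X) + c1*c2*V(Y) + (b1*c2 + b2*c1)*C(X, Y)
xs, ys = [0, 1], [1, 2, 3]
p = [[0.10, 0.20, 0.10],
     [0.20, 0.25, 0.15]]
pairs = [(x, y, p[j][k]) for j, x in enumerate(xs) for k, y in enumerate(ys)]

def E(h):
    return sum(h(x, y) * w for x, y, w in pairs)

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
VX, VY = E(lambda x, y: (x - EX)**2), E(lambda x, y: (y - EY)**2)
CXY    = E(lambda x, y: (x - EX) * (y - EY))

a1, b1, c1 = 1.0, 2.0, -1.0        # arbitrary constants
a2, b2, c2 = 0.0, -3.0, 4.0
EZ1 = E(lambda x, y: a1 + b1 * x + c1 * y)
EZ2 = E(lambda x, y: a2 + b2 * x + c2 * y)
C12_direct = E(lambda x, y: (a1 + b1 * x + c1 * y - EZ1) * (a2 + b2 * x + c2 * y - EZ2))
C12_rule   = b1 * b2 * VX + c1 * c2 * VY + (b1 * c2 + b2 * c1) * CXY
print(C12_direct, C12_rule)        # same number both ways
```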