BU255 FINAL Exam-AID Taught by: Greg Overholt


Page 1: BU255 FINAL Exam-AID Taught by: Greg Overholt

BU255 FINAL Exam-AID
Taught by: Greg Overholt

Page 2: BU255 FINAL Exam-AID Taught by: Greg Overholt

What are we doing??

• Stats
• Lectures 10 to 20!!
– All of it.

Page 3: BU255 FINAL Exam-AID Taught by: Greg Overholt

Lectures 10 & 11: Estimation

Page 4: BU255 FINAL Exam-AID Taught by: Greg Overholt

Chapters 5 and 6:
• Binomial, Poisson, normal, and exponential distributions allow us to make probability statements about X (an individual member of the population).
• To do so we need the population parameters.
– Binomial: p
– Poisson: μ
– Normal: μ and σ
– Exponential: λ

Page 5: BU255 FINAL Exam-AID Taught by: Greg Overholt

Chapter 7: Sampling distributions allow us to make probability statements about sample statistics.

We need the population parameters.
– Sample mean: µ and σ
– Sample proportion: p

However, in almost all realistic situations the parameters are unknown. We will use the sampling distribution to draw inferences about the unknown population parameters.

Page 6: BU255 FINAL Exam-AID Taught by: Greg Overholt

• Introduction to Statistical Inference
• Estimation
– Point and Interval Estimators
– Properties of Estimators
• Interval Estimation [confidence intervals]
• Determining Sample Size

Page 7: BU255 FINAL Exam-AID Taught by: Greg Overholt

Estimation

• Estimation: determining the approximate value of a population parameter based on a sample statistic.
• 2 types:
– Point Estimator
  • A single value. No good on its own: a point is too small to hit the parameter exactly, and it says nothing about how close it is.
– Interval Estimator
  • Used almost all the time.
  • Uses an interval to estimate the population parameter.
  • Provides % certainty that the parameter is between a lower and an upper bound.

Page 8: BU255 FINAL Exam-AID Taught by: Greg Overholt

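Estimating μ when σ is known

You typically want to know μ. And if you have σ?

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

$$P\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

This is the 100(1 – α)% confidence interval of μ when σ is known.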

Page 9: BU255 FINAL Exam-AID Taught by: Greg Overholt

EXAMPLE

• Diageo sampled 85 Laurier students and determined that the sample mean of alcohol consumption was 510 drinks a term. They previously calculated that the population standard deviation was 46. Create an interval estimate of the population mean with 95% confidence.

x̄ = 510, n = 85, σ = 46. z(α/2) is not given, but we want 95% confidence.

95% in the middle leaves 2.5% in each tail, so we look up the Z value for an area of .475 between the mean and z: z = 1.96.

510 – 1.96(46/√85) < μ < 510 + 1.96(46/√85)

510 – 9.78 < μ < 510 + 9.78

500.22 < μ < 519.78

We are 95% confident that the average number of drinks for the population is between 500.22 and 519.78.
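The interval arithmetic above can be sketched in a few lines of Python. The function name `z_interval` is made up for illustration; the z value of 1.96 is taken from the table exactly as on the slide.

```python
import math

def z_interval(xbar, sigma, n, z):
    """Confidence interval for the mean when the population sigma is known."""
    margin = z * sigma / math.sqrt(n)  # z * standard error of the mean
    return xbar - margin, xbar + margin

# Diageo example: x-bar = 510, sigma = 46, n = 85, 95% confidence (z = 1.96)
low, high = z_interval(xbar=510, sigma=46, n=85, z=1.96)
print(round(low, 2), round(high, 2))
```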

Page 10: BU255 FINAL Exam-AID Taught by: Greg Overholt

Intro – T dist: What’s different?

• In the past, we have known the standard deviation of the population (which is unrealistic).
– With it, we can use the Z stat to make inferences.
• NOW, we don’t know the st dev, so we have to use the sample st dev. That is why we use the T-stat.
– GOT to have a normal (or approximately normal) population dist!

Page 11: BU255 FINAL Exam-AID Taught by: Greg Overholt

T-Distribution

Page 12: BU255 FINAL Exam-AID Taught by: Greg Overholt

Degrees of Freedom

• It is the number of items that are free to vary once the mean is fixed. The best way to think of it: assume one of the numbers is pinned down by your mean, and the rest are simply numbers around the mean that determine the shape. So the degrees of freedom are the number of items that determine the shape (n minus the one centre value), i.e. df = n – 1.

• The normal distribution corresponds to df = infinity: as df grows, the t-distribution approaches the normal.

Page 13: BU255 FINAL Exam-AID Taught by: Greg Overholt
Page 14: BU255 FINAL Exam-AID Taught by: Greg Overholt
Page 15: BU255 FINAL Exam-AID Taught by: Greg Overholt

EXCEL (good MC)

• T-dist calculations can be done using Excel:
– TDIST(x, degrees_freedom, tails)
  • Use this when you want the % in the tail(s).
  • TDIST(1.3, 60, 1)
  – 1.3 is your t-value (like your z-value), the curve is drawn with 60 degrees of freedom, and you want the 1-tail result (vs 2). ANSWER = 0.0992, so about 9.9% in one tail.
– TINV(probability, degrees_freedom)
  • This is the inverse. Give it the % in the tails and it will give you the t-value.
  • NOTE: it expects the % for a 2-tail test!!!!
  – SO, if they wanted you to do the inverse of the question above to get a t-value of 1.3:
  – TINV(0.1984, 60): you double the percentage for 2 tails!
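To see what the two Excel functions are doing, here is a rough standard-library sketch. The names `t_right_tail` and `t_from_two_tail` are invented for this illustration and are not Excel or library functions. The tail area is found by numerically integrating the t density, and the inverse by bisection, so results are approximate.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))

def t_right_tail(t, df, steps=2000):
    """P(T > t) for t >= 0 (like a 1-tail TDIST), via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_from_two_tail(p, df):
    """t-value whose two tails together hold probability p (like TINV)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection: the tail area shrinks as t grows
        mid = (lo + hi) / 2
        if 2 * t_right_tail(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

tail = t_right_tail(1.3, 60)            # one-tail area beyond t = 1.3
t_back = t_from_two_tail(2 * tail, 60)  # double it for the 2-tail inverse
print(round(tail, 4), round(t_back, 3))
```

Doubling the one-tail area before inverting mirrors the "double the percentage" step in the TINV note.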

Page 18: BU255 FINAL Exam-AID Taught by: Greg Overholt

Formula / example

• Assume the population is (relatively) normal.
– Confidence interval formula:

$$t_{n-1} = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

$$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$

QUESTION: The researched average cost of a standing-room only ticket to a Leafs game from a scalper is $168.

A random sample of 16 tickets bought from different scalpers resulted in x̄ = $172.50, s = $15.40. Find the 95% interval estimate. Assume the population distribution is relatively normal.

Degrees of freedom: n – 1 = 15.

What is the T value? t(.025, 15) = 2.131

SO the formula is 172.50 – 2.131(15.40/4) < μ < 172.50 + 2.131(15.40/4)

INTERVAL of 95% confidence: 164.3 to 180.7. YES, this includes 168.
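The same calculation as a small Python sketch. The helper name `t_interval` is hypothetical, and the critical value 2.131 is read from the t-table as on the slide rather than computed.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean using the sample stdev and a t-table value."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Leafs ticket example: x-bar = 172.50, s = 15.40, n = 16, t(.025, 15) = 2.131
low, high = t_interval(172.50, 15.40, 16, 2.131)
print(round(low, 1), round(high, 1))
print(low < 168 < high)  # does the interval contain the researched $168?
```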

Page 19: BU255 FINAL Exam-AID Taught by: Greg Overholt

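Estimating the Population Proportion

Assumption: $n\hat{p} > 5$ and $n(1-\hat{p}) > 5$

Formula (did this in midterm):

$$Z = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}}$$

Confidence interval of p (new, but simply rearranging letters):

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$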

Page 20: BU255 FINAL Exam-AID Taught by: Greg Overholt

Example

What proportion of male students in Canada have a violent case of ‘Bieber Fever’?

A random sample of 1,350 Laurier students was taken, and 250 of them revealed they had ‘Bieber Fever’. What is the 98% confidence interval for the population proportion?


Page 22: BU255 FINAL Exam-AID Taught by: Greg Overholt

Example

A random sample of 1,350 Laurier students was taken, and 250 of them revealed they had ‘Bieber Fever’. What is the 98% confidence interval for the population proportion?

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

p̂ = 250/1350 = .185

z(α/2) = the Z value which has 1% in each tail (2% in total for 98% confidence) = 2.325

RESULT = 0.185 ± 2.325 √(.185(1 – .185)/1350)

98% confidence range = 0.1604 to 0.2096
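A hedged sketch of the same interval in Python, using the standard library's `statistics.NormalDist` to look up the z value instead of the table. The function name `proportion_interval` is invented here. Because it keeps p̂ and z unrounded, the bounds differ from the hand calculation in the third or fourth decimal.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Confidence interval for a population proportion (normal approximation)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z with alpha/2 in each tail
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_interval(250, 1350, 0.98)
print(round(low, 4), round(high, 4))
```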

Page 23: BU255 FINAL Exam-AID Taught by: Greg Overholt

Selecting the sample size

• The difference between the sample mean and the population mean is called the error of estimation.

• You can make sure you stay within it, by another freaking formula:

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$

E = error (given in the question)

Page 24: BU255 FINAL Exam-AID Taught by: Greg Overholt

EXAMPLE

• I want to know how many students I need to interview to find out how many times a Laurier student Facebook stalks in 1 day. I want to be 95% certain, with an error of no more than 2. It turns out the standard deviation of this stat is 5. GO:

n = ? (what we want to find out)

σ = 5

E = 2

z(α/2): 95% confidence, which is 2.5% in each tail, which is a z value of 1.96

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$

n = (1.96 × 5 / 2)²

n = 24.01 (so need 25 people)
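As a quick check, the sample-size formula can be sketched like this (the function name `sample_size_for_mean` is an illustrative choice). Rounding up is done with `math.ceil`, since a fraction of a person still means one more interview.

```python
import math

def sample_size_for_mean(z, sigma, error):
    """Smallest n that keeps the error of estimation within `error`."""
    return math.ceil((z * sigma / error) ** 2)

n = sample_size_for_mean(z=1.96, sigma=5, error=2)
print(n)
```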

Page 25: BU255 FINAL Exam-AID Taught by: Greg Overholt

Determining n when Estimating p

$$n = \frac{p(1-p)\,z_{\alpha/2}^2}{E^2}$$

1. Use the historical min or max of p, if available (whichever is closer to 0.5).
2. To be safe, use p = 0.5 if p is totally unknown.

What proportion of students in statistics actually open their textbook? To estimate this proportion within 5% and be 95% confident, how large a sample should you take?

Work it out for two cases: (a) historically fewer than 15% of students ever do, and (b) NO historical information is available.

(a) n = .15(.85) × 1.96² / .05² = 195.92, so n = 196 (MUST round up!)

(b) n = .5(.5) × 1.96² / .05² = 384.16, so n = 385
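Evaluating the formula directly is a useful check on the arithmetic. The sketch below (with the made-up helper name `sample_size_for_proportion`) plugs in p = 0.15 for the historical case and p = 0.5 for the no-information case, with z = 1.96 and E = 0.05.

```python
import math

def sample_size_for_proportion(p, z, error):
    """Smallest n for estimating a proportion to within `error`."""
    return math.ceil(p * (1 - p) * z ** 2 / error ** 2)

n_hist = sample_size_for_proportion(0.15, 1.96, 0.05)  # historical p of 15%
n_safe = sample_size_for_proportion(0.5, 1.96, 0.05)   # p unknown, use 0.5
print(n_hist, n_safe)
```

Using p = 0.5 gives the largest possible n, which is why it is the safe choice when nothing is known.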

Page 26: BU255 FINAL Exam-AID Taught by: Greg Overholt

Class 12 & 13: Intro Hypothesis Testing for single populations

Page 27: BU255 FINAL Exam-AID Taught by: Greg Overholt

Hypothesis Testing

• There are two procedures for making inferences:
– Estimation.
– Hypothesis testing.

• The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief about a parameter.

Page 28: BU255 FINAL Exam-AID Taught by: Greg Overholt

Hypothesis Testing

• There are two hypotheses:
– Null Hypothesis (H0)
  • Assumed to be true
  • Ex. The defendant is innocent
– Alternative (or research) Hypothesis (H1)
  • Opposite of H0
  • Ex. The defendant is guilty

• NOTE: The null will always state that the parameter equals the value specified in the alternative.

Page 29: BU255 FINAL Exam-AID Taught by: Greg Overholt

Hypothesis Testing Process

• Step 1: State the null and alternative.
– E.g. you want to see if the exam average will be greater than 75%.
  • H0: μ = 75
  • H1: μ > 75

• Step 2: Randomly sample the population and calculate a test statistic (in this case based on a sample mean).
– The procedure begins with the NULL BEING TRUE (and the goal is to see if there is enough evidence to say that the alternative is true).

• Step 3: Make a statement about the hypotheses.
– If the test statistic value is inconsistent with the null hypothesis, we reject the null and conclude that the alternative is true.

Page 30: BU255 FINAL Exam-AID Taught by: Greg Overholt

Hypo Testing Decisions

1. Reject the null in favour of the alternative
– Sufficient evidence to support the alternative

2. Do not reject the null in favour of the alternative
– Does not mean ‘accepting the null’ (just not enough evidence)
– Ex. Failing to prove that the defendant is guilty does not mean that he is innocent

Page 31: BU255 FINAL Exam-AID Taught by: Greg Overholt

Hypo Testing Errors

• Two types of errors are possible when deciding whether to reject H0 (the null hypothesis).

• Type 1 error (probability = alpha): rejecting the null when the null is true! Sending an innocent man to jail. MOST SERIOUS OF THE TWO!

Page 32: BU255 FINAL Exam-AID Taught by: Greg Overholt

Hypo Testing Errors

Type 2 error: not rejecting a false null hypothesis (going with the safe null assumption.. Don’t have the balls to reject it!!). A guilty man goes free (the null is not rejected when it is actually false). Its probability can be calculated (later).

THIS EXAMPLE IS TESTING AVG HYDRO BILLS, with a hypothesized mean of 170.

Sample bills were taken to get x̄, and we are trying to figure out the critical region.

[Figure: two sampling distributions of x̄, labelled "Our original hypothesis…" and "our new assumption…"]

Page 33: BU255 FINAL Exam-AID Taught by: Greg Overholt

2 ways to Test: Rejection Region

• Depending on whether you are looking for <, >, or not equal to, you define the rejection region.
• Level of significance = α

Page 34: BU255 FINAL Exam-AID Taught by: Greg Overholt

Test It: P-value

• The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed given that the null hypothesis is true.

• The smallest value of α for which H0 can be rejected

p-value

QUESTION: Testing hydro bills. They think the average customer’s hydro bill is $170 (with a standard deviation of $65), and they want to test whether it is actually larger than that. The company sampled 400 customers and found an average of $178. Should you reject or not reject the null hypothesis? (We want 95% confidence, so α = .05.)

Page 35: BU255 FINAL Exam-AID Taught by: Greg Overholt

Test It: P-value

• The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed, given that the null hypothesis is true.

• It is the smallest value of α for which H0 can be rejected.

Z = (178 – 170) / (65/√400) = 8 / 3.25 = 2.46

P-value = P(Z > 2.46) = .0069

Since .0069 < .05, reject the null.
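A minimal sketch of the p-value calculation for the hydro bill question, using `statistics.NormalDist` in place of the z-table. The variable names are illustrative; the numbers (170, 65, 400, 178) come from the question.

```python
import math
from statistics import NormalDist

mu0, sigma, n, xbar = 170, 65, 400, 178

z = (xbar - mu0) / (sigma / math.sqrt(n))  # test statistic
p_value = 1 - NormalDist().cdf(z)          # right tail: H1 is "greater than"

print(round(z, 2), round(p_value, 4))
print("reject H0" if p_value < 0.05 else "do not reject H0")
```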

Page 36: BU255 FINAL Exam-AID Taught by: Greg Overholt

Type II Error Example

Example:
• H0: µ = 170
• H1: µ > 170

• At a significance level of 5% we rejected H0 in favor of H1, since our sample mean (178) was greater than the critical value (175.34).

To calculate a Type 2 error, the question will have to give you the new mean to test (here, a mean of $180).

• β = P(x̄ < 175.34, given that µ = 180), thus…
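The value of β is not spelled out in the text, so here is a hedged sketch of how it could be computed, assuming the same σ = 65 and n = 400 as the hydro bill question. The critical value is recomputed from α = .05 rather than typed in, so it comes out a hair above the 175.34 quoted above.

```python
import math
from statistics import NormalDist

mu0, mu_new, sigma, n, alpha = 170, 180, 65, 400, 0.05
se = sigma / math.sqrt(n)  # standard error of the sample mean

# Critical sample mean for the right-tailed test under H0
critical = mu0 + NormalDist().inv_cdf(1 - alpha) * se

# beta = P(x-bar < critical) when the true mean is really mu_new
beta = NormalDist(mu=mu_new, sigma=se).cdf(critical)

print(round(critical, 2), round(beta, 3))
```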

Page 37: BU255 FINAL Exam-AID Taught by: Greg Overholt

Our original hypothesis…

our new assumption…

Chance we send a guilty man free

Page 38: BU255 FINAL Exam-AID Taught by: Greg Overholt

Changing your confidence requirement!

Page 39: BU255 FINAL Exam-AID Taught by: Greg Overholt

INCREASE THE SAMPLE SIZE!

Page 40: BU255 FINAL Exam-AID Taught by: Greg Overholt

Estimating MEAN with T-stat (refresher)

Use the T-dist instead of the normal, and the sample stdev instead of the population stdev.

QUESTION: Tiger Woods is rumoured to pay his ‘girls’ $1 million per year to stay quiet. A random sample of 7 of them was taken, and the mean was $800,000 with a stdev of $100,000. Find the 95% interval estimate of the population mean. (Assume the pop is normal.)

Degrees of freedom = 7 – 1 = 6

= 800K ± t(.025, 6) × (100K/√7)

= 800K ± 2.447(37,796)

= 800,000 ± 92,486
RANGE between $707,513 and $892,486

Page 41: BU255 FINAL Exam-AID Taught by: Greg Overholt

Hypo testing with T-stat

• T-statistic: same as the z-stat, just using the sample mean and sample stdev!

QUESTION: SO.. can we conclude with 95% confidence that the mean that Cheetah pays for his girls is not $1,000,000?

H0: μ = 1 million
H1: μ ≠ 1 million (two-tail test)

T = (800,000 – 1,000,000) / (100,000 / √7)

T = -200,000 / 37,796 = -5.29

T-critical with 2.5% in each tail (df = 6) is at ±2.447.

-5.29 is definitely past -2.447, so reject the null: the mean isn’t 1 million!
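The test can be sketched as follows. The helper name `one_sample_t` is an assumption for illustration, and the critical value 2.447 is the table value for df = 6 used above.

```python
import math

def one_sample_t(xbar, mu0, s, n):
    """t statistic for a single mean when sigma is unknown."""
    return (xbar - mu0) / (s / math.sqrt(n))

t_stat = one_sample_t(xbar=800_000, mu0=1_000_000, s=100_000, n=7)
t_crit = 2.447                  # t(.025, 6) from the table
reject = abs(t_stat) > t_crit   # two-tail test: compare the absolute value

print(round(t_stat, 2), reject)
```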

Page 42: BU255 FINAL Exam-AID Taught by: Greg Overholt

Third type.. proportion

$$z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$

Assumption: np > 5, n(1-p) > 5

The high school student council believes that 11% of its students will come to the school dance wasted, and they want to test their belief. A sample of 200 students resulted in 28 indicating that they, in fact, will be tanked before they arrive. Use a probability of a Type I error of 0.10.

H0: p = .11 will be drunk

H1: p ≠ .11 will be drunk (have to use a 2-tail test; we don’t know which way they are testing)

p̂ = 28/200 = .14

Page 45: BU255 FINAL Exam-AID Taught by: Greg Overholt

The high school student council believes that 11% of its students will come to the school dance wasted, and they want to test their belief. A sample of 200 students resulted in 28 indicating that they, in fact, will be tanked before they arrive. Use a probability of a Type I error of 0.10.

$$z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$

H0: p = .11 will be drunk

H1: p ≠ .11 will be drunk

p̂ = 28/200 = .14

Z = (.14 – .11) / √(.11(.89)/200)

Z = .03 / √.00049

Z = .03 / .0221

Z = 1.36

Is 1.36 far enough? On the z-table, we need the tail area for 1.36 and compare it to .05 (half of the significance level of .10).

Z = 1.36 has .0869 in the right tail.

BUT this is a two-tail test, so it needed to be < .05. So we cannot reject the null in favour of the alternative.
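Here is a short sketch of the proportion test, with an invented helper name `proportion_z`. Note that the standard error uses the hypothesized p = .11, so the term under the root is .11 × .89. Instead of halving α, the code doubles the tail area and compares it to α = .10, which is the same decision.

```python
import math
from statistics import NormalDist

def proportion_z(p_hat, p0, n):
    """z statistic for a single proportion, using p0 in the standard error."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

z = proportion_z(28 / 200, 0.11, 200)
p_two_tail = 2 * (1 - NormalDist().cdf(abs(z)))  # both tails together
reject = p_two_tail < 0.10

print(round(z, 2), round(p_two_tail, 3), reject)
```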

Page 46: BU255 FINAL Exam-AID Taught by: Greg Overholt

RECAP!!

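With 1 population, we talked about:
• Mean (σ known): standard z-stat
• Mean (σ unknown): T-statistic
• Proportion: z-stat for proportion

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \qquad t_{n-1} = \frac{\bar{x} - \mu}{s/\sqrt{n}} \qquad z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$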

Page 47: BU255 FINAL Exam-AID Taught by: Greg Overholt

Lectures 14 & 15: Inference about comparing Two Populations

Page 48: BU255 FINAL Exam-AID Taught by: Greg Overholt

Inference about comparing two populations

• With two populations, we can be:
– Comparing the means
– Comparing paired observations
– Comparing proportions

Page 49: BU255 FINAL Exam-AID Taught by: Greg Overholt

COMPARING TWO MEANS

• Similar to dealing with 1 mean, now we are looking at the difference of two pop means:

1. If you know the stdevs?? Plug-and-play:

Confidence Formula:

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Page 50: BU255 FINAL Exam-AID Taught by: Greg Overholt

Question Example

A random sample of 32 business students from Laurier are asked how often they party so hard they don’t remember what happened during the previous night. A similar random sample is taken of 34 science students. The results and the population SDs are given below.

Business: n1 = 32, x̄1 = 70.700, σ1 = 16.253
Science: n2 = 34, x̄2 = 62.187, σ2 = 12.900

Q. Is there enough evidence to say that they differ at a 5% significance level?

Page 53: BU255 FINAL Exam-AID Taught by: Greg Overholt

• You have the st devs, so you can use this formula:

a) z = ((70.7 – 62.187) – 0) / √(16.253²/32 + 12.9²/34)

z = 8.513 / 3.626

z = 2.347 (what p-value is that on the z table?)

P-value = .0096 in one tail

(Yes, less than 2.5% in the tail, so at the 5% level, two-tailed, they DO DIFFER!!!)
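A sketch of the two-sample z calculation, with illustrative variable names and the summary numbers from the question. It reports the two-tailed p-value directly, so the comparison is against .05 rather than .025 in one tail; the computed value differs slightly from the table lookup because z is not rounded.

```python
import math
from statistics import NormalDist

n1, x1, sd1 = 32, 70.700, 16.253   # business students
n2, x2, sd2 = 34, 62.187, 12.900   # science students

se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
z = ((x1 - x2) - 0) / se
p_two_tail = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(se, 3), round(z, 3), round(p_two_tail, 4))
```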

Page 54: BU255 FINAL Exam-AID Taught by: Greg Overholt

Two means, unknown variances!

• MOST times you don’t know the stdevs, so it’s not so straightforward. You need the t-test, and there are 2 cases:
– When they have equal variances
– When they have unequal variances

Page 55: BU255 FINAL Exam-AID Taught by: Greg Overholt

Type 1. Assumed equal variances

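• With equal variances, you can use that property to bring together (pool) the variances, weighted by their degrees of freedom. Assume the pops are relatively normal. This is a T-statistic.

First get a combined (pooled) estimate of the variance:

$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$$

Use it to find your t-value:

$$t_{n_1+n_2-2} = \frac{(\bar{x}_1-\bar{x}_2) - (\mu_1-\mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}$$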

Page 56: BU255 FINAL Exam-AID Taught by: Greg Overholt

Example

Population 1: Average amount (per present) for those who buy presents at Walmart

Population 2: Average amount (per present) for those who buy presents at Value Village

Assume both populations are Normal.

n1 = 13, n2 = 15
x̄1 = 9.95, x̄2 = 8.99
s1 = 1.20, s2 = 1.42

Page 57: BU255 FINAL Exam-AID Taught by: Greg Overholt

Is there enough evidence to conclude that people at Walmart spend more than those at Value Village? α = 0.10

H0: μ1 = μ2
H1: μ1 > μ2 (or μ1 – μ2 > 0).. 1 tail test!!!

n1 = 13, n2 = 15
x̄1 = 9.95, x̄2 = 8.99
s1 = 1.20, s2 = 1.42

Step 1: Get Pooled Variance

Sp² = [(13 – 1)(1.20)² + (15 – 1)(1.42)²] / (13 + 15 – 2)

Sp² = 1.75


n1 = 13, n2 = 15
x̄1 = 9.95, x̄2 = 8.99
s1 = 1.20, s2 = 1.42

Step 1: Get Pooled Variance

Sp² = [(13 – 1)(1.20)² + (15 – 1)(1.42)²] / (13 + 15 – 2)

Sp² = 1.75

Step 2: get T statistic

t = [(9.95 – 8.99) – 0] / [√1.75 × √(1/13 + 1/15)], with 13 + 15 – 2 = 26 degrees of freedom

t = .96 / .50128 = 1.915

Now need to check against the critical value at .10 significance: t(26, .10) = 1.315

YES! 1.915 > 1.315, so reject the null in favour of the alternative that Walmart shoppers spend more.
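The two steps can be sketched together in Python. The function name `pooled_t` is hypothetical, and the critical value 1.315 is the table value for 26 degrees of freedom at α = .10 quoted above.

```python
import math

def pooled_t(x1, s1, n1, x2, s2, n2):
    """Pooled variance, standard error and t statistic (equal variances assumed)."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, se, (x1 - x2) / se

sp2, se, t_stat = pooled_t(9.95, 1.20, 13, 8.99, 1.42, 15)
reject = t_stat > 1.315  # one-tail test, t(26, .10) from the table

print(round(sp2, 2), round(se, 4), round(t_stat, 3), reject)
```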

Page 63: BU255 FINAL Exam-AID Taught by: Greg Overholt

Construct a 90% confidence interval on the difference between the average spend at Walmart vs The Village.

$$(\bar{x}_1-\bar{x}_2) \pm t_{n_1+n_2-2,\,\alpha/2}\; s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$

We need t(13+15–2, .05) = t(26, .05) = 1.706

Result = .96 ± 1.706 × 0.50128 (the standard error from the last example) = .96 ± 0.855

The difference in average spend, Walmart – Value Village, is between $0.105 and $1.815.
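As a check on the interval, here is a sketch that recomputes it from the summary statistics of the Walmart vs Value Village example (n = 13 and 15, s = 1.20 and 1.42, difference in means .96) with the table value t(26, .05) = 1.706. With the pooled standard error of about 0.501, the margin comes out near 0.855.

```python
import math

# Pooled variance and standard error from the summary statistics
sp2 = (12 * 1.20 ** 2 + 14 * 1.42 ** 2) / 26
se = math.sqrt(sp2 * (1 / 13 + 1 / 15))

t_crit = 1.706            # t(26, .05) from the table, for 90% confidence
diff = 9.95 - 8.99
margin = t_crit * se
low, high = diff - margin, diff + margin

print(round(low, 3), round(high, 3))
```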

Page 64: BU255 FINAL Exam-AID Taught by: Greg Overholt

Type 2. unequal variances

• With UNEQUAL variances, you have this intense degrees-of-freedom formula. NEED TO CHAT ABOUT: what do you guys need to know about these formulas??

Use it to find your t-value:
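The formulas on this slide did not survive in the text. As a reference sketch, and on the assumption that the slide showed the standard unequal-variances (Welch) t-test, the test statistic and its degrees of freedom are usually written as:

```latex
t = \frac{(\bar{x}_1-\bar{x}_2) - (\mu_1-\mu_2)}
         {\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}},
\qquad
\nu = \frac{\left(\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}\right)^2}
           {\dfrac{(s_1^2/n_1)^2}{n_1-1}+\dfrac{(s_2^2/n_2)^2}{n_2-1}}
```

The degrees of freedom ν are typically not a whole number and are rounded before looking up the t-table.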

Page 65: BU255 FINAL Exam-AID Taught by: Greg Overholt

Chapter 13 (inference about comparing two populations)

• With two populations, we can be:
– Comparing the means
– Comparing paired observations
– Comparing proportions

Page 66: BU255 FINAL Exam-AID Taught by: Greg Overholt

Paired Observation

Matched Pairs Experiment (t-test and estimator of μD)

• If you can find a way to pair the samples, then you can use this method. Just because they have the same number of observations doesn’t mean they are matched. Even if they are ordered, they NEED to be matched on another variable (GPA buckets, same people but different dates, etc.).

• We are actually making inference on the mean difference between matched pairs of the two populations: μD = μ1 – μ2

• Most common hypotheses:
– H0: μD = 0
– Ha: μD (<, >, ≠) 0

Page 67: BU255 FINAL Exam-AID Taught by: Greg Overholt

Matched Pairs Test: Mean Difference Between Two Dependent Samples

• Additional Assumption: the difference between matching pairs of the two populations is Normal.

n_d = number of pairs
d̄ = mean sample difference
μD = mean population difference
s_d = SD of sample differences

Formula for t-test (df = n_d – 1):

$$t = \frac{\bar{d} - \mu_D}{s_d/\sqrt{n_d}}$$

100(1 – α)% CI of μD:

$$\bar{d} \pm t_{n_d-1,\,\alpha/2}\,\frac{s_d}{\sqrt{n_d}}$$

Page 68: BU255 FINAL Exam-AID Taught by: Greg Overholt

Example

• Your study group of 8 people is crazy competitive, and decides to go to the SOS Exam-AID to see if the session helps your average marks. Below are the results.

Person    Midterm   Final (with Exam-AID!)   Difference
1         65        68                       3
2         72        72                       0
3         80        81                       1
4         75        85                       10
5         87        92                       5
6         69        67                       -2
7         70        72                       2
8         81        86                       5

Average   74.875    77.875                   3
SD of the differences: 3.70328

Page 71: BU255 FINAL Exam-AID Taught by: Greg Overholt

Is there enough evidence to conclude that their marks went up from the Exam-AID? Use α = 0.01

$$t = \frac{\bar{d} - \mu_D}{s_d/\sqrt{n_d}}$$

n_d = number of pairs = 8
d̄ = mean sample difference = 77.875 – 74.875 = 3
μD = mean population difference = 0 under the null
s_d = SD of sample differences = 3.703

t = (77.875 – 74.875) / (3.703 / √8)

t = 3 / 1.309 = 2.291

Is it enough?? TO THE TABLE!!!

The critical value t(7, .01) is 2.998, and this example gives 2.291… not far enough to be 99% confident that the Exam-AID helped.
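The matched-pairs calculation can be reproduced from the raw marks with the standard library's `statistics` module (a sketch; the variable names are illustrative and the critical value 2.998 is the table value used above).

```python
import math
from statistics import mean, stdev

midterm = [65, 72, 80, 75, 87, 69, 70, 81]
final = [68, 72, 81, 85, 92, 67, 72, 86]

# Work on the per-person differences, not on the two columns separately
diffs = [f - m for m, f in zip(midterm, final)]
d_bar = mean(diffs)
s_d = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)

t_stat = (d_bar - 0) / (s_d / math.sqrt(len(diffs)))
reject = t_stat > 2.998  # one-tail test, t(7, .01) from the table

print(d_bar, round(s_d, 3), round(t_stat, 3), reject)
```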

Page 74: BU255 FINAL Exam-AID Taught by: Greg Overholt

• What about a confidence interval for 95% confidence?

100(1 – α)% CI of μD:

$$\bar{d} \pm t_{n_d-1,\,\alpha/2}\,\frac{s_d}{\sqrt{n_d}}$$

• t(7, .025) = 2.365

Interval = 3 ± 2.365 (3.703 / √8)

With 95% confidence, their post-Exam-AID marks will be between -0.09628 and 6.09628 higher.

Page 75: BU255 FINAL Exam-AID Taught by: Greg Overholt

Chapter 13 (inference about comparing two populations)

• With two populations, we can be:
– Comparing the variances
– Comparing the means
– Comparing paired observations
– Comparing proportions

Page 76: BU255 FINAL Exam-AID Taught by: Greg Overholt

Examples of two proportions

• Comparing market share of a product for two different markets

• Studying the proportion of female customers in two different geographic areas such as Quebec and Ontario.

• Comparing the proportion of defective products from one period to another

Page 77: BU255 FINAL Exam-AID Taught by: Greg Overholt

3) Inference about the difference between population proportions (with nominal data)
– Using nominal data, so win/lose categories.
– Z statistic
– Same restriction that n·p and n·(1 – p) > 5 (but now for both populations)
  • If over 5, then p̂1 – p̂2 is approximately NORMAL; use these formulas:

Page 78: BU255 FINAL Exam-AID Taught by: Greg Overholt

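Formulas!!!!

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} \qquad z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}$$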

Page 81: BU255 FINAL Exam-AID Taught by: Greg Overholt

A random sample survey of 300 stats students reveals that 120 won’t study for this exam more than 1 day before it. A sample survey of 250 accounting students revealed that 90 of them won’t study more than a day before it. Is the proportion of stats procrastinators higher? Use α = 0.01.

p̂1 = 120/300 = .4

p̂2 = 90/250 = .36

Pooled p̂ = (300 × .4 + 250 × .36) / 550 = .3818

Z = (.4 – .36) / √(.3818 × .6182 × (1/300 + 1/250))
Z = .04 / .0416
Z = .96152

A Z of .96 has .1685 (.5 – .3315) in the tail, which is MUCH greater than .01. So there is not enough evidence to reject the null; we cannot conclude that stats students are bigger procrastinators.
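The two-proportion test can be sketched as below. The function name `two_proportion_z` is invented for illustration; it pools the two samples for the standard error, as in the hand calculation, and the denominator works out to about .0416.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled proportion, standard error and z for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return pooled, se, (p1 - p2) / se

pooled, se, z = two_proportion_z(120, 300, 90, 250)
p_value = 1 - NormalDist().cdf(z)  # right tail: "is the stats proportion higher?"

print(round(pooled, 4), round(se, 4), round(z, 3), round(p_value, 4))
print("reject H0" if p_value < 0.01 else "do not reject H0")
```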

Page 82: BU255 FINAL Exam-AID Taught by: Greg Overholt

RECAP of 2 populations!!

• We looked at TWO populations now:
– Two proportions: z-statistic
– Matched pairs: t-statistic
– Population means (with stdev): z-statistic
– Population means (no stdev): t-statistic

Page 83: BU255 FINAL Exam-AID Taught by: Greg Overholt

Lecture 16 : Analysis of Variance

Page 84: BU255 FINAL Exam-AID Taught by: Greg Overholt

Analysis of Variance (ANOVA)

• Comparing 2 or more populations of INTERVAL data
• Determine whether differences exist between population means
– Done by analyzing sample variance, using the ANOVA technique:
  • Single Factor (or 1-way): for populations where there is only 1 factor that you are comparing them on, you use ANOVA: Single Factor. This is like comparing sales from 3 cities, with the factor being the marketing strategy.

Page 85: BU255 FINAL Exam-AID Taught by: Greg Overholt

One-Factor ANOVA

H0: μ1 = μ2 = μ3 = … = μk

Ha: Not all μi are the same

All means are the same: the null hypothesis is not rejected.

[Figure: three group distributions centred on the same mean, μ1 = μ2 = μ3]

Page 86: BU255 FINAL Exam-AID Taught by: Greg Overholt

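One-Factor ANOVA

H0: μ1 = μ2 = μ3 = … = μk

Ha: Not all μi are the same

At least one mean is different: the null hypothesis is rejected.

[Figure: two sketches of the group distributions for μ1, μ2, μ3 in which at least one mean differs]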

Page 87: BU255 FINAL Exam-AID Taught by: Greg Overholt

Partitioning the Variation

Total Variation = the aggregate dispersion of the individual data values across the various populations

Within-Sample Variation (SSE) = dispersion that exists among the data values within a particular population

Between-Sample Variation (SSC) = dispersion among the sample means (sometimes referred to as SST)

Page 88: BU255 FINAL Exam-AID Taught by: Greg Overholt

[Figure: response values for groups 1, 2 and 3, showing the group means x̄1, x̄2, x̄3 and the grand mean]

Total Sum of Squares = Between Group Variation (SSC) + Within Group Variation (SSE)

RECAP: Total Error = SSC + SSE

Page 89: BU255 FINAL Exam-AID Taught by: Greg Overholt

Total Sum of Squares

$$SS_{Total} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij} - \bar{\bar{x}}\right)^2$$

Where:

k = number of populations

n_i = sample size from population i

x_ij = jth measurement from population i

x̿ = grand mean (mean of all data values)

TOTAL ERROR = SSC + SSE

Page 90: BU255 FINAL Exam-AID Taught by: Greg Overholt

One-Way ANOVA Table

Source of Variation   SS          df       MS                    F ratio
Between Samples       SST         C – 1    MST = SST / (C – 1)   F = MST / MSE
Within Samples        SSE         nT – C   MSE = SSE / (nT – C)
Total                 SST + SSE   nT – 1

C = number of populations

nT = sum of the sample sizes from all populations

df = degrees of freedom

Page 91: BU255 FINAL Exam-AID Taught by: Greg Overholt

One-Factor ANOVA F-Test Statistic

• Test statistic: F = MST / MSE

MST is the mean square between samples
MSE is the mean square within samples

• Degrees of freedom
– dfC = k – 1 (k = number of populations)
– dfE = nT – k (nT = sum of sample sizes from all populations)

H0: μ1 = μ2 = … = μk

Ha: At least two population means are different

Page 92: BU255 FINAL Exam-AID Taught by: Greg Overholt

Analysis of Variance

Single Factor:
- Comparing 3 independent populations, with the factor being marketing strategy.

Q. Is there enough evidence to support that the sales of this product differ?

Anova: Single Factor (Excel output)

SUMMARY
Groups        Count   Sum     Average   Variance
Convenience   20      11551   577.6     10775.0
Quality       20      13060   653.0     7238.1
Price         20      12173   608.7     8670.2

ANOVA
Source of Variation   SS         df   MS        F      P-value   F crit
Between Groups        57512.23   2    28756.1   3.23   0.0468    3.16
Within Groups         506983.5   57   8894.4
Total                 564495.7   59

F (3.23) is greater than F crit (3.16) and the p-value (0.0468) is below .05. All this says is that at least 2 of the means differ!

Page 93: BU255 FINAL Exam-AID Taught by: Greg Overholt

Example?

• Want to see if there is any difference between the speed of shot for three top brands of hockey sticks.

Bauer (mph): 50, 55, 49
Nike (mph): 58, 53, 57
Easton (mph): 48, 52, 55

Page 94: BU255 FINAL Exam-AID Taught by: Greg Overholt

Example?

Bauer (mph): 50, 55, 49
Nike (mph): 58, 53, 57
Easton (mph): 48, 52, 55

Anova: Single Factor

SUMMARY
Groups   Count   Sum   Average    Variance
Bauer    3       154   51.33333   10.33333
Nike     3       168   56         7
Easton   3       155   51.66667   12.33333

ANOVA
Source of Variation   SS   df   MS   F   P-value   F crit
Between Groups
Within Groups

SOLVE!

Page 95: BU255 FINAL Exam-AID Taught by: Greg Overholt

Example?

Bauer (mph): 50, 55, 49 (avg = 51.33)
Nike (mph): 58, 53, 57 (avg = 56)
Easton (mph): 48, 52, 55 (avg = 51.67)

BIG average = 53

Between Groups SS:
= 3(51.33 – 53)² + 3(56 – 53)² + 3(51.67 – 53)²
= 40.667

Within Groups SS:
= (50 – 51.33)² + (55 – 51.33)² + (49 – 51.33)² + (58 – 56)² + (53 – 56)² + (57 – 56)² + (48 – 51.67)² + (52 – 51.67)² + (55 – 51.67)²
= 59.333

Page 96: BU255 FINAL Exam-AID Taught by: Greg Overholt

Example?

ANOVA so far: Between Groups SS = 40.66667, Within Groups SS = 59.33333

Between groups df = number of manufacturers – 1 = 3 – 1 = 2

Within groups df = number of all trials – number of manufacturers = 9 – 3 = 6

Page 97: BU255 FINAL Exam-AID Taught by: Greg Overholt

Example?

ANOVA
Source of Variation   SS         df   MS       F   P-value   F crit
Between Groups        40.66667   2    20.333
Within Groups         59.33333   6    9.889

MST = SST / df = 40.6667 / 2 = 20.333

MSE = SSE / df = 59.3333 / 6 = 9.889

Page 98: BU255 FINAL Exam-AID Taught by: Greg Overholt

Example?

ANOVA
Source of Variation   SS         df   MS       F       P-value   F crit
Between Groups        40.66667   2    20.333   2.056
Within Groups         59.33333   6    9.889

F = MST / MSE = 20.333 / 9.889 = 2.056

Page 99: BU255 FINAL Exam-AID Taught by: Greg Overholt

Example?

ANOVA
Source of Variation   SS         df   MS       F       P-value   F crit
Between Groups        40.66667   2    20.333   2.056   0.20888   5.143253
Within Groups         59.33333   6    9.889

The P-value is .21. If the significance level is .05, you do not reject the null (.21 > .05, or equivalently F = 2.056 < F crit = 5.14), so there is not enough evidence that the three sticks are different.

[Figure: F distribution with the F-value 2.056 falling below the critical value 5.14]
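The hand calculation on the last few slides can be checked with a short script. This is a minimal sketch in plain Python, using the stick data from the example; the function name `one_way_anova` is made up for illustration. The p-value line uses a closed-form shortcut that only holds when the between-groups df is exactly 2 (as it is here); for any other case you would read the p-value from Excel or a table.

```python
# One-way ANOVA by hand for the hockey stick example (data from the slides).
groups = {
    "Bauer": [50, 55, 49],
    "Nike": [58, 53, 57],
    "Easton": [48, 52, 55],
}


def one_way_anova(samples):
    """Return (SST, SSE, df_between, df_within, MST, MSE, F) for a dict of samples."""
    all_values = [x for xs in samples.values() for x in xs]
    grand_mean = sum(all_values) / len(all_values)
    # Between groups: group size times squared gap between group mean and grand mean.
    sst = sum(len(xs) * (sum(xs) / len(xs) - grand_mean) ** 2 for xs in samples.values())
    # Within groups: squared gap between each point and its own group mean.
    sse = sum((x - sum(xs) / len(xs)) ** 2 for xs in samples.values() for x in xs)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    mst = sst / df_between
    mse = sse / df_within
    return sst, sse, df_between, df_within, mst, mse, mst / mse


sst, sse, df_b, df_w, mst, mse, f_stat = one_way_anova(groups)

# Shortcut valid ONLY because df_between == 2: P(F > f) = (1 + 2f/df_within) ** (-df_within/2).
p_value = (1 + 2 * f_stat / df_w) ** (-df_w / 2)

print(f"SST={sst:.3f} SSE={sse:.3f} MST={mst:.3f} MSE={mse:.3f} F={f_stat:.3f} p={p_value:.4f}")
```

The printed values should line up with the ANOVA table: SS of 40.667 and 59.333, F of about 2.056, and a p-value of about 0.209.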

Page 100: BU255 FINAL Exam-AID Taught by: Greg Overholt

Must be normal and equal variances:
- If nonnormal, replace the test with the Kruskal-Wallis test (making the numbers ordinal) – not covered
- If unequal variances – we CANNOT DO!

NOTES / THOUGHTS:
- Why do we need it? If there are 4 pop's, there are LOTS of pairs of means to compare (6 pairs), and running a t-test on every pair makes the potential for a type 1 error huge. But you still need the t-test for 2 pop's, because ANOVA only says that the means 'differ' (not < or >).
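To put a rough number on "huge", here is a small sketch. It assumes the pairwise t-tests are independent, which is a simplification and not something the slides state, so treat the result as an illustration rather than an exact error rate. The function names are made up for this example.

```python
from math import comb


def pairwise_tests(k):
    """Number of two-population comparisons among k populations."""
    return comb(k, 2)


def familywise_error(k, alpha=0.05):
    """Chance of at least one type 1 error across all pairs (ASSUMES independent tests)."""
    return 1 - (1 - alpha) ** pairwise_tests(k)


print(pairwise_tests(4))              # 6 pairs for 4 populations
print(round(familywise_error(4), 4))  # about 0.2649 at alpha = 0.05
```

So with 4 populations and alpha = 0.05 per test, the chance of at least one false rejection is roughly 26% under that assumption, instead of the 5% you get from a single ANOVA.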

Analysis of Variance

Page 101: BU255 FINAL Exam-AID Taught by: Greg Overholt

Lecture 17 & 18: Correlation and Simple Regression Analysis

Page 102: BU255 FINAL Exam-AID Taught by: Greg Overholt

Linear Regression

REGRESSION:
USED TO: analyze the relationship between interval variables.
• Is there a linear relationship between one variable (dependent variable) and other variables (independent variables)?
• Predict the dependent variable based on the independent ones.

Page 103: BU255 FINAL Exam-AID Taught by: Greg Overholt

Least Squares Method

• Least Squares Method
– The objective of the scatter diagram is to measure the strength and direction of the linear relationship
– Both can be more easily judged by drawing a straight line through the data.
– How to draw that line? LSM!

• This line has the smallest sum of squared distances to all the points on the plot.

Page 104: BU255 FINAL Exam-AID Taught by: Greg Overholt

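Least Squares Method

• LSM creates the line ŷ = b0 + b1x. You calculate b1 first, then for b0 sub in the mean values of x and y, solve for b0 and rewrite.

$$b_1 = \frac{SS_{XY}}{SS_{XX}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}$$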
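The two steps (slope first, then intercept from the means) can be sketched in a few lines of Python. The data points below are hypothetical and only there to make the example runnable; they are not from the car or flight examples.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_xx        # slope
    b0 = y_bar - b1 * x_bar   # intercept: sub in the means and solve
    return b0, b1


# Hypothetical sample, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares(xs, ys)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")
```

For this made-up sample, SSxy = 6 and SSxx = 10, so the slope is 0.6 and the intercept is 4 - 0.6(3) = 2.2.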

Page 105: BU255 FINAL Exam-AID Taught by: Greg Overholt

Linear Regression and Correlation

• Eg: resale value of car with x miles on the odometer

METHOD 1: Standard Error of Estimate
Built from SSE, the sum of squared errors (how far each point is from the line): s_e = √(SSE / (n – 2)). It tells you how good the fit is: the smaller it is, the closer the points are to the line. Relate it to the mean of y: here 0.32 vs a sample mean of 15 ($15,000 for a car on average), which is small.

Page 106: BU255 FINAL Exam-AID Taught by: Greg Overholt

Linear Regression and Correlation

• Eg: resale value of car with x miles on the odometer

METHOD 2: TEST SLOPE (part 1)
If we want to see if there is a relationship (if no relationship, the line is horizontal, so slope = 0):
H0: β = 0

H1: β ≠ 0

Can reject if you know the t-critical, or can just look at the p-value:

P-values on the printout ASSUME a two-tail test!! So here, compare 0.000 with 0.05 or whatever significance level you are given!

Page 107: BU255 FINAL Exam-AID Taught by: Greg Overholt

• Eg: resale value of car with x miles on the odometer

METHOD 2: TEST SLOPE (part 2)
If we want to see if it has a negative relationship (eg: slope < 0):
H0: β = 0

H1: β < 0

Can reject if you know the t-critical, or can just look at the p-value:

SINCE the printout p-value assumes two tails, you need to divide the p-value by 2!! So compare 0.000/2 with .05. This one is easy, but if the p-value were 0.06 the answer would change: 0.06 > .05 for two tails, but 0.06/2 = 0.03 < .05 for one tail. And if you are testing the opposite direction from the sign of the coefficient (eg: looking for > when it is a negative coeff), you need 1 – (p-value/2)!
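The divide-by-2 rule and the "1 minus" case can be written as one small helper. This is a sketch only; `one_tail_p` and its `alternative` labels are names invented for this example, not anything Excel provides.

```python
def one_tail_p(two_tail_p, coefficient, alternative):
    """Convert a printout's two-tail p-value into a one-tail p-value.

    alternative: "less" for H1: beta < 0, "greater" for H1: beta > 0.
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    half = two_tail_p / 2
    # If the coefficient's sign matches the direction being tested, use half the p-value.
    sign_matches = (coefficient < 0) == (alternative == "less")
    return half if sign_matches else 1 - half


print(one_tail_p(0.06, -0.0669, "less"))     # 0.03, reject at .05
print(one_tail_p(0.06, -0.0669, "greater"))  # 0.97, do not reject
```

The slope of -0.0669 is the one from the car example; the 0.06 is the "what if" p-value mentioned on the slide.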

Linear Regression and Correlation

Page 108: BU255 FINAL Exam-AID Taught by: Greg Overholt

Linear Regression and Correlation

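• Eg: resale value of car with x miles on the odometer

METHOD 3: Coefficient of Correlation
Use if you want to see if there is a relationship between the variables. NOTE: R is between -1 and +1

H0: ρ = 0
H1: ρ ≠ 0

0.8052 = pretty strong relationship! (The slope in this example is negative, so the relationship is negative: price goes down as mileage goes up. Excel's Multiple R is never shown as negative.)

BUT… how much can be explained by this model???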

Page 109: BU255 FINAL Exam-AID Taught by: Greg Overholt

Linear Regression and Correlation

• Eg: resale value of car with x miles on the odometer

METHOD 3.5: Coefficient of Determination
Remember: when we square R, we get the coefficient of determination, ie how much of the variation in the dependent variable is due to the independent variable. If it is 1, there is no error and all variation is due to the independent variable; if it is 0, there is no linear relationship between the variables and it is all error.

Here: 0.8052² ≈ 0.648, so about 65% of the variation in the price is determined by the mileage!!
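A short sketch of how R and R squared connect. The `correlation` helper is written out from the sums of squares and uses a tiny hypothetical sample; the only number taken from the slides is the 0.8052.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample coefficient of correlation r = SSxy / sqrt(SSxx * SSyy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy / sqrt(ss_xx * ss_yy)


# Hypothetical points on a perfectly straight downward line: r is -1, R squared is 1.
r_demo = correlation([1, 2, 3], [6, 4, 2])

# Value quoted on the slide for the car example.
r_from_slide = 0.8052
r_squared = r_from_slide ** 2

print(r_demo, round(r_squared, 4))
```

Squaring throws away the sign, which is why R squared alone cannot tell you whether the relationship is positive or negative; check the slope for that.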

Page 110: BU255 FINAL Exam-AID Taught by: Greg Overholt

Confidence Interval

$$b_1 \pm t_{\alpha/2,\,n-2} \cdot s_{b_1}$$

What if they ask you for a 95% confidence interval for the slope?

Use above formula:

b1 = -0.0669 (the slope coefficient)

Sb1 = .005 (Standard Error of the slope)

n = 15 prices were sampled.

ta/2,n-2 = t.025,13: go to the table!!

Page 112: BU255 FINAL Exam-AID Taught by: Greg Overholt

Confidence Interval

$$b_1 \pm t_{\alpha/2,\,n-2} \cdot s_{b_1}$$

What if they ask you for a 95% confidence interval for the slope?

Use above formula:

b0 = 17.25 (intercept, not needed here)

b1 = -0.0669 (slope)

Sb1 = .005 (Standard Error)

n = 15 prices were sampled.

t.025,13 = 2.160

Result = -0.0669 +/- 2.160 * .005 = -0.0669 +/- 0.0108

RANGE: -0.0777 to -0.0561
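The same arithmetic as a quick check, using the slope, standard error and t-value quoted on the slide. The function name is made up for this sketch, and the t-value still has to come from the table.

```python
def slope_confidence_interval(b1, s_b1, t_crit):
    """Return (lower, upper) for b1 +/- t * s_b1."""
    margin = t_crit * s_b1
    return b1 - margin, b1 + margin


lower, upper = slope_confidence_interval(b1=-0.0669, s_b1=0.005, t_crit=2.160)
print(f"{lower:.4f} to {upper:.4f}")
```

Both ends of the interval are below zero, which agrees with the slope test: a slope of 0 (no relationship) is not inside the 95% interval.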

Page 113: BU255 FINAL Exam-AID Taught by: Greg Overholt

With regression done.. What can you do?

Once the regression line is confirmed to be valid, it is suitable as an estimation and prediction tool:

1. Point estimate of y for xo

2. Prediction interval of individual y for xo

3. Confidence interval of average y for xo

Page 114: BU255 FINAL Exam-AID Taught by: Greg Overholt

POINT estimate

Point is used to find out an estimate for ONE particular value of x.

Using the following sample, simple regression analysis gives $\hat{Y} = 1.57 + 0.0407X$

Q. What is the point estimate of the cost for 73 passengers?

1.57 + 0.0407(73) = 4.5411
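A point estimate is just the regression line evaluated at one x. A minimal sketch with the coefficients from this slide as defaults; `predict_cost` is a name chosen for illustration.

```python
def predict_cost(passengers, b0=1.57, b1=0.0407):
    """Point estimate of cost from the fitted line y-hat = b0 + b1 * x."""
    return b0 + b1 * passengers


print(round(predict_cost(73), 4))  # 4.5411
```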

Page 115: BU255 FINAL Exam-AID Taught by: Greg Overholt

Linear Regression and Correlation

Prediction Interval VS Confidence Interval??

• A prediction interval estimates the value of y for one individual item at a given x. A confidence interval estimate is for the expected value (the mean) of y for the whole population at that x.

• The confidence interval estimate of the expected value of y will be narrower than the prediction interval for the same given value of x and confidence level. This is because there is less error in estimating a mean value as opposed to predicting an individual value.

Page 116: BU255 FINAL Exam-AID Taught by: Greg Overholt

Construct the 95% confidence interval for the mean cost for a flight with 73 passengers. Xbar = 77.5, SSxx = 1689, Se = 0.177.

(assume this regression was done with 20 samples)

Confidence Interval for average y given xo

$$\hat{Y} \pm t_{\alpha/2,\,n-2} \cdot S_e \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_{xx}}}$$

What's Ŷ? We found it in the last question: 4.5411

What's t a/2,n-2? t .025,18 = 2.10

Result? 4.5411 +/- 2.10 * 0.177 * √[ 1/20 + (73 - 77.5)^2 / 1689 ]

4.5411 +/- 0.0925, or a 95% confidence interval of 4.448 to 4.633
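The calculation above, step by step in Python. All inputs are the values given on the slide; the function name is an invention for this sketch.

```python
from math import sqrt


def mean_response_interval(y_hat, t_crit, s_e, n, x0, x_bar, ss_xx):
    """Confidence interval for the AVERAGE y at x0."""
    margin = t_crit * s_e * sqrt(1 / n + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat - margin, y_hat + margin


ci_low, ci_high = mean_response_interval(
    y_hat=4.5411, t_crit=2.10, s_e=0.177, n=20, x0=73, x_bar=77.5, ss_xx=1689
)
print(f"{ci_low:.4f} to {ci_high:.4f}")
```

The margin works out to about 0.0925, matching the hand calculation.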

Page 117: BU255 FINAL Exam-AID Taught by: Greg Overholt

Construct the 95% prediction interval for the cost of a particular flight with 73 passengers. Xbar = 77.5, SSxx = 1689, Se = 0.177. (assume this regression was done with 20 samples)

PREDICTION Interval for individual y given xo

$$\hat{Y} \pm t_{\alpha/2,\,n-2} \cdot S_e \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_{xx}}}$$

Only difference? The 1+ in the square root..

Result? 4.5411 +/- 2.10 * 0.177 * √[ 1 + 1/20 + (73 - 77.5)^2 / 1689 ]

4.5411 +/- 0.383, or a 95% PREDICTION interval for the cost of 1 flight of 4.158 to 4.924

NOTE… the range for the Confidence Interval was much smaller: 4.448 to 4.633
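To see the effect of the extra 1 under the square root, this sketch computes both margins with the same inputs from the slide. The `individual` flag and the function name are made up for illustration.

```python
from math import sqrt


def interval_margin(t_crit, s_e, n, x0, x_bar, ss_xx, individual):
    """Margin of error at x0; individual=True adds the 1 for a prediction interval."""
    inside = 1 / n + (x0 - x_bar) ** 2 / ss_xx
    if individual:
        inside += 1
    return t_crit * s_e * sqrt(inside)


inputs = dict(t_crit=2.10, s_e=0.177, n=20, x0=73, x_bar=77.5, ss_xx=1689)
ci_margin = interval_margin(individual=False, **inputs)  # average y
pi_margin = interval_margin(individual=True, **inputs)   # one particular y

y_hat = 4.5411
print(f"prediction interval: {y_hat - pi_margin:.3f} to {y_hat + pi_margin:.3f}")
print(f"PI margin {pi_margin:.3f} vs CI margin {ci_margin:.4f}")
```

The prediction margin (about 0.383) is roughly four times the confidence margin (about 0.0925), which is the "much smaller" range the note is pointing at.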

Page 118: BU255 FINAL Exam-AID Taught by: Greg Overholt

Regression Diagnostics

• Three conditions are required in order to perform a regression analysis. (all on upcoming slides)

1. Error variable must be normally distributed
2. Error variable must have a constant variance
   Heteroscedasticity: when the error variable does not have a constant variance
3. Errors must be independent of each other
   • Can't be correlated (Time series data usually have errors that are correlated)

• How to diagnose? – Residual Analysis. (look at the difference between the actual and predicted results)
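Residuals are simply actual minus predicted. A minimal sketch, using a hypothetical sample and the least squares line that fits it (2.2 + 0.6x); neither comes from the slides.

```python
def residuals(xs, ys, b0, b1):
    """Actual y minus predicted y for each observation."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical sample and its least squares line, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
res = residuals(xs, ys, b0=2.2, b1=0.6)

print([round(r, 2) for r in res])
print(round(sum(res) / len(res), 10))  # mean residual; zero for a least squares line
```

From here the residuals would go into a histogram (normality), a plot against predicted y (constant variance), and a plot against time (independence).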

Page 119: BU255 FINAL Exam-AID Taught by: Greg Overholt

Normality…

We can take the residuals and put them into a histogram to visually check for normality…

…we’re looking for a bell shaped histogram with the mean close to zero.

Page 120: BU255 FINAL Exam-AID Taught by: Greg Overholt

Constant Variance…

When the requirement of a constant variance is violated, we have a condition of heteroscedasticity.

Page 121: BU255 FINAL Exam-AID Taught by: Greg Overholt


Independence

When the requirement of independence is violated, there may be a trend in residuals.

Page 122: BU255 FINAL Exam-AID Taught by: Greg Overholt

Lecture 19 & 20: Multiple Regression

Page 123: BU255 FINAL Exam-AID Taught by: Greg Overholt

Multiple Regression (multiple variables, all first-order)

Many different independent variables can go into these models.

Eg: A Hotel is looking to expand – but they don’t know where in this 1 particular city. Should it be close to the airport? Close to high income housing? To the university?

To find out, sample hotels in the area, find out their details for the various factors you think affect profitability, and then RUN EXCEL to see which factors actually affect profit!!

Page 124: BU255 FINAL Exam-AID Taught by: Greg Overholt

Required Conditions…

For these regression methods to be valid the following four conditions for the error variable must be met:

1. The distribution of the error variable is normal. (draw histogram of errors)

2. The mean of the error variable is 0. (calculate error variable)

3. The standard deviation of the error is constant. (plot residuals vs Y to see if constant)

4. The errors are independent. (plot residuals vs time periods to see if connected)

Page 125: BU255 FINAL Exam-AID Taught by: Greg Overholt

Eg: Hotel profit margin – based on 6 factors.

Multiple Regression (multiple variables, all first-order)

Page 126: BU255 FINAL Exam-AID Taught by: Greg Overholt

Eg: Hotel profit margin – based on 6 factors.

ANOVA (used most for multiple reg.)

The printout gives us the overall quality of this model: Significance F! You could compare F with F-critical, or look at Significance F and compare it to the significance level. Here 0.00 < 0.05, so at least one of the factors has a relationship with profit margin. If Significance F is greater than 0.05, you cannot conclude that any of the factors has a relationship.

Multiple Regression (multiple variables, all first-order)

Page 127: BU255 FINAL Exam-AID Taught by: Greg Overholt

Eg: Hotel profit margin – based on 6 factors.

R Square / Adjusted R Square:

R Square = coefficient of determination.
Adjusted R Square = coefficient of determination adjusted for the number of independent variables. Use this one when there is MORE THAN 1 VARIABLE to determine how much of the variation is from the variables!!
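The slides read Adjusted R Square straight off the Excel printout. For reference, the standard textbook formula is sketched below; it is not shown in the slides, and the numbers passed in are hypothetical, not the hotel printout's values.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R squared for n observations and k independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical inputs: R squared of 0.50 from 100 observations and 6 variables.
print(round(adjusted_r_squared(0.50, n=100, k=6), 4))
```

Adding variables always pushes plain R Square up, so the adjustment penalizes each extra variable; that is why it is the fairer number to quote for a multiple regression.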

Multiple Regression (multiple variables, all first-order)

Page 128: BU255 FINAL Exam-AID Taught by: Greg Overholt

Eg: Hotel profit margin – based on 6 factors.

Variable Relationships?:

Look at the P-values. If a variable's P-value is less than the significance level (using 0.05), it is good: you reject the null that says there is no relationship. The others can be removed.

Multiple Regression (multiple variables, all first-order)

Page 129: BU255 FINAL Exam-AID Taught by: Greg Overholt

Multiple Regression (multiple variables, all first-order)

Eg: Hotel profit margin – based on 6 factors.

Interpret the Coefficients!

+ means that the profit margin goes up! - means that the profit margin goes down.

Page 130: BU255 FINAL Exam-AID Taught by: Greg Overholt

Example Q

• You can predict the outcome if the hotel is built where:
– There are 3815 rooms within 3 miles of the site.
– The closest other hotel or motel is .9 miles away.
– The amount of office space is 476,000 square feet.
– Census data indicates the median household income in the area (rounded to the nearest thousand) is $35,000

Page 131: BU255 FINAL Exam-AID Taught by: Greg Overholt

Multicollinearity
• When 2 or more of your variables are not just related to the dependent variable (eg: profit margin), but are also correlated with each other (eg: if distance to the university goes down, then income of houses goes up or down). There will always be some of this, but a strong correlation between independent variables = multicollinearity.

• WHAT DOES THIS MEAN?: The overall model can still be tested for a relationship (Significance F is ok!), but you cannot tell which individual variables are related (you can't trust the individual t-tests!!)

• FIX?: You can build the model one variable at a time, or delete some. Not needed for final.

Multiple Regression (multiple variables, all first-order)

Page 132: BU255 FINAL Exam-AID Taught by: Greg Overholt

• Indicator Variables (or dummy variables)
– Either 0 or 1
– If there are 3 different colours of cars (and colour may affect the price of cars), then you need 2 dummy variables (1 less than the number of options)
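A sketch of the "1 less than the options" rule. The colour names are hypothetical, and treating the first colour as the baseline (all zeros) is an arbitrary choice for this example.

```python
def dummy_encode(value, categories):
    """Encode value as len(categories) - 1 indicator variables.

    The first category is the baseline and is represented by all zeros.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value}")
    return [1 if value == c else 0 for c in categories[1:]]


colours = ["white", "silver", "black"]  # hypothetical colours
for colour in colours:
    print(colour, dummy_encode(colour, colours))
```

White comes out as [0, 0], silver as [1, 0] and black as [0, 1], so 3 colours need only 2 columns in the regression.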

Multiple Regression (multiple variables, all first-order)

Page 137: BU255 FINAL Exam-AID Taught by: Greg Overholt

Bayes Theorem

• Start with your initial or prior probabilities.
• You get new info.
• So now with new info, you calculate revised or posterior probabilities.
• This process is Bayes Theorem.

Page 138: BU255 FINAL Exam-AID Taught by: Greg Overholt

Bayes Theorem

• Bayes’ theorem is applicable when the events for which we want to compute posterior probabilities are mutually exclusive and their union is the entire sample space

Conditional Probability:

$$P(A_i \mid B) = \frac{P(A_i)\,P(B \mid A_i)}{P(B)}, \qquad P(B) = \sum_j P(A_j)\,P(B \mid A_j)$$

KEY DIFFERENCE: The bottom, P(B), is now found by adding up all the partitions that contain B, since you have them all split up.

Page 139: BU255 FINAL Exam-AID Taught by: Greg Overholt

Bayes Theorem

• Example:
– Two printer cartridge companies, Alamo and Jersey.
– Alamo makes 65% of the cartridges
– Jersey makes 35%.
– Alamo has a defective rate of 8%
– Jersey has a defective rate of 12%
a) Customer purchases a cartridge, prob that Alamo made it?
- Cartridge is tested, and it is defective.
b) What is the probability that Alamo made the cartridge?
c) What is the probability that Jersey made the cartridge?

Page 140: BU255 FINAL Exam-AID Taught by: Greg Overholt

ANSWER

• The knowledge of the producer breakdown is the prior probability:
– Alamo = 65% P(E1)
– Jersey = 35% P(E2)

• We know the conditional probabilities of the defective rates:
– Alamo = 8% P(D|E1)
– Jersey = 12% P(D|E2)

Page 141: BU255 FINAL Exam-AID Taught by: Greg Overholt

ANSWER 1: TABLE

                  Prior   Conditional   Joint   Posterior
Alamo             .65     .08           .052    .052/.094 = .553
Jersey            .35     .12           .042    .042/.094 = .447
Total defective                         .094    1.000

Joint (.052): the odds of getting an Alamo cartridge that is defective if you bought one at Future Shop at random.

Given that you got a defective cartridge: there is a 9.4% chance of getting a defective one, and 5.2 points of that 9.4% are Alamo’s, so you have a 55.3% chance of it being Alamo’s!
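The table method translates directly into code: multiply prior by conditional to get each joint, add the joints, then divide. A minimal sketch using the cartridge numbers from the example; the function name is made up.

```python
def bayes_table(priors, conditionals):
    """Return (joints, total, posteriors) following the prior/conditional/joint table."""
    joints = {name: priors[name] * conditionals[name] for name in priors}
    total = sum(joints.values())  # P(defective), the bottom of Bayes' formula
    posteriors = {name: joint / total for name, joint in joints.items()}
    return joints, total, posteriors


priors = {"Alamo": 0.65, "Jersey": 0.35}        # share of cartridges made
conditionals = {"Alamo": 0.08, "Jersey": 0.12}  # defective rate for each maker

joints, total_defective, posteriors = bayes_table(priors, conditionals)
print(round(total_defective, 3))
print({name: round(p, 3) for name, p in posteriors.items()})
```

The posteriors always add up to 1, which is a handy check on an exam: here .553 + .447 = 1.000.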

Page 142: BU255 FINAL Exam-AID Taught by: Greg Overholt

ANSWER 2: TREE

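Alamo (.65)
    Defective (.08):   .65 x .08 = .052
    Acceptable (.92):  .65 x .92 = .598
Jersey (.35)
    Defective (.12):   .35 x .12 = .042
    Acceptable (.88):  .35 x .88 = .308

Total defective = .052 + .042 = .094

Revised Probability: Alamo = .052 / .094 = .553
Revised Probability: Jersey = .042 / .094 = .447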

Page 143: BU255 FINAL Exam-AID Taught by: Greg Overholt

REMINDERS!

• Financial Accounting Exam-AID Wednesday.

• STATS sessions again Thursday and Friday.
– TELL YOUR FRIENDS!!!

• Interested in going on an outreach trip? SOS is running at least 2 trips (May & August), and other trips nationally. E-mail: [email protected]