Stat 155, Section 2, Last Time

• Binomial Distribution

– Normal Approximation

– Continuity Correction

– Proportions (different scale from “counts”)

• Distribution of Sample Means

– Law of Averages, Part 1

– Normal Data → Normal Mean

– Law of Averages, Part 2: Everything (averaged) → Normal



Reading In Textbook

Approximate Reading for Today’s Material:

Pages 382-396, 400-416

Approximate Reading for Next Class:

Pages 425-428, 431-439


Chapter 6: Statistical Inference

Main Idea:

Form conclusions by

quantifying uncertainty

(will study several approaches,

first is…)


Section 6.1: Confidence Intervals

Background:

The sample mean, $\bar{X}$, is an “estimate”

of the population mean, $\mu$

How accurate?

(there is “variability”, how

much?)

Confidence Intervals

Recall the Sampling Distribution:

$\bar{X} \sim N\left(\mu,\ \sigma/\sqrt{n}\right)$

(maybe an approximation)

Confidence Intervals

Thus understand error as:

[Figure: $\bar{X}$ dist’n, a normal curve with spread $\sigma/\sqrt{n}$]

How to explain to untrained consumers?

(who don’t know randomness,

distributions, normal curves)

Confidence Intervals

Approach: present an interval

With endpoints:

Estimate ± margin of error

I.e. $\bar{X} \pm m$

reflecting variability

How to choose $m$?

Confidence Intervals

Choice of “Confidence Interval radius”,

i.e. margin of error, $m$:

Notes:

• No Absolute Range (i.e. including “everything”) is available

• Because of the infinite tails of the normal dist’n

• So need to specify desired accuracy

Confidence Intervals

HW: 6.1


Confidence Intervals

Choice of margin of error, $m$:

Approach:

• Choose a Confidence Level

• Often 0.95

(e.g. FDA likes this number for approving new drugs, and it is a common standard for publication in many fields)

• And take margin of error to include that part of sampling distribution

Confidence Intervals

E.g. For confidence level 0.95, want

[Figure: $\bar{X}$ distribution, with Area = 0.95 within $m$ of the center, where $m$ = margin of error]

Confidence Intervals

Computation: Recall NORMINV takes

areas (probs), and returns cutoffs

Issue: NORMINV works with lower areas

Note: lower tail

included


Confidence Intervals

So adapt needed probs to lower areas….

When inner area = 0.95,

Right tail = 0.025

Shaded Area = 0.975

So need to compute:

$\mathrm{NORMINV}\left(0.975,\ \mu,\ \sigma/\sqrt{n}\right)$
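The conversion from an inner area to the lower area that NORMINV expects can be sketched in Python. This is only an illustration, not part of the course materials: it assumes the standard-library `statistics.NormalDist`, whose `inv_cdf` method plays the role of NORMINV, and the helper name `lower_area` is made up for this example.

```python
from statistics import NormalDist

def lower_area(confidence_level):
    """Convert an inner (two-sided) area into the lower area NORMINV needs."""
    right_tail = (1 - confidence_level) / 2   # 0.025 when level = 0.95
    return 1 - right_tail                     # 0.975 when level = 0.95

# Standard normal cutoff, the analogue of NORMINV(0.975, 0, 1)
z = NormalDist(mu=0, sigma=1).inv_cdf(lower_area(0.95))
print(round(z, 2))  # 1.96
```

The same helper gives 0.95 for level 0.90 and 0.995 for level 0.99.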


Confidence Intervals

Need to compute:

$\mathrm{NORMINV}\left(0.975,\ \mu,\ \sigma/\sqrt{n}\right)$

Major problem: $\mu$ is unknown

• But should answer depend on $\mu$?

• “Accuracy” is only about spread

• Not centerpoint

• Need another view of the problem

Confidence Intervals

Approach to unknown $\mu$:

Recenter, i.e. look at dist’n of $\bar{X} - \mu$

Key concept:

Centered at 0

Now can calculate as:

$m = \mathrm{NORMINV}\left(0.975,\ 0,\ \sigma/\sqrt{n}\right)$
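A quick numerical check of the recentering idea, sketched in Python under the assumption that `statistics.NormalDist.inv_cdf` stands in for NORMINV. The function name `upper_cutoff` and the values of `sigma`, `n` and the trial centers are illustrative choices, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

def upper_cutoff(mu, sigma, n, lower_area=0.975):
    """Analogue of NORMINV(lower_area, mu, sigma / sqrt(n))."""
    return NormalDist(mu=mu, sigma=sigma / sqrt(n)).inv_cdf(lower_area)

sigma, n = 10, 15                 # illustrative values only
m = upper_cutoff(0, sigma, n)     # recentered: NORMINV(0.975, 0, sigma/sqrt(n))

# For any center mu, the cutoff sits m above mu,
# so the margin of error does not depend on the unknown mu.
for mu in (0, 50, 123.8):
    print(mu, round(upper_cutoff(mu, sigma, n) - mu, 4), round(m, 4))
```

Each printed row shows the same distance from the center, which is why the calculation can be done with center 0.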


Confidence Intervals

Computation of:

$m = \mathrm{NORMINV}\left(0.975,\ 0,\ \sigma/\sqrt{n}\right)$

Smaller Problem: Don’t know $\sigma$

Approach 1: Estimate $\sigma$ with $s$

• Leads to complications

• Will study later

Approach 2: Sometimes know $\sigma$

Confidence Intervals

E.g. Crop researchers plant 15 plots

with a new variety of corn. The

yields, in bushels per acre are:

138, 139.1, 113, 132.5, 140.7, 109.7, 118.9, 134.8, 109.6, 127.3, 115.6, 130.4, 130.2, 111.7, 105.5

Assume that $\sigma$ = 10 bushels / acre

Confidence Intervals

E.g. Find:

a) The 90% Confidence Interval for the mean value $\mu$, for this type of corn.

b) The 95% Confidence Interval.

c) The 99% Confidence Interval.

d) How do the CIs change as the confidence level increases?

Solution, part 1 of: http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg22.xls
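For readers without the spreadsheet, the same calculation can be sketched in Python. This is not the class solution itself: it assumes `statistics.NormalDist.inv_cdf` as a stand-in for NORMINV, and the function name `z_interval` is made up here. The data and the known value of 10 bushels / acre for the standard deviation come from the example above.

```python
from math import sqrt
from statistics import NormalDist, mean

# Corn yields (bushels per acre) from the example; sigma = 10 is assumed known.
yields = [138, 139.1, 113, 132.5, 140.7, 109.7, 118.9, 134.8,
          109.6, 127.3, 115.6, 130.4, 130.2, 111.7, 105.5]
sigma = 10

def z_interval(data, sigma, level):
    """Return (lower, upper) for a known-sigma confidence interval."""
    x_bar = mean(data)
    lower_area = 1 - (1 - level) / 2          # e.g. 0.975 for level 0.95
    m = NormalDist(0, sigma / sqrt(len(data))).inv_cdf(lower_area)
    return x_bar - m, x_bar + m

for level in (0.90, 0.95, 0.99):
    lo, hi = z_interval(yields, sigma, level)
    print(f"{level:.0%} CI: ({lo:.2f}, {hi:.2f})  width = {hi - lo:.2f}")
```

The printed widths grow with the confidence level, which is the pattern part d) asks about.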


Confidence Intervals

An EXCEL shortcut:

CONFIDENCE

Careful: parameter $\alpha$ is:

2 tailed outer area

So for level = 0.90, $\alpha$ = 0.10
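The meaning of the 2-tailed outer area can be made concrete with a small Python function patterned on Excel's CONFIDENCE(alpha, standard_dev, size). This is a sketch assuming `statistics.NormalDist` for the normal cutoff; the Python function `confidence` is written for this example and is not a library call.

```python
from math import sqrt
from statistics import NormalDist

def confidence(alpha, standard_dev, size):
    """Margin of error, patterned on Excel's CONFIDENCE(alpha, standard_dev, size).

    alpha is the 2-tailed outer area, i.e. 1 - confidence level.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)   # lower area = 1 - alpha/2
    return z * standard_dev / sqrt(size)

# Level 0.90 corresponds to alpha = 0.10 (corn example: sigma = 10, n = 15)
m = confidence(0.10, 10, 15)
print(round(m, 2))
```

Passing 0.90 instead of 0.10 would give a far too small margin of error, which is the mistake the "Careful" note warns about.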


Confidence Intervals

HW: 6.5, 6.9, 6.13, 6.15, 6.19


Choice of Sample Size

Additional use of margin of error idea

Background: $\bar{X}$ distributions

[Figure: two $\bar{X}$ distributions, labeled Small n and Large n, each with spread $\sigma/\sqrt{n}$]

Choice of Sample Size

Could choose n to make $\sigma/\sqrt{n}$ = desired value

But S. D. is not very interpretable, so make “margin of error”, m = desired value

Then get: “$\bar{X}$ is within m units of $\mu$,

95% of the time”

Choice of Sample Size

Given m, how do we find n?

Solve for n (the equation):

$$0.95 = P\left\{\left|\bar{X} - \mu\right| \le m\right\} = P\left\{\left|\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right| \le \frac{m}{\sigma/\sqrt{n}}\right\} = P\left\{|Z| \le \frac{\sqrt{n}\,m}{\sigma}\right\}$$

Choice of Sample Size

Graphically, find m so that:

[Figure: two standard normal curves. Left: Area = 0.95 between $-\sqrt{n}\,m/\sigma$ and $\sqrt{n}\,m/\sigma$. Right: Area = 0.975 below $\sqrt{n}\,m/\sigma$]

Choice of Sample Size

Thus solve:

$$\frac{\sqrt{n}\,m}{\sigma} = \mathrm{NORMINV}(0.975, 0, 1)$$

$$\sqrt{n} = \frac{\sigma}{m}\,\mathrm{NORMINV}(0.975, 0, 1)$$

$$n = \left(\frac{\sigma}{m}\,\mathrm{NORMINV}(0.975, 0, 1)\right)^2$$

Choice of Sample Size

Numerical fine points:

$$n = \left(\frac{\sigma}{m}\,\mathrm{NORMINV}(0.975, 0, 1)\right)^2$$

• Change the 0.975 for coverage prob. ≠ 0.95

• Round decimals upwards,

To be “sure of desired coverage”
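The sample size rule, n = ((sigma / m) × NORMINV(0.975, 0, 1))², with both fine points applied, can be sketched in Python. This assumes `statistics.NormalDist.inv_cdf` in place of NORMINV; the function name `sample_size` and the inputs in the usage line are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size(m, sigma, level=0.95):
    """Smallest n whose margin of error is at most m, for known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # 0.975 when level = 0.95
    return ceil((z * sigma / m) ** 2)               # round decimals upwards

# Hypothetical inputs: sigma = 10, desired margin of error m = 2
print(sample_size(2, 10))         # 97
print(sample_size(2, 10, 0.99))   # 166
```

Rounding down to 96 would leave the margin of error slightly above 2, so the coverage would fall just short of the target.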


Choice of Sample Size

EXCEL Implementation:

$$n = \left(\frac{\sigma}{m}\,\mathrm{NORMINV}(0.975, 0, 1)\right)^2$$

Class Example 22, Part 2: http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg22.xls

HW: 6.22 (1945), 6.23

Interpretation of Conf. Intervals

2 Equivalent Views:

[Figure, pic 1: $\bar{X}$ distribution, with 95% of the area between $\mu - m$ and $\mu + m$]

[Figure, pic 2: $\bar{X} - \mu$ distribution, with 95% of the area between $-m$ and $m$, centered at 0]

Interpretation of Conf. Intervals

Mathematically:

pic 1: $0.95 = P\{\mu - m \le \bar{X} \le \mu + m\}$

pic 2: $= P\{-m \le \bar{X} - \mu \le m\}$

no pic: $= P\{\bar{X} - m \le \mu \le \bar{X} + m\}$

$= P\{\text{the C.I. } [\bar{X} - m,\ \bar{X} + m] \text{ “brackets” } \mu\}$

Interpretation of Conf. Intervals

Frequentist View: If we repeat the

experiment many times,

about 95% of the time, the CI will contain $\mu$

(and 5% of the time it won’t)

Confidence Intervals

Nice Illustration:

Publisher’s Website

• Statistical Applets

• Confidence Intervals

Shows proper interpretation:

• If repeat drawing the sample

• Interval will cover truth 95% of time
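The repeated-sampling interpretation can also be simulated directly. The Python sketch below is an illustration only, not the publisher's applet: the population values (mu = 124, sigma = 10), the sample size, the number of trials and the function name `coverage` are all assumptions chosen for this example.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def coverage(mu, sigma, n, level=0.95, trials=4000, seed=0):
    """Fraction of simulated known-sigma CIs that contain the true mu."""
    rng = random.Random(seed)                 # fixed seed for repeatability
    m = NormalDist(0, sigma / sqrt(n)).inv_cdf(1 - (1 - level) / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - m <= mu <= x_bar + m:      # does this CI bracket mu?
            hits += 1
    return hits / trials

# Hypothetical population: mu = 124, sigma = 10, samples of size 15
print(coverage(mu=124, sigma=10, n=15))   # should land near 0.95
```

Any single interval either covers the truth or misses it; the 95% describes the long-run fraction across repeated samples.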


Interpretation of Conf. Intervals

Revisit Class Example 17: http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg17.xls

Recall Class HW:

Estimate % of Male Students at UNC

C.I. View: Class Example 23: http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg23.xls

Illustrates idea:

CI should cover 95% of time


Interpretation of Conf. Intervals

Class Example 23: http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg23.xls

Q1: SD too small      Too many cover
Q2: SD too big        Too few cover
Q3: Big Bias          Too few cover
Q4: Good sampling     About right
Q5: Simulated Bi      Shows “natural var’n”

Interpretation of Conf. Intervals

HW: 6.27, 6.29, 6.31


And now for something completely different…

A fun dance video:

http://ebaumsworld.com/2006/07/robotdance.html

Suggested by David Moltz


Sec. 6.2 Tests of Significance

= Hypothesis Tests

Big Picture View:

Another way of handling random error

I.e. a different view point

Idea: Answer yes or no questions, under uncertainty

(e.g. from sampling or measurement error)


Hypothesis Tests

Some Examples:

• Will Candidate A win the election?

• Does smoking cause cancer?

• Is Brand X better than Brand Y?

• Is a drug effective?

• Is a proposed new business strategy effective?

(marketing research focuses on this)


Hypothesis Tests

E.g. A fast food chain currently brings in

profits of $20,000 per store, per day. A

new menu is proposed. Would it be

more profitable?

Test: Have 10 stores (randomly selected!)

try the new menu, let $\bar{X}$ = average of

their daily profits.

Fast Food Business Example

Simplest View: for $\bar{X} > \$20{,}000$:

new menu looks better.

Otherwise looks worse.

Problem: New menu might be no better (or

even worse), but could have $\bar{X} > \$20{,}000$

by bad luck of sampling

(only sample of size 10)

Fast Food Business Example

Problem: How to handle & quantify gray area in these decisions.

Note: Can never make a definite conclusion e.g. as in Mathematics,

Statistics is more about real life…

(E.g. even if $\bar{X} = \$0$ or $\bar{X} = \$1{,}000{,}000$, that might be bad luck of sampling, although very unlikely)

Hypothesis Testing

Note: Can never make a definite conclusion,

Instead measure strength of evidence.

Approach I: (note: different from text)

Choose among 3 Hypotheses:

H+: Strong evidence new menu is better

H0: Evidence is inconclusive

H-: Strong evidence new menu is worse


Caution!!!

• Not following text right now

• This part of course can be slippery

• I am “breaking this down to basics”

• Easier to understand

(If you pay careful attention)

• Will “tie things together” later

• And return to textbook approach later


Hypothesis Testing

Terminology:

H0 is called null hypothesis

Setup: H+, H0, H- are in terms of

parameters, i.e. population quantities

(recall population vs. sample)


Fast Food Business Example

E.g. Let $\mu$ = true (over all stores) daily

profit from new menu.

H+: $\mu > \$20{,}000$ (new is better)

H0: $\mu = \$20{,}000$ (about the same)

H-: $\mu < \$20{,}000$ (new is worse)

Fast Food Business Example

Base decision on best guess: $\bar{X}$

Will quantify strength of the evidence using

probability distribution of $\bar{X}$

E.g. $\bar{X} > \$20{,}000$: Choose H+

$\bar{X} \approx \$20{,}000$: Choose H0

$\bar{X} < \$20{,}000$: Choose H-

Fast Food Business Example

How to draw line?

(There are many ways,

here is traditional approach)

Insist that H+ (or H-) show strong evidence

I.e. They get burden of proof

(Note: one way of solving

gray area problem)


Fast Food Business Example

Assess strength of evidence by asking:

“How strange is observed value $\bar{X}$,

assuming H0 is true?”

In particular, use tails of H0 distribution as

measure of strength of evidence

Fast Food Business Example

Use tails of H0 distribution as measure of

strength of evidence:

[Figure: $\bar{X}$ distribution under H0, centered at \$20k, with the observed value of $\bar{X}$ marked and the tail area beyond it indicated]

Use this probability to measure

strength of evidence

Hypothesis Testing

Define the p-value, for either H+ or H-, as:

P{what was seen, or more conclusive | H0}

Note 1: a small p-value means strong evidence against H0, i.e. for H+ (or H-)

Note 2: p-value is also called observed significance level.


Fast Food Business Example

Suppose observe: $\bar{X} = \$21{,}000$, $s = \$2{,}400$,

based on $n = 10$

Note $\bar{X} > \$20{,}000$, but is this conclusive?

or could this be due to natural sampling variation?

(i.e. do we risk losing money from new menu?)

Fast Food Business Example

Assess evidence for H+ by:

H+ p-value = Area

[Figure: $\bar{X}$ dist’n, $N\left(\$20{,}000,\ \$2{,}400/\sqrt{10}\right)$, with the area to the right of \$21,000 indicated; axis marks at \$20,000 and \$21,000]

Fast Food Business Example

Computation in EXCEL:

Class Example 22, Part 1: http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg24.xls

P-value = 0.094.

“1 in 10”, “could be random variation”,

“not very strong evidence”
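The H+ p-value can be reproduced outside EXCEL with a short Python sketch. It follows the setup of this example (an observed average of $21,000, s = $2,400, n = 10, and the H0 value of $20,000) and, like the slide, treats s as if it were the known standard deviation. `statistics.NormalDist` is assumed as the normal-curve calculator, and the function name `p_value_h_plus` is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def p_value_h_plus(x_bar, mu0, s, n):
    """Upper-tail area beyond x_bar under the H0 distribution N(mu0, s/sqrt(n))."""
    h0_dist = NormalDist(mu=mu0, sigma=s / sqrt(n))
    return 1 - h0_dist.cdf(x_bar)

p = p_value_h_plus(x_bar=21_000, mu0=20_000, s=2_400, n=10)
print(round(p, 3))  # 0.094, matching the value quoted above
```

A result this far above $20,000 would turn up roughly 1 time in 10 even if the new menu were no better, which is why the evidence is described as not very strong.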