NORMAL AND STANDARD NORMAL DISTRIBUTION Avjinder Singh Kaler and Kristi Mai


Page 1: Normal and standard normal distribution

NORMAL AND STANDARD NORMAL DISTRIBUTION

Avjinder Singh Kaler and Kristi Mai

Page 2: Normal and standard normal distribution

Normal and Standard Normal Distribution

Sampling Distributions and Estimators

Hypothesis Testing

Testing a Claim about a Population Proportion

Page 3: Normal and standard normal distribution

We can find areas (probabilities) for different regions under a normal model using StatCrunch.

Page 4: Normal and standard normal distribution

A bone mineral density test can be helpful in identifying the presence of osteoporosis.

The result of the test is commonly measured as a z score, which has a normal distribution with a mean of 0 and a standard deviation of 1.

A randomly selected adult undergoes a bone density test.

Find the probability that the result is a reading less than 1.27.

Page 5: Normal and standard normal distribution

The probability of a randomly selected adult having a bone density reading less than 1.27 is 0.8980.

$P(z < 1.27) = 0.8980$

Page 6: Normal and standard normal distribution

Using the same bone density test, find the probability that a randomly selected person has a result above –1.00 (which is considered to be in the “normal” range of bone density readings).

The probability of a randomly selected adult having a bone density above –1 is 0.8413.

Page 7: Normal and standard normal distribution

A bone density reading between –1.00 and –2.50 indicates the subject has osteopenia. Find this probability.

The probability of a randomly selected adult having osteopenia is 0.1525.
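
The three areas above can also be checked outside StatCrunch. The following is a minimal sketch that uses `statistics.NormalDist` from Python's standard library; the variable names are illustrative and do not come from the slides.

```python
from statistics import NormalDist

# Standard normal model for bone density z scores: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

# Area to the left of 1.27: P(z < 1.27)
p_below = z.cdf(1.27)

# Area to the right of -1.00: P(z > -1.00)
p_above = 1 - z.cdf(-1.00)

# Area between -2.50 and -1.00: P(-2.50 < z < -1.00)
p_between = z.cdf(-1.00) - z.cdf(-2.50)

print(f"P(z < 1.27)           = {p_below:.4f}")
print(f"P(z > -1.00)          = {p_above:.4f}")
print(f"P(-2.50 < z < -1.00)  = {p_between:.4f}")
```

The first two results agree with 0.8980 and 0.8413. The third comes out near 0.1524; the 0.1525 quoted above presumably reflects subtracting rounded table values (0.1587 - 0.0062).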

Page 8: Normal and standard normal distribution

$P(a < z < b)$ denotes the probability that the z score is between a and b.

$P(z > a)$ denotes the probability that the z score is greater than a.

$P(z < a)$ denotes the probability that the z score is less than a.

Page 9: Normal and standard normal distribution

Finding the 95th Percentile

The 95th percentile is the z score with 5% (0.05) of the area to its right: z = 1.645 (the z score will be positive).

Page 10: Normal and standard normal distribution

Using the same bone density test, find the bone density score that separates the bottom 2.5% and the score that separates the top 2.5%.
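
Percentile questions run the normal model in reverse: an area is given and the z score is wanted. As a hedged sketch (again using `statistics.NormalDist`, with illustrative variable names), the inverse cumulative distribution function returns the z score that has a given area to its left.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# 95th percentile: the z score with 95% of the area to its left.
p95 = z.inv_cdf(0.95)

# Score separating the bottom 2.5%: area 0.025 to its left.
bottom = z.inv_cdf(0.025)

# Score separating the top 2.5%: area 0.975 to its left.
top = z.inv_cdf(0.975)

print(f"95th percentile:       z = {p95:.3f}")
print(f"bottom 2.5% cutoff:    z = {bottom:.2f}")
print(f"top 2.5% cutoff:       z = {top:.2f}")
```

The two cutoffs are symmetric about 0, at roughly -1.96 and 1.96.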

Page 11: Normal and standard normal distribution

A parameter is a numerical measurement describing some characteristic of a population.

Page 12: Normal and standard normal distribution

A statistic is a numerical measurement describing some characteristic of a sample.

Page 13: Normal and standard normal distribution

The population distribution is the distribution from which we draw the sample.

It includes all individual data values from the population, so the values of the relevant population parameters are fixed.

We are interested in estimating the unknown population parameters.

Page 14: Normal and standard normal distribution

The data distribution is the distribution of the sample data.

It includes all individual data values from a sample and is what we generally observe in practice.

The data distribution can be described by sample statistics.

Data distributions differ from one random sample to another.

The law of large numbers applies here, making the data distribution of larger samples closer in appearance to the underlying population distribution.

Page 15: Normal and standard normal distribution

The sampling distribution of a statistic (such as the sample proportion) is the distribution of all values of the statistic when all possible samples of the same size n are taken from the same population.

The sampling distribution of a statistic is typically represented as a probability distribution in the format of a table, probability histogram, or formula.

Page 16: Normal and standard normal distribution

The sampling distribution of the proportion is the distribution of sample proportions, with all samples having the same sample size n taken from the same population.

Notation:

p = population proportion

$\hat{p}$ = sample proportion

Note:

$\hat{p} = \frac{x}{n}$, where x is the number of successes and n is the sample size

Page 17: Normal and standard normal distribution

Sample proportions target the value of the population proportion.

The mean (center) of the distribution of sample proportions is the population proportion.

The distribution of the sample proportion tends to be a normal distribution.

Page 18: Normal and standard normal distribution

Consider repeating this process:

Roll a die 5 times. Find the proportion of odd numbers among the results.

Page 19: Normal and standard normal distribution

All outcomes are equally likely, so the population proportion of odd numbers is 0.50; the proportion in the 10,000 trials is 0.50. If continued indefinitely, the mean of the sample proportions will be 0.50. Also, notice that the distribution is “approximately normal.”

Specific results from 10,000 trials
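
The die-rolling process can be repeated in software. This is a minimal simulation sketch, not the procedure used to produce the slide's results; the seed, constant names, and helper function are assumptions made for illustration.

```python
import random
from collections import Counter

random.seed(0)  # arbitrary fixed seed so the run is repeatable

TRIALS = 10_000  # number of times the process is repeated
ROLLS = 5        # rolls of the die per trial

def proportion_odd(rolls):
    """Roll a fair die `rolls` times and return the proportion of odd results."""
    odd = sum(1 for _ in range(rolls) if random.randint(1, 6) % 2 == 1)
    return odd / rolls

# One sample proportion per trial.
proportions = [proportion_odd(ROLLS) for _ in range(TRIALS)]
mean_proportion = sum(proportions) / TRIALS

print(f"mean of sample proportions: {mean_proportion:.3f}")

# Relative frequency of each possible sample proportion (0.0, 0.2, ..., 1.0).
for value, count in sorted(Counter(proportions).items()):
    print(f"p-hat = {value:.1f}: relative frequency {count / TRIALS:.3f}")
```

The mean of the simulated sample proportions should land close to 0.50, and the relative frequencies should peak in the middle and fall off symmetrically toward 0 and 1.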

Page 20: Normal and standard normal distribution

We use a sample proportion to estimate the value of a population proportion.

The sample proportion is the best point estimate of the population proportion.

We can use a sample proportion to construct a confidence interval to estimate the true value of a population proportion, and we should know how to interpret such confidence intervals.

We should know how to find the sample size necessary to estimate a population proportion.

Page 21: Normal and standard normal distribution

A point estimate is a single value (or point) used to approximate a population parameter.

Page 22: Normal and standard normal distribution

The sample proportion $\hat{p}$ is the best point estimate of the population proportion p.

Page 23: Normal and standard normal distribution

The Pew Research Center conducted a survey of 1007 adults and found that 85% of them know what Twitter is.

The best point estimate of p, the population proportion, is the sample proportion:

$\hat{p} = 0.85$

Page 24: Normal and standard normal distribution

A confidence interval (or interval estimate) is a range (or an interval) of values used to estimate the true value of a population parameter.

A confidence interval (C.I.) is needed to understand how good or bad a point estimate may be.

A C.I. gives us information about the accuracy of the estimate.

Page 25: Normal and standard normal distribution

A confidence level is the probability 1 – 𝛼 (often expressed as the equivalent percentage value) that the confidence interval actually contains the true population parameter, assuming that the estimation process is repeated a large number of times.

The confidence level is also called the degree of confidence, or the confidence coefficient.

The most common choices are 90% (α = 0.10), 95% (α = 0.05), and 99% (α = 0.01).

Page 26: Normal and standard normal distribution

We must be careful to interpret confidence intervals correctly. There is a correct interpretation and many different and creative incorrect interpretations of the confidence interval 0.828 < p < 0.872.

“We are 95% confident that the interval from 0.828 to 0.872 actually does contain the true value of the population proportion p.”

This means that if we were to select many different samples of size 1007 and construct the corresponding confidence intervals, 95% of them would actually contain the true value of the population proportion p.

(Note that in this correct interpretation, the level of 95% refers to the success rate of the process being used to estimate the proportion.)
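
The "success rate of the process" reading can be illustrated by simulation. The sketch below is an assumption-laden illustration: it pretends the true proportion is 0.85 (in practice p is unknown), draws many samples of size 1007, and builds each interval with the margin of error $E = z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$ that the later slides define. The seed, constants, and function name are hypothetical.

```python
import random
from math import sqrt

random.seed(1)  # arbitrary fixed seed so the run is repeatable

P_TRUE = 0.85   # assumed "true" population proportion, for simulation only
N = 1007        # sample size
Z = 1.96        # critical value for 95% confidence
REPEATS = 2000  # number of simulated samples

def interval_contains_p():
    """Draw one sample, build its 95% interval, and report whether it covers P_TRUE."""
    successes = sum(1 for _ in range(N) if random.random() < P_TRUE)
    p_hat = successes / N
    margin = Z * sqrt(p_hat * (1 - p_hat) / N)
    return p_hat - margin < P_TRUE < p_hat + margin

coverage = sum(interval_contains_p() for _ in range(REPEATS)) / REPEATS
print(f"fraction of intervals containing p: {coverage:.3f}")
```

The printed fraction should be close to 0.95. Any single interval either contains p or it does not; the 95% describes how often the procedure succeeds over many samples.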

Page 27: Normal and standard normal distribution

A critical value is the number on the borderline separating sample statistics that are likely to occur from those that are unlikely to occur.

The number $z_{\alpha/2}$ is a critical value that is a z score with the property that it separates an area of $\alpha/2$ in the right tail of the standard normal distribution.

Page 28: Normal and standard normal distribution

Critical Values

(Figure: the critical values $z_{\alpha/2}$ marked on the standard normal curve.)

Page 29: Normal and standard normal distribution

Confidence Level    α       Critical Value, $z_{\alpha/2}$
90%                 0.10    1.645
95%                 0.05    1.96
99%                 0.01    2.575
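
The critical values in the table can be reproduced from the definition: $z_{\alpha/2}$ has an area of α/2 to its right, so it has an area of 1 − α/2 to its left. A minimal sketch, with a hypothetical helper name:

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """Return z_(alpha/2) for a two-sided confidence level such as 0.95."""
    alpha = 1 - confidence_level
    # Area alpha/2 in the right tail means area 1 - alpha/2 to the left.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {critical_value(level):.3f}")
```

For 99% this gives about 2.576; the 2.575 in the table is a common textbook rounding of the same value.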

Page 30: Normal and standard normal distribution

The margin of error, denoted by E, is the maximum likely difference (with probability 1 – α, such as 0.95) between the observed sample statistic and the true value of the population parameter.

Page 31: Normal and standard normal distribution

The margin of error E is also called the maximum error of the estimate and can be found by multiplying the critical value and the standard deviation of the sample proportions:

$E = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$

Page 32: Normal and standard normal distribution

p = population proportion

$\hat{p}$ = sample proportion

n = number of sample values

E = margin of error

$z_{\alpha/2}$ = z score separating an area of α/2 in the right tail of the standard normal distribution.

Page 33: Normal and standard normal distribution

1. The sample is a simple random sample (SRS).

2. The conditions for the binomial distribution are satisfied:

• there is a fixed number of trials

• the trials are independent

• there are two categories of outcomes: success and failure

• the probabilities remain constant for each trial

3. There are at least 5 successes and 5 failures.

Page 34: Normal and standard normal distribution

$\hat{p} - E < p < \hat{p} + E$

where

$E = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$

Page 35: Normal and standard normal distribution

$\hat{p} \pm E$

$(\hat{p} - E,\ \hat{p} + E)$

Page 36: Normal and standard normal distribution

In the Chapter Problem we noted that a Pew Research Center poll of 1007 randomly selected adults showed that 85% of respondents know what Twitter is. The sample results are n = 1007 and $\hat{p} = 0.85$.

a) Find the 95% confidence interval estimate of the population proportion p.

b) Based on the results, can we safely conclude that more than 75% of adults know what Twitter is?

Page 37: Normal and standard normal distribution

Requirement check:

simple random sample

fixed number of trials, 1007

trials are independent

two outcomes per trial

probability remains constant

Note: the numbers of successes and failures are both at least 5.

Page 38: Normal and standard normal distribution

a) Based on StatCrunch, the 95% confidence interval:

$0.828 < p < 0.872$

b) Based on the confidence interval obtained in part (a), it does appear that more than 75% of adults know what Twitter is.

Because the limits of 0.828 and 0.872 are likely to contain the true population proportion, it appears that the population proportion is a value greater than 0.75.
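
The interval reported by StatCrunch can be reproduced by hand from $\hat{p} \pm E$. The following is a sketch under the assumption that the interval is the standard one built from $E = z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence_level=0.95):
    """Return (lower, upper) limits of p_hat - E < p < p_hat + E."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)  # critical value
    q_hat = 1 - p_hat
    margin = z * sqrt(p_hat * q_hat / n)  # margin of error E
    return p_hat - margin, p_hat + margin

lower, upper = proportion_confidence_interval(p_hat=0.85, n=1007)
print(f"{lower:.3f} < p < {upper:.3f}")

# Part (b): the whole interval lies above 0.75.
print("entire interval above 0.75:", lower > 0.75)
```

Rounded to three decimal places, the limits match 0.828 and 0.872.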

Page 39: Normal and standard normal distribution

Margin of Error:

$E = \frac{(\text{upper confidence limit}) - (\text{lower confidence limit})}{2}$

Point estimate of p:

$\hat{p} = \frac{(\text{upper confidence limit}) + (\text{lower confidence limit})}{2}$
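
As a quick check of these two formulas, the point estimate and margin of error can be recovered from the Twitter interval limits 0.828 and 0.872. A minimal sketch with a hypothetical helper name:

```python
def estimate_from_limits(lower, upper):
    """Recover the point estimate and margin of error from confidence limits."""
    p_hat = (upper + lower) / 2   # midpoint of the interval
    margin = (upper - lower) / 2  # half the width of the interval
    return p_hat, margin

p_hat, margin = estimate_from_limits(0.828, 0.872)
print(f"point estimate = {p_hat:.3f}, margin of error = {margin:.3f}")
```

This returns the survey's sample proportion of 0.85 and a margin of error of 0.022.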

Page 40: Normal and standard normal distribution

Suppose we want to collect sample data in order to estimate some population proportion.

The question is: how many sample items must be obtained?

Requirement:

• The sample must be a simple random sample (SRS).

Page 41: Normal and standard normal distribution

$E = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$

(solve for n by algebra)

$n = \frac{(z_{\alpha/2})^2\,\hat{p}\hat{q}}{E^2}$

Page 42: Normal and standard normal distribution

When an estimate of $\hat{p}$ is known:

$n = \frac{(z_{\alpha/2})^2\,\hat{p}\hat{q}}{E^2}$

When no estimate of $\hat{p}$ is known:

$n = \frac{(z_{\alpha/2})^2 \cdot 0.25}{E^2}$

Page 43: Normal and standard normal distribution

If the computed sample size n is not a whole number, round the value of n up to the next larger whole number.

Page 44: Normal and standard normal distribution

Many companies are interested in knowing the percentage of adults who buy clothing online. How many adults must be surveyed in order to be 95% confident that the sample percentage is in error by no more than three percentage points?

a) Use a recent result from the Census Bureau: 66% of adults buy clothing online.

b) Assume that we have no prior information suggesting a possible value of the proportion.

Page 45: Normal and standard normal distribution

a) Use

$\hat{p} = 0.66$ and $\hat{q} = 1 - \hat{p} = 0.34$

$\alpha = 0.05$, so $z_{\alpha/2} = 1.96$

$E = 0.03$

$n = \frac{(z_{\alpha/2})^2\,\hat{p}\hat{q}}{E^2} = \frac{(1.96)^2 (0.66)(0.34)}{(0.03)^2} = 957.839$, which rounds up to 958.

To be 95% confident that our sample percentage is within three percentage points of the true percentage for all adults, we should obtain a simple random sample of 958 adults.

Page 46: Normal and standard normal distribution

b) Use

$\alpha = 0.05$, so $z_{\alpha/2} = 1.96$

$E = 0.03$

$n = \frac{(z_{\alpha/2})^2 \cdot 0.25}{E^2} = \frac{(1.96)^2 (0.25)}{(0.03)^2} = 1067.1111$, which rounds up to 1068.

To be 95% confident that our sample percentage is within three percentage points of the true percentage for all adults, we should obtain a simple random sample of 1068 adults.
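
Both parts of the clothing example follow the same recipe, so they can be sketched as one function. The function name and its parameters are assumptions for illustration; passing no prior estimate falls back to $\hat{p}\hat{q} = 0.25$, and the result is always rounded up.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(margin, confidence_level=0.95, p_hat=None):
    """Sample size needed to estimate a proportion within `margin`.

    If no prior estimate p_hat is available, use p_hat * q_hat = 0.25.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)  # critical value
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return ceil(z ** 2 * pq / margin ** 2)  # always round up

print("a) with prior estimate 0.66:", required_sample_size(0.03, p_hat=0.66))
print("b) with no prior estimate:  ", required_sample_size(0.03))
```

The sketch uses the unrounded critical value rather than 1.96, which changes the intermediate decimals slightly but gives the same rounded-up sample sizes, 958 and 1068.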

Page 47: Normal and standard normal distribution

A hypothesis is a claim or statement about a property of a population.

A hypothesis test is a procedure for testing a claim about a property of a population.

Page 48: Normal and standard normal distribution

The null hypothesis (denoted by H0) is a statement that the value of a population parameter (such as proportion) is equal to some claimed value.

We test the null hypothesis directly in the sense that we assume it is true and reach a conclusion to either “reject H0” or “fail to reject H0”.

Page 49: Normal and standard normal distribution

The alternative hypothesis (denoted by $H_1$ or $H_a$) is the statement that the parameter has a value that somehow differs from the null hypothesis.

The symbolic form of the alternative hypothesis must use one of these symbols: <, >, ≠.

Page 50: Normal and standard normal distribution

Assume that 100 babies are born to 100 couples treated with the XSORT method of gender selection that is claimed to make girls more likely.

We observe 58 girls in 100 babies. Write the hypotheses to test the claim that “with the XSORT method, the proportion of girls is greater than the 50% that occurs without any treatment”.

$H_0: p = 0.5$

$H_1: p > 0.5$

Page 51: Normal and standard normal distribution

The test statistic is a value used in making a decision about the null hypothesis, and is found by converting the sample statistic to a score with the assumption that the null hypothesis is true.

Page 52: Normal and standard normal distribution

A critical value is any value that separates the critical region (where we reject the null hypothesis) from the values of the test statistic that do not lead to rejection of the null hypothesis.

• The critical values depend on the nature of the null hypothesis, the sampling distribution that applies, and the significance level α.

The critical region (or rejection region) is the set of all values of the test statistic that cause us to reject the null hypothesis.

Page 53: Normal and standard normal distribution

For the XSORT birth hypothesis test, which is right-tailed, the critical value for an α = 0.05 test is z = 1.645, and the critical region is the area to the right of that value.

Page 54: Normal and standard normal distribution

The significance level (denoted by 𝛼) is the probability that the test statistic will fall in the critical region when the null hypothesis is actually true (making the mistake of rejecting the null hypothesis when it is true).

Common choices for 𝛼 are 0.05, 0.01, and 0.10.

Page 55: Normal and standard normal distribution

The P-value (or probability value) is the probability of getting a value of the test statistic that is at least as extreme as the one representing the sample data, assuming that the null hypothesis is true.

Critical region in the left tail (<): P-value = area to the left of the test statistic

Critical region in the right tail (>): P-value = area to the right of the test statistic

Critical region in two tails (≠): P-value = 2 × area in the tail beyond the test statistic

Page 56: Normal and standard normal distribution

The claim that the XSORT method of gender selection increases the likelihood of having a baby girl results in the following null and alternative hypotheses:

$H_0: p = 0.5$

$H_1: p > 0.5$

The test statistic was:

$z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}} = \frac{0.58 - 0.5}{\sqrt{\frac{(0.5)(0.5)}{100}}} = 1.60$

The test statistic of z = 1.60 has an area of 0.0548 to its right, so a right-tailed test with test statistic z = 1.60 has a P-value of 0.0548.
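
The XSORT test statistic and its right-tailed P-value can be reproduced in a few lines. This is a sketch with a hypothetical function name; it assumes the one-proportion z statistic $z = (\hat{p} - p)/\sqrt{pq/n}$ with p taken from the null hypothesis.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """z test statistic for a sample proportion, assuming H0: p = p0 is true."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

z = one_proportion_z(successes=58, n=100, p0=0.5)

# Right-tailed test (H1: p > 0.5): P-value is the area to the right of z.
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

For a left-tailed test the P-value would be `NormalDist().cdf(z)`, and for a two-tailed test it would be twice the area in the tail beyond z.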

Page 57: Normal and standard normal distribution

The tails in a distribution are the extreme regions bounded by critical values. Determinations of P-values and critical values are affected by whether a critical region is in two tails, the left tail, or the right tail. It therefore becomes important to correctly characterize a hypothesis test as two-tailed, left-tailed, or right-tailed.

Page 58: Normal and standard normal distribution

Two-tailed test: α is divided equally between the two tails of the critical region.

$H_0$: =

$H_1$: ≠

Page 59: Normal and standard normal distribution

Left-tailed test: all of α is in the left tail.

$H_0$: =

$H_1$: <

Page 60: Normal and standard normal distribution

Right-tailed test: all of α is in the right tail.

$H_0$: =

$H_1$: >

Page 61: Normal and standard normal distribution
Page 62: Normal and standard normal distribution

Don’t confuse a P-value with a proportion p. Know this distinction:

P-value = probability of getting a test statistic at least as extreme as the one representing sample data

p = population proportion

Page 63: Normal and standard normal distribution

Using the significance level α:

If the P-value ≤ α, reject H0.

If the P-value > α, fail to reject H0.

“If the P-value is low, the null must go; if the P-value is high, the null will fly.”

Page 64: Normal and standard normal distribution

If the test statistic falls within the critical region, reject H0.

If the test statistic does not fall within the critical region, fail to reject H0.

Page 65: Normal and standard normal distribution

If the constructed CI does not include the claimed value of the population parameter, which is assumed true under the null hypothesis, reject H0.

Note:

This method may yield a DIFFERENT result than the P-value method and the critical value method.

Page 66: Normal and standard normal distribution

For the XSORT baby gender test, the test had a test statistic of z = 1.60 and a P-value of 0.0548. We tested:

$H_0: p = 0.5$

$H_1: p > 0.5$

Using the P-value method, we would fail to reject the null at the α = 0.05 level.

Using the critical value method, we would fail to reject the null because the test statistic of z = 1.60 does not fall in the rejection region.

(You will come to the same decision using either method.)
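
The two decision rules can be written side by side to show that they agree for this right-tailed test. The function names and returned strings below are illustrative assumptions, not part of the slides.

```python
from statistics import NormalDist

ALPHA = 0.05
z_statistic = 1.60   # XSORT test statistic
p_value = 0.0548     # area to the right of z = 1.60

def p_value_decision(p_value, alpha):
    """P-value method: reject H0 when the P-value is at most alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

def critical_value_decision_right_tailed(z, alpha):
    """Critical value method for a right-tailed test: reject H0 when z is in the right-tail region."""
    critical = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    return "reject H0" if z >= critical else "fail to reject H0"

print("P-value method:       ", p_value_decision(p_value, ALPHA))
print("critical value method:", critical_value_decision_right_tailed(z_statistic, ALPHA))
```

Both calls give "fail to reject H0": 0.0548 is greater than 0.05, and 1.60 is less than the critical value of about 1.645.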

Page 67: Normal and standard normal distribution

For the XSORT baby gender test, there was not sufficient evidence to support the claim that the XSORT method is effective in increasing the probability that a baby girl will be born.

Page 68: Normal and standard normal distribution
Page 69: Normal and standard normal distribution

Never conclude a hypothesis test with a statement of “reject the null hypothesis” or “fail to reject the null hypothesis.”

Always make sense of the conclusion with a statement that uses simple nontechnical wording that addresses the original claim.

Page 70: Normal and standard normal distribution

Some texts use “accept the null hypothesis.”

We are not proving the null hypothesis.

“Fail to reject” says more correctly that the available evidence is not strong enough to warrant rejection of the null hypothesis.