Business Statistics for Managerial Decision Introduction to Inference

Page 1: Business Statistics for Managerial Decision Introduction to Inference

Business Statistics for Managerial Decision

Introduction to Inference

Page 2: Business Statistics for Managerial Decision Introduction to Inference

Introduction to Inference

The purpose of inference is to draw conclusions from data. Conclusions take into account the natural variability in the data, so formal inference relies on probability to describe chance variation.

We will go over the two most prominent types of formal statistical inference:

Confidence intervals, for estimating the value of a population parameter.

Tests of significance, which assess the evidence for a claim.

Both types of inference are based on the sampling distributions of statistics.

Page 3: Business Statistics for Managerial Decision Introduction to Inference

Introduction to Inference

Since both methods of formal inference are based on sampling distributions, they require a probability model for the data.

The model is most secure and inference is most reliable when the data are produced by a properly randomized design.

When we use statistical inference we assume that the data come from a randomly selected sample or a randomized experiment.

Page 4: Business Statistics for Managerial Decision Introduction to Inference

Estimating with Confidence

Community banks are banks with less than a billion dollars of assets. There are approximately 7500 such banks in the United States. In many studies of the industry these banks are considered separately from banks that have more than a billion dollars of assets. The latter banks are called “large institutions.” The Community Bankers Council of the American Bankers Association (ABA) conducts an annual survey of community banks. For the 110 banks that make up the sample in a recent survey, the mean assets are $\bar{X} = 220$ (in millions of dollars). What can we say about $\mu$, the mean assets of all community banks?

Page 5: Business Statistics for Managerial Decision Introduction to Inference

Estimating with Confidence

The sample mean $\bar{X}$ is the natural estimator of the unknown population mean $\mu$. We know that $\bar{X}$ is an unbiased estimator of $\mu$.

The law of large numbers says that the sample mean must approach the population mean as the size of the sample grows.

Therefore, the value $\bar{X} = 220$ appears to be a reasonable estimate of the mean assets for all community banks.

What if we want to do more than just provide a point estimate?

Page 6: Business Statistics for Managerial Decision Introduction to Inference

Estimating with Confidence

Suppose that we are interested in the value of some parameter (for example, $\mu$), and we want to construct a confidence interval for it, specifying some particular desired level of confidence.

Page 7: Business Statistics for Managerial Decision Introduction to Inference

Estimating with Confidence

If we have a way to estimate this parameter from sample data (using an estimator, for example the sample mean), and we know the distribution of the estimator, we can use this knowledge to construct a probability statement involving both the estimator and the true value of the parameter we are trying to estimate.

This statement is manipulated mathematically to yield confidence limits.

Page 8: Business Statistics for Managerial Decision Introduction to Inference

Confidence Interval

A level C confidence interval for a parameter has the following form: an interval calculated from the data, usually of the form

Estimate ± [Factor] × [SE of estimate]

The value of the factor will depend upon the level of confidence desired and the distribution of the estimator.

Page 9: Business Statistics for Managerial Decision Introduction to Inference

Confidence Interval

Suppose we are investigating a continuous random variable x, which is Normally distributed with mean $\mu$ and variance $\sigma^2$.

We can estimate the population mean $\mu$ using the sample mean $\bar{X}$, calculated from a random sample of n observations; $\sigma^2$ can also be estimated using the sample variance $S^2$.

Page 10: Business Statistics for Managerial Decision Introduction to Inference

Confidence Interval

We can also estimate the standard error of the sample mean $\bar{X}$:

$$\widehat{SE}(\bar{X}) = \frac{s}{\sqrt{n}}$$

Page 11: Business Statistics for Managerial Decision Introduction to Inference

Confidence Interval

The value of the factor will depend upon the level of confidence desired and the distribution of the estimator.

The sampling distribution of $\bar{X}$ is exactly $N(\mu, \sigma/\sqrt{n})$ when the population has the $N(\mu, \sigma)$ distribution.

The central limit theorem says that this same sampling distribution is approximately correct for large samples whenever the population mean and standard deviation are $\mu$ and $\sigma$.

Page 12: Business Statistics for Managerial Decision Introduction to Inference

Confidence Interval for a Population Mean

To construct a level C confidence interval, first catch the central area C under a Normal curve.

Since all Normal distributions are the same in the standard scale, we obtain what we need from the standard Normal curve.

Page 13: Business Statistics for Managerial Decision Introduction to Inference

Confidence Interval for a Population Mean

The figure on the previous slide shows the relationship between the central area C and the points z* that mark off this area.

Values of z* for many choices of C can be found from the standard Normal table (Table A).

Here are some examples:

C     90%     95%     99%
z*    1.645   1.96    2.575
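The table values can also be computed rather than read from Table A. The following is a minimal Python sketch that assumes the standard library's `statistics.NormalDist`; the helper name `z_star` is illustrative and does not come from the slides.

```python
from statistics import NormalDist


def z_star(confidence: float) -> float:
    """Critical value z* with central area `confidence` under the standard Normal curve."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # A central area C leaves (1 - C) / 2 in each tail, so z* is the
    # (1 + C) / 2 quantile of the standard Normal distribution.
    return NormalDist().inv_cdf((1.0 + confidence) / 2.0)


for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.0%}: z* = {z_star(c):.3f}")
```

The computed 99% value is closer to 2.576; the 2.575 in the table is the figure usually obtained by interpolating in a printed Normal table.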

Page 14: Business Statistics for Managerial Decision Introduction to Inference

Confidence Interval for a Population Mean

Any Normal curve has probability C between the point z* standard deviations below the mean and the point z* standard deviations above the mean.

The sample mean $\bar{X}$ has a Normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.

Page 15: Business Statistics for Managerial Decision Introduction to Inference

Confidence Interval for a Population Mean

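So there is probability C that $\bar{X}$ lies between

$$\mu - z^* \frac{\sigma}{\sqrt{n}} \quad \text{and} \quad \mu + z^* \frac{\sigma}{\sqrt{n}}$$

This is exactly the same as saying that the unknown population mean $\mu$ lies between

$$\bar{X} - z^* \frac{\sigma}{\sqrt{n}} \quad \text{and} \quad \bar{X} + z^* \frac{\sigma}{\sqrt{n}}$$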

Page 16: Business Statistics for Managerial Decision Introduction to Inference

Confidence Interval for a Population Mean

Choose an SRS of size n from a population having unknown mean $\mu$ and known standard deviation $\sigma$. A level C confidence interval for $\mu$ is

$$\bar{X} \pm z^* \frac{\sigma}{\sqrt{n}}$$

Here z* is the critical value with area C between -z* and z* under the standard Normal curve. The quantity

$$z^* \frac{\sigma}{\sqrt{n}}$$

is the margin of error. The interval is exact when the population distribution is Normal and is approximately correct when n is large in other cases.

Page 17: Business Statistics for Managerial Decision Introduction to Inference

Example: Banks’ loan-to-deposit ratio

The ABA survey of community banks also asked about the loan-to-deposit ratio (LTDR), a bank’s total loans as a percent of its total deposits. The mean LTDR for the 110 banks in the sample is $\bar{X} = 76.7$ and the standard deviation is s = 12.3. This sample is sufficiently large for us to use s as the population $\sigma$ here. Find a 95% confidence interval for the mean LTDR for community banks.
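One way to work this example is sketched below in Python. It plugs the sample mean 76.7, s = 12.3 (used in place of $\sigma$) and n = 110 into the interval $\bar{X} \pm z^* \sigma/\sqrt{n}$. The function name `z_confidence_interval` is an illustrative assumption, not something defined in the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sigma, n, confidence=0.95):
    """Return (lower, upper, margin) for the interval mean ± z* · sigma / sqrt(n)."""
    # z* is the (1 + C) / 2 quantile of the standard Normal distribution.
    z = NormalDist().inv_cdf((1.0 + confidence) / 2.0)
    margin = z * sigma / sqrt(n)
    return mean - margin, mean + margin, margin


# Values from the LTDR example: sample mean 76.7, s = 12.3 treated as sigma, n = 110.
lower, upper, margin = z_confidence_interval(76.7, 12.3, 110, confidence=0.95)
print(f"margin of error = {margin:.1f}")
print(f"95% CI = ({lower:.1f}, {upper:.1f})")
```

Rounded to one decimal place, the margin of error is 2.3, so the interval runs from about 74.4 to 79.0.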

Page 18: Business Statistics for Managerial Decision Introduction to Inference

How Confidence Intervals Behave

The margin of error $z^* \sigma/\sqrt{n}$ for estimating the mean of a Normal population illustrates several important properties that are shared by all confidence intervals in common use.

A higher confidence level increases z* and therefore increases the margin of error for intervals based on the same data.

If the margin of error is too large, there are two ways to reduce it:

Use a lower level of confidence (smaller C, hence smaller z*).

Increase the sample size (larger n).

Page 19: Business Statistics for Managerial Decision Introduction to Inference

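Example: Banks’ loan-to-deposit ratio

Suppose there were only 25 banks in the survey of community banks, and that $\bar{X}$ and $\sigma$ are unchanged. The margin of error increases from 2.3 to

$$z^* \frac{\sigma}{\sqrt{n}} = 1.96 \times \frac{12.3}{\sqrt{25}} = 4.8$$

A 95% confidence interval for $\mu$ is:

$$\bar{X} \pm 4.8 = 76.7 \pm 4.8 = (71.9, 81.5)$$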

Page 20: Business Statistics for Managerial Decision Introduction to Inference

Example: Banks’ loan-to-deposit ratio

Suppose that we demand a 99% confidence interval for the mean LTDR rather than 95% when n is 110.

The margin of error increases from 2.3 to

$$z^* \frac{\sigma}{\sqrt{n}} = 2.575 \times \frac{12.3}{\sqrt{110}} = 3.0$$

What is the 99% confidence interval?
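The effect of a smaller sample or a higher confidence level on the margin of error can be checked numerically. The Python sketch below reuses the LTDR figures (sample mean 76.7, s = 12.3 treated as $\sigma$); the helper name `margin_of_error` and the constant names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 12.3  # sample standard deviation from the LTDR example, used as sigma
X_BAR = 76.7  # sample mean LTDR


def margin_of_error(sigma, n, confidence):
    """Margin of error z* · sigma / sqrt(n) for a level-`confidence` z interval."""
    z = NormalDist().inv_cdf((1.0 + confidence) / 2.0)
    return z * sigma / sqrt(n)


# Baseline, then a smaller sample, then a higher confidence level.
for n, c in ((110, 0.95), (25, 0.95), (110, 0.99)):
    m = margin_of_error(SIGMA, n, c)
    print(f"n = {n:3d}, C = {c:.0%}: margin = {m:.1f}, "
          f"interval = ({X_BAR - m:.1f}, {X_BAR + m:.1f})")
```

Cutting the sample to 25 banks roughly doubles the margin of error (4.8 instead of 2.3), while moving from 95% to 99% confidence at n = 110 widens it to about 3.0.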

Page 21: Business Statistics for Managerial Decision Introduction to Inference

Choosing Sample Size

A wise user of statistics never plans data collection without at the same time planning the inference.

One can arrange to have both high confidence and a small margin of error.

To obtain the desired margin of error m, set the expression $z^* \sigma/\sqrt{n}$ equal to m, substitute the critical value z* for your desired confidence level, and solve for the sample size n.

Page 22: Business Statistics for Managerial Decision Introduction to Inference

Choosing Sample Size

The confidence interval for a population mean will have a specified margin of error m when the sample size is

$$n = \left( \frac{z^* \sigma}{m} \right)^2$$

Page 23: Business Statistics for Managerial Decision Introduction to Inference

How many banks should we survey?

In our previous example we found that the margin of error was 2.3 for estimating the mean LTDR of community banks with n = 110 and 95% confidence. We are willing to settle for a margin of error of 3.0 when we do the survey next year. How many banks should we survey if we still want a 95% confidence interval?
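A short Python sketch of this calculation follows. It applies $n = (z^* \sigma / m)^2$ with $\sigma$ = 12.3 and m = 3.0, and rounds up because a sample size must be a whole number. The function name `required_sample_size` is an illustrative assumption.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest whole n for which z* · sigma / sqrt(n) is at most `margin`."""
    z = NormalDist().inv_cdf((1.0 + confidence) / 2.0)
    # Solve z* · sigma / sqrt(n) = margin for n, then round up.
    return ceil((z * sigma / margin) ** 2)


# sigma = 12.3 from the LTDR example, target margin of error 3.0, 95% confidence.
print(required_sample_size(12.3, 3.0, confidence=0.95))
```

The formula gives roughly 64.6, so under these assumptions a sample of 65 banks would be enough.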

Page 24: Business Statistics for Managerial Decision Introduction to Inference

Tests of Significance

Confidence intervals are appropriate when our goal is to estimate a population parameter.

The second type of inference is directed at assessing the evidence provided by the data in favor of some claim about the population.

A significance test is a formal procedure for comparing observed data with a hypothesis whose truth we want to assess.

The hypothesis is a statement about the parameters in a population or model.

The results of a test are expressed in terms of a probability that measures how well the data and the hypothesis agree.

Page 25: Business Statistics for Managerial Decision Introduction to Inference

Example: Bank’s net income

The community bank survey described in the previous lecture also asked about net income and reported the percent change in net income between the first half of last year and the first half of this year. The mean change for the 110 banks in the sample is $\bar{X} = 8.1\%$. Because the sample size is large, we are willing to use the sample standard deviation s = 26.4% as if it were the population standard deviation $\sigma$. The large sample size also makes it reasonable to assume that $\bar{X}$ is approximately Normal.

Page 26: Business Statistics for Managerial Decision Introduction to Inference

Example: Bank’s net income

Is the 8.1% mean increase in a sample good evidence that the net income for all banks has changed?

The sample result might happen just by chance even if the true mean change for all banks is $\mu = 0\%$.

To answer this question we ask another:

Suppose that the truth about the population is that $\mu = 0\%$ (this is our hypothesis).

What is the probability of observing a sample mean at least as far from zero as 8.1%?

Page 27: Business Statistics for Managerial Decision Introduction to Inference

Example: Bank’s net income

The answer is:

$$P(\bar{X} \ge 8.1) = P\left( Z \ge \frac{8.1 - 0}{26.4/\sqrt{110}} \right) = P(Z \ge 3.22) = 1 - 0.9994 = 0.0006$$

Because this probability is so small, we see that the sample mean $\bar{X} = 8.1$ is incompatible with a population mean of $\mu = 0$.

We conclude that the income of community banks has changed since last year.

Page 28: Business Statistics for Managerial Decision Introduction to Inference

Example: Bank’s net income

We calculated a probability assuming that the true mean is zero ($\mu = 0$). This probability guides our final choice.

If the probability is very small, the data don’t fit our assumption that $\mu = 0$, and we conclude that the mean is not in fact zero.

The fact that the calculated probability is very small leads us to conclude that the average percent change in income is not in fact zero.

Page 29: Business Statistics for Managerial Decision Introduction to Inference

Example: Is this percent change different from zero?

Suppose that next year the percent change in net income for a sample of 110 banks is $\bar{X} = 3.5\%$. (We assume that the standard deviation is $\sigma = 26.4\%$.) This sample mean is closer to the value $\mu = 0$ corresponding to no mean change in income. What is the probability that the mean of a sample of size n = 110 from a Normal population with $\mu = 0$ and standard deviation $\sigma = 26.4$ is as far away or farther away from zero as $\bar{X} = 3.5$?

Page 30: Business Statistics for Managerial Decision Introduction to Inference

Example: Is this percent change different from zero?

The answer is:

$$P(\bar{X} \ge 3.5) = P\left( Z \ge \frac{3.5 - 0}{26.4/\sqrt{110}} \right) = P(Z \ge 1.39) = 1 - 0.9177 = 0.08$$

A sample result this far from zero would happen just by chance in 8% of samples from a population having true mean zero.

An outcome that could so easily happen by chance is not good evidence that the population mean is different from zero.
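The two tail probabilities in these examples (for sample means of 8.1 and 3.5) can be reproduced with a few lines of Python. This is a sketch under the slides' assumptions ($\mu = 0$, $\sigma = 26.4$, n = 110); the function name `upper_tail_probability` is illustrative, and the exact Normal values differ slightly from figures rounded through a printed table.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_probability(x_bar, mu0, sigma, n):
    """Return (z, P(sample mean >= x_bar)) when the population mean is mu0."""
    # Standardize the sample mean using its standard deviation sigma / sqrt(n).
    z = (x_bar - mu0) / (sigma / sqrt(n))
    return z, 1.0 - NormalDist().cdf(z)


for x_bar in (8.1, 3.5):
    z, p = upper_tail_probability(x_bar, mu0=0.0, sigma=26.4, n=110)
    print(f"sample mean = {x_bar}: z = {z:.2f}, upper-tail probability = {p:.4f}")
```

The first probability is well under 0.1%, while the second is about 8%, which matches the contrast drawn between the two sample means.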

Page 31: Business Statistics for Managerial Decision Introduction to Inference

Example: Is this percent change different from zero?

The mean change in net income for a sample of 110 banks will have this sampling distribution if the mean for the population of all banks is $\mu = 0$.

A sample mean $\bar{X} = 3.5\%$ could easily happen by chance. A sample mean $\bar{X} = 8.1\%$ is so far out on the curve that it would rarely happen just by chance.

Page 32: Business Statistics for Managerial Decision Introduction to Inference

Tests of Significance: Formal details

The first step in a test of significance is to state a claim that we will try to find evidence against.

Null Hypothesis H0

The statement being tested in a test of significance is called the null hypothesis.

The test of significance is designed to assess the strength of the evidence against the null hypothesis.

Usually the null hypothesis is a statement of “no effect” or “no difference.” We abbreviate “null hypothesis” as H0.

Page 33: Business Statistics for Managerial Decision Introduction to Inference

Tests of Significance: Formal details

A null hypothesis is a statement about a population, expressed in terms of some parameter or parameters.

The null hypothesis in our bank survey example is

H0: $\mu = 0$

It is convenient also to give a name to the statement we hope or suspect is true instead of H0. This is called the alternative hypothesis and is abbreviated as Ha.

In our bank survey example the alternative hypothesis states that the percent change in net income is not zero. We write this as

Ha: $\mu \ne 0$

Page 34: Business Statistics for Managerial Decision Introduction to Inference

Tests of Significance: Formal details

Since Ha expresses the effect that we hope to find evidence for, we often begin with Ha and then set up H0 as the statement that the hoped-for effect is not present.

Stating Ha is not always straightforward. It is not always clear whether Ha should be one-sided or two-sided.

The alternative Ha: $\mu \ne 0$ in the bank net income example is two-sided.

In any given year, income may increase or decrease, so we include both possibilities in the alternative hypothesis.

Page 35: Business Statistics for Managerial Decision Introduction to Inference

Example: Have we reduced processing time?

Your company hopes to reduce the mean time required to process customer orders. At present, this mean is 3.8 days. You study the process and eliminate some unnecessary steps. Did you succeed in decreasing the average process time? You hope to show that the mean is now less than 3.8 days, so the alternative hypothesis is one-sided, Ha: $\mu < 3.8$. The null hypothesis is, as usual, the “no change” value, H0: $\mu = 3.8$.

Page 36: Business Statistics for Managerial Decision Introduction to Inference

Tests of Significance: Formal details

Test statistics

We will learn the form of significance tests in a number of common situations. Here are some principles that apply to most tests and that help in understanding the form of tests:

The test is based on a statistic that estimates the parameter appearing in the hypotheses.

Values of the estimate far from the parameter value specified by H0 give evidence against H0.

Page 37: Business Statistics for Managerial Decision Introduction to Inference

Tests of Significance: Formal details

A test statistic measures compatibility between the null hypothesis and the data.

Many test statistics can be thought of as a distance between a sample estimate of a parameter and the value of the parameter specified by the null hypothesis.

Page 38: Business Statistics for Managerial Decision Introduction to Inference

Example: bank’s income

The hypotheses:

H0: $\mu = 0$

Ha: $\mu \ne 0$

The estimate of $\mu$ is the sample mean $\bar{X}$.

Because Ha is two-sided, large positive and negative values of $\bar{X}$ (large increases and decreases of net income in the sample) count as evidence against the null hypothesis.

Page 39: Business Statistics for Managerial Decision Introduction to Inference

Example: bank’s income

The test statistic

The null hypothesis is H0: $\mu = 0$, and a sample gave $\bar{X} = 8.1$. The test statistic for this problem is the standardized version of $\bar{X}$:

$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{8.1 - 0}{26.4/\sqrt{110}} = 3.22$$

This statistic is the distance between the sample mean and the hypothesized population mean in the standard scale of z-scores.

Page 40: Business Statistics for Managerial Decision Introduction to Inference

Tests of Significance: Formal details

The test of significance assesses the evidence against the null hypothesis and provides a numerical summary of this evidence in terms of probability.

P-value

The probability, computed assuming that H0 is true, that the test statistic would take a value as extreme or more extreme than that actually observed is called the P-value of the test. The smaller the P-value, the stronger the evidence against H0 provided by the data.

To calculate the P-value, we must use the sampling distribution of the test statistic.

Page 41: Business Statistics for Managerial Decision Introduction to Inference

Example: bank’s income

The P-value

In our banking example we found that the test statistic for testing H0: $\mu = 0$ versus Ha: $\mu \ne 0$ is

$$z = \frac{8.1 - 0}{26.4/\sqrt{110}} = 3.22$$

If the null hypothesis is true, we expect z to take a value not far from 0.

Because the alternative is two-sided, values of z far from 0 in either direction count as evidence against H0. So the P-value is:

$$P = P(|z| \ge 3.22) = 2P(z \ge 3.22) = 2(1 - 0.9994) = 2(0.0006) = 0.0012$$
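The two-sided P-value for the bank income example can be sketched in Python as follows. The function name `two_sided_p_value` is an illustrative assumption. Because the code uses the unrounded z and the exact Normal distribution, its result can differ in the fourth decimal place from the table-based 0.0012.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(x_bar, mu0, sigma, n):
    """Return (z, P) for testing H0: mu = mu0 against the two-sided Ha: mu != mu0."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    # Both tails count as evidence against H0, so double the upper-tail area of |z|.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Bank income example: sample mean 8.1, sigma = 26.4, n = 110, H0: mu = 0.
z, p = two_sided_p_value(8.1, mu0=0.0, sigma=26.4, n=110)
print(f"z = {z:.2f}, two-sided P-value = {p:.4f}")
```

Using `abs(z)` keeps the function correct when the sample mean falls below the hypothesized value.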

Page 42: Business Statistics for Managerial Decision Introduction to Inference

Example: bank’s income

The P-value for the bank income example. The two-sided P-value is the probability (when H0 is true) that $\bar{X}$ takes a value at least as far from 0 as the actually observed value.

Page 43: Business Statistics for Managerial Decision Introduction to Inference

Tests of Significance: Formal details

We know that smaller P-values indicate stronger evidence against the null hypothesis. But how strong is strong evidence?

One approach is to announce in advance how much evidence against H0 we will require to reject H0.

We compare the P-value with a level that says “this evidence is strong enough.”

The decisive level is called the significance level. It is denoted by the Greek letter $\alpha$.

Page 44: Business Statistics for Managerial Decision Introduction to Inference

Tests of Significance: Formal details

If we choose $\alpha = 0.05$, we are requiring that the data give evidence against H0 so strong that it would happen no more than 5% of the time (1 in 20) when H0 is true.

Statistical significance

If the P-value is as small as or smaller than $\alpha$, we say that the data are statistically significant at level $\alpha$.

Page 45: Business Statistics for Managerial Decision Introduction to Inference

Tests of Significance: Formal details

You need not actually find the P-value to assess significance at a fixed level $\alpha$.

You can compare the observed test statistic z with a critical value that marks off area $\alpha$ in one or both tails of the standard Normal curve.
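The critical-value comparison can be sketched in Python as below. The function name `reject_at_level` and the labels `"two-sided"`, `"greater"` and `"less"` are illustrative assumptions, not notation from the slides.

```python
from statistics import NormalDist


def reject_at_level(z, alpha=0.05, alternative="two-sided"):
    """Decide whether an observed z statistic is significant at level alpha."""
    std_normal = NormalDist()
    if alternative == "two-sided":
        # Area alpha is split equally between the two tails.
        return abs(z) >= std_normal.inv_cdf(1.0 - alpha / 2.0)
    if alternative == "greater":
        # All of alpha sits in the upper tail.
        return z >= std_normal.inv_cdf(1.0 - alpha)
    if alternative == "less":
        # All of alpha sits in the lower tail.
        return z <= std_normal.inv_cdf(alpha)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


print(reject_at_level(3.22))  # bank income example, z = 3.22
print(reject_at_level(1.39))  # next year's hypothetical sample, z = 1.39
```

At the 5% level the two-sided critical value is about 1.96, so z = 3.22 is significant and z = 1.39 is not.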

Page 46: Business Statistics for Managerial Decision Introduction to Inference

Test for a Population Mean

Page 47: Business Statistics for Managerial Decision Introduction to Inference

Two Types of Error

In tests of hypothesis, there are simply two hypotheses, and we must accept one and reject the other.

We hope that our decision will be correct, but sometimes it will be wrong.

There are two types of incorrect decisions.

If we reject H0 when in fact H0 is true, this is called a Type I error.

If we accept H0 when in fact Ha is true, this is called a Type II error.

Page 48: Business Statistics for Managerial Decision Introduction to Inference

Two Types of Error

Page 49: Business Statistics for Managerial Decision Introduction to Inference

Two Types of Error

Significance and Type I error

The significance level $\alpha$ of any fixed level test is the probability of a Type I error.

That is, $\alpha$ is the probability that the test will reject the null hypothesis H0 when in fact H0 is true.
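A small simulation can make this interpretation concrete. The Python sketch below repeatedly draws samples from a Normal population for which H0 is true and counts how often a two-sided z test at level $\alpha$ rejects. The sample size, number of trials, seed and the function name `simulate_type_i_error_rate` are all arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_type_i_error_rate(alpha=0.05, n=30, sigma=1.0, trials=5000, seed=0):
    """Estimate how often a two-sided z test rejects H0: mu = 0 when H0 is true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where the null hypothesis holds (mu = 0).
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        z = fmean(sample) / (sigma / sqrt(n))
        if abs(z) >= critical:
            rejections += 1  # rejecting a true H0 is a Type I error
    return rejections / trials


print(f"estimated Type I error rate: {simulate_type_i_error_rate():.3f}")
```

With enough trials the estimated rate should settle near the chosen $\alpha$, here 0.05.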