Lecture 19: Hypothesis Tests
Devore, Ch. 8.1
Topics
I. Statistical Hypotheses
   – Null and Alternative Hypotheses
   – Test statistics and rejection regions
II. Errors in Hypothesis Testing
   – Type I and Type II errors
I. A Statistical Hypothesis…
• …is a claim about the value of a single population characteristic, or about a relationship between several population characteristics.
   – Examples of claims:
      • Mean diameter of the engine cylinder is 81 mm.
      • Mean of batch 1 is no different from the mean of batch 2.
      • Variance of batch 1 is different from the variance of batch 2.
      • % defective of batch 1 is less than 5%.
• In hypothesis testing, we take a sample of data and test a claim.
Null and Alternative Hypotheses
• To evaluate a claim, you identify a null and alternative hypothesis.
• Null Hypothesis, Ho
   – The claim that is initially assumed to be true.
• Alternative Hypothesis, Ha
   – The assertion that contradicts Ho.
• Ho is rejected if the sample evidence suggests it is false. Otherwise, we fail to reject Ho.
• So, the possible outcomes of a test are:
   – Reject Ho, or
   – Fail to reject Ho
   – NOTE: failing to reject Ho is different from proving that Ho is true.
“Favored Claim”
• In setting up a test, we typically have a favored claim, which is the Ho.
   – In practice, we typically set Ho as the condition with the “=” sign, and Ha as the condition with “<”, “>”, or “≠”.
• Familiar analogy: innocent until proven guilty.
• Practical examples:
   – Suppose you want to determine whether a worker performs his job adequately.
      • Ho: worker is meeting the minimum job requirements.
      • Ha: worker is not meeting the minimum job requirements.
   – Suppose you want to know whether to rework a machine tool.
      • Ho: machine produces a part-feature average on target.
      • Ha: machine does not produce a feature average on target.
Sample Null Hypotheses
• Identify a null and alternative hypothesis for each of the prior examples.
– Mean diameter of the engine cylinder is 81 mm.
– Mean of batch 1 is no different from the mean of batch 2.
– Variance of batch 1 is different from the variance of batch 2.
– % defective of batch 1 is less than 5%.
A Test of Hypotheses
A test of hypotheses is a method for using sample data to decide whether the null hypothesis should be rejected.
Statistical Hypothesis Tests
• Hypothesis tests often involve one of the following:
   – Comparison of Means
      • Single mean to a standard value
      • Two sample means
      • More than two sample means
   – Comparison of Variances
      • Single variance to a standard value
      • Two sample variances
      • More than two sample variances
   – Comparison of Proportions
      • Single proportion to a standard value
      • Two proportions
• Of course, tests may be applied to other statistics as well: e.g., a correlation, a median, or whether a distribution is normal.
Test Procedures
• To perform a hypothesis test, you need:
   – Null and alternative hypotheses
   – An assumed data pattern / distribution (e.g., normal, iid)
   – Test statistic: a function of the sample data on which the decision is based
   – Rejection region: the set of test-statistic values for which Ho is rejected (based on an error threshold)
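The four components above can be sketched as a small one-sample z-test. This is an illustrative sketch, not the slides' own code: the function name is mine, and it assumes normally distributed data with a known σ.

```python
import math

def one_sample_z_test(xbar, mu0, sigma, n):
    """Illustrative one-sample z-test (two-sided), assuming normal data
    with known sigma. Returns the test statistic z and its p-value."""
    # Test statistic: standardized distance of the sample mean from mu0
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # Standard normal CDF via the error function
    phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))
    # Two-sided p-value; reject Ho if p < alpha (the error threshold)
    p_value = 2 * (1 - phi(abs(z)))
    return z, p_value

z, p = one_sample_z_test(xbar=80.9, mu0=81.0, sigma=0.30, n=25)
print(round(z, 3), round(p, 4))
```

Here the rejection region is expressed equivalently as "p-value below the chosen α" rather than as an explicit interval of z values.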
Example: Cylinder Bore
• Suppose you take a sample of 25 cylinders, and consider the mean off target if the sample mean bore diameter is < 80.9 mm.
   – If X ~ N(μ = 81, σ² = 0.30²), then X̄ ~ ?
• Construct a 95% two-sided CI on the true mean, μ.
• With 95% confidence, would you conclude that a sample mean of 80.9 is different from a mean of 81?
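A minimal sketch of the CI computation, assuming σ = 0.30 (as reconstructed from the garbled slide notation) is known and n = 25:

```python
import math

# 95% two-sided CI on mu for the cylinder-bore example.
# Assumes known sigma = 0.30 and a sample mean of 80.9 (n = 25).
xbar, sigma, n = 80.9, 0.30, 25
stderr = sigma / math.sqrt(n)            # 0.30 / 5 = 0.06
z = 1.96                                 # z_{0.025} for 95% confidence
lo, hi = xbar - z * stderr, xbar + z * stderr
print(round(lo, 3), round(hi, 3))
```

The resulting interval, roughly (80.78, 81.02), contains 81, so at 95% confidence you would not conclude that a sample mean of 80.9 differs from μ = 81.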
II. Errors in Hypothesis Testing
• The rejection region of a hypothesis test is based on an accepted probability of error when drawing a conclusion.
• When drawing a conclusion based on a test, four results are possible (hint: think of possible outcomes in a jury trial).
TRUTH          What the Jury Says
               Innocent       Guilty
Innocent       correct        error
Guilty         error          correct
Outcomes of a Decision
                      Truth
Conclude / Say        Not Different    Different
Not Different         correct          error
Different             error            correct
Definitions of Error Types
• Type I error [also known as alpha (α) error]
   • FORMAL: Reject Ho when Ho is true.
   • PRACTICAL: Conclude a difference exists when no difference exists.
• Type II error [also known as beta (β) error]
   • FORMAL: Fail to reject Ho when Ho is false.
   • PRACTICAL: Conclude no difference exists when a difference exists.
• Why might a decision from a statistical test be wrong?
• Can we eliminate both types of errors?
Rejection Region: α and β
Suppose an experiment and a sample size are fixed, and a test statistic is chosen.
Decreasing the size of the rejection region to obtain a smaller value of α results in a larger value of β, for any particular parameter value consistent with Ha.
Significance Level
Specify the largest value of α that can be tolerated, and find a rejection region having that value of α. This makes β as small as possible subject to the bound on α. The resulting value of α is referred to as the significance level.
Level α Test
A test corresponding to the significance level α is called a level α test. A test with significance level α is one for which the type I error probability is controlled at the specified level.
P(type I error) ~ α
• If we assume that Ho is true, we may calculate the probability of a wrong conclusion.
• Requirements: need an assumed distribution and estimates of distribution parameters (e.g., expected mean, variance).
• We do this by finding the probability of some value, X, relative to its underlying distribution (or pdf/cdf).
   – Value for X given X ~ Bin(15, 0.2)
   – Value for X given X ~ Normal(81, 0.25²)
P(type I error) - Example
• Suppose 1% of copies of a particular textbook fail a binding test. You wish to test the hypothesis that p = 0.01, where X ~ Bin(200, 0.01).
   – Test statistic: X, the number of test failures.
   – Rejection region: conclude the failure rate has increased (reject Ho) if you draw a sample with X ≥ 5; fail to reject Ho if X ≤ 4.
• Identify a Ho and Ha for this situation.
• If Ho is true, what is the probability that you will observe 5 or more failures? [Hint: 1 − Pr(X ≤ 4)]
• When Ho is true, what % of the time will you make a wrong conclusion?
• If you wanted to be 95% confident in your decision, would you conclude that Ho is true? What is alpha in this example?
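The α for this rejection rule can be computed directly from the exact binomial pmf; a sketch (the helper name is mine):

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p), summed from the exact pmf."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k + 1))

# alpha = P(reject Ho | Ho true) = P(X >= 5 | p = 0.01, n = 200)
alpha = 1 - binom_cdf(4, 200, 0.01)
print(round(alpha, 4))
```

The result is roughly α ≈ 0.052: when Ho is true, this rule rejects it about 5% of the time.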
P(type II error) ~ β
• P(type II error) ~ the probability of concluding that no difference exists when a difference does exist; in other words, a failure to detect some difference, δ.
   – Example: suppose μ shifts by some δ, so μnew = μ + δ.
   – A type II error would be a failure to detect the shift (i.e., concluding no difference exists even though a shift has occurred).
• Unlike the α error, the β error does not exist for a unique value of the test parameter (rejection limit). Rather, it varies with the amount of difference, δ, you are trying to detect given some sample size.
   – Ho: p = 0.01 (single value); Ha: p > 0.01 (many values exist)
Binding Test Example: Find β
• Suppose the fraction defective, p, equals 0.01, and n = 200.
   – Rejection region: X ≥ 5
• What is the probability that you will fail to detect a shift in p from 0.01 to 0.015 (n = 200)?
• What is the probability that you will fail to detect a shift in p from 0.01 to 0.05 (n = 200)?
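Both β values follow from the same exact binomial CDF, evaluated at the true (shifted) p; a sketch with a helper name of my choosing:

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p), summed from the exact pmf."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k + 1))

# beta = P(fail to reject Ho | true p) = P(X <= 4 | p, n = 200)
beta_small_shift = binom_cdf(4, 200, 0.015)   # shift from 0.01 to 0.015
beta_large_shift = binom_cdf(4, 200, 0.05)    # shift from 0.01 to 0.05
print(round(beta_small_shift, 3), round(beta_large_shift, 3))
```

A small shift (to 0.015) is missed most of the time (β ≈ 0.82), while a large shift (to 0.05) is almost always detected (β ≈ 0.026), illustrating that β depends on the size of the difference you are trying to detect.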
Engine Cylinder: α & β Errors
• Suppose you take a sample of 25 cylinders.
   – If X ~ N(μ = 81, σ² = 0.30²), then X̄ ~ ?
   – Suppose you reject the batch if the sample mean ≤ 80.9.
• What is Pr(type I error)?
   – P(Ho is rejected when Ho is true)?
• What is Pr(type II error) if the true mean shifts to 80.9 mm?
• What is Pr(type II error) if the true mean shifts to 80.75 mm?
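These three probabilities are normal-tail areas for X̄ ~ N(μ, (σ/√n)²); a sketch assuming σ = 0.30 as reconstructed from the slide:

```python
import math

# Standard normal CDF via the error function
Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu0, sigma, n, cutoff = 81.0, 0.30, 25, 80.9
se = sigma / math.sqrt(n)                    # std error of xbar = 0.06

alpha = Phi((cutoff - mu0) / se)             # P(xbar <= 80.9 | mu = 81)
beta_809 = 1 - Phi((cutoff - 80.90) / se)    # P(xbar > 80.9 | mu = 80.90)
beta_8075 = 1 - Phi((cutoff - 80.75) / se)   # P(xbar > 80.9 | mu = 80.75)
print(round(alpha, 4), round(beta_809, 2), round(beta_8075, 4))
```

Note that when the true mean sits exactly at the cutoff (80.9), β = 0.5: half the time the sample mean lands above the cutoff and the shift goes undetected.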
α/β and Rejection Regions
• Suppose you wish to reduce type I errors by shrinking the rejection region. What happens to the β error if you maintain the same test and the same sample size, n?
• In our prior example, what is Pr(type I error) if the rejection cutoff is moved to 80.75?
• What happens to Pr(type II error) if the rejection cutoff is moved to 80.75 and you try to detect a shift from 81 to 80.75?
• Identify a general statement about α, β, and rejection regions given the same experiment and a fixed n.
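The trade-off can be checked numerically for the cylinder example, again assuming σ = 0.30 and n = 25, with the rejection cutoff moved from 80.9 to 80.75:

```python
import math

Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))

se = 0.30 / math.sqrt(25)                    # same test, same n: se = 0.06
alpha_new = Phi((80.75 - 81.0) / se)         # P(xbar <= 80.75 | mu = 81)
beta_new = 1 - Phi((80.75 - 80.75) / se)     # P(xbar > 80.75 | mu = 80.75)
print(round(alpha_new, 6), beta_new)
```

With the stricter cutoff, α collapses to nearly zero, but β for detecting a shift to 80.75 rises to 0.5: with the experiment and n fixed, shrinking the rejection region trades type I error for type II error.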
α/β and Sample Size
• Suppose you wish to increase the sample size from 25 to 50.
• If X ~ N(μ = 81, σ² = 0.30²):
• What happens to Pr(type I error) if n increases to 50?
• What happens to Pr(type II error) if n increases to 50 and you try to detect a new mean of 80.75?
• Comment on the relationship between α, β, and n.
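The effect of n can be seen by recomputing both error probabilities at n = 25 and n = 50 with the cutoff held at 80.9; a sketch assuming σ = 0.30:

```python
import math

Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))

results = {}
for n in (25, 50):
    se = 0.30 / math.sqrt(n)                 # std error shrinks as n grows
    alpha = Phi((80.9 - 81.0) / se)          # reject if xbar <= 80.9
    beta = 1 - Phi((80.9 - 80.75) / se)      # detecting a true mean of 80.75
    results[n] = (round(alpha, 4), round(beta, 4))
print(results)
```

Both α and β fall as n increases: unlike changing the rejection region, a larger sample size reduces both error probabilities at once.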