
Proposed Rearrangements and Additions New Order Topic L.O ...web.as.uky.edu/statistics/users/rayens/STA210... · Module 3 – Proposed Rearrangements and Additions New Order Topic


Module 3 – Proposed Rearrangements and Additions

New Order Topic L.O. Comment Mathematics Required Software? Stage Corrections, Changes to Existing Material

BN3.1 Now Showing: Sens and Spec: Introduction 1-3 Five Questions None No Engage N/A

BN3.2 Dichotomous Decisions 1-3 Current BN3.1 Proportional Reasoning No Engage N/A

BN3.3 Now Showing: Sens and Spec: Comp and Egs 1-3 Five Questions Fractions, Percentages No Engage N/A

BN3.4 The FST and Statistical Wooziness 1-3 Current BN3.3 Fractions, Percentages No Engage Take the Goggle references off the table 3.1. Just have the heading record the group’s status: Simulated Drunk or Simulated Sober. Remove goggle reference from Table 3.2 as well.

BN3.5 Test for Republicanism 1-3 New Fractions, Percentages No Engage N/A

BN3.6 Applying Sensitivity and Specificity 1-3 Current BN3.5 Fractions, Percentages No Engage Exhibit 2, second paragraph. Take out “Work in pairs to” and just have “Fill out …”

BN3.7 Changing the Rules 1-3 Current BN3.6 Fractions, Percentages No Reflect N/A

BN3.8 Conditional Reasoning with 2x2 Tables 1-3 Current BN3.15 Fractions, Percentages No Reflect Change title to “Thinking about Conditional Reasoning – I” Be aware that the BN3.6 reference in the exhibit is going to be BN 3.7. Let’s change that to “data tables introduced earlier” so we don’t have to always chase those.

BN3.9 More Conditional Reasoning with 2x2 Tables 1-3 Current BN3.16 Fractions, Percentages No Reflect Change title to “Thinking about Conditional Reasoning – II”

BN3.10 Bayes Rule 1-3 New Fractions, Equations No Extend N/A

BN3.11 Can Sensitivity Stand Alone? 1-3 New Fractions, Proportional Reasoning No Extend N/A

BN3.12 Screening Accuracy 1-3 New Convex combinations, Equations No Extend N/A

BN3.13 Now Showing: Hyp Tsting as Diagnostic Tool 4-7 Five Questions None No Engage N/A

BN3.14 Treatment Decision: Effective or Not 4-7 Current BN3.2 None No Engage N/A

BN3.15 Stat Expts: Connecting to Sens and Spec 4-7 Current BN3.7 None No Engage N/A

BN3.16 Now Showing: Hyp Tsting – Applying Concepts 4-7 Five Questions None No Engage N/A

BN3.17 Stat Sig: Part I 4-7 Current BN3.9 None No Engage N/A

BN3.18 Stat Sig: Part II 4-7 Current BN3.10 None No Engage N/A

BN3.19 Media P-values: Part I 4-7 Current BN3.11 None No Engage Changes:
1. Title: Statistical Significance in the Media: Part III (I think we have to get p-values out of the title.)
2. Exhibit 1, paragraphs “H0 is called a null …. Just keep in mind:” Remove.
3. Exhibit 1, second bullet: replace “estimated false positive rate is sufficiently low” with “Type I error rate is set sufficiently low”
4. Exhibit 1, third bullet: replace last sentence with “Typically, “sufficiently low” means the Type I error rate was set to be 0.05 or less.”
5. Exhibit 1, fourth bullet. Remove.
6. Exhibit 1, last paragraph. Remove the last two sentences. This should all allow Exhibit 1 to be on one page and the questions for Exhibit 2 can be further opened up to make it easier to have a place to write.

BN3.20 Media P-values: Part II 4-7 Current BN3.12 None No Engage Changes:
1. Title: Statistical Significance in the Media: Part IV
2. Exhibit 1, Question 2. Replace “concept of false positive rate” with “concept of Type I error”.
3. Exhibit 2, Opening paragraph. Change “administration’s stand on” to “strongly worded dismissal of”
4. Exhibit 2, paragraph “With the House … of opposition.” Remove
5. Exhibit 2, Start with “That dismissal might come …” Recover space for questions 1 and 2 below the Exhibit.

BN3.21 Now Showing: Practical Significance 4-7 Five Questions Fractions, percentages No Reflect N/A

BN3.22 Practical Significance vs. Statistical Significance 4-7 Current BN3.13 Decimals, addition No Reflect N/A

BN3.23 A Practical Discussion 4-7 New None Yes Reflect N/A

BN3.24 Accept or Fail to Reject? Semantics or Real? 4-7 New None No Extend N/A

BN3.25 Origins of Power 4-7 New None Yes Extend N/A


BN3.26 Error Rates and P-values 4-7 New Logs, exponents Yes Extend N/A

BN3.27 Now Showing: Hyp Tsting – Computations 8-10 Five Questions Addition, division, sqr roots No Engage N/A

BN3.28 Practice with Proportions – I 8-10 Revised BN3.14 Addition, division, sqr roots No Engage Easiest to just replace this. See replacement in pdf package.

BN3.29 Practice with Proportions - II 8-10 New Addition, division, sqr roots, solving an equation No Engage N/A

BN3.30 Confirming What We Read 8-10 New Addition, division, sqr root No Engage N/A

BN3.31 Computations vs. Understanding 8-10 New No Reflect N/A

BN3.32 The Role of Sample Size 8-10 New Addition, division, sqr root No Reflect N/A

BN3.33 Two-tailed Test for Proportions 8-10 New Addition, division, sqr root No Extend N/A

BN3.34 Confidence Intervals for Testing 8-10 New Addition, division, sqr root Yes Extend N/A

BN3.35 Hypothesis Testing Involving a Mean 8-10 New Addition, division, sqr root Yes Extend N/A

BN3.36 Hypothesis Testing Involving Two Means 8-10 New Addition, division, sqr root Yes Extend N/A

BC3.1 Leave as is

BC3.2 First paragraph under Assignment. Remove “Make sure your name is on the paper.”

BC3.3 Leave as is

Old BN3.17 REMOVE


BEYOND THE NUMBERS 3.1_ LEARNING OUTCOME _

Now Showing: Sensitivity and Specificity - Introduction

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

Sensitivity and Specificity Content Videos

Answer the following questions while watching the content video on Sensitivity and Specificity – Introduction.

1. What are the two ways a screening test can be wrong?
2. What is a “false positive rate”?
3. What is sensitivity in terms of the false negative rate?
4. Give two reasons why the “99% accuracy” claims on home pregnancy tests might be misleading.
5. Of the eleven home pregnancy kits compared, which ones are most “accurate” and why?


BEYOND THE NUMBERS 3.3_ LEARNING OUTCOME _

Now Showing: Sensitivity and Specificity --- Computations and Examples

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

Sensitivity and Specificity Content Videos

Answer the following questions while watching the content video on Sensitivity and Specificity – Computations and Examples.

1. In the ImPACT example, how many athletes studied were really not concussed?
2. In the ImPACT example, how many athletes studied were said to be concussed by the screening test?
3. In the Beck Inventory example, what percentage of the time did the screening test make the right decision?
4. In the Field Sobriety Test example, using a cutoff of 4, what was the sensitivity?
5. In the Field Sobriety Test example, what would happen to the false positive rate if the cutoff for the test flagging a subject as drunk were changed from 4 to 8?


BEYOND THE NUMBERS 3.5_ LEARNING OUTCOME 5

Test for Republicanism

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

Exhibit 1

Not a Red Cent?
Title: Richer people continue to vote Republican
Author: By Andrew Gelman
Source: Campaigns and Elections, Political Economy (Comment), November 14, 2012.
http://themonkeycage.org/2012/11/14/richer-people-continue-to-vote-republican/

Andrew Gelman is a professor of statistics and political science at Columbia University. He is also an expert on how and why people vote. After the 2012 Presidential election he weighed in on the old debate of who is richer, Democrats or Republicans. Gelman argues that “…, both the Democrats and the Republicans are ‘the party of the rich.’ But Republicans more so than Democrats.” Among the evidence he offers is the following graph:

Question

1. Carefully explain how the graph supports Gelman’s point.

 


Exhibit 2

We the People …

Define “discretionary spending” as “the amount of money you use for non-essential items; money you spend as you see fit.” Your screening rule categorizes anyone who allocates $50.00 or more a week for discretionary spending as a Republican. Let’s see how well this screening test works. Your instructor may do this activity in class or you may be assigned to do it as homework. In either case, you are required to cross-classify at least 20 individuals into the following table. So you are required to collect two pieces of information on each individual: level of discretionary spending and party affiliation. You probably should collect both anonymously.

Rows: Predicted Affiliation Based on Discretionary Spending Amounts ($50.00 or more: Republican). Columns: Actual Party Affiliation.

Predicted \ Actual    Republican    Non-Republican    Totals
Republican            ____          ____              ____
Non-Republican        ____          ____              ____
Totals                ____          ____              ____

Questions

1. What is the sensitivity of this screening test based on the data collected? If you are not turning this page in with your answer then you should include a copy of the table in your answer.

2. What is the specificity of this screening test based on the data collected? If you are not turning this page in with your answer then you should include a copy of the table in your answer.

3. In opposition to the position taken by Gelman, some have argued that so-called red states have a lower average per-capita income than blue states. This is a subtle argument. Offer at least one well-reasoned suggestion that it might be an oversimplification.
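The cross-classification step in Exhibit 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `cross_classify`, the record layout, and the six survey records are hypothetical (the activity itself asks for at least 20 real individuals), and only the $50.00 cutoff comes from the exhibit.

```python
from collections import Counter

THRESHOLD = 50.00  # dollars per week; the screening rule stated in Exhibit 2


def cross_classify(records):
    """Tally (predicted, actual) pairs for the 2x2 table.

    records: iterable of (weekly_discretionary_spending, is_republican) pairs.
    This record layout is an assumption made for illustration.
    """
    counts = Counter()
    for spending, is_republican in records:
        # The screening rule predicts "Republican" at $50.00 or more.
        predicted = "Republican" if spending >= THRESHOLD else "Non-Republican"
        actual = "Republican" if is_republican else "Non-Republican"
        counts[(predicted, actual)] += 1
    return counts


# Hypothetical survey responses, NOT real data.
sample = [(75.0, True), (20.0, True), (60.0, False),
          (10.0, False), (50.0, True), (35.0, False)]

table = cross_classify(sample)
for (predicted, actual), count in sorted(table.items()):
    print(f"predicted {predicted:15s} actual {actual:15s} count {count}")
```

Each key of `table` corresponds to one interior cell of the table above; the row and column totals follow by addition.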

     


BEYOND THE NUMBERS 3.10_ LEARNING OUTCOME 5

Bayes Rule

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

Background

The notation P(A|B) is read as “the probability of A, given B has occurred.” So the “|” symbol is read as “given.” Formally, A and B are called events and P(A|B) is a conditional probability. For the particular context of screening tests, let’s use “T+” to denote the event wherein a screening test says that a condition is present. Likewise “T-” denotes a screening test saying a condition is absent. CP will denote condition really present, and CA, condition really absent. Some of these definitions you know, just not in this notation; some you don’t:

1. P(T+|CP) is the sensitivity of the test
2. P(T-|CA) is the specificity of the test
3. P(CP|T+) is the positive predictive value of the test
4. P(CA|T-) is the negative predictive value of the test
5. P(CP) is the prevalence of the condition in the population
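As a rough illustration of the five definitions above, the following Python sketch computes each quantity from the four cell counts of a 2x2 screening table. The function name `screening_summary`, its argument names, and the counts are hypothetical and are not taken from any exhibit in this module.

```python
from fractions import Fraction


def screening_summary(tp, fn, fp, tn):
    """Return the five screening-test quantities from 2x2 cell counts.

    tp: condition present, test positive (CP and T+)
    fn: condition present, test negative (CP and T-)
    fp: condition absent,  test positive (CA and T+)
    tn: condition absent,  test negative (CA and T-)
    """
    n = tp + fn + fp + tn
    return {
        "sensitivity": Fraction(tp, tp + fn),                # P(T+|CP)
        "specificity": Fraction(tn, tn + fp),                # P(T-|CA)
        "positive predictive value": Fraction(tp, tp + fp),  # P(CP|T+)
        "negative predictive value": Fraction(tn, tn + fn),  # P(CA|T-)
        "prevalence": Fraction(tp + fn, n),                  # P(CP)
    }


# Hypothetical counts chosen only to give round numbers.
summary = screening_summary(tp=45, fn=5, fp=30, tn=120)
for name, value in summary.items():
    print(f"{name}: {value} = {float(value):.3f}")
```

Exact fractions are used so that each value can be compared directly with a hand computation from the table.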

Bayes Rule is a very useful way of relating conditional and unconditional probabilities. It is stated as follows. For any two events A and B (not just those defined above!) we have:

$$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}$$
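Bayes Rule can be checked numerically. The sketch below, which uses hypothetical counts that do not come from any exhibit, takes A to be "condition present" (CP) and B to be "test positive" (T+), then compares P(A|B) computed directly from the counts with the value obtained from the right-hand side of the rule.

```python
from fractions import Fraction

# Hypothetical counts, for illustration only.
n_total = 200     # everyone screened
n_A = 50          # condition present (event A)
n_B = 75          # test positive (event B)
n_A_and_B = 45    # condition present AND test positive

p_A = Fraction(n_A, n_total)            # P(A)
p_B = Fraction(n_B, n_total)            # P(B)
p_B_given_A = Fraction(n_A_and_B, n_A)  # P(B|A)

# Direct computation: among those in B, the share also in A.
p_A_given_B_direct = Fraction(n_A_and_B, n_B)

# Bayes Rule: P(A|B) = P(B|A) x P(A) / P(B)
p_A_given_B_bayes = p_B_given_A * p_A / p_B

print(p_A_given_B_direct, p_A_given_B_bayes)  # prints 3/5 3/5
```

The two values agree because both reduce to the same ratio of counts, which is all Bayes Rule is saying in a finite table.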

Exhibit

Ottawa Ankle Test
Title: Sensitivity of the Ottawa Rules
Author: By G. Lucchesi, R. Jackson, W. Peacock, C. Cerasani, and R. Swor.
Source: Ann Emerg Med. 1995 Jul;26(1):1-5.

The Ottawa Ankle Test is designed to exclude fractures of the ankle (or midfoot) and is commonly practiced in medicine. A careful examination is performed that looks at two particular places on the ankle area. If there is tenderness in either of those two places, or an inability to bear weight on the foot, then the Ottawa rules conclude there is a fracture. Otherwise, the test concludes there is not a fracture. In this publication, the authors conducted one of the first sensitivity and specificity studies for the Ottawa Ankle Test. The following table did not appear in the article referenced but was reconstructed from summary values that were recorded therein. A total of 421 patients with ankle injuries were studied. The truth regarding their fractures was confirmed with x-rays.

Rows: Predicted by the Ottawa Ankle Test. Columns: Truth Regarding Fracture.

Predicted \ Truth    Not Fractured    Fractured    Totals
Not Fractured        51               5            56
Fractured            277              88           365
Totals               328              93           421

Questions

1. Let CP be the event that the ankle is truly fractured, and T+ the event that the test said it was. From the table, estimate the positive predictive value and the negative predictive value.

2. Which do you think a patient would be more interested in, P(T+|CP) or P(CP|T+)? Why?

3. Have a look at the Bayes Rule equation again. Write out the equation for the positive predictive value. What happens to the positive predictive value as the prevalence in the population gets very large? Do you find that reasonable or not? Explain.

4. Write out the equation for the negative predictive value. What happens to the negative predictive value as the prevalence in the population gets very large? Do you find that reasonable or not? Explain.


BEYOND THE NUMBERS 3.11_ LEARNING OUTCOME 5

Can Sensitivity Stand Alone?

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

Exhibit 1

Stairway Stumble
Title: Ruling a Diagnosis In or Out with “SpPin” and “SnNOut”: a Note of Caution
Author: By D. Pewsner, M. Battaglia, C. Minder, A. Marx, H. Bucher, and M. Egger
Source: BMJ, Volume 329, July 2004, pp. 209 ff.

The authors of this study describe a situation where a physician attending to the patients of a colleague on vacation encountered the following: “… a 40 year old teacher who had sprained her right ankle. Returning from a conference, she had stumbled while walking down the stairs with a heavy bag. Examination revealed a moderately swollen lateral right ankle.” The attending screened the woman using the Ottawa ankle rules (see BN 3.10) and, upon finding no tenderness of the bone, ruled out a fracture without requiring an x-ray. His reasoning was based on the “SnNOut” idea that many physicians still use today: the idea that high sensitivity (Sn) and a negative test result (N) imply that a real positive can safely be ruled out (Out). Is it reasonable to assume that sensitivity can stand on its own this way? Let’s look at the Ottawa data again and see. As before, we use “T+” to denote the event of a screening test saying that a condition is present. Likewise “T-” denotes a screening test saying a condition is absent. CP will denote a condition really being present, and CA, condition really absent.

Rows: Predicted by the Ottawa Ankle Test. Columns: Truth Regarding Fracture.

Predicted \ Truth    Not Fractured    Fractured    Totals
Not Fractured        51               5            56
Fractured            277              88           365
Totals               328              93           421

Questions

1. Using the data in the table, show that the sensitivity of the test is impressively high.

2. What is the P(CP|T-)? Does this value support the SnNOut idea? Why or why not?

Exhibit 2

Epidemic of Fractures

Now let’s change the data and assume the screening test results looked like:

Rows: Predicted by the Ottawa Ankle Test. Columns: Truth Regarding Fracture.

Predicted \ Truth    Not Fractured    Fractured    Totals
Not Fractured        5                21           26
Fractured            26               369          395
Totals               31               390          421

Questions

1. Show that the sensitivity and specificity are essentially the same as in Exhibit 1.

2. Compute P(CP|T-) for these data. Was high sensitivity alone enough here to safely say a patient with a negative test likely does not have a fracture? Why or why not?

3. Find the prevalence of this new data table. How does it compare to the prevalence seen in the data from Exhibit 1? The key to why the SnNOut presumption worked in Exhibit 1 and not Exhibit 2 is the difference in prevalence. Give some intuitive reasons why.


BEYOND THE NUMBERS 3.12_ LEARNING OUTCOME 5

Screening Accuracy

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

Background Just how do you tell if a screening test is accurate? We know to look at both the sensitivity and the specificity of the test and, ideally, we want both of those to be high. This isn’t always possible, however, since they tend to be somewhat inversely related. A measure of “overall accuracy” is often used to summarize in a single number the goodness of a screening test. Using the notation in the table below, overall accuracy is defined as:

Overall Accuracy = (Prevalence) x Sensitivity + (1-Prevalence) x Specificity

where, as you already know:

Prevalence = (C+D)/N
Sensitivity = (D)/(C+D)
Specificity = (A)/(A+B)

Rows: Predicted by the Screening Test. Columns: Truth.

Predicted \ Truth        Condition Not Present    Condition Present    Totals
Condition Not Present    A                        C                    A+C
Condition Present        B                        D                    B+D
Totals                   A+B                      C+D                  N=A+B+C+D

Exhibit 1

CAGE Practice
Title: Ruling a Diagnosis In or Out with “SpPin” and “SnNOut”: a Note of Caution
Author: By D. Pewsner, M. Battaglia, C. Minder, A. Marx, H. Bucher, and M. Egger
Source: BMJ, Volume 329, July 2004, pp. 209 ff.

The authors of this study describe a situation where a physician attending to the patients of a colleague on vacation encountered the following: “… a 40 year old teacher who had sprained her right ankle. Returning from a conference, she had stumbled while walking down the stairs with a heavy bag. Examination revealed a moderately swollen lateral right ankle. The patient was able to walk but was clearly in pain. Her breath smelt of alcohol.” After ruling out an ankle fracture, the attending physician administered another screening test. This one is known as the CAGE test. It consists of 4 questions and agreement with two or more is taken as a suggestion of a problem with alcohol. Here are the data for one assessment of the goodness of this test:

Rows: Predicted by CAGE Test. Columns: Truth Regarding Problem.

Predicted \ Truth     No Alcohol Problem    Alcohol Problem    Totals
No Alcohol Problem    400                   57                 457
Alcohol Problem       1                     60                 61
Totals                401                   117                518

Questions

1. Compute the sensitivity and specificity of this test.

2. Compute prevalence of the condition, using the data shown.

3. What is the “overall accuracy” of the test? Does this seem reasonable? Explain.

Exhibit 2

Relevant Prevalence
Title: The Use of “Overall Accuracy” to Evaluate the Validity of Screening or Diagnostic Tests
Author: By A. Alberg, J. Park, B. Hager, M. Brock, and M. Diener-West
Source: J GEN INTERN MED 2004;19:460–465.

The authors of this study discuss the pitfalls of using Overall Accuracy as a summary measure of how well a test is doing. The following table regarding a screening test for liver cancer was recreated from summary numbers in this article.

Rows: Predicted by Test. Columns: Truth Regarding Cancer.

Predicted \ Truth    No Liver Cancer    Liver Cancer    Totals
No Liver Cancer      543                18              561
Liver Cancer         29                 12              41
Totals               572                30              602

Questions

1. Compute the sensitivity and specificity of this test.

2. What is the “overall accuracy” of the test? Does this seem reasonable? Explain.

Exhibit 3

Counting for Sex

The use of Overall Accuracy as a measure of the goodness of a screening test can have some unexpected side effects. Take gender identification. Suppose we have a test that is designed to identify gender simply by counting the number of letters in a person’s name. Too many letters and we will decide the person is a female. Too few and we will decide the person is a male. Absurd? Surely. But the important question before us now is whether the test has good Overall Accuracy.

These data are real data from a class of statistics students from the University of Kentucky. There were 10 males in the sample and 58 females. The test was designed to identify a positive outcome (female) if there were 9 or more letters in the person’s name. Here are the results:¹

Rows: Predicted by Test. Columns: Truth Regarding Gender.

Predicted \ Truth    Male    Female    Totals
Male                 0       3         3
Female               10      55        65
Totals               10      58        68

Questions

1. Compute the sensitivity and specificity of this test.

2. What is the “overall accuracy” of the test? Does this value convince you that the number of letters in our names is a good way to identify our gender? Defend your answer.

¹ A much bigger study of the behavior of this test is included as an optional Beyond the Class assignment in this module.


Exhibit 4

Let the Algebra Speak

By now you should have a feeling for how Overall Accuracy behaves, given the three examples you worked through above. This is certainly no mystery given the definition of the measure:

Overall Accuracy = (Prevalence) x Sensitivity + (1-Prevalence) x Specificity
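As a small check on this definition, the Python sketch below evaluates the weighted form and compares it with the plain proportion of correct decisions, (A+D)/N, using the A, B, C, D cell notation from the Background of this activity. The function name `overall_accuracy` and the counts are hypothetical and do not come from any of the exhibits.

```python
from fractions import Fraction


def overall_accuracy(A, B, C, D):
    """Return (weighted form, proportion correct) for a 2x2 screening table.

    A: condition absent,  test says absent
    B: condition absent,  test says present
    C: condition present, test says absent
    D: condition present, test says present
    """
    N = A + B + C + D
    prevalence = Fraction(C + D, N)
    sensitivity = Fraction(D, C + D)
    specificity = Fraction(A, A + B)
    # Weighted average of sensitivity and specificity, weighted by prevalence.
    weighted = prevalence * sensitivity + (1 - prevalence) * specificity
    # Share of all subjects the test classified correctly.
    proportion_correct = Fraction(A + D, N)
    return weighted, proportion_correct


# Hypothetical counts, for illustration only.
weighted, proportion_correct = overall_accuracy(A=120, B=30, C=5, D=45)
print(weighted, proportion_correct)  # prints 33/40 33/40
```

The two forms always agree, since (Prevalence) x Sensitivity reduces to D/N and (1-Prevalence) x Specificity reduces to A/N.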

Just by analyzing the simple equation for Overall Accuracy, answer the following questions.

Questions

1. Match the triplet of possible Sensitivity, Prevalence, Specificity values with the letter that best describes the value of the resulting Overall Accuracy.

A. High, similar to Specificity
B. Low, similar to Sensitivity
C. High, similar to Sensitivity
D. Low, similar to Specificity
E. Roughly the average of Sensitivity and Specificity

Given ….

Sensitivity    Prevalence    Specificity    Letter of Best Match
High           High          Low            ____
Low            Low           High           ____
Any Value      Around 50%    Any Value      ____
Low            Low           Low            ____
Low            High          Low            ____

2. It’s your first job out of college. Your boss has just asked you to use Overall Accuracy to order several different screening tests for a condition that is important in your workplace, but rare. Based on what you have learned from this activity, tell your boss why that is not a good idea.


BEYOND THE NUMBERS 3.13_ LEARNING OUTCOME _

Now Showing: Hypothesis Testing as a Diagnostic Tool

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

Hypothesis Testing Content Videos

Answer the following questions while watching the content video on Hypothesis Testing – As a Diagnostic Tool.

1. What is a false positive rate in the context of hypothesis testing?
2. What is the goal of hypothesis testing?
3. What is a Type I error and how is it related to an “alpha level”?
4. What does it mean to say the results of a hypothesis test are statistically significant?
5. What is a p-value and what is the appropriate way to use one?


BEYOND THE NUMBERS 3.16_ LEARNING OUTCOME _

Now Showing: Hypothesis Testing ---Applying the Concepts

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

Hypothesis Testing Content Videos

Answer the following questions while watching the content video on Hypothesis Testing – Applying the Concepts.

1. When you see “statistically significant” or “not statistically significant” in an article, what four things should you be able to do?
i.
ii.
iii.
iv.

2. In the study that compared calories consumed at McDonald’s and calories consumed at Subway, what has to be true about the size of the p-value, had it been reported, and why do you know that?

3. In the vitamin study, what are we supposed to assume was the preset probability of rejecting H0 if, in fact, H0 is true?

4. In a couple of the studies addressed on the video (e.g. Presidential Pay and Performance) the results were not statistically significant, so HA was not accepted. Give reasons why this is not permission to safely say H0 is accepted. Hint: what does a Type I error address?

5. In the vitamin study the results were said to be “small, but statistically significant.” Give an intuitive reason why you might not be surprised that a trial involving over 15,000 subjects could produce this kind of conclusion.


BEYOND THE NUMBERS 3.21_ LEARNING OUTCOME _

Now Showing: Hypothesis Testing ---Practical Significance

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

Hypothesis Testing Content Videos

Answer the following questions while watching the content video on Hypothesis Testing – Practical Significance.

1. Distinguish “practical significance” from “statistical significance.”

2. What was the “effect size” in the multivitamins and cancer study?

3. In the mental illness and obesity study, suppose the average weight of a test group was 180 pounds before the group participated in a health promotion program. If the results were to be considered practically significant (on the average), what would the average weight of the test group have to be when the health promotion program was completed?

4. In the article on coffee and pregnancy, why can we say that coffee is “statistically safe” to consume during pregnancy?

5. Get online and find a study that produced statistically significant results, but results that you see as practically unimpressive. Briefly describe the study here in two or three sentences, and defend your assessment of the results as practically insignificant.


BEYOND THE NUMBERS 3.23_ LEARNING OUTCOME 5

A Practical Discussion

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

Exhibit 1

The Economics of No Significance
Title: US Supreme Court: statistical significance not needed in drug lawsuits
Author: By Heidi Ledford
Source: Nature.com, March 23, 2011, http://blogs.nature.com/news/2011/03/us_supreme_court_statistical_s.html

We have encountered Zicam elsewhere in this book. It is worth one more look.

In a unanimous decision, the US Supreme Court ruled yesterday that a pharmaceutical company may be required to notify investors of safety reports regarding its products, even if those reports do not rise to the level of statistical significance.

Investors sued the company, based in Scottsdale, Arizona, arguing that it should have notified them earlier about reports that some of its popular zinc gluconate cold medications may have robbed some users of their sense of smell.

When news of the possible link finally became public, Matrixx stock plummeted.

Matrixx tried to shoot down the lawsuit by arguing that the adverse event reports it received about Zicam were not statistically significant. But Judge Sonia Sotomayor, writing for the court, said that test would be too stringent. “Both medical experts and the Food and Drug Administration (FDA) rely on evidence other than statistically significant data to establish an inference of causation,” she wrote. “It thus stands to reason that reasonable investors would act on such evidence.”

Question This is clearly a case where results that were not statistically significant were judged by the Supreme Court to be practically significant. What is the practical significance that was in question here? Be sure to remember who brought the case.


Exhibit 2

The N Crowd An MD/Ph.D at a major research institution is studying a new cannabis-based pain medication. The patients who volunteer for the study are randomly divided into two groups. Group 1 is given a placebo (treatment 1) and then after one hour rates the effectiveness of the pain relief on a scale of 1 to 100. Group 2 is given the new drug (treatment 2) and rates in a similar way. Initially only 25 volunteers are available for each group. By the end of the month there are 25000 volunteers in each group, making this the largest clinical trial in recent memory. Amazingly, there is only one point difference between the ratings of the placebo group and the active treatment group at each stage. Formally, the following choice has to be made:

H0: Treatment 2 is no different than Treatment 1

HA: Treatment 2 is different than Treatment 1

Questions

1. Help the researcher decide if her results are statistically significant each week by filling out the following table. The researcher adopted an alpha level of 0.05. Unless your instructor wants you to do this a different way, please use the convenient GRAPHPAD ® applet at http://www.graphpad.com/quickcalcs/ttest1/?Format=SD . The first entry has already been filled out and a screenshot of the applet is shown below. Make sure you can repeat the first entry.

For each entry in the table below, fill out the applet form as shown. All that changes is the per-group sample size. After the form is filled out, click on “Calculate Now.”


Date                                      1st Week    2nd Week    3rd Week    4th Week
Sample size in each group (N)             25          250         2500        25000
Standard deviation in each group          20          20          20          20
Treatment 1 Mean (Placebo)                500         500         500         500
Treatment 2 Mean (New Treatment)          501         501         501         501
The p-value                               0.8604
Statistically Significant? (Check one)    Yes / No    Yes / No    Yes / No    Yes / No
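For readers who would rather script the calculation than use the applet, the following is a minimal sketch of an unpaired t test computed from summary statistics. It assumes a pooled two-sample t test with equal group sizes and equal standard deviations, which is an assumption about what the applet does rather than a documented fact. The function names `t_pdf`, `two_sided_p`, and `unpaired_t_test` are illustrative, and the t distribution tail is obtained by simple numerical integration so that only the Python standard library is needed.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * (area under the density from 0 to |t|).

    The area is approximated with Simpson's rule (steps must be even).
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def unpaired_t_test(mean1, mean2, sd, n):
    """Pooled two-sample t test for two groups of equal size n and equal SD."""
    se = sd * math.sqrt(2.0 / n)      # standard error of the difference in means
    t = (mean2 - mean1) / se
    df = 2 * n - 2
    return t, two_sided_p(t, df)

# Same inputs as the worksheet table: only the per-group sample size changes.
p_values = {n: unpaired_t_test(500, 501, 20, n)[1] for n in (25, 250, 2500, 25000)}
for n, p in p_values.items():
    print(f"N = {n:>5} per group: p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

With N = 25 per group the sketch should reproduce the pre-filled p-value of 0.8604 to four decimals; the remaining entries are left for you to compare against the applet.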


2. Formally, a p-value was the probability of seeing data which point away from H0 and in the direction of HA as far or farther than the data which you have just seen - given that H0 is assumed to be true. In this exhibit, what kinds of data would solidly “point away” from H0? Give an example.

3. Define the “effect size” as the observed difference between the two means, divided by the common standard deviation. What is the effect size for each of the four weeks?

4. What does this exercise have to say about the relationship between sample size and statistical significance? Be very specific with your answer.

5. What does this exercise have to say about whether one can reliably infer practical significance from statistical significance?


BEYOND THE NUMBERS 3.24_ LEARNING OUTCOME 5

Accept or Fail to Reject? Semantics or Real?

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

Exhibit 1

Legally Speaking
Title: Innocent V. Not Guilty: Jury Decision Based Entirely on Evidence
Author: By Hugh Duvall
Source: http://www.oregoncriminalattorney.com/Criminal-Defense-Overview/Innocent-V-Not-Guilty.shtml

A lot of attorneys have written about the difference between “innocent” and “not guilty.” Most quickly get into legalese that would obscure our point here. This site is particularly clear:

Juries never find defendants innocent. They cannot. Not only is it not their job, it is not within their power. They can only find them "not guilty." Once a person has been charged with having committed a crime, there is no mechanism by which that individual can prove his innocence. Yes, the law provides that the person is innocent unless proven guilty, but that is a legalism. It is not, nor could it be, a factual statement. The person, in fact, did or did not commit an offense. Each time a member of the media or other citizen states that William Kennedy Smith or one of the officers accused of beating Rodney King was found "innocent," they are not only incorrect, but are also ingraining within potential jurors a misconception about their role. They enhance the risk that enough jurors on a panel will retire into a jury room believing that it is their task to determine whether there is enough evidence to find a defendant innocent.

Questions

1. The role of the prosecution is to establish guilt beyond a reasonable doubt. The defense has no such burden of proof for innocence. How does this affect the ability of a jury to find a defendant innocent?


2. If “I” denotes the event “defendant is presumed innocent” and “E” denotes “evidence presented by prosecution,” then a juror’s job is to evaluate P(E | I).¹ This probability can be ascertained to be big or small, depending on the case. Identify which of these (big or small) leads logically to a conclusion of “guilty” and explain why the other can’t logically lead to a conclusion of innocence.

3. Mr. Duvall goes on to say “As a society, in administering the prosecution function, we must keep at the forefront of our mind that there is no way to reverse the implication of charging someone with a crime. Allowing ourselves to ignore the distinction between a jury's ability to find someone "not guilty" and its inability to find someone "innocent" works against this important interest.” Explain in your own words what this means and why it is important in this discussion of “innocence” versus “not guilty.”

1 See BN3.10 for a definition of this notation if you are unfamiliar with it.


Exhibit 2

Alternative Evidence Title: "It’s like... you know": The Use of Analogies and Heuristics in Teaching Introductory Statistical Methods Author: By Michael A. Martin, Australian National University Source: Journal of Statistics Education Volume 11, Number 2 (2003), www.amstat.org/publications/jse/v11n2/martin.

The analogies between the U.S. criminal justice system and hypothesis testing are strong and potentially very useful as a tool for understanding the logic of an abstract task often seen as only having deductive relevance. In his article, Professor Michael Martin paired some basic concepts from both arenas, and we have listed a subset of those pairings below.

Questions

1. Can you explain each of these eight correspondences? Take each one in turn and explain how they are analogous. You must use the language of this course in your answers.

   Criminal Trial                        Hypothesis Test
1. Defendant is innocent                 Null hypothesis
2. Defendant is guilty                   Alternative hypothesis
3. Verdict is to acquit                  Failure to reject the null hypothesis
4. Verdict is to convict                 Rejection of the null hypothesis
5. Presumption of innocence              Assumption that the null hypothesis is true
6. Conviction of an innocent person      Type I error
7. Acquittal of a guilty person          Type II error
8. Beyond reasonable doubt               Fixed (small) probability of Type I error


(continued)

2. Is the distinction between “accepting” and “failing to reject” a null hypothesis a real distinction or just semantics? Defend your answer based on what you have learned in this exercise.


BEYOND THE NUMBERS 3.25_ LEARNING OUTCOME 5

Origins of Power

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

Background

We haven’t talked much about Type II errors since we started hypothesis testing. In hypothesis testing the Type II error rate is the probability of failing to reject H0 when H0 is false (that is, when you should reject it), and it is typically denoted by the Greek letter β. The “power” of the hypothesis test is 1-β, and is directly analogous to sensitivity. Think of it as the probability of choosing HA, when HA is the right choice. The computation of power can be a complex endeavor, even for elementary forms of H0 and HA. We can gain valuable insights into what affects the power of a statistical test by using one of the many freely available online tools to handle all the complexity for us.

Exhibit

Power and Beauty
Title: Power and Sample Size Calculator
Author: By Statistical Solutions, LLC
Source: http://www.statisticalsolutions.net/pss_calc.php

Suppose you are interested in how people rate their looks on a 20-point scale, with 0 being unbearably ugly and 20 being hopelessly gorgeous. Later in this workbook, you will learn how to do the mathematics required for testing the following hypothesis, once you have chosen a Type I error rate (alpha), created a rejection region, and collected a sample of n responses.

H0: true population average is 10

HA: true population average is not 10

Our immediate goal is different. What we want to do is to answer the question: how likely is it that we will fail to reject H0 when H0 is really false? This would produce the Type II error rate, from which power is easily computed. To answer this, we have to ask “what is the true average if H0 is false?” HA only tells us that in this case the true average would be something other than 10. We will look at four possible values of the true average that are different than 10: 10.5, 11, 11.5, and 12.
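The idea of computing power for a guessed true average can be sketched in a few lines. This is a minimal illustration that assumes a two-sided z test with a known population standard deviation (the Sigma of 5 used in the questions that follow); whether the online calculator uses exactly this method is an assumption. The function name `power_two_sided_z` and its parameter names are illustrative only.

```python
from statistics import NormalDist

def power_two_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a two-sided z test of H0: mu = mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # edge of the rejection region
    shift = (mu1 - mu0) / (sigma / n ** 0.5)     # true mean in standard-error units
    # Probability that the test statistic lands in either rejection tail.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

# Type II error rate is 1 - power.
power = power_two_sided_z(mu0=10, mu1=11, sigma=5, n=50)
print(round(power, 2), round(1 - power, 2))
```

Under these assumptions, a sample of 50 with a true average of 11 gives a power that rounds to 0.29, which appears to agree with the one entry the worksheet's power table already provides.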


Questions

1. Access the applet at the web address shown above. Make sure you have a reasonably current version of JAVA installed on your computer. When the applet page comes up:

Enter 10 for mu(0). This is the hypothesized value of the population average.

Enter 5 for Sigma. This is a measure of how variable the population values are.

Click the radio button for Two sided test.

Leave the alpha level (Type I error rate) fixed at 0.05 for questions 1-3.

Fill out the following table of power values as the sample size and the possible real value of mu (under HA) change. One entry is already done for you. Make sure you confirm that entry before proceeding.

POWER TABLE

               Possible real values of the population average (called mu(1) on applet)
Sample Size    10.5      11        11.5      12
10
50                       0.29
100
1000
10000

2. Look at the table. How does power change with sample size, regardless of the true average?

3. Look at the table. How does power change as the guessed true average (mu(1)) changes, regardless of sample size?

4. What would happen to power if you changed alpha? Investigate that question with the online calculator you used to fill out the table above. Think back to how sensitivity and specificity behaved in screening tests. How is this similar?


BEYOND THE NUMBERS 3.26_ LEARNING OUTCOME 5

Error Rates and P-Values

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

Background

In the classic song by David Bowie, Major Tom is admonished to take his protein pills and put his helmet on. That might be good advice here. This brief encounter concerns something too important to ignore, but too complex to present mathematically. It is the source of so much misunderstanding we are safe to bet that you will see it presented incorrectly in almost all of your other classes that use or teach statistics. It has even been said that “…, this misunderstanding over measures of evidence versus error is so deeply entrenched that it is not even seen as being a problem by the vast majority of researchers.”1 Tighten your helmet straps. We are about to elevate your awareness in an oxygen-thin environment.

What is a Type I error, really? Think of it this way. Suppose α is fixed to be 0.05. Then of all the experiments around the world being conducted at that level, at most 1 in 20 of those will result in the false rejection of the null hypothesis. It is a prescription for behavior. “If you use an alpha level of 0.05 for all your experiments, then over time you’ll only end up wrongly rejecting your nulls 5% of the time.” It is not a direct statement about any one of the individual experiments being reported. It’s a challenging concept, but similar in spirit to how you learned to interpret a confidence interval in Module II.

What is a p-value then? It is the probability of seeing data as contradictory to the null or more so, than what you have produced by the experiment. It allows for an inference to be made about the truth of the null. If a p-value is small, the suggestion is that the data are rare, assuming the null is true, indicating that the null might not be true. There are some worthy problems even with this interpretation, but we will leave those to your instructor to parse out if she decides to do so.

So what’s the problem? There are many. The problem we are addressing here concerns the widespread tendency for practitioners to misinterpret a p-value as an error rate. That is, to conclude that a p-value of, say, 0.05 means that there are only 5 chances in 100 that if you reject your null you will do so wrongly. That is not true.2 This is what we want to briefly explore below. You should come away with an understanding that the actual error rates (wrongly rejecting the null) may be much higher than those suggested by a misinterpretation of the p-value.

1 “Confusion Over Measures of Evidence (p's) Versus Errors (α's) in Classical Statistical Testing,” by R. Hubbard and M.J. Bayarri. The American Statistician, August 2003, Vol. 57, No. 3.

2 It is also not true that a p-value of 0.01 is “more evidence against the null” than a p-value of 0.05. Nor is it correct to say that a p-value of 0.03 means that 0.03 is the “minimal value of alpha” that you could have picked and still been able to reject your null. These are common, widespread misinterpretations of a p-value.


Exhibit

Ground Control Given a p-value of 0.03, what is the likelihood of wrongly rejecting the null? We know it is not 0.03. But what is it? One suggestion, not without its detractors, estimates the probability of wrongly rejecting the null, given a computed p-value by the following formula:

$$\text{ConditionalErrorRate} = \left[\,1 + \left(-e \cdot p \cdot \log p\right)^{-1}\right]^{-1}$$

where

e is 2.718282
“log” is the natural logarithm, often denoted “ln”
p is the p-value

So a p-value of 0.03 corresponds to an estimated conditional error rate of 0.22! That is, given one has seen a p-value of 0.03 there are about 22 chances in 100 that if the null is rejected it will be a mistake. Make sure you can confirm the conditional error computation for a p-value of 0.03.
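If you prefer a script to a spreadsheet, the computation can be sketched as follows. This is a minimal illustration that reads the formula as 1 / (1 + 1 / (-e · p · ln p)), the reading that reproduces the 0.22 quoted above for p = 0.03; the function name `conditional_error_rate` is illustrative, and the calibration is only meaningful for p-values below 1/e.

```python
import math

def conditional_error_rate(p):
    """Estimated chance that rejecting the null is a mistake, given p-value p.

    Assumes 0 < p < 1/e, where the quantity -e * p * ln(p) lies between 0 and 1.
    """
    bound = -math.e * p * math.log(p)
    return 1.0 / (1.0 + 1.0 / bound)

print(round(conditional_error_rate(0.03), 6))  # 0.222367
```

The same function can be applied to each p-value in the table for Question 1.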

Questions

1. What do these conditional “Type I” errors look like for different p-values? Using a software package (such as Microsoft Excel or Apple Numbers) fill out the following table:

2. Look at the table. If a practitioner insists on misinterpreting a p-value as the probability of wrongly rejecting the null, what does this table suggest the p-value would have to be before that error rate is approximately 5%?

3. Some who are aware of these issues have looked at tables like the one you constructed and concluded that at the very least practitioners should be encouraged to report their actual p-value and not, for instance, just that the p-value is less than 0.05. Why is this good advice?

p-value    Conditional Error Rate      p-value    Conditional Error Rate
0.0001                                 0.01
0.0005                                 0.03       0.222367
0.001                                  0.05
0.004                                  0.1


FYI for prepublication review of the problems’ worthiness only

pvalue  error 

0.0001  0.002497 

0.0005  0.010225 

0.001  0.018431 

0.004  0.056635 

0.01  0.111254 

0.03  0.222367 

0.05  0.28935 

0.1  0.384959 


BEYOND THE NUMBERS 3.27_ LEARNING OUTCOME _

Now Showing: Hypothesis Testing ---Computations

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

Hypothesis Testing Content Videos

Answer the following questions while watching the content video on Hypothesis Testing – Computations.

1. Explain why the video only chooses to look at one particular type of hypothesis in detail.

2. What are the four steps involved in testing a hypothesis?

i.

ii.

iii.

iv.

3. Compute the standard score for n = 300, p = 0.76, and p0 = 0.81.

4. A p-value is computed to be 0.034. Someone then says: “You can reject your null hypothesis at your preset alpha level of 0.05 and be assured that there are only 34 chances in 1000 that you will be making a mistake.” Say what is right about this statement and correct what is wrong.

   


5. Find the p‐value associated with a z‐score of 2.25 and interpret.   


BEYOND THE NUMBERS 3.28_ LEARNING OUTCOME 5

Practice with Proportions - I

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

Review

You may want to review the content video on Hypothesis Testing: Computations. In that video you were presented with four steps for testing a simple hypothesis of the form:

H0: p ≤ p0

HA: p > p0

Step 1: Establish a value for the alpha level, the Type I error rate (typically taken to be 0.05).

Step 2: Compute the standard score (z-value):

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0\,(1 - p_0)/n}}$$

where $\hat{p}$ is the observed sample proportion and n is the sample size.

Step 3: Use a standard score table to locate the p-value associated with z.

Step 4: Compare the p-value to the preset Type I error rate. If it is smaller, reject H0. If it is larger, fail to reject H0.

Remember

The Type I error rate is analogous to the false positive rate. It is the probability of wrongly rejecting H0 in a long series of similar decisions with the same preset error rate.

The p-value is the probability, assuming H0 is true, of seeing data that are as consistent, or more consistent, with the alternative hypothesis than the data seen in the current context.
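The four steps can be sketched in code as a check on hand calculations. This is a minimal illustration that assumes the usual one-proportion z statistic with p0 in the standard error, and it replaces the standard score table with the normal distribution from Python's standard library. The function name `one_proportion_z_test` and the example numbers (56% of a sample of 200, tested against p0 = 0.50) are hypothetical and not taken from the workbook.

```python
from statistics import NormalDist

def one_proportion_z_test(p_hat, p0, n, alpha=0.05):
    """One-tailed test of H0: p <= p0 against HA: p > p0."""
    # Step 1: the Type I error rate alpha is preset by the caller.
    # Step 2: compute the standard score.
    z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5
    # Step 3: upper-tail p-value (what a standard score table would give).
    p_value = 1 - NormalDist().cdf(z)
    # Step 4: compare the p-value to alpha.
    reject = p_value < alpha
    return z, p_value, reject

z, p_value, reject = one_proportion_z_test(p_hat=0.56, p0=0.50, n=200)
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Because the p-value comes from software rather than a printed table, it may differ from a table lookup in the third or fourth decimal place.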


Questions 

Please provide full solutions to the following problems. 

 

1. A sample of n = 75 UK students is taken by the Kentucky Kernel, and each one is asked, “Does knowing that many of the basketball players Kentucky recruits will go pro after one year affect your sense of attachment to the team?” Suppose 60% in your sample said, “Yes, it does affect my attachment.” Is it safe for the Kernel to report that a majority of all UK students are likely to feel that way? Decide between H0: p ≤ 0.50 and HA: p > 0.50. Assume a Type I error rate of α = 0.05. Report a p-value, state what your decision is, and explain why.

2. A newspaper took a random sample of 1,200 registered voters and found that 925 would vote for the Democratic candidate for governor. Is this evidence that more than 3/4 of the entire voting population would vote for the Democrat? Assume a Type I error rate of α = 0.05. What are H0 and HA? Report a p-value, state what your decision is, and explain why.


BEYOND THE NUMBERS 3.29_ LEARNING OUTCOME 5

Practice with Proportions - II

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

Questions Please provide full solutions to the following problems.  

1. The CEO of a large electric utility claims that more than 80 percent of his customers are very satisfied with the service they receive. To test this claim, the local newspaper surveyed 100 customers, using simple random sampling. Among the sampled customers, 81% say they are very satisfied. We want to decide if the 81% is enough evidence to lead us to accept or reject the CEO’s claim. Hence, we have to test the hypothesis H0: p ≤ 0.80 vs. HA: p > 0.80. Assume a Type I error rate of α = 0.05. Report a p-value, state what your decision is, and explain why.

 

 

 

 

 

 

 

 

2. Patients with advanced cancers of the stomach, bronchus, colon, ovary or breast were treated with ascorbate and their survival time post diagnosis was monitored.¹ These data are in an appendix. Test the hypothesis that the proportion of all patients with these types of cancer (treated with ascorbate) who will survive more than a year is bigger than 0.40. Assume a Type I error rate of α = 0.05. Report a p-value, state what your decision is, and explain why.

   

1 Cameron, E. and Pauling, L. (1978). Supplemental ascorbate in the supportive treatment of cancer: reevaluation of prolongation of survival times in terminal human cancer. Proceedings of the National Academy of Sciences USA, 75, 4538-4542.


3. Refer to the cancer data referenced in Question 2. Test the hypothesis that the proportion of all patients with breast cancer (treated with ascorbate) who will survive more than a year is bigger than 0.75. Assume a Type I error rate of α = 0.05. Report a p-value, state what your decision is, and explain why. Are you surprised by this outcome? Why or why not?

 

 

 

 

 

 

 

 

 

4. Refer to Question 3. In that problem what sample size would have been needed for the results to have produced a p-value of .025? Show all your work.


 

Days Survived, by Organ (a hyphen marks an empty cell)

Stomach    Bronchus    Colon    Ovary    Breast
124        81          248      1234     1235
42         461         377      89       24
25         20          189      201      1581
45         450         1843     356      1166
412        246         180      2970     40
51         166         537      456      727
1112       63          519      -        791
46         64          455      -        1804
103        155         406      -        3460
876        859         365      -        719
146        151         942      -        -
340        166         776      -        -
396        37          372      -        -
-          223         163      -        -
-          138         101      -        -
-          72          20       -        -
-          245         283      -        -


BEYOND THE NUMBERS 3.30_ LEARNING OUTCOME 5

Confirming What We Read

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

Exhibit 1

Vytorin Verified In his 2008 article on Vytorin, author Alex Berenson pointed out what has now become old news: Patients taking Vytorin were statistically significantly more likely (with a preset Type I error rate of 0.01) than those taking a placebo to develop cancer. It turns out that after further studies were completed, the FDA ultimately concluded that “it is unlikely that Vytorin … increase(s) the risk of cancer or cancer-related death.” No mention of cancer is currently on the FDA website under the Vytorin entry.

Questions

1. Reports differ on exactly how many people were in the original Vytorin clinical trial (the SEAS trial), but it appears that 950 were assigned to take Vytorin and 920 were assigned to take a placebo. There were a reported 102 cases of cancer in the Vytorin group and 67 in the placebo group. That’s a 7.28% rate in the placebo group. Test the hypothesis below:

H0: p ≤ 0.0728

HA: p > 0.0728

where p is the true proportion of all Vytorin users who would develop cancer while on the drug. Assume a Type I error rate of α = 0.01. Report a p-value, state what your decision is, and explain why.


2. Are your results from Question 1 consistent with what the summary in the Exhibit suggests? Explain.

Exhibit 2

Mary Jane Brain Title: Moderation of the Effect of Adolescent-Onset Cannabis Use on Adult Psychosis by a Functional Polymorphism in the Catechol-O-Methyltransferase Gene: Longitudinal Evidence of a Gene X Environment Interaction Author: By A. Caspi, et al. Source: BIOL PSYCHIATRY 2005;57:1117–1127

The authors studied the influence of adolescent marijuana use on adult psychosis as a function of certain genetic variables. In particular they studied the so-called COMT gene that is known to govern an enzyme that breaks down dopamine, a brain chemical involved in schizophrenia. COMT comes in different forms but we won’t go into that level of detail. The figure below shows the results of the study for individuals who had one particular COMT expression. As you can see, there are two groups being studied, one with no adolescent use of marijuana and the other with adolescent use. The vertical axis simply records the percentage in each group that went on to develop schizophrenia. The difference between the two groups has been declared statistically significant at the α = 0.05 level.

Question

1. Are the results statistically significant as claimed? Use the “No Adolescent Marijuana” group to determine a suitable p0 and then test

H0: p ≤ p0

HA: p > p0


BEYOND THE NUMBERS 3.31_ LEARNING OUTCOME 5

Computations vs Understanding

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

Introduction

It is important that we be able to compute correctly. However, computational prowess is no substitute for a deeper understanding of what you are doing and why. This seems to especially be a problem in statistical science where, at the undergraduate level, computations are pretty easy, but at all levels the underlying concepts are challenging. We are going to experience this divide in this activity. All you need to know to do the work below is how to test a simple one-tailed hypothesis involving a proportion, which we have been doing relentlessly, and a grasp of what a Type I error rate is.

Exhibit

Eureka or Not?

Let’s suppose that twenty identical experiments are taking place simultaneously around the world. In all cases the researchers are studying the same drug, which they hope will improve the survival rate of the black-winged peckerwood finch after it has been infected with a particular type of tree mold. The survival rate, left untreated, is unfortunately only 30%. None of the researchers know about the others’ work. The table below shows the results from the 20 different studies. In all cases the Type I error rate was taken to be 0.05 and the hypothesis being tested was:

H0: p ≤ 0.30

HA: p > 0.30

Site    Observed survival rate with drug    Number of finches studied    Able to reject H0: p ≤ 0.30?
1       0.35                                n = 100                      No
2       0.34                                n = 100                      No
3       0.31                                n = 100                      No
4       0.33                                n = 100                      No
5       0.33                                n = 100                      No
6       0.35                                n = 100                      No
7       0.35                                n = 100                      No
8       0.33                                n = 100                      No
9       0.30                                n = 100                      No
10      0.34                                n = 100                      No
11      0.34                                n = 100                      No
12      0.30                                n = 100                      No
13      0.34                                n = 100                      No
14      0.31                                n = 100                      No
15      0.31                                n = 100                      No
16      0.31                                n = 100                      No
17*     0.45                                n = 100                      Yes
18      0.30                                n = 100                      No
19      0.35                                n = 100                      No
20      0.33                                n = 100                      No


Questions

1. Use the data from Site 17 to confirm that the null could be rejected. What is the p-value associated with the result?

2. Combine all the studies (n = 100 x 20 = 2000) and test the hypothesis again. Confirm that it cannot be rejected. Report the overall observed survival rate, and the p-value associated with the overall test.

3. We have a dilemma. 19 of the sites don’t seek publication because their results are not significant. Site 17 gets published because the results produced there, with an identical experiment, are significant. We know (though the researchers don’t) that if we combine the results from all 20 sites then we will not be able to support the alternative, that the black-winged peckerwood finch will survive better if the drug is administered. Write out again what it means to have a Type I error rate of 0.05, and explain what has likely happened here in light of that definition.
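The arithmetic behind Question 3 can be sketched in a few lines. This is only an illustration under two stated assumptions that are not part of the exhibit: that the null hypothesis is true at every site, and that the twenty sites are independent. The variable names are made up for this sketch.

```python
# Chance of at least one false positive among several independent tests.
# Assumptions (for illustration only): H0 is true at every site and the
# sites are independent of one another.
alpha = 0.05      # Type I error rate used at each site
n_sites = 20      # number of identical experiments

# Probability that a single site correctly fails to reject a true H0
p_no_false_alarm = 1 - alpha

# Probability that at least one site rejects a true H0
p_at_least_one = 1 - p_no_false_alarm ** n_sites

# Expected number of sites that reject a true H0
expected_false_positives = alpha * n_sites

print(f"P(at least one false positive) = {p_at_least_one:.3f}")
print(f"Expected false positives       = {expected_false_positives:.1f}")
```

Under these assumptions, a lone significant result among twenty attempts is about what a Type I error rate of 0.05 would lead you to expect.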


BEYOND THE NUMBERS 3.32_ LEARNING OUTCOME 5

Role of Sample Size

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

Exhibit 1

Better Than Chance?

You have spent a great deal of time so far testing hypotheses involving a proportion. Let’s focus on the one-tailed version for the moment:

H0: p ≤ p0
HA: p > p0

Let’s suppose you are developing a new pill designed to help students guess better on yes/no questions on their tests. If they guess totally at random, they have a 50-50 chance of getting a question right. You want to be able to show that students do better than chance with your pill so that you can create some interest in crowd-based funding for the production and marketing costs of your pill. Unfortunately, in several different tests, you always ended up with 51% of the treatment group getting a yes/no test question correct.

Questions

1. Consider the following hypothesis. Complete the entries in the table below for the different sample sizes shown. Remember that the observed proportion is 0.51 in all cases.

H0: p ≤ 0.50
HA: p > 0.50

Sample Size   One-tailed p-value   Statistically significant results? (yes or no)
100
1000
10000
100000


2. Look at the table. What happens to the p-value as the sample size increases?

3. So after you ran enough people through your study you were able to report your results were statistically significant and you want to begin seeking funding. Give and defend two reasons why you are likely to still have a very unconvincing case.
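The sample-size pattern in Questions 1 through 3 can be explored with a short script. This is a minimal sketch, not the workbook's method: it uses Python's standard normal distribution in place of the printed standard score table, and it deliberately uses a hypothetical observed proportion of 0.52 and hypothetical sample sizes so that the table above is still yours to complete. The function name is invented for this example.

```python
from math import sqrt
from statistics import NormalDist

def one_tailed_p_value(p_hat, p0, n):
    """Upper-tail p-value for H0: p <= p0 versus HA: p > p0 (z-test)."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return 1 - NormalDist().cdf(z)

p0 = 0.50
p_hat = 0.52    # hypothetical observed proportion, NOT the 0.51 in the exhibit
alpha = 0.05    # Type I error rate

results = {}
for n in (100, 1000, 10000):    # hypothetical sample sizes
    p_value = one_tailed_p_value(p_hat, p0, n)
    results[n] = p_value
    verdict = "yes" if p_value < alpha else "no"
    print(f"n = {n:>6}: p-value = {p_value:.4f}, significant? {verdict}")
```

The observed proportion never changes in this sketch; only n does, which is the point to keep in mind when judging whether a significant result is also a practically important one.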

Exhibit 2

Crowd Control

Title: Duration of sleep contributes to next-day pain report in the general population
Author: R. Edwards, et al.
Source: Pain 137 (2008) 202–207

The authors of this study followed patients with chronic pain who recorded both the number of hours they slept during the previous sleep period and the frequency of their pain symptoms. Pain was recorded on a five-point scale and a summary of the data they found is in the table below:

Sleep (Hours)   Average Pain Rating   Standard Deviation   Sample Size
0–3             1.36                  1.51                 75
4               1.13                  1.36                 166
5               0.94                  1.29                 434
6               0.79                  1.11                 1138
7               0.73                  1.11                 1568
8               0.75                  1.13                 1557
9               0.71                  1.09                 339
10              1.24                  1.4                  119
11+             1.78                  1.59                 66

Question

It can be shown that a comparison of patients in the 0–3 hour category to those in the 11+ hour category is not statistically significant, in spite of a difference in means of 0.42. However, a comparison of patients in the 5-hour category to those in the 8-hour category is statistically significant, even though that difference in means is only 0.19. Give a solid reason why you think that happened. What practical implication does this have for our understanding of testing results?


BEYOND THE NUMBERS 3.33_ LEARNING OUTCOME 5

A Two-tailed Test

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

New Content

Suppose we want to test a hypothesis of the following form:

H0: p = p0
HA: p ≠ p0

This is called a “two-sided” test because the alternative is a ≠. Most experienced statisticians would agree that testing two-sided hypotheses is wiser than not. The steps involved in producing a p-value are very similar, with a few critical differences. Read on.

Step 1: Establish a value for the alpha level, the Type I error rate (typically taken to be 0.05).

Step 2: Compute the standard score (z-value):

$$ z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 (1 - p_0)}{n}}} $$

Step 3: Use a standard score table to locate the p-value associated with z. When you use the table associated with this workbook you would look that up the same way you would for the > alternative.

Step 4: Here’s the new part. Now you compare the p-value to α/2, the preset Type I error rate divided by 2. If it is smaller, reject H0; if it is larger, fail to reject H0.

Remember

The steps for testing a two-sided alternative involving a proportion are exactly the same as for the greater than alternative, until you get to the comparison with the Type I error rate. For the two-sided alternative you have to compare the p-value you found to the Type I error rate divided by 2.
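The four steps can be sketched in code as follows. This is an illustration only: the data are hypothetical (they are not the ascorbate data from the appendix), the function name is invented, Python's standard normal distribution stands in for the printed standard score table, and taking the absolute value of z is an assumption added here so that a sample proportion below p0 is handled the same way.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_proportion_test(p_hat, p0, n, alpha=0.05):
    """Steps 1-4 for H0: p = p0 versus HA: p != p0.

    Returns (z, one-tailed table p-value, reject H0?).
    """
    # Step 2: standard score
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    # Step 3: upper-tail area, looked up as for the > alternative
    # (assumption: |z| is used so that p_hat < p0 is treated symmetrically)
    p_one_tail = 1 - NormalDist().cdf(abs(z))
    # Step 4: compare with alpha / 2 rather than alpha
    reject = p_one_tail < alpha / 2
    return z, p_one_tail, reject

# Hypothetical data: 46% successes in a sample of 200, tested against p0 = 0.40
z, p_one_tail, reject = two_sided_proportion_test(p_hat=0.46, p0=0.40, n=200)
print(f"z = {z:.3f}, table p-value = {p_one_tail:.4f}, reject H0? {reject}")
```

In this made-up case the table p-value is below 0.05 but above 0.025, so the two-sided test fails to reject even though a one-sided comparison with 0.05 would have rejected.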


Questions

Please provide full solutions to the following problems.

1. Patients with advanced cancers of the stomach, bronchus, colon, ovary or breast were treated with ascorbate and their survival time post diagnosis was monitored.¹ These data are in the appendix. Test the hypothesis that the proportion of all patients with these types of cancer (treated with ascorbate) who will survive more than a year is different from 0.40. Assume a Type I error rate of α = 0.05. Report a p-value, state what your decision is, and explain why.

2. Suppose you want to test a hypothesis about a proportion, similar to what you’ve just done, but you don’t know whether you want to use a two-tailed or a one-tailed alternative. You do know you have to have a Type I error rate of 0.05. You absentmindedly take a look at your data results before forming HA, and you notice that p̂ > p0. So you decide to go with a one-sided HA. Why might this be considered cheating? Be very clear with your reasons.

¹ Cameron, E. and Pauling, L. (1978). Supplemental ascorbate in the supportive treatment of cancer: reevaluation of prolongation of survival times in terminal human cancer. Proceedings of the National Academy of Sciences USA, 75, 4538–4542.


BEYOND THE NUMBERS 3.34_ LEARNING OUTCOME 5

Confidence Intervals for Testing

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

New Content

You remember confidence intervals from Module II. It is often possible to use confidence intervals to actually test a hypothesis. Here’s how you would do it for the following hypothesis:

H0: p = p0
HA: p ≠ p0

Pay attention because we have to refine some of what you were told about confidence intervals in Module II. More sophistication, more truth! Read on.

Step 1: Establish a value for the alpha level, the Type I error rate (typically taken to be 0.05), and divide that by 2.

Step 2: Find the standard score from a standard score table like the one in this workbook¹ that corresponds to α/2. Call this z*. If α = 0.05, then z* = 1.96.

Step 3: Form the confidence interval

$$ \hat{p} \pm z^{*} \sqrt{\frac{p_0 (1 - p_0)}{n}} $$

Step 4: See if p0 is in this interval. If it is not, you can reject H0 with the assumed Type I error rate. If it is, then you have to fail to reject H0. No comments are made about p-values when using confidence intervals for testing.

Connections

You form the interval with a z* that would, in the words of Module II, correspond to a level of confidence of 100(1 − α)%. This is equivalent to the computation found in Step 2, and it is convenient to know that computation since only a few z* values were given in that table back in Module II. The confidence interval in Step 3 looks a bit different than what you saw in Module II. Back then there was no context for p0 since there were no hypotheses to be tested. If you plug in ½ for p0, then that earlier formula results. It turns out a value of ½ is “conservative” in a sense your instructor may want to clarify. In any case, in this activity make sure you use the formula in Step 3.

¹ Standard score tables are indexed differently in different publications. These instructions apply to how the standard score table is formatted in this workbook.
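The confidence-interval procedure can be sketched in code as follows. This is a hedged illustration: the data are hypothetical (not the ascorbate data), the function name is invented, and z* is computed from Python's standard normal distribution rather than read from the workbook's table.

```python
from math import sqrt
from statistics import NormalDist

def ci_test(p_hat, p0, n, alpha=0.05):
    """Test H0: p = p0 versus HA: p != p0 with the Step 3 interval.

    Returns (lower, upper, reject H0?).
    """
    # Steps 1-2: standard score that leaves alpha / 2 in the upper tail
    z_star = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    # Step 3: interval centered at p_hat, with the spread computed from p0
    margin = z_star * sqrt(p0 * (1 - p0) / n)
    lower, upper = p_hat - margin, p_hat + margin
    # Step 4: reject only if p0 falls outside the interval
    reject = not (lower <= p0 <= upper)
    return lower, upper, reject

# Hypothetical data: 46% successes in a sample of 200, tested against p0 = 0.40
lower, upper, reject = ci_test(p_hat=0.46, p0=0.40, n=200)
print(f"interval = ({lower:.4f}, {upper:.4f}), reject H0? {reject}")
```

Changing `alpha` to 0.01 widens the interval, since z* grows as the allowed Type I error rate shrinks.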


Questions

Please provide full solutions to the following problems.

1. Patients with advanced cancers of the stomach, bronchus, colon, ovary or breast were treated with ascorbate and their survival time post diagnosis was monitored.² These data are in the appendix. Use a confidence interval to test the hypothesis that the proportion of all patients with these types of cancer (treated with ascorbate) who will survive more than a year is different from 0.40. Assume a Type I error rate of α = 0.05. Report what your decision is, and explain why.

2. Revisit Question 1. Do the same testing exercise over, using a confidence interval, but start with a Type I error rate of 0.01 instead of 0.05. Report your decision and explain why you made it the way you did.

² Cameron, E. and Pauling, L. (1978). Supplemental ascorbate in the supportive treatment of cancer: reevaluation of prolongation of survival times in terminal human cancer. Proceedings of the National Academy of Sciences USA, 75, 4538–4542.


BEYOND THE NUMBERS 3.35_ LEARNING OUTCOME 5

Single Mean Test

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

New Content¹

Throughout this workbook we have focused only on hypotheses involving a proportion. There are important pedagogical reasons for that and, as your instructor has undoubtedly told you often, the logic of all these tests, no matter how different the actual tests look, is the same. Still, you can only do so much practically with a simple test for a proportion. In this brief activity you will be introduced to a test involving a single population mean μ. We will only look at the “two-sided” test of the form:

H0: μ = μ0
HA: μ ≠ μ0

Step 1: Establish a value for the alpha level, the Type I error rate (typically taken to be 0.05).

Step 2: Compute the standard score (z-value)²:

$$ z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$

where x̄ is the sample mean, s the sample standard deviation (see BN1.12), n the sample size, and μ0 the hypothesized value of the population mean μ.

Step 3: Now you have to get the p-value. Provided n is not too small, it is acceptable to do the following. Take z to the table in this workbook the same way you would for the > alternative and record that value. The p-value for the two-tailed alternative is twice this value.

Step 4: Compare the two-tailed p-value to α (the preset Type I error rate). If it is smaller, reject H0; if it is larger, fail to reject H0.

¹ This is only the briefest of introductions to a very common, very simple test of hypothesis. This workbook is not intended to be a collection of statistical methods, but rather an adventure into statistical thinking. Your instructor may want to supplement with additional problems.

² This is not really a z test, but something called a Student’s t-test. We won’t make that distinction in this workbook.


Example 

Drum Corps International sets field performance guidelines for all its competitive participants. Setup time, before a field performance, is important and has to be practiced carefully in order to meet the time constraints. These time limits may change from year to year, but the World Class division typically has 3 minutes to set up, 2 minutes to do a pre-show and then 12 minutes to actually do the main show. Let’s suppose Carolina Crown, the 2013 World Class division champions, has practiced the setup 30 times with an average setup time of 3 minutes and 4 seconds, with a standard deviation of 15.2 seconds. Test the hypothesis that the true mean time for setup is different from 3 minutes.

H0: μ = 180 secs
HA: μ ≠ 180 secs

We are given:

n = 30 sample setups
a sample mean of x̄ = 184 seconds
a standard deviation of s = 15.2 seconds
a hypothesized population mean of 180 seconds

Step 1: Let’s take alpha to be 0.05.

Step 2:

$$ z = \frac{184 - 180}{15.2 / \sqrt{30}} = 1.44 $$

Step 3: If you take 1.44 to the Standard Score Table, you get a one-sided p-value of 0.07493. So the two-tailed p-value is 0.14986.

Step 4: The two-tailed p-value is bigger than 0.05, so there is not enough statistical evidence for H0 to be rejected. Your mileage may vary depending on round-off errors. Crown needs to keep in mind that a failure to reject H0 is not the same as accepting it. They still have a sample mean that is above specifications and a rather large standard deviation of times. The best advice is for Crown to work harder on setup.

It’s as simple as this, every time. The next two pages have exercises for you to work.
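The worked example above can be reproduced with a short script. This is a minimal sketch that uses Python's standard normal distribution in place of the printed Standard Score Table; rounding z to two decimals before the lookup is an assumption made here to mimic how a table is entered.

```python
from math import sqrt
from statistics import NormalDist

# Values given in the Carolina Crown example
n = 30          # number of practice setups
x_bar = 184.0   # sample mean, in seconds
s = 15.2        # standard deviation, in seconds
mu_0 = 180.0    # hypothesized population mean, in seconds
alpha = 0.05    # Step 1: Type I error rate

# Step 2: standard score
z = (x_bar - mu_0) / (s / sqrt(n))
z_table = round(z, 2)   # assumption: the table is entered with two decimals

# Step 3: one-sided area from the normal curve, then double it
p_one_sided = 1 - NormalDist().cdf(z_table)
p_two_tailed = 2 * p_one_sided

# Step 4: compare the two-tailed p-value with alpha
reject = p_two_tailed < alpha

print(f"z = {z_table:.2f}")
print(f"one-sided p-value  = {p_one_sided:.5f}")
print(f"two-tailed p-value = {p_two_tailed:.5f}")
print(f"reject H0? {reject}")
```

Skipping the rounding of z shifts the p-value slightly, which is the kind of round-off difference Step 4 warns about.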


Questions³

1. Refer to the example above. The thirty hypothetical setup times for Crown are in the table below. Use these data and software (e.g. Microsoft Excel or any one of many online calculators; the purpose is for you to explore options and develop additional skills) to confirm the results in the above example. If your instructor does not have a preference you can do the following:

Go to http://www.wolframalpha.com/ and enter “z-test calculator” in the box
Fill out the required boxes (make sure you choose the two-tailed test)
Hit enter

In the space below record what software you ended up using, and the actual p-value it produced.

Attempt  Setup Time (secs)    Attempt  Setup Time (secs)    Attempt  Setup Time (secs)
1        200                  11       185                  21       212
2        189                  12       167                  22       203
3        180                  13       190                  23       150
4        168                  14       180                  24       175
5        168                  15       175                  25       200
6        195                  16       175                  26       182
7        167                  17       182                  27       184
8        200                  18       171                  28       165
9        167                  19       202                  29       203
10       210                  20       195                  30       180

2. Let’s change the data some. Suppose on Attempt 1 the time was erroneously recorded as 100 instead of 200. Use software to re-test the hypothesis below. Set the Type I error rate to be 0.05. Make sure you report what software you used, the two-tailed p-value, and your decision. Please note that depending on what software you use, you may have to compute the mean and standard deviation separately. If your instructor does not require a particular package, then use the online calculator site mentioned in Question 1.

H0: μ = 180 secs
HA: μ ≠ 180 secs

³ Answers may vary depending on how the standard deviation is computed. There are a couple of different ways. We have not chosen to surface that nuance in this workbook and it may not always be clear what formula a software package is using. However, with an n of 30 the answers should be pretty close regardless.
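If your software needs the mean and standard deviation entered separately, they can be computed from the thirty setup times with a few lines of code. This sketch uses Python's `statistics` module as one possible tool (the workbook does not prescribe it) and shows the two common standard deviation formulas that the footnote above alludes to.

```python
from statistics import mean, pstdev, stdev

# The thirty hypothetical setup times (in seconds) from the table above
setup_times = [200, 189, 180, 168, 168, 195, 167, 200, 167, 210,
               185, 167, 190, 180, 175, 175, 182, 171, 202, 195,
               212, 203, 150, 175, 200, 182, 184, 165, 203, 180]

x_bar = mean(setup_times)            # sample mean
s_sample = stdev(setup_times)        # divides by n - 1
s_population = pstdev(setup_times)   # divides by n

print(f"n = {len(setup_times)}")
print(f"mean = {x_bar:.1f} seconds")
print(f"standard deviation (n - 1 divisor) = {s_sample:.2f} seconds")
print(f"standard deviation (n divisor)     = {s_population:.2f} seconds")
```

The two versions differ by only about a quarter of a second here, and the divide-by-n version comes out close to the 15.2 seconds quoted in the example. For Question 2, change the first entry of `setup_times` from 200 to 100 and rerun.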


3. Reflect on the answer you found in Question 2. What can you say about how finicky this hypothesis test is to a typo in one data point?

4. The Cadets Drum and Bugle Corps from Allentown, Pennsylvania, routinely competes with Carolina Crown in the same World Class division. They are also subject to the same setup time restrictions. Suppose, based on n = 30 sample setups, they have a sample mean of x̄ = 184 seconds just like Crown, but a standard deviation of s = 5.1 seconds. Test the hypothesis shown. Use a Type I error rate of 0.05. Make sure you report the two-tailed p-value, and your decision. Explain the difference between the results you found for the Cadets, and those for Crown (from the example).

H0: μ = 180 secs
HA: μ ≠ 180 secs


BEYOND THE NUMBERS 3.36_ LEARNING OUTCOME 5

Hypothesis Testing --- Two Means

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.

New Content

It might be fair to say that you can’t really do any practically interesting testing unless you know how to do testing for two means. This is likely the minimal exposure one would need, for example, to compare two treatments in a medical experiment. Keep in mind, developing methodology skills is not the focus of this workbook. Still, it would be good to know how to test the following hypothesis just in case you are asked to do a project in another class, or the second capstone in this one. Suppose we have two populations (or treatments), one with an unknown population mean μ1 and the other with an unknown population mean μ2. One simple expression of what it would mean to test if the treatments were different is shown below:

H0: μ1 = μ2
HA: μ1 ≠ μ2

While we could approach this much the way we did in BN 3.35, we are going to focus on using software to test the hypothesis.¹ So we will change our steps to reflect this software focus.

Step 1: Establish a value for the alpha level, the Type I error rate (typically taken to be 0.05).

Step 2: Make sure you have the following ready as inputs:

The sample sizes from each of the two treatment groups
The sample means from each of the two treatment groups
The sample standard deviations from each of the two treatment groups

Step 3: Choose one of many software applications or online calculators that will allow you to enter these inputs and get a two-tailed p-value in return. If your instructor does not have a preference, you can use the GraphPad® online calculator that has appeared elsewhere in this workbook. The url is http://www.graphpad.com/quickcalcs/ttest1/?Format=SD .

Step 4: Compare the two-tailed p-value to α (the preset Type I error rate). If it is smaller, reject H0; if it is larger, fail to reject H0.

¹ Your instructor may wish to show you the formula(s). Just beware that there are many nuances to worry about and these are appropriately haggled over in a different kind of course.


Exhibit

Sleep Pains

Title: Duration of sleep contributes to next-day pain report in the general population
Author: R. Edwards, et al.
Source: Pain 137 (2008) 202–207

There is a lot of interest in how disturbed sleep is related to pain perception. The authors of this study followed patients with chronic pain who recorded both the number of hours they slept during the previous sleep period and the frequency of their pain symptoms. The pain symptoms were recorded on a five-point scale: 0 = none of the time, 1 = a little, 2 = some, 3 = most, and 4 = all of the time. A summary of the data they found is in the table below:

Sleep (Hours)   Average Pain Rating   Standard Deviation   Sample Size
0–3             1.36                  1.51                 75
4               1.13                  1.36                 166
5               0.94                  1.29                 434
6               0.79                  1.11                 1138
7               0.73                  1.11                 1568
8               0.75                  1.13                 1557
9               0.71                  1.09                 339
10              1.24                  1.4                  119
11+             1.78                  1.59                 66

We are interested in testing the following hypothesis for different pairs of sleep categories. In all cases we will take the Type I error rate to be 0.05.

H0: μ1 = μ2
HA: μ1 ≠ μ2

Questions

1. Confirm that the two-tailed p-value associated with a comparison of the patients in the 0–3 and the 5-hour sleep groups is 0.0115. What software did you use? Describe the steps involved.

2. Test the same hypothesis, but use the 5-hour group and the 8-hour group. What is the p-value, what is your decision, and why did you make the decision you did?²

² There is a serious problem with testing lots of pairs of hypotheses from a single study. It really can complicate what one means by the Type I error rate. Your instructor may choose to elaborate.
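For readers who would rather script the comparison in Question 1 than use an online calculator, here is one possible sketch. It rests on assumptions that the workbook does not state: it uses a pooled-variance two-sample statistic, and it approximates the p-value with the standard normal curve (Python's standard library has no t distribution). The function name is invented for this example.

```python
from math import sqrt
from statistics import NormalDist

def pooled_two_sample_test(m1, s1, n1, m2, s2, n2):
    """Two-tailed test of H0: mu1 = mu2 from summary statistics.

    Assumptions (not from the workbook): pooled variance, and a normal
    approximation to the p-value, which is reasonable for large samples.
    """
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))   # standard error of the difference
    t = (m1 - m2) / se
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p_two_tailed

# 0-3 hour group versus 5-hour group, taken from the table above
t, p = pooled_two_sample_test(m1=1.36, s1=1.51, n1=75,
                              m2=0.94, s2=1.29, n2=434)
alpha = 0.05
print(f"test statistic = {t:.3f}")
print(f"approximate two-tailed p-value = {p:.4f}")
print(f"reject H0? {p < alpha}")
```

The approximate p-value comes out near 0.011, close to the 0.0115 quoted in Question 1; a calculator that uses the t distribution can be expected to report a slightly larger value.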