Thinking about variation


Learning Objectives

By the end of this lecture, you should be able to:

– Discuss with an example why it is important to know the variation when analyzing a dataset

– Interpret a series of Normal curves relative to each other in terms of their center and variation

– Compare values from different datasets by comparing their z-scores


Thoughts on variation continued

• Let’s take a moment to think about spread (again)…

• Suppose you score 12 out of 15 on a test.

– Great score?
– Good score?
– Average score?
– Poor score?
– Terrible score?

• Answer: You can’t tell! I hope you’d agree that you’d at least need the mean in order to interpret how good a score this was.

• Okay then, so suppose I tell you that the mean was 11/15. Now answer the same question: Is 12/15, with a mean of 11/15, a great score, good score, average score, poor score, or terrible score?

• Answer: You STILL can’t tell! While you could say that it is somewhat better than average, you really have no way of knowing if it is approximately average, good, or great.


Thoughts on variation continued

• Suppose I tell you that the mean was 11/15. Is 12/15 a:
– Great score?
– Good score?
– Average score?
– Poor score?
– Terrible score?

• Discussion: What’s missing from this interpretation is a measure of spread. Suppose I told you that of the 500 students who took this test, the vast majority scored between 10.5 and 11.5. In this case, you’d suspect that a score of 12 was, in fact, quite good, but you couldn’t yet put a number on it.

• KEY POINT: In order to properly interpret any score (of a Normal distribution), we simply cannot ignore the standard deviation!!!

• Suppose the standard deviation was 0.5. In this case, a score of 12 is two standard deviations above the mean. This would be a score at about the 98th percentile – which is a great result.

• Suppose the standard deviation was 2. In that case, your z-score is +0.5 and you are at about the 70th percentile, which is good, but not fantastic.

• In other words, without knowing the spread, you simply do not know the story!
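The two scenarios above can be checked numerically. A minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+); the helper name `percentile_of_score` is hypothetical, introduced only for illustration:

```python
from statistics import NormalDist

def percentile_of_score(x, mean, sd):
    """Hypothetical helper: return (z, percentile) for score x under a Normal(mean, sd) model."""
    z = (x - mean) / sd
    # NormalDist().cdf(z) is the area under the standard Normal curve to the left of z
    percentile = 100 * NormalDist().cdf(z)
    return z, percentile

# Same score (12) and mean (11), two different standard deviations
for sd in (0.5, 2):
    z, pct = percentile_of_score(12, mean=11, sd=sd)
    print(f"sd={sd}: z={z:+.1f}, percentile ≈ {pct:.1f}")
```

The score and the mean never change; only the standard deviation does, and that alone moves the result from roughly the 98th percentile to roughly the 69th to 70th.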


What’s different? What’s the same?

[Figure: two groups of Normal curves drawn over a horizontal axis running from 0 to 30]

In this group, the means are different ($\mu$ = 10, 15, and 20) while the standard deviations are the same ($\sigma$ = 3).

In this group, the means are the same ($\mu$ = 15) but the standard deviations are different ($\sigma$ = 2, 4, and 6).
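One rough way to read the two groups of curves numerically: by the 68-95-99.7 rule, about 95% of any Normal curve lies within two standard deviations of its mean, so the interval μ ± 2σ summarizes both where a curve sits and how wide it is. The parameters below are the ones listed for the two groups; the helper `middle_95_range` is a hypothetical name used only for this sketch:

```python
from statistics import NormalDist

# Group 1: different means, same standard deviation
group_1 = [(10, 3), (15, 3), (20, 3)]
# Group 2: same mean, different standard deviations
group_2 = [(15, 2), (15, 4), (15, 6)]

def middle_95_range(mu, sigma):
    """Hypothetical helper: interval holding roughly the middle 95% of a Normal(mu, sigma) curve."""
    return mu - 2 * sigma, mu + 2 * sigma

for label, group in (("Group 1", group_1), ("Group 2", group_2)):
    for mu, sigma in group:
        low, high = middle_95_range(mu, sigma)
        dist = NormalDist(mu, sigma)
        # Exact share of the curve inside [low, high]; about 0.95 for every Normal curve
        share = dist.cdf(high) - dist.cdf(low)
        print(f"{label}: mu={mu}, sigma={sigma}: {low} to {high} (share {share:.3f})")
```

In Group 1 the interval slides to the right (4 to 16, 9 to 21, 14 to 26) without changing width; in Group 2 the intervals share a center but widen (11 to 19, 7 to 23, 3 to 27).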


Another extremely useful thing about working with normally distributed data is that we can compare apples and oranges! That is, because we can convert any observation into a z-score, we can answer questions that compare seemingly non-comparable distributions.
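For reference, the conversion this relies on is the standard z-score formula. It expresses an observation as the number of standard deviations it lies above (positive z) or below (negative z) the mean of its own distribution, which is what removes the original units and lets different scales be compared. The right-hand form simply solves the same equation for x:

```latex
z = \frac{x - \mu}{\sigma}
\qquad\Longleftrightarrow\qquad
x = \mu + z\,\sigma
```

For instance, in the test example earlier, $z = (12 - 11)/0.5 = +2$.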


SAT vs ACT

• Question: Suppose that student A scores 1140 on their SAT, and student B scores 18.2 on their ACT. You are an admissions counselor and you need to make a decision based exclusively on their test score. Can you use this data to decide?

• Answer: If you can convert these numbers to their corresponding z-scores, then absolutely! To do so, you would, of course, need to know the mean and standard deviation of the two exams. This information is routinely provided by the testing services.

• E.g., if student A had a z-score of +1, that means he was at about the 84th percentile for the SAT. If student B had a z-score of +1.3, that means that he was at about the 90th percentile for the ACT. So even though they took completely different exams, you do have a way of comparing them: relative to the other test-takers, student B did better.
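A minimal sketch of that comparison. Because the slides do not give the exams' actual means and standard deviations, it takes the illustrative z-scores from the example (+1 and +1.3) as given rather than computing them from raw scores:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Illustrative z-scores from the example (not computed from real SAT/ACT statistics)
z_scores = {"Student A (SAT)": 1.0, "Student B (ACT)": 1.3}

for student, z in z_scores.items():
    percentile = 100 * standard_normal.cdf(z)
    print(f"{student}: z = {z:+.1f} -> about the {percentile:.0f}th percentile")

# The larger z-score is the stronger result relative to each student's own test-takers
best = max(z_scores, key=z_scores.get)
print("Relatively stronger score:", best)
```

With the real means and standard deviations in hand, each z-score would come from (score − mean) / standard deviation before this step.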


Example: Gestation time in malnourished mothers

A study was done in which the gestation time of mothers in a poor neighborhood was measured. While there were free prenatal vitamins available, there was a great deal of misinformation about proper prenatal nutrition. The gestation time of this group can be seen on the light-blue curve below.

Over the next couple of years, a public health project was implemented at local health-care institutions in which women were also provided with nutritional counseling and healthier food. The results of a study after the nutritional program was implemented are summarized on the orange graph below.

Try to interpret the results in your own words…


Try to interpret the results in your own words…

• The mean gestational time improved from about 250 days to about 266 days.

• In addition to the mean improving, more women had gestation times close to the mean (the peak of the orange curve is higher than the peak of the blue curve).

• There was more consistency in the “better nutrition” group: the spread of the orange distribution is narrower. (You can simply eyeball this, but you can also quantify it with the standard deviation.)


Don’t feel bad if you didn’t automatically ‘get’ all these facts.

That’s why we do examples here! Your goal should be to begin making these kinds of interpretations on your own.
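The “higher peak” observation can be put on a number: a Normal curve's peak height is 1/(σ√(2π)), so a smaller standard deviation gives a taller, narrower curve. This sketch assumes the parameters used in the calculations that follow (μ = 250, σ = 20 for vitamins only; μ = 266, σ = 15 with nutritional counseling):

```python
import math
from statistics import NormalDist

groups = {
    "Vitamins only": NormalDist(mu=250, sigma=20),
    "Counseling + better food": NormalDist(mu=266, sigma=15),
}

for name, dist in groups.items():
    # The density evaluated at the mean is the height of the curve's peak
    peak = dist.pdf(dist.mean)
    # Closed-form peak height, for comparison
    formula = 1 / (dist.stdev * math.sqrt(2 * math.pi))
    print(f"{name}: peak height {peak:.4f} (formula gives {formula:.4f})")
```

The peak height is a density: it says how tightly values cluster around the mean, not how many women hit the mean exactly.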


A commonly accepted (ideal) minimum gestational period is about 240 days. How might we quantify the improvement shown below?

Instead of waiting for me to answer, try to come up with it on your own. That is, STOP and THINK about it for a moment…

Answer: The best way would be to look at the percentage of women who reached the target of 240 days in each group.


Vitamins Only: $\mu = 250$, $\sigma = 20$, $x = 240$

In the group without nutritional counseling (vitamins only), what percent of mothers failed to carry their babies at least 240 days?

$z = \frac{x - \mu}{\sigma} = \frac{240 - 250}{20} = \frac{-10}{20} = -0.5$ (half a standard deviation below the mean)

The area under the curve to the left of z = -0.5 is 0.3085.

Vitamins only: About 31% of women failed to reach the target length of 240 days.
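The table lookup can be cross-checked in code. A minimal sketch with Python's `statistics.NormalDist`, using the vitamins-only parameters ($\mu = 250$, $\sigma = 20$):

```python
from statistics import NormalDist

vitamins_only = NormalDist(mu=250, sigma=20)
target_days = 240

# Standardize: (240 - 250) / 20
z = (target_days - vitamins_only.mean) / vitamins_only.stdev
# Area to the left of 240 days under the unstandardized curve
share_below = vitamins_only.cdf(target_days)

print(f"z = {z}, share below {target_days} days = {share_below:.4f}")
```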


Nutritional counseling and better food: $\mu = 266$, $\sigma = 15$, $x = 240$

$z = \frac{x - \mu}{\sigma} = \frac{240 - 266}{15} = \frac{-26}{15} \approx -1.73$ (almost 2 standard deviations below the mean)

The area to the left of z = -1.73 is 0.0418.

Nutritional assistance program: Only about 4% of women failed to carry their babies at least 240 days!

Conclusion: Compared to vitamin supplements alone, vitamins and better food resulted in a much smaller percentage of women with pregnancy terms below 8 months (4% vs. 31%).
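The two-group comparison behind the conclusion can be sketched the same way, again with `statistics.NormalDist` and the parameters given for each group:

```python
from statistics import NormalDist

target_days = 240
groups = {
    "Vitamins only": NormalDist(mu=250, sigma=20),
    "Counseling + better food": NormalDist(mu=266, sigma=15),
}

for name, dist in groups.items():
    z = (target_days - dist.mean) / dist.stdev
    # Exact area to the left of 240 days; z is not rounded first
    share_below = dist.cdf(target_days)
    print(f"{name}: z = {z:.2f}, {100 * share_below:.1f}% below {target_days} days")
```

Because the code does not round z to -1.73 before the lookup, it gives about 0.0415 for the counseling group rather than the table's 0.0418; both round to about 4%.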


Going in the other direction… Remember: stats teachers love this!!

We may also want to find the observed range of values that corresponds to a given proportion/area under the curve.

For that, we go backward and start with the normal table: we first find the desired area/proportion in the body of the table, then read the corresponding z-value from the left column and top row.

For an area to the left of 1.25% (0.0125), the z-value is -2.24.
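Software does the same backward lookup with the inverse of the cumulative distribution function. A minimal sketch with `statistics.NormalDist.inv_cdf`, which replaces the search through the body of the table:

```python
from statistics import NormalDist

standard_normal = NormalDist()

# Backward lookup: start from an area to the left, get the z-value
area_left = 0.0125
z = standard_normal.inv_cdf(area_left)

# Round trip: the area to the left of that z should be 0.0125 again
area_check = standard_normal.cdf(z)

print(f"z for a left-tail area of {area_left}: {z:.2f} (area check {area_check:.4f})")
```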


Example: $\mu = 266$, $\sigma = 15$, upper area 75%

How long are the longest 75% of pregnancies when mothers in the neighborhood are entered in the “better food” program?

Answer: This is another case where we start with an area and need to come back to our ‘x’.

An upper area of 75% corresponds to a lower area of 25%. The z-value for a lower area of 25% is about -0.67.

$z = \frac{x - \mu}{\sigma} \;\Rightarrow\; x = \mu + z\sigma = 266 + (-0.67)(15) = 255.95 \approx 256$

Conclusion: The longest 75% of pregnancies in this group are about 256 days or longer.
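The same backward calculation, sketched with `statistics.NormalDist` for the “better food” group ($\mu = 266$, $\sigma = 15$). It follows the two steps above (area to z, then z to x) and also shows the one-step shortcut on the unstandardized curve:

```python
from statistics import NormalDist

better_food = NormalDist(mu=266, sigma=15)

# "Longest 75%" means an upper area of 0.75, so the cutoff has a lower area of 0.25
lower_area = 1 - 0.75

# Step 1: area -> z on the standard Normal curve
z = NormalDist().inv_cdf(lower_area)
# Step 2: z -> x using x = mu + z * sigma
cutoff = better_food.mean + z * better_food.stdev

# One-step equivalent directly on the N(266, 15) curve
direct = better_food.inv_cdf(lower_area)

print(f"z = {z:.3f}, cutoff = {cutoff:.1f} days (direct: {direct:.1f})")
```

The unrounded z is about -0.674, giving a cutoff near 255.9 days; the table value z = -0.67 gives 255.95. Both round to about 256 days.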