
Chapter 3 - Normal Distribution

3.1 a. Original data:

1 2 2 3 3 3 4 4 4 4 5 5 5 6 6 7

[Figure: histogram of the original data. x-axis: score (0 to 7); y-axis: Frequency (0 to 4).]

b. To convert the distribution to a distribution of X - µ, subtract µ = 4 from each score:

-3 -2 -2 -1 -1 -1 0 0 0 0 1 1 1 2 2 3

c. To complete the conversion to z, divide each score by σ = 1.63:

-1.84 -1.23 -1.23 -0.61 -0.61 -0.61 0 0 0 0 0.61 0.61 0.61 1.23 1.23 1.84
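The three steps of Exercise 3.1 can be sketched in Python as a quick check. This is an illustration only, not part of the original solution: it assumes that the 1.63 used in part c is the sample standard deviation (N - 1 denominator), and the helper name `to_z` is made up for the example.

```python
from statistics import mean, stdev

# Original data from Exercise 3.1a
scores = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7]


def to_z(xs):
    """Convert raw scores to z scores: subtract the mean, divide by s."""
    m = mean(xs)
    s = stdev(xs)  # assumption: N - 1 denominator, which gives about 1.63 here
    return [(x - m) / s for x in xs]


z_scores = to_z(scores)
print(mean(scores), round(stdev(scores), 2))
print([round(z, 2) for z in z_scores])
```

Because the code divides by the unrounded standard deviation, a few values can differ from the hand calculation in the second decimal place (for example -1.22 rather than -1.23).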

3.2 Converting specific scores from distribution in Exercise 3.1 into z scores:

$z = \frac{X - \mu}{\sigma}$

$z = \frac{2.5 - 4}{1.63} = -0.92 \qquad z = \frac{6.2 - 4}{1.63} = 1.35 \qquad z = \frac{9 - 4}{1.63} = 3.07$

score (X)   z score

2.5   -0.92   18% of the distribution lies below X = 2.5

6.2   +1.35   91% of the distribution lies below X = 6.2

9.0   +3.07   99.9% of the distribution lies below X = 9.0

3.3 Errors counting shoppers in a major department store:

a. $z = \frac{X - \mu}{\sigma}$

$z = \frac{960 - 975}{15} = \frac{-15}{15} = -1 \qquad z = \frac{990 - 975}{15} = \frac{15}{15} = 1$

between -1 and µ lie .3413

between +1 and µ lie .3413

total .6826

Therefore between 960 and 990 are found approximately 68% of the scores.

b. 975 = µ; therefore 50% of the scores lie below 975.

c. .5000 lie below 975

.3413 lie between 975 and 990

.8413 (or 84%) lie below 990

3.4 Using the data in Exercise 3.3:

a. From Appendix z:

z score   area between z and mean

0.67     0.2486

0.6745   0.2500 [interpolation from Appendix z]

0.68     0.2517

Therefore z = ±.6745 encompasses the middle 50%.

$z = \frac{X - \mu}{\sigma}$

$\pm .6745 = \frac{X - 975}{15}$

$X = 964.88 \text{ and } 985.12$

50% of the scores lie between counts of 965 and 985.

b. 75% of the counts would be less than 985 because we just calculated the middle 50%,

25% of which lie on either side of the mean. Since 50% lie below the mean, 50 + 25 =

75% lie below 985.


c. What scores would 95% of the counts lie between?

$z = \frac{X - \mu}{\sigma}$

$\pm 1.96 = \frac{X - 975}{15}$

$X = 945.6 \text{ and } 1004.4$

95% of the counts would lie between 946 and 1004.
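The interval calculations in Exercise 3.4 can be reproduced without the table, as a rough sketch. It assumes the counts are normal with µ = 975 and σ = 15, as in Exercise 3.3; the function name `central_interval` is hypothetical and used only for this illustration.

```python
from statistics import NormalDist

# Assumed model for the shopper counts (from Exercise 3.3)
counts = NormalDist(mu=975, sigma=15)


def central_interval(dist, coverage):
    """Return the two scores that enclose the middle `coverage` of dist."""
    tail = (1 - coverage) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


low50, high50 = central_interval(counts, 0.50)
low95, high95 = central_interval(counts, 0.95)
print(round(low50, 2), round(high50, 2))  # middle 50%
print(round(low95, 1), round(high95, 1))  # middle 95%
```

The inverse normal function replaces the interpolation in Appendix z, so the results agree with the hand calculation up to rounding.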

3.5 The supervisor’s count of shoppers:

$z = \frac{X - \mu}{\sigma} = \frac{950 - 975}{15} = -1.67$

The area beyond z = ±1.67 is 2(.0475) = .095; therefore 9.5% of the time scores will be at least this extreme.

3.6 a. Sketch:

b. $z = \frac{X - \mu}{\sigma} = \frac{30 - 25}{5} = 1.00$

The smaller portion for z = 1.00 is .1587. Therefore 16% of the 4th-graders score better than the average 9th-grader.

c. $z = \frac{X - \mu}{\sigma} = \frac{25 - 30}{10} = -0.5$

The smaller portion for z = -0.5 is .3085. Therefore 31% of the 9th-graders score worse than the average 4th-grader.

3.7 They would be equal when the two distributions have the same standard deviation.


3.8 Diagnostically meaningful cutoff:

$z = \frac{X - \mu}{\sigma}$

$-1.2817 = \frac{X - 150}{30}$

$X = 111.549$, the diagnostically meaningful cutoff

z score   area above z

1.2800   0.1003

1.2817   0.1000

1.2900   0.0985

3.9 Next year's salary raises:

a. $z = \frac{X - \mu}{\sigma}$

$1.2817 = \frac{X - 2000}{400}$

$X = \$2512.68$

10% of the faculty will have a raise equal to or greater than $2,512.68.

b. $z = \frac{X - \mu}{\sigma}$

$-1.645 = \frac{X - 2000}{400}$

$X = \$1342$

The 5% of the faculty who haven't done anything useful in years will receive no more

than $1,342 each, and probably don’t deserve that much.


3.10 Introductory Psychology students checking seatbelt usage:

a. Plot of distribution:

b. $z = \frac{X - \mu}{\sigma} = \frac{62 - 44}{7} = 2.57, \quad p = .0051$

A count this high (or higher) would occur by chance only .5% of the time. The

suspicion is that he just made up a number.

3.11 Transforming scores on diagnostic test for language problems:

X1 = original scores   µ1 = 48   σ1 = 7

X2 = transformed scores   µ2 = 80   σ2 = 10

$\sigma_2 = \sigma_1 / C$

$10 = 7 / C$

$C = 0.7$

Therefore to transform the original standard deviation from 7 to 10, we need to divide the original scores by .7. However, dividing the original scores by .7 also divides their mean by .7.

$\bar{X}_2 = \bar{X}_1 / 0.7 = 48 / .7 = 68.57$

We want to raise the mean to 80. 80 - 68.57 = 11.43. Therefore we need to add 11.43 to each score.

X2 = X1/0.7 + 11.43. [This formula summarizes the whole process.]
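As a rough check on the transformation in Exercise 3.11, the sketch below applies X2 = X1/0.7 + 11.43 to a tiny made-up set of scores. The two scores in `original` are hypothetical, chosen only because they have a mean of 48 and a standard deviation of 7; they are not data from the exercise, and the names `C`, `SHIFT`, and `transform` are likewise invented for the example.

```python
from statistics import mean, pstdev

C = 0.7        # sigma_1 / sigma_2 = 7 / 10
SHIFT = 11.43  # 80 - 48 / 0.7, rounded as in the text


def transform(x1):
    """Apply X2 = X1 / 0.7 + 11.43 to one original score."""
    return x1 / C + SHIFT


# Hypothetical two-score "distribution" with mean 48 and sigma 7,
# used only to show the effect of the transformation.
original = [41, 55]
transformed = [transform(x) for x in original]

print(mean(original), pstdev(original))
print(round(mean(transformed), 2), round(pstdev(transformed), 2))
```

Dividing by 0.7 stretches the spread from 7 to 10, and the added constant moves the mean to 80 (to within the rounding of 11.43) without affecting the spread.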

3.12 Skewed distribution of diagnostic test for language problems:

a. Diagram:

[Figure: Distribution of Language Test Scores. x-axis: Score (10 to 50); y-axis: Proportion (0.00 to 0.08).]

b. To find the cutoff for the bottom 10% if the distribution is not normal, empirically

count up from the bottom.

3.13 October 1981 GRE, all people taking exam:

$z = \frac{X - \mu}{\sigma} = \frac{600 - 489}{126} = 0.88$

$p(\text{larger portion}) = 0.81$

A GRE score of 600 would correspond to the 81st percentile.


3.14 For the data in Exercise 3.13:

$z = \frac{X - \mu}{\sigma}$

$0.6745 = \frac{X - 489}{126}$

$X = 573.987$

z score   p

0.6700   0.7486

0.6745   0.7500

0.6800   0.7517

A GRE score of (.6745*126 + 489) = 574 would correspond to the 75th percentile.

3.15 October 1981 GRE, all seniors and nonenrolled college graduates:

$z = \frac{X - \mu}{\sigma} = \frac{600 - 507}{118} = 0.79 \qquad p = .785$

$0.6745 = \frac{X - 507}{118}$

$X = 586.591$

For seniors and nonenrolled college graduates, a GRE score of 600 is at the 79th

percentile, and a score of 587 would correspond to the 75th percentile.

3.16 Percentiles are dependent upon the reference group. As a group, the seniors and

nonenrolled college graduates did better on the GRE than did all people taking the exam.

A person receiving a given score, therefore, did better (scored at a higher percentile)

when compared to all people taking the exam than when compared only to the seniors

and nonenrolled college graduates taking the exam.

3.17 GPA scores:

N = 88   $\bar{X} = 2.46$   s = 0.86 [calculated from data set]

$z = \frac{X - \bar{X}}{s}$

$0.6745 = \frac{X - 2.46}{0.86}$

$X = 3.04$

The 75th percentile for GPA is 3.04.


3.18 Diagnostically meaningful cutoff for Behavior Problem scores:

$z = \frac{X - \mu}{\sigma}$

$2.054 = \frac{X - 50}{10}$

$X = 70.54$ = cutoff

z score   p

2.050   0.9798

2.054   0.9800

2.060   0.9803

3.19 There is no meaningful discrimination to be made among those scoring below the mean,

and therefore all people who score in that range are given a T score of 50.

3.20

Notice that some of the plots don’t look as neat as you might expect. Notice also that as

the sample size increases the plots look better.


3.21 Weight gain data

None of these is very close to normal, but the post intervention weight is closest.


3.22 Hand-calculated qqplot

z cumfreq cumperct zperc

-3.0 0 0.0000 0.00135

-2.5 0 0.0000 0.00621

-2.0 0 0.0000 0.02275

-1.5 11 0.0367 0.06681

-1.0 53 0.1767 0.15866

-0.5 97 0.3233 0.30854

0.0 162 0.5400 0.50000

0.5 220 0.7333 0.69146

1.0 259 0.8633 0.84134

1.5 278 0.9267 0.93319

2.0 289 0.9633 0.97725

2.5 292 0.9733 0.99379

3.0 300 1.0000 0.99865
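The last two columns of the table in Exercise 3.22 can be recomputed as a check. The sketch below assumes N = 300 (the final cumulative frequency) and uses the standard normal cumulative distribution in place of the printed table; the variable names simply mirror the column headings.

```python
from statistics import NormalDist

N = 300  # assumption: total sample size, taken from the last cumfreq entry
z_points = [-3.0, -2.5, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
cumfreq = [0, 0, 0, 11, 53, 97, 162, 220, 259, 278, 289, 292, 300]

std_normal = NormalDist()  # mean 0, standard deviation 1

rows = []
for z, cf in zip(z_points, cumfreq):
    cumperct = cf / N          # observed cumulative proportion
    zperc = std_normal.cdf(z)  # proportion expected under normality
    rows.append((z, cf, cumperct, zperc))
    print(f"{z:5.1f} {cf:4d} {cumperct:7.4f} {zperc:8.5f}")
```

Plotting `cumperct` against `zperc` would give the hand-made plot the exercise asks for; points close to the diagonal indicate an approximately normal distribution.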

3.23 I would first draw 16 scores from a normally distributed population with µ = 0 and σ = 1.

Call this variable z1. The sample (z1) would almost certainly have a sample mean and

standard deviation that are not 0 and 1. Then I would create a new variable z2 = z1 - mean(z1).

This would have a mean of 0.00. Then I would divide z2 by sd(z1) to get a

new distribution (z3) with mean = 0 and sd = 1. Then make that variable have a st. dev.

of 4.25 by multiplying it by 4.25. Finally add 16.3 (the new mean). Now the mean is

exactly 16.3 and the standard deviation is exactly 4.25.
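The procedure in Exercise 3.23 can be sketched as follows. This is one possible rendering in Python rather than the software the text has in mind; the function name `exact_sample` and the fixed random seed are assumptions made so the example is reproducible.

```python
import random
from statistics import mean, stdev


def exact_sample(n=16, target_mean=16.3, target_sd=4.25, seed=0):
    """Draw n normal scores, then rescale them to an exact mean and sd."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]    # mean and sd only roughly 0 and 1
    m, s = mean(z1), stdev(z1)                  # sample mean and sd (N - 1)
    z3 = [(z - m) / s for z in z1]              # now exactly mean 0, sd 1
    return [z * target_sd + target_mean for z in z3]


sample = exact_sample()
print(round(mean(sample), 6), round(stdev(sample), 6))
```

Centering and scaling by the sample's own mean and standard deviation is what makes the final values exact (up to floating-point error) rather than approximate.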


3.24 Salaries for assistant professors (1999-2000)

I expect that you would do reasonably well if you treated these as normally distributed,

especially if you calculated a trimmed mean and a Winsorized standard deviation. The

extreme salaries probably come from people who have either stayed at the rank of

Assistant Professor for many years, possibly because they don’t have the highest degree

in their field, or those who have come to the university with considerable nonacademic

experience. If you took the log of the salaries you would reduce the high end of the

distribution more than the low end, reducing the effect of very large salaries.

3.25 SAT Data

[Figure: histogram of the variable combined. x-axis: combined (840 to 1,100); y-axis: Frequency (0 to 12). Mean = 965.92, Std. Dev. = 74.82056, N = 50.]

The data are actually bimodal, with probably too few scores at the extremes.

[Figure: histogram of the variable adjcomb. x-axis: adjcomb (-80 to 80); y-axis: Frequency (0 to 14). Mean = 5.9674E-16, Std. Dev. = 34.53279, N = 50.]

These data are much more normally distributed. As we will see in Chapter Nine, there are two kinds of students who take the SAT, depending on where they live. In the East most students take it. In the West, students applying to high ranking eastern schools take it. This leads to the bimodal distribution in the unadjusted scores.


Chapter 4 – Sampling Distributions and Hypothesis Testing

4.1 Was last night's game an NHL hockey game?

a. Null hypothesis: The game was actually an NHL hockey game.

b. On the basis of that null hypothesis I expected that each team would earn somewhere

between 0 and 6 points. I then looked at the actual points and concluded that they

were way out of line with what I would expect if this were an NHL hockey game. I

therefore rejected the null hypothesis.

4.2 Am I overcharged at lunch?

a. Sketch:

b. No, $4.25 is a common observation.

c. I set up the null hypothesis that I was charged correctly. Therefore I would expect to

receive about $1.00 in change, give or take a quarter or so. The change that I

received was in line with that expectation, and therefore I have no basis for rejecting

H0.

4.3 A Type I error would be concluding that I had been shortchanged when in fact I had not.

4.4 A Type II error would be concluding that I had not been shortchanged when in fact I had.

4.5 The critical value would be that amount of change below which I would decide that I had

been shortchanged. The rejection region would be all amounts less than the critical

value—i.e., all amounts that would lead to rejection of H0.


4.6 I would adopt a one-tailed test (using the right-hand tail) if I wanted to detect being

shortchanged but was not concerned about receiving too much money. In that case I

would not reject the null hypothesis no matter how much excess change I received (i.e., I

would not care if the restaurant was being cheated). If I chose the wrong tail, however, I

would be looking out for the restaurant's interests and ignoring my own.

4.7 Was the son of the member of the Board of Trustees fairly admitted to graduate school?

$z = \frac{X - \mu}{\sigma} = \frac{490 - 650}{50} = -3.2$

z score   p

3.00   0.0013

3.20   0.0007

3.25   0.0006

The probability that a student drawn at random from those properly admitted would have

a GRE score as low as 490 is .0007. I suspect that the fact that his mother was a member

of the Board of Trustees played a role in his admission.

4.8 The standard deviation is small because we have restricted our sample to the admitted

students, i.e., a high-scoring sample.

4.9 The distribution would drop away smoothly to the right for the same reason that it always

does—there are few high-scoring people. It would drop away steeply to the left because

fewer of the borderline students would be admitted (no matter how high the borderline is

set).

4.10 I would draw a very large number of samples. For each sample I would calculate the

mode, the range, and their ratio (M). I would then plot the resulting value of M.

4.11 M is called a test statistic.

4.12 Is the car at the stop sign going to stay there (H0) or dart out in front of you (H1)?

4.13 The alternative hypothesis is that this student was sampled from a population of students

whose mean is not equal to 650.

4.14 Sampling error is variability in a statistic from sample to sample that is due to chance—

i.e., due to which observations happened to be included in the sample.

4.15 The word "distribution" refers to the set of values obtained for any set of observations.

The phrase "sampling distribution" is reserved for the distribution of outcomes (either

theoretical or empirical) of a sample statistic.

4.16 If α were to decrease, β would increase and power would decrease.


4.17 a. Research hypothesis—Children who attend kindergarten adjust to 1st grade faster

than those who do not. Null hypothesis—1st-grade adjustment rates are equal for

children who did and did not attend Kindergarten.

b. Research hypothesis—Sex education in junior high school decreases the rate of

pregnancies among unmarried mothers in high school. Null hypothesis—The rate of

pregnancies among unmarried mothers in high school is the same regardless of the

presence or absence of sex education in junior high school.

4.18 Probability of a Type II error (β) for distribution in Figure 4.4:

$X = 67 \qquad \mu = 80 \qquad \sigma = 20$

$z = \frac{X - \mu}{\sigma} = \frac{67 - 80}{20} = -0.65$

Looking z = -0.65 up in the Appendix, we find that .7422 of the scores fall above a score of 67. β is therefore 0.74.

4.19 Finger-tapping cutoff if α = .01:

$z = \frac{X - \mu}{\sigma}$

$-2.327 = \frac{X - 100}{20}$

$X = 53.46$

z score   p

2.3200   0.9898

2.3270   0.9900

2.3300   0.9901

For α to equal .01, z must be -2.327. The cutoff score is therefore 53. The corresponding value for z when a cutoff score of 53 is applied to the curve for H1:

$z = \frac{X - \mu}{\sigma} = \frac{53.46 - 80}{20} = -1.33$

Looking z = -1.33 up in Appendix z, we find that .9082 of the scores fall above a score of 53.46. β is therefore 0.908.
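The two steps in Exercise 4.19 (find the cutoff under H0, then find the area above it under H1) can be sketched numerically. The code assumes the means and standard deviation used in the hand calculation, namely µ = 100 under H0, µ = 80 under H1, and σ = 20 for both; the variable names are illustrative only.

```python
from statistics import NormalDist

# Assumed distributions, taken from the values used in the hand calculation
h0 = NormalDist(mu=100, sigma=20)  # distribution under the null hypothesis
h1 = NormalDist(mu=80, sigma=20)   # distribution under the alternative
alpha = 0.01

# Cutoff: the score with only alpha of the H0 distribution below it
cutoff = h0.inv_cdf(alpha)

# Beta: the part of the H1 distribution that falls above the cutoff,
# i.e. the cases in which we would fail to reject H0
beta = 1 - h1.cdf(cutoff)
power = 1 - beta

print(round(cutoff, 2), round(beta, 3), round(power, 3))
```

Small differences from the table-based answer are expected, because the code does not round z before looking up the area.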

4.20 In Section 4.11 we were running a one-tailed test so we compared the obtained

probability (.017) to .05 (placing the full 5% in the single tail) and rejected H0. If we

were using a two-tailed test we would compare the obtained probability (still .017) to

.025 (placing 5%/2 = 2.5% in each tail) and would still reject H0. In this case, therefore,

the results would have been the same in either case.

4.21 To determine whether there is a true relationship between grades and course evaluations I

would find a statistic that reflected the degree of relationship between two variables. (The

students will see such a statistic (r) in Chapter 9.) I would then calculate the sampling

distribution of that statistic in a situation in which there is no relationship between two

variables. Finally, I would calculate the statistic for a representative set of students and

classes and compare my sample value with the sampling distribution of that statistic.

4.22 I would repeat the answer to Exercise 4.21 except that here we are speaking of comparing

means rather than looking at relationships. In other words, I would obtain the sampling

distribution of the difference between two means under the condition that I am sampling

from populations with identical means. I would then calculate the difference between my

two sample means and compare it to that sampling distribution. (The students will see

such a test in Chapter 7, although there we will use the t statistic instead of the difference

between the means.)

4.23 a. You could draw a large sample of boys and a large sample of girls in the class and

calculate the mean allowance for each group. The null hypothesis would be the

hypothesis that the mean allowance, in the population, for boys is the same as for

girls.

b. I would use a two-tailed test because I want to be able to reject the null hypothesis

whether girls receive significantly more or significantly less allowance than boys.

c. I would reject the null hypothesis if the difference between the two sample means

were greater than I could expect to find due to chance. Otherwise I would not reject.

d. The most important thing to do would be to have some outside corroboration for the

amount of allowance reported by the children.

4.24 c. This is an interesting problem. On the one hand they have all of the states, so they

have the parameters and don’t have to estimate them. On the other hand, it would be

interesting to test a general hypothesis about whether there is something about private

ownership that keeps prices up (or down). I just don’t see how you test that here.

Students may struggle with this one.


4.25 In the parking lot example the traditional approach to hypothesis testing would test the null hypothesis that the mean time to leave a space is the same whether someone is waiting or not. If the test were not significant, they would simply fail to reject the null hypothesis, and would do so at a two-tailed level of α = .05. Jones and Tukey, on the other hand, would not consider that the null hypothesis of equal population means could possibly be true. They would focus on making a conclusion about which population mean is higher. A "nonsignificant result" would only mean that they didn't have enough data to draw any conclusion. Jones and Tukey would also be likely to work with a one-tailed α = .025, but actually be making a two-tailed test because they would not have to specify a hypothesized direction of difference.

4.26 Reporting effect sizes would put the results of any study in perspective. It would give the

reader some sense of how large a difference we are speaking about, rather than leaving

him or her with the conclusion that some (possibly trivial) difference is greater than we

would expect by chance.

4.27 Distribution of proportion of those seeking help who are women.

The sampling distribution of proportion of women in the sample.

a. It is quite unlikely that we would have 61% of our sample being women if p = .50. In my particular sampling distribution a score of 61 or higher was obtained 16/1000 = 1.6% of the time.

b. I would repeat the same procedure again except that I would draw from a binomial

distribution where p = .75.
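A sampling distribution like the one described in Exercise 4.27 can be generated with a short simulation. The sketch below assumes a sample size of 100 (so that 61% corresponds to a count of 61) and 1000 replications; neither the function names nor the seed come from the original solution, and a different seed will give a slightly different proportion.

```python
import random
from math import comb


def simulate_tail(n=100, p=0.50, cutoff=61, reps=1000, seed=1):
    """Proportion of simulated samples with at least `cutoff` women."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        women = sum(rng.random() < p for _ in range(n))
        if women >= cutoff:
            hits += 1
    return hits / reps


def exact_tail(n=100, p=0.50, cutoff=61):
    """Exact binomial probability of `cutoff` or more women."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(cutoff, n + 1))


print(simulate_tail())           # empirical proportion, varies with the seed
print(round(exact_tail(), 4))    # exact value under p = .50
```

For part b, the same functions can be called with `p=0.75` to draw from the other binomial distribution.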


Chapter 5 - Basic Concepts of Probability

5.1 a. Analytic: If two tennis players are exactly equally skillful so that the outcome of

their match is random, the probability is .50 that Player A will win the upcoming

match.

b. Relative frequency: If in past matches Player A has beaten Player B on 13 of the 17

occasions on which they played, then Player A has a probability of 13/17 = .76 of

winning their upcoming match, all other things held constant.

c. Subjective: Player A's coach feels that he has a probability of .90 of winning his

upcoming match with Player B.

5.2 a. p(that you will win) = 1/1000 = .001

b. p(that your brother will win) = 2/1000 = .002

c. p(that one or the other of you will win) = .001 + .002 = .003

5.3 a. p(that you will win 2nd prize given that you don't win 1st) = 1/9 = .111

b. p(that he will win 1st and you 2nd) = (2/10)(1/9) = (.20)(.111) = .022

c. p(that you will win 1st and he 2nd) = (1/10)(2/9) = (.10)(.22) = .022

d. p(that you are 1st and he 2nd [= .022]) + p(that he is 1st and you 2nd [= .022])

= p(that you and he will be 1st and 2nd) = .044

5.4 Joint probabilities were involved in Exercise 5.3b and 5.3c and when we combined those

results in 5.3d.

5.5 Conditional probabilities were involved in Exercise 5.3a.

5.6 Joint probabilities: What is the probability that I will be free to go skiing next Wednesday

and that the conditions will be good?

5.7 Conditional probabilities: What is the probability that skiing conditions will be good on

Wednesday, given that they are good today?

5.8 p(that they will look at each other at the same time) = p(that mother looks at baby) *

p(that baby looks at mother) = (2/24)(3/24) = (.083)(.125) = .01

5.9 p(that they will look at each other at the same time during waking hours) = p(that mother

looks at baby during waking hours) * p(that baby looks at mother during waking hours) =

(2/13)(3/13) = (.154)(.231) = .036


5.10 A flier that contains a message asking the person to dispose of it properly has a higher

probability of being found in the trash than we would expect if the message and disposal

were independent events.

5.11 A continuous distribution for which we care about the probability of an observation's

falling within some specified interval is exemplified by the probability that your baby

will be born on its due date.

5.12 The continuous distribution of children's learning abilities is often treated as discrete by

school systems, which divide children into those needing special education versus those

who should attend regular classes. Often schools further divide the regular classes into

different tracks.

5.13 Two examples of discrete variables: Variety of meat served at dinner tonight; Brand of

desktop computer owned.

5.14 p(that any applicant will be admitted) = the ratio of the number admitted to the number applying = 10/300 = .03

5.15 a. 20%, or 60 applicants, will fall at or above the 80th percentile and 10 of these will be

chosen. Therefore p(that an applicant with the highest rating will be admitted) =

10/60 = .167.

b. No one below the 80th percentile will be admitted, therefore p(that an applicant with

the lowest rating will be admitted) = 0/300 = .00.

5.16 Mean ADDSC score = 52.6, s = 12.42 [Calculated from Data Set.]

a. $z = \frac{50 - 52.6}{12.42} = -0.21$

Since a score of 50 is below the mean, and since we are looking for the probability of

a score greater than 50, we want to look in the tables of the normal distribution in the

column labeled "larger portion".

p(larger portion) = .5832

b. 45% of the scores actually exceed 50, while 56% are ≥ 50.

5.17 Mean ADDSC score for boys = 54.29, s = 12.90 [Calculated from Data Set]

a. $z = \frac{50 - 54.3}{12.90} = -0.33$

Since a score of 50 is below the mean, and since we are looking for the probability of

a score greater than 50, we want to look in the tables of the normal distribution in the

column labeled "larger portion".

p(larger portion) = .6293

b. 29/55 = 53% > 50; 32/55 = 58% ≥ 50. (Notice that one percentage refers to the proportion greater than 50, while the other refers to the proportion greater than or equal to 50.)

5.18 p(that person will drop out of school, given that he/she has an ADDSC of at least 60) =

7/25 = .28

5.19 Compare the probability of dropping out of school, ignoring the ADDSC score, with the

conditional probability of dropping out given that ADDSC in elementary school

exceeded some value (e.g., 66).

5.20 p(dropout) = 10/88 = .11; p(dropout|ADDSC ≥ 60) = .28; Students are much more likely to drop out of school if they scored at or above ADDSC = 60 in elementary school.

5.21 Plot of correct choices on trial 1 of a 5-choice task:

p(0) = .1074

p(1) = .2684

p(2) = .3020

p(3) = .2013

p(4) = .0881

p(5) = .0264

p(6) = .0055

p(7) = .0008

p(8) = .0001

p(9) = .0000

p(10) = .0000

5.22 p(6 or more correct) = p(6) + p(7) + p(8) + p(9) + p(10)

= .0055 + .0008 + .0001 + .0000 + .0000

= .0064

Thus if 6 of the 10 were correct I would conclude that they were not operating at chance

(there is some cheating going on!).

5.23 p(5 or more correct) = p(5) + p(6) + p(7) + p(8) + p(9) + p(10)

= .0264 + .0055 + .0008 + .0001 + .0000 + .0000

= .0328 < .05


p(4 or more correct) = p(4) + p(5) + p(6) + p(7) + p(8) + p(9) + p(10)

= .0881 + .0264 + .0055 + .0008 + .0001 + .0000 + .0000

= .1209 > .05

At α = .05, therefore, up to 4 correct choices indicate chance performance, but 5 or more

correct choices would lead me to conclude that they are no longer performing at chance

levels.
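The probabilities in Exercises 5.21 to 5.23 can be regenerated from the binomial formula. The sketch assumes N = 10 trials with p = .20 of a correct choice on each (one correct alternative out of five), which is consistent with the values listed in 5.21; the function names are invented for the example.

```python
from math import comb


def binom_pmf(k, n=10, p=0.20):
    """Probability of exactly k correct choices out of n."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def upper_tail(k, n=10, p=0.20):
    """Probability of k or more correct choices out of n."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))


for k in range(11):
    print(f"p({k}) = {binom_pmf(k):.4f}")

for k in (6, 5, 4):
    print(f"p({k} or more correct) = {upper_tail(k):.4f}")
```

Comparing each tail probability with .05 reproduces the decision rule: 4 or more correct is not unusual under chance, whereas 5 or more is.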

5.24 Probability statements about the treatment of automobile shoppers:

Simple probability: The probability that the salesperson will make a condescending

remark is .15.

Joint probability: The probability that the salesperson will make a condescending

remark and that the customer is a woman is .10.

Conditional prob: The probability that the salesperson will make a condescending

remark given that the customer is a woman is .25.

5.25 If there is no housing discrimination, then a person’s race and whether or not they are

offered a particular unit of housing are independent events. We could calculate the

probability that a particular unit (or a unit in a particular section of the city) will be

offered to anyone in a specific income group. We can also calculate the probability that

the customer is a member of an ethnic minority. We can then calculate the probability of

that person being shown the unit assuming independence and compare that answer

against the actual proportion of times a member of an ethnic minority was offered such a

unit.

5.26 Number of subjects needed in verbal learning experiment if each is to see different

classes of words in a different order:

$P^4_4 = \frac{4!}{(4 - 4)!} = 24$

5.27 Number of subjects needed in Exercise 5.26's verbal learning experiment if each subject

can see only two of the four classes of words:

$P^4_2 = \frac{4!}{(4 - 2)!} = \frac{4!}{2!} = 12$

5.28 Chance that subject will press correctly on first trial when learning to press three out of

five buttons in a certain order:

$P^5_3 = \frac{5!}{(5 - 3)!} = \frac{5!}{2!} = 60$

There are 60 possible orders in which to push 3 out of 5 buttons. The probability that the subject will choose the correct order on the first trial = 1/60 = .017

5.29 The total number of ways of making ice cream cones =

$C^6_6 + C^6_5 + C^6_4 + C^6_3 + C^6_2 + C^6_1 = 1 + 6 + 15 + 20 + 15 + 6 = 63$

[You can't have an ice cream cone without ice cream; exclude $C^6_0$.]

5.30 Different ways to record from the rat's brain:

$C^6_4 = \frac{6!}{4!(6 - 4)!} = \frac{6!}{4!\,2!} = 15$
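The counts in Exercises 5.26 to 5.30 can be verified with Python's built-in permutation and combination functions (`math.perm` and `math.comb`, available from Python 3.8). This is only a check on the arithmetic; the variable names are made up for the illustration.

```python
from math import comb, perm

orders_all_four = perm(4, 4)        # 5.26: 4 word classes, all 4 in order
orders_two_of_four = perm(4, 2)     # 5.27: 2 of the 4 classes, in order
button_orders = perm(5, 3)          # 5.28: 3 of 5 buttons, in order
p_correct_first_trial = 1 / button_orders

# 5.29: cones with 1 to 6 of the 6 flavors (the empty cone is excluded)
ice_cream_cones = sum(comb(6, r) for r in range(1, 7))

recording_sites = comb(6, 4)        # 5.30: 4 of 6, order irrelevant

print(orders_all_four, orders_two_of_four, button_orders)
print(round(p_correct_first_trial, 3), ice_cream_cones, recording_sites)
```

Permutations are used where order matters (5.26 to 5.28) and combinations where it does not (5.29 and 5.30).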

5.31 Knowledge of current events:

If p = .50 of being correct on any one true-false item, and N = 20:

$p(11) = C^{20}_{11}(.5)^{11}(.5)^{9}$

$C^{20}_{11} = \frac{20!}{11!(20 - 11)!} = \frac{20!}{11!\,9!} = 167{,}960$

$p(11) = C^{20}_{11}(.5)^{11}(.5)^{9} = 167{,}960(.00048828)(.00195313) = .16$

Since the probability of 11 correct by chance is .16, the probability of 11 or more correct must be greater than .16. Therefore we cannot reject the hypothesis that p = .50 (student is guessing) at α = .05.

5.32 Probability of 25 blue M&M’s out of 60 draws sampling with replacement.

$p(25) = C^{60}_{25}(.24)^{25}(.76)^{35}$

$= \frac{60!}{25!\,35!}(.24)^{25}(.76)^{35}$

$= (5.191543797 \times 10^{16})(3.200965864 \times 10^{-16})(.000067371)$

$= .0011196$

5.33 Driving test passed by 22 out of 30 drivers when 60% expected to pass:

$z = \frac{22 - 30(.60)}{\sqrt{30(.60)(.40)}} = 1.49$; we cannot reject $H_0$ at α = .05.

5.34 On the theory that practice in almost anything leads to improvement, we give a sample of

first year college students, who will major in the humanities (where there is a lot of

reading assigned), a test for reading speed at the beginning of the fall semester. At the

end of the year we again measure their reading speed. We wish to test the null hypothesis

that reading speed, on average (or for most people) increased over the year.

5.35 Students should come to understand that nature does not have a responsibility to make

things come out even in the end, and that it has a terrible memory of what has happened

in the past. Any "law of averages" refers to the results of a long-term series of events, and

it describes what we would expect to see. It does not have any self-correcting mechanism

built into it.

5.36 Probability of breast cancer

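$p(BC) = .01$

$p(+ \mid BC) = .80$

$p(+ \mid \overline{BC}) = .096$

$p(H \mid D) = \frac{p(D \mid H)\,p(H)}{p(D \mid H)\,p(H) + p(D \mid \bar{H})\,p(\bar{H})}$

$= \frac{(.80)(.01)}{(.80)(.01) + (.096)(.99)}$

$= \frac{.008}{.008 + .095} = \frac{.008}{.103} = .078$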

5.37 It is low because the probability of breast cancer is itself very low. But don’t be too

discouraged. Having collected some data (a positive mammography) the probability is

7.8 times higher than it would otherwise have been. (And if you are a woman, please

don’t stop having mammographies.)

5.38 Reducing the rate of false positives

Here we can use the same calculations, but just change .096 to .05.

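$p(BC) = .01$

$p(+ \mid BC) = .80$

$p(+ \mid \overline{BC}) = .05$

$p(H \mid D) = \frac{p(D \mid H)\,p(H)}{p(D \mid H)\,p(H) + p(D \mid \bar{H})\,p(\bar{H})}$

$= \frac{(.80)(.01)}{(.80)(.01) + (.05)(.99)}$

$= \frac{.008}{.008 + .0495} = \frac{.008}{.0575} = .139$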

The probability has nearly doubled when we nearly halved our false positive rate.
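The Bayes' theorem calculations in Exercises 5.36 and 5.38 differ only in the false positive rate, so they can be sketched with one small function. The function and parameter names below are illustrative and not taken from the text.

```python
def posterior(prior, hit_rate, false_positive_rate):
    """p(H | D) by Bayes' theorem for a positive test result."""
    true_pos = hit_rate * prior                     # p(D | H) p(H)
    false_pos = false_positive_rate * (1 - prior)   # p(D | not H) p(not H)
    return true_pos / (true_pos + false_pos)


p_536 = posterior(prior=0.01, hit_rate=0.80, false_positive_rate=0.096)
p_538 = posterior(prior=0.01, hit_rate=0.80, false_positive_rate=0.05)

print(round(p_536, 3))           # Exercise 5.36
print(round(p_538, 3))           # Exercise 5.38
print(round(p_538 / p_536, 2))   # how much the posterior grew
```

Writing the denominator as the sum of true and false positives makes it easy to see why lowering the false positive rate raises the posterior probability.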
