
ORGANIZATIONAL BEHAVIOR AND HUMAN DECISION PROCESSES 41, 259-279 (1988)

Heuristics and Biases in Diagnostic Reasoning: I. Priors, Error Costs, and Test Accuracy

JONATHAN BARON AND JOHN C. HERSHEY

University of Pennsylvania

In three experiments, undergraduate subjects were asked to evaluate or choose between hypothetical medical tests. Subjects were told the subjective prior probability of a hypothetical disease, the hit rate and false-alarm rate of each test, and the relative subjective cost of the two possible errors that might be made. By varying priors, cost, and test accuracy, we could measure the influence of each parameter on subjects' responses. Subjects overweighed costs relative to both priors and test accuracy. In single-test cases in which the choice was whether to test or do something else (treat or withhold treatment), priors were not systematically misweighed relative to accuracy. When two tests were compared, priors were underweighed relative to accuracy. Justifications agreed with the conclusions reached by analysis of the preferences. When evaluating a test, subjects do not seem to understand that high priors make hit rates more relevant, while low priors make false-alarm rates more relevant. Subjects do, however, understand that a large cost of not treating diseased patients makes hit rates more relevant, while a large cost for treating nondiseased patients makes false-alarm rates more relevant. The overweighing of costs seems to result from the use of a heuristic in which the subject tends to minimize the probability of the worst kind of error, regardless of other parameters. © 1988 Academic Press, Inc.

Often we must decide whether a piece of evidence is worth collecting as a basis for action. Should we test a child to determine placement in a special class, or an employee to determine whether he/she uses drugs? Should we question a witness in a crime? In the laboratory, should we do a pilot study before investing resources in a larger one? Should we pay for an inspection before purchasing a house? Should we seek a second opinion before consenting to surgery? Should we have amniocentesis or a mammogram? If we are a physician, should we recommend one?

At issue in these cases is whether a given test is worth doing, or which

Hershey is in the Department of Decision Science, The Wharton School. Baron is in the Psychology Department. Both are senior fellows in the Leonard Davis Institute of Health Economics. This work was supported by NIMH Grant MH37241 (J.B.) and NSF Grants SES82-18565 (J.C.H.) and SES85-09807 (J.B. & J.C.H.). We thank Randall Cebul, Jane Beattie, and Robert Sternberg for helpful comments. Address correspondence, including reprint requests, to J. Baron, Psychology Department, University of Pennsylvania, 3815 Walnut St., Philadelphia, PA 19104-6196.

259 0749-5978/88 $3.00 Copyright © 1988 by Academic Press, Inc. All rights of reproduction in any form reserved.

260 BARON AND HERSHEY

of several tests is best to do. Five general factors that are relevant to evaluation of a test are (1) the prior probability (without the test result) that we would be right to take the action at issue; (2) the probability that the test would tell us we were right, if we were, the hit rate; (3) the probability that it would tell us we were right if we were not, the false-alarm rate; (4) the cost of a false negative, the cost of neglecting to act when we should; and (5) the cost of a false positive, the harm from acting when we should not. Factors 2 and 3 concern the accuracy of the test. Factors 4 and 5 concern the costs of errors.

In this paper, we ask whether subjects without special training in decision making weigh these factors appropriately when evaluating a single hypothetical test or when comparing two tests. Knowledge of the methods used by these subjects will suggest where training may be required and where errors may be made in daily life and in various professions. We use medical tests as our examples because the parameters can be described in a way that is realistic and believable to the subjects. We do not here address the question of whether our results generalize to other situations or to subjects with special training or experience. However, we have no reason to believe that our results are peculiar to medical decisions.

The tests considered are assumed to have two outcomes, positive or negative. Subjects were told that the test outcome will affect action, that the cost and risk of the testing procedure itself is negligible, and that only one test can be done. For each case, the subject was given the following information:

P The prior probability that a patient has a particular disease, before testing;

Hi Hit rate of test i, that is, the probability of a positive result given that the disease is present (for each test, when two are considered);

Fi False-alarm rate of test i, the probability of a positive result given that the disease is absent (for each test, when two are considered);

Neglect The net subjective cost of a false negative, failing to treat a patient who has the disease (a positive number);

Harm The net subjective cost of a false positive, treating a patient who does not have the disease (a positive number).

The Normative Model

We assume that the normative value of a course of action (testing, treating, or withholding treatment) is determined by its expected cost (or disutility; Baron, 1985, chap. 4, defends this assumption). In all cases, we assume that the cost of the correct course of action is 0, so that we need consider only the probability and cost of each type of error: a false negative occurs with probability P(1 - Hi), and its cost is what we call "neglect." A false positive occurs with probability (1 - P)Fi, and its cost is termed "harm." To compare two courses of action, one can simply consider the difference in the expected cost of the two actions.

When the choice given to the subject is between testing (with a single test) and treating, the difference in expected cost is

E(Test - Treat) = [P(1 - H) · Neglect + (1 - P)F · Harm] - [(1 - P) · Harm]
                = P(1 - H) · Neglect - (1 - P)(1 - F) · Harm.   (1)

If this difference in expectation is positive, testing has a higher cost, so treating is preferred. As an example, for one of the questions given to our subjects (Case 1, Table 1), Neglect = Harm = 1, P = .67, H = .67, and F = .33. E(Test - Treat) = .67(1 - .67) - .33(1 - .33) = 0; the decision is a toss-up.

In this case, and in those to follow, the final expression for E(·) has two terms. The first refers to diseased patients, the second to nondiseased patients. Here, the first term is the probability of neglecting diseased patients because of a false negative times the cost of that outcome (Neglect). The second term represents the probability of treating nondiseased patients times the cost (Harm). The first term is positive because testing can lead to an increase in false negatives. The second term is negative because testing can prevent false positives. By examining Eq. (1), we can confirm that testing becomes less useful (has a higher expected cost) when we are more certain that the patient is diseased, when the hit rate is lower or the false-alarm rate is higher, when neglect is higher, or when harm is lower.
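The Case 1 arithmetic can be checked mechanically. The sketch below is ours, not the paper's; the function name is hypothetical, and Python is used only for illustration of Eq. (1).

```python
# Expected-cost difference of Eq. (1): testing vs. treating outright.
# Positive values mean testing has the higher expected cost, so
# treating is preferred; zero means the decision is a toss-up.

def e_test_minus_treat(P, H, F, neglect=1.0, harm=1.0):
    """E(Test - Treat) = P(1 - H) * Neglect - (1 - P)(1 - F) * Harm."""
    return P * (1 - H) * neglect - (1 - P) * (1 - F) * harm

# Case 1, Table 1: Neglect = Harm = 1, P = .67, H = .67, F = .33.
diff = e_test_minus_treat(P=0.67, H=0.67, F=0.33)
assert abs(diff) < 1e-9  # a toss-up, as stated in the text

# Raising the prior (more certainty that the patient is diseased)
# makes testing less attractive, as the text notes.
assert e_test_minus_treat(P=0.90, H=0.67, F=0.33) > 0
```

The higher-prior check illustrates the verbal claim following Eq. (1): with everything else fixed, a larger P increases the (positive) false-negative term and shrinks the (negative) false-positive term.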

When the choice is between testing (with a single test) and withholding treatment, the cost difference is

E(Test - Withhold) = -P · H · Neglect + (1 - P)F · Harm.   (2)

If this is positive, withholding is preferred. Here, the first term is negative because testing reduces the probability of false negatives, and the second term is positive because testing increases the probability of false positives. As in Eq. (1), testing becomes less useful for lower H or higher F. But testing is more useful than withholding treatment for higher P, higher neglect, or lower harm.

When the choice is between two tests, the cost difference is

E(Test 1 - Test 2) = -P(H1 - H2) · Neglect + (1 - P)(F1 - F2) · Harm.   (3)


If this is negative, Test 1 is preferred; if positive, Test 2. For example, for Case 8, Table 1, E(·) = -.50(.84 - .76) · 3 + .50(.32 - .08) · 1 = 0; although the difference in the first parentheses is smaller than that in the second, the importance of this difference is greater, and the overall choice is a toss-up. In Expression (3), the probability part of each term represents the difference in the probability of treating for the two different tests. For example, if the proportion of patients who have the disease is P, then H1 of these will be detected by Test 1 and H2 by Test 2, so the difference in probability of detection (of diseased patients) will be P(H1 - H2). It is apparent from Expressions (1)-(3) that absolute values of Neglect and Harm are normatively irrelevant for determining which action is best; their ratio is sufficient. Moreover, multiplication of Harm and Neglect by the same constant will multiply the effects of all other relevant variables (H, F, 1 - H, etc.) by the same constant.
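Equations (2) and (3) can be sketched the same way. The example below is again ours (hypothetical function names); it reproduces the Case 8 toss-up and the point just made that scaling both costs by a constant leaves the preferred action unchanged.

```python
def e_test_minus_withhold(P, H, F, neglect=1.0, harm=1.0):
    """Eq. (2): positive values favor withholding treatment."""
    return -P * H * neglect + (1 - P) * F * harm

def e_test1_minus_test2(P, H1, H2, F1, F2, neglect=1.0, harm=1.0):
    """Eq. (3): negative values favor Test 1, positive favor Test 2."""
    return -P * (H1 - H2) * neglect + (1 - P) * (F1 - F2) * harm

# Case 3, Table 1 (Set 2): P = .33, H = .67, F = .33 -- a toss-up.
w = e_test_minus_withhold(0.33, 0.67, 0.33)
assert abs(w) < 1e-9

# Case 8, Table 1: Neglect = 3, Harm = 1, P = .50 -- also a toss-up.
d = e_test1_minus_test2(0.50, 0.84, 0.76, 0.32, 0.08, neglect=3, harm=1)
assert abs(d) < 1e-9

# Multiplying both costs by the same constant scales the whole
# expression, so its sign -- and hence the better test -- is unchanged.
# (P = .30 here is an illustrative value, not one of the paper's cases.)
d1 = e_test1_minus_test2(0.30, 0.92, 0.68, 0.24, 0.16, neglect=1, harm=1)
d10 = e_test1_minus_test2(0.30, 0.92, 0.68, 0.24, 0.16, neglect=10, harm=10)
assert abs(d10 - 10 * d1) < 1e-9
```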

In Experiments 1 and 3, we set the two terms equal within Expressions (1)-(3). Therefore, on normative grounds, one should be indifferent. We create different cases by balancing the three parts of each expression against each other. For example, for Case 2 in Table 1, compared to Case 1, the increased prior would make treating more valuable, but this is compensated for by the increased accuracy, which favors testing. By manipulating factors in this way, we can learn what factors subjects are sensitive to in their judgments. Thus, a subject who favors testing more in Case 2 than in Case 1 would be said to be attending more to accuracy than to priors.

In Experiment 2, one option is designed to be preferable to the other for most cases. Nevertheless, the levels of the three factors are varied so that we can determine the relative weights that subjects give to them.

Questions at Issue

The effects of priors, costs, and accuracy on test evaluation have, to our knowledge, not been examined. However, previous findings from other tasks raise questions for ours. Priors tend to be neglected when subjects are asked to predict posterior probabilities (Bar-Hillel, 1980; Kahneman & Tversky, 1972) except when the causal significance of the priors is clear (Ajzen, 1977; Kahneman & Tversky, 1980). We attempt to make the causal significance clear by presenting the prior as a subjective probability resulting from a physical examination, rather than a proportion of some population of which the patient is a member. This form of presentation may be typical of real diagnostic situations, where a subjective prior is always possible but population statistics may be hard to obtain.

In the two-test comparisons, the effect of priors on the decision is indirect, as seen in Eq. (3). Priors affect the choice only through their effect on the relative significance of the difference between the tests in hit rates and the difference in false-alarm rates. In other words, the cues interact. The literature on multicue judgment (e.g., Camerer, 1981; Klayman, 1984) suggests that people have difficulty learning to use interactions, so, by loose analogy, we might expect subjects to be insensitive to this indirect effect.

In Eq. (3) the effect of costs is exactly analogous to the effect of priors in two-test comparisons. When Neglect is high relative to Harm, we should be more interested in the hit rates of the two tests, and when Harm is high, we should be most interested in false-alarm rates. We might expect costs to be neglected for the same reasons we would expect priors to be.

On the other hand, Bell (1982) and Loomes and Sugden (1982) have explained many of the findings of Kahneman and Tversky (1979) by suggesting that people try to avoid anticipated regret. For example, subjects may pay inordinate attention to negative outcomes that would not occur if a different choice were made. In this case, they might take more pains to avoid the worst outcome than the normative model specifies. (Feinstein, 1984, refers to a similar effect in medical practice, which he calls the avoidance of "chagrin.") For example, when Harm is greater than Neglect, subjects might tend to rely more heavily on false-alarm rates than they should.

EXPERIMENT 1

This experiment examines test preference in two-test cases in which priors, costs, and test accuracy were separately manipulated. Normatively, the tests were identical in cost. If subjects are oversensitive to some factor, they will prefer the test favored by that factor. In addition to these two-test cases, we inserted some single-test cases in which we varied priors and accuracy only.

Method

Subjects were undergraduate and graduate students solicited by placing a sign on a major walkway on the campus of the University of Pennsylvania. They filled out a questionnaire in a quiet room set aside for this purpose, often while other subjects were also working on questionnaires. All questions of clarification were answered by the experimenter. Most subjects had no such questions and no apparent difficulty.

Table 1 shows the parameter values that subjects were given. For the first two sets, priors are pitted against accuracy. In Set 1, the decision is whether to test or treat, and neglect of priors would favor testing in Case 2 more than in Case 1. In Set 2, the decision is whether to test or withhold treatment. Neglect of priors favors testing in Case 4 more than in Case 3.

TABLE 1
PARAMETER VALUES FOR EXPERIMENT 1

Set 1. Test vs Treat: E(Test - Treat) = P(1 - H) - (1 - P)(1 - F)

Case    P     H     1-P    F
1.     .67   .67    .33   .33
2.     .80   .80    .20   .20

Set 2. Test vs Withhold: E(Test - Withhold) = -P · H + (1 - P)F

Case    P     H     1-P    F
3.     .33   .67    .67   .33
4.     .20   .80    .80   .20

Set 3. Test 1 vs Test 2: E(Test 1 - Test 2) = -Neglect · P(H1 - H2) + Harm · (1 - P)(F1 - F2)

Case   Neglect    P     H1    H2   Harm   1-P    F1    F2
5.        1      .75   .84   .76    1     .25   .32   .08
6.        1      .50   .88   .72    1     .50   .28   .12
7.        1      .25   .92   .68    1     .75   .24   .16
8.        3      .50   .84   .76    1     .50   .32   .08
9.        1      .50   .92   .68    3     .50   .24   .16
10.       3      .25   .88   .72    1     .75   .28   .12
11.       1      .75   .88   .72    3     .25   .28   .12

Note. Costs are relative; only the ratio Neglect/Harm (as opposed to the values of Neglect and Harm) is normatively necessary.

In Set 3, subjects compared two tests. In all cases, Test 1 had both a higher hit rate and a higher false-alarm rate than Test 2. Case 6 was a baseline case where Neglect = Harm, P = 1 - P, and H1 - H2 = F1 - F2. Cases 5 and 7 pitted priors against accuracy. In Case 5, the difference between hit rates is .08, and the difference between false-alarm rates is .24. This difference by itself would favor Test 2, except that disease is three times as likely as no disease, and this balances the difference in accuracy between the two tests. Case 7 is the reverse. A subject who ignored priors would favor Test 2 in Case 5 but Test 1 in Case 7.

In Cases 8 and 9, cost is pitted against accuracy. Costs play a role here that is analogous to priors in Cases 5 and 7. A subject who underweighed cost relative to accuracy would be inclined toward Test 2 in Case 8 and Test 1 in Case 9. Cases 10 and 11 pit cost against priors. A subject who underweighed priors relative to cost would be inclined toward Test 1 in Case 10 and Test 2 in Case 11. Subjects were given the following instructions:

In the following problems you will be asked to make decisions about diagnostic testing, as if you were a physician. You do not need to know anything about medicine, however. We will tell you everything you need to know.


Specifically, we will tell you the likelihood of a disease on the basis of an examination alone, before any tests are done. You may, if you like, imagine that it is a kind of flu that will last for a few weeks if not treated. You may imagine that the treatment is some sort of antibiotic that doesn't always work and that has some chance of causing an allergic reaction. Thus, your decision whether to treat or not will depend on a weighing of the risks and benefits for a particular combination of disease and treatment.

We will also tell you, in each case, how you are to weigh these risks and benefits. We will tell you how to compare the expected damage from the two mistakes you might make: (1) failing to treat a patient who has the disease; (2) treating a patient who does not have the disease. These mistakes lead to different kinds of consequences, of course. The first mistake prevents the patient from returning to work as early as possible, etc.; the second exposes the patient to unnecessary risk or reactions, etc. However, these consequences must be weighed against one another if a decision is to be made. This comparison will change from case to case.¹

Finally, we will give you some information about each test. We will tell you what proportion of people who have the disease will show a positive test result, and what proportion of people who do not have the disease will show a (false) positive test result. You may think of the tests as inexpensive and relatively free of risk: things like blood tests and throat cultures.

If you decide to do a test, you will treat the patient if the test is positive (abnormal) and you will withhold treatment if the test is negative (normal).

We are interested in your ability to weigh the various factors correctly, not in your ability to devise formulas and make calculations. Thus, do not attempt the latter, even if you think you know how.

There were two versions of the questionnaire. In Version 1, each case was presented as follows, using Case 1 as an example:

Before you do any tests, the probability of the disease is .67.

For this case, it is equally bad to treat when the disease is absent as it is to fail to treat when the disease is present. This means that you should treat the disease if you think that the probability of the disease is greater than .50, in the absence of further testing, and you should not treat if the probability is less than .50.

There is one test you can give.

Probability of a positive result in patients who have the disease          .67
Probability of a positive result in patients who do not have the disease   .33

¹ These instructions, which emphasized the overall consequences of the two mistakes, should have prevented the subjects from assuming that the risk of reactions, etc., reduced the expected benefit of treatment. No subject appeared to make such an assumption. If such an assumption were made, the effective value of Neglect would be reduced by the value of Harm, leading to a bias against treatment, which was not found. It can also be shown that addition or subtraction of a constant to Neglect (or Harm) does not affect any predictions concerning the comparisons we make.


The question is, should you treat the disease without testing, or should you give the test, treating if the result is positive and not treating if it is negative? Give your answer in the form of a rating on the following scale:

 3  Treating is highly preferable
 2  Treating is moderately preferable
 1  Treating is slightly preferable
 0  It doesn't matter, neither course of action is preferable to the other
-1  Testing first is slightly preferable
-2  Testing first is moderately preferable
-3  Testing first is highly preferable

Write your answer here, and indicate briefly the reasons for your choice.

Note that in this question the costs were presented in two ways, directly (i.e., "equally bad" or "three times as bad") and as a threshold probability. The threshold probability should help subjects understand the implications of the costs for action.
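The threshold probability follows directly from the normative model: treating outright beats withholding exactly when P · Neglect exceeds (1 - P) · Harm, i.e., when P exceeds Harm/(Neglect + Harm). A small sketch of this relation (ours; the helper name is hypothetical):

```python
def treatment_threshold(neglect, harm):
    """Prior probability above which treating beats withholding:
    treat when P * Neglect > (1 - P) * Harm,
    i.e., when P > Harm / (Neglect + Harm)."""
    return harm / (neglect + harm)

# Equal costs, as in Case 1: the .50 threshold quoted to subjects.
assert treatment_threshold(neglect=1, harm=1) == 0.5

# If failing to treat is three times as bad as treating needlessly,
# the threshold drops to .25.
assert treatment_threshold(neglect=3, harm=1) == 0.25
```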

Version 2 was designed after the results from Version 1 were available, with the purpose of making information about priors and costs equally salient. The ordering of information about priors and costs was reversed for half the subjects (with appropriate changes in initial instructions). In addition, both types of information were presented in the same way, as ratios. For example, Case 1 with the information about priors first began as follows:

For this case, you believe it is equally bad to treat when the disease is absent as it is to fail to treat when the disease is present.

Before you do any tests, you believe it is twice as likely that the patient has the disease as not.

For Cases 5-11, the two-test comparisons, the probability and cost information was presented as in Cases 1-4. The rest of the information was presented as follows, using Case 5 as an example:

There are two tests you can give, Test 1 and Test 2.

                                                                         Test 1   Test 2
Probability of a positive result in patients who have the disease          .84      .76
Probability of a positive result in patients who do not have the disease   .32      .08

Subjects indicated their preference for Test 1 vs Test 2 on a scale ranging from 3 (Test 1 highly preferable) to -3 (Test 2 highly preferable), with 0 representing indifference. Justifications were also requested. (Results of unreported studies without justifications were substantively identical.)

The order of two-test cases was chosen so as to maximize dissimilarity between adjacent cases, while ensuring that Case 6 was always in the middle. Single-test cases were interleaved with two-test cases, except for the first and last pair of two-test cases. Twenty-four subjects did Version 1. Twenty subjects did Version 2, with 10 in each order of information about priors and costs. Half of the subjects in each version (and half in each information order in Version 2) did the cases in the order 5, 10, 1, 9, 2, 6, 3, 7, 4, 11, 8; and half did the cases in the reverse order.

Results

Ratings

Table 2 shows the mean ratings for each case, for the two versions separately.

For the single-test cases, we can compare the weighing of accuracy ("A" in Table 2) and priors ("P"). On the average, priors and accuracy were weighed about the same, as the normative model prescribes. However, only 32% of the comparisons indicated equal attention to priors and

TABLE 2
MEAN RATINGS (ON A SCALE FROM 3 TO -3) FROM EXPERIMENT 1

                                              Version 1          Version 2
Case                                          Mean     t         Mean     t

Set 1. Test vs Treat (High ratings favor Treat)
1. Lower P, lower A                          -0.29   -0.89      -0.45   -1.34
2. Higher P, higher A                         0.25    0.55       0.85    2.16*
2-1 Difference (P over A)                     0.54    1.37       1.30    2.73**

Set 2. Test vs Withhold (High ratings favor Withhold)
3. Higher P, lower A                          0.46    1.55      -0.10   -0.33
4. Lower P, higher A                         -0.46   -1.52       0.10    0.23
3-4 Difference (P over A)                     0.92    2.47**    -0.20   -0.59

Set 3. Comparison of two tests (High ratings favor Test 1)
5. High P, low A for Test 1                  -0.88   -2.29*     -1.10   -2.34*
6. -                                          0.17    0.59      -0.35   -1.16
7. Low P, high A for Test 1                   1.13    2.94**     0.55    1.24
7-5 Difference (A over P)                     2.01    3.15**     1.65    2.34*
8. High Neglect, low A, Test 1                0.33    0.87       0.85    1.76
9. Low Neglect, high A, Test 1               -0.25   -0.64      -1.65   -4.20**
8-9 Difference (C over A)                     0.58    1.00       2.50    4.47**
10. High Neglect, low P, Test 1               1.17    3.62**     1.45    3.18**
11. Low Neglect, high P, Test 1              -1.21   -3.50**    -2.00   -9.75**
10-11 Difference (C over P)                   2.38    5.06**     3.45    6.06**

Note. Significance levels are based on a two-tailed t test; *, .05; **, .01.

268 BARONANDHERSHEY

accuracy (equal ratings for the two paired cases), so it cannot be assumed that subjects weighed both factors correctly in each case. Sometimes priors were overweighed, sometimes underweighed.²

For the two-test comparisons in Set 3, note first that there was no overall bias toward either test; the mean rating for each version was close to (and not significantly different from) zero. For Case 6, 13 of the 24 subjects in Version 1 and 11 of the 20 in Version 2 gave 0 (the correct response for all cases) as the response. The mean ratings of the remaining cases in Set 3 were .04 and -.32 (t = .38 and -1.50) for Versions 1 and 2, respectively, although 0 was given as a response only 9 times in Version 1 (out of 144 responses) and 6 times in Version 2 (out of 120).

For Cases 5 and 7, the results for both versions indicate that subjects favor the test with higher accuracy, regardless of the priors: Test 1 in Case 7, Test 2 in Case 5 (see Table 2). (When responses involving possible misunderstanding were eliminated, this difference between Cases 5 and 7 was still significant when both versions were combined; t(27) = 2.24, p < .05. Responses were eliminated if the justifications involved explicit calculations or misstatements about the data provided, if they were uninterpretable, if they took conditional probabilities to be posteriors, or if they gave a reason that involved a false inference. In sum, we used a conservative criterion, which left only those responses that referred to priors, costs, or test parameters in an appropriate way.)

For Cases 8 and 9, subjects overweighed cost relative to accuracy, favoring Test 1 when Neglect was greater than Harm. This result was significant for Version 2 but not for Version 1 (and these results stood when questionable responses were omitted: t(16) = -4.97, p < .001, and t(12) = -1.74, n.s., respectively). The apparent difference between Versions 1 and 2 is surprising, because, if anything, Version 2 would seem to have made cost less salient by removing the information about thresholds. (Other studies, unreported so far, have replicated the overweighing of costs relative to accuracy in a format like Version 2, but without justifications, and in a format in which costs were specified in terms of probabilities of death. We thus take the failure to find this result in Version 1 to be, most likely, a fluke.)

² Priors were weighed less when the choice was between testing and withholding treatment (Set 2) than when it was between testing and treatment (Set 1) (t(23) = -2.61, p < .02, for Version 1; t(19) = -2.24, p < .05, for Version 2). These effects of Set were not replicated in subsequent experiments, and may therefore be seen as flukes. In Version 1, priors were underweighed in Set 2 and were weighed correctly in Set 1. In Version 2, priors were weighed correctly in Set 2 and were overweighed in Set 1. Apparently, the wording of Version 2 drew more attention to priors. This curious result does not affect the interpretation of other findings. There were no consistent individual differences in attention to priors, as assessed by consistency across cases in the tendency to over- or underweigh.


Cases 10 and 11 test the relative salience of priors and costs directly by pitting them against each other. Both versions show substantial overweighing of costs (with questionable responses omitted, t(15) = -6.46, p < .001, and t(15) = -13.2, p < .001, for Versions 1 and 2, respectively). This result cannot be ascribed to the way in which cost information is presented, because it occurs in both versions. Overall, then, Set 3 indicates that priors are underweighed relative to test accuracy and costs, and costs are overweighed relative to priors and (at least in one version) test accuracy. (Binomial tests across subjects agreed with t tests on the significance of each result.)

The neglect of priors in the two-test cases cannot be explained in terms of insensitivity to interactions. High cost for diseased patients, like high priors, favors Test 1 only because it makes the higher hit rates of Test 1 more relevant. In sum, although priors and costs both have indirect effects (normatively) when two tests are compared, priors are underweighed but costs are not.

Justifications

We classified the justifications into the categories shown in Table 3. Justifications in terms of hit rates or false-alarm rates were combined into a single category called "test accuracy," since we were mainly interested in the relative attention paid to these factors as compared to others. In Sets 1 and 2, mention of priors was scored if the subject mentioned priors in any way. Typically, this involved mentioning the direct effect of priors on the decision to treat (Set 1) or not treat (Set 2). In Set 3, mention of priors or costs was scored only when the subject indicated the relevance of the priors or costs to the appropriate test parameters.

For Sets 1 and 2, costs (which never favored one choice or the other) were never mentioned. On the whole, across both versions, priors and accuracy were mentioned about equally often. One justification coded as "other" in Table 3 is what might be called an "error strategy," wherein the subject compared the errors resulting from the two options. For example, one subject for Case 1 said, "Treating everybody or first testing will both result in about a third mistreatment." Overall, this strategy seemed to be effective, being associated with a "0" response 20 times and with other responses 13 times. (In 5 of the 33 cases, subjects spoke of correct diagnoses rather than errors, and all of these were associated with "0.") For other justifications, "0" was given 21 times and other responses 122 times.

In Set 3, the relative frequency of the three main justifications (cost, priors, and accuracy) corresponds to what would be inferred from the numerical ratings. Specifically, accuracy is most frequently given as a

270 BARON AND HERSHEY

TABLE 3
SUBJECTS' JUSTIFICATIONS IN EXPERIMENT 1

                                   Costs   Priors   Accuracy   Costs and
                                   only    only     only       priors      Other

Set 1. Test vs Treat
1. Lower P, lower A                        1,4      5,4                    18,12
2. Higher P, higher A                      5,8      2,0                    17,12

Set 2. Test vs Withhold
3. Higher P, lower A                       1,4      6,2                    17,14
4. Lower P, higher A                       1,3      3,0                    20,17

Set 3. Test 1 vs Test 2
5. High P, low A, Test 1           0,1     4,5      13,8       0,0         7,6
6. -                               0,0     0,0      18,16      0,0         6,4
7. Low P, high A, Test 1           0,0     4,3      10,10      0,0         10,7
8. High Neglect, low A, Test 1     9,10    0,0      8,7        0,0         7,3
9. Low Neglect, high A, Test 1     9,14    0,0      5,3        1,0         9,3
10. High Neglect, low P, Test 1    12,12   1,0      3,0        2,2         6,6
11. Low Neglect, high P, Test 1    13,12   1,0      4,1        0,1         6,6

Note. The entries indicate the number of subjects in each category for Version 1 and Version 2.

justification in Cases 5 and 7, cost in Cases 8 and 9, and cost in Cases 10 and 11.

Although the ordering is perfect, there was some use of priors in Cases 5 and 7, and some use of accuracy in Cases 8 and 9. Apparently, the pattern of biases we have found is not universal. In fact, a few justifications (indicated in Table 3 as referring to "Costs and Priors") could be interpreted as using a heuristic corresponding to the normative model itself. Examples are as follows:

Case 9. “Although Test 2 is more accurate in telling who doesn’t have the disease, Test 1 is much better in saying who does. There are probably more than three times as many people correctly diagnosed in Test 1, which makes up for the fact that it is three times as bad to treat for an absent disease.”

Case 11. "I would choose Test 2 because it is three times as bad to treat when the disease is absent than it is to fail to treat when the disease is present and this test gives a lower probability of misdiagnosis of people without the disease. However, it is three times as likely that people will have the disease and this test is not as accurate in detecting these. Therefore, it is [only] moderately preferable."

This heuristic was often, but not always, associated with (normatively correct) “0” responses. Whether training in such a heuristic can reduce


departures from the normative model is a matter for subsequent research. What is clear is that apparently appropriate heuristics are used spontaneously by many subjects, and are thus very likely within the reach of others.

EXPERIMENT 2

In this experiment, subjects evaluate single tests in which costs are manipulated as well as priors and accuracy. In most of the cases, one option is normatively preferable to the other. We also avoid cases in which the priors are identical to the hit rate or the false-alarm rate. In Experiment 1, such equalities might have encouraged the use of special strategies based on comparison of these numbers. (A few subjects did refer to such equalities.) Finally, Neglect and Harm are provided on a scale roughly comparable in range to the percentage scale used for probabilities.

Method

The instructions were similar to those used in Experiment 1. However, a different format was used for presenting the information about each case. Case 1 serves as an example. All probabilities are in percentages:

PROBABILITY OF DISEASE: 68
PROBABILITY OF NO DISEASE: 32
PROBABILITY OF POSITIVE RESULT GIVEN DISEASE: 72
PROBABILITY OF NEGATIVE RESULT GIVEN DISEASE: 28
PROBABILITY OF POSITIVE RESULT GIVEN NO DISEASE: 16
PROBABILITY OF NEGATIVE RESULT GIVEN NO DISEASE: 84
SERIOUSNESS OF FAILING TO TREAT WHEN DISEASE IS PRESENT: 80
SERIOUSNESS OF TREATING WHEN DISEASE IS ABSENT: 80

Subjects gave their response on the same 7-point scale as that used in Experiment 1, followed by a written justification. Subjects were provided with a sheet containing the definition of each of these parameters. For example: "PROBABILITY OF DISEASE: This is based on everything you know about the patient except the result of the test (if you decide to do it). SERIOUSNESS OF FAILING TO TREAT WHEN DISEASE IS PRESENT: This indicates how bad a mistake it is to make this error. This number is supposed to represent the patient's own assessment of seriousness, on a scale where 0 indicates that the mistake would not be bad at all. The same scale is used throughout all the cases you will see."

The cases used are shown in Table 4. Eleven subjects, solicited as in Experiment 1, were given the cases in the order 1, 7, 5, 2, 3, 6, 4, 8, 14, 12, 9, 10, 13, 11, and eleven (one of whom was disregarded for giving


TABLE 4
PARAMETER VALUES AND MEAN RATINGS (BASED ON THE FIRST HALF OF EACH SUBJECT'S DATA ONLY) FOR EXPERIMENT 2

Case (normative)   Neglect    P     H    Harm   1-P    F    Mean rating (t)

Set 1. Test vs Treat: E(Test - Treat) = Neglect · P · (1-H) - Harm · (1-P) · (1-F)
1. (-6.3)            80      .68   .72    80    .32   .16   -1.00 (-1.80)
2. (0.0)             80      .75   .72    80    .25   .16   -1.09 (-2.21)
3. (6.3)             80      .82   .72    80    .18   .16    1.09 (2.39)
4. (-6.4)            80      .75   .79    80    .25   .05   -1.45 (-4.66)
5. (6.6)             80      .75   .68    80    .25   .37    0.18 (0.35)
6. (-6.3)            60      .75   .72    90    .25   .16   -2.18 (-9.64)
7. (6.3)             90      .75   .72    60    .25   .16    1.55 (3.56)

Set 2. Test vs Withhold: E(Test - Withhold) = -Neglect · P · H + Harm · (1-P) · F
8. (-6.3)            80      .32   .84    80    .68   .28   -2.00 (-9.49)
9. (0.0)             80      .25   .84    80    .75   .28   -0.30 (-0.51)
10. (6.3)            80      .18   .84    80    .82   .28   -0.60 (-1.20)
11. (-6.4)           80      .25   .95    80    .75   .21   -1.70 (-3.60)
12. (6.6)            80      .25   .63    80    .75   .32    0.20 (0.43)
13. (-6.3)           90      .25   .84    60    .75   .28   -1.80 (-4.63)
14. (6.3)            60      .25   .84    90    .75   .28    0.40 (1.00)

Note. The normative value is given in parentheses after each case number.

answers of only -3) in the reverse order. Cases involving testing versus treatment (1-7) were blocked together, as were those involving testing versus withholding (8-14). Otherwise, adjacent cases were dissimilar.

Set 1, the first 7 questions, uses high priors, and the choice is between testing and treating. Set 2, Cases 8-14, uses low priors, and the choice is between testing and doing nothing. Otherwise, these cases are analogous to Cases 1-7, respectively.

The cost difference for the two options in each case is zero only for Cases 2 and 9. Cases 1 and 3 (and 8 and 10) manipulate priors so that it is better to treat (withhold) in Case 3 (Case 10) and test in Case 1 (Case 8). Thus, by subtracting the rating of Case 1 (Case 8) from that for Case 3 (Case 10), we can measure a subject's sensitivity to priors. Similarly, Case 5 minus Case 4 (Case 12 minus Case 11) measures the effect of accuracy. Accuracy was manipulated by jointly changing both hits and false alarms. Finally, Case 7 minus Case 6 (Case 14 minus Case 13) measures the effect of cost. Normatively, the effects of priors, accuracy, and cost are the same. Thus, by comparing these effects, we can determine whether any of these factors is over- or underweighed relative to any other.
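The normative values in parentheses in Table 4 follow directly from the expected-cost formulas given there. A minimal Python sketch (ours, not the authors' code; parameter names follow the table, with Neglect and Harm the two error costs, P the prior, H the hit rate, and F the false-alarm rate):

```python
# Expected-cost differences from the Table 4 formulas. Negative values mean
# testing has the lower expected cost, so they favor testing.

def e_test_minus_treat(neglect, p, h, harm, f):
    # E(Test - Treat) = Neglect * P * (1 - H) - Harm * (1 - P) * (1 - F)
    return neglect * p * (1 - h) - harm * (1 - p) * (1 - f)

def e_test_minus_withhold(neglect, p, h, harm, f):
    # E(Test - Withhold) = -Neglect * P * H + Harm * (1 - P) * F
    return -neglect * p * h + harm * (1 - p) * f

# Case 1: priors make testing better than treating outright
print(round(e_test_minus_treat(80, .68, .72, 80, .16), 1))           # -6.3
# Case 9: testing and withholding are normatively equivalent
print(abs(e_test_minus_withhold(80, .25, .84, 80, .28)) < 1e-9)      # True
```

Running these two helpers over all 14 rows reproduces every normative value in the table to one decimal place.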


Results

Examination of the justifications revealed that 9 of the 21 subjects carried over the decision type (test vs treat or test vs withhold) into at least a few items in the second half of their questionnaire. Many of these subjects explicitly mentioned an incorrect alternative to testing. Accordingly, we analyzed only the first half of each subject's data, so that there were actually two groups, one given Cases 1-7, and the other, Cases 8-14.

The mean ratings are shown in the right column of Table 4. Both groups were combined for the main analysis. Subjects were highly sensitive to all three manipulations: priors (mean effect of 1.76 across both sets, t(20) = 5.58, p < .001), accuracy (mean 1.76, t = 4.09, p < .001), and cost (mean 3.00, t = 8.21, p < .001). However, they were not equally sensitive to the three manipulations (p < .025 by Hotelling's T² applied to the differences of the effects). They were more sensitive to cost than to accuracy and priors (t(20) = 2.62, p < .02, two-tailed, for cost versus accuracy; t(20) = 2.84, p < .02, for cost versus priors). They were equally sensitive to accuracy and priors (mean difference in sensitivity = 0). (These results were unchanged by use of the whole data set instead of the first half, except that the difference between cost and accuracy effects was no longer significant.) There were no significant effects of group (test vs withhold or test vs treat) on these differences.

EXPERIMENT 3

In Experiments 1 and 2, cost is overweighed relative to accuracy and priors. There are two ways in which this may occur, which we shall call focusing and action bias. In focusing, subjects use cost information to focus attention on one accuracy parameter or the other. For example, when Neglect is particularly high, subjects might focus on the hit rate of a test. This explicit use of an interaction between parameters is what subjects typically say they did in the two-test comparisons in Experiment 1.

In action bias, cost information bears on the alternative action to testing. For example, when Neglect is high, subjects will be inclined to treat without testing or to test rather than withhold (because testing will make treatment more likely). Action bias, a direct, noninteractive heuristic, can explain the results of Experiment 2, but it cannot explain the results of the two-test comparisons in Experiment 1, where there was no alternative to testing. Note that both focusing and action bias could be caused by some more general mechanism, such as a simple tendency to behave as if the manipulation of cost were more extreme than it was.

In Experiment 3, we ask whether evidence for focusing can be found in


a single-test situation like that used in Experiment 2. To ask this, we manipulate hit rates and false-alarm rates independently, along with costs and priors. Thus, tests in some cases have generally high hit and false-alarm rates, and others have generally low rates.

Method

Twenty subjects were solicited as before. The instructions were essentially the same as those used in Experiment 2.

The cases used are shown in Table 5. Half of the subjects were given the cases in the order 9, 1, 5, 6, 8, 11, 2, 3, 7, 10, 12, 4, 16, 24, 22, 19, 15, 14, 23, 20, 18, 17, 13, 21, and half in the reverse order. Cases involving testing versus treatment (1-12) were blocked together, as were those involving testing versus nothing (13-24). Otherwise, adjacent cases were dissimilar.

Set 1, the first 12 questions, uses high priors, and the choice is between testing and treating. Set 2, Cases 13-24, uses low priors, and the choice is between testing and doing nothing. Otherwise, these cases are analogous to Cases 1-12, respectively. The expected cost difference for the two options in each case is zero. Case 2 (and Case 14) is a kind of midpoint. Other cases manipulate cost, prior, and accuracy parameters so as to allow alternative models to be disentangled, as we shall explain later.

TABLE 5
PARAMETER VALUES AND MEAN RATINGS FOR EXPERIMENT 3

Cases    Neglect    P      H      Harm   1-P     F      Mean rating (t)

Set 1. Test vs Treat: E(Test - Treat) = Neglect · P · (1-H) - Harm · (1-P) · (1-F)
1,8        1       .67   .6,.9     1     .33   .2,.8    -.10 (-.24), .30 (.71)
2,9        1       .80   .8,.9     1     .20   .2,.6    -.35 (-.74), .05 (.10)
3          1       .89   .9        1     .11   .2       -.40 (-.72)
4,10       1       .80   .6,.9     2     .20   .2,.8    -.55 (-2.10), -.35 (-1.13)
5          2       .80   .9        1     .20   .2        .30 (.53)
6,11       1       .89   .8,.9     2     .11   .2,.6    -1.10 (-2.92), -.50 (-1.52)
7,12       2       .67   .8,.9     1     .33   .2,.6    -.45 (-.98), -.32 (-.62)

Set 2. Test vs Withhold: E(Test - Withhold) = -Neglect · P · H + Harm · (1-P) · F
13,20      1       .33   .8,.2     1     .67   .4,.1    -.35 (-.86), -.20 (-.59)
14,21      1       .20   .8,.4     1     .80   .2,.1    -1.40 (-4.92), -.16 (-.46)
15         1       .11   .8        1     .89   .1       -1.15 (-3.44)
16,22      2       .20   .8,.2     1     .80   .4,.1    -1.20 (-3.21), -.50 (-1.45)
17         1       .20   .8        2     .80   .1       -1.10 (-3.10)
18,23      2       .11   .8,.4     1     .89   .2,.1    -1.90 (-8.78), -1.05 (-3.28)
19,24      1       .33   .8,.4     2     .67   .2,.1    -.95 (-2.30), -1.10 (-3.24)

Note. Where pairs of cases are shown on the same line, the values of H, F, and mean rating are shown in the order in which the cases are listed in the left column. For example, H was .6 for Case 1 and .9 for Case 8. In this table, negative ratings always favor testing; positive ratings favor treating in Set 1, and withholding treatment in Set 2.
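The claim that every case in Table 5 is normatively balanced can be checked directly. The following sketch (ours, not the authors' code; the tuples are transcribed from the Set 1 rows of the table) verifies that the expected cost difference is zero to within rounding of the tabled parameters:

```python
# Verify that Table 5's parameters make the expected cost difference of the
# two options (essentially) zero. Tuples: (Neglect, P, H, Harm, F).

set1 = [  # test vs treat; paired rows expanded into one tuple per case
    (1, .67, .6, 1, .2), (1, .67, .9, 1, .8),   # Cases 1, 8
    (1, .80, .8, 1, .2), (1, .80, .9, 1, .6),   # Cases 2, 9
    (1, .89, .9, 1, .2),                        # Case 3
    (1, .80, .6, 2, .2), (1, .80, .9, 2, .8),   # Cases 4, 10
    (2, .80, .9, 1, .2),                        # Case 5
    (1, .89, .8, 2, .2), (1, .89, .9, 2, .6),   # Cases 6, 11
    (2, .67, .8, 1, .2), (2, .67, .9, 1, .6),   # Cases 7, 12
]
for neglect, p, h, harm, f in set1:
    # E(Test - Treat) = Neglect * P * (1 - H) - Harm * (1 - P) * (1 - F)
    e = neglect * p * (1 - h) - harm * (1 - p) * (1 - f)
    assert abs(e) < .01, (neglect, p, h, harm, f)
print("all 12 Set 1 cases are balanced")
```

The Set 2 rows pass the analogous check with the test-versus-withhold formula.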

Results

The mean ratings are shown in the right column of Table 5. To analyze the results, we fit a five-parameter model to each subject's ratings (all 24 cases). We assumed that the ratings were a linear function of five variables. Each variable represented a factor that could produce distortions from the normative model (0 for all cases). Each variable was assigned a value for each case. We predicted each subject's ratings by regressing them on these five variables. Each regression weight thus estimated the contribution of each factor to each subject's ratings.

The first variable represented overattention to priors and was scored as 1 for Cases 1, 7, 8, 12, 13, 19, 20, and 24, as -1 for Cases 3, 6, 11, 15, 18, and 23, and as 0 for the remaining cases. A score of 1 on this variable means that overattention to priors would incline a subject toward testing in this case. A negative value of this parameter would indicate overattention to priors (given that a decision to test corresponds to negative ratings).

The second variable represented overattention to costs through the action bias mechanism. It was scored as 1 for Cases 5, 7, 12, 17, 19, and 24, as -1 for Cases 4, 6, 10, 11, 16, 18, 22, and 23, and as 0 for the remaining cases. A score of 1 would mean that overattention to cost would incline a subject against testing, so that a positive value of this parameter indicates overattention to costs. No parameter was needed for overattention to accuracy, because accuracy is, in a sense, the baseline to which other factors are compared.

A third variable represented the focusing mechanism in combination with the action bias mechanism. (We could see no way of testing the focusing mechanism sensibly on its own.) To calculate this parameter, we assumed that the subject focused entirely on the patients for whom errors were most serious: diseased patients when Neglect was greater than Harm, and nondiseased patients when Harm was greater than Neglect. When Harm and Neglect were equal, the value of this parameter was zero. For the relevant patients (diseased or nondiseased) only, we computed the difference in the probability of error for the two alternative actions. For example, in Case 4, the subject would focus on the nondiseased patients. If the decision were to treat, the error probability would be 1.0 for these patients. If the decision were to test, the error probability would be .2. Thus, this reasoning would favor testing with a strength of -.8 (negative because testing corresponds to negative ratings). The value of this variable was therefore -.8 for Cases 4, 6, 16, and 18, -.4 for


Cases 11 and 23, -.2 for Cases 10 and 22, .1 for Cases 5, 12, 17, and 24, .2 for Cases 7 and 19, and 0 for all other cases. A positive value for this parameter would provide evidence for focusing or for action bias.
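The focusing-variable computation described above can be sketched as a small Python helper (ours, not the authors' code; the function name and argument order are our own). The subject is assumed to attend only to the group of patients facing the costlier error, and the variable is the testing-minus-alternative difference in that group's error probability, so negative values favor testing:

```python
def focusing_value(neglect, harm, h, f, alternative):
    # Focus on whichever patient group faces the more serious error.
    if neglect == harm:
        return 0.0
    if neglect > harm:            # focus on diseased patients
        test_err = 1 - h          # testing misses them with prob. 1 - H
        alt_err = 0.0 if alternative == "treat" else 1.0
    else:                         # focus on nondiseased patients
        test_err = f              # testing false-alarms with prob. F
        alt_err = 1.0 if alternative == "treat" else 0.0
    return test_err - alt_err     # negative: testing protects this group

# Case 4 (test vs treat, Harm > Neglect): treating always errs for the
# nondiseased (1.0), testing errs with probability F = .2
print(round(focusing_value(1, 2, .6, .2, "treat"), 1))  # -0.8
```

The same call with Case 5's parameters (Neglect = 2, Harm = 1, H = .9) gives .1, matching the values listed in the text.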

We note that in many cases the difference between hit rate and false-alarm rate is quite low (e.g., Case 8), and it is surprising even to those familiar with the normative model that testing is worthwhile. This effect is not of interest here, but, if present, it could interfere with the measurement of other effects. Hence, another variable, which we called information, was simply the difference between hits and false alarms.

The final variable was simply condition, scored as 0 for test vs treat cases (1-12) and 1 for test vs withhold cases (13-24). Again, this variable was included to eliminate extraneous variance.
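The per-subject analysis built from these five variables can be sketched as an ordinary least-squares regression (our illustration with made-up data, not the authors' code; the variable names and the inclusion of an intercept are our assumptions):

```python
# Each subject's 24 ratings are regressed on the five coded variables; the
# regression weights estimate each factor's contribution for that subject.
import numpy as np

rng = np.random.default_rng(0)
n_cases = 24

# Columns: priors, action bias, focusing, information, condition.
# Real values come from the coding rules above; these are placeholders.
X = rng.choice([-1.0, 0.0, 1.0], size=(n_cases, 5))
ratings = rng.integers(-3, 4, size=n_cases).astype(float)  # 7-point scale

design = np.column_stack([np.ones(n_cases), X])            # add an intercept
weights, *_ = np.linalg.lstsq(design, ratings, rcond=None)

print(weights.shape)  # (6,): intercept plus one weight per variable
```

Repeating this fit for every subject and testing the mean of each weight against zero across subjects reproduces the structure of the analysis reported below.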

The value of each parameter was estimated separately for each subject. Of interest are the mean values of each parameter across subjects. Overall, the four main parameters (priors, action bias, focusing, information) did differ from 0 (Hotelling's T² = 24.93, p < .01). The means (and t values against the null hypothesis of zero) for the individual parameters were: priors, .05 (0.33); action bias, -.13 (-0.45); focusing, .11 (1.89, p < .05, one tailed); information, -.10 (-1.08); and condition, -.66 (-2.18, p < .05, two tailed). The action bias and focusing parameters were significant together (across subjects, Hotelling's T² = 11.23, p < .025). Although the focusing parameter was significant and the action bias parameter was not, the former was not significantly greater than the latter; there was great variability among subjects in the action bias parameter. We may conclude that focusing or action bias account for the departures from the normative model.

Because some subjects may have had the same misunderstanding here as we noted in Experiment 2, in which they carried over the decision from their first block of cases to their second, we reanalyzed the data for the first half only. (However, justifications did not indicate so pronounced an effect here.) Again, the four main parameters were significant (T² = 4.52, p < .025), and action bias and focusing were significant together (T² = 5.03, p < .025). The individual parameter values (and ts) were priors, -.15 (-0.52); action bias, .47 (1.70); focusing, .07 (1.13); and information, -.18 (-1.51). There were no group differences in any parameters.

Both analyses indicate overattention to costs, although they do not answer clearly the question of whether this overattention takes the form of action bias alone or action bias combined with focusing. Clearly, action bias is present, however; in a separate analysis in which the focusing parameter was absent, the action bias parameter was highly significant. The fact that the focusing parameter is significant in the main analysis indicates that some focusing is very likely present as well.


There was also evidence for focusing in the justifications. For cases in which Neglect was greater than Harm, hits were mentioned in 47 cases and false alarms in 30. For cases in which Harm was greater than Neglect, hits were mentioned in 31 and false alarms in 49. The difference is significant by a χ² test (p < .001).

In sum, these results suggest that both action bias and focusing mechanisms account for overattention to costs. Subjects seem to choose the course of action most suited to the patients subject to the more serious error, but they seem to focus on the relevant accuracy parameter before making a judgment of how much this consideration should affect their overall judgment.

DISCUSSION

In all three experiments, subjects overattend to costs. We proposed two mechanisms for the overuse of cost, one involving interactions, in which the greater cost focuses the subject's attention on hits or false alarms, and one involving a direct effect on action, in which the cost inclines the subject for or against testing in single-test cases. The focusing mechanism is apparently operating in the two-test comparisons of Experiment 1, and there is evidence for both this mechanism and the action bias mechanism in the single-test cases of Experiment 3.

One heuristic that can account for the overweighing of cost is a modified form of "minimax regret." Subjects take particular pains to minimize the probability of the worst mistake (when there is one). When the effect of cost is direct (as in the relevant single-test cases of Experiments 2 and 3), this heuristic favors a particular action. For example, when Neglect is greater than Harm, one would treat rather than test and test rather than withhold.
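This heuristic can be stated compactly in code (our formalization of the verbal description, not the authors' model; the function and option names are ours). The option chosen is whichever minimizes the probability of the costlier error, with priors and the other cost ignored entirely:

```python
def heuristic_choice(neglect, harm, h, f, alternative):
    # The prior plays no role here -- that omission is the heuristic's
    # defining feature. Ties (Neglect == Harm) default to the first branch.
    if neglect >= harm:   # worst error: failing to treat a diseased patient
        p_worst = {"test": 1 - h, "treat": 0.0, "withhold": 1.0}
    else:                 # worst error: treating a nondiseased patient
        p_worst = {"test": f, "treat": 1.0, "withhold": 0.0}
    return min(["test", alternative], key=lambda opt: p_worst[opt])

# With Neglect > Harm, the heuristic treats rather than tests...
print(heuristic_choice(2, 1, .9, .2, "treat"))     # treat
# ...and tests rather than withholds, whatever the other parameters
print(heuristic_choice(2, 1, .9, .2, "withhold"))  # test
```

Because the choice never consults the prior, the sketch reproduces the overweighing of cost relative to priors observed in all three experiments.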

When the effect of cost is indirect (as in all relevant cases), this heuristic directs attention to that aspect of accuracy, hits or false alarms, that is relevant to the worst case. For example, the first half of a subject's justification for Case 11, Experiment 1, was, "I would choose Test 2 because it is three times as bad to treat when the disease is absent than it is to fail to treat when the disease is present, and this test gives a lower probability of misdiagnosis of people without the disease." This account is part of a broader class of regret models, including those of Bell (1982) and Loomes and Sugden (1982), which could account for our results as well.

In two-test comparisons (Experiment 1, Set 3), subjects underweighed priors relative to accuracy, despite our attempt to make priors salient by presenting them as the result of an examination of an individual patient rather than as statistics about a population. This precaution (which may account for the appropriate use of priors in single-test cases) removes one


common source of resistance to the use of priors (e.g., Cohen, 1981), the question about whether population statistics apply to a given case.

When subjects did attend to priors as well as costs, they weighed priors less heavily. In all six cases in which subjects mentioned correctly the relevance of both priors and costs, the rating suggested that cost was still overweighed. Typically, the argument about priors was added as an afterthought after the rating had, it seems, already been made, e.g., "However, it is three times as likely that people will have the disease, and this test is not as accurate in detecting these." Thus, there is an analogous heuristic for priors, but it is not used frequently, and, when used, it is not given much weight in the decision.

At the end of a training experiment (not reported here) one subject said, "Oh, I understand. What you want us to do is to trade off a large harm to a few patients with a smaller harm to many." At the level of policy for the use of tests, the normative model corresponds exactly to this kind of utilitarian analysis. Thus, failure to apply the normative model could be related to certain objections to utilitarianism (e.g., Rawls, 1971). People may tend to think of medical (and other) testing as a matter of rights, where the situation of the people at greatest risk takes precedence. Subjects may sometimes translate costs into a kind of lexical ordering in which minimizing the probability of the worst harm takes priority over all else (see Baron, 1986). (The attempt to prevent them from doing this by expressing cost in terms of a threshold, in Experiment 1, Version 1, may simply have failed.)

Possibly, a tendency to use such a heuristic could result from the difficulty of balancing all relevant factors. However, difficulty alone cannot explain the choice of the heuristic used, as opposed to other equally simple heuristics. Further, we doubt that the extra effort required by more appropriate heuristics played a causal role in the choice of the heuristics that were used. There is no evidence that subjects considered and rejected more complex heuristics, or that they regarded the heuristics they used as anything less than perfectly adequate. (Only a few subjects in all experiments expressed doubt about their ability to do the task adequately.) Rather, we think that something like the minimax regret heuristic was their best considered procedure.

REFERENCES

Ajzen, I. (1977). Intuitive theories of events and the effects of base-rate information on prediction. Journal of Personality and Social Psychology, 35, 303-314.
Bar-Hillel, M. (1980). The base-rate fallacy in probability judgments. Acta Psychologica, 44, 211-233.
Baron, J. (1985). Rationality and intelligence. Cambridge: Cambridge Univ. Press.
Baron, J. (1986). Tradeoffs among reasons for action. Journal for the Theory of Social Behavior, 16, 173-195.
Bell, D. E. (1982). Regret in decision making under uncertainty. Operations Research, 30, 961-981.
Camerer, C. (1981). General conditions for the success of bootstrapping models. Organizational Behavior and Human Performance, 27, 411-422.
Cohen, L. J. (1981). Can human irrationality be experimentally demonstrated? The Behavioral and Brain Sciences, 4, 317-331.
Feinstein, A. R. (1984). The role of models for medical decisions: Algorithms and decision analysis. In S. T. Shulman (Ed.), Management of pharyngitis in an era of declining rheumatic fever (pp. 105-113). Columbus, OH: Ross Laboratories.
Kahneman, D., & Tversky, A. (1972). Subjective probability: A judgment of representativeness. Cognitive Psychology, 3, 430-454.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263-291.
Kahneman, D., & Tversky, A. (1980). Causal schemas in judgments under uncertainty. In M. Fishbein (Ed.), Progress in social psychology. Hillsdale, NJ: Erlbaum.
Klayman, J. (1984). Learning from feedback in probabilistic environments. Acta Psychologica, 56, 81-92.
Loomes, G., & Sugden, R. (1982). Regret theory: An alternative theory of rational choice under uncertainty. Economic Journal, 92, 805-824.
Rawls, J. (1971). A theory of justice. Cambridge, MA: Harvard Univ. Press.

RECEIVED: March 10, 1986