
From Figures to Values: The Implicit Ethical Judgements in our Measures of Health

Paolo Vineis*, School of Public Health, Imperial College London

Roberto Satolli, Zadig, Milano Italy

*Corresponding author: Paolo Vineis, School of Public Health, Imperial College London, St Mary’s Campus, Norfolk Place W2 1PG London.

Tel: +44 (0)20 75943372; Fax: +44 (0)20 75943196; Email: [email protected]

The objective of this article is to examine the extensions of a clinical measure of efficacy, the Number Needed to Treat (NNT), to different settings, including screening, scanning, genetic testing and primary prevention, and the associated ethical implications. We examine several situations in which the use of the NNT or NNS (Number Needed to Screen) has been suggested, such as Prostate-Specific Antigen testing for prostate cancer, Magnetic Resonance Imaging scans, genetic testing and banning of smoking. For each application, we explore the ethical implications of the relevant measure. We have found that the different measures have different ethical implications. For example, the Number Needed to Prevent is the only measure that can be lower than one, which shows with a numerical example that prevention is better than cure. Conversely, we raise questions about the acceptability of genetic screening. In a realistic example, we show that primary prevention of the effects of arsenic in drinking water, targeted to the most susceptible, would require genetic screening of a large number of subjects, while also giving rise to ethical concerns. We warn against the abuse of testing, in particular genetic testing, and we show that different measures are associated with different ethical issues and that prevention tends to be better than cure.

Introduction

How the impact of medical and preventive activities is measured is one of the important issues that epidemiologists face, and it has moral implications. The purpose of this article is to show how different measures of treatment and prevention are associated with very different impacts for the populations involved, and entail different moral implications. For the purposes of this analysis, any clinical or public health intervention can be considered worthy on the basis of two ethical principles: (i) benefits should exceed harm (beneficence) and (ii) the priority in the use of public resources should be for interventions that produce more benefits for more people (utility).

Number Needed to Treat

The Number Needed to Treat (NNT) is probably the most useful single figure that one needs to know in order to judge the efficacy of a therapy, and in fact of any medical intervention. Its properties have been described (see Schulzer and Mancini, 1996 and Walter, 2001 for reviews and a discussion of statistical aspects) and its use has thrived in recent decades (see Zulman et al., 2008 for an application to public health strategies). It is a summary measure that allows the physician to estimate how many patients need to receive a treatment for one of them to benefit. It can be compared with the expected burden of side-effects and with alternative courses of action, and it can lead to a cost–benefit analysis. However, its extensions to testing, screening, scanning (including incidental findings) and primary prevention have not been fully explored and will be analysed here from a public health perspective.

By examining different scenarios, we will address the

moral implications involved in the use of the NNT and

derived measures.

Scenario 1: Therapy and Tertiary Prevention

The NNT is the number of patients who need to be treated with a drug or any other medical intervention to save one life, to avoid the loss of 1 year of life (or of one Quality-Adjusted Life Year, QALY), or to reduce other specified adverse health outcomes. The NNT (Box 1) is a function of the efficacy of the therapy and of the frequency of the outcome we want to avoid or prevent.

PUBLIC HEALTH ETHICS VOLUME 5 NUMBER 1 2012 22–28

doi:10.1093/phe/phs003

© The Author 2012. Published by Oxford University Press. Available online at www.phe.oxfordjournals.org



When the therapy is very effective, like surgery for appendicitis, and the outcome is frequent in the absence of intervention, then the NNT is very close to 1, i.e. we save almost all patients who are treated. This is a very uncommon occurrence in medicine, and most NNTs fluctuate around 50–500. Notice that the NNT may be high, even for a common adverse outcome, not only if the therapy is ineffective, but also if spontaneous recovery occurs, since the measure of efficacy is based on a comparison between treated and untreated patients (Box 1). Therefore, we may have a very high NNT in the case of pancreatic cancer (frequency of death 100%, highly ineffective therapies), but also for the therapy of common influenza, depending on the day of observation (with a very high rate of spontaneous recovery a few days after treatment initiation).

The NNT increases as the frequency of the outcome decreases, whereas the adverse side-effects of therapies occur at the same rate, irrespective of the frequency of the outcome that we want to prevent. For example, there is a fixed proportion of subjects who will develop aplastic anaemia after treatment with ibuprofen, whether the drug is properly used in seriously sick patients who really need it or inappropriately used in subjects with a mild and self-limiting disease. This relationship is represented in Figure 1, which shows that treatment should be initiated only when the advantages outweigh the side-effects. This well-known figure is usually applied to therapies, but common sense suggests applying it to any medical act. Walter and Sinclair (2009) have recently analysed the issue of the ‘minimum target event risk for treatment’, i.e. the threshold for undertaking a treatment, and they noted that the information needed for an informed decision is frequently lacking.
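The relationship in Figure 1 can be illustrated with a minimal sketch. The relative risk reduction and the side-effect rate below are hypothetical values chosen only for illustration, and counting one side-effect as equal to one avoided outcome is a simplifying assumption that the article does not make.

```python
def nnt(baseline_risk, relative_risk_reduction):
    """NNT = 1 / absolute risk reduction (baseline risk x relative reduction)."""
    return 1.0 / (baseline_risk * relative_risk_reduction)


def net_benefit_per_1000(baseline_risk, relative_risk_reduction, harm_rate):
    """Outcomes avoided minus side-effects caused, per 1000 treated.

    Weighting one side-effect the same as one avoided outcome is a
    simplifying assumption of this sketch.
    """
    benefit = baseline_risk * relative_risk_reduction
    return 1000.0 * (benefit - harm_rate)


# Hypothetical inputs, not taken from the article.
RRR = 0.20         # assumed relative risk reduction of the therapy
HARM_RATE = 0.001  # assumed side-effect rate, identical at every baseline risk

for risk in (0.5, 0.05, 0.005, 0.0005):
    print(risk, round(nnt(risk, RRR)),
          round(net_benefit_per_1000(risk, RRR, HARM_RATE), 2))

# Baseline risk at which benefit and harm break even under these assumptions.
threshold_risk = HARM_RATE / RRR
```

As the baseline risk falls, the NNT grows while the harm term stays constant, so below the break-even risk the sketch reports more side-effects than avoided outcomes.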

The ‘first ethical implication’ is that any benefit should be compared with side-effects, and that the two are asymmetric, because only the benefits of treatment are influenced by the frequency of the outcome, so that damage without benefit can easily occur for rare outcomes. Benefit and harm are also asymmetric because they do not necessarily refer to the same persons, so that an intervention can slightly harm a large number of people in order to benefit only one person. These two asymmetries conflict with both the principle of beneficence and the principle of utility.

Number Needed to Test

Suppose that a doctor wants to prescribe a Computerized Tomography (CT) scan for joint pain. If it is highly likely that the CT will help her/him to decide whether to treat the patient or not, then the NNT for therapy can be simply estimated for the treated subjects. But if any treatment is unlikely to be undertaken, why is the test performed? Has the doctor considered the potential side-effects of the CT for the patient? In this circumstance, it seems reasonable to estimate not the NNT, but the Number Needed to Test. In the case of appendicitis, diagnosis is very simple, and in most cases all patients undergoing the relevant tests will have the correct diagnosis and will be saved by surgery. But this is clearly an exception. The doctor may decide to perform a CT scan in 1000 patients with joint pain to identify the 10 who can theoretically benefit from a specific therapy. The NNT for those 10 patients may be, say, 10 (to be optimistic), i.e. out of the 10 patients with that condition who are treated, only 1 will recover thanks to the treatment. The other nine will get the drug (with its side-effects) with no benefit. But we also have to include in the equation the 990 patients who underwent a CT scan with no gain. Therefore, the Number Needed to Test is in fact 999, and among the side-effects, we also have to count those of the diagnostic test. Again, the frequency of the side-effects is independent of the efficacy (or lack of it) of the treatment and of the frequency of the outcome.

Box 1. Example of a measure of treatment efficacy

Let us consider a drug that is supposed to prevent heart disease (e.g. a statin). To express its efficacy, one can calculate the frequency of deaths or of illnesses, after a sufficiently long time, in the treated group compared with the control group. In a large study in healthy subjects with normal cholesterol, but with an altered level of an inflammation marker (CRP) (the Jupiter study), the deaths were 12.5 per 1000 people per year in the control group, and 10 in those treated with the drug. The two frequencies can be compared by calculating the difference (i.e. the deaths decreased by 2.5 per 1000 per year). However, this measure is rarely used to communicate benefits. The authors of clinical trials prefer to calculate the per cent risk reduction in the treated arm compared with controls, in this case 20 per cent (i.e. 2.5 divided by 12.5). In this way, the apparently modest absolute result is transformed into a more attractive relative reduction. In other words, when the basal risk is low, even a modest absolute benefit translates into an apparently large relative benefit. However, one of the most useful measures is the NNT in order to avoid one adverse event such as death. In our example, the drug benefits 12.5 patients out of 1000 treated for 5 years (2.5 multiplied by 5 years). This means that 80 subjects (1000 divided by 12.5) need to be treated to obtain one benefit, i.e. to avoid one death. The NNT is thus 80 subjects. Is it large or small? To give an idea, 70 elderly patients with hypertension need to be treated for 5 years with anti-hypertensives in order to avoid one death; or, 100 male adults with no sign of heart disease need to take aspirin for 5 years in order to avoid an infarction. Not only is the NNT an easily interpretable measure, but it also allows comparative analyses including costs. For example, if a year of therapy against cholesterol costs €1000 per patient, then approximately €400,000 are needed to prevent one death by treating 80 people for 5 years.
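The arithmetic of Box 1 can be retraced in a few lines. This is a sketch that uses only the figures quoted in Box 1; the function name and the dictionary keys are hypothetical, and the absolute benefit is assumed to accumulate linearly over the years of treatment, as Box 1 itself does.

```python
def box1_summary(control_per_1000_year, treated_per_1000_year,
                 years, cost_per_patient_year):
    """Retrace the Box 1 calculation from death rates per 1000 per year."""
    difference = control_per_1000_year - treated_per_1000_year
    arr_per_year = difference / 1000.0            # absolute risk reduction per year
    rrr = difference / control_per_1000_year      # relative risk reduction
    arr_total = arr_per_year * years              # assumes linear accumulation
    nnt = 1.0 / arr_total                         # patients treated per death avoided
    cost = nnt * years * cost_per_patient_year    # cost of avoiding one death
    return {"arr_per_year": arr_per_year, "rrr": rrr,
            "arr_total": arr_total, "nnt": nnt, "cost": cost}


# Figures quoted in Box 1: 12.5 vs 10 deaths per 1000 per year,
# 5 years of treatment, 1000 euros per patient per year.
summary = box1_summary(12.5, 10.0, 5, 1000.0)
print(summary)
```

The same relative reduction of 20 per cent would give a very different NNT at another baseline risk, which is the point Box 1 makes about relative versus absolute measures.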

The ‘second moral implication’ is that testing itself (not only treatment) can lead to a large number of useless interventions, and the related discomfort. In fact, the ratio of useless to useful interventions can be much higher than for the NNT. Therefore, the calculation of the Number Needed to Test is more useful than the NNT in evaluating the beneficence and utility of any medical intervention.
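The CT example can be written out as a short sketch. The function name is hypothetical, and the reading of the Number Needed to Test as "patients tested per patient who benefits" is an assumption; under that reading the quotient is 1000, and the 999 quoted in the text matches the number of patients tested without benefit for each one who gains.

```python
def number_needed_to_test(n_tested, n_treated, nnt_among_treated):
    """Patients tested per patient who finally benefits from treatment."""
    n_benefited = n_treated / nnt_among_treated
    return n_tested / n_benefited


# Figures from the joint-pain example in the text.
n_tested = 1000          # CT scans performed
n_treated = 10           # patients found eligible for the specific therapy
nnt_among_treated = 10   # optimistic NNT among those treated

nn_test = number_needed_to_test(n_tested, n_treated, nnt_among_treated)
tested_without_benefit = nn_test - 1                      # per patient who benefits
treated_without_benefit = nnt_among_treated - 1           # drug taken with no gain
scanned_without_treatment = n_tested - n_treated          # CT with no gain

print(nn_test, tested_without_benefit,
      treated_without_benefit, scanned_without_treatment)
```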

Scenario 2: Secondary Prevention – the Example of PSA Testing

The 1000 hypothetical patients above were all affected by joint pain. What about a screening scenario, such as Prostate-Specific Antigen (PSA) testing for prostate cancer? This situation is similar to the estimation of the Number Needed to Test, but the computation needs to incorporate the prevalence of the condition in asymptomatic subjects. It is like the Number Needed to Test but in the absence of signs and symptoms, and therefore with a usually much lower disease prevalence. In fact, the Number Needed to Screen (NNS) for breast cancer screening, for example, is around 2500–20,000, depending on the age band. This means that at least 2500 women will undergo the screening test to identify a fraction who have a potentially malignant lesion, among whom there is one who will be saved by the screening activity.

This leads to a ‘third moral implication’, i.e. the overall effect of a screening test in asymptomatic subjects depends on the prevalence of the asymptomatic condition, so that a test has completely different effectiveness in, for example, different age groups. In accordance with the principles of beneficence and utility, it should not be offered to a population with a low prevalence of the condition, when the expected benefits are likely to be exceeded by the harm.

According to one study, 3 million American men aged 40–74 years would show abnormal PSA levels if screened (>4.0 nanograms per millilitre; with a proposed threshold of 2.5 nanograms per millilitre, an additional 3 million men would be abnormal). However, only 0.4 per cent of men in the age range 40–74 years are expected to die every year from prostate cancer. Let us suppose that screening reduces the risk of dying by 20 per cent, probably an optimistic estimate [this is the estimate found in the European ERSPC trial, not in the American PLCO trial (Andriole et al., 2009; Schroder et al., 2009)]. With the figures given in the recent ERSPC report (Schroder et al., 2009), the absolute risk reduction is 0.7 per 1000 in 10 years, which gives an NNS to save a life of about 1400 (1/0.0007), a rather high value. Another way to estimate the impact is to say that 48 additional tumours need to be treated to prevent one death (Schroder et al., 2009). This means that approximately 1399 subjects will undergo screening with no benefit, and 47 out of 48 will suffer from all the complications related to prostatectomy with no real gain in survival. If we consider the different life expectancies, the NNS to avoid the loss of 1 year (or a QALY) would probably be higher for older people (70 years), in spite of the higher prevalence of the cancer.
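The screening figures can be checked with a brief sketch that uses only the numbers quoted above. The function and variable names are hypothetical. The exact quotient 1/0.0007 is about 1429, which the text rounds to 1400.

```python
def number_needed_to_screen(absolute_risk_reduction):
    """NNS = 1 / absolute risk reduction over the follow-up period."""
    return 1.0 / absolute_risk_reduction


# ERSPC figure quoted in the text: 0.7 deaths avoided per 1000 men in 10 years.
arr_10_years = 0.7 / 1000
nns = number_needed_to_screen(arr_10_years)

# Men screened for each death avoided who gain nothing from screening.
screened_without_benefit = nns - 1

# The text also quotes 48 additional tumours treated per death prevented.
treated_per_death_prevented = 48
treated_without_survival_gain = treated_per_death_prevented - 1

print(round(nns), round(screened_without_benefit), treated_without_survival_gain)
```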

Scenario 3: Disease Prediction – the Example of Genetic Testing

One can argue that breast cancer screening is indeed useful, at least over the age of 50 years; and perhaps, less convincingly, that PSA screening may also be useful. But there are instances in which no benefit can be demonstrated. One such instance is screening for low-penetrant genetic variants. Let us consider what the website of Decode, an Icelandic firm specialized in genetic research, offers. They suggest that by sending them a blood sample one can have the gene variants identified that predispose to cancer and other chronic diseases. What happens if one has a ‘bad gene’? There are in fact only two possibilities: one is early diagnosis by a screening test such as mammography, a strategy used in women with Breast Cancer 1 (BRCA1) mutations; the other is a primary preventive strategy, e.g. quitting smoking for a smoking-related cancer.

Figure 1. The figure shows that with an increasing frequency of health effects (outcomes) the NNT is lower, i.e. the benefits of treatment are higher, whereas harm is independent of the frequency of outcomes (see also Box 1).

Here, we are interested in the methodological properties of the NNT and the ensuing moral implications. According to Decode’s website, to predict the onset of bladder cancer, a smoking-related cancer, they will look at two gene variants, one in the region 8q24 (chromosome 8) and the other in 5p15 (chromosome 5). Is it useful? Will one benefit? It is very hard to say, since Decode does not explain what one is supposed to do with the genetic information they offer. The only ways to make use of such information are either to prevent exposures to carcinogens or to screen the carriers of the variant(s) with greater intensity than non-carriers. Unfortunately, the second possibility does not apply in this situation since there is no effective early detection test for bladder cancer.

The ‘fourth moral implication’ is that, for beneficence and utility, no testing should be done when an effective intervention is not available.

Let us then imagine that we screen people in order to advise them to avoid exposure to a bladder carcinogen, such as arsenic. The example is purely theoretical and has been fully developed elsewhere (Vineis et al., 2005). We hypothesize that the relative risk associated with the gene variant is 1.5 (low penetrance), that the cumulative risk for bladder cancer is 1 per cent in the normal population (1.5 per cent among the carriers of the gene variant), and that reduction or elimination of exposure to arsenic leads to a 50 per cent reduction in the risk of bladder cancer (all realistic assumptions). This means that the cumulative risk after intervention is 0.75 per cent in carriers of the gene variant, and the risk reduction becomes 0.015 − 0.0075 = 0.0075, i.e. 0.75 per cent, leading to an NNT of 133 (1/0.0075). Under this scenario, if the exposure to arsenic is reduced only in the carriers of the variant, we will need to ‘treat’ (i.e. to reduce exposure for) 133 exposed subjects to prevent one case. If exposure to arsenic is instead reduced for the ‘wild-type’ (again with an efficacy of 50 per cent), then the NNT is 200 (1/0.005). The difference between 133 and 200 is clearly not striking, i.e. selecting those with the gene variant is not particularly advantageous. But there is a further complication, because we need to screen the population to identify the variant carriers; the wild-type occurs in 80 per cent of the people, the variant only in 20 per cent. This means it would be necessary to screen 666 subjects (133/0.2) to identify the 133 to ‘treat’ with preventive policies to avoid one case of cancer (if we want to treat only the variant carriers). So the costs and side-effects of screening may not be worthwhile, even without considering ethical issues related to utility, etc.
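The arsenic example can be retraced step by step. The sketch below uses only the assumptions stated in the text (relative risk 1.5, cumulative risk 1 per cent, 50 per cent efficacy, 20 per cent carriers); the function and variable names are hypothetical, and, as in the text, the 1 per cent population risk is also used as the wild-type risk.

```python
def nnt_from_risk(cumulative_risk, relative_reduction):
    """NNT = 1 / (risk before intervention - risk after intervention)."""
    risk_after = cumulative_risk * (1.0 - relative_reduction)
    return 1.0 / (cumulative_risk - risk_after)


# Assumptions stated in the text.
wild_type_risk = 0.01          # cumulative risk of bladder cancer
relative_risk_variant = 1.5    # low-penetrance gene variant
exposure_efficacy = 0.5        # risk reduction from removing arsenic exposure
carrier_fraction = 0.20        # share of the population carrying the variant

carrier_risk = wild_type_risk * relative_risk_variant          # 1.5 per cent

nnt_carriers = nnt_from_risk(carrier_risk, exposure_efficacy)      # about 133
nnt_wild_type = nnt_from_risk(wild_type_risk, exposure_efficacy)   # 200

# Subjects to screen in order to find enough carriers to prevent one case.
number_to_screen = nnt_carriers / carrier_fraction                 # about 666

print(int(nnt_carriers), round(nnt_wild_type), int(number_to_screen))
```

Screening multiplies the effort by the reciprocal of the carrier fraction, which is why the targeted strategy ends up involving more people than simply reducing exposure for everyone.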

Thus, the ‘fifth moral implication’ is that testing may divert attention from a more equitable and effective (on a population basis) intervention, in this case primary prevention.

Scenario 4: Incidental Findings – the Example of Brain Imaging

A clear example of a recent application of the Number Needed to Test is a meta-analysis of studies on brain magnetic resonance imaging (MRI), in which a rather high prevalence (0.7%) of incidental findings occurs (Morris et al., 2009). We could discuss whether this is screening or not: usually, MRI is done because of symptoms, but often it is done for research, and in any case incidental findings arise that are unrelated to the symptoms. Screening is usually not the purpose, or at least the requirements of a screening test are not met. Clinicians do not yet know how to deal with incidental findings, such as aneurysms, and guidelines are not available. The authors of the meta-analysis use (apparently for the first time) what they call the Number Needed to Scan, which is only 50 for ‘any non-neoplastic incidental finding’, clearly a very low figure: for every 50 scans, one will be considered suspect or pathological. It is worth noting that this Number Needed to Scan has little to do with the NNT (or the NNS), which is the number of patients we need to treat or screen to avoid one adverse outcome like death. In the case of scanning, the index just tells us the number we need to scan to find one positive result of any kind, irrespective of treatment efficacy or usefulness of the finding.
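The difference from the NNT can be made concrete with a small sketch. It assumes that the Number Needed to Scan is simply the reciprocal of the prevalence of findings, which is a reading of the index rather than something the text spells out; the function names are hypothetical.

```python
def number_needed_to_scan(prevalence_of_findings):
    """Scans performed per incidental finding, assuming NNScan = 1 / prevalence."""
    return 1.0 / prevalence_of_findings


def implied_prevalence(nn_scan):
    """Prevalence of findings implied by a given Number Needed to Scan."""
    return 1.0 / nn_scan


# The two figures quoted in the text.
nn_scan_at_0_7_percent = number_needed_to_scan(0.007)   # about 143 scans
prevalence_for_nn_scan_50 = implied_prevalence(50)      # 0.02, i.e. 2 per cent

print(round(nn_scan_at_0_7_percent), prevalence_for_nn_scan_50)
```

Under this reading, a prevalence of 0.7 per cent and a Number Needed to Scan of 50 cannot describe the same set of findings, which suggests the two figures refer to different categories of finding; the text does not say which. Neither calculation involves any treatment effect, unlike the NNT.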

It seems that the risk of haemorrhage from unruptured aneurysms is low, but MRI is too recent to allow for a sufficiently long follow-up. In contrast, the risk of stroke or death from surgical interventions is sizable. In practice, we do not know where we are positioned in the graph of risks versus benefits shown in Figure 1. Consider also that 94 per cent of meningiomas remain asymptomatic and 63 per cent do not grow. On the other hand, the incidental discovery of a brain lesion can mean, for the patient, the loss of a driving licence, insurance and (in some countries and for some jobs) employment. These are all side-effects of the MRI that do not depend on the efficacy of treatments and the frequency of the outcomes.

This example is an extension of the fifth moral implication. In this case, not only is there no known beneficial intervention, but even the natural history of the disease is little understood.

Scenario 5: the Number Needed to Prevent

As an article in the New York Times stated in January 2007, for most Americans, the biggest health threat is not avian flu, West Nile or mad cow disease. It is their health-care system: ‘advanced technology allows doctors to look really hard for things to be wrong. We can detect trace molecules in the blood. We can direct fiber-optic devices into every orifice. And CT scans, ultrasounds, MRI and PET scans let doctors define subtle structural defects deep inside the body. These technologies make it possible to give a diagnosis to just about everybody . . . Second, the rules are changing. Expert panels constantly expand what constitutes disease: thresholds for diagnosing diabetes, hypertension, osteoporosis and obesity have all fallen in the last few years. The criterion for normal cholesterol has dropped multiple times. With these changes, disease can now be diagnosed in more than half the population’.

In more technical terms, this escalation is represented in Figure 2, which shows how the NNT or NNS increases when we move to the left from death or frank symptomatic disease to early diagnosis and to ‘pre-clinical conditions’. In fact, this encompasses a series of measures that include the NNT on the right, then the Number Needed to Test, then the NNS on the left. The latter shift has repeatedly occurred in recent years with almost all the thresholds used for diagnostic purposes: cholesterol from 160 to 130 or 100 milligrams per decilitre, fasting glycaemia from 140 to 126 milligrams per decilitre, systolic blood pressure from 160 to 140 mmHg and then 120 mmHg. If we add genetic testing for inherited susceptibility to disease, the NNT/NNS shifts further to the left in Figure 2.

Figure 2. The vertical axis shows the NNT, the number of subjects who need to be examined and treated to avoid one adverse effect such as death. The dark grey area represents those who are beyond the clinical threshold, i.e. those who already suffer from symptoms or have signs of disease. The light grey area includes those who are considered ill only because they are above a certain threshold such as glycaemia, though they still feel well. With a decreasing threshold in asymptomatic persons, the NNT increases (to the left).

One important property of prevention is that the NNP (Number Needed to Prevent one case of disease) can be <1. This apparently paradoxical situation occurs when a relatively limited preventive action has an impact that goes beyond those who are directly affected by it, e.g. through an indirect fallout. The typical example is herd immunity: vaccinating a relatively limited number of subjects prevents the disease in many more, e.g. by treating 10, we save 100. Similarly, banning smoking in public places has a positive effect not only in those potentially exposed to second-hand smoke (the target population), but also in smokers, who will smoke less.

Zulman et al. (2008) have considered how the NNT helps to disentangle the efficacies of different public health strategies, including focused strategies aimed at high-risk groups, versus unfocused strategies aimed at the general population. They note that a population-based intervention is a good option (in terms of NNT, though it should more properly be called NNP) if there are no adverse effects, whereas a targeted approach may prevent more deaths while treating fewer people if adverse effects are present.

The ‘sixth moral implication’ seems to be that prevention is better than cure also for a very technical reason, related to utility and beneficence, i.e. at least in certain cases the ratio between the ‘treated’ individuals and the benefited individuals is <1.
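The ratio mentioned above can be expressed directly. This is a minimal sketch with a hypothetical function name, using the illustrative herd-immunity figures from the text (10 people directly treated, 100 cases prevented).

```python
def number_needed_to_prevent(directly_targeted, cases_prevented):
    """Ratio of people directly reached by prevention to cases prevented."""
    return directly_targeted / cases_prevented


# Illustrative herd-immunity figures from the text: treating 10 saves 100.
nnp_herd_immunity = number_needed_to_prevent(10, 100)

# For comparison, the Box 1 drug: 80 treated for one death avoided.
nnt_box1 = number_needed_to_prevent(80, 1)

print(nnp_herd_immunity, nnt_box1)
```

A value below 1 is only possible when people who were not directly targeted also benefit, which cannot happen when the benefit is confined to those who are treated.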

Conclusions

By reviewing the different measures related to the original NNT (as summarized in Table 1), we have identified some interesting features of each of them. In exceptional cases, the NNT is 1, when all patients would die and all are saved; but usually the NNT is >1, and it is even greater if we consider the Number Needed to Test rather than the NNT, i.e. if we include in the denominator all the subjects who undergo a diagnostic test rather than only those who are offered the treatment. By definition, the NNS is greater than the NNT, because it involves asymptomatic persons, among whom disease prevalence is lower than among symptomatic ones. If we want to screen for genetic susceptibility, as several commercial laboratories now propose, we have to compare the benefits gained by screening and treating only the susceptible subjects with the benefits obtained by treating the whole population. At least for low-penetrant genes, we may conclude that the overall effort is far from being justified, as our example of arsenic shows. Clearly, if there is no benefit (or benefits are unknown, as in the case of the Number Needed to Scan with MRI), the NNS tends to infinity.

Finally, if we extend the reasoning to primary prevention, we discover an interesting property of the NNP, i.e. it is the only measure that can be <1, when the benefit of prevention goes beyond the target of the preventive effort. Such situations can be much more frequent than we think, from herd immunity to climate change. In general, we can say that the different measures related to the original NNT can also be used as proxies for the adherence of an intervention to the ethical principles of beneficence and utility.

Acknowledgements

We are grateful to Michael Parker (Ethox, Oxford) for

useful suggestions.

Conflict of interest

None declared.

Table 1. Properties of the different measures described in the text

Measure | Components | Range | Examples
NNTreat | Relative risk, frequency of outcome | 1 to infinity | Surgery for acute appendicitis
NNTest | Same plus prevalence of disease in symptomatic persons | >1 to infinity | MRI for joint pain
NNScreen | Same plus prevalence of pre-clinical condition in asymptomatic persons | >1 to infinity | Mammography
NNPrevent | Relative risk, frequency of outcome | <1 to >1 | Herd immunity

References

Andriole, G. L., Crawford, E. D., Grubb, R. L. 3rd, Buys, S. S., Chia, D., Church, T. R., Fouad, M. N., Gelmann, E. P., Kvale, P. A., Reding, D. J., Weissfeld, J. L., Yokochi, L. A., O’Brien, B., Clapp, J. D., Rathmell, J. M., Riley, T. L., Hayes, R. B., Kramer, B. S., Izmirlian, G., Miller, A. B., Pinsky, P. F., Prorok, P. C., Gohagan, J. K. and Berg, C. D. (2009). Mortality Results from a Randomized Prostate-cancer Screening Trial. New England Journal of Medicine, 360, 1310–1319.

Morris, Z., Whiteley, W. N., Longstreth, W. T. Jr, Weber, F., Lee, Y. C., Tsushima, Y., Alphs, H., Ladd, S. C., Warlow, C., Wardlaw, J. M. and Al-Shahi Salman, R. (2009). Incidental Findings on Brain Magnetic Resonance Imaging: Systematic Review and Meta-analysis. BMJ, 339, b3016.

Schroder, F. H., Hugosson, J., Roobol, M. J., Tammela, T. L., Ciatto, S., Nelen, V., Kwiatkowski, M., Lujan, M., Lilja, H., Zappa, M., Denis, L. J., Recker, F., Berenguer, A., Maattanen, L., Bangma, C. H., Aus, G., Villers, A., Rebillard, X., van der Kwast, T., Blijenberg, B. G., Moss, S. M., de Koning, H. J. and Auvinen, A. (2009). Screening and Prostate-cancer Mortality in a Randomized European Study. New England Journal of Medicine, 360, 1320–1328.

Schulzer, M. and Mancini, G. B. (1996). ‘Unqualified Success’ and ‘unmitigated Failure’: Number-needed-to-treat-related Concepts for Assessing Treatment Efficacy in the Presence of Treatment-induced Adverse Events. International Journal of Epidemiology, 25, 704–712.

Vineis, P., Ahsan, H. and Parker, M. (2005). Genetic Screening and Occupational and Environmental Exposures. Journal of Occupational and Environmental Medicine, 62, 657–662, 597.

Walter, S. D. (2001). Number Needed to Treat (NNT): Estimation of a Measure of Clinical Benefit. Statistics in Medicine, 20, 3947–3962.

Walter, S. D. and Sinclair, J. C. (2009). Uncertainty in the Minimum Event Risk to Justify Treatment was Evaluated. Journal of Clinical Epidemiology, 62, 816–824.

Zulman, D. M., Vijan, S., Omenn, G. S. and Hayward, R. A. (2008). The Relative Merits of Population-based and Targeted Prevention Strategies. Milbank Quarterly, 86, 557–580.

