

Clinical Trial Investigation

Interpretation of Results: “to p or not to p”

Ferran Torres

Hospital Clínic Barcelona / Universitat Autònoma Barcelona

EMA:

Scientific Advice Working Party (SAWP)

Biostatistics Working Party (BSWP)




Today’s talk is on statistics



Statistics Considerations


Basic statistics
– Why Statistics?
– Samples and populations
– P-value
– Random and systematic errors
– Statistical errors
– Sample size
– Confidence Intervals
– Interpretation of CI: superiority, non-inferiority, equivalence


The role of statistics

“Thus statistical methods are no substitute for common sense and objectivity. They should never aim to confuse the reader, but instead should be a major contributor to the clarity of a scientific argument.”

The role of statistics. Pocock SJ. Br J Psychiat 1980; 137:188-190


Why Statistics?

Variation!!!!


Variability


Why Statistics?

Medicine is a quantitative science, but not an exact one

– Not like physics or chemistry

Variation characterises much of medicine

Statistics is about handling and quantifying variation and uncertainty

Humans differ in response to exposure to adverse effects
– Example: not every smoker dies of lung cancer; some non-smokers die of lung cancer

Humans differ in response to treatment
– Example: penicillin does not cure all infections

Humans differ in disease symptoms
– Example: sometimes cough and sometimes wheeze are the presenting features of asthma


Why Statistics Are Necessary

Statistics can tell us whether events could have happened by chance, and help us make decisions

We need to use Statistics because of variability in our data

Generalize: can what we know help to predict what will happen in new and different situations?


Population and Samples

Target Population

Population of the Study

Sample


Extrapolation

[Figure: extrapolation from the sample to the population: study results → inferential analysis (statistical tests, confidence intervals) → “conclusions”]


Statistical Inference

Statistical tests => p-value

Confidence Intervals


Valid samples?

[Figure: samples drawn from a population, labelled “likely to occur” and “unlikely to occur”; an unlikely sample leads to an invalid sample and invalid conclusions]


P-value

The p-value is a “tool” to answer the question:

– Could the observed results have occurred by chance*?

– Remember: the decision is based on the observed results in a SAMPLE

and the results are extrapolated to the POPULATION

*: accounts exclusively for random error, not for bias

p < 0.05: “statistically significant”


P-value: an intuitive definition

The p-value is the probability of observing data at least as extreme as ours when the null hypothesis is true (no differences exist)

Steps:
1) Calculate the treatment difference in the sample (A − B)
2) Assume that both treatments are equal (A = B), and then…
3) …calculate the probability of obtaining a difference at least as large as the one observed, under the assumption in step 2
4) Conclude according to that probability:

a. p < 0.05: the differences are unlikely to be explained by chance alone; we conclude that the treatment explains the differences

b. p > 0.05: the differences could be explained by chance; we cannot rule out that chance explains the differences
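The four steps above can be sketched as an exact permutation test, one way (among several) to compute a p-value directly from its definition. A minimal Python sketch, assuming small hypothetical samples; under A = B the group labels are exchangeable, so every relabelling of the pooled data is equally likely:

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b, tol=1e-9):
    """Exact two-sided permutation p-value for a difference in means."""
    observed = mean(a) - mean(b)          # step 1: observed difference A - B
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = total = 0
    # steps 2-3: if A = B, every split of the pooled data is equally likely
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [x for i, x in enumerate(pooled) if i not in chosen]
        total += 1
        if abs(mean(group_a) - mean(group_b)) >= abs(observed) - tol:
            extreme += 1
    return observed, extreme / total      # step 4: compare with 0.05


treatment_a = [12, 14, 15, 16, 18]   # hypothetical data
treatment_b = [8, 9, 10, 11, 13]
diff, p = permutation_p_value(treatment_a, treatment_b)
print(f"difference = {diff:.1f}, p = {p:.4f}")
```

With these made-up data only 4 of the 252 possible relabellings give a difference at least as large as 4.8 in either direction, so p = 4/252.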


Factors influencing statistical significance

• Signal: the difference

• Noise (background): the variance (SD)

• Quantity: the quantity of data


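Random vs Systematic error

[Figure: five repeated measurements (01–05) of systolic blood pressure (mm Hg, axis 130–170) relative to the true value; with random error they scatter around the true value, with systematic error (bias) they are shifted away from it]

Example: Systolic Blood Pressure (mm Hg)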


Random vs Systematic error

[Figure: as sample size increases, random error decreases, while bias does not]
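A small simulation makes the point of the figure concrete. This is a hypothetical sketch (the true value of 150 mm Hg, random SD of 10 mm Hg and bias of +10 mm Hg are invented numbers): as n grows, the mean of unbiased readings closes in on the true value, while the mean of biased readings closes in on the true value plus the bias.

```python
import random
from statistics import fmean

TRUE_SBP = 150.0   # hypothetical true systolic blood pressure (mm Hg)
RANDOM_SD = 10.0   # hypothetical random measurement error (SD, mm Hg)
BIAS = 10.0        # hypothetical systematic error, e.g. a miscalibrated device


def measure(n, bias, rng):
    """Simulate n readings = true value + systematic error + random error."""
    return [TRUE_SBP + bias + rng.gauss(0.0, RANDOM_SD) for _ in range(n)]


rng = random.Random(2024)
for n in (5, 50, 500, 5000):
    unbiased = fmean(measure(n, 0.0, rng))
    biased = fmean(measure(n, BIAS, rng))
    # random error shrinks with n; the +10 mm Hg bias does not
    print(f"n={n:5d}  unbiased mean={unbiased:6.1f}  biased mean={biased:6.1f}")
```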


P-value

A “statistically significant” result (p < 0.05) tells us NOTHING about clinical or scientific importance. It only tells us that the results are unlikely to be due to chance alone.

A p-value does NOT account for bias, only for random error


P-value

A “very low” p-value does NOT imply:

– Clinical relevance (NO!!!)

– Magnitude of the treatment effect (NO!!)

With a larger n or lower variability, the p-value gets smaller for the same effect

• Please never compare p-values!! (NO!!!)


RCT from a statistical point of view

[Figure: 1 homogeneous population → randomisation → Treatment A and Treatment B (control) → 2 distinct populations?]

RCT

[Figure: sample vs population]


• Statistics can never PROVE anything beyond any doubt, just beyond reasonable doubt!!

• … because of working with samples and random error


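Type I & II Error & Power

Reality (population): A = B or A ≠ B; conclusion drawn from the sample:

– Conclusion “A = B” (p > 0.05): OK if A = B; Type II error (β) if A ≠ B

– Conclusion “A ≠ B” (p < 0.05): Type I error (α) if A = B; OK if A ≠ B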


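Utility of Believing in the Existence of God (according to Pascal)

Reality: God exists / God does not exist

– Pascal's decision “God exists”: correct if God exists; no penalty if God does not exist

– Pascal's decision “God does not exist”: eternal damnation if God exists; correct if God does not exist

H0: God does not exist; H1: God exists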


Type I & II Error & Power

Type I error (α)
– False positive
– Rejecting the null hypothesis when it is in fact true
– Standard: α = 0.05
– In words: the chance of finding statistical significance when in fact there truly was no effect

Type II error (β)
– False negative
– Not rejecting the null hypothesis when the alternative is in fact true
– Standard: β = 0.20 or 0.10
– In words: the chance of not finding statistical significance when in fact there was an effect


Sample Size

The planned number of participants is calculated on the basis of:

– Expected effect of treatment(s)

– Variability of the chosen endpoint

– Accepted risks of a wrong conclusion

↗ effect ↘ number

↗ variability ↗ number

↗ risk ↘ number
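The three arrows correspond to the usual normal-approximation formula for comparing two means, $n = 2(z_{1-\alpha/2} + z_{1-\beta})^2\sigma^2/\delta^2$ per group. A minimal sketch, assuming a two-sided test, 1:1 allocation and a known SD; the effect (2 mm Hg) and SD (20 mm Hg) are illustrative values:

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate patients per arm to compare two means (two-sided z-test,
    1:1 allocation, normal approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # accepted Type I risk
    z_beta = z(power)            # power = 1 - accepted Type II risk
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


base = n_per_group(delta=2, sd=20)
print("reference:", base)
print("larger effect:", n_per_group(delta=4, sd=20))      # fewer patients
print("more variability:", n_per_group(delta=2, sd=30))   # more patients
print("more risk accepted:", n_per_group(delta=2, sd=20, alpha=0.10, power=0.70))  # fewer
```

Each call changes one ingredient, reproducing the direction of the arrows.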


Sample Size

[Figure: three histograms of height (ALTURA), N = 2000 each, with similar means but increasing spread: mean 165.1, SD 25.54; mean 165.0, SD 26.94; mean 165.1, SD 32.27]


Sample Size

The risks accepted in the calculation are α (Type I error: concluding “A ≠ B”, p < 0.05, when A = B) and β (Type II error: concluding “A = B”, p > 0.05, when A ≠ B). POWER = 1 − β is the probability of correctly concluding “A ≠ B” when A ≠ B.


Interval Estimation

[Figure: confidence interval around the sample statistic (point estimate), bounded by the lower and upper confidence limits]

Loosely read as “a range that is likely to contain the population parameter”; strictly, the confidence level is the proportion of such intervals, over repeated samples, that would contain it.


95% CI: better than p-values…

– …they use the data collected in the trial to give an estimate of the treatment effect size, together with a measure of how certain we are of our estimate

A CI is a range of values within which the “true” treatment effect is believed to lie, with a given level of confidence.
– 95% CI: if the trial were repeated many times, 95% of the intervals calculated this way would contain the “true” treatment effect

Generally, the 95% CI is calculated as:
– Sample estimate ± 1.96 × standard error
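As an illustration of “estimate ± 1.96 × SE”, here is a sketch for a difference between two proportions using the simple Wald standard error; the counts (70/100 vs 60/100) are hypothetical, and for small samples or proportions near 0 or 1 other intervals are preferable:

```python
from math import sqrt


def diff_proportions_ci(x1, n1, x2, n2, z=1.96):
    """Difference p1 - p2 with a Wald interval: estimate +/- z * SE."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se


est, lower, upper = diff_proportions_ci(70, 100, 60, 100)  # hypothetical counts
print(f"difference = {est:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Here the interval includes 0, so a 10-point difference is not statistically significant at the 5% level with 100 patients per arm.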


Superiority study

[Figure: 95% CI for the difference d relative to d = 0 (no difference); d > 0: + effect (test better); d < 0: − effect (control better)]


[Figure: confidence intervals for Treatment − Control relative to 0 and to the lower and upper equivalence boundaries, illustrating statistical superiority, statistical and clinical superiority, non-inferiority, equivalence and inferiority]
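The figure's reading rules can be written as comparisons of the CI limits with 0 and a margin δ. A sketch, assuming positive differences favour the treatment and a symmetric equivalence margin; treating “inferiority” as an upper limit below 0 is an assumption of this sketch:

```python
def interpret_ci(lower, upper, delta):
    """Conclusions supported by a CI for Treatment - Control (positive =
    treatment better), given a clinical margin delta > 0."""
    conclusions = []
    if lower > delta:
        conclusions.append("statistical and clinical superiority")
    elif lower > 0:
        conclusions.append("statistical superiority")
    if lower > -delta:
        conclusions.append("non-inferiority")
    if lower > -delta and upper < delta:
        conclusions.append("equivalence")
    if upper < 0:
        conclusions.append("inferiority")
    return conclusions


for ci in [(1.5, 3.0), (0.5, 2.0), (-0.5, 0.5), (-0.8, 1.5), (-2.0, -0.5)]:
    print(ci, interpret_ci(*ci, delta=1.0))
```

Note that superiority also implies non-inferiority, so both labels are returned together.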


Effect measurement scales

P0, P1: risks in the two groups

P0      P1      Abs. diff.   Rel. diff.   RR      OR
80.0%   75.0%   −5.0%        −6.3%        0.938   0.750
15.0%   10.0%   −5.0%        −33.3%       0.667   0.630
15.0%   14.0%   −1.0%        −6.7%        0.933   0.922
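The table rows can be recomputed from P0 and P1. A short sketch, taking P1 as the risk being compared with the reference risk P0 (an assumption about the column roles):

```python
def effect_measures(p0, p1):
    """Absolute difference, relative difference, RR and OR of risk p1 vs p0."""
    abs_diff = p1 - p0
    rel_diff = abs_diff / p0
    rr = p1 / p0
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return abs_diff, rel_diff, rr, odds_ratio


for p0, p1 in [(0.80, 0.75), (0.15, 0.10), (0.15, 0.14)]:
    d, rel, rr, or_ = effect_measures(p0, p1)
    print(f"P0={p0:.1%} P1={p1:.1%} diff={d:+.1%} rel={rel:+.1%} "
          f"RR={rr:.3f} OR={or_:.3f}")
```

The same absolute difference (−5%) corresponds to very different relative measures depending on the baseline risk.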


Calculation of RR and OR

RR or OR > 1: risk factor

RR or OR = 1: no ‘effect’

RR or OR < 1: protective factor


Calculation of RR and OR

[Figure: exposed and non-exposed groups, with diseased individuals marked]

Proportion diseased in exposed: 0.50; in non-exposed: 0.25 → RR = 2

Odds in exposed: 2/2 = 1; odds in non-exposed: 1/3 → OR = 3


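With a, b = diseased / not diseased among the exposed and c, d = diseased / not diseased among the non-exposed:

Rare disease

               Diseased   Not diseased
Exposed        8          1000000
Not exposed    4          1000000

$$OR = \frac{a/b}{c/d} = \frac{8/1000000}{4/1000000} = 2 \qquad RR = \frac{a/(a+b)}{c/(c+d)} = \frac{8/1000008}{4/1000004} \approx 2$$

Common disease

               Diseased   Not diseased
Exposed        524288     1000000
Not exposed    262144     1000000

$$OR = \frac{524288/1000000}{262144/1000000} = 2 \qquad RR = \frac{524288/1524288}{262144/1262144} \approx 1.656$$

When the disease is rare, OR ≈ RR; when it is common, the OR lies further from 1 than the RR.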
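The 2×2 calculations above, including the earlier small example (2 of 4 exposed and 1 of 4 non-exposed diseased), can be checked with a few lines. A sketch with a, b, c, d laid out as in the tables:

```python
def rr_and_or(a, b, c, d):
    """2x2 table: a, b = diseased / not diseased among exposed;
    c, d = diseased / not diseased among non-exposed."""
    rr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a / b) / (c / d)
    return rr, odds_ratio


small = rr_and_or(2, 2, 1, 3)                             # RR = 2, OR = 3
rare = rr_and_or(8, 1_000_000, 4, 1_000_000)              # RR close to OR
common = rr_and_or(524_288, 1_000_000, 262_144, 1_000_000)  # RR well below OR
for label, (rr, odds_ratio) in [("small", small), ("rare", rare), ("common", common)]:
    print(f"{label:6s} RR = {rr:.3f}  OR = {odds_ratio:.3f}")
```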


Let's be critical

Sometimes things are not what they seem


Let's be critical: obtaining the results

Is the statistical technique used appropriate?

[Figure: Survey A and Survey B scores (y axis 0–35) across time points 1–7]

• t-test • Repeated-measures ANOVA




Let's be critical

Claims without specification of results

Percentages without the denominator

Means without confidence intervals

Can I trust the value?


Let's be critical: a patient is advised to undergo surgery and asks about the probability of surviving.

The surgeon answers that, in the 30 operations he has performed, no patient has died.

What values of P(death) are compatible with this information, with 95% confidence?

Yet another example


Let's be critical: solution

Upper limit of the 95% CI for p = 0 with n = 30: Pr(X = 0; n = 30, p_s) = 0.025, that is, $(1 - p_s)^{30} = 0.025$, so $p_s = 1 - 0.025^{1/30} \approx 0.116$

The approximate (normal) solution does not work. Exact solution, based on the binomial:

{0; 0.116}

Even if mortality were 11.6%, no deaths would be observed in 30 operations with Pr = 0.025
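The exact limit has a closed form when no events are observed: solve $(1-p)^n = 0.025$. A sketch that also shows why the normal (Wald) approximation fails here, since its standard error is 0 when the observed proportion is 0:

```python
def exact_upper_limit_zero_events(n, tail=0.025):
    """Exact binomial (Clopper-Pearson) upper limit when 0 events occur in n
    trials: the p that solves (1 - p)**n = tail."""
    return 1 - tail ** (1 / n)


def prob_no_events(p, n):
    """Binomial Pr(X = 0) for n trials with event probability p."""
    return (1 - p) ** n


def wald_interval(x, n, z=1.96):
    """Normal-approximation interval p +/- z * sqrt(p(1-p)/n)."""
    p = x / n
    half_width = z * (p * (1 - p) / n) ** 0.5
    return p - half_width, p + half_width


upper = exact_upper_limit_zero_events(30)
print(f"exact 95% CI for P(death): 0 to {upper:.3f}")
print(f"Pr(0 deaths in 30 | p = {upper:.3f}) = {prob_no_events(upper, 30):.3f}")
print("Wald interval:", wald_interval(0, 30))  # collapses to (0.0, 0.0)
```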


Let's be critical: if data are available...

... they must not be wasted. Data that are ‘tortured’ well enough end up confessing.

!!! p < 0.05 !!!


... And what about the denominator? The famous fantastic dog


Because then, this is what happens


Key statistical issues

Multiplicity

Subgroups: interaction & confounding

Superiority and non-inferiority (and Δ)

Adjustment by covariates

Missing data

Others
– Interim analyses
– Meta-analysis vs one pivotal study
– Flexible designs


MULTIPLICITY


Roland Garros 1999 tournament, 1st round

Carlos Moyá vs Markus Hipfl

                               Moyá              Hipfl
Total games won                22                24
Total points won               147               146
1st serve                      62%               69%
Aces                           5                 3
Double faults                  4                 5
% won on 1st serve             63 of 95 = 66%    61 of 96 = 64%
% won on 2nd serve             25 of 58 = 43%    20 of 44 = 45%
Winners (including serve)      30                56
Unforced errors                62                75
Break points won               6 of 21 = 29%     6 of 27 = 22%
Net approaches                 48 of 71 = 68%    29 of 41 = 71%
Fastest serve                  200 KPH           193 KPH
Average 1st serve speed        157 KPH           141 KPH
Average 2nd serve speed        132 KPH           126 KPH

Set            1  2  3  4  5
Carlos Moyá    3  1  6  6  6
Markus Hipfl   6  6  4  4  4


Lancet 2005; 365: 1591–95

To say it colloquially, torture the data until they speak...


Torturing data…
– Investigators examine additional endpoints, manipulate group comparisons, do many subgroup analyses, and undertake repeated interim analyses.

– Investigators should report all analytical comparisons implemented. Unfortunately, they sometimes hide the complete analysis, handicapping the reader’s understanding of the results.

Lancet 2005; 365: 1591–95


Design → Conduct → Results


Multiplicity

K independent hypotheses: $H_{01}, H_{02}, \dots, H_{0K}$

S = number of significant results (p < α)

$$\Pr(S \ge 1 \mid H_{01} \cap H_{02} \cap \dots \cap H_{0K}) = 1 - \Pr(S = 0 \mid H_{0}) = 1 - (1 - \alpha)^K$$

K    Pr(S ≥ 1 | H0)     K    Pr(S ≥ 1 | H0)
1    0.0500            10    0.4013
2    0.0975            15    0.5367
3    0.1426            20    0.6415
4    0.1855            25    0.7226
5    0.2262            30    0.7854


Some examples

                      case A    case B    case C
Variables             2         5         5
Times                 2         4         4
Subgroups             2         3         3
Comparisons           1         1         3
Total tests           8         60        180
False positive rate   33.66%    95.39%    99.99%
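The table and the case totals follow from $1-(1-\alpha)^K$. A sketch, assuming independent tests at α = 0.05 (real endpoints are usually positively correlated, so the formula tends to overstate the rate) and taking the number of tests as variables × times × subgroups × comparisons:

```python
def fwer(k, alpha=0.05):
    """Pr(at least one false positive) among k independent tests, all H0 true."""
    return 1 - (1 - alpha) ** k


for k in (1, 2, 3, 4, 5, 10, 15, 20, 25, 30):
    print(f"K={k:2d}  Pr(S>=1) = {fwer(k):.4f}")

# number of tests = variables x times x subgroups x comparisons
cases = {"A": (2, 2, 2, 1), "B": (5, 4, 3, 1), "C": (5, 4, 3, 3)}
for name, (variables, times, subgroups, comparisons) in cases.items():
    k = variables * times * subgroups * comparisons
    print(f"case {name}: {k} tests, false positive rate = {fwer(k):.2%}")
```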


Multiplicity: Bonferroni correction (simplified version)

– K tests with an overall significance level of α: each test is tested at the α/K level

Example:
– 5 independent tests
– Global level of significance = 5%
– Each test should be tested at the 1% level

5% / 5 => 1%
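A sketch of the simplified Bonferroni rule with five hypothetical p-values:

```python
def bonferroni(p_values, alpha=0.05):
    """Simplified Bonferroni: test each of the k hypotheses at alpha / k."""
    k = len(p_values)
    threshold = alpha / k
    return threshold, [p <= threshold for p in p_values]


threshold, decisions = bonferroni([0.003, 0.012, 0.04, 0.20, 0.008])
print(threshold, decisions)
```

A p-value of 0.012 would be “significant” on its own, but not once the 5% is shared among five tests.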


But this is the simplified version for the general public


Cautionary Example: RCT to treat rheumatoid arthritis (Basic Clin Med 1981, 15: 445)

Several end-points repeated at various timepoints and various subdivisions: 850 comparisons in total

48 of these gave p-values < 0.05

But… we would expect 5% of 850 = 850/20 = 42.5 by chance alone

=> so finding 48 is not very impressive
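How surprising are 48? Treating the 850 comparisons as independent tests at α = 0.05 (an assumption; repeated measures on the same patients are correlated), the number of chance “significant” results is binomial. A sketch:

```python
from math import comb


def prob_at_least(k, n, p):
    """Binomial Pr(X >= k) for n trials with success probability p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_tests, alpha = 850, 0.05
expected = n_tests * alpha
prob = prob_at_least(48, n_tests, alpha)
print(f"expected by chance: {expected}")
print(f"Pr(>= 48 'significant' results by chance) = {prob:.2f}")
```

The probability comes out at around 0.2, so 48 is well within what chance alone produces.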


Some strategies to deal with multiple comparisons


Handling Multiplicity in Variables

Scenario 1: One Primary Variable
– Identify one primary variable; other variables are secondary

– Trial is positive if and only if the primary variable shows significant (p < 0.05), positive results




Scenario 2: Divide Type I Error

– Identify two (or more) co-primary variables

– Divide the 0.05 experiment-wise Type I error over these co-primary variables, e.g., 0.04 for the 1st and 0.01 for the 2nd co-primary variable

– Trial is positive if at least one of the co-primary variables shows significant, positive results



Scenario 3: Sequentially Rejective Procedure
– Identify n co-primary variables, e.g., n = 3
– Order the obtained p-values

Interpret the variable with the smallest p-value at the 0.05/3 level;

if significant, then interpret the variable with the 2nd smallest p-value at the 0.05/2 level;

if significant, then interpret the variable with the highest p-value at the 0.05 level.

Test procedure stops when a test is not significant.
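Holm's sequentially rejective procedure (smallest p-value first at α/n, then α/(n−1), and so on, stopping at the first non-significant test) can be sketched as follows; the p-values are hypothetical and decisions are returned in the original order of the variables:

```python
def holm(p_values, alpha=0.05):
    """Holm's step-down procedure; returns reject decisions in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):   # alpha/m, alpha/(m-1), ..., alpha
            reject[i] = True
        else:
            break   # stop at the first non-significant test
    return reject


print(holm([0.040, 0.010, 0.030]))   # 0.010 <= 0.05/3, but 0.030 > 0.05/2: stop
print(holm([0.020, 0.010, 0.040]))   # all three steps succeed
```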



Scenario 4: Hierarchy
– Prespecify a hierarchy among n co-primary variables

– All tested at the same level: interpret the 1st variable at the 0.05 level; if significant, then interpret the 2nd variable at the 0.05 level; if significant, then interpret the 3rd variable at the 0.05 level; …

Test procedure stops when a test is not significant.

– Trial is positive if the first co-primary variable shows a significant, positive result
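A sketch of the fixed-sequence (hierarchical) rule; the order of `p_values` is assumed to be the pre-specified hierarchy, and the p-values are hypothetical:

```python
def fixed_sequence(p_values, alpha=0.05):
    """Hierarchical testing: every variable at the same alpha, in the
    pre-specified order; once one test fails, all later ones are not rejected."""
    reject = []
    still_testing = True
    for p in p_values:
        still_testing = still_testing and p <= alpha
        reject.append(still_testing)
    return reject


print(fixed_sequence([0.01, 0.03, 0.20, 0.001]))  # the 4th is not claimed
```

The small fourth p-value cannot be claimed because the third test in the hierarchy failed.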


Secondary Variables

Secondary variables can only be claimed if and only if
– the primary variable shows significant results, and
– the comparisons related to the secondary variables are also protected under the same Type I error rate as the primary variable.

Similar procedures to those already discussed can be used to protect the Type I error


Handling Multiplicity in Treatments

Similar procedures to those for handling multiplicity in variables.

Additional procedures are available, mainly geared to very specific settings of the statistical hypotheses.
– Dunnett, Scheffé, REGW, Williams …


SUBGROUPS


Subgroups

Indiscriminate subgroup analyses pose serious multiplicity concerns. Problems reverberate throughout the medical literature. Even after many warnings, some investigators doggedly persist in undertaking excessive subgroup analyses.

Lancet 2000; 355: 1033–34

Lancet 2005; 365: 1657–61


Interaction

Overall: d = 5%

Age < 45 years: d = 0.7%; Age ≥ 45 years: d = 11.5%


Confounding factors

Overall: d = 6%

Non-smokers: d = 0%; Smokers: d = 0%


Subgroups & Simpson’s Paradox

                 Experimental   Control
                 n (%)          n (%)
ALL     Success  70 (70%)       60 (60%)
        Failure  30 (30%)       40 (40%)
        Total    100            100


Subgroups & Simpson’s Paradox (cont.)

                 Experimental   Control
                 n (%)          n (%)
MALE    Success  10 (33%)       24 (40%)
        Failure  20 (67%)       36 (60%)
        Total    30             60
FEMALE  Success  60 (86%)       36 (90%)
        Failure  10 (14%)       4 (10%)
        Total    70             40
ALL     Success  70 (70%)       60 (60%)
        Failure  30 (30%)       40 (40%)
        Total    100            100
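The reversal can be checked directly from the counts in the tables. A sketch:

```python
strata = {
    "male":   {"experimental": (10, 30), "control": (24, 60)},
    "female": {"experimental": (60, 70), "control": (36, 40)},
}


def rate(successes, total):
    return successes / total


def pooled(arm):
    """Collapse the strata: add successes and totals for one arm."""
    successes = sum(s[arm][0] for s in strata.values())
    total = sum(s[arm][1] for s in strata.values())
    return successes, total


for name, arms in strata.items():
    # control does better within each sex
    print(f"{name:6s}: experimental {rate(*arms['experimental']):.0%}, "
          f"control {rate(*arms['control']):.0%}")
# experimental does better once the sexes are pooled
print(f"ALL   : experimental {rate(*pooled('experimental')):.0%}, "
      f"control {rate(*pooled('control')):.0%}")
```

The reversal arises because the experimental arm contains mostly women, who do better in both arms, while the control arm contains mostly men.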


Subgroups

ISIS-2: Vascular death by star signs

Gemini/Libra
                  Aspirin   Placebo
Vascular death    150       147
Total             1357      1442
                  11.1%     10.2%
p = 0.42045; d = −0.9

Other star signs
                  Aspirin   Placebo
Vascular death    654       868
Total             7228      7157
                  9.0%      12.1%
p < 0.0001; d = 3.1

Interaction p = 0.019

Lancet 1988; 2: 349–60.


Changes from ISIS-2 results

Lancet 2005; 365: 1657–61


“The answer to a randomized controlled trial that does not confirm one’s beliefs is not the conduct of several subanalyses until one can see what one believes. Rather, the answer is to re-examine one’s beliefs carefully.”

– BMJ 1999; 318: 1008–09.


Lancet 2005; 365: 1657–61


The question is NOT: ‘Is the treatment effect in this subgroup statistically significantly different from zero?’

BUT… are there any differences in the treatment effect between the various subgroups?

The correct statistical procedures are either a test of heterogeneity or a test for interaction


Subgroups: Recommendations

– 1) Examine the global effect
– 2) Test for the interaction
– 3) Plan adjustments for confirmatory analyses
– 4) Some points which increase the credibility:
   Pre-specification
   Biologic plausibility


Lancet 2005; 365: 176–86


MULTIPLE INSPECTIONS


Interim Analyses in the CDP

[Figure: Z value (−2 to +2) by month of follow-up (10–100; Month 0 = March 1966, Month 100 = July 1974)]

Coronary Drug Project Mortality Surveillance. Circulation. 1973;47:I-1

http://clinicaltrials.gov/ct/show/NCT00000483;jsessionid=C4EA2EA9C3351138F8CAB6AFB723820A?order=23


Lancet 2005; 365: 1657–61


Types of sequential design

1) Sample size re-estimation

2) Group sequential methods

3) α-spending function approach

4) Repeated confidence intervals

5) Stochastic curtailment

6) Bayesian methods

7) Continuous boundaries (likelihood function)


Design NOT suited to a sequential method

[Figure: recruitment period and total duration of the trial; analysis?]


Design suited to a sequential method

[Figure: recruitment period and total duration of the trial, with analyses during the trial]


Group sequential methods

Pocock (1977)
– Repeated significance tests
– K = maximum number of inspections to be performed
– K fixed a priori
– Analysis with classical statistical tests (χ², t-test, ...)


Group Sequential Methods

Nominal critical values (z) and nominal significance levels (α') at analysis k, for K planned analyses:

K  k   O'Brien & Fleming   Peto            Pocock
       z      α'           z      α'       z      α'
2  1   2.782  0.005        2.576  0.010    2.178  0.029
   2   1.967  0.049        1.969  0.049    2.178  0.029
3  1   3.438  0.001        2.576  0.010    2.289  0.022
   2   2.431  0.015        2.576  0.010    2.289  0.022
   3   1.985  0.047        1.969  0.049    2.289  0.022
4  1   4.084  0.000        3.291  0.001    2.361  0.018
   2   2.888  0.004        3.291  0.001    2.361  0.018
   3   2.358  0.018        3.291  0.001    2.361  0.018
   4   2.042  0.041        1.969  0.049    2.361  0.018
5  1   4.555  0.000        3.291  0.001    2.413  0.016
   2   3.221  0.001        3.291  0.001    2.413  0.016
   3   2.630  0.009        3.291  0.001    2.413  0.016
   4   2.277  0.023        3.291  0.001    2.413  0.016
   5   2.037  0.042        1.969  0.049    2.413  0.016
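Why are the nominal levels below 0.05? Testing at 1.96 at every look inflates the overall Type I error. A sketch for K = 2 equally spaced analyses, where under H0 the two z statistics are bivariate normal with correlation $\sqrt{1/2}$; the overall error is obtained by numerical integration with Simpson's rule, an implementation choice of this sketch:

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def overall_alpha_two_looks(c1, c2, info_fraction=0.5, steps=2000):
    """Overall two-sided Type I error for critical values c1 (interim) and
    c2 (final) under H0.

    Z2 | Z1 = z1 ~ N(rho * z1, 1 - rho**2) with rho = sqrt(info_fraction).
    Pr(never crossing) is integrated over z1 in [-c1, c1] with composite
    Simpson's rule (steps must be even).
    """
    rho = sqrt(info_fraction)
    s = sqrt(1 - rho ** 2)
    h = 2 * c1 / steps
    total = 0.0
    for i in range(steps + 1):
        z1 = -c1 + i * h
        p_inside_final = (STD_NORMAL.cdf((c2 - rho * z1) / s)
                          - STD_NORMAL.cdf((-c2 - rho * z1) / s))
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * STD_NORMAL.pdf(z1) * p_inside_final
    return 1 - total * h / 3


for label, c1, c2 in [("naive 1.96 twice", 1.96, 1.96),
                      ("Pocock", 2.178, 2.178),
                      ("O'Brien & Fleming", 2.782, 1.967)]:
    print(f"{label:18s} overall alpha = {overall_alpha_two_looks(c1, c2):.4f}")
```

With 1.96 at both looks the overall error is about 8%; the K = 2 Pocock (2.178) and O'Brien & Fleming (2.782, 1.967) boundaries bring it back to about 5%.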


Two-sided triangular model


CPMP/EWP/482/99: PtC on Switching between Superiority and Non-Inferiority

&

CPMP/EWP/2158/99: PtC on the Choice of Delta


RANDOMIZATION & COVARIATES


Adjustment

The objective should not be to compensate for imbalance (randomisation) but to improve precision

Avoid adjusting for post-randomisation variables

In RCTs, never use this widespread strategy: “adjust for any baseline variable that is significant (at the 5% or 10% level)”


Stratification

A priori

May desire to have treatment groups balanced with respect to prognostic or risk factors (covariates)

For large studies, randomization “tends” to give balance; for smaller studies a better guarantee may be needed

Useful only to a limited extent (especially for small trials), but avoid too many variables (i.e. many empty or partly filled strata)
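Stratified randomisation is commonly implemented as permuted blocks within each stratum. A sketch; the stratum names, block size of 4 and seed are hypothetical choices:

```python
import random


def stratified_block_randomisation(strata, n_per_stratum, block_size=4, seed=0):
    """Permuted-block 1:1 randomisation run separately within each stratum."""
    if block_size % 2:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)
    schedule = {}
    for stratum in strata:
        allocations = []
        while len(allocations) < n_per_stratum:
            block = ["A", "B"] * (block_size // 2)   # balanced block
            rng.shuffle(block)                       # random order within block
            allocations.extend(block)
        schedule[stratum] = allocations[:n_per_stratum]
    return schedule


lists = stratified_block_randomisation(
    ["centre 1 / age < 65", "centre 1 / age >= 65"], n_per_stratum=12)
for stratum, allocation in lists.items():
    print(f"{stratum:22s} {''.join(allocation)}")
```

Each additional stratification factor multiplies the number of lists, which is why many factors leave strata empty or only partly filled.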


Testing for “baseline homogeneity”

With proper randomisation, all observed differences are known with certainty to be due to chance.

We must not test for it: there is no alternative hypothesis whose truth can be supported by such a test.

If significant, the estimator is still unbiased

Balance:
– Decreases the variance and increases the power.
– It has no effect on the type I error.


Observed imbalance… NEVER justifies post-hoc adjustment:
– Randomization is more important
– The treatment effect is unbiased without adjustment (randomization)
– The Type I error level already accounts for “chance error”
– Post-hoc: data-driven analyses
– Multiplicity issues: allowing a post-hoc adjustment increases the type I error


Adjusted Analyses

‘When the potential value of an adjustment is in doubt, it is often advisable to nominate the unadjusted analysis as the one for primary attention, the adjusted analysis being supportive.’


Adjustment for covariates

Defined a priori

The appearance of baseline imbalances does NOT justify adjustment per se:
– Randomisation is given more importance
– Danger of post-hoc analyses
– Multiplicity

As a general a priori strategy, adjusting for significant baseline variables (e.g., p < 0.1 or p < 0.05) is NOT valid


Definition of the different populations of a study


Objective: To evaluate the efficacy of a weight-reduction programme compared with usual advice

Design: Randomised clinical trial

Candidates: 790

Obese: 320

Intervention group: 161

Control group: 159

Refusal: 59; Spontaneous request: 54

Completed: 102; Completed: 105



            Intervention group   Control group
Option A    161                  159
Option B    102                  105
Option C    59                   54
Option D    156                  164


MISSING DATA


Ex: LOCF & linear extrapolation

[Figure: ADAS-Cog score (y axis 4–36; higher = worse, lower = better) over time (0–18 months), with the bias between LOCF and linear-regression extrapolation marked]


Ex: Early drop-out due to AE

[Figure: ADAS-Cog score (higher = worse, lower = better) over time (0–18 months), Placebo vs Active]

Bias: favours Active


Ex: Early drop-out due to lack of efficacy

[Figure: ADAS-Cog score (higher = worse, lower = better) over time (0–18 months), Placebo vs Active]

Bias: favours Placebo


Drop-outs and missing data

[Figure: after randomisation (RND), drop-outs in arms A and B between baseline, visit 1, visit 2 and the last visit occur with different frequencies (≠ frequencies)]


Drop-outs and missing data

[Figure: after randomisation (RND), drop-outs in arms A and B between baseline, visit 1, visit 2 and the last visit occur at different times (≠ timing)]


Missing data and incorrect use of analysis populations (1)

Design: Surgery vs medical treatment in bilateral carotid stenosis (Sackett et al., 1985)

Primary endpoint: number of patients presenting with TIA, stroke or death

Distribution of patients:
– Randomised patients: 167
– Surgical treatment: 94
– Medical treatment: 73

– Patients who did not complete the study because of a stroke during the initial phase of hospitalisation:

Surgical treatment: 15 patients; Medical treatment: 1 patient



First analysis performed:

Per-Protocol (PP) population: patients who completed the study

Analysis
– Surgical treatment: 43 / (94 − 15) = 43 / 79 = 54%
– Medical treatment: 53 / (73 − 1) = 53 / 72 = 74%
– Risk reduction: 27%, p = 0.02

The definitive analysis is as follows:

Intention-to-Treat (ITT) population: all randomised patients

Analysis
– Surgical treatment: 58 / 94 = 62%
– Medical treatment: 54 / 73 = 74%
– Risk reduction: 18%, p = 0.09 (PP: 27%, p = 0.02)

Conclusions: The correct analysis population is the ITT. Surgical treatment has not been shown to be significantly superior to medical treatment.
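The two analyses can be recomputed from the counts, counting the early strokes (15 and 1) as events in the ITT analysis, as the 58/94 and 54/73 figures imply. A sketch; the test is a plain two-proportion z-test without continuity correction (the original test is not stated), so the percentages and p-values may differ somewhat from those quoted:

```python
from math import sqrt
from statistics import NormalDist


def compare(events_1, n_1, events_2, n_2):
    """Risks, relative risk reduction (1 - RR) and a two-sided z-test p-value
    (pooled-variance normal approximation, no continuity correction)."""
    p1, p2 = events_1 / n_1, events_2 / n_2
    rrr = 1 - p1 / p2
    pooled = (events_1 + events_2) / (n_1 + n_2)
    se = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p1, p2, rrr, p_value


pp = compare(43, 94 - 15, 53, 73 - 1)    # per protocol: early strokes excluded
itt = compare(43 + 15, 94, 53 + 1, 73)   # ITT: early strokes counted as events
for label, (p1, p2, rrr, p) in (("PP", pp), ("ITT", itt)):
    print(f"{label:3s}: surgery {p1:.0%}, medical {p2:.0%}, RRR {rrr:.0%}, p = {p:.3f}")
```

Excluding the 15 early surgical strokes is what moves the comparison from non-significant to “significant”.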


Handling of MD

Methods for imputation:
– Many techniques
– No gold standard for every situation
– In principle, all methods may be valid:
   Simple to more complex methods:
   – From LOCF to multiple imputation methods
   – Worst case, “mean methods”
   – Multiple imputation
   But their appropriateness has to be justified

Statistical approaches less sensitive to MD:
– Mixed models
– Survival models

They assume no relationship between treatment and the missing outcome, and generally this cannot be assumed.
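The simplest of these methods, LOCF, can be sketched in a few lines; the scores are hypothetical ADAS-Cog values (higher = worse). Freezing the score at the drop-out value is what produces the biases in the drop-out examples when the disease keeps progressing:

```python
def locf(visits):
    """Last observation carried forward: replace each missing value (None)
    with the most recent observed value for the same patient."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# hypothetical ADAS-Cog scores at months 0, 6, 12 and 18;
# this patient drops out after month 6
patient = [20, 23, None, None]
print(locf(patient))  # [20, 23, 23, 23]: the score is frozen at the drop-out value
```

A missing baseline cannot be filled this way and stays missing.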


CONCLUSION





JAMA 2002; 287: 1807-1814


Effect Size & Sample Size

Relative effect size (%)   Power* (%)   Absolute difference (mmHg)
0%                         4.9%         0.0
10%                        5.9%         0.2
20%                        8.5%         0.4
30%                        13.3%        0.6
40%                        20.2%        0.8
50%                        28.2%        1.0
60%                        39.3%        1.2
70%                        49.3%        1.4
80%                        61.1%        1.6
90%                        71.0%        1.8
100%                       80.4%        2.0

*Statistical power assuming constant variability (SD = 20 mmHg)
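The shape of the table can be reproduced analytically. The table does not state the sample size, and its 4.9% at 0% effect suggests simulation, so this sketch assumes a hypothetical 1570 patients per group (about 80% power for 2 mmHg with SD 20 mmHg) and a two-sided z-test; values will be close to, not identical with, the table:

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(delta, sd, n_per_group, alpha=0.05):
    """Power of a two-sided z-test comparing two means (normal approximation)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = delta / (sd * sqrt(2 / n_per_group))
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


N = 1570  # hypothetical n per group
for relative in range(0, 101, 10):
    delta = 2.0 * relative / 100          # absolute difference in mmHg
    print(f"{relative:3d}%  {delta:.1f} mmHg  power = {power_two_means(delta, 20, N):.1%}")
```

At a 0% effect the “power” is simply the 5% Type I error.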





NON-INFERIORITY TRIALS

NEED
– Legal implications.
– Methodological implications.
– Ethical and practical limitations on the use of placebo.
– Practical limitations on superiority against an active control.
– Need for comparative information.
– Possible added value.








Power approach (classical test + power calculation)













Lancet 2001,356: 1668-75







Added value

– Dosage: once daily
– Route: oral
– Safety: adverse events
– Special populations: elderly, paediatrics
– Interactions


Equivalence Trials

Bioequivalence trials (generic product vs marketed product)

Our product is not worse and may offer other advantages (safety, dosing convenience…)
– Non-inferiority


SUPERIORITY STUDY

[Figure: 95% CI for the difference d relative to d = 0 (no difference); d > 0: + effect (test better); d < 0: − effect (control better)]


INTERVAL ESTIMATION (SUPERIORITY STUDY)

Statistically significant

[Figure: 95% CI for d lying entirely on one side of d = 0 (no difference); d > 0: + effect, d < 0: − effect]



INTERVAL ESTIMATION (SUPERIORITY STUDY)

Statistically significant with P = 0.05 (just at the limit)

[Figure: 95% CI for d with one limit exactly at d = 0 (no difference); d > 0: + effect, d < 0: − effect]



EQUIVALENCE STUDY

[Figure: region of clinical equivalence between −δ and +δ around d = 0 (no difference); d > 0: + effect, d < 0: − effect]

Delta (δ):
• the largest difference without clinical relevance, or
• the smallest difference with clinical relevance


EQUIVALENCE

[Figure: confidence intervals around 0, illustrating equivalence and non-equivalence]


THERAPEUTIC NON-INFERIORITY

[Figure: confidence intervals relative to 0 and −δ, illustrating non-inferiority and lack of non-inferiority]



[Figure: main efficacy end-point (y axis 0–100%): Active 40% vs Placebo 10%, a 30% difference; bars labelled B, A and P, annotated “1/2? 1/3?”]



[Figure: main efficacy end-point for Active 1–3 and Placebo 1–3 (y axis 0–100%); labelled values 40%, 15%, 45%, 40%, 20%, 10%]



[Figure: main efficacy end-point for Active 1–3 and Placebo 1–3 (y axis 0–100%), each with several labelled values between 3% and 65%]



[Figure: main efficacy end-point (y axis 0–100%) for Active REF, Placebo and Active Test; labelled values 40%, 10% and 30%]

