Reverend Bayes Sample Sizes and Statistical Analysis

Statisticians

Statisticians are generally ‘power’ mad; that is, they want to minimise uncertainty around any effect estimate.

There are two camps: frequentists and Bayesians.

Frequentists

The philosophy underlying most of our statistics is frequentist. Frequentists set up a ‘null’ hypothesis of ‘no difference’, and the experiment is designed to test it.

They see themselves as more ‘objective’ and ‘scientific’ than Bayesians.

Frequentist null hypothesis

This is nonsense! If we truly believed in the null hypothesis we would not undertake a trial. We would just choose the cheapest treatment, or give treatments according to patient preferences.

We usually have an ‘idea’ about the likely effect of a treatment.

Reverend Bayes

A minister who lived in the 18th century but dabbled in statistics. He produced Bayes’ theorem, which includes prior beliefs in statistical calculations.

His work was not published until after his death (his essay appeared in 1763, two years after he died): truly ‘publish or perish’.

God and statistics

Frequentists believe the ‘truth is out there’ and that we are getting sample estimates of the truth.

Bayes believed only God can know the ‘truth’ and that, as mere mortals, we can only gain probability estimates of it, which is why he developed Bayes’ theorem.

Bayesians

Until recently, Bayes’ approach was used in health research only in diagnostic testing, although it was widely used in other areas. It was not used more widely in health research partly because of computational difficulties, but also because many think it is ‘unscientific’.

More recently the computational problems have largely been solved, and there is increased interest in using the method.

Bayesian statistics

The Bayesian approach is attractive because it is similar to everyday decision making: we use prior experience to make a judgement, and then use new data to inform future decisions.
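The sketch below shows this ‘prior experience plus new data’ cycle using the conjugate Beta-binomial model. This is one standard Bayesian model for a yes/no outcome; it is not a method taken from this presentation. Belief about a success rate is held as a Beta(a, b) distribution, and it is updated by adding the observed successes and failures to a and b. The class name and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief about a success rate, expressed as a Beta(a, b) distribution."""

    a: float  # prior 'successes' (plus any observed successes)
    b: float  # prior 'failures' (plus any observed failures)

    def mean(self) -> float:
        """Expected success rate under this belief."""
        return self.a / (self.a + self.b)

    def update(self, successes: int, failures: int) -> "BetaBelief":
        """Conjugate update: Beta prior + binomial data gives a Beta posterior."""
        return BetaBelief(self.a + successes, self.b + failures)


# Hypothetical prior: about a 60% success rate, worth roughly 10 patients of evidence.
prior = BetaBelief(6, 4)
# Hypothetical new data: 12 successes out of 30 patients.
posterior = prior.update(successes=12, failures=18)
print(f"prior mean {prior.mean():.2f}, posterior mean {posterior.mean():.2f}")
```

Today's posterior becomes tomorrow's prior. Calling `update` again with the next batch of data continues the cycle.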

Bayesians vs Frequentists

When we seek to observe a 50% increase or decrease, this is in essence a Bayesian approach, as we have a prior belief that A may be 50% more effective than B.

If we truly believed the null hypothesis, the sample size needed to ‘prove’ no difference would be infinite.

Prior beliefs

Bayesians want to be more explicit about prior beliefs and to include these in the design and analysis.

Data would have to be particularly strong to overturn a prior belief, whereas weaker data would suffice to confirm it.
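The sketch below makes ‘strong’ and ‘weak’ priors concrete. It assumes that both the prior and the trial estimate are normally distributed, for example on the log relative risk scale. Under that assumption the posterior mean is a precision-weighted average, so a tight prior barely moves in response to a modest trial. The function name and numbers are hypothetical.

```python
import math


def normal_posterior(prior_mean, prior_sd, data_mean, data_se):
    """Combine a normal prior with a normally distributed estimate.

    Precision is 1 / variance; the posterior mean weights each source by its
    precision, and the posterior precision is the sum of the two.
    """
    w_prior = 1 / prior_sd**2
    w_data = 1 / data_se**2
    post_mean = (w_prior * prior_mean + w_data * data_mean) / (w_prior + w_data)
    post_sd = math.sqrt(1 / (w_prior + w_data))
    return post_mean, post_sd


# Hypothetical: the prior favours benefit (log RR = -0.3); a trial suggests harm (log RR = +0.2, SE 0.25).
strong_mean, _ = normal_posterior(-0.3, 0.1, 0.2, 0.25)  # strong (tight) prior
weak_mean, _ = normal_posterior(-0.3, 1.0, 0.2, 0.25)    # weak (vague) prior
print(f"strong prior -> {strong_mean:.2f}, weak prior -> {weak_mean:.2f}")
```

With the strong prior, the posterior stays on the ‘benefit’ side (about −0.23). With the weak prior, it moves most of the way to the trial result (about 0.17). This is the mechanism by which a strong but wrong prior can resist contrary evidence.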

Bayesian problems

Bayesians argue that one should keep doing a study until the results are credible enough to stop the trial.

The problem is that one cannot really plan a trial without a sample size set in advance.

Not scientific

Prior beliefs may be so incorrect that they mislead research. There was a strong prior belief that HRT prevented heart disease, which was shown to be untrue. Small trials showing this to be a fallacy would not have overturned such a strong belief.

GRIT Trial – Bayesian trial

The design of this trial included prior beliefs about the effectiveness of early or late delivery of babies.

Data were analysed every 6 months (without p values) and presented to clinicians so that they could change their minds and either randomise more patients or stop randomising.

Bayesian analysis

Expect to see more studies using Bayesian methods in the future. This is a rapidly growing area of statistical and economic research.

Statistical outcomes

There are two types of outcome measure.

Dichotomous = yes/no; dead/alive; passed/failed.

Continuous = blood pressure; weight; exam scores.

Binary outcomes

Basically, in an RCT we compare the percentages in the two groups. If the percentages are significantly different, this is likely to be due to the intervention, because randomisation means the groups should otherwise differ only by chance.
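One common way to test whether two percentages differ is the two-proportion z-test, sketched below with the Python standard library. It relies on a normal approximation, so for small numbers of events an exact test would be preferable. The function name and counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference between two proportions (pooled SE)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)  # common rate if there is no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a - p_b, z, p_value


# Hypothetical trial: 40/100 improved on treatment A vs 25/100 on treatment B.
diff, z, p = two_proportion_z_test(40, 100, 25, 100)
print(f"difference = {diff:.0%}, z = {z:.2f}, p = {p:.3f}")
```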

Continuous outcomes

Scores such as blood pressure, quality of life and test scores are compared. Usually the mean scores are compared, although sometimes the medians are used.

Usually the scores have a ‘normal’ or near-normal distribution around the mean.

Standard deviation

This is calculated by taking the difference of each individual score from the mean, squaring these differences, adding them up and dividing by the number of observations (or by the number of observations minus one when estimating from a sample). The square root of this is the SD.
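The calculation can be written out directly. This sketch follows the steps above. As an addition, it offers the option of dividing by n − 1, the usual choice when estimating the SD from a sample. It cross-checks the result against Python's `statistics` module. The function name and scores are made up.

```python
import math
import statistics


def standard_deviation(scores, sample=True):
    """Square the deviations from the mean, sum them, divide, then take the square root.

    sample=True divides by n - 1 (sample estimate, matches statistics.stdev);
    sample=False divides by n (population SD, matches statistics.pstdev).
    """
    n = len(scores)
    mean = sum(scores) / n
    squared_diffs = [(x - mean) ** 2 for x in scores]
    divisor = n - 1 if sample else n
    return math.sqrt(sum(squared_diffs) / divisor)


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5
print(standard_deviation(scores, sample=False), statistics.pstdev(scores))
print(standard_deviation(scores), statistics.stdev(scores))
```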

Effect sizes

The effect size is the difference between means divided by the standard deviation.

If students in Group A have a mean score of 60 vs 50 in Group B and the standard deviation is 20, the effect size is 0.5 (10/20).

Few new health care treatments achieve effect sizes GREATER than 0.5.
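The worked example translates directly into code. The slide divides by a single SD. When the two groups have different SDs, a common choice is the pooled SD, as in Cohen's d; that choice is an assumption here, not something stated on the slide. The group sizes are hypothetical.

```python
import math


def pooled_sd(sd_a, n_a, sd_b, n_b):
    """Pooled standard deviation of two groups, weighted by degrees of freedom."""
    return math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2))


def effect_size(mean_a, mean_b, sd):
    """Standardised effect size: difference in means divided by the SD."""
    return (mean_a - mean_b) / sd


# Slide example: means of 60 and 50, SD of 20 in both groups (group sizes hypothetical).
sd = pooled_sd(20, 64, 20, 64)
print(effect_size(60, 50, sd))  # 0.5
```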

Relative risks etc.

Binary outcomes are often described as relative risks or odds ratios. For example, if there are 10/100 events in group A versus 5/100 in group B, the relative risk (RR) of A vs B is 2 (10%/5%) and of B vs A is 0.5 (5%/10%).

Odds ratios give similar results to relative risks when events are rare.

A confidence interval that includes 1 means the result is not statistically significant.
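The slide's example can be checked with a short sketch. The slide gives no interval, so the 95% confidence interval for the relative risk here uses the usual log-scale approximation as an assumption. The function names are hypothetical.

```python
import math


def risk_ratio(a_events, a_n, b_events, b_n, z=1.96):
    """Relative risk of A vs B with an approximate 95% CI on the log scale."""
    rr = (a_events / a_n) / (b_events / b_n)
    se_log = math.sqrt(1 / a_events - 1 / a_n + 1 / b_events - 1 / b_n)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


def odds_ratio(a_events, a_n, b_events, b_n):
    """Odds ratio of A vs B, where odds = events / non-events."""
    return (a_events / (a_n - a_events)) / (b_events / (b_n - b_events))


rr, lower, upper = risk_ratio(10, 100, 5, 100)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")  # RR = 2.00 (95% CI 0.71 to 5.64)
print(f"OR = {odds_ratio(10, 100, 5, 100):.2f}")              # OR = 2.11
```

A has twice the risk of B, but the interval passes through 1. With only 100 per group, this difference is therefore not statistically significant. The odds ratio (2.11) is close to the RR because events are fairly uncommon.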

Sample sizes for trials

The bigger the better – size matters in trials. Most trials approach sample size estimation using a frequentist approach.

Background

Many trials are ‘underpowered’, that is, they are too small to detect a difference that is important. Failing to detect a real difference is referred to as a Type II error.

At least 30% of trials published in major general journals are underpowered. The problem is worse in other journals.

Meta-analysis of hip protectors (ranked by size)

[Figure: hip protector trials grouped into ‘shell’-type protectors and energy-absorbing or unknown types, with each trial's setting labelled community or nursing home.]

Hip protector trials

All trials (bar ours, of course) were underpowered to detect even large (e.g., 50%) reductions in hip fractures.

Small positive trials tended to be published, giving an overestimate of the benefit.

Sample size estimation

Textbooks usually recommend the following approach to sample size estimation: define a clinically important difference in outcome between treatments; then design an experiment that is sufficiently large to show that such a difference is ‘statistically significant’.

Clinical significance

The first problem is defining what is ‘clinically significant’. This is usually unclear.

Any difference in deaths, for example, is clinically significant. But powering a trial to detect a reduction in mortality of a single death would require an almost infinitely large study.

Epidemiological significance

A more common justification of sample size is the effect sizes observed in epidemiological studies (which may be overestimates), or in meta-analyses of smaller trials (which again may overestimate the effect because of publication bias).

Statistical significance

What is statistical significance? Tradition in medical research states that p = 0.05 or lower is significant. Yet the difference between p = 0.05 and p = 0.06 is trivial, even though one is ‘significant’ and the other is not.

Other disciplines, such as economics, sometimes use p = 0.10.

P values

Originally, p values (introduced by Karl Pearson and popularised by Fisher) were intended as a guide, not as a cut-off. The idea was that, given what was known about a treatment (side-effects etc.), the p value would add extra information as to whether one should accept the finding.

But p = 0.05 has become set in stone.

Fallacy of P values

Suppose a treatment effect is not statistically significant (say p = 0.20) and the null hypothesis is accepted (i.e., we conclude there is no difference). We are probably making the wrong decision. A p value is not the probability that the null hypothesis is true, and a non-significant result is not evidence of no difference: the data still favour one treatment.

Really, one should go for the treatment that the data favour, irrespective of the p value.
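A Bayesian reading illustrates the point; this is not the presenter's own calculation. Assume the estimate is normally distributed and the prior is flat. The posterior probability that the true effect lies in the direction the data favour is then 1 − p/2 for a two-sided p value. The sketch below assumes exactly that model. The estimate and SE are hypothetical values chosen to give p = 0.20.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and SD 1


def p_value_and_direction_probability(estimate, se):
    """Two-sided p value, and P(true effect has the same sign as the estimate).

    Assumes a normal likelihood and a flat prior, so the posterior for the true
    effect is Normal(estimate, se**2).
    """
    z = abs(estimate) / se
    p_two_sided = 2 * (1 - STD_NORMAL.cdf(z))
    prob_same_direction = STD_NORMAL.cdf(z)
    return p_two_sided, prob_same_direction


se = 0.5
estimate = STD_NORMAL.inv_cdf(0.90) * se  # hypothetical effect giving p = 0.20
p, prob = p_value_and_direction_probability(estimate, se)
print(f"p = {p:.2f}, P(effect favours this treatment) = {prob:.2f}")
```

Under these assumptions, a ‘non-significant’ p = 0.20 still corresponds to about a 90% probability that the favoured treatment has the better effect. That supports going with the treatment the data favour.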

Significance

BOTH clinical and statistical significance are often arbitrary constructs.

‘Economic significance’ can be less arbitrary, as one can ascertain an economic difference that makes sense.

Demonstrating cost neutrality, for example, is a meaningful endpoint.

Economic significance

For example, a randomised trial of two methods of endometrial resection was powered to detect a 15% difference in ‘satisfaction’.

The important clinical outcome was the re-treatment rate. An economically significant difference was about 8% in re-treatment rates, as this would be cost saving.

Torgerson & Campbell BMJ 2000;697.

Endometrial resection

The trial was only sufficiently powerful to show a 12% difference in re-treatment rates.

The trial showed a 4% difference (95% CI –4% to 11%) but could not exclude an 8% difference.

Pinion et al. BMJ 1994;309:979-83.

Forget theory

What normally happens is that the clinician says to the statistician, ‘I can get 70 patients into a trial in a year.’ The statistician says it ‘needs to be bigger’, so the clinician finds a couple of mates who can add 140 more. The statistician then calculates the difference that 210 participants can show.
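The statistician's final step can be sketched by inverting the rule of thumb from the sample-size slide: total N = 32/d² for 80% power, or 42/d² for 90% power, at two-sided p = 0.05. The function name is hypothetical.

```python
import math


def detectable_effect_size(total_n, multiplier=32):
    """Smallest standardised difference d a two-arm trial of total_n can detect.

    Inverts N = multiplier / d**2 (multiplier 32 for 80% power, 42 for 90%).
    """
    return math.sqrt(multiplier / total_n)


for n in (70, 210):
    print(f"{n} participants -> detectable effect size {detectable_effect_size(n):.2f}")
```

On this rule, 70 participants could only detect an effect size of about 0.68, and 210 about 0.39. The target difference is driven by recruitment rather than by what matters clinically.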

What should be done?

For a continuous outcome (e.g., quality of life, blood pressure), we should aim to be able to detect AT LEAST half a standardised effect size (0.5), which needs 128 participants in total for 80% power.

Ideally we should be able to detect a somewhat smaller difference.

For a dichotomous outcome we should have enough power to detect a halving or doubling.

Attrition and clustering

Do not forget to boost the sample size to take into account loss to follow-up.

Depending on the patient group, this might range from 5% to 30%.

Finally, if it is a cluster trial, the total sample size needs to be inflated.
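This is a minimal sketch of the attrition adjustment. It assumes the common convention of dividing the required number by the expected retention rate (1 − loss rate). Multiplying by (1 + loss rate) instead would under-recruit, noticeably so at 30% loss. The function name is hypothetical.

```python
import math


def inflate_for_attrition(n_required, loss_rate):
    """Number to recruit so that about n_required remain after loss to follow-up."""
    if not 0 <= loss_rate < 1:
        raise ValueError("loss_rate must be at least 0 and less than 1")
    return math.ceil(n_required / (1 - loss_rate))


# Range of losses from the slide, applied to a trial that needs 128 analysable participants.
for loss in (0.05, 0.10, 0.20, 0.30):
    print(f"{loss:.0%} loss -> recruit {inflate_for_attrition(128, loss)}")
```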

How to calculate a sample size

This is easy. Lots of tables or computer programs will do this. For continuous outcomes a simple formula for the total sample size is: take the standardised difference, square it, and divide it into 32 (80% power) or 42 (90% power), assuming a two-sided significance level of 0.05.

E.g., 0.5 squared is 0.25; 32/0.25 = 128 or 42/0.25 = 168.
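The multipliers 32 and 42 come from the normal approximation. With two equal arms and a two-sided significance level of 0.05, the total sample size is 4 × (z for 0.975 + z for the power)² / d². Since 4 × (1.96 + 0.84)² ≈ 31.4 and 4 × (1.96 + 1.28)² ≈ 42.0, the rule of thumb follows. The sketch below compares the rule with the formula. Both are approximations, and programs using the t distribution may give slightly larger numbers. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def total_n_rule_of_thumb(d, power=0.80):
    """Slide rule: total N = 32 / d**2 (80% power) or 42 / d**2 (90% power)."""
    multiplier = {0.80: 32, 0.90: 42}[power]
    return multiplier / d**2


def total_n_normal_approx(d, power=0.80, alpha=0.05):
    """Total N for two equal arms, two-sided alpha, normal approximation."""
    z = NormalDist().inv_cdf
    per_arm = 2 * (z(1 - alpha / 2) + z(power)) ** 2 / d**2
    return 2 * math.ceil(per_arm)  # round each arm up to a whole participant


for power in (0.80, 0.90):
    print(power, total_n_rule_of_thumb(0.5, power), total_n_normal_approx(0.5, power))
```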

For binary outcomes

Look at sample size tables or use a program. As a rule of thumb, about 800 participants are needed for 80% power to show a 10 percentage point difference between 40% and 50%, or between 50% and 60%. To detect a 5 percentage point difference, quadruple the sample size.
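The rule of thumb can be checked with the standard normal-approximation formula for comparing two proportions, sketched below. It uses unpooled variance and no continuity correction; other formulas give slightly different numbers. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, power=0.80, alpha=0.05):
    """Approximate per-group n to detect proportions p1 vs p2 (two-sided alpha)."""
    z = NormalDist().inv_cdf
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z(1 - alpha / 2) + z(power)) ** 2 * variance / (p1 - p2) ** 2)


for p1, p2 in [(0.40, 0.50), (0.50, 0.60), (0.40, 0.45)]:
    n = n_per_group_two_proportions(p1, p2)
    print(f"{p1:.0%} vs {p2:.0%}: {n} per group, {2 * n} in total")
```

This gives roughly 385 per group (about 770 in total) for a 10 percentage point difference, close to the ‘about 800’ rule. For a 5 point difference it needs roughly four times as many.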

Cluster trials

For cluster trials we need to inflate the sample size to take into account the intracluster correlation coefficient (ICC) of the clusters. The design effect is 1 + (cluster size − 1) × ICC.

For example, in an RCT of adult literacy classes the mean class size is 8, and a previous trial gives an ICC for reading of 0.3.

Cluster sample size

We want to detect a 0.5 difference, which for an individually randomised RCT needs 128 participants for 80% power. The cluster size is 8; subtract 1 to get 7.

7 × 0.3 = 2.1; 2.1 + 1 = 3.1 (the design effect); 128 × 3.1 ≈ 397 participants, or about 50 clusters with a mean of 8 per cluster.
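The calculation on these two slides can be written as a small function. It uses the design effect 1 + (m − 1) × ICC for a mean cluster size m, and rounds participants and clusters up to whole numbers. The function names are hypothetical.

```python
import math


def design_effect(mean_cluster_size, icc):
    """Variance inflation for cluster randomisation: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


def cluster_trial_size(n_individual, mean_cluster_size, icc):
    """Inflate an individually randomised sample size; return (participants, clusters)."""
    participants = math.ceil(n_individual * design_effect(mean_cluster_size, icc))
    clusters = math.ceil(participants / mean_cluster_size)
    return participants, clusters


# Adult literacy example: 128 participants needed individually, classes of 8, ICC = 0.3.
print(cluster_trial_size(128, 8, 0.3))  # (397, 50)
```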

Analysis

The first analysis that many people do is to compare groups at baseline. Typically many comparisons are made; for example, a trial report in the most recent JAMA (Feb 4, 2004) shows this typical baseline comparison table.

[Table: Baseline tests (n = 24 tests)]

Baseline testing

Of the 24 comparisons, 3 were ‘statistically’ significant (i.e., p < 0.05).

What should we do with this information? Has randomisation failed? It is useless information and an exercise in futility.

Baseline testing

Assuming randomisation has not been subverted, which in this case looks unlikely, any differences will have occurred ‘by chance’: they are random differences.
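How surprising are 3 ‘significant’ results out of 24? The rough sketch below uses the binomial distribution. It assumes, unrealistically, that the 24 baseline tests are independent and that every null hypothesis is true. Correlated baseline variables would change the exact figure. The function name is hypothetical.

```python
from math import comb


def prob_at_least_k_significant(n_tests, k, alpha=0.05):
    """P(at least k of n_tests independent null tests reach p < alpha)."""
    return sum(
        comb(n_tests, j) * alpha**j * (1 - alpha) ** (n_tests - j)
        for j in range(k, n_tests + 1)
    )


expected = 24 * 0.05  # about 1.2 'significant' differences expected by chance
p_three_plus = prob_at_least_k_significant(24, 3)
print(f"expected {expected:.1f}; P(3 or more) = {p_three_plus:.2f}")
```

On these assumptions, 3 or more chance ‘significant’ differences would turn up in about 12% of trials. They say nothing about whether randomisation failed.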

What is wrong with baseline testing?

Baseline testing will ALWAYS throw up chance differences. This can mislead the credulous into believing there is something ‘wrong’ with the study. It can also mislead some statisticians into ‘correcting’ these baseline imbalances in the analysis.

Baseline variables: what should be done?

Before the study starts, specify in advance the important covariates to be used in the analysis (e.g., centre, age) and adjust for these IRRESPECTIVE of whether or not randomisation balances them out.

Interim data analysis

This is where the trial is analysed BEFORE completion. It is usually done for ethical reasons, so that a trial can be stopped early if there is overwhelming benefit or harm.

The Women’s Health Initiative trial undertook an interim analysis, and the trial was stopped because of harm.

Dangers of interim analysis

Sample size calculations assume a single analysis. Repeated looks at the data WILL eventually show significant differences, by chance, even when no difference exists.

The temptation is to stop the trial early when statistical significance is achieved.

This could be a chance finding.
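A small simulation, not taken from the presentation, shows the danger. It assumes there is no true difference, that equally sized batches of data arrive between looks, and that a two-sided test at |z| > 1.96 is applied at every look. After k batches, the cumulative z statistic is the sum of k independent standard normal batch statistics divided by √k. The function name is hypothetical.

```python
import random


def false_positive_rate(n_looks, z_crit=1.96, n_sims=20_000, seed=1):
    """Share of simulated null trials declared 'significant' at any of n_looks looks."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    hits = 0
    for _ in range(n_sims):
        total = 0.0
        for k in range(1, n_looks + 1):
            total += rng.gauss(0.0, 1.0)  # z statistic contributed by batch k
            if abs(total / k**0.5) > z_crit:  # cumulative z after k batches
                hits += 1
                break  # the trial would be stopped here
    return hits / n_sims


for looks in (1, 2, 5, 10):
    print(f"{looks} look(s): {false_positive_rate(looks):.3f}")
```

With one look the false positive rate is about 5%, but with five looks it rises to roughly 14%. Pocock's method gives z = 2.413 (nominal p of about 0.016) as the per-look threshold for five equally spaced looks. Raising the threshold to that value, via `false_positive_rate(5, z_crit=2.413)`, brings the rate back to about 5%.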

Interim analysis

To avoid premature stopping of a trial, interim analyses are usually undertaken by an independent committee of experienced trialists.

The threshold for statistical significance is adjusted to take repeated looks at the data into account (so that, for example, p = 0.01 rather than p = 0.05 is required).

Analysis

All point estimates should be reported with confidence intervals as well as exact p values. A single principal analysis should be stated in advance (e.g., ‘the primary outcome was a reduction in ALL fractures’); secondary analyses are of research interest only.

Summary

Sample size estimation is EASY.

The difficult bit is determining the likely effect size to inform the calculations.

Analyses are more straightforward for RCTs than for non-RCTs, because randomisation means you do not need to adjust for baseline differences to control for confounding.