CRITICAL ANALYSIS WHICH RESEARCH DESIGN FOR WHICH CLINICAL PROBLEM?


Page 1:

CRITICAL ANALYSIS

WHICH RESEARCH DESIGN FOR WHICH CLINICAL PROBLEM?

Page 2:

RCT – Experimental
- Treatment or intervention issue

UNCONTROLLED CLINICAL TRIAL or HISTORICAL CONTROL GROUP (methodologically less rigorous) – Experimental
- Treatment or intervention issue where the nature of the intervention makes a proper control group impossible or unethical
- Early in a field of research:
  - to explore the safety of a new intervention
  - to identify unanticipated effects
  - to gather baseline data for the planning of more definitive trials

Page 3:

COHORT STUDY – Epidemiologic / Observational
- Causation issue
- Prognosis issue (as deliberate experimental exposure of subjects to risk factors is unethical)

CASE-CONTROL STUDY (methodologically weaker than a cohort study) – Epidemiologic / Observational
- Causation issue
- Prognosis issue
(where a cohort study is impractical – e.g. rare conditions, or ones taking a long time to develop)

Page 4:

DIAGNOSTIC TEST STUDY – Observational
- Evaluation of a proposed test or diagnostic instrument

CROSS-SECTIONAL STUDY (weaker than a case-control study) – Epidemiologic / Observational
- Causation (cheaper than a case-control study, but can only find associations, not causality)

CASE SERIES (methodologically very weak)
- Initial description of, and speculation on, issues as above, before real research begins

Page 5:

SYSTEMATIC REVIEW – Integrative
- Integration of studies on one of the above issues to produce an overall conclusion
- e.g. a meta-analysis is one type – most to date are on treatment issues

Page 6:

Appraising a Clinical Experimental Study:

Population/Subjects:
- What was the source population?
- What were the inclusion/exclusion criteria?
- Were they a representative and relevant sample?
- How long was the follow-up period?

Page 7:

Are the Results of the Study Valid?
- Randomization? Was the randomization list concealed?
- Were baseline characteristics of the groups the same at the start?
- Was there an intention-to-treat analysis?
- Were interventions & outcomes clearly defined & replicable?
- How complete was blinding? Was it assessed at the end?
- Apart from the experimental intervention, were the groups treated equally? i.e. same co-interventions?
- Was the comparison group contaminated with the main interventions?
- Was compliance with interventions measured/assured?
- Were all subjects accounted for at the end? (was follow-up complete?)
- Was follow-up time sufficient to detect relevant outcomes?

Page 8:

Results:
- How large were the intervention effects?
- What measure(s) of 'event rate' or outcome were used?
- What was the NNT or NNH?
- How accurate were the estimates of the intervention effects? e.g. p-values, confidence intervals
- Did the study have sufficient power?

Page 9:

Applicability and Conclusions:
- Applicability & relevance? (to your patients, and is the treatment feasible & available?)
- Were all important outcomes considered?
- Are the likely treatment benefits worth the potential harm & costs? (adverse effects)
- Strengths and weaknesses?

Page 10:

Appraising a Diagnostic Study:

Population/Subjects/Setting:
- What was the source population tested?
- What were the inclusion/exclusion criteria?
- Were subjects a representative and relevant sample?
- How did they recruit subjects?

Page 11:

Validity:
- Was there a comparison with a 'Gold Standard' test?
- How did they define the 'caseness' to be detected by the test?
- If there is no 'Gold Standard', can the test be validated in other ways?
- Was there blinding of subjects and of investigators to the theory?
- How thorough was this, and was it assessed at the end?
- Was the sample size OK re power?
- Did all subjects get both the new test and the Gold Standard test?
- Was there testing by 2 independent investigators?
- Was there planning for any adverse effects & dropouts?
- Was the statistical analysis sensible? Were test-retest issues discussed?

Page 12:

Conclusions:
- Sensitivity – the proportion of true positives (those with the condition) correctly identified by a test or by epidemiological screening.
- Specificity – the proportion of true negatives (those without the condition) correctly identified by a test or by epidemiological screening.
- Did the test work as well as the Gold Standard? Benefits vs harm?
- Relevance? Practicality in the real world?
- Are the likely clinical benefits worth the potential harm & costs? (e.g. adverse effects)
- Strengths and weaknesses? How could it be improved?
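The two proportions above can be computed directly from a 2x2 table of test results against the gold standard. A minimal sketch; the counts below are made-up illustration values, not from any real study:

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of people WITH the condition whom the test correctly flags."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Proportion of people WITHOUT the condition whom the test correctly clears."""
    return tn / (tn + fp)

# Hypothetical screening study: 100 cases and 100 non-cases.
tp, fn = 90, 10   # cases: test positive / test negative
tn, fp = 80, 20   # non-cases: test negative / test positive

print(sensitivity(tp, fn))  # 0.9
print(specificity(tn, fp))  # 0.8
```

Note that neither number alone tells you how the test performs in practice; that also depends on how common the condition is in the tested population.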

Page 13:

APPRAISING A CAUSATION STUDY:

Population/Subjects:
- What is the source population being studied?
- Did they define an 'exposed' group vs a 'comparison' group (cohort study), or define controls (case-control study) – any randomisation?
- What were the inclusion/exclusion criteria?
- Were subjects a representative and relevant sample?
- How did they recruit subjects and comparisons/controls?

Page 14:

Basic Structure of Study:
- Cohort study? A longitudinal study in which the same group of people – usually sharing a common characteristic – is followed up over a period of time. If a different group of people is interviewed in each wave of a survey, this is known as a trend design.
- Case-control study? (did exposure precede outcome?)
- Cross-sectional study? (did exposure precede outcome?)

Did Researchers Define:
- The causal factor studied – is their theory sensible?
- The 'outcome' caused by the causal factor?
- Often the Risk Ratio is discussed (a comparison of the risk of some health-related event, such as disease or death, in two groups)

Was there Blinding?
- Re the hypothesis – ideally both subjects & assessors
- How good was this, and was it assessed at the end?

Page 15:

Data & Validity:
- Was the sample size OK re power?
- Did they follow up long enough?
- How did they allow for and manage dropouts?
- Significance? Confidence intervals? Dose-response? Specificity?

Conclusions:
- Relevance & usefulness?
- Strengths and weaknesses of the study?
- How could it be improved?

Page 16:

APPRAISING A PROGNOSIS STUDY:

A Prognostic Factor is a patient characteristic that can predict the patient's eventual outcome:
- demographic: e.g. sex, age, race
- disease-specific: e.g. tumour stage, symptom pattern
- comorbidity: other co-existing conditions

Articles that report prognostic factors often use two independent patient samples:
- derivation sets ask: "what factors might predict patient outcomes?"
- validation sets ask: "do these prognostic factors predict patient outcomes accurately?"

Page 17:

Methods:
- Design? (cohort / case series / prospective vs. retrospective)
- Setting? (hospital / location / clinic)
- Patient population? (number / screening or enrollment methods / number screened vs number enrolled)
- Description of the prognostic or outcome factors* considered

*Prognostic Outcome Factors are the numbers of events that occur over time, expressed in:
- absolute terms: e.g. 5-year survival rate
- relative terms: e.g. risk from a prognostic factor
- survival curves: a curve that starts at 100% of the study population and shows the % of the population still surviving at successive times. Applied to onset of a disease, a complication, or some other endpoint (e.g. time before relapse)

Page 18:

Validity:
- Was a defined, representative sample of patients assembled at a common (usually early) point of the illness?
- Inclusion and exclusion criteria?
- Selection biases?
- Stage of disease?
- Was patient follow-up sufficiently long & complete?
- Reasons for incomplete follow-up?
- Were prognostic factors similar for patients lost and not lost to follow-up?
- Were objective & unbiased outcome criteria used?
- Were outcomes defined at the start of the study?

Page 19:

Validity (continued):
- Were assessors and subjects blinded to the prognostic factor theory?
- Do the statistical models seem OK?
- Follow-up: duration / completeness / accounting for patients
- If subgroups with different prognoses were identified, was there adjustment for important prognostic factors?
- Are the (hopefully valid) results of this prognosis study important? i.e.: How large is the likelihood of the outcome event(s) in a specified time? Survival curves? How precise are the prognostic estimates? Confidence intervals?

Page 20:

Conclusions:
- Strengths and weaknesses of the study – in the context of other studies &/or the current standard of care?
- Next steps for further study of this problem?
- Can you apply the (hopefully valid & important) results of this study to caring for your own patients? i.e. were the study patients similar to your own – for demographics, severity, co-morbidity, and other prognostic factors?
- Will this evidence make a clinically important impact on your views on what to tell or to offer your patients? Is there a compelling reason why the results should not be applied?
- Will the results lead directly to you selecting or avoiding therapy?
- Are the results useful for reassuring or counselling patients?

Page 21:

Incidence can be defined as the number of new occurrences of a phenomenon, e.g. illness, in a defined population in a specified period. An incidence rate is the rate at which new cases of the phenomenon occur in a given population.

Prevalence (also called the prevalence rate, re prevalence across time) is the number of cases (or events, or conditions) in a population within a specified time period. e.g. the prevalence of a condition includes all people with the condition, even if the condition started prior to the start of the specified time period.
- Period prevalence: the amount of a particular disease present in a population over a period of time.
- Point prevalence: the amount of a particular disease present in a population at a single point in time.
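The distinction can be made concrete with arithmetic. A minimal sketch; all the numbers below are invented for illustration:

```python
# Incidence rate: NEW cases per unit of person-time at risk.
new_cases = 50          # new occurrences during one year of follow-up
person_years = 10_000   # person-time at risk over that year
incidence_rate = new_cases / person_years       # 0.005 = 5 per 1000 person-years

# Point prevalence: ALL existing cases at a single survey date,
# regardless of when each case began.
existing_cases = 200
population = 10_000
point_prevalence = existing_cases / population  # 0.02 = 2%

print(incidence_rate, point_prevalence)
```

A chronic, long-lasting condition can have a low incidence but a high prevalence, because cases accumulate over time.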

Page 22:

Appraising Systematic Reviews (of treatment / intervention studies):
- What were the relevant population(s)?
- What were the main exposure(s)?
- What were the comparison(s)?
- What were the outcome(s)?

Design of the Studies:
- experimental or non-experimental?
- cross-sectional or longitudinal?
- All trials included in a review should first have been appraised using the model for experimental studies

Page 23:

Validity of Review Results:
- Were the criteria used to select studies for inclusion in the review both explicit and appropriate?
- Is it likely that any important, relevant studies were missed? (completeness of the literature search)
- Was the validity of the included studies appraised?
- Were assessments of the studies reproducible? (documented and replicated)
- Were the results similar from study to study? (tests of heterogeneity)

Page 24:

Results (Size of Effects and Precision):
- What were the overall results of the review – how large were the effects?
- How precise were the results?

Applicability & Relevance:
- Are the results applicable in normal practice?
- Were all important outcomes considered?
- Are the likely treatment benefits worth the potential harm & costs? (e.g. adverse effects etc.)
- Strengths & weaknesses of the review?
- How could the review be improved?

Page 25:

Critical Appraisal – NNTs & NNHs

Decide from reading the study whether the experimental group had a better outcome than the control group – if so, calculate the NNT. Or, if the control group had a better outcome than the experimental group, calculate the NNH.

When the experimental treatment decreases the risk of an undesirable outcome, NNT and RBI (relative benefit increase) are useful:
- Number Needed to Treat = the number of patients who need to be treated to cause 1 good outcome
- Number Needed to Harm = the number of patients who need to be treated to cause 1 bad outcome

Page 26:

EER: event rate in the experimental group
CER: event rate in the control group

If this is a difference, ignore minus signs, except as a reminder as to whether treatment was overall helpful or harmful.

E (event) = % outcome (expressed as a decimal, e.g. a 40% occurrence as 0.4)

e.g. in a study comparing mood stabilisers, a bad outcome might be that the manic state does not improve with the treatment, or gets worse.

Absolute Benefit Increase – when the treatment benefits more experimental subjects than control subjects:
ABI = [ EER - CER ]

Relative Benefit Increase – fewer bad outcomes in the experimental group compared with the control group:
RBI = [ EER - CER ] / CER

NNT = 1 / ABI

Page 27:

EXAMPLE:

Treatment of acute mania. The outcome is a reduction of a certain amount on the Young Mania Rating Scale (YMRS) after 1 week.

DRUG A: 65% of subjects had the outcome
PLACEBO: 30% of subjects had the outcome

[EER: event rate in the experimental group; CER: event rate in the control group]

E (event) = % outcome: 65% (experimental) & 30% (control)
EER is thus 0.65; CER is thus 0.30
ABI = [ EER - CER ] = 0.35
NNT = 1 / ABI = 1 / 0.35 = 2.86

So the number needed to treat is close to 3 – i.e. we have to treat 3 patients for 1 to get benefit. This would be an extremely good and impressive NNT.
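The arithmetic in this example is simple enough to sketch in a few lines of code, which also makes the ABI/RBI distinction explicit (the figures are the ones quoted above):

```python
def nnt(eer: float, cer: float) -> float:
    """Number needed to treat = 1 / absolute benefit increase."""
    abi = abs(eer - cer)
    return 1 / abi

# Figures from the acute-mania example above:
eer, cer = 0.65, 0.30
abi = abs(eer - cer)         # 0.35: 35 extra good outcomes per 100 treated
rbi = abs(eer - cer) / cer   # ~1.17: the outcome is ~117% more frequent on drug A

print(round(nnt(eer, cer), 2))  # 2.86 -> treat ~3 patients for 1 extra good outcome
```

Note that the RBI can look dramatic even when the ABI (and hence the NNT) is unimpressive, which is why the absolute figures matter for clinical decisions.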

Page 28:

Asking a Research Question:

What is the Question? (the clinical problem to be answered)

What sort of issue is being investigated:
- An intervention or treatment?
- A diagnostic test or instrument?
- A causal factor?
- A prognostic factor?

What is the main alternative for comparison:
- A control group? A comparison group? A placebo group? Comparing 2 interventions?

What is the main outcome or outcomes?

Page 29:

Examples

You are sure that on-call nights for psychiatric registrars and crisis nurses are always busier when there is a full moon. How would you try to determine whether this is in fact the case?

Page 30:

You are working in the C-L service of a general hospital. Budget cuts are threatened and you have to justify maintaining the C-L service to several medical wards. One ward refers to C-L a lot, and the other hardly ever. You feel that your service’s C-L input shortens the length of stay for patients with delirium and self-harm. How could you demonstrate this in time for next year’s budgeting round in 9 months’ time?

Page 31:

Significance - p values:

The statistical significance of a result is the probability that the observed relationship (e.g., between variables) or difference (e.g., between means) in a sample occurred by pure chance, and that in the population from which the sample was drawn, no such relationship or differences exist.

The p-value represents the probability of error in accepting our observed result as valid, or "representative of the population."

Page 32:

P-values:

A p-value of 0.05 (1 in 20) indicates that there is a 5% probability that the relation between the variables found in our sample is a "fluke."

p-values of <0.05 are by convention 'just' significant, but this level of significance still involves a fairly high probability of error (5%). Results significant at the p <0.01 level are by convention considered statistically significant, and the p <0.005 or p <0.001 levels are often called "highly significant".

Page 33:

Data-mining and spurious significance:

The more analyses you perform on a data set, the more results will "by chance" meet the conventional significance level.

For example, if you calculate correlations between ten variables (i.e., 45 different correlation coefficients), then you should expect to find by chance that about two (i.e., one in every 20) correlation coefficients are significant at the p <0.05 level, even if the values of the variables are totally random and don't correlate in the population.

Some statistical methods that involve many comparisons include a "correction" for the total number of comparisons – but not all do.
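The "about two" figure above follows directly from the arithmetic, and it is also worth seeing how likely at least one spurious hit becomes. A minimal sketch (assuming, for simplicity, that the 45 tests are independent):

```python
# 45 independent tests at alpha = 0.05 on purely random data.
alpha, n_tests = 0.05, 45

# Expected number of spuriously "significant" results:
expected_false_positives = alpha * n_tests        # 2.25 -> "about two"

# Probability of getting AT LEAST one spurious hit:
p_at_least_one = 1 - (1 - alpha) ** n_tests       # ~0.90

print(expected_false_positives, round(p_at_least_one, 2))
```

So with 45 uncorrected comparisons, a researcher is roughly 90% likely to find "something significant" in pure noise.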

Page 34:

Variable: a phenomenon that varies and is measurable.

Independent variable: one which 'causes' change in the dependent variable. E.g. the intervention or treatment in an experiment – it is manipulated to show change in the dependent variable.

Dependent variable: also known as the outcome or "effect" variable. Its value depends on the independent variable(s), and will change as the independent variable or intervention changes.

Confounding variable: one which systematically varies with the independent variable and also has a causal effect on the dependent variable. The influence of a confounding variable may be difficult to identify, since it can be hard to separate the independent variable from confounding variables in real life.

Extraneous variable: a variable other than the independent variable which may have some influence on the dependent variable, and may be a potential confounding variable if it is not controlled for.

Intervening variable: occurs in the causal pathway between the independent variable and the dependent variable. It is statistically associated with both the independent and the dependent variable.

Page 35:

Correlation Coefficients:

Show the extent to which a change in one variable is associated with change in another variable – the relationship between them.
- Range from -1.00 to +1.00.
- -1.00 = perfect (strong) negative relationship.
- +1.00 = perfect (strong) positive relationship.
- 0.00 (midpoint) = no relationship at all.
- Best to have +/-0.90 and above to show a strong correlation.
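Pearson's correlation coefficient can be computed by hand from the paired deviations. A minimal sketch; the data points are invented for illustration:

```python
def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Illustrative paired measurements: a strongly positive relationship.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]
r = pearson_r(x, y)   # very close to +1.0
```

Remember that r only captures *linear* association; a strong curved relationship can still yield an r near zero.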

Page 36:

Strength vs Reliability of a Relationship Between Variables:

In general, in a sample of a particular size, the larger the size of the relationship between variables, the more reliable the relationship.

If there are few observations, then there are also few possible combinations of values, so the probability of a chance combination showing a strong correlation is high – so small-'n' studies are statistically weak.

If a correlation between the variables in question is very small in the population, then there is no way to identify it in a study unless the sample is very large. Similarly, if a correlation is very large in the population, then it can be found to be highly significant even in a very small sample.

If a coin is slightly asymmetrical, and when tossed is slightly more likely to produce heads than tails (e.g. 60% vs. 40%), then ten tosses would not be enough to show that the coin is asymmetrical. But if the coin is weighted to almost always fall as heads, then ten tosses would be quite enough to show this.

Page 37:

Reliability: the extent to which a measure gives consistent results. It is also a pre-condition for validity.

Validity: the extent to which a study measures what it purports to measure.

Internal validity: relates to the validity of the study itself, including both the design and the instruments used.

External validity: the extent to which findings from a study can be generalised to a wider population and be claimed to be representative.

Face validity: the extent to which the measure or instrument being used appears to measure what it is supposed to.

Construct validity: the extent to which the measurement corresponds to the theoretical concepts (constructs) concerning the object of the study.

Content validity: a set of operations or measures which together operationalize all aspects of a concept.

Criterion validity: the extent to which the measurement correlates with an external indicator of the phenomenon. There are two types of criterion validity – concurrent and predictive: i) concurrent validity is a comparison against another external measurement at the same point in time; ii) predictive validity is the extent to which the measurement can act as a predictor of the criterion. Predictive validity can be useful in relation to health, since it can act as an early risk indicator before a condition develops in full.

Instrument validity: the extent to which the instrument or indicator measures what it is supposed to measure. A study could have instrument validity but still lack validity overall, due to lack of external validity.

Page 38:

Bias (also called error): a deviation of the results from the truth.

Systematic error (or constant error): can be caused by the presence of a confounding variable in an experiment – e.g. sampling bias.

Random error: non-systematic error, which can negate the influence of the independent variable. Reliability is affected by random error.

Other terms and concepts to learn:

• Measures of Central Tendency and of Variability

• Types of Data

Page 39:

Confidence Interval:

A CI is the range of values within which we are fairly sure the true value of the parameter being investigated lies.

If the confidence interval (for a difference or effect) does not overlap zero, the effect is said to be statistically significant.

If independent samples are taken repeatedly from the population, and a confidence interval is calculated for each, a certain % (the confidence level) of the intervals will include the unknown population parameter. Confidence intervals are usually calculated so that this percentage is 95%.

The width of the confidence interval shows how uncertain we are about the unknown parameter. A very wide interval suggests that more data should be collected before anything definite can be said about the parameter.
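For a single proportion, a 95% CI can be sketched with the normal (Wald) approximation. This is a simplification – it behaves poorly for small n or proportions near 0 or 1 – and the numbers are invented for illustration:

```python
from math import sqrt

def wald_ci(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% CI for a proportion (normal/Wald approximation).
    Reasonable for moderate n and p not too close to 0 or 1."""
    se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of the proportion
    return p_hat - z * se, p_hat + z * se

# e.g. 30 of 100 patients responded:
lo, hi = wald_ci(0.30, 100)
print(round(lo, 3), round(hi, 3))   # ~0.210 to ~0.390
```

Even with n = 100, the interval spans roughly 21% to 39% – a concrete illustration of how interval width reflects uncertainty.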

Page 40:

Odds Ratios: compare the frequency of exposure to risk factors in epidemiological studies.

The odds ratio is a reasonable approximation of the relative risk when the outcome is relatively rare (e.g., when less than 1% of the people exposed to an agent develop disease). The odds ratio produces larger errors as the outcome rate rises above 1%.

You can say that a proposed risk factor acts as a significant risk for disease if:
- the odds ratio is >1
- the lower edge of the C.I. is >1
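The rare-outcome approximation can be checked numerically. A minimal sketch with made-up cohort counts in which the outcome occurs in about 1% of each group:

```python
# Hypothetical 2x2 cohort counts (invented for illustration):
#                 diseased  healthy
exp_d, exp_h = 10, 990      # exposed group
une_d, une_h = 5, 995       # unexposed group

risk_ratio = (exp_d / (exp_d + exp_h)) / (une_d / (une_d + une_h))  # 2.0
odds_ratio = (exp_d / exp_h) / (une_d / une_h)                      # ~2.01

print(risk_ratio, round(odds_ratio, 2))
```

With a ~1% outcome rate the OR (about 2.01) closely approximates the RR (2.0); recomputing with a common outcome (say 40% vs 20%) shows the gap widening considerably.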

Page 41:

VARIOUS TESTS: have some idea what each is for.

A reasonable reference is: http://www.une.edu.au/WebStat/unit_materials/c6_common_statistical_tests/

Parametric Tests and Non-Parametric Tests: non-parametric methods are used when we know nothing about the distribution of the variable in the population. It is not so much that they are for non-normally distributed data; rather, they make no assumption of a normal distribution. Parametric tests are used where there is a normal distribution.

Page 42:

Parametric vs Non-Parametric tests

Memorize a name of each sort – e.g.:

Parametric test – Non-parametric analogue
One-sample t-test – nothing quite comparable
Paired-samples t-test – Wilcoxon signed-rank test
Independent-samples t-test – Mann-Whitney U test
Pearson's correlation – Spearman's correlation

Page 43:

Null Hypothesis

The hypothesis set up against the researcher's theory. It usually assumes that there is no relationship between the dependent and independent variables. The null hypothesis is assumed to be correct until research demonstrates that it is incorrect. This process is known as falsification.

Page 44:

POWER

Type I Error Rate (Alpha): the probability of incorrectly rejecting a true null hypothesis (a Type I error gives a false-positive result).

Type II Error Rate (Beta): the probability of incorrectly accepting a false null hypothesis (a Type II error gives a false-negative result).

Page 45:

In the social sciences there are conventions that:

alpha = the Type I error (risk of a false positive) – must be kept at or below 0.05 (5%)

beta = the Type II error (risk of a false negative) – must be kept low as well (20% or less, generally)

Statistical Power is equal to 1 - beta, and must be kept correspondingly high.

Power should be at least 0.80 (80%) to detect a reasonable departure from the null hypothesis.

Statistical Power = the probability of rejecting a false null hypothesis.

Page 46:

In Reject-Support (RS) research (the usual kind; the opposite is true in Accept-Support (AS) research):

1. The researcher wants to reject the null hypothesis.
2. "Society" wants to control Type I error (false positives).
3. The researcher is very concerned about Type II error (a false negative means missing the fact that you have a result that supports your theory – and a supportive result is much more likely to get published).
4. A high sample size works for the researcher.
5. But if there is "too much power", trivial effects become "highly significant".

Page 47:

Factors influencing power in a statistical test:

1. What kind of statistical test is being used
2. Sample size
3. Size of the experimental effect
4. Level of error in experimental measurements

A Sampling Distribution: the distribution of a statistic over repeated samples.

The Standard Error of the Proportion: the standard deviation of the distribution of the sample proportion over repeated samples.

Page 48:

Power Analysis in Studies

In planning a study, one must estimate:

1. The reasonable minimum experimental effect that one wants to detect
2. A minimum power to detect that effect
3. The sample size that will achieve that desired level of power

Page 49:

Steps required for power analysis and sample size estimation:

1. The type of analysis and the null hypothesis are specified.
2. Power and required sample size for a reasonable range of likely experimental effects are investigated.
3. The sample size required to detect a reasonable experimental effect (i.e. departure from the null hypothesis) with a reasonable level of power is calculated, while allowing for a reasonable margin of error.

Page 50:

Method (excerpt): Statistical analysis

"It was estimated that in order to detect a 30% difference between the percentage of responders in the control group compared with that in the exercise group at the P=0.05 level of significance, a sample size of 40 subjects per group would be required to give a power of 90%. Data on poorly responsive depression are scant, but the proportion of responders in the control group was reasonably anticipated to be 10%, compared with an anticipated 40% in the exercise group."

Page 51:

Was a power analysis done prior to the study? What is the main implication?

Yes. The power was set at 0.9 (90%). Power = 1 - beta (beta is the probability of making a Type II error). So 0.9 = 1 - beta, i.e. beta = 1 - 0.9 = 0.1, or 10%.

Thus the risk of making a Type II error in this study was 10%, as opposed to most studies, which set power at 0.8 – i.e. they tolerate a 20% risk of making a Type II error (a false negative).

The main implication was that the study did have enough power to detect a significant improvement – which it did not find.
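The quoted sample size can be roughly checked with the standard normal-approximation formula for comparing two proportions. This is a sketch, not a reproduction of the study's own calculation (which may have used a different method or correction):

```python
from math import ceil
from statistics import NormalDist

# Anticipated responder rates from the excerpt: 10% (control) vs 40% (exercise).
p1, p2 = 0.10, 0.40

z_alpha = NormalDist().inv_cdf(0.975)   # ~1.96, two-sided alpha = 0.05
z_beta = NormalDist().inv_cdf(0.90)     # ~1.28, power = 90%

# Per-group n for comparing two independent proportions (normal approximation):
n = (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
print(ceil(n))   # 39 per group
```

The result (39 per group) is consistent with the "40 subjects per group" quoted in the excerpt.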

Page 52:

Ethics in Research: http://www.wma.net/e/policy/b3.htm

World Medical Association Helsinki principles for research in humans.

Ethics Committees: think about their role and how to design studies to meet their requirements.

RANZCP principle from the Code of Ethics: psychiatrists involved in clinical research shall adhere to those relevant ethical principles embodied in national and international guidelines:

Page 53:

College Code of Ethics (paraphrased)

- Research is done on people, so high standards are needed and it must be scientifically justified
- Must be approved by an Ethics Committee
- Minimize any harm to subjects
- The interests of subjects always take precedence over the interests of science or society
- Informed consent must be obtained from people participating in research
- Special care to be taken with consent from those in dependent relationships, e.g. students, prisoners, the elderly
- For minors – consent from a parent/guardian

Page 54:

College Code of Ethics (paraphrased), continued

- If subjects aren't competent to consent, obtain it from a relative or guardian
- Subjects can withdraw at any time & it won't jeopardise their care
- If a researcher uncovers clinically relevant information needing action, the researcher should tell the patient & their doctor
- Confidential information obtained from the research stays within the study
- No plagiarism; acknowledge all references
- Research reports to be truthful and accurate
- Ensure participants are de-identified
- Declare any conflict of interest in all publications