This is the presentation of the BITS training session on "Essential statistics". More material is available at http://www.bits.vib.be/index.php?option=com_content&view=article&id=17203865:essential-statistics&catid=81:training-pages&Itemid=190
Introduction to statistics
Els Adriaens, PhD
December 17, 2010
Overview
Outline
Formulate a relevant research question
Study design
Gather the data according to the plan
Analyze the data
Explorative data-analysis (descriptives, graphics)
Drawing inference (answer our research question with certain confidence)
Report the results
Part 1 – Design of a study
Experimental versus observational studies
Design of an experimental study
Overview of study designs: experimental study, observational study, mixed experimental and observational studies
Experimental study
Factor levels (treatments) are randomly assigned to the different experimental units (control over the explanatory variable)
→ information about the cause-and-effect relationship between the explanatory factors and a response variable
Example: effect of vitamin C on the prevention of colds in 800 children. Half of the children were selected at random and received vitamin C (treatment group); the remaining children received a placebo (control group)
Qualitative explanatory factor with two levels, and children as experimental units
Observational study
Data obtained from non-experimental study: explanatory variables not controlled, randomization of the treatments to experimental units does not occur
→ establish associations between the explanatory factors and a response variable
Example: Company officials wished to study the relation between the age of an employee and the number of days of illness in a year.
Explanatory variable not controlled → age is observed
Establish associations but no cause-and-effect: a positive relation between age and number of days of illness need not mean that days of illness are a direct result of age. If, for example, younger employees work indoors while older employees usually work outdoors, work location rather than age may be responsible for the number of days of illness
Mixed studies
Example: a clinical trial performed in 3 hospital centers; at each center the effect of a drug on lowering blood cholesterol was investigated. Within each hospital center, volunteers were randomly assigned to one of the two treatments (drug / placebo)
Experimental factor: treatment (drug versus placebo)
Observational factor: hospital center, not randomly assigned since each volunteer was assigned to the nearest hospital center
Factors and treatments
Randomization
Sampling from a population
Measurements
Structure of the experiment
2 levels of factor A x 3 levels of factor B = 6 treatments
experimental unit: smallest unit of experimental material to which a treatment can be assigned, the experimental unit is determined by the method of randomization
                Factor B
            Level 1   Level 2   Level 3
Factor A
  Level 1      1         2         3
  Level 2      4         5         6
Each treatment is applied to an experimental unit; replicates = a treatment repeated on several experimental units → estimate of the experimental error
Number of factors: in the initial stages of an investigation → include many factors (more than can possibly be studied in a single experiment)
Cause-and-effect diagrams are often used to identify factors that could affect the outcome → reduce number of factors
Example: 4 factors, each with 2 levels → 2⁴ = 16 treatment combinations
Number of levels of each factor:
Qualitative factors
Quantitative factors: # levels reflect the type of trend expected by the experimenter
• 2 levels ~ linear change in response: min – max of specified range
• 3 levels ~ quadratic trend
• ≥ 4 levels ~ detailed examination of the shape of the response curve desired
Range of factor is one of the most important design decisions
Measurements: precision versus accuracy
Precision of a variable: the degree to which a variable has nearly the same value when measured several times. It is a function of random error (chance) and is assessed as the reproducibility of repeated measurements.
Example: weigh the same person 3 times on an electronic balance and obtain slightly different measurements – 67.5 kg, 67.4 kg and 67.6 kg
The more precise a measurement, the greater the statistical power at a given sample size to estimate mean values and to test hypotheses
Variability may be due to operator, instrument and subject
Minimize random error and improve precision
Operating manuals, training the operator, refining / automating instruments
Repeat the measurement and average over a larger number of observations (but! added cost, practical difficulties)
Accuracy of a variable: the degree to which a variable actually represents what it is supposed to represent. It is a function of systematic error (bias) which is often difficult to detect and has important influence on the validity of the result.
Example 1: incorrect calibration of an instrument
Example 2: gastric freezing as a treatment for ulcers in the upper part of the intestine
Improve accuracy and minimize bias
Operating manuals, training the operator, refining / automating instruments
Periodic calibration using a gold standard (example 1)
Blinding: in a double-blind study, neither the experimental subject nor the evaluator knows which treatment is received or given, so any inaccuracy in measuring the outcome will be the same in the 2 groups (example 2)
Bias and variance in shooting arrows at a target. Bias means that the archer systematically misses in the same direction. Variance means that the arrows are scattered (Moore and McCabe 2002)
Sampling from a population
Simple random sample
[Diagram: random draws with equal probability from the population (N elements) yield the sample (n elements)]
Randomization → treatments are randomly assigned to experimental units
Tends to eliminate the influence of extraneous factors not under the direct control of the experimenter
Blocking → increase precision by taking into account other factors
[Diagram: the heterogeneous subjects are first split into homogeneous blocks, and treatments are randomized within each block]
  Males: Group 1 → treatment 1, Group 2 → treatment 2, Group 3 → treatment 3
  Females: Group 1 → treatment 1, Group 2 → treatment 2, Group 3 → treatment 3
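This block-then-randomize scheme can be sketched in Python (a minimal illustration; the unit labels, treatment numbers and seeds are made up for the example):

```python
import random

def block_randomize(units, treatments, seed=None):
    """Randomly assign treatments within one homogeneous block.

    len(units) must be a multiple of len(treatments), so every
    treatment ends up on the same number of units.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)                     # random order of units
    k = len(shuffled) // len(treatments)      # units per treatment group
    return {t: shuffled[i * k:(i + 1) * k] for i, t in enumerate(treatments)}

# Block on sex, then randomize treatments 1-3 within each block
males = [f"M{i}" for i in range(1, 7)]
females = [f"F{i}" for i in range(1, 7)]
assignment = {
    "males": block_randomize(males, [1, 2, 3], seed=1),
    "females": block_randomize(females, [1, 2, 3], seed=2),
}
print(assignment)
```

Because every treatment appears equally often within each block, treatment comparisons are not confounded with the blocking factor.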
Stratified Sampling
Suppose we want to know the attitudes of male and female students in the engineering school
Is a simple random sample from that school a good idea?
No: too few women (10%)
Stratify the sample, pick a random sample from
Stratum 1: female engineers
Stratum 2: male engineers
Estimates are measured with comparable precision. Learn from the distribution in each stratum; do NOT pool the data
e.g. if the average weight is 60 kg for the women and 80 kg for the men, the average engineer will weigh 10% × 60 + 90% × 80 = 78 kg
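The weighted calculation in the example can be reproduced directly (a sketch; the stratum shares and mean weights are the slide's illustrative numbers):

```python
# Post-stratified estimate of the population mean: weight each stratum
# mean by its known population share (10% women, 90% men).
strata = {
    "women": {"share": 0.10, "mean_weight": 60.0},
    "men":   {"share": 0.90, "mean_weight": 80.0},
}
overall = sum(s["share"] * s["mean_weight"] for s in strata.values())
print(overall)  # 78.0 kg, as on the slide
```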
Part 2 – Explorative data-analysis
Types of variables
Univariate descriptives
Bivariate descriptives
Descriptive statistics
Allows the researcher to describe or summarize the data. This is typically done in the beginning of a results section. The researcher gives an idea of the sample size, the characteristics under study (e.g. baseline characteristics in a clinical trial)
Example: a total of 235 students participated in this study, 163 women (69.4%) versus 72 men (30.6%). On average the female students (81.3 ± 19.4) had a slightly higher score on exam 2 than the male students (80.7 ± 18.1).
We typically start with univariate explorations (one variable at a time). Next, describe joint distributions (2 by 2 = bivariate; more variables = multivariate)
Graphical summary to inspect the shape of the distribution: symmetry, modality, heaviness of tails
Numerical summary: classical measures of location and spread
Mean and standard deviation
Median and interquartile range
Mode: value that occurs most often (useful for nominal data)
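All three measures of location are available in Python's standard statistics module (a minimal sketch; the data values are made up for illustration):

```python
import statistics

data = [67.5, 67.4, 67.6, 67.5, 68.0]  # made-up illustration values

print(statistics.mean(data))    # arithmetic mean
print(statistics.median(data))  # middle value (average of the two middle values if n is even)
print(statistics.mode(data))    # value that occurs most often
```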
Notes on notation
A random variable X is a variable whose value is a numerical outcome of a random phenomenon (nonnumerical outcomes are numerically encoded)
Random variables are usually denoted by capital letters such as X, Y, …
Fixed constants or observed values are usually denoted by small letters e.g. x, y. Special constants (to be specified) will be written as Greek letters α, β, μ, σ
Indices i will subscript random or observed outcomes for individual observations in the data set: Yi, yi
Type                            | Characteristic                                | Example                  | Descriptive statistic  | Information content
Categorical                     | the set of all possible values can be enumerated |                       |                        |
 • Nominal                      | unordered categories                          | gender, race             | counts, proportions    | lower
 • Ordinal                      | ordered categories                            | degree of pain           | median                 | intermediate
Continuous or ordered discrete  | can take all possible values within some interval of real numbers (continuous) or limited to integers (discrete) | weight, number of cigarettes per day | mean, standard deviation | higher
Histogram – Boxplot
Measures of location (center)
Measures of spread
Normal curve
Mean of a series of observations xi, i = 1, 2, …, n: x̄ = (1/n) Σi xi
Properties, given that X and Y are random variables and a, b are scalars:
  μ_{X+Y} = μ_X + μ_Y
  μ_{a+bX} = a + b·μ_X
Median (M): middle of the distribution such that at least 50% of the outcomes are larger than or equal to M and at least 50% of the outcomes are smaller than or equal to M
  For n odd: this is the middle value in order of magnitude
  For n even: take the average of the two middle values
Mean is very sensitive to outliers
[Histogram: numbers of partners desired in the next 30 years (Miller and Fishkin, 1997)]
Standard deviation of a series of observed values xi:
  SD(x) = √( (1/n) Σi (xi − x̄)² )
When the variable is approximately normally distributed, approximately 95% of the data will lie between x̄ − 1.96·SD(x) and x̄ + 1.96·SD(x)
The square of the SD is called the variance, Var(x)
Variation coefficient: (SD(x) / x̄) × 100%
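These quantities can be checked with Python's statistics module (a sketch; statistics.pstdev uses the same 1/n form as the slide's formula, and the data reuse the three weighings from the precision example):

```python
import statistics

x = [67.5, 67.4, 67.6]             # the repeated weighings from the precision example
mean = statistics.mean(x)
sd = statistics.pstdev(x)          # population form with 1/n, matching the slide
var = statistics.pvariance(x)      # Var(x) = SD(x) squared
cv = sd / mean * 100               # variation coefficient in %

print(mean, sd, var, round(cv, 3))
```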
Interquartile range (IQR): distance Q3 − Q1 with
Q1: a value such that at least 25% of the outcomes fall at or below Q1 and at least 75% fall at or above Q1
Q3: a value such that at least 75% of the outcomes fall at or below Q3 and at least 25% fall at or above Q3
If more than one value satisfies this criterion, the average is usually taken
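Quartiles and the IQR can be computed with statistics.quantiles (a sketch with made-up data; note that quantiles interpolates between observations, which can differ slightly from the "average all qualifying values" convention above):

```python
import statistics

data = [2.1, 2.4, 2.6, 2.8, 3.0, 3.1, 3.3, 3.6, 4.0]  # made-up values, already sorted

# n=4 returns the three quartile cut points Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q2, q3, iqr)
```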
Five-number summary: Min, Q1, Median, Q3, Max
[Boxplot of birth weight: the box spans Q1 to Q3 (the IQR) with the median inside; the whiskers reach to the largest and smallest observations within a distance of 1.5 × IQR from the box]
Histogram: bar diagram for continuous data – relative or absolute frequencies
[Histogram of birth weight; vertical axis: percentage]
Normal distribution
Density: φ(x) = (1 / (σ√(2π))) · exp( −½ ((x − μ)/σ)² )
μ is the population mean
σ² is the population variance
Notation: X ~ N(μ, σ²)
If X ~ N(μ, σ²), then Z = (X − μ)/σ ~ N(0, 1), the standard normal distribution
Properties of the standard normal distribution N(0, 1)
unimodal: 1 maximum (at 0)
symmetric around 0
68-95-99.7 rule:
• 68% of the area under the curve (AUC) lies between -1 and 1, 68% of the observations fall within 1 SD of the mean μ
• 95% of the AUC lies between -2 and 2, 95% of the observations fall within 2 SD of the mean μ
• 99.7% of the AUC lies between -3 and 3, 99.7% of the observations fall within 3 SD of the mean μ
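The 68-95-99.7 rule can be verified numerically from the standard normal CDF, written here with math.erf so no external library is needed (the exact areas are 68.27%, 95.45% and 99.73%):

```python
import math

def std_normal_cdf(z):
    # Phi(z) via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for k in (1, 2, 3):
    auc = std_normal_cdf(k) - std_normal_cdf(-k)   # area within k SDs of the mean
    print(k, round(auc, 4))
```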
Normal quantile plot
Compares two distributions by plotting their quantiles against each other
If the observed and the normal distribution are identical, points are expected to lie on a straight line with intercept 0 and slope 1
Distributions with the same shape but simply rescaled or shifted still show up on a straight line but with different intercept (shift) or slope (scale change)
[Normal Q–Q plots of randomly generated N(0, 1) data and of randomly generated exponential data]
Continuous data
Categorical data
Bivariate relations – continuous data
Graphical: boxplots, (stacked) histograms, scatter plots
Correlation coefficient (r):
  Takes values between −1 and 1
  Pearson correlation coefficient expresses a degree of linear dependence:
  r = (1/n) Σi ((xi − x̄)/SD(x)) · ((yi − ȳ)/SD(y))
Source: Wikipedia – Anscombe's Quartet
All four datasets have r = 0.816! A summary statistic cannot replace the individual examination of the data
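A direct implementation of the formula, applied to the first dataset of Anscombe's Quartet (the published x/y values), reproduces r = 0.816:

```python
import math

def pearson_r(x, y):
    """Pearson correlation with the slide's 1/n convention
    (the 1/n factors cancel, so any consistent convention gives the same r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sdx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / n)
    sdy = math.sqrt(sum((yi - my) ** 2 for yi in y) / n)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n * sdx * sdy)

# First dataset of Anscombe's Quartet
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
print(round(pearson_r(x, y), 3))  # 0.816, as on the slide
```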
Bivariate relations - Spearman’s Rank correlation (-1 and 1)
Measures of monotone association (extent to which as one variable increases, the other variable tends to increase or decrease)
No assumption on linearity
Ordinal variables
Source: Answers.com
Example: corneal irregular astigmatism after laser in situ keratomileusis for myopia (Br J Ophthalmol 2001;85:534-536): Spearman rank correlation rs = 0.440, p < 0.0001. See also http://geographyfieldwork.com/SpearmansRank.htm
2x2 associations – categorical data: comparing two proportions
Many studies are designed to compare two groups (X) on a binary response variable (Y)
X \ Y      Success   Failure
Group 1    π1        1 − π1
Group 2    π2        1 − π2
Example: is there an association between antiviral drug use (X) and pneumonia (Y)?
π: probability of success
1 − π: probability of failure
Counts:
                 Pneumonia
                 Yes     No       Total
Antiviral drug   579     45172    45751
Control          648     45103    45751

Proportions:
                 Yes     No       Total
Antiviral drug   0.013   0.987    1
Control          0.014   0.986    1
Risk difference: is there a difference between the group taking the antiviral drug and the control group?
π1 − π2 = 0.013 − 0.014 = −0.001
Properties
-1 ≤ (π1 - π2) ≤ 1
if response is independent of group, then (π1 - π2) = 0
A difference may be more important when both success probabilities are close to 0 or 1 than when both p’s are close to 0.5
Example (p1-p2) = 0.09 (0.1-0.01=0.09) or (0.50-0.41=0.09)
In the first case, p1 is 10 times larger than p2 while in the second case p1 is only 1.2 times larger than p2.
Relative risk: ratio of the success probabilities of the 2 groups
Properties
0 ≤ (π1/π2) < ∞
if response is independent of group, then (π1/ π2) = 1
Antiviral drug example
(p1/p2) = (.013/.014) = 0.894 with 95% CI: 0.799, 0.999
The sample proportion of pneumonia cases was 10.6% lower for the group prescribed antiviral drug. The CI of the relative risk indicates that the risk of pneumonia is at least 1% lower for the group prescribed antiviral drug.
Odds ratio
For a probability π of success, the odds are defined to be Ω = π / (1 − π)
Odds ≥ 0 with values > 1 when a success is more likely than a failure. For example, if π = .75, then the odds of success = .75/.25 = 3.0: a success is three times as likely as a failure. If Ω = 1/3, a failure is three times as likely as a success.
The ratio of the odds Ω1 and Ω2 in the two rows is called the odds ratio
Properties odds ratio
0 ≤ θ < ∞
When X and Y are independent, then θ = 1
the odds ratio does not change value when the orientation of the table reverses (rows become columns, columns become rows)
Odds ratio - continued
Properties
if θ = 4, the odds of success in row 1 are 4 times the odds in row 2, and thus subjects in row 1 are more likely to have success than are subjects in row 2
θ = 4 does not mean that the probability π1 is four times π2 (that would be the interpretation of relative risk)
the odds ratio does not change when both cell counts within any row (or column, but not both) are multiplied by a nonzero constant; this implies that the odds ratio does not depend on the marginal counts within a row/column
Odds ratio - Example
The sample odds ratio is computed as θ̂ = (n11 × n22) / (n12 × n21)
For the patients prescribed the antiviral drug, the estimated odds of pneumonia are 579/45172 = 0.013: there were 1.3 pneumonia cases for every 100 cases with no pneumonia.
The sample odds ratio = (579 × 45103) / (648 × 45172) = 0.892 (95% CI: 0.797, 0.999). The estimated odds for patients prescribed the antiviral drug equal 0.892 times the estimated odds for patients in the control group. The estimated odds were 10.8% lower for the antiviral drug group.
Relation between odds ratio and relative risk
When the proportion of successes is close to 0 for both groups, the sample odds ratio is similar to the sample relative risk. In such a case, an odds ratio of 0.89 does mean that the probability of success for the patients prescribed the antiviral drug is about 0.89 times the probability of success for the patients in the control group
Relative risk = 0.894 (95% CI: 0.799, 0.999)
Odds ratio = 0.892 (95% CI: 0.797, 0.999)
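All three measures can be recomputed from the raw 2×2 counts (a sketch; the counts are the pneumonia table from these slides, and the small difference from the slide's −0.001 risk difference comes from the slide rounding the proportions to 3 decimals first):

```python
# 2x2 pneumonia table from the slides:
#                 pneumonia   no pneumonia
# antiviral drug      579         45172
# control             648         45103
a, b = 579, 45172   # antiviral drug: yes / no
c, d = 648, 45103   # control:        yes / no

p1 = a / (a + b)                  # risk in the antiviral group
p2 = c / (c + d)                  # risk in the control group
risk_difference = p1 - p2
relative_risk = p1 / p2
odds_ratio = (a * d) / (b * c)    # cross-product ratio

print(round(risk_difference, 4))  # -0.0015
print(round(relative_risk, 3))    # 0.894
print(round(odds_ratio, 3))       # 0.892
```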
What should be used: risk difference, relative risk or odds ratio?
The odds ratio is the preferred estimate
In a case-control study it is usually not possible to estimate the probability of an outcome given X (π1), and therefore it is also not possible to estimate the difference of proportions or relative risk for that outcome
In a retrospective study, 709 patients with lung cancer (cases) were queried about their smoking behavior (X). Each case was matched with a control patient: same age, same gender, same hospital, but no lung cancer
Odds ratio = 2.97: the estimated odds of lung cancer for smokers were 2.97 times the estimated odds for non-smokers
             Lung cancer
             Cases   Controls
Smoker       688     650
Non-smoker   21      59
Total        709     709
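The case-control odds ratio follows directly from the cross-product of the table:

```python
# Retrospective smoking / lung-cancer table from the slide:
# rows = smoker / non-smoker, columns = cases / controls
cases_smoker, controls_smoker = 688, 650
cases_nonsmoker, controls_nonsmoker = 21, 59

odds_ratio = (cases_smoker * controls_nonsmoker) / (controls_smoker * cases_nonsmoker)
print(round(odds_ratio, 2))  # 2.97, matching the slide
```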
Part 3 – Statistical inference
Distributions
Bias and variance
Hypothesis testing
Statistical inference: by using the laws of probability, we infer conclusions about a population from data collected in a random sample
A parameter (μ, σ) is a number that describes the population. A parameter is a fixed number, but its value is unknown in practice.
A statistic (e.g. x̄, SD(x)) is a number that describes the sample. Its value is known once we have collected a sample, but it changes from sample to sample.
[Diagram: from the population (N elements; parameters μ, σ) draw a random sample (n elements), collect data, compute the statistics x̄ and SD(x), and make inferences about the population]
The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
Binomial distribution
Poisson distribution
Normal distribution
Binomial distribution
Poisson distribution
Normal distribution
Binomial distribution
Fixed number of n independent observations
Each observation falls in one of two categories (success/failure)
The probability of success p is the same for each observation
→ denote by X the number of successes among the n observations, which can take values 0, 1, …, n; then X ~ B(n, p)
Probability mass function: P(X = k) = C(n, k) · p^k · (1 − p)^(n−k)
Properties: μ_X = np, σ²_X = np(1 − p)
Poisson distribution: expresses the number Y of events in a given unit of time, space, volume, or any other dimension
Example → modeling a phenomenon in which we are waiting for an occurrence (waiting for customers to arrive in a bank)
Basic assumption: for small time intervals, the probability of an occurrence is proportional to the length of the waiting time
Single parameter λ > 0, the average number of events per unit of measurement
Probability mass function: P(Y = k) = e^(−λ) · λ^k / k!, where k is the number of occurrences of the event and λ the expected number of occurrences during the given interval
Properties: μ_Y = λ, σ²_Y = λ
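Both probability mass functions are easy to evaluate with the standard library, and summing k·P(k) recovers the stated means (a sketch; n = 10, p = 0.5 and λ = 3 are arbitrary illustration values):

```python
import math

def binom_pmf(k, n, p):
    # P(X = k) for X ~ B(n, p)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    # P(Y = k) for Y ~ Poisson(lam)
    return math.exp(-lam) * lam ** k / math.factorial(k)

print(binom_pmf(5, 10, 0.5))  # 0.24609375

# E[X] = np and E[Y] = lambda, recovered by summing k * P(k)
mean_b = sum(k * binom_pmf(k, 10, 0.5) for k in range(11))
mean_p = sum(k * poisson_pmf(k, 3.0) for k in range(100))
print(round(mean_b, 6))  # 5.0
print(round(mean_p, 6))  # 3.0
```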
Normal distribution
Density: φ(x) = (1 / (σ√(2π))) · exp( −½ ((x − μ)/σ)² )
X1, X2, …, Xn is a simple random sample with mean μ and variance σ²
If Xi ~ N(μ, σ²) then X̄ ~ N(μ, σ²/n)
Central limit theorem
Draw a simple random sample (X1, …, Xn) of size n from a population with mean μ and finite variance σ². When n is large, the sample average X̄ then follows approximately a normal distribution regardless of the data distribution: X̄ ~ N(μ, σ²/n)
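A small simulation illustrates the central limit theorem: sample means of strongly skewed exponential data (population mean 1, variance 1) cluster around μ with spread close to σ/√n (a sketch; the sample size, number of replicates and seed are arbitrary choices):

```python
import random
import statistics

rng = random.Random(42)
n = 50          # size of each simple random sample
reps = 2000     # number of samples drawn

# exponential data: mean 1, variance 1, heavily right-skewed
means = [statistics.mean(rng.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

print(round(statistics.mean(means), 3))   # close to mu = 1
print(round(statistics.stdev(means), 3))  # close to sigma / sqrt(n) = 0.141
```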
Sampling variability
Standard deviation vs standard error
Confidence interval
Law of large numbers: the population mean μ of X is unknown. The mean x̄ of a simple random sample → estimate of μ.
x̄ is a random variable that varies in repeated sampling
The law of large numbers guarantees that as the sample size of a simple random sample increases, the sample mean x̄ gets closer to the population mean μ
Unbiased statistic: a statistic used to estimate an unknown parameter is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated.
Variability of a statistic is described by the spread of its sampling distribution.
Spread is determined by the sampling design and the sample size. Larger samples have smaller spread.
How precise is our estimate?
Generalize findings to the general population: the estimate must approximate the population value
Representative sample → prevents the results for the sample from being biased
→ results are still subject to sampling variability: different samples from the same population will yield different results
Generalizing results from the sample to the study population then requires that we acknowledge sampling variability
Standard deviation ≠ standard error
The standard error measures the uncertainty in an estimate (standard error of the mean: SEM = σ/√n)
The standard deviation (SD) of the observations → measures the variability in the observations
Both are standard deviations, but the standard error shrinks with increasing sample size, in contrast to the standard deviation of the observations
The mean and SD are the preferred summary statistics for (normally distributed) data, and the mean and 95% confidence interval are preferred for reporting an estimate and its measure of precision.
[Sampling distribution of the sample means x̄: centered at μ with standard deviation σ/√n]
Confidence intervals
When we estimate a parameter by calculating a sample statistic, there is a degree of uncertainty in our estimation
We can construct an interval around the sample mean x̄ within which we expect the true population mean μ with known probability (e.g. 95% chance)
A (1 − α)·100% confidence interval for the mean contains the population mean with (1 − α)·100% chance. The confidence level or coverage probability is (1 − α)
  σ known:   x̄ ± z(α/2) · σ/√n
  σ unknown: x̄ ± t(n−1, α/2) · s/√n
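A sketch of the σ-known case (z = 1.96 for a 95% interval; the sample values and σ are made up). For σ unknown one would replace z by the t(n−1, α/2) quantile, which requires a statistics package such as scipy:

```python
import math
import statistics

def ci_mean_known_sigma(sample, sigma, z=1.96):
    """95% CI for the mean with sigma known: x_bar +/- z * sigma / sqrt(n)."""
    n = len(sample)
    m = statistics.mean(sample)
    half = z * sigma / math.sqrt(n)
    return m - half, m + half

sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1]  # made-up data
lo, hi = ci_mean_known_sigma(sample, sigma=0.2)
print(round(lo, 3), round(hi, 3))
```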
Principle of statistical tests
p-value and power
One-sided versus two-sided testing
Hypothesis testing
The null hypothesis (Ho) assumes ‘no difference’ or ‘no effect’
The average … is equal in both treatment groups
The alternative hypothesis (HA) claims the opposite
The average … differs by treatment
Type of decision     | H0 true                  | HA true
Accept H0 (p > α)    | Correct decision (1 − α) | Type II error (β)
Reject H0 (p < α)    | Type I error (α)         | Correct decision (1 − β) = power
We assume H0 is true unless we can demonstrate, based on sample data at the desired level of confidence, that HA is true.
→ level of confidence related to 2 potential types of statistical errors
• example: in a clinical trial we want to study the effect of an experimental drug (T) and compare it to a placebo (P)
H0 : effect of drug T = effect of P
HA : effect of drug T ≠ effect of P
Type I error (false positive): concern of the regulators, the drug is not working but it will go to the market
Type II error (false negative): concern of pharmaceutical companies, could not prove that the new drug is working
Sensitivity and specificity

                          Gold standard
                          Positive (ill)                       Negative (not-ill)
Test outcome → Positive   True Positive (TP)                   False Positive (FP), Type I error
Test outcome → Negative   False Negative (FN), Type II error   True Negative (TN)

Sensitivity: proportion of ill people identified as being ill, TP / (TP + FN)
Specificity: proportion of non-ill people identified as non-ill, TN / (TN + FP)
When are hypotheses needed?
Hypotheses are not needed in descriptive studies
If any of the following terms appears in the research question (i.e. the study is not simply descriptive), a hypothesis should be formulated: greater than, less than, causes, leads to, compared with, more likely than, associated with, related to, similar to, correlated with.
The hypothesis should be clearly stated in advance.
Principle of statistical testing
calculate a test statistic which measures ‘distance’ from the observed sample to the null hypothesis, whose distribution is known under the null hypothesis
Reject Ho
test statistic t exceeds a chosen cut-off c (critical value) in magnitude
p-value stays below a chosen cut-off α in magnitude
safety principle: cut-off is chosen such that the risk of making a Type I error is controlled at a prespecified significance level α
Usually α = 0.05 (test performed at the 5% significance level)
the power of the test (probability to avoid Type II errors, 1 − β) is not controlled → choose adequate designs and sufficiently large sample sizes
critical value c: reject H0 when the test statistic t exceeds the chosen cut-off c in magnitude
p-value: probability to find a result for the test statistic at least as extreme as the observed result (in the direction of the alternative hypothesis), if the null hypothesis holds
[Distribution of the test statistic: acceptance region between the critical values cL and cR, with rejection regions of area α/2 in each tail; α = 0.05]
Power: 1 − β = 1 − P(accept H0 | HA) = P(reject H0 | HA)
For many testing problems H0 is formulated very precisely, but there are usually an infinite number of distributions consistent with HA.
Standardized effect size: (μ1 − μ0)/σ. With what probability must the statistical test detect this smallest relevant difference? e.g. a ~91% chance of finding an association of that size or greater
One-sided versus two-sided testing
Decide prior to data analysis; avoid one-sided tests unless there are really good reasons for using them (only one direction of the association is clinically or biologically relevant)
it is never wrong to use a two-sided test where a one-sided test is applicable
at most a slight loss of power
57
[Figure: rejection regions for two-sided versus one-sided testing]
Part 3 – Statistical inference
Multiple and Post Hoc Hypotheses - testing problem
Inflated rate of false positive conclusions (Type I error)
Assume we perform 3 independent comparisons between 2 groups, each conducted with α = 0.05
The probability that every test correctly retains H0 = (0.95)³ = 0.857 → the chance of finding at least one false positive statistically significant test increases to 14.3% (1 − 0.857 = 0.143, not 0.05)
Adjusting for multiple hypotheses is especially important when the consequences of making a false positive error are large, e.g. mistakenly concluding that an ineffective treatment is beneficial
Adjustments can be made → e.g. False Discovery Rate control
Part 3 – Statistical inference 58
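The inflation described on this slide is easy to check numerically. A minimal sketch in Python (the function name is mine):

```python
# Hypothetical sketch: probability of at least one false positive
# when performing k independent tests, each at significance level alpha.

def familywise_error(alpha: float, k: int) -> float:
    """P(at least one Type I error) for k independent tests."""
    return 1 - (1 - alpha) ** k

inflation = familywise_error(0.05, 3)  # the 3-comparison example above
```

With k = 3 and α = 0.05 this reproduces the 14.3% figure from the slide.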
Part 4 – Statistical tests 59
Part 4
Statistical tests
Continuous data
Parametric statistics
Non-parametric statistics
Categorical data
Ordinal versus nominal
Types of testing
One-sample tests
Two dependent groups
Two independent groups
More than two groups
Controlling for covariates
Part 4 – Statistical tests 60
Continuous/Categorical data · Parametric statistics · Non-parametric statistics · Categorical data – Proportions
Dependent versus independent
Part 4 – Statistical tests 61
Dependent
Subject       Weight, Time x (Treatment A)   Weight, Time y (Treatment B)
Volunteer 1   x1A                            x1B
Volunteer 2   x2A                            x2B
Volunteer 3   x3A                            x3B
Volunteer 4   x4A                            x4B
Volunteer 5   x5A                            x5B

Independent
Subject        Treatment   Weight
Volunteer 1    A           x1A
Volunteer 2    A           x2A
Volunteer 3    A           x3A
Volunteer 4    A           x4A
Volunteer 5    A           x5A
Volunteer 6    B           x6B
Volunteer 7    B           x7B
Volunteer 8    B           x8B
Volunteer 9    B           x9B
Volunteer 10   B           x10B
Parametric statistics
assumes that the data come from a type of probability distribution and makes inferences about the parameters of that distribution
requires assumptions (e.g. Normal distribution); if they are correct, it produces more accurate and precise estimates and generally has more statistical power
e.g. Independent sample t-test
Assumptions
• Independent observations
• Population 1 → X1i ~ N(μ1, σ²)
Population 2 → X2i ~ N(μ2, σ²)
H0 : μ1 = μ2 → H0 two distributions are equal
Part 4 – Statistical tests 62
Non-parametric statistics – rank tests
no specific assumption about the population distribution required
Example: statistics based on Rank tests
Let X1, …, Xn denote a sample of n observations, the rank of observation Xj is defined as
The smallest observation gets rank 1, the second smallest rank 2, …, the largest observation gets rank n.
In case of ties (a tie is a pair of equal observations), the ranks of the tied observations are defined as the average of their ranks according to the definition just given. These are called mid-ranks.
Part 4 – Statistical tests 63
Rj = R(Xj) = number of observations in the sample ≤ Xj = Σi=1..n I(Xi ≤ Xj)
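The rank definition above, including mid-ranks for ties, can be sketched in a few lines of pure Python (the function name is mine):

```python
# Minimal sketch of the rank definition above, with mid-ranks for ties.
def midranks(xs):
    order = sorted(range(len(xs)), key=lambda j: xs[j])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        # find the block of tied observations starting at position i
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + 1 + j + 1) / 2  # average of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

r = midranks([2, 8, 12, 12, 15, 39])  # → [1.0, 2.0, 3.5, 3.5, 5.0, 6.0]
```

Applied to the observations on the next slide, the two tied values 12 both receive the mid-rank (3+4)/2 = 3.5.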
Example
Properties of rank-transformed observations
they only depend on the ordering of the observations
they are insensitive to outliers (robust)
the distribution of the ranks does not depend on the distribution of the observations
64
Observations   Ranks
2              1
8              2
12             (3+4)/2 = 3.5
12             (3+4)/2 = 3.5
15             5
39             6
Part 4 – Statistical tests
Non-parametric statistics – permutation tests
reference distribution of a characteristic of interest is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points.
Example: a company has a new training program and wishes to evaluate whether the new method is better than the traditional one. To assess the effect of the new method, they set up an experiment with 7 new employees. Four of them are randomly assigned to the new training method, and the other three receive the old training method.
65
Observed data
New: 37, 49, 55, 57
Traditional: 23, 31, 46

Permutations: relabel the 7 observations in all possible ways (e.g. one rearrangement exchanges 55 in the New group with 31 in the Traditional group)
Number of rearrangements: 7! / (4!·3!) = 35
Part 4 – Statistical tests
Permutation tests
to verify whether there is a difference in means of a continuous measurement in 2 independent populations
Permutation null distribution
H0 : F1(x) = F2(x) for all x.
HA : μ1 > μ2
Test statistic
Example: we have 35 possible permutations (each having a t*-value), the collection of all the t*-values is the permutation null distribution
66
T = X̄1 − X̄2
Part 4 – Statistical tests
Permutation test - example
Test statistic → t = 49.5 – 33.3 = 16.2
Permutation null distribution of the 35 possible permutations, under the null hypothesis all t*-values are equally likely
H0 will be rejected for large T (T > c, critical value); c controls the Type I error rate at α: P(T > c | H0) ≤ α
67
T = X̄1 − X̄2
Part 4 – Statistical tests
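The full permutation null distribution for the training example can be enumerated directly. A sketch in pure Python, assuming the one-sided alternative μ1 > μ2 from the previous slide:

```python
# Sketch: exact permutation null distribution for the training example.
# 7 observations, 4 labelled "New" in every possible way (35 relabellings).
from itertools import combinations

new = [37, 49, 55, 57]
traditional = [23, 31, 46]
pooled = new + traditional

def mean(xs):
    return sum(xs) / len(xs)

t_obs = mean(new) - mean(traditional)       # observed T = X̄1 − X̄2

t_null = []
for idx in combinations(range(7), 4):       # every possible relabelling
    g1 = [pooled[i] for i in idx]
    g2 = [pooled[i] for i in range(7) if i not in idx]
    t_null.append(mean(g1) - mean(g2))

# one-sided p-value: fraction of permutations at least as extreme as t_obs
p_value = sum(t >= t_obs - 1e-9 for t in t_null) / len(t_null)
```

The 35 t*-values form the permutation null distribution; the p-value is simply the proportion of them at least as large as the observed statistic.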
Parametric versus non-parametric tests
Parametric tests: the data are sampled from a population with N-distribution OR large sample size (CLT)
Smaller sample size: outliers or skewed distribution can be problematic → transformation or non-parametric tests (permutation or rank tests)
Permutation tests: very flexible
Non-parametric rank tests: in case of no meaningful measurement scale (pain score, Apgar score, …)
Careful with formulation of H0 and interpretation of the analysis
Less power
68Part 4 – Statistical tests
Categorical / discrete data: the set of all possible values can be enumerated
Ordinal data: ordered categories
Age group, pain assessment from no to severe, Likert scales (agree strongly, agree, neutral, disagree, disagree strongly)
Nominal data: categories have no natural order, sometimes called qualitative data (gender, race, hair color)
Counts: variables are represented by frequencies
Proportions / percentages
Ratio of counts, e.g. binary or dichotomous data: have exactly two possible outcomes (success / failure); we count the number of successes in the number of trials
69
Part 4 – Statistical tests
One-sample tests · Parametric statistics · Non-parametric statistics · Categorical data – Proportions
One-sample t-test
to verify whether the mean of a continuous measurement deviates from a given value μ0
H0 : μ = μ0
HA : μ ≠ μ0
Test statistic
t-distributed with n-1 degrees of freedom (df)
Assumptions
Independent observations
Normally distributed observations or large sample
Part 4 – Statistical tests 70
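The statistic above can be sketched directly from its definition; the data below are made up for illustration:

```python
# Minimal sketch of the one-sample t statistic (data are hypothetical).
import math

def one_sample_t(xs, mu0):
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # sample variance
    t = (xbar - mu0) / math.sqrt(s2 / n)
    return t, n - 1                                    # statistic and df

t, df = one_sample_t([2, 4, 6, 8, 10], mu0=5)
```

The resulting t is compared against a t-distribution with n − 1 degrees of freedom.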
One categorical variable with J ≥ 2 categories
Example: number of students in each of the three main subjects in the 1st master psychology (2003-2004)
Suppose that in the population, the true proportions are:
Part 6 – Categorical data 71
1-way contingency tables – X² test (one categorical variable with J ≥ 2 categories)
H0 : pj = πj for all j (equivalently, for expected frequencies: μj = n·πj)
HA : pj ≠ πj for at least one j
Statistic: X² = Σj (nj − n·πj)² / (n·πj), approximately χ²-distributed under H0
Example, df = J − 1 = 2 and P < .0001, strongly suggesting that the null hypothesis should be rejected.
Part 6 – Categorical data 72
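The goodness-of-fit statistic is a short computation; the counts below are hypothetical, not the psychology enrolment data from the slide:

```python
# Sketch of the X² goodness-of-fit statistic (counts are hypothetical).
observed = [50, 30, 20]            # nj
pi = [1 / 3, 1 / 3, 1 / 3]         # hypothesised proportions πj
n = sum(observed)
expected = [n * p for p in pi]     # μj = n·πj

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1             # J − 1 degrees of freedom
```

The p-value then follows from the χ² distribution with J − 1 degrees of freedom.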
Two dependent samples · Parametric statistics · Non-parametric statistics · Categorical data – Proportions
Paired sample t-test
to verify whether 2 continuous measurements, obtained from paired subjects, are the same on average
H0 : μ1 = μ2
HA : μ1 ≠ μ2
→ calculate differences Y = X1 – X2 and use the one-sample t-test to verify whether H0 : μ = 0 versus HA : μ ≠ 0, where μ is the average of Y
Assumptions
Independent differences
Normally distributed differences or large sample (n ≥ 40)
n ≥ 15: t-test fine unless very skewed distribution or outliers
n < 15: data should be approximately normally distributed; very skewed distributions or outliers are problematic
Part 4 – Statistical tests 73 (assumptions from ‘Introduction to the Practice of Statistics’, Moore & McCabe)
Wilcoxon signed rank test
Compare 2 dependent samples → the difference variable Y = X1 - X2
With Yi+ the observations on the positive differences (i = 1, …, n+) and Yi− the observations on the negative differences (i = 1, …, n−), then
H0 : P(Y - < Y +) = ½
HA : P(Y - < Y +) > ½
Statistic: V = sum of the ranks of |Yi| over the positive differences
Part 4 – Statistical tests 74
Wilcoxon signed rank test - Example
Two stories were narrated to children with reading disorders; story 1 was not illustrated whereas story 2 was illustrated
V= 9, n=5, p=0.406
From this small sample we cannot conclude that children with reading disorders tell a story better when the story is illustrated.
75
Child                            1      2      3      4      5
Story 1                          0.40   0.72   0.00   0.36   0.55
Story 2                          0.77   0.49   0.66   0.28   0.38
Difference Yi (Story 2 − Story 1) 0.37  −0.23   0.66  −0.08  −0.17
Ranks of |Yi|                    4      3      5      1      2
Signed ranks                     4     −3      5     −1     −2      →  V = 4 + 5 = 9
Part 4 – Statistical tests
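The value V = 9 from the example can be recomputed from the slide's own data; a pure-Python sketch:

```python
# Recomputing V for the story example above.
story1 = [0.40, 0.72, 0.00, 0.36, 0.55]
story2 = [0.77, 0.49, 0.66, 0.28, 0.38]
diffs = [round(b - a, 2) for a, b in zip(story1, story2)]  # Yi = Story2 − Story1

# rank the absolute differences (no ties in this example)
order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
ranks = [0] * len(diffs)
for r, i in enumerate(order, start=1):
    ranks[i] = r

V = sum(r for r, d in zip(ranks, diffs) if d > 0)  # sum of positive ranks
```

This reproduces the signed ranks 4, −3, 5, −1, −2 and V = 9 from the table.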
Models for matched pairs
For comparing categorical responses for 2 samples when each sample has the same subject or when a natural pairing exists between each subject in one sample and a subject from the other sample.
McNemar test compares proportions in paired studies
H0 : π1+ = π+1
HA : π1+ ≠ π+1
Part 4 – Statistical tests 76
          After
Before    Yes    No     Total
Yes       n11    n12    n1+
No        n21    n22    n2+
Total     n+1    n+2    n
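The McNemar statistic uses only the discordant pairs of the table above; a hedged sketch with hypothetical counts (the statistic below is the standard large-sample version, not stated explicitly on the slide):

```python
# Sketch of the McNemar statistic (counts are hypothetical).
n12, n21 = 5, 15                            # discordant pairs: Yes→No and No→Yes
mcnemar = (n12 - n21) ** 2 / (n12 + n21)    # ~ χ² with 1 df under H0
```

The concordant cells n11 and n22 do not enter the statistic at all.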
Two independent samples · Parametric statistics · Non-parametric statistics · Categorical data – Proportions
Independent sample t-test
to verify whether the mean of a continuous measurement is the same in 2 independent populations
H0 : μ1 = μ2 versus HA : μ1 ≠ μ2
Test statistic
equal measurement variance in the 2 groups
unequal measurement variance in the 2 groups
Assumptions
Independent observations
Normally distributed observations or large sample in each group
Small but equal sample sizes (n1 = n2 = 5) and comparable distribution shapes → we can still rely on t-test procedures
77
Part 4 – Statistical tests
Independent sample t-test – continued
Equal measurement variance in the 2 groups: SE of the mean difference can be estimated as SE = Sp·√(1/n1 + 1/n2)
with pooled variance Sp² = [(n1 − 1)S1² + (n2 − 1)S2²] / (n1 + n2 − 2)
Unequal measurement variance in the 2 groups: SE of the mean difference can be estimated as SE = √(S1²/n1 + S2²/n2)
(1 − α)100% confidence interval for μ1 − μ2: (X̄1 − X̄2) ± t(df; 1 − α/2)·SE
Part 4 – Statistical tests 78
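The equal-variance (pooled) version of the statistic can be sketched directly from these formulas; the data are made up:

```python
# Sketch of the pooled independent-sample t statistic (data are hypothetical).
import math

def pooled_t(x1, x2):
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    s1 = sum((x - m1) ** 2 for x in x1) / (n1 - 1)            # sample variances
    s2 = sum((x - m2) ** 2 for x in x2) / (n2 - 1)
    sp2 = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)     # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2                        # statistic and df

t, df = pooled_t([1, 2, 3], [2, 4, 6])
```

For the unequal-variance version, se would instead be √(s1/n1 + s2/n2) with Welch-approximated degrees of freedom.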
Mann-Whitney (U) test, Wilcoxon rank-sum test
Compare 2 independent samples
H0 : F1(x) = F2(x) for all x
HA : P(X1 < X2) ≠ ½
where X1 and X2 have distributions F1 and F2, respectively.
If X1 and X2 are continuous random variables, the test may be thought of as testing the null hypothesis that the probability of an observation from one population exceeding an observation from the second population is 0.5, this implies
P(X1 < X2) = P(X1 > X2) = ½
→ test statistics based on this principle
Part 4 – Statistical tests 79
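The pairwise-comparison principle above can be sketched as a direct count over all pairs (one common formulation of the U statistic; ties contribute 1/2); the data are made up:

```python
# Sketch of the Mann-Whitney U statistic as a count over pairs
# (data are hypothetical).
def mann_whitney_u(x1, x2):
    u = 0.0
    for a in x1:
        for b in x2:
            if a > b:
                u += 1      # pair where an observation from sample 1 is larger
            elif a == b:
                u += 0.5    # ties contribute one half
    return u

U = mann_whitney_u([1, 3, 5], [2, 4])
```

Under H0, U is expected to be about n1·n2 / 2, reflecting P(X1 > X2) = ½.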
Is the Wilcoxon rank-sum test the nonparametric alternative for the independent-sample t-test?
Remember: H0 : F1(x) = F2(x) for all x (the 2 distributions are equal)
HA : P(X1 < X2) ≠ ½
→ the ranks cannot be used to estimate the mean!
Independent sample t-test
H0 : μ1 = μ2
HA : μ1 ≠ μ2
Part 4 – Statistical tests 80
2x2 contingency tables
Example: Patient characteristics at the onset of first-line treatment with gefitinib or chemotherapy
81
Frequencies
             ECOG PS
Treatment    < 2     ≥ 2     Total
Gefitinib    70      17      87
Chemo        57      4       61
Total        127     21      148

Conditional distribution of ECOG PS status given treatment
             ECOG PS
Treatment    < 2     ≥ 2     Total
Gefitinib    0.805   0.195   1.00
Chemo        0.934   0.066   1.00
Two variables are said to be statistically independent if the conditional distributions of Y (Eastern Cooperative Oncology Performance status) are identical at each level of X (treatment)
Part 4 – Statistical tests
Testing independence - Pearson chi-square test
H0 : πij = πi+·π+j for all i and j (equivalently, for expected frequencies: μij = n·πi+·π+j)
HA : πij ≠ πi+·π+j for at least one cell
Statistic: X² = Σij (nij − μ̂ij)² / μ̂ij, with estimated expected counts μ̂ij = ni+·n+j / n
Example
X² = 4.964, df = 1: ECOG PS status and treatment are significantly associated. The proportion of patients with a poor ECOG performance status (≥ 2) was higher in the first-line gefitinib group (20%) than in the first-line chemotherapy group (7%); P = 0.026.
82
Part 4 – Statistical tests
Testing independence – Fisher’s exact test
For small samples, Fisher’s exact test: assumes that the row and column totals are fixed (hypergeometric distribution). When this assumption is not met (most cases), Fisher’s exact test is very conservative, resulting in a Type I error rate below 0.05.
H0 : θ = 1
HA : θ ≠ 1
Part 6 – Categorical data 83
Treatment    Adeno   Nonadeno   Total
Gefitinib    85      2          87
Chemo        58      3          61
Total        143     5          148
Two-sided p-values:Fisher’s exact test p = 0.403Chi-square test p=0.385
Large samples
In case of very large sample sizes the Pearson chi-square test will reject almost any null hypothesis, even if the deviation of the observed from the expected counts is of little practical importance → use the Gini index (the value equals the proportion of observations that would have to be moved from one cell to another in order for the observed counts to equal the expected counts)
Small samples
Inferences based on the chi-square distribution become questionable when the expected counts in some cells become too small (below 5), even when the total sample size is large → use exact solutions (Fisher’s exact test)
Part 6 – Categorical data 84
≥ two independent samples · Parametric statistics · Non-parametric statistics · Categorical data – Proportions
One-way analysis of variance (ANOVA)
to verify whether the mean of a continuous measurement is the same in 2 or more independent populations
H0 : μ1 = μ2 = … = μk versus
HA : at least 1 of the population means differs
Test statistic
Assumptions
Independent observations
Normally distributed observations or large sample within each group (Q-Q plots)
Equal variance in each group (boxplots or Levene’s test)
85
F = MSE_Between / MSE_Within ~ F(k−1, n−k) under H0
Part 4 – Statistical tests
ANOVA principle
Is variation between groups large as compared to variation within groups
86
Σi Σj (Yij − Ȳ)² = Σi Σj (Yij − Ȳi)² + Σi Σj (Ȳi − Ȳ)²
(sums over groups i = 1, …, k and observations j = 1, …, ni)
Total Sum of Squares = within SS + between SS
Consider k groups with each ni observations with jth observation in ith group
Part 4 – Statistical tests
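The decomposition above, and the F statistic built from it, can be sketched with made-up data:

```python
# Sketch of the ANOVA sum-of-squares decomposition and F statistic
# (data are hypothetical).
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
k = len(groups)
n = sum(len(g) for g in groups)
grand = sum(sum(g) for g in groups) / n

ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)     # between SS
ssw = sum(sum((y - sum(g) / len(g)) ** 2 for y in g) for g in groups)  # within SS
sst = sum((y - grand) ** 2 for g in groups for y in g)                 # total SS

F = (ssb / (k - 1)) / (ssw / (n - k))   # compared against F(k−1, n−k)
```

Note that sst equals ssb + ssw, which is exactly the decomposition on this slide.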
87
ANOVA Table

Source    Sum of Squares (SS)          df     Mean Squared Error (MSE)   F
Between   SSB = Σi Σj (Ȳi − Ȳ)²        k−1    MSE_B = SSB / (k−1)        MSE_B / MSE_W
Within    SSW = Σi Σj (Yij − Ȳi)²      n−k    MSE_W = SSW / (n−k)
Total     SST = Σi Σj (Yij − Ȳ)²
Part 4 – Statistical tests
Deviations from the assumptions
one-way analysis of variance is robust against lack of normality
→ in case of important deviations from a normal distribution : use nonparametric Kruskal-Wallis test or transformations
ANOVA is not very sensitive to the assumption of homogeneity of variances (perform Levene’s test at the 1% significance level)
→ heterogeneity of variances
• little impact when the group level sample sizes ≈ equal: Type I error rate is slightly increased
• with important heterogeneity and markedly ≠ group level sample sizes, weighted least squares regression may be used, weighting each observation by the inverse group level standard deviation
Part 4 – Statistical tests 88
Post-hoc analysis
if ANOVA detects no difference, we conclude that there is insufficient evidence of a difference in means
if ANOVA detects a difference → post-hoc analysis to investigate where the difference is
DO NOT perform all pairwise comparisons using independent samples t-tests → multiple testing problem
Assume we perform 3 different t-tests, each conducted with α = 0.05
The probability that every test correctly retains H0 = (0.95)³ = 0.857 (assuming independence of the tests) → the probability that at least one of the three tests leads to conclusion HA when H0 holds in each case is 1 − 0.857 = 0.143 (not 0.05).
The level of significance and power for a family of tests ≠ individual test
Part 4 – Statistical tests 89
Family-wise error rate - αE
The probability of making at least 1 false discovery (type I errors) among all the hypotheses when performing multiple pairwise tests
→ We should correct for the risk of false detections
most procedures for multiple testing are designed to control the risk of at least 1 false detection at αE, assuming that all k null hypotheses are true
when the k tests are independent, each with significance level α, then
αE = P(at least 1 Type I error) = 1 − (1 − α)^k ≈ k·α
family-wise error rate increases with the number of tests
Part 4 – Statistical tests 90
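How fast the family-wise error rate grows, and how it compares to the kα approximation, can be checked numerically (a small sketch):

```python
# Check of αE = 1 − (1 − α)^k against the kα approximation (sketch).
alpha = 0.05
rows = [(k, 1 - (1 - alpha) ** k, k * alpha) for k in (1, 3, 10)]
# each row: (number of tests k, exact family-wise error rate, kα approximation)
```

The approximation is tight for small k but overstates the rate as k grows (at k = 10, exact ≈ 0.401 versus kα = 0.5).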
Multiple comparison procedures that control family-wise error rate
Bonferroni procedure
Conservative test: makes fewer Type I errors than allowed for (and thus more Type II errors)
Only applicable when the effects to be investigated are identified in advance of the data analysis
Tukey procedure
Preferred method when only pairwise comparisons are to be made
Scheffé procedure
Preferred method when the family of interest is a set of all possible contrasts among the factor level means
Part 4 – Statistical tests 91
Rules of thumb
never interpret a large p-value as indicating absence of association
never interpret a small p-value as indicating an important association
report p-values in combination with an effect estimate and confidence interval! This allows for judging whether the effect is practically significant.
in some cases, it may be advisable to determine equivalence intervals prior to data analysis
Part 4 – Statistical tests 92
> two independent samples · Parametric statistics · Non-parametric statistics · Categorical data – Proportions
Kruskal-Wallis rank test
k-sample problem, compare more than 2 independent samples
H0 : F1(x) = F2(x) = … = Fk(x) for all x
HA : P(Xi < Xj) ≠ ½ for some i, j — the observations in some populations are systematically larger than in other populations
Assumptions
the observations in each group come from populations with the same shape of distribution
Part 4 – Statistical tests 93
Kruskal-Wallis rank test
the rank test statistic is basically an MSEbetween based on the ranks
rank all observations in the combined sample
let Rij denote the rank Xij (i =1, …, k, j =1, …, ni)
Kruskal-Wallis test statistic: K = [12 / (N(N+1))] Σi ni (R̄i − (N+1)/2)², with R̄i the average of the ranks Rij (j = 1, …, ni) in the ith group and N the total number of observations
Part 4 – Statistical tests 94
Kruskal-Wallis rank test
when H0 is rejected → at least 2 means are different → pairwise comparisons Wilcoxon rank sum statistic or Mann-Whitney statistic: alternative hypothesis in terms of probabilities: HA : P(X1 > X2) …
Family-wise error rate – αE → we should correct for the risk of false detections, Bonferroni correction: when m tests must be performed simultaneously, each of the tests must be performed at α = αE / m
equivalently: multiply each p-value by m before interpreting
Part 4 – Statistical tests 95
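The Bonferroni correction described above is a one-liner; a sketch (the p-values are made up):

```python
# Sketch of the Bonferroni adjustment: multiply each p-value by the
# number of tests m, capped at 1 (p-values here are hypothetical).
def bonferroni(pvalues):
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

adjusted = bonferroni([0.01, 0.04, 0.2])
```

Each adjusted p-value is then compared to αE directly.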
≥ two independent samples, controlling for covariates · Parametric statistics · Non-parametric statistics · Categorical data – Proportions
Analysis of Covariance - ANCOVA
Adjustment for a confounder (e.g. age)
Just like in ANOVA we have a treatment effect (consider for example 3 treatments)
We add the variable age to our model → adjustment for a confounder
Part 4 – Statistical tests 96
Breslow-Day test · Cochran-Mantel-Haenszel test
Three-way contingency tables
In studying the effect of an explanatory variable X on a response variable Y, one should control for covariates that can influence that relationship
Example: Peginterferon alfa for hepatitis C
Part 4 – Statistical tests 97
                       Virologic Response
Genotype   Treatment   Yes    No
1          A           138    160
           B           103    182
2          A           106    34
           B           88     57
Total      A           244    194
           B           191    239

Conditional odds ratio θ1 (genotype 1), conditional odds ratio θ2 (genotype 2), marginal odds ratio (Total)
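The conditional and marginal odds ratios can be computed directly from the counts in the table above (a small sketch; the function name is mine):

```python
# Conditional and marginal odds ratios from the hepatitis C table above.
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

or_genotype1 = odds_ratio(138, 160, 103, 182)   # conditional θ1 (genotype 1)
or_genotype2 = odds_ratio(106, 34, 88, 57)      # conditional θ2 (genotype 2)
or_marginal = odds_ratio(244, 194, 191, 239)    # marginal (Total rows)
```

Comparing θ1 and θ2 is what the Breslow-Day test on the next slide formalises.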
Breslow-Day test for testing homogeneity of odds ratios
The odds ratio between X and Y is the same in the different Z categories. It is a test of homogeneous association.
Part 4 – Statistical tests 98
Cochran-Mantel-Haenszel Test of conditional independence
Conditional XY independence given Z in a 2 × 2 × K table.
The response is conditionally independent of the treatment in any given stratum
Inappropriate when the association varies dramatically among the partial tables
Part 4 – Statistical tests 99
Cochran-Mantel-Haenszel Test of conditional independence
Example colon cancer: ECOG PS-adjusted OR = 1.52 (95% CI 0.98-2.36; p = 0.064, CMH test), indicating that the response is independent of the treatment within the ECOG PS strata.
Part 4 – Statistical tests 100
Source: Bokemeyer et al., 2008 (M&M and p. 667, Efficacy)
                         Response
ECOG PS    Treatment     Yes    No
0          Cet. + FOL    –      –
           FOLFOX-4      –      –
1          Cet. + FOL    –      –
           FOLFOX-4      –      –
2          Cet. + FOL    –      –
           FOLFOX-4      –      –
Total      Cet. + FOL    77     92
           FOLFOX-4      60     108

Conditional odds ratios θ1, θ2, θ3 (one per ECOG PS stratum); marginal odds ratio = 1.51