Introduction to statistics
Els Adriaens, PhD
December 17, 2010

Basics statistics


DESCRIPTION

This is the presentation of the BITS training session on "Essential statistics". View more material on http://www.bits.vib.be/index.php?option=com_content&view=article&id=17203865:essential-statistics&catid=81:training-pages&Itemid=190


Page 1: Basics statistics

Introduction to statistics

Els Adriaens, PhD

December 17, 2010 1

Page 2: Basics statistics

Overview

Outline

Formulate a relevant research question

Study design

Gather the data according to the plan

Analyze the data

Explorative data-analyses (descriptives, graphically)

Drawing inference (answer our research question with certain confidence)

Report the results

Overview 2

Page 3: Basics statistics

Experimental versus observational studies / Design of an experimental study / Overview of study designs

Experimental study / Observational study / Mixed experimental and observational studies

Part 1 – Design of a study 3

Part 1

Design of a study

Page 4: Basics statistics


Experimental study

Factor levels (treatments) randomly assigned over the different experimental units (control over explanatory variable)

→ information about the cause-and-effect relationship between the explanatory factors and a response variable

Example: Effect of Vitamin C on prevention of colds in 800 children. Half of the children were selected at random and received Vit C (treatment group); the remaining children received a placebo (control group)

Qualitative explanatory factor with two levels and children as experimental units

4Part 1 – Design of a study

Page 5: Basics statistics


Observational study

Data obtained from non-experimental study: explanatory variables not controlled, randomization of the treatments to experimental units does not occur

→ establish associations between the explanatory factors and a response variable

Example: Company officials wished to study the relation between the age of an employee and the number of days of illness in a year.

Explanatory variable not controlled → age is observed

Establish associations but no cause-and-effect: a positive relation between age and number of days of illness need not imply that the number of days of illness is a direct result of age. For example, if younger employees work indoors while older employees usually work outdoors, work location rather than age may be responsible for the number of days of illness

5Part 1 – Design of a study

Page 6: Basics statistics


Mixed studies

Example: a clinical trial performed in 3 hospital centers; at each center the effect of a drug on lowering blood cholesterol was investigated. Within each hospital center volunteers were randomly assigned to one of two treatments (drug / placebo)

Experimental factor: treatment (drug versus placebo)

Observational factor: hospital center, not randomly assigned since each volunteer was assigned to the nearest hospital center

6Part 1 – Design of a study

Page 7: Basics statistics

Factors and treatments / Randomization / Sampling from a population / Measurements

Structure of the experiment

2 levels of factor A x 3 levels of factor B = 6 treatments

experimental unit: smallest unit of experimental material to which a treatment can be assigned, the experimental unit is determined by the method of randomization

7

Factor B → | Level 1 | Level 2 | Level 3
Factor A, Level 1 | 1 | 2 | 3
Factor A, Level 2 | 4 | 5 | 6

Experimental unit: one cell of the table. Replicates = treatment repeated → estimate experimental error

Part 1 – Design of a study

Page 8: Basics statistics

Number of factors: initial stages of investigation → include many factors (more than can possibly be studied in a single experiment)

Cause-and-effect diagrams are often used to identify factors that could affect the outcome → reduce number of factors

Example: 4 factors, each with 2 levels → 2⁴ = 16 treatment combinations

Number of levels of each factor:

Qualitative factors

Quantitative factors: # levels reflect the type of trend expected by the experimenter

• 2 levels ~ linear change in response: min – max of specified range

• 3 levels ~ quadratic trend

• 4 or more levels ~ detailed examination of the shape of the response curve desired

Range of factor is one of the most important design decisions

8


Part 1 – Design of a study

Page 9: Basics statistics


Measurements: precision versus accuracy

Precision of a variable: the degree to which a variable has nearly the same value when measured several times. It is a function of random error (chance) and is assessed as the reproducibility of repeated measurements.

Example: weigh the same person 3 times on an electronic balance and obtain slightly different measurements – 67.5 kg, 67.4 kg and 67.6 kg

The more precise a measurement, the greater the statistical power at a given sample size to estimate mean values and to test hypotheses

Variability may be due to operator, instrument and subject

Minimize random error and improve precision

Operating manuals, training the operator, refining / automating instruments

Repeat the measurement and average over a larger number of observations (but! added cost, practical difficulties)

9Part 1 – Design of a study

Page 10: Basics statistics


Accuracy of a variable: the degree to which a variable actually represents what it is supposed to represent. It is a function of systematic error (bias) which is often difficult to detect and has important influence on the validity of the result.

Example 1: incorrect calibration of an instrument

Example 2: gastric freezing as a treatment for ulcers in the upper part of the intestine

Improve accuracy and minimize bias

Operating manuals, training the operator, refining / automating instruments

Periodic calibration using a gold standard (example 1)

Blinding: in a double-blind study neither the experimental subject nor the evaluator knows which treatment is received or given; any inaccuracy in measuring the outcome will be the same in the 2 groups (example 2)

10Part 1 – Design of a study

Page 11: Basics statistics


Bias and variance in shooting arrows at a target. Bias means that the archer systematically misses in the same direction. Variance means that the arrows are scattered (Moore and McCabe 2002)

11Part 1 – Design of a study

Page 12: Basics statistics


Sampling from a population

Simple random sample

12

[Diagram: random draws with equal probability from the population (N elements) yield the sample (n elements)]

Part 1 – Design of a study

Page 13: Basics statistics


Randomization → treatments are randomly assigned to experimental units

Tends to eliminate the influence of extraneous factors not under direct control of the experimenter

Blocking → increase precision by taking into account other factors

13

Subjects (heterogeneous)

Males (homogeneous) → randomization → Group 1 → treatment 1 / Group 2 → treatment 2 / Group 3 → treatment 3

Females (homogeneous) → randomization → Group 1 → treatment 1 / Group 2 → treatment 2 / Group 3 → treatment 3

Part 1 – Design of a study

Page 14: Basics statistics


Stratified Sampling

Suppose we want to know the attitudes of male and female students in the engineering school

Is a simple random sample from that school a good idea?

No: too few women (10%)

Stratify the sample, pick a random sample from

Stratum 1: female engineers

Stratum 2: male engineers

Estimates are measured with comparable precision. Learn from the distribution in each stratum; do NOT pool the data

E.g. if the average weight is 60 kg for the women and 80 kg for the men, the average engineer will weigh 10% × 60 + 90% × 80 = 78 kg
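The stratum-weighted average can be checked with a one-line computation (a minimal sketch; the proportions and means are the slide's example values):

```python
# Weighted mean across strata: (proportion in population, stratum mean).
# 10% women averaging 60 kg, 90% men averaging 80 kg (example from the text).
strata = [(0.10, 60.0), (0.90, 80.0)]
overall = sum(weight * mean for weight, mean in strata)
print(overall)  # 78.0
```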

14Part 1 – Design of a study

Page 15: Basics statistics

Types of variables / Univariate descriptives / Bivariate descriptives

Part 2 – Explorative data-analysis 15

Part 2

Explorative data-analysis

Page 16: Basics statistics


Descriptive statistics

Allows the researcher to describe or summarize the data. This is typically done in the beginning of a results section. The researcher gives an idea of the sample size, the characteristics under study (e.g. baseline characteristics in a clinical trial)

Example: A total of 235 students participated in this study, 163 women (69.4%) versus 72 men (30.6%). On average the female students (81.3 ± 19.4) had a slightly higher score on exam 2 in comparison to the male students (80.7 ±18.1).

16Part 2 – Explorative data-analysis

Page 17: Basics statistics


We typically start with univariate explorations (one variable at a time). Next, describe joint distributions (2 by 2 = bivariate; more variables = multivariate)

Graphical summary to inspect the shape of the distribution: symmetry, modality, heaviness of tails

Numerical summary: classical measures of location and spread

Mean and standard deviation

Median and interquartile range

Mode: value that occurs most often (useful for nominal data)

17Part 2 – Explorative data-analysis

Page 18: Basics statistics


Notes on notation

A random variable X is a variable whose value is a numerical outcome of a random phenomenon (nonnumerical outcomes are numerically encoded)

Random variables are usually denoted by capital letters such as X, Y, …

Fixed constants or observed values are usually denoted by small letters e.g. x, y. Special constants (to be specified) will be written as Greek letters α, β, μ, σ

indices i will subscript random or observed outcomes for individual observations in the data set: Yi , yi

18Part 2 – Explorative data-analysis

Page 19: Basics statistics


19Part 2 – Explorative data-analysis

Type | Characteristic | Example | Descriptive statistic | Information content

Categorical (the set of all possible values can be enumerated):
• Nominal | Unordered categories | Gender, race | Counts, proportions | Lower
• Ordinal | Ordered categories | Degree of pain | Median | Intermediate

Continuous or ordered discrete | can take all possible values within some interval of real numbers (continuous) or limited to integers (discrete) | Weight, number of cigarettes per day | Mean, standard deviation | Higher

Page 20: Basics statistics

Histogram – Boxplot / Measures of location (center) / Measures of spread / Normal curve

Mean of a series of observations xi, i = 1, 2, …, n

Properties given that X and Y are random variables and ‘a’ is a scalar

Median (M): middle of the distribution such that at least 50% of the outcomes is larger than or equal to M and at least 50% of the outcomes is smaller than or equal to M

For n odd: this is the middle value in order of magnitude

For n even: one will take the average of the two middle values
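In Python, the odd/even rule reads as follows (an illustrative helper, not part of the original slides):

```python
def median(values):
    """Middle of the distribution: middle value for odd n,
    average of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the two

print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
```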

20Part 2 – Explorative data-analysis

x̄ = (1/n) · Σ_{i=1}^{n} x_i

μ_{X+Y} = μ_X + μ_Y and μ_{a+bX} = a + b·μ_X

Page 21: Basics statistics

Types of variablesUnivariate descriptivesBivariate descriptives

Histogram – BoxplotMeasures for location centerMeasures of spread

Normal curve

Mean is very sensitive to outliers

21

Numbers of partners desired in the next 30 years (Miller and Fishkin, 1997)

Part 2 – Explorative data-analysis

Page 22: Basics statistics

Types of variablesUnivariate descriptivesBivariate descriptives

Histogram – BoxplotMeasures for location centerMeasures of spread

Normal curve

Standard deviation of a series of observed values xi

When the variable is approximately normally distributed, approximately 95% of the data will lie between x̄ − 1.96·SD(x) and x̄ + 1.96·SD(x)

The square of the SD is called the variance, Var(x)

Variation coefficient

22

SD(x) = √( (1/(n − 1)) · Σ_{i=1}^{n} (x_i − x̄)² )

CV = SD(x) / x̄ × 100%

95% range: x̄ − 1.96·SD(x) to x̄ + 1.96·SD(x)
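A sketch of these formulas in Python (assuming, as is standard for sample data, the n − 1 divisor):

```python
import math

def sample_sd(xs):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

weights = [67.5, 67.4, 67.6]   # the repeated weighings from the precision example
xbar = sum(weights) / len(weights)
s = sample_sd(weights)         # 0.1
cv = s / xbar * 100            # variation coefficient, in percent
lo, hi = xbar - 1.96 * s, xbar + 1.96 * s  # ~95% range if approximately normal
```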

Part 2 – Explorative data-analysis

Page 23: Basics statistics


Interquartile range (IQR): distance Q3 – Q1 with

Q1: a value such that at least 25% of the outcomes are smaller than or equal to Q1 and at least 75% are larger than or equal to Q1

Q3: a value such that at least 75% of the outcomes are smaller than or equal to Q3 and at least 25% are larger than or equal to Q3

If more than one value satisfies this criterion, the average is usually taken

23Part 2 – Explorative data-analysis

Page 24: Basics statistics


Five number summary: Min, Q1, Median, Q3, Max

24

[Boxplot of birth weight: box from Q1 to Q3 (length = IQR), median line inside the box, whiskers reaching to the largest observation within a distance of 1.5 × IQR]

Part 2 – Explorative data-analysis

Page 25: Basics statistics


Bar diagram for continuous data – relative or absolute frequencies

25

[Histogram of birth weight; vertical axis: percentage]

Part 2 – Explorative data-analysis

Page 26: Basics statistics


Normal distribution

Density

μ is the population mean

σ² is the population variance

Notation X ~ N(μ, σ²)

If X ~ N(μ, σ²), then Z = (X − μ)/σ ~ N(0, 1) is a standard normal distribution

26

φ(x) = (1/(σ√(2π))) · e^(−(x − μ)²/(2σ²))

Z = (X − μ)/σ

Part 2 – Explorative data-analysis

Page 27: Basics statistics


Properties of the standard normal distribution N(0, 1)

unimodal: 1 maximum (i.e. 0)

symmetric around 0

68-95-99.7 rule:

• 68% of the area under the curve (AUC) lies between -1 and 1, 68% of the observations fall within 1 SD of the mean μ

• 95% of the AUC lies between -2 and 2, 95% of the observations fall within 2 SD of the mean μ

• 99.7% of the AUC lies between -3 and 3, 99.7% of the observations fall within 3 SD of the mean μ
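The rule can be checked with the standard library's NormalDist (illustration only):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal N(0, 1)
for k in (1, 2, 3):
    area = z.cdf(k) - z.cdf(-k)  # AUC between -k and +k
    print(f"within {k} SD: {area:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```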

27Part 2 – Explorative data-analysis

Page 28: Basics statistics


Normal quantile plot

Compares two distributions by plotting their quantiles against each other

If the observed and the normal distribution are identical, points are expected to lie on a straight line with intercept 0 and slope 1

Distributions with the same shape but simply rescaled or shifted still show up on a straight line but with different intercept (shift) or slope (scale change)

28

Normal Q-Q plot of randomly generated N(0, 1) data and of randomly generated exponential data

Part 2 – Explorative data-analysis

Page 29: Basics statistics

Continuous data / Categorical data

Bivariate relations – continuous data

Graphical: boxplots, (stacked) histograms, scatter plots

Correlation coefficient (r):

Takes values between -1 and 1

Pearson correlation coefficient

expresses a degree of linear dependence

29

r = (1/(n − 1)) · Σ_{i=1}^{n} ((x_i − x̄)/SD(x)) · ((y_i − ȳ)/SD(y))

Source: Wikipedia, Anscombe's quartet

r = 0.816 for each of the four data sets! A summary statistic cannot replace the individual examination of the data
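The formula can be verified on the first data set of Anscombe's quartet (`pearson_r` is an illustrative helper, not from the slides):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: mean product of standardized scores (n - 1 divisor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sum((x - mx) / sx * (y - my) / sy
               for x, y in zip(xs, ys)) / (n - 1)

# First data set of Anscombe's quartet
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
print(round(pearson_r(x, y), 3))  # 0.816
```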

Part 2 – Explorative data-analysis

Page 30: Basics statistics


Bivariate relations - Spearman’s Rank correlation (-1 and 1)

Measures of monotone association (extent to which as one variable increases, the other variable tends to increase or decrease)

No assumption on linearity

Ordinal variables

30Part 2 – Explorative data-analysis

Source: Answers.com

Page 31: Basics statistics


Bivariate relations - Spearman’s Rank correlation (-1 and 1)

31Part 2 – Explorative data-analysis

Br J Ophthalmol 2001;85:534-536

Corneal irregular astigmatism after laser in situ keratomileusis for myopia

Spearman rank correlation rs = 0.440, p < 0.0001

http://geographyfieldwork.com/SpearmansRank.htm

Page 32: Basics statistics

2x2 associations – categorical data: comparing two proportions

Many studies are designed to compare two groups (X) on a binary response variable (Y)

32

X \ Y | Success | Failure

Group 1 | π1 | 1 − π1
Group 2 | π2 | 1 − π2

Example: is there an association between antiviral drug use (X) and pneumonia (Y).


Part 2 – Explorative data-analysis

π: probability of success

1-π: probability of failure

Pneumonia | Yes | No | Total

Antiviral drug | 579 | 45172 | 45751
Control | 648 | 45103 | 45751

Pneumonia, proportions | Yes | No | Total

Antiviral drug | 0.013 | 0.987 | 1
Control | 0.014 | 0.986 | 1

Page 33: Basics statistics

Risk difference: is there a difference between the group taking antiviral drug and the control group

π1 – π2 = 0.013 – 0.014 = -0.001

Properties

-1 ≤ (π1 - π2) ≤ 1

if response is independent of group, then (π1 - π2) = 0

A difference may be more important when both success probabilities are close to 0 or 1 than when both are close to 0.5

Example (p1-p2) = 0.09 (0.1-0.01=0.09) or (0.50-0.41=0.09)

In the first case, p1 is 10 times larger than p2 while in the second case p1 is only 1.2 times larger than p2.

33Part 2 – Explorative data-analysis


Page 34: Basics statistics

Relative risk: ratio of the success probabilities of the 2 groups

Properties

0 ≤ (π1/π2) < ∞

if response is independent of group, then (π1/ π2) = 1

Antiviral drug example

(p1/p2) = (579/45751)/(648/45751) = 579/648 = 0.894 with 95% CI: 0.799, 0.999

The sample proportion of pneumonia cases was 10.6% lower for the group prescribed antiviral drug. The upper CI limit of the relative risk indicates that the risk of pneumonia is at least 0.1% lower for the group prescribed antiviral drug.
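These numbers can be reproduced from the table counts. The slide does not show how the CI was computed; the log-scale large-sample interval below is an assumption that matches the reported 0.799, 0.999:

```python
import math

def relative_risk_ci(a, n1, c, n2, z=1.96):
    """Relative risk (a/n1)/(c/n2) with a 95% CI built on the log scale
    (standard large-sample formula; assumed, not given on the slide)."""
    rr = (a / n1) / (c / n2)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    return (rr,
            math.exp(math.log(rr) - z * se),
            math.exp(math.log(rr) + z * se))

rr, lo, hi = relative_risk_ci(579, 45751, 648, 45751)  # antiviral vs control
print(round(rr, 3), round(lo, 3), round(hi, 3))        # 0.894 0.799 0.999
```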

34Part 2 – Explorative data-analysis


Page 35: Basics statistics

Odds ratio

For a probability π of success, the odds are defined to be Ω = π / (1 − π)

Odds ≥ 0 with values > 1 when a success is more likely than a failure. For example, if π = .75, then the odds of success = .75/.25 = 3.0: a success is three times as likely as a failure. If Ω = 1/3, a failure is three times as likely as a success.

The ratio of the odds Ω1 and Ω2 in the two rows is called the odds ratio: θ = Ω1 / Ω2

Properties odds ratio

0 ≤ θ < ∞

When X and Y are independent, then θ = 1

the odds ratio does not change value when the orientation of the table reverses (rows become columns, columns become rows)

35Part 2 – Explorative data-analysis


Page 36: Basics statistics

Odds ratio - continued

Properties

if θ = 4, the odds of success in row 1 are 4 times the odds in row 2, and thus subjects in row 1 are more likely to have success than are subjects in row 2

θ = 4 does not mean that the probability π1 is four times π2 (that would be the interpretation of relative risk)

the odds ratio does not change when both cell counts within any row (or column, but not both) are multiplied by a nonzero constant; this implies that the odds ratio does not depend on the marginal counts within a row/column

36Part 2 – Explorative data-analysis


Page 37: Basics statistics

Odds ratio - Example

The sample odds ratio is computed from the cell counts as θ̂ = (n11 × n22) / (n12 × n21)

For the patients prescribed antiviral drug, the estimated odds of pneumonia are 579/45172 ≈ 0.013. There were about 1.3 pneumonia cases for every 100 cases with no pneumonia.

The sample odds ratio = (579 × 45103)/(648 × 45172) = 0.892 (95% CI: 0.797, 0.999). The estimated odds for patients prescribed antiviral drug equals 0.892 times the estimated odds for patients in the control group. The estimated odds were 10.8% lower for the antiviral drug group.
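A sketch reproducing the sample odds ratio and its interval; the Woolf log-scale CI is assumed, as the slide only reports the resulting interval:

```python
import math

def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Sample odds ratio (n11*n22)/(n12*n21) with a log-scale 95% CI
    (Woolf's formula; an assumption, not spelled out on the slide)."""
    theta = (n11 * n22) / (n12 * n21)
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    return (theta,
            math.exp(math.log(theta) - z * se),
            math.exp(math.log(theta) + z * se))

theta, lo, hi = odds_ratio_ci(579, 45172, 648, 45103)  # pneumonia table
print(round(theta, 3), round(lo, 3), round(hi, 3))     # 0.892 0.797 0.999
```

The same helper reproduces the lung-cancer case-control odds ratio of 2.97 from the later slide.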

37Part 2 – Explorative data-analysis


Pneumonia | Yes | No | Total

Antiviral drug | 579 | 45172 | 45751
Control | 648 | 45103 | 45751

Page 38: Basics statistics

Relation between odds ratio and relative risk

When the proportion of successes is close to 0 for both groups, the sample odds ratio is similar to the sample relative risk. In such a case, an odds ratio of 0.89 means that the probability of success for the patients prescribed antiviral drug is about 0.89 times the probability of success for the patients in the control group

Relative risk = 0.894 (95% CI: 0.799, 0.999)

Odds ratio = 0.892 (95% CI: 0.797, 0.999)

38Part 2 – Explorative data-analysis


Page 39: Basics statistics

What should be used: risk difference, relative risk or odds ratio?

The odds ratio is the preferred estimate

In a case-control study it is usually not possible to estimate the probability of an outcome given X (π1), and therefore it is also not possible to estimate the difference of proportions or relative risk for that outcome

In a retrospective study, 709 patients with lung cancer (cases) were queried about their smoking behavior (X). Each case was matched with a control patient: same age, same gender, same hospital, but no lung cancer

Odds ratio = 2.97: the estimated odds of lung cancer for smokers were 2.97 times the estimated odds for non-smokers

39Part 2 – Explorative data-analysis


Lung cancer | Cases | Controls

Smoker | 688 | 650
Non-smoker | 21 | 59
Total | 709 | 709

Page 40: Basics statistics

Part 3 – Statistical inference 40

Part 3

Statistical inference

Page 41: Basics statistics

Distributions / Bias and variance / Hypothesis testing

Statistical inference: by using the laws of probability, we infer conclusions about a population from data collected in a random sample

A parameter (μ, σ) is a number that describes the population. A parameter is a fixed number, but its value is unknown in practice.

A statistic (e.g. x̄, SD(x)) is a number that describes the sample. Its value is known when we have collected a sample, but it changes from sample to sample.

41

[Diagram: Population (N elements; parameters μ, σ) → random sample (n elements) → collect data → compute statistics x̄, SD(x) → make inferences about the population]

Part 3 – Statistical inference

Page 42: Basics statistics

The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

Binomial distribution

Poisson distribution

Normal distribution

42Part 3 – Statistical inference

Binomial distribution / Poisson distribution / Normal distribution

Page 43: Basics statistics

Binomial distribution

Fixed number of n independent observations

Each observation falls in one of two categories (success/failure)

The probability of success ‘p’ is the same for each observation

→ denote X the number of successes among the n observations which can take values 0, 1, …, n then X ~ B(n, p)

Properties: μ_X = n·p and σ²_X = n·p·(1 − p)

Probability mass function: P(X = k) = (n choose k) · p^k · (1 − p)^(n − k), k = 0, 1, …, n

43
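A minimal numerical check of these properties (the values of n and p are arbitrary illustrations):

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3            # arbitrary example values
mean = n * p              # mu_X = n p
var = n * p * (1 - p)     # sigma^2_X = n p (1 - p)
total = sum(binom_pmf(k, n, p) for k in range(n + 1))  # PMF sums to 1
```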

Part 3 – Statistical inference

Page 44: Basics statistics


Poisson distribution: expresses the number Y of events in a given unit of time, space, volume, or any other dimension

Example → modeling a phenomenon in which we are waiting for an occurrence (waiting for customers to arrive in a bank)

Basic assumption: for small time intervals, the probability of an occurrence is proportional to the length of waiting time

Single parameter λ >0, the average number of events per unit of measurement.

44

k = number of occurrences of an eventλ = expected number of occurrences that occur during the given interval

P(Y = k) = e^(−λ) · λ^k / k!

μ_Y = λ and σ²_Y = λ

Part 3 – Statistical inference

Page 45: Basics statistics


Normal distribution

density

X1, X2, …, Xn is a simple random sample with mean μ and variance σ²

if Xi ~ N(μ, σ²) then x̄ ~ N(μ, σ²/n)

Central limit theorem

Draw a simple random sample (X1,… , Xn) of size n from a population with mean μ and finite variance σ². When n is large, the sample average then follows approximately a normal distribution regardless of the data distribution.

x̄ ~ approximately N(μ, σ²/n)

φ(x) = (1/(σ√(2π))) · e^(−(x − μ)²/(2σ²))

45
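The central limit theorem can be illustrated by simulation (a sketch; the seed, sample size and number of repetitions are arbitrary choices):

```python
import random
import statistics

random.seed(42)  # arbitrary seed for reproducibility
n, reps = 50, 2000
# Means of size-n samples from a skewed exponential distribution (mu = sigma = 1)
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]
# The sample means cluster around mu = 1 with spread close to sigma / sqrt(n)
print(round(statistics.fmean(means), 2), round(statistics.stdev(means), 3))
```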

Part 3 – Statistical inference

Page 46: Basics statistics

Sampling variability / Standard deviation vs standard error / Confidence interval

Law of large numbers: the population mean μ of X is unknown. The mean x̄ of a simple random sample → estimate of μ.

x̄ is a random variable that varies in repeated sampling

The law of large numbers guarantees that as the sample size of a simple random sample increases, the sample mean gets closer to the population mean μ

Unbiased statistic: a statistic used to estimate an unknown parameter is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated.

Variability of a statistic is described by the spread of its sampling distribution.

Spread determined by sampling design and sample size. Larger samples have smaller spread.

46


Part 3 – Statistical inference

Page 47: Basics statistics


How precise is our estimate?

47

Sample → Population

Generalize findings to the general population: the estimate must approximate the population value

Representative sample → prevents the results for the sample from being biased

→ results are still subject to sampling variability: different samples from the same population will yield different results

Generalizing results from the sample to the study population then requires that we acknowledge sampling variability

Part 3 – Statistical inference

Page 48: Basics statistics


Standard deviation ≠ standard error

Standard error measures the uncertainty in an estimate (standard error of the mean = SEM)

Standard deviation (SD) of the observations → measures the variability in the observations

Both are standard deviations, but the standard error shrinks with increasing sample size, in contrast to the standard deviation of the observations

The mean and SD are the preferred summary statistics for (normally distributed) data, and the mean and 95% confidence interval are preferred for reporting an estimate and its measure of precision.
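The contrast can be demonstrated numerically (illustrative simulation; `sem` is a helper defined here, and the seed and sizes are arbitrary):

```python
import math
import random
import statistics

def sem(sample):
    """Standard error of the mean: SD of the observations / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

random.seed(1)  # arbitrary seed
for n in (10, 100, 1000):
    sample = [random.gauss(50, 10) for _ in range(n)]
    # The SD stays near 10 while the SEM shrinks roughly as 1 / sqrt(n)
    print(n, round(statistics.stdev(sample), 1), round(sem(sample), 2))
```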

48

[Figure: sampling distribution of the sample means x̄, centered at μ]

Part 3 – Statistical inference

Page 49: Basics statistics


Confidence intervals

When we estimate a parameter by calculating a sample statistic, there is a degree of uncertainty in our estimation

We can construct an interval around the sample mean within which we expect the true population mean μ with known probability (e.g. 95% chance)

(1-α)100% confidence interval for the mean contains the population mean with (1-α)100 % chance. Confidence level or coverage probability is (1-α)

49

σ known: x̄ ± z_{α/2} · σ/√n

σ unknown: x̄ ± t_{n−1, α/2} · s/√n
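A sketch of the σ-known interval with the standard library (the data values are made up for illustration; for σ unknown one would need a t quantile, which the standard library does not provide):

```python
import math
from statistics import NormalDist, fmean

def ci_sigma_known(xs, sigma, conf=0.95):
    """(1 - alpha)100% CI for the mean, sigma known: xbar +/- z * sigma / sqrt(n).
    With sigma unknown, z would be replaced by the t quantile with n - 1 df."""
    z = NormalDist().inv_cdf((1 + conf) / 2)  # 1.96 for a 95% interval
    half = z * sigma / math.sqrt(len(xs))
    xbar = fmean(xs)
    return xbar - half, xbar + half

# Made-up measurements, for illustration only
lo, hi = ci_sigma_known([4.9, 5.1, 5.0, 5.2, 4.8], sigma=0.2)
print(round(lo, 2), round(hi, 2))  # 4.82 5.18
```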

Part 3 – Statistical inference

Page 50: Basics statistics

Principle of statistical tests / p-value and power / one-sided versus two-sided testing

Hypothesis testing

The null hypothesis (Ho) assumes ‘no difference’ or ‘no effect’

The average … is equal in both treatment groups

The alternative hypothesis (HA) is claiming the opposite

The average … differs by treatment

50

Type of decision | H0 true | HA true

Accept H0 (p > α) | Correct decision (1 − α) | Type II error (β)
Reject H0 (p < α) | Type I error (α) | Correct decision (1 − β) = power

Part 3 – Statistical inference

Page 51: Basics statistics


We assume H0 is true unless we can demonstrate, based on sample data at the desired level of confidence, that HA is true.

→ level of confidence related to 2 potential types of statistical errors

• example: in a clinical trial we want to study the effect of an experimental drug (T) and compare it to a placebo (P)

H0 : effect of drug T = effect of P

HA : effect of drug T ≠ effect of P

Type I error (false positive): concern of the regulators, the drug is not working but it will go to the market

Type II error (false negative): concern of pharmaceutical companies, could not prove that the new drug is working

51Part 3 – Statistical inference

Page 52: Basics statistics


Sensitivity and specificity

52

Gold standard: | Positive (ill) | Negative (not-ill)

Test outcome positive | True Positive (TP) | False Positive (FP), Type I error (P-value)
Test outcome negative | False Negative (FN), Type II error | True Negative (TN)

Sensitivity: proportion of ill people identified as being ill

Specificity: proportion of non-ill people identified as non-ill

Part 3 – Statistical inference
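The two definitions translate directly into code (the counts below are hypothetical, for illustration only, not from the slides):

```python
def sensitivity(tp, fn):
    """Proportion of ill people that the test identifies as ill."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of non-ill people that the test identifies as non-ill."""
    return tn / (tn + fp)

# Hypothetical counts, for illustration only
tp, fp, fn, tn = 90, 5, 10, 95
print(sensitivity(tp, fn), specificity(tn, fp))  # 0.9 0.95
```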

Page 53: Basics statistics


When are hypotheses needed

Hypotheses are not needed in descriptive studies

If any of the following terms appears in the research question (i.e. the study is not simply descriptive), a hypothesis should be formulated: greater than, less than, causes, leads to, compared with, more likely than, associated with, related to, similar to, correlated with.

The hypothesis should be clearly stated in advance.

53Part 3 – Statistical inference

Page 54: Basics statistics


Principle of statistical testing

calculate a test statistic that measures the ‘distance’ from the observed sample to the null hypothesis and whose distribution is known under the null hypothesis

Reject H0 when

the test statistic t exceeds a chosen cut-off c (critical value) in magnitude, or equivalently

the p-value stays below a chosen cut-off α

safety principle: cut-off is chosen such that the risk of making a Type I error is controlled at a prespecified significance level α

Usually α = 0.05 (test performed at the 5% significance level)

the power of the test (the probability of avoiding a Type II error, 1−β) is not controlled → choose adequate designs and sufficiently large sample sizes

54Part 3 – Statistical inference

Page 55: Basics statistics


critical value c: reject H0 when the test statistic t exceeds the chosen cut-off c in magnitude

p-value: probability to find a result for the test statistic at least as extreme as the observed result (in the direction of the alternative hypothesis), if the null hypothesis holds

55

[Figure: distribution of the test statistic, with the acceptance region between the critical values cL and cR and rejection regions in the two tails, α = 0.05]

Part 3 – Statistical inference

Page 56: Basics statistics


Power: 1 − β = 1 − P(accept H0 | HA) = P(reject H0 | HA)

For many testing problems H0 is formulated very precisely, but there are usually an infinite number of distributions consistent with HA.

56Part 3 – Statistical inference

Standardized effect size: (μ1 − μ0) / σ

With what probability must the statistical test detect this smallest relevant difference? e.g. a ~91% chance of finding an association of that size or greater
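As a sketch of how power relates to the standardized effect size, the following uses a normal approximation to the two-sided two-sample test (scipy assumed available; the classic benchmark is that d = 0.5 with 64 subjects per group gives roughly 80% power):

```python
# Approximate power of a two-sided two-sample test for a
# standardized effect size d = (mu1 - mu0) / sigma (normal approximation).
from math import sqrt
from scipy.stats import norm

def approx_power(d, n_per_group, alpha=0.05):
    z_crit = norm.ppf(1 - alpha / 2)      # two-sided critical value
    shift = d * sqrt(n_per_group / 2)     # shift of the statistic under HA
    return norm.cdf(shift - z_crit)       # ignoring the negligible other tail

power = approx_power(0.5, 64)
print(round(power, 2))  # ~0.81
```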

Page 57: Basics statistics


One-sided versus two-sided testing

Decide prior to data analysis, and avoid one-sided tests unless there are really good reasons for using them (only one direction of the association is clinically or biologically relevant)

it is never wrong to use a two-sided test where a one-sided test is applicable

at most a slight loss of power

57

[Figure: rejection regions for two-sided versus one-sided testing]

Part 3 – Statistical inference

Page 58: Basics statistics


Multiple and Post Hoc Hypotheses - testing problem

Inflated rate of false positive conclusions (Type I error)

Assume we perform 3 independent comparisons between 2 groups, each conducted with α = 0.05

The probability that all three tests correctly retain H0 is (0.95)³ = 0.857 → the chance of finding at least one false-positive statistically significant test increases to 14.3% (1 − 0.857 = 0.143, not 0.05)

Adjusting for multiple hypotheses is especially important when the consequences of making a false positive error are large e.g. mistakenly concluding that an ineffective treatment is beneficial

Adjustments can be made, e.g. false discovery rate (FDR) control
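The inflation above can be checked directly; a minimal sketch:

```python
# Family-wise error rate for k independent tests, each at alpha = 0.05:
# P(at least one false positive) = 1 - (1 - alpha)^k.
alpha = 0.05
for k in (1, 3, 10):
    fwer = 1 - (1 - alpha) ** k
    print(k, round(fwer, 3))   # k = 3 reproduces the 0.143 above
```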

58Part 3 – Statistical inference

Page 59: Basics statistics

Part 4 – Statistical tests 59

Part 4

Statistical tests

Page 60: Basics statistics

Continuous data

Parametric statistics

Non-parametric statistics

Categorical data

Ordinal versus nominal

Types of testing

One-sample tests

Two dependent groups

Two independent groups

More than two groups

Controlling for covariates

60Part 4 – Statistical tests


Page 61: Basics statistics

Dependent versus independent

61Part 4 – Statistical tests


Dependent: each volunteer is measured twice (weight under Treatment A at time x, weight under Treatment B at time y)

Subject       Treatment A   Treatment B
Volunteer 1       x1A           x1B
Volunteer 2       x2A           x2B
Volunteer 3       x3A           x3B
Volunteer 4       x4A           x4B
Volunteer 5       x5A           x5B

Independent: each volunteer receives only one treatment

Subject        Treatment   Weight
Volunteer 1        A        x1A
Volunteer 2        A        x2A
Volunteer 3        A        x3A
Volunteer 4        A        x4A
Volunteer 5        A        x5A
Volunteer 6        B        x6B
Volunteer 7        B        x7B
Volunteer 8        B        x8B
Volunteer 9        B        x9B
Volunteer 10       B        x10B

Page 62: Basics statistics

Parametric statistics

assumes that the data come from a type of probability distribution and make inferences about the parameters of the distribution

requires assumptions (e.g. Normal distribution), if they are correct they produce more accurate and precise estimates and have generally more statistical power

e.g. Independent sample t-test

Assumptions

• Independent observations

• Population 1 → X1i ~ N(μ1, σ²)

Population 2 → X2i ~ N(μ2, σ²)

H0 : μ1 = μ2 → H0 two distributions are equal

62Part 4 – Statistical tests


Page 63: Basics statistics

Non-parametric statistics – rank tests

no specific assumption about the population distribution required

Example: statistics based on Rank tests

Let X1, …, Xn denote a sample of n observations, the rank of observation Xj is defined as

The smallest observation gets rank 1, the second smallest rank 2, …, the largest observation gets rank n.

In case of ties (a tie is a pair of equal observations), the ranks of the tied observations are defined as the average of their ranks according to the definition just given. These are called mid-ranks.

63Part 4 – Statistical tests

Rj = R(Xj) = number of observations in the sample ≤ Xj = Σ (i = 1 to n) I(Xi ≤ Xj)


Page 64: Basics statistics

Example

Properties of rank-transformed observations

they only depend on the ordering of the observations

they are insensitive to outliers (robust)

the distribution of the ranks does not depend on the distribution of the observations

64

Observations    2    8    12         12         15   39
Ranks           1    2    (3+4)/2    (3+4)/2    5    6
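scipy's rankdata reproduces these mid-ranks (averaging tied ranks is its default behaviour):

```python
# Mid-ranks for tied observations, matching the example above.
from scipy.stats import rankdata

obs = [2, 8, 12, 12, 15, 39]
ranks = rankdata(obs)   # ties get the average of their ranks (mid-ranks)
print(list(ranks))      # [1.0, 2.0, 3.5, 3.5, 5.0, 6.0]
```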

Part 4 – Statistical tests


Page 65: Basics statistics

Non-parametric statistics – permutation tests

reference distribution of a characteristic of interest is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points.

Example: a company has a new training program and wishes to evaluate whether the new method is better than the traditional one. To assess the effect of the new method, they set up an experiment with 7 new employees. Four of them are randomly assigned to the new training method, and the other three receive the old training method.

65

Observed data
  New: 37, 49, 55, 57
  Traditional: 23, 31, 46

Rearrangement: reassign the 7 observations over the two groups; for example, swapping 55 and 31 gives New: 37, 49, 31, 57 and Traditional: 23, 55, 46

Number of possible permutations: C(7,4) = 7! / (4! 3!) = 35

Part 4 – Statistical tests


Page 66: Basics statistics

Permutation tests

to verify whether there is a difference in means of a continuous measurement in 2 independent populations

Permutation null distribution

H0 : F1(x) = F2(x) for all x.

HA : μ1 > μ2

Test statistic

Example: we have 35 possible permutations (each having a t*-value), the collection of all the t*-values is the permutation null distribution

66

T = X̄1 − X̄2

Part 4 – Statistical tests


Page 67: Basics statistics

Permutation test - example

Test statistic → t = 49.5 – 33.3 = 16.2

Permutation null distribution of the 35 possible permutations, under the null hypothesis all t*-values are equally likely

H0 will be rejected for large T (T > c, critical value); c controls the Type I error rate at α: P(T > c | H0) ≤ α
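For this small example the permutation null distribution can be enumerated exhaustively; a minimal sketch using the scores from the slide:

```python
# Exact permutation test for the training example:
# 4 "new" vs 3 "traditional" scores, T = mean(new) - mean(traditional).
from itertools import combinations

scores = [37, 49, 55, 57, 23, 31, 46]   # first four observed under "new"
obs_t = sum(scores[:4]) / 4 - sum(scores[4:]) / 3   # 49.5 - 33.3 = 16.2

perm_ts = []
for new_idx in combinations(range(7), 4):           # all 35 relabellings
    new = [scores[i] for i in new_idx]
    trad = [scores[i] for i in range(7) if i not in new_idx]
    perm_ts.append(sum(new) / 4 - sum(trad) / 3)

# one-sided p-value: fraction of permutations with T at least as extreme
p = sum(t >= obs_t - 1e-9 for t in perm_ts) / len(perm_ts)
print(len(perm_ts), round(p, 3))   # 35 permutations, p = 2/35 ≈ 0.057
```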

67

T = X̄1 − X̄2

Part 4 – Statistical tests


Page 68: Basics statistics

Parametric versus non-parametric tests

Parametric tests: the data are sampled from a normally distributed population OR the sample size is large (central limit theorem)

Smaller sample size: outliers or skewed distribution can be problematic → transformation or non-parametric tests (permutation or rank tests)

Permutation tests: very flexible

Non-parametric rank tests: in case of no meaningful measurement scale (pain score, Apgar score, …)

Careful with formulation of H0 and interpretation of the analysis

Less power

68Part 4 – Statistical tests


Page 69: Basics statistics

Categorical / discrete data: the set of all possible values can be enumerated

Ordinal data: ordered categories

Age group, pain assessment from no to severe, Likert scales (agree strongly, agree, neutral, disagree, disagree strongly)

Nominal data: categories have no natural order, sometimes called qualitative data (gender, race, hair color)

Counts: variables are represented by frequencies

Proportions / percentages

Ratio of counts, e.g. binary or dichotomous data: exactly two possible outcomes (success / failure); we count the number of successes in a number of trials

69


Part 4 – Statistical tests

Page 70: Basics statistics


One-sample t-test

to verify whether the mean of a continuous measurement deviates from a given value μ0

H0 : μ = μ0

HA : μ ≠ μ0

Test statistic: t = (x̄ − μ0) / (s / √n)

t-distributed with n-1 degrees of freedom (df)

Assumptions

Independent observations

Normally distributed observations or large sample
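With scipy this is a single call (the measurements below are hypothetical; μ0 = 100 is an assumed reference value):

```python
# One-sample t-test: does the mean deviate from mu0 = 100?
from scipy import stats

x = [102.1, 98.4, 105.0, 101.2, 99.8, 103.5, 100.9, 104.2]
t, p = stats.ttest_1samp(x, popmean=100)   # t has n - 1 = 7 df
print(round(t, 2), round(p, 3))
```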

70Part 4 – Statistical tests

Page 71: Basics statistics

One categorical variable with J ≥ 2 categories

Example: number of students in each of the three main subjects in the 1st master psychology (2003–2004)

Suppose that in the population, the true proportions are:

Part 6 – Categorical data 71


Page 72: Basics statistics

X² test One categorical variable with J ≥ 2 categories

H0 : pj = πj for all j (equivalently, for the expected frequencies, μj = n πj)

HA : pj ≠ πj for at least one j

Statistic: X² = Σj (nj − μj)² / μj, approximately χ²-distributed with df = J − 1

Example: df = J − 1 = 2 and P < .0001, strongly suggesting that the null hypothesis should be rejected.
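A sketch of the X² goodness-of-fit computation (the counts and null proportions below are hypothetical, since the slide's own numbers are not reproduced in the text):

```python
# Chi-square goodness of fit for one categorical variable with J = 3
# categories; counts and H0 proportions are hypothetical.
from scipy.stats import chisquare

observed = [60, 25, 15]                # n = 100 students over 3 subjects
null_props = [1 / 3, 1 / 3, 1 / 3]     # assumed proportions under H0
expected = [100 * pi for pi in null_props]

chi2_stat, p = chisquare(observed, f_exp=expected)   # df = J - 1 = 2
print(round(chi2_stat, 1), p)
```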

Part 6 – Categorical data 72


Page 73: Basics statistics


Paired sample t-test

to verify whether 2 continuous measurements, obtained from paired subjects, are the same on average

H0 : μ1 = μ2

HA : μ1 ≠ μ2

→ calculate differences Y = X1 – X2 and use the one-sample t-test to verify whether H0 : μ = 0 versus HA : μ ≠ 0, where μ is the average of Y

Assumptions

Independent differences

Normally distributed differences or large sample (n ≥ 40)

n ≥ 15: the t-test is fine unless there is a very skewed distribution or there are outliers

n < 15: the data should be approximately normally distributed; a very skewed distribution or outliers are problematic
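The reduction to a one-sample t-test on the differences can be checked directly (hypothetical before/after measurements on the same subjects):

```python
# Paired-sample t-test equals a one-sample t-test on the differences.
from scipy import stats

before = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7, 5.3, 5.8]
after  = [5.6, 5.0, 6.4, 5.4, 6.3, 5.1, 5.7, 6.2]

t_paired, p_paired = stats.ttest_rel(before, after)
diffs = [b - a for b, a in zip(before, after)]
t_diff, p_diff = stats.ttest_1samp(diffs, popmean=0)
print(abs(t_paired - t_diff) < 1e-9)   # the two approaches coincide
```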

73Part 4 – Statistical tests Source assumptions ‘Introduction to the practice of statistics, Moore & McCabe’

Page 74: Basics statistics

Wilcoxon signed rank test

Compare 2 dependent samples → the difference variable Y = X1 - X2

With Yi+ the observations on the positive differences (i = 1, …, n+) and Yi− the observations on the negative differences (i = 1, …, n−), then

H0 : P(Y− < Y+) = ½

HA : P(Y− < Y+) > ½

Statistic: V = the sum of the ranks of |Yi| over the positive differences

74Part 4 – Statistical tests


Page 75: Basics statistics

Wilcoxon signed rank test - Example

Two stories were narrated to children with reading disorders; story 1 was not illustrated whereas story 2 was illustrated

V= 9, n=5, p=0.406

From this small sample we could not conclude that children with reading disorders can tell a story better when the story was illustrated.

75

Child              1      2      3      4      5
Story 1         0.40   0.72   0.00   0.36   0.55
Story 2         0.77   0.49   0.66   0.28   0.38
Difference Yi   0.37  −0.23   0.66  −0.08  −0.17
Rank of |Yi|       4      3      5      1      2
Signed rank        4     −3      5     −1     −2     → V = 9
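scipy reproduces V = 9 and the one-sided p-value from the slide (the exact distribution is used, since n = 5 with no ties or zero differences):

```python
# Wilcoxon signed-rank test on the story-telling example.
from scipy.stats import wilcoxon

story1 = [0.40, 0.72, 0.00, 0.36, 0.55]
story2 = [0.77, 0.49, 0.66, 0.28, 0.38]
diffs = [y2 - y1 for y1, y2 in zip(story1, story2)]

# one-sided HA as on the slide: positive differences dominate
res = wilcoxon(diffs, alternative='greater')
print(res.statistic, round(res.pvalue, 3))   # 9.0 0.406
```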

Part 4 – Statistical tests


Page 76: Basics statistics

Models for matched pairs

For comparing categorical responses in 2 samples when both samples contain the same subjects, or when a natural pairing exists between each subject in one sample and a subject in the other sample.

McNemar test compares proportions in paired studies

H0 : π1+ = π+1

HA : π1+ ≠ π+1

76Part 4 – Statistical tests


                After
Before        Yes    No     Total
Yes           n11    n12    n1+
No            n21    n22    n2+
Total         n+1    n+2    n
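A minimal sketch of the McNemar chi-square statistic, which uses only the discordant counts n12 and n21 (the counts below are hypothetical):

```python
# McNemar test for paired binary data: the statistic
# (n12 - n21)^2 / (n12 + n21) is compared to a chi-square with 1 df.
from scipy.stats import chi2

n12, n21 = 25, 10                      # hypothetical discordant pairs
stat = (n12 - n21) ** 2 / (n12 + n21)  # 225 / 35
p = chi2.sf(stat, df=1)
print(round(stat, 2), round(p, 3))
```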

Page 77: Basics statistics


Independent sample t-test

to verify whether the mean of a continuous measurement is the same in 2 independent populations

H0 : μ1 = μ2 versus HA : μ1 ≠ μ2

Test statistic: t = (X̄1 − X̄2) / SE(X̄1 − X̄2)

equal measurement variance in the 2 groups → pooled t-test

unequal measurement variance in the 2 groups → Welch's t-test

Assumptions

Independent observations

Normally distributed observations or large sample in each group

Small but equal sample sizes (n1 = n2 = 5) with comparable distribution shapes → we can still rely on t-test procedures

77


Part 4 – Statistical tests

Page 78: Basics statistics


Independent sample t-test – continued

Equal measurement variance in the 2 groups: the SE of the mean difference can be estimated as SE = sp √(1/n1 + 1/n2)

with pooled variance sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

Unequal measurement variance in the 2 groups: SE = √(s1²/n1 + s2²/n2)

(1 − α)100% confidence interval for μ1 − μ2: (X̄1 − X̄2) ± t(df, 1 − α/2) · SE
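Both variants are available in scipy via the equal_var flag (the two groups of measurements below are hypothetical):

```python
# Independent-sample t-test: pooled (equal variances) vs Welch.
from scipy import stats

group1 = [12.1, 11.4, 13.0, 12.6, 11.9, 12.8]
group2 = [10.9, 11.2, 10.5, 11.8, 10.7, 11.1]

t_pooled, p_pooled = stats.ttest_ind(group1, group2, equal_var=True)
t_welch, p_welch = stats.ttest_ind(group1, group2, equal_var=False)  # Welch
print(round(p_pooled, 4), round(p_welch, 4))
```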

78Part 4 – Statistical tests

Page 79: Basics statistics

Mann-Whitney (U) test, Wilcoxon rank-sum test

Compare 2 independent samples

H0 : F1(x) = F2(x) for all x

HA : P(X1 < X2) ≠ ½

where X1 and X2 have distributions F1 and F2, respectively.

If X1 and X2 are continuous random variables, the test may be thought of as testing the null hypothesis that the probability of an observation from one population exceeding an observation from the second population is 0.5, this implies

P(X1 < X2) = P(X1 > X2) = ½

→ test statistics based on this principle
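In scipy this is mannwhitneyu (the ordinal-style scores below are hypothetical):

```python
# Mann-Whitney U / Wilcoxon rank-sum test for two independent samples.
from scipy.stats import mannwhitneyu

x1 = [3, 5, 4, 6, 7, 5]
x2 = [2, 3, 1, 4, 2, 3]
u, p = mannwhitneyu(x1, x2, alternative='two-sided')
print(u, round(p, 3))
```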

79Part 4 – Statistical tests


Page 80: Basics statistics

Is the Wilcoxon rank-sum test the nonparametric alternative for the independent-sample t-test?

RememberH0 : F1(x) = F2(x) for all x (2 distributions are equal)

HA : P(X1 < X2) ≠ ½

→ the ranks cannot be used to estimate the mean!

Independent sample t-test

H0 : μ1 = μ2

HA : μ1 ≠ μ2

80Part 4 – Statistical tests


Page 81: Basics statistics

2x2 contingency tables

Example: Patient characteristics at the onset of first-line treatment with gefitinib or chemotherapy

81


Frequency table

               ECOG PS
Treatment     < 2    ≥ 2    Total
Gefitinib      70     17       87
Chemo          57      4       61
Total         127     21      148

Conditional distribution of ECOG PS status given treatment (row proportions)

               ECOG PS
Treatment     < 2      ≥ 2     Total
Gefitinib   0.805    0.195     1.00
Chemo       0.934    0.066     1.00

Two variables are said to be statistically independent if the conditional distributions of Y (Eastern Cooperative Oncology Performance status) are identical at each level of X (treatment)

Part 4 – Statistical tests

Page 82: Basics statistics

Testing independence - Pearson chi-square test

H0 : πij = πi+ π+j for all i and j (equivalently, for the expected frequencies, μij = n πi+ π+j)

HA : πij ≠ πi+ π+j for at least one cell

Statistic: X² = Σi Σj (nij − μij)² / μij

Example

X² = 4.964, df = 1: ECOG PS status and treatment are significantly associated. The proportion of patients with a poor ECOG performance status (≥ 2) was higher in the first-line gefitinib group (20%) than in the first-line chemotherapy group (7%; P = 0.026).
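The slide's X² can be reproduced from the frequency table (correction=False, because 4.964 is the uncorrected Pearson statistic):

```python
# Pearson chi-square test of independence on the ECOG PS x treatment table.
from scipy.stats import chi2_contingency

table = [[70, 17],   # gefitinib:    ECOG PS < 2, >= 2
         [57, 4]]    # chemotherapy: ECOG PS < 2, >= 2
chi2_stat, p, df, expected = chi2_contingency(table, correction=False)
print(round(chi2_stat, 3), df, round(p, 3))   # 4.964 1 0.026
```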

82


Part 4 – Statistical tests

Page 83: Basics statistics

Testing independence – Fisher’s exact test

For small samples, use Fisher's exact test: it assumes that the row and column totals are fixed (hypergeometric distribution). When this assumption is not met (most cases), Fisher's exact test is conservative, resulting in a Type I error rate below 0.05.

H0 : θ = 1

HA : θ ≠ 1

Part 6 – Categorical data 83


Treatment    Adeno   Non-adeno   Total
Gefitinib       85           2      87
Chemo           58           3      61
Total          143           5     148

Two-sided p-values: Fisher's exact test p = 0.403; chi-square test p = 0.385
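scipy reproduces the Fisher p-value from the table above:

```python
# Fisher's exact test on the histology table (small expected counts).
from scipy.stats import fisher_exact

table = [[85, 2],   # gefitinib:    adeno, non-adeno
         [58, 3]]   # chemotherapy: adeno, non-adeno
odds_ratio, p = fisher_exact(table, alternative='two-sided')
print(round(odds_ratio, 2), round(p, 3))   # 2.2 0.403
```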

Page 84: Basics statistics

Large samples

In case of very large sample sizes the Pearson chi-square test will reject almost any null hypothesis, even if the deviation of the observed from the expected counts is of little importance → use the Gini index (its value equals the proportion of observations that would have to be moved from one cell to another in order for the observed counts to equal the expected counts)

Small samples

Inferences based on the chi-square distribution become questionable when the expected counts in some cells become too small (below 5), even when the total sample size is large → use exact solutions (Fisher's exact test)

Part 6 – Categorical data 84


Page 85: Basics statistics


One-way analysis of variance (ANOVA)

to verify whether the mean of a continuous measurement is the same in 2 or more independent populations

H0 : μ1 = μ2 = … = μk versus

HA : at least 1 of the population means differs

Test statistic

Assumptions

Independent observations

Normally distributed observations or large sample within each group (Q-Q plots)

Equal variance in each group (boxplots or Levene’s test)

85

F = MSE_between / MSE_within, which under H0 follows an F(k−1, n−k) distribution
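scipy's f_oneway computes this F ratio directly (hypothetical data, three groups of five):

```python
# One-way ANOVA: F = MSE_between / MSE_within, df = (k - 1, n - k).
from scipy.stats import f_oneway

g1 = [6.1, 5.8, 6.4, 6.0, 5.9]
g2 = [6.9, 7.2, 6.8, 7.0, 7.4]
g3 = [5.5, 5.2, 5.8, 5.6, 5.4]
F, p = f_oneway(g1, g2, g3)   # here k = 3, n = 15 -> df = (2, 12)
print(round(F, 1), p)
```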

Part 4 – Statistical tests

Page 86: Basics statistics


ANOVA principle

Is the variation between groups large compared to the variation within groups?

86

Σi Σj (Yij − Ȳ)² = Σi Σj (Yij − Ȳi)² + Σi Σj (Ȳi − Ȳ)²

Total sum of squares = within SS + between SS

Consider k groups, the ith group containing ni observations, with Yij the jth observation in the ith group (Ȳi = group mean, Ȳ = overall mean)

Part 4 – Statistical tests

Page 87: Basics statistics


87

ANOVA Table

Source     SS                          df     MSE                   F
Between    SSB = Σi Σj (Ȳi − Ȳ)²       k−1    MSE_B = SSB/(k−1)     MSE_B / MSE_W
Within     SSW = Σi Σj (Yij − Ȳi)²     n−k    MSE_W = SSW/(n−k)
Total      SST = Σi Σj (Yij − Ȳ)²      n−1

Part 4 – Statistical tests

Page 88: Basics statistics


Deviations from the assumptions

one-way analysis of variance is robust against lack of normality

→ in case of important deviations from a normal distribution : use nonparametric Kruskal-Wallis test or transformations

ANOVA is not overly sensitive to violations of the homogeneity-of-variances assumption (perform Levene's test at the 1% significance level)

→ heterogeneity of variances

• little impact when the group level sample sizes ≈ equal: Type I error rate is slightly increased

• with important heterogeneity and markedly ≠ group level sample sizes, weighted least squares regression may be used, weighting each observation by the inverse group level standard deviation

88Part 4 – Statistical tests

Page 89: Basics statistics


Post-hoc analysis

if ANOVA detects no difference, we conclude that there is insufficient evidence of a difference in means

if ANOVA detects a difference → post-hoc analysis to investigate where the difference is

DO NOT perform all pairwise comparisons using independent samples t-tests → multiple testing problem

Assume we perform 3 different t-tests, each conducted with α = 0.05

The probability that every test correctly concludes H0 when H0 holds in each case is (0.95)³ = 0.857 (assuming independence of the tests) → the probability that at least one of the three tests leads to the conclusion HA when H0 holds in each case is 1 − 0.857 = 0.143 (not 0.05)

The level of significance and power for a family of tests ≠ individual test

89Part 4 – Statistical tests

Page 90: Basics statistics


Family-wise error rate - αE

The probability of making at least 1 false discovery (type I errors) among all the hypotheses when performing multiple pairwise tests

→ We should correct for the risk of false detections

most procedures for multiple testing are designed to control the risk of at least 1 false detection at αE, assuming that all k null hypotheses are true

when the k tests are independent, each with significance level α, then

αE = P(at least 1 Type I error) = 1 − (1 − α)^k ≈ kα

family-wise error rate increases with the number of tests

90Part 4 – Statistical tests

Page 91: Basics statistics


Multiple comparison procedures that control family-wise error rate

Bonferroni procedure

Conservative test: makes less Type I errors than allowed for (and thus more Type II errors)

Only applicable when the effects to be investigated are identified in advance of the data analysis

Tukey procedure

Preferred method when only pairwise comparisons are to be made

Scheffé procedure

Preferred method when the family of interest is a set of all possible contrasts among the factor level means

91Part 4 – Statistical tests

Page 92: Basics statistics


Rules of thumb

never interpret a large p-value as indicating absence of association

never interpret a small p-value as indicating an important association

report p-values in combination with an effect estimate and confidence interval! This allows for judging whether the effect is practically significant.

in some cases, it may be advisable to determine equivalence intervals prior to data analysis

92Part 4 – Statistical tests

Page 93: Basics statistics


Kruskal-Wallis rank test

k-sample problem, compare more than 2 independent samples

H0 : F1(x) = F2(x) = … = Fk(x) for all x

HA : the observations in some populations are systematically larger than in other populations (e.g. P(X1 < X2) ≠ ½)

Assumptions

the observations in each group come from populations with the same shape of distribution

93Part 4 – Statistical tests

Page 94: Basics statistics


Kruskal-Wallis rank test

the rank test statistic is essentially an MSE_between computed on the ranks

rank all observations in the combined sample

let Rij denote the rank of Xij (i = 1, …, k; j = 1, …, ni), and let R̄i be the average of the ranks Rij (j = 1, …, ni) in the ith group

Kruskal-Wallis test statistic: H = [12 / (N(N+1))] Σi ni (R̄i − (N+1)/2)², with N = Σi ni

94Part 4 – Statistical tests
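In scipy (the ordinal pain scores for the three groups are hypothetical):

```python
# Kruskal-Wallis rank test for k = 3 independent samples.
from scipy.stats import kruskal

s1 = [1, 2, 2, 3, 1]
s2 = [3, 4, 4, 5, 3]
s3 = [2, 2, 3, 3, 2]
H, p = kruskal(s1, s2, s3)   # rank-based, tie-corrected
print(round(H, 2), round(p, 3))
```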

Page 95: Basics statistics


Kruskal-Wallis rank test

when H0 is rejected → at least 2 populations differ systematically → pairwise comparisons with the Wilcoxon rank-sum / Mann-Whitney statistic; alternative hypotheses in terms of probabilities: HA : P(X1 > X2) ≠ ½, …

Family-wise error rate – αE → we should correct for the risk of false detections, Bonferroni correction: when m tests must be performed simultaneously, each of the tests must be performed at α = αE / m

equivalently: multiply each p-value by m before interpreting it
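A minimal sketch of the two equivalent forms of the Bonferroni correction (the raw p-values are hypothetical):

```python
# Bonferroni: test each of m hypotheses at alpha_E / m, or equivalently
# multiply each raw p-value by m before comparing it to alpha_E.
alpha_family = 0.05
m = 3
raw_p = [0.010, 0.030, 0.200]          # hypothetical raw p-values

per_test_alpha = alpha_family / m       # 0.0167
adjusted = [min(1.0, p * m) for p in raw_p]
significant = [p < per_test_alpha for p in raw_p]
print(adjusted, significant)            # only the first test survives
```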

95Part 4 – Statistical tests

Page 96: Basics statistics


Analysis of Covariance - ANCOVA

Adjustment for a confounder (e.g. age)

Just like in ANOVA we have a treatment effect (consider for example 3 treatments)

We add the variable age to our model → adjustment for a confounder

96Part 4 – Statistical tests

Page 97: Basics statistics


Three-way contingency tables

In studying the effect of an explanatory variable X on a response variable Y, one should control covariates that can influence that relationship

Example: Peginterferon alfa for hepatitis C

97Part 4 – Statistical tests

Virologic Response

Genotype   Treatment   Yes   No
1          A           138   160
           B           103   182     → conditional odds ratio θ1
2          A           106    34
           B            88    57     → conditional odds ratio θ2
Total      A           244   194
           B           191   239     → marginal odds ratio
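The conditional and marginal odds ratios can be computed straight from the table; note that the marginal OR need not match the conditional ones:

```python
# Conditional vs marginal odds ratios for the peginterferon example.
def odds_ratio(a, b, c, d):
    # 2x2 table [[a, b], [c, d]]: OR = (a * d) / (b * c)
    return (a * d) / (b * c)

theta1 = odds_ratio(138, 160, 103, 182)    # genotype 1
theta2 = odds_ratio(106, 34, 88, 57)       # genotype 2
marginal = odds_ratio(244, 194, 191, 239)  # collapsed over genotype
print(round(theta1, 2), round(theta2, 2), round(marginal, 2))  # 1.52 2.02 1.57
```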

Page 98: Basics statistics


Breslow-Day test for testing homogeneity of odds ratios

H0: the odds ratio between X and Y is the same across the Z categories. It is a test of homogeneous association.

98Part 4 – Statistical tests

Page 99: Basics statistics


Cochran-Mantel-Haenszel Test of conditional independence

Conditional XY independence given Z in a 2 × 2 × K table.

H0: the response is conditionally independent of the treatment within each stratum

Inappropriate when the association varies dramatically among the partial tables

99Part 4 – Statistical tests

Page 100: Basics statistics


Cochran-Mantel-Haenszel Test of conditional independence

Example, colon cancer: ECOG PS-adjusted OR = 1.52 (95% CI 0.98–2.36; p = 0.064, CMH test), indicating no significant evidence against conditional independence of the response and the treatment within the ECOG PS strata.

100Part 4 – Statistical tests

Source: Bokemeyer et al., 2008 (M&M and p. 667, Efficacy)

Response

ECOG PS   Treatment          Yes    No
0         Cet. + FOLFOX-4      –     –
          FOLFOX-4             –     –
1         Cet. + FOLFOX-4      –     –
          FOLFOX-4             –     –
2         Cet. + FOLFOX-4      –     –
          FOLFOX-4             –     –
Total     Cet. + FOLFOX-4     77    92
          FOLFOX-4            60   108

Conditional odds ratios θ1, θ2, θ3 (one per ECOG PS stratum); marginal odds ratio = 1.51