This is the presentation of the BITS training session on "Essential statistics". More material is available at http://www.bits.vib.be/index.php?option=com_content&view=article&id=17203865:essential-statistics&catid=81:training-pages&Itemid=190
Introduction to statistics
Els Adriaens, PhD
December 17, 2010
Overview
Outline
Formulate a relevant research question
Study design
Gather the data according to the plan
Analyze the data
Explorative data-analysis (descriptives, graphics)
Drawing inference (answer our research question with certain confidence)
Report the results
Part 1 – Design of a study
Experimental versus observational studies
Design of an experimental study
Overview of study designs: experimental study, observational study, mixed experimental and observational studies
Experimental study
Factor levels (treatments) are randomly assigned to the different experimental units (control over the explanatory variable)
→ information about the cause-and-effect relationship between the explanatory factors and a response variable
Example: effect of vitamin C on the prevention of colds in 800 children. Half of the children were selected at random and received vitamin C (treatment group); the remaining children received a placebo (control group)
Qualitative explanatory factor with two levels, and children as experimental units
Observational study
Data obtained from non-experimental study: explanatory variables not controlled, randomization of the treatments to experimental units does not occur
→ establish associations between the explanatory factors and a response variable
Example: Company officials wished to study the relation between the age of an employee and the number of days of illness in a year.
Explanatory variable not controlled → age is observed
Establish associations but no cause-and-effect: a positive relation between age and number of days of illness need not mean that days of illness are a direct result of age. If, for example, younger employees work indoors while older employees usually work outdoors, work location rather than age may be responsible for the number of days of illness
Mixed studies
Example: a clinical trial performed in 3 hospital centers; at each center the effect of a drug on lowering blood cholesterol was investigated. Within each hospital center, volunteers were randomly assigned to one of the two treatments (drug / placebo)
Experimental factor: treatment (drug versus placebo)
Observational factor: hospital center, not randomly assigned since each volunteer was assigned to the nearest hospital center
Factors and treatments
Randomization
Sampling from a population
Measurements
Structure of the experiment
2 levels of factor A x 3 levels of factor B = 6 treatments
experimental unit: smallest unit of experimental material to which a treatment can be assigned, the experimental unit is determined by the method of randomization
                Factor B
            Level 1   Level 2   Level 3
Factor A
  Level 1      1         2         3
  Level 2      4         5         6
Each treatment is applied to an experimental unit; replicates = a treatment repeated on several experimental units → estimate of the experimental error
Number of factors: in the initial stages of an investigation → include many factors (more than can possibly be studied in a single experiment)
Cause-and-effect diagrams are often used to identify factors that could affect the outcome → reduce number of factors
Example: 4 factors, each with 2 levels → 2⁴ = 16 treatment combinations
Number of levels of each factor:
Qualitative factors
Quantitative factors: # levels reflect the type of trend expected by the experimenter
• 2 levels ~ linear change in response: min – max of specified range
• 3 levels ~ quadratic trend
• ≥ 4 levels ~ detailed examination of the shape of the response curve desired
Range of factor is one of the most important design decisions
Measurements: precision versus accuracy
Precision of a variable: the degree to which a variable has nearly the same value when measured several times. It is a function of random error (chance) and is assessed as the reproducibility of repeated measurements.
Example: weigh the same person 3 times on an electronic balance and obtain slightly different measurements – 67.5 kg, 67.4 kg and 67.6 kg
The more precise a measurement, the greater the statistical power at a given sample size to estimate mean values and to test hypotheses
Variability may be due to operator, instrument and subject
Minimize random error and improve precision
Operating manuals, training the operator, refining / automating instruments
Repeat the measurement and average over a larger number of observations (but! added cost, practical difficulties)
Accuracy of a variable: the degree to which a variable actually represents what it is supposed to represent. It is a function of systematic error (bias) which is often difficult to detect and has important influence on the validity of the result.
Example 1: incorrect calibration of an instrument
Example 2: gastric freezing as a treatment for ulcers in the upper part of the intestine
Improve accuracy and minimize bias
Operating manuals, training the operator, refining / automating instruments
Periodic calibration using a gold standard (example 1)
Blinding: in a double-blind study, neither the experimental subject nor the evaluator knows which treatment is received or given, so any inaccuracy in measuring the outcome will be the same in the 2 groups (example 2)
Bias and variance in shooting arrows at a target. Bias means that the archer systematically misses in the same direction. Variance means that the arrows are scattered (Moore and McCabe 2002)
Sampling from a population
Simple random sample
[Diagram: random draws with equal probability from the population (N elements) yield the sample (n elements)]
Randomization → treatments are randomly assigned to experimental units
Tends to eliminate the influence of extraneous factors not under the direct control of the experimenter
Blocking → increase precision by taking into account other factors
[Diagram: the heterogeneous subjects are first split into homogeneous blocks, and treatments are randomized within each block]
  Males: Group 1 → treatment 1, Group 2 → treatment 2, Group 3 → treatment 3
  Females: Group 1 → treatment 1, Group 2 → treatment 2, Group 3 → treatment 3
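This block-then-randomize scheme can be sketched in Python (a minimal illustration; the unit labels, treatment numbers and seeds are made up for the example):

```python
import random

def block_randomize(units, treatments, seed=None):
    """Randomly assign treatments within one homogeneous block.

    len(units) must be a multiple of len(treatments), so every
    treatment ends up on the same number of units.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)                     # random order of units
    k = len(shuffled) // len(treatments)      # units per treatment group
    return {t: shuffled[i * k:(i + 1) * k] for i, t in enumerate(treatments)}

# Block on sex, then randomize treatments 1-3 within each block
males = [f"M{i}" for i in range(1, 7)]
females = [f"F{i}" for i in range(1, 7)]
assignment = {
    "males": block_randomize(males, [1, 2, 3], seed=1),
    "females": block_randomize(females, [1, 2, 3], seed=2),
}
print(assignment)
```

Because every treatment appears equally often within each block, treatment comparisons are not confounded with the blocking factor.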
Stratified Sampling
Suppose we want to know the attitudes of male and female students in the engineering school
Is a simple random sample from that school a good idea?
No: too few women (10%)
Stratify the sample, pick a random sample from
Stratum 1: female engineers
Stratum 2: male engineers
Estimates are measured with comparable precision. Learn from the distribution in each stratum; do NOT pool the data
e.g. if the average weight is 60 kg for the women and 80 kg for the men, the average engineer will weigh 10% × 60 + 90% × 80 = 78 kg
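The weighted calculation in the example can be reproduced directly (a sketch; the stratum shares and mean weights are the slide's illustrative numbers):

```python
# Post-stratified estimate of the population mean: weight each stratum
# mean by its known population share (10% women, 90% men).
strata = {
    "women": {"share": 0.10, "mean_weight": 60.0},
    "men":   {"share": 0.90, "mean_weight": 80.0},
}
overall = sum(s["share"] * s["mean_weight"] for s in strata.values())
print(overall)  # 78.0 kg, as on the slide
```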
Part 2 – Explorative data-analysis
Types of variables
Univariate descriptives
Bivariate descriptives
Descriptive statistics
Allows the researcher to describe or summarize the data. This is typically done in the beginning of a results section. The researcher gives an idea of the sample size, the characteristics under study (e.g. baseline characteristics in a clinical trial)
Example: a total of 235 students participated in this study, 163 women (69.4%) versus 72 men (30.6%). On average the female students (81.3 ± 19.4) had a slightly higher score on exam 2 than the male students (80.7 ± 18.1).
We typically start with univariate explorations (one variable at a time). Next, describe joint distributions (2 by 2 = bivariate; more variables = multivariate)
Graphical summary to inspect the shape of the distribution: symmetry, modality, heaviness of tails
Numerical summary: classical measures of location and spread
Mean and standard deviation
Median and interquartile range
Mode: value that occurs most often (useful for nominal data)
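All three measures of location are available in Python's standard statistics module (a minimal sketch; the data values are made up for illustration):

```python
import statistics

data = [67.5, 67.4, 67.6, 67.5, 68.0]  # made-up illustration values

print(statistics.mean(data))    # arithmetic mean
print(statistics.median(data))  # middle value (average of the two middle values if n is even)
print(statistics.mode(data))    # value that occurs most often
```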
Notes on notation
A random variable X is a variable whose value is a numerical outcome of a random phenomenon (nonnumerical outcomes are numerically encoded)
Random variables are usually denoted by capital letters such as X, Y, …
Fixed constants or observed values are usually denoted by small letters e.g. x, y. Special constants (to be specified) will be written as Greek letters α, β, μ, σ
Indices i will subscript random or observed outcomes for individual observations in the data set: Yi, yi
Type                            | Characteristic                                | Example                  | Descriptive statistic  | Information content
Categorical                     | the set of all possible values can be enumerated |                       |                        |
 • Nominal                      | unordered categories                          | gender, race             | counts, proportions    | lower
 • Ordinal                      | ordered categories                            | degree of pain           | median                 | intermediate
Continuous or ordered discrete  | can take all possible values within some interval of real numbers (continuous) or limited to integers (discrete) | weight, number of cigarettes per day | mean, standard deviation | higher
Histogram – Boxplot
Measures of location (center)
Measures of spread
Normal curve
Mean of a series of observations xi, i = 1, 2, …, n: x̄ = (1/n) Σi xi
Properties, given that X and Y are random variables and a, b are scalars:
  μ_{X+Y} = μ_X + μ_Y
  μ_{a+bX} = a + b·μ_X
Median (M): middle of the distribution such that at least 50% of the outcomes are larger than or equal to M and at least 50% of the outcomes are smaller than or equal to M
  For n odd: this is the middle value in order of magnitude
  For n even: take the average of the two middle values
Mean is very sensitive to outliers
[Histogram: numbers of partners desired in the next 30 years (Miller and Fishkin, 1997)]
Standard deviation of a series of observed values xi:
  SD(x) = √( (1/n) Σi (xi − x̄)² )
When the variable is approximately normally distributed, approximately 95% of the data will lie between x̄ − 1.96·SD(x) and x̄ + 1.96·SD(x)
The square of the SD is called the variance, Var(x)
Variation coefficient: (SD(x) / x̄) × 100%
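These quantities can be checked with Python's statistics module (a sketch; statistics.pstdev uses the same 1/n form as the slide's formula, and the data reuse the three weighings from the precision example):

```python
import statistics

x = [67.5, 67.4, 67.6]             # the repeated weighings from the precision example
mean = statistics.mean(x)
sd = statistics.pstdev(x)          # population form with 1/n, matching the slide
var = statistics.pvariance(x)      # Var(x) = SD(x) squared
cv = sd / mean * 100               # variation coefficient in %

print(mean, sd, var, round(cv, 3))
```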
Interquartile range (IQR): distance Q3 − Q1 with
Q1: a value such that at least 25% of the outcomes fall at or below Q1 and at least 75% fall at or above Q1
Q3: a value such that at least 75% of the outcomes fall at or below Q3 and at least 25% fall at or above Q3
If more than one value satisfies this criterion, the average is usually taken
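Quartiles and the IQR can be computed with statistics.quantiles (a sketch with made-up data; note that quantiles interpolates between observations, which can differ slightly from the "average all qualifying values" convention above):

```python
import statistics

data = [2.1, 2.4, 2.6, 2.8, 3.0, 3.1, 3.3, 3.6, 4.0]  # made-up values, already sorted

# n=4 returns the three quartile cut points Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q2, q3, iqr)
```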
Five-number summary: Min, Q1, Median, Q3, Max
[Boxplot of birth weight: the box spans Q1 to Q3 (the IQR) with the median inside; the whiskers reach to the largest and smallest observations within a distance of 1.5 × IQR from the box]
Histogram: bar diagram for continuous data – relative or absolute frequencies
[Histogram of birth weight; vertical axis: percentage]
Normal distribution
Density: φ(x) = (1 / (σ√(2π))) · exp( −½ ((x − μ)/σ)² )
μ is the population mean
σ² is the population variance
Notation: X ~ N(μ, σ²)
If X ~ N(μ, σ²), then Z = (X − μ)/σ ~ N(0, 1), the standard normal distribution
Properties of the standard normal distribution N(0, 1)
unimodal: 1 maximum (at 0)
symmetric around 0
68-95-99.7 rule:
• 68% of the area under the curve (AUC) lies between -1 and 1, 68% of the observations fall within 1 SD of the mean μ
• 95% of the AUC lies between -2 and 2, 95% of the observations fall within 2 SD of the mean μ
• 99.7% of the AUC lies between -3 and 3, 99.7% of the observations fall within 3 SD of the mean μ
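The 68-95-99.7 rule can be verified numerically from the standard normal CDF, written here with math.erf so no external library is needed (the exact areas are 68.27%, 95.45% and 99.73%):

```python
import math

def std_normal_cdf(z):
    # Phi(z) via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for k in (1, 2, 3):
    auc = std_normal_cdf(k) - std_normal_cdf(-k)   # area within k SDs of the mean
    print(k, round(auc, 4))
```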
Normal quantile plot
Compares two distributions by plotting their quantiles against each other
If the observed and the normal distribution are identical, points are expected to lie on a straight line with intercept 0 and slope 1
Distributions with the same shape but simply rescaled or shifted still show up on a straight line but with different intercept (shift) or slope (scale change)
[Normal Q–Q plots of randomly generated N(0, 1) data and of randomly generated exponential data]
Continuous data
Categorical data
Bivariate relations – continuous data
Graphical: boxplots, (stacked) histograms, scatter plots
Correlation coefficient (r):
  Takes values between −1 and 1
  Pearson correlation coefficient expresses a degree of linear dependence:
  r = (1/n) Σi ((xi − x̄)/SD(x)) · ((yi − ȳ)/SD(y))
Source: Wikipedia – Anscombe's Quartet
All four datasets have r = 0.816! A summary statistic cannot replace the individual examination of the data
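A direct implementation of the formula, applied to the first dataset of Anscombe's Quartet (the published x/y values), reproduces r = 0.816:

```python
import math

def pearson_r(x, y):
    """Pearson correlation with the slide's 1/n convention
    (the 1/n factors cancel, so any consistent convention gives the same r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sdx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / n)
    sdy = math.sqrt(sum((yi - my) ** 2 for yi in y) / n)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n * sdx * sdy)

# First dataset of Anscombe's Quartet
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
print(round(pearson_r(x, y), 3))  # 0.816, as on the slide
```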
Bivariate relations - Spearman’s Rank correlation (-1 and 1)
Measures of monotone association (extent to which as one variable increases, the other variable tends to increase or decrease)
No assumption on linearity
Ordinal variables
Source: Answers.com
Example: corneal irregular astigmatism after laser in situ keratomileusis for myopia (Br J Ophthalmol 2001;85:534-536): Spearman rank correlation rs = 0.440, p < 0.0001. See also http://geographyfieldwork.com/SpearmansRank.htm
2x2 associations – categorical data: comparing two proportions
Many studies are designed to compare two groups (X) on a binary response variable (Y)
X \ Y      Success   Failure
Group 1    π1        1 − π1
Group 2    π2        1 − π2
Example: is there an association between antiviral drug use (X) and pneumonia (Y)?
π: probability of success
1 − π: probability of failure
Counts:
                 Pneumonia
                 Yes     No       Total
Antiviral drug   579     45172    45751
Control          648     45103    45751

Proportions:
                 Yes     No       Total
Antiviral drug   0.013   0.987    1
Control          0.014   0.986    1
Risk difference: is there a difference between the group taking the antiviral drug and the control group?
π1 − π2 = 0.013 − 0.014 = −0.001
Properties
-1 ≤ (π1 - π2) ≤ 1
if response is independent of group, then (π1 - π2) = 0
A difference may be more important when both success probabilities are close to 0 or 1 than when both p’s are close to 0.5
Example (p1-p2) = 0.09 (0.1-0.01=0.09) or (0.50-0.41=0.09)
In the first case, p1 is 10 times larger than p2 while in the second case p1 is only 1.2 times larger than p2.
Relative risk: ratio of the success probabilities of the 2 groups
Properties
0 ≤ (π1/π2) < ∞
if response is independent of group, then (π1/ π2) = 1
Antiviral drug example
(p1/p2) = (.013/.014) = 0.894 with 95% CI: 0.799, 0.999
The sample proportion of pneumonia cases was 10.6% lower for the group prescribed antiviral drug. The CI of the relative risk indicates that the risk of pneumonia is at least 1% lower for the group prescribed antiviral drug.
Odds ratio
For a probability π of success, the odds are defined to be Ω = π / (1 − π)
Odds ≥ 0 with values > 1 when a success is more likely than a failure. For example, if π = .75, then the odds of success = .75/.25 = 3.0: a success is three times as likely as a failure. If Ω = 1/3, a failure is three times as likely as a success.
The ratio of the odds Ω1 and Ω2 in the two rows is called the odds ratio
Properties odds ratio
0 ≤ θ < ∞
When X and Y are independent, then θ = 1
the odds ratio does not change value when the orientation of the table reverses (rows become columns, columns become rows)
Odds ratio - continued
Properties
if θ = 4, the odds of success in row 1 are 4 times the odds in row 2, and thus subjects in row 1 are more likely to have success than are subjects in row 2
θ = 4 does not mean that the probability π1 is four times π2 (that would be the interpretation of relative risk)
the odds ratio does not change when both cell counts within any row (or column, but not both) are multiplied by a nonzero constant; this implies that the odds ratio does not depend on the marginal counts within a row/column
Odds ratio - Example
The sample odds ratio is computed as θ̂ = (n11 × n22) / (n12 × n21)
For the patients prescribed the antiviral drug, the estimated odds of pneumonia are 579/45172 = 0.013: there were 1.3 pneumonia cases for every 100 cases with no pneumonia.
The sample odds ratio = (579 × 45103) / (648 × 45172) = 0.892 (95% CI: 0.797, 0.999). The estimated odds for patients prescribed the antiviral drug equal 0.892 times the estimated odds for patients in the control group. The estimated odds were 10.8% lower for the antiviral drug group.
Relation between odds ratio and relative risk
When the proportion of successes is close to 0 for both groups, the sample odds ratio is similar to the sample relative risk. In such a case, an odds ratio of 0.89 does mean that the probability of success for the patients prescribed the antiviral drug is about 0.89 times the probability of success for the patients in the control group
Relative risk = 0.894 (95% CI: 0.799, 0.999)
Odds ratio = 0.892 (95% CI: 0.797, 0.999)
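All three measures can be recomputed from the raw 2×2 counts (a sketch; the counts are the pneumonia table from these slides, and the small difference from the slide's −0.001 risk difference comes from the slide rounding the proportions to 3 decimals first):

```python
# 2x2 pneumonia table from the slides:
#                 pneumonia   no pneumonia
# antiviral drug      579         45172
# control             648         45103
a, b = 579, 45172   # antiviral drug: yes / no
c, d = 648, 45103   # control:        yes / no

p1 = a / (a + b)                  # risk in the antiviral group
p2 = c / (c + d)                  # risk in the control group
risk_difference = p1 - p2
relative_risk = p1 / p2
odds_ratio = (a * d) / (b * c)    # cross-product ratio

print(round(risk_difference, 4))  # -0.0015
print(round(relative_risk, 3))    # 0.894
print(round(odds_ratio, 3))       # 0.892
```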
What should be used: risk difference, relative risk or odds ratio?
The odds ratio is the preferred estimate
In a case-control study it is usually not possible to estimate the probability of an outcome given X (π1), and therefore it is also not possible to estimate the difference of proportions or relative risk for that outcome
In a retrospective study, 709 patients with lung cancer (cases) were queried about their smoking behavior (X). Each case was matched with a control patient: same age, same gender, same hospital, but no lung cancer
Odds ratio = 2.97: the estimated odds of lung cancer for smokers were 2.97 times the estimated odds for non-smokers
             Lung cancer
             Cases   Controls
Smoker       688     650
Non-smoker   21      59
Total        709     709
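The case-control odds ratio follows directly from the cross-product of the table:

```python
# Retrospective smoking / lung-cancer table from the slide:
# rows = smoker / non-smoker, columns = cases / controls
cases_smoker, controls_smoker = 688, 650
cases_nonsmoker, controls_nonsmoker = 21, 59

odds_ratio = (cases_smoker * controls_nonsmoker) / (controls_smoker * cases_nonsmoker)
print(round(odds_ratio, 2))  # 2.97, matching the slide
```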
Part 3 – Statistical inference
Distributions
Bias and variance
Hypothesis testing
Statistical inference: by using the laws of probability, we infer conclusions about a population from data collected in a random sample
A parameter (μ, σ) is a number that describes the population. A parameter is a fixed number, but its value is unknown in practice.
A statistic (e.g. x̄, SD(x)) is a number that describes the sample. Its value is known once we have collected a sample, but it changes from sample to sample.
[Diagram: from the population (N elements; parameters μ, σ) draw a random sample (n elements), collect data, compute the statistics x̄ and SD(x), and make inferences about the population]
The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
Binomial distribution
Poisson distribution
Normal distribution
Binomial distribution
Poisson distribution
Normal distribution
Binomial distribution
Fixed number of n independent observations
Each observation falls in one of two categories (success/failure)
The probability of success p is the same for each observation
→ denote by X the number of successes among the n observations, which can take values 0, 1, …, n; then X ~ B(n, p)
Probability mass function: P(X = k) = C(n, k) · p^k · (1 − p)^(n−k)
Properties: μ_X = np, σ²_X = np(1 − p)
Poisson distribution: expresses the number Y of events in a given unit of time, space, volume, or any other dimension
Example → modeling a phenomenon in which we are waiting for an occurrence (waiting for customers to arrive in a bank)
Basic assumption: for small time intervals, the probability of an occurrence is proportional to the length of the waiting time
Single parameter λ > 0, the average number of events per unit of measurement
Probability mass function: P(Y = k) = e^(−λ) · λ^k / k!, where k is the number of occurrences of the event and λ the expected number of occurrences during the given interval
Properties: μ_Y = λ, σ²_Y = λ
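Both probability mass functions are easy to evaluate with the standard library, and summing k·P(k) recovers the stated means (a sketch; n = 10, p = 0.5 and λ = 3 are arbitrary illustration values):

```python
import math

def binom_pmf(k, n, p):
    # P(X = k) for X ~ B(n, p)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    # P(Y = k) for Y ~ Poisson(lam)
    return math.exp(-lam) * lam ** k / math.factorial(k)

print(binom_pmf(5, 10, 0.5))  # 0.24609375

# E[X] = np and E[Y] = lambda, recovered by summing k * P(k)
mean_b = sum(k * binom_pmf(k, 10, 0.5) for k in range(11))
mean_p = sum(k * poisson_pmf(k, 3.0) for k in range(100))
print(round(mean_b, 6))  # 5.0
print(round(mean_p, 6))  # 3.0
```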
Normal distribution
Density: φ(x) = (1 / (σ√(2π))) · exp( −½ ((x − μ)/σ)² )
X1, X2, …, Xn is a simple random sample with mean μ and variance σ²
If Xi ~ N(μ, σ²) then X̄ ~ N(μ, σ²/n)
Central limit theorem
Draw a simple random sample (X1, …, Xn) of size n from a population with mean μ and finite variance σ². When n is large, the sample average X̄ then follows approximately a normal distribution regardless of the data distribution: X̄ ~ N(μ, σ²/n)
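A small simulation illustrates the central limit theorem: sample means of strongly skewed exponential data (population mean 1, variance 1) cluster around μ with spread close to σ/√n (a sketch; the sample size, number of replicates and seed are arbitrary choices):

```python
import random
import statistics

rng = random.Random(42)
n = 50          # size of each simple random sample
reps = 2000     # number of samples drawn

# exponential data: mean 1, variance 1, heavily right-skewed
means = [statistics.mean(rng.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

print(round(statistics.mean(means), 3))   # close to mu = 1
print(round(statistics.stdev(means), 3))  # close to sigma / sqrt(n) = 0.141
```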
Sampling variability
Standard deviation vs standard error
Confidence interval
Law of large numbers: the population mean μ of X is unknown. The mean x̄ of a simple random sample → estimate of μ.
x̄ is a random variable that varies in repeated sampling
The law of large numbers guarantees that as the sample size of a simple random sample increases, the sample mean x̄ gets closer to the population mean μ
Unbiased statistic: a statistic used to estimate an unknown parameter is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated.
Variability of a statistic is described by the spread of its sampling distribution.
Spread is determined by the sampling design and the sample size. Larger samples have smaller spread.
How precise is our estimate?
Generalize findings to the general population: the estimate must approximate the population value
Representative sample → prevents the results for the sample from being biased
→ results are still subject to sampling variability: different samples from the same population will yield different results
Generalizing results from the sample to the study population then requires that we acknowledge sampling variability
Standard deviation ≠ standard error
The standard error measures the uncertainty in an estimate (standard error of the mean: SEM = σ/√n)
The standard deviation (SD) of the observations → measures the variability in the observations
Both are standard deviations, but the standard error shrinks with increasing sample size, in contrast to the standard deviation of the observations
The mean and SD are the preferred summary statistics for (normally distributed) data, and the mean and 95% confidence interval are preferred for reporting an estimate and its measure of precision.
[Sampling distribution of the sample means x̄: centered at μ with standard deviation σ/√n]
Confidence intervals
When we estimate a parameter by calculating a sample statistic, there is a degree of uncertainty in our estimation
We can construct an interval around the sample mean x̄ within which we expect the true population mean μ with known probability (e.g. 95% chance)
A (1 − α)·100% confidence interval for the mean contains the population mean with (1 − α)·100% chance. The confidence level or coverage probability is (1 − α)
  σ known:   x̄ ± z(α/2) · σ/√n
  σ unknown: x̄ ± t(n−1, α/2) · s/√n
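A sketch of the σ-known case (z = 1.96 for a 95% interval; the sample values and σ are made up). For σ unknown one would replace z by the t(n−1, α/2) quantile, which requires a statistics package such as scipy:

```python
import math
import statistics

def ci_mean_known_sigma(sample, sigma, z=1.96):
    """95% CI for the mean with sigma known: x_bar +/- z * sigma / sqrt(n)."""
    n = len(sample)
    m = statistics.mean(sample)
    half = z * sigma / math.sqrt(n)
    return m - half, m + half

sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1]  # made-up data
lo, hi = ci_mean_known_sigma(sample, sigma=0.2)
print(round(lo, 3), round(hi, 3))
```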
Principle of statistical tests
p-value and power
One-sided versus two-sided testing
Hypothesis testing
The null hypothesis (Ho) assumes ‘no difference’ or ‘no effect’
The average … is equal in both treatment groups
The alternative hypothesis (HA) claims the opposite
The average … differs by treatment
Type of decision     | H0 true                  | HA true
Accept H0 (p > α)    | Correct decision (1 − α) | Type II error (β)
Reject H0 (p < α)    | Type I error (α)         | Correct decision (1 − β) = power
We assume H0 is true unless we can demonstrate, based on sample data at the desired level of confidence, that HA is true.
→ level of confidence related to 2 potential types of statistical errors
• example: in a clinical trial we want to study the effect of an experimental drug (T) and compare it to a placebo (P)
H0 : effect of drug T = effect of P
HA : effect of drug T ≠ effect of P
Type I error (false positive): concern of the regulators, the drug is not working but it will go to the market
Type II error (false negative): concern of pharmaceutical companies, could not prove that the new drug is working
Sensitivity and specificity

                          Gold standard
                          Positive (ill)                       Negative (not-ill)
Test outcome → Positive   True Positive (TP)                   False Positive (FP), Type I error
Test outcome → Negative   False Negative (FN), Type II error   True Negative (TN)

Sensitivity: proportion of ill people identified as being ill, TP / (TP + FN)
Specificity: proportion of non-ill people identified as non-ill, TN / (TN + FP)
When are hypotheses needed?
Hypotheses are not needed in descriptive studies
If any of the following terms appears in the research question (i.e. the study is not simply descriptive), a hypothesis should be formulated: greater than, less than, causes, leads to, compared with, more likely than, associated with, related to, similar to, correlated with.
The hypothesis should be clearly stated in advance.
Principle of statistical testing
calculate a test statistic which measures ‘distance’ from the observed sample to the null hypothesis, whose distribution is known under the null hypothesis
Reject Ho
test statistic t exceeds a chosen cut-off c (critical value) in magnitude
p-value stays below a chosen cut-off α in magnitude
safety principle: cut-off is chosen such that the risk of making a Type I error is controlled at a prespecified significance level α
Usually α = 0.05 (test performed at the 5% significance level)
the power of the test (probability to avoid Type II errors, 1 − β) is not controlled → choose adequate designs and sufficiently large sample sizes
critical value c: reject H0 when the test statistic t exceeds the chosen cut-off c in magnitude
p-value: probability to find a result for the test statistic at least as extreme as the observed result (in the direction of the alternative hypothesis), if the null hypothesis holds
[Distribution of the test statistic: acceptance region between the critical values cL and cR, with rejection regions of area α/2 in each tail; α = 0.05]
Power: 1 − β = 1 − P(accept H0 | HA) = P(reject H0 | HA)
For many testing problems H0 is formulated very precisely, but there are usually an infinite number of distributions consistent with HA.
Standardized effect size: (μ1 − μ0)/σ. With what probability must the statistical test detect this smallest relevant difference? e.g. a ~91% chance of finding an association of that size or greater
One-sided versus two-sided testing
Decide prior to data analysis; avoid one-sided tests unless there are really good reasons for using them (only one direction of the association is clinically or biologically relevant)
it is never wrong to use a two-sided test where a one-sided test is applicable
at most a slight loss of power
57
[Figure: rejection regions for two-sided versus one-sided testing]
Part 3 – Statistical inference
Multiple and Post Hoc Hypotheses - testing problem
Inflated rate of false positive conclusions (Type I error)
Assume we perform 3 independent comparisons between 2 groups, each conducted with α = 0.05
The probability that every test correctly retains H0 = (0.95)³ = 0.857 → the chance of finding at least one false positive statistically significant test increases to 14.3% (1 − 0.857 = 0.143, not 0.05)
Adjusting for multiple hypotheses is especially important when the consequences of making a false positive error are large, e.g. mistakenly concluding that an ineffective treatment is beneficial
Adjustments can be made → e.g. False Discovery Rate control
Part 3 – Statistical inference 58
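The inflation described on this slide is easy to check numerically. A minimal sketch in Python (the function name is mine):

```python
# Hypothetical sketch: probability of at least one false positive
# when performing k independent tests, each at significance level alpha.

def familywise_error(alpha: float, k: int) -> float:
    """P(at least one Type I error) for k independent tests."""
    return 1 - (1 - alpha) ** k

inflation = familywise_error(0.05, 3)  # the 3-comparison example above
```

With k = 3 and α = 0.05 this reproduces the 14.3% figure from the slide.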
Part 4 – Statistical tests 59
Part 4
Statistical tests
Continuous data
Parametric statistics
Non-parametric statistics
Categorical data
Ordinal versus nominal
Types of testing
One-sample tests
Two dependent groups
Two independent groups
More than two groups
Controlling for covariates
Part 4 – Statistical tests 60
Continuous/Categorical data · Parametric statistics · Non-parametric statistics · Categorical data – Proportions
Dependent versus independent
Part 4 – Statistical tests 61
Dependent
Subject       Weight, Time x (Treatment A)   Weight, Time y (Treatment B)
Volunteer 1   x1A                            x1B
Volunteer 2   x2A                            x2B
Volunteer 3   x3A                            x3B
Volunteer 4   x4A                            x4B
Volunteer 5   x5A                            x5B

Independent
Subject        Treatment   Weight
Volunteer 1    A           x1A
Volunteer 2    A           x2A
Volunteer 3    A           x3A
Volunteer 4    A           x4A
Volunteer 5    A           x5A
Volunteer 6    B           x6B
Volunteer 7    B           x7B
Volunteer 8    B           x8B
Volunteer 9    B           x9B
Volunteer 10   B           x10B
Parametric statistics
assumes that the data come from a type of probability distribution and makes inferences about the parameters of that distribution
requires assumptions (e.g. Normal distribution); if they are correct, it produces more accurate and precise estimates and generally has more statistical power
e.g. Independent sample t-test
Assumptions
• Independent observations
• Population 1 → X1i ~ N(μ1, σ²)
Population 2 → X2i ~ N(μ2, σ²)
H0 : μ1 = μ2 → H0 two distributions are equal
Part 4 – Statistical tests 62
Non-parametric statistics – rank tests
no specific assumption about the population distribution required
Example: statistics based on Rank tests
Let X1, …, Xn denote a sample of n observations, the rank of observation Xj is defined as
The smallest observation gets rank 1, the second smallest rank 2, …, the largest observation gets rank n.
In case of ties (a tie is a pair of equal observations), the ranks of the tied observations are defined as the average of their ranks according to the definition just given. These are called mid-ranks.
Part 4 – Statistical tests 63
Rj = R(Xj) = number of observations in the sample ≤ Xj = Σi=1..n I(Xi ≤ Xj)
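The rank definition above, including mid-ranks for ties, can be sketched in a few lines of pure Python (the function name is mine):

```python
# Minimal sketch of the rank definition above, with mid-ranks for ties.
def midranks(xs):
    order = sorted(range(len(xs)), key=lambda j: xs[j])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        # find the block of tied observations starting at position i
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + 1 + j + 1) / 2  # average of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

r = midranks([2, 8, 12, 12, 15, 39])  # → [1.0, 2.0, 3.5, 3.5, 5.0, 6.0]
```

Applied to the observations on the next slide, the two tied values 12 both receive the mid-rank (3+4)/2 = 3.5.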
Example
Properties of rank-transformed observations
they only depend on the ordering of the observations
they are insensitive to outliers (robust)
the distribution of the ranks does not depend on the distribution of the observations
64
Observations   Ranks
2              1
8              2
12             (3+4)/2 = 3.5
12             (3+4)/2 = 3.5
15             5
39             6
Part 4 – Statistical tests
Non-parametric statistics – permutation tests
reference distribution of a characteristic of interest is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points.
Example: a company has a new training program and wishes to evaluate whether the new method is better than the traditional one. To assess the effect of the new method, they set up an experiment with 7 new employees. Four of them are randomly assigned to the new training method, and the other three receive the old training method.
65
Observed data
New: 37, 49, 55, 57
Traditional: 23, 31, 46

Permutations: relabel the 7 observations in all possible ways (e.g. one rearrangement exchanges 55 in the New group with 31 in the Traditional group)
Number of rearrangements: 7! / (4!·3!) = 35
Part 4 – Statistical tests
Permutation tests
to verify whether there is a difference in means of a continuous measurement in 2 independent populations
Permutation null distribution
H0 : F1(x) = F2(x) for all x.
HA : μ1 > μ2
Test statistic
Example: we have 35 possible permutations (each having a t*-value), the collection of all the t*-values is the permutation null distribution
66
T = X̄1 − X̄2
Part 4 – Statistical tests
Permutation test - example
Test statistic → t = 49.5 – 33.3 = 16.2
Permutation null distribution of the 35 possible permutations, under the null hypothesis all t*-values are equally likely
H0 will be rejected for large T (T > c, critical value); c controls the Type I error rate at α: P(T > c | H0) ≤ α
67
T = X̄1 − X̄2
Part 4 – Statistical tests
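The full permutation null distribution for the training example can be enumerated directly. A sketch in pure Python, assuming the one-sided alternative μ1 > μ2 from the previous slide:

```python
# Sketch: exact permutation null distribution for the training example.
# 7 observations, 4 labelled "New" in every possible way (35 relabellings).
from itertools import combinations

new = [37, 49, 55, 57]
traditional = [23, 31, 46]
pooled = new + traditional

def mean(xs):
    return sum(xs) / len(xs)

t_obs = mean(new) - mean(traditional)       # observed T = X̄1 − X̄2

t_null = []
for idx in combinations(range(7), 4):       # every possible relabelling
    g1 = [pooled[i] for i in idx]
    g2 = [pooled[i] for i in range(7) if i not in idx]
    t_null.append(mean(g1) - mean(g2))

# one-sided p-value: fraction of permutations at least as extreme as t_obs
p_value = sum(t >= t_obs - 1e-9 for t in t_null) / len(t_null)
```

The 35 t*-values form the permutation null distribution; the p-value is simply the proportion of them at least as large as the observed statistic.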
Parametric versus non-parametric tests
Parametric tests: the data are sampled from a population with N-distribution OR large sample size (CLT)
Smaller sample size: outliers or skewed distribution can be problematic → transformation or non-parametric tests (permutation or rank tests)
Permutation tests: very flexible
Non-parametric rank tests: in case of no meaningful measurement scale (pain score, Apgar score, …)
Careful with formulation of H0 and interpretation of the analysis
Less power
68Part 4 – Statistical tests
Categorical / discrete data: the set of all possible values can be enumerated
Ordinal data: ordered categories
Age group, pain assessment from no to severe, Likert scales (agree strongly, agree, neutral, disagree, disagree strongly)
Nominal data: categories have no natural order, sometimes called qualitative data (gender, race, hair color)
Counts: variables are represented by frequencies
Proportions / percentages
Ratio of counts, e.g. binary or dichotomous data: have exactly two possible outcomes (success / failure); we count the number of successes in the number of trials
69
Part 4 – Statistical tests
One-sample tests · Parametric statistics · Non-parametric statistics · Categorical data – Proportions
One-sample t-test
to verify whether the mean of a continuous measurement deviates from a given value μ0
H0 : μ = μ0
HA : μ ≠ μ0
Test statistic
t-distributed with n-1 degrees of freedom (df)
Assumptions
Independent observations
Normally distributed observations or large sample
Part 4 – Statistical tests 70
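The statistic above can be sketched directly from its definition; the data below are made up for illustration:

```python
# Minimal sketch of the one-sample t statistic (data are hypothetical).
import math

def one_sample_t(xs, mu0):
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # sample variance
    t = (xbar - mu0) / math.sqrt(s2 / n)
    return t, n - 1                                    # statistic and df

t, df = one_sample_t([2, 4, 6, 8, 10], mu0=5)
```

The resulting t is compared against a t-distribution with n − 1 degrees of freedom.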
One categorical variable with J ≥ 2 categories
Example: number of students in each of the three main subjects in the 1st master psychology (2003-2004)
Suppose that in the population, the true proportions are:
Part 6 – Categorical data 71
1-way contingency tables – X² test (one categorical variable with J ≥ 2 categories)
H0 : pj = πj for all j (equivalently, for expected frequencies: μj = n·πj)
HA : pj ≠ πj for at least one j
Statistic: X² = Σj (nj − n·πj)² / (n·πj), approximately χ²-distributed under H0
Example, df = J − 1 = 2 and P < .0001, strongly suggesting that the null hypothesis should be rejected.
Part 6 – Categorical data 72
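The goodness-of-fit statistic is a short computation; the counts below are hypothetical, not the psychology enrolment data from the slide:

```python
# Sketch of the X² goodness-of-fit statistic (counts are hypothetical).
observed = [50, 30, 20]            # nj
pi = [1 / 3, 1 / 3, 1 / 3]         # hypothesised proportions πj
n = sum(observed)
expected = [n * p for p in pi]     # μj = n·πj

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1             # J − 1 degrees of freedom
```

The p-value then follows from the χ² distribution with J − 1 degrees of freedom.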
Two dependent samples · Parametric statistics · Non-parametric statistics · Categorical data – Proportions
Paired sample t-test
to verify whether 2 continuous measurements, obtained from paired subjects, are the same on average
H0 : μ1 = μ2
HA : μ1 ≠ μ2
→ calculate differences Y = X1 – X2 and use the one-sample t-test to verify whether H0 : μ = 0 versus HA : μ ≠ 0, where μ is the average of Y
Assumptions
Independent differences
Normally distributed differences or large sample (n ≥ 40)
n ≥ 15: t-test fine unless very skewed distribution or outliers
n < 15: data should be approximately normally distributed; very skewed distributions or outliers are problematic
Part 4 – Statistical tests 73 (assumptions from ‘Introduction to the Practice of Statistics’, Moore & McCabe)
Wilcoxon signed rank test
Compare 2 dependent samples → the difference variable Y = X1 - X2
With Yi+ the observations on the positive differences (i = 1, …, n+) and Yi− the observations on the negative differences (i = 1, …, n−), then
H0 : P(Y - < Y +) = ½
HA : P(Y - < Y +) > ½
Statistic: V = sum of the ranks of |Yi| over the positive differences
Part 4 – Statistical tests 74
Wilcoxon signed rank test - Example
Two stories were narrated to children with reading disorders; story 1 was not illustrated whereas story 2 was illustrated
V= 9, n=5, p=0.406
From this small sample we cannot conclude that children with reading disorders tell a story better when the story is illustrated.
75
Child                            1      2      3      4      5
Story 1                          0.40   0.72   0.00   0.36   0.55
Story 2                          0.77   0.49   0.66   0.28   0.38
Difference Yi (Story 2 − Story 1) 0.37  −0.23   0.66  −0.08  −0.17
Ranks of |Yi|                    4      3      5      1      2
Signed ranks                     4     −3      5     −1     −2      →  V = 4 + 5 = 9
Part 4 – Statistical tests
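The value V = 9 from the example can be recomputed from the slide's own data; a pure-Python sketch:

```python
# Recomputing V for the story example above.
story1 = [0.40, 0.72, 0.00, 0.36, 0.55]
story2 = [0.77, 0.49, 0.66, 0.28, 0.38]
diffs = [round(b - a, 2) for a, b in zip(story1, story2)]  # Yi = Story2 − Story1

# rank the absolute differences (no ties in this example)
order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
ranks = [0] * len(diffs)
for r, i in enumerate(order, start=1):
    ranks[i] = r

V = sum(r for r, d in zip(ranks, diffs) if d > 0)  # sum of positive ranks
```

This reproduces the signed ranks 4, −3, 5, −1, −2 and V = 9 from the table.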
Models for matched pairs
For comparing categorical responses for 2 samples when each sample has the same subject or when a natural pairing exists between each subject in one sample and a subject from the other sample.
McNemar test compares proportions in paired studies
H0 : π1+ = π+1
HA : π1+ ≠ π+1
Part 4 – Statistical tests 76
          After
Before    Yes    No     Total
Yes       n11    n12    n1+
No        n21    n22    n2+
Total     n+1    n+2    n
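The McNemar statistic uses only the discordant pairs of the table above; a hedged sketch with hypothetical counts (the statistic below is the standard large-sample version, not stated explicitly on the slide):

```python
# Sketch of the McNemar statistic (counts are hypothetical).
n12, n21 = 5, 15                            # discordant pairs: Yes→No and No→Yes
mcnemar = (n12 - n21) ** 2 / (n12 + n21)    # ~ χ² with 1 df under H0
```

The concordant cells n11 and n22 do not enter the statistic at all.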
Two independent samples · Parametric statistics · Non-parametric statistics · Categorical data – Proportions
Independent sample t-test
to verify whether the mean of a continuous measurement is the same in 2 independent populations
H0 : μ1 = μ2 versus HA : μ1 ≠ μ2
Test statistic
equal measurement variance in the 2 groups
unequal measurement variance in the 2 groups
Assumptions
Independent observations
Normally distributed observations or large sample in each group
Small but equal sample sizes (n1 = n2 = 5) and comparable distribution shapes → we can still rely on t-test procedures
77
Part 4 – Statistical tests
Independent sample t-test – continued
Equal measurement variance in the 2 groups: SE of the mean difference can be estimated as SE = Sp·√(1/n1 + 1/n2)
with pooled variance Sp² = [(n1 − 1)S1² + (n2 − 1)S2²] / (n1 + n2 − 2)
Unequal measurement variance in the 2 groups: SE of the mean difference can be estimated as SE = √(S1²/n1 + S2²/n2)
(1 − α)100% confidence interval for μ1 − μ2: (X̄1 − X̄2) ± t(df; 1 − α/2)·SE
Part 4 – Statistical tests 78
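The equal-variance (pooled) version of the statistic can be sketched directly from these formulas; the data are made up:

```python
# Sketch of the pooled independent-sample t statistic (data are hypothetical).
import math

def pooled_t(x1, x2):
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    s1 = sum((x - m1) ** 2 for x in x1) / (n1 - 1)            # sample variances
    s2 = sum((x - m2) ** 2 for x in x2) / (n2 - 1)
    sp2 = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)     # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2                        # statistic and df

t, df = pooled_t([1, 2, 3], [2, 4, 6])
```

For the unequal-variance version, se would instead be √(s1/n1 + s2/n2) with Welch-approximated degrees of freedom.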
Mann-Whitney (U) test, Wilcoxon rank-sum test
Compare 2 independent samples
H0 : F1(x) = F2(x) for all x
HA : P(X1 < X2) ≠ ½
where X1 and X2 have distributions F1 and F2, respectively.
If X1 and X2 are continuous random variables, the test may be thought of as testing the null hypothesis that the probability of an observation from one population exceeding an observation from the second population is 0.5, this implies
P(X1 < X2) = P(X1 > X2) = ½
→ test statistics based on this principle
Part 4 – Statistical tests 79
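The pairwise-comparison principle above can be sketched as a direct count over all pairs (one common formulation of the U statistic; ties contribute 1/2); the data are made up:

```python
# Sketch of the Mann-Whitney U statistic as a count over pairs
# (data are hypothetical).
def mann_whitney_u(x1, x2):
    u = 0.0
    for a in x1:
        for b in x2:
            if a > b:
                u += 1      # pair where an observation from sample 1 is larger
            elif a == b:
                u += 0.5    # ties contribute one half
    return u

U = mann_whitney_u([1, 3, 5], [2, 4])
```

Under H0, U is expected to be about n1·n2 / 2, reflecting P(X1 > X2) = ½.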
Is the Wilcoxon rank-sum test the nonparametric alternative for the independent-sample t-test?
Remember: H0 : F1(x) = F2(x) for all x (the 2 distributions are equal)
HA : P(X1 < X2) ≠ ½
→ the ranks cannot be used to estimate the mean!
Independent sample t-test
H0 : μ1 = μ2
HA : μ1 ≠ μ2
Part 4 – Statistical tests 80
2x2 contingency tables
Example: Patient characteristics at the onset of first-line treatment with gefitinib or chemotherapy
81
Frequencies
             ECOG PS
Treatment    < 2     ≥ 2     Total
Gefitinib    70      17      87
Chemo        57      4       61
Total        127     21      148

Conditional distribution of ECOG PS status given treatment
             ECOG PS
Treatment    < 2     ≥ 2     Total
Gefitinib    0.805   0.195   1.00
Chemo        0.934   0.066   1.00
Two variables are said to be statistically independent if the conditional distributions of Y (Eastern Cooperative Oncology Performance status) are identical at each level of X (treatment)
Part 4 – Statistical tests
Testing independence - Pearson chi-square test
H0 : πij = πi+·π+j for all i and j (equivalently, for expected frequencies: μij = n·πi+·π+j)
HA : πij ≠ πi+·π+j for at least one cell
Statistic: X² = Σij (nij − μ̂ij)² / μ̂ij, with estimated expected counts μ̂ij = ni+·n+j / n
Example
X² = 4.964, df = 1: ECOG PS status and treatment are significantly associated. The proportion of patients with a poor ECOG performance status (≥ 2) was higher in the first-line gefitinib group (20%) than in the first-line chemotherapy group (7%); P = 0.026.
82
Part 4 – Statistical tests
Testing independence – Fisher’s exact test
For small samples, Fisher’s exact test: assumes that the row and column totals are fixed (hypergeometric distribution). When this assumption is not met (most cases), Fisher’s exact test is very conservative, resulting in a Type I error rate below 0.05.
H0 : θ = 1
HA : θ ≠ 1
Part 6 – Categorical data 83
Treatment    Adeno   Nonadeno   Total
Gefitinib    85      2          87
Chemo        58      3          61
Total        143     5          148
Two-sided p-values:Fisher’s exact test p = 0.403Chi-square test p=0.385
Large samples
In case of very large sample sizes the Pearson chi-square test will reject almost any null hypothesis, even if the deviation of the observed from the expected counts is of little practical importance → use the Gini index (the value equals the proportion of observations that would have to be moved from one cell to another in order for the observed counts to equal the expected counts)
Small samples
Inferences based on the chi-square distribution become questionable when the expected counts in some cells become too small (below 5), even when the total sample size is large → use exact solutions (Fisher’s exact test)
Part 6 – Categorical data 84
≥ two independent samples · Parametric statistics · Non-parametric statistics · Categorical data – Proportions
One-way analysis of variance (ANOVA)
to verify whether the mean of a continuous measurement is the same in 2 or more independent populations
H0 : μ1 = μ2 = … = μk versus
HA : at least 1 of the population means differs
Test statistic
Assumptions
Independent observations
Normally distributed observations or large sample within each group (Q-Q plots)
Equal variance in each group (boxplots or Levene’s test)
85
F = MSE_Between / MSE_Within ~ F(k−1, n−k) under H0
Part 4 – Statistical tests
ANOVA principle
Is variation between groups large as compared to variation within groups
86
Σi Σj (Yij − Ȳ)² = Σi Σj (Yij − Ȳi)² + Σi Σj (Ȳi − Ȳ)²
(sums over groups i = 1, …, k and observations j = 1, …, ni)
Total Sum of Squares = within SS + between SS
Consider k groups with each ni observations with jth observation in ith group
Part 4 – Statistical tests
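The decomposition above, and the F statistic built from it, can be sketched with made-up data:

```python
# Sketch of the ANOVA sum-of-squares decomposition and F statistic
# (data are hypothetical).
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
k = len(groups)
n = sum(len(g) for g in groups)
grand = sum(sum(g) for g in groups) / n

ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)     # between SS
ssw = sum(sum((y - sum(g) / len(g)) ** 2 for y in g) for g in groups)  # within SS
sst = sum((y - grand) ** 2 for g in groups for y in g)                 # total SS

F = (ssb / (k - 1)) / (ssw / (n - k))   # compared against F(k−1, n−k)
```

Note that sst equals ssb + ssw, which is exactly the decomposition on this slide.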
87
ANOVA Table

Source    Sum of Squares (SS)          df     Mean Squared Error (MSE)   F
Between   SSB = Σi Σj (Ȳi − Ȳ)²        k−1    MSE_B = SSB / (k−1)        MSE_B / MSE_W
Within    SSW = Σi Σj (Yij − Ȳi)²      n−k    MSE_W = SSW / (n−k)
Total     SST = Σi Σj (Yij − Ȳ)²
Part 4 – Statistical tests
Deviations from the assumptions
one-way analysis of variance is robust against lack of normality
→ in case of important deviations from a normal distribution : use nonparametric Kruskal-Wallis test or transformations
ANOVA is not very sensitive to the assumption of homogeneity of variances (perform Levene’s test at the 1% significance level)
→ heterogeneity of variances
• little impact when the group level sample sizes ≈ equal: Type I error rate is slightly increased
• with important heterogeneity and markedly ≠ group level sample sizes, weighted least squares regression may be used, weighting each observation by the inverse group level standard deviation
Part 4 – Statistical tests 88
Post-hoc analysis
if ANOVA detects no difference, we conclude that there is insufficient evidence of a difference in means
if ANOVA detects a difference → post-hoc analysis to investigate where the difference is
DO NOT perform all pairwise comparisons using independent samples t-tests → multiple testing problem
Assume we perform 3 different t-tests, each conducted with α = 0.05
The probability that every test correctly retains H0 = (0.95)³ = 0.857 (assuming independence of the tests) → the probability that at least one of the three tests leads to conclusion HA when H0 holds in each case is 1 − 0.857 = 0.143 (not 0.05).
The level of significance and power for a family of tests ≠ individual test
Part 4 – Statistical tests 89
Family-wise error rate - αE
The probability of making at least 1 false discovery (type I errors) among all the hypotheses when performing multiple pairwise tests
→ We should correct for the risk of false detections
most procedures for multiple testing are designed to control the risk of at least 1 false detection at αE, assuming that all k null hypotheses are true
when the k tests are independent, each with significance level α, then
αE = P(at least 1 Type I error) = 1 − (1 − α)^k ≈ k·α
family-wise error rate increases with the number of tests
Part 4 – Statistical tests 90
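How fast the family-wise error rate grows, and how it compares to the kα approximation, can be checked numerically (a small sketch):

```python
# Check of αE = 1 − (1 − α)^k against the kα approximation (sketch).
alpha = 0.05
rows = [(k, 1 - (1 - alpha) ** k, k * alpha) for k in (1, 3, 10)]
# each row: (number of tests k, exact family-wise error rate, kα approximation)
```

The approximation is tight for small k but overstates the rate as k grows (at k = 10, exact ≈ 0.401 versus kα = 0.5).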
Multiple comparison procedures that control family-wise error rate
Bonferroni procedure
Conservative test: makes fewer Type I errors than allowed for (and thus more Type II errors)
Only applicable when the effects to be investigated are identified in advance of the data analysis
Tukey procedure
Preferred method when only pairwise comparisons are to be made
Scheffé procedure
Preferred method when the family of interest is a set of all possible contrasts among the factor level means
Part 4 – Statistical tests 91
Rules of thumb
never interpret a large p-value as indicating absence of association
never interpret a small p-value as indicating an important association
report p-values in combination with an effect estimate and confidence interval! This allows for judging whether the effect is practically significant.
in some cases, it may be advisable to determine equivalence intervals prior to data analysis
Part 4 – Statistical tests 92
> two independent samples · Parametric statistics · Non-parametric statistics · Categorical data – Proportions
Kruskal-Wallis rank test
k-sample problem, compare more than 2 independent samples
H0 : F1(x) = F2(x) = … = Fk(x) for all x
HA : P(Xi < Xj) ≠ ½ for some i, j — the observations in some populations are systematically larger than in other populations
Assumptions
the observations in each group come from populations with the same shape of distribution
Part 4 – Statistical tests 93
Kruskal-Wallis rank test
the rank test statistic is basically an MSEbetween based on the ranks
rank all observations in the combined sample
let Rij denote the rank Xij (i =1, …, k, j =1, …, ni)
Kruskal-Wallis test statistic: K = [12 / (N(N+1))] Σi ni (R̄i − (N+1)/2)², with R̄i the average of the ranks Rij (j = 1, …, ni) in the ith group and N the total number of observations
Part 4 – Statistical tests 94
Kruskal-Wallis rank test
when H0 is rejected → at least 2 means are different → pairwise comparisons Wilcoxon rank sum statistic or Mann-Whitney statistic: alternative hypothesis in terms of probabilities: HA : P(X1 > X2) …
Family-wise error rate – αE → we should correct for the risk of false detections, Bonferroni correction: when m tests must be performed simultaneously, each of the tests must be performed at α = αE / m
equivalently: multiply each p-value by m before interpreting
Part 4 – Statistical tests 95
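The Bonferroni correction described above is a one-liner; a sketch (the p-values are made up):

```python
# Sketch of the Bonferroni adjustment: multiply each p-value by the
# number of tests m, capped at 1 (p-values here are hypothetical).
def bonferroni(pvalues):
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

adjusted = bonferroni([0.01, 0.04, 0.2])
```

Each adjusted p-value is then compared to αE directly.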
≥ two independent samples, controlling for covariates · Parametric statistics · Non-parametric statistics · Categorical data – Proportions
Analysis of Covariance - ANCOVA
Adjustment for a confounder (e.g. age)
Just like in ANOVA we have a treatment effect (consider for example 3 treatments)
We add the variable age to our model → adjustment for a confounder
Part 4 – Statistical tests 96
Breslow-Day test · Cochran-Mantel-Haenszel test
Three-way contingency tables
In studying the effect of an explanatory variable X on a response variable Y, one should control for covariates that can influence that relationship
Example: Peginterferon alfa for hepatitis C
Part 4 – Statistical tests 97
                       Virologic Response
Genotype   Treatment   Yes    No
1          A           138    160
           B           103    182
2          A           106    34
           B           88     57
Total      A           244    194
           B           191    239

Conditional odds ratio θ1 (genotype 1), conditional odds ratio θ2 (genotype 2), marginal odds ratio (Total)
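The conditional and marginal odds ratios can be computed directly from the counts in the table above (a small sketch; the function name is mine):

```python
# Conditional and marginal odds ratios from the hepatitis C table above.
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

or_genotype1 = odds_ratio(138, 160, 103, 182)   # conditional θ1 (genotype 1)
or_genotype2 = odds_ratio(106, 34, 88, 57)      # conditional θ2 (genotype 2)
or_marginal = odds_ratio(244, 194, 191, 239)    # marginal (Total rows)
```

Comparing θ1 and θ2 is what the Breslow-Day test on the next slide formalises.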
Breslow-Day test for testing homogeneity of odds ratios
The odds ratio between X and Y is the same in the different Z categories. It is a test of homogeneous association.
Part 4 – Statistical tests 98
Cochran-Mantel-Haenszel Test of conditional independence
Conditional XY independence given Z in a 2 × 2 × K table.
The response is conditionally independent of the treatment in any given stratum
Inappropriate when the association varies dramatically among the partial tables
Part 4 – Statistical tests 99
Cochran-Mantel-Haenszel Test of conditional independence
Example colon cancer: ECOG PS-adjusted OR = 1.52 (95% CI 0.98-2.36; p = 0.064, CMH test), indicating that the response is independent of the treatment within the ECOG PS strata.
Part 4 – Statistical tests 100
Source: Bokemeyer et al., 2008 (M&M and p. 667, Efficacy)
                         Response
ECOG PS    Treatment     Yes    No
0          Cet. + FOL    –      –
           FOLFOX-4      –      –
1          Cet. + FOL    –      –
           FOLFOX-4      –      –
2          Cet. + FOL    –      –
           FOLFOX-4      –      –
Total      Cet. + FOL    77     92
           FOLFOX-4      60     108

Conditional odds ratios θ1, θ2, θ3 (one per ECOG PS stratum); marginal odds ratio = 1.51