Data Analysis Class 6: Hypothesis testing and confidence intervals


What to ‘expect’?

• We have studied various distributions:
– Bernoulli: binary
– Geometric: number of Bernoulli experiments to first success
– Binomial: number of successes in n Bernoulli experiments
– Gaussian: real-valued, bell-shaped curve
– Exponential: real-valued time to first event
– Poisson: number of events in a unit time interval

• Tells us what to expect from measurements
• What if we are not sure about the parameters?
• Can we use measurements as evidence?
– Fitting distributions to data
– Testing hypotheses based on data
– Confidence intervals

Remember…

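Type: distribution / density function; mean; variance

• Bernoulli: $P(X=x) = p^x (1-p)^{1-x}$, $x \in \{0, 1\}$; mean $p$; variance $p(1-p)$
• Geometric: $P(X=x) = (1-p)^{x-1} p$, $x = 1, 2, \ldots$; mean $1/p$; variance $(1-p)/p^2$
• Binomial: $P(X=x) = \binom{n}{x} p^x (1-p)^{n-x}$, $x = 0, \ldots, n$; mean $np$; variance $np(1-p)$
• Gaussian: $P(X=x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$; mean $\mu$; variance $\sigma^2$
• Exponential: $P(X=x) = \lambda \exp(-\lambda x)$, $x \ge 0$; mean $1/\lambda$; variance $1/\lambda^2$
• Poisson: $P(X=x) = \frac{\lambda^x \exp(-\lambda)}{x!}$, $x = 0, 1, 2, \ldots$; mean $\lambda$; variance $\lambda$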

Properties of distributions

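• Mean: $\mu = E(X) = \sum_x x \, P(X=x)$ (discrete) or $\int x \, P(X=x) \, dx$ (continuous)
• Sample mean (average): $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
• Variance: $\sigma^2 = E((X-\mu)^2) = \sum_x (x-\mu)^2 P(X=x)$ (discrete) or $\int (x-\mu)^2 P(X=x) \, dx$ (continuous)
• Sample variance: $s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$ (or with $1/(n-1)$ for the unbiased estimate)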

Moment matching

• Choose the parameters such that the first, second, … order moments of the distribution are equal to their empirical estimates

Bernoulli / geometric / binomial

• Bernoulli first-order moment: $E(X) = p$, so set $\hat{p}$ equal to the sample mean
• (Similar for geometric and binomial)
• Only one parameter, so higher-order moments are not needed

Bernoulli / geometric / binomial

Empirical mean: p = 0.34, estimated based on 100*10 Bernoulli outcomes
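A minimal sketch of moment matching for a Bernoulli parameter. The data are simulated here; the assumed true value 0.34, the seed and the sample size of 1000 are illustrative choices, not the lecture's data set, and `fit_bernoulli` is a hypothetical helper name.

```python
import random

def fit_bernoulli(outcomes):
    """Moment matching: set p equal to the sample mean of the 0/1 outcomes."""
    return sum(outcomes) / len(outcomes)

# Simulated stand-in data (assumption): 1000 Bernoulli outcomes.
rng = random.Random(0)
true_p = 0.34
outcomes = [1 if rng.random() < true_p else 0 for _ in range(1000)]

p_hat = fit_bernoulli(outcomes)
print(f"estimated p = {p_hat:.3f}")
```

The estimate should land close to the assumed true value, with an error that shrinks as the number of outcomes grows.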

Gaussian

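• First- and second-order moments: $E(X) = \mu$ and $E((X-\mu)^2) = \sigma^2$, so set $\hat{\mu}$ to the sample mean and $\hat{\sigma}^2$ to the sample variance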

• Only two parameters, so no need for higher moments

Gaussian

Empirical means: 3.2 and 15.6

Empirical stds: 1.1 and 2.0
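The Gaussian fit can be sketched in the same way. The samples below are simulated with an assumed mean of 3.2 and standard deviation of 1.1 (borrowed from the numbers above purely for illustration), and the variance uses the 1/n normalisation, which is an assumption about the convention intended here.

```python
import random
import statistics

def fit_gaussian(samples):
    """Moment matching: mu = sample mean, sigma^2 = sample variance (1/n)."""
    mu = statistics.fmean(samples)
    var = statistics.pvariance(samples, mu)  # population variance, divides by n
    return mu, var

# Simulated stand-in data (assumption): 1000 Gaussian samples.
rng = random.Random(1)
samples = [rng.gauss(3.2, 1.1) for _ in range(1000)]

mu_hat, var_hat = fit_gaussian(samples)
print(f"mean = {mu_hat:.2f}, std = {var_hat ** 0.5:.2f}")
```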

Multivariate Gaussian

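• Multivariate Gaussian density function (for a $d$-dimensional vector $\mathbf{x}$):

$$P(\mathbf{X}=\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$

• Parameters:
– Mean vector mu
– Covariance matrix Sigma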

Multivariate Gaussian

• Parameters can be estimated from a set of samples {x1,x2,…,xn}:

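$$\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i, \qquad \hat{\boldsymbol{\Sigma}} = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T$$

(or with $1/(n-1)$ for the unbiased covariance estimate)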
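A small sketch of estimating the mean vector and covariance matrix from samples, written with plain lists so that each term of the sums is visible. The function name and the four two-dimensional points are hypothetical, and the 1/n normalisation is an assumption.

```python
def fit_multivariate_gaussian(samples):
    """Return (mean vector, covariance matrix) using the 1/n normalisation."""
    n = len(samples)
    d = len(samples[0])
    mu = [sum(x[j] for x in samples) / n for j in range(d)]
    sigma = [
        [sum((x[j] - mu[j]) * (x[k] - mu[k]) for x in samples) / n
         for k in range(d)]
        for j in range(d)
    ]
    return mu, sigma

# Hypothetical data: the four corners of a square, so the covariance is diagonal.
points = [[0.0, 0.0], [2.0, 2.0], [0.0, 2.0], [2.0, 0.0]]
mu_hat, sigma_hat = fit_multivariate_gaussian(points)
print(mu_hat)
print(sigma_hat)
```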

Exponential / Poisson

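• Exponential mean: $E(X) = 1/\lambda$, so $\hat{\lambda}$ = 1 / (mean time between events)
• Poisson mean: $E(X) = \lambda$, so $\hat{\lambda}$ = mean number of events per unit time interval
• Both will give the same result (lambda = empirical number of events per unit of time)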

http://news.bbc.co.uk/1/hi/world/europe/2008892.stm

Significant plane crashes since 1 January 1998…

Lambda = 0.015 = 1/mean(time between crashes)

Lambda = 1.5 = average number of crashes in 100 days

(Note: 100 times larger, because the unit time interval is 100 times longer.)
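The two routes to lambda can be sketched as follows. The crash days are made-up numbers, not the BBC data, and both helper names are hypothetical. On a finite record the two estimates agree only approximately, because the observation window is not exactly the span between the first and last event.

```python
def rate_from_gaps(event_days):
    """Exponential view: lambda = 1 / mean time between consecutive events."""
    gaps = [b - a for a, b in zip(event_days, event_days[1:])]
    return 1.0 / (sum(gaps) / len(gaps))

def rate_from_counts(event_days, total_days, unit_days):
    """Poisson view: lambda = mean number of events per unit time interval."""
    n_units = total_days / unit_days
    return len(event_days) / n_units

# Hypothetical crash days, counted from the start of a 720-day record.
crash_days = [12, 95, 130, 260, 301, 388, 470, 515, 640, 702]

per_day = rate_from_gaps(crash_days)
per_100_days = rate_from_counts(crash_days, total_days=720, unit_days=100)
print(f"from gaps:   {per_day:.4f} per day = {100 * per_day:.2f} per 100 days")
print(f"from counts: {per_100_days:.2f} per 100 days")
```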

Hypothesis testing

• Given a distribution (and parameters)

• E.g. binomial: number of faulty items in a lot of a factory pipeline

• Empirical data may urge us to revise our hypothesis

Binomial distribution

• Consider a pipeline in a factory

• If the expected probability of a fault is (should be) 0.01, what can we conclude if we see a batch with 10% faults?

• p=0.01, n=100 (batch size), x=10 (number faulty)

• Probability of seeing something equally or more surprising = the p-value: P(X>=10)?

$$P(X \ge 10) = \sum_{x=10}^{100} \binom{100}{x} p^x (1-p)^{100-x} \approx 7.6 \times 10^{-8}$$

• Extremely small, so we should reject the hypothesis that p=0.01: the pipeline must be broken!?

Binomial distribution

• In practice:

$$P(X \ge 10) = \sum_{x=10}^{100} \binom{100}{x} p^x (1-p)^{100-x} \approx 7.6 \times 10^{-8}$$

• This can be computed with the cumulative binomial distribution function
• In MATLAB (with p=0.01): 1-binocdf(9,100,p)
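For readers without MATLAB, the same tail probability can be sketched in Python by summing the binomial terms directly. `binomial_tail` is a hypothetical helper, not a library function.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

# Factory example: n = 100 items, 10 faulty, null hypothesis p = 0.01.
p_value = binomial_tail(10, 100, 0.01)
print(f"p-value = {p_value:.2e}")
```

The result should match the value of about 7.6E-8 quoted above.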

Poisson distribution

• Assume the expected number of plane crashes in 100 days is supposed to be 1.5. What can we conclude if there are 5 in a given 100 days? (This is the case for the 16th unit time interval.)
• The p-value = P(X>=5):

$$P(X \ge 5) = \sum_{x=5}^{\infty} \frac{\lambda^x \exp(-\lambda)}{x!}$$

• P-value is small – should we reject the null hypothesis that lambda=1.5?

Poisson distribution

• In practice:

$$P(X \ge 5) = \sum_{x=5}^{\infty} \frac{\lambda^x \exp(-\lambda)}{x!}$$

• This can be computed by means of the cumulative Poisson distribution function
• In MATLAB (with lambda=1.5): 1-poisscdf(4,lambda)
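An equivalent sketch in Python, using the complement of the Poisson cumulative distribution so that the infinite sum becomes a finite one. `poisson_cdf` and `poisson_tail` are hypothetical helper names.

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(lam**x * exp(-lam) / factorial(x) for x in range(k + 1))

def poisson_tail(k, lam):
    """P(X >= k) = 1 - P(X <= k - 1)."""
    return 1.0 - poisson_cdf(k - 1, lam)

# Plane crash example: 5 crashes observed, null hypothesis lambda = 1.5.
p_value = poisson_tail(5, 1.5)
print(f"p-value = {p_value:.4f}")
```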

Hypothesis testing

• In general:
– Assume a null hypothesis for the data
  • Faults are Bernoulli random variables with given p
  • Crashes occur at a fixed rate lambda per unit time interval
– Gather data
– Compute a test statistic of the data
  • Number of faults in a batch of n
  • Number of crashes in a unit time interval
– Compute the p-value: the probability that the test statistic is at least as large on random data from the null hypothesis
– If the p-value is smaller than a threshold (0.01, 0.05, …), reject the null hypothesis

Hypothesis testing

• In general:
– Hypothesis testing quantifies that a random variable will typically be close to its mean
– This holds more strongly as the standard deviation is smaller

Permutation testing

• Sometimes, the distribution of the test statistic is too complex to compute
• Then: permutation testing
– Generate random data sets by permuting the sampled one (e.g. 1000 times)
– Compute the fraction of times the test statistic is at least as large in those permuted versions
– This is an approximation of the p-value

• (Assumption of this approach: permuted versions of the data are equally likely under the null hypothesis)

Permutation testing

• Test statistic = number of plane crashes in 16th unit time interval of 100 days

Permutation testing

• Generate 1000 random crash time series with the same number of crashes in the same period (e.g. by permuting the days)
• Compute the number of crashes in the 16th unit time interval (of 100 days)
• Compute the proportion of those 1000 permutations where the number of crashes in this interval was at least 5

• This is the p-value estimate!

• Result (in my experiment): 0.018 – very close to 0.02 as computed using Poisson
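The procedure can be sketched as below. The daily series is hypothetical (2000 days, 24 crashes, 5 of them placed in the 16th block of 100 days), so the estimate will not reproduce the 0.018 reported above; the function name, seed and data are all assumptions for illustration.

```python
import random

def permutation_p_value(daily_crashes, window_start, window_len,
                        n_perm=1000, seed=0):
    """Estimate P(count in window >= observed count) by shuffling the days."""
    rng = random.Random(seed)
    window = slice(window_start, window_start + window_len)
    observed = sum(daily_crashes[window])
    shuffled = list(daily_crashes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the days, keeping the total count fixed
        if sum(shuffled[window]) >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical record: one entry per day, 1 on a crash day and 0 otherwise.
crash_days = [40, 130, 210, 333, 420, 505, 610, 700, 790, 880, 960, 1050,
              1130, 1220, 1310, 1400, 1480, 1510, 1530, 1555, 1570, 1590,
              1700, 1850]
daily = [0] * 2000
for day in crash_days:
    daily[day] = 1

# 16th unit time interval of 100 days = days 1500..1599 (0-based).
p_est = permutation_p_value(daily, window_start=1500, window_len=100)
print(f"estimated p-value = {p_est:.3f}")
```

With only 1000 permutations the estimate is coarse: it can only take values that are multiples of 0.001, and very small p-values need more permutations to resolve.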

Confidence intervals

• Rather than computing a point estimate for the mean …
• … we can compute an interval for the mean
• A range of values that contains the mean with high confidence

Confidence intervals

• Consider a pipeline in a factory

• If the expected probability of a fault is (should be) 0.01, what can we conclude if we see a batch with 10% faults?

• n=100 (batch size), x=10 (number faulty)

• Let’s say: we reject the null hypothesis if p-value < delta=0.05

• p=0.01: p-value = 7.6E-8
  p=0.05: p-value = 0.028
  p=0.055: p-value = 0.05
  p=0.1: p-value = 0.55
• The set of all values of p for which the p-value >= 0.05 is the confidence interval for threshold delta=0.05 (confidence level 1 - delta = 0.95): [0.055,1]
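The lower end of this interval can be found numerically. The sketch below uses bisection, relying on the fact that P(X>=10) increases with p; the helper names and the tolerance are assumptions, and this is one possible way to invert the test rather than the method used to produce the slide.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

def lower_confidence_bound(k, n, delta, tol=1e-6):
    """Smallest p whose p-value P(X >= k) is at least delta (by bisection)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binomial_tail(k, n, mid) >= delta:
            hi = mid  # mid is not rejected, so the bound is at or below mid
        else:
            lo = mid  # mid is rejected, so the bound is above mid
    return hi

p_low = lower_confidence_bound(k=10, n=100, delta=0.05)
print(f"one-sided interval: [{p_low:.3f}, 1]")
```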

Confidence intervals

• Assume in a given unit time interval of 100 days, there are 5 crashes

• p-value threshold used: delta=0.01

• lambda = 1: p-value = 0.0037
  lambda = 1.28: p-value = 0.01
  lambda = 2: p-value = 0.053
  lambda = 4: p-value = 0.37
• Confidence interval for threshold delta=0.01 (confidence level 1 - delta = 0.99): [1.28, infinity)
• This was one-sided
• Two-sided: for all lambda values in the interval:
  P(at least 5 crashes) >= 0.005
  P(at most 4 crashes) >= 0.005
• Two-sided confidence interval for threshold delta=0.01: [1.08,12.6]
• Indeed:
  1-poisscdf(4,1.08) = 0.005
  poisscdf(4,12.6) = 0.005
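These interval ends can be checked with a short bisection sketch, assuming the same two conditions as on the slide (the upper tail 1-poisscdf(4,lambda) for the lower end, and poisscdf(4,lambda) for the upper end). The helper names, the search range of 0 to 50 and the tolerance are assumptions.

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(lam**x * exp(-lam) / factorial(x) for x in range(k + 1))

def bisect_threshold(is_accepted_above, lo, hi, tol=1e-6):
    """Locate the boundary where is_accepted_above switches from False to True."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_accepted_above(mid):
            hi = mid
        else:
            lo = mid
    return hi

def lower_end(threshold):
    # Smallest lambda with P(X >= 5) = 1 - P(X <= 4) >= threshold.
    return bisect_threshold(lambda lam: 1 - poisson_cdf(4, lam) >= threshold,
                            0.0, 50.0)

def upper_end(threshold):
    # Largest lambda with P(X <= 4) >= threshold (the cdf decreases in lambda).
    return bisect_threshold(lambda lam: poisson_cdf(4, lam) < threshold,
                            0.0, 50.0)

one_sided = lower_end(0.01)
two_sided = (lower_end(0.005), upper_end(0.005))
print(f"one-sided: [{one_sided:.2f}, infinity)")
print(f"two-sided: [{two_sided[0]:.2f}, {two_sided[1]:.2f}]")
```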

• Other (more common) interpretation of confidence intervals:
– With probability (over the sampled data) equal to the confidence level 1 - delta, the confidence interval will contain the actual value
• You can verify that this is the case… (think about it)

[For any mean outside the interval, the probability of the observed test statistic (or a more extreme one) is less than delta. Hence, the probability over the data that the interval contains the actual mean is at least 1 - delta.]

Lab session

• On the temperature time series data:
– Compute the 12-dimensional mean temperature over the year
– Compute the covariance matrix
– Visualize both in the report (using plot and imagesc)
• On the Titanic data:
– Compute the probability of having died among first class passengers (report)
– Compute the p-value for the null hypothesis that the probability of having died for third class passengers is the same (report)
– Compute the probability of survival among all male passengers (report)
– Compute the p-value for the null hypothesis that the probability of survival for female passengers is the same (report)
• On the plane crash data:
– Make a histogram of number of plane crashes per time unit, starting on 1/1/1998, and fit a Poisson to it (as in the lecture), but with unit time interval equal to 50 days (report)
– Find the unit time interval with the largest number of crashes (report)
– Compute the p-value for this time interval both analytically, using the Poisson cumulative distribution function, and using permutation testing. What can you conclude, e.g. with p-value threshold equal to 0.01? (report)
