Probability and Statistical Inference Gehlbach: Chapter 8


Probability and Statistical Inference

Gehlbach: Chapter 8

Objective of Statistical Analysis

To answer research questions using observed data, through data reduction and the analysis of variability

To make an inference about a population based on information contained in a sample from that population

To provide an associated measure of how good the inference is

Basic Concepts of Statistics

[Diagram: the basic concepts and how they connect]

Population: parameters

Sampling: method of collecting data; based on probability

Sample: sample values X

Estimation & Inference: method of testing hypotheses; based on statistics f(X)

Sampling takes us from the population to a sample; estimation and inference take us from the sample back to statements about the population.

General Approach to Statistical Analysis

[Diagram: the general approach]

Population distribution: random variables; parameters µ, σ

→ Samples of size N generate data (sampling)

→ Descriptive statistics (figures, tables); estimation: statistics (e.g., the sample mean, SD)

→ Tests of hypothesis

→ Statistical inference about the population

Outline

• Probability
  – Definition
  – Probability Laws
  – Random Variable
  – Probability Distributions

• Statistical Inference
  – Definition
  – Sample vs. Population
  – Sampling Variability
  – Sampling Problems
  – Central Limit Theorem
  – Hypothesis Testing
  – Test Statistics
  – P-value Calculation
  – Errors in Inference
  – P-value Adjustments
  – Confidence Intervals

We disagree with Stephen

• A working understanding of P-values is not difficult to come by.

• For the most part, Statistics and clinical research can work well together.

• Good collaborations result when researchers have some knowledge of design and analysis issues


Probability

Probability and the P-value

• You need to understand what a P-value means

• P-value represents a probabilistic statement

• Need to understand concept of probability distributions

• More on P-values later

Definition of Probability

• An experiment is any process by which an observation is made

• An event (E or Ei) is any outcome of an experiment

• The sample space (S) is the set of all possible outcomes of an experiment

• Probability: a measure based on the sample space S; in the simplest case it is empirically estimated by (# times event occurs) / (total # trials). E.g.: Pr(red car) = (# red cars seen) / (total # cars)

• Probability is the basis for statistical inference
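To make the empirical definition above concrete, here is a minimal Python sketch; the car colors and their weights are invented for illustration and are not from the slides:

```python
import random

# Hypothetical "experiment": observe 1000 passing cars and record each color.
# The color list and weights are made up purely for illustration.
random.seed(42)
colors = ["red", "blue", "silver", "black"]
cars_seen = random.choices(colors, weights=[0.2, 0.3, 0.3, 0.2], k=1000)

# Empirical estimate: Pr(red car) = (# red cars seen) / (total # cars)
pr_red = cars_seen.count("red") / len(cars_seen)
print(f"Estimated Pr(red car) = {pr_red:.3f}")  # should be close to the true 0.20
```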

Axiomatic Probability (laying down “the laws”)

For any sample space S containing events E1, E2, E3,…; we assign a number, P(Ei), called the probability of Ei such that:

1. 0 ≤ P(Ei) ≤ 1

2. P(S) = 1

3. If E1, E2, E3,…are pairwise mutually exclusive events in S then

P(E1 ∪ E2 ∪ E3 ∪ …) = P(E1) + P(E2) + P(E3) + …

Union and Intersection: Venn Diagrams

[Venn diagram of events E1 and E2]

Union of E1 and E2: “E1 or E2”, denoted E1 ∪ E2

Intersection of E1 and E2: “E1 and E2”, denoted E1 ∩ E2

Laws of Probability (the sequel)

• Let Ē (“E complement”) be the set of events in S not in E; then P(Ē) = 1 − P(E)

• P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2)

• The conditional probability of E1 given E2 has occurred:

P(E1 | E2) = P(E1 ∩ E2) / P(E2)

• Events E1 and E2 are independent if

P(E1 ∩ E2) = P(E1)P(E2)

Conditional Probability

• Restrict yourself to a “subspace” of the sample space

              Male   Female
Infection      20%     10%
No infection   35%     35%

● P(I|M) = P(I∩M)/P(M) = 0.2/0.55 = 0.36
● P(M|I) = P(I∩M)/P(I) = 0.2/0.3 = 0.67
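These two results can be reproduced directly from the joint table above; a minimal Python sketch (the dictionary layout is just one convenient representation of the table):

```python
# Joint probabilities from the infection-by-sex table above.
joint = {
    ("infection", "male"): 0.20,
    ("infection", "female"): 0.10,
    ("no infection", "male"): 0.35,
    ("no infection", "female"): 0.35,
}

# Marginal probabilities P(M) and P(I)
p_male = sum(p for (inf, sex), p in joint.items() if sex == "male")            # 0.55
p_infection = sum(p for (inf, sex), p in joint.items() if inf == "infection")  # 0.30

# Conditional probabilities: restrict to the relevant "subspace"
p_i_given_m = joint[("infection", "male")] / p_male       # 0.20 / 0.55 ≈ 0.36
p_m_given_i = joint[("infection", "male")] / p_infection  # 0.20 / 0.30 ≈ 0.67
print(round(p_i_given_m, 2), round(p_m_given_i, 2))
```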

Conditional probability examples

• Categorical data analysis: odds ratio = ratio of the odds of two conditional probabilities

• Survival analysis: conditional probabilities of the form P(alive at time t1+t2 | survive to t1)

Random Variables (where the math begins)

• A random variable is a (set) function with domain S and range the real numbers (i.e., a real-valued function defined over a sample space)

• E.g.: tossing a coin, let X=1 if heads, X=0 if tails
  – P(X=0) = P(X=1) = ½
  – Many times the random variable of interest will be the realized value of the experiment (e.g., if X is the b-segment PSV from RDS)

– Random variables have probability distributions

Probability Distributions

Two types:

Discrete distributions (and discrete random variables) are represented by a finite (or countable) number of values

P(X=x) = p(x)

Continuous distributions (and random variables) are represented by a real-valued interval

P(x1<X<x2) = F(x2) – F(x1)

Expected Value & Variance

• Random variables are typically described using two quantities:
  – Expected value = E(X) (the mean, usually “μ”)
  – Variance = V(X) (usually “σ2”)

• Discrete case:

E(X) = Σi xi·p(xi)
V(X) = Σi [xi − E(X)]²·p(xi)

• Continuous case:

E(X) = ∫ x·f(x) dx
V(X) = ∫ (x − μ)²·f(x) dx
(both integrals taken over −∞ < x < ∞)
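A minimal Python sketch of the discrete-case formulas above, using a small invented distribution:

```python
# Invented discrete distribution for illustration: values and their probabilities.
x = [0, 1, 2, 3]
p = [0.1, 0.4, 0.3, 0.2]

# E(X) = sum_i x_i * p(x_i)
mean = sum(xi * pi for xi, pi in zip(x, p))

# V(X) = sum_i [x_i - E(X)]^2 * p(x_i)
variance = sum((xi - mean) ** 2 * pi for xi, pi in zip(x, p))

print(mean, variance)  # 1.6 and 0.84
```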

Discrete Distribution Example

Binomial:
– Experiment consists of n identical trials
– Each trial has only 2 outcomes: success (S) or failure (F)
– P(S) = p for a single trial; P(F) = 1 − p = q
– Trials are independent
– R.V. X = the number of successes in n trials

p(x) = (n choose x) · p^x · (1 − p)^(n − x)
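A minimal Python sketch of this pmf; n and p are invented values, and scipy (assumed available) is used only as a cross-check:

```python
from math import comb
from scipy import stats

n, p = 10, 0.3  # invented example: 10 trials, success probability 0.3

# p(x) = C(n, x) * p^x * (1 - p)^(n - x)
def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

print(binom_pmf(3, n, p))        # direct formula, ≈ 0.267
print(stats.binom.pmf(3, n, p))  # scipy's implementation, should agree
```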

Continuous Distribution Example

Normal (Gaussian):

• The normal distribution is defined by its probability density function, which is given as

f(x) = (1 / (σ·√(2π))) · exp( −(x − μ)² / (2σ²) ),  −∞ < x < ∞,

for parameters μ and σ, where σ > 0.

X ~ N(μ, σ2), E(X) = μ and V(X) = σ2
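A minimal Python sketch of the density formula, with scipy.stats.norm (assumed available) as a cross-check:

```python
import math
from scipy import stats

mu, sigma = 0.0, 1.0  # invented parameters: the standard normal

# f(x) = (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2 * sigma^2))
def normal_pdf(x, mu, sigma):
    return (1.0 / (sigma * math.sqrt(2 * math.pi))) * math.exp(-((x - mu) ** 2) / (2 * sigma**2))

print(normal_pdf(1.0, mu, sigma))                # direct formula, ≈ 0.242
print(stats.norm.pdf(1.0, loc=mu, scale=sigma))  # scipy's implementation, should agree
```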

[Figure: two normal densities with the same variance but different means, X ~ N(μ1, σ2) and X ~ N(μ2, σ2)]

[Figure: two normal densities with the same mean but different variances, X ~ N(μ, σ1²) and X ~ N(μ, σ2²)]

Statistical Inference

Statistical Inference

• Is there a difference in the population?

• You do not know about the population. Just the sample you collected.

• Develop a Probability model

• Infer characteristics of a population from a sample

• How likely is it that the sample data support the null hypothesis?

Statistical Inference

[Diagram: a sample with mean 16.2 is drawn from a population whose mean is unknown; inference runs from the sample back to the population]

Definition of Inference

• Infer a conclusion/estimate about a population based on a sample from the population

• If you collect data from whole population you don’t need to infer anything

• Inference = conducting hypothesis tests (for P-values) and estimating 95% CI’s

Sample vs. Population (example)

• “The primary sample [involved] students in the 3rd through 5th grades in a community bordering a major urban center in North Carolina… The sampling frame for the study was all third through fifth-grade students attending the seven public elementary schools in the community (n=2,033). From the sampling frame, school district evaluation staff generated a random sample of 700 students.”

Source: Bowen, NK. (2006) Psychometric properties of Elementary School Success Profile for Children. Social Work Research, 30(1), p. 53.

Philosophy of Science

• Idea: We posit a paradigm and attempt to falsify that paradigm.

• Science progresses faster via attempting to falsify a paradigm than attempting to corroborate a paradigm.

(Thomas S. Kuhn. 1970. The Structure of Scientific Revolutions. University of Chicago Press.)

Philosophy of Science

• Easier to collect evidence to contradict something than to prove truth?

• The fastest way to progress in science under a paradigm of falsification is through perturbation experiments.

• In epidemiology,
  – often unable to do perturbation experiments
  – it becomes a process of accumulating evidence

• Statistical testing provides a rigorous data-driven framework for falsifying hypotheses

What is Statistical Inference?

• A generalization made about a larger group or population from the study of a sample of that population.

• Sampling variability: repeat your study (sample) over and over again. Results from each sample would be different.

Sampling Variability

[Diagram: a first sample from the population yields a sample mean of 16.2]

Sampling Variability

[Diagram: a second sample from the same population yields a different sample mean, 17.1]

Sampling Problems

• Low Response Rate

• Refusals to Participate

• Attrition

Low Response Rate

• Response rate = % of targeted sample that supply requested information

• Statistical inferences extend only to individuals who are similar to completers

• Low response rate ≠ Nonresponse bias, but is a possible symptom

Low Response Rate (examples)

• “One hundred six of the 360 questionnaires were returned, a response rate of 29%.”
Source: Nordquist, G. (2006) Patient insurance status and do-not-resuscitate orders: Survival of the richest? Journal of Sociology & Social Welfare, 33(1), p. 81.

• “At the 7th week, we sent a follow-up letter to thank the respondents and to remind the nonrespondents to complete and return their questionnaires. The follow-up letter generated 66 additional usable responses.” Source: Zhao JJ, Truell AD, Alexander MW, Hill IB. (2006) Less success than meets the eye? The impact of Master of Business Administration education on graduates’ careers. Journal of Education for Business, 81(5), p. 263.

• “The response rate, however, was below our expectation. We used 2 procedures to explore issues related to non-response bias. First, there were several identical items that we used in both the onsite and mailback surveys. We compared the responses of the non-respondents to those of respondents for [both surveys]. No significant differences between respondents and non-respondents were observed. We then conducted a follow-up telephone survey of non-respondents to test for potential non-response bias as well as to explore reasons why they had not returned their survey instruments…”Source: Kyle GT, Mowen AJ, Absher JD, Havitz ME. (2006) Commitment to public leisure service providers: A conceptual and psychometric analysis. Journal of Leisure Research, 38(1), 86-87.

Refusals to Participate

• Similar kind of problem to having low response rates

• Statistical inferences may extend only to those who agreed to participate, not to all asked to participate

• Compare those who agree to participate with those who refuse

Refusals to Participate (example)

• “Participants were 38 children aged between 7 and 9 years. Children were from working- or middle-class backgrounds, and were drawn from 2 primary schools in the north of England. Letters were sent to the parents of all children between 7 and 9 in both schools seeking consent to participate in the study. Around 40% of the parents approached agreed for their children to take part.”Source: Meins E, Fernyhough C, Johnson F, Lidstone J. (2006) Mind-mindedness in children: Individual differences in internal-state talk in middle childhood. British Journal of Developmental Psychology, 24(1), p. 184.

Attrition

• Individuals who drop out before study’s end (not an issue for every study design)

• Systematic differences between those who drop out and those who stay in produce attrition bias.

• Conduct follow-up study on dropouts
• Compare baseline data

Attrition (example)

• “…Of the 251 men who completed an assigned intervention, about a fifth (19%) failed to return for a 1-month assessment and more than half (54%) for a 3-month assessment… Conclusions also cannot be generalized beyond the sample [partly because] attrition in the evaluation study was relatively high and it was not random. Therefore, findings cannot be generalized to those least likely to complete intervention sessions or follow-up assessments.”
Source: Williams ML, Bowen AM, Timpson SC, Ross MW, Atkinson JS. (2006) HIV prevention and street-based male sex workers: An evaluation of brief interventions. AIDS Education & Prevention, 18(3), pp. 207-214.

• “The 171 participants who did not return for their two follow-up visits represent a significant attrition rate (34%). A comparison of demographic and baseline measures indicated that [those who stayed in the study versus those who did not] differed on age, BMI, when diagnosed, language, ethnicity, HbA1c, PCS, MCS and symptoms of depression (CES-D).” Source: Maljanian R, Grey N, Staff I, Conroy L. (2005) Intensive telephone follow-up to a hospital-based disease management model for patients with diabetes mellitus. Disease Management, 8(1), p. 18.

Back to Inference….

Motivation

• Typically you want to see if there are differences between groups (i.e., Treatment vs. Control)

• Approach this by looking at the “typical” or “average” difference between groups

• Thus we look at differences in central tendency to quantify group differences

• Test whether two sample means differ (assuming equal variances) in an experiment

[Figure: two normal densities with the same variance but different means, X ~ N(μ1, σ2) and X ~ N(μ2, σ2), representing the two groups]

Central Limit Theorem

• The CLT states that, regardless of the distribution of the original data, the average of the data is approximately Normally distributed for sufficiently large samples

• Why such a big deal?

• Allows for hypothesis testing (p-values) and CI’s to be estimated

Central Limit Theorem

• If a random sample is drawn from a population, a statistic (like the sample average) follows a distribution called a “sampling distribution”.

• CLT tells us the sampling distribution of the average is a Normal distribution, regardless of the distribution of the original observations, as the sample size increases.
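A minimal simulation sketch of this statement (numpy assumed available): sample means of heavily skewed exponential data behave approximately like a Normal distribution once the sample size is moderate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_reps = 50, 10_000  # invented sample size and number of repeated samples

# Original observations are exponential (highly skewed), mean 1, variance 1.
sample_means = rng.exponential(scale=1.0, size=(n_reps, n)).mean(axis=1)

# CLT: the sampling distribution of the mean is approximately N(1, 1/n).
print(sample_means.mean())       # close to 1
print(sample_means.std(ddof=1))  # close to 1/sqrt(50) ≈ 0.141
```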

[Figure: sampling distributions of the mean number of infections for the control (C) and treatment (T) groups, X ~ N(μC, σ2) and X ~ N(μT, σ2); the observed difference corresponds to P-value = 0.164]

What is the P-value?

• The P-value represents the probability of getting a test statistic as extreme as, or more extreme than, the one observed, under the null hypothesis

• That is, the P-value is the chance of obtaining your data results under the assumption that your null hypothesis is true.

• If this probability is low (say p<0.05), then you conclude your data results do not support the null being true and “reject the null hypothesis.”

Hypothesis Testing & P-value

• P-value is: Pr(observed data results | null hypothesis is true)

• If P-value is low, then conclude null hypothesis is not true and reject the null (“in data we trust”)

• How low is low?

Statistical Significance

If the P-value is as small or smaller than the pre-determined Type I error rate (size) α, we say that the data are statistically significant at level α.

What value of α is typically assumed?

Probability Distribution & P-value

[Figure: the null sampling distribution of the mean number of infections, centered at 5.0 (axis values 4.4, 4.7, 5.3, 5.6); beyond the critical limit lies the critical region, where H0 is rejected; otherwise, fail to reject H0]

2-sided P-value & Probability Distribution

[Figure: the same null distribution with critical limits in both tails; observations in either critical region lead to rejecting H0, and H0 is not rejected in between]

Why P-value < 0.05 ?

This arbitrary cutoff has evolved over time into something of a precedent.

In legal matters, courts typically require statistical significance at the 5% level.

The P-value

The P-value is a continuum of evidence against the null hypothesis.

Not just a dichotomous indicator of significance.

Would you change your standard of care surgery procedure for p=0.049999 vs. p=0.050001?

Gehlbach’s beefs with P-value

• Size of P-value does not indicate the [clinical] importance of the result

• Results may be statistically significant but practically unimportant

• Differences not statistically significant are not necessarily unimportant ***

• Any difference can become statistically significant if N is large enough

• Even if there is statistical significance is there clinical significance?

Controversy around HT and P-value

“A methodological culprit responsible for spurious theoretical conclusions”

(Meehl, 1967; see Greenwald et al, 1996)

“The p-value is a measure of the credibility of the null hypothesis. The smaller the P-value is, the less likely one feels the null hypothesis can be true.”

HT and p-value

• “It cannot be denied that many journal editors and investigators use P-value < 0.05 as a yardstick for the publishability of a result.”

• “This is unfortunate because not only P-value, but also the sample size and magnitude of a physically important difference determine the quality of an experimental finding.”

HT and p-value

• “[We] endorse the reporting of estimation statistics (such as effect sizes, variabilities, and confidence intervals) for all important hypothesis tests.”

– Greenwald et al (1996)

Test Statistics

• Each hypothesis test has an associated test statistic.

• A test statistic measures compatibility between the null hypothesis and the data.

• A test statistic is a random variable with a certain distribution.

• A test statistic is used to calculate probability (P-value) for the test of significance.

How a P-value is calculated

• A data summary statistic is estimated (like the sample mean)

• A “test” statistic is calculated which relates the data summary statistic to the null hypothesis about the population parameter (the population mean)

• The observed/calculated test statistic is compared to what is expected under the null hypothesis using the Sampling Distribution of the test statistic

• The Probability of finding the observed test statistic (or more extreme) is calculated (this is the P-value)
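A minimal Python sketch of these steps for a one-sample t-test on invented data (scipy assumed available for the t distribution):

```python
import math
from scipy import stats

# Invented sample; null hypothesis H0: population mean = 5.0
data = [5.3, 4.9, 5.6, 5.1, 4.8, 5.4, 5.7, 5.2]
mu0 = 5.0

n = len(data)
xbar = sum(data) / n                                          # data summary statistic
sd = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))  # sample SD

# Test statistic relating the summary statistic to H0
t_stat = (xbar - mu0) / (sd / math.sqrt(n))

# Two-sided P-value from the sampling distribution of the test statistic (t, n-1 df)
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
print(t_stat, p_value)  # stats.ttest_1samp(data, mu0) gives the same result
```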

Hypothesis Testing

1. Set up a null and alternative hypothesis

2. Calculate test statistic

3. Calculate the P-value for the test statistic

4. Based on P-value make a decision to reject or fail to reject the null hypothesis

5. Make your conclusion
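As one possible walk-through of these five steps, here is a minimal sketch comparing two invented groups with scipy's two-sample t-test (equal variances assumed):

```python
from scipy import stats

# Invented outcome data for two groups (e.g., control vs. treatment)
control = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.0, 5.2]
treatment = [4.7, 4.9, 4.5, 5.0, 4.6, 4.8, 4.4, 4.9]

# 1. H0: the two population means are equal; Ha: they differ (two-sided)
# 2.-3. Calculate the test statistic and its P-value
t_stat, p_value = stats.ttest_ind(control, treatment, equal_var=True)

# 4.-5. Decision and conclusion at alpha = 0.05
alpha = 0.05
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(t_stat, p_value, decision)
```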

Errors in Statistical Inference

The Four Possible Outcomes in Hypothesis Testing

                               Truth in Population
Decision based on Data         H0 true                           H0 false
Fail to reject H0              H0 is true & H0 is not rejected   H0 is false & H0 is not rejected
Reject H0                      H0 is true & H0 is rejected       H0 is false & H0 is rejected

Note similarities to diagnostic tests!

The Four Possible Outcomes in Hypothesis Testing

                               TRUTH
DECISION                       H0 true              H0 false
Fail to reject H0              Correct decision     Type II error (β)
Reject H0                      Type I error (α)     Correct decision: Power (1 − β)

Conditioned on column!

Type I Errors

α = Pr(Type I error) = Pr(reject H0 | H0 is true)

“Innocent until proven guilty”

We reject innocence (conclude guilty), but the defendant is truly innocent.

Type II Errors

β = Pr(Type II error) = Pr(do not reject H0 | H0 is false)

“Innocent until proven guilty”

We do not reject innocence (conclude innocent), but the defendant was truly guilty.

P-value adjustments

P-value adjustments

• Sometimes adjustments for multiple testing are made

• Bonferroni: adjusted α = alpha / (# of tests)

• alpha is usually 0.05 (P-value cutoff)

• Bonferroni is a common (but conservative) adjustment; many others exist
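A minimal Python sketch of the Bonferroni rule applied to five invented P-values:

```python
# Invented P-values from 5 separate tests
p_values = [0.004, 0.030, 0.012, 0.250, 0.049]
alpha = 0.05

# Bonferroni: compare each P-value to alpha / (# of tests)
adjusted_alpha = alpha / len(p_values)  # 0.05 / 5 = 0.01
significant = [p <= adjusted_alpha for p in p_values]
print(adjusted_alpha, significant)      # 0.01 [True, False, False, False, False]
```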

P-value adjustments (example)

• “An alpha of .05 was used for all statistical tests. The Bonferroni correction was used, however, to reduce the chance of committing a Type I error. Therefore, given that five statistical tests were conducted, the adjusted alpha used to reject the null hypothesis was .05/5 or alpha = .01.”

Source: Cumming-McCann A. (2005) An investigation of rehabilitation counselor characteristics, white racial attitudes, and self-reported multicultural counseling competencies. Rehabilitation Counseling Bulletin, 48(3), 170-171.

Confidence Intervals (CI’s)

Confidence Intervals

• What is the idea of a confidence interval?

Calculate a range of reasonable values (an interval) around the point estimate that should include the population value 95% of the time if you were to collect sample data over and over again.

Confidence Intervals

[Diagram: an interval drawn around the sample estimate, labeled “95% Confidence”]

In other words, if 100 different samples were drawn from the same population and 100 intervals were calculated, approximately 95 of them would contain the population mean.

Confidence Intervals

• 100*(1-α)% Confidence Interval for Mean:

X̄ ± t(1 − α/2, df = n − 1) · sd/√n

• 100*(1-α)% Confidence Interval for Proportion:

p̂ ± z(1 − α/2) · √( p̂(1 − p̂)/n )

95% Confidence Intervals

• 95% Confidence Interval for Mean:

X̄ ± 2 · sd/√n

• 95% Confidence Interval for Proportion:

p̂ ± 2 · √( p̂(1 − p̂)/n )
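A minimal Python sketch of both 95% intervals on invented data (scipy assumed available for the exact t and z quantiles; with the rough “±2” rule above, the results are nearly identical):

```python
import math
from scipy import stats

# --- 95% CI for a mean (invented sample) ---
data = [16.2, 15.8, 17.1, 16.5, 15.9, 16.8, 16.0, 17.3]
n = len(data)
xbar = sum(data) / n
sd = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
t_crit = stats.t.ppf(0.975, df=n - 1)          # roughly 2 for moderate n
ci_mean = (xbar - t_crit * sd / math.sqrt(n), xbar + t_crit * sd / math.sqrt(n))

# --- 95% CI for a proportion (invented counts) ---
successes, m = 42, 120
p_hat = successes / m
z_crit = stats.norm.ppf(0.975)                 # ≈ 1.96, i.e., roughly 2
se = math.sqrt(p_hat * (1 - p_hat) / m)
ci_prop = (p_hat - z_crit * se, p_hat + z_crit * se)

print(ci_mean)
print(ci_prop)
```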

Bayesian vs. Classical Inference

• There are 2 main camps of Statistical Inference:
  – Frequentist (classical) statistical inference
  – Bayesian statistical inference

• Bayesian inference incorporates “past knowledge” about the probability of events using “prior probabilities”

• Bayesian paradigm assumes parameters of interest follow a statistical distribution of their own; Frequentist inference assumes parameters are fixed

• Statistical inference is then performed to ascertain what the “posterior probability” of outcomes is, depending on:
  – the data
  – the assumed prior probabilities

Schedule

*10/01 seminar will meet in Wachovia 2314
