Contingency Table Analysis Mary Whiteside, Ph.D

Contingency Table Analysis

Mary Whiteside, Ph.D.

Overview

Hypotheses of equal proportions
Hypotheses of independence
Exact distributions and Fisher’s test
The Chi squared approximation
Median test
Measures of dependence
The Chi squared goodness-of-fit test
Cochran’s test

Contingency Table Examples

Countries - religion by government
States - dominant political party by geographic region
Mutual funds - style by family
Companies - industry by location of headquarters

More examples

Countries - government by GDP categories
States - divorce laws by divorce rate categories
Mutual funds - family by Morning Star rankings
Companies - industry by price earnings ratio category

Statistical Inference: hypothesis of equal proportions

H0: all probabilities (estimated by proportions, i.e. relative frequencies) in the same column are equal

H1: at least two of the probabilities in the same column are not equal

Here, for an r x c contingency table, r populations are sampled with fixed row totals, n1, n2, … nr.

Hypothesis of independence

H0: no association,

i.e. the row and column variables are independent

H1: an association,

i.e. the row and column variables are not independent

Here, one population is sampled with sample size N. Row totals are random variables.

Exact distribution for 2 x 2 tables: hypothesis of equal proportions; n1 = n2 = 2

[Slide figure: possible 2 x 2 tables when both row totals equal 2. Each row of a table is one of (2 0), (1 1), or (0 2).]

Fisher’s Exact Test

For 2 x 2 tables assuming fixed row and column totals r, N-r, c, N-c:

Test statistic = x, the frequency of cell (1,1)

Probability = hypergeometric probability of x successes in a sample of size r from a population of size N with c successes
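The hypergeometric probability above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the example counts are hypothetical, and the two-sided p-value rule used here (sum the probabilities of every table that is no more likely than the observed one) is one common convention that the slide itself does not specify.

```python
from math import comb


def hypergeom_pmf(x, N, r, c):
    """P(cell (1,1) = x) with row total r, column total c, grand total N."""
    return comb(c, x) * comb(N - c, r - x) / comb(N, r)


def fisher_exact_two_sided(table):
    """Two-sided p-value for a 2 x 2 table with all margins held fixed.

    Assumed convention: add up the probabilities of all tables that are
    no more likely than the observed table.
    """
    (a, b), (c2, d) = table
    r = a + b            # first row total
    c = a + c2           # first column total
    N = a + b + c2 + d   # grand total
    lo = max(0, r + c - N)   # smallest feasible value of cell (1,1)
    hi = min(r, c)           # largest feasible value of cell (1,1)
    p_obs = hypergeom_pmf(a, N, r, c)
    total = 0.0
    for x in range(lo, hi + 1):
        p = hypergeom_pmf(x, N, r, c)
        if p <= p_obs * (1 + 1e-9):  # small tolerance for float ties
            total += p
    return total


# Hypothetical counts, for illustration only.
example = [[3, 1], [1, 3]]
p_value = fisher_exact_two_sided(example)
print(round(p_value, 4))
```

With all margins fixed, the value of cell (1,1) determines the other three cells, which is why a single hypergeometric variable is enough.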

Large sample approximation for either test

Chi squared: $\chi^2 = \sum_{i,j} (\text{Observed}_{ij} - \text{Expected}_{ij})^2 / \text{Expected}_{ij}$

Observed frequency for cell ij comes from cross-tabulation of data

Expected frequency for cell ij = Probability Cell ij * N

Degrees of freedom (r-1)*(c-1)

Computing Cell Probabilities

Assumes independence or equal probabilities (the null hypothesis)

Probability Cell ij = Probability Row i * Probability Column j = $(R_i/N) \cdot (C_j/N)$

Expected frequency ij = $(R_i/N) \cdot (C_j/N) \cdot N = R_i C_j / N$
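As a rough sketch of the arithmetic, the expected frequencies (row total times column total over N) and the chi-squared sum can be computed directly from a table of counts. The function names and the example counts below are hypothetical and chosen only to keep the numbers easy to check by hand.

```python
def expected_frequencies(table):
    """Expected count for cell ij under the null: R_i * C_j / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared(table):
    """Return the chi-squared statistic and its degrees of freedom."""
    expected = expected_frequencies(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical 2 x 2 counts, for illustration only.
observed = [[20, 30], [30, 20]]
expected = expected_frequencies(observed)
stat, df = chi_squared(observed)
print(expected, stat, df)
```

Here every expected frequency is 50 * 50 / 100 = 25, each cell contributes (5^2)/25 = 1, and the statistic is 4 on 1 degree of freedom.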

Distribution of the Sum

Chi Square with (r-1)*(c-1) degrees of freedom

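Assumes each term $(\text{Observed}_{ij} - \text{Expected}_{ij})^2 / \text{Expected}_{ij}$ is a squared standard normal

Implies $(\text{Observed}_{ij} - \text{Expected}_{ij}) / \sqrt{\text{Expected}_{ij}}$ is standard normal

Implies mean = variance = Expected, so Observed is treated as a Poisson RV

Poisson is approximately normal if its mean (the expected frequency) is > 5, the traditional guideline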

Conover’s relaxed guideline page 201

Measures of Strength: Categorical Variables

Phi (2 x 2)
Cramer's V (r x c)
Pearson's Contingency Coefficient
Tschuprow's T
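The four measures listed above are all simple functions of the chi-squared statistic. The sketch below uses their standard textbook definitions, which the slides do not spell out; the function name and the example inputs are hypothetical.

```python
from math import sqrt


def strength_measures(chi2, n, r, c):
    """Chi-squared-based measures of association for an r x c table.

    chi2: chi-squared statistic, n: total sample size,
    r, c: number of rows and columns.
    """
    return {
        # Phi is normally reported for 2 x 2 tables only.
        "phi": sqrt(chi2 / n),
        "cramers_v": sqrt(chi2 / (n * min(r - 1, c - 1))),
        "contingency_coefficient": sqrt(chi2 / (chi2 + n)),
        "tschuprows_t": sqrt(chi2 / (n * sqrt((r - 1) * (c - 1)))),
    }


# Hypothetical inputs: chi-squared of 4.0 from a 2 x 2 table with N = 100.
measures = strength_measures(4.0, 100, 2, 2)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

For a 2 x 2 table Phi, Cramer's V, and Tschuprow's T coincide; they differ only when the table has more than two rows or columns.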

Measures of Strength: Ordinal Variables

Lambda A (rows dependent)
Lambda B (columns dependent)
Symmetric Lambda
Kendall's tau-B
Kendall's tau-C
Gamma

Steps of Statistical Analysis

Significance - Strength

1 - Test for significance of the observed association

2 - If significant, measure the strength of the association

Consider the correlation coefficient

a measure of association (linear relationship between two quantitative variables)

significant but not strong
significant and strong
not significant but “strong”
not significant and not strong

r and Prob (p-value)

r = .20, p-value < .05
r = .90, p-value < .05
r = .90, p-value > .05
r = .20, p-value > .05

Concepts

Predictive associations must be both significant and strong

In a particular application, an association may be important even if it is not predictive (i.e. not strong)

More concepts

Highly significant but weak associations result from large samples

Insignificant “strong” associations result from small samples - they may prove to be either predictive or weak with larger samples
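A small numerical sketch can make the sample-size effect concrete. It assumes a 2 x 2 table, uses the standard shortcut formula for the 2 x 2 chi-squared statistic, and gets the p-value for 1 degree of freedom from the normal tail via `math.erfc`. The counts and function names are hypothetical: the same cell proportions are used at N = 20 and at N = 1000.

```python
from math import erfc, sqrt


def chi2_2x2(table):
    """Chi-squared for a 2 x 2 table: N(ad - bc)^2 / (product of margins)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return erfc(sqrt(chi2 / 2))


def phi(table):
    """Phi coefficient: sqrt(chi-squared / N)."""
    n = sum(sum(row) for row in table)
    return sqrt(chi2_2x2(table) / n)


# Hypothetical counts with identical proportions but different sample sizes.
small = [[6, 4], [4, 6]]                              # N = 20
large = [[50 * x for x in row] for row in small]      # N = 1000

for name, table in (("small", small), ("large", large)):
    chi2 = chi2_2x2(table)
    print(name, round(phi(table), 3), round(chi2, 2), p_value_1df(chi2))
```

The strength (Phi) is the same in both tables, but the chi-squared statistic grows in proportion to N, so only the large sample reaches significance at the .05 level.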

Examples

Heart attack Outcomes by Anticoagulant Treatment

Admission Decisions by Gender

Summary

Is there an association?
– Investigate with Chi square p-value

If so, how strong is it?
– Select the appropriate measure of strength of association

Where does it occur?
– Examine cell contributions
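One way to examine cell contributions is to look at each cell's term in the chi-squared sum, (Observed - Expected)^2 / Expected, and see which cells account for most of the total. The sketch below is illustrative only; the function name and the 2 x 3 counts are hypothetical.

```python
def cell_contributions(table):
    """Per-cell terms (O - E)^2 / E of the chi-squared statistic."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    contributions = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            out_row.append((observed - expected) ** 2 / expected)
        contributions.append(out_row)
    return contributions


# Hypothetical 2 x 3 counts, for illustration only.
counts = [[10, 20, 30], [20, 20, 20]]
contrib = cell_contributions(counts)
for row in contrib:
    print([round(value, 3) for value in row])
```

In this example the middle column contributes nothing, so any association is located in the first and third columns.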
