Contingency Table Analysis
Mary Whiteside, Ph.D.
Overview
Hypotheses of equal proportions
Hypotheses of independence
Exact distributions and Fisher’s test
The Chi squared approximation
Median test
Measures of dependence
The Chi squared goodness-of-fit test
Cochran’s test
Contingency Table Examples
Countries - religion by government
States - dominant political party by geographic region
Mutual funds - style by family
Companies - industry by location of headquarters
More examples
Countries - government by GDP categories
States - divorce laws by divorce rate categories
Mutual funds - family by Morningstar rankings
Companies - industry by price-earnings ratio category
Statistical Inference: hypothesis of equal proportions
H0: all probabilities (estimated by proportions, relative frequencies) in the same column are equal
H1: at least two of the probabilities in the same column are not equal
Here, for an r x c contingency table, r populations are sampled with fixed row totals, n1, n2, … nr.
Hypothesis of independence
H0: no association
i.e. row and column variables are independent
H1: an association
i.e. row and column variables are not independent
Here, one population is sampled with sample size N. Row totals are random variables.
Exact distribution for 2 x 2 tables: hypothesis of equal proportions; n1 = n2 = 2
Possible tables shown (row 1 / row 2):
2 0 / 2 0
2 0 / 0 2
0 2 / 2 0
0 2 / 0 2
2 0 / 1 1
0 2 / 1 1
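With both row totals fixed at 2, each row can split as 2 0, 1 1, or 0 2, which gives nine possible tables in total. The sketch below enumerates them and attaches a probability to each, assuming both rows share one column-1 probability p (the null hypothesis of equal proportions). The value p = 0.5 and the function names are illustrative assumptions, not something stated on the slide.

```python
from itertools import product
from math import comb


def row_outcomes(n):
    """All (column 1, column 2) splits of a row whose total is fixed at n."""
    return [(k, n - k) for k in range(n, -1, -1)]


def enumerate_tables(n1, n2, p):
    """Return (table, probability) for every 2 x 2 table with row totals n1, n2.

    Assumes each row is an independent binomial sample with the same
    column-1 probability p, i.e. the null hypothesis holds.
    """
    results = []
    for (a, b), (c, d) in product(row_outcomes(n1), row_outcomes(n2)):
        prob_row1 = comb(n1, a) * p**a * (1 - p)**b
        prob_row2 = comb(n2, c) * p**c * (1 - p)**d
        results.append((((a, b), (c, d)), prob_row1 * prob_row2))
    return results


# p = 0.5 is a hypothetical value chosen only for illustration.
tables = enumerate_tables(2, 2, p=0.5)
for table, prob in tables:
    print(table, prob)
```

The probabilities over all nine tables sum to 1, which is a quick check that the enumeration is complete.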
Fisher’s Exact Test
For 2 x 2 tables assuming fixed row and column totals r, N-r, c, N-c:
Test statistic = x, the frequency of cell (1,1)
Probability = hypergeometric probability of x successes in a sample of size r from a population of size N with c successes
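The hypergeometric probability described above can be sketched with the standard library alone. The function names and the choice of an upper-tail p-value are assumptions made for illustration; the slide does not say which tail is used.

```python
from math import comb


def hypergeom_pmf(x, N, r, c):
    """P(cell (1,1) = x) given row total r, column total c, grand total N."""
    # x must lie in the range allowed by the fixed margins.
    if x < max(0, r + c - N) or x > min(r, c):
        return 0.0
    return comb(c, x) * comb(N - c, r - x) / comb(N, r)


def fisher_upper_tail(x, N, r, c):
    """One-sided p-value P(X >= x) under fixed margins (an assumed choice of tail)."""
    return sum(hypergeom_pmf(k, N, r, c) for k in range(x, min(r, c) + 1))


# Smallest interesting case: N = 4 with both margins equal to 2.
pmf = [hypergeom_pmf(k, 4, 2, 2) for k in range(3)]
print(pmf)
print(fisher_upper_tail(2, 4, 2, 2))
```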
Large sample approximation for either test
Chi squared = Σ [Observed - Expected]^2 / Expected, summed over all cells
Observed frequency for cell ij comes from cross-tabulation of data
Expected frequency for cell ij = Probability Cell ij * N
Degrees of freedom (r-1)*(c-1)
Computing Cell Probabilities
Assumes independence or equal probabilities (the null hypothesis)
Probability Cell ij = Probability Row i * Probability Column j = (R_i/N) * (C_j/N)
Expected frequency ij = (R_i/N) * (C_j/N) * N = R_i * C_j / N
where R_i is the total for row i and C_j is the total for column j.
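As a minimal sketch of the calculation, the code below builds the expected frequencies as row total times column total over N, then sums the squared deviations divided by the expected counts. The observed counts are hypothetical and the function names are illustrative.

```python
def expected_frequencies(observed):
    """Expected count for cell ij = (row i total) * (column j total) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared(observed):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    expected = expected_frequencies(observed)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Hypothetical 2 x 2 table of counts, for illustration only.
observed = [[20, 30], [30, 20]]
expected = expected_frequencies(observed)
statistic, df = chi_squared(observed)
print(expected, statistic, df)
```

Here every expected count is 25, each cell contributes 1, and the statistic is 4 on 1 degree of freedom.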
Distribution of the Sum
Chi Square with (r-1)*(c-1) degrees of freedom
Assumes each [Observed - Expected]^2 / Expected is standard normal squared
Implies [Observed - Expected] / Square root[Expected] is standard normal
Implies mean = variance (= Expected), and Observed is a Poisson RV
Poisson is approximately normal if its mean > 5, traditional guideline
Conover’s relaxed guideline page 201
Measures of Strength: Categorical Variables
Phi 2x2
Cramer's V for rxc
Pearson's Contingency Coefficient
Tschuprow's T
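The slide only names these four measures. The sketch below uses their usual textbook definitions in terms of the Chi squared statistic, which is an assumption about what the author intended; the input values are hypothetical.

```python
from math import sqrt


def strength_measures(chi2, n, n_rows, n_cols):
    """Chi-squared-based measures of association for an r x c table.

    chi2 is the Chi squared statistic and n is the grand total N.
    Phi is normally reported only for 2 x 2 tables.
    """
    phi = sqrt(chi2 / n)
    cramers_v = sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
    contingency_c = sqrt(chi2 / (chi2 + n))
    tschuprow_t = sqrt(chi2 / (n * sqrt((n_rows - 1) * (n_cols - 1))))
    return {
        "phi": phi,
        "cramers_v": cramers_v,
        "contingency_c": contingency_c,
        "tschuprow_t": tschuprow_t,
    }


# Hypothetical inputs: Chi squared = 4 on a 2 x 2 table with N = 100.
measures = strength_measures(chi2=4.0, n=100, n_rows=2, n_cols=2)
print(measures)
```

For a 2 x 2 table, Phi, Cramer's V and Tschuprow's T coincide, which the example reflects.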
Measures of Strength: Ordinal Variables
Lambda A .. Rows dependent
Lambda B .. Columns dependent
Symmetric Lambda
Kendall's tau-B
Kendall's tau-C
Gamma
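As one example from this list, Gamma can be sketched from counts of concordant and discordant pairs. This assumes the usual Goodman-Kruskal definition, (C - D) / (C + D), with both row and column categories listed in increasing order; the table is hypothetical.

```python
def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table with ordered rows and columns.

    C counts pairs of observations ordered the same way on both variables,
    D counts pairs ordered in opposite ways. Tied pairs are ignored.
    Raises ZeroDivisionError if there are no concordant or discordant pairs.
    """
    n_rows, n_cols = len(table), len(table[0])
    concordant = 0
    discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):  # strictly lower rows only
                for m in range(n_cols):
                    if m > j:
                        concordant += table[i][j] * table[k][m]
                    elif m < j:
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical 2 x 2 table with ordered categories.
gamma = goodman_kruskal_gamma([[10, 5], [5, 10]])
print(gamma)
```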
Steps of Statistical Analysis
Significance - Strength
1 - Test for significance of the observed association
2 - If significant, measure the strength of the association
Consider the correlation coefficient
a measure of association (linear relationship between two quantitative variables)
significant but not strong
significant and strong
not significant but “strong”
not significant and not strong
r and Prob (p-value)
r = .20 p-value < .05
r = .90 p-value < .05
r = .90 p-value > .05
r = .20 p-value > .05
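To see how the same r can land on either side of .05, a rough sketch is to compute the usual t statistic for a correlation, t = r * sqrt((n - 2) / (1 - r^2)), at different sample sizes. The sample sizes below are hypothetical, and the comparison uses standard two-sided 5% critical values of the t distribution rather than exact p-values.

```python
from math import sqrt


def correlation_t(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical sample sizes chosen for illustration.
t_strong_small = correlation_t(0.90, 4)    # "strong" r, very small sample
t_weak_large = correlation_t(0.20, 400)    # weak r, large sample

# Two-sided 5% critical values: about 4.303 for 2 df, about 1.97 for 398 df.
print(t_strong_small, t_strong_small > 4.303)
print(t_weak_large, t_weak_large > 1.97)
```

With these sample sizes, r = .90 falls short of its critical value while r = .20 exceeds its own.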
Concepts
Predictive associations must be both significant and strong
In a particular application, an association may be important even if it is not predictive (i.e. strong)
More concepts
Highly significant, weak associations result from large samples
Insignificant “strong” associations result from small samples - they may prove to be either predictive or weak with larger samples
Examples
Heart attack Outcomes by Anticoagulant Treatment
Admission Decisions by Gender
Summary
Is there an association?
- Investigate with Chi square p-value
If so, how strong is it?
- Select the appropriate measure of strength of association
Where does it occur?
- Examine cell contributions
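One way to examine cell contributions is to look at each cell's standardized deviation, (Observed - Expected) / Square root[Expected], whose square is that cell's share of the Chi squared statistic. The sketch below assumes this reading of "cell contributions" and uses hypothetical counts.

```python
from math import sqrt


def cell_residuals(observed):
    """Return (Observed - Expected) / sqrt(Expected) for every cell.

    Expected counts use row total * column total / N. The sign shows whether
    a cell holds more (+) or fewer (-) observations than expected.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(observed):
        residual_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            residual_row.append((o - e) / sqrt(e))
        residuals.append(residual_row)
    return residuals


# Hypothetical 2 x 2 table of counts, for illustration only.
residuals = cell_residuals([[30, 10], [10, 30]])
contributions = [[z * z for z in row] for row in residuals]
print(residuals)
print(contributions)
```

Cells with the largest squared values are where the observed table departs most from the null hypothesis.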