Two-Way ANOVA + Nonparametric Testing
Lecture #8, BIOE 597, Spring 2017, Penn State University
By Xiao Liu
Agenda
• Review
• Two-Way ANOVA
• Non-parametric testing
o Sign Test
o Wilcoxon Signed Rank Test
o Wilcoxon Rank Sum Test
o Kruskal-Wallis Test
ANOVA (Review)
• Basics
• Purpose of ANOVA: comparing the means of different populations
• Difference from the t-test
o t-test: compares at most two population means
o ANOVA: compares the means of two or more populations
• Why not use multiple two-sample t-tests?
o ANOVA evaluates all of the mean differences in a single hypothesis test at a single α-level and thereby keeps the risk of a Type I error under control no matter how many means are being compared.
ANOVA (Review)
• Rationale
• ANOVA partitions the total variation (sum of squares, SS) of the dataset into two separate components
o variation between groups
o variation within groups
$$(x_{ij} - \bar{x}) = (x_{ij} - \bar{x}_{i\cdot}) + (\bar{x}_{i\cdot} - \bar{x})$$
$$SST = \sum_{i=1}^{I}\sum_{j=1}^{n_i}(x_{ij}-\bar{x})^2 = SSG + SSE = \sum_{i=1}^{I} n_i(\bar{x}_{i\cdot}-\bar{x})^2 + \sum_{i=1}^{I}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_{i\cdot})^2$$
$$F = \frac{\text{between-group variation}}{\text{within-group variation}} = \frac{MSG}{MSE} = \frac{SSG/(I-1)}{SSE/(N-I)} \sim F(I-1,\ N-I)$$
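The partition of the total sum of squares and the resulting F ratio can be sketched in a few lines of Python. This is only an illustration: the function name `one_way_anova` and the three small groups are made up for the example and do not come from the lecture.

```python
def one_way_anova(groups):
    """Return (SSG, SSE, SST, F) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)          # N
    n_groups = len(groups)             # I
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variation: group means around the grand mean
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: observations around their own group mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # Total variation: observations around the grand mean
    sst = sum((x - grand_mean) ** 2 for x in all_values)

    f_stat = (ssg / (n_groups - 1)) / (sse / (n_total - n_groups))
    return ssg, sse, sst, f_stat


# Hypothetical data, chosen only so the arithmetic is easy to follow
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
ssg, sse, sst, f_stat = one_way_anova(groups)
print(ssg, sse, sst, f_stat)
```

For these made-up numbers SSG and SSE add up to SST, which is the identity the F test relies on.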
ANOVA (Review)
• ANOVA Table
Source     SS      df    MS      F      p
Between    34.74    2    17.37   6.45   0.0062
Within     59.26   22     2.69
Total      94      24
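The entries of the table are linked by simple arithmetic, which the short check below reproduces from the SS and df columns. The variable names are illustrative; the p-value is not recomputed because that needs an F-distribution routine outside the Python standard library.

```python
# Sums of squares and degrees of freedom taken from the table above
ss_between, df_between = 34.74, 2
ss_within, df_within = 59.26, 22

ms_between = ss_between / df_between   # MSG = SSG / df
ms_within = ss_within / df_within      # MSE = SSE / df
f_ratio = ms_between / ms_within       # F = MSG / MSE

ss_total = ss_between + ss_within
df_total = df_between + df_within

print(round(ms_between, 2), round(ms_within, 2), round(f_ratio, 2))
print(round(ss_total, 2), df_total)
```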
ANOVA (Review)
• Multiple Comparison
• Running multiple comparisons accumulates the Type I error (α) of the individual tests and thus results in a much larger experimentwise Type I error. For p independent tests, each at level α, it is
$$1 - (1 - \alpha)^{p}$$
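To get a feel for how quickly the experimentwise error grows, the expression can be evaluated for a few numbers of tests. The sketch assumes independent tests at α = 0.05; the function name is illustrative.

```python
def experimentwise_error(alpha, n_tests):
    """Probability of at least one Type I error in n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 3, 10):
    print(n_tests, round(experimentwise_error(0.05, n_tests), 3))
```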
ANOVA (Review)
• Multiple Comparison
• Fisher's Least Significant Difference (liberal)
$$LSD_{ij} = t_{1-\alpha/2,\ DFE}\sqrt{MSE\left(\frac{1}{n_i}+\frac{1}{n_j}\right)}$$
• Bonferroni Correction (conservative)
o Use α/p instead of α for p individual tests
• Tukey's Honest Significant Difference (in between)
$$HSD_{ij} = \frac{q_{1-\alpha,\ I,\ N-I}}{\sqrt{2}}\sqrt{MSE\left(\frac{1}{n_i}+\frac{1}{n_j}\right)}$$
ANOVA (Review)
• Contrast
• The corresponding sample contrast is
$$c = \sum_i a_i \bar{x}_{i\cdot}$$
with the restriction that $\sum_i a_i = 0$
• The standard error for the contrast is
$$SE_c = s_p\sqrt{\sum_i \frac{a_i^2}{n_i}} = \sqrt{MSE \sum_i \frac{a_i^2}{n_i}}$$
• Test statistic
$$t = \frac{c}{SE_c} \sim t(DFE)$$
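As a rough illustration of the contrast test, the sketch below computes c, SE_c and t for a contrast that compares the average of two treatments with a placebo. The group means, sample sizes and MSE are hypothetical values invented for the example, and `contrast_t` is an illustrative helper name.

```python
import math


def contrast_t(coeffs, means, sizes, mse):
    """Return (c, SE_c, t) for a contrast with coefficients summing to zero."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    c = sum(a * m for a, m in zip(coeffs, means))
    se = math.sqrt(mse * sum(a * a / n for a, n in zip(coeffs, sizes)))
    return c, se, c / se


# Hypothetical summary statistics (not from the lecture)
coeffs = [0.5, 0.5, -1.0]      # (treatment A + treatment B) / 2 - placebo
means = [12.0, 10.0, 8.0]
sizes = [8, 8, 8]
mse = 4.0

c, se, t = contrast_t(coeffs, means, sizes, mse)
print(c, round(se, 4), round(t, 4))
```

The resulting t would then be compared with a t distribution on DFE degrees of freedom.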
Two-Way ANOVA
• In one-way ANOVA, data fall into categories of one variable, e.g., treatment A, treatment B, and placebo.
• In two-way ANOVA, data fall into categories in two different ways: each observation can be placed into a cell of a two-way table.
• Example: in addition to "treatment", a second variable, "hospital", may also influence outcomes.
• Sometimes we are interested in studying both variables; sometimes the second variable is used only to reduce unexplained variance, in which case it is called a blocking variable.
Notations of Two-Way ANOVA
• Assume K categories and H blocks, with L observations $x_{ij1}, x_{ij2}, \ldots, x_{ijL}$ for each category i and each block j, so we have n = KHL observations.
o Mean for category i: $\bar{x}_{i\bullet\bullet}$
o Mean for block j: $\bar{x}_{\bullet j\bullet}$
o Mean for cell ij: $\bar{x}_{ij\bullet}$
o Overall mean: $\bar{x}$
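Two-Way ANOVA: Sum of Squares
$$SSG = HL\sum_{i=1}^{K}(\bar{x}_{i\bullet\bullet}-\bar{x})^2 \qquad SSB = KL\sum_{j=1}^{H}(\bar{x}_{\bullet j\bullet}-\bar{x})^2$$
$$SSI = L\sum_{i=1}^{K}\sum_{j=1}^{H}(\bar{x}_{ij\bullet}-\bar{x}_{i\bullet\bullet}-\bar{x}_{\bullet j\bullet}+\bar{x})^2$$
$$SSE = \sum_{i=1}^{K}\sum_{j=1}^{H}\sum_{l=1}^{L}(x_{ijl}-\bar{x}_{ij\bullet})^2$$
$$SST = \sum_{i=1}^{K}\sum_{j=1}^{H}\sum_{l=1}^{L}(x_{ijl}-\bar{x})^2$$
$$SSG + SSB + SSI + SSE = SST$$
$$DFG = K-1 \qquad DFB = H-1 \qquad DFI = (K-1)(H-1)$$
$$DFE = KH(L-1) \qquad DFT = KHL-1$$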
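The decomposition SSG + SSB + SSI + SSE = SST can be checked numerically. The sketch below assumes a balanced design stored as `data[i][j]` (the list of L observations for category i and block j); the function name and the 2 x 2 x 2 data set are hypothetical and serve only to illustrate the formulas.

```python
def two_way_sums_of_squares(data):
    """Return the sums of squares of a balanced two-way layout as a dict."""
    n_cat = len(data)          # K
    n_blk = len(data[0])       # H
    n_rep = len(data[0][0])    # L

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    cell = [[mean(data[i][j]) for j in range(n_blk)] for i in range(n_cat)]
    cat = [mean(x for j in range(n_blk) for x in data[i][j]) for i in range(n_cat)]
    blk = [mean(x for i in range(n_cat) for x in data[i][j]) for j in range(n_blk)]
    grand = mean(x for row in data for obs in row for x in obs)

    cells = [(i, j) for i in range(n_cat) for j in range(n_blk)]
    return {
        "SSG": n_blk * n_rep * sum((m - grand) ** 2 for m in cat),
        "SSB": n_cat * n_rep * sum((m - grand) ** 2 for m in blk),
        "SSI": n_rep * sum((cell[i][j] - cat[i] - blk[j] + grand) ** 2 for i, j in cells),
        "SSE": sum((x - cell[i][j]) ** 2 for i, j in cells for x in data[i][j]),
        "SST": sum((x - grand) ** 2 for i, j in cells for x in data[i][j]),
    }


# Hypothetical data: K = 2 categories, H = 2 blocks, L = 2 observations per cell
data = [[[1.0, 3.0], [5.0, 7.0]],
        [[2.0, 4.0], [10.0, 12.0]]]
ss = two_way_sums_of_squares(data)
print(ss)
```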
Two-Way ANOVA Table
Source of variation   Sums of squares   Deg. of freedom   Mean squares               F ratio
Between groups        SSG               K-1               MSG = SSG/(K-1)            MSG/MSE
Between blocks        SSB               H-1               MSB = SSB/(H-1)            MSB/MSE
Interaction           SSI               (K-1)(H-1)        MSI = SSI/[(K-1)(H-1)]     MSI/MSE
Error                 SSE               KH(L-1)           MSE = SSE/[KH(L-1)]
Total                 SST               n-1
Main Effects
• Test for group effect: compare MSG/MSE with $F_{K-1,\ KH(L-1)}$
• Test for block effect: compare MSB/MSE with $F_{H-1,\ KH(L-1)}$
Interaction
• Test for interaction: compare MSI/MSE with $F_{(K-1)(H-1),\ KH(L-1)}$
Two-Way ANOVA: Interaction
• Statistical interaction means that the effect of one explanatory variable on the response variable depends on the value of another explanatory variable.
• In other words, the simultaneous influence of two variables on a third is not additive.
• Example: weight loss can be achieved by either diet or exercise, but combining the two may produce a weight loss larger than the sum of the two individual effects.
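The weight-loss example can be made concrete with a table of cell means. The numbers below are invented purely for illustration; the point is that the combined effect differs from the sum of the separate effects, and that difference is the interaction.

```python
# Hypothetical mean weight loss (kg) for each diet/exercise combination
cell_means = {
    ("no diet", "no exercise"): 0.0,
    ("diet", "no exercise"): 2.0,
    ("no diet", "exercise"): 3.0,
    ("diet", "exercise"): 8.0,
}

baseline = cell_means[("no diet", "no exercise")]
diet_effect = cell_means[("diet", "no exercise")] - baseline
exercise_effect = cell_means[("no diet", "exercise")] - baseline

# Under a purely additive model the combined cell would equal this value
additive_prediction = baseline + diet_effect + exercise_effect
interaction_effect = cell_means[("diet", "exercise")] - additive_prediction

print(additive_prediction, interaction_effect)
```

A nonzero `interaction_effect` in the cell means is what the interaction F test checks against the within-cell noise.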
Two-Way ANOVA: Interaction
• An interaction plot displays the levels of one explanatory variable on the X axis and has a separate line for the means of each level of the other explanatory variable. The Y axis is the response variable.
[Figure: three interaction plots with no interaction. Panel 1: main effect of A: No, main effect of B: Yes. Panel 2: main effect of A: Yes, main effect of B: No. Panel 3: main effect of A: Yes, main effect of B: Yes.]
[Figure: three interaction plots with interaction. Panel 1: main effect of A: No, main effect of B: Yes. Panel 2: main effect of A: Yes, main effect of B: No. Panel 3: main effect of A: No, main effect of B: No.]
Non-parametric Testing
• Parametric Testing
o Involves population parameters, e.g., the mean
o Makes stringent assumptions, e.g., normality
o Examples: z-test, t-test, chi-squared test, F-test
Non-parametric Testing
• A nonparametric test is a hypothesis test that does not require any specific conditions concerning the shape of the populations or the value of any population parameters.
• Data can be measured on any scale (ratio, interval, ordinal).
When to Use
• If the assumptions of the parametric test are upheld, use it, on grounds of efficiency.
• If they are not upheld, consider fixing the violation (e.g., by transforming the data, as in the practical).
• If the assumptions are not fixable, use a nonparametric test.
Sign Test
• Corresponds to the one-sample t-test or the paired t-test
• Can be used for matched pairs of sample data, for nominal data with two categories, or to test a population median against a hypothesized value.
Sign Test: Steps
• State the hypotheses. Typically
o For a single population, H0: the median is equal to M0.
o For paired populations, H0: the two populations have equal medians.
• Count the number of values larger and smaller than the median in the null hypothesis and denote them as r+ and r-, respectively.
• Calculate the sample size as n = r+ + r-, ignoring values equal to the median.
• Choose r = max(r+, r-).
• Calculate the p-value using Bin(n, 0.5): the probability of observing a value of r or higher, doubled for a two-sided test.
Sign Test: Example
• Example: The table below shows the hours of relief provided by two analgesic drugs in 12 patients suffering from arthritis. Is there any evidence that one drug provides longer relief than the other?
• H0: The median of the paired differences is 0
Sign Test
• Difference between pairs
r+ = 9, r- = 3, n = 12
p-value = 2*(1 - binocdf(8, 12, 0.5)) = 0.146
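The same two-sided p-value can be reproduced without a statistics toolbox by summing the Bin(n, 0.5) upper tail directly. This is a minimal sketch using only the Python standard library; the function name is illustrative.

```python
from math import comb


def sign_test_p_value(r_plus, r_minus):
    """Two-sided sign test p-value from the counts of positive and negative signs."""
    n = r_plus + r_minus                 # ties with the median are already excluded
    r = max(r_plus, r_minus)
    # P(X >= r) for X ~ Bin(n, 0.5)
    upper_tail = sum(comb(n, k) for k in range(r, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)      # double for a two-sided test


p_value = sign_test_p_value(9, 3)
print(round(p_value, 3))
```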
Wilcoxon Signed-Rank Test
• Corresponding to one-sample t test and paired t test
• More powerful than sign test
• In addition to sign, the rank of observations is taken into consideration
Wilcoxon Signed-Rank Test: Steps
• Make null and alternative hypotheses. Typically
o H0: the differences between the pairs follow a symmetric distribution around zero
o H1: the differences between the pairs do not follow a symmetric distribution around zero
Wilcoxon Signed-Rank Test: Example
Wilcoxon Rank Sum Test
• Corresponding to two-sample t test
• Equivalent to Mann–Whitney U test
• Nearly as efficient as the t-test on normal distributions
Wilcoxon Rank Sum Test: Steps
• Assume we have n1 observations in group 1 and n2 observations in group 2 (N = n1 + n2).
• Make null and alternative hypotheses. Typically
o H0: the two groups have the same distribution
o H1: one distribution has values that are systematically larger
1. Combine the n1 + n2 observations into one group and rank the observations from smallest to largest.
2. Find the observed rank sum, W, of group 1.
3. Under the null hypothesis, W has mean and standard deviation
$$\mu_W = \frac{n_1(N+1)}{2}, \qquad \sigma_W = \sqrt{\frac{n_1 n_2 (N+1)}{12}}$$
Wilcoxon Rank Sum Test: Steps
4. For small samples, the p-value can be obtained by
1) using a permutation test;
2) using software: different software packages may give different but close results, so you must learn what your software offers; or
3) comparing the W statistic to critical values from the corresponding table.
5. For large samples (N >= 10), the normal approximation can be used instead, where $\mu_W$ and $\sigma_W$ are the mean and standard deviation of W under the null hypothesis (step 3):
$$z = \frac{W - \mu_W}{\sigma_W} \sim N(0, 1)$$
Wilcoxon Rank Sum Test: Example
• Example: A production planner wants to see if the operating rates for 2 factories are the same. For factory 1, the rates (% of capacity) are 71, 82, 77, 92, 88. For factory 2, the rates are 85, 82, 94, and 97.
• Question: Do the factory rates have the same probability distribution at the significance level of 0.05?
Wilcoxon Rank Sum Test: Example
Factory 1            Factory 2
Rate     Rank        Rate     Rank
71       1           85       5
82       3.5         82       3.5
77       2           94       8
92       7           97       9
88       6
Rank sum 19.5        Rank sum 25.5
The two tied values of 82 occupy ranks 3 and 4, so each receives the average rank 3.5.
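The ranking with averaged ties can be reproduced programmatically. The sketch below uses the factory rates from the example; `average_ranks` is an illustrative helper written for this note, not a library function.

```python
def average_ranks(values):
    """Rank values from smallest to largest, giving tied values their average rank."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; assign the average of those ranks
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]


factory1 = [71, 82, 77, 92, 88]
factory2 = [85, 82, 94, 97]

ranks = average_ranks(factory1 + factory2)
w1 = sum(ranks[:len(factory1)])   # rank sum of factory 1
w2 = sum(ranks[len(factory1):])   # rank sum of factory 2
print(w1, w2)
```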
Wilcoxon Rank Sum Test: Example
• To be able to use the critical-value table, we calculate the rank sum of group 2.
• W2 = 25.5 is between the critical values 12 and 28; we therefore do not have sufficient evidence to reject H0.
Wilcoxon Rank Sum Test: Example
• Alternatively, we can calculate the p-value with the normal approximation, using either W1 or W2
For W1 = 19.5
$$Z_1 = \frac{19.5 - 5(9+1)/2}{\sqrt{4 \cdot 5 \cdot (9+1)/12}} \approx -1.35$$
For W2 = 25.5
$$Z_2 = \frac{25.5 - 4(9+1)/2}{\sqrt{4 \cdot 5 \cdot (9+1)/12}} \approx 1.35$$
Both z scores correspond to a p-value of 0.178 for a two-tailed test.
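The normal approximation for this example can be verified with the standard library alone. The sketch applies the mean and standard deviation of the rank sum as used in the calculation above, without a tie or continuity correction; the function names are illustrative.

```python
import math


def rank_sum_z(w, n_group, n_other):
    """z score of a rank sum W for the group with n_group observations."""
    n_total = n_group + n_other
    mean_w = n_group * (n_total + 1) / 2
    sd_w = math.sqrt(n_group * n_other * (n_total + 1) / 12)
    return (w - mean_w) / sd_w


def two_sided_p(z):
    """Two-sided p-value under N(0, 1), i.e. 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


z1 = rank_sum_z(19.5, 5, 4)   # factory 1
z2 = rank_sum_z(25.5, 4, 5)   # factory 2
p_value = two_sided_p(z1)
print(round(z1, 3), round(z2, 3), round(p_value, 3))
```

The two z scores are equal in magnitude and opposite in sign, so either rank sum leads to the same two-sided p-value.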
Kruskal-Wallis Test
• Corresponds to one-way ANOVA
• Relaxes the normality assumption of ANOVA
• Only assumes that the response has a continuous distribution in each population
• Typical hypotheses
o H0: All groups have the same distribution
o H1: Responses in some groups are systematically higher than in others.
Kruskal-Wallis Test: Steps
• Assume we have n1, n2, …, nI observations in I populations (N = n1 + n2 + … + nI)
1. Combine the N observations into one group and rank the observations from smallest to largest.
2. Calculate the sum of ranks for the ith sample as Ri.
3. Calculate the Kruskal-Wallis statistic
$$H = \frac{12}{N(N+1)}\sum_{i=1}^{I}\frac{R_i^2}{n_i} - 3(N+1)$$
4. When the sample sizes ni are large (>= 5) and all I populations have the same continuous distribution, H approximately follows a chi-squared distribution with I - 1 degrees of freedom:
$$H \sim \chi^2(I-1)$$
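The steps above can be sketched in Python using the standard form of the Kruskal-Wallis statistic, H = 12/(N(N+1)) * sum(Ri^2/ni) - 3(N+1). The sketch assumes there are no tied observations, and the three small groups are hypothetical values invented for illustration (they are smaller than the ni >= 5 guideline and only demonstrate the arithmetic).

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H for a list of groups, assuming no tied observations."""
    pooled = sorted(x for g in groups for x in g)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch assumes there are no ties")
    n_total = len(pooled)
    rank_of = {value: position + 1 for position, value in enumerate(pooled)}

    # Sum of ranks R_i for each group, then the H statistic
    rank_sums = [sum(rank_of[x] for x in g) for g in groups]
    weighted = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    return rank_sums, h


# Hypothetical data (not from the lecture)
groups = [[2.1, 3.4, 1.9], [4.8, 5.5, 4.1], [7.2, 6.9, 8.0]]
rank_sums, h = kruskal_wallis_h(groups)
print(rank_sums, round(h, 2))
```

The statistic would then be compared with a chi-squared distribution on I - 1 degrees of freedom, here 2.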