
Non-Parametric Statistics

Danstan Bagenda PhD

MUSPH 2009

Wednesday, March 4, 2009

Parametric Methods

Majority of Statistical Methods based on:

some statistical model

some assumption about sampling distribution (eg. underlying normal)

Theoretical distributions are described by quantities called PARAMETERS

notably mean & SD (Normal Distribution)

methods using such distributional assumptions called PARAMETRIC METHODS


Distributions

Mathematical description of how the values are DISTRIBUTED

These graphs show probability

[Figure: histogram; x-axis "Bin" (3 to 15, then "More"), y-axis "Frequency" (0 to 10)]

Binomial Distribution

2 categories of outcome (yes/no, died/survived)

Fixed number of trials

Same probability of “success” for each trial


Poisson Distribution

Given a rate at which events occur (at random), how many events occur in a set time?

e.g. given an average rate of 5 cardiac arrests a week, what's the probability of more than 2 in one day?
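The cardiac arrest question can be worked through numerically. The sketch below is only an illustration: it assumes arrests are spread evenly over the week, so the daily rate is 5/7, and the function names are made up for this example.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_prob_more_than(k: int, lam: float) -> float:
    """P(X > k) = 1 - [P(X = 0) + ... + P(X = k)]."""
    return 1.0 - sum(poisson_pmf(j, lam) for j in range(k + 1))

# Assumption: 5 arrests a week, spread evenly over 7 days.
daily_rate = 5 / 7
p_more_than_2 = poisson_prob_more_than(2, daily_rate)
print(f"P(more than 2 arrests in one day) = {p_more_than_2:.4f}")
```

Under that assumption the probability comes out at roughly 0.036, i.e. more than 2 arrests in a day would be expected on fewer than 4 days in 100.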


Normal Distribution

Not every variable follows the normal distribution, but many do.

A symmetric distribution

Often skewed data can be transformed using log to produce a normal distribution

µ ± 1σ: 68% of the data

µ ± 2σ: 95% of the data

µ ± 3σ: 99.7% of the data


Measurement Scales

Nominal: Count the data in a category.

Ordinal: Rank the data along some characteristic.

Interval: When the elements can be differentiated and ordered and the arithmetic difference between elements is meaningful.

Ratio: When there is an interval scale with a fixed origin and the basic scale indicates proportionality.

These scales are cumulative: each scale has the characteristics of the former scale plus another property.


Exploratory Data Analysis

Stem and Leaf

Given a set of numbers, the use of the Stem and Leaf will help to see the data and help to explain what is happening in the data.

The Stem is the column of data that is the first significant digit. The Leaf is the row of numbers that represent the other information from the data. The structure can be established in various ways by modifying the stem definition.


Box and Whiskers

There are five numbers that can be used to describe a set of data: the extremes (largest and smallest), the point at which half the values are above and half are below (the median), and the points halfway between the extremes and the median (the quartiles, sometimes called the "hinges").

Rank of the median = (1 + count)/2. When the count is even, as for 6 numbers, the median falls on a "half" rank. In this case, the median rank is 3.5, so the median is the average of the observations ranked 3 and 4.

Rank of the hinge = (1 + rank of the median)/2. This may also result in a fraction. In the case of 6 numbers, the hinge rank is (1 + 3.5)/2 = 2.25. This will be the second number plus 1/4 of the way to the third.
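The rank arithmetic above can be sketched in a few lines. This is an illustration only: the data are hypothetical, the helper names are invented, and the upper hinge is assumed to sit at the mirror-image rank (count + 1 - lower hinge rank), which the slide does not state.

```python
def value_at_rank(sorted_values, rank):
    """Value at a 1-based, possibly fractional rank, by linear interpolation."""
    lower = int(rank)              # whole part of the rank
    fraction = rank - lower        # how far to move toward the next value
    value = sorted_values[lower - 1]
    if fraction > 0:
        value += fraction * (sorted_values[lower] - value)
    return value

def five_number_summary(values):
    data = sorted(values)
    n = len(data)
    median_rank = (1 + n) / 2
    hinge_rank = (1 + median_rank) / 2
    upper_hinge_rank = n + 1 - hinge_rank   # assumption: mirror of the lower hinge
    return (
        data[0],
        value_at_rank(data, hinge_rank),
        value_at_rank(data, median_rank),
        value_at_rank(data, upper_hinge_rank),
        data[-1],
    )

example = [10, 20, 30, 40, 50, 60]   # hypothetical data, count = 6
summary = five_number_summary(example)
print(summary)
```

For six values the median rank is 3.5 and the hinge rank is 2.25, so the lower hinge lands a quarter of the way from the second value to the third.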


Tests for Normality

Skewness is a measure of asymmetry of the distribution of numbers. To calculate skewness, the formula is nothing more than the mean of the cubed Z scores. If the calculated value is 0 it means that the distribution of the set of numbers is symmetrical. A normal distribution is symmetrical.

Kurtosis is a measure of the presence of extreme values in the distribution. If the distribution is relatively peaked in the middle, and has relatively high tails, kurtosis will be large. The formula for calculating the degree of kurtosis is nothing more than the mean of the Z scores taken to the fourth power minus 3.

Note: These tests are very sensitive. Often a single number or a few numbers in a large distribution will result in a statistically significant skewness or kurtosis statistic. Inspecting a frequency distribution will reveal the outliers.
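The two formulas can be written out directly. The sketch below is illustrative: the data sets are made up, and it assumes Z scores are computed with the population SD (dividing by n), since the slide does not say which SD is meant.

```python
import math

def z_scores(values):
    n = len(values)
    mean = sum(values) / n
    # Assumption: population SD (divide by n, not n - 1).
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return [(x - mean) / sd for x in values]

def skewness(values):
    """Mean of the cubed Z scores."""
    z = z_scores(values)
    return sum(v ** 3 for v in z) / len(z)

def excess_kurtosis(values):
    """Mean of the Z scores to the fourth power, minus 3."""
    z = z_scores(values)
    return sum(v ** 4 for v in z) / len(z) - 3

symmetric = [1, 2, 3, 4, 5]          # hypothetical, symmetric about 3
right_skewed = [1, 1, 2, 2, 3, 15]   # hypothetical, one large value

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

The symmetric set gives a skewness of 0, while the single large value in the second set pulls the skewness well above 0, which is the sensitivity the note describes.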


Non Parametric

Previously, another class discussed the three types of t-tests, which represent the most common applications of hypothesis testing:

1) One Sample t-test - uses the mean of a single sample to determine if the projected population mean is statistically different from a specified value;

2) Two Sample t-test - uses the means of two samples to determine if the two projected populations are statistically different from each other;

3) Paired t-test - uses the differences of two paired samples to determine if the paired samples (e.g., pre-post treatment) are statistically different.


Non Parametric

All of these tests required the following to be true:

1. the individual samples were normal;
2. the individual samples came from populations with equal variance;
3. the individual samples were independent;
4. we preferred that the individual samples were of a size greater than 30.

In some cases, our data will violate these assumptions. Specifically, we may find ourselves with small samples that are not normal and contain extreme observations. If the samples are still independent, with approximately equal variance, we use non-parametric tests in lieu of t-tests.


Non-Parametric Methods

Do not make distributional assumptions

E.g. used for a skewed variable that remains skewed in spite of (say log) transformation => non-normal

Also called DISTRIBUTION-FREE or RANK METHODS because they are based on analysis of ranks rather than the actual data


Parametric & Non-Parametric tests

Parametric - based on assumptions about the distribution the data is from (usually assume the normal)

Non-parametric - make weaker assumptions, so are applicable in a wider set of circumstances.


Parametric methods

Have greater power IF assumptions are met

Extract more information from the data

Applicable if:

data seems to be normal - look for symmetric data

sample size is big enough


Non-Parametric methods

Usually less powerful

Applicable if:

data is ordinal rather than numeric

sample sizes are small

data is skewed, or otherwise non-normal


Non-Parametric methods

mostly based on ranks - so removes disproportionate effect of huge observations


Parametric statistical tests often involve the assumption that the underlying population or populations have certain characteristics, typically related to the normal distribution.

Non-parametric statistical tests are either not concerned with the population parameters or do not depend on the functional form of the population from which the sample has been drawn.


1 Sample Sign Test

Non-parametric equivalent of the 1 SAMPLE T-TEST

ASSUMPTIONS:

Used when data are non-normally distributed, even after log transforming. This test makes no assumption about the shape of the population distribution, so it can handle data that are skewed


1 Sample Sign Test

THE POINT:

The SIGN TEST simply computes a significance test of a hypothesized median value for a single data set.

It asks whether some observation is likely to belong to a data set you have defined (i.e., whether the observation of interest is a typical value of that data set).

Formally, you are either testing whether the observation is a good ESTIMATE OF THE MEDIAN, or whether it falls within confidence limits set around that median


1 Sample Sign Test

THE WORKINGS:

Under the hypothesis that the population median (M) is equal to some hypothesized value (M0, so H0: M = M0), you would expect half of the n observations to be greater than M0. Let S be the number of observations greater than M0. If S > 0.5n, the data suggest M > M0; if S < 0.5n, they suggest M < M0.

The SIGN TEST simply computes whether there is a significant deviation from this expectation, & gives you a p-value based on a binomial distribution
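A minimal sketch of the sign test follows. It is an illustration under stated assumptions: the data are hypothetical, observations equal to the hypothesized median are dropped (a common convention the slide does not mention), and the two-sided p-value is taken as twice the smaller binomial tail.

```python
from math import comb

def sign_test(values, hypothesized_median):
    """Return (S, n, two-sided p) for a one-sample sign test."""
    above = sum(1 for x in values if x > hypothesized_median)
    below = sum(1 for x in values if x < hypothesized_median)
    n = above + below   # assumption: ties with the hypothesized median are dropped

    def binom_cdf(k):
        # P(X <= k) for X ~ Binomial(n, 0.5)
        return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

    lower_tail = binom_cdf(above)           # P(X <= S)
    upper_tail = 1 - binom_cdf(above - 1)   # P(X >= S)
    p_value = min(1.0, 2 * min(lower_tail, upper_tail))
    return above, n, p_value

# Hypothetical data: is a median of 4 plausible?
data = [5, 6, 7, 8, 9, 10, 11, 12, 1, 2]
S, n, p = sign_test(data, 4)
print(S, n, p)
```

Here 8 of the 10 observations lie above 4, and the two-sided binomial p-value is about 0.11, so this small sample gives no strong evidence against a median of 4.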


Wilcoxon signed rank

Another non-parametric test equivalent of a 1-sample t-test

Sign test considers ONLY whether each observation is > or < the chosen value of interest

It is preferable to take some account of MAGNITUDE of observations


Tests of Location: One Sample

One Sample (Ordinary) Sign Test: Ordinal

One of the oldest nonparametric procedures, the Sign Test, has been traced from the 1700s. The data are converted to a series of plus and minus signs by subtracting the median or measure of interest from each observation. The test evaluates the number of plus signs and minus signs.

Assumptions:

1. The variable of interest is measured on at least an ordinal scale.
2. The variable of interest is continuous. The n sample measurements are designated by X1, X2, …, Xn.
3. The sample is a random sample of independent measurements from a population with a median M that is hypothesized to be the median Mo.


Wilcoxon signed rank

Assumes sample is randomly taken from a popn., with a symmetric freq distribution

the symmetry assumption doesn't assume normality - simply that values are distributed in roughly the same way above & below the median

WSTAT is computed by ranking the absolute deviations of each observation from the hypothesized median and summing the ranks of the observations above the hypothesized value


Wilcoxon signed rank

Calculate the difference between each observation & the value of interest

IGNORING the signs of the differences, RANK them in order of magnitude

Calculate the SUM of the RANKS of the NEGATIVE differences and of the POSITIVE differences; take the smaller sum & compare to WILCOXON table (small samples, n<25) or NORMAL distribution (if larger n)


Wilcoxon signed rank

When we have more than one gp of observations: distinguish paired vs independent

PAIRED - SAME individuals studied more than once (usually in different circumstances)

OR 2 different gps of subjects INDIVIDUALLY MATCHED, e.g. MATCHED PAIR CASE-CONTROL study or Cross-Over Trial


Wilcoxon signed rank

Paired data - normally interested in the AVERAGE DIFFERENCE for each individual & the variability of these DIFFERENCES (i.e., within-subject variation). Method removes between-subject variation

Since we look at DIFFERENCES, it effectively becomes a 1-sample problem

i.e., we can use the 1-sample t-test, sign test or Wilcoxon signed rank test as above


Use non-parametric versions if DIFFERENCES are NOT normally distributed

eg Wilcoxon signed Rank test as above


Wilcoxon signed rank

Matched case control study of breast cancer & the Oral contraceptive pill (OC).

10 women with breast cancer were matched with 10 controls on age, sex & socioeconomic status (SES)

Total duration of time they used OC was noted


Eg

Pair           1     2     3     4     5     6     7     8     9     10
Case           2.0   10    7.1   2.3   3.0   4.1   10    11    12.1  15
Control        1.5   9.1   8.1   1.5   3.1   5.2   1     9.6   7.6   9
Diff           0.5   0.9   -1    0.8   -0.1  -1.1  9     0.9   4.5   6
Ignore signs   0.5   0.9   1     0.8   0.1   1.1   9     0.9   4.5   6
Ranks          2     4.5   6     3     1     7     10    4.5   8     9
Signed ranks   2     4.5   -6    3     -1    -7    10    4.5   8     9

T = abs(-6 - 1 - 7) = 14. Look at Wilcoxon Tables

Note: for pair 8 the listed difference is 0.9, although 11 - 9.6 = 1.4; the ranks and T shown follow the listed difference.
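The ranking steps in this example can be reproduced with a short sketch. It is illustrative only: it starts from the differences exactly as listed on the slide, drops zero differences (a common convention, not needed here), and the helper name is invented.

```python
def average_ranks(values):
    """Rank from 1 upward; tied values share the average of their ranks."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

# Differences (case - control) as listed on the slide.
diffs = [0.5, 0.9, -1, 0.8, -0.1, -1.1, 9, 0.9, 4.5, 6]

nonzero = [d for d in diffs if d != 0]   # assumption: zero differences are dropped
ranks = average_ranks([abs(d) for d in nonzero])

rank_sum_negative = sum(r for d, r in zip(nonzero, ranks) if d < 0)
rank_sum_positive = sum(r for d, r in zip(nonzero, ranks) if d > 0)
T = min(rank_sum_negative, rank_sum_positive)

print("ranks:", ranks)
print("negative sum:", rank_sum_negative, "positive sum:", rank_sum_positive)
print("T =", T)
```

The two rank sums always add up to n(n+1)/2 (55 for n = 10), which is a quick check on the ranking before going to the tables.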


Tests of Location: One Sample

Wilcoxon Signed Rank Test: Interval

The Wilcoxon Signed Rank Test is used to determine if a specific point or hypothesized median could be the population median. In addition to the sign of the differences, this procedure uses the magnitude of the difference of each observation from a hypothesized median.

Assumptions:

1. The variable is continuous.
2. The underlying or population distribution is symmetrical.
3. The scale of measurement is interval so that the observations can be placed in rank order.
4. The sample is a random sample of independent measures from a population with an unknown median M that is compared with a hypothesized median Mo.


Tests of Location: Two Independent Samples

Mann-Whitney U: Ordinal

The Mann-Whitney Test is a nonparametric test that is analogous to the two-sample parametric t-test. The ability to consider the magnitude of the data as well as their direction provides a more powerful statistical test. If both samples have the same median, they should have approximately the same average rank.

Assumptions:

1. The data consist of a random sample of observations from population 1 with unknown median Mx and another random sample of observations from population 2 with unknown median My.
2. The two samples are independent.
3. The variable observed is a continuous random variable.
4. The measurement scale employed is at least ordinal.
5. The distribution functions of the two populations do not differ except possibly in location. This means they have homogeneous variance.


Mann-Whitney Test

Randomised trial comparing aspirin with placebo for headache, 8 patients on aspirin & 10 on placebo

Patients rated their improvement on a 10 cm line (visual analogue scale, VAS), where 0 => no improvement & 10 => very much better


Eg

Aspirin (8)    7.5  8.3  9.1  6.2  5.4  8.3  6.5  8.4
Placebo (10)   3.1  5.6  4.5  6.2  5.1  5.3  5.5  4.1  4.3  4.2

A ninth value, 10.5, also appears in the aspirin row but is not among the 18 ranked observations.

All 18 observations ranked together:

Obs    3.1  4.1  4.2  4.3  4.5  5.1  5.3  5.4  5.5  5.6
Rank   1    2    3    4    5    6    7    8    9    10

Obs    6.2   6.2   6.5  7.5  8.3   8.3   8.4  9.1
Rank   11.5  11.5  13   14   15.5  15.5  17   18

R2 (placebo) = 1+2+3+4+5+6+7+9+10+11.5 = 58.5
R1 (aspirin) = 8+11.5+13+14+15.5+15.5+17+18 = 112.5


Mann-Whitney U Eg

Calculate the rank sum of each gp:

R2 = 1+2+3+4+5+6+7+9+10+11.5 = 58.5
R1 = 8+11.5+13+14+15.5+15.5+17+18 = 112.5

Calculate U for each gp:

U2 = n1*n2 + n2(n2+1)/2 - R2 = 8*10 + [10*(10+1)/2] - 58.5 = 76.5
U1 = n1*n2 + n1(n1+1)/2 - R1 = 8*10 + [8*(8+1)/2] - 112.5 = 3.5

Reject Ho if the smaller of U1 & U2 is < the critical U in Mann-Whitney tables. In this case U1 = 3.5 is the smaller & is < critical U = 17 at alpha = 0.05, so conclude p < 0.05 (see table)
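The rank sums and U values for this example can be checked with a short sketch. It is an illustration only: it uses the 18 observations that appear in the ranking, the helper name is invented, and the critical value still has to come from Mann-Whitney tables.

```python
def average_ranks(values):
    """Rank from 1 upward; tied values share the average of their ranks."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

aspirin = [7.5, 8.3, 9.1, 6.2, 5.4, 8.3, 6.5, 8.4]             # group 1
placebo = [3.1, 5.6, 4.5, 6.2, 5.1, 5.3, 5.5, 4.1, 4.3, 4.2]   # group 2
n1, n2 = len(aspirin), len(placebo)

# Rank all observations together, then split the ranks back by group.
ranks = average_ranks(aspirin + placebo)
R1 = sum(ranks[:n1])
R2 = sum(ranks[n1:])

U1 = n1 * n2 + n1 * (n1 + 1) / 2 - R1
U2 = n1 * n2 + n2 * (n2 + 1) / 2 - R2

print("R1 =", R1, "R2 =", R2)
print("U1 =", U1, "U2 =", U2, "smaller U =", min(U1, U2))
```

U1 and U2 always sum to n1*n2 (80 here), which gives a quick arithmetic check before comparing the smaller U with the tabulated critical value.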


U1 = 3.5 < critical value of 17 for n1=8, n2=10 at alpha=0.05 => p<0.05 & reject Ho


Alternatively - 2-sample Non Parametric

If we have two small samples which follow a non-normal distribution and contain extreme observations, we will use the Wilcoxon Rank Sum Test in lieu of the simple two-sample t-test.

The Wilcoxon Rank Sum test follows these steps:

1. Assign all values across the two samples ascending rankings (the lowest ranking will be 1 and the highest ranking will be n1 + n2).
2. In the case of ties, give the tied values the average ranking (e.g., if two values would both generate a "4", give each value a 4.5, and the next value a 6).
3. Let T denote the summation of the rankings for population 1.


Alternative - Non Parametric

When using the Wilcoxon Rank Sum test, the null hypothesis is that the two distributions are the same, and the claim is that they are different: population 1 is shifted to the left or to the right of population 2 (one-tailed), or a shift exists in either direction (two-tailed).

The test statistic is T, which is the summation of the ranks of population 1.

The critical value can be found in Wilcoxon Rank Sum test tables


Alternative Non Parametric

Decision rule:

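1. Reject H0 if T > Tu (one-tailed: population 1 shifted to the right)

2. Reject H0 if T < TL (one-tailed: population 1 shifted to the left)

3. Reject H0 if T > Tu or if T < TL (two-tailed)

Note that when the assumptions of the t-test are met, the power of a Wilcoxon Rank Sum test is lower than the power of a t-test.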


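Kruskal-Wallis Test

Non-parametric version of the ANOVA

Rank the observations, ignoring which gp they come from

Add the ranks of observations in each gp, combine these totals into H & look at Kruskal-Wallis table

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} n_i \left(\bar{R}_i - \bar{R}\right)^2$$

where n_i is the size of gp i, \bar{R}_i is the mean rank in gp i, \bar{R} = (N+1)/2 is the mean of all N ranks, and k is the number of gps. H approximately follows a Chi-squared dist with k-1 df

H0: gps from popn with same median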
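The H statistic can be sketched as below. This is an illustration, not a full implementation: the three groups are hypothetical, the helper names are invented, and no correction for ties is applied.

```python
def average_ranks(values):
    """Rank from 1 upward; tied values share the average of their ranks."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum of n_i * (mean rank of gp i - overall mean rank)^2."""
    pooled = [x for group in groups for x in group]
    N = len(pooled)
    ranks = average_ranks(pooled)        # rank ignoring group membership
    overall_mean_rank = (N + 1) / 2
    total = 0.0
    start = 0
    for group in groups:
        n_i = len(group)
        mean_rank = sum(ranks[start:start + n_i]) / n_i
        total += n_i * (mean_rank - overall_mean_rank) ** 2
        start += n_i
    return 12 * total / (N * (N + 1))

# Hypothetical data: three groups with no overlap at all.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
H = kruskal_wallis_h(groups)
df = len(groups) - 1
print("H =", H, "df =", df)
```

For these completely separated groups the mean ranks are 2, 5 and 8, giving H = 7.2 on 2 df, which would then be compared with Kruskal-Wallis tables or the Chi-squared distribution.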


Tests of Location: Three or More Independent Samples

Kruskal-Wallis One-way Analysis of Variance by Ranks

This test extends the Mann-Whitney U from two groups to more than two groups, much as the ANOVA extends the t-test. It does require a continuous scale where the Median Test only requires a dichotomy.

Assumptions:

1. The data for analyses consist of k random samples of sizes n1, n2, n3, …, nk.
2. The observations are independent both within and among samples.
3. The variable of interest is continuous.
4. The measurement scale is at least ordinal.
5. The populations are identical except for a possible difference in location for at least one population.


χ² test parametric

Used to find a relationship between two categorical variables, using a contingency table

eg:

          Nausea
          present   absent
drug 1    44        60
drug 2    20        51


χ² test parametric

H0: No relationship - expected number in each cell given by (overall row proportion * column total)

Calculate expected values for each cell under this (observed count, with expected count in parentheses):

          Nausea
          present       absent
drug 1    44 (38.03)    60 (65.97)
drug 2    20 (25.97)    51 (45.03)


χ² test parametric

Test Statistic:

$$\chi^2 = \sum_{\text{all cells of table}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

Compare to a Chi-squared distribution with (rows-1)*(cols-1) degrees of freedom

Yates continuity correction for a 2x2 table:

$$\chi^2 = \sum_{\text{all cells of table}} \frac{\left(|\text{Observed} - \text{Expected}| - \tfrac{1}{2}\right)^2}{\text{Expected}}$$

If any of the expected cell counts is <5, use FISHERS EXACT TEST instead
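The expected counts and both versions of the statistic can be sketched for the drug and nausea table (44, 60 / 20, 51). This is an illustration with invented function names: expected counts are computed as row total × column total / N, and with the Yates option a corrected difference is not allowed to go below zero, which is an assumption about edge cases rather than something the slide states.

```python
observed = [[44, 60],   # drug 1: nausea present, absent
            [20, 51]]   # drug 2: nausea present, absent

def expected_counts(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_squared(table, yates=False):
    """Sum over all cells of (|O - E| [- 0.5])^2 / E."""
    stat = 0.0
    for obs_row, exp_row in zip(table, expected_counts(table)):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e)
            if yates:
                diff = max(0.0, diff - 0.5)   # assumption: never correct past zero
            stat += diff ** 2 / e
    return stat

expected = expected_counts(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
chi2 = chi_squared(observed)
chi2_yates = chi_squared(observed, yates=True)

print([[round(e, 2) for e in row] for row in expected])
print("df =", df, "chi2 =", round(chi2, 3), "with Yates =", round(chi2_yates, 3))
```

The continuity correction always shrinks the statistic, so the corrected test is the more conservative of the two.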


Fishers exact Test

If an expected cell count in a 2x2 table is < 5, chi squared is inapplicable

The probability of observing the particular table (cells a, b, c, d; total N = a+b+c+d) is:

$$P = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{N!\,a!\,b!\,c!\,d!}$$

Calculate the probability of the other tables that have the same marginals as the table of results & give as much or more evidence for an association between factors; the p-value is the sum of these probabilities and that of the observed table
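A minimal sketch of the calculation follows. It is illustrative only: the 2x2 table is hypothetical, the function names are invented, and "as much or more evidence" is taken to mean tables whose probability is no larger than that of the observed table, which is one common two-sided convention rather than the only one.

```python
from fractions import Fraction
from math import comb

def table_probability(a, b, c, d):
    """Probability of a 2x2 table with fixed marginals.

    Written with binomial coefficients; algebraically equal to
    (a+b)!(c+d)!(a+c)!(b+d)! / (N! a! b! c! d!).
    """
    n = a + b + c + d
    return Fraction(comb(a + b, a) * comb(c + d, c), comb(n, a + c))

def fisher_exact_two_sided(a, b, c, d):
    """Sum the probabilities of all tables with the same marginals that are
    no more probable than the observed table (assumed two-sided rule)."""
    observed = table_probability(a, b, c, d)
    row1, row2, col1 = a + b, c + d, a + c
    p_value = Fraction(0)
    # x is the top-left cell; the other three cells follow from the marginals.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        prob = table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
        if prob <= observed:
            p_value += prob
    return p_value

# Hypothetical table: a=3, b=1, c=1, d=3
p_table = table_probability(3, 1, 1, 3)
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print("P(observed table) =", p_table, "two-sided p =", p_value, float(p_value))
```

Exact fractions are used so that the comparison between table probabilities is not affected by floating-point rounding.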


Nonparametric Tests of Location

By scale and no. of samples:

Nominal
  One Sample: Binomial Test; Runs Test
  Two Samples, Related: McNemar Test
  Three or more Samples, Related: Cochran Q

Ordinal
  One Sample: Sign Test
  Two Samples, Independent: Median Test; Mann-Whitney U Test
  Two Samples, Related: Sign Test for Two Dependent Samples
  Three or more Samples, Independent: Median Test; Kruskal-Wallis ANOVA by Ranks
  Three or more Samples, Related: Friedman Two-way ANOVA

Interval
  One Sample: Wilcoxon Signed-ranks Test
  Two Samples, Related: Wilcoxon Matched Pairs Signed-ranks Test

Parametric Equivalent
  One Sample: One Sample t-test
  Two Samples, Independent: Two Sample t-test
  Two Samples, Related: Paired t-test
  Three or more Samples, Independent: ANOVA
  Three or more Samples, Related: Within Subjects ANOVA


Nonparametric Analysis for Association

By scale and no. of variables:

Nominal
  Two Variables: Phi Coefficient (2x2); Point Biserial (2xLinear); Chi Square Test of Independence
  Three or more Variables: Log-Linear (not included)

Ordinal
  Two Variables: Spearman Rho
  Three or more Variables: Kendall's Coefficient of Concordance W

Parametric Equivalent
  Two Variables: Pearson Correlation
  Three or more Variables: Eta Squared


End
