Two sample tests


Study designs

• Single sample, compare two sub-samples (cross-sectional survey)

• Compare samples from 2 different populations (2 cross sectional surveys, case-control study)

• Single sample; subjects randomly allocated to different interventions (experiment, clinical trial)

Simple randomization

1. Generate n uniform (0,1) random deviates, $u_1, \dots, u_n$.
2. If $u_i < 0.5$, assign intervention A to unit i; if $u_i > 0.5$, assign B.
3. Note that $n_A$ is a random variable with $E(n_A) = n/2$.

Restricted randomization

1. Generate a U(0,1) deviate, $u_i$, for each unit in the sample.
2. Sort the deviates from smallest to largest.
3. Assign intervention A to the units with the n/2 smallest $u_i$'s.
4. Note this results in half of the sample assigned to each intervention; i.e. $n_A$ is fixed.
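The two randomization schemes above can be sketched in a few lines of Python. This is only an illustration: the function names and the `seed` argument are assumptions made for the example, not part of the original slides.

```python
import random

def simple_randomization(n, seed=None):
    """Assign each of n units to 'A' or 'B' independently; n_A is random."""
    rng = random.Random(seed)
    return ["A" if rng.random() < 0.5 else "B" for _ in range(n)]

def restricted_randomization(n, seed=None):
    """Draw one U(0,1) deviate per unit and give 'A' to the n // 2 smallest."""
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    order = sorted(range(n), key=lambda i: u[i])  # unit indices, smallest u first
    in_a = set(order[: n // 2])
    return ["A" if i in in_a else "B" for i in range(n)]

print(simple_randomization(10, seed=1))
print(restricted_randomization(10, seed=1))
```

With restricted randomization the count of A's is always n // 2; with simple randomization it varies from run to run.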

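Suppose we have N subjects. How many should be allocated to each group?

Let $N = n_1 + n_2$. Then

$$V(\bar{Y}_1 - \bar{Y}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{N - n_1}.$$

Find the $n_1$ that minimizes $V(\bar{Y}_1 - \bar{Y}_2)$:

$$\frac{dV}{dn_1} = -\frac{\sigma_1^2}{n_1^2} + \frac{\sigma_2^2}{(N - n_1)^2} = 0 \quad\Longrightarrow\quad \frac{n_1}{n_2} = \frac{\sigma_1}{\sigma_2}.$$

If we randomly assign subjects to interventions, it is reasonable to assume $\sigma_1^2 = \sigma_2^2$. Then the optimum allocation is $n_1 = n_2$.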

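Two independent samples, normally distributed data

$Y_{1i},\ i = 1, 2, 3, \dots, n_1$; $Y_{2j},\ j = 1, 2, 3, \dots, n_2$.

$$\bar{Y}_k \sim N(\mu_k,\ \sigma_k^2 / n_k), \quad k = 1, 2.$$

$$\bar{Y}_1 - \bar{Y}_2 \sim N\!\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$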

If X and Y are 2 r.v. and a and b are constants, then

$E(aX + bY) = aE(X) + bE(Y)$, and

$V(aX + bY) = a^2 V(X) + b^2 V(Y) + 2ab\,\mathrm{cov}(X, Y)$.

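Therefore, if a = 1 and b = -1 (and the two samples are independent, so the covariance term is 0):

$\bar{Y}_1 - \bar{Y}_2$ is approximately distributed as

$$N\!\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right).$$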

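$$\frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \sim N(0, 1),$$

but

$$\frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$

does not follow a t distribution.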

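Additional assumption: $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

$s_1^2$ estimates $\sigma^2$ and $s_2^2$ estimates $\sigma^2$.

Pooled estimate (weighted average):

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$

$$\frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$$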

Example: An experiment was conducted to see if a drug could prevent premature birth. 30 women at risk of premature birth were assigned to take the drug or a placebo (15 in each group). Outcome: birthweight.

$H_0: \mu_1 = \mu_2$ (1 = drug; 2 = placebo)

$H_A: \mu_1 > \mu_2$

$C_\alpha = \{t > t_{28,\alpha}\}$

$C_{.05} = \{t > 1.7\}$

Birthweights

Drug   Placebo
6.9    6.4
7.6    6.7
7.3    5.4
7.6    8.2
6.8    5.3
7.2    6.6
8.0    5.8
5.5    5.7
5.8    6.2
7.3    7.1
8.2    7.0
6.9    6.9
6.8    5.6
5.7    4.2
8.6    6.8

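$\bar{y}_1 = 7.08$; $s_1 = 0.899$

$\bar{y}_2 = 6.26$; $s_2 = 0.961$

$$s_p^2 = \frac{14(.899)^2 + 14(.961)^2}{28} = 0.8659$$

$$t = \frac{7.08 - 6.26}{.931\sqrt{1/15 + 1/15}} = 2.412$$

$t \in C_{.05}$, reject $H_0$; $p \approx .01$.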
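As a check on the arithmetic, the pooled t statistic can be recomputed from the raw birthweights. The sketch below uses only the Python standard library; the function name `pooled_t` is illustrative. It does not compute the critical value or p-value, which would need a t-distribution routine from a statistics package.

```python
import math
from statistics import mean, variance

drug = [6.9, 7.6, 7.3, 7.6, 6.8, 7.2, 8.0, 5.5, 5.8, 7.3, 8.2, 6.9, 6.8, 5.7, 8.6]
placebo = [6.4, 6.7, 5.4, 8.2, 5.3, 6.6, 5.8, 5.7, 6.2, 7.1, 7.0, 6.9, 5.6, 4.2, 6.8]

def pooled_t(y1, y2):
    """Return (t, df, pooled variance) for the equal-variance two-sample t test."""
    n1, n2 = len(y1), len(y2)
    # Weighted average of the two sample variances (weights are the dfs).
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    t = (mean(y1) - mean(y2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2, sp2

t, df, sp2 = pooled_t(drug, placebo)
print(f"t = {t:.3f}, df = {df}, pooled variance = {sp2:.4f}")
```

Small differences in the last digit relative to the hand calculation come from rounding the standard deviations to three places before squaring.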

Testing $H_0: \sigma_1^2 = \sigma_2^2$:

We know that

$$(n_k - 1)s_k^2 / \sigma_k^2 \sim \chi^2_{n_k - 1}, \quad k = 1, 2.$$

It can be shown that the distribution of the ratio of two independent $\chi^2$ distributed random variables, each divided by its df, is the F distribution with $df_1$ and $df_2$ degrees of freedom.

In general:

If $X_k \sim \chi^2_{\nu_k}$; k = 1, 2,

$$Y = \frac{\nu_2 X_1}{\nu_1 X_2} \sim F_{\nu_1, \nu_2}.$$

It follows that letting

$X_k = (n_k - 1)s_k^2 / \sigma_k^2$; k = 1, 2,

$$Y = \frac{(n_2 - 1)(n_1 - 1)s_1^2 / \sigma_1^2}{(n_1 - 1)(n_2 - 1)s_2^2 / \sigma_2^2} = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2} \sim F_{n_1 - 1,\, n_2 - 1}.$$

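Under $H_0: \sigma_1^2 = \sigma_2^2$,

$$\frac{s_1^2}{s_2^2} \sim F_{n_1 - 1,\, n_2 - 1}.$$

So, to test $H_0$, we find the critical values $F_{n_1 - 1,\, n_2 - 1,\, \alpha/2}$ and $F_{n_1 - 1,\, n_2 - 1,\, 1 - \alpha/2}$.

If $\dfrac{s_1^2}{s_2^2} > F_{n_1 - 1,\, n_2 - 1,\, \alpha/2}$ or $\dfrac{s_1^2}{s_2^2} < F_{n_1 - 1,\, n_2 - 1,\, 1 - \alpha/2}$, we reject $H_0$.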

Note:

$$F_{\nu_1, \nu_2, 1 - \alpha} = \frac{1}{F_{\nu_2, \nu_1, \alpha}}.$$

Example birthweights:

$H_0: \sigma_1^2 = \sigma_2^2$; $H_A: \sigma_1^2 \ne \sigma_2^2$.

$F_{.025,14,14} = 2.98$, so $C = \{F > 2.98 \text{ or } F < 1/2.98 = 0.34\}$.

$$F = \frac{.899^2}{.961^2} = 0.88$$

do not reject $H_0$.

What do we do if $\sigma_1^2 \ne \sigma_2^2$?

Behrens-Fisher problem.

1. Satterthwaite approximation.
2. Transform the data.
3. Nonparametric methods.

Satterthwaite approximation:

The statistic is the usual t statistic.

$$df = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
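A minimal sketch of the degrees-of-freedom calculation, assuming the standard Welch-Satterthwaite form with $n_k - 1$ in the denominator terms. The function name and its arguments (sample standard deviations and sample sizes) are illustrative.

```python
def satterthwaite_df(s1, n1, s2, n2):
    """Approximate df for the unequal-variance t statistic.

    s1, s2 are sample standard deviations; n1, n2 are sample sizes.
    Assumes the standard Welch-Satterthwaite formula.
    """
    v1 = s1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Birthweight example: nearly equal variances and equal n, so df is close to 28.
print(satterthwaite_df(0.899, 15, 0.961, 15))
```

The result always lies between min(n1, n2) - 1 and n1 + n2 - 2, and it is generally not an integer.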

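If $\sigma_1^2 \ne \sigma_2^2$, we can consider a variance stabilizing transformation.

Some examples:

If $\sigma^2 \propto \mu$, $W = \sqrt{Y}$.

If $\sigma^2 \propto \mu^2$, $W = \ln(Y)$.

If $\sigma^2 \propto \mu^4$, $W = 1/Y$.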

Notes:

(1) We transform both $Y_1$ and $Y_2$.

(2) I rarely use transformations.

If $n_1$ and $n_2$ are large, the homogeneity of variance assumption is not important.

Recall that if $n_j$ is large, $\bar{Y}_j$ is approximately distributed as $N(\mu_j,\ \sigma_j^2 / n_j)$.

To test $H_0: \mu_1 = \mu_2$, we use

$$z = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}.$$

Under $H_0$, z is approximately distributed as a N(0,1) variate. The approximation gets better as $n_1, n_2 \to \infty$.

This approximation is good enough for practical purposes if $n_j \ge 25$; j = 1, 2.

Note also that the assumption that the Y's are normally distributed is not needed for this statistic (Central Limit Theorem).

A study was done to compare the percent body fat of 3rd graders at schools on 2 Native American reservations: Gila River (Tohono O’odham) and White River (Apache).

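$H_0: \mu_T = \mu_A$; $H_A: \mu_T \ne \mu_A$

$n_T = 63$; $n_A = 35$.

$C_{.05} = \{|z| > 1.96\}$.

$\bar{y}_T = 37.9\%$; $s_T = 8.66$

$\bar{y}_A = 32.8\%$; $s_A = 6.88$

$$z = \frac{37.9 - 32.8}{\sqrt{\dfrac{8.66^2}{63} + \dfrac{6.88^2}{35}}} = 3.20$$

reject $H_0$; p = 0.0014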
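The large-sample z test only needs the summary statistics, so it is easy to reproduce. The sketch below uses `statistics.NormalDist` from the Python standard library for the normal tail area; the function name `two_sample_z` is illustrative.

```python
import math
from statistics import NormalDist

def two_sample_z(ybar1, s1, n1, ybar2, s2, n2):
    """Large-sample z statistic and two-sided p-value from summary statistics."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # unpooled standard error
    z = (ybar1 - ybar2) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Percent body fat example (subscripts T and A as in the slides).
z, p = two_sample_z(37.9, 8.66, 63, 32.8, 6.88, 35)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```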

If the sample sizes are small and the Y's

are not normally distributed:

1. Transform the Y's.

2. Nonparametric method

Wilcoxon-Mann-Whitney rank sum test:

1. Pool the two samples and rank them from smallest to largest.
2. Replace the observations with their ranks.
3. Compute the sum of the ranks, $W_1$, in group 1.

What hypothesis does the Wilcoxon procedure test?

Assume $Y_j \sim F_j(y)$; j = 1, 2.

$H_0: F_1(y) = F_2(y)$

$H_A: F_1(y) = F_2(y - \Delta)$,

where $\Delta$ is a constant.

There are $N = n_1 + n_2$ subjects in our study. Thus there are $\binom{N}{n_1}$ possible outcomes.

Under $H_0$, each is equally likely. We compute the distribution of $W_1$ by enumeration.

Example: 7 students are taking a series of exams. They are randomly divided into 2 groups: $n_1 = 3$, $n_2 = 4$. After the first exam, group 1 is told they scored badly on exam 1 regardless of their score; group 2 was told nothing.

The null hypothesis is that telling the students that they are doing poorly will have no effect on their subsequent grade.

$H_0: \Delta = 0$

$H_A: \Delta \ne 0$

Grades on second exam

Group1   Group2
65       89
73       70
69       92
         88

There are $\binom{7}{3} = 35$ possible outcomes of the study.

Ranks   W1     Ranks   W1     Ranks   W1
1,2,3    6     1,5,6   12     2,6,7   15
1,2,4    7     1,5,7   13     3,4,5   12
1,2,5    8     1,6,7   14     3,4,6   13
1,2,6    9     2,3,4    9     3,4,7   14
1,2,7   10     2,3,5   10     3,5,6   14
1,3,4    8     2,3,6   11     3,5,7   15
1,3,5    9     2,3,7   12     3,6,7   16
1,3,6   10     2,4,5   11     4,5,6   15
1,3,7   11     2,4,6   12     4,5,7   16
1,4,5   10     2,4,7   13     4,6,7   17
1,4,6   11     2,5,6   13     5,6,7   18
1,4,7   12     2,5,7   14

c.d.f. of $W_1$ for $n_1 = 3$ and $n_2 = 4$

w     F(w)
6     0.0286
7     0.0571
8     0.1143
9     0.2000
10    0.3143
11    0.4286
12    0.5714
13    0.6857
14    0.8000
15    0.8857
16    0.9429
17    0.9714
18    1.0000

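Note: it is impossible to conduct a 2-sided $\alpha = 0.05$ test, because the smallest attainable two-sided size is 2/35 = 0.057. We will do a 2-sided $\alpha = 0.1$ test: $C_{0.1} = \{6, 18\}$.

Observed $W_1 = 1 + 2 + 4 = 7$; do not reject $H_0$. P value = 2(0.05714) = 0.1143.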
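The enumeration argument can be carried out directly for small samples. The sketch below lists every possible set of group 1 ranks, tabulates $W_1$, and computes the two-sided p-value for the exam example. Function and variable names are illustrative, and the code assumes there are no ties.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def rank_sum_distribution(n1, n2):
    """Exact null pmf of W1 (no ties): each set of n1 ranks is equally likely."""
    N = n1 + n2
    counts = Counter(sum(c) for c in combinations(range(1, N + 1), n1))
    total = sum(counts.values())
    return {w: Fraction(k, total) for w, k in sorted(counts.items())}

pmf = rank_sum_distribution(3, 4)
cdf, running = {}, Fraction(0)
for w, prob in pmf.items():
    running += prob
    cdf[w] = running

group1, group2 = [65, 73, 69], [89, 70, 92, 88]
pooled = sorted(group1 + group2)
w1 = sum(pooled.index(y) + 1 for y in group1)  # 1-based ranks; valid without ties

lower = cdf[w1]                # P(W1 <= w1)
upper = 1 - cdf[w1] + pmf[w1]  # P(W1 >= w1)
p_two_sided = min(1, 2 * min(lower, upper))
print(w1, float(p_two_sided))
```

The number of rank sets is $\binom{N}{n_1}$, so full enumeration is practical only for small N.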

Note:

$$W_1 + W_2 = \frac{N(N + 1)}{2}.$$

$$E(W_1) = \frac{n_1(N + 1)}{2}.$$

$$V(W_1) = \frac{n_1 n_2 (N + 1)}{12}.$$

If $n_1$ and $n_2$ are large,

$$z = \frac{W_1 - E(W_1)}{\sqrt{V(W_1)}}$$

will be approximately distributed as $\Phi(z)$.

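This approximation is good for $n_1, n_2 \ge 12$.

If there are ties:

$$V(W_1) = \frac{n_1 n_2 (N + 1)}{12} - \frac{n_1 n_2}{12N(N - 1)} \left\{ \sum_{i=1}^{q} t_i(t_i - 1)(t_i + 1) \right\},$$

where q is the number of groups of tied values and $t_i$ is the number of observations in the i-th group.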

Birthweights (lbs.)

Drug   Rank    Placebo   Rank
6.9    18      6.4       11
7.6    25.5    6.7       13
7.3    23.5    5.4       3
7.6    25.5    8.2       28.5
6.8    15      5.3       2
7.2    22      6.6       12
8.0    27      5.8       8.5
5.5    4       5.7       6.5
5.8    8.5     6.2       10
7.3    23.5    7.1       21
8.2    28.5    7.0       20
6.9    18      6.9       18
6.8    15      5.6       5
5.7    6.5     4.2       1
8.6    30      6.8       15

$H_0: \Delta = 0$; $H_A: \Delta > 0$; $C_{.05} = \{z > 1.645\}$

$$E(W_d) = \frac{15(31)}{2} = 232.5$$

$$V(W_d,\ \text{no ties}) = \frac{15^2(31)}{12} = 581.25$$

$q = 7$; $t_1 = 2$; $t_2 = 2$; $t_3 = 3$; $t_4 = 3$; $t_5 = 2$; $t_6 = 2$; $t_7 = 2$.

$$\sum_{i=1}^{q} t_i(t_i - 1)(t_i + 1) = 78$$

$$V(W_d,\ \text{adj. for ties}) = 581.25 - \frac{78(15)^2}{12(30)(29)} = 581.25 - 1.68 = 579.57$$

$W_d = 290.5$;

$$z = \frac{290.5 - 232.5}{\sqrt{579.57}} = 2.41$$

reject $H_0$; p = 0.0080.
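The rank-sum calculation with ties can be scripted so that the midranks, the tie groups, and the variance adjustment are all derived from the raw data. This is a sketch with illustrative function names. It assumes the standard tie correction, in which $n_1 n_2 \sum t_i(t_i - 1)(t_i + 1) / \{12N(N - 1)\}$ is subtracted from the no-ties variance.

```python
import math
from collections import Counter
from statistics import NormalDist

drug = [6.9, 7.6, 7.3, 7.6, 6.8, 7.2, 8.0, 5.5, 5.8, 7.3, 8.2, 6.9, 6.8, 5.7, 8.6]
placebo = [6.4, 6.7, 5.4, 8.2, 5.3, 6.6, 5.8, 5.7, 6.2, 7.1, 7.0, 6.9, 5.6, 4.2, 6.8]

def midranks(values):
    """Map each distinct value to the average of the ranks it occupies."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return {v: sum(p) / len(p) for v, p in positions.items()}

def rank_sum_z(y1, y2):
    """Return (W1, tie-adjusted variance, z) for the rank sum of y1."""
    n1, n2 = len(y1), len(y2)
    N = n1 + n2
    pooled = y1 + y2
    ranks = midranks(pooled)
    w1 = sum(ranks[y] for y in y1)
    # Untied values have t = 1 and contribute 0 to the correction.
    ties = sum(t * (t - 1) * (t + 1) for t in Counter(pooled).values())
    var = n1 * n2 * (N + 1) / 12 - n1 * n2 * ties / (12 * N * (N - 1))
    z = (w1 - n1 * (N + 1) / 2) / math.sqrt(var)
    return w1, var, z

w1, var, z = rank_sum_z(drug, placebo)
p_one_sided = 1 - NormalDist().cdf(z)
print(w1, round(var, 2), round(z, 2), round(p_one_sided, 4))
```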

Mann-Whitney test:

Consider all $n_1 n_2$ possible pairs $(Y_{1i}, Y_{2j})$; $i = 1, 2, \dots, n_1$; $j = 1, 2, \dots, n_2$.

Let $U_1$ = # of pairs with $Y_{1i} > Y_{2j}$.

It can be shown that:

$$U_1 = W_1 - \frac{n_1(N - n_2 + 1)}{2}.$$

Therefore the Mann-Whitney and Wilcoxon tests are equivalent.
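The equivalence can be checked numerically on the exam scores. The sketch below counts pairs in which the group 1 score exceeds the group 2 score and compares the count with $W_1 - n_1(n_1 + 1)/2$, which equals $W_1 - n_1(N - n_2 + 1)/2$ because $N - n_2 = n_1$. The direction of the inequality and the function name are assumptions made for the illustration, and ties are not handled.

```python
def mann_whitney_u(y1, y2):
    """Count pairs (i, j) with y1[i] > y2[j]; assumes no tied values."""
    return sum(1 for a in y1 for b in y2 if a > b)

group1, group2 = [65, 73, 69], [89, 70, 92, 88]
n1 = len(group1)

pooled = sorted(group1 + group2)
w1 = sum(pooled.index(y) + 1 for y in group1)  # rank sum of group 1

u1 = mann_whitney_u(group1, group2)
print(u1, w1 - n1 * (n1 + 1) // 2)  # the two numbers agree
```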

The Wilcoxon statistic is a special case of a set of simple linear rank statistics.

Let $R_k$ be the rank of the kth obs. in the combined groups; k = 1, 2, ..., N.

Simple linear rank statistic:

$$S[a(R_k), c_k] = \sum_{k=1}^{N} c_k\, a(R_k),$$

where $a(R_k)$ is a known function and $c_k$ is a series of constants.

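It can be shown:

$$E\{S[a(R_k), c_k]\} = \frac{1}{N}\left[\sum_{k=1}^{N} a(R_k)\right] \sum_{k=1}^{N} c_k = \bar{a} \sum_{k=1}^{N} c_k$$

and

$$V(S) = \frac{1}{N - 1}\left[\sum_{k=1}^{N} \{a(R_k) - \bar{a}\}^2\right] \sum_{k=1}^{N} (c_k - \bar{c})^2.$$

If N is large, the distribution of

$$z = \frac{S - E(S)}{\sqrt{V(S)}}$$

is approximately $\Phi(z)$.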

If $a(R_k) = R_k$, and

$$c_k = \begin{cases} 1 & \text{if kth obs. in grp. 1} \\ 0 & \text{if kth obs. in grp. 2} \end{cases}$$

then

$$S[a(R_k), c_k] = \sum_{k=1}^{N} c_k R_k = W_1.$$

Normal Scores Test:

$c_k$ as before,

$$a(R_k) = \Phi^{-1}\!\left(\frac{R_k}{N + 1}\right)$$

Savage Scores-logrank test

$$a_n(R_k) = \frac{1}{N} + \frac{1}{N - 1} + \dots + \frac{1}{N - R_k + 1}$$

$$\ln(c) \approx 1 + \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{c}$$

Therefore:

$$a_n(R_k) \approx \ln(N) - \ln(N - R_k) = \ln\!\left(\frac{N}{N - R_k}\right).$$

However, this is undefined if $R_k = N$, so take

$$a_n(R_k) = \ln\!\left(\frac{N + 1}{N - R_k + 1}\right).$$

This form of the statistic is called the logrank statistic and is used in survival analysis.

N = 10

Rank   Savage Score   Logrank score
1      0.1            0.095310
2      0.211111       0.200671
3      0.336111       0.318454
4      0.478968       0.451985
5      0.645635       0.606136
6      0.845635       0.788457
7      1.095635       1.011601
8      1.428968       1.299283
9      1.928968       1.704748
10     2.928968       2.397895
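The score table for N = 10 can be regenerated from the two definitions. The sketch below also includes a generic simple linear rank statistic so that different score functions can be plugged in. All function names are illustrative. The logrank score is taken to be $\ln\{(N + 1)/(N - R_k + 1)\}$, which is the form that matches the tabulated values.

```python
import math

def savage_score(r, n):
    """1/n + 1/(n-1) + ... + 1/(n-r+1) for rank r out of n."""
    return sum(1.0 / (n - j) for j in range(r))

def logrank_score(r, n):
    """ln((n+1)/(n-r+1)): the log approximation that stays finite at r = n."""
    return math.log((n + 1) / (n - r + 1))

def linear_rank_statistic(ranks, c, score):
    """S = sum over k of c_k * a(R_k), where score(r, n) plays the role of a."""
    n = len(ranks)
    return sum(ck * score(rk, n) for rk, ck in zip(ranks, c))

n = 10
for r in range(1, n + 1):
    print(r, round(savage_score(r, n), 6), round(logrank_score(r, n), 6))
```

Passing `lambda r, n: r` as the score with 0/1 group indicators for `c` gives the Wilcoxon rank sum $W_1$.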

Optimum LRS:

Distribution   $a_n(R_k)$
Normal         Normal Scores
Exponential    Savage Scores
Logistic       Wilcoxon Scores

Permutation test:

N subjects randomly assigned to 2 groups; $n_1$, $n_2$. There are $\binom{N}{n_1}$ possible assignments and each is equally likely.

Each of these assignments results in a value of $\bar{Y}_1 - \bar{Y}_2$. Compute $\bar{Y}_1 - \bar{Y}_2$ for each possible outcome.

Compute the empirical distribution under $H_0$ of equal means. From the edf, determine the critical region of the test.

Example: test scores; $\binom{7}{3} = 35$ possible outcomes.

Grp1        Grp2           diff.      Grp1        Grp2           diff.
65 69 70    73 88 89 92    -17.50     65 69 73    70 88 89 92    -15.75
65 69 88    70 73 89 92     -7.00     65 69 89    70 73 88 92     -6.42
65 69 92    70 73 88 89     -4.67     65 70 73    69 88 89 92    -15.17
65 70 88    69 73 89 92     -6.42     65 70 89    69 73 88 92     -5.83
65 70 92    69 73 88 89     -4.08     65 73 88    69 70 89 92     -4.67
65 73 89    69 70 88 92     -4.08     65 73 92    69 70 88 89     -2.33
65 88 89    69 70 73 92      4.67     65 88 92    69 70 73 89      6.42
65 89 92    69 70 73 88      7.00     69 70 73    65 88 89 92    -12.83
69 70 88    65 73 89 92     -4.08     69 70 89    65 73 88 92     -3.50
69 70 92    65 73 88 89     -1.75     69 73 88    65 70 89 92     -2.33
69 73 89    65 70 88 92     -1.75     69 73 92    65 70 88 89      0.00
69 88 89    65 70 73 92      7.00     69 88 92    65 70 73 89      8.75
69 89 92    65 70 73 88      9.33     70 73 88    65 69 89 92     -1.75
70 73 89    65 69 88 92     -1.17     70 73 92    65 69 88 89      0.58
70 88 89    65 69 73 92      7.58     70 88 92    65 69 73 89      9.33
70 89 92    65 69 73 88      9.92     73 88 89    65 69 70 92      9.33
73 88 92    65 69 70 89     11.08     73 89 92    65 69 70 88     11.67
88 89 92    65 69 70 73     20.42

CDF of diff

d         P(Diff≤d)     d        P(Diff≤d)
-17.50    0.029          0.00    0.600
-15.75    0.057          0.58    0.629
-15.17    0.086          4.67    0.657
-12.83    0.114          6.42    0.686
 -7.00    0.143          7.00    0.743
 -6.42    0.200          7.58    0.771
 -5.83    0.229          8.75    0.800
 -4.67    0.286          9.33    0.886
 -4.08    0.371          9.92    0.914
 -3.50    0.400         11.08    0.943
 -2.33    0.457         11.67    0.971
 -1.75    0.543         20.42    1.000
 -1.17    0.571

Critical region:

$C_{.1} = \{d = -17.5 \text{ or } d = 20.42\}$,

where $d = \bar{Y}_1 - \bar{Y}_2$.

Observed d = -15.75; do not reject $H_0$.
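For a data set this small, the permutation distribution can be generated by brute force. The sketch below enumerates all assignments of 3 of the 7 scores to group 1 and computes the difference in means for each. The function name and the small tolerance used when comparing floating-point differences are choices made for this illustration.

```python
from itertools import combinations
from statistics import mean

def permutation_distribution(y1, y2):
    """Sorted differences in means over every way to choose group 1 from the pool."""
    pooled = y1 + y2
    idx = range(len(pooled))
    diffs = []
    for chosen in combinations(idx, len(y1)):
        g1 = [pooled[i] for i in chosen]
        g2 = [pooled[i] for i in idx if i not in chosen]
        diffs.append(mean(g1) - mean(g2))
    return sorted(diffs)

group1, group2 = [65, 73, 69], [89, 70, 92, 88]
diffs = permutation_distribution(group1, group2)
observed = mean(group1) - mean(group2)

# Lower-tail probability P(Diff <= observed); the tolerance guards against rounding.
p_lower = sum(d <= observed + 1e-9 for d in diffs) / len(diffs)
print(len(diffs), round(observed, 2), round(p_lower, 3))
```

The number of assignments grows as $\binom{N}{n_1}$, so for moderately large N one would sample random assignments rather than enumerate them all.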

Permutation Test

• No assumptions except random assignment

• Computations extensive if N is moderately large

• CLT type theorem shows that for large samples normal approximation is good.
