Nonparametric Methods
Chapter 15 Nonparametric Methods
15.1 The Sign Test: A Hypothesis Test about the Median
15.2 The Wilcoxon Rank Sum Test
1.1 Nonparametric Tests
A. One-Sample Mean Test
Many tests are concerned with testing a parameter of a specified distribution.
Test $H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$ under a normal population $N(\mu, \sigma^2)$. If $\sigma$ is known, the Z-test is recommended:
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}},$$
where $\bar{X}$ is the sample mean and $n$ is the sample size.
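As a quick check of the formula, here is a minimal Python sketch of the one-sample Z-test. It assumes a normal population with known σ; the helper names and the numbers in the usage line are hypothetical, chosen only for illustration, and Φ is computed from `math.erf`.

```python
import math

# Illustrative sketch; the function names are hypothetical helpers.


def z_statistic(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """One-sample Z statistic (xbar - mu0) / (sigma / sqrt(n)); sigma assumed known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))


def std_normal_cdf(z: float) -> float:
    """Standard normal distribution function Phi(z), expressed through math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_sided_p(z: float) -> float:
    """Two-sided p-value for H1: mu != mu0."""
    return 2.0 * (1.0 - std_normal_cdf(abs(z)))


# Hypothetical inputs: sample mean 10.5, mu0 = 10, sigma = 2, n = 16.
z = z_statistic(10.5, 10.0, 2.0, 16)
print(z, round(two_sided_p(z), 4))  # 1.0 0.3173
```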
B. Two-Sample Mean Tests
To test $H_0: \mu_1 = \mu_2$ vs $H_1: \mu_1 \neq \mu_2$ under two normal populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$: if $\sigma_1$ and $\sigma_2$ are unknown, a t-test is suggested.
In most cases the variances are unknown.
Comparing Means of Two Populations: $\sigma_1$ and $\sigma_2$ unknown (and assumed equal)
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2},$$
where the pooled variance is
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
and
$\bar{X}_1$ = mean of the sample taken from population 1
$s_1^2$ = variance of the sample taken from population 1
$n_1$ = size of the sample taken from population 1
$\bar{X}_2$ = mean of the sample taken from population 2
$s_2^2$ = variance of the sample taken from population 2
$n_2$ = size of the sample taken from population 2
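The pooled t statistic can be sketched in Python with the standard `statistics` module, whose `variance` uses the n - 1 divisor that s1^2 and s2^2 require. This is an illustrative sketch, assuming equal population variances; `pooled_t` is a hypothetical helper name, and the p-value (which needs the t distribution) is left out.

```python
import math
from statistics import mean, variance  # variance() divides by n - 1

# Illustrative sketch; pooled_t is a hypothetical helper.


def pooled_t(x: list[float], y: list[float], delta0: float = 0.0) -> tuple[float, int]:
    """Pooled two-sample t statistic and its degrees of freedom n1 + n2 - 2.

    delta0 is the hypothesised difference mu1 - mu2 (0 under H0: mu1 = mu2).
    """
    n1, n2 = len(x), len(y)
    # pooled variance s_p^2 = ((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    t = ((mean(x) - mean(y)) - delta0) / se
    return t, n1 + n2 - 2


# Small hypothetical samples: means 2 and 5, both sample variances equal 1.
t_small, df_small = pooled_t([1, 2, 3], [4, 5, 6])
print(round(t_small, 4), df_small)  # -3.6742 4
```

The same call works on any two independent samples, but its t_{n1+n2-2} reference distribution is only justified under the normality assumption discussed next.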
If the data are not normally distributed, the distribution of the t-statistic is in general unknown and depends on the distributions of the populations.
There is a huge number of possible underlying distributions.
Can we have tests that are distribution-free? Nonparametric tests are one such kind of test.
Example 1.1 Delivery times
A local pizza restaurant located close to a college campus advertises that its delivery time to a college dormitory is less than that of a local branch of a national pizza chain.
In order to determine whether this advertisement is valid, you and some friends have decided to order 10 pizzas from the local pizza restaurant and 10 pizzas from the national chain, all at different times. The delivery times in minutes (PIZZATIME) are shown below.
Testing for the difference in the mean delivery times:
Local          Chain
16.8  18.1     22.0  19.5
11.7  14.1     15.2  17.0
15.6  21.8     18.7  19.5
16.7  13.9     15.6  16.5
17.5  20.8     20.8  24.0
We can use the t-test for this comparison if the delivery times are normally distributed.
Since the delivery times are not normally distributed, the t-test may be difficult to justify.
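We can consider the following way to compare these two restaurants: pair the delivery times in the order listed and record “+” when the local restaurant is faster and “-” when the chain is faster.
Local   16.8  11.7  15.6  16.7  17.5  18.1  14.1  21.8  13.9  20.8
Chain   22.0  15.2  18.7  15.6  20.8  19.5  17.0  19.5  16.5  24.0
Result  +     +     +     -     +     +     +     -     +     +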
If the two restaurants have the same level of delivery time, each pair has a one-half chance of giving “+” and a one-half chance of giving “-”.
The number of “+” signs, denoted by T, then follows the binomial distribution B(10; 0.5).
The number of “-” signs also follows the binomial distribution with p = 0.5.
T = 8 in this example.
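The counting step can be sketched in Python as follows. `sign_statistic` is a hypothetical helper: it pairs the two lists in the order given, uses this slide's convention that “+” means the first value (local) is smaller, i.e. faster, and drops tied pairs, as is done later for the zero difference in Example 1.2.

```python
# Illustrative sketch; sign_statistic is a hypothetical helper.


def sign_statistic(first: list[float], second: list[float]) -> tuple[int, int]:
    """Return (T, n): T = number of '+' signs, n = number of untied pairs.

    A pair scores '+' when first < second and '-' when first > second;
    pairs with first == second carry no sign and are discarded, so that
    under H0 the count T follows B(n; 0.5).
    """
    if len(first) != len(second):
        raise ValueError("samples must be paired (equal lengths)")
    plus = sum(1 for a, b in zip(first, second) if a < b)
    minus = sum(1 for a, b in zip(first, second) if a > b)
    return plus, plus + minus


local = [16.8, 11.7, 15.6, 16.7, 17.5, 18.1, 14.1, 21.8, 13.9, 20.8]
chain = [22.0, 15.2, 18.7, 15.6, 20.8, 19.5, 17.0, 19.5, 16.5, 24.0]
print(sign_statistic(local, chain))  # (8, 10)
```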
1.2 Sign Test
Review: Binomial Distribution
A. Bernoulli trials: A trial with only two outcomes (yes or no, success or failure, boy or girl, win or loss, 1 or 0) with respective probabilities p and 1 - p is called a Bernoulli trial.
B. Several Bernoulli trials: Let X be the number of successes in n independent, identical Bernoulli trials. The random variable X is said to follow a binomial distribution B(n;p).
C. Binomial probability distribution
The probability of X = k is given by
$$P(X = k) = \frac{n!}{k!\,(n-k)!}\,p^k (1-p)^{n-k}, \qquad k = 0, 1, \dots, n,$$
where
P(X = k) = probability of k successes given n and p
n = number of observations
p = probability of success
1 - p = probability of failure
k = number of successes in the sample (k = 0, 1, ..., n)
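The binomial formula translates directly into Python, using `math.comb` (Python 3.8+) for n!/(k!(n-k)!). A minimal sketch; `binom_pmf` and `binom_cdf` are hypothetical names, not part of any library.

```python
from math import comb

# Illustrative sketch; binom_pmf and binom_cdf are hypothetical helpers.


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n; p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k), obtained by summing the pmf over 0, 1, ..., k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


print(binom_pmf(8, 10, 0.5))  # 0.0439453125, i.e. 45/1024
```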
1.2 Sign Test: Example 1.1
$H_0: p = 0.5$ vs $H_1: p > 0.5$
One-tailed test
$P(X \ge 8) = P(X = 8) + P(X = 9) + P(X = 10) = 0.055$
p-value $= P(X \ge 8) = 0.055$
SPSS result:
Binomial Test
                  Category   N    Observed Prop.   Test Prop.   Exact Sig. (2-tailed)
result  Group 1   1.00       8    .80              .50          .109
        Group 2   .00        2    .20
        Total                10   1.00
$H_0: p = 0.5$ vs $H_1: p \neq 0.5$
Two-tailed test
$P(X \ge 8) = P(X = 8) + P(X = 9) + P(X = 10) = 0.055$
p-value $= 2P(X \ge 8) = 0.11$
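The exact p-values above can be reproduced with a short Python sketch, assuming H0: p = 0.5 so that every outcome of n trials has probability 1/2^n. The helper names are hypothetical. Because B(n; 0.5) is symmetric, doubling the smaller tail gives the same two-sided p-value as adding both tails.

```python
from math import comb

# Illustrative sketch; the function names are hypothetical helpers.


def upper_tail(t: int, n: int) -> float:
    """Exact P(X >= t) for X ~ B(n; 0.5)."""
    return sum(comb(n, k) for k in range(t, n + 1)) / 2**n


def lower_tail(t: int, n: int) -> float:
    """Exact P(X <= t) for X ~ B(n; 0.5)."""
    return sum(comb(n, k) for k in range(t + 1)) / 2**n


def sign_test_two_sided(t: int, n: int) -> float:
    """Two-sided exact p-value: twice the smaller tail, capped at 1."""
    return min(1.0, 2 * min(upper_tail(t, n), lower_tail(t, n)))


# Example 1.1: T = 8 plus signs out of n = 10 pairs.
print(round(upper_tail(8, 10), 4))           # 0.0547 (one-tailed)
print(round(sign_test_two_sided(8, 10), 4))  # 0.1094 (two-tailed)
```

The two-tailed value matches the SPSS exact significance of .109; the 0.11 above is the same number rounded.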
Example 1.2: Product Preference
An Italian restaurant close to a college campus contemplated a new recipe for the sauce used on its pizza. A random sample of eight students was chosen, and each was asked to rate, on a scale from 1 to 10, the taste of the original sauce and of the proposed new one. The scores from the taste comparison are:
We cannot use the t-test for these data because the scores are not normally distributed.
The score difference for case “G” is zero, so that case is dropped; under H0 the statistic T, the number of “+” signs, then follows B(7; 0.5). This sample gives T = 2.
SPSS result:
Binomial Test
                    Category   N   Observed Prop.   Test Prop.   Exact Sig. (2-tailed)
VAR00005  Group 1   .00        5   .71              .50          .453
          Group 2   1.00       2   .29
          Total                7   1.00
1.2 Sign Test: Example 1.2
$H_0: p = 0.5$: there is no overall tendency to prefer one product to the other.
$H_1: p < 0.5$: a majority prefer the new product (or fewer than 50% prefer the old product).
One-tailed test:
p-value $= P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.2266$
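$H_0: p = 0.5$ vs $H_1: p \neq 0.5$
Two-tailed test
p-value $= 2P(X \le 2) = 2(0.2266) = 0.4532$, in agreement with the SPSS exact two-tailed significance of .453.
Also, note that, because B(7; 0.5) is symmetric,
p-value $= P(X \le 2) + P(X \ge 5) = 0.4532$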
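The same arithmetic for Example 1.2 can be checked in a few lines of Python (a sketch; the variable names are illustrative). With n = 7 and T = 2, it computes the lower tail P(X <= 2) and the mirror-image upper tail P(X >= 5) exactly.

```python
from math import comb

n, t = 7, 2  # Example 1.2: seven non-zero differences, T = 2 plus signs

lower = sum(comb(n, k) for k in range(t + 1)) / 2**n         # P(X <= 2)
upper = sum(comb(n, k) for k in range(n - t, n + 1)) / 2**n  # P(X >= 5)

print(round(lower, 4))          # 0.2266 (one-tailed p-value)
print(round(lower + upper, 4))  # 0.4531 (two-tailed p-value)
```

The exact two-tailed value is 58/128 = 0.4531, in line with the SPSS .453; the 0.4532 in the text comes from doubling the rounded 0.2266.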
Review: Binomial Distribution
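D. Properties of the binomial distribution: The expectation of B(n;p) is $np$, the variance of B(n;p) is $np(1-p)$, and the standard deviation of B(n;p) is $\sqrt{np(1-p)}$.
E. Normal Approximation (Section 6.4 of the book): if $X \sim B(n;p)$, then
$$P(a \le X \le b) = P\left(\frac{a - np}{\sqrt{np(1-p)}} \le \frac{X - np}{\sqrt{np(1-p)}} \le \frac{b - np}{\sqrt{np(1-p)}}\right) \approx P\left(\frac{a - np}{\sqrt{np(1-p)}} \le Z \le \frac{b - np}{\sqrt{np(1-p)}}\right) = \Phi\left(\frac{b - np}{\sqrt{np(1-p)}}\right) - \Phi\left(\frac{a - np}{\sqrt{np(1-p)}}\right),$$
where $\Phi(\cdot)$ is the distribution function of $N(0,1)$.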
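A minimal Python sketch of this approximation, assuming only the standard library: Φ is written in terms of `math.erf`, and `normal_approx` (a hypothetical name) standardises both endpoints with mean np and standard deviation sqrt(np(1-p)). No continuity correction is applied here.

```python
import math

# Illustrative sketch; phi and normal_approx are hypothetical helpers.


def phi(z: float) -> float:
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_approx(a: float, b: float, n: int, p: float) -> float:
    """Approximate P(a <= X <= b) for X ~ B(n; p) by Phi(z_b) - Phi(z_a)."""
    mu = n * p                       # expectation np
    sd = math.sqrt(n * p * (1 - p))  # standard deviation sqrt(np(1-p))
    return phi((b - mu) / sd) - phi((a - mu) / sd)


# Example 1.3 setting: n = 100, p = 0.4, 45 <= X <= 50.
print(round(normal_approx(45, 50, 100, 0.4), 4))
```

This prints about 0.1331; the 0.1332 in the worked solution differs only because the z-values there are first rounded to 1.02 and 2.04.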
Example 1.3 Customer Sales (Example 6.8, p. 213)
A saleswoman makes initial telephone contact with potential customers in an effort to assess whether a follow-up visit to their homes is likely to be worthwhile. Her experience suggests that 40% of the initial contacts lead to a follow-up visit. If she contacts 100 people by telephone, what is the probability that between 45 and 50 home visits will result?
Solution to Example 1.3: Customer Sales
Solution: Let X be the number of follow-up visits. Then X has a binomial distribution with n = 100 and p = 0.40. Approximating the required probability gives
$$P(45 \le X \le 50) \approx P\left(\frac{45 - (100)(0.4)}{\sqrt{(100)(0.4)(0.6)}} \le Z \le \frac{50 - (100)(0.4)}{\sqrt{(100)(0.4)(0.6)}}\right) = P(1.02 \le Z \le 2.04) = \Phi(2.04) - \Phi(1.02) = 0.9793 - 0.8461 = 0.1332$$
This probability is shown as an area under the standard normal curve below.
[Figure: the probability shown as an area under the normal curve; horizontal axis: Number of Successes]
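The continuity correction
Since the binomial distribution is discrete and the normal distribution is continuous, it is common practice to use a continuity correction in the approximation:
$$P(a \le X \le b) \approx P\left(\frac{a - 0.5 - np}{\sqrt{np(1-p)}} \le Z \le \frac{b + 0.5 - np}{\sqrt{np(1-p)}}\right)$$
Return to Example 1.3:
$$P(45 \le X \le 50) \approx P\left(\frac{45 - 0.5 - (100)(0.4)}{\sqrt{(100)(0.4)(0.6)}} \le Z \le \frac{50 + 0.5 - (100)(0.4)}{\sqrt{(100)(0.4)(0.6)}}\right) = \Phi(2.14) - \Phi(0.92) = 0.9838 - 0.8212 = 0.1626$$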
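To see what the correction buys, the sketch below compares the exact binomial probability with the plain and the continuity-corrected normal approximations for Example 1.3. It is an illustrative Python sketch using only the standard library; the function names are hypothetical, and unlike the hand calculation it does not round z to two decimals.

```python
import math

# Illustrative sketch; the function names are hypothetical helpers.


def phi(z: float) -> float:
    """Standard normal distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def exact_prob(a: int, b: int, n: int, p: float) -> float:
    """Exact P(a <= X <= b) for X ~ B(n; p), summing binomial probabilities."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a, b + 1))


def approx_prob(a: int, b: int, n: int, p: float, continuity: bool) -> float:
    """Normal approximation, optionally widening [a, b] to [a - 0.5, b + 0.5]."""
    mu, sd = n * p, math.sqrt(n * p * (1 - p))
    lo, hi = (a - 0.5, b + 0.5) if continuity else (a, b)
    return phi((hi - mu) / sd) - phi((lo - mu) / sd)


n, p, a, b = 100, 0.4, 45, 50
exact = exact_prob(a, b, n, p)
plain = approx_prob(a, b, n, p, continuity=False)
corrected = approx_prob(a, b, n, p, continuity=True)
print(f"exact={exact:.4f} plain={plain:.4f} corrected={corrected:.4f}")
```

Run as written, the corrected approximation (about 0.163) lands much closer to the exact binomial value than the uncorrected one (about 0.133) does.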
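1.2 Sign test: normal approximation
Under $H_0$, $T \sim B(n; 0.5)$, so $np = 0.5n$ and $\sqrt{np(1-p)} = 0.5\sqrt{n}$.
The approximate test statistic is
$$Z = \frac{T^* - 0.5n}{0.5\sqrt{n}},$$
where $T^*$ is T corrected for continuity, defined as follows:
a. For a two-tail test: $T^* = T + 0.5$ if $T < 0.5n$, and $T^* = T - 0.5$ if $T > 0.5n$.
b. For an upper-tail test: $T^* = T - 0.5$.
c. For a lower-tail test: $T^* = T + 0.5$.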
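The continuity-corrected statistic can be sketched in Python as below, here applied to the counts used in Example 1.4 (T = 40, n = 96). `sign_test_z` is a hypothetical helper; its `tail` argument selects rule a, b, or c. One assumption of this sketch that the rules do not cover: when T equals 0.5n exactly in a two-tail test, T is left unchanged, giving Z = 0.

```python
import math

# Illustrative sketch; phi and sign_test_z are hypothetical helpers.


def phi(z: float) -> float:
    """Standard normal distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sign_test_z(t: int, n: int, tail: str = "two") -> tuple[float, float]:
    """Return (Z, p-value) for the sign test with a continuity correction.

    tail = "two" (H1: p != 0.5), "upper" (H1: p > 0.5) or "lower" (H1: p < 0.5).
    """
    half_n = 0.5 * n
    if tail == "upper":
        t_star = t - 0.5
    elif tail == "lower":
        t_star = t + 0.5
    elif tail == "two":
        # move T half a unit toward its mean 0.5n (unchanged if T == 0.5n)
        if t > half_n:
            t_star = t - 0.5
        elif t < half_n:
            t_star = t + 0.5
        else:
            t_star = t
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    z = (t_star - half_n) / (0.5 * math.sqrt(n))
    if tail == "upper":
        p_value = 1.0 - phi(z)
    elif tail == "lower":
        p_value = phi(z)
    else:
        p_value = 2.0 * (1.0 - phi(abs(z)))
    return z, p_value


# Counts from Example 1.4: T = 40 out of n = 96, two-tail test.
z, p_value = sign_test_z(40, 96)
print(round(z, 2), round(p_value, 3))  # -1.53 0.126
```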
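Example 1.4 Ice Cream
Solution:
Use the normal approximation equations. Here n = 96, so 0.5n = 48 and $0.5\sqrt{96} = 4.899$. Since T = 40 < 48, $T^* = T + 0.5 = 40.5$, and
$$Z = \frac{T^* - 48}{4.899} = \frac{40.5 - 48}{4.899} = -1.53$$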
The SPSS output:
Binomial Test
                    Category   N    Observed Prop.   Test Prop.   Asymp. Sig. (2-tailed)
VAR00002  Group 1   56.00      56   .58              .50          .125(a)
          Group 2   40.00      40   .42
          Total                96   1.00
a. Based on Z Approximation.
p-value $= 2(0.0630) = 0.126$
1.3 Sign test for single population median
Example 1.5
The dean of the School of Business Administration at a particular university would like information about the starting incomes of recent college graduates. A random sample of 23 recent graduates indicated the following starting salaries:
29250 29900 28070 31400 31100 29000 33000 50000 28500 31000
34800 42100 33200 36000 65800 34000 29900 32000 31500 29900
32890 36000 35000
Do the data indicate that the median starting income differs from $35,000?
Solution:
H0: Median = $35,000 vs H1: Median ≠ $35,000
Since the distribution of incomes is often skewed, the sign test is recommended. If the null hypothesis is true, each income has a one-half chance of being greater than $35,000. One observation equals $35,000 and is discarded, so n = 23 - 1 = 22. Let T be the number of incomes below $35,000; then T = 17 (the other 5 incomes are above $35,000).
Solution to Example 1.5
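$np = 0.5n = 0.5(22) = 11$, $\sqrt{np(1-p)} = 0.5\sqrt{22} = 2.345$
$$Z = \frac{T^* - 11}{2.345} = \frac{17 - 0.5 - 11}{2.345} = 2.35$$
p-value $= 2(0.0094) = 0.0188$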
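Working from the raw salaries, the counting and an exact binomial p-value can be sketched in Python (the variable names are illustrative). Observations equal to the hypothesised median are discarded, as in the solution, and the exact value can be compared with the SPSS output that follows.

```python
from math import comb

salaries = [
    29250, 29900, 28070, 31400, 31100, 29000, 33000, 50000, 28500, 31000,
    34800, 42100, 33200, 36000, 65800, 34000, 29900, 32000, 31500, 29900,
    32890, 36000, 35000,
]
m0 = 35000  # hypothesised median under H0

below = sum(1 for s in salaries if s < m0)
above = sum(1 for s in salaries if s > m0)
n = below + above  # incomes equal to m0 carry no sign and are dropped
t = below          # T = number of incomes below $35,000

# Exact two-sided p-value under B(n; 0.5): twice the smaller tail, capped at 1.
k = min(below, above)
p_exact = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2**n)
print(n, t, round(p_exact, 3))  # 22 17 0.017
```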
SPSS output to Example 1.5
Binomial Test
                    Category   N    Observed Prop.   Test Prop.   Exact Sig. (2-tailed)
VAR00001  Group 1   <= 35000   17   .77              .50          .017
          Group 2   > 35000    5    .23
          Total                22   1.00
1.4 Wilcoxon Rank Sum Test
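Testing whether two populations are identical
Take a sample of size $n_1$ from the first population, with distribution function $F_1(x)$, and a sample of size $n_2$ from the second population, with distribution function $F_2(x)$.
We want to test
$H_0: F_1 = F_2$ vs $H_1: F_1 \neq F_2$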
The sign test does not use all the information in the data set.
For the delivery times in Example 1.1, the sign test records only which restaurant was faster in each pair and ignores how large the time differences are. The Wilcoxon rank sum test provides a method to incorporate information about the magnitude of the differences between the two populations.
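The two samples are pooled and sorted in ascending order, and each observation is ranked within the pooled sample; tied observations receive the average of the ranks they occupy.
Let T denote the sum of the ranks of the observations from the first population.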
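The pooling and ranking step can be sketched in Python as follows. `midranks` and `rank_sum` are hypothetical helpers; tied values receive the average of the positions they occupy, which is how the ranks 5.5, 14.5 and 16.5 arise in Example 1.1.

```python
# Illustrative sketch; midranks and rank_sum are hypothetical helpers.


def midranks(values: list[float]) -> list[float]:
    """Ascending ranks 1..N; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the last sorted position holding the same value as position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum(first: list[float], second: list[float]) -> float:
    """Sum of the pooled-sample ranks belonging to the first sample."""
    ranks = midranks(list(first) + list(second))
    return sum(ranks[: len(first)])


local = [16.8, 11.7, 15.6, 16.7, 17.5, 18.1, 14.1, 21.8, 13.9, 20.8]
chain = [22.0, 15.2, 18.7, 15.6, 20.8, 19.5, 17.0, 19.5, 16.5, 24.0]
print(rank_sum(local, chain))  # 86.0
```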
Wilcoxon Rank Sum Test: Example 1.1
Sort the Local data: 11.7, 13.9, 14.1, 15.6, 16.7, 16.8, 17.5, 18.1, 20.8, 21.8
Sort the Chain data: 15.2, 15.6, 16.5, 17.0, 18.7, 19.5, 19.5, 20.8, 22.0, 24.0
Sort the mixed data:
Rank   1     2     3     4     5     6     7     8     9     10
Local  11.7  13.9  14.1        15.6              16.7  16.8
Chain                    15.2        15.6  16.5              17.0
Rank   11    12    13    14    15    16    17    18    19    20
Local  17.5  18.1                    20.8        21.8
Chain              18.7  19.5  19.5        20.8        22.0  24.0
Tied values share the average of the positions they occupy: 15.6 gets rank 5.5, 19.5 gets rank 14.5, and 20.8 gets rank 16.5.
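Sum of the ranks:
$T_{local} = 1 + 2 + 3 + 5.5 + 8 + 9 + 11 + 12 + 16.5 + 18 = 86$
Test statistic:
$T = T_{local} = 86$
Normal approximation:
$$E(T) = \frac{n_1(n_1 + n_2 + 1)}{2} = \frac{10(10 + 10 + 1)}{2} = 105$$
$$Var(T) = \frac{n_1 n_2(n_1 + n_2 + 1)}{12} = \frac{10 \cdot 10(10 + 10 + 1)}{12} = 175$$
$$Z = \frac{86 - 105}{\sqrt{175}} = -1.4363$$
p-value $= 2(0.0755) = 0.151$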
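The normal approximation above can be sketched in Python, assuming no correction for ties (the hand calculation does not use one). `rank_sum_z` is a hypothetical helper returning E(T), Var(T), Z and the two-sided p-value.

```python
import math

# Illustrative sketch; phi and rank_sum_z are hypothetical helpers.


def phi(z: float) -> float:
    """Standard normal distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rank_sum_z(t: float, n1: int, n2: int) -> tuple[float, float, float, float]:
    """Large-sample test for the rank sum T of sample 1 (no tie correction)."""
    mean_t = n1 * (n1 + n2 + 1) / 2
    var_t = n1 * n2 * (n1 + n2 + 1) / 12
    z = (t - mean_t) / math.sqrt(var_t)
    p_two = 2.0 * (1.0 - phi(abs(z)))
    return mean_t, var_t, z, p_two


mean_t, var_t, z, p_two = rank_sum_z(86, 10, 10)  # Example 1.1
print(mean_t, var_t, round(z, 4), round(p_two, 3))  # 105.0 175.0 -1.4363 0.151
```

The same call with t = 7287 and n1 = n2 = 80 reproduces the figures of Example 1.6 (Z of about 2.89).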
SPSS output to Example 1.1
Ranks
        group   N    Mean Rank   Sum of Ranks
time    local   10   8.60        86.00
        chain   10   12.40       124.00
        Total   20
Test Statistics(b)
                                    time
Mann-Whitney U                      31.000
Wilcoxon W                          86.000
Z                                   -1.438
Asymp. Sig. (2-tailed)              .150
Exact Sig. [2*(1-tailed Sig.)]      .165(a)
a. Not corrected for ties.
b. Grouping Variable: group
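Two numbers in this output can be reproduced from the rank sum. The Mann-Whitney U equals W - n1(n1 + 1)/2 = 86 - 55 = 31, and the SPSS Z of -1.438 differs slightly from the hand value -1.4363, presumably because SPSS adjusts Var(T) for the three pairs of tied values. The Python sketch below assumes the usual tie-corrected variance, n1*n2/12 * [(N + 1) - sum(t^3 - t)/(N(N - 1))], where each t is the size of a group of equal values; the function names are hypothetical.

```python
import math
from collections import Counter

# Illustrative sketch; mann_whitney_u and tie_corrected_z are hypothetical helpers.


def mann_whitney_u(w: float, n1: int) -> float:
    """Mann-Whitney U from the Wilcoxon rank sum W of sample 1."""
    return w - n1 * (n1 + 1) / 2


def tie_corrected_z(w: float, n1: int, n2: int, pooled: list[float]) -> float:
    """Normal-approximation Z for the rank sum, with the usual tie correction."""
    n = n1 + n2
    # each group of t equal values contributes t**3 - t (zero for untied values)
    ties = sum(c**3 - c for c in Counter(pooled).values())
    var_w = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    return (w - n1 * (n1 + n2 + 1) / 2) / math.sqrt(var_w)


local = [16.8, 11.7, 15.6, 16.7, 17.5, 18.1, 14.1, 21.8, 13.9, 20.8]
chain = [22.0, 15.2, 18.7, 15.6, 20.8, 19.5, 17.0, 19.5, 16.5, 24.0]
print(mann_whitney_u(86, 10))                                # 31.0
print(round(tie_corrected_z(86, 10, 10, local + chain), 3))  # -1.438
```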
Example 1.6
Solution:
$$E(T) = \frac{80(80 + 80 + 1)}{2} = 6440$$
$$Var(T) = \frac{80 \cdot 80(80 + 80 + 1)}{12} = 85867$$
$$Z = \frac{7287 - 6440}{\sqrt{85867}} = 2.89$$
p-value $= 2(0.0019) = 0.0038$