Module 2: Measures of Location, Dispersion and Partition
MEASURES OF CENTRAL TENDENCY
A set of data has two main features:
(i) Its centre
(ii) The way the data are arranged around this centre
The centre is also referred to as the average of the data. In essence, an average is
a single figure that is representative of all the other figures in the data set.
The individual figures in the data set are expected to cluster around the
average.
The averages are called the measures of central tendency and are very useful in locating
a frequency distribution. As such the measures of central tendency are also called
measures of location.
There are three major types of averages that are often used in Statistics. These are:
(i) Arithmetic Mean (ii) Median (iii) Mode
(i) Arithmetic Mean
The arithmetic mean can be considered under two cases:
Case I: Mean for ungrouped data
The mean $\bar{X}$ of $n$ observations $X_1, X_2, \ldots, X_n$ is given as:
$$\bar{X} = \frac{\sum X_i}{n}$$
Example 3:
Find the mean of the following figures: 75, 65, 69, 45, 67, 64, 68, 71, 83, 72
Solution:
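$$\begin{aligned}
\bar{X} &= \frac{\sum X_i}{n} = \frac{X_1 + X_2 + \cdots + X_{10}}{10} \\
&= \frac{75 + 65 + 69 + 45 + 67 + 64 + 68 + 71 + 83 + 72}{10} \\
&= \frac{679}{10} \\
&= 67.90
\end{aligned}$$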
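The same arithmetic can be checked with a few lines of Python. This is only a sketch; the variable names `figures` and `mean` are chosen here for illustration.

```python
# Figures from Example 3
figures = [75, 65, 69, 45, 67, 64, 68, 71, 83, 72]

# Arithmetic mean: sum of the observations divided by their count
mean = sum(figures) / len(figures)

print(f"sum = {sum(figures)}, n = {len(figures)}, mean = {mean:.2f}")
```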
Case II: Mean for grouped data
We will consider two methods:
Method I: Unassumed Mean Method
This method is also referred to as the LONG METHOD. The mean $\bar{X}$ is given as:
$$\bar{X} = \frac{\sum fX}{\sum f}$$
Method II: Assumed Mean Method
This method is also referred to as the SHORT METHOD. The mean $\bar{X}$ is given as:
$$\bar{X} = A + \frac{\sum fd}{\sum f}$$
Where $d = X_i - A$; $A$ is the assumed mean
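A short derivation, sketched here on the assumption that $X$ denotes the class mark as in the LONG METHOD, suggests why both methods give the same mean whichever value is chosen for $A$. Substituting $X = A + d$:

```latex
\begin{aligned}
\bar{X} &= \frac{\sum fX}{\sum f}
         = \frac{\sum f(A + d)}{\sum f} \\
        &= \frac{A\sum f + \sum fd}{\sum f}
         = A + \frac{\sum fd}{\sum f}
\end{aligned}
```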
Example 4:
The following are the distances in km that Bowen University students have to travel to get to
the University:
21 25 35 47 34 22 55 38 23 32
25 40 33 17 48 38 43 36 13 34
46 27 33 41 32 23 36 35 45 37
33 39 48 33 20 53 31 19 45 35
28 44 35 42 28 50 22 48 36 53
26 38 42 35 54 37 29 34 28 20
39 22 27 57 38 45 29 25 31 35
56 38 41 21 35 55 20 39 43 31
(i) Use the tally method to classify the above data into classes 10-14, 15-19, 20-24, etc.
(ii) Find the arithmetic mean using the long and short methods.
Solution:
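| Class Interval | Tally Mark | f | X | fX | $d = X_i - A$ | fd |
|---|---|---|---|---|---|---|
| 10 – 14 | / | 1 | 12 | 12 | -20 | -20 |
| 15 – 19 | // | 2 | 17 | 34 | -15 | -30 |
| 20 – 24 | ///// ///// | 10 | 22 | 220 | -10 | -100 |
| 25 – 29 | ///// ///// / | 11 | 27 | 297 | -5 | -55 |
| 30 – 34 | ///// ///// // | 12 | 32 | 384 | 0 | 0 |
| 35 – 39 | ///// ///// ///// ///// | 20 | 37 | 740 | 5 | 100 |
| 40 – 44 | ///// /// | 8 | 42 | 336 | 10 | 80 |
| 45 – 49 | ///// /// | 8 | 47 | 376 | 15 | 120 |
| 50 – 54 | //// | 4 | 52 | 208 | 20 | 80 |
| 55 – 59 | //// | 4 | 57 | 228 | 25 | 100 |
| ∑ | | 80 | | 2835 | | 275 |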
(i) Long Method:
$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{2835}{80} = 35.44$$
(ii) Short Method:
$$\bar{X} = A + \frac{\sum fd}{\sum f} = 32 + \frac{275}{80} = 32 + 3.44 = 35.44$$
Note: X, the class mark, is obtained as the midpoint of the class interval. That is,
$$\frac{10+14}{2} = \frac{24}{2} = 12; \qquad \frac{15+19}{2} = \frac{34}{2} = 17.$$
Alternatively, since the class size/width (C) is known to be 5 in this example, after
obtaining the first X, the class size can be added to the previous X. For example, after
obtaining 12 (the first X) add 5, which becomes 17 (the second X), adding another 5
gives 22 (the third X) etc.
A, the assumed mean, is taken to be 32 in this example, but any of the class marks (the
X values) can be used as the assumed mean; the resulting mean is the same.
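The tally and both methods can be reproduced with the following Python sketch. The names `first_limit`, `width`, `lower_limits`, `marks`, `mean_long` and `mean_short` are illustrative choices, not part of the notes, and the code assumes every class from 10-14 to 55-59 contains at least one value (true for this data set).

```python
from collections import Counter

# Distances (km) from Example 4
distances = [
    21, 25, 35, 47, 34, 22, 55, 38, 23, 32,
    25, 40, 33, 17, 48, 38, 43, 36, 13, 34,
    46, 27, 33, 41, 32, 23, 36, 35, 45, 37,
    33, 39, 48, 33, 20, 53, 31, 19, 45, 35,
    28, 44, 35, 42, 28, 50, 22, 48, 36, 53,
    26, 38, 42, 35, 54, 37, 29, 34, 28, 20,
    39, 22, 27, 57, 38, 45, 29, 25, 31, 35,
    56, 38, 41, 21, 35, 55, 20, 39, 43, 31,
]

first_limit, width = 10, 5  # classes 10-14, 15-19, ..., 55-59

# Tally: replace each value by the lower limit of its class, then count
tally = Counter(first_limit + (x - first_limit) // width * width for x in distances)
lower_limits = sorted(tally)
freqs = [tally[lo] for lo in lower_limits]
marks = [lo + (width - 1) / 2 for lo in lower_limits]  # class marks 12, 17, ...

n = sum(freqs)

# Long method: sum of f*X divided by sum of f
mean_long = sum(f * x for f, x in zip(freqs, marks)) / n

# Short method: assumed mean A plus sum of f*d divided by sum of f
A = 32
mean_short = A + sum(f * (x - A) for f, x in zip(freqs, marks)) / n

print(freqs)
print(f"long = {mean_long:.2f}, short = {mean_short:.2f}")
```

Changing `A` to any other class mark leaves `mean_short` unchanged, which is a quick way to check answers to the short method.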
(ii) Median:
The median of a set of data arranged in order of magnitude is the middle term if the
number of terms is odd, and the average of the two middle terms if the number of terms is even.
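For ungrouped data the definition translates directly into code. The sketch below uses a hypothetical helper `median` and, as an assumption made only for illustration, reuses the ten figures from Example 3 for the even case.

```python
def median(values):
    """Median of ungrouped data: middle term, or mean of the two middle terms."""
    ordered = sorted(values)          # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd number of terms
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even number of terms

odd_case = median([3, 1, 2])
even_case = median([75, 65, 69, 45, 67, 64, 68, 71, 83, 72])
print(odd_case, even_case)
```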
Median for Grouped Data
There are two (2) common methods for obtaining the median for grouped data:
(i) Use of formula (ii) Cumulative frequency curve (ogive)
Our focus is on the use of formula. The formula for the median is given as:
$$\text{Median (Md)} = L_m + \left(\frac{\frac{N}{2} - f_b}{f_{Md}}\right) \times C$$
Where:
Lm is the lower limit of the median class
fb is the sum of all frequencies before the median class
fMd is the frequency of the median class
C is the class width/size
N is the total number of items
Example 5:
Using the data in Example 4, find the median.
Solution:
Class Interval f cf
10 – 14 1 1
15 – 19 2 3
20 – 24 10 13
25 – 29 11 24
30 – 34 12 36
35 – 39 20 56
40 – 44 8 64
45 – 49 8 72
50 – 54 4 76
55 – 59 4 80
∑ 80
$$\text{Md} = L_m + \left(\frac{\frac{N}{2} - f_b}{f_{Md}}\right) \times C$$
First, obtain the cumulative frequency (cf) by summing the frequencies (f) one after the
other, till the last. That is, the starting point is 1, then 1+2 = 3; 3+10 = 13; 13+11 = 24, etc.
Then, find $\frac{N}{2}$ to obtain the location of the median class.
$$\frac{N}{2} = \frac{80}{2} = 40$$
Next, go to the cf column to find where 40 falls. That will be contained in 56, since the
previous sum is 36. Tracing that class, we have the median class to be 35 – 39.
Next, obtain:
Lm= 35; fb= 36; fMd = 20; C = 5
$$\therefore \text{Median} = 35 + \left(\frac{40 - 36}{20}\right) \times 5 = 35 + \frac{4}{20} \times 5 = 35 + 1 = 36$$
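The steps of this solution can be sketched in Python as follows. The variable names are illustrative, and the rule "first class whose cumulative frequency reaches N/2" is this sketch's reading of "find where 40 falls" in the cf column.

```python
from itertools import accumulate

# Grouped data from Example 4
lower_limits = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
freqs = [1, 2, 10, 11, 12, 20, 8, 8, 4, 4]
width = 5

cum_freqs = list(accumulate(freqs))   # cf column: 1, 3, 13, 24, ...
n = cum_freqs[-1]
target = n / 2                        # N/2 locates the median class

# Median class: first class whose cumulative frequency reaches N/2
idx = next(i for i, cf in enumerate(cum_freqs) if cf >= target)
f_before = cum_freqs[idx - 1] if idx > 0 else 0   # fb
f_median = freqs[idx]                             # fMd

median = lower_limits[idx] + (target - f_before) / f_median * width
print(f"median class starts at {lower_limits[idx]}, median = {median:.2f}")
```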
(iii) Mode:
The mode of a set of numbers is that value which occurs with the highest frequency; in other words, the mode is the most common value. The mode may not exist, and even if it does exist it may not be unique.
Mode for Grouped Data
The mode for grouped data can be obtained in two ways:
(i) Use of formula (ii) Histogram of distribution
Our focus is on the use of formula. The formula for the mode is given as:
$$\text{Mode} = L_m + \left(\frac{f_m - f_a}{(f_m - f_a) + (f_m - f_b)}\right) \times C = L_m + \left(\frac{f_m - f_a}{2f_m - f_a - f_b}\right) \times C$$
Where: Lm is the lower limit of the modal class
fm is the frequency of the modal class
fa is the frequency of the class immediately above the modal class in the table (the preceding class)
fb is the frequency of the class immediately below the modal class in the table (the following class)
Note: For the calculation of the mode, only actual frequencies are required, not the cumulative frequency.
Example 6:
Given the data below, obtain the mode:
Class Interval f
10 – 14 1
15 – 19 2
20 – 24 10
25 – 29 11
30 – 34 12
35 – 39 20
40 – 44 8
45 – 49 8
50 – 54 4
55 – 59 4
∑ 80
Solution:
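$$\text{Mode} = L_m + \left(\frac{f_m - f_a}{2f_m - f_a - f_b}\right) \times C$$
Where: fm = 20; fa = 12; fb = 8; C = 5; Lm = 35
$$\therefore \text{Mode} = 35 + \left(\frac{20 - 12}{(20 - 12) + (20 - 8)}\right) \times 5 = 35 + \frac{8}{8 + 12} \times 5 = 35 + \frac{8}{20} \times 5 = 35 + 2 = 37$$
OR
$$\therefore \text{Mode} = 35 + \left(\frac{20 - 12}{2(20) - 12 - 8}\right) \times 5 = 35 + \frac{8}{40 - 20} \times 5 = 35 + \frac{8}{20} \times 5 = 35 + 2 = 37$$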
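A Python sketch of the same calculation is given below. The names are illustrative, and treating a missing neighbouring class as frequency 0 is an assumption of this sketch that the notes do not discuss (it is not needed for this data set).

```python
# Grouped data from Example 6
lower_limits = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
freqs = [1, 2, 10, 11, 12, 20, 8, 8, 4, 4]
width = 5

# Modal class: the class with the highest frequency
m = max(range(len(freqs)), key=freqs.__getitem__)
f_m = freqs[m]
f_a = freqs[m - 1] if m > 0 else 0               # class just above in the table
f_b = freqs[m + 1] if m < len(freqs) - 1 else 0  # class just below in the table

mode = lower_limits[m] + (f_m - f_a) / (2 * f_m - f_a - f_b) * width
print(f"modal class starts at {lower_limits[m]}, mode = {mode:.2f}")
```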
Assignment 1:
Using the data from Example 4, obtain the arithmetic mean using the short method with
the following assumed means, A:
(i) Biochemistry Students, A= 12
(ii) Medical Laboratory Science Students, A = 22
(iii) Food Science & Technology Students, A = 37
(iv) Nutrition & Dietetics Students, A = 42
(v) Microbiology Students, A = 47
(vi) Pure and Applied Biology Students, A = 17
MEASURES OF PARTITION
The measures of partition divide a ranked set of data into different parts. The median
for example divides a data set arranged in order of magnitude into two equal parts.
Other measures of partition include:
(i) Quartiles
The quartiles of a ranked set of data are the three values that divide the data set into
four equal parts. These values are denoted by $Q_1$, $Q_2$ and $Q_3$, which are called the first
(lower), second (middle) and third (upper) quartiles respectively. The second quartile is
the median of the data set.
Lower Quartile (𝑸𝟏)
The formula for the lower quartile is given as:
$$Q_1 = L_1 + \left(\frac{\frac{N}{4} - f_b}{f_{Q_1}}\right) \times C$$
Where:
L1 is the lower limit of the lower quartile class
fb is the sum of all frequencies before the lower quartile class
f𝑄1 is the frequency of the lower quartile class
C is the class width/size
N is the total number of items
Middle Quartile (𝑸𝟐)
The formula for the middle quartile is given as:
$$Q_2 = L_2 + \left(\frac{\frac{N}{2} - f_b}{f_{Q_2}}\right) \times C$$
Where:
L2 is the lower limit of the middle quartile class
fb is the sum of all frequencies before the middle quartile class
f𝑄2 is the frequency of the middle quartile class
C is the class width/size
N is the total number of items
Upper Quartile (𝑸𝟑)
The formula for the upper quartile is given as:
$$Q_3 = L_3 + \left(\frac{\frac{3N}{4} - f_b}{f_{Q_3}}\right) \times C$$
Where:
L3 is the lower limit of the upper quartile class
fb is the sum of all frequencies before the upper quartile class
f𝑄3 is the frequency of the upper quartile class
C is the class width/size
N is the total number of items
Interquartile Range (IQR)
IQR is the difference between the upper quartile and lower quartile. That is:
IQR = 𝑄3 − 𝑄1
Semi-Interquartile Range (SIQR)
SIQR is half of the difference between the upper quartile and the lower quartile.
That is:
$$\text{SIQR} = \frac{Q_3 - Q_1}{2}$$
(ii) Deciles:
The deciles are the nine values that divide the ranked data set into ten (10) equal parts. They
are denoted by $D_1, D_2, \ldots, D_9$.
The formula for the 6th decile for example is given as:
$$D_6 = L_6 + \left(\frac{\frac{6N}{10} - f_b}{f_{D_6}}\right) \times C$$
Where:
L6 is the lower limit of the 6th decile class
fb is the sum of all frequencies before the 6th decile class
f𝐷6 is the frequency of the 6th decile class
C is the class width/size
N is the total number of items
(iii) Percentiles:
The percentiles are the ninety-nine values that divide the distribution into one hundred
(100) equal parts. They are denoted by $P_1, P_2, \ldots, P_{99}$.
The formula for the 70th percentile for example is given as:
$$P_{70} = L_{70} + \left(\frac{\frac{70N}{100} - f_b}{f_{P_{70}}}\right) \times C$$
Where:
L70 is the lower limit of the 70th percentile class
fb is the sum of all frequencies before the 70th percentile class
f𝑃70 is the frequency of the 70th percentile class
C is the class width/size
N is the total number of items
Relationship Between Quartiles and Percentiles
First quartile corresponds to 25th percentile
Second quartile corresponds to 50th percentile
Third quartile corresponds to 75th percentile
Note:
Quartiles, deciles, percentiles, and other values obtained by equal partitions of the data
set are collectively called quantiles.
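The quartile, decile and percentile formulas share one pattern. As a compact summary (the symbols $k$, $m$, $L$ and $f$ are introduced here only for illustration and do not appear elsewhere in these notes), the $k$-th of the values that divide the data into $m$ equal parts can be written as:

```latex
\text{Quantile}_{k/m} = L + \left(\frac{\frac{kN}{m} - f_b}{f}\right) \times C
```

Here $m = 4$ gives the quartiles, $m = 10$ the deciles and $m = 100$ the percentiles; $L$, $f_b$ and $f$ are the lower limit of the class containing position $kN/m$, the sum of all frequencies before that class, and the frequency of that class.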
Example 7:
Given the data below, obtain the following: (i) 𝑄1 (ii) 𝑄2 (iii) 𝑄3 (iv) IQR (v) SIQR (vi) 𝐷8 (vii) 𝑃90
Class Interval f cf
10 – 19 2 2
20 – 29 10 12
30 – 39 15 27
40 – 49 18 45
50 – 59 10 55
60 – 69 6 61
70 – 79 6 67
80 – 89 10 77
90 – 99 11 88
100 – 109 12 100
∑ 100
Solution:
(i) 𝑸𝟏
$$Q_1 = L_1 + \left(\frac{\frac{N}{4} - f_b}{f_{Q_1}}\right) \times C$$
$$\frac{N}{4} = \frac{100}{4} = 25$$
Next, go to the cf column to find where 25 falls. That will be contained in 27, since the
previous sum is 12. Tracing that class, we have the lower quartile class to be 30 – 39.
Next, obtain:
$L_1 = 30$; $f_b = 12$; $f_{Q_1} = 15$; $C = 10$
$$\therefore Q_1 = 30 + \left(\frac{25 - 12}{15}\right) \times 10 = 38.67$$
(ii) 𝑸𝟐
$$Q_2 = L_2 + \left(\frac{\frac{N}{2} - f_b}{f_{Q_2}}\right) \times C$$
$$\frac{N}{2} = \frac{100}{2} = 50$$
Next, go to the cf column to find where 50 falls. That will be contained in 55, since the
previous sum is 45. Tracing that class, we have the middle quartile class to be 50 – 59.
Next, obtain:
$L_2 = 50$; $f_b = 45$; $f_{Q_2} = 10$; $C = 10$
$$\therefore Q_2 = 50 + \left(\frac{50 - 45}{10}\right) \times 10 = 55$$
(iii) 𝑸𝟑
$$Q_3 = L_3 + \left(\frac{\frac{3N}{4} - f_b}{f_{Q_3}}\right) \times C$$
$$\frac{3N}{4} = \frac{3 \times 100}{4} = 75$$
Next, go to the cf column to find where 75 falls. That will be contained in 77, since the
previous sum is 67. Tracing that class, we have the upper quartile class to be 80 – 89.
Next, obtain:
$L_3 = 80$; $f_b = 67$; $f_{Q_3} = 10$; $C = 10$
$$\therefore Q_3 = 80 + \left(\frac{75 - 67}{10}\right) \times 10 = 88$$
(iv) IQR
IQR = 88 – 38.67 = 49.33
(v) SIQR
$$\text{SIQR} = \frac{88 - 38.67}{2} = \frac{49.33}{2} = 24.67$$
(vi) 𝑫𝟖
$$D_8 = L_8 + \left(\frac{\frac{8N}{10} - f_b}{f_{D_8}}\right) \times C$$
$$\frac{8N}{10} = \frac{8 \times 100}{10} = 80$$
Next, go to the cf column to find where 80 falls. That will be contained in 88, since the
previous sum is 77. Tracing that class, we have the 8th decile class to be 90 – 99.
Next, obtain:
$L_8 = 90$; $f_b = 77$; $f_{D_8} = 11$; $C = 10$
$$\therefore D_8 = 90 + \left(\frac{80 - 77}{11}\right) \times 10 = 92.73$$
(vii) 𝑷𝟗𝟎
$$P_{90} = L_{90} + \left(\frac{\frac{90N}{100} - f_b}{f_{P_{90}}}\right) \times C$$
$$\frac{90N}{100} = \frac{90 \times 100}{100} = 90$$
Next, go to the cf column to find where 90 falls. That will be contained in 100, since the
previous sum is 88. Tracing that class, we have the 90th percentile class to be 100 – 109.
Next, obtain:
$L_{90} = 100$; $f_b = 88$; $f_{P_{90}} = 12$; $C = 10$
$$\therefore P_{90} = 100 + \left(\frac{90 - 88}{12}\right) \times 10 = 101.67$$
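All seven answers in Example 7 follow the same steps, so they can be checked with one small function. The helper `grouped_quantile` and its parameters `k` and `parts` are hypothetical names introduced for this sketch; it locates the class as the first one whose cumulative frequency reaches kN/parts.

```python
from itertools import accumulate

def grouped_quantile(k, parts, lower_limits, freqs, width):
    """k-th of the values dividing grouped data into `parts` equal parts."""
    cum = list(accumulate(freqs))            # cf column
    target = k * cum[-1] / parts             # kN/parts
    idx = next(i for i, cf in enumerate(cum) if cf >= target)
    f_before = cum[idx - 1] if idx > 0 else 0
    return lower_limits[idx] + (target - f_before) / freqs[idx] * width

# Grouped data from Example 7
lower_limits = list(range(10, 110, 10))      # 10, 20, ..., 100
freqs = [2, 10, 15, 18, 10, 6, 6, 10, 11, 12]
width = 10

q1 = grouped_quantile(1, 4, lower_limits, freqs, width)
q2 = grouped_quantile(2, 4, lower_limits, freqs, width)
q3 = grouped_quantile(3, 4, lower_limits, freqs, width)
iqr = q3 - q1
siqr = iqr / 2
d8 = grouped_quantile(8, 10, lower_limits, freqs, width)
p90 = grouped_quantile(90, 100, lower_limits, freqs, width)

for name, value in [("Q1", q1), ("Q2", q2), ("Q3", q3), ("IQR", iqr),
                    ("SIQR", siqr), ("D8", d8), ("P90", p90)]:
    print(f"{name} = {value:.2f}")
```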
MEASURES OF DISPERSION
Measures of dispersion measure how spread out a set of data is. Measures of dispersion
are also referred to as measures of spread. The following measures of dispersion will be
considered:
(i) Range
(ii) Mean deviation
(iii) Variance
(iv) Standard deviation
(v) Coefficient of variation
Range
The range is the highest value minus the lowest value in a given distribution.
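As a minimal illustration, the ten figures from Example 3 are reused here (an assumption of this sketch, since the notes give no worked example for the range):

```python
# Figures from Example 3, reused for illustration
figures = [75, 65, 69, 45, 67, 64, 68, 71, 83, 72]

# Range: highest value minus lowest value
data_range = max(figures) - min(figures)
print(data_range)
```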
Mean Deviation (MD)
The mean deviation is the average of the deviations of the numbers from the mean of
the distribution ignoring all negative signs.
Case I: Mean Deviation for Ungrouped Data
The formula for mean deviation for ungrouped data is given as follows:
$$\text{MD} = \frac{\sum |X_i - \bar{X}|}{n}$$
Case II: Mean Deviation for Grouped Data
The formula for mean deviation for grouped data is given as follows:
$$\text{MD} = \frac{\sum f|X_i - \bar{X}|}{\sum f}$$
Variance
The variance is the sum of the squared deviations from the mean divided by the size of
the population.
$\sigma^2$ (sigma squared) represents the population variance and $S^2$ represents the sample
variance. For the sample variance, the divisor is $n - 1$ rather than $n$.
Note:
The sample variance is used as an estimate when the exact population variance is not
known.
We will consider the following formulae for the variance:
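| Method | Variance for ungrouped data | Variance for grouped data |
|---|---|---|
| I | $S^2 = \dfrac{\sum (X_i - \bar{X})^2}{n - 1}$ | $S^2 = \dfrac{\sum f(X_i - \bar{X})^2}{\sum f - 1}$ |
| II | $S^2 = \dfrac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$ | $S^2 = \dfrac{\sum fX^2 - \frac{(\sum fX)^2}{\sum f}}{\sum f - 1}$ |
| III | $S^2 = \dfrac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}$ | $S^2 = \dfrac{\sum fd^2 - \frac{(\sum fd)^2}{\sum f}}{\sum f - 1}$ |

Where $d = X_i - A$; $A$ is the assumed mean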
Standard Deviation
Standard deviation is the square root of the variance. In essence, the variance is the
square of the standard deviation.
Coefficient of Variation (CV)
Coefficient of variation represents the ratio of the standard deviation to the mean. It is
very useful for comparing the degree of variation from one data series to another. The
formula for the CV is given by:
$$\text{CV} = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\%$$
Example 8
Given the data below, find:
(i) Mean deviation (ii) Variance (iii) Standard deviation (iv) Coefficient of variation
Class Interval f
10 – 19 2
20 – 29 10
30 – 39 15
40 – 49 18
50 – 59 10
60 – 69 6
70 – 79 6
80 – 89 10
90 – 99 11
100 – 109 12
∑ 100
Solution
(i) Mean Deviation
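| Class Interval | f | X | fX | $X - \bar{X}$ | $\lvert X - \bar{X}\rvert$ | $f\lvert X - \bar{X}\rvert$ |
|---|---|---|---|---|---|---|
| 10 – 19 | 2 | 14.5 | 29 | -46.6 | 46.6 | 93.2 |
| 20 – 29 | 10 | 24.5 | 245 | -36.6 | 36.6 | 366 |
| 30 – 39 | 15 | 34.5 | 517.5 | -26.6 | 26.6 | 399 |
| 40 – 49 | 18 | 44.5 | 801 | -16.6 | 16.6 | 298.8 |
| 50 – 59 | 10 | 54.5 | 545 | -6.6 | 6.6 | 66 |
| 60 – 69 | 6 | 64.5 | 387 | 3.4 | 3.4 | 20.4 |
| 70 – 79 | 6 | 74.5 | 447 | 13.4 | 13.4 | 80.4 |
| 80 – 89 | 10 | 84.5 | 845 | 23.4 | 23.4 | 234 |
| 90 – 99 | 11 | 94.5 | 1039.5 | 33.4 | 33.4 | 367.4 |
| 100 – 109 | 12 | 104.5 | 1254 | 43.4 | 43.4 | 520.8 |
| ∑ | 100 | | 6110 | | | 2446 |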
First, calculate the mean
$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{6110}{100} = 61.10$$
Then, calculate the mean deviation
$$\text{MD} = \frac{\sum f|X_i - \bar{X}|}{\sum f} = \frac{2446}{100} = 24.46$$
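The table and the two results can be reproduced with a short Python sketch. The names are illustrative; the class marks are taken as the midpoints of the class limits, as in the table.

```python
# Grouped data from Example 8
lower_limits = list(range(10, 110, 10))           # 10, 20, ..., 100
freqs = [2, 10, 15, 18, 10, 6, 6, 10, 11, 12]
marks = [lo + 4.5 for lo in lower_limits]         # class marks 14.5, 24.5, ...

n = sum(freqs)

# Mean: sum of f*X divided by sum of f
mean = sum(f * x for f, x in zip(freqs, marks)) / n

# Mean deviation: sum of f*|X - mean| divided by sum of f
mean_dev = sum(f * abs(x - mean) for f, x in zip(freqs, marks)) / n

print(f"mean = {mean:.2f}, MD = {mean_dev:.2f}")
```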
(ii) Variance
| Class Interval | f | X | fX | $X - \bar{X}$ | $(X - \bar{X})^2$ | $f(X - \bar{X})^2$ | $X^2$ | $fX^2$ |
|---|---|---|---|---|---|---|---|---|
| 10 – 19 | 2 | 14.5 | 29 | -46.6 | 2171.56 | 4343.12 | 210.25 | 420.5 |
| 20 – 29 | 10 | 24.5 | 245 | -36.6 | 1339.56 | 13395.60 | 600.25 | 6002.5 |
| 30 – 39 | 15 | 34.5 | 517.5 | -26.6 | 707.56 | 10613.40 | 1190.25 | 17853.75 |
| 40 – 49 | 18 | 44.5 | 801 | -16.6 | 275.56 | 4960.08 | 1980.25 | 35644.5 |
| 50 – 59 | 10 | 54.5 | 545 | -6.6 | 43.56 | 435.60 | 2970.25 | 29702.5 |
| 60 – 69 | 6 | 64.5 | 387 | 3.4 | 11.56 | 69.36 | 4160.25 | 24961.5 |
| 70 – 79 | 6 | 74.5 | 447 | 13.4 | 179.56 | 1077.36 | 5550.25 | 33301.5 |
| 80 – 89 | 10 | 84.5 | 845 | 23.4 | 547.56 | 5475.60 | 7140.25 | 71402.5 |
| 90 – 99 | 11 | 94.5 | 1039.5 | 33.4 | 1115.56 | 12271.16 | 8930.25 | 98232.75 |
| 100 – 109 | 12 | 104.5 | 1254 | 43.4 | 1883.56 | 22602.72 | 10920.25 | 131043 |
| ∑ | 100 | | 6110 | | | 75244 | | 448565 |
Using Method I:
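$$S^2 = \frac{\sum f(X_i - \bar{X})^2}{\sum f - 1} = \frac{75244}{100 - 1} = \frac{75244}{99} = 760.04$$
Using Method II:
$$S^2 = \frac{\sum fX^2 - \frac{(\sum fX)^2}{\sum f}}{\sum f - 1} = \frac{448565 - \frac{6110^2}{100}}{100 - 1} = \frac{448565 - 373321}{99} = \frac{75244}{99} = 760.04$$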
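That Methods I and II agree can also be checked numerically. The sketch below uses illustrative names and covers only these two methods.

```python
# Grouped data from Example 8
freqs = [2, 10, 15, 18, 10, 6, 6, 10, 11, 12]
marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5, 104.5]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, marks))        # sum of f*X
sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))   # sum of f*X^2
mean = sum_fx / n

# Method I: squared deviations from the mean
var_method_1 = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / (n - 1)

# Method II: computational form using sum of f*X^2 and sum of f*X
var_method_2 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

print(f"Method I = {var_method_1:.2f}, Method II = {var_method_2:.2f}")
```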
Exercise
Use Method III to calculate the variance.
(iii) Standard Deviation (SD)
$$\text{SD} = \sqrt{\text{Variance}} = \sqrt{760.04} = 27.57$$
(iv) Coefficient of Variation (CV)
$$\text{CV} = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\% = \frac{27.57}{61.10} \times 100\% = 45.12\%$$
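The last two results follow directly from the variance and the mean. A minimal sketch, taking the variance 75244/99 and the mean 61.10 from the working above (variable names are illustrative):

```python
import math

variance = 75244 / 99    # sample variance from part (ii)
mean = 61.10             # mean from part (i)

sd = math.sqrt(variance)     # standard deviation
cv = sd / mean * 100         # coefficient of variation, in percent

print(f"SD = {sd:.2f}, CV = {cv:.2f}%")
```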
Assignment 2:
Given the following weights of patients:
68 84 75 82 68 90 62 88 76 93
86 67 73 81 72 63 76 75 85 77
73 79 88 73 60 93 71 59 85 75
66 78 82 75 94 77 69 74 68 60
61 65 75 87 74 62 95 78 63 72
79 62 67 97 78 85 69 65 71 75
96 78 81 61 75 95 60 79 83 71
65 80 73 57 88 78 51 76 53 74
(i) Use the tally method to classify the above data into classes 50-54, 55-59, 60-64, etc.
(ii) Compute the following:
(a) Mode (b) Median (c) Interquartile Range (d) 7th decile (e) 85th percentile (f) Coefficient of variation