
Module 2: Measures of Location, Dispersion and Partition

MEASURES OF CENTRAL TENDENCY

A set of data has two main features:

(i) Its centre

(ii) The way it is arranged around this centre

The “centre” is also referred to as the average of the data. In essence, an average is a single figure that is representative of all the other figures in the data set. For an average to be meaningful, the individual figures in the data set should tend to cluster around it.

Averages are called measures of central tendency. Because they are useful in locating a frequency distribution, they are also called measures of location.

There are three major types of averages that are often used in Statistics. These are:

(i) Arithmetic Mean (ii) Median (iii) Mode

(i) Arithmetic Mean

The arithmetic mean can be considered under two cases:

Case I: Mean for ungrouped data

The mean $\bar{X}$ is given as:

$$\bar{X} = \frac{\sum X_i}{n}$$

where $n$ is the number of observations.

Example 3:

Find the mean of the following figures: 75, 65, 69, 45, 67, 64, 68, 71, 83, 72

Solution:

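$$
\begin{aligned}
\bar{X} &= \frac{\sum X_i}{n} = \frac{X_1 + X_2 + \cdots + X_{10}}{10} \\
&= \frac{75 + 65 + 69 + 45 + 67 + 64 + 68 + 71 + 83 + 72}{10} \\
&= \frac{679}{10} \\
&= 67.90
\end{aligned}
$$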
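The calculation in Example 3 can be checked with a few lines of Python. This is only a sketch; the variable names `values` and `mean` are illustrative and not part of the original notes.

```python
# Figures from Example 3 (ten ungrouped observations).
values = [75, 65, 69, 45, 67, 64, 68, 71, 83, 72]

# Arithmetic mean: sum of the observations divided by their count.
mean = sum(values) / len(values)

print(f"sum = {sum(values)}, n = {len(values)}, mean = {mean:.2f}")
# prints: sum = 679, n = 10, mean = 67.90
```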


Case II: Mean for grouped data

We will consider two methods:

Method I: Unassumed Mean Method

This method is also referred to as the LONG METHOD. The mean $\bar{X}$ is given as:

$$\bar{X} = \frac{\sum fX}{\sum f}$$

Method II: Assumed Mean Method

This method is also referred to as the SHORT METHOD. The mean $\bar{X}$ is given as:

$$\bar{X} = A + \frac{\sum fd}{\sum f}$$

where $d = X_i - A$ and $A$ is the assumed mean.

Example 4:

The following are the distances in km that Bowen University students have to travel to get to

the University:

21 25 35 47 34 22 55 38 23 32

25 40 33 17 48 38 43 36 13 34

46 27 33 41 32 23 36 35 45 37

33 39 48 33 20 53 31 19 45 35

28 44 35 42 28 50 22 48 36 53

26 38 42 35 54 37 29 34 28 20

39 22 27 57 38 45 29 25 31 35

56 38 41 21 35 55 20 39 43 31

(i) Use tally method to classify the above data into classes 10-14, 15-19, 20-24 etc.

(ii) Find the arithmetic mean using the long and short methods.

Solution:

| Class Interval | Tally | f | X | fX | d = X − A | fd |
|---|---|---|---|---|---|---|
| 10 – 14 | / | 1 | 12 | 12 | −20 | −20 |
| 15 – 19 | // | 2 | 17 | 34 | −15 | −30 |
| 20 – 24 | ///// ///// | 10 | 22 | 220 | −10 | −100 |
| 25 – 29 | ///// ///// / | 11 | 27 | 297 | −5 | −55 |
| 30 – 34 | ///// ///// // | 12 | 32 | 384 | 0 | 0 |
| 35 – 39 | ///// ///// ///// ///// | 20 | 37 | 740 | 5 | 100 |
| 40 – 44 | ///// /// | 8 | 42 | 336 | 10 | 80 |
| 45 – 49 | ///// /// | 8 | 47 | 376 | 15 | 120 |
| 50 – 54 | //// | 4 | 52 | 208 | 20 | 80 |
| 55 – 59 | //// | 4 | 57 | 228 | 25 | 100 |
| ∑ | | 80 | | 2835 | | 275 |
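The tally step can also be done programmatically. The sketch below assumes classes of width 5 starting at 10, as in the question; the names `class_lower_limit`, `START` and `WIDTH` are illustrative only.

```python
from collections import Counter

# Distances (km) from Example 4, row by row.
distances = [
    21, 25, 35, 47, 34, 22, 55, 38, 23, 32,
    25, 40, 33, 17, 48, 38, 43, 36, 13, 34,
    46, 27, 33, 41, 32, 23, 36, 35, 45, 37,
    33, 39, 48, 33, 20, 53, 31, 19, 45, 35,
    28, 44, 35, 42, 28, 50, 22, 48, 36, 53,
    26, 38, 42, 35, 54, 37, 29, 34, 28, 20,
    39, 22, 27, 57, 38, 45, 29, 25, 31, 35,
    56, 38, 41, 21, 35, 55, 20, 39, 43, 31,
]

START = 10  # lower limit of the first class (10-14)
WIDTH = 5   # class width

def class_lower_limit(x, start=START, width=WIDTH):
    """Return the lower limit of the class that contains x."""
    return start + ((x - start) // width) * width

# Count how many observations fall in each class.
counts = Counter(class_lower_limit(x) for x in distances)

for lower in sorted(counts):
    print(f"{lower} - {lower + WIDTH - 1}: {counts[lower]}")
```

The printed frequencies should agree with the f column of the table above.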

(i) Long Method:

$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{2835}{80} = 35.44$$

(ii) Short Method:

$$\bar{X} = A + \frac{\sum fd}{\sum f} = 32 + \frac{275}{80} = 32 + 3.44 = 35.44$$
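Both methods can be reproduced in a short Python sketch. The function names `mean_long` and `mean_short` are illustrative; the class marks and frequencies are taken from the Example 4 table.

```python
# Class marks (X) and frequencies (f) from the Example 4 table.
marks = [12, 17, 22, 27, 32, 37, 42, 47, 52, 57]
freqs = [1, 2, 10, 11, 12, 20, 8, 8, 4, 4]

def mean_long(marks, freqs):
    """Long method: sum(fX) / sum(f)."""
    return sum(f * x for x, f in zip(marks, freqs)) / sum(freqs)

def mean_short(marks, freqs, assumed):
    """Short method: A + sum(fd) / sum(f), where d = X - A."""
    total_fd = sum(f * (x - assumed) for x, f in zip(marks, freqs))
    return assumed + total_fd / sum(freqs)

print(round(mean_long(marks, freqs), 2))        # 35.44
print(round(mean_short(marks, freqs, 32), 2))   # 35.44
```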

Note: X, the class mark, is obtained as the midpoint of the class interval. That is,

$$\frac{10 + 14}{2} = \frac{24}{2} = 12; \qquad \frac{15 + 19}{2} = \frac{34}{2} = 17.$$

Alternatively, since the class size/width (C) is known to be 5 in this example, after

obtaining the first X, the class size can be added to the previous X. For example, after

obtaining 12 (the first X) add 5, which becomes 17 (the second X), adding another 5

gives 22 (the third X) etc.

A, the assumed mean in this example is taken to be 32. But any of the values of the

Xs can be taken and used for the calculation.
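The reason the choice of A does not affect the result can be seen by expanding the short-method formula. The short derivation below is an added illustration, using only the symbols defined above:

```latex
\bar{X} = A + \frac{\sum f d}{\sum f}
        = A + \frac{\sum f (X - A)}{\sum f}
        = A + \frac{\sum fX}{\sum f} - \frac{A \sum f}{\sum f}
        = \frac{\sum fX}{\sum f}
```

The assumed mean cancels out, so the short method always agrees with the long method. A well-chosen A simply keeps the numbers in the fd column small.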

(ii) Median:

The median of a set of data arranged in order of magnitude is the middle term if the number of terms is odd, and the average of the two middle terms if the number of terms is even.
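As a small illustration of this rule, the sketch below applies it to the ten figures from Example 3 (an even count) and to the same figures with the last one dropped (an odd count). Reusing the Example 3 data here is an assumption made for illustration; the function name `median` is also illustrative.

```python
def median(values):
    """Middle term for an odd count, average of the two middle terms otherwise."""
    ordered = sorted(values)          # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

values = [75, 65, 69, 45, 67, 64, 68, 71, 83, 72]

print(median(values))        # even count (10 terms): 68.5
print(median(values[:-1]))   # odd count (9 terms): 68
```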

Median for Grouped Data

There are two (2) common methods for obtaining the median for grouped data:

(i) Use of formula (ii) Cumulative frequency curve (ogive)

Our focus is on the use of formula. The formula for the median is given as:

$$\text{Median (Md)} = L_m + \left(\frac{\frac{N}{2} - f_b}{f_{Md}}\right) \times C$$

Where:

Lm is the lower limit of the median class


fb is the sum of all frequencies before the median class

fMd is the frequency of the median class

C is the class width/size

N is the total number of items

Example 5:

Using the data in Example 4, find the median.

Solution:

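| Class Interval | f | cf |
|---|---|---|
| 10 – 14 | 1 | 1 |
| 15 – 19 | 2 | 3 |
| 20 – 24 | 10 | 13 |
| 25 – 29 | 11 | 24 |
| 30 – 34 | 12 | 36 |
| 35 – 39 | 20 | 56 |
| 40 – 44 | 8 | 64 |
| 45 – 49 | 8 | 72 |
| 50 – 54 | 4 | 76 |
| 55 – 59 | 4 | 80 |
| ∑ | 80 | |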

$$\text{Median (Md)} = L_m + \left(\frac{\frac{N}{2} - f_b}{f_{Md}}\right) \times C$$

First, obtain the cumulative frequency (cf) by summing the frequencies (f) one after the

other, till the last. That is, starting point is 1, then 1+2 = 3; 3+10 = 13; 13+11 = 24 etc.

Then, find $\frac{N}{2}$ to obtain the location of the median class:

$$\frac{N}{2} = \frac{80}{2} = 40$$

Next, go to the cf column to find where 40 falls. It falls in the class whose cumulative frequency is 56, since the previous cumulative frequency is 36. Tracing that class, we have the median class to be 35 – 39.

Next, obtain:

Lm= 35; fb= 36; fMd = 20; C = 5

$$
\begin{aligned}
\therefore \text{Median} &= 35 + \left(\frac{40 - 36}{20}\right) \times 5 \\
&= 35 + \left(\frac{4}{20}\right) \times 5 \\
&= 35 + 1 \\
&= 36
\end{aligned}
$$
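The steps of Example 5 can be sketched in Python as follows. The function name `grouped_median` is illustrative. Taking the median class as the first class whose cumulative frequency is at least N/2 is an assumption about how ties are handled, since the notes do not cover the case where N/2 equals a cumulative frequency exactly.

```python
from itertools import accumulate

def grouped_median(lower_limits, freqs, width):
    """Median of grouped data: Lm + ((N/2 - fb) / fMd) * C."""
    total = sum(freqs)                      # N
    target = total / 2                      # N / 2
    cumulative = list(accumulate(freqs))    # cf column
    for i, cf in enumerate(cumulative):
        if cf >= target:                    # first class reaching N/2 (assumed tie rule)
            before = cumulative[i - 1] if i > 0 else 0   # fb
            return lower_limits[i] + (target - before) / freqs[i] * width
    raise ValueError("frequencies must not be empty")

lower_limits = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
freqs = [1, 2, 10, 11, 12, 20, 8, 8, 4, 4]

print(grouped_median(lower_limits, freqs, 5))   # 36.0
```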

(iii) Mode:

The mode of a set of numbers is the value that occurs with the highest frequency; in other words, the mode is the most common value. The mode may not exist, and even if it does exist it may not be unique.

Mode for Grouped Data

The mode for grouped data can be obtained in two ways:

(i) Use of formula (ii) Histogram of distribution

Our focus is on the use of formula. The formula for the mode is given as:

$$\text{Mode} = L_m + \left(\frac{f_m - f_a}{(f_m - f_a) + (f_m - f_b)}\right) \times C = L_m + \left(\frac{f_m - f_a}{2f_m - f_a - f_b}\right) \times C$$

Where:

Lm is the lower limit of the modal class

fm is the frequency of the modal class

fa is the frequency of the class immediately above (before) the modal class in the table

fb is the frequency of the class immediately below (after) the modal class in the table

C is the class width/size

Note: For the calculation of the mode, only actual frequencies are required, not the cumulative frequency.

Example 6:

Given the data below, obtain the mode:

| Class Interval | f |
|---|---|
| 10 – 14 | 1 |
| 15 – 19 | 2 |
| 20 – 24 | 10 |
| 25 – 29 | 11 |
| 30 – 34 | 12 |
| 35 – 39 | 20 |
| 40 – 44 | 8 |
| 45 – 49 | 8 |
| 50 – 54 | 4 |
| 55 – 59 | 4 |
| ∑ | 80 |

Solution:

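$$\text{Mode} = L_m + \left(\frac{f_m - f_a}{2f_m - f_a - f_b}\right) \times C$$

Where: fm = 20; fa = 12; fb = 8; C = 5; Lm = 35

$$
\begin{aligned}
\therefore \text{Mode} &= 35 + \left(\frac{20 - 12}{(20 - 12) + (20 - 8)}\right) \times 5 \\
&= 35 + \left(\frac{8}{8 + 12}\right) \times 5 \\
&= 35 + \left(\frac{8}{20}\right) \times 5 \\
&= 35 + 2 \\
&= 37
\end{aligned}
$$

OR

$$
\begin{aligned}
\therefore \text{Mode} &= 35 + \left(\frac{20 - 12}{2(20) - 12 - 8}\right) \times 5 \\
&= 35 + \left(\frac{8}{40 - 20}\right) \times 5 \\
&= 35 + \left(\frac{8}{20}\right) \times 5 \\
&= 35 + 2 \\
&= 37
\end{aligned}
$$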
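The mode calculation can be sketched in Python in the same way. The function name `grouped_mode` is illustrative. Two choices here are assumptions not stated in the notes: if several classes share the highest frequency the first one is used, and a missing neighbouring class (at either end of the table) is treated as having frequency 0.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode of grouped data: Lm + ((fm - fa) / (2*fm - fa - fb)) * C."""
    # Index of the modal class (first class with the highest frequency).
    i = max(range(len(freqs)), key=lambda j: freqs[j])
    fm = freqs[i]
    fa = freqs[i - 1] if i > 0 else 0                 # class just before
    fb = freqs[i + 1] if i < len(freqs) - 1 else 0    # class just after
    return lower_limits[i] + (fm - fa) / (2 * fm - fa - fb) * width

lower_limits = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
freqs = [1, 2, 10, 11, 12, 20, 8, 8, 4, 4]

print(grouped_mode(lower_limits, freqs, 5))   # 37.0
```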


Assignment 1:

Using the data from Example 4, obtain the arithmetic mean using the short method with

the following assumed means, A:

(i) Biochemistry Students, A= 12

(ii) Medical Laboratory Science Students, A = 22

(iii) Food Science & Technology Students, A = 37

(iv) Nutrition & Dietetics Students, A = 42

(v) Microbiology Students A = 47

(vi) Pure and Applied Biology Students, A = 17


MEASURES OF PARTITION

The measures of partition divide a ranked set of data into different parts. The median

for example divides a data set arranged in order of magnitude into two equal parts.

Other measures of partition include:

(i) Quartiles

The quartiles of a ranked set of data are the three values that divide the data set into four equal parts. These values are denoted by $Q_1$, $Q_2$ and $Q_3$, which are called the first (lower), second (middle) and third (upper) quartiles respectively. The second quartile is the median of the data set.

Lower Quartile (𝑸𝟏)

The formula for the lower quartile is given as:

$$Q_1 = L_1 + \left(\frac{\frac{N}{4} - f_b}{f_{Q_1}}\right) \times C$$

Where:

L1 is the lower limit of the lower quartile class

fb is the sum of all frequencies before the lower quartile class

f𝑄1 is the frequency of the lower quartile class

C is the class width/size

N is the total number of items

Middle Quartile (𝑸𝟐)

The formula for the middle quartile is given as:

$$Q_2 = L_2 + \left(\frac{\frac{N}{2} - f_b}{f_{Q_2}}\right) \times C$$

Where:

L2 is the lower limit of the middle quartile class

fb is the sum of all frequencies before the middle quartile class

f𝑄2 is the frequency of the middle quartile class


C is the class width/size

N is the total number of items

Upper Quartile (𝑸𝟑)

The formula for the upper quartile is given as:

$$Q_3 = L_3 + \left(\frac{\frac{3N}{4} - f_b}{f_{Q_3}}\right) \times C$$

Where:

L3 is the lower limit of the upper quartile class

fb is the sum of all frequencies before the upper quartile class

f𝑄3 is the frequency of the upper quartile class

C is the class width/size

N is the total number of items

Interquartile Range (IQR)

IQR is the difference between the upper quartile and lower quartile. That is:

IQR = 𝑄3 − 𝑄1

Semi-Interquartile Range (SIQR)

SIQR is half the difference between the upper quartile and the lower quartile. That is:

$$\text{SIQR} = \frac{Q_3 - Q_1}{2}$$

(ii) Deciles:

The deciles are the nine values that divide the ranked data set into ten (10) equal parts. They are denoted by $D_1, D_2, \ldots, D_9$.

The formula for the 6th decile, for example, is given as:

$$D_6 = L_6 + \left(\frac{\frac{6N}{10} - f_b}{f_{D_6}}\right) \times C$$

Where:


L6 is the lower limit of the 6th decile class

fb is the sum of all frequencies before the 6th decile class

f𝐷6 is the frequency of the 6th decile class

C is the class width/size

N is the total number of items

(iii) Percentiles:

The percentiles are the ninety-nine values that divide the distribution into one hundred (100) equal parts. They are denoted by $P_1, P_2, \ldots, P_{99}$.

The formula for the 70th percentile, for example, is given as:

$$P_{70} = L_{70} + \left(\frac{\frac{70N}{100} - f_b}{f_{P_{70}}}\right) \times C$$

Where:

L70 is the lower limit of the 70th percentile class

fb is the sum of all frequencies before the 70th percentile class

f𝑃70 is the frequency of the 70th percentile class

C is the class width/size

N is the total number of items

Relationship Between Quartiles and Percentiles

First quartile corresponds to 25th percentile

Second quartile corresponds to 50th percentile

Third quartile corresponds to 75th percentile

Note:

Quartiles, deciles, percentiles, and other values obtained by equal partitions of the data

set are collectively called quantiles.


Example 7:

Given the data below, obtain the following: (i) 𝑄1 (ii) 𝑄2 (iii) 𝑄3 (iv) IQR (v) SIQR (vi) 𝐷8 (vii) 𝑃90

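| Class Interval | f | cf |
|---|---|---|
| 10 – 19 | 2 | 2 |
| 20 – 29 | 10 | 12 |
| 30 – 39 | 15 | 27 |
| 40 – 49 | 18 | 45 |
| 50 – 59 | 10 | 55 |
| 60 – 69 | 6 | 61 |
| 70 – 79 | 6 | 67 |
| 80 – 89 | 10 | 77 |
| 90 – 99 | 11 | 88 |
| 100 – 109 | 12 | 100 |
| ∑ | 100 | |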

Solution:

(i) 𝑸𝟏

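$$Q_1 = L_1 + \left(\frac{\frac{N}{4} - f_b}{f_{Q_1}}\right) \times C$$

$$\frac{N}{4} = \frac{100}{4} = 25$$

Next, go to the cf column to find where 25 falls. It falls in the class whose cumulative frequency is 27, since the previous cumulative frequency is 12. Tracing that class, we have the lower quartile class to be 30 – 39.

Next, obtain:

$L_1 = 30$; $f_b = 12$; $f_{Q_1} = 15$; $C = 10$

$$\therefore Q_1 = 30 + \left(\frac{25 - 12}{15}\right) \times 10 = 38.67$$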

(ii) 𝑸𝟐

$$Q_2 = L_2 + \left(\frac{\frac{N}{2} - f_b}{f_{Q_2}}\right) \times C$$

$$\frac{N}{2} = \frac{100}{2} = 50$$

Next, go to the cf column to find where 50 falls. It falls in the class whose cumulative frequency is 55, since the previous cumulative frequency is 45. Tracing that class, we have the middle quartile class to be 50 – 59.

Next, obtain:

$L_2 = 50$; $f_b = 45$; $f_{Q_2} = 10$; $C = 10$

$$\therefore Q_2 = 50 + \left(\frac{50 - 45}{10}\right) \times 10 = 55$$

(iii) 𝑸𝟑

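$$Q_3 = L_3 + \left(\frac{\frac{3N}{4} - f_b}{f_{Q_3}}\right) \times C$$

$$\frac{3N}{4} = \frac{3 \times 100}{4} = 75$$

Next, go to the cf column to find where 75 falls. It falls in the class whose cumulative frequency is 77, since the previous cumulative frequency is 67. Tracing that class, we have the upper quartile class to be 80 – 89.

Next, obtain:

$L_3 = 80$; $f_b = 67$; $f_{Q_3} = 10$; $C = 10$

$$\therefore Q_3 = 80 + \left(\frac{75 - 67}{10}\right) \times 10 = 88$$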

(iv) IQR

IQR = 88 – 38.67 = 49.33

(v) SIQR

$$\text{SIQR} = \frac{88 - 38.67}{2} = \frac{49.33}{2} = 24.67$$

(vi) 𝑫𝟖

$$D_8 = L_8 + \left(\frac{\frac{8N}{10} - f_b}{f_{D_8}}\right) \times C$$

$$\frac{8N}{10} = \frac{8 \times 100}{10} = 80$$

Next, go to the cf column to find where 80 falls. It falls in the class whose cumulative frequency is 88, since the previous cumulative frequency is 77. Tracing that class, we have the 8th decile class to be 90 – 99.

Next, obtain:

$L_8 = 90$; $f_b = 77$; $f_{D_8} = 11$; $C = 10$

$$\therefore D_8 = 90 + \left(\frac{80 - 77}{11}\right) \times 10 = 92.73$$

(vii) 𝑷𝟗𝟎

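$$P_{90} = L_{90} + \left(\frac{\frac{90N}{100} - f_b}{f_{P_{90}}}\right) \times C$$

$$\frac{90N}{100} = \frac{90 \times 100}{100} = 90$$

Next, go to the cf column to find where 90 falls. It falls in the class whose cumulative frequency is 100, since the previous cumulative frequency is 88. Tracing that class, we have the 90th percentile class to be 100 – 109.

Next, obtain:

$L_{90} = 100$; $f_b = 88$; $f_{P_{90}} = 12$; $C = 10$

$$\therefore P_{90} = 100 + \left(\frac{90 - 88}{12}\right) \times 10 = 101.67$$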
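All seven results in Example 7 follow the same pattern, so they can be checked with one general function. This is a sketch only: the name `grouped_quantile` and its parameters `k` and `parts` are illustrative, and picking the first class whose cumulative frequency is at least kN/parts is an assumed tie rule, since the notes do not say what to do when the target equals a cumulative frequency exactly.

```python
from itertools import accumulate

def grouped_quantile(lower_limits, freqs, width, k, parts):
    """k-th of `parts` equal partitions: L + ((k*N/parts - fb) / f) * C.

    Examples: k=1, parts=4 gives Q1; k=8, parts=10 gives D8;
    k=90, parts=100 gives P90.
    """
    total = sum(freqs)                      # N
    target = k * total / parts              # position of the quantile
    cumulative = list(accumulate(freqs))    # cf column
    for i, cf in enumerate(cumulative):
        if cf >= target:                    # quantile class (assumed tie rule)
            before = cumulative[i - 1] if i > 0 else 0   # fb
            return lower_limits[i] + (target - before) / freqs[i] * width
    raise ValueError("k must not exceed parts")

lower_limits = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
freqs = [2, 10, 15, 18, 10, 6, 6, 10, 11, 12]
C = 10

q1 = grouped_quantile(lower_limits, freqs, C, 1, 4)
q2 = grouped_quantile(lower_limits, freqs, C, 2, 4)
q3 = grouped_quantile(lower_limits, freqs, C, 3, 4)
iqr = q3 - q1
siqr = iqr / 2
d8 = grouped_quantile(lower_limits, freqs, C, 8, 10)
p90 = grouped_quantile(lower_limits, freqs, C, 90, 100)

for name, value in [("Q1", q1), ("Q2", q2), ("Q3", q3), ("IQR", iqr),
                    ("SIQR", siqr), ("D8", d8), ("P90", p90)]:
    print(f"{name} = {value:.2f}")
```

Calling the function with k=1 and parts=2 reproduces the grouped median formula, which matches the statement that the second quartile is the median.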


MEASURES OF DISPERSION

Measures of dispersion measure how spread out a set of data is. Measures of dispersion

are also referred to as measures of spread. The following measures of dispersion will be

considered:

(i) Range

(ii) Mean deviation

(iii) Variance

(iv) Standard deviation

(v) Coefficient of variation

Range

The range is the highest value minus the lowest value in a given distribution.

Mean Deviation (MD)

The mean deviation is the average of the deviations of the numbers from the mean of

the distribution ignoring all negative signs.

Case I: Mean Deviation for Ungrouped Data

The formula for mean deviation for ungrouped data is given as follows:

$$\text{MD} = \frac{\sum \lvert X_i - \bar{X} \rvert}{n}$$

Case II: Mean Deviation for Grouped Data

The formula for mean deviation for grouped data is given as follows:

$$\text{MD} = \frac{\sum f \lvert X_i - \bar{X} \rvert}{\sum f}$$

Variance

The variance is the sum of the squared deviations from the mean divided by the size of

the population.

$\sigma^2$ (sigma squared) represents the population variance and $S^2$ represents the sample variance.

Note:

The sample variance is used as an estimate when the exact population variance is not

known.


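We will consider the following formulae for the variance:

| Method | Variance for ungrouped data | Variance for grouped data |
|---|---|---|
| I | $S^2 = \dfrac{\sum (X_i - \bar{X})^2}{n - 1}$ | $S^2 = \dfrac{\sum f (X_i - \bar{X})^2}{\sum f - 1}$ |
| II | $S^2 = \dfrac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$ | $S^2 = \dfrac{\sum fX^2 - \frac{(\sum fX)^2}{\sum f}}{\sum f - 1}$ |
| III | $S^2 = \dfrac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}$ | $S^2 = \dfrac{\sum fd^2 - \frac{(\sum fd)^2}{\sum f}}{\sum f - 1}$ |

In Method III, $d = X_i - A$, where $A$ is the assumed mean.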

Standard Deviation

Standard deviation is the square root of the variance. In essence, the variance is the

square of the standard deviation.

Coefficient of Variation (CV)

Coefficient of variation represents the ratio of the standard deviation to the mean. It is

very useful for comparing the degree of variation from one data series to another. The

formula for the CV is given by:

$$\text{CV} = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\%$$

Example 8

Given the data below, find:

(i) Mean deviation (ii) Variance (iii) Standard deviation (iv) Coefficient of variation

| Class Interval | f |
|---|---|
| 10 – 19 | 2 |
| 20 – 29 | 10 |
| 30 – 39 | 15 |
| 40 – 49 | 18 |
| 50 – 59 | 10 |
| 60 – 69 | 6 |
| 70 – 79 | 6 |
| 80 – 89 | 10 |
| 90 – 99 | 11 |
| 100 – 109 | 12 |
| ∑ | 100 |


Solution

(i) Mean Deviation

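| Class Interval | f | X | fX | $X - \bar{X}$ | $\lvert X - \bar{X} \rvert$ | $f \lvert X - \bar{X} \rvert$ |
|---|---|---|---|---|---|---|
| 10 – 19 | 2 | 14.5 | 29 | −46.6 | 46.6 | 93.2 |
| 20 – 29 | 10 | 24.5 | 245 | −36.6 | 36.6 | 366 |
| 30 – 39 | 15 | 34.5 | 517.5 | −26.6 | 26.6 | 399 |
| 40 – 49 | 18 | 44.5 | 801 | −16.6 | 16.6 | 298.8 |
| 50 – 59 | 10 | 54.5 | 545 | −6.6 | 6.6 | 66 |
| 60 – 69 | 6 | 64.5 | 387 | 3.4 | 3.4 | 20.4 |
| 70 – 79 | 6 | 74.5 | 447 | 13.4 | 13.4 | 80.4 |
| 80 – 89 | 10 | 84.5 | 845 | 23.4 | 23.4 | 234 |
| 90 – 99 | 11 | 94.5 | 1039.5 | 33.4 | 33.4 | 367.4 |
| 100 – 109 | 12 | 104.5 | 1254 | 43.4 | 43.4 | 520.8 |
| ∑ | 100 | | 6110 | | | 2446 |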

First, calculate the mean

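$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{6110}{100} = 61.10$$

Then, calculate the mean deviation

$$\text{MD} = \frac{\sum f \lvert X_i - \bar{X} \rvert}{\sum f} = \frac{2446}{100} = 24.46$$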

(ii) Variance

| Class Interval | f | X | fX | $X - \bar{X}$ | $(X - \bar{X})^2$ | $f(X - \bar{X})^2$ | $X^2$ | $fX^2$ |
|---|---|---|---|---|---|---|---|---|
| 10 – 19 | 2 | 14.5 | 29 | −46.6 | 2171.56 | 4343.12 | 210.25 | 420.5 |
| 20 – 29 | 10 | 24.5 | 245 | −36.6 | 1339.56 | 13395.60 | 600.25 | 6002.5 |
| 30 – 39 | 15 | 34.5 | 517.5 | −26.6 | 707.56 | 10613.40 | 1190.25 | 17853.75 |
| 40 – 49 | 18 | 44.5 | 801 | −16.6 | 275.56 | 4960.08 | 1980.25 | 35644.5 |
| 50 – 59 | 10 | 54.5 | 545 | −6.6 | 43.56 | 435.60 | 2970.25 | 29702.5 |
| 60 – 69 | 6 | 64.5 | 387 | 3.4 | 11.56 | 69.36 | 4160.25 | 24961.5 |
| 70 – 79 | 6 | 74.5 | 447 | 13.4 | 179.56 | 1077.36 | 5550.25 | 33301.5 |
| 80 – 89 | 10 | 84.5 | 845 | 23.4 | 547.56 | 5475.60 | 7140.25 | 71402.5 |
| 90 – 99 | 11 | 94.5 | 1039.5 | 33.4 | 1115.56 | 12271.16 | 8930.25 | 98232.75 |
| 100 – 109 | 12 | 104.5 | 1254 | 43.4 | 1883.56 | 22602.72 | 10920.25 | 131043 |
| ∑ | 100 | | 6110 | | | 75244 | | 448565 |

Using Method I:

$$S^2 = \frac{\sum f (X_i - \bar{X})^2}{\sum f - 1} = \frac{75244}{100 - 1} = \frac{75244}{99} = 760.04$$

Using Method II:

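$$
\begin{aligned}
S^2 &= \frac{\sum fX^2 - \frac{(\sum fX)^2}{\sum f}}{\sum f - 1} \\
&= \frac{448565 - \frac{(6110)^2}{100}}{100 - 1} \\
&= \frac{448565 - 373321}{99} \\
&= \frac{75244}{99} \\
&= 760.04
\end{aligned}
$$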
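Methods I and II give the same numerator (75244) because the two expressions are algebraically identical. The short derivation below is an added illustration, using $\bar{X} = \sum fX / \sum f$:

```latex
\sum f (X - \bar{X})^2
  = \sum fX^2 - 2\bar{X} \sum fX + \bar{X}^2 \sum f
  = \sum fX^2 - \frac{2 (\sum fX)^2}{\sum f} + \frac{(\sum fX)^2}{\sum f}
  = \sum fX^2 - \frac{(\sum fX)^2}{\sum f}
```

Method II is usually quicker by hand because it avoids computing each deviation from the mean.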

Exercise

Use Method III to calculate the variance.

(iii) Standard Deviation (SD)

$$\text{SD} = \sqrt{\text{Variance}} = \sqrt{760.04} = 27.57$$

(iv) Coefficient of Variation (CV)

$$\text{CV} = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\%$$

$$\therefore \text{CV} = \frac{27.57}{61.10} \times 100\% = 45.12\%$$
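The whole of Example 8 can be checked with the following Python sketch. The variable names are illustrative; the class marks and frequencies come from the tables above, and the variance uses the sample form (divisor ∑f − 1) as in Methods I and II.

```python
import math

# Class marks (X) and frequencies (f) from Example 8.
marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5, 104.5]
freqs = [2, 10, 15, 18, 10, 6, 6, 10, 11, 12]

n = sum(freqs)                                            # sum of f
sum_fx = sum(f * x for x, f in zip(marks, freqs))         # sum of fX
sum_fx2 = sum(f * x * x for x, f in zip(marks, freqs))    # sum of fX^2
mean = sum_fx / n

# Mean deviation: sum of f|X - mean| divided by sum of f.
mean_deviation = sum(f * abs(x - mean) for x, f in zip(marks, freqs)) / n

# Sample variance by Method I and Method II.
variance_1 = sum(f * (x - mean) ** 2 for x, f in zip(marks, freqs)) / (n - 1)
variance_2 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

std_dev = math.sqrt(variance_1)
cv = std_dev / mean * 100   # coefficient of variation, in percent

print(f"mean = {mean:.2f}")
print(f"MD = {mean_deviation:.2f}")
print(f"variance (I) = {variance_1:.2f}, variance (II) = {variance_2:.2f}")
print(f"SD = {std_dev:.2f}")
print(f"CV = {cv:.2f}%")
```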


Assignment 2:

Given the following weights of patients:

68 84 75 82 68 90 62 88 76 93

86 67 73 81 72 63 76 75 85 77

73 79 88 73 60 93 71 59 85 75

66 78 82 75 94 77 69 74 68 60

61 65 75 87 74 62 95 78 63 72

79 62 67 97 78 85 69 65 71 75

96 78 81 61 75 95 60 79 83 71

65 80 73 57 88 78 51 76 53 74

(i) Use tally method to classify the above data into classes 50-54, 55-59, 60-64 etc. (ii) Compute the following:

(a) Mode (b) Median (c) Interquartile Range (d) 7th decile (e) 85th percentile (f) Coefficient of variation