prabhakar-bhattacharya
1
Module 4: Measures of Central Tendency and Dispersion
2
Measures Of Central Tendency And Dispersion
Measures of Central Tendency
• Mean: Arithmetic, Geometric, Harmonic, Weighted Mean
• Median, Quartiles, Percentiles, Deciles
• Mode
Measures of Variation
3
Range
Mean Deviation
Standard Deviation ( Variance )
Inter Quartile Range
Coefficient of Variation
Measures of Skewness and Kurtosis
Standardised Variables and Scores
4
Measures of Location or Central Tendency
Measure of Location
Centre of Gravity
There are three such measures:
Mean
Median, Quartiles, Percentiles and Deciles
Mode
5
Properties of a Measure
It should be easy to understand and calculate
It should be based on all observations
It should not be much affected by a few extreme observations
It should be amenable to mathematical treatment. For example, we should be able to calculate the combined measure for two sets of observations given the measure for each of the two sets
6
Mean
There are three types of means, viz.,
Arithmetic Mean
Harmonic Mean
Geometric Mean
7
Arithmetic Mean: Ungrouped (Raw) Data
$$\bar{x} = \frac{\text{Sum of Observations}}{\text{Number of Observations}} = \frac{\sum x_i}{n}$$
8
Illustration 4.1
Table 4.1 : Equity Holdings of 20 Indian Billionaires (Rs. in Millions)
2717 2796 3098 3144 3527
3534 3862 4186 4310 4506
4745 4784 4923 5034 5071
5424 5561 6505 6707 6874
9
Illustration 4.1
For the above data, the A.M. is
$$\bar{x} = \frac{2717 + 2796 + \dots + 4745 + \dots + 5424 + \dots + 6874}{20} = \text{Rs. } 4565.4 \text{ Millions}$$
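The arithmetic mean of the Table 4.1 data can be checked with a few lines of Python. This is only a sketch; the names `holdings` and `arithmetic_mean` are illustrative and do not come from the slides.

```python
# Equity holdings (Rs. in Millions) copied from Table 4.1
holdings = [
    2717, 2796, 3098, 3144, 3527,
    3534, 3862, 4186, 4310, 4506,
    4745, 4784, 4923, 5034, 5071,
    5424, 5561, 6505, 6707, 6874,
]


def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


mean_holding = arithmetic_mean(holdings)
print(mean_holding)  # 4565.4
```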
10
Arithmetic Mean: Grouped Data
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$
11
Illustration 4.2
The calculation is illustrated with the data relating to equity holdings of the group of 20 billionaires given in Table 4.1.
Class Interval (1) | Frequency fi (2) | Mid Value of Class Interval xi (3) | fixi, Col.(4) = Col.(2) x Col.(3)
2000 - 3000 | 2 | 2500 | 5000
3000 - 4000 | 5 | 3500 | 17500
4000 - 5000 | 6 | 4500 | 27000
5000 - 6000 | 4 | 5500 | 22000
6000 - 7000 | 3 | 6500 | 19500
Sum | Σfi = 20 | | Σfixi = 91000
12
Illustration 4.2
Substituting the values of Σfi and Σfixi in the formula,
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = 91000 \div 20 = 4550$$
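The grouped-data calculation can be sketched in Python as follows. The layout of `classes` as (lower limit, upper limit, frequency) tuples and the function name `grouped_mean` are assumptions made for illustration; the mid value of each class is taken as the average of its limits, as in the table.

```python
# (lower limit, upper limit, frequency) for each class interval of Illustration 4.2
classes = [
    (2000, 3000, 2),
    (3000, 4000, 5),
    (4000, 5000, 6),
    (5000, 6000, 4),
    (6000, 7000, 3),
]


def grouped_mean(class_table):
    """Sum of f_i * x_i divided by sum of f_i, with x_i the class mid value."""
    total_f = sum(freq for _, _, freq in class_table)
    total_fx = sum(freq * (lower + upper) / 2 for lower, upper, freq in class_table)
    return total_fx / total_f


print(grouped_mean(classes))  # 4550.0
```

The grouped mean (4550) differs slightly from the raw-data mean (4565.4) because every observation in a class is treated as if it sat at the class mid value.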
13
Weighted Arithmetic Mean
If the values x1, x2, x3, …, xi, …, xn have weights w1, w2, w3, …, wi, …, wn, then the weighted mean of x is given as
$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$
14
Illustration 4.3
Item | Monthly Consumption | Weight (wi) | Rise in Price in Percentage (pi) | wipi
Sugar | 5 | 5 | 20 | 100
Rice | 20 | 20 | 10 | 200
15
Illustration 4.3
Therefore, the average price rise could be evaluated as
$$\bar{p} = \frac{\sum w_i p_i}{\sum w_i} = \frac{100 + 200}{5 + 20} = \frac{300}{25} = 12$$
Thus the average price rise is 12 %.
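A minimal Python sketch of the weighted mean, applied to the sugar and rice figures of Illustration 4.3. The function name `weighted_mean` and the variable names are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by sum of w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


price_rise = [20, 10]   # percentage rise in price: sugar, rice
consumption = [5, 20]   # monthly consumption used as weights

print(weighted_mean(price_rise, consumption))  # 12.0
```

The unweighted average of 20 % and 10 % would be 15 %; weighting by consumption pulls the figure towards rice, which dominates the basket.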
16
Geometric Mean
The Geometric Mean (G.M.) of a series of observations with values x1, x2, x3, …, xn is defined as the nth root of the product of these values. Mathematically,
$$\text{G.M.} = (x_1 \, x_2 \, x_3 \cdots x_n)^{1/n}$$
It may be noted that the G.M. cannot be defined if any value of x is zero, as the whole product of the various values becomes zero.
17
Illustration 4.5
For the data with values 2, 4 and 8,
$$\text{G.M.} = (2 \times 4 \times 8)^{1/3} = (64)^{1/3} = 4$$
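The same value can be reproduced in Python. The sketch below works through logarithms rather than multiplying the values directly, which is a design choice made here to avoid very large products; the function name `geometric_mean` is illustrative, and rejecting non-positive values is an assumption in line with the note that the G.M. is not defined when a value is zero.

```python
import math


def geometric_mean(values):
    """nth root of the product, computed as exp(mean of the logs)."""
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.exp(sum(math.log(x) for x in values) / len(values))


gm = geometric_mean([2, 4, 8])
print(round(gm, 6))  # 4.0
```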
18
Average Rate of Growth of Production/Business or Increase in Prices
If P1 is the production in the first year and Pn is the production in the nth year, then the average rate of growth is given by (G - 100) %, where
$$G = 100 \left( \frac{P_n}{P_1} \right)^{1/(n-1)}$$
or
$$\log G = \log 100 + \frac{1}{n-1} \left( \log P_n - \log P_1 \right)$$
19
Example 4.4
The wholesale price index in the year 2000-01 was 145.3. It increased to 195.5 in the year 2005-06. What has been the average rate of increase in the index during the last 5 years. Solution:By using the formula ( 4.8), we have log G = 2 +{ (1/5) ( log 195.5 – log145.3 ) }
= 2.02578Therefore,
G = Anti log (2.02578) = 106.11Thus the average rate of increase = 106.11 100 = 6.11%
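The wholesale price index example can be checked numerically. In this sketch the function name `average_growth_rate` and its parameter `periods` (the number of yearly steps, i.e. n - 1 in the formula) are assumptions introduced for illustration.

```python
def average_growth_rate(first_value, last_value, periods):
    """Average percentage growth per period: G - 100 with G = 100 * (Pn / P1) ** (1 / periods)."""
    g = 100 * (last_value / first_value) ** (1 / periods)
    return g - 100


# Index of 145.3 in 2000-01 rising to 195.5 in 2005-06, i.e. 5 yearly steps
rate = average_growth_rate(145.3, 195.5, 5)
print(round(rate, 2))  # 6.11
```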
20
Combined G.M. of Two Sets of Data
If G1 and G2 are the geometric means of two sets of data containing n1 and n2 observations respectively, then the combined geometric mean, say G, of the combined data is given by:
$$\log G = \frac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2}$$
21
Combined G.M. of Two Sets of Data
As another example, suppose the average growth
rate during the first five years of business is 20 %,
and the average growth rate of business during the
next five years is 15 %, and we wish to find the
average growth rate for the entire period of 10
years. This growth rate can be found by calculating
the combined geometric mean of the geometric
means 120 and 115, for the two blocks of 5-year
periods. Thus, the requisite G.M., say G, can be
worked out as follows:
22
Combined G.M. of Two Sets of Data
$$\log G = \frac{5 \log 120 + 5 \log 115}{5 + 5} = \frac{5 \times 2.07918 + 5 \times 2.06070}{10} = \frac{20.6994}{10} = 2.06994$$
Therefore, G = antilog 2.06994 = 117.47.
Thus the combined average rate of growth for the period of 10 years is 17.47 %.
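A short Python sketch of the combined geometric mean, using base-10 logarithms as in the worked example. The function name `combined_gm` and its argument order are hypothetical.

```python
import math


def combined_gm(g1, n1, g2, n2):
    """Combined G.M. from log G = (n1 log G1 + n2 log G2) / (n1 + n2)."""
    log_g = (n1 * math.log10(g1) + n2 * math.log10(g2)) / (n1 + n2)
    return 10 ** log_g


# Two blocks of 5 years with geometric means 120 and 115
g = combined_gm(120, 5, 115, 5)
print(round(g, 2))        # 117.47
print(round(g - 100, 2))  # 17.47
```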
23
Weighted Geometric Mean
Just like the weighted arithmetic mean, we also have the weighted geometric mean.
If x1, x2, …, xi, …, xn are n observations with weights w1, w2, …, wi, …, wn, then their G.M. is defined through its logarithm as:
$$\log \text{G.M.} = \frac{\sum w_i \log x_i}{\sum w_i}$$
24
Harmonic Mean
The harmonic mean (H.M.) is defined as the reciprocal of the arithmetic mean of the reciprocals of the observations.
For example, if x1 and x2 are two observations, then the arithmetic mean of their reciprocals, viz. 1/x1 and 1/x2, is
$$\frac{(1/x_1) + (1/x_2)}{2} = \frac{x_2 + x_1}{2 x_1 x_2}$$
The reciprocal of this arithmetic mean is called the harmonic mean. Thus the harmonic mean of two observations x1 and x2 is
$$\text{H.M.} = \frac{2 x_1 x_2}{x_1 + x_2}$$
25
Relationship Among A.M., G.M. and H.M.
The relationships among the magnitudes of the three types of means calculated from the same data are as follows:
(i) H.M. ≤ G.M. ≤ A.M., i.e. the arithmetic mean is greater than or equal to the geometric mean, which is greater than or equal to the harmonic mean.
(ii) $\text{G.M.} = \sqrt{\text{A.M.} \times \text{H.M.}}$, i.e. the geometric mean is the square root of the product of the arithmetic mean and the harmonic mean. This holds exactly for two positive observations.
(iii) H.M. = (G.M.)^2 / A.M., which follows from (ii).
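The three means and their ordering can be compared numerically. The sketch below uses made-up observations (2 and 8, then 1, 2 and 9), and the helper names are illustrative. It also shows that the square-root relation between the three means is exact for two observations, while for three or more values it need not hold.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.exp(sum(math.log(x) for x in values) / len(values))


def harmonic_mean(values):
    # Reciprocal of the arithmetic mean of the reciprocals
    return len(values) / sum(1 / x for x in values)


pair = [2, 8]
am, gm, hm = arithmetic_mean(pair), geometric_mean(pair), harmonic_mean(pair)
print(round(am, 4), round(gm, 4), round(hm, 4))  # 5.0 4.0 3.2
print(math.isclose(gm ** 2, am * hm))            # True

triple = [1, 2, 9]
am3, gm3, hm3 = arithmetic_mean(triple), geometric_mean(triple), harmonic_mean(triple)
print(hm3 <= gm3 <= am3)                         # True
print(math.isclose(gm3 ** 2, am3 * hm3))         # False
```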
26
Median
Whenever there are some extreme values in the data, calculation of A.M. is not desirable.
Further, whenever exact values of some observations are not available, A.M. cannot be calculated.
In both situations, another measure of location called the Median is used.
27
Median - Ungrouped Data
First the data is arranged in ascending/descending order. In the earlier example relating to equity holdings data of 20 billionaires given in Table 4.1, the data is arranged in ascending order as follows:
2717 2796 3098 3144 3527 3534 3862 4186 4310 4506
4745 4784 4923 5034 5071 5424 5561 6505 6707 6874
Here, the number of observations is 20, and therefore there is no single middle observation. The two middle-most observations are the 10th and 11th, with values 4506 and 4745. Therefore, the median is their average:
$$\text{Median} = \frac{4506 + 4745}{2} = \frac{9251}{2} = 4625.5$$
Thus, the median equity holding of the 20 billionaires is Rs. 4625.5 Millions.
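The procedure for ungrouped data (sort, then take the middle value or the average of the two middle values) can be sketched in Python. The function name `median_ungrouped` is illustrative; the data are the Table 4.1 holdings.

```python
holdings = [
    2717, 2796, 3098, 3144, 3527, 3534, 3862, 4186, 4310, 4506,
    4745, 4784, 4923, 5034, 5071, 5424, 5561, 6505, 6707, 6874,
]


def median_ungrouped(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_ungrouped(holdings))  # 4625.5
```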
28
Median - Grouped Data
The median for grouped data is defined as the value corresponding to the (n/2)th observation, and is calculated from the following formula:
$$\text{Median} = L_m + \frac{(n/2) - f_c}{f_m} \times w_m$$
where,
• Lm is the lower limit of the median class interval, i.e. the interval which contains the (n/2)th observation
• fm is the frequency of the median class interval
• fc is the cumulative frequency up to the lower limit of the median class interval, i.e. of all the classes before it
• wm is the width of the median class interval
• n is the total number of observations.
29
Illustration 4.2
Class Interval | Frequency | Cumulative frequency
2000-3000 | 2 | 2
3000-4000 | 5 | 7
4000-5000 | 6 | 13
5000-6000 | 4 | 17
6000-7000 | 3 | 20
30
Illustration 4.2
Here, n = 20, and the median class interval is from 4000 to 5000 as the 10th observation lies in this interval. Further,
Lm = 4000
fm = 6
fc = 7
wm = 1000
Therefore,
$$\text{Median} = 4000 + \frac{(20/2) - 7}{6} \times 1000 = 4000 + \frac{3}{6} \times 1000 = 4000 + 500 = 4500$$
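The interpolation for grouped data can be sketched in Python. The (lower limit, upper limit, frequency) layout of `classes` and the function name `grouped_median` are assumptions made for illustration; the median class is taken to be the first class whose cumulative frequency reaches n/2.

```python
classes = [
    (2000, 3000, 2),
    (3000, 4000, 5),
    (4000, 5000, 6),
    (5000, 6000, 4),
    (6000, 7000, 3),
]


def grouped_median(class_table):
    """L_m + ((n/2 - f_c) / f_m) * w_m for the class containing the (n/2)th observation."""
    n = sum(freq for _, _, freq in class_table)
    cumulative = 0  # f_c: total frequency of the classes before the current one
    for lower, upper, freq in class_table:
        if cumulative + freq >= n / 2:
            return lower + (n / 2 - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("class table is empty")


print(grouped_median(classes))  # 4500.0
```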
31
Median
The median divides the data into two parts such that the number of observations less than the median is equal to the number of observations more than it.
This property makes the median a very useful measure when the data is skewed, like income distribution among persons/households, marks obtained in competitive examinations such as those for admission to Engineering / Medical Colleges, etc.
32
Graphical Method of Finding the Median
If we draw both the ogives, viz. "Less Than" and "More Than", for the data, then the horizontal-axis value at the point of intersection of the two ogives is the Median.
[Figure: "Less Than" ogive and "More Than" ogive, vertical axis ticks 0 to 25; their point of intersection is marked "Median".]
33
Quartiles
The median divides the data into two parts such that 50 % of the observations are less than it and 50 % are more than it. Similarly, there are "Quartiles". There are three quartiles, viz. Q1, Q2 and Q3. These are referred to as the first, second and third quartiles.
The first quartile, Q1, divides the data into two parts such that 25 % (a quarter) of the observations are less than it and 75 % are more than it.
The second quartile, Q2, is the same as the median. The third quartile, Q3, divides the data into two parts such that 75 % of the observations are less than it and 25 % are more than it.
All these can be determined graphically with the help of the ogive curve.
Quartiles
[Figure: Ogive Curve (Less than type). Horizontal axis: Bin (2000 to 7000, More); vertical axes: Frequency and Cumulative % (0 % to 120 %).]
35
Quartiles
For ungrouped and grouped data, the quartiles are defined as the values corresponding to the observations given below:
Quartile | Ungrouped Data (arranged in ascending or descending order) | Grouped Data
Lower Quartile Q1 | {(n + 1) / 4}th | (n / 4)th
Median Q2 | {(n + 1) / 2}th | (n / 2)th
Upper Quartile Q3 | {3(n + 1) / 4}th | (3n / 4)th
36
Quartiles
$$Q_1 = L_{Q_1} + \frac{(n/4) - f_c}{f_{Q_1}} \times w_{Q_1}$$
$$Q_3 = L_{Q_3} + \frac{(3n/4) - f_c}{f_{Q_3}} \times w_{Q_3}$$
37
Equity Holding Data
Class Interval | Frequency | Cumulative frequency
2000-3000 | 2 | 2
3000-4000 | 5 | 7
4000-5000 | 6 | 13
5000-6000 | 4 | 17
6000-7000 | 3 | 20
38
$$Q_1 = 3000 + \frac{(20/4) - 2}{5} \times 1000 = 3000 + \frac{5 - 2}{5} \times 1000 = 3000 + \frac{3000}{5} = 3000 + 600 = 3600$$
The interpretation of this value of Q1 is that 25 % of the billionaires have equity holdings less than Rs. 3600 Millions.
39
$$Q_3 = 5000 + \frac{15 - 13}{4} \times 1000 = 5000 + \frac{2}{4} \times 1000 = 5500$$
The interpretation of this value of Q3 is that 75 % of the billionaires have equity holdings less than Rs. 5500 Millions.
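The quartile calculations follow one pattern: locate the class containing the required fraction of the total frequency and interpolate within it. A hedged Python sketch is given below; the function name `grouped_quantile` and its `fraction` parameter (0.25 for Q1, 0.75 for Q3) are hypothetical, and the class table uses an assumed (lower limit, upper limit, frequency) layout.

```python
import math

classes = [
    (2000, 3000, 2),
    (3000, 4000, 5),
    (4000, 5000, 6),
    (5000, 6000, 4),
    (6000, 7000, 3),
]


def grouped_quantile(class_table, fraction):
    """L + ((fraction * n - f_c) / f) * w for the class containing the target observation."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must lie strictly between 0 and 1")
    n = sum(freq for _, _, freq in class_table)
    target = fraction * n
    cumulative = 0  # f_c: total frequency of the classes before the current one
    for lower, upper, freq in class_table:
        if cumulative + freq >= target:
            return lower + (target - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("class table is empty")


q1 = grouped_quantile(classes, 0.25)
q3 = grouped_quantile(classes, 0.75)
print(round(q1, 1), round(q3, 1))  # 3600.0 5500.0
```

Passing other fractions gives percentiles and deciles by the same interpolation.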
40
Percentiles
$$P_{95} = L_{P_{95}} + \frac{(95/100)\, n - f_c}{f_{P_{95}}} \times w_{P_{95}}$$
where L P95 is the lower limit of the class interval containing the 95th percentile, fc is the cumulative frequency of the classes before the 95th percentile interval, f P95 is the frequency of the 95th percentile interval and wP95 is the width of the 95th percentile interval.
41
Deciles
Just as the quartiles divide the data into four parts, the deciles divide the data into ten parts: the first decile (10 %), the second (20 %), and so on. In fact, P10, P20, …, P90 are the same as the deciles.
And just as the second quartile and the median are the same, so the fifth decile, i.e. P50, and the median are the same.
42
Mode
$$\text{Mode} = L_m + \frac{f_m - f_0}{2 f_m - f_0 - f_2} \times w_m$$
where,
Lm is the lower limit of the modal class interval
fm is the frequency of the modal class interval
f0 is the frequency of the interval just before the modal interval
f2 is the frequency of the interval just after the modal interval
wm is the width of the modal class interval
43
Equity Holding Data
The modal interval, i.e. the class interval with the maximum frequency (6), is 4000 to 5000. Further,
Lm = 4000
wm = 1000
fm = 6
f0 = 5
f2 = 4
Therefore
44
Equity Holding Data
$$\text{Mode} = 4000 + \frac{6 - 5}{2 \times 6 - 5 - 4} \times 1000 = 4000 + \frac{1}{3} \times 1000 = 4000 + 333.3 = 4333.3$$
Thus the modal equity holding of the billionaires is Rs. 4333.3 Millions.
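The modal calculation can be sketched in Python. The function name `grouped_mode` and the (lower limit, upper limit, frequency) layout are illustrative. Two further assumptions are made here that the slides do not state: a missing neighbouring class is treated as having frequency 0, and when several classes share the maximum frequency the first one is used.

```python
classes = [
    (2000, 3000, 2),
    (3000, 4000, 5),
    (4000, 5000, 6),
    (5000, 6000, 4),
    (6000, 7000, 3),
]


def grouped_mode(class_table):
    """L_m + ((f_m - f_0) / (2 f_m - f_0 - f_2)) * w_m for the modal class."""
    freqs = [freq for _, _, freq in class_table]
    m = freqs.index(max(freqs))                      # first class with the maximum frequency
    lower, upper, fm = class_table[m]
    f0 = freqs[m - 1] if m > 0 else 0                # assumption: no class before means 0
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0   # assumption: no class after means 0
    denominator = 2 * fm - f0 - f2
    if denominator == 0:
        raise ValueError("modal class and both neighbours have equal frequency")
    return lower + (fm - f0) / denominator * (upper - lower)


print(round(grouped_mode(classes), 1))  # 4333.3
```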
45
Empirical Relationship among Mean, Median and Mode
In moderately skewed distributions, it is found that the following relationship generally holds good:
Mean - Mode = 3 (Mean - Median)
From the above relationship between Mean, Median and Mode, if the values of two of these are given, the value of the third measure can be found.
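Rearranging the relationship gives Mode = 3 Median - 2 Mean. The sketch below applies it to the equity holding figures obtained earlier (mean 4565.4, grouped median 4500); the function name is hypothetical. The estimate comes out near, but not equal to, the interpolated mode of 4333.3, which is consistent with the relationship being empirical rather than exact.

```python
def mode_from_mean_and_median(mean, median):
    """Empirical estimate: Mean - Mode = 3 (Mean - Median), so Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean


estimate = mode_from_mean_and_median(4565.4, 4500)
print(round(estimate, 1))  # 4369.2
```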
Equity Holding Data
Mode (4333) < Median (4500) < Mean (4565)
Right Skewed Distribution: Mode < Median < Mean
Symmetrical: Mode = Median = Mean
Left Skewed Distribution: Mean < Median < Mode
50
Features of a Good Statistical Average
It should be readily computable, comprehensible and easily understood
It should be based on all the observations
It should be reliable enough to be taken as a true representative of the population
It should not be much affected by the extreme values in the data
It should be amenable to further mathematical treatment. This property helps in assessing the reliability of conclusions drawn about the population value with the help of the sample value
It should not vary much from sample to sample taken from the same population.
51
Comparison of Measures of Location
Arithmetic Mean
Advantages
(i) Easy to understand and calculate
(ii) Makes use of full data
(iii) Only the number and sum of the observations need be known for its calculation
Disadvantages
(i) Unduly influenced by extreme values
(ii) Cannot be calculated from grouped data with open-end class intervals, or from ungrouped data when the values of all observations are not available, i.e. when all that is known is that some observations are either less than or greater than some value
52
Geometric Mean
Advantages
(i) Makes use of full data
(ii) Extreme large values have lesser impact
(iii) Useful for data relating to ratios and percentages
(iv) Useful for rate of change/growth
Disadvantages
(i) Cannot be calculated if any observation has the value zero
(ii) Difficult to calculate and interpret
53
Median
Advantages
(i) Simple to understand
(ii) Extreme values do not have any impact
(iii) Can be calculated even if values of all observations are not known or data has open-end class intervals
(iv) Used for measuring qualities and factors which are not quantifiable
(v) Can be approximately determined with the help of a graph (ogives)
Disadvantages
(i) Arranging values in ascending/descending order may sometimes be tedious
(ii) Sum of the observations cannot be found out, if only the Median is known
(iii) Not amenable to mathematical calculations