Upload
asclabisb
View
226
Download
0
Embed Size (px)
Citation preview
7/30/2019 (8b) Grouped Data_central Tendency and Dispersion
1/17
Applied Statistics and Computing Lab
CENTRAL TENDENCY AND DISPERSION:For grouped data
Applied Statistics and Computing Lab
Indian School of Business
7/30/2019 (8b) Grouped Data_central Tendency and Dispersion
2/17
Applied Statistics and Computing Lab
Learning goals
Understanding data with class intervals
Learning to evaluate various measures ofcentral tendency and dispersion, for grouped
data
2
7/30/2019 (8b) Grouped Data_central Tendency and Dispersion
3/17
Applied Statistics and Computing Lab
Introduction We studied measures of central tendency and
dispersion for discrete data The data was represented in form of a list
How do we deal with data with class intervals?
Can we find a value that represents a given classinterval?
Class intervals could emerge from both discrete as well
as continuous data
We would look at a dataset consisting of Nobservations, distributed across n classes
3
7/30/2019 (8b) Grouped Data_central Tendency and Dispersion
4/17
Applied Statistics and Computing Lab
Class mark Class mark is the midpoint of a class interval
Calculated as the arithmetic mean of the class limits E.g. if we are looking at the number of students whose scores lie
between 60 and 70, (60 is the lower limit and 70 is the upper limit)6 0 + 7 0
2= 65
is the class mark or the midpoint of the class interval 60-70
Class mark cannot be determined for a data with open classes
(intervals indicated by open bracket on either sides)
In case of overlapping classes (where the upper limit of a class and
the lower limit of the next one are equal), we assign that
overlapping value to that class where the value is the lower limit
4
7/30/2019 (8b) Grouped Data_central Tendency and Dispersion
5/17
Applied Statistics and Computing Lab
Cumulative frequency Cumulative frequency is the frequency of values up to the upper limit of the
corresponding class interval
For the class, denote its frequency as , cumulative frequency as and classmark as
Class # Class interval Frequency Cumulative frequency Class mark
1 =
=
+
22 = +
= +
2
3 = + + =
+
2
i = + ++ =
+
2
n = + + ++ = = +
2
7/30/2019 (8b) Grouped Data_central Tendency and Dispersion
6/17
Applied Statistics and Computing Lab
Example Weights from the body measurement data used earlier
Weight values are given up to one decimal point
6
Class interval Frequency Cumulative frequency Class mark
40-49.9 27 27 44.95
50-59.9 124 151 54.95
60-69.9 120 271 64.95
70-79.9 115 386 74.95
80-89.9 87 473 84.95
90-99.9 25 498 94.95
100-109.9 8 506 104.95
110-119.9 1 507 114.95
Total 507 - -
7/30/2019 (8b) Grouped Data_central Tendency and Dispersion
7/17
Applied Statistics and Computing Lab
Summation of all values In ungrouped data set, the value of each observation is
considered
In grouped data, that is not possible How can we account for all the values of a dataset? As the class mark or the midpoint is considered to
represent every value belonging to that particular classinterval, that value holds to be a proxy for all the values
Can repeat the class mark as many times as thenumber of values belonging to that class interval;nothing but the frequency
Hence, =
7
7/30/2019 (8b) Grouped Data_central Tendency and Dispersion
8/17
Applied Statistics and Computing Lab
Means For data consisting of N observations distributed across n distinct
class intervals, =
= =
=
=
For the weights data, Arithmetic mean = 69.15
Geometric mean = 67.88
Harmonic mean = 66.64
8
7/30/2019 (8b) Grouped Data_central Tendency and Dispersion
9/17
Applied Statistics and Computing Lab
Median How do we determine the value that has 50% of the data on each of its
two sides?
Initially we can at least determine the class interval in which the valuewould lie, the median class
Let be the upper limit and be the lower limit of the median class
Let indicate the frequency of the median class and indicate thecumulative frequency of the class preceding the median class, then
= +( )(
2 )
This is obtained under the assumption that cumulative frequencyincreases from every class to another
9
7/30/2019 (8b) Grouped Data_central Tendency and Dispersion
10/17
Applied Statistics and Computing Lab
Median (contd.) Total 507 observations
= 253.5 observation splits the data into 2 equal halves
10
Class interval Frequency Cumulativefrequency
40-49.9 27 27
50-59.9 124 151
60-69.9 120 271
70-79.9 115 386
80-89.9 87 473
90-99.9 25 498
100-109.9 8 506
110-119.9 1 507
Total 507 -
Median class, as the
253.5th observation
would lie in thisinterval
= +
( )(2
)
= 60 +
(69.9 60)(507
2 151)
120 = 68.46
7/30/2019 (8b) Grouped Data_central Tendency and Dispersion
11/17
Applied Statistics and Computing Lab
Quantiles Suppose k is the number of quantiles
k=4 for quartiles, k=10 for deciles and k=100 for percentiles
The quantile is the (
) value of the data
Must note that (
) is not the numerical value of the quantile, it is only the position corresponding
to the quantile when the data is organised in an ascending order
For median i.e. the 2nd quartile, it was the (
)= (
) value
Using cumulative frequencies, we can then determine the class to which the given quantile belongs
As per the notations used earlier,
= + [
]
= 4 and = 2 gives the median
= = { +
34
} { +
14
}
Where, the s and s refer to the lower and upper limits of the corresponding quantile classes
11
7/30/2019 (8b) Grouped Data_central Tendency and Dispersion
12/17
Applied Statistics and Computing Lab
Mode Can easily identify the class interval with the highest frequency; the
modal class
How do we determine the value which has the highest density? Formula given by:
= + [
] where;
u
12
7/30/2019 (8b) Grouped Data_central Tendency and Dispersion
13/17
Applied Statistics and Computing Lab13
Mode (contd.)
13
Class interval Frequency
40-49.9 27
50-59.9 124
60-69.9 120
70-79.9 115
80-89.9 87
90-99.9 25
100-109.9 8
110-119.9 1
Total 507
Modal class, class
interval with the
highest frequency
= +
+ = 50 + 59.9 50
124 27
124 27 + 124 120= 59.51
7/30/2019 (8b) Grouped Data_central Tendency and Dispersion
14/17
Applied Statistics and Computing Lab
Absolute deviations
For data consisting of N observations
distributed across n distinct class intervals,
=
=
where, is the class mark
14
7/30/2019 (8b) Grouped Data_central Tendency and Dispersion
15/17
Applied Statistics and Computing Lab
Central moments For grouped data consisting of N observations
distributed across n distinct class intervals, = =
( )
where, is the class mark
= = ()
= Coefficient of skewness and kurtosis can be
calculated accordingly
15
7/30/2019 (8b) Grouped Data_central Tendency and Dispersion
16/17
Applied Statistics and Computing Lab
Conclusion We can verify that the values obtained with the
formulae for grouped data, are very close to the valuesobtained by considering the data as ungrouped
In many situation, describing data using class intervalsis more insightful
Therefore these formulae can be useful for quick handcalculation In this age of extensive computational power, these
measures can be calculated without dividing the data
into class intervals Yet, these formulae are important from theoretical
point of view
16
7/30/2019 (8b) Grouped Data_central Tendency and Dispersion
17/17
Applied Statistics and Computing Lab
Thank you