UNIT 1 PART II
Q. A. - 1
Prepared By
Kunal Mojidra
Unit 1 - Part II, QA-1, KUNAL
UNGROUPED VERSUS GROUPED DATA
Ungrouped data
• have not been summarized in any way
• are also called raw data
Grouped data
• have been organized into a frequency
distribution
EXAMPLE OF UNGROUPED DATA
42 30 53 50 52 30 55 49 61 74
26 58 40 40 28 36 30 33 31 37
32 37 30 32 23 32 58 43 30 29
34 50 47 31 35 26 64 46 40 43
57 30 49 40 25 50 52 32 60 54
Ages of a Sample of Doctors in Gandhinagar
FREQUENCY DISTRIBUTION OF AGES OF A
SAMPLE OF DOCTORS IN GANDHINAGAR
Class Interval Frequency
20-under 30 6
30-under 40 18
40-under 50 11
50-under 60 11
60-under 70 3
70-under 80 1
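As a rough illustration, the frequency distribution above can be rebuilt from the raw ages with a short Python sketch. The function name `frequency_distribution` and its parameters are illustrative choices, not part of the original material; each class is assumed to include its lower endpoint and exclude its upper endpoint ("20-under 30").

```python
# Ages of the 50 doctors, as listed in the ungrouped data example.
ages = [
    42, 30, 53, 50, 52, 30, 55, 49, 61, 74,
    26, 58, 40, 40, 28, 36, 30, 33, 31, 37,
    32, 37, 30, 32, 23, 32, 58, 43, 30, 29,
    34, 50, 47, 31, 35, 26, 64, 46, 40, 43,
    57, 30, 49, 40, 25, 50, 52, 32, 60, 54,
]

def frequency_distribution(values, start, width, n_classes):
    """Count how many values fall in each class [lower, lower + width)."""
    counts = [0] * n_classes
    for v in values:
        index = (v - start) // width
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts

counts = frequency_distribution(ages, start=20, width=10, n_classes=6)
for i, count in enumerate(counts):
    lower = 20 + 10 * i
    print(f"{lower}-under {lower + 10}: {count}")
```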
DATA RANGE
Smallest value in the ages data = 23
Largest value in the ages data = 74
Range = Largest - Smallest = 74 - 23 = 51
NUMBER OF CLASSES AND CLASS WIDTH
The number of classes should be between 5 and 15.
• Fewer than 5 classes cause excessive
summarization.
• More than 15 classes leave too much detail.
Class Width
• Divide the range by the number of classes for an
approximate class width
• Round up to a convenient number
$\text{Approximate Class Width} = \frac{\text{Range}}{\text{No. of classes}} = \frac{51}{6} = 8.5$
$\text{Class Width} = 10$
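The class-width arithmetic can be sketched in Python as follows. Rounding up to the next multiple of 5 is only one possible reading of "a convenient number" and is an assumption of this sketch, as are the variable names.

```python
import math

smallest, largest = 23, 74   # extremes of the ages data
n_classes = 6                # chosen number of classes (between 5 and 15)

data_range = largest - smallest            # 74 - 23
approx_width = data_range / n_classes      # range divided by number of classes
# Assumption: "convenient" means the next multiple of 5 at or above the approximate width.
class_width = math.ceil(approx_width / 5) * 5

print(data_range, approx_width, class_width)
```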
CLASS MIDPOINT
$\text{Class Midpoint} = \frac{\text{beginning class endpoint} + \text{ending class endpoint}}{2} = \frac{30 + 40}{2} = 35$
$\text{Class Midpoint} = \text{class beginning point} + \frac{1}{2}(\text{class width}) = 30 + \frac{1}{2}(10) = 35$
RELATIVE AND CUMULATIVE FREQUENCY
Relative frequency is the proportion of the
total frequency that is in any given class interval
in a frequency distribution.
Relative frequency is the individual class
frequency divided by the total frequency.
The cumulative frequency is a running total
of frequencies through the classes of a frequency
distribution.
The cumulative frequency for each class interval
is the frequency for that class interval added to
the preceding cumulative total.
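A minimal Python sketch of both definitions, applied to the class frequencies of the doctors' ages distribution (6, 18, 11, 11, 3, 1). The variable names are illustrative.

```python
from itertools import accumulate

frequencies = [6, 18, 11, 11, 3, 1]   # class frequencies, 20-under 30 through 70-under 80
total = sum(frequencies)

# Relative frequency: each class frequency divided by the total frequency.
relative = [f / total for f in frequencies]

# Cumulative frequency: running total of the frequencies through the classes.
cumulative = list(accumulate(frequencies))

for f, r, c in zip(frequencies, relative, cumulative):
    print(f"{f:>3} {r:>6.2f} {c:>3}")
```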
EXAMPLE – 1
QUANTITATIVE DATA GRAPHS
One of the most effective mechanisms for presenting data in a form meaningful to decision makers is graphical depiction.
Through graphs and charts, the decision maker can often get an overall picture of the data and reach some useful conclusions merely by studying the chart or graph.
Converting data to graphics can be creative and artful.
Often the most difficult step in this process is to reduce important and sometimes expensive data to a graphic picture that is both clear and concise and yet consistent with the message of the original data.
One of the most important uses of graphical depiction in statistics is to help the researcher determine the shape of a distribution.
QUANTITATIVE DATA GRAPHS
Quantitative data graphs are plotted along a
numerical scale, and qualitative graphs are
plotted using non-numerical categories.
Quantitative data graphs
• histogram
• frequency polygon
• ogive
• dot plot
• stem-and-leaf plot
EXAMPLE – 2
HISTOGRAMS
One of the more widely used types of graphs for
quantitative data is the histogram.
A histogram is a series of contiguous bars or
rectangles that represent the frequency of data in
given class intervals.
If the class intervals used along the horizontal
axis are equal, then the heights of the bars
represent the frequency of values in a given class
interval.
HISTOGRAM
If the class intervals are unequal, then the areas
of the bars (rectangles) can be used for relative
comparisons of class frequencies.
A histogram is a useful tool for differentiating the
frequencies of class intervals.
A quick glance at a histogram reveals which class
intervals produce the highest frequency totals.
HISTOGRAM
FREQUENCY POLYGONS
A frequency polygon, like the histogram, is a graphical display of class frequencies.
However, instead of using bars or rectangles like a histogram, in a frequency polygon each class frequency is plotted as a dot at the class midpoint, and the dots are connected by a series of line segments.
Construction of a frequency polygon begins by scaling class midpoints along the horizontal axis and the frequency scale along the vertical axis.
A dot is plotted for the associated frequency value at each class midpoint.
Connecting these midpoint dots completes the graph.
FREQUENCY POLYGONS
OGIVES
An ogive (o-jive) is a cumulative frequency polygon.
Construction begins by labeling the x-axis with the
class endpoints and the y-axis with the frequencies.
However, the use of cumulative frequency values
requires that the scale along the y-axis be great
enough to include the frequency total.
A dot of zero frequency is plotted at the beginning of
the first class, and construction proceeds by
marking a dot at the end of each class interval for
the cumulative value.
Connecting the dots then completes the ogive.
OGIVES
DOT PLOTS
A relatively simple statistical chart that is generally used
to display continuous, quantitative data is the dot plot.
In a dot plot, each data value is plotted along the
horizontal axis and is represented on the chart by a dot.
If multiple data points have the same values, the dots
will stack up vertically.
Dot plots can be especially useful for observing the
overall shape of the distribution of data points along with
identifying data values or intervals for which there are
groupings and gaps in the data.
STEM-AND-LEAF PLOT
Another way to organize raw data into groups
besides using a frequency distribution is a stem-
and-leaf plot.
This technique is simple and provides a unique
view of the data.
A stem-and-leaf plot is constructed by separating
the digits for each number of the data into two
groups, a stem and a leaf.
The leftmost digits are the stem and consist of the
higher valued digits.
The rightmost digits are the leaves and contain
the lower values.
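The construction can be sketched in Python as below. This is a minimal sketch that assumes non-negative whole numbers and treats the last digit as the leaf; the function name `stem_and_leaf` is illustrative, and the sample values are the first ten ages from the doctors' data.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digits (stem)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 42 -> stem 4, leaf 2
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf([42, 30, 53, 50, 52, 30, 55, 49, 61, 74])
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```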
STEM-AND-LEAF PLOT
QUALITATIVE GRAPHS
In contrast to quantitative data graphs that are
plotted along a numerical scale, qualitative
graphs are plotted using non-numerical
categories.
PIE CHARTS
A pie chart is a circular depiction of data where the area of the
whole pie represents 100% of the data and slices of the pie
represent a percentage breakdown of the sublevels.
Pie charts show the relative magnitudes of the parts to the
whole.
They are widely used in business, particularly to depict such
things as budget categories, market share, and time/resource
allocations.
However, the use of pie charts is minimized in the sciences and
technology because pie charts can lead to less accurate
judgments than are possible with other types of graphs.
Generally, it is more difficult for the viewer to interpret the
relative size of angles in a pie chart than to judge the length of
rectangles in a bar chart.
EXAMPLE – 3
Leading Petroleum Refining Companies
PIE CHART
BAR GRAPHS
Another widely used qualitative data graphing
technique is the bar graph or bar chart.
A bar graph or chart contains two or more
categories along one axis and a series of bars, one
for each category, along the other axis.
Typically, the length of the bar represents the
magnitude of the measure (amount, frequency,
money, percentage, etc.) for each category.
The bar graph is qualitative because the
categories are non-numerical, and it may be
either horizontal or vertical.
BAR GRAPHS
In Excel, horizontal bar graphs are referred to as
bar charts, and vertical bar graphs are referred
to as column charts.
A bar graph generally is constructed from the
same type of data that is used to produce a pie
chart.
However, an advantage of using a bar graph over
a pie chart for a given set of data is that, for
categories that are close in value, it is considered
easier to see the difference between the bars of a bar
graph than to discriminate between pie slices.
EXAMPLE – 4
How Much is Spent on Back to College
Shopping by the Average Student
BAR GRAPHS
PARETO CHARTS
A third type of qualitative data graph is a Pareto chart, which could be viewed as a particular application of the bar graph.
An important concept and movement in business is total quality management.
One of the important aspects of total quality management is the constant search for causes of problems in products and processes.
A graphical technique for displaying problem causes is Pareto analysis.
Pareto analysis is a quantitative tallying of the number and types of defects that occur with a product or service.
Analysts use this tally to produce a vertical bar chart that displays the most common types of defects, ranked in order of occurrence from left to right. The bar chart is called a Pareto chart.
EXAMPLE – 6
Suppose the number of electric motors being
rejected by inspectors for a company has been
increasing.
Company officials examine the records of several
hundred of the motors in which at least one defect
was found to determine which defects occurred
more frequently.
They find that 40% of the defects involved poor
wiring, 30% involved a short in the coil, 25%
involved a defective plug, and 5% involved
cessation of bearings.
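The ranking step behind a Pareto chart for this example can be sketched in Python. The cumulative percentage column is an addition of this sketch (it is often drawn on Pareto charts but is not mentioned in the example), and the dictionary is deliberately entered out of order to show the sort.

```python
# Defect shares from the example, entered in arbitrary order.
defects = {
    "defective plug": 25,
    "cessation of bearings": 5,
    "poor wiring": 40,
    "short in the coil": 30,
}

# Rank defect types from most to least common (left to right on the chart).
ordered = sorted(defects.items(), key=lambda item: item[1], reverse=True)

# Running total of the percentages, in ranked order.
cumulative = []
running = 0
for _, pct in ordered:
    running += pct
    cumulative.append(running)

for (name, pct), cum in zip(ordered, cumulative):
    print(f"{name:<22} {pct:>3}% {cum:>4}%")
```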
PARETO CHARTS
GRAPHICAL DEPICTION OF TWO-
VARIABLE NUMERICAL DATA:
SCATTER PLOTS
Many times in business research it is important to
explore the relationship between two numerical
variables.
A graphical mechanism for examining the
relationship between two numerical variables is
the scatter plot (or scatter diagram).
A scatter plot is a two-dimensional graph plot of
pairs of points from two numerical variables.
SCATTER PLOT
Registered Vehicles (1000's)   Petrol Sales (1000's of Liters)
5                              60
15                             120
9                              90
15                             140
7                              60
[Scatter plot: Petrol Sales (vertical axis, 0 to 200) against Registered Vehicles (horizontal axis, 0 to 20)]
MEASURES OF CENTRAL TENDENCY:
UNGROUPED DATA
One type of measure that is used to describe a set
of data is the measure of central tendency.
Measures of central tendency yield information
about the center, or middle part, of a group of
numbers.
Measures of central tendency do not focus on the
span of the data set or how far values are from
the middle numbers.
The measures of central tendency here for
ungrouped data are the mode, the median, the
mean, percentiles, and quartiles.
MODE
The mode is the most frequently occurring value
in a set of data.
In the case of a tie for the most frequently
occurring value, two modes are listed. Then the
data are said to be bimodal.
Data sets with more than two modes are referred
to as multimodal.
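A small Python sketch of these cases, using the standard library's `statistics.multimode`. The two data lists are hypothetical and exist only to show one mode versus a tie.

```python
from statistics import multimode

def describe_modes(values):
    """Return the list of modes and a label for how many there are."""
    modes = sorted(multimode(values))
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label

# Hypothetical data sets, for illustration only.
one_mode = describe_modes([1, 2, 2, 3])
two_modes = describe_modes([1, 1, 2, 2, 3])
print(one_mode, two_modes)
```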
MEDIAN
The median is the middle value in an ordered array of
numbers.
For an array with an odd number of terms, the
median is the middle number.
For an array with an even number of terms, the
median is the average of the two middle numbers.
A disadvantage of the median is that not all the
information from the numbers is used.
For example, information about the specific asking
price of the most expensive house does not really
enter into the computation of the median.
The level of data measurement must be at least
ordinal for a median to be meaningful.
MEDIAN
The following steps are used to determine the median.
STEP 1. Arrange the observations in an ordered data
array.
STEP 2. For an odd number of terms, find the middle
term of the ordered array. It is the median.
STEP 3. For an even number of terms, find the
average of the middle two terms. This average is the
median.
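The three steps above can be sketched in Python. The function name `median` is illustrative (the standard library's `statistics.median` does the same job); the two test lists reuse numbers that appear elsewhere in this unit.

```python
def median(values):
    """Median by the three steps: order, then middle term or average of middle two."""
    ordered = sorted(values)              # STEP 1: ordered array
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                        # STEP 2: odd number of terms
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # STEP 3: even number of terms

odd_case = median([5, 9, 16, 17, 18])
even_case = median([14, 12, 19, 23, 5, 13, 28, 17])
print(odd_case, even_case)
```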
MEAN
The arithmetic mean is the average of a group of
numbers and is computed by summing all
numbers and dividing by the number of numbers.
Because the arithmetic mean is so widely
used, most statisticians refer to it simply as the
mean.
EXAMPLE – 7
MEAN
The mean is affected by each and every value,
which is an advantage.
The mean uses all the data, and each data item
influences the mean.
It is also a disadvantage because extremely large
or small values can cause the mean to be pulled
toward the extreme value.
The mean is the most commonly used measure of
central tendency because it uses each data item
in its computation, it is a familiar measure, and
it has mathematical properties that make it
attractive to use in inferential statistics analysis.
VARIABILITY
[Figure: two panels, "No Variability in Cash Flow" and "Variability in Cash Flow", each marked with its mean]
PERCENTILES
Percentiles are measures of central tendency that divide a group of data into 100 parts.
There are 99 percentiles because it takes 99 dividers to separate a group of data into 100 parts.
The nth percentile is the value such that at least n percent of the data are below that value and at most (100 - n) percent are above that value.
Example: 90th percentile indicates that at least 90% of the data lie below it, and at most 10% of the data lie above it
The median and the 50th percentile have the same value.
Applicable for ordinal, interval, and ratio data
Not applicable for nominal data
STEPS IN DETERMINING THE LOCATION
OF A PERCENTILE
EXAMPLE – 8
Determine the 30th percentile of the following
eight numbers: 14, 12, 19, 23, 5, 13, 28, 17.
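The location steps themselves are not reproduced in this text, so the sketch below assumes a commonly taught rule: order the data, compute the location i = (P/100)·N, average the values at positions i and i + 1 when i is a whole number, and otherwise take the value at the whole-number part of i plus 1. The function name `percentile` is illustrative, and other conventions can give slightly different answers.

```python
def percentile(values, p):
    """Return the p-th percentile (integer p from 1 to 99) under the assumed location rule."""
    ordered = sorted(values)
    n = len(ordered)
    # Location i = p/100 * n, kept exact with integer arithmetic.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # i is a whole number: average the values at 1-based positions i and i + 1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not whole: take the value at 1-based position int(i) + 1.
    return ordered[whole]

numbers = [14, 12, 19, 23, 5, 13, 28, 17]
p30 = percentile(numbers, 30)   # i = 2.4, so the 3rd ordered value
p50 = percentile(numbers, 50)   # i = 4, so the average of the 4th and 5th values
print(p30, p50)
```

Under this rule the 50th percentile of the eight numbers agrees with their median.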
QUARTILES
Measures of central tendency that divide a group of
data into four subgroups
Q1: 25% of the data set is below the first quartile
Q2: 50% of the data set is below the second quartile
Q3: 75% of the data set is below the third quartile
Q1 is equal to the 25th percentile
Q2 is located at 50th percentile and equals the
median
Q3 is equal to the 75th percentile
Quartile values are not necessarily members of the
data set
QUARTILES
EXAMPLE – 9
EXAMPLE – 10
EXAMPLE – 11
EXAMPLE – 12
MEASURES OF VARIABILITY:
UNGROUPED DATA
Measures of variability describe the spread or the
dispersion of a set of data.
Common Measures of Variability
Range
Interquartile Range
Mean Absolute Deviation
Variance
Standard Deviation
Z scores
Coefficient of Variation
RANGE
The difference between the largest and the smallest
values in a set of data
Simple to compute
Ignores all data points except the two extremes
Example:
35 37 37 39 40 40 41 41
43 43 43 43 44 44 44 44
44 45 45 46 46 46 46 48
Range = Largest - Smallest
= 48 - 35 = 13
INTERQUARTILE RANGE
Range of values between the first and third quartiles
Range of the "middle half"
Less influenced by extremes
$\text{Interquartile Range} = Q_3 - Q_1$
DEVIATION FROM THE MEAN
Data set: 5, 9, 16, 17, 18
Mean: $\mu = \frac{\sum X}{N} = \frac{65}{5} = 13$
Deviations from the mean: -8, -4, 3, 4, 5
[Figure: number line from 0 to 20 showing the deviations -8, -4, +3, +4, +5 from the mean]
MEAN ABSOLUTE DEVIATION
Average of the absolute deviations from the mean
X       X - μ    |X - μ|
5       -8       +8
9       -4       +4
16      +3       +3
17      +4       +4
18      +5       +5
Total    0       24
$M.A.D. = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$
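The same calculation as a short Python sketch; the function name `mean_absolute_deviation` is illustrative.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    mu = sum(values) / len(values)
    return sum(abs(x - mu) for x in values) / len(values)

mad = mean_absolute_deviation([5, 9, 16, 17, 18])
print(mad)
```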
POPULATION VARIANCE
Average of the squared deviations from the
arithmetic mean
X       X - μ    (X - μ)²
5       -8       64
9       -4       16
16      +3       9
17      +4       16
18      +5       25
Total    0       130
$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$
POPULATION STANDARD DEVIATION
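Square root of the variance
For the data set 5, 9, 16, 17, 18 (squared deviations 64, 16, 9, 16, 25, total 130):
$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$
$\sigma = \sqrt{\sigma^2} = \sqrt{26.0} = 5.1$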
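A minimal Python sketch of the population variance and standard deviation for the data set 5, 9, 16, 17, 18. The function names are illustrative; the standard library offers `statistics.pvariance` and `statistics.pstdev` for the same purpose.

```python
import math

def population_variance(values):
    """Average of the squared deviations from the mean (divisor N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def population_std(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))

data = [5, 9, 16, 17, 18]
variance = population_variance(data)
std = population_std(data)
print(variance, round(std, 1))
```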
SAMPLE VARIANCE
Sum of the squared deviations from the arithmetic mean, divided by n - 1
X        X - X̄    (X - X̄)²
2,398    625      390,625
1,844    71       5,041
1,539    -234     54,756
1,311    -462     213,444
Total 7,092    0        663,866
$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$
SAMPLE STANDARD DEVIATION
Square root of the sample variance
For the sample 2,398; 1,844; 1,539; 1,311 (sum of squared deviations 663,866):
$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$
$S = \sqrt{S^2} = \sqrt{221{,}288.67} = 470.41$
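The sample versions differ from the population versions only in the divisor, n - 1 instead of N. A short Python sketch with illustrative function names (the standard library equivalents are `statistics.variance` and `statistics.stdev`):

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

sample = [2398, 1844, 1539, 1311]
s_squared = sample_variance(sample)
s = math.sqrt(s_squared)
print(round(s_squared, 2), round(s, 2))
```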
USES OF STANDARD DEVIATION
• Indicator of financial risk
• Quality control
  - construction of quality control charts
  - process capability studies
• Comparing populations
  - household incomes in two cities
  - employee absenteeism at two plants
STANDARD DEVIATION AS AN
INDICATOR OF FINANCIAL RISK
Financial Security   Annualized Rate of Return
                     Mean (μ)   Standard Deviation (σ)
A                    15%        3%
B                    15%        7%
EMPIRICAL RULE
Data are normally distributed (or approximately
normal)
Distance from the Mean   Percentage of Values Falling Within Distance
μ ± 1σ                   68
μ ± 2σ                   95
μ ± 3σ                   99.7
EXAMPLE -13
A company produces a lightweight valve that is specified to weigh 1365 grams.
Unfortunately, because of imperfections in the manufacturing process not all of the valves produced weigh exactly 1365 grams.
In fact, the weights of the valves produced are normally distributed with a mean weight of 1365 grams and a standard deviation of 294 grams.
Within what range of weights would approximately 95% of the valve weights fall?
Approximately 16% of the weights would be more than what value?
Approximately 0.15% of the weights would be less than what value?
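One way to work this example with the empirical rule is sketched below in Python. The reasoning in the comments is an interpretation of the rule (splitting the leftover percentage equally between the two tails of a symmetric normal distribution); the variable names are illustrative.

```python
mean, sd = 1365, 294   # grams

# About 95% of values fall within 2 standard deviations of the mean.
range_95 = (mean - 2 * sd, mean + 2 * sd)

# About 68% fall within 1 standard deviation, leaving 32% outside,
# so about 16% lie above mean + 1 sd.
upper_16 = mean + sd

# About 99.7% fall within 3 standard deviations, leaving 0.3% outside,
# so about 0.15% lie below mean - 3 sd.
lower_015 = mean - 3 * sd

print(range_95, upper_16, lower_015)
```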
CHEBYSHEV’S THEOREM
Applies to all distributions
$P(\mu - k\sigma < X < \mu + k\sigma) \geq 1 - \frac{1}{k^2}$ for $k > 1$
CHEBYSHEV’S THEOREM
Applies to all distributions
Number of Standard Deviations   Distance from the Mean   Minimum Proportion of Values Falling Within Distance
K = 2                           μ ± 2σ                   1 - 1/2² = 0.75
K = 3                           μ ± 3σ                   1 - 1/3² = 0.89
K = 4                           μ ± 4σ                   1 - 1/4² = 0.94
EXAMPLE – 14
In the computing industry the average age of
professional employees tends to be younger than in
many other business professions.
Suppose the average age of a professional employed
by a particular computer firm is 28 with a standard
deviation of 6 years.
A histogram of professional employee ages with this
firm reveals that the data are not normally
distributed but rather are amassed in the 20s and
that few workers are over 40.
Apply Chebyshev’s theorem to determine within
what range of ages would at least 80% of the
workers’ ages fall.
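A possible way to work this example is to solve 1 - 1/k² = 0.80 for k and then step k standard deviations either side of the mean. The Python sketch below follows that approach; the function name `chebyshev_k` is illustrative.

```python
import math

def chebyshev_k(min_proportion):
    """Solve 1 - 1/k**2 = min_proportion for k (valid for 0 < min_proportion < 1)."""
    return math.sqrt(1 / (1 - min_proportion))

mean_age, sd_age = 28, 6
k = chebyshev_k(0.80)                 # k**2 = 5
low = mean_age - k * sd_age
high = mean_age + k * sd_age
print(round(k, 3), round(low, 2), round(high, 2))
```

Under this reading, at least 80% of the ages fall between roughly 14.58 and 41.42 years.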
COEFFICIENT OF VARIATION
Ratio of the standard deviation to the
mean, expressed as a percentage
Measurement of relative dispersion
$C.V. = \frac{\sigma}{\mu}(100)$
EXAMPLE – 15
Five weeks of average prices for stock A are 57,
68, 64, 71, and 62. Compute a coefficient of
variation.
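A Python sketch of this example. It assumes the five prices are treated as a population, so the standard deviation uses divisor N; if they were treated as a sample (divisor n - 1) the result would be somewhat larger. The function name is illustrative.

```python
import math

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean, as a percentage."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)   # assumption: divisor N
    return sigma / mu * 100

stock_a = [57, 68, 64, 71, 62]
cv_a = coefficient_of_variation(stock_a)
print(round(cv_a, 1))
```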
Z- SCORE
A z score represents the number of standard
deviations a value (x) is above or below the mean
of a set of numbers when the data are normally
distributed.
Using z scores allows translation of a value’s raw
distance from the mean into units of standard
deviations.
If a z score is negative, the raw value (x) is below
the mean. If the z score is positive,
the raw value (x) is above the mean.
EXAMPLE - 16
For a data set that is normally distributed with a
mean of 50 and a standard deviation of
10, determine the z score for a value of 70.
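The z score is the value's distance from the mean divided by the standard deviation, z = (x - μ) / σ. A minimal Python sketch for this example, with an illustrative function name:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z = z_score(70, mean=50, sd=10)
print(z)
```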
EMPIRICAL RULE IN FORM OF Z- SCORE
Between z = -1.00 and z = +1.00 are
approximately 68% of the values.
Between z = -2.00 and z = +2.00 are
approximately 95% of the values.
Between z = -3.00 and z = +3.00 are
approximately 99.7% of the values.
MEASURES OF CENTRAL TENDENCY
AND VARIABILITY: GROUPED DATA
Measures of Central Tendency
Mean
Median
Mode
Measures of Variability
Variance
Standard Deviation
MEAN OF GROUPED DATA
Weighted average of class midpoints
Class frequencies are the weights
$\mu_{grouped} = \frac{\sum fM}{\sum f} = \frac{\sum fM}{N} = \frac{f_1 M_1 + f_2 M_2 + f_3 M_3 + \cdots + f_i M_i}{f_1 + f_2 + f_3 + \cdots + f_i}$
EXAMPLE -17
Class Interval Frequency Class Midpoint fM
20-under 30 6 25 150
30-under 40 18 35 630
40-under 50 11 45 495
50-under 60 11 55 605
60-under 70 3 65 195
70-under 80 1 75 75
Total 50 2150
$\mu = \frac{\sum fM}{\sum f} = \frac{2150}{50} = 43.0$
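The grouped mean can be sketched in Python as a weighted average of class midpoints. Representing each class as a `(lower, upper, frequency)` tuple is a choice made for this sketch only.

```python
# (lower endpoint, upper endpoint, frequency) for each class of the ages distribution.
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11), (50, 60, 11), (60, 70, 3), (70, 80, 1)]

def grouped_mean(classes):
    """Weighted average of class midpoints, with class frequencies as weights."""
    total_f = sum(f for _, _, f in classes)
    total_fm = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fm / total_f

mu_grouped = grouped_mean(classes)
print(mu_grouped)
```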
MEDIAN OF GROUPED DATA
$\text{Median} = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W)$
Where:
$L$ = the lower limit of the median class
$cf_p$ = cumulative frequency of class preceding the median class
$f_{med}$ = frequency of the median class
$W$ = width of the median class
$N$ = total of frequencies
EXAMPLE - 18
Class Interval Frequency Cumulative Frequency
20-under 30 6 6
30-under 40 18 24
40-under 50 11 35
50-under 60 11 46
60-under 70 3 49
70-under 80 1 50
N = 50
$Md = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W) = 40 + \frac{\frac{50}{2} - 24}{11}(10) = 40.909$
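A Python sketch of the grouped median formula. It takes the median class to be the first class whose cumulative frequency reaches N/2, which matches this example; the tuple layout and function name are illustrative.

```python
# (lower endpoint, upper endpoint, frequency) for each class of the ages distribution.
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11), (50, 60, 11), (60, 70, 3), (70, 80, 1)]

def grouped_median(classes):
    """L + ((N/2 - cf_p) / f_med) * W, with the median class found by cumulative count."""
    n = sum(f for _, _, f in classes)
    cf_p = 0                                  # cumulative frequency before the current class
    for lower, upper, f in classes:
        if cf_p + f >= n / 2:                 # first class reaching N/2 is the median class
            return lower + (n / 2 - cf_p) / f * (upper - lower)
        cf_p += f
    raise ValueError("classes must contain at least one observation")

md = grouped_median(classes)
print(round(md, 3))
```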
MODE OF GROUPED DATA
Midpoint of the modal class
Modal class has the greatest frequency
Class Interval Frequency
20-under 30 6
30-under 40 18
40-under 50 11
50-under 60 11
60-under 70 3
70-under 80 1
$\text{Mode} = \frac{30 + 40}{2} = 35$
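A short Python sketch of the same rule. If two classes tie for the greatest frequency, this sketch simply returns the first one, which is a simplification; the names are illustrative.

```python
# (lower endpoint, upper endpoint, frequency) for each class of the ages distribution.
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11), (50, 60, 11), (60, 70, 3), (70, 80, 1)]

def grouped_mode(classes):
    """Midpoint of the class with the greatest frequency (first one on ties)."""
    lower, upper, _ = max(classes, key=lambda c: c[2])
    return (lower + upper) / 2

mode = grouped_mode(classes)
print(mode)
```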
EXAMPLE - 19
For the following grouped data, find the mean, mode,
and median.
VARIANCE AND STANDARD DEVIATION
OF GROUPED DATA - POPULATION
EXAMPLE - 20
Given the following data for a population, find the
variance and standard deviation.
AS PER ORIGINAL FORMULA
AS PER COMPUTATIONAL FORMULA
VARIANCE AND STANDARD DEVIATION
OF GROUPED DATA - SAMPLE
EXAMPLE - 21
Compute the mean, median, mode, variance, and
standard deviation on the following sample data.
MEASURES OF SHAPE
Skewness
Absence of symmetry
Extreme values in one side of a distribution
Kurtosis
Peakedness of a distribution
Leptokurtic: high and thin
Mesokurtic: normal shape
Platykurtic: flat and spread out
Box and Whisker Plots
Graphic display of a distribution
Reveals skewness
SKEWNESS
[Figure: three distribution curves labeled Negatively Skewed, Symmetric (Not Skewed), and Positively Skewed]
SKEWNESS
[Figure: three curves, Negatively Skewed, Symmetric (Not Skewed), and Positively Skewed, each marked with the positions of its mean, median, and mode]
COEFFICIENT OF SKEWNESS
Summary measure for skewness
If S < 0, the distribution is negatively skewed
(skewed to the left).
If S = 0, the distribution is symmetric (not skewed).
If S > 0, the distribution is positively skewed
(skewed to the right).
$S = \frac{3(\mu - M_d)}{\sigma}$
COEFFICIENT OF SKEWNESS
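$\mu_1 = 23,\ M_{d_1} = 26,\ \sigma_1 = 12.3$
$S_1 = \frac{3(\mu_1 - M_{d_1})}{\sigma_1} = \frac{3(23 - 26)}{12.3} = -0.73$

$\mu_2 = 26,\ M_{d_2} = 26,\ \sigma_2 = 12.3$
$S_2 = \frac{3(\mu_2 - M_{d_2})}{\sigma_2} = \frac{3(26 - 26)}{12.3} = 0$

$\mu_3 = 29,\ M_{d_3} = 26,\ \sigma_3 = 12.3$
$S_3 = \frac{3(\mu_3 - M_{d_3})}{\sigma_3} = \frac{3(29 - 26)}{12.3} = +0.73$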
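The three cases can be checked with a short Python sketch of the coefficient, three times the difference between mean and median divided by the standard deviation. The function name is illustrative.

```python
def skewness_coefficient(mean, median, sd):
    """Coefficient of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / sd

s1 = skewness_coefficient(23, 26, 12.3)   # mean below median: negatively skewed
s2 = skewness_coefficient(26, 26, 12.3)   # mean equals median: symmetric
s3 = skewness_coefficient(29, 26, 12.3)   # mean above median: positively skewed
print(round(s1, 2), round(s2, 2), round(s3, 2))
```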
KURTOSIS
Peakedness of a distribution
Leptokurtic: high and thin
Mesokurtic: normal in shape
Platykurtic: flat and spread out
[Figure: three curves labeled Leptokurtic, Mesokurtic, and Platykurtic]
BOX AND WHISKER PLOT
A box-and-whisker plot, sometimes called a box plot, is a diagram that utilizes the upper and lower quartiles along with the median and the two most extreme values to depict a distribution graphically.
Five specific values are used:
Median, Q2
First quartile, Q1
Third quartile, Q3
Minimum value in the data set
Maximum value in the data set
BOX AND WHISKER PLOT
Inner Fences
IQR = Q3 - Q1
Lower inner fence = Q1 - 1.5 IQR
Upper inner fence = Q3 + 1.5 IQR
Outer Fences
Lower outer fence = Q1 - 3.0 IQR
Upper outer fence = Q3 + 3.0 IQR
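The fence formulas can be sketched in Python as below. The quartile values passed in (Q1 = 10, Q3 = 20) are hypothetical and chosen only to make the arithmetic easy to follow; the function name is illustrative.

```python
def fences(q1, q3):
    """Return (inner, outer) fences as (lower, upper) pairs from the quartiles."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    return inner, outer

# Hypothetical quartiles, for illustration only.
inner, outer = fences(q1=10, q3=20)
print(inner, outer)
```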
BOX AND WHISKER PLOT
[Figure: box-and-whisker plot marking Minimum, Q1, Q2, Q3, and Maximum]
SKEWNESS: BOX AND WHISKER PLOTS, AND
COEFFICIENT OF SKEWNESS
[Figure: box-and-whisker plots for three distributions: Negatively Skewed (S < 0), Symmetric (Not Skewed), and Positively Skewed (S > 0)]