
UNIT 1 PART II

Q. A. - 1

Prepared By

Kunal Mojidra

Unit 1 - Part II, QA-1, KUNAL

UNGROUPED VERSUS GROUPED DATA

Ungrouped data

• have not been summarized in any way

• are also called raw data

Grouped data

• have been organized into a frequency distribution


EXAMPLE OF UNGROUPED DATA

Ages of a Sample of Doctors in Gandhinagar

42 30 53 50 52 30 55 49 61 74
26 58 40 40 28 36 30 33 31 37
32 37 30 32 23 32 58 43 30 29
34 50 47 31 35 26 64 46 40 43
57 30 49 40 25 50 52 32 60 54


FREQUENCY DISTRIBUTION OF AGES OF A SAMPLE OF DOCTORS IN GANDHINAGAR

Class Interval    Frequency
20-under 30        6
30-under 40       18
40-under 50       11
50-under 60       11
60-under 70        3
70-under 80        1
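The grouping step can be sketched in a few lines of Python. This is a minimal illustration, not the author's procedure: the function name `frequency_distribution` and its parameters are hypothetical, and the ages are copied from the ungrouped data listed earlier.

```python
# Ages of the 50 doctors, copied from the ungrouped data.
ages = [42, 30, 53, 50, 52, 30, 55, 49, 61, 74,
        26, 58, 40, 40, 28, 36, 30, 33, 31, 37,
        32, 37, 30, 32, 23, 32, 58, 43, 30, 29,
        34, 50, 47, 31, 35, 26, 64, 46, 40, 43,
        57, 30, 49, 40, 25, 50, 52, 32, 60, 54]


def frequency_distribution(values, start, width, n_classes):
    """Count how many values fall in each class [lower, lower + width)."""
    counts = [0] * n_classes
    for v in values:
        index = (v - start) // width
        if 0 <= index < n_classes:
            counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(n_classes)]


table = frequency_distribution(ages, start=20, width=10, n_classes=6)
for lower, upper, freq in table:
    print(f"{lower}-under {upper}: {freq}")
```

Each class includes its lower endpoint and excludes its upper endpoint, which matches the "20-under 30" labelling.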


DATA RANGE

Smallest value = 23, largest value = 74

Range = Largest - Smallest = 74 - 23 = 51


NUMBER OF CLASSES AND CLASS WIDTH

The number of classes should be between 5 and 15.

• Fewer than 5 classes cause excessive summarization.

• More than 15 classes leave too much detail.

Class Width

• Divide the range by the number of classes for an approximate class width

• Round up to a convenient number

$$\text{Approximate Class Width} = \frac{\text{Range}}{\text{No. of classes}} = \frac{51}{6} = 8.5$$

$$\text{Class Width} = 10$$


CLASS MIDPOINT

$$\text{Class Midpoint} = \frac{\text{class beginning point} + \text{class ending point}}{2} = \frac{30 + 40}{2} = 35$$

$$\text{Class Midpoint} = \text{class beginning point} + \frac{1}{2}(\text{class width}) = 30 + \frac{1}{2}(10) = 35$$
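The range, class width, and class midpoint calculations can be checked with a short Python sketch. The variable names are illustrative, and treating "a convenient number" as the next multiple of 10 is an assumption made only for this example.

```python
import math

smallest, largest = 23, 74   # extremes of the doctors' ages
n_classes = 6                # number of classes used in the text

data_range = largest - smallest          # 74 - 23
approx_width = data_range / n_classes    # 51 / 6

# Assumption: a "convenient" width is the next multiple of 10.
class_width = math.ceil(approx_width / 10) * 10


def class_midpoint(lower, upper):
    """Midpoint of a class from its beginning and ending points."""
    return (lower + upper) / 2


print(data_range, approx_width, class_width, class_midpoint(30, 40))
```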

RELATIVE AND CUMULATIVE FREQUENCY

Relative frequency is the proportion of the total frequency that is in any given class interval in a frequency distribution.

Relative frequency is the individual class frequency divided by the total frequency.

The cumulative frequency is a running total of frequencies through the classes of a frequency distribution.

The cumulative frequency for each class interval is the frequency for that class interval added to the preceding cumulative total.
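As a small sketch of both definitions, the class frequencies of the doctors' ages distribution can be turned into relative and cumulative frequencies as follows. The variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies for 20-under 30, 30-under 40, ..., 70-under 80.
frequencies = [6, 18, 11, 11, 3, 1]
total = sum(frequencies)

# Relative frequency: class frequency divided by the total frequency.
relative = [f / total for f in frequencies]

# Cumulative frequency: running total through the classes.
cumulative = list(accumulate(frequencies))

for f, r, c in zip(frequencies, relative, cumulative):
    print(f"{f:>3} {r:>6.2f} {c:>3}")
```

The last cumulative value always equals the total frequency, which is a quick check on the arithmetic.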


EXAMPLE – 1


QUANTITATIVE DATA GRAPHS

One of the most effective mechanisms for presenting data in a form meaningful to decision makers is graphical depiction.

Through graphs and charts, the decision maker can often get an overall picture of the data and reach some useful conclusions merely by studying the chart or graph.

Converting data to graphics can be creative and artful.

Often the most difficult step in this process is to reduce important and sometimes expensive data to a graphic picture that is both clear and concise and yet consistent with the message of the original data.

One of the most important uses of graphical depiction in statistics is to help the researcher determine the shape of a distribution.


Quantitative data graphs are plotted along a numerical scale, and qualitative graphs are plotted using non-numerical categories.


Quantitative data graphs

• histogram

• frequency polygon

• ogive

• dot plot

• stem-and-leaf plot

EXAMPLE 2


HISTOGRAMS

One of the more widely used types of graphs for quantitative data is the histogram.

A histogram is a series of contiguous bars or rectangles that represent the frequency of data in given class intervals.

If the class intervals used along the horizontal axis are equal, then the heights of the bars represent the frequency of values in a given class interval.


If the class intervals are unequal, then the areas of the bars (rectangles) can be used for relative comparisons of class frequencies.

A histogram is a useful tool for differentiating the frequencies of class intervals.

A quick glance at a histogram reveals which class intervals produce the highest frequency totals.


FREQUENCY POLYGONS

A frequency polygon, like the histogram, is a graphical display of class frequencies.

However, instead of using bars or rectangles like a histogram, in a frequency polygon each class frequency is plotted as a dot at the class midpoint, and the dots are connected by a series of line segments.

Construction of a frequency polygon begins by scaling class midpoints along the horizontal axis and the frequency scale along the vertical axis.

A dot is plotted for the associated frequency value at each class midpoint.

Connecting these midpoint dots completes the graph.


OGIVES

An ogive (o-jive) is a cumulative frequency polygon.

Construction begins by labeling the x-axis with the class endpoints and the y-axis with the frequencies.

However, the use of cumulative frequency values requires that the scale along the y-axis be great enough to include the frequency total.

A dot of zero frequency is plotted at the beginning of the first class, and construction proceeds by marking a dot at the end of each class interval for the cumulative value.

Connecting the dots then completes the ogive.


DOT PLOTS

A relatively simple statistical chart that is generally used to display continuous, quantitative data is the dot plot.

In a dot plot, each data value is plotted along the horizontal axis and is represented on the chart by a dot.

If multiple data points have the same values, the dots will stack up vertically.

Dot plots can be especially useful for observing the overall shape of the distribution of data points along with identifying data values or intervals for which there are groupings and gaps in the data.


STEM-AND-LEAF PLOT

Another way to organize raw data into groups besides using a frequency distribution is a stem-and-leaf plot.

This technique is simple and provides a unique view of the data.

A stem-and-leaf plot is constructed by separating the digits for each number of the data into two groups, a stem and a leaf.

The leftmost digits are the stem and consist of the higher valued digits.

The rightmost digits are the leaves and contain the lower values.
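A minimal Python sketch of this construction for two-digit values is shown here. It assumes the tens digit is the stem and the units digit is the leaf. The function name `stem_and_leaf` is hypothetical, and the sample is the first ten ages from the doctors' data.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group sorted values by stem (tens digit) with leaves (units digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)


sample = [42, 30, 53, 50, 52, 30, 55, 49, 61, 74]
plot = stem_and_leaf(sample)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```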


QUALITATIVE GRAPHS

In contrast to quantitative data graphs that are plotted along a numerical scale, qualitative graphs are plotted using non-numerical categories.


PIE CHARTS

A pie chart is a circular depiction of data where the area of the whole pie represents 100% of the data and slices of the pie represent a percentage breakdown of the sublevels.

Pie charts show the relative magnitudes of the parts to the whole.

They are widely used in business, particularly to depict such things as budget categories, market share, and time/resource allocations.

However, the use of pie charts is minimized in the sciences and technology because pie charts can lead to less accurate judgments than are possible with other types of graphs.

Generally, it is more difficult for the viewer to interpret the relative size of angles in a pie chart than to judge the length of rectangles in a bar chart.


EXAMPLE – 3


[Figure: Pie chart of Leading Petroleum Refining Companies]


BAR GRAPHS

Another widely used qualitative data graphing technique is the bar graph or bar chart.

A bar graph or chart contains two or more categories along one axis and a series of bars, one for each category, along the other axis.

Typically, the length of the bar represents the magnitude of the measure (amount, frequency, money, percentage, etc.) for each category.

The bar graph is qualitative because the categories are non-numerical, and it may be either horizontal or vertical.


In Excel, horizontal bar graphs are referred to as bar charts, and vertical bar graphs are referred to as column charts.

A bar graph generally is constructed from the same type of data that is used to produce a pie chart.

However, an advantage of using a bar graph over a pie chart for a given set of data is that for categories that are close in value, it is considered easier to see the difference in the bars of a bar graph than to discriminate between pie slices.


EXAMPLE – 4


[Figure: Bar graph of How Much is Spent on Back to College Shopping by the Average Student]


PARETO CHARTS

A third type of qualitative data graph is a Pareto chart, which could be viewed as a particular application of the bar graph.

An important concept and movement in business is total quality management.

One of the important aspects of total quality management is the constant search for causes of problems in products and processes.

A graphical technique for displaying problem causes is Pareto analysis.

Pareto analysis is a quantitative tallying of the number and types of defects that occur with a product or service.

Analysts use this tally to produce a vertical bar chart that displays the most common types of defects, ranked in order of occurrence from left to right. The bar chart is called a Pareto chart.


EXAMPLE – 6

Suppose the number of electric motors being rejected by inspectors for a company has been increasing.

Company officials examine the records of several hundred of the motors in which at least one defect was found to determine which defects occurred more frequently.

They find that 40% of the defects involved poor wiring, 30% involved a short in the coil, 25% involved a defective plug, and 5% involved cessation of bearings.
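The tallying behind a Pareto chart can be sketched as follows: rank the defect types from most to least frequent and keep a running cumulative percentage. The category labels are shortened from the example, and the variable names are illustrative.

```python
# Defect percentages from the electric motor example.
defects = {
    "Short in coil": 30,
    "Poor wiring": 40,
    "Bearings ceased": 5,
    "Defective plug": 25,
}

# Rank categories from most to least frequent (left to right on the chart).
ranked = sorted(defects.items(), key=lambda item: item[1], reverse=True)

pareto_rows = []
running = 0
for name, percent in ranked:
    running += percent
    pareto_rows.append((name, percent, running))

for name, percent, cumulative in pareto_rows:
    print(f"{name:<16} {percent:>3}% {cumulative:>4}%")
```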


GRAPHICAL DEPICTION OF TWO-VARIABLE NUMERICAL DATA: SCATTER PLOTS

Many times in business research it is important to explore the relationship between two numerical variables.

A graphical mechanism for examining the relationship between two numerical variables is the scatter plot (or scatter diagram).

A scatter plot is a two-dimensional graph plot of pairs of points from two numerical variables.


SCATTER PLOT

Registered Vehicles (1000's)    Petrol Sales (1000's of Liters)
 5                               60
15                              120
 9                               90
15                              140
 7                               60

[Figure: Scatter plot of Petrol Sales (vertical axis) against Registered Vehicles (horizontal axis)]

MEASURES OF CENTRAL TENDENCY: UNGROUPED DATA

One type of measure that is used to describe a set of data is the measure of central tendency.

Measures of central tendency yield information about the center, or middle part, of a group of numbers.

Measures of central tendency do not focus on the span of the data set or how far values are from the middle numbers.

The measures of central tendency here for ungrouped data are the mode, the median, the mean, percentiles, and quartiles.


MODE

The mode is the most frequently occurring value in a set of data.

In the case of a tie for the most frequently occurring value, two modes are listed. Then the data are said to be bimodal.

Data sets with more than two modes are referred to as multimodal.


MEDIAN

The median is the middle value in an ordered array of numbers.

For an array with an odd number of terms, the median is the middle number.

For an array with an even number of terms, the median is the average of the two middle numbers.

A disadvantage of the median is that not all the information from the numbers is used.

For example, information about the specific asking price of the most expensive house does not really enter into the computation of the median.

The level of data measurement must be at least ordinal for a median to be meaningful.


The following steps are used to determine the median.

STEP 1. Arrange the observations in an ordered data array.

STEP 2. For an odd number of terms, find the middle term of the ordered array. It is the median.

STEP 3. For an even number of terms, find the average of the middle two terms. This average is the median.
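The three steps translate directly into a short Python function. This is a sketch for illustration. The function name `median` and the two sample lists are made up for the example.

```python
def median(values):
    """Median by the three steps: sort, then take the middle term(s)."""
    ordered = sorted(values)                 # STEP 1: ordered array
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                           # STEP 2: odd number of terms
        return ordered[middle]
    # STEP 3: even number of terms, average the middle two
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([17, 5, 18, 9, 16]))   # odd count
print(median([17, 5, 9, 16]))       # even count
```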


MEAN

The arithmetic mean is the average of a group of numbers and is computed by summing all numbers and dividing by the number of numbers.

Because the arithmetic mean is so widely used, most statisticians refer to it simply as the mean.


EXAMPLE – 7


The mean is affected by each and every value, which is an advantage.

The mean uses all the data, and each data item influences the mean.

It is also a disadvantage because extremely large or small values can cause the mean to be pulled toward the extreme value.

The mean is the most commonly used measure of central tendency because it uses each data item in its computation, it is a familiar measure, and it has mathematical properties that make it attractive to use in inferential statistics analysis.


VARIABILITY

[Figure: Two cash flow series with the same mean, one with no variability in cash flow and one with variability in cash flow]

PERCENTILES

Percentiles are measures of central tendency that divide a group of data into 100 parts.

There are 99 percentiles because it takes 99 dividers to separate a group of data into 100 parts.

The nth percentile is the value such that at least n percent of the data are below that value and at most (100 - n) percent are above that value.

Example: The 90th percentile indicates that at least 90% of the data lie below it, and at most 10% of the data lie above it.

The median and the 50th percentile have the same value.

Applicable for ordinal, interval, and ratio data

Not applicable for nominal data


STEPS IN DETERMINING THE LOCATION OF A PERCENTILE


EXAMPLE – 8

Determine the 30th percentile of the following eight numbers: 14, 12, 19, 23, 5, 13, 28, 17.
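The location steps are not written out in this section, so the sketch below rests on an assumption: it uses the common textbook location rule. Sort the data in ascending order and compute i = (P/100)(N). If i is a whole number, the percentile is the average of the values at positions i and i + 1. Otherwise it is the value at the position given by the whole-number part of i plus 1. The function name `percentile` is hypothetical.

```python
def percentile(values, p):
    """Percentile by the location rule i = (p / 100) * n (assumed rule)."""
    if not 1 <= p <= 99:
        raise ValueError("p must be between 1 and 99")
    ordered = sorted(values)
    n = len(ordered)
    # Integer arithmetic avoids floating-point trouble when testing
    # whether i is a whole number.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # i is whole: average the values at positions i and i + 1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not whole: take the value at position int(i) + 1.
    return ordered[whole]


data = [14, 12, 19, 23, 5, 13, 28, 17]
print(percentile(data, 30))   # i = 2.4, so the 3rd ordered value
```

For these eight numbers the ordered array is 5, 12, 13, 14, 17, 19, 23, 28 and i = 2.4, so under this rule the 30th percentile is the third value, 13.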


QUARTILES

Measures of central tendency that divide a group of data into four subgroups

Q1: 25% of the data set is below the first quartile

Q2: 50% of the data set is below the second quartile

Q3: 75% of the data set is below the third quartile

Q1 is equal to the 25th percentile

Q2 is located at the 50th percentile and equals the median

Q3 is equal to the 75th percentile

Quartile values are not necessarily members of the data set


EXAMPLE – 9


EXAMPLE – 10


EXAMPLE – 11


EXAMPLE – 12


MEASURES OF VARIABILITY: UNGROUPED DATA

Measures of variability describe the spread or the dispersion of a set of data.

Common Measures of Variability

• Range

• Interquartile Range

• Mean Absolute Deviation

• Variance

• Standard Deviation

• Z scores

• Coefficient of Variation


RANGE

The difference between the largest and the smallest values in a set of data

Simple to compute

Ignores all data points except the two extremes

Example:

35 37 37 39 40 40 41 41
43 43 43 43 44 44 44 44
44 45 45 46 46 46 46 48

Range = Largest - Smallest = 48 - 35 = 13


INTERQUARTILE RANGE

Range of values between the first and third quartiles

Range of the "middle half"

Less influenced by extremes

Interquartile Range = Q3 - Q1


DEVIATION FROM THE MEAN

Data set: 5, 9, 16, 17, 18

Mean:

$$\mu = \frac{\sum X}{N} = \frac{65}{5} = 13$$

Deviations from the mean: -8, -4, 3, 4, 5


MEAN ABSOLUTE DEVIATION

Average of the absolute deviations from the mean

X        X - μ    |X - μ|
5        -8       +8
9        -4       +4
16       +3       +3
17       +4       +4
18       +5       +5
Total     0       24

$$M.A.D. = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$$
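The same calculation in Python, as a quick check of the table. The variable names are illustrative.

```python
data = [5, 9, 16, 17, 18]

mean = sum(data) / len(data)
deviations = [x - mean for x in data]

# Deviations from the mean always sum to zero, which is why the
# absolute values are averaged instead.
mad = sum(abs(d) for d in deviations) / len(data)

print(mean, deviations, sum(deviations), mad)
```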


POPULATION VARIANCE

Average of the squared deviations from the arithmetic mean

X        X - μ    (X - μ)^2
5        -8        64
9        -4        16
16       +3         9
17       +4        16
18       +5        25
Total     0       130

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$


POPULATION STANDARD DEVIATION

Square root of the variance

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$

$$\sigma = \sqrt{\sigma^2} = \sqrt{26.0} = 5.1$$
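A short Python sketch of the population variance and standard deviation for the data set 5, 9, 16, 17, 18. The function names are illustrative.

```python
import math


def population_variance(values):
    """Average of the squared deviations from the mean (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def population_std_dev(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


data = [5, 9, 16, 17, 18]
variance = population_variance(data)
std_dev = population_std_dev(data)
print(variance, round(std_dev, 1))
```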


SAMPLE VARIANCE

Sum of the squared deviations from the arithmetic mean, divided by n - 1

X          X - X̄     (X - X̄)^2
2,398       625      390,625
1,844        71        5,041
1,539      -234       54,756
1,311      -462      213,444
Total
7,092         0      663,866

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$


SAMPLE STANDARD DEVIATION

Square root of the sample variance

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$

$$S = \sqrt{S^2} = \sqrt{221{,}288.67} = 470.41$$
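The sample versions differ from the population versions only in the divisor, n - 1 instead of N. A minimal sketch using the four sample values 2,398, 1,844, 1,539, and 1,311 follows. The function names are illustrative.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std_dev(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


sample = [2398, 1844, 1539, 1311]
s_squared = sample_variance(sample)
s = sample_std_dev(sample)
print(round(s_squared, 2), round(s, 2))
```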


USES OF STANDARD DEVIATION

• Indicator of financial risk

• Quality Control

    • construction of quality control charts

    • process capability studies

• Comparing populations

    • household incomes in two cities

    • employee absenteeism at two plants


STANDARD DEVIATION AS AN INDICATOR OF FINANCIAL RISK

                      Annualized Rate of Return
Financial Security    μ       σ
A                     15%     3%
B                     15%     7%


EMPIRICAL RULE

Data are normally distributed (or approximately normal)

Distance from the Mean    Percentage of Values Falling Within Distance
μ ± 1σ                    68
μ ± 2σ                    95
μ ± 3σ                    99.7

EXAMPLE -13

A company produces a lightweight valve that is specified to weigh 1365 grams.

Unfortunately, because of imperfections in the manufacturing process not all of the valves produced weigh exactly 1365 grams.

In fact, the weights of the valves produced are normally distributed with a mean weight of 1365 grams and a standard deviation of 294 grams.

Within what range of weights would approximately 95% of the valve weights fall?

Approximately 16% of the weights would be more than what value?

Approximately 0.15% of the weights would be less than what value?
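One way to work the three questions with the empirical rule is sketched below. The reasoning is that about 95% of values lie within 2 standard deviations of the mean. About 68% lie within 1, leaving roughly 16% in each tail. About 99.7% lie within 3, leaving roughly 0.15% in each tail. The variable and function names are illustrative.

```python
mu, sigma = 1365, 294   # mean and standard deviation of valve weights (grams)


def interval(k):
    """Interval within k standard deviations of the mean."""
    return (mu - k * sigma, mu + k * sigma)


# About 95% of the weights fall within mu +/- 2 sigma.
range_95 = interval(2)

# About 68% fall within mu +/- 1 sigma, so about 16% lie above mu + 1 sigma.
upper_16 = mu + 1 * sigma

# About 99.7% fall within mu +/- 3 sigma, so about 0.15% lie below mu - 3 sigma.
lower_015 = mu - 3 * sigma

print(range_95, upper_16, lower_015)
```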


CHEBYSHEV’S THEOREM

Applies to all distributions

$$P(\mu - k\sigma < X < \mu + k\sigma) \geq 1 - \frac{1}{k^2} \quad \text{for } k > 1$$


Number of Standard Deviations    Distance from the Mean    Minimum Proportion of Values Falling Within Distance
K = 2                            μ ± 2σ                    1 - 1/2^2 = 0.75
K = 3                            μ ± 3σ                    1 - 1/3^2 = 0.89
K = 4                            μ ± 4σ                    1 - 1/4^2 = 0.94

EXAMPLE – 14

In the computing industry the average age of professional employees tends to be younger than in many other business professions.

Suppose the average age of a professional employed by a particular computer firm is 28 with a standard deviation of 6 years.

A histogram of professional employee ages with this firm reveals that the data are not normally distributed but rather are amassed in the 20s and that few workers are over 40.

Apply Chebyshev's theorem to determine within what range of ages would at least 80% of the workers' ages fall.
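A possible way to work this example: set 1 - 1/k^2 equal to 0.80, solve for k, and then take the mean plus or minus k standard deviations. The sketch below follows that reasoning. The function name `chebyshev_k` is hypothetical.

```python
import math


def chebyshev_k(proportion):
    """Solve 1 - 1/k**2 = proportion for k (requires 0 < proportion < 1)."""
    return 1 / math.sqrt(1 - proportion)


mean_age, std_dev = 28, 6

k = chebyshev_k(0.80)            # 1 - 1/k^2 = 0.80, so k^2 = 5
low = mean_age - k * std_dev
high = mean_age + k * std_dev

print(round(k, 2), round(low, 2), round(high, 2))
```

Under these assumptions at least 80% of the ages fall between roughly 14.6 and 41.4 years.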


COEFFICIENT OF VARIATION

Ratio of the standard deviation to the mean, expressed as a percentage

Measurement of relative dispersion

$$C.V. = \frac{\sigma}{\mu}(100)$$

EXAMPLE – 15

Five weeks of average prices for stock A are 57, 68, 64, 71, and 62. Compute a coefficient of variation.
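A sketch of this computation is given below. The example does not say whether the five prices are a population or a sample, so treating them as a population (dividing by N) is an assumption made here. With the sample formula the result would be somewhat larger.

```python
import math

prices = [57, 68, 64, 71, 62]

n = len(prices)
mean = sum(prices) / n

# Assumption: the five weeks are treated as a population (divide by N).
variance = sum((x - mean) ** 2 for x in prices) / n
std_dev = math.sqrt(variance)

# Coefficient of variation: standard deviation relative to the mean, in percent.
cv = std_dev / mean * 100

print(round(mean, 1), round(std_dev, 2), round(cv, 1))
```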

Z- SCORE

A z score represents the number of standard deviations a value (x) is above or below the mean of a set of numbers when the data are normally distributed.

Using z scores allows translation of a value's raw distance from the mean into units of standard deviations.

If a z score is negative, the raw value (x) is below the mean. If the z score is positive, the raw value (x) is above the mean.


EXAMPLE - 16

For a data set that is normally distributed with a mean of 50 and a standard deviation of 10, determine the z score for a value of 70.
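Following the definition of a z score as the number of standard deviations a value lies from the mean, the standard formula is z = (x - μ) / σ. A minimal sketch with a hypothetical function name:

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above or below the mean."""
    return (x - mean) / std_dev


z = z_score(70, mean=50, std_dev=10)
print(z)
```

A value of 70 is therefore 2 standard deviations above the mean of 50.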


EMPIRICAL RULE IN FORM OF Z- SCORE

Between z = -1.00 and z = +1.00 are approximately 68% of the values.

Between z = -2.00 and z = +2.00 are approximately 95% of the values.

Between z = -3.00 and z = +3.00 are approximately 99.7% of the values.


MEASURES OF CENTRAL TENDENCY AND VARIABILITY: GROUPED DATA

• Measures of Central Tendency

    • Mean

    • Median

    • Mode

• Measures of Variability

    • Variance

    • Standard Deviation


MEAN OF GROUPED DATA

Weighted average of class midpoints

Class frequencies are the weights

$$\mu_{grouped} = \frac{\sum fM}{\sum f} = \frac{\sum fM}{N} = \frac{f_1 M_1 + f_2 M_2 + f_3 M_3 + \cdots + f_i M_i}{f_1 + f_2 + f_3 + \cdots + f_i}$$


EXAMPLE -17

Class Interval    Frequency (f)    Class Midpoint (M)    fM
20-under 30        6               25                    150
30-under 40       18               35                    630
40-under 50       11               45                    495
50-under 60       11               55                    605
60-under 70        3               65                    195
70-under 80        1               75                     75
Total             50                                    2150

$$\mu = \frac{\sum fM}{\sum f} = \frac{2150}{50} = 43.0$$
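The grouped mean can be reproduced with a short Python sketch. The tuple layout `(lower, upper, frequency)` and the function name are illustrative choices.

```python
# (lower endpoint, upper endpoint, frequency) for each class interval.
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]


def grouped_mean(classes):
    """Weighted average of class midpoints, weighted by class frequency."""
    total_f = sum(f for _, _, f in classes)
    total_fm = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fm / total_f


mean = grouped_mean(classes)
print(mean)
```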


MEDIAN OF GROUPED DATA

$$\text{Median} = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W)$$

Where:

L = the lower limit of the median class

cf_p = cumulative frequency of class preceding the median class

f_med = frequency of the median class

W = width of the median class

N = total of frequencies


EXAMPLE - 18

Class Interval    Frequency    Cumulative Frequency
20-under 30        6            6
30-under 40       18           24
40-under 50       11           35
50-under 60       11           46
60-under 70        3           49
70-under 80        1           50
                  N = 50

$$Md = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W) = 40 + \frac{\frac{50}{2} - 24}{11}(10) = 40.909$$
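The grouped median formula can be sketched in Python as follows. The median class is taken to be the first class whose cumulative frequency reaches N/2. The tuple layout and function name are illustrative.

```python
# (lower endpoint, upper endpoint, frequency) for each class interval.
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]


def grouped_median(classes):
    """Median = L + ((N/2 - cf_p) / f_med) * W for the median class."""
    n = sum(f for _, _, f in classes)
    cf_p = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cf_p + f >= n / 2:
            width = upper - lower
            return lower + (n / 2 - cf_p) / f * width
        cf_p += f
    raise ValueError("empty frequency distribution")


median = grouped_median(classes)
print(round(median, 3))
```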


MODE OF GROUPED DATA

Midpoint of the modal class

Modal class has the greatest frequency

Class Interval    Frequency
20-under 30        6
30-under 40       18
40-under 50       11
50-under 60       11
60-under 70        3
70-under 80        1

$$\text{Mode} = \frac{30 + 40}{2} = 35$$

EXAMPLE - 19

For the following grouped data, find the mean, mode, and median.


VARIANCE AND STANDARD DEVIATION OF GROUPED DATA - POPULATION

EXAMPLE - 20

For the following population data, find the variance and standard deviation.


AS PER ORIGINAL FORMULA


AS PER COMPUTATIONAL FORMULA


VARIANCE AND STANDARD DEVIATION OF GROUPED DATA - SAMPLE

EXAMPLE - 21

Compute the mean, median, mode, variance, and standard deviation on the following sample data.


MEASURES OF SHAPE

• Skewness

    • Absence of symmetry

    • Extreme values in one side of a distribution

• Kurtosis

    • Peakedness of a distribution

    • Leptokurtic: high and thin

    • Mesokurtic: normal shape

    • Platykurtic: flat and spread out

• Box and Whisker Plots

    • Graphic display of a distribution

    • Reveals skewness


SKEWNESS

[Figure: Three distribution curves: negatively skewed, symmetric (not skewed), and positively skewed]


SKEWNESS

[Figure: Positions of the mean, median, and mode on three distribution curves: negatively skewed, symmetric (not skewed), and positively skewed]


COEFFICIENT OF SKEWNESS

Summary measure for skewness

$$S = \frac{3(\mu - M_d)}{\sigma}$$

If S < 0, the distribution is negatively skewed (skewed to the left).

If S = 0, the distribution is symmetric (not skewed).

If S > 0, the distribution is positively skewed (skewed to the right).



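$$\mu_1 = 23, \quad M_{d_1} = 26, \quad \sigma_1 = 12.3, \quad S_1 = \frac{3(\mu_1 - M_{d_1})}{\sigma_1} = \frac{3(23 - 26)}{12.3} = -0.73$$

$$\mu_2 = 26, \quad M_{d_2} = 26, \quad \sigma_2 = 12.3, \quad S_2 = \frac{3(\mu_2 - M_{d_2})}{\sigma_2} = \frac{3(26 - 26)}{12.3} = 0$$

$$\mu_3 = 29, \quad M_{d_3} = 26, \quad \sigma_3 = 12.3, \quad S_3 = \frac{3(\mu_3 - M_{d_3})}{\sigma_3} = \frac{3(29 - 26)}{12.3} = +0.73$$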
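The three cases can be verified with a small Python sketch of the coefficient of skewness, S = 3(μ - Md)/σ. The function name is illustrative.

```python
def skewness_coefficient(mean, median, std_dev):
    """Coefficient of skewness: S = 3 * (mean - median) / std_dev."""
    return 3 * (mean - median) / std_dev


cases = [(23, 26, 12.3), (26, 26, 12.3), (29, 26, 12.3)]
results = [round(skewness_coefficient(m, md, s), 2) for m, md, s in cases]

for (m, md, s), value in zip(cases, results):
    if value < 0:
        shape = "negatively skewed"
    elif value > 0:
        shape = "positively skewed"
    else:
        shape = "symmetric"
    print(m, md, s, value, shape)
```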


KURTOSIS

Peakedness of a distribution

Leptokurtic: high and thin

Mesokurtic: normal in shape

Platykurtic: flat and spread out

[Figure: Leptokurtic, mesokurtic, and platykurtic distribution curves]

BOX AND WHISKER PLOT

A box-and-whisker plot, sometimes called a box plot, is a diagram that utilizes the upper and lower quartiles along with the median and the two most extreme values to depict a distribution graphically.

Five specific values are used:

Median, Q2

First quartile, Q1

Third quartile, Q3

Minimum value in the data set

Maximum value in the data set



Inner Fences

IQR = Q3 - Q1

Lower inner fence = Q1 - 1.5 IQR

Upper inner fence = Q3 + 1.5 IQR

Outer Fences

Lower outer fence = Q1 - 3.0 IQR

Upper outer fence = Q3 + 3.0 IQR
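The fence formulas can be sketched as a small helper. The quartile values used here (Q1 = 12.5, Q3 = 21.0) are hypothetical inputs chosen only to exercise the formulas, and the function name is illustrative.

```python
def fences(q1, q3):
    """Inner and outer fences for a box-and-whisker plot."""
    iqr = q3 - q1
    return {
        "iqr": iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "lower_outer": q1 - 3.0 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }


# Hypothetical quartiles for illustration.
result = fences(q1=12.5, q3=21.0)
for name, value in result.items():
    print(name, value)
```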



[Figure: Box-and-whisker plot marking the minimum, Q1, Q2, Q3, and maximum]


SKEWNESS: BOX AND WHISKER PLOTS, AND


[Figure: Box-and-whisker plots for negatively skewed (S < 0), symmetric (not skewed), and positively skewed (S > 0) distributions]