QUANTITATIVE METHODS STUDY GUIDE PROGRAMME : MBA Year 1 CREDIT POINTS : 20 points NOTIONAL LEARNING : 200 hours over 1 semester TUTOR SUPPORT : [email protected] Copyright © 2013 MANAGEMENT COLLEGE OF SOUTHERN AFRICA All rights reserved; no part of this book may be reproduced in any form or by any means, including photocopying machines, without the written permission of the publisher REF: EQM 2013

MBA 1 Quantitative Methods January 2013

  • QUANTITATIVE METHODS

    STUDY GUIDE

    PROGRAMME : MBA Year 1

    CREDIT POINTS : 20 points

    NOTIONAL LEARNING : 200 hours over 1 semester

    TUTOR SUPPORT : [email protected]

    Copyright 2013

    MANAGEMENT COLLEGE OF SOUTHERN AFRICA

    All rights reserved; no part of this book may be reproduced in any form or by any means, including

    photocopying machines, without the written permission of the publisher

    REF: EQM 2013

  • Quantitative Methods

    MANCOSA - MBA 1

    TABLE OF CONTENTS

    UNIT   TITLE OF SECTION
           General Outcomes
           Prescribed Reading
    1      Graphical Representation
    2      Measure of Central Tendency
    3      Measure of Dispersion (Variability)
    4      Probability
    5      Probability Distribution
    6      Hypothesis Testing
    7      Simple Linear Regression and Correlation Analysis
    8      Forecasting: Time Series Analysis
    9      Decision Analysis: Decision Trees and Payoff Tables
           Solutions to Units Exercises
           References
           Tables

    General Outcomes

    Studying this module will enable the student to:

    Apply simple statistical tools and analyses to solve business-related problems.

    Interpret and analyse business data for production, planning, forecasting and other decision-making

    functions.

    Communicate effectively with statistical analysts.

    Apply quantitative methods and techniques to other management disciplines: Economics, Accounting, Financial Management, Marketing and Research.

    Syllabus: The syllabus for the module is as follows:

    Topic 1: Descriptive Statistics:

    a. Graphical Representation

    b. Measures of central Tendency

    c. Measures of spread

    d. Probability and Probability distributions

    Topic 2: Inferential Statistics:

    a. Hypothesis testing

    b. Simple linear regression and correlation analysis

    Topic 3: Forecasting: Time series analysis

    Topic 4: Decision Analysis: Decision Trees and payoff tables

    Topic 5: Time Value of Money

    a. Simple and Compound Interest

    b. Depreciation

    c. Present Value

    d. NPV

    e. IRR

    READING

    Prescribed Textbook:

    Trevor Wegner (2006). Applied Business Statistics: Methods and Applications, Juta & Co, Ltd: Cape Town

    Recommended Textbook:

    Lind, Marchal and Wathen (2005). Statistical Techniques in Business and Economics (12th Edition), New York:

    McGraw-Hill. Chapter 1

    The purpose of this course

    Statistics as a subject has been included in the MBA curriculum because it is needed in two main areas:

    1. Descriptive statistics are used in subjects like Finance, Operations etc. to describe business phenomena.

    When you get to these study areas it will be explained where they are used, and

    2. It is a requirement for an MBA degree that you must complete a research project. In this research project

    you will have to collect data. In processing the data to make decisions you will need inference. Inference

    (hypothesis testing) is covered in the latter part of this course.

    UNIT 1: GRAPHICAL REPRESENTATION

    OBJECTIVES

    By the end of this study unit, you should be able to:

    1. Recognise whether the type of data under consideration is quantitative, qualitative, or ranked.

    2. Summarise a set of quantitative data by means of a frequency distribution, histogram, and relative frequency polygon.

    3. Summarise a set of qualitative data by means of a pie chart and bar chart.

    CONTENTS

    1.1 Introduction

    1.2 Types of data

    1.3 Graphical Techniques for Quantitative Data

    1.4 Pie Charts, Bar Charts, and Line Charts

    1.5 Scatter Diagrams

    1.1 Introduction

    The basic types of data are described in this unit. Section 1.3 presents some graphical methods for displaying the data.

    1.2 Types of data

    Statistics is the science of collecting and analyzing data. Data are obtained by measuring the values of one or

    more variables. Data can be classified as either quantitative data or qualitative data.

    Quantitative data are measurements that are recorded on a naturally occurring numerical scale.

    Some examples of quantitative data are:

    The time that you have to wait for the next bus.

    Your height or weight

    Qualitative data can only be classified into categories like:

    The political party that you support

    Your gender

    Sometimes arbitrary numerical values are assigned to qualitative data, for example coding males as 1.

    The appropriate graphical method to be used in presenting data depends, in part, on the type of data

    under consideration. Later in the guide, when statistical inference is covered, the data type will help to

    identify the appropriate statistical technique to be used in solving a problem. In a few situations, it will be

    necessary to recognise whether or not a set of non-quantitative data can be ordered. If the categories for

    a set of non-quantitative data can be ordered or ranked, we have a third type of data, called ranked data.

    SELF-ASSESSMENT ACTIVITY 1.1

    How do I identify quantitative data?

    SOLUTION TO SELF-ASSESSMENT ACTIVITY 1.1

    Quantitative data are real numbers. They are not numbers arbitrarily assigned to represent qualitative data. An

    experiment that produces qualitative data always asks for verbal, non-numerical responses (e.g., yes and no;

    defective and non-defective; Catholic, Protestant, and other).

    Numerical data can also be classified as discrete (when only specific values can occur, like the number of students in a class) or continuous (when intermediate values are possible, like your height, which can always be measured more accurately).

    Continuous data are sometimes summarized in tables where the number of data items in each interval is given.

    See the example of interval data in the next table:

    Mass (kg) Frequency

    45-49 6

    50-54 14

    55-59 25

    60-64 11

    SELF-ASSESSMENT ACTIVITY 1.2

    How do I identify quantitative data?

    For each of the following examples of data, determine whether the data type is quantitative, qualitative,

    or ranked.

    a) the weekly level of the prime interest rate during the past year.

    b) the make of car driven by each of a sample of executives.

    c) the number of contacts made by each of a company's salespersons during a week.

    d) the rating (excellent, good, fair, or poor) given to a particular television program by each of a sample

    of viewers.

    e) the number of shares traded on the New York Stock Exchange each week throughout 2005.

    SOLUTION TO SELF-ASSESSMENT ACTIVITY 1.2

    a) Quantitative, if the interest rate level is expressed as a percentage. If the level is simply observed as

    being high, moderate, or low, then the data type is qualitative.

    b) Qualitative.

    c) Quantitative.

    d) Ranked, because the categories can be ordered.

    e) Quantitative.

    1.3 Graphical Techniques for Quantitative Data

    This section introduces the basic methods of descriptive statistics used for organising a set of numerical data in tabular form and presenting it graphically. Summarising data in this way requires that you first group the data into classes. Judgment is required concerning the number and the size of the classes to be used. The presentation of the grouped data should enable the user to quickly grasp the general shape of the distribution of the data.

    SELF-ASSESSMENT ACTIVITY 1.3

    How do I choose the number of classes and the width of the classes to be used in constructing a

    frequency distribution?

    SOLUTION TO SELF-ASSESSMENT ACTIVITY 1.3

    Although this choice is arbitrary and no hard-and-fast rules can be given, here are a few useful

    guidelines:

    1. The classes must be non-overlapping, so that each measurement falls into exactly one class.

    Therefore, choose the classes so that no measurement falls on a class boundary.

    2. Choose the number of classes to be used as a number between 5 and 20, with smaller numbers of

    classes being chosen for smaller data sets.

    3. The approximate width of each class is given by the following:

    $$\text{Approximate class width} = \frac{\text{Maximum value} - \text{Minimum value}}{\text{Number of classes}}$$

    Choose the actual class width to be a value close to the approximate width that is convenient to work with.

    Avoid awkward fractional values.

    SELF-ASSESSMENT ACTIVITY 1.4

    The weights in kilograms of a group of workers are as follows:

    173 165 171 175 188

    183 177 160 151 169

    162 179 145 171 175

    168 158 186 182 162

    154 180 164 166 157

    1.4.1 Construct a stem and leaf display for these data.

    1.4.2 Construct a frequency distribution for these data.

    SOLUTION TO SELF-ASSESSMENT ACTIVITY 1.4

    1.4.1 The first step in constructing a stem and leaf display is to decide how to split each observation (weight) into two parts: a stem and a leaf. For this example, we will define the first two digits of an observation to be its stem and the third digit to be its leaf. Thus, the first two weights (reading down the first column of the data) are split into a stem and a leaf as follows:

    Weight   Stem   Leaf
    173      17     3
    183      18     3

    Scanning the remaining weights, we find that there are five possible stems (14, 15, 16, 17 and 18), which

    we list in a column from smallest to largest, as shown below. Next, we consider each observation in turn

    and place its leaf in the same row as its stem, to the right of the vertical line. The resulting stem and leaf

    display shown below has grouped the 25 weights into five categories. The second row of the display,

    corresponding to the stem 15, has four leaves: 4, 8, 1 and 7. The four weights represented in the second

    row are therefore 151, 154, 157 and 158.

    Stem | Leaf
    14   | 5
    15   | 4 8 1 7
    16   | 2 8 5 0 4 6 9 2
    17   | 3 7 9 1 5 1 5
    18   | 3 0 6 2 8
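    The splitting rule described above can be sketched in a few lines of Python. This is only an illustration and not part of the prescribed material: the function name `stem_and_leaf` is an assumption made here, and the weights are entered reading down each column of the data table, so the leaves come out in the order in which the text considers the observations.

```python
from collections import defaultdict

# Weights from Self-Assessment Activity 1.4, read down each column of the data table.
weights = [
    173, 183, 162, 168, 154,
    165, 177, 179, 158, 180,
    171, 160, 145, 186, 164,
    175, 151, 171, 182, 166,
    188, 169, 175, 162, 157,
]


def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digits (stem)."""
    rows = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)  # e.g. 173 -> stem 17, leaf 3
        rows[stem].append(leaf)
    # Stems are listed from smallest to largest; leaves keep their input order.
    return dict(sorted(rows.items()))


display = stem_and_leaf(weights)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

    Sorting the leaves within each row is an optional extra step; it changes only the order of the leaves, not the shape of the display.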

    1.4.2 The hardest, and most important, step in constructing a frequency distribution is choosing the number and width of the classes. Constructing a stem and leaf display first is often helpful. For this example, the display in part 1.4.1 suggests using five classes, each with a width of 10 kg. The number (or frequency) of weights falling into each class is then recorded as shown in the table that follows. Care must be taken to define the classes in such a way that each measurement belongs to exactly one class. We will follow the convention that a class (such as 140 up to 150) contains all measurements from the lower limit (140) up to, but not including, the upper limit (150).

    Class (kg)       Frequency
    140 up to 150    1
    150 up to 160    4
    160 up to 170    8
    170 up to 180    7
    180 up to 190    5
    Total            25
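    As a rough check on the tally, the following Python sketch counts the weights using the convention just stated (lower limit included, upper limit excluded). The function name `frequency_distribution` and its parameters are hypothetical, chosen for this illustration only.

```python
# Weights from Self-Assessment Activity 1.4 (order does not matter for counting).
weights = [
    173, 165, 171, 175, 188,
    183, 177, 160, 151, 169,
    162, 179, 145, 171, 175,
    168, 158, 186, 182, 162,
    154, 180, 164, 166, 157,
]


def frequency_distribution(values, lower, width, n_classes):
    """Count values in classes that include the lower limit and exclude the upper limit."""
    counts = [0] * n_classes
    for value in values:
        index = int((value - lower) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{value} falls outside the chosen classes")
        counts[index] += 1
    return counts


counts = frequency_distribution(weights, lower=140, width=10, n_classes=5)
for k, count in enumerate(counts):
    print(f"{140 + 10 * k} up to {140 + 10 * (k + 1)}: {count}")
print("Total:", sum(counts))
```

    Because every class excludes its upper limit, a weight of exactly 150 would be counted in the class 150 up to 160, so each measurement lands in exactly one class.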

    Suppose that we hadn't first constructed a stem and leaf display, or that the stem and leaf display

    contained only a few, or too many, categories. (If the number of measurements is less than 50, the

    frequency distribution should contain between 5 and 7 classes.) We might then begin by noting that the

    smallest and largest measurements are 145 and 188, respectively, so that the range of the

    measurements is 188 - 145 = 43. If we decide to use five classes, the approximate width of each class is

    43/5 = 8.6. In order to work with "round" numbers, we have chosen to use a class width of 10 and to set

    the lower limit of the first class at 140.
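    The arithmetic in the paragraph above can be reproduced with a short Python sketch. The helper names are assumptions made for illustration, and rounding the width up to the next multiple of 10 is just one convenient choice, not a fixed rule.

```python
import math


def approximate_class_width(smallest, largest, n_classes):
    """Range of the data divided by the number of classes."""
    return (largest - smallest) / n_classes


def round_up_to_multiple(width, step):
    """Round a width up to the next multiple of step (a hypothetical convenience rule)."""
    return math.ceil(width / step) * step


approx = approximate_class_width(smallest=145, largest=188, n_classes=5)
chosen = round_up_to_multiple(approx, step=10)
print(f"approximate width = {approx:.1f}, chosen width = {chosen}")
```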

    SELF-ASSESSMENT ACTIVITY 1.5

    Refer to the data in Self-Assessment Activity 1.4 above.

    1.5.1 Construct a relative frequency histogram for the data.

    1.5.2 Construct a relative frequency polygon for the data.

    1.5.3 Construct an ogive for the data.

    SOLUTION TO SELF-ASSESSMENT ACTIVITY 1.5

    1.5.1 The relative frequencies, obtained by dividing each frequency by 25, are shown below:

    Class Limits     Frequency   Relative Frequency   Cumulative Relative Frequency
    140 up to 150    1           0.04                 0.04
    150 up to 160    4           0.16                 0.20
    160 up to 170    8           0.32                 0.52
    170 up to 180    7           0.28                 0.80
    180 up to 190    5           0.20                 1.00
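    The two right-hand columns of the table can be derived from the frequencies alone. The sketch below, with variable names chosen here for illustration, divides each frequency by the total and then accumulates the running total.

```python
from itertools import accumulate

# Class frequencies from the frequency distribution in Self-Assessment Activity 1.4.
frequencies = [1, 4, 8, 7, 5]
total = sum(frequencies)  # 25 weights in all

# Relative frequency: share of the measurements that fall in each class.
relative = [f / total for f in frequencies]

# Cumulative relative frequency: share of the measurements below each upper limit.
cumulative = [running / total for running in accumulate(frequencies)]

for rel, cum in zip(relative, cumulative):
    print(f"{rel:.2f}  {cum:.2f}")
```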

    [Figure: Relative frequency histogram for weight of workers. Horizontal axis: Weight (kg), 140 to 190. Vertical axis: Relative frequency, 0 to 0.35.]

    The relative frequency histogram is constructed by erecting over each class interval a rectangle, the height

    of which equals the relative frequency of that class.

    1.5.2 The relative frequency polygon is constructed by plotting the relative frequency of each class above

    the midpoint of that class and then joining the points with straight lines. The polygon is closed by

    considering one additional class (with zero frequency) at each end of the distribution and extending a

    straight line to the midpoint of each of these classes.

    1.5.3 The cumulative relative frequencies are shown in the table in part 1.5.1. The cumulative relative

    frequency of a particular class is the proportion of measurements that fall below the upper limit

    of that class. To construct the ogive, the cumulative relative frequency of each class is plotted

    above the upper limit of that class, and the points representing the cumulative frequencies are

    then joined by straight lines. The ogive is closed at the lower end by extending a straight line to

    the lower limit of the first class.
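    The plotting rules for the polygon and the ogive can be made concrete by listing the points to be joined. The following Python sketch is an illustration only, with variable names chosen here; it produces the coordinates and leaves the drawing to whatever charting tool is at hand.

```python
# Classes and frequencies from Self-Assessment Activity 1.4.
lower_limits = [140, 150, 160, 170, 180]
width = 10
frequencies = [1, 4, 8, 7, 5]
total = sum(frequencies)

# Polygon: relative frequency plotted above each class midpoint, closed by one
# extra zero-frequency class at each end of the distribution.
midpoints = [lower + width / 2 for lower in lower_limits]
polygon = (
    [(midpoints[0] - width, 0.0)]
    + [(mid, f / total) for mid, f in zip(midpoints, frequencies)]
    + [(midpoints[-1] + width, 0.0)]
)

# Ogive: cumulative relative frequency plotted above each upper class limit,
# closed at the lower end at the lower limit of the first class.
ogive = [(lower_limits[0], 0.0)]
running = 0
for lower, f in zip(lower_limits, frequencies):
    running += f
    ogive.append((lower + width, running / total))

print(polygon)
print(ogive)
```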

    [Figure: horizontal axis labelled Weights (kg).]

    1.4 Pie Charts, Bar Charts, and Line Charts

    The methods described in the previous section are appropriate for summarizing data that are quantitative, or numerical measurements. But we must also be able to describe data that are qualitative, or categorical data. These data consist of attributes, which are the names of the categories into which the observations are sorted.

    1.4.1 Pie Chart

    A pie chart is a useful method for displaying the percentage of observations that fall into each category of

    qualitative data, while a bar chart can be used to display the frequency of observations that fall into each

    category. If the categories consist of points in time and the objective is to focus on the trend in

    frequencies over time, a line chart is useful.

    SELF-ASSESSMENT ACTIVITY 1.6

    According to the New York Times (27 September 1987), the June levels of unemployment in the United States for five years were as follows:

    Year Unemployed (millions)

    1983 10.7

    1984 8.5

    1985 8.3

    1986 8.2

    1987 7.3

    1.6.1 Use a bar chart to depict these data.

    1.6.2 Use a line chart to depict these data.

    SOLUTION TO SELF-ASSESSMENT ACTIVITY 1.6

    1.6.1 The five years, or categories, are represented by intervals of equal width on the horizontal axis. The

    height of the vertical bar erected above any year is proportional to the frequency (number of

    unemployed) corresponding to that year.

    [Figure: Bar Chart for Unemployment. Horizontal axis: Year, 1983 to 1987. Vertical axis: Frequency (millions), 0 to 12.]

    1.6.2 A line chart is obtained by plotting the frequency of a category above the point on the horizontal axis

    representing that category and then joining the points with straight lines.

    [Figure: Line chart for unemployment. Horizontal axis: Year, 1983 to 1987. Vertical axis: 0 to 12.]

    SELF-ASSESSMENT ACTIVITY 1.7

    The New York Times article alluded to in self-assessment 1.6 reported that 6 million Americans who say

    they want work are not even seeking jobs.

    A breakdown of these 6 million Americans by race follows:

    Race Frequency

    White 4320000

    Black 1500000

    Other 180000

    Required: Use a pie chart to depict these data.

    SOLUTION TO SELF-ASSESSMENT ACTIVITY 1.7

    A pie chart is an effective method of showing the percentage breakdown of a whole entity into its component parts. We must first determine the percentage of the 6 million Americans belonging to each of the three racial categories: 72% White, 25% Black, and 3% Other. Each category is represented by a slice of the pie (a circle) that is proportional in size to the percentage (or relative frequency) corresponding to that category. Since the entire circle corresponds to 360°, the angle between the lines demarcating the White sector is (0.72)(360°) = 259.2°. In a similar manner, we can determine that the angles for the Black and Other sectors are 90° and 10.8°, respectively. The resulting pie chart shows these three sectors.
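    The percentage and angle calculations can be checked with a few lines of Python. This is a minimal sketch with variable names chosen for illustration; it computes the sector sizes only and does not draw the chart.

```python
# Frequencies from Self-Assessment Activity 1.7.
counts = {"White": 4_320_000, "Black": 1_500_000, "Other": 180_000}
total = sum(counts.values())  # 6 million in all

sectors = {}
for category, count in counts.items():
    percentage = 100 * count / total
    angle = 360 * count / total  # degrees of the full circle given to this sector
    sectors[category] = (percentage, angle)
    print(f"{category}: {percentage:.0f}% -> {angle:.1f} degrees")
```

    The angles always add up to 360 degrees, which is a quick way to catch an arithmetic slip before drawing the chart.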

    [Figure: Pie chart with sectors labelled by angle, including 259.2° and 90°.]

    1.4.2 Bar charts

    Bar charts are a quick and easy way of showing variation in or between variables.

    Rectangles of equal width are drawn so that the area enclosed by each rectangle is proportional to the size of the variable it represents. This type of graph not only illustrates a general trend, but also allows a quick and accurate comparison of one period with another, or the illustration of a situation at a particular time. When drawing up bar charts take care to:

    make the bars reasonably wide so that they can be clearly seen;

    draw them neatly and professionally;

    ensure that the bars all have the same width;

    ensure that the gaps between the bars have the same width.

    We can produce a variety of bar charts to provide an overview of the data.

    Simple bars representing each variable are drawn either vertically or horizontally.

    1.4.3 Component or stacked bar chart

    A single bar is drawn for each variable, with the heights of the bars representing the totals of the

    categories. Each bar is then subdivided to show the components that make up the total bar. These

    components may be identified by colouring or shading, accompanied by an explanatory key to show what

    each component represents.

    Percentage component bar chart

    The components are converted to percentages of the total, and the bars are divided in proportion to these percentages. The scale is a percentage scale, so the height of each bar is 100%.

    1.4.4 Multiple bar charts

    Two or more bars are grouped together in each category. The use of a key helps to distinguish between

    the categories.

    1.5 Scatter Diagrams

    This section introduces the notion of the relationship between two quantitative variables. Economists, for example, are interested in the relationship between inflation rates and unemployment rates. Business owners are interested in many variables, including the relationship between their advertising expenditures and sales levels. The graphical technique used to depict the relationship between the variables X and Y is the scatter diagram, which is a plot of all pairs of values (x, y) for the variables X and Y.

    SELF-ASSESSMENT ACTIVITY 1.8

    An educational economist wants to establish the relationship between an individual's income and

    education. She takes a random sample of 10 individuals and asks for their income (in $1,000s) and

    education (in years). The results are shown below. Construct a scatter diagram for these data, and

    describe the relationship between the number of years of education and income level.

    x (education) y (income)

    11 25

    12 33

    11 22

    15 41

    8 18

    10 28

    11 32

    11 24

    17 53

    11 26

    SOLUTION TO SELF-ASSESSMENT ACTIVITY 1.8

    If we feel that the value of one variable (such as income) depends to some degree on the value of the other variable (such as years of education), the first variable (income) is called the dependent variable and is plotted on the vertical axis. The ten pairs of values for education (x) and income (y) are plotted in Figure 1.1, forming a scatter diagram.

    The scatter diagram allows us to observe two characteristics about the relationship between education (x) and income (y):

    1. Because these two variables move together (that is, their values tend to increase together and decrease together), there is a positive relationship between the two variables.

    2. The relationship between income and years of education appears to be linear, since we can

    imagine drawing a straight line (as opposed to a curved line) through the scatter diagram that

    approximates the positive relationship between the two variables.
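    As a rough numerical companion to the scatter diagram, the sample can be tabulated by years of education to see whether income tends to rise. This Python sketch is an illustration added here, not a technique from the guide; averaging the incomes at each education level is an assumption made only to summarise the ten points.

```python
from collections import defaultdict

# (years of education, income in $1,000s) from Self-Assessment Activity 1.8.
pairs = [
    (11, 25), (12, 33), (11, 22), (15, 41), (8, 18),
    (10, 28), (11, 32), (11, 24), (17, 53), (11, 26),
]

# Collect the incomes observed at each education level.
incomes_by_years = defaultdict(list)
for years, income in pairs:
    incomes_by_years[years].append(income)

# Average income at each level, listed from fewest to most years of education.
mean_income = {
    years: sum(incomes) / len(incomes)
    for years, incomes in sorted(incomes_by_years.items())
}

for years, income in mean_income.items():
    print(f"{years:>2} years: {income:.1f}")
```

    The averages rise from the lowest to the highest education level with only a small dip, which agrees with the positive relationship seen in the diagram.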

    The pattern of a scatter diagram provides us with information about the relationship between two

    variables. Figure 1.1 depicts a positive linear relationship. If two variables move in opposite directions,

    and the scatter diagram consists of points that appear to cluster around a straight line, then the variables

    have a negative linear relationship (see Figure 1.2). It is possible to have nonlinear relationships (see

    Figures 1.3 and 1.4), as well as situations in which the two variables are unrelated (see Figure 1.5). In

    Unit 7, we will compute numerical measures of the strength of the linear relationship between two

    variables.

    [Figure 1.1: Scatter Diagram for Self-Assessment Activity 1.8. Horizontal axis: Years of Education, 0 to 20. Vertical axis: Income ($'000), 0 to 60.]

    [Figure 1.2: Negative Linear Relationship. Axes: X and Y.]

    [Figure 1.3: Nonlinear Relationship. Axes: X and Y.]

    [Figure 1.4: Nonlinear Relationship. Axes: X and Y.]

    [Figure 1.5: No Relationship. Axes: X and Y.]

    Unit 1 Exercises: (Solutions are found at the end of the module guide)

    Exercise 1.1

    Describe three ways of (graphically) representing data which you consider appropriate for inclusion in a company's annual report and accounts. Name the advantages of these forms of data representation.

    Exercise 1.2

    Produce a pie chart showing the percentage market share of the passenger car market held by each of South Africa's car manufacturers.

    Manufacturer       1991 Sales (Units)
    Toyota             51 653
    Nissan             20 793
    Volkswagen         39 757
    Delta              20 949
    Ford               18 631
    MBSA               15 756
    BMW                15 431
    MMI                14 731
    Total 1991 Sales   197 701

    Exercise 1.3

    Produce a component bar chart showing the breakdown of car sales for Toyota, Nissan and Ford only, between the first and second half of 1991.

    1991 Sales (units)
    Manufacturer   Total units   First Half (Jan - June)   Second Half
    Toyota         51 653        19 629                    32 024
    Nissan         20 793        9 565                     11 228
    Ford           18 631        9 875                     8 756
    Totals         91 077        39 069                    52 008

    Exercise 1.4

    Produce a line graph showing the trend in market share for Volkswagen and Nissan over the period 1982 to

    1991.

    Year   Volkswagen   Nissan
    1982   13.4         9.9
    1983   11.6         9.6
    1984   9.8          8.2
    1985   14.4         6.8
    1986   17.4         7.8
    1987   19.9         9.7
    1988   21.3         11.7
    1989   22.2         10.2
    1990   19.6         10.6
    1991   20.1         10.5

    Comment on the findings.

    Exercise 1.5

    Areas of Continents of the World.

    Continent       Area (millions of square kilometres)
    Africa          30.3
    Asia            26.3
    Europe          4.9
    North America   24.3
    Oceania         8.5
    South America   17.9
    Russia          20.5

    (i) Draw a bar chart of the above information

    (ii) Construct a pie chart to represent the total area.

    Exercise 1.6

    The distances travelled (in kilometres) by a courier service motorcycle on 30 trips were recorded by the driver.

    24 19 21 27 20 17 17 32 22 26

    18 13 23 30 10 13 18 22 34 16

    18 23 15 19 28 25 25 20 17 15

    a) Define the random variable, the data type, and the measurement scale.

    b) From the data set, prepare:
        i. an absolute frequency distribution,
        ii. a relative frequency distribution, and
        iii. the (relative) less than ogive.

    c) Construct the following graphs:
        i. a histogram of the relative frequency distribution, and
        ii. the cumulative frequency polygon.

    d) From the graphs, read off:
        i. what percentage of trips were between 25 and 30 km long?
        ii. what percentage of trips were under 25 km long?
        iii. what percentage of trips were 22 km or more?
        iv. below which distance were 55% of the trips made?
        v. above which distance were 20% of the trips made?

    Exercise 1.7

    Vorovka Director Marketing has offices in Windhoek, Johannesburg, Durban and Botswana. The number of

    employees in each location and their genders are tabulated below.

    Office Females Males Total

    Windhoek 12 8 20

    Johannesburg 9 15 24

    Durban 23 6 29

    a) Plot a simple bar chart to show the total number of employees in each office.

    b) Plot a component bar chart to show the number of employees in each office by gender.

    c) Plot a cluster bar chart to show the number of employees at each office by gender.

    Exercise 1.8

    Tourists seeking holiday accommodation in a self-catering complex in the resort ABC of Namibia can make either a one- or two-week booking. The manager of the complex has produced the following table to show the bookings she received last season:

                             Type of booking
    Tourist's home country   One-week   Two-week
    France                   13         44
    Germany                  29         36
    Holland                  17         21
    Ireland                  8          5

    a) Produce a simple bar chart to show the total number of bookings by home country.

    b) Produce a component bar chart to show the number of bookings by home country and type of booking.

    c) Produce a cluster bar chart to show the number of bookings by home country and type of booking.

    Exercise 1.9

    A roadside breakdown assistance service answered 37 calls in Cape Town on one day. The response times taken to deal with these calls were noted and have been arranged in the grouped frequency distribution below.

    Response time (minutes) Number of calls

    20 to under 30 4

    30 to under 40 8

    40 to under 50 17

    50 to under 60 6

    60 to under 70 2

    a) Produce a histogram to portray this distribution and describe the shape of the distribution.

    b) Find the cumulative frequency for each class.

    c) Produce a cumulative frequency graph of the distribution.

    Exercise 1.10

    Rents per person (to the nearest $) for 83 flats and houses advertised on the notice boards at a university were

    collected and the following grouped frequency distribution compiled:

    Rent per person ($) Frequency

    35 - 39 13

    40 - 44 29

    45 - 49 22

    50 - 54 10

    55 - 59 7

    60 - 64 2

    a) Plot a histogram to portray this distribution and comment on the shape of the distribution.

    b) Find the cumulative frequency for each class.

    c) Plot a cumulative frequency graph of the distribution.

    Exercise 1.11

    Monthly membership fees in $ for 22 health clubs are:

    34 43 44 22 73 69 48 67 33 56 67

    27 78 60 63 32 67 41 65 48 48 77

    Compile a stem and leaf display of these data.

    The clubs whose fees appear in bold do not have a swimming pool. Highlight them in your display.

    Exercise 1.12

    Select which of the statements listed below on the right-hand side describes the words listed on the left-hand

    side.

    (i) Histogram                a) can only take a limited number of values
    (ii) Time series             b) segments or slices represent categories
    (iii) Pictogram              c) each plotted point represents a pair of values
    (iv) Discrete data           d) separates parts of each observation
    (v) Stem and leaf display    e) each block represents a class
    (vi) Scatter diagram         f) data collected at regular intervals over time
    (vii) Pie chart              g) comprises a set of small pictures

    Student review questions

    1. Describe the difference between quantitative data and qualitative data.

    2. For each of the following examples of data, determine whether the data are quantitative, qualitative,

    or ranked.

    a) the month of the highest sales for each firm in a sample.

    b) the department in which each of a sample of university professors teaches.

    c) the weekly closing price of gold throughout a year.

    d) the size of soft drink (large, medium, or small) ordered by a sample of customers in a restaurant.

    e) the number of barrels of crude oil imported monthly by the United States.

    3. Identify the type of data observed for each of the following variables.

    a) the number of students in a statistics class.

    b) the student evaluations of the professor (1 = poor, 5 = excellent).

    c) the political preferences of voters.

    d) the states in the United States of America.

    e) the size of a condominium (in square feet).

    UNIT 2: MEASURES OF CENTRAL TENDENCY

    OBJECTIVES

    By the end of this study unit, you should be able to:

    Determine the mean, median and mode for grouped and ungrouped data.

    Describe the symmetry/skewness of a set of data in terms of the mean, median and mode.

    Calculate the range, standard deviation, variance, quartiles and inter-quartile range for grouped as well as

    ungrouped data.

    CONTENTS

    2.1 Introduction

    2.2 Ungrouped data

    2.2.1 Mean

    2.2.2 Median

    2.2.3 Mode

    2.3 Grouped data

    2.3.1 Mean for grouped data

    2.3.2 Median for grouped data

    2.3.3 Mode for grouped data

    2.4 The best average

    2.5 Box plots

    2.6 Self-evaluation

    2.1 Introduction

    This unit discusses numerical descriptive measures used to summarise and describe sets of data. There are

    three commonly used numerical measures of central tendency of a data set: the mean, the median, and the

    mode. You are expected to know how to compute each of these measures for a given data set. Moreover, you

    are expected to know the advantages and disadvantages of each of these measures, as well as the type of data

    for which each is an appropriate measure.

    An average is a single value that is central to, or representative of, the entire data set, and is therefore information of great importance. The three most commonly used averages, the mean, the median and the mode, are discussed below.

    2.2 For ungrouped data

    2.2.1 The arithmetic mean

    The first and most important one is the arithmetic mean (at school you just called this the average). Sometimes

    we merely call the arithmetic mean the mean.

    To calculate the mean of some numbers we merely add the numbers together and divide the total by the number

    of values.

    The mean of 4, 5, 6, 7, 8, 10 is 40 / 6 = 6.67 (the total of the values is 40 and there are 6 values).

    In Excel the mean can be found by placing =AVERAGE(4,5,6,7,8,10) in a cell.

    The mean can be written as a formula: $\bar{x} = \frac{\sum x_i}{N}$

    We say x-bar (or the mean) is the sum of the values (the $x_i$'s) divided by the number of values (N).

    The arithmetic mean is the most important of all numerical descriptive measurements, and it corresponds to what

    most people call an average.

    Definition 2.1: The arithmetic mean of a list of scores is obtained by adding the scores and dividing the total by

    the number of scores. It will be referred to simply as the mean.


    Example 1

    Find the mean of the scores 2, 3, 6, 7, 12.

    The mean score is $\frac{2 + 3 + 6 + 7 + 12}{5} = \frac{30}{5} = 6$.

    Formula 1: Mean: $\bar{x} = \frac{\sum x}{n}$

    Where,

    $\sum$ denotes summation of a set of values.

    x is the variable used to represent raw scores.

    n represents the number of scores being considered.

    The result can be denoted by $\bar{x}$ if the available scores are a sample from a larger population. If all scores of the population are available, then we can denote the computed mean by the Greek letter $\mu$ (pronounced mu).
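As a quick check of Formula 1, the following is a minimal Python sketch (Python is an assumption here; the guide itself only mentions Excel). The function name `arithmetic_mean` is illustrative and not part of the guide.

```python
def arithmetic_mean(scores):
    """Return the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


# The two lists used in the text above.
print(round(arithmetic_mean([4, 5, 6, 7, 8, 10]), 2))  # 6.67
print(arithmetic_mean([2, 3, 6, 7, 12]))               # 6.0
```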

    2.2.2 The median

    The median is the middle value of an ordered set of numbers. In the case 4, 5, 6, 7, 8, 10 the middle value is

    between the 6 and the 7. So we say that the median is 6.5.

    Note: It is important that the values must be in the correct order before you choose the middle value.

    Definition 2: The median of a set of scores is the middle value when the scores are arranged in order of

    increasing (or decreasing) magnitude.

    After first arranging the original scores in increasing (or decreasing) order, the median will be either of the

    following:

    1. If the number of scores is odd, the median is the number that is exactly in the middle of the list.

    2. If the number of scores is even, the median is found by computing the mean of the two middle numbers.

    Steps

    Arrange the data in an array.

    Determine the position of the median.

    Median position = $\frac{n + 1}{2}$

    Read the value of the median from the number list.


    Example: Find the median of each data set.

    1. Over a 7-day period, the number of customers (per day) purchasing at Hides Leather Shop was as follows:

    4 80 50 10 60 12 5

    Array:

    4 5 10 12 50 60 80

    Median position = (n + 1)/2 = (7 + 1)/2 = 4, so the median is the 4th item in the array, which is 12.

    2. Over an 8-day period, the number of customers observed at the shop per day was as follows:

    21 5 11 7 12 15 20 5

    Array:

    5 5 7 11 12 15 20 21

    Position of median: (n + 1)/2 = (8 + 1)/2 = 4.5 (between the 4th and 5th positions)

    Median = (11 + 12)/2 = 11.5 (average of the 4th and 5th values)
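The steps above (arrange the data, find the position, read off the value) can be sketched in Python. This is only an illustration; the function name `median_with_position` is an assumption, not something defined in the guide.

```python
def median_with_position(values):
    """Return (position, median) using the (n + 1) / 2 rule on the sorted values."""
    array = sorted(values)          # step 1: arrange the data in an array
    n = len(array)
    position = (n + 1) / 2          # step 2: position of the median (1-based)
    if n % 2 == 1:
        # Odd n: the position is a whole number, so take that item.
        median = array[int(position) - 1]
    else:
        # Even n: the position falls between two items, so average them.
        median = (array[n // 2 - 1] + array[n // 2]) / 2
    return position, median


print(median_with_position([4, 80, 50, 10, 60, 12, 5]))      # (4.0, 12)
print(median_with_position([21, 5, 11, 7, 12, 15, 20, 5]))   # (4.5, 11.5)
```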

    SELF-ASSESSMENT ACTIVITY 2.3

    The time taken to complete an assembling task has been measured for a group of employees, giving the following scores:

    8, 2, 7, 3, 6, 9

    Find the median of the scores.

    SOLUTION TO SELF-ASSESSMENT ACTIVITY 2.3

    Begin by arranging the scores in increasing order.

    2 3 6 7 8 9

    We note that the numbers 6 and 7 share the middle position, which lies between the 3rd and 4th positions, i.e. the (3+4)/2 = 3.5th position. Thus the median is the average of the 3rd and 4th values.

    The mean of these two scores is therefore $\frac{6 + 7}{2} = 6.5$, which is the median.


    2.2.3 The mode

    The mode is the most common value. If we look at the following set of numbers:

    3, 4, 5, 6, 6, 6, 7 the mode is 6 because it is the number that appears most often.

    Definition 3: The mode is obtained from a collection of scores by selecting the score that occurs most frequently.

    In those cases where no score is repeated there is no mode. Where two scores both occur with the same

    greatest frequency, the data set is bimodal. If more than two scores occur with the same greatest frequency,

    each is a mode and the data set is multimodal.

    For ungrouped data the mode requires no calculation and can easily be obtained from a number list. If there is

    no value that occurs more often than the others, then there is no mode, but this is not the same as a mode of

    zero. A set of data may also have more than one mode and is then said to be bi-modal or multi-modal.

    Example

    1. The commission earnings of five salespeople were as follows for the previous month:

    R5000 R5200 R5200 R5700 R8600

    The modal commission was R5200

    2. The lengths of stay (in days) for a sample of 9 patients in a hospital are:

    17 19 19 4 19 26 4 21 4

    The modal lengths of stay are 19 and 4 days.

    Example

    There are 40 buck, 25 elephants and 20 smaller animals at a water hole. The modal category is buck since it

    has the highest frequency.

    The mode is the only central measure that can be used with data at the nominal level of measurement.

    Example

    The hourly income rates (in $) of 5 students are: 4 9 7 16 10

    There is no mode.
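The examples above can be reproduced with a short Python sketch. It follows Definition 3 (no repeated score means no mode; ties at the highest frequency are all reported). The function name `modes` is illustrative only.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes; an empty list means there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no score is repeated, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([5000, 5200, 5200, 5700, 8600]))        # [5200]
print(modes([17, 19, 19, 4, 19, 26, 4, 21, 4]))     # [4, 19]
print(modes([4, 9, 7, 16, 10]))                     # []
```

An empty list is returned rather than 0, which keeps "no mode" distinct from "a mode of zero".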


    2.3 Grouped data

    The problem is that we do not always have the actual data.

    Sometimes the data is given as a frequency distribution. If we look at Table 1:

    Table 1

    Mass (in kg) Frequency

    45-49 6

    50-54 14

    55-59 25

    60-64 11

    We know that there are 6 values in the first interval (first class) 45-49, but we do not have the actual values.

    We must still be able to find the mean, the median and the mode.

    2.3.1 Mean for grouped data

    To get the mean, we take the midpoint of every class to represent the class.

    There are 6 values in the first class. The midpoint of the first class is (45+49) / 2 = 47.

    The total for the values of the first class is therefore 6 times 47 = 282.

    The total for the values in the second class is 14 times 52 = 728.

    The total for the values in the class 55-59 is 25 times 57 = 1425.

    The total for the values in the interval 60-64 is 11 times 62 = 682.

    If we add the class totals together we get 3117 (Check if this is correct)

    To get the mean we must now divide by the number of values.

    The number of values is 6 + 14 + 25 + 11 = 56.

    The mean is 3117 divided by 56 = 55.66 kg.

    As a formula we can write this as $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$, where we say x-bar (the mean) is the sum of the frequency times the class midpoint, divided by the sum of the frequencies.

    The value for the arithmetic mean that you get from ungrouped data is a better value to use, if the actual

    ungrouped data is available.
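The calculation for Table 1 can be checked with a few lines of Python. This is a sketch under the guide's assumption that every value in a class sits at the class midpoint; the names `grouped_mean` and `table_1` are illustrative.

```python
def grouped_mean(classes):
    """classes: list of (lower_limit, upper_limit, frequency) tuples."""
    # Sum of frequency times class midpoint.
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    # Sum of the frequencies.
    total_f = sum(f for _, _, f in classes)
    return total_fx / total_f


table_1 = [(45, 49, 6), (50, 54, 14), (55, 59, 25), (60, 64, 11)]
print(round(grouped_mean(table_1), 2))  # 55.66
```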


    The mean for grouped data or the mean from a frequency distribution

    Simple Frequency Distribution

    Formula 2.2: mean: $\bar{x} = \frac{\sum fx}{\sum f}$

    where x = class mark

    f = frequency

    SELF-ASSESSMENT ACTIVITY 2.1

    The number of times per week that a particular photocopy machine breaks down was recorded over a period of

    60 weeks. The results are given in the frequency table below.

    Number of breakdowns 0 1 2 3 4 5

    Number of weeks 15 12 16 10 5 2

    Required

    1. Find the mean number of breakdowns per week over the 60-week period.

    2. A metro council needs information about the times local bicycle commuters spend on the road. A sample of

    12 local bicycle commuters yields the following times in minutes:

    22 29 27 30 12 22 31 15 26 16 48 23

    Determine the mean travelling time.


    SOLUTION TO SELF-ASSESSMENT ACTIVITY 2.1

    Table 2.1: Calculations for self assessment activity 2.1

    x        f            fx
    0        15           0
    1        12           12
    2        16           32
    3        10           30
    4        5            20
    5        2            10
    Total    Σf = 60      Σfx = 104

    Note that the figures in the third (fx) column have been formed by multiplying the corresponding figures in the first two columns. From Equation 2.2, the mean number of breakdowns per week is:

    $\bar{x} = \frac{\sum fx}{\sum f} = \frac{104}{60} = 1.73$

    (Reasonable check: The data are very roughly balanced around 2, which is also the mode. A mean not too far from 2 is therefore reasonable.)

    2. Mean: $\bar{x} = \frac{22 + 29 + 27 + 30 + 12 + 22 + 31 + 15 + 26 + 16 + 48 + 23}{12} = \frac{301}{12} = 25.1$ minutes

    Grouped Frequency Distribution

    When using tabulated or grouped data from a frequency distribution, the individual values are not known. To enable us to calculate this statistic, we need to assume that the observations in a particular interval all take the same value, and that value is the midpoint of the interval.

    $\bar{x} = \frac{\sum fx}{\sum f}$

    x = class midpoint

    f = frequency of each class

    n = number of observations in the sample = $\sum f$

    Steps

    Compute the midpoint (x) for each class.

    Multiply each midpoint by the respective frequency of that class (xf) and sum the products ($\sum xf$).

    Sum the frequency column, n = $\sum f$.

    Divide $\sum xf$ by n.


    Example

    The times taken to complete a particular assembling task have been measured for 250 employees and the

    results are shown below.

    Time (min)    No. of people (f)    x       fx
    0 - 5         2                    2.5     5.0
    5 - 10        2                    7.5     15.0
    10 - 15       3                    12.5    37.5
    15 - 20       5                    17.5    87.5
    20 - 25       5                    22.5    112.5
    25 - 30       18                   27.5    495.0
    30 - 35       85                   32.5    2 762.5
    35 - 40       92                   37.5    3 450.0
    40 - 45       37                   42.5    1 572.5
    45 - 50       1                    47.5    47.5
    Total         250                          8 585.0

    The arithmetic mean time is: $\bar{x} = \frac{\sum fx}{\sum f} = \frac{8585}{250} = 34.34$ min.
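For a longer table such as this one it can help to recompute the x and fx columns mechanically. The sketch below does so in Python from the class boundaries and frequencies given in the example; the variable names are illustrative.

```python
boundaries = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
frequencies = [2, 2, 3, 5, 5, 18, 85, 92, 37, 1]

# Midpoint of each class: halfway between consecutive boundaries.
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
# The fx column: frequency times midpoint, class by class.
fx = [f * x for f, x in zip(frequencies, midpoints)]

n = sum(frequencies)
mean_time = sum(fx) / n
print(n, sum(fx))           # 250 8585.0
print(round(mean_time, 2))  # 34.34
```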

    Activity

    The times during working hours in a factory when a certain machine is not operating as a result of breakage are recorded for a sample of 100 breakdowns and summarized in the following distribution. Find the mean of the distribution.

    Time (min)    f
    0 - 10        3
    10 - 20       13
    20 - 30       30
    30 - 40       25
    40 - 50       14
    50 - 60       8
    60 - 70       4
    70 - 80       2
    80 - 90       1
    Total         100


    2.3.2 The median for grouped data

    As with the mean, we can get the median from grouped data as well. In this case we look at the cumulative

    frequency.

    There are 56 values in the table below, so the middle value will be value number 56 divided by 2 = 28. We want

    to estimate what value number 28 was.

    Mass (in kg)    Frequency    Cumulative Frequency
    45-49           6            6
    50-54           14           20
    55-59           25           45
    60-64           11           56

    At the end of the interval 45-49, we have only 6 values, so we have not reached the median yet.

    At the end of the 50-54 interval, we have 20 values, which is still short of the value number 28 that we are looking for.

    At the end of the interval 55-59, we have passed 45 values, which means that we passed value number 28 as we moved through the interval 55-59. The class 55-59 is therefore the median class.

    The median can be found from the following interpolation formula:

    $\text{Median} = L_{Me} + \frac{(n/2 - F_{Me-1})\, c}{f_{Me}}$

    where $L_{Me}$ is the lower limit of the median class. We said that the median class is the class 55-59. The lower limit is the smallest value that will be rounded to this class, which is 54.5.

    n is 56, the sum of the frequencies, so n/2 is 28.

    $F_{Me-1}$ is the cumulative frequency of the class that precedes the median class, which is 20. (Make sure you can see where this value comes from in the table of cumulative frequencies.)

    $f_{Me}$ is the frequency of the median class, which is 25 (from the table).

    c is the class width, which is 5. You can take 59 - 54 to get it, or you can take the actual class limits 59.5 minus 54.5.

    Put these values into the formula and we get

    $\text{Median} = 54.5 + \frac{(28 - 20) \times 5}{25} = 54.5 + 1.6 = 56.1$

    Check that this value is in fact in the class 55-59.
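The whole procedure (walk through the cumulative frequencies, find the median class, interpolate) can be sketched in Python. This is an illustration under the assumption that each class is described by its lower boundary (for example 54.5), its width and its frequency; the names `grouped_median` and `masses` are not from the guide.

```python
def grouped_median(classes):
    """classes: list of (lower_boundary, class_width, frequency), in ascending order."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, width, f in classes:
        if f > 0 and cumulative + f >= half:
            # This is the median class: interpolate within it.
            return lower + (half - cumulative) * width / f
        cumulative += f
    raise ValueError("classes must contain at least one observation")


masses = [(44.5, 5, 6), (49.5, 5, 14), (54.5, 5, 25), (59.5, 5, 11)]
print(round(grouped_median(masses), 1))  # 56.1
```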


    Note: You have to know the basic structure of the formula. In this guide different letters will be used in the formula. You must know the formula, not the symbols used to represent the different variables. If the formula were given in the exam with different symbols, would you still be able to calculate the median?

    As with the mean, the value for the median that you get from ungrouped data is more accurate. If you have the

    data available (like when you do your research project) it is better to use the ungrouped data to get the median.

    Calculation of Median for Grouped data

    The median can be determined either graphically or by calculation. With grouped data we are unable to determine where the true middle value falls, but we can estimate the median by using a formula and assuming that the median value will be the $\frac{n}{2}$th item.

    $\text{Median} = L + \frac{\frac{n}{2} - F}{f_m} \times c$

    L = lower boundary of the median class

    F = sum of all the frequencies up to, but not including, the median class (the cumulative frequency of the preceding class)

    $f_m$ = frequency of the median class

    c = width of the median class


    SELF-ASSESSMENT ACTIVITY 2.2

    The time taken to complete an assembling task has been measured for 250 employees and the results are

    shown below:

    Time taken (min) Number of people (f) Cumulative


    2.3.3 The mode from grouped data

    The mode is the most common value. For grouped data, we want to estimate the value at which the histogram reaches its maximum.

    Mass (in kg)    Frequency    Cumulative Frequency
    45-49           6            6
    50-54           14           20
    55-59           25           45
    60-64           11           56

    The mode can be found by first deciding in what class it is and then using an interpolation formula.

    From the table we see that the class (interval) with the highest frequency is the class 55-59 with a frequency of

    25. So we say that the class 55-59 is the modal class.

    The interpolation formula is

    $\text{Mode} = L_{Mo} + \frac{(f_{Mo} - f_{Mo-1})\, c}{2f_{Mo} - f_{Mo-1} - f_{Mo+1}}$

    $L_{Mo}$, the lower limit of the modal class, is 54.5,

    $f_{Mo}$, the frequency of the modal class, is 25,

    $f_{Mo-1}$, the frequency of the previous class, is 14,

    $f_{Mo+1}$, the frequency of the next class, is 11,

    and c, the class width, is 5.

    Put these values into the formula to get

    $\text{Mode} = 54.5 + \frac{(25 - 14) \times 5}{2 \times 25 - 14 - 11} = 54.5 + 2.2 = 56.7$

    So the mode is 56.7.
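A minimal Python sketch of this interpolation, assuming the modal class has already been identified from the table. The function name `grouped_mode` and its parameter names are illustrative.

```python
def grouped_mode(lower, width, f_prev, f_modal, f_next):
    """Interpolated mode from the modal class and its two neighbouring classes."""
    return lower + (f_modal - f_prev) * width / (2 * f_modal - f_prev - f_next)


# Modal class 55-59: lower limit 54.5, width 5, frequencies 14, 25, 11.
print(round(grouped_mode(54.5, 5, 14, 25, 11), 1))  # 56.7
```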

    Later a different formula will be given in which $d_1 = f_{Mo} - f_{Mo-1}$. Make sure that you are not confused if the formula looks different: it is the same formula. Remember, if a lecturer uses a formula that looks slightly different, it is up to you as a master's level student to check that it is still the same formula.

    Unlike the mean and the median, the mode is usually estimated more reliably from grouped data, because in raw data measured on a continuous scale few values repeat exactly. So whenever possible calculate the mode from the grouped data.


    Calculation of the mode from a grouped frequency distribution.

    It is not possible to calculate the exact value of the mode of the original data in a grouped frequency distribution,

    since information is lost when the data are grouped. However, it is possible to make an estimate of the mode.

    The class interval with the largest frequency is called the modal class.

    (Note: The following formula looks different. Does it give the same answer?)

    $\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times c$

    Where:

    L = lower limit of the modal class.

    $d_1$ = frequency of the modal class minus the frequency of the immediately preceding class.

    $d_2$ = frequency of the modal class minus the frequency of the class that immediately follows the modal class.

    c = the length of the class interval of the modal class.
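The note above asks whether this formula gives the same answer as the earlier interpolation formula. A short check, writing $f_{Mo-1}$, $f_{Mo}$ and $f_{Mo+1}$ for the frequencies of the preceding, modal and following classes as in the earlier formula:

```latex
\begin{align*}
d_1 + d_2 &= (f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})
           = 2f_{Mo} - f_{Mo-1} - f_{Mo+1} \\[4pt]
L + \frac{d_1}{d_1 + d_2} \times c
          &= L + \frac{(f_{Mo} - f_{Mo-1})\, c}{2f_{Mo} - f_{Mo-1} - f_{Mo+1}}
\end{align*}
```

For the mass data, $d_1 = 25 - 14 = 11$ and $d_2 = 25 - 11 = 14$, so the mode is $54.5 + \frac{11}{25} \times 5 = 56.7$, as before.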

    Steps

    Select the class containing the highest frequency as the modal class.

    Use the formula to estimate the modal value.

    Activity

    The times during working hours in a factory when a certain machine is not operating as a result of breakage are recorded for a sample of 100 breakdowns and summarized in the following distribution. Find the mode of the distribution.

    Time (min)    f
    0 - 10        3
    10 - 20       13
    20 - 30       30
    30 - 40       25
    40 - 50       14
    50 - 60       8
    60 - 70       4
    70 - 80       2
    80 - 90       1
    Total         100


    Solution

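    The interval having the highest frequency, namely 30, is the 3rd interval: (20 - 30).

    $\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times c = 20 + \frac{30 - 13}{(30 - 13) + (30 - 25)} \times 10 = 20 + \frac{17}{17 + 5} \times 10 = 20 + \frac{170}{22} = 20 + 7.73 = 27.73$ min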

    We used 20 as the lower limit, because if you look at the table you will see that the data are continuous and the

    values are not rounded off. 19.999 would be in the class 10 to 20, while 20.00001 would be in the class 20 to 30.

    2.4 The Best Average/Symmetry

    The different averages have different advantages and disadvantages, and there are no objective criteria that

    determine the most representative average of all data sets. Each researcher has to use his/her own discretion on

    a set of data.

    The mean is the most familiar average. It exists for each data set, takes every score into account, is affected by

    extreme scores, and works well with many statistical methods.

    The median is commonly used. It always exists, does not take every score into account, is not affected by

    extreme scores, and is often a good choice if there are some extreme scores in the data set.

    The mode is sometimes used. It might not exist, or there may be more than one mode. It does not take every

    score into account, is not affected by extreme scores, and is appropriate for data at the nominal level.

    The best measure for central location

    The arithmetic mean is more affected by extreme values. If your data has some values that are very large or

    small (relative to the other values) then it is better to use the median. When we get to the normal distribution in a

    later unit, you will see why the arithmetic mean is important.

    Skewness

    If there are large extreme values in your data the mean will be pulled to the right and we say that the distribution is positively skewed.

    For a symmetrical distribution the mean, median and mode will be about the same:

    $\bar{x} = \text{Median} = \text{Mode}$

    If we measure the mass or height of people it is usually a symmetrical (or normal) distribution. IQs or test results are also usually from a normal distribution.


    A histogram of a symmetrical distribution is given in the following figure:

    For a distribution that is skewed to the right the mode will be less than the median and the median will be less

    than the mean.

    $\bar{x} > \text{Median} > \text{Mode}$


    A histogram that is skewed to the left (negatively skewed) is shown in the following figure:

    As a general rule the difference between the median and the mode is about twice the difference between the

    mean and the median.

    If the data are skewed to the left there are some outliers on the left (small values). If the data are skewed to the

    right then there are some large outliers.

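    For the mass data used earlier, the mean is 55.66, the median is 56.1 and the mode is 56.7. We thus have $\text{Mode} > \text{Median} > \bar{x}$, which is the ordering expected for a distribution skewed to the left.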
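The comparison of the mean and the median described above can be sketched as a small Python helper. The function name `skew_direction` and the optional `tolerance` parameter (how close the two values must be to count as equal) are assumptions made for illustration.

```python
def skew_direction(mean, median, tolerance=0.0):
    """Classify skewness from the relative positions of the mean and the median."""
    if mean - median > tolerance:
        return "skewed to the right (positive)"   # mean pulled up by large values
    if median - mean > tolerance:
        return "skewed to the left (negative)"    # mean pulled down by small values
    return "approximately symmetrical"


print(skew_direction(55.66, 56.1))  # skewed to the left (negative)
```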


    A comparison of the mean and median can reveal information about skewness. Data can be identified as

    skewed to the left, symmetric, skewed to the right. Data skewed to the left will have the mean and median to the

    left of the mode, but in unpredictable order, as illustrated below:

    [Figure: The relative positions of the mean, median and mode in a symmetric distribution. Zero skewness: Mean = Median = Mode.]

    [Figure: The relative positions of the mean, median and mode in a right-skewed distribution. Positively skewed: Mean > Median > Mode.]

    [Figure: Negatively skewed: Mean < Median < Mode.]

    2.5 Box plots

    The box plot (box-and-whisker diagram) is a part of exploratory data analysis and reveals more information about

    how the data is spread. The construction of a box plot requires the minimum, the maximum, the median, and two

    other values called hinges.

    Definition 1: The minimum score, the maximum score, the median, and two hinges constitute a 5-number summary of a set of data.

    Definition 2: The lower hinge is the median of the lower half of all scores (from the minimum score up

    to the original median).

    Definition 3: The upper hinge is the median of the upper half of all scores (from the original median up

    to the maximum score).

    1. Arrange the data in ascending order.

    2. Find the median.

    3. List the lower half of the data from the minimum score up to and including the median found in step 2. The

    left hinge is the median of these scores (This value is called the first quartile).

    4. List the upper half of the data starting with the median and including it in the scores up to and including the

    maximum. The right hinge is the median of these scores. (This is called the third quartile).

    5. List the minimum, the left hinge (from step 3), the median (from step 2), the right hinge (from step 4), and the

    maximum.

    Example. Construct the box plot for the following 20 scores:

    9, 8, 6, 12, 4, 15, 7, 16, 8, 6, 13, 5, 9, 16, 4, 2, 6, 15, 9, 3

    Arranging in increasing order, the list is:

    2, 3, 4, 4, 5, 6, 6, 6, 7, 8, 8, 9, 9, 9, 12, 13, 15, 15, 16, 16.

    The lower half, after finding the median score 8 and including it, is:

    2, 3, 4, 4, 5, 6, 6, 6, 7, 8, 8. The median of these scores is 6.

    The upper half including the median of 8 is:

    8, 8, 9, 9, 9, 12, 13, 15, 15, 16, 16. The median of these scores is 12.

    The minimum score is 2, the maximum score is 16, the median is 8, the left hinge is 6, and the right hinge is 12.
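The five steps can be sketched in Python. This sketch follows the guide's own rule of including the median in both halves, which differs from some other quartile conventions; the function name `five_number_summary` is illustrative.

```python
from statistics import median


def five_number_summary(scores):
    """Return (minimum, left hinge, median, right hinge, maximum)."""
    data = sorted(scores)                 # step 1: ascending order
    n = len(data)
    med = median(data)                    # step 2: the median
    half = n // 2
    if n % 2 == 1:
        # Odd n: the median is a data value and belongs to both halves.
        lower, upper = data[:half + 1], data[half:]
    else:
        # Even n: add the computed median to each half, as in the example.
        lower, upper = data[:half] + [med], [med] + data[half:]
    return data[0], median(lower), med, median(upper), data[-1]


scores = [9, 8, 6, 12, 4, 15, 7, 16, 8, 6, 13, 5, 9, 16, 4, 2, 6, 15, 9, 3]
print(five_number_summary(scores))  # (2, 6, 8.0, 12, 16)
```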

    To construct the box plot, begin with a horizontal (vertical) scale. Box the hinges as shown and extend the lines

    to connect the minimum score to a hinge and the maximum score to a hinge.

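    [Figure: box plot on a scale from 0 to 20, with the box spanning the hinges 6 and 12, the median marked at 8, and lines extending to the minimum 2 and the maximum 16.]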

    Unit 2 Exercises: (Solutions are found at the end of the module guide)

    Exercise 2.1

    A supermarket sells kilogram-bags of pears. The numbers of pears in 21 bags were:

    7 9 8 8 10 9 8 10 10 8 9

    10 7 9 9 9 7 8 7 8 9

    a) Find the mode, median and mean for these data.

    b) Compare your results and comment on the likely shape of the distribution.

    c) Plot a simple bar chart to portray the data.

    Exercise 2.2

    The number of credit cards carried by 25 shoppers are:

    2 5 2 0 4 3 0 1 1 7 1 4 1

    3 9 4 1 4 1 5 5 2 3 1 1

    a) Determine the mode and median of this distribution.

    b) Calculate the mean of the distribution and compare it to the mode and median.

    What can you conclude about the shape of the distribution?

    c) Draw a bar chart to represent the distribution and confirm your conclusions in (b).


    Exercise 2.3

    A supermarket has one checkout for customers who wish to purchase 10 items or less.

    The numbers of items presented at this checkout by 19 customers were:

    10 8 7 7 6 11 10 8 9 9

    9 6 10 9 8 9 10 10 10

    a) Find the mode, median and mean for these data.

    b) What do your results for (a) tell you about the shape of the distribution?

    c) Plot a simple bar chart to portray the distribution.

    Exercise 2.4

    The numbers of driving tests taken to pass by 28 clients of a driving school are given in the following table:

    Tests taken    Number of clients
    1              10
    2              8
    3              4
    4              3
    5              3

    a) Obtain the mode, median and mean from this frequency distribution and compare their values.

    b) Plot a simple bar chart of the distribution.

    Exercise 2.5

    Spina Software Solutions operates an on-line help and advice service for PC owners. The numbers of calls made to them by subscribers in a month are tabulated below.

                  Number of subscribers
    Calls made    Female    Male
    1             31        47
    2             44        42
    3             19        24
    4             6         15
    5             1         4

    Find the mode, median and mean for both distributions and use them to compare the two distributions.


    Exercise 2.6

    Toofley the chemists own 29 pharmacies. The number of packets of a new skin medication sold in each of their

    shops in a week were:

    7 22 17 13 11 20 15 18 5 22

    6 18 10 13 33 13 9 8 9 19

    19 8 12 12 21 20 12 13 22

    a) Find the mode and range of the data.

    b) Identify the median of the data.

    c) Find the lower and upper quartile values.

    d) Determine the semi-interquartile range.

    Exercise 2.7

    Voditel international owns a large fleet of company cars. The mileages, in thousands of miles, of a sample of 17

    of their cars over the last financial year were:

    11 31 27 26 27 35 23 19 28 25

    15 36 29 27 26 22 20

    Calculate the mean and standard deviation of these mileage figures.

    Exercise 2.8

    Three credit companies each produced an analysis of its customers' bills over the last month. The following

    results have been published:

    Company Mean bill size Standard deviation of bill size

    Akula N$559 N$172

    Bremia N$612 N$147

    Dolg N$507 N$161

    Are the following statements true or false?

    a) Dolg bills are on average the smallest and vary more than those from the other companies.

    b) Bremia bills are on average the largest and vary more than those from other companies.

    c) Akula bills are on average larger than those from Dolg and vary more than those from Bremia.

    d) Akula bills are on average smaller than those from Bremia and vary less than those from Dolg.

    e) Bremia bills are on average larger than those from Akula and vary more than those from Dolg.

    f) Dolg bills vary less than those from Akula and are on average less than those from Bremia.


    Exercise 2.9

    The kilocalories per portion in a sample of 32 different breakfast cereals were recorded and collated into the

    following grouped frequency distribution:

    Kcal per portion Frequency

    80 up to 120 3

    120 up to 160 11

    160 up to 200 9

    200 up to 240 7

    240 up to 280 2

    a) Obtain an approximate value for the median of the distribution.

    b) Calculate approximate values for the mean and standard deviation of the distribution.

    Exercise 2.10

    The stem and leaf display below shows the Friday night admission prices for 31 clubs.

    Stem Leaves

    0 44

    0 5555677789

    1 000224444

    1 5555588

    2 002

    Leaf unit = N$1

    Find the values of the median and semi-interquartile range.

    Exercise 2.11

    Select which of the statements on the right-hand side best defines the words on the left-hand side.

    (i) median (a) the square of the standard deviation

    (ii) range (b) a diagram based on order statistics

    (iii) variance (c) the most frequently occurring value

    (iv) boxplot (d) the difference between the extreme observations

    (v) SIQR (e) the middle value

    (vi) mode (f) half the difference between the first and third quartiles


    Student self review questions

    1) What is a measure of location?

    2) How is the arithmetic mean defined?

    3) Why is the special notation $x_1, x_2, \ldots, x_n$ used?

    4) What does $\sum fx$ mean?

    5) Why is the formula for the arithmetic mean of a frequency distribution different to that for the mean of a

    set?

    6) How is it that the mean of a grouped frequency distribution cannot be calculated exactly?

    7) In what situation would a weighted mean be used?

    8) Why is the mean considered to be the mathematical average?

    9) What is the main disadvantage of the mean?

    10) How is the mode defined?

    11) Why is the mode not used extensively in statistical analysis?

    12) Under what conditions may any one of the mean, median or mode be estimated, given the values of the

    other two?

    13) Write down the definition of the geometric mean and the type of values that it can be used to average.

    14) Write down the definition of the harmonic mean and type of values that it can be used to average.

    15) How is the median defined?

    16) If a set has an even number of items, how can the median be determined?

    17) Describe briefly how to estimate the median of a grouped frequency distribution graphically.

    18) What is the graphical equivalent of the interpolation formula?

    19) On balance, why is the graphical method preferred to the formula method for estimating the median?

    20) Name two separate conditions under which the median rather than the mean would be chosen as a

    measure of location and explain why.

    21) What is the main disadvantage of the median?

    22) What characteristic of the mean deviation precludes it from being the natural partner to the mean?

    23) How is the standard deviation defined?

    24) What is the practical advantage in using the computational formula for calculating the standard deviation?

    25) The standard deviation is the natural partner to the mean. Explain why this is so.

    26) What percentage of an approximately symmetric distribution lies within two standard deviations from the mean?

    27) What is the coefficient of variation and how is it used?

    28) How is Pearson's measure of skewness calculated and how does it measure skewness?

    29) What is the variance and why is it not used for practical purposes as a measure of dispersion?


    UNIT 3: MEASURE OF DISPERSION (VARIABILITY)

    OBJECTIVES

    By the end of this study unit, you should be able to:

    Define the various measures of dispersion.

    Compute each dispersion measure for both grouped and ungrouped sets of data.

    Interpret each measure of dispersion.

    CONTENT

    3.1 Introduction

    3.2 Range

    3.3 Standard deviation

    3.4 Variance

    3.5 Coefficient of variation

    3.6 Measure of non-central position

    3.7 Self-evaluation


    3.1 Introduction

    For two projects A and B, we estimate the returns on the projects over the next year. We look at the percentage

    return that will be achieved under different conditions (pessimistic, normal or optimistic).

                 Pessimistic   Normal   Optimistic
    Project A        12          13         14
    Project B         0          13         26

    Which project should the company invest in if the pessimistic, normal and optimistic conditions are equally likely?

    For Project A the mean is $\bar{x} = (12 + 13 + 14)/3 = 13\%$. For Project B the mean is $\bar{x} = (0 + 13 + 26)/3 = 13\%$.

    The mean returns for the two projects are equal, so their expected returns are equal. Would you prefer Project A, where your minimum return is 12%, or Project B, where you could make no return at all (0%)? In the Finance course that forms part of the MBA you will learn to select the project that is more predictable: you want to maximize your return, but at the same time you want to minimize your risk. The returns for the projects are the same, but Project A is more predictable. In statistics we therefore need measures of this spread. For the example above (with only three values) it is easy to see that Project B has a wider spread of returns, but what happens if we have hundreds of values?

    Averages are not sensitive to the variability among data. Consider the following two groups of data:

    Group A   Group B
       65        42
       66        54
       67        58
       68        62
       71        67
       73        77
       74        77
       77        85
       77        93
       77       100

    Computed averages:

               Group A            Group B
    Mean       715/10 = 71.5      715/10 = 71.5
    Median     72                 72
    Mode       77                 77

    Interpretation

    Although there is no difference in the computed central measures between the two groups, the scores of Group

    B are much more widely scattered than the scores for Group A.
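The comparison above can be checked with a minimal sketch, assuming Python and its standard `statistics` module. The variable and function names are illustrative and are not part of this guide.

```python
from statistics import mean, median, mode

# Scores copied from the Group A / Group B table above.
group_a = [65, 66, 67, 68, 71, 73, 74, 77, 77, 77]
group_b = [42, 54, 58, 62, 67, 77, 77, 85, 93, 100]


def central_measures(scores):
    """Return (mean, median, mode) for a list of scores."""
    return mean(scores), median(scores), mode(scores)


summary_a = central_measures(group_a)
summary_b = central_measures(group_b)
print("Group A:", summary_a)  # Group A: (71.5, 72.0, 77)
print("Group B:", summary_b)  # Group B: (71.5, 72.0, 77)
```

Both groups give identical central measures, which is why a separate measure of spread is needed to tell them apart.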

    SELF-ASSESSMENT ACTIVITY 3.1

    Which types of measures are used to measure dispersion (variability)?

    SOLUTION TO SELF-ASSESSMENT ACTIVITY 3.1

    The measures that are used to measure dispersion are:

    Range

    Standard deviation

    Interquartile range

    Quartile deviation

    Variance

    The method of computation, appropriate data types, uses and interpretation of each are now described.

    3.2 Range

    The first measure is the range: the largest value minus the smallest value. For Project A above it is 14 - 12 = 2%, while for Project B it is 26 - 0 = 26%. The problem with this measure is that it uses only two observations; we would rather have a measure that uses all the values.

    For Group A the range is 77 - 65 = 12, and for Group B it is 100 - 42 = 58, which suggests greater dispersion in Group B. Because the range depends only on the maximum and minimum scores, it is a rough measure of spread.

    Ungrouped data: Range = Maximum value - Minimum value = $x_{\max} - x_{\min}$

    Grouped data: Range = Upper limit of highest class - Lower limit of lowest class.
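The two range formulas can be sketched in Python as follows. The function names and the grouped class limits are assumptions made for illustration; only the Group A and Group B scores come from this unit.

```python
def range_ungrouped(values):
    """Range of raw data: maximum value minus minimum value."""
    return max(values) - min(values)


def range_grouped(class_limits):
    """Range of grouped data.

    class_limits is a list of (lower_limit, upper_limit) pairs, one per class.
    """
    lowest_lower = min(lower for lower, _ in class_limits)
    highest_upper = max(upper for _, upper in class_limits)
    return highest_upper - lowest_lower


group_a = [65, 66, 67, 68, 71, 73, 74, 77, 77, 77]
group_b = [42, 54, 58, 62, 67, 77, 77, 85, 93, 100]
print(range_ungrouped(group_a))  # 12
print(range_ungrouped(group_b))  # 58

# Illustrative class limits (an assumption, not data from this section).
classes = [(0, 10), (10, 20), (20, 30)]
print(range_grouped(classes))  # 30
```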

    SELF-ASSESSMENT ACTIVITY 3.2

    The merchandising manager for a retail clothing chain has recorded 30 observations on the number of days

    between re-orders for a particular range of women's clothing.

    The re-order intervals (in days) are:

    18 26 15 17 7 27 24 17 10 17

    23 29 28 18 10 23 16 9 12 26

    5 12 23 22 24 14 16 26 19 22

    Find the range of the number of days between re-orders.

    SOLUTION TO SELF-ASSESSMENT ACTIVITY 3.2

    $x_{\max}$ = 29

    $x_{\min}$ = 5

    Range = 29 - 5 = 24 days

    Interpretation

    24 days separate the shortest time ($x_{\min}$) between successive re-orders from the longest time ($x_{\max}$) between successive re-orders for this range of women's clothing. The range depends only on the minimum and maximum values.
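As a quick check of this solution, the 30 re-order intervals can be typed into a short Python sketch. The variable names are illustrative.

```python
# Re-order intervals (in days) from Self-Assessment Activity 3.2.
reorder_intervals = [
    18, 26, 15, 17, 7, 27, 24, 17, 10, 17,
    23, 29, 28, 18, 10, 23, 16, 9, 12, 26,
    5, 12, 23, 22, 24, 14, 16, 26, 19, 22,
]

x_max = max(reorder_intervals)
x_min = min(reorder_intervals)
data_range = x_max - x_min

print(len(reorder_intervals), x_min, x_max, data_range)  # 30 5 29 24
```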

    3.3 Standard deviation

    The standard deviation is given by the formula:

    $$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

    For the following data, with the outcomes assumed to be equally likely, the standard deviation is calculated as follows:

                 Pessimistic   Normal   Optimistic
    Project A        12          13         14
    Project B         0          13         26

    For Project A the standard deviation is

    $$S = \sqrt{\frac{(12 - 13)^2 + (13 - 13)^2 + (14 - 13)^2}{3 - 1}} = \sqrt{\frac{2}{2}} = 1$$

    For Project B the standard deviation is

    $$S = \sqrt{\frac{(0 - 13)^2 + (13 - 13)^2 + (26 - 13)^2}{3 - 1}} = \sqrt{\frac{338}{2}} = 13$$

    We see that the standard deviation for Project B is 13 times as large as the standard deviation for Project A.

    In Excel, the standard deviation for Project A can be found by entering =STDEV(12,13,14) in a cell.

    In the exam you do not have Excel, so you will have to use a calculator. Most calculators can calculate the

    statistical functions.

    1. Put the calculator in Stat mode.
    2. Enter 12.
    3. Press the DATA button (usually the M+ button).
    4. The calculator displays 1; this means that you have entered one value.
    5. Enter 13 and press DATA; the calculator displays 2.
    6. Enter 14 and press DATA; the calculator displays 3.
    7. Now ask for $\bar{x}$ (usually second function 4) and the calculator will display 13.
    8. Ask for $S_{n-1}$, the sample standard deviation (some calculators show this as $\sigma_{n-1}$), and the calculator will display 1, in agreement with the calculation for Project A above. The $S_n$ key (usually second function 6) divides by $n$ instead of $n - 1$ and displays about 0.82 for these data; use it only when the data make up a whole population.

    Try this for Project B to see that you are doing it correctly.
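If Python is available, the same check can be done with the standard `statistics` module. This is a minimal sketch under that assumption: `stdev` divides by $n - 1$ as in the formula of this section, while `pstdev` divides by $n$.

```python
from statistics import mean, pstdev, stdev

# Percentage returns from the project table in this section.
projects = {
    "Project A": [12, 13, 14],
    "Project B": [0, 13, 26],
}

for name, returns in projects.items():
    print(
        f"{name}: mean = {mean(returns)}, "
        f"sample s.d. (n - 1) = {stdev(returns):.2f}, "
        f"population s.d. (n) = {pstdev(returns):.2f}"
    )
```

The sample values (1 and 13) match the hand calculations above. The population values are smaller (about 0.82 and 10.61) because the same sum of squares is divided by 3 instead of 2.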

    In Unit 5 we will come back to this. At this stage we can state that about two thirds of the values fall within one standard deviation of the mean. For the grouped mass data in Table 1 (56 values, mean 55.66 kg, standard deviation $\sqrt{20.53} \approx 4.53$ kg), about two thirds of the values (about 37) fall between 55.66 - 4.53 = 51.13 and 55.66 + 4.53 = 60.19. This gives us an indication of how far the values are from the mean (the central value).

    In Corporate Finance the risk (uncertainty) is often measured with the standard deviation. Practitioners often say that the risk is 4.53, but to be correct they should say that the standard deviation is 4.53.

    3.3.1 Ungrouped data

    $$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$   Mathematical formula

    or

    $$s = \sqrt{\frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}}$$   Computational formula

    Steps (Mathematical formula)

    1. Compute the arithmetic mean ($\bar{x}$).
    2. Subtract the mean from each data value: $(x - \bar{x})$.
    3. Square each difference: $(x - \bar{x})^2$.
    4. Sum the squared differences: $\sum (x - \bar{x})^2$.
    5. Calculate the average by dividing the sum by $(n - 1)$. Division by $(n - 1)$ corrects the bias in estimating the population standard deviation using the sample standard deviation.
    6. The standard deviation is the square root of this result.

    Example

    Find the standard deviation of the following sample scores: 2, 3, 5, 6, 9, 17

           $x$    $(x - \bar{x})$    $(x - \bar{x})^2$
             2         -5                 25
             3         -4                 16
             5         -2                  4
             6         -1                  1
             9          2                  4
            17         10                100
    Total   42          0                150

    Mean: $\bar{x} = 42/6 = 7$

    Using the mathematical formula for ungrouped data, the standard deviation is:

    $$s = \sqrt{\frac{150}{6 - 1}} = \sqrt{\frac{150}{5}} = \sqrt{30} \approx 5.5$$
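The six steps of the mathematical formula can be traced in a minimal Python sketch. The function name `sample_std_dev` is hypothetical; each line is labelled with the step it carries out.

```python
from math import sqrt


def sample_std_dev(values):
    """Sample standard deviation via the step-by-step (mathematical) formula."""
    n = len(values)
    mean = sum(values) / n                         # Step 1: arithmetic mean
    deviations = [x - mean for x in values]        # Step 2: x minus the mean
    squared = [d ** 2 for d in deviations]         # Step 3: square each difference
    total = sum(squared)                           # Step 4: sum of squared differences
    variance = total / (n - 1)                     # Step 5: divide by n - 1
    return sqrt(variance)                          # Step 6: square root


scores = [2, 3, 5, 6, 9, 17]
s = sample_std_dev(scores)
print(round(s, 2))  # 5.48
```

The unrounded value is $\sqrt{30} \approx 5.477$, which the guide reports to one decimal place as 5.5.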

    We will now apply the computational formula to the example above.

    From the table above, the sum of $x$ is $\sum x = 42$.

    The sum of the squares is $\sum x^2 = 4 + 9 + 25 + 36 + 81 + 289 = 444$.

    Thus the standard deviation is:

    $$s = \sqrt{\frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}} = \sqrt{\frac{6(444) - (42)^2}{6(6 - 1)}} = \sqrt{\frac{2664 - 1764}{30}} = \sqrt{\frac{900}{30}} = \sqrt{30} \approx 5.5$$

    The answer is identical to the result calculated previously.

    Check whether you get the same answer if you use the statistics function on the calculator.
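Where Python is at hand, the computational formula can also be checked against the standard library. In this sketch the function name is hypothetical, and `statistics.stdev` plays the role of the calculator's sample standard deviation key.

```python
from math import sqrt
from statistics import stdev


def sample_std_dev_computational(values):
    """Sample standard deviation from the raw sums only (computational formula)."""
    n = len(values)
    sum_x = sum(values)                   # sum of x
    sum_x2 = sum(x * x for x in values)   # sum of x squared
    return sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


scores = [2, 3, 5, 6, 9, 17]
print(sum(scores), sum(x * x for x in scores))  # 42 444
s_comp = sample_std_dev_computational(scores)
print(round(s_comp, 3), round(stdev(scores), 3))
```

The advantage of this form is that only two running totals, $\sum x$ and $\sum x^2$, are needed; the mean and the individual deviations never have to be written down.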

    3.3.2 Grouped data

    If the actual raw data are not available and we have to calculate the standard deviation from the grouped data, we use the formula:

    $$S = \sqrt{\frac{\sum f x^2 - n \bar{x}^2}{n - 1}}$$

    Table 1

    Mass (in kg)   Class midpoint ($x$)   Frequency ($f$)   $f x^2$
    45-49                  47                    6          6 × 47 × 47 = 13254
    50-54                  52                   14          14 × 52 × 52 = 37856
    55-59                  57                   25          25 × 57 × 57 = 81225
    60-64                  62                   11          11 × 62 × 62 = 42284

    In Unit 2 (see 2.2.1) we calculated the mean as 55.66 kg and we saw that the total frequency is 56.

    To get $\sum f x^2$, we add the last column: 13254 + 37856 + 81225 + 42284 = 174 619.

    $$S = \sqrt{\frac{\sum f x^2 - n \bar{x}^2}{n - 1}} = \sqrt{\frac{174619 - 56(55.66)^2}{56 - 1}} = \sqrt{20.53} \approx 4.53 \text{ kg}$$
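The arithmetic for Table 1 can be retraced with a short Python sketch. Variable names are illustrative, and the mean is rounded to 55.66 on purpose so that the intermediate figures match the ones quoted in the text.

```python
from math import sqrt

midpoints = [47, 52, 57, 62]     # class midpoints x from Table 1
frequencies = [6, 14, 25, 11]    # class frequencies f from Table 1

n = sum(frequencies)                                                # total frequency
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))         # sum of f*x
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))    # sum of f*x^2

mean = round(sum_fx / n, 2)      # rounded to two decimals, as in the text

variance = (sum_fx2 - n * mean ** 2) / (n - 1)   # quantity under the square root
std_dev = sqrt(variance)

print(n, sum_fx, sum_fx2, mean)                  # 56 3117 174619 55.66
print(round(variance, 2), round(std_dev, 2))
```

In this sketch the figure 20.53 appears as the quantity under the square root, and taking the root gives about 4.53 kg. With the unrounded mean the results shift slightly, to about 20.45 and 4.52.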

    If data have been grouped into a frequency distribution, each class is represented by its midpoint ($x$).

    $$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}}$$   Mathematical formula

    Steps

    1. Compute the arithmetic mean ($\bar{x}$).
    2. Subtract the mean from each midpoint and square the difference: $(x - \bar{x})^2$.
    3. Multiply the squared difference by the frequency within each class: $(x - \bar{x})^2 f$.
    4. Sum the results to obtain the total squared deviation from the mean: $\sum (x - \bar{x})^2 f$.
    5. Calculate the average by dividing this total by $(n - 1)$.
    6. The standard deviation is the square root of this result.

    OR

    $$s = \sqrt{\frac{n \sum f x^2 - \left(\sum f x\right)^2}{n(n - 1)}}$$   Computational formula

    $x$ = class mark (midpoint of class interval)

    $f$ = frequency

    $n$ = sample size

    SELF-ASSESSMENT ACTIVITY 3.3

    The errors in seven invoices were recorded as follows: 120, 30, 40, 8, 5, 20, 29

    Use this data to calculate the standard deviation using both the Mathematical formula and Computational

    formula.

    SOLUTION TO SELF-ASSESSMENT ACTIVITY 3.3

    Mathematical formula

           $x$    $(x - \bar{x})$    $(x - \bar{x})^2$
           120         84               7 056
            30         -6                  36
            40          4                  16
             8        -28                 784
             5        -31                 961
            20        -16                 256
            29         -7                  49
    Total  252          0               9 158

    $\bar{x} = 252/7 = 36$

    $$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{9158}{7 - 1}} \approx 39.07$$

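    Computational formula

           $x$     $x^2$
           120     14400
            30       900
            40      1600
             8        64
             5        25
            20       400
            29       841
    Total  252    18 230

    $\sum x = 252$, $\sum x^2 = 18\,230$

    $$s = \sqrt{\frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}} = \sqrt{\frac{7(18230) - (252)^2}{7(7 - 1)}} = \sqrt{\frac{127610 - 63504}{42}} = \sqrt{\frac{64106}{42}} = \sqrt{1526.333} \approx 39.07$$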
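Both routes through this solution can be verified with a minimal Python sketch. The variable names are illustrative; the data are the seven invoice errors from the activity.

```python
from math import sqrt

errors = [120, 30, 40, 8, 5, 20, 29]
n = len(errors)
mean = sum(errors) / n   # 252 / 7 = 36.0

# Mathematical formula: work from the deviations about the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in errors)
s_math = sqrt(sum_sq_dev / (n - 1))

# Computational formula: work from the raw sums only.
sum_x = sum(errors)
sum_x2 = sum(x * x for x in errors)
s_comp = sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

print(sum_x, sum_x2, sum_sq_dev)             # 252 18230 9158.0
print(round(s_math, 2), round(s_comp, 2))    # 39.07 39.07
```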

    SELF-ASSESSMENT ACTIVITY 3.4

    The times (in hours per week) that 50 office staff members spent using personal computers were as follows:

    Time (hours/week)   Frequency ($f$)
         0 - 3               14
         3 - 6                6
         6 - 9                6
         9 - 12               7
        12 - 15              14
        15 - 18               3
                        $\sum f$ = 50

    Use this data to compute the standard deviation using both Mathematical and Computational formulae.

    SOLUTION TO SELF-ASSESSMENT ACTIVITY 3.4

    Mathematical formula approach

    Time (h)    $f$     $x$     $fx$    $(x - \bar{x})^2$   $(x - \bar{x})^2 f$
     0 - 3      14      1.5     21          43.56               609.84
     3 - 6       6      4.5     27          12.96                77.76
     6 - 9       6      7.5     45           0.36                 2.16
     9 - 12      7     10.5     73.5         5.76                40.32
    12 - 15     14     13.5    189          29.16               408.24
    15 - 18      3     16.5     49.5        70.56               211.68
    Total       50             405                            1 350.00

    Mean: $\bar{x} = \frac{\sum f x}{\sum f} = \frac{405}{50} = 8.1$ h.

    Standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{1350}{50 - 1}} \approx 5.25$ h.

    Computational formula approach

    Time (h)    $f$     $x$     $fx$      $x^2$      $f x^2$
     0 - 3      14      1.5     21         2.25       31.5
     3 - 6       6      4.5     27        20.25      121.5
     6 - 9       6      7.5     45        56.25      337.5
     9 - 12      7     10.5     73.5     110.25      771.75
    12 - 15     14     13.5    189       182.25     2551.5
    15 - 18      3     16.5     49.5     272.25      816.75
    Total       50     54      405       643.5      4630.5

    $\sum f = 50$, $\sum x = 54$, $\sum f x = 405$, $\sum x^2 = 643.5$, $\sum f x^2 = 4630.5$

    $$s = \sqrt{\frac{n \sum f x^2 - \left(\sum f x\right)^2}{n(n - 1)}} = \sqrt{\frac{50(4630.5) - (405)^2}{50(50 - 1)}} = \sqrt{\frac{231525 - 164025}{2450}} = \sqrt{\frac{67500}{2450}} = \sqrt{27.55} = 5.248 \approx 5.25 \text{ hours}$$
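The grouped-data solution can be reproduced with a minimal Python sketch that builds the midpoints from the class limits and then applies both formulas. Variable names are illustrative and not part of the guide.

```python
from math import sqrt

# Class limits and frequencies from Self-Assessment Activity 3.4.
class_limits = [(0, 3), (3, 6), (6, 9), (9, 12), (12, 15), (15, 18)]
frequencies = [14, 6, 6, 7, 14, 3]

# Each class is represented by its midpoint.
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]

n = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
mean = sum_fx / n

# Mathematical formula: frequency-weighted squared deviations.
sum_sq_dev = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints))
s_math = sqrt(sum_sq_dev / (n - 1))

# Computational formula: raw weighted sums only.
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))
s_comp = sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

print(n, sum_fx, sum_fx2)                    # 50 405.0 4630.5
print(round(s_math, 2), round(s_comp, 2))    # 5.25 5.25
```

Because the midpoints stand in for the raw observations, the result is an approximation to the standard deviation of the original 50 times.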

    3.4 Variance

    The variance is the square of the standard deviation.

    The variance for Project A is $1^2 = 1$, and for Project B it is $13^2 = 169$.
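The relationship between the two measures can be confirmed with Python's standard `statistics` module. This is a small sketch with illustrative variable names; `variance` and `stdev` both use the $n - 1$ divisor.

```python
from statistics import stdev, variance

project_a = [12, 13, 14]
project_b = [0, 13, 26]

for name, returns in (("Project A", project_a), ("Project B", project_b)):
    s = stdev(returns)          # sample standard deviation
    s2 = variance(returns)      # sample variance
    print(f"{name}: s = {s:.0f}, s^2 = {s2}")
```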

    Computation for ungrouped data

    Example:

    Consider the ages (in years) of 7 second-hand cars: 13 7 10 15 12 18 9

    Age in years ($x$)   $\bar{x}$   $x - \bar{x}$   $(x - \bar{x})^2$
           13              12           +1                 1
            7              12           -5                25
           10              12           -2                 4
           15              12           +3                 9
           12              12            0                 0
           18              12           +6                36
            9              12           -3                 9
    Total                         $\sum (x - \bar{x}) = 0$   $\sum (x - \bar{x})^2 = 84$

    Step 1:

    Find the sample mean: $\bar{x} = \frac{\sum x}{n} = \frac{84}{7} = 12$ years.

    Step 2:

    Find the squared deviation of each observation from the sample mean.

    Since $\sum (x - \bar{x}) = 0$ (column 3 above), the deviations must first be squared to prevent the positive and negative deviations from cancelling each other. These squared deviations are then summed (see column 4 above).

    Step 3:

    Compute the variance by dividing the total squared deviation by $(n - 1)$,

    i.e., variance $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{84}{7 - 1} = \frac{84}{6} = 14$

    The formula for a variance can now be expressed as:

    Variance = (sum of squared deviations) / (sample size - 1)

    $$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$   Mathematical formula

    The mathematical formula above is cumbersome to apply by hand. The more efficient computational formula is strongly recommended for students.

    $$s^2 = \frac{\sum x^2 - n \bar{x}^2}{n - 1}$$   Computational formula

    Example

    The variance for the car age problem, using the computational formula.

    Age of car in years ($x$)   $x^2$
              13                 169
               7                  49
              10                 100
              15                 225
              12                 144
              18                 324
               9                  81
    Total     84               1 092

    $n = 7$, $\bar{x} = \frac{84}{7} = 12$ years

    $$s^2 = \frac{1092 - (7)(12^2)}{7 - 1} = \frac{84}{6} = 14$$
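As a check on the car age example, both variance formulas can be evaluated in a minimal Python sketch. The variable names are illustrative.

```python
ages = [13, 7, 10, 15, 12, 18, 9]   # ages (in years) of the 7 second-hand cars
n = len(ages)
mean = sum(ages) / n                # 84 / 7 = 12.0

# Mathematical formula: sum of squared deviations divided by n - 1.
var_math = sum((x - mean) ** 2 for x in ages) / (n - 1)

# Computational formula: sum of squares minus n times the squared mean.
sum_x2 = sum(x * x for x in ages)
var_comp = (sum_x2 - n * mean ** 2) / (n - 1)

print(sum(ages), sum_x2)     # 84 1092
print(var_math, var_comp)    # 14.0 14.0
```

The variance is expressed in squared units (here, years squared), which is one reason the standard deviation is usually quoted when describing spread.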

  • Quantitative Methods

    MANCOSA - MBA