STX1110 Introduction to Quantitative Methods © Middlesex University Business School
STX1110
Introduction to Quantitative Methods
Module Notes
Mathematics and Statistics Group
STX1110 Introduction to Quantitative Methods Mathematics and Statistics Group Middlesex University Business School
sbegin
Contents
Statistics Section
S1 Collecting Data 1
S2 Collecting Data 2
S3 Summarising and Presenting Data 1
S4 Summarising and Presenting Data 2
S5 Summarising and Presenting Data 3
S6 Numerical summaries of Data 1
S7 Numerical summaries of Data 2
S8 Correlation and Regression 1
S9 Correlation and Regression 2
S10 Estimation
Mathematics Section
M1 Financial Maths 1
M2 Financial Maths 2
M3 Index Numbers
M4 Intro to Probability
M5 Standard Normal distribution
M6 General Normal distribution
M7 Linear Equations
M8 Linear Programming and Optimisation
M9 Time Series Analysis
STATISTICS SECTION
Original author: Cathy Minett-Smith
Revisions by: Thomas Bending, Alison Megeney
CHAPTER=DC1
STARTSECTION=scope_1.htm= SECTION~
Collecting Data 1 Context
Statistics is concerned with scientific methods for collecting, summarising, presenting and analysing data (information), which may be numerical or non-numerical. Quite often we need to make decisions or draw valid conclusions when we are given incomplete information. For example, we may need to say something about the total sales of a company when we only have a small selection of the company’s invoices. Statistical methods can help you make the ‘best decision’ or draw the ‘most reasonable’ conclusion.
ENDSECTION STARTSECTION=scope_2.htm= SECTION~
Objectives
Having worked through this unit, you should:
• appreciate the relevance of Statistics;
• be able to explain what is understood by the Statistical term ‘population’;
• be able to explain what a statistical sample from a population is;
• understand why in certain situations it may be necessary to take a sample;
• be able to explain the necessary steps involved in a selection of sampling methods.
ENDSECTION STARTSECTION=content_1.htm= SECTION~
Why study Statistics?
There are a number of reasons why it makes sense to have a basic grasp of statistics. Below are listed just some of them.
• Much of the information we have to process at home, as consumers, at work and in the community comes in the form of numbers, graphs and charts. A statistical awareness helps us to make sense of the information we are confronted with, particularly when it is being used in a misleading way.
• Many of the important decisions we have to make involve handling numbers and weighing up risk. A knowledge of statistics won’t guarantee you make the right decision, but at least your decisions should be better informed.
• Almost every academic subject is becoming increasingly quantitative so it is becoming harder to hide from the need to have a basic understanding of statistics. This is particularly true of business.
• You may not believe me but Statistics can be enjoyable. By the end of this module you may find yourself agreeing with me!
ENDSECTION STARTSECTION=content_2.htm= SECTION~
Statistical quotes
“There are 3 kinds of lies: Lies, damned lies and Statistics”, Mark Twain
“Statistics is like a bikini; what they reveal is suggestive, what they conceal is vital”, Mr Motivator (GMTV 1996)
ENDSECTION STARTSECTION=content_3.htm= SECTION~
Statistical investigations
There are four key stages in a Statistical Investigation:
• pose a question;
• collect relevant data;
• analyse the data;
• interpret the results.
Whether data are relevant is determined by whether or not the information will allow you to answer your central question. If your initial question is clearly formulated it will be easier to make sensible decisions at the other stages of the investigation.
We are going to spend time considering the second step of the investigation process: Collecting Relevant Data. This falls into two main sections:
• Identifying individuals or items to question, test or measure.
• Collecting information from each person or item identified.
We will deal with the first point in this unit and leave the second point until the next unit.
ENDSECTION STARTSECTION=content_4.htm= SECTION~
Sampling methods
Before we get into the detail of various sampling methods we need to explain some terminology which we will be using.
ENDSECTION STARTSECTION=content_5.htm= SECTION~
Sampling terminology
Population
In statistics, the term ‘population’ isn’t restricted to a population of animals or humans; it is used to describe any large group of things that we are trying to measure.
Sample
A sample is a selection of items from the population which will be questioned or measured. A good sample is one which fairly represents the population from which it was taken.
Examples
Suppose the central question of our investigation is ‘What is the average amount of time each week that Middlesex University Students spend in the library?’
• Population: All Middlesex University Students
• Possible sample: Students registered for STX1110.
Would this sample be likely to generate an answer that is representative of the population?
Suppose the central question of our investigation is ‘What proportion of the components coming off a production line are defective?’
• Population: All products produced.
• Possible sample: Every 50th item produced.
Would this sample be likely to generate an answer that is representative of the population?
ENDSECTION STARTSECTION=content_6.htm= SECTION~
Taking a Census
If you measure or question all of the population this is called ‘taking a census’. Obviously, if you question everyone and record the answers accurately then your conclusions will be absolutely correct. This may leave you thinking; ‘Why bother to take a sample then?’ Consider the following points.
ENDSECTION STARTSECTION=content_7.htm= SECTION~
Why take a Sample?
• In practice, a census rarely achieves the completeness required.
• A census is typically expensive both financially and in respect of the time required. The time and money may not be available and the cost of the census may exceed the value of the results.
• By the time you complete a census the results may be out of date.
• If your central question means the items measured need to be tested to destruction, a census will result in the manufacturer having no product left to sell!
• It may be the case that your population is unidentifiable. This is particularly true in situations such as market research.
The main disadvantage of taking a sample is that sometimes it can be difficult to convince other people that your sample results will generalise to the whole population. It is important to ensure that your sample is representative or unbiased. Generally, an unbiased sample result will generalise to a correct population result. Consequently, sampling is one of the most important subjects in quantitative methods. Various sampling methods have been developed to ensure that the resulting sample is unbiased.
When choosing a sample it is important that the individuals or items in the sample cover all areas of the population to be examined. If this requirement is not met the resulting sample will be biased.
ENDSECTION STARTSECTION=content_8.htm= SECTION~
Sampling Frame
Some sampling methods require all members of the population to be known and identifiable. The structure which supports this identification is called a sampling frame. Some methods require a sampling frame only as a listing of the population while other methods also need certain characteristics of each member to be known. For example, employees can be identified from company records; in addition we can identify if the employee is male or female, full time or part time etc. A sampling frame should have the following characteristics;
• Completeness. Are all the members of the population included on the list and are the necessary subgroups, e.g. male/female, identifiable?
• Accuracy. Is the information correct?
• Is the list up to date?
• Is the sampling frame readily available?
• Does each member appear only once in the list?
Two readily available sampling frames for the UK population are the council tax register (list of dwellings) and the electoral register (list of individuals).
ENDSECTION STARTSECTION=content_9.htm= SECTION~
Simple Random Sampling
A simple random sample is selected in such a way that every item in the population has an equal chance of being selected. The following steps are involved in taking a simple random sample.
• Acquire a sampling frame and number all the individuals from 1 to the size of the population.
• Generate a random number. Your calculator will generate a random number for you, or you can use a computer or random number tables.
• Select the member of the population whose number matches the generated random number.
• Repeat the process to the required sample size.
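The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simple_random_sample`, the 300-person frame and the optional `seed` are all made up for the example, and the sketch assumes that a random number which has already come up is simply ignored so that nobody is selected twice.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Pick n distinct members of a sampling frame, each equally likely."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the population size")
    rng = random.Random(seed)          # seed is optional; it only makes runs repeatable
    picked_numbers = set()
    while len(picked_numbers) < n:
        # Members are numbered 1..N; draw one of those numbers at random.
        # Adding to a set means a repeated number is simply ignored (assumption).
        picked_numbers.add(rng.randint(1, len(frame)))
    # Number k corresponds to position k - 1 in the Python list.
    return [frame[k - 1] for k in sorted(picked_numbers)]

employees = [f"Employee {i}" for i in range(1, 301)]   # hypothetical frame of 300 people
sample = simple_random_sample(employees, 10, seed=1)
print(sample)
```

In practice the built-in `random.sample(frame, n)` does the same job in one call; the loop is written out here only to mirror the numbered steps.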
A sample generated this way will be unbiased but it does have some disadvantages.
• You need a sampling frame.
• Each selected person needs to be located and questioned. This may take a long time and the individual may be untraceable.
• Certain attributes may be over or under represented. For example the ratio of male/female employees in your sample may not reflect the ratio of male/female employees in your workforce.
ENDSECTION STARTSECTION=content_10.htm= SECTION~
Stratified Sampling
This is a method of selecting the right proportion of respondents from each attribute/subgroup of the population. To take a stratified sample you need to complete the following steps;
• Acquire a sampling frame with the required attributes known for each individual.
• Split the population into certain attributes or subgroups.
• Calculate how many individuals to sample from each subgroup as explained below.
• Use simple random sampling to select the relevant number of respondents from each subgroup.
Example
Suppose your workforce is made up of the following types of employee.
Number of employees by category
Type of employee Number of employees
Manual 200
Administrative 70
Manager 30
Total 300
If a sample of size 50 were required, the sample would be made up as follows.
Number of employees by category in the sample
Type of employee Number of employees in the sample
Manual $\frac{200}{300} \times 50 = 33.33 \approx 33$
Administrative $\frac{70}{300} \times 50 = 11.67 \approx 12$
Manager $\frac{30}{300} \times 50 = 5$
Total 50
Select the 33 manual workers by taking a simple random sample from the list of all manual workers in the company, and do likewise for the other types of employee.
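The allocation in the example can be checked with a short Python sketch. The function name `proportional_allocation` is hypothetical, and rounding each share to the nearest whole person is an assumption that happens to work here; with other figures the rounded shares may not add up exactly to the required sample size and would need adjusting by hand.

```python
def proportional_allocation(strata, sample_size):
    """Share a sample across subgroups in proportion to their sizes."""
    total = sum(strata.values())
    # subgroup size / population size x sample size, rounded to whole people
    return {name: round(count * sample_size / total)
            for name, count in strata.items()}

workforce = {"Manual": 200, "Administrative": 70, "Manager": 30}
allocation = proportional_allocation(workforce, 50)
print(allocation)   # {'Manual': 33, 'Administrative': 12, 'Manager': 5}
```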
The need for a detailed sampling frame generally causes an increase in the time and effort required to obtain the sample. Consequently stratified sampling is only used when you feel the people in the different subgroups or strata will respond differently. In many situations the subgroups or strata could involve multiple classifications. In the above example I could classify each manual worker as male or female, full time or part time etc., and likewise for the other types of worker. As the number of strata or subgroups increases, the sample size also needs to increase to reflect the complexity in the structure of the sample.
We are not always fortunate enough to have a sampling frame. In such situations we need to resort to non-random sampling techniques to generate our sample.
ENDSECTION STARTSECTION=content_11.htm= SECTION~
Systematic Sampling
Systematic sampling can be used with or without a sampling frame and may provide a good approximation to random sampling. To take a systematic sample you randomly select a starting point and then select every nth item. For example, to select a sample of size 30 from the 300 students on a module list, every 10th (300 ÷ 30) student after a random start in the first 10 students should be selected from the list. It is particularly useful for situations where the population is physically in evidence, such as items coming off a production line or a row of houses.
Systematic sampling is very easy to use and does not require a sampling frame. Bias can occur if the population contains a recurring pattern that matches the sampling interval. For example, if 4 machines feed a production line and we sample every 4th item, it is possible to end up with a sample consisting of products all made on the same machine, which would then be biased.
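A minimal Python sketch of both points follows. The function name `systematic_sample`, the list of 300 student numbers and the assumption that the four machines feed the line strictly in turn (1, 2, 3, 4, 1, 2, ...) are all hypothetical and chosen only for illustration.

```python
import random

def systematic_sample(items, sample_size, seed=None):
    """Take every k-th item after a random start among the first k items."""
    if not 0 < sample_size <= len(items):
        raise ValueError("sample size must be between 1 and the population size")
    interval = len(items) // sample_size              # k = population size / sample size
    start = random.Random(seed).randrange(interval)   # random position among the first k
    return items[start::interval][:sample_size]

students = list(range(1, 301))                        # hypothetical module list of 300
chosen = systematic_sample(students, 30, seed=0)
print(chosen)

# Assumed pattern: items come off the line from machines 1, 2, 3, 4, 1, 2, 3, 4, ...
machines = [i % 4 + 1 for i in range(100)]
every_fourth = machines[2::4]                         # arbitrary start, interval of 4
print(set(every_fourth))                              # only one machine is represented
```

Changing the interval to a value that does not share the period of the pattern (for example every 5th item) would spread the sample across all four machines.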
ENDSECTION STARTSECTION=content_12.htm= SECTION~
Quota Sampling
This is the most popular method of sampling in areas such as market research. The method uses an interviewer or team of interviewers each with a set number (quota) of subjects to interview. This method of sampling places a lot of responsibility on the interviewer as the choice of the subject to be interviewed is entirely up to them which can lead to bias in the sample. This can be overcome to some extent by subdividing the quota into different types to ensure the sample represents the population. For example the interviewer could be told to interview 30 males between the ages of 20 and 30 etc. This method of sampling is not random and it is hard to eliminate interviewer bias but it is administratively easy and relatively cheap.
Refer to Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshott, chapter 4 (pages 59-80), for a more detailed discussion of these sampling techniques and details of techniques not discussed in this lecture such as cluster sampling and multi-stage sampling.
ENDSECTION STARTSECTION=content_13.htm= SECTION~
The Size of a Sample
As well as deciding on an appropriate sampling method for a given situation, the size of the sample selected also needs some consideration. There is no rule for determining a sample size but various points need to be considered when deciding on the size of the sample.
• The larger the size of the sample the more accurate will be the results. However, there reaches a point when there is little to be gained by increasing the sample size.
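One way to see this levelling off, offered here only as an illustration that goes slightly beyond this unit, is the standard error of a sample mean, which for a simple random sample is the population standard deviation divided by the square root of the sample size. The standard deviation of 10 and the sample sizes below are hypothetical.

```python
import math

def standard_error(sigma, n):
    """Standard error of a sample mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

sigma = 10.0                      # hypothetical population standard deviation
for n in (25, 100, 400, 1600):
    # Each fourfold increase in sample size only halves the standard error.
    print(n, standard_error(sigma, n))
```

Going from 25 to 100 respondents cuts the standard error from 2.0 to 1.0, but going from 400 to 1600 only cuts it from 0.5 to 0.25, at a much greater cost.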
Administrative considerations usually play a greater role in determining the sample size. These include:
• Money and time available;
• Aims of the study and the precision required;
• The number of subgroups or strata required.
Module requirements
Every week you will need to complete the following:
• Read through the notes covered in the lectures and complete the seminar questions at the end of the units covered in lectures before the following week’s seminar. You will need to make a note of any points you need to discuss further with your seminar tutor or an advice centre tutor.
• Go to the STX1110 Oasis page, attempt the quizzes and access the seminar solutions, which are released at regular intervals. Make a note of anything you wish to discuss with your STX1110 seminar tutor or an advice centre tutor.
• Check your e-mail and the STX1110 Oasis page for any announcements and information sent by the STX1110 team.
• Complete further reading from relevant chapters of Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshott.
Seminar Questions
For each unit you are expected to complete the seminar questions that accompany each lecture and bring your answers to the seminars. Tutors will check your work has been completed.
Therefore, you must complete the following seminar questions before the next seminar.
You are advised that if you do not participate in this module by attempting questions and arriving prepared for seminars you will NOT receive a participation (attendance) mark for the seminar.
ENDSECTION STARTSECTION=activity_1.htm= SECTION~
Seminar Question S1.1
a. A workforce consists of 500 workers, 350 of whom are male and 150 female. How many males and females would be included in a sample of size 100 if the proportions of males and females in the sample are to be the same as those in the population?
b. If males and females are further classified as being full time or part time explain the four subgroups that the population can be split up into and indicate how many workers in the population are in each subgroup if 100 men and 50 women are part time. How many employees would be selected from each of these groups in a sample of size 100?
ENDSECTION STARTSECTION=activity_2.htm= SECTION~
Seminar Question S1.2
Identify the major sources of bias in each of these situations.
a. A survey is conducted to study the extent of use of convenience foods (such as frozen foods) by households in a community. A random sample of households is selected and the data collected by telephone interviews made during the hours of 8am to 5pm. Non respondents are ignored.
b. A radio station conducts a poll to identify what are the best restaurants in a community by asking its listeners to call the station and state their opinions.
c. An organisation is interested in monthly household expenditures for groceries. Representatives of the organisation conduct exit surveys of every 3rd shopper at several major supermarkets on weekday afternoons.
ENDSECTION STARTSECTION=activity_3.htm= SECTION~
Seminar Question S1.3
What biases might the following sampling methods pick up?
a. Estimating yearly sales figures of a shop over a ten year period by sampling sales figures systematically every 6 months.
b. Estimating a household’s monthly expenditure by sampling their bills and payments on the first week of every month.
ENDSECTION STARTSECTION=activity_4.htm= SECTION~
Seminar Question S1.4
A proposal was received by the Local Authority Planning Officer for a motel, public house and restaurant to be built on some private land in the city suburbs. Following an article by the builder in the local paper, the office received 300 letters of which only 28 supported the proposal. What conclusions can the planning officer draw from these statistics? Describe what action could be taken to gauge people’s views further.
ENDSECTION STARTSECTION=activity_5.htm= SECTION~
Seminar Question S1.5
The following is a list of general practice doctors. Also recorded is whether the doctors are in practice by themselves (S), have a partner (P), or are in a group practice (G).
a. Using the following list of 10 random numbers indicate which doctors will be included in a simple random sample of size 10. 0.9120 0.0124 0.5246 0.3287 0.7895 0.3366 0.0003 0.1025 0.1157 0.8425
b. Indicate which doctors would be included in a systematic sample if the sampling interval is 5 and you start with Mark Hillard.
c. The survey will consider issues of how doctors respond to out of hours calls from their patients and consequently the type of practice may be an important factor in the response. Indicate how many doctors from each type of practice would be selected if a stratified sample of size 15 were to be used. How would you then select the relevant numbers of doctors who are in partnership?
Physician Type of practice Physician Type of practice
R. E. Scherbarth, M.D. S Gregory Yost, M.D. P
Crystal R. Goveia, M.D. P J. Christian Zona, M.D. P
Mark D. Hillard, M.D. P Larry Johnson, M.D. P
Jeanine S. Huttner, M.D. P Sanford Kimmel, M.D. P
Francis Aona, M.D. P Harry Mayhew, M.D. S
Janet Arrowsmith, M.D. P Leroy Rogers, M.D. S
David DeFrance, M.D. S Thomas Tafelski, M.D. S
Judith Furlong, M.D. S Mark Zilkoski, M.D. G
Leslie Jackson, M.D. G Ken Bertka, M.D. G
Paul Langenkamp, M.D. S Mark DeMichie, M.D. G
Philip Lepkowski, M.D. S John Eggert, M.D. P
Wendy Martin, M.D. S Jeanne Fiorito, M.D. P
Denny Mauricio, M.D. P Michael Fitzpatrick, M.D. P
Hasmukh Parmar, M.D. P Charles Holt, M.D. P
Ricardo Pena, M.D. P Richard Koby, M.D. P
David Reames, M.D. P John Meier, M.D. P
Ronald Reynolds, M.D. G Douglas Smucker, M.D. S
Mark Steinmetz, M.D. G David Weldy, M.D. P
Geza Torok, M.D. S Cheryl Zabarowski, M.D. P
Mark Young, M.D. P
ENDSECTION STARTSECTION=think_1.htm= SECTION~
How much do you know?
If you have understood the content of this unit you should have knowledge of, and be able to answer any questions relating to, all of the following points:
• explain what is meant by the terms statistical population and sample;
• explain why sampling is used to collect data;
• explain the steps involved in taking a simple random sample;
• explain what a stratified sample is and why one might be used;
• calculate how many respondents should be surveyed in each group of a stratified sample;
• explain what systematic and quota samples are, give examples of where they are appropriate and list their weaknesses and strengths.
Extra Activities
Every week you should log on to the STX1110 Oasis page and attempt the quizzes in the extra section.
A guide on how to find the STX1110 Oasis quizzes is given below.
Step 1. Log on to Oasis and open the STX1110 Oasis page.
Step 2. Click the London Module Content icon.
Step 3. Choose whether you would like to try a mathematics or a statistics topic from STX1110.
Step 4. You will now see a page containing links for each topic in that section, for example unit S1: Collecting Data 1. Click on the topic that you would like to try.
Step 5. You will now see a table of contents for this topic. To get to the quizzes go to the bottom of the list and click on EXTRA.
Table of Contents
1. UnitS1 - Data Collection 1
2. UnitS1 - Content
3. UnitS1 - Activity
4. UnitS1 - Think
5. UnitS1 - Extra
Step 6. You should now see the following. Just click on begin quiz and have a go at the questions. Keep a record of your working and answer so that you can compare it to the solution and feedback provided. If you need help drop in to an advice centre session in S206 or see your seminar tutor.
Additional Content and Activities
Complete the following Quiz and Questions. There are 7 questions in this formative (non-assessed) assessment with two optional questions.
*Click here to begin quiz*
Remember to click on the button at the top of the screen to move onto the next question
Do you want to know more?
For further examples and exercises refer to Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshott, or any of the books on the STX1110 reading list.
ENDSECTION ENDCHAPTER
CHAPTER=DC2
STARTSECTION=scope_1.htm= SECTION~
Collecting Data 2 Context
In the previous unit we said that a statistical investigation has four stages:
• pose a question;
• collect relevant data;
• analyse the data;
• interpret the results.
We began looking at the collection of data in the previous unit by introducing the ideas of sampling as a way of selecting items or individuals from whom to collect relevant data. We also said that the data which is collected will depend on the question or purpose of the investigation. In this unit we will move onto methods of collecting data or information from identified individuals.
ENDSECTION STARTSECTION=scope_2.htm= SECTION~
Objectives
Having worked through this unit, you should be able to:
• appreciate the basic different types of data;
• explain the difference between primary and secondary data;
• design questionnaires;
• appreciate the advantages and disadvantages of some methods of questionnaire distribution.
ENDSECTION STARTSECTION=content_1.htm= SECTION~
Secondary and Primary Data Sources
Data generation costs time and money so before you rush into trying to generate your own data take time to check that someone else hasn’t already done it for you. Data which have already been collected by someone else are called Secondary Data.
There are various sources of secondary data, such as statistics published by the Office for National Statistics. The European Union and United Nations also publish statistics, as do many newspapers. Libraries are full of valuable sources of secondary data.
For a more detailed discussion of sources of secondary data refer to Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshott, chapter 4, page 76.
In the absence of suitable secondary data you may have to carry out an experiment or survey yourself. In doing so you would generate primary source data which would be specific to your particular investigation.
In this unit we will concentrate on survey methods of collecting primary data from people. There are two basic methods of conducting surveys. You can either observe people’s behaviours or ask people questions. We will be concentrating on the latter of these two options.
There are basically two ways of surveying the opinions of people:
• Interview them (which could involve the use of a questionnaire)
• Conduct a postal questionnaire
ENDSECTION STARTSECTION=content_2.htm= SECTION~
Interviews
Interviews typically take one of two forms: face-to-face interviews or telephone interviews. In principle collecting information by conducting interviews is easy. It needs someone to pose questions, which may be provided in the form of a questionnaire, and then listen to and record the answers. In practice it is more complicated and interviewers need training to make sure they get reliable answers and that they record those answers accurately. For example, without training an interviewer may explain questions to people in such a way as to provoke a particular sort of response, introducing a bias known as interviewer bias into the results. Similarly interviewers should not direct respondents to a particular answer or response by their expression or tone of voice. One of the main drawbacks of personal interviews is the cost.
ENDSECTION STARTSECTION=content_3.htm= SECTION~
Postal Surveys
Sending a printed questionnaire through the post has the advantage of being cheap and easy to organise so that very large samples can be used. Obviously one of the drawbacks of a postal survey is that the questions can’t be explained to respondents and there is no opportunity to clarify points that they don’t understand. So it is important that the questionnaire is relatively straightforward. Perhaps the main disadvantage of postal surveys is the low response rate. Generally a survey can expect to get replies from about 20% of the questionnaires. Response rates can be improved by making the questionnaires short or making follow-up telephone calls. Alternatively a reward for completing the questionnaire, known as an incentive, can be offered. However, the use of incentives often introduces a bias to the survey.
For a fuller discussion of personal interviews and postal surveys refer to Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshott, chapter 4.
ENDSECTION STARTSECTION=content_4.htm= SECTION~
Questionnaire Design
Usually each of the survey methods mentioned relies on the use of a questionnaire to collect the information. You have no way of checking that a questionnaire has been answered truthfully or that the respondent has understood it properly. It is therefore crucial that a questionnaire is well designed to reduce inaccuracies in the answers; such inaccuracies are termed response bias.
ENDSECTION STARTSECTION=content_5.htm= SECTION~
Questionnaire Design Guidelines
Although it may seem easy, designing a good questionnaire is difficult. An enormous amount of work has been done on the design of questionnaires which has led to some guidelines for good practice.
• Questionnaires should ask a series of related questions and should be as short as possible.
• Questions should follow a logical sequence.
• The questions should be simple, unambiguous and easy to understand. If people don’t understand the question they will give a convenient answer or no answer rather than a true one.
• Questions should use everyday language and not involve technical jargon.
• Questions shouldn’t involve calculations or tests of memory.
• Be careful with the wording of questions so that they are not offensive or leading. Even simple changes in phrasing can give quite different results. Use neutral phrases where possible; for example, instead of saying ‘do you like this cake?’ the question could be rephrased as ‘rate the taste of this cake on a scale of 1 to 5’.
• Do not ask irrelevant questions.
• Avoid vague questions such as ‘do you usually buy more meat than vegetables?’ This raises questions such as: what is usual, and what is more?
• Avoid open questions where possible. They are difficult to answer. Instead ask questions which allow precoded answers so that respondents are offered a series of choices and can select the most appropriate. They are easier to answer and analyse.
• Phrase all personal questions carefully. ‘Have you retired from paid work?’ might receive better responses and be just as useful as the more sensitive ‘How old are you?’
These are just a few pointers in the design of a good questionnaire. If you read around the subject you’ll discover many more.
ENDSECTION STARTSECTION=content_6.htm= SECTION~
Pilot Surveys
Having developed a questionnaire it is a good idea to trial it on a few respondents before using it to collect your data. This is called a pilot survey and can help sort out any problems in your questionnaire, which may save lots of time and money later.
ENDSECTION STARTSECTION=content_7.htm= SECTION~
Errors in survey methods for collecting data
There are three main types of error that can appear in survey methods.
• Sampling error, which arises when the sample selected is not representative of the population. A great deal of consideration needs to be given at the sampling stage to decide what your target population is and ensure that the sample represents this target population.
• Response error, which can occur when respondents are unable to respond accurately, either because they didn’t understand the question or were guessing etc. It can also occur if an answer is incorrectly or inaccurately recorded.
• Non response error occurs when respondents refuse to take part in the survey and is a particular problem for postal questionnaires.
ENDSECTION STARTSECTION=content_8.htm= SECTION~
Validity and Precision
In addition to the correct recording of data there are two important ideas about data quality which relate to the way in which the data were obtained.
Validity: A valid method of measurement or observation is one which actually measures the concept you intended it to. A related concept is the absence of bias; a biased method systematically produces results that differ from (are above or below) the true value.
For example: “Children like exams. Attendance is highest at exam time.” This statement was reportedly made by Sir Rhodes Boyson MP, former headmaster and education minister, on ITN News at One on 17th August 1988. What variable was Boyson trying to measure and what can you say about the validity of his measurement?
Precision: How precisely has a variable been measured? For example, is age measured to the nearest year, month or day? A related concept is reliability. A reliable method of measurement is one that will produce a similar answer if repeated.
For example: Suppose you have the annual salary of 5 employees
£13,458 £19,496 £12,752 £26,785 £16,220
How would they appear if they had been measured to the nearest £1000? What is the total salary of these employees to the nearest £1000?
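The effect of precision on the five salaries above can be checked with a short calculation. The following is a minimal sketch in Python; the language and the helper name round_to_nearest are assumptions made for illustration and are not part of the module. It rounds each salary to the nearest £1000 and compares the total of the rounded values with the exact total rounded to the nearest £1000.

```python
# Annual salaries (in pounds) of the 5 employees in the example above.
salaries = [13458, 19496, 12752, 26785, 16220]


def round_to_nearest(value, unit=1000):
    """Round a non-negative whole number to the nearest multiple of unit (halves round up).

    Hypothetical helper written for this sketch.
    """
    return ((value + unit // 2) // unit) * unit


rounded_salaries = [round_to_nearest(s) for s in salaries]
exact_total = sum(salaries)
total_of_rounded = sum(rounded_salaries)
rounded_exact_total = round_to_nearest(exact_total)

print("Rounded salaries:", rounded_salaries)
print("Exact total:", exact_total)
print("Total of the rounded salaries:", total_of_rounded)
print("Exact total to the nearest 1000:", rounded_exact_total)
```

Comparing the last two printed values shows whether adding figures that have already been rounded gives the same answer as rounding the exact total.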
ENDSECTION STARTSECTION=activity_1.htm= SECTION~
Seminar Questions
Seminar Question S2.1
a) Calculate the total for the following set of data.
41 62 87 96 32 39
b) Calculate the total again after first fairly rounding the data to the nearest 10.
c) Calculate the total again using biased rounding by rounding each data item down to the lowest 10.
d) Calculate the total again using biased rounding by rounding each data item up to the highest 10.
e) What do your answers suggest about the effect of rounding on subsequent calculations?
f) Can you think of an example of data that is typically rounded down when it is recorded? Can you also think of an example of data that is typically rounded up when it is recorded?
ENDSECTION STARTSECTION=activity_2.htm= SECTION~
Seminar Question S2.2
The questionnaire below, surveying people’s health practices and attitudes, is based on examples given in the book Surveys in Social Research by D.A. de Vaus. Have a go at answering the questionnaire and then give some thought to the sorts of biases and inaccuracies it might produce.
A Survey on Health
How healthy are you?
Are the health practices in your household run on matriarchal or patriarchal lines?
How often do your parents visit the doctor?
Do you oppose or favour cutting health spending, even if cuts threaten the health of children and pensioners?
Do you agree or disagree with the government's policy on the funding of medical training?
ENDSECTION STARTSECTION=activity_3.htm= SECTION~
Seminar Question S2.3
List some possible sources of secondary data. You will need to read a text such as Essential Quantitative Methods for Business, Management and Finance, or Business Basics to answer this question.
ENDSECTION STARTSECTION=activity_4.htm= SECTION~
Seminar Question S2.4
On 11th February 1985, under the banner headline ‘We’ve had ENOUGH!’, a tabloid newspaper produced the following questionnaire. Make a note of any criticisms of the way it was designed.
ENDSECTION STARTSECTION=think_1.htm= SECTION~
How much do you know?
If you have understood the content of this unit you should have knowledge of, and be able to answer any questions relating to, all of the following points:
• explain the difference between primary and secondary sources of data;
• list some sources of secondary data;
• design/criticise questionnaires;
• explain sources of error in conducting a survey and suggest ways of minimising these errors;
• explain how interviews and postal surveys are conducted and discuss their strengths and weaknesses.
Extra Activities
Every week you should log on to the STX1110 Oasis page and attempt the quizzes in the extra section.
A guide on how to find the STX1110 Oasis quizzes is on page 16 of this book.
Do you want to know more?
For further examples and exercises, refer to Essential Quantitative Methods for Business, Management and Finance, third edition, by Les Oakshott, or any of the books on the STX1110 reading list.
ENDSECTION ENDCHAPTER
CHAPTER=SPD1
STARTSECTION=scope_1.htm= SECTION~
Summarising and Presenting Data 1 Context
Data is a term you will come across repeatedly during your study of quantitative methods but what does it mean? People tend to think of data as collections of numbers. Yet data may be non numerical and even numeric data can belong to different categories. Data is simply a scientific term for facts, figures, information and measurements. The nature of the data that you have will determine what form of statistical analysis is appropriate. From the outset it is important to determine what sort of data you have.
ENDSECTION STARTSECTION=scope_2.htm= SECTION~
Objectives
Having worked through this unit, you should be able to:
• identify different types of data;
• compile a frequency table of a discrete variable;
• compile and discuss cross tabulations;
• appreciate the use of percentages for making comparisons;
• construct and interpret graphical methods useful in summarising discrete data.
ENDSECTION STARTSECTION=content_1.htm= SECTION~
Types of Data
Text books will describe a number of data classifications. However, all data falls into one of two basic types: attributes and variables.
Attributes
An attribute is something an object has either got or not got. For example, an individual will be classified as either male or female. Another word for attribute data is categorical or qualitative data. The measurements of a categorical variable fall into one and only one of a set of categories. For example, an employee may be considered to be full time, part time or contractual. The specific names or qualities do not contain any implicit ordering, i.e. a full time employee is not considered to be better than a part time employee.
Variables
A variable is something which can be measured. For example, a person’s weight can be measured on a scale such as kilograms (kg), and the number of rooms in a house can be counted. Variables can be further classified as discrete or continuous.
Discrete variables
Discrete data is characterised by the fact that it can be measured precisely. For example, the number of defective items coming off a production line, shoe size and the number of children in a family are all discrete variables. It should be fairly obvious to you that the number of defective products and the number of children are discrete, as these variables can only take integer values, i.e. 0, 1, 2, 3 etc. Shoe size may appear a little different since we have half sizes in the U.K., i.e. someone could have a shoe size of 5½, which is not an integer. However, a British shoe size could not be 5.236, so the variable is discrete: it has a limited range of values and is measured precisely. Typically a discrete variable is recorded as a count, but not always.
Continuous Variables
Continuous data may take on any value and is typically measured rather than counted. Continuous variables are not measured precisely but are approximated. For example we typically measure someone’s height to the nearest centimetre but there is no reason why the measurements could not be made to the nearest one hundredth of a centimetre. Two people who have the same height to the nearest centimetre could almost certainly be distinguished if more precise measurements were taken.
If you read through some text books data will be further classified as being measured on the nominal, ordinal, interval or ratio scale. This level of detail is not required for this module but you will find an explanation of these scales in a text book such as Quantitative Methods for Business by Donald Waters (Addison-Wesley).
ENDSECTION STARTSECTION=content_2.htm= SECTION~
Summarising and presenting data
Having collected your data it needs to be presented in a concise and meaningful manner so that patterns and characteristics in the data are immediately apparent. Tables and graphs are a very effective way of displaying any patterns in the data or relationships that may exist between variables. During the next three weeks we will look at some of the tabular and graphical methods popular in presenting the data in a concise and easy to understand way. They are purely descriptive techniques and provide little opportunity for further detailed numerical analysis of the data.
In this unit we will concentrate on attribute and simple discrete variables, as the presentation techniques used for each of these types of data are essentially the same.
Tables
One of the easiest and most effective ways of presenting data is in a table. This is perhaps the most widely used method of data presentation. Whenever you pick up a newspaper, magazine or report you are likely to see a table. Spreadsheets make the design and manipulation of tables very easy.
ENDSECTION STARTSECTION=content_3.htm= SECTION~
Frequency table or frequency distribution
A frequency table (or distribution) is a tabular summary of a set of data showing the frequency (or number) of data items in each of several categories or non-overlapping classes.
Example
A frequency table of the categorical data gender for a workforce could be
Table 1
Gender Frequency (count)
Male 12
Female 3
Total 15
The table shows the number of data items in each category. Sometimes a table of counts is presented so that the percentage or proportion of data items in each category can be seen.
The relative frequency of a class or category is the proportion of the total number of data items belonging to that class. For a data set with n observations, the relative frequency of each class or category is
relative frequency = frequency of the class ÷ n
This is easily converted to a percentage by multiplying by 100. The relative frequencies and percentages for the above example are given in the following table.
Gender   Frequency   Relative Frequency   %
Male     12          12/15 = 0.8          0.8 × 100 = 80%
Female   3           3/15 = 0.2           0.2 × 100 = 20%
Total    15          1                    100
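The relative frequency calculation for Table 1 can also be written as a short program. This is only an illustrative sketch in Python (the language and the variable names are assumptions, not part of the module); it applies the formula relative frequency = frequency of the class ÷ n to each category and converts the result to a percentage.

```python
# Frequencies from Table 1 (gender of a workforce of 15).
frequencies = {"Male": 12, "Female": 3}

n = sum(frequencies.values())  # total number of observations

# relative frequency = frequency of the class ÷ n
relative = {category: count / n for category, count in frequencies.items()}

# percentage = relative frequency × 100 (written as 100 × count ÷ n)
percentages = {category: 100 * count / n for category, count in frequencies.items()}

for category, count in frequencies.items():
    print(f"{category:<8}{count:>4}{relative[category]:>8.2f}{percentages[category]:>7.0f}%")
print(f"{'Total':<8}{n:>4}")
```

The same few lines work for any table of counts; only the dictionary of frequencies needs to change.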
Percentages are particularly useful for making comparisons. Now go and do Exercise S3.1
ENDSECTION STARTSECTION=activity_1.htm= SECTION~
Exercise S3.1
Would it be true to say that the company detailed in table 2 has the same number of males in its workforce as the company whose workforce is summarised in table 1?
Table 2
Gender Frequency
Male 12
Female 10
Total 22
Example
The following data set refers to the number of children in each of 24 surveyed families.
0 1 2 0 3 0 1 1 0
2 3 2 1 1 2 4 3 4
2 2 2 1 0 3
The corresponding frequency distribution is
Number of children   Frequency   Relative frequency   %
0                    5           0.21                 21
1                    6           0.25                 25
2                    7           0.29                 29
3                    4           0.17                 17
4                    2           0.08                 8
Total                24          1                    100
This form of frequency distribution can be used for discrete numeric data provided there are not too many distinct numeric outcomes.
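Tallying a discrete data set by hand is error-prone, so it can be worth checking a frequency distribution by computer. The following is a minimal sketch in Python, assuming the values are typed in exactly as listed in the example above; the use of collections.Counter is a choice made for this illustration, not a method prescribed by the module.

```python
from collections import Counter

# Number of children in each surveyed family, as listed in the example above.
children = [
    0, 1, 2, 0, 3, 0, 1, 1, 0,
    2, 3, 2, 1, 1, 2, 4, 3, 4,
    2, 2, 2, 1, 0, 3,
]

counts = Counter(children)  # maps each distinct value to its frequency
n = len(children)           # number of families in the list

print(f"{'Children':<10}{'Frequency':>10}{'Rel. freq.':>12}{'%':>6}")
for value in sorted(counts):
    frequency = counts[value]
    print(f"{value:<10}{frequency:>10}{frequency / n:>12.2f}{100 * frequency / n:>6.0f}")
print(f"{'Total':<10}{n:>10}")
```

A useful check is that the frequencies add up to the number of values in the list.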
ENDSECTION STARTSECTION=content_4.htm= SECTION~
Cross Tabulations
More complex tables of counts can summarise data on two variables simultaneously. Such cross tabulations (or contingency tables) allow investigation of the relationship between the tabulated variables. The values of one variable define the rows of the table and the values of the other variable define the columns. The number in each cell of the table (the intersection of a row and a column) represents the count of the corresponding combination of values. Often row and column totals are included; these give the ordinary frequency distributions of the row and column respectively and are referred to as the marginal distributions of the table. Cross tabulations provide a standard method of summarising the data from a survey and of presenting data in reports and publications.
Example
A personnel department produces a summary of its workforce by gender and marital status in the following cross tabulation.
Table 3: Tabulation of gender against marital status for a company.
                  Gender
Marital Status    Male   Female   Total
Single            1      1        2
Married           10     2        12
Widowed           1      0        1
Total             12     3        15
Now go and do Exercise S3.2
ENDSECTION STARTSECTION=activity_2.htm= SECTION~
Exercise S3.2
• What does the value 10 in this cross tabulation represent?
• What is the total number of employees in this company?
• What would be the marginal, or ordinary, frequency distribution for the marital status of employees in this company?
It is possible to extend cross tabulations to include discrete data as well as category data. It would be a relatively easy exercise to produce a cross tabulation of gender against number of children for the employees in a company.
A cross tabulation can be summarised by calculating percentages of the row or column totals. If one variable (the explanatory variable) is believed to influence the other (the response variable), then one normally takes percentages of the totals for the explanatory variable.
Example
A random sample of 309 furniture defects was recorded and classified according to the type of defect (A, B, C, D) and the production shift (1, 2, 3) in which the item of furniture was manufactured.
Table 4: Production shift against type of defect for a furniture manufacturing process.
         Type of defect
Shift    A     B     C     D     Total
1        15    21    45    13    94
2        26    31    34    5     96
3        33    17    49    20    119
Total    74    69    128   38    309
We are interested in knowing if the different shifts can be used to explain the occurrence of the different types of defect. So we will be regarding the type of defect as the response variable and the shift as the explanatory variable. As the shift forms the different rows of the table we will use the row totals to calculate the percentages which can then be used to make comparisons.
Table 5: Comparison of type of defect by shift
                 Type of defect
Shift            A                   B                   C                   D                   Total
1 (%)            15/94 × 100 = 16%   21/94 × 100 = 22%   45/94 × 100 = 48%   13/94 × 100 = 14%   100%
2 (%)            27%                 32%                 35%                 5%                  100%
3 (%)            28%                 14%                 41%                 17%                 100%
All Shifts (%)   24%                 22%                 41%                 12%                 100%
The percentages calculated from the row totals give the proportions of the various types of defect for each shift. The percentages make it easier to describe the relationship between the two variables and attention should be paid to the way in which percentages differ from row to row. For instance, shift 1 produces a lower percentage of defect A and a higher percentage of defect C than the other 2 shifts.
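The row percentages for the furniture defects example can be reproduced with a few lines of code. This is an illustrative sketch in Python rather than a required method; the function name row_percentages and the layout of the data are assumptions made for the example. Each count from Table 4 is divided by its row (shift) total and multiplied by 100, and the column totals are treated in the same way to give the ‘All Shifts’ row.

```python
# Counts from Table 4: rows are shifts, columns are defect types A to D.
defect_types = ["A", "B", "C", "D"]
counts_by_shift = {
    1: [15, 21, 45, 13],
    2: [26, 31, 34, 5],
    3: [33, 17, 49, 20],
}


def row_percentages(row):
    """Express each count in a row as a whole-number percentage of the row total."""
    total = sum(row)
    return [round(100 * count / total) for count in row]


percent_by_shift = {shift: row_percentages(row) for shift, row in counts_by_shift.items()}

# The column totals give the 'All Shifts' row of the percentage table.
column_totals = [sum(column) for column in zip(*counts_by_shift.values())]
percent_all_shifts = row_percentages(column_totals)

print("Shift   " + "".join(f"{d:>6}" for d in defect_types))
for shift, percents in percent_by_shift.items():
    print(f"{shift:<8}" + "".join(f"{p:>5}%" for p in percents))
print(f"{'All':<8}" + "".join(f"{p:>5}%" for p in percent_all_shifts))
```

Because each percentage is rounded separately, a row of rounded percentages will not always add up to exactly 100.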
ENDSECTION STARTSECTION=content_5.htm= SECTION~
Relationship between a category (or discrete) variable and a continuous variable
Cross tabulations can be used to summarise the relationship between a continuous variable and a category variable if the continuous numerical variable is first categorised (or grouped). In this way, the relationship between a category variable and a continuous variable can be explored.
Example
The following cross tabulation shows the price of a meal and whether the meal was rated as good, very good or excellent. Notice that the price of a meal is a continuous variable and is presented as the price of a meal being within a certain range: the price variable has been grouped!
Table 6: Cross tabulation of the quality of a meal by price.
                  Price of meal
Quality Rating    £5 – £9   £10 – £14   £15 – £19   £20 – £24   Total
Good              42        40          2           0           84
Very Good         34        64          46          6           150
Excellent         2         14          28          22          66
Total             78        118         76          28          300
To explore if the quality of a meal affects its price we will treat the quality as the explanatory variable and take row percentages.
Comparison of the quality of a meal by price
                    Price of meal
Quality Rating      £5 – £9   £10 – £14   £15 – £19   £20 – £24   Total
Good (%)            50%       48%         2%          0%          100%
Very Good (%)       23%       43%         31%         3%          100%
Excellent (%)       3%        21%         42%         34%         100%
All Qualities (%)   26%       39%         25%         10%         100%
Refer to Essential Quantitative Methods for Business, Management and Finance, third edition, by Les Oakshott, chapter 5, for a discussion of good guidelines for tabulation.
Three (or more) variables can be summarised in more complicated tables. However, some care must be taken in interpreting such tables.
ENDSECTION STARTSECTION=content_6.htm= SECTION~
Graphical descriptions of discrete data
Instead of presenting data in a table it might be better to give a visual display in the form of a graph or chart. Visual displays are good for summarising the data and drawing attention to a particular point. They can also be useful for comparing data sets. Tables, on the other hand, usually give more detailed information about the data set.
The success of any presentation can be judged by how easy it is to understand. A good presentation should make information clear and allow us to see the overall picture. But good presentations do not happen by chance and need careful planning. If you look at a diagram or table and cannot understand it, it is most probable that the presentation is poor and the fault is with the presenter rather than the viewer.
Sometimes, even when a presentation seems clear, you can look closer and see that it does not give a true picture of the data. This may be a result of poor presentation but sometimes comes from a deliberate decision to present data in a form that is misleading. The problem is that diagrams are a powerful means of presenting data, first impressions last! But they only give a summary and this summary can be misleading, either intentionally or by mistake.
ENDSECTION STARTSECTION=content_7.htm= SECTION~
Pie charts
A pie chart is used to show pictorially the relative sizes of component elements of a total. The circle (or pie) represents the total of the data. The circle is then split into sectors (pieces of pie), the size of each one being drawn in proportion to the frequency of each data item.
Example
The costs of production at Factory A and Factory B during March of one year were as follows.
Table 7: Production costs at two factories
                  Factory A          Factory B
                  £000’s    %        £000’s    %
Materials         70        35       50        20
Labour            30        15       125       50
Overheads         90        45       50        20
Administration    10        5        25        10
Total             200       100      250       100
A pie chart presentation for the figures of each of these factories is presented below:
Factory A
[Pie chart: Materials 35%, Labour 15%, Overheads 45%, Admin 5%]
Factory B
[Pie chart: Materials 20%, Labour 50%, Overheads 20%, Admin 10%]
For a detailed discussion of how to draw these pie charts refer to Essential Quantitative Methods for Business, Management and Finance, third edition, by Les Oakshott, chapter 5.
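Since a full circle contains 360 degrees, drawing each sector in proportion to its share of the total amounts to giving it an angle of 360 × component ÷ total. The sketch below, written in Python purely for illustration (the function name sector_angles is hypothetical, and the textbook may set the calculation out differently), works out the sector angles for the two factories in Table 7.

```python
# Production costs (in £000's) from Table 7.
costs = {
    "Factory A": {"Materials": 70, "Labour": 30, "Overheads": 90, "Administration": 10},
    "Factory B": {"Materials": 50, "Labour": 125, "Overheads": 50, "Administration": 25},
}


def sector_angles(components):
    """Return the angle in degrees of each pie sector: 360 × component ÷ total."""
    total = sum(components.values())
    return {name: 360 * value / total for name, value in components.items()}


angles = {factory: sector_angles(components) for factory, components in costs.items()}

for factory, sectors in angles.items():
    print(factory)
    for name, angle in sectors.items():
        print(f"  {name:<15}{angle:>6.0f} degrees")
```

The angles for each factory should add up to 360 degrees, which is a quick check on the arithmetic.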
Pie charts are very good for comparing the relative sizes of elements of a total. They show very clearly when one element is a bigger or smaller proportion of the total. In this example they are very good for comparing the costs of production at the two factories. Factory A’s costs consist largely of overheads whereas at factory B labour is the largest cost. However, pie charts do have a number of disadvantages.
• Actual numbers or percentages associated with each category need to be presented on the diagram, as they are virtually impossible to calculate from the pie chart.
• They are not a very good presentation method if there are too many different categories.
• The impression they can give is easily distorted, by presenting a 3 dimensional pie chart for example.
ENDSECTION STARTSECTION=content_8.htm= SECTION~
Bar Charts
The bar chart is one of the most common methods of presenting data in a visual form. There are 3 main types of bar chart:
• Simple bar charts
• Component bar charts, including percentage component bar charts
• Multiple (or compound) bar charts
A simple bar chart is a chart consisting of a set of non-joining bars. A separate bar for each data item is drawn to a height which is proportional to the frequency of the data item. The width of each bar is always the same. The bars are usually drawn vertically but they can be drawn horizontally.
Example
The following frequency distribution shows the number of computers sold by each of 5 computer companies in a sample of 50 sales.
Table 8: Number of computers sold by company
Company        Frequency
Apple          13
Compaq         12
Gateway        5
IBM            9
Packard Bell   11
Total          50
The corresponding bar chart for this data is
[Bar chart: frequency (vertical axis, 0 to 14) by company (Apple, Compaq, Gateway, IBM, Packard Bell)]
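The idea that each bar is drawn in proportion to its frequency can be illustrated without any charting software. The following is a rough sketch in Python (the language and the function name text_bar_chart are assumptions made for this illustration); it prints a horizontal bar chart of the Table 8 data, using one symbol per sale so that the length of each bar equals its frequency.

```python
# Frequencies from Table 8 (computers sold by company in a sample of 50 sales).
sales = {"Apple": 13, "Compaq": 12, "Gateway": 5, "IBM": 9, "Packard Bell": 11}


def text_bar_chart(frequencies, symbol="#"):
    """Build one line per category; the bar length equals the category's frequency."""
    width = max(len(name) for name in frequencies)  # pad names so the bars line up
    return [
        f"{name:<{width}} | {symbol * count} {count}"
        for name, count in frequencies.items()
    ]


chart_lines = text_bar_chart(sales)
print("\n".join(chart_lines))
```

In practice a spreadsheet or charting package would be used, but the principle is the same: equal widths, with lengths proportional to the frequencies.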
ENDSECTION STARTSECTION=content_9.htm= SECTION~
Pareto Charts
A Pareto chart is essentially a bar chart in which the categories are arranged in descending order of frequency, so that the tallest bar is at the left. Pareto charts can be extremely useful tools in business applications as attention is focused on the more important categories. Presented below is a Pareto chart of the above frequency distribution.
[Pareto chart: frequency (vertical axis, 0 to 14) by company in descending order of frequency (Apple, Compaq, Packard Bell, IBM, Gateway)]
For a discussion of other forms of bar charts and their compilation refer to Essential Quantitative Methods for Business, Management and Finance, third edition, by Les Oakshott.
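The only extra step needed to turn a bar chart into a Pareto chart is to sort the categories by frequency before drawing them. A minimal sketch of that step, written in Python as an illustration only (the variable names are assumptions), is:

```python
# Frequencies from Table 8 (computers sold by company in a sample of 50 sales).
sales = {"Apple": 13, "Compaq": 12, "Gateway": 5, "IBM": 9, "Packard Bell": 11}

# Sort the categories from highest to lowest frequency, as in a Pareto chart.
pareto_order = sorted(sales.items(), key=lambda item: item[1], reverse=True)

for company, count in pareto_order:
    print(f"{company:<13}{count:>3}")
```

The sorted list gives the left-to-right order of the bars.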
ENDSECTION STARTSECTION=content_10.htm= SECTION~
Pictograms
Pictograms are becoming very popular as they are easy to generate with modern PCs. They are a very elementary form of visual representation but they can be informative and more effective than other methods of presenting data to the general public who, by and large, may lack the understanding and interest demanded by the less attractive forms of presentation. However, they are not accurate forms of presentation. Furthermore, they provide lots of scope for confusion or misleading interpretations of the data.
Type 1 Pictogram
A picture is selected which represents the data. The picture is then repeated as many times as is needed to show the size of each value.
Number of chairs sold by Fred's factory
[Pictogram: one row of repeated pictures for each year from 1997 to 2001; key: one picture = 5000 chairs]
Type 2 Pictogram
The representative picture is magnified instead of being repeated.
Detergent sales (kg)
[Pictogram: a WONDA WASH picture for Year 1 (26,500) and a magnified WONDA WASH picture for Year 2 (149,800)]
ENDSECTION STARTSECTION=activity_3.htm= SECTION~
Seminar Questions
Seminar Question S3.1
A large express delivery company is intending to invest money to upgrade the quality of service it offers its customers. The following frequency distributions show:
A: How a sample of customers responded when asked what was important to them in terms of the quality of the service they received.
B: Results of some further investigations by the company.
Frequency Distribution A
Most important service requested by customers   Frequency
Lower cost                                      19
Less damage to goods                            11
Correct billing                                 60
Faster delivery                                 9
On time delivery                                65
Frequency Distribution B
Reasons for late delivery                       Frequency
Driver unavailable                              10
Van unavailable                                 8
Waiting for supplies from another van           30
Van damage                                      5
a) Draw a Pareto chart for each of the above frequency distributions.
b) What action should the company take to improve quality of service?
ENDSECTION STARTSECTION=activity_4.htm= SECTION~
Seminar Question S3.2
A random sample of 500 households was surveyed and data on three variables, household size, household income and number of cars owned were collected. The data are summarised in the following table.
                                       Number of cars
Income              Household size     2 or fewer   More than 2
Less than £20,000   4 or fewer         125          100
                    More than 4        15           60
£20,000 or more     4 or fewer         100          50
                    More than 4        10           40
a) Make a two way table of income by number of cars owned by summing entries in the table so that household size is not considered. Include the marginal totals.
b) Discuss the nature of the apparent association between household income and number of cars owned.
c) Make a two way table of household size by number of cars owned by summing entries in the table so that income is not considered. Include the marginal totals.
d) Discuss the nature of the apparent association between household size and number of cars owned.
ENDSECTION STARTSECTION=activity_5.htm= SECTION~
Seminar Question S3.3
A children’s charity is concerned about the number of children living in poverty. It decided to undertake an analysis to draw conclusions about what sort of families were most likely to have children affected by poverty. The following table was compiled using government statistics. The table includes all families whose income was below 50% of the national average. Households have been split into four categories and the number of children in each of these categories presented.
Status of household Number of children
Parent in full time work 1,026,000
Lone parent 938,000
Unemployed Parent 763,000
Pensioner 320,000
a) Convert the figures into a percentage of the total and represent these percentages on a bar chart. What other graphical method could you have used?
b) What does this graph tell you about the sort of household most likely to have children living in poverty?
The following table gives you additional information about the total number of children in each of these categories in the U.K.
Status of household        Number of children in poverty   Total number of children
Parent in full time work   1,026,000                       9,330,000
Lone parent                938,000                         1,250,000
Unemployed Parent          763,000                         930,000
Pensioner                  320,000                         470,000
Total                                                      11,980,000
c) Using this table calculate the percentage of children in each category that are in poverty. For example, the calculation for the lone parent household category would be
938,000 ÷ 1,250,000 × 100 = 75%
d) Draw these results as a bar chart. How do you interpret these figures?
e) Which of these two analyses do you think is more informative? Explain your answer.
ENDSECTION STARTSECTION=activity_6.htm= SECTION~
Seminar Question S3.4
The equal opportunities officer of a public sector organisation has been examining the results of applications for promotion over the last year.
‘In the last year, there were 450 female applicants for promotion, of whom 40 were successful and 410 were unsuccessful. There were 760 male applicants for promotion, of whom 124 were successful and 636 unsuccessful.’
a) Present this information in a suitable tabular form.
b) Summarise your table in such a way that comparisons between women and men can be made more easily.
c) Comment on the result obtained.
ENDSECTION STARTSECTION=activity_7.htm= SECTION~
Seminar Question S3.5
A health authority has 5 hospitals in its district. The number of beds in each hospital is classified as follows.
                  Hospitals
Category of bed   Foothills   General   Southern   Heathview   St Johns
Maternity         24          38        6          0           0
Surgical          86          85        45         30          24
Medical           82          55        30         30          35
Psychiatric       25          22        30         65          76
A component and a component percentage bar chart are provided below. Referring to these diagrams, write a brief report to the health authority on the provision of beds for its patients. Briefly comment on the differences between these graphical representations.
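Component bar chart
[Chart: number of beds (vertical axis, 0 to 250) for each hospital (Foothills, General, Southern, Heathview, St Johns), with each bar split by category of bed (Psychiatric, Medical, Surgical, Maternity)]
Percentage component bar chart
[Chart: percentage of beds (vertical axis, 0% to 100%) for each hospital (Foothills, General, Southern, Heathview, St Johns), with each bar split by category of bed (Psychiatric, Medical, Surgical, Maternity)]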
ENDSECTION STARTSECTION=think_1.htm= SECTION~
How much do you know?
If you have understood the content of this unit you should have knowledge of, and be able to answer any questions relating to, all of the following points:
• summarising discrete data in frequency tables;
• using cross tabulations to summarise information on 2 variables (one of which may be continuous);
• using % to compare counts and being able to identify whether to use row or column totals to calculate % when discussing cross tabulations;
• drawing and interpreting bar charts;
• discussing pie charts and pictograms;
• knowledge of the advantages and disadvantages of different methods of graphical summary.
Extra Activities
Log on to the STX1110 Oasis page and attempt the quizzes in the extra section.
A guide on how to find the STX1110 Oasis quizzes is on page 16 of this book.
Do you want to know more?
For further examples and exercises, refer to Essential Quantitative Methods for Business, Management and Finance, third edition, by Les Oakshott, or any of the books on the STX1110 reading list.
ENDSECTION ENDCHAPTER
CHAPTER=SPD2
STARTSECTION=scope_1.htm= SECTION~
Summarising and Presenting Data 2 Context
In the last unit we began to look at how tables and graphs can be used to present the information in a data set in a manner which makes it easier to see the important features of the data set. The data sets we considered in the last unit were similar in that they consisted of category data or discrete data with a limited range of values. When the number of distinct data values in the data set is large (20 or more say), or the data is continuous in nature, the techniques covered in the last unit are of little use, so we need to do something different.
ENDSECTION STARTSECTION=scope_2.htm= SECTION~
Objectives
Having worked through this unit, you should be able to:
• use grouped frequency distributions to summarise numerical data;
• present the information in a grouped frequency table graphically using a histogram;
• construct a histogram for grouped frequency tables with unequal class widths;
• use frequency polygons for comparing two grouped frequency distributions;
• appreciate how the choice of scale on a graph affects the visual impression of the graph.
ENDSECTION STARTSECTION=content_1.htm= SECTION~
Grouped Frequency Distribution
A grouped frequency distribution organises the data items into groups, or classes, of values. It then shows how many data items fall within each class; this count is referred to as the class frequency.
Example 1
The following data refer to the age of 25 employees:
63 27 46 47 22 64 30 19 69 36 65 60 40 66 55 33 47 42 49 23 22 46 62 30 20
To present this as a grouped frequency distribution we need to decide what the classes should be. There is no set rule for choosing classes, but the choice will depend to some extent on the size of the data set. Generally a grouped frequency distribution is easier to read if the class intervals are round numbers, i.e. multiples of 5 or 10 or 100 etc. In the above example the ages range from 19 to 69. If we choose intervals of size 5 we will need a lot of classes (about 12) to cover a range of 19 to 69, and since the data set is not very large we would be spreading the data out among too many classes. So instead we will use a class interval of size 10. This means that the classes of the grouped frequency distribution would be:
10 – 19 (this includes the 10 values 10, 11, 12, 13, 14, 15, 16, 17, 18, 19),
20 – 29,
30 – 39, etc
Counting how many of the observations lie in each class produces the following grouped frequency distribution:
Table 1: Grouped frequency distribution of the age of 25 employees
Age of employee Number of employees
10 – 19 1
20 – 29 5
30 – 39 4
40 – 49 7
50 – 59 1
60 – 69 7
Total 25
Grouping allows you to see any pattern in the data. However, it is important to realise that grouping results in a loss of information. In the above example we know that there are 7 employees whose age is between 40 and 49 but, without access to the original data set, we do not know their exact ages. A clearer pattern has been bought at the cost of a loss of information. Consequently if you should wish to use a grouped frequency distribution to perform any mathematical calculations the answers will not be exact.
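The tallying behind Table 1 can be sketched in a few lines of Python. This is only an illustration, not part of the module's method: the function name `grouped_frequency` and its parameters `width` and `start` are our own assumptions, and the sketch assumes whole-number data with no values below `start`.

```python
from collections import Counter

# Ages of the 25 employees from Example 1.
ages = [63, 27, 46, 47, 22, 64, 30, 19, 69, 36, 65, 60, 40,
        66, 55, 33, 47, 42, 49, 23, 22, 46, 62, 30, 20]


def grouped_frequency(values, width, start):
    """Count how many whole-number values fall in each class.

    Classes run start..start+width-1, start+width..start+2*width-1, etc.
    (hypothetical helper written for this illustration only).
    """
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in range(max(counts) + 1):
        lower = start + k * width
        upper = lower + width - 1
        table.append((lower, upper, counts.get(k, 0)))
    return table


table_1 = grouped_frequency(ages, width=10, start=10)
for lower, upper, freq in table_1:
    print(f"{lower} - {upper}: {freq}")
print("Total:", sum(freq for _, _, freq in table_1))
```

Calling the same function with `width=5` and `start=15` spreads the 25 ages over 11 classes, which illustrates why a class interval of size 10 was preferred for such a small data set.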
Example 2
The following grouped frequency distribution shows the height of 50 individuals. A different convention has been used to describe the classes because height is a continuous variable.
Table 2: Height of 50 individuals
Height (cm) Number of individuals
Up to and including 155 1
Over 155, up to and including 165 3
Over 165, up to and including 175 8
Over 175, up to and including 185 16
Over 185, up to and including 195 18
Over 195 4
Total 50
Choosing the intervals for a grouped frequency
The choice of the intervals or classes in a grouped frequency distribution is entirely up to you. However, you do want to produce a grouped frequency distribution that is easy to read, so choosing intervals in 10s, 100s, etc is usually a good idea. There are some general rules to follow when compiling a grouped frequency distribution:
• You should not have too many classes and you should not have too few classes. If you have too few classes too much information is lost and hence important details of the data will also be lost. If you have too many classes the resulting grouped frequency distribution has too much detail and patterns in the data set are hard to observe. Somewhere between 5 and 12 classes should be enough.
• Classes must not overlap. If for example you had the two classes 10 – 20 and 20 – 30 which class would a data item with a value of 20 belong to? You can’t put it in both as this would result in double counting!
• Open-ended classes can be used but only at the two ends of the distribution.
For a fuller discussion of how to prepare grouped frequency distributions and the choice of class intervals, refer to Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshott, chapter 5.
ENDSECTION STARTSECTION=content_3.htm= SECTION~
Definitions associated with grouped frequency distribution
To illustrate some ideas associated with grouped frequency distributions we will consider the first three classes of the distribution in example 1.
10 – 19 20 – 29 30 – 39
Class Limits: These are the upper and lower values of the classes as physically described in the distribution. So for the 10 – 19 class above the lower limit is 10 and the upper limit is 19. You should be able to detail the limits for the remaining classes.
Class Boundaries: These are the upper and lower values of a class that mark common points between classes. Considering the 20 – 29 class, the lower boundary would be the common point of where the previous class (10 – 19) and this class (20 – 29) meet. This would be considered to be the middle of where the first class ends (19) and the next class starts (20) so the lower boundary is 19.5 which is half way between 19 and 20. A similar argument can be applied to the upper boundary which would then give 29.5 as the value for the upper boundary. Boundaries for example 2 would be more difficult. We will only be using boundaries when discussing the construction of histograms so we shall return to this later.
Class midpoint: This is what it says it is: the value half way through the range of the class. It is calculated as
(Upper class limit + Lower class limit) / 2
So for the 10 – 19 class, the class midpoint would be
(10 + 19) / 2 = 14.5
We will be using class midpoints to draw frequency polygons. They are also useful for performing calculations with grouped frequency distributions. As we have already said, the grouping results in a loss of information. If we did have to do a calculation with the distribution in example 1 we would know that there are 7 data items in the range 40 – 49 but we don’t know their value. The best we can do is say that each of the 7 items has a value of 44.5 which is the midpoint of the class.
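As a rough illustration of these definitions, the sketch below computes the midpoints and boundaries for Table 1 and then shows the cost of grouping by estimating the total of all 25 ages from the midpoints. The helper names `midpoint` and `boundaries`, and the `gap` parameter, are assumptions made for this example only.

```python
# Ages of the 25 employees from Example 1.
ages = [63, 27, 46, 47, 22, 64, 30, 19, 69, 36, 65, 60, 40,
        66, 55, 33, 47, 42, 49, 23, 22, 46, 62, 30, 20]

# Table 1 from Example 1: (lower limit, upper limit, frequency).
table_1 = [(10, 19, 1), (20, 29, 5), (30, 39, 4),
           (40, 49, 7), (50, 59, 1), (60, 69, 7)]


def midpoint(lower_limit, upper_limit):
    """Class midpoint: half way between the two class limits."""
    return (lower_limit + upper_limit) / 2


def boundaries(lower_limit, upper_limit, gap=1):
    """Class boundaries, assuming neighbouring limits are `gap` apart
    (19 and 20 here), so each boundary sits half a gap beyond a limit."""
    return lower_limit - gap / 2, upper_limit + gap / 2


midpoints = [midpoint(lo, hi) for lo, hi, _ in table_1]

# Treat every item in a class as if it sat at the class midpoint.
estimated_total = sum(midpoint(lo, hi) * f for lo, hi, f in table_1)
actual_total = sum(ages)

print(midpoints)
print(boundaries(20, 29))
print(estimated_total, actual_total)
```

The estimated total (1092.5) is close to, but not the same as, the total of the raw ages (1083), which is the inexactness that grouping introduces.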
Class Width: This is the difference between the boundaries of the class, or the difference between the lower limit of this class and that of the following class. So for the 20 – 29 class the class width is calculated either as
the difference in the boundaries,
29.5 – 19.5 = 10
or as the difference in the lower limits of this class and the following class,
30 – 20 = 10.
ENDSECTION STARTSECTION=content_4.htm= SECTION~
Histograms
A grouped frequency distribution can be represented graphically using a histogram. A histogram is similar to a bar chart with a rectangle (bar) being used to represent the frequency of each class. However, the rectangles of a histogram join up to distinguish it from a bar chart and remind us that we are typically dealing with continuous data. The horizontal axis of a histogram represents a continuous number scale and it is important to be aware that the bars on a histogram do not represent separate categories (as on a bar chart) but rather adjacent intervals on a number line. In other words, although a bar chart and a histogram look similar, they are designed to represent two different types of data. The bar chart is useful for depicting separate categories while the histogram describes the ‘shape’ of data that have been measured on a continuous number scale.
ENDSECTION STARTSECTION=content_5.htm= SECTION~
Histogram with classes of equal width
Provided each class in the grouped frequency distribution is of equal width, the height of the rectangle for each class represents the frequency of that class. In this respect, histograms are similar to bar charts for distributions with classes of equal width. The classes of the grouped frequency distribution are represented along the horizontal axis on a continuous number scale and the frequencies are presented along the vertical axis. The histogram for example 1 would be as follows.
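Histogram for the ages of 25 employees.
[Figure: histogram with Age on the horizontal axis (bars centred on the class midpoints 14.5, 24.5, 34.5, 44.5, 54.5 and 64.5) and Frequency on the vertical axis (0 to 7).]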
ENDSECTION STARTSECTION=content_6.htm= SECTION~
Histogram with classes of unequal width
If the frequency distribution has unequal class widths the area and not the height of each rectangle represents the frequency of each class. Put another way, the heights of the bars have to be adjusted for the fact that the bars do not have equal width.
Example 3
Consider the following distribution which shows the length of time in minutes that a computer help line advisor spends on the phone with each caller.
Time in Minutes Number of callers
Less than 10 8
10 or more, but less than 20 10
20 or more, but less than 30 16
30 or more, but less than 40 15
40 or more, but less than 50 11
50 or more, but less than 60 4
60 or more, but less than 70 2
70 or more, but less than 80 1
80 or more, but less than 90 1
A histogram of this distribution would be as follows:
[Figure: histogram with Time in minutes on the horizontal axis (0 to 100) and a vertical axis scaled from 0 to 20.]
It is clear from the distribution that few callers are on the phone for more than 50 minutes. This is also apparent from the histogram, which has a long tail at the higher end of the number scale where there are only 8 values in the last 4 classes. So it is decided to combine the last four classes together so that the distribution is now
Time in Minutes Number of callers
Less than 10 8
10 or more, but less than 20 10
20 or more, but less than 30 16
30 or more, but less than 40 15
40 or more, but less than 50 11
50 or more, but less than 90 4 + 2 + 1 + 1 = 8
When drawing the histogram we cannot change the numerical scale on the horizontal axis so the last class is now four times as wide as the others.
If the heights of the bars were to still represent the frequency of each class the resulting histogram would be:
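[Figure: histogram of the combined distribution with bar heights equal to the class frequencies. Horizontal axis: Time in minutes (0 to 100); vertical axis scaled from 0 to 20.]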
It seems that the process of collapsing several class intervals together has the misleading effect of accentuating the importance of the wider interval. Since this interval is 4 times as wide as the others, it makes sense to adjust for this by reducing the height of this bar to a quarter of the height shown in the above histogram. The adjusted height is called the frequency density of the class. If the classes of a distribution are of unequal width then the frequency density of each class should be calculated prior to drawing the histogram. The frequency densities are calculated as
Frequency density = frequency of class/width of class
Time in Minutes Number of callers Width Frequency density
Less than 10 8 1 8/1 = 8
10 or more, but less than 20 10 1 10/1 = 10
20 or more, but less than 30 16 1 16/1 = 16
30 or more, but less than 40 15 1 15/1 = 15
40 or more, but less than 50 11 1 11/1 = 11
50 or more, but less than 90 8 4 8/4 = 2
NB: The width of the second class is really 20 – 10 = 10 and the width of the last class is 90 – 50 = 40. However, it is easier to say that the first 5 classes all have the same width which we call 1 and the last class is 4 times as wide.
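The frequency density calculation can be sketched as follows. This is a minimal illustration under two assumptions of our own: the open class "Less than 10" is taken to start at 0 minutes, and widths are measured in multiples of the standard 10-minute class, as in the NB above. The names `frequency_density` and `standard_width` are hypothetical.

```python
# Combined distribution from Example 3: (lower, upper, number of callers).
# Assumption: the first class "Less than 10" starts at 0 minutes.
classes = [(0, 10, 8), (10, 20, 10), (20, 30, 16),
           (30, 40, 15), (40, 50, 11), (50, 90, 8)]

standard_width = 10  # the width shared by the first five classes


def frequency_density(lower, upper, frequency, unit=standard_width):
    """Frequency divided by class width, with width measured in
    multiples of `unit` (so a 10-minute class has width 1)."""
    width = (upper - lower) / unit
    return frequency / width


densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]
print(densities)

# The area of each bar (width x density) recovers the class frequency.
areas = [((hi - lo) / standard_width) * d
         for (lo, hi, _), d in zip(classes, densities)]
print(areas)
```

The second list shows why the adjustment works: once heights are frequency densities, the area of each bar equals the frequency of its class, whatever its width.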
The resulting histogram of the frequency densities would then be:
[Figure: histogram of the frequency densities. Horizontal axis: Time in minutes (0 to 100); vertical axis scaled from 0 to 20.]
Refer to Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshott, chapter 5, for further discussion of histograms.
ENDSECTION STARTSECTION=content_7.htm= SECTION~
Frequency polygons
Instead of using a histogram, which is a series of rectangles, it might be preferable to display the frequency distribution as a single curve. This is known as a frequency polygon. Each class is represented by a single point and the height of each point represents the class frequency. The position of the point must be directly above the class midpoint. The points are then joined up to form the frequency polygon.
Example
The following distribution details the prices of 80 cars sold at a car show room last month.
Selling price (£ thousands) Midpoint Number of cars
6 and up to but not including 8 7 8
8 and up to but not including 10 9 23
10 and up to but not including 12 11 17
12 and up to but not including 14 13 18
14 and up to but not including 16 15 8
16 and up to but not including 18 17 4
18 and up to but not including 20 19 2
Total 80
A frequency polygon of this distribution follows. Notice that in order to complete the polygon, midpoints of 5 and 21 were added to the x-axis to ‘anchor’ the polygon at zero frequencies.
Frequency polygon of the selling prices of cars
[Figure: frequency polygon with Selling price (£000s) on the horizontal axis (0 to 25) and Frequency on the vertical axis (0 to 25).]
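The points that make up a frequency polygon can be worked out as in the sketch below. It is an illustration only: the function name `polygon_points` is our own, and it assumes the anchoring points sit one class width beyond the first and last midpoints, which reproduces the midpoints 5 and 21 used for the car prices.

```python
# Car price classes from the example: (lower, upper, number of cars),
# with prices in thousands of pounds.
car_classes = [(6, 8, 8), (8, 10, 23), (10, 12, 17), (12, 14, 18),
               (14, 16, 8), (16, 18, 4), (18, 20, 2)]


def polygon_points(classes):
    """Return (midpoint, frequency) pairs, with a zero-frequency point
    one class width before the first class and after the last."""
    points = [((lo + hi) / 2, f) for lo, hi, f in classes]
    first_width = classes[0][1] - classes[0][0]
    last_width = classes[-1][1] - classes[-1][0]
    start = (points[0][0] - first_width, 0)
    end = (points[-1][0] + last_width, 0)
    return [start] + points + [end]


points = polygon_points(car_classes)
print(points)
```

Joining these points in order with straight lines gives the polygon; any plotting tool that draws a line through a list of (x, y) pairs could be used for that step.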
The frequency polygon is particularly useful if you want to compare two frequency distributions. The following graph compares the car sales at two car show rooms. The total number of cars sold at these two show rooms is similar so a direct comparison is possible. If the difference in the total number of cars sold at each show room were large, converting the frequencies to relative frequencies and then plotting the two polygons would allow a clearer comparison.
Frequency polygons comparing car sales at two showrooms
[Figure: two frequency polygons on the same axes. Horizontal axis: Selling price (£000s), 0 to 30; vertical axis: Frequency, 0 to 30.]
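Converting frequencies to relative frequencies is a single division per class. The sketch below is only an illustration using the 80 cars from the example above; the variable names are our own.

```python
# Number of cars in each price class, from the car show room example.
frequencies = [8, 23, 17, 18, 8, 4, 2]

total = sum(frequencies)

# Relative frequency = class frequency / total number of items.
relative = [f / total for f in frequencies]

for f, r in zip(frequencies, relative):
    print(f"{f:>3} cars -> {r:.3f} of the total ({r:.1%})")
```

Because relative frequencies always add up to 1 (or 100%), two polygons drawn from them are on the same footing even when the totals behind them are very different.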
ENDSECTION STARTSECTION=content_8.htm= SECTION~
The effect of scale on line graphs
The last sort of graphical representation we will consider in this unit is a line graph. You will have seen examples of these scattered throughout the newspapers and text books etc. A classic use of the line graph is a time series plot which displays figures recorded over time such as monthly sales figures etc.
ENDSECTION STARTSECTION=activity_1.htm= SECTION~
Seminar Questions
Seminar Question S4.1
The following frequency distributions represent the number of days during a year that employees of a large retail company were absent due to illness. The table includes a summary for two departments, namely Customer Relations and Finance.
Number of days absent Number of employees in Customer Relations Number of employees in Finance
0 and up to 3 5 5
3 and up to 6 6 12
6 and up to 9 11 23
9 and up to 12 15 8
12 and up to 15 10 2
Total 47 50
a) Construct a histogram of the distribution for the employees in customer relations.
b) Construct a frequency polygon comparing the distributions for the two departments.
c) Discuss the rate of employee absenteeism for the two departments.
ENDSECTION STARTSECTION=activity_2.htm= SECTION~
Seminar Question S4.2
Driving under the influence of alcohol is a serious offence. The following data gives the ages of a random sample of 50 drivers arrested whilst driving under the influence of alcohol.
46 16 41 26 22 33 30 22 36 34 63 21 26 18 27 24 31 38 26 55 31 47 27 43 35 22 64 40 58 20 49 37 53 25 29 32 23 49 39 40 24 56 30 51 21 45 27 34 47 35
a) Construct a frequency distribution of these age figures and determine the relative frequency distribution.
b) Draw a histogram of the frequency distribution.
ENDSECTION STARTSECTION=activity_3.htm= SECTION~
Seminar Question S4.3
A bank is studying the number of times their automatic cash point located in a supermarket is used each day. The following data set details how many times it was used on each of the last 30 days.
83 64 84 76 84 54 75 59 70 61
63 80 84 73 68 52 65 90 52 77
95 36 78 61 59 84 95 47 87 60
a) Produce a frequency distribution for the number of times the cash point was used.
b) What were the smallest and largest numbers of times that the machine was used?
c) Around what values did the number of times the machine was used tend to cluster?
d) From the distribution, how many times would you say the machine was used on a typical day?
ENDSECTION STARTSECTION=activity_4.htm= SECTION~
Seminar Question S4.4
As a preliminary to a review of recruitment policy, a study was made of the age structure of a firm. The results for the 1,000 staff were:
Age in years Number of staff
20 but less than 25 60
25 but less than 30 110
30 but less than 35 120
35 but less than 40 180
40 but less than 45 200
45 but less than 50 150
50 but less than 65 180
Total 1000
Draw a histogram of this distribution, commenting on any difficulties you encounter.
ENDSECTION STARTSECTION=activity_5.htm= SECTION~
Seminar Question S4.5
The following data set details the strength of the wind for the 31 days of July in a given year.
Wind type Number of days
Strong wind 10
Calm 5
Gale 7
Light breeze 9
Total 31
A series of 7 graphical representations of this data set follow. Some of the graphs suggested are quite sensible whilst others are not. Look at the suggested graphs and try to decide which are useful and which are not.
Graph A
[Figure: chart of Days (vertical axis, 0 to 10) for Strong wind, Calm, Gale and Light breeze.]
Graph B
[Figure: chart of Days (vertical axis, 0 to 35) for Strong wind, Calm, Gale, Light breeze and Total.]
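Graph C
[Figure: chart of Days (vertical axis, 0 to 35) for Strong wind, Calm, Gale, Light breeze and Total.]
Graph D
[Figure: chart of Days for Calm, Light breeze, Strong wind and Gale. Vertical axis ticks: 4, 5, 6, 7, 8, 9, 10 and a separate 0.]
Graph E
[Figure: chart of Days (vertical axis, 0 to 10) for Calm, Light breeze, Strong wind and Gale.]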
Graph F
[Figure: chart with labels Strong wind, Calm, Gale, Light breeze and Total.]
Graph G
[Figure: chart with labels Strong wind, Calm, Gale and Light breeze.]
ENDSECTION STARTSECTION=think_1.htm= SECTION~
How much do you know?
If you have understood the content of this unit you should have knowledge of, and be able to answer any questions relating to, all of the following points:
• compiling grouped frequency distributions of continuous data;
• constructing histograms of grouped frequency distributions;
• adjusting the frequencies for grouped frequency distributions to produce an undistorted histogram;
• using frequency polygons to compare distributions by considering questions of where the data cluster and how the data spread;
• appreciating how the choice of scale influences the visual impression of a graph.
Extra Activities
Log on to the STX1110 Oasis page and attempt the quizzes in the extra section.
A guide on how to find the STX1110 Oasis quizzes is on page 16 of this book.
Do you want to know more?
For further examples and exercises, refer to Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshott, or any of the books on the STX1110 reading list.
ENDSECTION ENDCHAPTER
S5 Summarising and Presenting Data 3 Page 63
CHAPTER=SPD3
STARTSECTION=scope_1.htm= SECTION~
Summarising and Presenting Data 3 Context
In the previous unit we considered how to organise data into a grouped frequency distribution and to present this graphically using a histogram. The major advantage of presenting data this way is that we get a quick visual picture of the shape of the distribution. That is, we can see where the data are concentrated and also determine whether there are any extremely large or small values. However, there are two disadvantages to organising data into a frequency distribution:
• we lose the exact identity of each data value;
• we are not sure how the data values within each class are distributed.
A stem and leaf display, or a stemplot, can be used as an alternative to a histogram. It allows us to see the general shape of the distribution of the data, but it has the advantage of not losing the value of each data item.
ENDSECTION STARTSECTION=scope_2.htm= SECTION~
Objectives
Having worked through this unit, you should be able to:
• compile a stem and leaf display of numerical data;
• recognise the similarities and differences between a stem and leaf display and a frequency distribution;
• use back to back stem and leaf displays to compare two data sets by considering ideas of where the data are clustered and how spread out the data are;
• understand and be able to calculate a median value.
ENDSECTION STARTSECTION=content_1.htm= SECTION~
Stem and Leaf Display
A stem and leaf display is a statistical technique to present a set of data. Each numerical value is divided into two parts. The leading digit(s) becomes the stem and the trailing digit(s) the leaf. The stems are located along the vertical axis, and the leaf for each observation along the horizontal axis. The following example will explain the details of developing a stem and leaf display.
Example 1
To illustrate the use of a stem and leaf display we shall return to the data set concerning the age of 25 employees which we used in the previous unit:
63 27 46 47 22 64 30 19 69
36 65 60 40 66 55 33 47 42
49 23 22 46 62 30 20
The data consists of two digit numbers so it is fairly obvious how we will split the numbers into stems and leaves. The first digit (the ‘tens’ digit) will form the stems and the second digit (the ‘units’ digit) will form the leaves. For example, for the data item 63, the stem is the 6 and the leaf is 3. For a stem and leaf display, write all the possible stems, in order, on the left hand side of a vertical line. Then go through the data values, in the order they are given, and record the leaf of the value opposite the corresponding stem. The first five values (63, 27, 46, 47, 22) are put in like this:
1
2 7 2
3
4 6 7
5
6 3
When all the values have been recorded the display looks like this:
1 9
2 7 2 3 2 0
3 0 6 3 0
4 6 7 0 7 2 9 6
5 5
6 3 4 9 5 0 6 2
To complete the stem and leaf display we need to indicate the scale, which is done by stating the unit value of a leaf. This is very important: no stem and leaf display is complete without an indication of the scale of the numbers. It is also a good idea to order the leaves from smallest to largest and include a count column which indicates how many data items are on each stem. Then to finish it all off, include a title!
Stem and leaf display of ages of part time employees in years
Count
1 9 (1)
2 0 2 2 3 7 (5)
3 0 0 3 6 (4)
4 0 2 6 6 7 7 9 (7)
5 5 (1)
6 0 2 3 4 5 6 9 (7)
Leaf unit = 1 year
Notice that, as well as sorting the data into order, the stem and leaf provides a visual display of the data: it is easy to compute the numbers of employees in different age groups. Note that it is therefore essential to space out the leaves evenly. The leaves on each stem can be counted and these counts have been shown in brackets on the right of the leaves. Each count shows the frequency associated with each stem. In this example the counts show the number of employees in each age group. So there is 1 employee in the age range 10 – 19, 5 in the age range 20 – 29 etc. If you take this information and present it in a table you get:
Age of employee Number of employees
10 – 19 1
20 – 29 5
30 – 39 4
40 – 49 7
50 – 59 1
60 – 69 7
Total 25
This is exactly the same as the grouped frequency for this data set presented as table 1 in the lecture for the previous unit. So stem and leaf displays can be useful as a method of compiling grouped frequency distributions.
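The steps for building a sorted stem and leaf display with a count column can be sketched as below. This is an illustration under stated assumptions, not a definitive implementation: the function name `stem_and_leaf` and the `leaf_unit` parameter are our own, and the sketch assumes non-negative whole numbers whose last digit (in units of `leaf_unit`) is the leaf.

```python
from collections import defaultdict

# Ages of the 25 employees from Example 1.
ages = [63, 27, 46, 47, 22, 64, 30, 19, 69, 36, 65, 60, 40,
        66, 55, 33, 47, 42, 49, 23, 22, 46, 62, 30, 20]


def stem_and_leaf(values, leaf_unit=1):
    """Map each stem to its sorted leaves.

    The leaf is the last digit in units of `leaf_unit`; the stem is
    everything in front of it.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        scaled = v // leaf_unit
        stems[scaled // 10].append(scaled % 10)
    # Include empty stems so that gaps in the data stay visible.
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}


display = stem_and_leaf(ages)
for stem, leaves in display.items():
    row = " ".join(str(leaf) for leaf in leaves)
    print(f"{stem:>2} | {row:<20} ({len(leaves)})")
print("Leaf unit = 1 year")
```

The counts printed in brackets are the class frequencies of the grouped frequency distribution, while the leaves themselves keep every original value.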
Suppose boxes are put around each row of leaves as follows:
Stem and leaf display of ages of part time employees in years
1 9
2 0 2 2 3 7
3 0 0 3 6
4 0 2 6 6 7 7 9
5 5
6 0 2 3 4 5 6 9
Leaf unit = 1 year
Now remove the leaves but keep the boxes. This shows the shape of the stem and leaf display but not the individual values. Each stem is represented by a box or bar whose length represents its frequency. This is what we have previously referred to as a histogram but here it is presented on its side rather than vertically. Notice that the stems have been replaced by intervals like 20 – 29, which in this example represents age groups.
Ages of 25 employees
[Figure: histogram drawn on its side, with one horizontal bar for each of the classes 10 – 19, 20 – 29, 30 – 39, 40 – 49, 50 – 59 and 60 – 69.]
So stem and leaf displays are very similar to grouped frequency distributions and histograms. However, they have certain advantages over histograms.
• The actual values of the raw data from which it has been drawn have been preserved.
• When the stem and leaf display is drawn in its sorted form, the data are displayed in rank order from lowest to highest. As we will see later, this can be very useful.
ENDSECTION STARTSECTION=content_2.htm= SECTION~
Comparing data sets
When two data sets are to be compared, they can be drawn onto separate stemplots and placed back to back.
Example 2
The following data set details the age of 40 successful women. The data set was taken from the birthday column of The Independent newspaper over a period of several consecutive days.
77 32 55 55 59 67 55 60 51 82
66 66 100 29 61 47 52 46 53 63
74 47 58 72 55 50 36 52 58 48
80 41 54 53 70 68 42 62 98 45
Now go and do Exercise S5.1
ENDSECTION STARTSECTION=activity_1.htm= SECTION~
Exercise S5.1
Show that the stem and leaf display of this data set is:
2 9
3 2 6
4 1 2 5 6 7 7 8
5 0 1 2 2 3 3 4 5 5 5 5 8 8 9
6 0 1 2 3 6 6 7 8
7 0 2 4 7
8 0 2
9 8
10 0
Leaf unit = 1 year
The same newspaper also gave the ages of 40 successful men. This data set is:
38 43 46 48 49 49 50 51 51 51
51 51 53 53 56 58 62 64 64 66
66 66 67 69 69 71 71 74 74 76
77 79 80 80 80 81 81 82 82 87
If you present this data set in a stem and leaf display you get the following:
3 8
4 3 6 8 9 9
5 0 1 1 1 1 1 3 3 6 8
6 2 4 4 6 6 6 7 9 9
7 1 1 4 4 6 7 9
8 0 0 0 1 1 2 2 7
Leaf unit = 1 year
Trying to compare the two data sets from separate stem and leaf displays can be difficult. It is much easier if we draw them on back to back stem and leaf displays as shown below.
Back to back stem plot comparing the ages of successful men and women
                Men |    | Women
                    |  2 | 9
                  8 |  3 | 2 6
          9 9 8 6 3 |  4 | 1 2 5 6 7 7 8
8 6 3 3 1 1 1 1 1 0 |  5 | 0 1 2 2 3 3 4 5 5 5 5 8 8 9
  9 9 7 6 6 6 4 4 2 |  6 | 0 1 2 3 6 6 7 8
      9 7 6 4 4 1 1 |  7 | 0 2 4 7
    7 2 2 1 1 0 0 0 |  8 | 0 2
                    |  9 | 8
                    | 10 | 0
Leaf unit = 1 year
When interpreting a back to back stem plot you need to ask yourself two questions.
• Do the two data sets appear to be clustering in the same place?
• Do the two data sets appear to have similar spread?
It is clear from the back to back stem plot that the women’s ages are more widely spread than those of the men (a range of 29 – 100 years as opposed to 38 – 87 for the men). Also, the women were, overall, slightly younger than the men. There are 15 men aged 70 or over as opposed to only 8 women.
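The two questions about clustering and spread can also be checked numerically. The sketch below rebuilds a back to back display from the two data sets and then reports the minimum, the maximum and the number of people aged 70 or over in each group. The function names `leaves_by_stem` and `back_to_back` are our own, and the layout is only one possible way of printing such a display.

```python
women = [77, 32, 55, 55, 59, 67, 55, 60, 51, 82,
         66, 66, 100, 29, 61, 47, 52, 46, 53, 63,
         74, 47, 58, 72, 55, 50, 36, 52, 58, 48,
         80, 41, 54, 53, 70, 68, 42, 62, 98, 45]
men = [38, 43, 46, 48, 49, 49, 50, 51, 51, 51,
       51, 51, 53, 53, 56, 58, 62, 64, 64, 66,
       66, 66, 67, 69, 69, 71, 71, 74, 74, 76,
       77, 79, 80, 80, 80, 81, 81, 82, 82, 87]


def leaves_by_stem(values):
    """Sorted units digits grouped by tens stem."""
    stems = {}
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)
    return stems


def back_to_back(left, right):
    """Lines of a back to back display: left leaves are written in
    descending order so that they read outwards from the shared stem."""
    left_stems, right_stems = leaves_by_stem(left), leaves_by_stem(right)
    lowest = min(min(left_stems), min(right_stems))
    highest = max(max(left_stems), max(right_stems))
    lines = []
    for stem in range(lowest, highest + 1):
        left_part = " ".join(str(x) for x in reversed(left_stems.get(stem, [])))
        right_part = " ".join(str(x) for x in right_stems.get(stem, []))
        lines.append(f"{left_part:>20} | {stem:>2} | {right_part}")
    return lines


for line in back_to_back(men, women):
    print(line)

# (minimum, maximum, number aged 70 or over) for each group.
summary = {
    name: (min(data), max(data), sum(1 for age in data if age >= 70))
    for name, data in (("men", men), ("women", women))
}
print(summary)
```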
Stem and leaf displays should be used in a fairly flexible way; there are no hard and fast rules. The examples we have covered here have used data with only two digit numbers. With values like these it is fairly obvious that the stem should represent the ‘tens’ and the leaves should correspond to ‘units’. However, if the data to be displayed were all decimal numbers less than 1 or very large numbers bigger than, say, 1000, then the stem and leaves would need to be redefined accordingly. For three digit numbers the stems could be hundreds and the leaves tens, with the unit digits dropped.
Sometimes there are too many leaves to fit onto a single stem so you can split each stem into two: leaves with values 0 to 4 go on the first half of the stem and those with values 5 to 9 go on the second half of the stem. You can continue and split the stem into smaller and smaller categories if necessary.
Example 3
Suppose we want to compile a stem and leaf display of the ages of 20 students in a class.
17 18 19 19 20 20 21 22 22 23
23 24 24 24 27 28 29 29 31 32
If we represent this as a stem and leaf display using the tens as the stem we would get:
Ages of 20 students
1 7 8 9 9
2 0 0 1 2 2 3 3 4 4 4 7 8 9 9
3 1 2
Leaf unit = 1 year
As the range of ages is quite limited we end up with a stem and leaf display with a lot of information on the middle row. Maybe it would be better if we split each stem in half. That is, instead of looking at how many students are aged between 20 and 29, look at how many students are in each of the ranges 20 – 24 and 25 – 29. This would give us:
Ages of 20 students
1
1* 7 8 9 9
2 0 0 1 2 2 3 3 4 4 4
2* 7 8 9 9
3 1 2
3*
Leaf unit = 1 year
The notation 2* is used to indicate that the stem has been split into two and it is assumed that the split is even so that the part of the stem labelled as 2 has leaves from 0 to 4 and the part of the stem labelled as 2* has leaves from 5 to 9. If the split is more complicated than this you may need to include a key explaining the split.
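A split stem display can be produced with a small change to the usual procedure. The sketch below is illustrative only; the function name `split_stem_display` is our own, and it assumes the even split described above, with leaves 0 to 4 on the plain stem and leaves 5 to 9 on the starred stem.

```python
# Ages of the 20 students from Example 3.
students = [17, 18, 19, 19, 20, 20, 21, 22, 22, 23,
            23, 24, 24, 24, 27, 28, 29, 29, 31, 32]


def split_stem_display(values):
    """Group units digits under half-stems: 'n' holds leaves 0 to 4
    and 'n*' holds leaves 5 to 9."""
    tens = [v // 10 for v in values]
    rows = {}
    for stem in range(min(tens), max(tens) + 1):
        rows[f"{stem}"] = []
        rows[f"{stem}*"] = []
    for v in sorted(values):
        label = f"{v // 10}" if v % 10 <= 4 else f"{v // 10}*"
        rows[label].append(v % 10)
    return rows


rows = split_stem_display(students)
for label, leaves in rows.items():
    print(f"{label:<2} | {' '.join(str(leaf) for leaf in leaves)}")
print("Leaf unit = 1 year")
```

Empty half-stems such as 1 and 3* are kept in the output so that the display stays evenly spaced.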
ENDSECTION STARTSECTION=content_3.htm= SECTION~
Numerical summaries based on ordered data
Simple numerical summaries of data can be obtained if the data is first sorted into ascending numerical order. As a stem and leaf display produces the ordered data this is the perfect opportunity to begin to discuss these ideas. If the data are ordered from smallest value to largest value, three important summary values are;
• the minimum: the smallest data item;
• the maximum: the largest data item;
• the median: the middle data item.
Example 4
Suppose we have the following set of 5 exam marks.
65% 61% 60% 58% 71%
If we order these values from smallest to largest we get
58% 60% 61% 65% 71%
The minimum is obviously 58% and the maximum is 71%. The median is the middle value. As this data set is quite small it is easy to see that the middle value is 61%. However, if the data set were larger it would be much harder to pick out the median value by eye. So we need to find another method for calculating a median. For larger data sets, provided we know the position of the median we can then locate it in the list of ordered data.
We calculate the position of the median as (n + 1)/2 where n is the number of data items.
In this example, there are n = 5 observations so the median is at position
(5+1)/2 = 6/2 = 3.
So the median is the third data item. The value of the third data item is 61%, so the median is 61%.
In this example the number of data items, n = 5, was odd so there is a unique data item in the middle of the data. What happens when the number of data items is even?
Now go and do Exercise S5.1
ENDSECTION STARTSECTION=activity_2.htm= SECTION~
Exercise S5.2
Consider the following set of exam marks.
58% 60% 61% 65% 71% 73%
Again the minimum and the maximum values are easily found but what value would you now quote as a median?
A difficulty arises as there is no unique middle value. We have two middle values: 61% and 65%. When this happens the convention is to take the average of the two middle values so that the median is
(61% + 65%)/2 = 63%.
What happens to our technique for finding the position of the median if the data set has an even number of data items? In this example n = 6 so the position of the median is
(n + 1)/2 = (6 + 1)/2 = 7/2 = 3.5.
The decimal point in this calculation alerts you to the fact that you have an even number of data items so you need to take an average of two values. So we will take the average of the 3rd and the 4th data items as shown above.
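The position rule for both the odd and the even case can be written as a short Python sketch. The function name median_by_position is invented for this illustration; the only adjustment needed is that Python lists count from 0 while the positions in the notes count from 1.

```python
def median_by_position(values):
    """Median found from its position (n + 1)/2 in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2          # 1-based position, may end in .5
    lower = int(position)           # whole number part of the position
    if position == lower:           # odd n: a unique middle item
        return ordered[lower - 1]   # Python lists count from 0
    # even n: average the two items either side of the position
    return (ordered[lower - 1] + ordered[lower]) / 2

print(median_by_position([65, 61, 60, 58, 71]))      # 61
print(median_by_position([58, 60, 61, 65, 71, 73]))  # 63.0
```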
ENDSECTION STARTSECTION=content_4.htm= SECTION~
Using a stem and leaf display to calculate a median
As a stem and leaf display orders the data it can be very useful when trying to calculate a median.
Example 5
Returning to the stem and leaf display of the ages of 25 employees, calculate the minimum, maximum and median. The stem and leaf display is repeated below.
Stem and Leaf Display of Ages of part time employees in years
Count
1 9 (1)
2 0 2 2 3 7 (5)
3 0 0 3 6 (4)
4 0 2 6 6 7 7 9 (7)
5 5 (1)
6 0 2 3 4 5 6 9 (7)
Leaf unit = 1 year
The minimum for this data set is easily read off as 19 from the first row, and the maximum as 69 from the last value of the last row. To calculate the median we need its position first.
Position of the median = (n + 1)/2 = (25 + 1)/2 = 26/2 = 13.
So the median is the 13th data item. We can either count through the stem and leaf display or use the counts at the end of each row to help. The 13th data item is on the 4th row and its value is 46. So the median is 46.
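Using the row counts to find the 13th item can be sketched as follows. The dictionary layout and the helper name item_at are assumptions made for this illustration, and the sketch only handles a whole number position (an odd number of data items), as in this example.

```python
# Rows of the display in Example 5: stem -> leaves.
display = {
    1: [9],
    2: [0, 2, 2, 3, 7],
    3: [0, 0, 3, 6],
    4: [0, 2, 6, 6, 7, 7, 9],
    5: [5],
    6: [0, 2, 3, 4, 5, 6, 9],
}

def item_at(display, position):
    """Return the data item at a 1-based position, using the row counts."""
    seen = 0  # number of items in the rows already passed
    for stem, leaves in display.items():
        if position <= seen + len(leaves):
            return 10 * stem + leaves[position - seen - 1]
        seen += len(leaves)
    raise ValueError("position is beyond the end of the data")

n = sum(len(leaves) for leaves in display.values())
print(n, item_at(display, (n + 1) // 2))  # 25 46
```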
ENDSECTION STARTSECTION=activity_3.htm= SECTION~
Seminar Questions
Seminar Question S5.1
The following stem and leaf display shows the number of units produced per day in a factory.
Count
3 8 (1)
4 (0)
5 6 (1)
6 0 1 3 3 5 5 7 9 (8)
7 0 2 3 6 7 (5)
8 3 5 9 (3)
9 0 0 1 5 6 (5)
10 3 6 (2)
Leaf unit = 1 unit
a) How many days were studied in this survey of production?
b) List the actual data values in the fourth row.
c) What are the minimum, maximum and median values?
d) What percentage of days did the factory produce 80 or more units?
ENDSECTION STARTSECTION=activity_4.htm= SECTION~
Seminar Question S5.2
A community college requires all its students to take a basic maths test before beginning any degree program. The scores on the test can range from 0 to 100. For 70 students interested in the associate of management degree the scores were as follows.
22 60 80 75 87 92 65 46 33 95
72 98 100 37 58 75 86 92 77 85
86 97 83 81 87 42 91 89 87 84
72 86 63 42 26 97 93 98 72 82
85 79 84 75 83 92 89 63 86 68
80 97 81 87 72 89 87 73 65 52
76 86 91 53 67 67 69 72 92 81
a) Make a stem and leaf display of these scores.
b) Briefly discuss the distribution of marks commenting on where the marks cluster and how spread out they are.
c) Calculate the minimum, maximum and median score.
d) One of the financial maths options on this degree course requires the student to have a score of at least 70 on this test. What percentage of these students are eligible to take this financial maths option?
ENDSECTION STARTSECTION=activity_5.htm= SECTION~
Seminar Question S5.3
The Boston Marathon is the oldest in the U.S. The distance is approximately 26 miles. Boston University has a record of all the winning times for the Boston Marathon: they are all over 2 hours. The following data are the minutes over 2 hours for the winning male runner.
Years 1953 – 1972
18 20 18 14 20 25 22 20 23 23
18 19 16 17 15 22 13 10 18 15
Years 1973 – 1992
16 13 9 20 14 10 9 12 9 8
9 10 14 7 11 8 9 8 11 8
a) Construct a back to back stem plot of the two data sets for the minutes over 2 hours of the winning times. You will need to use a split stem so split each stem into two equal parts.
b) Compare the two distributions. How many times under 15 minutes are there in each distribution? What does the display show you about how the winning times have changed?
ENDSECTION STARTSECTION=activity_6.htm= SECTION~
Seminar Question S5.4
The following table shows the daily number of absentees for a company of 500 employees over a period of eight weeks.
a) Form a stem and leaf display of the absenteeism figures.
Week Monday Tuesday Wednesday Thursday Friday
1 20 8 9 5 34
2 15 6 11 12 19
3 26 8 12 8 19
4 21 12 16 16 24
5 13 9 5 12 35
6 23 13 14 8 33
7 14 13 5 13 31
8 26 11 10 9 35
b) Calculate the minimum, maximum and median number of absentees.
c) Write a short verbal summary of your results. Consider whether there are any other features of the data that you think should be explored.
ENDSECTION STARTSECTION=think_1.htm= SECTION~
How much do you know?
If you have understood the content of this unit you should have knowledge of, and be able to answer any questions relating to, all of the following points:
• the construction of stem and leaf displays;
• the similarities and differences between stem and leaf displays and histograms;
• using back to back stem and leaf displays to compare data sets by considering issues such as where the data sets cluster and how spread out the data is;
• using ordered data to calculate numerical summaries of data: minimum, maximum and median.
Extra Activities
Log on to the STX1110 Oasis page and attempt the quizzes in the extra section.
A guide on how to find the STX1110 Oasis quizzes is on page 16 of this book.
Do you want to know more?
For further examples and exercises, refer to Essential Quantitative Methods for Business, Management and Finance, third edition, by Les Oakshott, or any of the books on the STX1110 reading list.
ENDSECTION ENDCHAPTER
S6 Numerical Summaries of Data 1 Page 77
CHAPTER=NSD1
STARTSECTION=scope_1.htm= SECTION~
Numerical Summaries of Data 1 Context
In the last 3 units we have looked at how data sets can be summarised and presented using tables and diagrams. We’ve also begun to look at how data sets can be compared using back to back stem and leaf plots. When comparing data sets we need to be able to summarise the information as briefly and usefully as possible. This unit will begin to look at ways to do this.
ENDSECTION STARTSECTION=scope_2.htm= SECTION~
Objectives
Having worked through this unit, you should:
• appreciate what a statistical measure of location is;
• appreciate what a statistical measure of dispersion is;
• understand that for a measure of location there is a partner measure of dispersion;
• understand and be able to calculate a mode and discuss its strengths and weaknesses;
• extend your knowledge of the median to calculate a five figure summary;
• understand the use of a median and be able to discuss its strengths and weaknesses;
• appreciate that the quartile deviation is the partner measure of dispersion for a median.
ENDSECTION STARTSECTION=content_1.htm= SECTION~
Measures of Location and Dispersion
In practice, the two most useful considerations which help to summarise a mass of figures are:
• What is a typical or average value for this data set?
• How widely spread are the figures?
Numerical summary statistics are single numbers that represent particular features of the data set and try to answer the two questions above. These single number summaries may be of interest in their own right or they may be used in conjunction with histograms etc to allow more objective comparisons of data sets. Single number summaries of data sets are important because they provide immediate impressions of order of magnitude and they allow simple comparisons.
For example, to decide if one department’s production is higher than another’s you don’t necessarily need all the production figures for the past month for both departments. If you know department A typically completed 300 orders and department B completed 175 orders a quick comparison of these numbers will tell you that department A seems to be performing better.
ENDSECTION STARTSECTION=content_2.htm= SECTION~
Measure of Location
A measure of location summarises the information from a set of data into just one number which gives us an idea of what the numbers in the data set are typically like and is useful for answering the first question above. The measure of location should reflect where the data are tending to cluster on a histogram. If the data set is nicely behaved, in the sense that there are not too many extremely high or low values, the data should tend to cluster in the centre of its range so measures of location are often referred to as measures of central tendency.
How do we go about choosing the single number which will be representative of all the numbers in the data set? If the data is clustering in the middle of its range then an obvious candidate for the measure of the location is the middle data item or the Median. We considered the calculation of a median in the last unit and will return to this later. In statistics there are three main measures of location:
• Arithmetic mean: we will refer to this simply as the mean. This is the average in the usual sense of the word; the sum of the observations divided by the number of observations.
• Mode: the value of the data item that occurs most frequently in the data set.
• Median: the value of the middle data item when the observations are arranged in numerical order.
Often in publications all three of these measures of location are referred to as an average. As we shall see later, for some data sets all three of these measures of location may be different so it is useful to know which of these averages is actually being used.
ENDSECTION STARTSECTION=content_3.htm= SECTION~
Measure of Dispersion
As well as knowing what a typical value for the data set is and where the data is clustering, it is useful to have some idea of how spread out the data are. A measure of dispersion is a single number which tries to describe how spread out the data are and attempts to answer the second question above.
Two sets of data could quite easily have the same location (be clustering in the same place) but the spread, or dispersion, of the data in each data set may be very
different. The measure of dispersion gives you information about how the individual data items vary around the measure of location. Do the data items cluster tightly around the measure of location or are they more spread out?
Some commonly used measures of dispersion are
• Range: spread between the highest and lowest data values.
• Interquartile range: spread in the middle 50% of the data.
• Standard deviation: a measure of how the data cluster around the mean.
Measures of location and measures of dispersion are related in that when you quote a measure of location you should also quote an accompanying measure of dispersion. The measures of location tend to have a natural partner from amongst the measures of dispersion.
ENDSECTION STARTSECTION=content_4.htm= SECTION~
Medians Revisited and Five Figure Summaries
We have discussed medians and the calculation of a median value in the last unit. The median is defined to be the value of the data item at the middle of the ordered data set. It is calculated by first evaluating its position using the formula (n+1)/2 where n is the number of data items. Once you have the position of the median you look through the ordered data to find the value of the data item at the relevant position.
Example 1 (recap)
The following data set gives the salary of 12 part time employees.
4,800 5,110 5,520 5,570 6,325 6,750
6,785 7,320 7,320 7,320 8,894 9,500
The minimum value is 4,800 and the maximum value is 9,500.
To evaluate the median we need to know its position.
Position of the median is (n + 1)/2 = (12 + 1)/2 = 13/2 = 6.5.
As the position is not an exact integer (it ends in .5) we know that the data set does not have a unique middle value and we need to average the 6th and 7th data items to find the median.
The 6th data item has a value of 6,750 and the 7th has a value of 6,785 so the median is evaluated as;
Median = (6,750 + 6,785)/2 = 6,767.50
The Median has a number of advantages and disadvantages as a measure of location.
• As the median only uses the middle observation in a data set it is not sensitive to extreme high or low values in a data set. Extreme values in data sets are often mistakes and so this means that a median is not distorted by mistakes in the data. As we shall see later, other measures of location are not so resilient to extreme values in data.
• The median is based purely on the positions of the data items. Apart from the one or two items in the middle, the actual values of the data items are not used in its calculation so its use in more advanced statistical work is limited.
• The median often takes a value equal to one of the original data items in the data set.
• The median is useful for situations where data sets are difficult or expensive to obtain. For example, the median life of 100 light bulbs can be tested by waiting until the 50th one goes out.
ENDSECTION STARTSECTION=content_5.htm= SECTION~
Quartiles
The median divides the ordered data into two halves, each with the same number of observations. Each of these halves may, in turn, be divided into two by quartiles, so that the data is split into four quarters.
[Diagram: the ordered data drawn as a line and marked, from left to right, with the minimum, lower quartile, median, upper quartile and maximum.]
This cannot be done exactly unless the number of observations is divisible by 4, but it is easiest to define the lower quartile as the median of the lower half of the data and the upper quartile as the median of the upper half of the data.
As with calculating the median, we calculate the value of the upper and lower quartiles by first evaluating their position in the data set.
Position of the lower quartile is given by (n + 1)/4 and the position of the upper quartile is given by 3(n + 1)/4.
Example continued
Returning to the example of part time employees’ salaries, evaluate the lower and upper quartiles.
The position of the lower quartile is (n + 1)/4 = (12 + 1)/4 = 13/4 = 3.25.
Obviously there is no data item at position 3.25 so we need to make a decision. There are a number of things you can do but the easiest thing to do is round this position to the nearest whole number. So, 3.25 rounded gives us 3 so we will use the data item at position 3 to be the lower quartile.
The value of the third data item is 5,520 so the lower quartile is 5,520.
The position of the upper quartile is
3(n + 1)/4 = 3 × (12 + 1)/4 = 3 × 13/4 = 3 × 3.25 = 9.75.
Again, there is no data item at position 9.75 so we will round this value to 10 and use the data item at position 10 to represent the upper quartile.
The value of the 10th data item is 7,320 so the upper quartile is 7,320.
ENDSECTION STARTSECTION=content_6.htm= SECTION~
Five figure summary
Having calculated the median and quartiles the five figure summary is then made up as:
• Minimum
• Lower Quartile (Q1)
• Median (Q2)
• Upper Quartile (Q3)
• Maximum
For the example above the five figure summary would be;
Minimum 4,800
Lower quartile (Q1) 5,520
Median (Q2) 6,767.50
Upper quartile (Q3) 7,320
Maximum 9,500
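As a check on the arithmetic, the whole five figure summary can be reproduced with a short Python sketch that follows the position rules used in this unit. The function names are invented for the illustration. One assumption is made that the notes do not cover: when a quartile position ends in exactly .5 it is rounded up.

```python
def item_at_rounded(ordered, position):
    """Data item at a fractional 1-based position, rounded to the nearest
    whole position. Assumption: a position ending in .5 is rounded up."""
    return ordered[int(position + 0.5) - 1]

def five_figure_summary(values):
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data items")
    mid = (n + 1) / 2                      # position of the median
    if mid == int(mid):                    # odd n: unique middle item
        median = ordered[int(mid) - 1]
    else:                                  # even n: average the middle two
        median = (ordered[int(mid) - 1] + ordered[int(mid)]) / 2
    return {
        "minimum": ordered[0],
        "lower quartile": item_at_rounded(ordered, (n + 1) / 4),
        "median": median,
        "upper quartile": item_at_rounded(ordered, 3 * (n + 1) / 4),
        "maximum": ordered[-1],
    }

salaries = [4800, 5110, 5520, 5570, 6325, 6750,
            6785, 7320, 7320, 7320, 8894, 9500]
print(five_figure_summary(salaries))
```

Statistical software often interpolates between data items instead of rounding the position, so its quartiles may differ slightly from the values obtained here.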
Now go and do Exercise S6.1
ENDSECTION STARTSECTION=activity_1.htm= SECTION~
Exercise S6.1
The following data set is the age of 25 employees considered in previous lectures. Having been provided with the data set and the stem and leaf plot, calculate the five figure summary.
63 27 46 47 22 64 30 19 69
36 65 60 40 66 55 33 47 42
49 23 22 46 62 30 20
Stem and Leaf Display of Ages of part time employees in years
Count
1 9 (1)
2 0 2 2 3 7 (5)
3 0 0 3 6 (4)
4 0 2 6 6 7 7 9 (7)
5 5 (1)
6 0 2 3 4 5 6 9 (7)
Leaf Unit = 1 year
The measure of location which forms part of the five figure summary is the Median. We indicated earlier that whenever we use a measure of location we need to quote an accompanying measure of dispersion. A five figure summary allows us to calculate two measures of dispersion; the range and the interquartile range.
ENDSECTION STARTSECTION=content_7.htm= SECTION~
Range
The range is simply the difference between the minimum and the maximum values of a data set. The five figure summary quotes the minimum and maximum as two of its values. The five figure summary for example 1 is:
Minimum 4,800
Lower quartile (Q1) 5,520
Median (Q2) 6,767.50
Upper quartile (Q3) 7,320
Maximum 9,500
So the range is evaluated as 9,500 – 4,800 = 4,700.
As you can see the range is very easy to calculate and understand. However, it does have a number of disadvantages.
• As only two values are used to calculate the range it is very sensitive to extreme values (outliers) in the data set.
• The range indicates the variation between the smallest and largest values in the data set but does not tell us how much the values vary from one another.
• The range has no natural partner amongst the measures of location and is not used in further advanced statistical work.
For the above reasons, the range has limited practical use except in the area of quality control.
ENDSECTION STARTSECTION=content_8.htm= SECTION~
Interquartile Range
The Interquartile Range is the difference between the upper and lower quartiles and hence it shows the range of the values in the middle half of the data set.
Interquartile Range = Q3 - Q1.
In the above example : Interquartile Range = 7,320 – 5,520 = 1,800.
• The range is inappropriate as a measure of dispersion when there are extreme values in a data set. As the Interquartile Range only uses the middle 50% of the data the extreme values are eliminated from the calculations and hence the Interquartile Range is not influenced by extreme values.
• The Interquartile Range is the natural partner to the Median. The smaller the Interquartile Range, the less dispersed the data are and the more closely they cluster around the median. So, it could be argued that the smaller the Interquartile Range is, the better the Median is at representing a typical value of the data set.
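The difference in sensitivity between the two measures can be seen in a small Python sketch. It uses the salary data from example 1 and then a hypothetical keying error, invented for this illustration, in which the largest salary is entered as 95,000. The helper names are made up, and the quartiles use the rounded position rule from this unit with the assumption that a position ending in .5 is rounded up.

```python
def quartile(ordered, k):
    """k-th quartile (k = 1 or 3) by the rounded position rule."""
    position = k * (len(ordered) + 1) / 4
    return ordered[int(position + 0.5) - 1]  # assumption: .5 rounds up

def spread(values):
    """Return the range and the interquartile range of a data set."""
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]
    iqr = quartile(ordered, 3) - quartile(ordered, 1)
    return data_range, iqr

salaries = [4800, 5110, 5520, 5570, 6325, 6750,
            6785, 7320, 7320, 7320, 8894, 9500]
# Hypothetical keying error: the largest salary entered as 95,000.
with_error = salaries[:-1] + [95000]

print(spread(salaries))    # (4700, 1800)
print(spread(with_error))  # (90200, 1800)
```

The single extreme value changes the range completely but leaves the interquartile range untouched.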
ENDSECTION STARTSECTION=content_9.htm= SECTION~
Box plots
A Box plot (or box and whiskers plot) is a graphical method for representing a five figure summary. A box plot for the five figure summary in example 1 is;
[Figure: box plot drawn against an axis labelled Age, running from 10 to 70.]
The central rectangle which marks out the two quartiles is called the ‘box’ while the horizontal lines on either side are the ‘whiskers.’ Just by observing the size and balance of the box and the whiskered components we can gain a quick and useful overall impression about how the data is distributed.
Box plots are not as informative as stem and leaf plots or histograms because they do not show the patterns of the data between the points of the five figure summary. That is, you know the range of the middle half of the data but you do not know how the data is spread within this range. Box plots however, are particularly useful for comparing two or more data sets. The following Box plot compares the verbal reasoning scores for students admitted to graduate study in departments in America classified according to the general categories displayed.
[Figure: box plots of GRE verbal scores, on an axis running from 200 to 800, for six categories: All departments, Natural sciences, Engineering, Social sciences, Humanities and arts, Education.]
The centre, spread and range of the distributions of average score are immediately apparent. For example, the scores for Engineering departments are tightly concentrated about a median average score of about 540. The highest median verbal score occurs for students admitted to departments in the Humanities and Arts and the lowest median score for students admitted to Education departments. The interquartile range is about the same for all the categories with the exception of Engineering, where it is smaller. Finally, although there are some differences in overall spread as measured by the range, the median scores do not vary a great deal.
ENDSECTION STARTSECTION=content_10.htm= SECTION~
Mode
The Mode is another example of a measure of location. The mode of a data set is defined to be the value of the most frequently occurring item. Therefore, it could be argued that the mode is the best measure of a typical value for a data set if it quotes the value of the item that occurs the most often.
Example
Consider the following hotel room prices.
£ per night 49 52 55 55 55 55 55 60 69
The mode of this data set is £55 as this is the value which occurs most frequently. This value occurs 5 times and no other value occurs more than once.
This data set has 9 observations, so the median is the 5th observation, which is £55. For this set of data the median and the mode are equal; however, this will not be the case in general.
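For a quick check, the mode and median of the hotel prices can be found with Python's standard library. This is a minimal sketch; the variable names are chosen for the illustration only.

```python
from collections import Counter

prices = [49, 52, 55, 55, 55, 55, 55, 60, 69]  # £ per night

# most_common(1) returns the most frequent value together with its count.
value, count = Counter(prices).most_common(1)[0]

# n = 9 is odd, so the median sits at position (n + 1)/2 = 5.
median = sorted(prices)[(len(prices) + 1) // 2 - 1]

print(value, count, median)  # 55 5 55
```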
ENDSECTION STARTSECTION=content_11.htm= SECTION~
Frequency Tables
Frequency table of discrete data
Finding the mode for a frequency table of discrete values is very easy. The frequency column in the table tells you how many times each data value occurred in the data set. The mode is the value of the data item with the highest frequency. To find the mode look down the frequency column to identify the highest frequency and read off the corresponding modal value.
The following frequency table of the number of children in 23 surveyed families was discussed in the unit Summarising and Presenting Data 1.
Number of children Frequency
0 5
1 6
2 7
3 4
4 2
Total 23
Looking down the frequency column the highest frequency is 7. This tells us that the data item 2 occurred 7 times in the data set and was the most frequently occurring item. So the modal value for this data set is 2 children.
Frequency tables of grouped data
When data is presented in a grouped frequency table it is possible to identify an interval with the highest frequency but it is not really possible to identify a modal value. The following frequency table shows the distance travelled by a group of 120 salespeople.
Distance travelled in Kms Number of salespeople (Frequency)
400 – 419 12
420 – 439 27
440 – 459 34
460 – 479 24
480 – 499 15
500 – 529 8
Total 120
From the table it is clear that the highest frequency is 34, which corresponds to the data interval 440 – 459. So we could say that the modal class for this distribution is 440 – 459, i.e. salespeople are most likely to travel distances between 440 and 459 km.
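Reading the modal class off the table can be mirrored in a short Python sketch. The way the table is stored here, as a dictionary keyed by the class limits, is simply a convenient choice for the illustration.

```python
# Grouped frequency table: (lower, upper) distance in km -> salespeople.
table = {
    (400, 419): 12,
    (420, 439): 27,
    (440, 459): 34,
    (460, 479): 24,
    (480, 499): 15,
    (500, 529): 8,
}

# The modal class is the interval with the highest frequency.
modal_class = max(table, key=table.get)
print(modal_class, table[modal_class], sum(table.values()))
# (440, 459) 34 120
```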
Some text books detail techniques for calculating a modal value for grouped frequency distributions using a formula or a graphical technique from a histogram. Both of these techniques assume that the modal value is in the modal class. There is no reason to suppose that this assumption is true: maybe every data item is different (which is highly likely for a distribution such as the one presented here), or maybe the data item which occurs most frequently isn’t in the modal class. Therefore, I think it is better just to quote a modal class and not attempt to estimate a modal value.
ENDSECTION STARTSECTION=content_12.htm= SECTION~
Notes on the Mode
The mode has a very specific use in statistical summaries of data. It is only used when the purpose of the summary is to say what happens most often in the data set. If this is not the prime objective of the analysis the mode is not used as a regular measure of location for a number of reasons.
• Sometimes a data set does not have a modal value. This is particularly true of continuous data where every single value is likely to be different.
• Not all the data values are used when you calculate the mode so it can’t be used in more advanced statistical work.
• The mode is the only measure of location which you can use with category or attribute data.
• It is possible to have more than 1 mode. For example, the following table has 2 modal values, 2 and 4. If asked for the mode in such a situation you should quote both values (Don’t average them!).
Data 1 2 3 4 5
Frequency 2 6 4 6 5
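Quoting every mode, and not an average of them, can be sketched in Python as follows. The function name modes is invented for this illustration; it takes a frequency table stored as a dictionary of data value to frequency.

```python
def modes(frequency):
    """Return every value that shares the highest frequency."""
    highest = max(frequency.values())
    return [value for value, f in frequency.items() if f == highest]

print(modes({1: 2, 2: 6, 3: 4, 4: 6, 5: 5}))  # [2, 4]
```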
ENDSECTION STARTSECTION=activity_2.htm= SECTION~
Seminar Questions
Seminar Question S6.1
At an inner city hospital there is concern about the high turnover of nurses. A survey was done to determine how long (in months) nurses had been in their current positions. The responses of 20 nurses were
23 2 5 14 25 36 27 42 12 8
7 23 29 26 28 11 20 31 8 36
Another survey was done at the hospital to determine how long (in months) clerical staff had been in their current positions. The responses of 20 clerical staff were
25 22 7 24 26 31 18 14 17 20
31 42 6 25 22 3 29 32 15 72
a) Rank each set of data.
b) Calculate the five figure summary for each set of data.
c) Compare the data sets using box and whiskers plots. (Discuss the location of the median, the location of the middle halves of the data sets etc.). Does the turnover of nursing staff appear to be different from that of clerical staff?
ENDSECTION STARTSECTION=activity_3.htm= SECTION~
Seminar Question S6.2
The following box plot shows the height of 50 women.
[Figure: box plot of the heights, on an axis labelled Height (m) running from 1.55 to 1.75.]
a) From the box plot estimate the values of the five figure summary.
b) Calculate the interquartile range
c) Does the data appear to be evenly spread throughout its range? (HINT: Think about the spread of the first 50% of the data and the last 50% of the data.).
ENDSECTION STARTSECTION=activity_4.htm= SECTION~
Seminar Question S6.3
A sixth form college needs to make a report to the budget committee about the average number of hours a student spends in timetabled classes each week. A student needs to have 12 timetabled hours a week to be classified as full time but all students can participate in up to 20 hours of timetabled classes. A random sample of 40 students yielded the following information about the number of hours they were spending in the classroom each week.
12 12 12 12 12 12 12 12 12 12 13 13 13 13 14 14 14 14 15 15 15 15 15 15 16 16 16 16 17 17 17 17 17 18 18 18 19 19 20 20
a) What is the modal amount of time that students spend in the classroom?
b) Calculate the five figure summary of the length of time students spend in the classroom.
c) Are the median and mode the same?
d) If the budget committee is going to fund the college according to the average amount of time students spend in the classroom, which of these two measures of location do you think the college will use? (This funding structure implies that there will be more money if the classroom time load is higher.)
ENDSECTION STARTSECTION=think_1.htm= SECTION~
How much do you know?
If you have understood the content of this unit you should have knowledge of, and be able to answer any questions relating to, all of the following points:
• what is understood by the terms measure of location and measure of dispersion;
• calculation of a five figure summary;
• compilation of a box plot and their use in comparing data sets;
• calculation of a mode;
• knowledge of what measure of dispersion is the partner to the median.
Extra Activities
Log on to the STX1110 Oasis page and attempt the quizzes in the extra section.
Do you want to know more?
For further examples and exercises, refer to Essential Quantitative Methods for Business, Management and Finance, third edition, by Les Oakshott, or any of the books on the STX1110 reading list.
S7 Numerical Summaries of Data 2 Page 91
ENDSECTION ENDCHAPTER
CHAPTER=NSD2
STARTSECTION=scope_1.htm= SECTION~
Numerical Summaries of Data 2 Context
In the previous unit we explained what is understood by the statistical terms measure of location and measure of dispersion. We discussed the median, mode, range and interquartile range. In this unit we will be considering the mean and standard deviation, which are the numerical summary statistics most commonly used in statistical analyses.
ENDSECTION STARTSECTION=scope_2.htm= SECTION~
Objectives
Having worked through this unit, you should be able to:
• calculate a mean and discuss its strengths and weaknesses;
• understand that the mean can be distorted by extreme values in a data set;
• calculate a standard deviation and appreciate that it is the partner measure of dispersion to use with the mean;
• appreciate that in many situations the three measures of location we have considered will all give different values, and understand which measure may be more appropriate.
ENDSECTION STARTSECTION=content_1.htm= SECTION~
Mean
This is the summary statistic which should be familiar to you. It is calculated by dividing the sum of the observations by the number of observations. Notation:
$$\bar{x} = \frac{\sum x}{n}$$
In this notation:
• The data set is referred to by the letter x.
• Putting a line, or bar, above x is the standard method of denoting a mean. So $\bar{x}$ denotes the mean of the data set x. If we had a second data set we could refer to it using the letter y, and its mean would be denoted by $\bar{y}$.
• The statistical notation for the mean of a set of data uses the symbol ∑ (sigma). ∑ means ‘the sum of’ and is used as shorthand to represent the ‘sum of a set of values’. So $\sum x$ represents the sum of the data set x.
• n represents the number of data items in the data set in question.
Example 1 Returning to the data set of the age of 25 employees which we have used in previous units.
63 27 46 47 22 64 30 19 69
36 65 60 40 66 55 33 47 42
49 23 22 46 62 30 20
To calculate the mean of this data set we first add up the data:
$$\sum x = 63 + 27 + 46 + 47 + \dots = 1083.$$
This data set has 25 data items, so $n = 25$.
With these two numbers the mean of the data set is calculated as
$$\bar{x} = \frac{\sum x}{n} = \frac{1083}{25} = 43.32 \text{ years}.$$
Note that no-one actually had an age of 43.32 years. The mean is merely representative of how old an employee will typically be.
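The calculation above can also be checked with a few lines of Python. This is only an illustrative sketch: the variable names (ages, total, n, x_bar) are our own choices and are not part of the module materials.

```python
# Ages of the 25 employees from Example 1.
ages = [63, 27, 46, 47, 22, 64, 30, 19, 69,
        36, 65, 60, 40, 66, 55, 33, 47, 42,
        49, 23, 22, 46, 62, 30, 20]

total = sum(ages)   # the sum of the data set (sigma x)
n = len(ages)       # the number of data items
x_bar = total / n   # the mean: sum divided by count

print(total, n, x_bar)  # 1083 25 43.32
```

Python's built-in statistics.mean function gives the same result for this list.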
Now go and do Exercise S7.1
ENDSECTION STARTSECTION=activity_1.htm= SECTION~ Exercise S7.1
The numbers of issues of a particular monthly magazine read by each of 20 people in a year were as follows:
0 1 11 0 0 0 2 12 0 0
12 1 0 0 0 0 12 0 11 0
i) Calculate the median, mode and mean of this data set.
ii) To what extent do these three values provide an adequate summary of the data set? What are the most important features of the data?
ENDSECTION STARTSECTION=content_2.htm= SECTION~
Notes about the mean
Advantages
• The mean is easy to calculate and widely understood as an average value.
• All of the values in the data set are used to calculate the mean so it is representative of the whole data set.
• It is supported by mathematical theory and is suited to further statistical analysis.
Disadvantages
• Its value may not correspond to any actual value. For example, the average family may have 2.3 children but no family can have exactly 2.3 children.
• The mean may be distorted by high or low values in a data set. For example, the mean of the numbers, 100, 105, 110 and 110 is 106.25. However, the mean of the numbers 100, 105, 110, 110 and 500 is 185. The high value of 500 distorts the mean and in some cases the mean would be a misleading and inappropriate figure. Extreme values are not uncommon in financial data!
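The effect of a single extreme value, described in the last bullet point, can be seen by recomputing the mean with and without the value 500. The sketch below also computes the median for comparison; the medians are our own addition (they are not quoted in the text above) and the variable names are illustrative.

```python
from statistics import mean, median

values = [100, 105, 110, 110]
with_extreme = values + [500]   # the same data plus one extreme value

# The mean moves a long way when the extreme value is added...
print(mean(values), mean(with_extreme))      # 106.25 185
# ...whereas the median barely changes.
print(median(values), median(with_extreme))  # 107.5 110
```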
In the seminar we will be looking at how you calculate the mean of a frequency distribution of discrete data. We will not be considering the mean of grouped frequency distributions on this module. Chapter 6 of Essential Quantitative Methods for Business, Management and Finance, third edition, by Les Oakshott covers both of these situations.
ENDSECTION STARTSECTION=content_3.htm= SECTION~
Standard Deviation
As we already know, along with a measure of location we need a measure of dispersion which tells us how tightly the data cluster around the measure of location. If the measure of location being used is the mean, the appropriate measure of dispersion is the standard deviation. The standard deviation is based on measuring how far each data item deviates from the mean. There are various ways that you could do this but the standard deviation is the method most commonly used.
Rationale for the Standard Deviation
Before we get into calculating a standard deviation we’ll use the following set of numbers to try and explain the rationale behind the formula used in its calculation.
Suppose we have the 5 numbers
£1 £2 £3 £4 £5
It’s very easy to calculate the mean of this data set as $\bar{x}$ = £3.
The measure of dispersion that we need to quote along with this mean should reflect how close the individual data items are to the mean. So the obvious thing to do is calculate how far each data item is away from the mean (we’ll ignore the units for a while so that the calculations are less cluttered).
$x - \bar{x}$:
1 – 3 = –2 2 – 3 = –1 3 – 3 = 0 4 – 3 = 1 5 – 3 = 2
These numbers (–2, –1, 0, 1, 2) are the deviations, $(x - \bar{x})$, of each data item from the mean. We’re not interested in the individual deviations of each data item from the mean. What we want is an idea of a typical or average deviation of a data item from the mean. So what we need to do is calculate an average of these deviations.
Average Deviation = $\frac{(-2) + (-1) + 0 + 1 + 2}{5} = \frac{0}{5} = 0$
This average has come out to be 0 which would imply that there is no deviation (space) between the mean value and the data items. This could only happen if all values in the data set were the same. Clearly this is not the case! The problem is that the negative deviations (e.g. –2) and positive deviations (e.g. 2) have cancelled out in this calculation. The deviation is negative if the data item is less in value than the mean and it is positive otherwise. When looking at the measure of spread we don’t actually need to worry if the data item is less or more in value than the mean, we just want to know how far away from the mean it is. In other words we can ignore the minus signs (–) in the above lists of deviations; the deviations would then be
2 1 0 1 2
And the average of these deviations would be
Average Deviation = $\frac{2 + 1 + 0 + 1 + 2}{5} = \frac{6}{5} = 1.2$.
This average deviation is referred to as Mean Deviation and is a valid measure of dispersion. It tells us that, on average, the data items are 1.2 units away from the mean. The units in this case are £.
Refer to chapter 6 of Essential Quantitative Methods for Business, Management and Finance, third edition, by Les Oakshott for further discussion of the mean deviation.
The mean deviation is the ideal measure of dispersion in terms of describing how the data spread about the mean. However, this process of ignoring the minus signs, although easy to do, is mathematically difficult to manipulate so the mean deviation is not suitable for further statistical analysis. We were ignoring the minus signs to get around the problem of negative and positive deviations cancelling each other out when we calculated the average deviation. Is there something else we could have done to deal with this problem? We could have squared each deviation instead because when you multiply two negative numbers together the answer is positive. So let’s list the deviations again. Notice I’ve included the units (£) again as we’ll need them in a minute to explain part of the rationale of the calculation of a standard deviation.
$x - \bar{x}$:
£1 – £3 = –£2 £2 – £3 = –£1 £3 – £3 = £0 £4 – £3 = £1 £5 – £3 = £2
Instead of ignoring the minus signs we will now square each deviation instead. This gives us:
$(x - \bar{x})^2$:
(–£2)² = 4 (£²) (–£1)² = 1 (£²) (£0)² = 0 (£²) (£1)² = 1 (£²) (£2)² = 4 (£²)
If we now calculate the average of these squared deviations we get
$$\frac{\sum (x - \bar{x})^2}{n} = \frac{(4 + 1 + 0 + 1 + 4)\,(\text{£}^2)}{5} = \frac{10\,(\text{£}^2)}{5} = 2\,(\text{£}^2).$$
This average of the squared deviations is called the variance and is a valid measure of dispersion.
ENDSECTION STARTSECTION=content_4.htm= SECTION~
Variance and Standard Deviation
Variance = $\frac{\sum (x - \bar{x})^2}{n}$
The problem with using this as a measure of dispersion to go with the mean is that it has different units. The units of the mean are £ and the units of the variance are £². We get around this problem by taking the square root of the variance, which gives us the standard deviation.
Standard Deviation = $\sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
In the above example the standard deviation would be
$$\sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{10\,(\text{£}^2)}{5}} = \sqrt{2\,(\text{£}^2)} = \text{£}1.4142.$$
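The whole chain of reasoning for the five values £1 to £5 (deviations, mean deviation, variance, standard deviation) can be retraced in a short Python sketch. The variable names are illustrative only, and the divisor is n, matching the definition used in these notes.

```python
from math import sqrt

data = [1, 2, 3, 4, 5]          # the five values, in pounds
n = len(data)
x_bar = sum(data) / n           # mean = 3.0

# Deviation of each data item from the mean.
deviations = [x - x_bar for x in data]

# Averaging the raw deviations always gives 0: negatives cancel positives.
average_deviation = sum(deviations) / n

# Ignoring the minus signs gives the mean deviation.
mean_deviation = sum(abs(d) for d in deviations) / n

# Squaring the deviations instead gives the variance (divisor n).
variance = sum(d ** 2 for d in deviations) / n

# The square root of the variance is the standard deviation.
standard_deviation = sqrt(variance)

print(average_deviation, mean_deviation, variance)  # 0.0 1.2 2.0
print(round(standard_deviation, 4))                 # 1.4142
```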
Notes about the standard deviation:
• The standard deviation is by far the most important of the measures of dispersion but its importance is due to its mathematical properties rather than its descriptive properties. However, the more the individual data items differ from the mean, the greater will be $(x - \bar{x})^2$ and so the standard deviation will be larger. Hence the greater the dispersion, the larger the standard deviation will be.
• The standard deviation is the natural partner to the mean as the mean is used in its calculation.
• The standard deviation is truly representative of the data set as all the data items are used in its calculation.
• The standard deviation gives too much importance to extreme values in the data set and hence is a distorted measure of dispersion when extreme values exist in the data set.
• Many people use definitions of the variance and standard deviation which are slightly different from the one given above. They use the following formula:
Variance = $\frac{\sum (x - \bar{x})^2}{n - 1}$.
This will obviously give a different numerical result. However the reasons for dividing by n – 1 instead of n are related to sampling theory and cannot be easily understood in the context of describing the dispersion of a single data set. For large data sets the two formulas will produce similar results. However, you should be aware that many (but not all!) computer packages divide by n – 1 when working out the variance and the standard deviation. Similarly a standard deviation button on a calculator may divide by n – 1. It doesn’t actually matter which definition you use as long as you stick to the same one when comparing two or more data sets.
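As a concrete illustration of the two definitions, Python's standard statistics module provides both: pvariance and pstdev divide by n (the definition used in these notes), while variance and stdev divide by n – 1. The sketch below applies them to the five values from the earlier example; the variable names are our own.

```python
from statistics import pvariance, pstdev, variance, stdev

data = [1, 2, 3, 4, 5]

# Divisor n: the definition used in these notes.
var_n = pvariance(data)
sd_n = pstdev(data)

# Divisor n - 1: the definition used by many packages and calculators.
var_n_minus_1 = variance(data)
sd_n_minus_1 = stdev(data)

print(var_n, round(sd_n, 4))                  # 2 1.4142
print(var_n_minus_1, round(sd_n_minus_1, 4))  # 2.5 1.5811
```

With only five data items the two answers differ noticeably; the gap narrows as the number of data items grows.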
ENDSECTION STARTSECTION=content_5.htm= SECTION~
Hand Calculation of the Standard Deviation
Unless all the observations and their mean are reasonably small integers, hand calculation of the variance can be long and messy. For this reason it is usual to use the fact that
Variance = $\frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \bar{x}^2$.
This is just a mathematical result which can be derived using algebra. Using this formula you calculate the squares of the original data values rather than the squared deviations.
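For readers who would like to see the algebra, one way of deriving the result is sketched below. It expands the square and uses the fact that $\sum x = n\bar{x}$, which follows directly from the definition of the mean; no new notation is introduced.

```latex
\begin{align*}
\frac{\sum (x - \bar{x})^2}{n}
  &= \frac{\sum \left( x^2 - 2\bar{x}x + \bar{x}^2 \right)}{n} \\
  &= \frac{\sum x^2}{n} - 2\bar{x}\,\frac{\sum x}{n} + \frac{n\bar{x}^2}{n} \\
  &= \frac{\sum x^2}{n} - 2\bar{x}^2 + \bar{x}^2 \\
  &= \frac{\sum x^2}{n} - \bar{x}^2 .
\end{align*}
```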
Returning to the above example
£1 £2 £3 £4 £5
We already know that the mean of this data set is £3.
To use the hand calculation formula we now need to calculate the squared data items and add them up.
$x^2$:
1² = 1 (£²) 2² = 4 (£²) 3² = 9 (£²) 4² = 16 (£²) 5² = 25 (£²)
$\sum x^2 = 1 + 4 + 9 + 16 + 25 = 55\ (\text{£}^2)$.
The variance is now calculated as
$$\frac{\sum x^2}{n} - \bar{x}^2 = \frac{55\,(\text{£}^2)}{5} - (\text{£}3)^2 = 11\,(\text{£}^2) - 9\,(\text{£}^2) = 2\,(\text{£}^2).$$
The standard deviation is then the square root of the variance:
Standard deviation = $\sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{2\,(\text{£}^2)} = \text{£}1.4142$.
So we get exactly the same answer as we had before. Now go and do Exercise S7.2
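The hand calculation formula translates directly into code. The following sketch, with illustrative variable names, squares the original data values rather than the deviations and arrives at the same variance and standard deviation for the values £1 to £5.

```python
from math import sqrt

data = [1, 2, 3, 4, 5]
n = len(data)
x_bar = sum(data) / n                      # mean = 3.0

sum_of_squares = sum(x ** 2 for x in data)  # 1 + 4 + 9 + 16 + 25 = 55

# Hand calculation formula: (sum of x squared) / n minus the mean squared.
variance = sum_of_squares / n - x_bar ** 2  # 11 - 9 = 2
standard_deviation = sqrt(variance)

print(sum_of_squares, variance, round(standard_deviation, 4))  # 55 2.0 1.4142
```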
ENDSECTION STARTSECTION=activity_2.htm= SECTION~
Exercise S7.2
Calculate the standard deviation for the data set concerning hotel room prices which we used in the last unit.
£ per night
49 52 55 55 55 55 55 60 69
We showed in the last unit that the median and mode for this data set are both £55. If you do the mean calculation it gives you $\bar{x}$ = £56.11. Using this value you now need to calculate the variance using the technique described above.
ENDSECTION STARTSECTION=content_6.htm= SECTION~
Coefficient of Variation
Sometimes we need to compare two data sets to see if one set of numbers is more variable than another.
Example
Consider the two following sets of summary statistics
                              Data Set 1    Data Set 2
Mean ($\bar{x}$)              10            100
Standard Deviation (s)        2.5           25
Which of the data sets is more variable?
Comparing the standard deviations you would conclude that data set 2 is more variable as it has the higher standard deviation. However, the two data sets relate to sets of figures of quite different orders of size which is clearly demonstrated when you compare the mean values (10 compared to 100).
We can obtain some idea of the degree of relative variability if we relate the size of the variation to the average of the figures it was derived from. This leads to the coefficient of variation.
Coefficient of Variation = $\frac{s}{\bar{x}} \times 100$
where s stands for the standard deviation. Returning to the summary statistics for these two data sets.
                              Data Set 1                 Data Set 2
Mean ($\bar{x}$)              10                         100
Standard Deviation (s)        2.5                        25
Coefficient of Variation      (2.5 / 10) × 100 = 25      (25 / 100) × 100 = 25
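The two coefficients of variation can be reproduced with a small helper function. This is a sketch only: the function name coefficient_of_variation is our own and is not taken from any particular package.

```python
def coefficient_of_variation(standard_deviation, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return standard_deviation / mean * 100

cv_1 = coefficient_of_variation(2.5, 10)   # Data Set 1
cv_2 = coefficient_of_variation(25, 100)   # Data Set 2

print(cv_1, cv_2)  # 25.0 25.0
```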
So, using the coefficient of variation, both data sets are equally variable. ENDSECTION STARTSECTION=content_7.htm= SECTION~
Comparing Mean, Median and Mode
As we have seen during the last few units, for a given data set the mean, median and mode may all be different. This raises the obvious question of which we should use in a given situation.
If the distribution of a set of data is symmetric with a single peak, the mean, median and mode are all the same.
[Figure: Symmetric distribution (zero skewness). Frequency plotted against Years (17 to 23); mode = mean = median = 20.]
The median and mean are both located in the middle of this distribution. The modal value is the one with the highest frequency, so it will be the value corresponding to the highest point of the above curve and so the mode is also the same as the median and the mean.
In practice, the frequency distribution will tend to lean in one direction or the other because the data set will either have predominantly low values with a few high values or vice versa. This is called skew.
If a data set is made up predominantly of small values with a few high extreme values the distribution would resemble the one below. The mode is again located under the highest part of the curve so would be on the left of the median. The mean is distorted by the high values and would be bigger in value than the median.
This is known as right skew or positive skew.
[Figure: Distribution skewed to the right (positively skewed). Frequency plotted against Weekly income (£0 to £1800); mode = £300, median = £510, mean = £600.]
If instead the data is largely made up of high values with a few low extreme values, the mean will be distorted by the low values and be less in value than the median. The mode is again under the highest part of the curve so the mode is bigger than the median.
This is known as left skew or negative skew.
[Figure: Distribution skewed to the left (negatively skewed). Frequency plotted against Tensile strength (0 to 900); mode = 750, median = 645, mean = 600.]
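The relationship between the mean and the median described above suggests a simple rule of thumb for judging the direction of skew from summary statistics alone. The sketch below applies it to the values quoted for the three distributions; the function name skew_direction is hypothetical, and the rule is only a rough guide rather than a formal measure of skewness.

```python
def skew_direction(mean, median):
    """Rule of thumb: the mean is pulled towards the extreme values."""
    if mean > median:
        return "right (positive) skew"
    if mean < median:
        return "left (negative) skew"
    return "no skew indicated"

print(skew_direction(20, 20))    # no skew indicated
print(skew_direction(600, 510))  # right (positive) skew
print(skew_direction(600, 645))  # left (negative) skew
```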
So if it is possible for all three values to be different, which one should we use? The following points may be useful as a guide to which measure of location to use.
• The mode is only used when you want to talk about the most frequently occurring data item.
• If you’re not using a mode then generally you could use a median or a mean. Most usually, you would use the mean as a measure of location unless you are in the position described in the next bullet point.
• If you have either a few extremely low or extremely high data values the mean may provide a distorted measure of location. So in this situation it may be better to quote the median instead of the mean as the measure of location.
• When you want to compare two or more data sets you will usually use a mean. Only a mean uses all the data items in its calculation so only the mean will really reflect differences in the data sets. The only exception to this is when you are comparing data sets using box plots to make a quick comparison, as the median is a fundamental part of a box plot.
Having decided which measure of location to use, which measure of dispersion should you use?
You should never really be in the position where you are trying to answer this question. Measures of dispersion form natural partners with the measures of location. So the choice of the measure of dispersion is straightforward:
• use a range and interquartile range if you’re quoting the median as the measure of location;
• use a standard deviation if you’re quoting the mean as the measure of location.
ENDSECTION STARTSECTION=activity_3.htm= SECTION~
Seminar Questions
Seminar Question S7.1
Jane purchased a new home computer and has been having trouble with voltage spikes on the power line. Such voltage jumps can be caused by the operation of appliances such as clothes dryers and electric irons or just by a power surge on the outside power line. The following data was obtained about voltages when certain appliances were turned on and off. (The normal voltage in the UK is 240.)
146 280 156 284 160 280 180 266
i) Compute the mean, standard deviation, coefficient of variation and range of this data set.
Jane was advised to buy a power surge protector which protects the computer from strong voltage spikes. The voltages were again measured when certain appliances were turned off and on. The results were
200 240 216 228 210 234 206 228
ii) Compute the mean, standard deviation, coefficient of variation and range of the voltages using the power surge protector.
iii) Compare your answers to parts i) and ii). Were the means about the same? Were the voltage distributions different with and without the power surge protector? How did the standard deviation, coefficient of variation and range reflect this when the mean did not?
ENDSECTION STARTSECTION=activity_4.htm= SECTION~
Seminar Question S7.2
A production department uses a sampling procedure to test the quality of newly produced items. The department employs the following decision rule at an inspection station: If a sample of 14 items has a variance of more than 0.005, the production line must be shut down for repairs. Suppose the following data has just been collected:
3.43 3.45 3.43 3.48 3.52 3.50 3.39
3.48 3.41 3.38 3.49 3.45 3.51 3.50
Should the production line be shut down? Explain your answer. ENDSECTION STARTSECTION=activity_5.htm= SECTION~
Seminar Question S7.3
The following data sets show the number of days of work missed by employees in two departments of the same company.
Department A : 20 employees
Number of days missed by each employee in one year.
0 0 0 0 1 1 1 2 2 2
3 3 3 5 5 5 8 10 15 95
Department B : 30 employees
Number of days missed by each employee in one year.
2 2 2 2 2 2 3 3 3 4
4 5 5 5 6 6 7 7 7 7
8 8 8 8 8 8 10 10 12 15
For each department calculate the five figure summary, the mean and the standard deviation of these absenteeism figures. Describe any difference between the two departments.
ENDSECTION STARTSECTION=activity_6.htm= SECTION~
Seminar Question S7.4
This question will take you through the process of calculating the mean for a frequency table of discrete data.
The following table summarises how many questions a group of 11 students answered correctly on a diagnostic test.
Number of correct questions    Number of students
12                             3
13                             4
14                             2
15                             2
i) Using this table write down the raw data set, i.e. there are 3 occurrences of the item 12 etc so the raw data set will begin as 12 12 12 etc
ii) Write down, in longhand, the calculation of the mean; i.e. on the top line of the formula write down the expression which adds up all the numbers, i.e. 12+12+12 etc.
iii) Is there a quicker way of writing down the expression on the top line? (HINT: use multiplication.)
iv) Can you now see how this quicker expression relates to the frequency table above?
ENDSECTION STARTSECTION=activity_7.htm= SECTION~
Seminar Question S7.5
A production process randomly selects 30 boxes of components for inspection during its quality control process. Each box contains 20 items and the quality control process counts how many defective components are in each of the 30 boxes. If, on average, there are more than 2 defective components in each box the process is shut down. During the first quality inspection of the day the number of defective components found in the 30 sample boxes is summarised below.
Number of defective components Number of boxes
0 9
1 11
2 5
3 3
4 2
i) Write down the data set and show that the mean, median and mode of the number of defective components are 1.3, 1 and 1 respectively.
ii) Five hours later the quality inspection is repeated and the number of defective components found in 30 boxes is summarised below.
Number of defective components Number of boxes
0 6
1 10
2 5
3 5
4 2
5 2
The median and mode of this data set are both 1 and the mean is 1.8. Comment on the strengths and weaknesses of using each of these measures of location in this situation.
ENDSECTION STARTSECTION=think_1.htm= SECTION~
How much do you know?
If you have understood the content of this unit you should have knowledge of, and be able to answer any questions relating to, all of the following points:
• calculating a mean;
• calculating a standard deviation;
• using the coefficient of variation to compare the spread of two data sets as measured by the standard deviation;
• appreciation of why the values for mean, median and mode can all be different for some data sets and the concept of skew;
• understanding of which measure of location is appropriate in a given situation;
• knowledge of which measure of dispersion to use given you have selected an appropriate measure of location.
Extra Activities
Log on to the STX1110 Oasis page and attempt the quizzes in the extra section.
A guide on how to find the STX1110 Oasis quizzes is on page 16 of this book.
Do you want to know more?
For further examples and exercises, refer to Essential Quantitative Methods for Business, Management and Finance, third edition, by Les Oakshott, or any of the books on the STX1110 reading list.
ENDSECTION ENDCHAPTER
CHAPTER=CR1
STARTSECTION=scope_1.htm= SECTION~
Correlation and Regression 1 Context
The statistical techniques we have considered so far in this module have concentrated on methods of describing a single variable (set of data). We now turn our attention to methods which will allow us to examine two variables to see to what extent they are related. This is called bivariate analysis.
It is often the case that two variables are related, i.e. changes in one variable are accompanied by changes in the other variable. Bivariate analysis is concerned with assessing how strong the relationship between two variables is (correlation) and modelling that relationship (regression).
ENDSECTION STARTSECTION=scope_2.htm= SECTION~
Objectives
Having worked through this unit, you should be able to:
• understand the concept of correlation;
• construct and use a scatter plot to assess the existence of a linear relationship between two variables;
• know the difference between positive and negative correlation;
• calculate and interpret the product moment correlation coefficient;
• calculate and interpret Spearman’s rank correlation coefficient;
• be aware of the problems of spurious correlation.
ENDSECTION STARTSECTION=content_1.htm= SECTION~
Correlation Analysis
The basic idea of correlation analysis is to assess the strength of the linear (straight line) association that may exist between two variables.
Scatter Plot
A useful method of investigating if there is a relationship between two variables is to draw a scatter plot. Plot one of the variables along the x axis and the other one along the y axis and examine the resulting scatter plot for any pattern. The correlation analysis we are considering assesses the strength of a linear relationship so we are hoping to see the pattern in the scatter plot suggesting a straight line relationship between the two variables.
Example
A sales manager has collected information for ten of his staff relating to their length of experience in years and their annual sales. The data that was collected is shown below.
Experience (in years) Annual sales (£000’s)
1 40
2 49
3 46
4 51
5 52
6 56
7 60
8 62
9 59
10 68
In this example we would expect the length of experience to explain the annual sales. So we will make the length of experience the explanatory variable and the annual sales the response variable (we discussed the ideas of explanatory and response variables in the unit Summarising and Presenting Data 1). Having decided this we will produce a scatter plot with the explanatory variable (years of experience) on the horizontal (x) axis and response variable (annual sales) on the vertical (y) axis.
N.B. It doesn’t matter which way round we plot the variables for correlation analysis, i.e. we don’t need to decide which variable is the explanatory and which is the response at this stage. However, we will need to be able to make this distinction in the next unit.
A scatter plot of this data set is as follows:
[Figure: Scatter plot of Annual sales (£000s), on a vertical axis from 0 to 80, against Years of experience, on a horizontal axis from 0 to 12.]
To produce this plot we begin with the first pair of observations. This relates to an individual with 1 year of experience who made 40 thousand pounds worth of sales. To plot this point move along the horizontal axis to x = 1, then go vertically to y = 40 and place a dot at the intersection. This process is repeated for the remaining pairs of data.
This scatter plot does suggest that there is some relationship or correlation between length of service and annual sales. As the length of service increases, the annual sales also increase. Furthermore, the points seem to follow the pattern of a straight line. This means that the correlation analysis techniques that we will be discussing can be used to assess the strength of this relationship.
ENDSECTION STARTSECTION=content_2.htm= SECTION~
Degrees of Correlation
The relationship between two variables can be classified as one of the following.
• A perfect relationship (perfect correlation) • A partial relationship (partly correlated) • No relationship (uncorrelated)
Furthermore, the relationship can be described as positive or negative.
Positive correlation
As one variable increases, so does the other. Low values of one variable are associated with low values of the other variable and high values of one variable are associated with high values of the other.
Negative correlation
As one variable increases the other one decreases. So high values of one variable are associated with low values of the other variable.
All these differing degrees of correlation and the positive or negative nature of the relationship can be illustrated using scatter plots.
[Figure: Perfect negative correlation. Plot of y against x; the line has negative slope, r = –1.]
[Figure: Perfect positive correlation. Plot of y against x; the line has positive slope, r = 1.]
If the relationship between the two variables is perfect then all the values lie on a straight line. An exact linear relationship exists between the two variables. If the line slopes down the relationship is negative, if the line slopes up the relationship is positive.
As the points begin to move away from the line the relationship gets weaker. The following two plots indicate what you can expect to see on a scatter plot as the relationship between the two variables weakens. These plots demonstrate partial correlation which is what you’re most likely to meet in practice.
[Figure: Weak negative correlation (x and y linearly related to some extent). Quantity sold (y) plotted against Price (x).]
[Figure: Strong positive correlation (x and y linearly related to a large extent). College GPA (y) plotted against School GPA (x).]
If there is no relationship between the two variables then the scatter plot will resemble a random scatter of points as shown below.
[Figure: Zero correlation, r = 0 (x and y not linearly related). Income (y) plotted against Height (x).]
ENDSECTION STARTSECTION=content_3.htm= SECTION~
Product Moment Correlation Coefficient
The degree of the relationship between the two variables can be measured and we can decide numerically if the relationship is perfect or partial. If two variables are partially correlated we can decide if the relationship is strong or weak.
The degree of correlation is measured by the product moment correlation coefficient which we will denote by the letter r.
The correlation coefficient will always be a number between –1 and +1. If you get a value outside of this range you have made a mistake.
• If r = +1 we have perfect positive correlation.
• If 0 < r < 1 we have partial positive correlation.
• If r = 0 we have no correlation.
• If –1 < r < 0 we have partial negative correlation.
• If r = –1 we have perfect negative correlation.
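The classification above can be written as a small function, which also rejects values outside the range –1 to +1. This is a sketch for illustration; the function name describe_r and the wording of the error message are our own.

```python
def describe_r(r):
    """Classify a correlation coefficient using the rules listed above."""
    if r < -1 or r > 1:
        # A value outside this range means a mistake has been made.
        raise ValueError("r must lie between -1 and +1")
    if r == 1:
        return "perfect positive correlation"
    if r == -1:
        return "perfect negative correlation"
    if r == 0:
        return "no correlation"
    if r > 0:
        return "partial positive correlation"
    return "partial negative correlation"

print(describe_r(0.96))  # partial positive correlation
print(describe_r(-1))    # perfect negative correlation
```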
The correlation coefficient is calculated using the following formula
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)\left(n\sum y^2 - \left(\sum y\right)^2\right)}}$$
The letters x and y represent the pairs of data for the two variables and n is the number of pairs of data used in the analysis.
The formula may look complicated but all it involves is calculating relevant sums and substituting them into the formula.
Example continued
In this example, the length of experience is referred to as x and the annual sales is referred to as y. The formula requires us to calculate 5 sums:
• ∑x. This is calculated by adding up the variable referred to as x, experience.
• ∑y. This is calculated by adding up the variable referred to as y, annual sales.
• ∑x². This is calculated by first squaring each value of the variable x and then adding these values up.
• ∑y². This is calculated by first squaring each value of the variable y and then adding these values up.
• ∑xy. This is calculated by multiplying each value of x by its corresponding y value and adding the results up.
ENDSECTION STARTSECTION=content_4.htm= SECTION~
Product Moment Correlation Coefficient Continued
The best way to perform the calculations is to set them out in a table.
Experience (x)   Annual Sales (y)   x²           y²              xy
1                40                 1×1 = 1      40×40 = 1600    1×40 = 40
2                49                 2×2 = 4      49×49 = 2401    2×49 = 98
3                46                 9            2116            138
4                51                 16           2601            204
5                52                 25           2704            260
6                56                 36           3136            336
7                60                 49           3600            420
8                62                 64           3844            496
9                59                 81           3481            531
10               68                 100          4624            680
∑x = 55          ∑y = 543           ∑x² = 385    ∑y² = 30107     ∑xy = 3203
Having calculated the sums we now need to substitute them into the formula.
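$$
r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)\left(n\sum y^2 - \left(\sum y\right)^2\right)}}.
$$

We have 10 pairs of data in this example so n = 10. Substituting this value for n and the sum (∑) values from the table gives:

$$
r = \frac{10 \times 3203 - 55 \times 543}{\sqrt{\left[10 \times 385 - (55)^2\right]\left[10 \times 30107 - (543)^2\right]}}
$$

$$
r = \frac{32030 - 29865}{\sqrt{\left[3850 - 3025\right]\left[301070 - 294849\right]}} = \frac{2165}{\sqrt{825 \times 6221}}
$$

$$
r = \frac{2165}{\sqrt{5132325}} = \frac{2165}{2265.4635} = 0.96.
$$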
This confirms what we saw on the scatter plot. The correlation between years of experience and annual sales is positive. So as a member of staff gains more experience their sales increase. The relationship is also very strong as r=0.96 is quite close to 1. This is substantiated by the points on the graph following the pattern of a straight line quite closely.
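If you want to check the hand calculation, the same steps can be sketched in a few lines of Python. This is only an illustration: the function name product_moment_r and the variable names are chosen here and are not part of the module notes. It computes the five sums and substitutes them into the formula for r.

```python
from math import sqrt

# Data from the worked example: years of experience (x) and annual sales in £000s (y).
experience = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sales = [40, 49, 46, 51, 52, 56, 60, 62, 59, 68]


def product_moment_r(x, y):
    """Product moment correlation coefficient, built from the five sums."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = product_moment_r(experience, sales)
print(round(r, 2))  # 0.96
```

Laying the calculation out as named sums mirrors the columns of the table, so each intermediate value can be compared with the hand-worked totals.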
Now go and do Exercise S8.1
ENDSECTION STARTSECTION=activity_1.htm= SECTION~
Exercise S8.1
A large industrial plant has seven divisions that do the same type of work. A safety inspector visits each division of 20 workers regularly. The number of work hours devoted to safety training and the number of work hours lost due to industry related accidents are recorded for each separate division in the following table and scatter plot.
Hours in safety training      10.0  19.5  30.0  45.0  50.0  65.0  80.0
Hours lost due to accidents   80    65    68    55    35    10    12
[Figure: scatter plot of number of hours lost due to accidents (y axis) against number of hours of safety training (x axis)]
i) What does this information tell you about the relationship between safety training and the number of accidents?
ii) If x represents hours in safety training and y represents hours lost due to accident, calculate the product moment correlation coefficient. You may find it easier to make use of the following table.
x      y     x²    y²    xy
10.0   80
19.5   65
30.0   68
45.0   55
50.0   35
65.0   10
80.0   12
∑x =   ∑y =
ENDSECTION STARTSECTION=content_5.htm= SECTION~
Spearman's Rank Correlation Coefficient
In the example and exercise above the data were given in terms of the values of the relevant variables. So in the example we knew how many years of experience the salesmen had and the actual value of their annual sales. Sometimes, however, the data may be in terms of the order or rank of the data rather than actual values. So in the example we could have been given the salesmen's ranks in terms of length of service and their sales ranked according to who sold the most. So the salesman with the highest sales would get a rank of 1, the next highest would get a rank of 2 and so on. When this happens the product moment correlation coefficient is no longer appropriate and a correlation coefficient known as Spearman’s Rank Correlation Coefficient, rS, should be calculated using the following formula.
$$
r_s = 1 - \left[\frac{6 \times \sum d^2}{n\left(n^2 - 1\right)}\right]
$$
where n = number of pairs of data as in the product moment correlation coefficient.
d = the difference between the rankings in each set of data.
As with the product moment correlation coefficient, Spearman’s Rank Correlation Coefficient, rs, will be a number between –1 and +1 and it is interpreted in the same way.
Example
The following data set shows the placing of seven students in their statistics and economics examinations. A 1 indicates the student who performed the best.
Student A B C D E F G
Statistics placing 2 1 4 6 5 3 7
Economics placing 1 3 7 5 6 2 4
From the table it is clear that Student B produced the best performance on the Statistics examination and student G produced the worst performance. Similarly, Student A produced the best performance on the Economics paper and student C produced the worst performance. What we now want to know is if there is any relationship between the placing on the two papers. In other words, if you do well in statistics are you also likely to do well in economics and vice versa.
To do this we will calculate Spearman’s Rank Correlation Coefficient, rS. The first thing we need to do is calculate the differences in the ranks and square these differences, which we will do in a table.
Student   Statistics Rank   Economics Rank   d               d²
A         2                 1                (2 – 1) = 1     1² = 1
B         1                 3                (1 – 3) = –2    (–2)² = 4
C         4                 7                –3              (–3)² = 9
D         6                 5                1               1
E         5                 6                –1              1
F         3                 2                1               1
G         7                 4                3               9
                                                             ∑d² = 26
Then we need to substitute ∑d² and n into the formula for Spearman’s Rank Correlation Coefficient.

$$
r_s = 1 - \left[\frac{6 \times \sum d^2}{n\left(n^2 - 1\right)}\right]
$$

$$
r_s = 1 - \left[\frac{6 \times 26}{7\left(7^2 - 1\right)}\right] = 1 - \frac{156}{7(49 - 1)} = 1 - \frac{156}{7 \times 48} = 1 - \frac{156}{336} = 1 - 0.4643 = 0.5357.
$$
The correlation is positive, which suggests that a high placing on the Statistics paper tends to go with a high placing on the Economics paper. However, the correlation is not strong as the value of the correlation coefficient is only 0.5357.
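The calculation for the seven students can also be sketched in Python as a check. The function name spearman_rs and the variable names are chosen here for illustration, and the sketch assumes there are no tied ranks, as in this example.

```python
# Placings from the example (1 = best performance), students A to G.
statistics_rank = [2, 1, 4, 6, 5, 3, 7]
economics_rank = [1, 3, 7, 5, 6, 2, 4]


def spearman_rs(rank_a, rank_b):
    """Spearman's rank correlation for untied ranks: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(rank_a)
    # d is the difference between the two rankings for each student.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


r_s = spearman_rs(statistics_rank, economics_rank)
print(round(r_s, 4))  # 0.5357
```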
Now go and do Exercise S8.2
ENDSECTION STARTSECTION=activity_2.htm= SECTION~
Exercise S8.2
A student organisation surveyed both recent graduates and current students to obtain information on the quality of teaching at a university. An analysis of the responses produced the teaching ability ranking shown in the following table.
Professor A B C D E F G H I J
Current Students 4 6 8 3 1 2 5 10 7 9
Recent Graduates 6 8 5 1 2 3 7 9 4 10
Is there a relationship between the rankings of the current students and the recent graduates? You may make use of the following table in calculating your answer.
Professor Current Students Recent Graduates
A 4 6
B 6 8
C 8 5
D 3 1
E 1 2
F 2 3
G 5 7
H 10 9
I 7 4
J 9 10
It is possible to get tied ranks when presenting data in this way. This would happen if two students did equally well in the Statistics paper and got joint first place. Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshot chapter 11 discusses how to calculate Spearman’s Rank Correlation Coefficient when you have tied ranks.
Note
It is worth remembering that ranked data are less precise than actual data values. An item ranked first may be slightly better than an item ranked second or it may be much better. It follows that the results of rank correlations are less precise and we must interpret them carefully. Wherever possible, the product moment correlation coefficient must be used. There is a question in the seminar problems which illustrates this point.
ENDSECTION STARTSECTION=content_6.htm= SECTION~
Correlation and Causation
A causal relationship is one where the value of one variable is directly attributable to the value of the other. A causal relationship between two variables implies a strong correlation. However a strong correlation does not imply a causal relationship. When interpreting r it is important to realise that there may be no direct connection at all between strongly correlated variables. Such correlation is termed Spurious Correlation.
What we can conclude when we find two variables with a strong correlation is that there is a relationship between the two variables, not that a change in one causes a change in the other. The relationship may be due to the dependence of both variables on a third variable. For example, sales of ice cream and sunglasses are strongly correlated, not because of a direct causal link but because the weather influences both variables.
For a discussion of correlation and causation see Business Basics Quantitative methods or any other suitable text from the reading list.
ENDSECTION STARTSECTION=activity_3.htm= SECTION~
Seminar Questions
Seminar Question S8.1
The following figures give (in units of £10m) the turnover and profit before taxation for a firm. Calculate the product moment correlation coefficient and comment on the result.
Turnover 106 125 147 167 187 220
Profit 10 12 16 17 18 22
ENDSECTION STARTSECTION=activity_4.htm= SECTION~
Seminar Question S8.2
The finance division of a large company is investigating its procedures for the selection of new accountancy trainees. Potential applicants are given, prior to appointment, both a written test and a formal interview. The performances of eight successful applicants were rated after their first full year with the company. The independent rankings of written test, interview assessment and job performance for the eight trainees are given as follows.
Trainee A B C D E F G H
Written test 6 2 7 4 1 5 3 8
Interview 1 4 2 3 6 5 8 7
Job Performance 1 2 3 4 5 6 7 8
a) Calculate the Spearman’s Rank Correlation coefficient between:
i) job performance and written test
ii) job performance and interview assessment.
b) Which of the variables, written test and interview assessment is more strongly related to job performance?
c) Can you conclude that the variable most weakly correlated with job performance is not necessary as part of the selection process? Explain your answer.
ENDSECTION STARTSECTION=activity_5.htm= SECTION~
Seminar Question S8.3
Produce a scatter plot of the following set of data.
X 2 4 6 8
Y 10 20 30 40
a) Without performing any calculations what would be the value of both the product moment correlation coefficient and the Spearman’s rank correlation coefficient? Explain your answer.
b) Suppose the data were altered slightly due to a typing error and the data is now
X 2 4 6 8
Y 10 24 30 47
Produce another scatter plot of this data set. From this plot and the data set would you expect there to be a change in the values of;
i) The Product Moment Correlation Coefficient
ii) Spearman's Rank correlation Coefficient
Explain your answer. What does this question illustrate about the precision of each of these correlation coefficients?
ENDSECTION STARTSECTION=activity_6.htm= SECTION~
Seminar Question S8.4
Consider the following data set.
Year                                             1986  1987  1988  1989  1990
Girls Average weekly pocket money (in pence)     114   120   122   136   147
Cautions for violent offences (thousands)        9.5   11.3  12.7  14.7  16.8
a) Produce a scatter plot of the data and calculate the product moment correlation coefficient between pocket money and the number of cautions for violent offences.
b) What explanation can you come up with for the resulting strong correlation?
ENDSECTION STARTSECTION=think_1.htm= SECTION~
How much do you know?
If you have understood the content of this unit you should have knowledge of and be able to answer any questions relating to all of the following points;
• scatter plots and discussing the nature of any apparent pattern between two sets of data;
• calculating and interpreting the product moment correlation coefficient;
• calculating and interpreting the Spearman’s Rank Correlation Coefficient;
• understand the way in which the product moment correlation coefficient is better;
• understand the idea of spurious correlation.
Extra Activities
Log on to the STX1110 Oasis page and attempt the quizzes in the extra section.
A guide on how to find the STX1110 Oasis quizzes is on page 16 of this book.
Do you want to know more?
For further examples, and exercises refer to Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshot, or any of the books on the STX1110 reading list.
ENDSECTION ENDCHAPTER
CHAPTER=CR2
STARTSECTION=scope_1.htm= SECTION~
Correlation and Regression 2 Context
In the last unit we discussed the use of correlation and scatter plots as tools to assess if a relationship existed between two sets of data and a means of deciding how strong the relationship was.
We said that correlation tests the strength of a linear relationship that may exist between two sets of data. So on the scatter plot we were looking for the points to be following a pattern that suggested a straight line. The word used to describe the overall shape of points plotted on a scatter plot is the trend. In the first example in the last unit the trend of the points was to follow a straight line sloping upwards and in the exercise the trend of the points was to follow a straight line sloping downwards. However, it is possible to be a bit more precise about this pattern. Imagine drawing a straight line through the middle of the points on the scatter plots. Finding the equation of such a line would allow us to move from vague, wordy descriptions of the trend to a more precise mathematical description of the relationship in question. Once the line has been defined in this more formal way, it becomes possible to make predictions about where we think other points may lie.
ENDSECTION STARTSECTION=scope_2.htm= SECTION~
Objectives
Having worked through this unit, you should be able to:
• produce a scatter plot and identify a line of best fit by eye;
• understand the basis of the least squares estimation of the line of best fit;
• use the formula to calculate the equation of the line of best fit;
• assess the fit of the line using the coefficient of determination;
• use a line of best fit for estimation and prediction and be aware of the reliability of the resulting estimates.
ENDSECTION STARTSECTION=scope_3.htm= SECTION~
Recap on the equation of a straight line
This process of fitting the best line through the points on a scatter plot involves a few basic mathematical ideas about graphs and straight lines.
The equation that describes a linear (straight line) relationship between two variables x and y is
y = a + bx

This is the general equation of a straight line and describes all possible straight lines that you could draw on a set of axes. What we want is the very specific straight line which best describes or best fits the points on the scatter plot. In order to define a specific straight line we need to know two pieces of information:
• The slope of the line (b)
• Where the line crosses the y axis (a)
The slope of the line can be either positive or negative depending on whether the line slopes upwards or downwards.
So calculating a line of best fit for a set of data means we need to find appropriate values for a and b in the above equation of a straight line.
For more discussion of the mathematical ideas of straight lines refer to Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshot chapter 1. Also work through unit M7.
ENDSECTION STARTSECTION=content_1.htm= SECTION~
Fitting the best line by eye
One method of finding the best line that passes through a set of points on a scatter plot is to use your own judgement as to what is the best line. Using a ruler draw a line through the middle of the points so that there are just as many points above the line as below the line. Having done this there are methods that will allow you to calculate values for a and b resulting in the equation of this line. However, this effort is unnecessary really as you can use the line itself on the scatter plot for estimation.
ENDSECTION STARTSECTION=content_2.htm= SECTION~
The Method of Least Squares
When fitting a line through the data points by eye you will probably draw a line which goes as close to the points as possible. The method of least squares is a mathematical method for calculating a and b so the resulting line is such that the sum of the squares of the distance from each point to the line is a minimum.
Suppose we have a scatter plot of the data such as the one that follows. The line of best fit will not pass perfectly through all the points which means there is a deviation from each point to the line. Obviously the “best line” will be one such that these deviations are as small as possible. Another way of saying this is that the best line will be one such that the sum of the deviations from the points to the line will be as small as possible (i.e. the sum is minimised). However, as some of the points are above the line and some are below the line this means some of the deviations are positive and some are negative. As we already know from the lecture on standard deviation, when you add positive and negative numbers together they cancel out. So instead of finding values for a and b such that the sum of the deviations is minimised, we will square all the deviations and find values for a and b such that the sum of the squared deviations is minimised.
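[Figure: scatter plot of y against x with a fitted line, showing the deviations d1, d2, ..., dn from the points to the line]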
So the method of least squares chooses values for a and b so that the sum of the squared deviations;
$$
d_1^2 + d_2^2 + \dots + d_n^2
$$
is as small as possible.
The mathematics involved in proving how to calculate a and b to achieve this is quite involved. However, it can be proved that if we use the formulae
$$
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}
$$

and

$$
a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} = \bar{y} - b\bar{x}
$$
the resulting straight line will be such that the sum of the squares of the distance from each point to the line is a minimum. The resulting regression line is called the Least Squares Regression Line of y on x. If you compare the formula for b to the formula for r last week you will see that they are very similar:
$$
r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)\left(n\sum y^2 - \left(\sum y\right)^2\right)}}
$$
The top lines (numerators) of r and b are identical and the bottom line (denominator) of b is part of the expression in the denominator of r. This emphasises the fact that correlation and regression are closely related and the calculations involved in regression will be pretty similar to those of correlation.
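For readers curious about where the formulae for a and b come from, the following is a brief sketch of the standard argument rather than a full proof. The notation S(a, b) for the sum of squared deviations and the subscript i for the individual data pairs are introduced here for illustration only. The sketch assumes each deviation is measured vertically, that is, as the difference between the observed y value and the value of y given by the line.

```latex
% Sum of squared deviations as a function of a and b
S(a, b) = \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} \left( y_i - a - b x_i \right)^2

% Setting both partial derivatives to zero gives two simultaneous equations
\frac{\partial S}{\partial a} = -2 \sum \left( y_i - a - b x_i \right) = 0
  \quad\Longrightarrow\quad \sum y = n a + b \sum x

\frac{\partial S}{\partial b} = -2 \sum x_i \left( y_i - a - b x_i \right) = 0
  \quad\Longrightarrow\quad \sum xy = a \sum x + b \sum x^2

% The first equation gives a; substituting it into the second gives b
a = \frac{\sum y}{n} - b\,\frac{\sum x}{n},
\qquad
b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left( \sum x \right)^2}
```

The two simultaneous equations are often called the normal equations. Solving them reproduces the formulae quoted above.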
ENDSECTION STARTSECTION=content_3.htm= SECTION~
The Method of Least Squares continued
Example
For this example we will return to the data set which we used in the last unit when we discussed correlation.
A sales manager has collected information for ten of his staff relating to their length of experience in years and their annual sales. The data that was collected is shown below.
Experience (in years) Annual sales (£000’s)
1 40
2 49
3 46
4 51
5 52
6 56
7 60
8 62
9 59
10 68
As we discussed last week, we would expect years of experience to explain annual sales so we will make experience the x variable or explanatory variable and annual sales the y variable or response variable. (We shall return to this idea later.)
Just to remind ourselves a scatter plot of this data set was:
[Figure: scatter plot of annual sales (£000s) against years of experience]
These points clearly lie close to a straight line so to continue with a least squares regression analysis is sensible.
As with correlation, the calculations involve evaluating various sums and then substituting them into the formula. So the process of the calculation will be very similar to that in the last unit and is best performed in a table.
The table of calculations from last week was:
Experience (x)   Annual Sales (y)   x²           y²              xy
1                40                 1×1 = 1      40×40 = 1600    1×40 = 40
2                49                 2×2 = 4      49×49 = 2401    2×49 = 98
3                46                 9            2116            138
4                51                 16           2601            204
5                52                 25           2704            260
6                56                 36           3136            336
7                60                 49           3600            420
8                62                 64           3844            496
9                59                 81           3481            531
10               68                 100          4624            680
∑x = 55          ∑y = 543           ∑x² = 385    ∑y² = 30107     ∑xy = 3203
To calculate the regression line we need all of these sums apart from ∑y² = 30107.
To calculate the equation of the regression line we will first calculate the value of b:
$$
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}
$$
Substituting in the values for the various sums from the above table we get;
$$
b = \frac{(10 \times 3203) - (55 \times 543)}{(10 \times 385) - (55)^2} = \frac{32030 - 29865}{3850 - 3025} = \frac{2165}{825} = 2.62.
$$
Having calculated a value for b we need to use this value in the calculation of a.
$$
a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} = \frac{543}{10} - 2.62 \times \left(\frac{55}{10}\right) = 54.3 - (2.62 \times 5.5) = 54.3 - 14.41 = 39.89.
$$
The equation of the resulting regression line of y (annual sales ) on x (years of experience) is
y = 39.89 + 2.62x.
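As a check on the hand calculation, the two formulae can be sketched in Python. The function name least_squares_line and the variable names are chosen here for illustration. Note that the code keeps b unrounded when it calculates a, so it gives an intercept of about 39.87 rather than the 39.89 obtained above with b rounded to 2.62.

```python
# Data from the worked example: years of experience (x) and annual sales in £000s (y).
experience = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sales = [40, 49, 46, 51, 52, 56, 60, 62, 59, 68]


def least_squares_line(x, y):
    """Return (a, b) for the least squares regression line y = a + bx."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_xy = sum(p * q for p, q in zip(x, y))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * sum_x / n                                 # intercept
    return a, b


a, b = least_squares_line(experience, sales)
print(round(b, 4), round(a, 3))  # 2.6242 39.867
```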
The scatter plot with the line of best fit drawn on it is:
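[Figure: scatter plot of annual sales (£000s) against years of experience, with the fitted line y = 2.6242x + 39.867]

The equation shown on the plot uses the unrounded slope, which is why its intercept (39.867) differs slightly from the value of 39.89 obtained above using b rounded to 2.62.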
ENDSECTION STARTSECTION=content_4.htm= SECTION~
Interpreting the coefficients
The terms a and b are called the coefficients of the least squares regression line. We already know that, in the straight line,
• a is the intercept of the line with the y axis. Or a is the value of y when x = 0.
• b is the slope of the line.
But what do the values of a and b actually mean in the context of this data?
• a is the value of the y variable (annual sales) when the x variable (experience) = 0. So in the context of this question, when x (experience) = 0, y (annual sales) = a (39.89). So this means we would expect a salesman with no experience to make, on average, 39.89 (£000’s) worth of sales in a year.
• b is the value of the slope. In this example b is positive because the line slopes upwards. The value of b tells you what the change in y will be when x increases by 1. So in this question, if x (years of experience) increases by 1, y (annual sales) will change by 2.62. This is a positive change so as a salesman gains 1 more year of experience you should expect to see his annual sales increase by 2.62 (£000’s).
For a greater discussion of the interpretation of the coefficients in a least squares regression line refer to Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshot chapter 11.
Now go and do Exercise S9.1
ENDSECTION STARTSECTION=activity_1.htm= SECTION~
Exercise S9.1
We now return to the exercise from the last unit concerning the number of hours spent on safety training and the number of hours lost due to accidents. The data set and scatter plot were as follows:
Hours in safety training 10.0 19.5 30.0 45.0 50.0 65.0 80.0
Hours lost due to accidents 80 65 68 55 35 10 12
[Figure: scatter plot of hours lost due to accidents (y axis) against hours in safety training (x axis)]
• Would it be sensible to continue with a least squares regression analysis on this data? Explain your answer.
• Calculate the least squares regression line of hours lost due to accidents (y) on hours spent in safety training (x).
• Interpret the coefficients in your least squares regression line.
You can use the following information,
∑x = 299.5    ∑y = 325    ∑xy = 9942.5
∑x² = 16530.25    ∑y² = 19743
ENDSECTION STARTSECTION=content_5.htm= SECTION~
Uses of Regression
Once the relationship between the two variables has been fitted, it can be used to estimate values of y for given values of x or to exercise control.
Estimation
If we know there is a very close relationship between the years of experience a salesman has and their annual sales, we could estimate the annual sales for a salesman with a given amount of experience.
Example
What would be the estimated annual sales for a salesman with 6 years of experience? Obviously within the data set we have a salesman with precisely 6 years of experience. However, the y value of 56 which corresponds with this individual in the data set is his annual sales, not the general amount of sales you could expect of any salesman who had 6 years of experience.
The regression equation for this example was
y = 39.89 + 2.62x.
We want to know the estimated value of y when x = 6.
y = 39.89 + (2.62 × 6) = 39.89 + 15.72 = 55.61.
So we would expect a salesman with 6 years of experience to make, on average, 55.61 (£000’s) worth of sales in a year.
Obviously the mathematics involved in using a regression line for estimation is fairly simple. However, you do need to be careful when using the estimated values and somehow judge their reliability. Obviously part of the reliability of the estimate will be attributable to how well the line fits the data which is a point we will be returning to. If the data points very closely follow the suggested line, the line will be a good fit to the data and the estimates will be fairly good. Another important part of the reliability of the estimate is the value of x that you are trying to estimate from.
Interpolation
If the given x value which you are estimating from is within the range of x values used to fit the regression line, then you can estimate y with a fair degree of confidence. In the estimation above, we were estimating y when x = 6. The regression line was fitted on values of x ranging from 1 to 10, so this is an interpolated estimate and we can be fairly confident in our answer.
Extrapolation
If the given x value is outside the range of values of x used to fit the regression line, then the estimate for y needs to be treated with some caution. There is no evidence to suggest that the regression relationship holds outside the fitted range of x as you have no data there.
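The distinction between interpolation and extrapolation can be sketched in Python as follows. The function name estimate_sales and the constant names are chosen here for illustration, the coefficients are the rounded values from the worked example, and the value of 15 years is a made-up input used only to show an extrapolated estimate.

```python
# Coefficients of the regression line from the worked example (rounded as in the notes).
A = 39.89  # intercept
B = 2.62   # slope
X_MIN, X_MAX = 1, 10  # range of experience (years) in the data used to fit the line


def estimate_sales(years):
    """Return (estimated annual sales in £000s, kind of estimate)."""
    estimate = A + B * years
    kind = "interpolation" if X_MIN <= years <= X_MAX else "extrapolation"
    return round(estimate, 2), kind


print(estimate_sales(6))   # (55.61, 'interpolation')
print(estimate_sales(15))  # (79.19, 'extrapolation'), so treat with caution
```

The label does not change the arithmetic. It is only a reminder that an estimate outside the fitted range of x is not supported by any data.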
Refer to Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshot chapter 11 for more discussion on the reliability of estimates.
Control
In the above example, we know from the regression relationship that a salesman with 6 years of experience should be making, on average, 55.61 (£000’s) worth of sales in a year. If a salesman with 6 years of experience in the company was only making 20 (£000’s) in a year it would alert you to a potential problem that may be worth investigating.
Similarly, suppose we have a regression relationship of the cost of maintenance of a machine on the age of the machine. We can use the regression line to estimate the maintenance cost of a machine of a given age. If the actual maintenance cost is higher than expected, it indicates that the machine is not functioning as it should. An overhaul of the machine may rectify matters and reduce the maintenance costs.
ENDSECTION STARTSECTION=content_6.htm= SECTION~
Explanatory and Response Variables
When dealing with two variables it is important to know which is the response (or dependent) variable and which is the explanatory variable. We have looked at this before but a brief summary follows:
• The explanatory variable (x) is not affected by changes in the other variable but it can be used to help explain changes in the other variable. In the example of experience and annual sales you would expect salesmen with greater experience to make more sales. So the length of experience can be used to explain the sales so experience is the explanatory (x) variable.
• The response variable (y) is affected by changes in the other variable.
Sometimes it is obvious which variable is the response and which variable is the explanatory. In other situations it is not. Suppose we collect data from individuals which consists of recording their weight and height. In trying to decide between the explanatory variable and the response, does your height explain your weight or does your weight explain your height? It could be either, I suppose. If you’re not sure which variable to make the response (y) and the explanatory (x), you always put the variable to be estimated as y. So in the situation of wanting to use someone’s height to estimate their weight, you would make the weight the response, y.
ENDSECTION STARTSECTION=content_7.htm= SECTION~
Coefficient of Determination
The product moment correlation coefficient, r, can be used to evaluate the coefficient of determination, r². The coefficient of determination specifies how much of the variation in the response variable is explained by variation in the explanatory variable.
Example
Last week we showed that the correlation between length of experience and annual sales of salesmen was
r = 0.96.
The coefficient of determination is then
r² = 0.96² = 0.9216.
This is usually quoted as a percentage so r² = 92.16%. This tells us that 92.16% of the variation in annual sales can be explained by variation in years of experience. This is obviously very high so the fit is good and interpolated estimates will be fairly reliable.
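A short Python sketch shows how sensitive this figure is to rounding. The variable names are chosen here for illustration. Squaring the rounded value r = 0.96 gives 92.16% as above, whereas working from the unrounded sums gives roughly 91.3%. Either way the conclusion that the fit is good is unchanged.

```python
# Sums from the table of calculations in the worked example.
n = 10
sum_x, sum_y = 55, 543
sum_x2, sum_y2, sum_xy = 385, 30107, 3203

# r squared computed directly, without rounding r first.
numerator = n * sum_xy - sum_x * sum_y
r_squared = numerator ** 2 / ((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

print(round(0.96 ** 2, 4))  # 0.9216, using r rounded to two decimal places
print(round(r_squared, 4))  # 0.9133, using the unrounded sums
```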
Clearly for a regression to be useful we would want the r² to be quite large.
ENDSECTION STARTSECTION=content_8.htm= SECTION~
Extreme Values
As we have already discussed when we were calculating the mean, extreme values can distort the results of statistical analyses. This is particularly true of regression analyses. Extreme values can have a big effect on a regression line and need to be given careful consideration. Consider the following data set.
x 10 8 9 11 14 6 4 12 7 5
y 7.46 6.77 7.11 7.81 8.84 6.08 5.39 8.15 6.42 5.73
The scatter plot with the fitted regression line indicates that the points follow a very good straight line pattern, the correlation is very high and the resulting regression line would give us excellent estimates:
[Figure: scatter plot of y against x with the fitted regression line y = 0.35x + 4.01, r = 0.99]
If we now take the same data set but add in the additional point x =13, y =23.5, (this point is shown in bold in the table below) what happens to the fitted regression line?
x 10 8 9 11 14 6 4 12 7 5 13
y 7.46 6.77 7.11 7.81 8.84 6.08 5.39 8.15 6.42 5.73 23.5
As the scatter graph and regression line show, although there is only one point which is different between these two data sets, the correlation in the second data set has dropped considerably, from 0.99 to 0.58, which means estimates will be less reliable. The plot also shows that the fitted regression line does not fit the majority of the data well at all. The one extra point, which is an extreme value, has pulled the line away from a very good fit to quite a poor fit. So this one extreme point will have quite a large detrimental effect on the estimation of y from this data set.
[Figure: scatter plot of y against x, including the extreme point, with the fitted regression line y = 0.89x + 0.46, r = 0.58]
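The effect of the extreme value can be checked numerically. The short Python sketch below is not part of the original notes; it recomputes the least squares line and the correlation coefficient for both data sets using the usual formulas (slope = Sxy/Sxx, intercept = ȳ − slope × x̄, r = Sxy/√(Sxx × Syy)). The function name fit_line is only an illustrative choice.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

x = [10, 8, 9, 11, 14, 6, 4, 12, 7, 5]
y = [7.46, 6.77, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]

# Original ten points
b1, a1, r1 = fit_line(x, y)
# Same data with the extreme point (13, 23.5) added
b2, a2, r2 = fit_line(x + [13], y + [23.5])

print(f"without extreme point: y = {b1:.2f}x + {a1:.2f}")
print(f"with extreme point:    y = {b2:.2f}x + {a2:.2f}, r = {r2:.2f}")
```

The first fit should reproduce the line y = 0.35x + 4.01 with a correlation very close to 1, and the second the line y = 0.89x + 0.46 with r = 0.58, so a single extreme point is enough to change both the slope and the strength of the relationship.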
ENDSECTION STARTSECTION=activity_2.htm= SECTION~
Seminar Questions
Seminar Question S9.1
A company is interested in the effectiveness of its advertising expenditures. An experiment is conducted to study how the amount spent on advertising affects the sales of a soft drink the company produces. Ten sales areas are included in the experiment. Each area spends its allocated advertising budget (in 10,000’s of £) on a prime time television commercial. The table below records the expenditures and sales for the 10 areas.
Advertising expenditure (x)   2  2  3  4  5  5  6  7  7  8
Sales (y)                     8  4 10  7 11 15 19 16 23 20
i) Explain why the sales variable has been designated as the response (y) and the advertising as the explanatory variable (x).
ii) Produce a scatter plot of this data and indicate if regression analysis is suitable for this data (use a scale of 0 to 16 on the x axis).
iii) Calculate the regression line of sales on advertising expenditure. Interpret the coefficients in this regression line.
iv) Calculate the coefficient of determination and interpret this value.
v) Estimate the sales for an area with an advertising budget of 3.5 (10,000’s of £) and estimate the sales for an area with an advertising budget of 15 (10,000’s of £). Comment on the reliability of these estimates.
vi) Additional data becomes available for 4 further areas which is detailed below.
Additional x values 11 12 14 16
Additional y values 22 21 22 20
Add these points to your plot. What does this additional information tell you about the reliability of your estimate above for an advertising budget of 15?
ENDSECTION STARTSECTION=activity_3.htm= SECTION~
Seminar Question S9.2
A city council is considering increasing the number of police at public events, such as football matches, in an effort to reduce crime. Before making a final decision, the council asks the chief of police to survey other public events of similar size to determine the relationship between the number of police and the number of crimes reported. The chief gathered the following information.
Number of police (in hundreds)   15 17 25 27 17 12 11 22
Number of crimes (in tens)       17 13  5  7  7 21 19  6
i) If we want to estimate crimes based on the number of police, which variable is the response variable and which is the explanatory variable?
ii) Draw a scatter plot and comment on the suitability of using least squares regression analysis for this data.
iii) Calculate the appropriate least squares regression line.
iv) Calculate and interpret the coefficient of determination. ENDSECTION STARTSECTION=activity_4.htm= SECTION~
Seminar Question S9.3
Consider the following set of data.
x 2 4 1 5 3
y 15 25 10 40 30
i) Produce a scatter plot of this data. Use a scale from -50 to 50 on the y axis.
ii) Does the relationship appear to be quite strong between these two sets of data? Calculate the product moment correlation coefficient to support your conclusion.
iii) A further data item becomes available so that the data set is:
x 2 4 1 5 3 6
y 15 25 10 40 30 -50
iv) Add this additional data point to the plot. What would you expect to happen to the value of the correlation coefficient and the fitted line? What does this show you about extreme values in regression analysis?
ENDSECTION STARTSECTION=think_1.htm= SECTION~
How much do you know?
If you have understood the content of this unit you should have knowledge of, and be able to answer any questions relating to, all of the following points:
• the ideas underlying the method of least squares estimation of a regression line;
• calculating a least squares regression line;
• interpreting the coefficients of a least squares regression line;
• estimating and predicting values of the response variable from a least squares regression line and understanding the reliability of these estimates;
• calculating and interpreting the coefficient of determination;
• understanding the effect of extreme values on least squares regression lines.
Extra Activities
Log on to the STX1110 Oasis page and attempt the quizzes in the extra section.
A guide on how to find the STX1110 Oasis quizzes is on page 16 of this book.
Do you want to know more?
For further examples and exercises, refer to Essential Quantitative Methods for Business, Management and Finance, third edition, by Les Oakshott, or any of the books on the STX1110 reading list.
ENDSECTION ENDCHAPTER
S10 Estimation Page 141
CHAPTER=ESTIM
STARTSECTION=scope_1.htm= SECTION~
Estimation Context
This unit will bring together ideas and concepts which you have studied in previous lectures. In particular we will revisit ideas covered in
• Sampling
• Means and standard deviations
• Normal distribution
ENDSECTION STARTSECTION=scope_2.htm= SECTION~
Objectives
Having worked through this unit, you should be able to:
• use the information in a sample to calculate a point estimate of an unknown population mean;
• understand the concept of a sampling distribution and know that the sample mean, x̄, follows a normal distribution;
• calculate the standard error of x̄;
• calculate an interval estimate of an unknown population mean;
• discuss how the sample size and level of required confidence affect the calculated interval estimate.
ENDSECTION STARTSECTION=content_1.htm= SECTION~
Revision of Sampling
Earlier in the module we considered ways of selecting a sub group or sample from a population. Having done so, we only collect statistical information (data) from people, or items, in the sample. We also said that, provided the sample was selected in a representative or unbiased way, any results that are true for the sample will generalise to be a correct population result.
Example
Suppose we want to know the mean income of all single males in the U.K. To answer this question we decide to take a random sample of 20 single males and ask them their income, which produces the following results (in £).
12,000 26,000 35,000 24,500 18,000 15,500 28,500 18,000 54,000 43,000
17,500 22,000 21,500 26,000 16,000 16,500 24,500 27,500 29,000 17,000
We can calculate the mean salary for these 20 men using
$$\bar{x} = \frac{\sum x}{n} = \frac{\pounds 492{,}000}{20} = \pounds 24{,}600.$$
This is the exact average salary for these 20 men. However, we want to know the average salary for all single men in the U.K. So we assume that our sample of 20 men was representative of all single men in the U.K. and say the average salary of all single men in the U.K. is also £24,600.
So we take a sample in order to estimate something about the population of interest as a whole. In the above example we asked a sample of 20 single men how much they earned and worked out the sample average, x̄. This then gives us an idea of the average earnings (population average) of all single males in the U.K. The sample average is likely to be close to the population average, provided the sample is representative, but it is unlikely to be totally precise. (The sample average x̄ is unlikely to be exactly the same as the population average.)
Obviously, if we had asked all the single males in the U.K. their salary and then worked out the average the answer would be precise. However, in the first unit on collecting data we discussed lots of reasons why it is often impractical to ask the whole population. Usually you have to use imprecise or incomplete sample information to find out something relating to the population.
ENDSECTION STARTSECTION=content_2.htm= SECTION~
Estimation notation
So our situation is that we have a population (all single males) and we want to know some information about the population. Usually we want information about the mean and the standard deviation for the population. The notation we will use for these two quantities is:
• μ = population mean (mean salary of all U.K. single males)
• σ = population standard deviation (standard deviation of salary of all U.K. single males)
We don’t ask all the population so we don’t know the values of μ and σ. We take a representative sample and calculate statistics from the sample information.
• x̄ = sample mean (mean salary of 20 single men in the sample)
• s = sample standard deviation (standard deviation of salary of 20 single men in the sample)
• n = sample size.
We use the information in the sample to talk about the population. ENDSECTION STARTSECTION=content_3.htm= SECTION~
Point Estimate of an unknown population mean μ
A point estimate is a single number which is used to estimate the population mean μ. The obvious point estimate of μ is the sample mean x̄.
So, in the last example we calculated x̄ = £24,600 from the sample of 20 men. What we are really interested in is the mean for all single men, the population mean μ. So we say the value of x̄ (sample mean) is a point estimate of μ (population mean).
An estimate of μ is x̄, so an estimate of μ is £24,600.
The average salary of all single men in the U.K. is approximately £24,600.
Problem
Someone else takes another sample of 20 single men. The average salary they calculate from their sample is £23,250! This is different to the average we calculated from the sample in the first example.
Both of these sample results are trying to estimate the value of the population mean μ (mean for all single men). Is one of the estimates better than the other? If so, which one is better?
The value we compute for x̄ will vary in a random manner from sample to sample. We are using the sample mean x̄ to estimate the population mean μ. We can’t expect x̄ to estimate μ perfectly as it is based on incomplete information. What we are concerned with in this lecture is the accuracy of the estimate of μ.
ENDSECTION STARTSECTION=content_4.htm= SECTION~
Sampling distribution of x̄
The fact that the value of x̄ changes from sample to sample can actually be used to help us. Imagine if we take lots of samples and calculate the sample mean from each. We are now left with a collection of sample means which we could plot as a frequency distribution. This distribution is called the sampling distribution of the mean. Theoretically it is the set of means of all possible samples of size 20 which could be taken from the population.
Suppose for the example concerning the salary of single men in the U.K. that, instead of taking 1 sample, we actually took 250 samples all of size 20. We could calculate the mean x̄ for each of these samples which would give us 250 separate x̄ calculations. Some of these sample means may turn out to be the same but there would be a degree of variation between the 250 values calculated for x̄. A frequency distribution summary of these 250 values for x̄ might be:
Mean salary from sample (x̄) in £    Frequency f (number of samples)
14,000 but less than 16,000           3
16,000 but less than 18,000           7
18,000 but less than 20,000          16
20,000 but less than 22,000          30
22,000 but less than 24,000          44
24,000 but less than 26,000          50
26,000 but less than 28,000          44
28,000 but less than 30,000          30
30,000 but less than 32,000          16
32,000 but less than 34,000           7
34,000 but less than 36,000           3
                                     Σf = 250
A histogram of this distribution is
[Figure: histogram of the 250 sample means; vertical axis shows frequency from 0 to 60, horizontal axis shows class midpoints from 15000 to 35000]
This suggests that the average salary of 20 men in a sample (x̄) might range from £14,000 to £36,000. The true mean of the population (μ), that is the true mean salary of all U.K. single males, presumably also lies somewhere in this range.
If we know about the sampling distribution of the mean, particularly the variability in the sampling distribution, we have information about how good a sample mean (x̄) is as an estimate of the population mean (μ). So we can use the sampling distribution to tell us something about how accurate x̄ is.
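To put a number on the variability in this illustrative frequency table, the Python sketch below (an addition, not part of the original notes) treats each class as concentrated at its midpoint, which is the usual grouped-data approximation, and computes the mean and standard deviation of the 250 sample means.

```python
from math import sqrt

# Class midpoints (in £) and frequencies from the table of 250 sample means
midpoints = [15000, 17000, 19000, 21000, 23000, 25000,
             27000, 29000, 31000, 33000, 35000]
frequencies = [3, 7, 16, 30, 44, 50, 44, 30, 16, 7, 3]

total = sum(frequencies)  # number of samples
# Grouped-data mean: sum of (frequency x midpoint) divided by total frequency
mean_of_means = sum(f * m for f, m in zip(frequencies, midpoints)) / total
# Grouped-data standard deviation (dividing by the total frequency)
variance = sum(f * (m - mean_of_means) ** 2
               for f, m in zip(frequencies, midpoints)) / total
sd_of_means = sqrt(variance)

print(total, mean_of_means, round(sd_of_means))
```

For this table the sample means centre on £25,000 with a standard deviation of roughly £3,980. It is this spread of the sample means, rather than the spread of individual salaries, that tells us how far a single x̄ is likely to be from μ.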
ENDSECTION STARTSECTION=content_5.htm= SECTION~
Properties of the sampling distribution of the mean
A sampling distribution of the mean has the following important properties.
• It is very close to being normally distributed. This is true even if the distribution of the population from which the samples are drawn is nowhere near normal. The larger the sample, the more closely will the sampling distribution approximate to a normal distribution.
N.B. Remember, to characterise a normal distribution we need to know its mean and standard deviation.
• The mean of the sampling distribution is the same as the population mean μ .
• The sampling distribution has a standard deviation which is called the standard error of the mean (SE) and is calculated as
$$\mathrm{SE} = \frac{\sigma}{\sqrt{n}}.$$
We don’t know σ as this is the standard deviation of the population. However, we can calculate the standard deviation for the sample, s, and use this as an estimate of σ so we can calculate the standard error of the mean using the formula
$$\mathrm{SE} = \frac{s}{\sqrt{n}}.$$
Using this information we can use x̄ to quote an estimate for μ and use information in the sampling distribution to tell us how good the estimate is by quoting a range of likely values for μ.
Pause for thought
Am I now suggesting that we take lots and lots of samples, calculate the mean of each and look at the distribution of the resulting set of sample means? Surely this would involve a lot of work, and one of the reasons for taking a sample in the first place was to reduce the amount of work we need to do? Fortunately, there is a very important statistical result which comes to our rescue and allows us to use the information contained in just one sample to come up with the sampling distribution of the mean. This result is called the Central Limit Theorem and is crucial to statistics.
ENDSECTION STARTSECTION=content_6.htm= SECTION~
Central Limit Theorem
Suppose we have a population with mean μ and standard deviation σ . Imagine that we take repeated samples of size n, where n is large, by which we mean that n ≥30.
The central limit theorem tells us that:
• The sampling distribution of the means is a normal distribution.
• The mean of the sampling distribution is the population mean μ.
• The standard deviation of the sampling distribution is the standard error of the mean, SE = σ/√n.
Putting this all together we can say that the sample mean x̄ follows a normal distribution with mean μ and standard deviation σ/√n:
$$\bar{x} \sim N\!\left(\mu, \left(\frac{\sigma}{\sqrt{n}}\right)^{2}\right).$$
[Figure: normal curve of the sampling distribution of x̄, centred on μ]
We can use this fact to get extra information from our sample results.
Example on salary continued
The data concerning salary for our sample of 20 single men was
12,000 26,000 35,000 24,500 18,000 15,500 28,500 18,000 54,000 43,000
17,500 22,000 21,500 26,000 16,000 16,500 24,500 27,500 29,000 17,000
We already know that the mean for this data is
$$\bar{x} = \frac{\sum x}{n} = \frac{\pounds 492{,}000}{20} = \pounds 24{,}600.$$
We can also calculate the standard deviation using
$$s = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{14{,}054{,}000{,}000}{20} - 24{,}600^2} = \sqrt{702{,}700{,}000 - 605{,}160{,}000} = \sqrt{97{,}540{,}000} = 9876.2341.$$
So the standard error of the mean is σ/√n. Using s as an estimate for σ, the standard error of the mean is
$$\frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}} = \frac{9876.2341}{\sqrt{20}} = 2208.3931.$$
Using the central limit theorem we know that
$$\bar{x} \sim N\!\left(\mu, \left(\frac{\sigma}{\sqrt{n}}\right)^{2}\right), \quad\text{i.e.}\quad \bar{x} \sim N\!\left(\mu, (2208.3931)^{2}\right).$$
Since almost all the normal distribution lies within 3 standard deviations of the mean, the range of this sampling distribution is
μ − (3 × 2208.3931) up to μ + (3 × 2208.3931).
Using x̄ as the estimate for μ this gives a range of
x̄ − (3 × 2208.3931) up to x̄ + (3 × 2208.3931)
i.e. 24,600 − (3 × 2208.3931) up to 24,600 + (3 × 2208.3931)
i.e. £17,974.82 up to £31,225.18.
So although x̄ is not a precise estimate of μ, we can be fairly confident that μ is in the range £17,974.82 up to £31,225.18.
[Figure: normal curve centred on μ, with the range £17,974.82 to £31,225.18 marked]
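The salary calculations above can be checked with a few lines of Python. This is only a sketch added for illustration: it uses the standard deviation formula that divides by n, as in these notes, and the variable names are my own.

```python
from math import sqrt

salaries = [12000, 26000, 35000, 24500, 18000, 15500, 28500, 18000, 54000, 43000,
            17500, 22000, 21500, 26000, 16000, 16500, 24500, 27500, 29000, 17000]

n = len(salaries)
mean = sum(salaries) / n
# Standard deviation dividing by n: sqrt(sum(x^2)/n - mean^2)
s = sqrt(sum(x ** 2 for x in salaries) / n - mean ** 2)
# Standard error of the mean, using s as an estimate of sigma
se = s / sqrt(n)
# Range of three standard errors either side of the sample mean
lower, upper = mean - 3 * se, mean + 3 * se

print(f"mean = {mean:.2f}, s = {s:.4f}, SE = {se:.4f}")
print(f"range: {lower:.2f} up to {upper:.2f}")
```

Changing the list of salaries (for example to the second sample of 20 men mentioned earlier, if its values were known) would give a different mean and a different range, which is exactly the sample-to-sample variation this unit is about.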
ENDSECTION STARTSECTION=content_7.htm= SECTION~
Confidence Interval
One of the ways of expressing the accuracy associated with a particular sample mean is to quote a range of values which μ is likely to be within, or an interval estimate for μ . This is called a confidence interval.
A confidence interval gives us a range of values which we say will contain the true value of μ with a certain degree of confidence. So a confidence interval is a range of values which is likely to contain the true value of the population mean μ .
From our knowledge of a normal distribution together with the information that the sample means are normally distributed we can calculate a confidence interval for μ using the formula
$$\bar{x} \pm z \times \left(\frac{\sigma}{\sqrt{n}}\right)$$
where the value of z is found from standard normal tables to satisfy a required level of confidence.
Salary example continued
Suppose we want a 95% confidence interval for the salary of single males in the U.K. From our sample we know that x̄ = 24,600 and s = 9876.2341.
To calculate the 95% confidence interval we need to know what value of z to use.
We choose the value of z so that the confidence interval covers the middle 95% of the normal curve as shown below:
[Figure: standard normal curve centred on 0 with the middle 95% marked; the remaining 100% – 95% = 5% is split equally between the two tails, 5% / 2 = 2.5% = 0.025 in each]
We want the value of z so that the area in each of the two tails is 0.025 (2.5%).
Using the left hand table of the normal tables on page 207 of the handbook we need to look for 0.025 in the main body of the table and then work out what z value gives us a lower tail area of 0.025. The value 0.025 is in the middle chunk of the table and corresponds to a z value of –1.96. By symmetry, an upper tail area of 0.025 corresponds to z = +1.96, so we use z = 1.96 in the formula.
[Figure: standard normal curve centred on 0, with the lower tail area of 2.5% = 0.025 lying to the left of z = –1.96]
Using this value the confidence interval is then
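$$\begin{aligned}
\bar{x} \pm z \times \left(\frac{\sigma}{\sqrt{n}}\right) &= 24{,}600 \pm 1.96 \times \left(\frac{9876.2341}{\sqrt{20}}\right) \\
&= 24{,}600 \pm (1.96 \times 2208.3931) \\
&= 24{,}600 \pm 4328.4504 \\
&= 20{,}271.55 \text{ up to } 28{,}928.45.
\end{aligned}$$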
So we are 95% confident that the true value of the mean salary of single males in the U.K. is between £20,271.55 and £28,928.45.
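As a check on the arithmetic, the following Python sketch (an illustrative addition, not part of the original notes) wraps the formula x̄ ± z × (s/√n) in a small function. The function name confidence_interval is my own; NormalDist comes from Python's standard statistics module and is used here only to show that the tabulated z value can also be obtained by computer.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(mean, sd, n, z):
    """Return (lower, upper) for mean ± z × sd/√n."""
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

# Values from the salary example; z = 1.96 is the tabulated value for 95%
lower, upper = confidence_interval(24600, 9876.2341, 20, 1.96)
print(f"95% CI: {lower:.2f} up to {upper:.2f}")

# The z value can also be found without printed tables:
# 2.5% in the upper tail means 97.5% of the distribution lies below z.
z_95 = NormalDist().inv_cdf(0.975)
print(round(z_95, 2))
```

The interval agrees with the hand calculation above. For a different confidence level, only the value of z passed to the function changes.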
Now go and do Exercise S10.1 ENDSECTION STARTSECTION=activity_1.htm= SECTION~
Exercise S10.1
What would a 99% confidence interval for the salary of single males in the U.K. be?
Why don't we use a 100% confidence interval?
It is not possible to quote an interval which will contain the true value of the population mean μ with 100% certainty, as the interval is based on incomplete sample information. However, we quote a confidence interval which contains the true value of μ with high probability. Most usually we use a 95% confidence interval. This means that if we were to take 100 samples, we would expect about 95 of them to produce values for x̄ and s which would result in confidence intervals that did contain the true value of μ. However, about 5 of them would result in a confidence interval which didn’t contain the true value of μ. Therefore, when we quote a confidence interval, there is always a small chance that it does not contain the true value of μ.
ENDSECTION STARTSECTION=content_8.htm= SECTION~
Final Comment
When calculating the confidence interval we used the formula
$$s = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} \quad\text{which is equivalent to}\quad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
to calculate the standard deviation for the sample.
If you refer back to the unit on Numerical Summaries of Data 2 I indicated that some people use a different definition for the standard deviation. Instead of dividing by n they divide by n – 1 to give
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}.$$
When calculating confidence intervals you should use the formula for the standard deviation where you divide by n – 1. This then gives a better estimate of the unknown population standard deviation σ . However, for this module, if you are asked to calculate a standard deviation for a sample you can use the method covered in the Numerical Summaries of Data 2 unit.
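To see how much difference the two definitions make for the salary sample, here is a short Python sketch (added for illustration only). It relies on the standard statistics module, where pstdev divides by n and stdev divides by n – 1.

```python
from statistics import pstdev, stdev

salaries = [12000, 26000, 35000, 24500, 18000, 15500, 28500, 18000, 54000, 43000,
            17500, 22000, 21500, 26000, 16000, 16500, 24500, 27500, 29000, 17000]

s_n = pstdev(salaries)          # divides by n, as used earlier in this unit
s_n_minus_1 = stdev(salaries)   # divides by n - 1

print(round(s_n, 4), round(s_n_minus_1, 4))
```

Dividing by n – 1 gives a slightly larger value (about £10,133 rather than £9,876 here), and so a slightly wider confidence interval. The difference shrinks as the sample size grows.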
ENDSECTION STARTSECTION=activity_2.htm= SECTION~
Seminar Questions
Seminar Question S10.1
A random sample of 100 records of a mail order company for March of this year revealed that the values of individual orders had a mean (x̄) of £65 and a standard deviation (s) of £4.
i) What is the standard error of the mean?
ii) Calculate a 95% confidence interval of the mean value of orders that the company received in March.
iii) Suppose you wanted a 90% confidence interval, what would the value of z be in the formula for the confidence interval?
iv) Calculate the 90% confidence interval of the mean value of orders that the company received in March.
v) In February, the mean value of orders was £68. Does there seem to have been a change from February to March? Explain your answer.
ENDSECTION STARTSECTION=activity_3.htm= SECTION~
Seminar Question S10.2
Fast food service companies try to devise wage plans that provide incentive and produce salaries for their managers that are competitive with corresponding positions in competing companies. A random sample of 12 unit managers for one company shows that they earn an average salary of £36,750 with a standard deviation of £3,100.
i) Calculate a 95% confidence interval for the mean salary of the company’s managers.
ii) Do the data suggest that the mean salary earned by the company’s unit managers differs from £38,500, which is the mean salary paid by a competitor firm? Explain your answer.
ENDSECTION STARTSECTION=activity_4.htm= SECTION~
Seminar Question S10.3
A random sample of 80 observations had x̄ = 14.1 and s = 2.6.
i) Find a 95% confidence interval for μ .
ii) Find a 99% confidence interval for μ .
iii) Which of these 2 confidence intervals is wider? What does this show about how the level of required accuracy (or confidence) affects the resulting confidence interval?
iv) Suppose the sample contained only 32 observations but x̄ = 14.1 and s = 2.6 were the same. Recalculate the 95% confidence interval with the smaller sample size.
v) What does the calculation in part (iv) show you about how the size of a sample affects the accuracy of the estimates?
ENDSECTION STARTSECTION=think_1.htm= SECTION~
How much do you know?
If you have understood the content of this unit you should have knowledge of, and be able to answer any questions relating to, the following points:
• you should know that sample results are used to estimate unknown population quantities: typically the sample mean (x̄) is used to estimate the population mean (μ);
• you should be able to discuss why sample estimates are not precise estimates and therefore why we need a measure of how accurate the estimates are;
• you should understand what is meant by a sampling distribution;
• you should know that, because of the Central Limit Theorem, the sampling distribution of the mean x̄ is normal with mean μ and standard deviation σ/√n;
• you should be able to calculate and interpret a confidence interval for an unknown population mean μ.
Extra Activities
Log on to the STX1110 Oasis page and attempt the quizzes in the extra section.
A guide on how to find the STX1110 Oasis quizzes is on page 16 of this book.
Do you want to know more?
For further examples and exercises, refer to Essential Quantitative Methods for Business, Management and Finance, third edition, by Les Oakshott, or any of the books on the STX1110 reading list.
ENDSECTION ENDCHAPTER
MATHEMATICS SECTION
Original author: Alison Megeney Revisions by: Thomas Bending
Alison Megeney
M1 Financial Mathematics 1 Page 157
CHAPTER=FM1
STARTSECTION=scope_1.htm= SECTION~
Financial Mathematics 1 Context
This unit introduces financial mathematics. In it we consider how investments and commodities gain or lose value over time. We will examine different forms of interest and investigate how these affect the worth of an investment over time. This will enable us to determine what return we can expect on an investment, or see how a commodity loses value. In the next unit we will build on these ideas to enable us to determine whether or not the returns of an investment plan make investing in it worthwhile. We come across examples of financial mathematics every day, whether it is the interest we earn on our savings, the interest we pay on a credit card or student loan, or the depreciation (loss in value) of our car. The financial mathematics you will consider in this unit will help you to interpret financial information from a variety of sources that you encounter in everyday life, and make decisions based on your evaluation of the financial small print. So if you are wondering whether your credit card is offering you the most competitive interest rate, or whether you would be better off moving your savings to another bank or building society, then this unit can inform your decision. ENDSECTION STARTSECTION=scope_2.htm= SECTION~
Objectives
Having worked through this unit, you should:
• understand the concept of simple and compound interest and be able to perform calculations;
• be aware of the differences between nominal and effective rates of interest and be able to calculate the effective rate (APR);
• understand the concept of straight line depreciation and be able to perform calculations.
ENDSECTION STARTSECTION=content_1.htm= SECTION~
Interest and Depreciation
What is interest? It is the amount earned on an investment, or the amount paid on a loan.
Credit card example:
The interest is the amount of extra money added to your credit card bill every month.
What is depreciation?
This is a loss of value of an investment over time.
Value of a car example:
Your car loses value over time. Different cars lose different proportions of their current value each year. Try looking on the web to see how a car of your choice loses value over time.
ENDSECTION STARTSECTION=content_2.htm= SECTION~
Useful Definitions
P = Principal amount.
This is the amount initially considered, i.e. the amount of money invested.
For example this is the amount you open your bank account with.
A = Accrued Amount.
This is the updated value of the principal amount after some fixed time.
e.g. This is your current bank balance. So the original amount plus the extra money earned.
i = Interest rate. The proportionate amount of money to be added to the principal amount.
e.g. This is usually a fixed percentage, say 5%=0.05.
n = The number of time periods.
This is the length of time (years, months, days) over which the money has been borrowed or invested.
e.g. This is the total number of times the interest is added on to your account every year.
ENDSECTION STARTSECTION=content_3.htm= SECTION~
Simple Interest Investment
The amount of interest added to your investment is calculated based on your principal amount, and not on the accrued amount (current value of your investment).
Simple Interest example
Suppose you have £200 which you invest at a simple interest rate of 10% per year. How much will have accrued after 3 years?
Year  Amount on which interest is calculated  Interest                         Amount accrued
1     £200                                    10% of £200 = 0.1 × 200 = £20    £200 + £20 = £220
2     £200                                    10% of £200 = 0.1 × 200 = £20    £220 + £20 = £240
3     £200                                    10% of £200 = 0.1 × 200 = £20    £240 + £20 = £260
So the amount accrued after 3 years is £260. This can be written in notation as A₃ = £260. ENDSECTION STARTSECTION=content_4.htm= SECTION~
Simple Interest Formula
Calculating the accrued amount year by year over n time periods can be a long process, so you are advised to use the formula
$$A_n = P \times [1 + (i \times n)]$$
This formula will be provided in the exam and does not need to be memorised.
Simple interest formula example
Suppose we have £200 invested at a simple interest rate of 10% per annum. How much will have accrued after 10 years? We will use Aₙ = P × [1 + (i × n)]. So P = 200, i = 10% = 0.1, and n = 10. This gives
$$A_{10} = 200 \times [1 + (0.1 \times 10)] = 200 \times [1 + 1] = 200 \times 2 = 400$$
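The simple interest formula translates directly into code. The Python sketch below is an illustrative addition (the function name simple_interest_amount is my own) and reproduces the two worked examples for £200 at 10% per year.

```python
def simple_interest_amount(principal, rate, periods):
    """Accrued amount A_n = P × [1 + (i × n)] under simple interest."""
    return principal * (1 + rate * periods)

# Worked examples from the text: £200 at 10% per year
after_3_years = round(simple_interest_amount(200, 0.10, 3), 2)
after_10_years = round(simple_interest_amount(200, 0.10, 10), 2)

print(after_3_years, after_10_years)
```

The results, £260 after 3 years and £400 after 10 years, match the table and the formula example. The rounding to 2 decimal places simply tidies up the small floating-point errors that computers introduce when working with decimals such as 0.1.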
Not sure about how to use brackets?
See Improve your maths by Bancroft and Fletcher for helpful tips. Now go and do Exercise M1.1
ENDSECTION STARTSECTION=activity_1.htm= SECTION~
Exercise M1.1 - Simple Interest Activity
Suppose you have £500 which you invest at a simple interest rate of 25% per year.
How much will have accrued after 4 years?
Year  Amount on which interest is calculated  Interest       Amount accrued
1     £                                       25% of £   =
2     £
3     £
4     £
What is the amount accrued after 4 years?
Write the amount accrued after 4 years in notation
How much has accrued after 10 years?
ENDSECTION STARTSECTION=content_5.htm= SECTION~
Compound interest investment
The amount of interest added to your investment is calculated on your accrued amount (the current value of your investment), and not on the principal amount.
Compound Interest example
Suppose you have £200 which you invest at a compound interest rate of 10% per year. How much will have accrued after 3 years?
Year | Amount on which interest is calculated | Interest | Amount accrued
1 | £200 | 10% of £200 = 0.1 × 200 = £20 | £200 + £20 = £220
2 | £220 | 10% of £220 = 0.1 × 220 = £22 | £220 + £22 = £242
3 | £242 | 10% of £242 = 0.1 × 242 = £24.20 | £242 + £24.20 = £266.20
So the amount accrued after 3 years is £266.20. This can be written in notation as, A3 = £266.20.
ENDSECTION STARTSECTION=content_6.htm= SECTION~
Compound interest formula
It is advised that you use the formula A_n = P × [1 + i]^n. This formula will be provided in the exam and does not need to be memorised.
Compound interest formula example
Suppose we have £200 invested at a compound interest rate of 10% per annum. How much will have accrued after 10 years? We will use A_n = P × [1 + i]^n with P = 200, i = 10% = 0.1 and n = 10. This gives A_10 = 200 × [1 + 0.1]^10 = 200 × [1.1]^10 = 200 × 2.5937425 = £518.75 to 2dp.
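As a cross-check, the year-by-year table and the compound interest formula can be compared in code. This is a minimal Python sketch; the name `compound_amount` is illustrative and not part of the module notes.

```python
def compound_amount(principal, rate, periods):
    """Accrued amount A_n = P * (1 + i)**n under compound interest."""
    return principal * (1 + rate) ** periods


# Year-by-year check: interest is charged on the current accrued amount.
amount = 200.0
for year in range(1, 4):
    interest = 0.10 * amount
    amount += interest
    print(f"Year {year}: interest £{interest:.2f}, accrued £{amount:.2f}")

# The formula example: £200 at 10% per annum for 10 years.
a10 = compound_amount(200, 0.10, 10)
print(f"A_10 = £{a10:.2f}")
```

The loop repeats the table one row at a time, while the function applies the formula in a single step; both routes should give the same accrued amount for the same number of years.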
Note:
On your calculator you should have one of the following function buttons: x^y, y^x, a^b, b^a or ^. To evaluate 5^6, follow these steps on your calculator:
• press 5
• press the x^y button
• press 6 and then =.
This produces an answer of 15625.
Need more help with powers (exponents)?
See Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshott, chapter 1 for further examples.
Now go and do Exercise M1.2
ENDSECTION STARTSECTION=activity_2.htm= SECTION~
Exercise M1.2 - Calculator Activity
Compute the following;
4^6 =
(1.2)^3 =
(1 + 0.1)^4 =
Now go and do Exercise M1.3
ENDSECTION STARTSECTION=activity_3.htm= SECTION~
Exercise M1.3 - Compound Interest Activity
Suppose you have £500 which you invest at a compound interest rate of 25% per year.
How much will have accrued after 4 years?
Year Amount on which interest is calculated
Interest Amount Accrued
1
£
2
£
3
£
4
£
What is the amount accrued after 4 years?
Write this in notation.
How much has accrued after 10 years? ENDSECTION STARTSECTION=content_7.htm= SECTION~
Nominal and Effective Interest Rates
Nominal rate
Rates of interest are often expressed or quoted as figures per year (per annum) even though they may be compounded (or act) over time periods shorter than a year. The given annual rate is known as the Nominal Annual rate.
Nominal rate examples
Credit cards
A store card advertisement states that it has a nominal annual rate of 24% that is compounded (acts) monthly. So it has a compound interest rate of 24% ÷ 12 = 2% per month.
Is 24% the true percentage rate that acts?
Finance package
A finance company pays for your goods and then charges you a nominal annual rate of interest on the amount you have borrowed.
ENDSECTION STARTSECTION=content_8.htm= SECTION~
Nominal and Effective Interest Rates
Effective Rate (or Actual Percentage Rate)
This is the rate of interest that actually acts on your investment or on the amount that you have borrowed. It is greater than the quoted nominal rate whenever interest is compounded more than once a year.
APR example
Suppose you buy a cooker which costs £200. The manager offers you a finance package which allows you to pay in one year's time, but the cost of the cooker will be increased by an annual nominal interest rate of 20% compounded quarterly. How much will you be paying for the cooker in 1 year? The interest rate which acts on the price of the cooker is 5% per quarter (20% annual ÷ 4 quarters = 5% per quarter).
Number of times interest has been added | Amount on which interest is calculated | Interest | Amount accrued
1 | £200 | 5% of £200 = 0.05 × 200 = £10 | £200 + £10 = £210
2 | £210 | 5% of £210 = 0.05 × 210 = £10.50 | £210 + £10.50 = £220.50
3 | £220.50 | 5% of £220.50 = 0.05 × 220.50 = £11.025 | £220.50 + £11.025 = £231.525
4 | £231.525 | 5% of £231.525 = 0.05 × 231.525 = £11.57625 | £231.525 + £11.57625 = £243.10125 = £243.10 to 2dp
So we will pay an extra £43.10 for the cooker if we accept the finance package offered. If we express this increase as a percentage we will obtain the effective interest rate or Actual Percentage Rate. So,
\[ \text{Effective Rate (or APR)} = \left(\frac{43.10}{200}\right) \times 100\% = 21.55\% \]
ENDSECTION STARTSECTION=content_9.htm= SECTION~
Annual Percentage Rate interest formula
It is advised that you use the formula,
\[ A.P.R = \left[1 + \left(\frac{\text{nominal annual rate}}{n}\right)\right]^{n} - 1 = \left[1 + \left(\frac{i}{n}\right)\right]^{n} - 1 \]
where n is the number of compoundings in one year.
APR Formula example
Suppose a credit card charges a nominal annual rate of 24% (0.24), which is compounded (acts) monthly, so n = 12. What is the Actual Percentage Rate?
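\[
\begin{aligned}
A.P.R &= \left[1 + \left(\frac{\text{nominal annual rate}}{n}\right)\right]^{n} - 1 \\
&= \left[1 + \left(\frac{0.24}{12}\right)\right]^{12} - 1 = (1 + 0.02)^{12} - 1 \\
&= 1.02^{12} - 1 = 1.2682418 - 1 \\
&= 0.2682418 = 26.82\% \text{ to 2dp.}
\end{aligned}
\]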
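The APR formula is easy to evaluate in code. The sketch below is a minimal Python illustration; the function name `effective_rate` is an assumption made for this example and does not come from the module notes. It reproduces the monthly credit card example and the quarterly cooker example.

```python
def effective_rate(nominal_rate, n):
    """Effective rate (APR) = [1 + (nominal_rate / n)]**n - 1.

    nominal_rate is the nominal annual rate as a decimal,
    n is the number of compoundings in one year.
    """
    return (1 + nominal_rate / n) ** n - 1


monthly = effective_rate(0.24, 12)    # credit card example
quarterly = effective_rate(0.20, 4)   # cooker finance package example
print(f"24% compounded monthly:   APR = {monthly * 100:.2f}%")
print(f"20% compounded quarterly: APR = {quarterly * 100:.2f}%")
```

With n = 1 the function returns the nominal rate itself, which shows that the nominal and effective rates only differ when interest is compounded more than once a year.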
Now go and do Exercise M1.4
ENDSECTION STARTSECTION=activity_4.htm= SECTION~
Exercise M1.4 - APR Activity 1
Suppose a credit card charges a nominal annual rate of 24%
What is the Actual Percentage Rate if the interest is compounded in the following ways?
Weekly
Daily
Hourly Now go and do Exercise M1.5
ENDSECTION STARTSECTION=activity_5.htm= SECTION~
Exercise M1.5 - APR Activity 2
Suppose a finance package has a nominal annual rate of 10% which is compounded (acts) quarterly. What is the Actual Percentage Rate?
\[ A.P.R = \left[1 + \left(\frac{\text{nominal annual rate}}{n}\right)\right]^{n} - 1 = \]
ENDSECTION STARTSECTION=content_10.htm= SECTION~
Depreciation
This is the loss in value of an item over time.
ENDSECTION STARTSECTION=content_11.htm= SECTION~
Straight Line Depreciation
This is the opposite of simple interest. A fixed proportion of the initial value is calculated once, and that fixed amount of money is then deducted each year (or month or day).
Straight Line Depreciation example
Suppose you have an item worth £200 whose value is depreciating at a rate of 10% per year. What value will be retained after 3 years using straight line depreciation?
Year | Amount on which deduction is calculated | Deduction | Amount retained
1 | £200 | 10% of £200 = 0.1 × 200 = £20 | £200 - £20 = £180
2 | £200 | 10% of £200 = 0.1 × 200 = £20 | £180 - £20 = £160
3 | £200 | 10% of £200 = 0.1 × 200 = £20 | £160 - £20 = £140
So the amount retained after 3 years is £140. This can be written in notation as D_3 = £140.
ENDSECTION STARTSECTION=content_12.htm= SECTION~
Straight Line Depreciation Formula
Calculating the amount retained after n time periods year by year can be a long process, so you should be able to use the formula D_n = P × [1 - (i × n)]. This formula will be provided in the exam and does not need to be memorised.
Straight Line Depreciation formula example
Suppose we have an item worth £200 whose value is (straight line) depreciating at a rate of 10% per annum. What value will be retained after 10 years? We will use the formula D_n = P × [1 - (i × n)] and let P = 200, i = 10% = 0.1 and n = 10. This gives D_10 = 200 × [1 - (0.1 × 10)] = 200 × [1 - 1] = 200 × 0 = £0.
Now go and do Exercise M1.6
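The straight line depreciation formula can be sketched in Python as follows. The function name `straight_line_value` is illustrative only and is not taken from the module notes.

```python
def straight_line_value(initial_value, rate, periods):
    """Retained value D_n = P * [1 - (i * n)] under straight line depreciation."""
    return initial_value * (1 - rate * periods)


# The worked examples: £200 depreciating at 10% per year.
d3 = straight_line_value(200, 0.10, 3)
d10 = straight_line_value(200, 0.10, 10)
print(f"D_3 = £{d3:.2f}")
print(f"D_10 = £{d10:.2f}")
```

Note that the formula, as written, gives a negative value once i × n is greater than 1. This sketch applies the formula directly and does not guard against that case.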
ENDSECTION STARTSECTION=activity_6.htm= SECTION~
Exercise M1.6 - Straight Line Depreciation Activity
Suppose you have a watch worth £500 whose value (straight line) depreciates at a rate of 15% per year.
What is the watch’s retained value after 4 years?
Year Amount on which deduction is calculated
Deduction Amount retained
1
£
2
£
3
£
4
£
Write the retained value after 4 years in notation
How much of the watch’s value is retained after 7 years?
ENDSECTION STARTSECTION=content_13.htm= SECTION~
Reduced Balance Depreciation (RBD)
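This is the opposite of compound interest: the deduction in each time period is calculated on the current (reduced) value of the item, and not on its initial value.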
RBD example
Suppose that a car's value is depreciating by 15% per annum. If it initially cost £1000, what is its value after 3 years?
Year | Amount on which deduction is calculated | Deduction | Amount retained
1 | £1000 | 15% of £1000 = 0.15 × 1000 = £150 | £1000 - £150 = £850
2 | £850 | 15% of £850 = 0.15 × 850 = £127.50 | £850 - £127.50 = £722.50
3 | £722.50 | 15% of £722.50 = 0.15 × 722.50 = £108.375 | £722.50 - £108.375 = £614.125
So the depreciated value of the car after 3 years is £614.125. ENDSECTION STARTSECTION=content_14.htm= SECTION~
Reduced Balance Depreciation Formula
It is advised that you use the formula D_n = P × [1 - i]^n, where D_n is the depreciated value of the item after n time periods. This formula will be provided in the exam and does not need to be memorised.
RBD Formula example
If £1000 depreciates at a rate of 15% per annum, then after 3 years we have a retained value of D_3 = 1000 × [1 - 0.15]^3 = 1000 × [0.85]^3 = 1000 × 0.614125 = £614.13 to 2 decimal places.
Now go and do Exercise M1.7
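The reduced balance calculation can be checked both ways in code: one year at a time, and with the formula. This is a minimal Python sketch; the name `reduced_balance_value` is an illustrative choice.

```python
def reduced_balance_value(initial_value, rate, periods):
    """Retained value D_n = P * (1 - i)**n under reduced balance depreciation."""
    return initial_value * (1 - rate) ** periods


# Year-by-year check: each deduction is taken from the current value.
value = 1000.0
for year in range(1, 4):
    deduction = 0.15 * value
    value -= deduction
    print(f"Year {year}: deduction £{deduction:.3f}, retained £{value:.3f}")

d3 = reduced_balance_value(1000, 0.15, 3)
print(f"D_3 = £{d3:.2f}")
```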
ENDSECTION STARTSECTION=activity_7.htm= SECTION~
Exercise M1.7 - RBD Activity
Suppose that an investment’s value is depreciating by 20% per annum.
If its initial value is £2500, what is its value after 3 years?
Year Amount on which deduction is calculated
Deduction Amount retained
1
£2500 20% of £2500 =
2
3
What is the depreciated value of the investment after 10 years?
ENDSECTION STARTSECTION=activity_8.htm= SECTION~
Seminar Questions
Seminar Question M1.1 - Simple Interest
Suppose you have £500 which you invest at a simple interest rate of 20% per year.
How much will have accrued after 3 years?
Year Amount on which interest is calculated
Interest Amount accrued
1
£ 20% of £ =
2
£
3
£
So the amount accrued after 3 years is £ . Write this in notation.
Use the simple interest formula to calculate how much has accrued after 10 years.
ENDSECTION STARTSECTION=activity_9.htm= SECTION~
Seminar Question M1.2 - Compound Interest Question
Suppose you have £500 which you invest at a compound interest rate of 20% per year.
How much will have accrued after 3 years?
Year Amount on which interest is calculated
Interest Amount accrued
1
£ 20% of £ =
2
£
3
£
So the amount accrued after 3 years is £ .
Write this in notation.
How much will have accrued after 10 years?
ENDSECTION STARTSECTION=activity_10.htm= SECTION~
Seminar Question M1.3 - APR Question
Suppose a credit card charges a nominal annual rate of 20%
What is the Actual Percentage Rate if the interest is compounded in the following ways?
Quarterly, n = 4
\[ A.P.R = \left[1 + \left(\frac{\text{nominal annual rate}}{n}\right)\right]^{n} - 1 = \]
Monthly, n = 12
\[ A.P.R = \left[1 + \left(\frac{\text{nominal annual rate}}{n}\right)\right]^{n} - 1 = \]
Daily
Hourly
Comment on your results. ENDSECTION STARTSECTION=activity_11.htm= SECTION~
Seminar Question M1.4 - Reduced Balance Depreciation Question
Suppose that an investment’s value is depreciating (by reduced balance) by 10% per annum.
If its initial value is £500, what is its value after 3 years?
Year Amount on which deduction is calculated
Deduction Amount retained
1
£ 10% of £500 =
2
3
What is the depreciated value of the investment after 10 years?
ENDSECTION STARTSECTION=activity_12.htm= SECTION~
Seminar Question M1.5 - Straight Line Depreciation Question
Suppose that an investment’s value is depreciating (straight line) by 10% per annum.
If its initial value is £500, what is its value after 3 years?
Year Amount on which deduction is calculated
Deduction Amount retained
1
£ 10% of £500 =
2
3
What is the depreciated value of the investment after 10 years? ENDSECTION STARTSECTION=think.htm= SECTION~
How much do you know?
If you have understood the content of this unit you should have knowledge of, and be able to answer any questions relating to, all of the following points:
• examples of where commodities either gain interest or depreciate in value;
• the differences between compound and simple interest;
• how to calculate accrued amounts using compound and simple interest environments;
• the differences between nominal and effective rates of interest;
• how to calculate an effective rate (or Actual Percentage Rate, APR) from a given nominal annual rate;
• the differences between straight line and reduced balance depreciation;
• how to calculate the depreciated value of an item using straight line or reduced balance depreciation.
Extra Activities
Log on to the STX1110 Oasis page and attempt the quizzes in the extra section. A guide on how to find the STX1110 Oasis quizzes is on page 16 of this book.
Do you want to know more?
For further examples and exercises, refer to Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshott, or any of the books on the STX1110 reading list.
ENDSECTION ENDCHAPTER
CHAPTER=FM2
STARTSECTION=scope_1.htm= SECTION~
Financial Maths 2 Context
In the last unit we examined how investments and commodities gained or lost value over time, using simple and compound interest environments and reduced balance and straight line depreciation. In this unit we will introduce a variety of techniques to enable us to determine whether or not the return of an investment plan makes investing in it worthwhile. The techniques introduced in this unit will enable us to evaluate how much money to invest now to get an expected financial return in the future. Perhaps we need this return to cover the cost of a planned luxury holiday in a few years' time, or to cover the instalments of a finance agreement. In addition, these techniques will enable us to choose between investment plans based on their projected returns. ENDSECTION STARTSECTION=scope_2.htm= SECTION~
Objectives
Having worked through this unit, you should:
• understand the concept of present values and be able to perform calculations for a single value and for a set of instalments;
• understand the concept of net present values and be able to perform calculations;
• be able to determine whether or not an investment is worthwhile;
• be able to compare two or more capital investments and determine which, if any, are worthwhile.
ENDSECTION STARTSECTION=content_1.htm= SECTION~
Present Values and Investment Appraisals
Suppose we have invested £100 at a compound interest rate of 10% per annum. In one year we will have £100 × (1 + 0.1) = £110.
We could pose a different question and ask how much we need to invest to get a return of £110 in a year’s time. From the calculation above we see that if we need £110 in a year’s time then we must have £100 to invest now so that in a year’s time with the accrued interest our investment will have increased to £110. So to have a return of £110 in one year with an interest rate of 10% you need to invest £100 now. This present value tells us how much this future return is worth to us now in the present. So present value, P, of a return of £110 in 1 year at an interest rate of 10% is
\[ P = \pounds 100 = \frac{\pounds 110}{1 + 0.1} \]
Definitions
P = Present value.
This is the amount of money we need to invest now to receive a return of £A in n time periods. This is our initial investment.
A = Amount payable in n time periods.
This is the return on our investment, that is, our initial investment plus the interest earned on it.
i = Investment rate (discount rate).
This is the interest rate that acts on our investment. This is usually quoted as a percentage, say 10%, but should always be expressed as a decimal when used in calculations.
n = The number of time periods.
This is the length of time that the interest has to act on our investment. So it is the number of times that interest is added to our initial investment.
ENDSECTION STARTSECTION=content_2.htm= SECTION~
Present Value Formula
The present value of £A at an interest rate i for n time periods is:
\[ P = \frac{A}{(1+i)^{n}} \]
This is equivalent to
\[ P = A \times \frac{1}{(1+i)^{n}} = A \times \text{Discount factor} \]
This version of the formula is often used by accountants. They use tables of calculated discount factors for particular values of i.
Present Value example
Suppose we need a return of £1200 in 4 years' time to pay for a planned holiday. If we have a savings account with a compound interest rate of 8% per annum, how much money would we need to invest now so that over the 4 year period it would grow to £1200 and cover the cost of the holiday?
In other words, what is the present value of a return of £1200 in 4 years' time at an interest rate of 8% per annum? Let us use the present value formula
\[ P = \frac{A}{(1+i)^{n}} \]
where
A = Amount payable in n time periods = £1200
i = Investment rate = 8% = 0.08
n = The number of time periods = 4
Substituting in our values we obtain,
\[
\begin{aligned}
P &= \frac{A}{(1+i)^{n}} = \frac{\pounds 1200}{(1+0.08)^{4}} \\
&= \frac{\pounds 1200}{(1.08)^{4}} = \pounds 1200 \div 1.360489 \\
&= \pounds 882.04
\end{aligned}
\]
So this tells us you need to invest £882.04 now at 8% per annum to obtain a return of £1200 on your initial investment in 4 years' time.
Now go and do Exercise M2.1
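The present value formula translates directly into code. The following is a minimal Python sketch; the function name `present_value` is illustrative and not part of the module notes.

```python
def present_value(amount, rate, periods):
    """Present value P = A / (1 + i)**n of an amount A due in n time periods."""
    return amount / (1 + rate) ** periods


# £110 due in 1 year at 10%, and the holiday example: £1200 due in 4 years at 8%.
one_year = present_value(110, 0.10, 1)
holiday = present_value(1200, 0.08, 4)
print(f"P = £{one_year:.2f}")
print(f"P = £{holiday:.2f}")
```

Dividing by (1 + i)^n here is the same operation as multiplying by the discount factor 1/(1 + i)^n.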
ENDSECTION STARTSECTION=activity_1.htm= SECTION~
Exercise M2.1 - Present Value Activity
Calculate the present value of a return of £2500, after being invested at 15% per annum for 5 years. Use the present value formula below.
\[ P = \frac{A}{(1+i)^{n}} \]
where
A =
i =
n =
so
\[ P = \frac{A}{(1+i)^{n}} = \]
So this tells us you need to invest £ now at 15% per annum to obtain a return of £2500 in 5 years' time.
ENDSECTION STARTSECTION=content_3.htm= SECTION~
The Present Value of a Set of Instalments
You are interested in buying a home cinema system. It costs a total of £1600. The salesperson offers you the opportunity to spread the cost of the purchase over 3 years. They offer you a deal where you pay £1000 now, followed by 3 subsequent annual payments of £200 at the end of each year.
If you have a savings account with an interest rate of 8%, how much is the home cinema system worth to you now?
How much money do you need to invest now to be able to cover the cost of the instalments as they become due?
What is the present value of the goods?
Present Value of a set of instalments
Time the payment is due | Payment due | Number of time periods (n years) | Present value
Now | £1000 | n = 0 | £1000
End of first year | £200 | n = 1 | £200 ÷ (1 + 0.08)^1 = £185.19
End of second year | £200 | n = 2 | £200 ÷ (1 + 0.08)^2 = £171.47
End of third year | £200 | n = 3 | £200 ÷ (1 + 0.08)^3 = £158.77
Present value of goods = £1000 + £185.19 + £171.47 + £158.77 = £1515.43
Amount paid = £1600
So £1515.43 invested now would pay off the instalments as they became due.
Now go and do Exercise M2.2
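The instalment table can be reproduced with a short loop. This is a minimal Python sketch under the assumption that each present value is rounded to the nearest penny before the column is totalled, as in the table; the names `present_value` and `payments` are illustrative.

```python
def present_value(amount, rate, periods):
    """Present value P = A / (1 + i)**n of an amount A due in n time periods."""
    return amount / (1 + rate) ** periods


# (years until the payment is due, payment in pounds)
payments = [(0, 1000), (1, 200), (2, 200), (3, 200)]
rate = 0.08

# Round each present value to pence, then add up the column.
present_values = [round(present_value(amount, rate, n), 2) for n, amount in payments]
total = round(sum(present_values), 2)

for (n, amount), pv in zip(payments, present_values):
    print(f"n = {n}: payment £{amount}, present value £{pv:.2f}")
print(f"Present value of goods = £{total:.2f}")
```

If the present values are added before rounding, the total can differ from the tabulated one by a penny.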
ENDSECTION STARTSECTION=activity_2.htm= SECTION~
Exercise M2.2 - Present Value of a Set of Instalments Activity
Suppose that a stereo costs £1800 over a 3 year period. The payment consists of an initial down payment of £1500 and then a single payment of £100 every year for the next 3 years.
If you have an interest rate of 10% what is the present value of goods?
Time payment due | Payment due | Number of time periods (n years) | Present value
Now | £1500 | n = 0 | £1500
End of first year | £100 | n = 1 | £100 ÷ (1 + 0.10)^1 = £
End of second year | £100 | n = 2 | £100 ÷ (1 + 0.10)^2 = £
End of third year | £100 | n = 3 | £
Total =
Present value of goods = Amount paid = £1800. So £ invested now would pay off the instalments as they became due. ENDSECTION STARTSECTION=content_4.htm= SECTION~
Capital Investments
This is a project consisting of an initial cash outlay, and estimated inflows and outflows of cash for the life of the project. Discounted cash flow (DCF) techniques can be used to evaluate capital expenditure projects.
Comparing Investments
Two DCF methods are:
1. Net Present Value method (NPV) 2. Internal Rate of Return (IRR)
See Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshott, chapter 3 for details and examples of IRR. ENDSECTION STARTSECTION=content_5.htm= SECTION~
Net Present Value (NPV)
To evaluate a project calculate all the present values associated with it.
Net Present Value example
Suppose a project has the cash flows shown below. We calculate the net cash flow for each year, use our discount rate to calculate the corresponding present values, and then add them together to obtain the NPV. Suppose that the interest rate is 18.5%.
Year | Cash Inflow | Cash Outflow
0 | £0 | £12000
1 | £8000 | £8500
2 | £12000 | £3000
3 | £10000 | £1500
4 | £6500 | £1500
Year | Cash Inflow | Cash Outflow | Net Cash Flow = Inflow - Outflow
0 | £0 | £12000 | -£12000
1 | £8000 | £8500 | -£500
2 | £12000 | £3000 | £9000
3 | £10000 | £1500 | £8500
4 | £6500 | £1500 | £5000
Year | Net Cash Flow | Discount factor | Present Value = NCF × Df
0 | -£12000 | 1/(1 + 0.185)^0 = 1 | -£12000
1 | -£500 | 1/(1 + 0.185)^1 = 0.8439 | -£500 × 0.8439 = -£421.95
2 | £9000 | 1/(1 + 0.185)^2 = 0.7121 | £9000 × 0.7121 = £6408.90
3 | £8500 | 1/(1 + 0.185)^3 = 0.6010 | £8500 × 0.6010 = £5108.50
4 | £5000 | 1/(1 + 0.185)^4 = 0.5071 | £5000 × 0.5071 = £2535.50
Total = £1630.95
So the Net Present Value is -£12000 - £421.95 + £6408.90 + £5108.50 + £2535.50 = £1630.95.
We say that the project makes a profit if NPV > 0.
We say that the project breaks even if NPV = 0.
We say that the project makes a loss if NPV < 0. ENDSECTION STARTSECTION=content_6.htm= SECTION~
Internal Rate of Return (IRR)
This method of DCF determines the rate of interest, i, at which the NPV=0.
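To make the idea concrete, the sketch below computes the NPV of the net cash flows from the Net Present Value example and then searches for the rate at which the NPV is zero. It is a minimal Python illustration only: the function names are hypothetical, and the bisection search is one simple numerical approach chosen for this example, not a method prescribed by the module notes.

```python
def npv(net_cash_flows, rate):
    """Sum of NCF_n / (1 + rate)**n, where n is the year (the list index)."""
    return sum(ncf / (1 + rate) ** n for n, ncf in enumerate(net_cash_flows))


def irr_by_bisection(net_cash_flows, low=0.0, high=1.0, tolerance=1e-9):
    """Find a rate between low and high at which the NPV is zero.

    Assumes the NPV is positive at `low` and negative at `high`.
    """
    if not (npv(net_cash_flows, low) > 0 > npv(net_cash_flows, high)):
        raise ValueError("NPV must change sign between low and high")
    while high - low > tolerance:
        mid = (low + high) / 2
        if npv(net_cash_flows, mid) > 0:
            low = mid   # NPV still positive, so the IRR is higher
        else:
            high = mid  # NPV negative (or zero), so the IRR is lower
    return (low + high) / 2


# Net cash flows for years 0 to 4 from the Net Present Value example.
flows = [-12000, -500, 9000, 8500, 5000]
npv_at_18_5 = npv(flows, 0.185)
irr = irr_by_bisection(flows)
print(f"NPV at 18.5% = £{npv_at_18_5:.2f}")
print(f"IRR is approximately {irr * 100:.2f}%")
```

The NPV printed here uses unrounded discount factors, so it differs by a few pence from a hand calculation that rounds each discount factor to 4 decimal places.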
See Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshott, chapter 3 for further details and examples.
Comparison of projects
Projects can be compared using NPV and IRR techniques. Usually the situation is more complicated than just comparing the results of the two methods. For example, one project may involve a larger initial outlay, and this should be taken into consideration.
Net Present Value example
Suppose there are two possible projects A and B. Both projects generate a total of £10000 over 4 years and both cost an initial payment of £6000, which is a single cash outflow. Their estimated inflows are detailed below.
Year Project A Project B
1 £4000 £1000
2 £3000 £2000
3 £2000 £3000
4 £1000 £4000
If the money has no time value the projects are identical.
If the interest rate is 10 % (discount rate is 10 %) which project is best?
Evaluation of Project A using the Net Present Value method
Year | Cash Inflow £ | Cash Outflow £ | Net cash flow £ | Discount factor (4dp) | Present Value £
0 | 0 | 6000 | -6000 | 1 | -6000
1 | 4000 | 0 | 4000 | 1/1.1 = 0.9091 | 3636.40
2 | 3000 | 0 | 3000 | 1/1.1^2 = 0.8264 | 2479.20
3 | 2000 | 0 | 2000 | 1/1.1^3 = 0.7513 | 1502.60
4 | 1000 | 0 | 1000 | 1/1.1^4 = 0.6830 | 683.00
Net present value of project A = Total of the present value column = £(-6000 + 3636.40 + 2479.20 + 1502.60 + 683.00) = £2301.20
Project A's NPV is positive and hence makes a profit.
Evaluation of Project B using the Net Present Value method
Year | Cash Inflow £ | Cash Outflow £ | Net cash flow £ | Discount factor | Present Value £
0 | 0 | 6000 | -6000 | 1 | -6000
1 | 1000 | 0 | 1000 | 1/1.1 = 0.9091 | 909.09
2 | 2000 | 0 | 2000 | 1/1.1^2 = 0.8264 | 1652.89
3 | 3000 | 0 | 3000 | 1/1.1^3 = 0.7513 | 2253.94
4 | 4000 | 0 | 4000 | 1/1.1^4 = 0.6830 | 2732.05
Net present value of project B = £(-6000 + 909.09 + 1652.89 + 2253.94 + 2732.05) = £1547.97
Project B's NPV is positive and hence makes a profit.
Which project makes the biggest profit? Which would you invest in?
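The two evaluations can be repeated side by side in code. This is a minimal Python sketch; the function name `npv` and the list layout (year 0 first) are illustrative choices. It uses unrounded discount factors, so its results may differ by a few pence from totals worked by hand with rounded figures.

```python
def npv(net_cash_flows, rate):
    """Sum of NCF_n / (1 + rate)**n, where n is the year (the list index)."""
    return sum(ncf / (1 + rate) ** n for n, ncf in enumerate(net_cash_flows))


# Net cash flows for years 0 to 4: a £6000 outlay followed by the inflows.
project_a = [-6000, 4000, 3000, 2000, 1000]
project_b = [-6000, 1000, 2000, 3000, 4000]

for rate in (0.0, 0.10):
    print(f"Discount rate {rate:.0%}: "
          f"NPV of A = £{npv(project_a, rate):.2f}, "
          f"NPV of B = £{npv(project_b, rate):.2f}")

npv_a = npv(project_a, 0.10)
npv_b = npv(project_b, 0.10)
```

At a discount rate of 0% the two projects have the same NPV, which matches the remark that they are identical if money has no time value. At 10% the project that receives its larger inflows earlier has the larger NPV.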
ENDSECTION STARTSECTION=activity_3.htm= SECTION~
Seminar Questions
Seminar Question M2.1 - Present Value Question
Calculate the present values of the following:-
a) A return of £1000 payable in 5 years time with an interest rate of 10%.
b) A return of £5000 payable in 18 months' time if the nominal annual rate is 24% and it is compounded monthly.
c) An initial payment of £500 now, and £100 payable at the end of each year for the next three years. Suppose that you have an interest rate of 10%. What is the present value of the goods?
Amount | Number of time periods (n years) | Time payment due | Present value
£500 | n = 0 | Now | £
£100 | n = 1 | End of first year | £
£100 | n = 2 | End of second year | £
£100 | n = 3 | End of third year | £
Total =
Present value of goods =
Amount paid =
So £ invested now would pay off the instalments as they became due.
ENDSECTION STARTSECTION=activity_4.htm= SECTION~
Seminar Question M2.2 - Net Present Value Question
Suppose we have the following cash flow with an associated project, A, and that the discount rate is 15.5%. Calculate the NPV of project A.
Year Cash Inflow Cash Outflow Net Cash Flow = Inflow - Outflow
0 £ 0 £10000 £
1 £ 8000 £ 1000 £
2 £ 8000 £ 1000 £
3 £ 9000 £ 2000 £
4 £ 6500 £ 3000 £
Year | Net Cash Flow | Discount factor 1/(1 + i)^n | Present Value = NCF × Df
0 | £ | 1 | £
1 | £ | | £
2 | £ | | £
3 | £ | | £
4 | £ | | £
Net Present Value of Project A =
If another project has an NPV of £2437 which of the projects would you prefer?
ENDSECTION STARTSECTION=think_1.htm= SECTION~
How much do you know?
If you have understood the content of this unit you should have knowledge of, and be able to answer questions relating to, all of the following points:
• how to calculate a present value of a single return or for a set of instalments;
• the concept of a present value or worth of an investment;
• what makes up a capital investment;
• methods of comparing capital investments;
• how to evaluate the Net Present Value (NPV) of an investment plan;
• the relationship between the NPV of an investment and profit and loss.
Extra Activities
Log on to the STX1110 Oasis page and attempt the quizzes in the extra section.
A guide on how to find the STX1110 Oasis quizzes is on page 16 of this book.
Do you want to know more?
For further examples and exercises, refer to Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshott, or any of the books on the STX1110 reading list.
ENDSECTION ENDCHAPTER
CHAPTER=INDEXNO
STARTSECTION=scope_1.htm= SECTION~
Index Numbers Context
In this unit we will introduce index numbers as a method of standardising economic commodities in order to make sensible comparisons. The commodities we wish to compare may be measured in different ways and units. For example, production figures may be measured in different ways for different types of industries. Hence we must standardise values in order to make comparisons between them. The methods we encounter could also help us identify statistical trends. A company may want to investigate the growth in production, or the loss in value of its stock. It would be possible to identify the rate of growth in production per month using a series of index numbers calculated over a period of time. You may have heard the term index before, and know of examples of indices, such as the FTSE 100 index, the retail price index, the output of production index, and index linked investments. In the next unit we will build on the ideas and methods in this unit to enable us to compare economic commodities made up of components.
ENDSECTION STARTSECTION=scope_2.htm= SECTION~
Objectives
Having worked through this unit you should be able to:
• investigate how an index number reflects a change in prices or quantity;
• compute index numbers;
• interpret index numbers;
• determine which index is most suitable in a given situation and justify your choice;
• discuss their limitations.
ENDSECTION STARTSECTION=content_1.htm= SECTION~
Index Numbers
What are index numbers?
What do index numbers enable us to do?
Index numbers are a method of standardising economic commodities, for example prices of products or wages, so that they can be compared over time. The index number (index relative) measures the percentage change in some economic commodity over time.
What is an Economic commodity?
• Prices
• Wages
• Production figures
Types of index number
There are two main types of index, price and quantity. A price index measures changes in the cost of a commodity from one time period to another. An example is the Retail Price Index, which measures change in the cost of items of expenditure of the average household. A quantity index measures changes in the quantities of a commodity from one time period to another. An example is the productivity index, which measures the change in productivity of a group of workers.
Representation of index numbers
Index numbers are expressed in terms of a base of 100, like percentages. An index number of value 100 represents the original or base value. An index number above 100 represents an increase, and an index number below 100 represents a decrease. An increase of 50% produces an INDEX NUMBER of 150. A decrease of 10% produces an INDEX NUMBER of 90.
Now go and do Exercise M3.1
ENDSECTION STARTSECTION=activity_1.htm= SECTION~ Exercise M3.1
What index number would represent the following?
An increase of 60%. A decrease of 25%.
Now go and do Exercise M3.2
ENDSECTION STARTSECTION=activity_2.htm= SECTION~ Exercise M3.2
What percentage change is represented by the following index numbers:
Index number = 115.
Index number = 83. ENDSECTION STARTSECTION=content_2.htm= SECTION~
Construction of an Index number
Petrol prices example
Petrol costs 49p per litre in Oct 1995 and 52p per litre in Dec 1995.
• Percentage price increase =
• Index number =
Index number formula
$$\text{Index Number (Index Relative)} = \frac{V_n}{V_0} \times 100$$
This formula will be provided in the exam and does not need to be memorised.
Petrol prices example
Returning to our petrol price example, we are given that petrol costs 49p per litre in Oct 1995, and 52p per litre in Dec 1995. The base time is usually the earliest time at which we collected data, or the starting point of our data values. Here the base time is Oct 1995, and so V0 = 49p. The other time is Dec 1995, and so Vn = 52p.
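$$I_{\text{Dec}}(\text{Oct} = 100) = \frac{V_n}{V_0} \times 100 = \frac{52}{49} \times 100 = 106.12$$
Notation
This is expressed in either of the following ways.
$I_{\text{Dec}}(\text{Oct} = 100) = 106.12$, where Dec is the other time, Oct is the base time, and 106.12 is the index number representing the change in prices in Dec relative to prices in Oct.
$I_{\text{DEC/OCT}} = 106.12$, where the subscript reads other time / base time, and 106.12 is again the index number representing the change in prices in Dec relative to prices in Oct.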
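The index number formula translates directly into code. The sketch below applies it to the petrol prices; the function name `index_relative` is an illustrative choice, not something defined in the module.

```python
def index_relative(current_value, base_value):
    """Index number (index relative): V_n / V_0 * 100."""
    return current_value / base_value * 100


# Petrol example: base time Oct 1995 (49p), other time Dec 1995 (52p)
petrol_index = index_relative(current_value=52, base_value=49)
print(round(petrol_index, 2))
print(f"Percentage increase: {petrol_index - 100:.2f}%")
```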
Now go and do Exercise M3.3
ENDSECTION STARTSECTION=activity_3.htm= SECTION~
Exercise M3.3 - Quantity Index Activity
The maths group employed 35 lecturers in April 92 and 32 in April 95.
What is the base time, and what value does it take?
What is the current time, and what value does it take?
Calculate the quantity index that represents this change using the index number formula, $\frac{V_n}{V_0} \times 100$.
ENDSECTION STARTSECTION=content_3.htm= SECTION~
Time Series of relatives
How do we investigate the growth of economic commodities over time?
How do the values of an index number (relative) change over time?
Two possible methods are:
1. Fixed base relatives
2. Chain base relatives
ENDSECTION STARTSECTION=content_4.htm= SECTION~
Fixed Base Index Numbers
Each relative is calculated on the same fixed time point (the same base). This is a suitable method for commodities whose nature remains unchanged over time.
Example
The production figures of a bottle factory for a five month period are given below.

| | Jan | Feb | Mar | Apr | May |
|---|---|---|---|---|---|
| Production | 4563 | 4254 | 4841 | 4644 | 5290 |

[Figure: production figures plotted against time]
If we plot a graph of the production figures against time we see the production figures generally increase. Let us investigate how the productivity of the factory changes over time by comparing the monthly production figures to a fixed point in time, say March. So March will be the base time. Using this method we compare January’s production to March’s production, and then we compare February’s production to March’s production, and so on. We do this using the index number formula, calculating the production index for our series of data values. As the base time is March we know that the value of V0 is the production figure for March, so V0 = 4841. This value will remain constant throughout our calculation. The value of Vn is the production figure of the month we are comparing to March. Hence,
$$\text{Index Number (base March)} = \frac{V_n}{V_0} \times 100 = \frac{\text{Monthly production figure}}{4841} \times 100$$
Using this we obtain the following.
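| | Jan | Feb | Mar | Apr | May |
|---|---|---|---|---|---|
| Production | 4563 | 4254 | 4841 | 4644 | 5290 |
| Fixed base relative | $\frac{4563}{4841} \times 100 =$ | $\frac{4254}{4841} \times 100 =$ | $\frac{4841}{4841} \times 100 =$ | $\frac{4644}{4841} \times 100 =$ | $\frac{5290}{4841} \times 100 =$ |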
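The same calculation can be carried out for the whole series at once. The following is a minimal sketch, assuming the production figures are held in a dictionary keyed by month; the helper name `fixed_base_relatives` is hypothetical.

```python
production = {"Jan": 4563, "Feb": 4254, "Mar": 4841, "Apr": 4644, "May": 5290}


def fixed_base_relatives(values, base_period):
    """Index of every period relative to one fixed base period, to 1 d.p."""
    base_value = values[base_period]          # V_0 stays constant throughout
    return {period: round(value / base_value * 100, 1)
            for period, value in values.items()}


march_base = fixed_base_relatives(production, "Mar")
for month, relative in march_base.items():
    print(month, relative)
```

The base month always comes out as 100, which is a quick check that the right value has been used for V0.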
Now go and do Exercise M3.4
ENDSECTION STARTSECTION=activity_4.htm= SECTION~
Exercise M3.4
Calculate the series of fixed base index numbers that represent the following data:
| | Jan | Feb | Mar | Apr | May |
|---|---|---|---|---|---|
| Production | 4431 | 4542 | 4650 | 4781 | 4892 |
| Fixed base relative (April) | | | | | |
ENDSECTION STARTSECTION=content_5.htm= SECTION~
Changing the base of fixed base relatives
Suppose you have an index number relative to some base (the old base), and you wish to change it to a different base (the new base). You could recalculate every relative from the original data, or you could use the following formula, which rescales the index numbers so that they are relative to the new base.
$$\text{Index number (New base)} = \frac{\text{Old base value}}{\text{New base value}} \times \text{Index number (Old base)}$$
Example
Let’s change the base to January.
$$\text{Index number (Base Jan)} = \frac{4841}{4563} \times \text{Index number (Base March)} = 1.06 \times \text{Index number (Base March)}$$

| | Jan | Feb | Mar | Apr | May |
|---|---|---|---|---|---|
| Production | 4563 | 4254 | 4841 | 4644 | 5290 |
| Fixed base index March (old base) | 94.3 | 87.9 | 100 | 95.9 | 109.3 |
| Fixed base index Jan (new base) | 1.06 × 94.3 = 100 | 1.06 × 87.9 = 93.2 | 1.06 × 100 = 106 | 1.06 × 95.9 = 101.7 | 1.06 × 109.3 = 115.9 |
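As a check on the rebasing formula, the sketch below rescales the March-based relatives to a January base and compares them with relatives recalculated directly from the production figures. It works with unrounded values, so its results may differ in the last decimal place from hand calculations that use the rounded factor 1.06. The variable names are illustrative.

```python
production = {"Jan": 4563, "Feb": 4254, "Mar": 4841, "Apr": 4644, "May": 5290}

# Relatives on the old base (March), kept unrounded
march_base = {m: v / production["Mar"] * 100 for m, v in production.items()}

# Rescaling factor: old base value / new base value
factor = production["Mar"] / production["Jan"]
jan_base_rescaled = {m: i * factor for m, i in march_base.items()}

# Direct recalculation on the new base (January) for comparison
jan_base_direct = {m: v / production["Jan"] * 100 for m, v in production.items()}

for m in production:
    print(m, round(jan_base_rescaled[m], 1), round(jan_base_direct[m], 1))
```

Both routes give the same relatives, which is why the formula can be used when only the old index numbers are available.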
Now go and do Exercise M3.5
ENDSECTION STARTSECTION=activity_5.htm= SECTION~
Exercise M3.5
Change the base to January.
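$$\text{Index number (Base Jan)} = \frac{\text{Old base}}{\text{New base}} \times \text{Index number (Base April)}$$

| | Jan | Feb | Mar | Apr | May |
|---|---|---|---|---|---|
| Production | 4431 | 4542 | 4650 | 4781 | 4892 |
| Fixed base relative (April) | | | | | |
| Fixed base relative (Jan) | | | | | |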
ENDSECTION STARTSECTION=content_6.htm= SECTION~
Chain base relative
Each relative is calculated with respect to the immediately preceding time point. We use chain base index numbers when the nature of the commodity is rapidly changing.
Example
The sales figures for a mobile phone manufacturer are given below.

| | Jan | Feb | Mar | Apr | May | June |
|---|---|---|---|---|---|---|
| Sales | 2150 | 2660 | 3324 | 4156 | 5160 | 6300 |

[Figure: sales figures plotted against time]

If we plot a graph of the sales figures against time we see that they increase sharply: June’s sales are nearly three times those of January. Because the figures are growing so rapidly, it is more sensible to investigate the rate of growth by comparing each month’s sales with those of the immediately preceding month.
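| | Jan | Feb | Mar | Apr | May | June |
|---|---|---|---|---|---|---|
| Sales | 2150 | 2660 | 3324 | 4156 | 5160 | 6300 |
| Chain base relative | | $\frac{2660}{2150} \times 100$ | $\frac{3324}{2660} \times 100$ | $\frac{4156}{3324} \times 100$ | $\frac{5160}{4156} \times 100$ | $\frac{6300}{5160} \times 100$ |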
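A chain base series pairs each value with the one immediately before it. The sketch below shows one way to do this in Python; `chain_base_relatives` is a hypothetical helper name, and the first period has no relative because nothing precedes it.

```python
sales = [("Jan", 2150), ("Feb", 2660), ("Mar", 3324),
         ("Apr", 4156), ("May", 5160), ("June", 6300)]


def chain_base_relatives(series):
    """Index of each period relative to the immediately preceding period."""
    relatives = {}
    # zip pairs (Jan, Feb), (Feb, Mar), ... so the base moves along the series
    for (_, previous), (period, current) in zip(series, series[1:]):
        relatives[period] = round(current / previous * 100, 1)
    return relatives


chain = chain_base_relatives(sales)
for month, relative in chain.items():
    print(month, relative)
```

Subtracting 100 from each relative gives the month-on-month percentage growth.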
Now go and do Exercise M3.6
ENDSECTION STARTSECTION=activity_6.htm= SECTION~
Exercise M3.6
Calculate the set of chain base index numbers for the following data.
| | Jan | Feb | Mar | Apr | May | June |
|---|---|---|---|---|---|---|
| Sales | 2050 | 2560 | 3024 | 3998 | 4986 | 6000 |
| Chain base relative | | | | | | |
ENDSECTION STARTSECTION=content_7.htm= SECTION~
Comparing Index numbers (fixed base)
The Real Value Index will not be covered during the lecture; however, you will need to read this to complete question M3.3 of the seminar questions. If you were told that the price of chocolate had increased by 5% you would think that you could afford to buy less chocolate. But if your wages have increased by 7% then, in real terms, the price of chocolate has decreased, because your wages have increased at a higher rate. We will use indicators to help us judge real increases.
Examples of indicators
• Retail price index
• Output of production index
We will use the retail price index as an indicator for the economy in the following examples.
ENDSECTION STARTSECTION=content_8.htm= SECTION~
Time Series Deflation
Change in the real value of a commodity over time.
Real value index
The real value index is calculated using the following:
$$\text{R.V.I.} = \frac{\text{Current value}}{\text{Base value}} \times \frac{\text{Base indicator}}{\text{Current indicator}} \times 100$$
Example
The following table contains information relating to the average weekly earnings of a set of factory workers.

| | 1974 | 1984 |
|---|---|---|
| Average Earnings | 59.6 | 174.3 |
| RPI | 134.8 | 351.8 |

Let’s investigate the growth in wages from 1974 to 1984. To calculate the index of earnings relative to 1974 we use $I = \frac{V_n}{V_0} \times 100$.

Table 1 (without incorporating the indicator)

| | 1974 | 1984 |
|---|---|---|
| Average Earnings | 59.6 | 174.3 |
| Index of Earnings base 1974 | 100 | $I = \frac{174.3}{59.6} \times 100 = 292$ |
So we see a 192% increase in wages over the 10 year period.
Does this represent a real increase?
What is the increase in earnings after incorporating the indicator? Now we can calculate the real value index of earnings relative to 1974. This will enable us to judge the increase in earnings in real terms.

Table 2 (incorporating the indicator)

| | 1974 | 1984 |
|---|---|---|
| Average Earnings | 59.6 | 174.3 |
| RPI | 134.8 | 351.8 |
| RVI base 1974 | 100 | 112 |
$$\text{R.V.I.}_{1984}\,(1974 = 100) = \frac{174.3}{59.6} \times \frac{134.8}{351.8} \times 100 = 112$$
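The contrast between the plain earnings index and the real value index can be reproduced with a short sketch. The function and argument names below are illustrative assumptions; the figures are those in the earnings tables.

```python
def real_value_index(current_value, base_value, current_indicator, base_indicator):
    """R.V.I. = (current / base value) * (base / current indicator) * 100."""
    return (current_value / base_value) * (base_indicator / current_indicator) * 100


# Average weekly earnings and RPI for 1974 (base) and 1984 (current)
earnings_index = 174.3 / 59.6 * 100
rvi = real_value_index(current_value=174.3, base_value=59.6,
                       current_indicator=351.8, base_indicator=134.8)

print(round(earnings_index))   # index of earnings, ignoring the indicator
print(round(rvi))              # index of earnings in real terms
```

The gap between the two results shows how much of the apparent rise in earnings is accounted for by the rise in the RPI.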
Now go and do Exercise M3.7
ENDSECTION STARTSECTION=activity_7.htm= SECTION~ Exercise M3.7
Complete the table:

| | 1976 | 1986 |
|---|---|---|
| RPI | 134.8 | 351.8 |
| Price | 18 | 65 |
| Price Index = $\frac{V_n}{V_0} \times 100$ | | |
| RVI base 1976 | | |
ENDSECTION STARTSECTION=content_9.htm= SECTION~
Composite Index Numbers
A composite index number is obtained by combining information from a set of economic commodities called components.
Weighting of Components
For example, the Retail Price Index has components, which consist of food, alcohol, fuel, transport, etc. In calculating a composite index number, each factor will be weighted. The weighting is considered as a measure of the importance of the component. For example, in the Retail Price Index food will be weighted more heavily than alcohol. Food is a necessity; alcohol is a luxury; hence food has a larger weight than alcohol.
Popular Weighting Factors
When calculating a price index for the production of an item made up of several components, use the quantity required of each component as the weighting factor. When calculating a quantity index, use prices as weights.
ENDSECTION STARTSECTION=content_10.htm= SECTION~
Weighted Average Index
Calculate an index relative, $I = \frac{V_n}{V_0} \times 100$, for each component.
Obtain a weighted average of these relatives:
$$\text{Weighted average index} = \frac{\sum (wI)}{\sum w}$$
where w = the weighting factor, and I = the index relative for each component.
Example
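A building contractor buys 8 tonnes of concrete, 14 tonnes of bricks and 2 tonnes of cement in order to complete one job. The prices of each component in 1999 and 2000 are displayed in the table below.

| Commodity | Weighting w | 1999 Price $V_0$ | 2000 Price $V_n$ | $I = \frac{V_n}{V_0} \times 100$ | wI |
|---|---|---|---|---|---|
| Concrete | 8 | 25 | 24 | $\frac{24}{25} \times 100 = 96$ | |
| Bricks | 14 | 34 | 38 | $\frac{38}{34} \times 100 = 111.8$ | |
| Cement | 2 | 64 | 80 | $\frac{80}{64} \times 100 = 125$ | |
| Total | | | | | |

$$\text{Weighted average index} = \frac{\sum (wI)}{\sum w} =$$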
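One way to check a completed table is to let a short program form the wI column and the two totals. This is a sketch only: the tuple layout and the name `weighted_average_index` are assumptions made for illustration.

```python
# (commodity, weighting w, 1999 price V0, 2000 price Vn)
components = [("Concrete", 8, 25, 24),
              ("Bricks", 14, 34, 38),
              ("Cement", 2, 64, 80)]


def weighted_average_index(rows):
    """Sum of w * I divided by sum of w, where I = Vn / V0 * 100."""
    total_weight = sum(w for _, w, _, _ in rows)
    total_weighted_relatives = sum(w * (vn / v0 * 100) for _, w, v0, vn in rows)
    return total_weighted_relatives / total_weight


print(round(weighted_average_index(components), 1))
```

Because bricks carry the largest weight, their price rise dominates the result even though the price of concrete fell.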
Now go and do Exercise M3.8
ENDSECTION STARTSECTION=activity_8.htm= SECTION~ Exercise M3.8 - Weighted Average Index Activity
Calculate the weighted average index of the labour costs.
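| Category of worker | Number of workers = w | Hourly wage rates 1987 $V_0$ | Hourly wage rates 1989 $V_n$ | $I = \frac{V_n}{V_0} \times 100$ | wI |
|---|---|---|---|---|---|
| Craftsmen | 120 | £4.50 | £6.00 | | |
| Labourer | 200 | £3.20 | £3.80 | | |
| Drivers | 80 | £3.80 | £4.60 | | |
| Total | | | | | |

$$\text{Weighted average index} = \frac{\sum (wI)}{\sum w} =$$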
ENDSECTION STARTSECTION=content_11.htm= SECTION~
Weighted Aggregate Index
Perhaps we should compare the current cost with the base cost. This method is called the weighted aggregate index.
$$\text{Weighted aggregate index} = \frac{\sum (W V_n)}{\sum (W V_0)} \times 100$$
where $\sum W V_n$ = the total of weight × current price for each component, and $\sum W V_0$ = the total of weight × base price for each component.
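| Commodity | Weighting w | Price $V_0$ | Price $V_n$ | $WV_0$ | $WV_n$ |
|---|---|---|---|---|---|
| Concrete | 8 | 25 | 24 | 200 | 192 |
| Bricks | 14 | 34 | 38 | 476 | 532 |
| Cement | 2 | 64 | 80 | 128 | 160 |
| Total | | | | 804 | 884 |

$$\text{Weighted aggregate index} = \frac{\sum (W V_n)}{\sum (W V_0)} \times 100 = \frac{884}{804} \times 100 = 109.95 \approx 110$$
Now go and do Exercise M3.9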
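The weighted aggregate calculation for the building contractor can be sketched in code as a comparison of two total costs. The data layout and function name are illustrative assumptions.

```python
# (commodity, weighting w, base price V0, current price Vn)
components = [("Concrete", 8, 25, 24),
              ("Bricks", 14, 34, 38),
              ("Cement", 2, 64, 80)]


def weighted_aggregate_index(rows):
    """Total weighted current cost over total weighted base cost, times 100."""
    base_cost = sum(w * v0 for _, w, v0, _ in rows)      # sum of W * V0
    current_cost = sum(w * vn for _, w, _, vn in rows)   # sum of W * Vn
    return current_cost / base_cost * 100


print(round(weighted_aggregate_index(components), 2))
```

Unlike the weighted average index, this method never computes an individual relative for each component; it only compares the cost of the whole job in the two years.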
ENDSECTION STARTSECTION=activity_9.htm= SECTION~
Exercise M3.9 - Weighted Aggregate Index Activity
Calculate the weighted aggregate index of the labour costs.
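| Category of worker | Number of workers w | Hourly wage rates 1987 $V_0$ | Hourly wage rates 1989 $V_n$ | $WV_0$ | $WV_n$ |
|---|---|---|---|---|---|
| Craftsmen | 120 | £4.50 | £6.00 | | |
| Labourer | 200 | £3.20 | £3.80 | | |
| Drivers | 80 | £3.80 | £4.60 | | |
| Total | | | | | |

$$\text{Weighted aggregate index} = \frac{\sum (W V_n)}{\sum (W V_0)} \times 100$$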
ENDSECTION STARTSECTION=activity_10.htm= SECTION~
Seminar Questions
Seminar Question M3.1
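a) Calculate the series of fixed base index numbers that represent the following data.

| | July | Aug | Sept | Oct | Nov |
|---|---|---|---|---|---|
| Production | 2431 | 2542 | 2650 | 2781 | 2892 |
| Fixed base relative (Sept) | | | | | |

b) Change the base to October.

$$\text{Index number (Base October)} = \frac{\text{Old base}}{\text{New base}} \times \text{Index number (Base Sept)}$$

| | July | Aug | Sept | Oct | Nov |
|---|---|---|---|---|---|
| Production | 2431 | 2542 | 2650 | 2781 | 2892 |
| Fixed base relative (October) | | | | | |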
ENDSECTION STARTSECTION=activity_11.htm= SECTION~
Seminar Question M3.2
The average weekly earnings of manual employees in manufacturing industry increased from £132.98 in October 1983 to £164.74 in October 1986. The RPI increased from 340.7 to 388.4 over this period.
i) Calculate the index number that represents the change in earnings from 1983 to 1986.
ii) What was the real value index of earnings over this period? ENDSECTION STARTSECTION=activity_12.htm= SECTION~
Seminar Question M3.3
Calculate the set of fixed base (Jan) and chain base index numbers for the following data. Which is more appropriate for the data? Explain your answer.
| | Jan | Feb | Mar | Apr | May |
|---|---|---|---|---|---|
| Sales | 1050 | 1560 | 2024 | 2998 | 3986 |
| Fixed base | | | | | |
| Chain base relative | | | | | |
ENDSECTION STARTSECTION=activity_13.htm= SECTION~
Seminar Question M3.4
A firm uses three materials in its manufacturing processes. The quantities bought in 1980 and 1989 and the prices are as follows:
| Material | Units | 1980 Prices (£) | 1989 Prices (£) | 1980 Quantities | 1989 Quantities |
|---|---|---|---|---|---|
| A | Thousands | 100 | 150 | 10 | 16 |
| B | Gallons | 1 | 2 | 100 | 120 |
| C | Metres | 2 | 5 | 50 | 70 |

i) Calculate the price index and quantity index for each component with base period 1980.
ii) Calculate the total cost of the manufacturing process in 1980 and 1989. Express the percentage change in the cost of the manufacturing process as an index number.
ENDSECTION STARTSECTION=activity_14.htm= SECTION~
Seminar Question M3.5
A cleaning product is made up of four components, A, B, C, and D. The table below displays the quantities used to make the cleaning agent and their prices in 1999 and 2000. Calculate the weighted average index.

| Component | Weighting w | 1999 Price $V_0$ | 2000 Price $V_n$ | $I = \frac{V_n}{V_0} \times 100$ | wI |
|---|---|---|---|---|---|
| A | 4 | 1.00 | 1.4 | | |
| B | 3 | 0.95 | 1.2 | | |
| C | 2 | 0.80 | 1.00 | | |
| D | 2 | 0.45 | 0.60 | | |
| Total | | | | | |

$$\text{Weighted average index} = \frac{\sum (wI)}{\sum w} =$$
ENDSECTION STARTSECTION=activity_15.htm= SECTION~
Seminar Question M3.6
Calculate the weighted aggregate index.
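| Component | Weighting w | 1999 Price $V_0$ | 2000 Price $V_n$ | $WV_0$ | $WV_n$ |
|---|---|---|---|---|---|
| A | 4 | 1.00 | 1.4 | | |
| B | 3 | 0.95 | 1.2 | | |
| C | 2 | 0.80 | 1.00 | | |
| D | 2 | 0.45 | 0.60 | | |
| Total | | | | | |

$$\text{Weighted aggregate index} = \frac{\sum (W V_n)}{\sum (W V_0)} \times 100$$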
ENDSECTION STARTSECTION=think_1.htm= SECTION~
How much do you know?
If you have understood the content of this unit you should have knowledge of, and be able to answer questions relating to, all of the following points:
• the calculation, interpretation and representation of index numbers;
• when to use fixed and chain base index numbers on a time series of data;
• how to judge real change;
• the differences between the weighted average and weighted aggregate index.
Extra Activities
Log on to the STX1110 Oasis page and attempt the quizzes in the extra section.
A guide on how to find the STX1110 Oasis quizzes is on page 16 of this book.
Do you want to know more?
For further examples and exercises, refer to Essential Quantitative Methods for Business, Management and Finance, third edition, by Les Oakshott, or any of the books on the STX1110 reading list.
We shall next consider two particular types of weighted aggregate indices: the Laspeyre and Paasche indices. ENDSECTION STARTSECTION=content_12.htm= SECTION~
Laspeyre Index
This index always uses base time period weights. Generally it is used with price and quantity indices.
A Laspeyre price index uses base time period quantities as weights.
$$\text{Laspeyre price index} = \frac{\sum (q_0 p_n)}{\sum (q_0 p_0)} \times 100$$
A Laspeyre quantity index uses base time period prices as weights.
$$\text{Laspeyre quantity index} = \frac{\sum (p_0 q_n)}{\sum (p_0 q_0)} \times 100$$
Laspeyre example
A firm uses three materials in its manufacturing processes. The quantities bought in 1980 and 1989 and the prices are as follows:
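| Material | 1980 Prices $p_0$ | 1989 Prices $p_n$ | 1980 Quantities $q_0$ | 1989 Quantities $q_n$ | $q_0 p_n$ | $q_0 p_0$ |
|---|---|---|---|---|---|---|
| A | 100 | 150 | 10 | 16 | 10 × 150 = 1500 | 10 × 100 = 1000 |
| B | 1 | 2 | 100 | 120 | 100 × 2 = 200 | 100 × 1 = 100 |
| C | 2 | 5 | 50 | 70 | 50 × 5 = 250 | 50 × 2 = 100 |
| Totals | | | | | 1950 | 1200 |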
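$$\text{Laspeyre price index} = \frac{\sum (q_0 p_n)}{\sum (q_0 p_0)} \times 100 = \frac{\text{total of } q_0 p_n \text{ column}}{\text{total of } q_0 p_0 \text{ column}} \times 100 = \frac{1950}{1200} \times 100 = 162.5$$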
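Both Laspeyre formulas can be sketched from the same table of prices and quantities. The dictionary layout and function names below are illustrative assumptions, not part of the module.

```python
# material: (p0, pn, q0, qn) = 1980 price, 1989 price, 1980 quantity, 1989 quantity
materials = {"A": (100, 150, 10, 16),
             "B": (1, 2, 100, 120),
             "C": (2, 5, 50, 70)}


def laspeyre_price_index(data):
    """Base period quantities q0 are the weights; prices change."""
    current = sum(q0 * pn for _, pn, q0, _ in data.values())
    base = sum(q0 * p0 for p0, _, q0, _ in data.values())
    return current / base * 100


def laspeyre_quantity_index(data):
    """Base period prices p0 are the weights; quantities change."""
    current = sum(p0 * qn for p0, _, _, qn in data.values())
    base = sum(p0 * q0 for p0, _, q0, _ in data.values())
    return current / base * 100


print(laspeyre_price_index(materials))
print(round(laspeyre_quantity_index(materials), 1))
```

In both functions only one of price or quantity moves from the base period to the current period; the weights stay fixed at their 1980 values.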
Laspeyre Index
The Laspeyre index assumes that the quantities of goods are held constant from the base year. This is probably not realistic since, as prices go up, consumers tend to buy less. So the Laspeyre index usually over-estimates price increases.
ENDSECTION STARTSECTION=content_13.htm= SECTION~
Paasche Index
This index always uses the current time period weights.
$$\text{Paasche price index} = \frac{\sum (q_n p_n)}{\sum (q_n p_0)} \times 100$$
$$\text{Paasche quantity index} = \frac{\sum (p_n q_n)}{\sum (p_n q_0)} \times 100$$
Remember a formula sheet is provided in the exam. Do not learn these formulas.
Paasche example
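| Material | 1980 Prices $p_0$ | 1989 Prices $p_n$ | 1980 Quantities $q_0$ | 1989 Quantities $q_n$ | $q_n p_n$ | $q_n p_0$ |
|---|---|---|---|---|---|---|
| A | 100 | 150 | 10 | 16 | 16 × 150 = 2400 | 16 × 100 = 1600 |
| B | 1 | 2 | 100 | 120 | 120 × 2 = 240 | 120 × 1 = 120 |
| C | 2 | 5 | 50 | 70 | 70 × 5 = 350 | 70 × 2 = 140 |
| Totals | | | | | 2990 | 1860 |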
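$$\text{Paasche price index} = \frac{\sum (q_n p_n)}{\sum (q_n p_0)} \times 100 = \frac{\text{total of } q_n p_n \text{ column}}{\text{total of } q_n p_0 \text{ column}} \times 100 = \frac{2990}{1860} \times 100 = 160.8$$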
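To see how the choice of weights affects the result, the sketch below computes a price index for the same materials with base period quantities (Laspeyre) and with current period quantities (Paasche). The single helper `price_index` and its `weights` argument are an illustrative design, not the module's notation.

```python
# material: (p0, pn, q0, qn) = 1980 price, 1989 price, 1980 quantity, 1989 quantity
materials = {"A": (100, 150, 10, 16),
             "B": (1, 2, 100, 120),
             "C": (2, 5, 50, 70)}


def price_index(data, weights):
    """Price index weighted by 'base' (Laspeyre) or 'current' (Paasche) quantities."""
    numerator = denominator = 0
    for p0, pn, q0, qn in data.values():
        q = q0 if weights == "base" else qn
        numerator += q * pn      # weighted current prices
        denominator += q * p0    # weighted base prices
    return numerator / denominator * 100


laspeyre = price_index(materials, "base")
paasche = price_index(materials, "current")
print(round(laspeyre, 1), round(paasche, 1))
```

For this data the Paasche figure comes out below the Laspeyre figure, because the two methods weight the same price changes with different quantities.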
Paasche Index
The Paasche index assumes the current quantities are true for the base period. So the Paasche index tends to under-estimate price increases. Also, the quantities need to be updated each year. Quantities are generally more difficult to determine than prices, so the Laspeyre index is more popular.
ENDSECTION ENDCHAPTER
CHAPTER=PROB
STARTSECTION=scope_1.htm= SECTION~
Introduction to Probability Context
In this unit we will introduce the concept of probability, chance and randomness. We will determine how likely events are to occur when we conduct or observe an experiment. We begin by investigating discrete experiments and then build on these ideas in subsequent lectures when we investigate continuous experiments. ENDSECTION STARTSECTION=scope_2.htm= SECTION~
Objectives
Having worked through this unit you should be able to:
• define an experiment and its associated outcome set;
• calculate discrete probabilities;
• express probabilities as decimals, fractions and percentages;
• demonstrate knowledge of the properties of discrete probabilities;
• demonstrate knowledge of the characteristics of the normal distribution.
ENDSECTION STARTSECTION=content_1.htm= SECTION~
Probability and Chance
Probability measures the chance that something will happen. Statements about probability occur in everyday speech. For example the following statements are concerned with chance:
• It is highly likely that I will enjoy STX1110.
• Nine times out of ten I forget to switch off my phone before going into my seminar.
• I am almost certain I will pass the multiple choice test.
Probability gives a structure to the idea of chance and allows us to try to measure the level of uncertainty. This enables us to evaluate the level of associated risk, which in turn informs decisions and choices to be made in the future. For example, we could evaluate how risky an investment is and so determine how likely it is to make a loss. If I feel that the chance of making a loss is too great then I will not invest.
Definitions
An EXPERIMENT, X, is a situation that can be performed (or considered) to gain information. Our experiment could be as simple as picking a STX1110 tutor at random from the module team list. It may be more complicated, perhaps taking part in a raffle, or playing the lottery.

An OUTCOME SET, Ω, is the set of possible results associated with an experiment. Let X = picking a STX1110 tutor at random from the module team list. Then Ω is the list of names of all STX1110 tutors in the module team. For example, Ω = { Alison, Cathy, Chris, Emma, Gary, John, Matt, Patricia, Thomas, Zainab }.

An EVENT, E, is either a single outcome or a combination of outcomes. Let X = picking a STX1110 tutor at random from the module team list and Ω = { Alison, Cathy, Chris, Emma, Gary, John, Matt, Patricia, Thomas, Zainab }. One event could be picking a STX1110 tutor at random from the list whose name begins with C. This condition is satisfied by two of the outcomes (names) in our outcome set (list), Cathy and Chris. This could be written as E = picking Cathy or Chris.
Examples
| Experiment, X | Set of Outcomes, Ω | An Event, E |
|---|---|---|
| Flipping a coin | {Heads, Tails} | The coin landing face up. |
| Guessing the sex of a baby | {Boy, Girl} | Giving birth to a girl. |
| Simple questionnaire, e.g. one question: Do you like STX1110? | {Yes, No, Do not know} | A student picked at random liking STX1110. |
| Complex questionnaire, e.g. lots of questions asked to the whole of Middlesex University. 1. At which campus are you studying STX1110? Hendon (HE) or Dubai (D). 2. Which do you prefer? A) The statistics lectures, B) The mathematics lectures, C) No preference. | { (HE, A), (HE, B), (HE, C), (D, A), (D, B), (D, C) } | A student picked at random preferring the mathematics lectures. |
Example
Suppose an experiment consists of flipping a coin twice. What is the set of outcomes? For simplicity let us denote the coin landing Heads up by H, and Tails up by T. The first time we flip the coin it could land either Heads up or Tails up: 2 results. The second time we flip the coin it could also land either Heads up or Tails up: 2 results. So if the first result is a Heads then the result of second flip will be one of two possible answers, H, T, and if the result of the first flip is Tails then the result of the second flip will also be one of two possible answers. This gives four (2×2) outcomes to list. It is much easier to list these in a table.
| First result \ Second result | H | T |
|---|---|---|
| H | (H, H) | (H, T) |
| T | (T, H) | (T, T) |
So, Ω = { (H, H), (H, T), (T, H), (T, T) }.
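The table above is a Cartesian product of the results of the two flips, and the outcome set can be generated the same way in code. This sketch uses Python's standard `itertools.product`; the variable names are illustrative.

```python
from itertools import product

single_flip = ["H", "T"]

# Every (first result, second result) pair: 2 x 2 = 4 outcomes
outcome_set = list(product(single_flip, repeat=2))

print(outcome_set)
print(len(outcome_set))
```

Changing `repeat` to 3 would list the outcomes for three flips, which would be awkward to lay out in a two-way table.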
Properties of Probabilities
Probabilities are measured on a scale between 0 and 1. This is just a scale and is a consequence of how the probability is calculated. Sometimes we refer to the percentage chance. We convert our probability value that is between zero and one to a percentage. Hence our percentage chance will lie between 0% and 100%.
Impossible        50-50 chance        Certain
0                 0.5                 1

If the probability of an event, P(E), is 0, then the event is impossible.
The probability that the following events will occur is 0 since they are all impossible.
• A STX1110 student being 50 metres tall.
• Winning a raffle if I do not have a ticket.
If the probability of an event, P(E), is 1, then the event is certain. The probability that the following events will occur is 1 since they are all certain to occur.
• A person being less than 50 metres tall.
• Winning a raffle if I have all of the tickets.
The sum of all probabilities associated with an experiment is 1. Probabilities are often presented as percentages. For example:
P(E) = 0.5 or 50%
P(E) = 0.75 or 75%
P(E) = 0.125 or 12.5%
ENDSECTION STARTSECTION=content_2.htm= SECTION~
Evaluating probabilities for discrete variables
Definition
For an event E associated with an experiment X, the probability of observing the event is denoted by P(E) and is defined as follows:
$$P(E) = \frac{\text{Number of ways an event can occur}}{\text{Total number of possible outcomes}}$$
Note we can only use this formula for experiments with a finite number of outcomes which are equally likely.
Example
Suppose that an experiment, X, consists of rolling a six sided fair die, and noting the result. Then the set of outcomes for this experiment is Ω = { 1, 2, 3, 4, 5, 6 }.
What is the probability that the die will result in an even number, P(E) ? Now if we assume that the die is fair, then each face of the die is equally likely to be rolled. In fact each face has a one in six chance of being rolled.
So, P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = $\frac{1}{6}$.
Let our event E be obtaining an even number. So E is obtaining 2 or 4 or 6, which is three of our six outcomes. Hence the probability that the die will result in an even number is,
$$P(E) = \frac{\text{Number of ways an event can occur}}{\text{Total number of outcomes for the experiment}} = \frac{3}{6} = \frac{1}{2}$$
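The counting definition can be sketched in Python using exact fractions. The helper `probability` is a hypothetical name, and the sketch assumes, as the definition requires, a finite outcome set whose outcomes are equally likely.

```python
from fractions import Fraction


def probability(event, outcome_set):
    """Ways the event can occur / total number of equally likely outcomes."""
    favourable = [outcome for outcome in outcome_set if outcome in event]
    return Fraction(len(favourable), len(outcome_set))


die = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}

p_even = probability(even, die)
print(p_even, float(p_even), f"{float(p_even):.0%}")
```

`Fraction` reduces 3/6 to 1/2 automatically, and the last line shows the same probability as a decimal and as a percentage.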
Now go and do Exercise M4.1
ENDSECTION STARTSECTION=activity_1.htm= SECTION~
Exercise M4.1
Suppose an experiment consists of tossing a coin twice, and noting the result on each.
Calculate the probability that you obtain 2 heads in the two tosses of the coin.
Calculate the probability that you obtain at least one head in the 2 tosses of the coin.
Now go and do Exercise M4.2
ENDSECTION STARTSECTION=activity_2.htm= SECTION~ Exercise M4.2
A bag contains sweets which are either small, medium or large, and either red or yellow, in the following numbers:

| | Red | Yellow |
|---|---|---|
| Small | 4 | 6 |
| Medium | 5 | 5 |
| Large | 8 | 2 |
Find the probability that the sweet picked is
i. red
ii. yellow
iii. large and yellow
iv. small, medium, or large
v. extra large.
ENDSECTION STARTSECTION=content_3.htm= SECTION~
Estimating probabilities from observed data
Sometimes it will be necessary to estimate probabilities using data we have observed from conducting an experiment.
Example
We could estimate the probability that a student passes the STX1110 exam by finding out how many students have passed during the last 3 years and using this proportion as an estimate of the probability that a student will pass the STX1110 exam this year. The table below details the number of students who have passed the STX1110 exam for the last three years.
Number of students

| Year | Pass | Fail |
|---|---|---|
| 2002 | 221 | 12 |
| 2003 | 550 | 21 |
| 2004 | 750 | 26 |
Based on this information how likely is it for a student to pass the STX1110 exam this year? First we must work out how many students have passed the test in total over the last three years. To do this we add all the entries in the pass column. Then we must find out how many students have taken the test, either passing or failing, in total. So we add all the entries in the table.
This is then expressed as a proportion.

Number of students

| Year | Pass | Fail | Totals |
|---|---|---|---|
| 2002 | 221 | 12 | 233 |
| 2003 | 550 | 21 | 571 |
| 2004 | 750 | 26 | 776 |
| Totals | 1521 | 59 | 1580 |

So based on this information the probability that a student will pass the STX1110 exam is,
$$P(\text{Pass the STX1110 exam}) = \frac{1521}{1580} = 0.9627 = 96.3\%$$
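The totals and the estimated probability can be reproduced with a short sketch. The dictionary layout is an assumption made for illustration; the figures are those in the table.

```python
# year: (number passing, number failing)
results = {2002: (221, 12), 2003: (550, 21), 2004: (750, 26)}

total_passes = sum(passed for passed, _ in results.values())
total_students = sum(passed + failed for passed, failed in results.values())

# Relative frequency of passing, used as an estimate of the probability
p_pass = total_passes / total_students
print(total_passes, total_students)
print(f"{p_pass:.4f} or {p_pass:.1%}")
```

This is an estimate from past data rather than a count of equally likely outcomes, so it would change if another year of results were added.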
Now go and do Exercise M4.3
ENDSECTION STARTSECTION=activity_3.htm= SECTION~ Exercise M4.3
A module leader collects information concerning the punctuality of students in different lecture groups. The results are displayed in the following table.
Level of punctuality

| Group | Early | On time | Late |
|---|---|---|---|
| A | 36 | 19 | 10 |
| B | 20 | 116 | 65 |
| C | 3 | 35 | 100 |

How likely are the following:

i. A student being late?
ii. A student being early and in group B?
iii. A student being in group A?
iv. A student not being late?
ENDSECTION STARTSECTION=content_4.htm= SECTION~ Normal Probabilities
At the beginning of the course we discussed frequency distributions. These are concerned with the number of times each outcome happens and the pattern of the number of occurrences of each outcome.
Example
Suppose we looked at the height distribution of the male students attending STX1110 last semester.

| Height | 5’6” | 5’7” | 5’8” | 5’9” | 5’10” | 5’11” | 6’0” | 6’1” | 6’2” |
|---|---|---|---|---|---|---|---|---|---|
| Frequency | 1 | 6 | 15 | 22 | 25 | 20 | 16 | 5 | 1 |

A graph of the data is as follows.

[Figure: frequency graph of heights from 5’6” to 6’2”]

What is the probability that a male student picked at random will be less than or equal to 5’8” in height? We estimate this probability by finding the total number of students who are less than or equal to 5’8” in height, and dividing by the total number of students. So we find
$$P(E) = \frac{22}{111} = 0.1982 = 19.82\%$$
We could visualise this probability by saying it is the percentage area on our graph.
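The height calculation can be sketched as a sum over the frequency table. To keep the code simple the heights are written in inches (5’6” = 66 inches), which is a representation chosen for this sketch only; `prob_at_most` is a hypothetical helper.

```python
heights_in_inches = [66, 67, 68, 69, 70, 71, 72, 73, 74]   # 5'6" up to 6'2"
frequencies = [1, 6, 15, 22, 25, 20, 16, 5, 1]


def prob_at_most(limit):
    """Estimated probability that a student's height is <= limit (in inches)."""
    favourable = sum(f for h, f in zip(heights_in_inches, frequencies) if h <= limit)
    return favourable / sum(frequencies)


p = prob_at_most(68)   # less than or equal to 5'8"
print(f"{p:.4f} or {p:.2%}")
```

Dividing each frequency by the total in this way is what turns the frequency graph into a relative frequency graph whose bar areas represent probabilities.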
[Figure: frequency graph of heights from 5’6” to 6’2”]
We could plot a relative frequency histogram and then the areas would directly represent the probability we want to evaluate. If we could measure the heights with increasing accuracy the width of the bars above would become smaller. As this happened the shape of the distribution graph would tend to a smooth curve. We could then say that probabilities could be estimated by the area under this curve. In the next unit we will use the normal distribution to do this. There are some special distributions in statistics which model the observed outcomes of experiments quite well. This is important, because given a little bit of information we can use the statistical information to calculate any required probability.
ENDSECTION STARTSECTION=content_5.htm= SECTION~
The Normal Distribution
This is the most frequently used and important distribution in statistics; it has been shown that it models many things that occur in nature, for example the heights of males, weights of students, or time taken to complete an activity.
Characteristics of the Normal Distribution
It has a very distinctive shape.
The normal distribution is distributed symmetrically about its mean:
Most of the values cluster about the mean. The variance determines how ‘tightly’.
The frequency tapers away either side of the mean and tends to 0.
The total area under the curve is 1.
The two values that characterise the shape of a normal distribution are the mean μ and the standard deviation σ. The mean is a measure of location, and the standard deviation is a measure of dispersion.
Example:
Suppose f1 is the normal density function with mean μ = 0 and standard deviation σ1. Suppose also that f2 is the normal density function with mean μ = 0 and standard deviation σ2, where σ2 is larger than σ1. Both distributions have the same mean, but f2 has a larger standard deviation than f1.
[Figure: the two normal curves f1 and f2, both centred on 0]
The distribution for f2 is more widely spread over the range. In effect all values (about 99.7% of them) lie within ± 3 standard deviations of the mean.
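The effect of the standard deviation can be checked numerically. The sketch below uses the usual formulas for the normal density and cumulative probability; the values 1 and 2 for the two standard deviations are illustrative assumptions only, since the notes do not give specific values.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu, sigma):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


sigma1, sigma2 = 1.0, 2.0  # assumed values, chosen only so that sigma2 > sigma1

# The curve with the larger standard deviation is lower and wider at the mean.
peak1 = normal_pdf(0, 0, sigma1)
peak2 = normal_pdf(0, 0, sigma2)

# Proportion of values within 3 standard deviations of the mean.
within3 = normal_cdf(3 * sigma2, 0, sigma2) - normal_cdf(-3 * sigma2, 0, sigma2)

print(f"peak of f1 = {peak1:.4f}, peak of f2 = {peak2:.4f}")
print(f"proportion within 3 standard deviations = {within3:.4f}")
```

The proportion within 3 standard deviations is the same whatever value of the standard deviation is used.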
ENDSECTION STARTSECTION=activity_4.htm= SECTION~
Seminar Questions
Seminar Question M4.1
Suppose that an experiment, X, consists of rolling a six sided fair die, and noting the result.
Write down the set of outcomes and calculate the following probabilities:
Set of outcomes =
a) What is the probability of rolling a 4 or a 5? P(4 or 5) =
b) What is the probability of rolling 4 or more? P(4 or more) =
c) What is the probability of rolling a 7? P(7) =
d) What is the probability of rolling an odd or an even number?
P(odd or even) = ENDSECTION STARTSECTION=activity_5.htm= SECTION~
Seminar Question M4.2
In Euro-Wisney theme park in Paris, park helpers dressed up as Creepy Crawly characters distribute free gifts from a sack at random.
Before the next toy is picked, one of the park helpers refills the bag with a variety of action figures from the Creepy Crawly range. The number of each size and type is as follows.
                      Flint   Princess
Large action figure   5       10
Medium action figure  6       9
Small action figure   9       6
Calculate the probability that the toy picked out is:
i) A Flint action figure.
ii) A large action figure.
iii) A medium Princess action figure.
iv) Either a Flint or a Princess action figure.
ENDSECTION STARTSECTION=activity_6.htm= SECTION~
Seminar Question M4.3
For an experiment we roll two dice together and add their results up. Finish off filling the results table below.
Result on first die   Result on second die   Total
1                     1                      2
1                     2                      3
1                     3                      4
1                     4                      5
1                     5                      6
1                     6                      7
2                     1                      3
2                     2                      4
2                     3                      5
2                     4                      6
2                     5                      7
2                     6                      8
3
3
3
3
3
3
4
4
4
4
4
4
5
5
5
5
5
5
6
6
6
6
6
6
Calculate the probability of obtaining a total of 11. P (total = 11)
ENDSECTION STARTSECTION=activity_7.htm= SECTION~
Seminar Question M4.4
The set of heights of male STX1110 students is made up as follows:
Height to the nearest inch   Number of students
5’ 7’’ and below             4
5’ 8’’                       8
5’ 9’’                       15
5’ 10’’                      23
5’ 11’’                      24
6’                           17
6’ 1’’                       7
6’ 2’’                       3
6’ 3’’ and above             1
i) Graph the above data.
ii) What percentage of male students are six foot or taller?
iii) What is the probability of picking out a male student from the group and him being shorter than six foot?
ENDSECTION STARTSECTION=think_1.htm= SECTION~
How much do you know?
If you have understood the content of this unit you should have knowledge of, and be able to answer questions relating to, all of the following points:
• the properties of discrete probabilities;
• how to evaluate probabilities and express them as fractions, decimals and percentages;
• the link between probabilities, percentages and percentage areas of the graphed data;
• the characteristics of the normal distribution.
Extra Activities
Log on to the STX1110 Oasis page and attempt the quizzes in the extra section.
A guide on how to find the STX1110 Oasis quizzes is on page 16 of this book.
Do you want to know more?
For further examples and exercises, refer to Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshott, or any of the books on the STX1110 reading list.
ENDSECTION ENDCHAPTER
M5 Standard Normal Distribution
CHAPTER=SND
STARTSECTION=scope_1.htm= SECTION~
Standard Normal Distribution Context
In the last unit we introduced probability for discrete variables. We then considered how it may be possible to make predictions for continuous variables, such as heights and weights. We introduced the notion of approximating probabilities by evaluating an area under a curve. In this unit we build on these ideas and will introduce the normal distribution. We start by looking at a special case called the standard normal distribution. ENDSECTION STARTSECTION=scope_2.htm= SECTION~
Objectives
Having worked through this unit you should be able to:
• use standard normal distribution tables to evaluate less than or lower tail probabilities;
• manipulate less than probabilities to evaluate greater than (upper tail) and interval (strip) probabilities;
• construct an appropriate z value from tables for a given probability. ENDSECTION STARTSECTION=content_1.htm= SECTION~
Standard Normal Distribution
The standard normal distribution has a mean equal to 0 and a standard deviation equal to 1.
Probabilities associated with the standard normal distribution are tabulated. We use these tables to evaluate probabilities. Tables give us the probability that our experiment (say Z) takes a value less than a number (say z), P(Z ≤ z). Because Z is continuous, P(Z ≤ z) and P(Z < z) are equal, so we use the two interchangeably.
Graphically it is the area under the curve to the left of the number z.
Example Suppose that our experiment Z is the error in measuring someone’s height in centimetres. We believe that Z is normally distributed with mean = μ = 0 and standard deviation = σ = 1. We denote this by Z ~ N(0,1).
Properties of the standard normal distribution
1. The total area under the curve = 1
2. Due to the symmetry of the graph we see that the area under the curve to the left of zero is half of the total area. This tells us that P(Z<0) is one half.
P( Z<0) = ½
ENDSECTION STARTSECTION=content_2.htm= SECTION~
Calculating Standard Normal Probabilities
Less than or lower tail probabilities
Suppose we want to evaluate the probability that our error, Z, is less than 0.41cm. In notation this is written as P(Z < 0.41).
Drawing a diagram helps you visualize the probability that you are calculating. We want P(Z < 0.41), graphically this is;
Now we use our tables. Tables provide the probability that Z<z. We look up the corresponding value using the z-score, the non-zero number z. In our example we must look up 0.41 in our tables. We do this by splitting the value into units, tenths, and hundredths. The units and tenths part of the z-score tells us which row we must look in, and the hundredths identifies the column. For our example we know that the associated probability is in the 0.4 row and 1 column. We follow the row and column until we highlight the appropriate entry, see below.
↓
z 0 1 2
0.0 .5000 .5040 .5080
0.1 .5398 .5438 .5478
0.2 .5793 .5832 .5871
0.3 .6179 .6217 .6255
→ 0.4 .6554 .6591 .6628
0.5 .6915 .6950 .6985
We find the corresponding probability is P(Z < 0.41) = 0.6591. This tells us that 65.91% of our measurement errors are less than 0.41cm. Now go and do Exercise M5.1
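As a cross-check on the table lookup, the same probability can be computed directly. The sketch below uses the standard relationship between the normal distribution and the error function; the function name `phi` is an assumption chosen for this example.

```python
import math


def phi(z):
    """P(Z < z) for a standard normal variable Z ~ N(0, 1)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Compare with the table entry in the 0.4 row and 1 column.
print(round(phi(0.41), 4))
```

Rounding to four decimal places mirrors the precision of the printed tables.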
ENDSECTION STARTSECTION=activity_1.htm= SECTION~ Exercise M5.1
Evaluate the probability that our error is less than 1.63cm.
So in notation P(Z < 1.63).
Graphically this is
Now we use our tables.
We find the corresponding probability is Now go and do Exercise M5.2
ENDSECTION STARTSECTION=activity_2.htm= SECTION~ Exercise M5.2
Evaluate the probability that our error is less than 2.5cm.
So in notation P(Z < 2.5).
Evaluate the probability that our error is less than –1.05cm.
ENDSECTION STARTSECTION=content_3.htm= SECTION~
Greater than or upper tail probabilities
Suppose we want to calculate the probability that Z, our measurement error, is greater than 0.41. Graphically this is
[Figure: area to the right of 0.41 = total area under the curve – area to the left of 0.41]
The total area under the curve is 1, so P(Z > 0.41) = 1 – P(Z < 0.41) = 1 – 0.6591 = 0.3409. So 34.09% of measurement errors are greater than 0.41cm. Now go and do Exercise M5.3
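The subtraction from 1 can be sketched with Python's standard library, whose `statistics.NormalDist` class provides a `cdf` method for less than probabilities. The variable names are assumptions for this illustration.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal distribution

lower_tail = Z.cdf(0.41)       # P(Z < 0.41), the value the tables give
upper_tail = 1 - lower_tail    # P(Z > 0.41)

print(f"P(Z < 0.41) = {lower_tail:.4f}")
print(f"P(Z > 0.41) = {upper_tail:.4f}")
```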
ENDSECTION STARTSECTION=activity_3.htm= SECTION~
Exercise M5.3
Calculate the probability that Z, our measurement error, is greater than 1.65.
Graphically this is
Now go and do Exercise M5.4
ENDSECTION STARTSECTION=activity_4.htm= SECTION~ Exercise M5.4
Calculate the probability that Z, our measurement error, is greater than -0.37.
Now go and do Exercise M5.5
ENDSECTION STARTSECTION=activity_5.htm= SECTION~ Exercise M5.5
Calculate the probability that Z, our measurement error, is greater than 1.06.
ENDSECTION STARTSECTION=content_4.htm= SECTION~
Interval or strip probabilities
Suppose we need the probability that 0.41 < Z < 1.35. Graphically this is
[Figure: area between 0.41 and 1.35 = area to the left of 1.35 – area to the left of 0.41]
Look up 1.35 and 0.41 in your tables. This gives P(0.41 < Z < 1.35) = P(Z < 1.35) – P(Z < 0.41) = 0.9115 – 0.6591 = 0.2524. So 25.24% of errors take a value between 0.41cm and 1.35cm. Now go and do Exercise M5.6
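A strip probability is the difference between two less than probabilities, which the short sketch below reproduces. The helper name `interval_probability` is an assumption made for this example.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def interval_probability(lower, upper):
    """P(lower < Z < upper) as a difference of two lower tail areas."""
    return Z.cdf(upper) - Z.cdf(lower)


strip = interval_probability(0.41, 1.35)
print(f"P(0.41 < Z < 1.35) = {strip:.4f}")
```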
ENDSECTION STARTSECTION=activity_6.htm= SECTION~ Exercise M5.6
Suppose we need the probability that -0.58 < Z < 1.35. Graphically this is
ENDSECTION STARTSECTION=content_5.htm= SECTION~
Recap
So far in this unit we have evaluated less than (lower tail) probabilities using the standard normal distribution tables. We have also manipulated these less than probabilities to give greater than probabilities by subtracting the standard normal distribution table values from 1. Additionally we have calculated interval or strip probabilities by evaluating the difference between two standard normal distribution table values. ENDSECTION STARTSECTION=content_6.htm= SECTION~
Finding a Z value
Up until this point we have been using our Z value to find a probability. This probability is an area under the normal curve which we have to evaluate. Suppose now we know the area, but need to find the z value it comes from.
Example
So suppose we want to find the z value that gives us an upper tail value of 0.05, ( 5% ). Graphically this is,
How do we find the value of z?
We know that P(Z < z)= 0.95. So if we look through the standard normal probability tables and find the entry closest to 0.95 we can find the value of z. The closest table entries are 0.9505 and 0.9495.
↓ ↓
z 0 1 2 3 4 5 6
…
1.4 .9192 .9207 .9222 .9236 .9251 .9265 .9279
1.5 .9332 .9345 .9357 .9370 .9382 .9394 .9406
→ 1.6 .9452 .9463 .9474 .9484 .9495 .9505 .9515
1.7 .9554 .9564 .9573 .9582 .9591 .9599 .9608
1.8 .9641 .9649 .9656 .9664 .9671 .9678 .9686
1.9 .9713 .9719 .9726 .9732 .9738 .9744 .9750
The table entry of 0.9505 is in the 1.6 row and 5 column. So here z = 1.65.
The table entry of 0.9495 is in the 1.6 row and 4 column. So here z = 1.64.
We can use either of these z values, but to get a better estimate for z we could take their average. So take z = (1.65 + 1.64) ÷ 2 = 1.645.
Example
Find the z value that gives us an upper tail value of 0.1 (10%). Graphically this is,
We know that P(Z < z) = 0.9. Look through the standard normal probability tables and find the entry closest to 0.9.
The closest table entry is 0.8997, it is 0.0003 away from 0.9.
↓
z … 3 4 5 6 7 8 9
…
1.0 .8485 .8508 .8531 .8554 .8577 .8599 .8621
1.1 .8708 .8729 .8749 .8770 .8790 .8810 .8830
→ 1.2 .8907 .8925 .8944 .8962 .8980 .8997 .9015
1.3 .9082 .9099 .9115 .9131 .9147 .9162 .9177
1.4 .9236 .9251 .9265 .9279 .9292 .9306 .9319
1.5 .9370 .9382 .9394 .9406 .9418 .9429 .9441
The table entry of 0.8997 is in the 1.2 row and 8 column. So here z = 1.28.
Example
Find the z value that gives us an upper tail value of 0.025 (2.5%). Graphically this is,
We know that P(Z < z) = 0.975. Look through the standard normal probability tables and find the entry closest to 0.975. The table entry is exactly 0.9750.
Which row and column is it in? This will tell us the value of z.
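Reading the table "backwards" in this way corresponds to the inverse of the less than probability. As a rough check on the first two examples, the sketch below uses `statistics.NormalDist.inv_cdf` from the Python standard library; the helper name `z_for_upper_tail` is an assumption for this illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_for_upper_tail(upper_tail):
    """Return z such that P(Z > z) equals the given upper tail area."""
    return Z.inv_cdf(1 - upper_tail)


z_5_percent = z_for_upper_tail(0.05)
z_10_percent = z_for_upper_tail(0.10)

print(f"upper tail 0.05: z = {z_5_percent:.3f}")
print(f"upper tail 0.10: z = {z_10_percent:.3f}")
```

The computed values are not rounded to the table grid, so they can differ slightly in the third decimal place from values read off the printed tables.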
ENDSECTION STARTSECTION=activity_7.htm= SECTION~
Seminar Questions
Seminar Question M5.1
a) Sketch the shape of the standard normal distribution.
b) What do we know about the mean and variance of the standard normal distribution?
c) Evaluate the following probabilities
i) P(Z < 1.68)
ii) P(Z < -0.96)
iii) P(-0.96 < Z < 1.68) ENDSECTION STARTSECTION=activity_8.htm= SECTION~
Seminar Question M5.2
a) Find the z value that gives us an upper tail value of 0.01, ( 1% ).
b) Find the z value such that P(Z>z) = 0.1 = 10%. ENDSECTION STARTSECTION=think_1.htm= SECTION~
How much do you know?
If you have understood the content of this unit you should have knowledge of, and be able to answer questions relating to, all of the following points:
• how to use standard normal tables;
• the link between less than, greater than and strip probabilities;
• how to estimate a z-value from a given probability.
Extra Activities
Log on to the STX1110 Oasis page and attempt the quizzes in the extra section.
A guide on how to find the STX1110 Oasis quizzes is on page 16 of this book.
Do you want to know more?
For further examples and exercises, refer to Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshott, or any of the books on the STX1110 reading list.
ENDSECTION ENDCHAPTER
M6 General Normal Distribution Page 243
CHAPTER=GND
STARTSECTION=scope_1.htm= SECTION~
General Normal Distribution Context
In the last unit we introduced the normal distribution. We used a special case called the standard normal distribution to make predictions for continuous variables, such as heights and weights. We also used the idea of approximating probabilities by evaluating an area under a curve. In this unit we build on these ideas and investigate the normal distribution in general, that is, normal variables that do not have a mean of zero and a variance of one. We will learn how to standardise and compute probabilities in these situations. ENDSECTION STARTSECTION=scope_2.htm= SECTION~
Objectives
Having worked through this unit you should be able to:
• give the steps for standardisation;
• use these steps to standardise general normal probabilities, and obtain less than (lower tail) probabilities from tables;
• manipulate less than probabilities to evaluate greater than (upper tail) and interval (strip) probabilities for the general normal distribution.
ENDSECTION STARTSECTION=content_1.htm= SECTION~
General Normal Distribution
Suppose that the situation that we are interested in follows a normal distribution but it does not have mean 0 and standard deviation 1.
Example
The heights of female STX1110 students are normally distributed with mean 160cm and standard deviation 5cm.
X = the height of a female STX1110 student
We write X ~ N(160, 25), where the second value is the variance, 5² = 25.
Suppose we want the probability that the student is shorter than 170cm, so
P(The height of the female student is less than 170cm) = P(X < 170).
We do not have tables for this specific normal distribution, so we convert to a standard normal. We transform our general normal curve into a standard normal curve. To do this we first move it along the x-axis and then change the shape of the curve. Mathematically this is done via the following steps.
Steps for standardising
1. Subtract the mean.
2. Divide by the standard deviation.
3. Then evaluate the probabilities using standard normal tables.
ENDSECTION STARTSECTION=content_2.htm= SECTION~
Standardising the General Normal Distribution
To calculate the probabilities of a general normal distribution we must transform its graph into the standard normal graph.
We do this by first moving the graph to the origin.
The next step is to change the shape of the graph by squashing it.
Let us use the steps for standardising for our example
Steps for standardising
1. Subtract the mean.
2. Divide by the standard deviation.
3. Then evaluate the probabilities using standard normal tables.
Example
P(The height of a female STX1110 student is less than 170cm) = P(X < 170)
$$P(X < 170) = P\left(\frac{X - 160}{5} < \frac{170 - 160}{5}\right) = P\left(Z < \frac{170 - 160}{5}\right) = P\left(Z < \frac{10}{5}\right) = P(Z < 2)$$
We can use the tables to find the tail area required. Graphically this is,
So the required probability is the area under the curve to the left of 2.
The value $\frac{170 - 160}{5} = 2$ is called a Z-score.
More formally, a Z-score for any x value from a normal distribution with mean μ and standard deviation σ is defined to be
$$z = \frac{x - \mu}{\sigma}$$
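The Z-score definition translates directly into code. The sketch below applies it to the height example; the function name `z_score` is an assumption, and `statistics.NormalDist` from the Python standard library stands in for the printed tables.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standardise x: subtract the mean, then divide by the standard deviation."""
    return (x - mu) / sigma


z = z_score(170, mu=160, sigma=5)   # Z-score for a height of 170cm
probability = NormalDist().cdf(z)   # P(Z < z) from the standard normal

print(f"z = {z}")
print(f"P(X < 170) = P(Z < {z}) = {probability:.4f}")
```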
Example
Suppose that X is the height of a female STX1110 student, and let X be normally distributed with mean 160cm and standard deviation 5cm (so variance = 25). In notation this is X ~ N(160, 25).
Calculate P(Height of STX1110 student is less than 155cm). In notation this is P(X < 155).
To evaluate this probability we must use the steps for standardisation.
1. Subtract the mean.
2. Divide by the standard deviation.
3. Then evaluate the probabilities using standard normal tables.
So this gives,
$$P(X < 155) = P\left(\frac{X - 160}{5} < \frac{155 - 160}{5}\right) = P(Z < -1).$$
We can now evaluate this probability by looking up –1.00 in our tables, which gives P(Z < –1) = 0.1587.
ENDSECTION STARTSECTION=content_3.htm= SECTION~
Further Worked Examples
The time taken to type up each set of STX1110 notes is normally distributed with mean 6.4 hours and standard deviation 1.2 hours. Calculate the probability that it takes
a) Less than 6 hours.
b) Greater than 7 hours.
c) Between 6 and 7 hours.
Let X = Length of time to type up the notes. Then X ~ N(6.4, (1.2)²).
Example a)
We need to calculate P(X < 6). Once we apply the steps for standardisation we have,
$$P(X < 6) = P\left(Z < \frac{6 - 6.4}{1.2}\right) = P(Z < -0.33)$$
where Z ~ N(0, 1) and the z value has been rounded to two decimal places. Graphically this is,
Using the standard normal tables we find P(Z < –0.33) = 0.3707. So roughly 37% of the time the notes will take less than 6 hours to type up.
Example b)
We need to calculate P(X > 7). Once we apply the steps for standardisation we have,
$$P(X > 7) = P\left(Z > \frac{7 - 6.4}{1.2}\right) = P(Z > 0.5)$$
Graphically this is,
Now we can use our tables to evaluate the required probability. Remember, as this is an upper tail probability we must subtract the table value from one. This gives P(X > 7) = P(Z > 0.5) = 1 – P(Z < 0.5) = 1 – 0.6915 = 0.3085. So roughly 31% of the time it will take more than 7 hours to type up the notes.
Example c)
P(6 < X < 7) = P(X < 7) – P(X < 6) = 0.6915 – 0.3707 = 0.3208
Graphically,
[Figure: area between 6 and 7 = area to the left of 7 – area to the left of 6]
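The three worked examples can be reproduced with a short sketch. It rounds each z value to two decimal places before looking it up, to mimic the use of printed tables; the helper name `table_z` and the variable names are assumptions for this illustration.

```python
from statistics import NormalDist

MU, SIGMA = 6.4, 1.2   # mean and standard deviation of the typing time in hours
Z = NormalDist()       # standard normal, standing in for the printed tables


def table_z(x):
    """Standardise x and round to two decimals, as when using printed tables."""
    return round((x - MU) / SIGMA, 2)


p_less_than_6 = Z.cdf(table_z(6))                    # a) lower tail
p_more_than_7 = 1 - Z.cdf(table_z(7))                # b) upper tail
p_between = Z.cdf(table_z(7)) - Z.cdf(table_z(6))    # c) strip

print(f"a) P(X < 6)     = {p_less_than_6:.4f}")
print(f"b) P(X > 7)     = {p_more_than_7:.4f}")
print(f"c) P(6 < X < 7) = {p_between:.4f}")
```

Without the rounding step the answer to a) changes slightly in the third decimal place, because the exact z value is –0.333… rather than –0.33.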
ENDSECTION STARTSECTION=activity_1.htm= SECTION~
Seminar Questions
Seminar Question M6.1
Suppose the weight of a bag of potatoes in pounds is N(5, (0.2)²).
X = Weight of a bag in pounds. X ~ N(5, (0.2)²).
i) Calculate the probability a bag weighs less than 5.5 pounds.
P(X< 5.5) =
ii) Calculate the probability a bag weighs more than 5.5 pounds.
P(X > 5.5) =
iii) Calculate the probability a bag weighs between 4.5 and 5.5 pounds.
P(4.5 < X < 5.5) =
ENDSECTION STARTSECTION=activity_2.htm= SECTION~
Seminar Question M6.2
Suppose X ~ N(6.4, (1.2)²). Calculate the following probabilities.
i) P(X < 6.5) =
ii) P(X > 7.5) =
iii) P(5.8 < X < 6.7) =
ENDSECTION STARTSECTION=content_4.htm= SECTION~
The standard normal distribution
[Figure: two sketches of the standard normal curve, each with a z axis running from -4 to 4]
Table entries give P(Z < z). The row gives z to one decimal place and the column gives the second decimal place.
z     0      1      2      3      4      5      6      7      8      9
-3.0  .0013  .0013  .0013  .0012  .0012  .0011  .0011  .0011  .0010  .0010
-2.9  .0019  .0018  .0018  .0017  .0016  .0016  .0015  .0015  .0014  .0014
-2.8  .0026  .0025  .0024  .0023  .0023  .0022  .0021  .0021  .0020  .0019
-2.7  .0035  .0034  .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026
-2.6  .0047  .0045  .0044  .0043  .0041  .0040  .0039  .0038  .0037  .0036
-2.5  .0062  .0060  .0059  .0057  .0055  .0054  .0052  .0051  .0049  .0048
-2.4  .0082  .0080  .0078  .0075  .0073  .0071  .0069  .0068  .0066  .0064
-2.3  .0107  .0104  .0102  .0099  .0096  .0094  .0091  .0089  .0087  .0084
-2.2  .0139  .0136  .0132  .0129  .0125  .0122  .0119  .0116  .0113  .0110
-2.1  .0179  .0174  .0170  .0166  .0162  .0158  .0154  .0150  .0146  .0143
-2.0  .0228  .0222  .0217  .0212  .0207  .0202  .0197  .0192  .0188  .0183
-1.9  .0287  .0281  .0274  .0268  .0262  .0256  .0250  .0244  .0239  .0233
-1.8  .0359  .0351  .0344  .0336  .0329  .0322  .0314  .0307  .0301  .0294
-1.7  .0446  .0436  .0427  .0418  .0409  .0401  .0392  .0384  .0375  .0367
-1.6  .0548  .0537  .0526  .0516  .0505  .0495  .0485  .0475  .0465  .0455
-1.5  .0668  .0655  .0643  .0630  .0618  .0606  .0594  .0582  .0571  .0559
-1.4  .0808  .0793  .0778  .0764  .0749  .0735  .0721  .0708  .0694  .0681
-1.3  .0968  .0951  .0934  .0918  .0901  .0885  .0869  .0853  .0838  .0823
-1.2  .1151  .1131  .1112  .1093  .1075  .1056  .1038  .1020  .1003  .0985
-1.1  .1357  .1335  .1314  .1292  .1271  .1251  .1230  .1210  .1190  .1170
-1.0  .1587  .1562  .1539  .1515  .1492  .1469  .1446  .1423  .1401  .1379
-0.9  .1841  .1814  .1788  .1762  .1736  .1711  .1685  .1660  .1635  .1611
-0.8  .2119  .2090  .2061  .2033  .2005  .1977  .1949  .1922  .1894  .1867
-0.7  .2420  .2389  .2358  .2327  .2296  .2266  .2236  .2206  .2177  .2148
-0.6  .2743  .2709  .2676  .2643  .2611  .2578  .2546  .2514  .2483  .2451
-0.5  .3085  .3050  .3015  .2981  .2946  .2912  .2877  .2843  .2810  .2776
-0.4  .3446  .3409  .3372  .3336  .3300  .3264  .3228  .3192  .3156  .3121
-0.3  .3821  .3783  .3745  .3707  .3669  .3632  .3594  .3557  .3520  .3483
-0.2  .4207  .4168  .4129  .4090  .4052  .4013  .3974  .3936  .3897  .3859
-0.1  .4602  .4562  .4522  .4483  .4443  .4404  .4364  .4325  .4286  .4247
-0.0  .5000  .4960  .4920  .4880  .4840  .4801  .4761  .4721  .4681  .4641

z     0      1      2      3      4      5      6      7      8      9
0.0   .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
0.1   .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
0.2   .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
0.3   .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
0.4   .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
0.5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
0.6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
0.7   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
0.8   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
0.9   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
1.0   .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
1.1   .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
1.2   .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
1.3   .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
1.4   .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
1.5   .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
1.6   .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
1.7   .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
1.8   .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
1.9   .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
2.0   .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
2.1   .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
2.2   .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
2.3   .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
2.4   .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
2.5   .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
2.6   .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
2.7   .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
2.8   .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
2.9   .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
3.0   .9987  .9987  .9987  .9988  .9988  .9989  .9989  .9989  .9990  .9990
ENDSECTION STARTSECTION=think_1.htm= SECTION~
How much do you know?
If you have understood the content of this unit you should have knowledge of, and be able to answer questions relating to, all of the following points:
• how the steps for standardisation transform a general normal variable into a standard normal variable;
• how to use the steps for standardisation to find general normal probabilities from the standard normal tables;
• the link between less than, greater than and strip probabilities for a general normal variable.
Extra Activities
Log on to the STX1110 Oasis page and attempt the quizzes in the extra section.
A guide on how to find the STX1110 Oasis quizzes is on page 16 of this book.
Do you want to know more?
For further examples and exercises, refer to Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshott, or any of the books on the STX1110 reading list.
ENDSECTION ENDCHAPTER
M7 Linear Equations Page 253
CHAPTER=LE
STARTSECTION=scope_1.htm= SECTION~
Linear Equations Context
This unit is designed to enable you to top up the mathematical skills required to formulate and solve linear programming problems. We will revise the skills required to generate and manipulate linear equations. The techniques that we revise will provide us with the basic mathematical tools to help us find the optimal allocation of resources such as time, materials and money, to achieve the best solution to the business problem. So to maximise profit we may need to evaluate the optimal number of each type of product to produce within the constraints that we have. ENDSECTION STARTSECTION=scope_2.htm= SECTION~
Objectives
Having worked through this unit you should be able to:
• formulate linear equations;
• manipulate linear equations;
• construct linear graphs;
• identify regions within a plot.
ENDSECTION STARTSECTION=content_1.htm= SECTION~
Linear Equations
Solutions to equations can be thought of as lines or curves. All the equations we will be working with will be linear equations, that is, equations whose solutions can be thought of as straight lines. These equations do not contain terms with powers of x other than 1. So they do not have x², x³, x⁴, etc., but will have a multiple of x. Linear equations are those of the form
y = a + bx
where a is where the line cuts the y axis, and b is the gradient (slope) of the line.
[Figure: graph of y = 10 + 20x, with x from -3 to 3 and y from -60 to 80]
You may have seen this written as y = mx + c, where c is where the line cuts the y axis, and m is the gradient (slope) of the line. It really doesn’t matter what letters you use to represent the slope and intercept. We will use a and b as we will see this form later in the course during the regression and time series units.
The equation y = a + bx tells us what the y-value is for any given x-value. So we can see that y depends on x. Often we see y called the dependent variable, and x the independent variable. Another way to think of this is that if we change the value of x then the value of y will move in response. So when y changes we know that it can be explained by x moving. Often we see y called the response variable and x the explanatory variable.
The numerical value of b tells us how y responds when we increase x by +1. So if y = 10 + 20x, then we know that the line representing this equation crosses the y-axis at 10 when x = 0, and that every time you increase x by +1, y increases by +20. See the example below.
Example y = 10 + 20x
If we calculate the corresponding values of y for x between –3 and 3, we can see that as x increases by +1, y increases by +20.
x              –3    –2    –1    0     1     2     3
y = 10 + 20x   –50   –30   –10   10    30    50    70
For example, when x = –3, y = 10 + 20(–3) = 10 – 60 = –50, and when x = –2, y = 10 + 20(–2) = 10 – 40 = –30. Each step of +1 along the x row gives a step of +20 along the y row.
Using these points we can plot the graph for y = 10 + 20x. The line slopes upwards, telling us that there is a positive relationship between x and y. This tells us that as x increases y increases.
[Figure: graph of y = 10 + 20x, with x from -3 to 3 and y from -60 to 80]
Example y = 10 – 20x
Suppose that our equation has a negative b value, so a negative slope. If we calculate the corresponding values of y for x between –3 and 3, we can see that as x increases by +1, y decreases by 20.
x              –3    –2    –1    0     1     2     3
y = 10 – 20x   70    50    30    10    –10   –30   –50
For example, when x = –3, y = 10 – 20(–3) = 10 – (–60) = 10 + 60 = 70, and when x = –2, y = 10 – 20(–2) = 10 + 40 = 50. Each step of +1 along the x row gives a step of –20 along the y row.
Using these points we can plot the graph for y = 10 – 20x. The line slopes downwards, telling us that there is a negative relationship between x and y. This tells us that as x increases y decreases.
[Figure: graph of y = 10 – 20x, with x from -3 to 3 and y from -60 to 80]
M7 Linear Equations Page 257
STX1110 Introduction to Quantitative Methods Mathematics and Statistics Group Middlesex University Business School
Manipulating linear equations
Often we see equations in the form

P = cx + dy

Examples
21 = 15x + 3y
100 = 20x + 10y

We can rearrange these equations by first taking cx from both sides, and then dividing by d. So for the equation 100 = 20x + 10y we first take 20x from both sides, and then divide by 10:

100 – 20x = 20x + 10y – 20x

The – 20x and the + 20x terms cancel out, leaving

100 – 20x = 10y.

Dividing by 10 then gives

y = 10 – 2x.

So although the equations look different at first sight, they are just different arrangements of each other.

Now go and do Exercise M7.1
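As a check on rearrangements of this kind, here is a small Python sketch. The function name rearrange is hypothetical, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

def rearrange(P, c, d):
    """Rewrite P = c*x + d*y as y = a + b*x and return (a, b)."""
    if d == 0:
        raise ValueError("d must be non-zero to make y the subject")
    a = Fraction(P, d)    # intercept: P divided by d
    b = Fraction(-c, d)   # slope: -c divided by d
    return a, b

a, b = rearrange(100, 20, 10)
print(f"y = {a} + ({b})x")
```

For 100 = 20x + 10y this gives a = 10 and b = –2, matching y = 10 – 2x above.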
ENDSECTION STARTSECTION=activity_1.htm= SECTION~ Exercise M7.1
Rearrange the following into the form y = a + bx.
i) 600 = 6x + 10y
ii) 700 = 10x + 7y
iii) 18 = 3x + 2y
ENDSECTION STARTSECTION=content_2.htm= SECTION~
Drawing Graphs
All we need to draw a line is two points which the line passes through. We say that two points fix a line. Any two points will do, but the easiest two to calculate are the points where the line crosses the axes. The equation of the line below is 600 = 6x + 10y, or y = 60 – 0.6x.

[Graph: the line y = 60 – 0.6x, with x on the horizontal axis and y on the vertical axis. The point where the line meets the y-axis is labelled “Here x = 0, y = ?” and the point where it meets the x-axis is labelled “Here y = 0, x = ?”]
STEP 1
Find where the line cuts the x-axis (where y = 0). Let y = 0 in your equation, and then rearrange it to find the x value.

Example
Suppose our linear equation is 600 = 6x + 10y. Let y = 0.
600 = 6x + 10(0) = 6x + 0 = 6x.
Divide by 6 to find x: x = 600 ÷ 6 = 100.
This gives us the co-ordinates of our first point: x = 100, y = 0. Remembering that co-ordinates are written in the form (x, y), this is (100, 0). So the line crosses the x-axis at (100, 0).
STEP 2
Find where the line cuts the y-axis (where x = 0). Let x = 0 in your equation, and then rearrange it to find the y value.

Example
Our linear equation is 600 = 6x + 10y. Let x = 0.
600 = 6(0) + 10y = 0 + 10y = 10y.
Divide by 10 to find y: y = 600 ÷ 10 = 60.
This gives us the co-ordinates of our second point: x = 0, y = 60, which is (0, 60). So the line crosses the y-axis at (0, 60).
STEP 3
Mark these two points on the axes of your graph and join them up with a straight line.

[Graph: the line y = 60 – 0.6x drawn through the points (0, 60) and (100, 0)]
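Steps 1 and 2 can be sketched in Python as follows. The function name axis_intercepts is our own, and the sketch assumes that neither c nor d is zero, so that the line really does cross both axes.

```python
from fractions import Fraction

def axis_intercepts(P, c, d):
    """Return the points where P = c*x + d*y crosses the x-axis and the y-axis."""
    x_cut = (Fraction(P, c), 0)   # STEP 1: let y = 0, so x = P / c
    y_cut = (0, Fraction(P, d))   # STEP 2: let x = 0, so y = P / d
    return x_cut, y_cut

x_cut, y_cut = axis_intercepts(600, 6, 10)
print(f"x-axis: ({x_cut[0]}, 0)   y-axis: (0, {y_cut[1]})")
```

Joining the two points returned is then Step 3.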
Now go and do Exercise M7.2
ENDSECTION STARTSECTION=activity_2.htm= SECTION~ Exercise M7.2
Find two points that satisfy 2x + 3y = 18, and draw the graph of this linear equation.
Use the steps above to draw the straight line graph of 2x + 3y = 18.
ENDSECTION STARTSECTION=content_3.htm= SECTION~
Finding the intersection of two straight lines

[Graph: the lines 500 = 10x + 5y and 600 = 6x + 10y drawn on the same axes, crossing at a single point]

After drawing the lines representing two linear equations on the same graph you may find that they intersect, as above. To find the co-ordinates of the point where the lines meet we substitute one equation into the other and then solve.
Simultaneous Equations
If there are two unknown variables that we want to find, we need two equations – known as simultaneous equations – in order to do so. There are two methods of solving simultaneous equations: elimination or substitution. ENDSECTION STARTSECTION=content_4.htm= SECTION~
Solving simultaneous equations by elimination.
Equation 1: 2x + 5y = 20
Equation 2: 3x – 2y = 11
STEP 1
Eliminate x by making the coefficient of x the same in both equations. In this example, we can multiply the first equation by 3 and the second by 2:
3(2x + 5y) = 3 × 20 so 6x + 15y = 60
2(3x – 2y) = 2 × 11 so 6x – 4y = 22
STEP 2
Subtract one equation from the other to eliminate x:
(6x + 15y) – (6x – 4y) = 60 – 22
The terms involving x cancel out and so we have
19y = 38.
Dividing by 19, y = 38/19 = 2.
STEP 3
Substitute this value of y into one of the equations. Using Equation 1, 2x + 5y = 20, substituting in y = 2 gives
2x + 5(2) = 20
2x + 10 = 20 ⇒ 2x = 10 ⇒ x = 5.
So the solution is x = 5, y = 2.
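The three steps of the elimination method can be sketched in Python. This is an illustration only: the function name solve_by_elimination and the (a, b, k) representation of the equation ax + by = k are assumptions made for this sketch, and it assumes the x coefficient of the first equation is not zero.

```python
from fractions import Fraction

def solve_by_elimination(eq1, eq2):
    """Solve a1*x + b1*y = k1 and a2*x + b2*y = k2; each equation is (a, b, k)."""
    a1, b1, k1 = eq1
    a2, b2, k2 = eq2
    # STEP 1: scale so both equations have the same x coefficient, a1*a2
    s1 = (a1 * a2, b1 * a2, k1 * a2)
    s2 = (a2 * a1, b2 * a1, k2 * a1)
    # STEP 2: subtract to eliminate x, leaving a single equation in y
    y_coeff = s1[1] - s2[1]
    if y_coeff == 0:
        raise ValueError("lines are parallel or identical: no unique solution")
    y = Fraction(s1[2] - s2[2], y_coeff)
    # STEP 3: substitute y back into the first equation to find x
    x = (k1 - b1 * y) / a1
    return x, y

x, y = solve_by_elimination((2, 5, 20), (3, -2, 11))
print(f"x = {x}, y = {y}")
```

Note that the sketch multiplies the first equation by 3 and the second by 2, exactly as in Step 1 above.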
Now go and do Exercise M7.3
ENDSECTION STARTSECTION=activity_3.htm= SECTION~ Exercise M7.3
Try solving the equations above again, but eliminate the variable y first.
ENDSECTION STARTSECTION=content_5.htm= SECTION~
Solving Simultaneous Equations by Substitution
600 = 6x + 10y   Label this (1)
500 = 10x + 5y   Label this (2)
STEP 1
Rearrange (1) into the form y = a + bx.
Divide by 10: 600/10 = 6x/10 + 10y/10,
which is 60 = 0.6x + y.
Take 0.6x from both sides: 60 – 0.6x = 0.6x + y – 0.6x = y.
So y = 60 – 0.6x. Label this (3).
STEP 2
Substitute (3) into (2) for y and solve.
500 = 10x + 5y = 10x + 5(60 – 0.6x) = 10x + 300 – 3x = 7x + 300
x = (500 – 300)/7 = 200/7 = 28 4/7.
STEP 3
Substitute this value for x into (2) to find the corresponding y value.
500 = 10(28 4/7) + 5y
100 = 2(28 4/7) + y (dividing by 5)
Rearranging gives y = 100 – 2(28 4/7) = 42 6/7.
ENDSECTION STARTSECTION=content_6.htm= SECTION~
Estimating points of intersection using the graphical method
Finding the co-ordinates by solving the simultaneous equations is exact, but not always straightforward. An alternative method is to find the co-ordinates of the intersection approximately, by reading them off the graph as accurately as possible. This method is not exact, but is more straightforward. The accuracy of the co-ordinates depends on how well you have drawn your graph.

[Graph: the lines 500 = 10x + 5y and 600 = 6x + 10y drawn on the same axes]

From the graph, the intersection has co-ordinates of approximately (28, 42), compared with the exact solution (28 4/7, 42 6/7) found by substitution.
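To see how close a graphical estimate is, the exact intersection can be recomputed by substitution and compared with the values read from the graph. The following Python sketch does this for the two lines above; the variable names are our own.

```python
from fractions import Fraction

# Line (1): 600 = 6x + 10y, rearranged to y = 60 - 0.6x (a = 60, b = -3/5)
a, b = Fraction(60), Fraction(-3, 5)

# Substitute into line (2): 500 = 10x + 5(a + b*x), then solve for x
x = (500 - 5 * a) / (10 + 5 * b)
y = a + b * x

estimate = (28, 42)  # values read from the graph in the text
error = (float(x) - estimate[0], float(y) - estimate[1])
print(f"exact x = {x}, y = {y}")
print(f"estimate is off by {error[0]:.2f} in x and {error[1]:.2f} in y")
```

Here the estimate is within one unit of the exact answer in both co-ordinates, which is typical of what a carefully drawn graph can achieve.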
Inequalities
We will be using statements like
y < a + bx where < means “less than”
y ≤ a + bx where ≤ means “less than or equal to”
y > a + bx where > means “greater than”
y ≥ a + bx where ≥ means “greater than or equal to”
Linear inequalities define regions on a graph.
Example 1
The shaded area below represents the region 2x + 3y ≤ 18:

[Graph: the line 2x + 3y = 18 on x and y axes, with the region below and to the left of the line shaded, extending across both axes]
We shade all points (x, y) for which 2x + 3y ≤ 18. This region is below and to the left of the line. The shading crosses the axes as there are points with negative co-ordinates that satisfy the inequality, e.g. (–1, –1), since 2x + 3y = 2(–1) + 3(–1) = –5, which is less than 18.

Example 2
The shaded area below represents the region 2x + 3y ≤ 18, with x and y positive.

[Graph: the line 2x + 3y = 18 on x and y axes, with the shaded region bounded by the line and the two axes]
We shade all positive points (x, y) for which 2x + 3y ≤ 18. This region is below and to the left of the line, but it does not cross the axes.
Example 3
The shaded area below represents the region 2x + 3y ≤ 18, with x ≤ 3.

[Graph: the lines 2x + 3y = 18 and x = 3 on x and y axes, with the region below the first line and to the left of x = 3 shaded]
We shade all points (x, y) for which 2x + 3y ≤ 18 and x ≤ 3. This region is below the line 2x + 3y = 18 and to the left of the line x = 3.
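Whether a particular point belongs to one of the three shaded regions can be tested directly from the inequalities. A minimal Python sketch, in which the function name in_region and its optional arguments are our own invention for illustration:

```python
def in_region(x, y, require_positive=False, x_max=None):
    """Check whether (x, y) lies in the region 2x + 3y <= 18, with optional extra conditions."""
    ok = 2 * x + 3 * y <= 18
    if require_positive:          # Example 2: x and y not negative
        ok = ok and x >= 0 and y >= 0
    if x_max is not None:         # Example 3: x no greater than x_max
        ok = ok and x <= x_max
    return ok

print(in_region(-1, -1))                         # Example 1
print(in_region(-1, -1, require_positive=True))  # Example 2
print(in_region(6, 2, x_max=3))                  # Example 3
```

Testing one convenient point on each side of a line is a quick way to decide which side to shade.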
ENDSECTION STARTSECTION=activity_4.htm= SECTION~
Seminar Questions
Seminar Question M7.1
Draw the graphs for the following linear equations:
i) 5x + 12y = 2400
ii) 3x + 4y = 1200
iii) x + 5y = 800
Also find the points where the lines above meet. You can use simultaneous equations or estimate the points from your graph.
ENDSECTION STARTSECTION=activity_5.htm= SECTION~
Seminar Question M7.2
Shade the following regions:
i) 5x + 12y ≤ 2400
ii) 3x + 4y ≤ 1200
iii) x + 5y ≤ 800
ENDSECTION STARTSECTION=activity_6.htm= SECTION~
Seminar Question M7.3
Shade the regions (from Seminar Question M7.2) with the additional
condition that x and y are both positive.
ENDSECTION STARTSECTION=activity_7.htm= SECTION~
Seminar Question M7.4
Shade the region that satisfies the following conditions simultaneously:
5x + 12y ≤ 2400, 3x + 4y ≤ 1200, x + 5y ≤ 800, and x & y positive.
ENDSECTION STARTSECTION=think_1.htm= SECTION~
How much do you know?
If you have understood the content of this unit you should have knowledge of, and be able to answer questions relating to, all of the following points:
• how to find the co-ordinates of the points where a linear equation crosses the axes;
• how to plot linear equations;
• how to evaluate points of intersection for two linear equations by solving simultaneous equations or by estimating these graphically;
• how to identify regions that satisfy inequalities.
Extra Activities
Log on to the STX1110 Oasis page and attempt the quizzes in the extra section.
A guide on how to find the STX1110 Oasis quizzes is on page 16 of this book.
Do you want to know more?
For further examples and exercises, refer to Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshott, or any of the books on the STX1110 reading list.
ENDSECTION ENDCHAPTER
CHAPTER=LP
STARTSECTION=scope_1.htm= SECTION~
Linear Programming and Optimisation Context
In this unit we investigate methods for solving linear programming problems. We will find optimal solutions, either maximising profit or minimising cost, by applying the mathematical skills from the previous unit to linear business problems. We will consider optimisation problems involving two variables. Often, linear programming is used to investigate the optimal allocation of certain resources to maximise profit. For example, a plastics manufacturer makes washing up bowls and bins and wants to find how many of each type should be made to maximise the profit, given various constraining factors such as the amount of plastic available. By the end of this unit you will be able to identify the number of each product to make to produce the maximum profit. ENDSECTION STARTSECTION=scope_2.htm= SECTION~
Objectives
Having worked through this unit you should be able to:
• formulate linear programming problems;
• graph the constraint inequalities and identify the feasible region;
• find an optimal solution to the problem;
• evaluate the utilisation of resources for points within the feasible region.
ENDSECTION STARTSECTION=content_1.htm= SECTION~
Profit Lines
Suppose a company is providing two products; eg kitchen bins and washing up bowls. The profit on a kitchen bin is £3 and the profit on a washing up bowl is £2. Let x = number of bins produced, and y = number of bowls produced, then the Profit function is, P = 3x + 2y. How do we graph this?
a) Use three axes, x, y and P. It is difficult to show a 3 dimensional picture on 2 dimensional paper.
Example P = 3x + 2y

[Graph: three-dimensional plot of P = 3x + 2y, with axes x, y and P]
b) Use 2 axes, x and y, and draw the profit function for particular values of P.
Example
If P = £20 our linear equation is 20 = 3x + 2y, which passes through the points (0, 10) and (6.67, 0).
If P = £60 our linear equation is 60 = 3x + 2y, which passes through the points (20, 0) and (0, 30).

[Graph: the profit lines 20 = 3x + 2y and 60 = 3x + 2y drawn on the same x and y axes]
The profit lines are all parallel and as the profit increases the line moves further up the graph. ENDSECTION STARTSECTION=content_2.htm= SECTION~
Gradient of a General Profit Line
Let c and d be constants, then our profit function is P = cx + dy. Taking cx from both sides we have dy = –cx + P. Dividing by d then gives

y = –(c/d)x + P/d

The gradient of the profit line is –c/d, so for our example, P = 3x + 2y, the gradient is –3/2.
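A quick numerical check shows why profit lines for different values of P are parallel: the gradient –c/d does not involve P, and only the intercept P/d changes. A minimal Python sketch, with a function name of our own choosing:

```python
from fractions import Fraction

def profit_line(P, c, d):
    """Return (gradient, y_intercept) of the profit line P = c*x + d*y."""
    return Fraction(-c, d), Fraction(P, d)

for P in (20, 60):
    gradient, intercept = profit_line(P, 3, 2)
    print(f"P = {P}: gradient = {gradient}, y-intercept = {intercept}")
```

Both lines have gradient –3/2, while the y-intercept rises from 10 to 30 as the profit increases from £20 to £60.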
ENDSECTION STARTSECTION=content_3.htm= SECTION~
Linear Programming
Linear programming is concerned with maximising (or minimising) some linear objective function (e.g. profit function), subject to some constraints on x and y. This means finding values for x and y which maximise the objective function and which satisfy the constraints. There are several ways of solving this problem.
Example: Maximising Profit
A manufacturer makes two products, X and Y. Each X requires 5 hours in the assembly department, 3 hours in the spraying department and 1 hour in the finishing department. For Y, the time required in each of these departments is 12 hours, 4 hours and 5 hours respectively. The total weekly hours available in each department are 2400, 1200 and 800. If the profits are £30 on each X and £100 on each Y, what is the maximum profit output?
             X               Y               Total time
             (time in hrs)   (time in hrs)   available (hrs)
ASSEMBLY     5               12              2400
SPRAYING     3               4               1200
FINISHING    1               5               800
PROFIT (£)   30              100
How do we express this as a linear programme?
Linear Programme
We want to maximise profit, P = 30x + 100y
Subject to our constraints,
Assembly 5x + 12y ≤ 2400
Spraying 3x + 4y ≤ 1200
Finishing x + 5y ≤ 800
We have the additional constraints that x and y are positive (x ≥ 0, y ≥ 0).

Each constraint tells us how resources are used within the maximum resources we have available. In the Assembly department the time used for a given level of output (x, y) is 5x + 12y. So we know that for any solution to our problem the amount used is less than or equal to the amount available, 2400. Any difference between what we have used and what we have available is spare. We will evaluate this later.
How do we find the optimal profit point?
So how do we find the values for x and y that maximise P?
STEP 1
Find the region that satisfies our constraints by drawing a graph. So we must plot each constraint equation onto our graph.
Assembly: 5x + 12y = 2400 This line passes through (0, 200) and (480, 0).
Spraying: 3x + 4y = 1200 This line passes through (0, 300) and (400, 0).
Finishing: x + 5y = 800 This line passes through (0, 160) and (800, 0).
STEP 2
We now shade the area on our graph which satisfies our constraint inequalities simultaneously. This produces our feasible region, the set of possible answers to our linear programme.
[Graph: the Assembly, Spraying and Finishing constraint lines drawn on x and y axes, with the feasible region shaded]

The optimum point is the point (x, y) that is in the feasible region and which maximises the objective function.
STEP 3
To find the optimum point you can either compare slopes of the edges of the feasible region with the slope of the objective function, or find the co-ordinates of the points A, B, C, and D, then calculate the profit at each of these points to see which is the maximum value.
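[Graph: the feasible region with its corner points labelled A, B, C and D. A is where the Finishing line meets the y-axis, B is the intersection of the Finishing and Assembly lines, C is the intersection of the Assembly and Spraying lines, and D is where the Spraying line meets the x-axis]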
ENDSECTION STARTSECTION=content_4.htm= SECTION~
Method of Slopes
The slopes of the objective function and the constraints are calculated and compared to determine the optimum point. The slope or gradient of an equation of the form P = cx + dy is given by –c/d. So if we calculate the slopes of the constraints, finishing, assembly, and spraying, and compare them to the slope of the profit function we will be able to identify the optimum point.
Profit function P = 30x + 100y Slope = –30/100 = –0.3.
Finishing: x + 5y = 800 Slope = –1/5 = –0.2.
Assembly: 5x + 12y = 2400 Slope = –5/12 = –0.42.
Spraying: 3x + 4y = 1200 Slope = –3/4 = –0.75.
Now –0.3 is between –0.2 and –0.42, and so the optimum point is the intersection of the Finishing and Assembly lines, the point marked B on our graph. Next we must find the co-ordinates of B. We can either estimate the co-ordinates from the graph or use simultaneous equations for an exact solution.
So the Optimum Point = (184 8/13, 123 1/13) = (184.6, 123.1), which produces a maximum profit of 30(184 8/13) + 100(123 1/13) = £17,846.15.
ENDSECTION STARTSECTION=content_5.htm= SECTION~
Calculating Profit Method
First find the co-ordinates of A, B, C, D. We know A= (0, 160) and D = (400, 0) from the graph. We then have to solve two sets of simultaneous equations to find the co-ordinates of B and C, or estimate their co-ordinates by reading them from the graph. Once we have the co-ordinates we substitute them into the profit function and find the point that produces the maximum profit.
Point   Co-ordinates (x, y)   Profit P = 30x + 100y
A       (0, 160)              30(0) + 100(160) = £16,000
B       (184.6, 123.1)        30(184 8/13) + 100(123 1/13) = £17,846.15
C       (300, 75)             30(300) + 100(75) = £16,500
D       (400, 0)              30(400) + 100(0) = £12,000
We clearly see that point B is the optimal point, since it produces the maximum profit.
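The calculating profit method can be automated: intersect every pair of boundary lines, keep the intersections that satisfy all the constraints, and evaluate the profit at each. The Python sketch below is one way to do this under stated assumptions; the (a, b, k) representation of the constraint ax + by ≤ k and the function names are our own. It also finds the origin as a corner of the feasible region, where the profit is zero.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint a*x + b*y <= k is stored as (a, b, k).
# x >= 0 and y >= 0 are written as -x <= 0 and -y <= 0.
constraints = [(5, 12, 2400), (3, 4, 1200), (1, 5, 800), (-1, 0, 0), (0, -1, 0)]

def intersection(l1, l2):
    """Point where two boundary lines meet, or None if they are parallel."""
    (a1, b1, k1), (a2, b2, k2) = l1, l2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return Fraction(k1 * b2 - k2 * b1, det), Fraction(a1 * k2 - a2 * k1, det)

def feasible(point):
    """True if the point satisfies every constraint."""
    x, y = point
    return all(a * x + b * y <= k for a, b, k in constraints)

def profit(point):
    return 30 * point[0] + 100 * point[1]

corners = {p for l1, l2 in combinations(constraints, 2)
           if (p := intersection(l1, l2)) is not None and feasible(p)}
best = max(corners, key=profit)
print(f"best corner: ({float(best[0]):.1f}, {float(best[1]):.1f}), "
      f"profit £{float(profit(best)):,.2f}")
```

Exact fractions are used so that the test for feasibility is not upset by rounding at corner points, where a constraint holds with equality.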
Optimal Solution
Using either method we find that point B is the optimal solution to our linear programme and that, rounding down to whole units, we should produce 184 of product X and 123 of product Y to maximise profit.
ENDSECTION STARTSECTION=content_6.htm= SECTION~
Utilisation of Resources
Once we have an optimal solution we can assess how fully we are using the resources available at each corner of the feasible region.
Point (x, y)       Resource    Used                                        Available   Spare = Available – Used
A (0, 160)         Finishing   x + 5y = 0 + 5(160) = 800 (fully used)      800         800 – 800 = 0
                   Assembly    5x + 12y = (5×0) + (12×160) = 1920          2400        2400 – 1920 = 480
                   Spraying    3x + 4y = (3×0) + (4×160) = 640             1200        1200 – 640 = 560
B (184.6, 123.1)   Finishing   Fully used                                  800         0
                   Assembly    Fully used                                  2400        0
                   Spraying    3x + 4y = (3×184.6) + (4×123.1) = 1046.2    1200        1200 – 1046.2 = 153.8
C (300, 75)        Finishing   x + 5y = 300 + (5×75) = 675                 800         800 – 675 = 125
                   Assembly    Fully used                                  2400        0
                   Spraying    Fully used                                  1200        0
D (400, 0)         Finishing   x + 5y = 400 + (5×0) = 400                  800         800 – 400 = 400
                   Assembly    5x + 12y = (5×400) + (12×0) = 2000          2400        2400 – 2000 = 400
                   Spraying    Fully used                                  1200        0
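The utilisation table can be reproduced with a short calculation: for each department, the hours used are found from the constraint expression and the spare hours are what remains. The following Python sketch is an illustration; the dictionary layout and the function name utilisation are assumptions of ours.

```python
from fractions import Fraction

resources = {  # name: (hours per X, hours per Y, hours available)
    "Finishing": (1, 5, 800),
    "Assembly": (5, 12, 2400),
    "Spraying": (3, 4, 1200),
}

def utilisation(x, y):
    """Return {resource: (used, spare)} for the production plan (x, y)."""
    result = {}
    for name, (per_x, per_y, available) in resources.items():
        used = per_x * x + per_y * y
        result[name] = (used, available - used)
    return result

B = (Fraction(2400, 13), Fraction(1600, 13))  # exact co-ordinates of point B
for name, (used, spare) in utilisation(*B).items():
    print(f"{name}: used {float(used):.1f}, spare {float(spare):.1f}")
```

A spare value of zero identifies the constraints that are fully used at that point, which are the two lines that intersect there.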
Example: Minimising Cost
A health food manufacturer wishes to blend two kinds of food so that the package can claim that a 100g portion will contain enough of two particular vitamins to meet the daily requirement for good health.
The requirements are:
Vitamin A at least 250 units and Vitamin B at least 225 units.
The vitamin content of each g of the two foods are shown below.
        Food 1   Food 2
Vit A   2        5
Vit B   3        2
If Food 1 costs 0.5p per g, and Food 2 costs 0.4p per g, how much of each food should be used in each 100g portion to give enough of vitamins A and B at minimum cost?
How do we express this as a linear programme?
Let the manufacturer blend x g of Food 1 with y g of Food 2. This will help us construct our cost function and constraint inequalities.

We want to minimise the cost of the food portion. Since Food 1 costs 0.5p per g and Food 2 costs 0.4p per g, and our portion is made up of x g of Food 1 and y g of Food 2, the cost of a portion (in pence) is

Cost = C = 0.5x + 0.4y

We know that each portion’s weight must be at least 100g. So the weight of Food 1 (x) plus the weight of Food 2 (y) must be at least 100g. This produces

x + y ≥ 100

Also we know that the Vitamin A content of the portion must be at least 250 units. So the Vitamin A content of Food 1 plus the Vitamin A content of Food 2 must be at least 250 units. This produces

2x + 5y ≥ 250

Similarly the Vitamin B content of the portion must be at least 225 units. So the Vitamin B content of Food 1 plus the Vitamin B content of Food 2 must be at least 225 units. This produces

3x + 2y ≥ 225

Common sense tells us that the quantities we use of each food cannot be negative. This gives x ≥ 0, y ≥ 0.
Linear Programme
We want to minimise Cost = 0.5x + 0.4y subject to the following constraints,
x + y ≥ 100 (1) 100g portion
2x + 5y ≥ 250 (2) Vitamin A requirement
3x + 2y ≥ 225 (3) Vitamin B requirement
x ≥ 0, y ≥ 0
To find the region that satisfies our constraints we draw the graph of our constraints. So we must plot each constraint equation onto our graph.
Constraint (1) x + y = 100 This line passes through (0, 100) and (100, 0).
Constraint (2) 2x + 5y = 250 This line passes through (0, 50) and (125, 0).
Constraint (3) 3x + 2y = 225 This line passes through (0, 112.5) and (75, 0).
Method of Slopes
The slopes of the objective function and the constraints are calculated and compared to determine the optimum point. The slope or gradient of an equation of the form P = cx + dy is given by –c/d. So if we calculate the slopes of constraints (1), (2) and (3), and compare them to the slope of the cost function, we will be able to identify the optimum point.
Cost = C = 0.5x + 0.4y Slope = –0.5/0.4 = –1.25
Constraint (1) x + y ≥ 100 Slope = –1/1 = –1
Constraint (2) 2x + 5y ≥ 250 Slope = –2/5 = –0.4
Constraint (3) 3x + 2y ≥ 225 Slope = –3/2 = –1.5
Now –1.25 is between –1 and –1.5, and so the optimum point is the intersection of (1) and (3), which is point B. We can either estimate the co-ordinates from the graph or use simultaneous equations for an exact solution. So the Optimum Point = (25, 75), which produces a minimum cost of 0.5(25) + 0.4(75) = 42.5p. So the manufacturer should blend 25 g of Food 1 with 75 g of Food 2 to achieve a minimum cost of 42.5p per 100g.
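The comparison of slopes can be sketched in Python. This is an illustration for this example only: it relies on the fact that here the constraints, once ordered by slope, correspond to neighbouring edges of the feasible region, and the function and variable names are our own.

```python
from fractions import Fraction

def slope(c, d):
    """Gradient of a line of the form P = c*x + d*y."""
    return Fraction(-c, d)

objective = slope(Fraction(1, 2), Fraction(2, 5))   # Cost = 0.5x + 0.4y
constraint_slopes = {"(1)": slope(1, 1), "(2)": slope(2, 5), "(3)": slope(3, 2)}

# Sort the constraints from steepest to shallowest and find the adjacent pair
# whose slopes bracket the slope of the objective function.
ordered = sorted(constraint_slopes.items(), key=lambda item: item[1])
bracket = None
for (name_lo, s_lo), (name_hi, s_hi) in zip(ordered, ordered[1:]):
    if s_lo <= objective <= s_hi:
        bracket = (name_lo, name_hi)

print(f"objective slope = {float(objective)}; optimum where {bracket} meet")
```

The pair returned is (3) and (1), so the optimum lies where the 100g portion line meets the Vitamin B line.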
Calculating the cost method
First find the co-ordinates of A, B, C, D. Once we have the co-ordinates we substitute them into the cost function and find the point that produces the minimum cost.
Point   Co-ordinates (x, y)   Cost = C = 0.5x + 0.4y
A       (0, 112.5)            0.5(0) + 0.4(112.5) = 45p
B       (25, 75)              0.5(25) + 0.4(75) = 42.5p
C       (83⅓, 16⅔)            0.5(83⅓) + 0.4(16⅔) = 48.33p
D       (125, 0)              0.5(125) + 0.4(0) = 62.5p
We clearly see that point B is the optimal point, since it produces the minimum cost.
Optimal Solution
Using either method we find that point B is the optimal solution to our linear programme. So the manufacturer should blend 25 g of Food 1 with 75 g of Food 2 to achieve a minimum cost of 42.5p per 100g, subject to our constraints.
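The cost at each corner point can be checked with a few lines of Python. The sketch below takes the co-ordinates of A, B, C and D as given in the text and uses exact fractions; the names used are our own.

```python
from fractions import Fraction

corners = {
    "A": (Fraction(0), Fraction(225, 2)),      # (0, 112.5)
    "B": (Fraction(25), Fraction(75)),
    "C": (Fraction(250, 3), Fraction(50, 3)),  # (83 1/3, 16 2/3)
    "D": (Fraction(125), Fraction(0)),
}

def cost(x, y):
    """Cost in pence of x g of Food 1 and y g of Food 2."""
    return Fraction(1, 2) * x + Fraction(2, 5) * y

costs = {name: cost(x, y) for name, (x, y) in corners.items()}
cheapest = min(costs, key=costs.get)
for name, value in costs.items():
    print(f"{name}: {float(value):.2f}p")
print(f"cheapest corner: {cheapest}")
```

For a minimising problem we pick the smallest value rather than the largest, but otherwise the procedure is the same as in the profit example.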
Exercise M8.1
Try calculating the utilisation of resources at the optimal point for the above example.
ENDSECTION STARTSECTION=activity_1.htm= SECTION~
Seminar Questions
Seminar Question M8.1
The table below gives information about a bed manufacturer, Sleepy Nights Ltd, who make and distribute single and double beds. Each manufactured item goes through 3 departments, assembly, testing and distribution.
Time for activity in mins
               Singles (x)   Doubles (y)   Total time available
Assembly       9             12            7200
Testing        10            10            6500
Distribution   12            6             6000
Profit         150           100
a) Formulate the above information as a linear programming problem, and write down your objective function.
b) Draw a graph of the constraint inequalities.
c) Find the optimum profit point, indicate this on your graph and evaluate the maximum profit.
d) Describe the utilisation of resources at the optimum point. ENDSECTION STARTSECTION=activity_2.htm= SECTION~
Seminar Question M8.2
The table below gives information about a car manufacturer, Citrus Cars Ltd, for two models the AM and the CM. Each car must go through 3 departments, assembly, testing and distribution.
Time for activity in mins
               AM (x)   CM (y)   Total time available
Assembly       50       30       3000
Testing        100      100      7000
Distribution   60       110      6600
Profit         2500     3500
a) Formulate the above information as a linear programming problem, and write down your objective function.
b) Draw a graph of the constraint inequalities.
c) Find the optimum profit point, indicate this on your graph and evaluate the maximum profit.
d) Describe the utilisation of resources at the optimum point.
e) If the profit of the AM is increased to £4500 how would this affect the optimum point?
ENDSECTION STARTSECTION=think_1.htm= SECTION~
How much do you know?
If you have understood the content of this unit you should have knowledge of, and be able to answer questions relating to, all of the following points:
• the formulation of the profit function from the information provided;
• the formulation of the constraint inequalities from the information provided;
• how to graph the constraint inequalities and identify the feasible region;
• the identification of the optimal point, and the calculation of the maximum profit achieved;
• how to evaluate the utilisation of resources.
Extra Activities
Log on to the STX1110 Oasis page and attempt the quizzes in the extra section.
A guide on how to find the STX1110 Oasis quizzes is on page 16 of this book.
Do you want to know more?
For further examples and exercises, refer to Essential Quantitative Methods for Business, Management and Finance, third edition, Les Oakshott, or any of the books on the STX1110 reading list.
ENDSECTION ENDCHAPTER
CHAPTER=TS
STARTSECTION=scope_1.htm= SECTION~
Time Series Analysis Context
At the beginning of the course we examined methods used to collect data. We then represented our results graphically, and summarised the data. We also used regression analysis to describe any linear relationships between variables. The regression methods used are appropriate when considering a causal relationship between variables. Having estimated the regression model it can be used to estimate/forecast/predict a value of the response variable, y, for a known value of the explanatory variable, x. Thus, we could use such models to forecast the sales of a commodity given we know the advertising expenditure etc. In this unit we will look at how the relationship between a variable (such as sales, demand, price of materials, labour costs) and time can be analysed so as to predict future values of the variable. The generation of reliable estimates of future values is crucial for planning purposes in many businesses. Time series analysis involves consideration of historical data to obtain estimates or forecasts of future values based on past values. ENDSECTION STARTSECTION=scope_2.htm= SECTION~
Objectives
Having worked through this unit you should be able to:
• explain what a time series is;
• construct a time series graph;
• describe the components of a time series;
• estimate the trend using regression and moving average techniques;
• find seasonal variations using the additive and multiplicative models;
• forecast by extrapolating a trend and adjusting for seasonal variation.
ENDSECTION STARTSECTION=content_1.htm= SECTION~
What is a Time Series?
A series of values taken over a time period is referred to as a time series. For our purposes we will assume the data was recorded at regular time intervals. The following are examples of time series.
• Financial Times Index recorded daily for the last 5 years
• Daily air pollution levels in London for the last month
• Number of road deaths in U.K. recorded weekly for the last 10 years
• Monthly sales of a company over the last 2 years
• Total annual costs of production for a company for the last 10 years.
Now go and do Exercise M9.1
ENDSECTION STARTSECTION=activity_1.htm= SECTION~ Exercise M9.1
How many observations are there in each of the above time series?
ENDSECTION STARTSECTION=content_2.htm= SECTION~
Aims of Time Series Analysis
• To identify patterns in the data. E.g. electricity bills will be high in winter and low in summer
• To gain an understanding of the variation in the data, both in the long and the short term. E.g. perhaps in the short term the number of road deaths fluctuates due to the weather conditions, but in the long term the number of road deaths is increasing since the number of cars on the road is increasing.
Time Series Plot
A graph of a time series is produced by plotting our variables of interest against time. The horizontal axis represents time and the vertical axis represents the values of the data recorded. The graph is very useful in identifying characteristics of the data.
Consider the following time series.
Year 1998 1999 2000 2001 2002 2003 2004
Sales (£000s) 20 21 24 23 27 30 28
A graph of this data is as follows:

[Graph: Sales (£000s) on the vertical axis plotted against Year on the horizontal axis]
ENDSECTION STARTSECTION=content_3.htm= SECTION~
Components of a Time Series
In order to consider the behaviour of such time series it is useful to separate the values into a number of components.
Trend (T)
The trend is the underlying long term movement over time in the value of the time series data.
Example
In the following three time series there are three types of trend which are immediately apparent in the time series graphs.

        Series A                         Series B            Series C
Year    Output per labour hour (units)   Cost per unit (£)   Number of employees
1999    30                               1.00                100
2000    24                               1.08                103
2001    26                               1.20                96
2002    22                               1.15                102
2003    21                               1.18                103
2004    17                               1.25                98
Time series A (output)

[Graph: Series A, output per labour hour, plotted for 1999 to 2004]
There is a downward trend in the output per labour hour. Output per labour hour did not fall every year because it went up between 2000 and 2001, but the long term movement (trend) is clearly a downward one.
M9 Time Series Analysis Page 289
STX1110 Introduction to Quantitative Methods Mathematics and Statistics Group Middlesex University Business School
Time series B (cost)
[Figure: line graph of Series B, cost per unit (£), against year, 1999 to 2004.]
There is an upward trend in the cost per unit. Although costs went down in 2002 from a higher level in 2001, the basic movement over time is one of rising costs.
Time series C (number of employees)
[Figure: line graph of Series C, number of employees, against year, 1999 to 2004.]
There is no clear movement up or down, and the number of employees remained fairly constant around 100. The trend is therefore a static or level one.
Seasonal Variations (S)
These are short-term fluctuations in recorded values due to circumstances which affect results at different times. Seasonal is a term which may appear to refer to seasons of the year but its meaning in time series analysis is somewhat broader as the following examples show.
• Daily seasons: the data could be the number of patients recorded daily in a casualty department. There would be more patients on the weekends.
• Monthly seasons: there would be more cold / flu cases in the winter months than the summer months.
• Quarterly seasons: electricity bills arrive quarterly and the winter quarters tend to have higher bills.
Other seasonal examples
• Sales of ice cream will be higher in the summer than in the winter, and sales of overcoats will be higher in the autumn than in the spring.
• The telephone network may be heavily used at certain times of the day and much less at other times.
Cyclical Variations
Cyclical variations are medium term changes in results caused by circumstances which repeat in cycles. These variations could cause the data to be below (or above) the trend line for periods of longer than one year. In business, cyclical variations are commonly associated with economic cycles, successive booms and slumps in the economy. Cyclical variations are longer term than seasonal variations.
Residual Variation (R)
Residual variation is caused by other factors and cannot be explained by the trend or seasonal components: for example, measurement error. ENDSECTION STARTSECTION=content_4.htm= SECTION~
Finding the Trend (T)
There are three principal methods of finding a trend.
Inspection
The trend can be drawn by eye on a graph in such a way that it appears to lie evenly between the recorded data points.
Regression Analysis
This method makes the assumption that the trend line, whether up or down, is a straight line. Periods of time (such as years for the data in the trend examples A, B and C) are numbered commonly from 1 and the regression line of the data on these period numbers is found. That line is then taken to be the trend.
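As an illustration of the regression method, the following Python sketch numbers the years 1999 to 2004 as periods 1 to 6 and fits a least-squares line to each of the trend examples A, B and C. The function name fit_trend and the dictionary labels are our own choices for this sketch and are not part of the module notes.

```python
# Least-squares trend line T = a + b*x, with periods numbered from 1.
def fit_trend(values):
    n = len(values)
    xs = range(1, n + 1)               # period numbers 1, 2, ..., n
    x_mean = sum(xs) / n
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx                      # slope: average change per period
    a = y_mean - b * x_mean            # intercept
    return a, b

series = {
    "A (output per labour hour)": [30, 24, 26, 22, 21, 17],
    "B (cost per unit)": [1.00, 1.08, 1.20, 1.15, 1.18, 1.25],
    "C (number of employees)": [100, 103, 96, 102, 103, 98],
}

slopes = {}
for name, values in series.items():
    a, b = fit_trend(values)
    slopes[name] = b
    print(f"Series {name}: T = {a:.2f} + {b:.4f}x")
```

The sign of the slope b matches what the graphs suggest: negative for Series A, positive for Series B, and close to zero (relative to a level of about 100) for Series C.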
Moving Averages
This method attempts to remove seasonal variations by a process of averaging.
Example: Associate Petroleum Inc
The table below shows the volume of heating oil sold by Associate Petroleum Inc in the Eastern European sector over the period 1994-1997. The figures give the number of barrels of oil, in thousands, sold during each four-month period during these years.
Sales of heating oil (1000 barrels)
Year Jan – Apr May – Aug Sep – Dec
1994 35 15 42
1995 36 19 44
1996 41 22 47
1997 45 26 52
A graph of this data is as follows:
[Figure: line graph of Sales (y), in 1000 barrels, against time period, Jan – Apr '94 to Sep – Dec '97.]
There is a clear upward trend in the data. Furthermore, there is a seasonal effect, as the May – Aug values each year are below the trend and lower than the Jan – Apr and Sep – Dec values.
ENDSECTION STARTSECTION=content_5.htm= SECTION~
Regression Estimate of the Trend
Setting up the Data
To estimate the trend using regression analysis, the time index (e.g. 1994, Jan – Apr) needs to be replaced by a number. Remember that the variable of interest, sales, is the response y. This would give the following data set.
Year Period Time point (x) Sales (y)
1994 Jan – Apr 1 35
May – Aug 2 15
Sep – Dec 3 42
1995 Jan – Apr 4 36
May – Aug 5 19
Sep – Dec 6 44
1996 Jan – Apr 7 41
May – Aug 8 22
Sep – Dec 9 47
1997 Jan – Apr 10 45
May – Aug 11 26
Sep – Dec 12 52
Now go and do Exercise M9.2
ENDSECTION STARTSECTION=activity_2.htm= SECTION~ Exercise M9.2
Show that
$\sum x = 78$, $\sum x^2 = 650$, $\sum y = 424$, $\sum y^2 = 16586$, $\sum xy = 2940$.
Hence show that the regression line of sales (y) on time (x) is
y = 26.97 + 1.2867x.
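Once you have attempted the exercise by hand, a short Python sketch such as the following can be used to check your working. It uses the usual computational formulas for the slope and intercept of a least-squares line; the variable names are our own.

```python
sales = [35, 15, 42, 36, 19, 44, 41, 22, 47, 45, 26, 52]   # y, in 1000 barrels
x = list(range(1, len(sales) + 1))                          # time points 1..12
n = len(sales)

sum_x = sum(x)
sum_x2 = sum(v * v for v in x)
sum_y = sum(sales)
sum_y2 = sum(v * v for v in sales)
sum_xy = sum(u * v for u, v in zip(x, sales))

# Slope and intercept of the regression line of y on x.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

print(sum_x, sum_x2, sum_y, sum_y2, sum_xy)   # 78 650 424 16586 2940
print(f"y = {a:.2f} + {b:.4f}x")              # y = 26.97 + 1.2867x
```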
Now go and do Exercise M9.3
ENDSECTION STARTSECTION=activity_3.htm= SECTION~ Exercise M9.3
• Forecast the value of the trend for the May – Aug period of 1998.
Hint: what would the value of x be for the May – Aug period of 1998?
• What is the problem with using the trend forecast for the May – Aug period of 1998 as the forecast of sales for the May – Aug period of 1998?
The graph clearly shows that the data has a seasonal component. This means that the data varies about the trend line in a way that can be described by the time period, or season, that the data point relates to. Jan – Apr and Sep – Dec values are above the trend line and May – Aug values are below the trend line. Therefore, the forecast must also take into account this seasonal aspect of the data.
ENDSECTION STARTSECTION=content_6.htm= SECTION~
Moving Average Method
Moving averages for the values of a time series are arithmetic means of successive and overlapping values taken n at a time. The number of values, n, used to calculate the moving average is called the period of the moving average.
Example n = 3
Year Period Sales (y) Moving total Moving average
1994 Jan – Apr 35
May – Aug 15 92 30.7
Sep – Dec 42 93 31.0
1995 Jan – Apr 36 97 32.3
May – Aug 19 99 33.0
Sep – Dec 44 104 34.7
1996 Jan – Apr 41 107 35.7
May – Aug 22 110 36.7
Sep – Dec 47 114 38.0
1997 Jan – Apr 45 118 39.3
May – Aug 26 123 41.0
Sep – Dec 52
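The calculation of moving totals and moving averages can be sketched in Python as follows. The function name moving_average is our own choice; the sketch simply sums each run of n successive values and divides by n.

```python
def moving_average(values, n):
    """Return (moving totals, moving averages) over successive overlapping windows of n values."""
    totals = [sum(values[i:i + n]) for i in range(len(values) - n + 1)]
    averages = [t / n for t in totals]
    return totals, averages

sales = [35, 15, 42, 36, 19, 44, 41, 22, 47, 45, 26, 52]
totals, averages = moving_average(sales, 3)

# The first window (35, 15, 42) is centred on May - Aug 1994, the second on Sep - Dec 1994, and so on.
print(totals[:5])                            # [92, 93, 97, 99, 104]
print([round(a, 1) for a in averages[:5]])   # [30.7, 31.0, 32.3, 33.0, 34.7]
```

With 12 data values and n = 3 there are 12 – 3 + 1 = 10 moving averages, so the first and last time points have no trend value.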
Note that the period of the moving average, n, must coincide with the length of the natural cycle of the series.
Example
Quarterly data: n = 4 (4 period moving average)
Monthly data: n = 12 (12 period moving average)
Student shop data: n = 5 (5 period moving average)
The time series graph with this trend line drawn on it is
[Figure: line graph of sales, in 1000 barrels, against time period, Jan – Apr '94 to Sep – Dec '97, with the moving average trend line.]
Centring
If the period of the moving average is odd then the moving average is automatically centred. This means that each moving average falls at a time point that corresponds to an actual data value in the series. If n is even, the moving average is not centred automatically: each average falls midway between two time points. As the moving average needs to coincide with the times at which the actual time series values were recorded, an even period moving average needs to be centred by you.
The quarterly sales figures for a company are given below.
Year
Quarter 1998 1999 2000 2001
First 588 612 636 660
Second 495 515 535 555
Third 400 416 432 448
Fourth 707 735 763 791
First we calculate the moving average using n=4, and then we average again to obtain our centred trend values.
Year and Quarter   Sales y   Moving total (n=4)   Moving average   Trend
1998 First         588
     Second        495
                             2190                 547.5
     Third         400                                             (547.5 + 553.5) ÷ 2 = 550.5
                             2214                 553.5
     Fourth        707                                             (553.5 + 558.5) ÷ 2 = 556.0
                             2234                 558.5
1999 First         612                                             560.5
                             2250                 562.5
     Second        515                                             566.0
                             2278                 569.5
     Third         416                                             572.5
                             2302                 575.5
     Fourth        735                                             578.0
                             2322                 580.5
2000 First         636                                             582.5
                             2338                 584.5
     Second        535                                             588.0
                             2366                 591.5
     Third         432                                             594.5
                             2390                 597.5
     Fourth        763                                             600.0
                             2410                 602.5
2001 First         660                                             604.5
                             2426                 606.5
     Second        555                                             610.0
                             2454                 613.5
     Third         448
     Fourth        791
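The two averaging steps used for an even period can be sketched in Python as follows. The function name centred_moving_average is our own; the sketch first forms the n-period moving averages and then averages each adjacent pair so that the result lines up with an actual quarter.

```python
def centred_moving_average(values, n):
    """Centred trend for an even period n: average each pair of adjacent n-period moving averages."""
    averages = [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]
    return [(a + b) / 2 for a, b in zip(averages, averages[1:])]

quarterly_sales = [588, 495, 400, 707, 612, 515, 416, 735,
                   636, 535, 432, 763, 660, 555, 448, 791]
trend = centred_moving_average(quarterly_sales, 4)

# trend[0] belongs to 1998 Third, the first quarter with a full window on both sides.
print(trend[:3])    # [550.5, 556.0, 560.5]
print(len(trend))   # 12
```

The 16 quarterly values give 12 centred trend values, running from the third quarter of 1998 to the second quarter of 2001.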
The graph below displays our time series for sales, with the moving average trend also indicated.
[Figure: line graph of Sales y and the moving average Trend against Time, 1998 First to 2001 Fourth.]
ENDSECTION STARTSECTION=content_7.htm= SECTION~
Finding the Seasonal Variation
Once a trend has been established, by whatever method, we can find the seasonal variations. When isolating seasonal variations we need to establish if we are using an additive or multiplicative time series model.
Additive Model
This is used when the seasonal elements are relatively constant over the complete time period being analysed. So in the following graph, the peaks of the time series graph are all the same size, a1 = a2 = a3. In such a case the time series value can be expressed as the sum of a trend and a seasonal component. The standard expression describing this type of model is
y = T + S + R.
Example
[Figure: Additive Model. Quarterly sales against time, 1998 Q1 to 2001 Q4, with the seasonal peaks marked a1, a2 and a3.]
Multiplicative Model
This type of model is used when the seasonal elements change in proportion to the trend values over the complete time period being analysed. So the peaks in the following graph for the multiplicative model get bigger as the trend increases, a1 < a2 < a3. In this case the time series can be expressed as the product of a trend and a seasonal component:
y = T × S × R.
Example
[Figure: Multiplicative Model. Quarterly sales against time, 1998 Q1 to 2001 Q4, with the seasonal peaks marked a1, a2 and a3.]
ENDSECTION STARTSECTION=content_8.htm= SECTION~
The Additive Model, y = T + S + R.
We will illustrate finding the seasonal variations by referring to the Associate Petroleum Inc example from earlier. The data set used to fit the regression trend is
Year Period Time point (x) Sales (y)
1994 Jan – Apr 1 35
May – Aug 2 15
Sep – Dec 3 42
1995 Jan – Apr 4 36
May – Aug 5 19
Sep – Dec 6 44
1996 Jan – Apr 7 41
May – Aug 8 22
Sep – Dec 9 47
1997 Jan – Apr 10 45
May – Aug 11 26
Sep – Dec 12 52
For each of the time points we can estimate the trend value using the fitted regression line,
T = 26.97 + 1.2867x
The value of x for the first time point is 1. So the estimate of the trend for the first time point is given by substituting x = 1 in the regression equation. This gives,
Trend estimate T(1) = 26.97 + (1.2867×1) = 28.2567
The value of x for the second time point is 2. So the estimate of the trend for the second time point is given by substituting x = 2 in the regression equation. This gives,
Trend estimate T(2) = 26.97 + (1.2867×2) = 29.5434
This can be done for each point giving the following table.
Year Period Time point (x) Sales (y) Trend (T), T = 26.97 + 1.2867x
1994 Jan-Apr 1 35 28.2567
May-Aug 2 15 29.5434
Sep-Dec 3 42 30.8301
1995 Jan-Apr 4 36 32.1168
May-Aug 5 19 33.4035
Sep-Dec 6 44 34.6902
1996 Jan-Apr 7 41 35.9769
May-Aug 8 22 37.2636
Sep-Dec 9 47 38.5503
1997 Jan-Apr 10 45 39.837
May-Aug 11 26 41.1237
Sep-Dec 12 52 42.4104
The additive model for time series is y = T + S + R. We can therefore write y – T = S + R. In other words, if we deduct the trend values from the time series values, we will be left with the seasonal and residual components of the time series. If we assume that the residual component is very small and hence negligible, the seasonal component can be found as S = y – T, the de-trended series.
Year Period Sales (y) Trend, T = 26.97 + 1.2867x S = y – T
1994 Jan – Apr 35 28.2567 35 – 28.2567 = 6.7433
May – Aug 15 29.5434 15 – 29.5434 = –14.5434
Sep – Dec 42 30.8301 42 – 30.8301 = 11.1699
1995 Jan – Apr 36 32.1168 3.8832
May – Aug 19 33.4035 –14.4035
Sep – Dec 44 34.6902 9.3098
1996 Jan – Apr 41 35.9769 5.0231
May – Aug 22 37.2636 –15.2636
Sep – Dec 47 38.5503 8.4497
1997 Jan – Apr 45 39.837 5.163
May – Aug 26 41.1237 –15.1237
Sep – Dec 52 42.4104 9.5896
You will notice that the difference between the actual time series value and the trend value for any one time period is not the same from year to year. That is, the May – Aug seasonal effects each year are not the same. This is because y – T contains not only seasonal variations but random variations as well. So to evaluate a seasonal estimate for each time period, we average the seasonal estimates for each year corresponding to that time period. The May – Aug seasonal effect is then
$$S_{\text{May-Aug}} = \frac{(-14.5434) + (-14.4035) + (-15.2636) + (-15.1237)}{4} = \frac{-59.3342}{4} = -14.8336.$$
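The de-trending and averaging steps for the additive model can be sketched in Python as follows. The function names trend and seasonal_effect are our own, and the sketch assumes, as in the text, that the residual component is negligible. Only the May – Aug effect is evaluated here.

```python
sales = [35, 15, 42, 36, 19, 44, 41, 22, 47, 45, 26, 52]   # y, in 1000 barrels

def trend(x):
    """Regression trend T = 26.97 + 1.2867x from the fitted line."""
    return 26.97 + 1.2867 * x

# De-trended series S = y - T for time points x = 1..12.
detrended = [y - trend(x) for x, y in enumerate(sales, start=1)]

def seasonal_effect(position, periods_per_year=3):
    """Average the de-trended values for one season; position 0 = Jan-Apr, 1 = May-Aug, 2 = Sep-Dec."""
    values = detrended[position::periods_per_year]
    return sum(values) / len(values)

may_aug = seasonal_effect(1)
print(f"{may_aug:.2f}")   # -14.83
```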
Now go and do Exercise M9.4
ENDSECTION STARTSECTION=activity_4.htm= SECTION~
Exercise M9.4
Find the seasonal effects for Jan – Apr and Sep – Dec for the above example.
ENDSECTION STARTSECTION=content_9.htm= SECTION~
Forecasting with the Additive Model
Forecasting is an essential but difficult task in business. There are several mathematical techniques for producing forecasts. They will not necessarily provide reliable forecasts but they can help in making future plans. The technique we will use here will consist of extrapolating a trend and then adjusting this trend for seasonal variations.
Example
Returning again to the Associate Petroleum Inc example, how would we produce a forecast for the sales of oil in May – Aug 1998?
Step 1: Estimate the trend, T.
To estimate the trend we would need to know the value of x at this future point in time to substitute into the regression equation T = 26.97 + 1.2867x. If Sep – Dec 1997 has a value of 12 for x, then May – Aug 1998 has a value of 12 + 2 = 14 as May – Aug 1998 is two time periods into the future from the last value of the given time series. This means the trend forecast for May – Aug 1998 is Trend estimate T = 26.97 + (1.2867×14) = 44.9838.
Step 2: Evaluate the seasonal component, S.
Assuming the existing pattern in the data continues, we have just calculated the seasonal component for May – Aug to be –14.8336.
Step 3: Combine the trend and seasonal components.
In this case we are using the additive model
y = T + S + R
We have a forecast for the trend, T, and the seasonal component, S. We cannot isolate R but hopefully this is small. Hence the forecast is constructed by adding the trend and seasonal components together.
Forecast for the sales of oil in May – Aug 1998
y = 44.9838 + (–14.8336) = 30.1502 (thousand barrels).
ENDSECTION STARTSECTION=content_10.htm= SECTION~
The Multiplicative Model, y = T×S×R
The multiplicative model for time series is y = T × S × R. We can therefore write y ÷ T = S × R. In other words, if we divide the time series values by the trend values, we will be left with the seasonal and residual components of the time series. If we assume that the residual component has a negligible effect (that is, R is close to 1), the seasonal component can be found as S = y ÷ T, the de-trended series. The table below does this for the quarterly sales example, using the centred moving average trend found earlier.
Year and Quarter Sales y Trend T y ÷ T
1998 First 588
Second 495
Third 400 550.5 400 ÷ 550.5 = 0.7266
Fourth 707 556.0 1.2716
1999 First 612 560.5 1.0919
Second 515 566.0 0.9099
Third 416 572.5 0.7266
Fourth 735 578.0 1.2716
2000 First 636 582.5 1.0918
Second 535 588.0 0.9099
Third 432 594.5 0.7267
Fourth 763 600.0 1.2717
2001 First 660 604.5 1.0918
Second 555 610.0 0.9098
Third 448
Fourth 791
You will notice that the ratio of the actual time series value to the trend value for any one quarter is not the same from year to year. That is, the seasonal effects for each quarter are not the same each year. This is because y ÷ T contains not only seasonal variations but random variations as well. So to come up with a seasonal estimate for each time period, we average the seasonal estimates for each year corresponding to that time period.
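The averaging just described can be sketched in Python as follows. The layout of the lists, with None marking quarters that have no trend value, is our own choice for this sketch; the sales and trend figures are those in the table.

```python
sales = [588, 495, 400, 707, 612, 515, 416, 735,
         636, 535, 432, 763, 660, 555, 448, 791]
# Centred moving average trend; None where no trend value is available.
trend = [None, None, 550.5, 556.0, 560.5, 566.0, 572.5, 578.0,
         582.5, 588.0, 594.5, 600.0, 604.5, 610.0, None, None]

# De-trended series y / T for the quarters that have a trend value.
ratios = [y / t if t is not None else None for y, t in zip(sales, trend)]

seasonal = {}
for quarter in range(4):                       # 0 = First, ..., 3 = Fourth
    values = [r for r in ratios[quarter::4] if r is not None]
    seasonal[quarter + 1] = sum(values) / len(values)

for q, s in seasonal.items():
    print(f"S{q} = {s:.3f}")
```

Each seasonal effect is the mean of three ratios, because each quarter has a trend value in exactly three of the four years.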
Seasonal Effect for the First Quarter
Average all y ÷ T values corresponding to first quarter
$$S_1 = \frac{1.0919 + 1.0918 + 1.0918}{3} = 1.0918.$$
Seasonal Effect for the Second Quarter
Average all y ÷ T values corresponding to second quarter
$$S_2 = \frac{0.9099 + 0.9099 + 0.9098}{3} = 0.9099.$$
Seasonal Effect for the Third Quarter
$$S_3 = \frac{0.7266 + 0.7266 + 0.7267}{3} = 0.7266.$$
Seasonal Effect for the Fourth Quarter
$$S_4 = \frac{1.2716 + 1.2716 + 1.2717}{3} = 1.2716.$$
Note: The seasonal effects should add up to 4, the period of the moving average. Here the total is 3.9999. This is close enough not to worry about. You can adjust each S slightly to make the sum 4 if needed.
ENDSECTION STARTSECTION=content_11.htm= SECTION~
Forecasting with the Multiplicative Model
Returning again to the quarterly sales example, how would we produce a forecast for the sales for the second quarter of 2002?
Step 1: Estimate the trend, T.
To estimate the trend we use the graph. Extrapolating the trend line, the trend estimate for the second quarter of 2002 is 628.
[Figure: line graph of Sales y and Trend against Time, 1998 First to 2001 Fourth, with the trend line extended beyond 2001 Fourth.]
Step 2: Evaluate the seasonal component, S.
Assuming the existing pattern in the data continues, we have calculated the seasonal component for the second quarter to be S2 = 0.9099.
Step 3: Combine the trend and seasonal components.
In this case we are using the multiplicative model
y = T × S × R
We have a forecast for the trend, T, and the seasonal component, S. We cannot isolate R but hopefully its effect is small. Hence the forecast is constructed by multiplying the trend and seasonal component together.
Forecast for the sales for the second quarter of 2002
y = 628 × 0.9099 ≈ 571.4.
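Step 3 for the multiplicative model can be sketched in Python as follows. The function name forecast_multiplicative is our own; the trend estimate of 628 is the value read from the graph in Step 1 and 0.9099 is the second quarter seasonal effect from Step 2.

```python
def forecast_multiplicative(trend_estimate, seasonal_index):
    """Forecast y = T x S, treating the residual component R as negligible."""
    return trend_estimate * seasonal_index

trend_2002_q2 = 628     # read from the extrapolated trend line on the graph
s2 = 0.9099             # seasonal effect for the second quarter
forecast = forecast_multiplicative(trend_2002_q2, s2)
print(round(forecast, 1))   # 571.4
```

Because the trend estimate is read from a graph, the forecast is only as precise as that reading, so quoting it to one decimal place is ample.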
ENDSECTION STARTSECTION=activity_5.htm= SECTION~
Seminar Questions
Seminar Question M9.1
The table below shows the total export orders for a company during 1993–6. The figures are given in £ millions.
Total exports (£ millions)
Jan – Apr May – Aug Sep – Dec
1993 4.5 5.6 4.9
1994 5.1 5.9 5.2
1995 5.4 6.8 5.8
1996 6.0 6.8 6.1
i) Calculate a regression trend estimate for this time series.
ii) Estimate the seasonal variations and thus forecast the value of exports for the three time periods in 1997 using the multiplicative model.
ENDSECTION STARTSECTION=activity_6.htm= SECTION~
Seminar Question M9.2
The figures below give the total newspaper sales of a company based in Canada in each quarter during the years 1994–7. The figures show the average daily circulation over each quarter in 100,000s.
Daily newspaper sales
1994 1995 1996 1997
1st quarter 2.2 2.6 2.9 3.2
2nd quarter 2.9 3.2 3.4 3.6
3rd quarter 3.3 3.6 3.9 4.2
4th quarter 2.4 2.7 2.8 3.1
i) Plot these values onto a graph.
ii) Calculate the moving average trend (n = 4) and the seasonal variations, then estimate the circulation figures in each quarter during 1998 by using an additive model.
iii) Would an additive or multiplicative model be appropriate in this example? Forecast the same range of figures using the multiplicative model and compare your results from the two methods.
ENDSECTION STARTSECTION=think_1.htm= SECTION~
How much do you know?
If you have understood the content of this unit you should have knowledge of, and be able to answer questions relating to, all of the following points:
• what components make up a time series;
• the characteristics of additive and multiplicative models;
• under what conditions you need to use regression analysis or moving averages to evaluate a trend;
• what steps are needed to forecast values.
Extra Activities
Log on to the STX1110 Oasis page and attempt the quizzes in the extra section.
A guide on how to find the STX1110 Oasis quizzes is on page 16 of this book.
Do you want to know more?
For further examples and exercises, refer to Essential Quantitative Methods for Business, Management and Finance, third edition, by Les Oakshott, or any of the books on the STX1110 reading list.
ENDSECTION ENDCHAPTER