
Chapter 10: Summary statistics

Syllabus reference
Topic:
• Applied statistical analysis

In this chapter
10A Measures of central tendency
10B Range and interquartile range
10C The standard deviation
10D Boxplots
10E Back-to-back stem plots
10F Parallel boxplots

MQ Maths B Yr 11 Ch 10 Page 413 Friday, October 26, 2001 10:33 AM

Maths Quest Maths B Year 11 for Queensland

Introduction

Yvonne works in the quality control department of a soft drink bottling company. Soft drink is bottled by two machines, each of which is set to pour one litre of soft drink into every bottle. It is part of Yvonne’s job to take 20 bottles of soft drink from each machine and measure the contents in millilitres. The results she obtained from one such check are shown below.

Machine A
1009  992  990 1018 1017  985  984 1008 1020 1005
 992  983 1020  988  996  984  989 1014  995 1004

Machine B
1002  991  990  980 1004 1018 1008  997  992  999
1010 1004 1001 1003 1009 1004 1006 1001  997  994

Yvonne must use these results to assess if the machines are sufficiently accurate in dispensing soft drink into the bottles.

One of the main tasks of a statistician is to summarise large volumes of data. It is useful to find one score that is typical of a whole set of data, or a few figures which can describe its distribution.

Finding the mean, median and mode are three different methods of arriving at a score that is typical or central to the data set. Mean, median and mode are often called measures of central tendency.

Everybody is different. How do we measure what is typical?



Measures of central tendency

The mean

The mean of a set of data is what is referred to in everyday language as the average.

For the set of data 4 7 9 12 18:

$$\text{mean} = \frac{4 + 7 + 9 + 12 + 18}{5} = 10$$

The symbol we use to represent the mean is x̄, that is, a lower-case x with a bar on top. So, in this case, x̄ = 10.

The formal definition of the mean is:

$$\bar{x} = \frac{\Sigma x}{n}$$

where Σx represents the sum of all of the observations in the data set and n represents the number of observations in the data set.

Note that the symbol, Σ, is the Greek letter, sigma, which represents ‘the sum of’.

The mean is also referred to as a summary statistic and is a measure of the centre of a distribution. The mean is the point about which the distribution ‘balances’.

Consider the masses of 7 potatoes, given in grams, below.

100 120 130 145 160 170 190

The mean is 145 g. The observations 130 and 160 ‘balance’ each other since they are each 15 g from the mean. Similarly, the observations 120 and 170 ‘balance’ each other since they are each 25 g from the mean, as do the observations 100 and 190. Note that the median is also 145 g. That is, for this set of data the mean and the median give the same value for the centre. This is because the distribution is symmetric.

Now consider two cases in which the distribution of data is not symmetric.

Case 1
Consider the masses of a different set of 7 potatoes, given in grams below.

100 105 110 115 120 160 200

The median of this distribution is 115 g and the mean is 130 g. There are 5 observations that are less than the mean and only 2 that are more. In other words, the mean does not give us a good indication of the centre of the distribution. However, there is still a ‘balance’ between observations below the mean and those above, in terms of the spread of all the observations from the mean. Therefore, the mean is still useful to give a measure of the central tendency of the distribution but in cases where the distribution is skewed, the median gives a better indication of the centre. For a positively skewed distribution, as in the case above, the mean will be greater than the median. For a negatively skewed distribution the mean will be less than the median.
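The comparison between the symmetric and the skewed potato data can be checked numerically. The short Python sketch below is an illustration only, not part of the textbook's calculator method; it uses the standard library `statistics` module, and the names `symmetric`, `skewed` and `centre` are made up for this example.

```python
from statistics import mean, median

# Masses in grams, copied from the two potato examples in this section.
symmetric = [100, 120, 130, 145, 160, 170, 190]
skewed = [100, 105, 110, 115, 120, 160, 200]


def centre(data):
    """Return (mean, median) for a list of observations."""
    return mean(data), median(data)


sym_mean, sym_median = centre(symmetric)
skew_mean, skew_median = centre(skewed)

print(sym_mean, sym_median)    # symmetric data: the two measures agree
print(skew_mean, skew_median)  # positively skewed data: mean exceeds median
```

For the skewed set the mean is pulled towards the two heavy potatoes, while the median stays with the bulk of the data.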



Case 2
Consider the data below, showing the weekly income (to the nearest $10) of 10 families living in a suburban street.

$300 $670 $680 $690 $700 $710 $710 $720 $730 $750

In this case, x̄ = 6660/10 = $666, and the median is $705.

One of the values in this set, $300, is clearly an outlier. As a result, the value of the mean is below the weekly income of the other 9 households. In such a case the mean is not very useful in establishing the centre; however, the ‘balance’ still remains for this negatively skewed distribution.

The mean is calculated by using the values of the observations and because of this it becomes a less reliable measure of the centre of the distribution when the distribution is skewed or contains an outlier. Because the median is based on the order of the observations rather than their value, it is a better measure of the centre of such distributions.

Calculating the mean using a graphics calculator

Worked example 1
Calculate the mean of the set of data below.
10, 12, 15, 16, 18, 19, 22, 25, 27, 29

THINK
1. Enter the data in L1. (Press STAT and select 1:Edit.)
2. Calculate the mean.
   (a) Press STAT.
   (b) Highlight CALC in the top line.
   (c) Highlight 1:1–Var Stats and press ENTER.
   (d) Press 2nd [L1] and press ENTER.
   (e) Several values are given. The top entry, x̄ = 19.3, gives us the mean.

WRITE/DISPLAY
x̄ = 19.3

When data are presented in a frequency table with class intervals and we don’t know what the raw data are, we employ another method to find the mean of these grouped data. This other method is shown in the example that follows and uses the midpoints of the class intervals to represent the raw data.

Remember that the Greek letter sigma, Σ, represents ‘the sum of’. So, Σf means the sum of the frequencies and is the total of all the numbers in the frequency column.

To find the mean for grouped data,

$$\bar{x} = \frac{\Sigma(f \times m)}{\Sigma f}$$

where f represents the frequency of the data and m represents the midpoint of the class interval of the grouped data.

Worked example 2
The ages of a group of 30 people attending a superannuation seminar are recorded in the frequency table below.

Age (class intervals)    Frequency, f
20–29                     1
30–39                     6
40–49                    13
50–59                     6
60–69                     3
70–79                     1

Calculate the mean age of those attending the seminar.

THINK
1. Since we don’t have individual raw ages, but rather a class interval, we need to decide on one particular age to represent each interval. We use the midpoint, m, of the class interval. Add an extra column to the table to display these. The midpoint of the first interval is (20 + 29)/2 = 24.5, the midpoint of the second interval is 34.5 and so on.
2. Multiply each of the midpoints by the frequency and display these values in another column headed f × m. For the first interval we have 24.5 × 1 = 24.5. For the second interval we have 34.5 × 6 = 207 and so on.
3. Sum the products of the midpoints and the frequencies in the f × m column.
   24.5 + 207 + 578.5 + 327 + 193.5 + 74.5 = 1405
4. Divide this sum by the total number of people attending the seminar.

WRITE

Age (class intervals)    Frequency, f    Midpoint of class interval, m    f × m
20–29                     1              24.5                              24.5
30–39                     6              34.5                             207
40–49                    13              44.5                             578.5
50–59                     6              54.5                             327
60–69                     3              64.5                             193.5
70–79                     1              74.5                              74.5
                         Σf = 30                                          Σ(f × m) = 1405

$$\bar{x} = \frac{1405}{30} \approx 46.8 \text{ (correct to 1 decimal place)}$$
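The grouped-data calculation in worked example 2 can also be written as a few lines of Python. This is a minimal sketch of the midpoint method rather than a replacement for the table; the function name `grouped_mean` and the way the intervals are stored as (lower, upper) pairs are assumptions made for the illustration.

```python
# Class intervals and frequencies from worked example 2.
intervals = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
frequencies = [1, 6, 13, 6, 3, 1]


def grouped_mean(intervals, frequencies):
    """Estimate the mean of grouped data as sum(f * m) / sum(f)."""
    # The midpoint m of each class interval stands in for the raw data.
    midpoints = [(low + high) / 2 for low, high in intervals]
    total_fm = sum(f * m for f, m in zip(frequencies, midpoints))
    return total_fm / sum(frequencies)


estimate = grouped_mean(intervals, frequencies)
print(round(estimate, 1))
```

Because the midpoints only approximate the raw ages, the result is an estimate of the mean, not its exact value.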



Solution to worked example 2 using a graphics calculator

THINK
1. Since we don’t have individual raw ages, but rather a class interval, we need to decide on one particular age to represent each interval. We use the midpoint, m, of the class interval. The midpoint of the first interval is (20 + 29)/2 = 24.5. The midpoint of the second interval is 34.5 and so on. Enter these midpoints as list 1.
2. Enter the frequency of each of the class intervals as list 2. For example, L1(3) = 44.5 and L2(3) = 13.
3. Calculate the mean.
   (a) Press STAT.
   (b) Highlight CALC in the top line.
   (c) Highlight 1:1–Var Stats and press ENTER.
   (d) Enter L1 and L2 by pressing 2nd [L1], the comma key and 2nd [L2]; then pressing ENTER.
   A number of values are given. The top entry, x̄ = 46.8, gives us the mean.

DISPLAY
x̄ = 46.8

The median

The median is the midpoint of a set of data. Half the data are less than or equal to the median.

Consider the set of data: 2 5 6 8 11 12 15. These data are in ordered form (that is, from lowest to highest). There are 7 observations. The median in this case is the middle or fourth score; that is, 8.

Consider the set of data: 1 3 5 6 7 8 8 9 10 12. These data are in ordered form also; however, in this case there is an even number of scores; that is, there are 10 scores. The median in this case lies halfway between the 5th score (7) and the 6th score (8). So the median is 7.5. (Alternatively, median = (7 + 8)/2 = 7.5.)

When there are n records in a set of ordered data, the median can be located at the

$$\frac{n + 1}{2}\text{th position.}$$

Checking this against our previous example, we have n = 10; that is, there were 10 observations in the set. The median was located at the (10 + 1)/2 = 5.5th position; that is, halfway between the 5th and the 6th terms.

A stem plot provides a quick way of locating a median since the data in a stem plot are already ordered.

Worked example 3
Consider the stem plot below which contains 22 observations. What is the median?

Key: 3|4 = 34

Stem | Leaf
2    | 3 3
2    | 5 7 9
3    | 1 3 3 4 4
3    | 5 8 9 9
4    | 0 2 2
4    | 6 8 8 8 9

THINK
1. Find the median position, where n = 22.
2. Find the 11th and 12th terms.
3. The median is halfway between the 11th and 12th terms.

WRITE
Median = (n + 1)/2 th position
       = (22 + 1)/2 th position
       = 11.5th position
11th term = 35
12th term = 38
Median = 36.5

Mode

There are many examples where neither the mean nor the median is the appropriate measure of the typical score in a data set.

Consider the case of a clothing store. It needs to re-order a supply of dresses. To know what sizes to order it looks at past sales of this particular style and gathers the following data:

8 12 14 12 16 10 12 14 16 18
14 12 14 12 12 8 18 16 12 14

For this data set the mean dress size is 13.2. Dresses are not sold in size 13.2, so this has very little meaning. The median is 13, which also has little meaning as dresses are sold only in even-numbered sizes.

What is most important to the clothing store is the dress size that sells the most. In this case size 12 occurs most frequently. The score that has the highest frequency is called the mode.

When two scores occur most often an equal number of times, both scores are given as the mode. In this situation the scores are bimodal. If all scores occur an equal number of times, then the distribution has no mode.

To find the mode from a frequency distribution table, we simply give the score that has the highest frequency.
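The median position rule and the idea of the mode can be checked with a short Python sketch. It is offered as an illustration only: `median_position` is a hypothetical helper name, the stem plot of worked example 3 has been written out by hand as a list, and `statistics.multimode` (Python 3.8 or later) is used because it returns every score that shares the highest frequency.

```python
from statistics import median, multimode


def median_position(n):
    """Position of the median among n ordered observations: (n + 1) / 2."""
    return (n + 1) / 2


# Stem plot from worked example 3, written out as ordered values.
stem_plot = [23, 23, 25, 27, 29, 31, 33, 33, 34, 34, 35,
             38, 39, 39, 40, 42, 42, 46, 48, 48, 48, 49]

# Dress sizes from the clothing store example.
dress_sizes = [8, 12, 14, 12, 16, 10, 12, 14, 16, 18,
               14, 12, 14, 12, 12, 8, 18, 16, 12, 14]

position = median_position(len(stem_plot))  # halfway between two terms
middle = median(stem_plot)                  # average of 11th and 12th terms
modes = multimode(dress_sizes)              # list of most frequent sizes

print(position, middle, modes)
```

A bimodal data set would give a list of two values in `modes`. Note that when every score occurs equally often, `multimode` returns all of the scores, whereas this chapter describes such a distribution as having no mode.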



Worked example 4
For the frequency distribution below, state the mode.

Score    Frequency
14        3
15        6
16       11
17       14
18       10
19        7

THINK
The highest frequency is 14, which belongs to the score 17, and so 17 is the mode.

WRITE
Mode = 17

When a table is presented using grouped data, we do not have a single mode. In these cases, the class with the highest frequency is called the modal class.

remember
1. The mean is given by $\bar{x} = \frac{\Sigma x}{n}$ where Σx represents the sum of all the observations in the data set and n represents the number of observations in the data set.
2. The mean is calculated by using the values of the observations and because of this it becomes a less reliable measure of the centre of the distribution when the distribution is skewed or contains an outlier.
3. To find the mean for grouped data, $\bar{x} = \frac{\Sigma(f \times m)}{\Sigma f}$ where f represents the frequency of the data and m represents the midpoint of the class interval of the grouped data.
4. The median is the midpoint of a set of data. Half the data are less than or equal to the median.
5. When there are n observations in a set of ordered data, the median can be located at the $\frac{n + 1}{2}$th position.
6. The mode is the score with the highest frequency.



10A Measures of central tendency

A graphics calculator may be used for this exercise.

1 Find the mean of each of the following sets of data.
a 5 6 8 8 9
b 3 4 4 5 5 6 7 7 7 8 8 9 9 10 10 12
c 4.3 4.5 4.7 4.9 5.1 5.3 5.5 5.6
d 11 13 15 15 16 18 20 21 22
e 0.4 0.5 0.7 0.8 0.8 0.9 1.0 1.1 1.2 1.0 1.3

2 Calculate the mean of each of the following and explain whether or not it gives us a good picture of the centre of the data.
a 0.7 0.8 0.85 0.9 0.92 2.3
b 14 16 16 17 17 17 19 20
c 23 24 28 29 33 34 37 39
d 2 15 17 18 18 19 20

3 The number of people attending sculpture classes at the local TAFE college for each week during the first semester is given below.

15 12 15 11 14 8 14 15 11 10
7 11 12 14 15 14 15 9 10 11

What is the mean number of people attending each week? (Express your answer to the nearest whole number.)

4 multiple choice
The ages of a group of junior pilots joining an international airline are indicated on the stem plot below.

Key: 2|1 = 21 yrs

Stem | Leaf
2    | 1
2    | 2
2    | 4 5
2    | 6 6 7
2    | 8 8 8 9
3    | 0 1 1
3    | 2 3
3    | 4 4
3    | 6
3    | 8

The mean age of this group of pilots is:
A 20
B 28
C 29
D 29.15
E 29.5

5 multiple choice
The number of people present each week at a 15-week horticultural course is given by the stem plot below.

Key: 2|4 = 24 people

Stem | Leaf
0    | 4
0    | 7
1    | 2 4
1    | 5 5 6 7 8
2    | 1 2 4
2    | 7 7 7

The mean number of people attending each week was closest to:
A 17.7
B 18
C 19.5
D 20
E 21.2

6 For each of the following, write down whether the mean or the median would provide a better indication of the centre of the distribution.
a A positively skewed distribution
b A symmetric distribution
c A distribution with an outlier
d A negatively skewed distribution

7 Find the mean of each set of data given below.

a Class interval    Frequency, f
  0–9                1
  10–19              3
  20–29              6
  30–39             17
  40–49             12
  50–59              5

b Class interval    Frequency, f
  0–4                2
  5–9                5
  10–14              7
  15–19             13
  25–29              8
  30–34              6

c Class interval    Frequency, f
  0–49               2
  50–99              7
  100–149            8
  150–199           14
  200–249           12
  250–299            5

d Class interval    Frequency, f
  1–6               14
  7–12              19
  13–18             23
  19–24             22
  25–30             20
  31–36             14

8 The ages of people attending a beginner’s course in karate are indicated in the frequency table below.

Age      Frequency, f
10–14    5
15–19    5
20–24    7
25–29    4
30–34    3
35–39    2
40–44    2
45–49    1

What is the mean age of those attending the course? (Express your answer correct to the nearest whole number.)

9 Write down the median of the sets of data shown in the following stem plots. The key for each stem plot is 3 | 4 = 34.

a Stem | Leaf
  0    | 7
  1    | 2 3
  2    | 2 4 5 7 9
  3    | 0 2 3 6 8 8
  4    | 4 7 8 9 9
  5    | 2 7 8
  6    | 1 3

b Stem | Leaf
  0    | 0 0 1 1
  0    | 2 2 3 3
  0    | 4 4 5 5 5 5 5 5 5 5
  0    | 6 6 6 6 7
  0    | 8 8 8 9
  1    | 0 0 1
  1    | 3 3
  1    | 5 5
  1    | 7
  1    |

c Stem | Leaf
  0    | 1
  0    | 2
  0    | 4 4 5
  0    | 6 6 6 7
  0    | 8 8 8 8 9 9
  1    | 0 0 0 1 1 1 1
  1    | 2 2 2 3 3 3
  1    | 4 4 5 5
  1    | 6 7 7
  1    | 8 9

d Stem | Leaf
  3    | 1
  3    |
  3    |
  3    | 6
  3    | 8 9
  4    | 0 0 1 1 1
  4    | 2 2 3 3 3 3
  4    | 4 5 5 5
  4    | 6 7
  4    | 9

e Stem | Leaf
  60   | 2 5 8
  61   | 1 3 3 6 7 8 9
  62   | 0 1 2 4 6 7 8 8 9
  63   | 2 2 4 5 7 8
  64   | 3 6 7
  65   | 4 5 8
  66   | 3 5
  67   | 4

10 For each of the following sets of data, write down the median.
a 2 4 6 7 9
b 12 15 17 19 21
c 3 4 5 6 7 8 9
d 3 5 7 8 12 13 15 16
e 12 13 15 16 18 19 21 23 24 26
f 3 8 4 2 1 6 5
g 16 21 14 28 23 15 11 19 25
h 7 4 3 4 9 5 10 4 2 11
i 29 23 22 33 26 18 37 22 16

11 Find the mode for each of the following. (Hint: Some are bimodal and others have no mode.)
a 16, 17, 19, 15, 17, 19, 14, 16, 17
b 147, 151, 148, 150, 148, 152, 151
c 2, 3, 1, 9, 7, 6, 8
d 68, 72, 73, 72, 72, 71, 72, 68, 71, 68
e 2.6, 2.5, 2.9, 2.6, 2.4, 2.4, 2.3, 2.5, 2.6




12 Use the tables below to state the mode of the distribution.

a Score    Frequency
  1        2
  2        4
  3        5
  4        6
  5        3

b Score    Frequency
  5        1
  6        3
  7        5
  8        8
  9        5
  10       3

c Score    Frequency
  38       2
  39       4
  40       1
  41       5
  42       6
  43       3
  44       6
  45       2

13 For each of the following grouped distributions, state the modal class.

a Class    Frequency
  1–4       6
  5–8      12
  9–12     30
  13–16    23
  17–20    46
  21–24    27
  25–28     9

b Class    Frequency
  1–7       3
  8–14      8
  15–21     9
  22–28    25
  29–35    12
  36–42    11
  43–49     2

14 The following data give the age of 25 patients admitted to the emergency ward of a hospital.

18 16  6 75 24
23 82 74 25 21
43 19 84 72 31
74 24 20 63 79
80 20 23 17 19

a Represent the data in a frequency distribution table. (Use classes 0–14, 15–29, 30–44, etc.)
b Find the mean age of patients admitted.
c Find the median class of age of patients admitted.
d Find the modal class for age of patients admitted.
e Do any of your statistics (mean, median or mode) give a clear representation of the typical age of an emergency ward patient?
f Give some reasons that could explain the pattern of the distribution of data in this question.


15 The batting scores for two cricket players over six innings are as follows:

Player A 31, 34, 42, 28, 30, 41
Player B 0, 0, 1, 0, 250, 0

a Find the mean score for each player.
b Which player appears to be better if the mean result is used?
c Find the median score for each player.
d Which player appears to be better when the decision is based on the median result?
e Which player do you think would be more useful to have in a cricket team and why? How can the mean result sometimes lead to a misleading conclusion?

Mean and median amount of soft drink

Remember Yvonne’s quality control check? We are going to calculate the mean and median amount of soft drink dispensed by each machine into each bottle.

1 Use your graphics calculator to store the data for Machine A as a list. Name the list MA. The data for Machine A were:

1009  992  990 1018 1017  985  984 1008 1020 1005
 992  983 1020  988  996  984  989 1014  995 1004

2 Use your graphics calculator to store the data for Machine B as a list. Name the list MB. The data for Machine B were:

1002  991  990  980 1004 1018 1008  997  992  999
1010 1004 1001 1003 1009 1004 1006 1001  997  994

3 Use the statistics function of the graphics calculator to find the mean and median amount of soft drink dispensed into bottles by each machine.

4 At this stage can you say which machine most accurately dispenses soft drink into bottles?

Note:
1. If you do not have a graphics calculator, you can still calculate the mean and median of each data set.
2. If you do have a graphics calculator, it would be easier to enter the two sets of data under L1 and L2 (press STAT and select 1:Edit); however, if the data are stored as named lists MA and MB these can be retained for use later in the chapter.



Range and interquartile range

We have now looked at measures of central tendency, but although two sets of scores may have the same mean, median and mode, they may still be very different data sets. Consider the results obtained by two groups of 10 students on the same mathematics test.

Group A: 45, 46, 47, 48, 50, 50, 52, 53, 54, 55
Group B: 10, 20, 30, 40, 50, 50, 60, 70, 80, 90

In both groups the mean, median and mode mark is 50, but we can see that they are very different data sets. Group A’s scores are tightly bunched, whereas Group B’s scores are very spread out.

The range and interquartile range are examples of measures of spread. These measures help us analyse the spread of various data sets.

The range

The range is the easiest of this group of summary statistics to calculate. The range of a set of data is the difference between the highest and lowest values in that set.

It is usually not too difficult to locate the highest and lowest values in a set of data. Only when there is a very large number of observations might the job be made more difficult. In the example above that compared the results on a mathematics test by two groups, the range for Group A is found by subtracting the lowest score (45) from the highest score (55); that is, 55 − 45 = 10. Similarly, we can say that the range for Group B is 90 − 10 = 80. Statistically we can write the lowest score as minX and the highest score as maxX and so the range can be found using the formula

Range = maxX − minX.

The values of maxX and minX can be found using a graphics calculator as you will see later in the chapter.

While the range gives us some idea about the spread of the data, it is not terribly informative since it gives us no idea of how the data are distributed between the highest and lowest values.
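As a small illustration of the formula Range = maxX − minX, the Python sketch below computes the range of the two groups of test results. The function name `data_range` is an assumption made for this example; Python's built-in `max` and `min` play the roles of maxX and minX.

```python
# Test results for the two groups of 10 students.
group_a = [45, 46, 47, 48, 50, 50, 52, 53, 54, 55]
group_b = [10, 20, 30, 40, 50, 50, 60, 70, 80, 90]


def data_range(data):
    """Range = highest value minus lowest value."""
    return max(data) - min(data)


range_a = data_range(group_a)
range_b = data_range(group_b)

print(range_a, range_b)
```

The two groups share the same mean, median and mode, yet their ranges differ by a factor of eight, which is exactly the difference in spread that a measure of centre cannot show.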

The interquartile range

We have seen that the median divides a set of data in half. Similarly, quartiles divide a set of data in quarters. The symbols used to refer to these quartiles are Q1, Q2 and Q3.

The middle quartile, Q2, is the median.

The interquartile range IQR = Q3 − Q1

The interquartile range gives us the range of the middle 50% of values in our set of data. There are four steps to locating Q1 and Q3.

Step 1. Write down the data in ordered form from lowest to highest.
Step 2. Locate the median; that is, locate Q2.
Step 3. Now consider just the lower half of the set of data. Find the middle score. This score is Q1.
Step 4. Now consider just the upper half of the set of data. Find the middle score. This score is Q3.

The four cases given below illustrate this method.

Case 1
Consider data containing the 6 observations: 3 6 10 12 15 21.
The data are already ordered. The median is 11.
Consider the lower half of the set, which is 3 6 10. The middle score is 6, so Q1 = 6.
Consider the upper half of the set, which is 12 15 21. The middle score is 15, so Q3 = 15.

Case 2
Consider a set of data containing the 7 observations: 4 9 11 13 17 23 30.
The data are already ordered. The median is 13.
Consider the lower half of the set, which is 4 9 11. The middle score is 9, so Q1 = 9.
Consider the upper half of the set, which is 17 23 30. The middle score is 23, so Q3 = 23.

Case 3
Consider a set of data containing the 8 observations: 1 3 9 10 15 17 21 26.
The data are already ordered. The median is 12.5.
Consider the lower half of the set, which is 1 3 9 10. The middle score is 6, so Q1 = 6.
Consider the upper half of the set, which is 15 17 21 26. The middle score is 19, so Q3 = 19.

Case 4
Consider a set of data containing the 9 observations: 2 7 13 14 17 19 21 25 29.
The data are already ordered. The median is 17.
Consider the lower half of the set, which is 2 7 13 14. The middle score is 10, so Q1 = 10.
Consider the upper half of the set, which is 19 21 25 29. The middle score is 23, so Q3 = 23.
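The four-step method for locating Q1 and Q3 can be sketched in Python as follows. This is one possible reading of the steps, under the assumption (consistent with Cases 2 and 4) that when n is odd the median itself is left out of both halves. The function names `quartiles` and `interquartile_range` are hypothetical. Other software may use a different quartile convention, so results can differ slightly from those produced by, for example, `statistics.quantiles`.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    When n is odd, the median is excluded from both halves.
    """
    ordered = sorted(data)                 # Step 1: order the data
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                 # lower half of the data
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


def interquartile_range(data):
    """IQR = Q3 - Q1."""
    q1, _, q3 = quartiles(data)
    return q3 - q1


case_1 = quartiles([3, 6, 10, 12, 15, 21])
case_2 = quartiles([4, 9, 11, 13, 17, 23, 30])
case_3 = quartiles([1, 3, 9, 10, 15, 17, 21, 26])
case_4 = quartiles([2, 7, 13, 14, 17, 19, 21, 25, 29])

print(case_1, case_2, case_3, case_4)
```

Each tuple lists Q1, the median and Q3, so the values can be compared directly with the four cases above.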

Worked example 5
The ages of the patients who attended the casualty department of an inner suburban hospital on one particular afternoon are shown below.

14 3 27 42 19 17 73 60 62 21 23 2 5 58 33 19 81 59 25 17 69

Find the interquartile range of these data.

THINK / WRITE
1. Order the data.
   2 3 5 14 17 17 19 19 21 23 25 27 33 42 58 59 60 62 69 73 81
2. Find the median.
   The median is 25 since ten scores lie below it and ten lie above it.
3. Find the middle score of the lower half of the data.
   For the scores 2 3 5 14 17 17 19 19 21 23, the middle score is 17. So, Q1 = 17.
4. Find the middle score of the upper half of the data.
   For the scores 27 33 42 58 59 60 62 69 73 81, the middle score is 59.5. So, Q3 = 59.5.
5. Calculate the interquartile range.
   IQR = Q3 − Q1 = 59.5 − 17 = 42.5

A graphics calculator provides possibly the fastest way of locating quartiles and hence finding the value of the interquartile range.

Worked example 6
Parents are often shocked at the amount of money their children spend. The data below give the amount spent (to the nearest whole dollar) by each child in a group that was taken on an excursion to the Exhibition.

15 12 17 23 21 19 16 11 17 18 23
24 25 21 20 37 17 25 22 21 19

Calculate the interquartile range for these data.

THINK
1. Enter the data.
   (a) Press STAT.
   (b) Select 1:Edit by pressing ENTER.
   (c) Enter the data in L1.
2. Obtain the values of the quartiles.
   (a) Press STAT.
   (b) Select CALC.
   (c) Select 1:1–Var Stats by pressing ENTER.
   (d) Enter L1 (press 2nd [L1]). Press ENTER.
3. A list of statistics appears. We shall be using a number of these later. We are looking for the first and third quartiles. Scroll down the screen using the down arrow key.

DISPLAY
Q1 = 17 and Q3 = 23
So, IQR = 23 − 17 = 6

In most cases we are asked to find the interquartile range of a grouped distribution. This requires us to draw a cumulative frequency polygon (ogive) and find the 25th and 75th percentiles.

A percentile is a measure of where in a set of scores an individual score lies. For example, the 25th percentile has 25% of scores below it and 75% above it.

To find the interquartile range, draw a second vertical axis that shows the 25th, 50th and 75th percentiles. A line is drawn from each of the 25th, 50th and 75th percentiles across to the ogive and then down to the horizontal axis. The values of the quartiles can then be read from the horizontal axis.

The median is the score that is found at the 50th percentile.



Worked example 7
The cumulative frequency histogram and polygon at right show the number of customers who order different volumes of concrete from a readymix concrete company during a day.

[Figure: cumulative frequency histogram and polygon. Vertical axis: Cumulative frequency, 0 to 50. Horizontal axis ticks: 0.25, 0.75, 1.25, 1.75, 2.25, 2.75. Axis label: Number of customers.]

Find the:
a median
b interquartile range for this distribution.

THINK / DISPLAY/WRITE
a 1. Draw a vertical axis showing the percentiles.
     [Figure: the same graph with a second vertical axis marked 0%, 25%, 50%, 75% and 100%.]
  2. Draw a line for the 50th percentile to the ogive and estimate the median.
     Median = 0.9
b 1. Draw a line for the 25th and 75th percentiles and estimate these values.
     Lower quartile = 0.4
     Upper quartile = 1.6
  2. Calculate the interquartile range by subtracting the lower quartile from the upper quartile.
     Interquartile range = 1.6 − 0.4 = 1.2

remember
1. The range of a set of data is the difference between the highest and lowest values in that set.
2. The interquartile range IQR = Q3 − Q1.
3. The interquartile range gives us the range of the middle 50% of values in our set of data.
4. There are four steps to locating Q1 and Q3.
   Step 1. Write down the set of data in ordered form from lowest to highest.
   Step 2. Locate the median, that is, locate Q2.
   Step 3. Now consider just the lower half of the set of data. Find the middle score. This score is Q1.
   Step 4. Now consider just the upper half of the set of data. Find the middle score. This score is Q3.
5. The interquartile range of a grouped distribution is estimated from an ogive.



10B Range and interquartile range

1 Write down the range of the sets of data shown in the following stem plots. The key for each stem plot is 3 | 4 = 34.

a Stem | Leaf
  0    | 7
  1    | 2 3
  2    | 2 4 5 7 9
  3    | 0 2 3 6 8 8
  4    | 4 7 8 9 9
  5    | 2 7 8
  6    | 1 3

b Stem | Leaf
  0    | 0 0 1 1
  0    | 2 2 3 3
  0    | 4 4 5 5 5 5 5 5 5 5
  0    | 6 6 6 6 7
  0    | 8 8 8 9
  1    | 0 0 1
  1    | 3 3
  1    | 5 5
  1    | 7
  1    |

c Stem | Leaf
  0    | 1
  0    | 2
  0    | 4 4 5
  0    | 6 6 6 7
  0    | 8 8 8 8 9 9
  1    | 0 0 0 1 1 1 1
  1    | 2 2 2 3 3 3
  1    | 4 4 5 5
  1    | 6 7 7
  1    | 8 9

d Stem | Leaf
  3    | 1
  3    |
  3    |
  3    | 6
  3    | 8 9
  4    | 0 0 1 1 1
  4    | 2 2 3 3 3 3
  4    | 4 5 5 5
  4    | 6 7
  4    | 9

e Stem | Leaf
  60   | 2 5 8
  61   | 1 3 3 6 7 8 9
  62   | 0 1 2 4 6 7 8 8 9
  63   | 2 2 4 5 7 8
  64   | 3 6 7
  65   | 4 5 8
  66   | 3 5
  67   | 4

2 For each of the following sets of data, write down the range.
a 2 4 6 7 9
b 12 15 17 19 21
c 3 4 5 6 7 8 9
d 3 5 7 8 12 13 15 16
e 12 13 15 16 18 19 21 23 24 26
f 3 8 4 2 1 6 5
g 16 21 14 28 23 15 11 19 25
h 7 4 3 4 9 5 10 4 2 11
i 29 23 22 33 26 18 37 22 16


3 a On the 9th of August, the number of cars that stopped at the drive-in area at a McBurger restaurant during each hour (from 7.00 am until 10.00 pm) is shown below.

14 18 8 9 12 24 25 15 18 25 24 21 25 24 14

Find the interquartile range of this set of data.

b At the nearby Kenny’s Fried Chicken restaurant on the same day, the number of cars stopping during each hour that it was open is shown below.

7 9 13 16 19 12 11 18 20 19 21 20 18 10

Find the interquartile range of these data.

4 Write down a set of data for which n = 5, the median is 6 and the range is 7.

5 Write down a set of data for which n = 8, the median is 7.5 and the range is 10.

6 multiple choice
The quartiles for a set of data are calculated and found to be Q1 = 13, Q2 = 18, and Q3 = 25. Which of the following statements is true?
A The interquartile range of the data is 5.
B The interquartile range of the data is 7.
C The interquartile range of the data is 12.
D The median is 12.
E The median is 19.

It is recommended that a graphics calculator be used for questions 7 and 8.

7 For each of the following sets of data find the median, the interquartile range and the range.

a 16 12  8  7 26 32 15 51 29 45
  19 11  6 15 32 18 43 31 23 23

b 22 25 27 36 31 32 39 29 20 30
  23 25 21 19 29 28 31 27 22 29

c 1.2 2.3 4.1 2.4 1.5 3.7 6.1 2.4 3.6 1.2
  6.1 3.7 5.4 3.7 5.2 3.8 6.3 7.1 4.9

8 For each set of data shown on the stem plots, find the median, the interquartile range and the range.

a Stem23456789

1011

Leaf3 5 5 6 7 8 9 90 2 2 3 4 6 6 7 8 82 2 4 5 6 6 6 7 90 3 3 5 62 45 927

4 Key: 4|2 = 42

b Stem11223344

Leaf4

1 45 7 8 8 91 2 2 2 4 4 4 45 5 5 63 4

Key: 2|5 = 25





9 The cumulative frequency histogram and polygon at right display the results of a survey of 50 drivers who were asked about the number of speeding fines they have received.

[Figure: cumulative frequency histogram and polygon. Horizontal axis: No. of speeding fines received by drivers, 0 to 5. Vertical axis: Cumulative frequency, 0 to 50.]

a Use the ogive to find the median of the distribution.
b Find the lower quartile.
c Find the upper quartile.
d Calculate the interquartile range.

10 The frequency distribution table below shows the result of a survey of 90 households who were asked about the number of times they had been the victim of crime.

Score    Frequency
0        26
1        31
2        22
3         8
4         3

a Add a column for cumulative frequency to the table.
b Draw a cumulative frequency histogram and polygon.
c Use your graph to find the median of the distribution.
d Calculate the interquartile range.

Range of soft drink amounts

Take another look at Yvonne’s quality control check. We have previously found two measures of central tendency for the two soft drink dispensing machines, but these alone are not enough to state which machine dispenses soft drink most accurately.
1 Find the range of amounts dispensed by each machine.
2 Find the interquartile range of amounts dispensed by each machine.
3 From these results, which machine appears to dispense soft drink most accurately?

The standard deviation

The standard deviation is the most sophisticated and also the most useful measure of spread.

The standard deviation σ can be calculated by using the following formula:

$$\sigma = \sqrt{\frac{\Sigma f(x_i - \bar{x})^2}{n}}$$

where f is the frequency of the score $x_i$ (for a list of individual scores, f = 1 for every score).

To calculate the standard deviation by hand, use the following steps:

Step 1. Find the mean.
Step 2. Find the difference between each piece of data and the mean.
Step 3. Square the differences.
Step 4. Add the squared differences.
Step 5. Divide by the number of scores.
Step 6. Take the square root.

This algorithm is used to find the standard deviation in the following worked example.

WORKED Example 8
The following data give the number of lollies in each of 8 packets. Find the standard deviation of the data.
14, 14, 13, 15, 16, 13, 14, 17

1 THINK: Find the mean.
  WRITE: $\bar{x} = \dfrac{14 + 14 + 13 + 15 + 16 + 13 + 14 + 17}{8} = 14.5$
2 THINK: Find the difference between each score and the mean.
  WRITE: Differences from mean: −0.5, −0.5, −1.5, 0.5, 1.5, −1.5, −0.5, 2.5
3 THINK: Square each difference.
  WRITE: Squared differences: 0.25, 0.25, 2.25, 0.25, 2.25, 2.25, 0.25, 6.25
4 THINK: Add the squared differences.
  WRITE: 0.25 + 0.25 + 2.25 + 0.25 + 2.25 + 2.25 + 0.25 + 6.25 = 14
5 THINK: Divide by the number of scores.
  WRITE: 14 ÷ 8 = 1.75
6 THINK: Take the square root and round to 4 decimal places.
  WRITE: $\sqrt{1.75} = 1.3229$
  The standard deviation σ = 1.3229.

Fortunately, you will not always have to go through this series of steps each time you wish to calculate a standard deviation. Your calculator should have a built-in program for the computation of standard deviations.

Predicting the mean and standard deviation of a population from a sample
It is not always practical to measure a particular statistic for a whole population, so usually a sample of the population is taken. It is found that the mean of a sample is a reliable estimate of the mean of a population, but the standard deviation of a population tends to be slightly more than the standard deviation of a sample drawn from it. In other words, the complete population usually shows slightly more variability than a sample drawn from it.

A formula used to predict the standard deviation, s, of a complete population from a sample of scores is:

$s = \sqrt{\dfrac{\sum f(x_i - \bar{x})^2}{n - 1}}$

Notice that the only difference between the formulas is that the divisor has changed from n to n − 1. Your calculator is also equipped with a built-in program for the calculation of this formula. It is worth checking that you can obtain both results from your calculator and can distinguish between them. Try reworking the data from worked example 8. You should find that the standard deviation among the 8 packets of lollies was σ = 1.3229. If you were to use this sample to predict the standard deviation of all packets of lollies then the standard deviation would be s = 1.4142.

A final point worth noting about the standard deviation is that, despite its sophistication, it is still influenced to a high degree by extreme values. Care should be taken when using this statistic with data that include such values.
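The six hand-calculation steps, together with the change of divisor from n to n − 1, can also be traced in a short program. The following is a minimal sketch in Python (not part of the original text, which uses a graphics calculator); the function name `standard_deviation` and its `sample` flag are illustrative choices only.

```python
import math

def standard_deviation(scores, sample=False):
    """Follow the six hand-calculation steps.

    With sample=False the divisor is n (sigma); with sample=True it is n - 1 (s).
    """
    n = len(scores)
    mean = sum(scores) / n                       # Step 1: find the mean
    differences = [x - mean for x in scores]     # Step 2: differences from the mean
    squared = [d ** 2 for d in differences]      # Step 3: square the differences
    total = sum(squared)                         # Step 4: add the squared differences
    divisor = n - 1 if sample else n             # Step 5: divide by n (or n - 1)
    return math.sqrt(total / divisor)            # Step 6: take the square root

# Data from worked example 8: number of lollies in each of 8 packets.
lollies = [14, 14, 13, 15, 16, 13, 14, 17]
sigma = standard_deviation(lollies)
s = standard_deviation(lollies, sample=True)
print(round(sigma, 4), round(s, 4))
```

Rounded to 4 decimal places, the two results should agree with the values σ = 1.3229 and s = 1.4142 quoted for the lolly data.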


WORKED Example 9
The following frequency distribution gives the prices paid by a car wrecking yard for 40 car wrecks.

Price ($)    Frequency
0–500        2
500–1000     4
1000–1500    8
1500–2000    10
2000–2500    7
2500–3000    6
3000–3500    3

a Find the mean and standard deviation in the price paid for these wrecks.
b Estimate the mean and standard deviation in the price paid for wrecks by this yard in general.

a 1 THINK: Calculate the midpoint of each price range and enter this in L1 on your graphics calculator (press STAT, select EDIT and 1:Edit).
  2 THINK: Enter the frequency values in L2.
  3 THINK: Press STAT, select CALC and 1:1–Var Stats and enter L1, L2.
  4 THINK: Press ENTER to calculate the statistics.
  5 THINK: The mean is shown as $\bar{x}$ and the standard deviation as σx.
    WRITE: Mean price = $1825
           Standard deviation = $787
b THINK: Using the 1–Var Stats output screen, we use $\bar{x}$ to estimate the mean, and sx to estimate the standard deviation of a wider population.
  WRITE: Population mean price (estimate) = $1825
         Population standard deviation (estimate) = $797.03
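The calculator work in worked example 9 can be cross-checked by applying the grouped-data formulas directly to the class midpoints. The following Python sketch is an illustration only (the variable names are invented here); it treats each class midpoint as the score and the class frequency as f.

```python
import math

# (lower bound, upper bound, frequency) for each price class in worked example 9
classes = [(0, 500, 2), (500, 1000, 4), (1000, 1500, 8), (1500, 2000, 10),
           (2000, 2500, 7), (2500, 3000, 6), (3000, 3500, 3)]

midpoints = [(low + high) / 2 for low, high, _ in classes]
freqs = [f for _, _, f in classes]

n = sum(freqs)                                                    # number of wrecks
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n           # sum(f x m) / sum(f)
sum_sq = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints))

sigma_x = math.sqrt(sum_sq / n)        # divisor n: spread of these 40 wrecks
s_x = math.sqrt(sum_sq / (n - 1))      # divisor n - 1: estimate for wrecks in general

print(n, mean, round(sigma_x), round(s_x, 2))
```

The mean and the two standard deviations should match the figures read from the 1–Var Stats screen ($1825, $787 and $797.03).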



remember
1. The standard deviation of a group of scores can be found using the formula:
   $\sigma = \sqrt{\dfrac{\sum f(x_i - \bar{x})^2}{n}}$
2. The standard deviation of a population can be predicted from a sample of scores by using the formula:
   $s = \sqrt{\dfrac{\sum f(x_i - \bar{x})^2}{n - 1}}$
3. The lower the standard deviation, the closer together the scores.
4. On your calculator the population standard deviation is denoted σx and the sample standard deviation sx.

10C The standard deviation

1 (WORKED Example 8) Use the algorithm (series of steps) to find the standard deviation of the following data without using your calculator’s in-built program.
  a 3, 5, 8, 2, 7, 1, 6, 5
  b 11, 8, 7, 12, 10, 11, 14
  c 25, 15, 78, 35, 56, 41, 17, 24
  d 5.2, 4.7, 5.1, 12.6, 4.8

2 Now use the calculator’s in-built program to check each of the standard deviations that you calculated in the previous question.

You may use your calculator’s in-built program for finding the standard deviation and mean in the rest of the questions.

3 Consider the following two groups of people.

  Height (cm)
  Group A  160 170 170 170 170 170 180
  Group B  160 170 170 110 230 170 180

  a Calculate the mean height, median height and mode height for each group. What do you notice?
  b Are the groups really the same?
  c Which group would you expect to show the greatest range in heights?
  d Which group would you expect to show the greatest interquartile range in heights?
  e Which group would you expect to show the greatest standard deviation in heights?
  f Calculate these statistics to confirm your predictions.



4 The following frequency distribution table shows the number of visitors that came to a city museum during the course of a month.

  Visitor number  80–  90–  100–  110–  120–  130–
  Frequency       1    4    11    9     4     2

  a Find the range of the data.
  b Find the mean of the data.
  c Find the standard deviation and variance of the data.

5 Calculate the standard deviation of the following data to 3 decimal places.

  Score      10–  20–  30–  40–  50–
  Frequency  1    6    9    4    1

  A 3.027  B 9.437  C 9.209  D 34.048  E None of the above.

6 (WORKED Example 9) The following frequency distribution table shows the life expectancy of 175 household light globes.

  Life (h)   Frequency
  200–       2
  250–       5
  300–       12
  350–       25
  400–       42
  450–       38
  500–       26
  550–       15
  600–       7
  650–       3

  a Find the range of the data.
  b Find the mean and standard deviation in the lifetimes of this sample of light globes.
  c Estimate the mean and standard deviation in the lifetimes of all light globes of this brand.

7 The following frequency distribution table shows the distribution of daily maximum temperatures during the course of a full year.

  Maximum temperature (°C)   Number of days
  0–                         4
  5–                         22
  10–                        95
  15–                        124
  20–                        94
  25–                        19
  30–                        5
  35–                        2

  a Add a cumulative frequency column to the table.
  b Draw an ogive of the data.
  c Find the upper and lower quartiles of the data and calculate the interquartile range.
  d Use the ogive to find the median (50th percentile of the data).
  e Find the mean of the data.
  f Find the standard deviation of the data.
  g Find the range of the data.


8 The following data give the number of fruit that have formed on each of 30 trees in an orchard.

  45 48 52 36 38 72 36 74 56 46
  81 73 46 48 44 39 52 58 57 65
  60 53 54 58 41 44 47 76 68 55

  a Complete a frequency distribution table for the data.
  b Draw an ogive of the data.
  c Use the ogive to find the median, lower quartile and upper quartile of the data.
  d Find the interquartile range of the data.
  e Find the mean of the data.
  f Find the standard deviation of the data.
  g Estimate the standard deviation in the number of fruit for the whole orchard.
  h Find the range of the data.

9 The polygons drawn at right show the lifetimes of two samples of different brands of toaster elements when subject to continued use.
  a Which brand has the longest mean life?
  b Estimate the mean life of each brand.
  c Which brand has the greatest standard deviation in its performance?
  d What does this say about the consistency of this element?
  e Which brand is best? Give a case in support of Electric Mate. Give a case in support of Hot Wire.

10 Crunch and Crinkle are two brands of potato crisps. Each is sold in packets nominally of the same size and for the same price. Upon investigation of a sample of packets of each it is found that Crunch and Crinkle have the same mean weight (25 g). The standard deviation of the weights of Crunch is, however, 5 g and the standard deviation of the weights of Crinkle is 2 g. Which brand do you think would represent the best value for money under these circumstances? Why?

Standard deviation of soft drink amounts

Returning to Yvonne’s measurements, we are now ready to find the standard deviation of the amounts of soft drink dispensed by each machine.

1 Use the data lists MA and MB stored on your graphics calculator to find the standard deviation for each machine. If you do not have a graphics calculator you can obtain the standard deviation by re-entering each set of data.

2 Explain your choice of σx or sx.

3 Interpret the results in terms of assessing which machine dispenses the soft drink most accurately.

[Figure for question 9: frequency polygons of element lifetimes for Hot Wire and Electric Mate. Horizontal axis: Hours, from 150 to 230.]

WorkSHEET 10.2



Boxplots
Five number summary
A five number summary is a list consisting of the lowest score, lower quartile, median, upper quartile and greatest score of a set of data.

A five number summary gives information about the spread of a set of data. The convention is not to detail the numbers with labels but to present them in order; so, for example, the five number summary:

4 15 21 23 28

would be interpreted as lowest score 4, lower quartile 15, median 21, upper quartile 23 and greatest score 28.

Boxplots
A boxplot (or box-and-whisker plot) is a graph of the five number summary. It is a powerful way to show the spread of data. Boxplots consist of a central divided box with attached ‘whiskers’. The box spans the interquartile range. The median is marked by a vertical line inside the box. The whiskers indicate the range of scores:

[Figure: labelled boxplot. Labels, in order along the plot: Indicates the lowest score; Indicates the lower quartile; Indicates the median; Indicates the upper quartile; Indicates the greatest score.]

Boxplots are always drawn to scale. They are presented either with the five number summary figures attached as labels (diagram at right) or with a scale presented alongside the boxplot like the diagram below.

[Figures: a boxplot labelled 4 15 21 23 28, and a boxplot drawn against a scale from 0 to 30.]

WORKED Example 10
From the following five number summary find:
a the median
b the interquartile range
c the range.
29 37 39 44 48

THINK: The figures are presented in the order of lowest score, lower quartile, median, upper quartile, greatest score.
WRITE: Xmin = 29, QL = 37, median = 39, QU = 44, Xmax = 48

a THINK: The median is 39.
  WRITE: Median = 39
b THINK: The interquartile range is the difference between the upper and lower quartiles.
  WRITE: IQR = QU − QL
             = 44 − 37
             = 7
c THINK: The range is the difference between the greatest score and the lowest score.
  WRITE: Range = Xmax − Xmin
               = 48 − 29
               = 19
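Reading a five number summary, as in worked example 10, amounts to unpacking five values in a fixed order and taking two differences. A minimal Python sketch of that reading follows; the function name and the dictionary keys are illustrative, not part of the original text.

```python
def read_five_number_summary(summary):
    """Unpack (lowest, QL, median, QU, greatest) and derive the IQR and range."""
    x_min, q_l, median, q_u, x_max = summary
    return {
        "median": median,
        "iqr": q_u - q_l,        # upper quartile minus lower quartile
        "range": x_max - x_min,  # greatest score minus lowest score
    }

# Five number summary from worked example 10.
result = read_five_number_summary((29, 37, 39, 44, 48))
print(result)
```

For the summary 29 37 39 44 48 this reproduces the median of 39, interquartile range of 7 and range of 19 found above.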



Interpreting a boxplot
The boxplot neatly divides the data into four sections. One-quarter of the scores lie between the lowest score and the lower quartile, one-quarter between the lower quartile and the median, one-quarter between the median and the upper quartile, and one-quarter between the upper quartile and the greatest score. The reader can easily see where clustering of the data occurs. For example, a small box with relatively long whiskers would indicate that half of the data (from QL to QU) would be confined to a small range and the data could be described as clustered. A wide box with relatively short whiskers would indicate that half of the data (from QL to QU) would be spread over a wide range and the data could be described as spread. Consider the boxplots below with their matching histograms.

[Figure: five boxplots with their matching histograms (horizontal axis: Size; vertical axis: f), captioned: Positively skewed data; Clustered data; Normally distributed data; Spread data; Negatively skewed data.]

Identification of extreme values
Extreme values often make the whiskers appear longer than they should and hence give the appearance that the data are spread over a much greater range than they really are.

If an extreme value occurs in a set of data it can be denoted by a small cross on the boxplot. The whisker is then shortened to the next largest (or smallest) figure.

The boxplot below shows that the lowest score was 5. This was an extreme value as the rest of the scores were located within the range 15 to 42.

[Figure: boxplot drawn against a scale from 0 to 45, with the extreme value 5 marked by a cross (×).]



WORKED Example 11
The following stem-and-leaf plot gives the speed of 25 cars caught by a roadside speed camera.

Key: 8  | 2 = 82 km/h
     8* | 6 = 86 km/h

Stem | Leaf
8    | 2 2 4 4 4 4
8*   | 5 5 6 6 7 9 9 9
9    | 0 1 1 2 4
9*   | 5 6 9
10   | 0 2
10*  |
11   | 4

a Prepare a five number summary of the data.
b Draw a boxplot of the data. (Identify any extreme values.)
c Describe the distribution of the data.

1 THINK: First identify the positions of the median and upper and lower quartiles. There are 25 pieces of data. The median is the $\frac{n+1}{2}$th score. The lower quartile is the median of the lower half of the data. The upper quartile is the median of the upper half of the data (each half contains 12 scores).
  WRITE: The median is the $\frac{25+1}{2}$th score, that is, the 13th score.
         QL is the $\frac{12+1}{2}$th score in the lower half, that is, the 6.5th score, which is halfway between the 6th and 7th scores.
         QU is halfway between the 6th and 7th scores in the upper half of the data.
2 THINK: Mark the position of the median and upper and lower quartiles on the stem plot.
  WRITE: [Stem plot repeated with QL marked between 84 and 85, the median marked at 89 and QU marked between 94 and 95.]
a THINK: Write the five number summary:
         The lowest score is 82.
         The lower quartile is between 84 and 85, that is, 84.5.
         The median is 89.
         The upper quartile is between 94 and 95, that is, 94.5.
         The greatest score is 114.
  WRITE: Five number summary: 82, 84.5, 89, 94.5, 114
b THINK: Start by ruling a suitable scale. Remember to include the units of measurement. The box represents the interquartile range so it runs from 84.5 to 94.5. The median is a vertical line in the box at 89. The whiskers should extend to the lowest score (82) and the highest score (114). But the score 114 is a great deal higher than any of the others in the set and might be regarded as an extreme value. It should be indicated by a cross and the whisker will extend only as far as 102 (the second largest number in the set).
  WRITE: [Figure: boxplot drawn against a scale from 80 to 110 km/h, with the extreme value 114 marked by a cross (×).]
c THINK: Even when the extreme value is excluded the data appear to be skewed, with high values being spread over a much greater range.
  WRITE: The data are skewed (positively) and include one extremely high value.

Graphics Calculator tip! Creating a boxplot from a frequency table
1. Clear the Y= editor (press Y= and CLEAR) and turn off any existing plots by pressing 2nd [STAT PLOT] and choosing 4: PlotsOff.
2. Press STAT and select 1:Edit to enter x data in L1 and frequencies in L2 if data are grouped.
3. Press 2nd [STAT PLOT] then ENTER and select settings as below (use arrow keys and press ENTER to make each choice). If data are not grouped, leave Freq = 1.
4. Press ZOOM, choose 9: ZoomStat then press ENTER.
5. Press TRACE to explore the plot.
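The quartile positions used in worked example 11 (the median of each half of the ordered data, leaving the middle score out of both halves when the number of scores is odd) can be checked with a short program. This is a minimal Python sketch under that convention; the function names are illustrative, and other software may use a different quartile rule and give slightly different quartiles.

```python
def median(sorted_scores):
    """Median of a list that is already in ascending order."""
    n = len(sorted_scores)
    mid = n // 2
    if n % 2 == 1:
        return sorted_scores[mid]
    return (sorted_scores[mid - 1] + sorted_scores[mid]) / 2

def five_number_summary(scores):
    """Lowest score, QL, median, QU, greatest score (median-of-halves method)."""
    data = sorted(scores)
    n = len(data)
    lower = data[: n // 2]         # lower half; excludes the median when n is odd
    upper = data[(n + 1) // 2 :]   # upper half; excludes the median when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Speeds (km/h) read from the stem-and-leaf plot in worked example 11.
speeds = [82, 82, 84, 84, 84, 84, 85, 85, 86, 86, 87, 89, 89, 89,
          90, 91, 91, 92, 94, 95, 96, 99, 100, 102, 114]
summary = five_number_summary(speeds)
print(summary)
```

For the 25 speeds this gives the same five number summary as the worked example: 82, 84.5, 89, 94.5, 114.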



remember
1. A five number summary is a list consisting of the lowest score, lower quartile, median, upper quartile and greatest score of a set of data.
2. A boxplot is a graphical representation of a five number summary and is a powerful tool to show the spread of data.
3. The box spans the interquartile range; the median is marked by a vertical line inside the box and the whiskers extend to the lowest and greatest scores.
4. Boxplots are always drawn to scale.
5. If an extreme value occurs in a set of data, it can be denoted by a small cross; the whisker is then shortened to the next largest (or smallest) value.

10D Boxplots

1 to 3 (WORKED Example 10) From each of the following five number summaries find:
  a the median
  b the interquartile range
  c the range.

  1 6, 11, 13, 16, 32
  2 101, 119, 122, 125, 128
  3 39.2, 46.5, 49.0, 52.3, 57.8

4 The boxplot below shows the distribution of final points scored by a football team over a season’s roster.

  [Figure: boxplot drawn against a scale from 50 to 150 Points.]

  a What was the team’s greatest points score?
  b What was the team’s least points score?
  c What was the team’s median points score?
  d What was the range of points scored?
  e What was the interquartile range of points scored?


5 The boxplot below shows the distribution of data formed by counting the number of honey bears in each of a large sample of packs.

  [Figure: boxplot drawn against a scale from 30 to 60.]

  a What was the largest number of honey bears in any pack?
  b What was the smallest number of honey bears in any pack?
  c What was the median number of honey bears in any pack?
  d What was the range of numbers of honey bears per pack?
  e What was the interquartile range of honey bears per pack?

Questions 6 to 8 refer to the following boxplot.

  [Figure: boxplot drawn against a scale from 5 to 30, with an extreme value marked by a cross (×).]

6 The median of the data is:
  A 5  B 20  C 23  D 25  E 31

7 The interquartile range of the data:
  A is 23  B is 26  C is 5  D is 20 to 25  E cannot be determined because of extreme values.

8 Which of the following is not true of the data represented by the boxplot?
  A One-quarter of the scores are between 5 and 20.
  B Half of the scores are between 20 and 25.
  C The lowest quarter of the data is spread over a wide range.
  D Most of the data are contained between the scores of 5 and 20.
  E The data are skewed left.

9 The number of sales made each day by a salesperson is recorded over a 2-week period:
  25, 31, 28, 43, 37, 43, 22, 45, 48, 33
  a Prepare a five number summary of the data. (There is no need to draw a stem-and-leaf plot of the data. Just arrange them in order of size.)
  b Draw a boxplot of the data.

10 The data below show monthly rainfall in millimetres.

  Month          J   F   M   A   M   J   J   A   S   O   N   D
  Rainfall (mm)  10  12  21  23  39  22  15  11  22  37  45  30

  a Prepare a five number summary of the data.
  b Draw a boxplot of the data.

11 (WORKED Example 11) The stemplot below details the age of 25 offenders who were caught during random breath testing.

  Key: 1 | 8 = 18 years
  Stem | Leaf
  1    | 8 8 9 9 9
  2    | 0 0 0 1 1 3 4 6 9
  3    | 0 1 2 7
  4    | 2 5
  5    | 3 6 8
  6    | 6
  7    | 4

  a Prepare a five number summary of the data.
  b Draw a boxplot of the data.
  c Describe the distribution of the data.

12 The following stem-and-leaf plot details the prices at which 30 houses in a particular suburb were sold.

  Key: 12 | 4 = $124 000
  Stem | Leaf
  12   | 4 7 9
  13   | 0 0 2 5 5
  14   | 0 0 2 3 5 5 7 9 9
  15   | 0 0 2 3 7 7 8
  16   | 0 2 2 5 8
  17   | 5

  a Prepare a five number summary of the data.
  b Draw a boxplot of the data.
  (You might like to use a graphics calculator for this question.)

13 The following data detail the number of hamburgers sold by a fast food outlet every day over a 4-week period.

  M    T    W    T    F    S    S
  125  144  132  148  187  172  181
  134  157  152  126  155  183  188
  131  121  165  129  143  182  181
  152  163  150  148  152  179  181

  a Prepare a stem-and-leaf plot of the data. (Use a class size of 10.)
  b Draw a boxplot of the data.
  (You might like to use a graphics calculator for this question.)

14 The following data show the ages of 30 mothers upon the birth of their first baby.

  22 18 17 22 24
  25 32 19 23 28
  31 19 23 25 23
  21 33 23 24 20
  29 18 22 24 20
  22 17 48 18 20

  a Prepare a stem-and-leaf plot of the data. (Use a class size of 5.)
  b Draw a boxplot of the data. Indicate any extreme values appropriately.
  c Describe the distribution in words. What does the distribution say about the age that mothers have their first baby?
  (You might like to use a graphics calculator for this question.)


Match the boxplot with its most likely histogram.

[Figure: a boxplot and five histograms labelled A to E (horizontal axis: Size; vertical axis: f).]

Back-to-back stem plots
In chapter 9, we saw how to construct a stem plot for a set of univariate data. We can also extend a stem plot so that it displays bivariate data. Specifically, we shall create a stem plot that displays the relationship between a numerical variable and a categorical variable. We shall limit ourselves in this section to categorical variables with just two categories, for example sex. The two categories are used to provide two back-to-back leaves of a stem plot.

A back-to-back stem plot is used to display bivariate data, involving a numerical variable and a categorical variable with 2 categories.

WORKED Example 12
The girls and boys in Grade 4 at Kingston Primary School submitted projects on the Olympic Games. The marks they obtained out of 20 are given below.

Girls’ marks  16 17 19 15 12 16 17 19 19 16
Boys’ marks   14 15 16 13 12 13 14 13 15 14

Display the data on a back-to-back stem plot.

1 THINK: Identify the highest and lowest scores in order to decide on the stems.
  WRITE: Highest score = 19
         Lowest score = 12
         Use a stem of 1, divided into fifths.
2 THINK: Create an unordered stem plot first. Put the boys’ scores on the left, and the girls’ scores on the right.
  WRITE: Key: 1 | 2 = 12
         Leaf (Boys) | Stem | Leaf (Girls)
             3 2 3 3 | 1    | 2
           4 5 4 5 4 | 1    | 5
                   6 | 1    | 6 7 6 7 6
                     | 1    | 9 9 9
3 THINK: Now order the stem plot. The scores on the left should increase in value from right to left, while the scores on the right should increase in value from left to right.
  WRITE: Key: 1 | 2 = 12
         Leaf (Boys) | Stem | Leaf (Girls)
             3 3 3 2 | 1    | 2
           5 5 4 4 4 | 1    | 5
                   6 | 1    | 6 6 6 7 7
                     | 1    | 9 9 9

The back-to-back stem plot allows us to make some visual comparisons of the two distributions. In the above example the centre of the distribution for the girls is higher than the centre of the distribution for the boys. The spread of each of the distributions seems to be about the same. For the boys, the marks are grouped around the 12–15 marks; for the girls, they are grouped around the 16–19 marks. On the whole, we can conclude that the girls obtained better marks than the boys did.

To get a more precise picture of the centre and spread of each of the distributions we can use the summary statistics discussed in chapter 1. Specifically, we are interested in:
1. the mean and the median (to measure the centre of the distributions), and
2. the interquartile range and the standard deviation (to measure the spread of the distributions).
We saw in chapter 1 that the calculation of these summary statistics is very straightforward and rapid using a graphics calculator.

WORKED Example 13
The number of ‘how to vote’ cards handed out by various Australian Labor Party and Liberal Party volunteers during the course of a polling day is shown below.

Labor    180 233 246 252 263 270 229 238 226 211
         193 202 210 222 257 247 234 226 214 204
Liberal  204 215 226 253 263 272 285 245 267 275
         287 273 266 233 244 250 261 272 280 279

Display the data using a back-to-back stem plot and use this, together with summary statistics, to compare the distributions of the number of cards handed out by the Labor and Liberal volunteers.

1 THINK: Construct the stem plot.
  WRITE: Key: 18 | 0 = 180
         Leaf (Labor) | Stem | Leaf (Liberal)
                    0 | 18   |
                    3 | 19   |
                  4 2 | 20   | 4
                4 1 0 | 21   | 5
              9 6 6 2 | 22   | 6
                8 4 3 | 23   | 3
                  7 6 | 24   | 4 5
                  7 2 | 25   | 0 3
                    3 | 26   | 1 3 6 7
                    0 | 27   | 2 2 3 5 9
                      | 28   | 0 5 7
2 THINK: Use a graphics calculator to calculate the summary statistics: the mean, the median, the standard deviation and the interquartile range. Enter each set of data as a separate list. (See worked example 6 on how to use your graphics calculator to calculate these values.)
  WRITE: For the Labor volunteers:
         Mean = 227.9
         Median = 227.5
         Interquartile range = 36
         Standard deviation = 23.9
         For the Liberal volunteers:
         Mean = 257.5
         Median = 264.5
         Interquartile range = 29.5
         Standard deviation = 23.4
3 THINK: Comment on the relationship.
  WRITE: From the stem plot we see that the Labor distribution is symmetric and therefore the mean and the median are very close, whereas the Liberal distribution is negatively skewed.
         Since the distribution is skewed, the median is a better indicator of the centre of the distribution than is the mean.
         Comparing the medians therefore, we have the median number of cards handed out for Labor at 228 and for Liberal at 265, which is a big difference.
         The standard deviations were similar, as were the interquartile ranges. There was not a lot of difference in the spread of the data.
         In essence, the Liberal Party volunteers handed out a lot more ‘how to vote’ cards than the Labor Party volunteers did.
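The summary statistics quoted in worked example 13 can be reproduced without a graphics calculator. The sketch below uses Python’s standard `statistics` module; the helper `interquartile_range` is an illustrative function written for this example and assumes the median-of-halves quartile method used earlier in the chapter. Working through the arithmetic suggests that the quoted standard deviations correspond to the n − 1 form (sx), so `statistics.stdev` is used here; that is an observation from the numbers, not something the worked example states.

```python
import statistics

def interquartile_range(scores):
    """QU - QL, taking each quartile as the median of one half of the ordered data."""
    data = sorted(scores)
    n = len(data)
    lower = data[: n // 2]
    upper = data[(n + 1) // 2 :]
    return statistics.median(upper) - statistics.median(lower)

labor = [180, 233, 246, 252, 263, 270, 229, 238, 226, 211,
         193, 202, 210, 222, 257, 247, 234, 226, 214, 204]
liberal = [204, 215, 226, 253, 263, 272, 285, 245, 267, 275,
           287, 273, 266, 233, 244, 250, 261, 272, 280, 279]

for party, cards in (("Labor", labor), ("Liberal", liberal)):
    print(party,
          "mean:", statistics.mean(cards),
          "median:", statistics.median(cards),
          "IQR:", interquartile_range(cards),
          "s:", round(statistics.stdev(cards), 1))
```

Using `statistics.pstdev` instead would give the divisor-n form (σx), which comes out a little smaller for both parties.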

remember
1. A back-to-back stem plot displays bivariate data involving a numerical variable and a categorical variable with two categories.
2. In the ordered stem plot, the scores on the left side of the stem increase in value from right to left.
3. Together with summary statistics, back-to-back stem plots can be used for comparing two distributions.



10E Back-to-back stem plots

1 (WORKED Example 12) The marks (out of 50) obtained for the end-of-term test by the students in German and French classes are given below. Display the data on a back-to-back stem plot.

  German  20 38 45 21 30 39 41 22 27 33 30 21 25 32 37 42 26 31 25 37
  French  23 25 36 46 44 39 38 24 25 42 38 34 28 31 44 30 35 48 43 34

2 The birth masses of 10 boys and 10 girls (in kilograms, to the nearest 100 grams) are recorded in the table below. Display the data on a back-to-back stem plot.

  Boys   3.4 5.0 4.2 3.7 4.9 3.4 3.8 4.8 3.6 4.3
  Girls  3.0 2.7 3.7 3.3 4.0 3.1 2.6 3.2 3.6 3.1

3 (WORKED Example 13) The number of delivery trucks making deliveries to a supermarket each day over a 2-week period was recorded for two neighbouring supermarkets, supermarket A and supermarket B. The data are shown below.

  A  11 15 20 25 12 16 21 27 16 17 17 22 23 24
  B  10 15 20 25 30 35 16 31 32 21 23 26 28 29

  a Display the data on a back-to-back stem plot.
  b Use the stem plot, together with some summary statistics, to compare the distributions of the number of trucks delivering to supermarkets A and B.

4 The marks out of 20 for males and females on a science test for a Year-10 class are given below.

  Females  12 13 14 14 15 15 16 17
  Males    10 12 13 14 14 15 17 19

  a Display the data on a back-to-back stem plot.
  b Use the stem plot, together with some summary statistics, to compare the distributions of the marks of the males and the females.

5 The end-of-year English marks for 10 students in an English class were compared over 2 years. The marks for 1998 and for the same students in 1999 are shown below.

  1998  30 31 35 37 39 41 41 42 43 46
  1999  22 26 27 28 30 31 31 33 34 36

  a Display the data on a back-to-back stem plot.
  b Use the stem plot, together with some summary statistics, to compare the distributions of the marks obtained by the students in 1998 and 1999.

6 The age and gender of a group of people attending a fitness class are recorded below.

  Female  23 24 25 26 27 28 30 31
  Male    22 25 30 31 36 37 42 46

  a Display the data on a back-to-back stem plot.
  b Use the stem plot, together with some summary statistics, to compare the distributions of the ages of the female and male members of the fitness class.

7 The scores on a board game are recorded for a group of kindergarten children and for a group of children in a preparatory school.

  Kindergarten  3 13 14 25 28 32 36 41 47 50
  Prep. School  5 12 17 25 27 32 35 44 46 52

  a Display the data on a back-to-back stem plot.
  b Use the stem plot, together with some summary statistics, to compare the distributions of the scores of the kindergarten children and the preparatory school children.

8 The pair of variables that could be displayed on a back-to-back stem plot is:
  A the height of a student and the number of people in the student’s household
  B the time put into completing an assignment and a pass or fail score on the assignment
  C the weight of a businessman and his age
  D the religion of an adult and the person’s head circumference
  E the income bracket of an employee and the time the employee has worked for the company

9 A back-to-back stem plot is a useful way of displaying the relationship between:
  A the proximity to markets (km) and the cost of fresh foods on average per kilogram
  B height and head circumference
  C age and attitude to gambling (for or against)
  D weight and age
  E the money spent during a day of shopping and the number of shops visited on that day



Parallel boxplots
We saw in the previous section that we could display relationships between a numerical variable and a categorical variable with just two categories, using a back-to-back stem plot.

When we want to display a relationship between a numerical variable and a categorical variable with more than two categories, a parallel boxplot can be used.

A parallel boxplot is obtained by constructing individual boxplots for each distribution, using the common scale.

Construction of individual boxplots was discussed in detail earlier in this chapter (see page 438). In this section we concentrate on comparing distributions represented by a number of boxplots (that is, on the interpretation of parallel boxplots).

WORKED Example 14
The four Year-7 classes at Western Secondary College complete the same end-of-year maths test. The marks, expressed as percentages for each of the students in the four classes, are given below.

7A  40 43 45 47 50 52 53 54 57 60 69 63 63 68 70 75 80 85 89 90
7B  60 62 63 64 70 73 74 76 77 77 78 82 85 87 89 90 92 95 97 97
7C  50 51 53 55 57 60 63 65 67 69 70 72 73 74 76 80 82 82 85 89
7D  40 42 43 45 50 53 55 59 60 61 69 73 74 75 80 81 82 83 84 90

Display the data using a parallel boxplot and use this to describe any similarities or differences in the distributions of the marks between the four classes.

1 THINK: Create the first boxplot (for class 7A) on a graphics calculator using 2nd [STAT PLOT] and appropriate WINDOW settings. Using TRACE to show key values, sketch the first boxplot using pen and paper, leaving room for three additional plots.
2 THINK: Repeat step 1 for the other three classes. All four boxplots share the common scale.
  WRITE/DISPLAY: [Figure: parallel boxplots for 7D, 7C, 7B and 7A on a common scale. Horizontal axis: Maths mark (%), from 30 to 100.]
3 THINK: Describe the similarities and differences between the four distributions.
  WRITE: Class 7B had the highest median mark and the range of the distribution was only 37. The lowest mark in 7B was 60.
         We notice that the median of 7A’s marks is approximately 60. So, about 50% of students in 7A received less than 60. This means that about half of 7A had scores that were less than the lowest score in 7B.
         The range of marks in 7A was about the same as that of 7D, with the highest scores in each about equal, and the lowest scores in each about equal. However, the median mark in 7D was higher than the median mark in 7A so, despite a similar range, more students in 7D received a higher mark than in 7A.
         While 7D had a top score that was higher than that of 7C, the median score in 7C was higher than that of 7D and the bottom 25% of scores in 7D were less than the lowest score in 7C. In summary, 7B did best, followed by 7C, then 7D and finally 7A.
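The key values that TRACE would show for each class in worked example 14 can also be computed directly, which makes the comparisons in the commentary easy to check. This Python sketch is an illustration only: the function names are invented here, and the quartiles use the median-of-halves method from earlier in the chapter, so a calculator or other software using a different quartile rule may differ slightly.

```python
def median(sorted_scores):
    """Median of a list that is already in ascending order."""
    n = len(sorted_scores)
    mid = n // 2
    if n % 2 == 1:
        return sorted_scores[mid]
    return (sorted_scores[mid - 1] + sorted_scores[mid]) / 2

def five_number_summary(scores):
    """Lowest score, QL, median, QU, greatest score (median-of-halves method)."""
    data = sorted(scores)
    n = len(data)
    lower, upper = data[: n // 2], data[(n + 1) // 2 :]
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Marks (%) for the four Year-7 classes in worked example 14.
marks = {
    "7A": [40, 43, 45, 47, 50, 52, 53, 54, 57, 60, 69, 63, 63, 68, 70, 75, 80, 85, 89, 90],
    "7B": [60, 62, 63, 64, 70, 73, 74, 76, 77, 77, 78, 82, 85, 87, 89, 90, 92, 95, 97, 97],
    "7C": [50, 51, 53, 55, 57, 60, 63, 65, 67, 69, 70, 72, 73, 74, 76, 80, 82, 82, 85, 89],
    "7D": [40, 42, 43, 45, 50, 53, 55, 59, 60, 61, 69, 73, 74, 75, 80, 81, 82, 83, 84, 90],
}

# One five number summary per class; each would be drawn as one box on the common scale.
summaries = {name: five_number_summary(scores) for name, scores in marks.items()}
for name, summary in summaries.items():
    print(name, summary)
```

Each tuple lists the lowest score, lower quartile, median, upper quartile and greatest score for one class, in the order used for a five number summary.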

remember
1. A relationship between a numerical variable and a categorical variable with more than two categories can be displayed using a parallel boxplot.
2. A parallel boxplot is obtained by constructing individual boxplots for each distribution, using a common scale.



10F Parallel boxplots

1 (WORKED Example 14) The heights (in cm) of students in 9A, 10A and 11A were recorded and are shown in the table below.

  9A   120 126 131 138 140 143 146 147 150 156 157 158 158 160 162 164 165 170
  10A  140 143 146 147 149 151 153 156 162 164 165 167 168 170 173 175 176 180
  11A  151 153 154 158 160 163 164 166 167 169 169 172 175 180 187 189 193 199

  a Construct a parallel boxplot to show the data.
  b Use the boxplot to compare the distributions of height for the 3 classes.

2 The amounts of money contributed annually to superannuation schemes by people in 3 different age groups are shown below.

  20–29  2000, 3100, 5000, 5500, 6200, 6500, 6700, 7000, 9200, 10 000
  30–39  4000, 5200, 6000, 6300, 6800, 7000, 8000, 9000, 10 300, 12 000
  40–49  10 000, 11 200, 12 000, 13 300, 13 500, 13 700, 13 900, 14 000, 14 300, 15 000

  a Construct a parallel boxplot to show the data.
  b Use the boxplot to comment on the distributions.


3 The numbers of jars of vitamin A, B, C and multi-vitamins sold per week by a local chemist are shown below.

  Vitamin A       5 6 7 7 8 8 9 11 13 14
  Vitamin B       10 10 11 12 14 15 15 15 17 19
  Vitamin C       8 8 9 9 9 10 11 12 12 13
  Multi-vitamins  12 13 13 15 16 16 17 19 19 20

  a Construct a parallel boxplot to display the data.
  b Use the boxplot to compare the distributions of sales for the 4 types of vitamin.

4 The ages of the employees at 5 different companies of the same size are compared using the parallel boxplots shown below.

  [Figure: parallel boxplots for Company A, Company B, Company C, Company D and Company E on a common age scale from 20 to 60.]

  For each of the following, select from:
  A company A  B company B  C company C  D company D  E company E

  a Which company has the greatest range of ages?
  b Which company has the greatest interquartile range of ages?
  c Which company has the lowest median age?
  d Which company has the greatest range of ages among their oldest 25% of employees?


Summary

Measures of central tendency

• The mean is given by $\bar{x} = \frac{\sum x}{n}$, where $\sum x$ represents the sum of all observations in the data set and $n$ represents the number of observations in the data set.
• The mean is calculated by using the values of the observations, and because of this it becomes a less reliable measure of the centre of the distribution when the distribution is skewed or contains an outlier.
• To find the mean for grouped data, use $\bar{x} = \frac{\sum (f \times m)}{\sum f}$, where $f$ represents the frequency of the data and $m$ represents the midpoint of the class interval of the grouped data.
• The median is the midpoint of a set of data. Half the data are less than or equal to the median. Where there are $n$ observations in a set of ordered data, the median is located at the $\left(\frac{n+1}{2}\right)$th position.
• The mode is the score in the data set with the highest frequency.

Range and interquartile range

• The range of a data set is the difference between the highest and lowest values in that data set.
• The interquartile range is IQR = Q3 − Q1.
• There are four steps to locating Q1 and Q3.
  Step 1. Write down the data set in order from lowest to highest.
  Step 2. Locate the median; that is, locate Q2.
  Step 3. Now consider the lower half of the data set. Find the middle score. This score is Q1.
  Step 4. Now consider the upper half of the data set. Find the middle score. This score is Q3.
• The values of the median as well as Q1 and Q3 can be estimated by using an ogive.

Standard deviation

• The standard deviation of a group of scores can be found using the formula $\sigma = \sqrt{\frac{\sum f(x_i - \bar{x})^2}{n}}$ or by using a calculator.
• The standard deviation of a population can be predicted from a sample of scores by using the formula $s = \sqrt{\frac{\sum f(x_i - \bar{x})^2}{n - 1}}$ or by using a calculator.

Boxplots

• A five number summary is a list consisting of the lowest score, lower quartile, median, upper quartile and the greatest score (in that order) of the data.


• A boxplot is a graph of the five number summary.
• The boxplot is a powerful tool to show the spread of the data.
• Boxplots are always drawn to scale.
• The box spans the interquartile range; the median is marked by the vertical line inside the box; the whiskers extend to the lowest and greatest scores.
• The extreme values can be denoted by a small cross; the whiskers are then shortened to the next largest (or smallest) value.

Comparing sets of data

• Back-to-back stem plots
  1. are useful to compare the distribution of two similar sets of data
  2. share the same stem
  3. contain a key, which usually relates to data on the right
  4. have the data on the left arranged outwards from the plot as it increases.
• Parallel boxplots
  1. are useful for quantitative comparisons
  2. share a common scale
  3. compare two or more sets of data.
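The formulas for the mean, the grouped mean and the standard deviation listed in this summary can be checked with a short program. The following is a minimal sketch only: the function names are illustrative and not part of the chapter, the scores 4, 7, 9, 12, 18 are the set used when the mean is introduced at the start of the chapter, and the grouped data are made up for the example. The standard deviation is computed on raw scores, which is the summary formula with every frequency f equal to 1.

```python
from collections import Counter
from math import sqrt


def mean(scores):
    """Sum of the observations divided by the number of observations."""
    return sum(scores) / len(scores)


def grouped_mean(midpoints, frequencies):
    """Sum of (f x m) divided by the sum of f, for grouped data."""
    total = sum(f * m for f, m in zip(frequencies, midpoints))
    return total / sum(frequencies)


def median(scores):
    """Middle score of the ordered data, at position (n + 1) / 2."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    # For an even number of scores the position falls between two scores,
    # so the two middle scores are averaged.
    return (ordered[middle - 1] + ordered[middle]) / 2


def modes(scores):
    """All scores that share the highest frequency."""
    counts = Counter(scores)
    highest = max(counts.values())
    return sorted(s for s, c in counts.items() if c == highest)


def standard_deviation(scores, sample=False):
    """Divide by n for a group of scores, or by n - 1 for a sample."""
    n = len(scores)
    centre = mean(scores)
    squared_deviations = sum((x - centre) ** 2 for x in scores)
    return sqrt(squared_deviations / (n - 1 if sample else n))


scores = [4, 7, 9, 12, 18]
print(mean(scores))                                       # 10.0
print(median(scores))                                     # 9
print(max(scores) - min(scores))                          # range: 14
print(round(standard_deviation(scores), 2))               # 4.77
print(round(standard_deviation(scores, sample=True), 2))  # 5.34

# Made-up grouped data: class midpoints 11 and 14 with frequencies 1 and 3.
print(grouped_mean([11, 14], [1, 3]))                     # 13.25
```

A statistics calculator should give the same values; dividing by n − 1 rather than n always gives the slightly larger result.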

[Figure: a labelled boxplot. From left to right, the labels indicate the lowest score, the lower quartile, the median, the upper quartile and the greatest score.]
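The four steps for locating Q1 and Q3 lead directly to the five number summary that a boxplot displays. The sketch below is one possible way to code those steps. It assumes that, when the number of scores is odd, the median itself is left out of both halves; the function names are illustrative, and the data are the scores 4, 7, 9, 12, 18 used at the start of the chapter.

```python
def middle_score(ordered):
    """Median of a list that is already in order."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def five_number_summary(scores):
    """Return (lowest score, Q1, median, Q3, greatest score)."""
    ordered = sorted(scores)                # Step 1: order the data
    n = len(ordered)
    q2 = middle_score(ordered)              # Step 2: locate the median (Q2)
    half = n // 2
    lower_half = ordered[:half]             # Step 3: lower half, median excluded
    upper_half = ordered[half + n % 2:]     # Step 4: upper half, median excluded
    q1 = middle_score(lower_half)
    q3 = middle_score(upper_half)
    return ordered[0], q1, q2, q3, ordered[-1]


lowest, q1, q2, q3, greatest = five_number_summary([4, 7, 9, 12, 18])
print(lowest, q1, q2, q3, greatest)   # 4 5.5 9 15.0 18
print("IQR =", q3 - q1)               # IQR = 9.5
print("Range =", greatest - lowest)   # Range = 14
```

Calling the function once for each data set gives the values needed to draw parallel boxplots on a common scale.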



Chapter review

1 Calculate the mean of each of the following sets of scores.
a 4, 9, 5, 3, 5, 6, 2, 7, 1, 10
b 65, 67, 87, 45, 90, 92, 50, 23
c 7.2, 7.9, 7.0, 8.1, 7.5, 7.5, 8.7
d 5, 114, 23, 12, 25

2 Complete the frequency distribution table below and use it to estimate the mean of the distribution.

Class    Class centre (x)    Frequency (f)
21–24                        3
25–28                        9
29–32                        17
33–36                        31
37–40                        29
41–44                        25
45–48                        19
49–52                        10
                             Σf =

3 Use the statistics function on your calculator to find the mean of each of the following sets of scores.
a 2, 18, 26, 121, 96, 32, 14, 2, 0, 0
b 2, 2, 12, 12, 12, 32, 32, 47, 58
c 0.2, 0.3, 0.6, 0.4, 0.3, 0.7, 0.8, 0.6, 0.5, 0.4, 0.1

4 Use the statistics function on your calculator to find the mean of the following distributions. Where necessary, give your answers correct to 1 decimal place.

a
Score    Frequency
10       23
20       47
30       68
40       56
50       17

b
Score    Frequency
24       45
25       89
26       124
27       102
28       78
29       46

c
Class    Class centre    Frequency
10–12    11              18
13–15    14              32
16–18    17              34
19–21    20              40
22–24    23              28
25–27    26              14
28–30    29              6

5 For each of the following sets of scores, find the median.
a 25, 26, 26, 27, 27, 28, 30, 32, 35
b 4, 5, 8, 5, 8, 6, 7, 10, 4, 8, 4
c 3.2, 3.1, 3.0, 3.5, 3.2, 3.2, 3.2, 3.6
d 2, 3, 7, 4, 4, 8, 5, 7, 7, 6
e 121, 135, 111, 154, 147, 165, 101, 108

6 Copy and complete each of the following frequency tables and then use them to find the median.

a
Score    Frequency    Cumulative frequency
0        2
1        6
2        11
3        7
4        6
5        3

b
Score    Frequency    Cumulative frequency
54       2
55       5
56       14
57       11
58       6
59       1
60       1

c
Score    Frequency    Cumulative frequency
66       8
67       10
68       12
69       14
70       7
71       5
72       4

7 Use the frequency table below to state the modal class.

Class    Class centre    Frequency
30–33    31.5            12
34–37    35.5            26
38–41    39.5            34
42–45    43.5            45
46–49    47.5            52
50–53    51.5            23

8 Below are the numbers of goals scored by a netball team in ten matches in a tournament.
25 26 19 24 28 67 21 22 28 18
a Calculate the mean.
b Calculate the median.
c Calculate the mode.
d Which of the above is the best summary statistic? Explain your answer.

9 Give an example of a statistical analysis where the best summary statistic is:
a the mean
b the median
c the mode.

10 Find the range of each of the following sets of scores.
a 28 24 26 24 25 29 22 27 25
b 118 2 56 45 72 43 69 84 159 0
c 1.9 0.7 0.5 0.8 1.1 1.5 1.4

11 For each of the data sets in question 10 calculate:
i the median
ii the lower quartile
iii the upper quartile
iv the interquartile range.

12 The cumulative frequency histogram and polygon below show the number of goals scored by a soccer team in each match over a season.

[Figure: cumulative frequency histogram and polygon. Horizontal axis: number of goals scored in a soccer match, 0 to 4. Vertical axis: cumulative frequency, 0 to 45 in steps of 5.]

Use the graph to calculate:
a the median
b the lower and upper quartiles
c the interquartile range.

13 The cumulative frequency histogram and polygon below show the number of apples on each tree in an orchard.

[Figure: cumulative frequency histogram and polygon. Horizontal axis: number of apples on a tree, marked at 34.5, 44.5, 54.5, 64.5, 74.5 and 84.5. Vertical axis: cumulative frequency, 0 to 300 in steps of 50.]

Use the graph to find:
a the median
b the lower and upper quartiles
c the interquartile range.

14 The following scores were recorded: 2, 6, 9, 1, 3, 5, 6, 7, 8, 2. For these data find:
a the mean
b the median
c the mode
d the range
e the interquartile range
f the standard deviation.

15
Score    Frequency    Score    Frequency
30–39    3            60–69    15
40–49    6            70–79    18
50–59    12           80–89    10

For the above set of data find:
a the mean
b the median group
c the modal group
d the range
e the standard deviation.



16 For the box-and-whisker plot drawn below:

[Figure: box-and-whisker plot drawn on a scale from 0 to 60, marked in steps of 5.]

a state the median
b calculate the range
c calculate the interquartile range.

17 The number of babies born each day at a hospital over a year is tabulated and the five number summary is given below.

Lower extreme = 1
Lower quartile = 8
Median = 14
Upper quartile = 16
Upper extreme = 18

Show this information in a box-and-whisker plot.

18 The numbers of hours of counselling received by a group of 9 full-time firefighters and 9 volunteer firefighters after a serious bushfire are given below.

Full-time    2   4   3   5   2   4   6   1   3
Volunteer    8  10  11  11  12  13  13  14  15

a Construct a back-to-back stem plot to display the data.
b Comment on the distributions of the number of hours of counselling of the full-time firefighters and the volunteers.

19 The IQs of 8 players in each of 3 different football teams were recorded and are shown below.

Team A    120  105  140  116   98  105  130  102
Team B    110  104  120  109  106   95  102  100
Team C    121  115  145  130  120  114  116  123

Display the data in parallel boxplots.

