10 Summary statistics

Syllabus reference
Topic:
• Applied statistical analysis

In this chapter
10A Measures of central tendency
10B Range and interquartile range
10C The standard deviation
10D Boxplots
10E Back-to-back stem plots
10F Parallel boxplots
MQ Maths B Yr 11 Ch 10 Page 413 Friday, October 26, 2001 10:33 AM
414 M a t h s Q u e s t M a t h s B Ye a r 1 1 f o r Q u e e n s l a n d
Introduction
Yvonne works in the quality control department of a soft drink bottling company. Soft drink is bottled by two machines, each of which is set to pour one litre of soft drink into every bottle. It is part of Yvonne’s job to take 20 bottles of soft drink from each machine and measure the contents in millilitres. The results she obtained from one such check are shown below.
Yvonne must use these results to assess if the machines are sufficiently accurate in dispensing soft drink into the bottles.
One of the main tasks of a statistician is to summarise large volumes of data. It is useful to find one score that is typical of a whole set of data, or a few figures which can describe its distribution.
Finding the mean, median and mode are three different methods of arriving at a score that is typical or central to the data set. Mean, median and mode are often called measures of central tendency.
Machine A 1009 992 990 1018 1017 985 984 1008 1020 1005
992 983 1020 988 996 984 989 1014 995 1004
Machine B 1002 991 990 980 1004 1018 1008 997 992 999
1010 1004 1001 1003 1009 1004 1006 1001 997 994
Everybody is different. How do we measure what is typical?
C h a p t e r 1 0 S u m m a r y s t a t i s t i c s 415
Measures of central tendency

The mean
The mean of a set of data is what is referred to in everyday language as the average.
For the set of data 4 7 9 12 18:

\[ \text{mean} = \frac{4 + 7 + 9 + 12 + 18}{5} = 10 \]

The symbol we use to represent the mean is \(\bar{x}\), that is, a lower-case x with a bar on top. So, in this case, \(\bar{x} = 10\).
The formal definition of the mean is:

\[ \bar{x} = \frac{\sum x}{n} \]

where Σx represents the sum of all of the observations in the data set and n represents the number of observations in the data set.
Note that the symbol, Σ, is the Greek letter, sigma, which represents ‘the sum of’.
The mean is also referred to as a summary statistic and is a measure of the centre of a distribution. The mean is the point about which the distribution ‘balances’.
Consider the masses of 7 potatoes, given in grams, below.

100 120 130 145 160 170 190

The mean is 145 g. The observations 130 and 160 ‘balance’ each other since they are each 15 g from the mean. Similarly, the observations 120 and 170 ‘balance’ each other since they are each 25 g from the mean, as do the observations 100 and 190. Note that the median is also 145 g. That is, for this set of data the mean and the median give the same value for the centre. This is because the distribution is symmetric.
Now consider two cases in which the distribution of data is not symmetric.

Case 1
Consider the masses of a different set of 7 potatoes, given in grams below.

100 105 110 115 120 160 200

The median of this distribution is 115 g and the mean is 130 g. There are 5 observations that are less than the mean and only 2 that are more. In other words, the mean does not give us a good indication of the centre of the distribution. However, there is still a ‘balance’ between observations below the mean and those above, in terms of the spread of all the observations from the mean. Therefore, the mean is still useful to give a measure of the central tendency of the distribution but in cases where the distribution is skewed, the median gives a better indication of the centre. For a positively skewed distribution, as in the case above, the mean will be greater than the median. For a negatively skewed distribution the mean will be less than the median.
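The comparison of mean and median for the two sets of potatoes can be checked with a short Python sketch. The function names `summarise` and `balance` are illustrative choices, not part of the text; `balance` adds up the signed distances from the mean, which come to zero whenever the data ‘balance’ about it.

```python
from statistics import mean, median

# Masses in grams, taken from the two potato examples in the text.
symmetric = [100, 120, 130, 145, 160, 170, 190]
skewed = [100, 105, 110, 115, 120, 160, 200]


def summarise(masses):
    """Return (mean, median) for a list of masses."""
    return mean(masses), median(masses)


def balance(masses):
    """Sum of signed deviations from the mean; zero when the data 'balance'."""
    centre = mean(masses)
    return sum(x - centre for x in masses)


print(summarise(symmetric))  # mean and median agree for the symmetric set
print(summarise(skewed))     # mean is pulled above the median by 160 and 200
print(balance(skewed))       # deviations still cancel out about the mean
```

For the skewed set the mean is larger than the median, which is the pattern described for a positively skewed distribution.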
Case 2
Consider the data below, showing the weekly income (to the nearest $10) of 10 families living in a suburban street.

$300 $670 $680 $690 $700 $710 $710 $720 $730 $750

In this case, \(\bar{x} = \frac{\$6660}{10} = \$666\), and the median is $705.
One of the values in this set, $300, is clearly an outlier. As a result, the value of the mean is below the weekly income of the other 9 households. In such a case the mean is not very useful in establishing the centre; however, the ‘balance’ still remains for this negatively skewed distribution.
The mean is calculated by using the values of the observations and because of this it becomes a less reliable measure of the centre of the distribution when the distribution is skewed or contains an outlier. Because the median is based on the order of the observations rather than their value, it is a better measure of the centre of such distributions.

Calculating the mean using a graphics calculator

Worked example 1
(CASIO: WE 10-1 Mean)
Calculate the mean of the set of data below.
10, 12, 15, 16, 18, 19, 22, 25, 27, 29

THINK
1 Enter the data in L1. (Press STAT and select 1:Edit.)
2 Calculate the mean.
(a) Press STAT.
(b) Highlight CALC in the top line.
(c) Highlight 1:1–Var Stats and press ENTER.
(d) Press 2nd [L1] and press ENTER.
(e) Several values are given. The top entry, \(\bar{x} = 19.3\), gives us the mean.

WRITE/DISPLAY
\(\bar{x} = 19.3\)

When data are presented in a frequency table with class intervals and we don’t know what the raw data are, we employ another method to find the mean of these grouped data. This other method is shown in the example that follows and uses the midpoints of the class intervals to represent the raw data.
Remember that the Greek letter sigma, Σ, represents ‘the sum of’. So, Σf means the sum of the frequencies and is the total of all the numbers in the frequency column.
To find the mean for grouped data,

\[ \bar{x} = \frac{\sum (f \times m)}{\sum f} \]

where f represents the frequency of the data and m represents the midpoint of the class interval of the grouped data.
Worked example 2
The ages of a group of 30 people attending a superannuation seminar are recorded in the frequency table below.

Age (class intervals)   Frequency, f
20–29                    1
30–39                    6
40–49                   13
50–59                    6
60–69                    3
70–79                    1

Calculate the mean age of those attending the seminar.

THINK
1 Since we don’t have individual raw ages, but rather a class interval, we need to decide on one particular age to represent each interval. We use the midpoint, m, of the class interval. Add an extra column to the table to display these. The midpoint of the first interval is \(\frac{20 + 29}{2} = 24.5\), the midpoint of the second interval is 34.5 and so on.
2 Multiply each of the midpoints by the frequency and display these values in another column headed f × m. For the first interval we have 24.5 × 1 = 24.5. For the second interval we have 34.5 × 6 = 207 and so on.
3 Sum the product of the midpoints and the frequencies in the f × m column.
24.5 + 207 + 578.5 + 327 + 193.5 + 74.5 = 1405
4 Divide this sum by the total number of people attending the seminar.

WRITE

Age (class intervals)   Frequency, f   Midpoint of class interval, m   f × m
20–29                    1             24.5                             24.5
30–39                    6             34.5                            207
40–49                   13             44.5                            578.5
50–59                    6             54.5                            327
60–69                    3             64.5                            193.5
70–79                    1             74.5                             74.5
                        Σf = 30                                        Σ(f × m) = 1405

So, \(\bar{x} = \frac{1405}{30} \approx 46.8\) (correct to 1 decimal place)
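The grouped-data method can be sketched in Python as follows. The function name `grouped_mean` and the way each class is stored, as a pair of bounds with its frequency, are assumptions made for this illustration; the numbers are those of the seminar table.

```python
def grouped_mean(classes):
    """Estimate the mean of grouped data.

    classes is a list of ((lower, upper), frequency) pairs. Each class is
    represented by its midpoint m, and the result is sum(f * m) / sum(f).
    """
    total_f = 0
    total_fm = 0.0
    for (lower, upper), f in classes:
        m = (lower + upper) / 2   # midpoint of the class interval
        total_f += f
        total_fm += f * m
    return total_fm / total_f


# Ages of the 30 people attending the superannuation seminar.
ages = [
    ((20, 29), 1),
    ((30, 39), 6),
    ((40, 49), 13),
    ((50, 59), 6),
    ((60, 69), 3),
    ((70, 79), 1),
]

print(round(grouped_mean(ages), 1))  # 46.8
```

Because every score in a class is replaced by the class midpoint, the result is an estimate of the mean rather than the exact mean of the raw ages.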
Solution to worked example 2 using a graphics calculator
(CASIO: WE 10-2 Mean)

THINK
1 Since we don’t have individual raw ages, but rather a class interval, we need to decide on one particular age to represent each interval. We use the midpoint, m, of the class interval. The midpoint of the first interval is \(\frac{20 + 29}{2} = 24.5\). The midpoint of the second interval is 34.5 and so on. Enter these midpoints as list 1.
2 Enter the frequency of each of the class intervals as list 2. For example, L1(3)=44.5 and L2(3)=13.
3 Calculate the mean.
(a) Press STAT.
(b) Highlight CALC in the top line.
(c) Highlight 1:1–Var Stats and press ENTER.
(d) Enter L1 and L2 by pressing 2nd [L1], the comma key and 2nd [L2]; then pressing ENTER.
A number of values is given. The top entry, \(\bar{x} = 46.8\), gives us the mean.

DISPLAY
\(\bar{x} = 46.8\)

The median
The median is the midpoint of a set of data. Half the data are less than or equal to the median.
Consider the set of data: 2 5 6 8 11 12 15. These data are in ordered form (that is, from lowest to highest). There are 7 observations. The median in this case is the middle or fourth score; that is, 8.
Consider the set of data: 1 3 5 6 7 8 8 9 10 12. These data are in ordered form also; however, in this case there is an even number of scores; that is, there are 10 scores. The median in this case lies halfway between the 5th score (7) and the 6th score (8). So the median is 7.5. (Alternatively, median = \(\frac{7 + 8}{2} = 7.5\).)
When there are n records in a set of ordered data, the median can be located at the \(\frac{n + 1}{2}\)th position.
Checking this against our previous example, we have n = 10; that is, there were 10 observations in the set. The median was located at the \(\frac{10 + 1}{2} = 5.5\)th position; that is, halfway between the 5th and the 6th terms.
A stem plot provides a quick way of locating a median since the data in a stem plot are already ordered.

Mode
There are many examples where neither the mean nor the median is the appropriate measure of the typical score in a data set.
Consider the case of a clothing store. It needs to re-order a supply of dresses. To know what sizes to order it looks at past sales of this particular style and gathers the following data:

8 12 14 12 16 10 12 14 16 18
14 12 14 12 12 8 18 16 12 14

For this data set the mean dress size is 13.2. Dresses are not sold in size 13.2, so this has very little meaning. The median is 13, which also has little meaning as dresses are sold only in even-numbered sizes.
What is most important to the clothing store is the dress size that sells the most. In this case size 12 occurs most frequently. The score that has the highest frequency is called the mode.
When two scores share the highest frequency, both scores are given as the mode. In this situation the distribution is bimodal. If all scores occur an equal number of times, then the distribution has no mode.
To find the mode from a frequency distribution table, we simply give the score that has the highest frequency.
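The position rule for the median and the conventions for the mode can be sketched in Python. The function names `median_by_position` and `modes` are illustrative only. `modes` returns a list so that bimodal data give two scores, and it returns an empty list when every score occurs equally often, following the convention stated above.

```python
from collections import Counter


def median_by_position(scores):
    """Locate the median at the (n + 1) / 2 th position of the ordered data."""
    ordered = sorted(scores)
    n = len(ordered)
    position = (n + 1) / 2                     # 1-based, may end in .5
    below = ordered[int(position) - 1]         # score at or just below the position
    above = ordered[int(position + 0.5) - 1]   # score at or just above the position
    return (below + above) / 2


def modes(scores):
    """Return the score(s) with the highest frequency, or [] if all scores tie."""
    counts = Counter(scores)
    highest = max(counts.values())
    top = sorted(s for s, c in counts.items() if c == highest)
    if len(counts) > 1 and len(top) == len(counts):
        return []                              # every score equally frequent: no mode
    return top


dress_sizes = [8, 12, 14, 12, 16, 10, 12, 14, 16, 18,
               14, 12, 14, 12, 12, 8, 18, 16, 12, 14]

print(median_by_position(dress_sizes))  # 13.0
print(modes(dress_sizes))               # [12]
```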
Worked example 3
Consider the stem plot below which contains 22 observations. What is the median?

Key: 3|4 = 34
Stem | Leaf
   2 | 3 3
   2 | 5 7 9
   3 | 1 3 3 4 4
   3 | 5 8 9 9
   4 | 0 2 2
   4 | 6 8 8 8 9

THINK
1 Find the median position, where n = 22.
2 Find the 11th and 12th terms.
3 The median is halfway between the 11th and 12th terms.

WRITE
Median = \(\frac{n + 1}{2}\)th position
= \(\frac{22 + 1}{2}\)th position
= 11.5th position
11th term = 35
12th term = 38
Median = 36.5
When a table is presented using grouped data, we do not have a single mode. In these cases, the class with the highest frequency is called the modal class.

Worked example 4
For the frequency distribution below, state the mode.

Score   Frequency
14       3
15       6
16      11
17      14
18      10
19       7

THINK
The highest frequency is 14 which belongs to the score 17 and so 17 is the mode.

WRITE
Mode = 17
remember
1. The mean is given by \(\bar{x} = \frac{\sum x}{n}\) where Σx represents the sum of all the observations in the data set and n represents the number of observations in the data set.
2. The mean is calculated by using the values of the observations and because of this it becomes a less reliable measure of the centre of the distribution when the distribution is skewed or contains an outlier.
3. To find the mean for grouped data, \(\bar{x} = \frac{\sum (f \times m)}{\sum f}\) where f represents the frequency of the data and m represents the midpoint of the class interval of the grouped data.
4. The median is the midpoint of a set of data. Half the data are less than or equal to the median.
5. When there are n observations in a set of ordered data, the median can be located at the \(\frac{n + 1}{2}\)th position.
6. The mode is the score with the highest frequency.
10A Measures of central tendency

A graphics calculator may be used for this exercise.
Resources: Mathcad, One-variable statistics; EXCEL Spreadsheet, One-variable statistics; GC program, UV stats.

1 (WORKED Example 1) Find the mean of each of the following sets of data.
a 5 6 8 8 9
b 3 4 4 5 5 6 7 7 7 8 8 9 9 10 10 12
c 4.3 4.5 4.7 4.9 5.1 5.3 5.5 5.6
d 11 13 15 15 16 18 20 21 22
e 0.4 0.5 0.7 0.8 0.8 0.9 1.0 1.1 1.2 1.0 1.3

2 Calculate the mean of each of the following and explain whether or not it gives us a good picture of the centre of the data.
a 0.7 0.8 0.85 0.9 0.92 2.3
b 14 16 16 17 17 17 19 20
c 23 24 28 29 33 34 37 39
d 2 15 17 18 18 19 20

3 The number of people attending sculpture classes at the local TAFE college for each week during the first semester is given below.

15 12 15 11 14 8 14 15 11 10
7 11 12 14 15 14 15 9 10 11

What is the mean number of people attending each week? (Express your answer to the nearest whole number.)

4 (multiple choice) The ages of a group of junior pilots joining an international airline are indicated on the stem plot below.

Key: 2|1 = 21 yrs
Stem | Leaf
   2 | 1
   2 | 2
   2 | 4 5
   2 | 6 6 7
   2 | 8 8 8 9
   3 | 0 1 1
   3 | 2 3
   3 | 4 4
   3 | 6
   3 | 8

The mean age of this group of pilots is:
A 20
B 28
C 29
D 29.15
E 29.5

5 (multiple choice) The number of people present each week at a 15-week horticultural course is given by the stem plot below.

Key: 2|4 = 24 people
Stem | Leaf
   0 | 4
   0 | 7
   1 | 2 4
   1 | 5 5 6 7 8
   2 | 1 2 4
   2 | 7 7 7

The mean number of people attending each week was closest to:
A 17.7
B 18
C 19.5
D 20
E 21.2
6 For each of the following, write down whether the mean or the median would provide a better indication of the centre of the distribution.
a A positively skewed distribution
b A symmetric distribution
c A distribution with an outlier
d A negatively skewed distribution

7 (WORKED Example 2) Find the mean of each set of data given below.
Resource: EXCEL Spreadsheet, One-variable statistic with intervals.

a
Class interval   Frequency, f
0–9               1
10–19             3
20–29             6
30–39            17
40–49            12
50–59             5

b
Class interval   Frequency, f
0–4               2
5–9               5
10–14             7
15–19            13
25–29             8
30–34             6

c
Class interval   Frequency, f
0–49              2
50–99             7
100–149           8
150–199          14
200–249          12
250–299           5

d
Class interval   Frequency, f
1–6              14
7–12             19
13–18            23
19–24            22
25–30            20
31–36            14

8 The ages of people attending a beginner’s course in karate are indicated in the frequency table below.

Age     Frequency, f
10–14    5
15–19    5
20–24    7
25–29    4
30–34    3
35–39    2
40–44    2
45–49    1

What is the mean age of those attending the course? (Express your answer correct to the nearest whole number.)
9 (WORKED Example 3) Write down the median of the sets of data shown in the following stem plots. The key for each stem plot is 3 | 4 = 34.
Resources: GC program, UV stats; Mathcad, One-variable statistics; EXCEL Spreadsheet, One-variable statistics.

a
Stem | Leaf
   0 | 7
   1 | 2 3
   2 | 2 4 5 7 9
   3 | 0 2 3 6 8 8
   4 | 4 7 8 9 9
   5 | 2 7 8
   6 | 1 3

b
Stem | Leaf
   0 | 0 0 1 1
   0 | 2 2 3 3
   0 | 4 4 5 5 5 5 5 5 5 5
   0 | 6 6 6 6 7
   0 | 8 8 8 9
   1 | 0 0 1
   1 | 3 3
   1 | 5 5
   1 | 7
   1 |

c
Stem | Leaf
   0 | 1
   0 | 2
   0 | 4 4 5
   0 | 6 6 6 7
   0 | 8 8 8 8 9 9
   1 | 0 0 0 1 1 1 1
   1 | 2 2 2 3 3 3
   1 | 4 4 5 5
   1 | 6 7 7
   1 | 8 9

d
Stem | Leaf
   3 | 1
   3 |
   3 |
   3 | 6
   3 | 8 9
   4 | 0 0 1 1 1
   4 | 2 2 3 3 3 3
   4 | 4 5 5 5
   4 | 6 7
   4 | 9

e
Stem | Leaf
  60 | 2 5 8
  61 | 1 3 3 6 7 8 9
  62 | 0 1 2 4 6 7 8 8 9
  63 | 2 2 4 5 7 8
  64 | 3 6 7
  65 | 4 5 8
  66 | 3 5
  67 | 4

10 For each of the following sets of data, write down the median.
a 2 4 6 7 9
b 12 15 17 19 21
c 3 4 5 6 7 8 9
d 3 5 7 8 12 13 15 16
e 12 13 15 16 18 19 21 23 24 26
f 3 8 4 2 1 6 5
g 16 21 14 28 23 15 11 19 25
h 7 4 3 4 9 5 10 4 2 11
i 29 23 22 33 26 18 37 22 16

11 Find the mode for each of the following. (Hint: Some are bimodal and others have no mode.)
a 16, 17, 19, 15, 17, 19, 14, 16, 17
b 147, 151, 148, 150, 148, 152, 151
c 2, 3, 1, 9, 7, 6, 8
d 68, 72, 73, 72, 72, 71, 72, 68, 71, 68
e 2.6, 2.5, 2.9, 2.6, 2.4, 2.4, 2.3, 2.5, 2.6
12 (WORKED Example 4) Use the tables below to state the mode of the distribution.

a
Score   Frequency
1        2
2        4
3        5
4        6
5        3

b
Score   Frequency
5        1
6        3
7        5
8        8
9        5
10       3

c
Score   Frequency
38       2
39       4
40       1
41       5
42       6
43       3
44       6
45       2

13 For each of the following grouped distributions, state the modal class.

a
Class    Frequency
1–4       6
5–8      12
9–12     30
13–16    23
17–20    46
21–24    27
25–28     9

b
Class    Frequency
1–7       3
8–14      8
15–21     9
22–28    25
29–35    12
36–42    11
43–49     2

14 The following data give the age of 25 patients admitted to the emergency ward of a hospital.

18 16 6 75 24
23 82 74 25 21
43 19 84 72 31
74 24 20 63 79
80 20 23 17 19

a Represent the data in a frequency distribution table. (Use classes 0–14, 15–29, 30–44, etc.)
b Find the mean age of patients admitted.
c Find the median class of age of patients admitted.
d Find the modal class for age of patients admitted.
e Do any of your statistics (mean, median or mode) give a clear representation of the typical age of an emergency ward patient?
f Give some reasons that could explain the pattern of the distribution of data in this question.
15 The batting scores for two cricket players over six innings are as follows:

Player A 31, 34, 42, 28, 30, 41
Player B 0, 0, 1, 0, 250, 0

a Find the mean score for each player.
b Which player appears to be better if the mean result is used?
c Find the median score for each player.
d Which player appears to be better when the decision is based on the median result?
e Which player do you think would be more useful to have in a cricket team and why? How can the mean result sometimes lead to a misleading conclusion?
Mean and median amount of soft drink
Remember Yvonne’s quality control check? We are going to calculate the mean and median amount of soft drink dispensed by each machine into each bottle.

1 Use your graphics calculator to store the data for Machine A as a list. Name the list MA. The data for Machine A were

1009 992 990 1018 1017 985 984 1008 1020 1005
992 983 1020 988 996 984 989 1014 995 1004

2 Use your graphics calculator to store the data for Machine B as a list. Name the list MB. The data for Machine B were

1002 991 990 980 1004 1018 1008 997 992 999
1010 1004 1001 1003 1009 1004 1006 1001 997 994

3 Use the statistics function of the graphics calculator to find the mean and median amount of soft drink dispensed into bottles by each machine.
4 At this stage can you say which machine most accurately dispenses soft drink into bottles?

Note:
1. If you do not have a graphics calculator, you can still calculate the mean and median of each data set.
2. If you do have a graphics calculator, it would be easier to enter the two sets of data under L1 and L2 (press STAT and select 1:Edit); however, if the data are stored as named lists MA and MB these can be retained for use later in the chapter.

WorkSHEET 10.1
CASIO: Softdrink
Range and interquartile range
We have now looked at measures of central tendency, but although a set of scores may have the same mean, median and mode they still may be very different data sets. Consider the results obtained by two groups of 10 students on the same mathematics test.

Group A: 45, 46, 47, 48, 50, 50, 52, 53, 54, 55
Group B: 10, 20, 30, 40, 50, 50, 60, 70, 80, 90

In both groups the mean, median and mode mark is 50, but we can see that they are very different data sets. We can see that Group A has a very bunched group of scores but Group B’s scores are very spread out.
The range and interquartile range are examples of a measure of spread. These measures of spread help us analyse the spread of various data sets.

The range
The range is the easiest of this group of summary statistics to calculate. The range of a set of data is the difference between the highest and lowest values in that set.
It is usually not too difficult to locate the highest and lowest values in a set of data. Only when there is a very large number of observations might the job be made more difficult. In the example above that compared the results on a mathematics test by two groups, the range for Group A is found by subtracting the lowest score (45) from the highest score (55); that is, 55 − 45 = 10. Similarly, we can say that the range for Group B is 90 − 10 = 80. Statistically we can write the lowest score as minX and the highest score as maxX and so the range can be found using the formula

Range = maxX − minX.

The values of maxX and minX can be found using a graphics calculator as you will see later in the chapter.
While the range gives us some idea about the spread of the data it is not terribly informative since it gives us no idea of how the data are distributed between the highest and lowest values.
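As a quick check of the two test groups, the range can be computed with Python's built-in `max` and `min`. The function name `data_range` is an illustrative choice.

```python
from statistics import mean, median


def data_range(scores):
    """Range = highest value minus lowest value."""
    return max(scores) - min(scores)


group_a = [45, 46, 47, 48, 50, 50, 52, 53, 54, 55]
group_b = [10, 20, 30, 40, 50, 50, 60, 70, 80, 90]

# Both groups share the same centre but differ greatly in spread.
print(mean(group_a), median(group_a), data_range(group_a))
print(mean(group_b), median(group_b), data_range(group_b))
```

Both groups have a mean and median of 50, while the ranges are 10 and 80, which matches the description of Group A as bunched and Group B as spread out.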
The interquartile range
We have seen that the median divides a set of data in half. Similarly, quartiles divide a set of data in quarters. The symbols used to refer to these quartiles are Q1, Q2 and Q3.
The middle quartile, Q2, is the median.

The interquartile range IQR = Q3 − Q1

The interquartile range gives us the range of the middle 50% of values in our set of data.
There are four steps to locating Q1 and Q3.
Step 1. Write down the data in ordered form from lowest to highest.
Step 2. Locate the median; that is, locate Q2.
Step 3. Now consider just the lower half of the set of data. Find the middle score. This score is Q1.
Step 4. Now consider just the upper half of the set of data. Find the middle score. This score is Q3.
The four cases given below illustrate this method.
Case 1
Consider data containing the 6 observations: 3 6 10 12 15 21.
The data are already ordered. The median is 11.
Consider the lower half of the set, which is 3 6 10. The middle score is 6, so Q1 = 6.
Consider the upper half of the set, which is 12 15 21. The middle score is 15, so Q3 = 15.

Case 2
Consider a set of data containing the 7 observations: 4 9 11 13 17 23 30.
The data are already ordered. The median is 13.
Consider the lower half of the set, which is 4 9 11. The middle score is 9, so Q1 = 9.
Consider the upper half of the set, which is 17 23 30. The middle score is 23, so Q3 = 23.

Case 3
Consider a set of data containing the 8 observations: 1 3 9 10 15 17 21 26.
The data are already ordered. The median is 12.5.
Consider the lower half of the set, which is 1 3 9 10. The middle score is 6, so Q1 = 6.
Consider the upper half of the set, which is 15 17 21 26. The middle score is 19, so Q3 = 19.

Case 4
Consider a set of data containing the 9 observations: 2 7 13 14 17 19 21 25 29.
The data are already ordered. The median is 17.
Consider the lower half of the set, which is 2 7 13 14. The middle score is 10, so Q1 = 10.
Consider the upper half of the set, which is 19 21 25 29. The middle score is 23, so Q3 = 23.

A graphics calculator provides possibly the fastest way of locating quartiles and hence finding the value of the interquartile range.
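The four-step method can also be sketched in Python. The function names `quartiles` and `iqr` are illustrative. The sketch follows the convention used in the four cases: when the number of observations is odd, the median itself is left out of both halves. Other software may use a different quartile convention and give slightly different values, so this is one reading of the method rather than a universal definition.

```python
from statistics import median


def quartiles(scores):
    """Return (Q1, Q2, Q3) using the 'median of each half' method."""
    ordered = sorted(scores)                 # Step 1: order the data
    n = len(ordered)
    half = n // 2
    q2 = median(ordered)                     # Step 2: locate the median
    lower = ordered[:half]                   # Step 3: lower half
    # For odd n the middle score is excluded from the upper half as well.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]   # Step 4
    return median(lower), q2, median(upper)


def iqr(scores):
    """Interquartile range, IQR = Q3 - Q1."""
    q1, _, q3 = quartiles(scores)
    return q3 - q1


print(quartiles([3, 6, 10, 12, 15, 21]))              # Case 1
print(quartiles([4, 9, 11, 13, 17, 23, 30]))          # Case 2
print(quartiles([1, 3, 9, 10, 15, 17, 21, 26]))       # Case 3
print(quartiles([2, 7, 13, 14, 17, 19, 21, 25, 29]))  # Case 4
```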
Worked example 5
The ages of the patients who attended the casualty department of an inner suburban hospital on one particular afternoon are shown below.

14 3 27 42 19 17 73 60 62 21 23 2 5 58 33 19 81 59 25 17 69

Find the interquartile range of these data.

THINK
1 Order the data.
2 Find the median.
3 Find the middle score of the lower half of the data.
4 Find the middle score of the upper half of the data.
5 Calculate the interquartile range.

WRITE
2 3 5 14 17 17 19 19 21 23 25 27 33 42 58 59 60 62 69 73 81
The median is 25 since ten scores lie below it and ten lie above it.
For the scores 2 3 5 14 17 17 19 19 21 23, the middle score is 17. So, Q1 = 17.
For the scores 27 33 42 58 59 60 62 69 73 81, the middle score is 59.5. So, Q3 = 59.5.
IQR = Q3 − Q1 = 59.5 − 17 = 42.5
In most cases we are asked to find the interquartile range of a grouped distribution. This requires us to draw a cumulative frequency polygon and find the 25th and 75th percentiles.
A percentile is a measure of where in a set of scores an individual score lies. For example, the 25th percentile has 25% of scores below it and 75% above it.
To find the interquartile range, draw a second vertical axis that shows the 25th, 50th and 75th percentiles. A line is drawn from each of the 25th, 50th and 75th percentiles across to the ogive and then down to the horizontal axis. The values of the quartiles can then be read from the horizontal axis.
The median is the score that is found at the 50th percentile.
Worked example 6
(CASIO: WE 10-6)
Parents are often shocked at the amount of money their children spend. The data below give the amount spent (to the nearest whole dollar) by each child in a group that was taken on an excursion to the Exhibition.

15 12 17 23 21 19 16 11 17 18 23
24 25 21 20 37 17 25 22 21 19

Calculate the interquartile range for these data.

THINK
1 Enter the data.
(a) Press STAT.
(b) Select 1:Edit by pressing ENTER.
(c) Enter the data in L1.
2 Obtain the values of the quartiles.
(a) Press STAT.
(b) Select CALC.
(c) Select 1:1–Var Stats by pressing ENTER.
(d) Enter L1 (press 2nd [L1]). Press ENTER.
3 A list of statistics appears. We shall be using a number of these later. We are looking for the first and third quartiles. Scroll down the screen using the down arrow key.

DISPLAY
Q1 = 17 and Q3 = 23
So, IQR = 23 − 17 = 6
Worked example 7
The cumulative frequency histogram and polygon at right shows the number of customers who order different volumes of concrete from a readymix concrete company during a day.

[Figure: cumulative frequency histogram and polygon (ogive). Vertical axis: Cumulative frequency, 0 to 50. Horizontal axis ticks: 0.25, 0.75, 1.25, 1.75, 2.25, 2.75. Axis label as printed: Number of customers.]

Find the:
a median
b interquartile range for this distribution.

THINK
a 1 Draw a vertical axis showing the percentiles.
a 2 Draw a line for the 50th percentile to the ogive and estimate the median.
b 1 Draw a line for the 25th and 75th percentiles and estimate these values.
b 2 Calculate the interquartile range by subtracting the lower quartile from the upper quartile.

DISPLAY/WRITE
a [Figure: the same graph with a second vertical axis marked 0%, 25%, 50%, 75% and 100%.]
Median = 0.9
b Lower quartile = 0.4
Upper quartile = 1.6
Interquartile range = 1.6 − 0.4
= 1.2
remember
1. The range of a set of data is the difference between the highest and lowest values in that set.
2. The interquartile range IQR = Q3 − Q1.
3. The interquartile range gives us the range of the middle 50% of values in our set of data.
4. There are four steps to locating Q1 and Q3.
Step 1. Write down the set of data in ordered form from lowest to highest.
Step 2. Locate the median, that is, locate Q2.
Step 3. Now consider just the lower half of the set of data. Find the middle score. This score is Q1.
Step 4. Now consider just the upper half of the set of data. Find the middle score. This score is Q3.
5. The interquartile range of a grouped distribution is estimated from an ogive.
10B Range and interquartile range

Resources: EXCEL Spreadsheet, One-variable statistics; Mathcad, One-variable statistics; GC program, UV stats.

1 Write down the range of the sets of data shown in the following stem plots. The key for each stem plot is 3 | 4 = 34.

a
Stem | Leaf
   0 | 7
   1 | 2 3
   2 | 2 4 5 7 9
   3 | 0 2 3 6 8 8
   4 | 4 7 8 9 9
   5 | 2 7 8
   6 | 1 3

b
Stem | Leaf
   0 | 0 0 1 1
   0 | 2 2 3 3
   0 | 4 4 5 5 5 5 5 5 5 5
   0 | 6 6 6 6 7
   0 | 8 8 8 9
   1 | 0 0 1
   1 | 3 3
   1 | 5 5
   1 | 7
   1 |

c
Stem | Leaf
   0 | 1
   0 | 2
   0 | 4 4 5
   0 | 6 6 6 7
   0 | 8 8 8 8 9 9
   1 | 0 0 0 1 1 1 1
   1 | 2 2 2 3 3 3
   1 | 4 4 5 5
   1 | 6 7 7
   1 | 8 9

d
Stem | Leaf
   3 | 1
   3 |
   3 |
   3 | 6
   3 | 8 9
   4 | 0 0 1 1 1
   4 | 2 2 3 3 3 3
   4 | 4 5 5 5
   4 | 6 7
   4 | 9

e
Stem | Leaf
  60 | 2 5 8
  61 | 1 3 3 6 7 8 9
  62 | 0 1 2 4 6 7 8 8 9
  63 | 2 2 4 5 7 8
  64 | 3 6 7
  65 | 4 5 8
  66 | 3 5
  67 | 4

2 For each of the following sets of data, write down the range.
a 2 4 6 7 9
b 12 15 17 19 21
c 3 4 5 6 7 8 9
d 3 5 7 8 12 13 15 16
e 12 13 15 16 18 19 21 23 24 26
f 3 8 4 2 1 6 5
g 16 21 14 28 23 15 11 19 25
h 7 4 3 4 9 5 10 4 2 11
i 29 23 22 33 26 18 37 22 16
3 (WORKED Example 5; SkillSHEET 10.1)
a On the 9th of August, the number of cars that stopped at the drive-in area at a McBurger restaurant during each hour (from 7.00 am until 10.00 pm) is shown below.

14 18 8 9 12 24 25 15 18 25 24 21 25 24 14

Find the interquartile range of this set of data.
b At the nearby Kenny’s Fried Chicken restaurant on the same day, the number of cars stopping during each hour that it was open is shown below.

7 9 13 16 19 12 11 18 20 19 21 20 18 10

Find the interquartile range of these data.

4 Write down a set of data for which n = 5, the median is 6 and the range is 7.

5 Write down a set of data for which n = 8, the median is 7.5 and the range is 10.

6 (multiple choice) The quartiles for a set of data are calculated and found to be Q1 = 13, Q2 = 18, and Q3 = 25. Which of the following statements is true?
A The interquartile range of the data is 5.
B The interquartile range of the data is 7.
C The interquartile range of the data is 12.
D The median is 12.
E The median is 19.

It is recommended that a graphics calculator be used for questions 7 and 8.
Resource: EXCEL Spreadsheet, Interquartile range.

7 (WORKED Example 6) For each of the following sets of data find the median, the interquartile range and the range.
a
16 12 8 7 26 32 15 51 29 45
19 11 6 15 32 18 43 31 23 23
b
22 25 27 36 31 32 39 29 20 30
23 25 21 19 29 28 31 27 22 29
c
1.2 2.3 4.1 2.4 1.5 3.7 6.1 2.4 3.6 1.2
6.1 3.7 5.4 3.7 5.2 3.8 6.3 7.1 4.9

8 For each set of data shown on the stem plots, find the median, the interquartile range and the range.

a Key: 4|2 = 42
Stem | Leaf
   2 | 3 5 5 6 7 8 9 9
   3 | 0 2 2 3 4 6 6 7 8 8
   4 | 2 2 4 5 6 6 6 7 9
   5 | 0 3 3 5 6
   6 | 2 4
   7 | 5 9
   8 | 2
   9 | 7
  10 |
  11 | 4

b Key: 2|5 = 25
Stem | Leaf
   1 | 4
   1 |
   2 | 1 4
   2 | 5 7 8 8 9
   3 | 1 2 2 2 4 4 4 4
   3 | 5 5 5 6
   4 | 3 4
   4 |
9 (WORKED Example 7) The cumulative frequency histogram and polygon at right displays the results of a survey of 50 drivers who were asked about the number of speeding fines they have received.

[Figure: cumulative frequency histogram and polygon (ogive). Vertical axis: Cumulative frequency, 0 to 50 in steps of 5. Horizontal axis: No. of speeding fines received by drivers, 0 to 5.]

a Use the ogive to find the median of the distribution.
b Find the lower quartile.
c Find the upper quartile.
d Calculate the interquartile range.

10 The frequency distribution table below shows the result of a survey of 90 households who were asked about the number of times they had been the victim of crime.

Score   Frequency
0       26
1       31
2       22
3        8
4        3

a Add a column for cumulative frequency to the table.
b Draw a cumulative frequency histogram and polygon.
c Use your graph to find the median of the distribution.
d Calculate the interquartile range.

Range of soft drink amounts
Take another look at Yvonne’s quality control check. We have previously found two measures of central tendency for the two soft drink dispensing machines, but these alone are not enough to state which machine dispenses soft drink most accurately.
1 Find the range of amounts dispensed by each machine.
2 Find the interquartile range of amounts dispensed by each machine.
3 From these results, which machine appears to dispense soft drink most accurately?

The standard deviation
The standard deviation is the most sophisticated and also the most useful measure of spread.
The standard deviation σ can be calculated by using the following formula:

\[ \sigma = \sqrt{\frac{\sum f (x_i - \bar{x})^2}{n}} \]

To calculate the standard deviation by hand, use the following steps:
MQ Maths B Yr 11 Ch 10 Page 432 Friday, October 26, 2001 10:33 AM
C h a p t e r 1 0 S u m m a r y s t a t i s t i c s 433Step 1. Find the mean.Step 2. Find the difference between each piece of data and the mean.Step 3. Square the differences.Step 4. Add the squared differences.Step 5. Divide by the number of scores.Step 6. Take the square root.
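The six steps for finding a standard deviation by hand can also be sketched in code. The following is a minimal Python illustration, not part of the original text; the function name `population_sd` is an assumption made for this example, and the data are the eight packet counts used in worked example 8.

```python
from math import sqrt

def population_sd(scores):
    """Standard deviation with divisor n, following the six hand steps."""
    n = len(scores)
    mean = sum(scores) / n                      # Step 1: find the mean
    differences = [x - mean for x in scores]    # Step 2: difference from the mean
    squared = [d ** 2 for d in differences]     # Step 3: square the differences
    total = sum(squared)                        # Step 4: add the squared differences
    variance = total / n                        # Step 5: divide by the number of scores
    return sqrt(variance)                       # Step 6: take the square root

lollies = [14, 14, 13, 15, 16, 13, 14, 17]
print(round(population_sd(lollies), 4))  # 1.3229
```

Each line of the function corresponds to one step, so intermediate values can be printed and compared with a hand calculation.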
This algorithm is used to find the standard deviation in the following worked example.

Worked example 8
The following data give the number of lollies in each of 8 packets. Find the standard deviation of the data.
14, 14, 13, 15, 16, 13, 14, 17

THINK / WRITE
1. Find the mean.
   $\bar{x} = \frac{14 + 14 + 13 + 15 + 16 + 13 + 14 + 17}{8} = 14.5$
2. Find the difference between each score and the mean.
   Differences from mean: −0.5, −0.5, −1.5, 0.5, 1.5, −1.5, −0.5, 2.5
3. Square each difference.
   Squared differences: 0.25, 0.25, 2.25, 0.25, 2.25, 2.25, 0.25, 6.25
4. Add the squared differences.
   0.25 + 0.25 + 2.25 + 0.25 + 2.25 + 2.25 + 0.25 + 6.25 = 14
5. Divide by the number of scores.
   14 ÷ 8 = 1.75
6. Take the square root and round to 4 decimal places.
   $\sqrt{1.75} = 1.3229$
   The standard deviation σ = 1.3229.

Fortunately, you will not always have to go through this series of steps each time you wish to calculate a standard deviation. Your calculator should have a built-in program for the computation of standard deviations.

Predicting the mean and standard deviation of a population from a sample
It is not always practical to measure a particular statistic for a whole population, so usually a sample of the population is taken. The mean of a sample is a reliable estimate of the mean of a population, but the standard deviation of a sample (calculated with divisor n) tends to be slightly less than the standard deviation of the population. In other words, the complete population tends to show slightly more variability than a sample drawn from it.

A formula used to predict the standard deviation, s, of a complete population from a sample of scores is:

$$s = \sqrt{\frac{\sum f(x_i - \bar{x})^2}{n - 1}}$$

Notice that the only difference between the formulas is that the divisor has changed from n to n − 1. Your calculator is also equipped with a built-in program for the calculation of this formula. It is worth checking that you can obtain both results from your calculator and can distinguish between them. Try reworking the data from worked example 8. You should find that the standard deviation among the 8 packets of lollies was σ = 1.3229. If you were to use this sample to predict the standard deviation of all packets of lollies then the standard deviation would be s = 1.4142.
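The difference between the two divisors can be checked without a graphics calculator. The sketch below is an illustration only; it uses Python's standard `statistics` module, where `pstdev` divides by n (the textbook's σ) and `stdev` divides by n − 1 (the textbook's s). The variable names are assumptions.

```python
from statistics import pstdev, stdev

lollies = [14, 14, 13, 15, 16, 13, 14, 17]

sigma = pstdev(lollies)  # divisor n: spread of these 8 packets themselves
s = stdev(lollies)       # divisor n - 1: estimate for all packets

print(round(sigma, 4))  # 1.3229
print(round(s, 4))      # 1.4142
```

The second value is always the larger of the two, and the gap narrows as the sample size grows.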
A final point worth noting about the standard deviation is that, despite its sophistication, it is still influenced to a high degree by extreme values. Care should be taken when using this statistic with data that include such values.
Worked example 9
The following frequency distribution gives the prices paid by a car wrecking yard for 40 car wrecks.

Price ($)    Frequency
0–500        2
500–1000     4
1000–1500    8
1500–2000    10
2000–2500    7
2500–3000    6
3000–3500    3

a Find the mean and standard deviation in the price paid for these wrecks.
b Estimate the mean and standard deviation in the price paid for wrecks by this yard in general.

THINK / WRITE
a 1. Calculate the midpoint of each price range and enter this in L1 on your graphics calculator (press STAT, select EDIT and 1:Edit).
  2. Enter the frequency values in L2.
  3. Press STAT, select CALC and 1:1–Var Stats and enter L1, L2.
  4. Press ENTER to calculate the statistics.
  5. The mean is shown as x̄ and the standard deviation as σx.
     Mean price = $1825
     Standard deviation = $787
b Using the 1–Var Stats output screen, we use x̄ to estimate the mean, and sx to estimate the standard deviation of a wider population.
     Population mean price (estimate) = $1825
     Population standard deviation (estimate) = $797.03
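The calculator results in worked example 9 can be reproduced from the class midpoints and frequencies. The following Python sketch is illustrative only; the function name `grouped_stats` is an assumption, and the midpoints are taken as the centre of each price range, as in step 1 of the worked example.

```python
from math import sqrt

# Class midpoints ($) and frequencies from the car wrecks table
midpoints = [250, 750, 1250, 1750, 2250, 2750, 3250]
frequencies = [2, 4, 8, 10, 7, 6, 3]

def grouped_stats(xs, fs):
    """Return (mean, sigma, s) for grouped data given midpoints and frequencies."""
    n = sum(fs)                                             # total number of scores
    mean = sum(f * x for x, f in zip(xs, fs)) / n           # sum of f*x, divided by n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(xs, fs))   # sum of f*(x - mean)^2
    return mean, sqrt(ss / n), sqrt(ss / (n - 1))

mean, sigma, s = grouped_stats(midpoints, frequencies)
print(mean, round(sigma), round(s, 2))  # 1825.0 787 797.03
```

The first standard deviation (divisor n) describes these 40 wrecks; the second (divisor n − 1) is the estimate for wrecks bought by the yard in general.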
remember
1. The standard deviation of a group of scores can be found using the formula:

$$\sigma = \sqrt{\frac{\sum f(x_i - \bar{x})^2}{n}}$$

2. The standard deviation of a population can be predicted from a sample of scores by using the formula:

$$s = \sqrt{\frac{\sum f(x_i - \bar{x})^2}{n - 1}}$$

3. The lower the standard deviation, the closer together the scores.
4. On your calculator the population standard deviation is denoted σx and the sample standard deviation sx.

10C The standard deviation

1 (Worked example 8) Use the algorithm (series of steps) to find the standard deviation of the following data without using your calculator’s in-built program.
a 3, 5, 8, 2, 7, 1, 6, 5
b 11, 8, 7, 12, 10, 11, 14
c 25, 15, 78, 35, 56, 41, 17, 24
d 5.2, 4.7, 5.1, 12.6, 4.8

2 Now use the calculator’s in-built program to check each of the standard deviations that you calculated in the previous question.

You may use your calculator’s in-built program for finding the standard deviation and mean in the rest of the questions.

3 Consider the following two groups of people.

Height (cm)
Group A: 160 170 170 170 170 170 180
Group B: 160 170 170 110 230 170 180

a Calculate the mean height, median height and mode height for each group. What do you notice?
b Are the groups really the same?
c Which group would you expect to show the greatest range in heights?
d Which group would you expect to show the greatest interquartile range in heights?
e Which group would you expect to show the greatest standard deviation in heights?
f Calculate these statistics to confirm your predictions.
4 The following frequency distribution table shows the number of visitors that came to a city museum during the course of a month.

Visitor number    80–    90–    100–    110–    120–    130–
Frequency         1      4      11      9       4       2

a Find the range of the data.
b Find the mean of the data.
c Find the standard deviation and variance of the data.

5 (Multiple choice) Calculate the standard deviation of the following data to 3 decimal places.

Score        10–    20–    30–    40–    50–
Frequency    1      6      9      4      1

A 3.027
B 9.437
C 9.209
D 34.048
E None of the above.

6 (Worked example 9) The following frequency distribution table shows the life expectancy of 175 household light globes.

Life (h)    Frequency
200–        2
250–        5
300–        12
350–        25
400–        42
450–        38
500–        26
550–        15
600–        7
650–        3

a Find the range of the data.
b Find the mean and standard deviation in the lifetimes of this sample of light globes.
c Estimate the mean and standard deviation in the lifetimes of all light globes of this brand.

7 The following frequency distribution table shows the distribution of daily maximum temperatures during the course of a full year.

Maximum temperature (°C)    Number of days
0–                          4
5–                          22
10–                         95
15–                         124
20–                         94
25–                         19
30–                         5
35–                         2

a Add a cumulative frequency column to the table.
b Draw an ogive of the data.
c Find the upper and lower quartiles of the data and calculate the interquartile range.
d Use the ogive to find the median (50th percentile) of the data.
e Find the mean of the data.
f Find the standard deviation of the data.
g Find the range of the data.
8 The following data give the number of fruit that have formed on each of 30 trees in an orchard.

45 48 52 36 38 72 36 74 56 46
81 73 46 48 44 39 52 58 57 65
60 53 54 58 41 44 47 76 68 55

a Complete a frequency distribution table for the data.
b Draw an ogive of the data.
c Use the ogive to find the median, lower quartile and upper quartile of the data.
d Find the interquartile range of the data.
e Find the mean of the data.
f Find the standard deviation of the data.
g Estimate the standard deviation in the number of fruit for the whole orchard.
h Find the range of the data.

9 The polygons drawn at right show the lifetimes of two samples of different brands of toaster elements when subject to continued use.

[Figure: frequency polygons for Hot Wire and Electric Mate. Horizontal axis: Hours (150 to 230).]

a Which brand has the longest mean life?
b Estimate the mean life of each brand.
c Which brand has the greatest standard deviation in its performance?
d What does this say about the consistency of this element?
e Which brand is best? Give a case in support of Electric Mate. Give a case in support of Hot Wire.

10 Crunch and Crinkle are two brands of potato crisps. Each is sold in packets nominally of the same size and for the same price. Upon investigation of a sample of packets of each it is found that Crunch and Crinkle have the same mean weight (25 g). The standard deviation of the weights of Crunch is, however, 5 g and the standard deviation of the weights of Crinkle is 2 g. Which brand do you think would represent the best value for money under these circumstances? Why?

Standard deviation of soft drink amounts
Returning to Yvonne’s measurements, we are now ready to find the standard deviation of the amounts of soft drink dispensed by each machine.
1 Use the data lists MA and MB stored on your graphics calculator to find the standard deviation for each machine. If you do not have a graphics calculator you can obtain the standard deviation by re-entering each set of data.
2 Explain your choice of σx or sx.
3 Interpret the results in terms of assessing which machine dispenses the soft drink most accurately.
Boxplots

Five number summary
A five number summary is a list consisting of the lowest score, lower quartile, median, upper quartile and greatest score of a set of data.

A five number summary gives information about the spread of a set of data. The convention is not to detail the numbers with labels but to present them in order; so, for example, the five number summary:
4 15 21 23 28
would be interpreted as lowest score 4, lower quartile 15, median 21, upper quartile 23 and greatest score 28.

Boxplots
A boxplot (or box-and-whisker plot) is a graph of the five number summary. It is a powerful way to show the spread of data. Boxplots consist of a central divided box with attached ‘whiskers’. The box spans the interquartile range. The median is marked by a vertical line inside the box. The whiskers indicate the range of scores.

[Figure: boxplot with labels indicating the lowest score, the lower quartile, the median, the upper quartile and the greatest score.]

Boxplots are always drawn to scale. They are presented either with the five number summary figures attached as labels or with a scale presented alongside the boxplot.

[Figure: boxplot labelled with the five number summary 4, 15, 21, 23, 28, and the same boxplot drawn above a scale from 0 to 30.]

Worked example 10
From the following five number summary find:
a the median
b the interquartile range
c the range.
29 37 39 44 48

THINK / WRITE
The figures are presented in the order of lowest score, lower quartile, median, upper quartile, greatest score.
   Xmin = 29, QL = 37, median = 39, QU = 44, Xmax = 48
a The median is 39.
   Median = 39
b The interquartile range is the difference between the upper and lower quartiles.
   IQR = QU − QL
       = 44 − 37
       = 7
c The range is the difference between the greatest score and the lowest score.
   Range = Xmax − Xmin
         = 48 − 29
         = 19
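The reading of a five number summary in worked example 10 can be expressed as a short function. This is a sketch for illustration; the function name `describe` and the dictionary keys are assumptions, not notation from the text.

```python
def describe(summary):
    """Unpack a five number summary given in the conventional order."""
    x_min, q_l, median, q_u, x_max = summary
    return {
        "median": median,
        "iqr": q_u - q_l,        # interquartile range = QU - QL
        "range": x_max - x_min,  # range = Xmax - Xmin
    }

print(describe((29, 37, 39, 44, 48)))
# {'median': 39, 'iqr': 7, 'range': 19}
```

The function relies entirely on the ordering convention described above, so the five figures must be supplied from lowest score to greatest score.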
Interpreting a boxplot
The boxplot neatly divides the data into four sections. One-quarter of the scores lie between the lowest score and the lower quartile, one-quarter between the lower quartile and the median, one-quarter between the median and the upper quartile, and one-quarter between the upper quartile and the greatest score. The reader can easily see where clustering of the data occurs. For example, a small box with relatively long whiskers would indicate that half of the data (from QL to QU) would be confined to a small range and the data could be described as clustered. A wide box with relatively short whiskers would indicate that half of the data (from QL to QU) would be spread over a wide range and the data could be described as spread. Consider the boxplots below with their matching histograms.

[Figure: five boxplots with their matching histograms (f against Size), showing positively skewed data, clustered data, normally distributed data, spread data and negatively skewed data.]

Identification of extreme values
Extreme values often make the whiskers appear longer than they should and hence give the appearance that the data are spread over a much greater range than they really are.

If an extreme value occurs in a set of data it can be denoted by a small cross on the boxplot. The whisker is then shortened to the next largest (or smallest) figure.

The boxplot below shows that the lowest score was 5. This was an extreme value as the rest of the scores were located within the range 15 to 42.

[Figure: boxplot drawn above a scale from 0 to 45, with the extreme value 5 marked by a cross.]
Worked example 11
The following stem-and-leaf plot gives the speed of 25 cars caught by a roadside speed camera.

Key: 8|2 = 82 km/h
     8*|6 = 86 km/h

Stem | Leaf
8    | 2 2 4 4 4 4
8*   | 5 5 6 6 7 9 9 9
9    | 0 1 1 2 4
9*   | 5 6 9
10   | 0 2
10*  |
11   | 4

a Prepare a five number summary of the data.
b Draw a boxplot of the data. (Identify any extreme values.)
c Describe the distribution of the data.

THINK / WRITE
1. First identify the positions of the median and upper and lower quartiles. There are 25 pieces of data. The median is the $\frac{n + 1}{2}$th score. The lower quartile is the median of the lower half of the data. The upper quartile is the median of the upper half of the data (each half contains 12 scores).
   The median is the $\frac{25 + 1}{2}$th score, that is, the 13th score.
   The QL is the $\frac{12 + 1}{2}$th score in the lower half, that is, the 6.5th score: halfway between the 6th and 7th scores.
   The QU is halfway between the 6th and 7th scores in the upper half of the data.
2. Mark the position of the median and upper and lower quartiles on the stem plot.
   [Stem plot as above, with the positions of QL, the median and QU marked.]
a Write the five number summary:
   The lowest score is 82.
   The lower quartile is between 84 and 85, that is, 84.5.
   The median is 89.
   The upper quartile is between 94 and 95, that is, 94.5.
   The greatest score is 114.
   Five number summary: 82, 84.5, 89, 94.5, 114
THINK / WRITE
b Start by ruling a suitable scale. Remember to include the units of measurement. The box represents the interquartile range so it runs from 84.5 to 94.5. The median is a vertical line in the box at 89. The whiskers should extend to the lowest score (82) and the highest score (114). But the score 114 is a great deal higher than any of the others in the set and might be regarded as an extreme value. It should be indicated by a cross and the whisker will extend only as far as 102 (the second largest number in the set).
   [Figure: boxplot drawn above a scale from 80 to 110 km/h, with the extreme value 114 marked by a cross.]
c Even when the extreme value is excluded the data appear to be skewed, with high values being spread over a much greater range.
   The data are skewed (positively) and include one extremely high value.

Graphics Calculator tip! Creating a boxplot from a frequency table
1. Clear the Y= editor (press Y= and CLEAR) and turn off any existing plots by pressing 2nd [STAT PLOT] and choosing 4: PlotsOff.
2. Press STAT and select 1:Edit to enter x data in L1 and frequencies in L2 if data are grouped.
3. Press 2nd [STAT PLOT] then ENTER and select settings as below (use arrow keys and press ENTER to make each choice). If data are not grouped, leave Freq = 1.
4. Press ZOOM, choose 9: ZoomStat then press ENTER.
5. Press TRACE to explore the plot.
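The method used in worked example 11 (the quartiles are the medians of the lower and upper halves, with the middle score left out of both halves when the number of scores is odd) can be sketched in Python. The function name `five_number_summary` is an assumption for this illustration, and other software may use slightly different quartile conventions.

```python
from statistics import median

def five_number_summary(scores):
    """Lowest score, QL, median, QU, greatest score (median-of-halves method)."""
    data = sorted(scores)
    n = len(data)
    half = n // 2
    lower = data[:half]                                # scores below the median position
    upper = data[half + 1:] if n % 2 else data[half:]  # skip the middle score when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Speeds (km/h) read from the stem-and-leaf plot in worked example 11
speeds = [82, 82, 84, 84, 84, 84, 85, 85, 86, 86, 87, 89, 89, 89,
          90, 91, 91, 92, 94, 95, 96, 99, 100, 102, 114]

print(five_number_summary(speeds))  # (82, 84.5, 89, 94.5, 114)
```

Deciding whether 114 should be treated as an extreme value is left to judgement here, as it is in the worked example.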
remember
1. A five number summary is a list consisting of the lowest score, lower quartile, median, upper quartile and greatest score of a set of data.
2. A boxplot is a graphical representation of a five number summary and is a powerful tool to show the spread of data.
3. The box spans the interquartile range; the median is marked by a vertical line inside the box and the whiskers extend to the lowest and greatest scores.
4. Boxplots are always drawn to scale.
5. If an extreme value occurs in a set of data, it can be denoted by a small cross; the whisker is then shortened to the next largest (or smallest) value.

10D Boxplots

1 (Worked example 10) From the following five number summary find:
a the median
b the interquartile range
c the range.
6, 11, 13, 16, 32

2 From the following five number summary find:
a the median
b the interquartile range
c the range.
101, 119, 122, 125, 128

3 From the following five number summary find:
a the median
b the interquartile range
c the range.
39.2, 46.5, 49.0, 52.3, 57.8

4 The boxplot below shows the distribution of final points scored by a football team over a season’s roster.

[Figure: boxplot drawn above a scale from 50 to 150 points.]

a What was the team’s greatest points score?
b What was the team’s least points score?
c What was the team’s median points score?
d What was the range of points scored?
e What was the interquartile range of points scored?
5 The boxplot below shows the distribution of data formed by counting the number of honey bears in each of a large sample of packs.

[Figure: boxplot drawn above a scale from 30 to 60.]

a What was the largest number of honey bears in any pack?
b What was the smallest number of honey bears in any pack?
c What was the median number of honey bears in any pack?
d What was the range of numbers of honey bears per pack?
e What was the interquartile range of honey bears per pack?

Questions 6 to 8 refer to the following boxplot.

[Figure: boxplot drawn above a scale from 5 to 30, with an extreme value marked by a cross.]

6 (Multiple choice) The median of the data is:
A 5
B 20
C 23
D 25
E 31

7 (Multiple choice) The interquartile range of the data:
A is 23
B is 26
C is 5
D is 20 to 25
E cannot be determined because of extreme values.

8 (Multiple choice) Which of the following is not true of the data represented by the boxplot?
A One-quarter of the scores are between 5 and 20.
B Half of the scores are between 20 and 25.
C The lowest quarter of the data is spread over a wide range.
D Most of the data are contained between the scores of 5 and 20.
E The data are skewed left.

9 The number of sales made each day by a salesperson is recorded over a 2-week period:
25, 31, 28, 43, 37, 43, 22, 45, 48, 33
a Prepare a five number summary of the data. (There is no need to draw a stem-and-leaf plot of the data. Just arrange them in order of size.)
b Draw a boxplot of the data.

10 The data below show monthly rainfall in millimetres.

J     F     M     A     M     J     J     A     S     O     N     D
10    12    21    23    39    22    15    11    22    37    45    30

a Prepare a five number summary of the data.
b Draw a boxplot of the data.
11 (Worked example 11) The stemplot below details the age of 25 offenders who were caught during random breath testing.

Key: 1|8 = 18 years

Stem | Leaf
1    | 8 8 9 9 9
2    | 0 0 0 1 1 3 4 6 9
3    | 0 1 2 7
4    | 2 5
5    | 3 6 8
6    | 6
7    | 4

a Prepare a five number summary of the data.
b Draw a boxplot of the data.
c Describe the distribution of the data.

12 The following stem-and-leaf plot details the prices for which 30 houses in a particular suburb sold.

Key: 12|4 = $124 000

Stem | Leaf
12   | 4 7 9
13   | 0 0 2 5 5
14   | 0 0 2 3 5 5 7 9 9
15   | 0 0 2 3 7 7 8
16   | 0 2 2 5 8
17   | 5

a Prepare a five number summary of the data.
b Draw a boxplot of the data.
(You might like to use a graphics calculator for this question.)

13 The following data detail the number of hamburgers sold by a fast food outlet every day over a 4-week period.

M      T      W      T      F      S      S
125    144    132    148    187    172    181
134    157    152    126    155    183    188
131    121    165    129    143    182    181
152    163    150    148    152    179    181

a Prepare a stem-and-leaf plot of the data. (Use a class size of 10.)
b Draw a boxplot of the data.
(You might like to use a graphics calculator for this question.)

14 The following data show the ages of 30 mothers upon the birth of their first baby.

22 18 17 22 24
25 32 19 23 28
31 19 23 25 23
21 33 23 24 20
29 18 22 24 20
22 17 48 18 20

a Prepare a stem-and-leaf plot of the data. (Use a class size of 5.)
b Draw a boxplot of the data. Indicate any extreme values appropriately.
c Describe the distribution in words. What does the distribution say about the age at which mothers have their first baby?
(You might like to use a graphics calculator for this question.)
15 (Multiple choice) Match the boxplot with its most likely histogram.

[Figure: a boxplot and five histograms (f against Size) labelled A to E.]

Back-to-back stem plots
In chapter 9, we saw how to construct a stem plot for a set of univariate data. We can also extend a stem plot so that it displays bivariate data. Specifically, we shall create a stem plot that displays the relationship between a numerical variable and a categorical variable. We shall limit ourselves in this section to categorical variables with just two categories, for example sex. The two categories are used to provide two back-to-back leaves of a stem plot.

A back-to-back stem plot is used to display bivariate data, involving a numerical variable and a categorical variable with 2 categories.

Worked example 12
The girls and boys in Grade 4 at Kingston Primary School submitted projects on the Olympic Games. The marks they obtained out of 20 are given below.

Girls’ marks    16 17 19 15 12 16 17 19 19 16
Boys’ marks     14 15 16 13 12 13 14 13 15 14

Display the data on a back-to-back stem plot.

THINK / WRITE
1. Identify the highest and lowest scores in order to decide on the stems.
   Highest score = 19
   Lowest score = 12
   Use a stem of 1, divided into fifths.
2. Create an unordered stem plot first. Put the boys’ scores on the left, and the girls’ scores on the right.
   Key: 1|2 = 12

   Leaf (Boys) | Stem | Leaf (Girls)
               | 1    |
       3 2 3 3 | 1    | 2
     4 5 4 5 4 | 1    | 5
             6 | 1    | 6 7 6 7 6
               | 1    | 9 9 9

3. Now order the stem plot. The scores on the left should increase in value from right to left, while the scores on the right should increase in value from left to right.
   Key: 1|2 = 12

   Leaf (Boys) | Stem | Leaf (Girls)
               | 1    |
       3 3 3 2 | 1    | 2
     5 5 4 4 4 | 1    | 5
             6 | 1    | 6 6 6 7 7
               | 1    | 9 9 9

The back-to-back stem plot allows us to make some visual comparisons of the two distributions. In the above example the centre of the distribution for the girls is higher than the centre of the distribution for the boys. The spread of each of the distributions seems to be about the same. For the boys, the marks are grouped around 12–15 marks; for the girls, they are grouped around 16–19 marks. On the whole, we can conclude that the girls obtained better marks than the boys did.

To get a more precise picture of the centre and spread of each of the distributions we can use the summary statistics discussed earlier in this chapter. Specifically, we are interested in:
1. the mean and the median (to measure the centre of the distributions), and
2. the interquartile range and the standard deviation (to measure the spread of the distributions).
We saw earlier that the calculation of these summary statistics is very straightforward and rapid using a graphics calculator.
Worked example 13
The number of ‘how to vote’ cards handed out by various Australian Labor Party and Liberal Party volunteers during the course of a polling day is shown below.

Labor      180 233 246 252 263 270 229 238 226 211
           193 202 210 222 257 247 234 226 214 204
Liberal    204 215 226 253 263 272 285 245 267 275
           287 273 266 233 244 250 261 272 280 279

Display the data using a back-to-back stem plot and use this, together with summary statistics, to compare the distributions of the number of cards handed out by the Labor and Liberal volunteers.

THINK / WRITE
1. Construct the stem plot.
   Key: 18|0 = 180

   Leaf (Labor) | Stem | Leaf (Liberal)
              0 | 18   |
              3 | 19   |
            4 2 | 20   | 4
          4 1 0 | 21   | 5
        9 6 6 2 | 22   | 6
          8 4 3 | 23   | 3
            7 6 | 24   | 4 5
            7 2 | 25   | 0 3
              3 | 26   | 1 3 6 7
              0 | 27   | 2 2 3 5 9
                | 28   | 0 5 7

2. Use a graphics calculator to calculate the summary statistics: the mean, the median, the standard deviation and the interquartile range. Enter each set of data as a separate list. (See worked example 6 on how to use your graphics calculator to calculate these values.)
   For the Labor volunteers:
   Mean = 227.9
   Median = 227.5
   Interquartile range = 36
   Standard deviation = 23.9
   For the Liberal volunteers:
   Mean = 257.5
   Median = 264.5
   Interquartile range = 29.5
   Standard deviation = 23.4
3. Comment on the relationship.
   From the stem plot we see that the Labor distribution is symmetric and therefore the mean and the median are very close, whereas the Liberal distribution is negatively skewed.
   Since the Liberal distribution is skewed, the median is a better indicator of the centre of the distribution than is the mean.
   Comparing the medians, therefore, we have the median number of cards handed out for Labor at 228 and for Liberal at 265, which is a big difference.
   The standard deviations were similar, as were the interquartile ranges. There was not a lot of difference in the spread of the data.
   In essence, the Liberal Party volunteers handed out a lot more ‘how to vote’ cards than the Labor Party volunteers did.
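The summary statistics quoted in worked example 13 can be cross-checked with a short Python sketch. This is an illustration only: the helper `iqr` is an assumed name that applies the median-of-halves method used in this chapter, and `stdev` (divisor n − 1) is used because it appears to match the standard deviations quoted in the example.

```python
from statistics import mean, median, stdev

labor = [180, 233, 246, 252, 263, 270, 229, 238, 226, 211,
         193, 202, 210, 222, 257, 247, 234, 226, 214, 204]
liberal = [204, 215, 226, 253, 263, 272, 285, 245, 267, 275,
           287, 273, 266, 233, 244, 250, 261, 272, 280, 279]

def iqr(scores):
    """Interquartile range using the medians of the lower and upper halves."""
    data = sorted(scores)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(upper) - median(lower)

for name, cards in (("Labor", labor), ("Liberal", liberal)):
    # mean, median, interquartile range, standard deviation (divisor n - 1)
    print(name, mean(cards), median(cards), iqr(cards), stdev(cards))
```

Rounded to one decimal place, the printed values can be compared with those listed in step 2 of the worked example.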
remember
1. A back-to-back stem plot displays bivariate data involving a numerical variable and a categorical variable with two categories.
2. In the ordered stem plot, the scores on the left side of the stem increase in value from right to left.
3. Together with summary statistics, back-to-back stem plots can be used for comparing two distributions.
10E Back-to-back stem plots

1 (Worked example 12) The marks (out of 50) obtained for the end-of-term test by the students in German and French classes are given below. Display the data on a back-to-back stem plot.

German    20 38 45 21 30 39 41 22 27 33 30 21 25 32 37 42 26 31 25 37
French    23 25 36 46 44 39 38 24 25 42 38 34 28 31 44 30 35 48 43 34

2 The birth masses of 10 boys and 10 girls (in kilograms, to the nearest 100 grams) are recorded in the table below. Display the data on a back-to-back stem plot.

Boys     3.4 5.0 4.2 3.7 4.9 3.4 3.8 4.8 3.6 4.3
Girls    3.0 2.7 3.7 3.3 4.0 3.1 2.6 3.2 3.6 3.1

3 (Worked example 13) The number of delivery trucks making deliveries to a supermarket each day over a 2-week period was recorded for two neighbouring supermarkets, supermarket A and supermarket B. The data are shown below.

A    11 15 20 25 12 16 21 27 16 17 17 22 23 24
B    10 15 20 25 30 35 16 31 32 21 23 26 28 29

a Display the data on a back-to-back stem plot.
b Use the stem plot, together with some summary statistics, to compare the distributions of the number of trucks delivering to supermarkets A and B.

4 The marks out of 20 for males and females on a science test for a Year-10 class are given below.

Females    12 13 14 14 15 15 16 17
Males      10 12 13 14 14 15 17 19

a Display the data on a back-to-back stem plot.
b Use the stem plot, together with some summary statistics, to compare the distributions of the marks of the males and the females.

5 The end-of-year English marks for 10 students in an English class were compared over 2 years. The marks for 1998 and for the same students in 1999 are shown below.

1998    30 31 35 37 39 41 41 42 43 46
1999    22 26 27 28 30 31 31 33 34 36

a Display the data on a back-to-back stem plot.
b Use the stem plot, together with some summary statistics, to compare the distributions of the marks obtained by the students in 1998 and 1999.
6 The age and gender of a group of people attending a fitness class are recorded below.

Female    23 24 25 26 27 28 30 31
Male      22 25 30 31 36 37 42 46

a Display the data on a back-to-back stem plot.
b Use the stem plot, together with some summary statistics, to compare the distributions of the ages of the female and male members of the fitness class.

7 The scores on a board game are recorded for a group of kindergarten children and for a group of children in a preparatory school.

Kindergarten    3 13 14 25 28 32 36 41 47 50
Prep. School    5 12 17 25 27 32 35 44 46 52

a Display the data on a back-to-back stem plot.
b Use the stem plot, together with some summary statistics, to compare the distributions of the scores of the kindergarten children and the preparatory school children.

8 (Multiple choice) The pair of variables that could be displayed on a back-to-back stem plot is:
A the height of a student and the number of people in the student’s household
B the time put into completing an assignment and a pass or fail score on the assignment
C the weight of a businessman and his age
D the religion of an adult and the person’s head circumference
E the income bracket of an employee and the time the employee has worked for the company

9 (Multiple choice) A back-to-back stem plot is a useful way of displaying the relationship between:
A the proximity to markets (km) and the cost of fresh foods on average per kilogram
B height and head circumference
C age and attitude to gambling (for or against)
D weight and age
E the money spent during a day of shopping and the number of shops visited on that day
Parallel boxplotsWe saw in the previous section that we could display relationships between a numericalvariable and a categorical variable with just two categories, using a back-to-back stem plot.
When we want to display a relationship between a numerical variable and acategorical variable with more than two categories, a parallel boxplot can be used.
A parallel boxplot is obtained by constructing individual boxplots for eachdistribution, using the common scale.
Construction of individual boxplots was discussed in detail earlier in this chapter(see page 438). In this section we concentrate on comparing distributions representedby a number of boxplots (that is, on the interpretation of parallel boxplots).
CASIO
WE 10-14
The four Year-7 classes at Western Secondary College complete the same end-of-year maths test. The marks, expressed as percentages for each of the students in the four classes, are given below.
Display the data using a parallel boxplot and use this to describe any similarities or differences in the distributions of the marks between the four classes.
7A 7B 7C 7D 7A 7B 7C 7D
40 60 50 40 69 78 70 69
43 62 51 42 63 82 72 73
45 63 53 43 63 85 73 74
47 64 55 45 68 87 74 75
50 70 57 50 70 89 76 80
52 73 60 53 75 90 80 81
53 74 63 55 80 92 82 82
54 76 65 59 85 95 82 83
57 77 67 60 89 97 85 84
60 77 69 61 90 97 89 90
THINK WRITE/DISPLAY
1 Create the first boxplot (for class 7A) on a graphics calculator using 2nd [STAT PLOT] and appropriate WINDOW settings. Using TRACE to show key values, sketch the first boxplot using pen and paper, leaving room for three additional plots.
THINK WRITE
2 Repeat step 1 for the other three classes. All four boxplots share the common scale.
[Figure: parallel boxplots for classes 7A, 7B, 7C and 7D on a common scale. Horizontal axis: Maths mark (%), from 30 to 100.]
3 Describe the similarities and differences between the four distributions.
Class 7B had the highest median mark and the range of the distribution was only 37. The lowest mark in 7B was 60.
We notice that the median of 7A’s marks is approximately 60. So, 50% of students in 7A received less than 60. This means that half of 7A had scores that were less than the lowest score in 7B.
The range of marks in 7A was about the same as that of 7D with the highest scores in each about equal, and the lowest scores in each about equal. However, the median mark in 7D was higher than the median mark in 7A so, despite a similar range, more students in 7D received a higher mark than in 7A.
While 7D had a top score that was higher than that of 7C, the median score in 7C was higher than that of 7D and the bottom 25% of scores in 7D were less than the lowest score in 7C. In summary, 7B did best, followed by 7C then 7D and finally 7A.
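The comparisons in this worked example can also be checked numerically. The following is a minimal Python sketch, not part of the original text, that computes the five number summary for each class using the quartile method taught in this chapter (Q1 and Q3 are taken as the medians of the lower and upper halves of the ordered data). The function names `median` and `five_number_summary` are illustrative assumptions; a graphics calculator or statistics package may use a different quartile rule and give slightly different quartiles.

```python
def median(sorted_values):
    """Median of a list that is already sorted in ascending order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Return (lowest, Q1, median, Q3, greatest) using the halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd number of scores the median itself is left out of both halves.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Marks (%) for the four Year-7 classes, copied from the table in the example.
marks = {
    "7A": [40, 43, 45, 47, 50, 52, 53, 54, 57, 60,
           69, 63, 63, 68, 70, 75, 80, 85, 89, 90],
    "7B": [60, 62, 63, 64, 70, 73, 74, 76, 77, 77,
           78, 82, 85, 87, 89, 90, 92, 95, 97, 97],
    "7C": [50, 51, 53, 55, 57, 60, 63, 65, 67, 69,
           70, 72, 73, 74, 76, 80, 82, 82, 85, 89],
    "7D": [40, 42, 43, 45, 50, 53, 55, 59, 60, 61,
           69, 73, 74, 75, 80, 81, 82, 83, 84, 90],
}

summaries = {name: five_number_summary(scores) for name, scores in marks.items()}

for name, (low, q1, med, q3, high) in summaries.items():
    print(f"{name}: low={low} Q1={q1} median={med} Q3={q3} high={high} "
          f"range={high - low} IQR={q3 - q1}")
```

On these data the sketch gives a median of 77.5 for 7B and 61.5 for 7A, and a range of 37 for 7B, which is consistent with the reading of the boxplots above.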
remember
1. A relationship between a numerical variable and a categorical variable with more than two categories can be displayed using a parallel boxplot.
2. A parallel boxplot is obtained by constructing individual boxplots for each distribution, using a common scale.
10F Parallel boxplots
1 The heights (in cm) of students in 9A, 10A and 11A were recorded and are shown in the table below.
9A 120 126 131 138 140 143 146 147 150 156 157 158 158 160 162 164 165 170
10A 140 143 146 147 149 151 153 156 162 164 165 167 168 170 173 175 176 180
11A 151 153 154 158 160 163 164 166 167 169 169 172 175 180 187 189 193 199
a Construct a parallel boxplot to show the data.
b Use the boxplot to compare the distributions of height for the 3 classes.
2 The amounts of money contributed annually to superannuation schemes by people in 3 different age groups are shown below.
20–29: 2000, 3100, 5000, 5500, 6200, 6500, 6700, 7000, 9200, 10 000
30–39: 4000, 5200, 6000, 6300, 6800, 7000, 8000, 9000, 10 300, 12 000
40–49: 10 000, 11 200, 12 000, 13 300, 13 500, 13 700, 13 900, 14 000, 14 300, 15 000
a Construct a parallel boxplot to show the data.
b Use the boxplot to comment on the distributions.
3 The numbers of jars of vitamin A, B, C and multi-vitamins sold per week by a local chemist are shown below.
Vitamin A 5 6 7 7 8 8 9 11 13 14
Vitamin B 10 10 11 12 14 15 15 15 17 19
Vitamin C 8 8 9 9 9 10 11 12 12 13
Multi-vitamins 12 13 13 15 16 16 17 19 19 20
a Construct a parallel boxplot to display the data.
b Use the boxplot to compare the distributions of sales for the 4 types of vitamin.
4 multiple choice
The ages of the employees at 5 different companies of the same size are compared using the parallel boxplots shown below.
[Figure: parallel boxplots of employee ages for Company A, Company B, Company C, Company D and Company E on a common scale from 20 to 60.]
For each of the following, select from:
A company A
B company B
C company C
D company D
E company E
a Which company has the greatest range of ages?
b Which company has the greatest interquartile range of ages?
c Which company has the lowest median age?
d Which company has the greatest range of ages among their oldest 25% of employees?
summary
Measures of central tendency
• The mean is given by $\bar{x} = \frac{\sum x}{n}$, where $\sum x$ represents the sum of all observations in the data set and $n$ represents the number of observations in the data set.
• The mean is calculated by using the values of the observations, and because of this it becomes a less reliable measure of the distribution when the distribution is skewed or contains an outlier.
• To find the mean for grouped data, $\bar{x} = \frac{\sum (f \times m)}{\sum f}$, where $f$ represents the frequency of the data and $m$ represents the midpoint of the class interval of the grouped data.
• The median is the midpoint of a set of data. Half the data are less than or equal to the median. Where there are $n$ observations in a set of ordered data, the median is located at the $\frac{n+1}{2}$th position.
• The mode is the score in the data set with the highest frequency.
Range and interquartile range
• The range of a data set is the difference between the highest and lowest values in that data set.
• The interquartile range IQR = Q3 − Q1.
• There are four steps to locating Q1 and Q3.
Step 1. Write down the data set in order from lowest to highest.
Step 2. Locate the median; that is, locate Q2.
Step 3. Now consider the lower half of the data set. Find the middle score. This score is Q1.
Step 4. Now consider the upper half of the data set. Find the middle score. This score is Q3.
• The values of the median as well as Q1 and Q3 can be estimated by using an ogive.
Standard deviation
• The standard deviation of a group of scores can be found using the formula $\sigma = \sqrt{\frac{\sum f(x_i - \bar{x})^2}{n}}$ or by using a calculator.
• The standard deviation of a population can be predicted from a sample of scores by using the formula $s = \sqrt{\frac{\sum f(x_i - \bar{x})^2}{n - 1}}$ or by using a calculator.
Boxplots
• A five number summary is a list, consisting of the lowest score, lower quartile, median, upper quartile and the greatest score (in that order) of the data.
• A boxplot is a graph of the five number summary.
• The boxplot is a powerful tool to show the spread of the data.
• Boxplots are always drawn to scale.
• The box spans the interquartile range; the median is marked by the vertical line inside the box; the whiskers extend to the lowest and greatest scores.
[Figure: boxplot with labels indicating, from left to right, the lowest score, the lower quartile, the median, the upper quartile and the greatest score.]
• The extreme values can be denoted by a small cross; the whiskers are then shortened to the next largest (or smallest) value.
Comparing sets of data
• Back-to-back stem plots
1. are useful to compare the distribution of two similar sets of data
2. share the same stem
3. contain a key, which usually relates to data on the right
4. have the data on the left arranged outwards from the plot as it increases.
• Parallel boxplots
1. are useful for quantitative comparisons
2. share a common scale
3. compare two or more sets of data.
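The formulas for the mean and standard deviation in this summary can be tried out with a short program. The following is a minimal Python sketch, not part of the original text. It uses the data set 4 7 9 12 18 from the start of the chapter, where each score occurs once (so f = 1 for every score). The function names are illustrative assumptions, and the grouped table passed to `grouped_mean` is a made-up example, not data from this chapter.

```python
import math


def mean(values):
    """Mean: sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def grouped_mean(midpoints, frequencies):
    """Grouped mean: sum of (f x m) divided by the sum of f."""
    total = sum(f * m for f, m in zip(frequencies, midpoints))
    return total / sum(frequencies)


def population_sd(values):
    """Standard deviation of a group of scores (divide by n)."""
    x_bar = mean(values)
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / len(values))


def sample_sd(values):
    """Estimate of the population standard deviation from a sample (divide by n - 1)."""
    x_bar = mean(values)
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (len(values) - 1))


scores = [4, 7, 9, 12, 18]

print("mean:", mean(scores))
print("population standard deviation:", round(population_sd(scores), 2))
print("sample standard deviation:", round(sample_sd(scores), 2))

# Hypothetical grouped data: class midpoints 5, 15, 25 with frequencies 2, 5, 3.
print("grouped mean:", grouped_mean([5, 15, 25], [2, 5, 3]))
```

Dividing by n − 1 instead of n always gives a slightly larger value, which is why the sample formula is kept separate from the formula for a complete group of scores.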
CHAPTER review
1 Calculate the mean of each of the following sets of scores.
a 4, 9, 5, 3, 5, 6, 2, 7, 1, 10
b 65, 67, 87, 45, 90, 92, 50, 23
c 7.2, 7.9, 7.0, 8.1, 7.5, 7.5, 8.7
d 5, 114, 23, 12, 25
2 Complete the frequency distribution table below and use it to estimate the mean of the distribution.
Class Class centre (x) Frequency (f)
21–24 ___ 3
25–28 ___ 9
29–32 ___ 17
33–36 ___ 31
37–40 ___ 29
41–44 ___ 25
45–48 ___ 19
49–52 ___ 10
Σ f = ___
3 Use the statistics function on your calculator to find the mean of each of the following sets of scores.
a 2, 18, 26, 121, 96, 32, 14, 2, 0, 0
b 2, 2, 12, 12, 12, 32, 32, 47, 58
c 0.2, 0.3, 0.6, 0.4, 0.3, 0.7, 0.8, 0.6, 0.5, 0.4, 0.1
4 Use the statistics function on your calculator to find the mean of the following distributions. Where necessary, give your answers correct to 1 decimal place.
a
Score Frequency
10 23
20 47
30 68
40 56
50 17
b
Score Frequency
24 45
25 89
26 124
27 102
28 78
29 46
c
Class Class centre Frequency
10–12 11 18
13–15 14 32
16–18 17 34
19–21 20 40
22–24 23 28
25–27 26 14
28–30 29 6
5 For each of the following sets of scores, find the median.
a 25, 26, 26, 27, 27, 28, 30, 32, 35
b 4, 5, 8, 5, 8, 6, 7, 10, 4, 8, 4
c 3.2, 3.1, 3.0, 3.5, 3.2, 3.2, 3.2, 3.6
d 2, 3, 7, 4, 4, 8, 5, 7, 7, 6
e 121, 135, 111, 154, 147, 165, 101, 108
6 Copy and complete each of the following frequency tables and then use them to find the median.
a
Score Frequency Cumulative frequency
0 2 ___
1 6 ___
2 11 ___
3 7 ___
4 6 ___
5 3 ___
b
Score Frequency Cumulative frequency
54 2 ___
55 5 ___
56 14 ___
57 11 ___
58 6 ___
59 1 ___
60 1 ___
c
Score Frequency Cumulative frequency
66 8 ___
67 10 ___
68 12 ___
69 14 ___
70 7 ___
71 5 ___
72 4 ___
7 Use the frequency table below to state the modal class.
Class Class centre Frequency
30–33 31.5 12
34–37 35.5 26
38–41 39.5 34
42–45 43.5 45
46–49 47.5 52
50–53 51.5 23
8 Below are the number of goals scored by a netball team in ten matches in a tournament.
25 26 19 24 28 67 21 22 28 18
a Calculate the mean.
b Calculate the median.
c Calculate the mode.
d Which of the above is the best summary statistic? Explain your answer.
9 Give an example of a statistical analysis where the best summary statistic is:
a the mean
b the median
c the mode.
10 Find the range of each of the following sets of scores.
a 28 24 26 24 25 29 22 27 25
b 118 2 56 45 72 43 69 84 159 0
c 1.9 0.7 0.5 0.8 1.1 1.5 1.4
11 For each of the data sets in question 10 calculate:
i the median
ii the lower quartile
iii the upper quartile
iv the interquartile range.
12 The cumulative frequency histogram and polygon below show the number of goals scored by a soccer team in each match over a season. Use the graph to calculate:
a the median
b the lower and upper quartiles
c the interquartile range.
[Figure: cumulative frequency histogram and polygon. Horizontal axis: Number of goals scored in a soccer match, from 0 to 4. Vertical axis: Cumulative frequency, from 0 to 45 in steps of 5.]
13 The cumulative frequency histogram and polygon below show the number of apples on each tree in an orchard.
[Figure: cumulative frequency histogram and polygon. Horizontal axis: Number of apples on a tree, marked at 34.5, 44.5, 54.5, 64.5, 74.5 and 84.5. Vertical axis: Cumulative frequency, from 0 to 300 in steps of 50.]
Use the graph to find:
a the median
b the lower and upper quartiles
c the interquartile range.
14 The following scores were recorded: 2, 6, 9, 1, 3, 5, 6, 7, 8, 2. For these data find:
a the mean
b the median
c the mode
d the range
e the interquartile range
f the standard deviation.
15
Score Frequency
30–39 3
40–49 6
50–59 12
60–69 15
70–79 18
80–89 10
For the above set of data find:
a the mean
b the median group
c the modal group
d the range
e the standard deviation.
16 For the box-and-whisker plot drawn below:
[Figure: box-and-whisker plot drawn on a scale from 0 to 60 in steps of 5.]
a state the median
b calculate the range
c calculate the interquartile range.
17 The number of babies born each day at a hospital over a year is tabulated and the five number summary is given below.
Lower extreme = 1
Lower quartile = 8
Median = 14
Upper quartile = 16
Upper extreme = 18
Show this information in a box-and-whisker plot.
18 The number of hours of counselling received by a group of 9 full-time firefighters and 9 volunteer firefighters after a serious bushfire is given below.
Full-time 2 4 3 5 2 4 6 1 3
Volunteer 8 10 11 11 12 13 13 14 15
a Construct a back-to-back stem plot to display the data.
b Comment on the distributions of the number of hours of counselling of the full-time firefighters and the volunteers.
19 The IQs of 8 players in each of 3 different football teams were recorded and are shown below.
Team A 120 105 140 116 98 105 130 102
Team B 110 104 120 109 106 95 102 100
Team C 121 115 145 130 120 114 116 123
Display the data in parallel boxplots.