P1: FXS/ABE P2: FXS9780521740517c06.xml CUAU031-EVANS September 4, 2008 13:28
CHAPTER 6 (CORE)
Data transformation
What do we mean by data transformation?
What is the effect of applying a log, squared or reciprocal transformation to the x
variable?
What is the effect of applying a log, squared or reciprocal transformation to the y
variable?
How do I choose which data transformation to apply?
How do I carry out a regression analysis with transformed data?
6.1 Data transformation
There are methods for fitting curves to non-linear relationships, using non-linear regression.
However, this procedure is mathematically complicated and the results difficult to interpret.
The method of dealing with a non-linear relationship favoured in practice is to apply a
mathematical function to one of the variables, so that the relationship between the variables
becomes closer to a straight line. By appropriate choice of the function, the scale of
measurement is stretched or compressed. There are many functions that can be used to
transform the data, but here we will consider only three. These are:
the square transformation
the logarithmic transformation
the reciprocal transformation
When first confronted with data transformation, many people tend to be suspicious. However,
when we think about it from the point of view of analysing a set of data, there is nothing
special about the units of measurement used when gathering the data. In general, units used are
chosen because they are convenient for recording and reporting the data. ‘Natural’ units tend
to be used, for example, seconds when recording time, or metres when recording length. But
what is the natural unit for measuring fuel economy of a car: kilometres per litre (x) or litres
per kilometre (x^(-1))? In measurement, natural often tends to mean familiar. For example, to a
chemist, it is natural to measure acidity in terms of pH and the logarithm of hydrogen-ion
concentration (log x) rather than the hydrogen-ion concentration (x).
Cambridge University Press • Uncorrected Sample pages • 978-0-521-61328-6 • 2008 © Jones, Evans, Lipson TI-Nspire & Casio ClassPad material in collaboration with Brown and McMenamin
Generally, it is only luck if a data set reveals all its hidden information when analysed in the
form in which it was initially gathered and/or reported. It is part of the analyst’s role to search
out different ways of looking at the data in order to enhance our understanding of that data.
One of the most powerful tools available to help achieve this task is data transformation.
How do these transformations affect the values to which they are applied? Consider the
following table of numbers:
value         0.2      0.4      0.6      1    2      3      4
(value)^2     0.04     0.16     0.36     1    4      9      16
log(value)   -0.699   -0.398   -0.222    0    0.301  0.477  0.602
1/value       5        2.5      1.667    1    0.5    0.333  0.25
From the table we can see that the transformations have the following effects on the data values:
The square transformation has the effect of decreasing values less than 1, and increasing
values greater than 1. Large values are increased the most. For example, 2^2 = 4, while
20^2 = 400, so that while the values 2 and 20 are 18 units apart, the values 4 and 400 are
396 units apart. That is, the effect of the square transformation is to stretch the values.
The log transformation reduces all values, and values between 0 and 1 become negative.
Large values are reduced much more than small values. For example, log 2 = 0.301, while
log 20 = 1.301, so that while the values 2 and 20 are 18 units apart, the values 0.301 and
1.301 are only 1 unit apart. That is, the effect of the log transformation is to compress the
values. Note that the log function can only be applied to values which are greater than 0.
The reciprocal transformation also reduces all values greater than 1. Large values
are reduced much more than small values. For example, 1/2 = 0.5, while 1/20 = 0.05, so that
while the values 2 and 20 are 18 units apart, the values 0.5 and 0.05 are only 0.45 units
apart. That is, the effect of the reciprocal transformation is to compress the large values to
an even greater extent than the log transformation.
Thus it can be seen that all transformations have a greater effect on the larger values, but this
effect varies for each transformation.
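The table and the 2-versus-20 comparisons above can be reproduced with a few lines of code. The following is a minimal sketch in Python (the chapter itself uses a graphics calculator); the names transform_table and gaps are our own, and log is taken to be the base 10 logarithm, which matches the values in the table.

```python
import math

values = [0.2, 0.4, 0.6, 1, 2, 3, 4]

def transform_table(values):
    """Return one row per value: (value, value squared, log10(value), 1/value)."""
    return [(v, v ** 2, math.log10(v), 1 / v) for v in values]

for v, squared, logged, reciprocal in transform_table(values):
    print(f"{v:>5} {squared:>8.2f} {logged:>8.3f} {reciprocal:>8.3f}")

# Distance between the values 2 and 20 before and after each transformation.
gaps = {
    "original": 20 - 2,
    "squared": 20 ** 2 - 2 ** 2,
    "log": math.log10(20) - math.log10(2),
    "reciprocal": 1 / 2 - 1 / 20,
}
for name, gap in gaps.items():
    print(f"{name:>10}: {gap:.3f}")
```

The squared gap is much larger than the original gap of 18, while the log and reciprocal gaps are much smaller, which is the stretching and compressing described above.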
Exercise 6A
1 a Copy and complete the table.
value        1  2  3  4  5  6  7
(value)^2
log(value)
1/value
b Use the information in the table to complete the following statements by deleting the
incorrect term.
i The squared transformation stretches/compresses the scale.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.
2 a Copy and complete the table.
value        1  2  4  8  16  32  64
(value)^2
log(value)
1/value
b Use the information in the table to complete the following statements by deleting the
incorrect term.
i The squared transformation stretches/compresses the larger values.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.
3 a Copy and complete the table.
value 1 10 100 1000 10000 100000
(value)^2
log(value)
1/value
b Use the information in the table to complete the following statements by deleting the
incorrect term.
i The squared transformation stretches/compresses the larger values.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.
4 a Copy and complete the table.
value 20 10 5 2.5 1.25 0.625
(value)^2
log(value)
1/value
b Use the information in the table to complete the following statements by deleting the
incorrect term.
i The squared transformation stretches/compresses the larger values.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.
5 a Copy and complete the table.
value 2 20 200 2000 20000 200000
(value)^2
log(value)
1/value
b Use the information in the table to complete the following statements by deleting the
incorrect term.
i The squared transformation stretches/compresses the larger values.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.
6.2 Transforming the x axis
We are interested in linearising the relationship between two variables, x and y, and the
transformations discussed in the previous section can be applied to either x or y (but not both
here). We will examine the effect of transforming the x axis and the y axis separately.
Transforming the x axis will have the effect of moving the x values on the plot horizontally,
and leave the y values unaltered. The square, log and reciprocal transformations can be applied
to the x axis with the following effects:
Transformation   Outcome
x^2              Spreads out the high x values relative to the smaller x values.
log x            Compresses large x values relative to the smaller data values.
1/x              Also compresses large x values relative to the smaller data values, to a
                 greater extent than log x. Note that values of x less than 1 become greater
                 than 1, and values of x greater than 1 become less than 1, so that the order
                 of the data values is reversed.
[Graphs of y against x illustrating each transformation]
The following examples show the effect on the relationship between x and y when the squared,
log and reciprocal transformations are applied to the x values.
Example 1 Linearising the relationship with a squared transformation
a Plot the data in the table, and comment on the form of the relationship between x and y.
x 0 1 2 3 4
y 2 3 6 11 18
b Apply a squared transformation to the x values (x^2), again plot the data, and comment on
the form of the relationship between y and x^2.
Solution
a 1 Plot the values of y against x.
[Scatterplot of y against x]
2 Decide if the form of the relationship is linear or non-linear.
3 Write down your conclusion.
The relationship between y and x is non-linear.
b 1 Construct a new table of values.
x^2 0 1 4 9 16
y 2 3 6 11 18
2 Plot the values of y against x^2.
[Scatterplot of y against x^2]
3 Decide if the form of the relationship is linear or non-linear.
4 Write down your conclusion.
The relationship between y and x^2 is linear.
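The visual judgement in Example 1 can be backed up with a number. The sketch below, written in Python, computes Pearson's correlation coefficient r before and after the transformation. Using r as a check on linearity is our own addition, not a step in the example, and the helper pearson_r is a hypothetical name written out in full so that the block is self-contained.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two lists of equal length."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

x = [0, 1, 2, 3, 4]
y = [2, 3, 6, 11, 18]
xsq = [v ** 2 for v in x]  # the squared transformation applied to x

r_x = pearson_r(x, y)      # relationship between y and x
r_xsq = pearson_r(xsq, y)  # relationship between y and x^2

print(f"r for y against x:   {r_x:.4f}")
print(f"r for y against x^2: {r_xsq:.4f}")
```

Here r moves closer to 1 after the transformation, which agrees with the plots. A value of r close to 1 does not guarantee a linear form on its own, so the scatterplot should still be inspected.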
Data transformation is very conveniently carried out with the aid of a graphics calculator, and
in practice, this is how you will do it in future. Note that, throughout this chapter, you will find
it useful to enter the data into named lists because you will need to keep track of the various
lists of transformed data as you work through the problems.
How to apply the squared transformation using the TI-Nspire CAS
Plot the data presented in the table below.
x 0 1 2 3 4
y 2 3 6 11 18
Apply a squared transformation to the x values (x^2) and replot the data.
Steps
1 Start a new document by pressing / + N .
2 Select 3:Add Lists & Spreadsheet.
Enter the data into lists named x and y, as shown.
3 Press c and select 5:Data & Statistics.
Construct a scatterplot of y against x.
Let x be the independent variable and y
the dependent variable. The plot is
clearly non-linear.
4 Return to the Lists & Spreadsheet application (by pressing / + ).
To calculate the values of x^2 and store
them in a list named xsq (short for
x-squared), do the following:
a Move the cursor to the top of column C
and type xsq. Press enter .
b Move the cursor to the grey cell
immediately below the xsq heading. We
need to enter the expression = x^2. To do
this, press = then VAR ( ), highlight
the variable x and then press enter to paste
x into the formula line. Finally, type ^2
(or press ) to complete the formula.
Press enter to calculate and display the
x-squared values.
Note: The dash in front of the x (i.e. x) is
automatically added when a list name is
pasted from the VAR menu.
Note: You can also type in the variable x
and then select Variable Reference
when prompted. This avoids using the
VAR menu.
5 Construct a scatterplot of y against x^2.
Press / + to return to the
scatterplot created earlier and change
the independent variable to xsq as
follows:
a Move the cursor to the textbox area
below the horizontal (or x-) axis.
Press x when prompted and select
the variable xsq. Press enter to paste
the variable to that axis.
b A scatterplot of y against xsq (x^2) is
then displayed, as shown. The plot is
clearly linear.
Note: If you wish to keep the original
plot of y against x you can create a new
Data & Statistics page to plot the
transformed data.
How to apply the squared transformation using the ClassPad
Plot the data presented in the table below.
x 0 1 2 3 4
y 2 3 6 11 18
Apply a squared transformation to the x values (x^2) and replot the data.
Steps
1 Open the Statistics application and
enter the data into the columns named x and y. Your screen should look like the
one shown.
2 Construct a scatterplot of y against x.
Let x be the independent variable and y
the dependent variable. The plot is
clearly non-linear.
3 To calculate the values of x^2 and store
them in a list named xsq (i.e. x-squared),
do the following:
a Tap to highlight the cell at the top of
the next empty list (in this case,
list3). Rename by typing xsq and
pressing enter .
b Tap to highlight the cell at the bottom
of the newly named xsq column (in
the row titled Cal ). Type x^2 and
press to calculate and list the x^2
values.
4 Construct a scatterplot of y against xsq
(i.e. x^2). The plot is clearly linear.
Example 2 Linearising the relationship with the log transformation
a Plot the data in the table, and comment
on the form of the relationship between
x and y.
x 1 10 100 400 600 1000
y 0 10 20 25 28 30
b Apply a log transformation to the x values (log x), again plot the data, and comment on the
form of the relationship between y and log x.
Solution
a 1 Plot the values of y against x.
[Scatterplot of y against x]
2 Decide if the form of the relationship is linear or non-linear.
3 Write down your conclusion.
The relationship between y and x is non-linear.
b 1 Construct a new table of values.
log x 0 1 2 2.6 2.8 3
y 0 10 20 25 28 30
2 Plot the values of y against log x.
[Scatterplot of y against log x]
3 Decide if the form of the relationship is linear or non-linear.
4 Write down your conclusion.
The relationship between y and log x is linear.
Once again, this transformation is very conveniently carried out with the aid of a graphics
calculator.
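For readers working on a computer rather than a calculator, the log transformation in Example 2 might be sketched as follows in Python. The least_squares helper is a hypothetical function of our own that fits a straight line to the transformed data; fitting a line at this point goes one step beyond the example, which only asks for a plot.

```python
import math

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
             / sum((a - mean_x) ** 2 for a in xs))
    return slope, mean_y - slope * mean_x

x = [1, 10, 100, 400, 600, 1000]
y = [0, 10, 20, 25, 28, 30]
lx = [math.log10(v) for v in x]  # the list called lx in the calculator steps

slope, intercept = least_squares(lx, y)
print("log x:", [round(v, 1) for v in lx])
print(f"y = {intercept:.2f} + {slope:.2f} * log x")
```

The fitted line is close to y = 10 log x, which is the pattern suggested by the first three data points (log x = 0, 1, 2 with y = 0, 10, 20).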
How to apply the log transformation using the TI-Nspire CAS
Plot the data presented in the table below.
x 1 10 100 400 600 1000
y 0 10 20 25 28 30
Apply a log transformation to the x values (log(x)) and replot the data.
Steps
1 Start a new document by pressing / + N .
2 Select 3:Add Lists & Spreadsheet.
Enter the data into lists named x and y, as shown
opposite.
3 Press c and select
5:Data & Statistics.
Construct a scatterplot of y against x.
Let x be the independent variable
and y the dependent variable. The plot is
clearly non-linear.
4 Return to the Lists & Spreadsheet application (by pressing / + ).
To calculate the values of log x and
store them in a list named lx (short for
log x), complete the
following:
a Move the cursor to the top of
column C and type lx. Press enter .
b Move the cursor to the grey cell
immediately below the lx
heading and type = log(. Then
press VAR ( ), highlight the
variable x, press enter to paste x into the formula line, then type )
to complete the command. Press enter to calculate and display the
log values.
If your answers are not given as
decimals, refer to
the Appendix to change Mode settings to APPRX.
5 Construct a scatterplot of y against
log x.
Use / + to return to the
scatterplot created earlier and
change the independent variable
to lx.
A scatterplot of y against lx (i.e. the log of x)
is displayed, as shown. The plot is clearly linear.
How to apply the log transformation using the ClassPad
Plot the data presented in the table below.
x 1 10 100 400 600 1000
y 0 10 20 25 28 30
Apply a log transformation to the x values (log(x)) and replot the data.
Steps
1 Open the Statistics
application and enter
the data into the
columns named x and
y. Your screen should
look like the one
shown opposite.
2 Construct a scatterplot
of y against x. Let x be
the independent
variable and y the
dependent variable.
The plot is clearly
non-linear.
3 To calculate the values
of log x and store them
in a list named lx (short
for log x), complete the
following:
a. Tap to highlight the
cell at the top of the
next empty list (in
this case, list3).
Rename by typing
lx and pressing
enter .
b. Tap to highlight the
cell at the bottom of
the newly named lx
column (in the row
titled Cal ).
Typing log(x) and
pressing
calculates and lists
the values of log x.
Note: To ensure decimal values are displayed, Decimal should be visible in the status bar
(at the bottom). If Standard is visible, tap Standard and it will change to Decimal.
4 Construct a scatterplot
of y against lx (i.e.
log x). The plot is
clearly linear.
Example 3 Linearising the relationship with the 1/x transformation
a Plot the data in the table, and comment on the form of the relationship between x and y.
x 1 2 3 4 5
y 30 15 10 7.5 6
b Apply a reciprocal transformation to the x values (1/x), again plot the data, and comment on
the form of the relationship between y and 1/x.
Solution
a 1 Plot the values of y against x.
[Scatterplot of y against x]
2 Decide if the form of the relationship is linear or non-linear.
3 Write down your conclusion.
The relationship between y and x is non-linear.
b 1 Construct a new table of values.
1/x 1.0 0.5 0.33 0.25 0.2
y 30 15 10 7.5 6
2 Plot the values of y against 1/x.
[Scatterplot of y against 1/x]
3 Decide if the form of the relationship is linear or non-linear.
4 Write down your conclusion.
The relationship between y and 1/x is linear.
Once again, this transformation is very conveniently carried out with the aid of a graphics
calculator.
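One simple way to see why the transformed plot in Example 3 is a straight line is to compare the gradient between each pair of neighbouring points, since points on a line share a common gradient. The Python sketch below does this before and after the reciprocal transformation. The gradients helper is a hypothetical name of our own, and this check is an illustration rather than part of the worked solution.

```python
def gradients(xs, ys):
    """Gradient of the segment joining each pair of neighbouring points."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]

x = [1, 2, 3, 4, 5]
y = [30, 15, 10, 7.5, 6]
recx = [1 / v for v in x]  # the list called recx in the calculator steps

raw_gradients = gradients(x, y)       # change from segment to segment: a curve
rec_gradients = gradients(recx, y)    # essentially constant: a straight line

print("y against x:  ", [round(g, 2) for g in raw_gradients])
print("y against 1/x:", [round(g, 2) for g in rec_gradients])
```

For these data every gradient of y against 1/x is 30 (up to rounding error in the arithmetic), which is consistent with the exact rule y = 30 × (1/x).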
How to apply the reciprocal transformation using the TI-Nspire CAS
Plot the data presented in the table below.
x 1 2 3 4 5
y 30 15 10 7.5 6
Apply a reciprocal transformation to the x values (1/x) and replot the data.
Steps
1 Start a new document by pressing / + N .
2 Select 3:Add Lists & Spreadsheet.
Enter the data into lists named x and y,
as shown opposite.
3 Press c and select 5:Data & Statistics.
Construct a scatterplot of y against x.
Let x be the independent variable and y
the dependent variable. The plot is
clearly non-linear.
4 Return to the Lists & Spreadsheet
application (by pressing / + ).
To calculate the values of 1/x, complete the
following:
a Move the cursor to the top of column
C and type recx (short for the
reciprocal of x). Press enter .
b Move the cursor to the grey cell
immediately below the recx heading
and type = 1 ÷, then press VAR
( ) and highlight the variable x and
press enter to paste into the formula
line. Press enter to calculate
and display the 1/x values.
If your answers are not presented as
decimals, refer to the Appendix to
change Mode settings to APPRX.
5 Construct a scatterplot of y against 1/x (i.e. recx).
Use / + to return to the scatterplot
created earlier and change the independent
variable to recx.
A scatterplot of y against recx (the
reciprocal of x) is displayed as shown. The
plot is clearly linear.
How to apply the reciprocal transformation using the ClassPad
Plot the data presented in the table below.
x 1 2 3 4 5
y 30 15 10 7.5 6
Apply a reciprocal transformation to the x values (1/x) and replot the data.
Steps
1 Open the Statistics application
and enter the data into the
columns named x and y. Your
screen should look like the one
shown opposite.
2 Construct a scatterplot of y
against x. Let x be the
independent variable and y the
dependent variable. The plot is
clearly non-linear.
3 To calculate the values of 1/x and
store them in a list named recx
(short for the reciprocal of x),
complete the following:
a Tap to highlight the cell at the
top of the next empty list (in
this case, list3). Rename by
typing recx and pressing enter .
b Tap to highlight the cell at the
bottom of the newly named
recx column (in the row titled
Cal ). Typing 1 ÷ x and
pressing calculates and
lists the 1/x values.
4 Construct a scatterplot of y
against recx (i.e. 1/x). The plot is
clearly linear.
What sorts of non-linear relationships can we linearise using the x^2 transformation?
The x^2 transformation has the effect of stretching out the upper end of the x scale. As a guide,
relationships that have scatterplots which look like those shown below can often (but not
always) be linearised using the x to x^2 transformation. Note that for the x^2 transformation to
apply, the scatterplot should peak or bottom around x = 0.
[Two example scatterplots of y against x]
What sorts of non-linear relationships can we linearise using the log x transformation?
The log x transformation has the effect of compressing the upper end of the x scale. As a guide,
relationships that have scatterplots which look like those shown below can often (but not
always) be linearised using the x to log x transformation.
[Two example scatterplots of y against x]
What sorts of non-linear relationships can we linearise using the 1/x transformation?
As a guide, relationships that have scatterplots which look like those shown below can often
(but not always) be linearised using the x to 1/x transformation.
[Two example scatterplots of y against x]
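Because these guides only say that a transformation can often (but not always) work, it can help to try each candidate and compare the results. The following Python sketch is one hedged way of doing so: it applies each x transformation that is defined for the data and reports Pearson's r for the transformed plot. The function names are hypothetical, and choosing the transformation with the largest value of |r| is a rough rule of thumb of our own, not a method given in this chapter; the scatterplot remains the final check.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two lists of equal length."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def compare_x_transformations(x, y):
    """Return {name: r} for each x transformation that is defined for the data."""
    candidates = {"x^2": lambda v: v ** 2}
    if min(x) > 0:                 # log x needs every x value to be greater than 0
        candidates["log x"] = math.log10
    if all(v != 0 for v in x):     # 1/x is undefined when x = 0
        candidates["1/x"] = lambda v: 1 / v
    return {name: pearson_r([f(v) for v in x], y) for name, f in candidates.items()}

def strongest(results):
    """Name of the transformation whose r is furthest from 0."""
    return max(results, key=lambda name: abs(results[name]))

# Data from Example 2 of this section.
x = [1, 10, 100, 400, 600, 1000]
y = [0, 10, 20, 25, 28, 30]
results = compare_x_transformations(x, y)
for name, r in results.items():
    print(f"{name:>6}: r = {r:.3f}")
print("Strongest linear association:", strongest(results))
```

For the Example 2 data this picks out log x, in agreement with the worked solution.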
Exercise 6B
These exercises are expected to be completed with the aid of a graphics calculator.
1 a Plot the data in the table, and comment on the form of the relationship between y and x.
x 0 1 2 3 4
y 16 15 12 7 0
b Apply a squared transformation to the x values (x^2), again plot the data, and comment on
the form of the relationship between y and x^2.
2 a Plot the data in the table, and comment on the form of the relationship between y and x.
x 1 2 3 4 5
y 3 9 19 33 51
b Apply a squared transformation to the x values (x^2), again plot the data, and comment on
the form of the relationship between y and x^2.
3 a Plot the data in the following table, and
comment on the form of the relationship
between y and x.
x 1 2 3 4 5
y 30 27 22 15 6
b Apply a squared transformation to the x values (x^2), again plot the data, and comment
on the form of the relationship between y and x^2.
4 a Plot the data in the following table,
and comment on the form of the
relationship between y and x.
x 1 10 100 400 600 1000
y 30 20 10 5 2 0
b Apply a log transformation to the x values (log x), again plot the data, and comment on
the form of the relationship between y and log x.
5 a Plot the data in the table, and
comment on the form of the
relationship between y and x.
x 5 10 150 500 1000
y 3.1 4.0 7.5 9.1 10.0
b Apply a log transformation to the x values (log x), again plot the data, and comment on
the form of the relationship between y and log x.
6 a Plot the data in the table, and
comment on the form of the
relationship between y and x.
x 10 44 132 436 981
y 15.0 11.8 9.4 6.8 5.0
b Apply a log transformation to the x values (log x), again plot the data, and comment on
the form of the relationship between y and log x.
7 a Plot the data in the table, and comment on the form of the relationship between y and x.
x 2 4 6 8 10
y 60 30 20 15 12
b Apply a reciprocal transformation to the x values (1/x), again plot the data and comment
on the form of the relationship between y and 1/x.
8 a Plot the data in the table, and comment on the form of the relationship between y and x.
x 1 2 3 4 5
y 61 31 21 16 13
b Apply a reciprocal transformation to the x values (1/x), again plot the data and comment
on the form of the relationship between y and 1/x.
9 a Plot the data in the following table, and
comment on the form of the relationship
between y and x.
x 2 4 6 8 10
y 10 70 90 100 106
b Apply a reciprocal transformation to the x values (1/x), again plot the data and comment
on the form of the relationship between y and 1/x.
c Name an x-axis transformation that should also work for this data. Try it and see.
d Name an x-axis transformation that should not work for this data. Try it and see.
10 The table below shows the diameter (in cm) of a number of umbrellas, along with the
number of people each umbrella is designed to keep dry.
Diameter 50 70 85 100 110
Number of people 1 2 3 4 5
a Construct a scatterplot showing the relationship between number of people and
umbrella diameter, and comment on the form.
b Apply a squared transformation to the x values (x^2), again plot the data, and comment
on the form of the relationship between y and x^2.
11 The table below shows the performance level on a task of a number of people, along with
the time spent (in minutes) in practising the task.
Time spent on practice 0.5 1.0 1.5 2.0 3.0 4.0 5.0 6.0 7.0 7.0
Level of performance 1.0 1.5 2.0 3.0 3.0 3.5 4.0 3.5 3.9 3.6
a Construct a scatterplot showing the relationship between the time spent on practice and
level of performance, and comment on the form.
b Apply a log transformation to the x values (log x), again plot the data, and comment on
the form of the relationship between y and log x.
12 The table below shows the horsepower of several cars, along with their fuel consumption in
kilometres/litre.
Fuel consumption 5.2 7.3 12.6 7.1 6.3 10.1 10.5 14.6 10.9 7.7
Horsepower 155 125 75 110 138 88 80 70 100 103
a Construct a scatterplot showing the relationship between horsepower and fuel
consumption, and comment on the form.
b Apply a reciprocal transformation to the x values (1/x), again plot the data and comment
on the form of the relationship between y and 1/x.
6.3 Transforming the y axis
Another way to linearise the relationship between x and y is to apply these transformations to
the y axis. Transforming the y axis will have the effect of moving the y values on the plot
vertically, and leave the x values unaltered. The square, log and reciprocal transformations can
be applied to the y axis with the following effects:
Transformation   Outcome
y^2              Spreads out the large y values relative to the smaller data values.
log y            Compresses large y values relative to the smaller data values.
1/y              Also compresses large y values relative to the smaller data values, to a
                 greater extent than log y. Note that values of y less than 1 become greater
                 than 1, and values of y greater than 1 become less than 1, so that the order
                 of the data values is reversed.
[Graphs of y against x illustrating each transformation]
The following examples show the effect on the relationship between x and y when the squared,
log and reciprocal transformations are applied to the y values. Once again, all these data
transformations can be very conveniently carried out with the aid of a graphics calculator.
Example 4 Linearising the relationship with a squared transformation
a Plot the data in this table, and comment on the form of the relationship between y and x.
x 0 1 2 3 4 5
y 0 3.2 4.5 5.5 6.3 7.1
b Apply a squared transformation to the y values (y^2), again plot the data, and comment on
the form of the relationship between y^2 and x.
Solution
a 1 Plot the values of y against x.
[Scatterplot of y against x]
2 Decide if the form of the relationship is linear or non-linear.
3 Write down your conclusion.
The relationship between y and x is non-linear.
b 1 Construct a new table of values.
x 0 1 2 3 4 5
y^2 0 10.2 20.3 30.3 39.7 50.4
2 Plot the values of y^2 against x.
[Scatterplot of y^2 against x]
3 Decide if the form of the relationship is linear or non-linear.
4 Write down your conclusion.
The relationship between y^2 and x is linear.
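Because the x values in Example 4 go up in equal steps of 1, a quick numerical check is to look at the change in the response from one point to the next: for a straight line these changes are roughly constant. The Python sketch below compares the step-to-step changes in y and in y^2. The helper first_differences is a hypothetical name, and this check is our own illustration rather than part of the solution.

```python
def first_differences(values):
    """Change from each value to the next one in the list."""
    return [values[i + 1] - values[i] for i in range(len(values) - 1)]

x = [0, 1, 2, 3, 4, 5]            # equally spaced, so differences are comparable
y = [0, 3.2, 4.5, 5.5, 6.3, 7.1]
ysq = [v ** 2 for v in y]         # the squared transformation applied to y

y_diffs = first_differences(y)      # shrink as x increases: a curve
ysq_diffs = first_differences(ysq)  # all close to 10: roughly a straight line

print("changes in y:  ", [round(d, 2) for d in y_diffs])
print("changes in y^2:", [round(d, 2) for d in ysq_diffs])
```

The changes in y fall steadily, while the changes in y^2 all stay close to 10, so y^2 increases by about 10 for each unit increase in x.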
Example 5 Linearising the relationship with the log transformation
a Plot the data in this table, and comment on the form of the relationship between y and x.
x 0 1 2 3 4 5
y 100 37 14 5 2 1
b Apply a log transformation to the y values (log y), again plot the data, and comment on
the form of the relationship between log y and x.
Solution
a 1 Plot the values of y against x.
[Scatterplot of y against x]
2 Decide if the form of the relationship is linear or non-linear.
3 Write down your conclusion.
The relationship between y and x is non-linear.
b 1 Construct a new table of values.
x 0 1 2 3 4 5
log y 2.00 1.57 1.15 0.70 0.30 0.00
2 Plot the values of log y against x.
[Scatterplot of log y against x]
3 Decide if the form of the relationship is linear or non-linear.
4 Write down your conclusion.
The relationship between log y and x is linear.
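Once log y has been shown to be linear in x, a line can be fitted to the transformed data and then converted back to the original units. The Python sketch below illustrates the idea for the Example 5 data. The helpers least_squares and predict_y are hypothetical names of our own, the fit is an assumption-laden illustration rather than a result quoted from the text, and log is again taken to be base 10.

```python
import math

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
             / sum((a - mean_x) ** 2 for a in xs))
    return slope, mean_y - slope * mean_x

x = [0, 1, 2, 3, 4, 5]
y = [100, 37, 14, 5, 2, 1]
log_y = [math.log10(v) for v in y]  # the log transformation applied to y

slope, intercept = least_squares(x, log_y)

def predict_y(x_value):
    """Undo the log transformation: y = 10 ** (intercept + slope * x)."""
    return 10 ** (intercept + slope * x_value)

print("log y:", [round(v, 2) for v in log_y])
print(f"log y = {intercept:.2f} + ({slope:.2f}) * x")
print("predicted y when x = 2:", round(predict_y(2), 1))
```

The point of the back-transformation is that the straight line is fitted on the log scale, but predictions are usually wanted in the original units of y.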
Example 6 Linearising the relationship with the 1/y transformation
a Plot the data in this table, and comment on the form of the relationship between y and x.
x 1 2 3 4 5
y 10.0 5.0 3.3 2.5 2.0
b Apply a reciprocal transformation to the y values (1/y), again plot the data, and comment on the form of the relationship between x and 1/y.
Solution
a 1 Plot the values of y against x.
2 Decide if the form of the relationship
is linear or non-linear.
[Scatterplot of y against x]
3 Write down your conclusion.
The relationship between y and x is non-linear.
b 1 Construct a new table of values.
x 1 2 3 4 5
1/y 0.1 0.2 0.3 0.4 0.5
2 Plot the values of 1/y against x.
3 Decide if the form of the relationship is linear or non-linear.
[Scatterplot of 1/y against x]
4 Write down your conclusion.
The relationship between 1/y and x is linear.
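Once the reciprocal transformation has straightened the data in Example 6, a least squares line can be fitted to the transformed values. The sketch below goes one step beyond what the example asks, and the helper name `least_squares` is hypothetical.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Data from Example 6
x = [1, 2, 3, 4, 5]
y = [10.0, 5.0, 3.3, 2.5, 2.0]

# Apply the reciprocal transformation to the y values
recip_y = [1 / v for v in y]

intercept, slope = least_squares(x, recip_y)
print(f"1/y = {intercept:.3f} + {slope:.3f} x")
```

A slope close to 0.1 and an intercept close to 0 suggest that 1/y is approximately 0.1x for this data, that is, y is approximately 10/x.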
What sorts of non-linear relationships can we linearise using the y² transformation?
The y² transformation has the effect of stretching out the upper end of the y scale. As a guide,
relationships that have scatterplots which look like those shown below can often (but not
always) be linearised using the y to y² transformation. Note that for the y² transformation to
apply, the scatterplot should peak or bottom around y = 0.
[Figure: two example scatterplots of y against x]
What sorts of non-linear relationships can we linearise using the log y transformation?
The log y transformation has the effect of compressing the upper end of the y scale. As a guide,
relationships that have scatterplots which look like those shown below can often (but not
always) be linearised using the y to log y transformation.
[Figure: two example scatterplots of y against x]
What sorts of non-linear relationships can we linearise using the 1/y transformation?
As a guide, relationships that have scatterplots which look like those shown below can often
(but not always) be linearised using the y to 1/y transformation.
[Figure: two example scatterplots of y against x]
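The stretching and compressing described above can be seen by transforming a set of equally spaced values and looking at the gaps between successive results. The values below are hypothetical, chosen purely for illustration, and the helper name `gaps` is not from the text.

```python
import math

# Hypothetical, equally spaced y values chosen purely for illustration
y = [1, 2, 3, 4, 5]

def gaps(values):
    """Sizes of the gaps between successive values."""
    return [abs(b - a) for a, b in zip(values, values[1:])]

squared_gaps = gaps([v ** 2 for v in y])
log_gaps = gaps([math.log10(v) for v in y])
reciprocal_gaps = gaps([1 / v for v in y])

print("y^2  :", squared_gaps)
print("log y:", [round(g, 3) for g in log_gaps])
print("1/y  :", [round(g, 3) for g in reciprocal_gaps])
```

The gaps grow towards the upper end of the scale for y² (stretching) and shrink for log y and 1/y (compressing). Note that 1/y also reverses the order of the values.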
Exercise 6C
These exercises are expected to be completed with the aid of a graphics calculator.
1 a Plot the data in the table. Comment on the form of the relationship between y and x.
x 0 2 4 6 8 10
y 1.2 2.8 3.7 4.5 5.1 5.7
b Apply a squared transformation to the y values (y²). Plot the data, and comment on the form of the relationship between y² and x.
2 a Plot the data in the table. Comment on
the form of the relationship between
y and x.
x 5 10 15 20 25 30
y 13.2 12.2 11.2 10.0 8.7 7.1
b Apply a squared transformation to the y values (y²). Plot the data, and comment on the
form of the relationship between y² and x.
3 a Plot the data in the table. Comment on
the form of the relationship between
y and x.
x 2 6 11 12 21 40
y 5.1 6.2 7.3 7.5 9.1 11.8
b Apply a squared transformation to the y values (y²). Plot the data and comment on the
form of the relationship between y² and x.
4 a Plot the data in the table. Comment on the form of the relationship between y and x.
x 0.1 0.2 0.3 0.4 0.5
y 15.8 25.1 39.8 63.1 100.0
b Apply a log transformation to the y values (log y). Plot the data and comment on the form of the relationship between log y and x.
5 a Plot the data in the table. Comment on the form of the relationship between y and x.
x 2 4 6 8 10
y 7.94 6.31 5.01 3.98 3.16
b Apply a log transformation to the y values (log y). Plot the data and comment on the form of the relationship between log y and x.
6 a Plot the data in the table. Comment on the form of the relationship between y and x.
x 1 3 5 7 9
y 7 32 147 681 3162
b Apply a log transformation to the y values (log y). Plot the data, and comment on the form of the relationship between log y and x.
7 a Plot the data in the table. Comment on the form of the relationship between y and x.
x 1 2 3 4 5
y 1 0.5 0.33 0.25 0.20
b Apply a reciprocal transformation to the y values (1/y). Plot the data and comment on the form of the relationship between 1/y and x.
8 a Plot the data in the table. Comment on the form of the relationship between y and x.
x 0.2 0.4 0.6 0.8 1.0
y 0.71 0.56 0.45 0.38 0.33
b Apply a reciprocal transformation to the y values (1/y). Plot the data and comment on the form of the relationship between 1/y and x.
9 a Plot the data in the table. Comment
on the form of the relationship
between y and x.
x 11 14 26 35 41
y 0.43 0.34 0.19 0.14 0.12
b Apply a reciprocal transformation to the y values (1/y). Plot the data and comment on
the form of the relationship between 1/y and x.
c Name a y-axis transformation that should also work for this data. Try it and see.
d Name a y-axis transformation that should not work for this data. Try it and see.
10 The time taken for a local anaesthetic to take effect is related to the dose given. To
investigate this relationship a researcher collected the data shown.
Dose 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5
Time 3.67 3.55 3.42 3.29 3.15 3.00 2.85 2.68 2.51 2.32 2.12
a Construct a scatterplot showing the relationship between the dose of anaesthetic and the
time taken for it to take effect, and comment on the form.
b Apply a squared transformation to the time values (y), again plot the data, and comment
on the form of the relationship between time squared (y²) and dose (x).
11 The table below shows the number of internet users signing up with a new internet service
provider for each of the first nine months of their first year of operation.
Number 24 32 35 44 60 61 78 92 118
Month 1 2 3 4 5 6 7 8 9
a Construct a scatterplot showing the relationship between number of users signing up
and month, and comment on the form. Month is the independent variable.
b Apply a log transformation to the number of users (y), again plot the data, and comment
on the form of the relationship between log (number) and month (x).
12 A group of ten students was given an opportunity to practise a complex matching task as
often as they liked before they were assessed on the task. The number of times they
practised the task and the number of errors made when assessed are given in the table
below.
Number 1 2 2 4 5 6 7 7 9 11
Errors 14 9 11 5 4 4 3 3 2 2
a Construct a scatterplot showing the relationship between number of practices and
number of errors (y), and comment on the form.
b Apply a reciprocal transformation to the number of errors values (1/y), again plot the
data, and comment on the form of the relationship between 1/number of errors and
number of practices.
6.4 Choosing and applying the appropriate transformation
Putting together the information in Sections 6.2 and 6.3, we can see that there may be more
than one transformation which linearises the scatterplot. The forms of the scatterplots that can
be transformed by the squared, log or reciprocal transformations can be largely classified into
one of four categories, shown as the circle of transformations.
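[Figure: The circle of transformations. Each quadrant shows a scatterplot shape together with the possible transformations for that shape.]
Top left: y², log x, 1/x
Top right: y², x²
Bottom left: log y, 1/y, log x, 1/x
Bottom right: log y, 1/y, x²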
Note that the transformations we have introduced in this chapter are only able to linearise
relationships which are consistently increasing or decreasing.
The advantage of having alternatives is that in practice, we can always try each of them to
see which gives us the best result. How do we decide which transformation is the best? The
best transformation is the one that results in the best linear model. To choose the best linear
model we will consider for each transformation applied:
the residual plot, in order to evaluate the linearity of the transformed relationship
the value of the coefficient of determination (r²): a higher value indicates a better fit
This procedure is illustrated in Example 7.
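For reference, the two quantities used to judge each transformation can be written out as follows. The notation here is an assumption made for this sketch rather than the chapter's own: a and b denote the intercept and slope of the least squares line fitted to the (possibly transformed) data, e_i is the residual for the i-th data point, and ȳ is the mean of the dependent-variable values.

```latex
e_i = y_i - (a + b x_i), \qquad
r^2 = 1 - \frac{\sum_i e_i^2}{\sum_i (y_i - \bar{y})^2}
```

A residual plot shows the e_i against the independent variable. The smaller the residuals are relative to the overall variation in y, the closer r² is to 1.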
Example 7
The data in this table gives life expectancy in years and gross national product, GNP, in dollars
for 24 countries in 1982. Using an appropriate transformation, find a regression model for the
relationship between life expectancy in years and GNP.
Country GNP Life expectancy
Nicaragua 950 58
Paraguay 1670 65
Venezuela 4250 68
France 11 520 74
West Germany∗ 12 280 73
Greece 4170 73
Norway 14 300 75
Czechoslovakia∗ 5540 71
Austria 9830 72
Jordan 1680 61
Sri Lanka 320 67
Brunei 22 260 66
Indonesia 550 50
North Korea 930 66
Mongolia 940 64
Taiwan 2 670 72
Australia 11 220 74
Congo 1 420 48
Ethiopia 150 41
Guinea 330 44
Mauritania 520 44
Nigeria 940 49
Togo 350 48
Zaire 180 48
Source: Modern Data Analysis: A First Course in Applied Statistics, L.C. Hamilton 1990, p. 537
∗ West Germany is now part of Germany; Czechoslovakia is now the Czech Republic and Slovakia
Solution
1 Decide which of the variables is the
independent variable, and which is the
dependent variable.
The independent variable is GNP. The
dependent variable is Life expectancy.
2 Plot the values of y against x, decide if the form of the relationship is linear or
non-linear, and find the value of the coefficient of determination (r²).
[Scatterplot of Life expectancy against GNP]
3 Write down your conclusion.
The relationship between y and x is non-linear: r² = 36.7%.
4 Compare the shape of this plot to those in the circle of transformations (Section 6.4). The
scatterplot is similar to the plot in the top left-hand corner. Thus, y², log x and 1/x
are the transformations to investigate.
Suitable transformations are y², log x and 1/x.
Try the y² transformation
5 Calculate the values of (Life expectancy)² and plot these against GNP. Comment
on the linearity of the plot.
[Scatterplot of (Life expectancy)² against GNP]
6 Fit a regression line, and find the value of the coefficient of determination (r²).
Produce a residual plot, and use this to comment on the form of the relationship.
r² = 38.4%. The relationship between (Life expectancy)² and x is still non-linear. This is
confirmed by the residual plot.
[Residual plot: residual against GNP]
7 Comment on the effect of the transformation.
The y² transformation has not really helped.
Try the log x transformation
5 Calculate the values of log GNP and plot Life expectancy against these.
[Scatterplot of Life expectancy against log GNP]
6 Fit a regression line, and find the value of r². Produce a residual plot, and use this to
comment on the form of the relationship.
r² = 66.0%. The relationship between Life expectancy and log GNP is closer to linear.
This is confirmed by the residual plot.
[Residual plot: residual against log GNP]
7 Comment on the effect of the transformation.
The log x transformation has linearised the relationship quite well.
Try the 1/x transformation
8 Calculate the values of 1/GNP and plot Life expectancy against 1/GNP.
[Scatterplot of Life expectancy against 1/GNP]
9 Fit a regression line, and find the value of r². Produce a residual plot and use this to
comment on the form of the relationship.
r² = 51.5%. The relationship between Life expectancy and 1/GNP is reasonably linear.
This is confirmed by the residual plot.
[Residual plot: residual against 1/GNP]
10 Comment on the effect of the transformation.
The 1/x transformation has done a reasonable job in linearising the relationship.
11 Decide which transformation is the most appropriate for this relationship.
Choose the transformation which gives the most linear relationship
(from the residual plots) and the highest value of r².
The most appropriate transformation to use here is the log x transformation, as the
residual plot shows that the relationship between log GNP and Life expectancy is
linear, and this model has the highest coefficient of determination, r² = 66.0%.
12 As the relationship between Life expectancy
and log (GNP) appears to be linear and there
are no obvious outliers, we can use the
least squares method to fit a line to the data.
Using a calculator, find the equation of the
least squares regression line and write it in
terms of the transformed variables.
Using the log x transformation gives a
regression model for the relationship:
Life expectancy = 14.3 + 14.5 × log(GNP)
Note: The independent variable (IV) is now log GNP, and the dependent variable (DV) is Life expectancy.
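The log x step of Example 7 can be sketched in code as a cross-check on the calculator result. This is an illustrative sketch only: the function name `fit` and the variable names are hypothetical, "log" is taken to be base 10, and the data are transcribed from the table at the start of the example.

```python
import math

# GNP (dollars) and life expectancy (years) for the 24 countries in Example 7
gnp = [950, 1670, 4250, 11520, 12280, 4170, 14300, 5540, 9830, 1680, 320, 22260,
       550, 930, 940, 2670, 11220, 1420, 150, 330, 520, 940, 350, 180]
life = [58, 65, 68, 74, 73, 73, 75, 71, 72, 61, 67, 66,
        50, 66, 64, 72, 74, 48, 41, 44, 44, 49, 48, 48]

def fit(xs, ys):
    """Least squares line of ys on xs: returns (intercept, slope, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy ** 2 / (sxx * syy)

# Transform the independent variable, then fit the line
log_gnp = [math.log10(g) for g in gnp]
intercept, slope, r2 = fit(log_gnp, life)
print(f"Life expectancy = {intercept:.1f} + {slope:.1f} x log(GNP), r^2 = {r2:.1%}")

# Residuals from the fitted line; a residual plot shows these against log(GNP)
residuals = [y - (intercept + slope * x) for x, y in zip(log_gnp, life)]
```

The same `fit` function could be applied to the other candidate transformations, for example `fit(gnp, [v ** 2 for v in life])` for y² or `fit([1 / g for g in gnp], life)` for 1/x, to compare their r² values.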
Some comments
It might seem ‘unnatural’ to talk about the wealth of a country in terms of log (GNP), yet when
we are comparing the relative wealth of countries, log (GNP) is probably a more useful measure
than GNP. For instance, knowing that the difference in GNP between Australia and Sri Lanka
is $10 900 is less informative than knowing that the difference in log (GNP) is 1.5448, which
tells us that Australia’s GNP is 10^1.5448, or about 35 times, that of Sri Lanka. ‘Natural’ units of
measurement are more often those that are familiar rather than those that are most useful!
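The arithmetic behind this comparison is short enough to check directly. The sketch below uses the GNP figures for Australia and Sri Lanka from the table in Example 7 and assumes base-10 logarithms; the variable names are illustrative.

```python
import math

gnp_australia = 11220  # from the table in Example 7
gnp_sri_lanka = 320    # from the table in Example 7

difference = gnp_australia - gnp_sri_lanka
log_difference = math.log10(gnp_australia) - math.log10(gnp_sri_lanka)

# A difference of d in log10 units corresponds to a ratio of 10**d
ratio = 10 ** log_difference

print(difference, round(log_difference, 4), round(ratio, 1))
```

A difference on the log scale is the same thing as a ratio on the original scale, which is why it is the more natural way to compare relative wealth.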
Exercise 6D
1 The following scatterplots show non-linear relationships. For each scatterplot, state which
of the transformations x², log x, 1/x, y², log y, 1/y, if any, you would apply to linearise the
relationship.
a [Scatterplot of y against x]
b [Scatterplot of y against x]
c [Scatterplot of y against x]
d [Scatterplot of y against x]
2 The data below gives the yield in kilograms and length in metres of twelve commercial
potato plots.
Yield (kilograms) 346 1798 152 86 436 968 686 257 2435 287 1850 1320
Length (metres) 12.1 27.4 8.3 5.5 15.7 21.5 19.3 9.0 34.2 14.7 31.9 25.3
a Construct a scatterplot showing the relationship between yield and length of plot and
comment on the form.
b Using an appropriate transformation, find a regression model for the relationship
between yield in kilograms and length of plot in metres.
3 A recent study in Canada showed that cigarette consumption (per day) is related to cost per
pack. Some data drawn from that study is shown opposite.
Cost ($) 4.00 4.50 4.80 5.50 6.00 6.50 7.50
Cigarette consumption 8.0 7.4 7.0 6.4 5.9 5.5 5.0
a Construct a scatterplot showing the relationship between the cost of cigarettes and
cigarette consumption, and comment on the form.
b Using an appropriate transformation find a regression model for the relationship between
the cost of cigarettes and cigarette consumption.
4 The population of a large town increased over a 13 year period as shown in the table.
Year Population Year Population
1 58860 8 61726
2 57770 9 60387
3 58206 10 61646
4 59513 11 62347
5 59983 12 64185
6 60123 13 67158
7 59763
a Construct a scatterplot showing the
annual population growth of the town,
and comment on the form.
b Using an appropriate transformation,
find a regression model for the annual
population growth of the town.
5 The monthly average exchange rate (to the nearest cent) between the Australian dollar and
the US dollar over a period of 18 months in the 1990s is given in the table below.
Month Exchange rate Month Exchange rate
(US $) (US $)
1 0.77 10 0.75
2 0.77 11 0.75
3 0.77 12 0.72
4 0.76 13 0.72
5 0.78 14 0.69
6 0.78 15 0.69
7 0.77 16 0.68
7 0.76 17 0.71
8 0.76 18 0.70
9 0.76
a Construct a scatterplot showing
the exchange rate over the 18-month
period, and comment on the form.
b Using an appropriate transformation,
find a regression model for the
exchange rate over that 18-month
period.
6 The table below shows the percentage of people who can read (literacy rate) and the gross
domestic product (GDP) for a selection of 14 countries.
Country Literacy rate (%) Gross domestic product/capita
Botswana 72 2677
Cambodia 35 260
Canada 97 19904
Ethiopia 24 122
France 99 18944
Georgia 99 4500
Germany 99 17539
Honduras 73 1030
Japan 99 19860
Liberia 40 409
Pakistan 35 406
Saudi Arabia 62 6651
Switzerland 99 22384
Syria 64 2436
a Construct a scatterplot showing the
relationship between literacy rate
and GDP, and comment on the form.
b Using an appropriate transformation
find a regression model for the
relationship between literacy rate
and GDP for this group of countries.
Key ideas and chapter summary
Data transformation: This means changing the scale on either the x or y axis. It is
performed when a residual plot shows that the underlying
relationship in a set of bivariate data is clearly non-linear.
x² or y² transformation: The square transformation stretches out the upper end of
the scale on an axis.
log x or log y transformation: The log transformation compresses the upper end of the
scale on an axis.
1/x or 1/y transformation: The reciprocal transformation compresses the upper end
of the scale on an axis to a greater extent than the log
transformation.
Residual plots: Residual plots are used to assess the effectiveness of each
data transformation.
Coefficient of determination (r²): The transformation which results in a linear relationship
and which has the highest value of the coefficient of
determination is considered to be the best transformation.
The circle of transformations: The circle of transformations provides guidance in
choosing the transformations that can be used to linearise
various types of scatterplots. See Section 6.4.
Skills check
Having completed this chapter you should be able to:
recognise which of the x², log x, 1/x, y², log y or 1/y transformations might be used to
linearise a bivariate relationship
apply each of these transformations to a data set
use residual plots and the coefficient of determination r² to decide which
transformation gives the best model for the relationship
use the transformed variable as part of a regression analysis to give a model for the
relationship
Multiple-choice questions
1 The missing data values, a and b, in the table are:
value 1 2 3 4
(value)² a 4 9 16
log(value) 0 b 0.477 0.602
A a = 0, b = 0.5 B a = 1, b = 0.5 C a = 1, b = 0.301
D a = 1, b = 0.602 E a = 1, b = 0.693
2 Select the statement which correctly completes the sentence:
‘The effect of a log transformation is to . . .’
A stretch the high values in the data B maintain the distance between values
C stretch the low values in the data D compress the high values in the data
E reverse the order of the values in the data
3 The scatterplot opposite shows the
relationship between the number of weeks
each person has been on a diet program and
their weight loss in kilograms for a group
of subjects. A least squares regression
line has been fitted to the data.
[Scatterplot of Weight loss against Number of weeks on a diet, with the least squares regression line]
The residual plot for this least squares line would look like:
A [Plot with Residual on the horizontal axis and Number of weeks on a diet on the vertical axis]
B [Plot of Residual against Number of weeks on a diet]
C [Plot with Weight loss on the horizontal axis and Number of weeks on a diet on the vertical axis]
D [Plot of Weight loss against Number of weeks on a diet]
E [Plot of Residual against Number of weeks on a diet]
4 The relationship between two variables y and
x as shown in the scatterplot is non-linear.
[Scatterplot of y against x]
In an attempt to transform the relationship to
linearity, a student would be advised to:
A leave out the first four points
B use a y² transformation
C use a log y transformation
D use a 1/y transformation
E use a least squares regression line
5 The relationship between two variables y and x as shown in the scatterplot is
non-linear.
[Scatterplot of y against x]
Which of the following sets of transformations
could possibly linearise this relationship?
A log y, 1/y, log x, 1/x
B y², x²
C y², log x, 1/x
D log y, 1/y, x²
E ax + b
6 The relationship between two variables y and x as shown in the scatterplot is
non-linear.
[Scatterplot of y against x]
Which of the following transformations is most
likely to linearise the relationship?
A a 1/x transformation B a y² transformation
C a log y transformation D a 1/y transformation
E a log x transformation
7 The relationship between two variables y and x as shown in the scatterplot is clearly
non-linear.
[Scatterplot of y against x]
In an attempt to transform the relationship
to linearity, a student would be advised to
apply:
A an x² transformation
B a y² transformation
C a log y transformation
D a 1/y transformation
E none of these
8 Brian has determined from a scatterplot of his data that the appropriate
transformations for his data are log x, 1/x and y². After applying each of these
transformations to the data, he obtains the results shown below.
Model Residuals r²
y vs x Curved 79.6%
y vs log x Random 80.8%
y vs 1/x Random 81.9%
y² vs x Random 88.4%
Based on the information in the table, which transformation would you suggest
Brian use?
A an x² transformation B a y² transformation C a log x transformation
D a 1/x transformation E no transformation
9 When investigating the relationship between the weight of the strawberries picked
from a strawberry patch, and the width of the patch, Suzie decides that an x²
transformation is appropriate. After transforming the data, she fits a least squares
regression line to the data and determines that the intercept is 10 and the slope is 5.
Based on this information, the model that Suzie has fitted to the data can be written
as:
A (weight)² = 10 + 5 × width B weight = 5 + 10 × (width)²
C weight = 10 + 5 × (width)² D (weight)² = 10 + 5 × (width)²
E (weight)² = 5 + 10 × width
10 Suppose that the relationship between the hours spent
studying for an exam and the mark achieved can be modelled by the equation:
Mark = 20 + 40 × log (Hours)
From this model, we would predict that a student who studies for 20 hours would
score a mark (to the nearest whole number) of:
A 80 B 78 C 180 D 72 E 140
Extended-response questions
1 Measurements of distance travelled in metres and time taken in seconds were made
on a falling body. The data is given in the table below.
Time 0 1 2 3 4 5 6
Distance 0 5.2 18.0 42.0 79.0 128.0 168.0
(Time)²
a Construct a scatterplot of the data and comment on its form.
b Determine the values of (Time)² and complete the table.
c Construct a scatterplot of Distance against (Time)².
d Obtain a residual plot for the new model and comment on the linearity.
e Determine the value of r² for the new model.
f Write down the regression equation for the new model in terms of the variables
in the question.
g Use the regression equation to predict the distance travelled in seven seconds.
2 The data in the table below shows the marks obtained by students on a test and the
amount of time they reported studying for the test:
Mark 62 74 79 80 56 86 92 87 64 88 48 32
Time (hours) 1.5 2.25 3.0 2.5 0.8 3.5 6.0 2.75 1.0 4.5 0.5 0.1
a We want to predict a student’s mark from the time they reported studying for the
test. In this situation, which is the dependent variable and which is the
independent variable?
b Construct a scatterplot and comment on the relationship between test mark and
time spent studying in terms of direction, outliers, form and strength.
c i Fit a linear model to the data and record its equation. Interpret the slope in
terms of the problem at hand.
ii Calculate the coefficient of determination and interpret.
iii Construct a residual plot and use it to comment on the suitability of modelling
the relationship between Mark and Time spent studying with a straight line.
d Apply a log transformation to Time. Then:
i construct a scatterplot for the transformed data
ii find the equation of the least squares regression line for the transformed data
iii use the equation to predict the mark obtained after 5 hours of study
iv calculate the coefficient of determination and interpret
v construct a residual plot and use it to comment on the linearity of the
transformed model
3 The following are the testosterone levels and the age at first conviction for violent
and aggressive crimes collected on a sample of young male prisoners. It is believed
that the higher the testosterone level in a male prisoner, the earlier they are likely to
be convicted of a violent and aggressive crime. A correlation and regression
analysis is also given.
Testosterone Age at first conviction
1305 11
1000 12
1175 13
1495 14
1060 15
800 16
1005 16
710 17
1150 18
605 20
690 21
700 23
625 24
610 27
450 30
[Scatterplot of Age against Testosterone level, with fitted line y = 31.9 – 0.015x, r² = 0.662]
[Residual plot: Residual against Testosterone level]
a What is the value of Pearson’s correlation coefficient r?
b Write the equation of the least squares regression line in terms of Testosterone
level and Age.
c Interpret the value of r² in terms of Testosterone level and Age.
d Use the residual plot to comment on the linearity of the relationship.
e Construct a scatterplot of Age against log (Testosterone).
f Obtain a residual plot for the new model and comment on the linearity.
g Determine the value of r² for the new model.
h Write down the regression equation for the new model in terms of the variables
in the question.
4 Are infant mortality rates in a country related to the number of doctors in a
country? The data below gives infant mortality rates (deaths per 1000 births) and
doctor numbers (per 100 000 people) for 17 countries.
Infant mortality No. of doctors Infant mortality No. of doctors
12 192 15 270
13 222 85 9
12 154 20 357
14 294 21 250
10 182 54 79
10 179 75 59
7 204 121 27
10 271 71 52
111 61
a Construct a scatterplot of infant mortality against number of doctors and
comment on the relationship between infant mortality rate and doctor numbers
in terms of direction, outliers, form and strength.
b Construct a scatterplot of Infant mortality against log (Number of doctors).
c Obtain a residual plot for the new model and comment on the linearity.
d Determine the value of r² for the new model.
e Write down the regression equation for the new model in terms of the variables
in the question.
f Use the regression equation to predict the infant mortality rate when there are
100 doctors (per 100 000).
5 Tree ages can be determined by cutting down a tree and counting the number of
rings on the stump of its trunk. This, however, is a destructive process and it would
be useful to have a method of working out the approximate age of a tree without
having to cut it down. Noting the obvious, that trees tend to get bigger as they get
older, we might be able to use some external measurement of size to help us
estimate the age of a tree.
The data below shows the age (in years) and diameter at chest height (in cm) of a
sample of trees of the same species taken from a commercial plantation.
Age (years) Diameter (centimetres) Age (years) Diameter (centimetres)
4 2.0 16 11.4
5 2.0 18 11.7
8 2.5 22 14.7
8 5.1 25 16.5
8 7.5 29 15.2
10 5.1 30 15.2
10 8.9 34 17.8
12 12.4 38 17.8
13 9.0 40 19.1
14 6.4
a We wish to predict the age of a tree from its diameter at chest height. In this
situation, which is the dependent variable and which is the independent variable?
b Construct a scatterplot and comment on the relationship between age and
diameter in terms of direction, outliers, form and strength.
c i Fit a linear model to the data and record its equation. Interpret the slope in
terms of the problem at hand.
ii Calculate the coefficient of determination and interpret.
iii Form a residual plot and use it to comment on the suitability of modelling the
relationship between age and diameter with a straight line.
d Use the x² transformation to linearise the data. Then:
i construct a scatterplot of age against diameter squared
ii find the equation of the least squares regression line for the transformed data
iii calculate the coefficient of determination and interpret
iv form a residual plot and use it to comment on the suitability of modelling the
relationship between age and diameter squared with a straight line