Cambridge University Press • Uncorrected Sample pages • 978-0-521-61328-6 • 2008 © Jones, Evans, Lipson TI-Nspire & Casio ClassPad material in collaboration with Brown and McMenamin


P1: FXS/ABE P2: FXS9780521740517c06.xml CUAU031-EVANS September 4, 2008 13:28

CHAPTER 6

CORE

Data transformation

What do we mean by data transformation?

What is the effect of applying a log, squared or reciprocal transformation to the x variable?

What is the effect of applying a log, squared or reciprocal transformation to the y variable?

How do I choose which data transformation to apply?

How do I carry out a regression analysis with transformed data?

6.1 Data transformation

There are methods for fitting curves to non-linear relationships using non-linear regression. However, these methods are mathematically complicated and their results are difficult to interpret.

The method of dealing with a non-linear relationship favoured in practice is to apply a mathematical function to one of the variables, so that the relationship between the variables becomes closer to a straight line. By an appropriate choice of function, the scale of measurement is stretched or compressed. There are many functions that can be used to transform the data, but here we will consider only three:

• the square transformation
• the logarithmic transformation
• the reciprocal transformation

When first confronted with data transformation, many people tend to be suspicious. However, from the point of view of analysing a set of data, there is nothing special about the units of measurement used when gathering the data. In general, units are chosen because they are convenient for recording and reporting the data. ‘Natural’ units tend to be used: for example, seconds when recording time, or metres when recording length. But what is the natural unit for measuring the fuel economy of a car: kilometres per litre (x) or litres per kilometre (x⁻¹)? In measurement, natural often just means familiar. For example, to a chemist it is natural to measure acidity in terms of pH, which is the negative of the logarithm of the hydrogen-ion concentration (−log x), rather than in terms of the hydrogen-ion concentration itself (x).


Generally, it is only luck if a data set reveals all its hidden information when analysed in the form in which it was initially gathered and/or reported. It is part of the analyst’s role to search out different ways of looking at the data in order to enhance our understanding of that data. One of the most powerful tools available to help achieve this task is data transformation.

How do these transformations affect the values to which they are applied? Consider the following table of numbers (here, log means the logarithm to base 10):

value         0.2      0.4      0.6     1    2       3       4
(value)²      0.04     0.16     0.36    1    4       9       16
log(value)   −0.699   −0.398   −0.222   0    0.301   0.477   0.602
1/value       5        2.5      1.667   1    0.5     0.333   0.25
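The table can be reproduced with a few lines of Python. This is a sketch that assumes log means the base-10 logarithm (consistent with log 2 = 0.301 in the table) and rounds to three decimal places as the table does; the variable names are illustrative only.

```python
import math

values = [0.2, 0.4, 0.6, 1, 2, 3, 4]

# Apply each transformation to every value, rounding to 3 decimal places
# as in the table above.
squared = [round(v ** 2, 3) for v in values]
logs = [round(math.log10(v), 3) for v in values]  # base-10 logarithm
reciprocals = [round(1 / v, 3) for v in values]

for label, row in [("value", values), ("(value)^2", squared),
                   ("log(value)", logs), ("1/value", reciprocals)]:
    print(f"{label:<11}" + "".join(f"{x:>8}" for x in row))
```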

From the table we can see that the transformations have the following effects on the data values:

• The square transformation decreases values between 0 and 1 and increases values greater than 1. Large values are increased the most. For example, 2² = 4, while 20² = 400, so while the values 2 and 20 are 18 units apart, the values 4 and 400 are 396 units apart. That is, the square transformation stretches the values.
• The log transformation reduces all values, and values between 0 and 1 become negative. Large values are reduced much more than small values. For example, log 2 = 0.301, while log 20 = 1.301, so while the values 2 and 20 are 18 units apart, the values 0.301 and 1.301 are only 1 unit apart. That is, the log transformation compresses the values. Note that the log function can only be applied to values greater than 0.
• The reciprocal transformation reduces all values greater than 1 (and increases values between 0 and 1). Large values are reduced much more than small values. For example, 1/2 = 0.5, while 1/20 = 0.05, so while the values 2 and 20 are 18 units apart, the values 0.5 and 0.05 are only 0.45 units apart. That is, the reciprocal transformation compresses the large values to an even greater extent than the log transformation.

Thus all three transformations have a greater effect on the larger values, but the size of this effect differs from one transformation to another.
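The stretching and compressing described above can be checked by measuring how far apart the values 2 and 20 end up after each transformation. A minimal sketch, again assuming base-10 logs; `gap_after` is a hypothetical helper name.

```python
import math

a, b = 2, 20  # the pair of values used in the examples above

def gap_after(transform):
    """Distance between the two values after applying the transformation."""
    return abs(transform(b) - transform(a))

gaps = {
    "none": gap_after(lambda v: v),
    "square": gap_after(lambda v: v ** 2),
    "log": round(gap_after(math.log10), 3),
    "reciprocal": round(gap_after(lambda v: 1 / v), 3),
}
print(gaps)  # {'none': 18, 'square': 396, 'log': 1.0, 'reciprocal': 0.45}
```

The 18-unit gap grows to 396 under squaring and shrinks to 1 and 0.45 under the log and reciprocal transformations, matching the figures quoted in the text.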

Exercise 6A

1 a Copy and complete the table.

value         1   2   3   4   5   6   7
(value)²
log(value)
1/value

b Use the information in the table to complete the following statements by deleting the incorrect term.
i The squared transformation stretches/compresses the scale.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.


2 a Copy and complete the table.

value         1   2   4   8   16   32   64
(value)²
log(value)
1/value

b Use the information in the table to complete the following statements by deleting the incorrect term.
i The squared transformation stretches/compresses the larger values.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.

3 a Copy and complete the table.

value         1   10   100   1000   10000   100000
(value)²
log(value)
1/value

b Use the information in the table to complete the following statements by deleting the incorrect term.
i The squared transformation stretches/compresses the larger values.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.

4 a Copy and complete the table.

value         20   10   5   2.5   1.25   0.625
(value)²
log(value)
1/value

b Use the information in the table to complete the following statements by deleting the incorrect term.
i The squared transformation stretches/compresses the larger values.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.

5 a Copy and complete the table.

value         2   20   200   2000   20000   200000
(value)²
log(value)
1/value

b Use the information in the table to complete the following statements by deleting the incorrect term.
i The squared transformation stretches/compresses the larger values.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.


6.2 Transforming the x axis

We are interested in linearising the relationship between two variables, x and y. The transformations discussed in the previous section can be applied to either x or y (but, here, not to both). We will examine the effect of transforming the x axis and the y axis separately.

Transforming the x axis moves the x values on the plot horizontally and leaves the y values unaltered. The square, log and reciprocal transformations can be applied to the x axis with the following effects:

• x²: spreads out the high x values relative to the smaller x values.
• log x: compresses large x values relative to the smaller x values.
• 1/x: also compresses large x values relative to the smaller x values, to a greater extent than log x. Note that values of x less than 1 become greater than 1, and values of x greater than 1 become less than 1. Because a larger x always gives a smaller 1/x, the order of the data values is reversed.

[Graphs: a sketch of y against x for each transformation (not reproduced).]
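The reversal of order under 1/x can be checked numerically. This sketch uses a handful of illustrative positive x values (not taken from the text); `order` is a hypothetical helper that reports whether an increasing list stays increasing, becomes decreasing, or neither.

```python
import math

# Illustrative positive x values (not from the text), in increasing order.
xs = [0.5, 1, 2, 4, 8]

transformed = {
    "x^2": [x ** 2 for x in xs],  # keeps order here because every x is > 0
    "log x": [math.log10(x) for x in xs],
    "1/x": [1 / x for x in xs],
}

def order(seq):
    """Describe what a transformation did to the increasing order of xs."""
    pairs = list(zip(seq, seq[1:]))
    if all(p < q for p, q in pairs):
        return "keeps order"
    if all(p > q for p, q in pairs):
        return "reverses order"
    return "mixes order"

for name, seq in transformed.items():
    print(f"{name:<6}{order(seq)}")
```

Only 1/x reports "reverses order"; its values are 2.0, 1.0, 0.5, 0.25, 0.125.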

The following examples show the effect on the relationship between x and y when the squared, log and reciprocal transformations are applied to the x values.


Example 1 Linearising the relationship with a squared transformation

a Plot the data in the table, and comment on the form of the relationship between x and y.

x   0   1   2   3    4
y   2   3   6   11   18

b Apply a squared transformation to the x values (x²), again plot the data, and comment on the form of the relationship between y and x².

Solution

a 1 Plot the values of y against x.
2 Decide if the form of the relationship is linear or non-linear.
[Scatterplot: y (0 to 20) against x (0 to 4).]
3 Write down your conclusion.
The relationship between y and x is non-linear.

b 1 Construct a new table of values.

x²  0   1   4   9    16
y   2   3   6   11   18

2 Plot the values of y against x².
3 Decide if the form of the relationship is linear or non-linear.
[Scatterplot: y (0 to 20) against x² (0 to 20).]
4 Write down your conclusion.
The relationship between y and x² is linear.
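As a numerical check on Example 1, the sketch below fits a least-squares line to y against x and to y against x², and reports the correlation coefficient r for each. The helper `fit_line` is hypothetical, written out so the example is self-contained; it uses the standard least-squares formulas.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b, r)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / (sxx * syy) ** 0.5  # Pearson's correlation coefficient
    return a, b, r

x = [0, 1, 2, 3, 4]
y = [2, 3, 6, 11, 18]
xsq = [v ** 2 for v in x]  # the transformed x values

a, b, r = fit_line(x, y)
print(f"y on x:   r = {r:.3f}")  # r = 0.959
a, b, r = fit_line(xsq, y)
print(f"y on x^2: y = {a:.1f} + {b:.1f}x^2, r = {r:.3f}")  # y = 2.0 + 1.0x^2, r = 1.000
```

The data satisfy y = 2 + x² exactly, so the transformed fit is perfect. Note that r = 0.959 for the untransformed data even though the scatterplot is clearly curved, so a high r on its own does not show that a relationship is linear.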

Data transformation is very conveniently carried out with the aid of a graphics calculator, and in practice this is how you will do it from now on. Throughout this chapter, you will find it useful to enter the data into named lists, because you will need to keep track of the various lists of transformed data as you work through the problems.
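The idea of named lists can be mirrored in Python with a dictionary keyed by list name. This is a hypothetical sketch (the dictionary `lists` and the helper `add_list` are illustrative, and the list names xsq, lx and recx follow those used on the calculator in this chapter). It also shows why the log and reciprocal transformations cannot be applied to the Example 1 data: the x list contains 0.

```python
import math

# Named lists, like the calculator's list columns x and y (data from Example 1).
lists = {"x": [0, 1, 2, 3, 4], "y": [2, 3, 6, 11, 18]}

def add_list(name, source, transform):
    """Store transform(v) for every v in lists[source] as a new named list."""
    lists[name] = [transform(v) for v in lists[source]]

add_list("xsq", "x", lambda v: v ** 2)
print(lists["xsq"])  # [0, 1, 4, 9, 16]

# log x and 1/x are undefined at x = 0, so these lists cannot be formed here.
for name, transform in [("lx", math.log10), ("recx", lambda v: 1 / v)]:
    try:
        add_list(name, "x", transform)
    except (ValueError, ZeroDivisionError):
        print(f"{name}: not possible, x contains 0")
```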


How to apply the squared transformation using the TI-Nspire CAS

Plot the data presented in the table below.

x   0   1   2   3    4
y   2   3   6   11   18

Apply a squared transformation to the x values (x²) and replot the data.

Steps
1 Start a new document by pressing / + N.
2 Select 3:Add Lists & Spreadsheet. Enter the data into lists named x and y, as shown.
3 Press c and select 5:Data & Statistics. Construct a scatterplot of y against x. Let x be the independent variable and y the dependent variable. The plot is clearly non-linear.
4 Return to the Lists & Spreadsheet application (by pressing / + ). To calculate the values of x² and store them in a list named xsq (short for x-squared), do the following:
a Move the cursor to the top of column C and type xsq. Press enter.
b Move the cursor to the grey cell immediately below the xsq heading. We need to enter the expression = x². To do this, press = then VAR ( ), highlight the variable x and then press enter to paste x into the formula line. Finally, type ∧2 (or press the x² key) to complete the formula. Press enter to calculate and display the x-squared values.
Note: The dash that appears in front of the x is automatically added when a list name is pasted from the VAR menu.


Note: You can also type in the variable x and then select Variable Reference when prompted. This avoids using the VAR menu.
5 Construct a scatterplot of y against x². Press / + to return to the scatterplot created earlier and change the independent variable to xsq as follows:
a Move the cursor to the textbox area below the horizontal (or x-) axis. Press x when prompted and select the variable xsq. Press enter to paste the variable to that axis.
b A scatterplot of y against xsq (x²) is then displayed, as shown. The plot is clearly linear.
Note: If you wish to keep the original plot of y against x, you can create a new Data & Statistics page to plot the transformed data.

How to apply the squared transformation using the ClassPad

Plot the data presented in the table below.

x   0   1   2   3    4
y   2   3   6   11   18

Apply a squared transformation to the x values (x²) and replot the data.

Steps
1 Open the Statistics application and enter the data into the columns named x and y. Your screen should look like the one shown.
2 Construct a scatterplot of y against x. Let x be the independent variable and y the dependent variable. The plot is clearly non-linear.


3 To calculate the values of x² and store them in a list named xsq (i.e. x-squared), do the following:
a Tap to highlight the cell at the top of the next empty list (in this case, list3). Rename it by typing xsq and pressing enter.
b Tap to highlight the cell at the bottom of the newly named xsq column (in the row titled Cal). Type x² and press to calculate and list the x² values.
4 Construct a scatterplot of y against xsq (i.e. x²). The plot is clearly linear.

Example 2 Linearising the relationship with the log transformation

a Plot the data in the table, and comment on the form of the relationship between x and y.

x   1   10   100   400   600   1000
y   0   10   20    25    28    30

b Apply a log transformation to the x values (log x), again plot the data, and comment on the form of the relationship between y and log x.

Solution

a 1 Plot the values of y against x.
2 Decide if the form of the relationship is linear or non-linear.
[Scatterplot: y (0 to 30) against x (0 to 1000).]
3 Write down your conclusion.
The relationship between y and x is non-linear.


b 1 Construct a new table of values.

log x   0   1    2    2.6   2.8   3
y       0   10   20   25    28    30

2 Plot the values of y against log x.
3 Decide if the form of the relationship is linear or non-linear.
[Scatterplot: y (0 to 30) against log x (0 to 3).]
4 Write down your conclusion.
The relationship between y and log x is linear.
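A similar numerical check for Example 2, sketched in Python: fit a least-squares line to y against x and to y against log x (base 10), and compare the correlation coefficients. `fit_line` is a hypothetical helper using the standard least-squares formulas. You should find that r for y against log x is much closer to 1 than r for y against x.

```python
import math

def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b, r)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b, sxy / (sxx * syy) ** 0.5

x = [1, 10, 100, 400, 600, 1000]
y = [0, 10, 20, 25, 28, 30]
lx = [math.log10(v) for v in x]  # the list called lx on the calculator

for label, xs in [("x", x), ("log x", lx)]:
    a, b, r = fit_line(xs, y)
    print(f"y on {label}: y = {a:.2f} + {b:.2f}({label}), r = {r:.3f}")
```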

Once again, this transformation is very conveniently carried out with the aid of a graphics calculator.

How to apply the log transformation using the TI-Nspire CAS

Plot the data presented in the table below.

x   1   10   100   400   600   1000
y   0   10   20    25    28    30

Apply a log transformation to the x values (log(x)) and replot the data.

Steps
1 Start a new document by pressing / + N.
2 Select 3:Add Lists & Spreadsheet. Enter the data into lists named x and y, as shown opposite.
3 Press c and select 5:Data & Statistics. Construct a scatterplot of y against x. Let x be the independent variable and y the dependent variable. The plot is clearly non-linear.


4 Return to the Lists & Spreadsheet application (by pressing / + ). To calculate the values of log x and store them in a list named lx (short for log x), complete the following:
a Move the cursor to the top of column C and type lx. Press enter.
b Move the cursor to the grey cell immediately below the lx heading and type = log(. Then press VAR ( ), highlight the variable x, press enter to paste x into the formula line, then type ) to complete the command. Press enter to calculate and display the log values.
If your answers are not given as decimals, refer to the Appendix to change the Mode settings to APPRX.
5 Construct a scatterplot of y against log x. Use / + to return to the scatterplot created earlier and change the independent variable to lx. A scatterplot of y against lx (i.e. the log of x) is displayed, as shown. The plot is clearly linear.


How to apply the log transformation using the ClassPad

Plot the data presented in the table below.

x   1   10   100   400   600   1000
y   0   10   20    25    28    30

Apply a log transformation to the x values (log(x)) and replot the data.

Steps
1 Open the Statistics application and enter the data into the columns named x and y. Your screen should look like the one shown opposite.
2 Construct a scatterplot of y against x. Let x be the independent variable and y the dependent variable. The plot is clearly non-linear.
3 To calculate the values of log x and store them in a list named lx (short for log x), complete the following:
a Tap to highlight the cell at the top of the next empty list (in this case, list3). Rename it by typing lx and pressing enter.
b Tap to highlight the cell at the bottom of the newly named lx column (in the row titled Cal). Typing log(x) and pressing calculates and lists the values of log x.


Note: To ensure decimal values are displayed, Decimal should be visible in the status bar (at the bottom of the screen). If Standard is visible, tap Standard and it will change to Decimal.
4 Construct a scatterplot of y against lx (i.e. log x). The plot is clearly linear.

Example 3 Linearising the relationship with the 1/x transformation

a Plot the data in the table, and comment on the form of the relationship between x and y.

x   1    2    3    4     5
y   30   15   10   7.5   6

b Apply a reciprocal transformation to the x values (1/x), again plot the data, and comment on the form of the relationship between y and 1/x.

Solution

a 1 Plot the values of y against x.
2 Decide if the form of the relationship is linear or non-linear.
[Scatterplot: y (5 to 30) against x (1 to 5).]
3 Write down your conclusion.
The relationship between y and x is non-linear.

b 1 Construct a new table of values.

1/x   1.0   0.5   0.33   0.25   0.2
y     30    15    10     7.5    6


2 Plot the values of y against 1/x.
3 Decide if the form of the relationship is linear or non-linear.
[Scatterplot: y (5 to 30) against 1/x (0.20 to 1.00).]
4 Write down your conclusion.
The relationship between y and 1/x is linear.
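In Example 3 every y value equals 30/x (for instance, 30/4 = 7.5), so a least-squares fit of y against 1/x should recover a slope of 30, an intercept of 0 and r = 1, up to floating-point rounding. A sketch, again with a hypothetical `fit_line` helper:

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b, r)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b, sxy / (sxx * syy) ** 0.5

x = [1, 2, 3, 4, 5]
y = [30, 15, 10, 7.5, 6]
recx = [1 / v for v in x]  # the list called recx on the calculator

a, b, r = fit_line(recx, y)
print(f"y = {a:.2f} + {b:.2f}(1/x), r = {r:.4f}")
```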

Once again, this transformation is very conveniently carried out with the aid of a graphics calculator.

How to apply the reciprocal transformation using the TI-Nspire CAS

Plot the data presented in the table below.

x   1    2    3    4     5
y   30   15   10   7.5   6

Apply a reciprocal transformation to the x values (1/x) and replot the data.

Steps
1 Start a new document by pressing / + N.
2 Select 3:Add Lists & Spreadsheet. Enter the data into lists named x and y, as shown opposite.
3 Press c and select 5:Data & Statistics. Construct a scatterplot of y against x. Let x be the independent variable and y the dependent variable. The plot is clearly non-linear.


4 Return to the Lists & Spreadsheet application (by pressing / + ). To calculate the values of 1/x, complete the following:
a Move the cursor to the top of column C and type recx (short for the reciprocal of x). Press enter.
b Move the cursor to the grey cell immediately below the recx heading and type = 1 ÷, then press VAR ( ), highlight the variable x and press enter to paste it into the formula line. Press enter to calculate and display the 1/x values.
If your answers are not presented as decimals, refer to the Appendix to change the Mode settings to APPRX.
5 Construct a scatterplot of y against 1/x (i.e. recx). Use / + to return to the scatterplot created earlier and change the independent variable to recx. A scatterplot of y against recx (the reciprocal of x) is displayed, as shown. The plot is clearly linear.


How to apply the reciprocal transformation using the ClassPad

Plot the data presented in the table below.

x   1    2    3    4     5
y   30   15   10   7.5   6

Apply a reciprocal transformation to the x values (1/x) and replot the data.

Steps
1 Open the Statistics application and enter the data into the columns named x and y. Your screen should look like the one shown opposite.
2 Construct a scatterplot of y against x. Let x be the independent variable and y the dependent variable. The plot is clearly non-linear.
3 To calculate the values of 1/x and store them in a list named recx (short for the reciprocal of x), complete the following:
a Tap to highlight the cell at the top of the next empty list (in this case, list3). Rename it by typing recx and pressing enter.
b Tap to highlight the cell at the bottom of the newly named recx column (in the row titled Cal). Typing 1 ÷ x and pressing calculates and lists the 1/x values.
4 Construct a scatterplot of y against recx (i.e. 1/x). The plot is clearly linear.


What sorts of non-linear relationships can we linearise using the x² transformation?

The x² transformation has the effect of stretching out the upper end of the x scale. As a guide, relationships whose scatterplots look like those shown below can often (but not always) be linearised using the x to x² transformation. Note that for the x² transformation to apply, the scatterplot should peak or bottom out at around x = 0.

[Two sketch scatterplots of y against x (not reproduced).]

What sorts of non-linear relationships can we linearise using the log x transformation?

The log x transformation has the effect of compressing the upper end of the x scale. As a guide, relationships whose scatterplots look like those shown below can often (but not always) be linearised using the x to log x transformation.

[Two sketch scatterplots of y against x (not reproduced).]

What sorts of non-linear relationships can we linearise using the 1/x transformation?

As a guide, relationships whose scatterplots look like those shown below can often (but not always) be linearised using the x to 1/x transformation.

[Two sketch scatterplots of y against x (not reproduced).]
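The guide above is visual. As a supplementary numerical check (an assumption of this sketch, not a method given in this section), you could compute the correlation coefficient r between y and each transformed x and see which transformation gives |r| closest to 1. A curved relationship can still give a fairly high r, so the scatterplot remains the main check. The names `correlation`, `TRANSFORMS` and `compare_transforms` are hypothetical; the data used are those of Example 2.

```python
import math

def correlation(xs, ys):
    """Pearson's correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

TRANSFORMS = {
    "x": lambda v: v,
    "x^2": lambda v: v ** 2,
    "log x": math.log10,     # requires every x > 0
    "1/x": lambda v: 1 / v,  # requires every x != 0
}

def compare_transforms(x, y):
    """Return (name, |r|) for y against each transformed x, largest |r| first."""
    scores = {}
    for name, f in TRANSFORMS.items():
        try:
            scores[name] = abs(correlation([f(v) for v in x], y))
        except (ValueError, ZeroDivisionError):
            continue  # transformation undefined for some x value
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Data from Example 2
for name, score in compare_transforms([1, 10, 100, 400, 600, 1000],
                                      [0, 10, 20, 25, 28, 30]):
    print(f"{name:<6} |r| = {score:.3f}")
```

For these data, log x comes out on top, in agreement with the scatterplots in Example 2.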

Exercise 6B

These exercises are expected to be completed with the aid of a graphics calculator.

1 a Plot the data in the table, and comment on the form of the relationship between y and x.

x   0    1    2    3   4
y   16   15   12   7   0

b Apply a squared transformation to the x values (x²), again plot the data, and comment on the form of the relationship between y and x².


2 a Plot the data in the table, and comment on

the form of the relationship between y and x.x 1 2 3 4 5

y 3 9 19 33 51

b Apply a squared transformation to the x values

(x2), again plot the data, and comment on the form of the relationship between y and x2.

3 a Plot the data in the following table, and

comment on the form of the relationship

between y and x.

x 1 2 3 4 5

y 30 27 22 15 6

b Apply a squared transformation to the x values (x2), again plot the data, and comment

on the form of the relationship between y and x2.

4 a Plot the data in the following table,

and comment on the form of the

relationship between y and x.

x 1 10 100 400 600 1000

y 30 20 10 5 2 0

b Apply a log transformation to the x values (log x), again plot the data, and comment on

the form of the relationship between y and log x.

5 a Plot the data in the table, and

comment on the form of the

relationship between y and x.

x 5 10 150 500 1000

y 3.1 4.0 7.5 9.1 10.0

b Apply a log transformation to the x values (log x), again plot the data, and comment on

the form of the relationship between y and log x.

6 a Plot the data in the table, and

comment on the form of the

relationship between y and x.

x 10 44 132 436 981

y 15.0 11.8 9.4 6.8 5.0

b Apply a log transformation to the x values (log x), again plot the data, and comment on

the form of the relationship between y and log x.

7 a Plot the data in the table, and comment on the

form of the relationship between y and x.x 2 4 6 8 10

y 60 30 20 15 12b Apply a reciprocal transformation to the x values

(1/x), again plot the data and comment on the form of the relationship between y and 1/x.

8 a Plot the data in the table, and comment on the form of the relationship between y and x.

x 1 2 3 4 5
y 61 31 21 16 13

b Apply a reciprocal transformation to the x values (1/x), again plot the data, and comment on the form of the relationship between y and 1/x.

9 a Plot the data in the following table, and comment on the form of the relationship between y and x.

x 2 4 6 8 10
y 10 70 90 100 106

b Apply a reciprocal transformation to the x values (1/x), again plot the data, and comment on the form of the relationship between y and 1/x.


c Name an x-axis transformation that should also work for this data. Try it and see.

d Name an x-axis transformation that should not work for this data. Try it and see.

10 The table below shows the diameter (in cm) of a number of umbrellas, along with the number of people each umbrella is designed to keep dry.

Diameter (cm) 50 70 85 100 110
Number of people 1 2 3 4 5

a Construct a scatterplot showing the relationship between number of people and umbrella diameter, and comment on the form.

b Apply a squared transformation to the x values (x²), again plot the data, and comment on the form of the relationship between y and x².

11 The table below shows the performance level on a task of a number of people, along with the time spent (in minutes) practising the task.

Time spent on practice (min) 0.5 1.0 1.5 2.0 3.0 4.0 5.0 6.0 7.0 7.0
Level of performance 1.0 1.5 2.0 3.0 3.0 3.5 4.0 3.5 3.9 3.6

a Construct a scatterplot showing the relationship between the time spent on practice and level of performance, and comment on the form.

b Apply a log transformation to the x values (log x), again plot the data, and comment on the form of the relationship between y and log x.

12 The table below shows the horsepower of several cars, along with their fuel consumption in kilometres/litre.

Fuel consumption (km/L) 5.2 7.3 12.6 7.1 6.3 10.1 10.5 14.6 10.9 7.7
Horsepower 155 125 75 110 138 88 80 70 100 103

a Construct a scatterplot showing the relationship between horsepower and fuel consumption, and comment on the form.

b Apply a reciprocal transformation to the x values (1/x), again plot the data, and comment on the form of the relationship between y and 1/x.

6.3 Transforming the y axis

Another way to linearise the relationship between x and y is to apply these transformations to the y axis. Transforming the y axis moves the y values on the plot vertically, leaving the x values unaltered. The square, log and reciprocal transformations can be applied to the y axis with the following effects:


Transformation   Outcome
y²       Spreads out the large y values relative to the smaller data values.
log y    Compresses the large y values relative to the smaller data values.
1/y      Also compresses the large y values relative to the smaller data values, to a greater extent than log y. Because 1/y gets smaller as y gets larger (for positive y), the order of the data values is reversed: values of y less than 1 become greater than 1, and values of y greater than 1 become less than 1.

[Each row of the table is illustrated by a sketch graph of y against x.]
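The effects in the table can be checked numerically. The Python sketch below is illustrative only: the y values 1, 2, 4, 8, 16 are made up for the purpose (each doubles the one before), base-10 logarithms are assumed as elsewhere in this chapter, and the helper `gaps` is a hypothetical name. It prints the transformed values and the gaps between neighbouring values.

```python
import math

# Made-up y values for illustration: each one doubles the previous one.
ys = [1, 2, 4, 8, 16]

transforms = {
    "y^2": lambda y: y ** 2,
    "log y": math.log10,   # base-10 logarithm
    "1/y": lambda y: 1 / y,
}

def gaps(values):
    """Differences between neighbouring values in a list (hypothetical helper)."""
    return [b - a for a, b in zip(values, values[1:])]

transformed = {name: [f(y) for y in ys] for name, f in transforms.items()}

print("y     ", ys, "gaps:", gaps(ys))
for name, values in transformed.items():
    print(f"{name:6s}", [round(v, 3) for v in values],
          "gaps:", [round(g, 3) for g in gaps(values)])
```

The original gaps are 1, 2, 4 and 8. Squaring spreads them to 3, 12, 48 and 192; taking logs makes them all equal (about 0.301), so the large values are compressed; and 1/y produces decreasing values, so the order is reversed and the large y values end up squeezed together near 0.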

The following examples show the effect on the relationship between x and y when the squared, log and reciprocal transformations are applied to the y values. Once again, all these data transformations can be carried out conveniently with the aid of a graphics calculator.

Example 4 Linearising the relationship with a squared transformation

a Plot the data in this table, and comment on the form of the relationship between y and x.

x 0 1 2 3 4 5

y 0 3.2 4.5 5.5 6.3 7.1

b Apply a squared transformation to the y values (y²), again plot the data, and comment on the form of the relationship between y² and x.

Solution

a 1 Plot the values of y against x.

2 Decide if the form of the relationship is linear or non-linear.

[Scatterplot: y (0 to 8) against x (0 to 5)]

3 Write down your conclusion. The relationship between y and x is non-linear.


b 1 Construct a new table of values.

x 0 1 2 3 4 5
y² 0 10.2 20.3 30.3 39.7 50.4

2 Plot the values of y² against x.

3 Decide if the form of the relationship is linear or non-linear.

[Scatterplot: y² (0 to 60) against x (0 to 5)]

4 Write down your conclusion. The relationship between y² and x is linear.
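The arithmetic in Example 4 can be reproduced with a short Python sketch. It squares the y values and compares the coefficient of determination for y against x with that for y² against x. The helper name `r_squared` is an assumption for illustration; it uses the standard least squares result r² = S_xy² / (S_xx × S_yy).

```python
# Example 4 data
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 3.2, 4.5, 5.5, 6.3, 7.1]

def r_squared(x, y):
    """Coefficient of determination for the least squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

y_squared = [y ** 2 for y in ys]   # the y^2 column of the new table

print("y^2 values:", [round(v, 2) for v in y_squared])
print("r^2 for y against x:  ", round(r_squared(xs, ys), 4))
print("r^2 for y^2 against x:", round(r_squared(xs, y_squared), 4))
```

For these data the first r² is roughly 0.91 and the second is above 0.999. A high r² on its own does not prove linearity, which is why the scatterplot (and, later, the residual plot) is still checked.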

Example 5 Linearising the relationship with the log transformation

a Plot the data in this table, and comment on the form of the relationship between y and x.

x 0 1 2 3 4 5
y 100 37 14 5 2 1

b Apply a log transformation to the y values (log y), again plot the data, and comment on the form of the relationship between log y and x.

Solution

a 1 Plot the values of y against x.

2 Decide if the form of the relationship is linear or non-linear.

[Scatterplot: y (0 to 100) against x (0 to 5)]

3 Write down your conclusion. The relationship between y and x is non-linear.

b 1 Construct a new table of values.

x 0 1 2 3 4 5
log y 2.00 1.57 1.15 0.70 0.30 0.00

2 Plot the values of log y against x.

3 Decide if the form of the relationship is linear or non-linear.

[Scatterplot: log y (0 to 2.0) against x (0 to 5)]

4 Write down your conclusion. The relationship between log y and x is linear.
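A similar Python sketch reproduces the log y table in Example 5 (base-10 logs) and fits a least squares line to log y against x. The function name `fit_line` is hypothetical; it returns the intercept, slope and r² of the least squares line.

```python
import math

# Example 5 data
xs = [0, 1, 2, 3, 4, 5]
ys = [100, 37, 14, 5, 2, 1]

def fit_line(x, y):
    """Least squares line y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    sxx = sum((u - mean_x) ** 2 for u in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    b = sxy / sxx
    return mean_y - b * mean_x, b, sxy ** 2 / (sxx * syy)

log_ys = [math.log10(y) for y in ys]   # base-10 logs, to two decimal places below
print("log y:", [f"{v:.2f}" for v in log_ys])

a, b, r2 = fit_line(xs, log_ys)
print(f"log y = {a:.3f} + ({b:.3f})x, r^2 = {r2:.4f}")
```

The slope is negative (about −0.41) because y falls as x increases, and an r² close to 1 is consistent with the conclusion that log y and x are linearly related.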


Example 6 Linearising the relationship with the 1/y transformation

a Plot the data in this table, and comment on the form of the relationship between y and x.

x 1 2 3 4 5
y 10.0 5.0 3.3 2.5 2.0

b Apply a reciprocal transformation to the y values (1/y), again plot the data, and comment on the form of the relationship between 1/y and x.

Solution

a 1 Plot the values of y against x.

2 Decide if the form of the relationship is linear or non-linear.

[Scatterplot: y (2 to 10) against x (1 to 5)]

3 Write down your conclusion. The relationship between y and x is non-linear.

b 1 Construct a new table of values.

x 1 2 3 4 5
1/y 0.1 0.2 0.3 0.4 0.5

2 Plot the values of 1/y against x.

3 Decide if the form of the relationship is linear or non-linear.

[Scatterplot: 1/y (0.1 to 0.5) against x (1 to 5)]

4 Write down your conclusion. The relationship between 1/y and x is linear.
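For Example 6, a quick Python check shows both features of the reciprocal transformation: the decreasing y values become increasing 1/y values, and (1/y) ÷ x is close to 0.1 for every point, which is another way of seeing that 1/y is approximately proportional to x, that is, y ≈ 10/x. This constant-ratio check is an illustration chosen for this sketch, not a step in the worked solution; the helper `strictly_increasing` is a hypothetical name.

```python
# Example 6 data
xs = [1, 2, 3, 4, 5]
ys = [10.0, 5.0, 3.3, 2.5, 2.0]

recips = [1 / y for y in ys]   # the 1/y column of the new table

def strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

# y falls as x rises, but 1/y rises: the reciprocal reverses the order.
print("y decreasing:  ", strictly_increasing(ys[::-1]))
print("1/y increasing:", strictly_increasing(recips))

# If 1/y is proportional to x, the ratio (1/y)/x is roughly constant.
ratios = [r / x for r, x in zip(recips, xs)]
print("(1/y)/x:", [round(q, 3) for q in ratios])
```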

What sorts of non-linear relationships can we linearise using the y² transformation?

The y² transformation has the effect of stretching out the upper end of the y scale. As a guide, relationships that have scatterplots which look like those shown below can often (but not always) be linearised using the y to y² transformation. Note that for the y² transformation to apply, the scatterplot should peak or bottom around y = 0.

[Sketches: two example scatterplots of y against x]


What sorts of non-linear relationships can we linearise using the log y transformation?

The log y transformation has the effect of compressing the upper end of the y scale. As a guide, relationships that have scatterplots which look like those shown below can often (but not always) be linearised using the y to log y transformation.

[Sketches: two example scatterplots of y against x]

What sorts of non-linear relationships can we linearise using the 1/y transformation?

As a guide, relationships that have scatterplots which look like those shown below can often (but not always) be linearised using the y to 1/y transformation.

[Sketches: two example scatterplots of y against x]

Exercise 6C

These exercises are expected to be completed with the aid of a graphics calculator.

1 a Plot the data in the table. Comment on the form of the relationship between y and x.

x 0 2 4 6 8 10
y 1.2 2.8 3.7 4.5 5.1 5.7

b Apply a squared transformation to the y values (y²). Plot the data, and comment on the form of the relationship between y² and x.

2 a Plot the data in the table. Comment on the form of the relationship between y and x.

x 5 10 15 20 25 30
y 13.2 12.2 11.2 10.0 8.7 7.1

b Apply a squared transformation to the y values (y²). Plot the data, and comment on the form of the relationship between y² and x.

3 a Plot the data in the table. Comment on the form of the relationship between y and x.

x 2 6 11 12 21 40
y 5.1 6.2 7.3 7.5 9.1 11.8


b Apply a squared transformation to the y values (y²). Plot the data, and comment on the form of the relationship between y² and x.

4 a Plot the data in the table. Comment on the form of the relationship between y and x.

x 0.1 0.2 0.3 0.4 0.5
y 15.8 25.1 39.8 63.1 100.0

b Apply a log transformation to the y values (log y). Plot the data, and comment on the form of the relationship between log y and x.

5 a Plot the data in the table. Comment on the form of the relationship between y and x.

x 2 4 6 8 10
y 7.94 6.31 5.01 3.98 3.16

b Apply a log transformation to the y values (log y). Plot the data, and comment on the form of the relationship between log y and x.

6 a Plot the data in the table. Comment on the form of the relationship between y and x.

x 1 3 5 7 9
y 7 32 147 681 3162

b Apply a log transformation to the y values (log y). Plot the data, and comment on the form of the relationship between log y and x.

7 a Plot the data in the table. Comment on the form of the relationship between y and x.

x 1 2 3 4 5
y 1 0.5 0.33 0.25 0.20

b Apply a reciprocal transformation to the y values (1/y). Plot the data, and comment on the form of the relationship between 1/y and x.

8 a Plot the data in the table. Comment on the form of the relationship between y and x.

x 0.2 0.4 0.6 0.8 1.0
y 0.71 0.56 0.45 0.38 0.33

b Apply a reciprocal transformation to the y values (1/y). Plot the data, and comment on the form of the relationship between 1/y and x.

9 a Plot the data in the table. Comment on the form of the relationship between y and x.

x 11 14 26 35 41
y 0.43 0.34 0.19 0.14 0.12

b Apply a reciprocal transformation to the y values (1/y). Plot the data, and comment on the form of the relationship between 1/y and x.

c Name a y-axis transformation that should also work for this data. Try it and see.

d Name a y-axis transformation that should not work for this data. Try it and see.

10 The time taken for a local anaesthetic to take effect is related to the dose given. To investigate this relationship, a researcher collected the data shown.

Dose 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5
Time 3.67 3.55 3.42 3.29 3.15 3.00 2.85 2.68 2.51 2.32 2.12


a Construct a scatterplot showing the relationship between the dose of anaesthetic and the time taken for it to take effect, and comment on the form.

b Apply a squared transformation to the time values (y), again plot the data, and comment on the form of the relationship between time squared (y²) and dose (x).

11 The table below shows the number of internet users signing up with a new internet service provider for each of the first nine months of its first year of operation.

Number 24 32 35 44 60 61 78 92 118
Month 1 2 3 4 5 6 7 8 9

a Construct a scatterplot showing the relationship between number of users signing up and month, and comment on the form. Month is the independent variable.

b Apply a log transformation to the number of users (y), again plot the data, and comment on the form of the relationship between log (number) and month (x).

12 A group of ten students was given an opportunity to practise a complex matching task as often as they liked before they were assessed on the task. The number of times they practised the task and the number of errors made when assessed are given in the table below.

Number of practices 1 2 2 4 5 6 7 7 9 11
Errors 14 9 11 5 4 4 3 3 2 2

a Construct a scatterplot showing the relationship between number of practices and number of errors (y), and comment on the form.

b Apply a reciprocal transformation to the number of errors values (1/y), again plot the data, and comment on the form of the relationship between 1/(number of errors) and number of practices.

6.4 Choosing and applying the appropriate transformation

Putting together the information in Sections 6.2 and 6.3, we can see that there may be more than one transformation which linearises the scatterplot. The forms of the scatterplots that can be transformed by the squared, log or reciprocal transformations can be largely classified into one of four categories, shown as the circle of transformations.


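[Figure: The circle of transformations. Each quadrant shows a curved scatterplot shape of y against x, labelled with its possible transformations.]
Top left (increasing, levelling off): y², log x, 1/x
Top right (decreasing, falling more and more steeply): y², x²
Bottom left (decreasing, levelling off): log y, log x, 1/x, 1/y
Bottom right (increasing more and more steeply): x², log y, 1/y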

Note that the transformations we have introduced in this chapter are only able to linearise relationships which are consistently increasing or decreasing.
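The circle of transformations can be written as a lookup table. The Python sketch below uses a rough heuristic introduced here for illustration, not a procedure from this chapter: it judges the direction of a scatterplot from its end points and its curvature by comparing the slope between the first two points with the slope between the last two, then looks up the candidate transformations. The names `CIRCLE`, `describe_shape` and `suggest` are hypothetical. Real, noisy data can easily fool such a rule, so the scatterplot itself should always be inspected.

```python
# Candidate transformations for each quadrant of the circle of transformations,
# keyed by (direction, curvature) of the scatterplot.
CIRCLE = {
    ("increasing", "concave down"): ["y^2", "log x", "1/x"],          # top left
    ("decreasing", "concave down"): ["y^2", "x^2"],                   # top right
    ("decreasing", "concave up"): ["log y", "log x", "1/x", "1/y"],   # bottom left
    ("increasing", "concave up"): ["x^2", "log y", "1/y"],            # bottom right
}

def describe_shape(xs, ys):
    """Rough heuristic. Assumes at least three points, distinct x values and a
    relationship that is consistently increasing or decreasing."""
    points = sorted(zip(xs, ys))
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in zip(points, points[1:])]
    direction = "increasing" if points[-1][1] > points[0][1] else "decreasing"
    # Slopes that grow from left to right indicate a curve that bends upwards.
    curvature = "concave up" if slopes[-1] > slopes[0] else "concave down"
    return direction, curvature

def suggest(xs, ys):
    """Candidate transformations from the circle for this scatterplot."""
    return CIRCLE[describe_shape(xs, ys)]

# Data from Example 5: y falls steeply, then levels off.
print(suggest([0, 1, 2, 3, 4, 5], [100, 37, 14, 5, 2, 1]))
```

For the Example 5 data the heuristic places the scatterplot in the bottom-left quadrant, whose list includes the log y transformation used in that example.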

The advantage of having alternatives is that, in practice, we can always try each of them to see which gives us the best result. How do we decide which transformation is the best? The best transformation is the one that results in the best linear model. To choose the best linear model, we consider for each transformation applied:

• the residual plot, in order to evaluate the linearity of the transformed relationship
• the value of the coefficient of determination (r²): a higher value indicates a better fit.

This procedure is illustrated in Example 7.
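The selection procedure can be sketched in Python. For illustration, the data are taken from Exercise 6B, Question 5 (an increasing scatterplot that levels off, so the circle suggests y², log x and 1/x). For each candidate, the sketch fits a least squares line and reports r² together with the residuals, which stand in here for a residual plot. The helper `fit` is a hypothetical name, and picking the largest r² is only half of the procedure: the residuals must also show no clear pattern.

```python
import math

def fit(x, y):
    """Least squares line y = a + b*x; returns (a, b, r_squared, residuals)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    sxx = sum((u - mean_x) ** 2 for u in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [v - (a + b * u) for u, v in zip(x, y)]
    return a, b, sxy ** 2 / (sxx * syy), residuals

# Data from Exercise 6B, Question 5
xs = [5, 10, 150, 500, 1000]
ys = [3.1, 4.0, 7.5, 9.1, 10.0]

# The untransformed data plus the three candidates from the circle
candidates = {
    "none": (xs, ys),
    "y^2": (xs, [y ** 2 for y in ys]),
    "log x": ([math.log10(x) for x in xs], ys),
    "1/x": ([1 / x for x in xs], ys),
}

results = {name: fit(x, y) for name, (x, y) in candidates.items()}
for name, (a, b, r2, res) in results.items():
    print(f"{name:6s} r^2 = {r2:.3f}  residuals = {[round(e, 2) for e in res]}")

best = max((name for name in results if name != "none"), key=lambda name: results[name][2])
print("highest r^2:", best)
```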


Example 7

The data in this table gives life expectancy in years and gross national product, GNP, in dollars for 24 countries in 1982. Using an appropriate transformation, find a regression model for the relationship between life expectancy in years and GNP.

Country            GNP ($)   Life expectancy (years)
Nicaragua              950   58
Paraguay              1670   65
Venezuela             4250   68
France              11 520   74
West Germany∗       12 280   73
Greece                4170   73
Norway              14 300   75
Czechoslovakia∗       5540   71
Austria               9830   72
Jordan                1680   61
Sri Lanka              320   67
Brunei              22 260   66
Indonesia              550   50
North Korea            930   66
Mongolia               940   64
Taiwan                2670   72
Australia           11 220   74
Congo                 1420   48
Ethiopia               150   41
Guinea                 330   44
Mauritania             520   44
Nigeria                940   49
Togo                   350   48
Zaire                  180   48

Source: Modern Data Analysis: A First Course in Applied Statistics, L.C. Hamilton 1990, p. 537.
∗ West Germany is now part of Germany; Czechoslovakia is now the Czech Republic and Slovakia.

Solution

1 Decide which of the variables is the independent variable, and which is the dependent variable.

The independent variable is GNP. The dependent variable is Life expectancy.

2 Plot the values of y against x, decide if the form of the relationship is linear or non-linear, and find the value of the coefficient of determination (r²).

[Scatterplot: Life expectancy against GNP]

3 Write down your conclusion. The relationship between y and x is non-linear: r² = 36.7%.

4 Compare the shape of this plot to those in the circle of transformations (Section 6.4). The scatterplot is similar to the plot in the top left-hand corner. Thus the y², log x and 1/x transformations are the ones to investigate.

Suitable transformations are y², log x and 1/x.


Try the y² transformation

5 Calculate the values of (Life expectancy)² and plot these against GNP. Comment on the linearity of the plot.

[Scatterplot: Life expectancy squared against GNP]

6 Fit a regression line, and find the value of the coefficient of determination (r²). Produce a residual plot, and use this to comment on the form of the relationship.

r² = 38.4%. The relationship between (Life expectancy)² and GNP is still non-linear. This is confirmed by the residual plot.

[Residual plot: Residual against GNP]

7 Comment on the effect of the transformation. The y² transformation has not really helped.

Try the log x transformation

5 Calculate the values of log GNP, and plot Life expectancy against log GNP.

[Scatterplot: Life expectancy against log GNP]

6 Fit a regression line, and find the value of r². Produce a residual plot, and use this to comment on the form of the relationship.

r² = 66.0%. The relationship between Life expectancy and log GNP is closer to linear. This is confirmed by the residual plot.


7 Comment on the effect of the transformation.

[Residual plot: Residual against log GNP]

The log x transformation has linearised the relationship quite well.

Try the 1/x transformation

8 Calculate the values of 1/GNP, and plot Life expectancy against 1/GNP.

[Scatterplot: Life expectancy against 1/GNP]

9 Fit a regression line, and find the value of r². Produce a residual plot, and use this to comment on the form of the relationship.

r² = 51.5%. The relationship between Life expectancy and 1/GNP is reasonably linear. This is confirmed by the residual plot.

[Residual plot: Residual against 1/GNP]

10 Comment on the effect of the transformation.

The 1/x transformation has done a reasonable job in linearising the relationship.

11 Decide which transformation is the most appropriate for this relationship. Choose the transformation which gives the most linear relationship (from the residual plots) and the highest value of r².

The most appropriate transformation to use here is the log x transformation, as the residual plot shows that the relationship between log GNP and Life expectancy is linear, and this model has the highest coefficient of determination, r² = 66.0%.


12 As the relationship between Life expectancy and log (GNP) appears to be linear and there are no obvious outliers, we can use the least squares method to fit a line to the data. Using a calculator, find the equation of the least squares regression line and write it in terms of the transformed variables.

Using the log x transformation gives a regression model for the relationship:

Life expectancy = 14.3 + 14.5 × log(GNP)

Note: The independent variable (IV) is now log GNP, and the dependent variable (DV) is Life expectancy.
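The final model in Example 7 can be checked with a short Python sketch that takes base-10 logs of GNP and fits the least squares line of Life expectancy on log GNP. The dictionary simply copies the table above; the function name `predict_life_expectancy` is hypothetical. Rounded to one decimal place, the fitted coefficients should agree with the model above, and r² with the 66.0% quoted in step 11.

```python
import math

# (GNP in dollars, life expectancy in years), copied from the Example 7 table
data = {
    "Nicaragua": (950, 58), "Paraguay": (1670, 65), "Venezuela": (4250, 68),
    "France": (11520, 74), "West Germany": (12280, 73), "Greece": (4170, 73),
    "Norway": (14300, 75), "Czechoslovakia": (5540, 71), "Austria": (9830, 72),
    "Jordan": (1680, 61), "Sri Lanka": (320, 67), "Brunei": (22260, 66),
    "Indonesia": (550, 50), "North Korea": (930, 66), "Mongolia": (940, 64),
    "Taiwan": (2670, 72), "Australia": (11220, 74), "Congo": (1420, 48),
    "Ethiopia": (150, 41), "Guinea": (330, 44), "Mauritania": (520, 44),
    "Nigeria": (940, 49), "Togo": (350, 48), "Zaire": (180, 48),
}

x = [math.log10(gnp) for gnp, _ in data.values()]   # IV: log GNP
y = [life for _, life in data.values()]              # DV: Life expectancy

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
sxx = sum((u - mean_x) ** 2 for u in x)
syy = sum((v - mean_y) ** 2 for v in y)

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r2 = sxy ** 2 / (sxx * syy)

print(f"Life expectancy = {intercept:.1f} + {slope:.1f} x log(GNP)   (r^2 = {r2:.1%})")

def predict_life_expectancy(gnp):
    """Predicted life expectancy (years) for a GNP in dollars, from the fitted model."""
    return intercept + slope * math.log10(gnp)
```

Because the model is linear in log GNP, each tenfold increase in GNP adds the same amount (the slope, about 14.5 years) to the predicted life expectancy.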

Some comments

It might seem ‘unnatural’ to talk about the wealth of a country in terms of log (GNP), yet when we are comparing the relative wealth of countries, log (GNP) is probably a more useful measure than GNP. For instance, knowing that the difference in GNP between Australia and Sri Lanka is $10 900 is less informative than knowing that the difference in log (GNP) is 1.5448, which tells us that Australia’s GNP is 10^1.5448, or about 35, times that of Sri Lanka. ‘Natural’ units of measurement are more often those that are familiar rather than those that are most useful!
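The arithmetic in this comment can be checked in a few lines of Python: subtracting base-10 logs and then raising 10 to that power recovers the ratio of the two GNPs. The variable names are illustrative.

```python
import math

australia, sri_lanka = 11_220, 320   # GNP in dollars, from the Example 7 table

difference = australia - sri_lanka                              # $10 900
log_difference = math.log10(australia) - math.log10(sri_lanka)
ratio = 10 ** log_difference   # undo the log: a difference of logs is the log of a ratio

print(difference, round(log_difference, 4), round(ratio, 1))
```

Here 10 raised to the log difference is the ratio 11 220 ÷ 320 ≈ 35.06, which is the sense in which log (GNP) measures relative wealth.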

Exercise 6D

1 The following scatterplots show non-linear relationships. For each scatterplot, state which of the transformations x², log x, 1/x, y², log y, 1/y, if any, you would apply to linearise the relationship.

a [Scatterplot: y (0 to 5) against x (0 to 10)]
b [Scatterplot: y (0 to 5) against x (0 to 10)]
c [Scatterplot: y (0 to 5) against x (0 to 10)]
d [Scatterplot: y (0 to 5) against x (0 to 10)]


2 The data below gives the yield in kilograms and length in metres of twelve commercial potato plots.

Yield (kilograms) 346 1798 152 86 436 968 686 257 2435 287 1850 1320
Length (metres) 12.1 27.4 8.3 5.5 15.7 21.5 19.3 9.0 34.2 14.7 31.9 25.3

a Construct a scatterplot showing the relationship between yield and length of plot, and comment on the form.

b Using an appropriate transformation, find a regression model for the relationship between yield in kilograms and length of plot in metres.

3 A recent study in Canada showed that cigarette consumption (per day) is related to cost per pack. Some data drawn from that study is shown below.

Cost ($) 4.00 4.50 4.80 5.50 6.00 6.50 7.50
Cigarette consumption 8.0 7.4 7.0 6.4 5.9 5.5 5.0

a Construct a scatterplot showing the relationship between the cost of cigarettes and cigarette consumption, and comment on the form.

b Using an appropriate transformation, find a regression model for the relationship between the cost of cigarettes and cigarette consumption.

4 The population of a large town increased over a 13-year period as shown in the table.

Year Population
1 58860
2 57770
3 58206
4 59513
5 59983
6 60123
7 59763
8 61726
9 60387
10 61646
11 62347
12 64185
13 67158

a Construct a scatterplot showing the annual population growth of the town, and comment on the form.

b Using an appropriate transformation, find a regression model for the annual population growth of the town.


5 The monthly average exchange rate (to the nearest cent) between the Australian dollar and the US dollar over a period of 18 months in the 1990s is given in the table below.

Month Exchange rate (US $)
1 0.77
2 0.77
3 0.77
4 0.76
5 0.78
6 0.78
7 0.77
7 0.76
8 0.76
9 0.76
10 0.75
11 0.75
12 0.72
13 0.72
14 0.69
15 0.69
16 0.68
17 0.71
18 0.70

a Construct a scatterplot showing the exchange rate over the 18-month period, and comment on the form.

b Using an appropriate transformation, find a regression model for the exchange rate over that 18-month period.

6 The table below shows the percentage of people who can read (literacy rate) and the gross domestic product (GDP) per capita for a selection of 14 countries.

Country Literacy rate (%) GDP per capita
Botswana 72 2677
Cambodia 35 260
Canada 97 19904
Ethiopia 24 122
France 99 18944
Georgia 99 4500
Germany 99 17539
Honduras 73 1030
Japan 99 19860
Liberia 40 409
Pakistan 35 406
Saudi Arabia 62 6651
Switzerland 99 22384
Syria 64 2436

a Construct a scatterplot showing the relationship between literacy rate and GDP, and comment on the form.

b Using an appropriate transformation, find a regression model for the relationship between literacy rate and GDP for this group of countries.


Key ideas and chapter summary

Data transformation: This means changing the scale on either the x or y axis. It is performed when a residual plot shows that the underlying relationship in a set of bivariate data is clearly non-linear.

x² or y² transformation: The square transformation stretches out the upper end of the scale on an axis.

log x or log y transformation: The log transformation compresses the upper end of the scale on an axis.

1/x or 1/y transformation: The reciprocal transformation compresses the upper end of the scale on an axis to a greater extent than the log transformation.

Residual plots: Residual plots are used to assess the effectiveness of each data transformation.

Coefficient of determination (r²): The transformation which results in a linear relationship and which has the highest value of the coefficient of determination is considered to be the best transformation.

The circle of transformations: The circle of transformations provides guidance in choosing the transformations that can be used to linearise various types of scatterplots. See Section 6.4.

Skills check

Having completed this chapter you should be able to:

• recognise which of the x², log x, 1/x, y², log y or 1/y transformations might be used to linearise a bivariate relationship
• apply each of these transformations to a data set
• use residual plots and the coefficient of determination r² to decide which transformation gives the best model for the relationship
• use the transformed variable as part of a regression analysis to give a model for the relationship

Multiple-choice questions

1 The missing data values, a and b, in the table are:

value 1 2 3 4

(value)² a 4 9 16

log(value) 0 b 0.477 0.602

A a = 0, b = 0.5 B a = 1, b = 0.5 C a = 1, b = 0.301

D a = 1, b = 0.602 E a = 1, b = 0.693


2 Select the statement which correctly completes the sentence:

‘The effect of a log transformation is to . . .’

A stretch the high values in the data B maintain the distance between values

C stretch the low values in the data D compress the high values in the data

E reverse the order of the values in the data

3 The scatterplot below shows the relationship between the number of weeks each person has been on a diet program and their weight loss in kilograms for a group of subjects. A least squares regression line has been fitted to the data.

[Scatterplot with least squares line: Weight loss (0 to 14) against Number of weeks on a diet (2 to 7)]

The residual plot for this least squares line would look like:

A [Plot: Number of weeks on a diet (2 to 7) against Residual (–4.00 to 4.00)]
B [Residual plot: Residual (–4.00 to 4.00) against Number of weeks on a diet (2 to 7)]
C [Plot: Number of weeks on a diet (2 to 7) against Weight loss (0 to 14)]
D [Plot: Weight loss (0 to 14) against Number of weeks on a diet (2 to 7)]
E [Residual plot: Residual (–4.00 to 4.00) against Number of weeks on a diet (2 to 7)]

4 The relationship between two variables y and x as shown in the scatterplot is non-linear.

[Scatterplot: y (0 to 5) against x (0 to 10)]

In an attempt to transform the relationship to linearity, a student would be advised to:

A leave out the first four points
B use a y² transformation
C use a log y transformation
D use a 1/y transformation
E use a least squares regression line


Chapter 6 — Data transformation 199

5 The relationship between two variables y and x as shown in the scatterplot is non-linear.

[Scatterplot: y (1 to 5) against x (0 to 10)]

Which of the following sets of transformations could possibly linearise this relationship?

A log y, 1/y, log x, 1/x

B y², x²

C y², log x, 1/x

D log y, 1/y, x²

E ax + b

6 The relationship between two variables y and x as shown in the scatterplot is non-linear.

[Scatterplot: y (1 to 5) against x (0 to 10)]

Which of the following transformations is most likely to linearise the relationship?

A a 1/x transformation

B a y² transformation

C a log y transformation

D a 1/y transformation

E a log x transformation

7 The relationship between two variables y and x as shown in the scatterplot is clearly non-linear.

[Scatterplot: y (0 to 5) against x (1 to 10)]

In an attempt to transform the relationship to linearity, a student would be advised to apply:

A an x² transformation

B a y² transformation

C a log y transformation

D a 1/y transformation

E none of these

8 Brian has determined from a scatterplot of his data that the appropriate transformations for his data are log x, 1/x and y². After applying each of these transformations to the data, he obtains the results shown below.

Model          Residuals   r²
y vs x         Curved      79.6%
y vs log x     Random      80.8%
y vs 1/x       Random      81.9%
y² vs x        Random      88.4%


Based on the information in the table, which transformation would you suggest Brian use?

A an x² transformation

B a y² transformation

C a log x transformation

D a 1/x transformation

E no transformation

9 When investigating the relationship between the weight of the strawberries picked from a strawberry patch and the width of the patch, Suzie decides that an x² transformation is appropriate. After transforming the data, she fits a least squares regression line to the data and determines that the intercept is 10 and the slope is 5. Based on this information, the model that Suzie has fitted to the data can be written as:

A (weight)² = 10 + 5 × width

B weight = 5 + 10 × (width)²

C weight = 10 + 5 × (width)²

D (weight)² = 10 + 5 × (width)²

E (weight)² = 5 + 10 × width

10 Suppose that the relationship between the hours spent studying for an exam and the mark achieved can be modelled by the equation:

Mark = 20 + 40 × log (Hours)

From this model, we would predict that a student who studies for 20 hours would score a mark (to the nearest whole number) of:

A 80 B 78 C 180 D 72 E 140
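Evaluating a model like this means substituting the number of hours and taking the logarithm. The Python sketch below assumes that log denotes the base-10 logarithm, and the function name `predict_mark` is hypothetical. It checks the model at 10 hours, where log 10 = 1, so the arithmetic can also be verified by hand.

```python
import math


def predict_mark(hours: float) -> float:
    """Evaluate the model Mark = 20 + 40 * log(Hours), assuming log is base 10."""
    if hours <= 0:
        raise ValueError("the logarithm is only defined for positive hours")
    return 20 + 40 * math.log10(hours)


# At 10 hours, log10(10) = 1, so the model gives 20 + 40 * 1 = 60.
print(round(predict_mark(10)))  # prints 60
```

Substituting any other number of hours works the same way.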

Extended-response questions

1 Measurements of distance travelled in metres and time taken in seconds were made on a falling body. The data is given in the table below.

Time (s)       0    1     2      3      4      5       6
Distance (m)   0    5.2   18.0   42.0   79.0   128.0   168.0
(Time)²

a Construct a scatterplot of the data and comment on its form.

b Determine the values of (Time)² and complete the table.

c Construct a scatterplot of Distance against (Time)².

d Obtain a residual plot for the new model and comment on the linearity.

e Determine the value of r² for the new model.

f Write down the regression equation for the new model in terms of the variables in the question.

g Use the regression equation to predict the distance travelled in seven seconds.

2 The data in the table below shows the marks obtained by students on a test and the amount of time they reported studying for the test:

Mark           62    74     79    80    56    86    92    87     64    88    48    32
Time (hours)   1.5   2.25   3.0   2.5   0.8   3.5   6.0   2.75   1.0   4.5   0.5   0.1


a We want to predict a student’s mark from the time they reported studying for the test. In this situation, which is the dependent variable and which is the independent variable?

b Construct a scatterplot and comment on the relationship between test mark and time spent studying in terms of direction, outliers, form and strength.

c i Fit a linear model to the data and record its equation. Interpret the slope in terms of the problem at hand.

ii Calculate the coefficient of determination and interpret.

iii Construct a residual plot and use it to comment on the suitability of modelling the relationship between Mark and Time spent studying with a straight line.

d Apply a log transformation to Time. Then:

i construct a scatterplot for the transformed data

ii find the equation of the least squares regression line for the transformed data

iii use the equation to predict the mark obtained after 5 hours of study

iv calculate the coefficient of determination and interpret

v construct a residual plot and use it to comment on the linearity of the transformed model

3 The following are the testosterone levels and the age at first conviction for violent and aggressive crimes collected on a sample of young male prisoners. It is believed that the higher the testosterone level in a male prisoner, the earlier they are likely to be convicted of a violent and aggressive crime. A correlation and regression analysis is also given.

Testosterone   Age at first conviction
1305           11
1000           12
1175           13
1495           14
1060           15
800            16
1005           16
710            17
1150           18
605            20
690            21
700            23
625            24
610            27
450            30

[Scatterplot: Age (10 to 30) against Testosterone level (400 to 1600), with least squares regression line y = 31.9 – 0.015x, r² = 0.662]

[Residual plot: Residual (–5 to 5) against Testosterone level (400 to 1600)]


a What is the value of Pearson’s correlation coefficient r?

b Write the equation of the least squares regression line in terms of Testosterone level and Age.

c Interpret the value of r² in terms of Testosterone level and Age.

d Use the residual plot to comment on the linearity of the relationship.

e Construct a scatterplot of Age against log (Testosterone).

f Obtain a residual plot for the new model and comment on the linearity.

g Determine the value of r² for the new model.

h Write down the regression equation for the new model in terms of the variables in the question.

4 Are infant mortality rates in a country related to the number of doctors in that country? The data below gives infant mortality rates (deaths per 1000 births) and doctor numbers (per 100 000 people) for 17 countries.

Infant mortality   No. of doctors
12                 192
13                 222
12                 154
14                 294
10                 182
10                 179
7                  204
10                 271
15                 270
85                 9
20                 357
21                 250
54                 79
75                 59
121                27
71                 52
111                61

a Construct a scatterplot of infant mortality against number of doctors and comment on the relationship between infant mortality rate and doctor numbers in terms of direction, outliers, form and strength.

b Construct a scatterplot of Infant mortality against log (Number of doctors).

c Obtain a residual plot for the new model and comment on the linearity.

d Determine the value of r² for the new model.

e Write down the regression equation for the new model in terms of the variables in the question.

f Use the regression equation to predict the infant mortality rate when there are 100 doctors (per 100 000 people).

5 Tree ages can be determined by cutting down a tree and counting the number of rings on the stump of its trunk. This, however, is a destructive process and it would be useful to have a method of working out the approximate age of a tree without having to cut it down. Noting the obvious, that trees tend to get bigger as they get older, we might be able to use some external measurement of size to help us estimate the age of a tree.


The data below shows the age (in years) and diameter at chest height (in cm) of a sample of trees of the same species taken from a commercial plantation.

Age       Diameter        Age       Diameter
(years)   (centimetres)   (years)   (centimetres)
4         2.0             16        11.4
5         2.0             18        11.7
8         2.5             22        14.7
8         5.1             25        16.5
8         7.5             29        15.2
10        5.1             30        15.2
10        8.9             34        17.8
12        12.4            38        17.8
13        9.0             40        19.1
14        6.4

a We wish to predict the age of a tree from its diameter at chest height. In this situation, which is the dependent variable and which is the independent variable?

b Construct a scatterplot and comment on the relationship between age and diameter in terms of direction, outliers, form and strength.

c i Fit a linear model to the data and record its equation. Interpret the slope in terms of the problem at hand.

ii Calculate the coefficient of determination and interpret.

iii Form a residual plot and use it to comment on the suitability of modelling the relationship between age and diameter with a straight line.

d Use the x² transformation to linearise the data. Then:

i construct a scatterplot of age against diameter squared

ii find the equation of the least squares regression line for the transformed data

iii calculate the coefficient of determination and interpret

iv form a residual plot and use it to comment on the suitability of modelling the relationship between age and diameter squared with a straight line
