1A Classifying data
Exercise 1A
Basic ideas
1 a What is a categorical variable? Give an example.
b What is a numerical variable? Give an example.
2 There are two types of categorical variables. Name them and give an example of each.
3 There are two types of numerical variables. Name them and give an example of each.
Types of variables: categorical or numerical
4 Classify each of the following variables (in italics) as categorical or numerical when recording information about:
a time spent playing computer games (hours)
b number of people in a bus
c eye colour (brown, blue, green)
d post code
e time (in minutes) spent exercising each day
f number of frogs in a pond
g bank account numbers
h height (short, average, tall).
Categorical variables: nominal or ordinal
5 Classify the categorical variables identified below (in italics) as nominal or ordinal.
a The colour of a pencil
b The different types of animals in a zoo
c The floor levels in a building (0, 1, 2, 3, . . . )
d The speed of a car (on or below the speed limit, above the speed limit)
e Shoe size (6, 8, 10, . . . )
f Family names
Numerical variables: discrete or continuous
6 Classify the numerical variables identified below (in italics) as discrete or continuous.
a The number of pages in a book
b The cost (in dollars) to fill the tank of a car with petrol
c The volume of petrol (in litres) used to fill the tank of a car
d The speed of a car in km/h
e The number of people at a football match
f The air temperature in degrees Celsius
Cambridge Senior Maths AC/VCE Further Mathematics 3&4
ISBN 978-1-316-61622-2 © Jones et al. 2016 Photocopying is restricted under law and this material must not be transferred to another party.
Cambridge University Press Updated November 2016
1B Displaying and describing the distributions of categorical variables
Exercise 1B
Constructing frequency tables from raw data
1 a In a frequency table, what is the mode?
b Identify the mode in the following datasets.
i Grades: A A C B A B B B B D C
ii Shoe size: 8 9 9 10 8 8 7 9 8 10 12 8 10
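The mode can also be checked by counting how often each value occurs. The following is a minimal Python sketch, assuming the data have been typed in as lists; the helper name `modes` is hypothetical, and it returns every value tied for the highest count because a dataset can have more than one mode.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often (ties give several modes)."""
    counts = Counter(values)                 # value -> number of occurrences
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


grades = "A A C B A B B B B D C".split()
shoe_sizes = [8, 9, 9, 10, 8, 8, 7, 9, 8, 10, 12, 8, 10]

print("Grades:", modes(grades))
print("Shoe size:", modes(shoe_sizes))
```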
2 The following data identify the state of residence of a group of people, where 1 = Victoria, 2 = South Australia and 3 = Western Australia.
2 1 1 1 3 1 3 1 1 3 3
a Is the variable state of residence categorical or numerical?
b Form a frequency table (with both numbers and percentages) to show the distribution of state of residence for this group of people. Use the table in Example 1 as a model.
c Construct a bar chart using Example 2 as a model.
3 The size (S = small, M = medium, L = large) of 20 cars was recorded as follows.
S S L M M M L S S M
M S L S M M M S S M
a Is the variable size in this context numerical or categorical?
b Form a frequency table (with both numbers and percentages) to show the distribution of size for these cars. Use the table in Example 1 as a model.
c Construct a percentage bar chart.
Constructing a percentage segmented bar chart from a frequency table
4 The table shows the frequency distribution of the place of birth for 500 Australians.
a Is place of birth an ordinal or a nominal variable?
b Display the data in the form of a percentage segmented bar chart.
Place of birth    Percentage
Australia         78.3
Overseas          21.8
Total             100.1
5 The table records the number of new cars sold in Australia during the first quarter of one year, categorised by type of vehicle (private, commercial).
a Is type of vehicle an ordinal or a nominal variable?
                   Frequency
Type of vehicle    Number     Percentage
Private            132 736
Commercial          49 109
Total
12 Core • Chapter 1 • Displaying and describing data distributions 1B
b Copy and complete the table, giving the percentages correct to the nearest whole number.
c Display the data in the form of a percentage segmented bar chart.
Analysing frequency tables and writing reports
6 The table shows the frequency distribution of school type for a number of schools. The table is incomplete.
a Write down the information missing from the table.
b How many schools are categorised as ‘independent’?
                Frequency
School type     Number    Percentage
Catholic        4         20
Government      11
Independent     5         25
Total                     100
c How many schools are there in total?
d What percentage of schools are categorised as ‘government’?
e Use the information in the frequency table to complete the following report describing the distribution of school type for these schools.
Report
____ schools were classified according to school type. The majority of these schools, ____%, were found to be ____. Of the remaining schools, ____ were ____, while 20% were ____.
7 Twenty-two students were asked the question, ‘How often do you play sport?’, with the possible responses: ‘regularly’, ‘sometimes’ or ‘rarely’. The distribution of responses is summarised in the frequency table.
a Write down the information missing from the table.
                Frequency
Plays sport     Number    Percentage
Regularly       5         22.7
Sometimes       10
Rarely                    31.8
Total           22
b Use the information in the frequency table to complete the report below describing the distribution of student responses to the question, ‘How often do you play sport?’
Report
When students were asked the question, ‘How often do you play sport?’, the dominant response was ‘Sometimes’, given by ____% of the students. Of the remaining students, ____% of the students responded that they played sport ____, while ____% said that they played sport ____.
8 The table shows the frequency distribution of the eye colour of 11 preschool children.
Use the information in the table to write a brief report describing the frequency distribution of eye colour.
                Frequency
Eye colour      Number    Percentage
Brown           6         54.5
Hazel           2         18.2
Blue            3         27.3
Total           11        100.0
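The percentage column of a frequency table is each count divided by the total, multiplied by 100. A minimal Python sketch using the eye colour counts above (the function name `percentage_table` is hypothetical) reproduces the table; because each percentage is rounded to one decimal place, rounded percentages need not always add to exactly 100.

```python
def percentage_table(counts, decimals=1):
    """Return {category: percentage of the total}, rounded to `decimals` places."""
    total = sum(counts.values())
    return {category: round(100 * n / total, decimals) for category, n in counts.items()}


# Counts from the eye colour frequency table for 11 preschool children.
eye_colour = {"Brown": 6, "Hazel": 2, "Blue": 3}
percentages = percentage_table(eye_colour)

for category, n in eye_colour.items():
    print(f"{category:<7}{n:>4}{percentages[category]:>8.1f}")
print(f"{'Total':<7}{sum(eye_colour.values()):>4}{sum(percentages.values()):>8.1f}")
```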
1C Displaying and describing the distributions of numerical variables
The grouped frequency distribution
When looking at ways of organising and displaying numerical data, we are faced with the problem of how to deal with continuous variables that can take a large range of values, for example, age (0–100+). Listing all possible ages would be tedious and produce a large and unwieldy frequency table or graphical display.
To solve this problem, we group the data into a small number of convenient intervals. We then organise the data into a frequency table using these data intervals. We call this sort of table a grouped frequency table.
Example 5 Constructing a grouped frequency table
The data below give the average hours worked per week in 23 countries.
35.0 48.0 45.0 43.0 38.2 50.0 39.8 40.7 40.0 50.0 35.4 38.8
40.2 45.0 45.0 40.0 43.0 48.8 43.3 53.1 35.6 44.1 34.8
Form a grouped frequency table with five intervals.
Solution
1 Set up a table as shown. Use five intervals: 30.0–34.9, 35.0–39.9, . . . , 50.0–54.9.
2 List these intervals, in ascending order, under ‘Average hours worked’.
3 Count the number of countries whose average working hours fall into each of the intervals. Record these values in the ‘Number’ column.
                         Frequency
Average hours worked     Number    Percentage
30.0–34.9                1         4.3
35.0–39.9                6         26.1
40.0–44.9                8         34.8
45.0–49.9                5         21.7
50.0–54.9                3         13.0
Total                    23        99.9
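Steps 1 to 3 amount to finding which interval each value belongs to and tallying. The Python sketch below assumes, as the data allow, that every value is recorded to one decimal place, so the interval 30.0–34.9 can be treated as "from 30.0 up to but not including 35.0"; the function name `grouped_counts` is hypothetical. It also adds the percentage column.

```python
# Average hours worked per week in 23 countries (Example 5).
hours = [35.0, 48.0, 45.0, 43.0, 38.2, 50.0, 39.8, 40.7, 40.0, 50.0, 35.4, 38.8,
         40.2, 45.0, 45.0, 40.0, 43.0, 48.8, 43.3, 53.1, 35.6, 44.1, 34.8]


def grouped_counts(data, start, width, n_intervals):
    """Tally values into intervals [start, start + width), [start + width, ...)."""
    counts = [0] * n_intervals
    for x in data:
        k = int((x - start) // width)   # index of the interval containing x
        if 0 <= k < n_intervals:        # ignore values outside the chosen intervals
            counts[k] += 1
    return counts


counts = grouped_counts(hours, start=30.0, width=5.0, n_intervals=5)
total = sum(counts)
for k, n in enumerate(counts):
    low = 30.0 + 5.0 * k
    print(f"{low:.1f}-{low + 4.9:.1f}  {n:>3}  {100 * n / total:5.1f}")
print(f"Total      {total:>3}")
```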
Exercise 1C
Constructing a histogram from a frequency table
1 Construct a histogram to display the information in the frequency table opposite. Use the histogram in Example 6 as a model. Label axes and mark scales.
Population density    Frequency
0–199                 11
200–399               4
400–599               4
600–799               2
800–999               1
Total                 22
Reading information from a histogram
2 The histogram opposite displays the distribution of the number of words in 30 randomly selected sentences.
[Histogram: Percentage (vertical axis, 0–35) against Number of words in sentence (horizontal axis, 0–30)]
a What percentage of these sentences contained:
i 5–9 words? ii 25–29 words? iii 10–19 words? iv fewer than 15 words?
Write answers correct to the nearest per cent.
b How many of these sentences contained:
i 20–24 words? ii more than 25 words?
c What is the modal interval?
3 The histogram opposite displays the distribution of the batting averages of cricketers playing for a district team.
[Histogram: Frequency (vertical axis, 0–4) against Batting average (horizontal axis, 0–55)]
a How many players have their averages recorded in this histogram?
b How many of these cricketers had a batting average:
i 20 or more? ii less than 15? iii at least 20 but less than 30? iv of 45?
c What percentage of these cricketers had a batting average:
i 50 or more? ii at least 20 but less than 40?
Constructing a histogram from raw data using a CAS calculator
4 The pulse rates of 23 students are given below.
86 82 96 71 90 78 68 71 68 88 76 74
70 78 69 77 64 80 83 78 88 70 86
a Use a CAS calculator to construct a histogram so that the first column starts at 63 and the column width is two.
b i What is the starting point of the third column?
ii What is the ‘count’ for the third column? What are the actual data values?
c Redraw the histogram so that the column width is five and the first column starts at 60.
d For this histogram, what is the count in the interval ‘65 to <70’?
5 The numbers of children in the families of 25 VCE students are listed below.
1 6 2 5 5 3 4 1 2 7 3 4 5
3 1 3 2 1 4 4 3 9 4 3 3
a Use a CAS calculator to construct a histogram so that the column width is one and the first column starts at 0.5.
b What is the starting point for the fourth column and what is the count?
c Redraw the histogram so that the column width is two and the first column starts at 0.
d i What is the count in the interval from 6 to less than 8?
ii What actual data value(s) does this interval include?
Determining the shape, centre and spread from a histogram
6 Identify each of the following histograms as approximately symmetric, positively skewed or negatively skewed, and mark the following.
i The mode (if there is a clear mode)
ii Any potential outliers
iii The approximate location of the centre
a [Histogram A: Frequency (vertical axis)]
b [Histogram B: Frequency (vertical axis)]
c [Histogram C: Frequency (vertical axis)]
d [Histogram D: Frequency (vertical axis)]
e [Histogram E: Frequency (vertical axis)]
f [Histogram F: Frequency (vertical axis)]
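The CAS settings used in Questions 4 and 5 (where the first column starts, and how wide each column is) decide which values share a column. The Python sketch below imitates that for the pulse-rate data in Question 4. It is not CAS syntax: `histogram_columns` is a hypothetical helper that assumes the start is at or below the smallest value and that each column includes its left edge but not its right, as in ‘65 to <70’.

```python
import math

# Pulse rates of 23 students (Exercise 1C, Question 4).
pulse_rates = [86, 82, 96, 71, 90, 78, 68, 71, 68, 88, 76, 74,
               70, 78, 69, 77, 64, 80, 83, 78, 88, 70, 86]


def histogram_columns(data, start, width):
    """Return (left edge, count) for columns [start, start + width), ... covering max(data).

    Assumes start <= min(data).
    """
    n_columns = math.floor((max(data) - start) / width) + 1
    counts = [0] * n_columns
    for x in data:
        counts[math.floor((x - start) / width)] += 1
    return [(start + k * width, counts[k]) for k in range(n_columns)]


width = 2
for left, count in histogram_columns(pulse_rates, start=63, width=width):
    print(f"{left} to <{left + width}: {count}")
```

Changing `start` and `width` (for example, `start=60, width=5`) regroups the same data, which is why the count in a given column depends on these settings.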
7 These three histograms show the marks obtained by a group of students in three subjects.
[Histograms for Subject A, Subject B and Subject C: Frequency (vertical axis, 0–10) against Marks (horizontal axis, 2–46)]
a Is each of the distributions approximately symmetric or skewed?
b Are there any clear outliers?
c Determine the interval containing the central mark for each of the three subjects.
d In which subject was the spread of marks the least? Use the maximum range to estimate the spread.
e In which subject did the marks vary most? Use the range to estimate the spread.
Describing a histogram in the context of its data
8 The histogram opposite shows the distribution of pulse rate for 28 students. Use the histogram to complete the report below describing the distribution of pulse rate in terms of shape, centre, spread and outliers (if any).
[Histogram: Frequency (count) (vertical axis, 0–6) against Pulse rate (beats per minute) (horizontal axis, 60–115)]
Report
For the ____ students, the distribution of pulse rates is ____ with an outlier. The centre of the distribution lies between ____ beats per minute and the spread of the distribution is ____ beats per minute. The outlier lies somewhere between ____ beats per minute.
9 The histogram opposite shows the distribution of travel times (in minutes) for 42 journeys from an outer suburban station to the city.
Use the histogram to write a brief report describing the distribution of travel times in terms of shape, centre, spread and outliers (if any).
[Histogram: Frequency (vertical axis, 0–12) against Travel time (minutes) (horizontal axis, 55–95)]
1D Using a log scale to display data
Many numerical variables that we deal with in statistics have values that range over several orders of magnitude. For example, the populations of countries range from a few thousand to hundreds of thousands, to millions, to hundreds of millions, to just over 1 billion. On an ordinary (linear) scale, it is impossible to construct a histogram that effectively locates every country on the plot; most countries end up crowded into the first column.
One way to solve this problem is to use a scale that spreads out the countries with smallpopulations and ‘pulls in’ the countries with huge populations.
A scale that will do this is called a logarithmic scale (or, more commonly, a log scale). However, before you learn to apply log scales, you will have to learn something about logarithms.
A brief introduction to logarithms to the base 10 and their interpretation
Consider the numbers:
0.01, 0.1, 1, 10, 100, 1000, 10 000, 100 000, 1 000 000
Such numbers can be written more compactly as:
$10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}, 10^{4}, 10^{5}, 10^{6}$
In fact, if we make it clear we are only talking about powers of 10, we can merely write down the powers:
−2, −1, 0, 1, 2, 3, 4, 5, 6
These powers are called the logarithms of the numbers or ‘logs’ for short.
When we use logarithms to write numbers as powers of 10, we say we are working with logarithms to the base 10. We can indicate this by writing $\log_{10}$.
Note: We could also use logarithms to write numbers as powers of two, for example, $8 = 2^{3}$, or powers of 5, for example, $625 = 5^{4}$. In these cases we would be working with logarithms to the base 2 and 5 respectively. Only base 10 logarithms are required for this course.
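The correspondence between numbers and their base 10 logs can be checked in Python, whose standard `math.log10` function returns the base 10 log; raising 10 to a power reverses the process. A minimal sketch using the numbers listed above:

```python
import math

numbers = [0.01, 0.1, 1, 10, 100, 1000, 10_000, 100_000, 1_000_000]

# The base 10 log of a number is the power of 10 that produces it.
logs = [math.log10(x) for x in numbers]
print(logs)

# Going back: the number whose log is p is 10 ** p.
print(10 ** 3, 10 ** -2)

# Logs need not be whole numbers: 500 lies between 10**2 and 10**3,
# so its log lies between 2 and 3.
print(round(math.log10(500), 2))
```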
Exercise 1D
Determining logs from numbers
1 Using a CAS calculator, find the logs of the following numbers correct to one decimal place.
a 2.5 b 25 c 250 d 2500
e 0.5 f 0.05 g 0.005 h 0.0005
Determining numbers from logs
2 Find the numbers whose logs are:
a −2.5 b −1.5 c −0.5 d 0
Write your decimal answers correct to two significant figures.
Constructing a histogram with a log scale
3 The brain weights of the same 27 animal species (in g) are recorded below.
465 423 120 115 5.50 50.0 4600 419 655
115 25.6 680 406 1320 5712 70.0 179 56.0
1.00 0.40 12.1 175 157 440 155 3.00 180
a Construct a histogram to display the distribution of brain weights and comment on its shape.
b Construct a histogram to display the log of the brain weights and note the shape of the distribution.
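Part b amounts to replacing each weight by its base 10 log before grouping. The Python sketch below does this with unit-width log intervals (a choice made here for illustration, not necessarily the columns a CAS would choose) and prints a rough text histogram in which each row of stars stands in for a column.

```python
import math
from collections import Counter

# Brain weights (g) of 27 animal species.
brain_weights_g = [465, 423, 120, 115, 5.50, 50.0, 4600, 419, 655,
                   115, 25.6, 680, 406, 1320, 5712, 70.0, 179, 56.0,
                   1.00, 0.40, 12.1, 175, 157, 440, 155, 3.00, 180]

# Replace each weight by its base 10 log.
log_weights = [math.log10(w) for w in brain_weights_g]

# Column k holds the logs from k up to (but not including) k + 1,
# i.e. weights from 10**k g up to 10**(k + 1) g.
counts = Counter(math.floor(v) for v in log_weights)
for k in range(min(counts), max(counts) + 1):
    print(f"{k:>2} to <{k + 1:<2} {'*' * counts[k]}")
```

On the original scale the 4600 g and 5712 g values sit far to the right of the rest; on the log scale they share a column with the 1320 g value.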
Interpreting a histogram with a log scale
4 The histogram opposite shows the distribution of brain weights (in g) of 27 animal species plotted on a log scale.
[Histogram: Frequency (vertical axis, 0–9) against log weight (horizontal axis, −2 to 6)]
a The brain weight of a mouse is 0.4 g. What value would be plotted on the log scale?
b The brain weight of an African elephant is 5712 g. What is the log of this brain weight (to two significant figures)?
c What brain weight (in g) is represented by the number 2 on the log scale?
d What brain weight (in g) is represented by the number −1 on the log scale?
e Use the histogram to determine the number of these animals with brain weights:
i over 1000 g ii between 1 and 100 g iii over 1 g.
Chapter 1 review
Key ideas and chapter summary
Univariate data
Univariate data are generated when each observation involves recording information about a single variable, for example, a dataset containing the heights of the children in a preschool.
Types of variables
Variables can be classified as numerical or categorical.
Categorical variables
Categorical variables are used to represent characteristics of individuals. Categorical variables come in two types: nominal and ordinal. Nominal variables generate data values that can only be used by name, e.g. eye colour. Ordinal variables generate data values that can be used to both name and order, e.g. house number.
Numerical variables
Numerical variables are used to represent quantities. Numerical variables come in two types: discrete and continuous. Discrete variables represent quantities that are counted, e.g. the number of cars in a car park. Continuous variables represent quantities that are measured rather than counted, for example, weights in kg.
Frequency table
A frequency table lists the values a variable takes, along with how often (frequently) each value occurs. Frequency can be recorded as:
• the number of times a value occurs, e.g. the number of females in the dataset is 32
• the percentage of times a value occurs, e.g. the percentage of females in the dataset is 45.5%.
Bar chart
Bar charts are used to display the frequency distribution of categorical data.
Describing distributions of categorical variables
For a small number of categories, the distribution of a categorical variable is described in terms of the dominant category (if any), the order of occurrence of each category, and its relative importance.
Mode, modal category
The mode (or modal interval) is the value of a variable (or the interval of values) that occurs most frequently.
Histogram
A histogram is used to display the frequency distribution of a numerical variable. It is suitable for medium- to large-sized datasets.
Describing the distribution of a numerical variable
The distribution of a numerical variable can be described in terms of:
• shape: symmetric or skewed (positive or negative)
• outliers: values that appear to stand out
• centre: the midpoint of the distribution (median)
• spread: one measure is the range of values covered (range = largest value − smallest value).
Log scales
Log scales can be used to transform a positively skewed histogram into one that is approximately symmetric.
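The centre and spread measures listed above can be computed directly. A minimal Python sketch, using the pulse rates of 23 students from Exercise 1C (Question 4) and the standard `statistics.median` function; the range follows the definition above.

```python
import statistics

# Pulse rates of 23 students (Exercise 1C, Question 4).
pulse_rates = [86, 82, 96, 71, 90, 78, 68, 71, 68, 88, 76, 74,
               70, 78, 69, 77, 64, 80, 83, 78, 88, 70, 86]

centre = statistics.median(pulse_rates)        # middle value of the ordered data
spread = max(pulse_rates) - min(pulse_rates)   # range = largest value - smallest value

print(f"median = {centre} beats per minute")
print(f"range = {spread} beats per minute")
```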
Skills check
Having completed this chapter, you should be able to:
• differentiate between categorical data and numerical data
• differentiate between nominal and ordinal categorical data
• differentiate between discrete and continuous numerical data
• interpret the information contained in a frequency table
• identify and interpret the mode
• construct a bar chart, segmented bar chart or histogram from a frequency table
• read and interpret a histogram with a log scale.
Multiple-choice questions
The following information relates to Questions 1 and 2.
A survey collected information about the number of cars owned by a family and the car size (small, medium, large).
1 The variables number of cars owned and car size (small, medium, large) are:
A both categorical variables
B both numerical variables
C a categorical and a numerical variable respectively
D a numerical and a categorical variable respectively
E a nominal and a discrete variable respectively
2 The variables head diameter (in cm) and sex (male, female) are:
A both categorical variables
B both numerical variables
C an ordinal and a nominal variable respectively
D a discrete and a nominal variable respectively
E a continuous and a nominal variable respectively
The following information relates to Questions 3 and 4.
The percentage segmented bar chart shows the distribution of hair colour for 200 students.
[Percentage segmented bar chart: Hair colour (segments Red, Black, Brown, Blonde, Other); vertical axis: Percentage (0–100)]
3 The number of students with brown hair is closest to:
A 4 B 34 C 57 D 72 E 114
4 The most common hair colour is:
A black B blonde C brown D red
Questions 5 to 8 relate to the two-way frequency table below.
A group of 189 healthy middle-aged adults were asked whether or not they were currently on a diet. Their responses by sex are summarised in the two-way frequency table below.
              Sex
Diet      Male    Female    Total
Yes       31      45        76
No        47      66        113
Total     78      111       189
5 The total number of females in the group is:
A 76 B 78 C 111 D 113 E 189
6 The number of males who said they were on a diet is:
A 31 B 45 C 47 D 66 E 78
7 The percentage of females not on a diet is closest to:
A 39.7% B 41.5% C 59.5% D 60.3% E 66.0%
8 The percentage of people on a diet who were male is:
A 39.7% B 40.8% C 41.5% D 58.4% E 76.0%
Questions 9 to 13 relate to the histogram shown below.
The histogram opposite displays the test scores of a class of students.
[Histogram: Frequency (vertical axis, 0–6) against Test score (horizontal axis, 6–28)]
9 The number of students is:
A 6 B 18 C 20 D 21 E 22
10 The number of students in the class who obtained a test score less than 14 is:
A 4 B 10 C 14 D 16 E 28
11 The histogram is best described as:
A negatively skewed
B negatively skewed with an outlier
C approximately symmetric
D approximately symmetric with an outlier
E positively skewed
12 The centre of the distribution lies in the interval:
A 8–10 B 10–12 C 12–14 D 14–16 E 18–20
13 The spread of the students’ marks is closest to:
A 8 B 10 C 12 D 20 E 22
14 $\log_{10} 100$ equals:
A 0 B 1 C 2 D 3 E 100
15 Find the number whose log is 2.314; give the answer to the nearest whole number.
A 2 B 21 C 206 D 231 E 20606
The following information relates to Questions 16 and 17.
The percentage histogram opposite displays the distribution of the log of the annual per capita CO2 emissions (in tonnes) for 192 countries in 2011.
[Percentage histogram: Percentage (vertical axis, 0%–32%) against log CO2 (horizontal axis, −1.0 to 2.0)]
16 Australia’s per capita CO2 emissions in 2011 were 16.8 tonnes. In which column of the histogram would Australia be located?
A −0.5 to <0 B 0 to <0.5 C 0.5 to <1 D 1 to <1.5 E 1.5 to <2
17 The percentage of countries with per capita CO2 emissions of under 10 tonnes is closest to:
A 14% B 17% C 31% D 69% E 88%
Extended-response questions
1 One hundred and twenty-one students were asked to identify their preferred leisure activity. The results of the survey are displayed in a bar chart.
[Bar chart: Percentage (vertical axis, 0–30) by Preferred leisure activity (Sport, TV, Music, Movies, Reading, Other)]
a What percentage of students nominated watching TV as their preferred leisure activity?
b What percentage of students in total nominated either going to the movies or reading as their preferred leisure activity?
c What is the most popular leisure activity for these students? How many rated this activity as their preferred activity?
2 A group of 52 teenagers were asked, ‘Do you agree that the use of marijuana should be legalised?’ Their responses are summarised in the table.
                Frequency
Legalise        Number    Percentage
Agree           18
Disagree        26
Don’t know      8
Total           52
a Construct a properly labelled and scaled frequency bar chart for the data.
b Complete the table by calculating the percentages, to one decimal place.
c Use the percentages to construct a percentage segmented bar chart for the data.
d Use the frequency table to help you complete the following report.
Report
In response to the question, ‘Do you agree that the use of marijuana should be legalised?’, 50% of the 52 students ____. Of the remaining students, ____% agreed, while ____% said that they ____.
3 Students were asked how much they spent on entertainment each month. The results are displayed in the histogram. Use the histogram to answer the following questions.
[Histogram: Frequency (vertical axis, 0–10) against Amount ($) (horizontal axis, 90–140)]
a How many students:
i were surveyed? ii spent $100–105 per month?
b What is the mode?
c How many students spent $110 or more per month?
d What percentage spent less than $100 per month?
e i Name the shape of the distribution displayed by the histogram.
ii Locate the interval containing the centre of the distribution.
iii Determine the spread of the distribution using the range.
4 The distribution of the waiting times of 34 cars stopped by a traffic light is shown in the histogram. Use the histogram to write a report on the distribution of waiting times in terms of shape, centre, spread and outliers.
[Histogram: Frequency (vertical axis, 0–10) against Waiting time (seconds) (horizontal axis, 5–55)]