Chapter 1: Displaying and Describing Data Distributions
Further Mathematics 2018
Core: Data Analysis
Chapter 1 – Displaying and Describing Data Distributions
Extract from Study Design
Key knowledge
• Types of data: categorical (nominal and ordinal) and numerical (discrete and continuous)
• Frequency tables, bar charts including segmented bar charts, histograms, stem plots, dot plots, and their application in the context of displaying and describing distributions
• Log (base 10) scales, and their purpose and application
Key skills
• Construct frequency tables and bar charts and use them to describe and interpret the distributions of categorical variables
• Answer statistical questions that require a knowledge of the distribution/s of one or more categorical variables
• Construct stem and dot plots, boxplots, histograms and appropriate summary statistics and use them to describe and interpret the distributions of numerical variables
• Answer statistical questions that require a knowledge of the distribution/s of one or more numerical variables
Chapter Sections: Questions to be completed (all parts unless specified)
1A Classifying data: 1, 2, 3, 4, 5, 6
1B Displaying & describing the distributions of categorical variables: 1, 2, 3, 4, 5, 6, 7, 8
1C Displaying & describing the distributions of numerical variables: 1, 2, 3, 4, 5, 6, 7, 8, 9
1D Using a log scale to display data: 1, 2, 3, 4
Chapter 1 Review: Multiple Choice 1-17, Extended Response 1-4
MORE RESOURCES
http://drweiser.weebly.com
Core: Data Analysis
Table of Contents
1A Classifying data
  Variables
  Types of variables
  Example
  Example
1B Displaying and describing the distributions of categorical variables
  Frequency table
  Example 1
  The bar chart
  Key characteristics of a bar chart
  Example 2
  Stacked or segmented bar chart
  Example 3
  Example 3 using CAS calculator - Segmented percentage bar chart
  The mode (or the modal category)
  Answering statistical questions involving categorical variables
  Example 4
1C Displaying and describing the distributions of numerical variables
  The grouped frequency distribution
  Example 5
  The histogram and its construction
  Constructing a histogram
  Example 6
  CAS calculator example 2 - Constructing a histogram
  What to look for in a histogram
  Example 7 Describing a histogram in terms of shape, centre and spread
1D Using a log scale to display data
  Properties of logs to the base 10
  Why use logs
  Working with logs
  Example 8
  Analysing data displays with a log scale
  Example 9
  CAS calculator example 3 - Constructing a histogram with a log scale
Rounding to a given number of significant figures
  Example
  Significant figures and zeros
1A Classifying data
Variables
In a dataset, we call the qualities or quantities about which we record information variables.
Types of variables
Categorical – represents characteristics or qualities of people or things.
• Nominal – groups individuals according to a particular characteristic, e.g. male, female.
• Ordinal – groups and orders individuals according to a particular characteristic, e.g. fitness level – low, medium or high.
Numerical – represents quantities that can be counted or measured.
• Discrete – represents quantities that are counted, e.g. number of pets in a house. ‘How many?’
• Continuous – represents quantities that can be measured. ‘How much?’
Example
Which of the following is not numerical data?
A. Maths test results
B. Ages
C. AFL football teams
D. Heights of students in a class
E. Lengths of bacteria
Example
Which of the following is not discrete data?
A. Number of students older than 17.5 years old
B. Number of girls in a class
C. Number of questions correct in a multiple-choice test
D. Number of students above 180 cm in a class
E. Height of the tallest student in a class
Data
• Categorical – non-numerical data
o Nominal, e.g. favourite fruit: mangoes, apples, bananas
o Ordinal, e.g. opinion of the death sentence: strongly agree, agree, not sure, disagree, strongly disagree
• Numerical – numerical data
o Discrete – whole-number responses, e.g. number of children in a school: 382
o Continuous – can have decimals or fractions within the answer, e.g. heights of class members: 175.5 cm, 165.0 cm, 180.5 cm
1B Displaying and describing the distributions of categorical variables
Frequency table
• Large amounts of data can be organised.
• Patterns (distribution) or trends are visible.
• The frequency of categorical data can be seen.
Example 1
The sex of 11 preschool children is as shown (F = female, M = male):
F M M F F M F F F M M
Construct a frequency table (including percentage frequencies) to display the data.

Sex       Frequency
          Number    Percentage
Female
Male
Total
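
The table can be checked with a short Python sketch. Python is not part of this handout; the variable names (`sexes`, `counts`, `table`) are chosen here for illustration only, and percentages are rounded to one decimal place as an assumption.

```python
from collections import Counter

# Data from Example 1: sex of 11 preschool children (F = female, M = male).
sexes = "F M M F F M F F F M M".split()

counts = Counter(sexes)        # number of children in each category
total = sum(counts.values())   # total number of children

# Percentage frequency = count / total * 100, rounded to one decimal place.
table = {sex: (n, round(100 * n / total, 1)) for sex, n in counts.items()}

for sex, (n, pct) in table.items():
    print(f"{sex}: number = {n}, percentage = {pct}%")
print(f"Total: number = {total}, percentage = 100.0%")
```

The same count-then-divide pattern works for any categorical variable.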
The bar chart
• Represents the key information in a frequency table graphically.
Key characteristics of a bar chart
• Frequency is on the vertical axis.
• The variable displayed is on the horizontal axis.
• The height of the bar shows the frequency (count or percentage).
• The bars have gaps that separate the categories.
• There is one bar for each category.
Example 2
The climate type of 23 countries is classified as cold, mild or hot. The results are summarised in the table. Construct a frequency bar chart to display this information.
Stacked or segmented bar chart
• Compares two or more categorical variables
• The lengths of the segments are determined by the frequencies
• The height of the bar gives the total frequency
• In percentage segmented bar charts, the lengths are determined by the percentages. The height of the bar is 100%.
The segmented bar chart opposite was formed from the climate data used in Example 2.
Example 3
The climate type of 23 countries is classified as cold, mild or hot. Construct a percentage frequency segmented bar chart to display this information.
Climate type    Frequency
                Number    Percentage
Cold            3
Mild            14
Hot             6
Total           23
Example 3 using CAS calculator - Segmented percentage bar chart
1. In a new document, add a Lists & Spreadsheet page. Press [home], then 1: New Document, then 4: Add Lists & Spreadsheet.
2. Enter the data: Climate in column “a” and Number in column “b”.
3. Press [menu], then 3: Data, then 8: Summary Plot. Choose:
X List – climate
Summary List – Number
Display on – New Page
4. The resulting plot is a bar chart. Change it to a pie chart: press [menu], then 1: Plot Type, then 9: Pie Chart.
5. Show all percentages: press [menu], then 2: Plot Properties, then 4: Show all labels.
6. Construct the segmented bar chart as above by using this information.
NOTE: We use a pie chart ONLY to get the percentages because the CAS can’t do a segmented percentage bar chart.
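
The percentages read from the pie chart can also be worked out directly from the frequencies. The sketch below is not part of the CAS procedure; it assumes the climate frequencies from the table (3, 14 and 6) and uses illustrative names (`climate`, `segments`). Each segment is listed with its percentage and where it starts and ends on the 0 to 100% bar.

```python
# Frequencies from the climate table used in Examples 2 and 3.
climate = {"Cold": 3, "Mild": 14, "Hot": 6}
total = sum(climate.values())  # 23 countries

# Each segment's length is its percentage; the segments stack up to 100%.
segments = []
lower = 0.0
for category, count in climate.items():
    pct = 100 * count / total
    segments.append((category, round(pct, 1), round(lower, 1), round(lower + pct, 1)))
    lower += pct

for category, pct, start, end in segments:
    print(f"{category}: {pct}% (from {start}% to {end}%)")
```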
The mode (or the modal category)
• The most frequently occurring value or category.
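
As a quick illustration (a sketch only, using the climate frequencies from Example 2 and names chosen here), the modal category is the one with the largest frequency:

```python
# Frequency table for the climate data (Example 2).
climate = {"Cold": 3, "Mild": 14, "Hot": 6}

# The mode is the category whose frequency is largest.
modal_category = max(climate, key=climate.get)
print(modal_category)
```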
Answering statistical questions involving categorical variables
• A statistical question is one that depends on data for its answer.
Statistical questions that are of most interest when working with a single categorical variable are of these forms:
• Is there a dominant category into which a significant percentage of individuals fall, or are the individuals relatively evenly spread across all of the categories? For example, are the shoppers in a department store predominantly male or female, or are there roughly equal numbers of males and females?
• How many and/or what percentage of individuals fall into each category? For example, what percentage of visitors to a national park are ‘day-trippers’ and what percentage of visitors are staying overnight?
A short written report is the standard way to answer these questions. The following guidelines are designed to help you to produce such a report.
Example 4
In an investigation of the variation of climate type across countries, the climate types of 23 countries were classified as cold, mild or hot. The data are displayed in a frequency table to show the percentages.
Use the information in the frequency table to write a concise report on the distribution of climate types across these 23 countries.
The climate types of _____ countries were classified as being ‘_________’, ‘_________’ or ‘_________’. The majority of the countries, _____%, were found to have a _________ climate. Of the remaining countries, _____% were found to have a _________ climate, while ____% were found to have a _________ climate.
1C Displaying and describing the distributions of numerical variables
The grouped frequency distribution
• Used when the data have a large range of values.
• Group data into a small number of convenient intervals.
• Organise data in the frequency table with data intervals.
Example 5
The data below give the average hours worked per week in 23 countries.
35.0 48.0 45.0 43.0 38.2 50.0 39.8 40.7 40.0 50.0 35.4 38.8 40.2 45.0 45.0 40.0 43.0 48.8 43.3 53.1 35.6 44.1 34.8
Form a grouped frequency table with five intervals
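
One possible grouping can be sketched in Python. The choice of intervals is an assumption made here, not given in the example: five intervals of width 5 starting at 30 (30 to <35, 35 to <40, and so on), which cover the data from 34.8 to 53.1. Other choices of five intervals are possible.

```python
# Average hours worked per week in 23 countries (Example 5).
hours = [35.0, 48.0, 45.0, 43.0, 38.2, 50.0, 39.8, 40.7, 40.0, 50.0, 35.4, 38.8,
         40.2, 45.0, 45.0, 40.0, 43.0, 48.8, 43.3, 53.1, 35.6, 44.1, 34.8]

# Assumed intervals: 30-<35, 35-<40, 40-<45, 45-<50, 50-<55.
START, WIDTH, N_INTERVALS = 30, 5, 5

counts = [0] * N_INTERVALS
for h in hours:
    index = int((h - START) // WIDTH)  # which interval the value falls in
    counts[index] += 1

total = len(hours)
for i, n in enumerate(counts):
    low = START + i * WIDTH
    print(f"{low}-<{low + WIDTH}: number = {n}, percentage = {100 * n / total:.1f}%")
print(f"Total: number = {total}")
```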
The histogram and its construction
• The histogram is a graphical display of the information in the grouped frequency table.
Constructing a histogram
• Frequency is on the vertical axis.
• The values of the variable being displayed are plotted on the horizontal axis.
• Each bar in a histogram corresponds to a data interval.
• The height of the bar gives the frequency.
Example 6
Construct a histogram for the frequency table.

Average hours worked    Frequency
                        Number    Percentage

Total
CAS calculator example 2 - Constructing a histogram
Display the following set of 27 marks in the form of a histogram.
16 11 4 25 15 7 14 13 14 12 15 13 16 14 15 12 18 22 17 18 23 15 13 17 18 22 23
1. Start a new document by pressing [ctrl]+N (or [home] > New Document). If prompted to save an existing document, move the cursor to No and press [enter].
2. Select 4: Add Lists & Spreadsheet. Enter the data into a list named marks.
• Move the cursor to the name space of column A and type in marks as the list name. Press [enter].
• Move the cursor down to row 1, type in the first data value and press [enter]. Continue until all the data have been entered, pressing [enter] after each entry.
3. Statistical graphing is done through the Data & Statistics application. Press [ctrl]+I (or [ctrl]+[doc]) and select 5: Add Data & Statistics.
• Press [tab] then [enter] (or click on the Click to add variable box on the x-axis) to show the list of variables. Select marks. Press [enter] to paste marks to that axis.
A dot plot is displayed as the default. To change the plot to a histogram, press [menu] > 1: Plot Type > 3: Histogram. Your screen should now look like that shown opposite. This histogram has a column (or bin) width of 2 and a starting point of 2.
Note: If you click on a column, it will be selected.
Hint: If you accidentally move a column or data point, [ctrl]+Z will undo the move.
4. Change the histogram column (bin) width to 4 and the starting point to 2.
• Press [ctrl]+[menu], then 2: Plot Properties, then 2: Histogram Properties.
• Select 2: Bin Settings > 1: Equal Bin Width.
• In the settings menu, change the Width to 4 and the Starting Point to 2.
Hint: Alternatively, pressing [ctrl]+[menu] then [enter] with the cursor on the histogram gives you a contextual menu that relates only to histograms.
A new histogram is displayed with a column width of 4 and a starting point of 2, but it no longer fits the window.
To solve this problem, press [menu] > 5: Window/Zoom > 2: Zoom-Data and [enter] to obtain the histogram as shown below right.
To change the frequency axis to a percentage axis, press [ctrl]+[menu], then 2 (Plot Properties), 2 (Histogram Properties), 1 (Histogram Scale), 2 (Percent).
• In short: [ctrl]+[menu] 2 2 1 2
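
To see what the bin settings do to the column heights, the counts can be reproduced away from the calculator. This is a sketch, not the CAS's own method: the helper `histogram_counts` is a hypothetical name introduced here, and it assumes each bin includes its left edge and excludes its right edge.

```python
# The 27 marks from CAS calculator example 2.
marks = [16, 11, 4, 25, 15, 7, 14, 13, 14, 12, 15, 13, 16, 14,
         15, 12, 18, 22, 17, 18, 23, 15, 13, 17, 18, 22, 23]

def histogram_counts(data, start, width):
    """Count the values in bins [start, start + width), [start + width, ...), ..."""
    n_bins = int((max(data) - start) // width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[int((x - start) // width)] += 1
    return counts

narrow = histogram_counts(marks, start=2, width=2)  # width 2, starting point 2
wide = histogram_counts(marks, start=2, width=4)    # width 4, starting point 2

# Percentage axis: each count as a percentage of the 27 marks.
percent = [round(100 * n / len(marks), 1) for n in wide]

print(narrow)
print(wide)
print(percent)
```

Widening the bins merges neighbouring columns, so the overall shape stays the same but fine detail (such as the empty 20 to <22 bin) disappears.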
What to look for in a histogram
• Shape and outliers – do some data values occur more frequently than others, or is the histogram relatively flat, showing that all values occur with approximately the same frequency?
o Symmetric distribution – the histogram has a single peak and tails off evenly on both sides, e.g. intelligence scores.
o Bimodal – there are two peaks with a dip between them, e.g. distances thrown by Olympic discus throwers (both male and female).
o Positively skewed – tails off to the right, e.g. the wages in a large organisation.
o Negatively skewed – tails off to the left, e.g. the distribution of ages at death.
o Outliers – data values that lie well away from the main body of the data, e.g. the heights of football players including the ruckman.
• Centre
o The median – the middle of the distribution.
• Spread
o Is there a narrow single peak? If so, the data values are tightly clustered around the centre of the distribution.
o Is the peak broad? If so, the data values are more widely spread out and are not tightly clustered around the centre.
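
When the raw data are available (rather than only the histogram), the centre and spread can be computed directly. The sketch below assumes the hours-worked data from Example 5 and takes the range (largest value minus smallest value) as the measure of spread; the names `centre` and `spread` are illustrative.

```python
import statistics

# Average hours worked per week in 23 countries (Example 5).
hours = [35.0, 48.0, 45.0, 43.0, 38.2, 50.0, 39.8, 40.7, 40.0, 50.0, 35.4, 38.8,
         40.2, 45.0, 45.0, 40.0, 43.0, 48.8, 43.3, 53.1, 35.6, 44.1, 34.8]

centre = statistics.median(hours)            # middle value of the sorted data
spread = round(max(hours) - min(hours), 1)   # range = largest - smallest

print(f"median = {centre} hours, range = {spread} hours")
```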
Example 7 Describing a histogram in terms of shape, centre and spread
The histogram opposite shows the distribution of the number of phones per 1000 people in 85 countries.
a) Describe its shape and note outliers (if any).
_______________________________________________________
_______________________________________________________
b) Locate the centre of the distribution.
_______________________________________________________
___________________________________________________________________________________________
c) Estimate the spread of the distribution.
___________________________________________________________________________________________
___________________________________________________________________________________________
d) Comment on the distribution of the histogram
For these ____ countries, the distribution of the number of phones per ______ people is ____________ skewed. The ___________ of the distribution lies between _____ and _____ phones/1000 __________. The __________ of the distribution is _______ phones/1000 people. There are no _________________.
1D Using a log scale to display data
Properties of logs to the base 10
1. If a number is greater than one, its log to the base 10 is greater than zero.
2. If a number is greater than zero but less than one, its log to the base 10 is negative.
3. If the number is zero, then its log is undefined.
$\log_{10}(10) = \log(10^1) = 1$
$\log_{10}(100) = \log(10^2) = 2$
$\log_{10}(1000) = \log(10^3) = 3$
$\log(10^n) = n$
Note: when log is written without the subscript 10, it always refers to $\log_{10}$.
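
These properties can be checked numerically. The sketch below uses Python's standard `math.log10`; the variable names are illustrative, and 0.5 is just an arbitrary example of a number between zero and one.

```python
import math

# log10(10^n) = n for n = 1, 2, 3
powers = [math.log10(10 ** n) for n in range(1, 4)]
print(powers)

# A number between zero and one has a negative log.
fraction_log = math.log10(0.5)
print(fraction_log)

# The log of zero is undefined: Python raises ValueError.
try:
    math.log10(0)
    zero_defined = True
except ValueError:
    zero_defined = False
print(zero_defined)
```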
Why use logs
Sometimes a data set will contain data points that vary so much in size that plotting them using a traditional scale becomes very difficult.
For example: If we are studying the population of different cities in Australia we might end up with the following data points:
A histogram splitting the data into class intervals of 100 000 would then appear as follows:
Applying a logarithm (base 10) to all the population values in the table above transforms the frequency table to:

Log10(population)    Frequency
4 - 5                4
5 - 6                4
6 - 7                3

Examples of where the logarithmic scale is used in real life: the Richter scale for measuring the strength of an earthquake, and decibels for measuring sound or noise.
Working with logs
To construct and interpret a log data plot, you need to:
1. Work out the log for any number.
2. Work backwards from a log to the number it represents.
Example 8
a) Find the log of 45, correct to two significant figures.
b) Find the number whose log is 2.7125, correct to the nearest whole number.
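
The two skills in Example 8 can be sketched as follows. This is a check on calculator work, not a replacement for it; the `.2g` format is used here as one way to round to two significant figures, and the names `log_45` and `number` are illustrative.

```python
import math

# a) The log of 45, correct to two significant figures.
log_45 = float(f"{math.log10(45):.2g}")

# b) Working backwards: the number whose log is 2.7125 is 10^2.7125,
#    rounded to the nearest whole number.
number = round(10 ** 2.7125)

print(log_45, number)
```

Going from a log back to the number it represents always means raising 10 to that power.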
Analysing data displays with a log scale
Example 9
The histogram shows the distribution of the weights of 27 animal species plotted on a log scale.
a) What body weight (in kg) is represented by the number 4 on the log scale?
____________________________________________
____________________________________________
____________________________________________
b) How many of these animals have body weights of more than 10 000 kg?
___________________________________________________________________________________________
___________________________________________________________________________________________
c) The weight of a cat is 3.3 kg. Use your calculator to determine the log of its weight correct to two significant figures.
___________________________________________________________________________________________
___________________________________________________________________________________________
d) Determine the weight (in kg) whose log weight is 3.4 (the elephant). Write your answer correct to the nearest whole number.
___________________________________________________________________________________________
___________________________________________________________________________________________
CAS calculator example 3 - Constructing a histogram with a log scale
The weights of 27 animal species (in kg) are recorded below.
1.4 470 36 28 1.0 12,000 2600 190 520 10
3.3 530 210 62 6700 9400 6.8 35 0.12
0.023 2.5 56 100 52 87,000 0.12 190
Construct a histogram to display the distribution:
a) Of the body weights of these 27 animals and describe its shape
1. Start a new document by pressing [ctrl]+N (or [home] > New Document). If prompted to save an existing document, move the cursor to No and press [enter].
2. Select 4: Add Lists & Spreadsheet. Enter the data into a list named weight.
• Move the cursor to the name space of column A and type in weight as the list name. Press [enter].
• Move the cursor down to row 1, type in the first data value and press [enter]. Continue until all the data have been entered, pressing [enter] after each entry.
3. Statistical graphing is done through the Data & Statistics application. Press [ctrl]+I (or [ctrl]+[doc]) and select 5: Add Data & Statistics.
• Press [tab] then [enter] (or click on the Click to add variable box on the x-axis) to show the list of variables. Select weight and press [enter] to paste weight to that axis.
A dot plot is displayed as the default. To change the plot to a histogram, press [menu] > 1: Plot Type > 3: Histogram. Your screen should now look like that shown opposite. This histogram has a column (or bin) width of 2 and a starting point of 2.
Note: If you click on a column, it will be selected.
Hint: If you accidentally move a column or data point, [ctrl]+Z will undo the move.
___________________________________________________________________________________________
___________________________________________________________________________________________
b) Of the log body weights of these animals and describe its shape.
Return to the Lists & Spreadsheet screen.
Name another column ‘logweight’.
Move the cursor to the grey cell below the ‘logweight’ heading. Type in =log(weight) and press [enter] to calculate the values of logweight.
Plot a histogram using a log scale. That is, plot the variable ‘logweight’.
Note: Use [menu] > Plot Properties > Histogram Properties > Bin Settings > Equal Bin Width and set the column (bin) width to 1 and the alignment (start point) to −2. Then use [menu] > 5: Window/Zoom > 2: Zoom-Data and [enter] to obtain the histogram as shown below right.
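
The column heights of the log-scale histogram can be checked with a short sketch. It is not the calculator's method: it assumes the same bin settings as above (width 1, start point −2), with each bin including its left edge, and uses illustrative names (`weights`, `log_weights`, `counts`).

```python
import math

# Weights (in kg) of the 27 animal species.
weights = [1.4, 470, 36, 28, 1.0, 12000, 2600, 190, 520, 10,
           3.3, 530, 210, 62, 6700, 9400, 6.8, 35, 0.12,
           0.023, 2.5, 56, 100, 52, 87000, 0.12, 190]

# Same transformation as the logweight column: log to the base 10.
log_weights = [math.log10(w) for w in weights]

START, WIDTH = -2, 1
N_BINS = 7  # bins -2 to <-1, -1 to <0, ..., 4 to <5

counts = [0] * N_BINS
for value in log_weights:
    # round() guards against tiny floating-point error at bin edges.
    counts[math.floor((round(value, 9) - START) / WIDTH)] += 1

for i, n in enumerate(counts):
    low = START + i * WIDTH
    print(f"{low} to <{low + WIDTH}: {n}")
```

On the log scale the very large weights no longer squash the rest of the data into a single column, which is the purpose of the transformation.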
Rounding to a given number of significant figures
Significant figures are a method of simplifying a number by rounding it to a base 10 value. Questions relating to significant figures will require a number to be written correct to a given number of significant figures. In order to complete this rounding, the relevant significant figure(s) need to be identified.
Example. Consider the number 123.456 789.
• This value has 9 significant figures, as there are nine digits that tell us something about the particular place value in which they are located.
• The most significant of these digits is the 1, as it indicates the overall value of this number is in the hundreds.
• If asked to round this value to 1 significant figure, the number would be rounded to the nearest hundred, which in this case would be 100.
• If rounding to 2 significant figures, the answer would be rounded to the nearest 10, which is 120.
• Rounding this value to 6 significant figures means the first 6 significant figures need to be kept, giving 123.456. However, as the digit following the 6th significant figure is 7, which is 5 or more, the final digit needs to round up, making the final answer 123.457.
Rounding hint: If the digit after the required number of significant figures is 5 or more, round up. If this digit is 4 or below, leave it as is.
Significant figures and zeros
Zeros present an interesting challenge when evaluating significant figures and are best explained using examples.
• 4056 contains 4 significant figures. The zero is considered a significant figure as there are non-zero digits on either side of it.
• 4000 contains 1 significant figure. The zeros are ignored as they are place holders and may have been rounded.
• 4000.0 contains 5 significant figures. In this situation the zeros are considered important due to the zero after the decimal point. A zero after the decimal point indicates the digits before it are precise.
• 0.004 contains 1 significant figure. As with 4000, the zeros are place holders.
• 0.0040 contains 2 significant figures. The zero following the 4 implies the value is accurate to this degree.
The following examples show how these rules work:
0.003561 – leading zeros are ignored – 4 significant figures
70.036 – zeros between other digits are significant – 5 significant figures
5.320 – zeros included after decimal digits are significant – 4 significant figures
450000 – trailing zeros are not significant – 2 significant figures
78000.0 – the zero after the decimal point is significant, so the zeros between it and the other digits are also significant – 6 significant figures
As when rounding to a given number of decimal places, when rounding to a given number of significant figures consider the digit after the specified number of figures. If it is five or more, round the final digit up; if it is four or below, keep the final digit as it is.
5067.37 – rounded to 2 significant figures is 5100
3199.01 – rounded to 4 significant figures is 3199
0.004931 – rounded to 3 significant figures is 0.00493
1020004 – rounded to 2 significant figures is 1000000
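
The rounding rule can be sketched in Python as a check on the examples above. The function `round_sig` is a hypothetical helper written for this illustration, not a built-in. It assumes the number is passed in as text so that Python's `decimal` module can apply the "5 or more rounds up" rule exactly.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_sig(value: str, figures: int) -> str:
    """Round a number (given as text) to the stated number of significant figures."""
    d = Decimal(value)
    if d == 0:
        return "0"
    # adjusted() gives the place value (power of 10) of the first significant digit.
    last_place = d.adjusted() - figures + 1
    quantum = Decimal(1).scaleb(last_place)   # e.g. 1E+2 means round to the nearest hundred
    rounded = d.quantize(quantum, rounding=ROUND_HALF_UP)
    return format(rounded, "f")               # plain decimal notation, no exponent

for value, figures in [("5067.37", 2), ("3199.01", 4), ("0.004931", 3),
                       ("1020004", 2), ("123.456789", 6)]:
    print(f"{value} rounded to {figures} significant figures is {round_sig(value, figures)}")
```

The first significant digit fixes the place value to round to, and everything after that is ordinary rounding.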