Common Core Math 1 Unit 2 One Variable Statistics
Name:____________________________
Period: _____
One Variable Statistics
Main Concepts Page #
Study Guide 2
Vocabulary 3 – 6
Identify and Describe Data 6 – 7
Frequency Tables, Dot Plots, and Histograms 8 – 11
Measures of Center and Spread 11 – 15
Box Plots & Outliers 16 – 18
Comparing Data Sets 19 – 20
Unit 2 Review 21 – 23
Answer Key 24 – 31
Common Core Standards
N-Q.1 Use units as a way to understand problems and to guide the solution of multi-step problems; choose and interpret units consistently in formulas; choose and interpret the scale and the origin in graphs and data displays.
N-Q.2 Define appropriate quantities for the purpose of descriptive modeling.
N-Q.3 Choose a level of accuracy appropriate to limitations on measurement when reporting quantities.
S-ID.1 Represent data with plots on the real number line (dot plots, histograms, and box plots).
S-ID.2 Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (interquartile range, standard deviation) of two or more different data sets.
S-ID.3 Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).
As a result of learning, students should be able to…
represent data with plots on the real number line (dotplots, histograms, and boxplots).
choose and interpret the scale and the origin in data displays.
choose an appropriate level of accuracy when reporting statistical quantities.
use technology to calculate summary statistics and visually represent data.
based on the shape of a data distribution, choose the appropriate measures of center (mean or median) and spread (standard deviation or interquartile range) to describe the distribution.
interpret summary statistics for center and spread in the context of the data.
compare the center and spread of two or more different data sets in context.
interpret differences in shape, center, and spread in context.
use the context of the data to explain why its distribution takes on a particular shape.
explain the effect of outliers on the shape, center, and spread of data distributions.
use the 1.5(IQR) rule to determine if there are outliers in a data set.
define appropriate quantities to measure when collecting quantitative data to describe a population.
Essential Understandings:
There are two types of data: quantitative and qualitative. Data can be described graphically and numerically.
Frequency tables, dot plots, box plots, and histograms are visual representations of data distributions that can
be created and described.
The calculator can be used to create dot plots, box plots, and histograms to represent a data distribution.
There are situations when the median should be used to describe the center of a distribution instead of mean.
The standard deviation can be calculated by hand or on the calculator to describe the spread of a data
distribution when the mean is used to describe the data’s center.
Box plots can be used to visually represent a data distribution, to describe its spread with the interquartile
range and its center with the median, and to recognize the influence of outliers on
visual and numerical representations of data.
Data sets can be compared using visual and numerical representations.
An outlier can affect the shape, center, and spread of data distributions.
The 1.5 IQR rule is used to determine if there are outliers in a data set.
Essential Questions
How can the representation and analysis of data inform and influence decisions?
How can you collect, organize, and display data?
What is the most appropriate way to display a given set of data?
How can we use statistics to manipulate data?
When informed of a statistic, how can you determine if the information is misleading?
box plot
Example and Notes to help YOU remember:
categorical data
Example and Notes to help YOU remember:
clustering
Example and Notes to help YOU remember:
data
Example and Notes to help YOU remember:
dot plot
Example and Notes to help YOU remember:
frequency
Example and Notes to help YOU remember:
frequency distribution
Example and Notes to help YOU remember:
interquartile range
Example and Notes to help YOU remember:
mean
Example and Notes to help YOU remember:
mean absolute deviation
Example and Notes to help YOU remember:
measures of center
Example and Notes to help YOU remember:
measures of spread
Example and Notes to help YOU remember:
median
Example and Notes to help YOU remember:
modified box plot
Example and Notes to help YOU remember:
outlier
Example and Notes to help YOU remember:
population
Example and Notes to help YOU remember:
quantitative data
Example and Notes to help YOU remember:
skewed data
Example and Notes to help YOU remember:
standard deviation
Example and Notes to help YOU remember:
symmetrical
Example and Notes to help YOU remember:
variable (statistics)
Example and Notes to help YOU remember:
Read the questions and answer them based on what you’ve learned about statistics in previous years. After
you have answered each one to the best of your ability, underline your original answers and then read the
answers in the answer key. Edit your answers. Make note of information that you underlined that was
incorrect. Make note of new information that you added in order to make the answer more precise.
1) What does univariate mean?
2) Data collection has two key purposes. What are they?
3) What are the two types of data? Which type of data is each of the following: Social Security
numbers, best-selling dish, profit in $, expenses in $, best employee
4) What are two ways in which data can be displayed? What are examples of each?
5) What are the four characteristics that should be addressed when dealing with data? What are
examples of each?
1. Determine whether the following data is categorical (C) or quantitative (Q).
a. The candidate a survey respondent will support in an upcoming election.
b. The length of time of people’s drive to work.
c. The number of televisions in a household.
d. The distance kickers for a football team can kick a football.
e. The number of pages copied in the copy room each day.
f. The kind of tree in each person’s front yard in a neighborhood.
g. The type of blood a person has.
h. The jersey numbers of the football team.
i. The heights of the tallest buildings in the world.
j. The language spoken by 2000 people randomly surveyed at JFK Airport.
Frequency tables, dot plots, and histograms.
Data can be described graphically or numerically. Three ways to describe quantitative data graphically are
with frequency tables, dot plots, and histograms.
In this unit, we are focusing on quantitative data. Dot plots allow us to see the individual data. With
histograms and box plots the individual data is lost. Frequency tables can display individual data depending
upon how they are constructed. Frequency tables can also be used for categorical data.
Carefully check each graph’s title and the labels of its axes.
There are 4 things to address when describing the distribution of data: shape, center, spread, and outliers.
There are 4 basic shapes that you should be able to identify. Not all graphs will be 1 of these 4 shapes.
*Based on the names, sketch a histogram for each of the 4 basic shapes.
skewed left (extreme low data) skewed right (extreme high data)
mound (or symmetrical) uniform
The center and any outliers can be estimated by carefully looking at the dot plot, histogram, or frequency table.
The spread for dot plots and histograms can be described by the data’s range.
Frequency tables can be used to quickly display or organize data.
A dot plot is a single axis graph and it can be quickly created from a frequency table. Be precise when creating your
columns of dots.
A histogram is a type of bar graph. The bars touch to show continuity. Bars normally represent equal (or close to
equal) intervals of data. They are often referred to as bins, classes or groupings. There should be 5-8 bins
unless otherwise stated, and they should be the same width. Sometimes just the lower value of the bar is
given to the left. Sometimes the bar’s range is given below the bar. Be careful about overlapping data.
Scores on the last quiz: 3.5, 6, 6.5, 7, 7, 8, 8.5, 9, 9, 9.5, 10
* Use the scores on the last quiz to create a dot plot and a histogram.
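One way to check a hand-drawn dot plot and histogram is to count the data with a short program. The sketch below is only an illustration in Python: the function names and the choice of 8 bins of width 1 starting at 3 are assumptions, not part of the worksheet.

```python
from collections import Counter

# Quiz scores listed in the worksheet.
scores = [3.5, 6, 6.5, 7, 7, 8, 8.5, 9, 9, 9.5, 10]

def dot_plot_rows(data):
    """Return one text row per distinct value, with one '*' per occurrence."""
    counts = Counter(data)
    return [f"{value:>4}: " + "*" * counts[value] for value in sorted(counts)]

def histogram_counts(data, start, width, n_bins):
    """Count the values that fall in each interval [low, low + width)."""
    counts = [0] * n_bins
    for x in data:
        index = int((x - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

# Text version of the dot plot (printed sideways, one value per row).
for row in dot_plot_rows(scores):
    print(row)

# Assumed bins: 3 to <4, 4 to <5, ..., 10 to <11 (8 equal-width bins).
bin_counts = histogram_counts(scores, start=3, width=1, n_bins=8)
print(bin_counts)
```

Because each bin includes its lower value but not its upper value, no score can be counted in two bars. This is one way to handle the overlapping data mentioned above.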
1. The following frequency table lists the ages of patrons dining at a local restaurant at 7:00 PM.
Age Frequency
1-10 1
11-20 4
21-30 5
31-40 7
41-50 6
51-60 5
61-70 1
a) Construct a histogram for the data. Use the graph paper.
b) Describe the data distribution in context (shape, center, spread, and any outliers).
c) Create two sets of data that could be represented by the histogram.
2) A bank wants to improve its customer service. Before deciding to hire more workers, the manager decides to get some information on the waiting times customers currently experience. During a week, 50 customers were randomly selected, and their waiting times, in minutes, were recorded.
The data are as follows: 18.5, 9.1, 3.1, 6.2, 1.3, 0.5, 4.2, 5.2, 0.0, 10.8, 5.8, 1.8, 1.5, 1.9, 0.4, 3.5, 8.5, 11.1, 0.3, 1.2, 4.4, 3.8, 5.8, 1.9, 3.6, 2.5, 4.5, 5.8, 1.5, 0.7, 0.8, 0.1, 9.7, 2.6, 0.8, 1.2, 2.9, 3.0, 3.2, 2.8, 10.9, 0.1, 5.9, 1.4, 0.3, 5.5, 4.8, 0.9, 1.6, and 2.2.
a) Construct a frequency table for the data which will be used to make a histogram. (Remember to define the classes so that there are approximately 5-8 groupings.)
b) Construct a histogram. Use the graph paper.
c) Describe the data distribution in context.
d) Would it make more sense to display the data in a dot (line) plot? Explain.
Creating a histogram on the calculator.
Use the following heights of a group of middle school students as an example.
Step 1: Enter the height data in L1. Carefully check your data.
Commands: LIST
Step 2: Set up the graph. The screens may look a little different from the
pictures below depending on which calculator you use.
Commands: 2nd ; Y= ; 1
Enter ; Use the arrow key to select “On” and press enter. ; Use the arrow key to select the histogram and
press enter
Step 3: Adjust the window (press the window button) to fit the data. Use Xscl to adjust the width of the
bars. I used 10 here because I want to see how many people are in each category of 10 inches. Ymax
controls the height of the bar on the screen. If you skip this step, the calculator will choose numbers for you
and you might not see your graph when you do step 4. If this occurs, after doing step 4, press “zoom” and
choose “zoomstat”. The calculator will choose values for “window” which will make the histogram fit on
your screen.
Commands: Window
Student Height (in.)
Megan 61
Morgan 65
Kyle 70
Darren 71
Angie 76
Cassie 65
Brady 59
Chris 78
Cody 74
Maria 81
Using this window setting, we will have sorted the data into four categories: Heights between 50 and 59 inches, 60 to 69, 70 to 79 and 80 to 89.
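The sorting that the calculator does with this window can be sketched in Python. This is only an illustration: the names XMIN and XSCL mirror the calculator's window settings, and the starting value of 50 is assumed from the categories listed above.

```python
# Heights (in inches) from the table of students.
heights = {
    "Megan": 61, "Morgan": 65, "Kyle": 70, "Darren": 71, "Angie": 76,
    "Cassie": 65, "Brady": 59, "Chris": 78, "Cody": 74, "Maria": 81,
}

XMIN = 50   # assumed left edge of the first bar
XSCL = 10   # width of each bar, as chosen in Step 3

# Map each height to the lower edge of its category and count it.
bins = {}
for height in heights.values():
    low = XMIN + ((height - XMIN) // XSCL) * XSCL
    bins[low] = bins.get(low, 0) + 1

for low in sorted(bins):
    print(f"{low} to {low + XSCL - 1} inches: {bins[low]}")
```

Changing XSCL to 5 would split each category in half, just as changing Xscl on the calculator makes the bars narrower.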
Step 4: Look at your graph.
Commands: Graph
Measures of Center and Spread
In previous courses, you have calculated the center of a set of data. The two measures of center that we are going to
use with quantitative data are mean (arithmetic mean) and median.
Review:
*How do you calculate the mean of a set of data?
*How do you calculate the median of a set of data?
*How do you determine whether the mean or median is the most appropriate measure of center for a set of data?
*Calculate the mean and median for the following set of data. {8, 9, 11, 10, 97}
*Calculate the mean and median for the following set of data. {18, 19, 37, 22}
*Which measure of center is the most appropriate for each set?
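After working the review questions by hand, you can check your arithmetic with a short Python sketch like the one below. The function names are made up for this example. The sketch only computes the two measures; deciding which one is more appropriate is still up to you.

```python
def mean(data):
    """Arithmetic mean: add the values and divide by how many there are."""
    return sum(data) / len(data)

def median(data):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

# The two data sets from the review questions.
set_a = [8, 9, 11, 10, 97]
set_b = [18, 19, 37, 22]

for name, data in (("first set", set_a), ("second set", set_b)):
    print(name, "mean =", mean(data), "median =", median(data))
```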
The measure of center can be used to predict a typical value for a set of data. The measure of center does not give us a
complete picture. The measure of spread is needed to predict how consistent the data will be. Sometimes, we expect
the data to be very consistent. Other times, we would expect a fairly large spread.
*Think of a real-world data set that should have a low measure of spread.
*Think of a real-world data set that should have a fairly large measure of spread.
Consider the following test scores:
Student Test 1 Test 2 Test 3 Test 4
Johnny 65 82 93 100
Will 82 86 88 84
Anna 80 99 73 88
1. Who is the best student? How do you know?
2. What is the mean test score for each student?
(Calculator note for the histogram in Step 4: Using the Trace button, you can see that there are 5 people in the 70 to 79 inch category. Use the arrow keys to move from one bar to the next.)
3. Based on the mean, who is the best student?
4. If asked to select one student, who would you pick as the best student? Explain.
Investigation 1: Deviation from the Mean
Usually we calculate the mean, or average, test score to describe how a student is doing. Johnny, Will, and
Anna all have the same average. However, these three students do not seem to be “equal” in their test
performance. We need more information than just the typical test score to describe how they are doing.
One thing we can look at is how consistent each student is with their test performance. Does each student
tend to do about the same on each test, or does it vary a lot from test to test? Measures of spread will give
us that information. In statistics, deviation is the amount that a single data value differs from another
value. Often that other value is the mean.
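The idea of a deviation from the mean can be sketched in a few lines of Python. This is an illustration only; the function name is an assumption, and the scores are the ones from the test score table above.

```python
# Test scores from the table above.
scores = {
    "Johnny": [65, 82, 93, 100],
    "Will": [82, 86, 88, 84],
    "Anna": [80, 99, 73, 88],
}

def deviations(data):
    """Return each value minus the mean of the data."""
    mean = sum(data) / len(data)
    return [x - mean for x in data]

for student, tests in scores.items():
    devs = deviations(tests)
    print(student, devs, "sum of deviations =", sum(devs))
```

A negative deviation means the score is below the student's mean, and a positive deviation means it is above.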
5. Complete the table below by finding the deviation from the mean for each test score for each student.
Anna
Test    Score (x)    Mean (x̄)    Deviation from the Mean (x − x̄)
Test 1
Test 2
Test 3
Test 4
Sum of Deviations

Johnny
Test    Score (x)    Mean (x̄)    Deviation from the Mean (x − x̄)
Test 1
Test 2
Test 3
Test 4
Sum of Deviations

Will
Test    Score (x)    Mean (x̄)    Deviation from the Mean (x − x̄)
Test 1
Test 2
Test 3
Test 4
Sum of Deviations
6. How does the sum of the deviations from
the mean relate to the mean being the
measure of center for a set of data?
Investigation 2: Mean Absolute Deviation
One way to measure consistency is to find the average deviation from the mean. In other words, how far do most
values in a data set fall from the mean? One way to answer this question would be to find the average deviation, or
distance, that the data values fall from the mean. So we would add up the deviations to find the total deviation and
then divide by the number of data values to find the mean deviation. However, the fact that the deviations from the
mean always add up to zero is a problem. No matter what non-zero number we divide zero by, we always get zero!
When talking about spread, a value of zero indicates that there is no spread, or variability. One way to fix this problem
is to look at only the distances from the mean, and not their directions as indicated by the sign of the deviation
(positive or negative). We can take the absolute value of the distances and then find the average distance.
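The procedure described above (find each distance from the mean, then average the distances) can be sketched in Python. The function name is an assumption made for this example.

```python
def mean_absolute_deviation(data):
    """Average distance of the data values from their mean."""
    mean = sum(data) / len(data)
    distances = [abs(x - mean) for x in data]   # drop the sign of each deviation
    return sum(distances) / len(distances)

# Test scores from the table in the previous section.
scores = {
    "Johnny": [65, 82, 93, 100],
    "Will": [82, 86, 88, 84],
    "Anna": [80, 99, 73, 88],
}

for student, tests in scores.items():
    print(student, "MAD =", mean_absolute_deviation(tests))
```

Use it to check your hand calculations, then interpret each result in context yourself.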
7. Calculate the Mean Absolute Deviation (MAD) for each student’s scores.
8. What does the Mean Absolute Deviation (MAD) tell you about each student? Is there one student who seems to be
more consistent than the others?
9. Interpret the MAD for Johnny in context.
Investigation 3: Calculating the Standard Deviation
Below is the formula for calculating the standard deviation. It looks fun, doesn’t it? Let’s break it into parts so we can
see how it is finding the “average deviation from the mean.”
Step 1: In the table below, record the data values (test scores) in the second column labeled “Value.” The x is used to denote a
value from the data set. Johnny’s first value is written for you.
Step 2: Record the mean at the bottom of the second column next to the symbol μ. Mu (pronounced “mew”) is the
lowercase Greek letter that later became our letter “m.” μ is another symbol that we use for mean (in addition to x̄).
Step 3: In the third column, calculate the deviation from the mean for each test score by taking each test score and
subtracting the mean. The first difference has been done for you.
Step 4: Add the values in the third column to find the sum of the deviations from the mean. If you have done
everything correctly so far, the sum should be zero. The capital Greek letter Σ, called sigma, is a symbol that is used to
indicate the sum.
Step 5: Square each deviation to make it positive and record these values in the last column of the table. The first
value is done for you. Remember that parentheses are necessary: −20² = −400 and (−20)² = 400.
Step 6: Find the sum of the squared deviations by adding up the values in the fourth column and putting the sum at
the bottom of the column. This is the sum of the squared deviations from the mean.
Step 7: Find the average of the squared deviations from the mean by dividing the sum of column four by the number
of data values, n, (the number of test scores).
Step 8: “Un-do” the squaring by taking the square root. Now you have found the standard deviation! The symbol for
standard deviation is the lower-case letter sigma, σ.
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$$
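Steps 1 through 8 can be followed line by line in a short Python sketch. This is only an illustration using Johnny's four test scores; the variable names are assumptions chosen to match the steps.

```python
from math import sqrt

values = [65, 82, 93, 100]                    # Step 1: the data values x
mu = sum(values) / len(values)                # Step 2: the mean
deviations = [x - mu for x in values]         # Step 3: x - mu for each value
sum_of_deviations = sum(deviations)           # Step 4: should be zero
squared = [d ** 2 for d in deviations]        # Step 5: (x - mu)^2 for each value
sum_of_squares = sum(squared)                 # Step 6: sum of squared deviations
average_square = sum_of_squares / len(values) # Step 7: divide by n
sigma = sqrt(average_square)                  # Step 8: take the square root

print("mean:", mu)
print("deviations:", deviations, "sum:", sum_of_deviations)
print("squared deviations:", squared, "sum:", sum_of_squares)
print("standard deviation:", round(sigma, 2))
```

Replacing the list in Step 1 with Will's or Anna's scores repeats the same steps for their tables.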
Johnny’s Data
Test    Value (x)    Deviation from the Mean, Value − Mean (x − μ)    Squared Deviation from the Mean, (Value − Mean)² (x − μ)²
1       65           65 − 85 = −20                                    (−20)² = 400
2
3
4
Mean μ =             Sum ∑(x − μ) =                                   Sum ∑(x − μ)² =
Average of Squared Deviations: ∑(x − μ)² / n =
Square Root of Average of Squared Deviations: √( ∑(x − μ)² / n ) =
Will’s Data
Test    Value (x)    Deviation from the Mean, Value − Mean (x − μ)    Squared Deviation from the Mean, (Value − Mean)² (x − μ)²
1
2
3
4
Mean μ =             Sum ∑(x − μ) =                                   Sum ∑(x − μ)² =
Average of Squared Deviations: ∑(x − μ)² / n =
Square Root of Average of Squared Deviations: √( ∑(x − μ)² / n ) =
10. Translate into words: ∑(x − μ)².
11. Interpret Anna’s standard deviation in context.
12. Who is the best student? How do you know?
Finding the mean, median and standard deviation on the calculator.
Press “List”. Enter Johnny’s grades in List 1, Will’s grades in List 2, and Anna’s grades in List 3.
Once all of the scores are entered, and you have checked to make sure that they are correct, press “2nd”
and “MODE” (QUIT).
Press “2nd” and then “LIST” (STAT). Use the arrow key to select “CALC” and then choose “1-Var Stats”.
Press enter and the screen should say “1-Var Stats”. You now need to choose the list that the calculator
will use. Press “2nd” and then “LIST” and select the list you want to use.
Your screen should now state “1-Var Stats” and either L1, L2, or L3. Press “Enter” and look at the
information.
Normally, we will be using Sx (the sample standard deviation) for standard deviation. It is used more
frequently because σx (the population standard deviation) requires the data for the
entire population, and that is not often the case. The sample standard deviation is the larger of the two values. In this
case, we would use σx because we have all of the students’ scores.
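If a calculator is not available, Python's built-in statistics module reports the same two values. In the sketch below, pstdev plays the role of σx (it divides by n) and stdev plays the role of Sx (it divides by n − 1). The variable names are assumptions made for this example.

```python
from statistics import mean, median, pstdev, stdev

johnny = [65, 82, 93, 100]   # Johnny's four test scores

print("mean:", mean(johnny))
print("median:", median(johnny))

sigma_x = pstdev(johnny)   # population standard deviation (divide by n)
s_x = stdev(johnny)        # sample standard deviation (divide by n - 1)

print("population standard deviation:", round(sigma_x, 2))
print("sample standard deviation:", round(s_x, 2))
```

Dividing by n − 1 instead of n is why the sample standard deviation comes out as the larger of the two values.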
13. Why do you think you were taught about MAD as a measure of spread in previous math courses and
now you’re learning about using standard deviation as the measure of spread when mean is the
measure of center?
Anna’s Data
Test    Value (x)    Deviation from the Mean, Value − Mean (x − μ)    Squared Deviation from the Mean, (Value − Mean)² (x − μ)²
1
2
3
4
Mean μ =             Sum ∑(x − μ) =                                   Sum ∑(x − μ)² =
Average of Squared Deviations: ∑(x − μ)² / n =
Square Root of Average of Squared Deviations: √( ∑(x − μ)² / n ) =
Review: Box Plots
Create a box plot for each set of data. Make note of any numbers in the set that you think are outliers. State
the values for the five-number summary (upper quartile, middle quartile, lower quartile, least value, greatest
value) and the values for the measures of spread (range, and interquartile range). Remember to clearly show
your work for each calculation.
IQR & Outliers:
The IQR can be used to determine whether or not the data set includes an outlier.
First, multiply the IQR by 1.5. Second, subtract the product of 1.5 and the IQR from the first quartile. That
number is the lowest number that would not be considered an outlier. Third, add the product of 1.5 and the
IQR to the third quartile. That is the largest number that would not be considered an outlier. In a modified
box plot, only non-outlier data is used to create the box plot. All outlier points are displayed as circles.
As data is added to a set, the IQR and the five-number summary need to be recalculated in order to
determine whether or not the new data set includes an outlier.
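The 1.5(IQR) rule described above can be sketched in Python. This is an illustration only. It assumes the quartiles are found as the medians of the lower and upper halves of the sorted data (leaving out the middle value when the number of values is odd); calculators and software sometimes use slightly different quartile rules. The function names are made up for this example.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-each-half method."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    # Skip the middle value when the number of values is odd.
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

def outlier_fences(data):
    """Smallest and largest values that would not be considered outliers."""
    q1, _, q3 = quartiles(data)
    step = 1.5 * (q3 - q1)          # 1.5 times the IQR
    return q1 - step, q3 + step

def outliers(data):
    """List the data values that fall outside the fences."""
    low, high = outlier_fences(data)
    return [x for x in data if x < low or x > high]

data = [3, 8, 6, 8, 11, 9, 8]
print("quartiles:", quartiles(data))
print("fences:", outlier_fences(data))
print("outliers:", outliers(data))
```

Adding a value to the list and running the sketch again shows why the quartiles and fences must be recalculated whenever the data set changes.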
4. Calculate the greatest and least values for the data sets in #1 & #2 that would not be considered outliers.
a. What is the smallest value that would not be an outlier for the data in #1?
b. What is the largest value that would not be an outlier for the data in #1?
c. What is the smallest value that would not be an outlier for the data in #2?
d. What is the largest value that would not be an outlier for the data in #2?
1. {3, 8, 6, 8, 11, 9, 8}
Q1:
Q2:
Q3:
Greatest Value:
Least Value:
five-number summary:
Range:
Interquartile Range:
2. {12, 3, 2, 4, 5, 6, 6, 4, 10, 7, 9, 8}
Q1:
Q2:
Q3:
Greatest Value:
Least Value:
five-number summary:
Range:
Interquartile Range:
3. The middle quartile, or second quartile, is also which measure of central tendency?
[Two number lines from 1 to 12 for drawing the box plots]
Create a box plot for each set of data. If the data set has outliers, create a modified box plot. Clearly show
your calculations.
5. {12, 2, 3, 4, 5, 3, 5, 7, 6}
Q1:
Q2:
Q3:
Range:
Interquartile Range:
Greatest Value:
Least Value:
Five-Number Summary:
6. {11, 5, 7, 9, 3, 8, 9}
Q1:
Q2:
Q3:
Range:
Interquartile Range:
Greatest Value:
Least Value:
Five-Number Summary:
8. How would the values for the box plot change for #6 if {2} was added to the data?
9. How would the values for the box plot change for #6 if {12} was added to the data?
[Two number lines from 1 to 12 for drawing the box plots]
10. x is an integer less than 13. What is the largest possible value of x if it is an outlier?
{x, 11, 13, 14, 15, 15, 15, 16, 16, 17, 17}
Show your work and sketch the modified box plot. The largest possible value of x is ______.
7. Which numbers would be considered outliers for #5 and #6?
a. For the data set in #5, outliers would be numbers less than ____________ or greater
than _____________.
b. For the data set in #6, outliers would be numbers less than ____________ or greater
than _____________.
For each set of data, box plot, or five-number summary, calculate the least number that could be in the set
that would not be considered an outlier and the greatest number that could be in the set that would not be
considered an outlier. State how many outliers the set has (no outliers, exactly # outliers, or at least #
outliers).
11. {3, 7, 7, 9, 9, 11, 17}
For this set, outliers would be less than _______ or greater than _______. This data set
has__________________________________ outlier(s). List the numbers in the set that are known
outliers: __________________.
12. {1, 1, 7, 8, 9, 9, 9, 10, 12, 20}
For this set, outliers would be less than _______ or greater than _______. This data set
has__________________________________ outlier(s). List the numbers in the set that are known
outliers: __________________.
13. Five-number summary: {1 / 8 / 9 / 10 / 13}
For this set, outliers would be less than _______ or greater than _______. This data set
has__________________________________ outlier(s). List the numbers in the set that are known
outliers: __________________.
14. Five-number summary: {2 / 12 / 13 / 16 / 22.5}
For this set, outliers would be less than _______ or greater than _______. This data set
has__________________________________ outlier(s). List the numbers in the set that are known
outliers: __________________.
15.
For this set, outliers would be less than _______ or greater than _______. This data set
has__________________________________ outlier(s). List the numbers in the set that are known
outliers: __________________.
[Figure for #15: box plot on a number line from 1 to 12; not reproduced]
Comparing Box Plots
Data sets can be compared based on their four key characteristics: center, spread, shape, and outliers. Depending on
the type of graph, you can also use other characteristics. For example, box plots can be compared based on their lower
and upper quartile values.
Many customers have been complaining about the service that they have received from the waiters at a
restaurant. The manager gave the servers a 120-question test about their job. The scores for the first test
were unacceptable, so the manager had the staff retrained and then gave them a second test. The test
scores are shown in Figure 2.
2. Compare the two box plots.
3. What conclusions would you draw if you were the restaurant’s manager? Create a numbered list.
The head chef has not been happy with the consistency of the other cooks in the kitchen and has been
keeping track of the number of errors they make each day. Figure 3 shows how many errors each of the four
cooks made each day last month.
4. Analyze and compare the box plots.
5. Draw conclusions about the workers based on the box plots. If you were the head chef, what
recommendations would you make to the owner of the restaurant? Create a numbered list.
1. Describe, compare, and contrast the two box plots.
[Figure: box plots of Test Scores for 1st Period and 2nd Period; not reproduced]
6. What information is not presented in the box plots, but would be helpful to better understand the
situation? Make a numbered list.
Comparing Dot Plots
A candy company has two manufacturing plants. Quality control experts randomly sampled bags of candy to
see how the actual weight of the candy compared to the weight listed on the bag. The scale measured the
weight of each bag to the nearest half of a gram.
7. Compare and contrast the data in the two dot plots.
8. Add eight data points to the dot plot for manufacturing plant B so that the data distribution becomes more peaked
and less widely spread.
[Figure: dot plots for Manufacturing Plant A and Manufacturing Plant B. Horizontal axis: Actual Weight of the Candy in the Bag (g), labeled from 247.0 to 252.0. Not reproduced.]
Unit 2 Review
1. Is it more appropriate to use standard deviation or IQR with mean?
2. Is it more appropriate to use standard deviation or IQR with median?
3. Answer the questions based on the box-and-whisker plot.
a) What is the median? b) What is the lower quartile? c) What is the least value?
d) What is the greatest value? e) What is the upper quartile?
4. The histogram shows last month’s telephone bills for all of the customers in a town.
A) How many customers had bills that were less than $40?
B) Based on the histogram, how many customers does the telephone company have?
5. Which measure of spread is calculated by using absolute values?
6. Which measure of spread is calculated by using square roots?
[Figure for #3: box-and-whisker plot on a number line from 1 to 10; not reproduced]
7. A box plot has a minimum value of 8 and a maximum value of 52. The lower quartile is 22, the
middle quartile is 25, and the upper quartile is 30.
Clearly show your work.
For this set, outliers would be less than _______ or greater than _______.
Are there any outliers?
8. a) Which measure of center is it most appropriate to use to describe a set of data which includes
outliers?
b) The data should be written in numerical order before calculating which measure of center?
9. a) Calculate the median for the following set of data {2, 6, 7, 8, 95}
b) Calculate the median for the following set of data {8, 5, 2, 9, 1, 2, 7, 10}
10. a) Calculate the mean for the following set of data {9, 1, 5}
b) Calculate the mean for the following set of data {2, 7, 1, 3, 7}
11. Two sets of data have the same mean and median. The first set of data has a standard deviation
of 3. The second set of data has a standard deviation of 13. In which set would you expect to find
the data closer to the center? Explain.
12. Quiz scores from Mr. Warner’s 3rd period class.
Quiz Scores
a) What is the median score on the last quiz?
b) Calculate the average (arithmetic mean) score on the last quiz.
[Figure for #12: dot plot of Quiz Scores on a number line from 1 to 10; not reproduced]
13. Calculate the range of the data from the dot plot in #12.
14. A box plot has a minimum value of 8 and a maximum value of 32. The lower quartile is 22, the
middle quartile is 25, and the upper quartile is 30.
a) What is the range?
b) What is the interquartile range?
15. Sketch a dot plot which has a uniform shape.
16. Determine whether each of the following data is categorical (C) or quantitative (Q).
______ the number of minutes each student spends studying math each day
______ students’ favorite classes at school (Math will win!)
______ the high temperatures for Raleigh, NC on 10/21 for each of the past 100 years
______ the types of apples at the grocery store
______ the number of times Bobo says “Sasquatch” on each episode of Finding Bigfoot.
17. Most students studied hard and did very well on the test. A few students procrastinated and
failed the test. The scores were used to create a histogram. Which of the four shapes would best
describe this histogram?
18. Create a box plot for the data. Show your work. {2, 4, 5, 6, 6, 8, 9}
19. Create a dot plot that is mound shaped.
20. What are the four main characteristics that should be addressed when describing the distribution
of data?
[Two number lines from 1 to 10 for drawing the plots]
Answers:
Pages 3 – 6: Vocabulary
box plot Also called a box-and-whisker plot. A graph of quantitative data built using the five number summary of the data. The graph is made by making vertical lines above the lower quartile, median, and upper quartile on a number line and connecting these vertical lines with horizontal segments to form the box. The minimum and maximum values are represented by dots and connected by segments, or "whiskers", to the box. Box plots can be drawn either horizontally or vertically.
categorical data Data that records qualities or characteristics of an individual, such as gender or eye color. (Also called qualitative data.)
clustering When data seems to be gathered around a particular value or values.
data A collection of information in context.
dot plot A graph of quantitative data using dots placed above a simple number line. If a data value occurs more than once, the dots are stacked on top of each other over that value on the number line.
frequency How often something happens (usually during a period of time).
frequency distribution A table that summarizes the distribution of quantitative data by dividing the data into intervals (also called classes) and counting the number of data values within each interval. This count is called the frequency. A histogram is a graph of a frequency distribution of quantitative data where intervals of data are represented with bars. The widths of the bars are determined by the length of the intervals into which the data has been divided and the heights of the bars are proportional to the class frequencies.
interquartile range A measure of the spread of the middle 50% of a set of quantitative data; the difference between the upper and lower quartiles. IQR = Q3 − Q1
mean A numerical measure of center that is the arithmetic average of the data.
mean absolute deviation (MAD) A numerical measure of spread that shows how much data values vary from the mean for a quantitative data set. A low mean absolute deviation indicates that the data points tend to be very close to the mean, whereas a high mean absolute deviation indicates that the data points are spread out over a large range of values. The process of calculating the mean absolute deviation involves taking the absolute value of the deviations from the mean.
measures of center Numerical measures that describe the typical value of a quantitative data set. In this unit, we will be studying the mean and the median.
measures of spread Numerical measures that describe how much values typically vary from the center in a quantitative data set. In this unit, we will be studying interquartile range and standard deviation.
median A numerical measure of center that describes the middle value of a data set. Note that the median does not have to be one of the values in the data set, but a value that divides the data set in half so that 50% of the data values lie above the median and 50% of the data values fall below the median.
modified box plot A box plot that indicates which data values, if any, are outliers by representing them as dots separate from the box plot. The whisker(s) connect the box to the lowest and/or highest data values that are not outliers, instead of the minimum and/or maximum values.
outlier A data value that does not fit the overall pattern of the data distribution. In the case of one-variable data, an outlier is a value that is more than 1.5 IQR above the third quartile or below the first quartile.
population In statistics, a set of individuals that we wish to describe and/or make predictions about, such as a group of people, a row of plants, a set of batteries, a group of rats, or bacteria placed into test tubes.
quantitative data Data that measures a characteristic of an individual, such as height, weight, or age. (Also called measurement data)
skewed data Data that tends to have a long tail on one side or the other.
standard deviation A numerical measure of spread that shows how much data values vary from the mean for a quantitative data set. A low standard deviation indicates that the data points tend to be very close to the mean, whereas a high standard deviation indicates that the data points are spread out over a large range of values. The process of calculating the standard deviation involves squaring the deviations from the mean.
symmetrical A symmetric distribution is one where the left-hand and right-hand sides of the distribution are roughly equally balanced around the mean.
variable Characteristic recorded about each individual in a data set. Examples: age, time since last online purchase, zip code, gender, height, hair color, etc.
Page 6:
1. Univariate data is data dealing with one variable. In the last unit of this year, you will learn about bivariate data.
2. Data can be used in two ways.
1) data analysis (used to describe)
A restaurant could use data analysis in order to pay its workers based on how many hours they worked, or use
it to try to get people to invest in the business.
2) statistical inference (used to predict)
A restaurant could use statistical inference in order to figure out how much food to order from its vendors, or
use it to figure out how many workers they will likely need for each shift.
3. Data can be grouped into two sets: categorical data and quantitative data. Sometimes quantitative data is called
numerical data. This can be misleading since numbers are sometimes considered categorical.
Categorical data is data that doesn’t have units associated with it. Examples include Social Security numbers, best-
selling dish, and favorite waiter. Even though Social Security numbers are numbers, they don’t have a unit associated
with them. It wouldn’t make sense to calculate their mean or median.
Quantitative data is data that is measured in units. Examples include profit in $, and expenses in $. Quantitative data is
what we will be focusing on in unit 2.
4. Data can be displayed in graphs. Examples include dot (or line) plots, histograms, box plots, scatter plots, and
frequency tables.
Data can also be displayed as numbers. Examples include mean, median, range, standard deviation, and interquartile
range (IQR).
5. The four characteristics that should be addressed when dealing with data:
1) measure of center (mean or median)
2) measure of spread (MAD or standard deviation when dealing with the mean, IQR when dealing with median,
and range for both median and mean)
3) shape (skewed right, skewed left, mound, and uniform)
4) outliers (obvious outliers as well as statistical outliers which are calculated by using the IQR and a boxplot)
Page 7 1. a) C b) Q c) Q d) Q e) Q f) C g) C h) C i) Q j) C
Pages 8 & 9 1 a.
b. The distribution of ages is mound shaped (symmetrical) and centered around 31 to 40 years. The ages
possibly ranged from 1 to 70 years, giving a range of 69. There are no outliers.
c. Answers should vary. One issue with histograms (and box plots) is that the individual data is unknown.
Each example should have 1 number between 1-10, 4 numbers between 11-20, 5 numbers between 21-30, 7
numbers between 31-40, 6 numbers between 41-50, 5 numbers between 51-60, and 1 number in the range
of 61-70.
[Histogram for 1a. Vertical axis: Frequency (scale 0 to 8). Horizontal axis: Age (in years), with one bar for each interval: 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70.]
This histogram could also be drawn by writing
1, 11, 21, 31, 41, 51, 61, and 71 on the
horizontal axis. The histogram should have a
title of “Restaurant Patrons” or something else
appropriate.
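If the histogram needs to be checked without graph paper, the bar heights can be sketched as text. The following is only a rough Python illustration: the bin counts come from the answer to 1c, while the function name `text_histogram` and the `#` bar style are assumptions made for this sketch.

```python
# Bin labels and frequencies taken from the answer to 1c.
age_bins = ["1-10", "11-20", "21-30", "31-40", "41-50", "51-60", "61-70"]
frequencies = [1, 4, 5, 7, 6, 5, 1]


def text_histogram(labels, counts):
    """Return one line per bin, drawing one '#' for each data value in the bin."""
    return [f"{label:>5} | {'#' * count} ({count})"
            for label, count in zip(labels, counts)]


lines = text_histogram(age_bins, frequencies)
print("Restaurant Patrons: Age (in years)")
for line in lines:
    print(line)

# Quick checks on the table behind the histogram.
total_patrons = sum(frequencies)
tallest_bin = age_bins[frequencies.index(max(frequencies))]
print("Number of patrons:", total_patrons)  # 29
print("Tallest bar:", tallest_bin)          # 31-40
```

The tallest bar sits in the middle interval and the bars fall away on both sides, which matches the mound-shaped description in part b.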
2) a, b, and c.
Pages 11 – 15.
1. Answers will vary. 2. 85 3. Answers will vary. 4. Answers will vary. Some may look at Johnny’s increasing
trend over time. Some may look at Will’s consistency. Some may note Anna’s one low grade. Just using the mean to
describe each student is not enough. I think that we can all agree that they are not “equal” in their test performance.
We need more information than just the typical test score. One thing to look at is how consistent each student is, and
measures of spread will give us that information.
5.
The distribution of wait times is skewed right. Most of
the wait times fell to the left of 6 minutes with one
extreme value between 18 – 20.9 minutes which is an
outlier. The center is around 3 minutes. The range is
18.5. The title of the histogram should be something like
“Waiting Times”. The vertical axis should be labeled
“Frequency”.
[Histogram of wait times. Horizontal axis: Time in minutes.]
d) A dot plot would be more difficult to make and not as useful. There are a lot of different data
points, so a dot plot would have to have all numbers, by tenths, from 0 to 18.5. A dot plot with
186 possible data points would be quite large. Also, it isn’t useful for the context of the problem.
The difference between a customer waiting 1.2 minutes and 1.3 minutes is a matter of 6
seconds. This is fairly minimal.
Johnny
Test      Score (x)    Mean (x̄)    Deviation from the Mean (x − x̄)
Test 1    65           85           -20
Test 2    82           85           -3
Test 3    93           85           8
Test 4    100          85           15
Sum of Deviations: 0

Will
Test      Score (x)    Mean (x̄)    Deviation from the Mean (x − x̄)
Test 1    82           85           -3
Test 2    86           85           1
Test 3    88           85           3
Test 4    84           85           -1
Sum of Deviations: 0
Anna
Test      Score (x)    Mean (x̄)    Deviation from the Mean (x − x̄)
Test 1    80           85           -5
Test 2    99           85           14
Test 3    73           85           -12
Test 4    88           85           3
Sum of Deviations: 0
6. The mean is the balance point, so the distances between the mean and each point balance in either direction.
7. Johnny’s is 11.5. Will’s is 2. Anna’s is 8.5.
8. The MAD tells us the average deviation from the mean for each student, telling us how much their test scores
typically vary from their average test score. Will’s MAD is the lowest, indicating that his test scores are the least spread
out.
9. Johnny’s test scores typically fall within 11-12 points of his average score of 85. So Johnny’s scores typically vary
anywhere from 74 to 96.
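The MAD values in answer 7 can be double-checked with a short Python sketch. The scores are the ones from the tables for question 5; the function names (`mean`, `deviations`, `mean_absolute_deviation`) are illustrative choices, not part of the worksheet.

```python
# Test scores from the tables for question 5.
scores = {
    "Johnny": [65, 82, 93, 100],
    "Will": [82, 86, 88, 84],
    "Anna": [80, 99, 73, 88],
}


def mean(values):
    """Arithmetic average of a list of numbers."""
    return sum(values) / len(values)


def deviations(values):
    """Deviation of each value from the mean (x minus the mean)."""
    m = mean(values)
    return [x - m for x in values]


def mean_absolute_deviation(values):
    """Average of the absolute values of the deviations from the mean."""
    return mean([abs(d) for d in deviations(values)])


for name, tests in scores.items():
    print(name,
          "mean:", mean(tests),
          "sum of deviations:", sum(deviations(tests)),
          "MAD:", mean_absolute_deviation(tests))
# Johnny mean: 85.0 sum of deviations: 0.0 MAD: 11.5
# Will mean: 85.0 sum of deviations: 0.0 MAD: 2.0
# Anna mean: 85.0 sum of deviations: 0.0 MAD: 8.5
```

The sum of the deviations is 0 for every student, which is why the absolute values are needed before averaging.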
Standard deviation tables.
Johnny’s Data
Test    Value (x)    Deviation from the Mean: Value − Mean (x − μ)    Squared Deviation from the Mean: (Value − Mean)² = (x − μ)²
1       65           65 − 85 = -20                                     (-20)² = 400
2       82           82 − 85 = -3                                      (-3)² = 9
3       93           93 − 85 = 8                                       (8)² = 64
4       100          100 − 85 = 15                                     (15)² = 225
Mean: μ = 85    Sum: ∑(x − μ) = 0    Sum: ∑(x − μ)² = 698
Average of Squared Deviations: ∑(x − μ)² / n = 698 / 4 = 174.5
Square Root of Average of Squared Deviations: √(∑(x − μ)² / n) = √174.5 ≈ 13.2
Will’s Data
Test    Value (x)    Deviation from the Mean: Value − Mean (x − μ)    Squared Deviation from the Mean: (Value − Mean)² = (x − μ)²
1       82           -3                                                9
2       86           1                                                 1
3       88           3                                                 9
4       84           -1                                                1
Mean: μ = 85    Sum: ∑(x − μ) = 0    Sum: ∑(x − μ)² = 20
Average of Squared Deviations: ∑(x − μ)² / n = 20 / 4 = 5
Square Root of Average of Squared Deviations: √(∑(x − μ)² / n) = √5 ≈ 2.2
Anna’s Data
Test    Value (x)    Deviation from the Mean: Value − Mean (x − μ)    Squared Deviation from the Mean: (Value − Mean)² = (x − μ)²
1       80           -5                                                25
2       99           14                                                196
3       73           -12                                               144
4       88           3                                                 9
Mean: μ = 85    Sum: ∑(x − μ) = 0    Sum: ∑(x − μ)² = 374
Average of Squared Deviations: ∑(x − μ)² / n = 374 / 4 = 93.5
Square Root of Average of Squared Deviations: √(∑(x − μ)² / n) = √93.5 ≈ 9.7
10. The sum of the squared deviations from the mean. In other words, subtract the mean from each value, square
these differences and add them up.
11. Anna’s test grades typically vary by about ten points from her average score of 85. So her test scores typically vary
from about 75 to 95.
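The standard deviation tables can be checked with a short Python sketch that follows the same steps: find the deviations, square them, average them, and take the square root. The function name `population_std_dev` is an illustrative choice. Note that the tables divide by n, which matches `statistics.pstdev` in Python's standard library; `statistics.stdev` divides by n − 1 and would give slightly larger values.

```python
import math
import statistics

# Test scores from the standard deviation tables.
scores = {
    "Johnny": [65, 82, 93, 100],
    "Will": [82, 86, 88, 84],
    "Anna": [80, 99, 73, 88],
}


def population_std_dev(values):
    """Square root of the average squared deviation from the mean (divide by n)."""
    mu = sum(values) / len(values)
    squared_deviations = [(x - mu) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / len(values))


for name, tests in scores.items():
    by_hand = population_std_dev(tests)
    library = statistics.pstdev(tests)  # same divide-by-n definition
    print(name, round(by_hand, 1), round(library, 1))
# Johnny 13.2 13.2
# Will 2.2 2.2
# Anna 9.7 9.7
```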
12. Answers will vary. Will is the most consistent as his scores vary by about 2.2 points from the mean of 85 points.
13. Answers will vary. The measure of center only gives us a one-number summary of a data set. It doesn’t give us an
idea of how much the numbers in the set vary from the center. The measure of spread gives us an idea of how far from
the center we can expect to find the data. MAD is fairly easy to calculate, but it isn’t as accurate as standard deviation,
especially when outliers are in a set of data. However, standard deviations involve square roots and they are often
irrational numbers, so MAD is used as the measure of spread until students have learned about square roots and irrational
numbers.
Pages 16 – 18
1. 3, 6, 8, 8, 8, 9, 11
Q1: 6 Q2: 8 Q3: 9 Greatest Value: 11 Least Value: 3 five-number summary: 3/6/8/9/11 Range: 11 – 3 = 8
Interquartile Range: 9 – 6 = 3
2. 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 10, 12
Q1: (4 + 4)/2 = 4 Q2: (6 + 6)/2 = 6 Q3: (8 + 9)/2 = 8.5 Greatest Value: 12 Least Value: 2 five-number summary: 2/4/6/8.5/12
Range: 12 – 2 = 10 Interquartile Range: 8.5 – 4 = 4.5
3. The middle quartile is also the median.
4. a. 6 – 1.5(3) = 1.5 b. 9 + 1.5(3) = 13.5 c. 4 – 1.5(4.5) = –2.75 d. 8.5 + 1.5(4.5) = 15.25
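The four boundaries in answer 4 all come from the 1.5 × IQR rule in the vocabulary list. A minimal Python sketch of that rule follows; the function names `outlier_fences` and `find_outliers` are assumptions made for illustration, and the quartiles are entered by hand from answers 1 and 2.

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) boundaries: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Return the values that fall below the lower or above the upper boundary."""
    lower, upper = outlier_fences(q1, q3)
    return [x for x in data if x < lower or x > upper]


# Data set from answer 1: Q1 = 6, Q3 = 9
data_1 = [3, 6, 8, 8, 8, 9, 11]
print(outlier_fences(6, 9))            # (1.5, 13.5)
print(find_outliers(data_1, 6, 9))     # []

# Data set from answer 2: Q1 = 4, Q3 = 8.5
data_2 = [2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 10, 12]
print(outlier_fences(4, 8.5))          # (-2.75, 15.25)
print(find_outliers(data_2, 4, 8.5))   # []
```

Neither data set has a value outside its boundaries, so neither has an outlier by this rule.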
5. 2, 3, 5, 5, 6, 6, 8, 9, 12
Q1: (3 + 5)/2 = 4 Q2: 6 Q3: (8 + 9)/2 = 8.5
Range: 12 – 2 = 10 Interquartile Range: 8.5 – 4 = 4.5 Greatest Value:12 Least Value:2
5-Number Summary: 2/4/6/8.5/12
6. 3, 5, 7, 8, 9, 9, 11
Q1: 5 Q2: 8 Q3: 9 Range: 11 – 3 = 8 Interquartile Range: 9 – 5 = 4 Greatest Value: 11 Least Value: 3
5-Number Summary: 3/5/8/9/11
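The five-number summaries in answers 5 and 6 can be reproduced with a short Python sketch. It assumes the quartile method this answer key appears to use: Q1 and Q3 are the medians of the lower and upper halves, and the median itself is left out of both halves when the number of values is odd. Calculators and software sometimes use other quartile conventions, so results from other tools may differ slightly. The function names are illustrative.

```python
def median(sorted_values):
    """Middle value of a sorted list, or the average of the two middle values."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(data):
    """Return (least value, Q1, median, Q3, greatest value)."""
    values = sorted(data)
    n = len(values)
    lower_half = values[: n // 2]        # values below the median position
    upper_half = values[(n + 1) // 2:]   # values above the median position
    return (values[0], median(lower_half), median(values),
            median(upper_half), values[-1])


print(five_number_summary([2, 3, 5, 5, 6, 6, 8, 9, 12]))  # (2, 4.0, 6, 8.5, 12)
print(five_number_summary([3, 5, 7, 8, 9, 9, 11]))        # (3, 5, 8, 9, 11)
```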
7. a) –2.25 ; 11.75 b) –1 ; 15
8. Since 2 would be the new “least value”, the range will increase. The interquartile range will either increase or stay the
same. The quartiles will either decrease or stay the same. {2, 3, 5, 7, 8, 9, 9, 11}
Range: 11 – 2 = 9 ; Least value: 2 ; Q1: (3 + 5)/2 = 4 ; Q2: (7 + 8)/2 = 7.5 ; Q3: (9 + 9)/2 = 9 ; IQ range: 9 – 4 = 5
9. Since 12 would be the new “greatest value”, the range will increase. The interquartile range and the quartiles will
either increase or stay the same. {3, 5, 7, 8, 9, 9, 11, 12}
Range: 12 – 3 = 9 ; Greatest value: 12 ; Q1: (5 + 7)/2 = 6 ; Q2: (8 + 9)/2 = 8.5 ; Q3: (9 + 11)/2 = 10 ; IQ range: 10 – 6 = 4
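Answers 8 and 9 can be checked side by side with a short Python sketch that recomputes the range, quartiles, and interquartile range after one value is added to the data set from answer 6. It assumes the same quartile method as the rest of this answer key (medians of the lower and upper halves); the function names `quartiles` and `describe` are illustrative.

```python
def median(sorted_values):
    """Middle value of a sorted list, or the average of the two middle values."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(data):
    """Return (Q1, Q2, Q3) using the medians of the lower and upper halves."""
    values = sorted(data)
    n = len(values)
    return (median(values[: n // 2]), median(values),
            median(values[(n + 1) // 2:]))


def describe(data):
    """Return the range, quartiles, and interquartile range of a data set."""
    q1, q2, q3 = quartiles(data)
    return {"range": max(data) - min(data), "Q1": q1, "Q2": q2, "Q3": q3,
            "IQR": q3 - q1}


original = [3, 5, 7, 8, 9, 9, 11]   # data set from answer 6
with_low = original + [2]           # answer 8: add a new least value
with_high = original + [12]         # answer 9: add a new greatest value

for label, data in [("original", original), ("add 2", with_low),
                    ("add 12", with_high)]:
    print(label, describe(data))
# original {'range': 8, 'Q1': 5, 'Q2': 8, 'Q3': 9, 'IQR': 4}
# add 2 {'range': 9, 'Q1': 4.0, 'Q2': 7.5, 'Q3': 9.0, 'IQR': 5.0}
# add 12 {'range': 9, 'Q1': 6.0, 'Q2': 8.5, 'Q3': 10.0, 'IQR': 4.0}
```

In both cases the range grows from 8 to 9, while the interquartile range grows only when the new low value pulls Q1 down.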
10. 13 – 1.5(3) = 8.5. The largest possible value of x is 8.
11. 1 ; 17 ; no outliers ; none
12. 2.5; 14.5 ; exactly 4 ; 1, 1, 12, 20
13. 4.5; 13.5 ; at least 2 ; 1, 13
14. 6; 22; at least 2 ; 2, 22.5
15. 2; 10; at least 1 ; 1, 2
Pages 19 & 20
1. Answers may vary. Both sets of data have the same interquartile range (94 – 84 = 10 ; 96 – 86 = 10) and the same
range (96 – 80 = 16 ; 100 – 84 = 16). 25% of the scores in 1st period were at or above 94. 50% of the scores in 2nd
period were at or above 94. The lowest score in 2nd period was the same as the lower quartile in 1st period, so 25% of the
scores from 1st period were less than or equal to the lowest score in 2nd period. The highest score in 1st period was the
same as the upper quartile in 2nd period, so 25% of the scores in 2nd period were greater than or equal to the highest score
in 1st period. Overall, the students in 2nd period earned higher scores than the students in 1st period.
2. Answers may vary. When comparing box plots, focus on the range, IQR, and the 5-number summary.
3. Answers may vary.
4. Answers may vary. When comparing box plots, focus on the range, IQR, and the 5-number summary.
5. Answers may vary.
6. Answers may vary.
7. Answers may vary. The data from Plant A is much more peaked and less widely spread than the data from Plant B.
Both dot plots are mound-shaped. Both have a center of around 250 grams. Neither set of data has outliers.
8. The 8 points should be added in the 249.5 to 250.5 range.