One Variable Statistics - Unbound · 2019-11-28

Common Core Math 1 Unit 2 One Variable Statistics


Name:____________________________

Period: _____

One Variable Statistics



Main Concepts Page #

Study Guide 2

Vocabulary 3 – 6

Identify and Describe Data 6 – 7

Frequency Tables, Dot Plots, and Histograms 8 – 11

Measures of Center and Spread 11 – 15

Box Plots & Outliers 16 – 18

Comparing Data Sets 19 – 20

Unit 2 Review 21 – 23

Answer Key 24 – 31



Common Core Standards

N-Q.1 Use units as a way to understand problems and to guide the solution of multi-step problems; choose and interpret units consistently in formulas; choose and interpret the scale and the origin in graphs and data displays.

N-Q.2 Define appropriate quantities for the purpose of descriptive modeling.

N-Q.3 Choose a level of accuracy appropriate to limitations on measurement when reporting quantities.

S-ID.1 Represent data with plots on the real number line (dot plots, histograms, and box plots).

S-ID.2 Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (interquartile range, standard deviation) of two or more different data sets.

S-ID.3 Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).

As a result of learning, students should be able to…

represent data with plots on the real number line (dot plots, histograms, and box plots).

choose and interpret the scale and the origin in data displays.

choose an appropriate level of accuracy when reporting statistical quantities.

use technology to calculate summary statistics and visually represent data.

based on the shape of a data distribution, choose the appropriate measures of center (mean or median) and spread (standard deviation or interquartile range) to describe the distribution.

interpret summary statistics for center and spread in the context of the data.

compare the center and spread of two or more different data sets in context.

interpret differences in shape, center, and spread in context.

use the context of the data to explain why its distribution takes on a particular shape.

explain the effect of outliers on the shape, center, and spread of data distributions.

use the 1.5(IQR) rule to determine if there are outliers in a data set.

define appropriate quantities to measure when collecting quantitative data to describe a population.

Essential Understandings:

There are two types of data: quantitative and qualitative. Data can be described graphically and numerically.

Frequency tables, dot plots, box plots, and histograms are visual representations of data distributions that can

be created and described.

The calculator can be used to create dot plots, box plots, and histograms to represent a data distribution.

There are situations when the median should be used to describe the center of a distribution instead of mean.

The standard deviation can be calculated by hand or on the calculator to describe the spread of a data

distribution when the mean is used to describe the data’s center.

Box plots can be used to visually represent a data distribution, to describe its spread with the interquartile range and its center with the median, and to recognize the influence of outliers on visual and numerical representations of data.

Data sets can be compared using visual and numerical representations.

An outlier can affect the shape, center, and spread of data distributions.

The 1.5 IQR rule is used to determine if there are outliers in a data set.

Essential Questions

How can the representation and analysis of data inform and influence decisions?

How can you collect, organize, and display data?

What is the most appropriate way to display a given set of data?

How can we use statistics to manipulate data?

When informed of a statistic, how can you determine if the information is misleading?



Example and Notes to help YOU remember:

box plot

categorical data


clustering


data


dot plot


frequency




frequency distribution


interquartile range


mean


mean absolute deviation


measures of center


measures of spread




median


modified box plot


outlier


population


quantitative data


skewed data




standard deviation


symmetrical


variable (statistics)




Read the questions and answer them based on what you’ve learned about statistics in previous years. After

you have answered each one to the best of your ability, underline your original answers and then read the

answers in the answer key. Edit your answers. Make note of information that you underlined that was

incorrect. Make note of new information that you added in order to make the answer more precise.

1) What does univariate mean?

2) Data collection has two key purposes. What are they?

3) What are the two types of data? Which type of data is each of the following: Social Security

numbers, best-selling dish, profit in $, expenses in $, best employee

4) What are two ways in which data can be displayed? What are examples of each?

5) What are the four characteristics that should be addressed when dealing with data? What are

examples of each?



1. Determine whether the following data is categorical (C) or quantitative (Q).

a. The candidate a survey respondent will support in an upcoming election.

b. The length of time of people’s drive to work.

c. The number of televisions in a household.

d. The distance kickers for a football team can kick a football.

e. The number of pages copied in the copy room each day.

f. The kind of tree in each person’s front yard in a neighborhood.

g. The type of blood a person has.

h. The jersey numbers of the football team.

i. The heights of the tallest buildings in the world.

j. The language spoken by 2000 people randomly surveyed at JFK Airport.

Frequency tables, dot plots, and histograms.

Data can be described graphically or numerically. Three ways to describe quantitative data graphically are

with frequency tables, box plots, and histograms.

In this unit, we are focusing on quantitative data. Dot plots allow us to see the individual data. With

histograms and box plots the individual data is lost. Frequency tables can display individual data depending

upon how they are constructed. Frequency tables can also be used for categorical data.

Carefully check each graph’s title and the labels of its axes.

There are 4 things to address when describing the distribution of data: shape, center, spread, and outliers

There are 4 basic shapes that you should be able to identify. Not all graphs will be 1 of these 4 shapes.

*Based on the names, sketch a histogram for each of the 4 basic shapes.

skewed left (extreme low data)

skewed right (extreme high data)

mound (or symmetrical)

uniform

The center and any outliers can be estimated by carefully looking at the dot plot, histogram, or frequency table.

The spread for dot plots and histograms can be described by the data’s range.

Frequency tables can be used to quickly display or organize data.



A dot plot is a single axis graph and it can be quickly created from a frequency table. Be precise when creating your

columns of dots.

A histogram is a type of bar graph. The bars touch to show continuity. Bars normally represent equal (or close to

equal) intervals of data. They are often referred to as bins, classes or groupings. There should be 5-8 bins

unless otherwise stated, and they should be the same width. Sometimes just the lower value of the bar is

given to the left. Sometimes the bar’s range is given below the bar. Be careful about overlapping data.

Scores on the last quiz: 3.5, 6, 6.5, 7, 7, 8, 8.5, 9, 9, 9.5, 10

* Use the scores on the last quiz to create a dot plot and a histogram.
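If you want to check your dot plot with technology other than the calculator, the short Python sketch below tallies how many times each quiz score occurs and prints one mark per occurrence. The variable names and the sideways text layout are illustrative choices, not part of this workbook.

```python
from collections import Counter

# Quiz scores listed above.
scores = [3.5, 6, 6.5, 7, 7, 8, 8.5, 9, 9, 9.5, 10]

# Tally how often each score occurs (the frequency of each value).
dot_counts = Counter(scores)

# Print a sideways dot plot: one row per distinct score,
# one mark per occurrence of that score.
for value in sorted(dot_counts):
    print(f"{value:>4}: " + "*" * dot_counts[value])
```

Each row corresponds to one column of dots on a hand-drawn dot plot, so stacked marks show repeated scores.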

1. The following frequency table lists the ages of patrons dining at a local restaurant at 7:00 PM.

Age Frequency

1-10 1

11-20 4

21-30 5

31-40 7

41-50 6

51-60 5

61-70 1

a) Construct a histogram for the data. Use the graph paper.



b) Describe the data distribution in context (shape, center, spread, and any outliers).

c) Create two sets of data that could be represented by the histogram.

2) A bank wants to improve its customer service. Before deciding to hire more workers, the manager decides to get some information on the waiting times customers currently experience. During a week, 50 customers were randomly selected, and their waiting times, in minutes, were recorded.

The data are as follows: 18.5, 9.1, 3.1, 6.2, 1.3, 0.5, 4.2, 5.2, 0.0, 10.8, 5.8, 1.8, 1.5, 1.9, 0.4, 3.5, 8.5, 11.1, 0.3, 1.2, 4.4, 3.8, 5.8, 1.9, 3.6, 2.5, 4.5, 5.8, 1.5, 0.7, 0.8, 0.1, 9.7, 2.6, 0.8, 1.2, 2.9, 3.0, 3.2, 2.8, 10.9, 0.1, 5.9, 1.4, 0.3, 5.5, 4.8, 0.9, 1.6, and 2.2.

a) Construct a frequency table for the data which will be used to make a histogram. (Remember to define the classes so that there are approximately 5-8 groupings.)

b) Construct a histogram. Use the graph paper.

c) Describe the data distribution in context.

d) Would it make more sense to display the data in a dot (line) plot? Explain.
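One possible way to check a frequency table for the waiting times is sketched below in Python. The class width of 3 minutes (giving 7 classes) is only an assumption made for this example; other widths that give roughly 5-8 classes are just as valid. The function name `frequency_table` is hypothetical.

```python
# Waiting times (minutes) for the 50 sampled customers.
waits = [
    18.5, 9.1, 3.1, 6.2, 1.3, 0.5, 4.2, 5.2, 0.0, 10.8,
    5.8, 1.8, 1.5, 1.9, 0.4, 3.5, 8.5, 11.1, 0.3, 1.2,
    4.4, 3.8, 5.8, 1.9, 3.6, 2.5, 4.5, 5.8, 1.5, 0.7,
    0.8, 0.1, 9.7, 2.6, 0.8, 1.2, 2.9, 3.0, 3.2, 2.8,
    10.9, 0.1, 5.9, 1.4, 0.3, 5.5, 4.8, 0.9, 1.6, 2.2,
]


def frequency_table(data, start, width, n_classes):
    """Count the values that fall in each class [low, low + width)."""
    counts = [0] * n_classes
    for x in data:
        counts[int((x - start) // width)] += 1
    return counts


# Assumed classes: 0 to <3, 3 to <6, ..., 18 to <21 minutes.
table = frequency_table(waits, start=0, width=3, n_classes=7)
for i, count in enumerate(table):
    low = 3 * i
    print(f"{low:>2} to <{low + 3:<2} min: {count}")
```

Because each class includes its lower edge but not its upper edge, a value such as 3.0 is counted only once, which avoids the overlapping-data problem mentioned earlier.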



Creating a histogram on the calculator.

Use the following heights of a group of middle school students as an example.

Step 1: Enter the height data in L1. Carefully check your data.

Commands: LIST

Step 2: Set up the graph. The screens may look a little different from the

pictures below depending on which calculator you use.

Commands: 2nd ; Y= ; 1

Enter ; Use the arrow key to select “On” and press enter. ; Use the arrow key to select the histogram and

press enter

Step 3: Adjust the window (press the window button) to fit the data. Use Xscl to adjust the width of the

bars. I used 10 here because I want to see how many people are in each category of 10 inches. Ymax

controls the height of the bar on the screen. If you skip this step, the calculator will choose numbers for you

and you might not see your graph when you do step 4. If this occurs, after doing step 4, press “zoom” and

choose “zoomstat”. The calculator will choose values for “window” which will make the histogram fit on

your screen.

Commands: Window

Student Height (in.)

Megan 61

Morgan 65

Kyle 70

Darren 71

Angie 76

Cassie 65

Brady 59

Chris 78

Cody 74

Maria 81

Using this window setting, we will have sorted the data into four categories: Heights between 50 and 59 inches, 60 to 69, 70 to 79 and 80 to 89.
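The same grouping can be checked away from the calculator. The Python sketch below assumes bars that are 10 inches wide and start at multiples of 10, which mirrors an Xscl of 10; the dictionary and variable names are illustrative only.

```python
from collections import Counter

# Heights (in inches) from the student table.
heights = {
    "Megan": 61, "Morgan": 65, "Kyle": 70, "Darren": 71, "Angie": 76,
    "Cassie": 65, "Brady": 59, "Chris": 78, "Cody": 74, "Maria": 81,
}

# Round each height down to the nearest multiple of 10 to find its bar,
# then count how many students land in each bar.
bars = Counter((h // 10) * 10 for h in heights.values())

for low in sorted(bars):
    print(f"{low} to {low + 9} inches: {bars[low]}")
```

The count for each bar is the bar's height on the histogram.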



Step 4: Look at your graph.

Commands: Graph

Measures of Center and Spread

In previous courses, you have calculated the center of a set of data. The two measures of center that we are going to

use with quantitative data are mean (arithmetic mean) and median.

Review:

*How do you calculate the mean of a set of data?

*How do you calculate the median of a set of data?

*How do you determine whether the mean or median is the most appropriate measure of center for a set of data?

*Calculate the mean and median for the following set of data. {8, 9, 11, 10, 97}

*Calculate the mean and median for the following set of data. {18, 19, 37, 22}

*Which measure of center is the most appropriate for each set?
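After working the two data sets by hand, you could confirm your arithmetic with a few lines of Python. This sketch uses the standard library's `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

set_a = [8, 9, 11, 10, 97]
set_b = [18, 19, 37, 22]

for name, data in (("Set A", set_a), ("Set B", set_b)):
    # mean: add the values and divide by how many there are.
    # median: the middle value of the sorted data
    # (or the average of the two middle values).
    print(name, "mean =", mean(data), "median =", median(data))
```

Comparing how far apart the mean and median are in each set is one way to start thinking about which measure of center is more appropriate.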

The measure of center can be used to predict a typical value for a set of data. The measure of center does not give us a

complete picture. The measure of spread is needed to predict how consistent the data will be. Sometimes, we expect

the data to be very consistent. Other times, we would expect a fairly large spread.

*Think of a real-world data set that should have a low measure of spread.

*Think of a real-world data set that should have a fairly large measure of spread.

Consider the following test scores:

Student Test 1 Test 2 Test 3 Test 4

Johnny 65 82 93 100

Will 82 86 88 84

Anna 80 99 73 88

1. Who is the best student? How do you know?

2. What is the mean test score for each student?

(Calculator histogram, Step 4) Using the Trace button, you can see that there are 5 people in the 70 to 79 inch category. Use the arrow keys to move from one bar to the next.



3. Based on the mean, who is the best student?

4. If asked to select one student, who would you pick as the best student? Explain.

Investigation 1: Deviation from the Mean

Usually we calculate the mean, or average, test score to describe how a student is doing. Johnny, Will, and

Anna all have the same average. However, these three students do not seem to be “equal” in their test

performance. We need more information than just the typical test score to describe how they are doing.

One thing we can look at is how consistent each student is with their test performance. Does each student

tend to do about the same on each test, or does it vary a lot from test to test? Measures of spread will give

us that information. In statistics, deviation is the amount that a single data value differs from another

value. Often that other value is the mean.

5. Complete the table below by finding the deviation from the mean for each test score for each student.

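Anna

| Test | Score (x) | Mean (x̄) | Deviation from the Mean (x − x̄) |
|---|---|---|---|
| Test 1 | | | |
| Test 2 | | | |
| Test 3 | | | |
| Test 4 | | | |
| Sum of Deviations | | | |

Johnny

| Test | Score (x) | Mean (x̄) | Deviation from the Mean (x − x̄) |
|---|---|---|---|
| Test 1 | | | |
| Test 2 | | | |
| Test 3 | | | |
| Test 4 | | | |
| Sum of Deviations | | | |

Will

| Test | Score (x) | Mean (x̄) | Deviation from the Mean (x − x̄) |
|---|---|---|---|
| Test 1 | | | |
| Test 2 | | | |
| Test 3 | | | |
| Test 4 | | | |
| Sum of Deviations | | | |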

6. How does the sum of the deviations from

the mean relate to the mean being the

measure of center for a set of data?
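Once your tables are filled in, a short Python sketch like the one below can be used to check the deviations. It assumes the test scores from the table of Johnny, Will, and Anna; the function name `deviations` is hypothetical.

```python
# Test scores from the table above.
scores = {
    "Johnny": [65, 82, 93, 100],
    "Will": [82, 86, 88, 84],
    "Anna": [80, 99, 73, 88],
}


def deviations(values):
    """Return each value minus the mean of the values."""
    m = sum(values) / len(values)
    return [x - m for x in values]


for student, tests in scores.items():
    devs = deviations(tests)
    print(student, devs, "sum of deviations =", sum(devs))
```

Look at the sum printed for each student and compare it with what you found by hand.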



Investigation 2: Mean Absolute Deviation

One way to measure consistency is to find the average deviation from the mean. In other words, how far do most

values in a data set fall from the mean? One way to answer this question would be to find the average deviation, or

distance, that the data values fall from the mean. So we would add up the deviations to find the total deviation and

then divide by the number of data values to find the mean deviation. However, the fact that the deviations from the

mean always add up to zero is a problem. No matter what non-zero number we divide zero by, we always get zero!

When talking about spread, a value of zero indicates that there is no spread, or variability. One way to fix this problem

is to look at only the distances from the mean, and not their directions as indicated by the sign of the deviation

(positive or negative). We can take the absolute value of the distances and then find the average distance.
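The idea of averaging the distances can be sketched in Python as follows. This is only an illustration of the description above, using the three students' test scores; the function name `mean_absolute_deviation` is hypothetical.

```python
def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    m = sum(values) / len(values)
    # abs() drops the sign, so only the distance from the mean is kept.
    distances = [abs(x - m) for x in values]
    return sum(distances) / len(distances)


scores = {
    "Johnny": [65, 82, 93, 100],
    "Will": [82, 86, 88, 84],
    "Anna": [80, 99, 73, 88],
}

for student, tests in scores.items():
    print(student, "MAD =", mean_absolute_deviation(tests))
```

A smaller MAD means the scores sit closer to the mean, so it can be read as a measure of consistency.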

7. Calculate the Mean Absolute Deviation (MAD) for each student’s scores.

8. What does the Mean Absolute Deviation (MAD) tell you about each student? Is there one student who seems to be

more consistent than the others?

9. Interpret the MAD for Johnny in context.

Investigation 3: Calculating the Standard Deviation

Below is the formula for calculating the standard deviation. It looks fun, doesn’t it? Let’s break it into parts so we can see how it is finding the “average deviation from the mean.”

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$$

Step 1: In the table below, record the data values (test scores) in the second column labeled “Value.” The x is used to denote a value from the data set. Johnny’s first value is written for you.

Step 2: Record the mean at the bottom of the second column next to the symbol μ. Mu (pronounced “mew”) is the lowercase Greek letter that later became our letter “m.” μ is another symbol that we use for mean (in addition to x̄).

Step 3: In the third column, calculate the deviation from the mean for each test score by taking each test score and subtracting the mean. The first difference has been done for you.

Step 4: Add the values in the third column to find the sum of the deviations from the mean. If you have done everything correctly so far, the sum should be zero. The capital Greek letter Σ, called sigma, is a symbol that is used to indicate the sum.

Step 5: Square each deviation to make it positive and record these values in the last column of the table. The first value is done for you. Remember that parentheses are necessary: −20² = −400 and (−20)² = 400.

Step 6: Find the sum of the squared deviations by adding up the values in the fourth column and putting the sum at the bottom of the column. This is the sum of the squared deviations from the mean.

Step 7: Find the average of the squared deviations from the mean by dividing the sum of column four by the number of data values, n (the number of test scores).

Step 8: “Un-do” the squaring by taking the square root. Now you have found the standard deviation! The symbol for standard deviation is the lowercase Greek letter sigma, σ.
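The steps above can also be followed in a short Python sketch, shown here with Johnny's scores as the example. The function name `population_sd` and the variable names are assumptions made for this illustration; the code divides by n, as the steps describe.

```python
import math


def population_sd(values):
    """Standard deviation found by following Steps 2 through 8."""
    n = len(values)
    mu = sum(values) / n                      # Step 2: the mean
    devs = [x - mu for x in values]           # Step 3: deviations from the mean
    squared = [d ** 2 for d in devs]          # Step 5: squared deviations
    average_squared = sum(squared) / n        # Steps 6-7: average squared deviation
    return math.sqrt(average_squared)         # Step 8: undo the squaring


johnny = [65, 82, 93, 100]
print("Johnny's standard deviation:", round(population_sd(johnny), 2))
```

Use a sketch like this only to check the table you complete by hand; each line matches one or two of the steps.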



Johnny’s Data

| Test | Value (x) | Deviations from the Mean: Value − Mean (x − μ) | Squared Deviations from the Mean: (Value − Mean)² (x − μ)² |
|---|---|---|---|
| 1 | 65 | 65 − 85 = −20 | (−20)² = 400 |
| 2 | | | |
| 3 | | | |
| 4 | | | |
| | Mean μ = | Sum ∑(x − μ) = | Sum ∑(x − μ)² = |

Average of Squared Deviations: ∑(x − μ)² / n =

Square Root of Average of Squared Deviations: √(∑(x − μ)² / n) =

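Will’s Data

| Test | Value (x) | Deviations from the Mean: Value − Mean (x − μ) | Squared Deviations from the Mean: (Value − Mean)² (x − μ)² |
|---|---|---|---|
| 1 | | | |
| 2 | | | |
| 3 | | | |
| 4 | | | |
| | Mean μ = | Sum ∑(x − μ) = | Sum ∑(x − μ)² = |

Average of Squared Deviations: ∑(x − μ)² / n =

Square Root of Average of Squared Deviations: √(∑(x − μ)² / n) =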



10. Translate into words: ∑(x − μ)².

11. Interpret Anna’s standard deviation in context.

12. Who is the best student? How do you know?

Finding the mean, median and standard deviation on the calculator.

Press “List”. Enter Johnny’s grades in List 1, Will’s grades in List 2, and Anna’s grades in List 3.

Once all of the scores are entered, and you have checked to make sure that they are correct, press “2nd”

and “MODE” (QUIT).

Press “2nd” and then “LIST” (STAT). Use the arrow key to select “CALC” and then choose “1-Var Stats”.

Press enter and the screen should say “1-Var Stats”. You now need to choose the list that the calculator

will use. Press “2nd” and then “LIST” and select the list you want to use.

Your screen should now state “1-Var Stats” and either L1, L2, or L3. Press “Enter” and look at the

information.

Normally, we will be using Sx (sample standard deviation) for standard deviation. This is used more frequently because in order to use σx (population standard deviation) we need to have the data for the entire population, and that is not often the case. The sample standard deviation is the larger of the two values because it divides by n − 1 instead of n. In this case, we would use σx because we had all of the students’ scores.
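If you have access to Python instead of a calculator, the standard library's `statistics` module offers both versions: `stdev` for a sample (like Sx) and `pstdev` for a population (like σx). The sketch below uses Johnny's scores; the variable names are illustrative.

```python
from statistics import mean, median, pstdev, stdev

johnny = [65, 82, 93, 100]

print("mean   =", mean(johnny))
print("median =", median(johnny))
# Sample standard deviation: divides by n - 1 (the calculator's Sx).
print("Sx     =", round(stdev(johnny), 2))
# Population standard deviation: divides by n (the calculator's σx).
print("σx     =", round(pstdev(johnny), 2))
```

Running both lets you see that the sample value is the larger of the two for the same data.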

13. Why do you think you were taught about MAD as a measure of spread in previous math courses and

now you’re learning about using standard deviation as the measure of spread when mean is the

measure of center?

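Anna’s Data

| Test | Value (x) | Deviations from the Mean: Value − Mean (x − μ) | Squared Deviations from the Mean: (Value − Mean)² (x − μ)² |
|---|---|---|---|
| 1 | | | |
| 2 | | | |
| 3 | | | |
| 4 | | | |
| | Mean μ = | Sum ∑(x − μ) = | Sum ∑(x − μ)² = |

Average of Squared Deviations: ∑(x − μ)² / n =

Square Root of Average of Squared Deviations: √(∑(x − μ)² / n) =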



Review: Box Plots

Create a box plot for each set of data. Make note of any numbers in the set that you think are outliers. State

the values for the five-number summary (upper quartile, middle quartile, lower quartile, least value, greatest

value) and the values for the measures of spread (range, and interquartile range). Remember to clearly show

your work for each calculation.

IQR & Outliers:

The IQR can be used to determine whether or not the data set includes an outlier.

First, multiply the IQR by 1.5. Second, subtract the product of 1.5 and the IQR from the first quartile. That

number is the lowest number that would not be considered an outlier. Third, add the product of 1.5 and the

IQR to the third quartile. That is the largest number that would not be considered an outlier. In a modified

box plot, only non-outlier data is used to create the box plot. All outlier points are displayed as circles.

As data is added to a set, the IQR and the five-number summary need to be recalculated in order to

determine whether or not the new data set includes an outlier.
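The 1.5(IQR) rule described above can be sketched in Python as shown below. The data set is made up for this illustration and is not one of the workbook problems. The sketch also assumes that quartiles are found as the medians of the lower and upper halves of the sorted data, leaving out the middle value when the number of values is odd; other software may use a slightly different quartile rule. The function names are hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """Return (least, Q1, median, Q3, greatest) using medians of halves."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    # With an odd count, the middle value belongs to neither half.
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return s[0], median(lower), median(s), median(upper), s[-1]


def fences(q1, q3):
    """Smallest and largest values that would NOT be considered outliers."""
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step


example = [2, 4, 5, 6, 6, 8, 20]          # hypothetical data
least, q1, q2, q3, greatest = five_number_summary(example)
low_fence, high_fence = fences(q1, q3)
outliers = [x for x in example if x < low_fence or x > high_fence]

print("five-number summary:", (least, q1, q2, q3, greatest))
print("non-outliers lie between", low_fence, "and", high_fence)
print("outliers:", outliers)
```

If a new value is added to the list, rerun the whole calculation, since the quartiles and the IQR can change.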

4. Calculate the greatest and least values for the data sets in #1 & #2 that would not be considered outliers.

a. What is the smallest value that would not be an outlier for the data in #1?

b. What is the largest value that would not be an outlier for the data in #1?

c. What is the smallest value that would not be an outlier for the data in #2?

d. What is the largest value that would not be an outlier for the data in #2?

1. {3, 8, 6, 8, 11, 9, 8}

Q1:

Q2:

Q3:

Greatest Value:

Least Value:

five-number summary:

Range:

Interquartile Range:

2. {12, 3, 2, 4, 5, 6, 6, 4, 10, 7, 9, 8}

Q1:

Q2:

Q3:

Greatest Value:

Least Value:

five-number summary:

Range:


3. The middle quartile, or second quartile, is also which measure of central tendency?


Create a box plot for each set of data. If the data set has outliers, create a modified box plot. Clearly show

your calculations.

5. {12, 2, 3, 4, 5, 3, 5, 7, 6}

Q1:

Q2:

Q3:

Range:


Greatest Value:

Least Value:

Five-Number Summary:

6. {11, 5, 7, 9, 3, 8, 9}

Q1:

Q2:

Q3:

Range:


Greatest Value:

Least Value:

Five-Number Summary:

8. How would the values for the box plot change for #6 if {2} was added to the data?

9. How would the values for the box plot change for #6 if {12} was added to the data?


10. x is an integer less than 13. What is the largest possible value of x if it is an outlier?

{x, 11, 13, 14, 15, 15, 15, 16, 16, 17, 17}

Show your work and sketch the modified box plot. The largest possible value of x is ______.

7. Which numbers would be considered outliers for #5 and #6?

a. For the data set in #5, outliers would be numbers less than ____________ or greater

than _____________.

b. For the data set in #6, outliers would be numbers less than ____________ or greater

than _____________.



For each set of data, box plot, or five-number summary, calculate the least number that could be in the set that would not be considered an outlier and the greatest number that could be in the set that would not be considered an outlier. State how many outliers the set has (no outliers, exactly # outliers, or at least # outliers).

11. {3, 7, 7, 9, 9, 11, 17}

For this set, outliers would be less than _______ or greater than _______. This data set

has__________________________________ outlier(s). List the numbers in the set that are known

outliers: __________________.

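12. {1, 1, 7, 8, 9, 9, 9, 10, 12, 20}

For this set, outliers would be less than _______ or greater than _______. This data set has__________________________________ outlier(s). List the numbers in the set that are known outliers: __________________.

13. Five-number summary: {1 / 8 / 9 / 10 / 13}

For this set, outliers would be less than _______ or greater than _______. This data set has__________________________________ outlier(s). List the numbers in the set that are known outliers: __________________.

14. Five-number summary: {2 / 12 / 13 / 16 / 22.5}

For this set, outliers would be less than _______ or greater than _______. This data set has__________________________________ outlier(s). List the numbers in the set that are known outliers: __________________.

15. (Box plot drawn on a number line from 1 to 12; figure not reproduced.)

For this set, outliers would be less than _______ or greater than _______. This data set has__________________________________ outlier(s). List the numbers in the set that are known outliers: __________________.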
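When only a five-number summary is given, the least and greatest values are the only data values you know for certain, which is why an answer may be "at least # outliers." The Python sketch below illustrates that reasoning with a made-up summary (not one of the problems above); the function name `known_outliers` is hypothetical.

```python
def known_outliers(summary):
    """Given (least, Q1, median, Q3, greatest), return the fences and
    whichever of the least/greatest values must be outliers."""
    least, q1, _median, q3, greatest = summary
    step = 1.5 * (q3 - q1)
    low_fence, high_fence = q1 - step, q3 + step
    # Only the two extremes are known data values, so only they can be tested.
    known = [v for v in (least, greatest) if v < low_fence or v > high_fence]
    return low_fence, high_fence, known


# Hypothetical five-number summary used only for illustration.
low_fence, high_fence, known = known_outliers((2, 10, 12, 14, 30))
print("outliers are less than", low_fence, "or greater than", high_fence)
print("known outliers:", known, "so the set has at least", len(known))
```

Other values between the extremes and the fences might also be outliers, but a five-number summary alone cannot confirm them.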



Comparing Box Plots

Data sets can be compared based on their four key characteristics: center, spread, shape, and outliers. Depending on

the type of graph, you can also use other characteristics. For example, box plots can be compared based on their lower

and upper quartile values.

Many customers have been complaining about the service that they have received from the waiters at a

restaurant. The manager gave the servers a 120-question test about their job. The scores for the first test

were unacceptable, so the manager had the staff retrained and then gave them a second test. The test

scores are shown in Figure 2.

2. Compare the two box plots.

3. What conclusions would you draw if you were the restaurant’s manager? Create a numbered list.

The head chef has not been happy with the consistency of the other cooks in the kitchen and has been

keeping track of the number of errors they make each day. Figure 3 shows how many errors each of the four

cooks made each day last month.

4. Analyze and compare the box plots.

5. Draw conclusions about the workers based on the box plots. If you were the head chef, what

recommendations would you make to the owner of the restaurant? Create a numbered list.

1. Describe, compare, and contrast the two box plots.

(Box plots of Test Scores for 1st Period and 2nd Period; figures not reproduced.)



6. What information is not presented in the box plots, but would be helpful to better understand the

situation? Make a numbered list.

Comparing Dot Plots

A candy company has two manufacturing plants. Quality control experts randomly sampled bags of candy to

see how the actual weight of the candy compared to the weight listed on the bag. The scale measured the

weight of each bag to the nearest half of a gram.

7. Compare and contrast the data in the two dot plots.

8. Add eight data points to the dot plot for manufacturing plant B so that the data distribution becomes more peaked

and less widely spread.

(Dot plots for Manufacturing Plant A and Manufacturing Plant B; horizontal axis: Actual Weight of the Candy in the Bag (g), marked from 247.0 to 252.0; figures not reproduced.)



Unit 2 Review

1. Is it more appropriate to use standard deviation or IQR with mean?

2. Is it more appropriate to use standard deviation or IQR with median?

3. Answer the questions based on the box-and-whisker plot.

a) What is the median? b) What is lower quartile? c) What is the least value?

d) What is the greatest value? e) What is the upper quartile?

4. The histogram shows last month’s telephone bills for all of the customers in a town.

A) How many customers had bills that were less than $40?

B) Based on the histogram, how many customers does the telephone company have?

5. Which measure of spread is calculated by using absolute values?

6. Which measure of spread is calculated by using square roots?


7. A box plot has a minimum value of 8 and a maximum value of 52. The lower quartile is 22, the

middle quartile is 25, and the upper quartile is 30.

Clearly show your work.

For this set, outliers would be less than _______ or greater than _______.

Are there any outliers?

8. a) Which measure of center is it most appropriate to use to describe a set of data which includes

outliers?

b) The data should be written in numerical order before calculating which measure of center?

9. a) Calculate the median for the following set of data {2, 6, 7, 8, 95}

b) Calculate the median for the following set of data {8, 5, 2, 9, 1, 2, 7, 10}

10. a) Calculate the mean for the following set of data {9, 1, 5}

b) Calculate the mean for the following set of data {2, 7, 1, 3, 7}

11. Two sets of data have the same mean and median. The first set of data has a standard deviation

of 3. The second set of data has a standard deviation of 13. In which set would you expect to find

the data closer to the center? Explain.

12. Quiz scores from Mr. Warner’s 3rd period class.

Quiz Scores

a) What is the median score on the last quiz?

b) Calculate the average (arithmetic mean) score on the last quiz.


13. Calculate the range of the data from the dot plot in #12.

14. A box plot has a minimum value of 8 and a maximum value of 32. The lower quartile is 22, the

middle quartile is 25, and the upper quartile is 30.

a) What is the range?

b) What is the interquartile range?

15. Sketch a dot plot which has a uniform shape.

16. Determine whether each of the following data is categorical (C) or quantitative (Q).

______ the number of minutes each student spends studying math each day

______ students’ favorite classes at school (Math will win!)

______ the high temperatures for Raleigh, NC on 10/21 for each of the past 100 years

______ the types of apples at the grocery store

______ the number of times Bobo says “Sasquatch” on each episode of Finding Bigfoot.

17. Most students studied hard and did very well on the test. A few students procrastinated and

failed the test. The scores were used to create a histogram. Which of the four shapes would best

describe this histogram?

18. Create a box plot for the data. Show your work. {2, 4, 5, 6, 6, 8, 9}

19. Create a dot plot that is mound shaped.

20. What are the four main characteristics that should be addressed when describing the distribution

of data?


Answers:

Pages 3 – 6: Vocabulary

box plot Also called a box-and-whisker plot. A graph of quantitative data built using the five number summary of the data. The graph is made by making vertical lines above the lower quartile, median, and upper quartile on a number line and connecting these vertical lines with horizontal segments to form the box. The minimum and maximum values are represented by dots and connected by segments, or "whiskers", to the box. Box plots can be drawn either horizontally or vertically.

categorical data Data that records qualities or characteristics of an individual, such as gender or eye color. (Also called qualitative data.)

clustering When data seems to be gathered around a particular value or values.

data A collection of information in context.

dot plot A graph of quantitative data using dots placed above a simple number line. If a data value occurs more than once, the dots are stacked on top of each other over that value on the number line.

frequency How often something happens (usually during a period of time).

frequency distribution A table that summarizes the distribution of quantitative data by dividing the data into intervals (also called classes) and counting the number of data values within each interval. This count is called the frequency. A histogram is a graph of a frequency distribution of quantitative data where intervals of data are represented with bars. The widths of the bars are determined by the length of the intervals into which the data has been divided and the heights of the bars are proportional to the class frequencies.

interquartile range A measure of the spread of the middle 50% of a set of quantitative data; the difference between the upper and lower quartiles. IQR = Q3 − Q1

mean A numerical measure of center that is the arithmetic average of the data.

mean absolute deviation (MAD)

A numerical measure of spread that shows how much data values vary from the mean for a quantitative data set. A low mean absolute deviation indicates that the data points tend to be very close to the mean, whereas a high mean absolute deviation indicates that the data points are spread out over a large range of values. The process of calculating the mean absolute deviation involves taking the absolute value of the deviations from the mean.

measures of center Numerical measures that describe the typical value of a quantitative data set. In this unit, we will be studying the mean and the median.

measures of spread Numerical measures that describe how much values typically vary from the center in a quantitative data set. In this unit, we will be studying interquartile range and standard deviation.

http://en.wikipedia.org/wiki/Mean



median A numerical measure of center that describes the middle value of a data set. Note that the median does not have to be one of the values in the data set, but a value that divides the data set in half so that 50% of the data values lie above the median and 50% of the data values fall below the median.

modified box plot A box plot that indicates which data values, if any, are outliers by representing them as dots separate from the box plot. The whisker(s) connect the box to the lowest and/or highest data values that are not outliers, instead of the minimum and/or maximum values.

outlier A data value that does not fit the overall pattern of the data distribution. In the case of one-variable data, an outlier is a value that is more than 1.5 IQR above the third quartile or below the first quartile.

population In statistics, a set of individuals that we wish to describe and/or make predictions about, such as a group of people, a row of plants, a set of batteries, a group of rats, or bacteria placed into test tubes.

quantitative data Data that measures a characteristic of an individual, such as height, weight, or age. (Also called measurement data)

skewed data Data that tends to have a long tail on one side or the other.

standard deviation A numerical measure of spread that shows how much data values vary from the mean for a quantitative data set. A low standard deviation indicates that the data points tend to be very close to the mean, whereas a high standard deviation indicates that the data points are spread out over a large range of values. The process of calculating the standard deviation involves squaring the deviations from the mean.

symmetrical A symmetric distribution is one where the left- and right-hand sides of the distribution are roughly equally balanced around the mean.

variable Characteristic recorded about each individual in a data set. Examples: age, time since last online purchase, zip code, gender, height, hair color, etc.

Page 6:

1. Univariate data is data dealing with one variable. In the last unit of this year, you will learn about bivariate data.

2. Data can be used in two ways.

1) data analysis (used to describe)

A restaurant could use data analysis in order to pay its workers based on how many hours they worked, or use it to try to get people to invest in the business.

2) statistical inference (used to predict)

A restaurant could use statistical inference in order to figure out how much food to order from its vendors, or use it to figure out how many workers it will likely need for each shift.

3. Data can be grouped into two sets: categorical data and quantitative data. Sometimes quantitative data is called numerical data. This can be misleading since numbers are sometimes considered categorical.


Categorical data is data that doesn’t have units associated with it. Examples include Social Security numbers, best-selling dish, and favorite waiter. Even though Social Security numbers are numbers, they don’t have a unit associated with them. It wouldn’t make sense to calculate their mean or median.

Quantitative data is data that is measured in units. Examples include profit in $, and expenses in $. Quantitative data is what we will be focusing on in unit 2.

4. Data can be displayed in graphs. Examples include dot (or line) plots, histograms, box plots, scatter plots, and frequency tables.

Data can also be displayed as numbers. Examples include mean, median, range, standard deviation, and interquartile range (IQR).

5. The four characteristics that should be addressed when dealing with data:

1) measure of center (mean or median)

2) measure of spread (MAD or standard deviation when dealing with the mean, IQR when dealing with median, and range for both median and mean)

3) shape (skewed right, skewed left, mound, and uniform)

4) outliers (obvious outliers as well as statistical outliers which are calculated by using the IQR and a boxplot)

Page 7 1. a) C b) Q c) Q d) Q e) Q f) C g) C h) C i) Q j) C

Pages 8 & 9 1 a.

b. The distribution of ages is mound shaped (symmetrical) and centered around 31 to 40 years. The ages possibly ranged from 1 to 70 years, giving a range of 69. There are no outliers.

c. Answers should vary. One issue with histograms (and box plots) is that the individual data is unknown. Each example should have 1 number between 1-10, 4 numbers between 11-20, 5 numbers between 21-30, 7 numbers between 31-40, 6 numbers between 41-50, 5 numbers between 51-60, and 1 number in the range of 61-70.

[Histogram: horizontal axis “Age (in years)” with bins 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, and 61-70; vertical axis “Frequency” scaled from 0 to 8.]

This histogram could also be drawn by writing 1, 11, 21, 31, 41, 51, 61, and 71 on the horizontal axis. The histogram should have a title of “Restaurant Patrons” or something else appropriate.
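One way to check a proposed answer to 1c is to sort the ages into the same bins as the histogram and compare the counts. The sketch below is only an illustration: the list `ages` is a made-up example (any list with the same bin counts is equally acceptable), and the helper name `bin_label` is an assumption, not something from the unit.

```python
from collections import Counter

# Hypothetical ages, invented only to illustrate one acceptable answer to 1c.
ages = [
    7,
    12, 15, 18, 20,
    22, 24, 27, 29, 30,
    31, 33, 34, 36, 38, 39, 40,
    41, 43, 45, 46, 48, 50,
    52, 54, 55, 58, 60,
    65,
]


def bin_label(age, width=10):
    """Return the histogram bin (for example '31-40') for a whole-number age."""
    low = ((age - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


# Count how many ages fall in each bin.
counts = Counter(bin_label(a) for a in ages)

# Bin counts stated in the answer to 1c.
expected = {
    "1-10": 1, "11-20": 4, "21-30": 5, "31-40": 7,
    "41-50": 6, "51-60": 5, "61-70": 1,
}

matches = dict(counts) == expected
print(matches)
```

Because only the bin counts are known, many different lists would pass this check, which is exactly the limitation of a histogram noted in 1c.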



2) a, b, and c.

Pages 11 – 15.

1. Answers will vary.

2. 85

3. Answers will vary.

4. Answers will vary. Some may look at Johnny’s increasing trend over time. Some may look at Will’s consistency. Some may note Anna’s one low grade. Just using the mean to describe each student is not enough. I think that we can all agree that they are not “equal” in their test performance. We need more information than just the typical test score. One thing to look at is how consistent each student is, and measures of spread will give us that information.

5. The distribution of wait times is skewed right. Most of the wait times fell to the left of 6 minutes, with one extreme value between 18 and 20.9 minutes, which is an outlier. The center is around 3 minutes. The range is 18.5. The title of the histogram should be something like “Waiting Times”. The vertical axis should be labeled “Frequency”.

[Histogram: horizontal axis “Time in minutes”.]

d) A dot plot would be more difficult to make and not as useful. There are a lot of different data points, so a dot plot would have to have all numbers, by tenths, from 0 to 18.5. A dot plot with 186 possible data points would be quite large. Also, it isn’t useful for the context of the problem. The difference between a customer waiting 1.2 minutes and 1.3 minutes is a matter of 6 seconds. This is fairly minimal.

Johnny              Score (x)   Mean (x̄)    Deviation from the Mean (x − x̄)
Test 1              65          85          -20
Test 2              82          85          -3
Test 3              93          85          8
Test 4              100         85          15
Sum of Deviations                           0

Will                Score (x)   Mean (x̄)    Deviation from the Mean (x − x̄)
Test 1              82          85          -3
Test 2              86          85          1
Test 3              88          85          3
Test 4              84          85          -1
Sum of Deviations                           0



Anna                Score (x)   Mean (x̄)    Deviation from the Mean (x − x̄)
Test 1              80          85          -5
Test 2              99          85          14
Test 3              73          85          -12
Test 4              88          85          3
Sum of Deviations                           0

6. The mean is the balance point, so the distances between the mean and each point balance in either direction.

7. Johnny’s MAD is 11.5. Will’s MAD is 2. Anna’s MAD is 8.5.

8. The MAD tells us the average deviation from the mean for each student, telling us how much their test scores typically vary from their average test score. Will’s MAD is the lowest, indicating that his test scores are the least spread out.

9. Johnny’s test scores typically fall within 11-12 points of his average score of 85. So Johnny’s scores typically vary anywhere from 74 to 96.
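The answers to 6 through 9 can be checked with a short calculation. The sketch below uses the test scores from the tables above; the function names (`mean`, `deviations`, `mean_absolute_deviation`) are illustrative choices, not part of the unit.

```python
# Test scores taken from the deviation tables above.
scores = {
    "Johnny": [65, 82, 93, 100],
    "Will": [82, 86, 88, 84],
    "Anna": [80, 99, 73, 88],
}


def mean(values):
    """Arithmetic average of the values."""
    return sum(values) / len(values)


def deviations(values):
    """Each value minus the mean of the values."""
    m = mean(values)
    return [x - m for x in values]


def mean_absolute_deviation(values):
    """Average of the absolute values of the deviations from the mean."""
    return mean([abs(d) for d in deviations(values)])


for name, values in scores.items():
    # Prints the name, the mean, the sum of deviations, and the MAD.
    print(name, mean(values), sum(deviations(values)), mean_absolute_deviation(values))
```

Each student has a mean of 85 and deviations that sum to 0, which is the "balance point" idea in answer 6. The MAD values come out to 11.5, 2, and 8.5, matching answer 7.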

Standard deviation tables.

Johnny’s Data

Test   Value (x)   Deviation from the Mean             Squared Deviation from the Mean
                   Value – Mean (x − μ)                (Value – Mean)²  (x − μ)²
1      65          65 − 85 = -20                       (-20)² = 400
2      82          82 − 85 = -3                        (-3)² = 9
3      93          93 − 85 = 8                         (8)² = 64
4      100         100 − 85 = 15                       (15)² = 225

Mean: μ = 85       Sum: ∑(x − μ) = 0                   Sum: ∑(x − μ)² = 698

Average of Squared Deviations: ∑(x − μ)² / n = 698 / 4 = 174.5

Square Root of Average of Squared Deviations: √( ∑(x − μ)² / n ) = √174.5 ≈ 13.2



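Will’s Data

Test   Value (x)   Deviation from the Mean             Squared Deviation from the Mean
                   Value – Mean (x − μ)                (Value – Mean)²  (x − μ)²
1      82          -3                                  9
2      86          1                                   1
3      88          3                                   9
4      84          -1                                  1

Average of Squared Deviations: ∑(x − μ)² / n = 20 / 4 = 5

Square Root of Average of Squared Deviations: √( ∑(x − μ)² / n ) = √5 ≈ 2.2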

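Anna’s Data

Test   Value (x)   Deviation from the Mean             Squared Deviation from the Mean
                   Value – Mean (x − μ)                (Value – Mean)²  (x − μ)²
1      80          -5                                  25
2      99          14                                  196
3      73          -12                                 144
4      88          3                                   9

Average of Squared Deviations: ∑(x − μ)² / n = 374 / 4 = 93.5

Square Root of Average of Squared Deviations: √( ∑(x − μ)² / n ) = √93.5 ≈ 9.7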
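The three standard deviation tables follow the same steps, so they can be sketched as one small function. The name `population_sd` is an illustrative choice. It divides by n, as the tables do, and is compared against `statistics.pstdev` from Python's standard library, which uses the same divide-by-n definition.

```python
import math
import statistics


def population_sd(values):
    """Standard deviation computed the way the tables do (dividing by n)."""
    mu = sum(values) / len(values)                  # mean
    squared = [(x - mu) ** 2 for x in values]       # squared deviations
    average_squared = sum(squared) / len(values)    # average of squared deviations
    return math.sqrt(average_squared)               # square root of that average


johnny = [65, 82, 93, 100]
will = [82, 86, 88, 84]
anna = [80, 99, 73, 88]

for name, values in [("Johnny", johnny), ("Will", will), ("Anna", anna)]:
    # The hand-built function and the library function should agree.
    print(name, round(population_sd(values), 1), round(statistics.pstdev(values), 1))
```

Note that some calculators and software report the sample standard deviation instead, which divides by n − 1 (`statistics.stdev` in Python) and gives slightly larger values than the ones in these tables.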

10. The sum of the squared deviations from the mean. In other words, subtract the mean from each value, square these differences and add them up.

11. Anna’s test grades typically vary by about ten points from her average score of 85. So her test scores typically vary from about 75 to 95.



12. Answers will vary. Will is the most consistent as his scores vary by about 2.2 points from the mean of 85 points.

13. Answers will vary. The measure of center only gives us a one-number summary of a data set. It doesn’t give us an idea of how much the numbers in the set vary from the center. The measure of spread gives us an idea of how far from the center we can expect to find the data. MAD is fairly easy to calculate, but it isn’t as accurate as standard deviation, especially when outliers are in a set of data. However, standard deviations involve square roots and they are often irrational numbers, so MAD is used as the measure of spread until students have learned about square roots and irrational numbers.

Pages 16 – 18

1. 3, 6, 8, 8, 8, 9, 11

Q1: 6 Q2: 8 Q3: 9 Greatest Value: 11 Least Value: 3 Five-number summary: 3/6/8/9/11 Range: 11 – 3 = 8 Interquartile Range: 9 – 6 = 3

2. 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 10, 12

Q1: (4+4)/2 = 4 Q2: (6+6)/2 = 6 Q3: (8+9)/2 = 8.5 Greatest Value: 12 Least Value: 2 Five-number summary: 2/4/6/8.5/12 Range: 12 – 2 = 10 Interquartile Range: 8.5 – 4 = 4.5

3. The middle quartile is also the median.

4. a. 6 – 1.5(3) = 1.5 b. 9 + 1.5(3) = 13.5 c. 4 – 1.5(4.5) = –2.75 d. 8.5 + 1.5(4.5) = 15.25
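The four calculations in answer 4 all apply the 1.5 × IQR rule from the vocabulary list. A minimal sketch of that rule is below; the function names `outlier_fences` and `find_outliers` are illustrative, and the quartiles are passed in by hand from answers 1 and 2.

```python
def outlier_fences(q1, q3):
    """Return the lower and upper cutoffs: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """List the values that fall below the lower cutoff or above the upper cutoff."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]


# Data set 1 (Q1 = 6, Q3 = 9) and data set 2 (Q1 = 4, Q3 = 8.5).
data_1 = [3, 6, 8, 8, 8, 9, 11]
data_2 = [2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 10, 12]

print(outlier_fences(6, 9))      # cutoffs for data set 1
print(outlier_fences(4, 8.5))    # cutoffs for data set 2
print(find_outliers(data_1, 6, 9), find_outliers(data_2, 4, 8.5))
```

The cutoffs are 1.5 and 13.5 for the first data set and –2.75 and 15.25 for the second, as in answer 4. Every value in both sets lies between its cutoffs, so neither set has an outlier.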

5. 2, 3, 5, 5, 6, 6, 8, 9, 12

Q1: (3+5)/2 = 4 Q2: 6 Q3: (8+9)/2 = 8.5 Range: 12 – 2 = 10 Interquartile Range: 8.5 – 4 = 4.5 Greatest Value: 12 Least Value: 2 5-Number Summary: 2/4/6/8.5/12

6. 3, 5, 7, 8, 9, 9, 11

Q1: 5 Q2: 8 Q3: 9 Range: 11 – 3 = 8 Interquartile Range: 9 – 5 = 4 Greatest Value: 11 Least Value: 3 5-Number Summary: 3/5/8/9/11
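The five-number summaries in answers 1, 2, 5, and 6 all use the same method: find the median, then find the median of the values below it (Q1) and the median of the values above it (Q3), leaving the median itself out of both halves when the count is odd. The sketch below follows that method; the function names are illustrative.

```python
def median(sorted_values):
    """Middle value of an already sorted list (average of the two middle values if even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(data):
    """Return (least value, Q1, median, Q3, greatest value)."""
    values = sorted(data)
    n = len(values)
    lower = values[: n // 2]           # values below the median position
    upper = values[(n + 1) // 2:]      # values above the median position
    return (values[0], median(lower), median(values), median(upper), values[-1])


data_5 = [2, 3, 5, 5, 6, 6, 8, 9, 12]
data_6 = [3, 5, 7, 8, 9, 9, 11]

for data in (data_5, data_6):
    least, q1, q2, q3, greatest = five_number_summary(data)
    # Prints the five-number summary, then the range and the interquartile range.
    print((least, q1, q2, q3, greatest), greatest - least, q3 - q1)
```

Calculators, spreadsheets, and statistics software sometimes use other conventions for locating quartiles, so their Q1 and Q3 may differ slightly from the values found by this method.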

7. a) –2.25 ; 11.75 b) –1 ; 15

8. Since 2 would be the new “least value”, the range will increase. The interquartile range will either increase or stay the same. The quartiles will either decrease or stay the same. {2, 3, 5, 7, 8, 9, 9, 11}

Range: 11 – 2 = 9 ; Least value: 2 ; Q1: (3+5)/2 = 4 ; Q2: (7+8)/2 = 7.5 ; Q3: (9+9)/2 = 9 ; IQ range: 9 – 4 = 5

9. Since 12 would be the new “greatest value”, the range will increase. The interquartile range and the quartiles will either increase or stay the same. {3, 5, 7, 8, 9, 9, 11, 12}

Range: 12 – 3 = 9 ; Greatest value: 12 ; Q1: (5+7)/2 = 6 ; Q2: (8+9)/2 = 8.5 ; Q3: (9+11)/2 = 10 ; IQ range: 10 – 6 = 4


10. 13 – 1.5(3) = 8.5. The largest possible value of x is 8.

11. 1 ; 17 ; no outliers ; none

12. 2.5; 14.5 ; exactly 4 ; 1, 1, 12, 20

13. 4.5; 13.5 ; at least 2 ; 1, 13

14. 6; 22; at least 2 ; 2, 22.5

15. 2; 10; at least 1 ; 1, 2

Pages 19 & 20

1. Answers may vary. Both sets of data have the same interquartile range (94 – 84 = 10 ; 96 – 86 = 10) and the same range (96 – 80 = 16 ; 100 – 84 = 16). 25% of the scores in 1st period were at or above 94. 50% of the scores in 2nd period were at or above 94. The lowest score in 2nd period was the same as the lower quartile in 1st period, so 25% of the scores from 1st period were less than or equal to the lowest score in 2nd period. The highest score in 1st period was the same as the upper quartile in 2nd period, so 25% of the scores in 2nd period were greater than or equal to the highest score in 1st period. Overall, the students in 2nd period earned higher scores than the students in 1st period.

2. Answers may vary. When comparing box plots, focus on the range, IQR, and the 5-number summary.

3. Answers may vary.

4. Answers may vary. When comparing box plots, focus on the range, IQR, and the 5-number summary.



7. Answers may vary. The data from Plant A is much more peaked and less widely spread than the data from Plant B. Both dot plots are mound-shaped. Both have a center of around 250 grams. Neither set of data has outliers.

8. The 8 points should be added in the 249.5 to 250.5 range.
