Data Analysis Using Dot Plots, Measures of Central Tendency, and Interquartile Range - 7.SP.3,4 suzanne fox Say Thanks to the Authors Click http://www.ck12.org/saythanks (No sign in required)



To access a customizable version of this book, as well as other interactive content, visit www.ck12.org

CK-12 Foundation is a non-profit organization with a mission to reduce the cost of textbook materials for the K-12 market both in the U.S. and worldwide. Using an open-content, web-based collaborative model termed the FlexBook®, CK-12 intends to pioneer the generation and distribution of high-quality educational content that will serve both as core text as well as provide an adaptive environment for learning, powered through the FlexBook Platform®.

Copyright © 2012 CK-12 Foundation, www.ck12.org

The names “CK-12” and “CK12” and associated logos and the terms “FlexBook®” and “FlexBook Platform®” (collectively “CK-12 Marks”) are trademarks and service marks of CK-12 Foundation and are protected by federal, state, and international laws.

Any form of reproduction of this book in any format or medium, in whole or in sections must include the referral attribution link http://www.ck12.org/saythanks (placed in a visible location) in addition to the following terms.

Except as otherwise noted, all CK-12 Content (including CK-12 Curriculum Material) is made available to Users in accordance with the Creative Commons Attribution/Non-Commercial/Share Alike 3.0 Unported (CC BY-NC-SA) License (http://creativecommons.org/licenses/by-nc-sa/3.0/), as amended and updated by Creative Commons from time to time (the “CC License”), which is incorporated herein by this reference.

Complete terms can be found at http://www.ck12.org/terms.

Printed: March 10, 2013

AUTHOR: Suzanne Fox

CONTRIBUTOR: CK-12


CONCEPT 1 Data Analysis Using Dot Plots, Measures of Central Tendency, and Interquartile Range - 7.SP.3,4

Students will learn several methods of describing data and data relationships through dot plots and box-and-whisker plots. Examining the data through measures of central tendency and quartiles will show relationships within data and between data sets.

Dot Plots

A dot plot consists of a horizontal scale (a number line) on which dots are placed to show the numerical values of the data. If a data value repeats, the dots are piled up at that location, one dot for each repetition. We say that the dot plot displays the distribution of the data.

A dot plot is easier to read than a long list of numbers. By examining the dot plot in the gray box, you can see that the numbers 25 and 40 are repeated the most and that the number 50 is repeated the least. You can also count very easily how many times each number is repeated. A dot plot is a good way of organizing numbers and values.
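The piling-up of dots can be sketched in a few lines of Python. This is only an illustration: the function name `text_dot_plot`, the `step` parameter, and the data values are assumptions made for the example (the data is chosen so that 25 and 40 repeat the most and 50 the least, but it is not the data from the figure). The sketch also assumes whole-number data that lands on the scale.

```python
from collections import Counter

def text_dot_plot(values, step=1):
    """Build a text dot plot: a number line with one '*' stacked per repetition.

    Assumes integer values that fall on the scale min, min + step, ..., max.
    """
    counts = Counter(values)
    scale = list(range(min(values), max(values) + 1, step))
    width = max(len(str(v)) for v in scale)
    tallest = max(counts.values())
    rows = []
    # Draw from the top row down so the dots pile up above the number line.
    for level in range(tallest, 0, -1):
        cells = [("*" if counts[v] >= level else " ").center(width) for v in scale]
        rows.append(" ".join(cells).rstrip())
    # The last row is the horizontal scale itself.
    rows.append(" ".join(str(v).center(width) for v in scale))
    return "\n".join(rows)

# Hypothetical data, not the values behind the figure in the text.
data = [25, 40, 30, 25, 40, 35, 25, 50, 40, 30, 35]
print(text_dot_plot(data, step=5))
```

Values with no data (45 in this made-up set) still get a place on the scale, which is what lets a reader see gaps in the distribution.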


Here are two dot plots. Students were surveyed and asked what they thought their grade would be on a math exam. The results of the survey are in green. Then the students took the math test and the scores were plotted on the top graph in purple. The blue line is the halfway point between the scores. What conclusions can we make based on looking at the two dot plots?

1. The students tended to think that they would score lower than they actually did.
2. Counting the dots, more people scored above 15, whereas in the survey more students thought they would score below 15.
3. In the survey, only 14 students thought they would score above 25. On the actual test, 24 people scored above 25.
4. On the actual math test, scores were spread out a lot more than in the survey scores.

These are only a few of the comparisons you could make by looking at the dot plots.

Watch the video tutorial on dot plots at http://stattrek.com/videos/ap/lessons/charts/2b/ap-2b.aspx to help you understand what a dot plot is, what it is used for, and how to make a dot plot.

Example 1

The following tables show the number of cars sold each month in one year for two separate dealers. Create a dot plot for each of the following tables of data using the same grid. Give three inferences based on the graphs and how they overlap.


First we look at the lowest and highest number of cars sold for both dealers because the graph must be able to handle all the data. The lowest number is 23 and the highest is 38. The horizontal axis is made to include 23 - 38. The vertical axis is left to increase by 1’s.

Plot the individual data points for each of the tables. Use two colors to represent the two dealers. Make sure you use a key.

Three inferences based on the dot plots of both sets of data are:

1. Mac’s sold more cars than Sid’s.
2. Mac and Sid had three months where they sold the same number of cars.
3. The maximum number of cars sold in one month by Sid was 35 and the maximum number of cars sold by Mac was 38.

Create your own dot plots with the applet from http://www.shodor.org/interactivate/activities/PlopIt/. Just enter your data and see what the dot plot looks like. Then, as a review, watch the video on dot plots at http://stattrek.com/statistics/charts/dot-plot.aspx

Measures of Central Tendency

Measures of central tendency describe the center of a data set. The range is listed with them here, although it describes how spread out the data is rather than where its center lies.

• Mean is the average of all the data. Its symbol is x̄.
• Mode is the data value appearing most often in the data set.
• Median is the middle value of the data set, arranged in order from least to greatest.
• Range is the difference between the greatest and least value of the data set.

To help understand what measures of central tendency tell us, let’s find the mean, median, mode, and range of thisset of data.

Mrs. Kramer collected the scores from her students’ test and obtained the following data:

90, 76, 53, 78, 88, 80, 81, 91, 99, 68, 62, 78, 67, 82, 88, 89, 78, 72, 77, 96, 93, 88, 88


• To find the mean, add all the values and divide by the number of values you added.

mean = 1,862 ÷ 23 ≈ 80.96

• To find the mode, look for the value(s) repeating the most. Here 88 appears four times.

mode = 88

• To find the median, organize the data from least to greatest. Then find the middle value. There are 23 scores, so the middle value is the 12th.

53, 62, 67, 68, 72, 76, 77, 78, 78, 78, 80, 81, 82, 88, 88, 88, 88, 89, 90, 91, 93, 96, 99

median = 81

• To find the range, subtract the lowest value from the highest value.

range = 99 − 53 = 46

When a data set has two modes, it is bimodal.

If the data does not have a “middle value,” the median is the average of the two middle values. This occurs when data sets have an even number of entries.
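The four calculations for Mrs. Kramer’s scores can be checked with Python’s standard `statistics` module. This is a sketch for checking the arithmetic, not part of the original lesson. The variable names are illustrative, and the two short lists at the end are made-up data used only to show the bimodal and even-count cases.

```python
import statistics

# Mrs. Kramer's 23 test scores, as listed in the text.
scores = [90, 76, 53, 78, 88, 80, 81, 91, 99, 68, 62, 78,
          67, 82, 88, 89, 78, 72, 77, 96, 93, 88, 88]

mean = statistics.mean(scores)          # sum of the values / number of values
median = statistics.median(scores)      # middle value of the sorted data
modes = statistics.multimode(scores)    # every value tied for most frequent
data_range = max(scores) - min(scores)  # greatest value minus least value

print(round(mean, 2))  # 80.96
print(median)          # 81
print(modes)           # [88]
print(data_range)      # 46

# Made-up data: two modes (bimodal) and an even number of entries.
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]
print(statistics.median([1, 2, 3, 4]))        # 2.5, the average of 2 and 3
```

`multimode` returns a list, so a bimodal data set shows both modes instead of only one of them.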

Which Measure Is Best?

While the mean, mode, and median represent centers of data, one is usually more helpful than another when talking about a particular data set.

For example, if the data has a wide range (a big difference between the greatest and least value) because of a few extreme values, the median (middle value) is a better choice to describe the center than the mean. If one value occurs again and again, then the mode is a better way to describe the data.

• The price of houses in a certain part of the country is often described using the median. This is because just a couple of very high prices in a region can pull the mean up and make it seem like all the houses cost more than they do.

• If a sandwich shop sold ten different sandwiches, the mode (the value that occurs most often) would be useful to describe the favorite sandwich.

Example 2

Find the mean, median, and range of the salaries given below. Which measure of central tendency is the best to describe this set of data? Justify your answer.


TABLE 1.1:

Professional Realm                 Annual Income
Farming, Fishing, and Forestry     $19,630
Sales and Related                  $28,920
Architecture and Engineering       $56,330
Healthcare Practitioners           $49,930
Legal                              $69,030
Teaching & Education               $39,130
Construction                       $35,460
Professional Baseball Player*      $2,476,590

(Source: Bureau of Labor Statistics, except (*) - The Baseball Players’ Association (playbpa.com)).

Find the mean, median, mode, and range.

Mean (average):

x̄ = (19,630 + 28,920 + 56,330 + 49,930 + 69,030 + 39,130 + 35,460 + 2,476,590) / 8

x̄ = 2,775,020 / 8

x̄ = 346,877.50

mean = 346,877.50

Median (middle):

Arrange the salaries from least to greatest.

19,630 / 28,920 / 35,460 / 39,130 / 49,930 / 56,330 / 69,030 / 2,476,590

There are eight salaries, so there is no single middle value. Take the two closest to the middle and average them together.

(39,130 + 49,930) / 2 = 44,530

median = 44,530

Mode (most):

There is no value that occurs the most. Each salary occurs one time in the chart.

There is no mode.

Range:

2,476,590 − 19,630 = 2,456,960

The mean is $346,877.50. The median is $44,530. There is no mode. The range is $2,456,960.

The median is the best measure to describe the data. This is because one salary is far larger than the rest, which gives the data such a wide range. The mean would make people think that a typical professional makes about $346,877.50. Most professionals do not.
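One way to see why the median holds up better here is to drop the largest salary and watch how far each measure moves. The sketch below uses the eight salaries from Table 1.1. Removing the top value is an illustration added here, not a step from the original solution, and the variable names are assumptions.

```python
import statistics

# Annual incomes from Table 1.1, in dollars.
salaries = [19630, 28920, 56330, 49930, 69030, 39130, 35460, 2476590]

# The same list without the largest value (the professional baseball player).
without_top = sorted(salaries)[:-1]

median_all = statistics.median(salaries)         # 44530.0
median_without = statistics.median(without_top)  # 39130

# How far each measure moves when the one extreme salary is removed.
mean_shift = statistics.mean(salaries) - statistics.mean(without_top)
median_shift = median_all - median_without

print(f"mean moves by   ${mean_shift:,.2f}")
print(f"median moves by ${median_shift:,.2f}")
```

The median moves by a few thousand dollars while the mean moves by hundreds of thousands, which is the sense in which the median is the better description of this data.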

Find more examples and a video on measures of central tendency at http://www.sophia.org/mean-median-mode-and-range-tutorial.

Mean Absolute Deviation

The mean absolute deviation of a set of data is the average distance between each data value and the mean. The mean is the average of the data. Absolute means without using negatives. Deviation means to move away from.

So mean absolute deviation is how far we move away from the average of the data without paying attention to the direction.

This may seem a bit confusing, so let’s look at it using a dot plot.

Looking at the dot plot, there are 8 values marked with an X, and the mean of the 8 values is shown with the arrow. The first step in finding the mean absolute deviation is to see how far each value (X) is from the mean. The distance is marked for each value on the dot plot. Notice that it does not matter if the distance is to the left or right. That is what the absolute value means.

Now we find the average of the distances. Add up the distances and divide by how many numbers we have.

8 + 4 + 2 + 1 + 2 + 3 + 4 + 6 = 30 is the sum of the distances.

30 ÷ 8 = 3.75 is the total divided by how many values we have.

This makes the mean absolute deviation 3.75. This means that the average distance between each value and the mean is 3.75.

You can also find the mean absolute deviation for two sets of data and compare them. When you do this, it helps to see how close the actual data points are to the mean. It can also tell if the mean is a good representation of the data. The chart below represents the heights of two basketball teams. Let’s calculate the mean absolute deviation of each team and see how this measure of spread can help us predict the team with the better chance of winning based on their heights.


Find the mean of each set of data.

Team 1:

x̄ = (70 + 74 + 72 + 70 + 72 + 67 + 70 + 65 + 68 + 66) / 10

x̄ = 694 / 10

x̄ = 69.4

Team 2:

x̄ = (73 + 76 + 68 + 65 + 70 + 73 + 74 + 72 + 68 + 68) / 10

x̄ = 707 / 10

x̄ = 70.7

Now we find the difference between the mean and each data value (height). Because it is absolute value, there are no negative signs. The new chart shows the absolute difference.

Finally, to find the mean absolute deviation, we find the averages of the two new sets of data.

Team 1 mean absolute deviation:

MAD = (0.6 + 4.6 + 2.6 + 0.6 + 2.6 + 2.4 + 0.6 + 4.4 + 1.4 + 3.4) / 10

MAD = 23.2 / 10

MAD = 2.32

Team 2 mean absolute deviation:

MAD = (2.3 + 5.3 + 2.7 + 5.7 + 0.7 + 2.3 + 3.3 + 1.3 + 2.7 + 2.7) / 10

MAD = 29 / 10

MAD = 2.9


Looking at the mean absolute deviation for each team, a deviation of 2 - 3 inches is considered a moderate variation. This means that there is some difference but not too much among the heights of each team. There is more of a difference in player heights for team 2 because their MAD is greater.
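The three steps (find the mean, find each distance from the mean, average the distances) can be written as one short function. This is a minimal sketch. The function name `mean_absolute_deviation` is chosen for the example, and only the dot plot distances and Team 2’s heights from the text are used.

```python
def mean_absolute_deviation(values):
    """Average distance between each value and the mean of the values."""
    mean = sum(values) / len(values)
    # abs() drops the sign, so distances left and right of the mean count the same.
    distances = [abs(v - mean) for v in values]
    return sum(distances) / len(distances)

# Dot plot example: the eight distances were already read off the plot.
plot_distances = [8, 4, 2, 1, 2, 3, 4, 6]
print(sum(plot_distances) / len(plot_distances))  # 3.75

# Team 2 heights in inches, from the text.
team_2 = [73, 76, 68, 65, 70, 73, 74, 72, 68, 68]
print(round(mean_absolute_deviation(team_2), 2))  # 2.9
```

The same function can be called on any other list of heights to compare how spread out two groups are.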

Want to watch a rap about MAD? Go to http://www.youtube.com/watch?v=nRKJGDHgTK4 and enjoy. Then get serious and watch http://www.mrmaisonet.com/index.php?/Statistics-Video/Mean-Absolute-Deviation.html

Interquartile Range

If you are told that you are between the 25th percentile and the 75th percentile of your class in height, would you know how to find the other people who are in this range as well? In statistics this is called finding the interquartile range.

Inter means between, quartile means divided into 4 parts, and range is the difference between the greatest and least data value. That means that the interquartile range is the difference between the number at the 75th percentile (the 3rd quartile) and the number at the 25th percentile (the 1st quartile), like a mini range. Let’s make this clear using the following set of random data. We have already arranged the data from least to greatest.

To make our four groups we first find the median. Remember that the median is the middle number in the data. The middle is also the 50th percentile. Think: 50% is half! Once we have the data split into two halves, we split those halves in half. This will make 4 equal parts.

If we wanted to find the 1st quartile, we would average 22 and 24 to get 23. 23 is the 1st quartile.

If we wanted to find the 3rd quartile, we would average 41 and 45 to get 43. 43 is the 3rd quartile.

The interquartile range is then 43 − 23 = 20.

Example 3

Given the following set of data:

a. Determine the numbers at each of the quartiles and the interquartile range.
b. Determine the number that is at the 40th percentile.

a. To find the quartiles, put the numbers in order from least to greatest and find the median.


Now break up the two remaining parts into two smaller parts.

Notice how everything is labeled. The interquartile range is the range between the 25th and 75th percentiles. 88 − 72 = 16. The interquartile range is 16.

Learn more about quartiles and interquartile range at http://www.sophia.org/range-and-interquartile-range-iqr/range-and-interquartile-range-iqr–2-tutorial.

Box and Whiskers Plot

Consider the following list of numbers: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

The median is the middle value. There are 10 values, so the median lies halfway between the 5th and the 6th value. The median is therefore 5.5. This splits the list cleanly into two halves.

The lower list is: 1, 2, 3, 4, 5

And the upper list is: 6, 7, 8, 9, 10

The median of the lower half is 3. The median of the upper half is 8. These numbers, together with the median, cut the list into four quarters. These are the quartiles (the 25th, 50th, and 75th percentiles). A box-and-whisker plot is formed by placing vertical lines at five positions, corresponding to the smallest value, the first quartile, the median, the third quartile and the greatest value. So far this is just like the lines in our Example 3. Now, a box is drawn between the positions of the first and third quartiles, and horizontal line segments (the whiskers) connect the box with the two extreme values.

The box-and-whisker plot for the integers 1 through 10 is shown below.

With a box-and-whisker plot, you can see the distance from the first quartile to the third quartile. This is another way to see the inter-quartile range. A box-and-whisker plot shows the measure of the spread of the middle half of the data.
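The five positions of the box-and-whisker plot can be computed with the same median-of-each-half method used above. This is a sketch under stated assumptions: the function name `five_number_summary` is illustrative, and for a list with an odd number of values it leaves the middle value out of both halves, which is one common convention that the text does not address.

```python
import statistics

def five_number_summary(values):
    """Return (lowest, first quartile, median, third quartile, highest).

    Quartiles are the medians of the lower and upper halves of the sorted data.
    Assumption: with an odd count, the middle value belongs to neither half.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to form two halves")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return (ordered[0],
            statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper),
            ordered[-1])

lowest, q1, median, q3, highest = five_number_summary(range(1, 11))
print(lowest, q1, median, q3, highest)  # 1 3 5.5 8 10
print(q3 - q1)                          # 5, the inter-quartile range (width of the box)
print(highest - lowest)                 # 9, the range (whisker tip to whisker tip)
```

Statistics libraries often use slightly different quartile rules (interpolating between positions), so their results can differ a little from this hand method on small data sets.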


Example 4

Harika is rolling 3 dice and adding the numbers together. She records the total score for each of 50 rolls, and the scores she gets are shown below. Display the data in a box-and-whisker plot, and find both the range and the inter-quartile range.

9, 10, 12, 13, 10, 14, 8, 10, 12, 6, 8, 11, 12, 12, 9, 11, 10, 15, 10, 8, 8, 12, 10, 14, 10, 9, 7, 5, 11, 15, 8, 9, 17, 12, 12, 13, 7, 14, 6, 17, 11, 15, 10, 13, 9, 7, 12, 13, 10, 12

Solution

First we’ll put the list in order. Since there are 50 data points, the median will be the mean of the 25th and 26th values. The median will split the data into two lists of 25 values; we can write them as two distinct lists.

5, 6, 6, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10

11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 15, 15, 15, 17, 17

Since each sub-list has 25 values, the first and third quartiles of the entire data set can be found from the median of each smaller list. With 25 values, the median has 12 numbers on either side of it, so each quartile is the 13th number in its sub-list: 9 in the lower list and 12 in the upper list.

From the ordered list we can see the five number summary:

• The lowest value is 5
• The first quartile is 9
• The median is 10.5
• The third quartile is 12
• The highest value is 17

The box-and-whisker plot therefore looks like this:

FIGURE 1.1

The range is given by subtracting the smallest value from the largest value: 17 − 5 = 12.

The inter-quartile range is given by subtracting the first quartile from the third quartile: 12 − 9 = 3.

Representing Outliers in a Box-and-Whisker Plot

Box-and-whisker plots can be misleading if we don’t take outliers into account. An outlier is a data point that does not fit well with the other data in the list. For box-and-whisker plots, we can define which points are outliers by how far they are from the box part of the diagram.
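The text does not say how far from the box a point must be to count as an outlier. A widely used convention, assumed here and not taken from this lesson, is to flag any point more than 1.5 times the inter-quartile range below the first quartile or above the third quartile. The sketch below applies that convention; the function name `find_outliers` and the times are made up for illustration.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the box.

    Assumption: k = 1.5 is a common convention, not a rule stated in the text.
    Quartiles are the medians of the lower and upper halves of the sorted data.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[len(ordered) - half:])
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [v for v in ordered if v < low_fence or v > high_fence]

# Hypothetical obstacle course times in seconds.
times = [90, 105, 110, 120, 125, 130, 140, 150, 160, 310]
print(find_outliers(times))  # [310]
```

For this made-up data the box runs from 110 to 150, so the fences sit at 50 and 210, and only the 310-second time falls outside them.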

Example 5

The box-and-whisker plots below represent the times taken by a school class to complete an obstacle course. The times have been separated into boys and girls. The boys and the girls each think that they did best.

a. Give the following information for both the boys and the girls:


• lowest value
• first quartile
• median
• third quartile
• highest value

b. Give a reason why the boys or girls think they did the best.

FIGURE 1.2

Comparing two sets of data with a box-and-whisker plot is relatively straightforward. For example, you can see that the data for the boys is more spread out, both in terms of the range and the inter-quartile range.

The five number summary for each is shown in the table below.

TABLE 1.2:

                  Boys    Girls
Lowest value      1:30    1:40
First Quartile    2:00    2:30
Median            2:30    2:55
Third Quartile    3:30    3:20
Highest value     5:10    4:10

Here are some points each side could use in their argument:

Boys:

• The boys had the fastest time (1 minute 30 seconds), so the fastest individual was a boy.
• The boys also had the smaller median (2 minutes 30 seconds), meaning half of the boys were finished when only one fourth of the girls were finished (since the girls’ first quartile is also 2:30). In other words, the boys’ median time was faster.

Girls:

• The boys had the slowest time (5 minutes 10 seconds), so by the time all the girls were finished there was still at least one boy completing the course.

• The girls had the smaller third quartile (3 min 20 seconds), meaning that even without taking the slowest fourth of each group into account, the girls were still quickest.

A great review of box and whiskers plots can be watched at http://www.phschool.com/atschool/academy123/english/academy123_content/wl-book-demo/ph-118s.html

dot plot, distribution, mean, median, mode, bimodal, range, moderate variation, quartile, interquartile range, mean absolute deviation
