Maddison Capel
Math 1040 Skittles Project
April 29, 2015
Introduction
This project is a statistical analysis of the contents of 2.17-ounce bags of Skittles. We analyzed the contents of our own bag as well as the bags of the rest of the class, for a total of 40 bags. The purpose of this project is to find the average number of Skittles per bag and how many of each color are in each bag. The statistical techniques used in the project are organizing and displaying categorical and quantitative data, confidence intervals, and hypothesis testing. This is a fun and helpful way of showing how statistics can be useful in everyday life.
Organizing and Displaying Categorical Data: Colors
The charts below show the proportion of the Skittle colors in every bag included in the project.
[Chart: Skittle Color Proportions. Red 0.188, Orange 0.222, Yellow 0.200, Green 0.185, Purple 0.205]
[Chart: Count of Skittle Colors. Red 457, Orange 541, Yellow 487, Green 452, Purple 500]
My original assumption was that the Skittle colors would be evenly distributed, with 20% of each color. The data and graphs come close to this: each color makes up between 18.5% and 22.2% of the class sample. It is good to see that the manufacturer appears to put a roughly equal amount of each color in every bag.
The entire sample contains:
The total number of candies in the sample: 2437
Color     Count   Proportion
Red       457     0.188
Orange    541     0.222
Yellow    487     0.200
Green     452     0.185
Purple    500     0.205
[Pareto chart: Count of Skittle Colors. Horizontal axis: Skittle Colors; vertical axis: Total Number of Skittles. Orange 541, Purple 500, Yellow 487, Red 457, Green 452]
My bag of Skittles contained:
Color     Count   Proportion
Red       11      0.175
Orange    14      0.222
Yellow    14      0.222
Green     13      0.206
Purple    11      0.175
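The proportions reported for the class sample can be rechecked with a short calculation. This is a minimal sketch in Python (a language chosen for illustration, not one used in the project); the function name `color_proportions` is hypothetical, and the counts are the class totals reported above.

```python
# Class totals by color, as reported for the entire sample.
counts = {"Red": 457, "Orange": 541, "Yellow": 487, "Green": 452, "Purple": 500}


def color_proportions(color_counts):
    """Return each color's share of the total, rounded to three decimals."""
    total = sum(color_counts.values())
    return {color: round(n / total, 3) for color, n in color_counts.items()}


total_candies = sum(counts.values())
proportions = color_proportions(counts)

for color, share in proportions.items():
    print(f"{color}: {counts[color]} candies, proportion {share:.3f}")
```

The same function can be applied to the counts from a single bag to get that bag's proportions.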
Organizing and Displaying Quantitative Data: the Number of Candies per Bag
The total number of candies in my single 2.17-ounce bag of Skittles = 63
The total number of bags in the sample collected by the entire class= 40
The total number of candies in the sample collected by the entire class= 2437
For the entire sample:
x̄ = 60.9
s= 1.9
5-number-summary:
Min: 54.0
Q₁: 60.0
Q₂: 61.0
Q₃: 62.0
Max: 64.0
The graphs do depict what I expected. As I looked through the class data for the total number of Skittles in each bag, I could tell right away that the mean would be around 61 or 62, just based on how often those numbers appear in the data set. The data appear to be skewed to the left. The data from the entire class fit with what my personal data showed.
Reflection: Categorical Versus Quantitative Data
Categorical data consists of names or labels that are not numbers representing counts or
measurements. Quantitative data, as its name suggests, consists of numbers
representing counts or measurements. The types of graphs that make sense for
categorical data are pie charts and Pareto charts, because they group the observations
into named categories instead of placing them on a number scale. Since the categories
are only names, it wouldn’t make sense to find a difference, mean, or standard deviation
of the labels themselves. You can still count how many observations fall in each
category and graph those counts. The types of graphs that would be used for
quantitative data are ogives, dotplots, and histograms. Those graphs work best with
quantitative data because they all put numbers, not names, on the horizontal axis, and
the groups are defined by numeric values or intervals rather than by labels.
[Histogram: Candies per bag. Horizontal axis: Number of Skittles per bag (54 to 66); vertical axis: Frequency (0 to 12). Bar labels as extracted: 1, 1, 3, 8, 10, 11, 5, 1]
Confidence Interval Estimates
A confidence interval is a range of values used to estimate the true value of a population
parameter. You can use a confidence interval to estimate a population proportion, a
population mean, or a population standard deviation by starting from a sample statistic
and working out the range of values around it that is likely to contain the population
parameter. Below are the answers I found to the questions while using the confidence
interval technique:
Construct a 95% confidence interval estimate for the true proportion of purple candies: .189 < p < .221. What that means is we are 95% confident that the true proportion of purple candies falls between .189 and .221.
Construct a 99% confidence interval estimate for the true mean number of candies per bag: 60.09 < µ < 61.71. What that means is we are 99% confident that the population mean falls between 60.09 and 61.71.
Construct a 98% confidence interval estimate for the standard deviation of the number of candies per bag: 1.49 < σ < 2.52. What that means is we are 98% confident that the population standard deviation falls between 1.49 and 2.52.
Full written calculations can be found on the last page.
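The three intervals above can be reproduced with a short calculation. This is a minimal sketch, not the written calculations referred to in the report. The function names are hypothetical. The t critical value (2.708, for 39 degrees of freedom) and the chi-square critical values (22.164 and 63.691, the table row for 40 degrees of freedom) are assumptions taken from standard tables; the chi-square row is the one that reproduces the reported interval, although n - 1 is 39.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """Normal-approximation interval for a population proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def mean_ci(x_bar, s, n, t_crit):
    """Interval for a population mean, given a t critical value from a table."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


def std_dev_ci(s, n, chi2_left, chi2_right):
    """Interval for a population standard deviation from chi-square table values."""
    lower = sqrt((n - 1) * s**2 / chi2_right)
    upper = sqrt((n - 1) * s**2 / chi2_left)
    return lower, upper


# 95% interval for the proportion of purple candies (500 of 2437).
purple_ci = proportion_ci(500, 2437, 0.95)

# 99% interval for the mean candies per bag; 2.708 is an assumed table value.
mean_interval = mean_ci(60.9, 1.9, 40, t_crit=2.708)

# 98% interval for the standard deviation; both critical values are assumed
# table values for 40 degrees of freedom.
sd_interval = std_dev_ci(1.9, 40, chi2_left=22.164, chi2_right=63.691)

print(purple_ci, mean_interval, sd_interval)
```

Passing the critical values in as arguments keeps the sketch within the Python standard library, which has no t or chi-square distribution.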
Hypothesis Tests
A hypothesis test is a procedure for testing a claim about a property of a population. First you state the null hypothesis, that the parameter is equal to the claimed value, and the alternative hypothesis, that the parameter is greater than, less than, or not equal to that value. You then calculate the test statistic and either reject the null hypothesis or fail to reject the null hypothesis. Below are the answers I found to the questions while using the hypothesis testing technique:
Use a 0.01 significance level to test the claim that 20% of all Skittles candies are green:
n=2437 p=.20 x=452 q=.80 α=.01 p̂=.1854739
H₀: p=.20
Hₐ: p≠.20
z= -1.7927 critical values: ±2.575 p-value= .0730
p-value > α, so we fail to reject the null hypothesis. There is not sufficient evidence to warrant rejection of the claim that 20% of Skittles are green.
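The test of the green proportion can be rechecked as a one-proportion z test. This is a minimal sketch using the normal approximation; the function name `one_proportion_z_test` is hypothetical, and the inputs are the values listed for this test.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(x, n, p0):
    """Return the z statistic and two-tailed p-value for H0: p = p0."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# 452 green candies out of 2437, claimed proportion 0.20.
z_stat, p_value = one_proportion_z_test(452, 2437, 0.20)

alpha = 0.01
reject_null = p_value < alpha

print(f"z = {z_stat:.4f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```

Comparing the p-value with α and comparing z with the critical values ±2.575 lead to the same decision.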
Use the 0.05 significance level to test the claim that the mean number of candies in the bag of Skittles is 56:
n=40 x̄=60.9 µ=56.0 s=1.9 α=.05
H₀: µ=56.0
Hₐ: µ≠56.0
t=16.3107 critical values: ±2.023 p-value < .0001
p-value < α, so we reject the null hypothesis. There is sufficient evidence to warrant rejection of the claim that the mean number of Skittles in a bag is 56.
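The test statistic for the mean can be rechecked the same way. This is a minimal sketch; the function name `one_sample_t_statistic` is hypothetical, and the critical value 2.023 is the one quoted in the report for a two-tailed test at α = .05. The decision here uses the critical-value comparison because the Python standard library has no t distribution for computing a p-value.

```python
from math import sqrt


def one_sample_t_statistic(x_bar, mu0, s, n):
    """Return the t statistic for H0: mu = mu0, using the sample standard deviation."""
    return (x_bar - mu0) / (s / sqrt(n))


# Sample mean 60.9, claimed mean 56.0, s = 1.9, n = 40 bags.
t_stat = one_sample_t_statistic(60.9, 56.0, 1.9, 40)

# Two-tailed critical value quoted in the report (alpha = 0.05, n = 40).
t_crit = 2.023
reject_null = abs(t_stat) > t_crit

print(f"t = {t_stat:.4f}, reject H0: {reject_null}")
```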
Full written calculations can be found on the last page.
Reflection: Confidence Interval and Hypothesis Tests
The conditions that must be met before performing calculations for confidence intervals
and hypothesis tests on population proportions are as follows: The sample must be a
simple random sample, which means every possible sample of the same size has an equal
chance of being chosen. A simple random sample is meant to be an unbiased representation
of a group. The conditions for the binomial distribution must be satisfied, which means
there is a fixed number of trials, the trials are independent, there are two categories
of outcomes, and the probabilities remain constant for each trial. Lastly, there must be
at least 5 successes and 5 failures. Yes, my samples met the conditions: the sample was
treated as a simple random sample, each candy either was or was not the color being
tested, the trials were independent, and there were far more than 5 successes and 5
failures.
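The last requirement, at least 5 successes and 5 failures, is the one that can be checked directly from the counts. This is a minimal sketch with a hypothetical helper name; it does not check the random-sampling or independence conditions, which depend on how the bags were collected.

```python
def enough_successes_and_failures(successes, n, minimum=5):
    """Check the 'at least 5 successes and 5 failures' requirement."""
    failures = n - successes
    return successes >= minimum and failures >= minimum


# Green was used in the hypothesis test, purple in the confidence interval.
green_ok = enough_successes_and_failures(452, 2437)
purple_ok = enough_successes_and_failures(500, 2437)

print(green_ok, purple_ok)
```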
The conditions that must be met before performing calculations for confidence intervals
and hypothesis tests on population means are as follows: The sample must be a simple
random sample. Either the population must be normally distributed or the sample size
must be larger than 30 (n > 30). Yes, the sample does meet the requirements because it
is a simple random sample and the sample size is 40, which is larger than 30.
The conditions that must be met before performing calculations for confidence intervals
and hypothesis tests on population standard deviation are as follows: The sample must
be a simple random sample. The population must have normally distributed values. The
requirement of a normal distribution is much stricter here than in earlier discussed
topics, so departure from normal distributions can result in large errors. The sample is a
simple random sample and it is normally distributed so it does meet the requirements.
One possible error in using this data is mixing up which n to use in a given equation.
Our calculations used two different sample sizes depending on what the question was
asking. We used n=2437, the total number of Skittles, for questions about color
proportions, and n=40, the number of bags, for questions about candies per bag. It
would be very easy to mix the two up and get very wrong answers. Another error that
could have been made is a Type I or Type II error. A Type I error is when you
reject a null hypothesis when it is actually true. A Type II error is when you fail to reject a
null hypothesis when it is actually false. You can always improve the sampling method
by adding more samples to get a more accurate understanding. Also, it might be helpful
when doing this project to purchase bags of Skittles from different stores and locations.
The reason is that if you purchase all the Skittles from the same store, that store
could have received a shipment that was unusual in some way. Maybe the manufacturer
sent out a batch that had way too few or way too many Skittles, and that would skew
the data and cause outliers.
In conclusion, the data from the entire class show that there are slightly more orange
candies in a bag of Skittles than any other color. However, the colors are very close
to being evenly distributed, with about 20% of each color across the bags in the
sample. The manufacturer also appears to put roughly the same number of candies in each bag.
Reflective Writing
As a result of this project I have learned how to calculate a confidence interval using
proportion, mean, and standard deviation of a set of data samples. It showed me how to
organize and display the data in a meaningful and constructive way. Performing the
steps in this project also helped solidify the key concepts I have learned throughout the
semester. I am a very hands-on learner so this project was a perfect way to help me
remember these concepts long term. It is also useful to know how to take a set of data
and either dispute or confirm a claim made by the manufacturer. This technique is called
hypothesis testing.
My math skills have improved tremendously by taking this class. Not only do I feel I now
have a knowledge base for some of the most relevant mathematics I have ever taken,
but my problem solving and critical thinking skills have increased. These newly acquired
skills would be useful in future classes when trying to calculate the probability of getting
an answer right on a test when I am merely guessing on a multiple choice question.
Another application where my newly acquired statistics skills would come in handy is
when trying to find out what my grade is in the class by using the weighted mean
concept.
This project helped improve my problem solving skills because the class in general has
taught me how to look at a problem with a more analytic eye. Now, when I look at a
problem, instead of getting overwhelmed if I don’t understand it right away, I will take bits
and pieces that I do understand and then work with that until I have come to a greater
understanding of the question. I used to be terrified of story problems but now I feel I
have the skills necessary to dissect the question and break it down into solvable pieces.
I have taken many math classes throughout my years of education and statistics is by far
the most useful one. The other math classes like algebra and geometry are useful
because you have to have a basic understanding of those topics in order to be a well-
rounded and educated person. But never in my life have I ever used the quadratic
equation at work or when hanging out with family and friends. I have, however, been at
a family gathering before reading an article where they were talking about a statistical
analysis of a survey. Now I know to question the source, sampling method, and sample
size. Before taking this class I would have accepted the data at face value and left it at
that. I am more analytical now and appreciate accurate information so I can have the
knowledge to agree or disagree with the statement.