maddisonc.weebly.com · Web viewFull written calculations can be found on the last page. Reflection: Confidence Interval and Hypothesis Tests The conditions that must be met before

Maddison Capel

Math 1040 Skittles Project

April 29, 2015

Introduction

This project is a statistical analysis of the contents of 2.17-ounce bags of Skittles. We analyzed the contents of our own bag as well as the bags of the rest of the class. The total number of bags analyzed was 40. The purpose of this project is to find the average number of Skittles per bag and how many of each color are in each bag. The statistical techniques used in the project are the organization and display of categorical and quantitative data, confidence intervals, and hypothesis testing. This is a fun and helpful way of showing how statistics can be useful in everyday life.

Organizing and Displaying Categorical Data: Colors

The charts below show the proportion and count of each Skittle color across all bags included in the project.

[Figure: "Skittle Color Proportions" chart. Red 0.188, Orange 0.222, Yellow 0.200, Green 0.185, Purple 0.205.]

[Figure: "Count of Skittle Colors" chart. Red 457, Orange 541, Yellow 487, Green 452, Purple 500.]

My original assumption was that the Skittle colors would be evenly distributed, with 20% of each color. The data and graphs largely support this: every color falls between 18.5% and 22.2% of the sample. It appears that the manufacturer tries to put a roughly equal amount of each color in every bag.

The total number of candies in the sample: 2437

The entire sample contains:

                     Red     Orange  Yellow  Green   Purple
Number of candies    457     541     487     452     500
Proportion           0.188   0.222   0.200   0.185   0.205

[Figure: "Count of Skittle Colors" bar chart, ordered from most to least common. Horizontal axis: Skittle Colors; vertical axis: Total Number of Skittles (400 to 560). Orange 541, Purple 500, Yellow 487, Red 457, Green 452.]

My bag of Skittles contained:

                     Red     Orange  Yellow  Green   Purple
Number of candies    11      14      14      13      11
Proportion           0.175   0.222   0.222   0.206   0.175
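The proportions in the tables above are each color count divided by the total count. A minimal sketch of that arithmetic, using the counts reported in this project (the function and variable names are illustrative, not part of the original work):

```python
# Color counts copied from the tables in this report.
class_counts = {"red": 457, "orange": 541, "yellow": 487, "green": 452, "purple": 500}
my_bag_counts = {"red": 11, "orange": 14, "yellow": 14, "green": 13, "purple": 11}


def proportions(counts):
    """Return each color's share of the total, rounded to three decimals."""
    total = sum(counts.values())
    return {color: round(n / total, 3) for color, n in counts.items()}


class_total = sum(class_counts.values())    # candies in the whole class sample
my_bag_total = sum(my_bag_counts.values())  # candies in the single bag

class_props = proportions(class_counts)
my_bag_props = proportions(my_bag_counts)

print(class_total, class_props)
print(my_bag_total, my_bag_props)
```

The same helper works for any set of category counts, which makes it easy to compare a single bag against the class totals.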

Organizing and Displaying Quantitative Data: the Number of Candies per Bag

The total number of candies in my single 2.17-ounce bag of Skittles = 63

The total number of bags in the sample collected by the entire class= 40

The total number of candies in the sample collected by the entire class= 2437

For the entire sample:

x̄ = 60.9

s= 1.9

5-number-summary:

Min: 54.0

Q₁: 60.0

Q₂: 61.0

Q₃: 62.0

Max: 64.0

The graphs depict what I expected. Looking through the class data for the total number of Skittles in each bag, I could tell right away that the mean would be around 61 or 62, based on how often those numbers appear in the data set. The data appear to be skewed to the left. The data from the entire class are consistent with what my personal data showed.
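One way to check the left-skew impression against the summary numbers is the 1.5 × IQR outlier rule together with a comparison of the mean and median. The sketch below uses only the values reported above; the 1.5 × IQR rule is a common textbook convention and is an assumption here, not something the report says it applied.

```python
# Summary values reported for the 40-bag class sample.
total_candies = 2437
num_bags = 40
minimum, q1, median, q3, maximum = 54.0, 60.0, 61.0, 62.0, 64.0

mean = total_candies / num_bags   # sample mean, about 60.9
iqr = q3 - q1                     # interquartile range
lower_fence = q1 - 1.5 * iqr      # values below this are flagged as low outliers
upper_fence = q3 + 1.5 * iqr      # values above this are flagged as high outliers

has_low_outlier = minimum < lower_fence
has_high_outlier = maximum > upper_fence
mean_below_median = mean < median  # a mean pulled below the median suggests left skew

print(round(mean, 1), iqr, lower_fence, upper_fence)
print(has_low_outlier, has_high_outlier, mean_below_median)
```

A low outlier with no matching high outlier, plus a mean slightly below the median, is consistent with a longer tail on the left.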

Reflection: Categorical Versus Quantitative Data

Categorical data consist of names or labels that are not numbers representing counts or measurements. Quantitative data, as the name suggests, consist of numbers representing counts or measurements. The types of graphs that make sense for categorical data are pie charts and Pareto charts, because they group the observations into named categories instead of placing them on a numeric scale. Since the categories are labeled with names, it wouldn’t make sense to find a difference, mean, or standard deviation of the labels themselves. You can still count how many observations fall in each group and graph the data that way. The types of graphs used for quantitative data are ogives, dotplots, and histograms. Those graphs work best with quantitative data because they all place numbers on the horizontal axis instead of names, and the groups are defined by numeric values or ranges rather than by labels. It wouldn’t make sense to name the sections of these graphs because they are based on numbers.

[Figure: histogram, "Number of Skittles per bag". Horizontal axis: Candies per bag (54 to 66, More); vertical axis: Frequency (0 to 12). Bar labels, left to right: 1, 1, 3, 8, 10, 11, 5, 1.]

Confidence Interval Estimates

A confidence interval is a range of values used to estimate the true value of a population parameter. You can use a confidence interval to estimate a population proportion, a population mean, or a population standard deviation by using a sample statistic to infer the population parameter; for a proportion or a mean, this means calculating a margin of error around the sample statistic. Below are the answers I found to the questions using the confidence interval technique:

Construct a 95% confidence interval estimate for the true proportion of purple candies: .189 < p < .221. What that means is we are 95% confident that the purple population proportion falls within the interval .189 and .221.
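The interval for purple can be reproduced with the normal-approximation formula p̂ ± z·sqrt(p̂q̂/n). This is a minimal sketch using the purple count (500) and total (2437) from this project; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)  # margin of error E
    return p_hat - margin, p_hat + margin


purple_low, purple_high = proportion_ci(successes=500, n=2437, confidence=0.95)
print(round(purple_low, 3), round(purple_high, 3))
```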

Construct a 99% confidence interval estimate for the true mean number of candies per bag: 60.09 < µ < 61.71. What that means is we are 99% confident that the population mean falls within the interval 60.09 and 61.71.
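The interval for the mean follows x̄ ± t·s/sqrt(n). The sketch below plugs in the sample values from this project. The critical value 2.708 (99% confidence, 39 degrees of freedom) is taken from a standard t table and is an assumption here, since the report does not list the value it used.

```python
from math import sqrt


def mean_ci(sample_mean, sample_sd, n, t_critical):
    """Confidence interval for a population mean when sigma is unknown."""
    margin = t_critical * sample_sd / sqrt(n)  # margin of error E
    return sample_mean - margin, sample_mean + margin


# Assumed table value: t for 99% confidence with n - 1 = 39 degrees of freedom.
T_CRITICAL_99_DF39 = 2.708

mean_low, mean_high = mean_ci(sample_mean=60.9, sample_sd=1.9, n=40,
                              t_critical=T_CRITICAL_99_DF39)
print(round(mean_low, 2), round(mean_high, 2))
```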

Construct a 98% confidence interval estimate for the standard deviation of the number of candies per bag: 1.49 < σ < 2.52. What that means is we are 98% confident that the population standard deviation falls within the interval 1.49 and 2.52.
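The interval for σ uses the chi-square distribution: sqrt((n−1)s²/χ²_R) < σ < sqrt((n−1)s²/χ²_L). The sketch below reproduces the reported bounds when the critical values are read from the 40-degrees-of-freedom row of a printed chi-square table (22.164 and 63.691), which is the nearest row to n − 1 = 39 in many textbooks. That the author used that row is an assumption; exact values for 39 degrees of freedom would give a slightly different interval.

```python
from math import sqrt


def sd_ci(sample_sd, n, chi2_left, chi2_right):
    """Confidence interval for a population standard deviation."""
    scaled = (n - 1) * sample_sd ** 2
    # The right-tail (larger) critical value produces the lower bound.
    return sqrt(scaled / chi2_right), sqrt(scaled / chi2_left)


# Assumed table values for a 98% interval, taken from the df = 40 row.
CHI2_LEFT = 22.164
CHI2_RIGHT = 63.691

sd_low, sd_high = sd_ci(sample_sd=1.9, n=40,
                        chi2_left=CHI2_LEFT, chi2_right=CHI2_RIGHT)
print(round(sd_low, 2), round(sd_high, 2))
```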

Full written calculations can be found on the last page.

Hypothesis Tests

A hypothesis test is a procedure for testing a claim about a property of a population. First you state a null hypothesis, that the parameter equals the claimed value, and an alternative hypothesis, that the parameter is greater than, less than, or not equal to that value. You then calculate the test statistic and either reject the null hypothesis or fail to reject the null hypothesis. Below are the answers I found to the questions using the hypothesis testing technique:

Use a 0.01 significance level to test the claim that 20% of all Skittles candies are green:

n=2437 p=.20 x=452 q=.80 α=.01 p̂=.1854739

H₀: p=.20

Hₐ: p≠.20

z= -1.7927 critical value: ±2.575 p-value ≈ .073

p-value > α Fail to reject null hypothesis. There is not sufficient evidence to warrant rejection of the claim that 20% of Skittles are green.
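The test statistic for a claim about a proportion is z = (p̂ − p)/sqrt(pq/n), and the two-tailed p-value is twice the tail area beyond |z| under the standard normal curve. A minimal sketch with the green counts from this project (function and variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, claimed_p):
    """Return (z, two-tailed p-value) for H0: p = claimed_p."""
    p_hat = successes / n
    z = (p_hat - claimed_p) / sqrt(claimed_p * (1 - claimed_p) / n)
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


alpha = 0.01
z_green, p_value_green = one_proportion_z_test(successes=452, n=2437, claimed_p=0.20)
z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 2.576

reject_green_claim = p_value_green < alpha
print(round(z_green, 4), round(z_critical, 3), round(p_value_green, 4))
print("reject H0:", reject_green_claim)
```

With these inputs the p-value is larger than α = 0.01 and |z| is smaller than the critical value, so both approaches lead to failing to reject the null hypothesis.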

Use the 0.05 significance level to test the claim that the mean number of candies in a bag of Skittles is 56:

n=40 x̄=60.9 µ=56.0 s=1.9 α=.05

H₀: µ=56.0

Hₐ: µ≠56.0

t=16.31069528 critical value= ±2.023 p-value < .0001

p-value < α Reject the null hypothesis. There is sufficient evidence to warrant rejection of the claim that the mean number of Skittles in a bag is 56.
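The test statistic for a claim about a mean with σ unknown is t = (x̄ − µ)/(s/sqrt(n)). The sketch below recomputes it from the sample values in this project and compares it with the critical value ±2.023 quoted above. It uses the critical-value method only, because the Python standard library has no t distribution for computing the p-value.

```python
from math import sqrt


def one_sample_t_statistic(sample_mean, claimed_mean, sample_sd, n):
    """Test statistic for H0: mu = claimed_mean when sigma is unknown."""
    return (sample_mean - claimed_mean) / (sample_sd / sqrt(n))


T_CRITICAL = 2.023  # two-tailed critical value quoted in the report (alpha = 0.05, df = 39)

t_stat = one_sample_t_statistic(sample_mean=60.9, claimed_mean=56.0,
                                sample_sd=1.9, n=40)
reject_mean_claim = abs(t_stat) > T_CRITICAL

print(round(t_stat, 4))
print("reject H0:", reject_mean_claim)
```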

Full written calculations can be found on the last page.

Reflection: Confidence Interval and Hypothesis Tests

The conditions that must be met before performing calculations for confidence intervals and hypothesis tests on population proportions are as follows: The sample must be a simple random sample, which means every possible sample of the same size has an equal probability of being chosen. A simple random sample is meant to be an unbiased representation of a group. The conditions for the binomial distribution must be satisfied, which means there is a fixed number of trials, the trials are independent, there are two categories of outcomes, and the probabilities remain constant for each trial. Lastly, there must be at least 5 successes and 5 failures. Yes, my samples did meet the conditions: the sample was a simple random sample, each candy had only two possible outcomes (for example, green or not green), the outcomes were independent, and there were far more than 5 successes and 5 failures.
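The "at least 5 successes and 5 failures" requirement is quick to verify numerically. A small sketch, assuming the usual textbook form np ≥ 5 and nq ≥ 5 and using the green-candy test values from this project (the function name is illustrative):

```python
def meets_success_failure_condition(n, p, minimum=5):
    """Check that both expected successes (n*p) and failures (n*q) reach the minimum."""
    return n * p >= minimum and n * (1 - p) >= minimum


# Claimed proportion of green candies and the class sample size.
green_check = meets_success_failure_condition(n=2437, p=0.20)

# A deliberately small hypothetical sample, to show a case that fails the check.
small_sample_check = meets_success_failure_condition(n=10, p=0.20)

print(green_check, small_sample_check)
```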


The conditions that must be met before performing calculations for confidence intervals and hypothesis tests on population means are as follows: The sample must be a simple random sample. The population must be normally distributed or the sample size must be larger than 30 (n > 30). Yes, the sample meets the requirements because it is a simple random sample and the sample size is n = 40, which is larger than 30.


The conditions that must be met before performing calculations for confidence intervals and hypothesis tests on population standard deviation are as follows: The sample must be a simple random sample. The population must have normally distributed values. The requirement of a normal distribution is much stricter here than in the earlier topics, so departures from a normal distribution can result in large errors. The sample is a simple random sample and the number of candies per bag is assumed to be approximately normally distributed, so the requirements are treated as met.

One possible error in using this data is mixing up which n to use in a given equation. In our calculations we used two different sample sizes depending on what the question was asking. We used n=2437 when we were talking about the total number of Skittles, and n=40 when we were talking about the number of bags of candies. It would be very easy to mix the two up and get very wrong answers. Another error that could have been made is a Type I or Type II error. A Type I error is when you reject a null hypothesis that is actually true. A Type II error is when you fail to reject a null hypothesis that is actually false. You can always improve the sampling method by collecting a larger sample to get a more accurate estimate. It might also be helpful when doing this project to purchase bags of Skittles from different stores and locations. If you purchase all the Skittles from the same store, that store could have received a shipment that is unrepresentative in some way. Maybe the manufacturer sent out a batch that had far too few or far too many Skittles, which would skew the data and cause outliers.

In conclusion, the data from the entire class show that there are slightly more orange candies in a bag of Skittles than any other color. However, the colors are very close to being evenly distributed, with about 20% of each color. The manufacturer also appears to put roughly the same number of candies in each bag.

Reflective Writing

As a result of this project I have learned how to calculate confidence intervals for the proportion, mean, and standard deviation of a population from sample data. It showed me how to organize and display the data in a meaningful and constructive way. Performing the steps in this project also helped solidify the key concepts I have learned throughout the semester. I am a very hands-on learner, so this project was a perfect way to help me remember these concepts long term. It is also useful to know how to take a set of data and either dispute or confirm a claim made by the manufacturer. This technique is called hypothesis testing.

My math skills have improved tremendously by taking this class. Not only do I feel I now have a knowledge base for some of the most relevant mathematics I have ever taken, but my problem-solving and critical thinking skills have also improved. These newly acquired skills would be useful in future classes, for example when calculating the probability of getting an answer right when I am merely guessing on a multiple-choice question. Another application where my statistics skills would come in handy is finding out what my grade is in a class by using the weighted mean concept.

This project helped improve my problem-solving skills because the class in general has taught me how to look at a problem with a more analytic eye. Now, when I look at a problem, instead of getting overwhelmed if I don’t understand it right away, I take the bits and pieces that I do understand and work with them until I come to a greater understanding of the question. I used to be terrified of story problems, but now I feel I have the skills necessary to dissect the question and break it down into solvable pieces.

I have taken many math classes throughout my years of education, and statistics is by far the most useful one. Other math classes like algebra and geometry are useful because you need a basic understanding of those topics in order to be a well-rounded and educated person. But never in my life have I used the quadratic equation at work or when spending time with family and friends. I have, however, been at a family gathering reading an article that discussed a statistical analysis of a survey. Now I know to question the source, sampling method, and sample size. Before taking this class I would have accepted the data at face value and left it at that. I am more analytical now and appreciate accurate information, so I can have the knowledge to agree or disagree with the statement.
