
Which variables are skewed


1. Which variables are skewed? Which are more skewed than others?

-The positively skewed variables are Cost, Calories, Fat, and Calories from Fat.

-Fat is by far the most positively skewed variable, signifying that it is easier to find a fat-heavy soup than a soup without much fat.

-The only negatively skewed variable is Sodium; all others are positively skewed.

-Sodium is the only variable that gives the consumer the ability to find a soup with a value oriented towards the lower end of the distribution, which equates to a healthier soup in this context.
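The sign and size of skewness can be checked numerically. The sketch below uses one common adjusted sample-skewness formula; the two short lists are hypothetical values made up for illustration, not the soup data, and the function name is my own.

```python
import math

def sample_skewness(values):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s)^3)."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (n - 1 in the denominator)
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

# Hypothetical values, NOT taken from the soup data set
right_tailed = [1, 1, 2, 2, 3, 9]   # long tail toward high values
left_tailed = [1, 7, 8, 8, 9, 9]    # long tail toward low values

print(sample_skewness(right_tailed))  # positive skewness
print(sample_skewness(left_tailed))   # negative skewness
```

A positive result means the long tail lies on the high side, a negative result means it lies on the low side, and a larger absolute value means stronger skew.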

2. Compare the variation among the variables

Method used: I conducted a one-way analysis of variance at the 5% level of significance to determine whether the mean cost differs among the flavours of soup.

Method

Null hypothesis         H₀: All means are equal
Alternative hypothesis  H₁: At least one mean is different

Equal variances were assumed for the analysis.

Results

Analysis of Variance

Source  DF   Adj SS    Adj MS  F-Value  P-Value
Soup     2  0.63098  0.315492     4.41   0.0180
Error   44  3.15081  0.071609
Total   46  3.78180
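The F-value in the table follows from the sums of squares and degrees of freedom. A minimal sketch that recomputes it from the reported figures (the variable names are my own):

```python
# Values copied from the Analysis of Variance table
ss_soup, df_soup = 0.63098, 2
ss_error, df_error = 3.15081, 44

# Mean square = sum of squares / degrees of freedom
ms_soup = ss_soup / df_soup
ms_error = ss_error / df_error

# F-value = mean square of the factor / mean square of the error
f_value = ms_soup / ms_error

print(round(ms_soup, 5), round(ms_error, 6), round(f_value, 2))
```

The recomputed mean squares and F-value agree with the Adj MS and F-Value columns to the precision shown.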

Christopher Ly 260603365

Tukey Simultaneous Tests for Differences of Means

Difference of Levels      Difference of Means  SE of Difference  95% CI               T-Value  Adjusted P-Value
Tomato-Chicken Noodle                 -0.2109            0.1040  (-0.4632, 0.0414)      -2.03            0.1176
Vegetable-Chicken Noodle              0.13338           0.09150  (-0.08854, 0.35531)     1.46            0.3209
Vegetable-Tomato                       0.3443            0.1160  (0.0628, 0.6257)        2.97            0.0132

Individual confidence level = 98.05%

Interpretation of the results

-The p-value for cost (0.0180) is less than 0.05 (our alpha level), so we reject H₀ in favour of H₁: at least one flavour of soup has a different mean cost. Because the mean prices differ, this raises the question of which variables may lead to the difference.

-The 95% confidence interval for the difference between the mean costs of vegetable and tomato soup runs from 0.0628 to 0.6257. Because this interval does not include zero, the difference is statistically significant (adjusted p-value 0.0132). The intervals for the other two comparisons include zero, so those differences are not significant.
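The reading of the Tukey output can be sketched in a few lines: each T-value is the difference of means divided by its standard error, and a comparison is significant when its interval excludes zero. The figures are copied from the Tukey table; the function name is my own.

```python
# (difference of means, SE of difference, CI lower, CI upper) from the Tukey table
pairs = {
    "Tomato - Chicken Noodle": (-0.2109, 0.1040, -0.4632, 0.0414),
    "Vegetable - Chicken Noodle": (0.13338, 0.09150, -0.08854, 0.35531),
    "Vegetable - Tomato": (0.3443, 0.1160, 0.0628, 0.6257),
}

def summarize(pairs):
    """Return {pair: (T-value, True if the 95% CI excludes zero)}."""
    rows = {}
    for name, (diff, se, lower, upper) in pairs.items():
        t_value = diff / se
        excludes_zero = lower > 0 or upper < 0
        rows[name] = (round(t_value, 2), excludes_zero)
    return rows

for name, (t_value, significant) in summarize(pairs).items():
    print(f"{name}: T = {t_value}, CI excludes zero: {significant}")
```

Only the vegetable and tomato comparison has an interval that excludes zero.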

Model Summary

S         R-sq    R-sq(adj)  R-sq(pred)
0.267599  16.68%  12.90%     6.14%

-The low R-squared shows that the factor, flavour of soup, explains only 16.68% of the variation in the response, cost. This means cost is likely to be influenced by other explanatory variables.
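The model summary for the one-way analysis can be recovered from the sums of squares in the Analysis of Variance table. A short sketch, assuming the usual definitions of S, R-sq and R-sq(adj):

```python
# Sums of squares and degrees of freedom from the Analysis of Variance table
ss_soup, ss_error, ss_total = 0.63098, 3.15081, 3.78180
df_error, df_total = 44, 46

# Share of the total variation explained by the factor
r_sq = ss_soup / ss_total

# Adjusted R-sq compares mean squares instead of sums of squares
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)

# S is the square root of the error mean square
s = (ss_error / df_error) ** 0.5

print(f"S = {s:.4f}, R-sq = {r_sq:.2%}, R-sq(adj) = {r_sq_adj:.2%}")
```

R-sq(pred) is left out because it needs the individual observations, which are not listed in this report.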

3. What other types of relationships exist between the variables? What about the correlation amongst the variables?

Correlation: Cost, Calories, Fat, Cal. from Fat, Sodium

Correlations for type 1 soups

               Cost       Calories   Fat        Cal. from Fat
Calories       -0.456702
               0.0870

Fat            0.197391   -0.542114
               0.4807     0.0368

Cal. from Fat  0.331247   -0.800092  0.930792
               0.2278     0.0003     <0.0001

Sodium         0.443584   -0.711894  0.298585   0.533179
               0.0977     0.0029     0.2797     0.0407

Cell Contents:
  Pearson correlation
  P-Value

Interpretation of the result

-Surprisingly, cost does not have a large effect on the other variables: none of its correlations is significant at the 5% level. With correlations of -0.54, -0.80 and -0.71 with fat, cal. from fat and sodium, calories has a consistently moderate to strong relationship with the other nutritional variables, since all three exceed 0.5 in absolute value. This suggests a negative relationship between calories and the other nutritional variables. When calories decrease, the other nutritional variables tend to increase, meaning that these are good calories whose absence is made up by so-called fillers. It can therefore be stated that, for type 1 soups, calories is a significant indicator of the other variables.
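For reference, each cell in the correlation output is a Pearson correlation, and its p-value comes from a t statistic with n - 2 degrees of freedom. The sketch below shows both calculations on hypothetical calorie and sodium values (not the soup data); the function names are my own.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: no correlation, compared with t on n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical values, NOT taken from the soup data set
calories = [60, 80, 90, 110, 130, 150]
sodium = [890, 850, 800, 760, 700, 650]

r = pearson_r(calories, sodium)
print(r, t_statistic(r, len(calories)))
```

A correlation close to -1 or +1 gives a large t statistic in absolute value and therefore a small p-value, which is how the second line of each cell should be read.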

Correlations for type 2 soups

               Cost       Calories   Fat        Cal. from Fat
Calories       -0.064116
               0.7768

Fat            0.112890   0.692794
               0.6169     0.0004

Cal. from Fat  0.042057   0.413254   0.901894
               0.8526     0.0559     <0.0001

Sodium         -0.534889  0.131939   0.274645   0.396656
               0.0103     0.5583     0.2161     0.0676

Cell Contents:
  Pearson correlation
  P-Value

-Aside from the expected relationship between fat and cal. from fat (0.90), only two correlations are significant at the 5% level: calories and fat (0.69, p-value 0.0004) and cost and sodium (-0.53, p-value 0.0103). Therefore it can be stated that with type 2 soups the variables are mostly uncorrelated.
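Checking each p-value against the 5% level can be done mechanically. The sketch below lists the type 2 correlations and p-values as read from the output above and keeps the pairs with p < 0.05; the reported "<0.0001" is entered as 0.0001, which is an upper bound rather than the exact value.

```python
ALPHA = 0.05

# (pair, Pearson correlation, p-value) as read from the type 2 output
type2 = [
    ("Cost-Calories", -0.064116, 0.7768),
    ("Cost-Fat", 0.112890, 0.6169),
    ("Calories-Fat", 0.692794, 0.0004),
    ("Cost-Cal. from Fat", 0.042057, 0.8526),
    ("Calories-Cal. from Fat", 0.413254, 0.0559),
    ("Fat-Cal. from Fat", 0.901894, 0.0001),  # reported as <0.0001
    ("Cost-Sodium", -0.534889, 0.0103),
    ("Calories-Sodium", 0.131939, 0.5583),
    ("Fat-Sodium", 0.274645, 0.2161),
    ("Cal. from Fat-Sodium", 0.396656, 0.0676),
]

# Keep only the pairs whose p-value is below the alpha level
significant = [name for name, r, p in type2 if p < ALPHA]
print(significant)
```

The same filter can be applied to the type 1 and type 3 outputs by swapping in their values.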

Correlations for type 3 soups

               Cost       Calories   Fat        Cal. from Fat
Calories       -0.431825
               0.3333

Fat            -0.607656  0.970585
               0.1478     0.0003

Cal. from Fat  -0.779605  0.857799   0.955715
               0.0387     0.0135     0.0008

Sodium         -0.417342  0.545868   0.569606   0.585287
               0.3516     0.2050     0.1819     0.1674

Cell Contents:
  Pearson correlation
  P-Value

-There are significant correlations between calories and both fat (0.97) and cal. from fat (0.86). The correlation between calories and sodium (0.55) is also positive and above 0.50, although it is not significant at the 5% level (p-value 0.2050). This indicates that in type 3 soups the amount of calories drives up the count of the other nutritional statistics. This would mean that the calories added are bad calories.

-There are too few type 4 soups in the sample to form an adequate correlation.

4. Does vegetable soup have more costly soups than tomato soup?

Method

p₁: proportion where sample of flavour 2 > 0.458510638 cost
p₂: proportion where sample of flavour 3 > 0.458510638 cost
Difference: p₁ - p₂

Estimation for Difference

Difference  95% CI for Difference
0.658120    (0.350531, 0.965709)

Test

Null hypothesis         H₀: p₁ - p₂ = 0
Alternative hypothesis  H₁: p₁ - p₂ ≠ 0

Method                Z-Value  P-Value
Fisher's exact                 0.0075
Normal approximation  4.19     <0.0001

The normal approximation may be inaccurate for small samples.
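The estimate, interval and Z-value above can be sketched with the standard two-proportion formulas. The counts used here (10 of 13 flavour 2 soups and 1 of 9 flavour 3 soups above the cut-off) are an assumption: they are not stated in the output, and were chosen only because they reproduce the reported difference. The function name is my own.

```python
import math

def two_proportion_test(x1, n1, x2, n2, z_crit=1.959964):
    """Difference of two proportions with an unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    z = diff / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, ci, z, p_value

# ASSUMED counts (not given in the report): 10 of 13 vs 1 of 9
diff, ci, z, p_value = two_proportion_test(10, 13, 1, 9)
print(round(diff, 6), tuple(round(v, 6) for v in ci), round(z, 2), p_value)
```

With these assumed counts the sketch agrees with the reported difference, interval and Z-value, which suggests the output used an unpooled standard error.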

Interpretation of results

-With 95% confidence, the difference between the proportions of flavour 2 (vegetable) and flavour 3 (tomato) soups that cost more than the mean cost of all soups (0.458510638) lies between 0.350531 and 0.965709. The interval never crosses zero, which signifies that a larger share of one population is costly than of the other; in this case p₁ (flavour 2) is greater than p₂ (flavour 3). Further evidence against the null hypothesis, which assumes a difference of 0, is the Fisher's exact p-value (0.0075), which is less than 0.05 (the alpha level).

5. Does chicken noodle have more costly soups than tomato soup?

Method

p₁: proportion where sample of flavour 1 > 0.458510638 cost
p₂: proportion where sample of flavour 3 > 0.458510638 cost
Difference: p₁ - p₂

Estimation for Difference

Difference  95% CI for Difference
0.288889    (0.007759, 0.570019)

Test

Null hypothesis         H₀: p₁ - p₂ = 0
Alternative hypothesis  H₁: p₁ - p₂ ≠ 0

Method                Z-Value  P-Value
Fisher's exact                 0.2137
Normal approximation  2.01     0.0440

The normal approximation may be inaccurate for small samples.
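Because the output warns about small samples, the Fisher's exact p-values are worth checking directly. The sketch below computes a two-sided Fisher's exact test from the hypergeometric distribution. The counts are an assumption (10 of 13 vegetable, 10 of 25 chicken noodle and 1 of 9 tomato soups above the cut-off): they are not stated in the report and were picked because they reproduce the reported differences.

```python
from math import comb

def fisher_exact_two_sided(x1, n1, x2, n2):
    """Two-sided Fisher's exact p-value for two groups of successes/trials."""
    successes = x1 + x2
    total = n1 + n2
    failures = total - successes
    denom = comb(total, n1)

    def prob(k):
        # Probability that group 1 holds k of the successes
        return comb(successes, k) * comb(failures, n1 - k) / denom

    k_min = max(0, n1 - failures)
    k_max = min(n1, successes)
    p_observed = prob(x1)
    # Sum every outcome that is no more likely than the observed one
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_observed * (1 + 1e-9))

# ASSUMED counts (not given in the report)
print(round(fisher_exact_two_sided(10, 13, 1, 9), 4))  # vegetable vs tomato
print(round(fisher_exact_two_sided(10, 25, 1, 9), 4))  # chicken noodle vs tomato
```

Under these assumed counts the two results round to the Fisher's exact p-values reported in the two test outputs.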

Interpretation of results

-Although narrowly, the 95% confidence interval (0.007759 to 0.570019) excludes zero, indicating that a larger proportion of flavour 1 (chicken noodle) soups than of flavour 3 (tomato) soups cost more than the mean cost of all soups. The null hypothesis states that the difference between these two proportions is 0. The normal approximation rejects it (p-value 0.0440), but Fisher's exact test does not (p-value 0.2137), and the output warns that the normal approximation may be inaccurate for small samples, so this result should be treated with caution.

-The results of the past two analyses indicate that, of all flavours, tomato soup has the smallest proportion of soups that cost more than the average cost of all soups and vegetable soup has the largest.

6. How would you describe the characteristics of the expensive soups?

Multiple Regression: Cost versus Type, Flavour, Calories, Fat, Cal. from Fat, Sodium

Coefficients

Term           Coef        SE Coef    T-Value  P-Value  VIF
Calories       0.008052    0.002396   3.36     0.0017   53.00
Fat            -0.26154    0.05185    -5.04    <0.0001  23.40
Cal. from Fat  0.028935    0.006065   4.77     <0.0001  16.95
Sodium         -0.0007525  0.0001741  -4.32    <0.0001  13.94
Flavour        -0.04019    0.06347    -0.63    0.5301   11.71
Type           0.14753     0.05043    2.93     0.0056   10.02

-These are the coefficients describing the relationships between cost (the response variable) and the explanatory variables type, flavour, calories, fat, cal. from fat and sodium.

-All variables except the flavour of the soup seem to have a statistically significant effect on price because their p-values are all lower than the alpha value set at 0.05.

Model Summary

S         R-sq    R-sq(adj)  R-sq(pred)
0.233130  83.69%  81.30%     78.73%

-The high R-squared (83.69%) indicates that much of the variation in cost is explained by the model.

-A VIF greater than 5-10 suggests multicollinearity, meaning the explanatory variables are correlated with one another and are therefore partly redundant in explaining the variation of the response variable. Because of this I removed calories and fat to see what would happen.
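One way to read a VIF is through the standard identity VIF = 1 / (1 - R²), where R² comes from regressing that explanatory variable on all the others. The sketch below inverts the identity for the VIFs in the first Coefficients table; the function name is my own.

```python
# VIF values copied from the first Coefficients table
vif = {
    "Calories": 53.00,
    "Fat": 23.40,
    "Cal. from Fat": 16.95,
    "Sodium": 13.94,
    "Flavour": 11.71,
    "Type": 10.02,
}

def implied_r_squared(v):
    """VIF = 1 / (1 - R^2), so R^2 = 1 - 1 / VIF."""
    return 1 - 1 / v

for term, value in vif.items():
    print(f"{term}: VIF = {value}, implied R-sq = {implied_r_squared(value):.1%}")
```

A VIF of 10 corresponds to 90% of a variable's variation being explained by the other explanatory variables, so every term in the first model is heavily overlapped by the rest.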

Coefficients

Term           Coef        SE Coef    T-Value  P-Value  VIF
Flavour        0.14489     0.04340    3.34     0.0017   3.45
Type           0.19685     0.04865    4.05     0.0002   5.87
Cal. from Fat  0.007070    0.005011   1.41     0.1655   7.29
Sodium         -0.0004912  0.0002075  -2.37    0.0225   12.48

-The new output indicates that type, flavour, and sodium are statistically significant predictors of the cost response variable (p-values below 0.05), while cal. from fat is not (p-value 0.1655).

Model Summary

S         R-sq    R-sq(adj)  R-sq(pred)
0.260132  79.20%  76.72%     74.68%

-The new model still maintains a high R-squared (79.20%) and eliminates the most redundant variables, although the VIF for sodium (12.48) remains above 10.

Regression Equation

Cost = −0.11891 Fat + 0.024606 Cal. from Fat − 0.0007109 Sodium + 0.13884 Flavour + 0.24776 Type

-This equation gives the predicted cost for a given set of values of the explanatory variables.

*The associated residual plots listed in the appendix (1) signify that the model fits the data.
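The regression equation can be written as a small function. This is a sketch only: it assumes Flavour and Type are entered as the numeric codes used in the analysis, and the example soup below is hypothetical, not a row from the data set.

```python
def predicted_cost(fat, cal_from_fat, sodium, flavour, soup_type):
    """Evaluate the fitted regression equation for one soup."""
    return (-0.11891 * fat
            + 0.024606 * cal_from_fat
            - 0.0007109 * sodium
            + 0.13884 * flavour
            + 0.24776 * soup_type)

# Hypothetical soup: values chosen for illustration only
example = predicted_cost(fat=2, cal_from_fat=20, sodium=800, flavour=1, soup_type=1)
print(round(example, 4))
```

Changing one input at a time shows the direction of each effect: more sodium or fat lowers the predicted cost, while higher flavour and type codes raise it.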

-Conclusion: we can establish that flavour and type are the best indicators of the cost, followed by sodium amount.

Appendix

1.