Some Analytical Chemistry of Potato Chips
Lessons on Sampling and ANOVA in SAS and JMP
Eric Cai
*How much sodium dost a potato crisp hast?
Images courtesy of Poyraz 72 and Evan-Amos via Wikimedia.
*Shakespearean online translator courtesy of LingoJam by Joseph Rocca.
Sodium chloride NaCl
Objectives
• Estimate the weight percentage of sodium in a bag of potato chips
• Obtain a confidence interval for the estimated weight percentage
• Need to minimize the cumulative uncertainty in the final result
– Minimize the width of the confidence interval
[Figure: Bag of Potato Chips, with chips numbered 1 to 4]
How to minimize uncertainty?
• Use precise instruments
• Measure many aliquots
• Minimize the variation between the samples
[Figure: Bag of Potato Chips (chips 1 to 4), annotated with "Variation in Weight Percentage Between Chips" and "Variation in Weight Percentage Within a Chip"]
Raw Data – Wide Format
            Chip 1   Chip 2   Chip 3   Chip 4
Aliquot 1   0.324%   0.455%   0.420%   0.447%
Aliquot 2   0.311%   0.467%   0.463%   0.377%
Aliquot 3   0.352%   0.448%   0.424%   0.398%
Desired Data – Long Format
• Needed for analysis in both SAS and JMP
• Two columns: Chip and Weight Percentage
• One row per measurement: three rows each for Chip 1, Chip 2, Chip 3 and Chip 4
* enter the raw data;
data sodium1;
   input chip1 chip2 chip3 chip4;
   datalines;
0.324 0.455 0.420 0.447
0.311 0.467 0.463 0.377
0.352 0.448 0.424 0.398
;
run;

* transpose the data;
* convert the weight percentages from a vertical display to a horizontal display;
proc transpose data = sodium1
               out = sodium2
               name = sample
               prefix = aliquot;
   var chip:;
run;

* show the transposed data;
proc print data = sodium2;
run;
Long, but still wide
sample aliquot1 aliquot2 aliquot3
chip1 0.324 0.311 0.352
chip2 0.455 0.467 0.448
chip3 0.420 0.463 0.424
chip4 0.447 0.377 0.398
* sodium2 needs to be transposed once more for all weight percentages to be in one column;
proc transpose data = sodium2
               out = sodium3 (rename = (col1 = weight_percentage))
               name = subsample;
   var aliquot:;
   by sample;
run;

* show sodium3 - it is now ready for analysis;
proc print data = sodium3;
run;
Transformed Data – Long Format
sample subsample weight_percentage
chip1 aliquot1 0.324
chip1 aliquot2 0.311
chip1 aliquot3 0.352
chip2 aliquot1 0.455
chip2 aliquot2 0.467
chip2 aliquot3 0.448
chip3 aliquot1 0.420
chip3 aliquot2 0.463
chip3 aliquot3 0.424
chip4 aliquot1 0.447
chip4 aliquot2 0.377
chip4 aliquot3 0.398
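The wide-to-long reshaping above can also be sketched outside SAS. The following is a minimal Python illustration, not the author's method: the dictionary `wide` and the helper `wide_to_long` are hypothetical names introduced here, and only the twelve measurements and the column names come from the slides.

```python
# Wide layout from the slides: one entry per chip, one value per aliquot.
wide = {
    "chip1": [0.324, 0.311, 0.352],
    "chip2": [0.455, 0.467, 0.448],
    "chip3": [0.420, 0.463, 0.424],
    "chip4": [0.447, 0.377, 0.398],
}


def wide_to_long(table):
    """Return one (sample, subsample, weight_percentage) row per measurement."""
    rows = []
    for sample, values in table.items():
        for i, value in enumerate(values, start=1):
            rows.append((sample, f"aliquot{i}", value))
    return rows


long_rows = wide_to_long(wide)

# Print the long table in the same column order as the slide.
print("sample subsample weight_percentage")
for sample, subsample, weight_percentage in long_rows:
    print(sample, subsample, f"{weight_percentage:.3f}")
```

Each chip contributes three rows, so the result has twelve rows with every weight percentage in a single column, which is the shape the analysis steps expect.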
PROC TRANSPOSE × 2: Wide to Long
Raw data (chips as columns) → sodium2 (aliquots as columns) → sodium3 (one weight_percentage column)
See the article on PROC TRANSPOSE by Dilinuer Kuerban in the November 2015 issue of the VanSUG newsletter.
Visualize the Data
• Grand mean: sample mean of all data
• Group-specific means: sample means within each group (chip)
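As a small illustration of the two kinds of means, here is a Python sketch (a swapped-in language, since the deck itself uses SAS and JMP). The names `data`, `group_means` and `grand_mean` are hypothetical; the measurements are the ones from the raw data slide.

```python
from statistics import mean

# Weight percentages of sodium, three aliquots per chip (from the slides).
data = {
    "chip1": [0.324, 0.311, 0.352],
    "chip2": [0.455, 0.467, 0.448],
    "chip3": [0.420, 0.463, 0.424],
    "chip4": [0.447, 0.377, 0.398],
}

# Group-specific means: the sample mean within each chip.
group_means = {chip: mean(values) for chip, values in data.items()}

# Grand mean: the sample mean of all twelve measurements.
grand_mean = mean([v for values in data.values() for v in values])

for chip, m in group_means.items():
    print(f"{chip}: {m:.4f}")
print(f"grand mean: {grand_mean:.4f}")
```

Chip 1 sits well below the grand mean while the other three chips sit at or above it, which is the pattern the plot is meant to show.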
[Figure: plot of the data, annotated with the between-group variation and the within-group variation]
Compare the 2 sources of variation
• Analysis of Variance (ANOVA)
– Linear regression with categorical predictors
– Partition the variation in a continuous variable by a categorical factor
– Use sums of squares to quantify the variation
– Sum of squared deviations of the data from the average
• Scale (divide) each sum by its number of degrees of freedom
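The partition described above can be worked through by hand. The sketch below is a plain-Python illustration of a one-way ANOVA on the deck's data, not the SAS procedure used in the slides; the function name `one_way_anova` and the dictionary keys are assumptions introduced here.

```python
# Weight percentages of sodium, three aliquots per chip (from the slides).
data = {
    "chip1": [0.324, 0.311, 0.352],
    "chip2": [0.455, 0.467, 0.448],
    "chip3": [0.420, 0.463, 0.424],
    "chip4": [0.447, 0.377, 0.398],
}


def one_way_anova(groups):
    """Partition the total variation into between-group and within-group parts."""
    all_values = [v for values in groups.values() for v in values]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    ss_between = 0.0  # squared deviations of group means from the grand mean
    ss_within = 0.0   # squared deviations of data from their own group mean
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    # Scale each sum of squares by its degrees of freedom.
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "df_between": df_between, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": df_within, "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }


table = one_way_anova(data)
for name, value in table.items():
    print(f"{name}: {value:.6g}")
```

The ratio of the two mean squares is the F statistic that an ANOVA table reports; a ratio far above 1 indicates that the chips differ from each other by more than the aliquot-to-aliquot scatter would explain.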
Between-group variation vs. within-group variation
* use ANOVA to partition and compare the 2 sources of variation;
proc anova data = sodium3;
   class sample;
   model weight_percentage = sample;
run;

• You can also use PROC GLM to implement ANOVA.
• ANOVA is one special case of general linear models.
• PROC ANOVA should only be used when there are equal numbers of observations for every combination of the classification factors.
– There are many exceptions to this!
Image courtesy of Cdang via Wikimedia
There is much more variation in the weight percentage of sodium between the chips than within the chips!
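One way to put a number on this comparison is to estimate variance components from the two mean squares. The Python sketch below assumes a balanced one-way random-effects model (chips drawn at random from the bag), which the slides do not state explicitly; the method-of-moments formula used is (MS_between - MS_within) / n, and all variable names are hypothetical.

```python
# Weight percentages of sodium, three aliquots per chip (from the slides).
data = {
    "chip1": [0.324, 0.311, 0.352],
    "chip2": [0.455, 0.467, 0.448],
    "chip3": [0.420, 0.463, 0.424],
    "chip4": [0.447, 0.377, 0.398],
}

n_chips = len(data)
n_aliquots = 3  # balanced design: same number of aliquots for every chip

group_means = [sum(values) / len(values) for values in data.values()]
# With a balanced design the grand mean equals the mean of the chip means.
grand_mean = sum(group_means) / n_chips

ms_between = (
    n_aliquots * sum((m - grand_mean) ** 2 for m in group_means) / (n_chips - 1)
)
ms_within = sum(
    (x - m) ** 2 for values, m in zip(data.values(), group_means) for x in values
) / (n_chips * (n_aliquots - 1))

# Method-of-moments estimates under the assumed random-effects model.
var_within = ms_within
var_between = max((ms_between - ms_within) / n_aliquots, 0.0)
share_between = var_between / (var_between + var_within)

print(f"within-chip variance:  {var_within:.6f}")
print(f"between-chip variance: {var_between:.6f}")
print(f"share of variance between chips: {share_between:.1%}")
```

Under these assumptions, most of the estimated variance in a single measurement is attributed to differences between chips rather than to scatter among aliquots of the same chip.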
JMP
• Software from SAS Institute
• Point-and-click
• Has underlying scripting language
• Statistics
• Machine learning
• Industrial statistics
• Go to JMP demonstration!
More measurements are needed!
There is a trade-off!
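The slides do not spell out the trade-off, so the following is only one possible reading: for a fixed number of measurements, choosing between more chips and more aliquots per chip changes the width of the confidence interval. The Python sketch assumes a balanced random-effects model, uses tabulated two-sided 95% t critical values (hardcoded), and introduces hypothetical names such as `se_grand_mean`.

```python
from math import sqrt

# Weight percentages of sodium, three aliquots per chip (from the slides).
data = {
    "chip1": [0.324, 0.311, 0.352],
    "chip2": [0.455, 0.467, 0.448],
    "chip3": [0.420, 0.463, 0.424],
    "chip4": [0.447, 0.377, 0.398],
}

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_975 = {1: 12.706, 3: 3.182, 11: 2.201}

n_chips, n_aliquots = len(data), 3
group_means = [sum(v) / len(v) for v in data.values()]
grand_mean = sum(group_means) / n_chips  # balanced design
ms_between = n_aliquots * sum((m - grand_mean) ** 2 for m in group_means) / (n_chips - 1)
ms_within = sum(
    (x - m) ** 2 for v, m in zip(data.values(), group_means) for x in v
) / (n_chips * (n_aliquots - 1))

# Variance components (method of moments, assumed random-effects model).
var_within = ms_within
var_between = max((ms_between - ms_within) / n_aliquots, 0.0)


def se_grand_mean(chips, aliquots):
    """Standard error of the grand mean for a design with the given sizes."""
    return sqrt(var_between / chips + var_within / (chips * aliquots))


# Confidence interval for the observed design (4 chips x 3 aliquots, 3 df).
se_observed = se_grand_mean(n_chips, n_aliquots)
half_width = T_CRIT_975[n_chips - 1] * se_observed
print(f"estimate: {grand_mean:.3f}% +/- {half_width:.3f}% (95% CI)")

# Same total of 12 measurements, split differently between chips and aliquots.
for chips, aliquots in [(2, 6), (4, 3), (12, 1)]:
    se = se_grand_mean(chips, aliquots)
    print(f"{chips:>2} chips x {aliquots} aliquots: "
          f"SE = {se:.4f}, half-width = {T_CRIT_975[chips - 1] * se:.4f}")
```

Under these assumptions, spreading the same twelve measurements over more chips narrows the interval, because the between-chip variance dominates and the degrees of freedom for the interval grow with the number of chips. The projected designs reuse variance estimates from only four chips, so the figures are rough guides rather than firm predictions.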
Thank you JMP staff!
• Louis Valente, Manager of Global Field Enablement for JMP
• Mark Bailey, Principal Analytical Training Consultant for JMP
• Arati Mejdal, Global Social Media Manager for JMP Software