Upload
others
View
4
Download
1
Embed Size (px)
Citation preview
MethodsforSensoryEvaluationofFood
SEP 2 4 1973Agriculture
CanadaI*
;vised
630.4
C212P12841970(1973print)
c.2
•••'•-"'
HnniHSHKu
SNHGBnra
m
PUBLICATION 1284
-•.•.-•';
-6?%
I- -A*H
PUBLICATION 1284 1970
METHODS FOR SENSORY EVALUATION OF FOOD
ELIZABETH LARMONDFood Research Institute, Central Experimental Farm, Ottawa
CANADA DEPARTMENT OF AGRICULTURE
Copies of this publication may be obtained from
INFORMATION DIVISIONCANADA DEPARTMENT OF AGRICULTUREOTTAWAK1A0C7
©INFORMATION CANADA. OTTAWA. 1973
Printed 1967Revised 1970Reprinted 1971, 1973
Code No.: 3M-3651 3-9:73Cat. No.: A53-1284
CONTENTS
Introduction 3
Types of Tests 5
Samples and Their Preparation 5
Panelists 7
Testing Conditions 8
Questionnaires 10
Design of Experiments and Methods of Analyzing Data .... 13
Consumer Testing 14
Appendix I - Sample Questionnaires and Examples of
Analyses
Triangle Test Difference Analysis 15
Duo-Trio Test Difference Analysis 17
Multiple Comparison Difference Analysis 19
Ranking Difference Analysis 24
Scoring Difference Test 27
Paired Comparison Difference Test 31
Hedonic Scale Scoring 36
Paired Comparison Preference 37
Ranking Preference 38
Appendix 1 1 - Statistical Chart 1 39
Statistical Chart 2 - F Distribution 40
Statistical Chart 3 - Studentized Ranges, 5 percent ... 42
Statistical Chart 4 - Studentized Ranges, 1 percent ... 43
Statistical Chart 5 - Scores for Ranked Data 44
Appendix IN - Notes on Introductory Statistics by Andres
Petrasovits 45
References 55
Additional Sources of Information 56
Digitized by the Internet Archive
in 2012 with funding from
Agriculture and Agri-Food Canada - Agriculture et Agroalimentaire Canada
http://www.archive.org/details/methodsforsensorOOIarm
INTRODUCTION
A sensory evaluation is made by the senses of taste, smell, and touch
when food is eaten. The complex sensation that results from the interaction
of our senses is used to measure food quality in programs for quality con-
trol and new product development. This evaluation may be carried out by
one person or by several hundred.
The first and simplest form of sensory evaluation is made at the bench
by the research worker who develops the new food products. He relies on
his own evaluation to determine gross differences in products. Sensory
evaluation is conducted in a more formal manner by laboratory and consumer
panels.
Most aspects of quality can be measured only by sensory panels,
although advances are being made in the development of objective tests
that measure individual quality factors. Instruments that measure texture
are probably the best known. Some examples are the L.E.E.-Kramer shear
press and the Warner-Bratzler shearing device. Gas chromatography and
mass spectrometry enable odor to be measured to a limited extent. The
color of foods can be accurately measured by tristimulus colorimetry. As
new instruments are developed to measure quality, sensory evaluation will
be used to prove and standardize new objective tests.
When people are used as a measuring instrument, it is necessary to
rigidly control all testing methods and conditions to overcome errors caused
by psychological factors. "Error" is not synonymous with mistakes, but
may include all kinds of extraneous influences. The physical and mental
condition of the panelist and the influence of the testing environment affect
sensory tests. For example, some people may have more flavor acuity in
the morning, others in the afternoon. Even the weather can influence the
disposition of panelists.
Small panels are used to test the palatability of foods. They may also
be used in preliminary acceptance testing. Laboratory panels may be used
to determine:
the best processing procedures,
suitable varieties of raw material,
preferable cooking and processing temperatures,
effect of substituting one ingredient for another,
best storage procedures,
effect of insecticides and fertilizers on flavor of foods,
effect of animal feeds on flavor and keeping quality of meat,
optimum size of pieces and their importance,
effect of color on acceptability of foods,
suitable recipes for the use of new products,
comparison with competitors' products.
The author is grateful to Mr. A. Petrasovits, Statistical Research
Service, Canada Department of Agriculture, who wrote Appendix III.
Statistical Chart 1 has been reprinted with permission from Wallerstein
Laboratory (Bengtsson, K. 1953. Wallerstein Lab. Commun. 16 (No. 54):
231—251.). Thanks are due to the Literary Executor of the late Sir Ronald
A. Fisher, F.R.S., Cambridge, to Dr. Frank Yates, F.R.S. Rothamstead,
and to Messrs. Oliver and Boyd Ltd., Edinburgh, for permission to reprint
in part Tables V (Statistical Chart 2) and XX (Statistical Chart 5) fromtheir book Statistical Tables for Biological, Agricultural and Medical
Research. Thanks are also due to Dr. Malcolm Turner for permission to
reprint "Multiple Range and Multiple F Tests" (Statistical Charts 3 and 4)by D. B. Duncan from Biometrics, Volume II, 1955.
4
TYPES OF TESTS
The two types of tests are difference tests and preference tests.
Difference Tests
In difference tests the members of the panel are merely asked if a
difference exists between two or more samples. Individual likes and dis-
likes are disregarded and each panelist is advised to be objective in his
evaluation. He may not like a particular product, but he should be taught
what constitutes good and poor quality and try to evaluate it on the basis
of the instructions received. He is acting as a quality measuring instru-
ment.
Preference Tests
Preference or acceptance tests determine representative population
preferences, and need many people on the panel. The total scores from
trained panels can be used to predict preference scores obtained from
panels of 100 to 160 untrained persons (19). Some tests are conducted on
a national scale by firms specializing in this form of testing. Even after
extensive tests, there is no assurance that the results will apply to the
total population.
SAMPLES AND THEIR PREPARATION
Panel members are usually influenced by all the characteristics of the
test material. Therefore, test samples should be prepared and served as
uniformly as possible (6).
Information About Samples
As little information as possible about the test should be given to the
panelists, since this information may influence results. It has been found
(13) that if information given to the panelists has meaning within the terms
of their experience it will influence their responses; panelists will taste
what they expect to taste. When panelists were told that high-quality raw
products were used, the panel preference was high, whereas the knowledge
that low-quality raw products were used lowered the panel preference. The
information that the food had been treated with unspecified chemicals and
exposed to unspecified sterilizing rays did not alter panel findings.
Temperature of Samples
The samples must be of uniform temperature. Therefore the mechanical
problems of serving foods at a constant and uniform temperature should be
carefully considered. The temperature at which food is usually eaten is
recommended for samples. However, taste buds are less sensitive to very
high or very low temperatures, which impair full flavor perception.
Sample Uniformity
To measure flavor differences in products of large unit size, such ascanned peach halves, slice the large units into smaller pieces and care-
fully mix them to obtain a more uniform sample. To test the quality of cannedjuice, open several cans and mix all the juice together before preparing
individual test samples.
Coding
Samples should be coded in such a manner that the judges cannot
distinguish the samples by the code or be influenced by code bias. For
example, if the samples are numbered 1, 2, 3, or lettered A, B, C, a coding
bias could be caused because people associate 1 or A with "first" or"best" and might tend to score this sample higher. A set of three-digitrandom numbers should be assigned to each sample so that the panelists
will receive samples coded differently (17).
Number of Samples
To determine the number of samples to be presented at one testingsession (12) consider the following:
The nature of the product being tested — No more than six samples ofice cream should be evaluated because of the temperature of the product.
The itensity and complexity of the sensory property being judged —
It has been found (18) that with mild products such as green beans and
canned peaches up to 20 samples may be tasted with no decrease in the
taster's ability to discriminate.
The experience of the taster — A professional tea, wine, or coffeetaster can evaluate hundreds of samples in one day.
The amount of time and commodity available.
Order of Presentation
The order in which the samples are presented to the judges may also
influence results. Studies on the effect of sample sequence on food prefer-
ence (8) have shown position effect (the later samples were rated lower);
contrast effect (serving "good" samples first lowered the ratings for "poor"
samples); and convergence effect (tendency to make similar responses to
successive stimuli). Contrast and convergence effects were shown to be
independent of position effect. Random presentation to the panelists
equalized these effects.
Figure 1. Preparation of samples.
Figure 2. Presentation of samples through hatch.
PANELISTS
For economical reasons choose panel members from all available person-
nel, including office, plant, and research staff. It should be considered a
part of work routine for personnel in the food industry to serve as panelists.
However, no one should be asked to evaluate foods to which he objects.
Persons concerned with the test product (product development and produc-
tion) and those who prepare the samples for testing should not be included
on the panel.
Panel members should be in good health and should excuse themselves
if they are suffering from a cold. Nonsmokers and smokers have been found
equally useful for panels. It is inadvisable for smokers to smoke within one
to two hours before a test (3). Heavy smokers, people who smoke one or
more packs of cigarettes per day, have been found to be generally less
sensitive than nonsmokers (2). There are exceptions however. No correlationexists between age and sensitivity. A certain amount of sensitivity isrequired, but this seems not as important as experience (1). A person ofaverage sensitivity, a high degree of personal integrity, ability to concen-
trate, intellectual curiosity, and willingness to spend time in evaluation
may do a better job than a careless person with extreme acuity of taste and
smell.
To keep the panelists interested in the work that is being done, showthem the results when each series of tests is completed. The importanceof the work can be shown by running the tests in a controlled, efficient
manner.
Selection and Training
Panelists should be selected for their ability to detect differences.
People who do well with some products often do poorly on others. A tasteris seldom equally proficient in tasting all foods.
Researchers disagree on the value of training panelists. It has been
stated (17) that screening tests could be used to choose panelists who are
capable of detecting differences, and actual training would be unnecessary.
At the Operational Research Group of Cadbury Bros. Ltd., in Bournville,
England, panelists with sufficiently discriminating palates are selected on
the basis of a series of triangular tests with typical confectionery materials.
The selected panelists are given 10 weeks training on special training
panels and finally they attend the formal sessions of a tasting panel for
4 weeks before their assessments are actually used (15). Certainly a specific
panel for the product and method being tested is more useful than a general-
purpose panel. Panelists must "be familiar with the product and must know
what constitutes good quality in the product. Preliminary sessions will
help to clarify the meaning of descriptive terms. To be of any real use terms
should be referable to specific objective standards (23).
Number of Panelists
Since there are many sources of variation in sensory tests, the more
tasters on the panel, the more likely it is that the individual variations will
balance. Four panelists is probably the minimum number (11), although most
researchers think there should be eight or ten (10). A small panel of highsensitivity and ability to differentiate may be preferable to a large panel of
less sensitivity.
TESTING CONDITIONS
Testing Area
The room where the tests are run may be simple or elaborate, but the
panelists must be independent of each other, in separate booths or in parti-
tioned sections at a large table. The panelists must be free from distractions
such as noise.
If possible, the room should be odor free and separate from, though
adjacent to, the area where the sample is prepared. Air-conditioning is
useful in the testing area. The walls should be off-white or a light neutralgray so that sample color will not be altered.
8
Figure 3. Taste panel area.
Lighting
Uniform lighting is essential. Natural white fluorescent lights are more
suitable than cool white lights. Red fluorescent lights are used in the test-
ing area of the Food Research Institute, Ottawa, to hide obvious color
differences in samples whose flavor is being evaluated.
Testing Schedule
The time of day that tests are run influences results. Although this
cannot be controlled if the number of tests is large, late morning and
midafternoon have been found to be the best times for testing. Since a
panelist's eating habits affect test results, it is desirable that no tests be
performed in the period from 1 hour before a meal to 2 hours after a meal.
Containers
The samples to be tested should be presented to the panelists in clean,
odorless, and tasteless containers.
Tasting Procedures
It is generally agreed that whether a panelist swallows the sample or
spits it out the result of the test is unchanged. However, the panelist should
be instructed to use the same method with each sample in each test.
Use crackers, white bread, celery, apples, or water to remove all traces
of flavor from the mouth between tasting samples of certain foods. If water
is used it should be at room temperature, as cold water reduces the effi-
ciency of the taste buds.
QUESTIONNAIRES
The simplest type of questionnaire has been found to be the most effi-
cient. Elaborate questionnaires divert the attention of the panelist and
complicate interpretations. No blanks should be put in the questionnaire
except those that apply to the panelist. The person analyzing the test
results should use separate summary sheets.
A new questionnaire should be prepared if the method or objective ofthe test changes. The increased reliability of the test results is worth the
time and effort needed to make up new questionnaires.
Questionnaires for some commonly used tests follow. Sample question-
naires with examples of their analyses are shown in Appendix I.
Figure 4. Panelist with samples and questionnaire.
10
Figures 5—12. Examples of trays prepared for the following tests: 5, triangle test;
6, duo»trio test; 7, paired comparison test; 8, ranking for difference; 9, multiple
comparison; 10, scoring for difference; 11, scoring for preference; 12, ranking
for preference.
11
Difference Tests
The triangle test — In the triangle test, three coded samples are pre-sented to the panelist. He is told that two samples are identical and he is
asked to indicate the odd one. See sample questionnaire on page 15.
The duo-trio test — In the duo-trio test, three samples are presented tothe taster, one is labeled R (reference) and the other two are coded. Onecoded sample is identical with R and the other is different. The panelist isasked to identify the odd sample. See sample questionnaire on page 17.
Paired comparison test — In the paired comparison test, a pair of codedsamples that represent the standard or control and an experimental treatment
are presented to the panelist, who is asked to indicate which sample has the
greater or lesser degree of intensity of a specified characteristic — such assweetness and hardness. If more than two treatments are being considered,
each treatment is compared with every other in the series. The design be-
comes somewhat cumbersome if many treatments are compared.
An example of a paired comparison test is given on page 31.
In most instances the value of a paired comparison can be increased
by a panelist's report on the extent of the difference found (20).
Paired testing is generally used to compare new with old procedures
and in quality control. When evidence from paired comparison experiments
is reported, do not conclude without further facts that a less preferred
treatment is of poor quality. The panelist should be asked to rate the quality
of each treatment as good, fair, or poor.
Ranking — The panelist is asked to rank several coded samples accord-ing to the intensity of some particular characteristic. See sample question-
naire on page 24.
Multiple comparison — In multiple comparison tests, a known referenceor standard sample is labeled R and presented to the panelist with severalcoded samples. The panelist is asked to score the coded samples in com-
parison with the reference sample. See sample questionnaire on page 19.
Scoring — Coded samples are evaluated by the panelist who records hisreactions on a descriptive graduated scale. These scores are given numerical
values by the person who analyzes the results. See sample questionnaires
on pages 27 and 30.
The flavor-profile method — The flavor-profile procedure was developedby Arthur D. Little Incorporated. A small laboratory panel of six or eightpeople who have been trained in the method measure the flavor profile of
food products. Descriptive words and numbers, with identical meaning to
each panel member, are used to show the relative strength of each note on a
suitable scale. With this method it is possible to determine small degrees
12
of difference between two samples, the degree of blending, degrees of sim-
ilarity, and overall impression of the product. Considerable knowledge of
flavor is required for the interpretation of flavor-profile results, since they
cannot be analyzed statistically. The flavor-profile method requires great
skill, extensive education in odor and flavor sensations, and keen interest
and intelligence on the part of the panelist. This technique has been re-
viewed in detail by Sjostrom (21) and Caul (4).
Dilution tests — Dilution tests involve the determination of the identi-fication threshold for the material under study. The flavor of a product is
described in terms of the percentage of dilution or as a ratio that reflects
the actual amount of odor or flavor detected. This method requires suitable
standards for comparison and for dilution of the test material and is limited
to foods that can be made homogeneous without affecting flavor (24).
Preference Tests
Paired comparison — The paired comparison test used in preferencetesting is similar to that used for difference testing. When testing prefer-
ences, the panelist is asked which sample he prefers and the degree of
preference. See sample questionnaire, page 37.
Scoring — Many different types of scales have been developed to try todetermine a degree of like or dislike for a food. These scales may be worded
"excellent, " "very good," "good," "poor," or in other similar manner.
However, the preference scale that has probably received the most attention
in the past 10 years is the nine-point hedonic scale developed at the Quar-
termaster Food and Container Institute of the United States. Much time and
effort was expended to determine which words best express a person's like
or dislike of a food. This hedonic scale is the result of these investigations
(14). See sample questionnaire on page 36.
Ranking — Ranking follows the same procedure as difference testing
(p. 12) except that when used as a preference test the questionnaire is
worded so that the panelist will indicate his order of preference for the
samples. See sample questionnaire on page 38.
DESIGN OF EXPERIMENTS AND METHODS OF ANALYZING DATA
The accuracy of sensory evaluation tests on food and the reliance that
can be placed on their results depend on standardization of testing conditions
and use of statistical methods of experimental design and analysis (22).
Plan the experiment in advance so that a simple mathematical model
may be applied to the analysis. Many simple mathematical models depend on
the independent nature of the data. Experimental design was introduced to
13
develop this independence and often makes a- simple mathematical modelapplicable. Experimental design makes the test efficient and saves time
and material. An excellent reference for experimental design has beenwritten by Cochran and Cox (5). Application of an experimental designshould be approved by a statistician.
Random choice of samples and order of presentation help eliminate error.Replication of tests will strengthen the results.
A discussion of the significance of experimental data is usually basedon a comparison of what actually happened to what would happen if chance
alone were operating. Before applying statistics to the analysis of experi-
mental da;a, the specific concept of probability should be understood as it
applies to particular experimental data. Appendix III explains the concept of
probability. The way in which an experiment is conducted usually determines
not only whether inferences can be made but also the calculations required
to make them.
CONSUMER TESTING
The consumer test is used to measure consumer acceptance of a prod-
uct. Although the fate of a food product depends on consumer acceptance,
formal studies of consumer preference are recent. Consumer studies are
completely separate from laboratory panels, which do not attempt to predict
consumer reaction. In this publication methods used in consumer tests are
not described. Ideally a consumer test should cover a large sample of the
population for whom the product is intended and should involve geographic
as well as income sampling. Because of these conditions, consumer tests
are often conducted by a specialized company.
14
APPENDIX I
SAMPLE QUESTIONNAIRES AND EXAMPLES OF ANALYSES
TRIANGLE TESTDIFFERENCE ANALYSIS
DATE TASTER
PRODUC T
Instructions: Here are three samples for evaluation. Two of these samplesare duplicates. Separate the odd sample for difference only.
(1) (2)
Sample Check odd sample
314
628
542
(3) Indicate the degree of difference between the duplicate samples and the
odd sample.
Slight Much
Moderate Extreme
(4) Acceptability:
Odd sample more acceptable
Duplicate samples more acceptable.
(5) Comments:
15
Example:
To determine if a difference existed between fish-potato flakes proc-essed under two different sets of conditions, a triangle test was used.
The samples were first reconstituted by adding boiling water, then
they were coded. On each tray there were three coded samples: two were
the same and one was different. Eleven panelists were given two trayseach, one after the other, and asked to identify the odd sample on each
tray. This made a total of 22 judgments.
The odd sample was correctly identified 19 times. According to Statis-
tical Chart 1 of Appendix II, page 39, 19 correct judgments from 22 panel-
ists in a triangle test are significant at the 0.1 percent level. The
conclusion was that a difference existed between the samples. If the
number of correct judgments had been less than 12, the conclusion would
be that no detectable difference existed between the samples.
The degree of difference indicated by those panelists who correctlychose the odd sample was:
Slight = 1 Much = 6
Moderate = 7 Extreme = 5
The next part of the triangle test was to choose the more acceptable
sample. Of the 19 panelists who correctly identified the odd sample, 14
found the same sample more acceptable. According to Statistical Chart 1,for a two-sample test (there were only two choices at this point), this is
below the number required for significance at the 5 percent level. However,
the same degree of significance cannot be attached to the results of this
secondary question as to the primary question. Once it is determined that
a difference exists, another test should be conducted to determine which
sample is more acceptable (probably a paired comparison test).
16
DUO-TRIO TESTDIFFERENCE ANALYSIS
NAME
DATE
PRODUCT
On your tray you have a marked control sample (R) and two coded
samples, one is identical with R, the other is different. Which of the coded
samples is different from R?
SAMPLES CHECK ODD SAMPLE
432
701
Example:
To determine if methional could be detected when added to Cheddar
cheese in amounts of 0.125 ppm and 0.250 ppm a duo-trio test was used.
Each tray had a control sample marked R and two coded samples, one
with methional added and one control. The duo-trio test was used in prefer-
ence to the triangle test because less tasting is required to form a judgment
using the duo-trio test. This fact becomes important when tasting a sub-
stance with a lingering aftertaste, such as methional.
The test was performed on two successive days using eight panelists.
Each day the panelists were presented with two trays: one with 0.125 ppm
and the other with 0.250 ppm methional added to a coded sample. This
made a total of 16 judgments at each level. The results are shown in
Table 1.
17
TABLE 1
Leve I of me thional added> PP 1Tl
Panelists list day 2nd day
0.125 0.250 0.125 0.250
PI X R R RP2 R R R RP3 X R X RP4 R X X RP5 R R R RP6 X R X XP7 R R R RP8 R R R R
Total 5 7 5 7
P = PanelistX = Wrong
R = Right
0.125 ppm = 10 out of 16 correct
0.250 ppm = 14 out of 16 correct
Consult Statistical Chart 1 of Appendix II, page 39, for 16 panelists
in a two-sample test, which shows that 14 correct judgments are significant
at the 1 percent level, while 10 are not significant, even at the 5 percent
level.
The conclusion is that methional added to Cheddar cheese can be
detected at the 0.250 ppm level but not at the 0.125 ppm level.
18
NAME
MULTIPLE COMPARISON
DIFFERENCE ANALYSIS
DATE
QUESTIONNAIRE:
You are receiving samples of . to compare for .
You have been given a reference sample, marked R, to which you are tocompare each sample. Test each sample; show whether it is better than,
comparable to, or inferior to the reference. Then mark the amount of dif-ference that exists.
Sample Number
Better than R
Equal to R
Inferior to R
AMOUNT OF DIFFERENCE
None
Slight
Moderate
Much
Extreme
COMMENTS:
Any comments you may have about the flavor of the samples may bemade here:
Example:
A multiple comparison test was conducted to determine how much anti-oxidant could be added to fish-potato flakes without tasters detecting a
19
difference in flavor. The flakes tested contained no antioxidant (0), 1unit, 2 units, 4 units, and 6 units of antioxidant. Rach tray contained a
reference sample labeled R that contained no antioxidant and five coded
samples (the four different levels of antioxidant and one sample with no
antioxidant). Fifteen panelists were asked to evaluate these samples
according to the score sheets on page 19. The ratings were given numerical
values 1 to 9 by the person analyzing the results with "no difference"
equaling 5, "extremely better than R" equaling 1, and "extremely inferiorto R" equaling 9. The analysis of variance was calculated as shown inTable 2 and on the following pages.
TABLE 2
r^^n f*li etcLevel of antioxidant added
Total1 dll ClloLa
1 Unit 2 Units 4 Units 6 Units
PI 1 4 5 1 9 20P2 3 3 5 5 7 23P3 7 3 4 4 7 25
P4 5 7 7 3 9 31P5 3 3 3 3 1 13P6 1 1 1 1 2 6P7 5 5 3 5 6 24P8 2 2 3 2 5 14P9 1 3 3 3 3 13P10 1 1 1 7 5 15Pll 6 5 1 4 1 17
P12 7 2 1 3 9 22
P13 3 2 3 2 6 16P14 3 3 1 5 1 13P15 3 1 5 3 3 15
Total 51 45 46 51 74 267
P = Panelist
Analysis of Variance
Correction factor = (Total) 2 /Number of responses (15 tasters x 5 samples)
CF = (267)V75= 71289/75= 950.52
Sum of squares, samples = (Sum of the square of the total for each sample/
Number of judgments for each sample) — CF
= [(512 + 452 + 462 + 512 + 742)/15] - CF
= (14819/15)- CF = 987.93 - 950.52
= 37.41
20
set up as follows:
ss MS
37.41 9.35
111.28 7.95
201.81 3.60
350.48
Sum of squares, panelists = (Sum of the square of the total for each panel-ist/Number of judgments by each panelist) — CF
= [(202 + 232 + 252 ... + 152)/5] - CF
= (5309/5)- CF= 1061.80- 950.52
= 111.28
Total sum of squares - Sum of the square of each judgment — CF
= (12 + 32 + 72 + . . . + 32)_ CF
= 1301.00- 950.52
= 350.48
The analysis of variance chart was then
Source of variance df SS MS F
Samples 4 37.41 9.35 2.59*
Panelists 14 111.28 7.95 2.21*
Error 56
Total 74
df — Degrees of freedom for samples is the number of samples minusone. There were five samples, so the degrees of freedom for samples
in this example is 4. Degrees of freedom for panelists is the number
of panelists minus one. There were 15 panelists so the df for this
source is 14. The df for total is the total number of judgments
minus 1 (75- 1 = 74).
Error— (l)To determine the df for "error" subtract the values obtained
for the other variables (in this case 4 for samples and 14 for
panelists) from the total, 74,
i.e. 74- (4 f 14)= 56.
(2) To determine the SS for "error" subtract the values obtained
for the other variables (in this case 37.41 for samples and
111.28 for panelists) from the total, 350.48,
i.e. 350.48- (37.41 + 111.28)= 201.81.
MS — The mean square for any variable is determined by dividing the SS
by its respective degree of freedom.
F — The variance ratio or F value for samples is determined by dividingthe MS for samples by the MS for error or 9.35/3.60 = 2.59. The F
value for panelists may be determined by dividing the MS for panel-
ists by the MS for error.
21
To determine if the difference between the samples is significant,
the calculated F value (2.59) is checked in Chart 2 of Appendix II on
pages 40 and 41. With 4 degrees of freedom in the numerator and 56 degrees
of freedom in the denominator, the variance ratio (F value) must exceed
2.52 to be significant at the 5 percent level and it must exceed 3.65 to be
significant at the 1 percent level. The value of 2.59 is therefore significant
at the 5 percent level (*). If the variance ratio is not significant, the
conclusion is that the addition of up to 6 units of antioxidant does not
make a detectable difference in the flavor of antioxidant.
Since there is a significant difference between the samples, the ones
that are" different can be determined by using Duncan's Multiple Range
Test (7, 16).
1 Unit 2 Units 4 Units 6 Units
Sample score = 51 45 46 51 74
Sample mean - Score/Number of
panelists = 51/15 45/15 46/15 51/15 74/15
= 3.4 3.0 3.1 3.4 4.9
The sample means are arranged according to magnitude:
A B C D E
6 Units 4 Units 2 Units 1 Unit
4 - 9 3.4 3.4 3.1 3.0
The standard error of the sample mean:
SE = y/ (MS error/Number of judgments for each sample)
= V (3.60/15)= V 0.24 = 0.49
The "shortest significant ranges" for 2, 3, 4, and 5 means are deter-
mined by using Chart 3 of Appendix II, on page 42, for the 5 percent level
of probability and Chart 4 for the 1 percent level. In this case to determine
the 5 percent level, Chart 3 is used to find the "Studentized ranges" rp for
p - 2, . . . 5 means with 56 degrees of freedom (since 56 is not shown, the
figure 60 is used).
These values are then multiplied by the standard error of the mean, to
obtain the shortest significant ranges, Rp. This gives:
P 2 3 4 5rp (5 percent) 2.83 2.98 3.08 3.14
Rp 1.39 1.46 1.51 1.54
22
The differences between the sample means are compared with the shortest
significant range appropriate for the range under consideration in the fol-
lowing order:
(i) highest minus the lowest, highest minus second lowest, and so
on to highest minus second highest;
(ii) second highest minus lowest and so on to second highest minus
third highest;
(iii) and so on down to second lowest minus lowest.
If, at any stage, a difference in (i) does not exceed the shortest sig-
nificant range, the procedure stops, Say, for example, the highest mean
minus the kth mean does not exceed the shortest significant range, then a
line is drawn underscoring all means between the highest and the kth means
inclusive, which indicates that this set of means should be grouped together
as exhibiting no significant differences.
Determination of shortest significant ranges
Sample Sample Sample
1
Sample
4
Sample
5
Range a = 2
b= 3
c= 4
d = 5
(i)
A-E= 4.9- 3.0= 1.9> 1.54 (R 5 )A- D= 4.9- 3.1= 1.8> 1.51 (R 4 )
A- C = 4.9- 3.4= 1.5 > 1.46 (R 3 )A- B= 4.9- 3.4= 1.5 > 1.39 (R 2 )
A is underscored as it is significantly different from the others.
A BC DE
(ii)
B- E= 3.4- 3.0= 0.4 < 1.51 (1 4 )Samples B, C, D, and E are underscored together as they exhibit nosignificant differences.
23
Proceed no further, since there is no significant difference between
B and E.
Therefore, the conclusion is that at the 5 percent level A is signif-
icantly different from E or that 6 units made a significant differencein flavor from 0. This procedure may be repeated using Chart 4 if the
samples were significantly different at the 1 percent level.
This procedure can be repeated to see which panelists are signifi-
cantly different from each other.
RANKING
DIFFERENCE ANALYSIS
NAME
DATE
PRODUCT
Evaluate these samples for tenderness. Please rank the samples for
tenderness. The sample that is tenderest is ranked first, the second tender-
est is ranked second, the toughest sample is ranked third. Place the code
number in the appropriate box.
Example:
A ranking test was used to compare the texture of meat of three breedsof geese. The cooked meat was cut into pieces % inch by % inch by 1
inch and one coded sample from each breed was presented to each of eight
panelists. These panelists ranked the samples according to the score
sheet above.
24
Results:
Ranks: B, B 2 B 3
PI 2 1 3
P2 2 1 3
P3 2 1 3P4 1 2 3P5 1 3 2
P6 2 1 3
P7 2 1 3P8 1 2 3
Total 13 12 23
P = PanelistB = Breed1 = First
2 = Second
3 = Third
To analyze these results the ranks were transformed into scores,according to Fisher and Yates (9). Chart 5 of Appendix II, page 44, was
used to determine the numerical value for each score. The sample ranked
first of three samples was given a value of 0.85. When converting ranks,
the middle rank is given a value of zero and the ranks beyond the middle
are given negative values corresponding to the positive values given in
the chart. In this case second is and third is —0.85. If we had six ranksthe values would be:
first = 1.27
second = 0.64
third = 0.20
fourth = -0.20
fifth = -0.64
sixth =-1.27
In Chart 5 values can be assigned to ranks in tests with up to 30 samples
in the manner described above.
Totalcores BH BP B c
PI 0.85 -0.85
P2 0.85 -0.85
P3 0.85 -0.85
P4 0.85 -0.85P5 0.85 -0.85
P6 0.85 -0.85
P7 0.85 -0.85
P8 0.85 -0.85
Total 2.55 3.40 -5.95
25
The scores were then analyzed by the analysis of variance, as on page 20.
CF =
SS samples = ([2.552 + 3.402 + (-5.95)2] /8) _. CF
= (53.465/8)-
= 6.68
SS panelists = 0/3 =
Total SS = [02 + 02 + 02 + 0.852 . . . + (-0.85) 2 ] - CF
= 11.56
Variables df SS MS F
Samples 2 6.68 3.34 9.54**
Panelists 7
Error 14 4.88 0.35
Total 23 11.56
Samples BH Bp B c
2.55 3.40 -5.95
Mean samples 0.32 0.43 -0.74
A B C
+ 0.43 0.32 -0.74
Standard error / (0.35/8) /0.04375
0.209
2 3
rp (5 percent) 3.03 3.18
RP 0.63 0.66
A - C = 0.43 - (-0.74) = 1.17 > 0.66 (R3 )
A- B= 0.43 -0.32 = 0.11 < 0.63 (R 2 )
A _B C
B-C=0.32 - (-0.74)= 1.06 > 0.63 (R 2 )C is significantly different from A and B.
The conclusion is that the meat from breed C was significantly less tender
than that of breeds H and P at the 5 percent level.
26
SCORING
DIFFERENCE TEST
NAME
DATE
PRODUCT
Evaluate these samples for flavor. Taste test each one. Use the ap-
propriate scale to show your evaluation and check the point that best
describes your feeling about the flavor of the sample.
Code
815
Excellent
.Very good
. Good
. Fair
Poor
Very poor
Code
558
Excellent
Very good
Good
Fair
Poor
Very poor
Code
394
Excellent
Very good
Good
Fair
Poor
Very poor
Reason Reason Reas on
27
Example:
Taste tests were conducted to determine any difference in flavor in
the meat of three breeds of geese. The scoring test was used rather thanmultiple comparison as there was no standard or reference with which to
compare. Eight panelists rated the coded samples according to the score
sheet on page 27. The ratings were given numerical values by the person
analyzing results with excellent = 1, very poor = 6.
The results were analyzed by the analysis of variance, see page 20.
Bj B 2 B 3 Total
PI 3 2 3 8P2 4 6 4 14P3 3 2 3 8P4 1 4 2 7P5 2 4 2 8P6 1 3 3 7P7 2 6 4 12
P8 Jl A _?_ 12.Total 18 33 23 74
Correction factor = 742/24= 5476/24= 228.17
SS samples = (1/8) (182 + 332 + 232) _ CF
= (1/8) x 1942- CF
= 242.75- 228.17
= 14.58
SS panelists = (1/3) (82 + I42 + . . . + 102) - CF
= (1/3) x 730- CF
= 243.33- 228.17
= 15.16
Total SS = (32 +42 + ... +22) -CF= 276- CF
= 276- 228.] 7
= 47.83
Variables df SS MS F
Samples 2 14.58 7.29 5.65*
Panelists 7 15.16 2.17
Error 14 18.09 1.29
Total 23 47.83
28
There is a significant difference between samples at the 5 percent
level.
The Multiple Range Test is used to determine which samples are
significantly different from the others.
B, B 2 B 3
Mean samples = 18/8 33/8 23/8
= 2.25 4.13 2.88
Ranked means = A B C
B 2 B 3 B!
4.13 2.88 2.25
Standard error = V (1.29/8)
= 0.4
= V 0.16
rp (5 percent) 3.03 3.18
Rp 1.21 1.27
A - C = 4.13 - 2.25 = 1.88 > 1.27 (R 3 )
A- B = 4.13- 2.88= 1.25 > 1.21 (R 2 )A is significantly different from B and C.
B- C = 2.88- 2.25 = 0.63 < 1.21 (R 2 )A B C
B and C are not significantly different from each other. The conclusionis that breed 2 is significantly different from breeds 1 and 3 at the 5 per-
cent level.
29
SCORING
DIFFERENCE TEST
NAME
DATE
Evaluate these samples of goose meat for tenderness. Taste test
each one. Use the appropriate scale to show your evaluation by checking
at the point that best describes your feeling about the sample.
Code
664
Extremely tender
Very tender
Moderately tender
Slightly tender
Slightly tough
Moderately tough
Very tough
Extremely tough
Code
758
Extremely tender
Very tender
Moderately tender
Slightly tender
Slightly tough
Moderately tough
Very tough
Extremely tough
Code
708
Extremely tender
Very tender
Moderately tender
Slightly tender
Slightly tough
Moderately tough
Very tough
Extremely tough
Reason Reason Reason
Note: This is another example of scoring for difference. Analysis of vari-
ance is used to analyze results (see page 20).
Numerical values: extremely tender = 1
extremely tough = 8
30
PAIRED COMPARISON
DIFFERENCE TEST
DATE TASTER
PRODUCT
Evaluate these two samples of peaches for texture.
1. Is there a difference in texture between the two samples?
Yes No
2. Indicate the degree of difference in texture between the two samples by
checking one of the following statements.
846 is extremely better than 165
846 is much better than 165
846 is slightly better than 165
No difference
165 is slightly better than 846
165 is much better than 846
165 is extremely better than 846
3. Rate the texture of the samples.
165 846
Good Good
Fair Fair
oor roorPoi
Comments:
31
One of the tests conducted as part of a study of the effect of small
doses of irradiation on the keeping quality of fresh peaches was a paired
comparison test on texture.
Four samples were compared:
(1) control or krads sample, (2) 150 krads sample, (3) 200 krads sample,
and (4) 250 krads sample.
Each sample was compared with every other sample, making a total of six
pairs. Each pair was presented to eight panelists for evaluation according
to the score sheet on page 31. The experimental design requires that half
the panelists taste one sample of the pair first and that the others taste
the second sample first.
The ratings of the panelists were given numerical values of +3, +2, +1,0,-1,-2,-3. Example:
Sample Code
150 krads 8461 Pair= 250 krads 165
The score sheet for the four panelists tasting sample 846 first was set up
as follows with the values on the right being assigned by the analyzer.
846 is extremely better than 165 (+3)
846 is much better than 165 (+2)846 is slightly better than 165 ( + 1)
No difference ( 0)
165 is slightly better than 846 (-1)
165 is much better than 846 (-2)
165 is extremely better than 846 (-3)
The four panelists who tasted sample 165 first received a score sheet set
up as follows with the corresponding values on the right.
165 is extremely better than 846 (+3)
165 is much better than 846 (+2)
165 is slightly better than 846 (+1)
No difference. ( 0)
846 is slightly better than 165 (-1)
846 is much better than 165 (-2)
846 is extremely better than 165 (-3)
If one of the four panelists tasting 165 first indicated that 846 was much
better than 165, his score was -2.
The results of the scoring for all six pairs by all tasters was tabulated
as shown in Table 3.
32
TABLE 3
Order of
presentation
Fre
-3
quency i
-2 -]
j{ scores
+1
equal
+2
to
+3
Total
scoreMean
Average
preference
0,150
150,0 3
2
1
1 1 1
-7
0.25
-1.75
1.00
0,200
200,0 2
1 1
1
2
1
«
5
-1
1.25
-0.25
0.75
0,250
250,0 2
1
2
2 1 8
-2
2.00
-0.50
1.25
150,200
200,150
1 1
1 1
2
2
-1
1
-0.25
0.25
-0.25
150,250
250,150 3
2 1
1
1 3
-2
0.75
-0.50
0.625
200,250
250,200
1 1 2
2 1 1
-3
3
-0.75
0.75
-0.75
Total 9 9 8 13 8 1
Mean = Total score/Number of panelists = 1/4 = 0.25
Average preferences = Vi (mean for 0,150 - mean for 150,0)
= Vi (0.25-(-1.75)) = % (2.00) = 1.00
The average preference of over 150 was 1.00 and the average preferenceof 150 over was -1.00. Thus the average preference of over 200 equals- the average preference of 200 over 0.
The above data were used to perform an analysis of variance, accord-ing to the method of Scheffe (20).
Main effects of treatments (&) were calculated by totaling the average
preference of each sample over every other sample and dividing by the
number of treatments.
&o = M (average preference of over 150 + average preference of over
200 + average preference of over 250)
= % (1.00 +0.75 + 1.25)
= % (3.00) =0.75
33
a 150= Va (-1.00 - 0.25 + 0.625) = -0.15625
a2Q0
= M (-0.75 + 0.25 - 0.75) = -0.3125
a 250 = Va (-1.25- 0.625 + 0.75) = -0.28125
The order effect (6) was calculated by totaling the mean for each order
of each pair and dividing this total by the number of ordered pairs (6 pairs
x 2 orders = 12).
6 = (1/12) (0.25-1.75 + 1.25 -0.25 + 2.00 -0.50 -0.25 + 0.25 +0.75 -0.50
-0.75 +0.75)
= 0.104
ANALYSIS OF VARIANCE TABLE
Variables df SS MS F
Main effects 3 24.4375 8.1458
Order effect 1 0.5192 0.5192
Error 44 74.0433 1.6828
4.84**
Total 48 99.0000
Sum of squares for main effects = number of panelists x number of treatmentsx sum of squares of each a
= 8 x 4 x [0.752 + (-0.15625) 2 + (-0.3125) 2 + (-0.28125)?] = 24.4375
Sum of squares for order effect = number of panelists x number of pairsx order effect {8 )
= 8 x 6 x 0.104 2 =0.5192
Total sum of squares (using the frequency of each score as shown in Table 3)
= 32(0 + 1) + 2
2(9 + 8) + 1
2(9 + 13) +
2(8)
= 99.00
Sum of squares for error = 99.00 - 24.4375 - 0.5192 = 74.0433
di-Main effects. There were 4 treatments, so the degrees of freedom = 3.
Order effect. There were 2 orders — one sample of pair first or the othersample of the pair first, df = 1.
Total. For paired comparison, the df for total is the total number of
observations = 48.
Error. 48 - 3 - 1 = 44.
MS for main effects = 24.4375/3 = 8.1458
MS for order effect = 0.5192/1 = 0.5192
MS for error = 74.0433/44 = 1.6828
34
The F ratio is determined for each variable by dividing its MS by theMS error. Consult Chart 2 on pages 40 and 41 to see if the F ratio is signi-ficant. This procedure is described on page 22. In our example, the main
effects are significantly different at the 1 percent level.
Use Tukey's Test to determine which samples are different. Because
Duncan's Multiple Range Test (7) has previously been used and explained
in this booklet (see pages 22-23), it is used here to determine which sam-
ples are different.
150 200 250
Average preferences
0.75
ai50
-0.15625
a200
-0.3125
a250
-0.28125
B D
ao
0.75
a150
-0.15625
a250
-0.28125
a200
-0.3125
S.E . =\/ (MS error/Number of judgments for each sample) = \/ (1.6828/24)
= y/ 0.070 =0.27
rp (5 percent)
RP
2
2.86
0.77
3
3.01
0.81
4
3.10
0.84
61o~ Soo = °-75 ~ (-0-3125) = 1.0625 > 0.84
250 = 0.75 - (-0.28125) = 1.03125 > 0.81
a - a150
= 0.75 - (-0.15625) = 0.90625 > 0.77
Siso
" a 200 = "0-15625 - (-0.3125) = 0.15626 < 0.81
a. a150
a250
a200
The control sample is significantly different from the other samples. Its
score is higher and therefore it would be considered to have better texture.
This does not mean necessarily that the other samples are of poor texture.
To determine the quality of the samples, the ratings given them by the
panelists were tabulated to see if they were considered to be good, fair, or
poor. An average score can be determined for each sample by assigning the
values good = 3, fair = 2, poor = 1, and computing the average for each
sample.
35
HEDONIC SCALE
SCORING
DATE TASTER
PRODUCT
Taste test these samples and check how much you like or dislike each
one. Use the appropriate scale to show your attitude by checking at the
point that best describes your feeling about the sample. Please give a
reason for this attitude. Remember you are the only one who can tell what
you like. An honest expression of your personal feeling will help us.
CODE
459
CODE
667
CODE
619
CODE
347
Like
extremely
Like
very much
Like
moderately
Like
slightly
Neither like
nor dislike
Dislike
slightly
Dislike
moderately
Dislike
very much
Dislike
extremely
REASON
Like
extremely
Like
very much
Like
moderately
Like
slightly
Neither like
nor dislike
Dislike
slightly
Dislike
moderately
Dislike
very much
Dislike
extremely
REASON
.Like
extremely
Like
very much
Like
moderately
.Like
slightly
Neither like
nor dislike
Dislike
slightly
. Dislike
moderately
_Dislike
very much
Dislike
ext remely
REASON
Like
extremely
Like
very much
Like
moderately
Like
slightly
Neither like
nor dislike
Dislike
slightly
.Dislike
moderately
Dislike
very much
Dislike
extremely
REASON
Note: To analyze the results of this test use analysis of variance (see
pages 20-23).
Numerical values: like extremely = 9
dislike extremel) = 1
36
PAIRED COMPARISON
PREFERENCE
DATE TASTER
PRODUCT
INSTRUCTIONS: (A) Here are two samples for evaluation. Please indicatewhich sample you prefer.
622 244
(B) Indicate the degree of preference between the two
samples.
Slight
Moderate
Much
Extreme
Example:
To determine which of two samples of Cheddar cheese (Number 1 or
Number 2) had more acceptable flavor, a paired comparison test was used.
The samples were coded and one sample of each cheese was presented on
each tray. Eight panelists were presented with three trays each, one after
the other. This made a total of 24 judgments.
Cheese Number 2 was preferred 17 times in the 24 judgments. Accord-
ing to Statistical Chart 1 of Appendix II, on page 39, in the two-sample
test, one sample must be preferred 18 times to be significantly more
acceptable. So we conclude that neither cheese was significantly more
acceptable for flavor.
37
RANKING
PREFERENCE
NAME
DATE
PRODUCT.
Please rank these samples according to your preference.
Code
First
Second
Third
Fourth
Note: The results are analyzed by converting ranks to scores and con-
ducting analysis of variance as shown for Ranking Difference Analysis,
page 24.
38
APPENDIX IISTATISTICAL CHART 1
Number
of
tasters
Two-sample test
number of concurring c
necessary to establ
significance
rhoices
lish
Triangle test
difference analysis,
number of correct answers
necessary to establish
significance
* ** *** * ** ***
1
2
3
4
5
— — —
3
4
5
—-
- - -
5
-
6 6 — — 5 6 —
7 7 — — 5 6 78 8 8 — 6 7 89 8 9 — 6 i 8
10 9 10 — 7 8 911 10 11 11 7 8 10
12 10 11 12 8 9 10
13 11 12 13 8 9 11
14 12 13 14 9 10 11
15 12 13 14 9 10 12
16 13 14 15 9 11 12
17 13 15 16 10 11 13
18 14 15 17 10 12 13
19 15 16 17 11 13 14
20 15 17 18 11 13 14
21 16 17 19 12 13 15
22 17 18 19 12 14 15
23 17 19 20 12 14 16
24 18 19 21 13 15 16
25 18 20 21 13 15 17
26 19 20 22 14 15 17
27 20 21 23 14 16 18
28 20 22 23 15 16 18
29 21 22 24 15 17 19
30 21 23 25 15 17 19
31 22 24 25 16 18 20
32 23 24 27 16 18 20
33 23 25 27 17 18 21
34 24 25 27 17 19 21
35 24 26 28 17 19 22
36 25 27 29 18 20 22
37 25 27 29 18 20 22
3 8 26 28 30 19 21 23
39 27 28 31 19 21 23
40 27 29 31 19 21 24
41 27 29 32 20 22 24
42 28 30 32 20 22 25
43 28 30 33 21 23 25
44 29 31 33 21 23 25
45 30 32 34 22 24 26
46 . 30 32 35 22 24 26
47 31 33 35 23 24 27
48 31 33 36 23 25 27
49 32 34 37 23 25 28
50 32 35 37 24 26 28
5 percent level of significance. ** 1 percent level. ***0.1 percent level.
39
STATISTICAL CHART 2
Variance Ratio — 5 Percent Points for Distribution of Fn^ — Degrees of freedom for numeratorn — Degrees of freedom for denominator
n2 \ 1 2 3 4 5 6 8 12 24 oo
1 161.4 199.5 215.7 224.6 230.2 234.0 238.9 243.9 249.0 254.3
2 18.51 19.00 19.16 19.25 19.30 19.33 19.37 19.41 19.45 19.50
3 10.13 9.55 9.28 9.12 9.01 8.94 8.84 8.74 8.64 8.53
4 7.71 6.94 6.59 6.39 6.26 6.16 6.04 5.91 5.77 5.63
5 6.61 5.79 5.41 5.19 5.05 4.95 4.82 4.68 4.53 4.36
6 5.99 5.14 4.76 4.53 4.39 4.28 4.15 4.00 3.84 3.67
7 5.59 4.74 4.35 4.12 3.97 3.87 3.73 3.57 3.41 3.23
8 5.32 4.46 4.07 3.84 3.69 3.58 3.44 3.28 3.12 2.93
9 5.12 4.26 3.86 3.63 3.48 3.37 3.23 3.07 2.90 2.71
10 4.96 4.10 3.71 3.48 3.33 3.22 3.07 2.91 2.74 2.54
11 4.84 3.98 3.59 3.36 3.20 3.09 2.95 2.79 2.61 2.40
12 4.75 3.88 3.49 3.26 3.11 3.00 2.85 2.69 2.50 2.3D
13 4.67 3.80 3.41 3.18 3.02 2.92 2.77 2.60 2.42 2.21
14 4.60 3.74 3.34 3.11 2.96 2.85 2.70 2.53 2.35 2.13
15 4.54 3.68 3.29 3.06 2.90 2.79 2.64 2.48 2.29 2.07
16 4.49 3.63 3.24 3.01 2.85 2.74 2.59 2.42 2.24 2.01
17 4.45 3.59 3.20 2.96 2.81 2.70 2.55 2.38 2.19 1.96
18 4.41 3.55 3.16 2.93 2.77 2.66 2.51 2.34 2.15 1.92
19 4.38 3.52 3.13 2.90 2.74 2.63 2.48 2.31 2.11 1.88
20 4.35 3.49 3.10 2.87 2.71 2.60 2.45 2.28 2.08 1.84
21 4.32 3.47 3.07 2.84 2.68 2.57 2.42 2.25 2.05 1.81
22 4.30 3.44 3.05 2.82 2.66 2.55 2.40 2.23 2.03 1.78
23 4.28 3.42 3.03 2.80 2.64 2.53 2.38 2.20 2.00 1.76
24 4.26 3.40 3.01 2.78 2.62 2.51 2.36 2.18 1.98 1.73
25 4.24 3.38 2.99 2.76 2.60 2.49 2.34 2.16 1.96 1.71
26 4.22 3.37 2.98 2.74 2.59 2.47 2.32 2.15 1.95 1.69
27 4.21 3.35 2.96 2.73 2.57 2.46 2.30 2.13 1.93 1.67
28 4.20 3.34 2.95 2.71 2.56 2.44 2.29 2.12 1.91 1.65
29 4.18 3.33 2.93 2.70 2.54 2.43 2.28 2.10 1.90 1.64
30 4.17 3.32 2.92 2.69 2.53 2.42 2.27 2.09 1.89 1.62
40 4.08 3.23 2.84 2.61 2.45 2.34 2.18 2.00 1.79 1.51
60 4.00 3.15 2.76 2.52 2.37 2.25 2.10 1.92 1.70 1.39
120 3.92 3.07 2.68 2.45 2.29 2.17 2.02 1.83 1.61 1.25
oo 3.84 2.99 2.60 2.37 2.21 2.09 1.94 1.75 1.52 1.00
40
STATISTICAL CHART 2 - Concluded
Variance Ratio — 1 Percent Points for Distribution of Fn^ — Degrees of freedom for numeratorrc2— Degrees of freedom for denominator
n 2 ^\1 2 3 4 5 6 8 12 24 so
1 4052 4999 5403 5625 5764 5859 5981 6106 6234 6366
2 98.49 99.00 99.17 99.25 99.30 99.33 99.36 99.42 99.46 99.50
3 34.12 30.81 29.46 28.71 28.24 27.91 27.49 27.05 26.60 26.12
4 21.20 18.00 16.69 15.98 15.52 15.21 14.80 14.37 13.93 13.46
5 16.26 13.27 12.06 11.39 10.97 10.67 10.29 9.89 9.47 9.02
6 13.74 10.92 9.78 9.15 8.75 8.47 8.10 7.72 7.31 6.88
7 12.25 9.55 8.45 7.85 7.46 7.19 6.84 6.47 6.07 5.65
8 11.26 8.65 7.59 7.01 6.63 6.37 6.03 5.67 5.28 4.86
9 10.56 8.02 6.99 6.42 6.06 5.80 5.47 5.11 4.73 4.31
10 10.04 7.56 6.55 5.99 5.64 5.39 5.06 4.71 4.33 3.91
11 9.65 7.20 6.22 5.67 5.32 5.07 4.74 4.40 4.02 3.60
12 9.33 6.93 5.95 5.41 5.06 4.82 4.50 4.16 3.78 3.36
13 9.07 6.70 5.74 5.20 4.86 4.62 4.30 3.96 3.59 3.16
14 8.86 6.51 5.56 5.03 4.69 4.46 4.14 3.80 3.43 3.00
15 8.68 6.36 5.42 4.89 4.56 4.32 4.00 3.67 3.29 2.87
16 8.53 6.23 5.29 4.77 4.44 4.20 3.89 3.55 3.18 2.75
17 8.40 6.11 5.18 4.67 4.34 4.10 3.79 3.45 3.08 2.65
18 8.28 6.01 5.09 4.58 4.25 4.01 3.71 3.37 3.00 2.57
19 8.18 5.93 5.01 4.50 4.17 3.94 3.63 3.30 2.92 2.49
20 8.10 5.85 4.94 4.43 4.10 3.87 3.56 3.23 2.86 2.42
21 8.02 5.78 4.87 4.37 4.04 3.81 3.51 3.17 2.80 2.36
22 7.94 5.72 4.82 4.31 3.99 3.76 3.45 3.12 2.75 2.31
23 7.88 5.66 4.76 4.26 3.94 3.71 3.41 3.07 2.70 2.26
24 7.82 5.61 4.72 4.22 3.90 3.67 3.36 3.03 2.66 2.21
25 7.77 5.57 4.68 4.18 3.86 3.63 3.32 2.99 2.62 2.17
26 7.72 5.53 4.64 4.14 3.82 3.59 3.29 2.96 2.58 2.13
27 7.68 5.49 4.60 4.11 3.78 3.56 3.26 2.93 2.55 2.10
28 7.64 5.45 4.57 4.07 3.75 3.53 3.23 2.90 2.52 2.06
29 7.60 5.42 4.54 4.04 3.73 3.50 3.20 2.87 2.49 2.03
30 7.56 5.39 4.51 4.02 3.70 3.47 3.17 2.84 2.47 2.01
40 7.31 5.18 4.31 3.83 3.51 3.29 2.99 2.66 2.29 1.80
60 7.08 4.98 4.13 3.65 3.34 3.12 2.82 2.50 2.12 1.60
120 6.85 4.79 3.95 3.48 3.17 2.96 2.66 2.34 1.95 1.38
oo 6.64 4.60 3.78 3.32 3.02 2.80 2.51 2.18 1.79 1.00
41
<Xu-J
<y
<
J
3 5?3^ C/3 S
o o o Cl CO r- IC ^. CO CM o CO m cmo o lO O 1 - NO lO -t "* CO CO CO CO CI
<Xu-I
<y
<
Du
en m E—
i
H v:U. e CO
D l>Q_ ^ GL3
c/5
S
lO On NO lO r- o -+" O to Ol CTn IO rf CO Ol ,_, ON NO IO 00o O o CO lO co CO o co r- m co Ol l-H o o o CO CO co t - I - f- 1 - t^ r- NO NO NO NOo o rf On r- no VO NO m IO m 10 IO IO lO to -+ rt rf rf rf rf rf rf rf rf rf rf rf rf1-1 o 1-1
in On NO IO r- O -+ ON to CM On to rf CO CM pH O NO rf Oo o o co lO CO CO o co r- U5 CO OJ '—
1
o o o cc CO co t - t - t - O- O. t- NO NO NO NOlO o
Onrf On f~ NO NO NO lO tO in LO LO tO lO m -f rf rf rf rf rt rf rf rf rf rf rf rf rf
tO o £ IO r- o -+ Cl\ IO O) On IO Ol r> r- IO On co 2 ^o O o CO 1/5 00 CO o CO t^ to n r^ IO CO r^ O m COco o o CO lO r^ co o CO r- lO CO Ol 1—1 o O On co co cc t - t - t - NO O NO to tO rf CO
oOn
rf On t-^ NO NO NO IO LO LO LO to to LO -f -f -+ rf rf rf rf rf if rf rf rf rf rf rf
CO rt c\ CO -f r- ^H NO Ol On NO ,_, 1^- IO CM ,_| rf r- CM rfVO O o Ol rt r^ Ol o r~ NO •H. CO Ol 1—
1
o o o CC CO f- t - i - NO NO NO NO to rf rf CO^H d
onrf1—
1
O t^ nO NO lO to m LO tO tO LO lO it -t -+ rf rf rf rf rf rf rf rf rf rf rf rfOl co r- CO o i+ co CO On ^r> c^ co •+ OJ O 00 r— •* CO *— NO NO lO lO 1/5 10 lO IO IO lO **-* rf rf rf rf rf rf rf rf rf rf rf rf rf
NO Tf co rt NO o -f O NO CM 0!N to CM en NO rt NO ON IO NOCM O o o CO NO '—
1
co NO lO CO CM 1—
i
o O o co co t- r- NO VO NO LO lO LO rf CO CO CM1—1 O
o\rf On t- NO NO lO lO LO in lO lO lO -* Tf -* * rf rt rf rf rt rf rf rf rf rf rf rf
CO in r- 00 ^ 1* ON to _ r- 1(0 ro r^ co ,_ m _ rf PA OO o o O co 10 o co lO -I- Ol 1—
i
O o O cc. 1 - [ - r- NO NO NO to LO to rf rf CO OJ CMoON
rf On t~- NO NO m IO m lO to LO »* * -* -* -* rf rf rf rf rf rf rf rf rf rf rf rf
rt o co I—
1
NO «* Ol Ol -t- 1^ ^H NO OJ co rf ,_, r^ cn NO o r-- CO Orf
O c_> NO o> 1—
1
NO CO OH On DO i - NO NO 1/5 IO rt *+ CO rt co Ol CM CM O o OnO -f CO NO vO m lO in -f rt -+ rt -f -f It -t -t rf rf rf rf -+ rf rf rf rf rf CO COo 1—1
>o —
i
CM o NO CO co 1/5 X CM r- rt O r~ it Ol r - rf , | 00 NO o. CM NO Oo o LO CO ON lO CM o CO t^ NO tO -* >* CO co CO 01 Ol Ol i—
1
1—
i
O O o O 00 COo rf CO NO LO m in in if rf rt rt Tf >+ rt rt rt Tf rt rf rf rf rf -* it ^o co CO COo r""t
NO l-H O -t CO rt o CO o 01 NO r-H r - m O r- IO CM Q> NO CO ,— (0\ OJ NO ^H rtCM
o © 01 CO r~ 01 c* r~ NO rt co CO Ol CM
STATISTICAL CHART 5
Scores for Ranked Data
The mean deviations of the 1 st , 2nc% 3fd . . . largest members of samples
of different sizes; zero and negative values omitted.
Ordinalnumber 2 3
Size
4
of Sam
5
pie
6 7 8 9 10
1 .56 .85 1.03 1.16 1.27 1.35 1.42 1.49 1.54
2 .30 .50 .64 .76 .85 .93 1.00
3 .20 .35 .47 .57 .66
4 .15 .27 .38
5 .12
11 12 13 14 15 16 17 18 19 20
1 1.59 1.63 1.67 1.70 1.74 1.76 1.79 1.82 1.84 1.87
2 1.06 1.12 1.16 1.21 1.25 1.28 1.32 1.35 1.38 1.41
3 .73 .79 .85 .90 .95 .99 1.03 1.07 1.10 1.13
4 .46 .54 .60 .66 .71 .76 .81 .85 .89 .92
5 .22 .31 .39 .46 .52 .57 .62 .67 .71 .75
6 .10 .19 .27 .34 .39 .45 .50 .55 .591-7
t .09 .17 .23 .30 .35 .40 .45
8 .08 .15 .21 .26 .31
9 .07 .13 .19
10 .06
21 22 23 24 25 26 27 28 29 30
1 1.89 1.91 1.93 1.95 1.97 1.98 2.00 2.01 2.03 2.04
2 1.43 1.46 1.48 1.50 1.52 1.54 1.56 1.58 1.60 1.62
3 1.16 1.19 1.21 1.24 1.26 1.29 1.31 1.33 1.35 1.36
4 .95 .98 1.01 1.04 1.07 1.09 1.11 1.14 1.16 1.18
5 .78 .82 .85 .88 .91 .93 .96 .98 1.00 1.03
6 .63 .67 .70 .73 .76 .79 .82 .85 .87 .89
7 .49 .53 .57 .60 .64 .67 .70 .73 .75 .78
8 .36 .41 .45 .48 .52 .55 .58 .61 .64 .67
9 .24 .29 .33 .37 .41 .44 .48 .51 .54 .57
10 .12 .17 .22 .26 .30 .34 .38 .41 .44 .47
11 .06 .11 .16 .20 .24 .28 .32 .35 .38
12 .05 .10 .14 .19 .22 .26 .29
13 .05 .09 .13 .17 .21
14 .04 .09 .12
15 .04
Tests of psychological preference and some other experimental datasuffice to place a series of magnitudes in order of preference, without supplying
metrical values. Analyses of variance, correlations, etc., can be carried out on
such data by using the normal scores, appropriate to each position in order, in
a sample of the size observed. Ties may be scored with the means of the ordinalvalues involved, but in such cases the sums of squares will require correction.
44
APPENDIX III
NOTES ON INTRODUCTORY STATISTICS 1
The purpose of these notes is to put in a written form the content of
a 3-hour discussion on some basic statistical concepts, held at the Food
Research Institute of the Department of Agriculture.
The term statistics includes the numerical descriptions (figures) of
the quantitative aspects of things, and the body of methods (subject) used
for making decisions in the face of uncertainty. The decisions may vary.
They may be concerned with estimating the probability of rain on a summer
day after considering the available meteorological data, whether or not to
accept a shipment of parts after only partial inspection, or how to set up anexperimental plan to test several varieties of geese for tenderness. These
notes deal with the concept of statistics as a body of methods.
Statistics is helpful to research workers in many ways, such as by
suggesting which to observe and how to observe it (Theory of Design ofExperiments aid Theory of Sampling) and how to summarize results in formsthat are comprehensible (Descriptive Statistics). Owing to experimental
error (chance circumstance), data and predictions cannot be expected to
agree exactly even if the scientist's theories are correct. Therefore, when
mathematical results from the Theory of Chance are applied to statistical
problems (Inferential Statistics), they help to draw conclusions from the
data.
SAMPLE AND POPULATION
Sample (data, observations) is the "number" that have been observed.
Population is the totality of all possible observations of the same kind.
Sample results vary by chance, and the pattern of chance variation
depends on the population from which the samples have been drawn. A
sample is not a miniature replicate of the population, so when decisions
about a population are based on a sample, it is necessary to make allowance
for the role of chance.
RANDOM SAMPLE
If a sample is to be representative of the population, each member of
the population should have an equal chance of being included in the sample.
But this requirement is not enough in itself to make up a random sample.
This appendix was prepared by Andres Petrasovits, Statistical Research Service,Canada Department of Agriculture, Ottawa.
45
A sample of size N is said to be random if each combination of N items inthe population has an equal chance of being chosen.
To stress the importance of the italicized words in the above definition,
consider the following:
Example: A college catalogue of students has 50 pages. A sample of 50students was taken by selecting at random one student from each page.
Note that this is not a random sample, because a sample including two
students listed on the same page has a zero chance of being chosen.
It must be pointed out that samples that are not strictly random are
often preferable to random samples on the grounds of convenience or of
increased precision (Stratification).
PARAMETER AND STATISTIC
Samples are taken to learn about the population being sampled. A
parameter is a quantitative characteristic of the population. A statistic
is a mathematical function of the sample values or a value computed from
the sample.
Example 1
Suppose we are interested in learning about the average weight of
adult Canadians. The true average weight (call it pt) of Canadians is
clearly impossible (physically and economically) to compute. It is possible,
however, to take a sample of 100 Canadians, register their weights, and
compute the mean of the sample weights. Let us denote this mean obtained
from the sample X, and suppose that, in this case, we obtained X = 141.5
pounds. As a first approximation it would be reasonable to regard 141.5
pounds as an estimate of fi, the true but unknown average weight for all
Canadians.
Then we say: X is a statistic computed from the sample, |i is a
parameter of the population, and X is an estimate of fi.
Example 2
If we are interested in the variability of weights, an indicator of
dispersion that is often used is the range (largest minus smallest). As with
the mean weights, the true range (call it R) of adult Canadian body weights
cannot be practically obtained, but we can use the sample of size 100
(from example 1) to obtain an estimate of the true range on the basis of the
range observed in the sample. Let us denote the latter by r.
Then we say: r is a statistic computed from the sample and R is a
parameter of the population: r is an estimate of R.
46
Example 3
A political poll is to be taken for Ontario to estimate the proportion
of Liberals. Clearly, the true proportion of Liberals (call it 77) would be
practically impossible to compute. We can, however, obtain the proportion
of Liberals from the sample taken (call it p) and regard p as an estimate of
77. Suppose that p was found to be 0.46, then we say that p = 0.46 is a
statistic computed from the sample, 77 is a parameter of the population,
and p is an estimate of 77.
It should be pointed out that many statistics can be computed from a
sample. In example 1, other statistics besides X may give an estimate of fi,for instance the most frequent value (mode) or the average of the extremes.
POPULATION OF SAMPLE MEANS
Consider the following finite population P: (1, 2, 3, 4). Note that two
summary measures of interest about P are the population mean \l = 2.5 and
the population range R = 4 - 1=3.
Let us form all possible samples of size 2, say, without replacement
from P and for each sample let us compute the sample mean. We have:
Sample X
(1, 2) 1.5
(1, 3) 2
(1, 4) 2.5
(2, 3) 2.5
(2, 4) 3
(3, 4) 3.5
The set of sample means
Px = (1.5, 2, 2.5, 2.5, 3, 3.5)is called the population of sample means of size 2 obtained from the
original population P.
ORIGINAL POPULATION: P POPULATION OF SAMPLE MEANS: P
:>1.5 2.5
2 35 3
When comparing the original population, P, with the population of sample
means of size 2, P^, it should be noticed that:
(i) the mean of the population of sample means is equal to the mean
of the original population,
47
(ii) the variability (dispersion) in the population of sample means is
smaller than the variability in the original population.
The above results are illustrated by the previous example.
For (i),
(1 + 2 + 3 + 4)/4 = (1.5 + 2 + 2.5 + 2.5 + 3 + 3.5)/6 = 2.5,
and for (ii), if we agree to use the range as a measure of variability, then
range for original population = 4 - 1 = 3, and
range for population of sample means = 3.5 - 1.5 = 2.
As an exercise for the reader it may be useful to examine the population of
sample means derived by taking all possible samples of size 3 from the
population:
P = (2, 3, 3, 7, 7, 0).
NORMAL DISTRIBUTION
Frequency Distribution
The pattern (characteristics) of a population or sample is shown by
grouping the data in a frequency table. This table is called the frequency
distribution of the population or the sample.
Example: The ages in years of 10 students in a classroom were:
21, 20, 21, 22, 20, 22, 21, 20, 23, 21.
Frequency distribution
20
21
22
23
Frequency
3
4
2
1
5
>- 4-
Z 7HI -'
S 2-1£ i
GRAPHICAL REPRESENTION OFTHE FREQUENCY DISTRIBUTION
21 22
AGE23
Characterization of the Normal Distribution
Roughly speaking, a normal distribution is a frequency distribution
whose graphical description is "bell-shaped." It must be noted that not
all bell-shaped distributions are normal.
A normal distribution is fully determined by two parameters: the mean
Qz) and the standard deviation (a). A full definition of the latter is available
in any introductory statistics textbook.
48
The physical meaning of the
quantities yt and a.INFLECTION
POINT
The mean, /x, establishes the "location" or "center" of the distribution.
The standard deviation, a, measures dispersion and is given by the hori-
zontal distance from xi to the inflection points of the normal curve. An
important characteristic of the normal distribution is its symmetry about the
mean xt.
NORMAL CURVES WITH DIFFERENT MEANS (LOCATIONS)AND SAME STANDARD DEVIATION (DISPERSION)
NORMAL CURVES WITH SAME MEAN (LOCATION)AND DIFFERENT STANDARD DEVIATIONS
These figures show how the mean (/x) and the standard deviation (a) affect
the appearance of the normal distribution.
Example
In a study of radioactivity, measurements and counts per minute were
registered with their frequency.
Frequency distribution
Counts
per
minute Frequency
9000 3
9100 20
9200 31
9300 54
9400 81
9500 112
9600 88
9700 64
9800 37
9900 14
10000 6
125 -
100"
75 -
? 50 *
25'
FREQUENCY DISTRIBUTION
4nnn I9000 ' 92009100
9600 ' 98009300 9500 9700 9900
COUNTS PER MINUTE
49
CENTRAL LIMIT THEOREM
The normal distribution arises from some natural populations (for
example, radioactivity counts) and from all populations of sample means(of large enough sample size). This is an important fact in the Theory of
Statistics and it is known as the Central Limit Theorem, which states that
as the sample size increases, the distribution of the population of sample
means becomes more like the "bell-shaped" normal distribution regardless
of the shape of the distribution of the original population.
Example: Consider the population:
P l= (0, 1, 2, 3, 4, 5)
and construct from it the population of sample means of size 2 (P2) and the
population of sample means of size 4 (P 4 ). From the frequency distributions
of P 1? P 2 , and P 4 construct graphical representations. It will be observedthat the shape of the graphical representation of P 2 looks more bell-shapedthan that of P
land that the shape of the graphical representation of P 4
looks more bell-shaped than that of P 2 .
Construction of P 2 Construe tion of P 4
Sample X Sample X Sample X Sample X
(0, 1) 0.5 (1,4) 2.5 (0, 1, 2, 3) 1.50 (0, 2, 3, 5) 2.50
(0, 2) 1.0 (0, 5) 3.0 (0, 1, 2, 4) 1.75 (0, 2, 4, 5) 2.75
(0, 3) 1.5 (2, 3) 2.5 (0, 1, 2, 5) 2.00 (0, 3, 4, 5) 3.00
(0, 4) 2.0 (2, 4) 3.0 (0, 1, 3, 4) 2.00 (1, 2, 3, 4) 2.50
(0, 5) 2.5 (2, 5) 3.5 (0, 1, 3, 5) 2.25 (1, 2, 3, 5) 2.75
(1,2) 1.5 (3,4) 3.5 (0, 1, 4, 5) 2.50 (1, 2, 4, 5) 3.00
(1, 3) 2.0 (3, 5) 4.0 (0, 2, 3, 4) 2.25 (1, 3, 4, 5) 3.25
(4, 5) 4.5 (2, 3, 4, 5) 3.50
Frequency distribution of Pl
/alue Fre quency
1
1 1
2 1
3 1
4 1
5 1
12 3 4VALUE
50
Frequency distribution of P 2
Sample
mean
0.5
1.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5
Frequency
1
1
2
2
3
2
2
1
1
f
3- -I
: 1
1
!
1
1
0.5 1.5 2.0 2.5 3.0 3.5 4.0 4.5
SAMPLE MEAN
Frequency distribution of P 4
Sample
mean Frequency
1.50 1
1.75 1
2.00 2
2.25 2
2.50 3
2.75 2
3.00 2
3.25 1
3.50 1
u H1.50 1.75 2.00 2.25 2.50 2.75 3.00 3.25 3.50
SAMPLE MEAN
The conclusions to be drawn from this illustration are:
(i) As the sample size increases, the frequency distribution of the
population of sample means approaches a normal distribution.
(ii) As the sample size increases, the variability in the population of
sample means diminishes.
INFERENTIAL CONCEPTS
Two inferential tools of great use in statistics are Tests of Hypothesisand Confidence Intervals. Although closelv related in their theoretical
basis, they serve different needs from an applied viewpoint.
Example:
Suppose a scientist is investigating the number of bacteria per gram in
frozen eggs. Suppose also that his main interest is in the mean number of
bacteria per gram (call it /i). Two situations can be visualized:
51.
(i) The scientist may have some theory of his own or he may have
other facts (another scientist's experiments) that lead him to
believe that /i = 150. He may decide to use his observations to
test if, in fact, fi = 150 is consistent with his experimental results.
(ii) Or the scientist may want to estimate the mean number from his
data and may not be concerned with checking any particular
theory. Situation (i) leads to Statistical Tests of Hypothesis and
situation (ii) leads to Confidence Intervals.
PROBABILITY
There are several interpretations of probability. Here we give the
approach referred to as frequentist. In a long series of throws with a coin,
heads and tails will occur approximately equally often. We then say that
the probability of a head (or tail) turning up is %. In a long series of
throws with a fair die, each of the six sides will occur in approximately
1/6 of the total number of throws and the probability for any of the numbers
is 1/6. Generally the concept of probability can be formulated by saying
that in a very long series of trials any event will tend to occur with a
relative frequency that is approximately equal to the probability of the
event.
TESTS OF HYPOTHESIS
An example borrowed from criminal legal procedure may help to
introduce some concepts.
Null hypothesis: H : defendant is innocent.
Alternative hypothesis: H a : defendant is guilty.
The jury observes the evidence and reaches a verdict. The two types of
error are:
Type I error: an innocent person is found guilty.
Type II error: a criminal is acquitted.
The type I error consists of rejecting the null hypothesis when it is true.
The type II error consists of accepting the null hypothesis when the
alternative is true.
An example of a statistical test of hypothesis is the following: A coin
is known to be either fair (ht) or two headed (hh). After a single toss of the
coin, a test of the hypothesis must be set up:
H : the coin is fair
versus H a : the coin is two headed.
52
The test of a statistical hypothesis consists of a decision rule to accept
or reject the null hypothesis (H ) on the basis of relevant statistical
information (outcome) of a single toss of the coin. There are many decision
rules that can be used to construct a test. Some are better (more reasonable)
than others. Here are three different decision rules, each of which can be
used as a statistical test for the null hypothesis that the coin is ht versus
the alternative that the coin is hh.
D|: Toss the coin and if t shows up accept H ; if h shows up rejectH -
D2 : Toss the coin and if t shows up accept H ; if h shows up accept
H„-
D 3 : Toss the coin and if it rains outside accept H ; if it does notrain outside reject H .
If we must choose one of the decision rules as a basis for our test, D 3 ,although it is a decision rule, can be disregarded on common sense grounds,
since it does not use experimental evidence. However, when Dj and D 2 arecompared, it is not clear which one is better. Therefore, a measure of the
goodness of the decision rule that is to be used as a basis for a statistical
test of hypothesis is needed.
Associated with a decision rule are two quantities: a = P (making atype I error) which reads "probability of making a Type I error" and /3 = "P
(making a type II error). Clearly, a desirable property for a statistical
test is that both a and /3 be as low as possible. Thus, the pair (a, /3)
provides a basis to measure the goodness of a statistical test of hypothesis.
Unfortunately, the size of both a and /3 is not enough by itself to select
the best one of several available tests of hypothesis. This point will be
illustrated by computing a and /3 for decision rules D t and D 2 in the coinexample.
ForD^a = P (making a type I error)
= P( rejecting H when it is true)= P (h shows up with a fair coin)
1= 2
/3 = P (making a type II error)= P (accepting H when H a is true)= P (t shows up with a two-headed coin)=
Therefore for Dj, a = A and /3 =
For D 2 ,a = P (making a type I error)
= P (rejecting H when it is true)= 0, because according to D 2 we never reject H
53
/3 = P (making a type II error)= P (accepting H when H a is true)= P (heads or tails show up when coin is 2h)= 1, because heads will always appear, so the event (heads or tails)
will always occur.
Therefore:
a
D 2 1
The above table shows that the decision rule (D j or D 2 ) that should beused cannot be chosen only by considering the probabilities a and /3 that
correspond to each decision rule because their relative importance is
unknown. Consideration of the economic importance of the type I and type II
errors is necessary when choosing which decision rule to use for a parti-
cular situation. Further discussion of this point is beyond the scope of
these notes.
The probability of making a type I error is called the level of signifi-
cance of the test, while the quantity, 1 minus P (type II error), is referred
to as the power of the test. In statistical methodology, if no economic
criteria are available to weigh the importance of the type I and type II
errors, the usual way to select a test is to fix the level of significance
arbitrarily, say 0.05, to consider only the decision rules for which the level
of significance is 0.05, and then to choose the decision rule for which the
power is maximum.
CONFIDENCE INTERVALS
Research workers often have to estimate parameters. An estimate of a
parameter given by the corresponding statistic is unlikely to be precisely
equal to the parameter, so it is necessary to show the margin of variability
to which it is subject. A way to do this is to specify an interval within
which we may be "confident" that the parameter lies. Such an interval is
called a confidence interval.
Example: A sample of 10 adult Canadians is taken to estimate the mean
Canadian adult weight (fi).
POPULATIONSAMPLE
"\ 198 192\ 178 164
^SAMPLE \_1 ^ 158 190' ) 200 200/ 165 160
54
If it is assumed that Canadian adult weights are normally distributed,
by using a technique available in any textbook on statistics, the 95%confidence interval for n can be computed from the sample and is 160.75,
190.25. The meaning of a confidence interval is often misunderstood. It
does not mean that the probability is 0.95 that the population mean (//)lies between 160.75 and 190.25. It means that if many random samples
of size 10 are taken from the population of adult Canadian weights and for
each sample a 95% confidence interval for the mean is computed according
to the above technique, then, in a very long series 95% of the confidence
intervals so computed will contain the true mean.
REFERENCES
1. Amerine, M. A., and C. S. Ough. 1964. The Sensory Evaluation of
Californian Wines. Lab. Pract. Vol. 13, No. 8.
2. Baker, R. A. 1964. Taste and Odour in Water. Lab. Pract. Vol. 13,
No. 8.
3. Bengtsson, K., and E. Helm. 1953. Principles of Taste Testing.
Wallerstein Laboratories Communications.
4. Caul, J. F. 1957. The Profile Method of Flavor Analysis. Adv. in Food
Research, 7: 1.
5. Cochran, W.G., and G.M. Cox. 1957. Experimental Designs. John Wiley
and Sons, New York, N.Y.
6. Dawson, E. H., J.L. Brogdon, and S. McManus. 1963. Sensory Testing
of Differences in Taste. Food Technol. Vol. 17, No. 9.
7. Duncan, D. B. 1955. Multiple Range and Multiple F Tests. Biometrics,Vol. II.
8. Eindhoven, J., D. Peryam, F. Heiligman, and G. A. Baker. 1964.
Effect of Sample Sequence on Food Preference. J. Food Sci.
Vol. 29, No. 4.
9. Fisher, R. A., and F. Yates. 1942. Statistical Tables. Oliver and
Boyd Ltd., Edinburgh and London.
10. Kramer, A., J. Cooler, M. Modery, and B.A. Twigg. 1963. Numbers of
Tasters Required to Determine Consumer Preference for Fruit
Drinks. Food Technol. Vol. 17, No. 3.
11. Lowe, B. 1963. Experimental Cookery, John Wiley and Sons. New York,N.Y.
12. Pangborn, R. M. V. 1964. Sensory Evaluation of Food at the University
of California. Lab. Pract. Vol. 13, No. 7.
55
13. Pettit, L. A. 1958. Informational Bias in Flavor Preference Testing.
Food Technol. Vol. 12, No. 1.
14. Peryam, D. R. 1964. Sensory Testing at the Quartermaster Food and
Container Institute. Lab. Pract. Vol. 13, No. 7.
15. Read, D. R. 1964. A Quantitative Approach to the Comparative Assess-ment of Taste Quality in the Confectionery Industry. Biometrics,
20: 143-155.
16. Robinson, P. 1959. Tests of Significance. Bulletin of Statistical
Research Service, Canada Department of Agriculture.
17. Sather, L. A. 1958. Laboratory Flavor Panels. Oregon Agricultural
Experiment Station Paper No. 56.
18. Sather, L. A., and L. D. Calvin. 1960. The Effect of Number of
Judgments in a Test on Flavor Evaluations for Preference. Food
Technol. Vol. 14.
19. Sather, L. A., L. D. Calvin, and A. Tomsma. 1963. Relation of Prefer-
ence Panels and Trained Panel Scores on Dry Whole Milk. J. Dairy
Sci.46.
20. Scheffe, H. 1952. An Analysis of Variance for Paired Comparisons.
J.Am. Statist. Ass. 47: 381-400.
21. Sjostrom, L. B., S. E. Cairncross, and J. F. Caul. 1957. Methodology
of the Flavor Profile. Food Technol. Vol. 11, p. 20.
22. Snedecor, G. W. 1956. Statistical Methods. Iowa State College Press,
Iowa State College, Ames.
23. Tilgner, D. J. 1962. Anchored Sensory Evaluation Tests — A StatusReport. Food Technol. Vol. 16, No. 3.
24. Tilgner, D. J. 1962. Dilution Tests for Odor and Flavor Analysis.