64
Methods for Sensory Evaluation of Food SEP 2 4 1973 Agriculture Canada I* ;vised 630.4 C212 P1284 1970 (1973print) c.2 •••'•-"' HnniHSHKu SNHGB nra m PUBLICATION 1284 -•. •.-•'; -6?% I- -A* H

Methods for sensory evaluation of food - Internet Archive · 2012. 3. 1. · Methods for Sensory Evaluation of Food SEP241973 Agriculture I* Canada;vised 630.4 C212 P1284 1970 (1973print)

  • Upload
    others

  • View
    4

  • Download
    1

Embed Size (px)

Citation preview

  • MethodsforSensoryEvaluationofFood

    SEP 2 4 1973Agriculture

    CanadaI*

    ;vised

    630.4

    C212P12841970(1973print)

    c.2

    •••'•-"'

    HnniHSHKu

    SNHGBnra

    m

    PUBLICATION 1284

    -•.•.-•';

    -6?%

    I- -A*H

  • PUBLICATION 1284 1970

    METHODS FOR SENSORY EVALUATION OF FOOD

    ELIZABETH LARMONDFood Research Institute, Central Experimental Farm, Ottawa

    CANADA DEPARTMENT OF AGRICULTURE

  • Copies of this publication may be obtained from

    INFORMATION DIVISIONCANADA DEPARTMENT OF AGRICULTUREOTTAWAK1A0C7

    ©INFORMATION CANADA. OTTAWA. 1973

    Printed 1967Revised 1970Reprinted 1971, 1973

    Code No.: 3M-3651 3-9:73Cat. No.: A53-1284

  • CONTENTS

    Introduction 3

    Types of Tests 5

    Samples and Their Preparation 5

    Panelists 7

    Testing Conditions 8

    Questionnaires 10

    Design of Experiments and Methods of Analyzing Data .... 13

    Consumer Testing 14

    Appendix I - Sample Questionnaires and Examples of

    Analyses

    Triangle Test Difference Analysis 15

    Duo-Trio Test Difference Analysis 17

    Multiple Comparison Difference Analysis 19

    Ranking Difference Analysis 24

    Scoring Difference Test 27

    Paired Comparison Difference Test 31

    Hedonic Scale Scoring 36

    Paired Comparison Preference 37

    Ranking Preference 38

    Appendix 1 1 - Statistical Chart 1 39

    Statistical Chart 2 - F Distribution 40

    Statistical Chart 3 - Studentized Ranges, 5 percent ... 42

    Statistical Chart 4 - Studentized Ranges, 1 percent ... 43

    Statistical Chart 5 - Scores for Ranked Data 44

    Appendix IN - Notes on Introductory Statistics by Andres

    Petrasovits 45

    References 55

    Additional Sources of Information 56

  • Digitized by the Internet Archive

    in 2012 with funding from

    Agriculture and Agri-Food Canada - Agriculture et Agroalimentaire Canada

    http://www.archive.org/details/methodsforsensorOOIarm

  • INTRODUCTION

    A sensory evaluation is made by the senses of taste, smell, and touch

    when food is eaten. The complex sensation that results from the interaction

    of our senses is used to measure food quality in programs for quality con-

    trol and new product development. This evaluation may be carried out by

    one person or by several hundred.

    The first and simplest form of sensory evaluation is made at the bench

    by the research worker who develops the new food products. He relies on

    his own evaluation to determine gross differences in products. Sensory

    evaluation is conducted in a more formal manner by laboratory and consumer

    panels.

    Most aspects of quality can be measured only by sensory panels,

    although advances are being made in the development of objective tests

    that measure individual quality factors. Instruments that measure texture

    are probably the best known. Some examples are the L.E.E.-Kramer shear

    press and the Warner-Bratzler shearing device. Gas chromatography and

    mass spectrometry enable odor to be measured to a limited extent. The

    color of foods can be accurately measured by tristimulus colorimetry. As

    new instruments are developed to measure quality, sensory evaluation will

    be used to prove and standardize new objective tests.

    When people are used as a measuring instrument, it is necessary to

    rigidly control all testing methods and conditions to overcome errors caused

    by psychological factors. "Error" is not synonymous with mistakes, but

    may include all kinds of extraneous influences. The physical and mental

    condition of the panelist and the influence of the testing environment affect

    sensory tests. For example, some people may have more flavor acuity in

    the morning, others in the afternoon. Even the weather can influence the

    disposition of panelists.

    Small panels are used to test the palatability of foods. They may also

    be used in preliminary acceptance testing. Laboratory panels may be used

    to determine:

    the best processing procedures,

    suitable varieties of raw material,

    preferable cooking and processing temperatures,

    effect of substituting one ingredient for another,

    best storage procedures,

    effect of insecticides and fertilizers on flavor of foods,

    effect of animal feeds on flavor and keeping quality of meat,

    optimum size of pieces and their importance,

    effect of color on acceptability of foods,

    suitable recipes for the use of new products,

    comparison with competitors' products.

  • The author is grateful to Mr. A. Petrasovits, Statistical Research

    Service, Canada Department of Agriculture, who wrote Appendix III.

    Statistical Chart 1 has been reprinted with permission from Wallerstein

    Laboratory (Bengtsson, K. 1953. Wallerstein Lab. Commun. 16 (No. 54):

    231—251.). Thanks are due to the Literary Executor of the late Sir Ronald

    A. Fisher, F.R.S., Cambridge, to Dr. Frank Yates, F.R.S. Rothamstead,

    and to Messrs. Oliver and Boyd Ltd., Edinburgh, for permission to reprint

    in part Tables V (Statistical Chart 2) and XX (Statistical Chart 5) fromtheir book Statistical Tables for Biological, Agricultural and Medical

    Research. Thanks are also due to Dr. Malcolm Turner for permission to

    reprint "Multiple Range and Multiple F Tests" (Statistical Charts 3 and 4)by D. B. Duncan from Biometrics, Volume II, 1955.

    4

  • TYPES OF TESTS

    The two types of tests are difference tests and preference tests.

    Difference Tests

    In difference tests the members of the panel are merely asked if a

    difference exists between two or more samples. Individual likes and dis-

    likes are disregarded and each panelist is advised to be objective in his

    evaluation. He may not like a particular product, but he should be taught

    what constitutes good and poor quality and try to evaluate it on the basis

    of the instructions received. He is acting as a quality measuring instru-

    ment.

    Preference Tests

    Preference or acceptance tests determine representative population

    preferences, and need many people on the panel. The total scores from

    trained panels can be used to predict preference scores obtained from

    panels of 100 to 160 untrained persons (19). Some tests are conducted on

    a national scale by firms specializing in this form of testing. Even after

    extensive tests, there is no assurance that the results will apply to the

    total population.

    SAMPLES AND THEIR PREPARATION

    Panel members are usually influenced by all the characteristics of the

    test material. Therefore, test samples should be prepared and served as

    uniformly as possible (6).

    Information About Samples

    As little information as possible about the test should be given to the

    panelists, since this information may influence results. It has been found

    (13) that if information given to the panelists has meaning within the terms

    of their experience it will influence their responses; panelists will taste

    what they expect to taste. When panelists were told that high-quality raw

    products were used, the panel preference was high, whereas the knowledge

    that low-quality raw products were used lowered the panel preference. The

    information that the food had been treated with unspecified chemicals and

    exposed to unspecified sterilizing rays did not alter panel findings.

    Temperature of Samples

    The samples must be of uniform temperature. Therefore the mechanical

    problems of serving foods at a constant and uniform temperature should be

  • carefully considered. The temperature at which food is usually eaten is

    recommended for samples. However, taste buds are less sensitive to very

    high or very low temperatures, which impair full flavor perception.

    Sample Uniformity

    To measure flavor differences in products of large unit size, such ascanned peach halves, slice the large units into smaller pieces and care-

    fully mix them to obtain a more uniform sample. To test the quality of cannedjuice, open several cans and mix all the juice together before preparing

    individual test samples.

    Coding

    Samples should be coded in such a manner that the judges cannot

    distinguish the samples by the code or be influenced by code bias. For

    example, if the samples are numbered 1, 2, 3, or lettered A, B, C, a coding

    bias could be caused because people associate 1 or A with "first" or"best" and might tend to score this sample higher. A set of three-digitrandom numbers should be assigned to each sample so that the panelists

    will receive samples coded differently (17).

    Number of Samples

    To determine the number of samples to be presented at one testingsession (12) consider the following:

    The nature of the product being tested — No more than six samples ofice cream should be evaluated because of the temperature of the product.

    The itensity and complexity of the sensory property being judged —

    It has been found (18) that with mild products such as green beans and

    canned peaches up to 20 samples may be tasted with no decrease in the

    taster's ability to discriminate.

    The experience of the taster — A professional tea, wine, or coffeetaster can evaluate hundreds of samples in one day.

    The amount of time and commodity available.

    Order of Presentation

    The order in which the samples are presented to the judges may also

    influence results. Studies on the effect of sample sequence on food prefer-

    ence (8) have shown position effect (the later samples were rated lower);

    contrast effect (serving "good" samples first lowered the ratings for "poor"

    samples); and convergence effect (tendency to make similar responses to

    successive stimuli). Contrast and convergence effects were shown to be

    independent of position effect. Random presentation to the panelists

    equalized these effects.

  • Figure 1. Preparation of samples.

    Figure 2. Presentation of samples through hatch.

    PANELISTS

    For economical reasons choose panel members from all available person-

    nel, including office, plant, and research staff. It should be considered a

    part of work routine for personnel in the food industry to serve as panelists.

    However, no one should be asked to evaluate foods to which he objects.

    Persons concerned with the test product (product development and produc-

    tion) and those who prepare the samples for testing should not be included

    on the panel.

    Panel members should be in good health and should excuse themselves

    if they are suffering from a cold. Nonsmokers and smokers have been found

    equally useful for panels. It is inadvisable for smokers to smoke within one

    to two hours before a test (3). Heavy smokers, people who smoke one or

    more packs of cigarettes per day, have been found to be generally less

    sensitive than nonsmokers (2). There are exceptions however. No correlationexists between age and sensitivity. A certain amount of sensitivity isrequired, but this seems not as important as experience (1). A person ofaverage sensitivity, a high degree of personal integrity, ability to concen-

    trate, intellectual curiosity, and willingness to spend time in evaluation

    may do a better job than a careless person with extreme acuity of taste and

    smell.

  • To keep the panelists interested in the work that is being done, showthem the results when each series of tests is completed. The importanceof the work can be shown by running the tests in a controlled, efficient

    manner.

    Selection and Training

    Panelists should be selected for their ability to detect differences.

    People who do well with some products often do poorly on others. A tasteris seldom equally proficient in tasting all foods.

    Researchers disagree on the value of training panelists. It has been

    stated (17) that screening tests could be used to choose panelists who are

    capable of detecting differences, and actual training would be unnecessary.

    At the Operational Research Group of Cadbury Bros. Ltd., in Bournville,

    England, panelists with sufficiently discriminating palates are selected on

    the basis of a series of triangular tests with typical confectionery materials.

    The selected panelists are given 10 weeks training on special training

    panels and finally they attend the formal sessions of a tasting panel for

    4 weeks before their assessments are actually used (15). Certainly a specific

    panel for the product and method being tested is more useful than a general-

    purpose panel. Panelists must "be familiar with the product and must know

    what constitutes good quality in the product. Preliminary sessions will

    help to clarify the meaning of descriptive terms. To be of any real use terms

    should be referable to specific objective standards (23).

    Number of Panelists

    Since there are many sources of variation in sensory tests, the more

    tasters on the panel, the more likely it is that the individual variations will

    balance. Four panelists is probably the minimum number (11), although most

    researchers think there should be eight or ten (10). A small panel of highsensitivity and ability to differentiate may be preferable to a large panel of

    less sensitivity.

    TESTING CONDITIONS

    Testing Area

    The room where the tests are run may be simple or elaborate, but the

    panelists must be independent of each other, in separate booths or in parti-

    tioned sections at a large table. The panelists must be free from distractions

    such as noise.

    If possible, the room should be odor free and separate from, though

    adjacent to, the area where the sample is prepared. Air-conditioning is

    useful in the testing area. The walls should be off-white or a light neutralgray so that sample color will not be altered.

    8

  • Figure 3. Taste panel area.

    Lighting

    Uniform lighting is essential. Natural white fluorescent lights are more

    suitable than cool white lights. Red fluorescent lights are used in the test-

    ing area of the Food Research Institute, Ottawa, to hide obvious color

    differences in samples whose flavor is being evaluated.

    Testing Schedule

    The time of day that tests are run influences results. Although this

    cannot be controlled if the number of tests is large, late morning and

    midafternoon have been found to be the best times for testing. Since a

    panelist's eating habits affect test results, it is desirable that no tests be

    performed in the period from 1 hour before a meal to 2 hours after a meal.

    Containers

    The samples to be tested should be presented to the panelists in clean,

    odorless, and tasteless containers.

    Tasting Procedures

    It is generally agreed that whether a panelist swallows the sample or

    spits it out the result of the test is unchanged. However, the panelist should

    be instructed to use the same method with each sample in each test.

  • Use crackers, white bread, celery, apples, or water to remove all traces

    of flavor from the mouth between tasting samples of certain foods. If water

    is used it should be at room temperature, as cold water reduces the effi-

    ciency of the taste buds.

    QUESTIONNAIRES

    The simplest type of questionnaire has been found to be the most effi-

    cient. Elaborate questionnaires divert the attention of the panelist and

    complicate interpretations. No blanks should be put in the questionnaire

    except those that apply to the panelist. The person analyzing the test

    results should use separate summary sheets.

    A new questionnaire should be prepared if the method or objective ofthe test changes. The increased reliability of the test results is worth the

    time and effort needed to make up new questionnaires.

    Questionnaires for some commonly used tests follow. Sample question-

    naires with examples of their analyses are shown in Appendix I.

    Figure 4. Panelist with samples and questionnaire.

    10

  • Figures 5—12. Examples of trays prepared for the following tests: 5, triangle test;

    6, duo»trio test; 7, paired comparison test; 8, ranking for difference; 9, multiple

    comparison; 10, scoring for difference; 11, scoring for preference; 12, ranking

    for preference.

    11

  • Difference Tests

    The triangle test — In the triangle test, three coded samples are pre-sented to the panelist. He is told that two samples are identical and he is

    asked to indicate the odd one. See sample questionnaire on page 15.

    The duo-trio test — In the duo-trio test, three samples are presented tothe taster, one is labeled R (reference) and the other two are coded. Onecoded sample is identical with R and the other is different. The panelist isasked to identify the odd sample. See sample questionnaire on page 17.

    Paired comparison test — In the paired comparison test, a pair of codedsamples that represent the standard or control and an experimental treatment

    are presented to the panelist, who is asked to indicate which sample has the

    greater or lesser degree of intensity of a specified characteristic — such assweetness and hardness. If more than two treatments are being considered,

    each treatment is compared with every other in the series. The design be-

    comes somewhat cumbersome if many treatments are compared.

    An example of a paired comparison test is given on page 31.

    In most instances the value of a paired comparison can be increased

    by a panelist's report on the extent of the difference found (20).

    Paired testing is generally used to compare new with old procedures

    and in quality control. When evidence from paired comparison experiments

    is reported, do not conclude without further facts that a less preferred

    treatment is of poor quality. The panelist should be asked to rate the quality

    of each treatment as good, fair, or poor.

    Ranking — The panelist is asked to rank several coded samples accord-ing to the intensity of some particular characteristic. See sample question-

    naire on page 24.

    Multiple comparison — In multiple comparison tests, a known referenceor standard sample is labeled R and presented to the panelist with severalcoded samples. The panelist is asked to score the coded samples in com-

    parison with the reference sample. See sample questionnaire on page 19.

    Scoring — Coded samples are evaluated by the panelist who records hisreactions on a descriptive graduated scale. These scores are given numerical

    values by the person who analyzes the results. See sample questionnaires

    on pages 27 and 30.

    The flavor-profile method — The flavor-profile procedure was developedby Arthur D. Little Incorporated. A small laboratory panel of six or eightpeople who have been trained in the method measure the flavor profile of

    food products. Descriptive words and numbers, with identical meaning to

    each panel member, are used to show the relative strength of each note on a

    suitable scale. With this method it is possible to determine small degrees

    12

  • of difference between two samples, the degree of blending, degrees of sim-

    ilarity, and overall impression of the product. Considerable knowledge of

    flavor is required for the interpretation of flavor-profile results, since they

    cannot be analyzed statistically. The flavor-profile method requires great

    skill, extensive education in odor and flavor sensations, and keen interest

    and intelligence on the part of the panelist. This technique has been re-

    viewed in detail by Sjostrom (21) and Caul (4).

    Dilution tests — Dilution tests involve the determination of the identi-fication threshold for the material under study. The flavor of a product is

    described in terms of the percentage of dilution or as a ratio that reflects

    the actual amount of odor or flavor detected. This method requires suitable

    standards for comparison and for dilution of the test material and is limited

    to foods that can be made homogeneous without affecting flavor (24).

    Preference Tests

    Paired comparison — The paired comparison test used in preferencetesting is similar to that used for difference testing. When testing prefer-

    ences, the panelist is asked which sample he prefers and the degree of

    preference. See sample questionnaire, page 37.

    Scoring — Many different types of scales have been developed to try todetermine a degree of like or dislike for a food. These scales may be worded

    "excellent, " "very good," "good," "poor," or in other similar manner.

    However, the preference scale that has probably received the most attention

    in the past 10 years is the nine-point hedonic scale developed at the Quar-

    termaster Food and Container Institute of the United States. Much time and

    effort was expended to determine which words best express a person's like

    or dislike of a food. This hedonic scale is the result of these investigations

    (14). See sample questionnaire on page 36.

    Ranking — Ranking follows the same procedure as difference testing

    (p. 12) except that when used as a preference test the questionnaire is

    worded so that the panelist will indicate his order of preference for the

    samples. See sample questionnaire on page 38.

    DESIGN OF EXPERIMENTS AND METHODS OF ANALYZING DATA

    The accuracy of sensory evaluation tests on food and the reliance that

    can be placed on their results depend on standardization of testing conditions

    and use of statistical methods of experimental design and analysis (22).

    Plan the experiment in advance so that a simple mathematical model

    may be applied to the analysis. Many simple mathematical models depend on

    the independent nature of the data. Experimental design was introduced to

    13

  • develop this independence and often makes a- simple mathematical modelapplicable. Experimental design makes the test efficient and saves time

    and material. An excellent reference for experimental design has beenwritten by Cochran and Cox (5). Application of an experimental designshould be approved by a statistician.

    Random choice of samples and order of presentation help eliminate error.Replication of tests will strengthen the results.

    A discussion of the significance of experimental data is usually basedon a comparison of what actually happened to what would happen if chance

    alone were operating. Before applying statistics to the analysis of experi-

    mental da;a, the specific concept of probability should be understood as it

    applies to particular experimental data. Appendix III explains the concept of

    probability. The way in which an experiment is conducted usually determines

    not only whether inferences can be made but also the calculations required

    to make them.

    CONSUMER TESTING

    The consumer test is used to measure consumer acceptance of a prod-

    uct. Although the fate of a food product depends on consumer acceptance,

    formal studies of consumer preference are recent. Consumer studies are

    completely separate from laboratory panels, which do not attempt to predict

    consumer reaction. In this publication methods used in consumer tests are

    not described. Ideally a consumer test should cover a large sample of the

    population for whom the product is intended and should involve geographic

    as well as income sampling. Because of these conditions, consumer tests

    are often conducted by a specialized company.

    14

  • APPENDIX I

    SAMPLE QUESTIONNAIRES AND EXAMPLES OF ANALYSES

    TRIANGLE TESTDIFFERENCE ANALYSIS

    DATE TASTER

    PRODUC T

    Instructions: Here are three samples for evaluation. Two of these samplesare duplicates. Separate the odd sample for difference only.

    (1) (2)

    Sample Check odd sample

    314

    628

    542

    (3) Indicate the degree of difference between the duplicate samples and the

    odd sample.

    Slight Much

    Moderate Extreme

    (4) Acceptability:

    Odd sample more acceptable

    Duplicate samples more acceptable.

    (5) Comments:

    15

  • Example:

    To determine if a difference existed between fish-potato flakes proc-essed under two different sets of conditions, a triangle test was used.

    The samples were first reconstituted by adding boiling water, then

    they were coded. On each tray there were three coded samples: two were

    the same and one was different. Eleven panelists were given two trayseach, one after the other, and asked to identify the odd sample on each

    tray. This made a total of 22 judgments.

    The odd sample was correctly identified 19 times. According to Statis-

    tical Chart 1 of Appendix II, page 39, 19 correct judgments from 22 panel-

    ists in a triangle test are significant at the 0.1 percent level. The

    conclusion was that a difference existed between the samples. If the

    number of correct judgments had been less than 12, the conclusion would

    be that no detectable difference existed between the samples.

    The degree of difference indicated by those panelists who correctlychose the odd sample was:

    Slight = 1 Much = 6

    Moderate = 7 Extreme = 5

    The next part of the triangle test was to choose the more acceptable

    sample. Of the 19 panelists who correctly identified the odd sample, 14

    found the same sample more acceptable. According to Statistical Chart 1,for a two-sample test (there were only two choices at this point), this is

    below the number required for significance at the 5 percent level. However,

    the same degree of significance cannot be attached to the results of this

    secondary question as to the primary question. Once it is determined that

    a difference exists, another test should be conducted to determine which

    sample is more acceptable (probably a paired comparison test).

    16

  • DUO-TRIO TESTDIFFERENCE ANALYSIS

    NAME

    DATE

    PRODUCT

    On your tray you have a marked control sample (R) and two coded

    samples, one is identical with R, the other is different. Which of the coded

    samples is different from R?

    SAMPLES CHECK ODD SAMPLE

    432

    701

    Example:

    To determine if methional could be detected when added to Cheddar

    cheese in amounts of 0.125 ppm and 0.250 ppm a duo-trio test was used.

    Each tray had a control sample marked R and two coded samples, one

    with methional added and one control. The duo-trio test was used in prefer-

    ence to the triangle test because less tasting is required to form a judgment

    using the duo-trio test. This fact becomes important when tasting a sub-

    stance with a lingering aftertaste, such as methional.

    The test was performed on two successive days using eight panelists.

    Each day the panelists were presented with two trays: one with 0.125 ppm

    and the other with 0.250 ppm methional added to a coded sample. This

    made a total of 16 judgments at each level. The results are shown in

    Table 1.

    17

  • TABLE 1

    Leve I of me thional added> PP 1Tl

    Panelists list day 2nd day

    0.125 0.250 0.125 0.250

    PI X R R RP2 R R R RP3 X R X RP4 R X X RP5 R R R RP6 X R X XP7 R R R RP8 R R R R

    Total 5 7 5 7

    P = PanelistX = Wrong

    R = Right

    0.125 ppm = 10 out of 16 correct

    0.250 ppm = 14 out of 16 correct

    Consult Statistical Chart 1 of Appendix II, page 39, for 16 panelists

    in a two-sample test, which shows that 14 correct judgments are significant

    at the 1 percent level, while 10 are not significant, even at the 5 percent

    level.

    The conclusion is that methional added to Cheddar cheese can be

    detected at the 0.250 ppm level but not at the 0.125 ppm level.

    18

  • NAME

    MULTIPLE COMPARISON

    DIFFERENCE ANALYSIS

    DATE

    QUESTIONNAIRE:

    You are receiving samples of . to compare for .

    You have been given a reference sample, marked R, to which you are tocompare each sample. Test each sample; show whether it is better than,

    comparable to, or inferior to the reference. Then mark the amount of dif-ference that exists.

    Sample Number

    Better than R

    Equal to R

    Inferior to R

    AMOUNT OF DIFFERENCE

    None

    Slight

    Moderate

    Much

    Extreme

    COMMENTS:

    Any comments you may have about the flavor of the samples may bemade here:

    Example:

    A multiple comparison test was conducted to determine how much anti-oxidant could be added to fish-potato flakes without tasters detecting a

    19

  • difference in flavor. The flakes tested contained no antioxidant (0), 1unit, 2 units, 4 units, and 6 units of antioxidant. Rach tray contained a

    reference sample labeled R that contained no antioxidant and five coded

    samples (the four different levels of antioxidant and one sample with no

    antioxidant). Fifteen panelists were asked to evaluate these samples

    according to the score sheets on page 19. The ratings were given numerical

    values 1 to 9 by the person analyzing the results with "no difference"

    equaling 5, "extremely better than R" equaling 1, and "extremely inferiorto R" equaling 9. The analysis of variance was calculated as shown inTable 2 and on the following pages.

    TABLE 2

    r^^n f*li etcLevel of antioxidant added

    Total1 dll ClloLa

    1 Unit 2 Units 4 Units 6 Units

    PI 1 4 5 1 9 20P2 3 3 5 5 7 23P3 7 3 4 4 7 25

    P4 5 7 7 3 9 31P5 3 3 3 3 1 13P6 1 1 1 1 2 6P7 5 5 3 5 6 24P8 2 2 3 2 5 14P9 1 3 3 3 3 13P10 1 1 1 7 5 15Pll 6 5 1 4 1 17

    P12 7 2 1 3 9 22

    P13 3 2 3 2 6 16P14 3 3 1 5 1 13P15 3 1 5 3 3 15

    Total 51 45 46 51 74 267

    P = Panelist

    Analysis of Variance

    Correction factor = (Total) 2 /Number of responses (15 tasters x 5 samples)

    CF = (267)V75= 71289/75= 950.52

    Sum of squares, samples = (Sum of the square of the total for each sample/

    Number of judgments for each sample) — CF

    = [(512 + 452 + 462 + 512 + 742)/15] - CF

    = (14819/15)- CF = 987.93 - 950.52

    = 37.41

    20

  • set up as follows:

    ss MS

    37.41 9.35

    111.28 7.95

    201.81 3.60

    350.48

    Sum of squares, panelists = (Sum of the square of the total for each panel-ist/Number of judgments by each panelist) — CF

    = [(202 + 232 + 252 ... + 152)/5] - CF

    = (5309/5)- CF= 1061.80- 950.52

    = 111.28

    Total sum of squares - Sum of the square of each judgment — CF

    = (12 + 32 + 72 + . . . + 32)_ CF

    = 1301.00- 950.52

    = 350.48

    The analysis of variance chart was then

    Source of variance df SS MS F

    Samples 4 37.41 9.35 2.59*

    Panelists 14 111.28 7.95 2.21*

    Error 56

    Total 74

    df — Degrees of freedom for samples is the number of samples minusone. There were five samples, so the degrees of freedom for samples

    in this example is 4. Degrees of freedom for panelists is the number

    of panelists minus one. There were 15 panelists so the df for this

    source is 14. The df for total is the total number of judgments

    minus 1 (75- 1 = 74).

    Error— (l)To determine the df for "error" subtract the values obtained

    for the other variables (in this case 4 for samples and 14 for

    panelists) from the total, 74,

    i.e. 74- (4 f 14)= 56.

    (2) To determine the SS for "error" subtract the values obtained

    for the other variables (in this case 37.41 for samples and

    111.28 for panelists) from the total, 350.48,

    i.e. 350.48- (37.41 + 111.28)= 201.81.

    MS — The mean square for any variable is determined by dividing the SS

    by its respective degree of freedom.

    F — The variance ratio or F value for samples is determined by dividingthe MS for samples by the MS for error or 9.35/3.60 = 2.59. The F

    value for panelists may be determined by dividing the MS for panel-

    ists by the MS for error.

    21

  • To determine if the difference between the samples is significant,

    the calculated F value (2.59) is checked in Chart 2 of Appendix II on

    pages 40 and 41. With 4 degrees of freedom in the numerator and 56 degrees

    of freedom in the denominator, the variance ratio (F value) must exceed

    2.52 to be significant at the 5 percent level and it must exceed 3.65 to be

    significant at the 1 percent level. The value of 2.59 is therefore significant

    at the 5 percent level (*). If the variance ratio is not significant, the

    conclusion is that the addition of up to 6 units of antioxidant does not

    make a detectable difference in the flavor of antioxidant.

    Since there is a significant difference between the samples, the ones

    that are" different can be determined by using Duncan's Multiple Range

    Test (7, 16).

    1 Unit 2 Units 4 Units 6 Units

    Sample score = 51 45 46 51 74

    Sample mean - Score/Number of

    panelists = 51/15 45/15 46/15 51/15 74/15

    = 3.4 3.0 3.1 3.4 4.9

    The sample means are arranged according to magnitude:

    A B C D E

    6 Units 4 Units 2 Units 1 Unit

    4 - 9 3.4 3.4 3.1 3.0

    The standard error of the sample mean:

    SE = y/ (MS error/Number of judgments for each sample)

    = V (3.60/15)= V 0.24 = 0.49

    The "shortest significant ranges" for 2, 3, 4, and 5 means are deter-

    mined by using Chart 3 of Appendix II, on page 42, for the 5 percent level

    of probability and Chart 4 for the 1 percent level. In this case to determine

    the 5 percent level, Chart 3 is used to find the "Studentized ranges" rp for

    p - 2, . . . 5 means with 56 degrees of freedom (since 56 is not shown, the

    figure 60 is used).

    These values are then multiplied by the standard error of the mean, to

    obtain the shortest significant ranges, Rp. This gives:

    P 2 3 4 5rp (5 percent) 2.83 2.98 3.08 3.14

    Rp 1.39 1.46 1.51 1.54

    22

  • The differences between the sample means are compared with the shortest

    significant range appropriate for the range under consideration in the fol-

    lowing order:

    (i) highest minus the lowest, highest minus second lowest, and so

    on to highest minus second highest;

    (ii) second highest minus lowest and so on to second highest minus

    third highest;

    (iii) and so on down to second lowest minus lowest.

    If, at any stage, a difference in (i) does not exceed the shortest sig-

    nificant range, the procedure stops, Say, for example, the highest mean

    minus the kth mean does not exceed the shortest significant range, then a

    line is drawn underscoring all means between the highest and the kth means

    inclusive, which indicates that this set of means should be grouped together

    as exhibiting no significant differences.

    Determination of shortest significant ranges

    Sample Sample Sample

    1

    Sample

    4

    Sample

    5

    Range a = 2

    b= 3

    c= 4

    d = 5

    (i)

    A-E= 4.9- 3.0= 1.9> 1.54 (R 5 )A- D= 4.9- 3.1= 1.8> 1.51 (R 4 )

    A- C = 4.9- 3.4= 1.5 > 1.46 (R 3 )A- B= 4.9- 3.4= 1.5 > 1.39 (R 2 )

    A is underscored as it is significantly different from the others.

    A BC DE

    (ii)

    B- E= 3.4- 3.0= 0.4 < 1.51 (1 4 )Samples B, C, D, and E are underscored together as they exhibit nosignificant differences.

    23

  • Proceed no further, since there is no significant difference between

    B and E.

    Therefore, the conclusion is that at the 5 percent level A is signif-

    icantly different from E or that 6 units made a significant differencein flavor from 0. This procedure may be repeated using Chart 4 if the

    samples were significantly different at the 1 percent level.

    This procedure can be repeated to see which panelists are signifi-

    cantly different from each other.

    RANKING

    DIFFERENCE ANALYSIS

    NAME

    DATE

    PRODUCT

    Evaluate these samples for tenderness. Please rank the samples for

    tenderness. The sample that is tenderest is ranked first, the second tender-

    est is ranked second, the toughest sample is ranked third. Place the code

    number in the appropriate box.

    Example:

    A ranking test was used to compare the texture of meat of three breedsof geese. The cooked meat was cut into pieces % inch by % inch by 1

    inch and one coded sample from each breed was presented to each of eight

    panelists. These panelists ranked the samples according to the score

    sheet above.

    24

  • Results:

    Ranks: B, B 2 B 3

    PI 2 1 3

    P2 2 1 3

    P3 2 1 3P4 1 2 3P5 1 3 2

    P6 2 1 3

    P7 2 1 3P8 1 2 3

    Total 13 12 23

    P = PanelistB = Breed1 = First

    2 = Second

    3 = Third

    To analyze these results the ranks were transformed into scores,according to Fisher and Yates (9). Chart 5 of Appendix II, page 44, was

    used to determine the numerical value for each score. The sample ranked

    first of three samples was given a value of 0.85. When converting ranks,

    the middle rank is given a value of zero and the ranks beyond the middle

    are given negative values corresponding to the positive values given in

    the chart. In this case second is and third is —0.85. If we had six ranksthe values would be:

    first = 1.27

    second = 0.64

    third = 0.20

    fourth = -0.20

    fifth = -0.64

    sixth =-1.27

    In Chart 5 values can be assigned to ranks in tests with up to 30 samples

    in the manner described above.

    Totalcores BH BP B c

    PI 0.85 -0.85

    P2 0.85 -0.85

    P3 0.85 -0.85

    P4 0.85 -0.85P5 0.85 -0.85

    P6 0.85 -0.85

    P7 0.85 -0.85

    P8 0.85 -0.85

    Total 2.55 3.40 -5.95

    25

  • The scores were then analyzed by the analysis of variance, as on page 20.

    CF =

    SS samples = ([2.552 + 3.402 + (-5.95)2] /8) _. CF

    = (53.465/8)-

    = 6.68

    SS panelists = 0/3 =

    Total SS = [02 + 02 + 02 + 0.852 . . . + (-0.85) 2 ] - CF

    = 11.56

    Variables df SS MS F

    Samples 2 6.68 3.34 9.54**

    Panelists 7

    Error 14 4.88 0.35

    Total 23 11.56

    Samples BH Bp B c

    2.55 3.40 -5.95

    Mean samples 0.32 0.43 -0.74

    A B C

    + 0.43 0.32 -0.74

    Standard error / (0.35/8) /0.04375

    0.209

    2 3

    rp (5 percent) 3.03 3.18

    RP 0.63 0.66

    A - C = 0.43 - (-0.74) = 1.17 > 0.66 (R3 )

    A- B= 0.43 -0.32 = 0.11 < 0.63 (R 2 )

    A _B C

    B-C=0.32 - (-0.74)= 1.06 > 0.63 (R 2 )C is significantly different from A and B.

    The conclusion is that the meat from breed C was significantly less tender

    than that of breeds H and P at the 5 percent level.

    26

  • SCORING

    DIFFERENCE TEST

    NAME

    DATE

    PRODUCT

    Evaluate these samples for flavor. Taste test each one. Use the ap-

    propriate scale to show your evaluation and check the point that best

    describes your feeling about the flavor of the sample.

    Code

    815

    Excellent

    .Very good

    . Good

    . Fair

    Poor

    Very poor

    Code

    558

    Excellent

    Very good

    Good

    Fair

    Poor

    Very poor

    Code

    394

    Excellent

    Very good

    Good

    Fair

    Poor

    Very poor

    Reason Reason Reas on

    27

  • Example:

    Taste tests were conducted to determine any difference in flavor in

    the meat of three breeds of geese. The scoring test was used rather thanmultiple comparison as there was no standard or reference with which to

    compare. Eight panelists rated the coded samples according to the score

    sheet on page 27. The ratings were given numerical values by the person

    analyzing results with excellent = 1, very poor = 6.

    The results were analyzed by the analysis of variance, see page 20.

    Bj B 2 B 3 Total

    PI 3 2 3 8P2 4 6 4 14P3 3 2 3 8P4 1 4 2 7P5 2 4 2 8P6 1 3 3 7P7 2 6 4 12

    P8 Jl A _?_ 12.Total 18 33 23 74

    Correction factor = 742/24= 5476/24= 228.17

    SS samples = (1/8) (182 + 332 + 232) _ CF

    = (1/8) x 1942- CF

    = 242.75- 228.17

    = 14.58

    SS panelists = (1/3) (82 + I42 + . . . + 102) - CF

    = (1/3) x 730- CF

    = 243.33- 228.17

    = 15.16

    Total SS = (32 +42 + ... +22) -CF= 276- CF

    = 276- 228.] 7

    = 47.83

    Variables df SS MS F

    Samples 2 14.58 7.29 5.65*

    Panelists 7 15.16 2.17

    Error 14 18.09 1.29

    Total 23 47.83

    28

  • There is a significant difference between samples at the 5 percent

    level.

    The Multiple Range Test is used to determine which samples are

    significantly different from the others.

    B, B 2 B 3

    Mean samples = 18/8 33/8 23/8

    = 2.25 4.13 2.88

    Ranked means = A B C

    B 2 B 3 B!

    4.13 2.88 2.25

    Standard error = V (1.29/8)

    = 0.4

    = V 0.16

    rp (5 percent) 3.03 3.18

    Rp 1.21 1.27

    A - C = 4.13 - 2.25 = 1.88 > 1.27 (R 3 )

    A- B = 4.13- 2.88= 1.25 > 1.21 (R 2 )A is significantly different from B and C.

    B- C = 2.88- 2.25 = 0.63 < 1.21 (R 2 )A B C

    B and C are not significantly different from each other. The conclusionis that breed 2 is significantly different from breeds 1 and 3 at the 5 per-

    cent level.

    29

  • SCORING

    DIFFERENCE TEST

    NAME

    DATE

    Evaluate these samples of goose meat for tenderness. Taste test

    each one. Use the appropriate scale to show your evaluation by checking

    at the point that best describes your feeling about the sample.

    Code

    664

    Extremely tender

    Very tender

    Moderately tender

    Slightly tender

    Slightly tough

    Moderately tough

    Very tough

    Extremely tough

    Code

    758

    Extremely tender

    Very tender

    Moderately tender

    Slightly tender

    Slightly tough

    Moderately tough

    Very tough

    Extremely tough

    Code

    708

    Extremely tender

    Very tender

    Moderately tender

    Slightly tender

    Slightly tough

    Moderately tough

    Very tough

    Extremely tough

    Reason Reason Reason

    Note: This is another example of scoring for difference. Analysis of vari-

    ance is used to analyze results (see page 20).

    Numerical values: extremely tender = 1

    extremely tough = 8

    30

  • PAIRED COMPARISON

    DIFFERENCE TEST

    DATE TASTER

    PRODUCT

    Evaluate these two samples of peaches for texture.

    1. Is there a difference in texture between the two samples?

    Yes No

    2. Indicate the degree of difference in texture between the two samples by

    checking one of the following statements.

    846 is extremely better than 165

    846 is much better than 165

    846 is slightly better than 165

    No difference

    165 is slightly better than 846

    165 is much better than 846

    165 is extremely better than 846

    3. Rate the texture of the samples.

    165 846

    Good Good

    Fair Fair

    oor roorPoi

    Comments:

    31

  • One of the tests conducted as part of a study of the effect of small

    doses of irradiation on the keeping quality of fresh peaches was a paired

    comparison test on texture.

    Four samples were compared:

    (1) control or krads sample, (2) 150 krads sample, (3) 200 krads sample,

    and (4) 250 krads sample.

    Each sample was compared with every other sample, making a total of six

    pairs. Each pair was presented to eight panelists for evaluation according

    to the score sheet on page 31. The experimental design requires that half

    the panelists taste one sample of the pair first and that the others taste

    the second sample first.

    The ratings of the panelists were given numerical values of +3, +2, +1,0,-1,-2,-3. Example:

    Sample Code

    150 krads 8461 Pair= 250 krads 165

    The score sheet for the four panelists tasting sample 846 first was set up

    as follows with the values on the right being assigned by the analyzer.

    846 is extremely better than 165 (+3)

    846 is much better than 165 (+2)846 is slightly better than 165 ( + 1)

    No difference ( 0)

    165 is slightly better than 846 (-1)

    165 is much better than 846 (-2)

    165 is extremely better than 846 (-3)

    The four panelists who tasted sample 165 first received a score sheet set

    up as follows with the corresponding values on the right.

    165 is extremely better than 846 (+3)

    165 is much better than 846 (+2)

    165 is slightly better than 846 (+1)

    No difference. ( 0)

    846 is slightly better than 165 (-1)

    846 is much better than 165 (-2)

    846 is extremely better than 165 (-3)

    If one of the four panelists tasting 165 first indicated that 846 was much

    better than 165, his score was -2.

    The results of the scoring for all six pairs by all tasters was tabulated

    as shown in Table 3.

    32

  • TABLE 3

    Order of

    presentation

    Fre

    -3

    quency i

    -2 -]

    j{ scores

    +1

    equal

    +2

    to

    +3

    Total

    scoreMean

    Average

    preference

    0,150

    150,0 3

    2

    1

    1 1 1

    -7

    0.25

    -1.75

    1.00

    0,200

    200,0 2

    1 1

    1

    2

    1

    «

    5

    -1

    1.25

    -0.25

    0.75

    0,250

    250,0 2

    1

    2

    2 1 8

    -2

    2.00

    -0.50

    1.25

    150,200

    200,150

    1 1

    1 1

    2

    2

    -1

    1

    -0.25

    0.25

    -0.25

    150,250

    250,150 3

    2 1

    1

    1 3

    -2

    0.75

    -0.50

    0.625

    200,250

    250,200

    1 1 2

    2 1 1

    -3

    3

    -0.75

    0.75

    -0.75

    Total 9 9 8 13 8 1

    Mean = Total score/Number of panelists = 1/4 = 0.25

    Average preferences = Vi (mean for 0,150 - mean for 150,0)

    = Vi (0.25-(-1.75)) = % (2.00) = 1.00

    The average preference of over 150 was 1.00 and the average preferenceof 150 over was -1.00. Thus the average preference of over 200 equals- the average preference of 200 over 0.

    The above data were used to perform an analysis of variance, accord-ing to the method of Scheffe (20).

    Main effects of treatments (&) were calculated by totaling the average

    preference of each sample over every other sample and dividing by the

    number of treatments.

    &o = M (average preference of over 150 + average preference of over

    200 + average preference of over 250)

    = % (1.00 +0.75 + 1.25)

    = % (3.00) =0.75

    33

  • a 150= Va (-1.00 - 0.25 + 0.625) = -0.15625

    a2Q0

    = M (-0.75 + 0.25 - 0.75) = -0.3125

    a 250 = Va (-1.25- 0.625 + 0.75) = -0.28125

    The order effect (6) was calculated by totaling the mean for each order

    of each pair and dividing this total by the number of ordered pairs (6 pairs

    x 2 orders = 12).

    6 = (1/12) (0.25-1.75 + 1.25 -0.25 + 2.00 -0.50 -0.25 + 0.25 +0.75 -0.50

    -0.75 +0.75)

    = 0.104

    ANALYSIS OF VARIANCE TABLE

    Variables df SS MS F

    Main effects 3 24.4375 8.1458

    Order effect 1 0.5192 0.5192

    Error 44 74.0433 1.6828

    4.84**

    Total 48 99.0000

    Sum of squares for main effects = number of panelists x number of treatmentsx sum of squares of each a

    = 8 x 4 x [0.752 + (-0.15625) 2 + (-0.3125) 2 + (-0.28125)?] = 24.4375

    Sum of squares for order effect = number of panelists x number of pairsx order effect {8 )

    = 8 x 6 x 0.104 2 =0.5192

    Total sum of squares (using the frequency of each score as shown in Table 3)

    = 32(0 + 1) + 2

    2(9 + 8) + 1

    2(9 + 13) +

    2(8)

    = 99.00

    Sum of squares for error = 99.00 - 24.4375 - 0.5192 = 74.0433

    di-Main effects. There were 4 treatments, so the degrees of freedom = 3.

    Order effect. There were 2 orders — one sample of pair first or the othersample of the pair first, df = 1.

    Total. For paired comparison, the df for total is the total number of

    observations = 48.

    Error. 48 - 3 - 1 = 44.

    MS for main effects = 24.4375/3 = 8.1458

    MS for order effect = 0.5192/1 = 0.5192

    MS for error = 74.0433/44 = 1.6828

    34

  • The F ratio is determined for each variable by dividing its MS by theMS error. Consult Chart 2 on pages 40 and 41 to see if the F ratio is signi-ficant. This procedure is described on page 22. In our example, the main

    effects are significantly different at the 1 percent level.

    Use Tukey's Test to determine which samples are different. Because

    Duncan's Multiple Range Test (7) has previously been used and explained

    in this booklet (see pages 22-23), it is used here to determine which sam-

    ples are different.

    150 200 250

    Average preferences

    0.75

    ai50

    -0.15625

    a200

    -0.3125

    a250

    -0.28125

    B D

    ao

    0.75

    a150

    -0.15625

    a250

    -0.28125

    a200

    -0.3125

    S.E . =\/ (MS error/Number of judgments for each sample) = \/ (1.6828/24)

    = y/ 0.070 =0.27

    rp (5 percent)

    RP

    2

    2.86

    0.77

    3

    3.01

    0.81

    4

    3.10

    0.84

    61o~ Soo = °-75 ~ (-0-3125) = 1.0625 > 0.84

    250 = 0.75 - (-0.28125) = 1.03125 > 0.81

    a - a150

    = 0.75 - (-0.15625) = 0.90625 > 0.77

    Siso

    " a 200 = "0-15625 - (-0.3125) = 0.15626 < 0.81

    a. a150

    a250

    a200

    The control sample is significantly different from the other samples. Its

    score is higher and therefore it would be considered to have better texture.

    This does not mean necessarily that the other samples are of poor texture.

    To determine the quality of the samples, the ratings given them by the

    panelists were tabulated to see if they were considered to be good, fair, or

    poor. An average score can be determined for each sample by assigning the

    values good = 3, fair = 2, poor = 1, and computing the average for each

    sample.

    35

  • HEDONIC SCALE

    SCORING

    DATE TASTER

    PRODUCT

    Taste test these samples and check how much you like or dislike each

    one. Use the appropriate scale to show your attitude by checking at the

    point that best describes your feeling about the sample. Please give a

    reason for this attitude. Remember you are the only one who can tell what

    you like. An honest expression of your personal feeling will help us.

    CODE

    459

    CODE

    667

    CODE

    619

    CODE

    347

    Like

    extremely

    Like

    very much

    Like

    moderately

    Like

    slightly

    Neither like

    nor dislike

    Dislike

    slightly

    Dislike

    moderately

    Dislike

    very much

    Dislike

    extremely

    REASON

    Like

    extremely

    Like

    very much

    Like

    moderately

    Like

    slightly

    Neither like

    nor dislike

    Dislike

    slightly

    Dislike

    moderately

    Dislike

    very much

    Dislike

    extremely

    REASON

    .Like

    extremely

    Like

    very much

    Like

    moderately

    .Like

    slightly

    Neither like

    nor dislike

    Dislike

    slightly

    . Dislike

    moderately

    _Dislike

    very much

    Dislike

    ext remely

    REASON

    Like

    extremely

    Like

    very much

    Like

    moderately

    Like

    slightly

    Neither like

    nor dislike

    Dislike

    slightly

    .Dislike

    moderately

    Dislike

    very much

    Dislike

    extremely

    REASON

    Note: To analyze the results of this test use analysis of variance (see

    pages 20-23).

    Numerical values: like extremely = 9

    dislike extremel) = 1

    36

  • PAIRED COMPARISON

    PREFERENCE

    DATE TASTER

    PRODUCT

    INSTRUCTIONS: (A) Here are two samples for evaluation. Please indicatewhich sample you prefer.

    622 244

    (B) Indicate the degree of preference between the two

    samples.

    Slight

    Moderate

    Much

    Extreme

    Example:

    To determine which of two samples of Cheddar cheese (Number 1 or

    Number 2) had more acceptable flavor, a paired comparison test was used.

    The samples were coded and one sample of each cheese was presented on

    each tray. Eight panelists were presented with three trays each, one after

    the other. This made a total of 24 judgments.

    Cheese Number 2 was preferred 17 times in the 24 judgments. Accord-

    ing to Statistical Chart 1 of Appendix II, on page 39, in the two-sample

    test, one sample must be preferred 18 times to be significantly more

    acceptable. So we conclude that neither cheese was significantly more

    acceptable for flavor.

    37

  • RANKING

    PREFERENCE

    NAME

    DATE

    PRODUCT.

    Please rank these samples according to your preference.

    Code

    First

    Second

    Third

    Fourth

    Note: The results are analyzed by converting ranks to scores and con-

    ducting analysis of variance as shown for Ranking Difference Analysis,

    page 24.

    38

  • APPENDIX IISTATISTICAL CHART 1

    Number

    of

    tasters

    Two-sample test

    number of concurring c

    necessary to establ

    significance

    rhoices

    lish

    Triangle test

    difference analysis,

    number of correct answers

    necessary to establish

    significance

    * ** *** * ** ***

    1

    2

    3

    4

    5

    — — —

    3

    4

    5

    —-

    - - -

    5

    -

    6 6 — — 5 6 —

    7 7 — — 5 6 78 8 8 — 6 7 89 8 9 — 6 i 8

    10 9 10 — 7 8 911 10 11 11 7 8 10

    12 10 11 12 8 9 10

    13 11 12 13 8 9 11

    14 12 13 14 9 10 11

    15 12 13 14 9 10 12

    16 13 14 15 9 11 12

    17 13 15 16 10 11 13

    18 14 15 17 10 12 13

    19 15 16 17 11 13 14

    20 15 17 18 11 13 14

    21 16 17 19 12 13 15

    22 17 18 19 12 14 15

    23 17 19 20 12 14 16

    24 18 19 21 13 15 16

    25 18 20 21 13 15 17

    26 19 20 22 14 15 17

    27 20 21 23 14 16 18

    28 20 22 23 15 16 18

    29 21 22 24 15 17 19

    30 21 23 25 15 17 19

    31 22 24 25 16 18 20

    32 23 24 27 16 18 20

    33 23 25 27 17 18 21

    34 24 25 27 17 19 21

    35 24 26 28 17 19 22

    36 25 27 29 18 20 22

    37 25 27 29 18 20 22

    3 8 26 28 30 19 21 23

    39 27 28 31 19 21 23

    40 27 29 31 19 21 24

    41 27 29 32 20 22 24

    42 28 30 32 20 22 25

    43 28 30 33 21 23 25

    44 29 31 33 21 23 25

    45 30 32 34 22 24 26

    46 . 30 32 35 22 24 26

    47 31 33 35 23 24 27

    48 31 33 36 23 25 27

    49 32 34 37 23 25 28

    50 32 35 37 24 26 28

    5 percent level of significance. ** 1 percent level. ***0.1 percent level.

    39

  • STATISTICAL CHART 2

    Variance Ratio — 5 Percent Points for Distribution of Fn^ — Degrees of freedom for numeratorn — Degrees of freedom for denominator

    n2 \ 1 2 3 4 5 6 8 12 24 oo

    1 161.4 199.5 215.7 224.6 230.2 234.0 238.9 243.9 249.0 254.3

    2 18.51 19.00 19.16 19.25 19.30 19.33 19.37 19.41 19.45 19.50

    3 10.13 9.55 9.28 9.12 9.01 8.94 8.84 8.74 8.64 8.53

    4 7.71 6.94 6.59 6.39 6.26 6.16 6.04 5.91 5.77 5.63

    5 6.61 5.79 5.41 5.19 5.05 4.95 4.82 4.68 4.53 4.36

    6 5.99 5.14 4.76 4.53 4.39 4.28 4.15 4.00 3.84 3.67

    7 5.59 4.74 4.35 4.12 3.97 3.87 3.73 3.57 3.41 3.23

    8 5.32 4.46 4.07 3.84 3.69 3.58 3.44 3.28 3.12 2.93

    9 5.12 4.26 3.86 3.63 3.48 3.37 3.23 3.07 2.90 2.71

    10 4.96 4.10 3.71 3.48 3.33 3.22 3.07 2.91 2.74 2.54

    11 4.84 3.98 3.59 3.36 3.20 3.09 2.95 2.79 2.61 2.40

    12 4.75 3.88 3.49 3.26 3.11 3.00 2.85 2.69 2.50 2.3D

    13 4.67 3.80 3.41 3.18 3.02 2.92 2.77 2.60 2.42 2.21

    14 4.60 3.74 3.34 3.11 2.96 2.85 2.70 2.53 2.35 2.13

    15 4.54 3.68 3.29 3.06 2.90 2.79 2.64 2.48 2.29 2.07

    16 4.49 3.63 3.24 3.01 2.85 2.74 2.59 2.42 2.24 2.01

    17 4.45 3.59 3.20 2.96 2.81 2.70 2.55 2.38 2.19 1.96

    18 4.41 3.55 3.16 2.93 2.77 2.66 2.51 2.34 2.15 1.92

    19 4.38 3.52 3.13 2.90 2.74 2.63 2.48 2.31 2.11 1.88

    20 4.35 3.49 3.10 2.87 2.71 2.60 2.45 2.28 2.08 1.84

    21 4.32 3.47 3.07 2.84 2.68 2.57 2.42 2.25 2.05 1.81

    22 4.30 3.44 3.05 2.82 2.66 2.55 2.40 2.23 2.03 1.78

    23 4.28 3.42 3.03 2.80 2.64 2.53 2.38 2.20 2.00 1.76

    24 4.26 3.40 3.01 2.78 2.62 2.51 2.36 2.18 1.98 1.73

    25 4.24 3.38 2.99 2.76 2.60 2.49 2.34 2.16 1.96 1.71

    26 4.22 3.37 2.98 2.74 2.59 2.47 2.32 2.15 1.95 1.69

    27 4.21 3.35 2.96 2.73 2.57 2.46 2.30 2.13 1.93 1.67

    28 4.20 3.34 2.95 2.71 2.56 2.44 2.29 2.12 1.91 1.65

    29 4.18 3.33 2.93 2.70 2.54 2.43 2.28 2.10 1.90 1.64

    30 4.17 3.32 2.92 2.69 2.53 2.42 2.27 2.09 1.89 1.62

    40 4.08 3.23 2.84 2.61 2.45 2.34 2.18 2.00 1.79 1.51

    60 4.00 3.15 2.76 2.52 2.37 2.25 2.10 1.92 1.70 1.39

    120 3.92 3.07 2.68 2.45 2.29 2.17 2.02 1.83 1.61 1.25

    oo 3.84 2.99 2.60 2.37 2.21 2.09 1.94 1.75 1.52 1.00

    40

  • STATISTICAL CHART 2 - Concluded

    Variance Ratio — 1 Percent Points for Distribution of Fn^ — Degrees of freedom for numeratorrc2— Degrees of freedom for denominator

    n 2 ^\1 2 3 4 5 6 8 12 24 so

    1 4052 4999 5403 5625 5764 5859 5981 6106 6234 6366

    2 98.49 99.00 99.17 99.25 99.30 99.33 99.36 99.42 99.46 99.50

    3 34.12 30.81 29.46 28.71 28.24 27.91 27.49 27.05 26.60 26.12

    4 21.20 18.00 16.69 15.98 15.52 15.21 14.80 14.37 13.93 13.46

    5 16.26 13.27 12.06 11.39 10.97 10.67 10.29 9.89 9.47 9.02

    6 13.74 10.92 9.78 9.15 8.75 8.47 8.10 7.72 7.31 6.88

    7 12.25 9.55 8.45 7.85 7.46 7.19 6.84 6.47 6.07 5.65

    8 11.26 8.65 7.59 7.01 6.63 6.37 6.03 5.67 5.28 4.86

    9 10.56 8.02 6.99 6.42 6.06 5.80 5.47 5.11 4.73 4.31

    10 10.04 7.56 6.55 5.99 5.64 5.39 5.06 4.71 4.33 3.91

    11 9.65 7.20 6.22 5.67 5.32 5.07 4.74 4.40 4.02 3.60

    12 9.33 6.93 5.95 5.41 5.06 4.82 4.50 4.16 3.78 3.36

    13 9.07 6.70 5.74 5.20 4.86 4.62 4.30 3.96 3.59 3.16

    14 8.86 6.51 5.56 5.03 4.69 4.46 4.14 3.80 3.43 3.00

    15 8.68 6.36 5.42 4.89 4.56 4.32 4.00 3.67 3.29 2.87

    16 8.53 6.23 5.29 4.77 4.44 4.20 3.89 3.55 3.18 2.75

    17 8.40 6.11 5.18 4.67 4.34 4.10 3.79 3.45 3.08 2.65

    18 8.28 6.01 5.09 4.58 4.25 4.01 3.71 3.37 3.00 2.57

    19 8.18 5.93 5.01 4.50 4.17 3.94 3.63 3.30 2.92 2.49

    20 8.10 5.85 4.94 4.43 4.10 3.87 3.56 3.23 2.86 2.42

    21 8.02 5.78 4.87 4.37 4.04 3.81 3.51 3.17 2.80 2.36

    22 7.94 5.72 4.82 4.31 3.99 3.76 3.45 3.12 2.75 2.31

    23 7.88 5.66 4.76 4.26 3.94 3.71 3.41 3.07 2.70 2.26

    24 7.82 5.61 4.72 4.22 3.90 3.67 3.36 3.03 2.66 2.21

    25 7.77 5.57 4.68 4.18 3.86 3.63 3.32 2.99 2.62 2.17

    26 7.72 5.53 4.64 4.14 3.82 3.59 3.29 2.96 2.58 2.13

    27 7.68 5.49 4.60 4.11 3.78 3.56 3.26 2.93 2.55 2.10

    28 7.64 5.45 4.57 4.07 3.75 3.53 3.23 2.90 2.52 2.06

    29 7.60 5.42 4.54 4.04 3.73 3.50 3.20 2.87 2.49 2.03

    30 7.56 5.39 4.51 4.02 3.70 3.47 3.17 2.84 2.47 2.01

    40 7.31 5.18 4.31 3.83 3.51 3.29 2.99 2.66 2.29 1.80

    60 7.08 4.98 4.13 3.65 3.34 3.12 2.82 2.50 2.12 1.60

    120 6.85 4.79 3.95 3.48 3.17 2.96 2.66 2.34 1.95 1.38

    oo 6.64 4.60 3.78 3.32 3.02 2.80 2.51 2.18 1.79 1.00

    41

  • <Xu-J

    <y

    <

    J

    3 5?3^ C/3 S

    o o o Cl CO r- IC ^. CO CM o CO m cmo o lO O 1 - NO lO -t "* CO CO CO CO CI

  • <Xu-I

    <y

    <

    Du

    en m E—

    i

    H v:U. e CO

    D l>Q_ ^ GL3

    c/5

    S

    lO On NO lO r- o -+" O to Ol CTn IO rf CO Ol ,_, ON NO IO 00o O o CO lO co CO o co r- m co Ol l-H o o o CO CO co t - I - f- 1 - t^ r- NO NO NO NOo o rf On r- no VO NO m IO m 10 IO IO lO to -+ rt rf rf rf rf rf rf rf rf rf rf rf rf1-1 o 1-1

    in On NO IO r- O -+ ON to CM On to rf CO CM pH O NO rf Oo o o co lO CO CO o co r- U5 CO OJ '—

    1

    o o o cc CO co t - t - t - O- O. t- NO NO NO NOlO o

    Onrf On f~ NO NO NO lO tO in LO LO tO lO m -f rf rf rf rf rt rf rf rf rf rf rf rf rf

    tO o £ IO r- o -+ Cl\ IO O) On IO Ol r> r- IO On co 2 ^o O o CO 1/5 00 CO o CO t^ to n r^ IO CO r^ O m COco o o CO lO r^ co o CO r- lO CO Ol 1—1 o O On co co cc t - t - t - NO O NO to tO rf CO

    oOn

    rf On t-^ NO NO NO IO LO LO LO to to LO -f -f -+ rf rf rf rf rf if rf rf rf rf rf rf

    CO rt c\ CO -f r- ^H NO Ol On NO ,_, 1^- IO CM ,_| rf r- CM rfVO O o Ol rt r^ Ol o r~ NO •H. CO Ol 1—

    1

    o o o CC CO f- t - i - NO NO NO NO to rf rf CO^H d

    onrf1—

    1

    O t^ nO NO lO to m LO tO tO LO lO it -t -+ rf rf rf rf rf rf rf rf rf rf rf rfOl co r- CO o i+ co CO On ^r> c^ co •+ OJ O 00 r— •* CO *— NO NO lO lO 1/5 10 lO IO IO lO **-* rf rf rf rf rf rf rf rf rf rf rf rf rf

    NO Tf co rt NO o -f O NO CM 0!N to CM en NO rt NO ON IO NOCM O o o CO NO '—

    1

    co NO lO CO CM 1—

    i

    o O o co co t- r- NO VO NO LO lO LO rf CO CO CM1—1 O

    o\rf On t- NO NO lO lO LO in lO lO lO -* Tf -* * rf rt rf rf rt rf rf rf rf rf rf rf

    CO in r- 00 ^ 1* ON to _ r- 1(0 ro r^ co ,_ m _ rf PA OO o o O co 10 o co lO -I- Ol 1—

    i

    O o O cc. 1 - [ - r- NO NO NO to LO to rf rf CO OJ CMoON

    rf On t~- NO NO m IO m lO to LO »* * -* -* -* rf rf rf rf rf rf rf rf rf rf rf rf

    rt o co I—

    1

    NO «* Ol Ol -t- 1^ ^H NO OJ co rf ,_, r^ cn NO o r-- CO Orf

    O c_> NO o> 1—

    1

    NO CO OH On DO i - NO NO 1/5 IO rt *+ CO rt co Ol CM CM O o OnO -f CO NO vO m lO in -f rt -+ rt -f -f It -t -t rf rf rf rf -+ rf rf rf rf rf CO COo 1—1

    >o —

    i

    CM o NO CO co 1/5 X CM r- rt O r~ it Ol r - rf , | 00 NO o. CM NO Oo o LO CO ON lO CM o CO t^ NO tO -* >* CO co CO 01 Ol Ol i—

    1

    1—

    i

    O O o O 00 COo rf CO NO LO m in in if rf rt rt Tf >+ rt rt rt Tf rt rf rf rf rf -* it ^o co CO COo r""t

    NO l-H O -t CO rt o CO o 01 NO r-H r - m O r- IO CM Q> NO CO ,— (0\ OJ NO ^H rtCM

    o © 01 CO r~ 01 c* r~ NO rt co CO Ol CM

  • STATISTICAL CHART 5

    Scores for Ranked Data

    The mean deviations of the 1 st , 2nc% 3fd . . . largest members of samples

    of different sizes; zero and negative values omitted.

    Ordinalnumber 2 3

    Size

    4

    of Sam

    5

    pie

    6 7 8 9 10

    1 .56 .85 1.03 1.16 1.27 1.35 1.42 1.49 1.54

    2 .30 .50 .64 .76 .85 .93 1.00

    3 .20 .35 .47 .57 .66

    4 .15 .27 .38

    5 .12

    11 12 13 14 15 16 17 18 19 20

    1 1.59 1.63 1.67 1.70 1.74 1.76 1.79 1.82 1.84 1.87

    2 1.06 1.12 1.16 1.21 1.25 1.28 1.32 1.35 1.38 1.41

    3 .73 .79 .85 .90 .95 .99 1.03 1.07 1.10 1.13

    4 .46 .54 .60 .66 .71 .76 .81 .85 .89 .92

    5 .22 .31 .39 .46 .52 .57 .62 .67 .71 .75

    6 .10 .19 .27 .34 .39 .45 .50 .55 .591-7

    t .09 .17 .23 .30 .35 .40 .45

    8 .08 .15 .21 .26 .31

    9 .07 .13 .19

    10 .06

    21 22 23 24 25 26 27 28 29 30

    1 1.89 1.91 1.93 1.95 1.97 1.98 2.00 2.01 2.03 2.04

    2 1.43 1.46 1.48 1.50 1.52 1.54 1.56 1.58 1.60 1.62

    3 1.16 1.19 1.21 1.24 1.26 1.29 1.31 1.33 1.35 1.36

    4 .95 .98 1.01 1.04 1.07 1.09 1.11 1.14 1.16 1.18

    5 .78 .82 .85 .88 .91 .93 .96 .98 1.00 1.03

    6 .63 .67 .70 .73 .76 .79 .82 .85 .87 .89

    7 .49 .53 .57 .60 .64 .67 .70 .73 .75 .78

    8 .36 .41 .45 .48 .52 .55 .58 .61 .64 .67

    9 .24 .29 .33 .37 .41 .44 .48 .51 .54 .57

    10 .12 .17 .22 .26 .30 .34 .38 .41 .44 .47

    11 .06 .11 .16 .20 .24 .28 .32 .35 .38

    12 .05 .10 .14 .19 .22 .26 .29

    13 .05 .09 .13 .17 .21

    14 .04 .09 .12

    15 .04

    Tests of psychological preference and some other experimental datasuffice to place a series of magnitudes in order of preference, without supplying

    metrical values. Analyses of variance, correlations, etc., can be carried out on

    such data by using the normal scores, appropriate to each position in order, in

    a sample of the size observed. Ties may be scored with the means of the ordinalvalues involved, but in such cases the sums of squares will require correction.

    44

  • APPENDIX III

    NOTES ON INTRODUCTORY STATISTICS 1

    The purpose of these notes is to put in a written form the content of

    a 3-hour discussion on some basic statistical concepts, held at the Food

    Research Institute of the Department of Agriculture.

    The term statistics includes the numerical descriptions (figures) of

    the quantitative aspects of things, and the body of methods (subject) used

    for making decisions in the face of uncertainty. The decisions may vary.

    They may be concerned with estimating the probability of rain on a summer

    day after considering the available meteorological data, whether or not to

    accept a shipment of parts after only partial inspection, or how to set up anexperimental plan to test several varieties of geese for tenderness. These

    notes deal with the concept of statistics as a body of methods.

    Statistics is helpful to research workers in many ways, such as by

    suggesting which to observe and how to observe it (Theory of Design ofExperiments aid Theory of Sampling) and how to summarize results in formsthat are comprehensible (Descriptive Statistics). Owing to experimental

    error (chance circumstance), data and predictions cannot be expected to

    agree exactly even if the scientist's theories are correct. Therefore, when

    mathematical results from the Theory of Chance are applied to statistical

    problems (Inferential Statistics), they help to draw conclusions from the

    data.

    SAMPLE AND POPULATION

    Sample (data, observations) is the "number" that have been observed.

    Population is the totality of all possible observations of the same kind.

    Sample results vary by chance, and the pattern of chance variation

    depends on the population from which the samples have been drawn. A

    sample is not a miniature replicate of the population, so when decisions

    about a population are based on a sample, it is necessary to make allowance

    for the role of chance.

    RANDOM SAMPLE

    If a sample is to be representative of the population, each member of

    the population should have an equal chance of being included in the sample.

    But this requirement is not enough in itself to make up a random sample.

    This appendix was prepared by Andres Petrasovits, Statistical Research Service,Canada Department of Agriculture, Ottawa.

    45

  • A sample of size N is said to be random if each combination of N items inthe population has an equal chance of being chosen.

    To stress the importance of the italicized words in the above definition,

    consider the following:

    Example: A college catalogue of students has 50 pages. A sample of 50students was taken by selecting at random one student from each page.

    Note that this is not a random sample, because a sample including two

    students listed on the same page has a zero chance of being chosen.

    It must be pointed out that samples that are not strictly random are

    often preferable to random samples on the grounds of convenience or of

    increased precision (Stratification).

    PARAMETER AND STATISTIC

    Samples are taken to learn about the population being sampled. A

    parameter is a quantitative characteristic of the population. A statistic

    is a mathematical function of the sample values or a value computed from

    the sample.

    Example 1

    Suppose we are interested in learning about the average weight of

    adult Canadians. The true average weight (call it pt) of Canadians is

    clearly impossible (physically and economically) to compute. It is possible,

    however, to take a sample of 100 Canadians, register their weights, and

    compute the mean of the sample weights. Let us denote this mean obtained

    from the sample X, and suppose that, in this case, we obtained X = 141.5

    pounds. As a first approximation it would be reasonable to regard 141.5

    pounds as an estimate of fi, the true but unknown average weight for all

    Canadians.

    Then we say: X is a statistic computed from the sample, |i is a

    parameter of the population, and X is an estimate of fi.

    Example 2

    If we are interested in the variability of weights, an indicator of

    dispersion that is often used is the range (largest minus smallest). As with

    the mean weights, the true range (call it R) of adult Canadian body weights

    cannot be practically obtained, but we can use the sample of size 100

    (from example 1) to obtain an estimate of the true range on the basis of the

    range observed in the sample. Let us denote the latter by r.

    Then we say: r is a statistic computed from the sample and R is a

    parameter of the population: r is an estimate of R.

    46

  • Example 3

    A political poll is to be taken for Ontario to estimate the proportion

    of Liberals. Clearly, the true proportion of Liberals (call it 77) would be

    practically impossible to compute. We can, however, obtain the proportion

    of Liberals from the sample taken (call it p) and regard p as an estimate of

    77. Suppose that p was found to be 0.46, then we say that p = 0.46 is a

    statistic computed from the sample, 77 is a parameter of the population,

    and p is an estimate of 77.

    It should be pointed out that many statistics can be computed from a

    sample. In example 1, other statistics besides X may give an estimate of fi,for instance the most frequent value (mode) or the average of the extremes.

    POPULATION OF SAMPLE MEANS

    Consider the following finite population P: (1, 2, 3, 4). Note that two

    summary measures of interest about P are the population mean \l = 2.5 and

    the population range R = 4 - 1=3.

    Let us form all possible samples of size 2, say, without replacement

    from P and for each sample let us compute the sample mean. We have:

    Sample X

    (1, 2) 1.5

    (1, 3) 2

    (1, 4) 2.5

    (2, 3) 2.5

    (2, 4) 3

    (3, 4) 3.5

    The set of sample means

    Px = (1.5, 2, 2.5, 2.5, 3, 3.5)is called the population of sample means of size 2 obtained from the

    original population P.

    ORIGINAL POPULATION: P POPULATION OF SAMPLE MEANS: P

    :>1.5 2.5

    2 35 3

    When comparing the original population, P, with the population of sample

    means of size 2, P^, it should be noticed that:

    (i) the mean of the population of sample means is equal to the mean

    of the original population,

    47

  • (ii) the variability (dispersion) in the population of sample means is

    smaller than the variability in the original population.

    The above results are illustrated by the previous example.

    For (i),

    (1 + 2 + 3 + 4)/4 = (1.5 + 2 + 2.5 + 2.5 + 3 + 3.5)/6 = 2.5,

    and for (ii), if we agree to use the range as a measure of variability, then

    range for original population = 4 - 1 = 3, and

    range for population of sample means = 3.5 - 1.5 = 2.

    As an exercise for the reader it may be useful to examine the population of

    sample means derived by taking all possible samples of size 3 from the

    population:

    P = (2, 3, 3, 7, 7, 0).

    NORMAL DISTRIBUTION

    Frequency Distribution

    The pattern (characteristics) of a population or sample is shown by

    grouping the data in a frequency table. This table is called the frequency

    distribution of the population or the sample.

    Example: The ages in years of 10 students in a classroom were:

    21, 20, 21, 22, 20, 22, 21, 20, 23, 21.

    Frequency distribution

    20

    21

    22

    23

    Frequency

    3

    4

    2

    1

    5

    >- 4-

    Z 7HI -'

    S 2-1£ i

    GRAPHICAL REPRESENTION OFTHE FREQUENCY DISTRIBUTION

    21 22

    AGE23

    Characterization of the Normal Distribution

    Roughly speaking, a normal distribution is a frequency distribution

    whose graphical description is "bell-shaped." It must be noted that not

    all bell-shaped distributions are normal.

    A normal distribution is fully determined by two parameters: the mean

    Qz) and the standard deviation (a). A full definition of the latter is available

    in any introductory statistics textbook.

    48

  • The physical meaning of the

    quantities yt and a.INFLECTION

    POINT

    The mean, /x, establishes the "location" or "center" of the distribution.

    The standard deviation, a, measures dispersion and is given by the hori-

    zontal distance from xi to the inflection points of the normal curve. An

    important characteristic of the normal distribution is its symmetry about the

    mean xt.

    NORMAL CURVES WITH DIFFERENT MEANS (LOCATIONS)AND SAME STANDARD DEVIATION (DISPERSION)

    NORMAL CURVES WITH SAME MEAN (LOCATION)AND DIFFERENT STANDARD DEVIATIONS

    These figures show how the mean (/x) and the standard deviation (a) affect

    the appearance of the normal distribution.

    Example

    In a study of radioactivity, measurements and counts per minute were

    registered with their frequency.

    Frequency distribution

    Counts

    per

    minute Frequency

    9000 3

    9100 20

    9200 31

    9300 54

    9400 81

    9500 112

    9600 88

    9700 64

    9800 37

    9900 14

    10000 6

    125 -

    100"

    75 -

    ? 50 *

    25'

    FREQUENCY DISTRIBUTION

    4nnn I9000 ' 92009100

    9600 ' 98009300 9500 9700 9900

    COUNTS PER MINUTE

    49

  • CENTRAL LIMIT THEOREM

    The normal distribution arises from some natural populations (for

    example, radioactivity counts) and from all populations of sample means(of large enough sample size). This is an important fact in the Theory of

    Statistics and it is known as the Central Limit Theorem, which states that

    as the sample size increases, the distribution of the population of sample

    means becomes more like the "bell-shaped" normal distribution regardless

    of the shape of the distribution of the original population.

    Example: Consider the population:

    P l= (0, 1, 2, 3, 4, 5)

    and construct from it the population of sample means of size 2 (P2) and the

    population of sample means of size 4 (P 4 ). From the frequency distributions

    of P 1? P 2 , and P 4 construct graphical representations. It will be observedthat the shape of the graphical representation of P 2 looks more bell-shapedthan that of P

    land that the shape of the graphical representation of P 4

    looks more bell-shaped than that of P 2 .

    Construction of P 2 Construe tion of P 4

    Sample X Sample X Sample X Sample X

    (0, 1) 0.5 (1,4) 2.5 (0, 1, 2, 3) 1.50 (0, 2, 3, 5) 2.50

    (0, 2) 1.0 (0, 5) 3.0 (0, 1, 2, 4) 1.75 (0, 2, 4, 5) 2.75

    (0, 3) 1.5 (2, 3) 2.5 (0, 1, 2, 5) 2.00 (0, 3, 4, 5) 3.00

    (0, 4) 2.0 (2, 4) 3.0 (0, 1, 3, 4) 2.00 (1, 2, 3, 4) 2.50

    (0, 5) 2.5 (2, 5) 3.5 (0, 1, 3, 5) 2.25 (1, 2, 3, 5) 2.75

    (1,2) 1.5 (3,4) 3.5 (0, 1, 4, 5) 2.50 (1, 2, 4, 5) 3.00

    (1, 3) 2.0 (3, 5) 4.0 (0, 2, 3, 4) 2.25 (1, 3, 4, 5) 3.25

    (4, 5) 4.5 (2, 3, 4, 5) 3.50

    Frequency distribution of Pl

    /alue Fre quency

    1

    1 1

    2 1

    3 1

    4 1

    5 1

    12 3 4VALUE

    50

  • Frequency distribution of P 2

    Sample

    mean

    0.5

    1.0

    1.5

    2.0

    2.5

    3.0

    3.5

    4.0

    4.5

    Frequency

    1

    1

    2

    2

    3

    2

    2

    1

    1

    f

    3- -I

    : 1

    1

    !

    1

    1

    0.5 1.5 2.0 2.5 3.0 3.5 4.0 4.5

    SAMPLE MEAN

    Frequency distribution of P 4

    Sample

    mean Frequency

    1.50 1

    1.75 1

    2.00 2

    2.25 2

    2.50 3

    2.75 2

    3.00 2

    3.25 1

    3.50 1

    u H1.50 1.75 2.00 2.25 2.50 2.75 3.00 3.25 3.50

    SAMPLE MEAN

    The conclusions to be drawn from this illustration are:

    (i) As the sample size increases, the frequency distribution of the

    population of sample means approaches a normal distribution.

    (ii) As the sample size increases, the variability in the population of

    sample means diminishes.

    INFERENTIAL CONCEPTS

    Two inferential tools of great use in statistics are Tests of Hypothesisand Confidence Intervals. Although closelv related in their theoretical

    basis, they serve different needs from an applied viewpoint.

    Example:

    Suppose a scientist is investigating the number of bacteria per gram in

    frozen eggs. Suppose also that his main interest is in the mean number of

    bacteria per gram (call it /i). Two situations can be visualized:

    51.

  • (i) The scientist may have some theory of his own or he may have

    other facts (another scientist's experiments) that lead him to

    believe that /i = 150. He may decide to use his observations to

    test if, in fact, fi = 150 is consistent with his experimental results.

    (ii) Or the scientist may want to estimate the mean number from his

    data and may not be concerned with checking any particular

    theory. Situation (i) leads to Statistical Tests of Hypothesis and

    situation (ii) leads to Confidence Intervals.

    PROBABILITY

    There are several interpretations of probability. Here we give the

    approach referred to as frequentist. In a long series of throws with a coin,

    heads and tails will occur approximately equally often. We then say that

    the probability of a head (or tail) turning up is %. In a long series of

    throws with a fair die, each of the six sides will occur in approximately

    1/6 of the total number of throws and the probability for any of the numbers

    is 1/6. Generally the concept of probability can be formulated by saying

    that in a very long series of trials any event will tend to occur with a

    relative frequency that is approximately equal to the probability of the

    event.

    TESTS OF HYPOTHESIS

    An example borrowed from criminal legal procedure may help to

    introduce some concepts.

    Null hypothesis: H : defendant is innocent.

    Alternative hypothesis: H a : defendant is guilty.

    The jury observes the evidence and reaches a verdict. The two types of

    error are:

    Type I error: an innocent person is found guilty.

    Type II error: a criminal is acquitted.

    The type I error consists of rejecting the null hypothesis when it is true.

    The type II error consists of accepting the null hypothesis when the

    alternative is true.

    An example of a statistical test of hypothesis is the following: A coin

    is known to be either fair (ht) or two headed (hh). After a single toss of the

    coin, a test of the hypothesis must be set up:

    H : the coin is fair

    versus H a : the coin is two headed.

    52

  • The test of a statistical hypothesis consists of a decision rule to accept

    or reject the null hypothesis (H ) on the basis of relevant statistical

    information (outcome) of a single toss of the coin. There are many decision

    rules that can be used to construct a test. Some are better (more reasonable)

    than others. Here are three different decision rules, each of which can be

    used as a statistical test for the null hypothesis that the coin is ht versus

    the alternative that the coin is hh.

    D|: Toss the coin and if t shows up accept H ; if h shows up rejectH -

    D2 : Toss the coin and if t shows up accept H ; if h shows up accept

    H„-

    D 3 : Toss the coin and if it rains outside accept H ; if it does notrain outside reject H .

    If we must choose one of the decision rules as a basis for our test, D 3 ,although it is a decision rule, can be disregarded on common sense grounds,

    since it does not use experimental evidence. However, when Dj and D 2 arecompared, it is not clear which one is better. Therefore, a measure of the

    goodness of the decision rule that is to be used as a basis for a statistical

    test of hypothesis is needed.

    Associated with a decision rule are two quantities: a = P (making atype I error) which reads "probability of making a Type I error" and /3 = "P

    (making a type II error). Clearly, a desirable property for a statistical

    test is that both a and /3 be as low as possible. Thus, the pair (a, /3)

    provides a basis to measure the goodness of a statistical test of hypothesis.

    Unfortunately, the size of both a and /3 is not enough by itself to select

    the best one of several available tests of hypothesis. This point will be

    illustrated by computing a and /3 for decision rules D t and D 2 in the coinexample.

    ForD^a = P (making a type I error)

    = P( rejecting H when it is true)= P (h shows up with a fair coin)

    1= 2

    /3 = P (making a type II error)= P (accepting H when H a is true)= P (t shows up with a two-headed coin)=

    Therefore for Dj, a = A and /3 =

    For D 2 ,a = P (making a type I error)

    = P (rejecting H when it is true)= 0, because according to D 2 we never reject H

    53

  • /3 = P (making a type II error)= P (accepting H when H a is true)= P (heads or tails show up when coin is 2h)= 1, because heads will always appear, so the event (heads or tails)

    will always occur.

    Therefore:

    a

    D 2 1

    The above table shows that the decision rule (D j or D 2 ) that should beused cannot be chosen only by considering the probabilities a and /3 that

    correspond to each decision rule because their relative importance is

    unknown. Consideration of the economic importance of the type I and type II

    errors is necessary when choosing which decision rule to use for a parti-

    cular situation. Further discussion of this point is beyond the scope of

    these notes.

    The probability of making a type I error is called the level of signifi-

    cance of the test, while the quantity, 1 minus P (type II error), is referred

    to as the power of the test. In statistical methodology, if no economic

    criteria are available to weigh the importance of the type I and type II

    errors, the usual way to select a test is to fix the level of significance

    arbitrarily, say 0.05, to consider only the decision rules for which the level

    of significance is 0.05, and then to choose the decision rule for which the

    power is maximum.

    CONFIDENCE INTERVALS

    Research workers often have to estimate parameters. An estimate of a

    parameter given by the corresponding statistic is unlikely to be precisely

    equal to the parameter, so it is necessary to show the margin of variability

    to which it is subject. A way to do this is to specify an interval within

    which we may be "confident" that the parameter lies. Such an interval is

    called a confidence interval.

    Example: A sample of 10 adult Canadians is taken to estimate the mean

    Canadian adult weight (fi).

    POPULATIONSAMPLE

    "\ 198 192\ 178 164

    ^SAMPLE \_1 ^ 158 190' ) 200 200/ 165 160

    54

  • If it is assumed that Canadian adult weights are normally distributed,

    by using a technique available in any textbook on statistics, the 95%confidence interval for n can be computed from the sample and is 160.75,

    190.25. The meaning of a confidence interval is often misunderstood. It

    does not mean that the probability is 0.95 that the population mean (//)lies between 160.75 and 190.25. It means that if many random samples

    of size 10 are taken from the population of adult Canadian weights and for

    each sample a 95% confidence interval for the mean is computed according

    to the above technique, then, in a very long series 95% of the confidence

    intervals so computed will contain the true mean.

    REFERENCES

    1. Amerine, M. A., and C. S. Ough. 1964. The Sensory Evaluation of

    Californian Wines. Lab. Pract. Vol. 13, No. 8.

    2. Baker, R. A. 1964. Taste and Odour in Water. Lab. Pract. Vol. 13,

    No. 8.

    3. Bengtsson, K., and E. Helm. 1953. Principles of Taste Testing.

    Wallerstein Laboratories Communications.

    4. Caul, J. F. 1957. The Profile Method of Flavor Analysis. Adv. in Food

    Research, 7: 1.

    5. Cochran, W.G., and G.M. Cox. 1957. Experimental Designs. John Wiley

    and Sons, New York, N.Y.

    6. Dawson, E. H., J.L. Brogdon, and S. McManus. 1963. Sensory Testing

    of Differences in Taste. Food Technol. Vol. 17, No. 9.

    7. Duncan, D. B. 1955. Multiple Range and Multiple F Tests. Biometrics,Vol. II.

    8. Eindhoven, J., D. Peryam, F. Heiligman, and G. A. Baker. 1964.

    Effect of Sample Sequence on Food Preference. J. Food Sci.

    Vol. 29, No. 4.

    9. Fisher, R. A., and F. Yates. 1942. Statistical Tables. Oliver and

    Boyd Ltd., Edinburgh and London.

    10. Kramer, A., J. Cooler, M. Modery, and B.A. Twigg. 1963. Numbers of

    Tasters Required to Determine Consumer Preference for Fruit

    Drinks. Food Technol. Vol. 17, No. 3.

    11. Lowe, B. 1963. Experimental Cookery, John Wiley and Sons. New York,N.Y.

    12. Pangborn, R. M. V. 1964. Sensory Evaluation of Food at the University

    of California. Lab. Pract. Vol. 13, No. 7.

    55

  • 13. Pettit, L. A. 1958. Informational Bias in Flavor Preference Testing.

    Food Technol. Vol. 12, No. 1.

    14. Peryam, D. R. 1964. Sensory Testing at the Quartermaster Food and

    Container Institute. Lab. Pract. Vol. 13, No. 7.

    15. Read, D. R. 1964. A Quantitative Approach to the Comparative Assess-ment of Taste Quality in the Confectionery Industry. Biometrics,

    20: 143-155.

    16. Robinson, P. 1959. Tests of Significance. Bulletin of Statistical

    Research Service, Canada Department of Agriculture.

    17. Sather, L. A. 1958. Laboratory Flavor Panels. Oregon Agricultural

    Experiment Station Paper No. 56.

    18. Sather, L. A., and L. D. Calvin. 1960. The Effect of Number of

    Judgments in a Test on Flavor Evaluations for Preference. Food

    Technol. Vol. 14.

    19. Sather, L. A., L. D. Calvin, and A. Tomsma. 1963. Relation of Prefer-

    ence Panels and Trained Panel Scores on Dry Whole Milk. J. Dairy

    Sci.46.

    20. Scheffe, H. 1952. An Analysis of Variance for Paired Comparisons.

    J.Am. Statist. Ass. 47: 381-400.

    21. Sjostrom, L. B., S. E. Cairncross, and J. F. Caul. 1957. Methodology

    of the Flavor Profile. Food Technol. Vol. 11, p. 20.

    22. Snedecor, G. W. 1956. Statistical Methods. Iowa State College Press,

    Iowa State College, Ames.

    23. Tilgner, D. J. 1962. Anchored Sensory Evaluation Tests — A StatusReport. Food Technol. Vol. 16, No. 3.

    24. Tilgner, D. J. 1962. Dilution Tests for Odor and Flavor Analysis.