Ch 5 BB Analyze

Embed Size (px)

Citation preview

  • 7/29/2019 Ch 5 BB Analyze

    1/83

    Chapter 5

    Analyze

  • 7/29/2019 Ch 5 BB Analyze

    2/83

    Objective

    In the measure phase, our focus was primarily onY.

    In the analyze phase, we shall focus on the Xs.

    Objectives are To identify and establish the root cause(s) of the

    problem (i.e. the Xs affecting Y)

    [If required] To develop the function Y=f(X), that isuseful

    To identify the important Xs for experimentation

  • 7/29/2019 Ch 5 BB Analyze

    3/83

  • 7/29/2019 Ch 5 BB Analyze

    4/83

    Types of Factors (Xs)

    Noise Factors Neither the effect difference among levels can be

    reduced nor can the best level be picked up forimplementation

    Example:Environmental condition, measurementmethod, Degradation time, Supply voltage, Toleranceson control factors

    Block Factors The best level cant be picked up for implementation but

    the effect difference among levels can be reduced One or more control factors and/or noise factors are

    associated with a (naturally occurring) block Example:Geographical location, Shift of operation,

    Spindle number, Supplier

  • 7/29/2019 Ch 5 BB Analyze

    5/83

  • 7/29/2019 Ch 5 BB Analyze

    6/83

  • 7/29/2019 Ch 5 BB Analyze

    7/83

  • 7/29/2019 Ch 5 BB Analyze

    8/83

    Notes: Analysis Steps

    Validate measurement

    Develop operating definition for each X

    Perform gauge R & R for each X

    Select probable causes

    Base lining of each X

    Is-Is Not testing

    Importance rating Develop and test hypotheses

    Develop hypothesis Collect data Test hypothesis

  • 7/29/2019 Ch 5 BB Analyze

    9/83

  • 7/29/2019 Ch 5 BB Analyze

    10/83

    Brainstorming

    Brainstorming is a part of problem solving,where a group of people put social

    inhibitions and norms aside for generating

    as many new ideas as possible (regardlessof their initial worth).

  • 7/29/2019 Ch 5 BB Analyze

    11/83

  • 7/29/2019 Ch 5 BB Analyze

    12/83

  • 7/29/2019 Ch 5 BB Analyze

    13/83

  • 7/29/2019 Ch 5 BB Analyze

    14/83

  • 7/29/2019 Ch 5 BB Analyze

    15/83

  • 7/29/2019 Ch 5 BB Analyze

    16/83

  • 7/29/2019 Ch 5 BB Analyze

    17/83

  • 7/29/2019 Ch 5 BB Analyze

    18/83

  • 7/29/2019 Ch 5 BB Analyze

    19/83

  • 7/29/2019 Ch 5 BB Analyze

    20/83

    CE Diagram: Dispersion AnalysisType

    Poor DataQuality

    Sampling

    Sample collection

    Design

    implementation

    Samplingdesign

    Requirement

    Accuracy

    Cost

    Unit sampling cost

    Sample size

    Sample size

    Stratification

    Why does the dispersion in Because of

    Sampling occur? Sampling design, designimplementation and sample collection

    Sampling design occur? Requirement variation

  • 7/29/2019 Ch 5 BB Analyze

    21/83

  • 7/29/2019 Ch 5 BB Analyze

    22/83

    Bad CE Diagram

    Effect

    (1) Very simple looking

    (2) Contains very few causes

    Effect

    1

    2

    3 45

    (3) Complicated looking, but full of non-actionable causes

    (4) Complicated looking, but contains many causes having no effect

    A good CE diagramis one that is easy

    to use, leads toaction and actionsare likely to yieldresults

  • 7/29/2019 Ch 5 BB Analyze

    23/83

  • 7/29/2019 Ch 5 BB Analyze

    24/83

  • 7/29/2019 Ch 5 BB Analyze

    25/83

  • 7/29/2019 Ch 5 BB Analyze

    26/83

  • 7/29/2019 Ch 5 BB Analyze

    27/83

  • 7/29/2019 Ch 5 BB Analyze

    28/83

  • 7/29/2019 Ch 5 BB Analyze

    29/83

  • 7/29/2019 Ch 5 BB Analyze

    30/83

  • 7/29/2019 Ch 5 BB Analyze

    31/83

  • 7/29/2019 Ch 5 BB Analyze

    32/83

  • 7/29/2019 Ch 5 BB Analyze

    33/83

  • 7/29/2019 Ch 5 BB Analyze

    34/83

  • 7/29/2019 Ch 5 BB Analyze

    35/83

  • 7/29/2019 Ch 5 BB Analyze

    36/83

  • 7/29/2019 Ch 5 BB Analyze

    37/83

  • 7/29/2019 Ch 5 BB Analyze

    38/83

  • 7/29/2019 Ch 5 BB Analyze

    39/83

  • 7/29/2019 Ch 5 BB Analyze

    40/83

    Example Solution: One Tail

    80.1

    36

    15

    3685.372

    n

    S

    Xt

    Observed t = 1.80 does not fallin the rejection region

  • 7/29/2019 Ch 5 BB Analyze

    41/83

    Example : Z Test for Proportion

    Problem:A marketing company claimsthat it receives 4% responses from its

    Mailing.

    Approach:To test this claim, a randomsample of 500 were surveyed with 25

    responses. So observed p = .05 is more

    than the claimed ps = .04Solution:Z test (Assume a = .05). Whytest at all since p > ps? Both sided H1.

  • 7/29/2019 Ch 5 BB Analyze

    42/83

  • 7/29/2019 Ch 5 BB Analyze

    43/83

  • 7/29/2019 Ch 5 BB Analyze

    44/83

  • 7/29/2019 Ch 5 BB Analyze

    45/83

  • 7/29/2019 Ch 5 BB Analyze

    46/83

    t Test An Example

    Sample no. 1 2 3 4 5 6 7 8 9 10

    Oil Quenching 145 150 153 148 141 152 146 154 139 148

    Water quenching 155 153 150 158 143 149 161 155 154 146

    Hardness after hardening by two quenching methods

    Data159156153150147144141

    Oil Quenching

    Water Quenching

    Dotplot of Oil Quenching, Water Quenching

    What can you conclude?

    Mean s.d

    Oil Q 147.6 4.97

    Water Q 152.4 5.46

    sp =[(9*4.972 + 9*5.462)/(10 + 10 - 2)]

    = 5.221t0 = (147.6152.4) / [5.221*(2/10)] = -2.06

    tcritical = t.025, 18 = 2.101 > mod (to)

    H0: oil = waterHA: oilwater

    H0 can not be rejected at 95% level of confidence

  • 7/29/2019 Ch 5 BB Analyze

    47/83

  • 7/29/2019 Ch 5 BB Analyze

    48/83

  • 7/29/2019 Ch 5 BB Analyze

    49/83

    Pizza Experiment : Solution

    Subject 1 2 3 4 5 6 7 8 9 10 Mean s. d

    Pizza A 12.9 5.7 16.0 14.3 2.4 1.6 14.6 10.2 4.3 6.6 8.86 5.40

    Pizza B 16.0 7.5 16.0 14.0 13.2 3.4 15.5 11.3 15.4 10.6 12.29 4.18

    Hypotheses:: H0: A = B, H1: A BPooled s. d = sp = [(5.402 + 4.182)/2] = 4.83

    t = (12.298.86) / [4.83*(1/5)] = 3.43/2.16 = 1.59 < t.025, 18 = 2.101

    Subject 1 2 3 4 5 6 7 8 9 10 Mean s. d

    B - A 3.1 1.8 0.0 -0.3 10.8 1.8 0.9 1.1 11.1 4.0 3.43 4.17

    Hypotheses:: H0: B-A = 0, H1: B-A 0t = 3.43 / [4.17/10] = 2.60 > t.025, 9 = 2.262

    H0 is Rejected

    There is sufficient evidence to claim that

    people prefer Pizza A more than Pizza B

    H0can not berejected

    Two Sample t Test

    F T t

  • 7/29/2019 Ch 5 BB Analyze

    50/83

    F Test :Testing Variances of Two Populations

    H0: 12 = 22

    H1: 1222

    H0: 12 = 22

    H1: 12 > 22

    H0: 12 = 22

    H1: 12 < 22

    Hypotheses

    2

    1

    2

    2

    0s

    sF

    2

    2

    2

    1

    0s

    s

    F

    212

    2

    2

    10 , ss

    s

    sF >

    Test Statistic

    1,1,2/0 21 >

    nnFF

    a

    1,1,0 21 > nnFF a

    1,1,0 12 >

    nnFF

    a

    H0 Rejection criteria

    In the previous pizza example, standard deviation in the twocases are 5.40 and 4.18. Assuming two sided alternative,

    F0 = 5.402/4.182 = 1.67 < F.025, 9, 9 = 4.03

    H0not rejectedTwo variances are not significantly different

  • 7/29/2019 Ch 5 BB Analyze

    51/83

  • 7/29/2019 Ch 5 BB Analyze

    52/83

    Basic ANOVA Situation

    We want to compare means of two or moregroups

    Analysis of variance (ANOVA) makes use of an

    omnibus F test that tells us if there is anysignificant difference anywhere among the groups

    If F test says no significant difference then there isno point in searching further

    If F test indicates significance then we may useother tools to find out where the difference is

  • 7/29/2019 Ch 5 BB Analyze

    53/83

    An Example ANOVA Situation

    Subjects: 25 patients with blisters

    Treatments:Treatment A, Treatment B, Placebo

    Measurement:# of days until blisters heal

    Data [and means]:

    A: 5, 6, 6, 7, 7, 8, 9, 10 [7.25]

    B: 7, 7, 8, 9, 9, 10, 10, 11 [8.875]

    P: 7, 9, 9, 10, 10, 10, 11, 12, 13 [10.11]

    Are these differences significant?

  • 7/29/2019 Ch 5 BB Analyze

    54/83

    Two Sources of Variability

    In ANOVA, an estimate ofvariability between groupsiscompared withvariability within groups.

    Between-group variation is the variation among the means of thedifferent treatment conditions due to chance (random sampling

    error) and treatment effects, if any exist. Within-group variation is the variation due to chance (random

    sampling error) among individuals given the same treatment.

    Total Variation in Data

    Within Group Variation Variation due to chance

    Between Group Variation Variation due to chance and

    treatment effects, if any

  • 7/29/2019 Ch 5 BB Analyze

    55/83

    Variability Between Groups

    There is a lot of variability from one mean to the next.

    Large differences between means probably are not due tochance.

    It is difficult to imagine that all six groups are randomsamples taken from the same population.

    The null hypothesis is rejected, indicating a treatmenteffectin at least one of the groups.

  • 7/29/2019 Ch 5 BB Analyze

    56/83

    Variability Within Groups

    Same amount of variability between group means.

    However, there is more variability within each group.

    The larger the variability within each group, the lessconfident we can be that we are dealing withsamples drawn from different populations.

  • 7/29/2019 Ch 5 BB Analyze

    57/83

    The F Ratio

    The ANOVA F-statistic is a ratio of the Between Group

    Variation divided by the Within Group Variation

    VariationGroupWithinVariationGroupBetweenF

    A large F is evidence against H0, since it indicates that there

    is more difference between groups than within groups

  • 7/29/2019 Ch 5 BB Analyze

    58/83

    Two Sources of Variability

    1>F

    GroupsWithinyVariabilitGroupsBetweenyVariabilitF

  • 7/29/2019 Ch 5 BB Analyze

    59/83

    Two Sources of Variability

    roupsy Within GVariabilit

    Groupsy BetweenVariabilit

    F

    1F

  • 7/29/2019 Ch 5 BB Analyze

    60/83

    Blister Experiment: Minitab ANOVAOutput

    1 less than # of

    groups

    # of data values - # of groups

    (equals df for each group

    added together)

    1 less than # of individuals

    (just like other situations)

    Analysis of Variance for days

    Source DF SS MS F P

    Treatment 2 34.74 17.37 6.45 0.006

    Error 22 59.26 2.69

    Total 24 94.00

  • 7/29/2019 Ch 5 BB Analyze

    61/83

    Minitab ANOVA Output

    Analysis of Variance for daysSource DF SS MS F P

    Treatment 2 34.74 17.37 6.45 0.006

    Error 22 59.26 2.69

    Total 24 94.00

    2)( iobs

    ij xx

    (xi

    obs

    x)2

    (xijobs x)

    2

    SS stands for sum of squares

  • 7/29/2019 Ch 5 BB Analyze

    62/83

    Minitab ANOVA Output

    MSG = SSG / DFG

    MSE = SSE / DFE

    Analysis of Variance for days

    Source DF SS MS F P

    Treatment 2 34.74 17.37 6.45 0.006

    Error 22 59.26 2.69

    Total 24 94.00

    F = MSG / MSE

    P-value

    comes from

    F(DFG,DFE)

  • 7/29/2019 Ch 5 BB Analyze

    63/83

    Contingency Table

    Group Outcome Total

    Defective Non-defective

    Supplier 1 13 37 50

    Supplier 2 6 144 150

    Total 19 181 200

    26%

    4%

    74%

    96%

    Y = Outcome

    Y is discrete

    Its a 2 x 2

    contingency table

    Is there any difference between the two suppliers?

    Generally speaking, we want to know if there is anyassociation between the groups and outcomes

    In the above we have two categorical variables eachat two levels. In general we can have p x q x rx table

    We shall discuss only p x q

  • 7/29/2019 Ch 5 BB Analyze

    64/83

    Expected Frequencies

    Group Outcome Total

    Defective Non-defective

    Supplier 1 13 37 50

    Supplier 2 6 144 150

    Total 19 181 200

    5

    14

    45

    136

    #

    Expected frequency

    under H0, i.e. no

    supplier difference

    Under H0, Probability of getting a defective = 19 / 200 = 0.095

    So, under H0, number of defectives expected in the samples are

    Supplier 1: 50 x 0.095 = 5 (rounded to nearest integer) Supplier 2: 150 x 0.095 = 14 (rounded to nearest integer)

    Expected frequency of non-defectives can be found out similarly, or

    simply as (50 5) = 45 (supplier 1) and (150 14) = 136 (supplier 2)

  • 7/29/2019 Ch 5 BB Analyze

    65/83

  • 7/29/2019 Ch 5 BB Analyze

    66/83

    Contingency Table 2 Test

    Group Outcome Total

    Defective Non-defective

    Supplier 1 13 37 50

    Supplier 2 6 144 150

    Total 19 181 200

    5

    14

    45

    136

    p

    i

    q

    j ij

    ijij

    E

    OE

    1 1

    2

    2)(

    c

    H0 will be rejected if

    )1)(1(,22

    > qpacc

    Eij = Expected frequency

    Oij = Observed frequency

    11.21

    75.135

    )14475.135(

    25.14

    )625.14(

    25.45

    )3725.45(

    75.4

    )1375.4( 22222

    c

    2.05,1 = 3.84

    Expected frequencies

    should not be rounded

    For our data

    H0 is rejected. Two suppliers are significantly different

  • 7/29/2019 Ch 5 BB Analyze

    67/83

    Correlation

    In practice we frequently find that a group of twoor more variables move together

    For simplicity, let us assume two continuous

    variables The variables may be (Y1, Y2), (X1, X2) or (X, Y)

    At this moment we also not concerned whetherthe correlation is meaningful!

    How do we measure the strength of correlationbetween the two variables?

  • 7/29/2019 Ch 5 BB Analyze

    68/83

    Correlation Coefficient

    The most popular measure of association betweentwo continuous variables is the correlationcoefficient r

    It is defined in such a manner so that it varies

    Between -1 (perfect negative correlation)

    Through 0 (absolutely no correlation)

    To +1 (perfect positive correlation)

    However we should never jump to compute thevalue of r. The first step in studying associationbetween two variables is to plot the data in theform of a Scatter Diagram

  • 7/29/2019 Ch 5 BB Analyze

    69/83

    Scatter Diagram and r

    r = +1.0

  • 7/29/2019 Ch 5 BB Analyze

    70/83

    Scatter Diagram and r

    r = -1.0

  • 7/29/2019 Ch 5 BB Analyze

    71/83

    Scatter Diagram and r

    r = +0.7

    d

  • 7/29/2019 Ch 5 BB Analyze

    72/83

    Scatter Diagram and r

    r = +0.3

    S d

  • 7/29/2019 Ch 5 BB Analyze

    73/83

    Scatter Diagram and r

    r = -0.3

    S Di d

  • 7/29/2019 Ch 5 BB Analyze

    74/83

    Scatter Diagram and r

    r = 0

    U f l f S Di

  • 7/29/2019 Ch 5 BB Analyze

    75/83

    Usefulness of Scatter Diagram

    Drawing scatter diagram should always be the first step incorrelation analysis for the following reasons

    Guard against gross computational error

    Detects outliers, if any (may be genuine or data error) Detects influential points, if any (may be genuine or data error)

    Detects groups in data, if any

    Detects nonlinearity (r is a measure of linear association only)

    Provides a quick approximate solution

    Easy to understand

    C ti

  • 7/29/2019 Ch 5 BB Analyze

    76/83

    Computing r

    yyxx

    xy

    n

    y

    iyy

    n

    x

    ixx

    iixy

    SS

    Sr

    yyyS

    xxxS

    n

    yxxyyyxxS

    2

    2

    )(22

    )(22

    )(

    )(

    ))((

    22 )()(

    ))((

    yyxx

    yyxxr

    ii

    ii

    Data: (Var 1, Var 2)=(x, y)

    Formulae for convenient andefficient computing

    # x y x2

    y2

    xy1 2 6 4 36 12

    2 6 8 36 64 48

    3 1 4 1 16 4

    4 4 7 16 49 28

    5 3 4 9 16 12

    6 5 9 25 81 45

    7 8 10 64 100 80

    Total 29 48 155 362 229

    Sxy = 229 (29 * 48) / 7 = 30.14

    Sxx = 155 (29 * 29) / 7 = 34.86

    Syy = 362 (48 * 48) / 7 = 32.86

    r = (30.14/((34.86*32.86)) =0.891

    R i A l i

  • 7/29/2019 Ch 5 BB Analyze

    77/83

    Regression Analysis

    As in the correlation analysis we have n observations of theform (X, Y) or (X1, X2, X3, , Xk, Y)

    Y is called the dependent variable and the Xs independentvariables

    Objective is to develop a equation relating Y to the Xs forpredicting Y in the data range Simple linear regression

    Y = + X + , where is the random error component

    Multiple linear regression Y = + 1 X1 + 2 X2 + 3 X3+ . + kXk + All or a few of the Xs may be a function of x1

    Nonlinear regression

    Here we shall discuss only simple linear regression

    R i A l i A E l

  • 7/29/2019 Ch 5 BB Analyze

    78/83

    Regression Analysis: An Example

    Given one variable Goal: Predict Y Example:

    Given Years ofExperience Predict Salary

    Questions: When X=10, what is Y? When X=25, what is Y? This is known as

    regression

    X (years) Y (salary inRs. 1,000)

    3 30

    8 57

    9 64

    13 72

    3 36

    6 43

    11 59

    21 90

    1 20

    Obt i i th B t Fit Li

  • 7/29/2019 Ch 5 BB Analyze

    79/83

    Obtaining the Best Fit Line

    Years of Experience

    Salary

    20151050

    90

    80

    70

    60

    50

    40

    30

    20

    Scatterplot of Salary vs Years of Experience

    1. Plot the data

    Obt i i th B t Fit Li

  • 7/29/2019 Ch 5 BB Analyze

    80/83

    Obtaining the Best Fit Line

    Linear Regression: Y=3.5*X+23.2

    0

    20

    40

    60

    80

    100

    120

    0 5 10 15 20 25

    Years

    Sa

    lary

    2. Then obtain the best fit line

    Obt i i th B t Fit Li

  • 7/29/2019 Ch 5 BB Analyze

    81/83

    Obtaining the Best Fit Line

    by minimizing the deviations

    Usually, we minimize the square of thedeviations for estimating and

    Hence the name Least Square Method

    R i F l

  • 7/29/2019 Ch 5 BB Analyze

    82/83

    Regression Formulae

    XY ba

    xy

    xx

    yyxx

    ii

    i

    ii

    ba

    b

    2

    )(

    ))((

    Using our earlier notation we havexx

    xy

    S

    Sb

    Exercise: Estimateand for the salary data

    G d f Fit

  • 7/29/2019 Ch 5 BB Analyze

    83/83

    Goodness of Fit

    yy

    xy

    S

    S

    TSS

    SSR

    squaresofsumTotal

    regressiontoduesquaresofSumR

    b

    2

    Note that R

    2

    is nothing but the squareof the correlation coefficient r

    For practical purposes, usefulness of the equation isjudged by the standard error s

    sbandpredictioneApproximat

    MSEserrordardS

    nSSEMSEerrorsquareMean

    RSSTSSSSEsquaresofsumError

    2

    tan

    )2/(