Analisis Regresi Linier Dan Logistik


  • 8/9/2019 Analisis Regresi Linier Dan Logistik


    26/05/1

    Linear and Logistic Regression Analysis

    By:

    Nurita Andayani

    Introduction

    • Difference between chi-square and regression: the chi-square test of independence determines whether a statistical relationship exists between two variables. It tells us whether there is such a relationship, but not what that relationship is. Regression and correlation analyses show how to determine both the nature and the strength of a relationship between two variables.

    • Regression analysis is a body of statistical methods dealing with the formulation of mathematical models that depict relationships among variables, and the use of these modeled relationships for the purpose of prediction and other statistical inferences.

    • The word “regression” was first used in its present technical context by Sir Francis Galton, who analyzed the heights of sons and the average heights of their parents.


    Models

    • The independent or controlled variable is also called the predictor variable and is denoted by x. The effect or response variable is denoted by y.

    • If the relation between y and x is exactly a straight line, then the variables are connected by the formula:

    y = α + βx

    where α indicates the intercept of the line with the y axis and β represents the slope of the line, or the change in y per unit change in x.

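    [Figure: the line y = α + βx plotted against x, with intercept α on the y axis and slope β per unit increase in x; an observed value y_i at x_i is shown against the line value α + βx_i.]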

    Statistical Model

    $Y_i = \alpha + \beta x_i + e_i, \quad i = 1, \ldots, n$

    where:

    a) $x_1, x_2, \ldots, x_n$ are the set values of the controlled variable x that the experimenter has selected for the study.

    b) $e_1, e_2, \ldots, e_n$ are the unknown error components that are superimposed on the true linear relation. These are unobservable random variables, which we assume are independently and normally distributed with a mean of zero and an unknown variance of $\sigma^2$.

    c) The parameters α and β, which together locate the straight line, are unknown.


    Basic Notations

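    $\bar{x} = \dfrac{1}{n}\sum_{i} x_i, \qquad \bar{y} = \dfrac{1}{n}\sum_{i} y_i$

    $S_x^2 = \sum_{i} (x_i - \bar{x})^2 = \sum_{i} x_i^2 - n\bar{x}^2$

    $S_y^2 = \sum_{i} (y_i - \bar{y})^2 = \sum_{i} y_i^2 - n\bar{y}^2$

    $S_{xy} = \sum_{i} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i} x_i y_i - n\bar{x}\bar{y}$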

    Example

    • Zippy Cola is studying the effect of its latest advertising campaign. People chosen at random were called and asked how many cans of Zippy Cola they had bought in the past week and how many Zippy Cola advertisements they had either read or seen in the past week.

    X (number of ads)     3   7   4   2   0   4   1   2

    Y (cans purchased)   11  18   9   4   7   6   3   8
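
    As an illustrative sketch (not part of the original slides), the sample means and the corrected sums of squares and cross-products for the Zippy Cola data can be computed in plain Python. The variable names are chosen here for illustration only.

```python
# Zippy Cola data from the example.
x = [3, 7, 4, 2, 0, 4, 1, 2]      # number of ads read or seen
y = [11, 18, 9, 4, 7, 6, 3, 8]    # cans purchased

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Shortcut forms: sum of squares (or products) minus n times the squared mean (or product of means).
S_x2 = sum(xi ** 2 for xi in x) - n * x_bar ** 2
S_y2 = sum(yi ** 2 for yi in y) - n * y_bar ** 2
S_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar

print(x_bar, y_bar, S_x2, S_y2, S_xy)  # 2.875 8.25 32.875 155.5 56.25
```

    The same three quantities can also be computed from deviations about the means; both forms give identical results up to rounding.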


    Least Squares Regression Line

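    • Least squares regression line: $\hat{y} = \hat{\alpha} + \hat{\beta}x$

    • Least squares estimate of α: $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$

    • Least squares estimate of β: $\hat{\beta} = \dfrac{S_{xy}}{S_x^2}$

    • The residual sum of squares, or the sum of squares due to error, is:

    $\text{SSE} = \sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta}x_i)^2 = S_y^2 - \hat{\beta}^2 S_x^2$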
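
    A minimal sketch of the least squares fit, assuming the Zippy Cola data from the earlier example and illustrative variable names. It computes the slope as S_xy divided by S_x², the intercept as ȳ minus slope times x̄, and checks that the residual sum of squares agrees with the shortcut S_y² minus slope² times S_x².

```python
x = [3, 7, 4, 2, 0, 4, 1, 2]      # number of ads
y = [11, 18, 9, 4, 7, 6, 3, 8]    # cans purchased

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
S_x2 = sum((xi - x_bar) ** 2 for xi in x)
S_y2 = sum((yi - y_bar) ** 2 for yi in y)
S_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

beta_hat = S_xy / S_x2                  # slope estimate
alpha_hat = y_bar - beta_hat * x_bar    # intercept estimate

# SSE computed directly from the residuals and via the shortcut formula.
sse_direct = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
sse_shortcut = S_y2 - beta_hat ** 2 * S_x2

print(round(beta_hat, 3), round(alpha_hat, 3), round(sse_direct, 3))  # 1.711 3.331 59.255
```

    Under this fit, each additional advertisement seen is associated with roughly 1.7 more cans purchased per week.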

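    Properties of the Least Squares Estimators

    a) The least squares estimators are unbiased; that is, $E(\hat{\alpha}) = \alpha$ and $E(\hat{\beta}) = \beta$.

    b) $\mathrm{Var}(\hat{\alpha}) = \sigma^2\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_x^2}\right)$ and $\mathrm{Var}(\hat{\beta}) = \dfrac{\sigma^2}{S_x^2}$.

    c) The distributions of $\hat{\alpha}$ and $\hat{\beta}$ are normal with means of α and β, respectively; the standard deviations are the square roots of the variances given in b).

    d) $s^2 = \text{SSE}/(n-2)$ is an unbiased estimator of $\sigma^2$, that is, $E(s^2) = \sigma^2$. Also, $(n-2)s^2/\sigma^2$ is distributed as $\chi^2$ with d.f. = n − 2, and it is independent of $\hat{\alpha}$ and $\hat{\beta}$.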


    Inference about α

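    $H_0: \alpha = \alpha_0$ vs $H_1: \alpha \neq \alpha_0$ is based on

    $t = \dfrac{\hat{\alpha} - \alpha_0}{s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_x^2}}}, \quad \text{d.f.} = n - 2$

    p% confidence interval for α:

    $\text{CI}: \hat{\alpha} \pm t_{(1-p)/2} \cdot s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_x^2}}$

    where $t_{(1-p)/2}$ is the upper $(1-p)/2$ point of the t distribution with n − 2 d.f., with p expressed as a proportion.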
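
    The test and interval for α can be sketched as follows for the Zippy Cola data, assuming the null value α0 = 0 and a 95% confidence level. The critical value 2.447 is the standard t-table entry for 6 degrees of freedom (upper 2.5% point) and is entered by hand here as an assumption, since the standard library has no t distribution.

```python
import math

x = [3, 7, 4, 2, 0, 4, 1, 2]
y = [11, 18, 9, 4, 7, 6, 3, 8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
S_x2 = sum((xi - x_bar) ** 2 for xi in x)
S_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta_hat = S_xy / S_x2
alpha_hat = y_bar - beta_hat * x_bar

# s^2 = SSE / (n - 2) estimates the error variance.
sse = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Estimated standard error of the intercept estimate.
se_alpha = s * math.sqrt(1 / n + x_bar ** 2 / S_x2)

alpha_0 = 0.0                      # hypothesized intercept (assumption)
t_stat = (alpha_hat - alpha_0) / se_alpha

T_CRIT = 2.447                     # t table, 6 d.f., upper 2.5% point (entered by hand)
ci = (alpha_hat - T_CRIT * se_alpha, alpha_hat + T_CRIT * se_alpha)

print(round(t_stat, 3), tuple(round(v, 3) for v in ci))
```

    Because the absolute value of the statistic is below the critical value, and the interval contains 0, this sketch would not reject α = 0 at the 5% level.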

     

     

          

    Checks on The Straight Line Model

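    $y_i = (\hat{\alpha} + \hat{\beta}x_i) + (y_i - \hat{\alpha} - \hat{\beta}x_i)$

    observed y value = explained by linear relation + residual, or deviation from linear relation

    $S_y^2 = \hat{\beta}^2 S_x^2 + \text{SSE}$

    total SS of y = SS explained by linear relation + residual SS (unexplained)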


    ANOVA for checking the regression model

    Source       Sum of Squares   d.f.    Mean Square         F
    Regression   SSR              1       MSR = SSR/1         MSR/MSE
    Error        SSE              n – 2   MSE = SSE/(n – 2)
    Total        SST              n – 1

    Inference for regression model

    $H_0: \beta = 0$ vs $H_1: \beta \neq 0$

    Rejection region (with significance level α):

    $R: F > F_{\alpha}(1, n-2)$
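
    A short sketch of the ANOVA quantities and the F test for the Zippy Cola data. The 5% critical value 5.99 for F with (1, 6) degrees of freedom is taken from a standard F table and typed in as an assumption; the variable names are illustrative.

```python
x = [3, 7, 4, 2, 0, 4, 1, 2]
y = [11, 18, 9, 4, 7, 6, 3, 8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
S_x2 = sum((xi - x_bar) ** 2 for xi in x)
S_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta_hat = S_xy / S_x2
alpha_hat = y_bar - beta_hat * x_bar

SST = sum((yi - y_bar) ** 2 for yi in y)                                   # total, n - 1 d.f.
SSE = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))   # error, n - 2 d.f.
SSR = SST - SSE                                                            # regression, 1 d.f.

MSR = SSR / 1
MSE = SSE / (n - 2)
F = MSR / MSE

F_CRIT = 5.99          # F table, (1, 6) d.f., 5% level (entered by hand)
reject_h0 = F > F_CRIT

print(round(SSR, 3), round(SSE, 3), round(SST, 3), round(F, 3), reject_h0)
```

    Here F is about 9.75, which exceeds the tabulated value, so the hypothesis β = 0 would be rejected at the 5% level under these assumptions.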

     

     

        


    The coefficient of determination

    • The sample coefficient of determination is

    developed from relationship between two kinds of

    variation: variation of Y values in a data set around :

     – The fitted regression line

     – Their own mean

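    $R^2 = \dfrac{\text{SSR}}{\text{SST}} = 1 - \dfrac{\text{SSE}}{\text{SST}}$

    $0 \le R^2 \le 1$ or $0\% \le R^2 \le 100\%$

    $R^2 = 1$ (100%): perfectly fitted regression line; $R^2 = 0$ (0%): unfitted regression model.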

    The correlation coefficient

    • The correlation coefficient (r) indicates the direction of the relationship between the two variables X and Y

    • If an inverse relationship exists (that is, if Y decreases as X increases), then r will fall between 0 and -1

    • If there is a direct relationship (if Y increases as X increases), then r will be a value within the range 0 to 1

    $r = \dfrac{S_{xy}}{\sqrt{S_x^2 \cdot S_y^2}}$
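
    To illustrate how the two coefficients relate, the following sketch computes R² and r for the Zippy Cola data. In simple linear regression r² equals R², and the sign of r matches the sign of the slope; the variable names are illustrative.

```python
import math

x = [3, 7, 4, 2, 0, 4, 1, 2]
y = [11, 18, 9, 4, 7, 6, 3, 8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
S_x2 = sum((xi - x_bar) ** 2 for xi in x)
S_y2 = sum((yi - y_bar) ** 2 for yi in y)
S_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

beta_hat = S_xy / S_x2
SST = S_y2
SSE = S_y2 - beta_hat ** 2 * S_x2

R2 = 1 - SSE / SST                    # coefficient of determination
r = S_xy / math.sqrt(S_x2 * S_y2)     # correlation coefficient

print(round(R2, 3), round(r, 3))      # 0.619 0.787
```

    About 62% of the variation in cans purchased is explained by the fitted line, and the positive r indicates a direct relationship.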


    Exercise

    • The PUSKESMAS “PANCORAN MAS” (community health center) wants to know the relationship between age and blood pressure in its patients. A sample of 10 patients gave the following results:

    Age              38   36   72   42   68   63   49   56   60   55

    Blood pressure  115  118  160  140  152  149  145  147  155  150

    a) Build the regression model.

    b) If a patient is 40 years old, predict the blood pressure.

    c) Test the regression model you have built.

    d) Test whether the parameters α = 0 and β = 0.

    e) Construct 90% confidence intervals for α and β.

    f) Compute the coefficient of determination and the correlation coefficient, and explain what they mean.

    What is Logistic Regression?

     – A form of regression that allows the prediction of discrete variables by a mix of continuous and discrete predictors.

     – Addresses the same questions that discriminant function analysis and multiple regression do, but with no distributional assumptions on the predictors (the predictors do not have to be normally distributed, linearly related or have equal variance in each group).


    What is Logistic Regression?

     – Logistic regression is often used because the relationship between a discrete variable and a predictor is non-linear

    • Example from the text: the probability of heart disease changes very little with a ten-point difference among people with low blood pressure, but a ten-point change can mean a drastic change in the probability of heart disease in people with high blood pressure.

     Assumptions

    •  Absence of multicollinearity

    • No outliers

    • Independence of errors – assumes a between-subjects design. There are other forms if the design is within subjects.


    Background

     – Odds – like probability. Odds are usually written in a form such as “4 to 1 against”, which is equivalent to 1 chance out of five, a .20 probability or a 20% chance, etc.

    • The problem with probabilities is that they are non-linear

    • Going from .10 to .20 doubles the probability, but going from .80 to .90 barely increases the probability.

    Background

    • Odds ratio – the ratio of the probability over 1 – the probability, that is, the probability of winning over the probability of losing. A .20 probability (4 to 1 against) equates to an odds ratio of .20/.80 = .25.


    Background

    • Logit – this is the natural log of an odds

    ratio; often called a log odds even though

    it really is a log odds ratio. The logit

    scale is linear and functions much like a

    z-score scale.

    Background

    LOGITS ARE CONTINUOUS, LIKE Z SCORES

    p = 0.50, then logit = 0

    p = 0.70, then logit = 0.85

    p = 0.30, then logit = -0.85
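
    The conversion from probability to logit can be checked with a few lines of Python. This is a sketch with illustrative function names; note that the quantity p / (1 − p), which these slides call the "odds ratio", is named `odds` in the code.

```python
import math

def odds(p):
    """Probability of success over probability of failure, p / (1 - p)."""
    return p / (1 - p)

def logit(p):
    """Natural log of p / (1 - p)."""
    return math.log(odds(p))

for p in (0.10, 0.20, 0.30, 0.50, 0.70, 0.80, 0.90):
    print(f"p = {p:.2f}  odds = {odds(p):.3f}  logit = {logit(p):+.3f}")
```

    The logit is symmetric about p = 0.50: the values for 0.30 and 0.70 differ only in sign, and a probability of .20 corresponds to odds of .25.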


    Plain old regression

    • Y = A BINARY RESPONSE (DV)

     – 1 = POSITIVE RESPONSE (success), with probability P

     – 0 = NEGATIVE RESPONSE (failure), with probability Q = (1 - P)

    • MEAN(Y) = P, the observed proportion of successes

    • VAR(Y) = PQ, maximized when P = .50; the variance depends on the mean (P)

    • Xj = ANY TYPE OF PREDICTOR: continuous, dichotomous, polytomous
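
    A small sketch with made-up 0/1 data (an assumption for illustration, not data from the slides) shows that the mean of a binary response is the proportion of successes P, that its population variance is PQ, and that PQ is largest at P = .50.

```python
import statistics

# Hypothetical binary responses: 3 successes out of 10.
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

P = sum(y) / len(y)              # proportion of successes
Q = 1 - P
var_y = statistics.pvariance(y)  # population variance of the 0/1 values

print(P, round(var_y, 4), round(P * Q, 4))

# P * Q over a grid of probabilities peaks at P = 0.5.
grid = [i / 100 for i in range(101)]
best = max(grid, key=lambda p: p * (1 - p))
print(best)
```

    Because the variance changes with P, the constant-variance assumption of ordinary regression cannot hold for a binary response.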

    Plain old regression

    $Y \mid X = B_0 + B_1 X_1 + \varepsilon$

    • and it is assumed that errors are normally distributed, with mean = 0 and constant variance (i.e., homogeneity of variance)


    Plain old regression

    • an expected value is a mean, so

    $\hat{E}(Y \mid X) = B_0 + B_1 X_1$

    • The predicted value equals the proportion of observations for which Y|X = 1; P is the probability of Y = 1 (a success) given X, and Q = 1 - P (a failure) given X.

    $\hat{Y} = \hat{P}(Y = 1 \mid X)$

     An alternative – the ogive function

    •  An ogive function is a curved s-shaped

    function and the most common is the

    logistic function which looks like:


    The logistic function

    The logistic function

    $\hat{Y}_i = \dfrac{e^{u}}{1 + e^{u}}$

    • where $\hat{Y}_i$ is the estimated probability that the i-th case is in a category and u is the regular linear regression equation:

    $u = A + B_1 X_1 + B_2 X_2 + \cdots + B_K X_K$


    The logistic function

    $\hat{Y}_i = \dfrac{e^{b_0 + b_1 X_1}}{1 + e^{b_0 + b_1 X_1}}$

    The logistic function

    • Change in probability is not constant

    (linear) with constant changes in X

    • This means that the probability of a success (Y = 1) given the predictor variable (X) is a non-linear function, specifically a logistic function
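
    The non-constant change in probability can be seen numerically. The sketch below assumes a single predictor with b0 = -4.00 and b1 = 0.05, values borrowed from the curves plotted later in these slides; the function name is illustrative.

```python
import math

def logistic(u):
    """Logistic function e^u / (1 + e^u)."""
    return math.exp(u) / (1 + math.exp(u))

b0, b1 = -4.00, 0.05   # coefficients assumed for illustration

for x in range(30, 101, 10):
    print(x, round(logistic(b0 + b1 * x), 4))

# The same 10-unit step in X changes the probability by different amounts.
step_low = logistic(b0 + b1 * 40) - logistic(b0 + b1 * 30)
step_mid = logistic(b0 + b1 * 80) - logistic(b0 + b1 * 70)
print(round(step_low, 4), round(step_mid, 4))
```

    With these coefficients a step from X = 30 to 40 raises the probability by about .04, while a step from 70 to 80 raises it by about .12.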


    Logistic Function

    • Constant regression constant, different slopes

     – v2: b0 = -4.00, b1 = 0.05 (middle)

     – v3: b0 = -4.00, b1 = 0.15 (top)

     – v4: b0 = -4.00, b1 = 0.025 (bottom)

    [Figure: predicted probability (0.0 to 1.0) for v2, v3 and v4 plotted against V1 (30 to 100).]

    Logistic Function

    • Constant slopes with different regression constants

     – v2: b0 = -3.00, b1 = 0.05 (top)

     – v3: b0 = -4.00, b1 = 0.05 (middle)

     – v4: b0 = -5.00, b1 = 0.05 (bottom)

    [Figure: predicted probability (0.0 to 1.0) for v2, v3 and v4 plotted against V1 (30 to 100).]
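
    One way to read the two sets of curves is through the point where the predicted probability equals .50, which occurs where b0 + b1·X = 0, that is, at X = -b0 / b1. The sketch below uses the coefficients listed for the two plots; the dictionary and function names are illustrative.

```python
import math

def prob(b0, b1, x):
    """Logistic probability; 1 / (1 + e^-u) is algebraically equal to e^u / (1 + e^u)."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

def midpoint(b0, b1):
    """Value of X at which the predicted probability is 0.50."""
    return -b0 / b1

different_slopes = {"v2": (-4.00, 0.05), "v3": (-4.00, 0.15), "v4": (-4.00, 0.025)}
different_constants = {"v2": (-3.00, 0.05), "v3": (-4.00, 0.05), "v4": (-5.00, 0.05)}

for title, curves in (("different slopes", different_slopes),
                      ("different constants", different_constants)):
    print(title)
    for name, (b0, b1) in curves.items():
        print(f"  {name}: p = .50 at X = {midpoint(b0, b1):.1f}, "
              f"p at X = 65 is {prob(b0, b1, 65):.3f}")
```

    Changing the constant shifts the curve left or right without changing its steepness, while changing the slope changes how quickly the probability rises.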


    The Logit

    • By algebraic manipulation, the logistic regression equation can be written in terms of an odds ratio for success:

    $\dfrac{\hat{P}(Y = 1 \mid X_i)}{1 - \hat{P}(Y = 1 \mid X_i)} = \dfrac{\hat{\pi}}{1 - \hat{\pi}} = \exp(b_0 + b_1 X_{1i})$

     

     

     

     

    The Logit

    • Odds ratios range from 0 to positive

    infinity

    • Odds ratio: P/Q is an odds ratio; less

    than 1 = less than .50 probability, greater

    than 1 means greater than .50 probability


    The Logit

    • Finally, taking the natural log of both sides, we can write the equation in terms of logits (log-odds):

    $\ln\left(\dfrac{\hat{P}(Y = 1 \mid X)}{1 - \hat{P}(Y = 1 \mid X)}\right) = \ln\left(\dfrac{\hat{\pi}}{1 - \hat{\pi}}\right) = b_0 + b_1 X_1$

    for a single predictor.

    The Logit

    • For multiple predictors:

    $\ln\left(\dfrac{\hat{\pi}}{1 - \hat{\pi}}\right) = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$

     


    The Logit

    • Log-odds are a linear function of the predictors

    • The regression coefficients go back to their old interpretation (kind of)

     – b0: the expected value of the logit (log-odds) when X = 0

     – b1: called a “logit difference”; the amount the logit (log-odds) changes with a one-unit change in X, that is, in going from X to X + 1

    Conversion

    • EXP(logit), that is, e raised to the power of the logit = odds ratio

    • Probability = odds ratio / (1 + odds ratio)
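
    The conversion chain from logit back to probability can be sketched as follows. The coefficients b0 = -4.00 and b1 = 0.05 and the value X = 70 are assumptions chosen for illustration, and the quantity the slides call the "odds ratio" (P/Q) is named `odds` in the code.

```python
import math

def logit_to_odds(logit):
    """EXP(logit) gives P / (1 - P)."""
    return math.exp(logit)

def odds_to_prob(odds):
    """Probability = odds / (1 + odds)."""
    return odds / (1 + odds)

b0, b1 = -4.00, 0.05   # illustrative coefficients
x = 70                 # illustrative predictor value

logit_x = b0 + b1 * x
logit_x1 = b0 + b1 * (x + 1)

odds_x = logit_to_odds(logit_x)
odds_x1 = logit_to_odds(logit_x1)

print(round(odds_to_prob(odds_x), 4), round(odds_to_prob(odds_x1), 4))

# A one-unit increase in X adds b1 to the logit and multiplies the odds by exp(b1).
print(round(odds_x1 / odds_x, 4), round(math.exp(b1), 4))
```

    The change in the logit is constant at b1 per unit of X, but the resulting change in probability depends on where along the curve X sits.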


    THANK YOU

    GOOD LUCK