Formula Stables


  • 8/17/2019 Formula Stables

    1/29

    Formulas and Tables Inferential Statistics


    Contents

    Descriptive statistics
    Regression
    Binomial distribution
    Normal distribution - z- and t-tests
    Analysis of variance (ANOVA)
    Cross tables - χ²-test
    Non-parametric tests
    Table 1a: Standard normal distribution - negative z-values
    Table 1b: Standard normal distribution - positive z-values
    Table 2: Critical values Student t-distribution
    Table 3: Critical values χ²-distribution
    Table 4: Critical values F-distribution for α = 0.05


    Descriptive statistics

    Mean

    For n observed values x of variable X, the mean equals

    \[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}. \]

    Mean of a frequency distribution

    For n observed values x of variable X, with k different outcomes $x_i$, each occurring with frequency $f_i$, the mean equals

    \[ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{n}. \]

    For a dichotomous (binary) variable X with outcomes x = 0 and x = 1, the mean equals the proportion of outcomes x = 1, referred to as $p_x$.
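    The two mean formulas above give the same result on the same data. The sketch below checks this on a small made-up data set; the values and variable names are assumptions chosen only for illustration.

```python
from collections import Counter

# Hypothetical observations, chosen only for illustration.
x = [2, 3, 3, 5, 5, 5, 7]
n = len(x)

# Mean from the raw observations: sum of x_i divided by n.
mean_raw = sum(x) / n

# Mean from the frequency distribution: sum of f_i * x_i divided by n,
# where Counter maps each distinct outcome x_i to its frequency f_i.
freq = Counter(x)
mean_freq = sum(f * value for value, f in freq.items()) / n

# For a 0/1 variable the mean is the proportion of ones, p_x.
binary = [1, 0, 0, 1, 1]
p_x = sum(binary) / len(binary)

print(mean_raw, mean_freq, p_x)
```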

    Median

    The median is the middle observed value of all ordered observations. The median corresponds to the 50th percentile, $P_{50}$ (see ‘Percentiles’ below).

    Mode

    The mode is the most frequently observed value.

    Standard deviation

    The standard deviation (as estimator for the population value σ) is

    \[ s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}. \]

    The standard deviation population value for a dichotomous (binary) variable is

    \[ \sigma_x = \sqrt{p_x (1 - p_x)}. \]


    Variance

    The variance (as estimator for the population value $\sigma^2$) is

    \[ s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}. \]

    The variance population value for a dichotomous (binary) variable is

    \[ \sigma_x^2 = p_x (1 - p_x). \]
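    As a rough check of the variance and standard deviation formulas, the sketch below computes them by hand and compares the results with Python's `statistics` module. The data are made up for illustration.

```python
import math
import statistics

# Hypothetical observations, for illustration only.
x = [4, 8, 6, 5, 3, 7]
n = len(x)
mean_x = sum(x) / n

# Sample variance: squared deviations from the mean, divided by n - 1.
var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
# Sample standard deviation: square root of the sample variance.
sd_x = math.sqrt(var_x)

# Binary variable: the population variance equals p_x * (1 - p_x).
binary = [1, 0, 0, 1, 1]
p_x = sum(binary) / len(binary)
var_binary = p_x * (1 - p_x)
sd_binary = math.sqrt(var_binary)

print(var_x, sd_x, var_binary, sd_binary)
```

    Here `statistics.variance` uses the same n − 1 divisor as the estimator above, while `statistics.pvariance` divides by n and therefore matches the population formula for the binary variable.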

    Percentiles

    The pth percentile is the value for which p percent of the observations are smaller than or equal to it. For example, the 50th percentile is the value for which half of all observations are smaller than or equal to it. This is referred to as $P_{50}$ (which is equivalent to the median).

    Interquartile distance

    The interquartile distance is

    \[ \mathrm{IQR} = Q_3 - Q_1, \]

    where $Q_3$ corresponds to $P_{75}$ and $Q_1$ corresponds to $P_{25}$.

    Range

    The range is the distance within which all observed values are located. It is calculated by

    \[ \text{range} = \text{maximum} - \text{minimum}. \]
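    The percentile definition above leaves open how to handle values that fall between two observations, and statistical software uses several conventions. The sketch below assumes the simple "nearest-rank" convention (the smallest observed value with at least p percent of the observations at or below it). That choice, the helper name `percentile` and the data are assumptions, not part of the formula sheet.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest observed value such that
    at least p percent of the observations are <= that value."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical observations, for illustration only.
x = [3, 7, 8, 5, 12, 14, 21, 13, 18]

p25 = percentile(x, 25)   # Q1
p50 = percentile(x, 50)   # median
p75 = percentile(x, 75)   # Q3
iqr = p75 - p25           # interquartile distance Q3 - Q1
value_range = max(x) - min(x)

print(p25, p50, p75, iqr, value_range)
```

    With an even number of observations, the nearest-rank $P_{50}$ can differ from the usual median, which averages the two middle values.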

    Z-score

    The z-score, or standardized score, is

    \[ z_{x_i} = \frac{x_i - \bar{x}}{s_x}. \]

    (This is a linear transformation with $a = -\bar{x}/s_x$ and $b = 1/s_x$, see ‘Linear transformation’ below).
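    A short sketch of the z-score, using made-up data: standardized scores have mean 0 and standard deviation 1, and the linear-transformation form $a + b x_i$ reproduces them.

```python
import statistics

# Hypothetical observations, for illustration only.
x = [4, 8, 6, 5, 3, 7]
mean_x = statistics.mean(x)
sd_x = statistics.stdev(x)  # sample standard deviation (n - 1 divisor)

# z-score: distance from the mean in units of the standard deviation.
z = [(xi - mean_x) / sd_x for xi in x]

# The same scores written as a linear transformation a + b * x_i.
a = -mean_x / sd_x
b = 1 / sd_x
z_linear = [a + b * xi for xi in x]

print(z)
```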


    Covariance

    The covariance between x and y is

    \[ s_{xy} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}). \]

    The following rules apply with respect to the variance and covariance:

    \[ s_{xx} = s_x^2 \]
    \[ s_{x+y}^2 = s_x^2 + s_y^2 + 2 s_{xy} \]
    \[ s_{x-y}^2 = s_x^2 + s_y^2 - 2 s_{xy}. \]

    For two dichotomous (binary) variables X and Y, where $p_{xy}$ equals the probability of a score of 1 for both X and Y, the covariance population value equals

    \[ \sigma_{xy} = p_{xy} - p_x p_y. \]
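    The variance and covariance rules above can be checked numerically. The sketch below uses made-up paired data and a small hypothetical helper, `sample_cov`, that implements the covariance formula directly.

```python
def sample_cov(u, v):
    """Sample covariance with the n - 1 divisor."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    return sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v)) / (n - 1)

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

s_xy = sample_cov(x, y)
var_x = sample_cov(x, x)  # s_xx equals the variance of x
var_y = sample_cov(y, y)

# Variance of the sum and of the difference, computed directly.
total = [xi + yi for xi, yi in zip(x, y)]
diff = [xi - yi for xi, yi in zip(x, y)]
var_sum = sample_cov(total, total)
var_diff = sample_cov(diff, diff)

print(s_xy, var_x, var_y, var_sum, var_diff)
```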

    Pearson’s (product-moment) correlation coefficient

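    The correlation between x and y is

    \[ r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{1}{n - 1} \sum_{i=1}^{n} z_{x_i} z_{y_i} = \frac{1}{n - 1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right). \]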

    Effect sizes correlation coefficient

    •   $r_{xy} = 0.1$ small effect
    •   $r_{xy} = 0.3$ medium effect
    •   $r_{xy} = 0.5$ large effect
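    The two expressions for Pearson's correlation (covariance divided by the product of the standard deviations, and the averaged product of z-scores) should agree. A minimal sketch on made-up data:

```python
import statistics

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
n = len(x)

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)

# Form 1: covariance divided by the product of the standard deviations.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
r_cov = s_xy / (s_x * s_y)

# Form 2: sum of products of z-scores, divided by n - 1.
z_x = [(xi - mean_x) / s_x for xi in x]
z_y = [(yi - mean_y) / s_y for yi in y]
r_z = sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

print(r_cov, r_z)
```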


    Linear transformation

    For a linear transformation $y_i = a + b x_i$ the following holds

    \[ \bar{y} = a + b \cdot \bar{x} \]

    and

    \[ s_y^2 = b^2 \cdot s_x^2 \]
    \[ s_y = |b| \cdot s_x. \]
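    The sketch below checks the linear transformation rules on made-up data. It deliberately uses a negative b (an assumption for illustration) to show that the standard deviation scales with the absolute value of b, since a standard deviation cannot be negative.

```python
import statistics

# Hypothetical observations and transformation constants, for illustration only.
x = [4, 8, 6, 5, 3, 7]
a, b = 10, -2

# Apply the linear transformation y_i = a + b * x_i.
y = [a + b * xi for xi in x]

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
var_x, var_y = statistics.variance(x), statistics.variance(y)
sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)

print(mean_y, var_y, sd_y)
```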


    Regression

    Simple linear regression 

    Regression equation simple linear regression

    \[ \hat{y}_i = a + b x_i, \]

    where the regression coefficient is estimated by

    \[ b = r_{xy} \frac{s_y}{s_x} \]

    and the intercept is estimated by

    \[ a = \bar{y} - b \bar{x}. \]
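    A minimal sketch of estimating the regression coefficient and intercept from the formulas above, using made-up paired data. The variable names are illustrative.

```python
import statistics

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
n = len(x)

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)

# Sample covariance and Pearson correlation.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
r_xy = s_xy / (s_x * s_y)

# Regression coefficient and intercept.
b = r_xy * s_y / s_x
a = mean_y - b * mean_x

# Predicted values from the fitted line.
y_hat = [a + b * xi for xi in x]

print(a, b)
```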

    Residual

    The residual (or prediction error) is

    \[ (y_i - \hat{y}_i), \]

    where $y_i$ is the observed value and $\hat{y}_i$ the predicted value for person i.

    Sums of squares for y

    n i=1

    (yi − y)2 = n i=1

    ( yi − y)2 + n i=1

    (yi − yi)2, also referred to as:

    SS y  =   SS   y−y   + SS y−  y, or as:

    SS tot =   SS reg   + SS res

    where  SS tot   is the total sum of squares of  y,  SS reg   is the regression sum of  squares ‘explained’ by the model and  SS res   is the residual sum of squares.


    Proportion explained variation

    The proportion explained variation (also called the proportional reduction in prediction error) is

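    \[ r_{xy}^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2 - \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{SS_{\text{tot}} - SS_{\text{res}}}{SS_{\text{tot}}} = \frac{SS_{\text{reg}}}{SS_{\text{tot}}}. \]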
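    The sum-of-squares decomposition and the proportion of explained variation can be sketched as follows, again on made-up data. The slope is computed as the ratio of the cross-product sum to the sum of squares of x, which is algebraically the same as $r_{xy} s_y / s_x$.

```python
# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares slope and intercept.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b = s_xy / s_xx
a = mean_y - b * mean_x
y_hat = [a + b * xi for xi in x]

# Sums of squares: total, regression ('explained') and residual.
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
ss_reg = sum((yh - mean_y) ** 2 for yh in y_hat)
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

# Proportion of explained variation.
r_squared = (ss_tot - ss_res) / ss_tot

print(ss_tot, ss_reg, ss_res, r_squared)
```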

    t-test for regression coefficient   b

    The test statistic for regression coefficient b assuming $H_0\colon \beta = \beta_0 = 0$ is

    \[ t = \frac{b - \beta_0}{se_b}, \]

    where $se_b$ is calculated by software. The statistic follows a t distribution with n − 2 degrees of freedom (df = n − 2), when the assumptions hold.
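    The formula sheet leaves $se_b$ to software. As an illustration only, the sketch below assumes the standard ordinary-least-squares expression for simple regression, $se_b = \sqrt{MS_{\text{res}} / \sum (x_i - \bar{x})^2}$, which is not stated in the text above. The data are made up.

```python
import math

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares slope and intercept.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b = s_xy / s_xx
a = mean_y - b * mean_x

# Residual variance with df = n - 2.
ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
df = n - 2
ms_res = ss_res / df

# ASSUMPTION: standard OLS standard error of the slope in simple regression.
se_b = math.sqrt(ms_res / s_xx)

beta_0 = 0  # value of the slope under the null hypothesis
t = (b - beta_0) / se_b

print(t, df)
```

    The resulting t would then be compared with the critical value of the t distribution for df = n − 2.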

    Standardized residual

    The standardized residual equals

    \[ \frac{y_i - \hat{y}_i}{se_{y_i - \hat{y}_i}}, \]

    where $se_{y_i - \hat{y}_i}$, the standard error for the residual (also referred to as $se_{\text{res}}$), is calculated by software.


    Residual standard deviation

    The residual standard deviation based on n observations equals

    \[ s_{\text{res}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - k}}, \]

    in other words: $s_{\text{res}} = \sqrt{SS_{\text{res}} / (n - k)}$, where k equals the number of parameters in the regression equation (k = 2 for simple regression).

    95% prediction interval for $y_i$

    \[ \hat{y}_i - 2s \le y_i \le \hat{y}_i + 2s, \]

    where s is the residual standard deviation and 2 is an approximation of $t_{\alpha/2}$.

    95% confidence interval for $\mu_y$

    \[ \hat{y} - 2(s/\sqrt{n}) \le \mu_y \le \hat{y} + 2(s/\sqrt{n}), \]

    where s is the residual standard deviation, 2 is an approximation of $t_{\alpha/2}$ and n is the number of observations.
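    The residual standard deviation and the two approximate 95% intervals can be sketched as follows. The data and the chosen predictor value `x_new` are made up, and the factor 2 is the same approximation of $t_{\alpha/2}$ used above.

```python
import math

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
n = len(x)
k = 2  # parameters in simple regression: intercept and slope
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares slope and intercept.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b = s_xy / s_xx
a = mean_y - b * mean_x

# Residual standard deviation: sqrt(SS_res / (n - k)).
ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_res = math.sqrt(ss_res / (n - k))

# Predicted value at a hypothetical predictor value.
x_new = 3
y_hat = a + b * x_new

# Approximate 95% prediction interval for an individual y.
pred_low, pred_high = y_hat - 2 * s_res, y_hat + 2 * s_res

# Approximate 95% confidence interval for the mean mu_y.
margin = 2 * s_res / math.sqrt(n)
conf_low, conf_high = y_hat - margin, y_hat + margin

print((pred_low, pred_high), (conf_low, conf_high))
```

    The confidence interval for the mean is narrower than the prediction interval for an individual value, because it divides s by $\sqrt{n}$.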


    Multiple linear regression 

    Regression equation multiple regression

    For the independent variables (predictors) $x_1, x_2, x_3, \ldots, x_j, \ldots$

    \[ \hat{y}_i = a + b_1 x_{i1} + b_2 x_{i2} + b_3 x_{i3} + \ldots + b_j x_{ij} + \ldots \]

    Proportion explained variation

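    The proportion explained variation (proportional reduction in prediction error), or squared multiple correlation coefficient, is

    \[ R^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2 - \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \]

    in other words: $R^2 = \frac{SS_{\text{tot}} - SS_{\text{res}}}{SS_{\text{tot}}} = \frac{SS_{\text{reg}}}{SS_{\text{tot}}}$.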

    Multiple correlation coefficient

    \[ R = \sqrt{R^2} \]

    F -test statistic regression analysis

    The null hypothesis that all regression coefficients equal zero is tested using

    \[ F = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 / (k - 1)}{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - k)} = \frac{SS_{\text{reg}} / df_{\text{reg}}}{SS_{\text{res}} / df_{\text{res}}} = \frac{MS_{\text{reg}}}{MS_{\text{res}}}, \]

    where k equals the number of parameters in the regression equation and n the number of observations. The degrees of freedom are $df_{\text{reg}} = k - 1$ and $df_{\text{res}} = n - k$. $df_{\text{reg}}$ and $df_{\text{res}}$ are often referred to as $df_1$ and $df_2$. MS denotes mean squares. $MS_{\text{res}}$ denotes the residual variance.
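    A sketch of the F statistic on made-up data. To stay short it fits a regression with a single predictor (so k = 2), which is an assumption for illustration; the same ratio of mean squares applies with more predictors. It also checks the equivalent form $F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$, which follows from dividing numerator and denominator by $SS_{\text{tot}}$.

```python
# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
n = len(x)
k = 2  # number of parameters: intercept and one slope
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares fit.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b = s_xy / s_xx
a = mean_y - b * mean_x
y_hat = [a + b * xi for xi in x]

# Sums of squares and their degrees of freedom.
ss_reg = sum((yh - mean_y) ** 2 for yh in y_hat)
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
df_reg, df_res = k - 1, n - k

# Mean squares and the F statistic.
ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res
f_stat = ms_reg / ms_res

# Equivalent form in terms of R squared.
r_squared = ss_reg / (ss_reg + ss_res)
f_from_r2 = (r_squared / df_reg) / ((1 - r_squared) / df_res)

print(f_stat, df_reg, df_res)
```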


    Test statistic for  b j

    The test statistic for regression coefficient $b_j$ assuming $H_0\colon \beta = \beta_0 = 0$ is

    \[ t_{b_j} = \frac{b_j - \beta_0}{se_{b_j}}, \]

    where $se_{b_j}$ is calculated by software. The statistic follows a t distribution with n − k degrees o