     # Formula Stables


• 8/17/2019 Formula Stables

Formulas and Tables Inferential Statistics

Contents

• Descriptive statistics
• Regression
• Binomial distribution
• Normal distribution - z- and t-tests
• Analysis of variance (ANOVA)
• Cross tables - χ²-test
• Non-parametric tests
• Table 1a: Standard normal distribution - negative z-values
• Table 1b: Standard normal distribution - positive z-values
• Table 2: Critical values Student t-distribution
• Table 3: Critical values χ²-distribution
• Table 4: Critical values F-distribution for α = 0.05

Descriptive statistics

Mean

For $n$ observed values $x_i$ of variable $X$, the mean equals

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}.$$

Mean of a frequency distribution

For $n$ observed values of variable $X$, with $k$ different outcomes $x_i$, each occurring with frequency $f_i$, the mean equals

$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{n}.$$

For a dichotomous (binary) variable $X$ with outcomes $x = 0$ and $x = 1$, the mean equals the proportion of outcomes $x = 1$, referred to as $p_x$.
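The two mean formulas and the binary special case can be checked with a minimal Python sketch. The data and the function names `mean` and `mean_from_frequencies` are illustrative assumptions, not part of the formula sheet.

```python
from collections import Counter


def mean(values):
    """Arithmetic mean: sum of the observations divided by n."""
    return sum(values) / len(values)


def mean_from_frequencies(values):
    """Mean via the frequency distribution: sum of f_i * x_i divided by n."""
    freqs = Counter(values)  # maps each distinct outcome x_i to its frequency f_i
    n = sum(freqs.values())
    return sum(f * x for x, f in freqs.items()) / n


scores = [2, 3, 3, 5, 7]  # hypothetical observations
binary = [1, 0, 1, 1]     # hypothetical 0/1 variable

print(mean(scores))                   # 4.0
print(mean_from_frequencies(scores))  # 4.0
print(mean(binary))                   # 0.75, the proportion of ones
```

Both routes give the same value because the frequency form only groups identical terms of the same sum.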

Median

The median is the middle observed value of all ordered observations. The median corresponds to the 50th percentile, $P_{50}$ (see ‘Percentiles’ below).

Mode

The mode is the most frequently observed value.

Standard deviation

The standard deviation (as estimator for the population value $\sigma$) is

$$s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}.$$

The standard deviation population value for a dichotomous (binary) variable is

$$\sigma_x = \sqrt{p_x (1 - p_x)}.$$

Variance

The variance (as estimator for the population value $\sigma^2$) is

$$s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}.$$

The variance population value for a dichotomous (binary) variable is

$$\sigma_x^2 = p_x (1 - p_x).$$
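As a rough illustration of the variance and standard deviation formulas, the sketch below computes both with the $n - 1$ denominator, plus the binary population variance. The data values and function names are assumptions made for the example.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))


def binary_population_variance(p):
    """Population variance of a 0/1 variable with proportion p of ones."""
    return p * (1 - p)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean 5

print(sample_variance(data))            # 32 / 7, about 4.57
print(sample_sd(data))                  # about 2.14
print(binary_population_variance(0.5))  # 0.25
```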

Percentiles

The $p$th percentile is the value at or below which $p$ percent of the observations fall. For example, the 50th percentile is the value for which half of all observations are smaller than or equal to it. This is referred to as $P_{50}$ (which is equivalent to the median).

Interquartile distance

The interquartile distance is

$$\mathrm{IQR} = Q_3 - Q_1,$$

where $Q_3$ corresponds to $P_{75}$ and $Q_1$ corresponds to $P_{25}$.

Range

The range indicates the distance within which all observed values are located. It is calculated by

$$\text{range} = \text{maximum} - \text{minimum}.$$
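The percentile, interquartile distance and range definitions can be sketched as follows. The sketch assumes a nearest-rank reading of the percentile definition (the smallest observed value with at least $p$ percent of observations at or below it); statistical software often interpolates instead and may return slightly different quartiles. The data and the function name `percentile` are hypothetical.

```python
import math


def percentile(values, p):
    """Smallest observed value with at least p percent of observations <= it."""
    ordered = sorted(values)
    n = len(ordered)
    rank = max(1, math.ceil(p * n / 100))  # 1-based position in the ordered data
    return ordered[rank - 1]


data = [7, 1, 12, 4, 9, 3, 6]  # hypothetical observations

q1 = percentile(data, 25)
median = percentile(data, 50)
q3 = percentile(data, 75)
iqr = q3 - q1
value_range = max(data) - min(data)

print(q1, median, q3, iqr, value_range)  # 3 6 9 6 11
```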

Z-score

The z-score, or standardized score, is

$$z_{x_i} = \frac{x_i - \bar{x}}{s_x}.$$

(This is a linear transformation with $a = -\bar{x}/s_x$ and $b = 1/s_x$, see ‘Linear transformation’ below).
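A short sketch of standardization, assuming the sample standard deviation with the $n - 1$ denominator; the data and the function name `z_scores` are illustrative only.

```python
import math


def z_scores(values):
    """Standardize each value: subtract the mean, divide by the sample SD."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))
    return [(x - m) / s for x in values]


z = z_scores([1, 2, 3])  # hypothetical data with mean 2 and SD 1
print(z)  # [-1.0, 0.0, 1.0]
```

Standardized scores always have mean 0 and standard deviation 1, which is what makes them comparable across variables.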

Covariance

The covariance between $x$ and $y$ is

$$s_{xy} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).$$

The following rules apply with respect to the variance and covariance:

$$\begin{aligned}
s_{xx} &= s_x^2 \\
s_{x+y}^2 &= s_x^2 + s_y^2 + 2 s_{xy} \\
s_{x-y}^2 &= s_x^2 + s_y^2 - 2 s_{xy}.
\end{aligned}$$

For two dichotomous (binary) variables $X$ and $Y$, where $p_{xy}$ equals the probability of a score of 1 for both $X$ and $Y$, the covariance population value equals

$$\sigma_{xy} = p_{xy} - p_x p_y.$$
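The covariance rules can be verified numerically. The sketch below uses small made-up data and hypothetical helper names; it checks the rules for the variance of a sum and of a difference.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


def variance(x):
    """The variance is the covariance of a variable with itself."""
    return covariance(x, x)


x = [1, 2, 3, 4]  # hypothetical data
y = [2, 1, 4, 3]

total = [xi + yi for xi, yi in zip(x, y)]
diff = [xi - yi for xi, yi in zip(x, y)]

var_sum_rule = variance(x) + variance(y) + 2 * covariance(x, y)
var_diff_rule = variance(x) + variance(y) - 2 * covariance(x, y)

print(variance(total), var_sum_rule)  # both about 5.33
print(variance(diff), var_diff_rule)  # both about 1.33
```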

Pearson’s (product-moment) correlation coefficient

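The correlation between $x$ and $y$ is

$$\begin{aligned}
r_{xy} &= \frac{s_{xy}}{s_x s_y} \\
&= \frac{1}{n - 1} \sum_{i=1}^{n} z_{x_i} z_{y_i} \\
&= \frac{1}{n - 1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right).
\end{aligned}$$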

Effect sizes correlation coefficient

• $r_{xy} = 0.1$ small effect
• $r_{xy} = 0.3$ medium effect
• $r_{xy} = 0.5$ large effect
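The two expressions for Pearson's correlation (covariance over the product of standard deviations, and the average product of z-scores) can be compared in a small sketch. The data and function names are assumptions for illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation_from_covariance(x, y):
    """r = s_xy / (s_x * s_y)."""
    mx, my = mean(x), mean(y)
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
    return s_xy / (sd(x) * sd(y))


def correlation_from_z_scores(x, y):
    """r = sum of z_x * z_y, divided by n - 1."""
    mx, my, sx, sy = mean(x), mean(y), sd(x), sd(y)
    products = [((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)]
    return sum(products) / (len(x) - 1)


x = [1, 2, 3, 4]  # hypothetical data
y = [2, 1, 4, 3]

r_cov = correlation_from_covariance(x, y)
r_z = correlation_from_z_scores(x, y)
print(r_cov, r_z)  # both about 0.6
```

By the effect-size guidelines above, a value around 0.6 would count as a large effect.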

Linear transformation

For a linear transformation $y_i = a + b x_i$ the following holds:

$$\bar{y} = a + b \cdot \bar{x}$$

and

$$s_y^2 = b^2 \cdot s_x^2, \qquad s_y = |b| \cdot s_x.$$
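A worked example of these rules, using a Celsius-to-Fahrenheit conversion ($a = 32$, $b = 1.8$) chosen purely for illustration; the temperatures are made up.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


a, b = 32, 1.8           # intercept and slope of the transformation
celsius = [10, 20, 30]   # hypothetical temperatures
fahrenheit = [a + b * c for c in celsius]

print(mean(fahrenheit), a + b * mean(celsius))  # both about 68
print(sd(fahrenheit), b * sd(celsius))          # both about 18
```

The intercept shifts the mean but leaves the spread unchanged; only the slope rescales the standard deviation.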

Regression

Simple linear regression

Regression equation simple linear regression

$$\hat{y}_i = a + b x_i,$$

where the regression coefficient is estimated by

$$b = r_{xy} \frac{s_y}{s_x}$$

and the intercept is estimated by

$$a = \bar{y} - b \bar{x}.$$
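The estimates for $b$ and $a$ can be computed directly from the correlation and the standard deviations. The following is a minimal sketch on invented data; the function name `fit_simple_regression` is hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def fit_simple_regression(x, y):
    """Return (a, b) with b = r_xy * s_y / s_x and a = mean(y) - b * mean(x)."""
    mx, my = mean(x), mean(y)
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
    r_xy = s_xy / (sd(x) * sd(y))
    b = r_xy * sd(y) / sd(x)
    a = my - b * mx
    return a, b


x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical outcomes

a, b = fit_simple_regression(x, y)
print(a, b)  # about 2.2 and 0.6
```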

Residual

The residual (or prediction error) is

$$y_i - \hat{y}_i,$$

where $y_i$ is the observed value and $\hat{y}_i$ the predicted value for person $i$.

Sums of squares for y

n i=1

(yi − y)2 = n i=1

( yi − y)2 + n i=1

(yi − yi)2, also referred to as:

SS y  =   SS   y−y   + SS y−  y, or as:

SS tot =   SS reg   + SS res

where  SS tot   is the total sum of squares of  y,  SS reg   is the regression sum of  squares ‘explained’ by the model and  SS res   is the residual sum of squares.

Proportion explained variation

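The proportion explained variation (also called the proportional reduction in prediction error) is

$$\begin{aligned}
r_{xy}^2 &= \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2 - \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\
&= \frac{SS_{tot} - SS_{res}}{SS_{tot}} \\
&= \frac{SS_{reg}}{SS_{tot}}.
\end{aligned}$$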
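The sums of squares and the proportion explained variation can be illustrated together. This sketch fits a least-squares line to made-up data (the slope is computed as the ratio of the sum of cross-products to the sum of squares of $x$, which is algebraically the same as $r_{xy} s_y / s_x$) and then checks that the total sum of squares splits into a regression and a residual part.

```python
x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical outcomes

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
a = mean_y - b * mean_x
y_hat = [a + b * xi for xi in x]  # predicted values

ss_tot = sum((yi - mean_y) ** 2 for yi in y)
ss_reg = sum((yh - mean_y) ** 2 for yh in y_hat)
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

r_squared = (ss_tot - ss_res) / ss_tot

print(ss_tot, ss_reg + ss_res)  # both about 6
print(r_squared)                # about 0.6
```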

t-test for regression coefficient   b

The test statistic for regression coefficient $b$, assuming $H_0\colon \beta = \beta_0 = 0$, is

$$t = \frac{b - \beta_0}{se_b},$$

where $se_b$ is calculated by software. The statistic follows a $t$ distribution with $n - 2$ degrees of freedom ($df = n - 2$), when the assumptions hold.
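A sketch of the $t$ statistic on invented data. The formula sheet leaves $se_b$ to software; the sketch assumes the usual textbook expression for simple regression, $se_b = s_{res} / \sqrt{\sum (x_i - \bar{x})^2}$, which is not taken from the sheet itself.

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical outcomes

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

ss_x = sum((xi - mean_x) ** 2 for xi in x)
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / ss_x
a = mean_y - b * mean_x

ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_res = math.sqrt(ss_res / (n - 2))  # residual standard deviation, k = 2

# Assumption: textbook standard error of the slope, not given in the sheet
se_b = s_res / math.sqrt(ss_x)

beta_0 = 0  # value of the coefficient under the null hypothesis
t = (b - beta_0) / se_b
df = n - 2

print(t, df)  # t is about 2.12 with 3 degrees of freedom
```

The resulting $t$ would then be compared with the critical value from the $t$ table at the given degrees of freedom.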

Standardized residual

The standardized residual equals

$$\frac{y_i - \hat{y}_i}{se_{y_i - \hat{y}_i}},$$

where $se_{y_i - \hat{y}_i}$, the standard error for the residual (also referred to as $se_{res}$), is calculated by software.

Residual standard deviation

The residual standard deviation based on $n$ observations equals

$$s_{res} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - k}},$$

in other words: $s_{res} = \sqrt{\frac{SS_{res}}{n - k}}$, where $k$ equals the number of parameters in the regression equation ($k = 2$ for simple regression).

95% prediction interval for $y_i$

$$\hat{y}_i - 2s \le y_i \le \hat{y}_i + 2s,$$

where $s$ is the residual standard deviation and 2 is an approximation of $t_{\alpha/2}$.

95% confidence interval for $\mu_y$

$$\hat{y} - 2(s/\sqrt{n}) \le \mu_y \le \hat{y} + 2(s/\sqrt{n}),$$

where $s$ is the residual standard deviation, 2 is an approximation of $t_{\alpha/2}$ and $n$ is the number of observations.
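The residual standard deviation and the two approximate intervals can be sketched on made-up data. The chosen predictor value `x0` and all variable names are assumptions for the example; the multiplier 2 is the approximation of $t_{\alpha/2}$ used above.

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical outcomes

n = len(x)
k = 2  # parameters in simple regression: intercept and slope
mean_x = sum(x) / n
mean_y = sum(y) / n

b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
a = mean_y - b * mean_x

ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(ss_res / (n - k))  # residual standard deviation

x0 = 3               # hypothetical predictor value of interest
y_hat = a + b * x0   # predicted value at x0

prediction_interval = (y_hat - 2 * s, y_hat + 2 * s)
confidence_interval = (y_hat - 2 * s / math.sqrt(n), y_hat + 2 * s / math.sqrt(n))

print(prediction_interval)  # roughly (2.21, 5.79)
print(confidence_interval)  # roughly (3.2, 4.8)
```

The prediction interval is wider because it covers an individual observation, while the confidence interval covers only the mean of $y$.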

Multiple linear regression

Regression equation multiple linear regression

For the independent variables (predictors) $x_1, x_2, x_3, \ldots, x_j, \ldots$

$$\hat{y}_i = a + b_1 x_{i1} + b_2 x_{i2} + b_3 x_{i3} + \ldots + b_j x_{ij} + \ldots.$$

Proportion explained variation

The proportion explained variation (proportional reduction in prediction error), or squared multiple correlation coefficient, is

$$R^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2 - \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},$$

in other words: $R^2 = \frac{SS_{tot} - SS_{res}}{SS_{tot}} = \frac{SS_{reg}}{SS_{tot}}$.

Multiple correlation coefficient

$$R = \sqrt{R^2}$$

F -test statistic regression analysis

The null hypothesis that all regression coefficients equal zero is tested using

$$F = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 / (k - 1)}{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - k)} = \frac{SS_{reg} / df_{reg}}{SS_{res} / df_{res}} = \frac{MS_{reg}}{MS_{res}},$$

where $k$ equals the number of parameters in the regression equation and $n$ the number of observations. The degrees of freedom are $df_{reg} = k - 1$ and $df_{res} = n - k$. $df_{reg}$ and $df_{res}$ are often referred to as $df_1$ and $df_2$. $MS$ denotes mean squares. $MS_{res}$ denotes the residual variance.
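The $F$ statistic follows directly from the sums of squares and the degrees of freedom. In this sketch the function name `f_statistic` and the input values are hypothetical; the numbers correspond to a simple regression with $n = 5$ and $k = 2$.

```python
def f_statistic(ss_reg, ss_res, n, k):
    """Return (F, df_reg, df_res) for a regression with k parameters."""
    df_reg = k - 1
    df_res = n - k
    ms_reg = ss_reg / df_reg  # mean square regression
    ms_res = ss_res / df_res  # mean square residual (residual variance)
    return ms_reg / ms_res, df_reg, df_res


# Hypothetical sums of squares
f_value, df1, df2 = f_statistic(ss_reg=3.6, ss_res=2.4, n=5, k=2)
print(f_value, df1, df2)  # F is about 4.5 with df 1 and 3
```

The computed value would be compared with the critical value in the $F$ table for $df_1$ and $df_2$.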

Test statistic for  b j

The test statistic for regression coefficient $b_j$, assuming $H_0\colon \beta_j = \beta_0 = 0$, is

$$t_{b_j} = \frac{b_j - \beta_0}{se_{b_j}},$$

where $se_{b_j}$ is calculated by software. The statistic follows a $t$ distribution with $n - k$ degrees of freedom.
