Statistics Cookbook



Probability and Statistics Cookbook

Copyright © Matthias Vallentin, [email protected]

    10th October, 2012


This cookbook integrates a variety of topics in probability theory and statistics. It is based on literature [1, 6, 3] and in-class material from courses of the statistics department at the University of California in Berkeley, but is also influenced by other sources [4, 5]. If you find errors or have suggestions for further topics, I would appreciate an email. The most recent version of this document is available at http://matthias.vallentin.net/probability-and-statistics-cookbook/. To reproduce, please contact me.

Contents

1 Distribution Overview
  1.1 Discrete Distributions
  1.2 Continuous Distributions
2 Probability Theory
3 Random Variables
  3.1 Transformations
4 Expectation
5 Variance
6 Inequalities
7 Distribution Relationships
8 Probability and Moment Generating Functions
9 Multivariate Distributions
  9.1 Standard Bivariate Normal
  9.2 Bivariate Normal
  9.3 Multivariate Normal
10 Convergence
  10.1 Law of Large Numbers (LLN)
  10.2 Central Limit Theorem (CLT)
11 Statistical Inference
  11.1 Point Estimation
  11.2 Normal-Based Confidence Interval
  11.3 Empirical Distribution
  11.4 Statistical Functionals
12 Parametric Inference
  12.1 Method of Moments
  12.2 Maximum Likelihood
    12.2.1 Delta Method
  12.3 Multiparameter Models
    12.3.1 Multiparameter Delta Method
  12.4 Parametric Bootstrap
13 Hypothesis Testing
14 Bayesian Inference
  14.1 Credible Intervals
  14.2 Function of Parameters
  14.3 Priors
    14.3.1 Conjugate Priors
  14.4 Bayesian Testing
15 Exponential Family
16 Sampling Methods
  16.1 The Bootstrap
    16.1.1 Bootstrap Confidence Intervals
  16.2 Rejection Sampling
  16.3 Importance Sampling
17 Decision Theory
  17.1 Risk
  17.2 Admissibility
  17.3 Bayes Rule
  17.4 Minimax Rules
18 Linear Regression
  18.1 Simple Linear Regression
  18.2 Prediction
  18.3 Multiple Regression
  18.4 Model Selection
19 Non-parametric Function Estimation
  19.1 Density Estimation
    19.1.1 Histograms
    19.1.2 Kernel Density Estimator (KDE)
  19.2 Non-parametric Regression
  19.3 Smoothing Using Orthogonal Functions
20 Stochastic Processes
  20.1 Markov Chains
  20.2 Poisson Processes
21 Time Series
  21.1 Stationary Time Series
  21.2 Estimation of Correlation
  21.3 Non-Stationary Time Series
    21.3.1 Detrending
  21.4 ARIMA Models
    21.4.1 Causality and Invertibility
  21.5 Spectral Analysis
22 Math
  22.1 Gamma Function
  22.2 Beta Function
  22.3 Series
  22.4 Combinatorics


    1 Distribution Overview

    1.1 Discrete Distributions

Notation¹: CDF $F_X(x)$, PMF $f_X(x)$, mean $\mathbb{E}[X]$, variance $\mathbb{V}[X]$, MGF $M_X(s)$.

Uniform $\text{Unif}\{a, \dots, b\}$:
$F_X(x) = \begin{cases} 0 & x < a \\ \frac{\lfloor x\rfloor - a + 1}{b - a + 1} & a \le x \le b \\ 1 & x > b \end{cases}$, $f_X(x) = \frac{I(a \le x \le b)}{b - a + 1}$, $\mathbb{E}[X] = \frac{a + b}{2}$, $\mathbb{V}[X] = \frac{(b - a + 1)^2 - 1}{12}$, $M_X(s) = \frac{e^{as} - e^{(b+1)s}}{(b - a + 1)(1 - e^s)}$

Bernoulli $\text{Bern}(p)$:
$f_X(x) = p^x(1 - p)^{1-x}$, $\mathbb{E}[X] = p$, $\mathbb{V}[X] = p(1 - p)$, $M_X(s) = 1 - p + pe^s$

Binomial $\text{Bin}(n, p)$:
$F_X(x) = I_{1-p}(n - x, x + 1)$, $f_X(x) = \binom{n}{x}p^x(1 - p)^{n-x}$, $\mathbb{E}[X] = np$, $\mathbb{V}[X] = np(1 - p)$, $M_X(s) = (1 - p + pe^s)^n$

Multinomial $\text{Mult}(n, p)$:
$f_X(x) = \frac{n!}{x_1!\cdots x_k!}p_1^{x_1}\cdots p_k^{x_k}$ with $\sum_{i=1}^{k}x_i = n$, $\mathbb{E}[X_i] = np_i$, $\mathbb{V}[X_i] = np_i(1 - p_i)$, $M_X(s) = \left(\sum_{i=1}^{k}p_ie^{s_i}\right)^n$

Hypergeometric $\text{Hyp}(N, m, n)$:
$f_X(x) = \frac{\binom{m}{x}\binom{N-m}{n-x}}{\binom{N}{n}}$, $\mathbb{E}[X] = \frac{nm}{N}$, $\mathbb{V}[X] = \frac{nm(N - n)(N - m)}{N^2(N - 1)}$

Negative Binomial $\text{NBin}(r, p)$:
$F_X(x) = I_p(r, x + 1)$, $f_X(x) = \binom{x + r - 1}{r - 1}p^r(1 - p)^x$, $\mathbb{E}[X] = r\,\frac{1 - p}{p}$, $\mathbb{V}[X] = r\,\frac{1 - p}{p^2}$, $M_X(s) = \left(\frac{p}{1 - (1 - p)e^s}\right)^r$

Geometric $\text{Geo}(p)$:
$F_X(x) = 1 - (1 - p)^x$ ($x \in \mathbb{N}^+$), $f_X(x) = p(1 - p)^{x-1}$ ($x \in \mathbb{N}^+$), $\mathbb{E}[X] = \frac{1}{p}$, $\mathbb{V}[X] = \frac{1 - p}{p^2}$, $M_X(s) = \frac{pe^s}{1 - (1 - p)e^s}$

Poisson $\text{Po}(\lambda)$:
$F_X(x) = e^{-\lambda}\sum_{i=0}^{\lfloor x\rfloor}\frac{\lambda^i}{i!}$, $f_X(x) = \frac{\lambda^x e^{-\lambda}}{x!}$, $\mathbb{E}[X] = \lambda$, $\mathbb{V}[X] = \lambda$, $M_X(s) = e^{\lambda(e^s - 1)}$

[Figure: PMFs of the discrete uniform; the Binomial for $(n, p) \in \{(40, 0.3), (30, 0.6), (25, 0.9)\}$; the Geometric for $p \in \{0.2, 0.5, 0.8\}$; and the Poisson for $\lambda \in \{1, 4, 10\}$.]

¹We use the notation $\Gamma(s, x)$ and $\Gamma(x)$ to refer to the Gamma functions (see 22.1), and use $B(x, y)$ and $I_x$ to refer to the Beta functions (see 22.2).
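The $\mathbb{E}[X]$ and $\mathbb{V}[X]$ columns can be checked numerically by summing over the support. A minimal Python sketch (helper names are ours) for the Binomial and Poisson rows:

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    # f_X(x) = C(n, x) p^x (1 - p)^(n - x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def pois_pmf(x, lam):
    # f_X(x) = lam^x e^(-lam) / x!
    return lam ** x * exp(-lam) / factorial(x)

def mean_var(pmf, support):
    # E[X] = sum x f(x),  V[X] = sum (x - E[X])^2 f(x)
    m = sum(x * pmf(x) for x in support)
    v = sum((x - m) ** 2 * pmf(x) for x in support)
    return m, v

n, p, lam = 10, 0.3, 4.0
mb, vb = mean_var(lambda x: binom_pmf(x, n, p), range(n + 1))
mp, vp = mean_var(lambda x: pois_pmf(x, lam), range(100))  # truncated support
# mb ≈ np, vb ≈ np(1 - p), and mp ≈ vp ≈ lam, matching the table
```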


    1.2 Continuous Distributions

Notation: CDF $F_X(x)$, PDF $f_X(x)$, mean $\mathbb{E}[X]$, variance $\mathbb{V}[X]$, MGF $M_X(s)$.

Uniform $\text{Unif}(a, b)$:
$F_X(x) = \begin{cases} 0 & x < a \\ \frac{x - a}{b - a} & a \le x \le b \\ 1 & x > b \end{cases}$, $f_X(x) = \frac{I(a < x < b)}{b - a}$, $\mathbb{E}[X] = \frac{a + b}{2}$, $\mathbb{V}[X] = \frac{(b - a)^2}{12}$, $M_X(s) = \frac{e^{sb} - e^{sa}}{s(b - a)}$

Normal $\mathcal{N}(\mu, \sigma^2)$:
$F_X(x) = \Phi(x) = \int_{-\infty}^{x}\phi(t)\,dt$, $f_X(x) = \phi(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}$, $\mathbb{E}[X] = \mu$, $\mathbb{V}[X] = \sigma^2$, $M_X(s) = \exp\left\{\mu s + \frac{\sigma^2 s^2}{2}\right\}$

Log-Normal $\ln\mathcal{N}(\mu, \sigma^2)$:
$F_X(x) = \frac{1}{2} + \frac{1}{2}\,\text{erf}\left[\frac{\ln x - \mu}{\sqrt{2\sigma^2}}\right]$, $f_X(x) = \frac{1}{x\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(\ln x - \mu)^2}{2\sigma^2}\right\}$, $\mathbb{E}[X] = e^{\mu + \sigma^2/2}$, $\mathbb{V}[X] = (e^{\sigma^2} - 1)\,e^{2\mu + \sigma^2}$

Multivariate Normal $\text{MVN}(\mu, \Sigma)$:
$f_X(x) = (2\pi)^{-k/2}|\Sigma|^{-1/2}e^{-\frac{1}{2}(x - \mu)^T\Sigma^{-1}(x - \mu)}$, $M_X(s) = \exp\left\{\mu^T s + \frac{1}{2}s^T\Sigma s\right\}$

Student's t $\text{Student}(\nu)$:
$f_X(x) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{x^2}{\nu}\right)^{-(\nu + 1)/2}$, $\mathbb{E}[X] = 0$ ($\nu > 1$), $\mathbb{V}[X] = \frac{\nu}{\nu - 2}$ ($\nu > 2$)

Chi-square $\chi^2_k$:
$F_X(x) = \frac{\gamma\left(\frac{k}{2}, \frac{x}{2}\right)}{\Gamma\left(\frac{k}{2}\right)}$, $f_X(x) = \frac{1}{2^{k/2}\Gamma(k/2)}x^{k/2 - 1}e^{-x/2}$, $\mathbb{E}[X] = k$, $\mathbb{V}[X] = 2k$, $M_X(s) = (1 - 2s)^{-k/2}$ ($s < 1/2$)

F $\text{F}(d_1, d_2)$:
$F_X(x) = I_{\frac{d_1 x}{d_1 x + d_2}}\left(\frac{d_1}{2}, \frac{d_2}{2}\right)$, $f_X(x) = \frac{\sqrt{\frac{(d_1 x)^{d_1} d_2^{d_2}}{(d_1 x + d_2)^{d_1 + d_2}}}}{x\,B\left(\frac{d_1}{2}, \frac{d_2}{2}\right)}$, $\mathbb{E}[X] = \frac{d_2}{d_2 - 2}$ ($d_2 > 2$), $\mathbb{V}[X] = \frac{2d_2^2(d_1 + d_2 - 2)}{d_1(d_2 - 2)^2(d_2 - 4)}$ ($d_2 > 4$)

Exponential $\text{Exp}(\beta)$:
$F_X(x) = 1 - e^{-x/\beta}$, $f_X(x) = \frac{1}{\beta}e^{-x/\beta}$, $\mathbb{E}[X] = \beta$, $\mathbb{V}[X] = \beta^2$, $M_X(s) = \frac{1}{1 - \beta s}$ ($s < 1/\beta$)

Gamma $\text{Gamma}(\alpha, \beta)$:
$F_X(x) = \frac{\gamma(\alpha, x/\beta)}{\Gamma(\alpha)}$, $f_X(x) = \frac{1}{\Gamma(\alpha)\beta^\alpha}x^{\alpha - 1}e^{-x/\beta}$, $\mathbb{E}[X] = \alpha\beta$, $\mathbb{V}[X] = \alpha\beta^2$, $M_X(s) = \left(\frac{1}{1 - \beta s}\right)^\alpha$ ($s < 1/\beta$)

Inverse Gamma $\text{InvGamma}(\alpha, \beta)$:
$F_X(x) = \frac{\Gamma\left(\alpha, \frac{\beta}{x}\right)}{\Gamma(\alpha)}$, $f_X(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}x^{-\alpha - 1}e^{-\beta/x}$, $\mathbb{E}[X] = \frac{\beta}{\alpha - 1}$ ($\alpha > 1$), $\mathbb{V}[X] = \frac{\beta^2}{(\alpha - 1)^2(\alpha - 2)}$ ($\alpha > 2$), $M_X(s) = \frac{2(-\beta s)^{\alpha/2}}{\Gamma(\alpha)}K_\alpha\left(\sqrt{-4\beta s}\right)$

Dirichlet $\text{Dir}(\alpha)$:
$f_X(x) = \frac{\Gamma\left(\sum_{i=1}^{k}\alpha_i\right)}{\prod_{i=1}^{k}\Gamma(\alpha_i)}\prod_{i=1}^{k}x_i^{\alpha_i - 1}$, $\mathbb{E}[X_i] = \frac{\alpha_i}{\sum_{i=1}^{k}\alpha_i}$, $\mathbb{V}[X_i] = \frac{\mathbb{E}[X_i](1 - \mathbb{E}[X_i])}{\sum_{i=1}^{k}\alpha_i + 1}$

Beta $\text{Beta}(\alpha, \beta)$:
$F_X(x) = I_x(\alpha, \beta)$, $f_X(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha - 1}(1 - x)^{\beta - 1}$, $\mathbb{E}[X] = \frac{\alpha}{\alpha + \beta}$, $\mathbb{V}[X] = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$, $M_X(s) = 1 + \sum_{k=1}^{\infty}\left(\prod_{r=0}^{k-1}\frac{\alpha + r}{\alpha + \beta + r}\right)\frac{s^k}{k!}$

Weibull $\text{Weibull}(\lambda, k)$:
$F_X(x) = 1 - e^{-(x/\lambda)^k}$, $f_X(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k - 1}e^{-(x/\lambda)^k}$, $\mathbb{E}[X] = \lambda\Gamma\left(1 + \frac{1}{k}\right)$, $\mathbb{V}[X] = \lambda^2\Gamma\left(1 + \frac{2}{k}\right) - \mu^2$, $M_X(s) = \sum_{n=0}^{\infty}\frac{s^n\lambda^n}{n!}\Gamma\left(1 + \frac{n}{k}\right)$

Pareto $\text{Pareto}(x_m, \alpha)$:
$F_X(x) = 1 - \left(\frac{x_m}{x}\right)^\alpha$ ($x \ge x_m$), $f_X(x) = \alpha\,\frac{x_m^\alpha}{x^{\alpha + 1}}$ ($x \ge x_m$), $\mathbb{E}[X] = \frac{\alpha x_m}{\alpha - 1}$ ($\alpha > 1$), $\mathbb{V}[X] = \frac{x_m^2\alpha}{(\alpha - 1)^2(\alpha - 2)}$ ($\alpha > 2$), $M_X(s) = \alpha(-x_m s)^\alpha\Gamma(-\alpha, -x_m s)$ ($s < 0$)


[Figure: PDFs of the continuous uniform; the Normal for $(\mu, \sigma^2) \in \{(0, 0.2), (0, 1), (0, 5), (-2, 0.5)\}$; the Log-Normal for various $(\mu, \sigma^2)$; Student's t for $\nu \in \{1, 2, 5, \infty\}$; $\chi^2$ for $k = 1, \dots, 5$; the F distribution for various $(d_1, d_2)$; the Exponential for $\beta \in \{2, 1, 0.4\}$; the Gamma for various $(\alpha, \beta)$; the Inverse Gamma; the Beta; the Weibull for $\lambda = 1$, $k \in \{0.5, 1, 1.5, 5\}$; and the Pareto for $x_m = 1$, $\alpha \in \{1, 2, 4\}$.]


    2 Probability Theory

Definitions

- Sample space $\Omega$
- Outcome (point or element) $\omega \in \Omega$
- Event $A \subseteq \Omega$
- $\sigma$-algebra $\mathcal{A}$:
  1. $\emptyset \in \mathcal{A}$
  2. $A_1, A_2, \dots \in \mathcal{A} \implies \bigcup_{i=1}^{\infty} A_i \in \mathcal{A}$
  3. $A \in \mathcal{A} \implies \neg A \in \mathcal{A}$
- Probability distribution $\mathbb{P}$:
  1. $\mathbb{P}[A] \ge 0$ for every $A$
  2. $\mathbb{P}[\Omega] = 1$
  3. $\mathbb{P}\left[\bigcup_{i=1}^{\infty} A_i\right] = \sum_{i=1}^{\infty} \mathbb{P}[A_i]$ for disjoint $A_i$
- Probability space $(\Omega, \mathcal{A}, \mathbb{P})$

Properties

- $\mathbb{P}[\emptyset] = 0$
- $B = \Omega \cap B = (A \cup \neg A) \cap B = (A \cap B) \cup (\neg A \cap B)$
- $\mathbb{P}[\neg A] = 1 - \mathbb{P}[A]$
- $\mathbb{P}[B] = \mathbb{P}[A \cap B] + \mathbb{P}[\neg A \cap B]$
- $\mathbb{P}[\Omega] = 1$, $\mathbb{P}[\emptyset] = 0$
- $\neg\left(\bigcup_n A_n\right) = \bigcap_n \neg A_n$ and $\neg\left(\bigcap_n A_n\right) = \bigcup_n \neg A_n$ (DeMorgan)
- $\mathbb{P}\left[\bigcup_n A_n\right] = 1 - \mathbb{P}\left[\bigcap_n \neg A_n\right]$
- $\mathbb{P}[A \cup B] = \mathbb{P}[A] + \mathbb{P}[B] - \mathbb{P}[A \cap B] \implies \mathbb{P}[A \cup B] \le \mathbb{P}[A] + \mathbb{P}[B]$
- $\mathbb{P}[A \cup B] = \mathbb{P}[A \cap \neg B] + \mathbb{P}[\neg A \cap B] + \mathbb{P}[A \cap B]$
- $\mathbb{P}[A \cap \neg B] = \mathbb{P}[A] - \mathbb{P}[A \cap B]$

Continuity of Probabilities

- $A_1 \subset A_2 \subset \dots \implies \lim_{n\to\infty} \mathbb{P}[A_n] = \mathbb{P}[A]$ where $A = \bigcup_{i=1}^{\infty} A_i$
- $A_1 \supset A_2 \supset \dots \implies \lim_{n\to\infty} \mathbb{P}[A_n] = \mathbb{P}[A]$ where $A = \bigcap_{i=1}^{\infty} A_i$

Independence

$A \perp B \iff \mathbb{P}[A \cap B] = \mathbb{P}[A]\,\mathbb{P}[B]$

Conditional Probability

$\mathbb{P}[A \mid B] = \frac{\mathbb{P}[A \cap B]}{\mathbb{P}[B]} \qquad \mathbb{P}[B] > 0$

Law of Total Probability

$\mathbb{P}[B] = \sum_{i=1}^{n} \mathbb{P}[B \mid A_i]\,\mathbb{P}[A_i] \qquad \Omega = \bigcup_{i=1}^{n} A_i$

Bayes' Theorem

$\mathbb{P}[A_i \mid B] = \frac{\mathbb{P}[B \mid A_i]\,\mathbb{P}[A_i]}{\sum_{j=1}^{n} \mathbb{P}[B \mid A_j]\,\mathbb{P}[A_j]} \qquad \Omega = \bigcup_{i=1}^{n} A_i$

Inclusion-Exclusion Principle

$\left|\bigcup_{i=1}^{n} A_i\right| = \sum_{r=1}^{n} (-1)^{r-1} \sum_{i_1 < \dots < i_r} \left|A_{i_1} \cap \dots \cap A_{i_r}\right|$
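For a finite sample space with equally likely outcomes, several of these properties can be verified by direct enumeration; a small Python sketch (the die and event names are ours):

```python
from fractions import Fraction

omega = set(range(1, 7))                      # sample space: one fair die
P = lambda E: Fraction(len(E), len(omega))    # uniform probability measure

A = {2, 4, 6}                                 # event "even"
B = {4, 5, 6}                                 # event "greater than 3"

# inclusion-exclusion for two events
assert P(A | B) == P(A) + P(B) - P(A & B)
# complement rule
assert P(omega - (A | B)) == 1 - P(A | B)
# DeMorgan: complement of a union is the intersection of complements
assert omega - (A | B) == (omega - A) & (omega - B)
# law of total probability with the partition {A, ¬A}
assert P(B) == P(B & A) + P(B & (omega - A))
```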


    3.1 Transformations

Transformation function: $Z = \varphi(X)$

Discrete

$f_Z(z) = \mathbb{P}[\varphi(X) = z] = \mathbb{P}[\{x : \varphi(x) = z\}] = \mathbb{P}\left[X \in \varphi^{-1}(z)\right] = \sum_{x \in \varphi^{-1}(z)} f(x)$

Continuous

$F_Z(z) = \mathbb{P}[\varphi(X) \le z] = \int_{A_z} f(x)\,dx \quad \text{with } A_z = \{x : \varphi(x) \le z\}$

Special case if $\varphi$ is strictly monotone:

$f_Z(z) = f_X\left(\varphi^{-1}(z)\right)\left|\frac{d}{dz}\varphi^{-1}(z)\right| = f_X(x)\left|\frac{dx}{dz}\right| = f_X(x)\frac{1}{|J|}$

The Rule of the Lazy Statistician

$\mathbb{E}[Z] = \int \varphi(x)\,dF_X(x)$
$\mathbb{E}[I_A(X)] = \int I_A(x)\,dF_X(x) = \int_A dF_X(x) = \mathbb{P}[X \in A]$

Convolution

- $Z := X + Y$: $f_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(x, z - x)\,dx \stackrel{X,Y \ge 0}{=} \int_0^z f_{X,Y}(x, z - x)\,dx$
- $Z := |X - Y|$: $f_Z(z) = 2\int_0^\infty f_{X,Y}(x, z + x)\,dx$
- $Z := \frac{X}{Y}$: $f_Z(z) = \int_{-\infty}^{\infty} |x|\,f_{X,Y}(x, xz)\,dx \stackrel{X \perp Y}{=} \int |x|\,f_X(x)f_Y(xz)\,dx$
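The convolution formula for $Z = X + Y$ has a direct discrete analogue, $f_Z(z) = \sum_x f_X(x)f_Y(z - x)$; a small Python sketch for two independent fair dice (the function name is ours):

```python
def convolve_pmf(fx, fy):
    # f_Z(z) = sum_x f_X(x) f_Y(z - x) for Z = X + Y with X ⟂ Y
    fz = {}
    for x, px in fx.items():
        for y, py in fy.items():
            fz[x + y] = fz.get(x + y, 0.0) + px * py
    return fz

die = {k: 1 / 6 for k in range(1, 7)}
total = convolve_pmf(die, die)   # pmf of the sum of two fair dice
# e.g. P[Z = 7] = 6/36, and the pmf still sums to 1
```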

    4 Expectation

Definition and properties

$\mathbb{E}[X] = \mu_X = \int x\,dF_X(x) = \begin{cases} \sum_x x f_X(x) & X \text{ discrete} \\ \int x f_X(x)\,dx & X \text{ continuous} \end{cases}$

- $\mathbb{P}[X = c] = 1 \implies \mathbb{E}[c] = c$
- $\mathbb{E}[cX] = c\,\mathbb{E}[X]$
- $\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$
- $\mathbb{E}[XY] = \iint xy\,f_{X,Y}(x, y)\,dx\,dy$
- $\mathbb{E}[\varphi(X)] \ne \varphi(\mathbb{E}[X])$ in general (cf. Jensen inequality)
- $\mathbb{P}[X \ge Y] = 1 \implies \mathbb{E}[X] \ge \mathbb{E}[Y]$
- $\mathbb{P}[X = Y] = 1 \implies \mathbb{E}[X] = \mathbb{E}[Y]$
- $\mathbb{E}[X] = \sum_{x=1}^{\infty} \mathbb{P}[X \ge x]$ ($X$ integer-valued)

Sample mean

$\overline{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$

Conditional expectation

- $\mathbb{E}[Y \mid X = x] = \int y f(y \mid x)\,dy$
- $\mathbb{E}[X] = \mathbb{E}[\mathbb{E}[X \mid Y]]$
- $\mathbb{E}[\varphi(X, Y) \mid X = x] = \int \varphi(x, y) f_{Y|X}(y \mid x)\,dy$
- $\mathbb{E}[\varphi(Y, Z) \mid X = x] = \iint \varphi(y, z) f_{(Y,Z)|X}(y, z \mid x)\,dy\,dz$
- $\mathbb{E}[Y + Z \mid X] = \mathbb{E}[Y \mid X] + \mathbb{E}[Z \mid X]$
- $\mathbb{E}[\varphi(X)Y \mid X] = \varphi(X)\,\mathbb{E}[Y \mid X]$
- $\mathbb{E}[Y \mid X] = c \implies \text{Cov}[X, Y] = 0$

5 Variance

Definition and properties

- $\mathbb{V}[X] = \sigma_X^2 = \mathbb{E}\left[(X - \mathbb{E}[X])^2\right] = \mathbb{E}[X^2] - \mathbb{E}[X]^2$
- $\mathbb{V}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n}\mathbb{V}[X_i] + 2\sum_{i \ne j}\text{Cov}[X_i, X_j]$
- $\mathbb{V}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n}\mathbb{V}[X_i]$ if $X_i \perp X_j$

Standard deviation

$\text{sd}[X] = \sqrt{\mathbb{V}[X]} = \sigma_X$

Covariance

- $\text{Cov}[X, Y] = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])] = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]$
- $\text{Cov}[X, a] = 0$
- $\text{Cov}[X, X] = \mathbb{V}[X]$
- $\text{Cov}[X, Y] = \text{Cov}[Y, X]$
- $\text{Cov}[aX, bY] = ab\,\text{Cov}[X, Y]$


- $\text{Cov}[X + a, Y + b] = \text{Cov}[X, Y]$
- $\text{Cov}\left[\sum_{i=1}^{n} X_i, \sum_{j=1}^{m} Y_j\right] = \sum_{i=1}^{n}\sum_{j=1}^{m}\text{Cov}[X_i, Y_j]$

Correlation

$\rho[X, Y] = \frac{\text{Cov}[X, Y]}{\sqrt{\mathbb{V}[X]\,\mathbb{V}[Y]}}$

Independence

$X \perp Y \implies \rho[X, Y] = 0 \iff \text{Cov}[X, Y] = 0 \iff \mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]$

Sample variance

$S^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(X_i - \overline{X}_n)^2$

Conditional variance

- $\mathbb{V}[Y \mid X] = \mathbb{E}\left[(Y - \mathbb{E}[Y \mid X])^2 \mid X\right] = \mathbb{E}[Y^2 \mid X] - \mathbb{E}[Y \mid X]^2$
- $\mathbb{V}[Y] = \mathbb{E}[\mathbb{V}[Y \mid X]] + \mathbb{V}[\mathbb{E}[Y \mid X]]$

    6 Inequalities

Cauchy-Schwarz: $\mathbb{E}[XY]^2 \le \mathbb{E}[X^2]\,\mathbb{E}[Y^2]$

Markov: $\mathbb{P}[\varphi(X) \ge t] \le \frac{\mathbb{E}[\varphi(X)]}{t}$

Chebyshev: $\mathbb{P}[|X - \mathbb{E}[X]| \ge t] \le \frac{\mathbb{V}[X]}{t^2}$

Chernoff: $\mathbb{P}[X \ge (1 + \delta)\mu] \le \left(\frac{e^\delta}{(1 + \delta)^{1 + \delta}}\right)^\mu \qquad \delta > -1$

Jensen: $\mathbb{E}[\varphi(X)] \ge \varphi(\mathbb{E}[X])$ for convex $\varphi$

    7 Distribution Relationships

Binomial

- $X_i \sim \text{Bern}(p) \implies \sum_{i=1}^{n} X_i \sim \text{Bin}(n, p)$
- $X \sim \text{Bin}(n, p),\ Y \sim \text{Bin}(m, p) \implies X + Y \sim \text{Bin}(n + m, p)$
- $\lim_{n\to\infty} \text{Bin}(n, p) = \text{Po}(np)$ ($n$ large, $p$ small)
- $\lim_{n\to\infty} \text{Bin}(n, p) = \mathcal{N}(np, np(1 - p))$ ($n$ large, $p$ far from 0 and 1)

Negative Binomial

- $X \sim \text{NBin}(1, p) = \text{Geo}(p)$
- $X \sim \text{NBin}(r, p) = \sum_{i=1}^{r} \text{Geo}(p)$
- $X_i \sim \text{NBin}(r_i, p) \implies \sum X_i \sim \text{NBin}\left(\sum r_i, p\right)$
- $X \sim \text{NBin}(r, p),\ Y \sim \text{Bin}(s + r, p) \implies \mathbb{P}[X \le s] = \mathbb{P}[Y \ge r]$

Poisson

- $X_i \sim \text{Po}(\lambda_i) \wedge X_i \perp X_j \implies \sum_{i=1}^{n} X_i \sim \text{Po}\left(\sum_{i=1}^{n}\lambda_i\right)$
- $X_i \sim \text{Po}(\lambda_i) \wedge X_i \perp X_j \implies X_i \,\Big|\, \sum_{j=1}^{n} X_j \sim \text{Bin}\left(\sum_{j=1}^{n} X_j, \frac{\lambda_i}{\sum_{j=1}^{n}\lambda_j}\right)$

Exponential

- $X_i \sim \text{Exp}(\beta) \wedge X_i \perp X_j \implies \sum_{i=1}^{n} X_i \sim \text{Gamma}(n, \beta)$
- Memoryless property: $\mathbb{P}[X > x + y \mid X > y] = \mathbb{P}[X > x]$

Normal

- $X \sim \mathcal{N}(\mu, \sigma^2) \implies \frac{X - \mu}{\sigma} \sim \mathcal{N}(0, 1)$
- $X \sim \mathcal{N}(\mu, \sigma^2) \wedge Z = aX + b \implies Z \sim \mathcal{N}(a\mu + b, a^2\sigma^2)$
- $X \sim \mathcal{N}(\mu_1, \sigma_1^2) \wedge Y \sim \mathcal{N}(\mu_2, \sigma_2^2) \implies X + Y \sim \mathcal{N}(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$
- $X_i \sim \mathcal{N}(\mu_i, \sigma_i^2) \implies \sum_i X_i \sim \mathcal{N}\left(\sum_i \mu_i, \sum_i \sigma_i^2\right)$
- $\mathbb{P}[a < X \le b] = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)$
- $\Phi(-x) = 1 - \Phi(x)$, $\phi'(x) = -x\phi(x)$, $\phi''(x) = (x^2 - 1)\phi(x)$
- Upper quantile of $\mathcal{N}(0, 1)$: $z_\alpha = \Phi^{-1}(1 - \alpha)$

Gamma

- $X \sim \text{Gamma}(\alpha, \beta) \iff X/\beta \sim \text{Gamma}(\alpha, 1)$
- $\text{Gamma}(\alpha, \beta) \sim \sum_{i=1}^{\alpha} \text{Exp}(\beta)$
- $X_i \sim \text{Gamma}(\alpha_i, \beta) \wedge X_i \perp X_j \implies \sum_i X_i \sim \text{Gamma}\left(\sum_i \alpha_i, \beta\right)$
- $\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1}e^{-x}\,dx$

Beta

- $\frac{1}{B(\alpha, \beta)}x^{\alpha - 1}(1 - x)^{\beta - 1} = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha - 1}(1 - x)^{\beta - 1}$
- $\mathbb{E}[X^k] = \frac{B(\alpha + k, \beta)}{B(\alpha, \beta)} = \frac{\alpha + k - 1}{\alpha + \beta + k - 1}\mathbb{E}[X^{k - 1}]$
- $\text{Beta}(1, 1) \sim \text{Unif}(0, 1)$


    8 Probability and Moment Generating Functions

$G_X(t) = \mathbb{E}\left[t^X\right] \qquad |t| < 1$
$M_X(s) = G_X(e^s) = \mathbb{E}\left[e^{sX}\right]$

10 Convergence

Let $\{X_1, X_2, \dots\}$ be a sequence of rvs and let $X$ be another rv. Types of convergence:

1. In distribution (weakly): $X_n \xrightarrow{D} X$ if $\lim_{n\to\infty} F_{X_n}(t) = F_X(t)$ at all $t$ where $F_X$ is continuous
2. In probability: $X_n \xrightarrow{P} X$ if $(\forall \varepsilon > 0)\ \lim_{n\to\infty} \mathbb{P}[|X_n - X| > \varepsilon] = 0$
3. Almost surely (strongly): $X_n \xrightarrow{as} X$ if $\mathbb{P}\left[\lim_{n\to\infty} X_n = X\right] = \mathbb{P}\left[\omega \in \Omega : \lim_{n\to\infty} X_n(\omega) = X(\omega)\right] = 1$


4. In quadratic mean ($L_2$): $X_n \xrightarrow{qm} X$ if $\lim_{n\to\infty} \mathbb{E}\left[(X_n - X)^2\right] = 0$

Relationships

- $X_n \xrightarrow{qm} X \implies X_n \xrightarrow{P} X \implies X_n \xrightarrow{D} X$
- $X_n \xrightarrow{as} X \implies X_n \xrightarrow{P} X$
- $X_n \xrightarrow{D} X \wedge (\exists c \in \mathbb{R})\ \mathbb{P}[X = c] = 1 \implies X_n \xrightarrow{P} X$
- $X_n \xrightarrow{P} X \wedge Y_n \xrightarrow{P} Y \implies X_n + Y_n \xrightarrow{P} X + Y$
- $X_n \xrightarrow{qm} X \wedge Y_n \xrightarrow{qm} Y \implies X_n + Y_n \xrightarrow{qm} X + Y$
- $X_n \xrightarrow{P} X \wedge Y_n \xrightarrow{P} Y \implies X_n Y_n \xrightarrow{P} XY$
- $X_n \xrightarrow{P} X \implies \varphi(X_n) \xrightarrow{P} \varphi(X)$
- $X_n \xrightarrow{D} X \implies \varphi(X_n) \xrightarrow{D} \varphi(X)$
- $X_n \xrightarrow{qm} b \iff \lim_{n\to\infty} \mathbb{E}[X_n] = b \wedge \lim_{n\to\infty} \mathbb{V}[X_n] = 0$
- $X_1, \dots, X_n$ iid $\wedge\ \mathbb{E}[X] = \mu \wedge \mathbb{V}[X] < \infty \implies \overline{X}_n \xrightarrow{qm} \mu$

Slutsky's Theorem

- $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{P} c \implies X_n + Y_n \xrightarrow{D} X + c$
- $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{P} c \implies X_n Y_n \xrightarrow{D} cX$
- In general: $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{D} Y$ does not imply $X_n + Y_n \xrightarrow{D} X + Y$

    10.1 Law of Large Numbers (LLN)

Let $\{X_1, \dots, X_n\}$ be a sequence of iid rvs with $\mathbb{E}[X_1] = \mu$.

Weak (WLLN): $\overline{X}_n \xrightarrow{P} \mu$ as $n \to \infty$

Strong (SLLN): $\overline{X}_n \xrightarrow{as} \mu$ as $n \to \infty$

    10.2 Central Limit Theorem (CLT)

Let $\{X_1, \dots, X_n\}$ be a sequence of iid rvs with $\mathbb{E}[X_1] = \mu$ and $\mathbb{V}[X_1] = \sigma^2$.

$Z_n := \frac{\overline{X}_n - \mu}{\sqrt{\mathbb{V}[\overline{X}_n]}} = \frac{\sqrt{n}\left(\overline{X}_n - \mu\right)}{\sigma} \xrightarrow{D} Z \quad \text{where } Z \sim \mathcal{N}(0, 1)$

$\lim_{n\to\infty} \mathbb{P}[Z_n \le z] = \Phi(z) \qquad z \in \mathbb{R}$

CLT notations

- $Z_n \approx \mathcal{N}(0, 1)$
- $\overline{X}_n \approx \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)$
- $\overline{X}_n - \mu \approx \mathcal{N}\left(0, \frac{\sigma^2}{n}\right)$
- $\sqrt{n}\left(\overline{X}_n - \mu\right) \approx \mathcal{N}\left(0, \sigma^2\right)$
- $\frac{\sqrt{n}\left(\overline{X}_n - \mu\right)}{\sigma} \approx \mathcal{N}(0, 1)$

Continuity correction

$\mathbb{P}\left[\overline{X}_n \le x\right] \approx \Phi\left(\frac{x + \frac{1}{2} - \mu}{\sigma/\sqrt{n}}\right) \qquad \mathbb{P}\left[\overline{X}_n \ge x\right] \approx 1 - \Phi\left(\frac{x - \frac{1}{2} - \mu}{\sigma/\sqrt{n}}\right)$

Delta method

$Y_n \approx \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right) \implies \varphi(Y_n) \approx \mathcal{N}\left(\varphi(\mu), \left(\varphi'(\mu)\right)^2 \frac{\sigma^2}{n}\right)$
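A quick simulation sketch of the CLT for $\text{Unif}(0, 1)$ summands (illustrative only; the sample sizes and seed are arbitrary choices of ours):

```python
import random

random.seed(1)
n, reps = 50, 2000
mu, var = 0.5, 1 / 12          # mean and variance of Unif(0, 1)

# Z_n = sqrt(n) (Xbar_n - mu) / sigma should be approximately N(0, 1)
z = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    z.append((n ** 0.5) * (xbar - mu) / var ** 0.5)

zbar = sum(z) / reps                              # should be near 0
zvar = sum((v - zbar) ** 2 for v in z) / reps     # should be near 1
```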

    11 Statistical Inference

Let $X_1, \dots, X_n \stackrel{iid}{\sim} F$ if not otherwise noted.

11.1 Point Estimation

- Point estimator $\hat\theta_n$ of $\theta$ is a rv: $\hat\theta_n = g(X_1, \dots, X_n)$
- $\text{bias}(\hat\theta_n) = \mathbb{E}[\hat\theta_n] - \theta$
- Consistency: $\hat\theta_n \xrightarrow{P} \theta$
- Sampling distribution: $F(\hat\theta_n)$
- Standard error: $\text{se}(\hat\theta_n) = \sqrt{\mathbb{V}[\hat\theta_n]}$
- Mean squared error: $\text{mse} = \mathbb{E}\left[(\hat\theta_n - \theta)^2\right] = \text{bias}(\hat\theta_n)^2 + \mathbb{V}[\hat\theta_n]$
- $\lim_{n\to\infty} \text{bias}(\hat\theta_n) = 0 \wedge \lim_{n\to\infty} \text{se}(\hat\theta_n) = 0 \implies \hat\theta_n$ is consistent
- Asymptotic normality: $\frac{\hat\theta_n - \theta}{\text{se}} \xrightarrow{D} \mathcal{N}(0, 1)$
- Slutsky's Theorem often lets us replace $\text{se}(\hat\theta_n)$ by some (weakly) consistent estimator $\hat\sigma_n$.


    11.2 Normal-Based Confidence Interval

Suppose $\hat\theta_n \approx \mathcal{N}\left(\theta, \widehat{\text{se}}^2\right)$. Let $z_{\alpha/2} = \Phi^{-1}(1 - \alpha/2)$, i.e., $\mathbb{P}[Z > z_{\alpha/2}] = \alpha/2$ and $\mathbb{P}[-z_{\alpha/2} < Z < z_{\alpha/2}] = 1 - \alpha$ where $Z \sim \mathcal{N}(0, 1)$. Then

$C_n = \hat\theta_n \pm z_{\alpha/2}\,\widehat{\text{se}}$

    11.3 Empirical distribution

Empirical Distribution Function (ECDF)

$\hat F_n(x) = \frac{\sum_{i=1}^{n} I(X_i \le x)}{n} \qquad I(X_i \le x) = \begin{cases} 1 & X_i \le x \\ 0 & X_i > x \end{cases}$

Properties (for any fixed $x$):

- $\mathbb{E}[\hat F_n(x)] = F(x)$
- $\mathbb{V}[\hat F_n(x)] = \frac{F(x)(1 - F(x))}{n}$
- $\text{mse} = \frac{F(x)(1 - F(x))}{n} \to 0$
- $\hat F_n(x) \xrightarrow{P} F(x)$

Dvoretzky-Kiefer-Wolfowitz (DKW) inequality ($X_1, \dots, X_n \sim F$):

$\mathbb{P}\left[\sup_x \left|F(x) - \hat F_n(x)\right| > \varepsilon\right] \le 2e^{-2n\varepsilon^2}$

Nonparametric $1 - \alpha$ confidence band for $F$:

$L(x) = \max\{\hat F_n(x) - \varepsilon_n, 0\} \qquad U(x) = \min\{\hat F_n(x) + \varepsilon_n, 1\} \qquad \varepsilon_n = \sqrt{\frac{1}{2n}\log\frac{2}{\alpha}}$

$\mathbb{P}[L(x) \le F(x) \le U(x)\ \forall x] \ge 1 - \alpha$
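A minimal Python sketch of the ECDF and the DKW confidence band (helper names and toy data are ours):

```python
from math import log, sqrt

def ecdf(data):
    # Fhat_n(x) = (# of X_i <= x) / n
    xs = sorted(data)
    n = len(xs)
    return lambda x: sum(1 for v in xs if v <= x) / n

data = [0.1, 0.3, 0.3, 0.7, 0.9]
Fn = ecdf(data)

alpha = 0.05
eps = sqrt(log(2 / alpha) / (2 * len(data)))   # DKW band half-width
L = lambda x: max(Fn(x) - eps, 0.0)            # lower envelope
U = lambda x: min(Fn(x) + eps, 1.0)            # upper envelope
```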

    11.4 Statistical Functionals

- Statistical functional: $T(F)$
- Plug-in estimator of $\theta = T(F)$: $\hat\theta_n = T(\hat F_n)$
- Linear functional: $T(F) = \int \varphi(x)\,dF_X(x)$
- Plug-in estimator for linear functional: $T(\hat F_n) = \int \varphi(x)\,d\hat F_n(x) = \frac{1}{n}\sum_{i=1}^{n}\varphi(X_i)$
- Often: $T(\hat F_n) \approx \mathcal{N}\left(T(F), \widehat{\text{se}}^2\right) \implies T(\hat F_n) \pm z_{\alpha/2}\,\widehat{\text{se}}$
- $p^{\text{th}}$ quantile: $F^{-1}(p) = \inf\{x : F(x) \ge p\}$
- $\hat\mu = \overline{X}_n$
- $\hat\sigma^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(X_i - \overline{X}_n)^2$
- Skewness: $\hat\kappa = \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \hat\mu)^3}{\hat\sigma^3}$
- Correlation: $\hat\rho = \frac{\sum_{i=1}^{n}(X_i - \overline{X}_n)(Y_i - \overline{Y}_n)}{\sqrt{\sum_{i=1}^{n}(X_i - \overline{X}_n)^2}\sqrt{\sum_{i=1}^{n}(Y_i - \overline{Y}_n)^2}}$

12 Parametric Inference

Let $\mathcal{F} = \{f(x; \theta) : \theta \in \Theta\}$ be a parametric model with parameter space $\Theta \subset \mathbb{R}^k$ and parameter $\theta = (\theta_1, \dots, \theta_k)$.

    12.1 Method of Moments

$j^{\text{th}}$ moment: $\alpha_j(\theta) = \mathbb{E}[X^j] = \int x^j\,dF_X(x)$

$j^{\text{th}}$ sample moment: $\hat\alpha_j = \frac{1}{n}\sum_{i=1}^{n} X_i^j$

Method of moments estimator (MoM): $\hat\theta_n$ solves the system

$\alpha_1(\hat\theta_n) = \hat\alpha_1$
$\alpha_2(\hat\theta_n) = \hat\alpha_2$
$\vdots$
$\alpha_k(\hat\theta_n) = \hat\alpha_k$
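For $\mathcal{N}(\mu, \sigma^2)$, where $\alpha_1 = \mu$ and $\alpha_2 = \sigma^2 + \mu^2$, the MoM system solves in closed form; a small Python sketch on toy data (the data values are ours):

```python
data = [1.0, 2.0, 2.0, 3.0, 4.0]
n = len(data)

# sample moments: alpha_hat_j = (1/n) sum X_i^j
a1 = sum(x for x in data) / n
a2 = sum(x * x for x in data) / n

# solve alpha_1(mu, s2) = mu and alpha_2(mu, s2) = s2 + mu^2
mu_hat = a1
s2_hat = a2 - a1 ** 2
```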


    Properties of the MoM estimator

- $\hat\theta_n$ exists with probability tending to 1
- Consistency: $\hat\theta_n \xrightarrow{P} \theta$
- Asymptotic normality: $\sqrt{n}(\hat\theta - \theta) \xrightarrow{D} \mathcal{N}(0, \Sigma)$ where $\Sigma = g\,\mathbb{E}[YY^T]\,g^T$, $Y = (X, X^2, \dots, X^k)^T$, $g = (g_1, \dots, g_k)$, and $g_j = \frac{\partial}{\partial\theta}\alpha_j^{-1}(\theta)$

12.2 Maximum Likelihood

Likelihood: $\mathcal{L}_n : \Theta \to [0, \infty)$

$\mathcal{L}_n(\theta) = \prod_{i=1}^{n} f(X_i; \theta)$

Log-likelihood

$\ell_n(\theta) = \log \mathcal{L}_n(\theta) = \sum_{i=1}^{n}\log f(X_i; \theta)$

Maximum likelihood estimator (mle)

$\mathcal{L}_n(\hat\theta_n) = \sup_\theta \mathcal{L}_n(\theta)$

Score function

$s(X; \theta) = \frac{\partial}{\partial\theta}\log f(X; \theta)$

Fisher information

$I(\theta) = \mathbb{V}_\theta[s(X; \theta)] \qquad I_n(\theta) = n\,I(\theta)$

Fisher information (exponential family)

$I(\theta) = -\mathbb{E}_\theta\left[\frac{\partial}{\partial\theta} s(X; \theta)\right]$

Observed Fisher information

$I_n^{obs}(\theta) = -\frac{\partial^2}{\partial\theta^2}\sum_{i=1}^{n}\log f(X_i; \theta)$

Properties of the mle

- Consistency: $\hat\theta_n \xrightarrow{P} \theta$
- Equivariance: $\hat\theta_n$ is the mle $\implies \varphi(\hat\theta_n)$ is the mle of $\varphi(\theta)$
- Asymptotic normality:
  1. $\text{se} \approx \sqrt{1/I_n(\theta)}$ and $\frac{\hat\theta_n - \theta}{\text{se}} \xrightarrow{D} \mathcal{N}(0, 1)$
  2. $\widehat{\text{se}} \approx \sqrt{1/I_n(\hat\theta_n)}$ and $\frac{\hat\theta_n - \theta}{\widehat{\text{se}}} \xrightarrow{D} \mathcal{N}(0, 1)$
- Asymptotic optimality (or efficiency), i.e., smallest variance for large samples. If $\tilde\theta_n$ is any other estimator, the asymptotic relative efficiency is $\text{are}(\tilde\theta_n, \hat\theta_n) = \frac{\mathbb{V}[\hat\theta_n]}{\mathbb{V}[\tilde\theta_n]} \le 1$
- Approximately the Bayes estimator

12.2.1 Delta Method

If $\tau = \varphi(\theta)$ where $\varphi$ is differentiable and $\varphi'(\theta) \ne 0$:

$\frac{\hat\tau_n - \tau}{\widehat{\text{se}}(\hat\tau)} \xrightarrow{D} \mathcal{N}(0, 1)$

where $\hat\tau = \varphi(\hat\theta)$ is the mle of $\tau$ and $\widehat{\text{se}} = |\varphi'(\hat\theta)|\,\widehat{\text{se}}(\hat\theta_n)$.

12.3 Multiparameter Models

Let $\theta = (\theta_1, \dots, \theta_k)$ and $\hat\theta = (\hat\theta_1, \dots, \hat\theta_k)$ be the mle.

$H_{jj} = \frac{\partial^2\ell_n}{\partial\theta_j^2} \qquad H_{jk} = \frac{\partial^2\ell_n}{\partial\theta_j\,\partial\theta_k}$

Fisher information matrix

$I_n(\theta) = -\begin{bmatrix} \mathbb{E}_\theta[H_{11}] & \cdots & \mathbb{E}_\theta[H_{1k}] \\ \vdots & \ddots & \vdots \\ \mathbb{E}_\theta[H_{k1}] & \cdots & \mathbb{E}_\theta[H_{kk}] \end{bmatrix}$

Under appropriate regularity conditions

$(\hat\theta - \theta) \approx \mathcal{N}(0, J_n)$


with $J_n(\theta) = I_n^{-1}(\theta)$. Further, if $\hat\theta_j$ is the $j^{\text{th}}$ component of $\hat\theta$, then

$\frac{\hat\theta_j - \theta_j}{\widehat{\text{se}}_j} \xrightarrow{D} \mathcal{N}(0, 1)$

where $\widehat{\text{se}}_j^2 = J_n(j, j)$ and $\text{Cov}[\hat\theta_j, \hat\theta_k] = J_n(j, k)$.

12.3.1 Multiparameter Delta Method

Let $\tau = \varphi(\theta_1, \dots, \theta_k)$ and let the gradient of $\varphi$ be

$\nabla\varphi = \begin{pmatrix} \frac{\partial\varphi}{\partial\theta_1} \\ \vdots \\ \frac{\partial\varphi}{\partial\theta_k} \end{pmatrix}$

Suppose $\nabla\varphi\big|_{\theta = \hat\theta} \ne 0$ and $\hat\tau = \varphi(\hat\theta)$. Then

$\frac{\hat\tau - \tau}{\widehat{\text{se}}(\hat\tau)} \xrightarrow{D} \mathcal{N}(0, 1)$

where $\widehat{\text{se}}(\hat\tau) = \sqrt{\left(\hat\nabla\varphi\right)^T \hat J_n \left(\hat\nabla\varphi\right)}$, $\hat J_n = J_n(\hat\theta)$, and $\hat\nabla\varphi = \nabla\varphi\big|_{\theta = \hat\theta}$.

12.4 Parametric Bootstrap

Sample from $f(x; \hat\theta_n)$ instead of from $\hat F_n$, where $\hat\theta_n$ could be the mle or method of moments estimator.

    13 Hypothesis Testing

$H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_1$

Definitions

- Null hypothesis $H_0$
- Alternative hypothesis $H_1$
- Simple hypothesis: $\theta = \theta_0$
- Composite hypothesis: $\theta > \theta_0$ or $\theta < \theta_0$
- Two-sided test: $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$
- One-sided test: $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$
- Critical value $c$
- Test statistic $T$
- Rejection region $R = \{x : T(x) > c\}$
- Power function $\beta(\theta) = \mathbb{P}[X \in R]$
- Power of a test: $1 - \mathbb{P}[\text{Type II error}] = 1 - \beta = \inf_{\theta \in \Theta_1}\beta(\theta)$
- Test size: $\alpha = \mathbb{P}[\text{Type I error}] = \sup_{\theta \in \Theta_0}\beta(\theta)$

Error types:

- $H_0$ true: retaining $H_0$ is correct; rejecting $H_0$ is a Type I error ($\alpha$)
- $H_1$ true: retaining $H_0$ is a Type II error ($\beta$); rejecting $H_0$ is correct (power)

p-value

- p-value $= \sup_{\theta \in \Theta_0}\mathbb{P}_\theta[T(X) \ge T(x)] = \inf\left\{\alpha : T(x) \in R_\alpha\right\}$
- p-value $= \sup_{\theta \in \Theta_0}\mathbb{P}_\theta[T(X^\star) \ge T(X)] \approx 1 - F(T(X))$ since $T(X^\star) \sim F$

Interpretation:

- p-value < 0.01: very strong evidence against $H_0$
- 0.01-0.05: strong evidence against $H_0$
- 0.05-0.1: weak evidence against $H_0$
- > 0.1: little or no evidence against $H_0$

Wald test

- Two-sided test
- Reject $H_0$ when $|W| > z_{\alpha/2}$ where $W = \frac{\hat\theta - \theta_0}{\widehat{\text{se}}}$
- $\mathbb{P}[|W| > z_{\alpha/2}] \to \alpha$
- p-value $= \mathbb{P}_{\theta_0}[|W| > |w|] \approx \mathbb{P}[|Z| > |w|] = 2\Phi(-|w|)$

Likelihood ratio test (LRT)

$T(X) = \frac{\sup_{\theta \in \Theta}\mathcal{L}_n(\theta)}{\sup_{\theta \in \Theta_0}\mathcal{L}_n(\theta)} = \frac{\mathcal{L}_n(\hat\theta_n)}{\mathcal{L}_n(\hat\theta_{n,0})}$


$\lambda(X) = 2\log T(X) \xrightarrow{D} \chi^2_{r-q}$ where $\sum_{i=1}^{k} Z_i^2 \sim \chi^2_k$ and $Z_1, \dots, Z_k \stackrel{iid}{\sim} \mathcal{N}(0, 1)$

p-value $= \mathbb{P}_{\theta_0}[\lambda(X) > \lambda(x)] \approx \mathbb{P}\left[\chi^2_{r-q} > \lambda(x)\right]$

Multinomial LRT

mle: $\hat p_n = \left(\frac{X_1}{n}, \dots, \frac{X_k}{n}\right)$

$T(X) = \frac{\mathcal{L}_n(\hat p_n)}{\mathcal{L}_n(p_0)} = \prod_{j=1}^{k}\left(\frac{\hat p_j}{p_{0j}}\right)^{X_j}$

$\lambda(X) = 2\sum_{j=1}^{k} X_j\log\left(\frac{\hat p_j}{p_{0j}}\right) \xrightarrow{D} \chi^2_{k-1}$

The approximate size $\alpha$ LRT rejects $H_0$ when $\lambda(X) \ge \chi^2_{k-1,\alpha}$.

Pearson Chi-square Test

$T = \sum_{j=1}^{k}\frac{(X_j - \mathbb{E}[X_j])^2}{\mathbb{E}[X_j]}$ where $\mathbb{E}[X_j] = np_{0j}$ under $H_0$

$T \xrightarrow{D} \chi^2_{k-1}$, p-value $= \mathbb{P}\left[\chi^2_{k-1} > T(x)\right]$

Faster convergence $\xrightarrow{D} \chi^2_{k-1}$ than the LRT, hence preferable for small $n$.

Independence testing

- $I$ rows, $J$ columns, $X$ multinomial sample of size $n = I \cdot J$
- mles unconstrained: $\hat p_{ij} = \frac{X_{ij}}{n}$
- mles under $H_0$: $\hat p_{0ij} = \hat p_{i\cdot}\hat p_{\cdot j} = \frac{X_{i\cdot}}{n}\frac{X_{\cdot j}}{n}$
- LRT: $\lambda = 2\sum_{i=1}^{I}\sum_{j=1}^{J} X_{ij}\log\left(\frac{nX_{ij}}{X_{i\cdot}X_{\cdot j}}\right)$
- Pearson: $T = \sum_{i=1}^{I}\sum_{j=1}^{J}\frac{(X_{ij} - \mathbb{E}[X_{ij}])^2}{\mathbb{E}[X_{ij}]}$
- LRT and Pearson $\xrightarrow{D} \chi^2_\nu$, where $\nu = (I - 1)(J - 1)$

    14 Bayesian Inference

Bayes' Theorem

$f(\theta \mid x^n) = \frac{f(x^n \mid \theta)\,f(\theta)}{f(x^n)} = \frac{f(x^n \mid \theta)\,f(\theta)}{\int f(x^n \mid \theta)\,f(\theta)\,d\theta} \propto \mathcal{L}_n(\theta)\,f(\theta)$

Definitions

- $X^n = (X_1, \dots, X_n)$ and $x^n = (x_1, \dots, x_n)$
- Prior density $f(\theta)$
- Likelihood $f(x^n \mid \theta)$: joint density of the data. In particular, $X^n$ iid $\implies f(x^n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta) = \mathcal{L}_n(\theta)$
- Posterior density $f(\theta \mid x^n)$
- Normalizing constant $c_n = f(x^n) = \int f(x^n \mid \theta)\,f(\theta)\,d\theta$
- Kernel: part of a density that depends on $\theta$
- Posterior mean $\bar\theta_n = \int \theta\,f(\theta \mid x^n)\,d\theta = \frac{\int \theta\,\mathcal{L}_n(\theta)\,f(\theta)\,d\theta}{\int \mathcal{L}_n(\theta)\,f(\theta)\,d\theta}$

14.1 Credible Intervals

Posterior interval

$\mathbb{P}[\theta \in (a, b) \mid x^n] = \int_a^b f(\theta \mid x^n)\,d\theta = 1 - \alpha$

Equal-tail credible interval

$\int_{-\infty}^{a} f(\theta \mid x^n)\,d\theta = \int_{b}^{\infty} f(\theta \mid x^n)\,d\theta = \alpha/2$

Highest posterior density (HPD) region $R_n$:

1. $\mathbb{P}[\theta \in R_n] = 1 - \alpha$
2. $R_n = \{\theta : f(\theta \mid x^n) > k\}$ for some $k$

$R_n$ is unimodal $\implies R_n$ is an interval.

14.2 Function of Parameters

Let $\tau = \varphi(\theta)$ and $A = \{\theta : \varphi(\theta) \le \tau\}$.

Posterior CDF for $\tau$: $H(\tau \mid x^n) = \mathbb{P}[\varphi(\theta) \le \tau \mid x^n] = \int_A f(\theta \mid x^n)\,d\theta$

Posterior density: $h(\tau \mid x^n) = H'(\tau \mid x^n)$

Bayesian delta method: $\tau \mid X^n \approx \mathcal{N}\left(\varphi(\hat\theta), \widehat{\text{se}}\,\left|\varphi'(\hat\theta)\right|\right)$


    14.3 Priors

Choice

- Subjective bayesianism
- Objective bayesianism
- Robust bayesianism

Types

- Flat: $f(\theta) \propto$ constant
- Proper: $\int f(\theta)\,d\theta = 1$
- Improper: $\int f(\theta)\,d\theta = \infty$
- Jeffreys' prior (transformation-invariant): $f(\theta) \propto \sqrt{I(\theta)}$; multiparameter: $f(\theta) \propto \sqrt{\det(I(\theta))}$
- Conjugate: $f(\theta)$ and $f(\theta \mid x^n)$ belong to the same parametric family

14.3.1 Conjugate Priors

Discrete likelihood (prior, then posterior hyperparameters):

- $\text{Bern}(p)$: Beta$(\alpha, \beta)$; posterior $\left(\alpha + \sum_{i=1}^{n} x_i,\ \beta + n - \sum_{i=1}^{n} x_i\right)$
- $\text{Bin}(p)$: Beta$(\alpha, \beta)$; posterior $\left(\alpha + \sum_{i=1}^{n} x_i,\ \beta + \sum_{i=1}^{n} N_i - \sum_{i=1}^{n} x_i\right)$
- $\text{NBin}(p)$: Beta$(\alpha, \beta)$; posterior $\left(\alpha + rn,\ \beta + \sum_{i=1}^{n} x_i\right)$
- $\text{Po}(\lambda)$: Gamma$(\alpha, \beta)$; posterior $\left(\alpha + \sum_{i=1}^{n} x_i,\ \beta + n\right)$
- $\text{Multinomial}(p)$: Dir$(\alpha)$; posterior $\alpha + \sum_{i=1}^{n} x^{(i)}$
- $\text{Geo}(p)$: Beta$(\alpha, \beta)$; posterior $\left(\alpha + n,\ \beta + \sum_{i=1}^{n} x_i\right)$

Continuous likelihood (subscript $c$ denotes a known constant):

- $\text{Unif}(0, \theta)$: Pareto$(x_m, k)$; posterior $\left(\max\{x_{(n)}, x_m\},\ k + n\right)$
- $\text{Exp}(\lambda)$: Gamma$(\alpha, \beta)$; posterior $\left(\alpha + n,\ \beta + \sum_{i=1}^{n} x_i\right)$
- $\mathcal{N}(\mu, \sigma_c^2)$: $\mathcal{N}(\mu_0, \sigma_0^2)$; posterior $\left(\left(\frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^{n} x_i}{\sigma_c^2}\right)\Big/\left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma_c^2}\right),\ \left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma_c^2}\right)^{-1}\right)$
- $\mathcal{N}(\mu_c, \sigma^2)$: Scaled Inverse Chi-square$(\nu, \sigma_0^2)$; posterior $\left(\nu + n,\ \frac{\nu\sigma_0^2 + \sum_{i=1}^{n}(x_i - \mu_c)^2}{\nu + n}\right)$
- $\mathcal{N}(\mu, \sigma^2)$: Normal-scaled Inverse Gamma$(\mu_0, \nu, \alpha, \beta)$; posterior $\left(\frac{\nu\mu_0 + n\bar x}{\nu + n},\ \nu + n,\ \alpha + \frac{n}{2},\ \beta + \frac{1}{2}\sum_{i=1}^{n}(x_i - \bar x)^2 + \frac{n\nu(\bar x - \mu_0)^2}{2(\nu + n)}\right)$
- $\text{MVN}(\mu, \Sigma_c)$: $\text{MVN}(\mu_0, \Sigma_0)$; posterior $\left(\left(\Sigma_0^{-1} + n\Sigma_c^{-1}\right)^{-1}\left(\Sigma_0^{-1}\mu_0 + n\Sigma_c^{-1}\bar x\right),\ \left(\Sigma_0^{-1} + n\Sigma_c^{-1}\right)^{-1}\right)$
- $\text{MVN}(\mu_c, \Sigma)$: Inverse-Wishart$(\nu, \Psi)$; posterior $\left(n + \nu,\ \Psi + \sum_{i=1}^{n}(x_i - \mu_c)(x_i - \mu_c)^T\right)$
- $\text{Pareto}(x_{mc}, k)$: Gamma$(\alpha, \beta)$; posterior $\left(\alpha + n,\ \beta + \sum_{i=1}^{n}\log\frac{x_i}{x_{mc}}\right)$
- $\text{Pareto}(x_m, k_c)$: Pareto$(x_0, k_0)$; posterior $\left(x_0,\ k_0 - kn\right)$ where $k_0 > kn$
- $\text{Gamma}(\alpha_c, \beta)$: Gamma$(\alpha_0, \beta_0)$; posterior $\left(\alpha_0 + n\alpha_c,\ \beta_0 + \sum_{i=1}^{n} x_i\right)$
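For instance, the Bern$(p)$-Beta row amounts to a two-line update; a Python sketch (the function name is ours):

```python
def beta_bernoulli_update(alpha, beta, xs):
    # posterior hyperparameters: alpha + sum x_i, beta + n - sum x_i
    s = sum(xs)
    return alpha + s, beta + len(xs) - s

# flat Beta(1, 1) prior updated with 4 successes out of 6 trials
a, b = beta_bernoulli_update(1, 1, [1, 0, 1, 1, 0, 1])
post_mean = a / (a + b)    # posterior mean alpha' / (alpha' + beta')
```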

    14.4 Bayesian Testing

If $H_0: \theta \in \Theta_0$:

Prior probability $\mathbb{P}[H_0] = \int_{\Theta_0} f(\theta)\,d\theta$

Posterior probability $\mathbb{P}[H_0 \mid x^n] = \int_{\Theta_0} f(\theta \mid x^n)\,d\theta$

Let $H_0, \dots, H_{K-1}$ be $K$ hypotheses with prior densities $f(\theta \mid H_k)$. Then

$\mathbb{P}[H_k \mid x^n] = \frac{f(x^n \mid H_k)\,\mathbb{P}[H_k]}{\sum_{k=1}^{K} f(x^n \mid H_k)\,\mathbb{P}[H_k]}$


Marginal likelihood

$f(x^n \mid H_i) = \int_\Theta f(x^n \mid \theta, H_i)\,f(\theta \mid H_i)\,d\theta$

Posterior odds (of $H_i$ relative to $H_j$)

$\frac{\mathbb{P}[H_i \mid x^n]}{\mathbb{P}[H_j \mid x^n]} = \underbrace{\frac{f(x^n \mid H_i)}{f(x^n \mid H_j)}}_{\text{Bayes factor } BF_{ij}} \times \underbrace{\frac{\mathbb{P}[H_i]}{\mathbb{P}[H_j]}}_{\text{prior odds}}$

Bayes factor evidence scale:

- $\log_{10} BF_{10} \in (0, 0.5)$, $BF_{10} \in (1, 3.2)$: weak
- $\log_{10} BF_{10} \in (0.5, 1)$, $BF_{10} \in (3.2, 10)$: moderate
- $\log_{10} BF_{10} \in (1, 2)$, $BF_{10} \in (10, 100)$: strong
- $\log_{10} BF_{10} > 2$, $BF_{10} > 100$: decisive

$p^* = \frac{\frac{p}{1 - p}BF_{10}}{1 + \frac{p}{1 - p}BF_{10}}$ where $p = \mathbb{P}[H_1]$ and $p^* = \mathbb{P}[H_1 \mid x^n]$

15 Exponential Family

Scalar parameter

$f_X(x \mid \theta) = h(x)\exp\{\eta(\theta)T(x) - A(\theta)\} = h(x)g(\theta)\exp\{\eta(\theta)T(x)\}$

Vector parameter

$f_X(x \mid \theta) = h(x)\exp\left\{\sum_{i=1}^{s}\eta_i(\theta)T_i(x) - A(\theta)\right\} = h(x)\exp\{\eta(\theta)\cdot T(x) - A(\theta)\} = h(x)g(\theta)\exp\{\eta(\theta)\cdot T(x)\}$

Natural form

$f_X(x \mid \eta) = h(x)\exp\{\eta\cdot T(x) - A(\eta)\} = h(x)g(\eta)\exp\{\eta\cdot T(x)\} = h(x)g(\eta)\exp\left\{\eta^T T(x)\right\}$

16 Sampling Methods

16.1 The Bootstrap

Let $T_n = g(X_1, \dots, X_n)$ be a statistic.

1. Estimate $\mathbb{V}_F[T_n]$ with $\mathbb{V}_{\hat F_n}[T_n]$.
2. Approximate $\mathbb{V}_{\hat F_n}[T_n]$ using simulation:
   (a) Repeat the following $B$ times to get $T^*_{n,1}, \dots, T^*_{n,B}$, an iid sample from the sampling distribution implied by $\hat F_n$:
       i. Sample uniformly $X^*_1, \dots, X^*_n \sim \hat F_n$.
       ii. Compute $T^*_n = g(X^*_1, \dots, X^*_n)$.
   (b) Then
       $v_{boot} = \hat{\mathbb{V}}_{\hat F_n} = \frac{1}{B}\sum_{b=1}^{B}\left(T^*_{n,b} - \frac{1}{B}\sum_{r=1}^{B}T^*_{n,r}\right)^2$
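The two steps above can be sketched in a few lines of Python (names, $B$, and the toy data are ours):

```python
import random

def bootstrap_var(data, stat, B=2000, seed=0):
    rng = random.Random(seed)
    n = len(data)
    # T*_{n,b}: statistic on a resample drawn uniformly from Fhat_n
    ts = [stat([rng.choice(data) for _ in range(n)]) for _ in range(B)]
    tbar = sum(ts) / B
    # v_boot = (1/B) sum (T*_{n,b} - mean of T*)^2
    return sum((t - tbar) ** 2 for t in ts) / B

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
v = bootstrap_var(data, lambda xs: sum(xs) / len(xs))
# for the sample mean, v should be close to the plug-in value sigma_hat^2 / n
```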

    16.1.1 Bootstrap Confidence Intervals

Normal-based interval

$T_n \pm z_{\alpha/2}\,\widehat{\text{se}}_{boot}$

Pivotal interval

1. Location parameter $\theta = T(F)$
2. Pivot $R_n = \hat\theta_n - \theta$
3. Let $H(r) = \mathbb{P}[R_n \le r]$ be the cdf of $R_n$
4. Let $R^*_{n,b} = \hat\theta^*_{n,b} - \hat\theta_n$. Approximate $H$ using the bootstrap:
   $\hat H(r) = \frac{1}{B}\sum_{b=1}^{B} I\left(R^*_{n,b} \le r\right)$
5. $\theta^*_\beta$ = $\beta$ sample quantile of $(\hat\theta^*_{n,1}, \dots, \hat\theta^*_{n,B})$
6. $r^*_\beta$ = $\beta$ sample quantile of $(R^*_{n,1}, \dots, R^*_{n,B})$, i.e., $r^*_\beta = \theta^*_\beta - \hat\theta_n$
7. Approximate $1 - \alpha$ confidence interval $C_n = (\hat a, \hat b)$ where
   $\hat a = \hat\theta_n - \hat H^{-1}\left(1 - \frac{\alpha}{2}\right) = \hat\theta_n - r^*_{1-\alpha/2} = 2\hat\theta_n - \theta^*_{1-\alpha/2}$
   $\hat b = \hat\theta_n - \hat H^{-1}\left(\frac{\alpha}{2}\right) = \hat\theta_n - r^*_{\alpha/2} = 2\hat\theta_n - \theta^*_{\alpha/2}$

Percentile interval

$C_n = \left(\theta^*_{\alpha/2},\ \theta^*_{1-\alpha/2}\right)$


    16.2 Rejection Sampling

Setup

- We can easily sample from $g(\theta)$
- We want to sample from $h(\theta)$, but it is difficult
- We know $h(\theta)$ up to a proportional constant: $h(\theta) = \frac{k(\theta)}{\int k(\theta)\,d\theta}$
- Envelope condition: we can find $M > 0$ such that $k(\theta) \le Mg(\theta)$ for all $\theta$

Algorithm

1. Draw $\theta^{cand} \sim g(\theta)$
2. Generate $u \sim \text{Unif}(0, 1)$
3. Accept $\theta^{cand}$ if $u \le \frac{k(\theta^{cand})}{Mg(\theta^{cand})}$
4. Repeat until $B$ values of $\theta^{cand}$ have been accepted
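A Python sketch of the algorithm for the unnormalized target $k(\theta) = 2\theta$ on $(0, 1)$ with a uniform envelope and $M = 2$ (names and the example target are ours):

```python
import random

def rejection_sample(k, g_sample, g_pdf, M, B, seed=0):
    rng = random.Random(seed)
    out = []
    while len(out) < B:
        cand = g_sample(rng)          # 1. draw theta_cand ~ g
        u = rng.random()              # 2. u ~ Unif(0, 1)
        if u <= k(cand) / (M * g_pdf(cand)):
            out.append(cand)          # 3. accept
    return out

# target h(x) = 2x on (0, 1); envelope g = Unif(0, 1), M = 2
xs = rejection_sample(lambda x: 2 * x, lambda r: r.random(),
                      lambda x: 1.0, 2.0, 4000)
m = sum(xs) / len(xs)   # E[X] = 2/3 for the density f(x) = 2x
```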

Example

- We can easily sample from the prior $g(\theta) = f(\theta)$
- Target is the posterior $h(\theta) \propto k(\theta) = f(x^n \mid \theta)\,f(\theta)$
- Envelope condition: $f(x^n \mid \theta) \le f(x^n \mid \hat\theta_n) = \mathcal{L}_n(\hat\theta_n) \equiv M$
- Algorithm:
  1. Draw $\theta^{cand} \sim f(\theta)$
  2. Generate $u \sim \text{Unif}(0, 1)$
  3. Accept $\theta^{cand}$ if $u \le \frac{\mathcal{L}_n(\theta^{cand})}{\mathcal{L}_n(\hat\theta_n)}$

16.3 Importance Sampling

Sample from an importance function $g$ rather than the target density $h$. Algorithm to obtain an approximation to $\mathbb{E}[q(\theta) \mid x^n]$:

1. Sample from the prior $\theta_1, \dots, \theta_B \stackrel{iid}{\sim} f(\theta)$
2. $w_i = \frac{\mathcal{L}_n(\theta_i)}{\sum_{i=1}^{B}\mathcal{L}_n(\theta_i)}$ for $i = 1, \dots, B$
3. $\mathbb{E}[q(\theta) \mid x^n] \approx \sum_{i=1}^{B} q(\theta_i)\,w_i$

17 Decision Theory

Definitions

- Unknown quantity affecting our decision: $\theta \in \Theta$
- Decision rule: synonymous with an estimator $\hat\theta$
- Action $a \in A$: possible value of the decision rule. In the estimation context, the action is just an estimate of $\theta$, $\hat\theta(x)$.
- Loss function $L$: consequences of taking action $a$ when the true state is $\theta$, or the discrepancy between $\theta$ and $\hat\theta$; $L: \Theta \times A \to [-k, \infty)$.

Loss functions

- Squared error loss: $L(\theta, a) = (\theta - a)^2$
- Linear loss: $L(\theta, a) = \begin{cases} K_1(\theta - a) & a - \theta < 0 \\ K_2(a - \theta) & a - \theta \ge 0 \end{cases}$
- Absolute error loss: $L(\theta, a) = |\theta - a|$ (linear loss with $K_1 = K_2 = 1$)
- Zero-one loss: $L(\theta, a) = \begin{cases} 0 & a = \theta \\ 1 & a \ne \theta \end{cases}$


    17.3 Bayes Rule

Bayes rule (or Bayes estimator)

- A Bayes rule minimizes the Bayes risk: $r(f, \hat\theta) = \inf_{\tilde\theta} r(f, \tilde\theta)$
- It minimizes the posterior risk for every $x$: $\hat\theta(x) = \inf_a r(a \mid x)\ \forall x \implies r(f, \hat\theta) = \int r(\hat\theta \mid x)\,f(x)\,dx$

Theorems

- Squared error loss: posterior mean
- Absolute error loss: posterior median
- Zero-one loss: posterior mode

17.4 Minimax Rules

Maximum risk

$\bar R(\hat\theta) = \sup_\theta R(\theta, \hat\theta) \qquad \bar R(a) = \sup_\theta R(\theta, a)$

Minimax rule

$\sup_\theta R(\theta, \hat\theta) = \inf_{\tilde\theta}\bar R(\tilde\theta) = \inf_{\tilde\theta}\sup_\theta R(\theta, \tilde\theta)$

$\hat\theta$ = Bayes rule $\wedge\ \exists c: R(\theta, \hat\theta) = c \implies \hat\theta$ is minimax

Least favorable prior

$\hat\theta^f$ = Bayes rule $\wedge\ R(\theta, \hat\theta^f) \le r(f, \hat\theta^f)\ \forall\theta \implies \hat\theta^f$ is minimax

18 Linear Regression

Definitions

- Response variable $Y$
- Covariate $X$ (aka predictor variable or feature)

    18.1 Simple Linear Regression

Model

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \qquad \mathbb{E}[\varepsilon_i \mid X_i] = 0,\ \mathbb{V}[\varepsilon_i \mid X_i] = \sigma^2$

Fitted line: $\hat r(x) = \hat\beta_0 + \hat\beta_1 x$

Predicted (fitted) values: $\hat Y_i = \hat r(X_i)$

Residuals: $\hat\varepsilon_i = Y_i - \hat Y_i = Y_i - \left(\hat\beta_0 + \hat\beta_1 X_i\right)$

Residual sums of squares (rss): $\text{rss}(\hat\beta_0, \hat\beta_1) = \sum_{i=1}^{n}\hat\varepsilon_i^2$

Least square estimates: $\hat\beta^T = (\hat\beta_0, \hat\beta_1)^T$ minimizing $\text{rss}$:

$\hat\beta_0 = \overline{Y}_n - \hat\beta_1\overline{X}_n$

$\hat\beta_1 = \frac{\sum_{i=1}^{n}(X_i - \overline{X}_n)(Y_i - \overline{Y}_n)}{\sum_{i=1}^{n}(X_i - \overline{X}_n)^2} = \frac{\sum_{i=1}^{n} X_iY_i - n\overline{X}\,\overline{Y}}{\sum_{i=1}^{n} X_i^2 - n\overline{X}^2}$

$\mathbb{E}[\hat\beta \mid X^n] = \begin{pmatrix}\beta_0\\\beta_1\end{pmatrix} \qquad \mathbb{V}[\hat\beta \mid X^n] = \frac{\sigma^2}{n s_X^2}\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}X_i^2 & -\overline{X}_n \\ -\overline{X}_n & 1\end{pmatrix}$

$\widehat{\text{se}}(\hat\beta_0) = \frac{\hat\sigma}{s_X\sqrt{n}}\sqrt{\frac{\sum_{i=1}^{n}X_i^2}{n}} \qquad \widehat{\text{se}}(\hat\beta_1) = \frac{\hat\sigma}{s_X\sqrt{n}}$

where $s_X^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \overline{X}_n)^2$ and $\hat\sigma^2 = \frac{1}{n-2}\sum_{i=1}^{n}\hat\varepsilon_i^2$ (unbiased estimate).

Further properties:

- Consistency: $\hat\beta_0 \xrightarrow{P} \beta_0$ and $\hat\beta_1 \xrightarrow{P} \beta_1$
- Asymptotic normality: $\frac{\hat\beta_0 - \beta_0}{\widehat{\text{se}}(\hat\beta_0)} \xrightarrow{D} \mathcal{N}(0, 1)$ and $\frac{\hat\beta_1 - \beta_1}{\widehat{\text{se}}(\hat\beta_1)} \xrightarrow{D} \mathcal{N}(0, 1)$
- Approximate $1 - \alpha$ confidence intervals for $\beta_0$ and $\beta_1$: $\hat\beta_0 \pm z_{\alpha/2}\,\widehat{\text{se}}(\hat\beta_0)$ and $\hat\beta_1 \pm z_{\alpha/2}\,\widehat{\text{se}}(\hat\beta_1)$
- Wald test for $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \ne 0$: reject $H_0$ if $|W| > z_{\alpha/2}$ where $W = \hat\beta_1/\widehat{\text{se}}(\hat\beta_1)$

$R^2$

$R^2 = \frac{\sum_{i=1}^{n}(\hat Y_i - \overline{Y})^2}{\sum_{i=1}^{n}(Y_i - \overline{Y})^2} = 1 - \frac{\sum_{i=1}^{n}\hat\varepsilon_i^2}{\sum_{i=1}^{n}(Y_i - \overline{Y})^2} = 1 - \frac{\text{rss}}{\text{tss}}$
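The closed-form least squares estimates and $R^2$ can be computed directly; a Python sketch on toy data (data values are ours):

```python
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
n = len(data)
xbar = sum(x for x, _ in data) / n
ybar = sum(y for _, y in data) / n

# beta1_hat = sum (X_i - Xbar)(Y_i - Ybar) / sum (X_i - Xbar)^2
b1 = sum((x - xbar) * (y - ybar) for x, y in data) / \
     sum((x - xbar) ** 2 for x, _ in data)
b0 = ybar - b1 * xbar          # beta0_hat = Ybar - beta1_hat * Xbar

resid = [y - (b0 + b1 * x) for x, y in data]
rss = sum(e * e for e in resid)
tss = sum((y - ybar) ** 2 for _, y in data)
r2 = 1 - rss / tss             # R^2 = 1 - rss / tss
```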


Likelihood

$\mathcal{L} = \prod_{i=1}^{n} f(X_i, Y_i) = \prod_{i=1}^{n} f_X(X_i) \cdot \prod_{i=1}^{n} f_{Y|X}(Y_i \mid X_i) = \mathcal{L}_1 \cdot \mathcal{L}_2$

$\mathcal{L}_1 = \prod_{i=1}^{n} f_X(X_i)$

$\mathcal{L}_2 = \prod_{i=1}^{n} f_{Y|X}(Y_i \mid X_i) \propto \sigma^{-n}\exp\left\{-\frac{1}{2\sigma^2}\sum_i\left(Y_i - (\beta_0 + \beta_1 X_i)\right)^2\right\}$

Under the assumption of normality, the least squares estimator is also the mle, with

$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\hat\varepsilon_i^2$

18.2 Prediction

Observe $X = x_*$ of the covariate and predict the outcome $Y_*$.

$\hat Y_* = \hat\beta_0 + \hat\beta_1 x_*$

$\mathbb{V}[\hat Y_*] = \mathbb{V}[\hat\beta_0] + x_*^2\,\mathbb{V}[\hat\beta_1] + 2x_*\,\text{Cov}[\hat\beta_0, \hat\beta_1]$

Prediction interval

$\hat\xi_n^2 = \hat\sigma^2\left(\frac{\sum_{i=1}^{n}(X_i - x_*)^2}{n\sum_i(X_i - \overline{X})^2} + 1\right)$

$\hat Y_* \pm z_{\alpha/2}\,\hat\xi_n$

    18.3 Multiple Regression

$Y = X\beta + \varepsilon$

where

$X = \begin{pmatrix} X_{11} & \cdots & X_{1k} \\ \vdots & \ddots & \vdots \\ X_{n1} & \cdots & X_{nk} \end{pmatrix} \qquad \beta = \begin{pmatrix}\beta_1\\\vdots\\\beta_k\end{pmatrix} \qquad \varepsilon = \begin{pmatrix}\varepsilon_1\\\vdots\\\varepsilon_n\end{pmatrix}$

Likelihood

$\mathcal{L}(\beta, \sigma^2) = \left(2\pi\sigma^2\right)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}\text{rss}\right\}$

$\text{rss} = (y - X\beta)^T(y - X\beta) = \|Y - X\beta\|^2 = \sum_{i=1}^{N}\left(Y_i - x_i^T\beta\right)^2$

If the $(k \times k)$ matrix $X^TX$ is invertible:

$\hat\beta = (X^TX)^{-1}X^TY$
$\mathbb{V}[\hat\beta \mid X^n] = \sigma^2(X^TX)^{-1}$
$\hat\beta \approx \mathcal{N}\left(\beta, \sigma^2(X^TX)^{-1}\right)$

Estimate regression function: $\hat r(x) = \sum_{j=1}^{k}\hat\beta_j x_j$

Unbiased estimate for $\sigma^2$:

$\hat\sigma^2 = \frac{1}{n - k}\sum_{i=1}^{n}\hat\varepsilon_i^2 \qquad \hat\varepsilon = X\hat\beta - Y$

mle: $\hat\sigma^2_{mle} = \frac{n - k}{n}\hat\sigma^2$

$1 - \alpha$ confidence interval: $\hat\beta_j \pm z_{\alpha/2}\,\widehat{\text{se}}(\hat\beta_j)$

    18.4 Model Selection

Consider predicting a new observation $Y^*$ for covariates $X^*$ and let $S \subseteq J$ denote a subset of the covariates in the model, where $|S| = k$ and $|J| = n$.

Issues

- Underfitting: too few covariates yields high bias
- Overfitting: too many covariates yields high variance

Procedure

1. Assign a score to each model
2. Search through all models to find the one with the highest score

Hypothesis testing: $H_0: \beta_j = 0$ vs. $H_1: \beta_j \ne 0\ \forall j \in J$

Mean squared prediction error (mspe): $\text{mspe} = \mathbb{E}\left[(\hat Y(S) - Y^*)^2\right]$

Prediction risk: $R(S) = \sum_{i=1}^{n}\text{mspe}_i = \sum_{i=1}^{n}\mathbb{E}\left[(\hat Y_i(S) - Y_i^*)^2\right]$

Training error: $\hat R_{tr}(S) = \sum_{i=1}^{n}\left(\hat Y_i(S) - Y_i\right)^2$


R^2

R^2(S) = 1 - \frac{\mathrm{rss}(S)}{\mathrm{tss}} = 1 - \frac{\hat R_{tr}(S)}{\mathrm{tss}}
       = 1 - \frac{\sum_{i=1}^n (Y_i - \hat Y_i(S))^2}{\sum_{i=1}^n (Y_i - \bar Y)^2}

The training error is a downward-biased estimate of the prediction risk:

E[\hat R_{tr}(S)] < R(S)
\qquad
\mathrm{bias}(\hat R_{tr}(S)) = E[\hat R_{tr}(S)] - R(S) = -2\sum_{i=1}^n Cov[\hat Y_i, Y_i]

Adjusted R^2:

\bar R^2(S) = 1 - \frac{n-1}{n-k}\,\frac{\mathrm{rss}}{\mathrm{tss}}

Mallows' C_p statistic:

\hat R(S) = \hat R_{tr}(S) + 2k\hat\sigma^2 = \text{lack of fit} + \text{complexity penalty}

Akaike information criterion (AIC):

AIC(S) = \ell_n(\hat\beta_S, \hat\sigma_S^2) - k

Bayesian information criterion (BIC):

BIC(S) = \ell_n(\hat\beta_S, \hat\sigma_S^2) - \frac{k}{2}\log n

Validation and training:

\hat R_V(S) = \sum_{i=1}^m (\hat Y_i^*(S) - Y_i^*)^2
\qquad m = |\{\text{validation data}\}|, \text{ often } \frac{n}{4} \text{ or } \frac{n}{2}

Leave-one-out cross-validation:

\hat R_{CV}(S) = \sum_{i=1}^n (Y_i - \hat Y_{(i)})^2
             = \sum_{i=1}^n \left(\frac{Y_i - \hat Y_i(S)}{1 - U_{ii}(S)}\right)^2

U(S) = X_S(X_S^TX_S)^{-1}X_S^T \quad \text{(hat matrix)}
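The leverage shortcut for leave-one-out cross-validation is exact: each held-out residual equals the full-fit residual divided by 1 - U_ii. A minimal sketch for simple linear regression (where U_ii has the closed form 1/n + (x_i - \bar x)^2 / S_{xx}), compared against brute-force refitting; the data here are illustrative:

```python
import random

def fit(xs, ys):
    # least squares for y = b0 + b1 * x
    n = len(xs)
    xb = sum(xs) / n
    yb = sum(ys) / n
    b1 = sum((x - xb) * (y - yb) for x, y in zip(xs, ys)) / \
         sum((x - xb) ** 2 for x in xs)
    return yb - b1 * xb, b1

random.seed(3)
n = 30
X = [random.uniform(0, 5) for _ in range(n)]
Y = [1 + 0.8 * x + random.gauss(0, 1) for x in X]

b0, b1 = fit(X, Y)
xbar = sum(X) / n
Sxx = sum((x - xbar) ** 2 for x in X)

# Shortcut: divide each residual by (1 - U_ii)
cv_short = sum(((y - (b0 + b1 * x)) / (1 - (1 / n + (x - xbar) ** 2 / Sxx))) ** 2
               for x, y in zip(X, Y))

# Brute force: refit n times, each time leaving one observation out
cv_brute = 0.0
for i in range(n):
    a0, a1 = fit(X[:i] + X[i + 1:], Y[:i] + Y[i + 1:])
    cv_brute += (Y[i] - (a0 + a1 * X[i])) ** 2
```

The two computations agree to floating-point precision, turning n refits into one.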

19 Non-parametric Function Estimation

19.1 Density Estimation

Estimate f(x), where P[X \in A] = \int_A f(x)\,dx.

Integrated square error (ise):

L(f, \hat f_n) = \int \left(f(x) - \hat f_n(x)\right)^2 dx = J(h) + \int f^2(x)\,dx

Frequentist risk:

R(f, \hat f_n) = E[L(f, \hat f_n)] = \int b^2(x)\,dx + \int v(x)\,dx

b(x) = E[\hat f_n(x)] - f(x)
\qquad
v(x) = V[\hat f_n(x)]

19.1.1 Histograms

Definitions:

- Number of bins m
- Binwidth h = 1/m
- Bin B_j has \nu_j observations
- Define \hat p_j = \nu_j/n and p_j = \int_{B_j} f(u)\,du

Histogram estimator:

\hat f_n(x) = \sum_{j=1}^m \frac{\hat p_j}{h}\, I(x \in B_j)

E[\hat f_n(x)] = \frac{p_j}{h}
\qquad
V[\hat f_n(x)] = \frac{p_j(1-p_j)}{nh^2}

R(\hat f_n, f) \approx \frac{h^2}{12}\int (f'(u))^2\,du + \frac{1}{nh}

h^* = \frac{1}{n^{1/3}}\left(\frac{6}{\int (f'(u))^2\,du}\right)^{1/3}

R^*(\hat f_n, f) \approx \frac{C}{n^{2/3}}
\qquad
C = \left(\frac{3}{4}\right)^{2/3}\left(\int (f'(u))^2\,du\right)^{1/3}

Cross-validation estimate of E[J(h)]:

\hat J_{CV}(h) = \int \hat f_n^2(x)\,dx - \frac{2}{n}\sum_{i=1}^n \hat f_{(-i)}(X_i)
             = \frac{2}{(n-1)h} - \frac{n+1}{(n-1)h}\sum_{j=1}^m \hat p_j^2
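The histogram estimator above is a few lines of code. A minimal sketch on [0, 1] with simulated Beta data (sample size, bin count, and the Beta(2, 5) source are illustrative):

```python
import random

random.seed(4)
n, m = 1000, 20
h = 1.0 / m                         # binwidth on [0, 1]
data = [random.betavariate(2, 5) for _ in range(n)]

counts = [0] * m
for x in data:
    j = min(int(x / h), m - 1)      # bin index; clamp x == 1 into the last bin
    counts[j] += 1
p_hat = [c / n for c in counts]     # p_hat_j = nu_j / n

def f_hat(x):
    j = min(int(x / h), m - 1)
    return p_hat[j] / h             # histogram estimator p_hat_j / h

# sanity: the estimator integrates to one, since sum_j (p_hat_j / h) * h = 1
total = sum(p_hat)
```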


19.1.2 Kernel Density Estimator (KDE)

Kernel K:

- K(x) \ge 0
- \int K(x)\,dx = 1
- \int xK(x)\,dx = 0
- \int x^2 K(x)\,dx \equiv \sigma_K^2 > 0

KDE:

\hat f_n(x) = \frac{1}{n}\sum_{i=1}^n \frac{1}{h} K\left(\frac{x - X_i}{h}\right)

R(f, \hat f_n) \approx \frac{1}{4}(h\sigma_K)^4 \int (f''(x))^2\,dx + \frac{1}{nh}\int K^2(x)\,dx

h^* = \frac{c_1^{-2/5} c_2^{1/5} c_3^{-1/5}}{n^{1/5}}
\qquad
c_1 = \sigma_K^2, \quad c_2 = \int K^2(x)\,dx, \quad c_3 = \int (f''(x))^2\,dx

R^*(f, \hat f_n) = \frac{c_4}{n^{4/5}}
\qquad
c_4 = \frac{5}{4}\left(\sigma_K^2\right)^{2/5}\left(\int K^2(x)\,dx\right)^{4/5}\left(\int (f'')^2\,dx\right)^{1/5}
    = \frac{5}{4}\, C(K) \left(\int (f'')^2\,dx\right)^{1/5}

Epanechnikov kernel:

K(x) = \begin{cases}
  \frac{3}{4\sqrt 5}\left(1 - x^2/5\right) & |x| < \sqrt 5 \\
  0 & \text{otherwise}
\end{cases}
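A minimal KDE sketch with a Gaussian kernel. The bandwidth rule below (1.06 \hat\sigma n^{-1/5}, the normal-reference rule) is a standard plug-in choice consistent with the h^* \propto n^{-1/5} rate above, but the constant 1.06 is an assumption of this sketch, not a formula from the cookbook:

```python
import math
import random

random.seed(5)
n = 500
data = [random.gauss(0, 1) for _ in range(n)]

mean = sum(data) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
h = 1.06 * sd * n ** (-1 / 5)       # normal-reference bandwidth (assumed rule)

def K(u):
    # Gaussian kernel: nonnegative, integrates to 1, zero mean, sigma_K^2 = 1
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def f_hat(x):
    return sum(K((x - xi) / h) for xi in data) / (n * h)
```

With standard normal data, the estimate should be much larger near 0 than in the tails.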

20.1 Markov Chains

Transition probabilities p_{ij} = P[X_{n+1} = j \mid X_n = i], with \sum_j p_{ij} = 1.

Chapman-Kolmogorov:

p_{ij}(m+n) = \sum_k p_{ik}(m)\,p_{kj}(n)

P_{m+n} = P_m P_n
\qquad
P_n = P \cdots P = P^n

Marginal probability:

\mu_n = (\mu_n(1), \ldots, \mu_n(N)) \quad \text{where } \mu_n(i) = P[X_n = i]

\mu_0 = \text{initial distribution}
\qquad
\mu_n = \mu_0 P^n
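The Chapman-Kolmogorov identity P_{m+n} = P_m P_n and the marginal update \mu_n = \mu_0 P^n can be verified with plain matrix products; a minimal sketch on an illustrative two-state chain:

```python
def matmul(A, B):
    # plain row-by-column matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P = [[0.9, 0.1],
     [0.5, 0.5]]                    # transition matrix; rows sum to one
mu0 = [[1.0, 0.0]]                  # start deterministically in state 0

P2 = matmul(P, P)
P4a = matmul(P2, P2)                # P^4 = P^2 P^2 (Chapman-Kolmogorov)
P4b = matmul(matmul(P2, P), P)      # P^4 = P^3 P, must agree
mu4 = matmul(mu0, P4a)              # marginal distribution after 4 steps
```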

20.2 Poisson Processes

Poisson process:

- \{X_t : t \in [0, \infty)\} = number of events up to and including time t
- X_0 = 0
- Independent increments:

  \forall t_0 < \cdots < t_n : \quad X_{t_1} - X_{t_0} \perp \cdots \perp X_{t_n} - X_{t_{n-1}}

- Intensity function \lambda(t):

  P[X_{t+h} - X_t = 1] = \lambda(t)h + o(h)
  \qquad
  P[X_{t+h} - X_t = 2] = o(h)

- X_{s+t} - X_s \sim Po\left(m(s+t) - m(s)\right) where m(t) = \int_0^t \lambda(s)\,ds

Homogeneous Poisson process:

\lambda(t) \equiv \lambda \implies X_t \sim Po(\lambda t) \qquad \lambda > 0

Waiting times:

W_t := \text{time at which } X_t \text{ occurs}
\qquad
W_t \sim Gamma\left(t, \frac{1}{\lambda}\right)

Interarrival times:

S_t = W_{t+1} - W_t
\qquad
S_t \sim Exp\left(\frac{1}{\lambda}\right)
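Because interarrival times are exponential, a homogeneous Poisson process can be simulated by accumulating Exp(1/\lambda) gaps; the count in [0, T] should then average \lambda T. A minimal sketch (rate, horizon, and replication count are illustrative):

```python
import random

random.seed(6)
lam, T, reps = 2.0, 10.0, 2000

counts = []
for _ in range(reps):
    t, k = 0.0, 0
    while True:
        t += random.expovariate(lam)   # interarrival time, mean 1/lam
        if t > T:
            break
        k += 1                         # one more event inside [0, T]
    counts.append(k)

mean_count = sum(counts) / reps        # should be close to lam * T = 20
```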


21 Time Series

Mean function:

\mu_{xt} = E[x_t] = \int_{-\infty}^{\infty} x f_t(x)\,dx

Autocovariance function:

\gamma_x(s,t) = E[(x_s - \mu_s)(x_t - \mu_t)] = E[x_s x_t] - \mu_s\mu_t
\qquad
\gamma_x(t,t) = E[(x_t - \mu_t)^2] = V[x_t]

Autocorrelation function (ACF):

\rho(s,t) = \frac{Cov[x_s, x_t]}{\sqrt{V[x_s]V[x_t]}} = \frac{\gamma(s,t)}{\sqrt{\gamma(s,s)\gamma(t,t)}}

Cross-covariance function (CCV):

\gamma_{xy}(s,t) = E[(x_s - \mu_{xs})(y_t - \mu_{yt})]

Cross-correlation function (CCF):

\rho_{xy}(s,t) = \frac{\gamma_{xy}(s,t)}{\sqrt{\gamma_x(s,s)\gamma_y(t,t)}}

Backshift operator: B^k(x_t) = x_{t-k}

Difference operator: \nabla^d = (1-B)^d

White noise:

- w_t \sim wn(0, \sigma_w^2)
- Gaussian: w_t \overset{iid}{\sim} N(0, \sigma_w^2)
- E[w_t] = 0 \quad \forall t \in T
- V[w_t] = \sigma_w^2 \quad \forall t \in T
- \gamma_w(s,t) = 0 \quad s \ne t \quad s,t \in T

Random walk:

- Drift \delta
- x_t = \delta t + \sum_{j=1}^t w_j
- E[x_t] = \delta t

Symmetric moving average:

m_t = \sum_{j=-k}^{k} a_j x_{t-j}
\quad \text{where } a_j = a_{-j} \ge 0 \text{ and } \sum_{j=-k}^{k} a_j = 1
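A small check of the symmetric moving average: with uniform weights a_j = 1/(2k+1) and a purely linear series, the filter reproduces the input exactly at interior points, because the symmetric weights sum to one and \sum_j j\,a_j = 0. The trend coefficients below are illustrative:

```python
k = 5
a = [1 / (2 * k + 1)] * (2 * k + 1)       # a_j = a_{-j} >= 0, sum to 1
x = [2.0 + 0.3 * t for t in range(50)]    # pure linear trend, no noise

# m_t = sum_{j=-k}^{k} a_j x_{t-j}, defined for interior t only
m = [sum(aj * x[t + j] for j, aj in zip(range(-k, k + 1), a))
     for t in range(k, 50 - k)]
# a linear trend passes through this filter undistorted: m_t == x_t
```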

21.1 Stationary Time Series

Strictly stationary:

P[x_{t_1} \le c_1, \ldots, x_{t_k} \le c_k] = P[x_{t_1+h} \le c_1, \ldots, x_{t_k+h} \le c_k]
\qquad \forall k \in \mathbb{N},\ t_k, c_k, h \in \mathbb{Z}

Weakly stationary:

- E[x_t^2] < \infty \quad \forall t \in \mathbb{Z}
- E[x_t] = m \quad \forall t \in \mathbb{Z}
- \gamma_x(s,t) = \gamma_x(s+r, t+r) \quad \forall r,s,t \in \mathbb{Z}

Autocovariance function:

\gamma(h) = E[(x_{t+h} - \mu)(x_t - \mu)] \quad h \in \mathbb{Z}
\qquad
\gamma(0) = E[(x_t - \mu)^2]

- \gamma(0) \ge 0
- \gamma(0) \ge |\gamma(h)|
- \gamma(h) = \gamma(-h)

Autocorrelation function (ACF):

\rho_x(h) = \frac{Cov[x_{t+h}, x_t]}{\sqrt{V[x_{t+h}]V[x_t]}}
          = \frac{\gamma(t+h, t)}{\sqrt{\gamma(t+h, t+h)\gamma(t, t)}}
          = \frac{\gamma(h)}{\gamma(0)}

Jointly stationary time series:

\gamma_{xy}(h) = E[(x_{t+h} - \mu_x)(y_t - \mu_y)]
\qquad
\rho_{xy}(h) = \frac{\gamma_{xy}(h)}{\sqrt{\gamma_x(0)\gamma_y(0)}}

Linear process:

x_t = \mu + \sum_{j=-\infty}^{\infty} \psi_j w_{t-j}
\quad \text{where } \sum_{j=-\infty}^{\infty} |\psi_j| < \infty

\gamma(h) = \sigma_w^2 \sum_{j=-\infty}^{\infty} \psi_{j+h}\psi_j


21.2 Estimation of Correlation

Sample mean:

\bar x = \frac{1}{n}\sum_{t=1}^{n} x_t

Sample variance:

V[\bar x] = \frac{1}{n}\sum_{h=-n}^{n}\left(1 - \frac{|h|}{n}\right)\gamma_x(h)

Sample autocovariance function:

\hat\gamma(h) = \frac{1}{n}\sum_{t=1}^{n-h}(x_{t+h} - \bar x)(x_t - \bar x)

Sample autocorrelation function:

\hat\rho(h) = \frac{\hat\gamma(h)}{\hat\gamma(0)}

Sample cross-covariance function:

\hat\gamma_{xy}(h) = \frac{1}{n}\sum_{t=1}^{n-h}(x_{t+h} - \bar x)(y_t - \bar y)

Sample cross-correlation function:

\hat\rho_{xy}(h) = \frac{\hat\gamma_{xy}(h)}{\sqrt{\hat\gamma_x(0)\hat\gamma_y(0)}}

Properties:

- \sigma_{\hat\rho_x(h)} = \frac{1}{\sqrt n} if x_t is white noise
- \sigma_{\hat\rho_{xy}(h)} = \frac{1}{\sqrt n} if x_t or y_t is white noise
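The sample autocovariance and ACF above translate directly to code; a minimal sketch on simulated white noise, where \hat\rho(h) for h \ne 0 should be near zero with standard error about 1/\sqrt n:

```python
import random

random.seed(8)
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]   # white noise
xbar = sum(x) / n

def gamma_hat(h):
    # sample autocovariance: (1/n) * sum_{t=1}^{n-h} (x_{t+h}-xbar)(x_t-xbar)
    return sum((x[t + h] - xbar) * (x[t] - xbar) for t in range(n - h)) / n

def rho_hat(h):
    return gamma_hat(h) / gamma_hat(0)

r1, r5 = rho_hat(1), rho_hat(5)              # both should be small
```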

21.3 Non-Stationary Time Series

Classical decomposition model:

x_t = \mu_t + s_t + w_t

- \mu_t = trend
- s_t = seasonal component
- w_t = random noise term

21.3.1 Detrending

Least squares:

1. Choose trend model, e.g., \mu_t = \beta_0 + \beta_1 t + \beta_2 t^2
2. Minimize rss to obtain trend estimate \hat\mu_t = \hat\beta_0 + \hat\beta_1 t + \hat\beta_2 t^2
3. Residuals \approx noise \hat w_t

Moving average:

The low-pass filter v_t is a symmetric moving average m_t with a_j = \frac{1}{2k+1}:

v_t = \frac{1}{2k+1}\sum_{i=-k}^{k} x_{t-i}

If \frac{1}{2k+1}\sum_{j=-k}^{k} w_{t-j} \approx 0, a linear trend function \mu_t = \beta_0 + \beta_1 t passes without distortion.

Differencing:

\mu_t = \beta_0 + \beta_1 t \implies \nabla x_t = \beta_1 + \nabla w_t

21.4 ARIMA models

Autoregressive polynomial:

\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p \qquad z \in \mathbb{C},\ \phi_p \ne 0

Autoregressive operator:

\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p

Autoregressive model of order p, AR(p):

x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t \iff \phi(B)x_t = w_t

AR(1):

x_t = \phi^k x_{t-k} + \sum_{j=0}^{k-1}\phi^j w_{t-j}
\;\xrightarrow{k \to \infty,\ |\phi| < 1}\;
\sum_{j=0}^{\infty}\phi^j w_{t-j}


Moving average polynomial:

\theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q \qquad z \in \mathbb{C},\ \theta_q \ne 0

Moving average operator:

\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q

MA(q) (moving average model of order q):

x_t = w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q} \iff x_t = \theta(B)w_t

E[x_t] = \sum_{j=0}^{q}\theta_j E[w_{t-j}] = 0

\gamma(h) = Cov[x_{t+h}, x_t] =
\begin{cases}
  \sigma_w^2\sum_{j=0}^{q-h}\theta_j\theta_{j+h} & 0 \le h \le q \\
  0 & h > q
\end{cases}

MA(1): x_t = w_t + \theta w_{t-1}

\gamma(h) = \begin{cases}
  (1+\theta^2)\sigma_w^2 & h = 0 \\
  \theta\sigma_w^2 & h = 1 \\
  0 & h > 1
\end{cases}
\qquad
\rho(h) = \begin{cases}
  \frac{\theta}{1+\theta^2} & h = 1 \\
  0 & h > 1
\end{cases}

ARMA(p, q):

x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q}
\iff \phi(B)x_t = \theta(B)w_t

Partial autocorrelation function (PACF):

- x_i^{h-1} = regression of x_i on \{x_{h-1}, x_{h-2}, \ldots, x_1\}
- \phi_{hh} = corr(x_h - x_h^{h-1},\, x_0 - x_0^{h-1}) \quad h \ge 2
- E.g., \phi_{11} = corr(x_1, x_0) = \rho(1)

ARIMA(p, d, q):

\nabla^d x_t = (1-B)^d x_t \text{ is ARMA}(p,q)
\qquad
\phi(B)(1-B)^d x_t = \theta(B)w_t

Exponentially weighted moving average (EWMA):

x_t = x_{t-1} + w_t - \lambda w_{t-1}

x_t = \sum_{j=1}^{\infty}(1-\lambda)\lambda^{j-1} x_{t-j} + w_t \quad \text{when } |\lambda| < 1


- Amplitude A
- Phase \phi
- U_1 = A\cos\phi and U_2 = A\sin\phi, often normally distributed rvs

Periodic mixture:

x_t = \sum_{k=1}^{q}\left(U_{k1}\cos(2\pi\omega_k t) + U_{k2}\sin(2\pi\omega_k t)\right)

U_{k1}, U_{k2}, for k = 1, \ldots, q, are independent zero-mean rvs with variances \sigma_k^2

\gamma(h) = \sum_{k=1}^{q}\sigma_k^2\cos(2\pi\omega_k h)
\qquad
\gamma(0) = E[x_t^2] = \sum_{k=1}^{q}\sigma_k^2

Spectral representation of a periodic process:

\gamma(h) = \sigma^2\cos(2\pi\omega_0 h)
          = \frac{\sigma^2}{2}e^{-2\pi i\omega_0 h} + \frac{\sigma^2}{2}e^{2\pi i\omega_0 h}
          = \int_{-1/2}^{1/2} e^{2\pi i\omega h}\,dF(\omega)

Spectral distribution function:

F(\omega) = \begin{cases}
  0 & \omega < -\omega_0 \\
  \sigma^2/2 & -\omega_0 \le \omega < \omega_0 \\
  \sigma^2 & \omega \ge \omega_0
\end{cases}

F(-\infty) = F(-1/2) = 0
\qquad
F(\infty) = F(1/2) = \gamma(0)

Spectral density:

f(\omega) = \sum_{h=-\infty}^{\infty}\gamma(h)e^{-2\pi i\omega h}
\qquad -\frac{1}{2} \le \omega \le \frac{1}{2}

- Needs \sum_h |\gamma(h)| < \infty \implies \gamma(h) = \int_{-1/2}^{1/2} e^{2\pi i\omega h} f(\omega)\,d\omega \quad h = 0, \pm 1, \ldots
- f(\omega) \ge 0
- f(\omega) = f(-\omega)
- f(\omega) = f(1-\omega)
- \gamma(0) = V[x_t] = \int_{-1/2}^{1/2} f(\omega)\,d\omega
- White noise: f_w(\omega) = \sigma_w^2
- ARMA(p, q), \phi(B)x_t = \theta(B)w_t:

  f_x(\omega) = \sigma_w^2\,\frac{|\theta(e^{-2\pi i\omega})|^2}{|\phi(e^{-2\pi i\omega})|^2}

  where \phi(z) = 1 - \sum_{k=1}^{p}\phi_k z^k and \theta(z) = 1 + \sum_{k=1}^{q}\theta_k z^k

Discrete Fourier Transform (DFT):

d(\omega_j) = n^{-1/2}\sum_{t=1}^{n} x_t e^{-2\pi i\omega_j t}

Fourier/fundamental frequencies: \omega_j = j/n

Inverse DFT:

x_t = n^{-1/2}\sum_{j=0}^{n-1} d(\omega_j)e^{2\pi i\omega_j t}

Periodogram: I(j/n) = |d(j/n)|^2

Scaled periodogram:

P(j/n) = \frac{4}{n}I(j/n)
       = \left(\frac{2}{n}\sum_{t=1}^{n} x_t\cos(2\pi t j/n)\right)^2
       + \left(\frac{2}{n}\sum_{t=1}^{n} x_t\sin(2\pi t j/n)\right)^2
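The DFT and periodogram above pick out periodic components exactly: for a pure cosine of amplitude A at a Fourier frequency j/n, the scaled periodogram at that frequency equals A^2. A minimal sketch (n = 128 and the frequency 10/n are illustrative):

```python
import cmath
import math

n = 128
freq = 10 / n                       # a Fourier frequency j/n with j = 10
x = [3.0 * math.cos(2 * math.pi * freq * t) for t in range(n)]  # amplitude 3

def d(omega):
    # DFT at frequency omega, scaled by n^(-1/2)
    return sum(x[t] * cmath.exp(-2j * math.pi * omega * t)
               for t in range(n)) / math.sqrt(n)

I = [abs(d(j / n)) ** 2 for j in range(n // 2 + 1)]   # periodogram
peak = max(range(len(I)), key=lambda j: I[j])          # index of largest peak
scaled = 4 / n * I[peak]                               # scaled periodogram = A^2
```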

22 Math

22.1 Gamma Function

- Ordinary: \Gamma(s) = \int_0^\infty t^{s-1}e^{-t}\,dt
- Upper incomplete: \Gamma(s, x) = \int_x^\infty t^{s-1}e^{-t}\,dt
- Lower incomplete: \gamma(s, x) = \int_0^x t^{s-1}e^{-t}\,dt
- \Gamma(\alpha + 1) = \alpha\Gamma(\alpha) \quad \alpha > 0
- \Gamma(n) = (n-1)! \quad n \in \mathbb{N}
- \Gamma(1/2) = \sqrt\pi
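The gamma identities above can be checked directly with the standard library's `math.gamma` (the test point 3.7 is arbitrary):

```python
import math

a = 3.7
lhs = math.gamma(a + 1)
rhs = a * math.gamma(a)            # recurrence Gamma(a+1) = a * Gamma(a)

fact5 = math.gamma(6)              # Gamma(n) = (n-1)!, so Gamma(6) = 5! = 120
half = math.gamma(0.5)             # Gamma(1/2) = sqrt(pi)
```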

22.2 Beta Function

- Ordinary: B(x, y) = B(y, x) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}
- Incomplete: B(x;\, a, b) = \int_0^x t^{a-1}(1-t)^{b-1}\,dt
- Regularized incomplete:

  I_x(a, b) = \frac{B(x;\, a, b)}{B(a, b)}
  \overset{a, b \in \mathbb{N}}{=}
  \sum_{j=a}^{a+b-1}\frac{(a+b-1)!}{j!\,(a+b-1-j)!}\,x^j(1-x)^{a+b-1-j}


- I_0(a, b) = 0 \qquad I_1(a, b) = 1
- I_x(a, b) = 1 - I_{1-x}(b, a)

22.3 Series

Finite:

\sum_{k=1}^{n} k = \frac{n(n+1)}{2}
\qquad
\sum_{k=1}^{n}(2k-1) = n^2
\qquad
\sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}

\sum_{k=1}^{n} k^3 = \left(\frac{n(n+1)}{2}\right)^2
\qquad
\sum_{k=0}^{n} c^k = \frac{c^{n+1}-1}{c-1} \quad c \ne 1

Binomial:

\sum_{k=0}^{n}\binom{n}{k} = 2^n
\qquad
\sum_{k=0}^{n}\binom{r+k}{k} = \binom{r+n+1}{n}
\qquad
\sum_{k=0}^{n}\binom{k}{m} = \binom{n+1}{m+1}

Vandermonde's identity:

\sum_{k=0}^{r}\binom{m}{k}\binom{n}{r-k} = \binom{m+n}{r}

Binomial theorem:

\sum_{k=0}^{n}\binom{n}{k}a^{n-k}b^k = (a+b)^n

Infinite:

\sum_{k=0}^{\infty} p^k = \frac{1}{1-p}
\qquad
\sum_{k=1}^{\infty} p^k = \frac{p}{1-p}
\qquad |p| < 1


[Figure: univariate distribution relationships, courtesy of Leemis and McQueston [2].]