Probability and Statistics Cookbook
Copyright © Matthias Vallentin, [email protected]
10th October, 2012
This cookbook integrates a variety of topics in probability theory and statistics. It is based on literature [1, 6, 3] and in-class material from courses of the statistics department at the University of California, Berkeley, but is also influenced by other sources [4, 5]. If you find errors or have suggestions for further topics, I would appreciate an email. The most recent version of this document is available at http://matthias.vallentin.net/probability-and-statistics-cookbook/. To reproduce, please contact me.
Contents

1 Distribution Overview
  1.1 Discrete Distributions
  1.2 Continuous Distributions
2 Probability Theory
3 Random Variables
  3.1 Transformations
4 Expectation
5 Variance
6 Inequalities
7 Distribution Relationships
8 Probability and Moment Generating Functions
9 Multivariate Distributions
  9.1 Standard Bivariate Normal
  9.2 Bivariate Normal
  9.3 Multivariate Normal
10 Convergence
  10.1 Law of Large Numbers (LLN)
  10.2 Central Limit Theorem (CLT)
11 Statistical Inference
  11.1 Point Estimation
  11.2 Normal-Based Confidence Interval
  11.3 Empirical Distribution
  11.4 Statistical Functionals
12 Parametric Inference
  12.1 Method of Moments
  12.2 Maximum Likelihood
    12.2.1 Delta Method
  12.3 Multiparameter Models
    12.3.1 Multiparameter Delta Method
  12.4 Parametric Bootstrap
13 Hypothesis Testing
14 Bayesian Inference
  14.1 Credible Intervals
  14.2 Function of Parameters
  14.3 Priors
    14.3.1 Conjugate Priors
  14.4 Bayesian Testing
15 Exponential Family
16 Sampling Methods
  16.1 The Bootstrap
    16.1.1 Bootstrap Confidence Intervals
  16.2 Rejection Sampling
  16.3 Importance Sampling
17 Decision Theory
  17.1 Risk
  17.2 Admissibility
  17.3 Bayes Rule
  17.4 Minimax Rules
18 Linear Regression
  18.1 Simple Linear Regression
  18.2 Prediction
  18.3 Multiple Regression
  18.4 Model Selection
19 Non-parametric Function Estimation
  19.1 Density Estimation
    19.1.1 Histograms
    19.1.2 Kernel Density Estimator (KDE)
  19.2 Non-parametric Regression
  19.3 Smoothing Using Orthogonal Functions
20 Stochastic Processes
  20.1 Markov Chains
  20.2 Poisson Processes
21 Time Series
  21.1 Stationary Time Series
  21.2 Estimation of Correlation
  21.3 Non-Stationary Time Series
    21.3.1 Detrending
  21.4 ARIMA Models
    21.4.1 Causality and Invertibility
  21.5 Spectral Analysis
22 Math
  22.1 Gamma Function
  22.2 Beta Function
  22.3 Series
  22.4 Combinatorics
1 Distribution Overview
1.1 Discrete Distributions
For each distribution we list the notation¹, the cdf F_X(x), the pmf f_X(x), E[X], V[X], and the mgf M_X(s).

Uniform Unif{a, ..., b}:
  F_X(x) = 0 for x < a; (⌊x⌋ − a + 1)/(b − a + 1) for a ≤ x ≤ b; 1 for x > b
  f_X(x) = I(a ≤ x ≤ b)/(b − a + 1)
  E[X] = (a + b)/2,  V[X] = ((b − a + 1)² − 1)/12,  M_X(s) = (e^{as} − e^{(b+1)s})/(s(b − a))

Bernoulli Bern(p):
  f_X(x) = p^x (1 − p)^{1−x}
  E[X] = p,  V[X] = p(1 − p),  M_X(s) = 1 − p + pe^s

Binomial Bin(n, p):
  F_X(x) = I_{1−p}(n − x, x + 1)
  f_X(x) = C(n, x) p^x (1 − p)^{n−x}
  E[X] = np,  V[X] = np(1 − p),  M_X(s) = (1 − p + pe^s)^n

Multinomial Mult(n, p):
  f_X(x) = n!/(x_1! ⋯ x_k!) p_1^{x_1} ⋯ p_k^{x_k},  Σ_{i=1}^k x_i = n
  E[X_i] = np_i,  V[X_i] = np_i(1 − p_i),  M_X(s) = (Σ_{i=1}^k p_i e^{s_i})^n

Hypergeometric Hyp(N, m, n):
  f_X(x) = C(m, x) C(N − m, n − x) / C(N, n)
  E[X] = nm/N,  V[X] = nm(N − n)(N − m)/(N²(N − 1))

Negative Binomial NBin(r, p):
  F_X(x) = I_p(r, x + 1)
  f_X(x) = C(x + r − 1, r − 1) p^r (1 − p)^x
  E[X] = r(1 − p)/p,  V[X] = r(1 − p)/p²,  M_X(s) = (p/(1 − (1 − p)e^s))^r

Geometric Geo(p):
  F_X(x) = 1 − (1 − p)^x,  x ∈ N⁺
  f_X(x) = p(1 − p)^{x−1},  x ∈ N⁺
  E[X] = 1/p,  V[X] = (1 − p)/p²,  M_X(s) = pe^s/(1 − (1 − p)e^s)

Poisson Po(λ):
  F_X(x) = e^{−λ} Σ_{i=0}^{⌊x⌋} λ^i/i!
  f_X(x) = λ^x e^{−λ}/x!
  E[X] = λ,  V[X] = λ,  M_X(s) = e^{λ(e^s − 1)}
[Figure: PMF plots — Uniform (discrete); Binomial (n = 40, p = 0.3; n = 30, p = 0.6; n = 25, p = 0.9); Geometric (p = 0.2, 0.5, 0.8); Poisson (λ = 1, 4, 10).]
¹ We use the notation γ(s, x) and Γ(x) to refer to the Gamma functions (see 22.1), and use B(x, y) and I_x to refer to the Beta functions (see 22.2).
1.2 Continuous Distributions
For each distribution we list the notation, the cdf F_X(x), the pdf f_X(x), E[X], V[X], and the mgf M_X(s).

Uniform Unif(a, b):
  F_X(x) = 0 for x < a; (x − a)/(b − a) for a ≤ x ≤ b; 1 for x > b
  f_X(x) = I(a < x < b)/(b − a)
  E[X] = (a + b)/2,  V[X] = (b − a)²/12,  M_X(s) = (e^{sb} − e^{sa})/(s(b − a))

Normal N(μ, σ²):
  F_X(x) = Φ(x) = ∫_{−∞}^x φ(t) dt,  f_X(x) = φ(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²))
  E[X] = μ,  V[X] = σ²,  M_X(s) = exp(μs + σ²s²/2)

Log-Normal ln N(μ, σ²):
  F_X(x) = 1/2 + (1/2) erf[(ln x − μ)/√(2σ²)]
  f_X(x) = (1/(x√(2πσ²))) exp(−(ln x − μ)²/(2σ²))
  E[X] = e^{μ + σ²/2},  V[X] = (e^{σ²} − 1) e^{2μ + σ²}

Multivariate Normal MVN(μ, Σ):
  f_X(x) = (2π)^{−k/2} |Σ|^{−1/2} exp(−(1/2)(x − μ)ᵀ Σ^{−1} (x − μ))
  E[X] = μ,  V[X] = Σ,  M_X(s) = exp(μᵀs + (1/2) sᵀΣs)

Student's t Student(ν):
  f_X(x) = Γ((ν + 1)/2)/(√(νπ) Γ(ν/2)) (1 + x²/ν)^{−(ν+1)/2}
  E[X] = 0,  V[X] = ν/(ν − 2) for ν > 2, ∞ for 1 < ν ≤ 2

Chi-square χ²_k:
  F_X(x) = γ(k/2, x/2)/Γ(k/2),  f_X(x) = (1/(2^{k/2} Γ(k/2))) x^{k/2−1} e^{−x/2}
  E[X] = k,  V[X] = 2k,  M_X(s) = (1 − 2s)^{−k/2} for s < 1/2

F F(d₁, d₂):
  F_X(x) = I_{d₁x/(d₁x+d₂)}(d₁/2, d₂/2)
  f_X(x) = √((d₁x)^{d₁} d₂^{d₂}/(d₁x + d₂)^{d₁+d₂}) / (x B(d₁/2, d₂/2))
  E[X] = d₂/(d₂ − 2),  V[X] = 2d₂²(d₁ + d₂ − 2)/(d₁(d₂ − 2)²(d₂ − 4))

Exponential Exp(β):
  F_X(x) = 1 − e^{−x/β},  f_X(x) = (1/β) e^{−x/β}
  E[X] = β,  V[X] = β²,  M_X(s) = 1/(1 − βs) for s < 1/β

Gamma Gamma(α, β):
  F_X(x) = γ(α, x/β)/Γ(α),  f_X(x) = (1/(Γ(α)β^α)) x^{α−1} e^{−x/β}
  E[X] = αβ,  V[X] = αβ²,  M_X(s) = (1/(1 − βs))^α for s < 1/β

Inverse Gamma InvGamma(α, β):
  F_X(x) = Γ(α, β/x)/Γ(α),  f_X(x) = (β^α/Γ(α)) x^{−α−1} e^{−β/x}
  E[X] = β/(α − 1) for α > 1,  V[X] = β²/((α − 1)²(α − 2)) for α > 2
  M_X(s) = (2(−βs)^{α/2}/Γ(α)) K_α(√(−4βs))

Dirichlet Dir(α):
  f_X(x) = (Γ(Σ_{i=1}^k α_i)/∏_{i=1}^k Γ(α_i)) ∏_{i=1}^k x_i^{α_i − 1}
  E[X_i] = α_i/Σ_{i=1}^k α_i,  V[X_i] = E[X_i](1 − E[X_i])/(Σ_{i=1}^k α_i + 1)

Beta Beta(α, β):
  F_X(x) = I_x(α, β),  f_X(x) = (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1}(1 − x)^{β−1}
  E[X] = α/(α + β),  V[X] = αβ/((α + β)²(α + β + 1))
  M_X(s) = 1 + Σ_{k=1}^∞ (∏_{r=0}^{k−1} (α + r)/(α + β + r)) s^k/k!

Weibull Weibull(λ, k):
  F_X(x) = 1 − e^{−(x/λ)^k},  f_X(x) = (k/λ)(x/λ)^{k−1} e^{−(x/λ)^k}
  E[X] = λΓ(1 + 1/k),  V[X] = λ²[Γ(1 + 2/k) − Γ²(1 + 1/k)]
  M_X(s) = Σ_{n=0}^∞ s^n λ^n Γ(1 + n/k)/n!

Pareto Pareto(x_m, α):
  F_X(x) = 1 − (x_m/x)^α for x ≥ x_m,  f_X(x) = α x_m^α/x^{α+1} for x ≥ x_m
  E[X] = αx_m/(α − 1) for α > 1,  V[X] = x_m²α/((α − 1)²(α − 2)) for α > 2
  M_X(s) = α(−x_m s)^α Γ(−α, −x_m s) for s < 0
[Figure: PDF plots — Uniform (continuous); Normal (μ = 0, σ² = 0.2, 1, 5; μ = −2, σ² = 0.5); Log-Normal; Student's t (ν = 1, 2, 5, ∞); χ² (k = 1, ..., 5); F (various d₁, d₂); Exponential (β = 2, 1, 0.4); Gamma; Inverse Gamma; Beta; Weibull; Pareto.]
2 Probability Theory
Definitions
Sample space Ω, outcome (point or element) ω, event A, σ-algebra 𝒜:
1. ∅ ∈ 𝒜
2. A₁, A₂, ... ∈ 𝒜 ⟹ ∪_{i=1}^∞ A_i ∈ 𝒜
3. A ∈ 𝒜 ⟹ ¬A ∈ 𝒜

Probability distribution P:
1. P[A] ≥ 0 for every A
2. P[Ω] = 1
3. P[∪_{i=1}^∞ A_i] = Σ_{i=1}^∞ P[A_i] for disjoint A_i

Probability space (Ω, 𝒜, P).

Properties:
- P[∅] = 0
- B = Ω ∩ B = (A ∪ ¬A) ∩ B = (A ∩ B) ∪ (¬A ∩ B)
- P[¬A] = 1 − P[A]
- P[B] = P[A ∩ B] + P[¬A ∩ B]
- P[Ω] = 1, P[∅] = 0
- ¬(∪_n A_n) = ∩_n ¬A_n and ¬(∩_n A_n) = ∪_n ¬A_n  (DeMorgan)
- P[∪_n A_n] = 1 − P[∩_n ¬A_n]
- P[A ∪ B] = P[A] + P[B] − P[A ∩ B] ⟹ P[A ∪ B] ≤ P[A] + P[B]
- P[A ∪ B] = P[A ∩ ¬B] + P[¬A ∩ B] + P[A ∩ B]
- P[A ∩ ¬B] = P[A] − P[A ∩ B]

Continuity of probabilities:
- A₁ ⊂ A₂ ⊂ ... ⟹ lim_{n→∞} P[A_n] = P[A], where A = ∪_{i=1}^∞ A_i
- A₁ ⊃ A₂ ⊃ ... ⟹ lim_{n→∞} P[A_n] = P[A], where A = ∩_{i=1}^∞ A_i

Independence:
A ⊥ B ⟺ P[A ∩ B] = P[A] P[B]

Conditional probability:
P[A | B] = P[A ∩ B]/P[B],  P[B] > 0

Law of total probability:
P[B] = Σ_{i=1}^n P[B | A_i] P[A_i],  where Ω = ∪_{i=1}^n A_i

Bayes' theorem:
P[A_i | B] = P[B | A_i] P[A_i] / Σ_{j=1}^n P[B | A_j] P[A_j],  where Ω = ∪_{i=1}^n A_i

Inclusion-exclusion principle:
|∪_{i=1}^n A_i| = Σ_{r=1}^n (−1)^{r−1} Σ_{i_1 < ⋯ < i_r} |A_{i_1} ∩ ⋯ ∩ A_{i_r}|
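A small numeric illustration of the law of total probability and Bayes' theorem over a finite partition; the prior and likelihood values below are hypothetical and serve only to exercise the formulas.

    # Bayes' theorem over a finite partition {A_1, ..., A_n}:
    # P[A_i | B] = P[B | A_i] P[A_i] / sum_j P[B | A_j] P[A_j]

    prior = [0.5, 0.3, 0.2]          # P[A_i], must sum to 1 (hypothetical values)
    likelihood = [0.9, 0.5, 0.1]     # P[B | A_i] (hypothetical values)

    evidence = sum(l * p for l, p in zip(likelihood, prior))   # P[B] by total probability
    posterior = [l * p / evidence for l, p in zip(likelihood, prior)]

    print(evidence)    # P[B]
    print(posterior)   # P[A_i | B], sums to 1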
3 Random Variables

3.1 Transformations
Transformation function: Z = φ(X)

Discrete:
f_Z(z) = P[φ(X) = z] = P[{x : φ(x) = z}] = P[X ∈ φ^{−1}(z)] = Σ_{x ∈ φ^{−1}(z)} f(x)

Continuous:
F_Z(z) = P[φ(X) ≤ z] = ∫_{A_z} f(x) dx,  with A_z = {x : φ(x) ≤ z}

Special case if φ is strictly monotone:
f_Z(z) = f_X(φ^{−1}(z)) |d/dz φ^{−1}(z)| = f_X(x) |dx/dz| = f_X(x) (1/|J|)

The Rule of the Lazy Statistician:
E[Z] = ∫ φ(x) dF_X(x)
E[I_A(X)] = ∫ I_A(x) dF_X(x) = ∫_A dF_X(x) = P[X ∈ A]

Convolution:
- Z := X + Y:  f_Z(z) = ∫_{−∞}^∞ f_{X,Y}(x, z − x) dx;  if X, Y ≥ 0:  f_Z(z) = ∫_0^z f_{X,Y}(x, z − x) dx
- Z := |X − Y|:  f_Z(z) = 2 ∫_0^∞ f_{X,Y}(x, z + x) dx
- Z := X/Y:  f_Z(z) = ∫_{−∞}^∞ |x| f_{X,Y}(x, xz) dx = ∫ |x| f_X(x) f_Y(xz) dx  (if X ⊥ Y)
4 Expectation
Definition and properties
E[X] = μ_X = ∫ x dF_X(x) = Σ_x x f_X(x) if X discrete;  ∫ x f_X(x) dx if X continuous

- P[X = c] = 1 ⟹ E[c] = c
- E[cX] = c E[X]
- E[X + Y] = E[X] + E[Y]
- E[XY] = ∫∫_{X,Y} xy f_{X,Y}(x, y) dF_X(x) dF_Y(y)
- E[φ(X)] ≠ φ(E[X]) in general  (cf. Jensen's inequality)
- P[X ≥ Y] = 1 ⟹ E[X] ≥ E[Y]
- P[X = Y] = 1 ⟹ E[X] = E[Y]
- E[X] = Σ_{x=1}^∞ P[X ≥ x]  (for X supported on the non-negative integers)

Sample mean:
X̄_n = (1/n) Σ_{i=1}^n X_i

Conditional expectation:
- E[Y | X = x] = ∫ y f(y | x) dy
- E[X] = E[E[X | Y]]
- E[φ(X, Y) | X = x] = ∫ φ(x, y) f_{Y|X}(y | x) dy
- E[φ(Y, Z) | X = x] = ∫∫ φ(y, z) f_{(Y,Z)|X}(y, z | x) dy dz
- E[Y + Z | X] = E[Y | X] + E[Z | X]
- E[φ(X)Y | X] = φ(X) E[Y | X]
- E[Y | X] = c ⟹ Cov[X, Y] = 0
5 Variance

Definition and properties:
- V[X] = σ²_X = E[(X − E[X])²] = E[X²] − E[X]²
- V[Σ_{i=1}^n X_i] = Σ_{i=1}^n V[X_i] + 2 Σ_{i≠j} Cov[X_i, X_j]
- V[Σ_{i=1}^n X_i] = Σ_{i=1}^n V[X_i]  if X_i ⊥ X_j

Standard deviation:
sd[X] = √V[X] = σ_X

Covariance:
- Cov[X, Y] = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]
- Cov[X, a] = 0
- Cov[X, X] = V[X]
- Cov[X, Y] = Cov[Y, X]
- Cov[aX, bY] = ab Cov[X, Y]
- Cov[X + a, Y + b] = Cov[X, Y]
- Cov[Σ_{i=1}^n X_i, Σ_{j=1}^m Y_j] = Σ_{i=1}^n Σ_{j=1}^m Cov[X_i, Y_j]

Correlation:
ρ[X, Y] = Cov[X, Y]/√(V[X] V[Y])

Independence:
X ⊥ Y ⟹ ρ[X, Y] = 0 ⟺ Cov[X, Y] = 0 ⟺ E[XY] = E[X]E[Y]

Sample variance:
S² = (1/(n − 1)) Σ_{i=1}^n (X_i − X̄_n)²

Conditional variance:
- V[Y | X] = E[(Y − E[Y | X])² | X] = E[Y² | X] − E[Y | X]²
- V[Y] = E[V[Y | X]] + V[E[Y | X]]
6 Inequalities
- Cauchy-Schwarz:  E[XY]² ≤ E[X²] E[Y²]
- Markov:  P[φ(X) ≥ t] ≤ E[φ(X)]/t
- Chebyshev:  P[|X − E[X]| ≥ t] ≤ V[X]/t²
- Chernoff:  P[X ≥ (1 + δ)μ] ≤ (e^δ/(1 + δ)^{1+δ})^μ  for δ > −1
- Jensen:  E[φ(X)] ≥ φ(E[X])  for convex φ
7 Distribution Relationships
Binomial
- X_i ~ Bern(p) ⟹ Σ_{i=1}^n X_i ~ Bin(n, p)
- X ~ Bin(n, p), Y ~ Bin(m, p) ⟹ X + Y ~ Bin(n + m, p)
- lim_{n→∞} Bin(n, p) = Po(np)  (n large, p small)
- lim_{n→∞} Bin(n, p) = N(np, np(1 − p))  (n large, p far from 0 and 1)

Negative Binomial:
- X ~ NBin(1, p) = Geo(p)
- X ~ NBin(r, p) ⟹ X = Σ_{i=1}^r Geo(p)
- X_i ~ NBin(r_i, p) ⟹ Σ X_i ~ NBin(Σ r_i, p)
- X ~ NBin(r, p), Y ~ Bin(s + r, p) ⟹ P[X ≤ s] = P[Y ≥ r]

Poisson:
- X_i ~ Po(λ_i), X_i ⊥ X_j ⟹ Σ_{i=1}^n X_i ~ Po(Σ_{i=1}^n λ_i)
- X_i ~ Po(λ_i), X_i ⊥ X_j ⟹ X_i | Σ_{j=1}^n X_j ~ Bin(Σ_{j=1}^n X_j, λ_i/Σ_{j=1}^n λ_j)

Exponential:
- X_i ~ Exp(β), X_i ⊥ X_j ⟹ Σ_{i=1}^n X_i ~ Gamma(n, β)
- Memoryless property: P[X > x + y | X > y] = P[X > x]

Normal:
- X ~ N(μ, σ²) ⟹ (X − μ)/σ ~ N(0, 1)
- X ~ N(μ, σ²), Z = aX + b ⟹ Z ~ N(aμ + b, a²σ²)
- X ~ N(μ₁, σ₁²), Y ~ N(μ₂, σ₂²), X ⊥ Y ⟹ X + Y ~ N(μ₁ + μ₂, σ₁² + σ₂²)
- X_i ~ N(μ_i, σ_i²) independent ⟹ Σ_i X_i ~ N(Σ_i μ_i, Σ_i σ_i²)
- P[a < X ≤ b] = Φ(b) − Φ(a)  for X ~ N(0, 1)
- Φ(−x) = 1 − Φ(x),  φ′(x) = −xφ(x),  φ″(x) = (x² − 1)φ(x)
- Upper quantile of N(0, 1): z_α = Φ^{−1}(1 − α)

Gamma:
- X ~ Gamma(α, β) ⟺ X/β ~ Gamma(α, 1)
- Gamma(α, β) = Σ_{i=1}^α Exp(β)  for integer α
- X_i ~ Gamma(α_i, β), X_i ⊥ X_j ⟹ Σ_i X_i ~ Gamma(Σ_i α_i, β)
- Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx

Beta:
- (1/B(α, β)) x^{α−1}(1 − x)^{β−1} = (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1}(1 − x)^{β−1}
- E[X^k] = B(α + k, β)/B(α, β) = ((α + k − 1)/(α + β + k − 1)) E[X^{k−1}]
- Beta(1, 1) ~ Unif(0, 1)
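A quick Monte Carlo sanity check of one of the relationships above (a sum of iid exponentials is Gamma); this is only a sketch with numpy/scipy, and the sample size and parameters are arbitrary.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, beta = 5, 2.0                      # arbitrary: sum of n Exp(beta) variables

    # Simulate X_1 + ... + X_n with X_i ~ Exp(beta) (scale parameterization, E[X_i] = beta)
    sums = rng.exponential(scale=beta, size=(100_000, n)).sum(axis=1)

    # Compare with Gamma(n, beta) via a Kolmogorov-Smirnov test; a large p-value is consistent
    print(stats.kstest(sums, stats.gamma(a=n, scale=beta).cdf))
    print(sums.mean(), n * beta)          # empirical vs. theoretical mean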
8 Probability and Moment Generating Functions

- G_X(t) = E[t^X],  |t| < 1

10 Convergence

1. In distribution (weakly): X_n →^D X
   lim_{n→∞} F_{X_n}(t) = F_X(t) at all continuity points t of F_X
2. In probability: X_n →^P X
   (∀ε > 0) lim_{n→∞} P[|X_n − X| > ε] = 0
3. Almost surely (strongly): X_n →^{as} X
   P[lim_{n→∞} X_n = X] = P[ω ∈ Ω : lim_{n→∞} X_n(ω) = X(ω)] = 1
4. In quadratic mean (L²): X_n →^{qm} X
   lim_{n→∞} E[(X_n − X)²] = 0

Relationships:
- X_n →^{qm} X ⟹ X_n →^P X ⟹ X_n →^D X
- X_n →^{as} X ⟹ X_n →^P X
- X_n →^D X and P[X = c] = 1 (c ∈ ℝ) ⟹ X_n →^P X
- X_n →^P X and Y_n →^P Y ⟹ X_n + Y_n →^P X + Y
- X_n →^{qm} X and Y_n →^{qm} Y ⟹ X_n + Y_n →^{qm} X + Y
- X_n →^P X and Y_n →^P Y ⟹ X_n Y_n →^P XY
- X_n →^P X ⟹ φ(X_n) →^P φ(X)
- X_n →^D X ⟹ φ(X_n) →^D φ(X)
- X_n →^{qm} b ⟺ lim_{n→∞} E[X_n] = b and lim_{n→∞} V[X_n] = 0
- X₁, ..., X_n iid, E[X] = μ, V[X] < ∞ ⟹ X̄_n →^{qm} μ

Slutsky's theorem:
- X_n →^D X and Y_n →^P c ⟹ X_n + Y_n →^D X + c
- X_n →^D X and Y_n →^P c ⟹ X_n Y_n →^D cX
- In general: X_n →^D X and Y_n →^D Y does not imply X_n + Y_n →^D X + Y
10.1 Law of Large Numbers (LLN)
Let {X₁, ..., X_n} be a sequence of iid rvs with E[X₁] = μ.

Weak (WLLN):  X̄_n →^P μ  as n → ∞
Strong (SLLN):  X̄_n →^{as} μ  as n → ∞
10.2 Central Limit Theorem (CLT)
Let {X₁, ..., X_n} be a sequence of iid rvs with E[X₁] = μ and V[X₁] = σ².

Z_n := (X̄_n − μ)/√V[X̄_n] = √n (X̄_n − μ)/σ →^D Z,  where Z ~ N(0, 1)

lim_{n→∞} P[Z_n ≤ z] = Φ(z)  for all z ∈ ℝ

CLT notations:
- Z_n ≈ N(0, 1)
- X̄_n ≈ N(μ, σ²/n)
- X̄_n − μ ≈ N(0, σ²/n)
- √n (X̄_n − μ) ≈ N(0, σ²)
- √n (X̄_n − μ)/σ ≈ N(0, 1)

Continuity correction:
- P[X̄_n ≤ x] ≈ Φ((x + 1/2 − μ)/(σ/√n))
- P[X̄_n ≥ x] ≈ 1 − Φ((x − 1/2 − μ)/(σ/√n))

Delta method:
Y_n ≈ N(μ, σ²/n) ⟹ φ(Y_n) ≈ N(φ(μ), (φ′(μ))² σ²/n)
11 Statistical Inference
Let X₁, ..., X_n iid ~ F if not otherwise noted.

11.1 Point Estimation

- Point estimator θ̂_n of θ is a rv: θ̂_n = g(X₁, ..., X_n)
- bias(θ̂_n) = E[θ̂_n] − θ
- Consistency: θ̂_n →^P θ
- Sampling distribution: F(θ̂_n)
- Standard error: se(θ̂_n) = √V[θ̂_n]
- Mean squared error: mse = E[(θ̂_n − θ)²] = bias(θ̂_n)² + V[θ̂_n]
- lim_{n→∞} bias(θ̂_n) = 0 and lim_{n→∞} se(θ̂_n) = 0 ⟹ θ̂_n is consistent
- Asymptotic normality: (θ̂_n − θ)/se →^D N(0, 1)
- Slutsky's theorem often lets us replace se(θ̂_n) by some (weakly) consistent estimator σ̂_n.
11.2 Normal-Based Confidence Interval
Suppose θ̂_n ≈ N(θ, se²). Let z_{α/2} = Φ^{−1}(1 − α/2), i.e., P[Z > z_{α/2}] = α/2 and P[−z_{α/2} < Z < z_{α/2}] = 1 − α where Z ~ N(0, 1). Then
C_n = θ̂_n ± z_{α/2} se
11.3 Empirical distribution
Empirical Distribution Function (ECDF)
F̂_n(x) = (1/n) Σ_{i=1}^n I(X_i ≤ x),  where I(X_i ≤ x) = 1 if X_i ≤ x and 0 if X_i > x

Properties (for any fixed x):
- E[F̂_n(x)] = F(x)
- V[F̂_n(x)] = F(x)(1 − F(x))/n
- mse = F(x)(1 − F(x))/n → 0
- F̂_n(x) →^P F(x)

Dvoretzky–Kiefer–Wolfowitz (DKW) inequality (X₁, ..., X_n ~ F):
P[sup_x |F(x) − F̂_n(x)| > ε] ≤ 2e^{−2nε²}

Nonparametric 1 − α confidence band for F:
L(x) = max{F̂_n(x) − ε_n, 0}
U(x) = min{F̂_n(x) + ε_n, 1}
ε_n = √((1/(2n)) log(2/α))
P[L(x) ≤ F(x) ≤ U(x) for all x] ≥ 1 − α
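The ECDF and the DKW confidence band translate directly into code; a sketch with numpy that evaluates F̂_n and the band (L, U) at the sorted sample points.

    import numpy as np

    def ecdf_band(x, alpha=0.05):
        """Return sorted sample, ECDF values, and DKW 1 - alpha band (L, U)."""
        x = np.sort(x)
        n = len(x)
        Fn = np.arange(1, n + 1) / n                     # F_n(x_(i)) = i / n
        eps = np.sqrt(np.log(2 / alpha) / (2 * n))       # DKW epsilon_n
        L = np.clip(Fn - eps, 0, 1)
        U = np.clip(Fn + eps, 0, 1)
        return x, Fn, L, U

    rng = np.random.default_rng(2)
    grid, Fn, L, U = ecdf_band(rng.normal(size=100))
    print(Fn[:5], L[:5], U[:5])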
11.4 Statistical Functionals
- Statistical functional: T(F)
- Plug-in estimator of θ = T(F): θ̂_n = T(F̂_n)
- Linear functional: T(F) = ∫ φ(x) dF_X(x)
- Plug-in estimator for linear functional: T(F̂_n) = ∫ φ(x) dF̂_n(x) = (1/n) Σ_{i=1}^n φ(X_i)
- Often: T(F̂_n) ≈ N(T(F), se²) ⟹ T(F̂_n) ± z_{α/2} se
- pth quantile: F^{−1}(p) = inf{x : F(x) ≥ p}
- μ̂ = X̄_n
- σ̂² = (1/(n − 1)) Σ_{i=1}^n (X_i − X̄_n)²
- Skewness: κ̂ = (1/n) Σ_{i=1}^n (X_i − μ̂)³ / σ̂³
- Correlation: ρ̂ = Σ_{i=1}^n (X_i − X̄_n)(Y_i − Ȳ_n) / √(Σ_{i=1}^n (X_i − X̄_n)² Σ_{i=1}^n (Y_i − Ȳ_n)²)

12 Parametric Inference
i=1(Xi Xn)2ni=1(Yi Yn)12 Parametric Inference
Let F=
f(x; ) : be a parametric model with parameter space Rkand parameter = (1, . . . , k).
12.1 Method of Moments
jth moment:
α_j(θ) = E[X^j] = ∫ x^j dF_X(x)

jth sample moment:
α̂_j = (1/n) Σ_{i=1}^n X_i^j

Method of moments estimator (MoM): θ̂_n solves
α_1(θ̂_n) = α̂_1
α_2(θ̂_n) = α̂_2
⋮
α_k(θ̂_n) = α̂_k
Properties of the MoM estimator
- θ̂_n exists with probability tending to 1
- Consistency: θ̂_n →^P θ
- Asymptotic normality: √n (θ̂_n − θ) →^D N(0, Σ)
  where Σ = g E[Y Yᵀ] gᵀ, Y = (X, X², ..., X^k)ᵀ, g = (g₁, ..., g_k), and g_j = ∂α_j^{−1}(θ)/∂θ

12.2 Maximum Likelihood
Likelihood: L_n : Θ → [0, ∞)
L_n(θ) = ∏_{i=1}^n f(X_i; θ)

Log-likelihood:
ℓ_n(θ) = log L_n(θ) = Σ_{i=1}^n log f(X_i; θ)

Maximum likelihood estimator (mle):
L_n(θ̂_n) = sup_θ L_n(θ)

Score function:
s(X; θ) = ∂/∂θ log f(X; θ)

Fisher information:
I(θ) = V_θ[s(X; θ)]
I_n(θ) = n I(θ)

Fisher information (exponential family):
I(θ) = −E_θ[∂/∂θ s(X; θ)]

Observed Fisher information:
I_n^{obs}(θ) = −∂²/∂θ² Σ_{i=1}^n log f(X_i; θ)
Properties of the mle
- Consistency: θ̂_n →^P θ
- Equivariance: θ̂_n is the mle ⟹ φ(θ̂_n) is the mle of φ(θ)
- Asymptotic normality:
  1. se ≈ √(1/I_n(θ)):  (θ̂_n − θ)/se →^D N(0, 1)
  2. sê ≈ √(1/I_n(θ̂_n)):  (θ̂_n − θ)/sê →^D N(0, 1)
- Asymptotic optimality (or efficiency), i.e., smallest variance for large samples. If θ̃_n is any other estimator, the asymptotic relative efficiency is are(θ̃_n, θ̂_n) = V[θ̂_n]/V[θ̃_n] ≤ 1
- Approximately the Bayes estimator
12.2.1 Delta Method

If τ = φ(θ), where φ is differentiable and φ′(θ) ≠ 0:
(τ̂_n − τ)/sê(τ̂) →^D N(0, 1)
where τ̂ = φ(θ̂_n) is the mle of τ and sê(τ̂) = |φ′(θ̂_n)| sê(θ̂_n)

12.3 Multiparameter Models
Let θ = (θ₁, ..., θ_k) and let θ̂ = (θ̂₁, ..., θ̂_k) be the mle.

H_{jj} = ∂²ℓ_n/∂θ_j²,  H_{jk} = ∂²ℓ_n/∂θ_j∂θ_k

Fisher information matrix:
I_n(θ) = −[ E_θ[H₁₁] ⋯ E_θ[H_{1k}] ; ⋮ ⋱ ⋮ ; E_θ[H_{k1}] ⋯ E_θ[H_{kk}] ]

Under appropriate regularity conditions:
(θ̂ − θ) ≈ N(0, J_n)
with J_n(θ) = I_n^{−1}(θ). Further, if θ̂_j is the jth component of θ̂, then
(θ̂_j − θ_j)/sê_j →^D N(0, 1)
where sê_j² = J_n(j, j) and Cov[θ̂_j, θ̂_k] = J_n(j, k)
12.3.1 Multiparameter delta method
Let τ = φ(θ₁, ..., θ_k) and let the gradient of φ be
∇φ = (∂φ/∂θ₁, ..., ∂φ/∂θ_k)ᵀ

Suppose ∇φ evaluated at θ = θ̂ is non-zero and τ̂ = φ(θ̂). Then
(τ̂ − τ)/sê(τ̂) →^D N(0, 1)
where sê(τ̂) = √((∇̂φ)ᵀ Ĵ_n ∇̂φ), Ĵ_n = J_n(θ̂), and ∇̂φ = ∇φ evaluated at θ = θ̂.
12.4 Parametric Bootstrap
Sample from f(x; θ̂_n) instead of from F̂_n, where θ̂_n could be the mle or the method of moments estimator.
13 Hypothesis Testing
H₀ : θ ∈ Θ₀ versus H₁ : θ ∈ Θ₁

Definitions:
- Null hypothesis H₀, alternative hypothesis H₁
- Simple hypothesis: θ = θ₀
- Composite hypothesis: θ > θ₀ or θ < θ₀
- Two-sided test: H₀ : θ = θ₀ versus H₁ : θ ≠ θ₀
- One-sided test: H₀ : θ ≤ θ₀ versus H₁ : θ > θ₀
- Critical value c, test statistic T
- Rejection region R = {x : T(x) > c}
- Power function β(θ) = P[X ∈ R]
- Power of a test: 1 − P[Type II error] = 1 − β = inf_{θ ∈ Θ₁} β(θ)
- Test size: α = P[Type I error] = sup_{θ ∈ Θ₀} β(θ)

             Retain H₀            Reject H₀
H₀ true      correct              Type I error (α)
H₁ true      Type II error (β)    correct (power)

p-value:
- p-value = sup_{θ ∈ Θ₀} P_θ[T(X) ≥ T(x)] = inf{α : T(x) ∈ R_α}
- p-value = sup_{θ ∈ Θ₀} P_θ[T(X*) ≥ T(X)] ≈ 1 − F(T(X))  since T(X*) ~ F

p-value           evidence
< 0.01            very strong evidence against H₀
0.01 – 0.05       strong evidence against H₀
0.05 – 0.1        weak evidence against H₀
> 0.1             little or no evidence against H₀

Wald test:
- Two-sided test
- Reject H₀ when |W| > z_{α/2}, where W = (θ̂ − θ₀)/sê
- P[|W| > z_{α/2}] → α
- p-value = P_{θ₀}[|W| > |w|] ≈ P[|Z| > |w|] = 2Φ(−|w|)
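A sketch of the Wald test for a Bernoulli proportion, using θ̂ = X̄_n and sê = √(θ̂(1 − θ̂)/n); the data and θ₀ below are made up purely for illustration.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(4)
    x = rng.binomial(1, 0.55, size=400)    # simulated Bernoulli data
    theta0 = 0.5                           # H0: theta = 0.5

    theta_hat = x.mean()
    se = np.sqrt(theta_hat * (1 - theta_hat) / len(x))
    W = (theta_hat - theta0) / se          # Wald statistic
    p_value = 2 * norm.cdf(-abs(W))        # two-sided p-value

    print(W, p_value, abs(W) > norm.ppf(1 - 0.05 / 2))   # reject at alpha = 0.05?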
Likelihood ratio test (LRT)
T(X) = sup_{θ ∈ Θ} L_n(θ) / sup_{θ ∈ Θ₀} L_n(θ) = L_n(θ̂_n) / L_n(θ̂_{n,0})
λ(X) = 2 log T(X) →^D χ²_{r−q},  where χ²_k = Σ_{i=1}^k Z_i² with Z₁, ..., Z_k iid N(0, 1)

p-value = P_{θ₀}[λ(X) > λ(x)] ≈ P[χ²_{r−q} > λ(x)]
Multinomial LRT
mle: p̂_n = (X₁/n, ..., X_k/n)

T(X) = L_n(p̂_n)/L_n(p₀) = ∏_{j=1}^k (p̂_j/p_{0j})^{X_j}

λ(X) = 2 Σ_{j=1}^k X_j log(p̂_j/p_{0j}) →^D χ²_{k−1}
The approximate size α LRT rejects H₀ when λ(X) ≥ χ²_{k−1,α}.

Pearson Chi-square test:
T = Σ_{j=1}^k (X_j − E[X_j])²/E[X_j],  where E[X_j] = np_{0j} under H₀
T →^D χ²_{k−1}
p-value = P[χ²_{k−1} > T(x)]
The convergence to χ²_{k−1} is faster for T than for the LRT, hence the Pearson test is preferable for small n.
Independence testing
- I rows, J columns, X a multinomial sample of size n = I·J
- mles unconstrained: p̂_ij = X_ij/n
- mles under H₀: p̂_{0ij} = p̂_{i·} p̂_{·j} = (X_{i·}/n)(X_{·j}/n)
- LRT: λ = 2 Σ_{i=1}^I Σ_{j=1}^J X_ij log(nX_ij/(X_{i·}X_{·j}))
- Pearson χ²: T = Σ_{i=1}^I Σ_{j=1}^J (X_ij − E[X_ij])²/E[X_ij]
- LRT and Pearson →^D χ²_ν, where ν = (I − 1)(J − 1)
14 Bayesian Inference
Bayes Theorem
f(θ | x) = f(x | θ) f(θ)/f(x^n) = f(x | θ) f(θ) / ∫ f(x | θ) f(θ) dθ ∝ L_n(θ) f(θ)

Definitions:
- X^n = (X₁, ..., X_n),  x^n = (x₁, ..., x_n)
- Prior density f(θ)
- Likelihood f(x^n | θ): joint density of the data. In particular, if X^n is iid, then f(x^n | θ) = ∏_{i=1}^n f(x_i | θ) = L_n(θ)
- Posterior density f(θ | x^n)
- Normalizing constant c_n = f(x^n) = ∫ f(x | θ) f(θ) dθ
- Kernel: part of a density that depends on θ
- Posterior mean θ̄_n = ∫ θ f(θ | x^n) dθ = ∫ θ L_n(θ) f(θ) dθ / ∫ L_n(θ) f(θ) dθ
14.1 Credible Intervals
Posterior interval:
P[θ ∈ (a, b) | x^n] = ∫_a^b f(θ | x^n) dθ = 1 − α

Equal-tail credible interval:
∫_{−∞}^a f(θ | x^n) dθ = ∫_b^∞ f(θ | x^n) dθ = α/2

Highest posterior density (HPD) region R_n:
1. P[θ ∈ R_n] = 1 − α
2. R_n = {θ : f(θ | x^n) > k} for some k
R_n is unimodal ⟹ R_n is an interval
14.2 Function of Parameters

Let τ = φ(θ) and A = {θ : φ(θ) ≤ τ}.

Posterior CDF for τ:
H(τ | x^n) = P[φ(θ) ≤ τ | x^n] = ∫_A f(θ | x^n) dθ

Posterior density:
h(τ | x^n) = H′(τ | x^n)

Bayesian delta method:
τ | X^n ≈ N(φ(θ̂), sê |φ′(θ̂)|)
14.3 Priors
Choice
- Subjective Bayesianism
- Objective Bayesianism
- Robust Bayesianism

Types:
- Flat: f(θ) ∝ constant
- Proper: ∫ f(θ) dθ = 1
- Improper: ∫ f(θ) dθ = ∞
- Jeffreys' prior (transformation-invariant): f(θ) ∝ √I(θ),  f(θ) ∝ √det(I(θ))
- Conjugate: f(θ) and f(θ | x^n) belong to the same parametric family
14.3.1 Conjugate Priors
Discrete likelihood
Likelihood        Conjugate prior    Posterior hyperparameters
Bern(p)           Beta(α, β)         α + Σ_{i=1}^n x_i,  β + n − Σ_{i=1}^n x_i
Bin(p)            Beta(α, β)         α + Σ_{i=1}^n x_i,  β + Σ_{i=1}^n N_i − Σ_{i=1}^n x_i
NBin(p)           Beta(α, β)         α + rn,  β + Σ_{i=1}^n x_i
Po(λ)             Gamma(α, β)        α + Σ_{i=1}^n x_i,  β + n
Multinomial(p)    Dir(α)             α + Σ_{i=1}^n x^{(i)}
Geo(p)            Beta(α, β)         α + n,  β + Σ_{i=1}^n x_i
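The first row of the table, a Bernoulli likelihood with a Beta(α, β) prior, as a short sketch; the prior hyperparameters and the data are arbitrary illustrations.

    import numpy as np
    from scipy.stats import beta

    alpha0, beta0 = 2.0, 2.0                  # arbitrary Beta prior hyperparameters
    x = np.array([1, 0, 1, 1, 0, 1, 1, 1])    # observed Bernoulli data (arbitrary)

    # Posterior: Beta(alpha0 + sum x_i, beta0 + n - sum x_i)
    alpha_n = alpha0 + x.sum()
    beta_n = beta0 + len(x) - x.sum()

    posterior = beta(alpha_n, beta_n)
    print(alpha_n, beta_n)
    print(posterior.mean())                   # posterior mean of p
    print(posterior.ppf([0.025, 0.975]))      # equal-tail 95% credible interval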
Continuous likelihood (subscript c denotes constant)
Likelihood Conjugate prior Posterior hyperparameters
Likelihood        Conjugate prior                         Posterior hyperparameters
Unif(0, θ)        Pareto(x_m, k)                          max{x_(n), x_m},  k + n
Exp(λ)            Gamma(α, β)                             α + n,  β + Σ_{i=1}^n x_i
N(μ, σ_c²)        N(μ₀, σ₀²)                              (μ₀/σ₀² + Σ_{i=1}^n x_i/σ_c²)/(1/σ₀² + n/σ_c²),  (1/σ₀² + n/σ_c²)^{−1}
N(μ_c, σ²)        Scaled Inverse Chi-square(ν, σ₀²)       ν + n,  (νσ₀² + Σ_{i=1}^n (x_i − μ)²)/(ν + n)
N(μ, σ²)          Normal-scaled Inverse Gamma(λ, ν, α, β) (νλ + nx̄)/(ν + n),  ν + n,  α + n/2,  β + (1/2)Σ_{i=1}^n (x_i − x̄)² + nν(x̄ − λ)²/(2(n + ν))
MVN(μ, Σ_c)       MVN(μ₀, Σ₀)                             (Σ₀^{−1} + nΣ_c^{−1})^{−1}(Σ₀^{−1}μ₀ + nΣ_c^{−1}x̄),  (Σ₀^{−1} + nΣ_c^{−1})^{−1}
MVN(μ_c, Σ)       Inverse-Wishart(ν, Ψ)                   n + ν,  Ψ + Σ_{i=1}^n (x_i − μ_c)(x_i − μ_c)ᵀ
Pareto(x_{m,c}, k) Gamma(α, β)                            α + n,  β + Σ_{i=1}^n log(x_i/x_{m,c})
Pareto(x_m, k_c)  Pareto(x₀, k₀)                          x₀,  k₀ − kn  (where k₀ > kn)
Gamma(α_c, β)     Gamma(α₀, β₀)                           α₀ + nα_c,  β₀ + Σ_{i=1}^n x_i
14.4 Bayesian Testing
If H₀ : θ ∈ Θ₀:
- Prior probability P[H₀] = ∫_{Θ₀} f(θ) dθ
- Posterior probability P[H₀ | x^n] = ∫_{Θ₀} f(θ | x^n) dθ

Let H₀, ..., H_{K−1} be K hypotheses with prior densities f(θ | H_k). Then
P[H_k | x^n] = f(x^n | H_k) P[H_k] / Σ_{k=1}^K f(x^n | H_k) P[H_k]
Marginal likelihood
f(x^n | H_i) = ∫ f(x^n | θ, H_i) f(θ | H_i) dθ

Posterior odds (of H_i relative to H_j):
P[H_i | x^n]/P[H_j | x^n] = (f(x^n | H_i)/f(x^n | H_j)) × (P[H_i]/P[H_j]) = Bayes factor BF_ij × prior odds

Bayes factor:
log₁₀ BF₁₀    BF₁₀        evidence
0 – 0.5       1 – 1.5     Weak
0.5 – 1       1.5 – 10    Moderate
1 – 2         10 – 100    Strong
> 2           > 100       Decisive

p* = (p/(1 − p)) BF₁₀ / (1 + (p/(1 − p)) BF₁₀),  where p = P[H₁] and p* = P[H₁ | x^n]
15 Exponential Family

Scalar parameter:
f_X(x | θ) = h(x) exp{η(θ)T(x) − A(θ)} = h(x) g(θ) exp{η(θ)T(x)}

Vector parameter:
f_X(x | θ) = h(x) exp{Σ_{i=1}^s η_i(θ)T_i(x) − A(θ)} = h(x) exp{η(θ)·T(x) − A(θ)} = h(x) g(θ) exp{η(θ)·T(x)}

Natural form:
f_X(x | η) = h(x) exp{η·T(x) − A(η)} = h(x) g(η) exp{η·T(x)} = h(x) g(η) exp{ηᵀT(x)}
16 Sampling Methods
16.1 The Bootstrap

Let T_n = g(X₁, ..., X_n) be a statistic.

1. Estimate V_F[T_n] with V_{F̂_n}[T_n].
2. Approximate V_{F̂_n}[T_n] using simulation:
   (a) Repeat the following B times to get T*_{n,1}, ..., T*_{n,B}, an iid sample from the sampling distribution implied by F̂_n:
       i. Sample uniformly X*₁, ..., X*_n ~ F̂_n.
       ii. Compute T*_n = g(X*₁, ..., X*_n).
   (b) Then
       v_boot = V̂_{F̂_n} = (1/B) Σ_{b=1}^B (T*_{n,b} − (1/B) Σ_{r=1}^B T*_{n,r})²
16.1.1 Bootstrap Confidence Intervals
Normal-based interval:
T_n ± z_{α/2} sê_boot

Pivotal interval:
1. Location parameter θ = T(F)
2. Pivot R_n = θ̂_n − θ
3. Let H(r) = P[R_n ≤ r] be the cdf of R_n
4. Let R*_{n,b} = θ̂*_{n,b} − θ̂_n. Approximate H using the bootstrap:
   Ĥ(r) = (1/B) Σ_{b=1}^B I(R*_{n,b} ≤ r)
5. Let θ*_β denote the β sample quantile of (θ̂*_{n,1}, ..., θ̂*_{n,B})
6. Let r*_β denote the β sample quantile of (R*_{n,1}, ..., R*_{n,B}), i.e., r*_β = θ*_β − θ̂_n
7. Approximate 1 − α confidence interval C_n = (â, b̂), where
   â = θ̂_n − Ĥ^{−1}(1 − α/2) = θ̂_n − r*_{1−α/2} = 2θ̂_n − θ*_{1−α/2}
   b̂ = θ̂_n − Ĥ^{−1}(α/2) = θ̂_n − r*_{α/2} = 2θ̂_n − θ*_{α/2}

Percentile interval:
C_n = (θ*_{α/2}, θ*_{1−α/2})
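The nonparametric bootstrap and the normal, percentile, and pivotal intervals above in a few lines of numpy; the statistic here is the median of a simulated sample, chosen only for illustration.

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.exponential(scale=2.0, size=100)   # original sample (arbitrary)
    B = 2000
    t_hat = np.median(x)                       # T_n = g(X_1, ..., X_n)

    # Resample with replacement from x (i.e., from F_hat_n) B times
    t_star = np.array([np.median(rng.choice(x, size=len(x), replace=True)) for _ in range(B)])

    se_boot = t_star.std(ddof=1)
    alpha = 0.05
    normal_ci = (t_hat - 1.96 * se_boot, t_hat + 1.96 * se_boot)
    percentile_ci = tuple(np.quantile(t_star, [alpha / 2, 1 - alpha / 2]))
    pivotal_ci = (2 * t_hat - percentile_ci[1], 2 * t_hat - percentile_ci[0])

    print(se_boot, normal_ci, percentile_ci, pivotal_ci)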
16.2 Rejection Sampling
Setup
- We can easily sample from g(θ)
- We want to sample from h(θ), but it is difficult
- We know h(θ) up to a proportional constant: h(θ) = k(θ)/∫ k(θ) dθ
- Envelope condition: we can find M > 0 such that k(θ) ≤ M g(θ) for all θ

Algorithm:
1. Draw θ_cand ~ g(θ)
2. Generate u ~ Unif(0, 1)
3. Accept θ_cand if u ≤ k(θ_cand)/(M g(θ_cand))
4. Repeat until B values of θ_cand have been accepted

Example:
- We can easily sample from the prior g(θ) = f(θ)
- Target is the posterior h(θ) ∝ k(θ) = f(x^n | θ) f(θ)
- Envelope condition: f(x^n | θ) ≤ f(x^n | θ̂_n) = L_n(θ̂_n) ≡ M
- Algorithm:
  1. Draw θ_cand ~ f(θ)
  2. Generate u ~ Unif(0, 1)
  3. Accept θ_cand if u ≤ L_n(θ_cand)/L_n(θ̂_n)
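A sketch of rejection sampling for a generic unnormalized target k(θ) with a uniform proposal g; the particular k and M below are hypothetical and only illustrate the accept/reject step.

    import numpy as np

    rng = np.random.default_rng(6)

    def k(theta):
        # Unnormalized target, proportional to a Beta(3, 2) density (hypothetical choice)
        return theta**2 * (1 - theta)

    g_pdf = 1.0                    # proposal g = Unif(0, 1)
    M = 0.15                       # satisfies k(theta) <= M * g(theta) on [0, 1]; max k ~ 0.148

    samples, B = [], 1000
    while len(samples) < B:
        cand = rng.uniform(0, 1)               # 1. draw theta_cand ~ g
        u = rng.uniform(0, 1)                  # 2. draw u ~ Unif(0, 1)
        if u <= k(cand) / (M * g_pdf):         # 3. accept with probability k / (M g)
            samples.append(cand)

    print(np.mean(samples))        # should be close to E[Beta(3, 2)] = 0.6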
16.3 Importance Sampling

Sample from an importance function g rather than the target density h.
Algorithm to obtain an approximation to E[q(θ) | x^n]:
1. Sample from the prior: θ₁, ..., θ_B iid ~ f(θ)
2. Compute weights w_i = L_n(θ_i)/Σ_{i=1}^B L_n(θ_i),  i = 1, ..., B
3. E[q(θ) | x^n] ≈ Σ_{i=1}^B q(θ_i) w_i
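The same posterior-expectation computation via importance sampling with the prior as importance function, as in the algorithm above; the Bernoulli model and flat prior are hypothetical stand-ins.

    import numpy as np

    rng = np.random.default_rng(7)
    x = rng.binomial(1, 0.7, size=30)            # data from a Bernoulli model (hypothetical)

    def likelihood(theta):
        # L_n(theta) for iid Bernoulli data
        return theta**x.sum() * (1 - theta)**(len(x) - x.sum())

    B = 10_000
    theta = rng.uniform(0, 1, size=B)            # 1. sample from the prior f(theta) = Unif(0, 1)
    w = likelihood(theta)
    w /= w.sum()                                 # 2. normalized weights w_i

    posterior_mean = np.sum(theta * w)           # 3. E[q(theta) | x^n] with q(theta) = theta
    print(posterior_mean)                        # close to (sum x + 1)/(n + 2) for a flat prior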
17 Decision Theory

Definitions:
- Unknown quantity affecting our decision: θ ∈ Θ
- Decision rule: synonymous with an estimator θ̂
- Action a ∈ A: possible value of the decision rule. In the estimation context, the action is just an estimate of θ, θ̂(x).
- Loss function L: consequences of taking action a when the true state is θ, or the discrepancy between θ and θ̂; L : Θ × A → [−k, ∞).

Loss functions:
- Squared error loss: L(θ, a) = (θ − a)²
- Linear loss: L(θ, a) = K₁(θ − a) if a − θ ≤ 0, K₂(a − θ) if a − θ > 0
17.3 Bayes Rule
Bayes rule (or Bayes estimator)
r(f, θ̂) = inf_{θ̃} r(f, θ̃)
θ̂(x) minimizes the posterior risk r(θ̂ | x) for every x ⟹ r(f, θ̂) = ∫ r(θ̂ | x) f(x) dx

Theorems:
- Squared error loss: posterior mean
- Absolute error loss: posterior median
- Zero-one loss: posterior mode
17.4 Minimax Rules
Maximum risk:
R̄(θ̂) = sup_θ R(θ, θ̂),  R̄(a) = sup_θ R(θ, a)

Minimax rule:
sup_θ R(θ, θ̂) = inf_{θ̃} R̄(θ̃) = inf_{θ̃} sup_θ R(θ, θ̃)

θ̂ = Bayes rule with constant risk, R(θ, θ̂) = c ⟹ θ̂ is minimax

Least favorable prior:
θ̂^f = Bayes rule, R(θ, θ̂^f) ≤ r(f, θ̂^f) for all θ ⟹ θ̂^f is minimax

18 Linear Regression
Definitions
- Response variable Y
- Covariate X (aka predictor variable or feature)
18.1 Simple Linear Regression
Model: Y_i = β₀ + β₁X_i + ε_i, where E[ε_i | X_i] = 0 and V[ε_i | X_i] = σ²

Fitted line: r̂(x) = β̂₀ + β̂₁x

Predicted (fitted) values: Ŷ_i = r̂(X_i)

Residuals: ε̂_i = Y_i − Ŷ_i = Y_i − (β̂₀ + β̂₁X_i)

Residual sums of squares (rss): rss(β̂₀, β̂₁) = Σ_{i=1}^n ε̂_i²

Least squares estimates: β̂ = (β̂₀, β̂₁)ᵀ minimizes rss(β̂₀, β̂₁):
β̂₀ = Ȳ_n − β̂₁X̄_n
β̂₁ = Σ_{i=1}^n (X_i − X̄_n)(Y_i − Ȳ_n)/Σ_{i=1}^n (X_i − X̄_n)² = (Σ_{i=1}^n X_iY_i − nX̄Ȳ)/(Σ_{i=1}^n X_i² − nX̄²)

E[β̂ | X^n] = (β₀, β₁)ᵀ
V[β̂ | X^n] = (σ²/(n s²_X)) [ (1/n)Σ_{i=1}^n X_i²   −X̄_n ; −X̄_n   1 ]
sê(β̂₀) = (σ̂/(s_X√n)) √((1/n)Σ_{i=1}^n X_i²)
sê(β̂₁) = σ̂/(s_X√n)
where s²_X = (1/n)Σ_{i=1}^n (X_i − X̄_n)² and σ̂² = (1/(n − 2))Σ_{i=1}^n ε̂_i²  (unbiased estimate).

Further properties:
- Consistency: β̂₀ →^P β₀ and β̂₁ →^P β₁
- Asymptotic normality: (β̂₀ − β₀)/sê(β̂₀) →^D N(0, 1) and (β̂₁ − β₁)/sê(β̂₁) →^D N(0, 1)
- Approximate 1 − α confidence intervals for β₀ and β₁: β̂₀ ± z_{α/2} sê(β̂₀) and β̂₁ ± z_{α/2} sê(β̂₁)
- Wald test for H₀ : β₁ = 0 vs. H₁ : β₁ ≠ 0: reject H₀ if |W| > z_{α/2}, where W = β̂₁/sê(β̂₁).

R²:
R² = Σ_{i=1}^n (Ŷ_i − Ȳ)²/Σ_{i=1}^n (Y_i − Ȳ)² = 1 − Σ_{i=1}^n ε̂_i²/Σ_{i=1}^n (Y_i − Ȳ)² = 1 − rss/tss
Likelihood
L = ∏_{i=1}^n f(X_i, Y_i) = ∏_{i=1}^n f_X(X_i) × ∏_{i=1}^n f_{Y|X}(Y_i | X_i) = L₁ × L₂

L₁ = ∏_{i=1}^n f_X(X_i)
L₂ = ∏_{i=1}^n f_{Y|X}(Y_i | X_i) ∝ σ^{−n} exp{−(1/(2σ²)) Σ_i (Y_i − (β₀ + β₁X_i))²}

Under the assumption of normality, the least squares estimator is also the mle, and
σ̂² = (1/n) Σ_{i=1}^n ε̂_i²
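The least squares formulas for simple linear regression in numpy, applied to simulated data; the true coefficients below are arbitrary and only serve to check the estimates.

    import numpy as np

    rng = np.random.default_rng(8)
    n = 200
    X = rng.uniform(0, 10, size=n)
    Y = 1.0 + 2.0 * X + rng.normal(scale=1.5, size=n)   # true beta0 = 1, beta1 = 2 (arbitrary)

    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
    b0 = Y.mean() - b1 * X.mean()

    resid = Y - (b0 + b1 * X)
    sigma2 = np.sum(resid**2) / (n - 2)                 # unbiased estimate of sigma^2
    sX = np.sqrt(np.mean((X - X.mean())**2))
    se_b1 = np.sqrt(sigma2) / (sX * np.sqrt(n))
    se_b0 = se_b1 * np.sqrt(np.mean(X**2))

    r2 = 1 - np.sum(resid**2) / np.sum((Y - Y.mean())**2)
    print(b0, b1, se_b0, se_b1, r2)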
18.2 Prediction
Observe X = x* of the covariate and predict the outcome Y*.

Ŷ* = β̂₀ + β̂₁x*
V[Ŷ*] = V[β̂₀] + x*²V[β̂₁] + 2x* Cov[β̂₀, β̂₁]

Prediction interval:
ξ̂²_n = σ̂² (Σ_{j=1}^n (X_j − x*)²/(n Σ_i (X_i − X̄)²) + 1)
Ŷ* ± z_{α/2} ξ̂_n
18.3 Multiple Regression
Y = Xβ + ε

where

X = [ X₁₁ ⋯ X_{1k} ; ⋮ ⋱ ⋮ ; X_{n1} ⋯ X_{nk} ],  β = (β₁, ..., β_k)ᵀ,  ε = (ε₁, ..., ε_n)ᵀ

Likelihood:
L(β, σ²) = (2πσ²)^{−n/2} exp{−(1/(2σ²)) rss}
rss = (Y − Xβ)ᵀ(Y − Xβ) = ‖Y − Xβ‖² = Σ_{i=1}^N (Y_i − x_iᵀβ)²

If the (k × k) matrix XᵀX is invertible:
β̂ = (XᵀX)^{−1}XᵀY
V[β̂ | X^n] = σ²(XᵀX)^{−1}
β̂ ≈ N(β, σ²(XᵀX)^{−1})

Estimate of the regression function:
r̂(x) = Σ_{j=1}^k β̂_j x_j

Unbiased estimate for σ²:
σ̂² = (1/(n − k)) Σ_{i=1}^n ε̂_i²,  ε̂ = Xβ̂ − Y
mle: σ̂²_mle = ((n − k)/n) σ̂²

1 − α confidence interval:
β̂_j ± z_{α/2} sê(β̂_j)
18.4 Model Selection
Consider predicting a new observation Y* for covariates X* and let S ⊆ J denote a subset of the covariates in the model, where |S| = k and |J| = n.

Issues:
- Underfitting: too few covariates yields high bias
- Overfitting: too many covariates yields high variance

Procedure:
1. Assign a score to each model
2. Search through all models to find the one with the highest score

Hypothesis testing:
H₀ : β_j = 0 vs. H₁ : β_j ≠ 0 for all j ∈ J

Mean squared prediction error (mspe):
mspe = E[(Ŷ(S) − Y*)²]

Prediction risk:
R(S) = Σ_{i=1}^n mspe_i = Σ_{i=1}^n E[(Ŷ_i(S) − Y*_i)²]

Training error:
R̂_tr(S) = Σ_{i=1}^n (Ŷ_i(S) − Y_i)²
R²:
R²(S) = 1 − rss(S)/tss = 1 − R̂_tr(S)/tss = 1 − Σ_{i=1}^n (Ŷ_i(S) − Y_i)²/Σ_{i=1}^n (Y_i − Ȳ)²

The training error is a downward-biased estimate of the prediction risk:
E[R̂_tr(S)] < R(S)
bias(R̂_tr(S)) = E[R̂_tr(S)] − R(S) = −2 Σ_{i=1}^n Cov[Ŷ_i, Y_i]

Adjusted R²:
R̄²(S) = 1 − ((n − 1)/(n − k)) rss/tss

Mallows' C_p statistic:
R̂(S) = R̂_tr(S) + 2kσ̂² = lack of fit + complexity penalty

Akaike Information Criterion (AIC):
AIC(S) = ℓ_n(β̂_S, σ̂²_S) − k

Bayesian Information Criterion (BIC):
BIC(S) = ℓ_n(β̂_S, σ̂²_S) − (k/2) log n

Validation and training:
R̂_V(S) = Σ_{i=1}^m (Ŷ*_i(S) − Y*_i)²,  m = |{validation data}|, often n/4 or n/2

Leave-one-out cross-validation:
R̂_CV(S) = Σ_{i=1}^n (Y_i − Ŷ_{(i)})² = Σ_{i=1}^n ((Y_i − Ŷ_i(S))/(1 − U_ii(S)))²
U(S) = X_S(X_SᵀX_S)^{−1}X_Sᵀ  (hat matrix)
19 Non-parametric Function Estimation
19.1 Density Estimation
Estimate f(x), where f(x) satisfies P[X ∈ A] = ∫_A f(x) dx.

Integrated square error (ise):
L(f, f̂_n) = ∫ (f(x) − f̂_n(x))² dx = J(h) + ∫ f²(x) dx

Frequentist risk:
R(f, f̂_n) = E[L(f, f̂_n)] = ∫ b²(x) dx + ∫ v(x) dx
b(x) = E[f̂_n(x)] − f(x)
v(x) = V[f̂_n(x)]

19.1.1 Histograms
Definitions
- Number of bins m
- Binwidth h = 1/m
- Bin B_j has ν_j observations
- Define p̂_j = ν_j/n and p_j = ∫_{B_j} f(u) du

Histogram estimator:
f̂_n(x) = Σ_{j=1}^m (p̂_j/h) I(x ∈ B_j)

E[f̂_n(x)] = p_j/h
V[f̂_n(x)] = p_j(1 − p_j)/(nh²)
R(f̂_n, f) ≈ (h²/12) ∫ (f′(u))² du + 1/(nh)
h* = (1/n^{1/3}) (6/∫ (f′(u))² du)^{1/3}
R*(f̂_n, f) ≈ C/n^{2/3},  C = (3/4)^{2/3} (∫ (f′(u))² du)^{1/3}

Cross-validation estimate of E[J(h)]:
Ĵ_CV(h) = ∫ f̂²_n(x) dx − (2/n) Σ_{i=1}^n f̂_{(−i)}(X_i) = 2/((n − 1)h) − ((n + 1)/((n − 1)h)) Σ_{j=1}^m p̂_j²
19.1.2 Kernel Density Estimator (KDE)
Kernel K:
- K(x) ≥ 0
- ∫ K(x) dx = 1
- ∫ xK(x) dx = 0
- ∫ x²K(x) dx ≡ σ²_K > 0

KDE:
f̂_n(x) = (1/n) Σ_{i=1}^n (1/h) K((x − X_i)/h)

R(f, f̂_n) ≈ (1/4)(hσ_K)⁴ ∫ (f″(x))² dx + (1/(nh)) ∫ K²(x) dx

h* = c₁^{−2/5} c₂^{1/5} c₃^{−1/5} n^{−1/5},  c₁ = σ²_K, c₂ = ∫ K²(x) dx, c₃ = ∫ (f″(x))² dx

R*(f, f̂_n) = c₄/n^{4/5},  c₄ = (5/4)(σ²_K)^{2/5} (∫ K²(x) dx)^{4/5} (∫ (f″)² dx)^{1/5} = (5/4) C(K) (∫ (f″)² dx)^{1/5}

Epanechnikov kernel:
K(x) = (3/(4√5))(1 − x²/5) for |x| < √5,  0 otherwise
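A Gaussian-kernel version of the estimator f̂_n, written directly from the definition; the bandwidth below is the usual normal-reference rule, an assumption rather than a quantity from the text.

    import numpy as np

    def kde(x_grid, data, h):
        """Kernel density estimate with a Gaussian kernel K."""
        u = (x_grid[:, None] - data[None, :]) / h
        K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
        return K.mean(axis=1) / h        # (1/n) sum_i (1/h) K((x - X_i)/h)

    rng = np.random.default_rng(9)
    data = rng.normal(size=500)
    h = 1.06 * data.std(ddof=1) * len(data) ** (-1 / 5)   # normal-reference bandwidth (assumption)

    grid = np.linspace(-4, 4, 81)
    f_hat = kde(grid, data, h)
    print(h, f_hat[len(grid) // 2])      # estimate near 0, should be near 1/sqrt(2*pi) ~ 0.40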
20 Stochastic Processes

20.1 Markov Chains

Transition probabilities: p_ij ≥ 0 and Σ_j p_ij = 1.

Chapman-Kolmogorov:
p_ij(m + n) = Σ_k p_ik(m) p_kj(n)
P_{m+n} = P_m P_n
P_n = P × ⋯ × P = P^n

Marginal probability:
μ_n = (μ_n(1), ..., μ_n(N)), where μ_n(i) = P[X_n = i]
μ₀ = initial distribution
μ_n = μ₀ Pⁿ
20.2 Poisson Processes
Poisson process:
- {X_t : t ∈ [0, ∞)} = number of events up to and including time t
- X₀ = 0
- Independent increments: for t₀ < ⋯ < t_n: X_{t₁} − X_{t₀} ⊥ ⋯ ⊥ X_{t_n} − X_{t_{n−1}}
- Intensity function λ(t):
  P[X_{t+h} − X_t = 1] = λ(t)h + o(h)
  P[X_{t+h} − X_t = 2] = o(h)
- X_{s+t} − X_s ~ Po(m(s + t) − m(s)), where m(t) = ∫₀^t λ(s) ds

Homogeneous Poisson process:
λ(t) ≡ λ ⟹ X_t ~ Po(λt),  λ > 0

Waiting times:
W_t := time at which X_t occurs
W_t ~ Gamma(t, 1/λ)

Interarrival times:
S_t = W_{t+1} − W_t
S_t ~ Exp(1/λ)
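Simulating a homogeneous Poisson process through its exponential interarrival times, a sketch with numpy; λ and the time horizon are arbitrary.

    import numpy as np

    rng = np.random.default_rng(10)
    lam, T = 3.0, 100.0                     # rate lambda and time horizon (arbitrary)

    # Interarrival times S_t ~ Exp(1/lambda), i.e., mean 1/lambda; event times are cumulative sums
    gaps = rng.exponential(scale=1 / lam, size=int(lam * T * 2))
    times = np.cumsum(gaps)
    times = times[times <= T]               # event (waiting) times W_1, W_2, ... up to T

    print(len(times), lam * T)              # X_T should be close to lambda * T in expectation
    print(times[:5])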
21 Time Series
Mean function
μ_{xt} = E[x_t] = ∫ x f_t(x) dx

Autocovariance function:
γ_x(s, t) = E[(x_s − μ_s)(x_t − μ_t)] = E[x_s x_t] − μ_s μ_t
γ_x(t, t) = E[(x_t − μ_t)²] = V[x_t]

Autocorrelation function (ACF):
ρ(s, t) = Cov[x_s, x_t]/√(V[x_s]V[x_t]) = γ(s, t)/√(γ(s, s)γ(t, t))

Cross-covariance function (CCV):
γ_{xy}(s, t) = E[(x_s − μ_{xs})(y_t − μ_{yt})]

Cross-correlation function (CCF):
ρ_{xy}(s, t) = γ_{xy}(s, t)/√(γ_x(s, s)γ_y(t, t))

Backshift operator: B^k(x_t) = x_{t−k}
Difference operator: ∇^d = (1 − B)^d

White noise:
- w_t ~ wn(0, σ²_w);  Gaussian: w_t iid ~ N(0, σ²_w)
- E[w_t] = 0 for all t ∈ T
- V[w_t] = σ² for all t ∈ T
- γ_w(s, t) = 0 for s ≠ t, s, t ∈ T

Random walk:
- Drift δ: x_t = δt + Σ_{j=1}^t w_j
- E[x_t] = δt

Symmetric moving average:
m_t = Σ_{j=−k}^k a_j x_{t−j}, where a_j = a_{−j} ≥ 0 and Σ_{j=−k}^k a_j = 1
21.1 Stationary Time Series
Strictly stationary:
P[x_{t₁} ≤ c₁, ..., x_{t_k} ≤ c_k] = P[x_{t₁+h} ≤ c₁, ..., x_{t_k+h} ≤ c_k]  for all k ∈ ℕ, t_k, c_k, h ∈ ℤ

Weakly stationary:
- E[x²_t] < ∞ for all t ∈ ℤ
- E[x²_t] = m for all t ∈ ℤ
- γ_x(s, t) = γ_x(s + r, t + r) for all r, s, t ∈ ℤ

Autocovariance function:
- γ(h) = E[(x_{t+h} − μ)(x_t − μ)] for all h ∈ ℤ
- γ(0) = E[(x_t − μ)²]
- γ(0) ≥ 0
- γ(0) ≥ |γ(h)|
- γ(h) = γ(−h)

Autocorrelation function (ACF):
ρ_x(h) = Cov[x_{t+h}, x_t]/√(V[x_{t+h}]V[x_t]) = γ(t + h, t)/√(γ(t + h, t + h)γ(t, t)) = γ(h)/γ(0)

Jointly stationary time series:
γ_{xy}(h) = E[(x_{t+h} − μ_x)(y_t − μ_y)]
ρ_{xy}(h) = γ_{xy}(h)/√(γ_x(0)γ_y(0))

Linear process:
x_t = μ + Σ_{j=−∞}^∞ ψ_j w_{t−j}, where Σ_{j=−∞}^∞ |ψ_j| < ∞
γ(h) = σ²_w Σ_{j=−∞}^∞ ψ_{j+h} ψ_j
21.2 Estimation of Correlation
Sample mean:
x̄ = (1/n) Σ_{t=1}^n x_t

Sample variance:
V[x̄] = (1/n) Σ_{h=−n}^n (1 − |h|/n) γ_x(h)

Sample autocovariance function:
γ̂(h) = (1/n) Σ_{t=1}^{n−h} (x_{t+h} − x̄)(x_t − x̄)

Sample autocorrelation function:
ρ̂(h) = γ̂(h)/γ̂(0)

Sample cross-covariance function:
γ̂_{xy}(h) = (1/n) Σ_{t=1}^{n−h} (x_{t+h} − x̄)(y_t − ȳ)

Sample cross-correlation function:
ρ̂_{xy}(h) = γ̂_{xy}(h)/√(γ̂_x(0)γ̂_y(0))

Properties:
- σ_{ρ̂_x(h)} = 1/√n if x_t is white noise
- σ_{ρ̂_{xy}(h)} = 1/√n if x_t or y_t is white noise
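The sample autocovariance and autocorrelation functions from the formulas above, written directly in numpy; the AR(1)-like test series is simulated only to exercise the code.

    import numpy as np

    def sample_acf(x, max_lag):
        x = np.asarray(x, dtype=float)
        n, xbar = len(x), x.mean()
        gamma0 = np.sum((x - xbar) ** 2) / n
        acf = []
        for h in range(max_lag + 1):
            gamma_h = np.sum((x[h:] - xbar) * (x[: n - h] - xbar)) / n   # gamma_hat(h)
            acf.append(gamma_h / gamma0)                                  # rho_hat(h)
        return np.array(acf)

    rng = np.random.default_rng(11)
    w = rng.normal(size=1000)
    x = np.zeros_like(w)
    for t in range(1, len(w)):              # AR(1) with phi = 0.7, for illustration
        x[t] = 0.7 * x[t - 1] + w[t]

    print(sample_acf(x, 5))                 # roughly 1, 0.7, 0.49, 0.34, ...
    print(2 / np.sqrt(len(x)))              # rough white-noise band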
21.3 Non-Stationary Time Series
Classical decomposition model
x_t = μ_t + s_t + w_t
- μ_t = trend
- s_t = seasonal component
- w_t = random noise term
21.3.1 Detrending
Least squares
1. Choose trend model, e.g., μ_t = β₀ + β₁t + β₂t²
2. Minimize rss to obtain trend estimate μ̂_t = β̂₀ + β̂₁t + β̂₂t²
3. Residuals ≙ noise w_t

Moving average:
The low-pass filter v_t is a symmetric moving average m_t with a_j = 1/(2k + 1):
v_t = (1/(2k + 1)) Σ_{i=−k}^k x_{t−i}
If (1/(2k + 1)) Σ_{i=−k}^k w_{t−j} ≈ 0, a linear trend function μ_t = β₀ + β₁t passes without distortion.

Differencing:
μ_t = β₀ + β₁t ⟹ E[∇x_t] = β₁
21.4 ARIMA models
Autoregressive polynomial
(z) = 1 1z pzp z C p= 0
Autoregressive operator
(B) = 1 1B pBp
Autoregressive model order p, AR (p)
xt = 1xt1+ + pxtp+ wt (B)xt= wtAR (1)
xt = k(xtk) +k1j=0
j(wtj)k,||
Moving average polynomial
θ(z) = 1 + θ₁z + ⋯ + θ_q z^q,  z ∈ ℂ and θ_q ≠ 0

Moving average operator:
θ(B) = 1 + θ₁B + ⋯ + θ_q B^q

MA(q) (moving average model of order q):
x_t = w_t + θ₁w_{t−1} + ⋯ + θ_q w_{t−q}  ⟺  x_t = θ(B)w_t

E[x_t] = Σ_{j=0}^q θ_j E[w_{t−j}] = 0

γ(h) = Cov[x_{t+h}, x_t] = σ²_w Σ_{j=0}^{q−h} θ_j θ_{j+h} for 0 ≤ h ≤ q;  0 for h > q

MA(1): x_t = w_t + θw_{t−1}
γ(h) = (1 + θ²)σ²_w for h = 0;  θσ²_w for h = 1;  0 for h > 1
ρ(h) = θ/(1 + θ²) for h = 1;  0 for h > 1
ARMA (p, q)
x_t = φ₁x_{t−1} + ⋯ + φ_p x_{t−p} + w_t + θ₁w_{t−1} + ⋯ + θ_q w_{t−q}  ⟺  φ(B)x_t = θ(B)w_t

Partial autocorrelation function (PACF):
- x_i^{h−1}: regression of x_i on {x_{h−1}, x_{h−2}, ..., x_1}
- φ_{hh} = corr(x_h − x_h^{h−1}, x_0 − x_0^{h−1}) for h ≥ 2
- E.g., φ₁₁ = corr(x₁, x₀) = ρ(1)

ARIMA(p, d, q):
∇^d x_t = (1 − B)^d x_t is ARMA(p, q)  ⟺  φ(B)(1 − B)^d x_t = θ(B)w_t

Exponentially Weighted Moving Average (EWMA):
x_t = x_{t−1} + w_t − λw_{t−1}
x_t = Σ_{j=1}^∞ (1 − λ)λ^{j−1} x_{t−j} + w_t  when |λ| < 1
21.5 Spectral Analysis

Periodic process: x_t = U₁ cos(2πω₀t) + U₂ sin(2πω₀t)
- Amplitude A, phase φ: U₁ = A cos φ and U₂ = −A sin φ, often normally distributed rvs
Periodic mixture
x_t = Σ_{k=1}^q (U_{k1} cos(2πω_k t) + U_{k2} sin(2πω_k t))
- U_{k1}, U_{k2}, for k = 1, ..., q, are independent zero-mean rvs with variances σ²_k
- γ(h) = Σ_{k=1}^q σ²_k cos(2πω_k h)
- γ(0) = E[x²_t] = Σ_{k=1}^q σ²_k
Spectral representation of a periodic process
γ(h) = σ² cos(2πω₀h) = (σ²/2) e^{−2πiω₀h} + (σ²/2) e^{2πiω₀h} = ∫_{−1/2}^{1/2} e^{2πiωh} dF(ω)

Spectral distribution function:
F(ω) = 0 for ω < −ω₀;  σ²/2 for −ω₀ ≤ ω < ω₀;  σ² for ω ≥ ω₀
F(−∞) = F(−1/2) = 0
F(∞) = F(1/2) = γ(0)
Spectral density
f(ω) = Σ_{h=−∞}^∞ γ(h) e^{−2πiωh},  −1/2 ≤ ω ≤ 1/2

- Needs Σ_{h=−∞}^∞ |γ(h)| < ∞ ⟹ γ(h) = ∫_{−1/2}^{1/2} e^{2πiωh} f(ω) dω for h = 0, ±1, ...
- f(ω) ≥ 0
- f(ω) = f(−ω)
- f(ω) = f(1 − ω)
- γ(0) = V[x_t] = ∫_{−1/2}^{1/2} f(ω) dω
- White noise: f_w(ω) = σ²_w
- ARMA(p, q), φ(B)x_t = θ(B)w_t:
  f_x(ω) = σ²_w |θ(e^{−2πiω})|²/|φ(e^{−2πiω})|²
  where φ(z) = 1 − Σ_{k=1}^p φ_k z^k and θ(z) = 1 + Σ_{k=1}^q θ_k z^k
Discrete Fourier Transform (DFT)
d(ω_j) = n^{−1/2} Σ_{t=1}^n x_t e^{−2πiω_j t}

Fourier/fundamental frequencies:
ω_j = j/n

Inverse DFT:
x_t = n^{−1/2} Σ_{j=0}^{n−1} d(ω_j) e^{2πiω_j t}

Periodogram:
I(j/n) = |d(j/n)|²

Scaled periodogram:
P(j/n) = (4/n) I(j/n) = ((2/n) Σ_{t=1}^n x_t cos(2πtj/n))² + ((2/n) Σ_{t=1}^n x_t sin(2πtj/n))²
22 Math
22.1 Gamma Function
- Ordinary: Γ(s) = ∫₀^∞ t^{s−1} e^{−t} dt
- Upper incomplete: Γ(s, x) = ∫_x^∞ t^{s−1} e^{−t} dt
- Lower incomplete: γ(s, x) = ∫₀^x t^{s−1} e^{−t} dt
- Γ(α + 1) = αΓ(α) for α > 0
- Γ(n) = (n − 1)! for n ∈ ℕ
- Γ(1/2) = √π
22.2 Beta Function
- Ordinary: B(x, y) = B(y, x) = ∫₀^1 t^{x−1}(1 − t)^{y−1} dt = Γ(x)Γ(y)/Γ(x + y)
- Incomplete: B(x; a, b) = ∫₀^x t^{a−1}(1 − t)^{b−1} dt
- Regularized incomplete:
  I_x(a, b) = B(x; a, b)/B(a, b) = Σ_{j=a}^{a+b−1} ((a + b − 1)!/(j!(a + b − 1 − j)!)) x^j (1 − x)^{a+b−1−j}  for a, b ∈ ℕ
8/12/2019 Statistics Cookbook
27/28
- I₀(a, b) = 0,  I₁(a, b) = 1
- I_x(a, b) = 1 − I_{1−x}(b, a)
22.3 Series
Finite
- Σ_{k=1}^n k = n(n + 1)/2
- Σ_{k=1}^n (2k − 1) = n²
- Σ_{k=1}^n k² = n(n + 1)(2n + 1)/6
- Σ_{k=1}^n k³ = (n(n + 1)/2)²
- Σ_{k=0}^n c^k = (c^{n+1} − 1)/(c − 1),  c ≠ 1
Binomial
- Σ_{k=0}^n C(n, k) = 2^n
- Σ_{k=0}^n C(r + k, k) = C(r + n + 1, n)
- Σ_{k=0}^n C(k, m) = C(n + 1, m + 1)
- Vandermonde's identity: Σ_{k=0}^r C(m, k) C(n, r − k) = C(m + n, r)
- Binomial theorem: Σ_{k=0}^n C(n, k) a^{n−k} b^k = (a + b)^n

Infinite:
- Σ_{k=0}^∞ p^k = 1/(1 − p) and Σ_{k=1}^∞ p^k = p/(1 − p)  for |p| < 1
Univariate distribution relationships, courtesy Leemis and McQueston [2].