Portfolio Statistics and Optimization
Distributions and Summary Statistics
chris bemis
January 28, 2021
chris bemis Portfolio Statistics and Optimization
Cumulative Distribution Functions
In the general case, we focus on real-valued random variables, X.
The cumulative distribution function of X is given by

F(x) = P(X ∈ (−∞, x]),

and we assume there are no point masses in F.
The cumulative distribution function captures all distributional information about X, for example:

P(X ∈ (a, b]) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a),
P(X > a) = 1 − P(X ≤ a) = 1 − F(a).
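These identities are easy to sanity-check numerically. A minimal sketch, assuming scipy is available and taking the standard normal as a concrete F:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)   # draws from a concrete X with cdf F = norm.cdf

a, b = -0.5, 1.25
F = norm.cdf

# P(X in (a, b]) = F(b) - F(a), checked against an empirical frequency
emp_interval = np.mean((x > a) & (x <= b))
# P(X > a) = 1 - F(a)
emp_tail = np.mean(x > a)

print(abs(emp_interval - (F(b) - F(a))) < 1e-2)   # True
print(abs(emp_tail - (1 - F(a))) < 1e-2)          # True
```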
Symmetric Distributions and Linear Transformations
Some useful definitions and observations
• We say X is symmetric about 0 if F(x) = 1 − F(−x). In this case, for a > 0,

P(|X| > a) = 2 · F(−a) and F(0) = 1/2.

• A linear transformation such as Y = aX + b results in

F_Y(y) = F_X((y − b)/a) for a > 0.
Probability Density Functions
We say X has a probability density function (pdf), f: R → R_{≥0}, if

F(x) = ∫_{−∞}^{x} f(s) ds.

Such an f must satisfy
• f ≥ 0
• ∫_R f(s) ds = 1

Unless noted explicitly, we will assume that random variables under consideration have a pdf.
Expectation and Variance
For X with a probability density function, the expected value of X is given by

E(X) = ∫_{−∞}^{∞} s f(s) ds.

The expected value of X is also commonly denoted by µ.

When the expectation is finite, the variance of X is defined by

Var(X) = E((X − µ)²),

or, in terms of a probability density function,

Var(X) = ∫_{−∞}^{∞} (s − µ)² f(s) ds,

and it may be shown that

Var(X) = E(X²) − µ².

It is often useful to work with the square root of the variance, commonly denoted σ, as it is in the same units as µ.
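The identity Var(X) = E(X²) − µ² can be checked directly on a sample; the exponential draw below is only for illustration, and any finite-variance distribution works:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=500_000)   # hypothetical sample

mu = x.mean()
var_direct = np.mean((x - mu) ** 2)            # E((X - mu)^2)
var_identity = np.mean(x ** 2) - mu ** 2       # E(X^2) - mu^2

print(abs(var_direct - var_identity) < 1e-6)   # True: the two agree up to rounding
```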
Linear Transformations and Expectation and Variance
Shifting and dilating X results in changes to the expectation and variance as

E(aX + b) = aE(X) + b
Var(aX + b) = a²Var(X).

The expectation operator is therefore linear. And variance, sometimes called the dispersion parameter of X, is independent of the location of the origin.
Skew and Kurtosis
In addition to the location and dispersion measures of expectation and variance, higher moments are sometimes considered.

The skew of X is defined as

γ = E((X − µ)³ / σ³),

and the kurtosis of X is given by

κ = E((X − µ)⁴ / σ⁴).

More negative (positive) skew indicates more mass to the left (right) of the mean.

Kurtosis is a very rough indication of how much mass is in the tails of the distribution of X, with a higher value indicating fatter tails.
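A sketch of both definitions on a right-skewed sample, assuming scipy is available (note that scipy's kurtosis reports excess kurtosis, κ − 3, unless fisher=False is passed):

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(2)
x = rng.lognormal(mean=0.0, sigma=0.5, size=1_000_000)   # right-skewed, fat-tailed sample

# Direct moment computations from the definitions above
mu, sigma = x.mean(), x.std()
gamma = np.mean(((x - mu) / sigma) ** 3)
kappa = np.mean(((x - mu) / sigma) ** 4)

print(abs(gamma - skew(x)) < 1e-8)                    # True: matches scipy
print(abs(kappa - kurtosis(x, fisher=False)) < 1e-8)  # True: fisher=False gives plain kurtosis
print(gamma > 0)   # True: mass to the right of the mean
print(kappa > 3)   # True: fatter tails than the normal
```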
Percentiles
The (100 · p)th percentile of X is given by

π_p = min{π | F(π) = p},

and if F is locally invertible near π_p, then π_p = F^{−1}(p).

In contrast to the mean, variance, skew, or kurtosis of X, percentiles are never infinite.

Several percentile-based risk metrics are used in portfolio construction, like Value-at-Risk and Conditional Value-at-Risk.
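As a sketch of the percentile-based metrics mentioned, the 5% Value-at-Risk of a hypothetical normal return sample is just its 5th percentile, quoted as a loss (the return parameters below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical daily returns, for illustration only
returns = rng.normal(loc=0.0005, scale=0.01, size=100_000)

pi_05 = np.quantile(returns, 0.05)   # pi_p with p = 0.05, the 5th percentile
var_95 = -pi_05                      # 95% Value-at-Risk as a positive loss number

print(var_95 > 0)                    # True for this sample
```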
Multivariate Distributions
We also consider random variables in R^N as

X = (X₁, X₂, …, X_N)′.

And, as before, define now a joint cumulative distribution function of X, F: R^N → [0, 1], as

F(x₁, x₂, …, x_N) = P(X₁ ≤ x₁, X₂ ≤ x₂, …, X_N ≤ x_N).
Multivariate Density
Again, we focus primarily on random variables which admit a probability density function, now

f: R^N → R_{≥0},

with

F(x₁, x₂, …, x_N) = ∫_{−∞}^{x₁} ⋯ ∫_{−∞}^{x_N} f(s₁, …, s_N) ds_N ⋯ ds₁,

as expected.

The same requirements for this f as before apply.
Multivariate Expectation
The expected value of the multivariate random variable X is given component-wise:

E(X) = (E(X₁), E(X₂), …, E(X_N))′,

and may be written simply as a function of the density when available. We often write µ = E(X) as before.

This operator is linear.
Linearity of Multivariate Expectation
In the case of X admitting a joint density, we have in the simple case of X ∈ R²,

E(w₁X₁ + w₂X₂) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (w₁x₁ + w₂x₂) f(x₁, x₂) dx₁ dx₂

(by the linearity of the integral operator)

= w₁ ∫_{−∞}^{∞} ∫_{−∞}^{∞} x₁ f(x₁, x₂) dx₁ dx₂ + w₂ ∫_{−∞}^{∞} ∫_{−∞}^{∞} x₂ f(x₁, x₂) dx₁ dx₂

= w₁E(X₁) + w₂E(X₂),

as desired.
Covariance
As is usual for multivariate calculus, the generalization of the second order measure, variance, results in a matrix, here called the covariance of X,

Cov(X) = E((X − µ)(X − µ)′).

The components of Cov(X) are often denoted by σ_ij for the (i, j) entry of the matrix. It is common to denote Cov(X) by Σ.

Convince yourself that σ_ii = Var(X_i).
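The σ_ii = Var(X_i) claim can be checked on a sample covariance matrix; the mean vector and covariance below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(4)
# Illustrative mean vector and covariance matrix
X = rng.multivariate_normal(mean=[0.0, 1.0, -1.0],
                            cov=[[2.0, 0.3, 0.0],
                                 [0.3, 1.0, 0.2],
                                 [0.0, 0.2, 0.5]],
                            size=200_000)

Sigma_hat = np.cov(X, rowvar=False)       # observations in rows
per_column_var = X.var(axis=0, ddof=1)    # Var(X_i), component by component

print(np.allclose(np.diag(Sigma_hat), per_column_var))   # True: sigma_ii = Var(X_i)
```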
Covariance Between Univariate Random Variables
For X and Y univariate random variables taking values in R, we also write

Cov(X, Y) = E((X − µ_X)(Y − µ_Y)).

This formulation frames covariance as an operator on random variables. We will discuss this operator's properties in subsequent chapters.

Convince yourself that Cov(X_i, X_j) = σ_ij from the above.
Positive Semi-Definiteness of the Covariance Matrix
A key property of the covariance matrix, Σ, is that it is positive semi-definite:

w′Σw ≥ 0,

for any vector w.

To see this, define Y = w′X. Then

Var(Y) = Var(w′X)
 = E((w′X − w′µ)²)
 = E((w′(X − µ))²)
 = E(w′(X − µ)(X − µ)′w)
 = w′E((X − µ)(X − µ)′)w
 = w′Σw.

Since the variance of Y is nonnegative, the result follows.
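A numerical check of positive semi-definiteness, probing a sample covariance matrix with random directions w (the data-generating step is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((10_000, 5)) @ rng.standard_normal((5, 5))  # correlated columns
Sigma = np.cov(X, rowvar=False)

W = rng.standard_normal((1_000, 5))                  # random test directions w
quad_forms = np.einsum('ij,jk,ik->i', W, Sigma, W)   # w' Sigma w, row by row

print(np.all(quad_forms >= -1e-10))                  # True up to rounding
print(np.all(np.linalg.eigvalsh(Sigma) >= -1e-10))   # True: equivalent spectral check
```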
Linear Combinations of X
The more general case of Y = BX for multivariate X and a constant matrix B gives similar results:

E(BX) = B E(X)

and

Cov(BX) = B Cov(X) B′.

Both results follow directly from the work already shown above.
Transformations of Multivariate Random Variables
For

g: R^N → R^N

a one-to-one and continuously differentiable function, and

Y = g(X),

f_Y(y) = f_X(g^{−1}(y)) · |det(∇g^{−1}(y))|,

where ∇g^{−1} denotes the Jacobian and det(·) is the determinant function.

We will apply this transformation directly as an exercise when we develop the log-normal distribution below.
Marginals
The marginal distribution of X_j is denoted

F_j(x_j) = P(X_j ≤ x_j)

and is given by F(∞, …, x_j, …, ∞).

When a density exists for X, the marginal density for X_j is given by

f_j(x) = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f(s₁, …, x, …, s_N) ds_N ⋯ ds_{j+1} ds_{j−1} ⋯ ds₁.
Observations from an Example Marginal
• The jointly normal distribution assumption seems to be a poor fit overall, but looks ellipsoidal otherwise.
• There seems to be a fairly strong linear relationship between the monthly returns of IBM and the S&P 500.
• Extreme events seem more tightly clustered on the downside.
Independence
Two random variables, X₁ and X₂, are said to be independent if

F(x₁, x₂) = F₁(x₁) · F₂(x₂),

or, equivalently,

P(X₁ ≤ x₁, X₂ ≤ x₂) = P(X₁ ≤ x₁) P(X₂ ≤ x₂).

If densities are available, independence may be defined by the marginals:

f(x₁, x₂) = f₁(x₁) · f₂(x₂).

In every case, independence means that probabilities of events in one variable do not impact the other.
Jointly Independent, Generally
For N jointly independent random variables, we have

F(x₁, …, x_N) = ∏_{i=1}^{N} F_i(x_i),

P(X₁ ≤ x₁, …, X_N ≤ x_N) = ∏_{i=1}^{N} P(X_i ≤ x_i),

and in the case of having densities,

f(x₁, …, x_N) = ∏_{i=1}^{N} f_i(x_i).
Testing Independence in a Sample
We may look at the joint probability

P(r_IBM < r_τ, r_S&P < r_τ)

and compare this to

P(r_IBM < r_τ) P(r_S&P < r_τ)

for various values of r_τ.

For r_τ = 0,

P(r_IBM < r_τ, r_S&P < r_τ) = 0.333
P(r_IBM < r_τ) P(r_S&P < r_τ) = 0.225.

Assuming independence would then underweight the observed frequency of joint downward movements significantly.
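The IBM / S&P sample itself is not reproduced here, but the same comparison can be sketched on synthetic correlated returns (the 0.6 correlation is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
cov = [[1.0, 0.6], [0.6, 1.0]]                       # assumed correlation of 0.6
r = rng.multivariate_normal([0.0, 0.0], cov, size=500_000)
r_stock, r_index = r[:, 0], r[:, 1]

r_tau = 0.0
joint = np.mean((r_stock < r_tau) & (r_index < r_tau))
product = np.mean(r_stock < r_tau) * np.mean(r_index < r_tau)

print(joint > product)   # True: independence understates joint down moves
```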
Independence and Variance
The variance of the sum of independent univariate random variables is the sum of their variances.

Generally, though,

Var(X + Y) = E((X + Y − µ_X − µ_Y)²)
 = E((X − µ_X)²) + E((Y − µ_Y)²) + 2E((X − µ_X)(Y − µ_Y))
 = σ_X² + σ_Y² + 2σ_XY.

As a result, independence requires the cross term, σ_XY, to be zero.

The converse is not always true.
Multivariate Normal
As in the univariate case, the density for the multivariate normal distribution is a two parameter function,

φ_{µ,Σ}(x) = (1 / ((2π)^{N/2} det(Σ)^{1/2})) exp(−(1/2)(x − µ)′Σ^{−1}(x − µ)),

for x ∈ R^N.

If X has the density above, we say X is a multivariate normal, and denote it X ∼ N(µ, Σ).

We denote the cumulative distribution function of a multivariate normal by Φ_{µ,Σ}.

Finally, in this case, we have

E(X) = µ
Cov(X) = Σ.
Multivariate Normal Covariance
Implicit in the definition of the multivariate normal is the inverse of the covariance.

One may show that requiring Σ to have an inverse implies that it is positive definite; that is, the only vector satisfying w′Σw = 0 is w ≡ 0, and every other case is positive.
Linear Combinations of Jointly Normal Random Variables
For X ∼ N(0, I), with I ∈ R^{N×N} the identity matrix, and Y = a + BX, we may show that Y ∼ N(a, BB′).

Since

F_Y(y) = P(Y ≤ y)
 = P(a + BX ≤ y)
 = P(X ≤ B^{−1}(y − a))
 = F_X(B^{−1}(y − a)),

we have F_Y(y) = Φ_{0,I}(B^{−1}(y − a)).
Linear Combinations of Jointly Normal Random Variables
Resorting to the density, and using the 'proportional to' notation, ∝, to avoid the normalizing scalar, we have

F_Y(y) ∝ ∫_{s ≤ B^{−1}(y−a)} exp(−(1/2) s′s) ds

(changing variables x = Bs + a, (1/det(B)) dx = ds)

 = (1/det(B)) ∫_{x ≤ y} exp(−(1/2)(B^{−1}(x − a))′(B^{−1}(x − a))) dx
 = (1/det(B)) ∫_{x ≤ y} exp(−(1/2)(x − a)′(B^{−1})′B^{−1}(x − a)) dx
 = (1/det(B)) ∫_{x ≤ y} exp(−(1/2)(x − a)′(BB′)^{−1}(x − a)) dx.
Linear Combinations of Jointly Normal Random Variables
As a result, the density of Y is

φ_{a,BB′},

and so Y ∼ N(a, BB′).

In the proof above, we required B to be invertible. This condition may be relaxed to arbitrary B, but not with the techniques presented here.

Notice, too, that this construction gives a method for simulation. Given X ∼ N(0, I), if a mean, µ, and covariance, Σ, are desired, then

Y = µ + ΛX

with Σ = ΛΛ′ is distributed as Y ∼ N(µ, Σ).
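This simulation recipe is direct to implement; a sketch taking Λ as the Cholesky factor of an illustrative Σ:

```python
import numpy as np

rng = np.random.default_rng(7)
mu = np.array([0.01, 0.02])                  # illustrative target mean
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])             # illustrative target covariance

Lam = np.linalg.cholesky(Sigma)              # Sigma = Lam @ Lam.T
X = rng.standard_normal((500_000, 2))        # X ~ N(0, I), one draw per row
Y = mu + X @ Lam.T                           # each row is mu + Lam x

print(np.allclose(Y.mean(axis=0), mu, atol=5e-3))              # True
print(np.allclose(np.cov(Y, rowvar=False), Sigma, atol=5e-3))  # True
```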
Multivariate Log-Normal
We say Y is a multivariate log-normal random variable if

Y = exp(X),

where

exp(X) = (exp(X₁), …, exp(X_N))′,

for some X ∼ N(µ, Σ). Equivalently, Y is log-normal if its logarithm is normal.

The distribution is again a two parameter family, and we denote it by Y ∼ LN(µ, Σ).
Multivariate Log-Normal Density
The multivariate log-normal density is given by

ln_{µ,Σ}(x) = (1 / ((2π)^{N/2} det(Σ)^{1/2})) exp(−(1/2)(log(x) − µ)′Σ^{−1}(log(x) − µ)) · ∏_{i=1}^{N} (1/x_i),

and may be derived using the change of variable theorem.

Since Y = exp(X), we use g(·) = exp(·).

We are left to gather the inverse and Jacobian of g(·).
Multivariate Log-Normal Density
The inverse of g(·) is immediate:

g^{−1}(y) = (log(y₁), …, log(y_N))′ = (g₁^{−1}(y₁), …, g_N^{−1}(y_N))′.

The Jacobian, ∇g^{−1}, is the matrix of partials for each g_i, ∂g_i^{−1}/∂y_j. These are exactly

∂g_i^{−1}/∂y_j = 1/y_j if i = j, and 0 otherwise.

As a result, the Jacobian is diagonal, and its determinant is simply ∏_{i=1}^{N} (1/y_i).
Multivariate Log-Normal Density
Putting this all together, the density for Y is

ln_{µ,Σ}(y) = φ_{µ,Σ}(g^{−1}(y)) |det(∇g^{−1}(y))|
 = φ_{µ,Σ}(log(y)) ∏_{i=1}^{N} (1/y_i).

Finally, the mean and covariance of Y ∼ LN(µ, Σ) are given as

E(Y)_i = exp(µ_i + (1/2)σ_i²)
Cov(Y)_ij = exp(µ_i + µ_j + (1/2)(σ_i² + σ_j²)) (exp(σ_ij) − 1).
Multivariate Student t
We say T is distributed as a multivariate Student t distribution with ν ∈ N₊ degrees of freedom if T has density

st_{µ,Σ;ν}(x) = (Γ((ν + N)/2) / (Γ(ν/2)(νπ)^{N/2} det(Σ)^{1/2})) (1 + (1/ν)(x − µ)′Σ^{−1}(x − µ))^{−(ν+N)/2},

where Γ(·) is the gamma function.

From the definition, it is clear that the distribution is ellipsoidal. We also have

E(T) = µ

when ν > 1 (otherwise the mean does not exist) and

Cov(T) = (ν / (ν − 2)) Σ

for ν > 2.
Linear Transformations of Multivariate Student t
We may show again that if T ∼ St(µ, Σ; ν), and

Y = a + BT,

then

Y ∼ St(a + Bµ, BΣB′; ν).

As a result, portfolios of Student t distributed returns are again Student t, and their statistical properties are therefore well known.

We strongly suggest relying on the Student t distribution (with five degrees of freedom) rather than the multivariate normal when simulating financial data, especially due to the former's higher kurtosis.
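One standard construction for such draws, not shown on the slide, represents T as a normal scaled by an independent chi-square: T = µ + Z/√(W/ν) with Z ∼ N(0, Σ) and W ∼ χ²_ν. A sketch with an illustrative scale matrix:

```python
import numpy as np

rng = np.random.default_rng(9)
nu = 5
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])               # illustrative scale matrix

n = 1_000_000
Z = rng.multivariate_normal(np.zeros(2), Sigma, size=n)   # Z ~ N(0, Sigma)
W = rng.chisquare(nu, size=n)                             # W ~ chi^2_nu, independent
T = mu + Z / np.sqrt(W / nu)[:, None]

# Cov(T) = nu / (nu - 2) * Sigma for nu > 2
print(np.allclose(np.cov(T, rowvar=False), nu / (nu - 2) * Sigma, atol=2e-2))  # True
```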
Estimators and Convergence
We next consider the question of estimating the mean and variance of a random variable. We are immediately familiar with the estimator

µ̂ = (1/N) ∑_{i=1}^{N} x_i.

Underlying our intuition here is that each x_i is drawn from a distribution X_i, and {X_i}_{i=1}^{N} are iid.

The above is really a sample of a random variable,

M_N = (1/N) ∑_{i=1}^{N} X_i.

By the linearity of the expectation operator, if the X_i are iid with E(X_i) = µ, then E(M_N) = µ. But how is M_N distributed?
The Weak Law of Large Numbers
A sequence of real valued random variables, {X_i}, converges in probability to X if for any ε > 0,

lim_{N→∞} P(|X_N − X| > ε) = 0.

With this, the Weak Law of Large Numbers is given by:

For {X_i} iid random variables taking values in R, with E(X_i) = µ and Var(X_i) = σ² for each i,

M_N = (1/N) ∑_{i=1}^{N} X_i

converges in probability to µ.
Markov and Chebyshev Inequalities
The proof of the Weak Law of Large Numbers relies on two inequalities, neither of which is usually binding enough to be practical.

Markov's Inequality: Let X be a nonnegative random variable. Then

P(X ≥ a) ≤ (1/a) E(X).

Chebyshev's Inequality: Let X be a random variable taking values in R with E(X) = µ and Var(X) = σ². Then

P(|X − µ| ≥ a) ≤ σ²/a².

Chebyshev's Inequality needs Markov's Inequality, and the Weak Law proof requires Chebyshev's Inequality.
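Chebyshev's bound is easy to see as valid but loose on a concrete sample (exponential with µ = σ² = 1):

```python
import numpy as np

rng = np.random.default_rng(10)
x = rng.exponential(scale=1.0, size=1_000_000)   # mu = 1, sigma^2 = 1

mu, var = 1.0, 1.0
holds = all(np.mean(np.abs(x - mu) >= a) <= var / a ** 2
            for a in (1.5, 2.0, 3.0))
print(holds)   # True: the bound holds at every threshold, and is far from tight
```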
Proof of The Weak Law of Large Numbers
For M_N as before,

Var(M_N) = Var((1/N) ∑_{i=1}^{N} X_i)
 = (1/N²) Var(∑_{i=1}^{N} X_i)
 = (1/N²) ∑_{i=1}^{N} Var(X_i)
 = (1/N) σ².

Notice that the mean remained fixed while the variance, a measure of dispersion of M_N, scaled by 1/N.
Proof of The Weak Law of Large Numbers
Using Chebyshev's Inequality, we have for any ε > 0,

P(|M_N − µ| ≥ ε) ≤ Var(M_N)/ε² = σ²/(N ε²).

For a fixed ε, then, the right hand side goes to zero as N → ∞, completing the proof.

The Weak Law gives a convergence result for µ̂ = (1/N) ∑_{i=1}^{N} X_i.

We might still want to know the actual distribution of M_N.
Convergence in Distribution
A sequence of real valued random variables {X_i} with distribution functions {F_i(·)}, respectively, converges in distribution to X if

lim_{N→∞} F_N(x) = F(x)

at every point x where F is continuous, where F(·) is the distribution function of X.

Convergence in probability implies convergence in distribution in the sense that if real valued random variables {X_i} with distribution functions {F_i(·)} converge in probability to X,

X_N → X,

then

F_N → F.
Central Limit Theorem
For {X_i} iid real random variables with finite mean and variance, µ and σ², respectively, the cumulative distribution function, F_N, of the random variable

Z_N = √N (M_N − µ)/σ

satisfies

lim_{N→∞} F_N(x) = Φ(x).

That is, Z_N converges in distribution to the standard normal.

While the proof is outside the scope of our work, its usefulness and ubiquity make it unavoidable.
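A sketch of the statement, assuming scipy is available: sample means of a skewed (exponential) distribution, standardized as Z_N, already line up with Φ at a few test points for moderate N:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(11)
N, trials = 400, 10_000
samples = rng.exponential(scale=1.0, size=(trials, N))   # mu = sigma = 1
Z = np.sqrt(N) * (samples.mean(axis=1) - 1.0) / 1.0      # Z_N, one value per trial

# Compare the empirical cdf of Z_N with Phi at a few points
devs = [abs(np.mean(Z <= t) - norm.cdf(t)) for t in (-1.0, 0.0, 1.0)]
print(max(devs) < 0.02)   # True: already close at N = 400
```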
Bias
For a quantity to be estimated, θ, and estimator, θ̂, the bias of θ̂ is

Bias(θ̂) = E(θ̂) − θ.

An unbiased estimator has zero bias and satisfies

E(θ̂) = θ.

We have already seen that

M_N = (1/N) ∑_{i=1}^{N} X_i

is an unbiased estimator of the mean.
Unbiased Estimator of Variance
With the same assumptions as above,

s_N = (1/(N − 1)) ∑_{i=1}^{N} (X_i − µ̂)²,

with µ̂ = (1/N) ∑_{i=1}^{N} X_i, is an unbiased estimator of the variance of X. That is,

E(s_N) = σ².

The term 1/(N − 1) is somewhat surprising on first pass. It will be slightly more intuitive when we encounter the estimate of variance again in ordinary least squares, where degrees of freedom will be given more color.
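numpy exposes both divisors through the ddof argument, which makes the bias visible on repeated small samples:

```python
import numpy as np

rng = np.random.default_rng(12)
y = rng.normal(scale=2.0, size=(20_000, 5))   # many small samples; true variance is 4

biased = y.var(axis=1)                        # divides by N
unbiased = y.var(axis=1, ddof=1)              # divides by N - 1: the s_N above

print(abs(unbiased.mean() - 4.0) < 0.1)       # True: unbiased on average
print(biased.mean() < unbiased.mean())        # True: ddof=0 is systematically low
```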
Proof of Unbiased Estimator of Variance
Consider first the definition,

E(∑_{i=1}^{N} (X_i − µ̂)²) = ∑_{i=1}^{N} E((X_i − µ̂)²),

and note that each summand may be written as

E((X_i − µ̂)²) = E(X_i²) − 2E(X_i µ̂) + E(µ̂²),

so that

E(∑_{i=1}^{N} (X_i − µ̂)²) = N E(X²) − 2 ∑_{i=1}^{N} E(X_i µ̂) + N E(µ̂²)
 = N E(X²) − 2E(N µ̂ µ̂) + N E(µ̂²)
 = N E(X²) − 2N E(µ̂²) + N E(µ̂²)
 = N (E(X²) − E(µ̂²)).
Proof of Unbiased Estimator of Variance
To determine E(µ̂²), recall that for any X,

Var(X) = E(X²) − E(X)²,

so that

Var(µ̂) = E(µ̂²) − E(µ̂)²,

which, by previous work, says

(1/N) σ² = E(µ̂²) − µ².

Or,

E(µ̂²) = (1/N) σ² + µ².
Proof of Unbiased Estimator of Variance
Putting the above together, we have

E(∑_{i=1}^{N} (X_i − µ̂)²) = N (E(X²) − E(µ̂²))
 = N (σ² + µ² − ((1/N) σ² + µ²))
 = N ((N − 1)/N) σ²
 = (N − 1) σ².

Dividing both sides by N − 1 proves the result.
Some Notes on Estimators
Whenever a fit of a distribution has been shown, we have estimated the mean and variance (and covariance in one case) of the underlying distributions using the above estimators.

Implicit in this estimation is that the random variables under consideration are iid.

Specifically, we repeatedly assumed, then, that daily log returns of the S&P are independent. But this may not obtain; see, for example, Amir Khandani and Andrew Lo.

We don't present a solution to the apparent lack of independence of daily returns here, but find it important to note the tension between estimation in practice, theory, and empirical findings.
Central Limit Theorem and Diversification
Consider a model where each stock's returns, r_i, are iid with mean and standard deviation µ_S and σ_S, respectively.

The Central Limit Theorem implies that the mean of the return of an even-weighted portfolio,

r_Π = (1/N) ∑_{i=1}^{N} r_i,

will be maintained while its variance diminishes (as 1/N), and that the distribution of r_Π will be approximately normal.

Of course, the assumptions do not hold in practice.

As a prologue, though, we have some motivation to consider diversification, as such, when constructing a portfolio.
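The 1/N variance decay can be sketched directly for an even-weighted portfolio of iid (here, hypothetical normal) returns:

```python
import numpy as np

rng = np.random.default_rng(13)
mu_s, sigma_s = 0.01, 0.05                    # assumed per-stock mean and volatility

ratios = []
for N in (1, 10, 100):
    r = rng.normal(mu_s, sigma_s, size=(100_000, N))
    r_pi = r.mean(axis=1)                     # even-weighted portfolio return
    ratios.append(r_pi.var() * N / sigma_s ** 2)

print(all(abs(rho - 1.0) < 0.05 for rho in ratios))   # True: Var(r_pi) ~ sigma_s^2 / N
```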
CAPM Lite
From the figures considered, we might assume that a given stock's returns look something like

r_i = β_i m + ε_i.

In other words, IBM's returns (r_i) are linearly related to the returns of the S&P 500 (m), plus some error.

This is really just an approximation used for insight and not a model derived from empirical validation, but the implications are interesting.

For example, we may want to make some assumptions about what a 'good' ε_i should look like.
CAPM Lite
For example, we may want to assume

E(ε_i) = 0, Var(ε_i) = σ_i²,
Cov(m, ε_i) = 0, Var(m) = σ_m².

In this case, we get that

E(r_Π) = ((∑_i β_i)/N) E(m)
Var(r_Π) = ((∑_i β_i)/N)² σ_m² + (1/N²) ∑_{i=1}^{N} σ_i²,

so that if ∑_i β_i ≈ N, we reconstruct (up to variance) the market, with reduced idiosyncratic variance as N increases.
CAPM Lite
Of course, the attractiveness of reducing idiosyncratic noise raises the question: Why not just own the market, then?

We will encounter this question again when we have more tools under our belt.