INSTITUTE OF WATER ACOUSTICS, SONAR ENGINEERING AND SIGNAL THEORY
Chapter 1 / Stochastic Signals and Systems / Prof. Dr.-Ing. Dieter Kraus

Stochastic Signals and Systems

Contents
1 Probability Theory
2 Stochastic Processes
3 Parameter Estimation
4 Signal Detection
5 Spectrum Analysis
6 Optimal Filtering
1 Probability Theory 5
  1.1 Terminology 5
  1.2 Definition of Probability 7
    1.2.1 Relative Frequency and Probability 7
    1.2.2 Axiomatic Approach to Probability 8
    1.2.3 Classical Definition of Probability 10
  1.3 Conditional Probability 11
  1.4 Random Variables 14
  1.5 Distribution Functions 17
    1.5.1 Purely discrete case 21
    1.5.2 Purely continuous case 25
  1.6 Some Special Distributions 30
    1.6.1 Discrete Distributions 30
    1.6.2 Continuous Distributions 38
  1.7 Bivariate Distribution 55
    1.7.1 Bivariate Distribution Function 56
    1.7.2 Bivariate Density Function 59
    1.7.3 Marginal Distribution and Density Function 60
    1.7.4 Conditional Distribution and Density Function 62
    1.7.5 Independent Random Variables 68
    1.7.6 Bivariate Normal Distribution 69
  1.8 Transformations of Random Variables 75
    1.8.1 Function of One Random Variable 75
    1.8.2 One Function of Two Random Variables 84
    1.8.3 Two Functions of Two Random Variables 86
  1.9 Expectation Operator 91
    1.9.1 Expectation for Univariate Distributions 91
    1.9.2 Expectation for Bivariate Distributions 107
    1.9.3 Mean Square Estimation 132
  1.10 Vector-valued Random Variables 138
    1.10.1 Multivariate Distributions 138
    1.10.2 Transformation of Vector-valued Random Variables 144
    1.10.3 Expectations for Vector-valued Random Variables 149
    1.10.4 Mean Square Estimation 159
    1.10.5 Multivariate Normal Distribution 166
  1.11 Sequences of Random Variables 179
    1.11.1 Convergence Concepts 179
    1.11.2 Laws of Large Numbers 186
    1.11.3 Central Limit Theorems 189
References to Chapter 1 192
1 Probability Theory

1.1 Terminology

E: random experiment (activity whose outcome is randomly influenced)
Ξ: sample space (set of all possible outcomes of E, which may consist of a finite, countably infinite or uncountable number of elements)
ξ: elementary event (possible outcome of E, i.e. ξ ∈ Ξ)
∅: impossible event (empty set ∅ = {})
E: event (collection of some of the possible outcomes of E, i.e. E ⊂ Ξ)

S: σ-field, i.e. a system of subsets of Ξ satisfying
1) Ξ ∈ S
2) if E ∈ S then Ē = Ξ \ E ∈ S
3) if E_i ∈ S for i = 1, 2, … then $\bigcup_{i=1}^{\infty} E_i \in S$

Corollary:
1) ∅ ∈ S
2) if E₁, E₂ ∈ S then E₁ ∩ E₂ ∈ S and E₁ \ E₂ ∈ S
3) if E_i ∈ S for i = 1, 2, … then $\bigcap_{i=1}^{\infty} E_i \in S$

(Ξ, S): measurable space
1.2 Definition of Probability

1.2.1 Relative Frequency and Probability

If a random experiment is performed n times and the event of interest E is observed with frequency h_n(E), then the relative frequency of the occurrence of E is defined by

$$ H_n(E) = \frac{h_n(E)}{n} \quad\text{with}\quad 0 \le H_n(E) \le 1. $$

Empirical law of large numbers
For sufficiently large n we can write with a high degree of certainty that

$$ P(E) \approx H_n(E). $$
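As an illustration of the empirical law of large numbers, the following minimal sketch (added here, not part of the original slides) simulates n Bernoulli trials and prints the relative frequency H_n(E) of the event E = {success}; all names are hypothetical:

```python
import random

def relative_frequency(n, p=0.5, seed=1):
    """Simulate n Bernoulli trials with success probability p and
    return the relative frequency H_n(E) = h_n(E)/n of the event E."""
    rng = random.Random(seed)
    h_n = sum(rng.random() < p for _ in range(n))  # frequency h_n(E)
    return h_n / n

for n in (10, 100, 10_000, 1_000_000):
    print(f"n = {n:>9}: H_n(E) = {relative_frequency(n):.4f}")
```

For growing n the printed values settle near P(E) = 0.5, which is exactly the behaviour the empirical law asserts.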
1.2.2 Axiomatic Approach to Probability

Consider an experiment E with measurable space (Ξ, S). A probability measure P is then defined as a mapping

P : S → ℝ

which satisfies the following axioms:
1) if E ∈ S then P(E) ≥ 0
2) P(Ξ) = 1
3) if E_i ∈ S for i = 1, 2, … and E_i ∩ E_j = ∅ for i ≠ j, then

$$ P\Bigl(\bigcup_{i=1}^{\infty} E_i\Bigr) = \sum_{i=1}^{\infty} P(E_i). $$
The triple (Ξ, S, P) is called probability space.

Implications:
1) P(∅) = 0
2) if E₁, E₂ ∈ S with E₁ ⊂ E₂ then P(E₁) ≤ P(E₂)
3) if E₁, E₂ ∈ S then P(E₁ ∪ E₂) = P(E₁) + P(E₂) − P(E₁ ∩ E₂)
1.2.3 Classical Definition of Probability

Suppose that an experiment has a finite number n of possible outcomes ξ₁, ξ₂, …, ξ_n and we are interested in an event E = {ξ_{i₁}, ξ_{i₂}, …, ξ_{i_m}} with {i₁, i₂, …, i_m} ⊂ {1, 2, …, n}. If we assume that all outcomes ξ₁, ξ₂, …, ξ_n are equally likely, then

$$ P(E) = \frac{m}{n} = \frac{\text{number of outcomes favorable to } E}{\text{total number of outcomes}}. $$

This is a basic result which assigns probabilities to events purely on the basis of combinatorial arguments. However, its application is strictly limited to experiments with a finite number of equally likely outcomes.
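As a small combinatorial illustration (an added example, not from the original slides): rolling two fair dice gives n = 36 equally likely outcomes, and counting the m outcomes favorable to E = {sum equals 7} yields P(E) = 6/36:

```python
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))      # 36 equally likely outcomes
favorable = [xi for xi in outcomes if sum(xi) == 7]  # outcomes favorable to E
print(len(favorable), "/", len(outcomes), "=", len(favorable) / len(outcomes))
# 6 / 36 = 0.1666...
```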
1.3 Conditional Probability

Let (Ξ, S, P) be a probability space with E₁, E₂ ∈ S and P(E₂) > 0. The conditional probability of E₁ given that E₂ occurred is defined by

$$ P(E_1 \mid E_2) = \frac{P(E_1 \cap E_2)}{P(E_2)}. $$

One can easily show that the conditional probability satisfies the axioms of a probability measure.
Implications:
1) Bayes' Formula

$$ P(E_2 \mid E_1) = \frac{P(E_1 \mid E_2)\,P(E_2)}{P(E_1)} $$

Furthermore, assuming

$$ \bigcup_i E_i = \Xi \quad\text{and}\quad E_i \cap E_j = \varnothing,\; i \ne j, $$

we can derive the Total Probability

$$ P(E) = \sum_i P(E \mid E_i)\,P(E_i) $$
and the generalised Bayes' Formula

$$ P(E_k \mid E) = \frac{P(E_k)\,P(E \mid E_k)}{\sum_i P(E_i)\,P(E \mid E_i)}, \qquad P(E) > 0. $$

2) Two events E₁ and E₂ are called independent if

P(E₁ | E₂) = P(E₁)

holds. Consequently, we can stipulate

P(E₁ ∩ E₂) = P(E₁) P(E₂)
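A quick numeric check of the total probability and the generalised Bayes' formula (a hypothetical two-hypothesis example added here; the numbers are assumptions, not from the slides):

```python
# E1, E2 partition the sample space; A is an observed event.
P_E = {"E1": 0.3, "E2": 0.7}            # prior probabilities P(E_i)
P_A_given_E = {"E1": 0.9, "E2": 0.2}    # likelihoods P(A | E_i)

# Total probability: P(A) = sum_i P(A | E_i) P(E_i)
P_A = sum(P_A_given_E[e] * P_E[e] for e in P_E)

# Generalised Bayes' formula: P(E_k | A) = P(E_k) P(A | E_k) / P(A)
posterior = {e: P_E[e] * P_A_given_E[e] / P_A for e in P_E}
print(P_A, posterior)   # 0.41 {'E1': 0.658..., 'E2': 0.341...}
```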
1.4 Random Variables

A mapping

X : Ξ → ℝ,

such that to each ξ ∈ Ξ there corresponds a unique real number X(ξ) ∈ ℝ, is called random variable or measurable function with respect to S, if for each set B ⊂ ℝ the inverse image

X⁻¹(B) = {ξ : X(ξ) ∈ B}

is an element of S.
To assign probabilities to random variables one has to translate statements about the values of random variables as follows:

P_X(B) = P(X⁻¹(B)) = P({ξ : X(ξ) ∈ B}).

Furthermore, a σ-field has to be defined over ℝ. One can show that such a σ-field should include all intervals of the kind (−∞, x]. The power set P(ℝ) includes the desired intervals, but its cardinality is too high to be able to implement the measurability properties.
However, one can show that a particular σ-field exists, called the Borel field B, that is the smallest possible σ-field including all the intervals (−∞, x] and that guarantees the measurability of all sets that are elements of B.

Thus, we can define by

(ℝ, B)       the measurable space
(ℝ, B, P_X)  the probability space

of a random variable X.
1.5 Distribution Functions

Given a random variable X, the distribution function of X, F_X(x), is defined by

F_X(x) = P_X((−∞, x]) = P({ξ : X(ξ) ≤ x}) = P(X ≤ x).

One can show that F_X(x) uniquely determines all the probabilistic properties of the random variable X. In particular, for any a, b ∈ ℝ with a ≤ b we have

P(X ≤ b) = P(X ≤ a) + P(a < X ≤ b),

cf. Axiom 3. Hence

P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F_X(b) − F_X(a).
Distribution functions have the properties:

(1) 0 ≤ F_X(x) ≤ 1 for all x ∈ ℝ
    (since F_X(x) is a probability)

(2) lim_{x→−∞} F_X(x) = 0, lim_{x→∞} F_X(x) = 1
    (since lim_{x→−∞} X⁻¹((−∞, x]) = ∅ and lim_{x→∞} X⁻¹((−∞, x]) = Ξ)

(3) F_X(x) is a non-decreasing function, i.e. for any h ≥ 0 and all x, F_X(x+h) ≥ F_X(x)
    (since F_X(x+h) − F_X(x) = P(x < X ≤ x+h) ≥ 0)

(4) F_X(x) is right-continuous, i.e. for all x, lim_{h→0⁺} F_X(x+h) = F_X(x)
    (the limit h → 0 is taken through positive values only)
Any distribution function F_X(x) can be expressed by

F_X(x) = a₁ F_{X,1}(x) + a₂ F_{X,2}(x) + a₃ F_{X,3}(x),

where a_i ≥ 0 for i = 1, 2, 3, a₁ + a₂ + a₃ = 1 and

F_{X,1}(x) is continuous everywhere and differentiable for almost all x, i.e. absolutely continuous,
F_{X,2}(x) is a step function with a finite or countably infinite number of jumps,
F_{X,3}(x) is a singular function, that is continuous with zero derivative almost everywhere.
F_{X,1}(x) and F_{X,2}(x) correspond to the two basic types of probability distributions one usually encounters in practice, i.e. the

continuous and discrete distribution,

respectively. Since F_{X,3}(x) is highly pathological, it can be safely assumed that it does not arise in real applications. In practice we therefore ignore F_{X,3}(x) and assume that all distribution functions can be simply represented by

F_X(x) = λ F_{X,1}(x) + (1 − λ) F_{X,2}(x)

with 0 ≤ λ ≤ 1.
1.5.1 Purely discrete case (λ = 0)

The distribution function F_X(x) = F_{X,2}(x) is a simple step function with jumps p_i at the points x_i for i = 1, 2, 3, … F_X(x) would typically have the form of a staircase:

[Figure: step-function distribution F_X(x) with jumps p₁, …, p₅ at the points x₁, …, x₅.]
If an interval (a, b] does not contain any of the jump points x_i, then clearly

P(a < X ≤ b) = F_X(b) − F_X(a) = 0.

Hence, X cannot take any value lying between two successive jump points.

For each i and any small h > 0 we can write

P(x_i − h < X ≤ x_i + h) = F_X(x_i + h) − F_X(x_i − h) = p_i.

Letting h → 0, we obtain

P(X = x_i) = p_i,  i = 1, 2, 3, …
Thus the only values X can take are those corresponding to the jump points. Therefore, X is called a discrete random variable. The jump p_i at point x_i represents the probability that X takes the value x_i. Furthermore, (x₁, p₁), (x₂, p₂), … are used to define the so-called probability mass function p_X(x).

[Figure: probability mass function p_X(x) with p_X(x_i) = p_i for i = 1, …, 5.]
Discrete distribution functions possess the properties:

(1) $F_X(x) = \sum_{i:\,x_i \le x} p_X(x_i)$, where the summation extends over all values of i for which x_i ≤ x

(2) 0 ≤ p_X(x) ≤ 1
    (since p_X(x) is a probability mass function)

(3) $\sum_i p_X(x_i) = 1$
    (since lim_{x→∞} F_X(x) = Σ_i p_X(x_i) = 1)
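These properties translate directly into code. The following sketch (an added illustration with a hypothetical PMF) stores the pairs (x_i, p_i) and evaluates the step-function distribution F_X(x) as the sum of all p_i with x_i ≤ x:

```python
# A hypothetical probability mass function given as pairs (x_i, p_i).
pmf = [(-2.0, 0.1), (0.0, 0.3), (1.5, 0.4), (3.0, 0.2)]
assert abs(sum(p for _, p in pmf) - 1.0) < 1e-12   # property (3)

def cdf(x):
    """Step-function distribution: F_X(x) = sum of p_i over all x_i <= x."""
    return sum(p for xi, p in pmf if xi <= x)

for x in (-3, -2, 0.7, 1.5, 10):
    print(f"F_X({x}) = {cdf(x):.1f}")   # 0.0, 0.1, 0.4, 0.8, 1.0
```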
1.5.2 Purely continuous case (λ = 1)

The distribution function F_X(x) = F_{X,1}(x) is absolutely continuous, i.e. differentiable for almost all x. F_X(x) would typically possess a graph as shown below.

[Figure: smooth, monotonically increasing distribution function F_X(x) rising from 0 to 1.]
X can, in general, take any value either on a finite or an infinite interval and is therefore called a continuous random variable. Thus continuous random variables are suitable models for measured physical quantities such as pressures, voltages, temperatures, etc.

Furthermore, F_X(x) can be represented by

$$ F_X(x) = \int_{-\infty}^{x} f_X(x')\,dx', $$

where f_X(x) is said to be the probability density function (PDF) of X.
If f_X(x) is continuous at x, then

$$ F_X'(x) = \frac{dF_X(x)}{dx} = f_X(x) $$

exists. For a small interval (x, x+Δx] we can now write

$$ P(x < X \le x + \Delta x) = F_X(x + \Delta x) - F_X(x) = \int_{x}^{x+\Delta x} f_X(x')\,dx' $$

or

$$ P(x < X \le x + \Delta x) = f_X(x)\,\Delta x + o(\Delta x), $$

where o(Δx) represents a term of smaller order of magnitude than Δx.
The latter equation forms the basis for interpreting f_X(x) as a density function, namely, f_X(x) defines the density of probability in the neighbourhood of the point x.

Remarks:
- f_X(x) itself does not represent a probability,
- f_X(x) · Δx has a probabilistic interpretation,
- f_X(x) completely determines F_X(x) and therefore completely specifies the properties of a continuous random variable.
Probability density functions satisfy the properties:

(1) f_X(x) ≥ 0 for all x ∈ ℝ
    (since F_X(x) is a non-decreasing function)

(2) $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
    (since $\lim_{x\to\infty} F_X(x) = \int_{-\infty}^{\infty} f_X(x)\,dx = 1$)

(3) For any a, b ∈ ℝ with a ≤ b

$$ P(a < X \le b) = F_X(b) - F_X(a) = \int_{a}^{b} f_X(x)\,dx $$
1.6 Some Special Distributions

1.6.1 Discrete Distributions

Binomial distribution
Consider an experiment which has only two possible outcomes, "success" and "failure", with probability p and (1 − p), respectively.

The number of "successes" occurring in n independent repetitions of the experiment is a random variable X that can take the values k = 0, 1, …, n. Within a sequence of n independent trials, k successes can occur in
$$ \binom{n}{k} = \frac{n!}{k!\,(n-k)!} $$

different arrangements. The probability of a specific arrangement is obviously

$$ p^k (1-p)^{n-k}. $$

Thus, the probability for observing k successes in n independent trials is given by

$$ P\bigl(\{\xi : X(\xi) = k\}\bigr) = P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} = b_{n,p}(k). $$

The b_{n,p}(k) (k = 0, 1, …, n) are called binomial probabilities.

Exploiting the binomial theorem one can verify that

$$ \sum_{k=0}^{n} b_{n,p}(k) = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = \bigl[p + (1-p)\bigr]^n = 1. $$

Thus, the distribution function of the so-called binomial distribution B(n, p) can be defined by

$$ F_X(x) = P(X \le x) = \sum_{k=0}^{m} b_{n,p}(k) = \sum_{k=0}^{m} \binom{n}{k} p^k (1-p)^{n-k}, $$

where m = ⌊x⌋ ∈ ℤ, i.e. m ≤ x < m+1 (Gauss bracket).
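The binomial probabilities and the Gauss-bracket distribution function are straightforward to compute; a minimal sketch (added here as an illustration) follows:

```python
from math import comb, floor

def binom_pmf(k, n, p):
    """Binomial probability b_{n,p}(k) = C(n,k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(x, n, p):
    """F_X(x) = sum_{k=0}^{m} b_{n,p}(k) with m = floor(x) (Gauss bracket)."""
    m = floor(x)
    return sum(binom_pmf(k, n, p) for k in range(min(m, n) + 1)) if m >= 0 else 0.0

n, p = 25, 0.5
print(sum(binom_pmf(k, n, p) for k in range(n + 1)))  # = 1 (binomial theorem)
print(binom_cdf(12.7, n, p))                          # P(X <= 12)
```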
[Figure: binomial probabilities b_{n,p}(k) (left) and binomial distribution function F_X(x) (right) for p = 0.5, k = 0, …, 25.]
Poisson distribution
Consider the limiting form of the binomial probabilities when n → ∞ and p → 0 in such a way that np = α_n → α, a positive constant. Substituting p by α_n/n, we obtain

$$ b_{n,\alpha_n/n}(k) = \frac{n!}{k!\,(n-k)!}\left(\frac{\alpha_n}{n}\right)^{k}\left(1-\frac{\alpha_n}{n}\right)^{n-k} = \left\{\frac{n}{n}\cdot\frac{n-1}{n}\cdots\frac{n-k+1}{n}\left(1-\frac{\alpha_n}{n}\right)^{-k}\left(1-\frac{\alpha_n}{n}\right)^{n}\right\}\frac{\alpha_n^{k}}{k!}. $$

As n tends to infinity, we have

$$ b_{n,\alpha_n/n}(k) \xrightarrow{\;n\to\infty\;} e^{-\alpha}\,\frac{\alpha^{k}}{k!} = p_\alpha(k). $$

The p_α(k) (k = 0, 1, …) are called Poisson probabilities.

Note that the sum of the infinitely many but countable Poisson probabilities satisfies

$$ \sum_{k=0}^{\infty} p_\alpha(k) = e^{-\alpha}\sum_{k=0}^{\infty}\frac{\alpha^{k}}{k!} = e^{-\alpha}\,e^{\alpha} = 1 $$

for all α.
Hence, the distribution function of the so-called Poisson distribution P(α) is given by

$$ F_X(x) = P(X \le x) = \sum_{k=0}^{m} p_\alpha(k) = e^{-\alpha}\sum_{k=0}^{m}\frac{\alpha^{k}}{k!}, $$

where m = ⌊x⌋ ∈ ℤ, i.e. m ≤ x < m+1 (Gauss bracket).

In practice, the Poisson distribution is used for approximating the binomial distribution in cases where, in a large number of independent trials (large n), the number of occurrences of a rare event (small p) is of interest.
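The quality of this approximation is easy to inspect numerically; the sketch below (added illustration, with assumed values n = 1000, p = 0.005) compares b_{n,p}(k) with p_α(k) for α = np:

```python
from math import comb, exp, factorial

n, p = 1000, 0.005        # large n, small p (hypothetical values)
alpha = n * p             # alpha = np = 5

for k in range(4):
    b = comb(n, k) * p**k * (1 - p)**(n - k)      # binomial b_{n,p}(k)
    pa = exp(-alpha) * alpha**k / factorial(k)    # Poisson p_alpha(k)
    print(f"k={k}: binomial={b:.6f}  Poisson={pa:.6f}")
```

The two columns agree to several decimal places, as the limiting argument above predicts.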
[Figure: Poisson probabilities p_α(k) (left) and Poisson distribution function F_X(x) (right) for α = 25.]
1.6.2 Continuous Distributions

Uniform (rectangular) distribution
A continuous random variable X is uniformly distributed on the interval [a, b] (in abbreviated form X ∼ R(a,b)), if the probability density function is defined by

$$ f_X(x) = \frac{1}{b-a}\,\mathbf{1}_{[a,b]}(x), \quad x \in \mathbb{R}, $$

where 1_M(x) denotes the indicator function of the set M ⊂ ℝ, i.e.

$$ \mathbf{1}_M(x) = \begin{cases} 1 & x \in M \\ 0 & \text{otherwise.} \end{cases} $$

The distribution function can be expressed as follows:

$$ F_X(x) = \int_{-\infty}^{x} f_X(x')\,dx' = \frac{1}{b-a}\int_{-\infty}^{x} \mathbf{1}_{[a,b]}(x')\,dx' = \begin{cases} 0 & x \le a \\ (x-a)/(b-a) & x \in [a,b] \\ 1 & x \ge b. \end{cases} $$

A uniformly distributed random variable X ∼ R(−π, π) is often used for modelling a random initial phase of a sinusoidal signal.
[Figure: density function f_X(x) and distribution function F_X(x) of the uniform distribution for a = 0, b = 1.]
Normal (Gaussian) distribution
A continuous random variable X is said to be normally distributed with parameters µ ∈ ℝ and σ² (X ∼ N(µ, σ²)), if the probability density function is defined by

$$ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}, \quad x \in \mathbb{R}. $$

The normal distribution function

$$ F_X(x) = \int_{-\infty}^{x} f_X(x')\,dx' = \frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{x}\exp\left\{-\frac{(x'-\mu)^2}{2\sigma^2}\right\} dx' $$

cannot be expressed in explicit form.
However, to check that f_X(x), where f_X(x) > 0 ∀x ∈ ℝ, represents a valid form of a probability density function, we have to show that

$$ \int_{-\infty}^{\infty} f_X(x)\,dx = 1. $$

With the substitution x = (x' − µ)/σ we can derive

$$ \frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{\infty}\exp\left\{-\frac{(x'-\mu)^2}{2\sigma^2}\right\} dx' = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\left\{-\frac{x^2}{2}\right\} dx. $$

Since f_X(x) > 0 ∀x ∈ ℝ, it is equivalent to prove that

$$ \left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\left\{-\frac{x^2}{2}\right\} dx\right)^2 = 1. $$
Thus, after introducing a double integral, employing polar coordinates and finally substituting u = r²/2, the validity of the equation can be shown:

$$ \left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,dx\right)^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy = \frac{1}{2\pi}\int_{0}^{2\pi}\int_{0}^{\infty} e^{-r^2/2}\,r\,dr\,d\phi = \int_{0}^{\infty} e^{-r^2/2}\,r\,dr = \int_{0}^{\infty} e^{-u}\,du = 1. $$
The special form N(0,1), i.e. µ = 0 and σ² = 1, is called the standardized normal distribution. Its distribution function is usually denoted by Φ(x), i.e.

$$ \Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}\exp\left\{-\frac{x'^2}{2}\right\} dx'. $$

There are extensive tables of the function Φ(x) available in the literature. These tables enable us to evaluate the distribution function of X ∼ N(µ, σ²) as follows.
Let X ∼ N(µ, σ²) and therefore (X − µ)/σ ∼ N(0,1); then

$$ F_X(x) = P\bigl(\{\xi : -\infty < X(\xi) \le x\}\bigr) = P\left(\left\{\xi : -\infty < \frac{X(\xi)-\mu}{\sigma} \le \frac{x-\mu}{\sigma}\right\}\right) = \Phi\left(\frac{x-\mu}{\sigma}\right). $$

The normal distribution is by far the most important distribution in probability theory and statistical inference. Its prominence is attributed to central limit theorems, which roughly state that the sum of a large number of (independent) random variables obeys an approximate normal distribution.
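In place of tables, Φ(x) can be evaluated via the error function, using the known identity Φ(x) = (1 + erf(x/√2))/2; the sketch below (an added illustration with hypothetical parameters) applies the standardization F_X(x) = Φ((x−µ)/σ):

```python
from math import erf, sqrt

def Phi(x):
    """Standardized normal distribution function via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

mu, sigma = 2.0, 3.0   # hypothetical parameters of X ~ N(mu, sigma^2)

# P(a < X <= b) = F_X(b) - F_X(a) = Phi((b-mu)/sigma) - Phi((a-mu)/sigma)
a, b = -1.0, 5.0
print(Phi((b - mu) / sigma) - Phi((a - mu) / sigma))  # = Phi(1) - Phi(-1) ≈ 0.6827
```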
[Figure: Gauss density function f_X(x) and Gauss distribution function F_X(x) for µ = 0, σ² = 1.]
Exponential distribution
A continuous random variable X, taking positive values only, is said to satisfy an exponential distribution with parameter λ > 0 (X ∼ E(λ)), if its probability density function is of the form

$$ f_X(x) = \lambda\exp(-\lambda x)\,\mathbf{1}_{[0,\infty)}(x), \quad x \in \mathbb{R}. $$

Hence, its distribution function is given by

$$ F_X(x) = \int_{-\infty}^{x} f_X(x')\,dx' = \int_{-\infty}^{x} \lambda\exp(-\lambda x')\,\mathbf{1}_{[0,\infty)}(x')\,dx' = \bigl[1-\exp(-\lambda x)\bigr]\,\mathbf{1}_{[0,\infty)}(x). $$
The exponential distribution is used as a model for the lifetimes of items when ageing processes can be neglected.

[Figure: density function f_X(x) and distribution function F_X(x) of the exponential distribution for λ = 1.]
Weibull distribution
When either ageing or initial-failure processes have to be taken into account, a suitable model for the lifetime of an item can be provided by a continuous random variable X obeying a Weibull distribution with parameters λ > 0, η > 0 (X ∼ W(λ, η)).

The probability density function of the Weibull distribution is defined by

$$ f_X(x) = \lambda\,\eta\,x^{\eta-1}\exp(-\lambda x^{\eta})\,\mathbf{1}_{[0,\infty)}(x), \quad x \in \mathbb{R}. $$

For the distribution function we obtain

$$ F_X(x) = \int_{-\infty}^{x} f_X(x')\,dx' = \int_{-\infty}^{x} \lambda\eta\,x'^{\eta-1}\exp(-\lambda x'^{\eta})\,\mathbf{1}_{[0,\infty)}(x')\,dx' = \bigl[1-\exp(-\lambda x^{\eta})\bigr]\,\mathbf{1}_{[0,\infty)}(x). $$

By selecting a value for η, the following three cases can be qualitatively distinguished:
η = 1: obviously W(λ, 1) = E(λ)
η > 1: ageing process is incorporated
0 < η < 1: initial-failure process is considered
[Figure: Weibull density function f_X(x) and Weibull distribution function F_X(x) for λ = 1 and η = 1/2, 1, 2.]
Cauchy distribution
A continuous random variable X is said to follow a Cauchy distribution with parameters µ ∈ ℝ and ν > 0 (X ∼ C(µ, ν)), if the probability density function is given by

$$ f_X(x) = \frac{1}{\pi}\,\frac{\nu}{\nu^2+(x-\mu)^2} = \frac{1}{\pi\nu}\,\frac{1}{1+\bigl[(x-\mu)/\nu\bigr]^2}, \quad x \in \mathbb{R}. $$

The distribution function can be derived as follows:

$$ F_X(x) = \int_{-\infty}^{x} f_X(x')\,dx' = \frac{1}{\pi\nu}\int_{-\infty}^{x}\frac{dx'}{1+\bigl[(x'-\mu)/\nu\bigr]^2} = \frac{1}{\pi}\int_{-\infty}^{(x-\mu)/\nu}\frac{dx''}{1+x''^2} = \frac{1}{2}+\frac{1}{\pi}\arctan\left(\frac{x-\mu}{\nu}\right). $$

The Cauchy distribution possesses so-called long tails, i.e. its density function decays slowly as x tends either to plus or minus infinity. Consequently, Cauchy distributed random variables are suitable models for experiments where large measurement values can be observed with certain likelihood.
[Figure: Cauchy density function f_X(x) and Cauchy distribution function F_X(x) for µ = 0, ν = 1.]
1.7 Bivariate Distribution

The theory of random variables discussed so far dealt only with univariate distributions, i.e. the probability distributions describe the properties of single random variables. However, the modelling of experimental results often requires several random variables, e.g. the results of measuring the simultaneous values of pressure and temperature in a gas of constant volume have to be described by two random variables.
1.7.1 Bivariate Distribution Function

A bivariate distribution function F_{XY}(x,y) is defined by

F_{XY}(x,y) = P({ξ : X(ξ) ≤ x, Y(ξ) ≤ y}) = P(X ≤ x, Y ≤ y).

It can be computed in the discrete and continuous case by

$$ F_{XY}(x,y) = \sum_{i:\,x_i \le x}\;\sum_{j:\,y_j \le y} p_{XY}(x_i,y_j) $$

or

$$ F_{XY}(x,y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f_{XY}(x',y')\,dy'\,dx', $$

respectively, where p_{XY}(x,y) denotes the bivariate probability mass function and f_{XY}(x,y) the bivariate density function.
Bivariate distribution functions possess the properties:

(1) lim_{x→∞, y→∞} F_{XY}(x,y) = F_{XY}(∞,∞) = 1

(2) lim_{x→−∞} F_{XY}(x,y) = F_{XY}(−∞,y) = 0
    lim_{y→−∞} F_{XY}(x,y) = F_{XY}(x,−∞) = 0

(3) F_{XY}(x,y) is right-continuous in x and y, respectively, i.e.
    lim_{h→0⁺} F_{XY}(x+h,y) = F_{XY}(x,y)
    lim_{h→0⁺} F_{XY}(x,y+h) = F_{XY}(x,y)
(4) F_{XY}(x,y) is a non-decreasing function, i.e. for any h ≥ 0
    F_{XY}(x+h,y) ≥ F_{XY}(x,y) for all x and any y
    F_{XY}(x,y+h) ≥ F_{XY}(x,y) for all y and any x

(5) second difference (n-th difference for n = 2)

    ΔF_{XY}((a,b]) = F_{XY}(b₁,b₂) − F_{XY}(b₁,a₂) − F_{XY}(a₁,b₂) + F_{XY}(a₁,a₂)
                   = P({ξ : (X(ξ),Y(ξ))ᵀ ∈ (a,b]}) = P((X,Y)ᵀ ∈ (a,b]) ≥ 0,

    where (a,b] = ((a₁,a₂)ᵀ, (b₁,b₂)ᵀ] = (a₁,b₁] × (a₂,b₂].
1.7.2 Bivariate Density Function

If f_{XY}(x,y) is continuous at (x,y), then

$$ f_{XY}(x,y) = \frac{\partial^2 F_{XY}(x,y)}{\partial x\,\partial y} = \frac{\partial^2 F_{XY}(x,y)}{\partial y\,\partial x} $$

exists. Bivariate density functions satisfy the properties:

(1) f_{XY}(x,y) ≥ 0 for all (x,y)ᵀ ∈ ℝ²

(2) $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx\,dy = 1$

(3) For any (a,b] = (a₁,b₁] × (a₂,b₂] ⊂ ℝ²

$$ P\bigl((X,Y)^T \in (a,b]\bigr) = \int_{a_2}^{b_2}\int_{a_1}^{b_1} f_{XY}(x,y)\,dx\,dy. $$
1.7.3 Marginal Distribution and Density Function

For a given bivariate distribution of (X,Y), the univariate distributions of the individual random variables X and Y can be deduced from the expressions

$$ \lim_{y\to\infty} F_{XY}(x,y) = F_{XY}(x,\infty) = P(X \le x,\, Y \le \infty) = P(X \le x) = F_X(x) $$

and

$$ \lim_{x\to\infty} F_{XY}(x,y) = F_{XY}(\infty,y) = P(X \le \infty,\, Y \le y) = P(Y \le y) = F_Y(y). $$
For the marginal distribution and density functions of X and Y we may write

$$ F_X(x) = \int_{-\infty}^{x} f_X(x')\,dx' = \int_{-\infty}^{x}\int_{-\infty}^{\infty} f_{XY}(x',y)\,dy\,dx' \quad\text{with}\quad f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dy $$

and

$$ F_Y(y) = \int_{-\infty}^{y} f_Y(y')\,dy' = \int_{-\infty}^{y}\int_{-\infty}^{\infty} f_{XY}(x,y')\,dx\,dy' \quad\text{with}\quad f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dx, $$

respectively.
1.7.4 Conditional Distribution and Density Function

The conditional probability of {ξ : X(ξ) ≤ x} knowing that {ξ : Y(ξ) ≤ y} occurred is given by (cf. Section 1.3)

$$ P(X \le x \mid Y \le y) = \frac{P(\{X \le x\} \cap \{Y \le y\})}{P(Y \le y)}, \quad \text{if } P(Y \le y) > 0. $$

Hence, the conditional distribution function of X under the condition {Y ≤ y} can be defined by

$$ F_X(x \mid Y \le y) = P(X \le x \mid Y \le y) = \frac{P(X \le x,\, Y \le y)}{P(Y \le y)} = \frac{F_{XY}(x,y)}{F_Y(y)}, \quad \text{if } F_Y(y) > 0. $$
The conditional density function of X under the condition {Y ≤ y} can be deduced from

$$ F_X(x \mid Y \le y) = \int_{-\infty}^{x} f_X(x' \mid Y \le y)\,dx' = \frac{F_{XY}(x,y)}{F_Y(y)} = \frac{\int_{-\infty}^{x}\int_{-\infty}^{y} f_{XY}(x',y')\,dy'\,dx'}{F_Y(y)}, $$

such that

$$ f_X(x \mid Y \le y) = \frac{dF_X(x \mid Y \le y)}{dx} = \frac{\partial F_{XY}(x,y)/\partial x}{F_Y(y)}. $$

If f_{XY}(x,y) is continuous at (x,y), we can also write

$$ f_X(x \mid Y \le y) = \frac{\int_{-\infty}^{y} f_{XY}(x,y')\,dy'}{F_Y(y)}. $$
Let now {ξ : y < Y(ξ) ≤ y + Δy} denote an event satisfying P(y < Y ≤ y + Δy) > 0. Then, we can derive

$$ F_X(x \mid y < Y \le y+\Delta y) = \frac{P(X \le x,\; y < Y \le y+\Delta y)}{P(y < Y \le y+\Delta y)} = \frac{P(X \le x, Y \le y+\Delta y) - P(X \le x, Y \le y)}{P(Y \le y+\Delta y) - P(Y \le y)} = \frac{F_{XY}(x,y+\Delta y) - F_{XY}(x,y)}{F_Y(y+\Delta y) - F_Y(y)}\cdot\frac{\Delta y}{\Delta y}. $$
Assuming f_{XY}(x,y) to be continuous at (x,y) and f_Y(y) > 0, the limit

$$ \lim_{\Delta y\to 0} F_X(x \mid y < Y \le y+\Delta y) = \frac{\partial F_{XY}(x,y)/\partial y}{dF_Y(y)/dy} $$

exists and the conditional distribution function of X under the condition {Y = y} can be expressed by

$$ F_X(x \mid Y = y) = \frac{\partial F_{XY}(x,y)/\partial y}{dF_Y(y)/dy} = \frac{\partial F_{XY}(x,y)/\partial y}{f_Y(y)} = \frac{\int_{-\infty}^{x} f_{XY}(x',y)\,dx'}{f_Y(y)} = \frac{\int_{-\infty}^{x} f_{XY}(x',y)\,dx'}{\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx}. $$
Furthermore, exploiting

$$ F_X(x \mid Y = y) = \int_{-\infty}^{x} f_X(x' \mid Y = y)\,dx' = \int_{-\infty}^{x}\frac{f_{XY}(x',y)}{f_Y(y)}\,dx', $$

the conditional density function of X under the condition {Y = y} can be represented by

$$ f_X(x \mid Y = y) = \frac{f_{XY}(x,y)}{f_Y(y)} = \frac{f_{XY}(x,y)}{\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx} = \frac{\partial^2 F_{XY}(x,y)/\partial x\,\partial y}{dF_Y(y)/dy} = \frac{dF_X(x \mid Y = y)}{dx}. $$
Conditional density functions exhibit the properties:

(1) $$ f_X(x \mid Y = y)\,\Delta x = \frac{f_{XY}(x,y)\,\Delta x\,\Delta y}{f_Y(y)\,\Delta y} \approx \frac{P(x < X \le x+\Delta x,\; y < Y \le y+\Delta y)}{P(y < Y \le y+\Delta y)} $$

(2) $$ f_{XY}(x,y) = f_X(x \mid Y = y)\,f_Y(y) = f_Y(y \mid X = x)\,f_X(x), \qquad f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dy = \int_{-\infty}^{\infty} f_X(x \mid Y = y)\,f_Y(y)\,dy $$

(3) Bayes' Formula:

$$ f_X(x \mid Y = y) = \frac{f_Y(y \mid X = x)\,f_X(x)}{f_Y(y)}, \quad \text{if } f_Y(y) > 0 $$
1.7.5 Independent Random Variables

We say that two continuous random variables X and Y are independent if the events {X ≤ x} and {Y ≤ y} are independent for all x, y ∈ ℝ, i.e. (cf. Section 1.3)

F_{XY}(x,y) = F_X(x) F_Y(y),  f_{XY}(x,y) = f_X(x) f_Y(y)

and consequently

F_X(x | Y = y) = F_X(x),  F_Y(y | X = x) = F_Y(y),
f_X(x | Y = y) = f_X(x)  and  f_Y(y | X = x) = f_Y(y).

For notational convenience we define

f_X(x | y) := f_X(x | Y = y),  f_Y(y | x) := f_Y(y | X = x),
F_X(x | y) := F_X(x | Y = y)  and  F_Y(y | x) := F_Y(y | X = x).
1.7.6 Bivariate Normal Distribution

Two continuous random variables X and Y are said to obey a bivariate normal distribution with parameters µ_X, µ_Y, |ρ| < 1, σ_X > 0, σ_Y > 0 if their probability density function is given by

$$ f_{XY}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\,\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right\} $$

for all x, y ∈ ℝ.
[Figure: bivariate normal density function, surface plot (left) and contour plot (right), for µ_X = 0, µ_Y = 0, σ_X = 1, σ_Y = 1, ρ = 1/2.]
Employing vector/matrix notation, the bivariate normal density function can be expressed by

$$ f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{2\pi\sqrt{\det(\Sigma_X)}}\exp\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_X)^T\,\Sigma_X^{-1}\,(\mathbf{x}-\boldsymbol{\mu}_X)\right\}, $$

where µ_X denotes the vector of the expected/mean values and Σ_X represents the so-called covariance matrix, i.e.

$$ \boldsymbol{\mu}_X = \begin{pmatrix} \mu_{X_1} \\ \mu_{X_2} \end{pmatrix}, \qquad \Sigma_X = \begin{pmatrix} \sigma_{X_1}^2 & \rho\,\sigma_{X_1}\sigma_{X_2} \\ \rho\,\sigma_{X_1}\sigma_{X_2} & \sigma_{X_2}^2 \end{pmatrix}, $$

and one writes

$$ \mathbf{X} = (X_1, X_2)^T \sim N_2(\boldsymbol{\mu}_X, \Sigma_X). $$
Marginal density functions
The marginal density functions of X and Y are given by

$$ f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y')\,dy' = \frac{1}{\sqrt{2\pi}\,\sigma_X}\exp\left\{-\frac{1}{2}\left(\frac{x-\mu_X}{\sigma_X}\right)^2\right\} $$

and

$$ f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x',y)\,dx' = \frac{1}{\sqrt{2\pi}\,\sigma_Y}\exp\left\{-\frac{1}{2}\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right\}. $$

Hence, the marginal probability distributions of X and Y are N(µ_X, σ_X²) and N(µ_Y, σ_Y²), respectively.
Conditional density functions
The conditional density function of X under the condition {Y = y} is given by

$$ f_X(x \mid Y = y) = \frac{f_{XY}(x,y)}{f_Y(y)} = \frac{1}{\sigma_X\sqrt{2\pi(1-\rho^2)}}\exp\left\{-\frac{1}{2\sigma_X^2(1-\rho^2)}\left[x-\left(\mu_X+\rho\,\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)\right)\right]^2\right\} $$

and we can write in abbreviated form

$$ X \mid Y = y \;\sim\; N\!\left(\mu_X+\rho\,\frac{\sigma_X}{\sigma_Y}(y-\mu_Y),\;\sigma_X^2(1-\rho^2)\right). $$
Analogously, the conditional density function of Y under the condition {X = x} is given by

$$ f_Y(y \mid X = x) = \frac{f_{XY}(x,y)}{f_X(x)} = \frac{1}{\sigma_Y\sqrt{2\pi(1-\rho^2)}}\exp\left\{-\frac{1}{2\sigma_Y^2(1-\rho^2)}\left[y-\left(\mu_Y+\rho\,\frac{\sigma_Y}{\sigma_X}(x-\mu_X)\right)\right]^2\right\}. $$

Thus, we can write

$$ Y \mid X = x \;\sim\; N\!\left(\mu_Y+\rho\,\frac{\sigma_Y}{\sigma_X}(x-\mu_X),\;\sigma_Y^2(1-\rho^2)\right). $$
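These conditional-moment formulas can be checked by simulation; a sketch with hypothetical parameter values (added here, assuming NumPy is available) conditions on Y lying in a narrow band around y₀ and compares the empirical mean and variance with the theory:

```python
import numpy as np

# Hypothetical parameters mu_X, mu_Y, sigma_X, sigma_Y, rho.
mu = np.array([1.0, -2.0])
sX, sY, rho = 2.0, 0.5, 0.6
Sigma = np.array([[sX**2, rho * sX * sY],
                  [rho * sX * sY, sY**2]])

rng = np.random.default_rng(0)
xy = rng.multivariate_normal(mu, Sigma, size=1_000_000)

# Condition on Y close to y0 and compare with N(mu_X + rho sX/sY (y0-mu_Y), sX^2(1-rho^2)).
y0 = -1.5
sel = xy[np.abs(xy[:, 1] - y0) < 0.01, 0]
print("empirical  :", sel.mean(), sel.var())
print("theoretical:", mu[0] + rho * (sX / sY) * (y0 - mu[1]), sX**2 * (1 - rho**2))
```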
1.8 Transformations of Random Variables

1.8.1 Function of One Random Variable

Let g : ℝ → ℝ be a measurable function, i.e. ∀y ∈ ℝ

g⁻¹((−∞, y]) = {x : g(x) ≤ y} ∈ B.

Then, we can define a random variable Y : Ξ → ℝ by

ξ ↦ Y(ξ) = g(X(ξ))

possessing a distribution function determined by

$$ F_Y(y) = P(Y \le y) = P\bigl(g(X) \le y\bigr) = P\bigl(\{\xi : g(X(\xi)) \le y\}\bigr) = P\Bigl(X^{-1}\bigl(g^{-1}((-\infty,y])\bigr)\Bigr). $$
Strictly Monotonic Function

a) Suppose that g(x) is a strictly monotonically increasing function. The distribution function can be written as

$$ F_Y(y) = P(Y \le y) = P\bigl(g(X) \le y\bigr) = P\bigl(X \le g^{-1}(y)\bigr) = F_X\bigl(g^{-1}(y)\bigr) = \int_{-\infty}^{g^{-1}(y)} f_X(x)\,dx. $$

Moreover, if f_X(x) is continuous and g(x) continuously differentiable at x = g⁻¹(y), we can derive the density function f_Y(y) by applying the chain rule of calculus.
Hence, we obtain

$$ f_Y(y) = \begin{cases} f_X\bigl(g^{-1}(y)\bigr)\,\dfrac{d\,g^{-1}(y)}{dy} = \dfrac{f_X\bigl(g^{-1}(y)\bigr)}{\left.\dfrac{dg(x)}{dx}\right|_{x=g^{-1}(y)}} & \text{for } g(-\infty) < y < g(\infty) \\ 0 & \text{elsewhere.} \end{cases} $$
b) Let g(x) be a strictly monotonically decreasing function. Consequently, we have

$$ F_Y(y) = P(Y \le y) = P\bigl(g(X) \le y\bigr) = P\bigl(X \ge g^{-1}(y)\bigr) = 1 - P\bigl(X < g^{-1}(y)\bigr) = 1 - F_X\bigl(g^{-1}(y)-0\bigr) = 1 - \int_{-\infty}^{g^{-1}(y)} f_X(x)\,dx = \int_{g^{-1}(y)}^{\infty} f_X(x)\,dx. $$

Again, if f_X(x) is continuous and g(x) continuously differentiable at x = g⁻¹(y), the density function f_Y(y) can be determined by employing the chain rule of calculus.
Thus, we obtain

$$ f_Y(y) = \begin{cases} -f_X\bigl(g^{-1}(y)\bigr)\,\dfrac{d\,g^{-1}(y)}{dy} = -\dfrac{f_X\bigl(g^{-1}(y)\bigr)}{\left.\dfrac{dg(x)}{dx}\right|_{x=g^{-1}(y)}} & \text{for } g(\infty) < y < g(-\infty) \\ 0 & \text{elsewhere.} \end{cases} $$
Exploiting the property

$$ \frac{dg(x)}{dx}\;\begin{cases} > 0 & \text{for } g(x) \text{ strictly monotonically increasing} \\ < 0 & \text{for } g(x) \text{ strictly monotonically decreasing,} \end{cases} $$

the results of cases a) and b) can be summarised. Let f_X(x) be continuous at x = g⁻¹(y) and g(x) any strictly monotonic function that is continuously differentiable at x = g⁻¹(y). Then f_Y(y) can be calculated by

$$ f_Y(y) = \begin{cases} \dfrac{f_X\bigl(g^{-1}(y)\bigr)}{\Bigl|\left.\dfrac{dg(x)}{dx}\right|_{x=g^{-1}(y)}\Bigr|} & \text{for } y \text{ between } g(-\infty) \text{ and } g(\infty) \\ 0 & \text{elsewhere.} \end{cases} $$
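A numerical check of this formula (an added sketch, assuming NumPy; not one of the lecture's exercises): for the strictly increasing g(x) = eˣ with X ∼ N(0,1), g⁻¹(y) = ln y and dg/dx = eˣ = y, so f_Y(y) = f_X(ln y)/y, which should match a histogram of transformed samples:

```python
import numpy as np

# Transformation Y = g(X) = exp(X) with X ~ N(0,1); g is strictly increasing.
rng = np.random.default_rng(1)
y = np.exp(rng.standard_normal(1_000_000))

def f_Y(y):
    """f_Y(y) = f_X(g^{-1}(y)) / |dg/dx| at x = ln(y), where dg/dx = e^x = y."""
    return np.exp(-np.log(y)**2 / 2) / (np.sqrt(2 * np.pi) * y)

hist, edges = np.histogram(y, bins=[0.5, 1.0, 1.5, 2.0], density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.round(hist, 3))          # empirical density estimates
print(np.round(f_Y(centers), 3))  # values from the transformation formula
```

The two printed rows agree approximately (exactly in the limit of narrow bins and many samples).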
Exercise 1.8-1: Linear function
Non-Monotonic Function

Theorem:
Let f_X(x) denote the continuous density function of the random variable X and let g(x) be a continuously differentiable function. Furthermore, suppose that the equation y = g(x) may possess n solutions for a particular y, i.e.

y = g(x₁) = … = g(x_n).

Then f_Y(y), the continuous density function of the random variable Y = g(X), can be determined by

$$ f_Y(y) = \sum_{i=1}^{n}\frac{f_X(x_i)}{\bigl|dg(x_i)/dx\bigr|}\Bigg|_{x_i = g_i^{-1}(y)}. $$
Exercise 1.8-2: Quadratic function
1.8.2 One Function of Two Random Variables

Suppose (X,Y) are random variables with bivariate density function f_{XY}(x,y). Let g(x,y) be a function such that

Z = g(X,Y)

represents a random variable, i.e. for all z ∈ ℝ

D_z = {(x,y) : g(x,y) ≤ z} ∈ B.

Then the distribution function of Z is given by

$$ F_Z(z) = P(Z \le z) = P\bigl(g(X,Y) \le z\bigr) = P\bigl((X,Y) \in D_z\bigr) = \iint_{D_z} f_{XY}(x,y)\,dx\,dy. $$
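Anticipating Exercise 1.8-3: for g(X,Y) = X + Y with independent X and Y, differentiating the integral over D_z = {(x,y) : x + y ≤ z} yields the convolution f_Z(z) = ∫ f_X(x) f_Y(z−x) dx (a standard result). A numerical sketch (added illustration, assuming NumPy) for two R(0,1) variables:

```python
import numpy as np

dx = 0.01
x = np.arange(-10.0, 10.0, dx)
f_X = np.where((x >= 0) & (x <= 1), 1.0, 0.0)   # X ~ R(0,1)
f_Y = f_X.copy()                                 # Y ~ R(0,1), independent of X

f_Z = np.convolve(f_X, f_Y) * dx                 # f_Z(z) = (f_X * f_Y)(z)
z = 2 * x[0] + dx * np.arange(f_Z.size)          # grid on which f_Z lives
k = int(round((1.0 - z[0]) / dx))                # index of z = 1.0
print(z[k], f_Z[k])                              # triangular density, peak ≈ 1 at z = 1
```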
Exercise 1.8-3: Sum of two random variables
Exercise 1.8-4: Magnitude of the difference of two independent random variables
1.8.3 Two Functions of Two Random Variables

Let (X,Y) be random variables with bivariate density function f_{XY}(x,y). Suppose g₁(x,y) and g₂(x,y) are functions such that

U = g₁(X,Y)  and  V = g₂(X,Y)

are random variables, i.e. for all u, v ∈ ℝ

D_u = {(x,y) : g₁(x,y) ≤ u} ∈ B,  D_v = {(x,y) : g₂(x,y) ≤ v} ∈ B

and consequently

D_{uv} = D_u ∩ D_v = {(x,y) : g₁(x,y) ≤ u, g₂(x,y) ≤ v} ∈ B.
The bivariate distribution function of (U,V) is given by

$$ F_{UV}(u,v) = P(U \le u,\, V \le v) = P\bigl(g_1(X,Y) \le u,\, g_2(X,Y) \le v\bigr) = P\bigl((X,Y) \in D_{uv}\bigr) = \iint_{D_{uv}} f_{XY}(x,y)\,dx\,dy. $$

Assume that the equation system

(u,v) = (g₁(x,y), g₂(x,y))

has the unique solution

(x,y) = (g₁⁻¹(u,v), g₂⁻¹(u,v)).
Furthermore, suppose g₁(x,y), g₂(x,y) have continuous partial derivatives and the determinant of the Jacobian

$$ J(x,y) = \frac{\partial(g_1,g_2)}{\partial(x,y)} = \begin{pmatrix} \partial g_1/\partial x & \partial g_1/\partial y \\ \partial g_2/\partial x & \partial g_2/\partial y \end{pmatrix} $$

does not vanish, i.e. det(J(x,y)) ≠ 0. Then the bivariate density function of (U,V) is given by

$$ f_{UV}(u,v) = \frac{f_{XY}\bigl(g_1^{-1}(u,v),\,g_2^{-1}(u,v)\bigr)}{\Bigl|\det J\bigl(g_1^{-1}(u,v),\,g_2^{-1}(u,v)\bigr)\Bigr|}. $$
Alternatively, using the Jacobian

$$ \tilde{J}(u,v) = \frac{\partial(g_1^{-1},g_2^{-1})}{\partial(u,v)} = \begin{pmatrix} \partial g_1^{-1}/\partial u & \partial g_1^{-1}/\partial v \\ \partial g_2^{-1}/\partial u & \partial g_2^{-1}/\partial v \end{pmatrix} $$

and exploiting the well-known results

$$ \tilde{J}(u,v) = J\bigl(g_1^{-1}(u,v),\,g_2^{-1}(u,v)\bigr)^{-1} $$

and

$$ \det\bigl(\tilde{J}(u,v)\bigr) = 1\big/\det\Bigl(J\bigl(g_1^{-1}(u,v),\,g_2^{-1}(u,v)\bigr)\Bigr), $$

we can write

$$ f_{UV}(u,v) = f_{XY}\bigl(g_1^{-1}(u,v),\,g_2^{-1}(u,v)\bigr)\,\Bigl|\det\bigl(\tilde{J}(u,v)\bigr)\Bigr|. $$
Exercise 1.8-5: Linear transformation

Exercise 1.8-6: Product of two random variables

Exercise 1.8-7: Quotient of two random variables

Exercise 1.8-8: Rayleigh distribution
1.9 Expectation Operator

1.9.1 Expectation for Univariate Distributions

Expected Value of a Random Variable
The expected value of a random variable X, also called mean value, is defined by

$$ \mu_X = E(X) = \begin{cases} \displaystyle\sum_i x_i\,p_X(x_i) & \text{when } X \text{ is discrete} \\ \displaystyle\int_{-\infty}^{\infty} x\,f_X(x)\,dx & \text{when } X \text{ is continuous,} \end{cases} $$

provided that the sum respectively integral converges absolutely. The two cases can be summarized by introducing the
so-called Stieltjes integral

$$ \mu_X = E(X) = \int_{-\infty}^{\infty} x\,dF_X(x). $$

Remark:
Let a = x₀ < x₁ < … < x_n = b be points that provide a partition of [a,b] into n subintervals (x_k, x_{k+1}] (k = 0, …, n−1) and let x̃_k ∈ (x_k, x_{k+1}]. Then the Stieltjes integral is defined by

$$ \int_a^b g(x)\,dF(x) = \lim_{\substack{n\to\infty \\ \max_k (x_{k+1}-x_k)\to 0}}\;\sum_{k=0}^{n-1} g(\tilde{x}_k)\,\bigl(F(x_{k+1})-F(x_k)\bigr). $$
Exercise 1.9-1: Expected value for Poisson distribution
Exercise 1.9-2: Expected value for exponential distribution
Exercise 1.9-3: Expected value for normal distribution
Exercise 1.9-4: Expected value for Cauchy distribution
Expected Value of a Function of a Random Variable
For a measurable function g(X) of the random variable X, we define the expected value of g(X) by

$$ E\bigl(g(X)\bigr) = \int_{-\infty}^{\infty} g(x)\,dF_X(x) = \begin{cases} \displaystyle\sum_i g(x_i)\,p_X(x_i) & \text{when } X \text{ is discrete} \\ \displaystyle\int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx & \text{when } X \text{ is continuous,} \end{cases} $$

provided that the sum respectively integral converges absolutely.
Let F_Y(y) denote the distribution function of the random variable Y = g(X); then by using the definition of integration it is easy to establish that

$$ E\bigl(g(X)\bigr) = \int_{-\infty}^{\infty} g(x)\,dF_X(x) = \int_{-\infty}^{\infty} y\,dF_Y(y) = E(Y). $$

Consequently, the expected value of a function of a random variable g(X) can be computed directly without first determining the distribution function of the random variable Y = g(X).
Moments
If g(X) = X^k with k > 0, the expected value of g(X), i.e.

$$ m_k = E(X^k) = \int_{-\infty}^{\infty} x^k\,dF_X(x) = \begin{cases} \displaystyle\sum_i x_i^k\,p_X(x_i) & \text{when } X \text{ is discrete} \\ \displaystyle\int_{-\infty}^{\infty} x^k f_X(x)\,dx & \text{when } X \text{ is continuous,} \end{cases} $$

is called the k-th moment of X, provided that the integral converges absolutely. For k = 1, we obtain the mean value µ_X = m₁ = E(X).
Centralised Moments
Suppose g(X) = (X − µ_X)^k with k > 0; then the expected value of g(X), i.e.

$$ c_k = E\bigl((X-\mu_X)^k\bigr) = E\left(\sum_{m=0}^{k}\binom{k}{m}(-\mu_X)^m X^{k-m}\right) = \sum_{m=0}^{k}\binom{k}{m}(-\mu_X)^m\,E(X^{k-m}), $$

is called the k-th centralised moment of X.
For k = 2, the centralised moment given by

$$ \sigma_X^2 = \mathrm{Var}(X) = c_2 = E\bigl((X-\mu_X)^2\bigr) = E(X^2)-\mu_X^2 $$

is called the variance. The positive root of the variance is denoted by σ_X and is called the standard deviation.

Absolute Moments
The k-th absolute moment of X is defined by

$$ E\bigl(|X|^k\bigr) = \int_{-\infty}^{\infty} |x|^k\,dF_X(x). $$
Because of the inequality

$$ |X|^{k-1} \le 1+|X|^k \quad\text{for } k = 1, 2, \ldots, $$

we can state that the existence of the k-th absolute moment ensures the existence of the (k−1)-th moment.

Chebyshev Inequality
Let X be a random variable. For k ≥ 1 and any ε > 0 the inequality

$$ P\bigl(|X| \ge \varepsilon\bigr) \le \frac{E\bigl(|X|^k\bigr)}{\varepsilon^k} $$

holds.
Exercise 1.9-5: Second order moments for Poisson distribution

Exercise 1.9-6: Second order moments for exponential distribution

Exercise 1.9-7: Second order moments for normal distribution

Exercise 1.9-8: Application and proof of the Chebyshev inequality
Characteristic Function
The characteristic function of the random variable X is defined by taking the expected value of g(X) = e^{jsX}, i.e.

$$ \Phi_X(s) = E\bigl(e^{jsX}\bigr) = \int_{-\infty}^{\infty} e^{jsx}\,dF_X(x) = \begin{cases} \displaystyle\sum_i e^{jsx_i}\,p_X(x_i) & \text{when } X \text{ is discrete} \\ \displaystyle\int_{-\infty}^{\infty} e^{jsx}\,f_X(x)\,dx & \text{when } X \text{ is continuous,} \end{cases} $$

where s ∈ ℝ.
Characteristic functions have the properties:

- Φ_X(s) is continuous in s
  (absolute and uniform convergence of the sum resp. integral)

- |Φ_X(s)| ≤ 1 for all s ∈ ℝ

- Φ_X(0) = E(e⁰) = E(1) = 1
  (the characteristic function takes its maximum at s = 0)

Note:

$$ |\Phi_X(s)| = \left|\int_{-\infty}^{\infty} e^{jsx} f_X(x)\,dx\right| \le \int_{-\infty}^{\infty} \bigl|e^{jsx}\bigr|\,f_X(x)\,dx = \int_{-\infty}^{\infty} f_X(x)\,dx = 1 $$

$$ |\Phi_X(s)| = \Bigl|\sum_i e^{jsx_i}\,p_X(x_i)\Bigr| \le \sum_i \bigl|e^{jsx_i}\bigr|\,p_X(x_i) = \sum_i p_X(x_i) = 1 $$
Moreover, one can easily observe that Φ_X(−ω) equals the Fourier transform of f_X(x). Hence, the properties of a characteristic function are essentially equivalent to the properties of a Fourier transform (one-to-one mapping).

Consequently, the probability distribution of a random variable is uniquely defined by the inverse Fourier transform of the characteristic function Φ_X(s), i.e.

$$ f_X(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-jsx}\,\Phi_X(s)\,ds. $$
Let X be a random variable with characteristic function Φ_X(s) and Y = aX + b. Thus, the characteristic function of Y can be easily determined by

$$ \Phi_Y(s) = E\bigl(e^{jsY}\bigr) = E\bigl(e^{js(aX+b)}\bigr) = e^{jbs}\,E\bigl(e^{jasX}\bigr) = e^{jbs}\,\Phi_X(as). $$

Its probability density function can be derived by applying the inverse Fourier transform as follows:

$$ f_Y(y) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-jsy}\,\Phi_Y(s)\,ds = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-js(y-b)}\,\Phi_X(as)\,ds = \frac{1}{|a|}\,\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-js'(y-b)/a}\,\Phi_X(s')\,ds' = \frac{1}{|a|}\,f_X\!\left(\frac{y-b}{a}\right). $$
Moment Theorem
Suppose that E(X^k) exists for any k ≥ 1, i.e. E(|X|^k) < ∞, and therefore

$$ \frac{d^k\Phi_X(s)}{ds^k} = \frac{d^k E\bigl(e^{jsX}\bigr)}{ds^k} = E\left(\frac{\partial^k e^{jsX}}{\partial s^k}\right) = j^k\,E\bigl(X^k e^{jsX}\bigr) $$

holds, i.e. the order of differentiation and integration can be interchanged; we can then deduce the so-called moment theorem

$$ m_k = E\bigl(X^k\bigr) = \frac{1}{j^k}\,\frac{d^k\Phi_X(s)}{ds^k}\bigg|_{s=0}. $$
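The moment theorem can be verified symbolically. The sketch below (an added illustration, assuming SymPy is available) uses the known characteristic function of the normal distribution, Φ_X(s) = exp(jµs − σ²s²/2) (cf. Exercise 1.9-9), and recovers m₁ = µ and m₂ = µ² + σ²:

```python
import sympy as sp

s, mu, sigma = sp.symbols('s mu sigma', real=True)
j = sp.I

# Characteristic function of X ~ N(mu, sigma^2).
Phi = sp.exp(j * mu * s - sigma**2 * s**2 / 2)

# Moment theorem: m_k = (1/j^k) d^k Phi / ds^k evaluated at s = 0.
m1 = sp.simplify(sp.diff(Phi, s, 1).subs(s, 0) / j**1)
m2 = sp.simplify(sp.diff(Phi, s, 2).subs(s, 0) / j**2)
print(m1)             # mu
print(sp.expand(m2))  # mu**2 + sigma**2
```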
Exercise 1.9-9: Characteristic function of univariate normal distributions

Exercise 1.9-10: Higher order moments of univariate normal distributions

Exercise 1.9-11: Non-negative definiteness of the characteristic function
1.9.2 Expectation for Bivariate Distributions

Expected Value of a Function of Two Random Variables
For a measurable function g(X,Y) of the random variables X and Y, the expected value of g(X,Y) is defined by

$$ E\bigl(g(X,Y)\bigr) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\,d^2F_{XY}(x,y) = \begin{cases} \displaystyle\sum_i\sum_j g(x_i,y_j)\,p_{XY}(x_i,y_j) & \text{in the discrete case} \\ \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\,f_{XY}(x,y)\,dx\,dy & \text{in the continuous case,} \end{cases} $$

provided that the sum respectively integral converges absolutely.
Let F_Z(z) denote the distribution function of the random variable Z = g(X,Y). Then, analogous to the univariate case, we can show that

$$ E\bigl(g(X,Y)\bigr) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\,d^2F_{XY}(x,y) = \int_{-\infty}^{\infty} z\,dF_Z(z) = E(Z). $$

That is, the expected value of a function of two random variables g(X,Y) can be computed directly without first determining the distribution function of the random variable Z = g(X,Y).
Expected Value of a Linear Combination
For the expected value of a linear combination we obtain

$$ E\left(\sum_i a_i\,g_i(X,Y)\right) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\sum_i a_i\,g_i(x,y)\,d^2F_{XY}(x,y) = \sum_i a_i\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g_i(x,y)\,d^2F_{XY}(x,y) = \sum_i a_i\,E\bigl(g_i(X,Y)\bigr). $$

Thus, the expected value of a linear combination equals the linear combination of the expected values. We note in particular that

$$ E\bigl(aX^k+bY^k\bigr) = a\,E(X^k)+b\,E(Y^k) \quad\text{for } k \ge 1. $$
Bivariate Moments
The (k,l)-th moment and centralised moment of discretely distributed random variables are defined by

$$ m_{kl} = E\bigl(X^kY^l\bigr) = \sum_i\sum_j x_i^k\,y_j^l\;p_{XY}(x_i,y_j), \quad k = 1,2,\ldots;\; l = 1,2,\ldots $$

and

$$ c_{kl} = E\Bigl(\bigl(X-E(X)\bigr)^k\bigl(Y-E(Y)\bigr)^l\Bigr) = E\bigl((X-\mu_X)^k(Y-\mu_Y)^l\bigr) = \sum_i\sum_j (x_i-\mu_X)^k(y_j-\mu_Y)^l\;p_{XY}(x_i,y_j), $$

respectively.
In case of continuously distributed random variables, the (k,l)-th moment and centralised moment are defined by

$$m_{kl} = E\big(X^kY^l\big) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x^k y^l\,f_{XY}(x,y)\,dx\,dy, \qquad k,l = 0,1,2,\ldots$$

and

$$c_{kl} = E\Big(\big(X-\mu_X\big)^k\big(Y-\mu_Y\big)^l\Big) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (x-\mu_X)^k(y-\mu_Y)^l\,f_{XY}(x,y)\,dx\,dy, \qquad k,l = 0,1,2,\ldots$$

respectively.
On setting k = 0 or l = 0, the moments reduce to the corresponding moments of the marginal distributions of X and Y. However, if k ≥ 1 and l ≥ 1, the moments become functions of the complete bivariate distribution.

In particular, setting k = l = 1, the centralised moment, called the covariance between X and Y, is given by

$$c_{11} = \mathrm{Cov}(X,Y) = E\big((X-\mu_X)(Y-\mu_Y)\big) = \sum_i\sum_j (x_i-\mu_X)(y_j-\mu_Y)\,p_{XY}(x_i,y_j)$$

and

$$c_{11} = \mathrm{Cov}(X,Y) = E\big((X-\mu_X)(Y-\mu_Y)\big) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (x-\mu_X)(y-\mu_Y)\,f_{XY}(x,y)\,dx\,dy$$

for the discrete and continuous case, respectively.

The covariance can be expressed by first and second order moments as follows:

$$c_{11} = \mathrm{Cov}(X,Y) = E\big((X-\mu_X)(Y-\mu_Y)\big) = E(XY) - \mu_X\mu_Y.$$

Furthermore, we have

$$c_{20} = \mathrm{Cov}(X,X) = E\big((X-\mu_X)^2\big) = \mathrm{Var}(X) = E\big(X^2\big) - \mu_X^2 = \sigma_X^2.$$
The covariance measures the degree of linear association between X and Y: the larger (resp. smaller) the magnitude of the covariance, the stronger (resp. weaker) the linear association. To achieve a unified understanding of what is large and what is small, we introduce the normalized quantity

$$\rho(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{c_{11}}{\sqrt{c_{20}\,c_{02}}},$$

which is called the correlation coefficient between X and Y.
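For intuition, both quantities are easily estimated from samples; a minimal Python sketch, assuming the illustrative model Y = 2X + N with X and N independent standard normal, so that Cov(X,Y) = 2 and ρ(X,Y) = 2/√5 ≈ 0.894:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2.0*x + rng.normal(size=100_000)             # assumed toy model

c11 = np.mean((x - x.mean()) * (y - y.mean()))   # sample covariance
rho = c11 / (x.std() * y.std())                  # sample correlation coefficient
print(c11, rho)                                  # approx. 2.0 and 0.894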
Exercise 1.9-12: Covariance and correlation coefficient of bivariate normal distributions
Theorem:
For all bivariate distributions with finite second order moments the Cauchy–Schwarz inequality

$$\big(E(XY)\big)^2 \le E\big(X^2\big)\,E\big(Y^2\big)$$

holds, and the correlation coefficient satisfies the inequality

$$|\rho(X,Y)| \le 1,$$

where equality holds if, with probability 1, a linear relationship between X and Y exists.
Exercise 1.9-13: Proof of the inequalities
Uncorrelatedness, Orthogonality and Independence
Let X, Y be random variables. Then X, Y are called

(1) uncorrelated, if

$$\rho(X,Y) = 0 \;\Rightarrow\; \mathrm{Cov}(X,Y) = 0 \;\Rightarrow\; E(XY) = E(X)\,E(Y),$$

(2) orthogonal, if

$$E(XY) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x\,y\,f_{XY}(x,y)\,dx\,dy = 0,$$

(3) independent, if for all x, y ∈ ℝ

$$f_{XY}(x,y) = f_X(x)\cdot f_Y(y).$$
Implications:
1) If X and Y are independent random variables and g1(x) and g2(y) are measurable functions, then U = g1(X) and V = g2(Y) are independent and uncorrelated random variables.
2) If X and Y are orthogonal random variables, then E((X+Y)²) = E(X²) + E(Y²) holds.
3) If X and Y are orthogonal random variables and E(X) = 0 and/or E(Y) = 0, then X and Y are uncorrelated random variables.
Remarks:
§ If X and Y are uncorrelated random variables, then U = g1(X) and V = g2(Y) are not necessarily uncorrelated random variables.
§ If X and Y are uncorrelated random variables, then X and Y are not necessarily independent random variables.
§ If X and Y are uncorrelated and jointly normally distributed random variables, then X and Y are also independent random variables.
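A small numerical illustration of the second remark, assuming X ∼ N(0,1) and Y = X²: the pair is uncorrelated, since E(XY) − E(X)E(Y) = E(X³) = 0, yet Y is a deterministic function of X and hence anything but independent of it.

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)
y = x**2                                      # Y is a function of X: dependent

print(np.mean(x*y) - np.mean(x)*np.mean(y))   # close to 0: uncorrelated
print(np.mean(y[np.abs(x) > 1]), np.mean(y))  # conditional mean differs: not independent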
Exercise 1.9-14: Verification of the remarks
Conditional Expected Value
Suppose that X and Y are bivariate distributed continuous random variables. The conditional expected value of X, given Y = y, written as E_X(X | Y = y), is defined by

$$E_X(X\,|\,Y=y) = \int_{-\infty}^{\infty} x\,f_X(x\,|\,Y=y)\,dx = \int_{-\infty}^{\infty} x\,\frac{f_{XY}(x,y)}{f_Y(y)}\,dx = \frac{1}{f_Y(y)}\int_{-\infty}^{\infty} x\,f_{XY}(x,y)\,dx,$$

provided that the integral converges absolutely. The corresponding expression for the discrete case is obtained with obvious modifications.
In general, the value of E_X(X | Y = y) will vary as we vary the value of y. Thus, E_X(X | Y = y) is a function of y and we can write

$$E_X(X\,|\,Y=y) = \psi_X(y),$$

where ψ_X(y) is called the regression function of X on Y. Analogously, the conditional expected value of Y, given X = x, is defined by

$$E_Y(Y\,|\,X=x) = \psi_Y(x),$$

where ψ_Y(x) denotes the regression function of Y on X.
More generally, if we consider a measurable function g(X) of X whose expected value exists, then the conditional expected value of g(X), given Y = y, is given by

$$E_X\big(g(X)\,|\,Y=y\big) = \int_{-\infty}^{\infty} g(x)\,f_X(x\,|\,Y=y)\,dx = \frac{1}{f_Y(y)}\int_{-\infty}^{\infty} g(x)\,f_{XY}(x,y)\,dx = \psi_{g(X)}(y).$$

Similarly, the conditional expected value of g(Y), given X = x, is defined by

$$E_Y\big(g(Y)\,|\,X=x\big) = \psi_{g(Y)}(x).$$
Conditional Expectation
Let us now consider the random variable ψ_{g(X)}(Y), which we obtain by replacing y by Y in the function ψ_{g(X)}(·). The random variable ψ_{g(X)}(Y) is called the conditional expectation of g(X), given Y, and we write

$$\psi_{g(X)}(Y) = E_X\big(g(X)\,|\,Y\big).$$

For the random variable ψ_{g(Y)}(X), we analogously write

$$\psi_{g(Y)}(X) = E_Y\big(g(Y)\,|\,X\big).$$
Properties of conditional expectations:
(1) E_Y[E_X(X | Y)] = E(X) and E_X[E_Y(Y | X)] = E(Y), or more generally
E_Y[E_X(g(X) | Y)] = E(g(X)) and E_X[E_Y(g(Y) | X)] = E(g(Y)).
(2) Moreover, if h(Y) is a function such that E(g(X)h(Y)) exists, then
E_X(g(X)h(Y) | Y = y) = h(y)·E_X(g(X) | Y = y)
(conditional on Y = y, h(Y) can be treated as a constant), and hence we have
E_Y[E_X(g(X)h(Y) | Y)] = E_Y[h(Y)·E_X(g(X) | Y)] = E(g(X)h(Y)).
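Property (1) can be verified exactly on a small discrete example; in the sketch below the joint pmf table is assumed purely for illustration.

import numpy as np

# assumed illustrative joint pmf p_XY (rows: values of X, columns: values of Y)
x_vals = np.array([0.0, 1.0, 2.0])
p_xy = np.array([[0.10, 0.20],
                 [0.25, 0.15],
                 [0.05, 0.25]])                       # entries sum to 1

p_y = p_xy.sum(axis=0)                                # marginal pmf of Y
psi_X = (x_vals[:, None] * p_xy).sum(axis=0) / p_y    # E_X(X | Y = y) for each y

lhs = (psi_X * p_y).sum()                             # E_Y[E_X(X | Y)]
rhs = (x_vals[:, None] * p_xy).sum()                  # E(X)
print(lhs, rhs)                                       # identical up to rounding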
Exercise 1.9-15: Verification of the properties

Exercise 1.9-16: Determination of E(XY) for bivariate normally distributed random variables
Bivariate Characteristic Function
The bivariate characteristic function of the random variables X and Y is defined by

$$\Phi_{XY}(s_1,s_2) = E\Big(\exp\big(j(s_1X + s_2Y)\big)\Big) \quad\text{with } (s_1,s_2)^T \in \mathbb{R}^2,$$

and we can write

$$\Phi_{XY}(s_1,s_2) = \sum_n\sum_m \exp\big(j(s_1x_n + s_2y_m)\big)\,p_{XY}(x_n,y_m)$$

and

$$\Phi_{XY}(s_1,s_2) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \exp\big(j(s_1x + s_2y)\big)\,f_{XY}(x,y)\,dx\,dy$$

for the discrete and continuous case, respectively.
Properties of bivariate characteristic functions:
§ Φ_XY(s1,s2) is continuous in s1 and s2
§ |Φ_XY(s1,s2)| ≤ 1 for all (s1,s2)ᵀ ∈ ℝ²
§ Φ_XY(s1,0) = Φ_X(s1) and Φ_XY(0,s2) = Φ_Y(s2)
§ Φ_XY(0,0) = E(e⁰) = E(1) = 1
§ Φ_XY(−ω1,−ω2) is the 2-d Fourier transform of f_XY(x,y)

$$\Rightarrow\; f_{XY}(x,y) = \frac{1}{(2\pi)^2}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-j(s_1x+s_2y)}\,\Phi_{XY}(s_1,s_2)\,ds_1\,ds_2$$

§ the random variables X and Y are independent, iff Φ_XY(s1,s2) = Φ_X(s1)·Φ_Y(s2)
Exercise 1.9-17: Determine the density of Z = X + Y, if X and Y are independent random variables
Moment Theorem
Supposing the moments m_kl = E(X^kY^l) exist and therefore

$$\frac{\partial^{k+l}\Phi_{XY}(s_1,s_2)}{\partial s_1^k\,\partial s_2^l} = \frac{\partial^{k+l} E\big(e^{j(s_1X+s_2Y)}\big)}{\partial s_1^k\,\partial s_2^l} = j^{k+l}\,E\big(X^kY^l\,e^{j(s_1X+s_2Y)}\big)$$

holds, i.e. the order of differentiation and integration can be interchanged, we can deduce the moment theorem

$$m_{kl} = E\big(X^kY^l\big) = \frac{1}{j^{k+l}}\left.\frac{\partial^{k+l}\Phi_{XY}(s_1,s_2)}{\partial s_1^k\,\partial s_2^l}\right|_{s_1=0,\,s_2=0}.$$
1.9.3 Mean Square Error Estimation
Non-linear Mean Square Error Estimation
We wish to estimate the random variable Y by a function of the random variable X. Our aim is to find a function g(X) such that the mean square error

$$q(g) = E\Big(\big(Y-g(X)\big)^2\Big) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \big(y-g(x)\big)^2\,f_{XY}(x,y)\,dx\,dy$$

is minimized. Over the class of all functions g for which the expected value exists, q(g) is minimized by choosing

$$g(x) = \psi_Y(x) = E_Y(Y\,|\,X=x).$$
Exercise 1.9-18: Proof of the non-linear mean square error estimation result
Linear Mean Square Error Estimation
Now we wish to estimate the random variable Y by a linear function of X, i.e. g(X) = a1·X + a0. The objective is to minimise the mean square error

$$q(a_1,a_0) = E\Big[\big(Y-(a_1X+a_0)\big)^2\Big]$$

by varying the parameters a1 and a0. That is, we want to solve the minimisation problem

$$\hat{\mathbf a} = (\hat a_1,\hat a_0)^T = \underset{a_1,a_0}{\arg\min}\,q(a_1,a_0).$$

The mean square error is minimum if the equation system

$$\begin{pmatrix} \partial q(a_1,a_0)/\partial a_1\\[2pt] \partial q(a_1,a_0)/\partial a_0 \end{pmatrix}_{a_1=\hat a_1,\,a_0=\hat a_0} = \begin{pmatrix} -2\,E\big((Y-\hat a_1X-\hat a_0)\,X\big)\\[2pt] -2\,E\big(Y-\hat a_1X-\hat a_0\big) \end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}$$

is satisfied. Solving the equation system provides

$$\hat a_1 = \frac{E(XY)-E(X)E(Y)}{E(X^2)-\big(E(X)\big)^2} = \frac{m_{11}-m_{10}m_{01}}{m_{20}-m_{10}^2} = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)} = \rho(X,Y)\,\frac{\sigma_Y}{\sigma_X}$$

and

$$\hat a_0 = E(Y) - \hat a_1\,E(X) = m_{01} - \hat a_1 m_{10} = \frac{m_{20}m_{01}-m_{10}m_{11}}{m_{20}-m_{10}^2}.$$
Substitution of a1 and a0 by â1 and â0 in q(a1,a0) gives the minimum mean square error

$$q(\hat a_1,\hat a_0) = E\Big(\big(Y-(\hat a_1X+\hat a_0)\big)^2\Big) = E\Big(\big(Y-E(Y)-\hat a_1(X-E(X))\big)^2\Big) = \mathrm{Var}(Y) - 2\hat a_1\,\mathrm{Cov}(X,Y) + \hat a_1^2\,\mathrm{Var}(X) = \sigma_Y^2\big(1-\rho^2(X,Y)\big).$$

Moreover, the minimum mean square errors of the linear and the non-linear approach obviously satisfy the relationship

$$q(\hat a_1,\hat a_0) \ge q(\hat g).$$
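A brief numerical sketch of these formulas, assuming the illustrative model Y = 3X + 1 + N with X ∼ N(0,1) and independent noise N ∼ N(0,4): the moment-based estimates of â1 and â0 recover the model, and the residual mean square error matches σ_Y²(1 − ρ²(X,Y)).

import numpy as np

rng = np.random.default_rng(2)
n = 200_000

x = rng.normal(size=n)                          # assumed toy model
y = 3.0*x + 1.0 + rng.normal(scale=2.0, size=n)

a1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)  # Cov(X,Y)/Var(X)
a0 = y.mean() - a1 * x.mean()                   # E(Y) - a1 E(X)

mse = np.mean((y - (a1*x + a0))**2)
rho = np.corrcoef(x, y)[0, 1]
print(a1, a0)                                   # close to 3 and 1
print(mse, np.var(y)*(1 - rho**2))              # both close to 4, the noise variance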
Exercise 1.9-19: Mean square error estimation for bivariate normally distributed random variables
1.10 Vector-valued Random Variables
1.10.1 Multivariate Distributions
Multivariate Distributions and Density Functions
The basic ideas of bivariate distributions are easily extended to the general case, where instead of two, n random variables X1, X2, …, Xn are considered. Thus, the distribution function of the random vector X = (X1, X2, …, Xn)ᵀ is defined by

$$F_{\mathbf X}(\mathbf x) = F_{X_1\ldots X_n}(x_1,\ldots,x_n) = P(X_1\le x_1,\ldots,X_n\le x_n)$$

and the density function is given by

$$f_{\mathbf X}(\mathbf x) = f_{X_1\ldots X_n}(x_1,\ldots,x_n) = \frac{\partial^n}{\partial x_1\cdots\partial x_n}\,F_{X_1\ldots X_n}(x_1,\ldots,x_n).$$

Marginal Distributions and Density Functions
For a given multivariate distribution function of X1, X2, …, Xn, the marginal distribution and density function of X1, X2, …, Xk can be expressed by

$$F_{X_1\ldots X_k}(x_1,\ldots,x_k) = F_{X_1\ldots X_k\ldots X_n}(x_1,\ldots,x_k,\infty,\ldots,\infty)$$

and

$$f_{X_1\ldots X_k}(x_1,\ldots,x_k) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} f_{X_1\ldots X_k\ldots X_n}(x_1,\ldots,x_k,x_{k+1},\ldots,x_n)\,dx_{k+1}\cdots dx_n,$$

respectively.
Conditional Distributions and Density Functions
The conditional distribution and density function of X1, …, Xk under the condition {X_{k+1} = x_{k+1}, …, Xn = xn} are given by

$$F_{X_1\ldots X_k}(x_1,\ldots,x_k\,|\,X_{k+1}=x_{k+1},\ldots,X_n=x_n) = P(X_1\le x_1,\ldots,X_k\le x_k\,|\,X_{k+1}=x_{k+1},\ldots,X_n=x_n)$$

and

$$f_{X_1\ldots X_k}(x_1,\ldots,x_k\,|\,X_{k+1}=x_{k+1},\ldots,X_n=x_n) = \frac{f_{X_1\ldots X_n}(x_1,\ldots,x_n)}{f_{X_{k+1}\ldots X_n}(x_{k+1},\ldots,x_n)}.$$
Notation:

$$F_{X_1\ldots X_k}(x_1,\ldots,x_k\,|\,x_{k+1},\ldots,x_n) = F_{X_1\ldots X_k}(x_1,\ldots,x_k\,|\,X_{k+1}=x_{k+1},\ldots,X_n=x_n)$$

and

$$f_{X_1\ldots X_k}(x_1,\ldots,x_k\,|\,x_{k+1},\ldots,x_n) = f_{X_1\ldots X_k}(x_1,\ldots,x_k\,|\,X_{k+1}=x_{k+1},\ldots,X_n=x_n).$$

Exercise 1.10-1: Calculations with conditional densities
Independent Random Variables
The random variables X1, X2, …, Xn are said to be independent if the multivariate distribution or density function breaks down into the product of n marginal distributions or density functions. Thus, X1, X2, …, Xn are independent if the multivariate distribution can be written in the form

$$F_{X_1\ldots X_n}(x_1,\ldots,x_n) = \prod_{i=1}^{n} F_{X_i}(x_i)$$

(valid for the discrete and continuous case) or if the density function can be written in the form

$$f_{X_1\ldots X_n}(x_1,\ldots,x_n) = \prod_{i=1}^{n} f_{X_i}(x_i)$$

(applies for the continuous case).

The random variables X1, …, Xk are independent of the random variables X_{k+1}, …, Xn if the multivariate distribution or the density function can be expressed by

$$F_{X_1\ldots X_n}(x_1,\ldots,x_n) = F_{X_1\ldots X_k}(x_1,\ldots,x_k)\cdot F_{X_{k+1}\ldots X_n}(x_{k+1},\ldots,x_n)$$

or

$$f_{X_1\ldots X_n}(x_1,\ldots,x_n) = f_{X_1\ldots X_k}(x_1,\ldots,x_k)\cdot f_{X_{k+1}\ldots X_n}(x_{k+1},\ldots,x_n).$$
1.10.2 Transformation of Vector-valued Random Variables
Suppose X1, X2, …, Xn are random variables and g1, …, gm are functions with m ≤ n such that

$$Y_1 = g_1(X_1,\ldots,X_n),\;\ldots,\;Y_m = g_m(X_1,\ldots,X_n)$$

are random variables. Then the multivariate distribution of Y1, Y2, …, Ym is given by

$$F_{Y_1\ldots Y_m}(y_1,\ldots,y_m) = P\big(g_1(X_1,\ldots,X_n)\le y_1,\ldots,g_m(X_1,\ldots,X_n)\le y_m\big).$$
The multivariate density function of Y1, Y2, …, Ym can be determined as follows. In case of m < n we define n − m functions (auxiliary variables)

$$y_i = g_i(x_1,\ldots,x_n) \quad\text{for } i = 1,\ldots,m,$$
$$y_i = x_i = g_i(x_1,\ldots,x_n) \quad\text{for } i = m+1,\ldots,n.$$

Assume that the equation system

$$(y_1,\ldots,y_n)^T = \big(g_1(x_1,\ldots,x_n),\ldots,g_n(x_1,\ldots,x_n)\big)^T$$

has the unique solution

$$(x_1,\ldots,x_n)^T = \big(g_1^{-1}(y_1,\ldots,y_n),\ldots,g_n^{-1}(y_1,\ldots,y_n)\big)^T.$$
Furthermore, suppose g1(x1,…,xn), …, gn(x1,…,xn) have continuous partial derivatives and the determinant of the Jacobian

$$\mathbf J(x_1,\ldots,x_n) = \begin{pmatrix} \dfrac{\partial g_1}{\partial x_1} & \cdots & \dfrac{\partial g_1}{\partial x_n}\\ \vdots & & \vdots\\ \dfrac{\partial g_n}{\partial x_1} & \cdots & \dfrac{\partial g_n}{\partial x_n} \end{pmatrix}$$

does not vanish, i.e.

$$\det\big(\mathbf J(x_1,\ldots,x_n)\big) \ne 0,$$
then the multivariate density function of Y1, Y2, …, Yn is given by

$$f_{Y_1\ldots Y_n}(y_1,\ldots,y_n) = \left.\frac{f_{X_1\ldots X_n}(x_1,\ldots,x_n)}{\big|\det \mathbf J(x_1,\ldots,x_n)\big|}\right|_{x_i = g_i^{-1}(y_1,\ldots,y_n),\; i=1,\ldots,n}.$$

Finally, integration over y_{m+1}, …, yn provides the multivariate density function of Y1, Y2, …, Ym:

$$f_{Y_1\ldots Y_m}(y_1,\ldots,y_m) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} f_{Y_1\ldots Y_n}(y_1,\ldots,y_n)\,dy_{m+1}\cdots dy_n.$$
Exercise 1.10-2: Density of a linear combination of random variables
1.10.3 Expectations for Vector-valued Random Variables
Expected Value
For a measurable function g(X1, X2, …, Xn) the expected value of g(X1, X2, …, Xn) is defined by

$$E\big(g(X_1,\ldots,X_n)\big) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} g(x_1,\ldots,x_n)\,d^nF_{X_1\ldots X_n}(x_1,\ldots,x_n) = \begin{cases} \displaystyle\sum_{i_1}\cdots\sum_{i_n} g(x_{1,i_1},\ldots,x_{n,i_n})\,p_{i_1\ldots i_n} & \text{(discrete case)}\\[6pt] \displaystyle\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} g(x_1,\ldots,x_n)\,f_{X_1\ldots X_n}(x_1,\ldots,x_n)\,dx_1\cdots dx_n & \text{(continuous case),} \end{cases}$$

provided that the sum resp. integral converges absolutely.
Conditional Expected Value
Let X1, X2, …, Xn be multivariate distributed continuous random variables. The conditional expected value of X1, given X2 = x2, …, Xn = xn, is defined by

$$E\big(X_1\,|\,X_2=x_2,\ldots,X_n=x_n\big) = \int_{-\infty}^{\infty} x_1\,f_{X_1}(x_1\,|\,x_2,\ldots,x_n)\,dx_1 = \int_{-\infty}^{\infty} x_1\,\frac{f_{X_1\ldots X_n}(x_1,\ldots,x_n)}{f_{X_2\ldots X_n}(x_2,\ldots,x_n)}\,dx_1 = \psi_{X_1}(x_2,\ldots,x_n).$$
Conditional Expectation
Now, replacing x2, …, xn by X2, …, Xn in ψ_{X_1}(x2,…,xn) we obtain the random variable

$$\psi_{X_1}(X_2,\ldots,X_n) = E\big(X_1\,|\,X_2,\ldots,X_n\big),$$

which is called the conditional expectation of X1, given X2, …, Xn.

Properties of conditional expectations:

$$E\big[E(X_1\,|\,X_2,\ldots,X_n)\big] = E(X_1)$$

$$E\big(X_1X_2\,|\,X_3\big) = E\big[E(X_1X_2\,|\,X_2,X_3)\,|\,X_3\big] = E\big[X_2\,E(X_1\,|\,X_2,X_3)\,|\,X_3\big]$$
Uncorrelated and Orthogonal Random Variables
The random variables X1, …, Xn are called uncorrelated resp. orthogonal if for all i ≠ j

$$E(X_iX_j) = E(X_i)\,E(X_j) \quad\text{resp.}\quad E(X_iX_j) = 0$$

holds. Consequently, if X1, …, Xn are uncorrelated resp. orthogonal and

$$U = \sum_{i=1}^{n} X_i,$$

we can write

$$\sigma_U^2 = \sum_{i=1}^{n}\sigma_{X_i}^2 \quad\text{resp.}\quad E\big(U^2\big) = \sum_{i=1}^{n} E\big(X_i^2\big).$$
Moments
Let X1, …, Xn be multivariate distributed continuous random variables. Then the joint moments of X1, …, Xn can be determined by

$$m_{k_1\ldots k_n} = E\big(X_1^{k_1}\cdots X_n^{k_n}\big) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} x_1^{k_1}\cdots x_n^{k_n}\,f_{X_1\ldots X_n}(x_1,\ldots,x_n)\,dx_1\cdots dx_n,$$

where the order of the moments is defined by

$$r = k_1 + \cdots + k_n.$$
Characteristic Functions
The characteristic function of the multivariate distributed random variables X1, …, Xn is given by

$$\Phi_{\mathbf X}(\mathbf s) = \Phi_{X_1\ldots X_n}(s_1,\ldots,s_n) = E\left(\exp\Big(j\sum_{i=1}^{n} s_iX_i\Big)\right) = E\big[\exp\big(j\,\mathbf s^T\mathbf X\big)\big].$$

Hence, if the random variables X1, …, Xn are independent, we obtain

$$\Phi_{\mathbf X}(\mathbf s) = E\left(\prod_{i=1}^{n}\exp(js_iX_i)\right) = \prod_{i=1}^{n} E\big[\exp(js_iX_i)\big] = \prod_{i=1}^{n}\Phi_{X_i}(s_i).$$
Let X1, …, Xn now be independent random variables and

$$U = \sum_{i=1}^{n} X_i,$$

then the characteristic function of U is given by

$$\Phi_U(s) = E\big[\exp(jsU)\big] = E\left(\exp\Big(js\sum_{i=1}^{n}X_i\Big)\right) = \prod_{i=1}^{n}\Phi_{X_i}(s).$$

If, in addition, X1, …, Xn possess the density functions f_{X_1}, …, f_{X_n}, the density function of U can be determined by

$$f_U(u) = \big(f_{X_1} * f_{X_2} * \cdots * f_{X_n}\big)(u).$$
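Numerically, this convolution can be evaluated on a grid; a minimal sketch, assuming X1, X2 ∼ U(0,1) independent, whose sum has the triangular density on (0, 2):

import numpy as np

dx = 0.001
x = np.arange(0.0, 1.0, dx)
f1 = np.ones_like(x)                  # density of U(0,1)
f2 = np.ones_like(x)

f_U = np.convolve(f1, f2) * dx        # numerical convolution: density of U = X1 + X2
u = np.arange(f_U.size) * dx
print(f_U.sum() * dx)                 # ~1: the result is again a density
print(f_U[np.argmin(np.abs(u - 1))])  # ~1: the triangular density peaks at u = 1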
Exercise 1.10-3: Sample mean and sample variance of independent normally distributed random variables
Moment Theorem
Suppose that the moments m_{k_1…k_n} = E(X1^{k_1}⋯Xn^{k_n}) exist and therefore

$$\frac{\partial^{k_1+\cdots+k_n}\Phi_{X_1\ldots X_n}(s_1,\ldots,s_n)}{\partial s_1^{k_1}\cdots\partial s_n^{k_n}} = \frac{\partial^{k_1+\cdots+k_n} E\big(e^{j(s_1X_1+\cdots+s_nX_n)}\big)}{\partial s_1^{k_1}\cdots\partial s_n^{k_n}} = j^{k_1+\cdots+k_n}\,E\big(X_1^{k_1}\cdots X_n^{k_n}\,e^{j(s_1X_1+\cdots+s_nX_n)}\big)$$

holds, we can deduce the moment theorem

$$m_{k_1\ldots k_n} = E\big(X_1^{k_1}\cdots X_n^{k_n}\big) = \frac{1}{j^{k_1+\cdots+k_n}}\left.\frac{\partial^{k_1+\cdots+k_n}\Phi_{X_1\ldots X_n}(s_1,\ldots,s_n)}{\partial s_1^{k_1}\cdots\partial s_n^{k_n}}\right|_{s_1=0,\ldots,s_n=0}.$$
Exercise 1.10-4: Application of the Moment Theorem
1.10.4 Mean Square Error Estimation
Non-linear Mean Square Error Estimation
To estimate the random variable X0 by means of the random variables X1, …, Xn, we want to find a function g(X1, …, Xn) that minimizes the mean square error

$$q(g) = E\Big(\big(X_0 - g(X_1,\ldots,X_n)\big)^2\Big) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}\big(x_0 - g(x_1,\ldots,x_n)\big)^2\,f_{X_0\ldots X_n}(x_0,\ldots,x_n)\,dx_0\cdots dx_n.$$

Over the class of all functions g for which the expected value exists, q(g) is minimised by choosing

$$g(x_1,\ldots,x_n) = \psi_{X_0}(x_1,\ldots,x_n) = E\big(X_0\,|\,X_1=x_1,\ldots,X_n=x_n\big).$$
Linear Mean Square Error Estimation
Now we wish to estimate the random variable X0 by a linear function of X1, …, Xn, i.e.

$$g(X_1,\ldots,X_n) = a_1X_1 + \cdots + a_nX_n = \sum_{i=1}^{n} a_iX_i = \mathbf a^T\mathbf X,$$

such that the mean square error

$$q(\mathbf a) = q(a_1,\ldots,a_n) = E\left(\Big(X_0 - \sum_{i=1}^{n} a_iX_i\Big)^2\right) = E\Big[\big(X_0 - \mathbf a^T\mathbf X\big)^2\Big]$$

is minimized by varying the parameter vector a. Thus, we have to solve the minimisation problem

$$\hat{\mathbf a} = (\hat a_1,\ldots,\hat a_n)^T = \underset{a_1,\ldots,a_n}{\arg\min}\,q(a_1,\ldots,a_n).$$

The mean square error is minimum if

$$\left.\frac{\partial q(\mathbf a)}{\partial a_i}\right|_{\mathbf a=\hat{\mathbf a}} = -2\,E\Big[\big(X_0 - \hat{\mathbf a}^T\mathbf X\big)X_i\Big] = 0$$

is satisfied for i = 1, …, n. After some manipulations we obtain the equation system

$$\mathbf R\,\hat{\mathbf a} = \mathbf r, \qquad\text{where}\qquad \mathbf R = E\big(\mathbf X\mathbf X^T\big) \;\text{ and }\; \mathbf r = E\big(X_0\mathbf X\big).$$
With the estimate â = R⁻¹r, the minimum mean square error is given by

$$q(\hat{\mathbf a}) = E\Big[\big(X_0-\hat{\mathbf a}^T\mathbf X\big)^2\Big] = E\Big[\big(X_0-(\mathbf R^{-1}\mathbf r)^T\mathbf X\big)^2\Big] = E\Big[\big(X_0-\mathbf r^T\mathbf R^{-1}\mathbf X\big)^2\Big]$$
$$= E\big(X_0X_0 - 2\,\mathbf r^T\mathbf R^{-1}\mathbf X X_0 + \mathbf r^T\mathbf R^{-1}\mathbf X\mathbf X^T\mathbf R^{-1}\mathbf r\big) = E\big(X_0^2\big) - 2\,\mathbf r^T\mathbf R^{-1}\mathbf r + \mathbf r^T\mathbf R^{-1}\mathbf R\mathbf R^{-1}\mathbf r = E\big(X_0^2\big) - \mathbf r^T\mathbf R^{-1}\mathbf r.$$
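A compact numerical sketch of the normal equations Râ = r, assuming the illustrative model X0 = X1 + 2X2 + N with independent standard normal observations and noise of variance 0.25:

import numpy as np

rng = np.random.default_rng(3)
n = 500_000

X = rng.normal(size=(2, n))                       # observations X = (X1, X2)^T
x0 = X[0] + 2.0*X[1] + rng.normal(scale=0.5, size=n)

R = X @ X.T / n                                   # R = E(X X^T) from samples
r = X @ x0 / n                                    # r = E(X0 X)
a_hat = np.linalg.solve(R, r)                     # solve R a = r

mse = np.mean((x0 - a_hat @ X)**2)
print(a_hat)                                      # close to (1, 2)
print(mse, np.mean(x0**2) - r @ a_hat)            # both close to 0.25, the noise variance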
Orthogonality Principle
The coefficient vector â that minimizes the mean square error satisfies the equation

$$E\Big[\big(X_0 - \hat{\mathbf a}^T\mathbf X\big)X_i\Big] = 0, \qquad i = 1,\ldots,n.$$

Thus, the residual

$$R = X_0 - \hat{\mathbf a}^T\mathbf X = X_0 - \hat X_0$$

is orthogonal to the observations X1, …, Xn. This is known as the orthogonality principle.
A geometric interpretation of this result is shown in the figure below, where X0, R and X1, …, Xn are thought of as vectors.

[Figure: the estimate X̂0 = âᵀX lies in the subspace spanned by X1, X2, X3, and the residual R = X0 − X̂0 stands orthogonally on that subspace.]
Consequently,
1) X̂0 = âᵀX is the orthogonal projection of X0 onto the subspace U spanned by the observations X1, …, Xn.
2) R is orthogonal to the subspace U and therefore orthogonal to the observations X1, …, Xn.
3) q(â) = E((X0 − X̂0)²) = E((X0 − X̂0)X0) = 0 if and only if X0 lies within the subspace U, i.e. X0 is linearly dependent on X1, …, Xn, thus X̂0 = âᵀX = X0.
4) q(â) = E((X0 − X̂0)X0) = E(X0²) if and only if X0 is orthogonal to the subspace U.
1.10.5 Multivariate Normal Distribution
Characteristic Function of a Normally Distributed Vector-valued Random Variable
Let U1, …, Um be m independent standardized normally distributed random variables, i.e. Uk ∼ N(0,1). The characteristic function of Uk is given by

$$\Phi_{U_k}(r_k) = \exp\left(-\frac{r_k^2}{2}\right),$$

cf. Exercise 1.9-9. Consequently, the characteristic function of the random vector U = (U1, …, Um)ᵀ is
$$\Phi_{\mathbf U}(\mathbf r) = E\big(\exp(j\,\mathbf r^T\mathbf U)\big) = E\left(\prod_{k=1}^{m}\exp(jr_kU_k)\right) = \prod_{k=1}^{m}E\big(\exp(jr_kU_k)\big) = \prod_{k=1}^{m}\Phi_{U_k}(r_k) = \exp\left(-\frac{1}{2}\sum_{k=1}^{m}r_k^2\right) = \exp\left(-\frac{1}{2}\,\mathbf r^T\mathbf r\right).$$

Transformation of the random vector U by an (n×m) matrix A provides the random vector

$$\mathbf V = (V_1,\ldots,V_n)^T = \mathbf A\mathbf U,$$

which possesses the characteristic function
$$\Phi_{\mathbf V}(\mathbf s) = E\big(\exp(j\,\mathbf s^T\mathbf V)\big) = E\big(\exp(j\,\mathbf s^T\mathbf A\mathbf U)\big) = \Phi_{\mathbf U}\big((\mathbf s^T\mathbf A)^T\big) = \exp\left(-\frac{1}{2}\,\mathbf s^T\mathbf A(\mathbf s^T\mathbf A)^T\right) = \exp\left(-\frac{1}{2}\,\mathbf s^T\mathbf A\mathbf A^T\mathbf s\right) = \exp\left(-\frac{1}{2}\,\mathbf s^T\boldsymbol\Sigma\,\mathbf s\right),$$

where

$$\boldsymbol\Sigma = E\big(\mathbf V\mathbf V^T\big) = E\big(\mathbf A\mathbf U(\mathbf A\mathbf U)^T\big) = E\big(\mathbf A\mathbf U\mathbf U^T\mathbf A^T\big) = \mathbf A\,E\big(\mathbf U\mathbf U^T\big)\,\mathbf A^T = \mathbf A\mathbf I\mathbf A^T = \mathbf A\mathbf A^T$$

represents the covariance matrix of V.
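This construction is also the standard recipe for generating correlated normal samples: transform independent N(0,1) variables by an assumed matrix A, and the sample covariance approaches Σ = AAᵀ.

import numpy as np

rng = np.random.default_rng(4)
n_samples = 500_000

A = np.array([[1.0, 0.0],
              [0.8, 0.6]])               # assumed mixing matrix; Sigma = A A^T

U = rng.normal(size=(2, n_samples))      # independent N(0,1) components
V = A @ U                                # V = A U is N(0, A A^T)

print(A @ A.T)                           # target covariance
print(V @ V.T / n_samples)               # sample covariance, close to A A^T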
Translation of the random vector V by a constant vector μ = (μ1, …, μn)ᵀ leads to the random vector

$$\mathbf W = (W_1,\ldots,W_n)^T = \mathbf V + \boldsymbol\mu = \mathbf A\mathbf U + \boldsymbol\mu.$$

The characteristic function of W can be expressed by

$$\Phi_{\mathbf W}(\mathbf s) = E\big(\exp(j\,\mathbf s^T\mathbf W)\big) = E\big(\exp(j\,\mathbf s^T(\mathbf V+\boldsymbol\mu))\big) = \exp(j\,\mathbf s^T\boldsymbol\mu)\,E\big(\exp(j\,\mathbf s^T\mathbf V)\big) = \exp\left(j\,\mathbf s^T\boldsymbol\mu - \frac{1}{2}\,\mathbf s^T\boldsymbol\Sigma\,\mathbf s\right).$$

Φ_W(s) represents the characteristic function of the normally distributed random vector W, where μ and Σ denote the vector-valued expected value and the symmetric, non-negative definite covariance matrix, respectively, i.e.

$$E(\mathbf W) = E(\mathbf V+\boldsymbol\mu) = E(\mathbf V)+\boldsymbol\mu = \boldsymbol\mu,$$

$$E\big((\mathbf W-\boldsymbol\mu)(\mathbf W-\boldsymbol\mu)^T\big) = E\big(\mathbf V\mathbf V^T\big) = \boldsymbol\Sigma = \boldsymbol\Sigma^T \qquad\text{and}\qquad \mathbf s^T\boldsymbol\Sigma\,\mathbf s \ge 0 \;\;\forall\,\mathbf s\in\mathbb{R}^n.$$

Theorem:
Let X ∼ N_m(μ_X, Σ_X) and Y = AX + b, where the entries of the (n×m) matrix A and the (n×1) vector b are constants. Then Y ∼ N_n(μ_Y, Σ_Y) with

$$\boldsymbol\mu_Y = E(\mathbf Y) = \mathbf A\boldsymbol\mu_X + \mathbf b \qquad\text{and}\qquad \boldsymbol\Sigma_Y = E\big((\mathbf Y-\boldsymbol\mu_Y)(\mathbf Y-\boldsymbol\mu_Y)^T\big) = \mathbf A\boldsymbol\Sigma_X\mathbf A^T.$$
Exercise 1.10-5: Proof of the Theorem
Density Function of a Normally Distributed Vector-valued Random Variable
Let U1, …, Um be m independent standardized normally distributed random variables, i.e. Uk ∼ N(0,1). Thus, the density function of the random vector U = (U1, …, Um)ᵀ can be written as

$$f_{\mathbf U}(\mathbf u) = \prod_{k=1}^{m} f_{U_k}(u_k) = \prod_{k=1}^{m}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}u_k^2\right) = (2\pi)^{-\frac{m}{2}}\exp\left(-\frac{1}{2}\sum_{k=1}^{m}u_k^2\right) = (2\pi)^{-\frac{m}{2}}\exp\left(-\frac{1}{2}\,\mathbf u^T\mathbf u\right).$$
Now, we transform the random vector U by

$$\mathbf V = (V_1,\ldots,V_m)^T = \mathbf A\mathbf U + \boldsymbol\mu,$$

where A denotes a regular (m×m) matrix, i.e. det(A) ≠ 0. With the inverse transform

$$\mathbf U = \mathbf A^{-1}(\mathbf V - \boldsymbol\mu),$$

the determinant of the Jacobian

$$\det\big(\mathbf J(\mathbf u)\big) = \det\left(\frac{\partial(v_1,\ldots,v_m)}{\partial(u_1,\ldots,u_m)}\right) = \det(\mathbf A),$$

the well known identities
$$\det(\mathbf A) = \det(\mathbf A^T), \qquad \det(\boldsymbol\Sigma) = \det(\mathbf A\mathbf A^T) = \det(\mathbf A)\det(\mathbf A^T) = \big(\det(\mathbf A)\big)^2 > 0,$$
$$\boldsymbol\Sigma^{-1} = (\mathbf A\mathbf A^T)^{-1} = (\mathbf A^T)^{-1}\mathbf A^{-1} = (\mathbf A^{-1})^T\mathbf A^{-1},$$

and the result derived for determining the density of a transformed random vector

$$f_{\mathbf V}(\mathbf v) = \frac{f_{\mathbf U}\big(\mathbf A^{-1}(\mathbf v-\boldsymbol\mu)\big)}{\big|\det(\mathbf A)\big|},$$
we obtain

$$f_{\mathbf V}(\mathbf v) = (2\pi)^{-\frac{m}{2}}\,\big|\det(\mathbf A)\big|^{-1}\exp\left(-\frac{1}{2}\big(\mathbf A^{-1}(\mathbf v-\boldsymbol\mu)\big)^T\mathbf A^{-1}(\mathbf v-\boldsymbol\mu)\right)$$
$$= (2\pi)^{-\frac{m}{2}}\,\big|\det(\mathbf A)\big|^{-1}\exp\left(-\frac{1}{2}(\mathbf v-\boldsymbol\mu)^T(\mathbf A^{-1})^T\mathbf A^{-1}(\mathbf v-\boldsymbol\mu)\right)$$
$$= (2\pi)^{-\frac{m}{2}}\,\big|\det(\mathbf A)\big|^{-1}\exp\left(-\frac{1}{2}(\mathbf v-\boldsymbol\mu)^T(\mathbf A\mathbf A^T)^{-1}(\mathbf v-\boldsymbol\mu)\right)$$
$$= (2\pi)^{-\frac{m}{2}}\,\det(\boldsymbol\Sigma)^{-\frac{1}{2}}\exp\left(-\frac{1}{2}(\mathbf v-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf v-\boldsymbol\mu)\right).$$
Exercise 1.10-6: Transformation to standardized normal distribution
Composed Vector-valued Random Variables
Theorem:
Let X be a random vector composed of the random vectors X1, X2 and obeying the distribution

$$\mathbf X = \begin{pmatrix}\mathbf X_1\\ \mathbf X_2\end{pmatrix} \sim N_k\left(\boldsymbol\mu = \begin{pmatrix}\boldsymbol\mu_1\\ \boldsymbol\mu_2\end{pmatrix},\; \boldsymbol\Sigma = \begin{pmatrix}\boldsymbol\Sigma_{11} & \boldsymbol\Sigma_{12}\\ \boldsymbol\Sigma_{21} & \boldsymbol\Sigma_{22}\end{pmatrix}\right).$$

Then X1 and X2 possess the marginal distributions

$$\mathbf X_1 \sim N_{k_1}\big(\boldsymbol\mu_1,\boldsymbol\Sigma_{11}\big), \qquad \mathbf X_2 \sim N_{k_2}\big(\boldsymbol\mu_2,\boldsymbol\Sigma_{22}\big),$$

and the conditional distribution is given by

$$\mathbf X_1\,|\,\mathbf X_2=\mathbf x_2 \;\sim\; N_{k_1}\Big(\boldsymbol\mu_1 + \boldsymbol\Sigma_{12}\boldsymbol\Sigma_{22}^{-1}(\mathbf x_2-\boldsymbol\mu_2),\; \boldsymbol\Sigma_{11} - \boldsymbol\Sigma_{12}\boldsymbol\Sigma_{22}^{-1}\boldsymbol\Sigma_{21}\Big).$$
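The conditional mean and covariance can be evaluated directly from the partitioned parameters; the numbers below are assumed purely for illustration.

import numpy as np

mu1 = np.array([0.0, 1.0])               # assumed partitioned parameters
mu2 = np.array([2.0])
S11 = np.array([[2.0, 0.3], [0.3, 1.0]])
S12 = np.array([[0.5], [0.2]])
S22 = np.array([[1.5]])

x2 = np.array([3.0])                     # observed value of X2

mu_c = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)   # mu1 + S12 S22^{-1} (x2 - mu2)
S_c  = S11 - S12 @ np.linalg.solve(S22, S12.T)      # S11 - S12 S22^{-1} S21
print(mu_c, S_c, sep='\n')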
Exercise 1.10-7: Proof of the Theorem

Exercise 1.10-8: Mean square error estimation
1.11 Sequences of Random Variables
1.11.1 Convergence Concepts
Let (Xn) = X1, X2, … be a sequence of random variables. For any specific x, (Xn(x)) is a sequence of numbers that might or might not converge, and the notion of convergence can be given several interpretations. Recall that a deterministic sequence (xn) tends to a limit x if, for any given ε > 0, we can find a number N_ε such that

$$|x_n - x| < \varepsilon \quad\text{for every } n > N_\varepsilon.$$
Convergence (everywhere)
If the sequence (Xn(x)) tends to a number X(x) for every x ∈ X, then we say that the random sequence (Xn) converges everywhere to the random variable X, and we write this as

$$\lim_{n\to\infty} X_n = X \quad\text{or}\quad X_n \to X.$$

Convergence almost surely (a.s.)
If the probability of the set of all events ξ that satisfy lim_{n→∞} Xn(ξ) = X(ξ) equals 1, i.e.

$$P\Big(\big\{\xi : \lim_{n\to\infty} X_n(\xi) = X(\xi)\big\}\Big) = P\Big(\lim_{n\to\infty} X_n = X\Big) = 1$$
or equivalently

$$\lim_{m\to\infty} P\Big(\sup_{n\ge m}|X_n - X| \ge \varepsilon\Big) = 0 \quad\text{for every } \varepsilon > 0,$$

then we say that the random sequence (Xn) converges almost surely (with probability 1) to the random variable X, and we write this as

$$\lim_{n\to\infty} X_n = X \quad\text{or}\quad X_n \xrightarrow{\text{a.s.}} X.$$

Convergence in Mean Square (m.s.)
Let (Xn) be a sequence of random variables. We say that the sequence converges in mean square to a random variable X if
$$\lim_{n\to\infty} E\Big[\big(X_n - X\big)^2\Big] = 0$$

holds, and we write this as

$$\underset{n\to\infty}{\mathrm{l.i.m.}}\; X_n = X \quad\text{or}\quad X_n \xrightarrow{\text{m.s.}} X,$$

where l.i.m. denotes the limit in the mean.

Convergence in Probability (p)
A sequence of random variables (Xn) is said to converge in probability to a random variable X if for every ε > 0 we have

$$\lim_{n\to\infty} P\big(|X_n - X| \ge \varepsilon\big) = 0,$$

and we write this as
$$\mathrm{p}\lim_{n\to\infty} X_n = X \quad\text{or}\quad X_n \xrightarrow{p} X,$$

where p lim denotes the limit in probability.

Convergence in Distribution (d)
Let (F_{X_n}) be the sequence of distribution functions of the sequence of random variables (Xn). Then (Xn) is said to converge in distribution (or in law) to a random variable X with distribution function F_X if

$$F_{X_n} \xrightarrow{n\to\infty} F_X$$

at all continuity points of F_X. Such a convergence is expressed by

$$X_n \xrightarrow{d} X \quad\text{or}\quad X_n \xrightarrow{L} X.$$
Relationships among the various types of convergence:
1) Convergence with probability 1 implies convergence in probability.
2) Convergence with probability 1 implies convergence in mean square, provided second order moments exist.
3) Convergence in mean square implies convergence in probability.
4) Convergence in probability implies convergence in distribution.
Exercise 1.11-1: Proof of statements 1) and 3)
1.11.2 Laws of Large Numbers
Chebyshev's Theorem (Weak Law of Large Numbers)
Let (Xk) be a sequence of pairwise uncorrelated random variables with

$$E(X_k) = \mu_k, \qquad \mathrm{Var}(X_k) = \sigma_k^2 \qquad\text{and}\qquad \lim_{n\to\infty}\frac{1}{n^2}\sum_{k=1}^{n}\sigma_k^2 = 0.$$

Then we have

$$\bar X_n \xrightarrow[n\to\infty]{\text{m.s.}} \mu,$$

where

$$\bar X_n = \frac{1}{n}\sum_{k=1}^{n}X_k \qquad\text{and}\qquad \lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\mu_k = \mu.$$
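A quick simulation of the weak law, assuming i.i.d. U(0,1) variables (μ = 0.5, σ² = 1/12): the empirical mean square error E((X̄n − μ)²) decays like σ²/n.

import numpy as np

rng = np.random.default_rng(5)
mu, var = 0.5, 1/12                      # mean and variance of U(0,1)

for n in (10, 100, 1000):
    means = rng.uniform(size=(10_000, n)).mean(axis=1)   # 10,000 realisations of the sample mean
    print(n, np.mean((means - mu)**2), var/n)            # empirical vs. theoretical var/n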
Exercise 1.11-2: Proof of Chebyshev's Theorem
Kolmogorov's Theorem (Strong Law of Large Numbers)
Let (Xk) be a sequence of independent and identically distributed (i.i.d.) random variables. Furthermore, the moments E(Xk) exist and are equal to μ. Then we have

$$\bar X_n \xrightarrow[n\to\infty]{\text{a.s.}} \mu, \qquad\text{where}\qquad \bar X_n = \frac{1}{n}\sum_{k=1}^{n}X_k.$$
1.11.3 Central Limit Theorems
Lindeberg–Levy's Theorem
Let (Xk) be a sequence of independent and identically distributed random variables, such that E(Xk) = μ and Var(Xk) = σ² ≠ 0. Then the distribution function of the random variable

$$Y_n = \sqrt{n}\,\frac{\bar X_n - \mu}{\sigma}$$

tends to that of a standardized normal distribution as n approaches infinity, i.e.

$$F_{Y_n}(y) \xrightarrow{n\to\infty} \Phi(y).$$
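A short simulation, assuming i.i.d. U(0,1) summands (μ = 0.5, σ = 1/√12): the empirical distribution function of Yn is compared with Φ(y) at a few points.

import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(6)
n, reps = 50, 200_000
mu, sigma = 0.5, sqrt(1/12)

y = sqrt(n) * (rng.uniform(size=(reps, n)).mean(axis=1) - mu) / sigma

Phi = lambda t: 0.5 * (1 + erf(t / sqrt(2)))   # standard normal distribution function
for t in (-1.0, 0.0, 1.0):
    print(t, np.mean(y <= t), Phi(t))          # empirical F_{Y_n}(t) vs. Phi(t)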
Exercise 1.11-3: Proof of Lindeberg–Levy's Theorem
Lyapunov's Theorem
Let (Xk) be a sequence of independent distributed random variables, such that E(Xk) = μk, Var(Xk) = σk² ≠ 0 and E|Xk − μk|³ = βk. Furthermore, let

$$B_n = \Big(\sum_{k=1}^{n}\beta_k\Big)^{1/3}, \qquad C_n = \Big(\sum_{k=1}^{n}\sigma_k^2\Big)^{1/2} \qquad\text{and}\qquad \lim_{n\to\infty} B_n/C_n = 0.$$

Then the distribution function of the random variable

$$Y_n = \frac{1}{C_n}\sum_{k=1}^{n}\big(X_k - \mu_k\big)$$

tends to that of a standardized normal distribution as n approaches infinity, i.e.

$$F_{Y_n}(y) \xrightarrow{n\to\infty} \Phi(y).$$