INSTITUTE OF WATER ACOUSTICS, SONAR ENGINEERING AND SIGNAL THEORY
Chapter 1 / Stochastic Signals and Systems / Prof. Dr.-Ing. Dieter Kraus

Stochastic Signals and Systems

Contents
1 Probability Theory
2 Stochastic Processes
3 Parameter Estimation
4 Signal Detection
5 Spectrum Analysis
6 Optimal Filtering
1 Probability Theory 5
  1.1 Terminology 5
  1.2 Definition of Probability 7
    1.2.1 Relative Frequency and Probability 7
    1.2.2 Axiomatic Approach to Probability 8
    1.2.3 Classical Definition of Probability 10
  1.3 Conditional Probability 11
  1.4 Random Variables 14
  1.5 Distribution Functions 17
    1.5.1 Purely discrete case 21
    1.5.2 Purely continuous case 25
  1.6 Some Special Distributions 30
    1.6.1 Discrete Distributions 30
    1.6.2 Continuous Distributions 38
  1.7 Bivariate Distribution 55
    1.7.1 Bivariate Distribution Function 56
    1.7.2 Bivariate Density Function 59
    1.7.3 Marginal Distribution and Density Function 60
    1.7.4 Conditional Distribution and Density Function 62
    1.7.5 Independent Random Variables 68
    1.7.6 Bivariate Normal Distribution 69
  1.8 Transformations of Random Variables 75
    1.8.1 Function of One Random Variable 75
    1.8.2 One Function of Two Random Variables 84
    1.8.3 Two Functions of Two Random Variables 86
  1.9 Expectation Operator 91
    1.9.1 Expectation for Univariate Distributions 91
    1.9.2 Expectation for Bivariate Distributions 107
    1.9.3 Mean Square Estimation 132
  1.10 Vector-valued Random Variables 138
    1.10.1 Multivariate Distributions 138
    1.10.2 Transformation of Vector-valued Random Variables 144
    1.10.3 Expectations for Vector-valued Random Variables 149
    1.10.4 Mean Square Estimation 159
    1.10.5 Multivariate Normal Distribution 166
  1.11 Sequences of Random Variables 179
    1.11.1 Convergence Concepts 179
    1.11.2 Laws of Large Numbers 186
    1.11.3 Central Limit Theorems 189
References to Chapter 1 192
1 Probability Theory

1.1 Terminology

E: random experiment (activity whose outcome is randomly influenced)
Ξ: sample space (set of all possible outcomes of E, which may consist of a finite, countably infinite or uncountable number of elements)
ξ: elementary event (possible outcome of E, i.e. ξ ∈ Ξ)
∅: impossible event (empty set ∅ = {})
E: event (collection of some of the possible outcomes of E, i.e. E ⊂ Ξ)

S: σ-field, i.e. a system of subsets of Ξ satisfying
1) Ξ ∈ S
2) if E ∈ S then Ē = Ξ \ E ∈ S
3) if E_i ∈ S for i = 1, 2, … then $\bigcup_{i=1}^{\infty} E_i \in S$

Corollary:
1) ∅ ∈ S
2) if E₁, E₂ ∈ S then E₁ ∩ E₂ ∈ S and E₁ \ E₂ ∈ S
3) if E_i ∈ S for i = 1, 2, … then $\bigcap_{i=1}^{\infty} E_i \in S$

(Ξ, S): measurable space
1.2 Definition of Probability

1.2.1 Relative Frequency and Probability

If a random experiment is performed n times and the event of interest E is observed with frequency h_n(E), then the relative frequency of the occurrence of E is defined by

$$ H_n(E) = \frac{h_n(E)}{n} \quad\text{with}\quad 0 \le H_n(E) \le 1. $$

Empirical law of large numbers
For sufficiently large n we can write with a high degree of certainty that

$$ P(E) \approx H_n(E). $$
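As an illustration of the empirical law of large numbers, the following minimal sketch (added here, not part of the original slides) simulates n Bernoulli trials and prints the relative frequency H_n(E) of the event E = {success}; all names are hypothetical:

```python
import random

def relative_frequency(n, p=0.5, seed=1):
    """Simulate n Bernoulli trials with success probability p and
    return the relative frequency H_n(E) = h_n(E)/n of the event E."""
    rng = random.Random(seed)
    h_n = sum(rng.random() < p for _ in range(n))  # frequency h_n(E)
    return h_n / n

for n in (10, 100, 10_000, 1_000_000):
    print(f"n = {n:>9}: H_n(E) = {relative_frequency(n):.4f}")
```

For growing n the printed values settle near P(E) = 0.5, which is exactly the behaviour the empirical law asserts.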
1.2.2 Axiomatic Approach to Probability

Consider an experiment E with measurable space (Ξ, S). A probability measure P is then defined as a mapping

P : S → ℝ

which satisfies the following axioms:
1) if E ∈ S then P(E) ≥ 0
2) P(Ξ) = 1
3) if E_i ∈ S for i = 1, 2, … and E_i ∩ E_j = ∅ for i ≠ j, then

$$ P\Bigl(\bigcup_{i=1}^{\infty} E_i\Bigr) = \sum_{i=1}^{\infty} P(E_i). $$
The triple (Ξ, S, P) is called probability space.

Implications:
1) P(∅) = 0
2) if E₁, E₂ ∈ S with E₁ ⊂ E₂ then P(E₁) ≤ P(E₂)
3) if E₁, E₂ ∈ S then P(E₁ ∪ E₂) = P(E₁) + P(E₂) − P(E₁ ∩ E₂)
1.2.3 Classical Definition of Probability

Suppose that an experiment has a finite number n of possible outcomes ξ₁, ξ₂, …, ξ_n and we are interested in an event E = {ξ_{i₁}, ξ_{i₂}, …, ξ_{i_m}} with {i₁, i₂, …, i_m} ⊂ {1, 2, …, n}. If we assume that all outcomes ξ₁, ξ₂, …, ξ_n are equally likely, then

$$ P(E) = \frac{m}{n} = \frac{\text{number of outcomes favorable to } E}{\text{total number of outcomes}}. $$

This is a basic result which assigns probabilities to events purely on the basis of combinatorial arguments. However, its application is strictly limited to experiments with a finite number of equally likely outcomes.
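As a small combinatorial illustration (an added example, not from the original slides): rolling two fair dice gives n = 36 equally likely outcomes, and counting the m outcomes favorable to E = {sum equals 7} yields P(E) = 6/36:

```python
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))      # 36 equally likely outcomes
favorable = [xi for xi in outcomes if sum(xi) == 7]  # outcomes favorable to E
print(len(favorable), "/", len(outcomes), "=", len(favorable) / len(outcomes))
# 6 / 36 = 0.1666...
```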
1.3 Conditional Probability

Let (Ξ, S, P) be a probability space with E₁, E₂ ∈ S and P(E₂) > 0. The conditional probability of E₁ given that E₂ occurred is defined by

$$ P(E_1 \mid E_2) = \frac{P(E_1 \cap E_2)}{P(E_2)}. $$

One can easily show that the conditional probability satisfies the axioms of a probability measure.
Implications:
1) Bayes' Formula

$$ P(E_2 \mid E_1) = \frac{P(E_1 \mid E_2)\,P(E_2)}{P(E_1)} $$

Furthermore, assuming

$$ \bigcup_i E_i = \Xi \quad\text{and}\quad E_i \cap E_j = \varnothing,\; i \ne j, $$

we can derive the Total Probability

$$ P(E) = \sum_i P(E \mid E_i)\,P(E_i) $$
and the generalised Bayes' Formula

$$ P(E_k \mid E) = \frac{P(E_k)\,P(E \mid E_k)}{\sum_i P(E_i)\,P(E \mid E_i)}, \qquad P(E) > 0. $$

2) Two events E₁ and E₂ are called independent if

P(E₁ | E₂) = P(E₁)

holds. Consequently, we can stipulate

P(E₁ ∩ E₂) = P(E₁) P(E₂)
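A quick numeric check of the total probability and the generalised Bayes' formula (a hypothetical two-hypothesis example added here; the numbers are assumptions, not from the slides):

```python
# E1, E2 partition the sample space; A is an observed event.
P_E = {"E1": 0.3, "E2": 0.7}            # prior probabilities P(E_i)
P_A_given_E = {"E1": 0.9, "E2": 0.2}    # likelihoods P(A | E_i)

# Total probability: P(A) = sum_i P(A | E_i) P(E_i)
P_A = sum(P_A_given_E[e] * P_E[e] for e in P_E)

# Generalised Bayes' formula: P(E_k | A) = P(E_k) P(A | E_k) / P(A)
posterior = {e: P_E[e] * P_A_given_E[e] / P_A for e in P_E}
print(P_A, posterior)   # 0.41 {'E1': 0.658..., 'E2': 0.341...}
```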
1.4 Random Variables

A mapping

X : Ξ → ℝ,

such that to each ξ ∈ Ξ there corresponds a unique real number X(ξ) ∈ ℝ, is called random variable or measurable function with respect to S, if for each set B ⊂ ℝ the inverse image

X⁻¹(B) = {ξ : X(ξ) ∈ B}

is an element of S.
To assign probabilities to random variables one has to translate statements about the values of random variables as follows:

P_X(B) = P(X⁻¹(B)) = P({ξ : X(ξ) ∈ B}).

Furthermore, a σ-field has to be defined over ℝ. One can show that such a σ-field should include all intervals of the kind (−∞, x]. The power set P(ℝ) includes the desired intervals, but its cardinality is too high to be able to implement the measurability properties.
However, one can show that a particular σ-field exists, called the Borel field B, that is the smallest possible σ-field including all the intervals (−∞, x] and that guarantees the measurability of all sets that are elements of B.

Thus, we can define by

(ℝ, B)       the measurable space
(ℝ, B, P_X)  the probability space

of a random variable X.
1.5 Distribution Functions

Given a random variable X, the distribution function of X, F_X(x), is defined by

F_X(x) = P_X((−∞, x]) = P({ξ : X(ξ) ≤ x}) = P(X ≤ x).

One can show that F_X(x) uniquely determines all the probabilistic properties of the random variable X. In particular, for any a, b ∈ ℝ with a ≤ b we have

P(X ≤ b) = P(X ≤ a) + P(a < X ≤ b),

cf. Axiom 3. Hence

P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F_X(b) − F_X(a).
Distribution functions have the properties:

(1) 0 ≤ F_X(x) ≤ 1 for all x ∈ ℝ
    (since F_X(x) is a probability)

(2) lim_{x→−∞} F_X(x) = 0, lim_{x→∞} F_X(x) = 1
    (since lim_{x→−∞} X⁻¹((−∞, x]) = ∅ and lim_{x→∞} X⁻¹((−∞, x]) = Ξ)

(3) F_X(x) is a non-decreasing function, i.e. for any h ≥ 0 and all x, F_X(x+h) ≥ F_X(x)
    (since F_X(x+h) − F_X(x) = P(x < X ≤ x+h) ≥ 0)

(4) F_X(x) is right-continuous, i.e. for all x, lim_{h→0⁺} F_X(x+h) = F_X(x)
    (the limit h → 0 is taken through positive values only)
Any distribution function F_X(x) can be expressed by

F_X(x) = a₁ F_{X,1}(x) + a₂ F_{X,2}(x) + a₃ F_{X,3}(x),

where a_i ≥ 0 for i = 1, 2, 3, a₁ + a₂ + a₃ = 1 and

F_{X,1}(x) is continuous everywhere and differentiable for almost all x, i.e. absolutely continuous,
F_{X,2}(x) is a step function with a finite or countably infinite number of jumps,
F_{X,3}(x) is a singular function, that is continuous with zero derivative almost everywhere.
F_{X,1}(x) and F_{X,2}(x) correspond to the two basic types of probability distributions one usually encounters in practice, i.e. the

continuous and discrete distribution,

respectively. Since F_{X,3}(x) is highly pathological, it can be safely assumed that it does not arise in real applications. In practice we therefore ignore F_{X,3}(x) and assume that all distribution functions can be simply represented by

F_X(x) = λ F_{X,1}(x) + (1 − λ) F_{X,2}(x)

with 0 ≤ λ ≤ 1.
1.5.1 Purely discrete case (λ = 0)

The distribution function F_X(x) = F_{X,2}(x) is a simple step function with jumps p_i at the points x_i for i = 1, 2, 3, … F_X(x) would typically have the form of a staircase:

[Figure: step-function distribution F_X(x) with jumps p₁, …, p₅ at the points x₁, …, x₅.]
If an interval (a, b] does not contain any of the jump points x_i, then clearly

P(a < X ≤ b) = F_X(b) − F_X(a) = 0.

Hence, X cannot take any value lying between two successive jump points.

For each i and any small h > 0 we can write

P(x_i − h < X ≤ x_i + h) = F_X(x_i + h) − F_X(x_i − h) = p_i.

Letting h → 0, we obtain

P(X = x_i) = p_i,  i = 1, 2, 3, …
Thus the only values X can take are those corresponding to the jump points. Therefore, X is called a discrete random variable. The jump p_i at point x_i represents the probability that X takes the value x_i. Furthermore, (x₁, p₁), (x₂, p₂), … are used to define the so-called probability mass function p_X(x).

[Figure: probability mass function p_X(x) with p_X(x_i) = p_i for i = 1, …, 5.]
Discrete distribution functions possess the properties:

(1) $F_X(x) = \sum_{i:\,x_i \le x} p_X(x_i)$, where the summation extends over all values of i for which x_i ≤ x

(2) 0 ≤ p_X(x) ≤ 1
    (since p_X(x) is a probability mass function)

(3) $\sum_i p_X(x_i) = 1$
    (since lim_{x→∞} F_X(x) = Σ_i p_X(x_i) = 1)
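These properties translate directly into code. The following sketch (an added illustration with a hypothetical PMF) stores the pairs (x_i, p_i) and evaluates the step-function distribution F_X(x) as the sum of all p_i with x_i ≤ x:

```python
# A hypothetical probability mass function given as pairs (x_i, p_i).
pmf = [(-2.0, 0.1), (0.0, 0.3), (1.5, 0.4), (3.0, 0.2)]
assert abs(sum(p for _, p in pmf) - 1.0) < 1e-12   # property (3)

def cdf(x):
    """Step-function distribution: F_X(x) = sum of p_i over all x_i <= x."""
    return sum(p for xi, p in pmf if xi <= x)

for x in (-3, -2, 0.7, 1.5, 10):
    print(f"F_X({x}) = {cdf(x):.1f}")   # 0.0, 0.1, 0.4, 0.8, 1.0
```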
1.5.2 Purely continuous case (λ = 1)

The distribution function F_X(x) = F_{X,1}(x) is absolutely continuous, i.e. differentiable for almost all x. F_X(x) would typically possess a graph as shown below.

[Figure: smooth, monotonically increasing distribution function F_X(x) rising from 0 to 1.]
X can, in general, take any value either on a finite or an infinite interval and is therefore called a continuous random variable. Thus continuous random variables are suitable models for measured physical quantities such as pressures, voltages, temperatures, etc.

Furthermore, F_X(x) can be represented by

$$ F_X(x) = \int_{-\infty}^{x} f_X(x')\,dx', $$

where f_X(x) is said to be the probability density function (PDF) of X.
If f_X(x) is continuous at x, then

$$ F_X'(x) = \frac{dF_X(x)}{dx} = f_X(x) $$

exists. For a small interval (x, x+Δx] we can now write

$$ P(x < X \le x + \Delta x) = F_X(x + \Delta x) - F_X(x) = \int_{x}^{x+\Delta x} f_X(x')\,dx' $$

or

$$ P(x < X \le x + \Delta x) = f_X(x)\,\Delta x + o(\Delta x), $$

where o(Δx) represents a term of smaller order of magnitude than Δx.
The latter equation forms the basis for interpreting f_X(x) as a density function, namely, f_X(x) defines the density of probability in the neighbourhood of the point x.

Remarks:
- f_X(x) itself does not represent a probability,
- f_X(x) · Δx has a probabilistic interpretation,
- f_X(x) completely determines F_X(x) and therefore completely specifies the properties of a continuous random variable.
Probability density functions satisfy the properties:

(1) f_X(x) ≥ 0 for all x ∈ ℝ
    (since F_X(x) is a non-decreasing function)

(2) $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
    (since $\lim_{x\to\infty} F_X(x) = \int_{-\infty}^{\infty} f_X(x)\,dx = 1$)

(3) For any a, b ∈ ℝ with a ≤ b

$$ P(a < X \le b) = F_X(b) - F_X(a) = \int_{a}^{b} f_X(x)\,dx $$
1.6 Some Special Distributions

1.6.1 Discrete Distributions

Binomial distribution
Consider an experiment which has only two possible outcomes, "success" and "failure", with probability p and (1 − p), respectively.

The number of "successes" occurring in n independent repetitions of the experiment is a random variable X that can take the values k = 0, 1, …, n. Within a sequence of n independent trials, k successes can occur in
$$ \binom{n}{k} = \frac{n!}{k!\,(n-k)!} $$

different arrangements. The probability of a specific arrangement is obviously

$$ p^k (1-p)^{n-k}. $$

Thus, the probability for observing k successes in n independent trials is given by

$$ P\bigl(\{\xi : X(\xi) = k\}\bigr) = P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} = b_{n,p}(k). $$

The b_{n,p}(k) (k = 0, 1, …, n) are called binomial probabilities.

Exploiting the binomial theorem one can verify that

$$ \sum_{k=0}^{n} b_{n,p}(k) = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = \bigl[p + (1-p)\bigr]^n = 1. $$

Thus, the distribution function of the so-called binomial distribution B(n, p) can be defined by

$$ F_X(x) = P(X \le x) = \sum_{k=0}^{m} b_{n,p}(k) = \sum_{k=0}^{m} \binom{n}{k} p^k (1-p)^{n-k}, $$

where m = ⌊x⌋ ∈ ℤ, i.e. m ≤ x < m+1 (Gauss bracket).
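The binomial probabilities and the Gauss-bracket distribution function are straightforward to compute; a minimal sketch (added here as an illustration) follows:

```python
from math import comb, floor

def binom_pmf(k, n, p):
    """Binomial probability b_{n,p}(k) = C(n,k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(x, n, p):
    """F_X(x) = sum_{k=0}^{m} b_{n,p}(k) with m = floor(x) (Gauss bracket)."""
    m = floor(x)
    return sum(binom_pmf(k, n, p) for k in range(min(m, n) + 1)) if m >= 0 else 0.0

n, p = 25, 0.5
print(sum(binom_pmf(k, n, p) for k in range(n + 1)))  # = 1 (binomial theorem)
print(binom_cdf(12.7, n, p))                          # P(X <= 12)
```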
[Figure: binomial probabilities b_{n,p}(k) (left) and binomial distribution function F_X(x) (right) for p = 0.5, k = 0, …, 25.]
Poisson distribution
Consider the limiting form of the binomial probabilities when n → ∞ and p → 0 in such a way that np = α_n → α, a positive constant. Substituting p by α_n/n, we obtain

$$ b_{n,\alpha_n/n}(k) = \frac{n!}{k!\,(n-k)!}\left(\frac{\alpha_n}{n}\right)^{k}\left(1-\frac{\alpha_n}{n}\right)^{n-k} = \left\{\frac{n}{n}\cdot\frac{n-1}{n}\cdots\frac{n-k+1}{n}\left(1-\frac{\alpha_n}{n}\right)^{-k}\left(1-\frac{\alpha_n}{n}\right)^{n}\right\}\frac{\alpha_n^{k}}{k!}. $$

As n tends to infinity, we have

$$ b_{n,\alpha_n/n}(k) \xrightarrow{\;n\to\infty\;} e^{-\alpha}\,\frac{\alpha^{k}}{k!} = p_\alpha(k). $$

The p_α(k) (k = 0, 1, …) are called Poisson probabilities.

Note that the sum of the infinitely many but countable Poisson probabilities satisfies

$$ \sum_{k=0}^{\infty} p_\alpha(k) = e^{-\alpha}\sum_{k=0}^{\infty}\frac{\alpha^{k}}{k!} = e^{-\alpha}\,e^{\alpha} = 1 $$

for all α.
Hence, the distribution function of the so-called Poisson distribution P(α) is given by

$$ F_X(x) = P(X \le x) = \sum_{k=0}^{m} p_\alpha(k) = e^{-\alpha}\sum_{k=0}^{m}\frac{\alpha^{k}}{k!}, $$

where m = ⌊x⌋ ∈ ℤ, i.e. m ≤ x < m+1 (Gauss bracket).

In practice, the Poisson distribution is used for approximating the binomial distribution in cases where, in a large number of independent trials (large n), the number of occurrences of a rare event (small p) is of interest.
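The quality of this approximation is easy to inspect numerically; the sketch below (added illustration, with assumed values n = 1000, p = 0.005) compares b_{n,p}(k) with p_α(k) for α = np:

```python
from math import comb, exp, factorial

n, p = 1000, 0.005        # large n, small p (hypothetical values)
alpha = n * p             # alpha = np = 5

for k in range(4):
    b = comb(n, k) * p**k * (1 - p)**(n - k)      # binomial b_{n,p}(k)
    pa = exp(-alpha) * alpha**k / factorial(k)    # Poisson p_alpha(k)
    print(f"k={k}: binomial={b:.6f}  Poisson={pa:.6f}")
```

The two columns agree to several decimal places, as the limiting argument above predicts.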
[Figure: Poisson probabilities p_α(k) (left) and Poisson distribution function F_X(x) (right) for α = 25.]
1.6.2 Continuous Distributions

Uniform (rectangular) distribution
A continuous random variable X is uniformly distributed on the interval [a, b] (in abbreviated form X ∼ R(a,b)), if the probability density function is defined by

$$ f_X(x) = \frac{1}{b-a}\,\mathbf{1}_{[a,b]}(x), \quad x \in \mathbb{R}, $$

where 1_M(x) denotes the indicator function of the set M ⊂ ℝ, i.e.

$$ \mathbf{1}_M(x) = \begin{cases} 1 & x \in M \\ 0 & \text{otherwise.} \end{cases} $$

The distribution function can be expressed as follows:

$$ F_X(x) = \int_{-\infty}^{x} f_X(x')\,dx' = \frac{1}{b-a}\int_{-\infty}^{x} \mathbf{1}_{[a,b]}(x')\,dx' = \begin{cases} 0 & x \le a \\ (x-a)/(b-a) & x \in [a,b] \\ 1 & x \ge b. \end{cases} $$

A uniformly distributed random variable X ∼ R(−π, π) is often used for modelling a random initial phase of a sinusoidal signal.
[Figure: density function f_X(x) and distribution function F_X(x) of the uniform distribution for a = 0, b = 1.]
Normal (Gaussian) distribution
A continuous random variable X is said to be normally distributed with parameters µ ∈ ℝ and σ² (X ∼ N(µ, σ²)), if the probability density function is defined by

$$ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}, \quad x \in \mathbb{R}. $$

The normal distribution function

$$ F_X(x) = \int_{-\infty}^{x} f_X(x')\,dx' = \frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{x}\exp\left\{-\frac{(x'-\mu)^2}{2\sigma^2}\right\} dx' $$

cannot be expressed in explicit form.
However, to check that f_X(x), where f_X(x) > 0 ∀x ∈ ℝ, represents a valid form of a probability density function, we have to show that

$$ \int_{-\infty}^{\infty} f_X(x)\,dx = 1. $$

With the substitution x = (x' − µ)/σ we can derive

$$ \frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{\infty}\exp\left\{-\frac{(x'-\mu)^2}{2\sigma^2}\right\} dx' = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\left\{-\frac{x^2}{2}\right\} dx. $$

Since f_X(x) > 0 ∀x ∈ ℝ, it is equivalent to prove that

$$ \left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\left\{-\frac{x^2}{2}\right\} dx\right)^2 = 1. $$
Thus, after introducing a double integral, employing polar coordinates and finally substituting u = r²/2, the validity of the equation can be shown:

$$ \left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,dx\right)^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy = \frac{1}{2\pi}\int_{0}^{2\pi}\int_{0}^{\infty} e^{-r^2/2}\,r\,dr\,d\phi = \int_{0}^{\infty} e^{-r^2/2}\,r\,dr = \int_{0}^{\infty} e^{-u}\,du = 1. $$
The special form N(0,1), i.e. µ = 0 and σ² = 1, is called the standardized normal distribution. Its distribution function is usually denoted by Φ(x), i.e.

$$ \Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}\exp\left\{-\frac{x'^2}{2}\right\} dx'. $$

There are extensive tables of the function Φ(x) available in the literature. These tables enable us to evaluate the distribution function of X ∼ N(µ, σ²) as follows.
Let X ∼ N(µ, σ²) and therefore (X − µ)/σ ∼ N(0,1); then

$$ F_X(x) = P\bigl(\{\xi : -\infty < X(\xi) \le x\}\bigr) = P\left(\left\{\xi : -\infty < \frac{X(\xi)-\mu}{\sigma} \le \frac{x-\mu}{\sigma}\right\}\right) = \Phi\left(\frac{x-\mu}{\sigma}\right). $$

The normal distribution is by far the most important distribution in probability theory and statistical inference. Its prominence is attributed to central limit theorems, which roughly state that the sum of a large number of (independent) random variables obeys an approximate normal distribution.
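In place of tables, Φ(x) can be evaluated via the error function, using the known identity Φ(x) = (1 + erf(x/√2))/2; the sketch below (an added illustration with hypothetical parameters) applies the standardization F_X(x) = Φ((x−µ)/σ):

```python
from math import erf, sqrt

def Phi(x):
    """Standardized normal distribution function via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

mu, sigma = 2.0, 3.0   # hypothetical parameters of X ~ N(mu, sigma^2)

# P(a < X <= b) = F_X(b) - F_X(a) = Phi((b-mu)/sigma) - Phi((a-mu)/sigma)
a, b = -1.0, 5.0
print(Phi((b - mu) / sigma) - Phi((a - mu) / sigma))  # = Phi(1) - Phi(-1) ≈ 0.6827
```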
[Figure: Gauss density function f_X(x) and Gauss distribution function F_X(x) for µ = 0, σ² = 1.]
Exponential distribution
A continuous random variable X, taking positive values only, is said to satisfy an exponential distribution with parameter λ > 0 (X ∼ E(λ)), if its probability density function is of the form

$$ f_X(x) = \lambda\exp(-\lambda x)\,\mathbf{1}_{[0,\infty)}(x), \quad x \in \mathbb{R}. $$

Hence, its distribution function is given by

$$ F_X(x) = \int_{-\infty}^{x} f_X(x')\,dx' = \int_{-\infty}^{x} \lambda\exp(-\lambda x')\,\mathbf{1}_{[0,\infty)}(x')\,dx' = \bigl[1-\exp(-\lambda x)\bigr]\,\mathbf{1}_{[0,\infty)}(x). $$
The exponential distribution is used as a model for the lifetimes of items when ageing processes can be neglected.

[Figure: density function f_X(x) and distribution function F_X(x) of the exponential distribution for λ = 1.]
Weibull distribution
When either ageing or initial-failure processes have to be taken into account, a suitable model for the lifetime of an item can be provided by a continuous random variable X obeying a Weibull distribution with parameters λ > 0, η > 0 (X ∼ W(λ, η)).

The probability density function of the Weibull distribution is defined by

$$ f_X(x) = \lambda\,\eta\,x^{\eta-1}\exp(-\lambda x^{\eta})\,\mathbf{1}_{[0,\infty)}(x), \quad x \in \mathbb{R}. $$

For the distribution function we obtain

$$ F_X(x) = \int_{-\infty}^{x} f_X(x')\,dx' = \int_{-\infty}^{x} \lambda\eta\,x'^{\eta-1}\exp(-\lambda x'^{\eta})\,\mathbf{1}_{[0,\infty)}(x')\,dx' = \bigl[1-\exp(-\lambda x^{\eta})\bigr]\,\mathbf{1}_{[0,\infty)}(x). $$

By selecting a value for η, the following three cases can be qualitatively distinguished:
η = 1: obviously W(λ, 1) = E(λ)
η > 1: ageing process is incorporated
0 < η < 1: initial-failure process is considered
[Figure: Weibull density function f_X(x) and Weibull distribution function F_X(x) for λ = 1 and η = 1/2, 1, 2.]
Cauchy distribution
A continuous random variable X is said to follow a Cauchy distribution with parameters µ ∈ ℝ and ν > 0 (X ∼ C(µ, ν)), if the probability density function is given by

$$ f_X(x) = \frac{1}{\pi}\,\frac{\nu}{\nu^2+(x-\mu)^2} = \frac{1}{\pi\nu}\,\frac{1}{1+\bigl[(x-\mu)/\nu\bigr]^2}, \quad x \in \mathbb{R}. $$

The distribution function can be derived as follows:

$$ F_X(x) = \int_{-\infty}^{x} f_X(x')\,dx' = \frac{1}{\pi\nu}\int_{-\infty}^{x}\frac{dx'}{1+\bigl[(x'-\mu)/\nu\bigr]^2} = \frac{1}{\pi}\int_{-\infty}^{(x-\mu)/\nu}\frac{dx''}{1+x''^2} = \frac{1}{2}+\frac{1}{\pi}\arctan\left(\frac{x-\mu}{\nu}\right). $$

The Cauchy distribution possesses so-called long tails, i.e. its density function decays slowly as x tends either to plus or minus infinity. Consequently, Cauchy distributed random variables are suitable models for experiments where large measurement values can be observed with certain likelihood.
[Figure: Cauchy density function f_X(x) and Cauchy distribution function F_X(x) for µ = 0, ν = 1.]
1.7 Bivariate Distribution

The theory of random variables discussed so far dealt only with univariate distributions, i.e. the probability distributions describe the properties of single random variables. However, the modelling of experimental results often requires several random variables, e.g. the results of measuring the simultaneous values of pressure and temperature in a gas of constant volume have to be described by two random variables.
1.7.1 Bivariate Distribution Function

A bivariate distribution function F_{XY}(x,y) is defined by

F_{XY}(x,y) = P({ξ : X(ξ) ≤ x, Y(ξ) ≤ y}) = P(X ≤ x, Y ≤ y).

It can be computed in the discrete and continuous case by

$$ F_{XY}(x,y) = \sum_{i:\,x_i \le x}\;\sum_{j:\,y_j \le y} p_{XY}(x_i,y_j) $$

or

$$ F_{XY}(x,y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f_{XY}(x',y')\,dy'\,dx', $$

respectively, where p_{XY}(x,y) denotes the bivariate probability mass function and f_{XY}(x,y) the bivariate density function.
Bivariate distribution functions possess the properties:

(1) lim_{x→∞, y→∞} F_{XY}(x,y) = F_{XY}(∞,∞) = 1

(2) lim_{x→−∞} F_{XY}(x,y) = F_{XY}(−∞,y) = 0
    lim_{y→−∞} F_{XY}(x,y) = F_{XY}(x,−∞) = 0

(3) F_{XY}(x,y) is right-continuous in x and y, respectively, i.e.
    lim_{h→0⁺} F_{XY}(x+h,y) = F_{XY}(x,y)
    lim_{h→0⁺} F_{XY}(x,y+h) = F_{XY}(x,y)
(4) F_{XY}(x,y) is a non-decreasing function, i.e. for any h ≥ 0
    F_{XY}(x+h,y) ≥ F_{XY}(x,y) for all x and any y
    F_{XY}(x,y+h) ≥ F_{XY}(x,y) for all y and any x

(5) second difference (n-th difference for n = 2)

    ΔF_{XY}((a,b]) = F_{XY}(b₁,b₂) − F_{XY}(b₁,a₂) − F_{XY}(a₁,b₂) + F_{XY}(a₁,a₂)
                   = P({ξ : (X(ξ),Y(ξ))ᵀ ∈ (a,b]}) = P((X,Y)ᵀ ∈ (a,b]) ≥ 0,

    where (a,b] = ((a₁,a₂)ᵀ, (b₁,b₂)ᵀ] = (a₁,b₁] × (a₂,b₂].
1.7.2 Bivariate Density Function

If f_{XY}(x,y) is continuous at (x,y), then

$$ f_{XY}(x,y) = \frac{\partial^2 F_{XY}(x,y)}{\partial x\,\partial y} = \frac{\partial^2 F_{XY}(x,y)}{\partial y\,\partial x} $$

exists. Bivariate density functions satisfy the properties:

(1) f_{XY}(x,y) ≥ 0 for all (x,y)ᵀ ∈ ℝ²

(2) $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx\,dy = 1$

(3) For any (a,b] = (a₁,b₁] × (a₂,b₂] ⊂ ℝ²

$$ P\bigl((X,Y)^T \in (a,b]\bigr) = \int_{a_2}^{b_2}\int_{a_1}^{b_1} f_{XY}(x,y)\,dx\,dy. $$
1.7.3 Marginal Distribution and Density Function

For a given bivariate distribution of (X,Y), the univariate distributions of the individual random variables X and Y can be deduced from the expressions

$$ \lim_{y\to\infty} F_{XY}(x,y) = F_{XY}(x,\infty) = P(X \le x,\, Y \le \infty) = P(X \le x) = F_X(x) $$

and

$$ \lim_{x\to\infty} F_{XY}(x,y) = F_{XY}(\infty,y) = P(X \le \infty,\, Y \le y) = P(Y \le y) = F_Y(y). $$
For the marginal distribution and density functions of X and Y we may write

$$ F_X(x) = \int_{-\infty}^{x} f_X(x')\,dx' = \int_{-\infty}^{x}\int_{-\infty}^{\infty} f_{XY}(x',y)\,dy\,dx' \quad\text{with}\quad f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dy $$

and

$$ F_Y(y) = \int_{-\infty}^{y} f_Y(y')\,dy' = \int_{-\infty}^{y}\int_{-\infty}^{\infty} f_{XY}(x,y')\,dx\,dy' \quad\text{with}\quad f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dx, $$

respectively.
1.7.4 Conditional Distribution and Density Function

The conditional probability of {ξ : X(ξ) ≤ x} knowing that {ξ : Y(ξ) ≤ y} occurred is given by (cf. Section 1.3)

$$ P(X \le x \mid Y \le y) = \frac{P(\{X \le x\} \cap \{Y \le y\})}{P(Y \le y)}, \quad \text{if } P(Y \le y) > 0. $$

Hence, the conditional distribution function of X under the condition {Y ≤ y} can be defined by

$$ F_X(x \mid Y \le y) = P(X \le x \mid Y \le y) = \frac{P(X \le x,\, Y \le y)}{P(Y \le y)} = \frac{F_{XY}(x,y)}{F_Y(y)}, \quad \text{if } F_Y(y) > 0. $$
The conditional density function of X under the condition {Y ≤ y} can be deduced from

$$ F_X(x \mid Y \le y) = \int_{-\infty}^{x} f_X(x' \mid Y \le y)\,dx' = \frac{F_{XY}(x,y)}{F_Y(y)} = \frac{\int_{-\infty}^{x}\int_{-\infty}^{y} f_{XY}(x',y')\,dy'\,dx'}{F_Y(y)}, $$

such that

$$ f_X(x \mid Y \le y) = \frac{dF_X(x \mid Y \le y)}{dx} = \frac{\partial F_{XY}(x,y)/\partial x}{F_Y(y)}. $$

If f_{XY}(x,y) is continuous at (x,y), we can also write

$$ f_X(x \mid Y \le y) = \frac{\int_{-\infty}^{y} f_{XY}(x,y')\,dy'}{F_Y(y)}. $$
Let now {ξ : y < Y(ξ) ≤ y + Δy} denote an event satisfying P(y < Y ≤ y + Δy) > 0. Then, we can derive

$$ F_X(x \mid y < Y \le y+\Delta y) = \frac{P(X \le x,\; y < Y \le y+\Delta y)}{P(y < Y \le y+\Delta y)} = \frac{P(X \le x, Y \le y+\Delta y) - P(X \le x, Y \le y)}{P(Y \le y+\Delta y) - P(Y \le y)} = \frac{F_{XY}(x,y+\Delta y) - F_{XY}(x,y)}{F_Y(y+\Delta y) - F_Y(y)}\cdot\frac{\Delta y}{\Delta y}. $$
Assuming f_{XY}(x,y) to be continuous at (x,y) and f_Y(y) > 0, the limit

$$ \lim_{\Delta y\to 0} F_X(x \mid y < Y \le y+\Delta y) = \frac{\partial F_{XY}(x,y)/\partial y}{dF_Y(y)/dy} $$

exists and the conditional distribution function of X under the condition {Y = y} can be expressed by

$$ F_X(x \mid Y = y) = \frac{\partial F_{XY}(x,y)/\partial y}{dF_Y(y)/dy} = \frac{\partial F_{XY}(x,y)/\partial y}{f_Y(y)} = \frac{\int_{-\infty}^{x} f_{XY}(x',y)\,dx'}{f_Y(y)} = \frac{\int_{-\infty}^{x} f_{XY}(x',y)\,dx'}{\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx}. $$
Furthermore, exploiting

$$ F_X(x \mid Y = y) = \int_{-\infty}^{x} f_X(x' \mid Y = y)\,dx' = \int_{-\infty}^{x}\frac{f_{XY}(x',y)}{f_Y(y)}\,dx', $$

the conditional density function of X under the condition {Y = y} can be represented by

$$ f_X(x \mid Y = y) = \frac{f_{XY}(x,y)}{f_Y(y)} = \frac{f_{XY}(x,y)}{\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx} = \frac{\partial^2 F_{XY}(x,y)/\partial x\,\partial y}{dF_Y(y)/dy} = \frac{dF_X(x \mid Y = y)}{dx}. $$
Conditional density functions exhibit the properties:

(1) $$ f_X(x \mid Y = y)\,\Delta x = \frac{f_{XY}(x,y)\,\Delta x\,\Delta y}{f_Y(y)\,\Delta y} \approx \frac{P(x < X \le x+\Delta x,\; y < Y \le y+\Delta y)}{P(y < Y \le y+\Delta y)} $$

(2) $$ f_{XY}(x,y) = f_X(x \mid Y = y)\,f_Y(y) = f_Y(y \mid X = x)\,f_X(x), \qquad f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dy = \int_{-\infty}^{\infty} f_X(x \mid Y = y)\,f_Y(y)\,dy $$

(3) Bayes' Formula:

$$ f_X(x \mid Y = y) = \frac{f_Y(y \mid X = x)\,f_X(x)}{f_Y(y)}, \quad \text{if } f_Y(y) > 0 $$
1.7.5 Independent Random Variables

We say that two continuous random variables X and Y are independent if the events {X ≤ x} and {Y ≤ y} are independent for all x, y ∈ ℝ, i.e. (cf. Section 1.3)

F_{XY}(x,y) = F_X(x) F_Y(y),  f_{XY}(x,y) = f_X(x) f_Y(y)

and consequently

F_X(x | Y = y) = F_X(x),  F_Y(y | X = x) = F_Y(y),
f_X(x | Y = y) = f_X(x)  and  f_Y(y | X = x) = f_Y(y).

For notational convenience we define

f_X(x | y) := f_X(x | Y = y),  f_Y(y | x) := f_Y(y | X = x),
F_X(x | y) := F_X(x | Y = y)  and  F_Y(y | x) := F_Y(y | X = x).
1.7.6 Bivariate Normal Distribution

Two continuous random variables X and Y are said to obey a bivariate normal distribution with parameters µ_X, µ_Y, |ρ| < 1, σ_X > 0, σ_Y > 0 if their probability density function is given by

$$ f_{XY}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\,\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right\} $$

for all x, y ∈ ℝ.
[Figure: bivariate normal density function, surface plot (left) and contour plot (right), for µ_X = 0, µ_Y = 0, σ_X = 1, σ_Y = 1, ρ = 1/2.]
Employing vector/matrix notation, the bivariate normal density function can be expressed by

$$ f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{2\pi\sqrt{\det(\Sigma_X)}}\exp\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_X)^T\,\Sigma_X^{-1}\,(\mathbf{x}-\boldsymbol{\mu}_X)\right\}, $$

where µ_X denotes the vector of the expected/mean values and Σ_X represents the so-called covariance matrix, i.e.

$$ \boldsymbol{\mu}_X = \begin{pmatrix} \mu_{X_1} \\ \mu_{X_2} \end{pmatrix}, \qquad \Sigma_X = \begin{pmatrix} \sigma_{X_1}^2 & \rho\,\sigma_{X_1}\sigma_{X_2} \\ \rho\,\sigma_{X_1}\sigma_{X_2} & \sigma_{X_2}^2 \end{pmatrix}, $$

and one writes

$$ \mathbf{X} = (X_1, X_2)^T \sim N_2(\boldsymbol{\mu}_X, \Sigma_X). $$
Marginal density functions
The marginal density functions of X and Y are given by

$$ f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y')\,dy' = \frac{1}{\sqrt{2\pi}\,\sigma_X}\exp\left\{-\frac{1}{2}\left(\frac{x-\mu_X}{\sigma_X}\right)^2\right\} $$

and

$$ f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x',y)\,dx' = \frac{1}{\sqrt{2\pi}\,\sigma_Y}\exp\left\{-\frac{1}{2}\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right\}. $$

Hence, the marginal probability distributions of X and Y are N(µ_X, σ_X²) and N(µ_Y, σ_Y²), respectively.
Conditional density functions
The conditional density function of X under the condition {Y = y} is given by

$$ f_X(x \mid Y = y) = \frac{f_{XY}(x,y)}{f_Y(y)} = \frac{1}{\sigma_X\sqrt{2\pi(1-\rho^2)}}\exp\left\{-\frac{1}{2\sigma_X^2(1-\rho^2)}\left[x-\left(\mu_X+\rho\,\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)\right)\right]^2\right\} $$

and we can write in abbreviated form

$$ X \mid Y = y \;\sim\; N\!\left(\mu_X+\rho\,\frac{\sigma_X}{\sigma_Y}(y-\mu_Y),\;\sigma_X^2(1-\rho^2)\right). $$
Analogously, the conditional density function of Y under the condition {X = x} is given by

$$ f_Y(y \mid X = x) = \frac{f_{XY}(x,y)}{f_X(x)} = \frac{1}{\sigma_Y\sqrt{2\pi(1-\rho^2)}}\exp\left\{-\frac{1}{2\sigma_Y^2(1-\rho^2)}\left[y-\left(\mu_Y+\rho\,\frac{\sigma_Y}{\sigma_X}(x-\mu_X)\right)\right]^2\right\}. $$

Thus, we can write

$$ Y \mid X = x \;\sim\; N\!\left(\mu_Y+\rho\,\frac{\sigma_Y}{\sigma_X}(x-\mu_X),\;\sigma_Y^2(1-\rho^2)\right). $$
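These conditional-moment formulas can be checked by simulation; a sketch with hypothetical parameter values (added here, assuming NumPy is available) conditions on Y lying in a narrow band around y₀ and compares the empirical mean and variance with the theory:

```python
import numpy as np

# Hypothetical parameters mu_X, mu_Y, sigma_X, sigma_Y, rho.
mu = np.array([1.0, -2.0])
sX, sY, rho = 2.0, 0.5, 0.6
Sigma = np.array([[sX**2, rho * sX * sY],
                  [rho * sX * sY, sY**2]])

rng = np.random.default_rng(0)
xy = rng.multivariate_normal(mu, Sigma, size=1_000_000)

# Condition on Y close to y0 and compare with N(mu_X + rho sX/sY (y0-mu_Y), sX^2(1-rho^2)).
y0 = -1.5
sel = xy[np.abs(xy[:, 1] - y0) < 0.01, 0]
print("empirical  :", sel.mean(), sel.var())
print("theoretical:", mu[0] + rho * (sX / sY) * (y0 - mu[1]), sX**2 * (1 - rho**2))
```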
1.8 Transformations of Random Variables

1.8.1 Function of One Random Variable

Let g : ℝ → ℝ be a measurable function, i.e. ∀y ∈ ℝ

g⁻¹((−∞, y]) = {x : g(x) ≤ y} ∈ B.

Then, we can define a random variable Y : Ξ → ℝ by

ξ ↦ Y(ξ) = g(X(ξ))

possessing a distribution function determined by

$$ F_Y(y) = P(Y \le y) = P\bigl(g(X) \le y\bigr) = P\bigl(\{\xi : g(X(\xi)) \le y\}\bigr) = P\Bigl(X^{-1}\bigl(g^{-1}((-\infty,y])\bigr)\Bigr). $$
Strictly Monotonic Function

a) Suppose that g(x) is a strictly monotonically increasing function. The distribution function can be written as

$$ F_Y(y) = P(Y \le y) = P\bigl(g(X) \le y\bigr) = P\bigl(X \le g^{-1}(y)\bigr) = F_X\bigl(g^{-1}(y)\bigr) = \int_{-\infty}^{g^{-1}(y)} f_X(x)\,dx. $$

Moreover, if f_X(x) is continuous and g(x) continuously differentiable at x = g⁻¹(y), we can derive the density function f_Y(y) by applying the chain rule of calculus.
Hence, we obtain

$$ f_Y(y) = \begin{cases} f_X\bigl(g^{-1}(y)\bigr)\,\dfrac{d\,g^{-1}(y)}{dy} = \dfrac{f_X\bigl(g^{-1}(y)\bigr)}{\left.\dfrac{dg(x)}{dx}\right|_{x=g^{-1}(y)}} & \text{for } g(-\infty) < y < g(\infty) \\ 0 & \text{elsewhere.} \end{cases} $$
b) Let g(x) be a strictly monotonically decreasing function. Consequently, we have

$$ F_Y(y) = P(Y \le y) = P\bigl(g(X) \le y\bigr) = P\bigl(X \ge g^{-1}(y)\bigr) = 1 - P\bigl(X < g^{-1}(y)\bigr) = 1 - F_X\bigl(g^{-1}(y)-0\bigr) = 1 - \int_{-\infty}^{g^{-1}(y)} f_X(x)\,dx = \int_{g^{-1}(y)}^{\infty} f_X(x)\,dx. $$

Again, if f_X(x) is continuous and g(x) continuously differentiable at x = g⁻¹(y), the density function f_Y(y) can be determined by employing the chain rule of calculus.
Thus, we obtain

$$ f_Y(y) = \begin{cases} -f_X\bigl(g^{-1}(y)\bigr)\,\dfrac{d\,g^{-1}(y)}{dy} = -\dfrac{f_X\bigl(g^{-1}(y)\bigr)}{\left.\dfrac{dg(x)}{dx}\right|_{x=g^{-1}(y)}} & \text{for } g(\infty) < y < g(-\infty) \\ 0 & \text{elsewhere.} \end{cases} $$
Exploiting the property

$$ \frac{dg(x)}{dx}\;\begin{cases} > 0 & \text{for } g(x) \text{ strictly monotonically increasing} \\ < 0 & \text{for } g(x) \text{ strictly monotonically decreasing,} \end{cases} $$

the results of cases a) and b) can be summarised. Let f_X(x) be continuous at x = g⁻¹(y) and g(x) any strictly monotonic function that is continuously differentiable at x = g⁻¹(y). Then f_Y(y) can be calculated by

$$ f_Y(y) = \begin{cases} \dfrac{f_X\bigl(g^{-1}(y)\bigr)}{\Bigl|\left.\dfrac{dg(x)}{dx}\right|_{x=g^{-1}(y)}\Bigr|} & \text{for } y \text{ between } g(-\infty) \text{ and } g(\infty) \\ 0 & \text{elsewhere.} \end{cases} $$
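A numerical check of this formula (an added sketch, assuming NumPy; not one of the lecture's exercises): for the strictly increasing g(x) = eˣ with X ∼ N(0,1), g⁻¹(y) = ln y and dg/dx = eˣ = y, so f_Y(y) = f_X(ln y)/y, which should match a histogram of transformed samples:

```python
import numpy as np

# Transformation Y = g(X) = exp(X) with X ~ N(0,1); g is strictly increasing.
rng = np.random.default_rng(1)
y = np.exp(rng.standard_normal(1_000_000))

def f_Y(y):
    """f_Y(y) = f_X(g^{-1}(y)) / |dg/dx| at x = ln(y), where dg/dx = e^x = y."""
    return np.exp(-np.log(y)**2 / 2) / (np.sqrt(2 * np.pi) * y)

hist, edges = np.histogram(y, bins=[0.5, 1.0, 1.5, 2.0], density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.round(hist, 3))          # empirical density estimates
print(np.round(f_Y(centers), 3))  # values from the transformation formula
```

The two printed rows agree approximately (exactly in the limit of narrow bins and many samples).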
Exercise 1.8-1: Linear function
Non-Monotonic Function

Theorem:
Let f_X(x) denote the continuous density function of the random variable X and let g(x) be a continuously differentiable function. Furthermore, suppose that the equation y = g(x) may possess n solutions for a particular y, i.e.

y = g(x₁) = … = g(x_n).

Then f_Y(y), the continuous density function of the random variable Y = g(X), can be determined by

$$ f_Y(y) = \sum_{i=1}^{n}\frac{f_X(x_i)}{\bigl|dg(x_i)/dx\bigr|}\Bigg|_{x_i = g_i^{-1}(y)}. $$
Exercise 1.8-2: Quadratic function
1.8.2 One Function of Two Random Variables

Suppose (X,Y) are random variables with bivariate density function f_{XY}(x,y). Let g(x,y) be a function such that

Z = g(X,Y)

represents a random variable, i.e. for all z ∈ ℝ

D_z = {(x,y) : g(x,y) ≤ z} ∈ B.

Then the distribution function of Z is given by

$$ F_Z(z) = P(Z \le z) = P\bigl(g(X,Y) \le z\bigr) = P\bigl((X,Y) \in D_z\bigr) = \iint_{D_z} f_{XY}(x,y)\,dx\,dy. $$
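Anticipating Exercise 1.8-3: for g(X,Y) = X + Y with independent X and Y, differentiating the integral over D_z = {(x,y) : x + y ≤ z} yields the convolution f_Z(z) = ∫ f_X(x) f_Y(z−x) dx (a standard result). A numerical sketch (added illustration, assuming NumPy) for two R(0,1) variables:

```python
import numpy as np

dx = 0.01
x = np.arange(-10.0, 10.0, dx)
f_X = np.where((x >= 0) & (x <= 1), 1.0, 0.0)   # X ~ R(0,1)
f_Y = f_X.copy()                                 # Y ~ R(0,1), independent of X

f_Z = np.convolve(f_X, f_Y) * dx                 # f_Z(z) = (f_X * f_Y)(z)
z = 2 * x[0] + dx * np.arange(f_Z.size)          # grid on which f_Z lives
k = int(round((1.0 - z[0]) / dx))                # index of z = 1.0
print(z[k], f_Z[k])                              # triangular density, peak ≈ 1 at z = 1
```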
Exercise 1.8-3: Sum of two random variables
Exercise 1.8-4: Magnitude of the difference of two independent random variables
1.8.3 Two Functions of Two Random Variables

Let (X,Y) be random variables with bivariate density function f_{XY}(x,y). Suppose g₁(x,y) and g₂(x,y) are functions such that

U = g₁(X,Y)  and  V = g₂(X,Y)

are random variables, i.e. for all u, v ∈ ℝ

D_u = {(x,y) : g₁(x,y) ≤ u} ∈ B,  D_v = {(x,y) : g₂(x,y) ≤ v} ∈ B

and consequently

D_{uv} = D_u ∩ D_v = {(x,y) : g₁(x,y) ≤ u, g₂(x,y) ≤ v} ∈ B.
The bivariate distribution function of (U,V) is given by

$$ F_{UV}(u,v) = P(U \le u,\, V \le v) = P\bigl(g_1(X,Y) \le u,\, g_2(X,Y) \le v\bigr) = P\bigl((X,Y) \in D_{uv}\bigr) = \iint_{D_{uv}} f_{XY}(x,y)\,dx\,dy. $$

Assume that the equation system

(u,v) = (g₁(x,y), g₂(x,y))

has the unique solution

(x,y) = (g₁⁻¹(u,v), g₂⁻¹(u,v)).
Furthermore, suppose g₁(x,y), g₂(x,y) have continuous partial derivatives and the determinant of the Jacobian

$$ J(x,y) = \frac{\partial(g_1,g_2)}{\partial(x,y)} = \begin{pmatrix} \partial g_1/\partial x & \partial g_1/\partial y \\ \partial g_2/\partial x & \partial g_2/\partial y \end{pmatrix} $$

does not vanish, i.e. det(J(x,y)) ≠ 0. Then the bivariate density function of (U,V) is given by

$$ f_{UV}(u,v) = \frac{f_{XY}\bigl(g_1^{-1}(u,v),\,g_2^{-1}(u,v)\bigr)}{\Bigl|\det J\bigl(g_1^{-1}(u,v),\,g_2^{-1}(u,v)\bigr)\Bigr|}. $$
Alternatively, using the Jacobian

$$ \tilde{J}(u,v) = \frac{\partial(g_1^{-1},g_2^{-1})}{\partial(u,v)} = \begin{pmatrix} \partial g_1^{-1}/\partial u & \partial g_1^{-1}/\partial v \\ \partial g_2^{-1}/\partial u & \partial g_2^{-1}/\partial v \end{pmatrix} $$

and exploiting the well-known results

$$ \tilde{J}(u,v) = J\bigl(g_1^{-1}(u,v),\,g_2^{-1}(u,v)\bigr)^{-1} $$

and

$$ \det\bigl(\tilde{J}(u,v)\bigr) = 1\big/\det\Bigl(J\bigl(g_1^{-1}(u,v),\,g_2^{-1}(u,v)\bigr)\Bigr), $$

we can write

$$ f_{UV}(u,v) = f_{XY}\bigl(g_1^{-1}(u,v),\,g_2^{-1}(u,v)\bigr)\,\Bigl|\det\bigl(\tilde{J}(u,v)\bigr)\Bigr|. $$
Exercise 1.8-5: Linear transformation

Exercise 1.8-6: Product of two random variables

Exercise 1.8-7: Quotient of two random variables

Exercise 1.8-8: Rayleigh distribution
1.9 Expectation Operator

1.9.1 Expectation for Univariate Distributions

Expected Value of a Random Variable
The expected value of a random variable X, also called mean value, is defined by

$$ \mu_X = E(X) = \begin{cases} \displaystyle\sum_i x_i\,p_X(x_i) & \text{when } X \text{ is discrete} \\ \displaystyle\int_{-\infty}^{\infty} x\,f_X(x)\,dx & \text{when } X \text{ is continuous,} \end{cases} $$

provided that the sum respectively integral converges absolutely. The two cases can be summarized by introducing the
so-called Stieltjes integral

$$ \mu_X = E(X) = \int_{-\infty}^{\infty} x\,dF_X(x). $$

Remark:
Let a = x₀ < x₁ < … < x_n = b be points that provide a partition of [a,b] into n subintervals (x_k, x_{k+1}] (k = 0, …, n−1) and let x̃_k ∈ (x_k, x_{k+1}]. Then the Stieltjes integral is defined by

$$ \int_a^b g(x)\,dF(x) = \lim_{\substack{n\to\infty \\ \max_k (x_{k+1}-x_k)\to 0}}\;\sum_{k=0}^{n-1} g(\tilde{x}_k)\,\bigl(F(x_{k+1})-F(x_k)\bigr). $$
Exercise 1.9-1: Expected value for Poisson distribution
Exercise 1.9-2: Expected value for exponential distribution
Exercise 1.9-3: Expected value for normal distribution
Exercise 1.9-4: Expected value for Cauchy distribution
Expected Value of a Function of a Random Variable
For a measurable function g(X) of the random variable X, we define the expected value of g(X) by

$$ E\bigl(g(X)\bigr) = \int_{-\infty}^{\infty} g(x)\,dF_X(x) = \begin{cases} \displaystyle\sum_i g(x_i)\,p_X(x_i) & \text{when } X \text{ is discrete} \\ \displaystyle\int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx & \text{when } X \text{ is continuous,} \end{cases} $$

provided that the sum respectively integral converges absolutely.
Let F_Y(y) denote the distribution function of the random variable Y = g(X); then by using the definition of integration it is easy to establish that

$$ E\bigl(g(X)\bigr) = \int_{-\infty}^{\infty} g(x)\,dF_X(x) = \int_{-\infty}^{\infty} y\,dF_Y(y) = E(Y). $$

Consequently, the expected value of a function of a random variable g(X) can be computed directly without first determining the distribution function of the random variable Y = g(X).
Moments
If g(X) = X^k with k > 0, the expected value of g(X), i.e.

$$ m_k = E(X^k) = \int_{-\infty}^{\infty} x^k\,dF_X(x) = \begin{cases} \displaystyle\sum_i x_i^k\,p_X(x_i) & \text{when } X \text{ is discrete} \\ \displaystyle\int_{-\infty}^{\infty} x^k f_X(x)\,dx & \text{when } X \text{ is continuous,} \end{cases} $$

is called the k-th moment of X, provided that the integral converges absolutely. For k = 1, we obtain the mean value µ_X = m₁ = E(X).
Centralised Moments
Suppose g(X) = (X − µ_X)^k with k > 0; then the expected value of g(X), i.e.

$$ c_k = E\bigl((X-\mu_X)^k\bigr) = E\left(\sum_{m=0}^{k}\binom{k}{m}(-\mu_X)^m X^{k-m}\right) = \sum_{m=0}^{k}\binom{k}{m}(-\mu_X)^m\,E(X^{k-m}), $$

is called the k-th centralised moment of X.
For k = 2, the centralised moment given by

$$ \sigma_X^2 = \mathrm{Var}(X) = c_2 = E\bigl((X-\mu_X)^2\bigr) = E(X^2)-\mu_X^2 $$

is called the variance. The positive root of the variance is denoted by σ_X and is called the standard deviation.

Absolute Moments
The k-th absolute moment of X is defined by

$$ E\bigl(|X|^k\bigr) = \int_{-\infty}^{\infty} |x|^k\,dF_X(x). $$
Because of the inequality

$$ |X|^{k-1} \le 1+|X|^k \quad\text{for } k = 1, 2, \ldots, $$

we can state that the existence of the k-th absolute moment ensures the existence of the (k−1)-th moment.

Chebyshev Inequality
Let X be a random variable. For k ≥ 1 and any ε > 0 the inequality

$$ P\bigl(|X| \ge \varepsilon\bigr) \le \frac{E\bigl(|X|^k\bigr)}{\varepsilon^k} $$

holds.
Exercise 1.9-5: Second order moments for Poisson distribution

Exercise 1.9-6: Second order moments for exponential distribution

Exercise 1.9-7: Second order moments for normal distribution

Exercise 1.9-8: Application and proof of the Chebyshev inequality
Characteristic Function
The characteristic function of the random variable X is defined by taking the expected value of g(X) = e^{jsX}, i.e.

$$ \Phi_X(s) = E\bigl(e^{jsX}\bigr) = \int_{-\infty}^{\infty} e^{jsx}\,dF_X(x) = \begin{cases} \displaystyle\sum_i e^{jsx_i}\,p_X(x_i) & \text{when } X \text{ is discrete} \\ \displaystyle\int_{-\infty}^{\infty} e^{jsx}\,f_X(x)\,dx & \text{when } X \text{ is continuous,} \end{cases} $$

where s ∈ ℝ.
Characteristic functions have the properties:

- Φ_X(s) is continuous in s
  (absolute and uniform convergence of the sum resp. integral)

- |Φ_X(s)| ≤ 1 for all s ∈ ℝ

- Φ_X(0) = E(e⁰) = E(1) = 1
  (the characteristic function takes its maximum at s = 0)

Note:

$$ |\Phi_X(s)| = \left|\int_{-\infty}^{\infty} e^{jsx} f_X(x)\,dx\right| \le \int_{-\infty}^{\infty} \bigl|e^{jsx}\bigr|\,f_X(x)\,dx = \int_{-\infty}^{\infty} f_X(x)\,dx = 1 $$

$$ |\Phi_X(s)| = \Bigl|\sum_i e^{jsx_i}\,p_X(x_i)\Bigr| \le \sum_i \bigl|e^{jsx_i}\bigr|\,p_X(x_i) = \sum_i p_X(x_i) = 1 $$
Moreover, one can easily observe that Φ_X(−ω) equals the Fourier transform of f_X(x). Hence, the properties of a characteristic function are essentially equivalent to the properties of a Fourier transform (one-to-one mapping).

Consequently, the probability distribution of a random variable is uniquely defined by the inverse Fourier transform of the characteristic function Φ_X(s), i.e.

$$ f_X(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-jsx}\,\Phi_X(s)\,ds. $$
Let X be a random variable with characteristic function Φ_X(s) and Y = aX + b. Thus, the characteristic function of Y can be easily determined by

$$ \Phi_Y(s) = E\bigl(e^{jsY}\bigr) = E\bigl(e^{js(aX+b)}\bigr) = e^{jbs}\,E\bigl(e^{jasX}\bigr) = e^{jbs}\,\Phi_X(as). $$

Its probability density function can be derived by applying the inverse Fourier transform as follows:

$$ f_Y(y) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-jsy}\,\Phi_Y(s)\,ds = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-js(y-b)}\,\Phi_X(as)\,ds = \frac{1}{|a|}\,\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-js'(y-b)/a}\,\Phi_X(s')\,ds' = \frac{1}{|a|}\,f_X\!\left(\frac{y-b}{a}\right). $$
Moment Theorem
Suppose that E(X^k) exists for any k ≥ 1, i.e. E(|X|^k) < ∞, and therefore

$$ \frac{d^k\Phi_X(s)}{ds^k} = \frac{d^k E\bigl(e^{jsX}\bigr)}{ds^k} = E\left(\frac{\partial^k e^{jsX}}{\partial s^k}\right) = j^k\,E\bigl(X^k e^{jsX}\bigr) $$

holds, i.e. the order of differentiation and integration can be interchanged; we can then deduce the so-called moment theorem

$$ m_k = E\bigl(X^k\bigr) = \frac{1}{j^k}\,\frac{d^k\Phi_X(s)}{ds^k}\bigg|_{s=0}. $$
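The moment theorem can be verified symbolically. The sketch below (an added illustration, assuming SymPy is available) uses the known characteristic function of the normal distribution, Φ_X(s) = exp(jµs − σ²s²/2) (cf. Exercise 1.9-9), and recovers m₁ = µ and m₂ = µ² + σ²:

```python
import sympy as sp

s, mu, sigma = sp.symbols('s mu sigma', real=True)
j = sp.I

# Characteristic function of X ~ N(mu, sigma^2).
Phi = sp.exp(j * mu * s - sigma**2 * s**2 / 2)

# Moment theorem: m_k = (1/j^k) d^k Phi / ds^k evaluated at s = 0.
m1 = sp.simplify(sp.diff(Phi, s, 1).subs(s, 0) / j**1)
m2 = sp.simplify(sp.diff(Phi, s, 2).subs(s, 0) / j**2)
print(m1)             # mu
print(sp.expand(m2))  # mu**2 + sigma**2
```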
Exercise 1.9-9: Characteristic function of univariate normal distributions

Exercise 1.9-10: Higher order moments of univariate normal distributions

Exercise 1.9-11: Non-negative definiteness of the characteristic function
1.9.2 Expectation for Bivariate Distributions

Expected Value of a Function of Two Random Variables
For a measurable function g(X,Y) of the random variables X and Y, the expected value of g(X,Y) is defined by

$$ E\bigl(g(X,Y)\bigr) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\,d^2F_{XY}(x,y) = \begin{cases} \displaystyle\sum_i\sum_j g(x_i,y_j)\,p_{XY}(x_i,y_j) & \text{in the discrete case} \\ \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\,f_{XY}(x,y)\,dx\,dy & \text{in the continuous case,} \end{cases} $$

provided that the sum respectively integral converges absolutely.
Let F_Z(z) denote the distribution function of the random variable Z = g(X,Y). Then, analogous to the univariate case, we can show that

$$ E\bigl(g(X,Y)\bigr) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\,d^2F_{XY}(x,y) = \int_{-\infty}^{\infty} z\,dF_Z(z) = E(Z). $$

That is, the expected value of a function of two random variables g(X,Y) can be computed directly without first determining the distribution function of the random variable Z = g(X,Y).
Expected Value of a Linear Combination
For the expected value of a linear combination we obtain

$$ E\left(\sum_i a_i\,g_i(X,Y)\right) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\sum_i a_i\,g_i(x,y)\,d^2F_{XY}(x,y) = \sum_i a_i\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g_i(x,y)\,d^2F_{XY}(x,y) = \sum_i a_i\,E\bigl(g_i(X,Y)\bigr). $$

Thus, the expected value of a linear combination equals the linear combination of the expected values. We note in particular that

$$ E\bigl(aX^k+bY^k\bigr) = a\,E(X^k)+b\,E(Y^k) \quad\text{for } k \ge 1. $$
Bivariate Moments
The (k,l)-th moment and centralised moment of discretely distributed random variables are defined by

$$ m_{kl} = E\bigl(X^kY^l\bigr) = \sum_i\sum_j x_i^k\,y_j^l\;p_{XY}(x_i,y_j), \quad k = 1,2,\ldots;\; l = 1,2,\ldots $$

and

$$ c_{kl} = E\Bigl(\bigl(X-E(X)\bigr)^k\bigl(Y-E(Y)\bigr)^l\Bigr) = E\bigl((X-\mu_X)^k(Y-\mu_Y)^l\bigr) = \sum_i\sum_j (x_i-\mu_X)^k(y_j-\mu_Y)^l\;p_{XY}(x_i,y_j), $$

respectively.
In case of continuously distributed random variables, the (k,l)-th moment and centralised moment are defined by

$$m_{kl} = E\big(X^kY^l\big) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x^k y^l\,f_{XY}(x,y)\,dx\,dy, \qquad k,l = 0,1,2,\ldots$$

and

$$c_{kl} = E\Big(\big(X-\mu_X\big)^k\big(Y-\mu_Y\big)^l\Big) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (x-\mu_X)^k(y-\mu_Y)^l\,f_{XY}(x,y)\,dx\,dy, \qquad k,l = 0,1,2,\ldots$$

respectively.
On setting k = 0 or l = 0, the moments reduce to the corresponding moments of the marginal distributions of X and Y. However, if k ≥ 1 and l ≥ 1, the moments become functions of the complete bivariate distribution.

In particular, setting k = l = 1, the centralised moment, called the covariance between X and Y, is given by

$$c_{11} = \mathrm{Cov}(X,Y) = E\big((X-\mu_X)(Y-\mu_Y)\big) = \sum_i\sum_j (x_i-\mu_X)(y_j-\mu_Y)\,p_{XY}(x_i,y_j)$$

and

$$c_{11} = \mathrm{Cov}(X,Y) = E\big((X-\mu_X)(Y-\mu_Y)\big) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (x-\mu_X)(y-\mu_Y)\,f_{XY}(x,y)\,dx\,dy$$

for the discrete and continuous case, respectively.

The covariance can be expressed by first and second order moments as follows:

$$c_{11} = \mathrm{Cov}(X,Y) = E\big((X-\mu_X)(Y-\mu_Y)\big) = E(XY) - \mu_X\mu_Y.$$

Furthermore, we have

$$c_{20} = \mathrm{Cov}(X,X) = E\big((X-\mu_X)^2\big) = \mathrm{Var}(X) = E\big(X^2\big) - \mu_X^2 = \sigma_X^2.$$
The covariance measures the degree of linear association between X and Y: the larger (resp. smaller) the magnitude of the covariance, the stronger (resp. weaker) the linear association. To achieve a unified understanding of what is large and what is small, we introduce the normalized quantity

$$\rho(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{c_{11}}{\sqrt{c_{20}\,c_{02}}},$$

which is called the correlation coefficient between X and Y.
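For intuition, both quantities are easily estimated from samples; a minimal Python sketch, assuming the illustrative model Y = 2X + N with X and N independent standard normal, so that Cov(X,Y) = 2 and ρ(X,Y) = 2/√5 ≈ 0.894:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2.0*x + rng.normal(size=100_000)             # assumed toy model

c11 = np.mean((x - x.mean()) * (y - y.mean()))   # sample covariance
rho = c11 / (x.std() * y.std())                  # sample correlation coefficient
print(c11, rho)                                  # approx. 2.0 and 0.894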
Exercise 1.9-12: Covariance and correlation coefficient of bivariate normal distributions
Theorem:
For all bivariate distributions with finite second order moments the Cauchy–Schwarz inequality

$$\big(E(XY)\big)^2 \le E\big(X^2\big)\,E\big(Y^2\big)$$

holds, and the correlation coefficient satisfies the inequality

$$|\rho(X,Y)| \le 1,$$

where equality holds if, with probability 1, a linear relationship between X and Y exists.
Exercise 1.9-13: Proof of the inequalities
Uncorrelatedness, Orthogonality and Independence
Let X, Y be random variables. Then X, Y are called

(1) uncorrelated, if

$$\rho(X,Y) = 0 \;\Rightarrow\; \mathrm{Cov}(X,Y) = 0 \;\Rightarrow\; E(XY) = E(X)\,E(Y),$$

(2) orthogonal, if

$$E(XY) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x\,y\,f_{XY}(x,y)\,dx\,dy = 0,$$

(3) independent, if for all x, y ∈ ℝ

$$f_{XY}(x,y) = f_X(x)\cdot f_Y(y).$$
Implications:
1) If X and Y are independent random variables and g1(x) and g2(y) are measurable functions, then U = g1(X) and V = g2(Y) are independent and uncorrelated random variables.
2) If X and Y are orthogonal random variables, then E((X+Y)²) = E(X²) + E(Y²) holds.
3) If X and Y are orthogonal random variables and E(X) = 0 and/or E(Y) = 0, then X and Y are uncorrelated random variables.
Remarks:
§ If X and Y are uncorrelated random variables, then U = g1(X) and V = g2(Y) are not necessarily uncorrelated random variables.
§ If X and Y are uncorrelated random variables, then X and Y are not necessarily independent random variables.
§ If X and Y are uncorrelated and jointly normally distributed random variables, then X and Y are also independent random variables.
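A small numerical illustration of the second remark, assuming X ∼ N(0,1) and Y = X²: the pair is uncorrelated, since E(XY) − E(X)E(Y) = E(X³) = 0, yet Y is a deterministic function of X and hence anything but independent of it.

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)
y = x**2                                      # Y is a function of X: dependent

print(np.mean(x*y) - np.mean(x)*np.mean(y))   # close to 0: uncorrelated
print(np.mean(y[np.abs(x) > 1]), np.mean(y))  # conditional mean differs: not independent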
Exercise 1.9-14: Verification of the remarks
Conditional Expected Value
Suppose that X and Y are bivariate distributed continuous random variables. The conditional expected value of X, given Y = y, written as E_X(X | Y = y), is defined by

$$E_X(X\,|\,Y=y) = \int_{-\infty}^{\infty} x\,f_X(x\,|\,Y=y)\,dx = \int_{-\infty}^{\infty} x\,\frac{f_{XY}(x,y)}{f_Y(y)}\,dx = \frac{1}{f_Y(y)}\int_{-\infty}^{\infty} x\,f_{XY}(x,y)\,dx,$$

provided that the integral converges absolutely. The corresponding expression for the discrete case is obtained with obvious modifications.
In general, the value of E_X(X | Y = y) will vary as we vary the value of y. Thus, E_X(X | Y = y) is a function of y and we can write

$$E_X(X\,|\,Y=y) = \psi_X(y),$$

where ψ_X(y) is called the regression function of X on Y. Analogously, the conditional expected value of Y, given X = x, is defined by

$$E_Y(Y\,|\,X=x) = \psi_Y(x),$$

where ψ_Y(x) denotes the regression function of Y on X.
More generally, if we consider a measurable function g(X) of X whose expected value exists, then the conditional expected value of g(X), given Y = y, is given by

$$E_X\big(g(X)\,|\,Y=y\big) = \int_{-\infty}^{\infty} g(x)\,f_X(x\,|\,Y=y)\,dx = \frac{1}{f_Y(y)}\int_{-\infty}^{\infty} g(x)\,f_{XY}(x,y)\,dx = \psi_{g(X)}(y).$$

Similarly, the conditional expected value of g(Y), given X = x, is defined by

$$E_Y\big(g(Y)\,|\,X=x\big) = \psi_{g(Y)}(x).$$
Conditional Expectation
Let us now consider the random variable ψ_{g(X)}(Y), which we obtain by replacing y by Y in the function ψ_{g(X)}(·). The random variable ψ_{g(X)}(Y) is called the conditional expectation of g(X), given Y, and we write

$$\psi_{g(X)}(Y) = E_X\big(g(X)\,|\,Y\big).$$

For the random variable ψ_{g(Y)}(X), we analogously write

$$\psi_{g(Y)}(X) = E_Y\big(g(Y)\,|\,X\big).$$
Properties of conditional expectations:
(1) E_Y[E_X(X | Y)] = E(X) and E_X[E_Y(Y | X)] = E(Y), or more generally
E_Y[E_X(g(X) | Y)] = E(g(X)) and E_X[E_Y(g(Y) | X)] = E(g(Y)).
(2) Moreover, if h(Y) is a function such that E(g(X)h(Y)) exists, then
E_X(g(X)h(Y) | Y = y) = h(y)·E_X(g(X) | Y = y)
(conditional on Y = y, h(Y) can be treated as a constant), and hence we have
E_Y[E_X(g(X)h(Y) | Y)] = E_Y[h(Y)·E_X(g(X) | Y)] = E(g(X)h(Y)).
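Property (1) can be verified exactly on a small discrete example; in the sketch below the joint pmf table is assumed purely for illustration.

import numpy as np

# assumed illustrative joint pmf p_XY (rows: values of X, columns: values of Y)
x_vals = np.array([0.0, 1.0, 2.0])
p_xy = np.array([[0.10, 0.20],
                 [0.25, 0.15],
                 [0.05, 0.25]])                       # entries sum to 1

p_y = p_xy.sum(axis=0)                                # marginal pmf of Y
psi_X = (x_vals[:, None] * p_xy).sum(axis=0) / p_y    # E_X(X | Y = y) for each y

lhs = (psi_X * p_y).sum()                             # E_Y[E_X(X | Y)]
rhs = (x_vals[:, None] * p_xy).sum()                  # E(X)
print(lhs, rhs)                                       # identical up to rounding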
Exercise 1.9-15: Verification of the properties

Exercise 1.9-16: Determination of E(XY) for bivariate normally distributed random variables
Bivariate Characteristic Function
The bivariate characteristic function of the random variables X and Y is defined by

$$\Phi_{XY}(s_1,s_2) = E\Big(\exp\big(j(s_1X + s_2Y)\big)\Big) \quad\text{with } (s_1,s_2)^T \in \mathbb{R}^2,$$

and we can write

$$\Phi_{XY}(s_1,s_2) = \sum_n\sum_m \exp\big(j(s_1x_n + s_2y_m)\big)\,p_{XY}(x_n,y_m)$$

and

$$\Phi_{XY}(s_1,s_2) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \exp\big(j(s_1x + s_2y)\big)\,f_{XY}(x,y)\,dx\,dy$$

for the discrete and continuous case, respectively.
Properties of bivariate characteristic functions:
§ Φ_XY(s1,s2) is continuous in s1 and s2
§ |Φ_XY(s1,s2)| ≤ 1 for all (s1,s2)ᵀ ∈ ℝ²
§ Φ_XY(s1,0) = Φ_X(s1) and Φ_XY(0,s2) = Φ_Y(s2)
§ Φ_XY(0,0) = E(e⁰) = E(1) = 1
§ Φ_XY(−ω1,−ω2) is the 2-d Fourier transform of f_XY(x,y)

$$\Rightarrow\; f_{XY}(x,y) = \frac{1}{(2\pi)^2}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-j(s_1x+s_2y)}\,\Phi_{XY}(s_1,s_2)\,ds_1\,ds_2$$

§ the random variables X and Y are independent, iff Φ_XY(s1,s2) = Φ_X(s1)·Φ_Y(s2)
Exercise 1.9-17: Determine the density of Z = X + Y, if X and Y are independent random variables
Moment Theorem
Supposing the moments m_kl = E(X^kY^l) exist and therefore

$$\frac{\partial^{k+l}\Phi_{XY}(s_1,s_2)}{\partial s_1^k\,\partial s_2^l} = \frac{\partial^{k+l} E\big(e^{j(s_1X+s_2Y)}\big)}{\partial s_1^k\,\partial s_2^l} = j^{k+l}\,E\big(X^kY^l\,e^{j(s_1X+s_2Y)}\big)$$

holds, i.e. the order of differentiation and integration can be interchanged, we can deduce the moment theorem

$$m_{kl} = E\big(X^kY^l\big) = \frac{1}{j^{k+l}}\left.\frac{\partial^{k+l}\Phi_{XY}(s_1,s_2)}{\partial s_1^k\,\partial s_2^l}\right|_{s_1=0,\,s_2=0}.$$
1.9.3 Mean Square Error Estimation
Non-linear Mean Square Error Estimation
We wish to estimate the random variable Y by a function of the random variable X. Our aim is to find a function g(X) such that the mean square error

$$q(g) = E\Big(\big(Y-g(X)\big)^2\Big) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \big(y-g(x)\big)^2\,f_{XY}(x,y)\,dx\,dy$$

is minimized. Over the class of all functions g for which the expected value exists, q(g) is minimized by choosing

$$g(x) = \psi_Y(x) = E_Y(Y\,|\,X=x).$$
Exercise 1.9-18: Proof of the non-linear mean square error estimation result
Linear Mean Square Error Estimation
Now we wish to estimate the random variable Y by a linear function of X, i.e. g(X) = a1·X + a0. The objective is to minimise the mean square error

$$q(a_1,a_0) = E\Big[\big(Y-(a_1X+a_0)\big)^2\Big]$$

by varying the parameters a1 and a0. That is, we want to solve the minimisation problem

$$\hat{\mathbf a} = (\hat a_1,\hat a_0)^T = \underset{a_1,a_0}{\arg\min}\,q(a_1,a_0).$$

The mean square error is minimum if the equation system

$$\begin{pmatrix} \partial q(a_1,a_0)/\partial a_1\\[2pt] \partial q(a_1,a_0)/\partial a_0 \end{pmatrix}_{a_1=\hat a_1,\,a_0=\hat a_0} = \begin{pmatrix} -2\,E\big((Y-\hat a_1X-\hat a_0)\,X\big)\\[2pt] -2\,E\big(Y-\hat a_1X-\hat a_0\big) \end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}$$

is satisfied. Solving the equation system provides

$$\hat a_1 = \frac{E(XY)-E(X)E(Y)}{E(X^2)-\big(E(X)\big)^2} = \frac{m_{11}-m_{10}m_{01}}{m_{20}-m_{10}^2} = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)} = \rho(X,Y)\,\frac{\sigma_Y}{\sigma_X}$$

and

$$\hat a_0 = E(Y) - \hat a_1\,E(X) = m_{01} - \hat a_1 m_{10} = \frac{m_{20}m_{01}-m_{10}m_{11}}{m_{20}-m_{10}^2}.$$
Substitution of a1 and a0 by â1 and â0 in q(a1,a0) gives the minimum mean square error

$$q(\hat a_1,\hat a_0) = E\Big(\big(Y-(\hat a_1X+\hat a_0)\big)^2\Big) = E\Big(\big(Y-E(Y)-\hat a_1(X-E(X))\big)^2\Big) = \mathrm{Var}(Y) - 2\hat a_1\,\mathrm{Cov}(X,Y) + \hat a_1^2\,\mathrm{Var}(X) = \sigma_Y^2\big(1-\rho^2(X,Y)\big).$$

Moreover, the minimum mean square errors of the linear and the non-linear approach obviously satisfy the relationship

$$q(\hat a_1,\hat a_0) \ge q(\hat g).$$
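A brief numerical sketch of these formulas, assuming the illustrative model Y = 3X + 1 + N with X ∼ N(0,1) and independent noise N ∼ N(0,4): the moment-based estimates of â1 and â0 recover the model, and the residual mean square error matches σ_Y²(1 − ρ²(X,Y)).

import numpy as np

rng = np.random.default_rng(2)
n = 200_000

x = rng.normal(size=n)                          # assumed toy model
y = 3.0*x + 1.0 + rng.normal(scale=2.0, size=n)

a1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)  # Cov(X,Y)/Var(X)
a0 = y.mean() - a1 * x.mean()                   # E(Y) - a1 E(X)

mse = np.mean((y - (a1*x + a0))**2)
rho = np.corrcoef(x, y)[0, 1]
print(a1, a0)                                   # close to 3 and 1
print(mse, np.var(y)*(1 - rho**2))              # both close to 4, the noise variance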
Exercise 1.9-19: Mean square error estimation for bivariate normally distributed random variables
1.10 Vector-valued Random Variables
1.10.1 Multivariate Distributions
Multivariate Distributions and Density Functions
The basic ideas of bivariate distributions are easily extended to the general case, where instead of two, n random variables X1, X2, …, Xn are considered. Thus, the distribution function of the random vector X = (X1, X2, …, Xn)ᵀ is defined by

$$F_{\mathbf X}(\mathbf x) = F_{X_1\ldots X_n}(x_1,\ldots,x_n) = P(X_1\le x_1,\ldots,X_n\le x_n)$$

and the density function is given by

$$f_{\mathbf X}(\mathbf x) = f_{X_1\ldots X_n}(x_1,\ldots,x_n) = \frac{\partial^n}{\partial x_1\cdots\partial x_n}\,F_{X_1\ldots X_n}(x_1,\ldots,x_n).$$

Marginal Distributions and Density Functions
For a given multivariate distribution function of X1, X2, …, Xn, the marginal distribution and density function of X1, X2, …, Xk can be expressed by

$$F_{X_1\ldots X_k}(x_1,\ldots,x_k) = F_{X_1\ldots X_k\ldots X_n}(x_1,\ldots,x_k,\infty,\ldots,\infty)$$

and

$$f_{X_1\ldots X_k}(x_1,\ldots,x_k) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} f_{X_1\ldots X_k\ldots X_n}(x_1,\ldots,x_k,x_{k+1},\ldots,x_n)\,dx_{k+1}\cdots dx_n,$$

respectively.
Conditional Distributions and Density Functions
The conditional distribution and density function of X1, …, Xk under the condition {X_{k+1} = x_{k+1}, …, Xn = xn} are given by

$$F_{X_1\ldots X_k}(x_1,\ldots,x_k\,|\,X_{k+1}=x_{k+1},\ldots,X_n=x_n) = P(X_1\le x_1,\ldots,X_k\le x_k\,|\,X_{k+1}=x_{k+1},\ldots,X_n=x_n)$$

and

$$f_{X_1\ldots X_k}(x_1,\ldots,x_k\,|\,X_{k+1}=x_{k+1},\ldots,X_n=x_n) = \frac{f_{X_1\ldots X_n}(x_1,\ldots,x_n)}{f_{X_{k+1}\ldots X_n}(x_{k+1},\ldots,x_n)}.$$
Notation:

$$F_{X_1\ldots X_k}(x_1,\ldots,x_k\,|\,x_{k+1},\ldots,x_n) = F_{X_1\ldots X_k}(x_1,\ldots,x_k\,|\,X_{k+1}=x_{k+1},\ldots,X_n=x_n)$$

and

$$f_{X_1\ldots X_k}(x_1,\ldots,x_k\,|\,x_{k+1},\ldots,x_n) = f_{X_1\ldots X_k}(x_1,\ldots,x_k\,|\,X_{k+1}=x_{k+1},\ldots,X_n=x_n).$$

Exercise 1.10-1: Calculations with conditional densities
Independent Random Variables
The random variables X1, X2, …, Xn are said to be independent if the multivariate distribution or density function breaks down into the product of n marginal distributions or density functions. Thus, X1, X2, …, Xn are independent if the multivariate distribution can be written in the form

$$F_{X_1\ldots X_n}(x_1,\ldots,x_n) = \prod_{i=1}^{n} F_{X_i}(x_i)$$

(valid for the discrete and continuous case) or if the density function can be written in the form

$$f_{X_1\ldots X_n}(x_1,\ldots,x_n) = \prod_{i=1}^{n} f_{X_i}(x_i)$$

(applies for the continuous case).

The random variables X1, …, Xk are independent of the random variables X_{k+1}, …, Xn if the multivariate distribution or the density function can be expressed by

$$F_{X_1\ldots X_n}(x_1,\ldots,x_n) = F_{X_1\ldots X_k}(x_1,\ldots,x_k)\cdot F_{X_{k+1}\ldots X_n}(x_{k+1},\ldots,x_n)$$

or

$$f_{X_1\ldots X_n}(x_1,\ldots,x_n) = f_{X_1\ldots X_k}(x_1,\ldots,x_k)\cdot f_{X_{k+1}\ldots X_n}(x_{k+1},\ldots,x_n).$$
1.10.2 Transformation of Vector-valued Random Variables
Suppose X1, X2, …, Xn are random variables and g1, …, gm are functions with m ≤ n such that

$$Y_1 = g_1(X_1,\ldots,X_n),\;\ldots,\;Y_m = g_m(X_1,\ldots,X_n)$$

are random variables. Then the multivariate distribution of Y1, Y2, …, Ym is given by

$$F_{Y_1\ldots Y_m}(y_1,\ldots,y_m) = P\big(g_1(X_1,\ldots,X_n)\le y_1,\ldots,g_m(X_1,\ldots,X_n)\le y_m\big).$$
The multivariate density function of Y1, Y2, …, Ym can be determined as follows. In case of m < n we define n − m functions (auxiliary variables)

$$y_i = g_i(x_1,\ldots,x_n) \quad\text{for } i = 1,\ldots,m,$$
$$y_i = x_i = g_i(x_1,\ldots,x_n) \quad\text{for } i = m+1,\ldots,n.$$

Assume that the equation system

$$(y_1,\ldots,y_n)^T = \big(g_1(x_1,\ldots,x_n),\ldots,g_n(x_1,\ldots,x_n)\big)^T$$

has the unique solution

$$(x_1,\ldots,x_n)^T = \big(g_1^{-1}(y_1,\ldots,y_n),\ldots,g_n^{-1}(y_1,\ldots,y_n)\big)^T.$$
Furthermore, suppose g1(x1,…,xn), …, gn(x1,…,xn) have continuous partial derivatives and the determinant of the Jacobian

$$\mathbf J(x_1,\ldots,x_n) = \begin{pmatrix} \dfrac{\partial g_1}{\partial x_1} & \cdots & \dfrac{\partial g_1}{\partial x_n}\\ \vdots & & \vdots\\ \dfrac{\partial g_n}{\partial x_1} & \cdots & \dfrac{\partial g_n}{\partial x_n} \end{pmatrix}$$

does not vanish, i.e.

$$\det\big(\mathbf J(x_1,\ldots,x_n)\big) \ne 0,$$
then the multivariate density function of Y1, Y2, …, Yn is given by

$$f_{Y_1\ldots Y_n}(y_1,\ldots,y_n) = \left.\frac{f_{X_1\ldots X_n}(x_1,\ldots,x_n)}{\big|\det \mathbf J(x_1,\ldots,x_n)\big|}\right|_{x_i = g_i^{-1}(y_1,\ldots,y_n),\; i=1,\ldots,n}.$$

Finally, integration over y_{m+1}, …, yn provides the multivariate density function of Y1, Y2, …, Ym:

$$f_{Y_1\ldots Y_m}(y_1,\ldots,y_m) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} f_{Y_1\ldots Y_n}(y_1,\ldots,y_n)\,dy_{m+1}\cdots dy_n.$$
Exercise 1.10-2: Density of a linear combination of random variables
1.10.3 Expectations for Vector-valued Random Variables
Expected Value
For a measurable function g(X1, X2, …, Xn) the expected value of g(X1, X2, …, Xn) is defined by

$$E\big(g(X_1,\ldots,X_n)\big) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} g(x_1,\ldots,x_n)\,d^nF_{X_1\ldots X_n}(x_1,\ldots,x_n) = \begin{cases} \displaystyle\sum_{i_1}\cdots\sum_{i_n} g(x_{1,i_1},\ldots,x_{n,i_n})\,p_{i_1\ldots i_n} & \text{(discrete case)}\\[6pt] \displaystyle\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} g(x_1,\ldots,x_n)\,f_{X_1\ldots X_n}(x_1,\ldots,x_n)\,dx_1\cdots dx_n & \text{(continuous case),} \end{cases}$$

provided that the sum resp. integral converges absolutely.
Conditional Expected Value
Let X1, X2, …, Xn be multivariate distributed continuous random variables. The conditional expected value of X1, given X2 = x2, …, Xn = xn, is defined by

$$E\big(X_1\,|\,X_2=x_2,\ldots,X_n=x_n\big) = \int_{-\infty}^{\infty} x_1\,f_{X_1}(x_1\,|\,x_2,\ldots,x_n)\,dx_1 = \int_{-\infty}^{\infty} x_1\,\frac{f_{X_1\ldots X_n}(x_1,\ldots,x_n)}{f_{X_2\ldots X_n}(x_2,\ldots,x_n)}\,dx_1 = \psi_{X_1}(x_2,\ldots,x_n).$$
Conditional Expectation
Now, replacing x2, …, xn by X2, …, Xn in ψ_{X_1}(x2,…,xn) we obtain the random variable

$$\psi_{X_1}(X_2,\ldots,X_n) = E\big(X_1\,|\,X_2,\ldots,X_n\big),$$

which is called the conditional expectation of X1, given X2, …, Xn.

Properties of conditional expectations:

$$E\big[E(X_1\,|\,X_2,\ldots,X_n)\big] = E(X_1)$$

$$E\big(X_1X_2\,|\,X_3\big) = E\big[E(X_1X_2\,|\,X_2,X_3)\,|\,X_3\big] = E\big[X_2\,E(X_1\,|\,X_2,X_3)\,|\,X_3\big]$$
Uncorrelated and Orthogonal Random Variables
The random variables X1, …, Xn are called uncorrelated resp. orthogonal if for all i ≠ j

$$E(X_iX_j) = E(X_i)\,E(X_j) \quad\text{resp.}\quad E(X_iX_j) = 0$$

holds. Consequently, if X1, …, Xn are uncorrelated resp. orthogonal and

$$U = \sum_{i=1}^{n} X_i,$$

we can write

$$\sigma_U^2 = \sum_{i=1}^{n}\sigma_{X_i}^2 \quad\text{resp.}\quad E\big(U^2\big) = \sum_{i=1}^{n} E\big(X_i^2\big).$$
Moments
Let X1, …, Xn be multivariate distributed continuous random variables. Then the joint moments of X1, …, Xn can be determined by

$$m_{k_1\ldots k_n} = E\big(X_1^{k_1}\cdots X_n^{k_n}\big) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} x_1^{k_1}\cdots x_n^{k_n}\,f_{X_1\ldots X_n}(x_1,\ldots,x_n)\,dx_1\cdots dx_n,$$

where the order of the moments is defined by

$$r = k_1 + \cdots + k_n.$$
Characteristic Functions
The characteristic function of the multivariate distributed random variables X1, …, Xn is given by

$$\Phi_{\mathbf X}(\mathbf s) = \Phi_{X_1\ldots X_n}(s_1,\ldots,s_n) = E\left(\exp\Big(j\sum_{i=1}^{n} s_iX_i\Big)\right) = E\big[\exp\big(j\,\mathbf s^T\mathbf X\big)\big].$$

Hence, if the random variables X1, …, Xn are independent, we obtain

$$\Phi_{\mathbf X}(\mathbf s) = E\left(\prod_{i=1}^{n}\exp(js_iX_i)\right) = \prod_{i=1}^{n} E\big[\exp(js_iX_i)\big] = \prod_{i=1}^{n}\Phi_{X_i}(s_i).$$
Let X1, …, Xn now be independent random variables and

$$U = \sum_{i=1}^{n} X_i,$$

then the characteristic function of U is given by

$$\Phi_U(s) = E\big[\exp(jsU)\big] = E\left(\exp\Big(js\sum_{i=1}^{n}X_i\Big)\right) = \prod_{i=1}^{n}\Phi_{X_i}(s).$$

If, in addition, X1, …, Xn possess the density functions f_{X_1}, …, f_{X_n}, the density function of U can be determined by

$$f_U(u) = \big(f_{X_1} * f_{X_2} * \cdots * f_{X_n}\big)(u).$$
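Numerically, this convolution can be evaluated on a grid; a minimal sketch, assuming X1, X2 ∼ U(0,1) independent, whose sum has the triangular density on (0, 2):

import numpy as np

dx = 0.001
x = np.arange(0.0, 1.0, dx)
f1 = np.ones_like(x)                  # density of U(0,1)
f2 = np.ones_like(x)

f_U = np.convolve(f1, f2) * dx        # numerical convolution: density of U = X1 + X2
u = np.arange(f_U.size) * dx
print(f_U.sum() * dx)                 # ~1: the result is again a density
print(f_U[np.argmin(np.abs(u - 1))])  # ~1: the triangular density peaks at u = 1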
Exercise 1.10-3: Sample mean and sample variance of independent normally distributed random variables
Moment Theorem
Suppose that the moments m_{k_1…k_n} = E(X1^{k_1}⋯Xn^{k_n}) exist and therefore

$$\frac{\partial^{k_1+\cdots+k_n}\Phi_{X_1\ldots X_n}(s_1,\ldots,s_n)}{\partial s_1^{k_1}\cdots\partial s_n^{k_n}} = \frac{\partial^{k_1+\cdots+k_n} E\big(e^{j(s_1X_1+\cdots+s_nX_n)}\big)}{\partial s_1^{k_1}\cdots\partial s_n^{k_n}} = j^{k_1+\cdots+k_n}\,E\big(X_1^{k_1}\cdots X_n^{k_n}\,e^{j(s_1X_1+\cdots+s_nX_n)}\big)$$

holds, we can deduce the moment theorem

$$m_{k_1\ldots k_n} = E\big(X_1^{k_1}\cdots X_n^{k_n}\big) = \frac{1}{j^{k_1+\cdots+k_n}}\left.\frac{\partial^{k_1+\cdots+k_n}\Phi_{X_1\ldots X_n}(s_1,\ldots,s_n)}{\partial s_1^{k_1}\cdots\partial s_n^{k_n}}\right|_{s_1=0,\ldots,s_n=0}.$$
Exercise 1.10-4: Application of the Moment Theorem
1.10.4 Mean Square Error Estimation
Non-linear Mean Square Error Estimation
To estimate the random variable X0 by means of the random variables X1, …, Xn, we want to find a function g(X1, …, Xn) that minimizes the mean square error

$$q(g) = E\Big(\big(X_0 - g(X_1,\ldots,X_n)\big)^2\Big) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}\big(x_0 - g(x_1,\ldots,x_n)\big)^2\,f_{X_0\ldots X_n}(x_0,\ldots,x_n)\,dx_0\cdots dx_n.$$

Over the class of all functions g for which the expected value exists, q(g) is minimised by choosing

$$g(x_1,\ldots,x_n) = \psi_{X_0}(x_1,\ldots,x_n) = E\big(X_0\,|\,X_1=x_1,\ldots,X_n=x_n\big).$$
Linear Mean Square Error Estimation
Now we wish to estimate the random variable X0 by a linear function of X1, …, Xn, i.e.

$$g(X_1,\ldots,X_n) = a_1X_1 + \cdots + a_nX_n = \sum_{i=1}^{n} a_iX_i = \mathbf a^T\mathbf X,$$

such that the mean square error

$$q(\mathbf a) = q(a_1,\ldots,a_n) = E\left(\Big(X_0 - \sum_{i=1}^{n} a_iX_i\Big)^2\right) = E\Big[\big(X_0 - \mathbf a^T\mathbf X\big)^2\Big]$$

is minimized by varying the parameter vector a. Thus, we have to solve the minimisation problem

$$\hat{\mathbf a} = (\hat a_1,\ldots,\hat a_n)^T = \underset{a_1,\ldots,a_n}{\arg\min}\,q(a_1,\ldots,a_n).$$

The mean square error is minimum if

$$\left.\frac{\partial q(\mathbf a)}{\partial a_i}\right|_{\mathbf a=\hat{\mathbf a}} = -2\,E\Big[\big(X_0 - \hat{\mathbf a}^T\mathbf X\big)X_i\Big] = 0$$

is satisfied for i = 1, …, n. After some manipulations we obtain the equation system

$$\mathbf R\,\hat{\mathbf a} = \mathbf r, \qquad\text{where}\qquad \mathbf R = E\big(\mathbf X\mathbf X^T\big) \;\text{ and }\; \mathbf r = E\big(X_0\mathbf X\big).$$
With the estimate â = R⁻¹r, the minimum mean square error is given by

$$q(\hat{\mathbf a}) = E\Big[\big(X_0-\hat{\mathbf a}^T\mathbf X\big)^2\Big] = E\Big[\big(X_0-(\mathbf R^{-1}\mathbf r)^T\mathbf X\big)^2\Big] = E\Big[\big(X_0-\mathbf r^T\mathbf R^{-1}\mathbf X\big)^2\Big]$$
$$= E\big(X_0X_0 - 2\,\mathbf r^T\mathbf R^{-1}\mathbf X X_0 + \mathbf r^T\mathbf R^{-1}\mathbf X\mathbf X^T\mathbf R^{-1}\mathbf r\big) = E\big(X_0^2\big) - 2\,\mathbf r^T\mathbf R^{-1}\mathbf r + \mathbf r^T\mathbf R^{-1}\mathbf R\mathbf R^{-1}\mathbf r = E\big(X_0^2\big) - \mathbf r^T\mathbf R^{-1}\mathbf r.$$
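A compact numerical sketch of the normal equations Râ = r, assuming the illustrative model X0 = X1 + 2X2 + N with independent standard normal observations and noise of variance 0.25:

import numpy as np

rng = np.random.default_rng(3)
n = 500_000

X = rng.normal(size=(2, n))                       # observations X = (X1, X2)^T
x0 = X[0] + 2.0*X[1] + rng.normal(scale=0.5, size=n)

R = X @ X.T / n                                   # R = E(X X^T) from samples
r = X @ x0 / n                                    # r = E(X0 X)
a_hat = np.linalg.solve(R, r)                     # solve R a = r

mse = np.mean((x0 - a_hat @ X)**2)
print(a_hat)                                      # close to (1, 2)
print(mse, np.mean(x0**2) - r @ a_hat)            # both close to 0.25, the noise variance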
Orthogonality Principle
The coefficient vector â that minimizes the mean square error satisfies the equation

$$E\Big[\big(X_0 - \hat{\mathbf a}^T\mathbf X\big)X_i\Big] = 0, \qquad i = 1,\ldots,n.$$

Thus, the residual

$$R = X_0 - \hat{\mathbf a}^T\mathbf X = X_0 - \hat X_0$$

is orthogonal to the observations X1, …, Xn. This is known as the orthogonality principle.
A geometric interpretation of this result is shown in the figure below, where X0, R and X1, …, Xn are thought of as vectors.

[Figure: the estimate X̂0 = âᵀX lies in the subspace spanned by X1, X2, X3, and the residual R = X0 − X̂0 stands orthogonally on that subspace.]
Consequently,
1) X̂0 = âᵀX is the orthogonal projection of X0 onto the subspace U spanned by the observations X1, …, Xn.
2) R is orthogonal to the subspace U and therefore orthogonal to the observations X1, …, Xn.
3) q(â) = E((X0 − X̂0)²) = E((X0 − X̂0)X0) = 0 if and only if X0 lies within the subspace U, i.e. X0 is linearly dependent on X1, …, Xn, thus X̂0 = âᵀX = X0.
4) q(â) = E((X0 − X̂0)X0) = E(X0²) if and only if X0 is orthogonal to the subspace U.
1.10.5 Multivariate Normal Distribution
Characteristic Function of a Normally Distributed Vector-valued Random Variable
Let U1, …, Um be m independent standardized normally distributed random variables, i.e. Uk ∼ N(0,1). The characteristic function of Uk is given by

$$\Phi_{U_k}(r_k) = \exp\left(-\frac{r_k^2}{2}\right),$$

cf. Exercise 1.9-9. Consequently, the characteristic function of the random vector U = (U1, …, Um)ᵀ is
$$\Phi_{\mathbf U}(\mathbf r) = E\big(\exp(j\,\mathbf r^T\mathbf U)\big) = E\left(\prod_{k=1}^{m}\exp(jr_kU_k)\right) = \prod_{k=1}^{m}E\big(\exp(jr_kU_k)\big) = \prod_{k=1}^{m}\Phi_{U_k}(r_k) = \exp\left(-\frac{1}{2}\sum_{k=1}^{m}r_k^2\right) = \exp\left(-\frac{1}{2}\,\mathbf r^T\mathbf r\right).$$

Transformation of the random vector U by an (n×m) matrix A provides the random vector

$$\mathbf V = (V_1,\ldots,V_n)^T = \mathbf A\mathbf U,$$

which possesses the characteristic function
$$\Phi_{\mathbf V}(\mathbf s) = E\big(\exp(j\,\mathbf s^T\mathbf V)\big) = E\big(\exp(j\,\mathbf s^T\mathbf A\mathbf U)\big) = \Phi_{\mathbf U}\big((\mathbf s^T\mathbf A)^T\big) = \exp\left(-\frac{1}{2}\,\mathbf s^T\mathbf A(\mathbf s^T\mathbf A)^T\right) = \exp\left(-\frac{1}{2}\,\mathbf s^T\mathbf A\mathbf A^T\mathbf s\right) = \exp\left(-\frac{1}{2}\,\mathbf s^T\boldsymbol\Sigma\,\mathbf s\right),$$

where

$$\boldsymbol\Sigma = E\big(\mathbf V\mathbf V^T\big) = E\big(\mathbf A\mathbf U(\mathbf A\mathbf U)^T\big) = E\big(\mathbf A\mathbf U\mathbf U^T\mathbf A^T\big) = \mathbf A\,E\big(\mathbf U\mathbf U^T\big)\,\mathbf A^T = \mathbf A\mathbf I\mathbf A^T = \mathbf A\mathbf A^T$$

represents the covariance matrix of V.
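This construction is also the standard recipe for generating correlated normal samples: transform independent N(0,1) variables by an assumed matrix A, and the sample covariance approaches Σ = AAᵀ.

import numpy as np

rng = np.random.default_rng(4)
n_samples = 500_000

A = np.array([[1.0, 0.0],
              [0.8, 0.6]])               # assumed mixing matrix; Sigma = A A^T

U = rng.normal(size=(2, n_samples))      # independent N(0,1) components
V = A @ U                                # V = A U is N(0, A A^T)

print(A @ A.T)                           # target covariance
print(V @ V.T / n_samples)               # sample covariance, close to A A^T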
Translation of the random vector V by a constant vector μ = (μ1, …, μn)ᵀ leads to the random vector

$$\mathbf W = (W_1,\ldots,W_n)^T = \mathbf V + \boldsymbol\mu = \mathbf A\mathbf U + \boldsymbol\mu.$$

The characteristic function of W can be expressed by

$$\Phi_{\mathbf W}(\mathbf s) = E\big(\exp(j\,\mathbf s^T\mathbf W)\big) = E\big(\exp(j\,\mathbf s^T(\mathbf V+\boldsymbol\mu))\big) = \exp(j\,\mathbf s^T\boldsymbol\mu)\,E\big(\exp(j\,\mathbf s^T\mathbf V)\big) = \exp\left(j\,\mathbf s^T\boldsymbol\mu - \frac{1}{2}\,\mathbf s^T\boldsymbol\Sigma\,\mathbf s\right).$$

Φ_W(s) represents the characteristic function of the normally distributed random vector W, where μ and Σ denote the vector-valued expected value and the symmetric, non-negative definite covariance matrix, respectively, i.e.

$$E(\mathbf W) = E(\mathbf V+\boldsymbol\mu) = E(\mathbf V)+\boldsymbol\mu = \boldsymbol\mu,$$

$$E\big((\mathbf W-\boldsymbol\mu)(\mathbf W-\boldsymbol\mu)^T\big) = E\big(\mathbf V\mathbf V^T\big) = \boldsymbol\Sigma = \boldsymbol\Sigma^T \qquad\text{and}\qquad \mathbf s^T\boldsymbol\Sigma\,\mathbf s \ge 0 \;\;\forall\,\mathbf s\in\mathbb{R}^n.$$

Theorem:
Let X ∼ N_m(μ_X, Σ_X) and Y = AX + b, where the entries of the (n×m) matrix A and the (n×1) vector b are constants. Then Y ∼ N_n(μ_Y, Σ_Y) with

$$\boldsymbol\mu_Y = E(\mathbf Y) = \mathbf A\boldsymbol\mu_X + \mathbf b \qquad\text{and}\qquad \boldsymbol\Sigma_Y = E\big((\mathbf Y-\boldsymbol\mu_Y)(\mathbf Y-\boldsymbol\mu_Y)^T\big) = \mathbf A\boldsymbol\Sigma_X\mathbf A^T.$$
Exercise 1.10-5: Proof of the Theorem
Density Function of a Normally Distributed Vector-valued Random Variable
Let U1, …, Um be m independent standardized normally distributed random variables, i.e. Uk ∼ N(0,1). Thus, the density function of the random vector U = (U1, …, Um)ᵀ can be written as

$$f_{\mathbf U}(\mathbf u) = \prod_{k=1}^{m} f_{U_k}(u_k) = \prod_{k=1}^{m}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}u_k^2\right) = (2\pi)^{-\frac{m}{2}}\exp\left(-\frac{1}{2}\sum_{k=1}^{m}u_k^2\right) = (2\pi)^{-\frac{m}{2}}\exp\left(-\frac{1}{2}\,\mathbf u^T\mathbf u\right).$$
Now, we transform the random vector U by

$$\mathbf V = (V_1,\ldots,V_m)^T = \mathbf A\mathbf U + \boldsymbol\mu,$$

where A denotes a regular (m×m) matrix, i.e. det(A) ≠ 0. With the inverse transform

$$\mathbf U = \mathbf A^{-1}(\mathbf V - \boldsymbol\mu),$$

the determinant of the Jacobian

$$\det\big(\mathbf J(\mathbf u)\big) = \det\left(\frac{\partial(v_1,\ldots,v_m)}{\partial(u_1,\ldots,u_m)}\right) = \det(\mathbf A),$$

the well known identities
$$\det(\mathbf A) = \det(\mathbf A^T), \qquad \det(\boldsymbol\Sigma) = \det(\mathbf A\mathbf A^T) = \det(\mathbf A)\det(\mathbf A^T) = \big(\det(\mathbf A)\big)^2 > 0,$$
$$\boldsymbol\Sigma^{-1} = (\mathbf A\mathbf A^T)^{-1} = (\mathbf A^T)^{-1}\mathbf A^{-1} = (\mathbf A^{-1})^T\mathbf A^{-1},$$

and the result derived for determining the density of a transformed random vector

$$f_{\mathbf V}(\mathbf v) = \frac{f_{\mathbf U}\big(\mathbf A^{-1}(\mathbf v-\boldsymbol\mu)\big)}{\big|\det(\mathbf A)\big|},$$
we obtain

$$f_{\mathbf V}(\mathbf v) = (2\pi)^{-\frac{m}{2}}\,\big|\det(\mathbf A)\big|^{-1}\exp\left(-\frac{1}{2}\big(\mathbf A^{-1}(\mathbf v-\boldsymbol\mu)\big)^T\mathbf A^{-1}(\mathbf v-\boldsymbol\mu)\right)$$
$$= (2\pi)^{-\frac{m}{2}}\,\big|\det(\mathbf A)\big|^{-1}\exp\left(-\frac{1}{2}(\mathbf v-\boldsymbol\mu)^T(\mathbf A^{-1})^T\mathbf A^{-1}(\mathbf v-\boldsymbol\mu)\right)$$
$$= (2\pi)^{-\frac{m}{2}}\,\big|\det(\mathbf A)\big|^{-1}\exp\left(-\frac{1}{2}(\mathbf v-\boldsymbol\mu)^T(\mathbf A\mathbf A^T)^{-1}(\mathbf v-\boldsymbol\mu)\right)$$
$$= (2\pi)^{-\frac{m}{2}}\,\det(\boldsymbol\Sigma)^{-\frac{1}{2}}\exp\left(-\frac{1}{2}(\mathbf v-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf v-\boldsymbol\mu)\right).$$
Exercise 1.10-6: Transformation to standardized normal distribution
Composed Vector-valued Random Variables
Theorem:
Let X be a random vector composed of the random vectors X1, X2 and obeying the distribution

$$\mathbf X = \begin{pmatrix}\mathbf X_1\\ \mathbf X_2\end{pmatrix} \sim N_k\left(\boldsymbol\mu = \begin{pmatrix}\boldsymbol\mu_1\\ \boldsymbol\mu_2\end{pmatrix},\; \boldsymbol\Sigma = \begin{pmatrix}\boldsymbol\Sigma_{11} & \boldsymbol\Sigma_{12}\\ \boldsymbol\Sigma_{21} & \boldsymbol\Sigma_{22}\end{pmatrix}\right).$$

Then X1 and X2 possess the marginal distributions

$$\mathbf X_1 \sim N_{k_1}\big(\boldsymbol\mu_1,\boldsymbol\Sigma_{11}\big), \qquad \mathbf X_2 \sim N_{k_2}\big(\boldsymbol\mu_2,\boldsymbol\Sigma_{22}\big),$$

and the conditional distribution is given by

$$\mathbf X_1\,|\,\mathbf X_2=\mathbf x_2 \;\sim\; N_{k_1}\Big(\boldsymbol\mu_1 + \boldsymbol\Sigma_{12}\boldsymbol\Sigma_{22}^{-1}(\mathbf x_2-\boldsymbol\mu_2),\; \boldsymbol\Sigma_{11} - \boldsymbol\Sigma_{12}\boldsymbol\Sigma_{22}^{-1}\boldsymbol\Sigma_{21}\Big).$$
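The conditional mean and covariance can be evaluated directly from the partitioned parameters; the numbers below are assumed purely for illustration.

import numpy as np

mu1 = np.array([0.0, 1.0])               # assumed partitioned parameters
mu2 = np.array([2.0])
S11 = np.array([[2.0, 0.3], [0.3, 1.0]])
S12 = np.array([[0.5], [0.2]])
S22 = np.array([[1.5]])

x2 = np.array([3.0])                     # observed value of X2

mu_c = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)   # mu1 + S12 S22^{-1} (x2 - mu2)
S_c  = S11 - S12 @ np.linalg.solve(S22, S12.T)      # S11 - S12 S22^{-1} S21
print(mu_c, S_c, sep='\n')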
Exercise 1.10-7: Proof of the Theorem

Exercise 1.10-8: Mean square error estimation
1.11 Sequences of Random Variables
1.11.1 Convergence Concepts
Let (Xn) = X1, X2, … be a sequence of random variables. For any specific x, (Xn(x)) is a sequence of numbers that might or might not converge, and the notion of convergence can be given several interpretations. Recall that a deterministic sequence (xn) tends to a limit x if, for any given ε > 0, we can find a number N_ε such that

$$|x_n - x| < \varepsilon \quad\text{for every } n > N_\varepsilon.$$
Convergence (everywhere)
If the sequence (Xn(x)) tends to a number X(x) for every x ∈ X, then we say that the random sequence (Xn) converges everywhere to the random variable X, and we write this as

$$\lim_{n\to\infty} X_n = X \quad\text{or}\quad X_n \to X.$$

Convergence almost surely (a.s.)
If the probability of the set of all events ξ that satisfy lim_{n→∞} Xn(ξ) = X(ξ) equals 1, i.e.

$$P\Big(\big\{\xi : \lim_{n\to\infty} X_n(\xi) = X(\xi)\big\}\Big) = P\Big(\lim_{n\to\infty} X_n = X\Big) = 1$$
or equivalently

$$\lim_{m\to\infty} P\Big(\sup_{n\ge m}|X_n - X| \ge \varepsilon\Big) = 0 \quad\text{for every } \varepsilon > 0,$$

then we say that the random sequence (Xn) converges almost surely (with probability 1) to the random variable X, and we write this as

$$\lim_{n\to\infty} X_n = X \quad\text{or}\quad X_n \xrightarrow{\text{a.s.}} X.$$

Convergence in Mean Square (m.s.)
Let (Xn) be a sequence of random variables. We say that the sequence converges in mean square to a random variable X if
$$\lim_{n\to\infty} E\Big[\big(X_n - X\big)^2\Big] = 0$$

holds, and we write this as

$$\underset{n\to\infty}{\mathrm{l.i.m.}}\; X_n = X \quad\text{or}\quad X_n \xrightarrow{\text{m.s.}} X,$$

where l.i.m. denotes the limit in the mean.

Convergence in Probability (p)
A sequence of random variables (Xn) is said to converge in probability to a random variable X if for every ε > 0 we have

$$\lim_{n\to\infty} P\big(|X_n - X| \ge \varepsilon\big) = 0,$$

and we write this as
$$\mathrm{p}\lim_{n\to\infty} X_n = X \quad\text{or}\quad X_n \xrightarrow{p} X,$$

where p lim denotes the limit in probability.

Convergence in Distribution (d)
Let (F_{X_n}) be the sequence of distribution functions of the sequence of random variables (Xn). Then (Xn) is said to converge in distribution (or in law) to a random variable X with distribution function F_X if

$$F_{X_n} \xrightarrow{n\to\infty} F_X$$

at all continuity points of F_X. Such a convergence is expressed by

$$X_n \xrightarrow{d} X \quad\text{or}\quad X_n \xrightarrow{L} X.$$
Relationships among the various types of convergence:
1) Convergence with probability 1 implies convergence in probability.
2) Convergence with probability 1 implies convergence in mean square, provided second order moments exist.
3) Convergence in mean square implies convergence in probability.
4) Convergence in probability implies convergence in distribution.
Exercise 1.11-1: Proof of statements 1) and 3)
1.11.2 Laws of Large Numbers
Chebyshev's Theorem (Weak Law of Large Numbers)
Let (Xk) be a sequence of pairwise uncorrelated random variables with

$$E(X_k) = \mu_k, \qquad \mathrm{Var}(X_k) = \sigma_k^2 \qquad\text{and}\qquad \lim_{n\to\infty}\frac{1}{n^2}\sum_{k=1}^{n}\sigma_k^2 = 0.$$

Then we have

$$\bar X_n \xrightarrow[n\to\infty]{\text{m.s.}} \mu,$$

where

$$\bar X_n = \frac{1}{n}\sum_{k=1}^{n}X_k \qquad\text{and}\qquad \lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\mu_k = \mu.$$
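A quick simulation of the weak law, assuming i.i.d. U(0,1) variables (μ = 0.5, σ² = 1/12): the empirical mean square error E((X̄n − μ)²) decays like σ²/n.

import numpy as np

rng = np.random.default_rng(5)
mu, var = 0.5, 1/12                      # mean and variance of U(0,1)

for n in (10, 100, 1000):
    means = rng.uniform(size=(10_000, n)).mean(axis=1)   # 10,000 realisations of the sample mean
    print(n, np.mean((means - mu)**2), var/n)            # empirical vs. theoretical var/n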
Exercise 1.11-2: Proof of Chebyshev's Theorem
Kolmogorov's Theorem (Strong Law of Large Numbers)
Let (Xk) be a sequence of independent and identically distributed (i.i.d.) random variables. Furthermore, the moments E(Xk) exist and are equal to μ. Then we have

$$\bar X_n \xrightarrow[n\to\infty]{\text{a.s.}} \mu, \qquad\text{where}\qquad \bar X_n = \frac{1}{n}\sum_{k=1}^{n}X_k.$$
1.11.3 Central Limit Theorems
Lindeberg–Levy's Theorem
Let (Xk) be a sequence of independent and identically distributed random variables, such that E(Xk) = μ and Var(Xk) = σ² ≠ 0. Then the distribution function of the random variable

$$Y_n = \sqrt{n}\,\frac{\bar X_n - \mu}{\sigma}$$

tends to that of a standardized normal distribution as n approaches infinity, i.e.

$$F_{Y_n}(y) \xrightarrow{n\to\infty} \Phi(y).$$
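A short simulation, assuming i.i.d. U(0,1) summands (μ = 0.5, σ = 1/√12): the empirical distribution function of Yn is compared with Φ(y) at a few points.

import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(6)
n, reps = 50, 200_000
mu, sigma = 0.5, sqrt(1/12)

y = sqrt(n) * (rng.uniform(size=(reps, n)).mean(axis=1) - mu) / sigma

Phi = lambda t: 0.5 * (1 + erf(t / sqrt(2)))   # standard normal distribution function
for t in (-1.0, 0.0, 1.0):
    print(t, np.mean(y <= t), Phi(t))          # empirical F_{Y_n}(t) vs. Phi(t)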
Exercise 1.11-3: Proof of Lindeberg–Levy's Theorem
Lyapunov's Theorem
Let (Xk) be a sequence of independent distributed random variables, such that E(Xk) = μk, Var(Xk) = σk² ≠ 0 and E|Xk − μk|³ = βk. Furthermore, let

$$B_n = \Big(\sum_{k=1}^{n}\beta_k\Big)^{1/3}, \qquad C_n = \Big(\sum_{k=1}^{n}\sigma_k^2\Big)^{1/2} \qquad\text{and}\qquad \lim_{n\to\infty} B_n/C_n = 0.$$

Then the distribution function of the random variable

$$Y_n = \frac{1}{C_n}\sum_{k=1}^{n}\big(X_k - \mu_k\big)$$

tends to that of a standardized normal distribution as n approaches infinity, i.e.

$$F_{Y_n}(y) \xrightarrow{n\to\infty} \Phi(y).$$