Principles of Communication, Prof. V. Venkata Rao, Indian Institute of Technology Madras

CHAPTER 2

Probability and Random Variables

2.1 Introduction

At the start of Sec. 1.1.2, we indicated that one of the possible ways

of classifying the signals is: deterministic or random. By random we mean

unpredictable; that is, in the case of a random signal, we cannot with certainty

predict its future value, even if the entire past history of the signal is known. If the

signal is of the deterministic type, no such uncertainty exists.

Consider the signal $x(t) = A \cos(2\pi f_1 t + \theta)$. If $A$, $\theta$ and $f_1$ are known, then (we are assuming them to be constants) we know the value of $x(t)$ for all $t$. ($A$, $\theta$ and $f_1$ can be calculated by observing the signal over a short period of time.)

Now, assume that $x(t)$ is the output of an oscillator with very poor frequency stability and calibration. Though it was set to produce a sinusoid of frequency $f = f_1$, the frequency actually put out may be $f_1'$, where $f_1' \in (f_1 \pm \Delta f_1)$.

Even this value may not remain constant and could vary with time. Then,

observing the output of such a source over a long period of time would not be of

much use in predicting the future values. We say that the source output varies in

a random manner.

Another example of a random signal is the voltage at the terminals of a

receiving antenna of a radio communication scheme. Even if the transmitted


(radio) signal is from a highly stable source, the voltage at the terminals of a

receiving antenna varies in an unpredictable fashion. This is because the

conditions of propagation of the radio waves are not under our control.

But randomness is the essence of communication. Communication

theory involves the assumption that the transmitter is connected to a source whose output the receiver is not able to predict with certainty. If the students know ahead of time what the teacher (source + transmitter) is going to say

(and what jokes he is going to crack), then there is no need for the students (the

receivers) to attend the class!

Although less obvious, it is also true that there is no communication

problem unless the transmitted signal is disturbed during propagation or

reception by unwanted (random) signals, usually termed noise and

interference. (We shall take up the statistical characterization of noise in

Chapter 3.)

However, quite a few random signals, though their exact behavior is

unpredictable, do exhibit statistical regularity. Consider again the reception of

radio signals propagating through the atmosphere. Though it would be difficult to

know the exact value of the voltage at the terminals of the receiving antenna at

any given instant, we do find that the average values of the antenna output over

two successive one minute intervals do not differ significantly. If the conditions of

propagation do not change very much, it would be true of any two averages (over

one minute) even if they are well spaced out in time. Consider an even simpler

experiment, namely, that of tossing an unbiased coin (by a person without any

magical powers). It is true that we do not know in advance whether the outcome

on a particular toss would be a head or tail (otherwise, we stop tossing the coin

at the start of a cricket match!). But, we know for sure that in a long sequence of

tosses, about half of the outcomes would be heads (If this does not happen, we

suspect either the coin or tosser (or both!)).


Statistical regularity of averages is an experimentally verifiable

phenomenon in many cases involving random quantities. Hence, we are tempted

to develop mathematical tools for the analysis and quantitative characterization

of random signals. To be able to analyze random signals, we need to understand

random variables. The resulting mathematical topics are: probability theory,

random variables and random (stochastic) processes. In this chapter, we shall

develop the probabilistic characterization of random variables. In chapter 3, we

shall extend these concepts to the characterization of random processes.

2.2 Basics of Probability

We shall introduce some of the basic concepts of probability theory by

defining some terminology relating to random experiments (i.e., experiments

whose outcomes are not predictable).

2.2.1 Terminology

Def. 2.1: Outcome

The end result of an experiment. For example, if the experiment consists of throwing a die, the outcome would be any one of the six faces, $F_1, \ldots, F_6$.

Def. 2.2: Random experiment

An experiment whose outcomes are not known in advance. (e.g. tossing a

coin, throwing a die, measuring the noise voltage at the terminals of a resistor

etc.)

Def. 2.3: Random event

A random event is an outcome or set of outcomes of a random experiment that share a common attribute. For example, considering the experiment of throwing a die, an event could be the 'face $F_1$' or 'even indexed faces' ($F_2, F_4, F_6$). We denote the events by upper case letters such as $A$, $B$ or $A_1, A_2, \ldots$


Def. 2.4: Sample space

The sample space of a random experiment is a mathematical abstraction

used to represent all possible outcomes of the experiment. We denote the

sample space by S .

Each outcome of the experiment is represented by a point in S and is

called a sample point. We use $s$ (with or without a subscript) to denote a sample

point. An event on the sample space is represented by an appropriate collection

of sample point(s).

Def. 2.5: Mutually exclusive (disjoint) events

Two events $A$ and $B$ are said to be mutually exclusive if they have no common elements (or outcomes). Hence, if $A$ and $B$ are mutually exclusive, they cannot occur together.

Def. 2.6: Union of events

The union of two events $A$ and $B$, denoted $A \cup B$ (also written as $(A + B)$ or '$A$ or $B$'), is the set of all outcomes which belong to $A$ or $B$ or both.

This concept can be generalized to the union of more than two events.

Def. 2.7: Intersection of events

The intersection of two events $A$ and $B$ is the set of all outcomes which belong to $A$ as well as $B$. The intersection of $A$ and $B$ is denoted by $(A \cap B)$ or simply $(AB)$. The intersection of $A$ and $B$ is also referred to as the joint event '$A$ and $B$'. This concept can be generalized to the case of intersection of three or more events.

Def. 2.8: Occurrence of an event

An event $A$ of a random experiment is said to have occurred if the

experiment terminates in an outcome that belongs to A .


Def. 2.9: Complement of an event

The complement of an event $A$, denoted by $\overline{A}$, is the event containing all points in $S$ but not in $A$.

Def. 2.10: Null event

The null event, denoted $\phi$, is an event with no sample points. Thus $\phi = \overline{S}$ (note that if $A$ and $B$ are disjoint events, then $AB = \phi$ and vice versa).

2.2.2 Probability of an Event

The probability of an event has been defined in several ways. Two of the

most popular definitions are: i) the relative frequency definition, and ii) the

classical definition.

Def. 2.11: The relative frequency definition

Suppose that a random experiment is repeated $n$ times. If the event $A$ occurs $n_A$ times, then the probability of $A$, denoted by $P(A)$, is defined as

$$P(A) = \lim_{n \to \infty} \left( \frac{n_A}{n} \right) \tag{2.1}$$

$\left(\dfrac{n_A}{n}\right)$ represents the fraction of occurrence of $A$ in $n$ trials.

For small values of $n$, it is likely that $\left(\dfrac{n_A}{n}\right)$ will fluctuate quite badly. But as $n$ becomes larger and larger, we expect $\left(\dfrac{n_A}{n}\right)$ to tend to a definite limiting value. For example, let the experiment be that of tossing a coin and $A$ the event 'outcome of a toss is Head'. If $n$ is of the order of 100, $\left(\dfrac{n_A}{n}\right)$ may not deviate from


$\dfrac{1}{2}$ by more than, say, ten percent, and as $n$ becomes larger and larger, we expect $\left(\dfrac{n_A}{n}\right)$ to converge to $\dfrac{1}{2}$.

Def. 2.12: The classical definition

The relative frequency definition given above has an empirical flavor. In the classical approach, the probability of the event $A$ is found without

experimentation. This is done by counting the total number $N$ of the possible outcomes of the experiment. If $N_A$ of those outcomes are favorable to the

occurrence of the event A , then

$$P(A) = \frac{N_A}{N} \tag{2.2}$$

where it is assumed that all outcomes are equally likely!

Whatever be the definition of probability, we require the probability

measure (to the various events on the sample space) to obey the following

postulates or axioms:

P1) $P(A) \ge 0$ (2.3a)

P2) $P(S) = 1$ (2.3b)

P3) If $AB = \phi$, then $P(A + B) = P(A) + P(B)$ (2.3c)

(Note that in Eq. 2.3(c), the symbol + is used to mean two different things;

namely, to denote the union of A and B and to denote the addition of two real

numbers). Using Eq. 2.3, it is possible for us to derive some additional

relationships:

i) If $AB \neq \phi$, then $P(A + B) = P(A) + P(B) - P(AB)$ (2.4)

ii) Let $A_1, A_2, \ldots, A_n$ be random events such that:

a) $A_i A_j = \phi$, for $i \neq j$, and (2.5a)


b) $A_1 + A_2 + \cdots + A_n = S$. (2.5b)

Then, $P(A) = P(A A_1) + P(A A_2) + \cdots + P(A A_n)$ (2.6)

where $A$ is any event on the sample space.

Note: $A_1, A_2, \ldots, A_n$ are said to be mutually exclusive (Eq. 2.5a) and exhaustive (Eq. 2.5b).

iii) $P(\overline{A}) = 1 - P(A)$ (2.7)

The derivation of Eq. 2.4, 2.6 and 2.7 is left as an exercise.

A very useful concept in probability theory is that of conditional probability, denoted $P(B|A)$; it represents the probability of $B$ occurring, given that $A$ has occurred. In a real world random experiment, it is quite likely that the occurrence of the event $B$ is very much influenced by the occurrence of the event $A$. To give a simple example, let a bowl contain 3 resistors and 1 capacitor. The occurrence of the event 'the capacitor on the second draw' is very much dependent on what has been drawn at the first instant. Such dependencies between the events are brought out using the notion of conditional probability.

The conditional probability $P(B|A)$ can be written in terms of the joint probability $P(AB)$ and the probability of the event $P(A)$. This relation can be arrived at by using either the relative frequency definition of probability or the classical definition. Using the former, we have

$$P(AB) = \lim_{n \to \infty} \left(\frac{n_{AB}}{n}\right)$$

$$P(A) = \lim_{n \to \infty} \left(\frac{n_A}{n}\right)$$


where $n_{AB}$ is the number of times $AB$ occurs in $n$ repetitions of the experiment. As $P(B|A)$ refers to the probability of $B$ occurring, given that $A$ has occurred, we have

Def. 2.13: Conditional probability

$$P(B|A) = \lim_{n \to \infty} \frac{n_{AB}}{n_A} = \lim_{n \to \infty} \left(\frac{n_{AB}/n}{n_A/n}\right) = \frac{P(AB)}{P(A)}, \quad P(A) \neq 0 \tag{2.8a}$$

or $P(AB) = P(B|A)\, P(A)$

Interchanging the role of A and B , we have

$$P(A|B) = \frac{P(AB)}{P(B)}, \quad P(B) \neq 0 \tag{2.8b}$$

Eq. 2.8(a) and 2.8(b) can be written as

$$P(AB) = P(B|A)\,P(A) = P(B)\,P(A|B) \tag{2.9}$$

In view of Eq. 2.9, we can also write Eq. 2.8(a) as

$$P(B|A) = \frac{P(B)\,P(A|B)}{P(A)}, \quad P(A) \neq 0 \tag{2.10a}$$

Similarly,

$$P(A|B) = \frac{P(A)\,P(B|A)}{P(B)}, \quad P(B) \neq 0 \tag{2.10b}$$

Eq. 2.10(a) or 2.10(b) is one form of Bayes’ rule or Bayes’ theorem.

Eq. 2.9 expresses the probability of the joint event $AB$ in terms of the conditional probability, say $P(B|A)$, and the (unconditional) probability $P(A)$. A similar relation can be derived for the joint probability of a joint event involving the intersection of three or more events. For example, $P(ABC)$ can be written as


$$P(ABC) = P(AB)\,P(C|AB) = P(A)\,P(B|A)\,P(C|AB) \tag{2.11}$$

Another useful probabilistic concept is that of statistical independence.

Suppose the events A and B are such that

$$P(B|A) = P(B) \tag{2.13}$$

That is, knowledge of the occurrence of $A$ tells us no more about the probability of occurrence of $B$ than we knew without that knowledge. Then, the events $A$ and $B$ are said to be statistically independent. Alternatively, if $A$ and $B$ satisfy Eq. 2.13, then

$$P(AB) = P(A)\,P(B) \tag{2.14}$$

Either Eq. 2.13 or 2.14 can be used to define the statistical independence

of two events. Note that if A and B are independent, then

$P(AB) = P(A)\,P(B)$, whereas if they are disjoint, then $P(AB) = 0$. The notion of statistical independence can be generalized to the case of more than two events. A set of $k$ events $A_1, A_2, \ldots, A_k$ are said to be statistically independent if and only if (iff) the probability of every intersection of $k$ or fewer events equals the product of the probabilities of its constituents. Thus three events $A$, $B$, $C$ are independent when

Exercise 2.1

Let $A_1, A_2, \ldots, A_n$ be $n$ mutually exclusive and exhaustive events and $B$ another event defined on the same sample space. Show that

$$P(A_j|B) = \frac{P(B|A_j)\,P(A_j)}{\sum\limits_{i=1}^{n} P(B|A_i)\,P(A_i)} \tag{2.12}$$

Eq. 2.12 represents another form of Bayes' theorem.


$$P(AB) = P(A)\,P(B)$$

$$P(AC) = P(A)\,P(C)$$

$$P(BC) = P(B)\,P(C)$$

and $P(ABC) = P(A)\,P(B)\,P(C)$

We shall illustrate some of the concepts introduced above with the help of

two examples.

Example 2.1

Priya (P1) and Prasanna (P2), after seeing each other for some time (and

after a few tiffs) decide to get married, much against the wishes of the parents on

both sides. They agree to meet at the office of the registrar of marriages at 11:30

a.m. on the ensuing Friday (looks like they are not aware of Rahu Kalam or they

don’t care about it).

However, both are somewhat lacking in punctuality and their arrival times

are equally likely to be anywhere in the interval 11 to 12 hrs on that day. Also, the arrival of one person is independent of that of the other. Unfortunately, both are also

very short tempered and will wait only 10 min. before leaving in a huff never to

meet again.

a) Picture the sample space

b) Let the event A stand for “P1 and P2 meet”. Mark this event on the sample

space.

c) Find the probability that the lovebirds will get married and (hopefully) will

live happily ever after.

a) The sample space is the rectangle, shown in Fig. 2.1(a).


Fig. 2.1(a): S of Example 2.1

b) The diagonal OP represents the simultaneous arrival of Priya and

Prasanna. Assuming that P1 arrives at 11:$x$, a meeting between P1 and P2 would take place if P2 arrives within the interval $a$ to $b$, as shown in the

figure. The event A , indicating the possibility of P1 and P2 meeting, is

shown in Fig. 2.1(b).

Fig. 2.1(b): The event A of Example 2.1


c) Probability of marriage $= \dfrac{\text{Shaded area}}{\text{Total area}} = \dfrac{11}{36}$
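The answer $\frac{11}{36} \approx 0.306$ can be cross-checked by simulation: the arrival times are independent and uniform over the hour, and a meeting occurs only if they differ by at most 10 minutes. A minimal Monte Carlo sketch (illustrative, not from the original notes):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
N = 1_000_000
x = rng.uniform(0, 60, N)     # P1's arrival, minutes past 11:00
y = rng.uniform(0, 60, N)     # P2's arrival, minutes past 11:00

meet = np.abs(x - y) <= 10    # they meet iff arrivals differ by <= 10 min
print(meet.mean(), 11 / 36)   # both close to 0.3056
```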

Example 2.2

Let two honest coins, marked 1 and 2, be tossed together. The four possible outcomes are $T_1 T_2$, $T_1 H_2$, $H_1 T_2$, $H_1 H_2$. ($T_1$ indicates toss of coin 1 resulting in tails; similarly $T_2$ etc.) We shall treat all these outcomes as equally likely; that is, the probability of occurrence of any of these four outcomes is $\frac{1}{4}$. (Treating each of these outcomes as an event, we find that these events are mutually exclusive and exhaustive.) Let the event $A$ be 'not $H_1 H_2$' and $B$ be the event 'match'. (Match comprises the two outcomes $T_1 T_2$, $H_1 H_2$.) Find $P(B|A)$. Are $A$ and $B$ independent?

We know that $P(B|A) = \dfrac{P(AB)}{P(A)}$.

$AB$ is the event 'not $H_1 H_2$' and 'match'; i.e., it represents the outcome $T_1 T_2$. Hence $P(AB) = \dfrac{1}{4}$. The event $A$ comprises the outcomes $T_1 T_2$, $T_1 H_2$ and $H_1 T_2$; therefore,

$$P(A) = \frac{3}{4}$$

$$P(B|A) = \frac{1/4}{3/4} = \frac{1}{3}$$

Intuitively, the result $P(B|A) = \frac{1}{3}$ is satisfying because, given 'not $H_1 H_2$', the toss would have resulted in any one of the three other outcomes, which can be


treated as equally likely, with probability $\frac{1}{3}$ each. This implies that the outcome $T_1 T_2$, given 'not $H_1 H_2$', has a probability of $\frac{1}{3}$.

As $P(B) = \frac{1}{2}$ and $P(B|A) = \frac{1}{3}$, $A$ and $B$ are dependent events.
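Since the sample space here has only four points, $P(B|A)$ can also be obtained by brute-force enumeration. A small illustrative sketch (not part of the original notes):

```python
from itertools import product

outcomes = list(product("HT", repeat=2))       # ('H','H'), ('H','T'), ...
A = [s for s in outcomes if s != ("H", "H")]   # 'not H1H2'
B = [s for s in outcomes if s[0] == s[1]]      # 'match': T1T2 or H1H2
AB = [s for s in A if s in B]

P = lambda E: len(E) / len(outcomes)           # all outcomes equally likely
print(P(AB) / P(A))   # P(B|A) = (1/4)/(3/4) = 1/3
print(P(B))           # P(B) = 1/2 != 1/3, so A and B are dependent
```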

2.3 Random Variables

Let us introduce a transformation or function, say $X$, whose domain is the sample space (of a random experiment) and whose range is in the real line; that is, to each $s_i \in S$, $X$ assigns a real number, $X(s_i)$, as shown in Fig. 2.2.

Fig. 2.2: A mapping $X(\cdot)$ from $S$ to the real line

The figure shows the transformation of a few individual sample points as well as the transformation of the event $A$, which falls on the real line segment $[a_1, a_2]$.

2.3.1 Distribution function

Taking a specific case, let the random experiment be that of throwing a die. The six faces of the die can be treated as the six sample points in $S$; that is,


$F_i = s_i$, $i = 1, 2, \ldots, 6$. Let $X(s_i) = i$. Once the transformation is induced, then the events on the sample space become transformed into appropriate segments of the real line. Then we can enquire into probabilities such as

$$P\left[\{s : X(s) < a_1\}\right]$$

$$P\left[\{s : b_1 < X(s) \le b_2\}\right]$$

or

$$P\left[\{s : X(s) = c\}\right]$$

These and other probabilities can be arrived at, provided we know the distribution function of $X$, denoted by $F_X(\cdot)$, which is given by

$$F_X(x) = P\left[\{s : X(s) \le x\}\right] \tag{2.15}$$

That is, $F_X(x)$ is the probability of the event comprising all those sample points which are transformed by $X$ into real numbers less than or equal to $x$. (Note that, for convenience, we use $x$ as the argument of $F_X(\cdot)$. But it could be any other symbol and we may use $F_X(\alpha)$, $F_X(a_1)$ etc.) Evidently, $F_X(\cdot)$ is a function whose domain is the real line and whose range is the interval $[0, 1]$.

As an example of a distribution function (also called the Cumulative Distribution Function, CDF), let $S$ consist of four sample points, $s_1$ to $s_4$, with each sample point representing an event with the probabilities $P(s_1) = \frac{1}{4}$, $P(s_2) = \frac{1}{8}$, $P(s_3) = \frac{1}{8}$ and $P(s_4) = \frac{1}{2}$. If $X(s_i) = i - 1.5$, $i = 1, 2, 3, 4$, then the distribution function $F_X(x)$ will be as shown in Fig. 2.3.


Fig. 2.3: An example of a distribution function

$F_X(\cdot)$ satisfies the following properties:

i) $F_X(x) \ge 0, \ -\infty < x < \infty$

ii) $F_X(-\infty) = 0$

iii) $F_X(\infty) = 1$

iv) If $a > b$, then $F_X(a) - F_X(b) = P\left[\{s : b < X(s) \le a\}\right]$

v) If $a > b$, then $F_X(a) \ge F_X(b)$

The first three properties follow from the fact that $F_X(\cdot)$ represents a probability and $P(S) = 1$. Properties iv) and v) follow from the fact that

$$\{s : X(s) \le b\} \cup \{s : b < X(s) \le a\} = \{s : X(s) \le a\}$$

Referring to Fig. 2.3, note that $F_X(x) = 0$ for $x < -0.5$, whereas $F_X(-0.5) = \frac{1}{4}$. In other words, there is a discontinuity of $\frac{1}{4}$ at the point $x = -0.5$. In general, there is a discontinuity in $F_X$ of magnitude $P_a$ at a point $x = a$, if and only if

$$P\left[\{s : X(s) = a\}\right] = P_a \tag{2.16}$$
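For a discrete random variable such as the one of Fig. 2.3, $F_X(x)$ is simply the running sum of the point probabilities at or below $x$, which makes the staircase and its jumps easy to compute. A minimal sketch (illustrative; the numbers are those of the example above):

```python
import numpy as np

values = np.array([-0.5, 0.5, 1.5, 2.5])   # X(s_i) = i - 1.5
probs  = np.array([1/4, 1/8, 1/8, 1/2])    # P(s_1) ... P(s_4)

def F_X(x):
    """Right-continuous staircase CDF: total mass at points <= x."""
    return probs[values <= x].sum()

for x in (-1.0, -0.5, 0.0, 0.5, 2.5):
    print(x, F_X(x))       # jump of 1/4 at x = -0.5; F_X(2.5) = 1
```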


The properties of the distribution function are summarized by saying that $F_X(\cdot)$ is monotonically non-decreasing, is continuous to the right at each point $x$ (see footnote 1), and has a step of size $P_a$ at the point $a$ if and only if Eq. 2.16 is satisfied.

Functions such as $X(\cdot)$ for which distribution functions exist are called Random Variables (RV). In other words, for any real $x$, $\{s : X(s) \le x\}$ should be an event in the sample space with some assigned probability. (The term "random variable" is somewhat misleading because an RV is a well defined function from $S$ into the real line.) However, every transformation from $S$ into

the real line need not be a random variable. For example, let $S$ consist of six sample points, $s_1$ to $s_6$. The only events that have been identified on the sample space are: $A = \{s_1, s_2\}$, $B = \{s_3, s_4, s_5\}$ and $C = \{s_6\}$, and their probabilities are $P(A) = \frac{2}{6}$, $P(B) = \frac{1}{2}$ and $P(C) = \frac{1}{6}$. We see that the probabilities for the various unions and intersections of $A$, $B$ and $C$ can be obtained.

Let the transformation $X$ be $X(s_i) = i$. Then the distribution function fails to exist because $P\left[s : 3.5 < X(s) \le 4.5\right] = P(s_4)$ is not known, as $s_4$ is not an event on the sample space.

Footnote 1: Let $x = a$. Consider, with $\Delta > 0$,

$$\lim_{\Delta \to 0} P\left[a < X(s) \le a + \Delta\right] = \lim_{\Delta \to 0}\left[F_X(a + \Delta) - F_X(a)\right]$$

We intuitively feel that as $\Delta \to 0$, the limit of the set $\{s : a < X(s) \le a + \Delta\}$ is the null set, and this can be proved to be so. Hence, $F_X(a^+) - F_X(a) = 0$, where $a^+ = \lim_{\Delta \to 0} (a + \Delta)$. That is, $F_X(x)$ is continuous to the right.


2.3.2 Probability density function

Though the CDF is a very fundamental concept, in practice we find it more convenient to deal with the Probability Density Function (PDF). The PDF, $f_X(x)$, is defined as the derivative of the CDF; that is,

$$f_X(x) = \frac{d\,F_X(x)}{dx} \tag{2.17a}$$

or

$$F_X(x) = \int_{-\infty}^{x} f_X(\alpha)\, d\alpha \tag{2.17b}$$

The distribution function may fail to have a continuous derivative at a point $x = a$ for one of two reasons:

i) the slope of $F_X(x)$ is discontinuous at $x = a$

ii) $F_X(x)$ has a step discontinuity at $x = a$

The situation is illustrated in Fig. 2.4.

Exercise 2.2

Let $S$ be a sample space with six sample points, $s_1$ to $s_6$. The events identified on $S$ are the same as above, namely, $A = \{s_1, s_2\}$, $B = \{s_3, s_4, s_5\}$ and $C = \{s_6\}$, with $P(A) = \frac{1}{3}$, $P(B) = \frac{1}{2}$ and $P(C) = \frac{1}{6}$. Let $Y(\cdot)$ be the transformation,

$$Y(s_i) = \begin{cases} 1, & i = 1, 2 \\ 2, & i = 3, 4, 5 \\ 3, & i = 6 \end{cases}$$

Show that $Y(\cdot)$ is a random variable by finding $F_Y(y)$. Sketch $F_Y(y)$.


Fig. 2.4: A CDF without a continuous derivative

As can be seen from the figure, $F_X(x)$ has a discontinuous slope at $x = 1$ and a step discontinuity at $x = 2$. In the first case, we resolve the ambiguity by taking $f_X$ to be the derivative on the right. (Note that $F_X(x)$ is continuous to the right.) The second case is taken care of by introducing the impulse in the probability domain. That is, if there is a discontinuity in $F_X$ at $x = a$ of magnitude $P_a$, we include an impulse $P_a\, \delta(x - a)$ in the PDF. For example, for the CDF shown in Fig. 2.3, the PDF will be

$$f_X(x) = \frac{1}{4}\,\delta\!\left(x + \frac{1}{2}\right) + \frac{1}{8}\,\delta\!\left(x - \frac{1}{2}\right) + \frac{1}{8}\,\delta\!\left(x - \frac{3}{2}\right) + \frac{1}{2}\,\delta\!\left(x - \frac{5}{2}\right) \tag{2.18}$$

In Eq. 2.18, $f_X(x)$ has an impulse of weight $\frac{1}{8}$ at $x = \frac{1}{2}$ as $P\left[X = \frac{1}{2}\right] = \frac{1}{8}$. This impulse function cannot be taken as the limiting case of an even function (such as $\frac{1}{\varepsilon}\, ga\!\left(\frac{x}{\varepsilon}\right)$) because

$$\lim_{\varepsilon \to 0} \int_{\frac{1}{2} - \varepsilon}^{\frac{1}{2}} f_X(x)\, dx = \lim_{\varepsilon \to 0} \int_{\frac{1}{2} - \varepsilon}^{\frac{1}{2}} \frac{1}{8}\,\delta\!\left(x - \frac{1}{2}\right) dx \neq \frac{1}{16}$$


However, $\lim\limits_{\varepsilon \to 0} \int_{\frac{1}{2} - \varepsilon}^{\frac{1}{2}} f_X(x)\, dx = \frac{1}{8}$. This ensures

$$F_X(x) = \begin{cases} \dfrac{2}{8}, & -\dfrac{1}{2} \le x < \dfrac{1}{2} \\[2mm] \dfrac{3}{8}, & \dfrac{1}{2} \le x < \dfrac{3}{2} \end{cases}$$

Such an impulse is referred to as the left-sided delta function.

As $F_X$ is non-decreasing and $F_X(\infty) = 1$, we have

i) $f_X(x) \ge 0$ (2.19a)

ii) $\int_{-\infty}^{\infty} f_X(x)\, dx = 1$ (2.19b)

Based on the behavior of the CDF, a random variable can be classified as: i) continuous, ii) discrete and iii) mixed. If the CDF, $F_X(x)$, is a continuous function of $x$ for all $x$, then $X$ (see footnote 2) is a continuous random variable. If $F_X(x)$ is a staircase, then $X$ corresponds to a discrete variable. We say that $X$ is a mixed random variable if $F_X(x)$ is discontinuous but not a staircase. Later on in this lesson, we shall discuss some examples of these three types of variables.

We can induce more than one transformation on a given sample space. If

we induce k such transformations, then we have a set of k co-existing random

variables.

2.3.3 Joint distribution and density functions

Consider the case of two random variables, say $X$ and $Y$. They can be characterized by the (two-dimensional) joint distribution function, given by

$$F_{X,Y}(x, y) = P\left[\{s : X(s) \le x,\ Y(s) \le y\}\right] \tag{2.20}$$

Footnote 2: As the domain of the random variable $X(\cdot)$ is known, it is convenient to denote the variable simply by $X$.


That is, $F_{X,Y}(x, y)$ is the probability associated with the set of all those sample points such that under $X$, their transformed values will be less than or equal to $x$ and, at the same time, under $Y$, the transformed values will be less than or equal to $y$. In other words, $F_{X,Y}(x_1, y_1)$ is the probability associated with the set of all sample points whose transformation does not fall outside the shaded region in the two dimensional (Euclidean) space shown in Fig. 2.5.

Fig. 2.5: Space of $(X(s), Y(s))$ corresponding to $F_{X,Y}(x_1, y_1)$

Looking at the sample space $S$, let $A$ be the set of all those sample points $s \in S$ such that $X(s) \le x_1$. Similarly, if $B$ comprises all those sample points $s \in S$ such that $Y(s) \le y_1$, then $F_{X,Y}(x_1, y_1)$ is the probability associated with the event $AB$.

Properties of the two dimensional distribution function are:

i) $F_{X,Y}(x, y) \ge 0, \ -\infty < x < \infty, \ -\infty < y < \infty$

ii) $F_{X,Y}(-\infty, y) = F_{X,Y}(x, -\infty) = 0$

iii) $F_{X,Y}(\infty, \infty) = 1$

iv) $F_{X,Y}(\infty, y) = F_Y(y)$

v) $F_{X,Y}(x, \infty) = F_X(x)$


vi) If $x_2 > x_1$ and $y_2 > y_1$, then

$$F_{X,Y}(x_2, y_2) \ge F_{X,Y}(x_2, y_1) \ge F_{X,Y}(x_1, y_1)$$

We define the two dimensional joint PDF as

$$f_{X,Y}(x, y) = \frac{\partial^2}{\partial x\, \partial y}\, F_{X,Y}(x, y) \tag{2.21a}$$

or

$$F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(\alpha, \beta)\, d\alpha\, d\beta \tag{2.21b}$$

The notion of joint CDF and joint PDF can be extended to the case of $k$ random variables, where $k \ge 3$.

Given the joint PDF of random variables $X$ and $Y$, it is possible to obtain the one dimensional PDFs, $f_X(x)$ and $f_Y(y)$. We know that

$$F_{X,Y}(x_1, \infty) = F_X(x_1)$$

That is,

$$F_X(x_1) = \int_{-\infty}^{x_1} \left[\int_{-\infty}^{\infty} f_{X,Y}(\alpha, \beta)\, d\beta\right] d\alpha$$

$$f_X(x_1) = \frac{d}{dx_1}\left\{\int_{-\infty}^{x_1} \left[\int_{-\infty}^{\infty} f_{X,Y}(\alpha, \beta)\, d\beta\right] d\alpha\right\} \tag{2.22}$$

Eq. 2.22 involves the derivative of an integral. Hence,

$$f_X(x_1) = \left.\frac{d\,F_X(x)}{dx}\right|_{x = x_1} = \int_{-\infty}^{\infty} f_{X,Y}(x_1, \beta)\, d\beta$$

or

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy \tag{2.23a}$$

Similarly,

$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx \tag{2.23b}$$

(In the study of several random variables, the statistics of any individual variable are called marginal. Hence it is common to refer to $F_X(x)$ as the marginal distribution function of $X$ and $f_X(x)$ as the marginal density function. $F_Y(y)$ and $f_Y(y)$ are similarly referred to.)

2.3.4 Conditional density

Given $f_{X,Y}(x, y)$, we know how to obtain $f_X(x)$ or $f_Y(y)$. We might also be interested in the PDF of $X$ given a specific value of $y = y_1$. This is called the conditional PDF of $X$, given $y = y_1$, denoted $f_{X|Y}(x|y_1)$ and defined as

$$f_{X|Y}(x|y_1) = \frac{f_{X,Y}(x, y_1)}{f_Y(y_1)} \tag{2.24}$$

where it is assumed that $f_Y(y_1) \neq 0$. Once we understand the meaning of conditional PDF, we might as well drop the subscript on $y$ and denote it by $f_{X|Y}(x|y)$. An analogous relation can be defined for $f_{Y|X}(y|x)$. That is, we have the pair of equations,

$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} \tag{2.25a}$$

and

$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)} \tag{2.25b}$$

or

$$f_{X,Y}(x, y) = f_{X|Y}(x|y)\, f_Y(y) \tag{2.25c}$$

$$\qquad\qquad\ \ = f_{Y|X}(y|x)\, f_X(x) \tag{2.25d}$$

The function $f_{X|Y}(x|y)$ may be thought of as a function of the variable $x$ with the variable $y$ arbitrary, but fixed. Accordingly, it satisfies all the requirements of an ordinary PDF; that is,

$$f_{X|Y}(x|y) \ge 0 \tag{2.26a}$$

and

$$\int_{-\infty}^{\infty} f_{X|Y}(x|y)\, dx = 1 \tag{2.26b}$$


2.3.5 Statistical independence

In the case of random variables, the definition of statistical independence is somewhat simpler than in the case of events. We call $k$ random variables $X_1, X_2, \ldots, X_k$ statistically independent iff the $k$-dimensional joint PDF factors into the product

$$f_{X_1, \ldots, X_k}(x_1, \ldots, x_k) = \prod_{i=1}^{k} f_{X_i}(x_i) \tag{2.27}$$

Hence, two random variables $X$ and $Y$ are statistically independent iff

$$f_{X,Y}(x, y) = f_X(x)\, f_Y(y) \tag{2.28}$$

and three random variables $X$, $Y$, $Z$ are independent iff

$$f_{X,Y,Z}(x, y, z) = f_X(x)\, f_Y(y)\, f_Z(z) \tag{2.29}$$

Statistical independence can also be expressed in terms of the conditional PDF. Let $X$ and $Y$ be independent. Then,

$$f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$$

Making use of Eq. 2.25(c), we have

$$f_{X|Y}(x|y)\, f_Y(y) = f_X(x)\, f_Y(y)$$

or

$$f_{X|Y}(x|y) = f_X(x) \tag{2.30a}$$

Similarly,

$$f_{Y|X}(y|x) = f_Y(y) \tag{2.30b}$$

Eq. 2.30(a) and 2.30(b) are alternate expressions for the statistical independence

between X and Y . We shall now give a few examples to illustrate the concepts

discussed in sec. 2.3.

Example 2.3

A random variable $X$ has

$$F_X(x) = \begin{cases} 0, & x < 0 \\ K x^2, & 0 \le x \le 10 \\ 100 K, & x > 10 \end{cases}$$

i) Find the constant $K$.

ii) Evaluate $P(X \le 5)$ and $P(5 < X \le 7)$.


iii) What is $f_X(x)$?

i) $F_X(\infty) = 100 K = 1 \Rightarrow K = \dfrac{1}{100}$.

ii) $P(X \le 5) = F_X(5) = \dfrac{1}{100} \times 25 = 0.25$

$P(5 < X \le 7) = F_X(7) - F_X(5) = 0.24$

iii) $f_X(x) = \dfrac{d\,F_X(x)}{dx} = \begin{cases} 0, & x < 0 \\ 0.02\, x, & 0 \le x \le 10 \\ 0, & x > 10 \end{cases}$

Note: The specification of $f_X(x)$ or $F_X(x)$ is complete only when the algebraic expression as well as the range of $X$ is given.
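The numbers in Example 2.3 are easily confirmed by coding the CDF directly. A minimal sketch (illustrative):

```python
def F_X(x, K=1/100):
    if x < 0:
        return 0.0
    if x <= 10:
        return K * x * x
    return 100 * K

print(F_X(5))            # P(X <= 5) = 0.25
print(F_X(7) - F_X(5))   # P(5 < X <= 7) = 0.49 - 0.25 = 0.24
```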

Example 2.4

Consider the random variable $X$ defined by the PDF

$$f_X(x) = a\, e^{-b|x|}, \quad -\infty < x < \infty$$

where $a$ and $b$ are positive constants.

i) Determine the relation between $a$ and $b$ so that $f_X(x)$ is a PDF.

ii) Determine the corresponding $F_X(x)$.

iii) Find $P[1 < X \le 2]$.

i) As can be seen, $f_X(x) \ge 0$ for $-\infty < x < \infty$. In order for $f_X(x)$ to represent a legitimate PDF, we require

$$\int_{-\infty}^{\infty} a\, e^{-b|x|}\, dx = 2 \int_{0}^{\infty} a\, e^{-b x}\, dx = 1$$

That is, $\int_{0}^{\infty} a\, e^{-b x}\, dx = \frac{1}{2}$; hence $b = 2a$.

ii) The given PDF can be written as

$$f_X(x) = \begin{cases} a\, e^{b x}, & x < 0 \\ a\, e^{-b x}, & x \ge 0 \end{cases}$$


For $x < 0$, we have

$$F_X(x) = \int_{-\infty}^{x} \frac{b}{2}\, e^{b \alpha}\, d\alpha = \frac{1}{2}\, e^{b x}$$

Consider $x > 0$. Take a specific value, $x = 2$:

$$F_X(2) = \int_{-\infty}^{2} f_X(x)\, dx = 1 - \int_{2}^{\infty} f_X(x)\, dx$$

But for the problem on hand, $\int_{2}^{\infty} f_X(x)\, dx = \int_{-\infty}^{-2} f_X(x)\, dx$.

Therefore, $F_X(x) = 1 - \frac{1}{2}\, e^{-b x}, \ x > 0$.

We can now write the complete expression for the CDF as

$$F_X(x) = \begin{cases} \dfrac{1}{2}\, e^{b x}, & x < 0 \\[2mm] 1 - \dfrac{1}{2}\, e^{-b x}, & x \ge 0 \end{cases}$$

iii) $P[1 < X \le 2] = F_X(2) - F_X(1) = \dfrac{1}{2}\left(e^{-b} - e^{-2b}\right)$
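As a numerical cross-check of Example 2.4 (an illustrative sketch, not from the original notes, with the specific choice $b = 1$, so $a = \frac{1}{2}$), one can integrate the PDF on a grid:

```python
import numpy as np

b = 1.0
a = b / 2                        # normalization found in part (i)
x, dx = np.linspace(-50, 50, 200_001, retstep=True)
f = a * np.exp(-b * np.abs(x))

print((f * dx).sum())            # ~1.0: the PDF integrates to unity
mask = (x > 1) & (x <= 2)
print((f[mask] * dx).sum())                  # numerical P[1 < X <= 2]
print(0.5 * (np.exp(-b) - np.exp(-2 * b)))   # closed form, ~0.1163
```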

Example 2.5

Let

$$f_{X,Y}(x, y) = \begin{cases} \dfrac{1}{2}, & 0 \le x \le y, \ 0 \le y \le 2 \\[1mm] 0, & \text{otherwise} \end{cases}$$

Find (a) i) $f_{Y|X}(y|1)$ and ii) $f_{Y|X}(y|1.5)$

(b) Are $X$ and $Y$ independent?

(a) $f_{Y|X}(y|x) = \dfrac{f_{X,Y}(x, y)}{f_X(x)}$


$f_X(x)$ can be obtained from $f_{X,Y}$ by integrating out the unwanted variable $y$ over the appropriate range. The maximum value taken by $y$ is 2; in addition, for any given $x$, $y \ge x$. Hence,

$$f_X(x) = \int_{x}^{2} \frac{1}{2}\, dy = 1 - \frac{x}{2}, \quad 0 \le x \le 2$$

Hence,

(i) $f_{Y|X}(y|1) = \dfrac{1/2}{1 - 1/2} = \begin{cases} 1, & 1 \le y \le 2 \\ 0, & \text{otherwise} \end{cases}$

(ii) $f_{Y|X}(y|1.5) = \dfrac{1/2}{1/4} = \begin{cases} 2, & 1.5 \le y \le 2 \\ 0, & \text{otherwise} \end{cases}$

b) The dependence between the random variables $X$ and $Y$ is evident from the statement of the problem because, given a value of $X = x_1$, $Y$ should be greater than or equal to $x_1$ for the joint PDF to be non-zero. Also, we see that $f_{Y|X}(y|x)$ depends on $x$, whereas if $X$ and $Y$ were to be independent, then $f_{Y|X}(y|x) = f_Y(y)$.


Exercise 2.3

For the two random variables $X$ and $Y$, the following density functions have been given (Fig. 2.6).

Fig. 2.6: PDFs for the exercise 2.3

Find

a) $f_{X,Y}(x, y)$

b) Show that

$$f_Y(y) = \begin{cases} \dfrac{y}{100}, & 0 \le y \le 10 \\[2mm] \dfrac{1}{5} - \dfrac{y}{100}, & 10 < y \le 20 \end{cases}$$

2.4 Transformation of Variables

The process of communication involves various operations such as

modulation, detection, filtering etc. In performing these operations, we typically

generate new random variables from the given ones. We will now develop the

necessary theory for the statistical characterization of the output random

variables, given the transformation and the input random variables.


2.4.1 Functions of one random variable

Assume that $X$ is the given random variable of the continuous type with the PDF $f_X(x)$. Let $Y$ be the new random variable, obtained from $X$ by the transformation $Y = g(X)$. By this we mean the number $Y(s_1)$ associated with any sample point $s_1$ is

$$Y(s_1) = g\!\left(X(s_1)\right)$$

Our interest is to obtain $f_Y(y)$. This can be obtained with the help of the following theorem (Thm. 2.1).

Theorem 2.1

Let $Y = g(X)$. To find $f_Y(y)$, for the specific $y$, solve the equation $y = g(x)$. Denoting its real roots by $x_1, \ldots, x_n, \ldots$, that is, $y = g(x_1) = \cdots = g(x_n) = \cdots$, we will show that

$$f_Y(y) = \frac{f_X(x_1)}{|g'(x_1)|} + \cdots + \frac{f_X(x_n)}{|g'(x_n)|} + \cdots \tag{2.31}$$

where $g'(x)$ is the derivative of $g(x)$.

Proof: Consider the transformation shown in Fig. 2.7. We see that the equation $y_1 = g(x)$ has three roots, namely, $x_1$, $x_2$ and $x_3$.


Fig. 2.7: $X$-$Y$ transformation used in the proof of Theorem 2.1

We know that $f_Y(y)\, dy = P[y < Y \le y + dy]$. Therefore, for any given $y_1$, we need to find the set of values $x$ such that $y_1 < g(x) \le y_1 + dy$ and the probability that $X$ is in this set. As we see from the figure, this set consists of the following intervals:

$$x_1 < x \le x_1 + dx_1, \quad x_2 + dx_2 < x \le x_2, \quad x_3 < x \le x_3 + dx_3$$

where $dx_1 > 0$, $dx_3 > 0$, but $dx_2 < 0$.

From the above, it follows that

$$P[y_1 < Y \le y_1 + dy] = P[x_1 < X \le x_1 + dx_1] + P[x_2 + dx_2 < X \le x_2] + P[x_3 < X \le x_3 + dx_3]$$

This probability is the shaded area in Fig. 2.7.

$$P[x_1 < X \le x_1 + dx_1] = f_X(x_1)\, dx_1, \quad dx_1 = \frac{dy}{g'(x_1)}$$

$$P[x_2 + dx_2 < X \le x_2] = f_X(x_2)\, |dx_2|, \quad dx_2 = \frac{dy}{g'(x_2)}$$

$$P[x_3 < X \le x_3 + dx_3] = f_X(x_3)\, dx_3, \quad dx_3 = \frac{dy}{g'(x_3)}$$


We conclude that

$$f_Y(y)\, dy = \frac{f_X(x_1)}{|g'(x_1)|}\, dy + \frac{f_X(x_2)}{|g'(x_2)|}\, dy + \frac{f_X(x_3)}{|g'(x_3)|}\, dy \tag{2.32}$$

and Eq. 2.31 follows by canceling $dy$ in Eq. 2.32.

Note that if $g(x) = y_1$, a constant, for every $x$ in the interval $(x_0, x_1)$, then we have $P[Y = y_1] = P[x_0 < X \le x_1] = F_X(x_1) - F_X(x_0)$; that is, $F_Y(y)$ is discontinuous at $y = y_1$. Hence $f_Y(y)$ contains an impulse, $\delta(y - y_1)$, of area $F_X(x_1) - F_X(x_0)$.

We shall take up a few examples.

Example 2.6

$Y = g(X) = X + a$, where $a$ is a constant. Let us find $f_Y(y)$.

We have $g'(x) = 1$ and $x = y - a$. For a given $y$, there is a unique $x$ satisfying the above transformation. Hence $f_Y(y)\, dy = \dfrac{f_X(x)}{|g'(x)|}\, dy$, and as $g'(x) = 1$, we have

$$f_Y(y) = f_X(y - a)$$

Let

$$f_X(x) = \begin{cases} 1 - |x|, & |x| \le 1 \\ 0, & \text{elsewhere} \end{cases}$$

and $a = -1$.

Then $f_Y(y) = 1 - |y + 1|$.

As $y = x - 1$ and $x$ ranges over the interval $(-1, 1)$, we have the range of $y$ as $(-2, 0)$.


Hence

$$f_Y(y) = \begin{cases} 1 - |y + 1|, & -2 \le y \le 0 \\ 0, & \text{elsewhere} \end{cases}$$

$f_X(x)$ and $f_Y(y)$ are shown in Fig. 2.8.

Fig. 2.8: $f_X(x)$ and $f_Y(y)$ of Example 2.6

As can be seen, the transformation of adding a constant to the given variable

simply results in the translation of its PDF.

Example 2.7

Let $Y = b X$, where $b$ is a constant. Let us find $f_Y(y)$.

Solving for $X$, we have $X = \dfrac{1}{b}\, Y$. Again, for a given $y$, there is a unique $x$. As $g'(x) = b$, we have

$$f_Y(y) = \frac{1}{|b|}\, f_X\!\left(\frac{y}{b}\right)$$

Let

$$f_X(x) = \begin{cases} 1 - \dfrac{x}{2}, & 0 \le x \le 2 \\[1mm] 0, & \text{otherwise} \end{cases}$$

and $b = -2$; then


$$f_Y(y) = \begin{cases} \dfrac{1}{2}\left[1 + \dfrac{y}{4}\right], & -4 \le y \le 0 \\[1mm] 0, & \text{otherwise} \end{cases}$$

$f_X(x)$ and $f_Y(y)$ are sketched in Fig. 2.9.

Fig. 2.9: $f_X(x)$ and $f_Y(y)$ of Example 2.7

Example 2.8

$Y = a X^2$, $a > 0$. Let us find $f_Y(y)$.

Exercise 2.4

Let $Y = a X + b$, where $a$ and $b$ are constants. Show that

$$f_Y(y) = \frac{1}{|a|}\, f_X\!\left(\frac{y - b}{a}\right)$$

If $f_X(x)$ is as shown in Fig. 2.8, compute and sketch $f_Y(y)$ for $a = -2$ and $b = 1$.


$g'(x) = 2 a x$. If $y < 0$, then the equation $y = a x^2$ has no real solution. Hence $f_Y(y) = 0$ for $y < 0$. If $y \ge 0$, then it has two solutions, $x_1 = \sqrt{\dfrac{y}{a}}$ and $x_2 = -\sqrt{\dfrac{y}{a}}$, and Eq. 2.31 yields

$$f_Y(y) = \begin{cases} \dfrac{1}{2\sqrt{a y}}\left[f_X\!\left(\sqrt{\dfrac{y}{a}}\right) + f_X\!\left(-\sqrt{\dfrac{y}{a}}\right)\right], & y \ge 0 \\[2mm] 0, & \text{otherwise} \end{cases}$$

Let $a = 1$, and

$$f_X(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right), \quad -\infty < x < \infty$$

(Note that $\exp(\alpha)$ is the same as $e^{\alpha}$.)

Then

$$f_Y(y) = \frac{1}{2\sqrt{2\pi y}}\left[\exp\left(-\frac{y}{2}\right) + \exp\left(-\frac{y}{2}\right)\right]$$

$$\quad\ \ = \begin{cases} \dfrac{1}{\sqrt{2\pi y}} \exp\left(-\dfrac{y}{2}\right), & y \ge 0 \\[2mm] 0, & \text{otherwise} \end{cases}$$

Sketching of $f_X(x)$ and $f_Y(y)$ is left as an exercise.
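The density just derived is the chi-square density with one degree of freedom. As an illustrative check (not from the original notes), one can square standard Gaussian samples and compare $P[Y \le 1]$ with the integral of the derived $f_Y(y)$:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
y = rng.standard_normal(1_000_000) ** 2   # samples of Y = X^2 (a = 1)
print((y <= 1).mean())                    # empirical P[Y <= 1], ~0.6827

# integrate the derived f_Y(y) = exp(-y/2)/sqrt(2*pi*y) over (0, 1]
t, dt = np.linspace(1e-6, 1.0, 100_001, retstep=True)
f_Y = np.exp(-t / 2) / np.sqrt(2 * np.pi * t)
print((f_Y * dt).sum())                   # ~0.68 as well
```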

Example 2.9

Consider the half wave rectifier transformation given by

$$Y = \begin{cases} 0, & X \le 0 \\ X, & X > 0 \end{cases}$$

a) Let us find the general expression for $f_Y(y)$.

b) Let

$$f_X(x) = \begin{cases} \dfrac{1}{2}, & -\dfrac{1}{2} < x < \dfrac{3}{2} \\[1mm] 0, & \text{otherwise} \end{cases}$$

We shall compute and sketch $f_Y(y)$.


a) Note that $g(X)$ is a constant (equal to zero) for $X$ in the range $(-\infty, 0]$. Hence, there is an impulse in $f_Y(y)$ at $y = 0$ whose area is equal to $F_X(0)$. As $Y$ is nonnegative, $f_Y(y) = 0$ for $y < 0$. As $Y = X$ for $x > 0$, we have $f_Y(y) = f_X(y)$ for $y > 0$. Hence

$$f_Y(y) = f_X(y)\, w(y) + F_X(0)\, \delta(y)$$

where $w(y) = \begin{cases} 1, & y \ge 0 \\ 0, & \text{otherwise} \end{cases}$

b) Specifically, let

$$f_X(x) = \begin{cases} \dfrac{1}{2}, & -\dfrac{1}{2} < x < \dfrac{3}{2} \\[1mm] 0, & \text{elsewhere} \end{cases}$$

Then,

$$f_Y(y) = \begin{cases} \dfrac{1}{4}\, \delta(y), & y = 0 \\[1mm] \dfrac{1}{2}, & 0 < y \le \dfrac{3}{2} \\[1mm] 0, & \text{otherwise} \end{cases}$$

$f_X(x)$ and $f_Y(y)$ are sketched in Fig. 2.10.

Fig. 2.10: $f_X(x)$ and $f_Y(y)$ for the Example 2.9


Note that $X$, a continuous RV, is transformed into a mixed RV, $Y$.

Example 2.10

Let

$$Y = \begin{cases} -1, & X < 0 \\ +1, & X \ge 0 \end{cases}$$

a) Let us find the general expression for $f_Y(y)$.

b) We shall compute and sketch $f_Y(y)$, assuming that $f_X(x)$ is the same as that of Example 2.9.

a) In this case, $Y$ assumes only two values, namely $\pm 1$. Hence the PDF of $Y$ has only two impulses. Let us write $f_Y(y)$ as

$$f_Y(y) = P_{-1}\, \delta(y + 1) + P_{1}\, \delta(y - 1)$$

where $P_{-1} = P[X < 0]$ and $P_{1} = P[X \ge 0]$.

b) Taking $f_X(x)$ of Example 2.9, we have $P_1 = \dfrac{3}{4}$ and $P_{-1} = \dfrac{1}{4}$. Fig. 2.11 has the sketches of $f_X(x)$ and $f_Y(y)$.

Fig. 2.11: $f_X(x)$ and $f_Y(y)$ for the Example 2.10

Note that this transformation has converted a continuous random variable X into

a discrete random variable Y .


Exercise 2.5

Let a random variable $X$ with the PDF shown in Fig. 2.12(a) be the input to a device with the input-output characteristic shown in Fig. 2.12(b). Compute and sketch $f_Y(y)$.

Fig. 2.12: (a) Input PDF for the transformation of exercise 2.5

(b) Input-output transformation

Exercise 2.6

The random variable $X$ of exercise 2.5 is applied as input to the $X$-$Y$ transformation shown in Fig. 2.13. Compute and sketch $f_Y(y)$.


We now assume that the random variable $X$ is of the discrete type, taking on the value $x_k$ with probability $P_k$. In this case, the RV $Y = g(X)$ is also discrete, assuming the value $y_k = g(x_k)$ with probability $P_k$.

If $y_k = g(x)$ for only one $x = x_k$, then $P[Y = y_k] = P[X = x_k] = P_k$. If, however, $y_k = g(x)$ for $x = x_k$ and $x = x_m$, then $P[Y = y_k] = P_k + P_m$.

Example 2.11

Let $Y = X^2$.

a) If $f_X(x) = \dfrac{1}{6} \sum\limits_{i=1}^{6} \delta(x - i)$, find $f_Y(y)$.

b) If $f_X(x) = \dfrac{1}{6} \sum\limits_{i=-2}^{3} \delta(x - i)$, find $f_Y(y)$.

a) If $X$ takes the values $(1, 2, \ldots, 6)$ with probability $\dfrac{1}{6}$, then $Y$ takes the values $1^2, 2^2, \ldots, 6^2$ with probability $\dfrac{1}{6}$. That is,

$$f_Y(y) = \frac{1}{6} \sum_{i=1}^{6} \delta(y - i^2)$$

b) If, however, $X$ takes the values $-2, -1, 0, 1, 2, 3$ with probability $\dfrac{1}{6}$, then $Y$ takes the values $0, 1, 4, 9$ with probabilities $\dfrac{1}{6}, \dfrac{1}{3}, \dfrac{1}{3}, \dfrac{1}{6}$, respectively. That is,

$$f_Y(y) = \frac{1}{6}\left[\delta(y) + \delta(y - 9)\right] + \frac{1}{3}\left[\delta(y - 1) + \delta(y - 4)\right]$$
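The bookkeeping in part (b), where two values of $x$ can map to the same value of $y$, is just an accumulation of probability masses. A minimal sketch (illustrative):

```python
from collections import defaultdict
from fractions import Fraction

pmf_X = {x: Fraction(1, 6) for x in (-2, -1, 0, 1, 2, 3)}

pmf_Y = defaultdict(Fraction)   # Fraction() is zero
for x, p in pmf_X.items():
    pmf_Y[x * x] += p           # values with the same square pool their mass

print(dict(pmf_Y))   # masses 1/6, 1/3, 1/3, 1/6 at y = 0, 1, 4, 9
```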

2.4.2 Functions of two random variables

Given two random variables $X$ and $Y$ (assumed to be of the continuous type), two new random variables, $Z$ and $W$, are defined using the transformation

$$Z = g(X, Y) \quad \text{and} \quad W = h(X, Y)$$


Given the above transformation and $f_{X,Y}(x, y)$, we would like to obtain $f_{Z,W}(z, w)$. For this, we require the Jacobian of the transformation, denoted $J\!\left(\dfrac{z, w}{x, y}\right)$, where

$$J\!\left(\frac{z, w}{x, y}\right) = \begin{vmatrix} \dfrac{\partial z}{\partial x} & \dfrac{\partial z}{\partial y} \\[2mm] \dfrac{\partial w}{\partial x} & \dfrac{\partial w}{\partial y} \end{vmatrix} = \frac{\partial z}{\partial x} \frac{\partial w}{\partial y} - \frac{\partial z}{\partial y} \frac{\partial w}{\partial x}$$

That is, the Jacobian is the determinant of the appropriate partial derivatives. We shall now state the theorem which relates $f_{Z,W}(z, w)$ and $f_{X,Y}(x, y)$.

shall now state the theorem which relates ( ), ,Z Wf z w and ( ), ,X Yf x y .

We shall assume that the transformation is one-to-one. That is, given

( ) 1,g x y z= , (2.33a)

( ) 1,h x y w= , (2.33b)

then there is a unique set of numbers, ( )1 1,x y satisfying Eq. 2.33.

Theorem 2.2: To obtain ( ), ,Z Wf z w , solve the system of equations

( ) 1,g x y z= ,

( )h x y w1, = ,

for x and y . Let ( )1 1,x y be the result of the solution. Then,

( ) ( ), 1 1,

1 1

,,

,,

X YZ W

f x yf z w

z wJx y

=⎛ ⎞⎜ ⎟⎝ ⎠

(2.34)

Proof of this theorem is given in appendix A2.1. For a more general version of

this theorem, refer [1].


Example 2.12

$X$ and $Y$ are two independent RVs with the PDFs

$$f_X(x) = \frac{1}{2}, \ |x| \le 1$$

$$f_Y(y) = \frac{1}{2}, \ |y| \le 1$$

If $Z = X + Y$ and $W = X - Y$, let us find (a) $f_{Z,W}(z, w)$ and (b) $f_Z(z)$.

a) From the given transformations, we obtain $x = \dfrac{1}{2}(z + w)$ and $y = \dfrac{1}{2}(z - w)$. We see that the mapping is one-to-one. Fig. 2.14(a) depicts the (product) space $A$ on which $f_{X,Y}(x, y)$ is non-zero.

Fig. 2.14: (a) The space where $f_{X,Y}$ is non-zero

(b) The space where $f_{Z,W}$ is non-zero


We can obtain the space $B$ (on which $f_{Z,W}(z, w)$ is non-zero) as follows:

The line $x = 1$ maps to $\frac{1}{2}(z + w) = 1 \Rightarrow w = -z + 2$

The line $x = -1$ maps to $\frac{1}{2}(z + w) = -1 \Rightarrow w = -(z + 2)$

The line $y = 1$ maps to $\frac{1}{2}(z - w) = 1 \Rightarrow w = z - 2$

The line $y = -1$ maps to $\frac{1}{2}(z - w) = -1 \Rightarrow w = z + 2$

The space $B$ is shown in Fig. 2.14(b). The Jacobian of the transformation is

$$J\!\left(\frac{z, w}{x, y}\right) = \begin{vmatrix} 1 & 1 \\ 1 & -1 \end{vmatrix} = -2$$

and $|J| = 2$. Hence

$$f_{Z,W}(z, w) = \begin{cases} \dfrac{1/4}{2} = \dfrac{1}{8}, & (z, w) \in B \\[1mm] 0, & \text{otherwise} \end{cases}$$

b) $f_Z(z) = \int_{-\infty}^{\infty} f_{Z,W}(z, w)\, dw$

From Fig. 2.14(b), we can see that, for a given $z$ ($z \ge 0$), $w$ can take values only in the range $(z - 2)$ to $(2 - z)$. Hence

$$f_Z(z) = \frac{1}{8} \int_{z - 2}^{2 - z} dw = \frac{2 - z}{4}, \quad 0 \le z \le 2$$

For $z$ negative, we have

$$f_Z(z) = \frac{1}{8} \int_{-(z + 2)}^{z + 2} dw = \frac{2 + z}{4}, \quad -2 \le z \le 0$$


Hence $f_Z(z) = \dfrac{2 - |z|}{4}, \ |z| \le 2$
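The triangular density just obtained can be verified by simulation: add two independent variables, each uniform over $(-1, 1)$, and compare a histogram of the sum with $\frac{2 - |z|}{4}$. A minimal sketch (illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=3)
N = 1_000_000
z = rng.uniform(-1, 1, N) + rng.uniform(-1, 1, N)   # Z = X + Y

hist, edges = np.histogram(z, bins=40, range=(-2, 2), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
f_Z = (2 - np.abs(centers)) / 4

print(np.max(np.abs(hist - f_Z)))   # small: agrees with the derived PDF
```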

Example 2.13

Let the random variables $R$ and $\Phi$ be given by $R = \sqrt{X^2 + Y^2}$ and $\Phi = \arctan\left(\dfrac{Y}{X}\right)$, where we assume $R \ge 0$ and $-\pi < \Phi < \pi$. It is given that

$$f_{X,Y}(x, y) = \frac{1}{2\pi} \exp\left[-\left(\frac{x^2 + y^2}{2}\right)\right], \quad -\infty < x, y < \infty$$

Let us find $f_{R,\Phi}(r, \varphi)$.

As the given transformation is from cartesian-to-polar coordinates, we can write $x = r \cos\varphi$ and $y = r \sin\varphi$, and the transformation is one-to-one;

$$J\!\left(\frac{x, y}{r, \varphi}\right) = \begin{vmatrix} \dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \varphi} \\[2mm] \dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \varphi} \end{vmatrix} = \begin{vmatrix} \cos\varphi & -r \sin\varphi \\ \sin\varphi & r \cos\varphi \end{vmatrix} = r, \ \text{ so that } \ J\!\left(\frac{r, \varphi}{x, y}\right) = \frac{1}{r}$$

Hence,

$$f_{R,\Phi}(r, \varphi) = \begin{cases} \dfrac{r}{2\pi} \exp\left(-\dfrac{r^2}{2}\right), & 0 \le r < \infty, \ -\pi < \varphi < \pi \\[1mm] 0, & \text{otherwise} \end{cases}$$

It is left as an exercise to find $f_R(r)$, $f_\Phi(\varphi)$ and to show that $R$ and $\Phi$ are independent variables.
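As an illustrative aside (not from the original notes), the exercise can be previewed numerically: transforming independent Gaussian pairs to polar form should give $R$ with the Rayleigh density $r\,e^{-r^2/2}$ and $\Phi$ uniform over $(-\pi, \pi)$, with the two (at least) uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(seed=4)
x = rng.standard_normal(1_000_000)
y = rng.standard_normal(1_000_000)

r = np.hypot(x, y)        # R = sqrt(X^2 + Y^2)
phi = np.arctan2(y, x)    # Phi in (-pi, pi]

print(r.mean(), np.sqrt(np.pi / 2))          # Rayleigh mean, ~1.2533
print(phi.mean(), phi.var(), np.pi**2 / 3)   # uniform: mean ~0, var ~3.29
print(np.corrcoef(r, phi)[0, 1])             # ~0, consistent with independence
```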

Theorem 2.2 can also be used when only one function $Z = g(X, Y)$ is specified and what is required is $f_Z(z)$. To apply the theorem, a conveniently chosen auxiliary or dummy variable $W$ is introduced. Typically $W = X$ or $W = Y$; using the theorem, $f_{Z,W}(z, w)$ is found, from which $f_Z(z)$ can be obtained.


Let $Z = X + Y$ and we require $f_Z(z)$. Let us introduce a dummy variable $W = Y$. Then, $X = Z - W$ and $Y = W$.

As $J = 1$,

$$f_{Z,W}(z, w) = f_{X,Y}(z - w, w)$$

and

$$f_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(z - w, w)\, dw \tag{2.35}$$

If $X$ and $Y$ are independent, then Eq. 2.35 becomes

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(z - w)\, f_Y(w)\, dw \tag{2.36a}$$

That is,

$$f_Z = f_X * f_Y \tag{2.36b}$$

Example 2.14

Let $X$ and $Y$ be two independent random variables, with

$$f_X(x) = \begin{cases} \dfrac{1}{2}, & -1 \le x \le 1 \\[1mm] 0, & \text{otherwise} \end{cases}$$

$$f_Y(y) = \begin{cases} \dfrac{1}{3}, & -2 \le y \le 1 \\[1mm] 0, & \text{otherwise} \end{cases}$$

If $Z = X + Y$, let us find $P[Z \le -2]$.

From Eq. 2.36(b), $f_Z(z)$ is the convolution of $f_X(z)$ and $f_Y(z)$. Carrying out the convolution, we obtain $f_Z(z)$ as shown in Fig. 2.15.


Fig. 2.15: PDF of $Z = X + Y$ of Example 2.14

$P[Z \le -2]$ is the shaded area $= \dfrac{1}{12}$.

Example 2.15

Let $Z = \dfrac{X}{Y}$; let us find an expression for $f_Z(z)$.

Introducing the dummy variable $W = Y$, we have

$$X = Z W, \quad Y = W$$

As $J = \dfrac{1}{w}$, we have

$$f_Z(z) = \int_{-\infty}^{\infty} |w|\, f_{X,Y}(z w, w)\, dw$$

Let

$$f_{X,Y}(x, y) = \begin{cases} \dfrac{1 + x y}{4}, & |x| \le 1, \ |y| \le 1 \\[1mm] 0, & \text{elsewhere} \end{cases}$$

Then

$$f_Z(z) = \int \frac{1 + z w^2}{4}\, |w|\, dw$$


$f_{X,Y}(x, y)$ is non-zero if $(x, y) \in A$, where $A$ is the product space $|x| \le 1$ and $|y| \le 1$ (Fig. 2.16a). Let $f_{Z,W}(z, w)$ be non-zero if $(z, w) \in B$. Under the given transformation, $B$ will be as shown in Fig. 2.16(b).

Fig. 2.16: (a) The space where $f_{X,Y}$ is non-zero

(b) The space where $f_{Z,W}$ is non-zero

To obtain $f_Z(z)$ from $f_{Z,W}(z, w)$, we have to integrate out $w$ over the appropriate ranges.

i) Let $|z| \le 1$; then

$$f_Z(z) = \int_{-1}^{1} \frac{1 + z w^2}{4}\, |w|\, dw = 2 \int_{0}^{1} \frac{1 + z w^2}{4}\, w\, dw = \frac{1}{4}\left(1 + \frac{z}{2}\right)$$

ii) For $z > 1$, we have

$$f_Z(z) = 2 \int_{0}^{1/z} \frac{1 + z w^2}{4}\, w\, dw = \frac{1}{4}\left(\frac{1}{z^2} + \frac{1}{2 z^3}\right)$$

iii) For $z < -1$, we have

$$f_Z(z) = 2 \int_{0}^{-1/z} \frac{1 + z w^2}{4}\, w\, dw = \frac{1}{4}\left(\frac{1}{z^2} + \frac{1}{2 z^3}\right)$$


Hence

$$f_Z(z) = \begin{cases} \dfrac{1}{4}\left(1 + \dfrac{z}{2}\right), & |z| \le 1 \\[2mm] \dfrac{1}{4}\left(\dfrac{1}{z^2} + \dfrac{1}{2 z^3}\right), & |z| > 1 \end{cases}$$

In transformations involving two random variables, we may encounter a

situation where one of the variables is continuous and the other is discrete; such

cases are handled better by making use of the distribution function approach,

as illustrated below.

Example 2.16

The input to a noisy channel is a binary random variable with $P[X = 0] = P[X = 1] = \dfrac{1}{2}$. The output of the channel is given by $Z = X + Y$,

Exercise 2.7

Let $Z = X Y$ and $W = Y$.

a) Show that

$$f_Z(z) = \int_{-\infty}^{\infty} \frac{1}{|w|}\, f_{X,Y}\!\left(\frac{z}{w}, w\right) dw$$

b) Let $X$ and $Y$ be independent with

$$f_X(x) = \frac{1}{\pi \sqrt{1 - x^2}}, \ |x| \le 1$$

and

$$f_Y(y) = \begin{cases} y\, e^{-y^2/2}, & y \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

Show that

$$f_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty$$


where $Y$ is the channel noise with $f_Y(y) = \dfrac{1}{\sqrt{2\pi}}\, e^{-y^2/2}, \ -\infty < y < \infty$. Find $f_Z(z)$.

Let us first compute the distribution function of $Z$, from which the density function can be derived.

$$P[Z \le z] = P[Z \le z | X = 0]\, P[X = 0] + P[Z \le z | X = 1]\, P[X = 1]$$

As $Z = X + Y$, we have

$$P[Z \le z | X = 0] = F_Y(z)$$

Similarly, $P[Z \le z | X = 1] = P[Y \le z - 1] = F_Y(z - 1)$.

Hence $F_Z(z) = \dfrac{1}{2} F_Y(z) + \dfrac{1}{2} F_Y(z - 1)$. As $f_Z(z) = \dfrac{d}{dz} F_Z(z)$, we have

$$f_Z(z) = \frac{1}{2}\left[\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right) + \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(z - 1)^2}{2}\right)\right]$$

The distribution function method, as illustrated by means of Example 2.16, is a very basic method and can be used in all situations. (In this method, if $Y = g(X)$, we compute $F_Y(y)$ and then obtain $f_Y(y) = \dfrac{d\,F_Y(y)}{dy}$; we proceed similarly for transformations involving more than one random variable. Of course, computing the CDF may prove to be quite difficult in certain situations. The method of obtaining PDFs based on Theorem 2.1 or 2.2 is called the change of variable method.) We shall illustrate the distribution function method with another example.

Example 2.17

Let $Z = X + Y$. Obtain $f_Z(z)$, given $f_{X,Y}(x, y)$.


$$P[Z \le z] = P[X + Y \le z] = P[Y \le z - X]$$

This probability is the probability of $(X, Y)$ lying in the shaded area shown in Fig. 2.17.

Fig. 2.17: Shaded area is $P[Z \le z]$

That is,

$$F_Z(z) = \int_{-\infty}^{\infty} dx \left[\int_{-\infty}^{z - x} f_{X,Y}(x, y)\, dy\right]$$

$$f_Z(z) = \frac{\partial}{\partial z}\left[\int_{-\infty}^{\infty} dx \left[\int_{-\infty}^{z - x} f_{X,Y}(x, y)\, dy\right]\right]$$

$$\quad\ \ = \int_{-\infty}^{\infty} dx\, \frac{\partial}{\partial z}\left[\int_{-\infty}^{z - x} f_{X,Y}(x, y)\, dy\right]$$

$$\quad\ \ = \int_{-\infty}^{\infty} f_{X,Y}(x, z - x)\, dx \tag{2.37a}$$

It is not too difficult to see the alternative form for $f_Z(z)$, namely,


$$f_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(z - y, y)\, dy \tag{2.37b}$$

If $X$ and $Y$ are independent, we have $f_Z(z) = f_X(z) * f_Y(z)$. We note that Eq. 2.37(b) is the same as Eq. 2.35.

So far we have considered the transformations involving one or two

variables. This can be generalized to the case of functions of n variables. Details

can be found in [1, 2].

2.5 Statistical Averages

The PDF of a random variable provides a complete statistical

characterization of the variable. However, we might be in a situation where the

PDF is not available but are able to estimate (with reasonable accuracy) certain

(statistical) averages of the random variable. Some of these averages do provide

a simple and fairly adequate (though incomplete) description of the random

variable. We now define a few of these averages and explore their significance.

The mean value (also called the expected value, mathematical expectation or simply expectation) of the random variable $X$ is defined as

$$m_X = E[X] = \overline{X} = \int_{-\infty}^{\infty} x\, f_X(x)\, dx \tag{2.38}$$

where $E$ denotes the expectation operator. Note that $m_X$ is a constant. Similarly, the expected value of a function of $X$, $g(X)$, is defined by

$$E[g(X)] = \overline{g(X)} = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx \tag{2.39}$$

Remarks: The terminology expected value or expectation has its origin in games of chance. This can be illustrated as follows: three small similar discs, numbered 1, 2 and 2 respectively, are placed in a bowl and are mixed. A player is to be blindfolded and is to draw a disc from the bowl. If he draws the disc numbered 1, he will receive nine dollars; if he draws either disc numbered 2, he will receive 3 dollars. It seems reasonable to assume that the player has a '1/3 claim' on the 9 dollars and a '2/3 claim' on the three dollars. His total claim is 9(1/3) + 3(2/3), or five dollars. If we take $X$ to be a (discrete) random variable with the PDF

$$f_X(x) = \frac{1}{3}\,\delta(x - 1) + \frac{2}{3}\,\delta(x - 2)$$

and $g(X) = 15 - 6X$, then

$$E[g(X)] = \int_{-\infty}^{\infty} (15 - 6x)\, f_X(x)\, dx = 5$$

That is, the mathematical expectation of $g(X)$ is precisely the player's claim or expectation [3]. Note that $g(x)$ is such that $g(1) = 9$ and $g(2) = 3$.

For the special case of $g(X) = X^n$, we obtain the $n$-th moment of the probability distribution of the RV $X$; that is,
$$E\left[X^n\right] = \int_{-\infty}^{\infty} x^n\, f_X(x)\, dx \qquad (2.40)$$
The most widely used moments are the first moment ($n = 1$, which results in the mean value of Eq. 2.38) and the second moment ($n = 2$, resulting in the mean square value of $X$):
$$E\left[X^2\right] = \overline{X^2} = \int_{-\infty}^{\infty} x^2\, f_X(x)\, dx \qquad (2.41)$$
If $g(X) = (X - m_X)^n$, then $E[g(X)]$ gives the $n$-th central moment; that is,
$$E\left[(X - m_X)^n\right] = \int_{-\infty}^{\infty} (x - m_X)^n\, f_X(x)\, dx \qquad (2.42)$$

We can extend the definition of expectation to the case of functions of $k$ ($k \ge 2$) random variables. Consider a function of two random variables, $g(X, Y)$. Then,
$$E[g(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\, f_{X,Y}(x, y)\, dx\, dy \qquad (2.43)$$

An important property of the expectation operator is linearity; that is, if $Z = g(X, Y) = \alpha X + \beta Y$, where $\alpha$ and $\beta$ are constants, then $\overline{Z} = \alpha \overline{X} + \beta \overline{Y}$. This result can be established as follows. From Eq. 2.43, we have
$$E[Z] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (\alpha x + \beta y)\, f_{X,Y}(x, y)\, dx\, dy$$
$$= \alpha \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\, f_{X,Y}(x, y)\, dx\, dy + \beta \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\, f_{X,Y}(x, y)\, dx\, dy$$
Integrating out the variable $y$ in the first term and the variable $x$ in the second term, we have
$$E[Z] = \alpha \int_{-\infty}^{\infty} x\, f_X(x)\, dx + \beta \int_{-\infty}^{\infty} y\, f_Y(y)\, dy = \alpha \overline{X} + \beta \overline{Y}$$

2.5.1 Variance

Coming back to the central moments, we have the first central moment being always zero because
$$E[X - m_X] = \int_{-\infty}^{\infty} (x - m_X)\, f_X(x)\, dx = m_X - m_X = 0$$
Consider the second central moment
$$E\left[(X - m_X)^2\right] = E\left[X^2 - 2 m_X X + m_X^2\right]$$
From the linearity property of expectation,
$$E\left[X^2 - 2 m_X X + m_X^2\right] = E\left[X^2\right] - 2 m_X E[X] + m_X^2 = E\left[X^2\right] - m_X^2 = \overline{X^2} - \left(\overline{X}\right)^2$$


The second central moment of a random variable is called the variance, and its (positive) square root is called the standard deviation. The symbol $\sigma^2$ is generally used to denote the variance. (If necessary, we use a subscript on $\sigma^2$.)

The variance provides a measure of the variable's spread or randomness. Specifying the variance essentially constrains the effective width of the density function. This can be made more precise with the help of the Chebyshev inequality, which follows as a special case of the following theorem.

Theorem 2.3: Let $g(X)$ be a non-negative function of the random variable $X$. If $E[g(X)]$ exists, then for every positive constant $c$,
$$P\left[g(X) \ge c\right] \le \frac{E[g(X)]}{c} \qquad (2.44)$$
Proof: Let $A = \{x : g(x) \ge c\}$ and let $B$ denote the complement of $A$. Then
$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx = \int_A g(x)\, f_X(x)\, dx + \int_B g(x)\, f_X(x)\, dx$$
Since each integral on the RHS above is non-negative, we can write
$$E[g(X)] \ge \int_A g(x)\, f_X(x)\, dx$$
If $x \in A$, then $g(x) \ge c$; hence
$$E[g(X)] \ge c \int_A f_X(x)\, dx$$
But
$$\int_A f_X(x)\, dx = P[x \in A] = P\left[g(X) \ge c\right]$$
That is, $E[g(X)] \ge c\, P\left[g(X) \ge c\right]$, which is the desired result.

Note: The kind of manipulation used in proving Theorem 2.3 is useful in establishing similar inequalities involving random variables.


To see how innocuous (or weak, perhaps) the inequality 2.44 is, let $g(X)$ represent the height of a randomly chosen human being, with $E[g(X)] = 1.6\ \text{m}$. Then Eq. 2.44 states that the probability of choosing a person over 16 m tall is at most $\dfrac{1}{10}$! (In a population of 1 billion, at most 100 million would be as tall as a full-grown Palmyra tree!)

The Chebyshev inequality can be stated in two equivalent forms:

i) $P\left[\left|X - m_X\right| \ge k\,\sigma_X\right] \le \dfrac{1}{k^2}, \quad k > 0 \qquad$ (2.45a)

ii) $P\left[\left|X - m_X\right| < k\,\sigma_X\right] > 1 - \dfrac{1}{k^2} \qquad$ (2.45b)

where $\sigma_X$ is the standard deviation of $X$.

To establish 2.45(a), let $g(X) = (X - m_X)^2$ and $c = k^2 \sigma_X^2$ in Theorem 2.3. We then have
$$P\left[(X - m_X)^2 \ge k^2 \sigma_X^2\right] \le \frac{1}{k^2}$$
In other words,
$$P\left[\left|X - m_X\right| \ge k\,\sigma_X\right] \le \frac{1}{k^2}$$
which is the desired result. Naturally, we would take the positive number $k$ to be greater than one to have a meaningful result. The Chebyshev inequality can be interpreted as follows: the probability of observing any RV outside $\pm k$ standard deviations off its mean value is no larger than $\dfrac{1}{k^2}$. With $k = 2$, for example, the probability of $\left|X - m_X\right| \ge 2\sigma_X$ does not exceed 1/4, or 25%. By the same token, we expect $X$ to occur within the range $(m_X \pm 2\sigma_X)$ for more than 75% of the observations. That is, the smaller the standard deviation, the smaller is the width of the interval around $m_X$ in which the required probability is concentrated. The Chebyshev inequality thus enables us to give a quantitative interpretation to the statement 'variance is indicative of the spread of the random variable'.
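A small simulation makes both the bound and its looseness visible. The sketch below (our choice of an exponential RV, whose mean and standard deviation are both 1) compares the observed tail fraction with the Chebyshev bound for $k = 2$:

    import numpy as np

    # Empirical check of Eq. 2.45(a) for an exponential RV and k = 2.
    rng = np.random.default_rng(0)
    x = rng.exponential(scale=1.0, size=1_000_000)
    m, s, k = x.mean(), x.std(), 2.0

    observed = np.mean(np.abs(x - m) >= k * s)
    print(observed, "<=", 1 / k**2)   # ~0.05 observed vs. the bound 0.25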

Note that the variance need not exist for every PDF. For example, if
$$f_X(x) = \frac{\alpha / \pi}{\alpha^2 + x^2}, \quad -\infty < x < \infty \text{ and } \alpha > 0,$$
then $\overline{X} = 0$ but $\overline{X^2}$ is not finite. (This is called the Cauchy PDF.)

2.5.2 Covariance

An important joint expectation is the quantity called covariance, which is obtained by letting $g(X, Y) = (X - m_X)(Y - m_Y)$ in Eq. 2.43. We use the symbol $\lambda$ to denote the covariance. That is,
$$\lambda_{XY} = \operatorname{cov}[X, Y] = E\left[(X - m_X)(Y - m_Y)\right] \qquad (2.46a)$$
Using the linearity property of expectation, we have
$$\lambda_{XY} = E[XY] - m_X m_Y \qquad (2.46b)$$
The covariance $\operatorname{cov}[X, Y]$, normalized with respect to the product $\sigma_X \sigma_Y$, is termed the correlation coefficient, which we denote by $\rho$. That is,
$$\rho_{XY} = \frac{E[XY] - m_X m_Y}{\sigma_X \sigma_Y} \qquad (2.47)$$

The correlation coefficient is a measure of the dependency between the variables. Suppose $X$ and $Y$ are independent. Then,
$$E[XY] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\, y\, f_{X,Y}(x, y)\, dx\, dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\, y\, f_X(x)\, f_Y(y)\, dx\, dy$$
$$= \int_{-\infty}^{\infty} x\, f_X(x)\, dx \int_{-\infty}^{\infty} y\, f_Y(y)\, dy = m_X\, m_Y$$
Thus we have $\lambda_{XY}$ (and $\rho_{XY}$) being zero. Intuitively, this result is appealing.

Assume $X$ and $Y$ to be independent. When the joint experiment is performed many times and the outcome is $X = x_1$, $Y$ would sometimes occur positive with respect to $m_Y$ and sometimes negative with respect to $m_Y$. In the course of many trials of the experiment with the outcome $X = x_1$, the sum of the numbers $x_1(y - m_Y)$ would be very small, and the quantity sum/(number of trials) tends to zero as the number of trials keeps increasing.

On the other hand, let $X$ and $Y$ be dependent. Suppose, for example, the outcome $y$ is conditioned on the outcome $x$ in such a manner that there is a greater possibility of $(y - m_Y)$ being of the same sign as $(x - m_X)$. Then we expect $\rho_{XY}$ to be positive. Similarly, if the probability of $(x - m_X)$ and $(y - m_Y)$ being of opposite sign is quite large, then we expect $\rho_{XY}$ to be negative. Taking the extreme case of this dependency, let $X$ and $Y$ be so conditioned that, given $X = x_1$, $Y = \pm \alpha x_1$, $\alpha$ being a constant. Then $\rho_{XY} = \pm 1$. That is, for $X$ and $Y$ independent we have $\rho_{XY} = 0$, and for the totally dependent case ($y = \pm \alpha x$), $\left|\rho_{XY}\right| = 1$. If the variables are neither independent nor totally dependent, then $\rho$ will have a magnitude between 0 and 1.

Two random variables $X$ and $Y$ are said to be orthogonal if $E[XY] = 0$. If $\rho_{XY}$ (or $\lambda_{XY}$) is zero, the random variables $X$ and $Y$ are said to be uncorrelated. That is, if $X$ and $Y$ are uncorrelated, then $\overline{XY} = \overline{X}\ \overline{Y}$. When the random variables are independent, they are uncorrelated. However, the fact that they are uncorrelated does not ensure that they are independent. As an example, let

$$f_X(x) = \begin{cases} \dfrac{1}{\alpha}, & -\dfrac{\alpha}{2} < x < \dfrac{\alpha}{2} \\[4pt] 0, & \text{otherwise} \end{cases}$$
and $Y = X^2$. Then
$$\overline{XY} = \overline{X^3} = \frac{1}{\alpha}\int_{-\alpha/2}^{\alpha/2} x^3\, dx = 0$$
As $\overline{X} = 0$, $\overline{X}\ \overline{Y} = 0$, which means $\lambda_{XY} = \overline{XY} - \overline{X}\ \overline{Y} = 0$. But $X$ and $Y$ are not independent because, if $X = x_1$, then $Y = x_1^2$!
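The Python sketch below (taking $\alpha = 1$, our choice) illustrates this: the sample covariance is essentially zero even though $Y$ is a deterministic function of $X$.

    import numpy as np

    # X uniform on (-1/2, 1/2) and Y = X**2: uncorrelated but dependent.
    rng = np.random.default_rng(1)
    x = rng.uniform(-0.5, 0.5, size=1_000_000)
    y = x**2

    print(np.mean(x * y) - x.mean() * y.mean())   # ~0: lambda_XY = 0
    print(np.corrcoef(np.abs(x), y)[0, 1])        # ~0.97: Y is tied to |X|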

Let $Y$ be a linear combination of two random variables $X_1$ and $X_2$; that is, $Y = k_1 X_1 + k_2 X_2$, where $k_1$ and $k_2$ are constants. Let $E[X_i] = m_i$ and $\sigma_{X_i}^2 = \sigma_i^2$, $i = 1, 2$, and let $\rho_{12}$ be the correlation coefficient between $X_1$ and $X_2$. We will now relate $\sigma_Y^2$ to these known quantities.
$$\sigma_Y^2 = E\left[Y^2\right] - \left(E[Y]\right)^2$$
$$E[Y] = k_1 m_1 + k_2 m_2$$
$$E\left[Y^2\right] = E\left[k_1^2 X_1^2 + k_2^2 X_2^2 + 2 k_1 k_2 X_1 X_2\right]$$
With a little manipulation, we can show that
$$\sigma_Y^2 = k_1^2 \sigma_1^2 + k_2^2 \sigma_2^2 + 2\,\rho_{12}\, k_1 k_2\, \sigma_1 \sigma_2 \qquad (2.48a)$$
If $X_1$ and $X_2$ are uncorrelated, then
$$\sigma_Y^2 = k_1^2 \sigma_1^2 + k_2^2 \sigma_2^2 \qquad (2.48b)$$

Note: Let $X_1$ and $X_2$ be uncorrelated, and let $Z = X_1 + X_2$ and $W = X_1 - X_2$. Then $\sigma_Z^2 = \sigma_W^2 = \sigma_1^2 + \sigma_2^2$. That is, the sum as well as the difference random variable has the same variance, which is larger than $\sigma_1^2$ or $\sigma_2^2$!

The above result can be generalized to the case of a linear combination of $n$ variables. That is, if $Y = \sum_{i=1}^{n} k_i X_i$, then
$$\sigma_Y^2 = \sum_{i=1}^{n} k_i^2 \sigma_i^2 + 2 \sum_{\substack{i,\,j \\ i < j}} k_i k_j\, \rho_{ij}\, \sigma_i \sigma_j \qquad (2.49)$$
where the meaning of the various symbols on the RHS of Eq. 2.49 is quite obvious.

We shall now give a few examples based on the theory covered so far in

this section.

Example 2.18

Let a random variable $X$ have the CDF
$$F_X(x) = \begin{cases} 0, & x < 0 \\[2pt] \dfrac{x}{8}, & 0 \le x \le 2 \\[4pt] \dfrac{x^2}{16}, & 2 \le x \le 4 \\[4pt] 1, & 4 \le x \end{cases}$$
We shall find a) $\overline{X}$ and b) $\sigma_X^2$.

a) The given CDF implies the PDF
$$f_X(x) = \begin{cases} 0, & x < 0 \\[2pt] \dfrac{1}{8}, & 0 \le x \le 2 \\[4pt] \dfrac{x}{8}, & 2 \le x \le 4 \\[4pt] 0, & 4 < x \end{cases}$$
Therefore,
$$E[X] = \int_0^2 \frac{x}{8}\, dx + \int_2^4 \frac{x^2}{8}\, dx = \frac{1}{4} + \frac{7}{3} = \frac{31}{12}$$
b) $\sigma_X^2 = E\left[X^2\right] - \left(E[X]\right)^2$
$$E\left[X^2\right] = \int_0^2 \frac{x^2}{8}\, dx + \int_2^4 \frac{x^3}{8}\, dx = \frac{1}{3} + \frac{15}{2} = \frac{47}{6}$$
$$\sigma_X^2 = \frac{47}{6} - \left(\frac{31}{12}\right)^2 = \frac{167}{144}$$
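A numerical integration of the piecewise PDF confirms these values (a Python sketch; the grid resolution is arbitrary):

    import numpy as np

    # Numeric check of Example 2.18: moments of the piecewise PDF.
    x = np.linspace(0.0, 4.0, 400_001)
    f = np.where(x <= 2.0, 1 / 8, x / 8)     # PDF implied by the CDF
    dx = x[1] - x[0]

    mean = np.sum(x * f) * dx
    msq = np.sum(x**2 * f) * dx
    print(mean, 31 / 12)                     # ~2.5833
    print(msq - mean**2, 167 / 144)          # ~1.1597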

Example 2.19

Let $Y = \cos \pi X$, where
$$f_X(x) = \begin{cases} 1, & -\dfrac{1}{2} < x < \dfrac{1}{2} \\[4pt] 0, & \text{otherwise} \end{cases}$$
Let us find $E[Y]$ and $\sigma_Y^2$. From Eq. 2.39 we have
$$E[Y] = \int_{-1/2}^{1/2} \cos(\pi x)\, dx = \frac{2}{\pi} \simeq 0.636$$
$$E\left[Y^2\right] = \int_{-1/2}^{1/2} \cos^2(\pi x)\, dx = \frac{1}{2}$$
Hence
$$\sigma_Y^2 = \frac{1}{2} - \frac{4}{\pi^2} \simeq 0.095$$
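A quick simulation (a sketch) reproduces both values:

    import numpy as np

    # Simulation check of Example 2.19.
    rng = np.random.default_rng(2)
    y = np.cos(np.pi * rng.uniform(-0.5, 0.5, size=1_000_000))
    print(y.mean(), 2 / np.pi)              # ~0.6366
    print(y.var(), 0.5 - 4 / np.pi**2)      # ~0.0947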

Example 2.20

Let $X$ and $Y$ have the joint PDF
$$f_{X,Y}(x, y) = \begin{cases} x + y, & 0 < x < 1,\ 0 < y < 1 \\ 0, & \text{elsewhere} \end{cases}$$
Let us find a) $E\left[X^2 Y\right]$ and b) $\rho_{XY}$.

a) From Eq. 2.43, we have
$$E\left[X^2 Y\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^2 y\, f_{X,Y}(x, y)\, dx\, dy = \int_0^1 \int_0^1 x^2 y\, (x + y)\, dx\, dy = \frac{17}{72}$$
b) To find $\rho_{XY}$, we require $E[XY]$, $E[X]$, $E[Y]$, $\sigma_X$ and $\sigma_Y$. We can easily show that
$$E[X] = E[Y] = \frac{7}{12}, \qquad \sigma_X^2 = \sigma_Y^2 = \frac{11}{144}, \qquad E[XY] = \frac{48}{144}$$
Hence $\rho_{XY} = -\dfrac{1}{11}$.
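Both parts can be verified numerically. The sketch below uses scipy's dblquad (how the moments are wrapped in a helper is our choice), exploiting $E[X] = E[Y]$ and $\sigma_X = \sigma_Y$, which hold here by the symmetry of the joint PDF:

    from scipy.integrate import dblquad

    # Numeric check of Example 2.20.
    pdf = lambda x, y: x + y                    # joint PDF on the unit square

    def E(g):
        # dblquad expects the integrand as func(y, x)
        return dblquad(lambda y, x: g(x, y) * pdf(x, y),
                       0, 1, lambda x: 0, lambda x: 1)[0]

    Ex = E(lambda x, y: x)                      # = E[Y] = 7/12 by symmetry
    Exy = E(lambda x, y: x * y)
    var = E(lambda x, y: x**2) - Ex**2          # = 11/144

    print(E(lambda x, y: x**2 * y), 17 / 72)    # part (a)
    print((Exy - Ex * Ex) / var, -1 / 11)       # part (b)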

Another statistical average that will be found useful in the study of communication theory is the conditional expectation. The quantity
$$E[g(X) \mid Y = y] = \int_{-\infty}^{\infty} g(x)\, f_{X|Y}(x \mid y)\, dx \qquad (2.50)$$
is called the conditional expectation of $g(X)$, given $Y = y$. If $g(X) = X$, then we have the conditional mean, namely $E[X \mid Y]$:
$$E[X \mid Y = y] = E[X \mid Y] = \int_{-\infty}^{\infty} x\, f_{X|Y}(x \mid y)\, dx \qquad (2.51)$$
Similarly, we can define the conditional variance, etc. We shall illustrate the calculation of the conditional mean with the help of an example.

Example 2.21

Let the joint PDF of the random variables $X$ and $Y$ be
$$f_{X,Y}(x, y) = \begin{cases} \dfrac{1}{x}, & 0 < y < x < 1 \\[4pt] 0, & \text{outside} \end{cases}$$
Let us compute $E[X \mid Y]$.

To find $E[X \mid Y]$, we require the conditional PDF
$$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$$
Now,
$$f_Y(y) = \int_y^1 \frac{1}{x}\, dx = -\ln y, \quad 0 < y < 1$$
$$f_{X|Y}(x \mid y) = \frac{1/x}{-\ln y} = -\frac{1}{x \ln y}, \quad y < x < 1$$
Hence
$$E[X \mid Y] = \int_y^1 x \left[-\frac{1}{x \ln y}\right] dx = \frac{y - 1}{\ln y}$$
Note that $E[X \mid Y = y]$ is a function of $y$.

2.6 Some Useful Probability Models

In the concluding section of this chapter, we shall discuss certain probability distributions which are encountered quite often in the study of communication theory. We will begin our discussion with discrete random variables.

2.6.1 Discrete random variables

i) Binomial:

Consider a random experiment in which our interest is in the occurrence or non-occurrence of an event $A$. That is, we are interested in the event $A$ or its complement, say, $B$. Let the experiment be repeated $n$ times and let $p$ be the probability of occurrence of $A$ on each trial, the trials being independent. Let $X$ denote the random variable 'number of occurrences of $A$ in $n$ trials'. $X$ can be equal to $0, 1, 2, \ldots, n$. If we can compute $P[X = k]$, $k = 0, 1, \ldots, n$, then we can write $f_X(x)$.

Taking a special case, let $n = 5$ and $k = 3$. The sample space (representing the outcomes of these five repetitions) has 32 sample points, say $s_1, \ldots, s_{32}$. The sample point $s_1$ could represent the sequence $ABBBB$. Sample points such as $ABAAB$, $AAABB$, etc. will map into the real number 3, as shown in Fig. 2.18. (Each sample point is actually an element of the five-dimensional Cartesian product space.)

Fig. 2.18: Binomial RV for the special case $n = 5$

As the trials are independent,
$$P(ABAAB) = P(A)\,P(B)\,P(A)\,P(A)\,P(B) = p(1-p)\,p\,p\,(1-p) = p^3(1-p)^2$$
There are $\binom{5}{3} = 10$ sample points for which $X(s) = 3$. In other words, for $n = 5$, $k = 3$,
$$P[X = 3] = \binom{5}{3} p^3 (1-p)^2$$
Generalizing this to arbitrary $n$ and $k$, we have the binomial density, given by


$$f_X(x) = \sum_{i=0}^{n} P_i\, \delta(x - i) \qquad (2.52)$$
where
$$P_i = \binom{n}{i} p^i (1-p)^{n-i}$$
As can be seen, $f_X(x) \ge 0$ and
$$\int_{-\infty}^{\infty} f_X(x)\, dx = \sum_{i=0}^{n} P_i = \sum_{i=0}^{n} \binom{n}{i} p^i (1-p)^{n-i} = \left[(1-p) + p\right]^n = 1$$
It is left as an exercise to show that $E[X] = np$ and $\sigma_X^2 = np(1-p)$. (Though the formulae for the mean and the variance of a binomial PDF are simple, the algebra to derive them is laborious.)

We write '$X$ is $b(n, p)$' to indicate that $X$ has a binomial PDF with the parameters $n$ and $p$ defined above.

The following example illustrates the use of the binomial PDF in a communication problem.

Example 2.22

A digital communication system transmits binary digits over a noisy channel in blocks of 16 digits. Assume that the probability of a binary digit being in error is 0.01 and that errors in various digit positions within a block are statistically independent.

i) Find the expected number of errors per block.

ii) Find the probability that the number of errors per block is greater than or equal to 3.

Let $X$ be the random variable representing the number of errors per block. Then $X$ is $b(16,\ 0.01)$.

i) $E[X] = np = 16 \times 0.01 = 0.16$

ii) $P[X \ge 3] = 1 - P[X \le 2] = 1 - \displaystyle\sum_{i=0}^{2} \binom{16}{i} (0.01)^i\, (0.99)^{16-i} \simeq 5 \times 10^{-4}$
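The tail probability can be evaluated exactly with a few lines of Python (a sketch):

    from math import comb

    # Exact evaluation of Example 2.22(ii).
    n, p = 16, 0.01
    tail = 1 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(3))
    print(tail)   # ~5.1e-4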

ii) Poisson:

A random variable $X$ which takes on only integer values is Poisson distributed if
$$f_X(x) = \sum_{m=0}^{\infty} \frac{\lambda^m\, e^{-\lambda}}{m!}\, \delta(x - m) \qquad (2.53)$$
where $\lambda$ is a positive constant.

Evidently $f_X(x) \ge 0$, and $\int_{-\infty}^{\infty} f_X(x)\, dx = 1$ because $\sum_{m=0}^{\infty} \dfrac{\lambda^m}{m!} = e^{\lambda}$.

We will now show that $E[X] = \lambda = \sigma_X^2$. Since $e^{\lambda} = \sum_{m=0}^{\infty} \dfrac{\lambda^m}{m!}$, differentiating with respect to $\lambda$ gives
$$\frac{d\, e^{\lambda}}{d\lambda} = e^{\lambda} = \sum_{m=1}^{\infty} \frac{m\, \lambda^{m-1}}{m!}$$
Hence
$$E[X] = \sum_{m=0}^{\infty} m\, \frac{\lambda^m\, e^{-\lambda}}{m!} = e^{-\lambda}\, \lambda \sum_{m=1}^{\infty} \frac{m\, \lambda^{m-1}}{m!} = e^{-\lambda}\, \lambda\, e^{\lambda} = \lambda$$
Differentiating the series again, we obtain $E\left[X^2\right] = \lambda^2 + \lambda$; hence $\sigma_X^2 = E\left[X^2\right] - \left(E[X]\right)^2 = \lambda$.

Exercise 2.8

Show that for Example 2.22, the Chebyshev inequality results in $P[X \ge 3] \le 0.0196 \approx 0.02$. Note that the Chebyshev inequality is not very tight.

2.6.2 Continuous random variables

i) Uniform:

A random variable $X$ is said to be uniformly distributed in the interval $a \le x \le b$ if
$$f_X(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b \\[4pt] 0, & \text{elsewhere} \end{cases} \qquad (2.54)$$
A plot of $f_X(x)$ is shown in Fig. 2.19.

Fig. 2.19: Uniform PDF

It is easy to show that
$$E[X] = \frac{a + b}{2} \qquad \text{and} \qquad \sigma_X^2 = \frac{(b - a)^2}{12}$$
Note that the variance of the uniform PDF depends only on the width of the interval, $(b - a)$. Therefore, whether $X$ is uniform in $(-1, 1)$ or in $(2, 4)$, it has the same variance, namely $\dfrac{1}{3}$.

ii) Rayleigh:

An RV $X$ is said to be Rayleigh distributed if
$$f_X(x) = \begin{cases} \dfrac{x}{b}\, \exp\left(-\dfrac{x^2}{2b}\right), & x \ge 0 \\[4pt] 0, & \text{elsewhere} \end{cases} \qquad (2.55)$$
where $b$ is a positive constant. A typical sketch of the Rayleigh PDF is given in Fig. 2.20. ($f_R(r)$ of Example 2.12 is a Rayleigh PDF.)

Fig. 2.20: Rayleigh PDF

The Rayleigh PDF frequently arises in radar and communication problems. We will encounter it later in the study of narrow-band noise processes.


iii) Gaussian:

By far the most widely used PDF in the context of communication theory is the Gaussian (also called normal) density, specified by
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X} \exp\left[-\frac{(x - m_X)^2}{2\sigma_X^2}\right], \quad -\infty < x < \infty \qquad (2.56)$$
where $m_X$ is the mean value and $\sigma_X^2$ the variance. That is, the Gaussian PDF is completely specified by the two parameters $m_X$ and $\sigma_X^2$. We use the symbol $N(m_X, \sigma_X^2)$ to denote the Gaussian density. (In this notation, $N(0, 1)$ denotes the Gaussian PDF with zero mean and unit variance; note that if $X$ is $N(m_X, \sigma_X^2)$, then $Y = \dfrac{X - m_X}{\sigma_X}$ is $N(0, 1)$.) In Appendix A2.3, we show that $f_X(x)$ as given by Eq. 2.56 is a valid PDF.

As can be seen from Fig. 2.21, the Gaussian PDF is symmetrical with respect to $m_X$.

Exercise 2.9

a) Let $f_X(x)$ be as given in Eq. 2.55. Show that $\int_0^{\infty} f_X(x)\, dx = 1$. Hint: make the change of variable $x^2 = z$; then $x\, dx = \dfrac{dz}{2}$.

b) Show that if $X$ is Rayleigh distributed, then $E[X] = \sqrt{\dfrac{\pi b}{2}}$ and $E\left[X^2\right] = 2b$.


Fig. 2.21: Gaussian PDF

Hence
$$F_X(m_X) = \int_{-\infty}^{m_X} f_X(x)\, dx = 0.5$$
Consider $P[X \ge a]$. We have
$$P[X \ge a] = \int_a^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_X} \exp\left[-\frac{(x - m_X)^2}{2\sigma_X^2}\right] dx$$
This integral cannot be evaluated in closed form. Making the change of variable $z = \dfrac{x - m_X}{\sigma_X}$, we have
$$P[X \ge a] = \int_{\frac{a - m_X}{\sigma_X}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz = Q\left(\frac{a - m_X}{\sigma_X}\right)$$
where
$$Q(y) = \int_y^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) dx \qquad (2.57)$$
Note that the integrand on the RHS of Eq. 2.57 is $N(0, 1)$.

A table of $Q(\cdot)$ is available in most textbooks on communication theory, as well as in standard mathematical tables. A small list is given in Appendix A2.2 at the end of the chapter.
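In computation, one rarely needs the table: Python's standard library exposes the complementary error function, and the relation $Q(\alpha) = \frac{1}{2}\operatorname{erfc}\left(\alpha/\sqrt{2}\right)$ from Appendix A2.2 gives (a sketch):

    from math import erfc, sqrt

    # Q(.) via the complementary error function (see Appendix A2.2).
    def Q(y):
        return 0.5 * erfc(y / sqrt(2.0))

    print(Q(0.0), Q(1.0), Q(2.0))   # 0.5, ~0.1587, ~0.0228 (cf. the table)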


The importance of the Gaussian density in communication theory is due to a theorem called the central limit theorem, which essentially states that:

If the RV $X$ is the weighted sum of $N$ independent random components, where each component makes only a small contribution to the sum, then $F_X(x)$ approaches Gaussian as $N$ becomes large, regardless of the distribution of the individual components.

For a more precise statement and a thorough discussion of this theorem, you may refer to [1-3]. The electrical noise in a communication system is often due to the cumulative effect of a large number of randomly moving charged particles, each particle making an independent contribution of the same amount to the total. Hence the instantaneous value of the noise can be fairly adequately modeled as a Gaussian variable. We shall develop Gaussian random processes in detail in Chapter 3 and, in Chapter 7, we shall make use of this theory in our studies of the noise performance of various modulation schemes.
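The theorem is easy to see empirically. In the sketch below (the component count and distribution are our choices), the normalized sum of 30 iid uniform components already shows a Gaussian-like tail:

    import numpy as np

    # Central limit theorem demo: normalized sum of N iid uniform RVs.
    rng = np.random.default_rng(3)
    N, trials = 30, 200_000
    u = rng.uniform(-0.5, 0.5, size=(trials, N))   # each: mean 0, variance 1/12
    x = u.sum(axis=1) / np.sqrt(N / 12)            # zero mean, unit variance

    print(np.mean(x >= 2.0))   # ~0.0228, close to the Gaussian tail Q(2)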

Example 2.23

A random variable $Y$ is said to have a log-normal PDF if $X = \ln Y$ has a Gaussian (normal) PDF. Let $Y$ have the PDF $f_Y(y)$ given by
$$f_Y(y) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\beta\, y} \exp\left[-\dfrac{(\ln y - \alpha)^2}{2\beta^2}\right], & y \ge 0 \\[4pt] 0, & \text{otherwise} \end{cases}$$
where $\alpha$ and $\beta$ are given constants.

a) Show that $Y$ is log-normal.

b) Find $E(Y)$.

c) If $m$ is such that $F_Y(m) = 0.5$, find $m$.

a) Let $X = \ln Y$, or $x = \ln y$. (Note that the transformation is one-to-one.) Then
$$\frac{dx}{dy} = \frac{1}{y} \quad \Rightarrow \quad |J| = \frac{1}{y}$$
Also, as $y \to 0$, $x \to -\infty$, and as $y \to \infty$, $x \to \infty$. Hence
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\beta} \exp\left[-\frac{(x - \alpha)^2}{2\beta^2}\right], \quad -\infty < x < \infty$$
That is, $X$ is $N(\alpha, \beta^2)$.

b)
$$\overline{Y} = E\left[e^X\right] = \int_{-\infty}^{\infty} e^x\, \frac{1}{\sqrt{2\pi}\,\beta}\, \exp\left[-\frac{(x - \alpha)^2}{2\beta^2}\right] dx$$
Completing the square in the exponent,
$$\overline{Y} = e^{\alpha + \frac{\beta^2}{2}} \left[\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\beta}\, \exp\left\{-\frac{\left[x - (\alpha + \beta^2)\right]^2}{2\beta^2}\right\} dx\right]$$
As the bracketed quantity, being the integral of a Gaussian PDF over $(-\infty, \infty)$, is 1, we have
$$\overline{Y} = e^{\alpha + \frac{\beta^2}{2}}$$

c) $P[Y \le m] = P[X \le \ln m]$. Hence, if $P[Y \le m] = 0.5$, then $P[X \le \ln m] = 0.5$. That is, $\ln m = \alpha$, or $m = e^{\alpha}$.
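Both results can be checked by simulation for any particular choice of the constants; here $\alpha = 0.3$ and $\beta = 0.8$ (our choice):

    import numpy as np

    # Monte Carlo check of Example 2.23.
    rng = np.random.default_rng(4)
    alpha, beta = 0.3, 0.8
    y = np.exp(rng.normal(alpha, beta, size=1_000_000))

    print(y.mean(), np.exp(alpha + beta**2 / 2))   # part (b): ~1.859
    print(np.median(y), np.exp(alpha))             # part (c): ~1.350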

iv) Bivariate Gaussian

As an example of a two-dimensional density, we will consider the bivariate Gaussian PDF, $f_{X,Y}(x, y)$, $-\infty < x, y < \infty$, given by
$$f_{X,Y}(x, y) = \frac{1}{k_1} \exp\left\{ -\frac{1}{k_2} \left[ \frac{(x - m_X)^2}{\sigma_X^2} - 2\rho\, \frac{(x - m_X)(y - m_Y)}{\sigma_X \sigma_Y} + \frac{(y - m_Y)^2}{\sigma_Y^2} \right] \right\} \qquad (2.58)$$
where
$$k_1 = 2\pi\, \sigma_X \sigma_Y \sqrt{1 - \rho^2}, \qquad k_2 = 2\left(1 - \rho^2\right)$$
and $\rho$ is the correlation coefficient between $X$ and $Y$.

The following properties of the bivariate Gaussian density can be verified:

P1) If $X$ and $Y$ are jointly Gaussian, then the marginal density of $X$ or $Y$ is Gaussian; that is, $X$ is $N(m_X, \sigma_X^2)$ and $Y$ is $N(m_Y, \sigma_Y^2)$. (Note that the converse is not necessarily true: if the marginals $f_X$ and $f_Y$ obtained from $f_{X,Y}$ are Gaussian, this does not imply that $f_{X,Y}$ is jointly Gaussian, unless $X$ and $Y$ are independent. We can construct examples of a joint PDF $f_{X,Y}$ which is not jointly Gaussian but results in marginals $f_X$ and $f_Y$ that are Gaussian.)

P2) $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$ iff $\rho = 0$. That is, if the Gaussian variables are uncorrelated, then they are independent. That is not true, in general, with respect to non-Gaussian variables (we have already seen an example of this in Sec. 2.5.2).

P3) If $Z = \alpha X + \beta Y$, where $\alpha$ and $\beta$ are constants and $X$ and $Y$ are jointly Gaussian, then $Z$ is Gaussian. Therefore $f_Z(z)$ can be written after computing $m_Z$ and $\sigma_Z^2$ with the help of the formulae given in Sec. 2.5.

Figure 2.22 gives the plot of a bivariate Gaussian PDF for the case of $\rho = 0$ and $\sigma_X = \sigma_Y$.


Fig. 2.22: Bivariate Gaussian PDF ($\sigma_X = \sigma_Y$ and $\rho = 0$)

For $\rho = 0$ and $\sigma_X = \sigma_Y$, $f_{X,Y}$ resembles a (temple) bell, with, of course, the striker missing! For $\rho \ne 0$, we have two cases: (i) $\rho$ positive and (ii) $\rho$ negative. If $\rho > 0$, imagine the bell being compressed along the $X = -Y$ axis so that it elongates along the $X = Y$ axis. Similarly for $\rho < 0$.

Example 2.24

Let $X$ and $Y$ be jointly Gaussian with $\overline{X} = 1$, $\overline{Y} = -1$, $\sigma_X^2 = \sigma_Y^2 = 1$ and $\rho_{XY} = -\dfrac{1}{2}$. Let us find the probability of $(X, Y)$ lying in the shaded region $D$ shown in Fig. 2.23.

Fig. 2.23: The region $D$ of Example 2.24

Let $A$ be the shaded region shown in Fig. 2.24(a) and $B$ be the shaded region in Fig. 2.24(b).

Fig. 2.24: (a) Region $A$ and (b) region $B$ used to obtain region $D$

The required probability is $P[(x, y) \in A] - P[(x, y) \in B]$. For the region $A$ we have $y \ge -\dfrac{x}{2} + 1$, and for the region $B$ we have $y \ge -\dfrac{x}{2} + 2$. Hence the required probability is
$$P\left[Y + \frac{X}{2} \ge 1\right] - P\left[Y + \frac{X}{2} \ge 2\right]$$
Let $Z = Y + \dfrac{X}{2}$. Then $Z$ is Gaussian with the parameters
$$\overline{Z} = \overline{Y} + \frac{1}{2}\,\overline{X} = -\frac{1}{2}$$
$$\sigma_Z^2 = \sigma_Y^2 + \frac{1}{4}\,\sigma_X^2 + 2 \cdot \frac{1}{2}\, \rho_{XY}\, \sigma_X \sigma_Y = 1 + \frac{1}{4} - \frac{1}{2} = \frac{3}{4}$$
That is, $Z$ is $N\left(-\dfrac{1}{2}, \dfrac{3}{4}\right)$, and $W = \dfrac{Z + \frac{1}{2}}{\sqrt{3/4}}$ is $N(0, 1)$. Then
$$P[Z \ge 1] = P\left[W \ge \sqrt{3}\right], \qquad P[Z \ge 2] = P\left[W \ge \frac{5}{\sqrt{3}}\right]$$
Hence the required probability is
$$Q\left(\sqrt{3}\right) - Q\left(\frac{5}{\sqrt{3}}\right) \simeq 0.0416 - 0.0019 \simeq 0.04$$
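A Monte Carlo check (a sketch; the sample size is arbitrary) agrees:

    import numpy as np

    # Example 2.24: the region D is 1 <= Y + X/2 < 2 (region A minus B).
    rng = np.random.default_rng(5)
    mean = [1.0, -1.0]                        # X-bar = 1, Y-bar = -1
    cov = [[1.0, -0.5], [-0.5, 1.0]]          # unit variances, rho = -1/2
    x, y = rng.multivariate_normal(mean, cov, size=2_000_000).T

    z = y + x / 2
    print(np.mean((z >= 1.0) & (z < 2.0)))    # ~0.04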


Exercise 2.10

$X$ and $Y$ are independent, identically distributed (iid) random variables, each being $N(0, 1)$. Find the probability of $(X, Y)$ lying in the region $A$ shown in Fig. 2.25.

Fig. 2.25: Region $A$ of Exercise 2.10

Note: It would be easier to calculate this kind of probability if the region were aligned with a product space. From Example 2.12, we feel that if we transform $(X, Y)$ into $(Z, W)$ such that $Z = X + Y$ and $W = X - Y$, then the transformed region $B$ would be a square. Find $f_{Z,W}(z, w)$ and compute the probability $P[(Z, W) \in B]$.


Exercise 2.11

Two random variables $X$ and $Y$ are obtained by means of the transformation given below:
$$X = \left(-2 \log_e U_1\right)^{1/2} \cos(2\pi U_2) \qquad (2.59a)$$
$$Y = \left(-2 \log_e U_1\right)^{1/2} \sin(2\pi U_2) \qquad (2.59b)$$
Here $U_1$ and $U_2$ are independent random variables, uniformly distributed in the range $0 < u_1, u_2 < 1$. Show that $X$ and $Y$ are independent and each is $N(0, 1)$.

Hint: Let $Y_1 = \left(-2 \log_e U_1\right)^{1/2}$. Show that $Y_1$ is Rayleigh. Find $f_{X,Y}(x, y)$ using $X = Y_1 \cos\Theta$ and $Y = Y_1 \sin\Theta$, where $\Theta = 2\pi U_2$.

Note: The transformation given by Eq. 2.59 is called the Box-Muller transformation; it can be used to generate two Gaussian random number sequences from two independent uniformly distributed (in the range 0 to 1) sequences.
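A direct implementation of Eq. 2.59 (a sketch) shows the behaviour claimed in the exercise:

    import numpy as np

    # Box-Muller transformation: two uniform sequences in, two N(0, 1) out.
    rng = np.random.default_rng(6)
    u1 = rng.uniform(size=500_000)
    u2 = rng.uniform(size=500_000)

    r = np.sqrt(-2.0 * np.log(u1))       # the Rayleigh radius of the hint
    x = r * np.cos(2 * np.pi * u2)
    y = r * np.sin(2 * np.pi * u2)

    print(x.mean(), x.std(), y.std())    # ~0, ~1, ~1
    print(np.corrcoef(x, y)[0, 1])       # ~0: consistent with independence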


Appendix A2.1

Proof of Eq. 2.34

The proof of Eq. 2.34 depends on establishing a relationship between the differential area $dz\, dw$ in the $z$-$w$ plane and the differential area $dx\, dy$ in the $x$-$y$ plane. We know that
$$f_{Z,W}(z, w)\, dz\, dw = P[z < Z \le z + dz,\ w < W \le w + dw]$$
If we can find $dx\, dy$ such that
$$f_{Z,W}(z, w)\, dz\, dw = f_{X,Y}(x, y)\, dx\, dy,$$
then $f_{Z,W}$ can be found. (Note that the variables $x$ and $y$ can be replaced by their inverse transformation quantities, namely $x = g^{-1}(z, w)$ and $y = h^{-1}(z, w)$.)

Let the transformation be one-to-one. (This can be generalized to the case of one-to-many.) Consider the mapping shown in Fig. A2.1.

Fig. A2.1: A typical transformation between the $x$-$y$ plane and the $z$-$w$ plane

The infinitesimal rectangle $ABCD$ in the $z$-$w$ plane is mapped into the parallelogram in the $x$-$y$ plane. (We may assume that the vertex $A$ transforms to $A'$, $B$ to $B'$, etc.) We shall now find the relation between the differential area of the rectangle and the differential area of the parallelogram.


Consider the parallelogram shown in Fig. A2.2, with vertices $P_1$, $P_2$, $P_3$ and $P_4$.

Fig. A2.2: Typical parallelogram

Let $(x, y)$ be the coordinates of $P_1$. Then $P_2$ and $P_3$ are given by
$$P_2 = \left(x + \frac{\partial x}{\partial z}\, dz,\ \ y + \frac{\partial y}{\partial z}\, dz\right)$$
$$P_3 = \left(x + \frac{\partial x}{\partial w}\, dw,\ \ y + \frac{\partial y}{\partial w}\, dw\right)$$
Consider the vectors $\mathbf{V}_1$ and $\mathbf{V}_2$ shown in Fig. A2.2, where $\mathbf{V}_1 = P_2 - P_1$ and $\mathbf{V}_2 = P_3 - P_1$. That is,
$$\mathbf{V}_1 = \frac{\partial x}{\partial z}\, dz\, \mathbf{i} + \frac{\partial y}{\partial z}\, dz\, \mathbf{j}$$
$$\mathbf{V}_2 = \frac{\partial x}{\partial w}\, dw\, \mathbf{i} + \frac{\partial y}{\partial w}\, dw\, \mathbf{j}$$
where $\mathbf{i}$ and $\mathbf{j}$ are the unit vectors in the appropriate directions. Then the area $A$ of the parallelogram is
$$A = \left|\mathbf{V}_1 \times \mathbf{V}_2\right|$$
As $\mathbf{i} \times \mathbf{i} = \mathbf{j} \times \mathbf{j} = 0$ and $\mathbf{i} \times \mathbf{j} = -(\mathbf{j} \times \mathbf{i}) = \mathbf{k}$, where $\mathbf{k}$ is the unit vector perpendicular to both $\mathbf{i}$ and $\mathbf{j}$, we have
$$\mathbf{V}_1 \times \mathbf{V}_2 = \left(\frac{\partial x}{\partial z}\frac{\partial y}{\partial w} - \frac{\partial y}{\partial z}\frac{\partial x}{\partial w}\right) dz\, dw\ \mathbf{k}$$
That is,
$$A = \left|J\left(\frac{x,\, y}{z,\, w}\right)\right| dz\, dw$$
Hence,
$$f_{Z,W}(z, w) = f_{X,Y}(x, y)\left|J\left(\frac{x,\, y}{z,\, w}\right)\right| = \frac{f_{X,Y}(x, y)}{\left|J\left(\frac{z,\, w}{x,\, y}\right)\right|}$$


Appendix A2.2

Q Function Table

$$Q(\alpha) = \int_{\alpha}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx$$

It is sufficient if we know $Q(\alpha)$ for $\alpha \ge 0$, because $Q(-\alpha) = 1 - Q(\alpha)$. Note that $Q(0) = 0.5$.

 y     Q(y)       y     Q(y)       y     Q(y)
 0.05  0.4801     1.05  0.1469     2.10  0.0179
 0.10  0.4602     1.10  0.1357     2.20  0.0139
 0.15  0.4405     1.15  0.1251     2.30  0.0107
 0.20  0.4207     1.20  0.1151     2.40  0.0082
 0.25  0.4013     1.25  0.1056     2.50  0.0062
 0.30  0.3821     1.30  0.0968     2.60  0.0047
 0.35  0.3632     1.35  0.0885     2.70  0.0035
 0.40  0.3446     1.40  0.0808     2.80  0.0026
 0.45  0.3264     1.45  0.0735     2.90  0.0019
 0.50  0.3085     1.50  0.0668     3.00  0.0013
 0.55  0.2912     1.55  0.0606     3.10  0.0010
 0.60  0.2743     1.60  0.0548     3.20  0.00069
 0.65  0.2578     1.65  0.0495     3.30  0.00048
 0.70  0.2420     1.70  0.0446     3.40  0.00034
 0.75  0.2266     1.75  0.0401     3.50  0.00023
 0.80  0.2119     1.80  0.0359     3.60  0.00016
 0.85  0.1977     1.85  0.0322     3.70  0.00010
 0.90  0.1841     1.90  0.0287     3.80  0.00007
 0.95  0.1711     1.95  0.0256     3.90  0.00005
 1.00  0.1587     2.00  0.0228     4.00  0.00003

Values of $y$ for a few round values of $Q(y)$:

 Q(y)            y
 10^-3           3.10
 (1/2) 10^-3     3.28
 10^-4           3.70
 (1/2) 10^-4     3.90
 10^-5           4.27
 10^-6           4.78


Note that some authors use $\operatorname{erfc}(\cdot)$, the complementary error function, which is given by
$$\operatorname{erfc}(\alpha) = 1 - \operatorname{erf}(\alpha) = \frac{2}{\sqrt{\pi}} \int_{\alpha}^{\infty} e^{-\beta^2}\, d\beta$$
where the error function is
$$\operatorname{erf}(\alpha) = \frac{2}{\sqrt{\pi}} \int_0^{\alpha} e^{-\beta^2}\, d\beta$$
Hence
$$Q(\alpha) = \frac{1}{2}\operatorname{erfc}\left(\frac{\alpha}{\sqrt{2}}\right)$$


Appendix A2.3

Proof that $N(m_X, \sigma_X^2)$ is a valid PDF

We will show that $f_X(x)$, as given by Eq. 2.56, is a valid PDF by establishing $\int_{-\infty}^{\infty} f_X(x)\, dx = 1$. (Note that $f_X(x) \ge 0$ for $-\infty < x < \infty$.)

Let
$$I = \int_{-\infty}^{\infty} e^{-v^2/2}\, dv = \int_{-\infty}^{\infty} e^{-y^2/2}\, dy$$
Then,
$$I^2 = \left[\int_{-\infty}^{\infty} e^{-v^2/2}\, dv\right] \left[\int_{-\infty}^{\infty} e^{-y^2/2}\, dy\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac{v^2 + y^2}{2}}\, dv\, dy$$
Let $v = r\cos\theta$ and $y = r\sin\theta$. Then $r^2 = v^2 + y^2$, $\theta = \tan^{-1}\left(\dfrac{y}{v}\right)$ and $dv\, dy = r\, dr\, d\theta$ (Cartesian-to-polar coordinate transformation). Then
$$I^2 = \int_0^{2\pi}\int_0^{\infty} e^{-r^2/2}\, r\, dr\, d\theta = 2\pi, \quad \text{or} \quad I = \sqrt{2\pi}$$
That is,
$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-v^2/2}\, dv = 1 \qquad (A2.3.1)$$
Let
$$v = \frac{x - m_X}{\sigma_X} \qquad (A2.3.2)$$
Then
$$dv = \frac{dx}{\sigma_X} \qquad (A2.3.3)$$
Using Eq. A2.3.2 and Eq. A2.3.3 in Eq. A2.3.1, we have the required result.


References

1) Papoulis, A., 'Probability, Random Variables and Stochastic Processes', McGraw-Hill, 3rd edition, 1991.

2) Wozencraft, J. M. and Jacobs, I. M., 'Principles of Communication Engineering', John Wiley, 1965.

3) Hogg, R. V. and Craig, A. T., 'Introduction to Mathematical Statistics', The Macmillan Co., 2nd edition, 1965.