
Probability and Random Processes for

Electrical and Computer Engineers

John A. Gubner

University of Wisconsin–Madison

Discrete Random Variables

Bernoulli(p)
P(X = 1) = p, P(X = 0) = 1 − p.
E[X] = p, var(X) = p(1 − p), G_X(z) = (1 − p) + pz.

binomial(n, p)
P(X = k) = (n choose k) p^k (1 − p)^(n−k), k = 0, …, n.
E[X] = np, var(X) = np(1 − p), G_X(z) = [(1 − p) + pz]^n.

geometric_0(p)
P(X = k) = (1 − p)p^k, k = 0, 1, 2, ….
E[X] = p/(1 − p), var(X) = p/(1 − p)^2, G_X(z) = (1 − p)/(1 − pz).

geometric_1(p)
P(X = k) = (1 − p)p^(k−1), k = 1, 2, 3, ….
E[X] = 1/(1 − p), var(X) = p/(1 − p)^2, G_X(z) = (1 − p)z/(1 − pz).

negative binomial or Pascal(m, p)
P(X = k) = (k − 1 choose m − 1) (1 − p)^m p^(k−m), k = m, m + 1, ….
E[X] = m/(1 − p), var(X) = mp/(1 − p)^2, G_X(z) = [(1 − p)z/(1 − pz)]^m.
Note that Pascal(1, p) is the same as geometric_1(p).

Poisson(λ)
P(X = k) = λ^k e^(−λ)/k!, k = 0, 1, ….
E[X] = λ, var(X) = λ, G_X(z) = e^(λ(z−1)).
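These formulas are easy to sanity-check numerically. The short Matlab sketch below is an added illustration (the value of p and the truncation point K are arbitrary choices); it verifies the geometric_0(p) mean and variance by truncated summation.

    % Check E[X] = p/(1-p) and var(X) = p/(1-p)^2 for geometric0(p).
    p = 0.3;  K = 500;  k = 0:K;
    pmf  = (1-p)*p.^k;                 % P(X = k) = (1-p)p^k, truncated at K
    EX   = sum(k.*pmf)                 % close to p/(1-p) = 0.4286
    VARX = sum(k.^2.*pmf) - EX^2       % close to p/(1-p)^2 = 0.6122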

Fourier Transforms*

Fourier Transform:
H(f) = ∫_{−∞}^{∞} h(t) e^(−j2πft) dt

Inversion Formula:
h(t) = ∫_{−∞}^{∞} H(f) e^(j2πft) df

Transform pairs h(t) <-> H(f):

I_[−T,T](t)  <->  2T sin(2πTf)/(2πTf)
2W sin(2πWt)/(2πWt)  <->  I_[−W,W](f)
(1 − |t|/T) I_[−T,T](t)  <->  T [sin(πTf)/(πTf)]^2
W [sin(πWt)/(πWt)]^2  <->  (1 − |f|/W) I_[−W,W](f)
e^(−λt) u(t)  <->  1/(λ + j2πf)
e^(−λ|t|)  <->  2λ/(λ^2 + (2πf)^2)
2λ/(λ^2 + t^2)  <->  2π e^(−2πλ|f|)
e^(−(t/σ)^2/2)  <->  √(2π) σ e^(−σ^2 (2πf)^2/2)

*The indicator function I_[a,b](t) := 1 for a ≤ t ≤ b and I_[a,b](t) := 0 otherwise. In
particular, u(t) := I_[0,∞)(t) is the unit step function.
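As an added numerical illustration (not part of the original table), the first transform pair can be checked by approximating the defining integral with a trapezoidal sum; the frequencies and grid below are arbitrary choices.

    % Approximate H(f) for h(t) = I_[-T,T](t) and compare with 2T sin(2*pi*T*f)/(2*pi*T*f).
    T = 1;  f = [0.2 0.5 0.8];
    t = linspace(-T, T, 20001);
    Hnum = zeros(size(f));
    for i = 1:length(f)
        Hnum(i) = trapz(t, exp(-1j*2*pi*f(i)*t));   % integral of h(t) e^{-j 2 pi f t}
    end
    Hclosed = 2*T*sin(2*pi*T*f)./(2*pi*T*f);
    max(abs(Hnum - Hclosed))                        % small (numerical error only)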

Preface

Intended Audience

This book contains enough material to serve as a text for a two-course sequence in probability and random processes for electrical and computer engineers. It is also useful as a reference for practicing engineers.

The material for a first course can be offered at either the undergraduate or graduate level. The prerequisite is the usual undergraduate electrical and computer engineering course on signals and systems, e.g., Haykin and Van Veen [22] or Oppenheim and Willsky [36] (see the Bibliography at the end of the book).

The material for a second course can be offered at the graduate level. The additional prerequisite is greater mathematical maturity and some familiarity with linear algebra; e.g., determinants and matrix inverses.

Material for a First Course

In a first course, Chapters 1–4 and 6 would make up the core of any offering. These chapters cover the basics of probability and discrete and continuous random variables. The following additional topics are also appropriate for a first course.

• Chapter 5, Statistics, can be studied any time after Chapter 4. This chapter covers parameter estimation and confidence intervals, histograms, and Monte Carlo estimation. Numerous Matlab scripts and problems can be found here. This is a stand-alone chapter, whose material is not used elsewhere in the text, with the exception of Problem 12 in Chapter 9 and Problem 6 in Chapter 13.

• Chapter 7, Introduction to Random Processes (Sections 7.1–7.7), can be studied any time after Chapter 6. (In a more advanced course that is covering Chapter 8, Random Vectors, the ordering of Chapters 7 and 8 can be reversed without any difficulty.)*

• Section 9.1, The Poisson Process, can be covered any time after Chapter 3.

• Sections 10.1–10.2 on discrete-time Markov chains can be covered any time after Chapter 2. (Section 10.3 on continuous-time Markov chains requires material from Chapters 4 and 9.)

As noted above, a first course can be offered at the undergraduate or the graduate level. In the aforementioned material, sections/remarks/problems marked with a star (*) and Chapter Notes indicated by a numerical superscript in the text are directed at graduate students. These items are easily omitted in an undergraduate course.

*Logically, the material on random vectors in Chapter 8 could be incorporated into Chapter 6, and the material in Chapter 7 could be incorporated into Chapter 9. However, the present arrangement allows greater flexibility in course structure.


Material for a Second Course

The core material for a second course would consist of the following.

• Chapter 8, Random Vectors, emphasizes the finite-dimensional Karhunen–Loève expansion, nonlinear transformations of random vectors, Gaussian random vectors, and linear and nonlinear minimum mean squared error estimation. There is also an introduction to complex random variables and vectors, especially the circularly symmetric Gaussian due to its importance in wireless communications.

• Chapter 9, Advanced Concepts in Random Processes, begins with the Poisson, renewal, and Wiener processes. The general question of the existence of processes with specified finite-dimensional distributions is addressed using Kolmogorov's theorem.

• Chapter 11, Mean Convergence and Applications, covers convergence and continuity in mean of order p. Emphasis is on the vector-space structure of random variables with finite pth moment. Applications include the continuous-time Karhunen–Loève expansion, the Wiener process, and the spectral representation. The orthogonality principle and projection theorem are also derived and used to develop conditional expectation and probability in the general case; e.g., when two random variables are individually continuous but not jointly continuous.

• Chapter 12, Other Modes of Convergence, covers convergence in probability, convergence in distribution, and almost-sure convergence. Applications include showing that mean-square integrals of Gaussian processes are Gaussian, the strong and weak laws of large numbers, and the central limit theorem.

Additional material that may be included depending on student preparation and course objectives:

• Some instructors may wish to start a second course by covering some of the starred sections and some of the Notes from Chapters 1–4 and 6 and assigning some of the starred problems.

• If students are not familiar with the material in Chapter 7, it can be covered either immediately before or after Chapter 8.

• Since Chapter 10 on Markov chains is not needed for Chapters 11–13, it can be covered any time after Chapter 9.

• If Chapter 5, Statistics, is covered after Chapter 12, then attention can be focused on the derivations and their technical details, which require facts about the multivariate Gaussian, the strong law of large numbers, and Slutsky's Theorem.

• Chapter 13, Self Similarity and Long-Range Dependence, is provided to introduce students to these concepts because they arise in network traffic modeling. Fractional Brownian motion and fractional autoregressive moving average processes are developed as prominent examples exhibiting self similarity and long-range dependence.

Chapter Features

• Each chapter begins with a Chapter Outline listing the main sections and subsections.

• Key equations are boxed:

P(A|B) := P(A ∩ B)/P(B).

• Important text passages are highlighted:

Two events A and B are said to be independent if P(A ∩ B) = P(A)P(B).

• Tables of discrete random variables and of Fourier transform pairs are found inside the front cover. A table of continuous random variables is found inside the back cover.

• The index was compiled as the book was being written. Hence, there are many cross-references to related information. For example, see "chi-squared random variable."

• When cdfs or other functions are encountered that do not have a closed form, Matlab commands are given for computing them. See "Matlab commands" in the index.

• Each chapter contains a Notes section. Throughout each chapter, numerical superscripts refer to discussions in the Notes section. These notes are usually rather technical and address subtleties of the theory that are important for students in a graduate course. The notes can be skipped in an undergraduate course.

• Each chapter contains a Problems section. Problems are grouped according to the section they are based on, and this is clearly indicated. This enables the student to refer to the appropriate part of the text for background relating to particular problems, and it enables the instructor to make up assignments more quickly. In chapters intended for a first course, problems marked with a * are more challenging, and may be skipped in an undergraduate offering.

• Each chapter contains an Exam Preparation section. This serves as a chapter summary, drawing attention to key concepts and formulas.


Table of Contents

1 Introduction to Probability 1

    Why Do Electrical and Computer Engineers Need to Study Probability?  1
    Relative Frequency  3
    What Is Probability Theory?  5
    Features of Each Chapter  6
1.1 Review of Set Notation  6
    Set Operations  6
    Set Identities  8
    Partitions  10
    *Functions  11
    *Countable and Uncountable Sets  13
1.2 Probability Models  15
1.3 Axioms and Properties of Probability  21
    Consequences of the Axioms  22
1.4 Conditional Probability  25
    The Law of Total Probability and Bayes' Rule  26
1.5 Independence  29
    Independence for More Than Two Events  31
1.6 Combinatorics and Probability  33
    Ordered Sampling with Replacement  34
    Ordered Sampling without Replacement  34
    Unordered Sampling without Replacement  36
    Unordered Sampling with Replacement  41
Notes  42
Problems  46
Exam Preparation  59

2 Discrete Random Variables 60

2.1 Probabilities Involving Random Variables  61
    Discrete Random Variables  64
    Integer-Valued Random Variables  65
    Pairs of Random Variables  67
    Multiple Independent Random Variables  68
    Probability Mass Functions  71
2.2 Expectation  75
    Expectation of Functions of Random Variables, or the Law of the Unconscious Statistician (LOTUS)  77
    Linearity of Expectation  79
    Moments  79
    The Markov and Chebyshev Inequalities  81
    Expectations of Products of Functions of Independent Random Variables  83
    Uncorrelated Random Variables  84
2.3 Probability Generating Functions  85
2.4 The Binomial Random Variable  89
    Poisson Approximation of Binomial Probabilities  92
2.5 The Weak Law of Large Numbers  92
    Conditions for the Weak Law  93
2.6 Conditional Probability  94
    The Law of Total Probability  97
    The Substitution Law  99
    Binary Channel Receiver Design  102
2.7 Conditional Expectation  104
    Substitution Law for Conditional Expectation  105
    Law of Total Probability for Expectation  105
Notes  107
Problems  111
Exam Preparation  121

3 Continuous Random Variables 123

3.1 Densities and Probabilities  123
    Location and Scale Parameters and the Gamma Densities  130
    The Paradox of Continuous Random Variables  133
3.2 Expectation of a Single Random Variable  133
3.3 Transform Methods  138
    Moment Generating Functions  139
    Characteristic Functions  142
    Why So Many Transforms?  144
3.4 Expectation of Multiple Random Variables  145
3.5 *Probability Bounds  147
Notes  150
Problems  151
Exam Preparation  163

4 Cumulative Distribution Functions and Their Applications 164

4.1 Continuous Random Variables  165
    Receiver Design for Discrete Signals in Continuous Noise  171
    Simulation  172
4.2 Discrete Random Variables  173
    Simulation  175
4.3 Mixed Random Variables  176
4.4 Functions of Random Variables and Their Cdfs  179
4.5 Properties of Cdfs  184
4.6 The Central Limit Theorem  187
4.7 Reliability  193
Notes  196
Problems  199
Exam Preparation  216

5 Statistics 217

5.1 Parameter Estimators and Their Properties  218
5.2 Confidence Intervals for the Mean – Known Variance  220
5.3 Confidence Intervals for the Mean – Unknown Variance  224
    Applications  225
    Sampling with and without Replacement  226
5.4 Confidence Intervals for Other Parameters  227
5.5 Histograms  228
    The Chi-Squared Test  232
5.6 Monte Carlo Estimation  235
5.7 Confidence Intervals for Gaussian Data  237
    Estimating the Mean  237
    Limiting t Distribution  239
    Estimating the Variance – Known Mean  239
    Estimating the Variance – Unknown Mean  241
Notes  244
Problems  246
Exam Preparation  254

6 Bivariate Random Variables 255

6.1 Joint and Marginal Probabilities  255
    Product Sets and Marginal Probabilities  258
    Joint Cumulative Distribution Functions  259
    Marginal Cumulative Distribution Functions  261
    Independent Random Variables  263
6.2 Jointly Continuous Random Variables  264
    Marginal Densities  267
    Independence  269
    Expectation  270
    *Continuous Random Variables that are not Jointly Continuous  271
6.3 Conditional Probability and Expectation  272
6.4 The Bivariate Normal  279
6.5 Extension to Three or More Random Variables  283
    The Law of Total Probability  285
Notes  287
Problems  288
Exam Preparation  296


7 Introduction to Random Processes 298

7.1 Characterization of Random Processes  301
    Mean and Correlation Functions  302
    Cross-Correlation Functions  304
7.2 Strict-Sense and Wide-Sense Stationary Processes  305
7.3 WSS Processes through LTI Systems  308
    Time-Domain Analysis  309
    Frequency-Domain Analysis  310
7.4 Power Spectral Densities for WSS Processes  313
    Power in a Process  313
    White Noise  315
7.5 Characterization of Correlation Functions  318
7.6 The Matched Filter  320
7.7 The Wiener Filter  322
    *Causal Wiener Filters  324
7.8 *The Wiener–Khinchin Theorem  327
7.9 *Mean-Square Ergodic Theorem for WSS Processes  329
7.10 *Power Spectral Densities for non-WSS Processes  331
Notes  333
Problems  335
Exam Preparation  345

8 Random Vectors 348

8.1 Mean Vector, Covariance Matrix, and Characteristic Function  349
    Mean and Covariance  349
    Decorrelation, Data Reduction, and the Karhunen–Loève Expansion  351
    Characteristic Function  353
8.2 Transformations of Random Vectors  354
8.3 The Multivariate Gaussian  357
    The Characteristic Function of a Gaussian Random Vector  358
    For Gaussian Random Vectors Uncorrelated Implies Independent  358
    The Density Function of a Gaussian Random Vector  360
8.4 Estimation of Random Vectors  363
    Linear Minimum Mean Squared Error Estimation  363
    Minimum Mean Squared Error Estimation  367
8.5 Complex Random Variables and Vectors  368
    Complex Gaussian Random Vectors  370
Notes  371
Problems  372
Exam Preparation  381


9 Advanced Concepts in Random Processes 384

9.1 The Poisson Process  384
    *Derivation of the Poisson Probabilities  388
    Marked Poisson Processes  391
    Shot Noise  391
9.2 Renewal Processes  392
9.3 The Wiener Process  393
    Integrated-White-Noise Interpretation of the Wiener Process  395
    The Problem with White Noise  396
    The Wiener Integral  397
    Random Walk Approximation of the Wiener Process  398
9.4 Specification of Random Processes  402
    Finitely Many Random Variables  402
    Infinite Sequences (Discrete Time)  403
    Continuous-Time Random Processes  406
    Gaussian Processes  407
Notes  408
Problems  408
Exam Preparation  416

10 Introduction to Markov Chains 417

10.1 Discrete-Time Markov Chains  418
    State Space and Transition Probabilities  420
    Examples  421
    Stationary Distributions  423
    Derivation of the Chapman–Kolmogorov Equation  427
    Stationarity of the n-Step Transition Probabilities  428
10.2 Limit Distributions  429
    Examples When Limit Distributions Do Not Exist  430
    Classification of States  431
    Other Characterizations of Transience and Recurrence  434
    Classes of States  435
10.3 Continuous-Time Markov Chains  438
    Behavior of Continuous-Time Markov Chains  440
    Kolmogorov's Differential Equations  442
    Stationary Distributions  443
    Derivation of the Backward Equation  444
Problems  446
Exam Preparation  450

11 Mean Convergence and Applications 451

11.1 Convergence in Mean of Order p  452
    Continuity in Mean of Order p  456
11.2 Normed Vector Spaces of Random Variables  457
    Mean-Square Integrals  461
11.3 The Karhunen–Loève Expansion  462
11.4 The Wiener Integral (Again)  468
11.5 Projections, Orthogonality Principle, Projection Theorem  469
11.6 Conditional Expectation and Probability  473
11.7 The Spectral Representation  480
Notes  484
Problems  485
Exam Preparation  497

12 Other Modes of Convergence 499

12.1 Convergence in Probability  500
12.2 Convergence in Distribution  502
12.3 Almost Sure Convergence  508
    The Skorohod Representation  514
Notes  515
Problems  516
Exam Preparation  525

13 Self Similarity and Long-Range Dependence 527

13.1 Self Similarity in Continuous Time  528
    Implications of Self Similarity  528
    Stationary Increments  529
    Fractional Brownian Motion  530
13.2 Self Similarity in Discrete Time  532
    Convergence Rates for the Mean-Square Ergodic Theorem  533
    Aggregation  534
    The Power Spectral Density  535
    Engineering vs. Statistics/Networking Notation  537
13.3 Asymptotic Second-Order Self Similarity  538
13.4 Long-Range Dependence  541
13.5 ARMA Processes  543
13.6 ARIMA Processes  545
Problems  547
Exam Preparation  551

Bibliography 553

Index 556


CHAPTER 1

Introduction to Probability

Chapter Outline

Why Do Electrical and Computer Engineers Need to Study Probability?
Relative Frequency
What Is Probability Theory?
Features of Each Chapter
1.1. Review of Set Notation
    Set Operations, Set Identities, Partitions, *Functions, *Countable and Uncountable Sets
1.2. Probability Models
1.3. Axioms and Properties of Probability
    Consequences of the Axioms
1.4. Conditional Probability
    The Law of Total Probability and Bayes' Rule
1.5. Independence
    Independence for More Than Two Events
1.6. Combinatorics and Probability
    Ordered Sampling with Replacement, Ordered Sampling without Replacement, Unordered Sampling without Replacement, Unordered Sampling with Replacement

Why Do Electrical and Computer Engineers Need to Study Probability?

Probability theory provides powerful tools to explain, model, analyze, and design technology developed by electrical and computer engineers. Here are a few applications.

Signal Processing. My own interest in the subject arose when I was an undergraduate taking the required course in probability for electrical engineers. We were introduced to the following problem of detecting a known signal in additive noise. Consider an air-traffic control system that sends out a known radar pulse. If there are no objects in range of the radar, the system returns only a noise waveform. If there is an object in range, the system returns the reflected radar pulse plus noise. The overall goal is to design a system that decides whether the received waveform is noise only or signal plus noise. Our class addressed the subproblem of designing an optimal, linear, time-invariant system to process the received waveform so as to amplify the signal and suppress the noise

*Sections marked with a * can be omitted in an introductory course.


by maximizing the signal-to-noise ratio. We learned that the optimal transfer function is given by the matched filter. You will study this in Chapter 7.

Computer Memories. Suppose you are designing a computer memory to hold k-bit words. To increase system reliability, you actually store n-bit words, where n − k of the bits are devoted to error correction. The memory is reliable if when reading a location, any k or more bits (out of n) are correctly recovered. How should n be chosen to guarantee a specified reliability? You will be able to answer questions like these after you study the binomial random variable in Chapter 2.

Optical Communication Systems. Optical communication systems use photodetectors to interface between optical and electronic subsystems. When these systems are at the limits of the operating capabilities, the number of photoelectrons produced by the photodetector is well-modeled by the Poisson* random variable you will study in Chapter 2 (see also the Poisson process in Chapter 9). In deciding whether a transmitted bit is a zero or a one, the receiver counts the number of photoelectrons and compares it to a threshold. System performance is determined by computing the probability of this event.

Wireless Communication Systems. In order to enhance weak signals and maximize the range of communication systems, it is necessary to use amplifiers. Unfortunately, amplifiers always generate thermal noise, which is added to the desired signal. As a consequence of the underlying physics, the noise is Gaussian. Hence, the Gaussian density function, which you will meet in Chapter 3, plays a very prominent role in the analysis and design of communication systems. When noncoherent receivers are used, e.g., noncoherent frequency shift keying, this naturally leads to the Rayleigh, chi-squared, noncentral chi-squared, and Rice density functions that you will meet in the problems in Chapters 3, 4, 6, and 8.

Variability in Electronic Circuits. Although circuit manufacturing processes attempt to ensure that all items have nominal parameter values, there is always some variation among items. How can we estimate the average values in a batch of items without testing all of them? How good is our estimate? You will learn how to do this in Chapter 5 when you study parameter estimation and confidence intervals. Incidentally, the same concepts apply to the prediction of presidential elections by surveying only a few voters.

Computer Network Traffic. Prior to the 1990s, network analysis and design was carried out using long-established Markovian models [38, p. 1]. You will study Markov chains in Chapter 10. As self similarity was observed in the traffic of local-area networks [32], wide-area networks [39], and in World Wide Web traffic [13], a great research effort began to examine the impact of self similarity on network analysis and design. This research has yielded some surprising insights into questions about buffer size vs. bandwidth, multiple-time-scale congestion control, connection duration prediction, and other issues [38, pp. 9–11]. In Chapter 13 you will be introduced to self similarity and related concepts.

*Many quantities in probability and statistics are named after famous mathematicians and statisticians. You can use an Internet search engine to find pictures and biographies. At the time of this writing, numerous biographies of famous mathematicians and statisticians can be found at http://turnbull.mcs.st-and.ac.uk/history/BiogIndex.html and http://www.york.ac.uk/depts/maths/histstat/people/welcome.htm

In spite of the foregoing applications, probability was not originally developed to handle problems in electrical and computer engineering. The first applications of probability were to questions about gambling posed to Pascal in 1654 by the Chevalier de Mere. Later, probability theory was applied to the determination of life expectancies and life-insurance premiums, the theory of measurement errors, and to statistical mechanics. Today, the theory of probability and statistics is used in many other fields, such as economics, finance, medical treatment and drug studies, manufacturing quality control, public opinion surveys, etc.

Relative Frequency

Consider an experiment that can result in M possible outcomes, O_1, …, O_M. For example, in tossing a die, one of the six sides will land facing up. We could let O_i denote the outcome that the ith side faces up, i = 1, …, 6. As another example, there are M = 52 possible outcomes if we draw one card from a deck of playing cards. The simplest example we consider is the flipping of a coin. In this case there are two possible outcomes, "heads" and "tails." No matter what the experiment, suppose we perform it n times and make a note of how many times each outcome occurred. Each performance of the experiment is called a trial.† Let N_n(O_i) denote the number of times O_i occurred in n trials. The relative frequency of outcome O_i is defined to be

N_n(O_i)/n.

Thus, the relative frequency N_n(O_i)/n is the fraction of times O_i occurred. Here are some simple computations using relative frequency. First,

N_n(O_1) + ··· + N_n(O_M) = n,

and so

N_n(O_1)/n + ··· + N_n(O_M)/n = 1.     (1.1)

Second, we can group outcomes together. For example, if the experiment is tossing a die, let E denote the event that the outcome of a toss is a face with an even number of dots; i.e., E is the event that the outcome is O_2, O_4, or O_6. If we let N_n(E) denote the number of times E occurred in n tosses, it is easy to see that

N_n(E) = N_n(O_2) + N_n(O_4) + N_n(O_6),

†When there are only two outcomes, the repeated experiments are called Bernoulli trials.


and so the relative frequency of E is

N_n(E)/n = N_n(O_2)/n + N_n(O_4)/n + N_n(O_6)/n.     (1.2)

Practical experience has shown us that as the number of trials n becomes large, the relative frequencies settle down and appear to converge to some limiting value. This behavior is known as statistical regularity.

Example 1.1. Suppose we toss a fair coin 100 times and note the relative frequency of heads. Experience tells us that the relative frequency should be about 1/2. When we did this,‡ we got 0.47 and were not disappointed.

The tossing of a coin 100 times and recording the relative frequency of heads out of 100 tosses can be considered an experiment in itself. Since the number of heads can range from 0 to 100, there are 101 possible outcomes, which we denote by S_0, …, S_100. In the preceding example, this experiment yielded S_47.

Example 1.2. We performed the experiment with outcomes S_0, …, S_100 1000 times and counted the number of occurrences of each outcome. All trials produced between 33 and 68 heads. Rather than list N_1000(S_k) for the remaining values of k, we summarize as follows:

N_1000(S_33) + N_1000(S_34) + N_1000(S_35) = 4
N_1000(S_36) + N_1000(S_37) + N_1000(S_38) = 6
N_1000(S_39) + N_1000(S_40) + N_1000(S_41) = 32
N_1000(S_42) + N_1000(S_43) + N_1000(S_44) = 98
N_1000(S_45) + N_1000(S_46) + N_1000(S_47) = 165
N_1000(S_48) + N_1000(S_49) + N_1000(S_50) = 230
N_1000(S_51) + N_1000(S_52) + N_1000(S_53) = 214
N_1000(S_54) + N_1000(S_55) + N_1000(S_56) = 144
N_1000(S_57) + N_1000(S_58) + N_1000(S_59) = 76
N_1000(S_60) + N_1000(S_61) + N_1000(S_62) = 21
N_1000(S_63) + N_1000(S_64) + N_1000(S_65) = 9
N_1000(S_66) + N_1000(S_67) + N_1000(S_68) = 1.

This data is illustrated in the histogram shown in Figure 1.1. (The bars are centered over values of the form k/100; e.g., the bar of height 230 is centered over 0.49.)
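A repetition of this experiment can be carried out with a few lines of Matlab. The sketch below is only an illustration (the simulated counts will differ from those listed above, since the tosses are random):

    % Repeat the 100-toss experiment 1000 times and tally the outcomes S_0,...,S_100.
    heads = sum(rand(1000,100) < 0.5, 2);   % number of heads in each of the 1000 trials
    counts = zeros(1,101);
    for k = 0:100
        counts(k+1) = sum(heads == k);      % N_1000(S_k)
    end
    bar(0:100, counts)                      % a histogram similar to Figure 1.1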

How can we explain this statistical regularity? Why does the bell-shaped curve fit so well over the histogram?

‡We did not actually toss a coin. We used a random number generator to simulate the toss of a fair coin. Simulation is discussed in Chapters 4 and 5.



Figure 1.1. Histogram of Example 1.2 with overlay of a Gaussian density.

What Is Probability Theory?

Axiomatic probability theory, which is the subject of this book, was developed by A. N. Kolmogorov in 1933. This theory specifies a set of axioms that a well-defined mathematical model is required to satisfy such that if Ô_1, …, Ô_M are the mathematical objects that correspond to experimental outcomes O_1, …, O_M, and if Ô_i is assigned probability§ p_i, then the probabilities of events constructed from the Ô_i will satisfy the same additivity properties as relative frequency such as (1.1) and (1.2). Furthermore, using the axioms, it can be proved that

lim_{n→∞} N_n(Ô_i)/n = p_i.

This result is a special case of the strong law of large numbers, which is derived in Chapter 12. (A related result, known as the weak law of large numbers, is derived in Chapter 2.) The law of large numbers says that the mathematical model has the statistical regularity that we observe experimentally. This is why probability theory has enjoyed great success in the analysis, design, and prediction of real-world systems.

Probability theory also explains why the histogram in Figure 1.1 agrees with the bell-shaped curve overlaying it. If probability p is assigned to Ô_heads, then the probability of Ŝ_k (the mathematical object corresponding to S_k above) is given by the binomial probability formula (see Example 1.39 or Section 2.4)

n!/(k!(n − k)!) p^k (1 − p)^(n−k),

where k = 0, …, n (in Example 1.2, n = 100 and p = 1/2). By the central limit theorem, which is derived in Chapter 4, if n is large, the above expression is approximately equal to

(1/√(2πnp(1 − p))) exp(−(1/2) [(k − np)/√(np(1 − p))]²).

§The concept of probability will be defined later.


(You should convince yourself that the graph of e^(−x²) is indeed a bell-shaped curve.)
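For n = 100 and p = 1/2, the quality of the approximation is easy to check numerically; the following Matlab sketch (an added illustration, with variable names of our choosing) compares the exact binomial probabilities with the central-limit-theorem approximation.

    % Compare n!/(k!(n-k)!) p^k (1-p)^(n-k) with the Gaussian approximation.
    n = 100;  p = 0.5;  k = 0:n;
    pmf = zeros(1, n+1);
    for i = 0:n
        pmf(i+1) = nchoosek(n, i) * p^i * (1-p)^(n-i);
    end
    s = sqrt(n*p*(1-p));
    approx = exp(-0.5*((k - n*p)/s).^2) / (sqrt(2*pi)*s);
    max(abs(pmf - approx))      % small when n is large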

Features of Each Chapter

The last three sections of every chapter are entitled Notes, Problems, and Exam Preparation, respectively. The Notes section contains additional information referenced in the text by numerical superscripts. These notes are usually rather technical and can be skipped by the beginning student. However, the notes provide a more in-depth discussion of certain topics that may be of interest to more advanced readers. The Problems section is an integral part of the book, and in some cases contains developments beyond those in the main text. The instructor may wish to solve some of these problems in the lectures. Remarks, problems, and sections marked by a * are intended for more advanced readers, and can be omitted in a first course. The Exam Preparation section provides a few study suggestions, including references to the more important concepts and formulas introduced in the chapter.

1.1. Review of Set Notation

Since Kolmogorov's axiomatic theory is expressed in the language of sets, we recall in this section some basic definitions, notation, and properties of sets.

Let Ω be a set of points. If ω is a point in Ω, we write ω ∈ Ω. Let A and B be two collections of points in Ω. If every point in A also belongs to B, we say that A is a subset of B, and we denote this by writing A ⊂ B. If A ⊂ B and B ⊂ A, then we write A = B; i.e., two sets are equal if they contain exactly the same points.

Set relationships can be represented graphically in Venn diagrams. In these pictures, the whole space Ω is represented by a rectangular region, and subsets of Ω are represented by disks or oval-shaped regions. For example, in Figure 1.2(a), the disk A is completely contained in the oval-shaped region B, thus depicting the relation A ⊂ B.

Set Operations

If A ⊂ Ω, and ω ∈ Ω does not belong to A, we write ω ∉ A. The set of all such ω is called the complement of A in Ω; i.e.,

A^c := {ω ∈ Ω : ω ∉ A}.

This is illustrated in Figure 1.2(b), in which the shaded region is the complement of the disk A.

The empty set or null set of Ω is denoted by ∅; it contains no points of Ω. Note that for any A ⊂ Ω, ∅ ⊂ A. Also, Ω^c = ∅.

The union of two subsets A and B is

A ∪ B := {ω ∈ Ω : ω ∈ A or ω ∈ B}.


Figure 1.2. (a) Venn diagram of A ⊂ B. (b) The complement of the disk A, denoted by A^c, is the shaded part of the diagram.

Here "or" is inclusive; i.e., if ω ∈ A ∪ B, we permit ω to belong to either A or B or both. This is illustrated in Figure 1.3(a), in which the shaded region is the union of the disk A and the oval-shaped region B.

The intersection of two subsets A and B is

A ∩ B := {ω ∈ Ω : ω ∈ A and ω ∈ B};

hence, ω ∈ A ∩ B if and only if ω belongs to both A and B. This is illustrated in Figure 1.3(b), in which the shaded area is the intersection of the disk A and the oval-shaped region B. The reader should also note the following special case. If A ⊂ B (recall Figure 1.2(a)), then A ∩ B = A. In particular, we always have A ∩ Ω = A and ∅ ∩ B = ∅.

Figure 1.3. (a) The shaded region is A ∪ B. (b) The shaded region is A ∩ B.

The set difference operation is defined by

B \ A := B ∩ A^c;

i.e., B \ A is the set of ω ∈ B that do not belong to A. In Figure 1.4(a), B \ A is the shaded part of the oval-shaped region B.

Two subsets A and B are disjoint or mutually exclusive if A ∩ B = ∅; i.e., there is no point in Ω that belongs to both A and B. This condition is depicted in Figure 1.4(b).


Figure 1.4. (a) The shaded region is B \ A. (b) Venn diagram of disjoint sets A and B.

Example 1.3. Let Ω := {0, 1, 2, 3, 4, 5, 6, 7}, and put

A := {1, 2, 3, 4},  B := {3, 4, 5, 6},  and  C := {5, 6}.

Evaluate A ∪ B, A ∩ B, A ∩ C, A^c, and B \ A.

Solution. It is easy to see that A ∪ B = {1, 2, 3, 4, 5, 6}, A ∩ B = {3, 4}, and A ∩ C = ∅. Since A^c = {0, 5, 6, 7},

B \ A = B ∩ A^c = {5, 6} = C.
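These calculations can also be checked with Matlab's built-in set functions, representing each set as a vector (a minimal sketch; the variable names are ours):

    % Example 1.3 with Matlab set functions.
    Omega = 0:7;
    A = [1 2 3 4];  B = [3 4 5 6];  C = [5 6];
    union(A, B)             % 1 2 3 4 5 6
    intersect(A, B)         % 3 4
    intersect(A, C)         % empty
    Ac = setdiff(Omega, A)  % complement of A: 0 5 6 7
    setdiff(B, A)           % B \ A: 5 6, which equals C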

Set Identities

Set operations are easily seen to obey the following relations. Some of these relations are analogous to the familiar ones that apply to ordinary numbers if we think of union as the set analog of addition and intersection as the set analog of multiplication. Let A, B, and C be subsets of Ω. The commutative laws are

A ∪ B = B ∪ A and A ∩ B = B ∩ A.     (1.3)

The associative laws are

A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C.     (1.4)

The distributive laws are

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)     (1.5)

and

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).     (1.6)

De Morgan's laws are

(A ∩ B)^c = A^c ∪ B^c and (A ∪ B)^c = A^c ∩ B^c.     (1.7)

Formulas (1.3)–(1.5) are exactly analogous to their numerical counterparts. Formulas (1.6) and (1.7) do not have numerical counterparts. We also recall that A ∩ Ω = A and ∅ ∩ B = ∅; hence, we can think of Ω as the analog of the number one and ∅ as the analog of the number zero. Another analog is the formula A ∪ ∅ = A.

We next consider infinite collections of subsets of Ω. Suppose A_n ⊂ Ω, n = 1, 2, …. Then

⋃_{n=1}^∞ A_n := {ω ∈ Ω : ω ∈ A_n for some 1 ≤ n < ∞}.

In other words, ω ∈ ⋃_{n=1}^∞ A_n if and only if for at least one integer n satisfying 1 ≤ n < ∞, ω ∈ A_n. This definition admits the possibility that ω ∈ A_n for more than one value of n. Next, we define

⋂_{n=1}^∞ A_n := {ω ∈ Ω : ω ∈ A_n for all 1 ≤ n < ∞}.

In other words, ω ∈ ⋂_{n=1}^∞ A_n if and only if ω ∈ A_n for every positive integer n.

Example 1.4. Let Ω denote the real numbers, Ω = IR := (−∞, ∞). Then the following infinite intersections and unions can be simplified. Consider the intersection

⋂_{n=1}^∞ (−∞, 1/n) = {ω : ω < 1/n for all 1 ≤ n < ∞}.

Now, if ω < 1/n for all 1 ≤ n < ∞, then ω cannot be positive; i.e., we must have ω ≤ 0. Conversely, if ω ≤ 0, then for all 1 ≤ n < ∞, ω ≤ 0 < 1/n. It follows that

⋂_{n=1}^∞ (−∞, 1/n) = (−∞, 0].

Consider the infinite union,

⋃_{n=1}^∞ (−∞, −1/n] = {ω : ω ≤ −1/n for some 1 ≤ n < ∞}.

Now, if ω ≤ −1/n for some n with 1 ≤ n < ∞, then we must have ω < 0. Conversely, if ω < 0, then for large enough n, ω ≤ −1/n. Thus,

⋃_{n=1}^∞ (−∞, −1/n] = (−∞, 0).

In a similar way, one can show that

⋂_{n=1}^∞ [0, 1/n) = {0},

as well as

⋃_{n=1}^∞ (−∞, n] = (−∞, ∞) and ⋂_{n=1}^∞ (−∞, −n] = ∅.

The following generalized distributive laws also hold,

B ∩ (⋃_{n=1}^∞ A_n) = ⋃_{n=1}^∞ (B ∩ A_n),

and

B ∪ (⋂_{n=1}^∞ A_n) = ⋂_{n=1}^∞ (B ∪ A_n).

We also have the generalized De Morgan's laws,

(⋂_{n=1}^∞ A_n)^c = ⋃_{n=1}^∞ A_n^c,

and

(⋃_{n=1}^∞ A_n)^c = ⋂_{n=1}^∞ A_n^c.

Finally, we will need the following definition. We say that subsets A_n, n = 1, 2, …, are pairwise disjoint if A_n ∩ A_m = ∅ for all n ≠ m.

Partitions

A family of sets B_n is called a partition if the sets are pairwise disjoint and their union is the whole space Ω. A partition of three sets B_1, B_2, and B_3 is illustrated in Figure 1.5(a). Partitions are useful for chopping up sets into manageable, disjoint pieces. Given a set A, write

A = A ∩ Ω = A ∩ (⋃_n B_n) = ⋃_n (A ∩ B_n).

Since the B_n are pairwise disjoint, so are the pieces (A ∩ B_n). This is illustrated in Figure 1.5(b), in which a disk is broken up into three disjoint pieces.

If a family of sets B_n is disjoint but their union is not equal to the whole space, we can always add the remainder set

R := (⋃_n B_n)^c     (1.8)


Figure 1.5. (a) The partition B_1, B_2, B_3. (b) Using the partition to break up a disk into three disjoint pieces (the shaded regions).

to the family to create a partition. Writing

Ω = R^c ∪ R = (⋃_n B_n) ∪ R,

we see that the union of the augmented family is the whole space. It only remains to show that B_k ∩ R = ∅. Write

B_k ∩ R = B_k ∩ (⋃_n B_n)^c = B_k ∩ (⋂_n B_n^c) = B_k ∩ B_k^c ∩ (⋂_{n≠k} B_n^c) = ∅.

*Functions

A function consists of a set X of admissible inputs called the domain and a rule or mapping f that associates to each x ∈ X a value f(x) that belongs to a set Y called the co-domain. We indicate this symbolically by writing f : X → Y, and we say, "f maps X into Y." Two functions are the same if and only if they have the same domain, co-domain, and rule. If f : X → Y and g : X → Y, then the mappings f and g are the same if and only if f(x) = g(x) for all x ∈ X.

The set of all possible values of f(x) is called the range. The range of a function is the set {f(x) : x ∈ X}. In general, the range is a proper subset of the co-domain.

A function is said to be onto if its range is equal to its co-domain. In other words, every value y ∈ Y "comes from somewhere" in the sense that for every y ∈ Y, there is at least one x ∈ X with y = f(x).


A function is said to be one-to-one if the condition f(x_1) = f(x_2) implies x_1 = x_2.

Another way of thinking about the concepts of onto and one-to-one is the following. A function is onto if for every y ∈ Y, the equation f(x) = y has a solution. This does not rule out the possibility that there may be more than one solution. A function is one-to-one if for every y ∈ Y, the equation f(x) = y can have at most one solution. This does not rule out the possibility that for some values of y ∈ Y, there may be no solution.

A function is said to be invertible if for every y ∈ Y there is a unique x ∈ X with f(x) = y. Hence, a function is invertible if and only if it is both one-to-one and onto; i.e., for every y ∈ Y, the equation f(x) = y has a unique solution.

Example 1.5. For any real number x, put f(x) := x². Then

f : (−∞, ∞) → (−∞, ∞)
f : (−∞, ∞) → [0, ∞)
f : [0, ∞) → (−∞, ∞)
f : [0, ∞) → [0, ∞)

specifies four different functions. In the first case, the function is not one-to-one because f(2) = f(−2), but 2 ≠ −2; the function is not onto because there is no x ∈ (−∞, ∞) with f(x) = −1. In the second case, the function is onto since for every y ∈ [0, ∞), f(√y) = y. However, since f(−√y) = y also, the function is not one-to-one. In the third case, the function fails to be onto, but is one-to-one. In the fourth case, the function is onto and one-to-one and therefore invertible.

The last concept we introduce concerning functions is that of inverse image. If f : X → Y, and if B ⊂ Y, then the inverse image of B is

f^(−1)(B) := {x ∈ X : f(x) ∈ B},

which we emphasize is a subset of X. This concept applies to any function whether or not it is invertible. When the set X is understood, we sometimes write

f^(−1)(B) := {x : f(x) ∈ B}

to simplify the notation.

Example 1.6. If f : (−∞, ∞) → (−∞, ∞), where f(x) = x², find f^(−1)([4, 9]) and f^(−1)([−9, −4]).

Solution. In the first case, write

f^(−1)([4, 9]) = {x : f(x) ∈ [4, 9]}
             = {x : 4 ≤ f(x) ≤ 9}
             = {x : 4 ≤ x² ≤ 9}
             = {x : 2 ≤ x ≤ 3 or −3 ≤ x ≤ −2}
             = [2, 3] ∪ [−3, −2].

In the second case, we need to find

f^(−1)([−9, −4]) = {x : −9 ≤ x² ≤ −4}.

Since there is no x ∈ (−∞, ∞) with x² < 0, f^(−1)([−9, −4]) = ∅.

Remark. If we modify the function in the preceding example to be f : [0, ∞) → (−∞, ∞), then f^(−1)([4, 9]) = [2, 3] instead.
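A quick numerical check of the first inverse image is possible by testing a grid of x values (a rough sketch only; the grid and its limits are arbitrary choices):

    % Locate f^{-1}([4,9]) for f(x) = x^2 on a grid of x values.
    x = linspace(-5, 5, 100001);
    inB = (x.^2 >= 4) & (x.^2 <= 9);            % points x with f(x) in [4,9]
    [min(x(inB & x > 0)), max(x(inB & x > 0))]  % approximately [2, 3]
    [min(x(inB & x < 0)), max(x(inB & x < 0))]  % approximately [-3, -2]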

*Countable and Uncountable Sets

The number of points in a set A is denoted by |A|. We call |A| the cardinality of A. The cardinality of a set may be finite or infinite. A little reflection should convince you that if A and B are two disjoint sets, then

|A ∪ B| = |A| + |B|.

Use the convention that if x is a real number, then

x + ∞ = ∞ and ∞ + ∞ = ∞,

and be sure to consider the three cases: (i) A and B both have finite cardinality, (ii) one has finite cardinality and one has infinite cardinality, and (iii) both have infinite cardinality.

A set A is said to be countable if the elements of A can be enumerated or listed in a sequence: a_1, a_2, …. In other words, a set A is countable if it can be written in the form

A = ⋃_{k=1}^∞ {a_k},

where we emphasize that the union is over the positive integers, k = 1, 2, ….

Remark. Since there is no requirement that the a_k be distinct, every finite set is countable by our definition. For example, you should verify that the set A = {1, 2, 3} can be written in the above form by taking a_1 = 1, a_2 = 2, a_3 = 3, and a_k = 3 for k = 4, 5, …. By a countably infinite set, we mean a countable set that is not finite.

Example 1.7. Show that a set of the form

B = ⋃_{i,j=1}^∞ {b_ij}

is countable.

Solution. The point here is that a sequence that is doubly indexed by positive integers forms a countable set. To see this, consider the array

b_11  b_12  b_13  b_14  ···
b_21  b_22  b_23
b_31  b_32
b_41
 ⋮            ⋱

Now list the array elements along antidiagonals from lower left to upper right, defining

a_1 := b_11
a_2 := b_21,  a_3 := b_12
a_4 := b_31,  a_5 := b_22,  a_6 := b_13
a_7 := b_41,  a_8 := b_32,  a_9 := b_23,  a_10 := b_14
···

This shows that

B = ⋃_{k=1}^∞ {a_k},

and so B is a countable set.
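The enumeration order itself is easy to generate mechanically; the following sketch (our own illustration, listing only the first few antidiagonals) prints the index pairs (i, j) in the order a_1, a_2, a_3, …:

    % Index pairs (i,j) in the antidiagonal order of Example 1.7.
    order = [];
    for d = 2:5                        % d = i + j, over the first few antidiagonals
        for i = d-1:-1:1               % from lower left (large i) to upper right
            order = [order; i, d-i];   % append the pair (i, j)
        end
    end
    order                              % rows: (1,1) (2,1) (1,2) (3,1) (2,2) (1,3) ...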

Example 1.8. Show that the positive rational numbers form a countable subset.

Solution. Recall that a rational number is of the form i/j where i and j are integers with j ≠ 0. Hence, the set of positive rational numbers is equal to

⋃_{i,j=1}^∞ {i/j}.

By the previous example, this is a countable set.

You will show in Problem 12 that the union of two countable sets is a countable set. It then easily follows that the set of all rational numbers is countable.

A set is uncountable or uncountably infinite if it is not countable.

Example 1.9. Show that the set S of unending sequences of zeros and ones is uncountable.


Solution. Suppose, to obtain a contradiction, that S is countable. Then we can exhaustively list all the elements of S. Such a list would look like

a_1 := 1 0 1 1 0 1 0 1 1 ···
a_2 := 0 0 1 0 1 1 0 0 0 ···
a_3 := 1 1 1 0 1 0 1 0 1 ···
a_4 := 1 1 0 1 0 0 1 1 0 ···
a_5 := 0 1 1 0 0 0 0 0 0 ···
 ⋮

But this list can never be complete. To construct a new binary sequence that is not in the list, use the following diagonal argument. Take a := 0 1 0 0 1 ··· to be such that the kth bit of a is the complement of the kth bit of a_k. In other words, viewing the above sequences as an infinite matrix, go along the diagonal and flip all the bits to construct a. Then a ≠ a_1 because they differ in the first bit. Similarly, a ≠ a_2 because they differ in the second bit. And so on.

The same argument shows that the interval of real numbers [0, 1) is not countable. Write each fractional real number in its binary expansion, e.g., 0.11010101110… and identify the expansion with the corresponding sequence of zeros and ones in the example.

1.2. Probability Models

In this section, we introduce a number of simple physical experiments and suggest mathematical probability models for them. These models are used to compute various probabilities of interest.

Consider the experiment of tossing a fair die and measuring, i.e., noting, the face turned up. Our intuition tells us that the "probability" of the ith face turning up is 1/6, and that the "probability" of a face with an even number of dots turning up is 1/2.

Here is a mathematical model for this experiment and measurement. Let Ω be any set containing six points. We call Ω the sample space. Each point in Ω corresponds to, or models, a possible outcome of the experiment. The individual points ω ∈ Ω are called sample points or outcomes. For simplicity, let

Ω := {1, 2, 3, 4, 5, 6}.

Now put

F_i := {i},  i = 1, 2, 3, 4, 5, 6,

and

E := {2, 4, 6}.

We call the sets F_i and E events. An event is a collection of outcomes. The event F_i corresponds to, or models, the die's turning up showing the ith face.


Similarly, the event E models the die's showing a face with an even number of dots. Next, for every subset A of Ω, we denote the number of points in A by |A|. We call |A| the cardinality of A. We define the probability of any event A by

P(A) := |A|/|Ω|.

In other words, for the model we are constructing for this problem, the probability of an event A is defined to be the number of outcomes in A divided by the total number of possible outcomes. With this definition, it follows that P(F_i) = 1/6 and P(E) = 3/6 = 1/2, which agrees with our intuition.

We now make four observations about our model:

(i) P(∅) = |∅|/|Ω| = 0/|Ω| = 0.
(ii) P(A) ≥ 0 for every event A.
(iii) If A and B are mutually exclusive events, i.e., A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B); for example, F_3 ∩ E = ∅, and it is easy to check that P(F_3 ∪ E) = P({2, 3, 4, 6}) = P(F_3) + P(E).
(iv) When the die is tossed, something happens; this is modeled mathematically by the easily verified fact that P(Ω) = 1.

As we shall see, these four properties hold for all the models discussed in this section.

We next modify our model to accommodate an unfair die as follows. Observe that for a fair die,¶

P(A) = |A|/|Ω| = Σ_{ω∈A} 1/|Ω| = Σ_{ω∈A} p(ω),

where p(ω) := 1/|Ω|. For an unfair die, we simply change the definition of the function p(ω) to reflect the likelihood of occurrence of the various faces. This new definition of P still satisfies (i) and (iii); however, to guarantee that (ii) and (iv) still hold, we must require that p be nonnegative and sum to one, or, in symbols, p(ω) ≥ 0 and Σ_{ω∈Ω} p(ω) = 1.

Example 1.10. Construct a sample space Ω and probability P to model an unfair die in which faces 1–5 are equally likely, but face 6 has probability 1/3. Using this model, compute the probability that a toss results in a face showing an even number of dots.

Solution. We again take Ω = {1, 2, 3, 4, 5, 6}. To make face 6 have probability 1/3, we take p(6) = 1/3. Since the other faces are equally likely, for ω = 1, …, 5, we take p(ω) = c, where c is a constant to be determined. To find c we use the fact that

1 = P(Ω) = Σ_{ω∈Ω} p(ω) = Σ_{ω=1}^{6} p(ω) = 5c + 1/3.

¶If A = ∅, the summation is taken to be zero.


It follows that c = 2/15. Now that p(ω) has been specified for all ω, we define the probability of any event A by

P(A) := Σ_{ω∈A} p(ω).

Letting E = {2, 4, 6} model the result of a toss showing a face with an even number of dots, we compute

P(E) = Σ_{ω∈E} p(ω) = p(2) + p(4) + p(6) = 2/15 + 2/15 + 1/3 = 3/5.

This unfair die has a greater probability of showing an even-numbered face than the fair die.
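The arithmetic is easily mirrored in a couple of Matlab lines (purely illustrative; the vector name is ours):

    % Example 1.10: unfair die with faces 1-5 equally likely and p(6) = 1/3.
    p = [2/15 2/15 2/15 2/15 2/15 1/3];   % p(1),...,p(6); note sum(p) = 1
    PE = p(2) + p(4) + p(6)               % P(E) for E = {2,4,6}; equals 3/5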

This problem is typical of the kinds of "word problems" to which probability theory is applied to analyze well-defined physical experiments. The application of probability theory requires the modeler to take the following steps:

1. Select a suitable sample space Ω.
2. Define P(A) for all events A. For example, if Ω is a finite set and all outcomes ω are equally likely, we usually take P(A) = |A|/|Ω|. If it is not the case that all outcomes are equally likely, e.g., as in the previous example, then P(A) would be given by some other formula that you must determine based on the problem statement.
3. Translate the given "word problem" into a problem requiring the calculation of P(E) for some specific event E.

The following example gives a family of constructions that can be used to model experiments having a finite number of possible outcomes.

Example 1.11. Let M be a positive integer, and put Ω := {1, 2, …, M}. Next, let p(1), …, p(M) be nonnegative real numbers such that Σ_{ω=1}^{M} p(ω) = 1. For any subset A ⊂ Ω, put

P(A) := Σ_{ω∈A} p(ω).

In particular, to model equally likely outcomes, or equivalently, outcomes that occur "at random," we take p(ω) = 1/M. In this case, P(A) reduces to |A|/|Ω|.

Example 1.12. A single card is drawn at random from a well-shuffled deck of playing cards. Find the probability of drawing an ace. Also find the probability of drawing a face card.

Solution. The first step in the solution is to specify the sample space Ω and the probability P. Since there are 52 possible outcomes, we take Ω := {1, …, 52}. Each integer corresponds to one of the cards in the deck. To specify P, we must define P(E) for all events E ⊂ Ω. Since all cards are equally likely to be drawn, we put P(E) := |E|/|Ω|.

To find the desired probabilities, let 1, 2, 3, 4 correspond to the four aces, and let 41, …, 52 correspond to the 12 face cards. We identify the drawing of an ace with the event A := {1, 2, 3, 4}, and we identify the drawing of a face card with the event F := {41, …, 52}. It then follows that P(A) = |A|/52 = 4/52 = 1/13 and P(F) = |F|/52 = 12/52 = 3/13.
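With the same labeling of the cards, the counting can be done directly in Matlab (a minimal sketch):

    % Example 1.12 by counting over Omega = {1,...,52}.
    Omega = 1:52;
    A = 1:4;       % the four aces
    F = 41:52;     % the twelve face cards
    PA = length(A)/length(Omega)    % 4/52 = 1/13
    PF = length(F)/length(Omega)    % 12/52 = 3/13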

While the sample spaces in Example 1.11 can model any experiment witha �nite number of outcomes, it is often convenient to use alternative samplespaces.

Example 1.13. Suppose that we have two well-shu�ed decks of cards, andwe draw one card at random from each deck. What is the probability of drawingthe ace of spades followed by the jack of hearts? What is the probability ofdrawing an ace and a jack (in either order)?

Solution. The �rst step in the solution is to specify the sample space and the probability }. Since there are 52 possibilities for each draw, there are522 = 2,704 possible outcomes when drawing two cards. Let D := f1; : : : ; 52g,and put

:= f(i; j) : i; j 2 Dg:Then jj = jDj2 = 522 = 2,704 as required. Since all pairs are equally likely,we put }(E) := jEj=jj for arbitrary events E � .

As in the preceding example, we denote the aces by 1; 2; 3; 4. We let 1 denotethe ace of spades. We also denote the jacks by 41; 42; 43; 44, and the jack ofhearts by 42. The drawing of the ace of spades followed by the jack of heartsis identi�ed with the event

A := f(1; 42)g;

and so }(A) = 1=2,704 � 0:000370. The drawing of an ace and a jack isidenti�ed with B := Baj [Bja, where

Baj :=�(i; j) : i 2 f1; 2; 3; 4g and j 2 f41; 42; 43;44g

corresponds to the drawing of an ace followed by a jack, and

Bja :=�(i; j) : i 2 f41; 42; 43; 44g and j 2 f1; 2; 3; 4g

corresponds to the drawing of a jack followed by an ace. Since Baj and Bja aredisjoint,}(B) = }(Baj)+}(Bja) = (jBajj+jBjaj)=jj. Since jBajj = jBjaj = 16,}(B) = 2 � 16=2,704 = 2=169 � 0:0118.


Example 1.14. Two cards are drawn at random from a single well-shu�eddeck of playing cards. What is the probability of drawing the ace of spadesfollowed by the jack of hearts? What is the probability of drawing an ace anda jack (in either order)?

Solution. The �rst step in the solution is to specify the sample space andthe probability}. There are 52 possibilities for the �rst draw and 51 possibilitiesfor the second. Hence, the sample space should contain 52�51 = 2,652 elements.Using the notation of the preceding example, we take

:= f(i; j) : i; j 2 D with i 6= jg;

Note that jj = 522 � 52 = 2,652 as required. Again, all such pairs are equallylikely, and so we take }(E) := jEj=jj for arbitrary events E � . The eventsA and B are de�ned as before, and the calculation is the same except thatjj = 2,652 instead of 2,704. Hence, }(A) = 1=2,652 � 0:000377, and }(B) =2 � 16=2,652 = 8=663 � 0:012.

In some experiments, the number of possible outcomes is countably infinite. For example, consider the tossing of a coin until the first heads appears. Here is a model for such situations. Let Ω denote the set of all positive integers, Ω := {1, 2, …}. For ω ∈ Ω, let p(ω) be nonnegative, and suppose that \sum_{\omega=1}^{\infty} p(\omega) = 1. For any subset A ⊂ Ω, put

P(A) := \sum_{\omega\in A} p(\omega).

This construction can be used to model the coin tossing experiment by identifying ω = i with the outcome that the first heads appears on the ith toss. If the probability of tails on a single toss is β (0 ≤ β < 1), it can be shown that we should take p(ω) = β^{ω−1}(1 − β) (cf. Example 2.9). To find the probability that the first head occurs before the fourth toss, we compute P(A), where A = {1, 2, 3}. Then

P(A) = p(1) + p(2) + p(3) = (1 + β + β²)(1 − β).

If β = 1/2, P(A) = (1 + 1/2 + 1/4)/2 = 7/8.

For some experiments, the number of possible outcomes is more than countably infinite. Examples include the lifetime of a lightbulb or a transistor, a noise voltage in a radio receiver, and the arrival time of a city bus. In these cases, P is usually defined as an integral,

P(A) := \int_A f(\omega)\, d\omega,  A ⊂ Ω,

for some nonnegative function f. Note that f must also satisfy \int_\Omega f(\omega)\, d\omega = 1.


Example 1.15. Consider the following model for the lifetime of a lightbulb. For the sample space we take the nonnegative half line, Ω := [0, ∞), and we put

P(A) := \int_A f(\omega)\, d\omega,

where, for example, f(ω) := e^{−ω}. Then the probability that the lightbulb's lifetime is between 5 and 7 time units is

P([5, 7]) = \int_5^7 e^{-\omega}\, d\omega = e^{-5} - e^{-7}.

Example 1.16. A certain bus is scheduled to pick up riders at 9:15. However, it is known that the bus arrives randomly in the 20-minute interval between 9:05 and 9:25, and departs immediately after boarding waiting passengers. Find the probability that the bus arrives at or after its scheduled pick-up time.

Solution. Let Ω := [5, 25], and put

P(A) := \int_A f(\omega)\, d\omega.

Now, the term "randomly" in the problem statement is usually taken to mean that f(ω) ≡ constant. In order that P(Ω) = 1, we must choose the constant to be 1/length(Ω) = 1/20. We represent the bus arriving at or after 9:15 with the event L := [15, 25]. Then

P(L) = \int_{[15,25]} \frac{1}{20}\, d\omega = \int_{15}^{25} \frac{1}{20}\, d\omega = \frac{25-15}{20} = \frac{1}{2}.
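Since the model is just a uniform density on [5, 25], the answer is easy to check by simulation. Here is a hypothetical Matlab sketch (not from the text) that estimates P(L) by relative frequency:

% Monte Carlo check of Example 1.16 (illustration only).
n = 1e6;                    % number of simulated mornings
t = 5 + 20*rand(1,n);       % arrival times, uniform on [5,25]
phat = mean(t >= 15);       % relative frequency of arrivals at or after 9:15
fprintf('estimate = %g (exact = 0.5)\n', phat)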

Example 1.17. A dart is thrown at random toward a circular dartboard of radius 10 cm. Assume the thrower never misses the board. Find the probability that the dart lands within 2 cm of the center.

Solution. Let Ω := {(x, y) : x² + y² ≤ 100}, and for any A ⊂ Ω, put

P(A) := \frac{\text{area}(A)}{\text{area}(\Omega)} = \frac{\text{area}(A)}{100\pi}.

We then identify the event A := {(x, y) : x² + y² ≤ 4} with the dart's landing within 2 cm of the center. Hence,

P(A) = \frac{4\pi}{100\pi} = 0.04.
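The same area-ratio reasoning can be checked by simulation. The sketch below (our illustration, not from the text) throws random points at the board and keeps only those that land on it:

% Monte Carlo check of Example 1.17 (illustration only).
n = 1e6;
x = 20*rand(1,n) - 10;               % points uniform on the square [-10,10]^2
y = 20*rand(1,n) - 10;
onboard = (x.^2 + y.^2 <= 100);      % throws that hit the board
near    = (x.^2 + y.^2 <= 4);        % throws within 2 cm of the center
phat = sum(near)/sum(onboard);       % conditional relative frequency
fprintf('estimate = %g (exact = 0.04)\n', phat)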


1.3. Axioms and Properties of Probability

In this section, we present Kolmogorov's axioms and derive some of their consequences.

The probability models of the preceding section suggest the following axioms that we now require of any probability model.

Given a nonempty set Ω, called the sample space, and a function P defined on the subsets of Ω, we say P is a probability measure if the following four axioms are satisfied:¹

(i) The empty set ∅ is called the impossible event. The probability of the impossible event is zero; i.e., P(∅) = 0.

(ii) Probabilities are nonnegative; i.e., for any event A, P(A) ≥ 0.

(iii) If A1, A2, … are events that are mutually exclusive or pairwise disjoint, i.e., An ∩ Am = ∅ for n ≠ m, then‖

P\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) = \sum_{n=1}^{\infty} P(A_n).   (1.9)

This property is summarized by saying that the probability of the union of disjoint events is the sum of the probabilities of the individual events, or more briefly, "the probabilities of disjoint events add."

(iv) The entire sample space Ω is called the sure event or the certain event. Its probability is always one; i.e., P(Ω) = 1.

A probability measure is a function whose argument is an event and whose value is a nonnegative real number. The foregoing axioms imply many other properties. In particular, we show later that the value of a probability measure must lie in the interval [0, 1].

At this point, advanced readers, especially graduate students,should read Note 1 in the Notes section at the end of the chapter.

We now give an interpretation of how and } model randomness. Weview the sample space as being the set of all possible \states of nature."First, Mother Nature chooses a state !0 2 . We do not know which statehas been chosen. We then conduct an experiment, and based on some physicalmeasurement, we are able to determine that !0 2 A for some event A � . Insome cases, A = f!0g, that is, our measurement reveals exactly which state !0was chosen by Mother Nature. (This is the case for the events Fi de�ned atthe beginning of Section 1.2). In other cases, the set A contains !0 as well asother points of the sample space. (This is the case for the event E de�ned atthe beginning of Section 1.2). In either case, we do not know before making

‖See the paragraph Finite Disjoint Unions below and Problem 25 for further discussion regarding this axiom.


the measurement what measurement value we will get, and so we do not knowwhat event A Mother Nature's !0 will belong to. Hence, in many applications,e.g., gambling, weather prediction, computer message tra�c, etc., it is useful tocompute }(A) for various events to determine which ones are most probable.

Consequences of the Axioms

Axioms (i){(iv) that characterize a probability measure have several impor-tant implications as discussed below.

Finite Disjoint Unions. Let N be a positive integer. By taking An = ∅ for n > N in axiom (iii), we obtain the special case

P\Bigl(\bigcup_{n=1}^{N} A_n\Bigr) = \sum_{n=1}^{N} P(A_n),   A_n pairwise disjoint.

Remark. It is not possible to go backwards and use this special case toderive axiom (iii).

Example 1.18. If A is an event consisting of a �nite number of samplepoints, say A = f!1; : : : ; !Ng, then2 }(A) =

PNn=1

}(f!ng). Similarly, if Aconsists of a countably many sample points, say A = f!1; !2; : : :g, then directlyfrom axiom (iii), }(A) =

P1n=1

}(f!ng).

Probability of a Complement. Given an event A, we can always write Ω = A ∪ Ac, which is a finite disjoint union. Hence, P(Ω) = P(A) + P(Ac). Since P(Ω) = 1, we find that

P(Ac) = 1 − P(A).   (1.10)

Monotonicity. If A and B are events, then

A ⊂ B implies P(A) ≤ P(B).   (1.11)

To see this, first note that A ⊂ B implies

B = A ∪ (B ∩ Ac).

This relation is depicted in Figure 1.6, in which the disk A is a subset of the oval-shaped region B; the shaded region is B ∩ Ac. The figure shows that B is the disjoint union of the disk A together with the shaded region B ∩ Ac. Since B = A ∪ (B ∩ Ac) is a disjoint union, and since probabilities are nonnegative,

P(B) = P(A) + P(B ∩ Ac) ≥ P(A).


Figure 1.6. In this diagram, the disk A is a subset of the oval-shaped region B; the shaded region is B ∩ Ac, and B = A ∪ (B ∩ Ac).

Note that the special case B = Ω results in P(A) ≤ 1 for every event A. In other words, probabilities are always less than or equal to one.

Inclusion-Exclusion. Given any two events A and B, we always have

P(A ∪ B) = P(A) + P(B) − P(A ∩ B).   (1.12)

To derive (1.12), first note that (see Figure 1.7)

A = (A ∩ Bc) ∪ (A ∩ B)

and

B = (A ∩ B) ∪ (Ac ∩ B).

Figure 1.7. (a) Decomposition A = (A ∩ Bc) ∪ (A ∩ B). (b) Decomposition B = (A ∩ B) ∪ (Ac ∩ B).

Figure 1.8. Decomposition A ∪ B = (A ∩ Bc) ∪ (A ∩ B) ∪ (Ac ∩ B).


Hence,

A ∪ B = [(A ∩ Bc) ∪ (A ∩ B)] ∪ [(A ∩ B) ∪ (Ac ∩ B)].

The two copies of A ∩ B can be reduced to one using the identity F ∪ F = F for any set F. Thus,

A ∪ B = (A ∩ Bc) ∪ (A ∩ B) ∪ (Ac ∩ B).

A Venn diagram depicting this last decomposition is shown in Figure 1.8. Taking probabilities of the preceding equations, which involve disjoint unions, we find that

P(A) = P(A ∩ Bc) + P(A ∩ B),
P(B) = P(A ∩ B) + P(Ac ∩ B),
P(A ∪ B) = P(A ∩ Bc) + P(A ∩ B) + P(Ac ∩ B).

Using the first two equations, solve for P(A ∩ Bc) and P(Ac ∩ B), respectively, and then substitute into the first and third terms on the right-hand side of the last equation. This results in

P(A ∪ B) = [P(A) − P(A ∩ B)] + P(A ∩ B) + [P(B) − P(A ∩ B)] = P(A) + P(B) − P(A ∩ B).

Limit Properties. Using axioms (i)–(iv), the following formulas can be derived (see Problems 26–28). For any sequence of events An,

P\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) = \lim_{N\to\infty} P\Bigl(\bigcup_{n=1}^{N} A_n\Bigr),   (1.13)

and

P\Bigl(\bigcap_{n=1}^{\infty} A_n\Bigr) = \lim_{N\to\infty} P\Bigl(\bigcap_{n=1}^{N} A_n\Bigr).   (1.14)

In particular, notice that if the An are increasing in the sense that An ⊂ An+1 for all n, then the finite union in (1.13) reduces to AN (see Figure 1.9(a)). Thus, (1.13) becomes

P\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) = \lim_{N\to\infty} P(A_N),   if A_n ⊂ A_{n+1}.   (1.15)

Similarly, if the An are decreasing in the sense that An+1 ⊂ An for all n, then the finite intersection in (1.14) reduces to AN (see Figure 1.9(b)). Thus, (1.14) becomes

P\Bigl(\bigcap_{n=1}^{\infty} A_n\Bigr) = \lim_{N\to\infty} P(A_N),   if A_{n+1} ⊂ A_n.   (1.16)


Figure 1.9. (a) For increasing events A1 ⊂ A2 ⊂ A3, the union A1 ∪ A2 ∪ A3 = A3. (b) For decreasing events A1 ⊃ A2 ⊃ A3, the intersection A1 ∩ A2 ∩ A3 = A3.

Formulas (1.12) and (1.13) together imply that for any sequence of events An,

P\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) \le \sum_{n=1}^{\infty} P(A_n).   (1.17)

This formula is known as the union bound in engineering and as countable subadditivity in mathematics. It is derived in Problems 29 and 30 at the end of the chapter.

1.4. Conditional Probability

Suppose we want to find out if there is a relationship between cancer and smoking. We survey n people and find out if they have cancer and if they have smoked. For each person, there are four possible outcomes, depending on whether the person has cancer (c) or does not have cancer (nc) and whether the person is a smoker (s) or a nonsmoker (ns). We denote these pairs of outcomes by Oc,s, Oc,ns, Onc,s, and Onc,ns. The numbers of each outcome can be arranged in the matrix

\begin{bmatrix} N(O_{c,s}) & N(O_{c,ns}) \\ N(O_{nc,s}) & N(O_{nc,ns}) \end{bmatrix}.   (1.18)

The sum of the first column is the number of smokers, which we denote by N(Os). The sum of the second column is the number of nonsmokers, which we denote by N(Ons).

The relative frequency of smokers who have cancer is N(Oc,s)/N(Os), and the relative frequency of nonsmokers who have cancer is N(Oc,ns)/N(Ons). If N(Oc,s)/N(Os) is substantially greater than N(Oc,ns)/N(Ons), we would conclude that smoking increases the occurrence of cancer.

Notice that the relative frequency of smokers who have cancer can also be written as the quotient of relative frequencies,

\frac{N(O_{c,s})}{N(O_s)} = \frac{N(O_{c,s})/n}{N(O_s)/n}.   (1.19)

This suggests the following definition of conditional probability. Let Ω be a sample space. Let the event S model a person's being a smoker, and let the event C model a person's having cancer. In our model, the conditional probability that a person has cancer given that the person is a smoker is defined by

P(C|S) := \frac{P(C \cap S)}{P(S)},

where the probabilities model the relative frequencies on the right-hand side of (1.19). This definition makes sense only if P(S) > 0. If P(S) = 0, P(C|S) is not defined.

Given any two events A and B of positive probability,

P(A|B) = \frac{P(A \cap B)}{P(B)}   (1.20)

and

P(B|A) = \frac{P(A \cap B)}{P(A)}.

From (1.20), we see that

P(A ∩ B) = P(A|B) P(B).   (1.21)

Substituting this into the numerator above yields

P(B|A) = \frac{P(A|B)\,P(B)}{P(A)}.   (1.22)

We next turn to the problem of computing the denominator P(A).

The Law of Total Probability and Bayes' Rule

From the identity (recall Figure 1.7(a))

A = (A ∩ B) ∪ (A ∩ Bc),

it follows that

P(A) = P(A ∩ B) + P(A ∩ Bc).

Using (1.21), we have

P(A) = P(A|B) P(B) + P(A|Bc) P(Bc).   (1.23)

This formula is the simplest version of the law of total probability. Substituting (1.23) into (1.22), we obtain

P(B|A) = \frac{P(A|B)\,P(B)}{P(A|B)\,P(B) + P(A|B^c)\,P(B^c)}.   (1.24)


This formula is the simplest version of Bayes' rule.As illustrated in the following example, it is not necessary to remember

Bayes' rule as long as you know the de�nition of conditional probability andthe law of total probability.

Example 1.19. Polychlorinated biphenyls (PCBs) are toxic chemicals.Given that you are exposed to PCBs, suppose that your conditional proba-bility of developing cancer is 1/3. Given that you are not exposed to PCBs,suppose that your conditional probability of developing cancer is 1/4. Supposethat the probability of being exposed to PCBs is 3/4. Find the probability thatyou were exposed to PCBs given that you do not develop cancer.

Solution. To solve this problem, we use the notation∗∗

E = {exposed to PCBs}  and  C = {develop cancer}.

With this notation, it is easy to interpret the problem as telling us that

P(C|E) = 1/3,  P(C|Ec) = 1/4,  and  P(E) = 3/4,   (1.25)

and asking us to find P(E|Cc). Before solving the problem, note that the above data implies three additional equations as follows. First, recall that P(Ec) = 1 − P(E). Similarly, since conditional probability is a probability as a function of its first argument, we can write P(Cc|E) = 1 − P(C|E) and P(Cc|Ec) = 1 − P(C|Ec). Hence,

P(Cc|E) = 2/3,  P(Cc|Ec) = 3/4,  and  P(Ec) = 1/4.   (1.26)

To find the desired conditional probability, we write

P(E|C^c) = \frac{P(E \cap C^c)}{P(C^c)} = \frac{P(C^c|E)\,P(E)}{P(C^c)} = \frac{(2/3)(3/4)}{P(C^c)} = \frac{1/2}{P(C^c)}.

To find the denominator, we use the law of total probability to write

P(Cc) = P(Cc|E) P(E) + P(Cc|Ec) P(Ec) = (2/3)(3/4) + (3/4)(1/4) = 11/16.

∗∗In working this example, we follow common practice and do not explicitly specify the sample space Ω or the probability measure P. Hence, the expression "let E = {exposed to PCBs}" is shorthand for "let E be the subset of Ω that models being exposed to PCBs." The curious reader may find one possible choice for Ω and P, along with precise mathematical definitions of the events E and C, in Note 3.


Hence,

P(E|C^c) = \frac{1/2}{11/16} = \frac{8}{11}.
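The arithmetic in this example is easy to automate. The following Matlab sketch (not part of the text; the variable names are ours) starts from the data in (1.25) and applies the law of total probability and Bayes' rule exactly as above:

% Numerical check of Example 1.19 (illustration only).
PCgE  = 1/3;  PCgEc = 1/4;  PE = 3/4;        % the data in (1.25)
PCc   = (1-PCgE)*PE + (1-PCgEc)*(1-PE);      % law of total probability: P(C^c) = 11/16
PEgCc = (1-PCgE)*PE/PCc;                     % Bayes' rule: P(E|C^c) = 8/11
fprintf('P(Cc) = %g, P(E|Cc) = %g\n', PCc, PEgCc)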

We now generalize the law of total probability. Let Bn be a sequence of pairwise disjoint events such that \sum_n P(B_n) = 1. Then for any event A,

P(A) = \sum_n P(A|B_n)\,P(B_n).

To derive this result, put B :=SnBn, and observe thatyy

}(B) =Xn

}(Bn) = 1:

It follows that }(Bc) = 1 �}(B) = 0. Next, for any event A, A \ Bc � Bc,and so

0 � }(A \Bc) � }(Bc) = 0:

Hence, }(A \Bc) = 0. Writing (recall Figure 1.7(a))

A = (A \B) [ (A \Bc);

it follows that

}(A) = }(A \B) +}(A \Bc)

= }(A \B)= }

�A \

�[n

Bn

��= }

�[n

[A \Bn]�

=Xn

}(A \Bn): (1.27)

To compute P(Bk|A), write

P(B_k|A) = \frac{P(A \cap B_k)}{P(A)} = \frac{P(A|B_k)\,P(B_k)}{P(A)}.

Applying the law of total probability to P(A) in the denominator yields the general form of Bayes' rule,

P(B_k|A) = \frac{P(A|B_k)\,P(B_k)}{\sum_n P(A|B_n)\,P(B_n)}.

yyNotice that since we do not requireS

nBn = , the Bn do not, strictly speaking, form

a partition. However, since }(B) = 1, the remainder set (cf. (1.8)), which in this case is Bc,has probability zero.


In formulas like this, A is an event that we observe, while the Bn are eventsthat we cannot observe but would like to make some inference about. Beforemaking any observations, we know the prior probabilities }(Bn), and weknow the conditional probabilities }(AjBn). After we observe A, we computethe posterior probabilities}(BkjA) for each k.

Example 1.20. In Example 1.19, before we learn any information abouta person, that person's prior probability of being exposed to PCBs is }(E) =3=4 = 0:75. After we observe that the person does not develop cancer, theposterior probability that the person was exposed to PCBs is}(EjCc) = 8=11 �0:73, which is lower than the prior probability.

1.5. Independence

In the previous section, we discussed how we might determine if there is a relationship between cancer and smoking. We said that if the relative frequency of smokers who have cancer is substantially larger than the relative frequency of nonsmokers who have cancer, we would conclude that smoking affects the occurrence of cancer. On the other hand, if the relative frequencies of cancer among smokers and nonsmokers are about the same, we would say that the occurrence of cancer does not depend on whether or not a person smokes.

In probability theory, if events A and B satisfy }(AjB) = }(AjBc), we sayA does not depend on B. This condition says that

}(A \B)}(B)

=}(A \Bc)}(Bc)

: (1.28)

Applying the formulas }(Bc) = 1�}(B) and}(A) = }(A \B) +}(A \Bc)

to the right-hand side yields

}(A \B)}(B)

=}(A)�}(A \B)

1�}(B) :

Cross multiplying to eliminate the denominators gives

}(A \B)[1 �}(B)] = }(B)[}(A)�}(A \B)]:Subtracting common terms from both sides shows that }(A\B) =}(A)}(B).Since this sequence of calculations is reversible, and since the condition }(A \B) = }(A)}(B) is symmetric in A and B, it follows that A does not dependon B if and only if B does not depend on A.

When events A and B satisfy

P(A ∩ B) = P(A) P(B),   (1.29)

we say they are statistically independent, or just independent.


Caution: The reader is warned to make sure he or she understands thedi�erence between disjoint sets and independent events. Recall thatA andB aredisjoint if A\B = 6 . This concept does not involve} in any way; to determineif A and B are disjoint requires only knowledge of A and B themselves. Onthe other hand, (1.29) implies that independence does depend on } and notjust on A and B. To determine if A and B are independent requires not onlyknowledge of A and B, but also knowledge of }. See Problem 48.

In arriving at (1.29) as the de�nition of independent events, we noted that(1.29) is equivalent to (1.28). Hence, if A and B are independent, }(AjB) =}(AjBc). What is this common value? Write

}(AjB) =}(A \B)}(B)

=}(A)}(B)}(B)

= }(A):

We now make some further observations about independence. First, it is asimple exercise to show that if A and B are independent events, then so are Aand Bc, Ac and B, and Ac and Bc. For example, writing

}(A) = }(A \B) +}(A \Bc)

= }(A)}(B) +}(A \Bc);

we have

}(A \Bc) = }(A) �}(A)}(B)= }(A)[1�}(B)]= }(A)}(Bc):

By interchanging the roles of A and Ac and/or B and Bc, it follows that if anyone of the four pairs is independent, then so are the other three.

Example 1.21. An Internet packet travels from its source to router 1,from router 1 to router 2, and from router 2 to its destination. If routers droppackets independently with probability p, what is the probability that a packetis successfully transmitted from its source to its destination?

Solution. For i = 1, 2, let Di denote the event that the packet is dropped by router i. Let S denote the event that the packet is successfully transmitted from the source to the destination. Observe that S occurs if and only if the packet is not dropped by router 1 and it is not dropped by router 2. We can write this symbolically as

S = D_1^c \cap D_2^c.

Since the problem tells us that D1 and D2 are independent events, so are D1c and D2c. Hence,

P(S) = P(D_1^c \cap D_2^c) = P(D_1^c)\,P(D_2^c) = [1 - P(D_1)][1 - P(D_2)] = (1-p)^2.
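A quick simulation makes the role of independence concrete. The Matlab sketch below is our illustration only; the drop probability p = 0.1 is an assumed value:

% Simulation of Example 1.21 (illustration only).
p = 0.1;  n = 1e6;                % assumed drop probability and number of packets
d1 = rand(1,n) < p;               % router 1 drops the packet
d2 = rand(1,n) < p;               % router 2 drops the packet, independently
phat = mean(~d1 & ~d2);           % fraction of packets that survive both routers
fprintf('estimate = %g, exact = %g\n', phat, (1-p)^2)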


Now suppose that A and B are any two events. If }(B) = 0, then we claimthat A and B are independent. We must show that

}(A \B) = }(A)}(B) = 0:

To show that the left-hand side is zero, observe that since probabilities arenonnegative, and since A \B � B,

0 � }(A \B) � }(B) = 0: (1.30)

We now show that if }(B) = 1, then A and B are independent. Since}(B) = 1,}(Bc) = 1�}(B) = 0, and it follows that A and Bc are independent.But then so are A and B.

Independence for More Than Two Events

Suppose that for j = 1; 2; : : :; Aj is an event. When we say that the Aj areindependent, we certainly want that for any i 6= j,

}(Ai \Aj) = }(Ai)}(Aj):

And for any distinct i; j; k, we want

}(Ai \Aj \Ak) = }(Ai)}(Aj)}(Ak):

We want analogous equations to hold for any four events, �ve events, and soon. In general, we want that for every �nite subset J containing two or morepositive integers,

}

� \j2J

Aj

�=Yj2J

}(Aj):

In other words, we want the probability of every intersection involving �nitelymany of the Aj to be equal to the product of the probabilities of the individualevents. If the above equation holds for all �nite subsets of two or more positiveintegers, then we say that the Aj are mutually independent, or just inde-pendent. If the above equation holds for all subsets J containing exactly twopositive integers but not necessarily for all �nite subsets of 3 or more positiveintegers, we say that the Aj are pairwise independent.

Example 1.22. Given three events, say A, B, and C, they are mutuallyindependent if and only if the following equations all hold,

}(A \B \C) = }(A)}(B)}(C)

}(A \B) = }(A)}(B)

}(A \C) = }(A)}(C)

}(B \C) = }(B)}(C):


It is possible to construct events A, B, and C such that the last three equationshold (pairwise independence), but the �rst one does not.4 It is also possiblefor the �rst equation to hold while the last three fail.5

Example 1.23. A coin is tossed three times and the number of heads is noted. Find the probability that the number of heads is two, assuming the tosses are mutually independent and that on each toss the probability of heads is λ for some fixed 0 ≤ λ ≤ 1.

Solution. To solve the problem, let Hi denote the event that the ith toss is heads (so P(Hi) = λ), and let S2 denote the event that the number of heads in three tosses is 2.∗ Then

S_2 = (H_1 \cap H_2 \cap H_3^c) \cup (H_1 \cap H_2^c \cap H_3) \cup (H_1^c \cap H_2 \cap H_3).

This is a disjoint union, and so P(S2) is equal to

P(H_1 \cap H_2 \cap H_3^c) + P(H_1 \cap H_2^c \cap H_3) + P(H_1^c \cap H_2 \cap H_3).   (1.31)

Next, since H1, H2, and H3 are mutually independent, so are (H1 ∩ H2) and H3. Hence, (H1 ∩ H2) and H3c are also independent. Thus,

P(H_1 \cap H_2 \cap H_3^c) = P(H_1 \cap H_2)\,P(H_3^c) = P(H_1)\,P(H_2)\,P(H_3^c) = \lambda^2(1-\lambda).

Treating the last two terms in (1.31) similarly, we have P(S2) = 3λ²(1 − λ). If the coin is fair, i.e., λ = 1/2, then P(S2) = 3/8.
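As a check, the formula P(S2) = 3λ²(1 − λ) can be compared against a simulation of three independent tosses. This Matlab sketch is an illustration only (λ = 1/2 is the fair-coin case):

% Simulation of Example 1.23 (illustration only).
lambda = 0.5;  n = 1e6;
tosses = rand(3,n) < lambda;      % each column is one run of three tosses
heads  = sum(tosses,1);           % number of heads in each run
phat = mean(heads == 2);
fprintf('estimate = %g, exact = %g\n', phat, 3*lambda^2*(1-lambda))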

Example 1.24. If A1; A2; : : : are mutually independent, show that

}

� 1\n=1

An

�=

1Yn=1

}(An):

Solution. Write

}

� 1\n=1

An

�= lim

N!1}

� N\n=1

An

�; by limit property (1.14);

= limN!1

NYn=1

}(An); by independence;

=1Yn=1

}(An);

∗In working this example, we again do not explicitly specify the sample space Ω or the probability measure P. The interested reader can find one possible choice for Ω and P in Note 6.


where the last step is just the de�nition of the in�nite product.

Example 1.25. Consider an in�nite sequence of independent coin tosses.Assume that the probability of heads is 0 < p < 1. What is the probability ofseeing all heads? What is the probability of ever seeing heads?

Solution. We use the result of the preceding example as follows. Let be a sample space equipped with a probability measure } and events An; n =1; 2; : : : ; with }(An) = p, where the An are mutually independent.7 The eventAn corresponds to, or models, the outcome that the nth toss results in a heads.The outcome of seeing all heads corresponds to the event

T1n=1An, and its

probability is

}

� 1\n=1

An

�= lim

N!1

NYn=1

}(An) = limN!1

pN = 0:

The outcome of ever seeing heads corresponds to the event A :=S1n=1An.

Since}(A) = 1�}(Ac), it su�ces to compute the probability of Ac =T1n=1A

cn.

Arguing exactly as above, we have

}

� 1\n=1

Acn

�= lim

N!1

NYn=1

}(Acn) = limN!1

(1� p)N = 0:

Thus, }(A) = 1� 0 = 1.

1.6. Combinatorics and Probability

Note to the Reader. The material in this section is not required anywhere else in the book, and can be covered at any time.

There are many probability problems, especially those concerned with gambling, that can ultimately be reduced to questions about cardinalities of various sets. We saw several examples in Section 1.2. Those examples were simple, and they were chosen so that it was easy to determine the cardinalities of the required sets. However, in more complicated problems, it is extremely helpful to have some systematic methods for finding cardinalities of sets. Combinatorics is the study of systematic counting methods, which we will be using to find the cardinalities of various sets that arise in probability. The four kinds of counting problems we discuss are

(i) ordered sampling with replacement;
(ii) ordered sampling without replacement;
(iii) unordered sampling without replacement;
(iv) unordered sampling with replacement.

Of these, the �rst two are rather straightforward, and the last two are somewhatcomplicated.


Ordered Sampling with Replacement

Before stating the problem, we begin with some examples to illustrate theconcepts to be used.

Example 1.26. Let A, B, and C be �nite sets. How many triples are thereof the form (a; b; c), where a 2 A, b 2 B, and c 2 C?

Solution. Since there are jAj choices for a, jBj choices for b, and jCj choicesfor c, the total number of triples is jAj � jBj � jCj.

Similar reasoning shows that for k �nite sets A1; : : : ; Ak, there are jA1j � � � jAkjk-tuples of the form (a1; : : : ; ak) where each ai 2 Ai.

Example 1.27. How many k-digit numbers are there?

Solution. Taking \digit" to mean \decimal digit," we put each Ai :=f0; : : : ; 9g. Since each jAij = 10, there are 10k k-digit numbers.

As the previous example shows, it is often convenient to take all of the Aito be the same set.

Example 1.28 (Ordered Sampling with Replacement). From a deck of ncards, we draw k cards with replacement; i.e., we draw each card, make a noteof it, put the card back in the deck and re-shu�e the deck before choosing thenext card. How many di�erent sequences of k cards can be drawn in this way?

Solution. Although there is only one deck of cards, because we draw withreplacement and re-shu�e, we may as well think of ourselves as having k decksof n cards each. Think of each Ai := f1; : : : ; ng. Then we need to know howmany k-tuples there are of the form (a1; : : : ; ak) where each ai 2 Ai. We seethat the answer is nk.

Ordered Sampling without Replacement

In Example 1.26, we formed triples (a; b; c) where no matter which a 2 Awe chose, it did not a�ect which elements we were allowed to choose from thesets B or C. We next consider the construction of k-tuples in which our choicefor the each entry a�ects the choices available for the remaining entries.

Example 1.29 (Ordered Sampling without Replacement). From a deck ofn cards, we draw k � n cards without replacement. How many sequences canbe drawn in this way?


Solution. There are n cards for the first draw, n − 1 cards for the second draw, and so on. Hence, there are

n(n-1) \cdots (n-[k-1]) = \frac{n!}{(n-k)!}

different sequences.

Example 1.30. Let A be a �nite set of n elements. How may k-tuples(a1; : : : ; ak) of distinct entries ai 2 A can be formed?

Solution. There are n choices for a1, but only n � 1 choices for a2 sincerepeated entries are not allowed. Similarly, there are only n� 2 choices for a3,and so on. This is the same argument used in the previous example. Hence,there are n!=(n� k)! k-tuples with distinct elements of A.

Given a set A, we let Ak denote the set of all k-tuples (a1; : : : ; ak) whereeach ai 2 A. We denote by Ak� the subset of all k-tuples with distinct entries.If jAj = n, then jAkj = jAjk = nk, and jAk�j = n!=(n� k)!.

Example 1.31 (The Birthday Problem). In a group of k people, what isthe probability that two or more people have the same birthday?

Solution. The first step in the solution is to specify the sample space Ω and the probability P. Let D := {1, …, 365} denote the days of the year, and let

Ω := {(d1, …, dk) : di ∈ D}

denote the set of all possible sequences of k birthdays. Then |Ω| = |D|^k. Assuming all sequences are equally likely, we take P(E) := |E|/|Ω| for arbitrary events E ⊂ Ω.

Let Q denote the set of sequences (d1, …, dk) that have at least one pair of repeated entries. For example, if k = 9, one of the sequences in Q would be

(364, 17, 201, 17, 51, 171, 51, 33, 51).

Notice that 17 appears twice and 51 appears 3 times. The set Q is very complicated. On the other hand, consider Qc, which is the set of sequences (d1, …, dk) that have no repeated entries. Then

|Q^c| = \frac{|D|!}{(|D|-k)!},

and

P(Q^c) = \frac{|Q^c|}{|\Omega|} = \frac{|D|!}{|D|^k\,(|D|-k)!},

where |D| = 365. A plot of P(Q) = 1 − P(Qc) as a function of k is shown in Figure 1.10. As the dashed line indicates, for k ≥ 23, the probability of two or more people having the same birthday is greater than 1/2.
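A curve like the one in Figure 1.10 can be generated with a few lines of Matlab. The sketch below (not from the text) evaluates P(Qc) on a log scale via gammaln so the factorials stay manageable:

% Computation behind Figure 1.10 (illustration only).
D = 365;  k = 1:55;
logPQc = gammaln(D+1) - gammaln(D-k+1) - k*log(D);   % log of |D|!/(|D|^k (|D|-k)!)
PQ = 1 - exp(logPQc);                                % P(Q) = 1 - P(Q^c)
plot(k, PQ), grid on, xlabel('k'), ylabel('P(Q)')
fprintf('P(Q) at k = 23 is %g\n', PQ(23))            % about 0.507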


Figure 1.10. A plot of P(Q) as a function of k. For k ≥ 23, the probability of two or more people having the same birthday is greater than 1/2.

Unordered Sampling without Replacement

Before stating the problem, we begin with a simple example to illustrate theconcept to be used.

Example 1.32. Let A = f1; 2; 3; 4;5g. Then A3 contains 53 = 125 triples.The set of triples with distinct entries, A3�, contains 5!=2! = 60 triples. We canwrite A3� as the disjoint union

A3� = G123 [G124 [G125 [G134 [G135

[G145 [G234 [G235 [G245 [G345;

where for distinct i; j; k,

Gijk := f(i; j; k); (i; k; j); (j; i; k); (j; k; i); (k; i; j); (k; j; i)g:

Each triple inGijk is a rearrangement, or permutation, of the same 3 elements.

The above decomposition works in general. Write Ak� as the union of disjointsets,

Ak� =[G; (1.32)

where each subset G consists of k-tuples that contain the same elements. Ingeneral, for a k-tuple built from k distinct elements, there are k choices for the�rst entry, k � 1 choices for the second entry, and so on. Hence, there are k!


k-tuples that can be built. In other words, each G in (1.32) has |G| = k!. It follows from (1.32) that

|A^k_*| = (\text{number of different sets } G) \times k!,   (1.33)

and so the number of different subsets G is

\frac{|A^k_*|}{k!} = \frac{n!}{k!\,(n-k)!}.

The standard notation for the above right-hand side is

\binom{n}{k} := \frac{n!}{k!\,(n-k)!}

and is read "n choose k." In Matlab, \binom{n}{k} = nchoosek(n,k). The symbol \binom{n}{k} is also called the binomial coefficient because it arises in the binomial theorem, which is discussed in Chapter 2.
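For instance, the following one-line check (an illustration, not from the text) confirms that nchoosek agrees with the factorial formula:

% Compare nchoosek with n!/(k!(n-k)!) for n = 5, k = 3 (both give 10).
n = 5;  k = 3;
fprintf('%d  %d\n', nchoosek(n,k), factorial(n)/(factorial(k)*factorial(n-k)))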

Example 1.33 (Unordered Sampling without Replacement). In many cardgames, we are dealt a hand of k cards, but the order in which the cards aredealt is not important. From a deck of n cards, how many k-card hands arepossible?

Solution. First think about ordered hands corresponding to k-tuples withdistinct entries. The set of all such hands corresponds to Ak� . Now grouptogether k-tuples composed of the same elements into sets G as in (1.32). Allthe ordered k-tuples in a particular G represent rearrangements of a single hand.So it is really the number of di�erent sets G that corresponds to the number ofunordered hands. Thus, the number of k-card hands is

�nk

�.

Example 1.34. A 12-person jury is to be selected from a group of 20potential jurors. How many di�erent juries are possible?

Solution. There are

\binom{20}{12} = \frac{20!}{12!\,8!} = 125970

different juries.

Example 1.35. A 12-person jury is to be selected from a group of 20potential jurors of which 11 are men and 9 are women. How many 12-personjuries are there with 5 men and 7 women?

Solution. There are \binom{11}{5} ways to choose the 5 men, and there are \binom{9}{7} ways to choose the 7 women. Hence, there are

\binom{11}{5}\binom{9}{7} = \frac{11!}{5!\,6!} \times \frac{9!}{7!\,2!} = 16632

possible juries with 5 men and 7 women.

Example 1.36. An urn contains 11 green balls and 9 red balls. If 12 balls are chosen at random, what is the probability of choosing exactly 5 green balls and 7 red balls?

Solution. Since balls are chosen at random, the desired probability is

\frac{\text{number of ways to choose 5 green balls and 7 red balls}}{\text{number of ways to choose 12 balls}}.

In the numerator, the 5 green balls must be chosen from the 11 available green balls, and the 7 red balls must be chosen from the 9 available red balls. In the denominator, the total of 5 + 7 = 12 balls must be chosen from the 11 + 9 = 20 available balls. So the required probability is

\frac{\binom{11}{5}\binom{9}{7}}{\binom{20}{12}} = \frac{16632}{125970} \approx 0.132.
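In Matlab the probability above can be evaluated directly with nchoosek; the following sketch (not from the text) reproduces the value 0.132:

% Example 1.36 via nchoosek (illustration only).
p = nchoosek(11,5)*nchoosek(9,7)/nchoosek(20,12);
fprintf('P(5 green and 7 red) = %g\n', p)    % about 0.132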

Example 1.37. Consider a collection of N items, of which d are defective (and N − d work properly). Suppose we test n ≤ N items at random. Show that the probability that k of the n tested items are defective is

\frac{\binom{d}{k}\binom{N-d}{n-k}}{\binom{N}{n}}.   (1.34)

Solution. Since items are chosen at random, the desired probability is

\frac{\text{number of ways to choose } k \text{ defective and } n-k \text{ working items}}{\text{number of ways to choose } n \text{ items}}.

In the numerator, the k defective items are chosen from the total of d defective ones, and the n − k working items are chosen from the total of N − d ones that work. In the denominator, the n items to be tested are chosen from the total of N items. Hence, the desired numerator is \binom{d}{k}\binom{N-d}{n-k}, and the desired denominator is \binom{N}{n}.

Example 1.38 (Lottery). In some state lottery games, a player chooses ndistinct numbers from the set f1; : : : ; Ng. At the lottery drawing, balls num-bered from 1 to N are mixed, and n balls withdrawn. What is the probabilitythat k of the n balls drawn match the player's choices?


Solution. Let D denote the subset of n numbers chosen by the player.Then f1; : : : ; Ng = D [Dc. We need to �nd the probability that the lotterydrawing chooses k numbers fromD and n�k numbers fromDc. Since jDj = n,this probability is �

n

k

��N � n

n � k

��N

n

� :

Notice that this is just (1.34) with d = n. In other words, we regard the numberschosen by the player as \defective," and we are �nding the probability that thelottery drawing chooses k defective and n � k non-defective numbers.

Example 1.39 (Binomial Probabilities). A certain coin has probability pof turning up heads. If the coin is tossed n times, what is the probability thatk of the n tosses result in heads? Assume tosses are independent.

Solution. Let Hi denote the event that the ith toss is heads. We call i the toss index, which takes values 1, …, n. A typical sequence of n tosses would be

H_1 \cap H_2^c \cap H_3 \cap \cdots \cap H_{n-1} \cap H_n^c,

where Hic is the event that the ith toss is tails. The probability that n tosses result in k heads and n − k tails is

P\Bigl(\bigcup \tilde{H}_1 \cap \cdots \cap \tilde{H}_n\Bigr),

where H̃i is either Hi or Hic, and the union is over all such intersections for which H̃i = Hi occurs k times and H̃i = Hic occurs n − k times. Since this is a disjoint union,

P\Bigl(\bigcup \tilde{H}_1 \cap \cdots \cap \tilde{H}_n\Bigr) = \sum P\bigl(\tilde{H}_1 \cap \cdots \cap \tilde{H}_n\bigr).

By independence,

P\bigl(\tilde{H}_1 \cap \cdots \cap \tilde{H}_n\bigr) = P\bigl(\tilde{H}_1\bigr) \cdots P\bigl(\tilde{H}_n\bigr) = p^k(1-p)^{n-k}

is the same for every term in the sum. The number of terms in the sum is the number of ways of selecting k out of n toss indexes to assign to heads. Since this number is \binom{n}{k}, the probability that k of n tosses result in heads is

\binom{n}{k}\, p^k (1-p)^{n-k}.
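These binomial probabilities are easy to tabulate numerically. The Matlab sketch below is an illustration only; the values n = 10 and p = 0.3 are assumed for the sake of the example:

% Binomial probabilities of Example 1.39 for assumed n and p.
n = 10;  p = 0.3;  k = 0:n;
binom = arrayfun(@(kk) nchoosek(n,kk), k);   % n choose k for k = 0,...,n
pk = binom .* p.^k .* (1-p).^(n-k);          % P(k heads in n tosses)
fprintf('sum of probabilities = %g\n', sum(pk))   % should be 1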


Example 1.40 (Bridge). In bridge, 52 cards are dealt to 4 players; hence, each player has 13 cards. The order in which the cards are dealt is not important, just the final 13 cards each player ends up with. How many different bridge games can be dealt?

Solution. There are \binom{52}{13} ways to choose the 13 cards of the first player. Now there are only 52 − 13 = 39 cards left. Hence, there are \binom{39}{13} ways to choose the 13 cards for the second player. Similarly, there are \binom{26}{13} ways to choose the third player's cards, and \binom{13}{13} = 1 way to choose the fourth player's cards. It follows that there are

\binom{52}{13}\binom{39}{13}\binom{26}{13}\binom{13}{13} = \frac{52!}{13!\,39!} \times \frac{39!}{13!\,26!} \times \frac{26!}{13!\,13!} \times \frac{13!}{13!\,0!} = \frac{52!}{(13!)^4} \approx 5.36 \times 10^{28}

games that can be dealt.
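The product above collapses to a single ratio, and the final count is easy to evaluate in Matlab on a log scale. The sketch below is an illustration only:

% Number of bridge deals, 52!/(13!)^4, computed via gammaln.
logN = gammaln(53) - 4*gammaln(14);                   % log(52!) - 4*log(13!)
fprintf('number of deals is about %g\n', exp(logN))   % about 5.36e28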

Example 1.41. Traditionally, computers use binary arithmetic, and storen-bit words composed of zeros and ones. The new m{Computer uses m-aryarithmetic, and stores n-symbol words in which the symbols (m-ary digits)come from the set f0; 1; : : :;m � 1g. How many n-symbol words are therewith k0 zeros, k1 ones, k2 twos, : : : , and km�1 copies of symbol m � 1, wherek0 + k1 + k2 + � � �+ km�1 = n?

Solution. To answer this question, we build a typical n-symbol word ofthe required form as follows. We begin with an empty word,

( ; ; : : : ; )| {z }n empty positions

:

From these n available positions, there are�nk0

�ways to select positions to put

the k0 zeros. For example, if k0 = 3, we might have

( ; 0; ; 0; ; : : : ; ; 0)| {z }n� 3 empty positions

:

Now there are only n� k0 empty positions. From these, there are�n�k0k1

�ways

to select positions to put the k1 ones. For example, if k1 = 1, we might have

( ; 0; 1; 0; ; : : : ; ; 0)| {z }n� 4 empty positions

:

Now there are only n�k0�k1 empty positions. From these, there are�n�k0�k1

k2

�ways to select positions to put the k2 twos. Continuing in this way, we �nd


that the number of n-symbol words with the required numbers of zeros, ones,twos, etc., is�

n

k0

��n� k0k1

��n� k0 � k1

k2

�� � ��n� k0 � k1 � � � � � km�2

km�1

�;

which expands to

n!

k0! (n� k0)!� (n� k0)!

k1! (n� k0 � k1)!� (n � k0 � k1)!

k2! (n� k0 � k1 � k2)!� � �

� � � (n � k0 � k1 � � � � � km�2)!km�1! (n� k0 � k1 � � � � � km�1)!

:

Canceling common factors and noting that (n�k0�k1�� � ��km�1)! = 0! = 1,we obtain

n!

k0! k1! � � �km�1!as the number of n-symbol words with k0 zeros, k1 ones, etc.

We call

\binom{n}{k_0, \ldots, k_{m-1}} := \frac{n!}{k_0!\,k_1!\cdots k_{m-1}!}

the multinomial coefficient. When m = 2,

\binom{n}{k_0, k_1} = \binom{n}{k_0, n-k_0} = \frac{n!}{k_0!\,(n-k_0)!} = \binom{n}{k_0}

becomes the binomial coefficient.
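A small numerical case may help fix the notation; the counts below are assumed purely for illustration and are not from the text:

% Multinomial coefficient n!/(k0! k1! ... k_{m-1}!) for assumed counts.
kcounts = [3 1 1];                          % k0 = 3 zeros, k1 = 1 one, k2 = 1 two
n = sum(kcounts);                           % n = 5 symbols in the word
M = factorial(n)/prod(factorial(kcounts));  % 5!/(3! 1! 1!) = 20 words
fprintf('number of words = %d\n', M)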

Unordered Sampling with Replacement

Before stating the problem, we begin with a simple example to illustrate theconcepts involved.

Example 1.42. An automated snack machine dispenses apples, bananas,and carrots. For a �xed price, the customer gets 5 items from among the 3possible choices. For example, a customer could choose 1 apple, 2 bananas,and 2 carrots. To record the customer's choices electronically, 7-bit sequencesare used. For example, (0; 1; 0; 0; 1; 0; 0) means 1 apple, 2 bananas, and 2 car-rots. The �rst group of zeros tells how many apples, the second group of zerostells how many bananas, and the third group of zeros tells how many car-rots. The ones are used to separate the groups of zeros. As another example,(0; 0; 0; 1; 0;1;0) means 3 apples, 1 banana, and 1 carrot. How many customerchoices are there?

Solution. The question is equivalent to asking how many 7-bit sequences there are with 5 zeros and 2 ones. From Example 1.41, the answer is

\binom{7}{5,2} = \binom{7}{5} = \binom{7}{2}.


Example 1.43 (Unordered Sampling with Replacement). From the set A = {1, 2, …, n}, k numbers are drawn with replacement. How many different sets of k numbers can be obtained in this way?

Solution. We identify the sets drawn in this way with binary sequences with k zeros and n − 1 ones, where the ones break up the groups of zeros as in the previous example. Hence, there are

\binom{k+n-1}{k} = \binom{k+n-1}{n-1}

ways to draw k items with replacement from n items.
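For the snack machine of Example 1.42 this count is immediate to evaluate; the sketch below (an illustration, not from the text) uses n = 3 item types and k = 5 selections:

% Unordered sampling with replacement: (k+n-1 choose k) selections.
n = 3;  k = 5;
fprintf('number of selections = %d\n', nchoosek(k+n-1, k))   % 21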

Just as we partitioned Ak� in (1.32), we can partition Ak using

Ak =[G;

where each G contains all k-tuples with the same elements. Unfortunately,di�erent Gs may contain di�erent numbers of k-tuples. For example, if n = 3and k = 3, one of the sets G would be

f(1; 2; 3); (1; 3; 2); (2;1; 3); (2;3; 1); (3;1;2); (3;2;1)g;

while another G would be

f(1; 2; 2); (2; 1; 2); (2;2;1)g:

How many di�erent sets G are there? Although we cannot �nd the answer byusing an equation like (1.33), we see from the above analysis that there are�k+n�1

k

�sets G.

Notes

§1.3: Axioms and Properties of Probability

Note 1. When the sample space is �nite or countably in�nite, }(A) isusually de�ned for all subsets of by taking

}(A) :=X!2A

p(!)

for some nonnegative function p that sums to one; i.e., p(!) � 0 andP

!2 p(!)= 1. (It is easy to check that if } is de�ned in this way, then it satis�es theaxioms of a probability measure.) However, for larger sample spaces, such aswhen is an interval of the real line, e.g., Example 1.16, and we want theprobability of an interval to be proportional to its length, it is not possible tode�ne }(A) for all subsets and still have } satisfy all four axioms. (A proofof this fact can be found in advanced texts, e.g., [4, p. 45].) The way aroundthis di�culty is to de�ne }(A) only for some subsets of , but not all subsetsof . It is indeed fortunate that this can be done in such a way that }(A) isde�ned for all subsets of interest that occur in practice. A set A for which }(A)


is de�ned is called an event, and the collection of all events is denoted by A.The triple (;A;}) is called a probability space.

Given that }(A) is de�ned only for A 2 A, in order for the probabilityaxioms stated in Section 1.3 to make sense, A must have certain properties.First, axiom (i) requires that 6 2 A. Second, axiom (iv) requires that 2 A.Third, axiom (iii) requires that if A1; A2; : : : are mutually exclusive events, thentheir union,

S1n=1An, is also an event. Additionally, we need that if A1; A2; : : :

are arbitrary events, then so is their intersection,T1n=1An. We show below

that these four requirements are satis�ed if we assume only that A is a �-�eld.

If A is a collection of subsets of with the following properties, then A iscalled a �-�eld or a �-algebra.

(i) The empty set 6 belongs to A, i.e., 6 2 A.(ii) If A 2 A, then so does its complement, Ac, i.e., A 2 A implies Ac 2 A.(iii) If A1; A2; : : : belong to A, then so does their union,

S1n=1An.

If A is a �-�eld, then 6 and 6 c = belong to A. Also, for any An 2 A,their union is in A, and by De Morgan's law, so is their intersection.

Given any set , let 2 denote the collection of all subsets of . We call2 the power set of . This notation is used for both �nite and in�nite sets.The notation is motivated by the fact that if is a �nite set, then there are2jj di�erent subsets of .y Since the power set obviously satis�es the threeproperties above, the power set is a �-�eld.

Let C be any collection of subsets of . We do not assume C is a �-�eld.De�ne �(C) to be the smallest �-�eld that contains C. By this we meanthat if D is any �-�eld with C � D, then �(C) � D.

Example 1.44. Let A be a nonempty subset of , and put C = fAg sothat the collection C consists of a single subset. Find �(C).

Solution. From the three properties of a �-�eld, any �-�eld that containsA must also contain Ac, 6 , and . We claim

�(C) = f6 ; A;Ac;g:ySuppose = f!1; : : : ; !ng. Each subset of can be associated with an n-bit word. A

point !i is in the subset if and only if the ith bit in the word is a 1. For example, if n = 5,we associate 01011 with the subset f!2; !4; !5g since bits 2, 4, and 5 are ones. In particular,00000 corresponds to the empty set and 11111 corresponds to itself. Since there are 2n

n-bit words, there are 2n subsets of .


Since A[Ac = , it is easy to see that our choice satis�es the three propertiesof a �-�eld. It is also clear that if D is any �-�eld such that C � D, then everysubset in our choice for �(C) must belong to D; i.e., �(C) � D.

More generally, if A1; : : : ; An is a partition of , then �(fA1; : : : ; Ang)consists of the empty set along with the 2n � 1 subsets constructed by takingall possible unions of the Ai. See Problem 33.

For general collections C of subsets of , all we can say is that (Problem 37)

�(C) =\C�A

A;

where the intersection is over all �-�elds A that contain C. Note that there isalways at least one �-�eld A that contains C; e.g., the power set.

Note 2. In light of the preceding note, we see that to guarantee that}(f!ng) is de�ned in Example 1.18, it is necessary to assume that the sin-gleton sets f!ng are events, i.e., f!ng 2 A.

§1.4: Conditional Probability

Note 3. Here is a choice for and } for Example 1.19. Let

:= f(e; c) : e; c = 0 or 1g;where e = 1 corresponds to exposure to PCBs, and c = 1 corresponds todeveloping cancer. We then take

E := f(e; c) : e = 1g = f (1; 0) ; (1; 1) g;

andC := f(e; c) : c = 1g = f (0; 1) ; (1; 1) g:

It follows that

Ec = f (0; 1) ; (0; 0) g and Cc = f (1; 0) ; (0; 0) g:Hence, E \C = f(1; 1)g, E \Cc = f(1; 0)g, Ec \C = f(0; 1)g, and Ec \ Cc =f(0; 0)g.

In order to specify a suitable probability measure on , we work backwards.First, if a measure } on exists such that (1.25) and (1.26) hold, then

}(f(1; 1)g) = }(E \C) =}(CjE)}(E) = 1=4;

}(f(0; 1)g) = }(Ec \ C) =}(CjEc)}(Ec) = 1=16;

}(f(1; 0)g) = }(E \Cc) =}(CcjE)}(E) = 1=2;

}(f(0; 0)g) = }(Ec \ Cc) = }(CcjEc)}(Ec) = 3=16:


This suggests that we de�ne } by

}(A) :=X!2A

p(!);

where p(!) = p(e; c) is given by p(1; 1) := 1=4, p(0; 1) := 1=16, p(1; 0) := 1=2,and p(0; 0) := 3=16. Starting from this de�nition of }, it is not hard to checkthat (1.25) and (1.26) hold.

§1.5: Independence

Note 4. Here is an example of three events that are pairwise independent,but not mutually independent. Let

:= f1; 2; 3; 4; 5;6;7g;and put }(f!g) := 1=8 for ! 6= 7, and }(f7g) := 1=4. Take A := f1; 2; 7g,B := f3; 4; 7g, and C := f5; 6; 7g. Then }(A) = }(B) = }(C) = 1=2. and}(A \ B) = }(A \ C) = }(B \ C) = }(f7g) = 1=4. Hence, A and B, A andC, and B and C are pairwise independent. However, since }(A \ B \ C) =}(f7g) = 1=4, and since }(A)}(B)}(C) = 1=8, A, B, and C are not mutuallyindependent.

Note 5. Here is an example of three events for which }(A \ B \ C) =}(A)}(B)}(C) but no pair is independent. Let := f1; 2; 3; 4g. Put}(f1g) =}(f2g) = }(f3g) = p and }(f4g) = q, where 3p + q = 1 and 0 � p; q � 1.Put A := f1; 4g, B := f2; 4g, and C := f3; 4g. Then the intersection of anypair is f4g, as is the intersection of all three sets. Also, }(f4g) = q. Since}(A) = }(B) = }(C) = p+q, we require (p+q)3 = q and (p+q)2 6= q. Solving3p+ q = 1 and (p + q)3 = q for q reduces to solving 8q3 + 12q2 � 21q + 1 = 0.Now, q = 1 is obviously a root, but this results in p = 0, which implies mutualindependence. However, since q = 1 is a root, it is easy to verify that

8q3 + 12q2 � 21q+ 1 = (q � 1)(8q2 + 20q� 1):

By the quadratic formula, the desired root is q =��5+3

p3�=4. It then follows

that p =�3 � p

3�=4 and that p + q =

��1 + p3�=2. Now just observe that

(p+ q)2 6= q.

Note 6. Here is a choice for and } for Example 1.23. Let

:= f(i; j; k) : i; j; k = 0 or 1g;with 1 corresponding to heads and 0 to tails. Now put

H1 := f(i; j; k) : i = 1g;H2 := f(i; j; k) : j = 1g;H3 := f(i; j; k) : k = 1g;


and observe that

H1 = f (1; 0; 0) ; (1; 0; 1) ; (1; 1; 0) ; (1; 1; 1) g;H2 = f (0; 1; 0) ; (0; 1; 1) ; (1; 1; 0) ; (1; 1; 1) g;H3 = f (0; 0; 1) ; (0; 1; 1) ; (1; 0; 1) ; (1; 1; 1) g:

Next, let }(f(i; j; k)g) := �i+j+k(1� �)3�(i+j+k). Since

Hc3 = f (0; 0; 0) ; (1; 0; 0) ; (0; 1; 0) ; (1; 1; 0) g;

H1 \ H2 \Hc3 = f(1; 1; 0)g. Similarly, H1 \Hc

2 \ H3 = f(1; 0; 1)g, and Hc1 \

H2 \H3 = f(0; 1; 1)g. Hence,

S2 = f (1; 1; 0) ; (1; 0; 1) ; (0; 1; 1) g= f(1; 1; 0)g[ f(1; 0; 1)g[ f(0; 1; 1)g;

and thus, }(S2) = 3�2(1� �).

Note 7. To show the existence of a sample space and probability measurewith such independent events is beyond the scope of this book. Such construc-tions can be found in more advanced texts such as [4, Section 36].

Problems

§1.1: Review of Set Notation

1. For real numbers �1 < a < b <1, we use the following notation.

(a; b] := fx : a < x � bg(a; b) := fx : a < x < bg[a; b) := fx : a � x < bg[a; b] := fx : a � x � bg:

We also use

(�1; b] := fx : x � bg(�1; b) := fx : x < bg(a;1) := fx : x > ag[a;1) := fx : x � ag:

For example, with this notation, (0; 1]c = (�1; 0] [ (1;1) and (0; 2] [[1; 3) = (0; 3). Now analyze

(a) [2; 3]c,

(b) (1; 3) [ (2; 4),(c) (1; 3) \ [2; 4),


(d) (3; 6] n (5; 7).2. Sketch the following subsets of the x-y plane.

(a) Bz := f(x; y) : x+ y � zg for z = 0;�1;+1.(b) Cz := f(x; y) : x > 0; y > 0; and xy � zg for z = 1.

(c) Hz := f(x; y) : x � zg for z = 3.

(d) Jz := f(x; y) : y � zg for z = 3.

(e) Hz \ Jz for z = 3.

(f) Hz [ Jz for z = 3.

(g) Mz := f(x; y) : max(x; y) � zg for z = 3, where max(x; y) isthe larger of x and y. For example, max(7; 9) = 9. Of course,max(9; 7) = 9 too.

(h) Nz := f(x; y) : min(x; y) � zg for z = 3, where min(x; y) is thesmaller of x and y. For example, min(7; 9) = 7 = min(9; 7).

(i) M2 \N3.

(j) M4 \N3.

3. Let denote the set of real numbers, = (�1;1).

(a) Use the distributive law to simplify

[1; 4]\�[0; 2][ [3; 5]

�:

(b) Use De Morgan's law to simplify�[0; 1][ [2; 3]

�c.

(c) Simplify1\n=1

(�1=n; 1=n).

(d) Simplify1\n=1

[0; 3 + 1=(2n)).

(e) Simplify1[n=1

[5; 7� (3n)�1].

(f) Simplify1[n=1

[0; n].

4. Fix two sets A and C. If C � A, show that for every subset B,

(A \B) [C = A \ (B [C): (1.35)

Also show that if (1.35) holds for any set B, then C � A (and thus (1.35)holds for all sets B).


?5. Let A and B be subsets of . Put

I := f! 2 : ! 2 A implies ! 2 Bg:

Show that A \ I = A \B.?6. Explain why f : (�1;1)! [0;1) with f(x) = x3 is not well de�ned.

?7. Sketch f(x) = sin(x) for x 2 [��=2; �=2].(a) Determine, if possible, a choice of co-domainY such that f : [��=2; �=2]!

Y is invertible.

(b) Find the inverse image fx : f(x) � 1=2g.(c) Find the inverse image fx : f(x) < 0g.

?8. Sketch f(x) = sin(x) for x 2 [0; �].

(a) Determine, if possible, a choice of co-domain Y such that f : [0; �]!Y is invertible.

(b) Find the inverse image fx : f(x) � 1=2g.(c) Find the inverse image fx : f(x) < 0g.

?9. Let X be any set, and let A � X. De�ne the real-valued function f by

f(x) :=

�1; x 2 A;0; x =2 A:

Thus, f :X ! IR, where IR := (�1;1) denotes the real numbers. Forarbitrary B � IR, �nd f�1(B). Hint: There are four cases to consider,depending on whether 0 or 1 belong to B.

?10. Let f :X ! Y be a function such that f takes only n distinct values, sayy1; : : : ; yn. De�ne

Ai := f�1(fyig) = fx 2 X : f(x) = yig:

Show that for arbitrary B � Y , f�1(B) can be expressed as a union ofthe Ai. (It then follows that there are only 2n possibilities for f�1(B).)

?11. If f :X ! Y , show that inverse images preserve the following set opera-tions.

(a) If B � Y , show that f�1(Bc) = f�1(B)c.

(b) If Bn is a sequence of subsets of Y , show that

f�1� 1[n=1

Bn

�=

1[n=1

f�1(Bn):


(c) If Bn is a sequence of subsets of Y , show that

f�1� 1\n=1

Bn

�=

1\n=1

f�1(Bn):

?12. Show that if B and C are countable sets, then so is B [ C. Hint: IfB =

Sifbig and C =

Sifcig, put a2i := fbig and a2i�1 := fcig.

?13. Let C1, C2, : : :be countable sets. Show that

B :=1[i=1

Ci

is a countable set.

?14. Show that any subset of a countable set is countable.

?15. Show that if A � B and A is uncountable, then so is B.

?16. Show that the union of a countable set and an uncountable set is uncount-able.

§1.2: Probability Models

17. A letter of the alphabet (a{z) is generated at random. Specify a samplespace and a probability measure }. Compute the probability that avowel (a, e, i, o, u) is generated.

18. A collection of plastic letters, a{z, is mixed in a jar. Two letters aredrawn at random, one after the other. What is the probability of drawinga vowel (a, e, i, o, u) and a consonant in either order? Two vowels in anyorder? Specify your sample space and probability }.

19. A new baby wakes up exactly once every night. The time at which thebaby wakes up occurs at random between 9 pm and 7 am. If the parentsgo to sleep at 11 pm, what is the probability that the parents are notawakened by the baby before they would normally get up at 7 am? Specifyyour sample space and probability }.

20. For any real or complex number z 6= 1 and any positive integer N , derivethe geometric series formula

N�1Xk=0

zk =1� zN

1� z; z 6= 1:

Hint: Let SN := 1 + z + � � �+ zN�1, and show that SN � zSN = 1� zN .Then solve for SN .


Remark. If jzj < 1, jzjN ! 0 as N !1. Hence,

1Xk=0

zk =1

1� z; for jzj < 1:

21. Let := f1; : : : ; 6g. If p(!) = 2 p(! � 1) for ! = 2; : : : ; 6, and ifP6!=1 p(!) = 1, show that p(!) = 2!�1=63. Hint: Use Problem 20.

§1.3: Axioms and Properties of Probability

22. Let A and B be events for which }(A), }(B), and }(A [B) are known.Express the following in terms of these probabilities:

(a) }(A \B).(b) }(A \Bc).

(c) }(B [ (A \Bc)).

(d) }(Ac \Bc).

23. Let be a sample space equipped with two probability measures, }1 and}2. Given any 0 � � � 1, show that if }(A) := �}1(A) + (1 � �)}2(A),then } satis�es the four axioms of a probability measure.

24. Let be a sample space, and �x any point !0 2 . For any event A, put

�(A) :=

�1; !0 2 A;0; otherwise:

Show that � satis�es the axioms of a probability measure.

25. Suppose that instead of axiom (iii) of Section 1.3, we assume only thatfor any two disjoint events A and B, }(A [B) = }(A) +}(B). Use thisassumption and inductionz on N to show that for any �nite sequence ofpairwise disjoint events A1; : : : ; AN ,

}

� N[n=1

An

�=

NXn=1

}(An):

Using this result for �nite N , it is not possible to derive axiom (iii), whichis the assumption needed to derive the limit results of Section 1.3.

zIn this case, using induction on N means that you �rst verify the desired result for N = 2.Second, you assume the result is true for some arbitrary N � 2 and then prove the desiredresult is true for N + 1.


?26. The purpose of this problem is to show that any countable union can bewritten as a union of pairwise disjoint sets. Given any sequence of setsFn, de�ne a new sequence by A1 := F1, and

An := Fn \ F cn�1 \ � � � \ F c

1 ; n � 2:

Note that the An are pairwise disjoint. For �nite N � 1, show that

N[n=1

Fn =N[n=1

An:

Also show that 1[n=1

Fn =1[n=1

An:

?27. Use the preceding problem to show that for any sequence of events Fn,

}

� 1[n=1

Fn

�= lim

N!1}

� N[n=1

Fn

�:

?28. Use the preceding problem to show that for any sequence of events Gn,

}

� 1\n=1

Gn

�= lim

N!1}

� N\n=1

Gn

�:

29. The Finite Union Bound. Show that for any finite sequence of events $F_1, \ldots, F_N$,
$$
P\biggl(\bigcup_{n=1}^{N} F_n\biggr) \le \sum_{n=1}^{N} P(F_n).
$$
Hint: Use the inclusion-exclusion formula (1.12) and induction on $N$. See the last footnote for information on induction.

⋆30. The Infinite Union Bound. Show that for any infinite sequence of events $F_n$,
$$
P\biggl(\bigcup_{n=1}^{\infty} F_n\biggr) \le \sum_{n=1}^{\infty} P(F_n).
$$
Hint: Combine Problems 27 and 29.

⋆31. First Borel–Cantelli Lemma. Show that if $B_n$ is a sequence of events for which
$$
\sum_{n=1}^{\infty} P(B_n) < \infty, \tag{1.36}
$$


then
$$
P\biggl(\bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} B_k\biggr) = 0.
$$
Hint: Let $G := \bigcap_{n=1}^{\infty} G_n$, where $G_n := \bigcup_{k=n}^{\infty} B_k$. Now use Problem 28, the union bound of the preceding problem, and the fact that (1.36) implies
$$
\lim_{N \to \infty} \sum_{n=N}^{\infty} P(B_n) = 0.
$$

⋆32. Second Borel–Cantelli Lemma. Show that if $B_n$ is a sequence of independent events for which
$$
\sum_{n=1}^{\infty} P(B_n) = \infty,
$$
then
$$
P\biggl(\bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} B_k\biggr) = 1.
$$
Hint: The inequality $1 - P(B_k) \le \exp[-P(B_k)]$ may be helpful.§

⋆33. This problem assumes you have read Note 1. Let $A_1, \ldots, A_n$ be a partition of $\Omega$. If $\mathcal{C} := \{A_1, \ldots, A_n\}$, show that $\sigma(\mathcal{C})$ consists of the empty set along with all unions of the form
$$
\bigcup_{i} A_{k_i},
$$
where $k_i$ is a finite subsequence of distinct elements from $\{1, \ldots, n\}$.

⋆34. This problem assumes you have read Note 1. Let $\Omega$ be a sample space, and let $X\colon \Omega \to \mathbb{R}$, where $\mathbb{R}$ denotes the set of real numbers. Suppose the mapping $X$ takes finitely many distinct values $x_1, \ldots, x_n$. Find the smallest $\sigma$-field $\mathcal{A}$ of subsets of $\Omega$ such that for all $B \subset \mathbb{R}$, $X^{-1}(B) \in \mathcal{A}$. Hint: Problems 10 and 11.

⋆35. This problem assumes you have read Note 1. Let $\Omega := \{1, 2, 3, 4, 5\}$, and put $A := \{1, 2, 3\}$ and $B := \{3, 4, 5\}$. Put $P(A) := 5/8$ and $P(B) := 7/8$.

(a) Find $\mathcal{F} := \sigma(\{A, B\})$, the smallest $\sigma$-field containing the sets $A$ and $B$.

(b) Compute $P(F)$ for all $F \in \mathcal{F}$.

(c) Trick Question. What is $P(\{1\})$?

⋆36. This problem assumes you have read Note 1. Show that a $\sigma$-field cannot be countably infinite; i.e., show that if a $\sigma$-field contains an infinite number of sets, then it contains an uncountable number of sets.

§The inequality $1 - x \le e^{-x}$ for $x \ge 0$ can be derived by showing that the function $f(x) := e^{-x} - (1 - x)$ satisfies $f(0) = 0$ and is nondecreasing, e.g., $f'(x) \ge 0$.


⋆37. This problem assumes you have read Note 1.

(a) Let $\mathcal{A}_\alpha$ be any indexed collection of $\sigma$-fields. Show that $\bigcap_\alpha \mathcal{A}_\alpha$ is also a $\sigma$-field.

(b) Given a collection $\mathcal{C}$ of subsets of $\Omega$, show that
$$
\mathcal{C}_0 := \bigcap_{\mathcal{C} \subset \mathcal{A}} \mathcal{A},
$$
where the intersection is over all $\sigma$-fields $\mathcal{A}$ that contain $\mathcal{C}$, is the smallest $\sigma$-field containing $\mathcal{C}$.

⋆38. The Borel $\sigma$-Field. This problem assumes you have read Note 1. Let $\mathcal{B}$ denote the smallest $\sigma$-field containing all the open subsets of $\mathbb{R} := (-\infty, \infty)$. This collection $\mathcal{B}$ is called the Borel $\sigma$-field. The sets in $\mathcal{B}$ are called Borel sets. Hence, every open set, and every open interval, is a Borel set.

(a) Show that every interval of the form $(a, b]$ is also a Borel set. Hint: Write $(a, b]$ as a countable intersection of open intervals and use the properties of a $\sigma$-field.

(b) Show that every singleton set $\{a\}$ is a Borel set.

(c) Let $a_1, a_2, \ldots$ be distinct real numbers. Put
$$
A := \bigcup_{k=1}^{\infty} \{a_k\}.
$$
Determine whether or not $A$ is a Borel set.

(d) Lebesgue measure $\lambda$ on the Borel subsets of $(0,1)$ is a probability measure that is completely characterized by the property that the Lebesgue measure of an open interval $(a,b) \subset (0,1)$ is its length; i.e., $\lambda\bigl((a,b)\bigr) = b - a$. Show that $\lambda\bigl((a,b]\bigr)$ is also equal to $b - a$. Find $\lambda(\{a\})$ for any singleton set. If the set $A$ in part (c) is a Borel set, compute $\lambda(A)$.

Remark. Note 5 in Chapter 4 contains more details on the construction of probability measures on the Borel subsets of $\mathbb{R}$.

⋆39. The Borel $\sigma$-Field, Continued. This problem assumes you have read Note 1.

Background: Recall that a set $U \subset \mathbb{R}$ is open if for every $x \in U$, there is a positive number $\varepsilon_x$, depending on $x$, such that $(x - \varepsilon_x, x + \varepsilon_x) \subset U$. Hence, an open set $U$ can always be written in the form
$$
U = \bigcup_{x \in U} (x - \varepsilon_x, x + \varepsilon_x).
$$


Now observe that if $(x - \varepsilon_x, x + \varepsilon_x) \subset U$, we can find a rational number $q_x$ close to $x$ and a rational number $\delta_x < \varepsilon_x$ such that
$$
x \in (q_x - \delta_x, q_x + \delta_x) \subset (x - \varepsilon_x, x + \varepsilon_x) \subset U.
$$
Thus, every open set can be written in the form
$$
U = \bigcup_{x \in U} (q_x - \delta_x, q_x + \delta_x),
$$
where each $q_x$ and each $\delta_x$ is a rational number. Since the rational numbers form a countable set, there are only countably many such intervals with rational centers and rational lengths; hence, the union is really a countable one.

Problem: Show that the smallest $\sigma$-field containing all the open intervals is equal to the Borel $\sigma$-field defined in Problem 38.

§1.4: Conditional Probability

40. If
$$
\frac{N(O_{c,ns})}{N(O_{ns})}, \quad N(O_{ns}), \quad \frac{N(O_{c,s})}{N(O_{s})}, \quad \text{and} \quad N(O_{s})
$$
are given, compute $N(O_{nc,s})$ and $N(O_{nc,ns})$ in terms of them.

41. If $P(C)$ and $P(B \cap C)$ are positive, derive the chain rule of conditional probability,
$$
P(A \cap B \mid C) = P(A \mid B \cap C)\,P(B \mid C).
$$
Also show that
$$
P(A \cap B \cap C) = P(A \mid B \cap C)\,P(B \mid C)\,P(C).
$$

42. The university buys workstations from two different suppliers, Mini Micros (MM) and Highest Technology (HT). On delivery, 10% of MM's workstations are defective, while 20% of HT's workstations are defective. The university buys 140 MM workstations and 60 HT workstations for its computer lab. Suppose you walk into the computer lab and randomly sit down at a workstation.

(a) What is the probability that your workstation is from MM? From HT?

(b) What is the probability that your workstation is defective? Answer: 0.13.

(c) Given that your workstation is defective, what is the probability that it came from Mini Micros? Answer: 7/13.
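The stated answers follow from the law of total probability and Bayes' rule; a minimal Matlab check (variable names are illustrative, not from the text):

    % Problem 42: total probability and Bayes' rule
    pMM = 140/200;  pHT = 60/200;     % (a) P(MM) = 0.7, P(HT) = 0.3
    pD_MM = 0.10;   pD_HT = 0.20;     % defect rates for each supplier
    pD = pMM*pD_MM + pHT*pD_HT        % (b) 0.13
    pMM_given_D = pMM*pD_MM / pD      % (c) 7/13, about 0.5385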


Figure 1.11. Binary channel with crossover probabilities $\varepsilon$ and $\delta$. If $\delta = \varepsilon$, this is called a binary symmetric channel. (The diagram shows the transitions $0 \to 0$ and $1 \to 1$, labeled $1 - \varepsilon$ and $1 - \delta$, together with the crossovers $0 \to 1$ and $1 \to 0$.)

43. The probability that a cell in a wireless system is overloaded is 1/3. Given that it is overloaded, the probability of a blocked call is 0.3. Given that it is not overloaded, the probability of a blocked call is 0.1. Find the conditional probability that the system is overloaded given that your call is blocked. Answer: 0.6.

44. The binary channel shown in Figure 1.11 operates as follows. Given that a 0 is transmitted, the conditional probability that a 1 is received is $\varepsilon$. Given that a 1 is transmitted, the conditional probability that a 0 is received is $\delta$. Assume that the probability of transmitting a 0 is the same as the probability of transmitting a 1. Given that a 1 is received, find the conditional probability that a 1 was transmitted. Hint: Use the notation
$$
T_i := \{i \text{ is transmitted}\}, \quad i = 0, 1,
$$
and
$$
R_j := \{j \text{ is received}\}, \quad j = 0, 1.
$$
Remark. If $\delta = \varepsilon$, this channel is called the binary symmetric channel.

45. Professor Random has taught probability for many years. She has found that 80% of students who do the homework pass the exam, while 10% of students who don't do the homework pass the exam. If 60% of the students do the homework, what percent of students pass the exam? Of students who pass the exam, what percent did the homework? Answer: 12/13.

46. A certain jet aircraft's autopilot has conditional probability 1/3 of failure given that it employs a faulty microprocessor chip. The autopilot has conditional probability 1/10 of failure given that it employs a nonfaulty chip. According to the chip manufacturer, the probability of a customer's receiving a faulty chip is 1/4. Given that an autopilot failure has occurred, find the conditional probability that a faulty chip was used. Use the following notation:
$$
AF = \{\text{autopilot fails}\}, \qquad CF = \{\text{chip is faulty}\}.
$$


Answer: 10/19.

⋆47. You have five computer chips, two of which are known to be defective.

(a) You test one of the chips; what is the probability that it is defective?

(b) Your friend tests two chips at random and reports that one is defective and one is not. Given this information, you test one of the three remaining chips at random; what is the conditional probability that the chip you test is defective?

(c) Consider the following modification of the preceding scenario. Your friend takes away two chips at random for testing; before your friend tells you the results, you test one of the three remaining chips at random; given this (lack of) information, what is the conditional probability that the chip you test is defective? Since you have not yet learned the results of your friend's tests, intuition suggests that your conditional probability should be the same as your answer to part (a). Is your intuition correct?

§1.5: Independence

48. (a) If two sets $A$ and $B$ are disjoint, what equation must they satisfy?

(b) If two events $A$ and $B$ are independent, what equation must they satisfy?

(c) Suppose two events $A$ and $B$ are disjoint. Give conditions under which they are also independent. Give conditions under which they are not independent.

49. A certain binary communication system has a bit-error rate of 0.1; i.e., in transmitting a single bit, the probability of receiving the bit in error is 0.1. To transmit messages, a three-bit repetition code is used. In other words, to send the message 1, 111 is transmitted, and to send the message 0, 000 is transmitted. At the receiver, if two or more 1s are received, the decoder decides that message 1 was sent; otherwise, i.e., if two or more zeros are received, it decides that message 0 was sent. Assuming bit errors occur independently, find the probability that the decoder puts out the wrong message. Answer: 0.028.
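The decoder errs exactly when two or three of the independent bit transmissions are in error, so the stated answer can be checked with a short binomial computation in Matlab (a minimal sketch, using the bit-error rate 0.1 from the problem):

    % Problem 49: probability of 2 or 3 bit errors out of 3
    q = 0.1;
    p_wrong = nchoosek(3,2)*q^2*(1-q) + q^3     % 0.027 + 0.001 = 0.028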

50. You and your neighbor attempt to use your cordless phones at the same time. Your phones independently select one of ten channels at random to connect to the base unit. What is the probability that both phones pick the same channel?

51. A new car is equipped with dual airbags. Suppose that they fail independently with probability $p$. What is the probability that at least one airbag functions properly?


52. A discrete-time FIR filter is to be found satisfying certain constraints, such as energy, phase, sign changes of the coefficients, etc. FIR filters can be thought of as vectors in some finite-dimensional space. The energy constraint implies that all suitable vectors lie in some hypercube; i.e., a square in $\mathbb{R}^2$, a cube in $\mathbb{R}^3$, etc. It is easy to generate random vectors uniformly in a hypercube. This suggests the following Monte Carlo procedure for finding a filter that satisfies all the desired properties. Suppose we generate vectors independently in the hypercube until we find one that has all the desired properties. What is the probability of ever finding such a filter?
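To make the described procedure concrete, here is a Matlab sketch of the rejection loop; the filter length and the constraint-checking predicate are stand-ins invented purely for illustration, not taken from the text:

    % Problem 52 (illustration only): draw coefficient vectors uniformly in
    % the hypercube [-1,1]^n until a stand-in predicate is satisfied.
    n = 8;                                        % hypothetical filter length
    meets_constraints = @(h) abs(sum(h)) < 0.1;   % hypothetical constraint check
    trials = 0;
    while true
        h = 2*rand(n,1) - 1;                      % uniform draw in the hypercube
        trials = trials + 1;
        if meets_constraints(h), break; end
    end
    trials                                        % draws needed on this run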

53. A dart is repeatedly thrown at random toward a circular dartboard of radius 10 cm. Assume the thrower never misses the board. Let $A_n$ denote the event that the dart lands within 2 cm of the center on the $n$th throw. Suppose that the $A_n$ are mutually independent and that $P(A_n) = p$ for some $0 < p < 1$. Find the probability that the dart never lands within 2 cm of the center.

54. Each time you play the lottery, your probability of winning is $p$. You play the lottery $n$ times, and plays are independent. How large should $n$ be to make the probability of winning at least once more than 1/2? Answer: For $p = 1/10^6$, $n \ge 693147$.
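The stated answer comes from solving $1 - (1-p)^n > 1/2$ for $n$; a minimal Matlab check (this sketch assumes the ratio below is not an exact integer, which holds for the given $p$):

    % Problem 54: smallest n with 1 - (1-p)^n > 1/2
    p = 1e-6;
    n = ceil( log(2) / (-log(1-p)) )     % 693147
    1 - (1-p)^n                          % just above 0.5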

55. Alex and Bill go fishing. Find the conditional probability that Alex catches no fish given that at least one of them catches no fish. Assume they catch fish independently and that each has probability $0 < p < 1$ of catching no fish.

56. Consider the sample space $\Omega = [0, 1]$ equipped with the probability measure
$$
P(A) := \int_A 1 \, d\omega, \qquad A \subset \Omega.
$$
For $A = [0, 1/2]$, $B = [0, 1/4] \cup [1/2, 3/4]$, and $C = [0, 1/8] \cup [1/4, 3/8] \cup [1/2, 5/8] \cup [3/4, 7/8]$, determine whether or not $A$, $B$, and $C$ are mutually independent.

57. Given events $A$, $B$, and $C$, show that
$$
P(A \cap C \mid B) = P(A \mid B)\,P(C \mid B)
$$
if and only if
$$
P(A \mid B \cap C) = P(A \mid B).
$$
In this case, $A$ and $C$ are conditionally independent given $B$.


§1.6: Combinatorics and Probability

58. An electronics store carries 3 brands of computers, 5 brands of flat screens, and 7 brands of printers. How many different systems (computer, flat screen, and printer) can the store sell?

59. If we use binary digits, how many n-bit numbers are there?

60. A certain Internet message consists of 4 header packets plus 96 data packets. Unfortunately, a faulty router randomly re-orders the packets. What is the probability that the first header packet to arrive at the destination is the 10th packet received? Answer: 0.02996.
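For a numerical check of the stated answer: the event is that the first nine packets received are data packets and the tenth is a header, which gives the product below (a minimal Matlab sketch):

    % Problem 60: first 9 received are data, the 10th is a header
    p = prod( (96:-1:88) ./ (100:-1:92) ) * 4/91    % about 0.02996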

61. In a pick-4 lottery game, a player selects 4 digits, each one from $0, \ldots, 9$. If the 4 digits selected by the player match the random 4 digits of the lottery drawing in any order, the player wins. If the player has selected 4 distinct digits, what is the probability of winning? Answer: 0.0024.

62. How many 8-bit words are there with 3 ones (and 5 zeros)? Answer: 56.

63. A faulty computer memory location reads out random 8-bit bytes. What is the probability that a random word has 4 ones and 4 zeros? Answer: 0.2734.
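A one-line Matlab check of the stated answer, counting the 8-bit words with exactly four ones among the $2^8$ equally likely bytes:

    % Problem 63
    p = nchoosek(8,4) / 2^8      % 70/256 = 0.2734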

64. Suppose 41 people enter a contest in which 3 winners are chosen at random. The first contestant chosen wins $500, the second contestant chosen wins $400, and the third contestant chosen wins $250. How many different outcomes are possible? If all three winners receive $250, how many different outcomes are possible? Answers: 63,960 and 10,660.
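A quick Matlab check of the stated answers: with distinct prizes the draw is ordered, and with identical prizes it is unordered:

    % Problem 64: ordered vs. unordered selection of 3 winners from 41
    ordered   = 41*40*39          % 63,960
    unordered = nchoosek(41,3)    % 10,660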

65. From a well-shuffled deck of 52 playing cards you are dealt 14 cards. What is the probability that 2 cards are spades, 3 are hearts, 4 are diamonds, and 5 are clubs? Answer: 0.0116.
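The stated answer can be checked by counting the favorable hands suit by suit and dividing by the number of 14-card hands (a minimal Matlab sketch):

    % Problem 65
    p = nchoosek(13,2)*nchoosek(13,3)*nchoosek(13,4)*nchoosek(13,5) ...
        / nchoosek(52,14)         % about 0.0116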

66. From a well-shuffled deck of 52 playing cards you are dealt 5 cards. What is the probability that all 5 cards are of the same suit? Answer: 0.00198.

67. A finite set $D$ of $n$ elements is to be partitioned into $m$ disjoint subsets $D_1, \ldots, D_m$ in which $|D_i| = k_i$. How many different partitions are possible?

68. m-ary Pick n Lottery. In this game, a player chooses $n$ $m$-ary digits. In the lottery drawing, $n$ $m$-ary digits are chosen at random. If the $n$ digits selected by the player match the random $n$ digits of the lottery drawing in any order, the player wins. If the player has selected $n$ digits with $k_0$ zeros, $k_1$ ones, $\ldots$, and $k_{m-1}$ copies of digit $m - 1$, where $k_0 + \cdots + k_{m-1} = n$, what is the probability of winning? In the case of $n = 4$, $m = 10$, and a player's choice of the form $xxyz$, what is the probability of winning; for $xxyy$; for $xxxy$? Answers: 0.0012, 0.0006, 0.0004.
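For the three specific cases, the winning probability is the number of distinct orderings of the chosen digits (a multinomial coefficient) divided by $10^4$; a minimal Matlab check of the stated answers:

    % Problem 68 with n = 4, m = 10
    p_xxyz = factorial(4)/(factorial(2)*factorial(1)*factorial(1)) / 10^4   % 0.0012
    p_xxyy = factorial(4)/(factorial(2)*factorial(2))              / 10^4   % 0.0006
    p_xxxy = factorial(4)/(factorial(3)*factorial(1))              / 10^4   % 0.0004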


69. In Example 1.42, what 7-bit sequence corresponds to 2 apples and 3 carrots? What sequence corresponds to 2 apples and 3 bananas? What sequence corresponds to 5 apples?

Exam Preparation

You may use the following suggestions to prepare a study sheet, including formulas mentioned that you have trouble remembering. You may also want to ask your instructor for additional suggestions.

1.1. Review of Set Notation. Be familiar with set notation, operations, and identities. Graduate students should in addition be familiar with the precise definition of a function and the notions of countable and uncountable sets.

1.2. Probability Models. Know how to construct and use probability models for simple problems.

1.3. Axioms and Properties of Probability. Know the axioms and properties of probability. Important formulas include (1.9) for disjoint unions, and (1.10)–(1.12). Graduate students should understand and know how to use (1.13)–(1.17); in addition, your instructor may also require familiarity with Note 1 and related problems concerning $\sigma$-fields.

1.4. Conditional Probability. What is important is the law of total probability (1.23) and being able to use it to solve problems.

1.5. Independence. Do not confuse independent sets with disjoint sets. If $A_1, A_2, \ldots$ are independent, then so are $\tilde{A}_1, \tilde{A}_2, \ldots,$ where each $\tilde{A}_i$ is either $A_i$ or $A_i^c$.

1.6. Combinatorics and Probability. The four kinds of counting problems are

(i) ordered sampling of $k$ out of $n$ items with replacement: $n^k$.

(ii) ordered sampling of $k \le n$ out of $n$ items without replacement: $n!/(n-k)!$.

(iii) unordered sampling of $k \le n$ out of $n$ items without replacement: $\binom{n}{k}$.

(iv) unordered sampling of $k$ out of $n$ items with replacement: $\binom{k+n-1}{k}$.

Know also the multinomial coefficient. A small numerical illustration of the four formulas appears below.
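As a study aid, the four formulas can be evaluated side by side for a small example, say $n = 5$ items and $k = 3$ draws (a minimal Matlab sketch; the numbers are illustrative only):

    % The four kinds of counting problems with n = 5, k = 3
    n = 5;  k = 3;
    ordered_with_repl   = n^k;                          % (i)   125
    ordered_no_repl     = factorial(n)/factorial(n-k);  % (ii)  60
    unordered_no_repl   = nchoosek(n,k);                % (iii) 10
    unordered_with_repl = nchoosek(k+n-1,k);            % (iv)  35
    [ordered_with_repl, ordered_no_repl, unordered_no_repl, unordered_with_repl]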

Work any review problems assigned by your instructor. If you finish them, re-work your homework assignments.
