Stochastic Processes Subject Notes
Probability Review
Axiomatic Probability Sample Space:
The set of all possible outcomes of some random process
Sigma algebra:
A set of subsets of our sample space, generally thought of as the events we can measure or observe
Probability measure:
A set function P from the set of events to a number between 0 and 1 such that P(Ω) = 1 and
P(A_1 ∪ A_2 ∪ …) = P(A_1) + P(A_2) + … for disjoint events A_1, A_2, ….
Random variable:
A function from the sample space to a real number
Probabilities of Random Variables When A is a subset of the real numbers, we can say that (using the inverse image):
P(X ∈ A) = P(X^-1(A)).
Strictly speaking the first notation is incorrect, as probability is a set function taking events as
arguments, and x is a number, not a set in Ω. More correctly we write:
P(X = x) = P({ω ∈ Ω : X(ω) = x}).
Basically, this is saying that if we say something like "the probability that X = 4", we really are talking
about "the probability of the set of all outcomes in the sample space that yield X = 4".
Distribution Function The distribution function of the random variable X can be written: F_X(x) = P(X ≤ x).
These probabilities along the real line are enough to specify the probability of any event. So two
variables with the same distribution function have the same probability for any subset A of the real line (or
technically the same probability for all the outcomes that produce values in the subset A).
Random Vectors Multiple random variables placed on the same probability space.
Independent Random Variables Random variables are said to be independent if their joint distribution or density function is equal to the
product of the individual density or distribution functions for each variable.
Conditional probability
Independence Events A and B are said to be independent if: P(A ∩ B) = P(A)P(B).
Random variables X and Y are independent if the sets {X ∈ A} and {Y ∈ B} are independent for all Borel sets A and B.
Expectation of a Random Variable
Moments
Conditional Probability Density For continuous variables x and y: f(x | y) = f(x, y)/f(y).
If we now want the conditional distribution function of two continuous variables:
F(x | y) = P(X ≤ x | Y = y).
To find the conditional density, differentiate: f(x | y) = ∂F(x | y)/∂x.
Poisson Limit Theorem The law of rare events or Poisson limit theorem gives a Poisson approximation to the binomial
distribution, under certain conditions.
The Poisson Limit Theorem states that if X_1, …, X_n are independent Bernoulli random variables with
P(Xi = 1) = 1 − P(Xi = 0) = pi, then S_n = X_1 + ⋯ + X_n is well-approximated by a Poisson random variable
with parameter λ = p_1 + ⋯ + p_n, provided each p_i is small.
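As a quick numerical check (a sketch, not part of the notes; the values n = 1000 and p = 0.003 are arbitrary illustrative choices), the binomial and Poisson pmfs can be compared directly:

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

n, p = 1000, 0.003          # many trials, each with small success probability
lam = n * p
for k in range(6):
    print(k, round(binom_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))
```

The two columns agree to three or four decimal places, as the theorem predicts.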
Discrete-Time Markov Chains
Stochastic Processes
The Markov Property A stochastic process has the Markov property if the conditional probability distribution of future states
of the process (conditional on both past and present states) depends only upon the present state, not
on the sequence of events that preceded it.
Transition Probabilities A random sequence (X_n) with a countable state space forms a DTMC if:
P(X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, …, X_0 = i_0) = P(X_{n+1} = j | X_n = i).
This enables us to write the one-step transition probabilities as a matrix P with entries:
p_ij = P(X_{n+1} = j | X_n = i).
The key point here being that transition probabilities do not depend on the time index. Note that each
row sums to 1.
Deriving Joint Distribution
Consider states i_0, i_1, …, i_n, such that the chain visits them in order.
We can then write the joint distribution as:
P(X_0 = i_0, X_1 = i_1, …, X_n = i_n) = P(X_0 = i_0) p_{i_0 i_1} p_{i_1 i_2} ⋯ p_{i_{n-1} i_n}.
Which can be written just as a product of the one-step transition probabilities.
N-Step Transitions
The Chapman-Kolmogorov equations show how we can calculate the n-step transition probabilities
from the single-step transition probabilities p_ij. The equation states that for any m, n ≥ 0:
p_ij^(m+n) = Σ_k p_ik^(m) p_kj^(n).
Which is the sum of the probabilities of going from state i to k in m steps, and from k to j in n steps,
summed over all the states k.
We can derive this equation using the law of total probability and simple properties of conditional
probabilities:
Where the last step follows from the Markov property.
The upshot of this is all the information we need to specify all finite dimensional distributions is the
starting distribution and the one-step transition matrix.
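Since the n-step transition matrix is just the nth power of the one-step matrix, the Chapman-Kolmogorov identity can be checked numerically (a sketch using a hypothetical two-state matrix, not one from the notes):

```python
import numpy as np

# A hypothetical two-state chain (assumed example): 0 = sunny, 1 = rainy
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# n-step transition probabilities are matrix powers of P
P2 = np.linalg.matrix_power(P, 2)
P3 = np.linalg.matrix_power(P, 3)
P5 = np.linalg.matrix_power(P, 5)

# Chapman-Kolmogorov: P^(2+3) = P^2 P^3
print(np.allclose(P5, P2 @ P3))   # True
print(P5)
```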
Accessibility Relationships Accessibility: a state k is accessible from j, denoted j → k, if for some n ≥ 0 we have p_jk^(n) > 0
Communicability: states j and k communicate with each other if j → k and k → j, denoted j ↔ k
Non-essentiality: a state j is non-essential if there is a state k such that j → k but not k → j; that is,
eventually we will leave state j and never be able to return
Essentiality: a state j is essential if j → k implies that k → j
Absorbing: a state j is an absorbing state if p_jj = 1. Absorbing states are essential
Ephemeral: a state j is ephemeral if p_kj = 0 for all k, meaning that once we leave the state we can never
return to it. Ephemeral states usually don’t add anything to a DTMC model and we are going to
assume that there are no such states
Properties of Communicability Relation If there are no ephemeral states, then the following properties hold for all states:
Reflexivity: j ↔ j
Symmetry: j ↔ k if and only if k ↔ j
Transitivity: if j ↔ k and k ↔ l then j ↔ l
Transitivity Suppose we know that j ↔ k and k ↔ l; we want to show that j ↔ l. We know that p_jk^(n) > 0
and p_kl^(m) > 0
for some n and m. Using the Chapman-Kolmogorov equation we then have:
p_jl^(n+m) = Σ_i p_ji^(n) p_il^(m) ≥ p_jk^(n) p_kl^(m) > 0.
The same argument with j and l swapped gives l → j. Hence we have demonstrated transitivity.
Communicating Classes Note that equivalence relations produce a partitioning of the state space. Consider a set S whose
elements can be related to each other via any equivalence relation ⇔. Then S can be partitioned into a
collection of disjoint subsets S_1, S_2, …, S_M (where M might be infinite) such that i, j ∈ S_m implies that
i ⇔ j.
An essential state cannot be in the same communicating class as a non-essential state. This means we
can further divide the communicating class partition into a set of non-essential communicating classes
and a set of essential communicating classes.
If a DTMC starts in a state from a non-essential communicating class then once it leaves, it never
returns. If the DTMC starts in a state from an essential communicating class then it can never leave.
Irreducible Markov Chain If a DTMC has only one communicating class (i.e., all states communicate) then it is called an irreducible
DTMC.
Random Walk Behaviour Consider a random walk where X_{n+1} = X_n + Z_{n+1} with Z_i iid with P(Z_i = 1) = p and
P(Z_i = -1) = q = 1 - p. This DTMC is irreducible and so all states are essential.
However, if p > q, then E[X_n] = X_0 + n(p - q) → ∞, so X_n will ‘drift to infinity’, at least in expectation.
As such, for each fixed state j, with probability one, the DTMC will visit j only finitely many times.
We infer from this that even if a state is essential, we might still leave it and never return. This always
happens with non-essential states, but even for some essential states it still happens. Thus we need a
further classification of states.
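A short simulation illustrates this (a sketch; the drift parameter p = 0.7 and the run length are arbitrary choices): with upward drift the walk revisits 0 only a handful of times in a long run.

```python
import random

random.seed(1)

def visits_to_zero(p, steps=100_000):
    """Count how often a simple random walk started at 0 revisits 0."""
    x, visits = 0, 0
    for _ in range(steps):
        x += 1 if random.random() < p else -1
        visits += (x == 0)
    return visits

# With upward drift (p > 1/2) the walk escapes to infinity, so even
# though state 0 is essential it is visited only finitely often.
v = visits_to_zero(0.7)
print(v)
```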
Recurrence This further classification of states that we need relies on calculating the probability that the DTMC
returns to a state once it has left. Define f_j, the probability of returning to state j eventually:
f_j = P(X_n = j for some n ≥ 1 | X_0 = j).
State j is said to be recurrent if f_j = 1, and transient if f_j < 1. Transient states we might return to a
bunch of times, but eventually we leave them - there is a positive probability 1 - f_j that there is a last time that
we return to state j.
If the DTMC starts in a recurrent state j then, with probability one, it will eventually re-enter j. At this
point, the process will start anew (by the Markov property) and it will re-enter j again with probability
one. So the DTMC will (with probability one) visit j infinitely-many times.
If the DTMC starts in a transient state j then there is a probability 1 - f_j > 0 that it will never return. So,
letting N_j be the number of returns to state j after starting there, we see that N_j has a geometric
distribution: P(N_j = n) = f_j^n (1 - f_j).
Thus we have: E[N_j] = f_j/(1 - f_j) < ∞.
Thus the expected number of return times to a transient state is finite - eventually we expect that we
will never return there.
Consider the indicator variable 1{X_n = j}. We can define N_j as: N_j = Σ_{n≥1} 1{X_n = j}.
We can therefore calculate E[N_j] as:
E[N_j | X_0 = j] = Σ_{n≥1} P(X_n = j | X_0 = j) = Σ_{n≥1} p_jj^(n).
It follows that state j is recurrent if and only if Σ_{n≥1} p_jj^(n) = ∞.
Recurrence is a Class Property Recurrence is a class property, meaning that if one state in a communicating class is recurrent, then so
are all states in that class. To show this, assume that state j is recurrent and j ↔ k. Since the chain can
go from k to j and back again, there must exist m and n such that p_kj^(m) > 0 and
p_jk^(n) > 0. Now consider:
p_kk^(m+r+n) ≥ p_kj^(m) p_jj^(r) p_jk^(n).
This inequality holds since the latter event is a subset of the former event, requiring a specific route
through j that the first event does not. Summing over r we can now write:
Σ_r p_kk^(m+r+n) ≥ p_kj^(m) p_jk^(n) Σ_r p_jj^(r).
The terms p_kj^(m) and p_jk^(n) are independent of r, so can be taken out as constants when summing.
We know that
Σ_r p_jj^(r) = ∞
since j is recurrent (the finite m and n shifts don't matter for infinite
sums), so therefore
Σ_s p_kk^(s) = ∞,
and thus state k is also recurrent.
Note that if the Markov chain is irreducible then all states are either recurrent or transient and so it’s
appropriate to refer to the chain as either recurrent or transient.
Random Walk Recurrence We can compute the m-step transition probabilities from state j to itself by observing that these
probabilities are zero if m is odd and, if m = 2n is even, equal to:
p_jj^(2n) = C(2n, n) p^n q^n ≈ (4pq)^n / sqrt(πn), by Stirling's approximation.
The sum Σ_n p_jj^(2n) diverges if p = q = 1/2 (since then 4pq = 1), so the DTMC is recurrent. Otherwise 4pq < 1, the sum converges, and it is transient.
Periodicity
State j is periodic with period d > 1 if the set {n ≥ 1 : p_jj^(n) > 0} is non-empty and has greatest common
divisor d. If a state has period 1 we say it is aperiodic.
Periodicity is a class property, meaning that all states in the same communicating class have the same
period.
Steps for Analysing a DTMC Draw a transition diagram
Divide the state space into essential and non-essential states
Define the communicating classes, and divide them into recurrent and transient
Decide whether the classes are periodic
Recurrence in Finite State MC At least one state of a finite-state DTMC must be recurrent. This is fairly intuitive - if we have an infinite list of a finite number of things, one of those things must
appear an infinite number of times. Recall that a state k is transient if: Σ_n p_kk^(n) < ∞.
This means that the DTMC visits k only finitely-many times. Now define f_jk to be the probability that the
DTMC starts in state j and ever visits state k. We can break this up into the sum over all the possible
times when it first visits state k: f_jk = Σ_{n≥1} f_jk^(n), where f_jk^(n) is the probability that the first visit to k occurs at time n.
Now consider the probability that we go from j to k in n steps:
p_jk^(n) = Σ_{m=1}^{n} f_jk^(m) p_kk^(n-m).
That is, the sum over all the possible first visits to k at time m, followed by returning to k in the remaining
n - m steps.
We can now consider the probability of going from j to k in any number of steps. Substituting the above formula we have:
Σ_n p_jk^(n) = Σ_n Σ_{m=1}^{n} f_jk^(m) p_kk^(n-m) = f_jk Σ_{r≥0} p_kk^(r).
Here we have written the probability that we start from j and ever get to k, as the sum over all the
numbers of steps that could take (n), and also a sum over all numbers of steps (m) which could have
been our first time getting to state k. Note that
Σ_n p_jk^(n) ≥ f_jk
since we may go from j to k more than
once.
Now, if all states were transient, we would have: Σ_n p_jk^(n) = f_jk Σ_r p_kk^(r) < ∞ for every k.
Since the state space is finite, summing over k gives Σ_k Σ_n p_jk^(n) < ∞. But each row of the transition
matrix sums to one, so Σ_n Σ_k p_jk^(n) = Σ_n 1 = ∞.
This is a contradiction. Hence all states cannot be transient in a finite-state Markov chain; Σ_n p_kk^(n)
must be infinite for some k. This argument does not hold for an infinite Markov chain, which
obviously it shouldn't since a simple random walk with p ≠ 1/2 has all states transient.
Recurrence in Infinite State MC - First Step Analysis In order to be able to tell whether a class is recurrent, we need to be able to calculate the probability of
return for at least one state, which we will take to be state 0. Denote by α_j the probability that the chain ever
reaches state 0, given it starts at j. Ultimately therefore we want to determine:
f_0 = p_00 + Σ_{j≠0} p_0j α_j.
To do this, we need to introduce a more general equation:
α_j = p_j0 + Σ_{k≠0} p_jk α_k.
Which simply says that the probability we ever go from j to 0 is the probability of going straight there,
plus the sum over all other states k, of going to that state and then ever reaching state 0.
To see how this method works, consider a simple random walk that 'bounces off' zero. Thus:
p_j,j+1 = p and p_j,j-1 = q = 1 - p for j ≥ 1, with p_01 = p and p_00 = q.
Thus using our equation from above (i.e. the probability that we ever go from j to 0 is the probability
that we first go up and then ever reach zero, plus the probability that we first go down and then ever reach 0):
α_j = p α_{j+1} + q α_{j-1}, j ≥ 1.
This is a second-order linear difference equation with constant coefficients.
The characteristic equation p x^2 - x + q = 0 has roots: x = 1 and x = q/p.
If these roots are distinct (i.e. if p ≠ 1/2), the general solution is thus a linear combination of
these roots: α_j = A + B (q/p)^j.
Otherwise the general solution uses the generalised eigenvector: α_j = A + B j.
Where the values of the constants A and B need to be determined by boundary equations, or other
information that we have.
In both cases, when p ≤ 1/2 it should be clear that B = 0 (since (q/p)^j ≥ 1, or j itself, grows without bound), otherwise these won't be proper probabilities for large j. To
solve for A, substitute in j = 0: α_0 = A = 1.
Thus α_j = 1 for every j, which makes sense because p ≤ 1/2 and so we have a neutral or downward drift.
If we now consider the case where p > 1/2, we need to introduce the idea of continuity of
probability measure: as we take an increasing sequence of events, their probabilities converge to the
probability of the union of all these events.
This reasoning allows us to determine that (α_j) is the minimal nonnegative solution to the difference
equation we are trying to solve, because the probability that we ever get to zero is the increasing limit
of the probabilities of getting there by time 1, 2, 3, etc.
The upshot of this is that in our general solution of the form: α_j = A + B (q/p)^j.
We know that we are looking for the minimal solution, and now q/p < 1 so we cannot use the B = 0
argument we used earlier. We can still use the boundary condition: α_0 = A + B = 1.
Thus we have: α_j = A + (1 - A)(q/p)^j, which tends to A as j → ∞, so minimality forces A = 0.
Hence we find: α_j = (q/p)^j.
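A Monte Carlo sketch agrees with the formula α_1 = q/p (the choice p = 0.7 and the step cap are assumptions for illustration; walks that drift away are cut off, which introduces a negligible bias):

```python
import random

random.seed(0)

def ever_hits_zero(p, start=1, max_steps=1_000):
    """Simulate the walk from `start`; True if it reaches 0 within max_steps."""
    x = start
    for _ in range(max_steps):
        x += 1 if random.random() < p else -1
        if x == 0:
            return True
    return False

p, trials = 0.7, 5_000
est = sum(ever_hits_zero(p) for _ in range(trials)) / trials
print(est, (1 - p) / p)   # estimate vs the theoretical alpha_1 = q/p
```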
The Gambler’s Ruin Problem A gambler with fortune j bets one unit at a time, winning with probability p and losing with probability
q = 1 - p, stopping at ruin (state 0) or at a target fortune N. Let α_j be the probability of ruin starting from j.
The recurrence equation gives us: α_j = p α_{j+1} + q α_{j-1}, 0 < j < N.
Which for p ≠ 1/2 has the same general solution as before: α_j = A + B (q/p)^j.
Since we can never come down from N, the upper boundary condition gives us: α_N = 0.
Likewise the lower boundary condition is: α_0 = 1.
Which together yield: α_j = ((q/p)^j - (q/p)^N) / (1 - (q/p)^N).
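The ruin formula can be checked by simulation (a sketch; p = 0.45, j = 5, N = 10 are arbitrary illustrative values):

```python
import random

random.seed(2)

def ruin_prob(j, N, p, trials=20_000):
    """Monte Carlo estimate of ruin probability from fortune j (target N)."""
    ruined = 0
    for _ in range(trials):
        x = j
        while 0 < x < N:
            x += 1 if random.random() < p else -1
        ruined += (x == 0)
    return ruined / trials

p, q, j, N = 0.45, 0.55, 5, 10
r = q / p
exact = (r ** j - r ** N) / (1 - r ** N)
print(round(ruin_prob(j, N, p), 3), round(exact, 3))
```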
Null and Positive Recurrence We want to know the proportion of time a DTMC spends in each state over the long run. This should be
the same as the limiting probabilities π_j = lim_{n→∞} P(X_n = j). Note that this will be zero for transient and non-
essential states.
Define T_j(i) to be the time between the i-th and (i+1)-st return to state j.
Let us define the mean return time: m_j = E[T_j]. State j is positive recurrent if m_j < ∞, and null
recurrent if it is recurrent but m_j = ∞.
In the aperiodic, positive recurrent case, the long-run proportion of time spent in state j will be: π_j = 1/m_j.
In the null recurrent or transient case, m_j = ∞, so the limit becomes zero.
For this analysis to work, the chain must be aperiodic.
Ergodicity and Stationarity We call the DTMC ergodic if for all j the limit
π_j = lim_{n→∞} p_ij^(n)
exists and does not depend on the starting state i.
Note that stationary distributions are not necessarily the same thing as limiting distributions. They are
equivalent when the chain is irreducible, aperiodic and positive recurrent.
We often test whether an irreducible, aperiodic DTMC is ergodic by attempting to solve the equations
π = πP with Σ_j π_j = 1, and seeing if there is a unique solution.
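In practice, solving π = πP under the normalisation constraint is a small linear-algebra problem (a sketch with a hypothetical two-state matrix, not one from the notes):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])   # hypothetical two-state chain

# Solve pi = pi P together with sum(pi) = 1: stack (P^T - I) with a
# row of ones and least-squares solve the overdetermined system.
n = P.shape[0]
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.zeros(n + 1)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)                                  # stationary distribution
print(np.linalg.matrix_power(P, 50)[0])    # limiting row agrees
```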
Double-Stochastic Matrix If an aperiodic DTMC on N states has a doubly-stochastic transition matrix (columns as well as rows sum
to one), then we can easily verify that: π_j = 1/N is stationary.
This also provides a good example for the difference between limiting distributions and stationary
distributions. The 2 by 2 doubly stochastic matrix that swaps the two states with probability one is
periodic, so it has no limiting distribution, but it does have the stationary distribution (1/2, 1/2).
Random walk with One Barrier Consider again the random walk with a barrier on the lower boundary (i.e. positive states only). We know it is
irreducible (as every state is accessible from every other), aperiodic (as there is a loop at zero), and
recurrent if p (the upward probability) is less than or equal to 1/2. We now ask the question: is this chain
ergodic (i.e. is it positive or null recurrent)?
We can do this by using the above theorem, and determine whether there is a single probability solution
to the equation: π = πP.
Using the downwards boundary condition, we have: π_0 = q π_0 + q π_1, so π_1 = (p/q) π_0.
The rest of the chain comes from the transition probabilities: π_j = p π_{j-1} + q π_{j+1}, j ≥ 1.
Thus we arrive at the equations: π_j = (p/q)^j π_0.
For this to be a probability solution, we need all the π_j's to sum to 1: π_0 Σ_j (p/q)^j = 1, which is only possible if p < q, i.e. p < 1/2.
Thus we must have
π_j = (1 - p/q)(p/q)^j
in the stationary distribution.
Interpreting the Distribution For an irreducible, aperiodic and positive-recurrent DTMC, the distribution defined by π has a number of
interpretations
Limiting: π_j is the probability in the limit that the chain is in state j
Stationary: starting in distribution π the chain will remain in distribution π forever
Ergodic: the proportion of time the chain spends in state j converges to π_j with probability one
Markov Reward Processes Consider a situation where a cost or reward r_j is incurred/earned by the DTMC whenever it visits
state j for one time unit. In the stationary regime, the expected cost/reward per time unit is: r = Σ_j π_j r_j.
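As a sketch (with a hypothetical two-state chain and made-up rewards), the stationary reward rate Σ_j π_j r_j can be computed from the left eigenvector of the transition matrix for eigenvalue 1:

```python
import numpy as np

# Hypothetical two-state chain with a reward in state 0, a cost in state 1
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
r = np.array([1.0, -3.0])

# Stationary distribution: left eigenvector of P for eigenvalue 1
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

print(pi @ r)   # expected reward per time unit in the stationary regime
```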
In some situations we have a means of controlling such a process by making decisions that affect the
transition probabilities and we wish to make decisions that maximise our reward. The study of how to
do this is known as Markov decision theory.
Consider the situation where we have a discrete time stochastic process and a finite set of
actions. Transition probabilities depend on state and action, and can be written as p_ij(a). The
objective is to choose a set of actions (a policy) so as to maximise the expected value of rewards over
the time horizon with respect to our actions.
Poisson Processes
Key Distributions The Poisson distribution arises as the limit of the binomial distribution: Bin(n, λ/n) → Poisson(λ) as n → ∞.
The exponential distribution arises as the limit of a geometric distribution: if G_n ~ Geometric(λ/n), then G_n/n → Exp(λ) as n → ∞.
Introduction A nonnegative integer-valued process N_t is a Poisson process with a rate λ if:
it has independent increments on disjoint intervals, so the following are independent:
N_{t_1}, N_{t_2} - N_{t_1}, …, N_{t_n} - N_{t_{n-1}} for t_1 < t_2 < … < t_n
the value at any time depends on the time by: N_t ~ Poisson(λt)
for the value at two distinct times s < t: N_t - N_s ~ Poisson(λ(t - s))
This third result can be proved using the moment-generating function.
Note the distinction between N_t - N_s and N_{t-s}, with s < t: they have the same distribution but are
not the same random variable.
We can think of the Poisson process as being the limit of a discrete version: divide time into intervals of
length 1/n, and in each interval let there be an arrival independently with probability λ/n.
In the discrete version, the number of successes is binomially distributed, while the number of time
units needed for a success is geometrically distributed. As such,
N_t ~ Bin(nt, λ/n), which as we know
converges in distribution to Poisson(λt) as n → ∞. Likewise the waiting time (1/n)Geometric(λ/n)
converges with large n
to Exp(λ).
Joint Distribution Using the property of independent increments, we find that the joint distribution is given by:
P(N_{t_1} = k_1, …, N_{t_n} = k_n) = Π_i P(N_{t_i} - N_{t_{i-1}} = k_i - k_{i-1}).
That is, the joint distribution factorises into independent increment distributions.
Waiting Times are Exponential N_t is a Poisson process with rate λ if and only if the inter-arrival times τ_1, τ_2, … are independent Exp(λ) random variables.
To prove this, note that the waiting time until the jth jump, T_j, is less than t if and only if there are
j or more events in time [0, t], i.e. {T_j ≤ t} = {N_t ≥ j}. Thus we have:
P(T_1 ≤ t) = P(N_t ≥ 1) = 1 - e^{-λt}.
Thus we see that waiting times follow an exponential distribution. More generally: T_j = τ_1 + ⋯ + τ_j.
Which is a sum of j independent exponentially distributed random variables, forming a gamma
distribution.
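Simulating the process this way (exponential gaps summed until they exceed the horizon; the rate and horizon below are arbitrary choices) reproduces the Poisson mean λt:

```python
import random

random.seed(3)

def poisson_count(rate, t):
    """Arrivals in [0, t] generated from Exp(rate) inter-arrival gaps."""
    time, count = 0.0, 0
    while True:
        time += random.expovariate(rate)
        if time > t:
            return count
        count += 1

lam, t, trials = 2.0, 5.0, 20_000
mean = sum(poisson_count(lam, t) for _ in range(trials)) / trials
print(mean)   # close to lam * t = 10
```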
Order Statistics The kth order statistic of random variables X_1, …, X_n refers to the kth smallest value of these
variables, and is denoted X_(k).
In general if X_1, …, X_n are independent random variables with distribution function F and density f,
the distribution function of X_(k) is given by:
P(X_(k) ≤ x) = Σ_{i=k}^{n} C(n, i) F(x)^i (1 - F(x))^{n-i}.
The density is given by differentiating. So adding combinatorial coefficients:
f_(k)(x) = n!/((k-1)!(n-k)!) F(x)^{k-1} (1 - F(x))^{n-k} f(x).
The joint density for the order statistics of n variables is given by:
f(x_1, …, x_n) = n! f(x_1) ⋯ f(x_n) for x_1 < x_2 < … < x_n.
Conditional Distributions It turns out that the conditional distribution of arrival times T_1, …, T_n, given that the total number of occurrences
N_t is equal to n, is equal to the distribution of order statistics from n iid uniform variables.
Where U_1, …, U_n are independent Uniform on [0, t]. We can also state this as: (T_1, …, T_n) given N_t = n has the same distribution as (U_(1), …, U_(n)).
We can show this by direct computation, by rewriting the T's in terms of the inter-arrival times τ's.
By independence of non-overlapping intervals we obtain the joint density of the arrival times.
We can simplify the result by normalising each statistic by t.
Superposition of Poisson Processes The sum of two independent Poisson processes is itself a Poisson process with a rate equal to the sum of
the rates of the two initial processes.
To show that this is the case, just check the two axioms of Poisson processes:
The sum of Poissons is a Poisson, so N_t + M_t ~ Poisson((λ + μ)t)
For disjoint intervals, the increments of N + M are sums of increments of N and of M,
which are both independent given that N and M are Poisson processes
Thinning of a Poisson Process Suppose in a Poisson process each customer is marked (set aside) independently with probability p,
and denote by M_t the number of ‘marked’ customers up to time t. It turns out that the marked process M_t
and the ‘non-marked’ process N_t - M_t are independent Poisson processes with rates λp and
λ(1 - p).
This arises because when we specify a point being in the marked process, all that tells us is that we cannot have a point in
the exact same spot in the non-marked process. But the probability of having a point at any exact spot is zero anyway, so our
probability distribution is unchanged by conditioning on this information.
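A quick simulation sketch (the rate, horizon and marking probability are arbitrary choices) shows the marked stream has mean λpt:

```python
import random

random.seed(4)

def thinned_counts(rate, t, p):
    """One Poisson stream split into marked/unmarked arrivals."""
    time, marked, unmarked = 0.0, 0, 0
    while True:
        time += random.expovariate(rate)
        if time > t:
            return marked, unmarked
        if random.random() < p:
            marked += 1
        else:
            unmarked += 1

lam, t, p, trials = 3.0, 10.0, 0.25, 5_000
m_mean = 0.0
for _ in range(trials):
    m, _u = thinned_counts(lam, t, p)
    m_mean += m
m_mean /= trials
print(m_mean)   # close to lam * p * t = 7.5
```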
Poisson Arrivals See Time Averages One question we may ask is: when does a new arrival (say a new customer arriving at a queue in a store)
observe a Markov chain in its stationary state, namely seeing state j with probability π_j?
The “PASTA Theorem” states that Poisson arrivals see time averages, so from the customer’s
perspective the chain is always in its steady state. This is the case because the only additional
information that the arriving customer has is that an arrival occurs at that instant, so we write:
P(X_t = j | arrival at t) = π_j.
Thus, conditioning a Poisson process on a single arrival at a single point does not change the distribution.
Compound Poisson Process A compound Poisson process is a regular Poisson process where the increments are themselves i.i.d.
random variables (so that the jump sizes, and not only the inter-arrival times, are random). Such a process can be defined as:
X_t = Σ_{i=1}^{N_t} Y_i, where N_t is a Poisson process and the Y_i are iid.
Continuous-Time Markov Chains
Introduction A non-negative integer valued stochastic process X_t in continuous time is said to be a
Continuous-Time Markov Chain if, for all times t_1 < t_2 < … < t_{n+1} and states i_1, …, i_{n+1}:
P(X_{t_{n+1}} = i_{n+1} | X_{t_n} = i_n, …, X_{t_1} = i_1) = P(X_{t_{n+1}} = i_{n+1} | X_{t_n} = i_n).
Generally we deal with homogeneous CTMCs, whose transition probabilities don’t change over time, only
with the state.
To have the memoryless property, Markov chains must have exponential holding times, so that the rate
of jumping does not change depending on how long one has been at the state.
If T is the first time the chain leaves state j then, using the Markov property at step 2 and
homogeneity at step 3: P(T > s + t | T > s) = P(T > t), so T must be exponentially distributed.
The Chapman-Kolmogorov Equations For the continuous case we write these equations as:
p_jk(s + t) = Σ_i p_ji(s) p_ik(t).
In matrix form we can express this as: P(s + t) = P(s)P(t).
Unfortunately since time is now continuous, we cannot write P(t) as powers of a single matrix, since we
need to be able to use non-integer time values as well. We need a new object. Consider therefore:
(P(h) - I)/h.
We are interested in what happens when h becomes very small, so define: A = lim_{h→0} (P(h) - I)/h.
The continuity assumption on P(t) implies the existence of the matrix A, called the Q-matrix or
infinitesimal generator of the CTMC. Element-wise this is: a_jk = lim_{h→0} (p_jk(h) - δ_jk)/h.
Note that this value is always finite, and Σ_k a_jk = 0 (rows sum to zero).
Transition Probability Generator From the analysis above, we would hope to show that both of the following hold:
P'(t) = AP(t) (the backward equations) and P'(t) = P(t)A (the forward equations).
To do so, we need to verify that certain limits converge so that we can interchange limits and sums. This
analysis will be omitted here.
For non-explosive CTMCs, the matrix A determines the transition probabilities completely by solving the
backward or forward equations to get: P(t) = e^{At}.
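For a finite state space, P(t) = e^{At} can be evaluated from the truncated matrix-exponential series (a sketch with a hypothetical two-state generator; for serious work a dedicated matrix-exponential routine would be preferable):

```python
import numpy as np

def transition_matrix(A, t, terms=60):
    """P(t) = exp(A t) via the truncated matrix-exponential series."""
    M = A * t
    P = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ M / k     # accumulates M^k / k!
        P = P + term
    return P

# Hypothetical two-state generator: leave state 0 at rate 2, state 1 at rate 1
A = np.array([[-2.0, 2.0],
              [1.0, -1.0]])
P = transition_matrix(A, 0.5)
print(P)
print(P.sum(axis=1))   # rows sum to one
```

The result satisfies P(s)P(t) = P(s + t), as the Chapman-Kolmogorov equations require.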
Poisson Process Example The Poisson process is a variety of CTMC, where we have: jumps from j to j + 1 at rate λ.
So we have the very simple generator: a_jj = -λ, a_j,j+1 = λ, and all other entries zero.
Now solving for the transition probabilities element-wise:
p'_jk(t) = -λ p_jk(t) + λ p_j,k-1(t).
By induction we find the general rule to be: p_jk(t) = e^{-λt} (λt)^{k-j}/(k-j)! for k ≥ j.
Interpretation of the Generator Beginning with the equation used to define the generator:
p_jk(h) = δ_jk + a_jk h + o(h) for small h.
For a small value of h, in the case where the chain moves from j to k ≠ j (note the approximation,
because it is possible to leave state j, come back, and leave again, but for small h this is unlikely):
p_jk(h) ≈ a_jk h.
So we can think of a_jk as the rate of transition from j to k, with a_jk ≥ 0 for k ≠ j.
Since each row sums to zero, we have: a_jj = -Σ_{k≠j} a_jk ≤ 0.
Now consider the same probability from the point of view of leaving time. Since we know holding times
are exponential with some rate q_j, we have for the case of staying at state j:
p_jj(h) ≈ P(T > h) = e^{-q_j h}.
Using an expansion approximation for small h: e^{-q_j h} ≈ 1 - q_j h.
Comparing this with the result obtained above we have for k = j: q_j = -a_jj.
To find out where the CTMC moves upon leaving state j, we calculate:
P(X_h = k | X_0 = j, X_h ≠ j) = p_jk(h)/(1 - p_jj(h)).
In the limit of small h this tends to a_jk/q_j = a_jk/(-a_jj).
Ergodicity For an ergodic CTMC, the stationary distribution π satisfies: π_k = lim_{t→∞} p_jk(t).
This occurs if and only if π satisfies: πA = 0, with Σ_j π_j = 1.
Discrete vs Continuous
Birth and Death Processes Let X_t be the number of ‘people’ in a system at time t. Whenever there are j ‘people’ in the system,
new arrivals enter (by birth or immigration) the system at an exponential rate λ_j and ‘people’ leave (or
die from) the system at an exponential rate μ_j, with arrivals and departures occurring independently of
one another.
The generator of a birth and death process has the tridiagonal form: a_j,j+1 = λ_j, a_j,j-1 = μ_j,
a_jj = -(λ_j + μ_j), and zero elsewhere.
The CTMC evolves by remaining in state j for an exponentially-distributed time with rate λ_j + μ_j, then
it moves to state j + 1 with probability λ_j/(λ_j + μ_j)
and state j - 1 with probability μ_j/(λ_j + μ_j),
and so on.
The Poisson process is an example of a pure birth process with constant birth rates.
For a birth and death process with a given initial distribution, one can find p_j(t) = P(X_t = j) by solving
the system of differential equations (for a row vector p(t)): p'(t) = p(t)A.
Expanded by columns as: p'_j(t) = λ_{j-1} p_{j-1}(t) - (λ_j + μ_j) p_j(t) + μ_{j+1} p_{j+1}(t).
This system of equations governs the ‘redistribution’ of ‘probability mass’ as time passes. For finite state
space birth and death process, it can be solved numerically.
The stationary distribution can be found by solving πA = 0, which implies:
λ_{j-1} π_{j-1} + μ_{j+1} π_{j+1} = (λ_j + μ_j) π_j.
Which is equivalent to the detailed-balance condition: λ_j π_j = μ_{j+1} π_{j+1}.
Which has the solution: π_j = π_0 (λ_0 λ_1 ⋯ λ_{j-1})/(μ_1 μ_2 ⋯ μ_j).
Using the same theorem as for discrete processes: the chain is ergodic exactly when these terms can be
normalised to sum to one.
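The product-form solution is easy to evaluate numerically (a sketch; the M/M/1-style constant rates and the truncation at 200 states are assumptions for illustration):

```python
import numpy as np

def bd_stationary(lams, mus, n):
    """Stationary distribution of a birth-death chain on {0, ..., n}.

    lams[j] and mus[j] are the birth and death rates in state j;
    pi_j = pi_0 * prod(lams[i] / mus[i + 1] for i < j), then normalise.
    """
    pi = np.ones(n + 1)
    for j in range(1, n + 1):
        pi[j] = pi[j - 1] * lams[j - 1] / mus[j]
    return pi / pi.sum()

# M/M/1-style constant rates, truncated at n = 200 states (assumed example)
lam, mu, n = 1.0, 2.0, 200
pi = bd_stationary([lam] * n, [0.0] + [mu] * n, n)
print(pi[:4])   # geometric: (1 - rho) rho^j with rho = 1/2
```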
Queuing Theory
Introduction Queueing theory is the mathematical study of the operation of stochastic systems describing processing
of flows of jobs.
Kendall’s Notation This notation takes the form A/B/n/m, where:
A describes the arrival process
o A = M (Markov) inter-arrival times are independent and exponentially-distributed
o A = G inter-arrival times are independent with an arbitrary distribution
o A = D inter-arrival times are deterministic
B describes the service process
o B = M service times are independent and exponentially-distributed
o B = G service times are independent with an arbitrary distribution
o B = D service times are deterministic
n gives the number of servers
m gives the capacity of the system. When this is infinite it is usually omitted
Describing the M/M/1 Queue Arrival stream: Poisson process with intensity λ
Service: n = 1 server, service time Exp(μ)
Infinite space for waiting: m = ∞.
The state X_t gives the number of customers at time t: If X_t = 0 the server is idle, while if X_t = j ≥ 1
one customer is being served and j - 1 customers are waiting in the queue.
The queue length of an M/M/1 queue evolves as a birth and death process with birth rates λ_j = λ and
death rates μ_j = μ. Thus we use our result from birth-death processes to determine the stationary
distribution (with
ρ = λ/μ < 1): π_j = (1 - ρ)ρ^j.
Average number of people in the system is calculated by: L = Σ_j j π_j = ρ/(1 - ρ).
Average queue length is: L_q = Σ_{j≥1} (j - 1) π_j = ρ^2/(1 - ρ).
Waiting Times Waiting time depends on N, the number of people already in the system when a new arrival appears.
For service times S_1, …, S_N, the new arrival will have to wait: W_q = S_1 + ⋯ + S_N.
This leads to the expectation: E[W_q] = E[N]/μ = ρ/(μ(1 - ρ)) = λ/(μ(μ - λ)).
The expected total time in the system is simply the sum of expected waiting time and expected time
being served: E[W] = E[W_q] + 1/μ = 1/(μ - λ).
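These formulas can be bundled into a small helper (a sketch; the rates λ = 2, μ = 3 are arbitrary illustrative values):

```python
def mm1_metrics(lam, mu):
    """Standard M/M/1 quantities (requires rho = lam/mu < 1)."""
    rho = lam / mu
    assert rho < 1, "queue is unstable"
    L = rho / (1 - rho)          # mean number in system
    Lq = rho ** 2 / (1 - rho)    # mean queue length
    W = 1 / (mu - lam)           # mean total time in system
    Wq = rho / (mu - lam)        # mean waiting time
    return L, Lq, W, Wq

L, Lq, W, Wq = mm1_metrics(lam=2.0, mu=3.0)
print(L, Lq, W, Wq)
```

Note that the output satisfies L = λE[W], which is exactly Little's Law.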
Little’s Law This expresses a relationship between queue length and expected waiting time: L = λE[W].
Describing the M/M/a Queue Arrival stream: Poisson process with intensity λ
Service: n = a servers, service time Exp(μ)
Infinite space for waiting: m = ∞.
A stationary distribution exists if: λ < aμ.
The stationary distribution is given by: π_j = π_0 (λ/μ)^j/j! for j ≤ a, and π_j = π_0 (λ/μ)^j/(a! a^{j-a}) for j > a.
Average queue length is: L_q = C ρ/(1 - ρ), with ρ = λ/(aμ).
Where C is the proportion of time that all servers are busy.
By Little’s Law, the expected waiting time is: E[W_q] = L_q/λ = C/(aμ - λ).
And so the expected delay is simply E[W_q] plus the expected service time: E[W] = E[W_q] + 1/μ.
Single vs Multiple Servers Which is better: a single server with service rate aμ, or a servers with service rate μ each? A heuristic
argument tells us that if all a servers are busy, both systems work with the same rate, but if only k < a
servers are busy the rate for
the a-server queue is kμ, which is less than the rate aμ for the single server. So we might conclude that
the single server is better.
We can show this explicitly by comparing the expected time for the M/M/1 and M/M/a queues:
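A sketch of the comparison (the Erlang C expression for the probability of waiting is standard, but the rates chosen here are arbitrary):

```python
from math import factorial

def erlang_c(a, offered):
    """Probability an arrival must wait in M/M/a (offered = lam/mu < a)."""
    rho = offered / a
    idle = sum(offered ** k / factorial(k) for k in range(a))
    busy = offered ** a / (factorial(a) * (1 - rho))
    return busy / (idle + busy)

def mean_time(lam, mu, a):
    """Expected total time in system for an M/M/a queue."""
    C = erlang_c(a, lam / mu)
    return C / (a * mu - lam) + 1 / mu

lam = 1.8
# one fast server (rate 2) versus two slow servers (rate 1 each)
print(mean_time(lam, 2.0, 1), mean_time(lam, 1.0, 2))
```

With these numbers the single fast server gives the smaller expected time in system, as the heuristic suggests.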
Renewal Theory
Introduction A renewal process is a counting process for which the times between successive
events, called renewals, are independent and identically-distributed random variables with an arbitrary
common distribution function F. A Poisson process is a renewal process, but a renewal process may not
be Poisson.
Important properties:
Explosions Is it possible to have an infinite number of jumps in a finite time? To see that this is not possible:
since T_n = X_1 + ⋯ + X_n → ∞ with probability one (the X_i are iid with mean μ > 0), N_t is finite for every t.
Distribution of Nt We know that the process tends to infinity over time: N_t → ∞ as t → ∞.
To get an expression for the distribution function write: P(N_t ≥ n) = P(T_n ≤ t).
But how can we find the rate at which N_t grows?
Since we have T_{N_t} ≤ t < T_{N_t + 1}, it follows: T_{N_t}/N_t ≤ t/N_t < T_{N_t + 1}/N_t.
By the strong law of large numbers, we know also that: T_n/n → μ = E[X_1] with probability one.
And hence by the sandwich theorem we find that: N_t/t → 1/μ.
And we see that, for large t, N_t grows like t/μ.
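A simulation sketch (Uniform(0, 2) gaps, so μ = 1, is an arbitrary choice) shows N_t/t settling at 1/μ:

```python
import random

random.seed(5)

def renewal_count(t, sample_gap):
    """Number of renewals in [0, t] with iid gaps drawn by sample_gap()."""
    time, n = 0.0, 0
    while True:
        time += sample_gap()
        if time > t:
            return n
        n += 1

# Uniform(0, 2) gaps, so mu = 1 and N_t / t should approach 1
t = 50_000.0
n = renewal_count(t, lambda: random.uniform(0.0, 2.0))
print(n / t)
```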
The M/G/1/1 Queue We consider a single-server Erlang Loss System. There is no queue: when an arriving customer finds the
server busy, he/she does not enter. Service times are independent and identically-distributed with
distribution function G with mean 1/μ.
Let N_t be the number of customers who have been admitted by time t. Then the times between successive
entries of customers are made up of a service time, and then a waiting time from the end of service until
the next arrival.
The mean time between renewals is the sum of mean service and waiting times: 1/μ + 1/λ.
Thus the entry rate is: 1/(1/μ + 1/λ) = λμ/(λ + μ).
The proportion of customers who enter the queue is: (entry rate)/λ = μ/(λ + μ).
The Renewal Central Limit Theorem The theorem states that for E[X] = μ and Var(X) = σ^2:
(N_t - t/μ)/sqrt(tσ^2/μ^3) → N(0, 1) in distribution.
To show this: choose n ≈ t/μ + x sqrt(tσ^2/μ^3), so that P(N_t ≥ n) = P(T_n ≤ t), and apply the ordinary
central limit theorem to T_n.
Note that this applies regardless of the distribution of X.
Residual Lifetime The residual lifetime at time t is the amount of time until the next arrival: R_t = T_{N_t + 1} - t.
When the distribution of X is non-lattice (i.e. not concentrated on a regular grid of points), then for all x:
lim_{t→∞} P(R_t ≤ x) = (1/μ) ∫_0^x (1 - F(y)) dy.
To show this, note that the proportion of time up to the nth arrival where the residual lifetime is longer
than x is: (1/T_n) Σ_{i=1}^{n} max(X_i - x, 0).
By the law of large numbers, and since long-term limiting proportions are the same as limiting
probabilities, as n approaches infinity this tends to: E[max(X - x, 0)]/μ = (1/μ) ∫_x^∞ (1 - F(y)) dy.
Hence we have the limiting distribution above.
Where F is the distribution of the renewal times X_i.
Age Distribution The age at time t is the time since the last renewal: A_t = t - T_{N_t}.
Note that: A_t > x and R_t > y exactly when there is no renewal in the interval (t - x, t + y].
If we set y = 0 we find: the age has the same limiting distribution as the residual lifetime.
To find the joint residual/age density, twice differentiate the joint distribution function.
Brownian Motion
Defining Brownian Motion The normal distribution arises as the limit of random walks. We know from the central limit theorem
that for Z_i iid with mean 0 and variance 1: (Z_1 + ⋯ + Z_n)/sqrt(n) → N(0, 1) in distribution.
A continuous time stochastic process B_t is standard Brownian motion if:
It has continuous sample paths
It has independent increments on disjoint intervals
For each t > s ≥ 0, B_t - B_s ~ N(0, t - s)
Properties of Brownian Motion Brownian motion can be considered as a limit of a random walk, packing an infinite number of steps into
every finite time interval. On the basis of this we can derive the following scaling property: for any c > 0,
B_{ct}/sqrt(c) is again standard Brownian motion.
Furthermore a Brownian motion process restarted at any moment is still a Brownian motion process:
B_{s+t} - B_s, t ≥ 0, is standard Brownian motion, independent of the path up to time s.
Brownian motion with parameter σ^2 is defined to have the same distribution as σB_t.
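Brownian motion can be simulated by summing independent N(0, dt) increments (a sketch; the step count, horizon and sample size are arbitrary), and the variance at time t comes out close to t:

```python
import random

random.seed(6)

def brownian_endpoint(t, steps):
    """B_t approximated by summing independent N(0, dt) increments."""
    dt = t / steps
    b = 0.0
    for _ in range(steps):
        b += random.gauss(0.0, dt ** 0.5)
    return b

samples = [brownian_endpoint(2.0, 100) for _ in range(10_000)]
var = sum(x * x for x in samples) / len(samples)
print(var)   # Var(B_2) = 2, up to Monte Carlo error
```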
Multivariate Normal Distribution The multivariate normal distribution describes the joint distribution of sums of independent normal
variables. We say that X has the multivariate normal distribution if its density is:
f(x) = (2π)^{-k/2} |Σ|^{-1/2} exp(-(x - μ)ᵀ Σ^{-1} (x - μ)/2).
Where Σ is a positive definite matrix (meaning zᵀΣz > 0 for all z ≠ 0, and so the matrix has a ‘square root’) called the
covariance matrix, and μ is a k-vector mean.
So an individual entry is given by: Σ_ij = Cov(X_i, X_j).
An easier way of writing a multivariate normal is by using a lower triangular matrix L such that LLᵀ = Σ.
We can also write: X = μ + LZ, where Z is a vector of independent standard normals.
Since we know the density of Z, we can find the density of X by simply making a change of variables.
For any invertible matrix M, we have that MX is a multivariate normal with mean Mμ and covariance MΣMᵀ.
Joint Brownian Distribution The joint distributions of Brownian motion observed at a collection of times t_1 < t_2 < … < t_n are linear functions of
independent normal variables, which correspond to the increments. Thus we can write:
(B_{t_1}, …, B_{t_n}) as a linear transformation of (B_{t_1}, B_{t_2} - B_{t_1}, …, B_{t_n} - B_{t_{n-1}}).
Therefore we see that we can write the joint distribution as a multivariate normal distribution.
The means are zero and so the distribution is entirely determined by the pairwise covariances, which
we can compute as: Cov(B_s, B_t) = Cov(B_s, B_s) + Cov(B_s, B_t - B_s) = s for s ≤ t.
Thus we have the covariance matrix for the joint distribution of (B_{t_1}, …, B_{t_n}): Σ_ij = min(t_i, t_j).
Einstein Derivation of Brownian Motion Brownian motion arises as the limit of random walk, and so inherits the definition/properties from the
random walk. This result is called Donsker’s Theorem or the invariance principle.
Define a simple random walk with time steps of length h and space steps of length sqrt(h), and let p(x, t)
be the density of its position.
By the law of total probability: p(x, t + h) = (1/2)p(x - sqrt(h), t) + (1/2)p(x + sqrt(h), t).
Taking h → 0 and doing some rearranging (a Taylor expansion in x and t) we find:
∂p/∂t = (1/2) ∂²p/∂x².
This is the heat equation which under appropriate boundary conditions has the solution:
p(x, t) = (1/sqrt(2πt)) e^{-x²/(2t)}.
Which is the normal density.
Hitting Times Define the first time when a Brownian motion process hits the level b > 0 to be T_b. Since Brownian motion is
continuous, if B_t > b then T_b < t. Furthermore, since random walk is recurrent, T_b is finite with probability one.
We can derive the distribution of T_b by the equation:
P(B_t > b) = P(B_t > b | T_b ≤ t) P(T_b ≤ t).
So all that remains is to find the conditional probability. However since we know that restarting
a Brownian motion at the moment it hits b leaves the result still a Brownian motion process (started at b), we have
the reflection principle: P(B_t > b | T_b ≤ t) = 1/2.
Thus we find that hitting times are distributed as: P(T_b ≤ t) = 2P(B_t > b).
The density of the hitting time is thus given by what is called Levy’s distribution:
f_{T_b}(t) = (b/sqrt(2πt³)) e^{-b²/(2t)}.
Maximum of Brownian Motion The distribution of the maximum M_t = max_{s≤t} B_s of B up to time t is derived as follows:
P(M_t ≥ b) = P(T_b ≤ t) = 2P(B_t > b) = P(|B_t| > b).
So we find that the maximum of Brownian motion is distributed as the absolute value of a normal
distribution.
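A discretised simulation sketch supports this (the 200-step grid slightly under-samples the true maximum, so the agreement with 2P(B_t > b) is only approximate; the level b = 1 and horizon t = 1 are arbitrary):

```python
import random
from math import erf, sqrt

random.seed(7)

def max_of_path(t, steps):
    """Running maximum of a discretised standard Brownian path on [0, t]."""
    dt = t / steps
    b, m = 0.0, 0.0
    for _ in range(steps):
        b += random.gauss(0.0, dt ** 0.5)
        m = max(m, b)
    return m

t, b, trials = 1.0, 1.0, 10_000
hit = sum(max_of_path(t, 200) >= b for _ in range(trials)) / trials
exact = 2 * (1 - 0.5 * (1 + erf(b / sqrt(2 * t))))   # 2 P(B_t > b) = P(|B_t| > b)
print(hit, exact)
```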
Relative Hitting Times By analogy with the gambler’s ruin problem (with p = q = 1/2), for a < 0 < b:
P(T_b < T_a) = -a/(b - a).