Detection Theory - Binary and M-ary Hypothesis Testing
Bayesian Hypothesis Testing. Statistical Decision Theory I.
Simple hypothesis testing: binary hypothesis testing, Bayesian hypothesis testing, minimax hypothesis testing, Neyman-Pearson criterion, M hypotheses, receiver operating characteristics.
Composite hypothesis testing: approaches to composite hypothesis testing, performance of the GLRT for large data records, nuisance parameters.
Classical detection and estimation theory. What is detection? Signal detection and estimation is the area of study that deals with
the processing of information-bearing signals for the purpose of extracting information from them.
[Figure: a simple digital communication system. A source produces a digital sequence; the transmitter maps it into a signal s(t); the channel adds noise, so the receiver observes r(t) = s(t) + n(t). For example, over each interval of length T the bits 1 and 0 may be sent as sin(ω1 t) and sin(ω0 t).]
Components of a decision theory problem.
1. Source - generates an output (one of the hypotheses, H0 or H1).
2. Probabilistic transition mechanism - a device that knows which hypothesis is true and generates a point in the observation space according to some probability law.
3. Observation space - describes all possible outcomes of the transition mechanism.
4. Decision rule - assigns each point in the observation space to one of the hypotheses.

[Figure: block diagram - source (H0 or H1) → probabilistic transition mechanism → observation space → decision rule → decision.]
Components of a decision theory problem.
Example:
When H1 is true the source generates +1.
When H0 is true the source generates -1.
An independent discrete random variable n, whose probability density is known, is added to the source output:

p_n(N): mass 1/4, 1/2, 1/4 at N = -1, 0, +1.

Shifting this density by the source output gives the conditional densities of the observation:

p_n(N | H1): mass 1/4, 1/2, 1/4 at N = 0, +1, +2;
p_n(N | H0): mass 1/4, 1/2, 1/4 at N = -2, -1, 0.

[Figure: the source emits +1 under H1 or -1 under H0; the transition mechanism adds n; the observation space consists of the possible values of r.]
The sum of the source output and n is the observed variable r. The observation space has finite dimension, i.e. the observation consists of a set of N numbers and can be represented as a point in an N-dimensional space.
Under the two hypotheses we have

H1: r = 1 + n
H0: r = -1 + n

After observing the outcome in the observation space we shall guess which hypothesis is true. We use a decision rule that assigns each point to one of the hypotheses.
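The likelihood ratio for this discrete example can be tabulated directly from the noise pmf. A minimal sketch (the `Fraction`-based helper functions are illustrative, not from the original text):

```python
# Discrete ±1-source example: noise pmf p_n(-1) = p_n(+1) = 1/4, p_n(0) = 1/2;
# H1 adds +1 to the noise, H0 adds -1.
from fractions import Fraction

noise = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}

def pmf_given(h, r):
    """p(r | H_h): the noise pmf shifted by +1 under H1, by -1 under H0."""
    shift = 1 if h == 1 else -1
    return noise.get(r - shift, Fraction(0))

def likelihood_ratio(r):
    p1, p0 = pmf_given(1, r), pmf_given(0, r)
    return float('inf') if p0 == 0 else p1 / p0

for r in (-2, -1, 0, 1, 2):
    print(r, likelihood_ratio(r))
```

Observations r = +1, +2 can only occur under H1 (infinite ratio), r = -1, -2 only under H0 (zero ratio), and r = 0 is equally likely under both (ratio 1).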
Detection and estimation applications involve making inferences from observations that are distorted or corrupted in some unknown manner.
Simple binary hypothesis testing. The decision problem in which each of two source outputs corresponds to a hypothesis. Each hypothesis maps into a point in the observation space. We assume that the observation consists of a set of N observations r1, r2, ..., rN. Each set can be represented as a vector r:

r = [r1, r2, ..., rN]^T

The probabilistic transition mechanism generates points in accord with the two known conditional densities p_{r|H1}(R|H1) and p_{r|H0}(R|H0).
The objective is to use this information to develop a decision rule.
Decision criteria. In the binary hypothesis problem either H0 or H1 is true. We are seeking decision rules for making a choice. Each time the experiment is conducted one of four things can happen:
1. H0 true; choose H0 (correct)
2. H0 true; choose H1
3. H1 true; choose H1 (correct)
4. H1 true; choose H0
The purpose of a decision criterion is to attach some relative importance to the four possible courses of action.
The method for processing the received data depends on the decision criterion we select.
Bayesian criterion. The source generates the two outputs with given (a priori) probabilities P1, P0. These represent the observer's information before the experiment is conducted.
A cost is assigned to each course of action: C00, C10, C01, C11. Each time the experiment is conducted a certain cost is incurred. The decision rule is designed so that on the average the cost is as small as possible. Two probabilities are averaged over: the a priori probability and the probability that a particular course of action will be taken.
[Figure: the source drives one of the densities p_{r|H1}(R|H1), p_{r|H0}(R|H0) over the observation space Z, which is divided into a region Z0 ("say H0") and a region Z1 ("say H1").]
The expected value of the cost is

R = C00 P0 Pr(say H0 | H0 is true)
  + C10 P0 Pr(say H1 | H0 is true)
  + C11 P1 Pr(say H1 | H1 is true)
  + C01 P1 Pr(say H0 | H1 is true)
The binary decision rule divides the total observation space Z into two parts, Z0 and Z1.
Each point in the observation space is assigned to one of these sets. The risk expressed in terms of the transition probabilities and the decision regions:
R = C00 P0 ∫_{Z0} p_{r|H0}(R|H0) dR + C10 P0 ∫_{Z1} p_{r|H0}(R|H0) dR
  + C11 P1 ∫_{Z1} p_{r|H1}(R|H1) dR + C01 P1 ∫_{Z0} p_{r|H1}(R|H1) dR
Z0 and Z1 together cover the observation space (each conditional density integrates to one over Z). We assume that the cost of a wrong decision is higher than the cost of a correct decision:

C10 > C00
C01 > C11
For the Bayes test the regions Z0 and Z1 are chosen such that the risk is minimized.
We assume that a decision is made for every point in the observation space (Z = Z0 + Z1, so Z1 = Z - Z0).
Rewriting the risk with all integrals over Z0:
R = C00 P0 ∫_{Z0} p_{r|H0}(R|H0) dR + C10 P0 ∫_{Z-Z0} p_{r|H0}(R|H0) dR
  + C11 P1 ∫_{Z-Z0} p_{r|H1}(R|H1) dR + C01 P1 ∫_{Z0} p_{r|H1}(R|H1) dR
Observing that

∫_{Z} p_{r|H0}(R|H0) dR = ∫_{Z} p_{r|H1}(R|H1) dR = 1,

we obtain

R = P0 C10 + P1 C11 + ∫_{Z0} [ P1 (C01 - C11) p_{r|H1}(R|H1) - P0 (C10 - C00) p_{r|H0}(R|H0) ] dR
The integral represents the cost contributed by those points R that we assign to Z0.
The values of R where the second term is larger than the first contribute a negative amount to the integral and should be included in Z0.
The values of R where the two terms are equal have no effect. The decision regions are therefore defined by the statement: if

P1 (C01 - C11) p_{r|H1}(R|H1) ≥ P0 (C10 - C00) p_{r|H0}(R|H0),

assign R to Z1 and say that H1 is true. Otherwise assign R to Z0 and say that H0 is true.
This may be expressed as:

p_{r|H1}(R|H1) / p_{r|H0}(R|H0)  ≷_{H0}^{H1}  P0 (C10 - C00) / ( P1 (C01 - C11) )

The quantity

Λ(R) = p_{r|H1}(R|H1) / p_{r|H0}(R|H0)

is called the likelihood ratio.
Regardless of the dimension of R, Λ(R) is a one-dimensional variable. The data processing is involved in computing Λ(R) and is not affected by the a priori probabilities and cost assignments.
The quantity

η = P0 (C10 - C00) / ( P1 (C01 - C11) )

is the threshold of the test.
The threshold η can be left as a variable and may be changed if our a priori knowledge or costs change.
The Bayes criterion has led us to a likelihood ratio test (LRT):

Λ(R) ≷_{H0}^{H1} η

An equivalent test is

ln Λ(R) ≷_{H0}^{H1} ln η
Summary of the Bayesian test: the Bayesian test can be conducted simply by calculating the likelihood ratio Λ(R) and comparing it to the threshold. Test design:
- Assign a priori probabilities to the source outputs.
- Assign costs to each action.
- Assume distributions for p_{r|H1}(R|H1) and p_{r|H0}(R|H0).
- Calculate and simplify Λ(R).
[Figure: likelihood ratio processor - the data R enters a data processor that computes Λ(R); a threshold device compares Λ(R) with η and outputs the decision.]
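The processor above can be sketched in a few lines. The Gaussian densities and the cost/prior numbers below are illustrative assumptions, not part of the original text:

```python
# Minimal sketch of a Bayes likelihood-ratio "processor": compute the
# threshold eta from priors and costs, then compare Lambda(R) to it.
import math

def gauss(x, mean, sigma):
    # Gaussian density, used here as an assumed observation model
    return math.exp(-(x - mean) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def bayes_threshold(P0, P1, C00, C10, C11, C01):
    # eta = P0 (C10 - C00) / (P1 (C01 - C11))
    return P0 * (C10 - C00) / (P1 * (C01 - C11))

def lrt_decide(R, eta, p1, p0):
    # Say H1 (return 1) when Lambda(R) >= eta, else say H0 (return 0)
    return 1 if p1(R) / p0(R) >= eta else 0

eta = bayes_threshold(P0=0.5, P1=0.5, C00=0, C10=1, C11=0, C01=1)
p1 = lambda x: gauss(x, 1.0, 1.0)   # assumed density under H1: mean 1
p0 = lambda x: gauss(x, 0.0, 1.0)   # assumed density under H0: mean 0
print(eta, lrt_decide(0.9, eta, p1, p0))
```

With equal priors and 0-1 costs the threshold is 1, so the rule reduces to picking the hypothesis with the larger likelihood.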
Special case.

C00 = C11 = 0, C01 = C10 = 1.

R = P0 ∫_{Z1} p_{r|H0}(R|H0) dR + P1 ∫_{Z0} p_{r|H1}(R|H1) dR

ln Λ(R) ≷_{H0}^{H1} ln η = ln P0 - ln P1

When the two hypotheses are equally likely, the threshold ln η is zero. With these costs the risk equals the total probability of error.
Sufficient statistics.
A sufficient statistic is a function T that maps the initial data set R to a new data set T(R) that still contains all the information in R relevant to the problem under investigation.
The statistic that contains a minimal number of elements is called a minimal sufficient statistic.
When making a decision, knowing the value of the sufficient statistic is just as good as knowing R.
The integrals in the Bayes test.
Probability of false alarm:

PF = ∫_{Z1} p_{r|H0}(R|H0) dR.   We say that the target is present when it is not.

Probability of detection:

PD = ∫_{Z1} p_{r|H1}(R|H1) dR.

Probability of miss:

PM = ∫_{Z0} p_{r|H1}(R|H1) dR.   We say the target is absent when it is present.
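These three probabilities can be estimated by Monte Carlo for a simple scalar case (H1: r = m + n, H0: r = n, with n ~ N(0,1)); all numbers below are assumed for illustration:

```python
# Monte-Carlo estimates of P_F, P_D, P_M for a scalar Gaussian problem.
# gamma is an assumed decision threshold on the observation r itself.
import random

random.seed(0)
m, gamma, trials = 2.0, 1.0, 100_000
h0 = [random.gauss(0, 1) for _ in range(trials)]        # samples under H0
h1 = [m + random.gauss(0, 1) for _ in range(trials)]    # samples under H1

PF = sum(r >= gamma for r in h0) / trials   # say "present" when target absent
PD = sum(r >= gamma for r in h1) / trials   # say "present" when target present
PM = 1 - PD                                 # say "absent" when target present
print(PF, PD, PM)
```

For these numbers the exact values are PF = Q(1) ≈ 0.159 and PD = Q(-1) ≈ 0.841, which the estimates should approach.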
Special case: the a priori probabilities unknown. Minimax test.

R = C00 P0 ∫_{Z0} p_{r|H0}(R|H0) dR + C10 P0 ∫_{Z1} p_{r|H0}(R|H0) dR
  + C11 P1 ∫_{Z1} p_{r|H1}(R|H1) dR + C01 P1 ∫_{Z0} p_{r|H1}(R|H1) dR

If the regions Z0 and Z1 are fixed, the integrals are determined:

R = P0 C10 + P1 C11 + P1 (C01 - C11) PM - P0 (C10 - C00)(1 - PF),   with P0 = 1 - P1.

The Bayes risk is then a function of P1.
Substituting P0 = 1 - P1 and collecting terms:

R(P1) = C00 (1 - PF) + C10 PF
      + P1 [ (C11 - C00) + (C01 - C11) PM - (C10 - C00) PF ]
The Bayes test can be found if all the costs and a priori probabilities are known; knowing them, we can calculate the Bayes risk.
Assume that we do not know P1, assume a certain value P1*, and design the corresponding test.
If P1 changes, the regions Z0, Z1 change, and with them PF and PD.
The test is designed for P1* but the actual a priori probability is P1. By assuming P1* we fix PF and PD. The cost for different P1 is given by a function R(P1*, P1). Because the threshold is fixed, R(P1*, P1) is a linear function of P1.
The Bayes test minimizes the risk at P1 = P1*. For other values of P1, R(P1*, P1) ≥ R(P1). R(P1) is strictly concave. (If Λ(R) is a continuous random variable with a strictly monotonic probability distribution function, a change of η always changes the risk.)

[Figure: the Bayes risk R_B(P1) is a concave curve running from C00 at P1 = 0 to C11 at P1 = 1; for fixed P1* the fixed-test risk R_F is the straight line tangent to it at P1*.]
Minimax test. The Bayes test designed to minimize the maximum possible risk is called a minimax test. The unknown P1 may take the value that maximizes our risk R(P1*, P1). To minimize the maximum risk we select P1* as the value at which R(P1) is maximum.
If the maximum occurs inside the interval [0, 1], the line R(P1*, P1) becomes horizontal: the coefficient of P1 must be zero,

(C11 - C00) + (C01 - C11) PM - (C10 - C00) PF = 0.

This is the minimax equation.
[Risk curves: maximum value of R at a) P1 = 1, b) P1 = 0, c) 0 < P1 < 1.]
Special case. The cost function is

C00 = C11 = 0,  C01 = CM,  C10 = CF.

The risk is

R = CF PF (1 - P1) + CM PM P1 = CF PF + P1 (CM PM - CF PF).

The minimax equation is

CM PM = CF PF.
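For this special case the minimax equation can be solved numerically. The sketch below assumes the scalar Gaussian pair N(0,1) versus N(d,1), for which PF = Q(γ) and PM = 1 - Q(γ - d), with Q the upper-tail Gaussian integral; when CM = CF the solution is γ = d/2 by symmetry:

```python
# Solve C_M P_M = C_F P_F by bisection for a Gaussian mean-shift problem.
# d, C_M, C_F are assumed example values, not from the original text.
import math

def Q(x):
    """Upper-tail Gaussian probability (the erfc_* of the notes)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def minimax_threshold(d, CM, CF, lo=-10.0, hi=10.0):
    # f is increasing in gamma: C_M P_M(gamma) - C_F P_F(gamma)
    f = lambda g: CM * (1 - Q(g - d)) - CF * Q(g)
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

g = minimax_threshold(d=2.0, CM=1.0, CF=1.0)
print(g, Q(g), 1 - Q(g - 2.0))
```

With equal costs the root lands at γ = d/2 = 1, where false-alarm and miss probabilities are equal.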
Neyman-Pearson test. Often it is difficult to assign realistic costs or a priori probabilities.
This can be bypassed by working with the conditional probabilities PF and PD.
We have two conflicting objectives: to make PF as small as possible and PD as large as possible.
Neyman-Pearson criterion. Constrain PF = α' and design a test to maximize PD (or minimize PM) under this constraint. The solution can be obtained by using a Lagrange multiplier:

F = PM + λ (PF - α')
  = ∫_{Z0} p_{r|H1}(R|H1) dR + λ [ ∫_{Z1} p_{r|H0}(R|H0) dR - α' ]

If PF = α', minimizing F minimizes PM. Since Z1 = Z - Z0,

F = λ (1 - α') + ∫_{Z0} [ p_{r|H1}(R|H1) - λ p_{r|H0}(R|H0) ] dR

For any positive value of λ an LRT will minimize F. F is minimized by assigning a point R to Z0 only when the term in the bracket is negative.
If

p_{r|H1}(R|H1) / p_{r|H0}(R|H0) < λ,

assign the point to Z0 (say H0). Thus F is minimized by the likelihood ratio test

Λ(R) ≷_{H0}^{H1} λ
To satisfy the constraint, λ is selected so that PF = α':

PF = ∫_{λ}^{∞} p_{Λ|H0}(X|H0) dX = α'.

The value of λ will be nonnegative because p_{Λ|H0}(X|H0) is zero for negative values of X (a likelihood ratio cannot be negative).
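Threshold selection under the Neyman-Pearson constraint can be sketched for the Gaussian mean-shift case, where the test statistic is Gaussian, PF = Q(γ) and PD = Q(γ - d). The numbers (α' = 0.1, d = 2) are assumptions for illustration:

```python
# Pick the threshold gamma so that P_F = alpha', then read off P_D.
import math

def Q(x):
    # upper-tail Gaussian probability
    return 0.5 * math.erfc(x / math.sqrt(2))

def Qinv(alpha, lo=-10.0, hi=10.0):
    # Q is strictly decreasing, so bisect for Q(gamma) = alpha
    for _ in range(200):
        mid = (lo + hi) / 2
        if Q(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 0.1                 # assumed false-alarm constraint alpha'
d = 2.0                     # assumed separation between the two means
gamma = Qinv(alpha)         # threshold achieving P_F = alpha'
PD = Q(gamma - d)           # detection probability at this operating point
print(gamma, PD)
```

The constraint alone fixes the threshold; the achievable PD then follows from the separation d.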
[Figure: the conditional densities p_{l|H0}(X|H0) and p_{l|H1}(X|H1) of the test statistic along the L axis, with the threshold separating the two decision regions.]
Example. We assume that under H1 the source output is a constant voltage m. Under H0 the source output is zero. The voltage is corrupted by additive noise. The output is sampled, yielding N samples. The noise samples are i.i.d. zero-mean Gaussian random variables with variance σ².

H1: r_i = m + n_i,  i = 1, 2, ..., N
H0: r_i = n_i,      i = 1, 2, ..., N

p_{n_i}(X) = (1 / (√(2π) σ)) exp( -X² / (2σ²) )
The probability density of r_i under each hypothesis is:

p_{r_i|H1}(R_i|H1) = p_{n_i}(R_i - m) = (1 / (√(2π) σ)) exp( -(R_i - m)² / (2σ²) )

p_{r_i|H0}(R_i|H0) = p_{n_i}(R_i) = (1 / (√(2π) σ)) exp( -R_i² / (2σ²) )
The joint probability density of the N (independent) samples is:

p_{r|H1}(R|H1) = ∏_{i=1}^{N} (1 / (√(2π) σ)) exp( -(R_i - m)² / (2σ²) )

p_{r|H0}(R|H0) = ∏_{i=1}^{N} (1 / (√(2π) σ)) exp( -R_i² / (2σ²) )
The likelihood ratio is

Λ(R) = [ ∏_{i=1}^{N} (1 / (√(2π) σ)) exp( -(R_i - m)² / (2σ²) ) ] / [ ∏_{i=1}^{N} (1 / (√(2π) σ)) exp( -R_i² / (2σ²) ) ]
After cancelling common terms and taking the logarithm:

ln Λ(R) = (m / σ²) Σ_{i=1}^{N} R_i - N m² / (2σ²).
The likelihood ratio test is:

(m / σ²) Σ_{i=1}^{N} R_i - N m² / (2σ²)  ≷_{H0}^{H1}  ln η,

or, equivalently,

Σ_{i=1}^{N} R_i  ≷_{H0}^{H1}  (σ² / m) ln η + N m / 2.
We define d = √N m / σ for normalisation:

l = (1 / (√N σ)) Σ_{i=1}^{N} R_i  ≷_{H0}^{H1}  (1/d) ln η + d/2

Under H0, l is obtained by adding N independent zero-mean Gaussian variables with variance σ² and dividing by √N σ. Therefore l is N(0, 1).
Under H1, l is N(√N m / σ, 1) = N(d, 1).
PF = ∫_{(ln η)/d + d/2}^{∞} (1/√(2π)) exp(-x²/2) dx = erfc_*( (ln η)/d + d/2 ),

where erfc_* denotes the upper-tail Gaussian integral and d = √N m / σ is the distance between the means of the two densities.
PD = ∫_{(ln η)/d + d/2}^{∞} (1/√(2π)) exp( -(x - d)²/2 ) dx
   = ∫_{(ln η)/d - d/2}^{∞} (1/√(2π)) exp( -y²/2 ) dy
   = erfc_*( (ln η)/d - d/2 )
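The closed-form PF and PD can be cross-checked by Monte Carlo on the normalized statistic l; the values of N, m, σ and η below are assumptions for illustration:

```python
# Closed-form P_F = erfc_*(ln eta/d + d/2), P_D = erfc_*(ln eta/d - d/2)
# versus Monte-Carlo estimates on l = (1/(sqrt(N) sigma)) sum r_i.
import math, random

def Q(x):
    # upper-tail Gaussian integral (erfc_* of the notes)
    return 0.5 * math.erfc(x / math.sqrt(2))

N, m, sigma, eta = 4, 1.0, 2.0, 1.0          # assumed example numbers
d = math.sqrt(N) * m / sigma                  # d = sqrt(N) m / sigma
thr = math.log(eta) / d + d / 2               # threshold on l
PF = Q(thr)
PD = Q(thr - d)

random.seed(1)
trials = 200_000
def stat(mean):
    # normalized statistic l for one experiment with per-sample mean `mean`
    return sum(random.gauss(mean, sigma) for _ in range(N)) / (math.sqrt(N) * sigma)

pf_mc = sum(stat(0.0) >= thr for _ in range(trials)) / trials
pd_mc = sum(stat(m) >= thr for _ in range(trials)) / trials
print(PF, pf_mc, PD, pd_mc)
```

With η = 1 and d = 1 the operating point is PF = Q(0.5) ≈ 0.309, PD = Q(-0.5) ≈ 0.691, and the empirical fractions should agree to a few decimal places.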
In communication systems a special case is important:

Pr(ε) = P0 PF + P1 PM.

If P0 = P1 the threshold is one and Pr(ε) = (1/2)(PF + PM).
Receiver Operating Characteristics (ROC).
For a Neyman-Pearson test the values of PF and PD completely specify the test performance.
PD depends on PF. The function PD(PF) is defined as the Receiver Operating Characteristic (ROC).
The ROC completely describes the performance of the test as a function of the parameters of interest.
Example: for the Gaussian example above, sweeping the threshold η traces out an ROC PD(PF) for each value of d.
Properties of the ROC.
- All continuous likelihood ratio tests have ROCs that are concave downward.
- All continuous likelihood ratio tests have ROCs that lie above the line PD = PF.
- The slope of the ROC at a particular point is equal to the value of the threshold η required to achieve the PF and PD of that point.
- Whenever the maximum value of the Bayes risk is interior to the interval (0, 1) of the P1 axis, the minimax operating point is the intersection of the line

(C11 - C00) + (C01 - C11)(1 - PD) - (C10 - C00) PF = 0

and the appropriate curve on the ROC.
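The first two properties can be illustrated numerically for the Gaussian mean-shift ROC, PD = erfc_*( erfc_*^{-1}(PF) - d ), here written with Q in place of erfc_*; the value d = 1 is an assumption:

```python
# Check numerically that the Gaussian ROC lies above P_D = P_F and is
# concave (its increments over a uniform P_F grid are non-increasing).
import math

def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2))

def Qinv(alpha, lo=-10.0, hi=10.0):
    # bisection inverse of the decreasing function Q
    for _ in range(200):
        mid = (lo + hi) / 2
        if Q(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

d = 1.0
pfs = [i / 100 for i in range(1, 100)]
pds = [Q(Qinv(pf) - d) for pf in pfs]

above = all(pd > pf for pf, pd in zip(pfs, pds))
concave = all(pds[i + 1] - pds[i] <= pds[i] - pds[i - 1] + 1e-12
              for i in range(1, len(pds) - 1))
print(above, concave)
```

The slope property can be seen the same way: the increment ratio at each grid point approximates the threshold η achieving that operating point.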
[Figure: determination of the minimax operating point on the ROC.]
Conclusions. Using either the Bayes criterion or a Neyman-Pearson criterion, we find that the optimum test is a likelihood ratio test. Regardless of the dimension of the observation space, the optimum test consists of comparing the scalar variable Λ(R) with a threshold. For the binary hypothesis test the decision space is one-dimensional. The test can be simplified by calculating a sufficient statistic. A complete description of the LRT performance was obtained by plotting the conditional probabilities PD and PF as the threshold was varied.
M Hypotheses. We choose one of M hypotheses. There are M source outputs, each of which corresponds to one of the M hypotheses. We are forced to make a decision. There are M² alternatives that may occur each time the experiment is conducted.
[Figure: the source selects one of H0, H1, ..., H_{M-1}; the probabilistic transition mechanism generates R according to the corresponding density p_{r|H0}(R|H0), p_{r|H1}(R|H1), ..., p_{r|H_{M-1}}(R|H_{M-1}); the observation space Z is divided into regions Z0, Z1, ..., Z_i, where region Z_i means "say H_i".]
Bayes Criterion.
C_ij - cost of each course of action (choose H_i when H_j is true).
Z_i - region of the observation space in which we choose H_i.
P_i - a priori probabilities.

R = Σ_{i=0}^{M-1} Σ_{j=0}^{M-1} P_j C_ij ∫_{Z_i} p_{r|H_j}(R|H_j) dR

R is minimized through the selection of the regions Z_i.
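On a small discrete observation space the double-sum risk formula can be evaluated directly; every number below (priors, costs, conditional probabilities, regions) is an illustrative assumption:

```python
# R = sum_i sum_j P_j C_ij * Pr(R in Z_i | H_j) on a tiny discrete space.
M = 2
P = [0.4, 0.6]                           # assumed priors P_0, P_1
C = [[0.0, 1.0],                         # C[i][j]: cost of saying H_i
     [1.0, 0.0]]                         # when H_j is true (0-1 costs)
p = [[0.7, 0.2, 0.1],                    # p(R | H_0) over observations 0,1,2
     [0.1, 0.2, 0.7]]                    # p(R | H_1)
Z = {0: [0, 1], 1: [2]}                  # assumed decision regions

risk = sum(P[j] * C[i][j] * sum(p[j][r] for r in Z[i])
           for i in range(M) for j in range(M))
print(risk)
```

Only the two error terms survive the 0-1 costs here: P0·Pr(Z1|H0) + P1·Pr(Z0|H1) = 0.4·0.1 + 0.6·0.3 = 0.22.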
Example, M = 3.

Z = Z0 + Z1 + Z2

R = P0 C00 ∫_{Z0} p_{r|H0}(R|H0) dR + P0 C10 ∫_{Z1} p_{r|H0}(R|H0) dR + P0 C20 ∫_{Z2} p_{r|H0}(R|H0) dR
  + P1 C01 ∫_{Z0} p_{r|H1}(R|H1) dR + P1 C11 ∫_{Z1} p_{r|H1}(R|H1) dR + P1 C21 ∫_{Z2} p_{r|H1}(R|H1) dR
  + P2 C02 ∫_{Z0} p_{r|H2}(R|H2) dR + P2 C12 ∫_{Z1} p_{r|H2}(R|H2) dR + P2 C22 ∫_{Z2} p_{r|H2}(R|H2) dR
Using the fact that each density integrates to one over Z = Z0 + Z1 + Z2, this reduces to

R = P0 C00 + P1 C11 + P2 C22
  + ∫_{Z0} [ P2 (C02 - C22) p_{r|H2}(R|H2) + P1 (C01 - C11) p_{r|H1}(R|H1) ] dR
  + ∫_{Z1} [ P0 (C10 - C00) p_{r|H0}(R|H0) + P2 (C12 - C22) p_{r|H2}(R|H2) ] dR
  + ∫_{Z2} [ P0 (C20 - C00) p_{r|H0}(R|H0) + P1 (C21 - C11) p_{r|H1}(R|H1) ] dR
R is minimized if we assign each R to the region in which the value of the integrand is the smallest.
Label the integrands of the three region integrals I0(R), I1(R), I2(R).
Define the likelihood ratios Λ1(R) = p_{r|H1}(R|H1) / p_{r|H0}(R|H0) and Λ2(R) = p_{r|H2}(R|H2) / p_{r|H0}(R|H0). Comparing the integrands pairwise gives the rules:

P1 (C01 - C11) Λ1(R)  ≷_{H0 or H2}^{H1 or H2}  P0 (C10 - C00) + P2 (C12 - C02) Λ2(R)

P2 (C02 - C22) Λ2(R)  ≷_{H0 or H1}^{H2 or H1}  P0 (C20 - C00) + P1 (C21 - C01) Λ1(R)

P2 (C12 - C22) Λ2(R)  ≷_{H1 or H0}^{H2 or H0}  P0 (C20 - C10) + P1 (C21 - C11) Λ1(R)

M hypotheses always lead to a decision space that has, at most, M - 1 dimensions.
[Figure: decision space in the (Λ1(R), Λ2(R)) plane, divided into regions labelled H0, H1, H2.]
Special case (common in communication systems):

C00 = C11 = C22 = 0,  C_ij = 1 for i ≠ j.

The rules reduce to:

P1 p_{r|H1}(R|H1)  ≷_{H0 or H2}^{H1 or H2}  P0 p_{r|H0}(R|H0)

P2 p_{r|H2}(R|H2)  ≷_{H0 or H1}^{H2 or H1}  P0 p_{r|H0}(R|H0)

P2 p_{r|H2}(R|H2)  ≷_{H1 or H0}^{H2 or H0}  P1 p_{r|H1}(R|H1)
Equivalently: compute the a posteriori probabilities (proportional to P_i p_{r|H_i}(R|H_i)) and choose the largest - a maximum a posteriori (MAP) probability computer.
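A sketch of such a MAP computer for M = 3, with equal priors and Gaussian likelihoods (the means are assumed for illustration):

```python
# MAP decision for M = 3 with C_ii = 0, C_ij = 1: choose the i that
# maximizes P_i p(R | H_i). Means/priors below are assumed example values.
import math

def gauss(x, mean, sigma=1.0):
    return math.exp(-(x - mean) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

priors = [1 / 3, 1 / 3, 1 / 3]          # P_0, P_1, P_2
means = [-1.0, 0.0, 1.0]                # assumed means under H_0, H_1, H_2

def map_decide(R):
    posts = [P * gauss(R, mu) for P, mu in zip(priors, means)]
    return max(range(3), key=lambda i: posts[i])

print([map_decide(r) for r in (-2.0, 0.1, 3.0)])
```

With equal priors the MAP rule degenerates to choosing the hypothesis whose mean is nearest to the observation.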
Special case: degeneration of hypotheses. What happens if we combine H1 and H2?

C12 = C21 = 0.

For simplicity,

C01 = C10 = C20 = C02,
C00 = C11 = C22 = 0.

The first two equations of the test reduce to

P1 Λ1(R) + P2 Λ2(R)  ≷_{H0}^{H1 or H2}  P0

[Figure: the (Λ1, Λ2) plane divided by the line P1 Λ1 + P2 Λ2 = P0 into a region "H1 or H2" and a region H0.]
Dummy hypothesis. The actual problem has two hypotheses, H1 and H2. We introduce a new one, H0, with a priori probability P0 = 0. Let P1 + P2 = 1 and C12 = C02, C21 = C01. We always choose H1 or H2. The test reduces to:

P2 (C12 - C22) Λ2(R)  ≷_{H1}^{H2}  P1 (C21 - C11) Λ1(R)

This is useful if

p_{r|H2}(R|H2) / p_{r|H1}(R|H1)

is difficult to work with but Λ1(R) and Λ2(R) are simple.
Conclusions.
1. The minimum dimension of the decision space is no more than M - 1. The boundaries of the decision regions are hyperplanes in the (Λ1, ..., Λ_{M-1}) plane.
2. The test is simple to find, but the error probabilities are often difficult to compute.
3. An important test is the minimum total probability of error test. Here we compute the a posteriori probability of each hypothesis and choose the largest.