Detection Theory - Binary and M-ary Hypothesis Testing
Bayesian Hypothesis Testing. Statistical Decision Theory I.
Simple hypothesis testing: binary hypothesis testing, Bayesian hypothesis testing, minimax hypothesis testing, Neyman-Pearson criterion, M hypotheses, receiver operating characteristics.
Composite hypothesis testing: approaches to composite hypothesis testing, performance of the GLRT for large data records, nuisance parameters.
Classical detection and estimation theory. What is detection? Signal detection and estimation is the area of study that deals with
the processing of information-bearing signals for the purpose of extracting information from them.
[Figure: a simple digital communication system. A source produces a digital sequence; the transmitter maps it into a signal s(t); the channel adds noise, so the receiver observes r(t) = s(t) + n(t). For example, over each interval of length T the bits 1 and 0 may be sent as sin(ω1 t) and sin(ω0 t).]
Components of a decision theory problem.
1. Source - generates an output (one of the hypotheses, H0 or H1).
2. Probabilistic transition mechanism - a device that knows which hypothesis is true and generates a point in the observation space according to some probability law.
3. Observation space - describes all possible outcomes of the transition mechanism.
4. Decision rule - assigns each point in the observation space to one of the hypotheses.

[Figure: block diagram - source (H0 or H1) → probabilistic transition mechanism → observation space → decision rule → decision.]
Components of a decision theory problem.
Example:
When H1 is true the source generates +1.
When H0 is true the source generates -1.
An independent discrete random variable n, whose probability density is known, is added to the source output:

p_n(N): mass 1/4, 1/2, 1/4 at N = -1, 0, +1.

Shifting this density by the source output gives the conditional densities of the observation:

p_n(N | H1): mass 1/4, 1/2, 1/4 at N = 0, +1, +2;
p_n(N | H0): mass 1/4, 1/2, 1/4 at N = -2, -1, 0.

[Figure: the source emits +1 under H1 or -1 under H0; the transition mechanism adds n; the observation space consists of the possible values of r.]
The sum of the source output and n is the observed variable r. The observation space has finite dimension, i.e. the observation consists of a set of N numbers and can be represented as a point in an N-dimensional space.
Under the two hypotheses we have

H1: r = 1 + n
H0: r = -1 + n

After observing the outcome in the observation space we shall guess which hypothesis is true. We use a decision rule that assigns each point to one of the hypotheses.
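The likelihood ratio for this discrete example can be tabulated directly from the noise pmf. A minimal sketch (the `Fraction`-based helper functions are illustrative, not from the original text):

```python
# Discrete ±1-source example: noise pmf p_n(-1) = p_n(+1) = 1/4, p_n(0) = 1/2;
# H1 adds +1 to the noise, H0 adds -1.
from fractions import Fraction

noise = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}

def pmf_given(h, r):
    """p(r | H_h): the noise pmf shifted by +1 under H1, by -1 under H0."""
    shift = 1 if h == 1 else -1
    return noise.get(r - shift, Fraction(0))

def likelihood_ratio(r):
    p1, p0 = pmf_given(1, r), pmf_given(0, r)
    return float('inf') if p0 == 0 else p1 / p0

for r in (-2, -1, 0, 1, 2):
    print(r, likelihood_ratio(r))
```

Observations r = +1, +2 can only occur under H1 (infinite ratio), r = -1, -2 only under H0 (zero ratio), and r = 0 is equally likely under both (ratio 1).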
Detection and estimation applications involve making inferences from observations that are distorted or corrupted in some unknown manner.
Simple binary hypothesis testing. The decision problem in which each of two source outputs corresponds to a hypothesis. Each hypothesis maps into a point in the observation space. We assume that the observation consists of a set of N observations r1, r2, ..., rN. Each set can be represented as a vector r:

r = [r1, r2, ..., rN]^T

The probabilistic transition mechanism generates points in accord with the two known conditional densities p_{r|H1}(R|H1) and p_{r|H0}(R|H0).
The objective is to use this information to develop a decision rule.
Decision criteria. In the binary hypothesis problem either H0 or H1 is true. We are seeking decision rules for making a choice. Each time the experiment is conducted one of four things can happen:
1. H0 true; choose H0 (correct)
2. H0 true; choose H1
3. H1 true; choose H1 (correct)
4. H1 true; choose H0
The purpose of a decision criterion is to attach some relative importance to the four possible courses of action.
The method for processing the received data depends on the decision criterion we select.
Bayesian criterion. The source generates the two outputs with given (a priori) probabilities P1, P0. These represent the observer's information before the experiment is conducted.
A cost is assigned to each course of action: C00, C10, C01, C11. Each time the experiment is conducted a certain cost is incurred. The decision rule is designed so that on the average the cost is as small as possible. Two probabilities are averaged over: the a priori probability and the probability that a particular course of action will be taken.
[Figure: the source drives one of the densities p_{r|H1}(R|H1), p_{r|H0}(R|H0) over the observation space Z, which is divided into a region Z0 ("say H0") and a region Z1 ("say H1").]
The expected value of the cost is

R = C00 P0 Pr(say H0 | H0 is true)
  + C10 P0 Pr(say H1 | H0 is true)
  + C11 P1 Pr(say H1 | H1 is true)
  + C01 P1 Pr(say H0 | H1 is true)
The binary decision rule divides the total observation space Z into two parts, Z0 and Z1.
Each point in the observation space is assigned to one of these sets. The risk expressed in terms of the transition probabilities and the decision regions:
R = C00 P0 ∫_{Z0} p_{r|H0}(R|H0) dR + C10 P0 ∫_{Z1} p_{r|H0}(R|H0) dR
  + C11 P1 ∫_{Z1} p_{r|H1}(R|H1) dR + C01 P1 ∫_{Z0} p_{r|H1}(R|H1) dR
Z0 and Z1 together cover the observation space (each conditional density integrates to one over Z). We assume that the cost of a wrong decision is higher than the cost of a correct decision:

C10 > C00
C01 > C11
For the Bayes test the regions Z0 and Z1 are chosen such that the risk is minimized.
We assume that a decision is made for every point in the observation space (Z = Z0 + Z1, so Z1 = Z - Z0).
Rewriting the risk with all integrals over Z0:
R = C00 P0 ∫_{Z0} p_{r|H0}(R|H0) dR + C10 P0 ∫_{Z-Z0} p_{r|H0}(R|H0) dR
  + C11 P1 ∫_{Z-Z0} p_{r|H1}(R|H1) dR + C01 P1 ∫_{Z0} p_{r|H1}(R|H1) dR
Observing that

∫_{Z} p_{r|H0}(R|H0) dR = ∫_{Z} p_{r|H1}(R|H1) dR = 1,

we obtain

R = P0 C10 + P1 C11 + ∫_{Z0} [ P1 (C01 - C11) p_{r|H1}(R|H1) - P0 (C10 - C00) p_{r|H0}(R|H0) ] dR
The integral represents the cost contributed by those points R that we assign to Z0.
The values of R where the second term is larger than the first contribute a negative amount to the integral and should be included in Z0.
The values of R where the two terms are equal have no effect. The decision regions are therefore defined by the statement: if

P1 (C01 - C11) p_{r|H1}(R|H1) ≥ P0 (C10 - C00) p_{r|H0}(R|H0),

assign R to Z1 and say that H1 is true. Otherwise assign R to Z0 and say that H0 is true.
This may be expressed as:

p_{r|H1}(R|H1) / p_{r|H0}(R|H0)  ≷_{H0}^{H1}  P0 (C10 - C00) / ( P1 (C01 - C11) )

The quantity

Λ(R) = p_{r|H1}(R|H1) / p_{r|H0}(R|H0)

is called the likelihood ratio.
Regardless of the dimension of R, Λ(R) is a one-dimensional variable. The data processing is involved in computing Λ(R) and is not affected by the a priori probabilities and cost assignments.
The quantity

η = P0 (C10 - C00) / ( P1 (C01 - C11) )

is the threshold of the test.
The threshold η can be left as a variable and may be changed if our a priori knowledge or costs change.
The Bayes criterion has led us to a likelihood ratio test (LRT):

Λ(R) ≷_{H0}^{H1} η

An equivalent test is

ln Λ(R) ≷_{H0}^{H1} ln η
Summary of the Bayesian test: the Bayesian test can be conducted simply by calculating the likelihood ratio Λ(R) and comparing it to the threshold. Test design:
- Assign a priori probabilities to the source outputs.
- Assign costs to each action.
- Assume distributions for p_{r|H1}(R|H1) and p_{r|H0}(R|H0).
- Calculate and simplify Λ(R).
[Figure: likelihood ratio processor - the data R enters a data processor that computes Λ(R); a threshold device compares Λ(R) with η and outputs the decision.]
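The processor above can be sketched in a few lines. The Gaussian densities and the cost/prior numbers below are illustrative assumptions, not part of the original text:

```python
# Minimal sketch of a Bayes likelihood-ratio "processor": compute the
# threshold eta from priors and costs, then compare Lambda(R) to it.
import math

def gauss(x, mean, sigma):
    # Gaussian density, used here as an assumed observation model
    return math.exp(-(x - mean) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def bayes_threshold(P0, P1, C00, C10, C11, C01):
    # eta = P0 (C10 - C00) / (P1 (C01 - C11))
    return P0 * (C10 - C00) / (P1 * (C01 - C11))

def lrt_decide(R, eta, p1, p0):
    # Say H1 (return 1) when Lambda(R) >= eta, else say H0 (return 0)
    return 1 if p1(R) / p0(R) >= eta else 0

eta = bayes_threshold(P0=0.5, P1=0.5, C00=0, C10=1, C11=0, C01=1)
p1 = lambda x: gauss(x, 1.0, 1.0)   # assumed density under H1: mean 1
p0 = lambda x: gauss(x, 0.0, 1.0)   # assumed density under H0: mean 0
print(eta, lrt_decide(0.9, eta, p1, p0))
```

With equal priors and 0-1 costs the threshold is 1, so the rule reduces to picking the hypothesis with the larger likelihood.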
Special case.

C00 = C11 = 0, C01 = C10 = 1.

R = P0 ∫_{Z1} p_{r|H0}(R|H0) dR + P1 ∫_{Z0} p_{r|H1}(R|H1) dR

ln Λ(R) ≷_{H0}^{H1} ln η = ln P0 - ln P1

When the two hypotheses are equally likely, the threshold ln η is zero. With these costs the risk equals the total probability of error.
Sufficient statistics.
A sufficient statistic is a function T that maps the initial data set R to a new data set T(R) that still contains all the information in R relevant to the problem under investigation.
The statistic that contains a minimal number of elements is called a minimal sufficient statistic.
When making a decision, knowing the value of the sufficient statistic is just as good as knowing R.
The integrals in the Bayes test.
Probability of false alarm:

PF = ∫_{Z1} p_{r|H0}(R|H0) dR.   We say that the target is present when it is not.

Probability of detection:

PD = ∫_{Z1} p_{r|H1}(R|H1) dR.

Probability of miss:

PM = ∫_{Z0} p_{r|H1}(R|H1) dR.   We say the target is absent when it is present.
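These three probabilities can be estimated by Monte Carlo for a simple scalar case (H1: r = m + n, H0: r = n, with n ~ N(0,1)); all numbers below are assumed for illustration:

```python
# Monte-Carlo estimates of P_F, P_D, P_M for a scalar Gaussian problem.
# gamma is an assumed decision threshold on the observation r itself.
import random

random.seed(0)
m, gamma, trials = 2.0, 1.0, 100_000
h0 = [random.gauss(0, 1) for _ in range(trials)]        # samples under H0
h1 = [m + random.gauss(0, 1) for _ in range(trials)]    # samples under H1

PF = sum(r >= gamma for r in h0) / trials   # say "present" when target absent
PD = sum(r >= gamma for r in h1) / trials   # say "present" when target present
PM = 1 - PD                                 # say "absent" when target present
print(PF, PD, PM)
```

For these numbers the exact values are PF = Q(1) ≈ 0.159 and PD = Q(-1) ≈ 0.841, which the estimates should approach.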
Special case: the a priori probabilities unknown. Minimax test.

R = C00 P0 ∫_{Z0} p_{r|H0}(R|H0) dR + C10 P0 ∫_{Z1} p_{r|H0}(R|H0) dR
  + C11 P1 ∫_{Z1} p_{r|H1}(R|H1) dR + C01 P1 ∫_{Z0} p_{r|H1}(R|H1) dR

If the regions Z0 and Z1 are fixed, the integrals are determined:

R = P0 C10 + P1 C11 + P1 (C01 - C11) PM - P0 (C10 - C00)(1 - PF),   with P0 = 1 - P1.

The Bayes risk is then a function of P1.
Substituting P0 = 1 - P1 and collecting terms:

R(P1) = C00 (1 - PF) + C10 PF
      + P1 [ (C11 - C00) + (C01 - C11) PM - (C10 - C00) PF ]
The Bayes test can be found if all the costs and a priori probabilities are known; knowing them, we can calculate the Bayes risk.
Assume that we do not know P1, assume a certain value P1*, and design the corresponding test.
If P1 changes, the regions Z0, Z1 change, and with them PF and PD.
The test is designed for P1* but the actual a priori probability is P1. By assuming P1* we fix PF and PD. The cost for different P1 is given by a function R(P1*, P1). Because the threshold is fixed, R(P1*, P1) is a linear function of P1.
The Bayes test minimizes the risk at P1 = P1*. For other values of P1, R(P1*, P1) ≥ R(P1). R(P1) is strictly concave. (If Λ(R) is a continuous random variable with a strictly monotonic probability distribution function, a change of η always changes the risk.)

[Figure: the Bayes risk R_B(P1) is a concave curve running from C00 at P1 = 0 to C11 at P1 = 1; for fixed P1* the fixed-test risk R_F is the straight line tangent to it at P1*.]
Minimax test. The Bayes test designed to minimize the maximum possible risk is called a minimax test. The unknown P1 may take the value that maximizes our risk R(P1*, P1). To minimize the maximum risk we select P1* as the value at which R(P1) is maximum.
If the maximum occurs inside the interval [0, 1], the line R(P1*, P1) becomes horizontal: the coefficient of P1 must be zero,

(C11 - C00) + (C01 - C11) PM - (C10 - C00) PF = 0.

This is the minimax equation.
[Risk curves: maximum value of R at a) P1 = 1, b) P1 = 0, c) 0 < P1 < 1.]
Special case. The cost function is

C00 = C11 = 0,  C01 = CM,  C10 = CF.

The risk is

R = CF PF (1 - P1) + CM PM P1 = CF PF + P1 (CM PM - CF PF).

The minimax equation is

CM PM = CF PF.
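For this special case the minimax equation can be solved numerically. The sketch below assumes the scalar Gaussian pair N(0,1) versus N(d,1), for which PF = Q(γ) and PM = 1 - Q(γ - d), with Q the upper-tail Gaussian integral; when CM = CF the solution is γ = d/2 by symmetry:

```python
# Solve C_M P_M = C_F P_F by bisection for a Gaussian mean-shift problem.
# d, C_M, C_F are assumed example values, not from the original text.
import math

def Q(x):
    """Upper-tail Gaussian probability (the erfc_* of the notes)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def minimax_threshold(d, CM, CF, lo=-10.0, hi=10.0):
    # f is increasing in gamma: C_M P_M(gamma) - C_F P_F(gamma)
    f = lambda g: CM * (1 - Q(g - d)) - CF * Q(g)
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

g = minimax_threshold(d=2.0, CM=1.0, CF=1.0)
print(g, Q(g), 1 - Q(g - 2.0))
```

With equal costs the root lands at γ = d/2 = 1, where false-alarm and miss probabilities are equal.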
Neyman-Pearson test. Often it is difficult to assign realistic costs or a priori probabilities.
This can be bypassed by working with the conditional probabilities PF and PD.
We have two conflicting objectives: to make PF as small as possible and PD as large as possible.
Neyman-Pearson criterion. Constrain PF = α' and design a test to maximize PD (or minimize PM) under this constraint. The solution can be obtained by using a Lagrange multiplier:

F = PM + λ (PF - α')
  = ∫_{Z0} p_{r|H1}(R|H1) dR + λ [ ∫_{Z1} p_{r|H0}(R|H0) dR - α' ]

If PF = α', minimizing F minimizes PM. Since Z1 = Z - Z0,

F = λ (1 - α') + ∫_{Z0} [ p_{r|H1}(R|H1) - λ p_{r|H0}(R|H0) ] dR

For any positive value of λ an LRT will minimize F. F is minimized by assigning a point R to Z0 only when the term in the bracket is negative.
If

p_{r|H1}(R|H1) / p_{r|H0}(R|H0) < λ,

assign the point to Z0 (say H0). Thus F is minimized by the likelihood ratio test

Λ(R) ≷_{H0}^{H1} λ
To satisfy the constraint, λ is selected so that PF = α':

PF = ∫_{λ}^{∞} p_{Λ|H0}(X|H0) dX = α'.

The value of λ will be nonnegative because p_{Λ|H0}(X|H0) is zero for negative values of X (a likelihood ratio cannot be negative).
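Threshold selection under the Neyman-Pearson constraint can be sketched for the Gaussian mean-shift case, where the test statistic is Gaussian, PF = Q(γ) and PD = Q(γ - d). The numbers (α' = 0.1, d = 2) are assumptions for illustration:

```python
# Pick the threshold gamma so that P_F = alpha', then read off P_D.
import math

def Q(x):
    # upper-tail Gaussian probability
    return 0.5 * math.erfc(x / math.sqrt(2))

def Qinv(alpha, lo=-10.0, hi=10.0):
    # Q is strictly decreasing, so bisect for Q(gamma) = alpha
    for _ in range(200):
        mid = (lo + hi) / 2
        if Q(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 0.1                 # assumed false-alarm constraint alpha'
d = 2.0                     # assumed separation between the two means
gamma = Qinv(alpha)         # threshold achieving P_F = alpha'
PD = Q(gamma - d)           # detection probability at this operating point
print(gamma, PD)
```

The constraint alone fixes the threshold; the achievable PD then follows from the separation d.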
[Figure: the conditional densities p_{l|H0}(X|H0) and p_{l|H1}(X|H1) of the test statistic along the L axis, with the threshold separating the two decision regions.]
Example. We assume that under H1 the source output is a constant voltage m. Under H0 the source output is zero. The voltage is corrupted by additive noise. The output is sampled, yielding N samples. The noise samples are i.i.d. zero-mean Gaussian random variables with variance σ².

H1: r_i = m + n_i,  i = 1, 2, ..., N
H0: r_i = n_i,      i = 1, 2, ..., N

p_{n_i}(X) = (1 / (√(2π) σ)) exp( -X² / (2σ²) )
The probability density of r_i under each hypothesis is:

p_{r_i|H1}(R_i|H1) = p_{n_i}(R_i - m) = (1 / (√(2π) σ)) exp( -(R_i - m)² / (2σ²) )

p_{r_i|H0}(R_i|H0) = p_{n_i}(R_i) = (1 / (√(2π) σ)) exp( -R_i² / (2σ²) )
The joint probability density of the N (independent) samples is:

p_{r|H1}(R|H1) = ∏_{i=1}^{N} (1 / (√(2π) σ)) exp( -(R_i - m)² / (2σ²) )

p_{r|H0}(R|H0) = ∏_{i=1}^{N} (1 / (√(2π) σ)) exp( -R_i² / (2σ²) )
The likelihood ratio is

Λ(R) = [ ∏_{i=1}^{N} (1 / (√(2π) σ)) exp( -(R_i - m)² / (2σ²) ) ] / [ ∏_{i=1}^{N} (1 / (√(2π) σ)) exp( -R_i² / (2σ²) ) ]
After cancelling common terms and taking the logarithm:

ln Λ(R) = (m / σ²) Σ_{i=1}^{N} R_i - N m² / (2σ²).
The likelihood ratio test is:

(m / σ²) Σ_{i=1}^{N} R_i - N m² / (2σ²)  ≷_{H0}^{H1}  ln η,

or, equivalently,

Σ_{i=1}^{N} R_i  ≷_{H0}^{H1}  (σ² / m) ln η + N m / 2.
We define d = √N m / σ for normalisation:

l = (1 / (√N σ)) Σ_{i=1}^{N} R_i  ≷_{H0}^{H1}  (1/d) ln η + d/2

Under H0, l is obtained by adding N independent zero-mean Gaussian variables with variance σ² and dividing by √N σ. Therefore l is N(0, 1).
Under H1, l is N(√N m / σ, 1) = N(d, 1).
PF = ∫_{(ln η)/d + d/2}^{∞} (1/√(2π)) exp(-x²/2) dx = erfc_*( (ln η)/d + d/2 ),

where erfc_* denotes the upper-tail Gaussian integral and d = √N m / σ is the distance between the means of the two densities.
PD = ∫_{(ln η)/d + d/2}^{∞} (1/√(2π)) exp( -(x - d)²/2 ) dx
   = ∫_{(ln η)/d - d/2}^{∞} (1/√(2π)) exp( -y²/2 ) dy
   = erfc_*( (ln η)/d - d/2 )
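The closed-form PF and PD can be cross-checked by Monte Carlo on the normalized statistic l; the values of N, m, σ and η below are assumptions for illustration:

```python
# Closed-form P_F = erfc_*(ln eta/d + d/2), P_D = erfc_*(ln eta/d - d/2)
# versus Monte-Carlo estimates on l = (1/(sqrt(N) sigma)) sum r_i.
import math, random

def Q(x):
    # upper-tail Gaussian integral (erfc_* of the notes)
    return 0.5 * math.erfc(x / math.sqrt(2))

N, m, sigma, eta = 4, 1.0, 2.0, 1.0          # assumed example numbers
d = math.sqrt(N) * m / sigma                  # d = sqrt(N) m / sigma
thr = math.log(eta) / d + d / 2               # threshold on l
PF = Q(thr)
PD = Q(thr - d)

random.seed(1)
trials = 200_000
def stat(mean):
    # normalized statistic l for one experiment with per-sample mean `mean`
    return sum(random.gauss(mean, sigma) for _ in range(N)) / (math.sqrt(N) * sigma)

pf_mc = sum(stat(0.0) >= thr for _ in range(trials)) / trials
pd_mc = sum(stat(m) >= thr for _ in range(trials)) / trials
print(PF, pf_mc, PD, pd_mc)
```

With η = 1 and d = 1 the operating point is PF = Q(0.5) ≈ 0.309, PD = Q(-0.5) ≈ 0.691, and the empirical fractions should agree to a few decimal places.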
In communication systems a special case is important:

Pr(ε) = P0 PF + P1 PM.

If P0 = P1 the threshold is one and Pr(ε) = (1/2)(PF + PM).
Receiver Operating Characteristics (ROC).
For a Neyman-Pearson test the values of PF and PD completely specify the test performance.
PD depends on PF. The function PD(PF) is defined as the Receiver Operating Characteristic (ROC).
The ROC completely describes the performance of the test as a function of the parameters of interest.
Example: for the Gaussian example above, sweeping the threshold η traces out an ROC PD(PF) for each value of d.
Properties of the ROC.
- All continuous likelihood ratio tests have ROCs that are concave downward.
- All continuous likelihood ratio tests have ROCs that lie above the line PD = PF.
- The slope of the ROC at a particular point is equal to the value of the threshold η required to achieve the PF and PD of that point.
- Whenever the maximum value of the Bayes risk is interior to the interval (0, 1) of the P1 axis, the minimax operating point is the intersection of the line

(C11 - C00) + (C01 - C11)(1 - PD) - (C10 - C00) PF = 0

and the appropriate curve on the ROC.
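The first two properties can be illustrated numerically for the Gaussian mean-shift ROC, PD = erfc_*( erfc_*^{-1}(PF) - d ), here written with Q in place of erfc_*; the value d = 1 is an assumption:

```python
# Check numerically that the Gaussian ROC lies above P_D = P_F and is
# concave (its increments over a uniform P_F grid are non-increasing).
import math

def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2))

def Qinv(alpha, lo=-10.0, hi=10.0):
    # bisection inverse of the decreasing function Q
    for _ in range(200):
        mid = (lo + hi) / 2
        if Q(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

d = 1.0
pfs = [i / 100 for i in range(1, 100)]
pds = [Q(Qinv(pf) - d) for pf in pfs]

above = all(pd > pf for pf, pd in zip(pfs, pds))
concave = all(pds[i + 1] - pds[i] <= pds[i] - pds[i - 1] + 1e-12
              for i in range(1, len(pds) - 1))
print(above, concave)
```

The slope property can be seen the same way: the increment ratio at each grid point approximates the threshold η achieving that operating point.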
[Figure: determination of the minimax operating point on the ROC.]
Conclusions. Using either the Bayes criterion or a Neyman-Pearson criterion, we find that the optimum test is a likelihood ratio test. Regardless of the dimension of the observation space, the optimum test consists of comparing the scalar variable Λ(R) with a threshold. For the binary hypothesis test the decision space is one-dimensional. The test can be simplified by calculating a sufficient statistic. A complete description of the LRT performance was obtained by plotting the conditional probabilities PD and PF as the threshold was varied.
M Hypotheses. We choose one of M hypotheses. There are M source outputs, each of which corresponds to one of the M hypotheses. We are forced to make a decision. There are M² alternatives that may occur each time the experiment is conducted.
[Figure: the source selects one of H0, H1, ..., H_{M-1}; the probabilistic transition mechanism generates R according to the corresponding density p_{r|H0}(R|H0), p_{r|H1}(R|H1), ..., p_{r|H_{M-1}}(R|H_{M-1}); the observation space Z is divided into regions Z0, Z1, ..., Z_i, where region Z_i means "say H_i".]
Bayes Criterion.
C_ij - cost of each course of action (choose H_i when H_j is true).
Z_i - region of the observation space in which we choose H_i.
P_i - a priori probabilities.

R = Σ_{i=0}^{M-1} Σ_{j=0}^{M-1} P_j C_ij ∫_{Z_i} p_{r|H_j}(R|H_j) dR

R is minimized through the selection of the regions Z_i.
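On a small discrete observation space the double-sum risk formula can be evaluated directly; every number below (priors, costs, conditional probabilities, regions) is an illustrative assumption:

```python
# R = sum_i sum_j P_j C_ij * Pr(R in Z_i | H_j) on a tiny discrete space.
M = 2
P = [0.4, 0.6]                           # assumed priors P_0, P_1
C = [[0.0, 1.0],                         # C[i][j]: cost of saying H_i
     [1.0, 0.0]]                         # when H_j is true (0-1 costs)
p = [[0.7, 0.2, 0.1],                    # p(R | H_0) over observations 0,1,2
     [0.1, 0.2, 0.7]]                    # p(R | H_1)
Z = {0: [0, 1], 1: [2]}                  # assumed decision regions

risk = sum(P[j] * C[i][j] * sum(p[j][r] for r in Z[i])
           for i in range(M) for j in range(M))
print(risk)
```

Only the two error terms survive the 0-1 costs here: P0·Pr(Z1|H0) + P1·Pr(Z0|H1) = 0.4·0.1 + 0.6·0.3 = 0.22.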
Example, M = 3.

Z = Z0 + Z1 + Z2

R = P0 C00 ∫_{Z0} p_{r|H0}(R|H0) dR + P0 C10 ∫_{Z1} p_{r|H0}(R|H0) dR + P0 C20 ∫_{Z2} p_{r|H0}(R|H0) dR
  + P1 C01 ∫_{Z0} p_{r|H1}(R|H1) dR + P1 C11 ∫_{Z1} p_{r|H1}(R|H1) dR + P1 C21 ∫_{Z2} p_{r|H1}(R|H1) dR
  + P2 C02 ∫_{Z0} p_{r|H2}(R|H2) dR + P2 C12 ∫_{Z1} p_{r|H2}(R|H2) dR + P2 C22 ∫_{Z2} p_{r|H2}(R|H2) dR
Using the fact that each density integrates to one over Z = Z0 + Z1 + Z2, this reduces to

R = P0 C00 + P1 C11 + P2 C22
  + ∫_{Z0} [ P2 (C02 - C22) p_{r|H2}(R|H2) + P1 (C01 - C11) p_{r|H1}(R|H1) ] dR
  + ∫_{Z1} [ P0 (C10 - C00) p_{r|H0}(R|H0) + P2 (C12 - C22) p_{r|H2}(R|H2) ] dR
  + ∫_{Z2} [ P0 (C20 - C00) p_{r|H0}(R|H0) + P1 (C21 - C11) p_{r|H1}(R|H1) ] dR
R is minimized if we assign each R to the region in which the value of the integrand is the smallest.
Label the integrands of the three region integrals I0(R), I1(R), I2(R).
Define the likelihood ratios Λ1(R) = p_{r|H1}(R|H1) / p_{r|H0}(R|H0) and Λ2(R) = p_{r|H2}(R|H2) / p_{r|H0}(R|H0). Comparing the integrands pairwise gives the rules:

P1 (C01 - C11) Λ1(R)  ≷_{H0 or H2}^{H1 or H2}  P0 (C10 - C00) + P2 (C12 - C02) Λ2(R)

P2 (C02 - C22) Λ2(R)  ≷_{H0 or H1}^{H2 or H1}  P0 (C20 - C00) + P1 (C21 - C01) Λ1(R)

P2 (C12 - C22) Λ2(R)  ≷_{H1 or H0}^{H2 or H0}  P0 (C20 - C10) + P1 (C21 - C11) Λ1(R)

M hypotheses always lead to a decision space that has, at most, M - 1 dimensions.
[Figure: decision space in the (Λ1(R), Λ2(R)) plane, divided into regions labelled H0, H1, H2.]
Special case (common in communication systems):

C00 = C11 = C22 = 0,  C_ij = 1 for i ≠ j.

The rules reduce to:

P1 p_{r|H1}(R|H1)  ≷_{H0 or H2}^{H1 or H2}  P0 p_{r|H0}(R|H0)

P2 p_{r|H2}(R|H2)  ≷_{H0 or H1}^{H2 or H1}  P0 p_{r|H0}(R|H0)

P2 p_{r|H2}(R|H2)  ≷_{H1 or H0}^{H2 or H0}  P1 p_{r|H1}(R|H1)
Equivalently: compute the a posteriori probabilities (proportional to P_i p_{r|H_i}(R|H_i)) and choose the largest - a maximum a posteriori (MAP) probability computer.
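A sketch of such a MAP computer for M = 3, with equal priors and Gaussian likelihoods (the means are assumed for illustration):

```python
# MAP decision for M = 3 with C_ii = 0, C_ij = 1: choose the i that
# maximizes P_i p(R | H_i). Means/priors below are assumed example values.
import math

def gauss(x, mean, sigma=1.0):
    return math.exp(-(x - mean) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

priors = [1 / 3, 1 / 3, 1 / 3]          # P_0, P_1, P_2
means = [-1.0, 0.0, 1.0]                # assumed means under H_0, H_1, H_2

def map_decide(R):
    posts = [P * gauss(R, mu) for P, mu in zip(priors, means)]
    return max(range(3), key=lambda i: posts[i])

print([map_decide(r) for r in (-2.0, 0.1, 3.0)])
```

With equal priors the MAP rule degenerates to choosing the hypothesis whose mean is nearest to the observation.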
Special case: degeneration of hypotheses. What happens if we combine H1 and H2?

C12 = C21 = 0.

For simplicity,

C01 = C10 = C20 = C02,
C00 = C11 = C22 = 0.

The first two equations of the test reduce to

P1 Λ1(R) + P2 Λ2(R)  ≷_{H0}^{H1 or H2}  P0

[Figure: the (Λ1, Λ2) plane divided by the line P1 Λ1 + P2 Λ2 = P0 into a region "H1 or H2" and a region H0.]
Dummy hypothesis. The actual problem has two hypotheses, H1 and H2. We introduce a new one, H0, with a priori probability P0 = 0. Let P1 + P2 = 1 and C12 = C02, C21 = C01. We always choose H1 or H2. The test reduces to:

P2 (C12 - C22) Λ2(R)  ≷_{H1}^{H2}  P1 (C21 - C11) Λ1(R)

This is useful if

p_{r|H2}(R|H2) / p_{r|H1}(R|H1)

is difficult to work with but Λ1(R) and Λ2(R) are simple.
Conclusions.
1. The minimum dimension of the decision space is no more than M - 1. The boundaries of the decision regions are hyperplanes in the (Λ1, ..., Λ_{M-1}) plane.
2. The test is simple to find, but the error probabilities are often difficult to compute.
3. An important test is the minimum total probability of error test. Here we compute the a posteriori probability of each hypothesis and choose the largest.