
ECE531 Lecture 3: Minimax Hypothesis Testing

D. Richard Brown III

Worcester Polytechnic Institute

05-February-2009


Simple Binary Bayesian Risks Under Different Priors

r(δ, π0) = π0R0(δ) + (1 − π0)R1(δ)

[Figure: Bayes risk r(δ, π0) versus the prior π0. The minimum Bayes risk curve r(δBπ0, π0) is concave; the risk line r(δBπ′0, π0) of the Bayes rule designed for a fixed prior π′0 is a straight line tangent to that curve at π0 = π′0, running from R1(δBπ′0) at π0 = 0 to R0(δBπ′0) at π0 = 1. The endpoints R1(δ) and R0(δ) of a generic rule's risk line r(δ, π0) are also marked.]


Least Favorable Prior State Distribution

[Figure: the minimum Bayes risk curve r(δBπ0, π0) versus the prior, with its maximizer, the least favorable prior πlf. The Bayes rule for πlf has a constant risk line r(δBπlf, π0) with R0(δBπlf) = R1(δBπlf), while the Bayes rule for another prior π′0 has a sloped risk line running from R1(δBπ′0) to R0(δBπ′0).]
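Since the plots on these two slides do not survive text extraction, here is a small matplotlib sketch (my own illustration, with made-up conditional risks rather than anything from the lecture) of the general picture: each rule's Bayes risk r(δ, π0) is a straight line in π0, the minimum Bayes risk V(π0) is their concave lower envelope, and the least favorable prior πlf is where that envelope peaks.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up conditional risk pairs (R0, R1) for a handful of decision rules.
risks = [(1.0, 0.0), (0.0, 1.0), (0.55, 0.15), (0.2, 0.7)]

pi0 = np.linspace(0.0, 1.0, 501)
# Bayes risk line of each rule: r(delta, pi0) = pi0*R0 + (1 - pi0)*R1
lines = np.array([pi0 * R0 + (1.0 - pi0) * R1 for (R0, R1) in risks])
V = lines.min(axis=0)              # minimum Bayes risk V(pi0): concave lower envelope
pi_lf = pi0[V.argmax()]            # least favorable prior: maximizer of V(pi0)

for (R0, R1), line in zip(risks, lines):
    plt.plot(pi0, line, "--", lw=1, label=f"rule with R0={R0}, R1={R1}")
plt.plot(pi0, V, "k", lw=2, label="V(pi0)")
plt.axvline(pi_lf, color="gray", ls=":", label=f"pi_lf ~ {pi_lf:.2f}")
plt.xlabel("prior pi0"); plt.ylabel("Bayes risk"); plt.legend(); plt.show()
```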


Minimax Hypothesis Testing

Definition:

ρmm := arg min_ρ max_j Rj(ρ)

Remarks:

◮ No single decision rule minimizes the weighted-average (Bayes) risk for every possible prior state distribution.

◮ A conservative approach is to minimize the worst-case risk over all possible prior state distributions.

◮ Intuitively, there should be a least favorable prior. Does it always exist? Is it unique?

◮ Intuitively, the minimax decision rule should be the Bayesian decision rule with constant Bayesian risk over the priors. Is this always true?


Minimum Bayesian Risk as a Function of the Prior

Let V (π) := r(δBπ, π) be the minimum Bayesian risk for the prior π.

Theorem

The minimum Bayesian risk V(π) is concave and continuous over the space of priors satisfying πj ≥ 0, j = 0, 1, . . . , N − 1, and ∑j πj = 1. Hence, there exists a unique least favorable prior

πlf = arg max_π V(π).


Concavity of the Minimum Bayesian Risk

A function f is concave if, for any {x, y} in the domain of f and any α ∈ [0, 1], f(αx + (1 − α)y) ≥ αf(x) + (1 − α)f(y).

Denote a pair of priors as π and π′ and a third prior π′′ = απ + (1 − α)π′. We can write

V(π′′) = π′′⊤R(δBπ′′)
       = απ⊤R(δBπ′′) + (1 − α)π′⊤R(δBπ′′)
       ≥ αV(π) + (1 − α)V(π′)

where the inequality holds because δBπ′′ is only one of the rules over which V(π) = min_δ π⊤R(δ) and V(π′) are minimized. Hence V(π) is concave.

[Figure: a concave minimum Bayes risk curve V over π0 ∈ [0, 1], with the chord between (π0, V(π0)) and (π′0, V(π′0)) lying below the curve at the intermediate prior π′′0.]


Continuity of the Minimum Bayesian Risk

Theorem (“A First Course in Optimization Theory” by R.K. Sundaram)

Let f : D → R be a concave function. Then, if D is open, f is continuous on D. If D is not open, f is continuous on the interior of D.

Note that continuity does not imply differentiability.


The Four Possibilities

[Figure: four sketches of the minimum Bayesian risk V(π0) versus the prior π0, one per case, each marking the conditional risks R0(δBπlf) and R1(δBπlf): a differentiable interior maximum (Case 1), a non-differentiable interior maximum (Case 2), and maxima at the boundaries π0lf = 1 and π0lf = 0 (Cases 3 and 4).]


Case 1: Differentiable Interior Maximum Risk

Theorem

If there exists a prior π′ such that the conditional risks satisfy R0(δBπ′) = R1(δBπ′), then π′ is a least favorable prior and the minimax decision rule is ρmm = δBπ′.

Proof.

Given a π′ satisfying R0(δBπ′) = R1(δBπ′). For any δ,

max{R0(δ), R1(δ)} ≥ max_{π0 ∈ [0,1]} π0R0(δ) + (1 − π0)R1(δ)
                  ≥ π′R0(δ) + (1 − π′)R1(δ)
                  ≥ π′R0(δBπ′) + (1 − π′)R1(δBπ′)
                  = R0(δBπ′) = R1(δBπ′)

so no rule has a smaller maximum risk than δBπ′, i.e., ρmm = δBπ′. Moreover, for any π0 ∈ [0, 1],

V(π′) = π′R0(δBπ′) + (1 − π′)R1(δBπ′) = π0R0(δBπ′) + (1 − π0)R1(δBπ′) ≥ V(π0)

so π′ is a least favorable prior.


A Procedure for Finding the Minimax Decision Rule

1. Find a Bayesian decision rule δBπ as a function of the prior π.

2. See if Case 1 holds by solving for the unique least favorable prior πlf using the equalizer rule:

   R0(δBπlf) = R1(δBπlf)

3. If the solution exists, then set

   ρmm = δBπlf

4. If there is no solution to the equalizer rule, then see if Case 3 or 4 holds by computing the risk at the endpoints π0lf = 0 and π0lf = 1.

5. If neither endpoint is least favorable, then we must be in Case 2. In this case we must create a randomized minimax decision rule as a convex combination of two deterministic Bayes decision rules (see the numerical sketch below).
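When the observation set is finite, the deterministic decision rules can be enumerated and the whole procedure can be checked numerically. The sketch below is my own (the function name, the grid search over priors, and the tie-breaking at the endpoints are assumptions, not anything prescribed by the lecture); it takes as input the conditional-risk pairs (R0, R1) of the deterministic rules, and the example call uses the risk table of the coin-flipping problem revisited later in these slides.

```python
import numpy as np

def minimax_simple_binary(risks, grid=10001, tol=1e-9):
    """Numerical sketch of steps 1-5 for a simple binary problem with a
    finite set of deterministic decision rules, given as (R0, R1) pairs."""
    risks = np.asarray(risks, dtype=float)               # shape (K, 2)
    pi0 = np.linspace(0.0, 1.0, grid)
    # Bayes risk of rule k at prior pi0: pi0*R0_k + (1 - pi0)*R1_k
    bayes = np.outer(pi0, risks[:, 0]) + np.outer(1.0 - pi0, risks[:, 1])
    V = bayes.min(axis=1)                                 # minimum Bayes risk V(pi0)
    i_lf = int(V.argmax())                                # least favorable prior (grid index)

    # Step 2: does a Bayes rule at pi_lf satisfy the equalizer rule R0 = R1? (Case 1)
    best = int(bayes[i_lf].argmin())
    if abs(risks[best, 0] - risks[best, 1]) < tol:
        return pi0[i_lf], ("pure", best, float(risks[best, 0]))

    # Step 4: maximum of V at an endpoint (Cases 3 and 4); break ties among the
    # Bayes rules at the endpoint toward the one with the smallest worst-case risk.
    if i_lf in (0, grid - 1):
        ties = np.flatnonzero(bayes[i_lf] <= V[i_lf] + tol)
        best = int(ties[np.argmin(risks[ties].max(axis=1))])
        return pi0[i_lf], ("pure", best, float(risks[best].max()))

    # Step 5: Case 2 -- randomize between the Bayes rules just below/above pi_lf.
    k_minus = int(bayes[i_lf - 1].argmin())
    k_plus = int(bayes[i_lf + 1].argmin())
    Rm, Rp = risks[k_minus], risks[k_plus]
    alpha = (Rp[1] - Rp[0]) / ((Rm[0] - Rm[1]) - (Rp[0] - Rp[1]))
    worst = alpha * Rm[0] + (1.0 - alpha) * Rp[0]         # equalized (worst-case) risk
    return pi0[i_lf], ("randomized", (k_minus, k_plus), alpha, worst)

# Risk pairs of rules 1-4 for the original coin-flipping problem (later slides):
print(minimax_simple_binary([(100, 0), (0, 100), (50, 0), (50, 100)]))
# -> pi_lf ~ 2/3, randomize rules 3 and 2 (0-based indices 2 and 1) with alpha = 2/3,
#    worst-case risk 100/3
```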


Example: Coherent Detection of BPSK

Our Bayes decision rule for coherent BPSK with prior π0, π1 = 1 − π0 is

δBπ(y) = 1    if y > γ
         0/1  if y = γ
         0    if y < γ

where γ := (a0 + a1)/2 + (σ²/(a1 − a0)) ln(π0/π1).

The conditional risks are

R0(δBπ) = Q((γ − a0)/σ)        R1(δBπ) = Q((a1 − γ)/σ)

where Q(x) := ∫_x^∞ (1/√(2π)) e^{−t²/2} dt.

Let’s try the equalizer rule. What value of γ gives us R0(δBπ) = R1(δBπ)?


Example: Coherent Detection of BPSK

Answer: R0 = R1 when γ = (a0 + a1)/2. Hence

ρmm(y) = 1    if y > (a0 + a1)/2
         0/1  if y = (a0 + a1)/2
         0    if y < (a0 + a1)/2

[Figure: the observation axis with the decision regions Y0 and Y1 separated by the threshold γ = (a0 + a1)/2 midway between a0 and a1.]

What does this imply about the least favorable prior?
Answer: π0 = π1 = 1/2.

Given a0, a1, and σ, the minimax rule allows you to guarantee a worst-case risk over all priors.
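A quick numerical check of this example (my own sketch, assuming uniform error costs and using scipy's norm.sf as the Q-function; the values of a0, a1, and σ below are arbitrary illustrations):

```python
import numpy as np
from scipy.stats import norm

def bpsk_bayes_risks(a0, a1, sigma, pi0):
    """Threshold and conditional risks of the Bayes rule for coherent BPSK."""
    pi1 = 1.0 - pi0
    # gamma = (a0 + a1)/2 + sigma^2/(a1 - a0) * ln(pi0/pi1)   (from the slide)
    gamma = (a0 + a1) / 2.0 + sigma**2 / (a1 - a0) * np.log(pi0 / pi1)
    Q = norm.sf                                   # Q(x) = Pr(N(0,1) > x)
    R0 = Q((gamma - a0) / sigma)                  # decide 1 when H0 is true
    R1 = Q((a1 - gamma) / sigma)                  # decide 0 when H1 is true
    return gamma, R0, R1

a0, a1, sigma = -1.0, 1.0, 0.7                    # illustrative values only
gamma, R0, R1 = bpsk_bayes_risks(a0, a1, sigma, pi0=0.5)
print(gamma, R0, R1)   # gamma = (a0 + a1)/2 and R0 == R1 = Q((a1 - a0)/(2*sigma))
```

For any prior other than 1/2 the two conditional risks separate, and the larger of the two exceeds the equalized value Q((a1 − a0)/(2σ)).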


Example: Coherent Detection of BPSK

[Figure: Bayes risks versus the prior π0 for the BPSK example. The minimax rule δBπlf (πlf = 1/2) has a constant risk line r(δBπlf, π0) with R0(δBπlf) = R1(δBπlf), tangent to the minimum Bayes risk curve r(δBπ0, π0) at its maximum, while the Bayes rule for a different prior π′0 has a sloped risk line from R1(δBπ′0) to R0(δBπ′0).]


Cases 3-4: Maximum Risk Occurs at Boundary

Let’s return to our coin flipping problem from Lecture 1 (H0 ↔ x0 = HT and H1 ↔ x1 = HH) with a modified cost matrix

C = [  0  100
     100   60 ]

[Figure: Bayes risk versus the prior π0 for decision rules 1-4 and their minimum (the minimum Bayes risk curve), with risks ranging from 0 to 100.]


Cases 3-4: Maximum Risk Occurs at Boundary

Remarks:

◮ Rules 2 and 3, depending on the prior, minimize the Bayes risk.

◮ Note that the equalizer rule would give no solution to this problem:

R0(D1) = 100 and R1(D1) = 60

R0(D2) = 0 and R1(D2) = 100

R0(D3) = 50 and R1(D3) = 60

R0(D4) = 50 and R1(D4) = 100

No decision rule gives R0 = R1.

◮ In this example, the least favorable prior (maximizing the minimum risk) is π0 = 0, or that the coin is always HH. This should make sense.

◮ The minimax decision rule is Rule 3: observe T, decide the coin is fair; observe H, decide the coin is unfair.

◮ You can guarantee a worst-case risk of $60 by using Rule 3.
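The risk table above is easy to reproduce. The following sketch is my own; the likelihood matrix P and the rule-matrix convention (rows indexed by the decision, columns by the observation) follow the coin-flipping setup and the D2/D3 matrices written out near the end of the lecture.

```python
import numpy as np

# Coin-flipping setup: states x0 = HT, x1 = HH; observations y0 = T, y1 = H.
# Likelihood matrix P[y, j] = Pr(observe y | state x_j).
P = np.array([[0.5, 0.0],      # Pr(T | HT), Pr(T | HH)
              [0.5, 1.0]])     # Pr(H | HT), Pr(H | HH)

# Modified cost matrix from this slide: C[i, j] = cost of deciding i when x_j is true.
C = np.array([[0, 100],
              [100, 60]])

# The four deterministic rules, D[i, y] = 1 if we decide i upon observing y
# (numbering chosen to reproduce the risk table on this slide).
rules = {1: np.array([[0, 0], [1, 1]]),    # always decide HH
         2: np.array([[1, 1], [0, 0]]),    # always decide HT
         3: np.array([[1, 0], [0, 1]]),    # T -> HT, H -> HH
         4: np.array([[0, 1], [1, 0]])}    # T -> HH, H -> HT

for k, D in rules.items():
    # R_j(D) = sum over y, i of C[i, j] * D[i, y] * P[y, j]
    R = np.einsum('ij,iy,yj->j', C, D, P)
    print(f"rule {k}: R0 = {R[0]:.0f}, R1 = {R[1]:.0f}, worst case = {R.max():.0f}")
# Rule 3 attains the smallest worst-case risk (60), matching the slide.
```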


Case 2: Non-Differentiable Interior Maximum

Back to our original coin flipping problem with cost matrix

C = [  0  100
     100    0 ]

[Figure: Bayes risk versus the prior π0 for decision rules 1-4 and their minimum (the minimum Bayes risk curve), with risks ranging from 0 to 100.]


Case 2: Non-Differentiable Interior Maximum

Remarks:

◮ Rules 2 and 3, depending on the prior, minimize the Bayes risk.

◮ Again, the equalizer rule gives no deterministic solution since

R0(D1) = 100 and R1(D1) = 0

R0(D2) = 0 and R1(D2) = 100

R0(D3) = 50 and R1(D3) = 0

R0(D4) = 50 and R1(D4) = 100

◮ In this example, the least favorable prior (maximizing the minimum risk) is π0 = 2/3, or that the coin is HT with probability 2/3.

◮ The minimax decision rule is neither Rule 2 nor Rule 3.

◮ You can guarantee a worst-case risk of $100/3 by using a randomized decision rule that is a combination of Rules 2 and 3.


Case 2: Non-Differentiable Interior Maximum

Problem: Find a randomized decision rule that satisfies the equalizer rule

[Figure: Bayes risks of rules 1-4 versus the prior, with the minimum Bayes risk curve V(π0) peaking at a non-differentiable point πlf. The Bayes rules δBπ−lf and δBπ+lf (optimal just below and just above πlf) have sloped risk lines with endpoints R0(δBπ−lf), R1(δBπ−lf) and R0(δBπ+lf), R1(δBπ+lf); the randomized rule ρmm has constant risk r(ρmm) = V(πlf).]


Case 2: Non-Differentiable Interior Maximum

Our randomized minimax decision rule is

ρmm = α δBπ−lf + (1 − α) δBπ+lf

We can calculate the randomization α ∈ [0, 1] by applying the equalizer rule:

αR0(δBπ−lf) + (1 − α)R0(δBπ+lf) = αR1(δBπ−lf) + (1 − α)R1(δBπ+lf)

which gives the solution

α = [R1(δBπ+lf) − R0(δBπ+lf)] / {[R0(δBπ−lf) − R1(δBπ−lf)] − [R0(δBπ+lf) − R1(δBπ+lf)]}
  = V′(π+lf) / [V′(π+lf) − V′(π−lf)]
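As a sanity check of this formula (a minimal sketch of my own), plugging in the conditional risks of the two Bayes rules from the coin-flipping example reproduces the randomization used on the next slide:

```python
def equalizer_alpha(Rm, Rp):
    """Weight on delta^{B pi_lf^-}; Rm, Rp are the (R0, R1) pairs of the Bayes
    rules just below and just above the least favorable prior."""
    return (Rp[1] - Rp[0]) / ((Rm[0] - Rm[1]) - (Rp[0] - Rp[1]))

alpha = equalizer_alpha(Rm=(50, 0), Rp=(0, 100))   # rule 3 and rule 2 from the coin example
print(alpha)                                       # 2/3; equalized risk = alpha*50 = 100/3
```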


Case 2: Non-Differentiable Interior Maximum

What is the minimax decision rule in our example?

V′(π+lf) = −100
V′(π−lf) = 50

hence α = 2/3. If H0 is the hypothesis that the coin is HT, H1 is the hypothesis that the coin is HH, and the observations are y0 = T, y1 = H, then our deterministic decision rules 2 and 3 can be written as

D3 = δBπ−lf = [1 0; 0 1]   and   D2 = δBπ+lf = [1 1; 0 0]

(rows indexed by the decision, columns by the observation). The minimax decision rule is then given by

ρmm(y = T) = (2/3)[1; 0] + (1/3)[1; 0] = [1; 0]          (T → always decide HT)

ρmm(y = H) = (2/3)[0; 1] + (1/3)[1; 0] = [1/3; 2/3]      (H → randomize: Pr(decide HT) = 1/3, Pr(decide HH) = 2/3)
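A final numeric check (my own sketch, reusing the likelihood matrix of the coin-flipping setup) confirms that this randomized rule really does equalize the conditional risks at 100/3:

```python
import numpy as np

P = np.array([[0.5, 0.0],            # Pr(T | HT), Pr(T | HH)
              [0.5, 1.0]])           # Pr(H | HT), Pr(H | HH)
C = np.array([[0, 100],              # original cost matrix
              [100, 0]])
D3 = np.array([[1, 0], [0, 1]])      # delta^{B pi_lf^-}
D2 = np.array([[1, 1], [0, 0]])      # delta^{B pi_lf^+}

rho_mm = (2/3) * D3 + (1/3) * D2     # randomized minimax rule, columns indexed by y
print(rho_mm)                        # columns: [1, 0] for y = T, [1/3, 2/3] for y = H

# Conditional risks R_j(rho) = sum over y, i of C[i, j] * rho[i, y] * P[y, j]
R = np.einsum('ij,iy,yj->j', C, rho_mm, P)
print(R)                             # both equal 100/3: the equalizer rule is satisfied
```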


Final Remarks on Minimax Hypothesis Testing

1. The objective of minimax hypothesis testing is to minimize your worst-case (maximum) risk over all possible prior state probabilities.

2. Conservative approach, but useful in scenarios when:
   ◮ the prior is unknown, and/or
   ◮ you need to provide a maximum risk guarantee.

3. Try the equalizer rule first!

4. Minimax risk at the endpoints only occurs in weird cases.

5. A finite observation space Y implies that the minimum Bayes risk curve V is not going to be differentiable everywhere. Randomization is often necessary to obtain the minimax decision rule in these cases.

6. Composite hypotheses:
   ◮ The equalizer rule is still valid.
   ◮ Checking “endpoints” is still valid (but there are more points to check).
   ◮ In the case of a non-differentiable interior maximum, finding the randomization can be difficult.
