Gregory Gurevich and Albert Vexler The Department of Industrial Engineering and Management, SCE- Shamoon College of Engineering, Beer-Sheva 84100, Israel

Gregory Gurevich and Albert Vexler 1 2

The Department of Industrial Engineering and Management, SCE- Shamoon College of Engineering, Beer-Sheva 84100, Israel

Department of Biostatistics, The New York State University at Buffalo, Buffalo, NY 14214, USA

1

2

כנס האיכות הרביעי בגליל "איכות – הלכה ומעשה"2011 במאי, 12

On Optimality of the Shiryaev-Roberts Procedure for

Retrospective and Sequential Change Point Detections

This paper demonstrates that criteria for which a given test is optimal can be declared by

the structure of this test, and hence almost any reasonable test is optimal. In order to establish

this conclusion, the principle idea of the fundamental lemma of Neyman and Pearson is

applied to interpret the goodness of tests, as well as retrospective and sequential change point

detections are considered in the context of the proposed technique. We demonstrate that the

Shiryaev-Roberts approach applied to the retrospective change point detection yields the

averagely most powerful procedure. We also prove non-asymptotic optimality properties for

the Shiryayev-Roberts sequential procedure.

1 .INTRODUCTION

Without loss of generality, we can say that the principle idea of the proof of the fundamental

lemma of Neyman and Pearson is based on the trivial inequality

0 BAIBA (1.1 )

for all A , B and ]1,0[ ( I is the indicator function(. Thus, for example ,

if we would like to classify i.i.d. observations nXXX ,...,, 21 regarding the hypotheses:

10 X :H is from a density function 0f

versus

11 X :H is from a density function 1f ,

then the likelihood ratio test )i.e., we should reject 0H if CXfXfn

i ii 1 01 /

where C is a threshold( is uniformly most powerful. This classical proposition directly follows

from the expected value under 0H of )1.1( with n

i ii XfXfA1 01 / ,

CB and that is considered as any decision rule based on the observed sample .

CXf

XfEC

Xf

XfIC

Xf

XfE

n

i i

iH

n

i i

in

i i

iH

1 0

1

1 0

1

1 0

100 ,

n

n

ii

n

i i

in

i i

i dxdxxfCxf

xfI

xf

xf...... 1

10

1 0

1

1 0

1

n

n

ii

n

i i

i dxdxxfCxf

xfIC ...... 1

10

1 0

1

n

n

ii

n

i i

i dxdxxfxf

xf...... 1

10

1 0

1

n

n

ii dxdxxfC ...... 1

10 .

Therefore

11

01011 0

1

1 0

1

HH

n

i i

iH

n

i i

iH CPPC

Xf

XfCPC

Xf

XfP .

However, in the case when we have a given test-statistic nS instead of the likelihood ratio

n

i ii XfXf1 01 / , the inequality )1.1( yields

CSCSICS nnn ,

for any decision rule ]1,0[ . Therefore, the test-rule CSn has also an optimal

property following the application of )1.1(. The problem is to obtain a reasonable interpretation

of the optimality.

The present paper proposes that the structure of any given retrospective or sequential test

can substantiate type-)1.1( inequalities that provide optimal properties of that test. Although

this approach can be applied easily to demonstrate an optimality of tests, presentation of that

optimality in the classical operating characteristics )e.g., the power/significance level of tests(

is the complicated issue.

In order to represent the proposed methodology, we consider different kinds of hypothesis

testing. Firstly, we consider the retrospective change point problem. We demonstrate that the

Shiryaev-Roberts approach applied to the change point detection yields the average most

powerful procedure. Secondly, we consider the sequential testing. Here the focus on the

inequality )1.1( leads us to a non-asymptotic optimality of a well known sequential

procedure, for which only an asymptotic result of optimality has been proved in the literature .

2 .RETROSPECTIVE CHANGE POINT DETECTION

In this section we will focus on a sequence of previously obtained independent observations

in an attempt to detect whether they all share the same distribution. Note that, commonly, the

change point problem is evaluated under assumption that if a change in the distribution did

occur, then it is unique and the observations after the change all have the same distribution ,

which differs from the distribution of the observations before the change.

Let nXX ,...,1 be independent observations with density functions ngg ,...,1 ,

respectively. We want to test the null hypothesis

0i0 g : fH for all ni ,...,1

versus the alternative

10111 ......g : fggfgH n , is unknown.

The maximum likelihood estimation of the change point parameter applied to the

likelihood ratio

n

i ii XfXf 01 / leads us to the well accepted in the change point

literature CUSUM test: we should reject 0H if

CXf

Xfn

ki i

i

nk

0

1

1max ,

where 0C is a threshold .

Alternatively, following Gurevich and Vexler )2010(, the test based on the Shiryaev-Roberts

approach has the following form: we reject 0H if

CXf

Xf

nR

n

n

k

n

ki i

in

1 0

111 . (2.1 )

Consider an optimal meaning of the test )2.1(. Let kP and kE ( nk ,...,0 ) respectively

denote probability and expectation conditional on k (the case 0k corresponds to

0H .)By virtue of )1.1( with nRA n / and CB ,it follows that

CR

nCR

nICR

n nnn111

. (2.2 )

Without loss of generality, we assume that 1,0 is any decision rule based on the observed

sample nXX ,...,1 such that if 1 , then we reject 0H .Since

n

k

n

ki i

in Xf

XfE

nRE

n 1 0

100

11

n

k

n

ki

n

i

n

iii

i

i dxxfxf

xf

n 1 1 10

0

1...1

n

k

k

i

n

ii

n

kiii dxxfxfI

n 1

1

1 1101...

1

n

kkP

n 1

11

derivation of 0H -expectation of the left and right side of )2.2( directly provides the following

proposition.

Proposition 2.1 The test )2.1( is the averagely most powerful test, i.e.

n

kk

n

knnk CPP

nCR

nCPCR

nP

n 10

10 11

1111 ,

for any decision rule ]1,0[ based on the observations nXXX ,...,, 21 .

3 .SEQUENTIAL SHIRYAEV-ROBERTS CHANGE POINT DETECTION

In this section, we suppose that independent observations ,..., 21 XX are surveyed

sequentially. Let 121 ,...,, XXX be each distributed according to a density function 0f ,

whereas ,..., 1 XX have a density function 1f , where 1 is unknown .

The case corresponds to the situation when all observations are distributed according

to 0f .The formulation of sequential change point detection conforms to raising an alarm as

soon as possible after the change and to avoid false alarms. Efficient detection methods for this

stated problem are based on CUSUM and Shiryaev-Roberts stopping rules. The CUSUM

policy is: we stop sampling of X -s and report that a change in distribution of X has been

detected at the first time 1n that

CXfXfn

ki iink

01

1/max ,

for a given threshold C . The Shiryaev-Roberts procedure is defined by the stopping time

CnNC nR : 1inf ,(3.1 )

where the Shiryayev-Roberts test-statistic nR is

n

k

n

ki i

in Xf

XfR

1 0

1 (3.2) .

The sequential CUSUM detection procedure has a non-asymptotic optimal property

Moustakides, 1986(. At that, for the Shiryaev-Roberts procedure an asymptotic )as C )

optimality has been shown )Pollak, 1985(. )Although Yakir )1997( attempted to prove a non -

asymptotic optimality of the Shiryaev-Roberts procedure, Mei )2006( has pointed out the

inaccuracy of the Yakir's proofs.( In order to demonstrate optimality of the Shiryaev-Roberts

detection scheme, Pollak )1985( has proved an asymptotic closeness of the expected loss

using a Bayes rule for the considered change problem with a known prior distribution of

to that using the rule CN .However, in the context of simple application of the proposed

methodology, the procedure )3.1( itself declares loss functions for which that detection policy

is optimal. That is, setting

nNCRA ,min and CB in )1.1( leads to

0,min,min CRICR nNnN CC , (3.3)

for all ]1,0[ .Since nNCR CnNC,min ,

CRICR nNnN CC ,min,min(3.4)

011

nNICRkNICR Cn

n

kCk .

It is clear that )3.4( can be utilized to present an optimal property of the detection rule CN .

However, here, for simplicity, noting that every summand in the left side of the inequality

(3.4 )is non-negative, we focus only on

0 nNICR Cn .

Thus, if is a stopping time )assuming that for all k ,the event k is measurable

in the -algebra generated by kXX ,...,1

)and is defined by nI

then

0N , C nnIRCE n . (3.5)

By virtue of the definition )3.2(, we obtain from )3.5( that

nNPnNPC CC ,min

0,min1

n

kCkCk nNPnNP .

(3.6)

Therefore,

1

,minn

CC nNPnNPC

0,min1 1

n

n

kCkCk nNPnNP ,

where

(3.7)

1 1

,minn

n

kCkCk nNPnNP

1

,mink kn

CkCk nNPnNP

1

1,min1k

CkCk kNEkNE ,

(3.8)

0 aaIa .Let us summarize )3.7( with )3.8( in the next proposition.

Proposition 3.1 The Shiryaev-Roberts stopping time )3.1( satisfies

Cn

Cn NCEnNE

1

1

Cn

Cn NCEnNE ,min1,minmin1

.

Note that , E corresponds to the average run length to false alarm of a stopping rule

,and hence small values of E are privileged, whereas small values of

1nEn are also preferable )because 1nEn relates to fallibility

of the sequential detection in the case n .)Obviously, if then

CN,min detects that faster than the stopping time CN .

4 .REMARK AND CONCLUSION

In the context of statistical testing, when distribution functions of observations are known up

to parameters, one of the approaches dealing with estimation of unknown parameters is the

mixture technique. To be specific, consider the change point problem introduced in Section 2 .

Assume that the density function ;11 ufuf ,where is the unknown

parameter. In this case, we can transform the test )2.1( in accordance with the mixture

methodology )e.g. Krieger et al., 2003(. That is, we have to choose a prior and pretend that

~ .Thus, the mixture Shiryaev-Roberts statistic has form

n

k

n

ki i

in ud

uXf

uXf

nR

1 0

1(1) ();

;1

and hence the consequential transformation of the test )2.1( has the following property

CRn

CPuduXXCRn

Pn n

n

kikjjnk

(1)0

11

(1) 1();f from are

11

();f from are H ofrejection declares 1

110 uduXXP

n

n

kikjjk

00 H ofrejection declares CP ,

for any decision rule ]1,0[ based on the observations nXX ,...,1 .In this case the

optimality formulated in Proposition 2.1 is integrated over values of the unknown parameter .

Finally, note that for the case where one wishes to investigate the optimality of a given test

in an unfixed context, inequalities similar to )1.1( can be obtained by focusing on the

structure of a test. These inequalities report an optimal property of the test. However ,

translation of that property in terms of the quality of tests is the issue.

QuestionsQuestions??

Thank youThank you!!

Documents

Gregory Gurevich and Albert Vexler The Department of Industrial Engineering and Management, SCE- Shamoon College of Engineering, Beer-Sheva 84100, Israel