
Annals of Operations Research 103, 161–173, 2001. © 2001 Kluwer Academic Publishers. Manufactured in The Netherlands.

Global Convergence of Conjugate Gradient Methods without Line Search

JIE SUN ∗ [email protected]
Department of Decision Sciences, National University of Singapore, Republic of Singapore

JIAPU ZHANG ∗∗
Department of Mathematics, University of Melbourne, Melbourne, Australia

Abstract. Global convergence results are derived for well-known conjugate gradient methods in which the line search step is replaced by a step whose length is determined by a formula. The results include the following cases: (1) the Fletcher–Reeves method, the Hestenes–Stiefel method, and the Dai–Yuan method applied to a strongly convex LC^1 objective function; (2) the Polak–Ribière method and the Conjugate Descent method applied to a general, not necessarily convex, LC^1 objective function.

Keywords: conjugate gradient methods, convergence of algorithms, line search

AMS subject classification: 90C25, 90C30

1. Introduction

The conjugate gradient method is quite useful in finding an unconstrained minimum of a high-dimensional function f(x). In general, the method has the following form:

$$
d_k = \begin{cases} -g_k & \text{for } k = 1, \\ -g_k + \beta_k d_{k-1} & \text{for } k > 1, \end{cases} \tag{1}
$$

$$
x_{k+1} = x_k + \alpha_k d_k, \tag{2}
$$

where g_k denotes the gradient ∇f(x_k), α_k is a steplength obtained by a line search, d_k is the search direction, and β_k is chosen so that d_k becomes the kth conjugate direction when the function is quadratic and the line search is exact. Varieties of this method differ in the way of selecting β_k. Some well-known formulae for β_k are given by

$$
\beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2} \quad \text{(Fletcher–Reeves [7])}, \tag{3}
$$

$$
\beta_k^{PR} = \frac{g_k^T (g_k - g_{k-1})}{\|g_{k-1}\|^2} \quad \text{(Polak–Ribière [13])}, \tag{4}
$$

∗ Corresponding author. This author is a fellow of the Singapore-MIT Alliance. His research was partially supported by grants R314-000-009-112 and R314-000-026-112 of the National University of Singapore.
∗∗ This author’s work was partially supported by a scholarship from the National University of Singapore.


$$
\beta_k^{HS} = \frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T (g_k - g_{k-1})} \quad \text{(Hestenes–Stiefel [9])}, \tag{5}
$$

and

$$
\beta_k^{CD} = \frac{\|g_k\|^2}{-d_{k-1}^T g_{k-1}} \quad \text{(the Conjugate Descent method [6])}, \tag{6}
$$

where ‖·‖ is the Euclidean norm and “T” stands for the transpose. Recently Dai and Yuan [5] also introduced a formula for β_k:

$$
\beta_k^{DY} = \frac{\|g_k\|^2}{d_{k-1}^T (g_k - g_{k-1})}. \tag{7}
$$
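As a concrete illustration (ours, not part of the paper), the update formulae (3)–(7) translate directly into code; here `g`, `g_prev`, and `d_prev` stand for g_k, g_{k-1}, and d_{k-1}, assumed to be NumPy vectors.

import numpy as np

def beta_fr(g, g_prev, d_prev):
    # Fletcher-Reeves (3)
    return (g @ g) / (g_prev @ g_prev)

def beta_pr(g, g_prev, d_prev):
    # Polak-Ribiere (4)
    return g @ (g - g_prev) / (g_prev @ g_prev)

def beta_hs(g, g_prev, d_prev):
    # Hestenes-Stiefel (5)
    return g @ (g - g_prev) / (d_prev @ (g - g_prev))

def beta_cd(g, g_prev, d_prev):
    # Conjugate Descent (6)
    return (g @ g) / -(d_prev @ g_prev)

def beta_dy(g, g_prev, d_prev):
    # Dai-Yuan (7)
    return (g @ g) / (d_prev @ (g - g_prev))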

For ease of presentation we call the methods corresponding to (3)–(7) the FR method, the PR method, the HS method, the CD method, and the DY method, respectively. The global convergence of methods (3)–(7) has been studied by many authors, including Al-Baali [1], Gilbert and Nocedal [8], Hestenes and Stiefel [9], Hu and Storey [10], Liu, Han and Yin [11], Powell [14,15], Touati-Ahmed and Storey [16], and Zoutendijk [18], among others. A key factor in global convergence is how to select the steplength α_k. The most natural choice of α_k is the exact line search, i.e., to let α_k = argmin_{α ≥ 0} f(x_k + α d_k). However, somewhat surprisingly, this natural choice can result in a non-convergent sequence in the case of the PR and HS methods, as shown by Powell [15]. Inspired by Powell’s work, Gilbert and Nocedal [8] conducted an elegant analysis and showed that the PR method is globally convergent if β_k^{PR} is restricted to be non-negative and α_k is determined by a line search step satisfying the sufficient descent condition g_k^T d_k ≤ −c‖g_k‖² in addition to the standard Wolfe conditions [17]. Recently, Dai and Yuan [3,4] showed that both the CD method and the FR method are globally convergent if the following line search conditions on α_k are satisfied:

$$
\begin{cases}
f(x_k) - f(x_k + \alpha_k d_k) \ge -\delta \alpha_k g_k^T d_k, \\
\sigma_1 g_k^T d_k \le g(x_k + \alpha_k d_k)^T d_k \le -\sigma_2 g_k^T d_k,
\end{cases} \tag{8}
$$

where σ_1 > 0, 0 < δ < σ_1 < 1, and 0 < σ_2 < 1 for the CD method, with σ_1 + σ_2 ≤ 1 required in addition for the FR method.
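To make conditions (8) concrete, a checker might look as follows (a minimal sketch of ours, not from [3,4]; `f` and `grad` are assumed to be callables returning the objective value and the gradient as a NumPy vector):

def satisfies_conditions_8(f, grad, x, d, alpha, delta, sigma1, sigma2):
    # First line of (8): sufficient decrease of f along d.
    gd = grad(x) @ d
    decrease_ok = f(x) - f(x + alpha * d) >= -delta * alpha * gd
    # Second line of (8): two-sided bound on the new directional derivative.
    gd_new = grad(x + alpha * d) @ d
    return decrease_ok and (sigma1 * gd <= gd_new <= -sigma2 * gd)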

In this paper we consider the problem from a different angle. We show that a unified formula for α_k (see (10) below) can ensure global convergence for many cases, which include:

(1) The FR method, the HS method, and the DY method applied to a strongly convex first-order Lipschitz continuous (LC^1 for short) objective function.

(2) The PR method and the CD method applied to a general, not necessarily convex, LC^1 objective function.

For convenience we will refer to these methods as conjugate gradient methods without line search, since the steplength is determined by a formula rather than by a line search process.


2. Proof of convergence

We adopt the following assumption on the function f, which is commonly used in the literature.

Assumption 1. The function f is LC^1 in a neighborhood N of the level set L := {x ∈ R^n | f(x) ≤ f(x_1)}, and L is bounded. Here, by LC^1 we mean that the gradient ∇f(x) is Lipschitz continuous with modulus µ; i.e., there exists µ > 0 such that ‖∇f(x_{k+1}) − ∇f(x_k)‖ ≤ µ‖x_{k+1} − x_k‖ for any x_{k+1}, x_k ∈ N.

Assumption 1 is sufficient for the global convergence of the PR method and the CD method, but seems not to be enough for the convergence of the FR method, the HS method, and the DY method without line search. Thus, for these three methods we impose the following stronger assumption.

Assumption 2. The function f is LC^1 and strongly convex on N. In other words, there exists λ > 0 such that [∇f(x_{k+1}) − ∇f(x_k)]^T (x_{k+1} − x_k) ≥ λ‖x_{k+1} − x_k‖² for any x_{k+1}, x_k ∈ N.

Note that assumption 2 implies assumption 1, since a strongly convex function has bounded level sets.

Let {Q_k} be a sequence of positive definite matrices. Assume that there exist ν_min > 0 and ν_max > 0 such that for all d ∈ R^n,

$$
\nu_{\min}\, d^T d \le d^T Q_k d \le \nu_{\max}\, d^T d. \tag{9}
$$

This condition would be satisfied, for instance, if Q_k ≡ Q and Q is positive definite. We analyze the conjugate gradient methods that use the following steplength formula:

$$
\alpha_k = -\frac{\delta g_k^T d_k}{\|d_k\|_{Q_k}^2}, \tag{10}
$$

where ‖d_k‖_{Q_k} := √(d_k^T Q_k d_k) and δ ∈ (0, ν_min/µ). Note that this specification of δ ensures δµ/ν_min < 1.
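To illustrate how (10) replaces the line search, the following minimal sketch (ours, not from the paper) implements iteration (1)-(2) with Q_k = I, so that ν_min = ν_max = 1 and any δ ∈ (0, 1/µ) is admissible; `grad_f` is an assumed gradient callable and `mu` a known Lipschitz modulus:

import numpy as np

def cg_without_line_search(grad_f, x0, mu, beta_rule, max_iter=1000, tol=1e-8):
    # Conjugate gradient iteration (1)-(2) with steplength formula (10), Q_k = I.
    delta = 0.5 / mu                    # any delta in (0, nu_min/mu) works
    x = np.asarray(x0, dtype=float)
    g = grad_f(x)
    d = -g                              # d_1 = -g_1
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha = -delta * (g @ d) / (d @ d)        # formula (10), ||d||_{Q_k} = ||d||
        x_new = x + alpha * d                     # step (2)
        g_new = grad_f(x_new)
        d = -g_new + beta_rule(g_new, g, d) * d   # update (1), any of (3)-(7)
        x, g = x_new, g_new
    return x

For example, for the convex quadratic f(x) = ½ x^T A x with A = np.diag([1.0, 10.0, 100.0]), the gradient x ↦ Ax is Lipschitz with µ = 100, and `cg_without_line_search(lambda x: A @ x, np.ones(3), 100.0, beta_pr)` drives ‖g_k‖ toward zero without any line search; by (23) below, the objective values decrease monotonically along the way.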

Lemma 1. Suppose that x_k is given by (1), (2), and (10). Then

$$
g_{k+1}^T d_k = \rho_k g_k^T d_k \tag{11}
$$

holds for all k, where

$$
\rho_k = 1 - \frac{\delta \phi_k \|d_k\|^2}{\|d_k\|_{Q_k}^2} \tag{12}
$$


and

$$
\phi_k = \begin{cases} 0 & \text{for } \alpha_k = 0, \\ \dfrac{(g_{k+1} - g_k)^T (x_{k+1} - x_k)}{\|x_{k+1} - x_k\|^2} & \text{for } \alpha_k \ne 0. \end{cases} \tag{13}
$$

Proof. The case α_k = 0 implies that ρ_k = 1 and g_{k+1} = g_k, hence (11) is valid. We now prove the case α_k ≠ 0. From (2) and (10) we have

$$
\begin{aligned}
g_{k+1}^T d_k &= g_k^T d_k + (g_{k+1} - g_k)^T d_k = g_k^T d_k + \alpha_k^{-1} (g_{k+1} - g_k)^T (x_{k+1} - x_k) \\
&= g_k^T d_k + \alpha_k^{-1} \phi_k \|x_{k+1} - x_k\|^2 = g_k^T d_k + \alpha_k \phi_k \|d_k\|^2 \\
&= g_k^T d_k - \frac{\delta g_k^T d_k}{\|d_k\|_{Q_k}^2}\, \phi_k \|d_k\|^2 = \left(1 - \frac{\delta \phi_k \|d_k\|^2}{\|d_k\|_{Q_k}^2}\right) g_k^T d_k = \rho_k g_k^T d_k.
\end{aligned} \tag{14}
$$

The proof is complete. □
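Identity (11) can be sanity-checked numerically; the snippet below (our illustration, with Q_k = I and an arbitrarily chosen smooth test function) prints two numbers that agree up to rounding:

import numpy as np

grad = lambda x: 4.0 * (x @ x) * x      # gradient of the test function ||x||^4

delta = 0.01
x, d = np.array([1.0, -2.0]), np.array([0.5, 1.0])
g = grad(x)
alpha = -delta * (g @ d) / (d @ d)      # steplength (10) with Q_k = I
x_next = x + alpha * d                  # step (2)
g_next = grad(x_next)
s = x_next - x
phi = (g_next - g) @ s / (s @ s)        # phi_k from (13)
rho = 1.0 - delta * phi                 # rho_k from (12), since ||d||^2 = ||d||_{Q_k}^2
print(g_next @ d, rho * (g @ d))        # the two sides of (11)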

Corollary 2. In formulae (6) and (7) we have β_k^{DY} = β_k^{CD} / (1 − ρ_{k−1}).

Proof. This is straightforward from (6), (7), and (11). □

Corollary 3. There holds

$$
1 - \frac{\delta\mu}{\nu_{\min}} \le \rho_k \le 1 + \frac{\delta\mu}{\nu_{\min}} \tag{15}
$$

for all k if assumption 1 is valid; and this estimate can be sharpened to

$$
0 < \rho_k \le 1 - \frac{\delta\lambda}{\nu_{\max}} \tag{16}
$$

if assumption 2 is valid.

Proof. By (12) and (13) we have

$$
\rho_k = 1 - \delta \phi_k \frac{\|d_k\|^2}{\|d_k\|_{Q_k}^2} = 1 - \delta\, \frac{(g_{k+1} - g_k)^T (x_{k+1} - x_k)}{\|x_{k+1} - x_k\|^2} \cdot \frac{\|d_k\|^2}{\|d_k\|_{Q_k}^2}. \tag{17}
$$

Then from assumption 1 or assumption 2 we have either

$$
\|g_{k+1} - g_k\| \le \mu \|x_{k+1} - x_k\| \tag{18}
$$

or

$$
(g_{k+1} - g_k)^T (x_{k+1} - x_k) \ge \lambda \|x_{k+1} - x_k\|^2, \tag{19}
$$

which, together with (9) and (17), leads to the corresponding bounds for ρ_k. □


Lemma 4. Suppose that assumption 1 holds and that x_k is given by (1), (2), and (10). Then

$$
\sum_{d_k \ne 0} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty. \tag{20}
$$

Proof. By the mean-value theorem we have

$$
f(x_{k+1}) - f(x_k) = \bar g^T (x_{k+1} - x_k), \tag{21}
$$

where ḡ = ∇f(x̄) for some x̄ ∈ [x_k, x_{k+1}]. Now by the Cauchy–Schwarz inequality, (2), (10), and assumption 1 we obtain

$$
\begin{aligned}
\bar g^T (x_{k+1} - x_k) &= g_k^T (x_{k+1} - x_k) + (\bar g - g_k)^T (x_{k+1} - x_k) \\
&\le g_k^T (x_{k+1} - x_k) + \|\bar g - g_k\|\, \|x_{k+1} - x_k\| \\
&\le g_k^T (x_{k+1} - x_k) + \mu \|x_{k+1} - x_k\|^2 = \alpha_k g_k^T d_k + \mu \alpha_k^2 \|d_k\|^2 \\
&= \alpha_k g_k^T d_k - \frac{\mu \alpha_k \delta g_k^T d_k \|d_k\|^2}{\|d_k\|_{Q_k}^2} = \alpha_k g_k^T d_k \left(1 - \frac{\mu\delta \|d_k\|^2}{\|d_k\|_{Q_k}^2}\right) \\
&\le -\delta \left(1 - \frac{\mu\delta}{\nu_{\min}}\right) \frac{(g_k^T d_k)^2}{\|d_k\|_{Q_k}^2};
\end{aligned} \tag{22}
$$

i.e.,

$$
f(x_{k+1}) - f(x_k) \le -\delta \left(1 - \frac{\mu\delta}{\nu_{\min}}\right) \frac{(g_k^T d_k)^2}{\|d_k\|_{Q_k}^2}, \tag{23}
$$

which implies f(x_{k+1}) ≤ f(x_k). It follows by assumption 1 that lim_{k→∞} f(x_k) exists. Thus, from (9) and (23) we obtain

$$
\frac{(g_k^T d_k)^2}{\|d_k\|^2} \le \nu_{\max} \frac{(g_k^T d_k)^2}{\|d_k\|_{Q_k}^2} \le \frac{\nu_{\max}}{\delta (1 - \mu\delta/\nu_{\min})} \left[f(x_k) - f(x_{k+1})\right]. \tag{24}
$$

Summing (24) over k, the right-hand side telescopes to a finite limit, which gives (20). This finishes our proof. □

Lemma 5. Suppose that assumption 1 holds and that x_k is given by (1), (2), and (10). Then lim inf_{k→∞} ‖g_k‖ ≠ 0 implies

$$
\sum_{d_k \ne 0} \frac{\|g_k\|^4}{\|d_k\|^2} < \infty. \tag{25}
$$

Proof. If lim inf_{k→∞} ‖g_k‖ ≠ 0, then there exists γ > 0 such that ‖g_k‖ ≥ γ for all k. Let

$$
\lambda_k = \frac{|g_k^T d_k|}{\|d_k\|}. \tag{26}
$$


Then by lemma 4 there holds

$$
\lambda_k \le \frac{\gamma}{4} \tag{27}
$$

for all large k. By lemma 1 and corollary 3 we have

$$
\left|g_k^T d_{k-1}\right| = \left|\rho_{k-1} g_{k-1}^T d_{k-1}\right| \le \left(1 + \frac{\delta\mu}{\nu_{\min}}\right) \left|g_{k-1}^T d_{k-1}\right| < 2 \left|g_{k-1}^T d_{k-1}\right|. \tag{28}
$$

Considering (1), we have

$$
g_k = \beta_k d_{k-1} - d_k. \tag{29}
$$

Multiplying both sides of (29) by g_k^T, we obtain

$$
\|g_k\|^2 = \beta_k g_k^T d_{k-1} - g_k^T d_k. \tag{30}
$$

From (30) and (28) it follows that

$$
\begin{aligned}
\frac{\|g_k\|^2}{\|d_k\|} &= \frac{\beta_k g_k^T d_{k-1} - g_k^T d_k}{\|d_k\|} \le \frac{|\beta_k|\,|g_k^T d_{k-1}| + |g_k^T d_k|}{\|d_k\|} \\
&\le \frac{2\lambda_{k-1} \|\beta_k d_{k-1}\|}{\|d_k\|} + \lambda_k = \lambda_k + \frac{2\lambda_{k-1} \|d_k + g_k\|}{\|d_k\|} \\
&\le \lambda_k + 2\lambda_{k-1} + 2\lambda_{k-1} \frac{\|g_k\|^2}{\|d_k\|\,\|g_k\|} \le \lambda_k + 2\lambda_{k-1} + \frac{2\lambda_{k-1} \|g_k\|^2}{\gamma \|d_k\|} \\
&\le \lambda_k + 2\lambda_{k-1} + 2\left(\frac{\gamma}{4}\right) \frac{\|g_k\|^2}{\gamma \|d_k\|} = \lambda_k + 2\lambda_{k-1} + \frac{\|g_k\|^2}{2\|d_k\|};
\end{aligned}
$$

i.e.,

$$
\frac{\|g_k\|^2}{\|d_k\|} \le \lambda_k + 2\lambda_{k-1} + \frac{\|g_k\|^2}{2\|d_k\|}. \tag{31}
$$

The above relation can be rewritten as

$$
\frac{\|g_k\|^2}{\|d_k\|} \le 2\lambda_k + 4\lambda_{k-1} \le 4(\lambda_k + \lambda_{k-1}). \tag{32}
$$

Hence we have

$$
\frac{\|g_k\|^4}{\|d_k\|^2} \le 16(\lambda_k + \lambda_{k-1})^2 \le 32\left(\lambda_k^2 + \lambda_{k-1}^2\right). \tag{33}
$$

By lemma 4 we know that Σ λ_k² < ∞. Hence we obtain from (33)

$$
\sum_{d_k \ne 0} \frac{\|g_k\|^4}{\|d_k\|^2} < \infty. \qquad \Box
$$

Remarks. Note that the two lemmas above remain true if assumption 2 holds, because assumption 2 implies assumption 1. The proof of lemma 5 is similar to that of theorem 2.3 and corollary 2.4 in [2]. The difference is that here we do not need to assume the strong Wolfe conditions.

Lemma 6. The CD method satisfies

$$
\frac{\|d_k\|^2}{\|g_k\|^4} \le \frac{\|d_{k-1}\|^2}{\|g_{k-1}\|^4} + \frac{\Delta}{\|g_k\|^2} \tag{34}
$$

under assumption 1, where Δ is a positive constant.

Proof. For the CD method, by lemma 1 and corollary 3 we have

$$
\begin{aligned}
g_k^T d_k &= g_k^T \left(-g_k + \frac{\|g_k\|^2}{-g_{k-1}^T d_{k-1}}\, d_{k-1}\right) = -\|g_k\|^2 - \|g_k\|^2 \frac{g_k^T d_{k-1}}{g_{k-1}^T d_{k-1}} \\
&= -\|g_k\|^2 - \|g_k\|^2 \frac{\rho_{k-1} g_{k-1}^T d_{k-1}}{g_{k-1}^T d_{k-1}} = -(1 + \rho_{k-1}) \|g_k\|^2 < 0.
\end{aligned} \tag{35}
$$

Note that from the last equality above and corollary 3 we have

$$
\left(g_k^T d_k\right)^2 \ge \|g_k\|^4, \tag{36}
$$

which is due to ρ_{k−1} ≥ 0, itself a consequence of corollary 3 and δ ∈ (0, ν_min/µ). Applying (1), (6), lemma 1 and (36), we obtain

$$
\begin{aligned}
\|d_k\|^2 &= \left\|-g_k + \beta_k^{CD} d_{k-1}\right\|^2 = \left\|-g_k + \frac{\|g_k\|^2}{-g_{k-1}^T d_{k-1}}\, d_{k-1}\right\|^2 \\
&= \|g_k\|^2 + \frac{\|g_k\|^4}{(g_{k-1}^T d_{k-1})^2} \|d_{k-1}\|^2 + \frac{2\|g_k\|^2}{g_{k-1}^T d_{k-1}}\, g_k^T d_{k-1} \\
&= \|g_k\|^2 + \frac{\|g_k\|^4}{(g_{k-1}^T d_{k-1})^2} \|d_{k-1}\|^2 + \frac{2\rho_{k-1} g_{k-1}^T d_{k-1}}{g_{k-1}^T d_{k-1}} \|g_k\|^2 \\
&= (1 + 2\rho_{k-1}) \|g_k\|^2 + \frac{\|g_k\|^4}{(g_{k-1}^T d_{k-1})^2} \|d_{k-1}\|^2 \\
&\le \left[1 + 2\left(1 + \frac{\delta\mu}{\nu_{\min}}\right)\right] \|g_k\|^2 + \frac{\|g_k\|^4}{\|g_{k-1}\|^4} \|d_{k-1}\|^2,
\end{aligned} \tag{37}
$$

where the bound

$$
\frac{\|g_k\|^4}{(g_{k-1}^T d_{k-1})^2} \|d_{k-1}\|^2 \le \frac{\|g_k\|^4}{\|g_{k-1}\|^4} \|d_{k-1}\|^2
$$

follows from (36) applied at index k − 1. Defining Δ := 1 + 2(1 + δµ/ν_min) and dividing both sides by ‖g_k‖⁴, we get

$$
\frac{\|d_k\|^2}{\|g_k\|^4} \le \frac{\|d_{k-1}\|^2}{\|g_{k-1}\|^4} + \frac{\Delta}{\|g_k\|^2}. \tag{38}
$$

The proof for the CD method is complete. □

Lemma 7. Suppose assumption 2 holds. Then the FR method satisfies

$$
\frac{\|d_k\|^2}{\|g_k\|^4} \le \frac{\|d_{k-1}\|^2}{\|g_{k-1}\|^4} + \frac{\Delta}{\|g_k\|^2}, \tag{39}
$$

where Δ is a positive constant.

Proof. By lemma 1 and (1) we have

$$
\begin{aligned}
-g_k^T d_{k-1} &= -\rho_{k-1} g_{k-1}^T d_{k-1} = -\rho_{k-1} g_{k-1}^T \left(-g_{k-1} + \beta_{k-1}^{FR} d_{k-2}\right) \\
&= -\rho_{k-1} g_{k-1}^T \left(-g_{k-1} + \frac{\|g_{k-1}\|^2}{\|g_{k-2}\|^2}\, d_{k-2}\right) \\
&= \rho_{k-1} \|g_{k-1}\|^2 - \rho_{k-1} \frac{\|g_{k-1}\|^2}{\|g_{k-2}\|^2}\, g_{k-1}^T d_{k-2}.
\end{aligned} \tag{40}
$$

Thus we have a recursive equation, which leads to

$$
\begin{aligned}
-g_k^T d_{k-1} &= \rho_{k-1} \|g_{k-1}\|^2 - \rho_{k-1} \frac{\|g_{k-1}\|^2}{\|g_{k-2}\|^2}\, g_{k-1}^T d_{k-2} \\
&= \rho_{k-1} \|g_{k-1}\|^2 + \rho_{k-1}\rho_{k-2} \|g_{k-1}\|^2 - \rho_{k-1}\rho_{k-2} \frac{\|g_{k-1}\|^2}{\|g_{k-3}\|^2}\, g_{k-2}^T d_{k-3} \\
&= \cdots = \|g_{k-1}\|^2 (\rho_{k-1} + \rho_{k-1}\rho_{k-2} + \cdots + \rho_{k-1}\rho_{k-2}\cdots\rho_2) \\
&\le \|g_{k-1}\|^2 \sum_{j=1}^{\infty} \left(1 - \frac{\delta\lambda}{\nu_{\max}}\right)^j =: \Delta_1 \|g_{k-1}\|^2. \tag{41}
\end{aligned}
$$

Hence we obtain

$$
\begin{aligned}
\frac{\|d_k\|^2}{\|g_k\|^4} &= \frac{\|g_k\|^2 - 2\beta_k^{FR} g_k^T d_{k-1} + (\beta_k^{FR})^2 \|d_{k-1}\|^2}{\|g_k\|^4} \\
&\le \frac{\|g_k\|^2 + 2\Delta_1 \beta_k^{FR} \|g_{k-1}\|^2 + (\beta_k^{FR})^2 \|d_{k-1}\|^2}{\|g_k\|^4} \\
&= \frac{1 + 2\Delta_1}{\|g_k\|^2} + \frac{\|d_{k-1}\|^2}{\|g_{k-1}\|^4}.
\end{aligned} \tag{42}
$$

Defining Δ := 1 + 2Δ_1, the proof is complete. □

Lemma 8. The DY method satisfies

$$
\frac{\|d_k\|^2}{\|g_k\|^4} = \left(\frac{1 - \rho_{k-2}}{1 - \rho_{k-1}}\right)^2 \frac{\|d_{k-1}\|^2}{\|g_{k-1}\|^4} + \left(\frac{1 + \rho_{k-1}}{1 - \rho_{k-1}}\right) \frac{1}{\|g_k\|^2} \tag{43}
$$

under assumption 2.


Proof. For the DY method, by (1), (7), lemma 1 and corollary 3 we have

$$
\begin{aligned}
g_k^T d_k &= g_k^T \left(-g_k + \frac{\|g_k\|^2}{d_{k-1}^T (g_k - g_{k-1})}\, d_{k-1}\right) = -\|g_k\|^2 + \|g_k\|^2 \frac{g_k^T d_{k-1}}{g_k^T d_{k-1} - g_{k-1}^T d_{k-1}} \\
&= -\|g_k\|^2 + \|g_k\|^2 \frac{\rho_{k-1} g_{k-1}^T d_{k-1}}{(\rho_{k-1} - 1) g_{k-1}^T d_{k-1}} = -\frac{1}{1 - \rho_{k-1}} \|g_k\|^2 < 0,
\end{aligned} \tag{44}
$$

where 0 < ρ_{k−1} < 1 because of corollary 3. Applying (1), (7), lemma 1 and (44), we obtain

$$
\begin{aligned}
\|d_k\|^2 &= \left\|-g_k + \beta_k^{DY} d_{k-1}\right\|^2 = \left\|-g_k + \frac{\beta_k^{CD} d_{k-1}}{1 - \rho_{k-1}}\right\|^2 = \left\|-g_k + \frac{\|g_k\|^2}{-g_{k-1}^T d_{k-1}}\, \frac{d_{k-1}}{1 - \rho_{k-1}}\right\|^2 \\
&= \|g_k\|^2 + \frac{\|g_k\|^4}{(1 - \rho_{k-1})^2 (g_{k-1}^T d_{k-1})^2} \|d_{k-1}\|^2 + \frac{2\|g_k\|^2}{(1 - \rho_{k-1}) g_{k-1}^T d_{k-1}}\, g_k^T d_{k-1} \\
&= \|g_k\|^2 + \frac{\|g_k\|^4}{(1 - \rho_{k-1})^2 (g_{k-1}^T d_{k-1})^2} \|d_{k-1}\|^2 + \frac{2\rho_{k-1} g_{k-1}^T d_{k-1}}{(1 - \rho_{k-1}) g_{k-1}^T d_{k-1}} \|g_k\|^2 \\
&= \left(\frac{1 - \rho_{k-2}}{1 - \rho_{k-1}}\right)^2 \frac{\|d_{k-1}\|^2}{\|g_{k-1}\|^4} \|g_k\|^4 + \left(\frac{1 + \rho_{k-1}}{1 - \rho_{k-1}}\right) \|g_k\|^2.
\end{aligned} \tag{45}
$$

Note that in the last step above we used equation (44) at index k − 1. Dividing both sides of the above formula by ‖g_k‖⁴ completes the proof. □

Theorem 9. Suppose that assumption 1 holds. Then the CD method generates a sequence {x_k} such that lim inf_{k→∞} ‖g_k‖ = 0. The same conclusion holds for the FR and DY methods under assumption 2.

Proof. We first show this for the CD and FR methods. If lim inf_{k→∞} ‖g_k‖ ≠ 0, then there exists γ > 0 such that ‖g_k‖ ≥ γ for all k, and by lemma 5 we have

$$
\sum_{d_k \ne 0} \frac{\|g_k\|^4}{\|d_k\|^2} < \infty. \tag{46}
$$

By lemmas 6 and 7 the two methods satisfy

$$
\frac{\|d_k\|^2}{\|g_k\|^4} \le \frac{\|d_{k-1}\|^2}{\|g_{k-1}\|^4} + \frac{\Delta}{\|g_k\|^2} \tag{47}
$$

under assumption 1 and assumption 2, respectively. Hence we get

$$
\frac{\|d_k\|^2}{\|g_k\|^4} \le \frac{\|d_{k-1}\|^2}{\|g_{k-1}\|^4} + \frac{\Delta}{\gamma^2} \le \frac{\|d_{k-2}\|^2}{\|g_{k-2}\|^4} + \frac{2\Delta}{\gamma^2} \le \cdots \le \frac{\|d_1\|^2}{\|g_1\|^4} + \frac{(k-1)\Delta}{\gamma^2}. \tag{48}
$$


Thus, (48) means that

$$
\frac{\|g_k\|^4}{\|d_k\|^2} \ge \frac{1}{ka - a + b}, \tag{49}
$$

where a = Δ/γ² and b = ‖d_1‖²/‖g_1‖⁴ = 1/‖g_1‖² are both constants. Since Σ_k 1/(ka − a + b) diverges, from (49) we have

$$
\sum_{d_k \ne 0} \frac{\|g_k\|^4}{\|d_k\|^2} = +\infty, \tag{50}
$$

which contradicts (46). Hence the theorem is valid for the CD and FR methods.

We next consider the DY method. If lim inf_{k→∞} ‖g_k‖ ≠ 0, then there exists γ > 0 such that ‖g_k‖ ≥ γ for all k, and by lemma 5 we have

$$
\sum_{d_k \ne 0} \frac{\|g_k\|^4}{\|d_k\|^2} < \infty. \tag{51}
$$

From lemma 8 the method satisfies

$$
\frac{\|d_k\|^2}{\|g_k\|^4} = \left(\frac{1 - \rho_{k-2}}{1 - \rho_{k-1}}\right)^2 \frac{\|d_{k-1}\|^2}{\|g_{k-1}\|^4} + \left(\frac{1 + \rho_{k-1}}{1 - \rho_{k-1}}\right) \frac{1}{\|g_k\|^2} \tag{52}
$$

under assumption 2. Hence we get

$$
\begin{aligned}
\frac{\|d_k\|^2}{\|g_k\|^4} &\le \left(\frac{1 - \rho_{k-2}}{1 - \rho_{k-1}}\right)^2 \frac{\|d_{k-1}\|^2}{\|g_{k-1}\|^4} + \left(\frac{1 + \rho_{k-1}}{1 - \rho_{k-1}}\right) \frac{1}{\gamma^2} \\
&\le \left(\frac{1 - \rho_{k-3}}{1 - \rho_{k-1}}\right)^2 \frac{\|d_{k-2}\|^2}{\|g_{k-2}\|^4} + \left[\frac{1 - \rho_{k-1}^2}{(1 - \rho_{k-1})^2} + \frac{1 - \rho_{k-2}^2}{(1 - \rho_{k-1})^2}\right] \frac{1}{\gamma^2} \le \cdots \\
&\le \left(\frac{1 - \rho_1}{1 - \rho_{k-1}}\right)^2 \frac{\|d_2\|^2}{\|g_2\|^4} + \left[\frac{1 - \rho_{k-1}^2}{(1 - \rho_{k-1})^2} + \cdots + \frac{1 - \rho_2^2}{(1 - \rho_{k-1})^2}\right] \frac{1}{\gamma^2}.
\end{aligned} \tag{53}
$$

The quantities

$$
\left(\frac{1 - \rho_1}{1 - \rho_{k-1}}\right)^2, \quad \frac{1 - \rho_{k-1}^2}{(1 - \rho_{k-1})^2}, \quad \ldots, \quad \frac{1 - \rho_2^2}{(1 - \rho_{k-1})^2} \tag{54}
$$

have a common upper bound, due to corollary 3 (under assumption 2, 1 − ρ_{k−1} ≥ δλ/ν_max). Denoting this common bound by M, from (53) we obtain

$$
\frac{\|g_k\|^4}{\|d_k\|^2} \ge \frac{1}{ka - 2a + b}, \tag{55}
$$

where a = M/γ² and b = M‖d_2‖²/‖g_2‖⁴ are constants. From (55) we have

$$
\sum_{d_k \ne 0} \frac{\|g_k\|^4}{\|d_k\|^2} = +\infty, \tag{56}
$$

which contradicts (51). Hence the proof for the DY method is complete and the theorem is valid. □

Now we turn to the PR and the HS methods.

Theorem 10. Suppose that assumption 1 holds. Then the PR method generates a sequence {x_k} such that lim inf_{k→∞} ‖g_k‖ = 0. If assumption 2 holds, then the same conclusion holds for the HS method as well.

Proof. We consider the PR method first. Assume to the contrary that ‖g_k‖ ≥ γ for all k. From (10) and lemma 4 we have

$$
\sum \|x_{k+1} - x_k\|^2 = \sum \|\alpha_k d_k\|^2 = \sum_{d_k \ne 0} \frac{\delta^2 (g_k^T d_k)^2 \|d_k\|^2}{\|d_k\|_{Q_k}^4} \le \frac{\delta^2}{\nu_{\min}^2} \sum_{d_k \ne 0} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty. \tag{57}
$$

Hence ‖x_{k+1} − x_k‖ → 0 and

$$
\left|\beta_k^{PR}\right| = \frac{|g_k^T (g_k - g_{k-1})|}{\|g_{k-1}\|^2} \to 0 \tag{58}
$$

as k → ∞. Since L is bounded, both {x_k} and {g_k} are bounded. Using

$$
\|d_k\| \le \|g_k\| + |\beta_k|\, \|d_{k-1}\|, \tag{59}
$$

one can show that ‖d_k‖ is uniformly bounded. Thus we have

$$
\left|g_k^T d_k\right| = \left|g_k^T \left(-g_k + \beta_k^{PR} d_{k-1}\right)\right| \ge \|g_k\|^2 - \left|\beta_k^{PR}\right| \|g_k\|\, \|d_{k-1}\| \ge \frac{\|g_k\|^2}{2} \tag{60}
$$

for all large k, since (58) and the boundedness of ‖d_{k−1}‖ give ‖β_k^{PR} d_{k−1}‖ ≤ ‖g_k‖/2 eventually. Hence

$$
\frac{(g_k^T d_k)^2}{\|d_k\|^2 \|g_k\|^2} \ge \frac{1}{4}\, \frac{\|g_k\|^2}{\|d_k\|^2}. \tag{61}
$$

Since ‖g_k‖ ≥ γ and ‖d_k‖ is bounded above, we conclude that there is ε > 0 such that

$$
\frac{(g_k^T d_k)^2}{\|d_k\|^2 \|g_k\|^2} \ge \varepsilon, \tag{62}
$$

which implies

$$
\sum_{d_k \ne 0} \|g_k\|^2\, \frac{(g_k^T d_k)^2}{\|d_k\|^2 \|g_k\|^2} = \infty. \tag{63}
$$


This contradicts lemma 4, since the sum in (63) is exactly Σ_{d_k≠0} (g_k^T d_k)²/‖d_k‖². The proof for the PR method is complete.

We now consider the HS method. Again, we assume ‖g_k‖ ≥ γ for all k and note that g_k − g_{k−1} → 0. Since

$$
\begin{aligned}
g_k^T d_k &= g_k^T \left(-g_k + \beta_k^{HS} d_{k-1}\right) = g_k^T \left(-g_k - \frac{g_k^T (g_k - g_{k-1})}{(1 - \rho_{k-1})\, d_{k-1}^T g_{k-1}}\, d_{k-1}\right) \\
&= -\|g_k\|^2 - \frac{\rho_{k-1}}{1 - \rho_{k-1}}\, g_k^T (g_k - g_{k-1}),
\end{aligned} \tag{64}
$$

we have

$$
\left(\frac{g_k^T d_k}{\|g_k\|^2}\right)^2 = \left[1 + \frac{\rho_{k-1}}{1 - \rho_{k-1}}\, \frac{g_k^T (g_k - g_{k-1})}{\|g_k\|^2}\right]^2 \ge \frac{1}{2} \tag{65}
$$

for all large k, which plays the same role as (61). Moreover, from

$$
\left|\beta_k^{HS}\right| = \left|\frac{g_k^T (g_k - g_{k-1})}{(1 - \rho_{k-1})\, d_{k-1}^T g_{k-1}}\right| \le \frac{\sqrt{2}}{\delta\lambda/\nu_{\max}}\, \left|\frac{g_k^T (g_k - g_{k-1})}{\|g_{k-1}\|^2}\right| \to 0 \tag{66}
$$

we conclude through (59) that ‖d_k‖ is bounded. The rest of the proof is the same as that for the PR method. □

3. Final remarks

We have shown that by taking a “fixed” steplength α_k defined by formula (10), the conjugate gradient method is globally convergent for several popular choices of β_k. The result discloses an interesting property of the conjugate gradient method: its global convergence can be guaranteed by taking a pre-determined steplength rather than following a set of line search rules. This steplength might be practical in cases where the line search is expensive or hard. Our proofs require that the function is at least LC^1 (sometimes, in addition, strongly convex) and that the level set L is bounded. We point out that the latter condition can be relaxed to f being bounded below on L in the case of the CD method, because an upper bound on ‖g_k‖ is not assumed in the corresponding proof. We allow certain flexibility in selecting the sequence {Q_k} in practical computation. In the simplest case we may just let every Q_k be the identity matrix. It would be more interesting to define Q_k as a matrix of certain simple structure that carries some second order information of the objective function. This might be a topic of further research.
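As one speculative illustration of that last suggestion (ours, not from the paper), a diagonal Q_k could be built from an estimate of the Hessian diagonal, clipped so that condition (9) holds:

import numpy as np

def qk_diagonal(hess_diag_estimate, nu_min=1e-2, nu_max=1e2):
    # Clip a Hessian-diagonal estimate into [nu_min, nu_max] so that (9) holds.
    return np.clip(hess_diag_estimate, nu_min, nu_max)

# With q = qk_diagonal(...), the steplength (10) reads
#     alpha = -delta * (g @ d) / (d @ (q * d)),
# and delta must now be chosen in (0, nu_min/mu).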

Acknowledgments

The authors are grateful to the referees for their comments, which improved the paper. The second author is much indebted to Professor C. Wang for his cultivation and to Professor L. Qi for his support.


References

[1] M. Al-Baali, Descent property and global convergence of the Fletcher–Reeves method with inexact line search, IMA J. Numer. Anal. 5 (1985) 121–124.
[2] Y.H. Dai, J.Y. Han, G.H. Liu, D.F. Sun, H.X. Yin and Y.X. Yuan, Convergence properties of nonlinear conjugate gradient methods, SIAM J. Optim. 10 (1999) 345–358.
[3] Y.H. Dai and Y. Yuan, Convergence properties of the conjugate descent method, Advances in Mathematics 25 (1996) 552–562.
[4] Y.H. Dai and Y. Yuan, Convergence properties of the Fletcher–Reeves method, IMA J. Numer. Anal. 16 (1996) 155–164.
[5] Y.H. Dai and Y. Yuan, A nonlinear conjugate gradient method with a strong global convergence property, SIAM J. Optim. 10 (1999) 177–182.
[6] R. Fletcher, Practical Methods of Optimization, Vol. 1: Unconstrained Optimization, 2nd edn. (Wiley, New York, 1987).
[7] R. Fletcher and C. Reeves, Function minimization by conjugate gradients, Comput. J. 7 (1964) 149–154.
[8] J.C. Gilbert and J. Nocedal, Global convergence properties of conjugate gradient methods for optimization, SIAM J. Optim. 2 (1992) 21–42.
[9] M.R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Stand. 49 (1952) 409–436.
[10] Y. Hu and C. Storey, Global convergence result for conjugate gradient methods, J. Optim. Theory Appl. 71 (1991) 399–405.
[11] G. Liu, J. Han and H. Yin, Global convergence of the Fletcher–Reeves algorithm with inexact line search, Appl. Math. J. Chinese Univ. Ser. B 10 (1995) 75–82.
[12] B.T. Polyak, The conjugate gradient method in extremal problems, Comput. Math. Math. Phys. 9 (1969) 94–112.
[13] E. Polak and G. Ribière, Note sur la convergence des méthodes de directions conjuguées, Rev. Fran. Informat. Rech. Opér. 16 (1969) 35–43.
[14] M.J.D. Powell, Nonconvex minimization calculations and the conjugate gradient method, in: Lecture Notes in Mathematics 1066 (1984) pp. 121–141.
[15] M.J.D. Powell, Convergence properties of algorithms for nonlinear optimization, SIAM Review 28 (1986) 487–500.
[16] D. Touati-Ahmed and C. Storey, Efficient hybrid conjugate gradient techniques, J. Optim. Theory Appl. 64 (1990) 379–397.
[17] P. Wolfe, Convergence conditions for ascent methods, SIAM Review 11 (1969) 226–235.
[18] G. Zoutendijk, Nonlinear programming, computational methods, in: Integer and Nonlinear Programming, ed. J. Abadie (North-Holland, 1970) pp. 37–86.