

Riemannian Proximal Gradient Methods

Wen Huang

Xiamen University

Dec. 18, 2019

This is joint work with Ke Wei at Fudan University.

Riemannian Proximal Gradient Methods 1

Problem Statement

Optimization on Manifolds with Structure:

    min_{x ∈ M} F(x) = f(x) + g(x),

where
- M is a Riemannian manifold;
- f is Lipschitz continuously differentiable and may be nonconvex; and
- g is continuous and convex, but may be nondifferentiable.

[Figure: a schematic of a cost function F defined on a manifold M and mapping into R.]

Applications: sparse PCA, sparse blind deconvolution, sparse low-rank image representation, etc. [JTU03, GHT15, SQ16, ZLK+17]

Riemannian Proximal Gradient Methods 2


Existing Nonsmooth Optimization on Manifolds

F : M → R is Lipschitz continuous

- Huang (2013): a gradient sampling method, without convergence analysis.
- Grohs and Hosseini (2015): two ε-subgradient-based optimization methods using a line search strategy and a trust region strategy, respectively; any limit point is a critical point.
- Hosseini and Uschmajew (2017): a gradient sampling method; any limit point is a critical point.
- Hosseini, Huang, and Yousefpour (2018): merge ε-subgradient-based and quasi-Newton ideas and show that any limit point is a critical point.

Riemannian Proximal Gradient Methods 3

Existing Nonsmooth Optimization on Manifolds

F : M → R is convex

- Zhang and Sra (2016): a subgradient-based method; the function value converges to the optimal value at rate O(1/√k).
- Ferreira and Oliveira (2002) and Bento, Ferreira and Melo (2017): proximal point methods; the function value converges to the optimal value at rate O(1/k) on Hadamard manifolds.
- Liu, Shang, Cheng, Cheng, and Jiao (2017): F is Lipschitz continuously differentiable; the function value converges to the optimal value at rate O(1/k²).

Riemannian Proximal Gradient Methods 4

Existing Nonsmooth Optimization on Manifolds

F = f + g, where f is Lipschitz continuously differentiable and g is nonsmooth

- Chen, Ma, So, and Zhang (2018): a proximal gradient method with global convergence.
- Huang and Wei (2019): a FISTA on manifolds with global convergence.
- Huang and Wei (2019): a Riemannian proximal gradient method and its variant with acceleration; convergence rate analyses are given.

Riemannian Proximal Gradient Methods 5

A Euclidean Proximal Gradient Method

Optimization with Structure: M = R^{n×m}

    min_{x ∈ R^{n×m}} F(x) = f(x) + g(x),    (1)

The proximal gradient method and its variants are excellent methods for solving (1).

A proximal gradient method¹:
    initial iterate: x_0;
    d_k = arg min_{p ∈ R^{n×m}} 〈∇f(x_k), p〉 + (L/2)‖p‖²_F + g(x_k + p),    (proximal mapping)
    x_{k+1} = x_k + d_k.    (update iterate)

- g = 0: reduces to the steepest descent method;
- L: greater than the Lipschitz constant of ∇f;
- the proximal mapping is easy to compute;
- any limit point is a critical point.

¹ The update rule can equivalently be written x_{k+1} = arg min_x 〈∇f(x_k), x − x_k〉 + (L/2)‖x − x_k‖² + g(x).

Riemannian Proximal Gradient Methods 6
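For concreteness, here is a minimal Euclidean sketch of the iteration above for the common case g(x) = λ‖x‖₁, in which the proximal mapping is entrywise soft-thresholding; the least-squares f, the data A and b, and the parameter values are illustrative, not from the talk.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal mapping of t * ||.||_1: entrywise shrinkage toward zero.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def proximal_gradient(grad_f, x0, L, lam, iters=500):
    # x_{k+1} = argmin_x <grad f(x_k), x - x_k> + (L/2)||x - x_k||^2 + lam*||x||_1
    x = x0.copy()
    for _ in range(iters):
        x = soft_threshold(x - grad_f(x) / L, lam / L)
    return x

# Illustrative smooth term f(x) = 0.5 * ||A x - b||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60))
b = rng.standard_normal(30)
L = np.linalg.norm(A, 2) ** 2                      # Lipschitz constant of grad f
x_hat = proximal_gradient(lambda x: A.T @ (A @ x - b), np.zeros(60), L, lam=0.1)
```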


Convergence Rates

Assumption: min_{x ∈ R^{n×m}} F(x) = f(x) + g(x), with convex f.

- O(1/k) sublinear convergence rate: F(x_k) − F(x_*) ≤ C/k, for a constant C;
- Optimal gradient method: O(1/k²) [Dar83, Nes83]

Here, we consider FISTA [BT09]:
    initial iterate: x_0; let y_0 = x_0, t_0 = 1;
    d_k = arg min_{p ∈ R^{n×m}} 〈∇f(y_k), p〉 + (L/2)‖p‖²_F + g(y_k + p),
    x_{k+1} = y_k + d_k,
    t_{k+1} = (1 + √(4t_k² + 1)) / 2,
    y_{k+1} = x_{k+1} + ((t_k − 1)/t_{k+1}) (x_{k+1} − x_k).

Riemannian Proximal Gradient Methods 7
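A minimal Euclidean sketch of the FISTA iteration above, again assuming g = λ‖·‖₁ so that the proximal mapping is soft-thresholding; grad_f, L, and lam are supplied by the caller.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fista(grad_f, x0, L, lam, iters=500):
    # Accelerated proximal gradient (FISTA) for f(x) + lam * ||x||_1.
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_new = soft_threshold(y - grad_f(y) / L, lam / L)   # proximal step at y_k
        t_new = (1.0 + np.sqrt(4.0 * t * t + 1.0)) / 2.0     # t_{k+1}
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)        # momentum step
        x, t = x_new, t_new
    return x
```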


Difficulties in the Riemannian Setting

Euclidean proximal mapping:

    d_k = arg min_{p ∈ R^{n×m}} 〈∇f(x_k), p〉 + (L/2)‖p‖²_F + g(x_k + p)

In the Riemannian setting:
- How should the proximal mapping be defined?
- Can it be solved cheaply?
- Does it share the same convergence rate?

Riemannian Proximal Gradient Methods 8

A Riemannian Proximal Gradient Method in [CMSZ18]

Euclidean proximal mapping:

    d_k = arg min_{p ∈ R^{n×m}} 〈∇f(x_k), p〉 + (L/2)‖p‖²_F + g(x_k + p)

ManPG [CMSZ18]¹:
    1. η_k = arg min_{η ∈ T_{x_k}M} 〈∇f(x_k), η〉 + (L/2)‖η‖²_F + g(x_k + η);
    2. x_{k+1} = R_{x_k}(α_k η_k) with an appropriate step size α_k;

- Only works for embedded submanifolds;
- The proximal mapping is defined in the tangent space;
- The subproblem is a convex program;
- It can be solved efficiently for the Stiefel manifold by a semismooth Newton algorithm [XLWZ18];
- Step size 1 does not necessarily yield a decrease;

[Figure: the retraction R_x maps a tangent vector η ∈ T_xM to the point R_x(η) on M.]

¹ [CMSZ18]: S. Chen, S. Ma, A. M.-C. So, and T. Zhang, Proximal gradient method for nonsmooth optimization over the Stiefel manifold. arXiv:1811.00980v2

Riemannian Proximal Gradient Methods 9
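As a concrete illustration of step 2 only (not of the tangent-space subproblem solver), here is a minimal sketch on the Stiefel manifold St(p, n) = {X ∈ R^{n×p} : X^T X = I_p}, using the QR retraction as one common choice of R and the Euclidean-metric tangent projection; the point, direction, and step size are illustrative.

```python
import numpy as np

def proj_tangent_stiefel(X, Z):
    # Project an ambient matrix Z onto T_X St(p, n) (Euclidean metric):
    # P_X(Z) = Z - X sym(X^T Z).
    XtZ = X.T @ Z
    return Z - X @ ((XtZ + XtZ.T) / 2.0)

def qr_retraction(X, eta):
    # R_X(eta) = qf(X + eta): Q factor of the thin QR factorization,
    # with signs chosen so that the diagonal of R is positive.
    Q, R = np.linalg.qr(X + eta)
    return Q * np.sign(np.diag(R))

# One step x_{k+1} = R_{x_k}(alpha_k * eta_k) with a made-up direction and step size.
rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((8, 3)))        # a point on St(3, 8)
eta = proj_tangent_stiefel(X, rng.standard_normal((8, 3)))
X_next = qr_retraction(X, 0.5 * eta)
```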


A Riemannian Proximal Gradient Method in [CMSZ18]

For ManPG [CMSZ18] (the two steps above):

- Convergence to a stationary point [HW19];
- No convergence rate analysis (a rate of O(1/k) is expected if f is convex);

Riemannian Proximal Gradient Methods 10


New Riemannian Proximal Gradient Methods

GOAL:

1. Numerical aspect: an accelerated Riemannian proximal gradient method with good numerical performance.

2. Theoretical aspect: an accelerated Riemannian proximal gradient method with convergence rate analysis and good numerical performance for some instances.

Riemannian Proximal Gradient Methods 11

Numerical aspect: A New Riemannian Proximal Gradient

GOAL: develop an accelerated Riemannian proximal gradient method with good numerical performance.

A Riemannian FISTA with a safeguard:
    initial iterate: x_0; let y_0 = x_0, t_0 = 1;
    1. Invoke a safeguard every N iterations;
    2. η_k = arg min_{η ∈ T_{y_k}M} 〈grad f(y_k), η〉 + (L/2)‖η‖²_F + g(y_k + η);
    3. x_{k+1} = R_{y_k}(η_k);
    4. t_{k+1} = (1 + √(4t_k² + 1)) / 2;
    5. Compute y_{k+1} = R_{x_{k+1}}( ((1 − t_k)/t_{k+1}) R^{-1}_{x_{k+1}}(x_k) );

Riemannian Proximal Gradient Methods 12


Numerical aspect: A New Riemannian Proximal Gradient (A Riemannian FISTA with a safeguard)

A Riemannian FISTA with a safeguard:
    initial iterate: x_0; let y_0 = x_0, t_0 = 1;
    1. Invoke a safeguard every N iterations;
    2. η_k = arg min_{η ∈ T_{y_k}M} 〈grad f(y_k), η〉 + (L/2)‖η‖²_F + g(y_k + η);
    3. x_{k+1} = R_{y_k}(η_k);
    4. t_{k+1} = (1 + √(4t_k² + 1)) / 2;
    5. Compute y_{k+1} = R_{x_{k+1}}( ((1 − t_k)/t_{k+1}) R^{-1}_{x_{k+1}}(x_k) );

The safeguard: run the (unaccelerated) proximal gradient method every N iterations; if the iterate produced by FISTA has a larger function value than the one produced by the proximal gradient step, the safeguard takes effect (a sketch of this logic follows this slide).

Compare with the Euclidean FISTA:
    initial iterate: x_0; let y_0 = x_0, t_0 = 1;
    d_k = arg min_{p ∈ R^{n×m}} 〈∇f(y_k), p〉 + (L/2)‖p‖²_F + g(y_k + p),
    x_{k+1} = y_k + d_k,
    t_{k+1} = (1 + √(4t_k² + 1)) / 2,
    y_{k+1} = x_{k+1} + ((t_k − 1)/t_{k+1}) (x_{k+1} − x_k).

Step 5 is a Riemannian generalization of the momentum step: with R_x(η) = x + η and R^{-1}_x(y) = y − x,

    y_{k+1} = x_{k+1} + ((1 − t_k)/t_{k+1}) (x_k − x_{k+1}),

where x_k − x_{k+1} is replaced by R^{-1}_{x_{k+1}}(x_k) and the whole expression is replaced by R_{x_{k+1}}( ((1 − t_k)/t_{k+1}) R^{-1}_{x_{k+1}}(x_k) ).

- Works well in practice;
- Converges globally;
- No convergence rate analysis;

Riemannian Proximal Gradient Methods 13
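A minimal sketch of the safeguard logic on a Euclidean stand-in (again with g = λ‖·‖₁); the exact safeguard rule of the talk (where the plain proximal gradient step starts and how the momentum is reset) is not fully specified on the slide, so this is one plausible instantiation rather than the authors' implementation.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fista_with_safeguard(F, grad_f, x0, L, lam, iters=500, N=5):
    # Every N iterations, compare the accelerated candidate against a plain
    # proximal gradient step from the current x; if it is worse in F, fall back
    # to the plain step and restart the momentum (the safeguard takes effect).
    x, y, t = x0.copy(), x0.copy(), 1.0
    for k in range(iters):
        x_acc = soft_threshold(y - grad_f(y) / L, lam / L)       # accelerated candidate
        if (k + 1) % N == 0:
            x_pg = soft_threshold(x - grad_f(x) / L, lam / L)    # safeguard candidate
            if F(x_acc) > F(x_pg):
                x_acc, t = x_pg, 1.0
        t_new = (1.0 + np.sqrt(4.0 * t * t + 1.0)) / 2.0
        y = x_acc + ((t - 1.0) / t_new) * (x_acc - x)
        x, t = x_acc, t_new
    return x

# Illustrative instance: f(x) = 0.5*||A x - b||^2, F(x) = f(x) + lam*||x||_1.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60)); b = rng.standard_normal(30); lam = 0.1
F = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum()
x_hat = fista_with_safeguard(F, lambda x: A.T @ (A @ x - b), np.zeros(60),
                             L=np.linalg.norm(A, 2) ** 2, lam=lam)
```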

Theoretical aspect: A New Riemannian Proximal Gradient

GOAL: develop an accelerated Riemannian proximal gradient method with convergence rate analysis and good numerical performance for some instances.

A New Riemannian Proximal Gradient Method:
    1. η_k = arg min_{η ∈ T_{x_k}M} 〈∇f(x_k), η〉_{x_k} + (L/2)‖η‖²_{x_k} + g(R_{x_k}(η)),
       where the inner product and norm are taken in the Riemannian metric and R_{x_k}(η) replaces x_k + η;
    2. x_{k+1} = R_{x_k}(η_k);

- A general framework for Riemannian optimization;
- The tangent space may be too rough an approximation of the manifold for the convergence analysis;
- The step size can be fixed to 1;

Riemannian Proximal Gradient Methods 14


Assumptions and Convergence Result

Assumption:

1. f is Lipschitz continuously differentiable in a Riemannian sense (L-retraction-smooth).

Definition
A function h : M → R is called L-retraction-smooth with respect to a retraction R in N ⊂ M if for any x ∈ N and any S_x ⊂ T_xM such that R_x(S_x) ⊂ N, we have that q_x = h ∘ R_x satisfies

    q_x(η) ≤ q_x(ξ) + 〈grad q_x(ξ), η − ξ〉_x + (L/2)‖η − ξ‖²_x    for all η, ξ ∈ S_x.

Theoretical result:
For any accumulation point x_* of {x_k}, x_* is a stationary point, i.e., 0 ∈ ∂F(x_*).

Riemannian Proximal Gradient Methods 15
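For intuition (this specialization is not on the slides): on M = R^{n×m} with the retraction R_x(η) = x + η, the pullback is q_x(η) = h(x + η), so the definition reduces to the classical descent-lemma inequality for an L-smooth function.

```latex
% Euclidean specialization: q_x(\eta) = h(x+\eta), \operatorname{grad} q_x(\xi) = \nabla h(x+\xi), so
h(x+\eta) \le h(x+\xi) + \langle \nabla h(x+\xi),\, \eta-\xi \rangle + \tfrac{L}{2}\,\|\eta-\xi\|^2
\quad \forall\, \eta,\xi ,
% i.e., with u = x+\xi and v = x+\eta,
h(v) \le h(u) + \langle \nabla h(u),\, v-u \rangle + \tfrac{L}{2}\,\|v-u\|^2 .
```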

Assumptions and Convergence Rate

Additional Assumptions:

- f is convex in a Riemannian sense (retraction-convex).

Definition
A function h : M → R is called retraction-convex with respect to a retraction R in N ⊆ M if for any x ∈ N and any S_x ⊆ T_xM such that R_x(S_x) ⊆ N, there exists a tangent vector ζ ∈ T_xM such that q_x = h ∘ R_x satisfies

    q_x(η) ≥ q_x(ξ) + 〈ζ, η − ξ〉_x    for all η, ξ ∈ S_x.    (2)

Note that ζ = grad q_x(ξ) if h is differentiable; otherwise, ζ is any subgradient of q_x at ξ.

- The retraction approximately satisfies the triangle relation:

    |‖ξ_x − η_x‖²_x − ‖ζ_y‖²_y| ≤ κ‖η_x‖²_x    for a constant κ,

  where η_x = R^{-1}_x(y), ξ_x = R^{-1}_x(z), ζ_y = R^{-1}_y(z).

Table: Exponential mapping on the Stiefel manifold with the Euclidean metric 〈η_x, ξ_x〉_x = trace(η_x^T ξ_x). Left = |‖ξ_x − η_x‖²_x − ‖ζ_y‖²_y|.

            (n,p) = (10,1)       (n,p) = (10,4)       (n,p) = (10,10)
  ‖η_x‖     Left       ‖η_x‖     Left       ‖η_x‖     Left
  5.00e-2   7.83e-5    5.00e-2   1.83e-5    5.00e-2   2.14e-6
  2.50e-2   1.80e-5    2.50e-2   4.27e-6    2.50e-2   4.72e-7
  1.25e-2   4.25e-6    1.25e-2   1.01e-6    1.25e-2   1.11e-7
  6.25e-3   1.03e-6    6.25e-3   2.46e-7    6.25e-3   2.68e-8
  3.12e-3   2.54e-7    3.12e-3   6.05e-8    3.13e-3   6.61e-9

Table: Exponential mapping on the Stiefel manifold with the canonical metric 〈η_x, ξ_x〉_x = trace(η_x^T (I − XX^T/2) ξ_x). Left = |‖ξ_x − η_x‖²_x − ‖ζ_y‖²_y|.

            (n,p) = (10,2)       (n,p) = (10,4)       (n,p) = (10,9)
  ‖η_x‖     Left       ‖η_x‖     Left       ‖η_x‖     Left
  5.00e-2   3.55e-5    5.00e-2   1.15e-5    5.00e-2   8.39e-6
  2.50e-2   8.06e-6    2.50e-2   2.58e-6    2.50e-2   1.89e-6
  1.25e-2   1.90e-6    1.25e-2   6.08e-7    1.25e-2   4.45e-7
  6.25e-3   4.61e-7    6.25e-3   1.47e-7    6.25e-3   1.08e-7
  3.13e-3   1.13e-7    3.13e-3   3.63e-8    3.12e-3   2.66e-8

Theoretical result:

Convergence rate O(1/k):

    F(x_k) − F(x_*) ≤ (1/k) ( (L/2)‖R^{-1}_{x_0}(x_*)‖²_{x_0} + (LκC/2)(F(x_0) − F(x_*)) ).

Riemannian Proximal Gradient Methods 16
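A small numerical check in the spirit of the tables above, but on the unit sphere rather than the Stiefel manifold (so the exponential map and its inverse have simple closed forms); the dimension and directions are illustrative. Halving ‖η_x‖ should roughly quarter the left-hand side, consistent with the κ‖η_x‖²_x bound.

```python
import numpy as np

def sphere_exp(x, eta):
    # Exponential map on the unit sphere: Exp_x(eta) = cos(||eta||) x + sin(||eta||) eta/||eta||.
    nrm = np.linalg.norm(eta)
    return x.copy() if nrm < 1e-16 else np.cos(nrm) * x + np.sin(nrm) * (eta / nrm)

def sphere_log(x, y):
    # Inverse exponential map: the tangent vector eta at x with Exp_x(eta) = y.
    theta = np.arccos(np.clip(x @ y, -1.0, 1.0))
    v = y - (x @ y) * x
    nv = np.linalg.norm(v)
    return np.zeros_like(x) if nv < 1e-16 else (theta / nv) * v

rng = np.random.default_rng(0)
x = rng.standard_normal(10); x /= np.linalg.norm(x)
proj = lambda v: v - (x @ v) * x                     # projection onto the tangent space at x
a, b = proj(rng.standard_normal(10)), proj(rng.standard_normal(10))
xi = 0.3 * b / np.linalg.norm(b)                     # fixed xi_x = R_x^{-1}(z)
z = sphere_exp(x, xi)
for t in [5e-2, 2.5e-2, 1.25e-2, 6.25e-3, 3.125e-3]:
    eta = t * a / np.linalg.norm(a)                  # eta_x = R_x^{-1}(y) with ||eta_x|| = t
    zeta = sphere_log(sphere_exp(x, eta), z)         # zeta_y = R_y^{-1}(z)
    left = abs(np.linalg.norm(xi - eta) ** 2 - np.linalg.norm(zeta) ** 2)
    print(f"||eta|| = {t:.2e}   Left = {left:.3e}")
```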

A Riemannian FISTA

A Riemannian FISTA:
    initial iterate: x_0; let y_0 = x_0, t_0 = 1;
    1. η_k = arg min_{η ∈ T_{y_k}M} 〈grad f(y_k), η〉_{y_k} + (L/2)‖η‖²_{y_k} + g(R_{y_k}(η));
    2. x_{k+1} = R_{y_k}(η_k);
    3. t_{k+1} = (1 + √(4t_k² + 1)) / 2;
    4. Compute y_{k+1} = R_{y_k}( ((t_{k+1} + t_k − 1)/t_{k+1}) η_{y_k} − ((t_k − 1)/t_{k+1}) R^{-1}_{y_k}(x_k) );

Step 4 is a Riemannian generalization of the FISTA momentum step (compare with the Euclidean FISTA on Page 7): in the Euclidean case,

    y_{k+1} = y_k + ((t_{k+1} + t_k − 1)/t_{k+1}) (x_{k+1} − y_k) − ((t_k − 1)/t_{k+1}) (x_k − y_k)
            = x_{k+1} + ((t_k − 1)/t_{k+1}) (x_{k+1} − x_k),

and on a manifold x_{k+1} − y_k is replaced by η_{y_k} and x_k − y_k by R^{-1}_{y_k}(x_k).

Riemannian Proximal Gradient Methods 17

Assumptions and Convergence Rate

Additional Assumptions:

- There exists a constant κ such that

    | ‖(t_{k+1} − 1)(R^{-1}_{y_k}(x_{k+1}) − R^{-1}_{y_k}(y_{k+1})) + R^{-1}_{y_k}(x_*) − R^{-1}_{y_k}(y_{k+1})‖²_{y_k}
      − ‖(t_{k+1} − 1) R^{-1}_{y_{k+1}}(x_{k+1}) + R^{-1}_{y_{k+1}}(x_*)‖²_{y_{k+1}} | ≤ κ ‖R^{-1}_{y_k}(y_{k+1})‖²_{y_k}.

- φ(k) := Σ_{i=0}^{k} ‖R^{-1}_{y_i}(y_{i+1})‖²_{y_i} increases on the order of O((k + 1)^θ) for some θ ∈ [0, 1], i.e., φ(k)/(k + 1)^θ < C_φ for all k.

[Figure: numerical tests of the two assumptions over about 60 iterations; the left panel ("Test for Assumption 4.1") shows the assumption-4.1 quantity on a log scale between 10^{-15} and 10^{-5}, and the right panel ("Test for Assumption 4.2") shows φ(k) growing to roughly 0.015.]

Theoretical result:

Convergence rate O(1/k²) if θ = 0:

    F(x_k) − F(x_*) ≤ (2L/k²)‖R^{-1}_{x_0}(x_*)‖²_{x_0} + (2LκC_φ/k^{2−θ})(F(x_0) − F(x_*)).

Riemannian Proximal Gradient Methods 18

The Proposed Algorithm

A Riemannian FISTA with a safeguard:
    initial iterate: x_0; let y_0 = x_0, t_0 = 1;
    1. Invoke a safeguard every N iterations;
    2. η_k = arg min_{η ∈ T_{y_k}M} 〈grad f(y_k), η〉_{y_k} + (L/2)‖η‖²_{y_k} + g(R_{y_k}(η));
    3. x_{k+1} = R_{y_k}(η_k);
    4. t_{k+1} = (1 + √(4t_k² + 1)) / 2;
    5. Compute y_{k+1} = R_{y_k}( ((t_{k+1} + t_k − 1)/t_{k+1}) η_{y_k} − ((t_k − 1)/t_{k+1}) R^{-1}_{y_k}(x_k) );

- Converges globally;
- Convergence rate 1/k^{2−θ} if the previous assumptions hold and the safeguard takes effect for only finitely many iterations;

Riemannian Proximal Gradient Methods 19

Riemannian Subproblem

    η_u = arg min_{η ∈ T_uM} ℓ_u(η) := 〈∇f(u), η〉_u + (L/2)‖η‖²_u + g(R_u(η))

In some cases, the subproblem can be solved by exploiting the structure of the manifold.

Solving the Riemannian proximal mapping:
    initial iterate: η_0 ∈ T_uM, σ ∈ (0, 1), k = 0;
    1. v_k = R_u(η_k);
    2. Compute ξ*_k = arg min_{ξ ∈ T_{v_k}M} 〈T^{-♯}_{R_{η_k}}(grad f(u) + Lη_k), ξ〉_u + (L/4)‖ξ‖²_F + g(v_k + ξ);
    3. Find α > 0 such that ℓ_u(η_k + α T^{-1}_{R_{η_k}} ξ*_k) < ℓ_u(η_k) − σα‖ξ*_k‖²_u;
    4. η_{k+1} = η_k + α T^{-1}_{R_{η_k}} ξ*_k, k ← k + 1, and go to Step 1;

(Here T^{-♯}_R denotes the adjoint of the inverse differentiated retraction; see the Lemma on Page 27.)

- The above algorithm is used if the ambient space is R^n;
- It is an application of [CMSZ18] if R^{-1}_u(η) exists.

Riemannian Proximal Gradient Methods 20

Numerical Experiments

Sparse PCA problem [GHT15]:

    min_{X ∈ OB(p,n)} ‖X^T A^T A X − D²‖²_F + λ‖X‖₁,

where A ∈ R^{m×n}, D is the diagonal matrix of the dominant singular values of A, and OB(p, n) = {X ∈ R^{n×p} | diag(X^T X) = I_p}, p ≤ m.

Riemannian Proximal Gradient Methods 21
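A minimal sketch that builds this objective and evaluates it at a random feasible point on OB(p, n) (unit-norm columns); the problem sizes and λ below are illustrative.

```python
import numpy as np

def spca_objective(X, A, D, lam):
    # F(X) = ||X^T A^T A X - D^2||_F^2 + lam * ||X||_1, with D the vector of
    # dominant singular values of A (D^2 used as a diagonal matrix).
    M = X.T @ (A.T @ (A @ X)) - np.diag(D ** 2)
    return np.linalg.norm(M, "fro") ** 2 + lam * np.abs(X).sum()

rng = np.random.default_rng(0)
m, n, p, lam = 20, 128, 4, 3.0
A = rng.standard_normal((m, n))
D = np.linalg.svd(A, compute_uv=False)[:p]           # p dominant singular values
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0, keepdims=True)        # unit columns: X lies on OB(p, n)
print(spca_objective(X, A, D, lam))
```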

Numerical Experiments

Solve the proximal mapping:

    η_k = arg min_{η ∈ T_{x_k}M} 〈∇f(x_k), η〉_{x_k} + (L/2)‖η‖²_{x_k} + g(R_{x_k}(η));

- Exponential mapping (applied to each column): R_x(η_x) = x cos(‖η_x‖) + (η_x/‖η_x‖) sin(‖η_x‖); a sketch is given below.
- Exploit the fact that the following problem has a closed-form solution:

      min_{x ∈ OB(p,n)} ‖x − y‖²_F + (1/(2λ))‖x‖₁    for any y ∈ R^{n×p};

- A conditional gradient (Frank-Wolfe) method is used;
- Numerically, approximately 2 iterations are enough for high accuracy;

Riemannian Proximal Gradient Methods 22
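A minimal sketch of the column-wise exponential map quoted above, together with the tangent-space projection for each column; the sizes and the tangent direction are illustrative.

```python
import numpy as np

def oblique_exp(X, eta):
    # Column-wise sphere exponential map:
    # R_x(eta_x) = x cos(||eta_x||) + (eta_x/||eta_x||) sin(||eta_x||) per column.
    out = np.empty_like(X)
    for j in range(X.shape[1]):
        x, e = X[:, j], eta[:, j]
        nrm = np.linalg.norm(e)
        out[:, j] = x if nrm < 1e-16 else np.cos(nrm) * x + np.sin(nrm) * (e / nrm)
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((128, 4)); X /= np.linalg.norm(X, axis=0)
eta = 0.1 * rng.standard_normal((128, 4))
eta -= X * (X * eta).sum(axis=0)       # project each column onto the sphere's tangent space
Y = oblique_exp(X, eta)                # columns of Y again have unit norm
```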


Numerical Experiments

Table: An average result of 10 random tests; n = 128, m = 20, r = 4, and δ = (L‖x_{k+1} − x_k‖)². Values such as 5.11e-6 denote 5.11 × 10^{-6}.

  λ   Algorithm    iter    time   f        δ        spar.  navar
  3   ManPG        11791   1.40   8.33e1   5.11e-6  0.54   0.86
      RPG          11679   0.94   8.33e1   5.11e-6  0.54   0.86
      ManPG-Ada     1398   0.30   8.33e1   1.67e-3  0.54   0.86
      A-ManPG        273   0.09   8.33e1   9.19e-4  0.54   0.86
      A-RPG          263   0.06   8.33e1   1.12e-3  0.54   0.86

The compared algorithms:
- ManPG: the method in [CMSZ18];
- ManPG-Ada: ManPG with the parameter L updated adaptively, i.e.,
    1. η_k = arg min_{η ∈ T_{x_k}M} 〈∇f(x_k), η〉 + (L/2)‖η‖²_F + g(x_k + η);
    2. x_{k+1} = R_{x_k}(α_k η_k) with an appropriate step size α_k;
    3. Update L;
- RPG: the new Riemannian proximal gradient method without acceleration;
- A-ManPG: ManPG accelerated by a similar technique;
- A-RPG: the new Riemannian proximal gradient method with acceleration.

Stopping criteria and reported quantities:
- ManPG and RPG stop when δ < 10^{-8} n r;
- A-ManPG and A-RPG stop when F is smaller than the minimum attained by ManPG and RPG;
- spar.: sparsity of the solution;
- navar: the adjusted variance normalized by the variance from standard PCA.

Observations:
- Proximal gradient without acceleration is slower than proximal gradient with acceleration;
- RPG is slightly faster than ManPG in terms of computational time;
- ManPG and RPG behave similarly, and A-ManPG and A-RPG behave similarly, in terms of the number of iterations, function values, sparsity, and adjusted variance.

Riemannian Proximal Gradient Methods 23

Numerical Experiments

[Figure: two typical runs of ManPG, RPG, ManPG-Ada, A-ManPG, and A-RPG for the sparse PCA problem, plotting F(X) (roughly between 190 and 210) against the iteration count (up to about 400 and 1000 iterations in the two panels); n = 1024, p = 4, λ = 2, m = 20.]

Riemannian Proximal Gradient Methods 24

Numerical Experiments

Sparse PCA problem (another model) [CMSZ18, HW19]:

    min_{X ∈ St(p,n)} −trace(X^T A^T A X) + λ‖X‖₁,

where A ∈ R^{m×n} is a data matrix.

Riemannian Proximal Gradient Methods 25

Numerical Experiments

Solve the proximal mapping:

    η_k = arg min_{η ∈ T_{x_k}M} 〈∇f(x_k), η〉_{x_k} + (L/2)‖η‖²_{x_k} + g(R_{x_k}(η));

Exponential mapping on the Stiefel manifold:

    Exp_X(η_X) = [X  Q] exp( [[Ω, −R^T], [R, 0]] ) [I_p; 0],

where Ω = X^T η_X, and Q and R are obtained from the compact QR factorization of (I − XX^T)η_X. A sketch of this map is given below.

Ingredients for the algorithm on Page 17:
- R^{-1} by iterative methods [Zim17];
- T^{-♯}_R by iterative methods.

Riemannian Proximal Gradient Methods 26
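A minimal sketch of the exponential map formula quoted above; the point X and the tangent vector are illustrative.

```python
import numpy as np
from scipy.linalg import expm

def stiefel_exp(X, eta):
    # Exp_X(eta) = [X Q] expm([[Omega, -R^T], [R, 0]]) [I_p; 0],
    # with Omega = X^T eta and Q R = (I - X X^T) eta a compact QR factorization.
    n, p = X.shape
    Omega = X.T @ eta
    Q, R = np.linalg.qr(eta - X @ (X.T @ eta))
    block = np.block([[Omega, -R.T], [R, np.zeros((p, p))]])
    return np.hstack([X, Q]) @ expm(block)[:, :p]

rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((10, 3)))       # a point on St(3, 10)
Z = rng.standard_normal((10, 3))
XtZ = X.T @ Z
eta = Z - X @ ((XtZ + XtZ.T) / 2.0)                     # project Z onto T_X St(p, n)
Y = stiefel_exp(X, 0.1 * eta)                           # Y stays on the manifold: Y^T Y ≈ I_p
```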


Numerical Experiments

Lemma
The adjoint operator of the inverse differentiated retraction is

    T^{-♯}_{η_X} ξ_X = [X Q_1] exp( [[Ω_{η_X}, −R_1^T], [R_1, 0_{p×p}]] ) [X Q_1]^T ω_X + (I − [X Q_1][X Q_1]^T) ω_X,

where ω_X = X Ω_{ζ_Y} + Q R_2, Y = Exp_X(η_X), Q_1R_1 = (I − XX^T)η_X and Q_2R_2 = (I − [X Q_1][X Q_1]^T)ξ_X are QR decompositions, Q = [Q_1 Q_2],

    M_1 = [[Ω_{η_X}, −R_1^T, 0_{p×p}], [R_1, 0_{p×p}, 0_{p×p}], [0_{p×p}, 0_{p×p}, 0_{p×p}]],

Z Λ Z^H = M_1, and Ω_{ζ_Y} and R_2 are the solutions of

    Z^H [X Q]^T ξ_X = ( ( Z^H Exp_X(M_1) [[Ω_{ζ_Y}, −R_2^T], [R_2, 0_{2p×2p}]] Z ) ∘ Φ ) Z^H [I_p; 0_{2p×p}].

Riemannian Proximal Gradient Methods 27

Numerical Experiments

Table: An average result of 10 random tests; n = 1024, m = 20, r = 4, and δ = (L‖x_{k+1} − x_k‖)². Values such as 4.76e-5 denote 4.76 × 10^{-5}.

  λ   Algorithm    iter   time   f       δ        spar.  navar
  3   ManPG        1572   0.92   -7.28   4.76e-5  0.64   0.74
      RPG          1464   5.46   -7.28   4.06e-5  0.64   0.74
      ManPG-Ada     376   0.22   -7.28   3.99e-4  0.64   0.74
      A-ManPG       110   0.20   -7.28   1.06e-3  0.64   0.74
      A-RPG          88   1.61   -7.28   2.05e-4  0.64   0.74

- Same notation, same stopping criterion, and same parameter setting as before;
- The new approaches take more time due to the extra cost of computing R^{-1} and T^{-♯};
- The new approaches take fewer iterations.

Riemannian Proximal Gradient Methods 28


Page 76: Riemannian Proximal Gradient Methodswhuang2/pdf/SUSTech_2019-12-18.pdf2019/12/18  · Problem Statement Existing Results Numerical Experiments Summary Problem Statement Optimization

Problem StatementExisting Results

Numerical ExperimentsSummary

Numerical Experiments

[Figure omitted: two panels plotting the objective value F(X) against iteration (0 to 200) for ManPG, RPG, ManPG-Ada, AManPG, and ARPG.]

Figure: Two typical runs of ManPG, RPG, A-ManPG, and A-RPG for the sparse PCA problem. n = 1024, p = 4, λ = 2, m = 20.


Acceleration for SPCA on the Stiefel manifold

Scaled proximal mapping:

    η_k = argmin_{η ∈ T_{x_k}M} ⟨∇f(x_k), η⟩ + (L/2) ‖η‖²_F + g(x_k + η)

    ⟹   η_k = argmin_{η ∈ T_{x_k}M} ⟨∇f(x_k), η⟩ + (L/2) ‖η‖²_W + g(x_k + η),

where ‖η‖²_W = vec(η)^T W vec(η) and W is symmetric positive definite.

Difficult to solve in general;

Diagonal matrix W inspired by the Riemannian Hessian of the smooth term (see the sketch just below).
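To make the difference between the two subproblems concrete, here is a minimal NumPy sketch that evaluates the (scaled) proximal objective for a given tangent vector; the function name, arguments, and the diagonal-W convention are assumptions for illustration, not the solver used on these slides.

```python
import numpy as np

def prox_objective(eta, grad_f, L, g, x, W_diag=None):
    """Value of <grad f(x), eta> + (L/2)||eta||^2 + g(x + eta).
    With W_diag = None the Frobenius norm is used; otherwise the weighted
    norm ||eta||_W^2 = vec(eta)^T W vec(eta) with W = diag(W_diag)."""
    inner = np.sum(grad_f * eta)                  # Euclidean inner product <grad f, eta>
    if W_diag is None:
        quad = 0.5 * L * np.sum(eta * eta)        # (L/2) ||eta||_F^2
    else:
        v = eta.flatten(order='F')                # vec(eta), column-stacked
        quad = 0.5 * L * np.sum(W_diag * v * v)   # (L/2) ||eta||_W^2
    return inner + quad + g(x + eta)
```

Minimizing this objective over the tangent space T_{x_k}M (e.g. by the semismooth Newton approach used in ManPG) is the hard part; the sketch only shows how the weighted norm enters.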


Acceleration for SPCA on the Stiefel manifold

The diagonal weight W

Riemannian Hessian of f : St(p, n) → R : X ↦ −trace(X^T A^T A X):

    Hess f(X)[η_X] = P_{T_X St(p,n)}(−2 A^T A η_X + 2 η_X (X^T A^T A X)),   ∀ η_X ∈ T_X St(p, n)

An np-by-np matrix representation of Hess f(X):

    ⟨η_X, Hess f(X)[η_X]⟩ = ⟨η_X, −2 A^T A η_X + 2 η_X (X^T A^T A X)⟩ = ⟨vec(η_X), J vec(η_X)⟩,

where J = −2 I_p ⊗ (A^T A) + 2 (X^T A^T A X) ⊗ I_n.

The diagonal matrix W = max(diag(J), τ I_np).
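Since only the diagonal of J is needed, it can be formed without building the np-by-np Kronecker products. A minimal NumPy sketch under that observation (the function name and the τ default are illustrative choices, not from the slides):

```python
import numpy as np

def diagonal_weight(A, X, tau=1e-3):
    """Diagonal of J = -2 I_p ⊗ (A^T A) + 2 (X^T A^T A X) ⊗ I_n,
    floored entrywise at tau, i.e. W = max(diag(J), tau I_np).
    Ordering follows the column-stacked vec(eta)."""
    n, p = X.shape
    diag_AtA = np.einsum('ij,ij->j', A, A)              # diag(A^T A), length n
    S = X.T @ (A.T @ (A @ X))                            # X^T A^T A X, p-by-p
    diag_J = (-2.0 * np.kron(np.ones(p), diag_AtA)       # diag(I_p ⊗ A^T A)
              + 2.0 * np.kron(np.diag(S), np.ones(n)))   # diag((X^T A^T A X) ⊗ I_n)
    return np.maximum(diag_J, tau)
```

The returned vector plays the role of diag(W) in the weighted subproblem above.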


Acceleration for SPCA on the Stiefel manifold: numerical experiments

Table: Average results over 20 random runs for the random data: r = 4, n = 3000 and m = 40. Entries such as 1.09e-3 denote 1.09 × 10^{-3}.

    λ     Algo          iter   time   f         ‖η_{z_k}‖   sparsity   variance
    2.5   ManPG-D       1538   1.67   −1.48e1   1.09e-3     0.65       0.72
    2.5   ManPG         2155   2.20   −1.48e1   1.09e-3     0.65       0.72
    2.5   ManPG-Ada-D    469   0.60   −1.48e1   1.03e-3     0.65       0.72
    2.5   ManPG-Ada      508   0.60   −1.48e1   1.04e-3     0.65       0.72
    2.5   AManPG-D       201   0.39   −1.48e1   1.02e-3     0.65       0.72
    2.5   AManPG         237   0.43   −1.49e1   1.05e-3     0.65       0.72


Acceleration for SPCA on the Stiefel manifold: numerical experiments

Table: Results for the DNA methylation data: r = 4, n = 24589 and m = 113. Entries such as 3.11e-3 denote 3.11 × 10^{-3}.

    λ     Algo          iter   time    f         ‖η_{z_k}‖   sparsity   variance
    6.0   ManPG-D        706    7.37   −7.74e3   3.11e-3     0.29       0.96
    6.0   ManPG         2206   20.10   −7.74e3   3.14e-3     0.29       0.96
    6.0   ManPG-Ada-D    369    4.58   −7.74e3   3.03e-3     0.29       0.96
    6.0   ManPG-Ada      957   10.18   −7.74e3   3.11e-3     0.29       0.96
    6.0   AManPG-D        93    2.33   −7.74e3   2.91e-3     0.29       0.96
    6.0   AManPG         183    3.46   −7.74e3   2.96e-3     0.29       0.96


Summary

Propose the first Riemannian proximal gradient methods with convergence rate analyses;

Propose Riemannian proximal gradient methods with acceleration;

Apply the methods to sparse PCA problems on the oblique manifold and the Stiefel manifold;

Compare the new proximal gradient methods with the existing proximal gradient method.


References I

A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, January 2009. doi:10.1137/080716542.

Shixiang Chen, Shiqian Ma, Anthony Man-Cho So, and Tong Zhang. Proximal gradient method for nonsmooth optimization over the Stiefel manifold. arXiv:1811.00980, 2018.

John Darzentas. Problem Complexity and Method Efficiency in Optimization. 1983.

Matthieu Genicot, Wen Huang, and Nickolay T. Trendafilov. Weakly correlated sparse components with nearly orthonormal loadings. In Geometric Science of Information, pages 484–490, 2015.

W. Huang and K. Wei. Extending FISTA to Riemannian optimization for sparse PCA. arXiv:1909.05485, 2019.

Ian T. Jolliffe, Nickolay T. Trendafilov, and Mudassir Uddin. A modified principal component technique based on the Lasso. Journal of Computational and Graphical Statistics, 12(3):531–547, 2003.

Y. E. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2). Dokl. Akad. Nauk SSSR (in Russian), 269:543–547, 1983.


References II

J. Shi and C. Qi. Low-rank sparse representation for single image super-resolution via self-similarity learning. In 2016 IEEE International Conference on Image Processing (ICIP), pages 1424–1428, Sep. 2016.

Xiantao Xiao, Yongfeng Li, Zaiwen Wen, and Liwei Zhang. A regularized semi-smooth Newton method with projection steps for composite convex programs. Journal of Scientific Computing, 76(1):364–389, Jul 2018.

R. Zimmermann. A matrix-algebraic algorithm for the Riemannian logarithm on the Stiefel manifold under the canonical metric. SIAM Journal on Matrix Analysis and Applications, 38(2):322–342, 2017.

Y. Zhang, Y. Lau, H.-W. Kuo, S. Cheung, A. Pasupathy, and J. Wright. On the global geometry of sphere-constrained sparse blind deconvolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
