

Provable Advantage in Quantum Phase Learning via Quantum Kernel Alphatron

Yusen Wu,1,* Bujiao Wu,2,* Jingbo Wang,1,† and Xiao Yuan2,‡

1 Department of Physics, The University of Western Australia, Perth, WA 6009, Australia
2 Center on Frontiers of Computing Studies, Peking University, Beijing 100871, China

(Dated: November 19, 2021)

Can we use a quantum computer to speed up classical machine learning in solving problems of practical significance? Here, we study this open question focusing on the quantum phase learning problem, an important task in many-body quantum physics. We prove that, under widely believed complexity theory assumptions, the quantum phase learning problem cannot be efficiently solved by machine learning algorithms using classical resources and classical data. Whereas using quantum data, we theoretically prove the universality of the quantum kernel Alphatron in efficiently predicting quantum phases, indicating quantum advantages in this learning problem. We numerically benchmark the algorithm on a variety of problems, including recognizing symmetry-protected topological phases and symmetry-broken phases. Our results highlight the capability of quantum machine learning in the efficient prediction of quantum phases.

I. INTRODUCTION

The complex nature of multipartite entanglement has stimulated different powerful classical techniques, including quantum Monte Carlo [1–4], density functional theory [5], and the density matrix renormalization group [6, 7], for studying many-body quantum systems. Among them, classical machine learning techniques have recently been considered as a means of either representing the state or learning the quantum behaviour. From both theoretical and numerical perspectives, many works have shown that neural network quantum state ansätze have stronger representation power than conventional tensor networks and may solve complex static and dynamical quantum problems [8–14]. Yet, recent quantum supremacy experiments [15–17] have indicated that the sole application of these classical learning algorithms might still be challenging for an arbitrary quantum system that possesses genuine and intricate entanglement. Therefore, two critical questions arise: (1) where exactly is the limitation of classical machine learning in quantum problems? and (2) do more advanced solutions exist?

Here we study these two questions. For a many-body quantum system described by a parameterized Hamiltonian H(a), we focus on a general quantum phase learning (QPL) problem [18–20], which aims at detecting quantum phases determined by certain order parameters on an eigenstate of H(a). While QPL plays a key role in investigating numerous behaviours in condensed-matter physics [21], it is an inherently hard problem, since the order parameter is generally unknown and estimating it is classically hard. Indeed, under two reasonable assumptions — (1) the polynomial hierarchy in computational complexity theory does not collapse, and (2) the classical hardness of random circuit sampling

* These two authors contributed equally
† [email protected]
‡ [email protected]

holds, we rigorously prove that the QPL problem is hard for any classical machine learning method if the training data set is generated by a classical Turing machine. We therefore answer the first question by showing the exact limitation of classical machine learning in QPL.

For the second question, we consider solving QPL using quantum computers. Among the different applications of quantum computing [22–32], a variety of quantum machine learning algorithms [20, 33–42] have been developed for different problems and have demonstrated the potential for solving classically intractable problems, using parameterized circuits and classical optimization on noisy intermediate-scale quantum devices. Using a quantum computer, we propose the quantum kernel Alphatron algorithm to efficiently solve the QPL problem. We show convincing numerical results in detecting symmetry-protected topological phases of the Haldane chain and symmetry-broken phases of the XXZ model.

Recently, Rebentrost, Santha and Yang [43] proposed a quantum Alphatron for training multinomial kernel functions on fault-tolerant quantum devices, whereas our method focuses on the quantum kernel and is designed for near-term quantum devices.

This paper is organized as follows. In Sec. II we review supervised learning and quantum feature spaces, and define the quantum phase learning problem. We give the hardness results for classical learning algorithms in Sec. III. Sec. IV gives a quantum learning algorithm for the quantum phase recognition problem, and Sec. V presents the corresponding numerical results. We discuss the complexity class of the QPL problem in Sec. VI, and conclude with a discussion in Sec. VII.

II. PRELIMINARIES

In this section, we review the definitions of supervised learning and kernel methods, and introduce the quantum phase recognition problem.

arXiv:2111.07553v2 [quant-ph] 18 Nov 2021


A. Supervised learning with quantum feature space

Here, we denote by (a, b) (or (x, y)) a pair of a datum a (x) and the corresponding label b (y) in the training set S (testing set T). Generally, the task of supervised learning is to learn the label y of a testing datum x ∈ T ⊂ X drawn from a distribution D(x) defined on the space X, according to some decision rule h. The decision rule h is assigned by a selected machine learning model from the training set S = {(a_i, b_i)}_{i=1}^N, where a_i ∈ X follows the distribution D(a_i), the label b_i = h(a_i), and N is the size of the training set. Given the training set S, an efficient learner needs to generate a classifier h in poly(N) time, with the goal of achieving low error, or risk,

R(h) = Pr_{x∼D}[h(x) ≠ y].  (1)

Here, we assume that the datum x is sampled randomly according to D(x) in both the training and testing procedures, and that the size N of the training set is polynomial in the data dimension.

The kernel method has played a crucial role in the development of supervised learning [44–46]; it provides an approach to increase the expressivity and trainability of the original training set. A kernel function K: X × X → ℝ can be written as K(x, x′) = Ψ(x)ᵀΨ(x′), where Ψ: X → H is the feature map, which maps a datum x ∈ X to a higher-dimensional (feature) space H. Numerous classical kernel methods [45, 46] have been proposed to learn non-linear functions or decision boundaries. With the rapid development of quantum computers, there is growing interest in exploring whether the quantum kernel method can surpass classical kernels [35, 36, 39, 47–61]. Here we leverage the quantum kernel as our kernel function, defined as Q(x, x′) = |⟨φ(x)|φ(x′)⟩|², where |φ(x)⟩ is a quantum state associated with x.
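When the feature states |φ(x)⟩ are available as explicit statevectors in a classical simulation, the quantum kernel above reduces to a squared inner product. The following sketch assumes a hypothetical single-qubit feature map (a stand-in for the circuit-defined embedding, not the paper's encoding):

```python
import numpy as np

def quantum_kernel(phi_x, phi_xp):
    """Fidelity ("quantum") kernel Q(x, x') = |<phi(x)|phi(x')>|^2,
    evaluated from explicit statevectors. On hardware this overlap
    would instead be estimated from measurement outcomes."""
    return np.abs(np.vdot(phi_x, phi_xp)) ** 2

# Toy feature map (illustrative assumption): embed a scalar x as
# the single-qubit state cos(x)|0> + sin(x)|1>.
def phi(x):
    return np.array([np.cos(x), np.sin(x)])

Q_same = quantum_kernel(phi(0.3), phi(0.3))   # identical states give kernel 1
Q_diff = quantum_kernel(phi(0.1), phi(1.2))   # overlap of distinct states
```

Note that the kernel is symmetric and bounded in [0, 1] for normalized states, which is what makes it a valid (PSD) kernel for the learning algorithms below.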

B. Quantum Phase Learning (QPL) problem

In this section, we introduce the quantum phase learning (QPL) problem, a key prerequisite for investigating a large number of behaviours in many condensed-matter systems [18, 19].

Given an n-qubit Hamiltonian H(a) with interaction parameters a and an order parameter M ∈ ℂ^{2^n × 2^n}, the goal of quantum phase computation is to approximate the phase value b = ⟨φ(a)|M|φ(a)⟩ to additive error ε = 1/poly(n), where |φ(a)⟩ is the ground state of H(a). For example, for the Ising Hamiltonian, the parameter and the order parameter could be the strength of the transverse magnetic field and the spin correlation, respectively, and the quantum phases include the paramagnetic, ferromagnetic, and antiferromagnetic phases. In general, it is hard to recognize quantum phases of an arbitrary many-body quantum system, owing to the hardness of obtaining the ground state and the fact that the order parameter is generally unknown. Nevertheless, there may also exist cases where the problem is exactly and efficiently solvable for very specific choices of parameters. A natural question is then whether, based on the solvable or known phases, we could learn or predict quantum phases for other cases. It is therefore natural to consider the learning version of the quantum phase recognition problem.

Definition 1 (QPL Problem). Given training data S = {(a_i, b_i)}_{i=1}^N, where a_i and b_i indicate the classical coupling weight and the phase value observed from the i-th experiment associated with the Hamiltonian H(a_i), the target is to learn a prediction model h(a) that minimizes the risk

R(h) = Σ_{a ∈ X} D(a) (h(a) − b)²,  (2)

for some fixed distribution D(a) defined on the datum space X.
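The risk in Eq. (2) is a D(a)-weighted mean-squared error between the predicted and true phase values. A minimal sketch, with hypothetical coupling/phase pairs and a uniform distribution:

```python
def qpl_risk(h, data, weights):
    """Risk of Eq. (2): sum over the datum space of D(a) * (h(a) - b)^2,
    where `data` is a list of (a, b) pairs and `weights` the D(a) values."""
    return sum(w * (h(a) - b) ** 2 for (a, b), w in zip(data, weights))

# Hypothetical example: three coupling values with uniform D(a) = 1/3.
data = [(0.1, 0.0), (0.5, 0.5), (0.9, 1.0)]
risk = qpl_risk(lambda a: a, data, [1 / 3, 1 / 3, 1 / 3])
```

A perfect predictor attains zero risk; the identity predictor above pays (0.1)² on the first datum and (−0.1)² on the last.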

In the following, we consider solving the QPL problemusing classical and quantum computing methods.

III. CLASSICAL HARDNESS FOR THE QPL PROBLEM

In this section, we show the hardness of the quantum phase computation and QPL problems for classical computers.

Here we assume that the order parameter M is a general n-fold tensor product of local Pauli operators. The quantum phase computation is therefore an instance of the mean-value problem, which is the central part of variational quantum algorithms; whether a quantum computer is necessary for the mean-value problem is still open, as mentioned in Ref. [62]. Note that for any quantum state |φ⟩, there exists a quantum circuit U associated with |φ⟩ such that |φ⟩ = U|0^n⟩. Bravyi et al. [62] proposed an upper bound for estimating ⟨φ|M|φ⟩ in the case of a poly(n)-depth U associated with |φ⟩. Here, we provide a lower bound for this problem based on the following conjecture raised by Bouland et al. [63].

Conjecture 1 (Ref. [63]). There exists an n-qubit quantum circuit U such that the following task is #P-hard: approximate |⟨0^n|U|0^n⟩|² to additive error ε_c/2^n with probability 3/4 + 1/poly(n).

Here, a candidate for the worst-case U ∈ ℂ^{2^n × 2^n} is a size-m (m ≤ poly(n)) unitary in which each basic gate is a single- or two-qubit gate drawn from the Haar measure [63], following some fixed gate-position structure A. We denote this distribution as H_A; see Appendix A for the details of this distribution. The hardness result for the quantum phase computation problem can be stated as the following lemma.


Lemma 1. Assuming that Conjecture 1 holds and that the PH in computational complexity theory does not collapse, there exist an n-qubit Hamiltonian H(a) and an order parameter M such that the corresponding quantum phase computation problem cannot be efficiently solved by any classical algorithm.

We provide a detailed proof in Appendix B. This lemma also underpins the hardness of the QPL problem. Following the "worst-to-average-case" reduction [63], we can construct a testing set T = {(x_i, y_i)}_{i=1}^M with associated feature states {|φ(x_i)⟩}_{i=1}^M (ground states). Given classical training data S (from a classical method), we prove that no classical learning algorithm can efficiently learn a hypothesis h* such that R(h*(x)) is close to zero for x ∈ T, as shown in the following theorem.

Theorem 1. Given training data S = {(a_i, b_i)}_{i=1}^N (acquired by classical methods), where a_i and b_i indicate the classical coupling weight and the phase value associated with the Hamiltonian H(a_i), there exist a testing set T = {(x_i, y_i)}_{i=1}^M and corresponding ground states {|φ(x_i)⟩}_{i=1}^M such that learning a hypothesis h with Pr[R(h) < ε] > 1 − δ on T is hard for any classical ML algorithm, assuming that Conjecture 1 holds and the PH does not collapse. Here the generalization error is ε = O(1/√N), the failure probability δ ∈ (0, 1), and the size of the testing set is M = poly(N).

As we discuss in Sec. VI, we divide the QPL problem into four different cases in terms of the training-data acquiring method and the learning algorithm. Theorem 1 gives the hardness result for the classical data acquiring method combined with a classical learning algorithm, using the hardness result in Lemma 1.

We give a proof sketch below and defer the details to Appendix B 4.

Proof sketch. The fundamental idea of the proof combines the 'worst-to-average-case' reduction of the random circuit sampling problem with the learning complexity class of classical ML algorithms.

Firstly, we design a learning instance with the same format as in Lemma 1. Since the quantum adiabatic algorithm with 2-local Hamiltonians can implement universal quantum computational tasks [32, 64], any quantum state |φ(a)⟩ generated from the distribution H_A (mentioned in Conjecture 1) can be encoded as the ground state of a 2-local Hamiltonian H(a). Therefore, constructing ground states in T is equivalent to constructing random quantum circuits. To give an average-case result, we replace |φ(a)⟩ = U|0^⊗n⟩ by |φ(x)⟩. Here |φ(x)⟩ is obtained by applying a circuit U(x), drawn from a distribution H̄_A, to the initial state |0^⊗n⟩. For a circuit U = U_m … U_1 ∼ H_A, U(x) is obtained by multiplying each U_j by H_j^{1−x}, where H_j is drawn from the Haar-measure distribution with structure A, x = Σ_j x_j 2^{−j}, and x = (x_1, x_2, …). This newly generated distribution is denoted H̄_A. Bouland et al. [63] proved that the distance between ⟨j|U(x)|0⟩ and ⟨j|V(x)|0⟩ (a low-degree polynomial function of x) is bounded, where V(x) = V_m(x) … V_1(x) and V_j(x) is the approximation of U_j H_j^{1−x} obtained by replacing H_j^{1−x} with its truncated Taylor expansion in x. By Aaronson and Arkhipov [65], it is #P-hard to compute ⟨j|V(x)|0⟩ for most V(x); hence it is also #P-hard to compute ⟨j|U(x)|0⟩ for most U(x). Combined with the result of Lemma 1, we obtain the average-case hardness of calculating the label y_i given x_i.

Secondly, the power of classical ML lies in the BPP/samp ⊆ P/poly class [40]. Hence, under the assumption that the PH does not collapse, learning with classical data is intractable even in the average case.

We have therefore proved that the QPL problem cannot be efficiently solved using only classical computers. Specifically, if the training data is generated classically and the learning algorithm is also classical, efficiently solving QPL would lead to an unlikely collapse of complexity classes.

IV. QUANTUM LEARNING ALGORITHM

In this section, we show the possibility of solving the QPL problem with quantum data (acquired from quantum devices), leveraging the Alphatron algorithm [45] combined with the quantum kernel method. From the learning-theory perspective, training can be phrased as empirical risk minimisation, and the associated learning model h* for the minimizer of the empirical risk is cast as follows.

Theorem 2 (Representer theorem [44]). Let S = {(a_i, b_i)}_{i=1}^N with corresponding feature states {U(a_i)|0^n⟩}_{i=1}^N be the training data set, and let Q: X × X ↦ ℝ be a quantum kernel with kernel space H. Consider a strictly monotonically increasing regularisation function g: [0, ∞) ↦ ℝ and the regularised empirical risk

R_L(h*) = (1/N) Σ_{i=1}^N (h*(a_i) − b_i)² + g(‖h*‖_H).  (3)

Then any minimiser of the empirical risk R_L(h*) admits a representation of the form h*(x) = Σ_{i=1}^N α_i Q(a_i, x), where α_i ∈ ℝ for all i ∈ {1, 2, …, N}, and x and a are drawn from the same distribution.

This representer theorem holds since the quantum kernel matrix is positive semi-definite. Using Theorem 2, the critical point for solving the QPL problem is the selection of the quantum kernel and the training algorithm. Since the label b represents the phase value (mean value) by Definition 1, one option for the kernel is Q(a_i, x) = |⟨φ(a_i)|φ(x)⟩|², and the unknown measurement M can thus be represented as a linear combination of feature states, that is, M ≈ Σ_i α_i |φ(a_i)⟩⟨φ(a_i)|. Given the kernel matrix Q = [Q(a_i, a_j)]_{N×N}, the optimal weight


parameters α_i in the expression of M have a closed-form solution obtained by leveraging linear regression algorithms, which requires O(N^w) (w ≈ 2.373) time complexity [66].
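The closed-form route can be sketched with a linear solve against the kernel matrix; a tiny ridge term is added here purely for numerical stability (an assumption of this sketch, with illustrative toy values):

```python
import numpy as np

def fit_alpha_closed_form(Q, b, reg=1e-8):
    """Solve for the representer weights alpha such that Q @ alpha = b,
    where Q[i, j] = Q(a_i, a_j) is the (PSD) quantum kernel matrix.
    The small ridge `reg` keeps the symmetric solve well conditioned;
    the solve itself is the O(N^w) step referenced in the text."""
    N = Q.shape[0]
    return np.linalg.solve(Q + reg * np.eye(N), b)

# Toy PSD kernel matrix and phase labels (illustrative values only).
Q = np.array([[1.0, 0.6], [0.6, 1.0]])
b = np.array([0.2, 0.8])
alpha = fit_alpha_closed_form(Q, b)
```

The fitted weights then define the predictor h*(x) = Σ_i α_i Q(a_i, x) of Theorem 2.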

Here we give a learning approach for all the α_i that achieves a polynomial speedup over the above method. In particular, we introduce the quantum kernel into the Alphatron algorithm [45], as depicted in Alg. 1. We find that the quantum kernel fits naturally into the Alphatron algorithm, and hence the QPL problem can be solved in O(N^{1.5}) time if the kernel matrix Q is provided, as stated in Theorem 3.

Algorithm 1: Quantum Kernel Alphatron

Input: training set S = {(a_i, b_i)}_{i=1}^N ∈ ℝ^d × [0, 1]; Hamiltonian H(a) with coupling weight a; parameterized quantum circuit U(θ); quantum-circuit approximation Q̂(a_i, x) of the quantum kernel Q(a_i, x); learning rate λ > 0; number of iterations T; testing data T = {(x_j, y_j)}_{j=1}^M ∈ ℝ^d × [0, 1]
Output: h^r

1  for i = 1, 2, …, N do
2      Prepare the ground state |φ(a_i)⟩ = U(θ*(a_i))|0^⊗n⟩, where θ*(a_i) := argmin_θ ⟨0^⊗n|U†(θ) H(a_i) U(θ)|0^⊗n⟩;
3  α^1 := 0 ∈ ℝ^N;
4  for t = 1, 2, …, T do
5      h^t(x) := Σ_{i=1}^N α_i^t Q̂(a_i, x);
6      for i = 1, 2, …, N do
7          α_i^{t+1} := α_i^t + (λ/N) (b_i − h^t(a_i));
8  Let r := argmin_{t ∈ {1,…,T}} Σ_{j=1}^M (h^t(x_j) − y_j)²;
9  return h^r
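The classical update loop of Alg. 1 can be sketched as follows, with the ground-state preparation and kernel estimation abstracted into a precomputed kernel function. The single-qubit feature map in the usage example is a hypothetical stand-in, not the paper's circuit:

```python
import numpy as np

def quantum_kernel_alphatron(kernel, train, test, lam=1.0, T=100):
    """Sketch of Alg. 1 given a (possibly estimated) kernel function.
    `train` and `test` are (inputs, labels) pairs; returns the weights
    of the iterate that performs best on the held-out test set (line 8)."""
    (A, b), (X, y) = train, test
    N = len(A)
    K_train = np.array([[kernel(ai, aj) for aj in A] for ai in A])
    K_test = np.array([[kernel(ai, xj) for ai in A] for xj in X])
    alpha = np.zeros(N)
    best_alpha, best_loss = alpha.copy(), np.inf
    for _ in range(T):
        # Evaluate h^t on training and testing points (line 5).
        h_train = K_train @ alpha
        loss = np.sum((K_test @ alpha - y) ** 2)
        if loss < best_loss:
            best_loss, best_alpha = loss, alpha.copy()
        # Residual update on the weights (lines 6-7).
        alpha = alpha + (lam / N) * (b - h_train)
    return best_alpha

# Hypothetical 1-D regression with a toy fidelity kernel.
phi = lambda x: np.array([np.cos(x), np.sin(x)])
kernel = lambda x, xp: np.abs(np.vdot(phi(x), phi(xp))) ** 2
A = np.array([0.1, 0.4, 0.8, 1.2]); b = np.cos(A) ** 2
X = np.array([0.3, 0.9]);           y = np.cos(X) ** 2
alpha = quantum_kernel_alphatron(kernel, (A, b), (X, y))
```

Note the algorithm never inverts the kernel matrix: each sweep costs O(N) kernel evaluations per point, which is the source of the speedup over the closed-form solve.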

Theorem 3. Let the quantum kernel be Q(a_i, x) = |⟨φ(a_i)|φ(x)⟩|², and let {(a_i, b_i)}_{i=1}^N be the training set such that E[b_j|a_j] = Σ_i α_i |⟨φ(a_i)|φ(a_j)⟩|² + g(a_j), where g: [0, ∞) ↦ [−G, G] is a strictly monotonically increasing regularisation function such that E[g²] ≤ ε_g and Σ_{ij} α_i α_j |⟨φ(a_i)|φ(a_j)⟩|² < B. Then, for failure probability δ ∈ (0, 1) and O(N^{5/2}) copies of quantum states to estimate Q(a_i, x), Alg. 1 outputs a hypothesis h* such that R_L(h*) can be bounded by

O(√ε_g + G (log(1/δ)/N)^{1/4} + B (log(1/δ)/N)^{1/2}),  (4)

by selecting λ = 1, T = O(√(N/log(1/δ))), and M = O(N log(T/δ)).

Proof sketch. Let w = Σ_{i=1}^N α_i |φ(a_i)⟩ ⊗ |φ(a_i)⟩*, where |φ⟩* is the conjugate of |φ⟩, and let the reproducing-kernel feature vector be Ψ(a_i) = |φ(a_i)⟩ ⊗ |φ(a_i)⟩*. Then

Q(a_i, x) = |⟨φ(a_i)|φ(x)⟩|² = ⟨Ψ(a_i)|Ψ(x)⟩,  (5)

FIG. 1. The procedure of the proposed quantum learning algorithm: experimental values (a, b) and ground states |φ(a)⟩ of H(a) are obtained from a quantum device, quantum kernels Q(·, ·) are estimated, and the quantum Alphatron learns the weights α to predict the phase y(x) for a new coupling x.

and

Σ_i α_i |⟨φ(a_i)|φ(a_j)⟩|² = ⟨w, Ψ(a_j)⟩,  (6)

by the definition of w and |Ψ(a_i)⟩. Therefore, E[b_j|a_j] = ⟨w, Ψ(a_j)⟩ + g(a_j) and ‖w‖² = Σ_{ij} α_i α_j |⟨φ(a_i)|φ(a_j)⟩|² < B. Hence, the theorem follows by substituting the quantum kernel Q and the feature map Ψ into Theorem 1 of Goel and Klivans [45], provided we can implement Q perfectly.

Nevertheless, we can only approximate Q with small additive error via a quantum circuit. Specifically, the quantum kernel can be approximated by independently performing the destructive swap test [67] on O(log(1/δ)/ε²) copies of the 2n-qubit state |φ(a_i)⟩ ⊗ |φ(x)⟩, with additive error ε_Q and failure probability δ; see Appendix D for the details of the circuit implementation of the quantum kernel.
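The finite-sample behaviour of this estimate can be sketched by simulating the measurement statistics classically: for the swap test, the accepting outcome on a copy pair occurs with probability (1 + |⟨φ|ψ⟩|²)/2, so averaging binary outcomes over R copies and inverting gives an estimator with O(1/√R) additive error. This is a Monte-Carlo sketch of those statistics, not a circuit simulation:

```python
import numpy as np

def estimate_fidelity_kernel(F, copies, rng):
    """Monte-Carlo sketch of swap-test kernel estimation.
    F is the true overlap |<phi|psi>|^2; each copy pair accepts
    with probability (1 + F) / 2, and the estimator inverts the
    empirical accept frequency."""
    accepts = rng.random(copies) < (1 + F) / 2
    return 2 * accepts.mean() - 1

rng = np.random.default_rng(0)
F_hat = estimate_fidelity_kernel(0.7, copies=100_000, rng=rng)
```

With 10^5 copies the standard error of the estimate is roughly 2·√(p(1−p)/R) ≈ 0.002, consistent with the ε_Q accuracy discussed above.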

The main idea for bounding R_L(h*) in Theorem 3 is to prove that, with polynomially many copies of the training states, we can bound

|ĥ^t(x) − h^t(x)| ≤ O(t² √(log(1/δ)) / N^{5/4}),

where h^t is the ideal hypothesis with the exact quantum kernel function Q(·, ·) and ĥ^t its estimated counterpart. We defer the tedious calculations to Appendix B 2.

The regularisation function g is used to avoid overfitting; a common choice bounds the norm ‖w‖, that is, g(·) = 2L‖w‖² − G, which is determined by all a_i ∈ S and a positive parameter L ∈ [0, GB^{−1}]. According to Theorem 3, the proposed quantum learning algorithm can output a hypothesis that minimizes the empirical risk in O(√N log(1/δ) t_Q) classical time, where t_Q is the time required to compute the kernel function Q. The complete learning procedure is illustrated in Fig. 1.

V. NUMERICAL SIMULATION RESULTS

Here we test the capability of the quantum kernel Alphatron algorithm on several instances of QPL tasks.


FIG. 2. Numerical results for predicting ground-state properties of an S = 1/2 XXZ spin model with 16 qubits. (a) Illustration of the spin-model geometry, with couplings J1, J2 and transverse field g. (b) The two curves depict the tendency of R_L(h) during the training procedure, where the green (blue) curve corresponds to N = 15 (N = 10) training data. (c) For a fixed number of iterations (400), we randomly select the training data (N = 10 or N = 15) over 10 trials and plot the average performance of Alg. 1 in predicting the magnetization M_x.

FIG. 3. Numerical results for recognizing a Z2 × Z2 symmetry-protected topological (SPT) phase of the Haldane chain. (a) The exact phase diagram of the Haldane chain (Eq. 8), where the phase boundary points (blue and red curves) are extracted from the literature [20], and the background shading represents the phase as a function of (h1/J, h2/J). The yellow line indicates 40 training points on the line h2 = 0. (b) The phase diagram of the Haldane chain generated by Alg. 1 with size n = 16. (c) The three curves show the variation of the loss function R_L(h*) during training: our method requires significantly fewer convergence steps than the quantum convolutional neural network (QCNN) method [20] with QCNN layer depth d = 1, 2 in recognizing the SPT phase.

Firstly, we consider a warm-up case: detecting the appearance of the staggered magnetization of the S = 1/2 XXZ spin chain in the Ising limit [69]. The Hamiltonian is defined as

H_w = Σ_{i=1}^n (J_1 (S_i^x S_{i+1}^x + S_i^y S_{i+1}^y) + J_2 S_i^z S_{i+1}^z) − g Σ_{i=1}^n S_i^x,  (7)

where S_i^α is the α-component of the S = 1/2 spin operator at the i-th site, and g is the strength of the transverse field. The exchange coupling constant in the xy plane is denoted by J_1 and that along the z-axis by J_2. Here, we set J_1 = 0.2, J_2 = 1 and depict the phase diagram M_x = ⟨X⟩ as a function of g (see the green curve in Fig. 2(c)), where the expectation is taken in the ground state of the Hamiltonian H_w. In this case, the number of qubits is n = 16, and the training data are S = {(g_i, M_x(g_i))}_{i=1}^N, where g_i is randomly sampled from the interval [0, 2]. The predictions produced by Alg. 1 are illustrated in Fig. 2, which shows that Alg. 1 yields highly accurate predictions even for a very small training set (N = 15).
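For small chains, the labels M_x(g) of this warm-up experiment can be generated by exact diagonalization. The sketch below uses a handful of spins with open boundary conditions (an assumption of this sketch; the paper simulates n = 16):

```python
import numpy as np

# Spin-1/2 operators S^a = sigma^a / 2 and the 2x2 identity.
Sx = np.array([[0, 1], [1, 0]]) / 2
Sy = np.array([[0, -1j], [1j, 0]]) / 2
Sz = np.array([[1, 0], [0, -1]]) / 2
I2 = np.eye(2)

def site_op(op, i, n):
    """Embed a single-site operator at site i of an n-spin chain."""
    out = np.array([[1.0]])
    for j in range(n):
        out = np.kron(out, op if j == i else I2)
    return out

def H_xxz(n, J1=0.2, J2=1.0, g=0.5):
    """Eq. (7) with open boundary conditions (an assumption here)."""
    H = np.zeros((2**n, 2**n), dtype=complex)
    for i in range(n - 1):
        H += J1 * (site_op(Sx, i, n) @ site_op(Sx, i + 1, n)
                   + site_op(Sy, i, n) @ site_op(Sy, i + 1, n))
        H += J2 * site_op(Sz, i, n) @ site_op(Sz, i + 1, n)
    for i in range(n):
        H -= g * site_op(Sx, i, n)
    return H

n = 4
evals, evecs = np.linalg.eigh(H_xxz(n))
ground = evecs[:, 0]
# Magnetization <X> per site in the ground state (X = 2 S^x is Pauli X).
Mx = sum(np.vdot(ground, 2 * site_op(Sx, i, n) @ ground).real
         for i in range(n)) / n
```

Sampling g over [0, 2] and repeating the diagonalization yields the training pairs (g_i, M_x(g_i)) used by Alg. 1.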

Secondly, we consider a Z2 × Z2 symmetry-protected topological (SPT) phase P which contains the S = 1 Haldane chain. The ground states |Φ(h1/J, h2/J)⟩ belong


FIG. 4. Numerical results for recognizing three distinct phases of the XXZ model. (a) The system's three distinct phases are characterized by the topological invariant Z_R discussed in Ref. [68]: Z_R = +1 marks the trivial phase P1, Z_R = −1 marks the topological phase P2, and Z_R = 0 marks the symmetry-broken phase P3. Here, the (white and blue) background shading is produced by Alg. 1, and the lines δ = 3.0 and δ = 0.5 represent 60 training points. (b) The function Z_R(J1/J2) at cross sections δ = 3.2 and δ = 1.0 of the phase diagram.

to a family of Hamiltonians

H_s = −J Σ_{i=1}^{n−2} Z_i X_{i+1} Z_{i+2} − h_1 Σ_{i=1}^n X_i − h_2 Σ_{i=1}^{n−1} X_i X_{i+1},  (8)

where X_i, Z_i are Pauli operators for the spin at site i, n is the number of spins, and h_1, h_2 and J are parameters of H_s. In Fig. 3(a), the blue and red curves show the phase boundary points, and the background shading (colored tape) represents the phase diagram as a function of x = (h_1/J, h_2/J). When h_2 = 0, the ground states of H_s are exactly solvable via the Jordan–Wigner transformation, and whether these ground states belong to the SPT phase P can be efficiently detected by global order parameters. Here, we use N = 40 data pairs a = (h_1/J, h_2/J), b as the training data, in which h_2 = 0 and b indicates the phase value at a (see the yellow points in Fig. 3(a)). Our target is to identify whether a given, unknown ground state |Φ(x)⟩ belongs to P. The simulation results for n = 16 are illustrated in Fig. 3(b), which shows that Alg. 1 can reproduce the phase diagram with high accuracy on M = 4096 testing points.

Remarkably, since the training data lie only on the line h_2 = 0, we cannot classically learn the quantum phase from the relationship between the phases and the parameters h_1, h_2 alone. However, the quantum kernel Alphatron algorithm works by using the additional information carried by the quantum kernels. We thus demonstrate that even if the training data all come from classically solvable cases, the quantum kernel Alphatron still works. We also note that the quantum convolutional neural network (QCNN) method [20] has been proposed to solve the same problem by applying a CNN quantum circuit to the quantum state. We numerically show a faster convergence of our quantum kernel Alphatron and leave a more detailed comparison to QCNN for future work.

TABLE I. QPL categories in terms of the training-data acquiring method and learning algorithms.

  Training data acquiring method | Learning Alg. | Hardness
  classical                      | classical     | NP-hard
  classical                      | quantum       | open
  quantum                        | classical     | open
  quantum                        | quantum       | BQP^O

Finally, we consider the bond-alternating XXZ model

H_b = Σ_{i even} J_1 H_i + Σ_{i odd} J_2 H_i,  (9)

where H_i = X_i X_{i+1} + Y_i Y_{i+1} + δ Z_i Z_{i+1}, and J_1, J_2, δ are the coupling parameters of H_b. The XXZ model has three different phases that can be detected by the topological invariant Z_R(J_1/J_2, δ) [68]. Here, we select a total of N = 60 pairs a = (J_1/J_2, δ), b = Z_R(J_1/J_2, δ) as the training data on the horizontal lines δ = 0.5 and δ = 3.0. In Fig. 4(a), we use Alg. 1 to generate the phase diagram as a function of x = (J_1/J_2, δ), where the colored background shading represents the phase classification results on a 16-qubit system. The data in the phase region P3 are post-processed by an averaging scheme.

VI. THE COMPLEXITY CLASS OF THE QPL PROBLEM

Before concluding, we discuss further the difference between "classical" and "quantum" learning.


In terms of whether the method that produces the training data and the learning algorithm are classical or quantum, we consider four categories, as shown in Table I. We discuss the relationships between the four categories in turn.

• For a QPL problem that satisfies Lemma 1, Theorem 1 indicates that the problem is outside the "C-Learning Alg. + C-Data" class, while Theorem 3 shows that it belongs to the "Q-Learning Alg. + Q-Data" class. These results thus imply that "Q-Learning Alg. + Q-Data" is strictly stronger than "C-Learning Alg. + C-Data" under suitable complexity assumptions. In Table I, we use BQP^O to denote the hardness of the QPL problem. This implies that the quantum kernel Alphatron with quantum data can efficiently solve the QPL problem if there exists an algorithm O that provides the ground state of the concerned Hamiltonians.

• Our simulation results also indicate that some QPL problems are classically hard, yet they could be solved by a quantum learning algorithm with "C-Data". Therefore, "Q-Learning Alg. + C-Data" could also be strictly stronger than "C-Learning Alg. + C-Data".

• Another interesting class is “C-Learning Alg. + Q-Data”. Whether it is stronger than “Q-Learning Alg. + C-Data”, or strictly weaker than “Q-Learning Alg. + Q-Data”, is an interesting open problem. Recently, Huang et al. [14] provided a protocol that takes classical shadow representations as the input of classical ML models and uses the trained models to predict properties of many-body wave functions. Their method belongs to the “C-Learning Alg. + Q-Data” class. However, the efficiency of the classical shadow depends on the locality of the concerned order parameter, so whether QPL problems belong to the “C-Learning Alg. + Q-Data” class is left as an open problem.

We summarize the complexity relationships among these four categories for general problems in Fig. 5.

VII. DISCUSSIONS

In this paper, we study the quantum phase learning (QPL) problem using classical and quantum approaches. We prove that, under a reasonable conjecture from Ref. [63], namely that approximating |〈0^n|U|0^n〉|^2 to additive error ε_c/2^n is #P-hard, together with the assumption that PH does not collapse, the QPL problem is computationally hard to learn from a classically tractable training data set. On the other hand, we also prove that the QPL problem can be learned efficiently given a quantum computer, and we propose an effective algorithm to illustrate this quantum learning process. The quantum learning algorithm is a quantization of the Alphatron algorithm [45] that leverages the quantum kernel method.

FIG. 5. Visualization of the learning ability in terms of the data acquisition method and the learning algorithm, where “Q-Learning Alg.” (“C-Learning Alg.”) represents learning algorithms run on a quantum (classical) computer, and “Q-Data” (“C-Data”) represents learning data that are directly observed from physical experiments (that can be efficiently simulated by classical Turing machines). The “C-Learning Alg. + C-Data” class is characterized by BPP/samp, and the QPL problem lies outside it.

Numerically, we apply the quantum learning algorithm to solving QPL problems, and the numerical experiments corroborate our theoretical results in a variety of scenarios, including symmetry-protected topological phases and symmetry-broken phases. The numerical results show that our quantum kernel Alphatron algorithm achieves good learning performance for quantum properties even with classical training data.

We leave open the question of whether our quantum learning algorithm performs better with a more complicated quantum neural network in place of the quantum kernel. Since our numerical results hint at the possibility of efficiently solving the QPL problem, is there a rigorous proof that the QPL problem belongs to the “Q-Learning Alg. + C-Data” class? The capability of “C-Learning Alg. + Q-Data” is another interesting open question.

ACKNOWLEDGMENTS

Y. Wu thanks Y. Song for valuable discussions. We would like to thank P. Rebentrost for providing suggestions on the manuscript.

[1] F. Becca and S. Sorella, Quantum Monte Carlo Approaches for Correlated Systems (Cambridge University Press, 2017).
[2] D. Ceperley and B. Alder, Science 231, 555 (1986).
[3] W. Foulkes, L. Mitas, R. Needs, and G. Rajagopal, Reviews of Modern Physics 73, 33 (2001).
[4] J. Carlson, S. Gandolfi, F. Pederiva, S. C. Pieper, R. Schiavilla, K. Schmidt, and R. B. Wiringa, Reviews of Modern Physics 87, 1067 (2015).
[5] W. Kohn, Reviews of Modern Physics 71, 1253 (1999).
[6] S. R. White, Physical Review Letters 69, 2863 (1992).
[7] S. R. White, Physical Review B 48, 10345 (1993).
[8] G. Carleo and M. Troyer, Science 355, 602 (2017).
[9] J. Carrasquilla and R. G. Melko, Nature Physics 13, 431 (2017).
[10] X. Gao and L.-M. Duan, Nature Communications 8, 1 (2017).
[11] I. Glasser, N. Pancotti, M. August, I. D. Rodriguez, and J. I. Cirac, Physical Review X 8, 011006 (2018).
[12] G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko, and G. Carleo, Nature Physics 14, 447 (2018).
[13] J. R. Moreno, G. Carleo, and A. Georges, Physical Review Letters 125, 076402 (2020).
[14] H.-Y. Huang, R. Kueng, G. Torlai, V. V. Albert, and J. Preskill, arXiv:2106.12627 (2021).
[15] S. Boixo, S. V. Isakov, V. N. Smelyanskiy, R. Babbush, N. Ding, Z. Jiang, M. J. Bremner, J. M. Martinis, and H. Neven, Nature Physics 14, 595 (2018).
[16] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. Brandao, D. A. Buell, et al., Nature 574, 505 (2019).
[17] H.-S. Zhong, H. Wang, Y.-H. Deng, M.-C. Chen, L.-C. Peng, Y.-H. Luo, J. Qin, D. Wu, X. Ding, Y. Hu, et al., Science 370, 1460 (2020).
[18] F. D. M. Haldane, Physical Review Letters 50, 1153 (1983).
[19] F. Pollmann and A. M. Turner, Physical Review B 86, 125441 (2012).
[20] I. Cong, S. Choi, and M. D. Lukin, Nature Physics 15, 1273 (2019).
[21] S. Sachdev, Physics World 12, 33 (1999).
[22] D. W. Berry, A. M. Childs, R. Cleve, R. Kothari, and R. D. Somma, Physical Review Letters 114, 090502 (2015).
[23] G. H. Low and I. L. Chuang, Physical Review Letters 118, 010501 (2017).
[24] P. W. Shor, SIAM Review 41, 303 (1999).
[25] X. Qiang, T. Loke, A. Montanaro, K. Aungskunsiri, X. Zhou, J. L. O’Brien, J. B. Wang, and J. C. Matthews, Nature Communications 7, 1 (2016).
[26] X. Qiang, Y. Wang, S. Xue, R. Ge, L. Chen, Y. Liu, A. Huang, X. Fu, P. Xu, T. Yi, and J. B. Wang, Science Advances 7, 8375 (2021).
[27] M. Szegedy, in 45th Annual IEEE Symposium on Foundations of Computer Science (IEEE, 2004) pp. 32–41.
[28] S. Bravyi, D. Gosset, and R. Konig, Science 362, 308 (2018).
[29] D. Maslov, J.-S. Kim, S. Bravyi, T. J. Yoder, and S. Sheldon, Nature Physics 17, 894 (2021).
[30] B. Wu, M. Ray, L. Zhao, X. Sun, and P. Rebentrost, Physical Review A 103, 042422 (2021).
[31] B. Wu, J. Sun, Q. Huang, and X. Yuan, arXiv:2105.13091 (2021).
[32] Y. Wu, C. Wei, S. Qin, Q. Wen, and F. Gao, arXiv:2005.11970 (2020).
[33] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, Physical Review A 98, 032309 (2018).
[34] V. Havlicek, A. D. Corcoles, K. Temme, A. W. Harrow, A. Kandala, J. M. Chow, and J. M. Gambetta, Nature 567, 209 (2019).
[35] M. Schuld and N. Killoran, Physical Review Letters 122, 040504 (2019).
[36] M. Schuld, arXiv:2101.11020 (2021).
[37] Y. Liu, S. Arunachalam, and K. Temme, Nature Physics 17, 1013 (2021).
[38] J.-G. Liu and L. Wang, Physical Review A 98, 062324 (2018).
[39] C. Blank, D. K. Park, J.-K. K. Rhee, and F. Petruccione, npj Quantum Information 6, 41 (2020).
[40] H.-Y. Huang, M. Broughton, M. Mohseni, R. Babbush, S. Boixo, H. Neven, and J. R. McClean, Nature Communications 12, 1 (2021).
[41] H.-Y. Huang, R. Kueng, and J. Preskill, Physical Review Letters 126, 190505 (2021).
[42] T. Haug, C. N. Self, and M. Kim, arXiv:2108.01039 (2021).
[43] P. Rebentrost, M. Santha, and S. Yang, arXiv:2108.11670 (2021).
[44] M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning (MIT Press, 2018).
[45] S. Goel and A. R. Klivans, in Conference on Learning Theory (PMLR, 2019) pp. 1470–1499.
[46] A. T. Kalai and R. Sastry, in COLT (Citeseer, 2009).
[47] R. Shaydulin and S. M. Wild, arXiv:2111.05451 (2021).
[48] J. Liu, F. Tacchino, J. R. Glick, L. Jiang, and A. Mezzacapo, arXiv:2111.04225 (2021).
[49] N. Shirai, K. Kubo, K. Mitarai, and K. Fujii, arXiv:2111.02951 (2021).
[50] K. Nakaji, H. Tezuka, and N. Yamamoto, arXiv:2109.03786 (2021).
[51] S. Jerbi, L. J. Fiderer, H. P. Nautrup, J. M. Kubler, H. J. Briegel, and V. Dunjko, arXiv:2110.13162 (2021).
[52] T. Sancho-Lorente, J. Roman-Roche, and D. Zueco, arXiv:2109.02686 (2021).
[53] L. H. Li, D.-B. Zhang, and Z. Wang, arXiv:2108.11114 (2021).
[54] L.-P. Henry, S. Thabet, C. Dalyac, and L. Henriet, Physical Review A 104, 032416 (2021).
[55] J. M. Kubler, S. Buchholz, and B. Scholkopf, arXiv:2106.03747 (2021).
[56] J. R. Glick, T. P. Gujarati, A. D. Corcoles, Y. Kim, A. Kandala, J. M. Gambetta, and K. Temme, arXiv:2105.03406 (2021).
[57] T. Hubregtsen, D. Wierichs, E. Gil-Fuster, P.-J. H. Derks, P. K. Faehrmann, and J. J. Meyer, arXiv:2105.02276 (2021).
[58] S. Klus, P. Gelß, F. Nuske, and F. Noe, arXiv:2103.17233 (2021).
[59] X. Wang, Y. Du, Y. Luo, and D. Tao, arXiv:2103.16774 (2021).
[60] R. Huusari and H. Kadri, arXiv:2101.05514 (2021).
[61] T. Kusumoto, K. Mitarai, K. Fujii, M. Kitagawa, and M. Negoro, arXiv:1911.12021 (2019).
[62] S. Bravyi, D. Gosset, and R. Movassagh, Nature Physics 17, 337 (2021).
[63] A. Bouland, B. Fefferman, C. Nirkhe, and U. Vazirani, Nature Physics 15, 159 (2019).
[64] J. Kempe, A. Kitaev, and O. Regev, SIAM Journal on Computing 35, 1070 (2006).
[65] S. Aaronson and A. Arkhipov, in Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, STOC ’11 (Association for Computing Machinery, New York, NY, USA, 2011) pp. 333–342.
[66] F. Le Gall, in Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation (2014) pp. 296–303.
[67] J. C. Garcia-Escartin and P. Chamorro-Posada, Physical Review A 87, 052330 (2013).
[68] A. Elben, J. Yu, G. Zhu, M. Hafezi, F. Pollmann, P. Zoller, and B. Vermersch, Science Advances 6, 3666 (2020).
[69] Y. Hieida, K. Okunishi, and Y. Akutsu, Physical Review B 64, 224422 (2001).
[70] L. Stockmeyer, SIAM Journal on Computing 14, 849 (1985).
[71] S. Arora and B. Barak, Computational Complexity: A Modern Approach (Cambridge University Press, 2009).
[72] L. Welch, in IEEE Int. Symp. Inform. Theory (1983).
[73] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’Brien, Nature Communications 5, 1 (2014).
[74] W. Li, Z. Huang, C. Cao, Y. Huang, Z. Shuai, X. Sun, J. Sun, X. Yuan, and D. Lv, arXiv:2109.08062 (2021).
[75] C. Cao, J. Hu, W. Zhang, X. Xu, D. Chen, F. Yu, J. Li, H. Hu, D. Lv, and M.-H. Yung, arXiv:2109.02110 (2021).
[76] D. Wecker, M. B. Hastings, N. Wiebe, B. K. Clark, C. Nayak, and M. Troyer, Physical Review A 92, 062318 (2015).
[77] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, Nature Communications 12, 1 (2021).
[78] A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J. M. Chow, and J. M. Gambetta, Nature 549, 242 (2017).
[79] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Nature Communications 9, 1 (2018).


Appendix A: Architecture for Haar random circuit distribution

In this section, we give the explicit definition of the architecture underlying the Haar random circuit distribution.

Definition 2 (Architecture). An architecture A is a collection of directed acyclic graphs, one for each integer n. Each graph consists of m < poly(n) vertices, and the degree of each vertex v satisfies deg_in(v) = deg_out(v) ∈ {1, 2}.

Definition 3 (Haar random circuit distribution). Let A be an architecture over circuits, and let the gates in the architecture be {G_i}_{i=1,...,m}. Define the distribution H_A over circuits in A by drawing each gate G_i independently from the Haar measure.
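As an illustration of Definition 3, a single gate G_i can be drawn from the Haar measure with the standard QR-based construction; this is a generic numerical sketch, not code from the paper.

```python
import numpy as np

def haar_unitary(dim, rng):
    """Sample a dim x dim unitary from the Haar measure.

    Standard construction: QR-decompose a complex Ginibre matrix and fix
    the phases of R's diagonal so the distribution is exactly Haar."""
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # multiply column j of q by phases[j]

rng = np.random.default_rng(42)
U = haar_unitary(4, rng)  # one two-qubit gate G_i drawn for the distribution H_A
assert np.allclose(U.conj().T @ U, np.eye(4))
```

The phase correction on the diagonal of R is what makes the output exactly Haar-distributed rather than merely unitary.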

Appendix B: Proof of theorems

Here, we provide technical details for the proofs of the theorems in the main text.

1. Proof of Lemma 1

We first review several lemmas and assumptions that are closely related to our proof.

Lemma 2 (Stockmeyer [70]). Given as input a function f : {0,1}^n → {0,1}^m and any y ∈ {0,1}^m, there is a procedure that runs in randomized time poly(n, 1/ε) with access to an NP oracle and outputs an α such that

(1 − ε)p ≤ α ≤ (1 + ε)p (B1)

for the value

p = (1/2^n) ∑_x f(x),

provided the function f can be computed efficiently given x.

Conjecture 1 (Ref. [63]). There exists an n-qubit quantum circuit U such that the following task is #P-hard: approximate |〈0^n|U|0^n〉|^2 to additive error ε_c/2^n with probability 3/4 + 1/poly(n).

Proof of Lemma 1. For a ground state |φ〉 = U|0^n〉 of H(a) that satisfies Conjecture 1, we can project it onto any computational basis state |j〉 with probability p(j) = |〈j|φ〉|^2. The hiding argument shows that if one can approximate the probability p(j), then one can approximate p(0^n) = |〈0^n|φ〉|^2. Therefore, Conjecture 1 suggests that approximating p(j) to additive error 2^{−poly(n)} is #P-hard.

Here, we construct a family of observables M for which Lemma 1 holds. Consider the observable set {M(s) | M(s) = Z_1^{s_1} ⊗ Z_2^{s_2} ⊗ ⋯ ⊗ Z_n^{s_n}}, where Z_k denotes the Pauli-Z operator acting on the k-th qubit and s = s_1 s_2 ... s_n ∈ {0,1}^n. Then we have

o_s = 〈φ|M(s)|φ〉 = ∑_j p(j)(−1)^{j·s}, (B2)

so o_s/2^n is the Fourier transform of p(j). Based on the algebraic symmetry between p(j) and o_s/2^n, we have

p(j) = ∑_s o_s(−1)^{j·s}/2^n. (B3)

According to Conjecture 1, there exists a quantum circuit U and a state |φ〉 = U|0^n〉 such that it is #P-hard to approximate p(j) = |〈j|φ〉|^2 := |〈j|U|0^n〉|^2 with additive error 1/(2^n poly(n)). On the other hand, if o_s could be efficiently approximated by a classical computer given s, there would exist a BPP^{NP^{BPP}} algorithm that approximates p(j) to multiplicative error 1/poly(n), based on a theorem by Stockmeyer [70]. Considering BPP ⊆ P/poly, this yields P^{#P} ⊆ BPP^{NP^{BPP}} ⊆ BPP^{NP}/poly. Since NP^{NP} ⊆ P^{#P}, one has NP^{NP} ⊆ BPP^{NP}/poly, which implies that PH collapses to the second level [71].

Therefore, under the assumptions that PH does not collapse and that Conjecture 1 holds, no classical algorithm can efficiently calculate o_s.
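The Fourier pair (B2) and (B3) is easy to verify numerically for a few qubits. The sketch below uses a random state and brute-force sums; it is purely illustrative and carries no hardness content.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# A random normalized 3-qubit state |phi> and its outcome distribution p(j).
amps = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
amps /= np.linalg.norm(amps)
p = np.abs(amps) ** 2

def parity(j, s):
    """Bitwise inner product j.s mod 2, the exponent in (-1)^{j.s}."""
    return bin(j & s).count("1") % 2

# Eq. (B2): o_s = <phi| Z^{s_1} x ... x Z^{s_n} |phi> = sum_j p(j) (-1)^{j.s}
o = np.array([sum(p[j] * (-1) ** parity(j, s) for j in range(2**n))
              for s in range(2**n)])

# Eq. (B3): the inverse transform recovers p(j) = sum_s o_s (-1)^{j.s} / 2^n
p_rec = np.array([sum(o[s] * (-1) ** parity(j, s) for s in range(2**n)) / 2**n
                  for j in range(2**n)])

assert np.allclose(p, p_rec)
```

Note that o_{0...0} = 1 always, since M(0^n) is the identity and p is normalized.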

2. Proof of Theorem 3

Proof of Theorem 3. Notice that if the quantum kernel Q can be exactly calculated, then by Goel and Klivans [45], Alg. 1 outputs a hypothesis h^* such that

R(h^*) ≤ O( √ε_g + G (log(1/δ)/N)^{1/4} + B √(log(1/δ)/N) ).

This inequality is obtained by combining

R(h^*) ≤ R(h^{t^*}) + O( B √(1/N) + √(log(1/δ)/N) ), (B4)

and

R(h^{t^*}) ≤ O( √ε_g + G (log(1/δ)/N)^{1/4} + B √(log(1/δ)/N) ), (B5)

for some t^* ≤ T = O(N/log(1/δ)). Nevertheless, if the quantum kernel Q is only approximated by running quantum circuits, Eq. (B5) should be replaced with

R̂(ĥ^{t^*}) ≤ O( √ε_g + G (log(1/δ)/N)^{1/4} + B √(log(1/δ)/N) ), (B6)

where ĥ^t(x) = ∑_{i=1}^N α̂^t_i Q̂(a_i, x), and Q̂ is the approximation of Q.


In the following, we prove that |R(h^{t^*}) − R̂(ĥ^{t^*})| is bounded, and hence R(h^*) is bounded by combining Eqs. (B4), (B5), and (B6). By Theorem 3, Q̂(a_i, x) is an ε_Q-approximation of Q(a_i, x), i.e.,

|Q̂(a_i, x) − Q(a_i, x)| ≤ ε_Q (B7)

with high probability.

For convenience, in what follows we require that, for all i, the errors δ^i_Q = Q̂(a_i, x) − Q(a_i, x) are the same, and that the errors δ^t_{α_i} = α̂^t_i − α^t_i are also the same; we denote them δ_Q and δ^t_α, respectively. Since the δ^i_Q are of the same order for all i ∈ [N] (and similarly for the δ^t_{α_i}), these assumptions are reasonable; the same upper bound on R(h^*) holds without them, at the cost of a more tedious proof. Then, for any i, we have

−δ^t_α = α^t_i − α̂^t_i
  = α^1_i − α̂^1_i + (1/N) ∑_{k=1}^{t−1} ( h^k(a_i) − ĥ^k(a_i) )
  = (1/N) ∑_{k=1}^{t−1} ∑_{j=1}^{N} ( α^k_j Q(a_j, a_i) − α̂^k_j Q̂(a_j, a_i) )
  = ∑_{k=1}^{t−1} ( A^k δ_Q + Q̄_i δ^k_α + δ^k_α δ_Q ), (B8)

where A^k = (1/N) ∑_{j=1}^N α^k_j and Q̄_i = (1/N) ∑_{j=1}^N Q(a_j, a_i). We can also obtain −δ^{t−1}_α from the recurrence relation in Eq. (B8). Subtracting −δ^t_α from −δ^{t−1}_α yields

δ^t_α = (Q̄_i − 1) δ^{t−1}_α + A^{t−1} δ_Q + δ^{t−1}_α δ_Q; (B9)

hence, using the facts that 0 ≤ Q̄_i ≤ 1 and A^k ≤ (k−1)/N, the absolute value of δ^t_α satisfies

|δ^t_α| ≤ (1 + |δ_Q|) |δ^{t−1}_α| + ((t−2)/N) |δ_Q|
  ≤ |δ^{t−1}_α| + (3(t−2)/N) ε_Q.

By the recurrence of |δ^t_α|, we have

|δ^t_α| ≤ (3ε_Q/N) ∑_{k=1}^{t−2} k ≤ 3t^2 ε_Q / (2N).

Then we have

|ĥ^t(x) − h^t(x)| = |∑_{i=1}^N ( α̂^t_i Q̂(a_i, x) − α^t_i Q(a_i, x) )|
  ≤ ∑_{i=1}^N ( 2|α^t_i| ε_Q + 2 Q(a_i, x) |δ^t_α| )
  ≤ 2N ( ((t−1)/N) ε_Q + |δ^t_α| )
  ≤ 4t^2 ε_Q

for large t, where the second inequality holds since |α^t_i| ≤ (t−1)/N.

Therefore,

|R̂(ĥ^{t^*}) − R(h^{t^*})| ≤ (1/N) |∑_{i=1}^N ( ĥ^t(a_i) − h^t(a_i) )| ≤ 4t^2 ε_Q,

where the first inequality holds by the definitions of R(h^{t^*}) and R̂(ĥ^{t^*}) (recall that R(h^{t^*}) = (1/N) ‖∑_{i=1}^N ( E[b_i|a_i] − h^t(a_i) ) |φ(a_i)〉‖). The additive error ε_Q of the quantum kernel Q̂(a_i, x) can be bounded by O(√(log(1/δ))/N^{5/4}) with O(N^{5/2}) copies of the quantum states. Hence,

R(h^*) ≤ O( √ε_g + G (log(1/δ)/N)^{1/4} + B √(log(1/δ)/N) + log(1/δ)/N^{1/4} )
  ≤ O( √ε_g + G (log(1/δ)/N)^{1/4} + B √(log(1/δ)/N) ),

where the last inequality holds since t = O(N/log(1/δ)) and G = Ω(1).
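The iterative update analyzed above is, at its core, a kernelized regression loop. Below is a bare-bones classical skeleton of that update: it drops the link function of the full Alphatron of Goel and Klivans [45], substitutes an exactly computed RBF kernel for the quantum kernel Q (whose entries in Alg. 1 would instead be noisy estimates Q̂ from quantum circuits), and uses illustrative hyperparameters throughout.

```python
import numpy as np

def alphatron(K, y, T=200, lr=1.0):
    """Simplified kernel Alphatron loop.

    K : (N, N) kernel Gram matrix, K[i, j] = Q(a_i, a_j).
    y : (N,) training labels.
    Returns the dual weights of the iterate with the smallest training
    error (a stand-in for the paper's model-selection over t <= T)."""
    N = len(y)
    alpha = np.zeros(N)
    best_alpha, best_err = alpha.copy(), np.inf
    for _ in range(T):
        pred = K @ alpha                       # h^t(a_i) for every i
        alpha = alpha + (lr / N) * (y - pred)  # Alphatron-style dual update
        err = np.mean((K @ alpha - y) ** 2)
        if err < best_err:
            best_err, best_alpha = err, alpha.copy()
    return best_alpha

# Toy usage: an RBF kernel stands in for the quantum kernel Q.
rng = np.random.default_rng(1)
A = rng.normal(size=(40, 2))                   # training inputs a_i
y = np.sin(A[:, 0])
K = np.exp(-((A[:, None, :] - A[None, :, :]) ** 2).sum(-1))
alpha = alphatron(K, y)
```

With a positive-semidefinite Gram matrix whose eigenvalues are at most N, this fixed-point iteration cannot diverge, which is why the simple step size lr/N suffices for the sketch.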

3. Complexity argument for the power of data

Here, we review the power of classical ML algorithms that can learn from data by means of a complexity class, which is defined as BPP/samp in Ref. [40]. A language L of bit strings is in BPP/samp if and only if the following holds. Suppose M and D are two probabilistic Turing machines, where D generates samples x with |x| = n in polynomial time for any size n, and D defines a sequence of input distributions D_n. M takes an input x of size n along with a set {(x_i, y_i)}_{i=1}^{poly(n)}, where each x_i is sampled from D_n using D and y_i denotes the corresponding label: if x_i ∈ L, then y_i = 1; otherwise y_i = 0. Specifically, one requires:
(1) The probabilistic Turing machine M processes all inputs x in polynomial time.
(2) For all x ∈ L, M outputs 1 with probability greater than 2/3.
(3) For all x ∉ L, M outputs 1 with probability less than 1/3.

4. Proof of Theorem 1

We first review several definitions and lemmas whichare closely related to our proof.

Page 12: Provable Advantage in Quantum Phase Learning via Quantum

12

Lemma 3 (Ref. [72]). Let q be a degree-d univariate polynomial over any field F. Suppose k pairs of elements {(x_i, y_i)}_{i=1}^k in F are provided, where all x_i are distinct, with the promise that y_i = q(x_i) for at least max(d + 1, (k + d)/2) points. Then one can recover q exactly in poly(k, d) deterministic time.

Lemma 4 (Ref. [63]). There exists an architecture A such that the following task is #P-hard: approximate p(j) = |〈j|U|0^n〉|^2 to additive error ±ε/2^n with probability 3/4 + 1/poly(n) over the choice of U ∼ H_A, in time poly(n, 1/ε).

Proof of Theorem 1. First, we describe how to construct a testing set T = {(x_i, y_i)}_{i=1}^M. Suppose we take a worst-case quantum state |φ〉 = U|0^n〉 generated by a circuit U such that computing p(j) = |〈j|φ〉|^2 to within additive error 2^{−poly(n)} is #P-hard. Here, the single- and two-qubit gate structure of U = U_m U_{m−1} ⋯ U_1 is provided, where U_k denotes the k-th single- or two-qubit gate for k ∈ [m]. We then multiply each gate U_k by a Haar-random matrix H_k^{1−x}, that is, U_k(x) = U_k H_k^{1−x}, where H_k is drawn from H_A, and the real value x ∈ [0, 1] can be represented by a d-length bit string x = x_1 x_2 ... x_d, that is, x = ∑_{j=1}^d x_j 2^{−j}. Since H_k is a Haar-random matrix, the quantum gate U_k(x) is completely random. In this way, one constructs a bridge between the datum x and its corresponding feature state |φ(x)〉, that is,

|φ(x)〉 = U(x)|0^n〉 = ∏_{k=1}^m ( U_k H_k^{1−x} ) |0^n〉. (B10)

Here, if x = 1, this recovers the state |φ〉, and if x ∈ (0, 1/poly(n)), the resulting state |φ(x)〉 looks almost uniformly random. The ‘worst-to-average-case’ reduction is achieved by proof of contradiction: using a Taylor series, the matrix U_k(x) = U_k H_k^{1−x} can be approximated by a polynomial function V_k(x) of degree K = poly(n) with precision ε = O(1/K!), that is,

U_k(x) ≈ V_k(x) = U_k H_k ∑_{s=0}^{K} (−x log H_k)^s / s!. (B11)

According to the Feynman path integral,

〈j|V_m(x) ⋯ V_1(x)|0^n〉 = ∑_{s_1,...,s_{m−1} ∈ {0,1}^n} ∏_{k=1}^m 〈s_k|V_k(x)|s_{k−1}〉 (B12)

is a polynomial function of degree Km in x, where s_m = j and s_0 = 0^n. Then

p(j, x) = |〈j|U(x)|0^n〉|^2 (B13)

can be approximated by a polynomial function of degree 2Km in x with truncation error O(2^{mn}/(K!)^m). By Aaronson and Arkhipov [65], it is #P-hard to compute 〈j|V(x)|0^n〉 for most V(x), and hence it is also #P-hard to compute 〈j|U(x)|0^n〉 for most U(x). A testing set T = {(x_i, y_i)}_{i=1}^M is then obtained, where x_i ∈ [0, 1/poly(n)), the feature states are |φ(x_i)〉 = U(x_i)|0^n〉, y_i = 〈M〉_{φ(x_i)}, and the size of the testing set is M = poly(N).

Second, we prove that no efficient classical ML algorithm can predict y_i for x_i ∈ T. Given the training set S, the power of classical ML can be characterized by the BPP/samp class [40]. If y_{s,x} = 〈φ(x)|M(s)|φ(x)〉 could be predicted by a classical ML algorithm given (s, x), then based on Stockmeyer’s theorem [70], there would exist a BPP^NP algorithm with a BPP/samp oracle that approximates p(j, x) to multiplicative error 1/poly(n). Combining this with BPP/samp ⊆ P/poly, we directly have P^{#P} ⊆ BPP^{NP}/poly, which implies that PH collapses to the second level [71].
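The gate deformation U_k(x) = U_k H_k^{1−x} of Eq. (B10) can be realized numerically through the matrix logarithm, H_k^{1−x} = exp((1−x) log H_k). The sketch below checks the two endpoints x = 1 and x = 0; the helper names are ours, and scipy is assumed available.

```python
import numpy as np
from scipy.linalg import expm, logm

def haar_unitary(dim, rng):
    """QR-based Haar-random unitary (one gate H_k of the architecture)."""
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def deformed_gate(U_k, H_k, x):
    """U_k(x) = U_k H_k^{1-x}, with the fractional power via the principal log."""
    return U_k @ expm((1 - x) * logm(H_k))

rng = np.random.default_rng(0)
U_k, H_k = haar_unitary(4, rng), haar_unitary(4, rng)
assert np.allclose(deformed_gate(U_k, H_k, 1.0), U_k)        # x = 1 recovers U_k
assert np.allclose(deformed_gate(U_k, H_k, 0.0), U_k @ H_k)  # x = 0 gives U_k H_k
```

Because log H_k is anti-Hermitian for unitary H_k, the interpolated gate stays unitary for every x ∈ [0, 1].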

Appendix C: QPL problem and quantum random circuits

In the main text, the proof of Theorem 1 is established on the classical hardness of the random circuit sampling problem, and the feature states are thus generated by random circuit states. On the other hand, the QPL problem has a similar structure to the ground state problem.

In the field of quantum computation, the Variational Quantum Eigensolver (VQE) is a popular method for approximating the ground state of H(a) [73–75]. The key idea of VQE is that a parameterized quantum state |Ψ(θ)〉 is prepared and measured on a quantum computer, and a classical optimizer updates the parameters θ according to the measurement information. The ground state can be obtained by minimizing the energy

E(θ, a) = 〈Ψ(θ)|H(a)|Ψ(θ)〉 (C1)

following the variational principle. The selection of the ansatz |Ψ(θ)〉 is flexible; common choices include the unitary coupled cluster ansatz [76], the alternating layered ansatz [77], and the hardware-efficient ansatz [78].

The hardware-efficient ansatz is composed of single- and two-qubit gates in each repeated layer and is experimentally friendly on near-term quantum devices. Following the notation of Ref. [78], a D-depth hardware-efficient ansatz is formalized as

U_h(θ) = ∏_{d=1}^{D} U_d(θ_d) W_d, (C2)

in which U_d(θ_d) is a tensor product of n single-qubit rotations and W_d is an entangling gate. Generally, the construction of U_h(θ) ensures that it approaches a random unitary as D increases [77, 79]. From the above discussion, it is reasonable to assume that the QPL problem involves quantum random circuits, which is consistent with the condition in Conjecture 1.
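As a toy instance of Eq. (C2), a two-qubit, D-layer hardware-efficient ansatz can be written with RY rotation layers and a CZ entangler; these specific gate choices are illustrative and not necessarily those of Ref. [78].

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

CZ = np.diag([1, 1, 1, -1]).astype(complex)  # entangling layer W_d (one choice)

def hardware_efficient_ansatz(layers):
    """Toy n = 2 instance of Eq. (C2): U_h = prod_d U_d(theta_d) W_d,
    with U_d a tensor product of RY rotations on the two qubits."""
    U = np.eye(4, dtype=complex)
    for th1, th2 in layers:              # one pair of angles per depth d
        U = np.kron(ry(th1), ry(th2)) @ CZ @ U
    return U

U = hardware_efficient_ansatz([(0.3, 1.2), (0.7, -0.4)])
psi = U @ np.array([1, 0, 0, 0], dtype=complex)  # ansatz state |Psi(theta)>
```

A classical optimizer would then tune the angles to minimize 〈Ψ(θ)|H(a)|Ψ(θ)〉 as in Eq. (C1).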


Appendix D: Implementation of the quantum kernel with the SWAP test

By leveraging the Chernoff bound, the quantum kernel can be approximated by independently performing the Destructive-Swap-Test [67] on O(log(1/δ)/ε_Q^2) copies of the 2n-qubit state |φ(a_i)〉 ⊗ |φ(x)〉, with additive error ε_Q and failure probability δ. The expectation of the measurement results of the Destructive-Swap-Test is

〈φ(a_i) ⊗ φ(x)|SWAP|φ(a_i) ⊗ φ(x)〉 = Q(a_i, x), (D1)

where SWAP|φ(a_i) ⊗ φ(x)〉 = |φ(x) ⊗ φ(a_i)〉 denotes the 2n-qubit swap operator. For the QPL problem, both |φ(a_i)〉 and |φ(x)〉 can be generated with polynomial-size circuits, hence the Destructive-Swap-Test can be performed efficiently.
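The sampling overhead above can be mimicked classically: each destructive SWAP test consumes one copy of |φ(a_i)〉 ⊗ |φ(x)〉 and yields a ±1 outcome with expectation Q(a_i, x), so a Hoeffding-style bound fixes the shot count. The simulation below samples outcomes directly from the exact overlap; the constants follow the standard Hoeffding bound rather than any paper-specific choice.

```python
import numpy as np

rng = np.random.default_rng(7)

def random_state(n_qubits, rng):
    v = rng.normal(size=2**n_qubits) + 1j * rng.normal(size=2**n_qubits)
    return v / np.linalg.norm(v)

phi, psi = random_state(3, rng), random_state(3, rng)
Q = abs(np.vdot(phi, psi)) ** 2          # exact kernel value Q(a_i, x)

# Each destructive SWAP test returns +/-1 with Pr[+1] = (1 + Q) / 2,
# so the outcome expectation equals Q.
eps, delta = 0.05, 0.01
shots = int(np.ceil(2 * np.log(2 / delta) / eps**2))   # Hoeffding shot count
outcomes = rng.choice([1.0, -1.0], size=shots, p=[(1 + Q) / 2, (1 - Q) / 2])
Q_hat = outcomes.mean()                  # |Q_hat - Q| <= eps w.p. >= 1 - delta
```

This makes the O(log(1/δ)/ε_Q^2) scaling of the copy count explicit: halving ε_Q quadruples the number of SWAP tests.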