Fisher information and asymptotic normality in system identification for quantum Markov chains

PHYSICAL REVIEW A 83, 062324 (2011)


Madalin Guta

University of Nottingham, School of Mathematical Sciences, University Park, Nottingham NG7 2RD, UK

(Received 7 July 2010; revised manuscript received 27 September 2010; published 20 June 2011)

This paper deals with the problem of estimating the coupling constant θ of a mixing quantum Markov chain. For a repeated measurement on the chain’s output we show that the outcomes’ time average has an asymptotically normal (Gaussian) distribution, and we give the explicit expressions of its mean and variance. In particular, we obtain a simple estimator of θ whose classical Fisher information can be optimized over different choices of measured observables. We then show that the quantum state of the output together with the system is itself asymptotically Gaussian and compute its quantum Fisher information, which sets an absolute bound to the estimation error. The classical and quantum Fisher information are compared in a simple example. In the vicinity of θ = 0 we find that the quantum Fisher information has a quadratic rather than linear scaling in output size, and asymptotically the Fisher information is localized in the system, while the output is independent of the parameter.

DOI: 10.1103/PhysRevA.83.062324 PACS number(s): 03.67.−a, 05.30.−d, 02.50.Tt, 03.65.Wj

I. INTRODUCTION

Quantum statistics started in the 1970s with the discovery that notions of “classical” statistics such as the Cramer-Rao inequality and Fisher information have nontrivial quantum extensions which can be used to design optimal measurements for quantum state estimation and discrimination [1–4]. Recently, statistical inference has become an indispensable tool in quantum engineering tasks such as state preparation [5,6], precision metrology [7,8], quantum process tomography [9,10], state transfer and teleportation [11,12], and continuous variables tomography [13,14].

Quantum system identification (QSI) is a topic of particular importance in quantum engineering and control [15] where accurate knowledge of dynamical parameters is crucial. This paper addresses the QSI problem for Markov dynamics from the viewpoint of asymptotic statistics, complementing other recent investigations [16–19].

We illustrate the concept of a quantum Markov chain through the example of an atom maser [10]: identically prepared d-level atoms (input) pass successively and at equal time intervals τ through a cavity, interact with the cavity field, and exit in a perturbed state (output) which carries information about the interaction (see Fig. 1). Neglecting the internal dynamics of atoms and cavity and taking the latter to be of dimension k < ∞, the evolution can be described in discrete time and consists of applying an interaction unitary $U \in M(\mathbb{C}^d \otimes \mathbb{C}^k)$ for each time interval τ when a new atom passes through the cavity. If the incoming atoms are in the pure state $\psi \in \mathbb{C}^d$ and the cavity is initially in some state $\varphi \in \mathbb{C}^k$, then at time n the output plus cavity state is $\psi^n \in (\mathbb{C}^d)^{\otimes n} \otimes \mathbb{C}^k$:

$$\psi^n := U(n)(\psi^{\otimes n} \otimes \varphi) := U^{(1)} \cdots U^{(n)} (\psi^{\otimes n} \otimes \varphi), \tag{1}$$

where $U^{(l)}$ is the copy of U acting on the cavity and the atom which, at time n, is at position l on the right side of the cavity.

Consider now that the interaction depends on some unknown parameter θ such that $U = U_\theta$ and correspondingly $\psi^n = \psi^n_\theta$. Our identification problem is to estimate θ by measuring the output rather than the system (cavity), which may not be directly accessible. For simplicity we restrict ourselves to the case of estimating one parameter (the interaction strength), such that $U_\theta = \exp(-i\theta H)$ where $H \in M(\mathbb{C}^d \otimes \mathbb{C}^k)$ is a known Hamiltonian, but the results can be extended to multiple parameters as well as continuous time [20]. The questions we want to address are: how much information about θ is contained in the output state $\psi^n_\theta$, and how can we “extract” it?

The standard approach to such questions goes via the quantum Cramer-Rao inequality, which shows that the variance of any unbiased estimator obtained by measuring a copy of a state $\rho_\theta$ is lower-bounded by the inverse of the quantum Fisher information $F(\theta)^{-1}$ [1,2,21]. Although this bound is generally not attainable for a single copy, it is asymptotically attainable; that is, there exists a sequence of measurements $M_n$ on n identically prepared systems and estimators $\theta_n$ such that

$$\lim_{n\to\infty} n\,E(\theta_n - \theta)^2 = F(\theta)^{-1}.$$

Unlike this case where the n-copy Fisher information scales linearly with the number of systems, the Fisher information of the correlated states $\psi^n_\theta$ depends on the joint state rather than that of a single subsystem, and there is no straightforward argument to show that its rescaled version is asymptotically attainable. Moreover, in the case of multidimensional parameters, this approach would run into the same problems as the independent-copies model, for which it is well known that the Cramer-Rao bound is not attainable even asymptotically [1].

For these reasons, we will pursue an asymptotic analysis based on the concept of local asymptotic normality (LAN) [22], which was recently extended to quantum statistics [23–26] and used to solve the (asymptotically) optimal estimation problem for general multiparametric models. In this paper we extend quantum LAN from independent to finitely correlated quantum states and at the same time generalize results on LAN [27] and the central limit theorem (CLT) for classical Markov chains [28].

FIG. 1. (Color online) Atom maser: identically prepared atoms pass successively through a cavity, interact with the cavity field, with the outgoing atoms carrying information on the dynamics.

1050-2947/2011/83(6)/062324(9) ©2011 American Physical Society

In Theorem 4 we show that, locally with respect to the parameter θ, the output state $\psi^n_\theta$ can be approximated by a one-parameter family of coherent states, cf. (15). This result can be seen as the Markov version of asymptotic Gaussianity in coherent spin states (CSSs) [29]. As a by-product, we obtain the asymptotic expression of the quantum Fisher information F = F(θ) (per atom) of the output state (16), which is equal to the “Markovian variance” of the driving Hamiltonian. This quantity should be understood as the limit of the rescaled quantum Fisher information $F^{(n)}(\theta)$ of the n-atom family of states $\psi^n_\theta$:

$$F(\theta) = \lim_{n\to\infty} \frac{F^{(n)}(\theta)}{n}.$$

In particular, $F(\theta)^{-1}$ sets an asymptotic lower bound on the variance $nE(\theta_n - \theta)^2$ of any sequence of unbiased estimators $\{\theta_n\}$. Using the formalism of LAN we can show that the lower bound is attainable but, at the moment, we do not know the explicit form of the optimal measurement, and we expect it to be nonseparable and possibly unfeasible with current technology.

In Theorem 3 we analyze the more realistic setup of repeated, separate measurements performed on the outgoing atoms, and show that the mean of the outcomes is asymptotically Gaussian and can be used to estimate θ. The corresponding classical Fisher information can be maximized over different measured observables, so that the experimenter can perform the most informative separable measurement, and compare its performance with the benchmark given by the quantum Fisher information (see example). This generalizes Wiseman’s adaptive phase estimation protocol where a particular field quadrature is most informative among all quadratures [30]. With the same techniques, similar results can be obtained for means of other functionals of the measurement data such as correlations between subsequent atoms. However, it remains an open problem to find the (asymptotic) classical Fisher information contained in the complete measurement data.

The next two sections introduce the key concepts underlying our results: local asymptotic normality and ergodicity. The main results are contained in Theorems 3 and 4. To illustrate these results we analyze a simple example based on an XY interaction for which we compute the quantum Fisher information, and we plot the classical Fisher information for different output observables. An interesting feature of this model is the divergence of the asymptotic quantum Fisher information per atom at vanishing interaction, which is due to a quadratic rather than linear scaling of the “usual” quantum Fisher information $F^{(n)}(\theta)$ for θ ≈ 0. We investigate this behavior and find that, with the scaling θ = u/n, the model converges to a simple unitary rotation model on the system, with input passing into the output unperturbed. We conclude with a discussion on further extensions, open problems, and connections with other topics.

II. LOCAL ASYMPTOTIC NORMALITY IN CLASSICAL AND QUANTUM STATISTICS

In this section we briefly review some asymptotic statistics techniques, show how they extend to quantum statistics, and explain why this is useful. The aim is to introduce the concept of local asymptotic normality, which will be encountered in the main results, Theorems 3 and 4.

A. Asymptotic estimation in classical statistics

A typical problem in statistics is the following: estimate an unknown parameter $\theta = (\theta_1, \dots, \theta_p) \in \mathbb{R}^p$ given the random variables $X_1, \dots, X_n$ which are independent and identically distributed (i.i.d.) and have probability distribution $P_\theta$ depending “smoothly” on θ. If $\theta_n := \theta_n(X_1, \dots, X_n)$ is an unbiased estimator [i.e., $E(\theta_n) = \theta$], then the Cramer-Rao (C-R) inequality provides the following lower bound to its (rescaled) covariance matrix:

$$n\,E[(\theta_n - \theta)^T (\theta_n - \theta)] \ge I(\theta)^{-1}, \tag{2}$$

where I(θ) is a p × p positive definite real matrix called the Fisher information matrix at θ and quantifies the amount of “statistical information” about θ contained in a single sample from the distribution $P_\theta$. If $p_\theta(x) = dP_\theta/d\mu$ denotes the density of $P_\theta$ with respect to some reference measure µ, then the matrix elements of I(θ) are given by

$$I(\theta)_{ij} := \int p_\theta(x)\, \frac{\partial \ln p_\theta(x)}{\partial \theta_i}\, \frac{\partial \ln p_\theta(x)}{\partial \theta_j}\, \mu(dx) = E_\theta(\ell_{\theta,i}\, \ell_{\theta,j})$$

and depend only on the local behavior of the statistical model $\{P_\theta : \theta \in \mathbb{R}^p\}$ around the point θ. To give a simple example, if $X_i \in \{0,1\}$ are independent coin tosses with $P_\theta[X_i = 1] = \theta$, then the mean

$$\theta_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$

is an unbiased estimator of θ whose distribution is asymptotically normal according to the central limit theorem:

$$\sqrt{n}\,(\theta_n - \theta) \xrightarrow{\;\mathcal{L}\;} N(0, \theta(1-\theta)),$$

where N(m,V) is the normal distribution with mean m and variance V. A simple calculation shows that, in this case, $I(\theta)^{-1} = \theta(1-\theta)$ so that $\theta_n$ achieves the C-R lower bound. Interestingly, the Fisher information diverges at the boundary of the interval [0,1], but note that the bound is meaningful only for points in the interior of the parameter space. A similar situation will occur later in an example of a quantum Markov chain.
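The coin-toss example can be checked by simulation. The following is a minimal Monte Carlo sketch, not part of the original analysis; the function name `coin_toss_check` and the values of `theta`, `n`, `trials` and `seed` are illustrative assumptions. It compares the rescaled mean-square error of the sample mean with $I(\theta)^{-1} = \theta(1-\theta)$.

```python
import numpy as np

def coin_toss_check(theta=0.3, n=2000, trials=20000, seed=0):
    """Compare n * E[(theta_n - theta)^2] with the C-R bound theta * (1 - theta).

    All default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # The number of heads in n tosses is Binomial(n, theta); draw it once per experiment.
    counts = rng.binomial(n, theta, size=trials)
    theta_n = counts / n                                 # sample-mean estimator
    rescaled_mse = n * np.mean((theta_n - theta) ** 2)   # Monte Carlo estimate of n * MSE
    bound = theta * (1.0 - theta)                        # inverse Fisher information
    return rescaled_mse, bound

if __name__ == "__main__":
    mse, bound = coin_toss_check()
    print(f"n * MSE = {mse:.4f}, I(theta)^-1 = {bound:.4f}")
```

The two numbers should agree up to Monte Carlo fluctuations, which shrink as `trials` grows.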

While in general there might not exist any unbiased estimators achieving the C-R bound for a given n, the theory says that “good” estimators $\theta_n(X_1, \dots, X_n)$ (e.g., maximum likelihood under certain conditions) are asymptotically normal with

$$\sqrt{n}\,(\theta_n - \theta) \xrightarrow{\;\mathcal{L}\;} N(0, I(\theta)^{-1}), \tag{3}$$

such that the C-R bound is asymptotically achieved [31].


Le Cam went a step further and discovered a more fundamental phenomenon called local asymptotic normality (LAN), which roughly means that the underlying statistical model $\{P_\theta : \theta \in \mathbb{R}^p\}$ can be linearized in the neighborhood of any fixed parameter, and approximated by a simple Gaussian model with fixed covariance and unknown mean [32]. To explain this, let us first note that without loss of generality we can “localize” θ; that is, write it as

$$\theta = \theta_0 + u/\sqrt{n},$$

where $\theta_0$ can be chosen to be a rough estimator based on a small subsample of size $n^{1-\epsilon}$ with $0 < \epsilon \ll 1$, and u is an unknown “local parameter.” By a simple concentration of measure argument [25] one can show that, with vanishing probability of error, the local parameter satisfies $\|u\| \le n^{\epsilon}$. For all practical purposes, we can then use the more convenient local parametrization by $u \in \mathbb{R}^p$ and denote the original distribution $P^n_{\theta_0 + u/\sqrt{n}}$ by $P_{n,u}$. Now, LAN is the statement that there exist randomizations (classical channels) $T_n$ and $S_n$ such that

$$\lim_{n\to\infty} \sup_{\|u\| \le n^{\epsilon}} \|T_n(p_{n,u}) - N(u)\|_1 = 0,$$

$$\lim_{n\to\infty} \sup_{\|u\| \le n^{\epsilon}} \|p_{n,u} - S_n(N(u))\|_1 = 0,$$

where $\|\cdot\|_1$ denotes the $L_1$ norm, and $p_{n,u}$ and N(u) are the probability densities of $P_{n,u}$ and $N(u, I(\theta_0)^{-1})$, respectively. Operationally, this means that one can use the data $X_1, \dots, X_n$ to simulate a normally distributed variable with density N(u), and vice versa, with asymptotically vanishing $L_1$ error, without having access to the unknown parameter u. This type of convergence is strong enough to imply the previous results on asymptotic normality and optimality of the maximum likelihood estimation, but can be used to make similar statements about other statistical decision problems concerning θ. We will now show that a similar phenomenon occurs in quantum statistics and indicate how it can be used to find asymptotically optimal state-estimation protocols.
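For the coin-toss model, the second LAN relation can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the construction of [32]: the channel $S_n$ is taken to be rounding of a Gaussian variable to the lattice of values of the local statistic $\sqrt{n}(\theta_n - \theta_0)$, a choice that does not depend on u. The function name `l1_to_rounded_gaussian` and the values $\theta_0 = 0.3$, $u = 1$ are hypothetical.

```python
import math

def l1_to_rounded_gaussian(n, theta0=0.3, u=1.0):
    """L1 distance between the law of sqrt(n) * (mean - theta0) under
    theta = theta0 + u / sqrt(n) and N(u, I(theta0)^{-1}) rounded to the same lattice.
    Rounding is an assumed, u-independent choice of the channel S_n."""
    theta = theta0 + u / math.sqrt(n)
    sd = math.sqrt(theta0 * (1.0 - theta0))    # I(theta0)^{-1/2} for a coin toss
    h = 1.0 / math.sqrt(n)                     # lattice spacing of the local statistic

    def gauss_cdf(x):
        return 0.5 * (1.0 + math.erf((x - u) / (sd * math.sqrt(2.0))))

    total = 0.0
    for k in range(n + 1):
        # Binomial(n, theta) probability of k successes, via log-gamma for stability.
        log_p = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * math.log(theta) + (n - k) * math.log(1.0 - theta))
        x = math.sqrt(n) * (k / n - theta0)    # lattice point of the local statistic
        q = gauss_cdf(x + h / 2) - gauss_cdf(x - h / 2)   # Gaussian mass of the bin
        total += abs(math.exp(log_p) - q)
    # Gaussian mass falling outside the lattice range also contributes to the distance.
    x_lo = -math.sqrt(n) * theta0 - h / 2
    x_hi = math.sqrt(n) * (1.0 - theta0) + h / 2
    return total + gauss_cdf(x_lo) + (1.0 - gauss_cdf(x_hi))

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, l1_to_rounded_gaussian(n))
```

Since the number of heads is a sufficient statistic, comparing its law with the rounded Gaussian is enough for this model; the printed distances are expected to decrease as n grows.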

B. The Cramer-Rao approach to state estimation

Let us consider the problem of estimating a one-dimensional parameter θ given n identical and independent copies of a quantum system prepared in the (possibly mixed) state $\rho_\theta \in M(\mathbb{C}^d)$ that depends smoothly on θ. Following [1–4,21] we analyze an unbiased estimator $\theta_n$ based on the outcome of an arbitrary measurement M on the joint state $\rho_\theta^{\otimes n}$. By the classical Cramer-Rao inequality, the mean-square error (MSE) of $\theta_n$ is lower bounded by the inverse (classical) Fisher information of the measurement outcome:

$$E[(\theta_n - \theta)^2] \ge I_M(\theta)^{-1}.$$

Thus, a “good” measurement is characterized by a large Fisher information, but how large can $I_M(\theta)$ be? The answer is given by the notion of quantum Fisher information associated to the model $\{\rho_\theta : \theta \in \mathbb{R}\}$, which is defined as

$$F(\theta) := \mathrm{Tr}(\rho_\theta L_\theta^2),$$

where the symmetric logarithmic derivative (sld) $L_\theta$ is the quantum analog of $\ell_\theta$ and is the self-adjoint solution of the equation

$$\frac{d\rho_\theta}{d\theta} = \frac{1}{2}\,(L_\theta \rho_\theta + \rho_\theta L_\theta).$$

As expected, for n identical copies, the quantum Fisher information of the joint state $\{\rho_\theta^{\otimes n} : \theta \in \mathbb{R}\}$ is nF(θ) and the sld is given by

$$L_\theta(n) = \sum_{i=1}^{n} L_\theta^{(i)},$$

with $L_\theta^{(i)}$ acting on the ith system.
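The sld equation can be solved numerically in the eigenbasis of $\rho_\theta$, where it reads $L_{jk} = 2\,(d\rho_\theta/d\theta)_{jk}/(p_j + p_k)$. The sketch below is an illustration rather than a construction taken from the paper: the helper `sld_and_qfi` and the qubit family with fixed Bloch-vector length r = 0.8 are assumptions, chosen because the quantum Fisher information of that family is known to equal $r^2$.

```python
import numpy as np

def sld_and_qfi(rho, drho, tol=1e-12):
    """Solve drho = (L rho + rho L) / 2 for a self-adjoint L and return (L, Tr(rho L^2))."""
    p, v = np.linalg.eigh(rho)
    d = v.conj().T @ drho @ v                  # drho written in the eigenbasis of rho
    denom = p[:, None] + p[None, :]
    L_eig = np.zeros_like(d)
    mask = denom > tol                         # on the kernel of rho the sld is not unique
    L_eig[mask] = 2.0 * d[mask] / denom[mask]
    L = v @ L_eig @ v.conj().T
    return L, np.trace(rho @ L @ L).real

# Assumed example: rho_theta = (1 + r (cos(theta) sx + sin(theta) sz)) / 2 with r = 0.8.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
r, theta = 0.8, 0.4
rho = 0.5 * (np.eye(2) + r * (np.cos(theta) * sx + np.sin(theta) * sz))
drho = 0.5 * r * (-np.sin(theta) * sx + np.cos(theta) * sz)

L, F1 = sld_and_qfi(rho, drho)
# Two independent copies: the derivative follows the product rule.
_, F2 = sld_and_qfi(np.kron(rho, rho), np.kron(drho, rho) + np.kron(rho, drho))

if __name__ == "__main__":
    print(F1, F2)
```

For this family the single-copy value should be close to $r^2$ and the two-copy value close to twice that, in line with the additivity stated above.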

The Braunstein-Caves inequality shows that the quantum Fisher information is an upper bound to the classical one [i.e., $I_M(\theta) \le nF(\theta)$] so that we obtain

$$n\,E[(\theta_n - \theta)^2] \ge F(\theta)^{-1}.$$

Moreover, the constant on the right side can be achieved asymptotically by an adaptive measurement which amounts to measuring $L_{\theta_0}(n)$ for some point $\theta_0$ which is a preliminary estimator of θ obtained by measuring a small proportion of the systems. In summary, the optimal estimation rate for one-dimensional parameters is $n^{-1}F(\theta)^{-1}$, and can be achieved by means of separate measurements of the sld’s $L_\theta^{(i)}$.

Let us turn now to the case where $\rho_\theta$ depends on a multidimensional parameter $\theta = (\theta_1, \dots, \theta_p) \in \mathbb{R}^p$. The quantum Fisher information matrix can be defined along similar lines and all previous inequalities hold as matrix inequalities; in particular, the covariance matrix of an unbiased estimator $\theta_n$ satisfies

$$n\,E[(\theta_n - \theta)^T (\theta_n - \theta)] \ge F(\theta)^{-1}. \tag{4}$$

However, unlike the classical case and the one-dimensional quantum case, the right side is in general not achievable, even asymptotically! This purely quantum phenomenon has a simple intuitive explanation: the optimal estimation of the different coordinates $(\theta_1, \dots, \theta_p)$ requires the simultaneous measurement of generally incompatible observables, the associated sld’s $(L_{\theta,1}(n), \dots, L_{\theta,p}(n))$.

Coming back to the original goal of estimating the parameter θ, the above Fisher information analysis implies that the optimal measurement procedures must depend on the chosen figure of merit. Thus, one should not aim at saturating matrix inequalities such as (4) but at finding asymptotically attainable lower bounds for the risk (multiplied by n)

$$n\,E[(\theta_n - \theta)\, G(\theta)\, (\theta_n - \theta)^T] = n\,\mathrm{Tr}[G(\theta)\,\mathrm{Var}(\theta_n)],$$

assuming for simplicity a quadratic loss function with the positive weight matrix G(θ). Taking the trace with G(θ) on both sides of (4) gives the generally nonattainable lower bound $\mathrm{Tr}[G(\theta)F(\theta)^{-1}]$, and other examples can be derived from different versions of the quantum Cramer-Rao inequality such as Belavkin’s right and left inequalities [3]. Holevo [1] derived a more general bound and showed that it is achievable for families of Gaussian states with unknown displacements but, until recently, it remained an open question whether this bound was asymptotically attainable for finite-dimensional states. By further refining the techniques of the unbiased estimation setup, Hayashi and Matsumoto [33] showed that the Holevo bound is indeed asymptotically attainable for general families of two-dimensional quantum states. Complementing this frequentist asymptotic analysis, Bagan and coworkers [34] solved the optimal qubit estimation problem for any given n in the Bayesian setup with invariant priors. However, neither of these approaches was successful in solving the (asymptotically) optimal estimation problem for general (mixed states) multiparametric models $\rho_\theta \in M(\mathbb{C}^k)$ with k > 2.

C. Local asymptotic normality for quantum states

At this point, a natural question to ask is whether the phenomenon of local asymptotic normality occurs also in quantum statistics, and whether it can be used to design asymptotically optimal measurement strategies. Recall that in the classical case, the main idea was that for large n, the i.i.d. model could be approximated by a Gaussian model, in the sense that each can be mapped approximately onto the other by means of classical channels. Building on earlier work by Hayashi [35,36], the quantum version of LAN has been derived in a series of papers [23–26] to which we refer for the details of the constructions. Here we only mention the general result, and discuss in more detail the special case of pure-state models, which is more relevant for the present work.

As in the classical case, we can localize θ to a neighborhood of $\theta_0$ such that $\theta = \theta_0 + u/\sqrt{n}$ with $\|u\| \le n^{\epsilon}$ for some small ε > 0, and we denote $\rho_{n,u} := \rho^{\otimes n}_{\theta_0 + u/\sqrt{n}}$. Then there exist channels (normalized, completely positive linear maps) $T_n$ and $S_n$ between the appropriate spaces such that

$$\lim_{n\to\infty} \sup_{\|u\| \le n^{\epsilon}} \|T_n(\rho_{n,u}) - N_u \otimes \Phi_u\|_1 = 0, \tag{5}$$

$$\lim_{n\to\infty} \sup_{\|u\| \le n^{\epsilon}} \|\rho_{n,u} - S_n(N_u \otimes \Phi_u)\|_1 = 0, \tag{6}$$

where $\{\Phi_u : u \in \mathbb{R}^p\}$ is a family of quantum Gaussian states of a continuous variables system, and $\{N_u : u \in \mathbb{R}^p\}$ is a family of classical Gaussian distributions. Moreover, each family has a fixed covariance matrix and the displacement is a linear transformation of the unknown parameter u. With this tool at hand, one can prove that the Holevo bound is attainable through the following three-step procedure: first localize θ, then send the remaining states through the channel $T_n$, and then apply the optimal measurement for the limit Gaussian model.

We illustrate LAN for the simple case of a one-dimensional family of pure states on $\mathbb{C}^d$:

$$|\psi_\theta\rangle := \exp(-i\theta J)|\psi\rangle, \tag{7}$$

where J is a self-adjoint operator which satisfies $\langle\psi|J|\psi\rangle = 0$. The (nonunique) sld is given by

$$L_\theta = -2i\,[J, |\psi_\theta\rangle\langle\psi_\theta|],$$

and the quantum Fisher information is proportional to the variance of the “generator” J:

$$F(\theta) = \langle\psi_\theta|L_\theta^2|\psi_\theta\rangle = 4\langle\psi|J^2|\psi\rangle := F. \tag{8}$$

In the case of pure states, the limit Gaussian model consists of a family of coherent states so that LAN can be intuitively understood by analyzing the intrinsic geometric structure of the quantum statistical model, which is encoded in the inner products between vectors corresponding to different parameters. Indeed, with the usual definition of the local states $|\psi_{n,u}\rangle := |\psi_{\theta_0 + u/\sqrt{n}}\rangle^{\otimes n}$, a simple calculation shows that

$$\lim_{n\to\infty} \langle\psi_{n,u}|\psi_{n,v}\rangle = \lim_{n\to\infty} \langle\psi|\exp[i(u-v)J/\sqrt{n}]|\psi\rangle^{n} = \exp[-(u-v)^2 F/8] = \langle\sqrt{F/2}\,u|\sqrt{F/2}\,v\rangle, \tag{9}$$

where $|u\rangle$ is a coherent state of a one-mode continuous variable (cv) system, with displacements $\langle u|Q|u\rangle = u$ and $\langle u|P|u\rangle = 0$. This means that, locally, the “shape” of the statistical model for n states converges to that of a family of coherent states. By using the central limit theorem, we can identify the collective observables which converge “in distribution” to the coordinates of the cv system as

$$\sqrt{\frac{2}{nF}}\, J(n) := \sqrt{\frac{2}{nF}} \sum_{i=1}^{n} J^{(i)} \longrightarrow P,$$

$$\frac{1}{\sqrt{2nF}}\, L_{\theta_0}(n) := \frac{1}{\sqrt{2nF}} \sum_{i=1}^{n} L^{(i)}_{\theta_0} \longrightarrow Q,$$

such that the rescaled sld $L_{\theta_0}(n)/\sqrt{n}$ converges to the sld of the limit Gaussian model $\{|\sqrt{F/2}\,u\rangle : u \in \mathbb{R}\}$, as expected. With a more careful analysis of the speed of convergence in (9), it can be shown that the weak LAN can be upgraded to the strong version described by (5) and (6) [20].
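The limit (9) is easy to check numerically for a small system. The following sketch is only an illustration; the function name `local_overlap` and the choice $J = \sigma_x$, $|\psi\rangle = |0\rangle$ (for which $\langle J\rangle = 0$ and $F = 4\langle J^2\rangle = 4$) are assumptions made for the example.

```python
import numpy as np

def local_overlap(J, psi, u, v, n):
    """<psi_{n,u}|psi_{n,v}> for n copies of exp(-i theta J)|psi>,
    with theta = theta0 + u/sqrt(n) and theta0 + v/sqrt(n); theta0 drops out."""
    w, V = np.linalg.eigh(J)
    c = V.conj().T @ psi                        # components of psi in the eigenbasis of J
    phases = np.exp(1j * (u - v) * w / np.sqrt(n))
    single = np.sum(np.abs(c) ** 2 * phases)    # <psi| exp[i (u - v) J / sqrt(n)] |psi>
    return single ** n

# Assumed example: J = sigma_x and psi = |0>.
J = np.array([[0, 1], [1, 0]], dtype=complex)
psi = np.array([1, 0], dtype=complex)
F = 4 * np.vdot(psi, J @ J @ psi).real
u, v = 0.7, -0.3

if __name__ == "__main__":
    for n in (10, 1000, 10 ** 6):
        print(n, local_overlap(J, psi, u, v, n), np.exp(-(u - v) ** 2 * F / 8))
```

In this example the single-copy overlap is $\cos((u-v)/\sqrt{n})$, so the n-copy overlap approaches the Gaussian value $\exp[-(u-v)^2 F/8]$ as n grows.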

The goal of this paper is to derive the weak LAN for the output state of a mixing Markov chain together with its classical counterpart for averages of simple measurements. The discussion around the i.i.d. models will hopefully provide the necessary intuition about the statistical meaning of LAN in the Markov setup and convince the reader that the quantity (16) plays the role of asymptotic quantum Fisher information per atom. We leave the purely technical step of deriving the strong LAN and proving the achievability of the quantum Fisher information to Ref. [20].

III. MIXING QUANTUM MARKOV CHAINS

For later purposes we recall some ergodicity notions for a Markov chain with fixed unitary U. The reduced n-step dynamics of the cavity is given by the CP map $\rho \mapsto T_*^n(\rho)$ where $T_*$ is the “transition operator” $T_*(\rho) = \mathrm{Tr}_a(U\, |\psi\rangle\langle\psi| \otimes \rho\, U^\dagger)$ with the trace taken over the atom. We say that $T_*$ is mixing if it has a unique stationary state $T_*(\rho_{st}) = \rho_{st}$, and any other state converges to $\rho_{st}$, i.e., $\|T_*^n(\rho) - \rho_{st}\|_1 \to 0$. Mixing chains have a simple characterization in terms of the eigenvalues of $T_*$, generalizing the classical Perron-Frobenius theorem.

Theorem 1. $T_*$ is mixing if and only if it has a unique eigenvector with eigenvalue $\lambda_1 = 1$ and all other eigenvalues satisfy |λ| < 1.
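The spectral criterion of Theorem 1 can be tested numerically by writing $T_*$ as a matrix acting on the vectorized cavity state. The sketch below is an illustration under stated assumptions: it borrows the two-level XY interaction $H = i(\sigma_+ \otimes \sigma_- - \sigma_- \otimes \sigma_+)$ from the example section, orders the tensor factors as atom ⊗ cavity, and uses the hypothetical helper names `xy_unitary`, `kraus_operators`, `transition_matrix` and `is_mixing`.

```python
import numpy as np

def xy_unitary(theta):
    """U_theta = exp(-i theta H) for the assumed XY Hamiltonian, atom (x) cavity ordering."""
    sp = np.array([[0, 0], [1, 0]], dtype=complex)    # sigma_+ = |1><0|
    sm = sp.conj().T                                   # sigma_- = |0><1|
    H = 1j * (np.kron(sp, sm) - np.kron(sm, sp))
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * theta * w)) @ V.conj().T

def kraus_operators(U, psi, d=2, k=2):
    """K_a = <a|U|psi>: cavity operators indexed by the atom basis state |a>."""
    U4 = U.reshape(d, k, d, k)                         # (atom out, cavity out, atom in, cavity in)
    return [np.tensordot(U4[a], psi, axes=([1], [0])) for a in range(d)]

def transition_matrix(kraus):
    """Matrix of T_*(rho) = sum_a K_a rho K_a^dagger on the row-major vec(rho)."""
    return sum(np.kron(K, K.conj()) for K in kraus)

def is_mixing(kraus, tol=1e-9):
    """Theorem 1: exactly one eigenvalue of modulus one, and it equals 1."""
    ev = np.linalg.eigvals(transition_matrix(kraus))
    peripheral = ev[np.abs(ev) > 1 - tol]
    return len(peripheral) == 1 and abs(peripheral[0] - 1) < tol

if __name__ == "__main__":
    psi = np.array([0.6, 0.8], dtype=complex)          # assumed input state
    for theta in (0.0, np.pi / 3):
        print(theta, is_mixing(kraus_operators(xy_unitary(theta), psi)))
```

At θ = 0 the unitary is the identity, every cavity state is stationary, and the check fails; for a generic coupling the remaining eigenvalues move strictly inside the unit disk.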

As a corollary, the convergence to equilibrium is essentially exponentially fast, $\|T_*^n(\rho) - \rho_{st}\|_1 \le C_k\, n^k |\lambda_2|^n$, where $\lambda_2$ is the eigenvalue of $T_*$ with the second-largest absolute value [37]. The following theorem is a discrete-time analog of the perturbation theorem 5.13 of [38] and its proof is given in the Appendix.

Theorem 2. Let $T(n) : M(\mathbb{C}^k) \to M(\mathbb{C}^k)$ be a sequence of linear contractions with asymptotic expansion

$$T(n) = T_0 + \frac{1}{\sqrt{n}}\, T_1 + \frac{1}{n}\, T_2 + O(n^{-3/2}), \tag{10}$$

such that $T_0$ is a mixing CP map with stationary state $\rho_{st}$. Then $\mathrm{Id} - T_0$ is invertible on the orthogonal complement of $\mathbf{1}$ with respect to the inner product $\langle A, B\rangle_{st} := \mathrm{Tr}(\rho_{st} A^\dagger B)$, where Id is the identity map. Assuming $\langle \mathbf{1}, T_1(\mathbf{1})\rangle_{st} = 0$ we have

$$\lim_{n\to\infty} T(n)^n[\mathbf{1}] = \exp(\lambda)\,\mathbf{1},$$

where $\lambda = \mathrm{Tr}[\rho_{st}(T_2(\mathbf{1}) + T_1 \circ (\mathrm{Id} - T_0)^{-1} \circ T_1(\mathbf{1}))]$.

From now on we assume that $T = T_\theta$ is mixing. Since the cavity equilibrates exponentially fast we can choose the stationary state $\rho^\theta_{st}$ as initial state, without affecting the asymptotic results below.

IV. MAIN RESULTS

This section contains the main results of the paper. The first subsection deals with estimators based on outcomes of separate identical measurements on the output atoms. In Theorem 3 we prove the asymptotic normality of the outcomes’ time average and find the explicit expression of the asymptotic Fisher information of such statistics. Similar results can be obtained for time averages of functionals depending on several outcomes, such as the correlations between subsequent atoms. In general these will provide higher Fisher information since the measurement outcomes are not independent but have exponentially decaying correlations. It remains an open problem to find the “full” Fisher information of the measurement stochastic process. The second subsection deals with the intrinsic statistical properties of the quantum model $\psi^n_\theta$. In Theorem 4 we prove that the quantum model is asymptotically normal, and we find the explicit expression of the quantum Fisher information F per atom.

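Consider a simple measurement scheme where an observable $A \in M(\mathbb{C}^d)$ is measured on each of the outgoing atoms, and let $A^{(l)}$ be the random outcome of the measurement on the lth atom. By stationarity, all expectations $E_\theta(A^{(l)})$ are equal to

$$\langle A\rangle_\theta := \mathrm{Tr}\big[|\psi\rangle\langle\psi| \otimes \rho^\theta_{st}\; U_\theta^\dagger (A \otimes \mathbf{1}) U_\theta\big],$$

and by ergodicity of the measurement process [39]

$$A_n := \frac{1}{n} \sum_{l=1}^{n} A^{(l)} \to \langle A\rangle_\theta \quad \text{almost surely.}$$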

Generically the right side depends smoothly on θ and can be inverted (at least locally) so that $\theta = f(\langle A\rangle_\theta)$ for some well-behaved function f, providing us with the estimator $\theta_n := f(A_n)$. As argued above, to analyze its asymptotic performance we can take $\theta = \theta_0 + u/\sqrt{n}$ with $\theta_0$ fixed and u an unknown “local parameter.”

Theorem 3. Let $\theta = \theta_0 + u/\sqrt{n}$ with $\theta_0$ fixed and let $A \in M(\mathbb{C}^d)_{sa}$ be such that $\langle A\rangle_{\theta_0} = 0$. Then $\sqrt{n} A_n$ is asymptotically normal; that is, as n → ∞

$$E_{\theta_0 + u/\sqrt{n}}[\exp(it\sqrt{n} A_n)] \to F_u(t),$$

where $F_u(t) := \exp[i\mu(A)ut - \sigma^2(A)t^2/2]$ is the characteristic function of the distribution $N(\mu(A)u, \sigma^2(A))$ with

$$\mu(A) := i\,\langle [H, A \otimes \mathbf{1} + \mathbf{1} \otimes B]\rangle_{\theta_0}, \tag{11}$$

$$\sigma^2(A) := \langle A^2\rangle_{\theta_0} + 2\langle A \otimes B\rangle_{\theta_0}, \tag{12}$$

$$B := (\mathrm{Id} - T_0)^{-1}\big(\langle\psi| U_{\theta_0}^\dagger (A \otimes \mathbf{1}) U_{\theta_0} |\psi\rangle\big). \tag{13}$$

Note that, for u = 0, we obtain a quantum extension of the central limit theorem for Markov chains [28].

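Before proving the theorem we show that $\theta_n$ is asymptotically normal and find its mean-square error. By expanding $\theta_n := f(A_n)$ around $\langle A\rangle_{\theta_0}$ and using the property $f'(\langle A\rangle_\theta) = (d\langle A\rangle_\theta/d\theta)^{-1}$ we have

$$\theta_n = f(\langle A\rangle_{\theta_0}) + f'(\langle A\rangle_{\theta_0})\,(A_n - \langle A\rangle_{\theta_0}) + O_P(n^{-1}) = \theta_0 + \mu(A)^{-1} A_n + O_P(n^{-1}),$$

so that

$$\lim_{n\to\infty} n\,E[(\theta_n - \theta)^2] = \lim_{n\to\infty} E\left[\left(\frac{\sqrt{n} A_n}{\mu(A)} - u\right)^2\right] = \frac{\sigma^2(A)}{\mu(A)^2}.$$

The limit can be seen as the inverse (classical) Fisher information per measured atom, which in principle can be minimized by varying A and/or the input state ψ.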

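Proof. By using (1) we can rewrite the characteristic function as

$$E_\theta[\exp(it\sqrt{n} A_n)] = \langle \exp(it\sqrt{n} A_n)\rangle_\theta = \langle T(n)^n[\mathbf{1}]\rangle_\theta,$$

where $T(n) : M(\mathbb{C}^k) \to M(\mathbb{C}^k)$ is the map

$$T(n) : X \mapsto E_{\theta_0}\big[e^{iHu/\sqrt{n}}\,(e^{itA/\sqrt{n}} \otimes X)\, e^{-iHu/\sqrt{n}} \,\big|\, s\big],$$

with $E_{\theta_0}[Y|s] := \langle\psi| U_{\theta_0}^\dagger Y U_{\theta_0} |\psi\rangle$ being the “conditional expectation” onto the system. We expand T(n) as in (10) with

$$T_0(X) := E_{\theta_0}[\mathbf{1} \otimes X \,|\, s],$$

$$T_1(X) := i\,E_{\theta_0}\big[u\,[H, \mathbf{1} \otimes X] + tA \otimes X \,\big|\, s\big],$$

$$T_2(X) := -\frac{u^2}{2} E_{\theta_0}\big[[H, [H, \mathbf{1} \otimes X]] \,\big|\, s\big] - \frac{t^2}{2} E_{\theta_0}[A^2 \otimes X \,|\, s] - ut\, E_{\theta_0}\big[[H, A \otimes X] \,\big|\, s\big].$$

Since $\mathrm{Tr}[\rho_{st} T_1(\mathbf{1})] = it\langle A\rangle_{\theta_0} = 0$ we can apply Theorem 2, so we only need to compute the coefficient λ. The first part is $\mathrm{Tr}[\rho_{st} T_2(\mathbf{1})] = -t^2\langle A^2\rangle_{\theta_0}/2 - ut\langle [H, A \otimes \mathbf{1}]\rangle_{\theta_0}$ and the second part is $it\,\mathrm{Tr}[\rho_{st} T_1(B)]$, with B defined in (13). □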

A. Quantum Fisher information and LAN

Recall that the joint state of the system and output atoms is

$$\psi^n_\theta = U_\theta(n)(\psi^{\otimes n} \otimes \varphi) := U^{(1)} \cdots U^{(n)} (\psi^{\otimes n} \otimes \varphi) = \exp(-i\theta H^{(1)}) \cdots \exp(-i\theta H^{(n)}) (\psi^{\otimes n} \otimes \varphi),$$

where $H^{(i)}$ represents the copy of the interaction Hamiltonian which acts on the system and the ith atom and is ampliated by the identity on the rest of the atoms. It is important to note that in general the commutators $[H^{(i)}, H^{(j)}]$ are nonzero for i ≠ j since both Hamiltonians contain system operators. This means that the model $\psi^n_\theta$ is not a covariant one as (7); that is, we cannot write it as $\psi^n_\theta = \exp[-i\theta H(n)](\psi^{\otimes n} \otimes \varphi)$ for some “total Hamiltonian” H(n). In particular, as we will see, the quantum Fisher information depends on θ. However, we can write

$$\frac{d\rho^n_\theta}{d\theta} = -i\,[H_\theta(n), \rho^n_\theta],$$

where $\rho^n_\theta = |\psi^n_\theta\rangle\langle\psi^n_\theta|$ and

$$H_\theta(n) = \sum_{i=1}^{n} H_\theta^{(i)} := \sum_{i=1}^{n} U_\theta^{(1)} \cdots U_\theta^{(i-1)}\, H^{(i)}\, U_\theta^{(i-1)*} \cdots U_\theta^{(1)*}.$$

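As in (8) it follows that the quantum Fisher information $F^{(n)}(\theta)$ is

$$F^{(n)}(\theta) = 4\langle H_\theta(n)^2\rangle_{\theta,n} - 4\langle H_\theta(n)\rangle^2_{\theta,n}.$$

Here and in the next few lines the expectation $\langle\cdot\rangle_{\theta,n}$ is taken with respect to the state $\psi^n_\theta$, rather than the stationary state whose expectation is denoted $\langle\cdot\rangle_\theta$. We will see that, in asymptotics, we can revert to the stationary-state expectation. Ignoring for the moment the second term on the right side, we write

$$\langle H_\theta(n)^2\rangle_{\theta,n} = \sum_{i=1}^{n} \big\langle (H_\theta^{(i)})^2 \big\rangle_{\theta,n} + \sum_{1 \le i < j \le n} \big\langle \{H_\theta^{(i)}, H_\theta^{(j)}\} \big\rangle_{\theta,n}.$$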

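Now, by ergodicity we have

$$\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \big\langle (H_\theta^{(i)})^2 \big\rangle_{\theta,n} = \langle H^2\rangle_\theta.$$

Similarly, by using the Markov property, it can be shown that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{1 \le i < j \le n} \big\langle \{H_\theta^{(i)}, H_\theta^{(j)}\} \big\rangle_{\theta,n} = \sum_{i=0}^{\infty} \big\langle \{H, T_0^i(K)\} \big\rangle_\theta,$$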

where $K := \langle\psi|H|\psi\rangle \in M(\mathbb{C}^k)$. Assuming that $\langle H\rangle_\theta = 0$, the bias term $\langle H_\theta(n)\rangle^2_{\theta,n}$ is sublinear and, from the above limits, we obtain

$$\lim_{n\to\infty} \frac{1}{n} F^{(n)}(\theta_0) = F(\theta_0), \tag{14}$$

with $F(\theta_0)$ being the asymptotic quantum Fisher information per atom given by (16). The next theorem strengthens this conclusion, by showing that the quantum model $\psi^n_\theta$ converges (locally around any $\theta_0 \ne 0$) to a coherent state model with quantum Fisher information $F(\theta_0)$.

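Theorem 4. Suppose that $\langle H\rangle_{\theta_0} = 0$. Then the family of (local) output states $\{\psi^n_{\theta_0 + u/\sqrt{n}} : u \in \mathbb{R}\}$ converges to a family of coherent states $\{|\sqrt{F/2}\,u\rangle : u \in \mathbb{R}\}$ with one-dimensional displacement $\langle\sqrt{F/2}\,u|Q|\sqrt{F/2}\,u\rangle = \sqrt{F/2}\,u$. More precisely, as n → ∞

$$\big\langle \psi^n_{\theta_0 + u/\sqrt{n}} \big| \psi^n_{\theta_0 + v/\sqrt{n}} \big\rangle \to e^{ia(u^2 - v^2)} \langle\sqrt{F/2}\,u|\sqrt{F/2}\,v\rangle = e^{ia(u^2 - v^2)}\, e^{-F(u-v)^2/8}, \tag{15}$$

where $a \in \mathbb{R}$ is a constant and $F = F(\theta_0)$ is the asymptotic quantum Fisher information (per atom)

$$F(\theta_0) := 4\big[\langle H^2\rangle_{\theta_0} + \langle\{H, (\mathrm{Id} - T_0)^{-1}(K)\}\rangle_{\theta_0}\big], \tag{16}$$

with $K := E_{\theta_0}[H|s] = \langle\psi|H|\psi\rangle$.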

Note that $e^{ia(u^2 - v^2)}$ is an irrelevant phase factor which can be absorbed in the definition of the limit states. The claim that F is the asymptotic quantum Fisher information (per atom) of the output follows from the convergence (15) to the one-dimensional coherent state model $\{|\sqrt{F/2}\,u\rangle : u \in \mathbb{R}\}$ and the fact that the latter has quantum Fisher information F (cf. [2,21]) and agrees with the limit (14). Note also that, similarly to (12), the left side of (16) can be identified with the asymptotic variance appearing in the CLT for the operator 2H. This agrees with the formula of the quantum Fisher information for unitary families of pure states, as variance of the driving Hamiltonian, as shown in Sec. II. By extending the weak convergence to strong convergence as described in Sec. II and applying similar techniques as in the i.i.d. case [25], it can be shown that there exists a two-step adaptive measurement which asymptotically achieves the smallest possible variance

$$n\,E[(\theta_n - \theta)^2] \to F^{-1}, \qquad \theta_n := \theta_0 + u_n/\sqrt{n}. \tag{17}$$

Since for coherent models the optimal measurement is that of the canonical variable which is conjugate to the “driving” one, we conjecture that the quantum Fisher information is achieved by measuring a collective observable which is conjugate to the Hamiltonian H in a more general Markov CLT than that of Theorem 4.

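Proof. By (1) we can rewrite the inner product as

$$\big\langle \psi^n_{\theta_0 + u/\sqrt{n}} \big| \psi^n_{\theta_0 + v/\sqrt{n}} \big\rangle = \mathrm{Tr}\big[\rho\, T(n)^n[\mathbf{1}]\big],$$

where

$$T(n) : X \mapsto E_{\theta_0}\big[e^{iuH/\sqrt{n}}\,(\mathbf{1} \otimes X)\, e^{-ivH/\sqrt{n}} \,\big|\, s\big].$$

Its expansion is

$$T_0(X) := E_{\theta_0}[\mathbf{1} \otimes X \,|\, s],$$

$$T_1(X) := i\,E_{\theta_0}[uH(\mathbf{1} \otimes X) - v(\mathbf{1} \otimes X)H \,|\, s],$$

$$T_2(X) := -\frac{1}{2} E_{\theta_0}[u^2 H^2 (\mathbf{1} \otimes X) + v^2 (\mathbf{1} \otimes X) H^2 \,|\, s] + E_{\theta_0}[uv\, H(\mathbf{1} \otimes X)H \,|\, s],$$

and the condition of Theorem 2 holds: $\mathrm{Tr}[\rho_{st} T_1(\mathbf{1})] = i(u - v)\langle H\rangle_{\theta_0} = 0$. Since $T_1(\mathbf{1}) = i(u - v)K$, the contribution from $\mathrm{Tr}[\rho_{st}\, T_1 \circ (\mathrm{Id} - T_0)^{-1} \circ T_1(\mathbf{1})]$ can be written as

$$-(u - v)^2\, \mathrm{Re}\langle H (\mathrm{Id} - T_0)^{-1}(K)\rangle_{\theta_0} - i(u^2 - v^2)\, \mathrm{Im}\langle H (\mathrm{Id} - T_0)^{-1}(K)\rangle_{\theta_0}.$$

Moreover, $\mathrm{Tr}[\rho_{st} T_2(\mathbf{1})] = -(u - v)^2 \langle H^2\rangle_{\theta_0}/2$, so by adding the two contributions we obtain the desired result. □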
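Theorem 4 can be probed numerically by comparing formula (16) with the inner product computed from the finite-n map T(n) in the proof. The sketch below is an illustration under several assumptions that are not in the original text: it uses the two-level XY Hamiltonian of the example section with atom ⊗ cavity ordering, a real input state (so that $\langle H\rangle_{\theta_0} = 0$), the stationary state as initial cavity state, and a Moore-Penrose pseudo-inverse in place of $(\mathrm{Id} - T_0)^{-1}$, which differs from it only by a multiple of $\mathbf{1}$ that drops out when $\langle H\rangle_{\theta_0} = 0$. All function names are hypothetical.

```python
import numpy as np

SP = np.array([[0, 0], [1, 0]], dtype=complex)                   # sigma_+ = |1><0|
H = 1j * (np.kron(SP, SP.conj().T) - np.kron(SP.conj().T, SP))   # atom (x) cavity ordering

def isometry(theta, psi):
    """V = U_theta (|psi> (x) 1), so that E[Y|s] = V^dagger Y V."""
    w, W = np.linalg.eigh(H)
    U = (W * np.exp(-1j * theta * w)) @ W.conj().T
    return U @ np.kron(psi.reshape(-1, 1), np.eye(2))

def stationary_state(V):
    """Eigenvector of T_* with eigenvalue 1; V.reshape(2, 2, 2)[a] is K_a = <a|U|psi>."""
    M = sum(np.kron(K, K.conj()) for K in V.reshape(2, 2, 2))
    ev, vecs = np.linalg.eig(M)
    rho = vecs[:, np.argmin(np.abs(ev - 1))].reshape(2, 2)
    return rho / np.trace(rho)

def asymptotic_qfi(theta0, psi):
    """Formula (16), using 2 Re<H (1 (x) B)> for the anticommutator term."""
    V = isometry(theta0, psi)
    rho = stationary_state(V)
    cond = lambda Y: V.conj().T @ Y @ V                          # E[Y|s]
    expect = lambda Y: np.trace(rho @ cond(Y))                   # stationary expectation
    K = cond(H)
    T0 = sum(np.kron(Ka.conj().T, Ka.T) for Ka in V.reshape(2, 2, 2))
    B = (np.linalg.pinv(np.eye(4) - T0, rcond=1e-10) @ K.reshape(-1)).reshape(2, 2)
    return 4 * (expect(H @ H) + 2 * expect(H @ np.kron(np.eye(2), B))).real

def overlap(theta0, psi, u, v, n):
    """Tr[rho_st T(n)^n[1]] with T(n): X -> V_u^dagger (1 (x) X) V_v."""
    Vu = isometry(theta0 + u / np.sqrt(n), psi).reshape(2, 2, 2)
    Vv = isometry(theta0 + v / np.sqrt(n), psi).reshape(2, 2, 2)
    rho = stationary_state(isometry(theta0, psi))
    Tn = sum(np.kron(Ku.conj().T, Kv.T) for Ku, Kv in zip(Vu, Vv))
    X = (np.linalg.matrix_power(Tn, n) @ np.eye(2).reshape(-1)).reshape(2, 2)
    return np.trace(rho @ X)

if __name__ == "__main__":
    psi = np.array([0.6, 0.8], dtype=complex)                    # assumed real input state
    theta0, u, v, n = np.pi / 3, 0.5, -0.5, 10 ** 8
    F = asymptotic_qfi(theta0, psi)
    F_from_overlap = -8 * np.log(abs(overlap(theta0, psi, u, v, n))) / (u - v) ** 2
    print(F, F_from_overlap)
```

According to (15), the modulus of the overlap should approach $e^{-F(u-v)^2/8}$, so the two printed numbers are expected to agree for large n if the conventions assumed here match those of the theorem.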

V. EXAMPLE

In this section we illustrate the main results for the classical and quantum Fisher information for a simple discrete-time XY-interaction model. We also analyze the behavior of the quantum Fisher information in the vicinity of the point θ = 0 where the chain is not ergodic and find that it is quadratic rather than linear in n and is concentrated in the system rather than the output.



Let $U_\theta = \exp(-i\theta H)$ with H the 2-spin “creation-annihilation” Hamiltonian $H = i(\sigma_+ \otimes \sigma_- - \sigma_- \otimes \sigma_+)$ where $\sigma_\pm$ are the raising and lowering operators on $\mathbb{C}^2$, respectively, and let $|\psi\rangle := a|0\rangle + b e^{if}|1\rangle$ be the input state, with $a^2 + b^2 = 1$. The transition matrix $T_0$ has eigenvalues

$$\Big\{1,\; c,\; \big[c(c+1) \pm \sqrt{c^2(1-c)^2 - 16a^2b^2c(1-c^2)}\,\big]/2\Big\},$$

and is mixing if and only if $c := \cos\theta_0 \ne 1$. The quantum Fisher information (16) is

$$F := \frac{16a^4b^4}{(1-c)(1-c+4a^2b^2c)}, \tag{18}$$

which is independent of the phase f. This can be compared with the value of the classical Fisher information $\mu(\sigma_{\vec n})^2/\sigma^2(\sigma_{\vec n})$ obtained by measuring the spin in the direction $\vec n$ for each outgoing atom (cf. Theorem 3). For c = 0.5, f = 0, and b = 0.8, the quantum Fisher information is F = 5.03 while that of the spin measurement varies from 0 to 1.15 (see Fig. 2).

An interesting, and perhaps surprising, feature of the quantum Fisher information (18) is that it diverges at vanishing coupling constant, due to the factor 1 − cos(θ) ≈ θ²/2 in the denominator. This singularity arises from the second term on the right side of (16) and stems from the fact that the Markov chain is not mixing at θ = 0. To get an intuition for this phenomenon let us compute (according to the standard methodology [1,2,21]) the quantum Fisher information at θ = 0 for the family

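\[
\psi^{n}_{\theta} := U_{\theta}(n)\left(\psi^{\otimes n}\otimes\varphi\right),
\]

where the input state is chosen to be |ψ〉 = (|0〉 + |1〉)/√2 and the initial state of the “cavity” is |ϕ〉 = |0〉. Since

\[
-iH(n) := \left.\frac{dU_{\theta}(n)}{d\theta}\right|_{\theta=0} = \sum_{i=1}^{n}\left(\sigma_{+}\otimes\sigma_{-}^{(i)} - \sigma_{-}\otimes\sigma_{+}^{(i)}\right),
\]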

one can easily verify that

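\[
\left\langle \psi^{n}_{\theta=0}\right| H(n) \left|\psi^{n}_{\theta=0}\right\rangle = 0,
\]

and hence the quantum Fisher information at θ = 0 is

\[
F(n) = 4\left\langle \psi^{n}_{\theta=0}\right| H(n)^{2} \left|\psi^{n}_{\theta=0}\right\rangle = 4\left\| H(n)\left|\psi^{n}_{\theta=0}\right\rangle \right\|^{2}.
\]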

FIG. 2. (Color online) Classical Fisher information for spin measurements as a function of the spin components ny, nz.

Since σ−|0〉 = 0 the latter reduces to computing the squared norm of the vector

\[
\left(\sum_{i=1}^{n}\sigma_{-}^{(i)}\right)\left(\frac{|0\rangle+|1\rangle}{\sqrt{2}}\right)^{\otimes n} = \sum_{i_1,\dots,i_n} c(i_1,\dots,i_n)\,|i_1,\dots,i_n\rangle,
\]

where $i_j \in \{0,1\}$ are the basis indices. Since a basis vector containing p indices equal to 0 can be obtained in p different ways by applying the lowering operator to an input tensor, the coefficients are

\[
c(i_1,\dots,i_n) = \left(n - \sum_{j=1}^{n} i_j\right) 2^{-n/2},
\]

and the Fisher information, which is four times the squared norm, is

\[
F(n) = 4\cdot 2^{-n}\sum_{p=0}^{n} p^{2}\binom{n}{p} = n(n+1).
\]

Thus, in contrast to the case of independent systems, the Fisher information scales quadratically rather than linearly with the number of systems, hence the divergence of the quantum Fisher information per atom, which represents the asymptotic value of F(n)/n. Note that similar quadratic scaling of the quantum Fisher information is encountered in phase estimation [7] and more generally in optimal estimation of unitary channels [40], with the difference that it holds for any parameter rather than at a single point.
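The quadratic scaling F(n) = n(n + 1) can be confirmed for small n. The sketch below evaluates the binomial sum and, independently, the squared norm of the vector obtained by applying the collective lowering operator to the input tensor. It assumes that the cavity factor σ+|0〉 has unit norm, so that only the n-atom part needs to be built; the function names are illustrative.

```python
import numpy as np
from math import comb

def qfi_binomial(n):
    """4 * 2^{-n} * sum_p p^2 C(n,p), i.e. four times the squared norm of H(n)|psi>."""
    return 4 * sum(p * p * comb(n, p) for p in range(n + 1)) / 2**n

def qfi_brute_force(n):
    """4 * ||(sum_i sigma_-^{(i)}) psi^{(x)n}||^2 on the 2^n-dimensional atom space."""
    sm = np.array([[0.0, 1.0], [0.0, 0.0]])    # lowering operator: sm|1> = |0>, sm|0> = 0
    plus = np.array([1.0, 1.0]) / np.sqrt(2)   # input state (|0> + |1>)/sqrt(2)
    state = np.array([1.0])
    for _ in range(n):
        state = np.kron(state, plus)
    total = np.zeros_like(state)
    for i in range(n):
        op = np.array([[1.0]])
        for j in range(n):                     # sigma_- on atom i, identity elsewhere
            op = np.kron(op, sm if j == i else np.eye(2))
        total += op @ state
    return 4 * float(total @ total)

for n in range(1, 6):
    print(n, qfi_binomial(n), qfi_brute_force(n), n * (n + 1))
```

The brute-force version is exponential in n and is only meant as a cross-check of the combinatorial argument.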

We take a closer look at the states $\psi^{n}_{\theta}$ by scaling the parameter as θ = u/n, as suggested by the Fisher information. This means that we know θ with an accuracy of $n^{-1}$ (rather than the usual $n^{-1/2}$) and we would like to find out whether the estimation of u “stabilises” in the asymptotic regime. By using the same technique as in Theorem 4 we have

\[
\left\langle \psi^{n}_{u/n} \,\middle|\, \psi^{n}_{v/n}\right\rangle = \langle 0|\,T(n)^{n}[\mathbf{1}]\,|0\rangle, \tag{19}
\]

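where $T(n) : M(\mathbb{C}^{k}) \to M(\mathbb{C}^{k})$ is the map

\[
T(n) : X \mapsto \langle\psi|\,e^{iuH/n}\,(\mathbf{1}\otimes X)\,e^{-ivH/n}\,|\psi\rangle.
\]

We expand T(n) as

\[
T(n) = \mathrm{Id} + \frac{T_0}{n} + O(n^{-2}),
\]

where

\[
T_0 : X \mapsto i(uKX - vXK), \qquad K := \langle\psi|H|\psi\rangle.
\]

Plugging into (19) we obtain the limit

\[
\lim_{n\to\infty}\left\langle \psi^{n}_{u/n} \,\middle|\, \psi^{n}_{v/n}\right\rangle = \langle 0|\exp(T_0)[\mathbf{1}]|0\rangle = \langle 0|\exp[i(u-v)K]|0\rangle. \tag{20}
\]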

This agrees with the quadratic scaling of the quantum Fisher information at θ = 0 and shows that the quantum statistical model has a limit provided that the right scaling of parameters is used.
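The limit (20) can be illustrated numerically for the XY model of this section. The sketch below iterates the map T(n) on the identity and compares 〈0|T(n)^n[1]|0〉 with 〈0|exp[i(u − v)K]|0〉. It assumes the ordering system ⊗ atom (so 1 ⊗ X in the text is written X ⊗ 1 in the code), the convention σ+ = |1〉〈0|, and the input |ψ〉 = (|0〉 + |1〉)/√2; the helper names are ours.

```python
import numpy as np

def xy_hamiltonian():
    sp = np.array([[0, 0], [1, 0]], dtype=complex)  # raising operator |1><0| (assumed)
    sm = sp.conj().T
    return 1j * (np.kron(sp, sm) - np.kron(sm, sp))

def unitary(H, t):
    """exp(-i t H) for Hermitian H via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * t * w)) @ V.conj().T

PSI = np.array([1, 1], dtype=complex) / np.sqrt(2)
EMBED = np.kron(np.eye(2), PSI.reshape(2, 1))       # x -> x (x) psi

def overlap(n, u, v):
    """<0| T(n)^n[1] |0> with T(n): X -> <psi| e^{iuH/n} (X (x) 1) e^{-ivH/n} |psi>."""
    H = xy_hamiltonian()
    left = EMBED.conj().T @ unitary(H, u / n).conj().T   # <psi| e^{+iuH/n}
    right = unitary(H, v / n) @ EMBED                    # e^{-ivH/n} |psi>
    X = np.eye(2, dtype=complex)
    for _ in range(n):
        X = left @ np.kron(X, np.eye(2)) @ right
    return X[0, 0]

def limit_overlap(u, v):
    """<0| exp(i(u-v)K) |0> with K = <psi|H|psi> computed from H and psi."""
    K = EMBED.conj().T @ xy_hamiltonian() @ EMBED
    return unitary(K, -(u - v))[0, 0]
```

With this input state K works out to σy/2, so the limiting overlap is cos((u − v)/2); the finite-n value is expected to approach it with an error of order 1/n.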

Let us now look at the n-step reduced dynamics of the system for the same scaling θ = u/n with initial state |0〉. After a similar computation (with u = v) we obtain

\[
\lim_{n\to\infty}\rho(n) = \lim_{n\to\infty} T_{*}(n)^{n}\left(|0\rangle\langle 0|\right) = \exp(-iuK)\,|0\rangle\langle 0|\,\exp(iuK), \tag{21}
\]

which means that, for large n, the system is effectively unitarily driven by the “Hamiltonian” K, even though we started with an open system dynamics! This effect is interesting in itself and is reminiscent of the quantum Zeno effect. From (20) and (21) we can conclude that, asymptotically, the system and the output have pure states and, moreover, the input passes undisturbed into the output:

\[
\left|\psi^{n}_{u/n}\right\rangle \approx |\psi\rangle^{\otimes n}\otimes\exp(-iuK)|0\rangle.
\]

In conclusion, unlike the ergodic case where the output contains information about the parameter, it is the system’s state which carries all the information, and successive time steps amount to a simple unitary rotation. In particular, this explains the quadratic scaling of the quantum Fisher information F(n). The nonergodic setup thus exhibits interesting statistical features and should be analyzed on its own in more detail. For practical applications it is important to see whether the quadratically scaled Fisher information is achievable asymptotically.
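The effective unitary rotation in (21) can also be observed numerically. The sketch below applies the Schrödinger-picture channel of the XY model n times with θ = u/n to the initial state |0〉〈0| and compares the result with exp(−iuK)|0〉〈0|exp(iuK). The ordering system ⊗ atom, the convention σ+ = |1〉〈0|, and all function names are assumptions made for illustration.

```python
import numpy as np

PSI = np.array([1, 1], dtype=complex) / np.sqrt(2)   # input state (|0> + |1>)/sqrt(2)

def kraus_and_K(theta):
    """Kraus operators K_j = (1 (x) <j|) exp(-i theta H) (1 (x) |psi>) and K = <psi|H|psi>."""
    sp = np.array([[0, 0], [1, 0]], dtype=complex)   # raising operator |1><0| (assumed)
    sm = sp.conj().T
    H = 1j * (np.kron(sp, sm) - np.kron(sm, sp))
    w, V = np.linalg.eigh(H)
    U = (V * np.exp(-1j * theta * w)) @ V.conj().T
    I2 = np.eye(2)
    embed = np.kron(I2, PSI.reshape(2, 1))           # x -> x (x) psi
    kraus = [np.kron(I2, I2[j].reshape(1, 2)) @ U @ embed for j in range(2)]
    return kraus, embed.conj().T @ H @ embed

def reduced_state(n, u):
    """n-step reduced system state for theta = u/n, starting from |0><0|."""
    kraus, _ = kraus_and_K(u / n)
    rho = np.array([[1, 0], [0, 0]], dtype=complex)
    for _ in range(n):
        rho = sum(Kj @ rho @ Kj.conj().T for Kj in kraus)
    return rho

def rotated_state(u):
    """exp(-iuK)|0><0|exp(iuK), the right-hand side of (21)."""
    _, K = kraus_and_K(0.0)
    w, V = np.linalg.eigh(K)
    phi = (V * np.exp(-1j * u * w)) @ V.conj().T @ np.array([1, 0], dtype=complex)
    return np.outer(phi, phi.conj())
```

For moderate u the purity Tr ρ(n)² should be close to 1 already for a few thousand steps, consistent with the claim that the system state stays asymptotically pure.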

VI. CONCLUSIONS AND OUTLOOK

Quantum system identification is an area of significant practical relevance with interesting statistical problems going beyond the state-estimation framework. We showed that, in the Markovian setup, this problem is very tractable thanks to the asymptotic normality satisfied by the output state and the time average of the simple measurement process. This may come as a surprise considering that the output is correlated, but it is in perfect agreement with the classical theory of Markov chains where similar results hold [27]. The theorems can be extended to strong LAN with multiple parameters, continuous time dynamics, and measurements on several atoms [20]. However, as in process tomography, full identification of the unitary requires the preparation of different input states. The optimisation of these states and the case of nonmixing chains are interesting open problems. Another open problem is to find the classical Fisher information of the simple measurement process, rather than that of time-averaged functionals. Since Markov chains are closely related to matrix product states [41,42], our results are also relevant for estimating matrix product states [43].

When the chain is nonergodic, the Fisher information may exhibit qualitatively different behavior, such as quadratic rather than linear scaling with the number of output systems, which bears some similarity to that encountered in phase estimation [7]. Understanding this behavior in a more general scenario, and the possible applications in precision metrology, are topics for future research.

ACKNOWLEDGMENTS

This work is supported by the EPSRC Fellowship EP/E052290/1. The author thanks Luc Bouten for many discussions and for his help in preparing the paper.

APPENDIX: PROOF OF THEOREM 2

Since $T_0$ is a mixing CP map, the identity is the unique eigenvector with eigenvalue 1 and all other eigenvalues have absolute values strictly smaller than 1. For sufficiently large n, T(n) has the same spectral gap property and we denote by (λ(n), x(n)) its largest eigenvalue and the corresponding eigenvector, such that λ(n) → 1 and x(n) → 1.

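Since T(n) is a contraction,

\[
\left\|T(n)^{n}(\mathbf{1}) - T(n)^{n}(x(n))\right\| \le \left\|T(n)^{n}\right\|\,\left\|\mathbf{1} - x(n)\right\| \xrightarrow[n\to\infty]{} 0.
\]

On the other hand, since $T(n)^{n}(x(n)) = \lambda(n)^{n}x(n)$, we have $T(n)^{n}(\mathbf{1}) \to \exp(\lambda)\mathbf{1}$, provided that the following limit exists:

\[
\lim_{n\to\infty}\lambda(n)^{n} := \exp(\lambda).
\]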

We prove that this is the case by using the Taylor expansions

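\[
\begin{aligned}
\lambda(n) &= \lambda_0 + \frac{1}{\sqrt{n}}\lambda_1 + \frac{1}{n}\lambda_2 + O(n^{-3/2}),\\
x(n) &= x_0 + \frac{1}{\sqrt{n}}x_1 + \frac{1}{n}x_2 + O(n^{-3/2}).
\end{aligned}
\]

Then we can solve the eigenvalue problem in successive orders of approximation:

\[
\begin{aligned}
T_0(x_0) &= \lambda_0 x_0,\\
T_0(x_1) + T_1(x_0) &= \lambda_0 x_1 + \lambda_1 x_0, \qquad\qquad \text{(A1)}\\
T_0(x_2) + T_1(x_1) + T_2(x_0) &= \lambda_0 x_2 + \lambda_1 x_1 + \lambda_2 x_0.
\end{aligned}
\]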

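From the first equation we have $\lambda_0 = 1$, $x_0 = \mathbf{1}$. Inserting into the second equation we get

\[
(T_0 - \mathrm{Id})(x_1) = -(T_1 - \lambda_1\mathrm{Id})(\mathbf{1}),
\]

and by taking the inner product with $\mathbf{1}$ we obtain $\lambda_1 = \langle\mathbf{1},T_1(\mathbf{1})\rangle_{\rm st} = 0$ by using the assumption. Hence,

\[
x_1 = (\mathrm{Id} - T_0)^{-1}\circ T_1(\mathbf{1}) + c\mathbf{1},
\]

where $(\mathrm{Id} - T_0)^{-1}$ denotes the inverse of the restriction of $\mathrm{Id} - T_0$ to the orthogonal complement of $\mathbf{1}$, and c is a constant.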

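Similarly, from Eq. (A1),

\[
(T_0 - \mathrm{Id})(x_2) = -T_1(x_1) + (\lambda_2\mathrm{Id} - T_2)(\mathbf{1}),
\]

which implies

\[
\begin{aligned}
\lambda_2 &= \langle\mathbf{1},T_2(\mathbf{1})\rangle_{\rm st} + \langle\mathbf{1},T_1(x_1)\rangle_{\rm st}\\
&= \left\langle\mathbf{1},\,T_2(\mathbf{1}) + T_1\circ(\mathrm{Id}-T_0)^{-1}\circ T_1(\mathbf{1})\right\rangle_{\rm st}.
\end{aligned}
\]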

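Finally,

\[
\lim_{n\to\infty}\lambda(n)^{n} = \lim_{n\to\infty}\left(\lambda_0 + \frac{\lambda_1}{\sqrt{n}} + \frac{\lambda_2}{n} + O(n^{-3/2})\right)^{n} = \lim_{n\to\infty}\left(1 + \frac{\lambda_2}{n}\right)^{n} = \exp(\lambda_2). \qquad \square
\]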
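The second-order formula for λ2 can be illustrated in a classical (commutative) analog, which is not the quantum setting of Theorem 2 but follows the same perturbation argument. In the sketch below T0 is a row-stochastic matrix, the identity operator is replaced by the all-ones vector, and the inner product 〈1, ·〉st by the stationary distribution π. The matrices in `example_maps` are arbitrary illustrative choices, with T1 shifted so that λ1 = 0.

```python
import numpy as np

def stationary(T0):
    """Left eigenvector of the row-stochastic matrix T0 for eigenvalue 1, summing to 1."""
    w, V = np.linalg.eig(T0.T)
    pi = np.real(V[:, np.argmin(np.abs(w - 1))])
    return pi / pi.sum()

def lambda2(T0, T1, T2):
    """pi.(T2 1) + pi.(T1 (Id - T0)^{-1} T1 1), valid when lambda_1 = pi.(T1 1) = 0."""
    d = T0.shape[0]
    one = np.ones(d)
    pi = stationary(T0)
    if abs(pi @ T1 @ one) > 1e-12:
        raise ValueError("the formula requires lambda_1 = 0")
    # Id - T0 + 1 pi^T is invertible and inverts Id - T0 on vectors y with pi.y = 0
    x1 = np.linalg.solve(np.eye(d) - T0 + np.outer(one, pi), T1 @ one)
    return float(pi @ (T2 @ one + T1 @ x1))

def leading_eigenvalue(T):
    """Eigenvalue of T closest to 1."""
    w = np.linalg.eigvals(T)
    return float(np.real(w[np.argmin(np.abs(w - 1))]))

def example_maps():
    """Arbitrary illustrative matrices; T1 is shifted so that pi.(T1 1) = 0."""
    T0 = np.array([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]])
    A = np.array([[0.1, -0.2, 0.3], [0.0, 0.2, -0.1], [0.2, 0.1, -0.3]])
    T2 = np.array([[-0.2, 0.1, 0.0], [0.1, -0.3, 0.1], [0.0, 0.2, -0.1]])
    pi = stationary(T0)
    T1 = A - (pi @ A @ np.ones(3)) * np.eye(3)
    return T0, T1, T2

T0, T1, T2 = example_maps()
n = 10**6
print(leading_eigenvalue(T0 + T1 / np.sqrt(n) + T2 / n) ** n, np.exp(lambda2(T0, T1, T2)))
```

The two printed numbers should agree up to a correction of order n^{-1/2}, mirroring the statement λ(n)^n → exp(λ2).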

[1] A. S. Holevo, Probabilistic and Statistical Aspects of Quantum Theory (North-Holland, Amsterdam, 1982).

[2] C. W. Helstrom, Quantum Detection and Estimation Theory (Academic Press, New York, 1976).

[3] V. P. Belavkin, Theor. Math. Phys. 26, 213 (1976).

[4] H. P. Yuen and M. Lax, IEEE Trans. Inf. Theory 19, 740 (1973).

[5] H. Haffner et al., Nature (London) 438, 643 (2005).

[6] G. Breitenbach, S. Schiller, and J. Mlynek, Nature (London) 387, 471 (1997).

[7] V. Giovannetti, S. Lloyd, and L. Maccone, Science 306, 1330 (2004).

[8] B. L. Higgins, D. W. Berry, S. D. Bartlett, H. M. Wiseman, and G. J. Pryde, Nature (London) 450, 393 (2007).

[9] M. Lobino, C. Kupchak, E. Figueroa, and A. I. Lvovsky, Phys. Rev. Lett. 102, 203601 (2009).

[10] M. Brune, J. Bernu, C. Guerlin, S. Deleglise, C. Sayrin, S. Gleyzes, S. Kuhr, I. Dotsenko, J. M. Raimond, and S. Haroche, Phys. Rev. Lett. 101, 240402 (2008).

[11] B. Julsgaard, J. Sherson, J. I. Cirac, J. Fiurasek, and E. S. Polzik, Nature (London) 432, 482 (2004).

[12] J. F. Sherson, H. Krauter, R. K. Olsson, B. Julsgaard, K. Hammerer, J. I. Cirac, and E. S. Polzik, Nature (London) 443, 557 (2006).

[13] K. Vogel and H. Risken, Phys. Rev. A 40, 2847 (1989).

[14] L. G. Lutterbach and L. Davidovich, Phys. Rev. Lett. 78, 2547 (1997).

[15] H. Mabuchi and N. Khaneja, International Journal of Robust and Nonlinear Control 15, 647 (2005).

[16] H. Mabuchi, Quantum Semiclass. Opt. 8, 1103 (1996).

[17] M. Howard, J. Twamley, C. Wittmann, T. Gaebel, F. Jelezko, and J. Wrachtrup, New J. Phys. 8, 33 (2006).

[18] S. G. Schirmer and D. K. L. Oi, Laser Physics 20, 1203 (2010).

[19] D. Burgarth and K. Maruyama, New J. Phys. 11, 103019 (2009).

[20] M. Guta and L. Bouten (unpublished).

[21] S. L. Braunstein and C. M. Caves, Phys. Rev. Lett. 72, 3439 (1994).

[22] A. van der Vaart, Asymptotic Statistics (Cambridge University Press, 1998).

[23] M. Guta and J. Kahn, Phys. Rev. A 73, 052108 (2006).

[24] M. Guta and A. Jencova, Commun. Math. Phys. 276, 341 (2007).

[25] M. Guta, B. Janssens, and J. Kahn, Commun. Math. Phys. 277, 127 (2008).

[26] J. Kahn and M. Guta, Commun. Math. Phys. 289, 597 (2009).

[27] R. Hopfner, J. Jacod, and L. Ladelli, Probab. Theory Relat. Fields 86, 105 (1990).

[28] S. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability (Cambridge University Press, 2009).

[29] J. M. Radcliffe, J. Phys. A: Gen. Phys. 4, 313 (1971).

[30] H. M. Wiseman, Phys. Rev. Lett. 75, 4587 (1995).

[31] E. L. Lehmann and G. Casella, Theory of Point Estimation (Springer, New York, 1998).

[32] L. Le Cam, Asymptotic Methods in Statistical Decision Theory (Springer-Verlag, New York, 1986).

[33] M. Hayashi and K. Matsumoto, e-print arXiv:quant-ph/0411073.

[34] E. Bagan, M. A. Ballester, R. D. Gill, A. Monras, and R. Munoz-Tapia, Phys. Rev. A 73, 032301 (2006).

[35] M. Hayashi, American Mathematical Society Translations Series 2 277, 95 (2009).

[36] M. Hayashi, presentations at MaPhySto and QUANTOP Workshop on Quantum Measurements and Quantum Stochastics (Aarhus, 2003).

[37] B. M. Terhal and D. P. DiVincenzo, Phys. Rev. A 61, 022301 (2000).

[38] E. Davies, One-Parameter Semigroups (Academic, London, 1980).

[39] B. Kummerer and H. Maassen, Infinite Dimensional Analysis, Quantum Probability and Related Topics 3, 161 (2000).

[40] J. Kahn, Phys. Rev. A 75, 022326 (2007).

[41] M. Fannes, B. Nachtergaele, and R. F. Werner, Commun. Math. Phys. 144, 443 (1992).

[42] C. Schon, E. Solano, F. Verstraete, J. I. Cirac, and M. Wolf, Phys. Rev. Lett. 95, 110503 (2005).

[43] M. Cramer, M. B. Plenio, S. T. Flammia, R. Somma, D. Gross, S. D. Bartlett, O. Landon-Cardinal, D. Poulin, and Y.-K. Liu, Nature Commun. 1, 149 (2010).
