Designing neural decoders for large toric codes
Xiaotong Ni (TU Delft)
Erlangen, May 9
arXiv: 1809.06640
Toric code

Assuming only Pauli-X errors on qubits; syndrome extraction is noiseless.

[Figure: toric code lattice with Z stabilizers and X errors on qubits, at 10% error rate, for several code distances]

• Correction is sensitive to small changes of the syndrome, and involves a lot of parity computation.
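The parity computations the bullet above refers to can be sketched in a few lines. This is a minimal NumPy illustration assuming qubits live on the edges of an L × L torus with a Z stabilizer at every vertex; the helper `toric_syndrome` is hypothetical and not part of the talk's released code.

```python
import numpy as np

def toric_syndrome(x_errors_h, x_errors_v):
    """Z-stabilizer syndrome of an L x L toric code under X errors.

    x_errors_h[i, j] = 1 if the horizontal edge leaving vertex (i, j)
    has an X error; x_errors_v[i, j] = 1 for the vertical edge. Each
    vertex stabilizer is the parity of its 4 incident edges, with
    periodic boundary conditions.
    """
    syndrome = (x_errors_h
                + np.roll(x_errors_h, 1, axis=1)  # horizontal edge entering from the left
                + x_errors_v
                + np.roll(x_errors_v, 1, axis=0)  # vertical edge entering from above
                ) % 2
    return syndrome

# A single X error creates exactly one pair of syndrome defects.
h = np.zeros((4, 4), dtype=int)
v = np.zeros((4, 4), dtype=int)
h[1, 2] = 1
s = toric_syndrome(h, v)
```

Any error pattern flips an even number of stabilizers, which is why decoders must pair up the defects.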
Some previous works on neural decoders

G. Torlai and R. G. Melko, Physical Review Letters (2017); S. Krastanov and L. Jiang, Scientific Reports 7 (2017): repeatedly sample error configurations using an RBM / NN.

P. Baireuther, T. E. O'Brien, B. Tarasinski, and C. W. J. Beenakker, Quantum (2018): keep track of the Pauli frames in a simulated circuit-level noise model.
Main result

• Introduced a quite reliable way to build neural decoders for large-distance toric codes (and likely many other topological codes).
• Previous works: d ~ 10 -> this work: d = 64, …
• Source code (and a Google Colab script) can be found at https://github.com/XiaotongNi/toric-code-neural-decoder
Motivation

• Same as several previous works:
  • A convenient way to adapt to experimental noise models.
  • Offers a nice combination of speed and accuracy for topological codes.
• Go to quite large input sizes / code distances:
  • Traditional decoders work for arbitrary sizes.
• Test the versatility of NNs / find ways to overcome difficulties.
Implementation
“Imitation learning”

1. Approximate the decoder in the paper.
2. Further optimization with (syndrome, correction) pairs -> a better decoder.

Image from Colah's blog
RG decoder

• Replace detailed syndrome information with an educated guess of the error rates of “coarse-grained qubits”.
• The educated guess is done by belief propagation.

[Figure: belief propagation turns per-qubit error rates (e.g. 0.9, 0.1, 0.99, 0.01) into coarse-grained rates (e.g. 0.08); each BP step can be replaced by a NN]
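To make the coarse-graining idea concrete, the simplest such estimate is the probability that the parity of two merged qubits is flipped, i.e. that exactly one of them has an error. This is only an illustrative formula; the actual RG decoder obtains such marginals by belief propagation over the stabilizer constraints.

```python
def combined_flip_probability(p1, p2):
    """Probability that exactly one of two independent qubits is flipped,
    i.e. the effective error rate of the "coarse-grained qubit" formed
    by their parity."""
    return p1 * (1 - p2) + p2 * (1 - p1)
```

For example, two qubits at 8% each give a coarse-grained rate of about 14.7%, and a maximally uncertain qubit (p = 0.5) keeps the combined estimate maximally uncertain.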
Stack NN together and train

[Diagram: stacked BP-NN blocks mapping input xi to output yi (e.g. 0.9 / 0.1)]

We can then train all layers altogether, using (syndrome, correction) pairs.
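End-to-end training of stacked layers can be sketched as below: a toy two-layer network trained jointly on (input, parity)-style pairs, with parity standing in for the correction target. Everything here (the toy data, the tiny network) is illustrative; the real decoder is the TensorFlow implementation linked earlier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for (syndrome, correction) pairs: the target is the
# parity of the input bits, the kind of function a decoder must learn.
X = rng.integers(0, 2, size=(256, 4)).astype(float)
y = X.sum(axis=1) % 2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two stacked layers; both are updated from the single final loss.
W1 = rng.normal(0.0, 1.0, (4, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 1.0, (8, 1)); b2 = np.zeros(1)

lr = 0.5
losses = []
for step in range(2000):
    h = sigmoid(X @ W1 + b1)        # first layer
    p = sigmoid(h @ W2 + b2)[:, 0]  # second layer
    losses.append(np.mean((p - y) ** 2))
    # Backpropagation: gradients from the final loss reach every layer,
    # which is what "train all layers altogether" means.
    dp = 2 * (p - y) / len(y) * p * (1 - p)
    dW2 = h.T @ dp[:, None]; db2 = dp.sum()
    dh = dp[:, None] @ W2.T * h * (1 - h)
    dW1 = X.T @ dh; db1 = dh.sum(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
```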
Performance
With less than 2 hours of training time, we get performance similar to minimum-weight perfect matching on bit-flip noise.
Figure 5: In this figure, we compare the performance of our neural decoder to the MWPM algorithm. The solid line for d = 16 and the dashed line for d = 64 use the strict training policy, while more training has been done on the d = 64 decoder corresponding to the solid line. The “star” points are the performance of the minimum-weight perfect matching algorithm. The colors of the stars indicate the code distance they are evaluated on. The vertical grid indicates the physical error rates for which we evaluate the logical accuracy for the lines. We see the performance of the neural decoder can be almost as good as MWPM for a decent range of physical error rates.
decoder in [3], where the authors have shown a threshold of 8.2% when using a 2 × 1 unit cell, and claim a threshold around 9.0% if using a 2 × 2 unit cell. With the strict training policy, our neural decoder is slightly better than or at least comparable to the RG decoder, while without the policy our neural decoder is clearly better for d = 64.
V. DISCUSSION
One obvious question is whether we can get a good neural decoder for the surface code or other topological codes on large lattices. In the case of the surface code, the major difference compared to the toric code is the existence of boundaries. This means we have to inject some non-translationally-invariant components into the network. For example, we can have a constant tensor B with shape (L, L, 2) that marks the boundary, e.g. B(x, y, i) = 1 if (x, y) is at the smooth boundary and i = 0, or if (x, y) is at the rough boundary and i = 1; otherwise B(x, y, i) = 0. We then stack B with the old input tensor before feeding it into the neural decoder. More generally, if a renormalization group decoder exists for a topological code, we anticipate that a neural decoder can be trained to have similar or better performance. For example, neural decoders for the surface code with measurement errors, or for topological codes with abelian anyons, can be trained following the same procedure described in this paper.

We also want to discuss a bit more about running neural networks on specialized chips. It is straightforward to run our neural decoder on a GPU or TPU [18], as they are supported by TensorFlow [19], the neural network library used in this work. There is software (e.g. OpenVINO) to compile common neural networks to run on commercially available field-programmable gate arrays (FPGAs), but we do not know how easy it is for our neural decoder (see footnote 3). Apart from power efficiency, there is a study about operating FPGAs at 4 K temperature [20]. Overall, there is a possibility to run neural decoders at low temperature. Note that for running on FPGAs or benchmarking the speed, it is likely a good idea to first compress the neural networks, see [21].
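The boundary tensor B described above is easy to construct. Here is a minimal NumPy sketch; which lattice sides count as smooth vs. rough is an assumption of the example, since the convention depends on the surface-code layout.

```python
import numpy as np

def boundary_tensor(L):
    """Constant tensor B of shape (L, L, 2) marking surface-code boundaries.

    Channel 0 marks the smooth boundaries (here: top and bottom rows);
    channel 1 marks the rough boundaries (here: left and right columns).
    The interior is zero everywhere.
    """
    B = np.zeros((L, L, 2))
    B[0, :, 0] = 1.0      # smooth boundary
    B[L - 1, :, 0] = 1.0
    B[:, 0, 1] = 1.0      # rough boundary
    B[:, L - 1, 1] = 1.0
    return B

def with_boundary_channels(syndrome):
    """Stack B onto an (L, L, C) input before feeding it to the network."""
    L = syndrome.shape[0]
    return np.concatenate([syndrome, boundary_tensor(L)], axis=-1)
```

Because B is constant, it breaks translational invariance without adding any trainable parameters: the convolutional layers simply see two extra input channels telling them where the boundaries are.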
VI. ACKNOWLEDGEMENT
The author wants to thank Ben Criger, Barbara Terhal, and Thomas O'Brien for useful discussion. The implementation of the minimum-weight perfect matching algorithm (including the one used in Appendix E) is provided by Christophe Vuillot, which uses the backend from either Blossom V [22] or NetworkX [23].
[1] S. B. Bravyi and A. Y. Kitaev, (1998), arXiv:quant-ph/9811052v1.
[2] A. Kitaev, Annals of Physics 303, 2 (2003).
[3] G. Duclos-Cianci and D. Poulin, Physical Review Letters 104, 050504 (2010).
[4] G. Duclos-Cianci and D. Poulin, Quantum Information & Computation 14, 721 (2014).
[5] P. Baireuther, T. E. O'Brien, B. Tarasinski, and C. W. J. Beenakker, Quantum 2, 48 (2018).
Footnote 3: The only uncommon component of our neural decoder is the element-wise parity function.
[6] S. Varsamopoulos, B. Criger, and K. Bertels, Quantum Science and Technology 3, 015004 (2017).
[7] G. Torlai and R. G. Melko, Physical Review Letters 119, 030501 (2017).
[8] N. P. Breuckmann and X. Ni, Quantum 2, 68 (2018).
[9] P. Baireuther, M. D. Caio, B. Criger, C. W. J. Beenakker, and T. E. O'Brien, “Neural network decoder for topological color codes with circuit level noise,” (2018), arXiv:1804.02926v1.
[10] S. Krastanov and L. Jiang, Scientific Reports 7 (2017), 10.1038/s41598-017-11266-1.
[11] N. Maskara, A. Kubica, and T. Jochym-O'Connor, “Advantages of versatile neural-network decoding for topo-
About training

Offline training -> neural decoder:
  • Any reasonable amount of training data and time.
  • Goal: a good decoder for a noise model similar to experiments, or decoders that can utilize error rate information.

Neural decoder -> training with experimental data:
  • Limited amount of training data (not the focus of this work).

[Diagram: in the parameter space of neural networks, the offline-trained model is moved towards the optimal decoder for the experiment]
Adapt to other noise models

• Small changes in the noise model can likely be compensated by small changes in the early layers (the translational symmetry needs to be broken in some way).
Spatially varying error rates

• Proof of principle for a toy noise model

[Figure: lattice of Z stabilizers with a noisy region (error rate = 0.16) next to a noiseless region]
How it is done

[Diagram: the stacked BP-NN decoder with per-site weights w1, w2, w3, w4 inserted at the early layers, mapping error-rate inputs (0.08, …) to the output (0.9 / 0.1)]

Roughly speaking: fix the parameters of the neural decoder and only train {wi}.

• This increases the logical accuracy from 0.967 to 0.993 after training on this noise model (d = 16).
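Freezing the decoder and training only the per-site weights {wi} can be sketched as follows. This is a toy NumPy illustration under stated assumptions: the “frozen decoder” is a fixed random layer standing in for the offline-trained network, and the labels come from a reweighted version of the same model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Frozen "decoder": a fixed random linear layer + sigmoid, a hypothetical
# stand-in for the offline-trained network. It never receives updates.
W_frozen = rng.normal(0.0, 0.5, (4, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode(x, w):
    # Per-site weights w rescale the input channels; only w is trainable.
    return sigmoid((x * w) @ W_frozen)[:, 0]

# Toy data whose labels depend on a reweighted input, mimicking a noise
# model that differs from the one the frozen decoder was trained on.
X = rng.normal(0.0, 1.0, (512, 4))
w_true = np.array([2.0, 0.5, 1.0, 1.5])
y = (((X * w_true) @ W_frozen)[:, 0] > 0).astype(float)

w = np.ones(4)
lr = 0.5
for step in range(3000):
    p = decode(X, w)
    grad_p = 2 * (p - y) / len(y) * p * (1 - p)
    # The chain rule stops at w: W_frozen gets no gradient update.
    grad_w = ((grad_p[:, None] * W_frozen.T) * X).sum(axis=0)
    w -= lr * grad_w
```

The design point is that a handful of trainable scalars can absorb a change in the noise model, so adapting is far cheaper than retraining the whole decoder.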
General takeaway message

“Imitation learning”
1. Approximate a heuristic algorithm (composed of subroutines 1, 2, 3).
2. Further optimization with data -> a better heuristic algorithm.
A lazy way of injecting our knowledge about the problem into the machine learning model.
Better way to utilize knowledge?

[Diagram: the Model is informed by Data, Physical laws, and Previous works]