Designing neural decoders for large toric codes
Xiaotong Ni (TU Delft)
Erlangen, May 9
arXiv: 1809.06640
Toric code

Assuming only Pauli-X errors on qubits; syndrome extraction is noiseless.

[Figure: toric code lattice with Z stabilizers and X errors on qubits, at 10% error rate, for several code distances]

• Correction is sensitive to small changes of the syndrome, and involves a lot of parity computation.
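The parity computations the bullet above refers to can be sketched in a few lines. This is a minimal NumPy illustration assuming qubits live on the edges of an L × L torus with a Z stabilizer at every vertex; the helper `toric_syndrome` is hypothetical and not part of the talk's released code.

```python
import numpy as np

def toric_syndrome(x_errors_h, x_errors_v):
    """Z-stabilizer syndrome of an L x L toric code under X errors.

    x_errors_h[i, j] = 1 if the horizontal edge leaving vertex (i, j)
    has an X error; x_errors_v[i, j] = 1 for the vertical edge. Each
    vertex stabilizer is the parity of its 4 incident edges, with
    periodic boundary conditions.
    """
    syndrome = (x_errors_h
                + np.roll(x_errors_h, 1, axis=1)  # horizontal edge entering from the left
                + x_errors_v
                + np.roll(x_errors_v, 1, axis=0)  # vertical edge entering from above
                ) % 2
    return syndrome

# A single X error creates exactly one pair of syndrome defects.
h = np.zeros((4, 4), dtype=int)
v = np.zeros((4, 4), dtype=int)
h[1, 2] = 1
s = toric_syndrome(h, v)
```

Any error pattern flips an even number of stabilizers, which is why decoders must pair up the defects.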
Some previous works on neural decoders

G. Torlai and R. G. Melko, Physical Review Letters (2017); S. Krastanov and L. Jiang, Scientific Reports 7 (2017): repeatedly sample error configurations using an RBM / NN.

P. Baireuther, T. E. O'Brien, B. Tarasinski, and C. W. J. Beenakker, Quantum (2018): keep track of the Pauli frames in a simulated circuit-level noise model.
Main result

• Introduced a quite reliable way to build neural decoders for large-distance toric codes (and likely many other topological codes).
• Previous works: d ~ 10 -> this work: d = 64, …
• Source code (and a Google Colab script) can be found at https://github.com/XiaotongNi/toric-code-neural-decoder
Motivation

• Same as several previous works:
  • A convenient way to adapt to experimental noise models.
  • Offers a nice combination of speed and accuracy for topological codes.
• Go to quite large input sizes / code distances:
  • Traditional decoders work for arbitrary sizes.
• Test the versatility of NNs / find ways to overcome difficulties.
Implementation
“Imitation learning”

1. Approximate the decoder in the paper.
2. Further optimization with (syndrome, correction) pairs -> a better decoder.

Image from Colah's blog
RG decoder

• Replace detailed syndrome information with an educated guess of the error rates of “coarse-grained qubits”.
• The educated guess is done by belief propagation.

[Figure: belief propagation turns per-qubit error rates (e.g. 0.9, 0.1, 0.99, 0.01) into coarse-grained rates (e.g. 0.08); each BP step can be replaced by a NN]
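To make the coarse-graining idea concrete, the simplest such estimate is the probability that the parity of two merged qubits is flipped, i.e. that exactly one of them has an error. This is only an illustrative formula; the actual RG decoder obtains such marginals by belief propagation over the stabilizer constraints.

```python
def combined_flip_probability(p1, p2):
    """Probability that exactly one of two independent qubits is flipped,
    i.e. the effective error rate of the "coarse-grained qubit" formed
    by their parity."""
    return p1 * (1 - p2) + p2 * (1 - p1)
```

For example, two qubits at 8% each give a coarse-grained rate of about 14.7%, and a maximally uncertain qubit (p = 0.5) keeps the combined estimate maximally uncertain.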
Stack NN together and train

[Diagram: stacked BP-NN blocks mapping input xi to output yi (e.g. 0.9 / 0.1)]

We can then train all layers altogether, using (syndrome, correction) pairs.
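End-to-end training of stacked layers can be sketched as below: a toy two-layer network trained jointly on (input, parity)-style pairs, with parity standing in for the correction target. Everything here (the toy data, the tiny network) is illustrative; the real decoder is the TensorFlow implementation linked earlier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for (syndrome, correction) pairs: the target is the
# parity of the input bits, the kind of function a decoder must learn.
X = rng.integers(0, 2, size=(256, 4)).astype(float)
y = X.sum(axis=1) % 2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two stacked layers; both are updated from the single final loss.
W1 = rng.normal(0.0, 1.0, (4, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 1.0, (8, 1)); b2 = np.zeros(1)

lr = 0.5
losses = []
for step in range(2000):
    h = sigmoid(X @ W1 + b1)        # first layer
    p = sigmoid(h @ W2 + b2)[:, 0]  # second layer
    losses.append(np.mean((p - y) ** 2))
    # Backpropagation: gradients from the final loss reach every layer,
    # which is what "train all layers altogether" means.
    dp = 2 * (p - y) / len(y) * p * (1 - p)
    dW2 = h.T @ dp[:, None]; db2 = dp.sum()
    dh = dp[:, None] @ W2.T * h * (1 - h)
    dW1 = X.T @ dh; db1 = dh.sum(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
```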
Performance
With less than 2 hours of training time, we get performance similar to minimum-weight perfect matching on bit-flip noise.
Figure 5: In this figure, we compare the performance of our neural decoder to the MWPM algorithm. The solid line for d = 16 and the dashed line for d = 64 use the strict training policy, while more training has been done on the d = 64 decoder corresponding to the solid line. The “star” points are the performance of the minimum-weight perfect matching algorithm. The colors of the stars indicate the code distance they are evaluated on. The vertical grid indicates the physical error rates for which we evaluate the logical accuracy for the lines. We see the performance of the neural decoder can be almost as good as MWPM for a decent range of physical error rates.
decoder in [3], where the authors have shown a threshold of 8.2% when using a 2 × 1 unit cell, and claim a threshold around 9.0% if using a 2 × 2 unit cell. With the strict training policy, our neural decoder is slightly better than or at least comparable to the RG decoder, while without the policy our neural decoder is clearly better for d = 64.
V. DISCUSSION
One obvious question is whether we can get a good neural decoder for the surface code or other topological codes on large lattices. In the case of the surface code, the major difference compared to the toric code is the existence of boundaries. This means we have to inject some non-translationally-invariant components into the network. For example, we can have a constant tensor B with shape (L, L, 2) that marks the boundary, e.g. B(x, y, i) = 1 if (x, y) is at the smooth boundary and i = 0, or if (x, y) is at the rough boundary and i = 1; otherwise B(x, y, i) = 0. We then stack B with the old input tensor before feeding it into the neural decoder. More generally, if a renormalization group decoder exists for a topological code, we anticipate that a neural decoder can be trained to have similar or better performance. For example, neural decoders for the surface code with measurement errors, or for topological codes with abelian anyons, can be trained following the same procedure described in this paper.

We also want to discuss a bit more about running neural networks on specialized chips. It is straightforward to run our neural decoder on a GPU or TPU [18], as they are supported by TensorFlow [19], the neural network library used in this work. There is software (e.g. OpenVINO) to compile common neural networks to run on commercially available field-programmable gate arrays (FPGAs), but we do not know how easy it is for our neural decoder (see footnote 3). Apart from power efficiency, there is a study about operating FPGAs at 4 K temperature [20]. Overall, there is a possibility to run neural decoders at low temperature. Note that for running on FPGAs or benchmarking the speed, it is likely a good idea to first compress the neural networks, see [21].
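The boundary tensor B described above is easy to construct. Here is a minimal NumPy sketch; which lattice sides count as smooth vs. rough is an assumption of the example, since the convention depends on the surface-code layout.

```python
import numpy as np

def boundary_tensor(L):
    """Constant tensor B of shape (L, L, 2) marking surface-code boundaries.

    Channel 0 marks the smooth boundaries (here: top and bottom rows);
    channel 1 marks the rough boundaries (here: left and right columns).
    The interior is zero everywhere.
    """
    B = np.zeros((L, L, 2))
    B[0, :, 0] = 1.0      # smooth boundary
    B[L - 1, :, 0] = 1.0
    B[:, 0, 1] = 1.0      # rough boundary
    B[:, L - 1, 1] = 1.0
    return B

def with_boundary_channels(syndrome):
    """Stack B onto an (L, L, C) input before feeding it to the network."""
    L = syndrome.shape[0]
    return np.concatenate([syndrome, boundary_tensor(L)], axis=-1)
```

Because B is constant, it breaks translational invariance without adding any trainable parameters: the convolutional layers simply see two extra input channels telling them where the boundaries are.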
VI. ACKNOWLEDGEMENT
The author wants to thank Ben Criger, Barbara Terhal, and Thomas O'Brien for useful discussion. The implementation of the minimum-weight perfect matching algorithm (including the one used in Appendix E) is provided by Christophe Vuillot, which uses the backend from either Blossom V [22] or NetworkX [23].
[1] S. B. Bravyi and A. Y. Kitaev, (1998), arXiv:quant-ph/9811052v1.
[2] A. Kitaev, Annals of Physics 303, 2 (2003).
[3] G. Duclos-Cianci and D. Poulin, Physical Review Letters 104, 050504 (2010).
[4] G. Duclos-Cianci and D. Poulin, Quantum Information & Computation 14, 721 (2014).
[5] P. Baireuther, T. E. O'Brien, B. Tarasinski, and C. W. J. Beenakker, Quantum 2, 48 (2018).
Footnote 3: The only uncommon component of our neural decoder is the element-wise parity function.
[6] S. Varsamopoulos, B. Criger, and K. Bertels, Quantum Science and Technology 3, 015004 (2017).
[7] G. Torlai and R. G. Melko, Physical Review Letters 119, 030501 (2017).
[8] N. P. Breuckmann and X. Ni, Quantum 2, 68 (2018).
[9] P. Baireuther, M. D. Caio, B. Criger, C. W. J. Beenakker, and T. E. O'Brien, “Neural network decoder for topological color codes with circuit level noise,” (2018), arXiv:1804.02926v1.
[10] S. Krastanov and L. Jiang, Scientific Reports 7 (2017), 10.1038/s41598-017-11266-1.
[11] N. Maskara, A. Kubica, and T. Jochym-O'Connor, “Advantages of versatile neural-network decoding for topo-
About training

Offline training -> neural decoder:
  • Any reasonable amount of training data and time.
  • Goal: a good decoder for a noise model similar to experiments, or decoders that can utilize error rate information.

Neural decoder -> training with experimental data:
  • Limited amount of training data (not the focus of this work).

[Diagram: in the parameter space of neural networks, the offline-trained model is moved towards the optimal decoder for the experiment]
Adapt to other noise models

• Small changes in the noise model can likely be compensated by small changes in the early layers (the translational symmetry needs to be broken in some way).
Spatially varying error rates

• Proof of principle for a toy noise model

[Figure: lattice of Z stabilizers with a noisy region (error rate = 0.16) next to a noiseless region]
How it is done

[Diagram: the stacked BP-NN decoder with per-site weights w1, w2, w3, w4 inserted at the early layers, mapping error-rate inputs (0.08, …) to the output (0.9 / 0.1)]

Roughly speaking: fix the parameters of the neural decoder and only train {wi}.

• This increases the logical accuracy from 0.967 to 0.993 after training on this noise model (d = 16).
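Freezing the decoder and training only the per-site weights {wi} can be sketched as follows. This is a toy NumPy illustration under stated assumptions: the “frozen decoder” is a fixed random layer standing in for the offline-trained network, and the labels come from a reweighted version of the same model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Frozen "decoder": a fixed random linear layer + sigmoid, a hypothetical
# stand-in for the offline-trained network. It never receives updates.
W_frozen = rng.normal(0.0, 0.5, (4, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode(x, w):
    # Per-site weights w rescale the input channels; only w is trainable.
    return sigmoid((x * w) @ W_frozen)[:, 0]

# Toy data whose labels depend on a reweighted input, mimicking a noise
# model that differs from the one the frozen decoder was trained on.
X = rng.normal(0.0, 1.0, (512, 4))
w_true = np.array([2.0, 0.5, 1.0, 1.5])
y = (((X * w_true) @ W_frozen)[:, 0] > 0).astype(float)

w = np.ones(4)
lr = 0.5
for step in range(3000):
    p = decode(X, w)
    grad_p = 2 * (p - y) / len(y) * p * (1 - p)
    # The chain rule stops at w: W_frozen gets no gradient update.
    grad_w = ((grad_p[:, None] * W_frozen.T) * X).sum(axis=0)
    w -= lr * grad_w
```

The design point is that a handful of trainable scalars can absorb a change in the noise model, so adapting is far cheaper than retraining the whole decoder.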
General takeaway message

“Imitation learning”
1. Approximate a heuristic algorithm (composed of subroutines 1, 2, 3).
2. Further optimization with data -> a better heuristic algorithm.
A lazy way of injecting our knowledge about the problem into the machine learning model.
Better way to utilize knowledge?

[Diagram: the Model is informed by Data, Physical laws, and Previous works]