19
Research Article An FPGA-Based Quantum Computing Emulation Framework Based on Serial-Parallel Architecture Y. H. Lee, M. Khalil-Hani, and M. N. Marsono VeCAD Research Laboratory, Faculty of Electrical Engineering, Universiti Teknologi Malaysia (UTM), 81310 Skudai, Johor Bahru, Malaysia Correspondence should be addressed to M. Khalil-Hani; [email protected] Received 13 October 2015; Revised 19 February 2016; Accepted 14 March 2016 Academic Editor: Jo˜ ao Cardoso Copyright © 2016 Y. H. Lee et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Hardware emulation of quantum systems can mimic more efficiently the parallel behaviour of quantum computations, thus allowing higher processing speed-up than soſtware simulations. In this paper, an efficient hardware emulation method that employs a serial- parallel hardware architecture targeted for field programmable gate array (FPGA) is proposed. Quantum Fourier transform and Grover’s search are chosen as case studies in this work since they are the core of many useful quantum algorithms. Experimental work shows that, with the proposed emulation architecture, a linear reduction in resource utilization is attained against the pipeline implementations proposed in prior works. e proposed work contributes to the formulation of a proof-of-concept baseline FPGA emulation framework with optimization on datapath designs that can be extended to emulate practical large-scale quantum circuits. 1. Introduction Quantum computing is based on the properties of quantum mechanics, namely, superposition and entanglement. Super- position allows a quantum state to be in more than one basis state simultaneously, whereas entanglement is the strong correlation between multiqubit (quantum bit) basis states in a quantum system. Superposition and entanglement facilitate massive parallelism which enables exponential speed-ups to be achieved in the well-known integer factoring and discrete logarithms algorithms [1] and quadratic speed-ups in solving classically intractable brute-force searching and optimization problems [2, 3]. Similar to classical computing, quantum algorithms are developed long before any large-scale practical quantum computer is physically available. In 1994, Shor proposed the integer factoring and discrete logarithms algorithms [1] that brought the world’s attention to the enormous potential of quantum computing. An example of this is the Rivest- Shamir-Adleman (RSA) security scheme [4] which is widely applied in current public key cryptosystem. It is based on the assumption that integer factoring of large number is intractable in classical computing. Shor’s proposal, which, in contrast, factors integer in polynomial time, would make such security scheme no longer secure. In [5], Grover proposed a quantum search algorithm that is capable of identifying a specific element in an unordered elements database in (Π/4)√ attempts. is algorithm achieves a quadratic speed-up over the corresponding classical method that requires /2 queries on average, to retrieve the desired data. Although the solution is only polynomially faster than the classical approach, Grover’s quantum algorithm is an important one as it can be generalized to be applied in many intractable computer science problems. Recently, quantum equivalents for random walks [6], genetic algorithms [3], and NAND tree evaluation [7] have been developed. Shor in [8] categorized quantum algorithms known to provide substantial speed-up over the classical approach into three types: (a) algorithms that achieve notable speed-up by applying quantum Fourier transform (QFT) in periodicity finding; examples of this type of algorithm include integer factoring and discrete logarithms algorithms [1], Simon’s periodicity algorithm [9], Hallgren’s algorithms for Pell’s equation [10], and the quantum algorithms for solving hidden subgroup problems [11, 12]; (b) Grover’s search algorithm and its extensions [2, 13] which in general offer square root Hindawi Publishing Corporation International Journal of Reconfigurable Computing Volume 2016, Article ID 5718124, 18 pages http://dx.doi.org/10.1155/2016/5718124

Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

Research ArticleAn FPGA-Based Quantum Computing Emulation FrameworkBased on Serial-Parallel Architecture

Y H Lee M Khalil-Hani and M N Marsono

VeCAD Research Laboratory Faculty of Electrical Engineering Universiti Teknologi Malaysia (UTM) 81310 SkudaiJohor Bahru Malaysia

Correspondence should be addressed to M Khalil-Hani khalilfkeutmmy

Received 13 October 2015 Revised 19 February 2016 Accepted 14 March 2016

Academic Editor Joao Cardoso

Copyright copy 2016 Y H Lee et alThis is an open access article distributed under the Creative Commons Attribution License whichpermits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Hardware emulation of quantum systems canmimicmore efficiently the parallel behaviour of quantumcomputations thus allowinghigher processing speed-up than software simulations In this paper an efficient hardware emulation method that employs a serial-parallel hardware architecture targeted for field programmable gate array (FPGA) is proposed Quantum Fourier transform andGroverrsquos search are chosen as case studies in this work since they are the core of many useful quantum algorithms Experimentalwork shows that with the proposed emulation architecture a linear reduction in resource utilization is attained against the pipelineimplementations proposed in prior worksThe proposed work contributes to the formulation of a proof-of-concept baseline FPGAemulation frameworkwith optimization on datapath designs that can be extended to emulate practical large-scale quantum circuits

1 Introduction

Quantum computing is based on the properties of quantummechanics namely superposition and entanglement Super-position allows a quantum state to be in more than one basisstate simultaneously whereas entanglement is the strongcorrelation between multiqubit (quantum bit) basis states ina quantum system Superposition and entanglement facilitatemassive parallelism which enables exponential speed-ups tobe achieved in the well-known integer factoring and discretelogarithms algorithms [1] and quadratic speed-ups in solvingclassically intractable brute-force searching and optimizationproblems [2 3]

Similar to classical computing quantum algorithms aredeveloped long before any large-scale practical quantumcomputer is physically available In 1994 Shor proposedthe integer factoring and discrete logarithms algorithms [1]that brought the worldrsquos attention to the enormous potentialof quantum computing An example of this is the Rivest-Shamir-Adleman (RSA) security scheme [4] which is widelyapplied in current public key cryptosystem It is based onthe assumption that integer factoring of large number isintractable in classical computing Shorrsquos proposal which

in contrast factors integer in polynomial time would makesuch security scheme no longer secure In [5] Groverproposed a quantum search algorithm that is capable ofidentifying a specific element in an unordered 119898 elementsdatabase in (Π4)radic119898 attempts This algorithm achieves aquadratic speed-up over the corresponding classical methodthat requires 1198982 queries on average to retrieve the desireddata Although the solution is only polynomially faster thanthe classical approach Groverrsquos quantum algorithm is animportant one as it can be generalized to be applied in manyintractable computer science problems Recently quantumequivalents for randomwalks [6] genetic algorithms [3] andNAND tree evaluation [7] have been developed

Shor in [8] categorized quantum algorithms known toprovide substantial speed-up over the classical approach intothree types (a) algorithms that achieve notable speed-up byapplying quantum Fourier transform (QFT) in periodicityfinding examples of this type of algorithm include integerfactoring and discrete logarithms algorithms [1] Simonrsquosperiodicity algorithm [9] Hallgrenrsquos algorithms for Pellrsquosequation [10] and the quantumalgorithms for solving hiddensubgroup problems [11 12] (b) Groverrsquos search algorithmand its extensions [2 13] which in general offer square root

Hindawi Publishing CorporationInternational Journal of Reconfigurable ComputingVolume 2016 Article ID 5718124 18 pageshttpdxdoiorg10115520165718124

2 International Journal of Reconfigurable Computing

speed improvements over their classical counterparts and (c)algorithms for simulating or solving problems in quantummechanics [14]

Physical realization of a quantum computer is proving tobe extremely challenging [15]With research into viable large-scale quantum computers still ongoing various technologiesnamely ion trap [16] nuclear magnetic resonance [17]and superconductor [18] were attempted Nevertheless onlysmall-scale quantum computation implementations havebeen achieved [19 20] Instead of focusing on the realizationof quantum gates a different approach known as quantumannealing which solves optimization problems by findingthe minimum point is used in the 128-qubit D-Wave One512-qubit D-Wave Two and 1000-qubit D-Wave 2X systems[21 22] However based on the research report presented in[23] the expected quantum speed-ups were not found in theD-Wave systems

In parallel to efforts to develop physical quantum com-puters there is also much effort in the theoretical researchof quantum algorithms Until large-scale practical quantumcomputers become prevalent quantum algorithms are cur-rently developed using the classical computing platformHowever due to their inherent sequential behaviour classicalcomputers that are based on Von Neumann architecturecannot simulate the inherent parallelism in quantum systemsefficiently On the other hand the technology of field pro-grammable gate array (FPGA) offers the potential of massiveparallelism through hardware emulation Consequently sig-nificant improvement in speed performance over the equiv-alent software simulation can be achieved However FPGAis still a form of classical digital computing and resourceutilization on such a classical computing platform growsexponentially as the number of qubits increasesThe problemis further compounded with the fact that accurate modellingof quantum circuit in FPGA technology is nonintuitive andtherefore difficult providing the research motivation for thispaper

This paper presents an efficient FPGA emulation frame-work for quantum computing In the proposed emulationmodel quantumcomputations aremapped to a serial-parallelarchitecture that facilitates scalability by managing the expo-nential growth of resource requirement against number ofqubits Quantum Fourier transform and Groverrsquos search arechosen as case studies in this work since they are the coreof many useful quantum algorithms and in addition theyhave been used as benchmarking models in prior works onFPGA emulation Experimental results on the efficienciesof different FPGA emulation architectures and fixed pointformats are presented whichwill sufficiently demonstrate thefeasibility of proposed framework

The rest of this paper is organized as follows Section 2discusses prior works on FPGA-based quantum computingemulation emphasizing issues of hardware architecture andmodelling of quantum system on FPGA platform In Sec-tion 3 the theoretical background on quantum computingand related quantum algorithms is provided Section 4presents the design of the proposed FPGA emulation modelsforQFT andGroverrsquos search algorithms Experimental results

and analysis are given in Section 5 Finally concludingremarks are made in Section 6

2 Related Work

Modelling of a quantum system on classical computingplatform is a challenging task Hence it is even more diffi-cult to map quantum algorithms for emulation on classicalcomputing environment based on FPGA which is highlyresource-constrained Many attempts have been made in thelast decade in FPGA emulation of quantum algorithms andthese works include [24ndash27] However details of the criticaldesign processes such as mapping of the quantum algorithmsinto the FPGA emulation models and the verification of theimplementations are not revealed in these prior works

For software-based simulation using classical computervarious types of quantum simulators have been proposedAn open source C library 119897119894119887119902119906119886119899119905119906119898 for simulation ofquantumcomputing is presented in [28]where pure quantumcomputer simulation as well as general quantum simulationis supported by the tool In 2007 a variant of binary deci-sion diagram named quantum information decision diagram(QuIDD) for compact state vector storage was introducedin [29] for efficient quantum circuit simulation Garcıa andMarkov [30] proposed a compact data structure based onstabilizer formalism called stabilizer frames

Most of the previous FPGA emulation works are basedon the quantum circuit model which is essentially aninterconnection of quantum gates A different approach wastaken by Goto and Fujishima [24] where a general purposequantum processor was developed instead of applying thequantum circuit model However Fujishimarsquos quantum pro-cessor assumed that the amplitudes of a quantum state canbe either all zeros or with evenly distributed probability Inits emulation of Shorrsquos integer factoring algorithm detailsof the implementation are inadequate for its results to beverified as claimed For instance it is stated in [24] that a64-bit factorization was demonstrated using their emulatorwith only 40Kbits of classicalmemory instead of 320 qubits asrequired with Shorrsquos algorithm in a quantum computer Thisstatement was not supported by design and implementationdetails on how factorization of such a large integer can bedone with only 40Kbits memory where typically it wouldrequire at least 2320 bytes to represent a quantum state of sucha scale on the classical platform

In [25] FPGA emulation of 3-qubit QFT and Groverrsquossearch are proposed In thiswork which is based on the quan-tum circuit model qubit expansion is performed prior to theapplication ofmultiqubit quantum gate transformationsThisleads to an inaccurate modelling of a quantum algorithmsince according to [31] the input quantum state to QFTcircuit should first be placed in superposition of basis stateswhere signal samples are encoded as sequence of amplitudesIn the work by [26] hardware emulation of QFT restrictsits input quantum state to the computational basis stateimplying that superposition is not included in the modellingRivera-Miranda et al in [26] claims 16-qubit QFT emulationis achieved However the emulator can only process up to 32

International Journal of Reconfigurable Computing 3

input signal samples in one evaluation which is equivalentto a 5-qubit QFT emulation if effects of superposition andentanglement are included

From the above discussion it should be noted thenthat the critical quantum properties of superposition andentanglement were not considered in these previous worksresulting in inaccurate modelling of quantum algorithmsWithout the superposition and entanglement effects thepower of quantum parallelism cannot fully be exploited Pre-vious works reported in [25ndash27] applied pipeline architecturein their FPGA emulation implementations so as to obtainhigh throughput and low critical path delay However apipeline design imposes high resource utilization (due to therequirement of additional pipeline registers and associatedlogic) thus limiting FPGA emulation to be deployed in morepractical quantum computing applications that typicallyrequire high qubit sizes In these pipeline implementationsproposed in prior works resource growth was exponential tothe increase in qubit sizes

In this paper the issues outline above is addressed Theefficiencies of different hardware architectural designs forFPGA emulation purposes are evaluated based on the chosencase studies of QFT and Groverrsquos search We propose anaccurate modelling of quantum system for FPGA emulationtargeting efficient resource utilization while maintainingsignificant speed-up over the equivalent simulation approachSince our proposed FPGA emulation framework appliesthe state vector approach simulation models based on the119897119894119887119902119906119886119899119905119906119898 library are selected in this work for benchmark-ing purposes

3 Theoretical Background

In general quantum algorithms obey the basic process flowstructure The computation process begins with a system setin a specific quantum state which is then converted intosuperposition of multiple basis states Unitary transforma-tions are performed on the quantum state according to therequired operations of the algorithm Finally measurementis carried out resulting in the qubits collapsing into classicalbits

31 Quantum Bit (Qubit) In classical computing the small-est unit of information is the bit A bit can be in either state 0or state 1 and the state of a bit can be represented in matrixform as

state 0 =01[

1

0

]

state 1 =01[

0

1

]

(1)

On the other hand in quantum computing the smallestunit of information is the quantum bit or a qubit Todistinguish the classical bit with the quantum qubit Diracket notation is used Using the ket notation the quantumcomputational basis state is represented by |0⟩ and |1⟩ A

qubit can be in state |0⟩ or in state |1⟩ or in superpositionof both basis states The state of a qubit can be represented as

1003816100381610038161003816120595⟩ = 120572 |0⟩ + 120573 |1⟩ equiv

01[

120572

120573

] (2)

where both 120572 and 120573 are complex numbers and |120572|2 + |120573|2 =1 |120572|2 is the probability where the qubit is in state |0⟩ and|120573|2 is the probability where the qubit is in state |1⟩ upon

measurement An 119899-qubit quantum state vector contains2119899 complex numbers which represents the measurementprobability of each basis state However on measurementthe superposition is destroyed and the qubits return to theclassical state of bits depending on the probability derivedfrom the complex-valued state vector

32 TensorKronecker Product Tensor product or Kroneckerproduct is the basic operation that is applied in the formationof a larger quantum system as well as multiqubit quantumtransformations A quantum state vector that can be writtenas the tensor of two vectors is separable whereas a statevector that cannot be expressed as the tensor of two vectorsis entangled [15] The tensor operation on any arbitrary two1-qubit transformations is shown below

[

11988601198861

11988621198863

] otimes [

11988701198871

11988721198873

] =

[

[

[

[

[

[

11988601198870119886011988711198861119887011988611198871

11988601198872119886011988731198861119887211988611198873

11988621198870119886211988711198863119887011988631198871

11988621198872119886211988731198863119887211988631198873

]

]

]

]

]

]

(3)

33 Quantum Circuit Model A quantum algorithm is adescription of a sequence of quantum operations (or trans-formations) applied upon qubits to generate new quantumstates The model most widely used in describing the evolu-tion of a quantum system is the quantum circuit model firstproposed in [32] A quantum circuit is the interconnection ofquantum gates with quantum wires and gate operations arerepresented by unitary matrices

All unitary matrices are invertible and the products ofunitary matrices as well as the inverse of unitary matrix areunitary An119873-by-119873matrix119880 is unitary if119880119880dagger = 119880dagger119880 = 119868

119873

where 119880dagger is the adjoint (conjugate transpose) of 119880 Sinceall quantum transformations are reversible quantum gateoperations can always be undone Fundamental quantumgates include the Hadamard gate phase-shift gate and swapgate and these gates are described as follows

Hadamard gate 119867 is one of the most useful single qubitquantum gates It operates by placing the computational basisstate into superposition of basis states with equal probabilityTheHadamard transform can be represented by the followingunitary matrix

119867 =

1

radic2

[

1 1

1 minus1

] (4)

4 International Journal of Reconfigurable Computing

The following example illustrates the application ofHadamard gates in mapping a 2-qubit basis state |00⟩ to asuperposition of basis states with equal probability

|00⟩

119867otimes119867

997888997888997888997888rarr

1

2

(|00⟩ + |01⟩ + |10⟩ + |11⟩)

(

1

radic2

[

1 1

1 minus1

] otimes

1

radic2

[

1 1

1 minus1

])

[

[

[

[

[

[

1

0

0

0

]

]

]

]

]

]

=

1

2

[

[

[

[

[

[

1

1

1

1

]

]

]

]

]

]

(5)

Controlled phase-shift gate 119862119877119896operates on 2 qubits one

ofwhich is the control qubit and the other is the target qubit Ifthe control qubit is true a phase-shift operation is performedon the target qubit otherwise there is no operation Theoperation is represented by the following matrix

119862119877119896=

[

[

[

[

[

[

[

1 0 0 0

0 1 0 0

0 0 1 0

0 0 0 11989021205871198942

119896

]

]

]

]

]

]

]

(6)

Quantum SWAP gate is used for swapping two qubitsIt switches the amplitudes of a quantum state vector Theoperation of a 2-qubit SWAP gate is represented by matrixin

SWAP =[

[

[

[

[

[

1 0 0 0

0 0 1 0

0 1 0 0

0 0 0 1

]

]

]

]

]

]

(7)

34 Quantum Fourier Transform (QFT) The Fourier trans-form is deployed in wide range of engineering and physicsapplications such as signal processing image processingand quantum mechanics It is a reversible transformationthat converts signals from timespatial domain to frequencydomain and vice versa The Fourier transform is defined in(8) for continuous signals and in (9) for discrete signals

119883(119891) = int

infin

minusinfin

119909 (119905) 119890minus1198942120587119891119905

119889119905 (8)

119883119896=

119873minus1

sum

119899=0

119909119899119890minus1198942120587(119896119899119873)

(9)

The quantum Fourier transform (QFT) is a transfor-mation on qubits and is the quantum equivalent of thediscrete Fourier transform It should be noted that a quantumcomputer performs QFT with exponentially less number ofoperations than the classical Fourier transform HoweverQFT does not reduce the execution time of the algorithmwhen classical data is used This is due to the characteristicof the quantum computer that does not allow parallel read-out of all quantum state amplitudes In addition there is no

known method that can effectively instantiate the desiredinput state amplitudes to be Fourier-transformed [33]

In order to harness the power of quantum computingon Fourier transform QFT has to be deployed within otherpractical applications QFT is pivotal in quantum computingsince it is part ofmany quantumalgorithmsThese algorithmsinclude integer factorization and discrete logarithms algo-rithms [1] Simonrsquos periodicity algorithm [9] and Hallgrenrsquosalgorithms [10] They offer significant speed-up over theirclassical counterparts QFT has also found applications inmany real-world problems such as image watermarking [34]and template matching [35]

To compute Fourier transform in quantum domain dis-crete signal samples are encoded as the amplitude sequencesof a quantum state vector which is in superposition of basisstates [31] An 119899-qubit QFT operation which transformsan arbitrary superposition of computational basis states isexpressed in

1003816100381610038161003816120595⟩ =

1

radic2119899

2119899minus1

sum

119895=0

119891 (119895Δ119905)1003816100381610038161003816119895⟩

QFT997888997888997888rarr

1

radic2119899

2119899minus1

sum

119896=0

2119899minus1

sum

119895=0

119891 (119895Δ119905) 1198902120587119894(1198951198962

119899)|119896⟩

(10)

As the requirement for a valid quantum state |120595⟩must benormalized such that it fulfils (11) If the original signal inputsdo not comply with this requirement the amplitudes of thesignal samples have to be divided by the normalization factorradicsum2119899minus1

119897=0|119891(119897Δ119905)|

2 In most cases the input states formed bythe normalized signal samples are entangled

2119899minus1

sum

119895=0

1003816100381610038161003816119891 (119895Δ119905)

1003816100381610038161003816

2= 1 (11)

From (10) it can be observed that the term 1198952119899 in QFTequation is a rational number in the range of 0 le 1198952119899 lt 1 Asqubit representation is typically used in computations the 119895in base-10 integer is redefined in base-2 notation as individualbit such that the binary fraction form as expressed in (12) canbe conveniently adopted

(119895)10equiv (11989511198952sdot sdot sdot 119895119899)2

= (2119899minus11198951+ 2119899minus21198952+ sdot sdot sdot + 2

0119895119899)10

= 2119899(2minus11198951+ 2minus21198952+ sdot sdot sdot + 2

minus119899119895119899)10

= 2119899(0 sdot 11989511198952sdot sdot sdot 119895119899)2

(12)

International Journal of Reconfigurable Computing 5

|k1⟩

|k2⟩

|knminus1⟩

|kn⟩H

H

H

H

R2

R2 Rnmiddot middot middot

middot middot middot Rnminus2 Rnminus1

1

radic2(|0⟩ + e2120587i0middotj1j2 middotmiddotmiddotj119899|1⟩)

1

radic2(|0⟩ + e2120587i0middotj2 middotmiddotmiddotj119899|1⟩)

1

radic2(|0⟩ + e2120587i0middotj119899minus1j119899|1⟩)

|j1⟩

|j2⟩

|jnminus1⟩

|jn⟩

1

radic2(|0⟩ + e2120587i0middotj119899|1⟩)

Figure 1 Quantum circuit model for n-qubit QFT

With some algebraic manipulations the QFT equationcan be derived from (13) to form (14) [33]

QFT2119899

1003816100381610038161003816119895⟩ =

1

radic2119899

2119899minus1

sum

119896=0

1198902120587119894(1198951198962

119899)|119896⟩ (13)

=

1

radic2119899(|0⟩ + 119890

21205871198940sdot119895119899|1⟩)

sdot (|0⟩ + 11989021205871198940sdot119895119899minus1119895119899

|1⟩) sdot sdot sdot (|0⟩ + 11989021205871198940sdot11989511198952 sdotsdotsdot119895119899

|1⟩)

(14)

Since the term 11989021205871198940sdot1198951 produces either minus1 if 1198951= 1 or +1

otherwise Hadamard computation on the first qubit resultsin (1radic2)(|0⟩ + 11989021205871198940sdot1198951 |1⟩) Computations of the consecutivebits in the binary fraction are obtained using controlledphase-shift gates according to (14) QFT circuit consists ofthree types of elementary gates which are Hadamard gate119867controlled phase-shift gate 119862119877

119896 and SWAP gate The circuit

model of an 119899-qubit QFT is depicted in Figure 1The size of a QFT circuit grows exponentially as the

number of input qubits increases An 119899-qubit QFT involvessum119899

119896=1119896+1 unitary transformations and could process up to 2119899

input samples in one evaluation (provided the input samplesare encoded as the amplitude sequences of a superposition ofcomputational basis states)

35 Groverrsquos Search Algorithm In computer science area atypical search problem is to identify the desired element froman unordered array For many computing applications it iscritical that the search technique is efficient In terms of afunction the search problem 119891 0 1119899 rarr 0 1 is givenwith assurance that there exists one binary string 119909

0where

119891(119909) = 1 if 119909 = 1199090 else 119891(119909) = 0

In classical computing 1198982 queries on average arerequired to search for a particular element in an unorderedarray with 119898 elements In quantum computing Groverrsquossearch algorithm can complete the job in (Π4)radic119898 queries(for the rest of the text and figures in Section 35 the requiredGrover iterations are abbreviated as radic2119899 times) Althoughthe speed-up achieved is only quadratic Groverrsquos algorithmand its extensions are extremely useful in enhancing currentmethods in solving database searching and optimizationproblems which include 3-satisfiability [36] global opti-mization [37] minimum point searching [38] and patternmatching [39] The core operations of Groverrsquos algorithm arephase inversion and inversion about mean Phase inversion

|x⟩

|1⟩

|1205930⟩ |1205931⟩ |1205932⟩

Uf

nn

H

Figure 2 Quantum circuit for phase inversion [15]

inverts the phase of the state-of-interest and its quantumcircuit model is given in Figure 2

In Figure 2 the top 119899-qubit |x⟩ is the target qubit andthe bottom qubit is called the ancilla qubit The functionof 119880119891is to pick out the desired binary string To apply

phase inversion on target qubits Hadamard gate operationis performed on the ancilla qubit which is initialized as |1⟩This is to complement the effect of 119880

119891which takes |x 119910⟩ to

|x 119891(x) oplus 119910⟩In terms of matrices the phase inversion operation can

be expressed as 119880119891(119868119899oplus 119867)|x 1⟩ and the corresponding

quantum states are described as follows10038161003816100381610038161205930⟩ = |x 1⟩

10038161003816100381610038161205931⟩ = |x⟩ [ |0⟩ minus |1⟩

radic2

] = [

|x 0⟩ minus |x 1⟩radic2

]

10038161003816100381610038161205932⟩ = |x⟩ [

1003816100381610038161003816119891 (x) oplus 0⟩ minus 100381610038161003816

1003816119891 (x) oplus 1⟩

radic2

]

= |x⟩ [1003816100381610038161003816119891 (x)⟩ minus 1003816100381610038161003816

1003816119891 (x)⟩

radic2

]

= (minus1)119891(x)|x⟩ [ |0⟩ minus |1⟩

radic2

]

=

minus1 |x⟩ [ |0⟩ minus |1⟩radic2

] if x = x0

+1 |x⟩ [ |0⟩ minus |1⟩radic2

] if x = x0

(15)

Inversion about mean boosts the phase separationbetween the element-of-interest and other elements in theunordered arrays (after phase inversion operation is appliedto invert the phase of the target element) The mean of

6 International Journal of Reconfigurable Computing

(1) Start with |0⟩ as the target input qubits(2) Apply 119899-qubit Hadamard gate on target qubits119867otimes119899

(3) forradic2119899 times do(4) Apply phase inversion operation 119880

119891(119868 otimes 119867)

(5) Apply inversion about mean operation on the target qubits minus119868 + 2119860(6) end for(7) Measure the target qubits

Algorithm 1 Groverrsquos search algorithm

all elements is computed and inversions are made aboutthe mean The overall mean remains unchanged after theinversion process This is because the distance between oneelement and the mean is the same before and after inversionThe only change is if the original sequence is above themean during the inversion it is flipped about the mean to thesame distance below the mean and vice versa In general theinversion about mean operation can be expressed as

V1015840 = minusV + 2119886 (16)

where 119886 is the mean V is the value of an element in the arrayand V1015840 is the new value of that element after inversion

In terms of matrices the mean of a 2119899 elements vector119881 is obtained by the product of matrices 119860 and 119881 where allthe elements in the 2119899-by-2119899 matrix 119860 are set to 12119899 Henceinversion about mean in matrix form becomes

1198811015840= minus119881 + 2119860119881 = (minus119868 + 2119860)119881 (17)

In order to achieve high confidence of getting the desiredelement the amplitude amplification process (amplitudeamplification in Groverrsquos search algorithm involves phaseinversion and inversion about mean operations) has to berepeated for radic2119899 times This is because the probability ofsuccess changes sinusoidally by the number of amplitudeamplification iterations (as illustrated in Figure 3) andthe highest probability of success first happened after therequired iterations Pseudocode for a generic Groverrsquos searchalgorithm is given in Algorithm 1

There are two approaches to model Groverrsquos searchquantum algorithm The first approach which is based onquantum circuitmodel is discussed nextThe secondmethodis modelling using arithmetic functions and this is presentedin Section 42 since this approach is applied in this paper

As shown in Figure 4 Groverrsquos search circuit given in [15]is constructed with assumption that black box modules 119880

119891

and minus119868+2119860 are available Descriptions of the119880119891and minus119868+2119860

modules have been given earlierOn the other hand the circuit model for 119899-qubit Groverrsquos

algorithm presented in [33] is shown in Figure 5(a) In thisfigure119867 is the Hadamard gate and 119866 is the Grover iterationcircuit which is illustrated in Figure 5(b)

The function of Grover iteration circuit is equivalentto the phase inversion and inversion about mean In orderto achieve high probability of successful search 119866 is con-catenated for radic2119899 times In Figure 5(b) the role of oraclemodule is to recognize the solution to a particular search

0 20 40 60 80 100 120 140 160 180 2000

010203040506070809

1

Number of iterations

Mea

sure

men

t pro

babi

lity

Element-of-interestOther elements

Figure 3 Probability of success by the number of amplitude amplifi-cation iterations amongst 210 probabilities For 10-qubit search firsthighest probability of success happens at 25th iteration

|0⟩

|1⟩

minusI + 2A

Phase inversionInversion

Hotimesn

Ufn n n n

H

about mean

Repeat radic2n times

Figure 4 Modelling of Groverrsquos search based on quantum circuitmodel [15]

problem in the phase inversion operation By monitoringthe oracle qubit a solution to the search problem can bedetected through the changes of the oracle qubit The designof oraclemodule varieswith different search applications andan example of the oracle circuit for a simple 3-bit search task isshown in Figure 6 In Figure 6 the119883 symbol represents Pauli-119883matrixThe open circle notation indicates conditioning onthe qubit being set to zero whereas the closed circle indicatesconditioning on the qubit being set to one

Deriving from the circuits in Figure 5 the corresponding3-qubit Groverrsquos search circuit is provided in Figure 7 Thiscircuit model is made up of Hadamard oracle quantumNOT and multiqubit controlled-NOT gates

International Journal of Reconfigurable Computing 7

|0⟩

|1⟩

Target qubits

Oracle qubit

Hotimesn Hotimesnn n

n

G G

H H

middot middot middot

O(radic2n)

(a) 119899-qubit Groverrsquos search

Oracle workspace

Phase

Oraclen-qubits Hotimesn Hotimesn

For x gt 0|x⟩ 997888rarr (minus1)f(x)|x⟩

|0⟩ 997888rarr |0⟩

|x⟩ 997888rarr minus|x⟩

(b) Grover iteration circuit 119866

Figure 5 Quantum circuit model for Groverrsquos search algorithm [33]

X

Xequiv

Figure 6 Oracle circuit for recognizing binary string ldquo010rdquo

4 Proposed FPGA-Based Hardware Emulation

This section presents our approach in modelling quantumFourier transform and Groverrsquos search algorithms for FPGAemulation The proposed techniques can be generalized toFPGA emulation of more complex quantum algorithms thatapply QFT or Groverrsquos algorithm This paper extends ourearlierwork presented in [40 41] In these previousworks thehardware architecture proposed was restricted to the serialdesign with resource sharing facilitated at the register levelIn this paper we enable resource sharing at the operator (orcomputational) level that allows for more efficient emulationof the quantum algorithms Furthermore additional casestudy on Groverrsquos search algorithm is included for general-ization of the proposed framework The choice of hardwarearchitecture varies based on the need of different applicationsBased on the selected case studies the efficiencies of differenthardware architectures for quantum computing emulationpurposes are discussed and analysed in this work The goalis to achieve scalability and also efficient resource utilizationfor emulating practical larger qubit size quantum systems

41 Modelling QFT for FPGA Emulation The derivation ofquantum circuit model for 119899-qubit QFT was discussed earlieris Section 34 Here we present the modelling of QFT forFPGA emulation based on a 3-qubit example According to(14) the mathematical expression for 3-qubit QFT is derivedas shown in

QFT23

1003816100381610038161003816119895⟩ =

1

radic23(|0⟩ + 119890

21205871198940sdot1198953|1⟩)

sdot (|0⟩ + 11989021205871198940sdot11989521198953

|1⟩)

sdot (|0⟩ + 11989021205871198940sdot119895111989521198953

|1⟩)

(18)

Deriving from the general 119899-qubit QFT circuit providedin Figure 1 the corresponding quantum circuit for 3-qubitQFT is obtained as shown in Figure 8

The circuit consists of Hadamard gates 119867 controlledphase-shift gates 119862119877

2and 119862119877

3 and also the SWAP gate

Referring to the functional block diagram given in Figure 9this quantum circuit model corresponds to a sequence ofunitary transformations 119880119894 119894 = 1 to 7 defined by

1198801 = 119867 otimes 119868 otimes 119868 (19)

1198802 =1198621198772otimes 119868 (20)

1198803 = (119868 otimes SWAP) sdot (1198621198773otimes 119868) sdot (119868 otimes SWAP) (21)

1198804 = 119868 otimes 119867 otimes 119868 (22)

1198805 = 119868 otimes1198621198772 (23)

1198806 = 119868 otimes 119868 otimes 119867 (24)

1198807 = SWAP23 (25)

Note that in Figure 9 the modelling of 119899-qubit quan-tum system with superposition and entanglement propertiesresulted in a circuit with 2119899 signals Since the input samplesto QFT circuit are encoded as sequence of amplitudes inan entangled superposition of basis states (discussed in Sec-tion 34) modelling based on individual qubit with separatequantum gate operations is unable to reflect the effects ofapplying a quantum gate on entangled qubits correctly

In order to model the effect of superposition and entan-glement derivation of each unitary transformation is madethrough the tensor product of individual quantum gate andidentity matrix to form unitary matrix of equal dimensionwith the quantum state vector Detailed derivations of thequantum unitary matrices for the 3-qubit QFT have beenpresented in our previous paper [41]

Since these quantum unitary matrices are mostly sparsematrices we extract minimal number of useful arithmeticoperations (due to nonzero elements in the matrices) result-ing in an optimal realization of themodel that can bemappedto an efficient FPGA emulation architecture Incidentally asoftware program has been developed to automate this map-ping hence easily scaling up the circuit model to larger qubitsizes From these arithmetic functions the correspondingdata-flow graph for the 3-qubit QFT is derived as shown inFigure 10

In Figure 10 each IN and OUT signal is a fixed pointcomplex number register for an element of the quantumstate vector The operation of module 119865 corresponds toout 119903 = minusin 119894 and out 119894 = in 119903 where 119865 is the multiplication

8 International Journal of Reconfigurable Computing

|0⟩

|0⟩

|0⟩

|1⟩

Target qubits

Grover iteration circuit GGrover iteration circuit G

Oracle qubit

Oracle Oracle

H H

H H

H

H

H

H

H

H

H

H

H

H

H

H

H

H

H HH HX

X

X

X

X

X

X

X

X

X

X

X

H

H

Figure 7 Modelling of 3-qubit Groverrsquos search based on quantum gates

H

H

H

R2

R2

R3

U1 U2 U3 U4 U5 U6 U7

|k1⟩

|k2⟩

|j1⟩

|j2⟩

|k3⟩|j3⟩

1

radic2(|0⟩ + e2120587i0middotj1j2j3|1⟩)

1

radic2(|0⟩ + e2120587i0middotj2j3|1⟩)

1

radic2(|0⟩ + e2120587i0middotj3|1⟩)

Figure 8 Quantum circuit for 3-qubit QFT

IN0

IN7U1 U2 U3 U4 U5 U6 U7

8 8

OUT0

OUT7

Figure 9 The 3-qubit QFT in terms of a sequence of unitarytransformations

of the input complex number with imaginary number 119894This is applied in unitary transformations 1198802 and 1198805 Theoperation ofmoduleCMULT in Figure 10 is described in (26)The function of CMULT is the multiplication of the inputcomplex number with a constant complex number which isderived based on the controlled phase-shift gate As expressedpreviously in (21) it is used in unitary transformation 1198803

out 119903 = (in 119903 times constant 119903) minus (in 119894 times constant 119894)

out 119894 = (in 119894 times constant 119903) + (in 119903 times constant 119894) (26)

42 Modelling Groverrsquos Search for FPGA Emulation As men-tioned earlier the second approach of modelling Groverrsquosquantum search is modelling using arithmetic functions(based on mathematical model) In this work we havechosen this approach of modelling Groverrsquos algorithm forFPGA emulation In contrast to the quantum circuit modelapproach that involves complex large dimensional matrixoperations (ie matrix multiplication and tensor product)the chosen method can utilize the computational resourcesavailable on FPGA such as comparators adderssubtractersand multiplexers for efficient emulation of Groverrsquos searchalgorithm

This technique is based on phase inversion and inver-sion about mean as described in Section 35 As shown inFigure 11 mathematically Groverrsquos search for a databasewith 2119899 elements mainly involves the processes of invertingthe phase of target element function 119865 and performing

inversion about mean on all elements function minus119868 + 2119860 forradic2119899 times Initialization of the quantum state vector with

equal probability function INIT is carried out once in thebeginning of the process

For the case study of a 3-bit search problem we derivethe data-flow graphs and obtain the result of the requiredarithmetic functions as shown in Figure 12 For experimentalpurposes the oracle module is developed by comparingall elements in database with targeted element using com-parators COMP modules The arithmetic functions of theinversion about mean are derived through straightforwardcomputations that involve summation bit shift and subtrac-tion

43 Architecture of Proposed FPGA Emulation Model It isclear that if implemented on classical computing platformsthe resource utilization for a quantum system would growexponentially Hence the choice of suitable architectureis critical for FPGA-based quantum computing emulationHere we discuss the efficiencies of different architecturalchoices in datapath concurrent pipeline serial and serial-parallel The block diagram of various architectures is con-structed based on the example of 3-qubit QFT

431 Concurrent Processing In concurrent processing of analgorithm all computations are completed within a clockcycle In the case of 3-qubitQFT computation blocks betweenthe input and output registers through the functional blocks1198801 to 1198807 are performed in one clock cycle (refer to Fig-ure 13(a)) However such an architecture consumes enor-mous resources such that the number of registers required toemulate an n-qubit QFT is 2119899+1 In addition the critical pathdelay is very high which results in unrealistic low operatingfrequency

432 Pipeline Architecture Most of the prior works onFPGA-based quantum circuit emulation [25ndash27] are devel-oped based on pipeline architectureThepipeline architecturehas the advantages of high throughput and much shortercritical path delay Figure 13(b) shows the proposed pipelinearchitecture of the 3-qubit QFT However the main issue ofthis approach is that resource utilization grows drasticallyby the number of qubits due to the circuit augmentation ofpipeline registers 2119899(sum119899

119896=1119896+2)pipeline registers are required

to emulate n-qubit QFT Consequently hardware emulationscalability is highly constrained by the available resources inFPGA

International Journal of Reconfigurable Computing 9

IN0

IN0 IN4 IN1 IN5 IN2 IN6 IN3 IN7 IN0 IN4 IN1 IN5 IN2 IN6 IN3 IN7

IN1 IN2 IN3 IN4 IN5 IN6 IN7

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

U7

U6

U5

U4

U3

U2F

F F

F

U1+ + + + minus minus minus minus

+

++ + +minus

minus + minus + minus + minus

minus minus minus

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5

out2_5

out2_6 out2_7

out2_7

CMULTCMULT

out3_0

out3_0 out3_0

out3_1

out3_1 out3_1

out3_2

out3_2 out3_2

out3_3

out3_3 out3_3

out3_4

out3_4 out3_4

out3_5

out3_5 out3_5

out3_6

out3_6 out3_6

out3_7

out3_7 out3_7

out4_0 out4_1 out4_2 out4_3 out4_4 out4_5 out4_6 out4_7

out5_0

out5_0

out5_1

out5_1 out5_0 out5_1

out5_2

out5_2

out5_3

out5_3 out5_2 out5_3

out5_4

out5_4

out5_5

out5_5 out5_4 out5_5

out5_6

out5_6 out5_6

out5_7

out5_7out5_7

out6_0 out6_1

out6_1

out6_2 out6_3

out6_3

out6_4

out6_4

out6_5 out6_6

out6_6

out6_7

1

radic2(1 + i)

1

radic2(1 + i)

times times times times times times times times

times times times times times times times times

times times times times times times times times

Figure 10 Data-flow graph for 3-qubit QFT The multipliers shown in this diagram represent the multiplication of input complex numberwith constant 1radic2

433 Serial Processing Although serial design requires mul-tiple iterations to perform a complex computation it opensup the opportunity for resource sharing Serial processingis suitable for applications where resource utilization is acritical design constraint Figure 13(c) depicts the serial formof the 3-qubit QFT circuit that consists of a control unitand a datapath unit As resources can be reused betweentransformations a serial-based n-qubit QFT consumes 2119899registers a register utilization that is much lower than inconcurrent or pipeline architectures However pure serial

approach forfeits the purpose of conducting FPGA emulationwhose aim is to exploit the parallelism inherent in a quantumsystem as it would still suffer from slow sequential behaviouras exhibited in simulation on classical computer

434 Proposed Serial-Parallel Architecture In this paperwe propose a hybrid serial-parallel architecture for FPGAemulation of quantum algorithms The proposed approachtakes advantage of both serial and parallel design techniques

10 International Journal of Reconfigurable Computing

Repeat radic2n times

2n 2n 2n

Phase inversion

INIT

Inversionabout mean

minusI + 2AF

Figure 11 Mathematical modelling of Groverrsquos search algorithm

Applying the concepts of quantum parallelism and quantumdynamicsmodelled by sequential transformations on a quan-tum state vector it is found that the proposed serial-parallelarchitecture is suitable for efficient and accurate quantumcomputing emulation on FPGA platform

Figure 14 shows the functional block diagram of theproposed serial-parallel FPGA emulation architecture of the3-qubit QFT The serial-parallel design of the datapath unitinvolves a number of quantum computation units that canperform parallel computations for each stage of unitarytransformation whereby the same computational resourcescan be reused for the following stage of transformations

For data storage and synchronization purposes 2119899 reg-isters are shared between unitary transformations As com-pared to the pipeline design our proposed serial-parallelapproach achieves linear reduction on the usage of registersto emulate the same quantum system The arithmetic logicunit (ALU) in the datapath unit contains multiple customprocessing elements and the allocation of resources in ALUvaries based on the target application The number of pro-cessing elements is basically determined based on the desired2119899 parallelism in the n-qubit quantum system

As illustrated in Figures 10 and 12 the data-flow graph ofQFT and Groverrsquos search algorithm exhibit similar repetitivepattern between unitary transformations This implies thatthe proposed serial-parallel approach can be generalized forthe two case studies to achieve balance in both resourceutilization and speed performance For the case of Groverrsquossearch the ALUwould be theGrover iterationmodule shownin Figure 12 whereas the control unit is designed to keep trackon the number of Grover iterations required by the targetsearch problem

5 Experimental Results

The proposed emulation designs are modelled in SystemVer-ilog HDL synthesized using Altera Quartus II softwareand implemented into target emulation platform which isbased on Altera Stratix IV EP4SGX530KF43C4 FPGA InSection 51 we discuss the verification of the hardwareemulation designs for the QFT and Groverrsquos search casestudies In addition the automated process for scaling upthe design to larger qubit size is described In Section 52investigation is conducted to study the effects of the numberof mantissa bits used in our fixed point representation formaton resource utilization and precision error In the sectionthat follows we analyse for different emulation architectureshow the increase in qubit size impacts on resource growthand maximum operating frequency allowed in the designs

Finally the runtime speeds in simulation and emulation forquantum algorithms with different qubit sizes are compared

51 Design Verification To show that the quantum algo-rithms have been modelled accurately design verificationof the proposed emulation hardware is performed Goldenreferences based on the software simulationmodels (in C) aredeveloped and their outputs are comparedwith the emulationhardware under test (which are described in SystemVerilogHDL)

In our QFT case study FFTW3 [42] a widely applied fastFourier transform library in C is used to perform Fouriertransform computations on the signal samples that are usedin this work The outputs of the classical Fourier transformthen serve as the golden referencemodel in verification of theproposed emulation model Furthermore since the discreteFourier transform (DFT) is a linear transformation that canbe defined in unitary matrix form the functional correctnessof our QFT hardware emulation model can be convenientlyverified against the DFTmatrixThe expression of an 119899-qubitDFT matrix is shown in

DFT = 1

radic2119899

[

[

[

[

[

[

[

[

[

[

[

1 1 1 1

1 1205961

1205962

1205962119899minus1

1 1205962

1205964

1205962(2119899minus1)

1 1205962119899minus11205962(2119899minus1) 120596

(2119899minus1)(2

119899minus1)

]

]

]

]

]

]

]

]

]

]

]

(27)

where 120596 is the 2119899th root of unity that is 120596 = 11989021205871198942119899

Thechoice of 11989021205871198942

119899

or 119890minus21205871198942119899

is purely a matter of convention asboth the term 11989021205871198942

119899

and the term 119890minus21205871198942119899

to the power of 2119899are equal to 1

On the other hand the design of Groverrsquos search FPGAemulation model is verified against the mathematical modelprovided in literature The simulation model of Groverrsquossearch algorithm also serves as the golden reference modelfor verification purposes

For the development of FPGA emulation models forpractical quantum computing applications it is importantthat the emulation hardware can be scaled up to larger qubitsize architectures In this work the designs are scaled up withthe aid of software program developed in-house HDL codesof the two case studies are autogenerated by the softwareprogram based on the proposed modelling techniques (asdiscussed in Section 4) The generated HDL code producesefficient hardware emulation model based on the proposedserial-parallel architecture

52 Fixed Point Representation As defined in (2) a quantumstate vector is represented by complex floating point numbersTo ensure effective resource utilization in our FPGA emula-tion hardware floating point numbers are replaced by fixedpoint representations In this work a fixed point format with1 sign bit 1 integer bit and 119873 mantissa bits (as shown inFigure 15) is used Since the amplitudes of a quantum statethat is the probabilities of collapsing into computational basis

International Journal of Reconfigurable Computing 11

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

minusIN0

IN0

IN0 minusIN1IN1 minusIN2IN2 minusIN3IN3 minusIN4IN4 minusIN5IN5 minusIN6IN6 minusIN7IN7

Target

Target Target Target Target Target Target Target Target

Target Target Target Target Target Target Target

COMP

COMPCOMPCOMPCOMPCOMPCOMPCOMPCOMP

COMPCOMPCOMPCOMPCOMPCOMPCOMP

IN1 IN2 IN3 IN4 IN5 IN6 IN7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

+ + + +

++

+

+ +

+

+ + +

+

out1_0 minusout1_0 out1_1 minusout1_1 out1_2 minusout1_2 out1_3 minusout1_3 out1_4 minusout1_4 out1_5 minusout1_5 out1_6 minusout1_6 out1_7 minusout1_7

minus

minus minus minus minus minus minus minus minus

minus minus minus minus minus minus minus

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

Groveriteration 2

Grover

INIT

iteration 1

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

≫(n minus 1)

≫(n minus 1)

Figure 12 Data-flow graph of Groverrsquos circuit for 3-bit search

U1 U2 U3 U4 U5 U6 U78 8

IN0

IN7

OUT0

OUT7

(a) Concurrent processing

U4U1 U2 U3 U5 U6 U788

IN0

IN7

OUT0

OUT7

(b) Pipeline architecture

88

Control unit

Datapath unit

Shared

Arithmeticlogic unit

(ALU)

registersIN0

IN7

OUT0

OUT7

(c) Serial architecture

Figure 13 Three-qubit QFT implemented using different hardware architectures

12 International Journal of Reconfigurable Computing

CU

ALU

DU

alu3alu2alu1alu0

alu4 alu5 alu6 alu7

CMULTCMULT

reg0reg4

IN0

Shared registers

IN1alu0

alu1alu4

IN2alu2alu4alu1IN3

IN4

alu3alu5

alu4alu2

reg3reg6

reg3

reg1

reg6

reg7

IN7

alu5

IN6

IN5

alu7

alu3

alu6

alu3

alu6

reg4 reg4reg2 reg2 reg2reg1 reg1 reg3 reg3reg5

reg5

reg5 reg5reg6 reg6 reg7

reg7

Stateregisters

Next-stateStart

Control signals

logic

Outputlogic

+

F

F

F

+ + +

minus minus minus minus

1

radic2(1 + i)

1

radic2(1 + i)

alu_comp0 alu_comp1

reg7

reg6

reg5

reg3

reg1

reg2

reg4

reg0

alu_comp

alu_comp0

1

|k1⟩|k2⟩

|j1⟩|j2⟩

|k3⟩|j3⟩

times times times times

times times times times

Figure 14 Proposed serial-parallel architecture for 3-qubit QFTThe multipliers shown in this diagram represent the multiplication of inputcomplex number with constant 1radic2

1 sign bit 1 decimal bit N mantissa bits

Figure 15 Fixed point representation format

states after measurement are constrained in the range of 0 to1 only one bit is required to represent the integer part

Due to the limitations of the classical digital computingplatform representation of qubit amplitudes with infiniteprecision is infeasible In the context of quantum computermodelling particularly for quantum systems with large qubitsizes minimising precision loss is critical to preserve theconsistency of the quantum state during the modellingprocess [43] Here we investigate how precision error is

affected by the number of mantissa bits used in our fixedpoint representation format The corresponding experimen-tal results are given in Figure 16

Precision error shown in Figure 16 is computed based onthe following equations

error 119903 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119903

119898minus simulate 119903

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119903

119898

1003816100381610038161003816

error 119894 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119894

119898minus simulate 119894

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119894

119898

1003816100381610038161003816

precision error = 1

2119899+1(error 119903 + error 119894)

(28)

International Journal of Reconfigurable Computing 13

14 15 16 17 18 19 20 21 22 23 240

005

01

015

02

025

03

035

04

045

05

Number of mantissa bits

Prec

ision

erro

r (

)

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

Figure 16 Precision error against different fixed point formats used

where simulate 119903 and simulate 119894 are the floating point realand imaginary amplitudes of the reference output state gen-erated from simulation whereas emulate 119903 and emulate 119894 arethe amplitudes extracted from the output state of proposedFPGA emulator (converted from original fixed point formatto floating point for verification purposes)

Figure 16 shows that 16-bit fixed point format (14mantissabits) incurs significant precision error for 5-qubit emulationsin both case studies However the error produced by 2-qubitemulations is insignificant This behaviour is because theamplification of the fixed point truncation errors grows withthe size of quantum system For FPGA emulation purposesprecision error due to fixed point representation can bereduced by increasing the number ofmantissa bits with trade-off on resource utilization By expanding the number ofmantissa bits up to 24-bit (which results in 26 total numberof bits) negligible precision error for 5-qubit emulations isattained It is important to note that the proposed FPGAemulator is parameterizable in terms of the number ofmantissa bits for fixed point representation This is crucial toensure different fixed point formats can be applied to emulatequantum circuit of various sizes based on the demandedprecision error tolerance and resource constraint

The size of our fixed point number format also affectsresource utilization The corresponding experimental resultsforQFT case study are shown in Figure 17 Since the resourcesavailable on FPGA device are mostly in blocks or multiples of8 bits the choice of 26-bit fixed point format is not suitableHence we apply 22-mantissa-bit size (ie total number of bitsis 24) in our fixed point representation formats Note thatfor Groverrsquos search emulations the experiment on resourceutilization of DSP blocks is not relevant because FPGAemulation model developed here (as depicted in Figure 12)does not involve multiplication

53 Efficiency of Proposed FPGA Emulation Architecture Inthis subsection we investigate how the increase in qubit size

impacts resource growth and maximum allowable operatingfrequency in different emulation architectures (ie concur-rent pipeline and serial-parallel) In the case of QFT emu-lation model we have two versions of serial-parallel archi-tectures Type 1 serial-parallel uses DSP blocks to performmultiplications whereas type 2 replaces the multiplications(required in Hadamard gate operations 1198801 1198804 and 1198806 inthe 3-qubit QFT case) with shift-add operations Althoughconventional hardware design methods encourage the usageof shift-add operation instead of multiplication to reduceresource utilization the case is now different with FPGAdevices containing efficient built-in DSP blocks Figures 18and 19 show the results of the experiments conducted onQFTand Groverrsquos search algorithms respectively

Based on the experimental results shown in Figure 18when comparing to type 1 type 2 method consumes lessDSP blocks but more logic elements due to the constructionof adders needed in shift-add operations As the number ofqubits increases resource utilization of logic elements fortype 1 emulation grows rapidly when available DSP blocks areused up and it ends up with similar resource utilization asthe type 2 approach Hence we can conclude that for large-scale FPGA emulations both methods would lead to similarresource utilization Thus the first approach is preferred dueto the ease of design process where DSP blocks are utilized bydefault when implementing the design with the FPGA designautomation tool

In contrast to the concurrent and pipeline designs theexperimental results for QFT and Groverrsquos search show thatthe proposed serial-parallel architecture achieves balance onboth resource utilization and operating frequency The pro-posed architecture has significantly reduced resource growthin the application of logic elements dedicated logic regis-ters and DSP block yet maintaining reasonable operatingfrequency With the concurrent and pipeline architectures5-qubit QFT emulation completely used up the resourcesavailable in the Altera Stratix IV FPGA device used in thiswork However the same device can support up to 7-qubitQFT emulationwith the proposed serial-parallel architectureIt is important to note that the processing power of 7-qubitQFT is far higher than the 5-qubit implementation as an 119899-qubit QFT can process up to 2119899 samples in one evaluation

For scalability software simulation would depend onresources available on the computer servers The scalabilityof the FPGA emulation framework would depend on theresources available on target FPGA devices As there havebeen rapid advances in FPGA technology in recent years bydesigning an efficient architecture that is implemented in ahigh-density FPGA device (such as the Altera Stratix 10 thatcontains up to 55 million logic elements) one can actuallyemulate large qubit size quantum circuit on a single FPGAchip Furthermore new approach to FPGA emulation maybe made through the exploration of efficient data structuresandmodellingmethodsThus the proposedwork contributesto the formulation of a proof-of-concept baseline FPGAemulation framework with optimization on datapath designsthat can be extended to emulate practical large-scale quantumcircuits

14 International Journal of Reconfigurable Computing

14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Number of mantissa bits

Com

bina

tiona

l ALU

Ts

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

times104

(a) Logic element usage

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

14 15 16 17 18 19 20 21 22 23 240

500

1000

1500

2000

2500

3000

3500

Number of mantissa bits

Ded

icat

ed lo

gic r

egist

ers

(b) Register usage

2-qubit QFT5-qubit QFT

14 15 16 17 18 19 20 21 22 23 240

200

400

600

800

1000

1200

Number of mantissa bits

DSP

blo

ck 1

8-bi

t ele

men

t

(c) DSP block usage

Figure 17 Resource utilization for different fixed point formats used

54 Benchmarking between Simulation and Emulation Hereour emulation models are benchmarked against the equiv-alent software simulations The simulation models used arebased on an open source quantum library 119897119894119887119902119906119886119899119905119906119898 [28]Software simulation is performed on an Intel Core i7-4790eight-core processor with 36GHz clock rate running on aLinux-based Ubuntu 1404 kernel whereas hardware emu-lation is based on the Altera Stratix IV EP4SGX530KF43C4FPGA Table 1 shows the runtime comparison betweensimulation and our emulation

Figure 20 illustrates the runtime speed-up (simula-tionemulation) of Groverrsquos search case study It is importantto note that the proposed hardware emulation is implementedbased on 24-bit fixed point format (which is determined

Table 1 Comparison of runtime between simulation and emulation

QFT runtime (s) Groverrsquos search runtime (s)Simulation Emulation Simulation Emulation

2-qubit 155 times 10minus6 358 times 10minus9 296 times 10minus6 46 times 10minus9

3-qubit 466 times 10minus6 805 times 10minus9 740 times 10minus6 120 times 10minus9

4-qubit 510 times 10minus6 1344 times 10minus9 2207 times 10minus6 214 times 10minus9

5-qubit 565 times 10minus6 2193 times 10minus9 3988 times 10minus6 367 times 10minus9

6-qubit mdash mdash 10696 times 10minus6 627 times 10minus9

7-qubit mdash mdash 27714 times 10minus6 968 times 10minus9

based on the experimental work presented in Section 52)whereas 32-bit single precision float is used in the software

International Journal of Reconfigurable Computing 15

2 25 3 35 4 45 50

2

4

6

8

10

12

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

times104

(a) Logic element usage

2 25 3 35 4 45 50

05

1

15

2

25

3

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

times104

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(b) Register usage

2 25 3 35 4 45 50

200

400

600

800

1000

1200

Number of qubits

DSP

blo

ck 1

8-bi

t ele

men

t

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(c) DSP block usage

2 25 3 35 4 45 50

50

100

150

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(d) Maximum operating frequency

Figure 18 Resource utilization and maximum operating frequency for different emulation architectures of QFT (based on 24-bit fixed pointformat)

simulation to describe the quantum state For a larger qubitsize quantum circuit the number of mantissa bits for fixedpoint representation has to be increased accordingly to ensuresimilar precision as in software simulation can be achievedwith trade-offs on resource utilization and execution speed[44]

From Figure 20 it can be observed that the proposedhardware emulation provides significant speed-up over soft-ware simulation using 119897119894119887119902119906119886119899119905119906119898 It is important to notethat the achieved speed-up increases drastically as the num-ber of qubits increasesThis result further supports the notionthat hardware emulation has significant potential in themodelling of a large-scale quantum system on the classicalcomputing platform based on FPGA

As the number of required IO pins for emulating QFTand Groverrsquos search algorithms with parallel read-outs istoo much to fit in the existing FPGA devices board-levelverification is infeasible Although the usage of multiplexersreduces the number of output pins resource consumptionrises significantly with the increase in the number of qubitsand this affects the analysis of the overall experiment Thusthe estimated runtime of the proposed FPGA emulationarchitecture is obtained based on the hardware clock cycleand the operating frequency that is acquired from the FPGAdevelopment toolThis is not a critical issue in future practicaldeployment as the selected case studies are meant to work ascore modules within other quantum applications that mightnot require parallel read-out

16 International Journal of Reconfigurable Computing

2 25 3 35 4 45 5 55 6 65 70

1

2

3

4

5

6

7

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipelineSerial-parallel

times104

(a) Logic element usage

2 25 3 35 4 45 5 55 6 65 70

05

1

15

2

25

3

35

4

5

45

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

ConcurrentPipelineSerial-parallel

times104

(b) Register usage

2 25 3 35 4 45 5 55 6 65 70

50

100

150

200

250

300

350

400

450

500

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipelineSerial-parallel

(c) Maximum operating frequency

Figure 19 Resource utilization and maximum operating frequency for different emulation architectures of Groverrsquos search (based on 24-bitfixed point format)

6 Conclusion and Future Work

Efficient resource utilization is critical for FPGA-basedimplementations especially for emulating quantum comput-ing applications as they typically exhibit exponential resourcerequirement with increasing number of qubits In this paperwe proposed a baseline FPGA emulation framework withfocus on the datapath design optimization based on theconventional state vector model as well as an effectivemethodology that facilitates accurate modelling of quantumalgorithms for FPGAemulationA serial-parallel architecture

with efficient resource utilization for FPGA-based emulationof quantum computing is presentedThe proposed emulationarchitecture achieves linear reduction in resource utilizationcompared to pipeline implementations as found in previousworks This work has also demonstrated the advantage ofFPGA emulation over software simulation where hardwareemulation of 7-qubit Groverrsquos search is about 3 times 104 timesfaster than the software simulation performed on Intel Corei7-4790 eight-core processor running at 36GHz clock rate

However experimental results obtained in this workshow that it is difficult to realize a scalable and flexible

International Journal of Reconfigurable Computing 17

2 25 3 35 4 45 5 55 6 65 705

1

15

2

25

3

Number of qubits

Runt

ime s

peed

-up

(sim

ulat

ion

emul

atio

n)

times104

Figure 20 Runtime speed-up against number of qubits in Groverrsquossearch emulation

emulation platform for large qubit size real-world quantumsystem using the approach that applies existing state vectorquantum models This concurs with [45] that states that thepractical limit on the size of the quantum system that canbe modelled on classical computing platform can hardly beovercome due to exponentially large memory requirementsfor storing the entire state vector Hence this suggests thata model with a more effective data structure to representquantum systems is required Recently the work on stabi-lizer frames [30] has shown promise in providing a moresuitable data structure for quantum models targeted forFPGA emulation This is the subject of future work in ourresearch in applying FPGA emulation in modelling of large-scale quantum systems In addition the error-correctionstructure available with stabilizer frames will be consideredfor application in practical quantum computations With amore efficient modelling technique FPGA can represent amore efficient emulation strategy of quantum systems

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the Ministry of Higher Education(MOHE) and Universiti Teknologi Malaysia (UTM) underFundamental Research Grant Scheme (FRGS) Vote number4F422

References

[1] P W Shor ldquoAlgorithms for quantum computation discretelogarithms and factoringrdquo in Proceedings of the 35th IEEEAnnual Symposium on Foundations of Computer Science(SFCS rsquo94) pp 124ndash134 Santa Fe NM USA November 1994

[2] L K Grover ldquoQuantum mechanics helps in searching for aneedle in a haystackrdquo Physical Review Letters vol 79 no 2 pp325ndash328 1997

[3] A Malossini and T Calarco ldquoQuantum genetic optimizationrdquoIEEE Transactions on Evolutionary Computation vol 12 no 2pp 231ndash241 2008

[4] R L Rivest A Shamir and L Adleman ldquoA method forobtaining digital signatures and public-key cryptosystemsrdquoCommunications of the ACM vol 21 no 2 pp 120ndash126 1978

[5] L K Grover ldquoA fast quantum mechanical algorithm fordatabase searchrdquo in Proceedings of the 28th Annual ACMSymposium onTheory of Computing pp 212ndash219 ACM 1996

[6] N Shenvi J Kempe andK BWhaley ldquoQuantum random-walksearch algorithmrdquo Physical Review A Atomic Molecular andOptical Physics vol 67 no 5 Article ID 052307 2003

[7] E Farhi J Goldstone and S Gutmann ldquoA quantum algorithmfor the Hamiltonian NAND treerdquo Theory of Computing vol 4pp 169ndash190 2008

[8] P W Shor ldquoWhy havenrsquot more quantum algorithms beenfoundrdquo Journal of the ACM vol 50 no 1 pp 87ndash90 2003

[9] D R Simon ldquoOn the power of quantum computationrdquo SIAMJournal on Computing vol 26 no 5 pp 1474ndash1483 1997

[10] S Hallgren ldquoPolynomial-time quantum algorithms for Pellrsquosequation and the principal ideal problemrdquo Journal of the ACMvol 54 article 4 2007

[11] M Mosca and A Ekert ldquoThe hidden subgroup problem andeigenvalue estimation on a quantum computerrdquo in Quan-tum Computing and Quantum Communications pp 174ndash188Springer Berlin Germany 1999

[12] D Bacon A M Childs and W Van Dam ldquoFrom optimalmeasurement to efficient quantum algorithms for the hiddensubgroup problem over semidirect product groupsrdquo in Pro-ceedings of the 46th Annual IEEE Symposium on Foundationsof Computer Science (FOCS rsquo05) pp 469ndash478 IEEE October2005

[13] L K Grover and A M Sengupta ldquoFrom coupled pendulums toquantum searchrdquo inMathematics of QuantumComputation pp119ndash134 Chapman and HallCRC 2002

[14] R P Feynman ldquoSimulating physics with computersrdquo Interna-tional Journal ofTheoretical Physics vol 21 no 6-7 pp 467ndash4881982

[15] N S Yanofsky and M A Mannucci Quantum Computingfor Computer Scientists vol 20 Cambridge University PressCambridge UK 2008

[16] C Monroe D M Meekhof B E King W M Itano and DJ Wineland ldquoDemonstration of a fundamental quantum logicgaterdquo Physical Review Letters vol 75 no 25 pp 4714ndash4717 1995

[17] N A Gershenfeld and I L Chuang ldquoBulk spin-resonancequantum computationrdquo Science vol 275 no 5298 pp 350ndash3561997

[18] J E Mooij T P Orlando L Levitov L Tian C H Van derWaland S Lloyd ldquoJosephson persistent-current qubitrdquo Science vol285 no 5430 pp 1036ndash1039 1999

[19] I L Chuang N Gershenfeld and M Kubinec ldquoExperimentalimplementation of fast quantum searchingrdquo Physical ReviewLetters vol 80 no 15 article 3408 1998

[20] R Barends J Kelly A Megrant et al ldquoSuperconducting quan-tum circuits at the surface code threshold for fault tolerancerdquoNature vol 508 no 7497 pp 500ndash503 2014

[21] M H Amin N G Dickson and P Smith ldquoAdiabatic quantumoptimization with quditsrdquo Quantum Information Processingvol 12 no 4 pp 1819ndash1829 2013

18 International Journal of Reconfigurable Computing

[22] A D King E Hoskinson T Lanting E Andriyash and M HAmin ldquoDegeneracy degree and heavy tails in quantumanneal-ingrdquo httparxivorgabs151207325

[23] T F Roslashnnow Z Wang J Job et al ldquoDefining and detectingquantum speeduprdquo Science vol 345 no 6195 pp 420ndash4242014

[24] Y Goto and M Fujishima ldquoEfficient quantum computingemulation system with unitary macro-operationsrdquo JapaneseJournal of Applied Physics vol 46 no 4 pp 2278ndash2282 2007

[25] A U Khalid Z Zilic and K Radecka ldquoFPGA emulation ofquantum circuitsrdquo in Proceedings of the IEEE International Con-ference on Computer Design VLSI in Computers and Processors(ICCD rsquo04) pp 310ndash315 IEEE October 2004

[26] J F Rivera-Miranda A J Caicedo-Beltran J D Valencia-Payan J M Espinosa-Duran and J Velasco-Medina ldquoHard-ware emulation of quantum Fourier transformrdquo in Proceedingsof the IEEE 2nd Latin American Symposium on Circuits and Sys-tems (LASCAS rsquo11) pp 1ndash4 IEEE Bogata Colombia February2011

[27] M Aminian M Saeedi M S Zamani andM Sedighi ldquoFPGA-based circuit model emulation of quantum algorithmsrdquo inProceedings of the IEEE Computer Society Annual Symposiumon VLSI (ISVLSI rsquo08) pp 399ndash404 IEEE Montpellier FranceApril 2008

[28] H Weimer M Muller I Lesanovsky P Zoller and H PBuchler ldquoA Rydberg quantum simulatorrdquoNature Physics vol 6no 5 pp 382ndash388 2010

[29] G F Viamontes Efficient quantum circuit simulation [PhDthesis] The University of Michigan Ann Arbor Mich USA2007

[30] H J Garcıa and I L Markov ldquoSimulation of quantum circuitsvia stabilizer framesrdquo IEEE Transactions on Computers vol 64no 8 pp 2323ndash2336 2015

[31] C P Williams and S H Clearwater Explorations in QuantumComputing Springer Berlin Germany 1998

[32] A Barenco D Deutsch A Ekert and R Jozsa ldquoConditionalquantumdynamics and logic gatesrdquo Physical Review Letters vol74 no 20 pp 4083ndash4086 1995

[33] M A Nielsen and I L Chuang Quantum Computation andQuantum Information Cambridge University Press 2000

[34] W-W Zhang F Gao B Liu Q-Y Wen and H Chen ldquoAwatermark strategy for quantum images based on quantumFourier transformrdquo Quantum Information Processing vol 12no 2 pp 793ndash803 2013

[35] D Curtis and D A Meyer ldquoTowards quantum templatematchingrdquo in Proceedings of the SPIE 48th Annual Meeting inQuantum Communications and Quantum Imaging pp 134ndash141International Society for Optics and Photonics August 2003

[36] R Schutzhold and G Schaller ldquoAdiabatic quantum algorithmsas quantum phase transitions first versus second orderrdquo Physi-cal Review A vol 74 no 6 Article ID 060304 2006

[37] W P Baritompa D W Bulger and G R Wood ldquoGroverrsquosquantum algorithm applied to global optimizationrdquo SIAMJournal on Optimization vol 15 no 4 pp 1170ndash1184 2005

[38] C Durr and P Hoyer ldquoA quantum algorithm for finding theminimumrdquo httparxivorgabsquant-ph9607014

[39] A Montanaro ldquoQuantum pattern matching fast on averagerdquoAlgorithmica 2015

[40] Y H Lee M Khalil-Hani and M N Marsono ldquoFPGA-basedquantum circuit emulation a case study on Quantum Fouriertransformrdquo in Proceedings of the 14th International Symposium

on Integrated Circuits (ISIC rsquo14) pp 512ndash515 IEEE SingaporeDecember 2014

[41] M Khalil-Hani Y H Lee and M N Marsono ldquoAn accu-rate FPGA-based hardware emulation on quantum fouriertransformrdquo in Proceedings of the Australasian Symposium onParallel andDistributed Computing (AusPDC rsquo15) vol 1 SydneyAustralia 2015 a1b3

[42] M Frigo and S G Johnson ldquoThe design and implementationof FFTW3rdquo Proceedings of the IEEE vol 93 no 2 pp 216ndash2312005

[43] D Wecker and K M Svore ldquoLIQUi|gt a software design archi-tecture and domain-specific language for quantum computingrdquohttparxivorgabs14024467

[44] S Kilts Advanced FPGA Design Architecture Implementationand Optimization John Wiley amp Sons New York NY USA2007

[45] M Smelyanskiy N P D Sawaya and A Aspuru-GuzikldquoqHiPSTER the quantum high performance software testingenvironmentrdquo httparxivorgabs160107195

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 2: Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

2 International Journal of Reconfigurable Computing

speed improvements over their classical counterparts and (c)algorithms for simulating or solving problems in quantummechanics [14]

Physical realization of a quantum computer is proving tobe extremely challenging [15]With research into viable large-scale quantum computers still ongoing various technologiesnamely ion trap [16] nuclear magnetic resonance [17]and superconductor [18] were attempted Nevertheless onlysmall-scale quantum computation implementations havebeen achieved [19 20] Instead of focusing on the realizationof quantum gates a different approach known as quantumannealing which solves optimization problems by findingthe minimum point is used in the 128-qubit D-Wave One512-qubit D-Wave Two and 1000-qubit D-Wave 2X systems[21 22] However based on the research report presented in[23] the expected quantum speed-ups were not found in theD-Wave systems

In parallel to efforts to develop physical quantum com-puters there is also much effort in the theoretical researchof quantum algorithms Until large-scale practical quantumcomputers become prevalent quantum algorithms are cur-rently developed using the classical computing platformHowever due to their inherent sequential behaviour classicalcomputers that are based on Von Neumann architecturecannot simulate the inherent parallelism in quantum systemsefficiently On the other hand the technology of field pro-grammable gate array (FPGA) offers the potential of massiveparallelism through hardware emulation Consequently sig-nificant improvement in speed performance over the equiv-alent software simulation can be achieved However FPGAis still a form of classical digital computing and resourceutilization on such a classical computing platform growsexponentially as the number of qubits increasesThe problemis further compounded with the fact that accurate modellingof quantum circuit in FPGA technology is nonintuitive andtherefore difficult providing the research motivation for thispaper

This paper presents an efficient FPGA emulation frame-work for quantum computing In the proposed emulationmodel quantumcomputations aremapped to a serial-parallelarchitecture that facilitates scalability by managing the expo-nential growth of resource requirement against number ofqubits Quantum Fourier transform and Groverrsquos search arechosen as case studies in this work since they are the coreof many useful quantum algorithms and in addition theyhave been used as benchmarking models in prior works onFPGA emulation Experimental results on the efficienciesof different FPGA emulation architectures and fixed pointformats are presented whichwill sufficiently demonstrate thefeasibility of proposed framework

The rest of this paper is organized as follows Section 2discusses prior works on FPGA-based quantum computingemulation emphasizing issues of hardware architecture andmodelling of quantum system on FPGA platform In Sec-tion 3 the theoretical background on quantum computingand related quantum algorithms is provided Section 4presents the design of the proposed FPGA emulation modelsforQFT andGroverrsquos search algorithms Experimental results

and analysis are given in Section 5 Finally concludingremarks are made in Section 6

2 Related Work

Modelling of a quantum system on classical computingplatform is a challenging task Hence it is even more diffi-cult to map quantum algorithms for emulation on classicalcomputing environment based on FPGA which is highlyresource-constrained Many attempts have been made in thelast decade in FPGA emulation of quantum algorithms andthese works include [24ndash27] However details of the criticaldesign processes such as mapping of the quantum algorithmsinto the FPGA emulation models and the verification of theimplementations are not revealed in these prior works

For software-based simulation using classical computervarious types of quantum simulators have been proposedAn open source C library 119897119894119887119902119906119886119899119905119906119898 for simulation ofquantumcomputing is presented in [28]where pure quantumcomputer simulation as well as general quantum simulationis supported by the tool In 2007 a variant of binary deci-sion diagram named quantum information decision diagram(QuIDD) for compact state vector storage was introducedin [29] for efficient quantum circuit simulation Garcıa andMarkov [30] proposed a compact data structure based onstabilizer formalism called stabilizer frames

Most of the previous FPGA emulation works are basedon the quantum circuit model which is essentially aninterconnection of quantum gates A different approach wastaken by Goto and Fujishima [24] where a general purposequantum processor was developed instead of applying thequantum circuit model However Fujishimarsquos quantum pro-cessor assumed that the amplitudes of a quantum state canbe either all zeros or with evenly distributed probability Inits emulation of Shorrsquos integer factoring algorithm detailsof the implementation are inadequate for its results to beverified as claimed For instance it is stated in [24] that a64-bit factorization was demonstrated using their emulatorwith only 40Kbits of classicalmemory instead of 320 qubits asrequired with Shorrsquos algorithm in a quantum computer Thisstatement was not supported by design and implementationdetails on how factorization of such a large integer can bedone with only 40Kbits memory where typically it wouldrequire at least 2320 bytes to represent a quantum state of sucha scale on the classical platform

In [25] FPGA emulation of 3-qubit QFT and Groverrsquossearch are proposed In thiswork which is based on the quan-tum circuit model qubit expansion is performed prior to theapplication ofmultiqubit quantum gate transformationsThisleads to an inaccurate modelling of a quantum algorithmsince according to [31] the input quantum state to QFTcircuit should first be placed in superposition of basis stateswhere signal samples are encoded as sequence of amplitudesIn the work by [26] hardware emulation of QFT restrictsits input quantum state to the computational basis stateimplying that superposition is not included in the modellingRivera-Miranda et al in [26] claims 16-qubit QFT emulationis achieved However the emulator can only process up to 32

International Journal of Reconfigurable Computing 3

input signal samples in one evaluation which is equivalentto a 5-qubit QFT emulation if effects of superposition andentanglement are included

From the above discussion it should be noted thenthat the critical quantum properties of superposition andentanglement were not considered in these previous worksresulting in inaccurate modelling of quantum algorithmsWithout the superposition and entanglement effects thepower of quantum parallelism cannot fully be exploited Pre-vious works reported in [25ndash27] applied pipeline architecturein their FPGA emulation implementations so as to obtainhigh throughput and low critical path delay However apipeline design imposes high resource utilization (due to therequirement of additional pipeline registers and associatedlogic) thus limiting FPGA emulation to be deployed in morepractical quantum computing applications that typicallyrequire high qubit sizes In these pipeline implementationsproposed in prior works resource growth was exponential tothe increase in qubit sizes

In this paper the issues outline above is addressed Theefficiencies of different hardware architectural designs forFPGA emulation purposes are evaluated based on the chosencase studies of QFT and Groverrsquos search We propose anaccurate modelling of quantum system for FPGA emulationtargeting efficient resource utilization while maintainingsignificant speed-up over the equivalent simulation approachSince our proposed FPGA emulation framework appliesthe state vector approach simulation models based on the119897119894119887119902119906119886119899119905119906119898 library are selected in this work for benchmark-ing purposes

3 Theoretical Background

In general quantum algorithms obey the basic process flowstructure The computation process begins with a system setin a specific quantum state which is then converted intosuperposition of multiple basis states Unitary transforma-tions are performed on the quantum state according to therequired operations of the algorithm Finally measurementis carried out resulting in the qubits collapsing into classicalbits

31 Quantum Bit (Qubit) In classical computing the small-est unit of information is the bit A bit can be in either state 0or state 1 and the state of a bit can be represented in matrixform as

state 0 =01[

1

0

]

state 1 =01[

0

1

]

(1)

On the other hand in quantum computing the smallestunit of information is the quantum bit or a qubit Todistinguish the classical bit with the quantum qubit Diracket notation is used Using the ket notation the quantumcomputational basis state is represented by |0⟩ and |1⟩ A

qubit can be in state |0⟩ or in state |1⟩ or in superpositionof both basis states The state of a qubit can be represented as

1003816100381610038161003816120595⟩ = 120572 |0⟩ + 120573 |1⟩ equiv

01[

120572

120573

] (2)

where both 120572 and 120573 are complex numbers and |120572|2 + |120573|2 =1 |120572|2 is the probability where the qubit is in state |0⟩ and|120573|2 is the probability where the qubit is in state |1⟩ upon

measurement An 119899-qubit quantum state vector contains2119899 complex numbers which represents the measurementprobability of each basis state However on measurementthe superposition is destroyed and the qubits return to theclassical state of bits depending on the probability derivedfrom the complex-valued state vector

32 TensorKronecker Product Tensor product or Kroneckerproduct is the basic operation that is applied in the formationof a larger quantum system as well as multiqubit quantumtransformations A quantum state vector that can be writtenas the tensor of two vectors is separable whereas a statevector that cannot be expressed as the tensor of two vectorsis entangled [15] The tensor operation on any arbitrary two1-qubit transformations is shown below

[

11988601198861

11988621198863

] otimes [

11988701198871

11988721198873

] =

[

[

[

[

[

[

11988601198870119886011988711198861119887011988611198871

11988601198872119886011988731198861119887211988611198873

11988621198870119886211988711198863119887011988631198871

11988621198872119886211988731198863119887211988631198873

]

]

]

]

]

]

(3)

33 Quantum Circuit Model A quantum algorithm is adescription of a sequence of quantum operations (or trans-formations) applied upon qubits to generate new quantumstates The model most widely used in describing the evolu-tion of a quantum system is the quantum circuit model firstproposed in [32] A quantum circuit is the interconnection ofquantum gates with quantum wires and gate operations arerepresented by unitary matrices

All unitary matrices are invertible and the products ofunitary matrices as well as the inverse of unitary matrix areunitary An119873-by-119873matrix119880 is unitary if119880119880dagger = 119880dagger119880 = 119868

119873

where 119880dagger is the adjoint (conjugate transpose) of 119880 Sinceall quantum transformations are reversible quantum gateoperations can always be undone Fundamental quantumgates include the Hadamard gate phase-shift gate and swapgate and these gates are described as follows

Hadamard gate 119867 is one of the most useful single qubitquantum gates It operates by placing the computational basisstate into superposition of basis states with equal probabilityTheHadamard transform can be represented by the followingunitary matrix

119867 =

1

radic2

[

1 1

1 minus1

] (4)

4 International Journal of Reconfigurable Computing

The following example illustrates the application ofHadamard gates in mapping a 2-qubit basis state |00⟩ to asuperposition of basis states with equal probability

|00⟩

119867otimes119867

997888997888997888997888rarr

1

2

(|00⟩ + |01⟩ + |10⟩ + |11⟩)

(

1

radic2

[

1 1

1 minus1

] otimes

1

radic2

[

1 1

1 minus1

])

[

[

[

[

[

[

1

0

0

0

]

]

]

]

]

]

=

1

2

[

[

[

[

[

[

1

1

1

1

]

]

]

]

]

]

(5)

Controlled phase-shift gate 119862119877119896operates on 2 qubits one

ofwhich is the control qubit and the other is the target qubit Ifthe control qubit is true a phase-shift operation is performedon the target qubit otherwise there is no operation Theoperation is represented by the following matrix

119862119877119896=

[

[

[

[

[

[

[

1 0 0 0

0 1 0 0

0 0 1 0

0 0 0 11989021205871198942

119896

]

]

]

]

]

]

]

(6)

Quantum SWAP gate is used for swapping two qubitsIt switches the amplitudes of a quantum state vector Theoperation of a 2-qubit SWAP gate is represented by matrixin

SWAP =[

[

[

[

[

[

1 0 0 0

0 0 1 0

0 1 0 0

0 0 0 1

]

]

]

]

]

]

(7)

34 Quantum Fourier Transform (QFT) The Fourier trans-form is deployed in wide range of engineering and physicsapplications such as signal processing image processingand quantum mechanics It is a reversible transformationthat converts signals from timespatial domain to frequencydomain and vice versa The Fourier transform is defined in(8) for continuous signals and in (9) for discrete signals

119883(119891) = int

infin

minusinfin

119909 (119905) 119890minus1198942120587119891119905

119889119905 (8)

119883119896=

119873minus1

sum

119899=0

119909119899119890minus1198942120587(119896119899119873)

(9)

The quantum Fourier transform (QFT) is a transfor-mation on qubits and is the quantum equivalent of thediscrete Fourier transform It should be noted that a quantumcomputer performs QFT with exponentially less number ofoperations than the classical Fourier transform HoweverQFT does not reduce the execution time of the algorithmwhen classical data is used This is due to the characteristicof the quantum computer that does not allow parallel read-out of all quantum state amplitudes In addition there is no

known method that can effectively instantiate the desiredinput state amplitudes to be Fourier-transformed [33]

In order to harness the power of quantum computingon Fourier transform QFT has to be deployed within otherpractical applications QFT is pivotal in quantum computingsince it is part ofmany quantumalgorithmsThese algorithmsinclude integer factorization and discrete logarithms algo-rithms [1] Simonrsquos periodicity algorithm [9] and Hallgrenrsquosalgorithms [10] They offer significant speed-up over theirclassical counterparts QFT has also found applications inmany real-world problems such as image watermarking [34]and template matching [35]

To compute Fourier transform in quantum domain dis-crete signal samples are encoded as the amplitude sequencesof a quantum state vector which is in superposition of basisstates [31] An 119899-qubit QFT operation which transformsan arbitrary superposition of computational basis states isexpressed in

1003816100381610038161003816120595⟩ =

1

radic2119899

2119899minus1

sum

119895=0

119891 (119895Δ119905)1003816100381610038161003816119895⟩

QFT997888997888997888rarr

1

radic2119899

2119899minus1

sum

119896=0

2119899minus1

sum

119895=0

119891 (119895Δ119905) 1198902120587119894(1198951198962

119899)|119896⟩

(10)

As the requirement for a valid quantum state |120595⟩must benormalized such that it fulfils (11) If the original signal inputsdo not comply with this requirement the amplitudes of thesignal samples have to be divided by the normalization factorradicsum2119899minus1

119897=0|119891(119897Δ119905)|

2 In most cases the input states formed bythe normalized signal samples are entangled

2119899minus1

sum

119895=0

1003816100381610038161003816119891 (119895Δ119905)

1003816100381610038161003816

2= 1 (11)

From (10) it can be observed that the term 1198952119899 in QFTequation is a rational number in the range of 0 le 1198952119899 lt 1 Asqubit representation is typically used in computations the 119895in base-10 integer is redefined in base-2 notation as individualbit such that the binary fraction form as expressed in (12) canbe conveniently adopted

(119895)10equiv (11989511198952sdot sdot sdot 119895119899)2

= (2119899minus11198951+ 2119899minus21198952+ sdot sdot sdot + 2

0119895119899)10

= 2119899(2minus11198951+ 2minus21198952+ sdot sdot sdot + 2

minus119899119895119899)10

= 2119899(0 sdot 11989511198952sdot sdot sdot 119895119899)2

(12)

International Journal of Reconfigurable Computing 5

|k1⟩

|k2⟩

|knminus1⟩

|kn⟩H

H

H

H

R2

R2 Rnmiddot middot middot

middot middot middot Rnminus2 Rnminus1

1

radic2(|0⟩ + e2120587i0middotj1j2 middotmiddotmiddotj119899|1⟩)

1

radic2(|0⟩ + e2120587i0middotj2 middotmiddotmiddotj119899|1⟩)

1

radic2(|0⟩ + e2120587i0middotj119899minus1j119899|1⟩)

|j1⟩

|j2⟩

|jnminus1⟩

|jn⟩

1

radic2(|0⟩ + e2120587i0middotj119899|1⟩)

Figure 1 Quantum circuit model for n-qubit QFT

With some algebraic manipulations the QFT equationcan be derived from (13) to form (14) [33]

QFT2119899

1003816100381610038161003816119895⟩ =

1

radic2119899

2119899minus1

sum

119896=0

1198902120587119894(1198951198962

119899)|119896⟩ (13)

=

1

radic2119899(|0⟩ + 119890

21205871198940sdot119895119899|1⟩)

sdot (|0⟩ + 11989021205871198940sdot119895119899minus1119895119899

|1⟩) sdot sdot sdot (|0⟩ + 11989021205871198940sdot11989511198952 sdotsdotsdot119895119899

|1⟩)

(14)

Since the term 11989021205871198940sdot1198951 produces either minus1 if 1198951= 1 or +1

otherwise Hadamard computation on the first qubit resultsin (1radic2)(|0⟩ + 11989021205871198940sdot1198951 |1⟩) Computations of the consecutivebits in the binary fraction are obtained using controlledphase-shift gates according to (14) QFT circuit consists ofthree types of elementary gates which are Hadamard gate119867controlled phase-shift gate 119862119877

119896 and SWAP gate The circuit

model of an 119899-qubit QFT is depicted in Figure 1The size of a QFT circuit grows exponentially as the

number of input qubits increases An 119899-qubit QFT involvessum119899

119896=1119896+1 unitary transformations and could process up to 2119899

input samples in one evaluation (provided the input samplesare encoded as the amplitude sequences of a superposition ofcomputational basis states)

35 Groverrsquos Search Algorithm In computer science area atypical search problem is to identify the desired element froman unordered array For many computing applications it iscritical that the search technique is efficient In terms of afunction the search problem 119891 0 1119899 rarr 0 1 is givenwith assurance that there exists one binary string 119909

0where

119891(119909) = 1 if 119909 = 1199090 else 119891(119909) = 0

In classical computing 1198982 queries on average arerequired to search for a particular element in an unorderedarray with 119898 elements In quantum computing Groverrsquossearch algorithm can complete the job in (Π4)radic119898 queries(for the rest of the text and figures in Section 35 the requiredGrover iterations are abbreviated as radic2119899 times) Althoughthe speed-up achieved is only quadratic Groverrsquos algorithmand its extensions are extremely useful in enhancing currentmethods in solving database searching and optimizationproblems which include 3-satisfiability [36] global opti-mization [37] minimum point searching [38] and patternmatching [39] The core operations of Groverrsquos algorithm arephase inversion and inversion about mean Phase inversion

|x⟩

|1⟩

|1205930⟩ |1205931⟩ |1205932⟩

Uf

nn

H

Figure 2 Quantum circuit for phase inversion [15]

inverts the phase of the state-of-interest and its quantumcircuit model is given in Figure 2

In Figure 2 the top 119899-qubit |x⟩ is the target qubit andthe bottom qubit is called the ancilla qubit The functionof 119880119891is to pick out the desired binary string To apply

phase inversion on target qubits Hadamard gate operationis performed on the ancilla qubit which is initialized as |1⟩This is to complement the effect of 119880

119891which takes |x 119910⟩ to

|x 119891(x) oplus 119910⟩In terms of matrices the phase inversion operation can

be expressed as 119880119891(119868119899oplus 119867)|x 1⟩ and the corresponding

quantum states are described as follows10038161003816100381610038161205930⟩ = |x 1⟩

10038161003816100381610038161205931⟩ = |x⟩ [ |0⟩ minus |1⟩

radic2

] = [

|x 0⟩ minus |x 1⟩radic2

]

10038161003816100381610038161205932⟩ = |x⟩ [

1003816100381610038161003816119891 (x) oplus 0⟩ minus 100381610038161003816

1003816119891 (x) oplus 1⟩

radic2

]

= |x⟩ [1003816100381610038161003816119891 (x)⟩ minus 1003816100381610038161003816

1003816119891 (x)⟩

radic2

]

= (minus1)119891(x)|x⟩ [ |0⟩ minus |1⟩

radic2

]

=

minus1 |x⟩ [ |0⟩ minus |1⟩radic2

] if x = x0

+1 |x⟩ [ |0⟩ minus |1⟩radic2

] if x = x0

(15)

Inversion about mean boosts the phase separationbetween the element-of-interest and other elements in theunordered arrays (after phase inversion operation is appliedto invert the phase of the target element) The mean of

6 International Journal of Reconfigurable Computing

(1) Start with |0⟩ as the target input qubits(2) Apply 119899-qubit Hadamard gate on target qubits119867otimes119899

(3) forradic2119899 times do(4) Apply phase inversion operation 119880

119891(119868 otimes 119867)

(5) Apply inversion about mean operation on the target qubits minus119868 + 2119860(6) end for(7) Measure the target qubits

Algorithm 1 Groverrsquos search algorithm

all elements is computed and inversions are made aboutthe mean The overall mean remains unchanged after theinversion process This is because the distance between oneelement and the mean is the same before and after inversionThe only change is if the original sequence is above themean during the inversion it is flipped about the mean to thesame distance below the mean and vice versa In general theinversion about mean operation can be expressed as

V1015840 = minusV + 2119886 (16)

where 119886 is the mean V is the value of an element in the arrayand V1015840 is the new value of that element after inversion

In terms of matrices the mean of a 2119899 elements vector119881 is obtained by the product of matrices 119860 and 119881 where allthe elements in the 2119899-by-2119899 matrix 119860 are set to 12119899 Henceinversion about mean in matrix form becomes

1198811015840= minus119881 + 2119860119881 = (minus119868 + 2119860)119881 (17)

In order to achieve high confidence of getting the desiredelement the amplitude amplification process (amplitudeamplification in Groverrsquos search algorithm involves phaseinversion and inversion about mean operations) has to berepeated for radic2119899 times This is because the probability ofsuccess changes sinusoidally by the number of amplitudeamplification iterations (as illustrated in Figure 3) andthe highest probability of success first happened after therequired iterations Pseudocode for a generic Groverrsquos searchalgorithm is given in Algorithm 1

There are two approaches to model Groverrsquos searchquantum algorithm The first approach which is based onquantum circuitmodel is discussed nextThe secondmethodis modelling using arithmetic functions and this is presentedin Section 42 since this approach is applied in this paper

As shown in Figure 4 Groverrsquos search circuit given in [15]is constructed with assumption that black box modules 119880

119891

and minus119868+2119860 are available Descriptions of the119880119891and minus119868+2119860

modules have been given earlierOn the other hand the circuit model for 119899-qubit Groverrsquos

algorithm presented in [33] is shown in Figure 5(a) In thisfigure119867 is the Hadamard gate and 119866 is the Grover iterationcircuit which is illustrated in Figure 5(b)

The function of Grover iteration circuit is equivalentto the phase inversion and inversion about mean In orderto achieve high probability of successful search 119866 is con-catenated for radic2119899 times In Figure 5(b) the role of oraclemodule is to recognize the solution to a particular search

0 20 40 60 80 100 120 140 160 180 2000

010203040506070809

1

Number of iterations

Mea

sure

men

t pro

babi

lity

Element-of-interestOther elements

Figure 3 Probability of success by the number of amplitude amplifi-cation iterations amongst 210 probabilities For 10-qubit search firsthighest probability of success happens at 25th iteration

|0⟩

|1⟩

minusI + 2A

Phase inversionInversion

Hotimesn

Ufn n n n

H

about mean

Repeat radic2n times

Figure 4 Modelling of Groverrsquos search based on quantum circuitmodel [15]

problem in the phase inversion operation By monitoringthe oracle qubit a solution to the search problem can bedetected through the changes of the oracle qubit The designof oraclemodule varieswith different search applications andan example of the oracle circuit for a simple 3-bit search task isshown in Figure 6 In Figure 6 the119883 symbol represents Pauli-119883matrixThe open circle notation indicates conditioning onthe qubit being set to zero whereas the closed circle indicatesconditioning on the qubit being set to one

Deriving from the circuits in Figure 5 the corresponding3-qubit Groverrsquos search circuit is provided in Figure 7 Thiscircuit model is made up of Hadamard oracle quantumNOT and multiqubit controlled-NOT gates

International Journal of Reconfigurable Computing 7

|0⟩

|1⟩

Target qubits

Oracle qubit

Hotimesn Hotimesnn n

n

G G

H H

middot middot middot

O(radic2n)

(a) 119899-qubit Groverrsquos search

Oracle workspace

Phase

Oraclen-qubits Hotimesn Hotimesn

For x gt 0|x⟩ 997888rarr (minus1)f(x)|x⟩

|0⟩ 997888rarr |0⟩

|x⟩ 997888rarr minus|x⟩

(b) Grover iteration circuit 119866

Figure 5 Quantum circuit model for Groverrsquos search algorithm [33]

X

Xequiv

Figure 6 Oracle circuit for recognizing binary string ldquo010rdquo

4 Proposed FPGA-Based Hardware Emulation

This section presents our approach in modelling quantumFourier transform and Groverrsquos search algorithms for FPGAemulation The proposed techniques can be generalized toFPGA emulation of more complex quantum algorithms thatapply QFT or Groverrsquos algorithm This paper extends ourearlierwork presented in [40 41] In these previousworks thehardware architecture proposed was restricted to the serialdesign with resource sharing facilitated at the register levelIn this paper we enable resource sharing at the operator (orcomputational) level that allows for more efficient emulationof the quantum algorithms Furthermore additional casestudy on Groverrsquos search algorithm is included for general-ization of the proposed framework The choice of hardwarearchitecture varies based on the need of different applicationsBased on the selected case studies the efficiencies of differenthardware architectures for quantum computing emulationpurposes are discussed and analysed in this work The goalis to achieve scalability and also efficient resource utilizationfor emulating practical larger qubit size quantum systems

41 Modelling QFT for FPGA Emulation The derivation ofquantum circuit model for 119899-qubit QFT was discussed earlieris Section 34 Here we present the modelling of QFT forFPGA emulation based on a 3-qubit example According to(14) the mathematical expression for 3-qubit QFT is derivedas shown in

QFT23

1003816100381610038161003816119895⟩ =

1

radic23(|0⟩ + 119890

21205871198940sdot1198953|1⟩)

sdot (|0⟩ + 11989021205871198940sdot11989521198953

|1⟩)

sdot (|0⟩ + 11989021205871198940sdot119895111989521198953

|1⟩)

(18)

Deriving from the general 119899-qubit QFT circuit providedin Figure 1 the corresponding quantum circuit for 3-qubitQFT is obtained as shown in Figure 8

The circuit consists of Hadamard gates 119867 controlledphase-shift gates 119862119877

2and 119862119877

3 and also the SWAP gate

Referring to the functional block diagram given in Figure 9this quantum circuit model corresponds to a sequence ofunitary transformations 119880119894 119894 = 1 to 7 defined by

1198801 = 119867 otimes 119868 otimes 119868 (19)

1198802 =1198621198772otimes 119868 (20)

1198803 = (119868 otimes SWAP) sdot (1198621198773otimes 119868) sdot (119868 otimes SWAP) (21)

1198804 = 119868 otimes 119867 otimes 119868 (22)

1198805 = 119868 otimes1198621198772 (23)

1198806 = 119868 otimes 119868 otimes 119867 (24)

1198807 = SWAP23 (25)

Note that in Figure 9 the modelling of 119899-qubit quan-tum system with superposition and entanglement propertiesresulted in a circuit with 2119899 signals Since the input samplesto QFT circuit are encoded as sequence of amplitudes inan entangled superposition of basis states (discussed in Sec-tion 34) modelling based on individual qubit with separatequantum gate operations is unable to reflect the effects ofapplying a quantum gate on entangled qubits correctly

In order to model the effect of superposition and entan-glement derivation of each unitary transformation is madethrough the tensor product of individual quantum gate andidentity matrix to form unitary matrix of equal dimensionwith the quantum state vector Detailed derivations of thequantum unitary matrices for the 3-qubit QFT have beenpresented in our previous paper [41]

Since these quantum unitary matrices are mostly sparsematrices we extract minimal number of useful arithmeticoperations (due to nonzero elements in the matrices) result-ing in an optimal realization of themodel that can bemappedto an efficient FPGA emulation architecture Incidentally asoftware program has been developed to automate this map-ping hence easily scaling up the circuit model to larger qubitsizes From these arithmetic functions the correspondingdata-flow graph for the 3-qubit QFT is derived as shown inFigure 10

In Figure 10 each IN and OUT signal is a fixed pointcomplex number register for an element of the quantumstate vector The operation of module 119865 corresponds toout 119903 = minusin 119894 and out 119894 = in 119903 where 119865 is the multiplication

8 International Journal of Reconfigurable Computing

|0⟩

|0⟩

|0⟩

|1⟩

Target qubits

Grover iteration circuit GGrover iteration circuit G

Oracle qubit

Oracle Oracle

H H

H H

H

H

H

H

H

H

H

H

H

H

H

H

H

H

H HH HX

X

X

X

X

X

X

X

X

X

X

X

H

H

Figure 7 Modelling of 3-qubit Groverrsquos search based on quantum gates

H

H

H

R2

R2

R3

U1 U2 U3 U4 U5 U6 U7

|k1⟩

|k2⟩

|j1⟩

|j2⟩

|k3⟩|j3⟩

1

radic2(|0⟩ + e2120587i0middotj1j2j3|1⟩)

1

radic2(|0⟩ + e2120587i0middotj2j3|1⟩)

1

radic2(|0⟩ + e2120587i0middotj3|1⟩)

Figure 8 Quantum circuit for 3-qubit QFT

IN0

IN7U1 U2 U3 U4 U5 U6 U7

8 8

OUT0

OUT7

Figure 9 The 3-qubit QFT in terms of a sequence of unitarytransformations

of the input complex number with imaginary number 119894This is applied in unitary transformations 1198802 and 1198805 Theoperation ofmoduleCMULT in Figure 10 is described in (26)The function of CMULT is the multiplication of the inputcomplex number with a constant complex number which isderived based on the controlled phase-shift gate As expressedpreviously in (21) it is used in unitary transformation 1198803

out 119903 = (in 119903 times constant 119903) minus (in 119894 times constant 119894)

out 119894 = (in 119894 times constant 119903) + (in 119903 times constant 119894) (26)

42 Modelling Groverrsquos Search for FPGA Emulation As men-tioned earlier the second approach of modelling Groverrsquosquantum search is modelling using arithmetic functions(based on mathematical model) In this work we havechosen this approach of modelling Groverrsquos algorithm forFPGA emulation In contrast to the quantum circuit modelapproach that involves complex large dimensional matrixoperations (ie matrix multiplication and tensor product)the chosen method can utilize the computational resourcesavailable on FPGA such as comparators adderssubtractersand multiplexers for efficient emulation of Groverrsquos searchalgorithm

This technique is based on phase inversion and inver-sion about mean as described in Section 35 As shown inFigure 11 mathematically Groverrsquos search for a databasewith 2119899 elements mainly involves the processes of invertingthe phase of target element function 119865 and performing

inversion about mean on all elements function minus119868 + 2119860 forradic2119899 times Initialization of the quantum state vector with

equal probability function INIT is carried out once in thebeginning of the process

For the case study of a 3-bit search problem we derivethe data-flow graphs and obtain the result of the requiredarithmetic functions as shown in Figure 12 For experimentalpurposes the oracle module is developed by comparingall elements in database with targeted element using com-parators COMP modules The arithmetic functions of theinversion about mean are derived through straightforwardcomputations that involve summation bit shift and subtrac-tion

43 Architecture of Proposed FPGA Emulation Model It isclear that if implemented on classical computing platformsthe resource utilization for a quantum system would growexponentially Hence the choice of suitable architectureis critical for FPGA-based quantum computing emulationHere we discuss the efficiencies of different architecturalchoices in datapath concurrent pipeline serial and serial-parallel The block diagram of various architectures is con-structed based on the example of 3-qubit QFT

431 Concurrent Processing In concurrent processing of analgorithm all computations are completed within a clockcycle In the case of 3-qubitQFT computation blocks betweenthe input and output registers through the functional blocks1198801 to 1198807 are performed in one clock cycle (refer to Fig-ure 13(a)) However such an architecture consumes enor-mous resources such that the number of registers required toemulate an n-qubit QFT is 2119899+1 In addition the critical pathdelay is very high which results in unrealistic low operatingfrequency

432 Pipeline Architecture Most of the prior works onFPGA-based quantum circuit emulation [25ndash27] are devel-oped based on pipeline architectureThepipeline architecturehas the advantages of high throughput and much shortercritical path delay Figure 13(b) shows the proposed pipelinearchitecture of the 3-qubit QFT However the main issue ofthis approach is that resource utilization grows drasticallyby the number of qubits due to the circuit augmentation ofpipeline registers 2119899(sum119899

119896=1119896+2)pipeline registers are required

to emulate n-qubit QFT Consequently hardware emulationscalability is highly constrained by the available resources inFPGA

International Journal of Reconfigurable Computing 9

IN0

IN0 IN4 IN1 IN5 IN2 IN6 IN3 IN7 IN0 IN4 IN1 IN5 IN2 IN6 IN3 IN7

IN1 IN2 IN3 IN4 IN5 IN6 IN7

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

U7

U6

U5

U4

U3

U2F

F F

F

U1+ + + + minus minus minus minus

+

++ + +minus

minus + minus + minus + minus

minus minus minus

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5

out2_5

out2_6 out2_7

out2_7

CMULTCMULT

out3_0

out3_0 out3_0

out3_1

out3_1 out3_1

out3_2

out3_2 out3_2

out3_3

out3_3 out3_3

out3_4

out3_4 out3_4

out3_5

out3_5 out3_5

out3_6

out3_6 out3_6

out3_7

out3_7 out3_7

out4_0 out4_1 out4_2 out4_3 out4_4 out4_5 out4_6 out4_7

out5_0

out5_0

out5_1

out5_1 out5_0 out5_1

out5_2

out5_2

out5_3

out5_3 out5_2 out5_3

out5_4

out5_4

out5_5

out5_5 out5_4 out5_5

out5_6

out5_6 out5_6

out5_7

out5_7out5_7

out6_0 out6_1

out6_1

out6_2 out6_3

out6_3

out6_4

out6_4

out6_5 out6_6

out6_6

out6_7

1

radic2(1 + i)

1

radic2(1 + i)

times times times times times times times times

times times times times times times times times

times times times times times times times times

Figure 10 Data-flow graph for 3-qubit QFT The multipliers shown in this diagram represent the multiplication of input complex numberwith constant 1radic2

433 Serial Processing Although serial design requires mul-tiple iterations to perform a complex computation it opensup the opportunity for resource sharing Serial processingis suitable for applications where resource utilization is acritical design constraint Figure 13(c) depicts the serial formof the 3-qubit QFT circuit that consists of a control unitand a datapath unit As resources can be reused betweentransformations a serial-based n-qubit QFT consumes 2119899registers a register utilization that is much lower than inconcurrent or pipeline architectures However pure serial

approach forfeits the purpose of conducting FPGA emulationwhose aim is to exploit the parallelism inherent in a quantumsystem as it would still suffer from slow sequential behaviouras exhibited in simulation on classical computer

434 Proposed Serial-Parallel Architecture In this paperwe propose a hybrid serial-parallel architecture for FPGAemulation of quantum algorithms The proposed approachtakes advantage of both serial and parallel design techniques

10 International Journal of Reconfigurable Computing

Repeat radic2n times

2n 2n 2n

Phase inversion

INIT

Inversionabout mean

minusI + 2AF

Figure 11 Mathematical modelling of Groverrsquos search algorithm

Applying the concepts of quantum parallelism and quantumdynamicsmodelled by sequential transformations on a quan-tum state vector it is found that the proposed serial-parallelarchitecture is suitable for efficient and accurate quantumcomputing emulation on FPGA platform

Figure 14 shows the functional block diagram of theproposed serial-parallel FPGA emulation architecture of the3-qubit QFT The serial-parallel design of the datapath unitinvolves a number of quantum computation units that canperform parallel computations for each stage of unitarytransformation whereby the same computational resourcescan be reused for the following stage of transformations

For data storage and synchronization purposes 2119899 reg-isters are shared between unitary transformations As com-pared to the pipeline design our proposed serial-parallelapproach achieves linear reduction on the usage of registersto emulate the same quantum system The arithmetic logicunit (ALU) in the datapath unit contains multiple customprocessing elements and the allocation of resources in ALUvaries based on the target application The number of pro-cessing elements is basically determined based on the desired2119899 parallelism in the n-qubit quantum system

As illustrated in Figures 10 and 12 the data-flow graph ofQFT and Groverrsquos search algorithm exhibit similar repetitivepattern between unitary transformations This implies thatthe proposed serial-parallel approach can be generalized forthe two case studies to achieve balance in both resourceutilization and speed performance For the case of Groverrsquossearch the ALUwould be theGrover iterationmodule shownin Figure 12 whereas the control unit is designed to keep trackon the number of Grover iterations required by the targetsearch problem

5 Experimental Results

The proposed emulation designs are modelled in SystemVer-ilog HDL synthesized using Altera Quartus II softwareand implemented into target emulation platform which isbased on Altera Stratix IV EP4SGX530KF43C4 FPGA InSection 51 we discuss the verification of the hardwareemulation designs for the QFT and Groverrsquos search casestudies In addition the automated process for scaling upthe design to larger qubit size is described In Section 52investigation is conducted to study the effects of the numberof mantissa bits used in our fixed point representation formaton resource utilization and precision error In the sectionthat follows we analyse for different emulation architectureshow the increase in qubit size impacts on resource growthand maximum operating frequency allowed in the designs

Finally the runtime speeds in simulation and emulation forquantum algorithms with different qubit sizes are compared

51 Design Verification To show that the quantum algo-rithms have been modelled accurately design verificationof the proposed emulation hardware is performed Goldenreferences based on the software simulationmodels (in C) aredeveloped and their outputs are comparedwith the emulationhardware under test (which are described in SystemVerilogHDL)

In our QFT case study FFTW3 [42] a widely applied fastFourier transform library in C is used to perform Fouriertransform computations on the signal samples that are usedin this work The outputs of the classical Fourier transformthen serve as the golden referencemodel in verification of theproposed emulation model Furthermore since the discreteFourier transform (DFT) is a linear transformation that canbe defined in unitary matrix form the functional correctnessof our QFT hardware emulation model can be convenientlyverified against the DFTmatrixThe expression of an 119899-qubitDFT matrix is shown in

DFT = 1

radic2119899

[

[

[

[

[

[

[

[

[

[

[

1 1 1 1

1 1205961

1205962

1205962119899minus1

1 1205962

1205964

1205962(2119899minus1)

1 1205962119899minus11205962(2119899minus1) 120596

(2119899minus1)(2

119899minus1)

]

]

]

]

]

]

]

]

]

]

]

(27)

where 120596 is the 2119899th root of unity that is 120596 = 11989021205871198942119899

Thechoice of 11989021205871198942

119899

or 119890minus21205871198942119899

is purely a matter of convention asboth the term 11989021205871198942

119899

and the term 119890minus21205871198942119899

to the power of 2119899are equal to 1

On the other hand the design of Groverrsquos search FPGAemulation model is verified against the mathematical modelprovided in literature The simulation model of Groverrsquossearch algorithm also serves as the golden reference modelfor verification purposes

For the development of FPGA emulation models forpractical quantum computing applications it is importantthat the emulation hardware can be scaled up to larger qubitsize architectures In this work the designs are scaled up withthe aid of software program developed in-house HDL codesof the two case studies are autogenerated by the softwareprogram based on the proposed modelling techniques (asdiscussed in Section 4) The generated HDL code producesefficient hardware emulation model based on the proposedserial-parallel architecture

52 Fixed Point Representation As defined in (2) a quantumstate vector is represented by complex floating point numbersTo ensure effective resource utilization in our FPGA emula-tion hardware floating point numbers are replaced by fixedpoint representations In this work a fixed point format with1 sign bit 1 integer bit and 119873 mantissa bits (as shown inFigure 15) is used Since the amplitudes of a quantum statethat is the probabilities of collapsing into computational basis

International Journal of Reconfigurable Computing 11

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

minusIN0

IN0

IN0 minusIN1IN1 minusIN2IN2 minusIN3IN3 minusIN4IN4 minusIN5IN5 minusIN6IN6 minusIN7IN7

Target

Target Target Target Target Target Target Target Target

Target Target Target Target Target Target Target

COMP

COMPCOMPCOMPCOMPCOMPCOMPCOMPCOMP

COMPCOMPCOMPCOMPCOMPCOMPCOMP

IN1 IN2 IN3 IN4 IN5 IN6 IN7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

+ + + +

++

+

+ +

+

+ + +

+

out1_0 minusout1_0 out1_1 minusout1_1 out1_2 minusout1_2 out1_3 minusout1_3 out1_4 minusout1_4 out1_5 minusout1_5 out1_6 minusout1_6 out1_7 minusout1_7

minus

minus minus minus minus minus minus minus minus

minus minus minus minus minus minus minus

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

Groveriteration 2

Grover

INIT

iteration 1

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

≫(n minus 1)

≫(n minus 1)

Figure 12 Data-flow graph of Groverrsquos circuit for 3-bit search

U1 U2 U3 U4 U5 U6 U78 8

IN0

IN7

OUT0

OUT7

(a) Concurrent processing

U4U1 U2 U3 U5 U6 U788

IN0

IN7

OUT0

OUT7

(b) Pipeline architecture

88

Control unit

Datapath unit

Shared

Arithmeticlogic unit

(ALU)

registersIN0

IN7

OUT0

OUT7

(c) Serial architecture

Figure 13 Three-qubit QFT implemented using different hardware architectures

12 International Journal of Reconfigurable Computing

CU

ALU

DU

alu3alu2alu1alu0

alu4 alu5 alu6 alu7

CMULTCMULT

reg0reg4

IN0

Shared registers

IN1alu0

alu1alu4

IN2alu2alu4alu1IN3

IN4

alu3alu5

alu4alu2

reg3reg6

reg3

reg1

reg6

reg7

IN7

alu5

IN6

IN5

alu7

alu3

alu6

alu3

alu6

reg4 reg4reg2 reg2 reg2reg1 reg1 reg3 reg3reg5

reg5

reg5 reg5reg6 reg6 reg7

reg7

Stateregisters

Next-stateStart

Control signals

logic

Outputlogic

+

F

F

F

+ + +

minus minus minus minus

1

radic2(1 + i)

1

radic2(1 + i)

alu_comp0 alu_comp1

reg7

reg6

reg5

reg3

reg1

reg2

reg4

reg0

alu_comp

alu_comp0

1

|k1⟩|k2⟩

|j1⟩|j2⟩

|k3⟩|j3⟩

times times times times

times times times times

Figure 14 Proposed serial-parallel architecture for 3-qubit QFTThe multipliers shown in this diagram represent the multiplication of inputcomplex number with constant 1radic2

1 sign bit 1 decimal bit N mantissa bits

Figure 15 Fixed point representation format

states after measurement are constrained in the range of 0 to1 only one bit is required to represent the integer part

Due to the limitations of the classical digital computingplatform representation of qubit amplitudes with infiniteprecision is infeasible In the context of quantum computermodelling particularly for quantum systems with large qubitsizes minimising precision loss is critical to preserve theconsistency of the quantum state during the modellingprocess [43] Here we investigate how precision error is

affected by the number of mantissa bits used in our fixedpoint representation format The corresponding experimen-tal results are given in Figure 16

Precision error shown in Figure 16 is computed based onthe following equations

error 119903 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119903

119898minus simulate 119903

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119903

119898

1003816100381610038161003816

error 119894 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119894

119898minus simulate 119894

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119894

119898

1003816100381610038161003816

precision error = 1

2119899+1(error 119903 + error 119894)

(28)

International Journal of Reconfigurable Computing 13

14 15 16 17 18 19 20 21 22 23 240

005

01

015

02

025

03

035

04

045

05

Number of mantissa bits

Prec

ision

erro

r (

)

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

Figure 16 Precision error against different fixed point formats used

where simulate 119903 and simulate 119894 are the floating point realand imaginary amplitudes of the reference output state gen-erated from simulation whereas emulate 119903 and emulate 119894 arethe amplitudes extracted from the output state of proposedFPGA emulator (converted from original fixed point formatto floating point for verification purposes)

Figure 16 shows that 16-bit fixed point format (14mantissabits) incurs significant precision error for 5-qubit emulationsin both case studies However the error produced by 2-qubitemulations is insignificant This behaviour is because theamplification of the fixed point truncation errors grows withthe size of quantum system For FPGA emulation purposesprecision error due to fixed point representation can bereduced by increasing the number ofmantissa bits with trade-off on resource utilization By expanding the number ofmantissa bits up to 24-bit (which results in 26 total numberof bits) negligible precision error for 5-qubit emulations isattained It is important to note that the proposed FPGAemulator is parameterizable in terms of the number ofmantissa bits for fixed point representation This is crucial toensure different fixed point formats can be applied to emulatequantum circuit of various sizes based on the demandedprecision error tolerance and resource constraint

The size of our fixed point number format also affectsresource utilization The corresponding experimental resultsforQFT case study are shown in Figure 17 Since the resourcesavailable on FPGA device are mostly in blocks or multiples of8 bits the choice of 26-bit fixed point format is not suitableHence we apply 22-mantissa-bit size (ie total number of bitsis 24) in our fixed point representation formats Note thatfor Groverrsquos search emulations the experiment on resourceutilization of DSP blocks is not relevant because FPGAemulation model developed here (as depicted in Figure 12)does not involve multiplication

53 Efficiency of Proposed FPGA Emulation Architecture Inthis subsection we investigate how the increase in qubit size

impacts resource growth and maximum allowable operatingfrequency in different emulation architectures (ie concur-rent pipeline and serial-parallel) In the case of QFT emu-lation model we have two versions of serial-parallel archi-tectures Type 1 serial-parallel uses DSP blocks to performmultiplications whereas type 2 replaces the multiplications(required in Hadamard gate operations 1198801 1198804 and 1198806 inthe 3-qubit QFT case) with shift-add operations Althoughconventional hardware design methods encourage the usageof shift-add operation instead of multiplication to reduceresource utilization the case is now different with FPGAdevices containing efficient built-in DSP blocks Figures 18and 19 show the results of the experiments conducted onQFTand Groverrsquos search algorithms respectively

Based on the experimental results shown in Figure 18when comparing to type 1 type 2 method consumes lessDSP blocks but more logic elements due to the constructionof adders needed in shift-add operations As the number ofqubits increases resource utilization of logic elements fortype 1 emulation grows rapidly when available DSP blocks areused up and it ends up with similar resource utilization asthe type 2 approach Hence we can conclude that for large-scale FPGA emulations both methods would lead to similarresource utilization Thus the first approach is preferred dueto the ease of design process where DSP blocks are utilized bydefault when implementing the design with the FPGA designautomation tool

In contrast to the concurrent and pipeline designs theexperimental results for QFT and Groverrsquos search show thatthe proposed serial-parallel architecture achieves balance onboth resource utilization and operating frequency The pro-posed architecture has significantly reduced resource growthin the application of logic elements dedicated logic regis-ters and DSP block yet maintaining reasonable operatingfrequency With the concurrent and pipeline architectures5-qubit QFT emulation completely used up the resourcesavailable in the Altera Stratix IV FPGA device used in thiswork However the same device can support up to 7-qubitQFT emulationwith the proposed serial-parallel architectureIt is important to note that the processing power of 7-qubitQFT is far higher than the 5-qubit implementation as an 119899-qubit QFT can process up to 2119899 samples in one evaluation

For scalability software simulation would depend onresources available on the computer servers The scalabilityof the FPGA emulation framework would depend on theresources available on target FPGA devices As there havebeen rapid advances in FPGA technology in recent years bydesigning an efficient architecture that is implemented in ahigh-density FPGA device (such as the Altera Stratix 10 thatcontains up to 55 million logic elements) one can actuallyemulate large qubit size quantum circuit on a single FPGAchip Furthermore new approach to FPGA emulation maybe made through the exploration of efficient data structuresandmodellingmethodsThus the proposedwork contributesto the formulation of a proof-of-concept baseline FPGAemulation framework with optimization on datapath designsthat can be extended to emulate practical large-scale quantumcircuits

14 International Journal of Reconfigurable Computing

14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Number of mantissa bits

Com

bina

tiona

l ALU

Ts

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

times104

(a) Logic element usage

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

14 15 16 17 18 19 20 21 22 23 240

500

1000

1500

2000

2500

3000

3500

Number of mantissa bits

Ded

icat

ed lo

gic r

egist

ers

(b) Register usage

2-qubit QFT5-qubit QFT

14 15 16 17 18 19 20 21 22 23 240

200

400

600

800

1000

1200

Number of mantissa bits

DSP

blo

ck 1

8-bi

t ele

men

t

(c) DSP block usage

Figure 17 Resource utilization for different fixed point formats used

54 Benchmarking between Simulation and Emulation Hereour emulation models are benchmarked against the equiv-alent software simulations The simulation models used arebased on an open source quantum library 119897119894119887119902119906119886119899119905119906119898 [28]Software simulation is performed on an Intel Core i7-4790eight-core processor with 36GHz clock rate running on aLinux-based Ubuntu 1404 kernel whereas hardware emu-lation is based on the Altera Stratix IV EP4SGX530KF43C4FPGA Table 1 shows the runtime comparison betweensimulation and our emulation

Figure 20 illustrates the runtime speed-up (simula-tionemulation) of Groverrsquos search case study It is importantto note that the proposed hardware emulation is implementedbased on 24-bit fixed point format (which is determined

Table 1 Comparison of runtime between simulation and emulation

QFT runtime (s) Groverrsquos search runtime (s)Simulation Emulation Simulation Emulation

2-qubit 155 times 10minus6 358 times 10minus9 296 times 10minus6 46 times 10minus9

3-qubit 466 times 10minus6 805 times 10minus9 740 times 10minus6 120 times 10minus9

4-qubit 510 times 10minus6 1344 times 10minus9 2207 times 10minus6 214 times 10minus9

5-qubit 565 times 10minus6 2193 times 10minus9 3988 times 10minus6 367 times 10minus9

6-qubit mdash mdash 10696 times 10minus6 627 times 10minus9

7-qubit mdash mdash 27714 times 10minus6 968 times 10minus9

based on the experimental work presented in Section 52)whereas 32-bit single precision float is used in the software

International Journal of Reconfigurable Computing 15

2 25 3 35 4 45 50

2

4

6

8

10

12

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

times104

(a) Logic element usage

2 25 3 35 4 45 50

05

1

15

2

25

3

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

times104

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(b) Register usage

2 25 3 35 4 45 50

200

400

600

800

1000

1200

Number of qubits

DSP

blo

ck 1

8-bi

t ele

men

t

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(c) DSP block usage

2 25 3 35 4 45 50

50

100

150

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(d) Maximum operating frequency

Figure 18 Resource utilization and maximum operating frequency for different emulation architectures of QFT (based on 24-bit fixed pointformat)

simulation to describe the quantum state For a larger qubitsize quantum circuit the number of mantissa bits for fixedpoint representation has to be increased accordingly to ensuresimilar precision as in software simulation can be achievedwith trade-offs on resource utilization and execution speed[44]

From Figure 20 it can be observed that the proposedhardware emulation provides significant speed-up over soft-ware simulation using 119897119894119887119902119906119886119899119905119906119898 It is important to notethat the achieved speed-up increases drastically as the num-ber of qubits increasesThis result further supports the notionthat hardware emulation has significant potential in themodelling of a large-scale quantum system on the classicalcomputing platform based on FPGA

As the number of required IO pins for emulating QFTand Groverrsquos search algorithms with parallel read-outs istoo much to fit in the existing FPGA devices board-levelverification is infeasible Although the usage of multiplexersreduces the number of output pins resource consumptionrises significantly with the increase in the number of qubitsand this affects the analysis of the overall experiment Thusthe estimated runtime of the proposed FPGA emulationarchitecture is obtained based on the hardware clock cycleand the operating frequency that is acquired from the FPGAdevelopment toolThis is not a critical issue in future practicaldeployment as the selected case studies are meant to work ascore modules within other quantum applications that mightnot require parallel read-out

16 International Journal of Reconfigurable Computing

2 25 3 35 4 45 5 55 6 65 70

1

2

3

4

5

6

7

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipelineSerial-parallel

times104

(a) Logic element usage

2 25 3 35 4 45 5 55 6 65 70

05

1

15

2

25

3

35

4

5

45

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

ConcurrentPipelineSerial-parallel

times104

(b) Register usage

2 25 3 35 4 45 5 55 6 65 70

50

100

150

200

250

300

350

400

450

500

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipelineSerial-parallel

(c) Maximum operating frequency

Figure 19 Resource utilization and maximum operating frequency for different emulation architectures of Groverrsquos search (based on 24-bitfixed point format)

6 Conclusion and Future Work

Efficient resource utilization is critical for FPGA-basedimplementations especially for emulating quantum comput-ing applications as they typically exhibit exponential resourcerequirement with increasing number of qubits In this paperwe proposed a baseline FPGA emulation framework withfocus on the datapath design optimization based on theconventional state vector model as well as an effectivemethodology that facilitates accurate modelling of quantumalgorithms for FPGAemulationA serial-parallel architecture

with efficient resource utilization for FPGA-based emulationof quantum computing is presentedThe proposed emulationarchitecture achieves linear reduction in resource utilizationcompared to pipeline implementations as found in previousworks This work has also demonstrated the advantage ofFPGA emulation over software simulation where hardwareemulation of 7-qubit Groverrsquos search is about 3 times 104 timesfaster than the software simulation performed on Intel Corei7-4790 eight-core processor running at 36GHz clock rate

However experimental results obtained in this workshow that it is difficult to realize a scalable and flexible

International Journal of Reconfigurable Computing 17

2 25 3 35 4 45 5 55 6 65 705

1

15

2

25

3

Number of qubits

Runt

ime s

peed

-up

(sim

ulat

ion

emul

atio

n)

times104

Figure 20 Runtime speed-up against number of qubits in Groverrsquossearch emulation

emulation platform for large qubit size real-world quantumsystem using the approach that applies existing state vectorquantum models This concurs with [45] that states that thepractical limit on the size of the quantum system that canbe modelled on classical computing platform can hardly beovercome due to exponentially large memory requirementsfor storing the entire state vector Hence this suggests thata model with a more effective data structure to representquantum systems is required Recently the work on stabi-lizer frames [30] has shown promise in providing a moresuitable data structure for quantum models targeted forFPGA emulation This is the subject of future work in ourresearch in applying FPGA emulation in modelling of large-scale quantum systems In addition the error-correctionstructure available with stabilizer frames will be consideredfor application in practical quantum computations With amore efficient modelling technique FPGA can represent amore efficient emulation strategy of quantum systems

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the Ministry of Higher Education(MOHE) and Universiti Teknologi Malaysia (UTM) underFundamental Research Grant Scheme (FRGS) Vote number4F422

References

[1] P W Shor ldquoAlgorithms for quantum computation discretelogarithms and factoringrdquo in Proceedings of the 35th IEEEAnnual Symposium on Foundations of Computer Science(SFCS rsquo94) pp 124ndash134 Santa Fe NM USA November 1994

[2] L K Grover ldquoQuantum mechanics helps in searching for aneedle in a haystackrdquo Physical Review Letters vol 79 no 2 pp325ndash328 1997

[3] A Malossini and T Calarco ldquoQuantum genetic optimizationrdquoIEEE Transactions on Evolutionary Computation vol 12 no 2pp 231ndash241 2008

[4] R L Rivest A Shamir and L Adleman ldquoA method forobtaining digital signatures and public-key cryptosystemsrdquoCommunications of the ACM vol 21 no 2 pp 120ndash126 1978

[5] L K Grover ldquoA fast quantum mechanical algorithm fordatabase searchrdquo in Proceedings of the 28th Annual ACMSymposium onTheory of Computing pp 212ndash219 ACM 1996

[6] N Shenvi J Kempe andK BWhaley ldquoQuantum random-walksearch algorithmrdquo Physical Review A Atomic Molecular andOptical Physics vol 67 no 5 Article ID 052307 2003

[7] E Farhi J Goldstone and S Gutmann ldquoA quantum algorithmfor the Hamiltonian NAND treerdquo Theory of Computing vol 4pp 169ndash190 2008

[8] P W Shor ldquoWhy havenrsquot more quantum algorithms beenfoundrdquo Journal of the ACM vol 50 no 1 pp 87ndash90 2003

[9] D R Simon ldquoOn the power of quantum computationrdquo SIAMJournal on Computing vol 26 no 5 pp 1474ndash1483 1997

[10] S Hallgren ldquoPolynomial-time quantum algorithms for Pellrsquosequation and the principal ideal problemrdquo Journal of the ACMvol 54 article 4 2007

[11] M Mosca and A Ekert ldquoThe hidden subgroup problem andeigenvalue estimation on a quantum computerrdquo in Quan-tum Computing and Quantum Communications pp 174ndash188Springer Berlin Germany 1999

[12] D Bacon A M Childs and W Van Dam ldquoFrom optimalmeasurement to efficient quantum algorithms for the hiddensubgroup problem over semidirect product groupsrdquo in Pro-ceedings of the 46th Annual IEEE Symposium on Foundationsof Computer Science (FOCS rsquo05) pp 469ndash478 IEEE October2005

[13] L K Grover and A M Sengupta ldquoFrom coupled pendulums toquantum searchrdquo inMathematics of QuantumComputation pp119ndash134 Chapman and HallCRC 2002

[14] R P Feynman ldquoSimulating physics with computersrdquo Interna-tional Journal ofTheoretical Physics vol 21 no 6-7 pp 467ndash4881982

[15] N S Yanofsky and M A Mannucci Quantum Computingfor Computer Scientists vol 20 Cambridge University PressCambridge UK 2008

[16] C Monroe D M Meekhof B E King W M Itano and DJ Wineland ldquoDemonstration of a fundamental quantum logicgaterdquo Physical Review Letters vol 75 no 25 pp 4714ndash4717 1995

[17] N A Gershenfeld and I L Chuang ldquoBulk spin-resonancequantum computationrdquo Science vol 275 no 5298 pp 350ndash3561997

[18] J E Mooij T P Orlando L Levitov L Tian C H Van derWaland S Lloyd ldquoJosephson persistent-current qubitrdquo Science vol285 no 5430 pp 1036ndash1039 1999

[19] I L Chuang N Gershenfeld and M Kubinec ldquoExperimentalimplementation of fast quantum searchingrdquo Physical ReviewLetters vol 80 no 15 article 3408 1998

[20] R Barends J Kelly A Megrant et al ldquoSuperconducting quan-tum circuits at the surface code threshold for fault tolerancerdquoNature vol 508 no 7497 pp 500ndash503 2014

[21] M H Amin N G Dickson and P Smith ldquoAdiabatic quantumoptimization with quditsrdquo Quantum Information Processingvol 12 no 4 pp 1819ndash1829 2013

18 International Journal of Reconfigurable Computing

[22] A D King E Hoskinson T Lanting E Andriyash and M HAmin ldquoDegeneracy degree and heavy tails in quantumanneal-ingrdquo httparxivorgabs151207325

[23] T F Roslashnnow Z Wang J Job et al ldquoDefining and detectingquantum speeduprdquo Science vol 345 no 6195 pp 420ndash4242014

[24] Y Goto and M Fujishima ldquoEfficient quantum computingemulation system with unitary macro-operationsrdquo JapaneseJournal of Applied Physics vol 46 no 4 pp 2278ndash2282 2007

[25] A U Khalid Z Zilic and K Radecka ldquoFPGA emulation ofquantum circuitsrdquo in Proceedings of the IEEE International Con-ference on Computer Design VLSI in Computers and Processors(ICCD rsquo04) pp 310ndash315 IEEE October 2004

[26] J F Rivera-Miranda A J Caicedo-Beltran J D Valencia-Payan J M Espinosa-Duran and J Velasco-Medina ldquoHard-ware emulation of quantum Fourier transformrdquo in Proceedingsof the IEEE 2nd Latin American Symposium on Circuits and Sys-tems (LASCAS rsquo11) pp 1ndash4 IEEE Bogata Colombia February2011

[27] M Aminian M Saeedi M S Zamani andM Sedighi ldquoFPGA-based circuit model emulation of quantum algorithmsrdquo inProceedings of the IEEE Computer Society Annual Symposiumon VLSI (ISVLSI rsquo08) pp 399ndash404 IEEE Montpellier FranceApril 2008

[28] H Weimer M Muller I Lesanovsky P Zoller and H PBuchler ldquoA Rydberg quantum simulatorrdquoNature Physics vol 6no 5 pp 382ndash388 2010

[29] G F Viamontes Efficient quantum circuit simulation [PhDthesis] The University of Michigan Ann Arbor Mich USA2007

[30] H J Garcıa and I L Markov ldquoSimulation of quantum circuitsvia stabilizer framesrdquo IEEE Transactions on Computers vol 64no 8 pp 2323ndash2336 2015

[31] C P Williams and S H Clearwater Explorations in QuantumComputing Springer Berlin Germany 1998

[32] A Barenco D Deutsch A Ekert and R Jozsa ldquoConditionalquantumdynamics and logic gatesrdquo Physical Review Letters vol74 no 20 pp 4083ndash4086 1995

[33] M A Nielsen and I L Chuang Quantum Computation andQuantum Information Cambridge University Press 2000

[34] W-W Zhang F Gao B Liu Q-Y Wen and H Chen ldquoAwatermark strategy for quantum images based on quantumFourier transformrdquo Quantum Information Processing vol 12no 2 pp 793ndash803 2013

[35] D Curtis and D A Meyer ldquoTowards quantum templatematchingrdquo in Proceedings of the SPIE 48th Annual Meeting inQuantum Communications and Quantum Imaging pp 134ndash141International Society for Optics and Photonics August 2003

[36] R Schutzhold and G Schaller ldquoAdiabatic quantum algorithmsas quantum phase transitions first versus second orderrdquo Physi-cal Review A vol 74 no 6 Article ID 060304 2006

[37] W P Baritompa D W Bulger and G R Wood ldquoGroverrsquosquantum algorithm applied to global optimizationrdquo SIAMJournal on Optimization vol 15 no 4 pp 1170ndash1184 2005

[38] C Durr and P Hoyer ldquoA quantum algorithm for finding theminimumrdquo httparxivorgabsquant-ph9607014

[39] A Montanaro ldquoQuantum pattern matching fast on averagerdquoAlgorithmica 2015

[40] Y H Lee M Khalil-Hani and M N Marsono ldquoFPGA-basedquantum circuit emulation a case study on Quantum Fouriertransformrdquo in Proceedings of the 14th International Symposium

on Integrated Circuits (ISIC rsquo14) pp 512ndash515 IEEE SingaporeDecember 2014

[41] M Khalil-Hani Y H Lee and M N Marsono ldquoAn accu-rate FPGA-based hardware emulation on quantum fouriertransformrdquo in Proceedings of the Australasian Symposium onParallel andDistributed Computing (AusPDC rsquo15) vol 1 SydneyAustralia 2015 a1b3

[42] M Frigo and S G Johnson ldquoThe design and implementationof FFTW3rdquo Proceedings of the IEEE vol 93 no 2 pp 216ndash2312005

[43] D Wecker and K M Svore ldquoLIQUi|gt a software design archi-tecture and domain-specific language for quantum computingrdquohttparxivorgabs14024467

[44] S Kilts Advanced FPGA Design Architecture Implementationand Optimization John Wiley amp Sons New York NY USA2007

[45] M Smelyanskiy N P D Sawaya and A Aspuru-GuzikldquoqHiPSTER the quantum high performance software testingenvironmentrdquo httparxivorgabs160107195

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 3: Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

International Journal of Reconfigurable Computing 3

input signal samples in one evaluation which is equivalentto a 5-qubit QFT emulation if effects of superposition andentanglement are included

From the above discussion it should be noted thenthat the critical quantum properties of superposition andentanglement were not considered in these previous worksresulting in inaccurate modelling of quantum algorithmsWithout the superposition and entanglement effects thepower of quantum parallelism cannot fully be exploited Pre-vious works reported in [25ndash27] applied pipeline architecturein their FPGA emulation implementations so as to obtainhigh throughput and low critical path delay However apipeline design imposes high resource utilization (due to therequirement of additional pipeline registers and associatedlogic) thus limiting FPGA emulation to be deployed in morepractical quantum computing applications that typicallyrequire high qubit sizes In these pipeline implementationsproposed in prior works resource growth was exponential tothe increase in qubit sizes

In this paper the issues outline above is addressed Theefficiencies of different hardware architectural designs forFPGA emulation purposes are evaluated based on the chosencase studies of QFT and Groverrsquos search We propose anaccurate modelling of quantum system for FPGA emulationtargeting efficient resource utilization while maintainingsignificant speed-up over the equivalent simulation approachSince our proposed FPGA emulation framework appliesthe state vector approach simulation models based on the119897119894119887119902119906119886119899119905119906119898 library are selected in this work for benchmark-ing purposes

3 Theoretical Background

In general quantum algorithms obey the basic process flowstructure The computation process begins with a system setin a specific quantum state which is then converted intosuperposition of multiple basis states Unitary transforma-tions are performed on the quantum state according to therequired operations of the algorithm Finally measurementis carried out resulting in the qubits collapsing into classicalbits

31 Quantum Bit (Qubit) In classical computing the small-est unit of information is the bit A bit can be in either state 0or state 1 and the state of a bit can be represented in matrixform as

state 0 =01[

1

0

]

state 1 =01[

0

1

]

(1)

On the other hand in quantum computing the smallestunit of information is the quantum bit or a qubit Todistinguish the classical bit with the quantum qubit Diracket notation is used Using the ket notation the quantumcomputational basis state is represented by |0⟩ and |1⟩ A

qubit can be in state |0⟩ or in state |1⟩ or in superpositionof both basis states The state of a qubit can be represented as

1003816100381610038161003816120595⟩ = 120572 |0⟩ + 120573 |1⟩ equiv

01[

120572

120573

] (2)

where both 120572 and 120573 are complex numbers and |120572|2 + |120573|2 =1 |120572|2 is the probability where the qubit is in state |0⟩ and|120573|2 is the probability where the qubit is in state |1⟩ upon

measurement An 119899-qubit quantum state vector contains2119899 complex numbers which represents the measurementprobability of each basis state However on measurementthe superposition is destroyed and the qubits return to theclassical state of bits depending on the probability derivedfrom the complex-valued state vector

32 TensorKronecker Product Tensor product or Kroneckerproduct is the basic operation that is applied in the formationof a larger quantum system as well as multiqubit quantumtransformations A quantum state vector that can be writtenas the tensor of two vectors is separable whereas a statevector that cannot be expressed as the tensor of two vectorsis entangled [15] The tensor operation on any arbitrary two1-qubit transformations is shown below

[

11988601198861

11988621198863

] otimes [

11988701198871

11988721198873

] =

[

[

[

[

[

[

11988601198870119886011988711198861119887011988611198871

11988601198872119886011988731198861119887211988611198873

11988621198870119886211988711198863119887011988631198871

11988621198872119886211988731198863119887211988631198873

]

]

]

]

]

]

(3)

33 Quantum Circuit Model A quantum algorithm is adescription of a sequence of quantum operations (or trans-formations) applied upon qubits to generate new quantumstates The model most widely used in describing the evolu-tion of a quantum system is the quantum circuit model firstproposed in [32] A quantum circuit is the interconnection ofquantum gates with quantum wires and gate operations arerepresented by unitary matrices

All unitary matrices are invertible and the products ofunitary matrices as well as the inverse of unitary matrix areunitary An119873-by-119873matrix119880 is unitary if119880119880dagger = 119880dagger119880 = 119868

119873

where 119880dagger is the adjoint (conjugate transpose) of 119880 Sinceall quantum transformations are reversible quantum gateoperations can always be undone Fundamental quantumgates include the Hadamard gate phase-shift gate and swapgate and these gates are described as follows

Hadamard gate 119867 is one of the most useful single qubitquantum gates It operates by placing the computational basisstate into superposition of basis states with equal probabilityTheHadamard transform can be represented by the followingunitary matrix

119867 =

1

radic2

[

1 1

1 minus1

] (4)

4 International Journal of Reconfigurable Computing

The following example illustrates the application ofHadamard gates in mapping a 2-qubit basis state |00⟩ to asuperposition of basis states with equal probability

|00⟩

119867otimes119867

997888997888997888997888rarr

1

2

(|00⟩ + |01⟩ + |10⟩ + |11⟩)

(

1

radic2

[

1 1

1 minus1

] otimes

1

radic2

[

1 1

1 minus1

])

[

[

[

[

[

[

1

0

0

0

]

]

]

]

]

]

=

1

2

[

[

[

[

[

[

1

1

1

1

]

]

]

]

]

]

(5)

Controlled phase-shift gate 119862119877119896operates on 2 qubits one

ofwhich is the control qubit and the other is the target qubit Ifthe control qubit is true a phase-shift operation is performedon the target qubit otherwise there is no operation Theoperation is represented by the following matrix

119862119877119896=

[

[

[

[

[

[

[

1 0 0 0

0 1 0 0

0 0 1 0

0 0 0 11989021205871198942

119896

]

]

]

]

]

]

]

(6)

Quantum SWAP gate is used for swapping two qubitsIt switches the amplitudes of a quantum state vector Theoperation of a 2-qubit SWAP gate is represented by matrixin

SWAP =[

[

[

[

[

[

1 0 0 0

0 0 1 0

0 1 0 0

0 0 0 1

]

]

]

]

]

]

(7)

34 Quantum Fourier Transform (QFT) The Fourier trans-form is deployed in wide range of engineering and physicsapplications such as signal processing image processingand quantum mechanics It is a reversible transformationthat converts signals from timespatial domain to frequencydomain and vice versa The Fourier transform is defined in(8) for continuous signals and in (9) for discrete signals

119883(119891) = int

infin

minusinfin

119909 (119905) 119890minus1198942120587119891119905

119889119905 (8)

119883119896=

119873minus1

sum

119899=0

119909119899119890minus1198942120587(119896119899119873)

(9)

The quantum Fourier transform (QFT) is a transfor-mation on qubits and is the quantum equivalent of thediscrete Fourier transform It should be noted that a quantumcomputer performs QFT with exponentially less number ofoperations than the classical Fourier transform HoweverQFT does not reduce the execution time of the algorithmwhen classical data is used This is due to the characteristicof the quantum computer that does not allow parallel read-out of all quantum state amplitudes In addition there is no

known method that can effectively instantiate the desiredinput state amplitudes to be Fourier-transformed [33]

In order to harness the power of quantum computingon Fourier transform QFT has to be deployed within otherpractical applications QFT is pivotal in quantum computingsince it is part ofmany quantumalgorithmsThese algorithmsinclude integer factorization and discrete logarithms algo-rithms [1] Simonrsquos periodicity algorithm [9] and Hallgrenrsquosalgorithms [10] They offer significant speed-up over theirclassical counterparts QFT has also found applications inmany real-world problems such as image watermarking [34]and template matching [35]

To compute Fourier transform in quantum domain dis-crete signal samples are encoded as the amplitude sequencesof a quantum state vector which is in superposition of basisstates [31] An 119899-qubit QFT operation which transformsan arbitrary superposition of computational basis states isexpressed in

1003816100381610038161003816120595⟩ =

1

radic2119899

2119899minus1

sum

119895=0

119891 (119895Δ119905)1003816100381610038161003816119895⟩

QFT997888997888997888rarr

1

radic2119899

2119899minus1

sum

119896=0

2119899minus1

sum

119895=0

119891 (119895Δ119905) 1198902120587119894(1198951198962

119899)|119896⟩

(10)

As the requirement for a valid quantum state |120595⟩must benormalized such that it fulfils (11) If the original signal inputsdo not comply with this requirement the amplitudes of thesignal samples have to be divided by the normalization factorradicsum2119899minus1

119897=0|119891(119897Δ119905)|

2 In most cases the input states formed bythe normalized signal samples are entangled

2119899minus1

sum

119895=0

1003816100381610038161003816119891 (119895Δ119905)

1003816100381610038161003816

2= 1 (11)

From (10) it can be observed that the term 1198952119899 in QFTequation is a rational number in the range of 0 le 1198952119899 lt 1 Asqubit representation is typically used in computations the 119895in base-10 integer is redefined in base-2 notation as individualbit such that the binary fraction form as expressed in (12) canbe conveniently adopted

(119895)10equiv (11989511198952sdot sdot sdot 119895119899)2

= (2119899minus11198951+ 2119899minus21198952+ sdot sdot sdot + 2

0119895119899)10

= 2119899(2minus11198951+ 2minus21198952+ sdot sdot sdot + 2

minus119899119895119899)10

= 2119899(0 sdot 11989511198952sdot sdot sdot 119895119899)2

(12)

International Journal of Reconfigurable Computing 5

|k1⟩

|k2⟩

|knminus1⟩

|kn⟩H

H

H

H

R2

R2 Rnmiddot middot middot

middot middot middot Rnminus2 Rnminus1

1

radic2(|0⟩ + e2120587i0middotj1j2 middotmiddotmiddotj119899|1⟩)

1

radic2(|0⟩ + e2120587i0middotj2 middotmiddotmiddotj119899|1⟩)

1

radic2(|0⟩ + e2120587i0middotj119899minus1j119899|1⟩)

|j1⟩

|j2⟩

|jnminus1⟩

|jn⟩

1

radic2(|0⟩ + e2120587i0middotj119899|1⟩)

Figure 1 Quantum circuit model for n-qubit QFT

With some algebraic manipulations the QFT equationcan be derived from (13) to form (14) [33]

QFT2119899

1003816100381610038161003816119895⟩ =

1

radic2119899

2119899minus1

sum

119896=0

1198902120587119894(1198951198962

119899)|119896⟩ (13)

=

1

radic2119899(|0⟩ + 119890

21205871198940sdot119895119899|1⟩)

sdot (|0⟩ + 11989021205871198940sdot119895119899minus1119895119899

|1⟩) sdot sdot sdot (|0⟩ + 11989021205871198940sdot11989511198952 sdotsdotsdot119895119899

|1⟩)

(14)

Since the term 11989021205871198940sdot1198951 produces either minus1 if 1198951= 1 or +1

otherwise Hadamard computation on the first qubit resultsin (1radic2)(|0⟩ + 11989021205871198940sdot1198951 |1⟩) Computations of the consecutivebits in the binary fraction are obtained using controlledphase-shift gates according to (14) QFT circuit consists ofthree types of elementary gates which are Hadamard gate119867controlled phase-shift gate 119862119877

119896 and SWAP gate The circuit

model of an 119899-qubit QFT is depicted in Figure 1The size of a QFT circuit grows exponentially as the

number of input qubits increases An 119899-qubit QFT involvessum119899

119896=1119896+1 unitary transformations and could process up to 2119899

input samples in one evaluation (provided the input samplesare encoded as the amplitude sequences of a superposition ofcomputational basis states)

35 Groverrsquos Search Algorithm In computer science area atypical search problem is to identify the desired element froman unordered array For many computing applications it iscritical that the search technique is efficient In terms of afunction the search problem 119891 0 1119899 rarr 0 1 is givenwith assurance that there exists one binary string 119909

0where

119891(119909) = 1 if 119909 = 1199090 else 119891(119909) = 0

In classical computing 1198982 queries on average arerequired to search for a particular element in an unorderedarray with 119898 elements In quantum computing Groverrsquossearch algorithm can complete the job in (Π4)radic119898 queries(for the rest of the text and figures in Section 35 the requiredGrover iterations are abbreviated as radic2119899 times) Althoughthe speed-up achieved is only quadratic Groverrsquos algorithmand its extensions are extremely useful in enhancing currentmethods in solving database searching and optimizationproblems which include 3-satisfiability [36] global opti-mization [37] minimum point searching [38] and patternmatching [39] The core operations of Groverrsquos algorithm arephase inversion and inversion about mean Phase inversion

|x⟩

|1⟩

|1205930⟩ |1205931⟩ |1205932⟩

Uf

nn

H

Figure 2 Quantum circuit for phase inversion [15]

inverts the phase of the state-of-interest and its quantumcircuit model is given in Figure 2

In Figure 2 the top 119899-qubit |x⟩ is the target qubit andthe bottom qubit is called the ancilla qubit The functionof 119880119891is to pick out the desired binary string To apply

phase inversion on target qubits Hadamard gate operationis performed on the ancilla qubit which is initialized as |1⟩This is to complement the effect of 119880

119891which takes |x 119910⟩ to

|x 119891(x) oplus 119910⟩In terms of matrices the phase inversion operation can

be expressed as 119880119891(119868119899oplus 119867)|x 1⟩ and the corresponding

quantum states are described as follows10038161003816100381610038161205930⟩ = |x 1⟩

10038161003816100381610038161205931⟩ = |x⟩ [ |0⟩ minus |1⟩

radic2

] = [

|x 0⟩ minus |x 1⟩radic2

]

10038161003816100381610038161205932⟩ = |x⟩ [

1003816100381610038161003816119891 (x) oplus 0⟩ minus 100381610038161003816

1003816119891 (x) oplus 1⟩

radic2

]

= |x⟩ [1003816100381610038161003816119891 (x)⟩ minus 1003816100381610038161003816

1003816119891 (x)⟩

radic2

]

= (minus1)119891(x)|x⟩ [ |0⟩ minus |1⟩

radic2

]

=

minus1 |x⟩ [ |0⟩ minus |1⟩radic2

] if x = x0

+1 |x⟩ [ |0⟩ minus |1⟩radic2

] if x = x0

(15)

Inversion about mean boosts the phase separationbetween the element-of-interest and other elements in theunordered arrays (after phase inversion operation is appliedto invert the phase of the target element) The mean of

6 International Journal of Reconfigurable Computing

(1) Start with |0⟩ as the target input qubits(2) Apply 119899-qubit Hadamard gate on target qubits119867otimes119899

(3) forradic2119899 times do(4) Apply phase inversion operation 119880

119891(119868 otimes 119867)

(5) Apply inversion about mean operation on the target qubits minus119868 + 2119860(6) end for(7) Measure the target qubits

Algorithm 1 Groverrsquos search algorithm

all elements is computed and inversions are made aboutthe mean The overall mean remains unchanged after theinversion process This is because the distance between oneelement and the mean is the same before and after inversionThe only change is if the original sequence is above themean during the inversion it is flipped about the mean to thesame distance below the mean and vice versa In general theinversion about mean operation can be expressed as

V1015840 = minusV + 2119886 (16)

where 119886 is the mean V is the value of an element in the arrayand V1015840 is the new value of that element after inversion

In terms of matrices the mean of a 2119899 elements vector119881 is obtained by the product of matrices 119860 and 119881 where allthe elements in the 2119899-by-2119899 matrix 119860 are set to 12119899 Henceinversion about mean in matrix form becomes

1198811015840= minus119881 + 2119860119881 = (minus119868 + 2119860)119881 (17)

In order to achieve high confidence of getting the desiredelement the amplitude amplification process (amplitudeamplification in Groverrsquos search algorithm involves phaseinversion and inversion about mean operations) has to berepeated for radic2119899 times This is because the probability ofsuccess changes sinusoidally by the number of amplitudeamplification iterations (as illustrated in Figure 3) andthe highest probability of success first happened after therequired iterations Pseudocode for a generic Groverrsquos searchalgorithm is given in Algorithm 1

There are two approaches to model Groverrsquos searchquantum algorithm The first approach which is based onquantum circuitmodel is discussed nextThe secondmethodis modelling using arithmetic functions and this is presentedin Section 42 since this approach is applied in this paper

As shown in Figure 4 Groverrsquos search circuit given in [15]is constructed with assumption that black box modules 119880

119891

and minus119868+2119860 are available Descriptions of the119880119891and minus119868+2119860

modules have been given earlierOn the other hand the circuit model for 119899-qubit Groverrsquos

algorithm presented in [33] is shown in Figure 5(a) In thisfigure119867 is the Hadamard gate and 119866 is the Grover iterationcircuit which is illustrated in Figure 5(b)

The function of Grover iteration circuit is equivalentto the phase inversion and inversion about mean In orderto achieve high probability of successful search 119866 is con-catenated for radic2119899 times In Figure 5(b) the role of oraclemodule is to recognize the solution to a particular search

0 20 40 60 80 100 120 140 160 180 2000

010203040506070809

1

Number of iterations

Mea

sure

men

t pro

babi

lity

Element-of-interestOther elements

Figure 3 Probability of success by the number of amplitude amplifi-cation iterations amongst 210 probabilities For 10-qubit search firsthighest probability of success happens at 25th iteration

|0⟩

|1⟩

minusI + 2A

Phase inversionInversion

Hotimesn

Ufn n n n

H

about mean

Repeat radic2n times

Figure 4 Modelling of Groverrsquos search based on quantum circuitmodel [15]

problem in the phase inversion operation By monitoringthe oracle qubit a solution to the search problem can bedetected through the changes of the oracle qubit The designof oraclemodule varieswith different search applications andan example of the oracle circuit for a simple 3-bit search task isshown in Figure 6 In Figure 6 the119883 symbol represents Pauli-119883matrixThe open circle notation indicates conditioning onthe qubit being set to zero whereas the closed circle indicatesconditioning on the qubit being set to one

Deriving from the circuits in Figure 5 the corresponding3-qubit Groverrsquos search circuit is provided in Figure 7 Thiscircuit model is made up of Hadamard oracle quantumNOT and multiqubit controlled-NOT gates

International Journal of Reconfigurable Computing 7

|0⟩

|1⟩

Target qubits

Oracle qubit

Hotimesn Hotimesnn n

n

G G

H H

middot middot middot

O(radic2n)

(a) 119899-qubit Groverrsquos search

Oracle workspace

Phase

Oraclen-qubits Hotimesn Hotimesn

For x gt 0|x⟩ 997888rarr (minus1)f(x)|x⟩

|0⟩ 997888rarr |0⟩

|x⟩ 997888rarr minus|x⟩

(b) Grover iteration circuit 119866

Figure 5 Quantum circuit model for Groverrsquos search algorithm [33]

X

Xequiv

Figure 6 Oracle circuit for recognizing binary string ldquo010rdquo

4 Proposed FPGA-Based Hardware Emulation

This section presents our approach in modelling quantumFourier transform and Groverrsquos search algorithms for FPGAemulation The proposed techniques can be generalized toFPGA emulation of more complex quantum algorithms thatapply QFT or Groverrsquos algorithm This paper extends ourearlierwork presented in [40 41] In these previousworks thehardware architecture proposed was restricted to the serialdesign with resource sharing facilitated at the register levelIn this paper we enable resource sharing at the operator (orcomputational) level that allows for more efficient emulationof the quantum algorithms Furthermore additional casestudy on Groverrsquos search algorithm is included for general-ization of the proposed framework The choice of hardwarearchitecture varies based on the need of different applicationsBased on the selected case studies the efficiencies of differenthardware architectures for quantum computing emulationpurposes are discussed and analysed in this work The goalis to achieve scalability and also efficient resource utilizationfor emulating practical larger qubit size quantum systems

41 Modelling QFT for FPGA Emulation The derivation ofquantum circuit model for 119899-qubit QFT was discussed earlieris Section 34 Here we present the modelling of QFT forFPGA emulation based on a 3-qubit example According to(14) the mathematical expression for 3-qubit QFT is derivedas shown in

QFT23

1003816100381610038161003816119895⟩ =

1

radic23(|0⟩ + 119890

21205871198940sdot1198953|1⟩)

sdot (|0⟩ + 11989021205871198940sdot11989521198953

|1⟩)

sdot (|0⟩ + 11989021205871198940sdot119895111989521198953

|1⟩)

(18)

Deriving from the general 119899-qubit QFT circuit providedin Figure 1 the corresponding quantum circuit for 3-qubitQFT is obtained as shown in Figure 8

The circuit consists of Hadamard gates 119867 controlledphase-shift gates 119862119877

2and 119862119877

3 and also the SWAP gate

Referring to the functional block diagram given in Figure 9this quantum circuit model corresponds to a sequence ofunitary transformations 119880119894 119894 = 1 to 7 defined by

1198801 = 119867 otimes 119868 otimes 119868 (19)

1198802 =1198621198772otimes 119868 (20)

1198803 = (119868 otimes SWAP) sdot (1198621198773otimes 119868) sdot (119868 otimes SWAP) (21)

1198804 = 119868 otimes 119867 otimes 119868 (22)

1198805 = 119868 otimes1198621198772 (23)

1198806 = 119868 otimes 119868 otimes 119867 (24)

1198807 = SWAP23 (25)

Note that in Figure 9 the modelling of 119899-qubit quan-tum system with superposition and entanglement propertiesresulted in a circuit with 2119899 signals Since the input samplesto QFT circuit are encoded as sequence of amplitudes inan entangled superposition of basis states (discussed in Sec-tion 34) modelling based on individual qubit with separatequantum gate operations is unable to reflect the effects ofapplying a quantum gate on entangled qubits correctly

In order to model the effect of superposition and entan-glement derivation of each unitary transformation is madethrough the tensor product of individual quantum gate andidentity matrix to form unitary matrix of equal dimensionwith the quantum state vector Detailed derivations of thequantum unitary matrices for the 3-qubit QFT have beenpresented in our previous paper [41]

Since these quantum unitary matrices are mostly sparsematrices we extract minimal number of useful arithmeticoperations (due to nonzero elements in the matrices) result-ing in an optimal realization of themodel that can bemappedto an efficient FPGA emulation architecture Incidentally asoftware program has been developed to automate this map-ping hence easily scaling up the circuit model to larger qubitsizes From these arithmetic functions the correspondingdata-flow graph for the 3-qubit QFT is derived as shown inFigure 10

In Figure 10 each IN and OUT signal is a fixed pointcomplex number register for an element of the quantumstate vector The operation of module 119865 corresponds toout 119903 = minusin 119894 and out 119894 = in 119903 where 119865 is the multiplication

8 International Journal of Reconfigurable Computing

|0⟩

|0⟩

|0⟩

|1⟩

Target qubits

Grover iteration circuit GGrover iteration circuit G

Oracle qubit

Oracle Oracle

H H

H H

H

H

H

H

H

H

H

H

H

H

H

H

H

H

H HH HX

X

X

X

X

X

X

X

X

X

X

X

H

H

Figure 7 Modelling of 3-qubit Groverrsquos search based on quantum gates

H

H

H

R2

R2

R3

U1 U2 U3 U4 U5 U6 U7

|k1⟩

|k2⟩

|j1⟩

|j2⟩

|k3⟩|j3⟩

1

radic2(|0⟩ + e2120587i0middotj1j2j3|1⟩)

1

radic2(|0⟩ + e2120587i0middotj2j3|1⟩)

1

radic2(|0⟩ + e2120587i0middotj3|1⟩)

Figure 8 Quantum circuit for 3-qubit QFT

IN0

IN7U1 U2 U3 U4 U5 U6 U7

8 8

OUT0

OUT7

Figure 9 The 3-qubit QFT in terms of a sequence of unitarytransformations

of the input complex number with imaginary number 119894This is applied in unitary transformations 1198802 and 1198805 Theoperation ofmoduleCMULT in Figure 10 is described in (26)The function of CMULT is the multiplication of the inputcomplex number with a constant complex number which isderived based on the controlled phase-shift gate As expressedpreviously in (21) it is used in unitary transformation 1198803

out 119903 = (in 119903 times constant 119903) minus (in 119894 times constant 119894)

out 119894 = (in 119894 times constant 119903) + (in 119903 times constant 119894) (26)

42 Modelling Groverrsquos Search for FPGA Emulation As men-tioned earlier the second approach of modelling Groverrsquosquantum search is modelling using arithmetic functions(based on mathematical model) In this work we havechosen this approach of modelling Groverrsquos algorithm forFPGA emulation In contrast to the quantum circuit modelapproach that involves complex large dimensional matrixoperations (ie matrix multiplication and tensor product)the chosen method can utilize the computational resourcesavailable on FPGA such as comparators adderssubtractersand multiplexers for efficient emulation of Groverrsquos searchalgorithm

This technique is based on phase inversion and inver-sion about mean as described in Section 35 As shown inFigure 11 mathematically Groverrsquos search for a databasewith 2119899 elements mainly involves the processes of invertingthe phase of target element function 119865 and performing

inversion about mean on all elements function minus119868 + 2119860 forradic2119899 times Initialization of the quantum state vector with

equal probability function INIT is carried out once in thebeginning of the process

For the case study of a 3-bit search problem we derivethe data-flow graphs and obtain the result of the requiredarithmetic functions as shown in Figure 12 For experimentalpurposes the oracle module is developed by comparingall elements in database with targeted element using com-parators COMP modules The arithmetic functions of theinversion about mean are derived through straightforwardcomputations that involve summation bit shift and subtrac-tion

43 Architecture of Proposed FPGA Emulation Model It isclear that if implemented on classical computing platformsthe resource utilization for a quantum system would growexponentially Hence the choice of suitable architectureis critical for FPGA-based quantum computing emulationHere we discuss the efficiencies of different architecturalchoices in datapath concurrent pipeline serial and serial-parallel The block diagram of various architectures is con-structed based on the example of 3-qubit QFT

431 Concurrent Processing In concurrent processing of analgorithm all computations are completed within a clockcycle In the case of 3-qubitQFT computation blocks betweenthe input and output registers through the functional blocks1198801 to 1198807 are performed in one clock cycle (refer to Fig-ure 13(a)) However such an architecture consumes enor-mous resources such that the number of registers required toemulate an n-qubit QFT is 2119899+1 In addition the critical pathdelay is very high which results in unrealistic low operatingfrequency

432 Pipeline Architecture Most of the prior works onFPGA-based quantum circuit emulation [25ndash27] are devel-oped based on pipeline architectureThepipeline architecturehas the advantages of high throughput and much shortercritical path delay Figure 13(b) shows the proposed pipelinearchitecture of the 3-qubit QFT However the main issue ofthis approach is that resource utilization grows drasticallyby the number of qubits due to the circuit augmentation ofpipeline registers 2119899(sum119899

119896=1119896+2)pipeline registers are required

to emulate n-qubit QFT Consequently hardware emulationscalability is highly constrained by the available resources inFPGA

International Journal of Reconfigurable Computing 9

IN0

IN0 IN4 IN1 IN5 IN2 IN6 IN3 IN7 IN0 IN4 IN1 IN5 IN2 IN6 IN3 IN7

IN1 IN2 IN3 IN4 IN5 IN6 IN7

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

U7

U6

U5

U4

U3

U2F

F F

F

U1+ + + + minus minus minus minus

+

++ + +minus

minus + minus + minus + minus

minus minus minus

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5

out2_5

out2_6 out2_7

out2_7

CMULTCMULT

out3_0

out3_0 out3_0

out3_1

out3_1 out3_1

out3_2

out3_2 out3_2

out3_3

out3_3 out3_3

out3_4

out3_4 out3_4

out3_5

out3_5 out3_5

out3_6

out3_6 out3_6

out3_7

out3_7 out3_7

out4_0 out4_1 out4_2 out4_3 out4_4 out4_5 out4_6 out4_7

out5_0

out5_0

out5_1

out5_1 out5_0 out5_1

out5_2

out5_2

out5_3

out5_3 out5_2 out5_3

out5_4

out5_4

out5_5

out5_5 out5_4 out5_5

out5_6

out5_6 out5_6

out5_7

out5_7out5_7

out6_0 out6_1

out6_1

out6_2 out6_3

out6_3

out6_4

out6_4

out6_5 out6_6

out6_6

out6_7

1

radic2(1 + i)

1

radic2(1 + i)

times times times times times times times times

times times times times times times times times

times times times times times times times times

Figure 10 Data-flow graph for 3-qubit QFT The multipliers shown in this diagram represent the multiplication of input complex numberwith constant 1radic2

433 Serial Processing Although serial design requires mul-tiple iterations to perform a complex computation it opensup the opportunity for resource sharing Serial processingis suitable for applications where resource utilization is acritical design constraint Figure 13(c) depicts the serial formof the 3-qubit QFT circuit that consists of a control unitand a datapath unit As resources can be reused betweentransformations a serial-based n-qubit QFT consumes 2119899registers a register utilization that is much lower than inconcurrent or pipeline architectures However pure serial

approach forfeits the purpose of conducting FPGA emulationwhose aim is to exploit the parallelism inherent in a quantumsystem as it would still suffer from slow sequential behaviouras exhibited in simulation on classical computer

434 Proposed Serial-Parallel Architecture In this paperwe propose a hybrid serial-parallel architecture for FPGAemulation of quantum algorithms The proposed approachtakes advantage of both serial and parallel design techniques

10 International Journal of Reconfigurable Computing

Repeat radic2n times

2n 2n 2n

Phase inversion

INIT

Inversionabout mean

minusI + 2AF

Figure 11 Mathematical modelling of Groverrsquos search algorithm

Applying the concepts of quantum parallelism and quantumdynamicsmodelled by sequential transformations on a quan-tum state vector it is found that the proposed serial-parallelarchitecture is suitable for efficient and accurate quantumcomputing emulation on FPGA platform

Figure 14 shows the functional block diagram of theproposed serial-parallel FPGA emulation architecture of the3-qubit QFT The serial-parallel design of the datapath unitinvolves a number of quantum computation units that canperform parallel computations for each stage of unitarytransformation whereby the same computational resourcescan be reused for the following stage of transformations

For data storage and synchronization purposes 2119899 reg-isters are shared between unitary transformations As com-pared to the pipeline design our proposed serial-parallelapproach achieves linear reduction on the usage of registersto emulate the same quantum system The arithmetic logicunit (ALU) in the datapath unit contains multiple customprocessing elements and the allocation of resources in ALUvaries based on the target application The number of pro-cessing elements is basically determined based on the desired2119899 parallelism in the n-qubit quantum system

As illustrated in Figures 10 and 12 the data-flow graph ofQFT and Groverrsquos search algorithm exhibit similar repetitivepattern between unitary transformations This implies thatthe proposed serial-parallel approach can be generalized forthe two case studies to achieve balance in both resourceutilization and speed performance For the case of Groverrsquossearch the ALUwould be theGrover iterationmodule shownin Figure 12 whereas the control unit is designed to keep trackon the number of Grover iterations required by the targetsearch problem

5 Experimental Results

The proposed emulation designs are modelled in SystemVer-ilog HDL synthesized using Altera Quartus II softwareand implemented into target emulation platform which isbased on Altera Stratix IV EP4SGX530KF43C4 FPGA InSection 51 we discuss the verification of the hardwareemulation designs for the QFT and Groverrsquos search casestudies In addition the automated process for scaling upthe design to larger qubit size is described In Section 52investigation is conducted to study the effects of the numberof mantissa bits used in our fixed point representation formaton resource utilization and precision error In the sectionthat follows we analyse for different emulation architectureshow the increase in qubit size impacts on resource growthand maximum operating frequency allowed in the designs

Finally the runtime speeds in simulation and emulation forquantum algorithms with different qubit sizes are compared

51 Design Verification To show that the quantum algo-rithms have been modelled accurately design verificationof the proposed emulation hardware is performed Goldenreferences based on the software simulationmodels (in C) aredeveloped and their outputs are comparedwith the emulationhardware under test (which are described in SystemVerilogHDL)

In our QFT case study FFTW3 [42] a widely applied fastFourier transform library in C is used to perform Fouriertransform computations on the signal samples that are usedin this work The outputs of the classical Fourier transformthen serve as the golden referencemodel in verification of theproposed emulation model Furthermore since the discreteFourier transform (DFT) is a linear transformation that canbe defined in unitary matrix form the functional correctnessof our QFT hardware emulation model can be convenientlyverified against the DFTmatrixThe expression of an 119899-qubitDFT matrix is shown in

DFT = 1

radic2119899

[

[

[

[

[

[

[

[

[

[

[

1 1 1 1

1 1205961

1205962

1205962119899minus1

1 1205962

1205964

1205962(2119899minus1)

1 1205962119899minus11205962(2119899minus1) 120596

(2119899minus1)(2

119899minus1)

]

]

]

]

]

]

]

]

]

]

]

(27)

where 120596 is the 2119899th root of unity that is 120596 = 11989021205871198942119899

Thechoice of 11989021205871198942

119899

or 119890minus21205871198942119899

is purely a matter of convention asboth the term 11989021205871198942

119899

and the term 119890minus21205871198942119899

to the power of 2119899are equal to 1

On the other hand the design of Groverrsquos search FPGAemulation model is verified against the mathematical modelprovided in literature The simulation model of Groverrsquossearch algorithm also serves as the golden reference modelfor verification purposes

For the development of FPGA emulation models forpractical quantum computing applications it is importantthat the emulation hardware can be scaled up to larger qubitsize architectures In this work the designs are scaled up withthe aid of software program developed in-house HDL codesof the two case studies are autogenerated by the softwareprogram based on the proposed modelling techniques (asdiscussed in Section 4) The generated HDL code producesefficient hardware emulation model based on the proposedserial-parallel architecture

52 Fixed Point Representation As defined in (2) a quantumstate vector is represented by complex floating point numbersTo ensure effective resource utilization in our FPGA emula-tion hardware floating point numbers are replaced by fixedpoint representations In this work a fixed point format with1 sign bit 1 integer bit and 119873 mantissa bits (as shown inFigure 15) is used Since the amplitudes of a quantum statethat is the probabilities of collapsing into computational basis

International Journal of Reconfigurable Computing 11

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

minusIN0

IN0

IN0 minusIN1IN1 minusIN2IN2 minusIN3IN3 minusIN4IN4 minusIN5IN5 minusIN6IN6 minusIN7IN7

Target

Target Target Target Target Target Target Target Target

Target Target Target Target Target Target Target

COMP

COMPCOMPCOMPCOMPCOMPCOMPCOMPCOMP

COMPCOMPCOMPCOMPCOMPCOMPCOMP

IN1 IN2 IN3 IN4 IN5 IN6 IN7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

+ + + +

++

+

+ +

+

+ + +

+

out1_0 minusout1_0 out1_1 minusout1_1 out1_2 minusout1_2 out1_3 minusout1_3 out1_4 minusout1_4 out1_5 minusout1_5 out1_6 minusout1_6 out1_7 minusout1_7

minus

minus minus minus minus minus minus minus minus

minus minus minus minus minus minus minus

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

Groveriteration 2

Grover

INIT

iteration 1

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

≫(n minus 1)

≫(n minus 1)

Figure 12 Data-flow graph of Groverrsquos circuit for 3-bit search

U1 U2 U3 U4 U5 U6 U78 8

IN0

IN7

OUT0

OUT7

(a) Concurrent processing

U4U1 U2 U3 U5 U6 U788

IN0

IN7

OUT0

OUT7

(b) Pipeline architecture

88

Control unit

Datapath unit

Shared

Arithmeticlogic unit

(ALU)

registersIN0

IN7

OUT0

OUT7

(c) Serial architecture

Figure 13 Three-qubit QFT implemented using different hardware architectures

12 International Journal of Reconfigurable Computing

CU

ALU

DU

alu3alu2alu1alu0

alu4 alu5 alu6 alu7

CMULTCMULT

reg0reg4

IN0

Shared registers

IN1alu0

alu1alu4

IN2alu2alu4alu1IN3

IN4

alu3alu5

alu4alu2

reg3reg6

reg3

reg1

reg6

reg7

IN7

alu5

IN6

IN5

alu7

alu3

alu6

alu3

alu6

reg4 reg4reg2 reg2 reg2reg1 reg1 reg3 reg3reg5

reg5

reg5 reg5reg6 reg6 reg7

reg7

Stateregisters

Next-stateStart

Control signals

logic

Outputlogic

+

F

F

F

+ + +

minus minus minus minus

1

radic2(1 + i)

1

radic2(1 + i)

alu_comp0 alu_comp1

reg7

reg6

reg5

reg3

reg1

reg2

reg4

reg0

alu_comp

alu_comp0

1

|k1⟩|k2⟩

|j1⟩|j2⟩

|k3⟩|j3⟩

times times times times

times times times times

Figure 14 Proposed serial-parallel architecture for 3-qubit QFTThe multipliers shown in this diagram represent the multiplication of inputcomplex number with constant 1radic2

1 sign bit 1 decimal bit N mantissa bits

Figure 15 Fixed point representation format

states after measurement are constrained in the range of 0 to1 only one bit is required to represent the integer part

Due to the limitations of the classical digital computingplatform representation of qubit amplitudes with infiniteprecision is infeasible In the context of quantum computermodelling particularly for quantum systems with large qubitsizes minimising precision loss is critical to preserve theconsistency of the quantum state during the modellingprocess [43] Here we investigate how precision error is

affected by the number of mantissa bits used in our fixedpoint representation format The corresponding experimen-tal results are given in Figure 16

Precision error shown in Figure 16 is computed based onthe following equations

error 119903 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119903

119898minus simulate 119903

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119903

119898

1003816100381610038161003816

error 119894 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119894

119898minus simulate 119894

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119894

119898

1003816100381610038161003816

precision error = 1

2119899+1(error 119903 + error 119894)

(28)

International Journal of Reconfigurable Computing 13

14 15 16 17 18 19 20 21 22 23 240

005

01

015

02

025

03

035

04

045

05

Number of mantissa bits

Prec

ision

erro

r (

)

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

Figure 16 Precision error against different fixed point formats used

where simulate 119903 and simulate 119894 are the floating point realand imaginary amplitudes of the reference output state gen-erated from simulation whereas emulate 119903 and emulate 119894 arethe amplitudes extracted from the output state of proposedFPGA emulator (converted from original fixed point formatto floating point for verification purposes)

Figure 16 shows that 16-bit fixed point format (14mantissabits) incurs significant precision error for 5-qubit emulationsin both case studies However the error produced by 2-qubitemulations is insignificant This behaviour is because theamplification of the fixed point truncation errors grows withthe size of quantum system For FPGA emulation purposesprecision error due to fixed point representation can bereduced by increasing the number ofmantissa bits with trade-off on resource utilization By expanding the number ofmantissa bits up to 24-bit (which results in 26 total numberof bits) negligible precision error for 5-qubit emulations isattained It is important to note that the proposed FPGAemulator is parameterizable in terms of the number ofmantissa bits for fixed point representation This is crucial toensure different fixed point formats can be applied to emulatequantum circuit of various sizes based on the demandedprecision error tolerance and resource constraint

The size of our fixed point number format also affectsresource utilization The corresponding experimental resultsforQFT case study are shown in Figure 17 Since the resourcesavailable on FPGA device are mostly in blocks or multiples of8 bits the choice of 26-bit fixed point format is not suitableHence we apply 22-mantissa-bit size (ie total number of bitsis 24) in our fixed point representation formats Note thatfor Groverrsquos search emulations the experiment on resourceutilization of DSP blocks is not relevant because FPGAemulation model developed here (as depicted in Figure 12)does not involve multiplication

53 Efficiency of Proposed FPGA Emulation Architecture Inthis subsection we investigate how the increase in qubit size

impacts resource growth and maximum allowable operatingfrequency in different emulation architectures (ie concur-rent pipeline and serial-parallel) In the case of QFT emu-lation model we have two versions of serial-parallel archi-tectures Type 1 serial-parallel uses DSP blocks to performmultiplications whereas type 2 replaces the multiplications(required in Hadamard gate operations 1198801 1198804 and 1198806 inthe 3-qubit QFT case) with shift-add operations Althoughconventional hardware design methods encourage the usageof shift-add operation instead of multiplication to reduceresource utilization the case is now different with FPGAdevices containing efficient built-in DSP blocks Figures 18and 19 show the results of the experiments conducted onQFTand Groverrsquos search algorithms respectively

Based on the experimental results shown in Figure 18when comparing to type 1 type 2 method consumes lessDSP blocks but more logic elements due to the constructionof adders needed in shift-add operations As the number ofqubits increases resource utilization of logic elements fortype 1 emulation grows rapidly when available DSP blocks areused up and it ends up with similar resource utilization asthe type 2 approach Hence we can conclude that for large-scale FPGA emulations both methods would lead to similarresource utilization Thus the first approach is preferred dueto the ease of design process where DSP blocks are utilized bydefault when implementing the design with the FPGA designautomation tool

In contrast to the concurrent and pipeline designs theexperimental results for QFT and Groverrsquos search show thatthe proposed serial-parallel architecture achieves balance onboth resource utilization and operating frequency The pro-posed architecture has significantly reduced resource growthin the application of logic elements dedicated logic regis-ters and DSP block yet maintaining reasonable operatingfrequency With the concurrent and pipeline architectures5-qubit QFT emulation completely used up the resourcesavailable in the Altera Stratix IV FPGA device used in thiswork However the same device can support up to 7-qubitQFT emulationwith the proposed serial-parallel architectureIt is important to note that the processing power of 7-qubitQFT is far higher than the 5-qubit implementation as an 119899-qubit QFT can process up to 2119899 samples in one evaluation

For scalability software simulation would depend onresources available on the computer servers The scalabilityof the FPGA emulation framework would depend on theresources available on target FPGA devices As there havebeen rapid advances in FPGA technology in recent years bydesigning an efficient architecture that is implemented in ahigh-density FPGA device (such as the Altera Stratix 10 thatcontains up to 55 million logic elements) one can actuallyemulate large qubit size quantum circuit on a single FPGAchip Furthermore new approach to FPGA emulation maybe made through the exploration of efficient data structuresandmodellingmethodsThus the proposedwork contributesto the formulation of a proof-of-concept baseline FPGAemulation framework with optimization on datapath designsthat can be extended to emulate practical large-scale quantumcircuits

14 International Journal of Reconfigurable Computing

14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Number of mantissa bits

Com

bina

tiona

l ALU

Ts

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

times104

(a) Logic element usage

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

14 15 16 17 18 19 20 21 22 23 240

500

1000

1500

2000

2500

3000

3500

Number of mantissa bits

Ded

icat

ed lo

gic r

egist

ers

(b) Register usage

2-qubit QFT5-qubit QFT

14 15 16 17 18 19 20 21 22 23 240

200

400

600

800

1000

1200

Number of mantissa bits

DSP

blo

ck 1

8-bi

t ele

men

t

(c) DSP block usage

Figure 17 Resource utilization for different fixed point formats used

54 Benchmarking between Simulation and Emulation Hereour emulation models are benchmarked against the equiv-alent software simulations The simulation models used arebased on an open source quantum library 119897119894119887119902119906119886119899119905119906119898 [28]Software simulation is performed on an Intel Core i7-4790eight-core processor with 36GHz clock rate running on aLinux-based Ubuntu 1404 kernel whereas hardware emu-lation is based on the Altera Stratix IV EP4SGX530KF43C4FPGA Table 1 shows the runtime comparison betweensimulation and our emulation

Figure 20 illustrates the runtime speed-up (simula-tionemulation) of Groverrsquos search case study It is importantto note that the proposed hardware emulation is implementedbased on 24-bit fixed point format (which is determined

Table 1 Comparison of runtime between simulation and emulation

QFT runtime (s) Groverrsquos search runtime (s)Simulation Emulation Simulation Emulation

2-qubit 155 times 10minus6 358 times 10minus9 296 times 10minus6 46 times 10minus9

3-qubit 466 times 10minus6 805 times 10minus9 740 times 10minus6 120 times 10minus9

4-qubit 510 times 10minus6 1344 times 10minus9 2207 times 10minus6 214 times 10minus9

5-qubit 565 times 10minus6 2193 times 10minus9 3988 times 10minus6 367 times 10minus9

6-qubit mdash mdash 10696 times 10minus6 627 times 10minus9

7-qubit mdash mdash 27714 times 10minus6 968 times 10minus9

based on the experimental work presented in Section 52)whereas 32-bit single precision float is used in the software

International Journal of Reconfigurable Computing 15

2 25 3 35 4 45 50

2

4

6

8

10

12

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

times104

(a) Logic element usage

2 25 3 35 4 45 50

05

1

15

2

25

3

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

times104

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(b) Register usage

2 25 3 35 4 45 50

200

400

600

800

1000

1200

Number of qubits

DSP

blo

ck 1

8-bi

t ele

men

t

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(c) DSP block usage

2 25 3 35 4 45 50

50

100

150

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(d) Maximum operating frequency

Figure 18 Resource utilization and maximum operating frequency for different emulation architectures of QFT (based on 24-bit fixed pointformat)

simulation to describe the quantum state For a larger qubitsize quantum circuit the number of mantissa bits for fixedpoint representation has to be increased accordingly to ensuresimilar precision as in software simulation can be achievedwith trade-offs on resource utilization and execution speed[44]

From Figure 20 it can be observed that the proposedhardware emulation provides significant speed-up over soft-ware simulation using 119897119894119887119902119906119886119899119905119906119898 It is important to notethat the achieved speed-up increases drastically as the num-ber of qubits increasesThis result further supports the notionthat hardware emulation has significant potential in themodelling of a large-scale quantum system on the classicalcomputing platform based on FPGA

As the number of required IO pins for emulating QFTand Groverrsquos search algorithms with parallel read-outs istoo much to fit in the existing FPGA devices board-levelverification is infeasible Although the usage of multiplexersreduces the number of output pins resource consumptionrises significantly with the increase in the number of qubitsand this affects the analysis of the overall experiment Thusthe estimated runtime of the proposed FPGA emulationarchitecture is obtained based on the hardware clock cycleand the operating frequency that is acquired from the FPGAdevelopment toolThis is not a critical issue in future practicaldeployment as the selected case studies are meant to work ascore modules within other quantum applications that mightnot require parallel read-out

16 International Journal of Reconfigurable Computing

2 25 3 35 4 45 5 55 6 65 70

1

2

3

4

5

6

7

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipelineSerial-parallel

times104

(a) Logic element usage

2 25 3 35 4 45 5 55 6 65 70

05

1

15

2

25

3

35

4

5

45

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

ConcurrentPipelineSerial-parallel

times104

(b) Register usage

2 25 3 35 4 45 5 55 6 65 70

50

100

150

200

250

300

350

400

450

500

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipelineSerial-parallel

(c) Maximum operating frequency

Figure 19 Resource utilization and maximum operating frequency for different emulation architectures of Groverrsquos search (based on 24-bitfixed point format)

6 Conclusion and Future Work

Efficient resource utilization is critical for FPGA-basedimplementations especially for emulating quantum comput-ing applications as they typically exhibit exponential resourcerequirement with increasing number of qubits In this paperwe proposed a baseline FPGA emulation framework withfocus on the datapath design optimization based on theconventional state vector model as well as an effectivemethodology that facilitates accurate modelling of quantumalgorithms for FPGAemulationA serial-parallel architecture

with efficient resource utilization for FPGA-based emulationof quantum computing is presentedThe proposed emulationarchitecture achieves linear reduction in resource utilizationcompared to pipeline implementations as found in previousworks This work has also demonstrated the advantage ofFPGA emulation over software simulation where hardwareemulation of 7-qubit Groverrsquos search is about 3 times 104 timesfaster than the software simulation performed on Intel Corei7-4790 eight-core processor running at 36GHz clock rate

However experimental results obtained in this workshow that it is difficult to realize a scalable and flexible

International Journal of Reconfigurable Computing 17

2 25 3 35 4 45 5 55 6 65 705

1

15

2

25

3

Number of qubits

Runt

ime s

peed

-up

(sim

ulat

ion

emul

atio

n)

times104

Figure 20 Runtime speed-up against number of qubits in Groverrsquossearch emulation

emulation platform for large qubit size real-world quantumsystem using the approach that applies existing state vectorquantum models This concurs with [45] that states that thepractical limit on the size of the quantum system that canbe modelled on classical computing platform can hardly beovercome due to exponentially large memory requirementsfor storing the entire state vector Hence this suggests thata model with a more effective data structure to representquantum systems is required Recently the work on stabi-lizer frames [30] has shown promise in providing a moresuitable data structure for quantum models targeted forFPGA emulation This is the subject of future work in ourresearch in applying FPGA emulation in modelling of large-scale quantum systems In addition the error-correctionstructure available with stabilizer frames will be consideredfor application in practical quantum computations With amore efficient modelling technique FPGA can represent amore efficient emulation strategy of quantum systems

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the Ministry of Higher Education(MOHE) and Universiti Teknologi Malaysia (UTM) underFundamental Research Grant Scheme (FRGS) Vote number4F422

References

[1] P W Shor ldquoAlgorithms for quantum computation discretelogarithms and factoringrdquo in Proceedings of the 35th IEEEAnnual Symposium on Foundations of Computer Science(SFCS rsquo94) pp 124ndash134 Santa Fe NM USA November 1994

[2] L K Grover ldquoQuantum mechanics helps in searching for aneedle in a haystackrdquo Physical Review Letters vol 79 no 2 pp325ndash328 1997

[3] A Malossini and T Calarco ldquoQuantum genetic optimizationrdquoIEEE Transactions on Evolutionary Computation vol 12 no 2pp 231ndash241 2008

[4] R L Rivest A Shamir and L Adleman ldquoA method forobtaining digital signatures and public-key cryptosystemsrdquoCommunications of the ACM vol 21 no 2 pp 120ndash126 1978

[5] L K Grover ldquoA fast quantum mechanical algorithm fordatabase searchrdquo in Proceedings of the 28th Annual ACMSymposium onTheory of Computing pp 212ndash219 ACM 1996

[6] N Shenvi J Kempe andK BWhaley ldquoQuantum random-walksearch algorithmrdquo Physical Review A Atomic Molecular andOptical Physics vol 67 no 5 Article ID 052307 2003

[7] E Farhi J Goldstone and S Gutmann ldquoA quantum algorithmfor the Hamiltonian NAND treerdquo Theory of Computing vol 4pp 169ndash190 2008

[8] P W Shor ldquoWhy havenrsquot more quantum algorithms beenfoundrdquo Journal of the ACM vol 50 no 1 pp 87ndash90 2003

[9] D R Simon ldquoOn the power of quantum computationrdquo SIAMJournal on Computing vol 26 no 5 pp 1474ndash1483 1997

[10] S Hallgren ldquoPolynomial-time quantum algorithms for Pellrsquosequation and the principal ideal problemrdquo Journal of the ACMvol 54 article 4 2007

[11] M Mosca and A Ekert ldquoThe hidden subgroup problem andeigenvalue estimation on a quantum computerrdquo in Quan-tum Computing and Quantum Communications pp 174ndash188Springer Berlin Germany 1999

[12] D Bacon A M Childs and W Van Dam ldquoFrom optimalmeasurement to efficient quantum algorithms for the hiddensubgroup problem over semidirect product groupsrdquo in Pro-ceedings of the 46th Annual IEEE Symposium on Foundationsof Computer Science (FOCS rsquo05) pp 469ndash478 IEEE October2005

[13] L K Grover and A M Sengupta ldquoFrom coupled pendulums toquantum searchrdquo inMathematics of QuantumComputation pp119ndash134 Chapman and HallCRC 2002

[14] R P Feynman ldquoSimulating physics with computersrdquo Interna-tional Journal ofTheoretical Physics vol 21 no 6-7 pp 467ndash4881982

[15] N S Yanofsky and M A Mannucci Quantum Computingfor Computer Scientists vol 20 Cambridge University PressCambridge UK 2008

[16] C Monroe D M Meekhof B E King W M Itano and DJ Wineland ldquoDemonstration of a fundamental quantum logicgaterdquo Physical Review Letters vol 75 no 25 pp 4714ndash4717 1995

[17] N A Gershenfeld and I L Chuang ldquoBulk spin-resonancequantum computationrdquo Science vol 275 no 5298 pp 350ndash3561997

[18] J E Mooij T P Orlando L Levitov L Tian C H Van derWaland S Lloyd ldquoJosephson persistent-current qubitrdquo Science vol285 no 5430 pp 1036ndash1039 1999

[19] I L Chuang N Gershenfeld and M Kubinec ldquoExperimentalimplementation of fast quantum searchingrdquo Physical ReviewLetters vol 80 no 15 article 3408 1998

[20] R Barends J Kelly A Megrant et al ldquoSuperconducting quan-tum circuits at the surface code threshold for fault tolerancerdquoNature vol 508 no 7497 pp 500ndash503 2014

[21] M H Amin N G Dickson and P Smith ldquoAdiabatic quantumoptimization with quditsrdquo Quantum Information Processingvol 12 no 4 pp 1819ndash1829 2013

18 International Journal of Reconfigurable Computing

[22] A D King E Hoskinson T Lanting E Andriyash and M HAmin ldquoDegeneracy degree and heavy tails in quantumanneal-ingrdquo httparxivorgabs151207325

[23] T F Roslashnnow Z Wang J Job et al ldquoDefining and detectingquantum speeduprdquo Science vol 345 no 6195 pp 420ndash4242014

[24] Y Goto and M Fujishima ldquoEfficient quantum computingemulation system with unitary macro-operationsrdquo JapaneseJournal of Applied Physics vol 46 no 4 pp 2278ndash2282 2007

[25] A U Khalid Z Zilic and K Radecka ldquoFPGA emulation ofquantum circuitsrdquo in Proceedings of the IEEE International Con-ference on Computer Design VLSI in Computers and Processors(ICCD rsquo04) pp 310ndash315 IEEE October 2004

[26] J F Rivera-Miranda A J Caicedo-Beltran J D Valencia-Payan J M Espinosa-Duran and J Velasco-Medina ldquoHard-ware emulation of quantum Fourier transformrdquo in Proceedingsof the IEEE 2nd Latin American Symposium on Circuits and Sys-tems (LASCAS rsquo11) pp 1ndash4 IEEE Bogata Colombia February2011

[27] M Aminian M Saeedi M S Zamani andM Sedighi ldquoFPGA-based circuit model emulation of quantum algorithmsrdquo inProceedings of the IEEE Computer Society Annual Symposiumon VLSI (ISVLSI rsquo08) pp 399ndash404 IEEE Montpellier FranceApril 2008

[28] H Weimer M Muller I Lesanovsky P Zoller and H PBuchler ldquoA Rydberg quantum simulatorrdquoNature Physics vol 6no 5 pp 382ndash388 2010

[29] G F Viamontes Efficient quantum circuit simulation [PhDthesis] The University of Michigan Ann Arbor Mich USA2007

[30] H J Garcıa and I L Markov ldquoSimulation of quantum circuitsvia stabilizer framesrdquo IEEE Transactions on Computers vol 64no 8 pp 2323ndash2336 2015

[31] C P Williams and S H Clearwater Explorations in QuantumComputing Springer Berlin Germany 1998

[32] A Barenco D Deutsch A Ekert and R Jozsa ldquoConditionalquantumdynamics and logic gatesrdquo Physical Review Letters vol74 no 20 pp 4083ndash4086 1995

[33] M A Nielsen and I L Chuang Quantum Computation andQuantum Information Cambridge University Press 2000

[34] W-W Zhang F Gao B Liu Q-Y Wen and H Chen ldquoAwatermark strategy for quantum images based on quantumFourier transformrdquo Quantum Information Processing vol 12no 2 pp 793ndash803 2013

[35] D Curtis and D A Meyer ldquoTowards quantum templatematchingrdquo in Proceedings of the SPIE 48th Annual Meeting inQuantum Communications and Quantum Imaging pp 134ndash141International Society for Optics and Photonics August 2003

[36] R Schutzhold and G Schaller ldquoAdiabatic quantum algorithmsas quantum phase transitions first versus second orderrdquo Physi-cal Review A vol 74 no 6 Article ID 060304 2006

[37] W P Baritompa D W Bulger and G R Wood ldquoGroverrsquosquantum algorithm applied to global optimizationrdquo SIAMJournal on Optimization vol 15 no 4 pp 1170ndash1184 2005

[38] C Durr and P Hoyer ldquoA quantum algorithm for finding theminimumrdquo httparxivorgabsquant-ph9607014

[39] A Montanaro ldquoQuantum pattern matching fast on averagerdquoAlgorithmica 2015

[40] Y H Lee M Khalil-Hani and M N Marsono ldquoFPGA-basedquantum circuit emulation a case study on Quantum Fouriertransformrdquo in Proceedings of the 14th International Symposium

on Integrated Circuits (ISIC rsquo14) pp 512ndash515 IEEE SingaporeDecember 2014

[41] M Khalil-Hani Y H Lee and M N Marsono ldquoAn accu-rate FPGA-based hardware emulation on quantum fouriertransformrdquo in Proceedings of the Australasian Symposium onParallel andDistributed Computing (AusPDC rsquo15) vol 1 SydneyAustralia 2015 a1b3

[42] M Frigo and S G Johnson ldquoThe design and implementationof FFTW3rdquo Proceedings of the IEEE vol 93 no 2 pp 216ndash2312005

[43] D Wecker and K M Svore ldquoLIQUi|gt a software design archi-tecture and domain-specific language for quantum computingrdquohttparxivorgabs14024467

[44] S Kilts Advanced FPGA Design Architecture Implementationand Optimization John Wiley amp Sons New York NY USA2007

[45] M Smelyanskiy N P D Sawaya and A Aspuru-GuzikldquoqHiPSTER the quantum high performance software testingenvironmentrdquo httparxivorgabs160107195

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 4: Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

4 International Journal of Reconfigurable Computing

The following example illustrates the application ofHadamard gates in mapping a 2-qubit basis state |00⟩ to asuperposition of basis states with equal probability

|00⟩

119867otimes119867

997888997888997888997888rarr

1

2

(|00⟩ + |01⟩ + |10⟩ + |11⟩)

(

1

radic2

[

1 1

1 minus1

] otimes

1

radic2

[

1 1

1 minus1

])

[

[

[

[

[

[

1

0

0

0

]

]

]

]

]

]

=

1

2

[

[

[

[

[

[

1

1

1

1

]

]

]

]

]

]

(5)

Controlled phase-shift gate 119862119877119896operates on 2 qubits one

ofwhich is the control qubit and the other is the target qubit Ifthe control qubit is true a phase-shift operation is performedon the target qubit otherwise there is no operation Theoperation is represented by the following matrix

119862119877119896=

[

[

[

[

[

[

[

1 0 0 0

0 1 0 0

0 0 1 0

0 0 0 11989021205871198942

119896

]

]

]

]

]

]

]

(6)

Quantum SWAP gate is used for swapping two qubitsIt switches the amplitudes of a quantum state vector Theoperation of a 2-qubit SWAP gate is represented by matrixin

SWAP =[

[

[

[

[

[

1 0 0 0

0 0 1 0

0 1 0 0

0 0 0 1

]

]

]

]

]

]

(7)

34 Quantum Fourier Transform (QFT) The Fourier trans-form is deployed in wide range of engineering and physicsapplications such as signal processing image processingand quantum mechanics It is a reversible transformationthat converts signals from timespatial domain to frequencydomain and vice versa The Fourier transform is defined in(8) for continuous signals and in (9) for discrete signals

119883(119891) = int

infin

minusinfin

119909 (119905) 119890minus1198942120587119891119905

119889119905 (8)

119883119896=

119873minus1

sum

119899=0

119909119899119890minus1198942120587(119896119899119873)

(9)

The quantum Fourier transform (QFT) is a transfor-mation on qubits and is the quantum equivalent of thediscrete Fourier transform It should be noted that a quantumcomputer performs QFT with exponentially less number ofoperations than the classical Fourier transform HoweverQFT does not reduce the execution time of the algorithmwhen classical data is used This is due to the characteristicof the quantum computer that does not allow parallel read-out of all quantum state amplitudes In addition there is no

known method that can effectively instantiate the desiredinput state amplitudes to be Fourier-transformed [33]

In order to harness the power of quantum computingon Fourier transform QFT has to be deployed within otherpractical applications QFT is pivotal in quantum computingsince it is part ofmany quantumalgorithmsThese algorithmsinclude integer factorization and discrete logarithms algo-rithms [1] Simonrsquos periodicity algorithm [9] and Hallgrenrsquosalgorithms [10] They offer significant speed-up over theirclassical counterparts QFT has also found applications inmany real-world problems such as image watermarking [34]and template matching [35]

To compute Fourier transform in quantum domain dis-crete signal samples are encoded as the amplitude sequencesof a quantum state vector which is in superposition of basisstates [31] An 119899-qubit QFT operation which transformsan arbitrary superposition of computational basis states isexpressed in

1003816100381610038161003816120595⟩ =

1

radic2119899

2119899minus1

sum

119895=0

119891 (119895Δ119905)1003816100381610038161003816119895⟩

QFT997888997888997888rarr

1

radic2119899

2119899minus1

sum

119896=0

2119899minus1

sum

119895=0

119891 (119895Δ119905) 1198902120587119894(1198951198962

119899)|119896⟩

(10)

As the requirement for a valid quantum state |120595⟩must benormalized such that it fulfils (11) If the original signal inputsdo not comply with this requirement the amplitudes of thesignal samples have to be divided by the normalization factorradicsum2119899minus1

119897=0|119891(119897Δ119905)|

2 In most cases the input states formed bythe normalized signal samples are entangled

2119899minus1

sum

119895=0

1003816100381610038161003816119891 (119895Δ119905)

1003816100381610038161003816

2= 1 (11)

From (10) it can be observed that the term 1198952119899 in QFTequation is a rational number in the range of 0 le 1198952119899 lt 1 Asqubit representation is typically used in computations the 119895in base-10 integer is redefined in base-2 notation as individualbit such that the binary fraction form as expressed in (12) canbe conveniently adopted

(119895)10equiv (11989511198952sdot sdot sdot 119895119899)2

= (2119899minus11198951+ 2119899minus21198952+ sdot sdot sdot + 2

0119895119899)10

= 2119899(2minus11198951+ 2minus21198952+ sdot sdot sdot + 2

minus119899119895119899)10

= 2119899(0 sdot 11989511198952sdot sdot sdot 119895119899)2

(12)

International Journal of Reconfigurable Computing 5

|k1⟩

|k2⟩

|knminus1⟩

|kn⟩H

H

H

H

R2

R2 Rnmiddot middot middot

middot middot middot Rnminus2 Rnminus1

1

radic2(|0⟩ + e2120587i0middotj1j2 middotmiddotmiddotj119899|1⟩)

1

radic2(|0⟩ + e2120587i0middotj2 middotmiddotmiddotj119899|1⟩)

1

radic2(|0⟩ + e2120587i0middotj119899minus1j119899|1⟩)

|j1⟩

|j2⟩

|jnminus1⟩

|jn⟩

1

radic2(|0⟩ + e2120587i0middotj119899|1⟩)

Figure 1 Quantum circuit model for n-qubit QFT

With some algebraic manipulations the QFT equationcan be derived from (13) to form (14) [33]

QFT2119899

1003816100381610038161003816119895⟩ =

1

radic2119899

2119899minus1

sum

119896=0

1198902120587119894(1198951198962

119899)|119896⟩ (13)

=

1

radic2119899(|0⟩ + 119890

21205871198940sdot119895119899|1⟩)

sdot (|0⟩ + 11989021205871198940sdot119895119899minus1119895119899

|1⟩) sdot sdot sdot (|0⟩ + 11989021205871198940sdot11989511198952 sdotsdotsdot119895119899

|1⟩)

(14)

Since the term 11989021205871198940sdot1198951 produces either minus1 if 1198951= 1 or +1

otherwise Hadamard computation on the first qubit resultsin (1radic2)(|0⟩ + 11989021205871198940sdot1198951 |1⟩) Computations of the consecutivebits in the binary fraction are obtained using controlledphase-shift gates according to (14) QFT circuit consists ofthree types of elementary gates which are Hadamard gate119867controlled phase-shift gate 119862119877

119896 and SWAP gate The circuit

model of an 119899-qubit QFT is depicted in Figure 1The size of a QFT circuit grows exponentially as the

number of input qubits increases An 119899-qubit QFT involvessum119899

119896=1119896+1 unitary transformations and could process up to 2119899

input samples in one evaluation (provided the input samplesare encoded as the amplitude sequences of a superposition ofcomputational basis states)

35 Groverrsquos Search Algorithm In computer science area atypical search problem is to identify the desired element froman unordered array For many computing applications it iscritical that the search technique is efficient In terms of afunction the search problem 119891 0 1119899 rarr 0 1 is givenwith assurance that there exists one binary string 119909

0where

119891(119909) = 1 if 119909 = 1199090 else 119891(119909) = 0

In classical computing 1198982 queries on average arerequired to search for a particular element in an unorderedarray with 119898 elements In quantum computing Groverrsquossearch algorithm can complete the job in (Π4)radic119898 queries(for the rest of the text and figures in Section 35 the requiredGrover iterations are abbreviated as radic2119899 times) Althoughthe speed-up achieved is only quadratic Groverrsquos algorithmand its extensions are extremely useful in enhancing currentmethods in solving database searching and optimizationproblems which include 3-satisfiability [36] global opti-mization [37] minimum point searching [38] and patternmatching [39] The core operations of Groverrsquos algorithm arephase inversion and inversion about mean Phase inversion

|x⟩

|1⟩

|1205930⟩ |1205931⟩ |1205932⟩

Uf

nn

H

Figure 2 Quantum circuit for phase inversion [15]

inverts the phase of the state-of-interest and its quantumcircuit model is given in Figure 2

In Figure 2 the top 119899-qubit |x⟩ is the target qubit andthe bottom qubit is called the ancilla qubit The functionof 119880119891is to pick out the desired binary string To apply

phase inversion on target qubits Hadamard gate operationis performed on the ancilla qubit which is initialized as |1⟩This is to complement the effect of 119880

119891which takes |x 119910⟩ to

|x 119891(x) oplus 119910⟩In terms of matrices the phase inversion operation can

be expressed as 119880119891(119868119899oplus 119867)|x 1⟩ and the corresponding

quantum states are described as follows10038161003816100381610038161205930⟩ = |x 1⟩

10038161003816100381610038161205931⟩ = |x⟩ [ |0⟩ minus |1⟩

radic2

] = [

|x 0⟩ minus |x 1⟩radic2

]

10038161003816100381610038161205932⟩ = |x⟩ [

1003816100381610038161003816119891 (x) oplus 0⟩ minus 100381610038161003816

1003816119891 (x) oplus 1⟩

radic2

]

= |x⟩ [1003816100381610038161003816119891 (x)⟩ minus 1003816100381610038161003816

1003816119891 (x)⟩

radic2

]

= (minus1)119891(x)|x⟩ [ |0⟩ minus |1⟩

radic2

]

=

minus1 |x⟩ [ |0⟩ minus |1⟩radic2

] if x = x0

+1 |x⟩ [ |0⟩ minus |1⟩radic2

] if x = x0

(15)

Inversion about mean boosts the phase separationbetween the element-of-interest and other elements in theunordered arrays (after phase inversion operation is appliedto invert the phase of the target element) The mean of

6 International Journal of Reconfigurable Computing

(1) Start with |0⟩ as the target input qubits(2) Apply 119899-qubit Hadamard gate on target qubits119867otimes119899

(3) forradic2119899 times do(4) Apply phase inversion operation 119880

119891(119868 otimes 119867)

(5) Apply inversion about mean operation on the target qubits minus119868 + 2119860(6) end for(7) Measure the target qubits

Algorithm 1 Groverrsquos search algorithm

all elements is computed and inversions are made aboutthe mean The overall mean remains unchanged after theinversion process This is because the distance between oneelement and the mean is the same before and after inversionThe only change is if the original sequence is above themean during the inversion it is flipped about the mean to thesame distance below the mean and vice versa In general theinversion about mean operation can be expressed as

V1015840 = minusV + 2119886 (16)

where 119886 is the mean V is the value of an element in the arrayand V1015840 is the new value of that element after inversion

In terms of matrices the mean of a 2119899 elements vector119881 is obtained by the product of matrices 119860 and 119881 where allthe elements in the 2119899-by-2119899 matrix 119860 are set to 12119899 Henceinversion about mean in matrix form becomes

1198811015840= minus119881 + 2119860119881 = (minus119868 + 2119860)119881 (17)

In order to achieve high confidence of getting the desiredelement the amplitude amplification process (amplitudeamplification in Groverrsquos search algorithm involves phaseinversion and inversion about mean operations) has to berepeated for radic2119899 times This is because the probability ofsuccess changes sinusoidally by the number of amplitudeamplification iterations (as illustrated in Figure 3) andthe highest probability of success first happened after therequired iterations Pseudocode for a generic Groverrsquos searchalgorithm is given in Algorithm 1

There are two approaches to model Groverrsquos searchquantum algorithm The first approach which is based onquantum circuitmodel is discussed nextThe secondmethodis modelling using arithmetic functions and this is presentedin Section 42 since this approach is applied in this paper

As shown in Figure 4 Groverrsquos search circuit given in [15]is constructed with assumption that black box modules 119880

119891

and minus119868+2119860 are available Descriptions of the119880119891and minus119868+2119860

modules have been given earlierOn the other hand the circuit model for 119899-qubit Groverrsquos

algorithm presented in [33] is shown in Figure 5(a) In thisfigure119867 is the Hadamard gate and 119866 is the Grover iterationcircuit which is illustrated in Figure 5(b)

The function of Grover iteration circuit is equivalentto the phase inversion and inversion about mean In orderto achieve high probability of successful search 119866 is con-catenated for radic2119899 times In Figure 5(b) the role of oraclemodule is to recognize the solution to a particular search

0 20 40 60 80 100 120 140 160 180 2000

010203040506070809

1

Number of iterations

Mea

sure

men

t pro

babi

lity

Element-of-interestOther elements

Figure 3 Probability of success by the number of amplitude amplifi-cation iterations amongst 210 probabilities For 10-qubit search firsthighest probability of success happens at 25th iteration

|0⟩

|1⟩

minusI + 2A

Phase inversionInversion

Hotimesn

Ufn n n n

H

about mean

Repeat radic2n times

Figure 4 Modelling of Groverrsquos search based on quantum circuitmodel [15]

problem in the phase inversion operation By monitoringthe oracle qubit a solution to the search problem can bedetected through the changes of the oracle qubit The designof oraclemodule varieswith different search applications andan example of the oracle circuit for a simple 3-bit search task isshown in Figure 6 In Figure 6 the119883 symbol represents Pauli-119883matrixThe open circle notation indicates conditioning onthe qubit being set to zero whereas the closed circle indicatesconditioning on the qubit being set to one

Deriving from the circuits in Figure 5 the corresponding3-qubit Groverrsquos search circuit is provided in Figure 7 Thiscircuit model is made up of Hadamard oracle quantumNOT and multiqubit controlled-NOT gates

International Journal of Reconfigurable Computing 7

|0⟩

|1⟩

Target qubits

Oracle qubit

Hotimesn Hotimesnn n

n

G G

H H

middot middot middot

O(radic2n)

(a) 119899-qubit Groverrsquos search

Oracle workspace

Phase

Oraclen-qubits Hotimesn Hotimesn

For x gt 0|x⟩ 997888rarr (minus1)f(x)|x⟩

|0⟩ 997888rarr |0⟩

|x⟩ 997888rarr minus|x⟩

(b) Grover iteration circuit 119866

Figure 5 Quantum circuit model for Groverrsquos search algorithm [33]

X

Xequiv

Figure 6 Oracle circuit for recognizing binary string ldquo010rdquo

4 Proposed FPGA-Based Hardware Emulation

This section presents our approach in modelling quantumFourier transform and Groverrsquos search algorithms for FPGAemulation The proposed techniques can be generalized toFPGA emulation of more complex quantum algorithms thatapply QFT or Groverrsquos algorithm This paper extends ourearlierwork presented in [40 41] In these previousworks thehardware architecture proposed was restricted to the serialdesign with resource sharing facilitated at the register levelIn this paper we enable resource sharing at the operator (orcomputational) level that allows for more efficient emulationof the quantum algorithms Furthermore additional casestudy on Groverrsquos search algorithm is included for general-ization of the proposed framework The choice of hardwarearchitecture varies based on the need of different applicationsBased on the selected case studies the efficiencies of differenthardware architectures for quantum computing emulationpurposes are discussed and analysed in this work The goalis to achieve scalability and also efficient resource utilizationfor emulating practical larger qubit size quantum systems

41 Modelling QFT for FPGA Emulation The derivation ofquantum circuit model for 119899-qubit QFT was discussed earlieris Section 34 Here we present the modelling of QFT forFPGA emulation based on a 3-qubit example According to(14) the mathematical expression for 3-qubit QFT is derivedas shown in

QFT23

1003816100381610038161003816119895⟩ =

1

radic23(|0⟩ + 119890

21205871198940sdot1198953|1⟩)

sdot (|0⟩ + 11989021205871198940sdot11989521198953

|1⟩)

sdot (|0⟩ + 11989021205871198940sdot119895111989521198953

|1⟩)

(18)

Deriving from the general 119899-qubit QFT circuit providedin Figure 1 the corresponding quantum circuit for 3-qubitQFT is obtained as shown in Figure 8

The circuit consists of Hadamard gates 119867 controlledphase-shift gates 119862119877

2and 119862119877

3 and also the SWAP gate

Referring to the functional block diagram given in Figure 9this quantum circuit model corresponds to a sequence ofunitary transformations 119880119894 119894 = 1 to 7 defined by

1198801 = 119867 otimes 119868 otimes 119868 (19)

1198802 =1198621198772otimes 119868 (20)

1198803 = (119868 otimes SWAP) sdot (1198621198773otimes 119868) sdot (119868 otimes SWAP) (21)

1198804 = 119868 otimes 119867 otimes 119868 (22)

1198805 = 119868 otimes1198621198772 (23)

1198806 = 119868 otimes 119868 otimes 119867 (24)

1198807 = SWAP23 (25)

Note that in Figure 9 the modelling of 119899-qubit quan-tum system with superposition and entanglement propertiesresulted in a circuit with 2119899 signals Since the input samplesto QFT circuit are encoded as sequence of amplitudes inan entangled superposition of basis states (discussed in Sec-tion 34) modelling based on individual qubit with separatequantum gate operations is unable to reflect the effects ofapplying a quantum gate on entangled qubits correctly

In order to model the effect of superposition and entan-glement derivation of each unitary transformation is madethrough the tensor product of individual quantum gate andidentity matrix to form unitary matrix of equal dimensionwith the quantum state vector Detailed derivations of thequantum unitary matrices for the 3-qubit QFT have beenpresented in our previous paper [41]

Since these quantum unitary matrices are mostly sparsematrices we extract minimal number of useful arithmeticoperations (due to nonzero elements in the matrices) result-ing in an optimal realization of themodel that can bemappedto an efficient FPGA emulation architecture Incidentally asoftware program has been developed to automate this map-ping hence easily scaling up the circuit model to larger qubitsizes From these arithmetic functions the correspondingdata-flow graph for the 3-qubit QFT is derived as shown inFigure 10

In Figure 10 each IN and OUT signal is a fixed pointcomplex number register for an element of the quantumstate vector The operation of module 119865 corresponds toout 119903 = minusin 119894 and out 119894 = in 119903 where 119865 is the multiplication

8 International Journal of Reconfigurable Computing

|0⟩

|0⟩

|0⟩

|1⟩

Target qubits

Grover iteration circuit GGrover iteration circuit G

Oracle qubit

Oracle Oracle

H H

H H

H

H

H

H

H

H

H

H

H

H

H

H

H

H

H HH HX

X

X

X

X

X

X

X

X

X

X

X

H

H

Figure 7 Modelling of 3-qubit Groverrsquos search based on quantum gates

H

H

H

R2

R2

R3

U1 U2 U3 U4 U5 U6 U7

|k1⟩

|k2⟩

|j1⟩

|j2⟩

|k3⟩|j3⟩

1

radic2(|0⟩ + e2120587i0middotj1j2j3|1⟩)

1

radic2(|0⟩ + e2120587i0middotj2j3|1⟩)

1

radic2(|0⟩ + e2120587i0middotj3|1⟩)

Figure 8 Quantum circuit for 3-qubit QFT

IN0

IN7U1 U2 U3 U4 U5 U6 U7

8 8

OUT0

OUT7

Figure 9 The 3-qubit QFT in terms of a sequence of unitarytransformations

of the input complex number with imaginary number 119894This is applied in unitary transformations 1198802 and 1198805 Theoperation ofmoduleCMULT in Figure 10 is described in (26)The function of CMULT is the multiplication of the inputcomplex number with a constant complex number which isderived based on the controlled phase-shift gate As expressedpreviously in (21) it is used in unitary transformation 1198803

out 119903 = (in 119903 times constant 119903) minus (in 119894 times constant 119894)

out 119894 = (in 119894 times constant 119903) + (in 119903 times constant 119894) (26)

42 Modelling Groverrsquos Search for FPGA Emulation As men-tioned earlier the second approach of modelling Groverrsquosquantum search is modelling using arithmetic functions(based on mathematical model) In this work we havechosen this approach of modelling Groverrsquos algorithm forFPGA emulation In contrast to the quantum circuit modelapproach that involves complex large dimensional matrixoperations (ie matrix multiplication and tensor product)the chosen method can utilize the computational resourcesavailable on FPGA such as comparators adderssubtractersand multiplexers for efficient emulation of Groverrsquos searchalgorithm

This technique is based on phase inversion and inver-sion about mean as described in Section 35 As shown inFigure 11 mathematically Groverrsquos search for a databasewith 2119899 elements mainly involves the processes of invertingthe phase of target element function 119865 and performing

inversion about mean on all elements function minus119868 + 2119860 forradic2119899 times Initialization of the quantum state vector with

equal probability function INIT is carried out once in thebeginning of the process

For the case study of a 3-bit search problem we derivethe data-flow graphs and obtain the result of the requiredarithmetic functions as shown in Figure 12 For experimentalpurposes the oracle module is developed by comparingall elements in database with targeted element using com-parators COMP modules The arithmetic functions of theinversion about mean are derived through straightforwardcomputations that involve summation bit shift and subtrac-tion

43 Architecture of Proposed FPGA Emulation Model It isclear that if implemented on classical computing platformsthe resource utilization for a quantum system would growexponentially Hence the choice of suitable architectureis critical for FPGA-based quantum computing emulationHere we discuss the efficiencies of different architecturalchoices in datapath concurrent pipeline serial and serial-parallel The block diagram of various architectures is con-structed based on the example of 3-qubit QFT

431 Concurrent Processing In concurrent processing of analgorithm all computations are completed within a clockcycle In the case of 3-qubitQFT computation blocks betweenthe input and output registers through the functional blocks1198801 to 1198807 are performed in one clock cycle (refer to Fig-ure 13(a)) However such an architecture consumes enor-mous resources such that the number of registers required toemulate an n-qubit QFT is 2119899+1 In addition the critical pathdelay is very high which results in unrealistic low operatingfrequency

432 Pipeline Architecture Most of the prior works onFPGA-based quantum circuit emulation [25ndash27] are devel-oped based on pipeline architectureThepipeline architecturehas the advantages of high throughput and much shortercritical path delay Figure 13(b) shows the proposed pipelinearchitecture of the 3-qubit QFT However the main issue ofthis approach is that resource utilization grows drasticallyby the number of qubits due to the circuit augmentation ofpipeline registers 2119899(sum119899

119896=1119896+2)pipeline registers are required

to emulate n-qubit QFT Consequently hardware emulationscalability is highly constrained by the available resources inFPGA

International Journal of Reconfigurable Computing 9

IN0

IN0 IN4 IN1 IN5 IN2 IN6 IN3 IN7 IN0 IN4 IN1 IN5 IN2 IN6 IN3 IN7

IN1 IN2 IN3 IN4 IN5 IN6 IN7

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

U7

U6

U5

U4

U3

U2F

F F

F

U1+ + + + minus minus minus minus

+

++ + +minus

minus + minus + minus + minus

minus minus minus

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5

out2_5

out2_6 out2_7

out2_7

CMULTCMULT

out3_0

out3_0 out3_0

out3_1

out3_1 out3_1

out3_2

out3_2 out3_2

out3_3

out3_3 out3_3

out3_4

out3_4 out3_4

out3_5

out3_5 out3_5

out3_6

out3_6 out3_6

out3_7

out3_7 out3_7

out4_0 out4_1 out4_2 out4_3 out4_4 out4_5 out4_6 out4_7

out5_0

out5_0

out5_1

out5_1 out5_0 out5_1

out5_2

out5_2

out5_3

out5_3 out5_2 out5_3

out5_4

out5_4

out5_5

out5_5 out5_4 out5_5

out5_6

out5_6 out5_6

out5_7

out5_7out5_7

out6_0 out6_1

out6_1

out6_2 out6_3

out6_3

out6_4

out6_4

out6_5 out6_6

out6_6

out6_7

1

radic2(1 + i)

1

radic2(1 + i)

times times times times times times times times

times times times times times times times times

times times times times times times times times

Figure 10 Data-flow graph for 3-qubit QFT The multipliers shown in this diagram represent the multiplication of input complex numberwith constant 1radic2

433 Serial Processing Although serial design requires mul-tiple iterations to perform a complex computation it opensup the opportunity for resource sharing Serial processingis suitable for applications where resource utilization is acritical design constraint Figure 13(c) depicts the serial formof the 3-qubit QFT circuit that consists of a control unitand a datapath unit As resources can be reused betweentransformations a serial-based n-qubit QFT consumes 2119899registers a register utilization that is much lower than inconcurrent or pipeline architectures However pure serial

approach forfeits the purpose of conducting FPGA emulationwhose aim is to exploit the parallelism inherent in a quantumsystem as it would still suffer from slow sequential behaviouras exhibited in simulation on classical computer

434 Proposed Serial-Parallel Architecture In this paperwe propose a hybrid serial-parallel architecture for FPGAemulation of quantum algorithms The proposed approachtakes advantage of both serial and parallel design techniques

10 International Journal of Reconfigurable Computing

Repeat radic2n times

2n 2n 2n

Phase inversion

INIT

Inversionabout mean

minusI + 2AF

Figure 11 Mathematical modelling of Groverrsquos search algorithm

Applying the concepts of quantum parallelism and quantumdynamicsmodelled by sequential transformations on a quan-tum state vector it is found that the proposed serial-parallelarchitecture is suitable for efficient and accurate quantumcomputing emulation on FPGA platform

Figure 14 shows the functional block diagram of theproposed serial-parallel FPGA emulation architecture of the3-qubit QFT The serial-parallel design of the datapath unitinvolves a number of quantum computation units that canperform parallel computations for each stage of unitarytransformation whereby the same computational resourcescan be reused for the following stage of transformations

For data storage and synchronization purposes 2119899 reg-isters are shared between unitary transformations As com-pared to the pipeline design our proposed serial-parallelapproach achieves linear reduction on the usage of registersto emulate the same quantum system The arithmetic logicunit (ALU) in the datapath unit contains multiple customprocessing elements and the allocation of resources in ALUvaries based on the target application The number of pro-cessing elements is basically determined based on the desired2119899 parallelism in the n-qubit quantum system

As illustrated in Figures 10 and 12 the data-flow graph ofQFT and Groverrsquos search algorithm exhibit similar repetitivepattern between unitary transformations This implies thatthe proposed serial-parallel approach can be generalized forthe two case studies to achieve balance in both resourceutilization and speed performance For the case of Groverrsquossearch the ALUwould be theGrover iterationmodule shownin Figure 12 whereas the control unit is designed to keep trackon the number of Grover iterations required by the targetsearch problem

5 Experimental Results

The proposed emulation designs are modelled in SystemVer-ilog HDL synthesized using Altera Quartus II softwareand implemented into target emulation platform which isbased on Altera Stratix IV EP4SGX530KF43C4 FPGA InSection 51 we discuss the verification of the hardwareemulation designs for the QFT and Groverrsquos search casestudies In addition the automated process for scaling upthe design to larger qubit size is described In Section 52investigation is conducted to study the effects of the numberof mantissa bits used in our fixed point representation formaton resource utilization and precision error In the sectionthat follows we analyse for different emulation architectureshow the increase in qubit size impacts on resource growthand maximum operating frequency allowed in the designs

Finally the runtime speeds in simulation and emulation forquantum algorithms with different qubit sizes are compared

51 Design Verification To show that the quantum algo-rithms have been modelled accurately design verificationof the proposed emulation hardware is performed Goldenreferences based on the software simulationmodels (in C) aredeveloped and their outputs are comparedwith the emulationhardware under test (which are described in SystemVerilogHDL)

In our QFT case study FFTW3 [42] a widely applied fastFourier transform library in C is used to perform Fouriertransform computations on the signal samples that are usedin this work The outputs of the classical Fourier transformthen serve as the golden referencemodel in verification of theproposed emulation model Furthermore since the discreteFourier transform (DFT) is a linear transformation that canbe defined in unitary matrix form the functional correctnessof our QFT hardware emulation model can be convenientlyverified against the DFTmatrixThe expression of an 119899-qubitDFT matrix is shown in

DFT = 1

radic2119899

[

[

[

[

[

[

[

[

[

[

[

1 1 1 1

1 1205961

1205962

1205962119899minus1

1 1205962

1205964

1205962(2119899minus1)

1 1205962119899minus11205962(2119899minus1) 120596

(2119899minus1)(2

119899minus1)

]

]

]

]

]

]

]

]

]

]

]

(27)

where 120596 is the 2119899th root of unity that is 120596 = 11989021205871198942119899

Thechoice of 11989021205871198942

119899

or 119890minus21205871198942119899

is purely a matter of convention asboth the term 11989021205871198942

119899

and the term 119890minus21205871198942119899

to the power of 2119899are equal to 1

On the other hand the design of Groverrsquos search FPGAemulation model is verified against the mathematical modelprovided in literature The simulation model of Groverrsquossearch algorithm also serves as the golden reference modelfor verification purposes

For the development of FPGA emulation models forpractical quantum computing applications it is importantthat the emulation hardware can be scaled up to larger qubitsize architectures In this work the designs are scaled up withthe aid of software program developed in-house HDL codesof the two case studies are autogenerated by the softwareprogram based on the proposed modelling techniques (asdiscussed in Section 4) The generated HDL code producesefficient hardware emulation model based on the proposedserial-parallel architecture

52 Fixed Point Representation As defined in (2) a quantumstate vector is represented by complex floating point numbersTo ensure effective resource utilization in our FPGA emula-tion hardware floating point numbers are replaced by fixedpoint representations In this work a fixed point format with1 sign bit 1 integer bit and 119873 mantissa bits (as shown inFigure 15) is used Since the amplitudes of a quantum statethat is the probabilities of collapsing into computational basis

International Journal of Reconfigurable Computing 11

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

minusIN0

IN0

IN0 minusIN1IN1 minusIN2IN2 minusIN3IN3 minusIN4IN4 minusIN5IN5 minusIN6IN6 minusIN7IN7

Target

Target Target Target Target Target Target Target Target

Target Target Target Target Target Target Target

COMP

COMPCOMPCOMPCOMPCOMPCOMPCOMPCOMP

COMPCOMPCOMPCOMPCOMPCOMPCOMP

IN1 IN2 IN3 IN4 IN5 IN6 IN7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

+ + + +

++

+

+ +

+

+ + +

+

out1_0 minusout1_0 out1_1 minusout1_1 out1_2 minusout1_2 out1_3 minusout1_3 out1_4 minusout1_4 out1_5 minusout1_5 out1_6 minusout1_6 out1_7 minusout1_7

minus

minus minus minus minus minus minus minus minus

minus minus minus minus minus minus minus

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

Groveriteration 2

Grover

INIT

iteration 1

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

≫(n minus 1)

≫(n minus 1)

Figure 12 Data-flow graph of Groverrsquos circuit for 3-bit search

U1 U2 U3 U4 U5 U6 U78 8

IN0

IN7

OUT0

OUT7

(a) Concurrent processing

U4U1 U2 U3 U5 U6 U788

IN0

IN7

OUT0

OUT7

(b) Pipeline architecture

88

Control unit

Datapath unit

Shared

Arithmeticlogic unit

(ALU)

registersIN0

IN7

OUT0

OUT7

(c) Serial architecture

Figure 13 Three-qubit QFT implemented using different hardware architectures

12 International Journal of Reconfigurable Computing

CU

ALU

DU

alu3alu2alu1alu0

alu4 alu5 alu6 alu7

CMULTCMULT

reg0reg4

IN0

Shared registers

IN1alu0

alu1alu4

IN2alu2alu4alu1IN3

IN4

alu3alu5

alu4alu2

reg3reg6

reg3

reg1

reg6

reg7

IN7

alu5

IN6

IN5

alu7

alu3

alu6

alu3

alu6

reg4 reg4reg2 reg2 reg2reg1 reg1 reg3 reg3reg5

reg5

reg5 reg5reg6 reg6 reg7

reg7

Stateregisters

Next-stateStart

Control signals

logic

Outputlogic

+

F

F

F

+ + +

minus minus minus minus

1

radic2(1 + i)

1

radic2(1 + i)

alu_comp0 alu_comp1

reg7

reg6

reg5

reg3

reg1

reg2

reg4

reg0

alu_comp

alu_comp0

1

|k1⟩|k2⟩

|j1⟩|j2⟩

|k3⟩|j3⟩

times times times times

times times times times

Figure 14 Proposed serial-parallel architecture for 3-qubit QFTThe multipliers shown in this diagram represent the multiplication of inputcomplex number with constant 1radic2

1 sign bit 1 decimal bit N mantissa bits

Figure 15 Fixed point representation format

states after measurement are constrained in the range of 0 to1 only one bit is required to represent the integer part

Due to the limitations of the classical digital computingplatform representation of qubit amplitudes with infiniteprecision is infeasible In the context of quantum computermodelling particularly for quantum systems with large qubitsizes minimising precision loss is critical to preserve theconsistency of the quantum state during the modellingprocess [43] Here we investigate how precision error is

affected by the number of mantissa bits used in our fixedpoint representation format The corresponding experimen-tal results are given in Figure 16

Precision error shown in Figure 16 is computed based onthe following equations

error 119903 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119903

119898minus simulate 119903

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119903

119898

1003816100381610038161003816

error 119894 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119894

119898minus simulate 119894

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119894

119898

1003816100381610038161003816

precision error = 1

2119899+1(error 119903 + error 119894)

(28)

International Journal of Reconfigurable Computing 13

14 15 16 17 18 19 20 21 22 23 240

005

01

015

02

025

03

035

04

045

05

Number of mantissa bits

Prec

ision

erro

r (

)

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

Figure 16 Precision error against different fixed point formats used

where simulate 119903 and simulate 119894 are the floating point realand imaginary amplitudes of the reference output state gen-erated from simulation whereas emulate 119903 and emulate 119894 arethe amplitudes extracted from the output state of proposedFPGA emulator (converted from original fixed point formatto floating point for verification purposes)

Figure 16 shows that 16-bit fixed point format (14mantissabits) incurs significant precision error for 5-qubit emulationsin both case studies However the error produced by 2-qubitemulations is insignificant This behaviour is because theamplification of the fixed point truncation errors grows withthe size of quantum system For FPGA emulation purposesprecision error due to fixed point representation can bereduced by increasing the number ofmantissa bits with trade-off on resource utilization By expanding the number ofmantissa bits up to 24-bit (which results in 26 total numberof bits) negligible precision error for 5-qubit emulations isattained It is important to note that the proposed FPGAemulator is parameterizable in terms of the number ofmantissa bits for fixed point representation This is crucial toensure different fixed point formats can be applied to emulatequantum circuit of various sizes based on the demandedprecision error tolerance and resource constraint

The size of our fixed point number format also affectsresource utilization The corresponding experimental resultsforQFT case study are shown in Figure 17 Since the resourcesavailable on FPGA device are mostly in blocks or multiples of8 bits the choice of 26-bit fixed point format is not suitableHence we apply 22-mantissa-bit size (ie total number of bitsis 24) in our fixed point representation formats Note thatfor Groverrsquos search emulations the experiment on resourceutilization of DSP blocks is not relevant because FPGAemulation model developed here (as depicted in Figure 12)does not involve multiplication

53 Efficiency of Proposed FPGA Emulation Architecture Inthis subsection we investigate how the increase in qubit size

impacts resource growth and maximum allowable operatingfrequency in different emulation architectures (ie concur-rent pipeline and serial-parallel) In the case of QFT emu-lation model we have two versions of serial-parallel archi-tectures Type 1 serial-parallel uses DSP blocks to performmultiplications whereas type 2 replaces the multiplications(required in Hadamard gate operations 1198801 1198804 and 1198806 inthe 3-qubit QFT case) with shift-add operations Althoughconventional hardware design methods encourage the usageof shift-add operation instead of multiplication to reduceresource utilization the case is now different with FPGAdevices containing efficient built-in DSP blocks Figures 18and 19 show the results of the experiments conducted onQFTand Groverrsquos search algorithms respectively

Based on the experimental results shown in Figure 18when comparing to type 1 type 2 method consumes lessDSP blocks but more logic elements due to the constructionof adders needed in shift-add operations As the number ofqubits increases resource utilization of logic elements fortype 1 emulation grows rapidly when available DSP blocks areused up and it ends up with similar resource utilization asthe type 2 approach Hence we can conclude that for large-scale FPGA emulations both methods would lead to similarresource utilization Thus the first approach is preferred dueto the ease of design process where DSP blocks are utilized bydefault when implementing the design with the FPGA designautomation tool

In contrast to the concurrent and pipeline designs theexperimental results for QFT and Groverrsquos search show thatthe proposed serial-parallel architecture achieves balance onboth resource utilization and operating frequency The pro-posed architecture has significantly reduced resource growthin the application of logic elements dedicated logic regis-ters and DSP block yet maintaining reasonable operatingfrequency With the concurrent and pipeline architectures5-qubit QFT emulation completely used up the resourcesavailable in the Altera Stratix IV FPGA device used in thiswork However the same device can support up to 7-qubitQFT emulationwith the proposed serial-parallel architectureIt is important to note that the processing power of 7-qubitQFT is far higher than the 5-qubit implementation as an 119899-qubit QFT can process up to 2119899 samples in one evaluation

For scalability software simulation would depend onresources available on the computer servers The scalabilityof the FPGA emulation framework would depend on theresources available on target FPGA devices As there havebeen rapid advances in FPGA technology in recent years bydesigning an efficient architecture that is implemented in ahigh-density FPGA device (such as the Altera Stratix 10 thatcontains up to 55 million logic elements) one can actuallyemulate large qubit size quantum circuit on a single FPGAchip Furthermore new approach to FPGA emulation maybe made through the exploration of efficient data structuresandmodellingmethodsThus the proposedwork contributesto the formulation of a proof-of-concept baseline FPGAemulation framework with optimization on datapath designsthat can be extended to emulate practical large-scale quantumcircuits

14 International Journal of Reconfigurable Computing

14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Number of mantissa bits

Com

bina

tiona

l ALU

Ts

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

times104

(a) Logic element usage

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

14 15 16 17 18 19 20 21 22 23 240

500

1000

1500

2000

2500

3000

3500

Number of mantissa bits

Ded

icat

ed lo

gic r

egist

ers

(b) Register usage

2-qubit QFT5-qubit QFT

14 15 16 17 18 19 20 21 22 23 240

200

400

600

800

1000

1200

Number of mantissa bits

DSP

blo

ck 1

8-bi

t ele

men

t

(c) DSP block usage

Figure 17 Resource utilization for different fixed point formats used

54 Benchmarking between Simulation and Emulation Hereour emulation models are benchmarked against the equiv-alent software simulations The simulation models used arebased on an open source quantum library 119897119894119887119902119906119886119899119905119906119898 [28]Software simulation is performed on an Intel Core i7-4790eight-core processor with 36GHz clock rate running on aLinux-based Ubuntu 1404 kernel whereas hardware emu-lation is based on the Altera Stratix IV EP4SGX530KF43C4FPGA Table 1 shows the runtime comparison betweensimulation and our emulation

Figure 20 illustrates the runtime speed-up (simula-tionemulation) of Groverrsquos search case study It is importantto note that the proposed hardware emulation is implementedbased on 24-bit fixed point format (which is determined

Table 1 Comparison of runtime between simulation and emulation

QFT runtime (s) Groverrsquos search runtime (s)Simulation Emulation Simulation Emulation

2-qubit 155 times 10minus6 358 times 10minus9 296 times 10minus6 46 times 10minus9

3-qubit 466 times 10minus6 805 times 10minus9 740 times 10minus6 120 times 10minus9

4-qubit 510 times 10minus6 1344 times 10minus9 2207 times 10minus6 214 times 10minus9

5-qubit 565 times 10minus6 2193 times 10minus9 3988 times 10minus6 367 times 10minus9

6-qubit mdash mdash 10696 times 10minus6 627 times 10minus9

7-qubit mdash mdash 27714 times 10minus6 968 times 10minus9

based on the experimental work presented in Section 52)whereas 32-bit single precision float is used in the software

International Journal of Reconfigurable Computing 15

2 25 3 35 4 45 50

2

4

6

8

10

12

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

times104

(a) Logic element usage

2 25 3 35 4 45 50

05

1

15

2

25

3

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

times104

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(b) Register usage

2 25 3 35 4 45 50

200

400

600

800

1000

1200

Number of qubits

DSP

blo

ck 1

8-bi

t ele

men

t

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(c) DSP block usage

2 25 3 35 4 45 50

50

100

150

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(d) Maximum operating frequency

Figure 18 Resource utilization and maximum operating frequency for different emulation architectures of QFT (based on 24-bit fixed pointformat)

simulation to describe the quantum state For a larger qubitsize quantum circuit the number of mantissa bits for fixedpoint representation has to be increased accordingly to ensuresimilar precision as in software simulation can be achievedwith trade-offs on resource utilization and execution speed[44]

From Figure 20 it can be observed that the proposedhardware emulation provides significant speed-up over soft-ware simulation using 119897119894119887119902119906119886119899119905119906119898 It is important to notethat the achieved speed-up increases drastically as the num-ber of qubits increasesThis result further supports the notionthat hardware emulation has significant potential in themodelling of a large-scale quantum system on the classicalcomputing platform based on FPGA

As the number of required IO pins for emulating QFTand Groverrsquos search algorithms with parallel read-outs istoo much to fit in the existing FPGA devices board-levelverification is infeasible Although the usage of multiplexersreduces the number of output pins resource consumptionrises significantly with the increase in the number of qubitsand this affects the analysis of the overall experiment Thusthe estimated runtime of the proposed FPGA emulationarchitecture is obtained based on the hardware clock cycleand the operating frequency that is acquired from the FPGAdevelopment toolThis is not a critical issue in future practicaldeployment as the selected case studies are meant to work ascore modules within other quantum applications that mightnot require parallel read-out

16 International Journal of Reconfigurable Computing

2 25 3 35 4 45 5 55 6 65 70

1

2

3

4

5

6

7

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipelineSerial-parallel

times104

(a) Logic element usage

2 25 3 35 4 45 5 55 6 65 70

05

1

15

2

25

3

35

4

5

45

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

ConcurrentPipelineSerial-parallel

times104

(b) Register usage

2 25 3 35 4 45 5 55 6 65 70

50

100

150

200

250

300

350

400

450

500

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipelineSerial-parallel

(c) Maximum operating frequency

Figure 19 Resource utilization and maximum operating frequency for different emulation architectures of Groverrsquos search (based on 24-bitfixed point format)

6 Conclusion and Future Work

Efficient resource utilization is critical for FPGA-basedimplementations especially for emulating quantum comput-ing applications as they typically exhibit exponential resourcerequirement with increasing number of qubits In this paperwe proposed a baseline FPGA emulation framework withfocus on the datapath design optimization based on theconventional state vector model as well as an effectivemethodology that facilitates accurate modelling of quantumalgorithms for FPGAemulationA serial-parallel architecture

with efficient resource utilization for FPGA-based emulationof quantum computing is presentedThe proposed emulationarchitecture achieves linear reduction in resource utilizationcompared to pipeline implementations as found in previousworks This work has also demonstrated the advantage ofFPGA emulation over software simulation where hardwareemulation of 7-qubit Groverrsquos search is about 3 times 104 timesfaster than the software simulation performed on Intel Corei7-4790 eight-core processor running at 36GHz clock rate

However experimental results obtained in this workshow that it is difficult to realize a scalable and flexible

International Journal of Reconfigurable Computing 17

2 25 3 35 4 45 5 55 6 65 705

1

15

2

25

3

Number of qubits

Runt

ime s

peed

-up

(sim

ulat

ion

emul

atio

n)

times104

Figure 20 Runtime speed-up against number of qubits in Groverrsquossearch emulation

emulation platform for large qubit size real-world quantumsystem using the approach that applies existing state vectorquantum models This concurs with [45] that states that thepractical limit on the size of the quantum system that canbe modelled on classical computing platform can hardly beovercome due to exponentially large memory requirementsfor storing the entire state vector Hence this suggests thata model with a more effective data structure to representquantum systems is required Recently the work on stabi-lizer frames [30] has shown promise in providing a moresuitable data structure for quantum models targeted forFPGA emulation This is the subject of future work in ourresearch in applying FPGA emulation in modelling of large-scale quantum systems In addition the error-correctionstructure available with stabilizer frames will be consideredfor application in practical quantum computations With amore efficient modelling technique FPGA can represent amore efficient emulation strategy of quantum systems

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the Ministry of Higher Education(MOHE) and Universiti Teknologi Malaysia (UTM) underFundamental Research Grant Scheme (FRGS) Vote number4F422

References

[1] P W Shor ldquoAlgorithms for quantum computation discretelogarithms and factoringrdquo in Proceedings of the 35th IEEEAnnual Symposium on Foundations of Computer Science(SFCS rsquo94) pp 124ndash134 Santa Fe NM USA November 1994

[2] L K Grover ldquoQuantum mechanics helps in searching for aneedle in a haystackrdquo Physical Review Letters vol 79 no 2 pp325ndash328 1997

[3] A Malossini and T Calarco ldquoQuantum genetic optimizationrdquoIEEE Transactions on Evolutionary Computation vol 12 no 2pp 231ndash241 2008

[4] R L Rivest A Shamir and L Adleman ldquoA method forobtaining digital signatures and public-key cryptosystemsrdquoCommunications of the ACM vol 21 no 2 pp 120ndash126 1978

[5] L K Grover ldquoA fast quantum mechanical algorithm fordatabase searchrdquo in Proceedings of the 28th Annual ACMSymposium onTheory of Computing pp 212ndash219 ACM 1996

[6] N Shenvi J Kempe andK BWhaley ldquoQuantum random-walksearch algorithmrdquo Physical Review A Atomic Molecular andOptical Physics vol 67 no 5 Article ID 052307 2003

[7] E Farhi J Goldstone and S Gutmann ldquoA quantum algorithmfor the Hamiltonian NAND treerdquo Theory of Computing vol 4pp 169ndash190 2008

[8] P W Shor ldquoWhy havenrsquot more quantum algorithms beenfoundrdquo Journal of the ACM vol 50 no 1 pp 87ndash90 2003

[9] D R Simon ldquoOn the power of quantum computationrdquo SIAMJournal on Computing vol 26 no 5 pp 1474ndash1483 1997

[10] S Hallgren ldquoPolynomial-time quantum algorithms for Pellrsquosequation and the principal ideal problemrdquo Journal of the ACMvol 54 article 4 2007

[11] M Mosca and A Ekert ldquoThe hidden subgroup problem andeigenvalue estimation on a quantum computerrdquo in Quan-tum Computing and Quantum Communications pp 174ndash188Springer Berlin Germany 1999

[12] D Bacon A M Childs and W Van Dam ldquoFrom optimalmeasurement to efficient quantum algorithms for the hiddensubgroup problem over semidirect product groupsrdquo in Pro-ceedings of the 46th Annual IEEE Symposium on Foundationsof Computer Science (FOCS rsquo05) pp 469ndash478 IEEE October2005

[13] L K Grover and A M Sengupta ldquoFrom coupled pendulums toquantum searchrdquo inMathematics of QuantumComputation pp119ndash134 Chapman and HallCRC 2002

[14] R P Feynman ldquoSimulating physics with computersrdquo Interna-tional Journal ofTheoretical Physics vol 21 no 6-7 pp 467ndash4881982

[15] N S Yanofsky and M A Mannucci Quantum Computingfor Computer Scientists vol 20 Cambridge University PressCambridge UK 2008

[16] C Monroe D M Meekhof B E King W M Itano and DJ Wineland ldquoDemonstration of a fundamental quantum logicgaterdquo Physical Review Letters vol 75 no 25 pp 4714ndash4717 1995

[17] N A Gershenfeld and I L Chuang ldquoBulk spin-resonancequantum computationrdquo Science vol 275 no 5298 pp 350ndash3561997

[18] J E Mooij T P Orlando L Levitov L Tian C H Van derWaland S Lloyd ldquoJosephson persistent-current qubitrdquo Science vol285 no 5430 pp 1036ndash1039 1999

[19] I L Chuang N Gershenfeld and M Kubinec ldquoExperimentalimplementation of fast quantum searchingrdquo Physical ReviewLetters vol 80 no 15 article 3408 1998

[20] R Barends J Kelly A Megrant et al ldquoSuperconducting quan-tum circuits at the surface code threshold for fault tolerancerdquoNature vol 508 no 7497 pp 500ndash503 2014

[21] M H Amin N G Dickson and P Smith ldquoAdiabatic quantumoptimization with quditsrdquo Quantum Information Processingvol 12 no 4 pp 1819ndash1829 2013

18 International Journal of Reconfigurable Computing

[22] A D King E Hoskinson T Lanting E Andriyash and M HAmin ldquoDegeneracy degree and heavy tails in quantumanneal-ingrdquo httparxivorgabs151207325

[23] T F Roslashnnow Z Wang J Job et al ldquoDefining and detectingquantum speeduprdquo Science vol 345 no 6195 pp 420ndash4242014

[24] Y Goto and M Fujishima ldquoEfficient quantum computingemulation system with unitary macro-operationsrdquo JapaneseJournal of Applied Physics vol 46 no 4 pp 2278ndash2282 2007

[25] A U Khalid Z Zilic and K Radecka ldquoFPGA emulation ofquantum circuitsrdquo in Proceedings of the IEEE International Con-ference on Computer Design VLSI in Computers and Processors(ICCD rsquo04) pp 310ndash315 IEEE October 2004

[26] J F Rivera-Miranda A J Caicedo-Beltran J D Valencia-Payan J M Espinosa-Duran and J Velasco-Medina ldquoHard-ware emulation of quantum Fourier transformrdquo in Proceedingsof the IEEE 2nd Latin American Symposium on Circuits and Sys-tems (LASCAS rsquo11) pp 1ndash4 IEEE Bogata Colombia February2011

[27] M Aminian M Saeedi M S Zamani andM Sedighi ldquoFPGA-based circuit model emulation of quantum algorithmsrdquo inProceedings of the IEEE Computer Society Annual Symposiumon VLSI (ISVLSI rsquo08) pp 399ndash404 IEEE Montpellier FranceApril 2008

[28] H Weimer M Muller I Lesanovsky P Zoller and H PBuchler ldquoA Rydberg quantum simulatorrdquoNature Physics vol 6no 5 pp 382ndash388 2010

[29] G F Viamontes Efficient quantum circuit simulation [PhDthesis] The University of Michigan Ann Arbor Mich USA2007

[30] H J Garcıa and I L Markov ldquoSimulation of quantum circuitsvia stabilizer framesrdquo IEEE Transactions on Computers vol 64no 8 pp 2323ndash2336 2015

[31] C P Williams and S H Clearwater Explorations in QuantumComputing Springer Berlin Germany 1998

[32] A Barenco D Deutsch A Ekert and R Jozsa ldquoConditionalquantumdynamics and logic gatesrdquo Physical Review Letters vol74 no 20 pp 4083ndash4086 1995

[33] M A Nielsen and I L Chuang Quantum Computation andQuantum Information Cambridge University Press 2000

[34] W-W Zhang F Gao B Liu Q-Y Wen and H Chen ldquoAwatermark strategy for quantum images based on quantumFourier transformrdquo Quantum Information Processing vol 12no 2 pp 793ndash803 2013

[35] D Curtis and D A Meyer ldquoTowards quantum templatematchingrdquo in Proceedings of the SPIE 48th Annual Meeting inQuantum Communications and Quantum Imaging pp 134ndash141International Society for Optics and Photonics August 2003

[36] R Schutzhold and G Schaller ldquoAdiabatic quantum algorithmsas quantum phase transitions first versus second orderrdquo Physi-cal Review A vol 74 no 6 Article ID 060304 2006

[37] W P Baritompa D W Bulger and G R Wood ldquoGroverrsquosquantum algorithm applied to global optimizationrdquo SIAMJournal on Optimization vol 15 no 4 pp 1170ndash1184 2005

[38] C Durr and P Hoyer ldquoA quantum algorithm for finding theminimumrdquo httparxivorgabsquant-ph9607014

[39] A Montanaro ldquoQuantum pattern matching fast on averagerdquoAlgorithmica 2015

[40] Y H Lee M Khalil-Hani and M N Marsono ldquoFPGA-basedquantum circuit emulation a case study on Quantum Fouriertransformrdquo in Proceedings of the 14th International Symposium

on Integrated Circuits (ISIC rsquo14) pp 512ndash515 IEEE SingaporeDecember 2014

[41] M Khalil-Hani Y H Lee and M N Marsono ldquoAn accu-rate FPGA-based hardware emulation on quantum fouriertransformrdquo in Proceedings of the Australasian Symposium onParallel andDistributed Computing (AusPDC rsquo15) vol 1 SydneyAustralia 2015 a1b3

[42] M Frigo and S G Johnson ldquoThe design and implementationof FFTW3rdquo Proceedings of the IEEE vol 93 no 2 pp 216ndash2312005

[43] D Wecker and K M Svore ldquoLIQUi|gt a software design archi-tecture and domain-specific language for quantum computingrdquohttparxivorgabs14024467

[44] S Kilts Advanced FPGA Design Architecture Implementationand Optimization John Wiley amp Sons New York NY USA2007

[45] M Smelyanskiy N P D Sawaya and A Aspuru-GuzikldquoqHiPSTER the quantum high performance software testingenvironmentrdquo httparxivorgabs160107195

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 5: Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

International Journal of Reconfigurable Computing 5

|k1⟩

|k2⟩

|knminus1⟩

|kn⟩H

H

H

H

R2

R2 Rnmiddot middot middot

middot middot middot Rnminus2 Rnminus1

1

radic2(|0⟩ + e2120587i0middotj1j2 middotmiddotmiddotj119899|1⟩)

1

radic2(|0⟩ + e2120587i0middotj2 middotmiddotmiddotj119899|1⟩)

1

radic2(|0⟩ + e2120587i0middotj119899minus1j119899|1⟩)

|j1⟩

|j2⟩

|jnminus1⟩

|jn⟩

1

radic2(|0⟩ + e2120587i0middotj119899|1⟩)

Figure 1 Quantum circuit model for n-qubit QFT

With some algebraic manipulations the QFT equationcan be derived from (13) to form (14) [33]

QFT2119899

1003816100381610038161003816119895⟩ =

1

radic2119899

2119899minus1

sum

119896=0

1198902120587119894(1198951198962

119899)|119896⟩ (13)

=

1

radic2119899(|0⟩ + 119890

21205871198940sdot119895119899|1⟩)

sdot (|0⟩ + 11989021205871198940sdot119895119899minus1119895119899

|1⟩) sdot sdot sdot (|0⟩ + 11989021205871198940sdot11989511198952 sdotsdotsdot119895119899

|1⟩)

(14)

Since the term 11989021205871198940sdot1198951 produces either minus1 if 1198951= 1 or +1

otherwise Hadamard computation on the first qubit resultsin (1radic2)(|0⟩ + 11989021205871198940sdot1198951 |1⟩) Computations of the consecutivebits in the binary fraction are obtained using controlledphase-shift gates according to (14) QFT circuit consists ofthree types of elementary gates which are Hadamard gate119867controlled phase-shift gate 119862119877

119896 and SWAP gate The circuit

model of an 119899-qubit QFT is depicted in Figure 1The size of a QFT circuit grows exponentially as the

number of input qubits increases An 119899-qubit QFT involvessum119899

119896=1119896+1 unitary transformations and could process up to 2119899

input samples in one evaluation (provided the input samplesare encoded as the amplitude sequences of a superposition ofcomputational basis states)

35 Groverrsquos Search Algorithm In computer science area atypical search problem is to identify the desired element froman unordered array For many computing applications it iscritical that the search technique is efficient In terms of afunction the search problem 119891 0 1119899 rarr 0 1 is givenwith assurance that there exists one binary string 119909

0where

119891(119909) = 1 if 119909 = 1199090 else 119891(119909) = 0

In classical computing 1198982 queries on average arerequired to search for a particular element in an unorderedarray with 119898 elements In quantum computing Groverrsquossearch algorithm can complete the job in (Π4)radic119898 queries(for the rest of the text and figures in Section 35 the requiredGrover iterations are abbreviated as radic2119899 times) Althoughthe speed-up achieved is only quadratic Groverrsquos algorithmand its extensions are extremely useful in enhancing currentmethods in solving database searching and optimizationproblems which include 3-satisfiability [36] global opti-mization [37] minimum point searching [38] and patternmatching [39] The core operations of Groverrsquos algorithm arephase inversion and inversion about mean Phase inversion

|x⟩

|1⟩

|1205930⟩ |1205931⟩ |1205932⟩

Uf

nn

H

Figure 2 Quantum circuit for phase inversion [15]

inverts the phase of the state-of-interest and its quantumcircuit model is given in Figure 2

In Figure 2 the top 119899-qubit |x⟩ is the target qubit andthe bottom qubit is called the ancilla qubit The functionof 119880119891is to pick out the desired binary string To apply

phase inversion on target qubits Hadamard gate operationis performed on the ancilla qubit which is initialized as |1⟩This is to complement the effect of 119880

119891which takes |x 119910⟩ to

|x 119891(x) oplus 119910⟩In terms of matrices the phase inversion operation can

be expressed as 119880119891(119868119899oplus 119867)|x 1⟩ and the corresponding

quantum states are described as follows10038161003816100381610038161205930⟩ = |x 1⟩

10038161003816100381610038161205931⟩ = |x⟩ [ |0⟩ minus |1⟩

radic2

] = [

|x 0⟩ minus |x 1⟩radic2

]

10038161003816100381610038161205932⟩ = |x⟩ [

1003816100381610038161003816119891 (x) oplus 0⟩ minus 100381610038161003816

1003816119891 (x) oplus 1⟩

radic2

]

= |x⟩ [1003816100381610038161003816119891 (x)⟩ minus 1003816100381610038161003816

1003816119891 (x)⟩

radic2

]

= (minus1)119891(x)|x⟩ [ |0⟩ minus |1⟩

radic2

]

=

minus1 |x⟩ [ |0⟩ minus |1⟩radic2

] if x = x0

+1 |x⟩ [ |0⟩ minus |1⟩radic2

] if x = x0

(15)

Inversion about mean boosts the phase separationbetween the element-of-interest and other elements in theunordered arrays (after phase inversion operation is appliedto invert the phase of the target element) The mean of

6 International Journal of Reconfigurable Computing

(1) Start with |0⟩ as the target input qubits(2) Apply 119899-qubit Hadamard gate on target qubits119867otimes119899

(3) forradic2119899 times do(4) Apply phase inversion operation 119880

119891(119868 otimes 119867)

(5) Apply inversion about mean operation on the target qubits minus119868 + 2119860(6) end for(7) Measure the target qubits

Algorithm 1 Groverrsquos search algorithm

all elements is computed and inversions are made aboutthe mean The overall mean remains unchanged after theinversion process This is because the distance between oneelement and the mean is the same before and after inversionThe only change is if the original sequence is above themean during the inversion it is flipped about the mean to thesame distance below the mean and vice versa In general theinversion about mean operation can be expressed as

V1015840 = minusV + 2119886 (16)

where 119886 is the mean V is the value of an element in the arrayand V1015840 is the new value of that element after inversion

In terms of matrices the mean of a 2119899 elements vector119881 is obtained by the product of matrices 119860 and 119881 where allthe elements in the 2119899-by-2119899 matrix 119860 are set to 12119899 Henceinversion about mean in matrix form becomes

1198811015840= minus119881 + 2119860119881 = (minus119868 + 2119860)119881 (17)

In order to achieve high confidence of getting the desiredelement the amplitude amplification process (amplitudeamplification in Groverrsquos search algorithm involves phaseinversion and inversion about mean operations) has to berepeated for radic2119899 times This is because the probability ofsuccess changes sinusoidally by the number of amplitudeamplification iterations (as illustrated in Figure 3) andthe highest probability of success first happened after therequired iterations Pseudocode for a generic Groverrsquos searchalgorithm is given in Algorithm 1

There are two approaches to model Groverrsquos searchquantum algorithm The first approach which is based onquantum circuitmodel is discussed nextThe secondmethodis modelling using arithmetic functions and this is presentedin Section 42 since this approach is applied in this paper

As shown in Figure 4 Groverrsquos search circuit given in [15]is constructed with assumption that black box modules 119880

119891

and minus119868+2119860 are available Descriptions of the119880119891and minus119868+2119860

modules have been given earlierOn the other hand the circuit model for 119899-qubit Groverrsquos

algorithm presented in [33] is shown in Figure 5(a) In thisfigure119867 is the Hadamard gate and 119866 is the Grover iterationcircuit which is illustrated in Figure 5(b)

The function of Grover iteration circuit is equivalentto the phase inversion and inversion about mean In orderto achieve high probability of successful search 119866 is con-catenated for radic2119899 times In Figure 5(b) the role of oraclemodule is to recognize the solution to a particular search

0 20 40 60 80 100 120 140 160 180 2000

010203040506070809

1

Number of iterations

Mea

sure

men

t pro

babi

lity

Element-of-interestOther elements

Figure 3 Probability of success by the number of amplitude amplifi-cation iterations amongst 210 probabilities For 10-qubit search firsthighest probability of success happens at 25th iteration

|0⟩

|1⟩

minusI + 2A

Phase inversionInversion

Hotimesn

Ufn n n n

H

about mean

Repeat radic2n times

Figure 4 Modelling of Groverrsquos search based on quantum circuitmodel [15]

problem in the phase inversion operation By monitoringthe oracle qubit a solution to the search problem can bedetected through the changes of the oracle qubit The designof oraclemodule varieswith different search applications andan example of the oracle circuit for a simple 3-bit search task isshown in Figure 6 In Figure 6 the119883 symbol represents Pauli-119883matrixThe open circle notation indicates conditioning onthe qubit being set to zero whereas the closed circle indicatesconditioning on the qubit being set to one

Deriving from the circuits in Figure 5 the corresponding3-qubit Groverrsquos search circuit is provided in Figure 7 Thiscircuit model is made up of Hadamard oracle quantumNOT and multiqubit controlled-NOT gates

International Journal of Reconfigurable Computing 7

|0⟩

|1⟩

Target qubits

Oracle qubit

Hotimesn Hotimesnn n

n

G G

H H

middot middot middot

O(radic2n)

(a) 119899-qubit Groverrsquos search

Oracle workspace

Phase

Oraclen-qubits Hotimesn Hotimesn

For x gt 0|x⟩ 997888rarr (minus1)f(x)|x⟩

|0⟩ 997888rarr |0⟩

|x⟩ 997888rarr minus|x⟩

(b) Grover iteration circuit 119866

Figure 5 Quantum circuit model for Groverrsquos search algorithm [33]

X

Xequiv

Figure 6 Oracle circuit for recognizing binary string ldquo010rdquo

4 Proposed FPGA-Based Hardware Emulation

This section presents our approach in modelling quantumFourier transform and Groverrsquos search algorithms for FPGAemulation The proposed techniques can be generalized toFPGA emulation of more complex quantum algorithms thatapply QFT or Groverrsquos algorithm This paper extends ourearlierwork presented in [40 41] In these previousworks thehardware architecture proposed was restricted to the serialdesign with resource sharing facilitated at the register levelIn this paper we enable resource sharing at the operator (orcomputational) level that allows for more efficient emulationof the quantum algorithms Furthermore additional casestudy on Groverrsquos search algorithm is included for general-ization of the proposed framework The choice of hardwarearchitecture varies based on the need of different applicationsBased on the selected case studies the efficiencies of differenthardware architectures for quantum computing emulationpurposes are discussed and analysed in this work The goalis to achieve scalability and also efficient resource utilizationfor emulating practical larger qubit size quantum systems

41 Modelling QFT for FPGA Emulation The derivation ofquantum circuit model for 119899-qubit QFT was discussed earlieris Section 34 Here we present the modelling of QFT forFPGA emulation based on a 3-qubit example According to(14) the mathematical expression for 3-qubit QFT is derivedas shown in

QFT23

1003816100381610038161003816119895⟩ =

1

radic23(|0⟩ + 119890

21205871198940sdot1198953|1⟩)

sdot (|0⟩ + 11989021205871198940sdot11989521198953

|1⟩)

sdot (|0⟩ + 11989021205871198940sdot119895111989521198953

|1⟩)

(18)

Deriving from the general 119899-qubit QFT circuit providedin Figure 1 the corresponding quantum circuit for 3-qubitQFT is obtained as shown in Figure 8

The circuit consists of Hadamard gates 119867 controlledphase-shift gates 119862119877

2and 119862119877

3 and also the SWAP gate

Referring to the functional block diagram given in Figure 9this quantum circuit model corresponds to a sequence ofunitary transformations 119880119894 119894 = 1 to 7 defined by

1198801 = 119867 otimes 119868 otimes 119868 (19)

1198802 =1198621198772otimes 119868 (20)

1198803 = (119868 otimes SWAP) sdot (1198621198773otimes 119868) sdot (119868 otimes SWAP) (21)

1198804 = 119868 otimes 119867 otimes 119868 (22)

1198805 = 119868 otimes1198621198772 (23)

1198806 = 119868 otimes 119868 otimes 119867 (24)

1198807 = SWAP23 (25)

Note that in Figure 9 the modelling of 119899-qubit quan-tum system with superposition and entanglement propertiesresulted in a circuit with 2119899 signals Since the input samplesto QFT circuit are encoded as sequence of amplitudes inan entangled superposition of basis states (discussed in Sec-tion 34) modelling based on individual qubit with separatequantum gate operations is unable to reflect the effects ofapplying a quantum gate on entangled qubits correctly

In order to model the effect of superposition and entan-glement derivation of each unitary transformation is madethrough the tensor product of individual quantum gate andidentity matrix to form unitary matrix of equal dimensionwith the quantum state vector Detailed derivations of thequantum unitary matrices for the 3-qubit QFT have beenpresented in our previous paper [41]

Since these quantum unitary matrices are mostly sparsematrices we extract minimal number of useful arithmeticoperations (due to nonzero elements in the matrices) result-ing in an optimal realization of themodel that can bemappedto an efficient FPGA emulation architecture Incidentally asoftware program has been developed to automate this map-ping hence easily scaling up the circuit model to larger qubitsizes From these arithmetic functions the correspondingdata-flow graph for the 3-qubit QFT is derived as shown inFigure 10

In Figure 10 each IN and OUT signal is a fixed pointcomplex number register for an element of the quantumstate vector The operation of module 119865 corresponds toout 119903 = minusin 119894 and out 119894 = in 119903 where 119865 is the multiplication

8 International Journal of Reconfigurable Computing

|0⟩

|0⟩

|0⟩

|1⟩

Target qubits

Grover iteration circuit GGrover iteration circuit G

Oracle qubit

Oracle Oracle

H H

H H

H

H

H

H

H

H

H

H

H

H

H

H

H

H

H HH HX

X

X

X

X

X

X

X

X

X

X

X

H

H

Figure 7 Modelling of 3-qubit Groverrsquos search based on quantum gates

H

H

H

R2

R2

R3

U1 U2 U3 U4 U5 U6 U7

|k1⟩

|k2⟩

|j1⟩

|j2⟩

|k3⟩|j3⟩

1

radic2(|0⟩ + e2120587i0middotj1j2j3|1⟩)

1

radic2(|0⟩ + e2120587i0middotj2j3|1⟩)

1

radic2(|0⟩ + e2120587i0middotj3|1⟩)

Figure 8 Quantum circuit for 3-qubit QFT

IN0

IN7U1 U2 U3 U4 U5 U6 U7

8 8

OUT0

OUT7

Figure 9 The 3-qubit QFT in terms of a sequence of unitarytransformations

of the input complex number with imaginary number 119894This is applied in unitary transformations 1198802 and 1198805 Theoperation ofmoduleCMULT in Figure 10 is described in (26)The function of CMULT is the multiplication of the inputcomplex number with a constant complex number which isderived based on the controlled phase-shift gate As expressedpreviously in (21) it is used in unitary transformation 1198803

out 119903 = (in 119903 times constant 119903) minus (in 119894 times constant 119894)

out 119894 = (in 119894 times constant 119903) + (in 119903 times constant 119894) (26)

42 Modelling Groverrsquos Search for FPGA Emulation As men-tioned earlier the second approach of modelling Groverrsquosquantum search is modelling using arithmetic functions(based on mathematical model) In this work we havechosen this approach of modelling Groverrsquos algorithm forFPGA emulation In contrast to the quantum circuit modelapproach that involves complex large dimensional matrixoperations (ie matrix multiplication and tensor product)the chosen method can utilize the computational resourcesavailable on FPGA such as comparators adderssubtractersand multiplexers for efficient emulation of Groverrsquos searchalgorithm

This technique is based on phase inversion and inver-sion about mean as described in Section 35 As shown inFigure 11 mathematically Groverrsquos search for a databasewith 2119899 elements mainly involves the processes of invertingthe phase of target element function 119865 and performing

inversion about mean on all elements function minus119868 + 2119860 forradic2119899 times Initialization of the quantum state vector with

equal probability function INIT is carried out once in thebeginning of the process

For the case study of a 3-bit search problem we derivethe data-flow graphs and obtain the result of the requiredarithmetic functions as shown in Figure 12 For experimentalpurposes the oracle module is developed by comparingall elements in database with targeted element using com-parators COMP modules The arithmetic functions of theinversion about mean are derived through straightforwardcomputations that involve summation bit shift and subtrac-tion

43 Architecture of Proposed FPGA Emulation Model It isclear that if implemented on classical computing platformsthe resource utilization for a quantum system would growexponentially Hence the choice of suitable architectureis critical for FPGA-based quantum computing emulationHere we discuss the efficiencies of different architecturalchoices in datapath concurrent pipeline serial and serial-parallel The block diagram of various architectures is con-structed based on the example of 3-qubit QFT

431 Concurrent Processing In concurrent processing of analgorithm all computations are completed within a clockcycle In the case of 3-qubitQFT computation blocks betweenthe input and output registers through the functional blocks1198801 to 1198807 are performed in one clock cycle (refer to Fig-ure 13(a)) However such an architecture consumes enor-mous resources such that the number of registers required toemulate an n-qubit QFT is 2119899+1 In addition the critical pathdelay is very high which results in unrealistic low operatingfrequency

432 Pipeline Architecture Most of the prior works onFPGA-based quantum circuit emulation [25ndash27] are devel-oped based on pipeline architectureThepipeline architecturehas the advantages of high throughput and much shortercritical path delay Figure 13(b) shows the proposed pipelinearchitecture of the 3-qubit QFT However the main issue ofthis approach is that resource utilization grows drasticallyby the number of qubits due to the circuit augmentation ofpipeline registers 2119899(sum119899

119896=1119896+2)pipeline registers are required

to emulate n-qubit QFT Consequently hardware emulationscalability is highly constrained by the available resources inFPGA

International Journal of Reconfigurable Computing 9

IN0

IN0 IN4 IN1 IN5 IN2 IN6 IN3 IN7 IN0 IN4 IN1 IN5 IN2 IN6 IN3 IN7

IN1 IN2 IN3 IN4 IN5 IN6 IN7

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

U7

U6

U5

U4

U3

U2F

F F

F

U1+ + + + minus minus minus minus

+

++ + +minus

minus + minus + minus + minus

minus minus minus

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5

out2_5

out2_6 out2_7

out2_7

CMULTCMULT

out3_0

out3_0 out3_0

out3_1

out3_1 out3_1

out3_2

out3_2 out3_2

out3_3

out3_3 out3_3

out3_4

out3_4 out3_4

out3_5

out3_5 out3_5

out3_6

out3_6 out3_6

out3_7

out3_7 out3_7

out4_0 out4_1 out4_2 out4_3 out4_4 out4_5 out4_6 out4_7

out5_0

out5_0

out5_1

out5_1 out5_0 out5_1

out5_2

out5_2

out5_3

out5_3 out5_2 out5_3

out5_4

out5_4

out5_5

out5_5 out5_4 out5_5

out5_6

out5_6 out5_6

out5_7

out5_7out5_7

out6_0 out6_1

out6_1

out6_2 out6_3

out6_3

out6_4

out6_4

out6_5 out6_6

out6_6

out6_7

1

radic2(1 + i)

1

radic2(1 + i)

times times times times times times times times

times times times times times times times times

times times times times times times times times

Figure 10 Data-flow graph for 3-qubit QFT The multipliers shown in this diagram represent the multiplication of input complex numberwith constant 1radic2

433 Serial Processing Although serial design requires mul-tiple iterations to perform a complex computation it opensup the opportunity for resource sharing Serial processingis suitable for applications where resource utilization is acritical design constraint Figure 13(c) depicts the serial formof the 3-qubit QFT circuit that consists of a control unitand a datapath unit As resources can be reused betweentransformations a serial-based n-qubit QFT consumes 2119899registers a register utilization that is much lower than inconcurrent or pipeline architectures However pure serial

approach forfeits the purpose of conducting FPGA emulationwhose aim is to exploit the parallelism inherent in a quantumsystem as it would still suffer from slow sequential behaviouras exhibited in simulation on classical computer

434 Proposed Serial-Parallel Architecture In this paperwe propose a hybrid serial-parallel architecture for FPGAemulation of quantum algorithms The proposed approachtakes advantage of both serial and parallel design techniques

10 International Journal of Reconfigurable Computing

Repeat radic2n times

2n 2n 2n

Phase inversion

INIT

Inversionabout mean

minusI + 2AF

Figure 11 Mathematical modelling of Groverrsquos search algorithm

Applying the concepts of quantum parallelism and quantumdynamicsmodelled by sequential transformations on a quan-tum state vector it is found that the proposed serial-parallelarchitecture is suitable for efficient and accurate quantumcomputing emulation on FPGA platform

Figure 14 shows the functional block diagram of theproposed serial-parallel FPGA emulation architecture of the3-qubit QFT The serial-parallel design of the datapath unitinvolves a number of quantum computation units that canperform parallel computations for each stage of unitarytransformation whereby the same computational resourcescan be reused for the following stage of transformations

For data storage and synchronization purposes 2119899 reg-isters are shared between unitary transformations As com-pared to the pipeline design our proposed serial-parallelapproach achieves linear reduction on the usage of registersto emulate the same quantum system The arithmetic logicunit (ALU) in the datapath unit contains multiple customprocessing elements and the allocation of resources in ALUvaries based on the target application The number of pro-cessing elements is basically determined based on the desired2119899 parallelism in the n-qubit quantum system

As illustrated in Figures 10 and 12 the data-flow graph ofQFT and Groverrsquos search algorithm exhibit similar repetitivepattern between unitary transformations This implies thatthe proposed serial-parallel approach can be generalized forthe two case studies to achieve balance in both resourceutilization and speed performance For the case of Groverrsquossearch the ALUwould be theGrover iterationmodule shownin Figure 12 whereas the control unit is designed to keep trackon the number of Grover iterations required by the targetsearch problem

5 Experimental Results

The proposed emulation designs are modelled in SystemVer-ilog HDL synthesized using Altera Quartus II softwareand implemented into target emulation platform which isbased on Altera Stratix IV EP4SGX530KF43C4 FPGA InSection 51 we discuss the verification of the hardwareemulation designs for the QFT and Groverrsquos search casestudies In addition the automated process for scaling upthe design to larger qubit size is described In Section 52investigation is conducted to study the effects of the numberof mantissa bits used in our fixed point representation formaton resource utilization and precision error In the sectionthat follows we analyse for different emulation architectureshow the increase in qubit size impacts on resource growthand maximum operating frequency allowed in the designs

Finally the runtime speeds in simulation and emulation forquantum algorithms with different qubit sizes are compared

51 Design Verification To show that the quantum algo-rithms have been modelled accurately design verificationof the proposed emulation hardware is performed Goldenreferences based on the software simulationmodels (in C) aredeveloped and their outputs are comparedwith the emulationhardware under test (which are described in SystemVerilogHDL)

In our QFT case study FFTW3 [42] a widely applied fastFourier transform library in C is used to perform Fouriertransform computations on the signal samples that are usedin this work The outputs of the classical Fourier transformthen serve as the golden referencemodel in verification of theproposed emulation model Furthermore since the discreteFourier transform (DFT) is a linear transformation that canbe defined in unitary matrix form the functional correctnessof our QFT hardware emulation model can be convenientlyverified against the DFTmatrixThe expression of an 119899-qubitDFT matrix is shown in

DFT = 1

radic2119899

[

[

[

[

[

[

[

[

[

[

[

1 1 1 1

1 1205961

1205962

1205962119899minus1

1 1205962

1205964

1205962(2119899minus1)

1 1205962119899minus11205962(2119899minus1) 120596

(2119899minus1)(2

119899minus1)

]

]

]

]

]

]

]

]

]

]

]

(27)

where 120596 is the 2119899th root of unity that is 120596 = 11989021205871198942119899

Thechoice of 11989021205871198942

119899

or 119890minus21205871198942119899

is purely a matter of convention asboth the term 11989021205871198942

119899

and the term 119890minus21205871198942119899

to the power of 2119899are equal to 1

On the other hand the design of Groverrsquos search FPGAemulation model is verified against the mathematical modelprovided in literature The simulation model of Groverrsquossearch algorithm also serves as the golden reference modelfor verification purposes

For the development of FPGA emulation models forpractical quantum computing applications it is importantthat the emulation hardware can be scaled up to larger qubitsize architectures In this work the designs are scaled up withthe aid of software program developed in-house HDL codesof the two case studies are autogenerated by the softwareprogram based on the proposed modelling techniques (asdiscussed in Section 4) The generated HDL code producesefficient hardware emulation model based on the proposedserial-parallel architecture

52 Fixed Point Representation As defined in (2) a quantumstate vector is represented by complex floating point numbersTo ensure effective resource utilization in our FPGA emula-tion hardware floating point numbers are replaced by fixedpoint representations In this work a fixed point format with1 sign bit 1 integer bit and 119873 mantissa bits (as shown inFigure 15) is used Since the amplitudes of a quantum statethat is the probabilities of collapsing into computational basis

International Journal of Reconfigurable Computing 11

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

minusIN0

IN0

IN0 minusIN1IN1 minusIN2IN2 minusIN3IN3 minusIN4IN4 minusIN5IN5 minusIN6IN6 minusIN7IN7

Target

Target Target Target Target Target Target Target Target

Target Target Target Target Target Target Target

COMP

COMPCOMPCOMPCOMPCOMPCOMPCOMPCOMP

COMPCOMPCOMPCOMPCOMPCOMPCOMP

IN1 IN2 IN3 IN4 IN5 IN6 IN7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

+ + + +

++

+

+ +

+

+ + +

+

out1_0 minusout1_0 out1_1 minusout1_1 out1_2 minusout1_2 out1_3 minusout1_3 out1_4 minusout1_4 out1_5 minusout1_5 out1_6 minusout1_6 out1_7 minusout1_7

minus

minus minus minus minus minus minus minus minus

minus minus minus minus minus minus minus

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

Groveriteration 2

Grover

INIT

iteration 1

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

≫(n minus 1)

≫(n minus 1)

Figure 12 Data-flow graph of Groverrsquos circuit for 3-bit search

U1 U2 U3 U4 U5 U6 U78 8

IN0

IN7

OUT0

OUT7

(a) Concurrent processing

U4U1 U2 U3 U5 U6 U788

IN0

IN7

OUT0

OUT7

(b) Pipeline architecture

88

Control unit

Datapath unit

Shared

Arithmeticlogic unit

(ALU)

registersIN0

IN7

OUT0

OUT7

(c) Serial architecture

Figure 13 Three-qubit QFT implemented using different hardware architectures

12 International Journal of Reconfigurable Computing

CU

ALU

DU

alu3alu2alu1alu0

alu4 alu5 alu6 alu7

CMULTCMULT

reg0reg4

IN0

Shared registers

IN1alu0

alu1alu4

IN2alu2alu4alu1IN3

IN4

alu3alu5

alu4alu2

reg3reg6

reg3

reg1

reg6

reg7

IN7

alu5

IN6

IN5

alu7

alu3

alu6

alu3

alu6

reg4 reg4reg2 reg2 reg2reg1 reg1 reg3 reg3reg5

reg5

reg5 reg5reg6 reg6 reg7

reg7

Stateregisters

Next-stateStart

Control signals

logic

Outputlogic

+

F

F

F

+ + +

minus minus minus minus

1

radic2(1 + i)

1

radic2(1 + i)

alu_comp0 alu_comp1

reg7

reg6

reg5

reg3

reg1

reg2

reg4

reg0

alu_comp

alu_comp0

1

|k1⟩|k2⟩

|j1⟩|j2⟩

|k3⟩|j3⟩

times times times times

times times times times

Figure 14 Proposed serial-parallel architecture for 3-qubit QFTThe multipliers shown in this diagram represent the multiplication of inputcomplex number with constant 1radic2

1 sign bit 1 decimal bit N mantissa bits

Figure 15 Fixed point representation format

states after measurement are constrained in the range of 0 to1 only one bit is required to represent the integer part

Due to the limitations of the classical digital computingplatform representation of qubit amplitudes with infiniteprecision is infeasible In the context of quantum computermodelling particularly for quantum systems with large qubitsizes minimising precision loss is critical to preserve theconsistency of the quantum state during the modellingprocess [43] Here we investigate how precision error is

affected by the number of mantissa bits used in our fixedpoint representation format The corresponding experimen-tal results are given in Figure 16

Precision error shown in Figure 16 is computed based onthe following equations

error 119903 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119903

119898minus simulate 119903

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119903

119898

1003816100381610038161003816

error 119894 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119894

119898minus simulate 119894

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119894

119898

1003816100381610038161003816

precision error = 1

2119899+1(error 119903 + error 119894)

(28)

International Journal of Reconfigurable Computing 13

14 15 16 17 18 19 20 21 22 23 240

005

01

015

02

025

03

035

04

045

05

Number of mantissa bits

Prec

ision

erro

r (

)

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

Figure 16 Precision error against different fixed point formats used

where simulate 119903 and simulate 119894 are the floating point realand imaginary amplitudes of the reference output state gen-erated from simulation whereas emulate 119903 and emulate 119894 arethe amplitudes extracted from the output state of proposedFPGA emulator (converted from original fixed point formatto floating point for verification purposes)

Figure 16 shows that 16-bit fixed point format (14mantissabits) incurs significant precision error for 5-qubit emulationsin both case studies However the error produced by 2-qubitemulations is insignificant This behaviour is because theamplification of the fixed point truncation errors grows withthe size of quantum system For FPGA emulation purposesprecision error due to fixed point representation can bereduced by increasing the number ofmantissa bits with trade-off on resource utilization By expanding the number ofmantissa bits up to 24-bit (which results in 26 total numberof bits) negligible precision error for 5-qubit emulations isattained It is important to note that the proposed FPGAemulator is parameterizable in terms of the number ofmantissa bits for fixed point representation This is crucial toensure different fixed point formats can be applied to emulatequantum circuit of various sizes based on the demandedprecision error tolerance and resource constraint

The size of our fixed point number format also affectsresource utilization The corresponding experimental resultsforQFT case study are shown in Figure 17 Since the resourcesavailable on FPGA device are mostly in blocks or multiples of8 bits the choice of 26-bit fixed point format is not suitableHence we apply 22-mantissa-bit size (ie total number of bitsis 24) in our fixed point representation formats Note thatfor Groverrsquos search emulations the experiment on resourceutilization of DSP blocks is not relevant because FPGAemulation model developed here (as depicted in Figure 12)does not involve multiplication

53 Efficiency of Proposed FPGA Emulation Architecture Inthis subsection we investigate how the increase in qubit size

impacts resource growth and maximum allowable operatingfrequency in different emulation architectures (ie concur-rent pipeline and serial-parallel) In the case of QFT emu-lation model we have two versions of serial-parallel archi-tectures Type 1 serial-parallel uses DSP blocks to performmultiplications whereas type 2 replaces the multiplications(required in Hadamard gate operations 1198801 1198804 and 1198806 inthe 3-qubit QFT case) with shift-add operations Althoughconventional hardware design methods encourage the usageof shift-add operation instead of multiplication to reduceresource utilization the case is now different with FPGAdevices containing efficient built-in DSP blocks Figures 18and 19 show the results of the experiments conducted onQFTand Groverrsquos search algorithms respectively

Based on the experimental results shown in Figure 18when comparing to type 1 type 2 method consumes lessDSP blocks but more logic elements due to the constructionof adders needed in shift-add operations As the number ofqubits increases resource utilization of logic elements fortype 1 emulation grows rapidly when available DSP blocks areused up and it ends up with similar resource utilization asthe type 2 approach Hence we can conclude that for large-scale FPGA emulations both methods would lead to similarresource utilization Thus the first approach is preferred dueto the ease of design process where DSP blocks are utilized bydefault when implementing the design with the FPGA designautomation tool

In contrast to the concurrent and pipeline designs theexperimental results for QFT and Groverrsquos search show thatthe proposed serial-parallel architecture achieves balance onboth resource utilization and operating frequency The pro-posed architecture has significantly reduced resource growthin the application of logic elements dedicated logic regis-ters and DSP block yet maintaining reasonable operatingfrequency With the concurrent and pipeline architectures5-qubit QFT emulation completely used up the resourcesavailable in the Altera Stratix IV FPGA device used in thiswork However the same device can support up to 7-qubitQFT emulationwith the proposed serial-parallel architectureIt is important to note that the processing power of 7-qubitQFT is far higher than the 5-qubit implementation as an 119899-qubit QFT can process up to 2119899 samples in one evaluation

For scalability software simulation would depend onresources available on the computer servers The scalabilityof the FPGA emulation framework would depend on theresources available on target FPGA devices As there havebeen rapid advances in FPGA technology in recent years bydesigning an efficient architecture that is implemented in ahigh-density FPGA device (such as the Altera Stratix 10 thatcontains up to 55 million logic elements) one can actuallyemulate large qubit size quantum circuit on a single FPGAchip Furthermore new approach to FPGA emulation maybe made through the exploration of efficient data structuresandmodellingmethodsThus the proposedwork contributesto the formulation of a proof-of-concept baseline FPGAemulation framework with optimization on datapath designsthat can be extended to emulate practical large-scale quantumcircuits

14 International Journal of Reconfigurable Computing

14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Number of mantissa bits

Com

bina

tiona

l ALU

Ts

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

times104

(a) Logic element usage

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

14 15 16 17 18 19 20 21 22 23 240

500

1000

1500

2000

2500

3000

3500

Number of mantissa bits

Ded

icat

ed lo

gic r

egist

ers

(b) Register usage

2-qubit QFT5-qubit QFT

14 15 16 17 18 19 20 21 22 23 240

200

400

600

800

1000

1200

Number of mantissa bits

DSP

blo

ck 1

8-bi

t ele

men

t

(c) DSP block usage

Figure 17 Resource utilization for different fixed point formats used

54 Benchmarking between Simulation and Emulation Hereour emulation models are benchmarked against the equiv-alent software simulations The simulation models used arebased on an open source quantum library 119897119894119887119902119906119886119899119905119906119898 [28]Software simulation is performed on an Intel Core i7-4790eight-core processor with 36GHz clock rate running on aLinux-based Ubuntu 1404 kernel whereas hardware emu-lation is based on the Altera Stratix IV EP4SGX530KF43C4FPGA Table 1 shows the runtime comparison betweensimulation and our emulation

Figure 20 illustrates the runtime speed-up (simula-tionemulation) of Groverrsquos search case study It is importantto note that the proposed hardware emulation is implementedbased on 24-bit fixed point format (which is determined

Table 1 Comparison of runtime between simulation and emulation

QFT runtime (s) Groverrsquos search runtime (s)Simulation Emulation Simulation Emulation

2-qubit 155 times 10minus6 358 times 10minus9 296 times 10minus6 46 times 10minus9

3-qubit 466 times 10minus6 805 times 10minus9 740 times 10minus6 120 times 10minus9

4-qubit 510 times 10minus6 1344 times 10minus9 2207 times 10minus6 214 times 10minus9

5-qubit 565 times 10minus6 2193 times 10minus9 3988 times 10minus6 367 times 10minus9

6-qubit mdash mdash 10696 times 10minus6 627 times 10minus9

7-qubit mdash mdash 27714 times 10minus6 968 times 10minus9

based on the experimental work presented in Section 52)whereas 32-bit single precision float is used in the software

International Journal of Reconfigurable Computing 15

2 25 3 35 4 45 50

2

4

6

8

10

12

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

times104

(a) Logic element usage

2 25 3 35 4 45 50

05

1

15

2

25

3

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

times104

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(b) Register usage

2 25 3 35 4 45 50

200

400

600

800

1000

1200

Number of qubits

DSP

blo

ck 1

8-bi

t ele

men

t

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(c) DSP block usage

2 25 3 35 4 45 50

50

100

150

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(d) Maximum operating frequency

Figure 18 Resource utilization and maximum operating frequency for different emulation architectures of QFT (based on 24-bit fixed pointformat)

simulation to describe the quantum state For a larger qubitsize quantum circuit the number of mantissa bits for fixedpoint representation has to be increased accordingly to ensuresimilar precision as in software simulation can be achievedwith trade-offs on resource utilization and execution speed[44]

From Figure 20 it can be observed that the proposedhardware emulation provides significant speed-up over soft-ware simulation using 119897119894119887119902119906119886119899119905119906119898 It is important to notethat the achieved speed-up increases drastically as the num-ber of qubits increasesThis result further supports the notionthat hardware emulation has significant potential in themodelling of a large-scale quantum system on the classicalcomputing platform based on FPGA

As the number of required IO pins for emulating QFTand Groverrsquos search algorithms with parallel read-outs istoo much to fit in the existing FPGA devices board-levelverification is infeasible Although the usage of multiplexersreduces the number of output pins resource consumptionrises significantly with the increase in the number of qubitsand this affects the analysis of the overall experiment Thusthe estimated runtime of the proposed FPGA emulationarchitecture is obtained based on the hardware clock cycleand the operating frequency that is acquired from the FPGAdevelopment toolThis is not a critical issue in future practicaldeployment as the selected case studies are meant to work ascore modules within other quantum applications that mightnot require parallel read-out

16 International Journal of Reconfigurable Computing

2 25 3 35 4 45 5 55 6 65 70

1

2

3

4

5

6

7

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipelineSerial-parallel

times104

(a) Logic element usage

2 25 3 35 4 45 5 55 6 65 70

05

1

15

2

25

3

35

4

5

45

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

ConcurrentPipelineSerial-parallel

times104

(b) Register usage

2 25 3 35 4 45 5 55 6 65 70

50

100

150

200

250

300

350

400

450

500

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipelineSerial-parallel

(c) Maximum operating frequency

Figure 19 Resource utilization and maximum operating frequency for different emulation architectures of Groverrsquos search (based on 24-bitfixed point format)

6 Conclusion and Future Work

Efficient resource utilization is critical for FPGA-basedimplementations especially for emulating quantum comput-ing applications as they typically exhibit exponential resourcerequirement with increasing number of qubits In this paperwe proposed a baseline FPGA emulation framework withfocus on the datapath design optimization based on theconventional state vector model as well as an effectivemethodology that facilitates accurate modelling of quantumalgorithms for FPGAemulationA serial-parallel architecture

with efficient resource utilization for FPGA-based emulationof quantum computing is presentedThe proposed emulationarchitecture achieves linear reduction in resource utilizationcompared to pipeline implementations as found in previousworks This work has also demonstrated the advantage ofFPGA emulation over software simulation where hardwareemulation of 7-qubit Groverrsquos search is about 3 times 104 timesfaster than the software simulation performed on Intel Corei7-4790 eight-core processor running at 36GHz clock rate

However experimental results obtained in this workshow that it is difficult to realize a scalable and flexible

International Journal of Reconfigurable Computing 17

2 25 3 35 4 45 5 55 6 65 705

1

15

2

25

3

Number of qubits

Runt

ime s

peed

-up

(sim

ulat

ion

emul

atio

n)

times104

Figure 20 Runtime speed-up against number of qubits in Groverrsquossearch emulation

emulation platform for large qubit size real-world quantumsystem using the approach that applies existing state vectorquantum models This concurs with [45] that states that thepractical limit on the size of the quantum system that canbe modelled on classical computing platform can hardly beovercome due to exponentially large memory requirementsfor storing the entire state vector Hence this suggests thata model with a more effective data structure to representquantum systems is required Recently the work on stabi-lizer frames [30] has shown promise in providing a moresuitable data structure for quantum models targeted forFPGA emulation This is the subject of future work in ourresearch in applying FPGA emulation in modelling of large-scale quantum systems In addition the error-correctionstructure available with stabilizer frames will be consideredfor application in practical quantum computations With amore efficient modelling technique FPGA can represent amore efficient emulation strategy of quantum systems

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the Ministry of Higher Education(MOHE) and Universiti Teknologi Malaysia (UTM) underFundamental Research Grant Scheme (FRGS) Vote number4F422

References

[1] P W Shor ldquoAlgorithms for quantum computation discretelogarithms and factoringrdquo in Proceedings of the 35th IEEEAnnual Symposium on Foundations of Computer Science(SFCS rsquo94) pp 124ndash134 Santa Fe NM USA November 1994

[2] L K Grover ldquoQuantum mechanics helps in searching for aneedle in a haystackrdquo Physical Review Letters vol 79 no 2 pp325ndash328 1997

[3] A Malossini and T Calarco ldquoQuantum genetic optimizationrdquoIEEE Transactions on Evolutionary Computation vol 12 no 2pp 231ndash241 2008

[4] R L Rivest A Shamir and L Adleman ldquoA method forobtaining digital signatures and public-key cryptosystemsrdquoCommunications of the ACM vol 21 no 2 pp 120ndash126 1978

[5] L K Grover ldquoA fast quantum mechanical algorithm fordatabase searchrdquo in Proceedings of the 28th Annual ACMSymposium onTheory of Computing pp 212ndash219 ACM 1996

[6] N Shenvi J Kempe andK BWhaley ldquoQuantum random-walksearch algorithmrdquo Physical Review A Atomic Molecular andOptical Physics vol 67 no 5 Article ID 052307 2003

[7] E Farhi J Goldstone and S Gutmann ldquoA quantum algorithmfor the Hamiltonian NAND treerdquo Theory of Computing vol 4pp 169ndash190 2008

[8] P W Shor ldquoWhy havenrsquot more quantum algorithms beenfoundrdquo Journal of the ACM vol 50 no 1 pp 87ndash90 2003

[9] D R Simon ldquoOn the power of quantum computationrdquo SIAMJournal on Computing vol 26 no 5 pp 1474ndash1483 1997

[10] S Hallgren ldquoPolynomial-time quantum algorithms for Pellrsquosequation and the principal ideal problemrdquo Journal of the ACMvol 54 article 4 2007

[11] M Mosca and A Ekert ldquoThe hidden subgroup problem andeigenvalue estimation on a quantum computerrdquo in Quan-tum Computing and Quantum Communications pp 174ndash188Springer Berlin Germany 1999

[12] D Bacon A M Childs and W Van Dam ldquoFrom optimalmeasurement to efficient quantum algorithms for the hiddensubgroup problem over semidirect product groupsrdquo in Pro-ceedings of the 46th Annual IEEE Symposium on Foundationsof Computer Science (FOCS rsquo05) pp 469ndash478 IEEE October2005

[13] L K Grover and A M Sengupta ldquoFrom coupled pendulums toquantum searchrdquo inMathematics of QuantumComputation pp119ndash134 Chapman and HallCRC 2002

[14] R P Feynman ldquoSimulating physics with computersrdquo Interna-tional Journal ofTheoretical Physics vol 21 no 6-7 pp 467ndash4881982

[15] N S Yanofsky and M A Mannucci Quantum Computingfor Computer Scientists vol 20 Cambridge University PressCambridge UK 2008

[16] C Monroe D M Meekhof B E King W M Itano and DJ Wineland ldquoDemonstration of a fundamental quantum logicgaterdquo Physical Review Letters vol 75 no 25 pp 4714ndash4717 1995

[17] N A Gershenfeld and I L Chuang ldquoBulk spin-resonancequantum computationrdquo Science vol 275 no 5298 pp 350ndash3561997

[18] J E Mooij T P Orlando L Levitov L Tian C H Van derWaland S Lloyd ldquoJosephson persistent-current qubitrdquo Science vol285 no 5430 pp 1036ndash1039 1999

[19] I L Chuang N Gershenfeld and M Kubinec ldquoExperimentalimplementation of fast quantum searchingrdquo Physical ReviewLetters vol 80 no 15 article 3408 1998

[20] R Barends J Kelly A Megrant et al ldquoSuperconducting quan-tum circuits at the surface code threshold for fault tolerancerdquoNature vol 508 no 7497 pp 500ndash503 2014

[21] M H Amin N G Dickson and P Smith ldquoAdiabatic quantumoptimization with quditsrdquo Quantum Information Processingvol 12 no 4 pp 1819ndash1829 2013

18 International Journal of Reconfigurable Computing

[22] A D King E Hoskinson T Lanting E Andriyash and M HAmin ldquoDegeneracy degree and heavy tails in quantumanneal-ingrdquo httparxivorgabs151207325

[23] T F Roslashnnow Z Wang J Job et al ldquoDefining and detectingquantum speeduprdquo Science vol 345 no 6195 pp 420ndash4242014

[24] Y Goto and M Fujishima ldquoEfficient quantum computingemulation system with unitary macro-operationsrdquo JapaneseJournal of Applied Physics vol 46 no 4 pp 2278ndash2282 2007

[25] A U Khalid Z Zilic and K Radecka ldquoFPGA emulation ofquantum circuitsrdquo in Proceedings of the IEEE International Con-ference on Computer Design VLSI in Computers and Processors(ICCD rsquo04) pp 310ndash315 IEEE October 2004

[26] J F Rivera-Miranda A J Caicedo-Beltran J D Valencia-Payan J M Espinosa-Duran and J Velasco-Medina ldquoHard-ware emulation of quantum Fourier transformrdquo in Proceedingsof the IEEE 2nd Latin American Symposium on Circuits and Sys-tems (LASCAS rsquo11) pp 1ndash4 IEEE Bogata Colombia February2011

[27] M Aminian M Saeedi M S Zamani andM Sedighi ldquoFPGA-based circuit model emulation of quantum algorithmsrdquo inProceedings of the IEEE Computer Society Annual Symposiumon VLSI (ISVLSI rsquo08) pp 399ndash404 IEEE Montpellier FranceApril 2008

[28] H Weimer M Muller I Lesanovsky P Zoller and H PBuchler ldquoA Rydberg quantum simulatorrdquoNature Physics vol 6no 5 pp 382ndash388 2010

[29] G F Viamontes Efficient quantum circuit simulation [PhDthesis] The University of Michigan Ann Arbor Mich USA2007

[30] H J Garcıa and I L Markov ldquoSimulation of quantum circuitsvia stabilizer framesrdquo IEEE Transactions on Computers vol 64no 8 pp 2323ndash2336 2015

[31] C P Williams and S H Clearwater Explorations in QuantumComputing Springer Berlin Germany 1998

[32] A Barenco D Deutsch A Ekert and R Jozsa ldquoConditionalquantumdynamics and logic gatesrdquo Physical Review Letters vol74 no 20 pp 4083ndash4086 1995

[33] M A Nielsen and I L Chuang Quantum Computation andQuantum Information Cambridge University Press 2000

[34] W-W Zhang F Gao B Liu Q-Y Wen and H Chen ldquoAwatermark strategy for quantum images based on quantumFourier transformrdquo Quantum Information Processing vol 12no 2 pp 793ndash803 2013

[35] D Curtis and D A Meyer ldquoTowards quantum templatematchingrdquo in Proceedings of the SPIE 48th Annual Meeting inQuantum Communications and Quantum Imaging pp 134ndash141International Society for Optics and Photonics August 2003

[36] R Schutzhold and G Schaller ldquoAdiabatic quantum algorithmsas quantum phase transitions first versus second orderrdquo Physi-cal Review A vol 74 no 6 Article ID 060304 2006

[37] W P Baritompa D W Bulger and G R Wood ldquoGroverrsquosquantum algorithm applied to global optimizationrdquo SIAMJournal on Optimization vol 15 no 4 pp 1170ndash1184 2005

[38] C Durr and P Hoyer ldquoA quantum algorithm for finding theminimumrdquo httparxivorgabsquant-ph9607014

[39] A Montanaro ldquoQuantum pattern matching fast on averagerdquoAlgorithmica 2015

[40] Y H Lee M Khalil-Hani and M N Marsono ldquoFPGA-basedquantum circuit emulation a case study on Quantum Fouriertransformrdquo in Proceedings of the 14th International Symposium

on Integrated Circuits (ISIC rsquo14) pp 512ndash515 IEEE SingaporeDecember 2014

[41] M Khalil-Hani Y H Lee and M N Marsono ldquoAn accu-rate FPGA-based hardware emulation on quantum fouriertransformrdquo in Proceedings of the Australasian Symposium onParallel andDistributed Computing (AusPDC rsquo15) vol 1 SydneyAustralia 2015 a1b3

[42] M Frigo and S G Johnson ldquoThe design and implementationof FFTW3rdquo Proceedings of the IEEE vol 93 no 2 pp 216ndash2312005

[43] D Wecker and K M Svore ldquoLIQUi|gt a software design archi-tecture and domain-specific language for quantum computingrdquohttparxivorgabs14024467

[44] S Kilts Advanced FPGA Design Architecture Implementationand Optimization John Wiley amp Sons New York NY USA2007

[45] M Smelyanskiy N P D Sawaya and A Aspuru-GuzikldquoqHiPSTER the quantum high performance software testingenvironmentrdquo httparxivorgabs160107195

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 6: Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

6 International Journal of Reconfigurable Computing

(1) Start with |0⟩ as the target input qubits(2) Apply 119899-qubit Hadamard gate on target qubits119867otimes119899

(3) forradic2119899 times do(4) Apply phase inversion operation 119880

119891(119868 otimes 119867)

(5) Apply inversion about mean operation on the target qubits minus119868 + 2119860(6) end for(7) Measure the target qubits

Algorithm 1 Groverrsquos search algorithm

all elements is computed and inversions are made aboutthe mean The overall mean remains unchanged after theinversion process This is because the distance between oneelement and the mean is the same before and after inversionThe only change is if the original sequence is above themean during the inversion it is flipped about the mean to thesame distance below the mean and vice versa In general theinversion about mean operation can be expressed as

V1015840 = minusV + 2119886 (16)

where 119886 is the mean V is the value of an element in the arrayand V1015840 is the new value of that element after inversion

In terms of matrices the mean of a 2119899 elements vector119881 is obtained by the product of matrices 119860 and 119881 where allthe elements in the 2119899-by-2119899 matrix 119860 are set to 12119899 Henceinversion about mean in matrix form becomes

1198811015840= minus119881 + 2119860119881 = (minus119868 + 2119860)119881 (17)

In order to achieve high confidence of getting the desiredelement the amplitude amplification process (amplitudeamplification in Groverrsquos search algorithm involves phaseinversion and inversion about mean operations) has to berepeated for radic2119899 times This is because the probability ofsuccess changes sinusoidally by the number of amplitudeamplification iterations (as illustrated in Figure 3) andthe highest probability of success first happened after therequired iterations Pseudocode for a generic Groverrsquos searchalgorithm is given in Algorithm 1

There are two approaches to model Groverrsquos searchquantum algorithm The first approach which is based onquantum circuitmodel is discussed nextThe secondmethodis modelling using arithmetic functions and this is presentedin Section 42 since this approach is applied in this paper

As shown in Figure 4 Groverrsquos search circuit given in [15]is constructed with assumption that black box modules 119880

119891

and minus119868+2119860 are available Descriptions of the119880119891and minus119868+2119860

modules have been given earlierOn the other hand the circuit model for 119899-qubit Groverrsquos

algorithm presented in [33] is shown in Figure 5(a) In thisfigure119867 is the Hadamard gate and 119866 is the Grover iterationcircuit which is illustrated in Figure 5(b)

The function of Grover iteration circuit is equivalentto the phase inversion and inversion about mean In orderto achieve high probability of successful search 119866 is con-catenated for radic2119899 times In Figure 5(b) the role of oraclemodule is to recognize the solution to a particular search

0 20 40 60 80 100 120 140 160 180 2000

010203040506070809

1

Number of iterations

Mea

sure

men

t pro

babi

lity

Element-of-interestOther elements

Figure 3 Probability of success by the number of amplitude amplifi-cation iterations amongst 210 probabilities For 10-qubit search firsthighest probability of success happens at 25th iteration

|0⟩

|1⟩

minusI + 2A

Phase inversionInversion

Hotimesn

Ufn n n n

H

about mean

Repeat radic2n times

Figure 4 Modelling of Groverrsquos search based on quantum circuitmodel [15]

problem in the phase inversion operation By monitoringthe oracle qubit a solution to the search problem can bedetected through the changes of the oracle qubit The designof oraclemodule varieswith different search applications andan example of the oracle circuit for a simple 3-bit search task isshown in Figure 6 In Figure 6 the119883 symbol represents Pauli-119883matrixThe open circle notation indicates conditioning onthe qubit being set to zero whereas the closed circle indicatesconditioning on the qubit being set to one

Deriving from the circuits in Figure 5 the corresponding3-qubit Groverrsquos search circuit is provided in Figure 7 Thiscircuit model is made up of Hadamard oracle quantumNOT and multiqubit controlled-NOT gates

International Journal of Reconfigurable Computing 7

|0⟩

|1⟩

Target qubits

Oracle qubit

Hotimesn Hotimesnn n

n

G G

H H

middot middot middot

O(radic2n)

(a) 119899-qubit Groverrsquos search

Oracle workspace

Phase

Oraclen-qubits Hotimesn Hotimesn

For x gt 0|x⟩ 997888rarr (minus1)f(x)|x⟩

|0⟩ 997888rarr |0⟩

|x⟩ 997888rarr minus|x⟩

(b) Grover iteration circuit 119866

Figure 5 Quantum circuit model for Groverrsquos search algorithm [33]

X

Xequiv

Figure 6 Oracle circuit for recognizing binary string ldquo010rdquo

4 Proposed FPGA-Based Hardware Emulation

This section presents our approach in modelling quantumFourier transform and Groverrsquos search algorithms for FPGAemulation The proposed techniques can be generalized toFPGA emulation of more complex quantum algorithms thatapply QFT or Groverrsquos algorithm This paper extends ourearlierwork presented in [40 41] In these previousworks thehardware architecture proposed was restricted to the serialdesign with resource sharing facilitated at the register levelIn this paper we enable resource sharing at the operator (orcomputational) level that allows for more efficient emulationof the quantum algorithms Furthermore additional casestudy on Groverrsquos search algorithm is included for general-ization of the proposed framework The choice of hardwarearchitecture varies based on the need of different applicationsBased on the selected case studies the efficiencies of differenthardware architectures for quantum computing emulationpurposes are discussed and analysed in this work The goalis to achieve scalability and also efficient resource utilizationfor emulating practical larger qubit size quantum systems

41 Modelling QFT for FPGA Emulation The derivation ofquantum circuit model for 119899-qubit QFT was discussed earlieris Section 34 Here we present the modelling of QFT forFPGA emulation based on a 3-qubit example According to(14) the mathematical expression for 3-qubit QFT is derivedas shown in

QFT23

1003816100381610038161003816119895⟩ =

1

radic23(|0⟩ + 119890

21205871198940sdot1198953|1⟩)

sdot (|0⟩ + 11989021205871198940sdot11989521198953

|1⟩)

sdot (|0⟩ + 11989021205871198940sdot119895111989521198953

|1⟩)

(18)

Deriving from the general 119899-qubit QFT circuit providedin Figure 1 the corresponding quantum circuit for 3-qubitQFT is obtained as shown in Figure 8

The circuit consists of Hadamard gates 119867 controlledphase-shift gates 119862119877

2and 119862119877

3 and also the SWAP gate

Referring to the functional block diagram given in Figure 9this quantum circuit model corresponds to a sequence ofunitary transformations 119880119894 119894 = 1 to 7 defined by

1198801 = 119867 otimes 119868 otimes 119868 (19)

1198802 =1198621198772otimes 119868 (20)

1198803 = (119868 otimes SWAP) sdot (1198621198773otimes 119868) sdot (119868 otimes SWAP) (21)

1198804 = 119868 otimes 119867 otimes 119868 (22)

1198805 = 119868 otimes1198621198772 (23)

1198806 = 119868 otimes 119868 otimes 119867 (24)

1198807 = SWAP23 (25)

Note that in Figure 9 the modelling of 119899-qubit quan-tum system with superposition and entanglement propertiesresulted in a circuit with 2119899 signals Since the input samplesto QFT circuit are encoded as sequence of amplitudes inan entangled superposition of basis states (discussed in Sec-tion 34) modelling based on individual qubit with separatequantum gate operations is unable to reflect the effects ofapplying a quantum gate on entangled qubits correctly

In order to model the effect of superposition and entan-glement derivation of each unitary transformation is madethrough the tensor product of individual quantum gate andidentity matrix to form unitary matrix of equal dimensionwith the quantum state vector Detailed derivations of thequantum unitary matrices for the 3-qubit QFT have beenpresented in our previous paper [41]

Since these quantum unitary matrices are mostly sparsematrices we extract minimal number of useful arithmeticoperations (due to nonzero elements in the matrices) result-ing in an optimal realization of themodel that can bemappedto an efficient FPGA emulation architecture Incidentally asoftware program has been developed to automate this map-ping hence easily scaling up the circuit model to larger qubitsizes From these arithmetic functions the correspondingdata-flow graph for the 3-qubit QFT is derived as shown inFigure 10

In Figure 10 each IN and OUT signal is a fixed pointcomplex number register for an element of the quantumstate vector The operation of module 119865 corresponds toout 119903 = minusin 119894 and out 119894 = in 119903 where 119865 is the multiplication

8 International Journal of Reconfigurable Computing

|0⟩

|0⟩

|0⟩

|1⟩

Target qubits

Grover iteration circuit GGrover iteration circuit G

Oracle qubit

Oracle Oracle

H H

H H

H

H

H

H

H

H

H

H

H

H

H

H

H

H

H HH HX

X

X

X

X

X

X

X

X

X

X

X

H

H

Figure 7 Modelling of 3-qubit Groverrsquos search based on quantum gates

H

H

H

R2

R2

R3

U1 U2 U3 U4 U5 U6 U7

|k1⟩

|k2⟩

|j1⟩

|j2⟩

|k3⟩|j3⟩

1

radic2(|0⟩ + e2120587i0middotj1j2j3|1⟩)

1

radic2(|0⟩ + e2120587i0middotj2j3|1⟩)

1

radic2(|0⟩ + e2120587i0middotj3|1⟩)

Figure 8 Quantum circuit for 3-qubit QFT

IN0

IN7U1 U2 U3 U4 U5 U6 U7

8 8

OUT0

OUT7

Figure 9 The 3-qubit QFT in terms of a sequence of unitarytransformations

of the input complex number with imaginary number 119894This is applied in unitary transformations 1198802 and 1198805 Theoperation ofmoduleCMULT in Figure 10 is described in (26)The function of CMULT is the multiplication of the inputcomplex number with a constant complex number which isderived based on the controlled phase-shift gate As expressedpreviously in (21) it is used in unitary transformation 1198803

out 119903 = (in 119903 times constant 119903) minus (in 119894 times constant 119894)

out 119894 = (in 119894 times constant 119903) + (in 119903 times constant 119894) (26)

42 Modelling Groverrsquos Search for FPGA Emulation As men-tioned earlier the second approach of modelling Groverrsquosquantum search is modelling using arithmetic functions(based on mathematical model) In this work we havechosen this approach of modelling Groverrsquos algorithm forFPGA emulation In contrast to the quantum circuit modelapproach that involves complex large dimensional matrixoperations (ie matrix multiplication and tensor product)the chosen method can utilize the computational resourcesavailable on FPGA such as comparators adderssubtractersand multiplexers for efficient emulation of Groverrsquos searchalgorithm

This technique is based on phase inversion and inver-sion about mean as described in Section 35 As shown inFigure 11 mathematically Groverrsquos search for a databasewith 2119899 elements mainly involves the processes of invertingthe phase of target element function 119865 and performing

inversion about mean on all elements function minus119868 + 2119860 forradic2119899 times Initialization of the quantum state vector with

equal probability function INIT is carried out once in thebeginning of the process

For the case study of a 3-bit search problem we derivethe data-flow graphs and obtain the result of the requiredarithmetic functions as shown in Figure 12 For experimentalpurposes the oracle module is developed by comparingall elements in database with targeted element using com-parators COMP modules The arithmetic functions of theinversion about mean are derived through straightforwardcomputations that involve summation bit shift and subtrac-tion

43 Architecture of Proposed FPGA Emulation Model It isclear that if implemented on classical computing platformsthe resource utilization for a quantum system would growexponentially Hence the choice of suitable architectureis critical for FPGA-based quantum computing emulationHere we discuss the efficiencies of different architecturalchoices in datapath concurrent pipeline serial and serial-parallel The block diagram of various architectures is con-structed based on the example of 3-qubit QFT

431 Concurrent Processing In concurrent processing of analgorithm all computations are completed within a clockcycle In the case of 3-qubitQFT computation blocks betweenthe input and output registers through the functional blocks1198801 to 1198807 are performed in one clock cycle (refer to Fig-ure 13(a)) However such an architecture consumes enor-mous resources such that the number of registers required toemulate an n-qubit QFT is 2119899+1 In addition the critical pathdelay is very high which results in unrealistic low operatingfrequency

432 Pipeline Architecture Most of the prior works onFPGA-based quantum circuit emulation [25ndash27] are devel-oped based on pipeline architectureThepipeline architecturehas the advantages of high throughput and much shortercritical path delay Figure 13(b) shows the proposed pipelinearchitecture of the 3-qubit QFT However the main issue ofthis approach is that resource utilization grows drasticallyby the number of qubits due to the circuit augmentation ofpipeline registers 2119899(sum119899

119896=1119896+2)pipeline registers are required

to emulate n-qubit QFT Consequently hardware emulationscalability is highly constrained by the available resources inFPGA

International Journal of Reconfigurable Computing 9

IN0

IN0 IN4 IN1 IN5 IN2 IN6 IN3 IN7 IN0 IN4 IN1 IN5 IN2 IN6 IN3 IN7

IN1 IN2 IN3 IN4 IN5 IN6 IN7

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

U7

U6

U5

U4

U3

U2F

F F

F

U1+ + + + minus minus minus minus

+

++ + +minus

minus + minus + minus + minus

minus minus minus

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5

out2_5

out2_6 out2_7

out2_7

CMULTCMULT

out3_0

out3_0 out3_0

out3_1

out3_1 out3_1

out3_2

out3_2 out3_2

out3_3

out3_3 out3_3

out3_4

out3_4 out3_4

out3_5

out3_5 out3_5

out3_6

out3_6 out3_6

out3_7

out3_7 out3_7

out4_0 out4_1 out4_2 out4_3 out4_4 out4_5 out4_6 out4_7

out5_0

out5_0

out5_1

out5_1 out5_0 out5_1

out5_2

out5_2

out5_3

out5_3 out5_2 out5_3

out5_4

out5_4

out5_5

out5_5 out5_4 out5_5

out5_6

out5_6 out5_6

out5_7

out5_7out5_7

out6_0 out6_1

out6_1

out6_2 out6_3

out6_3

out6_4

out6_4

out6_5 out6_6

out6_6

out6_7

1

radic2(1 + i)

1

radic2(1 + i)

times times times times times times times times

times times times times times times times times

times times times times times times times times

Figure 10 Data-flow graph for 3-qubit QFT The multipliers shown in this diagram represent the multiplication of input complex numberwith constant 1radic2

433 Serial Processing Although serial design requires mul-tiple iterations to perform a complex computation it opensup the opportunity for resource sharing Serial processingis suitable for applications where resource utilization is acritical design constraint Figure 13(c) depicts the serial formof the 3-qubit QFT circuit that consists of a control unitand a datapath unit As resources can be reused betweentransformations a serial-based n-qubit QFT consumes 2119899registers a register utilization that is much lower than inconcurrent or pipeline architectures However pure serial

approach forfeits the purpose of conducting FPGA emulationwhose aim is to exploit the parallelism inherent in a quantumsystem as it would still suffer from slow sequential behaviouras exhibited in simulation on classical computer

434 Proposed Serial-Parallel Architecture In this paperwe propose a hybrid serial-parallel architecture for FPGAemulation of quantum algorithms The proposed approachtakes advantage of both serial and parallel design techniques

10 International Journal of Reconfigurable Computing

Repeat radic2n times

2n 2n 2n

Phase inversion

INIT

Inversionabout mean

minusI + 2AF

Figure 11 Mathematical modelling of Groverrsquos search algorithm

Applying the concepts of quantum parallelism and quantumdynamicsmodelled by sequential transformations on a quan-tum state vector it is found that the proposed serial-parallelarchitecture is suitable for efficient and accurate quantumcomputing emulation on FPGA platform

Figure 14 shows the functional block diagram of theproposed serial-parallel FPGA emulation architecture of the3-qubit QFT The serial-parallel design of the datapath unitinvolves a number of quantum computation units that canperform parallel computations for each stage of unitarytransformation whereby the same computational resourcescan be reused for the following stage of transformations

For data storage and synchronization purposes 2119899 reg-isters are shared between unitary transformations As com-pared to the pipeline design our proposed serial-parallelapproach achieves linear reduction on the usage of registersto emulate the same quantum system The arithmetic logicunit (ALU) in the datapath unit contains multiple customprocessing elements and the allocation of resources in ALUvaries based on the target application The number of pro-cessing elements is basically determined based on the desired2119899 parallelism in the n-qubit quantum system

As illustrated in Figures 10 and 12 the data-flow graph ofQFT and Groverrsquos search algorithm exhibit similar repetitivepattern between unitary transformations This implies thatthe proposed serial-parallel approach can be generalized forthe two case studies to achieve balance in both resourceutilization and speed performance For the case of Groverrsquossearch the ALUwould be theGrover iterationmodule shownin Figure 12 whereas the control unit is designed to keep trackon the number of Grover iterations required by the targetsearch problem

5 Experimental Results

The proposed emulation designs are modelled in SystemVer-ilog HDL synthesized using Altera Quartus II softwareand implemented into target emulation platform which isbased on Altera Stratix IV EP4SGX530KF43C4 FPGA InSection 51 we discuss the verification of the hardwareemulation designs for the QFT and Groverrsquos search casestudies In addition the automated process for scaling upthe design to larger qubit size is described In Section 52investigation is conducted to study the effects of the numberof mantissa bits used in our fixed point representation formaton resource utilization and precision error In the sectionthat follows we analyse for different emulation architectureshow the increase in qubit size impacts on resource growthand maximum operating frequency allowed in the designs

Finally the runtime speeds in simulation and emulation forquantum algorithms with different qubit sizes are compared

51 Design Verification To show that the quantum algo-rithms have been modelled accurately design verificationof the proposed emulation hardware is performed Goldenreferences based on the software simulationmodels (in C) aredeveloped and their outputs are comparedwith the emulationhardware under test (which are described in SystemVerilogHDL)

In our QFT case study FFTW3 [42] a widely applied fastFourier transform library in C is used to perform Fouriertransform computations on the signal samples that are usedin this work The outputs of the classical Fourier transformthen serve as the golden referencemodel in verification of theproposed emulation model Furthermore since the discreteFourier transform (DFT) is a linear transformation that canbe defined in unitary matrix form the functional correctnessof our QFT hardware emulation model can be convenientlyverified against the DFTmatrixThe expression of an 119899-qubitDFT matrix is shown in

DFT = 1

radic2119899

[

[

[

[

[

[

[

[

[

[

[

1 1 1 1

1 1205961

1205962

1205962119899minus1

1 1205962

1205964

1205962(2119899minus1)

1 1205962119899minus11205962(2119899minus1) 120596

(2119899minus1)(2

119899minus1)

]

]

]

]

]

]

]

]

]

]

]

(27)

where 120596 is the 2119899th root of unity that is 120596 = 11989021205871198942119899

Thechoice of 11989021205871198942

119899

or 119890minus21205871198942119899

is purely a matter of convention asboth the term 11989021205871198942

119899

and the term 119890minus21205871198942119899

to the power of 2119899are equal to 1

On the other hand the design of Groverrsquos search FPGAemulation model is verified against the mathematical modelprovided in literature The simulation model of Groverrsquossearch algorithm also serves as the golden reference modelfor verification purposes

For the development of FPGA emulation models forpractical quantum computing applications it is importantthat the emulation hardware can be scaled up to larger qubitsize architectures In this work the designs are scaled up withthe aid of software program developed in-house HDL codesof the two case studies are autogenerated by the softwareprogram based on the proposed modelling techniques (asdiscussed in Section 4) The generated HDL code producesefficient hardware emulation model based on the proposedserial-parallel architecture

52 Fixed Point Representation As defined in (2) a quantumstate vector is represented by complex floating point numbersTo ensure effective resource utilization in our FPGA emula-tion hardware floating point numbers are replaced by fixedpoint representations In this work a fixed point format with1 sign bit 1 integer bit and 119873 mantissa bits (as shown inFigure 15) is used Since the amplitudes of a quantum statethat is the probabilities of collapsing into computational basis

International Journal of Reconfigurable Computing 11

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

minusIN0

IN0

IN0 minusIN1IN1 minusIN2IN2 minusIN3IN3 minusIN4IN4 minusIN5IN5 minusIN6IN6 minusIN7IN7

Target

Target Target Target Target Target Target Target Target

Target Target Target Target Target Target Target

COMP

COMPCOMPCOMPCOMPCOMPCOMPCOMPCOMP

COMPCOMPCOMPCOMPCOMPCOMPCOMP

IN1 IN2 IN3 IN4 IN5 IN6 IN7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

+ + + +

++

+

+ +

+

+ + +

+

out1_0 minusout1_0 out1_1 minusout1_1 out1_2 minusout1_2 out1_3 minusout1_3 out1_4 minusout1_4 out1_5 minusout1_5 out1_6 minusout1_6 out1_7 minusout1_7

minus

minus minus minus minus minus minus minus minus

minus minus minus minus minus minus minus

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

Groveriteration 2

Grover

INIT

iteration 1

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

≫(n minus 1)

≫(n minus 1)

Figure 12 Data-flow graph of Groverrsquos circuit for 3-bit search

U1 U2 U3 U4 U5 U6 U78 8

IN0

IN7

OUT0

OUT7

(a) Concurrent processing

U4U1 U2 U3 U5 U6 U788

IN0

IN7

OUT0

OUT7

(b) Pipeline architecture

88

Control unit

Datapath unit

Shared

Arithmeticlogic unit

(ALU)

registersIN0

IN7

OUT0

OUT7

(c) Serial architecture

Figure 13 Three-qubit QFT implemented using different hardware architectures

12 International Journal of Reconfigurable Computing

CU

ALU

DU

alu3alu2alu1alu0

alu4 alu5 alu6 alu7

CMULTCMULT

reg0reg4

IN0

Shared registers

IN1alu0

alu1alu4

IN2alu2alu4alu1IN3

IN4

alu3alu5

alu4alu2

reg3reg6

reg3

reg1

reg6

reg7

IN7

alu5

IN6

IN5

alu7

alu3

alu6

alu3

alu6

reg4 reg4reg2 reg2 reg2reg1 reg1 reg3 reg3reg5

reg5

reg5 reg5reg6 reg6 reg7

reg7

Stateregisters

Next-stateStart

Control signals

logic

Outputlogic

+

F

F

F

+ + +

minus minus minus minus

1

radic2(1 + i)

1

radic2(1 + i)

alu_comp0 alu_comp1

reg7

reg6

reg5

reg3

reg1

reg2

reg4

reg0

alu_comp

alu_comp0

1

|k1⟩|k2⟩

|j1⟩|j2⟩

|k3⟩|j3⟩

times times times times

times times times times

Figure 14 Proposed serial-parallel architecture for 3-qubit QFTThe multipliers shown in this diagram represent the multiplication of inputcomplex number with constant 1radic2

1 sign bit 1 decimal bit N mantissa bits

Figure 15 Fixed point representation format

states after measurement are constrained in the range of 0 to1 only one bit is required to represent the integer part

Due to the limitations of the classical digital computingplatform representation of qubit amplitudes with infiniteprecision is infeasible In the context of quantum computermodelling particularly for quantum systems with large qubitsizes minimising precision loss is critical to preserve theconsistency of the quantum state during the modellingprocess [43] Here we investigate how precision error is

affected by the number of mantissa bits used in our fixedpoint representation format The corresponding experimen-tal results are given in Figure 16

Precision error shown in Figure 16 is computed based onthe following equations

error 119903 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119903

119898minus simulate 119903

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119903

119898

1003816100381610038161003816

error 119894 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119894

119898minus simulate 119894

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119894

119898

1003816100381610038161003816

precision error = 1

2119899+1(error 119903 + error 119894)

(28)

International Journal of Reconfigurable Computing 13

14 15 16 17 18 19 20 21 22 23 240

005

01

015

02

025

03

035

04

045

05

Number of mantissa bits

Prec

ision

erro

r (

)

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

Figure 16 Precision error against different fixed point formats used

where simulate 119903 and simulate 119894 are the floating point realand imaginary amplitudes of the reference output state gen-erated from simulation whereas emulate 119903 and emulate 119894 arethe amplitudes extracted from the output state of proposedFPGA emulator (converted from original fixed point formatto floating point for verification purposes)

Figure 16 shows that 16-bit fixed point format (14mantissabits) incurs significant precision error for 5-qubit emulationsin both case studies However the error produced by 2-qubitemulations is insignificant This behaviour is because theamplification of the fixed point truncation errors grows withthe size of quantum system For FPGA emulation purposesprecision error due to fixed point representation can bereduced by increasing the number ofmantissa bits with trade-off on resource utilization By expanding the number ofmantissa bits up to 24-bit (which results in 26 total numberof bits) negligible precision error for 5-qubit emulations isattained It is important to note that the proposed FPGAemulator is parameterizable in terms of the number ofmantissa bits for fixed point representation This is crucial toensure different fixed point formats can be applied to emulatequantum circuit of various sizes based on the demandedprecision error tolerance and resource constraint

The size of our fixed point number format also affectsresource utilization The corresponding experimental resultsforQFT case study are shown in Figure 17 Since the resourcesavailable on FPGA device are mostly in blocks or multiples of8 bits the choice of 26-bit fixed point format is not suitableHence we apply 22-mantissa-bit size (ie total number of bitsis 24) in our fixed point representation formats Note thatfor Groverrsquos search emulations the experiment on resourceutilization of DSP blocks is not relevant because FPGAemulation model developed here (as depicted in Figure 12)does not involve multiplication

53 Efficiency of Proposed FPGA Emulation Architecture Inthis subsection we investigate how the increase in qubit size

impacts resource growth and maximum allowable operatingfrequency in different emulation architectures (ie concur-rent pipeline and serial-parallel) In the case of QFT emu-lation model we have two versions of serial-parallel archi-tectures Type 1 serial-parallel uses DSP blocks to performmultiplications whereas type 2 replaces the multiplications(required in Hadamard gate operations 1198801 1198804 and 1198806 inthe 3-qubit QFT case) with shift-add operations Althoughconventional hardware design methods encourage the usageof shift-add operation instead of multiplication to reduceresource utilization the case is now different with FPGAdevices containing efficient built-in DSP blocks Figures 18and 19 show the results of the experiments conducted onQFTand Groverrsquos search algorithms respectively

Based on the experimental results shown in Figure 18when comparing to type 1 type 2 method consumes lessDSP blocks but more logic elements due to the constructionof adders needed in shift-add operations As the number ofqubits increases resource utilization of logic elements fortype 1 emulation grows rapidly when available DSP blocks areused up and it ends up with similar resource utilization asthe type 2 approach Hence we can conclude that for large-scale FPGA emulations both methods would lead to similarresource utilization Thus the first approach is preferred dueto the ease of design process where DSP blocks are utilized bydefault when implementing the design with the FPGA designautomation tool

In contrast to the concurrent and pipeline designs theexperimental results for QFT and Groverrsquos search show thatthe proposed serial-parallel architecture achieves balance onboth resource utilization and operating frequency The pro-posed architecture has significantly reduced resource growthin the application of logic elements dedicated logic regis-ters and DSP block yet maintaining reasonable operatingfrequency With the concurrent and pipeline architectures5-qubit QFT emulation completely used up the resourcesavailable in the Altera Stratix IV FPGA device used in thiswork However the same device can support up to 7-qubitQFT emulationwith the proposed serial-parallel architectureIt is important to note that the processing power of 7-qubitQFT is far higher than the 5-qubit implementation as an 119899-qubit QFT can process up to 2119899 samples in one evaluation

For scalability software simulation would depend onresources available on the computer servers The scalabilityof the FPGA emulation framework would depend on theresources available on target FPGA devices As there havebeen rapid advances in FPGA technology in recent years bydesigning an efficient architecture that is implemented in ahigh-density FPGA device (such as the Altera Stratix 10 thatcontains up to 55 million logic elements) one can actuallyemulate large qubit size quantum circuit on a single FPGAchip Furthermore new approach to FPGA emulation maybe made through the exploration of efficient data structuresandmodellingmethodsThus the proposedwork contributesto the formulation of a proof-of-concept baseline FPGAemulation framework with optimization on datapath designsthat can be extended to emulate practical large-scale quantumcircuits

14 International Journal of Reconfigurable Computing

14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Number of mantissa bits

Com

bina

tiona

l ALU

Ts

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

times104

(a) Logic element usage

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

14 15 16 17 18 19 20 21 22 23 240

500

1000

1500

2000

2500

3000

3500

Number of mantissa bits

Ded

icat

ed lo

gic r

egist

ers

(b) Register usage

2-qubit QFT5-qubit QFT

14 15 16 17 18 19 20 21 22 23 240

200

400

600

800

1000

1200

Number of mantissa bits

DSP

blo

ck 1

8-bi

t ele

men

t

(c) DSP block usage

Figure 17 Resource utilization for different fixed point formats used

54 Benchmarking between Simulation and Emulation Hereour emulation models are benchmarked against the equiv-alent software simulations The simulation models used arebased on an open source quantum library 119897119894119887119902119906119886119899119905119906119898 [28]Software simulation is performed on an Intel Core i7-4790eight-core processor with 36GHz clock rate running on aLinux-based Ubuntu 1404 kernel whereas hardware emu-lation is based on the Altera Stratix IV EP4SGX530KF43C4FPGA Table 1 shows the runtime comparison betweensimulation and our emulation

Figure 20 illustrates the runtime speed-up (simula-tionemulation) of Groverrsquos search case study It is importantto note that the proposed hardware emulation is implementedbased on 24-bit fixed point format (which is determined

Table 1 Comparison of runtime between simulation and emulation

QFT runtime (s) Groverrsquos search runtime (s)Simulation Emulation Simulation Emulation

2-qubit 155 times 10minus6 358 times 10minus9 296 times 10minus6 46 times 10minus9

3-qubit 466 times 10minus6 805 times 10minus9 740 times 10minus6 120 times 10minus9

4-qubit 510 times 10minus6 1344 times 10minus9 2207 times 10minus6 214 times 10minus9

5-qubit 565 times 10minus6 2193 times 10minus9 3988 times 10minus6 367 times 10minus9

6-qubit mdash mdash 10696 times 10minus6 627 times 10minus9

7-qubit mdash mdash 27714 times 10minus6 968 times 10minus9

based on the experimental work presented in Section 52)whereas 32-bit single precision float is used in the software

International Journal of Reconfigurable Computing 15

2 25 3 35 4 45 50

2

4

6

8

10

12

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

times104

(a) Logic element usage

2 25 3 35 4 45 50

05

1

15

2

25

3

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

times104

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(b) Register usage

2 25 3 35 4 45 50

200

400

600

800

1000

1200

Number of qubits

DSP

blo

ck 1

8-bi

t ele

men

t

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(c) DSP block usage

2 25 3 35 4 45 50

50

100

150

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(d) Maximum operating frequency

Figure 18 Resource utilization and maximum operating frequency for different emulation architectures of QFT (based on 24-bit fixed pointformat)

simulation to describe the quantum state For a larger qubitsize quantum circuit the number of mantissa bits for fixedpoint representation has to be increased accordingly to ensuresimilar precision as in software simulation can be achievedwith trade-offs on resource utilization and execution speed[44]

From Figure 20 it can be observed that the proposedhardware emulation provides significant speed-up over soft-ware simulation using 119897119894119887119902119906119886119899119905119906119898 It is important to notethat the achieved speed-up increases drastically as the num-ber of qubits increasesThis result further supports the notionthat hardware emulation has significant potential in themodelling of a large-scale quantum system on the classicalcomputing platform based on FPGA

As the number of required IO pins for emulating QFTand Groverrsquos search algorithms with parallel read-outs istoo much to fit in the existing FPGA devices board-levelverification is infeasible Although the usage of multiplexersreduces the number of output pins resource consumptionrises significantly with the increase in the number of qubitsand this affects the analysis of the overall experiment Thusthe estimated runtime of the proposed FPGA emulationarchitecture is obtained based on the hardware clock cycleand the operating frequency that is acquired from the FPGAdevelopment toolThis is not a critical issue in future practicaldeployment as the selected case studies are meant to work ascore modules within other quantum applications that mightnot require parallel read-out

16 International Journal of Reconfigurable Computing

2 25 3 35 4 45 5 55 6 65 70

1

2

3

4

5

6

7

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipelineSerial-parallel

times104

(a) Logic element usage

2 25 3 35 4 45 5 55 6 65 70

05

1

15

2

25

3

35

4

5

45

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

ConcurrentPipelineSerial-parallel

times104

(b) Register usage

2 25 3 35 4 45 5 55 6 65 70

50

100

150

200

250

300

350

400

450

500

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipelineSerial-parallel

(c) Maximum operating frequency

Figure 19 Resource utilization and maximum operating frequency for different emulation architectures of Groverrsquos search (based on 24-bitfixed point format)

6 Conclusion and Future Work

Efficient resource utilization is critical for FPGA-basedimplementations especially for emulating quantum comput-ing applications as they typically exhibit exponential resourcerequirement with increasing number of qubits In this paperwe proposed a baseline FPGA emulation framework withfocus on the datapath design optimization based on theconventional state vector model as well as an effectivemethodology that facilitates accurate modelling of quantumalgorithms for FPGAemulationA serial-parallel architecture

with efficient resource utilization for FPGA-based emulationof quantum computing is presentedThe proposed emulationarchitecture achieves linear reduction in resource utilizationcompared to pipeline implementations as found in previousworks This work has also demonstrated the advantage ofFPGA emulation over software simulation where hardwareemulation of 7-qubit Groverrsquos search is about 3 times 104 timesfaster than the software simulation performed on Intel Corei7-4790 eight-core processor running at 36GHz clock rate

However experimental results obtained in this workshow that it is difficult to realize a scalable and flexible

International Journal of Reconfigurable Computing 17

2 25 3 35 4 45 5 55 6 65 705

1

15

2

25

3

Number of qubits

Runt

ime s

peed

-up

(sim

ulat

ion

emul

atio

n)

times104

Figure 20 Runtime speed-up against number of qubits in Groverrsquossearch emulation

emulation platform for large qubit size real-world quantumsystem using the approach that applies existing state vectorquantum models This concurs with [45] that states that thepractical limit on the size of the quantum system that canbe modelled on classical computing platform can hardly beovercome due to exponentially large memory requirementsfor storing the entire state vector Hence this suggests thata model with a more effective data structure to representquantum systems is required Recently the work on stabi-lizer frames [30] has shown promise in providing a moresuitable data structure for quantum models targeted forFPGA emulation This is the subject of future work in ourresearch in applying FPGA emulation in modelling of large-scale quantum systems In addition the error-correctionstructure available with stabilizer frames will be consideredfor application in practical quantum computations With amore efficient modelling technique FPGA can represent amore efficient emulation strategy of quantum systems

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the Ministry of Higher Education(MOHE) and Universiti Teknologi Malaysia (UTM) underFundamental Research Grant Scheme (FRGS) Vote number4F422

References

[1] P W Shor ldquoAlgorithms for quantum computation discretelogarithms and factoringrdquo in Proceedings of the 35th IEEEAnnual Symposium on Foundations of Computer Science(SFCS rsquo94) pp 124ndash134 Santa Fe NM USA November 1994

[2] L K Grover ldquoQuantum mechanics helps in searching for aneedle in a haystackrdquo Physical Review Letters vol 79 no 2 pp325ndash328 1997

[3] A Malossini and T Calarco ldquoQuantum genetic optimizationrdquoIEEE Transactions on Evolutionary Computation vol 12 no 2pp 231ndash241 2008

[4] R L Rivest A Shamir and L Adleman ldquoA method forobtaining digital signatures and public-key cryptosystemsrdquoCommunications of the ACM vol 21 no 2 pp 120ndash126 1978

[5] L K Grover ldquoA fast quantum mechanical algorithm fordatabase searchrdquo in Proceedings of the 28th Annual ACMSymposium onTheory of Computing pp 212ndash219 ACM 1996

[6] N Shenvi J Kempe andK BWhaley ldquoQuantum random-walksearch algorithmrdquo Physical Review A Atomic Molecular andOptical Physics vol 67 no 5 Article ID 052307 2003

[7] E Farhi J Goldstone and S Gutmann ldquoA quantum algorithmfor the Hamiltonian NAND treerdquo Theory of Computing vol 4pp 169ndash190 2008

[8] P W Shor ldquoWhy havenrsquot more quantum algorithms beenfoundrdquo Journal of the ACM vol 50 no 1 pp 87ndash90 2003

[9] D R Simon ldquoOn the power of quantum computationrdquo SIAMJournal on Computing vol 26 no 5 pp 1474ndash1483 1997

[10] S Hallgren ldquoPolynomial-time quantum algorithms for Pellrsquosequation and the principal ideal problemrdquo Journal of the ACMvol 54 article 4 2007

[11] M Mosca and A Ekert ldquoThe hidden subgroup problem andeigenvalue estimation on a quantum computerrdquo in Quan-tum Computing and Quantum Communications pp 174ndash188Springer Berlin Germany 1999

[12] D Bacon A M Childs and W Van Dam ldquoFrom optimalmeasurement to efficient quantum algorithms for the hiddensubgroup problem over semidirect product groupsrdquo in Pro-ceedings of the 46th Annual IEEE Symposium on Foundationsof Computer Science (FOCS rsquo05) pp 469ndash478 IEEE October2005

[13] L K Grover and A M Sengupta ldquoFrom coupled pendulums toquantum searchrdquo inMathematics of QuantumComputation pp119ndash134 Chapman and HallCRC 2002

[14] R P Feynman ldquoSimulating physics with computersrdquo Interna-tional Journal ofTheoretical Physics vol 21 no 6-7 pp 467ndash4881982

[15] N S Yanofsky and M A Mannucci Quantum Computingfor Computer Scientists vol 20 Cambridge University PressCambridge UK 2008

[16] C Monroe D M Meekhof B E King W M Itano and DJ Wineland ldquoDemonstration of a fundamental quantum logicgaterdquo Physical Review Letters vol 75 no 25 pp 4714ndash4717 1995

[17] N A Gershenfeld and I L Chuang ldquoBulk spin-resonancequantum computationrdquo Science vol 275 no 5298 pp 350ndash3561997

[18] J E Mooij T P Orlando L Levitov L Tian C H Van derWaland S Lloyd ldquoJosephson persistent-current qubitrdquo Science vol285 no 5430 pp 1036ndash1039 1999

[19] I L Chuang N Gershenfeld and M Kubinec ldquoExperimentalimplementation of fast quantum searchingrdquo Physical ReviewLetters vol 80 no 15 article 3408 1998

[20] R Barends J Kelly A Megrant et al ldquoSuperconducting quan-tum circuits at the surface code threshold for fault tolerancerdquoNature vol 508 no 7497 pp 500ndash503 2014

[21] M H Amin N G Dickson and P Smith ldquoAdiabatic quantumoptimization with quditsrdquo Quantum Information Processingvol 12 no 4 pp 1819ndash1829 2013

18 International Journal of Reconfigurable Computing

[22] A D King E Hoskinson T Lanting E Andriyash and M HAmin ldquoDegeneracy degree and heavy tails in quantumanneal-ingrdquo httparxivorgabs151207325

[23] T F Roslashnnow Z Wang J Job et al ldquoDefining and detectingquantum speeduprdquo Science vol 345 no 6195 pp 420ndash4242014

[24] Y Goto and M Fujishima ldquoEfficient quantum computingemulation system with unitary macro-operationsrdquo JapaneseJournal of Applied Physics vol 46 no 4 pp 2278ndash2282 2007

[25] A U Khalid Z Zilic and K Radecka ldquoFPGA emulation ofquantum circuitsrdquo in Proceedings of the IEEE International Con-ference on Computer Design VLSI in Computers and Processors(ICCD rsquo04) pp 310ndash315 IEEE October 2004

[26] J F Rivera-Miranda A J Caicedo-Beltran J D Valencia-Payan J M Espinosa-Duran and J Velasco-Medina ldquoHard-ware emulation of quantum Fourier transformrdquo in Proceedingsof the IEEE 2nd Latin American Symposium on Circuits and Sys-tems (LASCAS rsquo11) pp 1ndash4 IEEE Bogata Colombia February2011

[27] M Aminian M Saeedi M S Zamani andM Sedighi ldquoFPGA-based circuit model emulation of quantum algorithmsrdquo inProceedings of the IEEE Computer Society Annual Symposiumon VLSI (ISVLSI rsquo08) pp 399ndash404 IEEE Montpellier FranceApril 2008

[28] H Weimer M Muller I Lesanovsky P Zoller and H PBuchler ldquoA Rydberg quantum simulatorrdquoNature Physics vol 6no 5 pp 382ndash388 2010

[29] G F Viamontes Efficient quantum circuit simulation [PhDthesis] The University of Michigan Ann Arbor Mich USA2007

[30] H J Garcıa and I L Markov ldquoSimulation of quantum circuitsvia stabilizer framesrdquo IEEE Transactions on Computers vol 64no 8 pp 2323ndash2336 2015

[31] C P Williams and S H Clearwater Explorations in QuantumComputing Springer Berlin Germany 1998

[32] A Barenco D Deutsch A Ekert and R Jozsa ldquoConditionalquantumdynamics and logic gatesrdquo Physical Review Letters vol74 no 20 pp 4083ndash4086 1995

[33] M A Nielsen and I L Chuang Quantum Computation andQuantum Information Cambridge University Press 2000

[34] W-W Zhang F Gao B Liu Q-Y Wen and H Chen ldquoAwatermark strategy for quantum images based on quantumFourier transformrdquo Quantum Information Processing vol 12no 2 pp 793ndash803 2013

[35] D Curtis and D A Meyer ldquoTowards quantum templatematchingrdquo in Proceedings of the SPIE 48th Annual Meeting inQuantum Communications and Quantum Imaging pp 134ndash141International Society for Optics and Photonics August 2003

[36] R Schutzhold and G Schaller ldquoAdiabatic quantum algorithmsas quantum phase transitions first versus second orderrdquo Physi-cal Review A vol 74 no 6 Article ID 060304 2006

[37] W P Baritompa D W Bulger and G R Wood ldquoGroverrsquosquantum algorithm applied to global optimizationrdquo SIAMJournal on Optimization vol 15 no 4 pp 1170ndash1184 2005

[38] C Durr and P Hoyer ldquoA quantum algorithm for finding theminimumrdquo httparxivorgabsquant-ph9607014

[39] A Montanaro ldquoQuantum pattern matching fast on averagerdquoAlgorithmica 2015

[40] Y H Lee M Khalil-Hani and M N Marsono ldquoFPGA-basedquantum circuit emulation a case study on Quantum Fouriertransformrdquo in Proceedings of the 14th International Symposium

on Integrated Circuits (ISIC rsquo14) pp 512ndash515 IEEE SingaporeDecember 2014

[41] M Khalil-Hani Y H Lee and M N Marsono ldquoAn accu-rate FPGA-based hardware emulation on quantum fouriertransformrdquo in Proceedings of the Australasian Symposium onParallel andDistributed Computing (AusPDC rsquo15) vol 1 SydneyAustralia 2015 a1b3

[42] M Frigo and S G Johnson ldquoThe design and implementationof FFTW3rdquo Proceedings of the IEEE vol 93 no 2 pp 216ndash2312005

[43] D Wecker and K M Svore ldquoLIQUi|gt a software design archi-tecture and domain-specific language for quantum computingrdquohttparxivorgabs14024467

[44] S Kilts Advanced FPGA Design Architecture Implementationand Optimization John Wiley amp Sons New York NY USA2007

[45] M Smelyanskiy N P D Sawaya and A Aspuru-GuzikldquoqHiPSTER the quantum high performance software testingenvironmentrdquo httparxivorgabs160107195

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 7: Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

International Journal of Reconfigurable Computing 7

|0⟩

|1⟩

Target qubits

Oracle qubit

Hotimesn Hotimesnn n

n

G G

H H

middot middot middot

O(radic2n)

(a) 119899-qubit Groverrsquos search

Oracle workspace

Phase

Oraclen-qubits Hotimesn Hotimesn

For x gt 0|x⟩ 997888rarr (minus1)f(x)|x⟩

|0⟩ 997888rarr |0⟩

|x⟩ 997888rarr minus|x⟩

(b) Grover iteration circuit 119866

Figure 5 Quantum circuit model for Groverrsquos search algorithm [33]

X

Xequiv

Figure 6 Oracle circuit for recognizing binary string ldquo010rdquo

4 Proposed FPGA-Based Hardware Emulation

This section presents our approach in modelling quantumFourier transform and Groverrsquos search algorithms for FPGAemulation The proposed techniques can be generalized toFPGA emulation of more complex quantum algorithms thatapply QFT or Groverrsquos algorithm This paper extends ourearlierwork presented in [40 41] In these previousworks thehardware architecture proposed was restricted to the serialdesign with resource sharing facilitated at the register levelIn this paper we enable resource sharing at the operator (orcomputational) level that allows for more efficient emulationof the quantum algorithms Furthermore additional casestudy on Groverrsquos search algorithm is included for general-ization of the proposed framework The choice of hardwarearchitecture varies based on the need of different applicationsBased on the selected case studies the efficiencies of differenthardware architectures for quantum computing emulationpurposes are discussed and analysed in this work The goalis to achieve scalability and also efficient resource utilizationfor emulating practical larger qubit size quantum systems

41 Modelling QFT for FPGA Emulation The derivation ofquantum circuit model for 119899-qubit QFT was discussed earlieris Section 34 Here we present the modelling of QFT forFPGA emulation based on a 3-qubit example According to(14) the mathematical expression for 3-qubit QFT is derivedas shown in

QFT23

1003816100381610038161003816119895⟩ =

1

radic23(|0⟩ + 119890

21205871198940sdot1198953|1⟩)

sdot (|0⟩ + 11989021205871198940sdot11989521198953

|1⟩)

sdot (|0⟩ + 11989021205871198940sdot119895111989521198953

|1⟩)

(18)

Deriving from the general 119899-qubit QFT circuit providedin Figure 1 the corresponding quantum circuit for 3-qubitQFT is obtained as shown in Figure 8

The circuit consists of Hadamard gates 119867 controlledphase-shift gates 119862119877

2and 119862119877

3 and also the SWAP gate

Referring to the functional block diagram given in Figure 9this quantum circuit model corresponds to a sequence ofunitary transformations 119880119894 119894 = 1 to 7 defined by

1198801 = 119867 otimes 119868 otimes 119868 (19)

1198802 =1198621198772otimes 119868 (20)

1198803 = (119868 otimes SWAP) sdot (1198621198773otimes 119868) sdot (119868 otimes SWAP) (21)

1198804 = 119868 otimes 119867 otimes 119868 (22)

1198805 = 119868 otimes1198621198772 (23)

1198806 = 119868 otimes 119868 otimes 119867 (24)

1198807 = SWAP23 (25)

Note that in Figure 9 the modelling of 119899-qubit quan-tum system with superposition and entanglement propertiesresulted in a circuit with 2119899 signals Since the input samplesto QFT circuit are encoded as sequence of amplitudes inan entangled superposition of basis states (discussed in Sec-tion 34) modelling based on individual qubit with separatequantum gate operations is unable to reflect the effects ofapplying a quantum gate on entangled qubits correctly

In order to model the effect of superposition and entan-glement derivation of each unitary transformation is madethrough the tensor product of individual quantum gate andidentity matrix to form unitary matrix of equal dimensionwith the quantum state vector Detailed derivations of thequantum unitary matrices for the 3-qubit QFT have beenpresented in our previous paper [41]

Since these quantum unitary matrices are mostly sparsematrices we extract minimal number of useful arithmeticoperations (due to nonzero elements in the matrices) result-ing in an optimal realization of themodel that can bemappedto an efficient FPGA emulation architecture Incidentally asoftware program has been developed to automate this map-ping hence easily scaling up the circuit model to larger qubitsizes From these arithmetic functions the correspondingdata-flow graph for the 3-qubit QFT is derived as shown inFigure 10

In Figure 10 each IN and OUT signal is a fixed pointcomplex number register for an element of the quantumstate vector The operation of module 119865 corresponds toout 119903 = minusin 119894 and out 119894 = in 119903 where 119865 is the multiplication

8 International Journal of Reconfigurable Computing

|0⟩

|0⟩

|0⟩

|1⟩

Target qubits

Grover iteration circuit GGrover iteration circuit G

Oracle qubit

Oracle Oracle

H H

H H

H

H

H

H

H

H

H

H

H

H

H

H

H

H

H HH HX

X

X

X

X

X

X

X

X

X

X

X

H

H

Figure 7 Modelling of 3-qubit Groverrsquos search based on quantum gates

H

H

H

R2

R2

R3

U1 U2 U3 U4 U5 U6 U7

|k1⟩

|k2⟩

|j1⟩

|j2⟩

|k3⟩|j3⟩

1

radic2(|0⟩ + e2120587i0middotj1j2j3|1⟩)

1

radic2(|0⟩ + e2120587i0middotj2j3|1⟩)

1

radic2(|0⟩ + e2120587i0middotj3|1⟩)

Figure 8 Quantum circuit for 3-qubit QFT

IN0

IN7U1 U2 U3 U4 U5 U6 U7

8 8

OUT0

OUT7

Figure 9 The 3-qubit QFT in terms of a sequence of unitarytransformations

of the input complex number with imaginary number 119894This is applied in unitary transformations 1198802 and 1198805 Theoperation ofmoduleCMULT in Figure 10 is described in (26)The function of CMULT is the multiplication of the inputcomplex number with a constant complex number which isderived based on the controlled phase-shift gate As expressedpreviously in (21) it is used in unitary transformation 1198803

out 119903 = (in 119903 times constant 119903) minus (in 119894 times constant 119894)

out 119894 = (in 119894 times constant 119903) + (in 119903 times constant 119894) (26)

42 Modelling Groverrsquos Search for FPGA Emulation As men-tioned earlier the second approach of modelling Groverrsquosquantum search is modelling using arithmetic functions(based on mathematical model) In this work we havechosen this approach of modelling Groverrsquos algorithm forFPGA emulation In contrast to the quantum circuit modelapproach that involves complex large dimensional matrixoperations (ie matrix multiplication and tensor product)the chosen method can utilize the computational resourcesavailable on FPGA such as comparators adderssubtractersand multiplexers for efficient emulation of Groverrsquos searchalgorithm

This technique is based on phase inversion and inver-sion about mean as described in Section 35 As shown inFigure 11 mathematically Groverrsquos search for a databasewith 2119899 elements mainly involves the processes of invertingthe phase of target element function 119865 and performing

inversion about mean on all elements function minus119868 + 2119860 forradic2119899 times Initialization of the quantum state vector with

equal probability function INIT is carried out once in thebeginning of the process

For the case study of a 3-bit search problem we derivethe data-flow graphs and obtain the result of the requiredarithmetic functions as shown in Figure 12 For experimentalpurposes the oracle module is developed by comparingall elements in database with targeted element using com-parators COMP modules The arithmetic functions of theinversion about mean are derived through straightforwardcomputations that involve summation bit shift and subtrac-tion

43 Architecture of Proposed FPGA Emulation Model It isclear that if implemented on classical computing platformsthe resource utilization for a quantum system would growexponentially Hence the choice of suitable architectureis critical for FPGA-based quantum computing emulationHere we discuss the efficiencies of different architecturalchoices in datapath concurrent pipeline serial and serial-parallel The block diagram of various architectures is con-structed based on the example of 3-qubit QFT

431 Concurrent Processing In concurrent processing of analgorithm all computations are completed within a clockcycle In the case of 3-qubitQFT computation blocks betweenthe input and output registers through the functional blocks1198801 to 1198807 are performed in one clock cycle (refer to Fig-ure 13(a)) However such an architecture consumes enor-mous resources such that the number of registers required toemulate an n-qubit QFT is 2119899+1 In addition the critical pathdelay is very high which results in unrealistic low operatingfrequency

432 Pipeline Architecture Most of the prior works onFPGA-based quantum circuit emulation [25ndash27] are devel-oped based on pipeline architectureThepipeline architecturehas the advantages of high throughput and much shortercritical path delay Figure 13(b) shows the proposed pipelinearchitecture of the 3-qubit QFT However the main issue ofthis approach is that resource utilization grows drasticallyby the number of qubits due to the circuit augmentation ofpipeline registers 2119899(sum119899

119896=1119896+2)pipeline registers are required

to emulate n-qubit QFT Consequently hardware emulationscalability is highly constrained by the available resources inFPGA

International Journal of Reconfigurable Computing 9

IN0

IN0 IN4 IN1 IN5 IN2 IN6 IN3 IN7 IN0 IN4 IN1 IN5 IN2 IN6 IN3 IN7

IN1 IN2 IN3 IN4 IN5 IN6 IN7

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

U7

U6

U5

U4

U3

U2F

F F

F

U1+ + + + minus minus minus minus

+

++ + +minus

minus + minus + minus + minus

minus minus minus

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5

out2_5

out2_6 out2_7

out2_7

CMULTCMULT

out3_0

out3_0 out3_0

out3_1

out3_1 out3_1

out3_2

out3_2 out3_2

out3_3

out3_3 out3_3

out3_4

out3_4 out3_4

out3_5

out3_5 out3_5

out3_6

out3_6 out3_6

out3_7

out3_7 out3_7

out4_0 out4_1 out4_2 out4_3 out4_4 out4_5 out4_6 out4_7

out5_0

out5_0

out5_1

out5_1 out5_0 out5_1

out5_2

out5_2

out5_3

out5_3 out5_2 out5_3

out5_4

out5_4

out5_5

out5_5 out5_4 out5_5

out5_6

out5_6 out5_6

out5_7

out5_7out5_7

out6_0 out6_1

out6_1

out6_2 out6_3

out6_3

out6_4

out6_4

out6_5 out6_6

out6_6

out6_7

1

radic2(1 + i)

1

radic2(1 + i)

times times times times times times times times

times times times times times times times times

times times times times times times times times

Figure 10 Data-flow graph for 3-qubit QFT The multipliers shown in this diagram represent the multiplication of input complex numberwith constant 1radic2

433 Serial Processing Although serial design requires mul-tiple iterations to perform a complex computation it opensup the opportunity for resource sharing Serial processingis suitable for applications where resource utilization is acritical design constraint Figure 13(c) depicts the serial formof the 3-qubit QFT circuit that consists of a control unitand a datapath unit As resources can be reused betweentransformations a serial-based n-qubit QFT consumes 2119899registers a register utilization that is much lower than inconcurrent or pipeline architectures However pure serial

approach forfeits the purpose of conducting FPGA emulationwhose aim is to exploit the parallelism inherent in a quantumsystem as it would still suffer from slow sequential behaviouras exhibited in simulation on classical computer

434 Proposed Serial-Parallel Architecture In this paperwe propose a hybrid serial-parallel architecture for FPGAemulation of quantum algorithms The proposed approachtakes advantage of both serial and parallel design techniques

10 International Journal of Reconfigurable Computing

Repeat radic2n times

2n 2n 2n

Phase inversion

INIT

Inversionabout mean

minusI + 2AF

Figure 11 Mathematical modelling of Groverrsquos search algorithm

Applying the concepts of quantum parallelism and quantumdynamicsmodelled by sequential transformations on a quan-tum state vector it is found that the proposed serial-parallelarchitecture is suitable for efficient and accurate quantumcomputing emulation on FPGA platform

Figure 14 shows the functional block diagram of theproposed serial-parallel FPGA emulation architecture of the3-qubit QFT The serial-parallel design of the datapath unitinvolves a number of quantum computation units that canperform parallel computations for each stage of unitarytransformation whereby the same computational resourcescan be reused for the following stage of transformations

For data storage and synchronization purposes 2119899 reg-isters are shared between unitary transformations As com-pared to the pipeline design our proposed serial-parallelapproach achieves linear reduction on the usage of registersto emulate the same quantum system The arithmetic logicunit (ALU) in the datapath unit contains multiple customprocessing elements and the allocation of resources in ALUvaries based on the target application The number of pro-cessing elements is basically determined based on the desired2119899 parallelism in the n-qubit quantum system

As illustrated in Figures 10 and 12 the data-flow graph ofQFT and Groverrsquos search algorithm exhibit similar repetitivepattern between unitary transformations This implies thatthe proposed serial-parallel approach can be generalized forthe two case studies to achieve balance in both resourceutilization and speed performance For the case of Groverrsquossearch the ALUwould be theGrover iterationmodule shownin Figure 12 whereas the control unit is designed to keep trackon the number of Grover iterations required by the targetsearch problem

5 Experimental Results

The proposed emulation designs are modelled in SystemVer-ilog HDL synthesized using Altera Quartus II softwareand implemented into target emulation platform which isbased on Altera Stratix IV EP4SGX530KF43C4 FPGA InSection 51 we discuss the verification of the hardwareemulation designs for the QFT and Groverrsquos search casestudies In addition the automated process for scaling upthe design to larger qubit size is described In Section 52investigation is conducted to study the effects of the numberof mantissa bits used in our fixed point representation formaton resource utilization and precision error In the sectionthat follows we analyse for different emulation architectureshow the increase in qubit size impacts on resource growthand maximum operating frequency allowed in the designs

Finally the runtime speeds in simulation and emulation forquantum algorithms with different qubit sizes are compared

51 Design Verification To show that the quantum algo-rithms have been modelled accurately design verificationof the proposed emulation hardware is performed Goldenreferences based on the software simulationmodels (in C) aredeveloped and their outputs are comparedwith the emulationhardware under test (which are described in SystemVerilogHDL)

In our QFT case study FFTW3 [42] a widely applied fastFourier transform library in C is used to perform Fouriertransform computations on the signal samples that are usedin this work The outputs of the classical Fourier transformthen serve as the golden referencemodel in verification of theproposed emulation model Furthermore since the discreteFourier transform (DFT) is a linear transformation that canbe defined in unitary matrix form the functional correctnessof our QFT hardware emulation model can be convenientlyverified against the DFTmatrixThe expression of an 119899-qubitDFT matrix is shown in

DFT = 1

radic2119899

[

[

[

[

[

[

[

[

[

[

[

1 1 1 1

1 1205961

1205962

1205962119899minus1

1 1205962

1205964

1205962(2119899minus1)

1 1205962119899minus11205962(2119899minus1) 120596

(2119899minus1)(2

119899minus1)

]

]

]

]

]

]

]

]

]

]

]

(27)

where 120596 is the 2119899th root of unity that is 120596 = 11989021205871198942119899

Thechoice of 11989021205871198942

119899

or 119890minus21205871198942119899

is purely a matter of convention asboth the term 11989021205871198942

119899

and the term 119890minus21205871198942119899

to the power of 2119899are equal to 1

On the other hand the design of Groverrsquos search FPGAemulation model is verified against the mathematical modelprovided in literature The simulation model of Groverrsquossearch algorithm also serves as the golden reference modelfor verification purposes

For the development of FPGA emulation models forpractical quantum computing applications it is importantthat the emulation hardware can be scaled up to larger qubitsize architectures In this work the designs are scaled up withthe aid of software program developed in-house HDL codesof the two case studies are autogenerated by the softwareprogram based on the proposed modelling techniques (asdiscussed in Section 4) The generated HDL code producesefficient hardware emulation model based on the proposedserial-parallel architecture

52 Fixed Point Representation As defined in (2) a quantumstate vector is represented by complex floating point numbersTo ensure effective resource utilization in our FPGA emula-tion hardware floating point numbers are replaced by fixedpoint representations In this work a fixed point format with1 sign bit 1 integer bit and 119873 mantissa bits (as shown inFigure 15) is used Since the amplitudes of a quantum statethat is the probabilities of collapsing into computational basis

International Journal of Reconfigurable Computing 11

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

minusIN0

IN0

IN0 minusIN1IN1 minusIN2IN2 minusIN3IN3 minusIN4IN4 minusIN5IN5 minusIN6IN6 minusIN7IN7

Target

Target Target Target Target Target Target Target Target

Target Target Target Target Target Target Target

COMP

COMPCOMPCOMPCOMPCOMPCOMPCOMPCOMP

COMPCOMPCOMPCOMPCOMPCOMPCOMP

IN1 IN2 IN3 IN4 IN5 IN6 IN7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

+ + + +

++

+

+ +

+

+ + +

+

out1_0 minusout1_0 out1_1 minusout1_1 out1_2 minusout1_2 out1_3 minusout1_3 out1_4 minusout1_4 out1_5 minusout1_5 out1_6 minusout1_6 out1_7 minusout1_7

minus

minus minus minus minus minus minus minus minus

minus minus minus minus minus minus minus

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

Groveriteration 2

Grover

INIT

iteration 1

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

≫(n minus 1)

≫(n minus 1)

Figure 12 Data-flow graph of Groverrsquos circuit for 3-bit search

U1 U2 U3 U4 U5 U6 U78 8

IN0

IN7

OUT0

OUT7

(a) Concurrent processing

U4U1 U2 U3 U5 U6 U788

IN0

IN7

OUT0

OUT7

(b) Pipeline architecture

88

Control unit

Datapath unit

Shared

Arithmeticlogic unit

(ALU)

registersIN0

IN7

OUT0

OUT7

(c) Serial architecture

Figure 13 Three-qubit QFT implemented using different hardware architectures

12 International Journal of Reconfigurable Computing

CU

ALU

DU

alu3alu2alu1alu0

alu4 alu5 alu6 alu7

CMULTCMULT

reg0reg4

IN0

Shared registers

IN1alu0

alu1alu4

IN2alu2alu4alu1IN3

IN4

alu3alu5

alu4alu2

reg3reg6

reg3

reg1

reg6

reg7

IN7

alu5

IN6

IN5

alu7

alu3

alu6

alu3

alu6

reg4 reg4reg2 reg2 reg2reg1 reg1 reg3 reg3reg5

reg5

reg5 reg5reg6 reg6 reg7

reg7

Stateregisters

Next-stateStart

Control signals

logic

Outputlogic

+

F

F

F

+ + +

minus minus minus minus

1

radic2(1 + i)

1

radic2(1 + i)

alu_comp0 alu_comp1

reg7

reg6

reg5

reg3

reg1

reg2

reg4

reg0

alu_comp

alu_comp0

1

|k1⟩|k2⟩

|j1⟩|j2⟩

|k3⟩|j3⟩

times times times times

times times times times

Figure 14 Proposed serial-parallel architecture for 3-qubit QFTThe multipliers shown in this diagram represent the multiplication of inputcomplex number with constant 1radic2

1 sign bit 1 decimal bit N mantissa bits

Figure 15 Fixed point representation format

states after measurement are constrained in the range of 0 to1 only one bit is required to represent the integer part

Due to the limitations of the classical digital computingplatform representation of qubit amplitudes with infiniteprecision is infeasible In the context of quantum computermodelling particularly for quantum systems with large qubitsizes minimising precision loss is critical to preserve theconsistency of the quantum state during the modellingprocess [43] Here we investigate how precision error is

affected by the number of mantissa bits used in our fixedpoint representation format The corresponding experimen-tal results are given in Figure 16

Precision error shown in Figure 16 is computed based onthe following equations

error 119903 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119903

119898minus simulate 119903

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119903

119898

1003816100381610038161003816

error 119894 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119894

119898minus simulate 119894

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119894

119898

1003816100381610038161003816

precision error = 1

2119899+1(error 119903 + error 119894)

(28)

International Journal of Reconfigurable Computing 13

14 15 16 17 18 19 20 21 22 23 240

005

01

015

02

025

03

035

04

045

05

Number of mantissa bits

Prec

ision

erro

r (

)

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

Figure 16 Precision error against different fixed point formats used

where simulate 119903 and simulate 119894 are the floating point realand imaginary amplitudes of the reference output state gen-erated from simulation whereas emulate 119903 and emulate 119894 arethe amplitudes extracted from the output state of proposedFPGA emulator (converted from original fixed point formatto floating point for verification purposes)

Figure 16 shows that 16-bit fixed point format (14mantissabits) incurs significant precision error for 5-qubit emulationsin both case studies However the error produced by 2-qubitemulations is insignificant This behaviour is because theamplification of the fixed point truncation errors grows withthe size of quantum system For FPGA emulation purposesprecision error due to fixed point representation can bereduced by increasing the number ofmantissa bits with trade-off on resource utilization By expanding the number ofmantissa bits up to 24-bit (which results in 26 total numberof bits) negligible precision error for 5-qubit emulations isattained It is important to note that the proposed FPGAemulator is parameterizable in terms of the number ofmantissa bits for fixed point representation This is crucial toensure different fixed point formats can be applied to emulatequantum circuit of various sizes based on the demandedprecision error tolerance and resource constraint

The size of our fixed point number format also affectsresource utilization The corresponding experimental resultsforQFT case study are shown in Figure 17 Since the resourcesavailable on FPGA device are mostly in blocks or multiples of8 bits the choice of 26-bit fixed point format is not suitableHence we apply 22-mantissa-bit size (ie total number of bitsis 24) in our fixed point representation formats Note thatfor Groverrsquos search emulations the experiment on resourceutilization of DSP blocks is not relevant because FPGAemulation model developed here (as depicted in Figure 12)does not involve multiplication

53 Efficiency of Proposed FPGA Emulation Architecture Inthis subsection we investigate how the increase in qubit size

impacts resource growth and maximum allowable operatingfrequency in different emulation architectures (ie concur-rent pipeline and serial-parallel) In the case of QFT emu-lation model we have two versions of serial-parallel archi-tectures Type 1 serial-parallel uses DSP blocks to performmultiplications whereas type 2 replaces the multiplications(required in Hadamard gate operations 1198801 1198804 and 1198806 inthe 3-qubit QFT case) with shift-add operations Althoughconventional hardware design methods encourage the usageof shift-add operation instead of multiplication to reduceresource utilization the case is now different with FPGAdevices containing efficient built-in DSP blocks Figures 18and 19 show the results of the experiments conducted onQFTand Groverrsquos search algorithms respectively

Based on the experimental results shown in Figure 18when comparing to type 1 type 2 method consumes lessDSP blocks but more logic elements due to the constructionof adders needed in shift-add operations As the number ofqubits increases resource utilization of logic elements fortype 1 emulation grows rapidly when available DSP blocks areused up and it ends up with similar resource utilization asthe type 2 approach Hence we can conclude that for large-scale FPGA emulations both methods would lead to similarresource utilization Thus the first approach is preferred dueto the ease of design process where DSP blocks are utilized bydefault when implementing the design with the FPGA designautomation tool

In contrast to the concurrent and pipeline designs theexperimental results for QFT and Groverrsquos search show thatthe proposed serial-parallel architecture achieves balance onboth resource utilization and operating frequency The pro-posed architecture has significantly reduced resource growthin the application of logic elements dedicated logic regis-ters and DSP block yet maintaining reasonable operatingfrequency With the concurrent and pipeline architectures5-qubit QFT emulation completely used up the resourcesavailable in the Altera Stratix IV FPGA device used in thiswork However the same device can support up to 7-qubitQFT emulationwith the proposed serial-parallel architectureIt is important to note that the processing power of 7-qubitQFT is far higher than the 5-qubit implementation as an 119899-qubit QFT can process up to 2119899 samples in one evaluation

For scalability software simulation would depend onresources available on the computer servers The scalabilityof the FPGA emulation framework would depend on theresources available on target FPGA devices As there havebeen rapid advances in FPGA technology in recent years bydesigning an efficient architecture that is implemented in ahigh-density FPGA device (such as the Altera Stratix 10 thatcontains up to 55 million logic elements) one can actuallyemulate large qubit size quantum circuit on a single FPGAchip Furthermore new approach to FPGA emulation maybe made through the exploration of efficient data structuresandmodellingmethodsThus the proposedwork contributesto the formulation of a proof-of-concept baseline FPGAemulation framework with optimization on datapath designsthat can be extended to emulate practical large-scale quantumcircuits

14 International Journal of Reconfigurable Computing

14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Number of mantissa bits

Com

bina

tiona

l ALU

Ts

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

times104

(a) Logic element usage

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

14 15 16 17 18 19 20 21 22 23 240

500

1000

1500

2000

2500

3000

3500

Number of mantissa bits

Ded

icat

ed lo

gic r

egist

ers

(b) Register usage

2-qubit QFT5-qubit QFT

14 15 16 17 18 19 20 21 22 23 240

200

400

600

800

1000

1200

Number of mantissa bits

DSP

blo

ck 1

8-bi

t ele

men

t

(c) DSP block usage

Figure 17 Resource utilization for different fixed point formats used

54 Benchmarking between Simulation and Emulation Hereour emulation models are benchmarked against the equiv-alent software simulations The simulation models used arebased on an open source quantum library 119897119894119887119902119906119886119899119905119906119898 [28]Software simulation is performed on an Intel Core i7-4790eight-core processor with 36GHz clock rate running on aLinux-based Ubuntu 1404 kernel whereas hardware emu-lation is based on the Altera Stratix IV EP4SGX530KF43C4FPGA Table 1 shows the runtime comparison betweensimulation and our emulation

Figure 20 illustrates the runtime speed-up (simula-tionemulation) of Groverrsquos search case study It is importantto note that the proposed hardware emulation is implementedbased on 24-bit fixed point format (which is determined

Table 1 Comparison of runtime between simulation and emulation

QFT runtime (s) Groverrsquos search runtime (s)Simulation Emulation Simulation Emulation

2-qubit 155 times 10minus6 358 times 10minus9 296 times 10minus6 46 times 10minus9

3-qubit 466 times 10minus6 805 times 10minus9 740 times 10minus6 120 times 10minus9

4-qubit 510 times 10minus6 1344 times 10minus9 2207 times 10minus6 214 times 10minus9

5-qubit 565 times 10minus6 2193 times 10minus9 3988 times 10minus6 367 times 10minus9

6-qubit mdash mdash 10696 times 10minus6 627 times 10minus9

7-qubit mdash mdash 27714 times 10minus6 968 times 10minus9

based on the experimental work presented in Section 52)whereas 32-bit single precision float is used in the software

International Journal of Reconfigurable Computing 15

2 25 3 35 4 45 50

2

4

6

8

10

12

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

times104

(a) Logic element usage

2 25 3 35 4 45 50

05

1

15

2

25

3

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

times104

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(b) Register usage

2 25 3 35 4 45 50

200

400

600

800

1000

1200

Number of qubits

DSP

blo

ck 1

8-bi

t ele

men

t

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(c) DSP block usage

2 25 3 35 4 45 50

50

100

150

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(d) Maximum operating frequency

Figure 18 Resource utilization and maximum operating frequency for different emulation architectures of QFT (based on 24-bit fixed pointformat)

simulation to describe the quantum state For a larger qubitsize quantum circuit the number of mantissa bits for fixedpoint representation has to be increased accordingly to ensuresimilar precision as in software simulation can be achievedwith trade-offs on resource utilization and execution speed[44]

From Figure 20 it can be observed that the proposedhardware emulation provides significant speed-up over soft-ware simulation using 119897119894119887119902119906119886119899119905119906119898 It is important to notethat the achieved speed-up increases drastically as the num-ber of qubits increasesThis result further supports the notionthat hardware emulation has significant potential in themodelling of a large-scale quantum system on the classicalcomputing platform based on FPGA

As the number of required IO pins for emulating QFTand Groverrsquos search algorithms with parallel read-outs istoo much to fit in the existing FPGA devices board-levelverification is infeasible Although the usage of multiplexersreduces the number of output pins resource consumptionrises significantly with the increase in the number of qubitsand this affects the analysis of the overall experiment Thusthe estimated runtime of the proposed FPGA emulationarchitecture is obtained based on the hardware clock cycleand the operating frequency that is acquired from the FPGAdevelopment toolThis is not a critical issue in future practicaldeployment as the selected case studies are meant to work ascore modules within other quantum applications that mightnot require parallel read-out

16 International Journal of Reconfigurable Computing

2 25 3 35 4 45 5 55 6 65 70

1

2

3

4

5

6

7

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipelineSerial-parallel

times104

(a) Logic element usage

2 25 3 35 4 45 5 55 6 65 70

05

1

15

2

25

3

35

4

5

45

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

ConcurrentPipelineSerial-parallel

times104

(b) Register usage

2 25 3 35 4 45 5 55 6 65 70

50

100

150

200

250

300

350

400

450

500

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipelineSerial-parallel

(c) Maximum operating frequency

Figure 19 Resource utilization and maximum operating frequency for different emulation architectures of Groverrsquos search (based on 24-bitfixed point format)

6 Conclusion and Future Work

Efficient resource utilization is critical for FPGA-basedimplementations especially for emulating quantum comput-ing applications as they typically exhibit exponential resourcerequirement with increasing number of qubits In this paperwe proposed a baseline FPGA emulation framework withfocus on the datapath design optimization based on theconventional state vector model as well as an effectivemethodology that facilitates accurate modelling of quantumalgorithms for FPGAemulationA serial-parallel architecture

with efficient resource utilization for FPGA-based emulationof quantum computing is presentedThe proposed emulationarchitecture achieves linear reduction in resource utilizationcompared to pipeline implementations as found in previousworks This work has also demonstrated the advantage ofFPGA emulation over software simulation where hardwareemulation of 7-qubit Groverrsquos search is about 3 times 104 timesfaster than the software simulation performed on Intel Corei7-4790 eight-core processor running at 36GHz clock rate

However experimental results obtained in this workshow that it is difficult to realize a scalable and flexible

International Journal of Reconfigurable Computing 17

2 25 3 35 4 45 5 55 6 65 705

1

15

2

25

3

Number of qubits

Runt

ime s

peed

-up

(sim

ulat

ion

emul

atio

n)

times104

Figure 20 Runtime speed-up against number of qubits in Groverrsquossearch emulation

emulation platform for large qubit size real-world quantumsystem using the approach that applies existing state vectorquantum models This concurs with [45] that states that thepractical limit on the size of the quantum system that canbe modelled on classical computing platform can hardly beovercome due to exponentially large memory requirementsfor storing the entire state vector Hence this suggests thata model with a more effective data structure to representquantum systems is required Recently the work on stabi-lizer frames [30] has shown promise in providing a moresuitable data structure for quantum models targeted forFPGA emulation This is the subject of future work in ourresearch in applying FPGA emulation in modelling of large-scale quantum systems In addition the error-correctionstructure available with stabilizer frames will be consideredfor application in practical quantum computations With amore efficient modelling technique FPGA can represent amore efficient emulation strategy of quantum systems

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the Ministry of Higher Education(MOHE) and Universiti Teknologi Malaysia (UTM) underFundamental Research Grant Scheme (FRGS) Vote number4F422

References

[1] P W Shor ldquoAlgorithms for quantum computation discretelogarithms and factoringrdquo in Proceedings of the 35th IEEEAnnual Symposium on Foundations of Computer Science(SFCS rsquo94) pp 124ndash134 Santa Fe NM USA November 1994

[2] L K Grover ldquoQuantum mechanics helps in searching for aneedle in a haystackrdquo Physical Review Letters vol 79 no 2 pp325ndash328 1997

[3] A Malossini and T Calarco ldquoQuantum genetic optimizationrdquoIEEE Transactions on Evolutionary Computation vol 12 no 2pp 231ndash241 2008

[4] R L Rivest A Shamir and L Adleman ldquoA method forobtaining digital signatures and public-key cryptosystemsrdquoCommunications of the ACM vol 21 no 2 pp 120ndash126 1978

[5] L K Grover ldquoA fast quantum mechanical algorithm fordatabase searchrdquo in Proceedings of the 28th Annual ACMSymposium onTheory of Computing pp 212ndash219 ACM 1996

[6] N Shenvi J Kempe andK BWhaley ldquoQuantum random-walksearch algorithmrdquo Physical Review A Atomic Molecular andOptical Physics vol 67 no 5 Article ID 052307 2003

[7] E Farhi J Goldstone and S Gutmann ldquoA quantum algorithmfor the Hamiltonian NAND treerdquo Theory of Computing vol 4pp 169ndash190 2008

[8] P W Shor ldquoWhy havenrsquot more quantum algorithms beenfoundrdquo Journal of the ACM vol 50 no 1 pp 87ndash90 2003

[9] D R Simon ldquoOn the power of quantum computationrdquo SIAMJournal on Computing vol 26 no 5 pp 1474ndash1483 1997

[10] S Hallgren ldquoPolynomial-time quantum algorithms for Pellrsquosequation and the principal ideal problemrdquo Journal of the ACMvol 54 article 4 2007

[11] M Mosca and A Ekert ldquoThe hidden subgroup problem andeigenvalue estimation on a quantum computerrdquo in Quan-tum Computing and Quantum Communications pp 174ndash188Springer Berlin Germany 1999

[12] D Bacon A M Childs and W Van Dam ldquoFrom optimalmeasurement to efficient quantum algorithms for the hiddensubgroup problem over semidirect product groupsrdquo in Pro-ceedings of the 46th Annual IEEE Symposium on Foundationsof Computer Science (FOCS rsquo05) pp 469ndash478 IEEE October2005

[13] L K Grover and A M Sengupta ldquoFrom coupled pendulums toquantum searchrdquo inMathematics of QuantumComputation pp119ndash134 Chapman and HallCRC 2002

[14] R P Feynman ldquoSimulating physics with computersrdquo Interna-tional Journal ofTheoretical Physics vol 21 no 6-7 pp 467ndash4881982

[15] N S Yanofsky and M A Mannucci Quantum Computingfor Computer Scientists vol 20 Cambridge University PressCambridge UK 2008

[16] C Monroe D M Meekhof B E King W M Itano and DJ Wineland ldquoDemonstration of a fundamental quantum logicgaterdquo Physical Review Letters vol 75 no 25 pp 4714ndash4717 1995

[17] N A Gershenfeld and I L Chuang ldquoBulk spin-resonancequantum computationrdquo Science vol 275 no 5298 pp 350ndash3561997

[18] J E Mooij T P Orlando L Levitov L Tian C H Van derWaland S Lloyd ldquoJosephson persistent-current qubitrdquo Science vol285 no 5430 pp 1036ndash1039 1999

[19] I L Chuang N Gershenfeld and M Kubinec ldquoExperimentalimplementation of fast quantum searchingrdquo Physical ReviewLetters vol 80 no 15 article 3408 1998

[20] R Barends J Kelly A Megrant et al ldquoSuperconducting quan-tum circuits at the surface code threshold for fault tolerancerdquoNature vol 508 no 7497 pp 500ndash503 2014

[21] M H Amin N G Dickson and P Smith ldquoAdiabatic quantumoptimization with quditsrdquo Quantum Information Processingvol 12 no 4 pp 1819ndash1829 2013

18 International Journal of Reconfigurable Computing

[22] A D King E Hoskinson T Lanting E Andriyash and M HAmin ldquoDegeneracy degree and heavy tails in quantumanneal-ingrdquo httparxivorgabs151207325

[23] T F Roslashnnow Z Wang J Job et al ldquoDefining and detectingquantum speeduprdquo Science vol 345 no 6195 pp 420ndash4242014

[24] Y Goto and M Fujishima ldquoEfficient quantum computingemulation system with unitary macro-operationsrdquo JapaneseJournal of Applied Physics vol 46 no 4 pp 2278ndash2282 2007

[25] A U Khalid Z Zilic and K Radecka ldquoFPGA emulation ofquantum circuitsrdquo in Proceedings of the IEEE International Con-ference on Computer Design VLSI in Computers and Processors(ICCD rsquo04) pp 310ndash315 IEEE October 2004

[26] J F Rivera-Miranda A J Caicedo-Beltran J D Valencia-Payan J M Espinosa-Duran and J Velasco-Medina ldquoHard-ware emulation of quantum Fourier transformrdquo in Proceedingsof the IEEE 2nd Latin American Symposium on Circuits and Sys-tems (LASCAS rsquo11) pp 1ndash4 IEEE Bogata Colombia February2011

[27] M Aminian M Saeedi M S Zamani andM Sedighi ldquoFPGA-based circuit model emulation of quantum algorithmsrdquo inProceedings of the IEEE Computer Society Annual Symposiumon VLSI (ISVLSI rsquo08) pp 399ndash404 IEEE Montpellier FranceApril 2008

[28] H Weimer M Muller I Lesanovsky P Zoller and H PBuchler ldquoA Rydberg quantum simulatorrdquoNature Physics vol 6no 5 pp 382ndash388 2010

[29] G F Viamontes Efficient quantum circuit simulation [PhDthesis] The University of Michigan Ann Arbor Mich USA2007

[30] H J Garcıa and I L Markov ldquoSimulation of quantum circuitsvia stabilizer framesrdquo IEEE Transactions on Computers vol 64no 8 pp 2323ndash2336 2015

[31] C P Williams and S H Clearwater Explorations in QuantumComputing Springer Berlin Germany 1998

[32] A Barenco D Deutsch A Ekert and R Jozsa ldquoConditionalquantumdynamics and logic gatesrdquo Physical Review Letters vol74 no 20 pp 4083ndash4086 1995

[33] M A Nielsen and I L Chuang Quantum Computation andQuantum Information Cambridge University Press 2000

[34] W-W Zhang F Gao B Liu Q-Y Wen and H Chen ldquoAwatermark strategy for quantum images based on quantumFourier transformrdquo Quantum Information Processing vol 12no 2 pp 793ndash803 2013

[35] D Curtis and D A Meyer ldquoTowards quantum templatematchingrdquo in Proceedings of the SPIE 48th Annual Meeting inQuantum Communications and Quantum Imaging pp 134ndash141International Society for Optics and Photonics August 2003

[36] R Schutzhold and G Schaller ldquoAdiabatic quantum algorithmsas quantum phase transitions first versus second orderrdquo Physi-cal Review A vol 74 no 6 Article ID 060304 2006

[37] W P Baritompa D W Bulger and G R Wood ldquoGroverrsquosquantum algorithm applied to global optimizationrdquo SIAMJournal on Optimization vol 15 no 4 pp 1170ndash1184 2005

[38] C Durr and P Hoyer ldquoA quantum algorithm for finding theminimumrdquo httparxivorgabsquant-ph9607014

[39] A Montanaro ldquoQuantum pattern matching fast on averagerdquoAlgorithmica 2015

[40] Y H Lee M Khalil-Hani and M N Marsono ldquoFPGA-basedquantum circuit emulation a case study on Quantum Fouriertransformrdquo in Proceedings of the 14th International Symposium

on Integrated Circuits (ISIC rsquo14) pp 512ndash515 IEEE SingaporeDecember 2014

[41] M Khalil-Hani Y H Lee and M N Marsono ldquoAn accu-rate FPGA-based hardware emulation on quantum fouriertransformrdquo in Proceedings of the Australasian Symposium onParallel andDistributed Computing (AusPDC rsquo15) vol 1 SydneyAustralia 2015 a1b3

[42] M Frigo and S G Johnson ldquoThe design and implementationof FFTW3rdquo Proceedings of the IEEE vol 93 no 2 pp 216ndash2312005

[43] D Wecker and K M Svore ldquoLIQUi|gt a software design archi-tecture and domain-specific language for quantum computingrdquohttparxivorgabs14024467

[44] S Kilts Advanced FPGA Design Architecture Implementationand Optimization John Wiley amp Sons New York NY USA2007

[45] M Smelyanskiy N P D Sawaya and A Aspuru-GuzikldquoqHiPSTER the quantum high performance software testingenvironmentrdquo httparxivorgabs160107195

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 8: Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

8 International Journal of Reconfigurable Computing

|0⟩

|0⟩

|0⟩

|1⟩

Target qubits

Grover iteration circuit GGrover iteration circuit G

Oracle qubit

Oracle Oracle

H H

H H

H

H

H

H

H

H

H

H

H

H

H

H

H

H

H HH HX

X

X

X

X

X

X

X

X

X

X

X

H

H

Figure 7 Modelling of 3-qubit Groverrsquos search based on quantum gates

H

H

H

R2

R2

R3

U1 U2 U3 U4 U5 U6 U7

|k1⟩

|k2⟩

|j1⟩

|j2⟩

|k3⟩|j3⟩

1

radic2(|0⟩ + e2120587i0middotj1j2j3|1⟩)

1

radic2(|0⟩ + e2120587i0middotj2j3|1⟩)

1

radic2(|0⟩ + e2120587i0middotj3|1⟩)

Figure 8 Quantum circuit for 3-qubit QFT

IN0

IN7U1 U2 U3 U4 U5 U6 U7

8 8

OUT0

OUT7

Figure 9 The 3-qubit QFT in terms of a sequence of unitarytransformations

of the input complex number with imaginary number 119894This is applied in unitary transformations 1198802 and 1198805 Theoperation ofmoduleCMULT in Figure 10 is described in (26)The function of CMULT is the multiplication of the inputcomplex number with a constant complex number which isderived based on the controlled phase-shift gate As expressedpreviously in (21) it is used in unitary transformation 1198803

out 119903 = (in 119903 times constant 119903) minus (in 119894 times constant 119894)

out 119894 = (in 119894 times constant 119903) + (in 119903 times constant 119894) (26)

42 Modelling Groverrsquos Search for FPGA Emulation As men-tioned earlier the second approach of modelling Groverrsquosquantum search is modelling using arithmetic functions(based on mathematical model) In this work we havechosen this approach of modelling Groverrsquos algorithm forFPGA emulation In contrast to the quantum circuit modelapproach that involves complex large dimensional matrixoperations (ie matrix multiplication and tensor product)the chosen method can utilize the computational resourcesavailable on FPGA such as comparators adderssubtractersand multiplexers for efficient emulation of Groverrsquos searchalgorithm

This technique is based on phase inversion and inver-sion about mean as described in Section 35 As shown inFigure 11 mathematically Groverrsquos search for a databasewith 2119899 elements mainly involves the processes of invertingthe phase of target element function 119865 and performing

inversion about mean on all elements function minus119868 + 2119860 forradic2119899 times Initialization of the quantum state vector with

equal probability function INIT is carried out once in thebeginning of the process

For the case study of a 3-bit search problem we derivethe data-flow graphs and obtain the result of the requiredarithmetic functions as shown in Figure 12 For experimentalpurposes the oracle module is developed by comparingall elements in database with targeted element using com-parators COMP modules The arithmetic functions of theinversion about mean are derived through straightforwardcomputations that involve summation bit shift and subtrac-tion

43 Architecture of Proposed FPGA Emulation Model It isclear that if implemented on classical computing platformsthe resource utilization for a quantum system would growexponentially Hence the choice of suitable architectureis critical for FPGA-based quantum computing emulationHere we discuss the efficiencies of different architecturalchoices in datapath concurrent pipeline serial and serial-parallel The block diagram of various architectures is con-structed based on the example of 3-qubit QFT

431 Concurrent Processing In concurrent processing of analgorithm all computations are completed within a clockcycle In the case of 3-qubitQFT computation blocks betweenthe input and output registers through the functional blocks1198801 to 1198807 are performed in one clock cycle (refer to Fig-ure 13(a)) However such an architecture consumes enor-mous resources such that the number of registers required toemulate an n-qubit QFT is 2119899+1 In addition the critical pathdelay is very high which results in unrealistic low operatingfrequency

432 Pipeline Architecture Most of the prior works onFPGA-based quantum circuit emulation [25ndash27] are devel-oped based on pipeline architectureThepipeline architecturehas the advantages of high throughput and much shortercritical path delay Figure 13(b) shows the proposed pipelinearchitecture of the 3-qubit QFT However the main issue ofthis approach is that resource utilization grows drasticallyby the number of qubits due to the circuit augmentation ofpipeline registers 2119899(sum119899

119896=1119896+2)pipeline registers are required

to emulate n-qubit QFT Consequently hardware emulationscalability is highly constrained by the available resources inFPGA

International Journal of Reconfigurable Computing 9

IN0

IN0 IN4 IN1 IN5 IN2 IN6 IN3 IN7 IN0 IN4 IN1 IN5 IN2 IN6 IN3 IN7

IN1 IN2 IN3 IN4 IN5 IN6 IN7

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

U7

U6

U5

U4

U3

U2F

F F

F

U1+ + + + minus minus minus minus

+

++ + +minus

minus + minus + minus + minus

minus minus minus

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5

out2_5

out2_6 out2_7

out2_7

CMULTCMULT

out3_0

out3_0 out3_0

out3_1

out3_1 out3_1

out3_2

out3_2 out3_2

out3_3

out3_3 out3_3

out3_4

out3_4 out3_4

out3_5

out3_5 out3_5

out3_6

out3_6 out3_6

out3_7

out3_7 out3_7

out4_0 out4_1 out4_2 out4_3 out4_4 out4_5 out4_6 out4_7

out5_0

out5_0

out5_1

out5_1 out5_0 out5_1

out5_2

out5_2

out5_3

out5_3 out5_2 out5_3

out5_4

out5_4

out5_5

out5_5 out5_4 out5_5

out5_6

out5_6 out5_6

out5_7

out5_7out5_7

out6_0 out6_1

out6_1

out6_2 out6_3

out6_3

out6_4

out6_4

out6_5 out6_6

out6_6

out6_7

1

radic2(1 + i)

1

radic2(1 + i)

times times times times times times times times

times times times times times times times times

times times times times times times times times

Figure 10 Data-flow graph for 3-qubit QFT The multipliers shown in this diagram represent the multiplication of input complex numberwith constant 1radic2

433 Serial Processing Although serial design requires mul-tiple iterations to perform a complex computation it opensup the opportunity for resource sharing Serial processingis suitable for applications where resource utilization is acritical design constraint Figure 13(c) depicts the serial formof the 3-qubit QFT circuit that consists of a control unitand a datapath unit As resources can be reused betweentransformations a serial-based n-qubit QFT consumes 2119899registers a register utilization that is much lower than inconcurrent or pipeline architectures However pure serial

approach forfeits the purpose of conducting FPGA emulationwhose aim is to exploit the parallelism inherent in a quantumsystem as it would still suffer from slow sequential behaviouras exhibited in simulation on classical computer

434 Proposed Serial-Parallel Architecture In this paperwe propose a hybrid serial-parallel architecture for FPGAemulation of quantum algorithms The proposed approachtakes advantage of both serial and parallel design techniques

10 International Journal of Reconfigurable Computing

Repeat radic2n times

2n 2n 2n

Phase inversion

INIT

Inversionabout mean

minusI + 2AF

Figure 11 Mathematical modelling of Groverrsquos search algorithm

Applying the concepts of quantum parallelism and quantumdynamicsmodelled by sequential transformations on a quan-tum state vector it is found that the proposed serial-parallelarchitecture is suitable for efficient and accurate quantumcomputing emulation on FPGA platform

Figure 14 shows the functional block diagram of theproposed serial-parallel FPGA emulation architecture of the3-qubit QFT The serial-parallel design of the datapath unitinvolves a number of quantum computation units that canperform parallel computations for each stage of unitarytransformation whereby the same computational resourcescan be reused for the following stage of transformations

For data storage and synchronization purposes 2119899 reg-isters are shared between unitary transformations As com-pared to the pipeline design our proposed serial-parallelapproach achieves linear reduction on the usage of registersto emulate the same quantum system The arithmetic logicunit (ALU) in the datapath unit contains multiple customprocessing elements and the allocation of resources in ALUvaries based on the target application The number of pro-cessing elements is basically determined based on the desired2119899 parallelism in the n-qubit quantum system

As illustrated in Figures 10 and 12 the data-flow graph ofQFT and Groverrsquos search algorithm exhibit similar repetitivepattern between unitary transformations This implies thatthe proposed serial-parallel approach can be generalized forthe two case studies to achieve balance in both resourceutilization and speed performance For the case of Groverrsquossearch the ALUwould be theGrover iterationmodule shownin Figure 12 whereas the control unit is designed to keep trackon the number of Grover iterations required by the targetsearch problem

5 Experimental Results

The proposed emulation designs are modelled in SystemVer-ilog HDL synthesized using Altera Quartus II softwareand implemented into target emulation platform which isbased on Altera Stratix IV EP4SGX530KF43C4 FPGA InSection 51 we discuss the verification of the hardwareemulation designs for the QFT and Groverrsquos search casestudies In addition the automated process for scaling upthe design to larger qubit size is described In Section 52investigation is conducted to study the effects of the numberof mantissa bits used in our fixed point representation formaton resource utilization and precision error In the sectionthat follows we analyse for different emulation architectureshow the increase in qubit size impacts on resource growthand maximum operating frequency allowed in the designs

Finally the runtime speeds in simulation and emulation forquantum algorithms with different qubit sizes are compared

51 Design Verification To show that the quantum algo-rithms have been modelled accurately design verificationof the proposed emulation hardware is performed Goldenreferences based on the software simulationmodels (in C) aredeveloped and their outputs are comparedwith the emulationhardware under test (which are described in SystemVerilogHDL)

In our QFT case study FFTW3 [42] a widely applied fastFourier transform library in C is used to perform Fouriertransform computations on the signal samples that are usedin this work The outputs of the classical Fourier transformthen serve as the golden referencemodel in verification of theproposed emulation model Furthermore since the discreteFourier transform (DFT) is a linear transformation that canbe defined in unitary matrix form the functional correctnessof our QFT hardware emulation model can be convenientlyverified against the DFTmatrixThe expression of an 119899-qubitDFT matrix is shown in

DFT = 1

radic2119899

[

[

[

[

[

[

[

[

[

[

[

1 1 1 1

1 1205961

1205962

1205962119899minus1

1 1205962

1205964

1205962(2119899minus1)

1 1205962119899minus11205962(2119899minus1) 120596

(2119899minus1)(2

119899minus1)

]

]

]

]

]

]

]

]

]

]

]

(27)

where 120596 is the 2119899th root of unity that is 120596 = 11989021205871198942119899

Thechoice of 11989021205871198942

119899

or 119890minus21205871198942119899

is purely a matter of convention asboth the term 11989021205871198942

119899

and the term 119890minus21205871198942119899

to the power of 2119899are equal to 1

On the other hand the design of Groverrsquos search FPGAemulation model is verified against the mathematical modelprovided in literature The simulation model of Groverrsquossearch algorithm also serves as the golden reference modelfor verification purposes

For the development of FPGA emulation models forpractical quantum computing applications it is importantthat the emulation hardware can be scaled up to larger qubitsize architectures In this work the designs are scaled up withthe aid of software program developed in-house HDL codesof the two case studies are autogenerated by the softwareprogram based on the proposed modelling techniques (asdiscussed in Section 4) The generated HDL code producesefficient hardware emulation model based on the proposedserial-parallel architecture

52 Fixed Point Representation As defined in (2) a quantumstate vector is represented by complex floating point numbersTo ensure effective resource utilization in our FPGA emula-tion hardware floating point numbers are replaced by fixedpoint representations In this work a fixed point format with1 sign bit 1 integer bit and 119873 mantissa bits (as shown inFigure 15) is used Since the amplitudes of a quantum statethat is the probabilities of collapsing into computational basis

International Journal of Reconfigurable Computing 11

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

minusIN0

IN0

IN0 minusIN1IN1 minusIN2IN2 minusIN3IN3 minusIN4IN4 minusIN5IN5 minusIN6IN6 minusIN7IN7

Target

Target Target Target Target Target Target Target Target

Target Target Target Target Target Target Target

COMP

COMPCOMPCOMPCOMPCOMPCOMPCOMPCOMP

COMPCOMPCOMPCOMPCOMPCOMPCOMP

IN1 IN2 IN3 IN4 IN5 IN6 IN7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

+ + + +

++

+

+ +

+

+ + +

+

out1_0 minusout1_0 out1_1 minusout1_1 out1_2 minusout1_2 out1_3 minusout1_3 out1_4 minusout1_4 out1_5 minusout1_5 out1_6 minusout1_6 out1_7 minusout1_7

minus

minus minus minus minus minus minus minus minus

minus minus minus minus minus minus minus

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

Groveriteration 2

Grover

INIT

iteration 1

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

≫(n minus 1)

≫(n minus 1)

Figure 12 Data-flow graph of Groverrsquos circuit for 3-bit search

U1 U2 U3 U4 U5 U6 U78 8

IN0

IN7

OUT0

OUT7

(a) Concurrent processing

U4U1 U2 U3 U5 U6 U788

IN0

IN7

OUT0

OUT7

(b) Pipeline architecture

88

Control unit

Datapath unit

Shared

Arithmeticlogic unit

(ALU)

registersIN0

IN7

OUT0

OUT7

(c) Serial architecture

Figure 13 Three-qubit QFT implemented using different hardware architectures

12 International Journal of Reconfigurable Computing

CU

ALU

DU

alu3alu2alu1alu0

alu4 alu5 alu6 alu7

CMULTCMULT

reg0reg4

IN0

Shared registers

IN1alu0

alu1alu4

IN2alu2alu4alu1IN3

IN4

alu3alu5

alu4alu2

reg3reg6

reg3

reg1

reg6

reg7

IN7

alu5

IN6

IN5

alu7

alu3

alu6

alu3

alu6

reg4 reg4reg2 reg2 reg2reg1 reg1 reg3 reg3reg5

reg5

reg5 reg5reg6 reg6 reg7

reg7

Stateregisters

Next-stateStart

Control signals

logic

Outputlogic

+

F

F

F

+ + +

minus minus minus minus

1

radic2(1 + i)

1

radic2(1 + i)

alu_comp0 alu_comp1

reg7

reg6

reg5

reg3

reg1

reg2

reg4

reg0

alu_comp

alu_comp0

1

|k1⟩|k2⟩

|j1⟩|j2⟩

|k3⟩|j3⟩

times times times times

times times times times

Figure 14 Proposed serial-parallel architecture for 3-qubit QFTThe multipliers shown in this diagram represent the multiplication of inputcomplex number with constant 1radic2

1 sign bit 1 decimal bit N mantissa bits

Figure 15 Fixed point representation format

states after measurement are constrained in the range of 0 to1 only one bit is required to represent the integer part

Due to the limitations of the classical digital computingplatform representation of qubit amplitudes with infiniteprecision is infeasible In the context of quantum computermodelling particularly for quantum systems with large qubitsizes minimising precision loss is critical to preserve theconsistency of the quantum state during the modellingprocess [43] Here we investigate how precision error is

affected by the number of mantissa bits used in our fixedpoint representation format The corresponding experimen-tal results are given in Figure 16

Precision error shown in Figure 16 is computed based onthe following equations

error 119903 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119903

119898minus simulate 119903

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119903

119898

1003816100381610038161003816

error 119894 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119894

119898minus simulate 119894

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119894

119898

1003816100381610038161003816

precision error = 1

2119899+1(error 119903 + error 119894)

(28)

International Journal of Reconfigurable Computing 13

14 15 16 17 18 19 20 21 22 23 240

005

01

015

02

025

03

035

04

045

05

Number of mantissa bits

Prec

ision

erro

r (

)

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

Figure 16 Precision error against different fixed point formats used

where simulate 119903 and simulate 119894 are the floating point realand imaginary amplitudes of the reference output state gen-erated from simulation whereas emulate 119903 and emulate 119894 arethe amplitudes extracted from the output state of proposedFPGA emulator (converted from original fixed point formatto floating point for verification purposes)

Figure 16 shows that 16-bit fixed point format (14mantissabits) incurs significant precision error for 5-qubit emulationsin both case studies However the error produced by 2-qubitemulations is insignificant This behaviour is because theamplification of the fixed point truncation errors grows withthe size of quantum system For FPGA emulation purposesprecision error due to fixed point representation can bereduced by increasing the number ofmantissa bits with trade-off on resource utilization By expanding the number ofmantissa bits up to 24-bit (which results in 26 total numberof bits) negligible precision error for 5-qubit emulations isattained It is important to note that the proposed FPGAemulator is parameterizable in terms of the number ofmantissa bits for fixed point representation This is crucial toensure different fixed point formats can be applied to emulatequantum circuit of various sizes based on the demandedprecision error tolerance and resource constraint

The size of our fixed point number format also affectsresource utilization The corresponding experimental resultsforQFT case study are shown in Figure 17 Since the resourcesavailable on FPGA device are mostly in blocks or multiples of8 bits the choice of 26-bit fixed point format is not suitableHence we apply 22-mantissa-bit size (ie total number of bitsis 24) in our fixed point representation formats Note thatfor Groverrsquos search emulations the experiment on resourceutilization of DSP blocks is not relevant because FPGAemulation model developed here (as depicted in Figure 12)does not involve multiplication

53 Efficiency of Proposed FPGA Emulation Architecture Inthis subsection we investigate how the increase in qubit size

impacts resource growth and maximum allowable operatingfrequency in different emulation architectures (ie concur-rent pipeline and serial-parallel) In the case of QFT emu-lation model we have two versions of serial-parallel archi-tectures Type 1 serial-parallel uses DSP blocks to performmultiplications whereas type 2 replaces the multiplications(required in Hadamard gate operations 1198801 1198804 and 1198806 inthe 3-qubit QFT case) with shift-add operations Althoughconventional hardware design methods encourage the usageof shift-add operation instead of multiplication to reduceresource utilization the case is now different with FPGAdevices containing efficient built-in DSP blocks Figures 18and 19 show the results of the experiments conducted onQFTand Groverrsquos search algorithms respectively

Based on the experimental results shown in Figure 18when comparing to type 1 type 2 method consumes lessDSP blocks but more logic elements due to the constructionof adders needed in shift-add operations As the number ofqubits increases resource utilization of logic elements fortype 1 emulation grows rapidly when available DSP blocks areused up and it ends up with similar resource utilization asthe type 2 approach Hence we can conclude that for large-scale FPGA emulations both methods would lead to similarresource utilization Thus the first approach is preferred dueto the ease of design process where DSP blocks are utilized bydefault when implementing the design with the FPGA designautomation tool

In contrast to the concurrent and pipeline designs theexperimental results for QFT and Groverrsquos search show thatthe proposed serial-parallel architecture achieves balance onboth resource utilization and operating frequency The pro-posed architecture has significantly reduced resource growthin the application of logic elements dedicated logic regis-ters and DSP block yet maintaining reasonable operatingfrequency With the concurrent and pipeline architectures5-qubit QFT emulation completely used up the resourcesavailable in the Altera Stratix IV FPGA device used in thiswork However the same device can support up to 7-qubitQFT emulationwith the proposed serial-parallel architectureIt is important to note that the processing power of 7-qubitQFT is far higher than the 5-qubit implementation as an 119899-qubit QFT can process up to 2119899 samples in one evaluation

For scalability software simulation would depend onresources available on the computer servers The scalabilityof the FPGA emulation framework would depend on theresources available on target FPGA devices As there havebeen rapid advances in FPGA technology in recent years bydesigning an efficient architecture that is implemented in ahigh-density FPGA device (such as the Altera Stratix 10 thatcontains up to 55 million logic elements) one can actuallyemulate large qubit size quantum circuit on a single FPGAchip Furthermore new approach to FPGA emulation maybe made through the exploration of efficient data structuresandmodellingmethodsThus the proposedwork contributesto the formulation of a proof-of-concept baseline FPGAemulation framework with optimization on datapath designsthat can be extended to emulate practical large-scale quantumcircuits

14 International Journal of Reconfigurable Computing

14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Number of mantissa bits

Com

bina

tiona

l ALU

Ts

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

times104

(a) Logic element usage

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

14 15 16 17 18 19 20 21 22 23 240

500

1000

1500

2000

2500

3000

3500

Number of mantissa bits

Ded

icat

ed lo

gic r

egist

ers

(b) Register usage

2-qubit QFT5-qubit QFT

14 15 16 17 18 19 20 21 22 23 240

200

400

600

800

1000

1200

Number of mantissa bits

DSP

blo

ck 1

8-bi

t ele

men

t

(c) DSP block usage

Figure 17 Resource utilization for different fixed point formats used

54 Benchmarking between Simulation and Emulation Hereour emulation models are benchmarked against the equiv-alent software simulations The simulation models used arebased on an open source quantum library 119897119894119887119902119906119886119899119905119906119898 [28]Software simulation is performed on an Intel Core i7-4790eight-core processor with 36GHz clock rate running on aLinux-based Ubuntu 1404 kernel whereas hardware emu-lation is based on the Altera Stratix IV EP4SGX530KF43C4FPGA Table 1 shows the runtime comparison betweensimulation and our emulation

Figure 20 illustrates the runtime speed-up (simula-tionemulation) of Groverrsquos search case study It is importantto note that the proposed hardware emulation is implementedbased on 24-bit fixed point format (which is determined

Table 1 Comparison of runtime between simulation and emulation

QFT runtime (s) Groverrsquos search runtime (s)Simulation Emulation Simulation Emulation

2-qubit 155 times 10minus6 358 times 10minus9 296 times 10minus6 46 times 10minus9

3-qubit 466 times 10minus6 805 times 10minus9 740 times 10minus6 120 times 10minus9

4-qubit 510 times 10minus6 1344 times 10minus9 2207 times 10minus6 214 times 10minus9

5-qubit 565 times 10minus6 2193 times 10minus9 3988 times 10minus6 367 times 10minus9

6-qubit mdash mdash 10696 times 10minus6 627 times 10minus9

7-qubit mdash mdash 27714 times 10minus6 968 times 10minus9

based on the experimental work presented in Section 52)whereas 32-bit single precision float is used in the software

International Journal of Reconfigurable Computing 15

2 25 3 35 4 45 50

2

4

6

8

10

12

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

times104

(a) Logic element usage

2 25 3 35 4 45 50

05

1

15

2

25

3

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

times104

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(b) Register usage

2 25 3 35 4 45 50

200

400

600

800

1000

1200

Number of qubits

DSP

blo

ck 1

8-bi

t ele

men

t

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(c) DSP block usage

2 25 3 35 4 45 50

50

100

150

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(d) Maximum operating frequency

Figure 18 Resource utilization and maximum operating frequency for different emulation architectures of QFT (based on 24-bit fixed pointformat)

simulation to describe the quantum state For a larger qubitsize quantum circuit the number of mantissa bits for fixedpoint representation has to be increased accordingly to ensuresimilar precision as in software simulation can be achievedwith trade-offs on resource utilization and execution speed[44]

From Figure 20 it can be observed that the proposedhardware emulation provides significant speed-up over soft-ware simulation using 119897119894119887119902119906119886119899119905119906119898 It is important to notethat the achieved speed-up increases drastically as the num-ber of qubits increasesThis result further supports the notionthat hardware emulation has significant potential in themodelling of a large-scale quantum system on the classicalcomputing platform based on FPGA

As the number of required IO pins for emulating QFTand Groverrsquos search algorithms with parallel read-outs istoo much to fit in the existing FPGA devices board-levelverification is infeasible Although the usage of multiplexersreduces the number of output pins resource consumptionrises significantly with the increase in the number of qubitsand this affects the analysis of the overall experiment Thusthe estimated runtime of the proposed FPGA emulationarchitecture is obtained based on the hardware clock cycleand the operating frequency that is acquired from the FPGAdevelopment toolThis is not a critical issue in future practicaldeployment as the selected case studies are meant to work ascore modules within other quantum applications that mightnot require parallel read-out

16 International Journal of Reconfigurable Computing

2 25 3 35 4 45 5 55 6 65 70

1

2

3

4

5

6

7

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipelineSerial-parallel

times104

(a) Logic element usage

2 25 3 35 4 45 5 55 6 65 70

05

1

15

2

25

3

35

4

5

45

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

ConcurrentPipelineSerial-parallel

times104

(b) Register usage

2 25 3 35 4 45 5 55 6 65 70

50

100

150

200

250

300

350

400

450

500

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipelineSerial-parallel

(c) Maximum operating frequency

Figure 19 Resource utilization and maximum operating frequency for different emulation architectures of Groverrsquos search (based on 24-bitfixed point format)

6 Conclusion and Future Work

Efficient resource utilization is critical for FPGA-basedimplementations especially for emulating quantum comput-ing applications as they typically exhibit exponential resourcerequirement with increasing number of qubits In this paperwe proposed a baseline FPGA emulation framework withfocus on the datapath design optimization based on theconventional state vector model as well as an effectivemethodology that facilitates accurate modelling of quantumalgorithms for FPGAemulationA serial-parallel architecture

with efficient resource utilization for FPGA-based emulationof quantum computing is presentedThe proposed emulationarchitecture achieves linear reduction in resource utilizationcompared to pipeline implementations as found in previousworks This work has also demonstrated the advantage ofFPGA emulation over software simulation where hardwareemulation of 7-qubit Groverrsquos search is about 3 times 104 timesfaster than the software simulation performed on Intel Corei7-4790 eight-core processor running at 36GHz clock rate

However experimental results obtained in this workshow that it is difficult to realize a scalable and flexible

International Journal of Reconfigurable Computing 17

2 25 3 35 4 45 5 55 6 65 705

1

15

2

25

3

Number of qubits

Runt

ime s

peed

-up

(sim

ulat

ion

emul

atio

n)

times104

Figure 20 Runtime speed-up against number of qubits in Groverrsquossearch emulation

emulation platform for large qubit size real-world quantumsystem using the approach that applies existing state vectorquantum models This concurs with [45] that states that thepractical limit on the size of the quantum system that canbe modelled on classical computing platform can hardly beovercome due to exponentially large memory requirementsfor storing the entire state vector Hence this suggests thata model with a more effective data structure to representquantum systems is required Recently the work on stabi-lizer frames [30] has shown promise in providing a moresuitable data structure for quantum models targeted forFPGA emulation This is the subject of future work in ourresearch in applying FPGA emulation in modelling of large-scale quantum systems In addition the error-correctionstructure available with stabilizer frames will be consideredfor application in practical quantum computations With amore efficient modelling technique FPGA can represent amore efficient emulation strategy of quantum systems

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the Ministry of Higher Education(MOHE) and Universiti Teknologi Malaysia (UTM) underFundamental Research Grant Scheme (FRGS) Vote number4F422

References

[1] P W Shor ldquoAlgorithms for quantum computation discretelogarithms and factoringrdquo in Proceedings of the 35th IEEEAnnual Symposium on Foundations of Computer Science(SFCS rsquo94) pp 124ndash134 Santa Fe NM USA November 1994

[2] L K Grover ldquoQuantum mechanics helps in searching for aneedle in a haystackrdquo Physical Review Letters vol 79 no 2 pp325ndash328 1997

[3] A Malossini and T Calarco ldquoQuantum genetic optimizationrdquoIEEE Transactions on Evolutionary Computation vol 12 no 2pp 231ndash241 2008

[4] R L Rivest A Shamir and L Adleman ldquoA method forobtaining digital signatures and public-key cryptosystemsrdquoCommunications of the ACM vol 21 no 2 pp 120ndash126 1978

[5] L K Grover ldquoA fast quantum mechanical algorithm fordatabase searchrdquo in Proceedings of the 28th Annual ACMSymposium onTheory of Computing pp 212ndash219 ACM 1996

[6] N Shenvi J Kempe andK BWhaley ldquoQuantum random-walksearch algorithmrdquo Physical Review A Atomic Molecular andOptical Physics vol 67 no 5 Article ID 052307 2003

[7] E Farhi J Goldstone and S Gutmann ldquoA quantum algorithmfor the Hamiltonian NAND treerdquo Theory of Computing vol 4pp 169ndash190 2008

[8] P W Shor ldquoWhy havenrsquot more quantum algorithms beenfoundrdquo Journal of the ACM vol 50 no 1 pp 87ndash90 2003

[9] D R Simon ldquoOn the power of quantum computationrdquo SIAMJournal on Computing vol 26 no 5 pp 1474ndash1483 1997

[10] S Hallgren ldquoPolynomial-time quantum algorithms for Pellrsquosequation and the principal ideal problemrdquo Journal of the ACMvol 54 article 4 2007

[11] M Mosca and A Ekert ldquoThe hidden subgroup problem andeigenvalue estimation on a quantum computerrdquo in Quan-tum Computing and Quantum Communications pp 174ndash188Springer Berlin Germany 1999

[12] D Bacon A M Childs and W Van Dam ldquoFrom optimalmeasurement to efficient quantum algorithms for the hiddensubgroup problem over semidirect product groupsrdquo in Pro-ceedings of the 46th Annual IEEE Symposium on Foundationsof Computer Science (FOCS rsquo05) pp 469ndash478 IEEE October2005

[13] L K Grover and A M Sengupta ldquoFrom coupled pendulums toquantum searchrdquo inMathematics of QuantumComputation pp119ndash134 Chapman and HallCRC 2002

[14] R P Feynman ldquoSimulating physics with computersrdquo Interna-tional Journal ofTheoretical Physics vol 21 no 6-7 pp 467ndash4881982

[15] N S Yanofsky and M A Mannucci Quantum Computingfor Computer Scientists vol 20 Cambridge University PressCambridge UK 2008

[16] C Monroe D M Meekhof B E King W M Itano and DJ Wineland ldquoDemonstration of a fundamental quantum logicgaterdquo Physical Review Letters vol 75 no 25 pp 4714ndash4717 1995

[17] N A Gershenfeld and I L Chuang ldquoBulk spin-resonancequantum computationrdquo Science vol 275 no 5298 pp 350ndash3561997

[18] J E Mooij T P Orlando L Levitov L Tian C H Van derWaland S Lloyd ldquoJosephson persistent-current qubitrdquo Science vol285 no 5430 pp 1036ndash1039 1999

[19] I L Chuang N Gershenfeld and M Kubinec ldquoExperimentalimplementation of fast quantum searchingrdquo Physical ReviewLetters vol 80 no 15 article 3408 1998

[20] R Barends J Kelly A Megrant et al ldquoSuperconducting quan-tum circuits at the surface code threshold for fault tolerancerdquoNature vol 508 no 7497 pp 500ndash503 2014

[21] M H Amin N G Dickson and P Smith ldquoAdiabatic quantumoptimization with quditsrdquo Quantum Information Processingvol 12 no 4 pp 1819ndash1829 2013

18 International Journal of Reconfigurable Computing

[22] A D King E Hoskinson T Lanting E Andriyash and M HAmin ldquoDegeneracy degree and heavy tails in quantumanneal-ingrdquo httparxivorgabs151207325

[23] T F Roslashnnow Z Wang J Job et al ldquoDefining and detectingquantum speeduprdquo Science vol 345 no 6195 pp 420ndash4242014

[24] Y Goto and M Fujishima ldquoEfficient quantum computingemulation system with unitary macro-operationsrdquo JapaneseJournal of Applied Physics vol 46 no 4 pp 2278ndash2282 2007

[25] A U Khalid Z Zilic and K Radecka ldquoFPGA emulation ofquantum circuitsrdquo in Proceedings of the IEEE International Con-ference on Computer Design VLSI in Computers and Processors(ICCD rsquo04) pp 310ndash315 IEEE October 2004

[26] J F Rivera-Miranda A J Caicedo-Beltran J D Valencia-Payan J M Espinosa-Duran and J Velasco-Medina ldquoHard-ware emulation of quantum Fourier transformrdquo in Proceedingsof the IEEE 2nd Latin American Symposium on Circuits and Sys-tems (LASCAS rsquo11) pp 1ndash4 IEEE Bogata Colombia February2011

[27] M Aminian M Saeedi M S Zamani andM Sedighi ldquoFPGA-based circuit model emulation of quantum algorithmsrdquo inProceedings of the IEEE Computer Society Annual Symposiumon VLSI (ISVLSI rsquo08) pp 399ndash404 IEEE Montpellier FranceApril 2008

[28] H Weimer M Muller I Lesanovsky P Zoller and H PBuchler ldquoA Rydberg quantum simulatorrdquoNature Physics vol 6no 5 pp 382ndash388 2010

[29] G F Viamontes Efficient quantum circuit simulation [PhDthesis] The University of Michigan Ann Arbor Mich USA2007

[30] H J Garcıa and I L Markov ldquoSimulation of quantum circuitsvia stabilizer framesrdquo IEEE Transactions on Computers vol 64no 8 pp 2323ndash2336 2015

[31] C P Williams and S H Clearwater Explorations in QuantumComputing Springer Berlin Germany 1998

[32] A Barenco D Deutsch A Ekert and R Jozsa ldquoConditionalquantumdynamics and logic gatesrdquo Physical Review Letters vol74 no 20 pp 4083ndash4086 1995

[33] M A Nielsen and I L Chuang Quantum Computation andQuantum Information Cambridge University Press 2000

[34] W-W Zhang F Gao B Liu Q-Y Wen and H Chen ldquoAwatermark strategy for quantum images based on quantumFourier transformrdquo Quantum Information Processing vol 12no 2 pp 793ndash803 2013

[35] D Curtis and D A Meyer ldquoTowards quantum templatematchingrdquo in Proceedings of the SPIE 48th Annual Meeting inQuantum Communications and Quantum Imaging pp 134ndash141International Society for Optics and Photonics August 2003

[36] R Schutzhold and G Schaller ldquoAdiabatic quantum algorithmsas quantum phase transitions first versus second orderrdquo Physi-cal Review A vol 74 no 6 Article ID 060304 2006

[37] W P Baritompa D W Bulger and G R Wood ldquoGroverrsquosquantum algorithm applied to global optimizationrdquo SIAMJournal on Optimization vol 15 no 4 pp 1170ndash1184 2005

[38] C Durr and P Hoyer ldquoA quantum algorithm for finding theminimumrdquo httparxivorgabsquant-ph9607014

[39] A Montanaro ldquoQuantum pattern matching fast on averagerdquoAlgorithmica 2015

[40] Y H Lee M Khalil-Hani and M N Marsono ldquoFPGA-basedquantum circuit emulation a case study on Quantum Fouriertransformrdquo in Proceedings of the 14th International Symposium

on Integrated Circuits (ISIC rsquo14) pp 512ndash515 IEEE SingaporeDecember 2014

[41] M Khalil-Hani Y H Lee and M N Marsono ldquoAn accu-rate FPGA-based hardware emulation on quantum fouriertransformrdquo in Proceedings of the Australasian Symposium onParallel andDistributed Computing (AusPDC rsquo15) vol 1 SydneyAustralia 2015 a1b3

[42] M Frigo and S G Johnson ldquoThe design and implementationof FFTW3rdquo Proceedings of the IEEE vol 93 no 2 pp 216ndash2312005

[43] D Wecker and K M Svore ldquoLIQUi|gt a software design archi-tecture and domain-specific language for quantum computingrdquohttparxivorgabs14024467

[44] S Kilts Advanced FPGA Design Architecture Implementationand Optimization John Wiley amp Sons New York NY USA2007

[45] M Smelyanskiy N P D Sawaya and A Aspuru-GuzikldquoqHiPSTER the quantum high performance software testingenvironmentrdquo httparxivorgabs160107195

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 9: Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

International Journal of Reconfigurable Computing 9

IN0

IN0 IN4 IN1 IN5 IN2 IN6 IN3 IN7 IN0 IN4 IN1 IN5 IN2 IN6 IN3 IN7

IN1 IN2 IN3 IN4 IN5 IN6 IN7

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

U7

U6

U5

U4

U3

U2F

F F

F

U1+ + + + minus minus minus minus

+

++ + +minus

minus + minus + minus + minus

minus minus minus

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5

out2_5

out2_6 out2_7

out2_7

CMULTCMULT

out3_0

out3_0 out3_0

out3_1

out3_1 out3_1

out3_2

out3_2 out3_2

out3_3

out3_3 out3_3

out3_4

out3_4 out3_4

out3_5

out3_5 out3_5

out3_6

out3_6 out3_6

out3_7

out3_7 out3_7

out4_0 out4_1 out4_2 out4_3 out4_4 out4_5 out4_6 out4_7

out5_0

out5_0

out5_1

out5_1 out5_0 out5_1

out5_2

out5_2

out5_3

out5_3 out5_2 out5_3

out5_4

out5_4

out5_5

out5_5 out5_4 out5_5

out5_6

out5_6 out5_6

out5_7

out5_7out5_7

out6_0 out6_1

out6_1

out6_2 out6_3

out6_3

out6_4

out6_4

out6_5 out6_6

out6_6

out6_7

1

radic2(1 + i)

1

radic2(1 + i)

times times times times times times times times

times times times times times times times times

times times times times times times times times

Figure 10 Data-flow graph for 3-qubit QFT The multipliers shown in this diagram represent the multiplication of input complex numberwith constant 1radic2

433 Serial Processing Although serial design requires mul-tiple iterations to perform a complex computation it opensup the opportunity for resource sharing Serial processingis suitable for applications where resource utilization is acritical design constraint Figure 13(c) depicts the serial formof the 3-qubit QFT circuit that consists of a control unitand a datapath unit As resources can be reused betweentransformations a serial-based n-qubit QFT consumes 2119899registers a register utilization that is much lower than inconcurrent or pipeline architectures However pure serial

approach forfeits the purpose of conducting FPGA emulationwhose aim is to exploit the parallelism inherent in a quantumsystem as it would still suffer from slow sequential behaviouras exhibited in simulation on classical computer

434 Proposed Serial-Parallel Architecture In this paperwe propose a hybrid serial-parallel architecture for FPGAemulation of quantum algorithms The proposed approachtakes advantage of both serial and parallel design techniques

10 International Journal of Reconfigurable Computing

Repeat radic2n times

2n 2n 2n

Phase inversion

INIT

Inversionabout mean

minusI + 2AF

Figure 11 Mathematical modelling of Groverrsquos search algorithm

Applying the concepts of quantum parallelism and quantumdynamicsmodelled by sequential transformations on a quan-tum state vector it is found that the proposed serial-parallelarchitecture is suitable for efficient and accurate quantumcomputing emulation on FPGA platform

Figure 14 shows the functional block diagram of theproposed serial-parallel FPGA emulation architecture of the3-qubit QFT The serial-parallel design of the datapath unitinvolves a number of quantum computation units that canperform parallel computations for each stage of unitarytransformation whereby the same computational resourcescan be reused for the following stage of transformations

For data storage and synchronization purposes 2119899 reg-isters are shared between unitary transformations As com-pared to the pipeline design our proposed serial-parallelapproach achieves linear reduction on the usage of registersto emulate the same quantum system The arithmetic logicunit (ALU) in the datapath unit contains multiple customprocessing elements and the allocation of resources in ALUvaries based on the target application The number of pro-cessing elements is basically determined based on the desired2119899 parallelism in the n-qubit quantum system

As illustrated in Figures 10 and 12 the data-flow graph ofQFT and Groverrsquos search algorithm exhibit similar repetitivepattern between unitary transformations This implies thatthe proposed serial-parallel approach can be generalized forthe two case studies to achieve balance in both resourceutilization and speed performance For the case of Groverrsquossearch the ALUwould be theGrover iterationmodule shownin Figure 12 whereas the control unit is designed to keep trackon the number of Grover iterations required by the targetsearch problem

5 Experimental Results

The proposed emulation designs are modelled in SystemVer-ilog HDL synthesized using Altera Quartus II softwareand implemented into target emulation platform which isbased on Altera Stratix IV EP4SGX530KF43C4 FPGA InSection 51 we discuss the verification of the hardwareemulation designs for the QFT and Groverrsquos search casestudies In addition the automated process for scaling upthe design to larger qubit size is described In Section 52investigation is conducted to study the effects of the numberof mantissa bits used in our fixed point representation formaton resource utilization and precision error In the sectionthat follows we analyse for different emulation architectureshow the increase in qubit size impacts on resource growthand maximum operating frequency allowed in the designs

Finally the runtime speeds in simulation and emulation forquantum algorithms with different qubit sizes are compared

51 Design Verification To show that the quantum algo-rithms have been modelled accurately design verificationof the proposed emulation hardware is performed Goldenreferences based on the software simulationmodels (in C) aredeveloped and their outputs are comparedwith the emulationhardware under test (which are described in SystemVerilogHDL)

In our QFT case study FFTW3 [42] a widely applied fastFourier transform library in C is used to perform Fouriertransform computations on the signal samples that are usedin this work The outputs of the classical Fourier transformthen serve as the golden referencemodel in verification of theproposed emulation model Furthermore since the discreteFourier transform (DFT) is a linear transformation that canbe defined in unitary matrix form the functional correctnessof our QFT hardware emulation model can be convenientlyverified against the DFTmatrixThe expression of an 119899-qubitDFT matrix is shown in

DFT = 1

radic2119899

[

[

[

[

[

[

[

[

[

[

[

1 1 1 1

1 1205961

1205962

1205962119899minus1

1 1205962

1205964

1205962(2119899minus1)

1 1205962119899minus11205962(2119899minus1) 120596

(2119899minus1)(2

119899minus1)

]

]

]

]

]

]

]

]

]

]

]

(27)

where 120596 is the 2119899th root of unity that is 120596 = 11989021205871198942119899

Thechoice of 11989021205871198942

119899

or 119890minus21205871198942119899

is purely a matter of convention asboth the term 11989021205871198942

119899

and the term 119890minus21205871198942119899

to the power of 2119899are equal to 1

On the other hand the design of Groverrsquos search FPGAemulation model is verified against the mathematical modelprovided in literature The simulation model of Groverrsquossearch algorithm also serves as the golden reference modelfor verification purposes

For the development of FPGA emulation models forpractical quantum computing applications it is importantthat the emulation hardware can be scaled up to larger qubitsize architectures In this work the designs are scaled up withthe aid of software program developed in-house HDL codesof the two case studies are autogenerated by the softwareprogram based on the proposed modelling techniques (asdiscussed in Section 4) The generated HDL code producesefficient hardware emulation model based on the proposedserial-parallel architecture

52 Fixed Point Representation As defined in (2) a quantumstate vector is represented by complex floating point numbersTo ensure effective resource utilization in our FPGA emula-tion hardware floating point numbers are replaced by fixedpoint representations In this work a fixed point format with1 sign bit 1 integer bit and 119873 mantissa bits (as shown inFigure 15) is used Since the amplitudes of a quantum statethat is the probabilities of collapsing into computational basis

International Journal of Reconfigurable Computing 11

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

minusIN0

IN0

IN0 minusIN1IN1 minusIN2IN2 minusIN3IN3 minusIN4IN4 minusIN5IN5 minusIN6IN6 minusIN7IN7

Target

Target Target Target Target Target Target Target Target

Target Target Target Target Target Target Target

COMP

COMPCOMPCOMPCOMPCOMPCOMPCOMPCOMP

COMPCOMPCOMPCOMPCOMPCOMPCOMP

IN1 IN2 IN3 IN4 IN5 IN6 IN7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

+ + + +

++

+

+ +

+

+ + +

+

out1_0 minusout1_0 out1_1 minusout1_1 out1_2 minusout1_2 out1_3 minusout1_3 out1_4 minusout1_4 out1_5 minusout1_5 out1_6 minusout1_6 out1_7 minusout1_7

minus

minus minus minus minus minus minus minus minus

minus minus minus minus minus minus minus

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

Groveriteration 2

Grover

INIT

iteration 1

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

≫(n minus 1)

≫(n minus 1)

Figure 12 Data-flow graph of Groverrsquos circuit for 3-bit search

U1 U2 U3 U4 U5 U6 U78 8

IN0

IN7

OUT0

OUT7

(a) Concurrent processing

U4U1 U2 U3 U5 U6 U788

IN0

IN7

OUT0

OUT7

(b) Pipeline architecture

88

Control unit

Datapath unit

Shared

Arithmeticlogic unit

(ALU)

registersIN0

IN7

OUT0

OUT7

(c) Serial architecture

Figure 13 Three-qubit QFT implemented using different hardware architectures

12 International Journal of Reconfigurable Computing

CU

ALU

DU

alu3alu2alu1alu0

alu4 alu5 alu6 alu7

CMULTCMULT

reg0reg4

IN0

Shared registers

IN1alu0

alu1alu4

IN2alu2alu4alu1IN3

IN4

alu3alu5

alu4alu2

reg3reg6

reg3

reg1

reg6

reg7

IN7

alu5

IN6

IN5

alu7

alu3

alu6

alu3

alu6

reg4 reg4reg2 reg2 reg2reg1 reg1 reg3 reg3reg5

reg5

reg5 reg5reg6 reg6 reg7

reg7

Stateregisters

Next-stateStart

Control signals

logic

Outputlogic

+

F

F

F

+ + +

minus minus minus minus

1

radic2(1 + i)

1

radic2(1 + i)

alu_comp0 alu_comp1

reg7

reg6

reg5

reg3

reg1

reg2

reg4

reg0

alu_comp

alu_comp0

1

|k1⟩|k2⟩

|j1⟩|j2⟩

|k3⟩|j3⟩

times times times times

times times times times

Figure 14 Proposed serial-parallel architecture for 3-qubit QFTThe multipliers shown in this diagram represent the multiplication of inputcomplex number with constant 1radic2

1 sign bit 1 decimal bit N mantissa bits

Figure 15 Fixed point representation format

states after measurement are constrained in the range of 0 to1 only one bit is required to represent the integer part

Due to the limitations of the classical digital computingplatform representation of qubit amplitudes with infiniteprecision is infeasible In the context of quantum computermodelling particularly for quantum systems with large qubitsizes minimising precision loss is critical to preserve theconsistency of the quantum state during the modellingprocess [43] Here we investigate how precision error is

affected by the number of mantissa bits used in our fixedpoint representation format The corresponding experimen-tal results are given in Figure 16

Precision error shown in Figure 16 is computed based onthe following equations

error 119903 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119903

119898minus simulate 119903

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119903

119898

1003816100381610038161003816

error 119894 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119894

119898minus simulate 119894

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119894

119898

1003816100381610038161003816

precision error = 1

2119899+1(error 119903 + error 119894)

(28)

International Journal of Reconfigurable Computing 13

14 15 16 17 18 19 20 21 22 23 240

005

01

015

02

025

03

035

04

045

05

Number of mantissa bits

Prec

ision

erro

r (

)

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

Figure 16 Precision error against different fixed point formats used

where simulate 119903 and simulate 119894 are the floating point realand imaginary amplitudes of the reference output state gen-erated from simulation whereas emulate 119903 and emulate 119894 arethe amplitudes extracted from the output state of proposedFPGA emulator (converted from original fixed point formatto floating point for verification purposes)

Figure 16 shows that 16-bit fixed point format (14mantissabits) incurs significant precision error for 5-qubit emulationsin both case studies However the error produced by 2-qubitemulations is insignificant This behaviour is because theamplification of the fixed point truncation errors grows withthe size of quantum system For FPGA emulation purposesprecision error due to fixed point representation can bereduced by increasing the number ofmantissa bits with trade-off on resource utilization By expanding the number ofmantissa bits up to 24-bit (which results in 26 total numberof bits) negligible precision error for 5-qubit emulations isattained It is important to note that the proposed FPGAemulator is parameterizable in terms of the number ofmantissa bits for fixed point representation This is crucial toensure different fixed point formats can be applied to emulatequantum circuit of various sizes based on the demandedprecision error tolerance and resource constraint

The size of our fixed point number format also affectsresource utilization The corresponding experimental resultsforQFT case study are shown in Figure 17 Since the resourcesavailable on FPGA device are mostly in blocks or multiples of8 bits the choice of 26-bit fixed point format is not suitableHence we apply 22-mantissa-bit size (ie total number of bitsis 24) in our fixed point representation formats Note thatfor Groverrsquos search emulations the experiment on resourceutilization of DSP blocks is not relevant because FPGAemulation model developed here (as depicted in Figure 12)does not involve multiplication

53 Efficiency of Proposed FPGA Emulation Architecture Inthis subsection we investigate how the increase in qubit size

impacts resource growth and maximum allowable operatingfrequency in different emulation architectures (ie concur-rent pipeline and serial-parallel) In the case of QFT emu-lation model we have two versions of serial-parallel archi-tectures Type 1 serial-parallel uses DSP blocks to performmultiplications whereas type 2 replaces the multiplications(required in Hadamard gate operations 1198801 1198804 and 1198806 inthe 3-qubit QFT case) with shift-add operations Althoughconventional hardware design methods encourage the usageof shift-add operation instead of multiplication to reduceresource utilization the case is now different with FPGAdevices containing efficient built-in DSP blocks Figures 18and 19 show the results of the experiments conducted onQFTand Groverrsquos search algorithms respectively

Based on the experimental results shown in Figure 18when comparing to type 1 type 2 method consumes lessDSP blocks but more logic elements due to the constructionof adders needed in shift-add operations As the number ofqubits increases resource utilization of logic elements fortype 1 emulation grows rapidly when available DSP blocks areused up and it ends up with similar resource utilization asthe type 2 approach Hence we can conclude that for large-scale FPGA emulations both methods would lead to similarresource utilization Thus the first approach is preferred dueto the ease of design process where DSP blocks are utilized bydefault when implementing the design with the FPGA designautomation tool

In contrast to the concurrent and pipeline designs theexperimental results for QFT and Groverrsquos search show thatthe proposed serial-parallel architecture achieves balance onboth resource utilization and operating frequency The pro-posed architecture has significantly reduced resource growthin the application of logic elements dedicated logic regis-ters and DSP block yet maintaining reasonable operatingfrequency With the concurrent and pipeline architectures5-qubit QFT emulation completely used up the resourcesavailable in the Altera Stratix IV FPGA device used in thiswork However the same device can support up to 7-qubitQFT emulationwith the proposed serial-parallel architectureIt is important to note that the processing power of 7-qubitQFT is far higher than the 5-qubit implementation as an 119899-qubit QFT can process up to 2119899 samples in one evaluation

For scalability software simulation would depend onresources available on the computer servers The scalabilityof the FPGA emulation framework would depend on theresources available on target FPGA devices As there havebeen rapid advances in FPGA technology in recent years bydesigning an efficient architecture that is implemented in ahigh-density FPGA device (such as the Altera Stratix 10 thatcontains up to 55 million logic elements) one can actuallyemulate large qubit size quantum circuit on a single FPGAchip Furthermore new approach to FPGA emulation maybe made through the exploration of efficient data structuresandmodellingmethodsThus the proposedwork contributesto the formulation of a proof-of-concept baseline FPGAemulation framework with optimization on datapath designsthat can be extended to emulate practical large-scale quantumcircuits

14 International Journal of Reconfigurable Computing

14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Number of mantissa bits

Com

bina

tiona

l ALU

Ts

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

times104

(a) Logic element usage

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

14 15 16 17 18 19 20 21 22 23 240

500

1000

1500

2000

2500

3000

3500

Number of mantissa bits

Ded

icat

ed lo

gic r

egist

ers

(b) Register usage

2-qubit QFT5-qubit QFT

14 15 16 17 18 19 20 21 22 23 240

200

400

600

800

1000

1200

Number of mantissa bits

DSP

blo

ck 1

8-bi

t ele

men

t

(c) DSP block usage

Figure 17 Resource utilization for different fixed point formats used

54 Benchmarking between Simulation and Emulation Hereour emulation models are benchmarked against the equiv-alent software simulations The simulation models used arebased on an open source quantum library 119897119894119887119902119906119886119899119905119906119898 [28]Software simulation is performed on an Intel Core i7-4790eight-core processor with 36GHz clock rate running on aLinux-based Ubuntu 1404 kernel whereas hardware emu-lation is based on the Altera Stratix IV EP4SGX530KF43C4FPGA Table 1 shows the runtime comparison betweensimulation and our emulation

Figure 20 illustrates the runtime speed-up (simula-tionemulation) of Groverrsquos search case study It is importantto note that the proposed hardware emulation is implementedbased on 24-bit fixed point format (which is determined

Table 1 Comparison of runtime between simulation and emulation

QFT runtime (s) Groverrsquos search runtime (s)Simulation Emulation Simulation Emulation

2-qubit 155 times 10minus6 358 times 10minus9 296 times 10minus6 46 times 10minus9

3-qubit 466 times 10minus6 805 times 10minus9 740 times 10minus6 120 times 10minus9

4-qubit 510 times 10minus6 1344 times 10minus9 2207 times 10minus6 214 times 10minus9

5-qubit 565 times 10minus6 2193 times 10minus9 3988 times 10minus6 367 times 10minus9

6-qubit mdash mdash 10696 times 10minus6 627 times 10minus9

7-qubit mdash mdash 27714 times 10minus6 968 times 10minus9

based on the experimental work presented in Section 52)whereas 32-bit single precision float is used in the software

International Journal of Reconfigurable Computing 15

2 25 3 35 4 45 50

2

4

6

8

10

12

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

times104

(a) Logic element usage

2 25 3 35 4 45 50

05

1

15

2

25

3

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

times104

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(b) Register usage

2 25 3 35 4 45 50

200

400

600

800

1000

1200

Number of qubits

DSP

blo

ck 1

8-bi

t ele

men

t

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(c) DSP block usage

2 25 3 35 4 45 50

50

100

150

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(d) Maximum operating frequency

Figure 18 Resource utilization and maximum operating frequency for different emulation architectures of QFT (based on 24-bit fixed pointformat)

simulation to describe the quantum state For a larger qubitsize quantum circuit the number of mantissa bits for fixedpoint representation has to be increased accordingly to ensuresimilar precision as in software simulation can be achievedwith trade-offs on resource utilization and execution speed[44]

From Figure 20 it can be observed that the proposedhardware emulation provides significant speed-up over soft-ware simulation using 119897119894119887119902119906119886119899119905119906119898 It is important to notethat the achieved speed-up increases drastically as the num-ber of qubits increasesThis result further supports the notionthat hardware emulation has significant potential in themodelling of a large-scale quantum system on the classicalcomputing platform based on FPGA

As the number of required IO pins for emulating QFTand Groverrsquos search algorithms with parallel read-outs istoo much to fit in the existing FPGA devices board-levelverification is infeasible Although the usage of multiplexersreduces the number of output pins resource consumptionrises significantly with the increase in the number of qubitsand this affects the analysis of the overall experiment Thusthe estimated runtime of the proposed FPGA emulationarchitecture is obtained based on the hardware clock cycleand the operating frequency that is acquired from the FPGAdevelopment toolThis is not a critical issue in future practicaldeployment as the selected case studies are meant to work ascore modules within other quantum applications that mightnot require parallel read-out

16 International Journal of Reconfigurable Computing

2 25 3 35 4 45 5 55 6 65 70

1

2

3

4

5

6

7

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipelineSerial-parallel

times104

(a) Logic element usage

2 25 3 35 4 45 5 55 6 65 70

05

1

15

2

25

3

35

4

5

45

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

ConcurrentPipelineSerial-parallel

times104

(b) Register usage

2 25 3 35 4 45 5 55 6 65 70

50

100

150

200

250

300

350

400

450

500

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipelineSerial-parallel

(c) Maximum operating frequency

Figure 19 Resource utilization and maximum operating frequency for different emulation architectures of Groverrsquos search (based on 24-bitfixed point format)

6 Conclusion and Future Work

Efficient resource utilization is critical for FPGA-basedimplementations especially for emulating quantum comput-ing applications as they typically exhibit exponential resourcerequirement with increasing number of qubits In this paperwe proposed a baseline FPGA emulation framework withfocus on the datapath design optimization based on theconventional state vector model as well as an effectivemethodology that facilitates accurate modelling of quantumalgorithms for FPGAemulationA serial-parallel architecture

with efficient resource utilization for FPGA-based emulationof quantum computing is presentedThe proposed emulationarchitecture achieves linear reduction in resource utilizationcompared to pipeline implementations as found in previousworks This work has also demonstrated the advantage ofFPGA emulation over software simulation where hardwareemulation of 7-qubit Groverrsquos search is about 3 times 104 timesfaster than the software simulation performed on Intel Corei7-4790 eight-core processor running at 36GHz clock rate

However experimental results obtained in this workshow that it is difficult to realize a scalable and flexible

International Journal of Reconfigurable Computing 17

2 25 3 35 4 45 5 55 6 65 705

1

15

2

25

3

Number of qubits

Runt

ime s

peed

-up

(sim

ulat

ion

emul

atio

n)

times104

Figure 20 Runtime speed-up against number of qubits in Groverrsquossearch emulation

emulation platform for large qubit size real-world quantumsystem using the approach that applies existing state vectorquantum models This concurs with [45] that states that thepractical limit on the size of the quantum system that canbe modelled on classical computing platform can hardly beovercome due to exponentially large memory requirementsfor storing the entire state vector Hence this suggests thata model with a more effective data structure to representquantum systems is required Recently the work on stabi-lizer frames [30] has shown promise in providing a moresuitable data structure for quantum models targeted forFPGA emulation This is the subject of future work in ourresearch in applying FPGA emulation in modelling of large-scale quantum systems In addition the error-correctionstructure available with stabilizer frames will be consideredfor application in practical quantum computations With amore efficient modelling technique FPGA can represent amore efficient emulation strategy of quantum systems

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the Ministry of Higher Education(MOHE) and Universiti Teknologi Malaysia (UTM) underFundamental Research Grant Scheme (FRGS) Vote number4F422

References

[1] P W Shor ldquoAlgorithms for quantum computation discretelogarithms and factoringrdquo in Proceedings of the 35th IEEEAnnual Symposium on Foundations of Computer Science(SFCS rsquo94) pp 124ndash134 Santa Fe NM USA November 1994

[2] L K Grover ldquoQuantum mechanics helps in searching for aneedle in a haystackrdquo Physical Review Letters vol 79 no 2 pp325ndash328 1997

[3] A Malossini and T Calarco ldquoQuantum genetic optimizationrdquoIEEE Transactions on Evolutionary Computation vol 12 no 2pp 231ndash241 2008

[4] R L Rivest A Shamir and L Adleman ldquoA method forobtaining digital signatures and public-key cryptosystemsrdquoCommunications of the ACM vol 21 no 2 pp 120ndash126 1978

[5] L K Grover ldquoA fast quantum mechanical algorithm fordatabase searchrdquo in Proceedings of the 28th Annual ACMSymposium onTheory of Computing pp 212ndash219 ACM 1996

[6] N Shenvi J Kempe andK BWhaley ldquoQuantum random-walksearch algorithmrdquo Physical Review A Atomic Molecular andOptical Physics vol 67 no 5 Article ID 052307 2003

[7] E Farhi J Goldstone and S Gutmann ldquoA quantum algorithmfor the Hamiltonian NAND treerdquo Theory of Computing vol 4pp 169ndash190 2008

[8] P W Shor ldquoWhy havenrsquot more quantum algorithms beenfoundrdquo Journal of the ACM vol 50 no 1 pp 87ndash90 2003

[9] D R Simon ldquoOn the power of quantum computationrdquo SIAMJournal on Computing vol 26 no 5 pp 1474ndash1483 1997

[10] S Hallgren ldquoPolynomial-time quantum algorithms for Pellrsquosequation and the principal ideal problemrdquo Journal of the ACMvol 54 article 4 2007

[11] M Mosca and A Ekert ldquoThe hidden subgroup problem andeigenvalue estimation on a quantum computerrdquo in Quan-tum Computing and Quantum Communications pp 174ndash188Springer Berlin Germany 1999

[12] D Bacon A M Childs and W Van Dam ldquoFrom optimalmeasurement to efficient quantum algorithms for the hiddensubgroup problem over semidirect product groupsrdquo in Pro-ceedings of the 46th Annual IEEE Symposium on Foundationsof Computer Science (FOCS rsquo05) pp 469ndash478 IEEE October2005

[13] L K Grover and A M Sengupta ldquoFrom coupled pendulums toquantum searchrdquo inMathematics of QuantumComputation pp119ndash134 Chapman and HallCRC 2002

[14] R P Feynman ldquoSimulating physics with computersrdquo Interna-tional Journal ofTheoretical Physics vol 21 no 6-7 pp 467ndash4881982

[15] N S Yanofsky and M A Mannucci Quantum Computingfor Computer Scientists vol 20 Cambridge University PressCambridge UK 2008

[16] C Monroe D M Meekhof B E King W M Itano and DJ Wineland ldquoDemonstration of a fundamental quantum logicgaterdquo Physical Review Letters vol 75 no 25 pp 4714ndash4717 1995

[17] N A Gershenfeld and I L Chuang ldquoBulk spin-resonancequantum computationrdquo Science vol 275 no 5298 pp 350ndash3561997

[18] J E Mooij T P Orlando L Levitov L Tian C H Van derWaland S Lloyd ldquoJosephson persistent-current qubitrdquo Science vol285 no 5430 pp 1036ndash1039 1999

[19] I L Chuang N Gershenfeld and M Kubinec ldquoExperimentalimplementation of fast quantum searchingrdquo Physical ReviewLetters vol 80 no 15 article 3408 1998

[20] R Barends J Kelly A Megrant et al ldquoSuperconducting quan-tum circuits at the surface code threshold for fault tolerancerdquoNature vol 508 no 7497 pp 500ndash503 2014

[21] M H Amin N G Dickson and P Smith ldquoAdiabatic quantumoptimization with quditsrdquo Quantum Information Processingvol 12 no 4 pp 1819ndash1829 2013

18 International Journal of Reconfigurable Computing

[22] A D King E Hoskinson T Lanting E Andriyash and M HAmin ldquoDegeneracy degree and heavy tails in quantumanneal-ingrdquo httparxivorgabs151207325

[23] T F Roslashnnow Z Wang J Job et al ldquoDefining and detectingquantum speeduprdquo Science vol 345 no 6195 pp 420ndash4242014

[24] Y Goto and M Fujishima ldquoEfficient quantum computingemulation system with unitary macro-operationsrdquo JapaneseJournal of Applied Physics vol 46 no 4 pp 2278ndash2282 2007

[25] A U Khalid Z Zilic and K Radecka ldquoFPGA emulation ofquantum circuitsrdquo in Proceedings of the IEEE International Con-ference on Computer Design VLSI in Computers and Processors(ICCD rsquo04) pp 310ndash315 IEEE October 2004

[26] J F Rivera-Miranda A J Caicedo-Beltran J D Valencia-Payan J M Espinosa-Duran and J Velasco-Medina ldquoHard-ware emulation of quantum Fourier transformrdquo in Proceedingsof the IEEE 2nd Latin American Symposium on Circuits and Sys-tems (LASCAS rsquo11) pp 1ndash4 IEEE Bogata Colombia February2011

[27] M Aminian M Saeedi M S Zamani andM Sedighi ldquoFPGA-based circuit model emulation of quantum algorithmsrdquo inProceedings of the IEEE Computer Society Annual Symposiumon VLSI (ISVLSI rsquo08) pp 399ndash404 IEEE Montpellier FranceApril 2008

[28] H Weimer M Muller I Lesanovsky P Zoller and H PBuchler ldquoA Rydberg quantum simulatorrdquoNature Physics vol 6no 5 pp 382ndash388 2010

[29] G F Viamontes Efficient quantum circuit simulation [PhDthesis] The University of Michigan Ann Arbor Mich USA2007

[30] H J Garcıa and I L Markov ldquoSimulation of quantum circuitsvia stabilizer framesrdquo IEEE Transactions on Computers vol 64no 8 pp 2323ndash2336 2015

[31] C P Williams and S H Clearwater Explorations in QuantumComputing Springer Berlin Germany 1998

[32] A Barenco D Deutsch A Ekert and R Jozsa ldquoConditionalquantumdynamics and logic gatesrdquo Physical Review Letters vol74 no 20 pp 4083ndash4086 1995

[33] M A Nielsen and I L Chuang Quantum Computation andQuantum Information Cambridge University Press 2000

[34] W-W Zhang F Gao B Liu Q-Y Wen and H Chen ldquoAwatermark strategy for quantum images based on quantumFourier transformrdquo Quantum Information Processing vol 12no 2 pp 793ndash803 2013

[35] D Curtis and D A Meyer ldquoTowards quantum templatematchingrdquo in Proceedings of the SPIE 48th Annual Meeting inQuantum Communications and Quantum Imaging pp 134ndash141International Society for Optics and Photonics August 2003

[36] R Schutzhold and G Schaller ldquoAdiabatic quantum algorithmsas quantum phase transitions first versus second orderrdquo Physi-cal Review A vol 74 no 6 Article ID 060304 2006

[37] W P Baritompa D W Bulger and G R Wood ldquoGroverrsquosquantum algorithm applied to global optimizationrdquo SIAMJournal on Optimization vol 15 no 4 pp 1170ndash1184 2005

[38] C Durr and P Hoyer ldquoA quantum algorithm for finding theminimumrdquo httparxivorgabsquant-ph9607014

[39] A Montanaro ldquoQuantum pattern matching fast on averagerdquoAlgorithmica 2015

[40] Y H Lee M Khalil-Hani and M N Marsono ldquoFPGA-basedquantum circuit emulation a case study on Quantum Fouriertransformrdquo in Proceedings of the 14th International Symposium

on Integrated Circuits (ISIC rsquo14) pp 512ndash515 IEEE SingaporeDecember 2014

[41] M Khalil-Hani Y H Lee and M N Marsono ldquoAn accu-rate FPGA-based hardware emulation on quantum fouriertransformrdquo in Proceedings of the Australasian Symposium onParallel andDistributed Computing (AusPDC rsquo15) vol 1 SydneyAustralia 2015 a1b3

[42] M Frigo and S G Johnson ldquoThe design and implementationof FFTW3rdquo Proceedings of the IEEE vol 93 no 2 pp 216ndash2312005

[43] D Wecker and K M Svore ldquoLIQUi|gt a software design archi-tecture and domain-specific language for quantum computingrdquohttparxivorgabs14024467

[44] S Kilts Advanced FPGA Design Architecture Implementationand Optimization John Wiley amp Sons New York NY USA2007

[45] M Smelyanskiy N P D Sawaya and A Aspuru-GuzikldquoqHiPSTER the quantum high performance software testingenvironmentrdquo httparxivorgabs160107195

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 10: Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

10 International Journal of Reconfigurable Computing

Repeat radic2n times

2n 2n 2n

Phase inversion

INIT

Inversionabout mean

minusI + 2AF

Figure 11 Mathematical modelling of Groverrsquos search algorithm

Applying the concepts of quantum parallelism and quantumdynamicsmodelled by sequential transformations on a quan-tum state vector it is found that the proposed serial-parallelarchitecture is suitable for efficient and accurate quantumcomputing emulation on FPGA platform

Figure 14 shows the functional block diagram of theproposed serial-parallel FPGA emulation architecture of the3-qubit QFT The serial-parallel design of the datapath unitinvolves a number of quantum computation units that canperform parallel computations for each stage of unitarytransformation whereby the same computational resourcescan be reused for the following stage of transformations

For data storage and synchronization purposes 2119899 reg-isters are shared between unitary transformations As com-pared to the pipeline design our proposed serial-parallelapproach achieves linear reduction on the usage of registersto emulate the same quantum system The arithmetic logicunit (ALU) in the datapath unit contains multiple customprocessing elements and the allocation of resources in ALUvaries based on the target application The number of pro-cessing elements is basically determined based on the desired2119899 parallelism in the n-qubit quantum system

As illustrated in Figures 10 and 12 the data-flow graph ofQFT and Groverrsquos search algorithm exhibit similar repetitivepattern between unitary transformations This implies thatthe proposed serial-parallel approach can be generalized forthe two case studies to achieve balance in both resourceutilization and speed performance For the case of Groverrsquossearch the ALUwould be theGrover iterationmodule shownin Figure 12 whereas the control unit is designed to keep trackon the number of Grover iterations required by the targetsearch problem

5 Experimental Results

The proposed emulation designs are modelled in SystemVer-ilog HDL synthesized using Altera Quartus II softwareand implemented into target emulation platform which isbased on Altera Stratix IV EP4SGX530KF43C4 FPGA InSection 51 we discuss the verification of the hardwareemulation designs for the QFT and Groverrsquos search casestudies In addition the automated process for scaling upthe design to larger qubit size is described In Section 52investigation is conducted to study the effects of the numberof mantissa bits used in our fixed point representation formaton resource utilization and precision error In the sectionthat follows we analyse for different emulation architectureshow the increase in qubit size impacts on resource growthand maximum operating frequency allowed in the designs

Finally the runtime speeds in simulation and emulation forquantum algorithms with different qubit sizes are compared

51 Design Verification To show that the quantum algo-rithms have been modelled accurately design verificationof the proposed emulation hardware is performed Goldenreferences based on the software simulationmodels (in C) aredeveloped and their outputs are comparedwith the emulationhardware under test (which are described in SystemVerilogHDL)

In our QFT case study FFTW3 [42] a widely applied fastFourier transform library in C is used to perform Fouriertransform computations on the signal samples that are usedin this work The outputs of the classical Fourier transformthen serve as the golden referencemodel in verification of theproposed emulation model Furthermore since the discreteFourier transform (DFT) is a linear transformation that canbe defined in unitary matrix form the functional correctnessof our QFT hardware emulation model can be convenientlyverified against the DFTmatrixThe expression of an 119899-qubitDFT matrix is shown in

DFT = 1

radic2119899

[

[

[

[

[

[

[

[

[

[

[

1 1 1 1

1 1205961

1205962

1205962119899minus1

1 1205962

1205964

1205962(2119899minus1)

1 1205962119899minus11205962(2119899minus1) 120596

(2119899minus1)(2

119899minus1)

]

]

]

]

]

]

]

]

]

]

]

(27)

where 120596 is the 2119899th root of unity that is 120596 = 11989021205871198942119899

Thechoice of 11989021205871198942

119899

or 119890minus21205871198942119899

is purely a matter of convention asboth the term 11989021205871198942

119899

and the term 119890minus21205871198942119899

to the power of 2119899are equal to 1

On the other hand the design of Groverrsquos search FPGAemulation model is verified against the mathematical modelprovided in literature The simulation model of Groverrsquossearch algorithm also serves as the golden reference modelfor verification purposes

For the development of FPGA emulation models forpractical quantum computing applications it is importantthat the emulation hardware can be scaled up to larger qubitsize architectures In this work the designs are scaled up withthe aid of software program developed in-house HDL codesof the two case studies are autogenerated by the softwareprogram based on the proposed modelling techniques (asdiscussed in Section 4) The generated HDL code producesefficient hardware emulation model based on the proposedserial-parallel architecture

52 Fixed Point Representation As defined in (2) a quantumstate vector is represented by complex floating point numbersTo ensure effective resource utilization in our FPGA emula-tion hardware floating point numbers are replaced by fixedpoint representations In this work a fixed point format with1 sign bit 1 integer bit and 119873 mantissa bits (as shown inFigure 15) is used Since the amplitudes of a quantum statethat is the probabilities of collapsing into computational basis

International Journal of Reconfigurable Computing 11

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

minusIN0

IN0

IN0 minusIN1IN1 minusIN2IN2 minusIN3IN3 minusIN4IN4 minusIN5IN5 minusIN6IN6 minusIN7IN7

Target

Target Target Target Target Target Target Target Target

Target Target Target Target Target Target Target

COMP

COMPCOMPCOMPCOMPCOMPCOMPCOMPCOMP

COMPCOMPCOMPCOMPCOMPCOMPCOMP

IN1 IN2 IN3 IN4 IN5 IN6 IN7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

+ + + +

++

+

+ +

+

+ + +

+

out1_0 minusout1_0 out1_1 minusout1_1 out1_2 minusout1_2 out1_3 minusout1_3 out1_4 minusout1_4 out1_5 minusout1_5 out1_6 minusout1_6 out1_7 minusout1_7

minus

minus minus minus minus minus minus minus minus

minus minus minus minus minus minus minus

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

Groveriteration 2

Grover

INIT

iteration 1

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

≫(n minus 1)

≫(n minus 1)

Figure 12 Data-flow graph of Groverrsquos circuit for 3-bit search

U1 U2 U3 U4 U5 U6 U78 8

IN0

IN7

OUT0

OUT7

(a) Concurrent processing

U4U1 U2 U3 U5 U6 U788

IN0

IN7

OUT0

OUT7

(b) Pipeline architecture

88

Control unit

Datapath unit

Shared

Arithmeticlogic unit

(ALU)

registersIN0

IN7

OUT0

OUT7

(c) Serial architecture

Figure 13 Three-qubit QFT implemented using different hardware architectures

12 International Journal of Reconfigurable Computing

CU

ALU

DU

alu3alu2alu1alu0

alu4 alu5 alu6 alu7

CMULTCMULT

reg0reg4

IN0

Shared registers

IN1alu0

alu1alu4

IN2alu2alu4alu1IN3

IN4

alu3alu5

alu4alu2

reg3reg6

reg3

reg1

reg6

reg7

IN7

alu5

IN6

IN5

alu7

alu3

alu6

alu3

alu6

reg4 reg4reg2 reg2 reg2reg1 reg1 reg3 reg3reg5

reg5

reg5 reg5reg6 reg6 reg7

reg7

Stateregisters

Next-stateStart

Control signals

logic

Outputlogic

+

F

F

F

+ + +

minus minus minus minus

1

radic2(1 + i)

1

radic2(1 + i)

alu_comp0 alu_comp1

reg7

reg6

reg5

reg3

reg1

reg2

reg4

reg0

alu_comp

alu_comp0

1

|k1⟩|k2⟩

|j1⟩|j2⟩

|k3⟩|j3⟩

times times times times

times times times times

Figure 14 Proposed serial-parallel architecture for 3-qubit QFTThe multipliers shown in this diagram represent the multiplication of inputcomplex number with constant 1radic2

1 sign bit 1 decimal bit N mantissa bits

Figure 15 Fixed point representation format

states after measurement are constrained in the range of 0 to1 only one bit is required to represent the integer part

Due to the limitations of the classical digital computingplatform representation of qubit amplitudes with infiniteprecision is infeasible In the context of quantum computermodelling particularly for quantum systems with large qubitsizes minimising precision loss is critical to preserve theconsistency of the quantum state during the modellingprocess [43] Here we investigate how precision error is

affected by the number of mantissa bits used in our fixedpoint representation format The corresponding experimen-tal results are given in Figure 16

Precision error shown in Figure 16 is computed based onthe following equations

error 119903 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119903

119898minus simulate 119903

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119903

119898

1003816100381610038161003816

error 119894 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119894

119898minus simulate 119894

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119894

119898

1003816100381610038161003816

precision error = 1

2119899+1(error 119903 + error 119894)

(28)

International Journal of Reconfigurable Computing 13

14 15 16 17 18 19 20 21 22 23 240

005

01

015

02

025

03

035

04

045

05

Number of mantissa bits

Prec

ision

erro

r (

)

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

Figure 16 Precision error against different fixed point formats used

where simulate 119903 and simulate 119894 are the floating point realand imaginary amplitudes of the reference output state gen-erated from simulation whereas emulate 119903 and emulate 119894 arethe amplitudes extracted from the output state of proposedFPGA emulator (converted from original fixed point formatto floating point for verification purposes)

Figure 16 shows that 16-bit fixed point format (14mantissabits) incurs significant precision error for 5-qubit emulationsin both case studies However the error produced by 2-qubitemulations is insignificant This behaviour is because theamplification of the fixed point truncation errors grows withthe size of quantum system For FPGA emulation purposesprecision error due to fixed point representation can bereduced by increasing the number ofmantissa bits with trade-off on resource utilization By expanding the number ofmantissa bits up to 24-bit (which results in 26 total numberof bits) negligible precision error for 5-qubit emulations isattained It is important to note that the proposed FPGAemulator is parameterizable in terms of the number ofmantissa bits for fixed point representation This is crucial toensure different fixed point formats can be applied to emulatequantum circuit of various sizes based on the demandedprecision error tolerance and resource constraint

The size of our fixed point number format also affectsresource utilization The corresponding experimental resultsforQFT case study are shown in Figure 17 Since the resourcesavailable on FPGA device are mostly in blocks or multiples of8 bits the choice of 26-bit fixed point format is not suitableHence we apply 22-mantissa-bit size (ie total number of bitsis 24) in our fixed point representation formats Note thatfor Groverrsquos search emulations the experiment on resourceutilization of DSP blocks is not relevant because FPGAemulation model developed here (as depicted in Figure 12)does not involve multiplication

53 Efficiency of Proposed FPGA Emulation Architecture Inthis subsection we investigate how the increase in qubit size

impacts resource growth and maximum allowable operatingfrequency in different emulation architectures (ie concur-rent pipeline and serial-parallel) In the case of QFT emu-lation model we have two versions of serial-parallel archi-tectures Type 1 serial-parallel uses DSP blocks to performmultiplications whereas type 2 replaces the multiplications(required in Hadamard gate operations 1198801 1198804 and 1198806 inthe 3-qubit QFT case) with shift-add operations Althoughconventional hardware design methods encourage the usageof shift-add operation instead of multiplication to reduceresource utilization the case is now different with FPGAdevices containing efficient built-in DSP blocks Figures 18and 19 show the results of the experiments conducted onQFTand Groverrsquos search algorithms respectively

Based on the experimental results shown in Figure 18when comparing to type 1 type 2 method consumes lessDSP blocks but more logic elements due to the constructionof adders needed in shift-add operations As the number ofqubits increases resource utilization of logic elements fortype 1 emulation grows rapidly when available DSP blocks areused up and it ends up with similar resource utilization asthe type 2 approach Hence we can conclude that for large-scale FPGA emulations both methods would lead to similarresource utilization Thus the first approach is preferred dueto the ease of design process where DSP blocks are utilized bydefault when implementing the design with the FPGA designautomation tool

In contrast to the concurrent and pipeline designs theexperimental results for QFT and Groverrsquos search show thatthe proposed serial-parallel architecture achieves balance onboth resource utilization and operating frequency The pro-posed architecture has significantly reduced resource growthin the application of logic elements dedicated logic regis-ters and DSP block yet maintaining reasonable operatingfrequency With the concurrent and pipeline architectures5-qubit QFT emulation completely used up the resourcesavailable in the Altera Stratix IV FPGA device used in thiswork However the same device can support up to 7-qubitQFT emulationwith the proposed serial-parallel architectureIt is important to note that the processing power of 7-qubitQFT is far higher than the 5-qubit implementation as an 119899-qubit QFT can process up to 2119899 samples in one evaluation

For scalability software simulation would depend onresources available on the computer servers The scalabilityof the FPGA emulation framework would depend on theresources available on target FPGA devices As there havebeen rapid advances in FPGA technology in recent years bydesigning an efficient architecture that is implemented in ahigh-density FPGA device (such as the Altera Stratix 10 thatcontains up to 55 million logic elements) one can actuallyemulate large qubit size quantum circuit on a single FPGAchip Furthermore new approach to FPGA emulation maybe made through the exploration of efficient data structuresandmodellingmethodsThus the proposedwork contributesto the formulation of a proof-of-concept baseline FPGAemulation framework with optimization on datapath designsthat can be extended to emulate practical large-scale quantumcircuits

14 International Journal of Reconfigurable Computing

14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Number of mantissa bits

Com

bina

tiona

l ALU

Ts

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

times104

(a) Logic element usage

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

14 15 16 17 18 19 20 21 22 23 240

500

1000

1500

2000

2500

3000

3500

Number of mantissa bits

Ded

icat

ed lo

gic r

egist

ers

(b) Register usage

2-qubit QFT5-qubit QFT

14 15 16 17 18 19 20 21 22 23 240

200

400

600

800

1000

1200

Number of mantissa bits

DSP

blo

ck 1

8-bi

t ele

men

t

(c) DSP block usage

Figure 17 Resource utilization for different fixed point formats used

54 Benchmarking between Simulation and Emulation Hereour emulation models are benchmarked against the equiv-alent software simulations The simulation models used arebased on an open source quantum library 119897119894119887119902119906119886119899119905119906119898 [28]Software simulation is performed on an Intel Core i7-4790eight-core processor with 36GHz clock rate running on aLinux-based Ubuntu 1404 kernel whereas hardware emu-lation is based on the Altera Stratix IV EP4SGX530KF43C4FPGA Table 1 shows the runtime comparison betweensimulation and our emulation

Figure 20 illustrates the runtime speed-up (simula-tionemulation) of Groverrsquos search case study It is importantto note that the proposed hardware emulation is implementedbased on 24-bit fixed point format (which is determined

Table 1 Comparison of runtime between simulation and emulation

QFT runtime (s) Groverrsquos search runtime (s)Simulation Emulation Simulation Emulation

2-qubit 155 times 10minus6 358 times 10minus9 296 times 10minus6 46 times 10minus9

3-qubit 466 times 10minus6 805 times 10minus9 740 times 10minus6 120 times 10minus9

4-qubit 510 times 10minus6 1344 times 10minus9 2207 times 10minus6 214 times 10minus9

5-qubit 565 times 10minus6 2193 times 10minus9 3988 times 10minus6 367 times 10minus9

6-qubit mdash mdash 10696 times 10minus6 627 times 10minus9

7-qubit mdash mdash 27714 times 10minus6 968 times 10minus9

based on the experimental work presented in Section 52)whereas 32-bit single precision float is used in the software

International Journal of Reconfigurable Computing 15

2 25 3 35 4 45 50

2

4

6

8

10

12

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

times104

(a) Logic element usage

2 25 3 35 4 45 50

05

1

15

2

25

3

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

times104

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(b) Register usage

2 25 3 35 4 45 50

200

400

600

800

1000

1200

Number of qubits

DSP

blo

ck 1

8-bi

t ele

men

t

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(c) DSP block usage

2 25 3 35 4 45 50

50

100

150

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(d) Maximum operating frequency

Figure 18 Resource utilization and maximum operating frequency for different emulation architectures of QFT (based on 24-bit fixed pointformat)

simulation to describe the quantum state For a larger qubitsize quantum circuit the number of mantissa bits for fixedpoint representation has to be increased accordingly to ensuresimilar precision as in software simulation can be achievedwith trade-offs on resource utilization and execution speed[44]

From Figure 20 it can be observed that the proposedhardware emulation provides significant speed-up over soft-ware simulation using 119897119894119887119902119906119886119899119905119906119898 It is important to notethat the achieved speed-up increases drastically as the num-ber of qubits increasesThis result further supports the notionthat hardware emulation has significant potential in themodelling of a large-scale quantum system on the classicalcomputing platform based on FPGA

As the number of required IO pins for emulating QFTand Groverrsquos search algorithms with parallel read-outs istoo much to fit in the existing FPGA devices board-levelverification is infeasible Although the usage of multiplexersreduces the number of output pins resource consumptionrises significantly with the increase in the number of qubitsand this affects the analysis of the overall experiment Thusthe estimated runtime of the proposed FPGA emulationarchitecture is obtained based on the hardware clock cycleand the operating frequency that is acquired from the FPGAdevelopment toolThis is not a critical issue in future practicaldeployment as the selected case studies are meant to work ascore modules within other quantum applications that mightnot require parallel read-out

16 International Journal of Reconfigurable Computing

2 25 3 35 4 45 5 55 6 65 70

1

2

3

4

5

6

7

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipelineSerial-parallel

times104

(a) Logic element usage

2 25 3 35 4 45 5 55 6 65 70

05

1

15

2

25

3

35

4

5

45

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

ConcurrentPipelineSerial-parallel

times104

(b) Register usage

2 25 3 35 4 45 5 55 6 65 70

50

100

150

200

250

300

350

400

450

500

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipelineSerial-parallel

(c) Maximum operating frequency

Figure 19 Resource utilization and maximum operating frequency for different emulation architectures of Groverrsquos search (based on 24-bitfixed point format)

6 Conclusion and Future Work

Efficient resource utilization is critical for FPGA-basedimplementations especially for emulating quantum comput-ing applications as they typically exhibit exponential resourcerequirement with increasing number of qubits In this paperwe proposed a baseline FPGA emulation framework withfocus on the datapath design optimization based on theconventional state vector model as well as an effectivemethodology that facilitates accurate modelling of quantumalgorithms for FPGAemulationA serial-parallel architecture

with efficient resource utilization for FPGA-based emulationof quantum computing is presentedThe proposed emulationarchitecture achieves linear reduction in resource utilizationcompared to pipeline implementations as found in previousworks This work has also demonstrated the advantage ofFPGA emulation over software simulation where hardwareemulation of 7-qubit Groverrsquos search is about 3 times 104 timesfaster than the software simulation performed on Intel Corei7-4790 eight-core processor running at 36GHz clock rate

However experimental results obtained in this workshow that it is difficult to realize a scalable and flexible

International Journal of Reconfigurable Computing 17

2 25 3 35 4 45 5 55 6 65 705

1

15

2

25

3

Number of qubits

Runt

ime s

peed

-up

(sim

ulat

ion

emul

atio

n)

times104

Figure 20 Runtime speed-up against number of qubits in Groverrsquossearch emulation

emulation platform for large qubit size real-world quantumsystem using the approach that applies existing state vectorquantum models This concurs with [45] that states that thepractical limit on the size of the quantum system that canbe modelled on classical computing platform can hardly beovercome due to exponentially large memory requirementsfor storing the entire state vector Hence this suggests thata model with a more effective data structure to representquantum systems is required Recently the work on stabi-lizer frames [30] has shown promise in providing a moresuitable data structure for quantum models targeted forFPGA emulation This is the subject of future work in ourresearch in applying FPGA emulation in modelling of large-scale quantum systems In addition the error-correctionstructure available with stabilizer frames will be consideredfor application in practical quantum computations With amore efficient modelling technique FPGA can represent amore efficient emulation strategy of quantum systems

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the Ministry of Higher Education(MOHE) and Universiti Teknologi Malaysia (UTM) underFundamental Research Grant Scheme (FRGS) Vote number4F422

References

[1] P W Shor ldquoAlgorithms for quantum computation discretelogarithms and factoringrdquo in Proceedings of the 35th IEEEAnnual Symposium on Foundations of Computer Science(SFCS rsquo94) pp 124ndash134 Santa Fe NM USA November 1994

[2] L K Grover ldquoQuantum mechanics helps in searching for aneedle in a haystackrdquo Physical Review Letters vol 79 no 2 pp325ndash328 1997

[3] A Malossini and T Calarco ldquoQuantum genetic optimizationrdquoIEEE Transactions on Evolutionary Computation vol 12 no 2pp 231ndash241 2008

[4] R L Rivest A Shamir and L Adleman ldquoA method forobtaining digital signatures and public-key cryptosystemsrdquoCommunications of the ACM vol 21 no 2 pp 120ndash126 1978

[5] L K Grover ldquoA fast quantum mechanical algorithm fordatabase searchrdquo in Proceedings of the 28th Annual ACMSymposium onTheory of Computing pp 212ndash219 ACM 1996

[6] N Shenvi J Kempe andK BWhaley ldquoQuantum random-walksearch algorithmrdquo Physical Review A Atomic Molecular andOptical Physics vol 67 no 5 Article ID 052307 2003

[7] E Farhi J Goldstone and S Gutmann ldquoA quantum algorithmfor the Hamiltonian NAND treerdquo Theory of Computing vol 4pp 169ndash190 2008

[8] P W Shor ldquoWhy havenrsquot more quantum algorithms beenfoundrdquo Journal of the ACM vol 50 no 1 pp 87ndash90 2003

[9] D R Simon ldquoOn the power of quantum computationrdquo SIAMJournal on Computing vol 26 no 5 pp 1474ndash1483 1997

[10] S Hallgren ldquoPolynomial-time quantum algorithms for Pellrsquosequation and the principal ideal problemrdquo Journal of the ACMvol 54 article 4 2007

[11] M Mosca and A Ekert ldquoThe hidden subgroup problem andeigenvalue estimation on a quantum computerrdquo in Quan-tum Computing and Quantum Communications pp 174ndash188Springer Berlin Germany 1999

[12] D Bacon A M Childs and W Van Dam ldquoFrom optimalmeasurement to efficient quantum algorithms for the hiddensubgroup problem over semidirect product groupsrdquo in Pro-ceedings of the 46th Annual IEEE Symposium on Foundationsof Computer Science (FOCS rsquo05) pp 469ndash478 IEEE October2005

[13] L K Grover and A M Sengupta ldquoFrom coupled pendulums toquantum searchrdquo inMathematics of QuantumComputation pp119ndash134 Chapman and HallCRC 2002

[14] R P Feynman ldquoSimulating physics with computersrdquo Interna-tional Journal ofTheoretical Physics vol 21 no 6-7 pp 467ndash4881982

[15] N S Yanofsky and M A Mannucci Quantum Computingfor Computer Scientists vol 20 Cambridge University PressCambridge UK 2008

[16] C Monroe D M Meekhof B E King W M Itano and DJ Wineland ldquoDemonstration of a fundamental quantum logicgaterdquo Physical Review Letters vol 75 no 25 pp 4714ndash4717 1995

[17] N A Gershenfeld and I L Chuang ldquoBulk spin-resonancequantum computationrdquo Science vol 275 no 5298 pp 350ndash3561997

[18] J E Mooij T P Orlando L Levitov L Tian C H Van derWaland S Lloyd ldquoJosephson persistent-current qubitrdquo Science vol285 no 5430 pp 1036ndash1039 1999

[19] I L Chuang N Gershenfeld and M Kubinec ldquoExperimentalimplementation of fast quantum searchingrdquo Physical ReviewLetters vol 80 no 15 article 3408 1998

[20] R Barends J Kelly A Megrant et al ldquoSuperconducting quan-tum circuits at the surface code threshold for fault tolerancerdquoNature vol 508 no 7497 pp 500ndash503 2014

[21] M H Amin N G Dickson and P Smith ldquoAdiabatic quantumoptimization with quditsrdquo Quantum Information Processingvol 12 no 4 pp 1819ndash1829 2013

18 International Journal of Reconfigurable Computing

[22] A D King E Hoskinson T Lanting E Andriyash and M HAmin ldquoDegeneracy degree and heavy tails in quantumanneal-ingrdquo httparxivorgabs151207325

[23] T F Roslashnnow Z Wang J Job et al ldquoDefining and detectingquantum speeduprdquo Science vol 345 no 6195 pp 420ndash4242014

[24] Y Goto and M Fujishima ldquoEfficient quantum computingemulation system with unitary macro-operationsrdquo JapaneseJournal of Applied Physics vol 46 no 4 pp 2278ndash2282 2007

[25] A U Khalid Z Zilic and K Radecka ldquoFPGA emulation ofquantum circuitsrdquo in Proceedings of the IEEE International Con-ference on Computer Design VLSI in Computers and Processors(ICCD rsquo04) pp 310ndash315 IEEE October 2004

[26] J F Rivera-Miranda A J Caicedo-Beltran J D Valencia-Payan J M Espinosa-Duran and J Velasco-Medina ldquoHard-ware emulation of quantum Fourier transformrdquo in Proceedingsof the IEEE 2nd Latin American Symposium on Circuits and Sys-tems (LASCAS rsquo11) pp 1ndash4 IEEE Bogata Colombia February2011

[27] M Aminian M Saeedi M S Zamani andM Sedighi ldquoFPGA-based circuit model emulation of quantum algorithmsrdquo inProceedings of the IEEE Computer Society Annual Symposiumon VLSI (ISVLSI rsquo08) pp 399ndash404 IEEE Montpellier FranceApril 2008

[28] H Weimer M Muller I Lesanovsky P Zoller and H PBuchler ldquoA Rydberg quantum simulatorrdquoNature Physics vol 6no 5 pp 382ndash388 2010

[29] G F Viamontes Efficient quantum circuit simulation [PhDthesis] The University of Michigan Ann Arbor Mich USA2007

[30] H J Garcıa and I L Markov ldquoSimulation of quantum circuitsvia stabilizer framesrdquo IEEE Transactions on Computers vol 64no 8 pp 2323ndash2336 2015

[31] C P Williams and S H Clearwater Explorations in QuantumComputing Springer Berlin Germany 1998

[32] A Barenco D Deutsch A Ekert and R Jozsa ldquoConditionalquantumdynamics and logic gatesrdquo Physical Review Letters vol74 no 20 pp 4083ndash4086 1995

[33] M A Nielsen and I L Chuang Quantum Computation andQuantum Information Cambridge University Press 2000

[34] W-W Zhang F Gao B Liu Q-Y Wen and H Chen ldquoAwatermark strategy for quantum images based on quantumFourier transformrdquo Quantum Information Processing vol 12no 2 pp 793ndash803 2013

[35] D Curtis and D A Meyer ldquoTowards quantum templatematchingrdquo in Proceedings of the SPIE 48th Annual Meeting inQuantum Communications and Quantum Imaging pp 134ndash141International Society for Optics and Photonics August 2003

[36] R Schutzhold and G Schaller ldquoAdiabatic quantum algorithmsas quantum phase transitions first versus second orderrdquo Physi-cal Review A vol 74 no 6 Article ID 060304 2006

[37] W P Baritompa D W Bulger and G R Wood ldquoGroverrsquosquantum algorithm applied to global optimizationrdquo SIAMJournal on Optimization vol 15 no 4 pp 1170ndash1184 2005

[38] C Durr and P Hoyer ldquoA quantum algorithm for finding theminimumrdquo httparxivorgabsquant-ph9607014

[39] A Montanaro ldquoQuantum pattern matching fast on averagerdquoAlgorithmica 2015

[40] Y H Lee M Khalil-Hani and M N Marsono ldquoFPGA-basedquantum circuit emulation a case study on Quantum Fouriertransformrdquo in Proceedings of the 14th International Symposium

on Integrated Circuits (ISIC rsquo14) pp 512ndash515 IEEE SingaporeDecember 2014

[41] M Khalil-Hani Y H Lee and M N Marsono ldquoAn accu-rate FPGA-based hardware emulation on quantum fouriertransformrdquo in Proceedings of the Australasian Symposium onParallel andDistributed Computing (AusPDC rsquo15) vol 1 SydneyAustralia 2015 a1b3

[42] M Frigo and S G Johnson ldquoThe design and implementationof FFTW3rdquo Proceedings of the IEEE vol 93 no 2 pp 216ndash2312005

[43] D Wecker and K M Svore ldquoLIQUi|gt a software design archi-tecture and domain-specific language for quantum computingrdquohttparxivorgabs14024467

[44] S Kilts Advanced FPGA Design Architecture Implementationand Optimization John Wiley amp Sons New York NY USA2007

[45] M Smelyanskiy N P D Sawaya and A Aspuru-GuzikldquoqHiPSTER the quantum high performance software testingenvironmentrdquo httparxivorgabs160107195

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 11: Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

International Journal of Reconfigurable Computing 11

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

1

radic8

minusIN0

IN0

IN0 minusIN1IN1 minusIN2IN2 minusIN3IN3 minusIN4IN4 minusIN5IN5 minusIN6IN6 minusIN7IN7

Target

Target Target Target Target Target Target Target Target

Target Target Target Target Target Target Target

COMP

COMPCOMPCOMPCOMPCOMPCOMPCOMPCOMP

COMPCOMPCOMPCOMPCOMPCOMPCOMP

IN1 IN2 IN3 IN4 IN5 IN6 IN7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

out1_0 out1_1 out1_2 out1_3 out1_4 out1_5 out1_6 out1_7

+ + + +

++

+

+ +

+

+ + +

+

out1_0 minusout1_0 out1_1 minusout1_1 out1_2 minusout1_2 out1_3 minusout1_3 out1_4 minusout1_4 out1_5 minusout1_5 out1_6 minusout1_6 out1_7 minusout1_7

minus

minus minus minus minus minus minus minus minus

minus minus minus minus minus minus minus

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

out2_0 out2_1 out2_2 out2_3 out2_4 out2_5 out2_6 out2_7

Groveriteration 2

Grover

INIT

iteration 1

OUT0 OUT1 OUT2 OUT3 OUT4 OUT5 OUT6 OUT7

≫(n minus 1)

≫(n minus 1)

Figure 12 Data-flow graph of Groverrsquos circuit for 3-bit search

U1 U2 U3 U4 U5 U6 U78 8

IN0

IN7

OUT0

OUT7

(a) Concurrent processing

U4U1 U2 U3 U5 U6 U788

IN0

IN7

OUT0

OUT7

(b) Pipeline architecture

88

Control unit

Datapath unit

Shared

Arithmeticlogic unit

(ALU)

registersIN0

IN7

OUT0

OUT7

(c) Serial architecture

Figure 13 Three-qubit QFT implemented using different hardware architectures

12 International Journal of Reconfigurable Computing

CU

ALU

DU

alu3alu2alu1alu0

alu4 alu5 alu6 alu7

CMULTCMULT

reg0reg4

IN0

Shared registers

IN1alu0

alu1alu4

IN2alu2alu4alu1IN3

IN4

alu3alu5

alu4alu2

reg3reg6

reg3

reg1

reg6

reg7

IN7

alu5

IN6

IN5

alu7

alu3

alu6

alu3

alu6

reg4 reg4reg2 reg2 reg2reg1 reg1 reg3 reg3reg5

reg5

reg5 reg5reg6 reg6 reg7

reg7

Stateregisters

Next-stateStart

Control signals

logic

Outputlogic

+

F

F

F

+ + +

minus minus minus minus

1

radic2(1 + i)

1

radic2(1 + i)

alu_comp0 alu_comp1

reg7

reg6

reg5

reg3

reg1

reg2

reg4

reg0

alu_comp

alu_comp0

1

|k1⟩|k2⟩

|j1⟩|j2⟩

|k3⟩|j3⟩

times times times times

times times times times

Figure 14 Proposed serial-parallel architecture for 3-qubit QFTThe multipliers shown in this diagram represent the multiplication of inputcomplex number with constant 1radic2

1 sign bit 1 decimal bit N mantissa bits

Figure 15 Fixed point representation format

states after measurement are constrained in the range of 0 to1 only one bit is required to represent the integer part

Due to the limitations of the classical digital computingplatform representation of qubit amplitudes with infiniteprecision is infeasible In the context of quantum computermodelling particularly for quantum systems with large qubitsizes minimising precision loss is critical to preserve theconsistency of the quantum state during the modellingprocess [43] Here we investigate how precision error is

affected by the number of mantissa bits used in our fixedpoint representation format The corresponding experimen-tal results are given in Figure 16

Precision error shown in Figure 16 is computed based onthe following equations

error 119903 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119903

119898minus simulate 119903

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119903

119898

1003816100381610038161003816

error 119894 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119894

119898minus simulate 119894

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119894

119898

1003816100381610038161003816

precision error = 1

2119899+1(error 119903 + error 119894)

(28)

International Journal of Reconfigurable Computing 13

14 15 16 17 18 19 20 21 22 23 240

005

01

015

02

025

03

035

04

045

05

Number of mantissa bits

Prec

ision

erro

r (

)

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

Figure 16 Precision error against different fixed point formats used

where simulate 119903 and simulate 119894 are the floating point realand imaginary amplitudes of the reference output state gen-erated from simulation whereas emulate 119903 and emulate 119894 arethe amplitudes extracted from the output state of proposedFPGA emulator (converted from original fixed point formatto floating point for verification purposes)

Figure 16 shows that 16-bit fixed point format (14mantissabits) incurs significant precision error for 5-qubit emulationsin both case studies However the error produced by 2-qubitemulations is insignificant This behaviour is because theamplification of the fixed point truncation errors grows withthe size of quantum system For FPGA emulation purposesprecision error due to fixed point representation can bereduced by increasing the number ofmantissa bits with trade-off on resource utilization By expanding the number ofmantissa bits up to 24-bit (which results in 26 total numberof bits) negligible precision error for 5-qubit emulations isattained It is important to note that the proposed FPGAemulator is parameterizable in terms of the number ofmantissa bits for fixed point representation This is crucial toensure different fixed point formats can be applied to emulatequantum circuit of various sizes based on the demandedprecision error tolerance and resource constraint

The size of our fixed point number format also affectsresource utilization The corresponding experimental resultsforQFT case study are shown in Figure 17 Since the resourcesavailable on FPGA device are mostly in blocks or multiples of8 bits the choice of 26-bit fixed point format is not suitableHence we apply 22-mantissa-bit size (ie total number of bitsis 24) in our fixed point representation formats Note thatfor Groverrsquos search emulations the experiment on resourceutilization of DSP blocks is not relevant because FPGAemulation model developed here (as depicted in Figure 12)does not involve multiplication

53 Efficiency of Proposed FPGA Emulation Architecture Inthis subsection we investigate how the increase in qubit size

impacts resource growth and maximum allowable operatingfrequency in different emulation architectures (ie concur-rent pipeline and serial-parallel) In the case of QFT emu-lation model we have two versions of serial-parallel archi-tectures Type 1 serial-parallel uses DSP blocks to performmultiplications whereas type 2 replaces the multiplications(required in Hadamard gate operations 1198801 1198804 and 1198806 inthe 3-qubit QFT case) with shift-add operations Althoughconventional hardware design methods encourage the usageof shift-add operation instead of multiplication to reduceresource utilization the case is now different with FPGAdevices containing efficient built-in DSP blocks Figures 18and 19 show the results of the experiments conducted onQFTand Groverrsquos search algorithms respectively

Based on the experimental results shown in Figure 18when comparing to type 1 type 2 method consumes lessDSP blocks but more logic elements due to the constructionof adders needed in shift-add operations As the number ofqubits increases resource utilization of logic elements fortype 1 emulation grows rapidly when available DSP blocks areused up and it ends up with similar resource utilization asthe type 2 approach Hence we can conclude that for large-scale FPGA emulations both methods would lead to similarresource utilization Thus the first approach is preferred dueto the ease of design process where DSP blocks are utilized bydefault when implementing the design with the FPGA designautomation tool

In contrast to the concurrent and pipeline designs theexperimental results for QFT and Groverrsquos search show thatthe proposed serial-parallel architecture achieves balance onboth resource utilization and operating frequency The pro-posed architecture has significantly reduced resource growthin the application of logic elements dedicated logic regis-ters and DSP block yet maintaining reasonable operatingfrequency With the concurrent and pipeline architectures5-qubit QFT emulation completely used up the resourcesavailable in the Altera Stratix IV FPGA device used in thiswork However the same device can support up to 7-qubitQFT emulationwith the proposed serial-parallel architectureIt is important to note that the processing power of 7-qubitQFT is far higher than the 5-qubit implementation as an 119899-qubit QFT can process up to 2119899 samples in one evaluation

For scalability software simulation would depend onresources available on the computer servers The scalabilityof the FPGA emulation framework would depend on theresources available on target FPGA devices As there havebeen rapid advances in FPGA technology in recent years bydesigning an efficient architecture that is implemented in ahigh-density FPGA device (such as the Altera Stratix 10 thatcontains up to 55 million logic elements) one can actuallyemulate large qubit size quantum circuit on a single FPGAchip Furthermore new approach to FPGA emulation maybe made through the exploration of efficient data structuresandmodellingmethodsThus the proposedwork contributesto the formulation of a proof-of-concept baseline FPGAemulation framework with optimization on datapath designsthat can be extended to emulate practical large-scale quantumcircuits

14 International Journal of Reconfigurable Computing

14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Number of mantissa bits

Com

bina

tiona

l ALU

Ts

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

times104

(a) Logic element usage

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

14 15 16 17 18 19 20 21 22 23 240

500

1000

1500

2000

2500

3000

3500

Number of mantissa bits

Ded

icat

ed lo

gic r

egist

ers

(b) Register usage

2-qubit QFT5-qubit QFT

14 15 16 17 18 19 20 21 22 23 240

200

400

600

800

1000

1200

Number of mantissa bits

DSP

blo

ck 1

8-bi

t ele

men

t

(c) DSP block usage

Figure 17 Resource utilization for different fixed point formats used

54 Benchmarking between Simulation and Emulation Hereour emulation models are benchmarked against the equiv-alent software simulations The simulation models used arebased on an open source quantum library 119897119894119887119902119906119886119899119905119906119898 [28]Software simulation is performed on an Intel Core i7-4790eight-core processor with 36GHz clock rate running on aLinux-based Ubuntu 1404 kernel whereas hardware emu-lation is based on the Altera Stratix IV EP4SGX530KF43C4FPGA Table 1 shows the runtime comparison betweensimulation and our emulation

Figure 20 illustrates the runtime speed-up (simula-tionemulation) of Groverrsquos search case study It is importantto note that the proposed hardware emulation is implementedbased on 24-bit fixed point format (which is determined

Table 1 Comparison of runtime between simulation and emulation

QFT runtime (s) Groverrsquos search runtime (s)Simulation Emulation Simulation Emulation

2-qubit 155 times 10minus6 358 times 10minus9 296 times 10minus6 46 times 10minus9

3-qubit 466 times 10minus6 805 times 10minus9 740 times 10minus6 120 times 10minus9

4-qubit 510 times 10minus6 1344 times 10minus9 2207 times 10minus6 214 times 10minus9

5-qubit 565 times 10minus6 2193 times 10minus9 3988 times 10minus6 367 times 10minus9

6-qubit mdash mdash 10696 times 10minus6 627 times 10minus9

7-qubit mdash mdash 27714 times 10minus6 968 times 10minus9

based on the experimental work presented in Section 52)whereas 32-bit single precision float is used in the software

International Journal of Reconfigurable Computing 15

2 25 3 35 4 45 50

2

4

6

8

10

12

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

times104

(a) Logic element usage

2 25 3 35 4 45 50

05

1

15

2

25

3

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

times104

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(b) Register usage

2 25 3 35 4 45 50

200

400

600

800

1000

1200

Number of qubits

DSP

blo

ck 1

8-bi

t ele

men

t

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(c) DSP block usage

2 25 3 35 4 45 50

50

100

150

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(d) Maximum operating frequency

Figure 18 Resource utilization and maximum operating frequency for different emulation architectures of QFT (based on 24-bit fixed pointformat)

simulation to describe the quantum state For a larger qubitsize quantum circuit the number of mantissa bits for fixedpoint representation has to be increased accordingly to ensuresimilar precision as in software simulation can be achievedwith trade-offs on resource utilization and execution speed[44]

From Figure 20 it can be observed that the proposedhardware emulation provides significant speed-up over soft-ware simulation using 119897119894119887119902119906119886119899119905119906119898 It is important to notethat the achieved speed-up increases drastically as the num-ber of qubits increasesThis result further supports the notionthat hardware emulation has significant potential in themodelling of a large-scale quantum system on the classicalcomputing platform based on FPGA

As the number of required IO pins for emulating QFTand Groverrsquos search algorithms with parallel read-outs istoo much to fit in the existing FPGA devices board-levelverification is infeasible Although the usage of multiplexersreduces the number of output pins resource consumptionrises significantly with the increase in the number of qubitsand this affects the analysis of the overall experiment Thusthe estimated runtime of the proposed FPGA emulationarchitecture is obtained based on the hardware clock cycleand the operating frequency that is acquired from the FPGAdevelopment toolThis is not a critical issue in future practicaldeployment as the selected case studies are meant to work ascore modules within other quantum applications that mightnot require parallel read-out

16 International Journal of Reconfigurable Computing

2 25 3 35 4 45 5 55 6 65 70

1

2

3

4

5

6

7

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipelineSerial-parallel

times104

(a) Logic element usage

2 25 3 35 4 45 5 55 6 65 70

05

1

15

2

25

3

35

4

5

45

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

ConcurrentPipelineSerial-parallel

times104

(b) Register usage

2 25 3 35 4 45 5 55 6 65 70

50

100

150

200

250

300

350

400

450

500

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipelineSerial-parallel

(c) Maximum operating frequency

Figure 19 Resource utilization and maximum operating frequency for different emulation architectures of Groverrsquos search (based on 24-bitfixed point format)

6 Conclusion and Future Work

Efficient resource utilization is critical for FPGA-basedimplementations especially for emulating quantum comput-ing applications as they typically exhibit exponential resourcerequirement with increasing number of qubits In this paperwe proposed a baseline FPGA emulation framework withfocus on the datapath design optimization based on theconventional state vector model as well as an effectivemethodology that facilitates accurate modelling of quantumalgorithms for FPGAemulationA serial-parallel architecture

with efficient resource utilization for FPGA-based emulationof quantum computing is presentedThe proposed emulationarchitecture achieves linear reduction in resource utilizationcompared to pipeline implementations as found in previousworks This work has also demonstrated the advantage ofFPGA emulation over software simulation where hardwareemulation of 7-qubit Groverrsquos search is about 3 times 104 timesfaster than the software simulation performed on Intel Corei7-4790 eight-core processor running at 36GHz clock rate

However experimental results obtained in this workshow that it is difficult to realize a scalable and flexible

International Journal of Reconfigurable Computing 17

2 25 3 35 4 45 5 55 6 65 705

1

15

2

25

3

Number of qubits

Runt

ime s

peed

-up

(sim

ulat

ion

emul

atio

n)

times104

Figure 20 Runtime speed-up against number of qubits in Groverrsquossearch emulation

emulation platform for large qubit size real-world quantumsystem using the approach that applies existing state vectorquantum models This concurs with [45] that states that thepractical limit on the size of the quantum system that canbe modelled on classical computing platform can hardly beovercome due to exponentially large memory requirementsfor storing the entire state vector Hence this suggests thata model with a more effective data structure to representquantum systems is required Recently the work on stabi-lizer frames [30] has shown promise in providing a moresuitable data structure for quantum models targeted forFPGA emulation This is the subject of future work in ourresearch in applying FPGA emulation in modelling of large-scale quantum systems In addition the error-correctionstructure available with stabilizer frames will be consideredfor application in practical quantum computations With amore efficient modelling technique FPGA can represent amore efficient emulation strategy of quantum systems

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the Ministry of Higher Education(MOHE) and Universiti Teknologi Malaysia (UTM) underFundamental Research Grant Scheme (FRGS) Vote number4F422

References

[1] P W Shor ldquoAlgorithms for quantum computation discretelogarithms and factoringrdquo in Proceedings of the 35th IEEEAnnual Symposium on Foundations of Computer Science(SFCS rsquo94) pp 124ndash134 Santa Fe NM USA November 1994

[2] L K Grover ldquoQuantum mechanics helps in searching for aneedle in a haystackrdquo Physical Review Letters vol 79 no 2 pp325ndash328 1997

[3] A Malossini and T Calarco ldquoQuantum genetic optimizationrdquoIEEE Transactions on Evolutionary Computation vol 12 no 2pp 231ndash241 2008

[4] R L Rivest A Shamir and L Adleman ldquoA method forobtaining digital signatures and public-key cryptosystemsrdquoCommunications of the ACM vol 21 no 2 pp 120ndash126 1978

[5] L K Grover ldquoA fast quantum mechanical algorithm fordatabase searchrdquo in Proceedings of the 28th Annual ACMSymposium onTheory of Computing pp 212ndash219 ACM 1996

[6] N Shenvi J Kempe andK BWhaley ldquoQuantum random-walksearch algorithmrdquo Physical Review A Atomic Molecular andOptical Physics vol 67 no 5 Article ID 052307 2003

[7] E Farhi J Goldstone and S Gutmann ldquoA quantum algorithmfor the Hamiltonian NAND treerdquo Theory of Computing vol 4pp 169ndash190 2008

[8] P W Shor ldquoWhy havenrsquot more quantum algorithms beenfoundrdquo Journal of the ACM vol 50 no 1 pp 87ndash90 2003

[9] D R Simon ldquoOn the power of quantum computationrdquo SIAMJournal on Computing vol 26 no 5 pp 1474ndash1483 1997

[10] S Hallgren ldquoPolynomial-time quantum algorithms for Pellrsquosequation and the principal ideal problemrdquo Journal of the ACMvol 54 article 4 2007

[11] M Mosca and A Ekert ldquoThe hidden subgroup problem andeigenvalue estimation on a quantum computerrdquo in Quan-tum Computing and Quantum Communications pp 174ndash188Springer Berlin Germany 1999

[12] D Bacon A M Childs and W Van Dam ldquoFrom optimalmeasurement to efficient quantum algorithms for the hiddensubgroup problem over semidirect product groupsrdquo in Pro-ceedings of the 46th Annual IEEE Symposium on Foundationsof Computer Science (FOCS rsquo05) pp 469ndash478 IEEE October2005

[13] L K Grover and A M Sengupta ldquoFrom coupled pendulums toquantum searchrdquo inMathematics of QuantumComputation pp119ndash134 Chapman and HallCRC 2002

[14] R P Feynman ldquoSimulating physics with computersrdquo Interna-tional Journal ofTheoretical Physics vol 21 no 6-7 pp 467ndash4881982

[15] N S Yanofsky and M A Mannucci Quantum Computingfor Computer Scientists vol 20 Cambridge University PressCambridge UK 2008

[16] C Monroe D M Meekhof B E King W M Itano and DJ Wineland ldquoDemonstration of a fundamental quantum logicgaterdquo Physical Review Letters vol 75 no 25 pp 4714ndash4717 1995

[17] N A Gershenfeld and I L Chuang ldquoBulk spin-resonancequantum computationrdquo Science vol 275 no 5298 pp 350ndash3561997

[18] J E Mooij T P Orlando L Levitov L Tian C H Van derWaland S Lloyd ldquoJosephson persistent-current qubitrdquo Science vol285 no 5430 pp 1036ndash1039 1999

[19] I L Chuang N Gershenfeld and M Kubinec ldquoExperimentalimplementation of fast quantum searchingrdquo Physical ReviewLetters vol 80 no 15 article 3408 1998

[20] R Barends J Kelly A Megrant et al ldquoSuperconducting quan-tum circuits at the surface code threshold for fault tolerancerdquoNature vol 508 no 7497 pp 500ndash503 2014

[21] M H Amin N G Dickson and P Smith ldquoAdiabatic quantumoptimization with quditsrdquo Quantum Information Processingvol 12 no 4 pp 1819ndash1829 2013

18 International Journal of Reconfigurable Computing

[22] A D King E Hoskinson T Lanting E Andriyash and M HAmin ldquoDegeneracy degree and heavy tails in quantumanneal-ingrdquo httparxivorgabs151207325

[23] T F Roslashnnow Z Wang J Job et al ldquoDefining and detectingquantum speeduprdquo Science vol 345 no 6195 pp 420ndash4242014

[24] Y Goto and M Fujishima ldquoEfficient quantum computingemulation system with unitary macro-operationsrdquo JapaneseJournal of Applied Physics vol 46 no 4 pp 2278ndash2282 2007

[25] A U Khalid Z Zilic and K Radecka ldquoFPGA emulation ofquantum circuitsrdquo in Proceedings of the IEEE International Con-ference on Computer Design VLSI in Computers and Processors(ICCD rsquo04) pp 310ndash315 IEEE October 2004

[26] J F Rivera-Miranda A J Caicedo-Beltran J D Valencia-Payan J M Espinosa-Duran and J Velasco-Medina ldquoHard-ware emulation of quantum Fourier transformrdquo in Proceedingsof the IEEE 2nd Latin American Symposium on Circuits and Sys-tems (LASCAS rsquo11) pp 1ndash4 IEEE Bogata Colombia February2011

[27] M Aminian M Saeedi M S Zamani andM Sedighi ldquoFPGA-based circuit model emulation of quantum algorithmsrdquo inProceedings of the IEEE Computer Society Annual Symposiumon VLSI (ISVLSI rsquo08) pp 399ndash404 IEEE Montpellier FranceApril 2008

[28] H Weimer M Muller I Lesanovsky P Zoller and H PBuchler ldquoA Rydberg quantum simulatorrdquoNature Physics vol 6no 5 pp 382ndash388 2010

[29] G F Viamontes Efficient quantum circuit simulation [PhDthesis] The University of Michigan Ann Arbor Mich USA2007

[30] H J Garcıa and I L Markov ldquoSimulation of quantum circuitsvia stabilizer framesrdquo IEEE Transactions on Computers vol 64no 8 pp 2323ndash2336 2015

[31] C P Williams and S H Clearwater Explorations in QuantumComputing Springer Berlin Germany 1998

[32] A Barenco D Deutsch A Ekert and R Jozsa ldquoConditionalquantumdynamics and logic gatesrdquo Physical Review Letters vol74 no 20 pp 4083ndash4086 1995

[33] M A Nielsen and I L Chuang Quantum Computation andQuantum Information Cambridge University Press 2000

[34] W-W Zhang F Gao B Liu Q-Y Wen and H Chen ldquoAwatermark strategy for quantum images based on quantumFourier transformrdquo Quantum Information Processing vol 12no 2 pp 793ndash803 2013

[35] D Curtis and D A Meyer ldquoTowards quantum templatematchingrdquo in Proceedings of the SPIE 48th Annual Meeting inQuantum Communications and Quantum Imaging pp 134ndash141International Society for Optics and Photonics August 2003

[36] R Schutzhold and G Schaller ldquoAdiabatic quantum algorithmsas quantum phase transitions first versus second orderrdquo Physi-cal Review A vol 74 no 6 Article ID 060304 2006

[37] W P Baritompa D W Bulger and G R Wood ldquoGroverrsquosquantum algorithm applied to global optimizationrdquo SIAMJournal on Optimization vol 15 no 4 pp 1170ndash1184 2005

[38] C Durr and P Hoyer ldquoA quantum algorithm for finding theminimumrdquo httparxivorgabsquant-ph9607014

[39] A Montanaro ldquoQuantum pattern matching fast on averagerdquoAlgorithmica 2015

[40] Y H Lee M Khalil-Hani and M N Marsono ldquoFPGA-basedquantum circuit emulation a case study on Quantum Fouriertransformrdquo in Proceedings of the 14th International Symposium

on Integrated Circuits (ISIC rsquo14) pp 512ndash515 IEEE SingaporeDecember 2014

[41] M Khalil-Hani Y H Lee and M N Marsono ldquoAn accu-rate FPGA-based hardware emulation on quantum fouriertransformrdquo in Proceedings of the Australasian Symposium onParallel andDistributed Computing (AusPDC rsquo15) vol 1 SydneyAustralia 2015 a1b3

[42] M Frigo and S G Johnson ldquoThe design and implementationof FFTW3rdquo Proceedings of the IEEE vol 93 no 2 pp 216ndash2312005

[43] D Wecker and K M Svore ldquoLIQUi|gt a software design archi-tecture and domain-specific language for quantum computingrdquohttparxivorgabs14024467

[44] S Kilts Advanced FPGA Design Architecture Implementationand Optimization John Wiley amp Sons New York NY USA2007

[45] M Smelyanskiy N P D Sawaya and A Aspuru-GuzikldquoqHiPSTER the quantum high performance software testingenvironmentrdquo httparxivorgabs160107195

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 12: Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

12 International Journal of Reconfigurable Computing

CU

ALU

DU

alu3alu2alu1alu0

alu4 alu5 alu6 alu7

CMULTCMULT

reg0reg4

IN0

Shared registers

IN1alu0

alu1alu4

IN2alu2alu4alu1IN3

IN4

alu3alu5

alu4alu2

reg3reg6

reg3

reg1

reg6

reg7

IN7

alu5

IN6

IN5

alu7

alu3

alu6

alu3

alu6

reg4 reg4reg2 reg2 reg2reg1 reg1 reg3 reg3reg5

reg5

reg5 reg5reg6 reg6 reg7

reg7

Stateregisters

Next-stateStart

Control signals

logic

Outputlogic

+

F

F

F

+ + +

minus minus minus minus

1

radic2(1 + i)

1

radic2(1 + i)

alu_comp0 alu_comp1

reg7

reg6

reg5

reg3

reg1

reg2

reg4

reg0

alu_comp

alu_comp0

1

|k1⟩|k2⟩

|j1⟩|j2⟩

|k3⟩|j3⟩

times times times times

times times times times

Figure 14 Proposed serial-parallel architecture for 3-qubit QFTThe multipliers shown in this diagram represent the multiplication of inputcomplex number with constant 1radic2

1 sign bit 1 decimal bit N mantissa bits

Figure 15 Fixed point representation format

states after measurement are constrained in the range of 0 to1 only one bit is required to represent the integer part

Due to the limitations of the classical digital computingplatform representation of qubit amplitudes with infiniteprecision is infeasible In the context of quantum computermodelling particularly for quantum systems with large qubitsizes minimising precision loss is critical to preserve theconsistency of the quantum state during the modellingprocess [43] Here we investigate how precision error is

affected by the number of mantissa bits used in our fixedpoint representation format The corresponding experimen-tal results are given in Figure 16

Precision error shown in Figure 16 is computed based onthe following equations

error 119903 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119903

119898minus simulate 119903

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119903

119898

1003816100381610038161003816

error 119894 =2119899minus1

sum

119898=0

1003816100381610038161003816emulate 119894

119898minus simulate 119894

119898

1003816100381610038161003816

1003816100381610038161003816simulate 119894

119898

1003816100381610038161003816

precision error = 1

2119899+1(error 119903 + error 119894)

(28)

International Journal of Reconfigurable Computing 13

14 15 16 17 18 19 20 21 22 23 240

005

01

015

02

025

03

035

04

045

05

Number of mantissa bits

Prec

ision

erro

r (

)

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

Figure 16 Precision error against different fixed point formats used

where simulate 119903 and simulate 119894 are the floating point realand imaginary amplitudes of the reference output state gen-erated from simulation whereas emulate 119903 and emulate 119894 arethe amplitudes extracted from the output state of proposedFPGA emulator (converted from original fixed point formatto floating point for verification purposes)

Figure 16 shows that 16-bit fixed point format (14mantissabits) incurs significant precision error for 5-qubit emulationsin both case studies However the error produced by 2-qubitemulations is insignificant This behaviour is because theamplification of the fixed point truncation errors grows withthe size of quantum system For FPGA emulation purposesprecision error due to fixed point representation can bereduced by increasing the number ofmantissa bits with trade-off on resource utilization By expanding the number ofmantissa bits up to 24-bit (which results in 26 total numberof bits) negligible precision error for 5-qubit emulations isattained It is important to note that the proposed FPGAemulator is parameterizable in terms of the number ofmantissa bits for fixed point representation This is crucial toensure different fixed point formats can be applied to emulatequantum circuit of various sizes based on the demandedprecision error tolerance and resource constraint

The size of our fixed point number format also affectsresource utilization The corresponding experimental resultsforQFT case study are shown in Figure 17 Since the resourcesavailable on FPGA device are mostly in blocks or multiples of8 bits the choice of 26-bit fixed point format is not suitableHence we apply 22-mantissa-bit size (ie total number of bitsis 24) in our fixed point representation formats Note thatfor Groverrsquos search emulations the experiment on resourceutilization of DSP blocks is not relevant because FPGAemulation model developed here (as depicted in Figure 12)does not involve multiplication

53 Efficiency of Proposed FPGA Emulation Architecture Inthis subsection we investigate how the increase in qubit size

impacts resource growth and maximum allowable operatingfrequency in different emulation architectures (ie concur-rent pipeline and serial-parallel) In the case of QFT emu-lation model we have two versions of serial-parallel archi-tectures Type 1 serial-parallel uses DSP blocks to performmultiplications whereas type 2 replaces the multiplications(required in Hadamard gate operations 1198801 1198804 and 1198806 inthe 3-qubit QFT case) with shift-add operations Althoughconventional hardware design methods encourage the usageof shift-add operation instead of multiplication to reduceresource utilization the case is now different with FPGAdevices containing efficient built-in DSP blocks Figures 18and 19 show the results of the experiments conducted onQFTand Groverrsquos search algorithms respectively

Based on the experimental results shown in Figure 18when comparing to type 1 type 2 method consumes lessDSP blocks but more logic elements due to the constructionof adders needed in shift-add operations As the number ofqubits increases resource utilization of logic elements fortype 1 emulation grows rapidly when available DSP blocks areused up and it ends up with similar resource utilization asthe type 2 approach Hence we can conclude that for large-scale FPGA emulations both methods would lead to similarresource utilization Thus the first approach is preferred dueto the ease of design process where DSP blocks are utilized bydefault when implementing the design with the FPGA designautomation tool

In contrast to the concurrent and pipeline designs theexperimental results for QFT and Groverrsquos search show thatthe proposed serial-parallel architecture achieves balance onboth resource utilization and operating frequency The pro-posed architecture has significantly reduced resource growthin the application of logic elements dedicated logic regis-ters and DSP block yet maintaining reasonable operatingfrequency With the concurrent and pipeline architectures5-qubit QFT emulation completely used up the resourcesavailable in the Altera Stratix IV FPGA device used in thiswork However the same device can support up to 7-qubitQFT emulationwith the proposed serial-parallel architectureIt is important to note that the processing power of 7-qubitQFT is far higher than the 5-qubit implementation as an 119899-qubit QFT can process up to 2119899 samples in one evaluation

For scalability software simulation would depend onresources available on the computer servers The scalabilityof the FPGA emulation framework would depend on theresources available on target FPGA devices As there havebeen rapid advances in FPGA technology in recent years bydesigning an efficient architecture that is implemented in ahigh-density FPGA device (such as the Altera Stratix 10 thatcontains up to 55 million logic elements) one can actuallyemulate large qubit size quantum circuit on a single FPGAchip Furthermore new approach to FPGA emulation maybe made through the exploration of efficient data structuresandmodellingmethodsThus the proposedwork contributesto the formulation of a proof-of-concept baseline FPGAemulation framework with optimization on datapath designsthat can be extended to emulate practical large-scale quantumcircuits

14 International Journal of Reconfigurable Computing

14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Number of mantissa bits

Com

bina

tiona

l ALU

Ts

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

times104

(a) Logic element usage

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

14 15 16 17 18 19 20 21 22 23 240

500

1000

1500

2000

2500

3000

3500

Number of mantissa bits

Ded

icat

ed lo

gic r

egist

ers

(b) Register usage

2-qubit QFT5-qubit QFT

14 15 16 17 18 19 20 21 22 23 240

200

400

600

800

1000

1200

Number of mantissa bits

DSP

blo

ck 1

8-bi

t ele

men

t

(c) DSP block usage

Figure 17 Resource utilization for different fixed point formats used

54 Benchmarking between Simulation and Emulation Hereour emulation models are benchmarked against the equiv-alent software simulations The simulation models used arebased on an open source quantum library 119897119894119887119902119906119886119899119905119906119898 [28]Software simulation is performed on an Intel Core i7-4790eight-core processor with 36GHz clock rate running on aLinux-based Ubuntu 1404 kernel whereas hardware emu-lation is based on the Altera Stratix IV EP4SGX530KF43C4FPGA Table 1 shows the runtime comparison betweensimulation and our emulation

Figure 20 illustrates the runtime speed-up (simula-tionemulation) of Groverrsquos search case study It is importantto note that the proposed hardware emulation is implementedbased on 24-bit fixed point format (which is determined

Table 1 Comparison of runtime between simulation and emulation

QFT runtime (s) Groverrsquos search runtime (s)Simulation Emulation Simulation Emulation

2-qubit 155 times 10minus6 358 times 10minus9 296 times 10minus6 46 times 10minus9

3-qubit 466 times 10minus6 805 times 10minus9 740 times 10minus6 120 times 10minus9

4-qubit 510 times 10minus6 1344 times 10minus9 2207 times 10minus6 214 times 10minus9

5-qubit 565 times 10minus6 2193 times 10minus9 3988 times 10minus6 367 times 10minus9

6-qubit mdash mdash 10696 times 10minus6 627 times 10minus9

7-qubit mdash mdash 27714 times 10minus6 968 times 10minus9

based on the experimental work presented in Section 52)whereas 32-bit single precision float is used in the software

International Journal of Reconfigurable Computing 15

2 25 3 35 4 45 50

2

4

6

8

10

12

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

times104

(a) Logic element usage

2 25 3 35 4 45 50

05

1

15

2

25

3

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

times104

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(b) Register usage

2 25 3 35 4 45 50

200

400

600

800

1000

1200

Number of qubits

DSP

blo

ck 1

8-bi

t ele

men

t

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(c) DSP block usage

2 25 3 35 4 45 50

50

100

150

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(d) Maximum operating frequency

Figure 18 Resource utilization and maximum operating frequency for different emulation architectures of QFT (based on 24-bit fixed pointformat)

simulation to describe the quantum state For a larger qubitsize quantum circuit the number of mantissa bits for fixedpoint representation has to be increased accordingly to ensuresimilar precision as in software simulation can be achievedwith trade-offs on resource utilization and execution speed[44]

From Figure 20 it can be observed that the proposedhardware emulation provides significant speed-up over soft-ware simulation using 119897119894119887119902119906119886119899119905119906119898 It is important to notethat the achieved speed-up increases drastically as the num-ber of qubits increasesThis result further supports the notionthat hardware emulation has significant potential in themodelling of a large-scale quantum system on the classicalcomputing platform based on FPGA

As the number of required IO pins for emulating QFTand Groverrsquos search algorithms with parallel read-outs istoo much to fit in the existing FPGA devices board-levelverification is infeasible Although the usage of multiplexersreduces the number of output pins resource consumptionrises significantly with the increase in the number of qubitsand this affects the analysis of the overall experiment Thusthe estimated runtime of the proposed FPGA emulationarchitecture is obtained based on the hardware clock cycleand the operating frequency that is acquired from the FPGAdevelopment toolThis is not a critical issue in future practicaldeployment as the selected case studies are meant to work ascore modules within other quantum applications that mightnot require parallel read-out

16 International Journal of Reconfigurable Computing

2 25 3 35 4 45 5 55 6 65 70

1

2

3

4

5

6

7

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipelineSerial-parallel

times104

(a) Logic element usage

2 25 3 35 4 45 5 55 6 65 70

05

1

15

2

25

3

35

4

5

45

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

ConcurrentPipelineSerial-parallel

times104

(b) Register usage

2 25 3 35 4 45 5 55 6 65 70

50

100

150

200

250

300

350

400

450

500

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipelineSerial-parallel

(c) Maximum operating frequency

Figure 19 Resource utilization and maximum operating frequency for different emulation architectures of Groverrsquos search (based on 24-bitfixed point format)

6 Conclusion and Future Work

Efficient resource utilization is critical for FPGA-basedimplementations especially for emulating quantum comput-ing applications as they typically exhibit exponential resourcerequirement with increasing number of qubits In this paperwe proposed a baseline FPGA emulation framework withfocus on the datapath design optimization based on theconventional state vector model as well as an effectivemethodology that facilitates accurate modelling of quantumalgorithms for FPGAemulationA serial-parallel architecture

with efficient resource utilization for FPGA-based emulationof quantum computing is presentedThe proposed emulationarchitecture achieves linear reduction in resource utilizationcompared to pipeline implementations as found in previousworks This work has also demonstrated the advantage ofFPGA emulation over software simulation where hardwareemulation of 7-qubit Groverrsquos search is about 3 times 104 timesfaster than the software simulation performed on Intel Corei7-4790 eight-core processor running at 36GHz clock rate

However experimental results obtained in this workshow that it is difficult to realize a scalable and flexible

International Journal of Reconfigurable Computing 17

2 25 3 35 4 45 5 55 6 65 705

1

15

2

25

3

Number of qubits

Runt

ime s

peed

-up

(sim

ulat

ion

emul

atio

n)

times104

Figure 20 Runtime speed-up against number of qubits in Groverrsquossearch emulation

emulation platform for large qubit size real-world quantumsystem using the approach that applies existing state vectorquantum models This concurs with [45] that states that thepractical limit on the size of the quantum system that canbe modelled on classical computing platform can hardly beovercome due to exponentially large memory requirementsfor storing the entire state vector Hence this suggests thata model with a more effective data structure to representquantum systems is required Recently the work on stabi-lizer frames [30] has shown promise in providing a moresuitable data structure for quantum models targeted forFPGA emulation This is the subject of future work in ourresearch in applying FPGA emulation in modelling of large-scale quantum systems In addition the error-correctionstructure available with stabilizer frames will be consideredfor application in practical quantum computations With amore efficient modelling technique FPGA can represent amore efficient emulation strategy of quantum systems

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the Ministry of Higher Education(MOHE) and Universiti Teknologi Malaysia (UTM) underFundamental Research Grant Scheme (FRGS) Vote number4F422

References

[1] P W Shor ldquoAlgorithms for quantum computation discretelogarithms and factoringrdquo in Proceedings of the 35th IEEEAnnual Symposium on Foundations of Computer Science(SFCS rsquo94) pp 124ndash134 Santa Fe NM USA November 1994

[2] L K Grover ldquoQuantum mechanics helps in searching for aneedle in a haystackrdquo Physical Review Letters vol 79 no 2 pp325ndash328 1997

[3] A Malossini and T Calarco ldquoQuantum genetic optimizationrdquoIEEE Transactions on Evolutionary Computation vol 12 no 2pp 231ndash241 2008

[4] R L Rivest A Shamir and L Adleman ldquoA method forobtaining digital signatures and public-key cryptosystemsrdquoCommunications of the ACM vol 21 no 2 pp 120ndash126 1978

[5] L K Grover ldquoA fast quantum mechanical algorithm fordatabase searchrdquo in Proceedings of the 28th Annual ACMSymposium onTheory of Computing pp 212ndash219 ACM 1996

[6] N Shenvi J Kempe andK BWhaley ldquoQuantum random-walksearch algorithmrdquo Physical Review A Atomic Molecular andOptical Physics vol 67 no 5 Article ID 052307 2003

[7] E Farhi J Goldstone and S Gutmann ldquoA quantum algorithmfor the Hamiltonian NAND treerdquo Theory of Computing vol 4pp 169ndash190 2008

[8] P W Shor ldquoWhy havenrsquot more quantum algorithms beenfoundrdquo Journal of the ACM vol 50 no 1 pp 87ndash90 2003

[9] D R Simon ldquoOn the power of quantum computationrdquo SIAMJournal on Computing vol 26 no 5 pp 1474ndash1483 1997

[10] S Hallgren ldquoPolynomial-time quantum algorithms for Pellrsquosequation and the principal ideal problemrdquo Journal of the ACMvol 54 article 4 2007

[11] M Mosca and A Ekert ldquoThe hidden subgroup problem andeigenvalue estimation on a quantum computerrdquo in Quan-tum Computing and Quantum Communications pp 174ndash188Springer Berlin Germany 1999

[12] D Bacon A M Childs and W Van Dam ldquoFrom optimalmeasurement to efficient quantum algorithms for the hiddensubgroup problem over semidirect product groupsrdquo in Pro-ceedings of the 46th Annual IEEE Symposium on Foundationsof Computer Science (FOCS rsquo05) pp 469ndash478 IEEE October2005

[13] L K Grover and A M Sengupta ldquoFrom coupled pendulums toquantum searchrdquo inMathematics of QuantumComputation pp119ndash134 Chapman and HallCRC 2002

[14] R P Feynman ldquoSimulating physics with computersrdquo Interna-tional Journal ofTheoretical Physics vol 21 no 6-7 pp 467ndash4881982

[15] N S Yanofsky and M A Mannucci Quantum Computingfor Computer Scientists vol 20 Cambridge University PressCambridge UK 2008

[16] C Monroe D M Meekhof B E King W M Itano and DJ Wineland ldquoDemonstration of a fundamental quantum logicgaterdquo Physical Review Letters vol 75 no 25 pp 4714ndash4717 1995

[17] N A Gershenfeld and I L Chuang ldquoBulk spin-resonancequantum computationrdquo Science vol 275 no 5298 pp 350ndash3561997

[18] J E Mooij T P Orlando L Levitov L Tian C H Van derWaland S Lloyd ldquoJosephson persistent-current qubitrdquo Science vol285 no 5430 pp 1036ndash1039 1999

[19] I L Chuang N Gershenfeld and M Kubinec ldquoExperimentalimplementation of fast quantum searchingrdquo Physical ReviewLetters vol 80 no 15 article 3408 1998

[20] R Barends J Kelly A Megrant et al ldquoSuperconducting quan-tum circuits at the surface code threshold for fault tolerancerdquoNature vol 508 no 7497 pp 500ndash503 2014

[21] M H Amin N G Dickson and P Smith ldquoAdiabatic quantumoptimization with quditsrdquo Quantum Information Processingvol 12 no 4 pp 1819ndash1829 2013

18 International Journal of Reconfigurable Computing

[22] A D King E Hoskinson T Lanting E Andriyash and M HAmin ldquoDegeneracy degree and heavy tails in quantumanneal-ingrdquo httparxivorgabs151207325

[23] T F Roslashnnow Z Wang J Job et al ldquoDefining and detectingquantum speeduprdquo Science vol 345 no 6195 pp 420ndash4242014

[24] Y Goto and M Fujishima ldquoEfficient quantum computingemulation system with unitary macro-operationsrdquo JapaneseJournal of Applied Physics vol 46 no 4 pp 2278ndash2282 2007

[25] A U Khalid Z Zilic and K Radecka ldquoFPGA emulation ofquantum circuitsrdquo in Proceedings of the IEEE International Con-ference on Computer Design VLSI in Computers and Processors(ICCD rsquo04) pp 310ndash315 IEEE October 2004

[26] J F Rivera-Miranda A J Caicedo-Beltran J D Valencia-Payan J M Espinosa-Duran and J Velasco-Medina ldquoHard-ware emulation of quantum Fourier transformrdquo in Proceedingsof the IEEE 2nd Latin American Symposium on Circuits and Sys-tems (LASCAS rsquo11) pp 1ndash4 IEEE Bogata Colombia February2011

[27] M Aminian M Saeedi M S Zamani andM Sedighi ldquoFPGA-based circuit model emulation of quantum algorithmsrdquo inProceedings of the IEEE Computer Society Annual Symposiumon VLSI (ISVLSI rsquo08) pp 399ndash404 IEEE Montpellier FranceApril 2008

[28] H Weimer M Muller I Lesanovsky P Zoller and H PBuchler ldquoA Rydberg quantum simulatorrdquoNature Physics vol 6no 5 pp 382ndash388 2010

[29] G F Viamontes Efficient quantum circuit simulation [PhDthesis] The University of Michigan Ann Arbor Mich USA2007

[30] H J Garcıa and I L Markov ldquoSimulation of quantum circuitsvia stabilizer framesrdquo IEEE Transactions on Computers vol 64no 8 pp 2323ndash2336 2015

[31] C P Williams and S H Clearwater Explorations in QuantumComputing Springer Berlin Germany 1998

[32] A Barenco D Deutsch A Ekert and R Jozsa ldquoConditionalquantumdynamics and logic gatesrdquo Physical Review Letters vol74 no 20 pp 4083ndash4086 1995

[33] M A Nielsen and I L Chuang Quantum Computation andQuantum Information Cambridge University Press 2000

[34] W-W Zhang F Gao B Liu Q-Y Wen and H Chen ldquoAwatermark strategy for quantum images based on quantumFourier transformrdquo Quantum Information Processing vol 12no 2 pp 793ndash803 2013

[35] D Curtis and D A Meyer ldquoTowards quantum templatematchingrdquo in Proceedings of the SPIE 48th Annual Meeting inQuantum Communications and Quantum Imaging pp 134ndash141International Society for Optics and Photonics August 2003

[36] R Schutzhold and G Schaller ldquoAdiabatic quantum algorithmsas quantum phase transitions first versus second orderrdquo Physi-cal Review A vol 74 no 6 Article ID 060304 2006

[37] W P Baritompa D W Bulger and G R Wood ldquoGroverrsquosquantum algorithm applied to global optimizationrdquo SIAMJournal on Optimization vol 15 no 4 pp 1170ndash1184 2005

[38] C Durr and P Hoyer ldquoA quantum algorithm for finding theminimumrdquo httparxivorgabsquant-ph9607014

[39] A Montanaro ldquoQuantum pattern matching fast on averagerdquoAlgorithmica 2015

[40] Y H Lee M Khalil-Hani and M N Marsono ldquoFPGA-basedquantum circuit emulation a case study on Quantum Fouriertransformrdquo in Proceedings of the 14th International Symposium

on Integrated Circuits (ISIC rsquo14) pp 512ndash515 IEEE SingaporeDecember 2014

[41] M Khalil-Hani Y H Lee and M N Marsono ldquoAn accu-rate FPGA-based hardware emulation on quantum fouriertransformrdquo in Proceedings of the Australasian Symposium onParallel andDistributed Computing (AusPDC rsquo15) vol 1 SydneyAustralia 2015 a1b3

[42] M Frigo and S G Johnson ldquoThe design and implementationof FFTW3rdquo Proceedings of the IEEE vol 93 no 2 pp 216ndash2312005

[43] D Wecker and K M Svore ldquoLIQUi|gt a software design archi-tecture and domain-specific language for quantum computingrdquohttparxivorgabs14024467

[44] S Kilts Advanced FPGA Design Architecture Implementationand Optimization John Wiley amp Sons New York NY USA2007

[45] M Smelyanskiy N P D Sawaya and A Aspuru-GuzikldquoqHiPSTER the quantum high performance software testingenvironmentrdquo httparxivorgabs160107195

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 13: Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

International Journal of Reconfigurable Computing 13

14 15 16 17 18 19 20 21 22 23 240

005

01

015

02

025

03

035

04

045

05

Number of mantissa bits

Prec

ision

erro

r (

)

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

Figure 16 Precision error against different fixed point formats used

where simulate 119903 and simulate 119894 are the floating point realand imaginary amplitudes of the reference output state gen-erated from simulation whereas emulate 119903 and emulate 119894 arethe amplitudes extracted from the output state of proposedFPGA emulator (converted from original fixed point formatto floating point for verification purposes)

Figure 16 shows that 16-bit fixed point format (14mantissabits) incurs significant precision error for 5-qubit emulationsin both case studies However the error produced by 2-qubitemulations is insignificant This behaviour is because theamplification of the fixed point truncation errors grows withthe size of quantum system For FPGA emulation purposesprecision error due to fixed point representation can bereduced by increasing the number ofmantissa bits with trade-off on resource utilization By expanding the number ofmantissa bits up to 24-bit (which results in 26 total numberof bits) negligible precision error for 5-qubit emulations isattained It is important to note that the proposed FPGAemulator is parameterizable in terms of the number ofmantissa bits for fixed point representation This is crucial toensure different fixed point formats can be applied to emulatequantum circuit of various sizes based on the demandedprecision error tolerance and resource constraint

The size of our fixed point number format also affectsresource utilization The corresponding experimental resultsforQFT case study are shown in Figure 17 Since the resourcesavailable on FPGA device are mostly in blocks or multiples of8 bits the choice of 26-bit fixed point format is not suitableHence we apply 22-mantissa-bit size (ie total number of bitsis 24) in our fixed point representation formats Note thatfor Groverrsquos search emulations the experiment on resourceutilization of DSP blocks is not relevant because FPGAemulation model developed here (as depicted in Figure 12)does not involve multiplication

53 Efficiency of Proposed FPGA Emulation Architecture Inthis subsection we investigate how the increase in qubit size

impacts resource growth and maximum allowable operatingfrequency in different emulation architectures (ie concur-rent pipeline and serial-parallel) In the case of QFT emu-lation model we have two versions of serial-parallel archi-tectures Type 1 serial-parallel uses DSP blocks to performmultiplications whereas type 2 replaces the multiplications(required in Hadamard gate operations 1198801 1198804 and 1198806 inthe 3-qubit QFT case) with shift-add operations Althoughconventional hardware design methods encourage the usageof shift-add operation instead of multiplication to reduceresource utilization the case is now different with FPGAdevices containing efficient built-in DSP blocks Figures 18and 19 show the results of the experiments conducted onQFTand Groverrsquos search algorithms respectively

Based on the experimental results shown in Figure 18when comparing to type 1 type 2 method consumes lessDSP blocks but more logic elements due to the constructionof adders needed in shift-add operations As the number ofqubits increases resource utilization of logic elements fortype 1 emulation grows rapidly when available DSP blocks areused up and it ends up with similar resource utilization asthe type 2 approach Hence we can conclude that for large-scale FPGA emulations both methods would lead to similarresource utilization Thus the first approach is preferred dueto the ease of design process where DSP blocks are utilized bydefault when implementing the design with the FPGA designautomation tool

In contrast to the concurrent and pipeline designs theexperimental results for QFT and Groverrsquos search show thatthe proposed serial-parallel architecture achieves balance onboth resource utilization and operating frequency The pro-posed architecture has significantly reduced resource growthin the application of logic elements dedicated logic regis-ters and DSP block yet maintaining reasonable operatingfrequency With the concurrent and pipeline architectures5-qubit QFT emulation completely used up the resourcesavailable in the Altera Stratix IV FPGA device used in thiswork However the same device can support up to 7-qubitQFT emulationwith the proposed serial-parallel architectureIt is important to note that the processing power of 7-qubitQFT is far higher than the 5-qubit implementation as an 119899-qubit QFT can process up to 2119899 samples in one evaluation

For scalability software simulation would depend onresources available on the computer servers The scalabilityof the FPGA emulation framework would depend on theresources available on target FPGA devices As there havebeen rapid advances in FPGA technology in recent years bydesigning an efficient architecture that is implemented in ahigh-density FPGA device (such as the Altera Stratix 10 thatcontains up to 55 million logic elements) one can actuallyemulate large qubit size quantum circuit on a single FPGAchip Furthermore new approach to FPGA emulation maybe made through the exploration of efficient data structuresandmodellingmethodsThus the proposedwork contributesto the formulation of a proof-of-concept baseline FPGAemulation framework with optimization on datapath designsthat can be extended to emulate practical large-scale quantumcircuits

14 International Journal of Reconfigurable Computing

14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Number of mantissa bits

Com

bina

tiona

l ALU

Ts

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

times104

(a) Logic element usage

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

14 15 16 17 18 19 20 21 22 23 240

500

1000

1500

2000

2500

3000

3500

Number of mantissa bits

Ded

icat

ed lo

gic r

egist

ers

(b) Register usage

2-qubit QFT5-qubit QFT

14 15 16 17 18 19 20 21 22 23 240

200

400

600

800

1000

1200

Number of mantissa bits

DSP

blo

ck 1

8-bi

t ele

men

t

(c) DSP block usage

Figure 17 Resource utilization for different fixed point formats used

54 Benchmarking between Simulation and Emulation Hereour emulation models are benchmarked against the equiv-alent software simulations The simulation models used arebased on an open source quantum library 119897119894119887119902119906119886119899119905119906119898 [28]Software simulation is performed on an Intel Core i7-4790eight-core processor with 36GHz clock rate running on aLinux-based Ubuntu 1404 kernel whereas hardware emu-lation is based on the Altera Stratix IV EP4SGX530KF43C4FPGA Table 1 shows the runtime comparison betweensimulation and our emulation

Figure 20 illustrates the runtime speed-up (simula-tionemulation) of Groverrsquos search case study It is importantto note that the proposed hardware emulation is implementedbased on 24-bit fixed point format (which is determined

Table 1 Comparison of runtime between simulation and emulation

QFT runtime (s) Groverrsquos search runtime (s)Simulation Emulation Simulation Emulation

2-qubit 155 times 10minus6 358 times 10minus9 296 times 10minus6 46 times 10minus9

3-qubit 466 times 10minus6 805 times 10minus9 740 times 10minus6 120 times 10minus9

4-qubit 510 times 10minus6 1344 times 10minus9 2207 times 10minus6 214 times 10minus9

5-qubit 565 times 10minus6 2193 times 10minus9 3988 times 10minus6 367 times 10minus9

6-qubit mdash mdash 10696 times 10minus6 627 times 10minus9

7-qubit mdash mdash 27714 times 10minus6 968 times 10minus9

based on the experimental work presented in Section 52)whereas 32-bit single precision float is used in the software

International Journal of Reconfigurable Computing 15

2 25 3 35 4 45 50

2

4

6

8

10

12

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

times104

(a) Logic element usage

2 25 3 35 4 45 50

05

1

15

2

25

3

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

times104

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(b) Register usage

2 25 3 35 4 45 50

200

400

600

800

1000

1200

Number of qubits

DSP

blo

ck 1

8-bi

t ele

men

t

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(c) DSP block usage

2 25 3 35 4 45 50

50

100

150

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(d) Maximum operating frequency

Figure 18 Resource utilization and maximum operating frequency for different emulation architectures of QFT (based on 24-bit fixed pointformat)

simulation to describe the quantum state For a larger qubitsize quantum circuit the number of mantissa bits for fixedpoint representation has to be increased accordingly to ensuresimilar precision as in software simulation can be achievedwith trade-offs on resource utilization and execution speed[44]

From Figure 20 it can be observed that the proposedhardware emulation provides significant speed-up over soft-ware simulation using 119897119894119887119902119906119886119899119905119906119898 It is important to notethat the achieved speed-up increases drastically as the num-ber of qubits increasesThis result further supports the notionthat hardware emulation has significant potential in themodelling of a large-scale quantum system on the classicalcomputing platform based on FPGA

As the number of required IO pins for emulating QFTand Groverrsquos search algorithms with parallel read-outs istoo much to fit in the existing FPGA devices board-levelverification is infeasible Although the usage of multiplexersreduces the number of output pins resource consumptionrises significantly with the increase in the number of qubitsand this affects the analysis of the overall experiment Thusthe estimated runtime of the proposed FPGA emulationarchitecture is obtained based on the hardware clock cycleand the operating frequency that is acquired from the FPGAdevelopment toolThis is not a critical issue in future practicaldeployment as the selected case studies are meant to work ascore modules within other quantum applications that mightnot require parallel read-out

16 International Journal of Reconfigurable Computing

2 25 3 35 4 45 5 55 6 65 70

1

2

3

4

5

6

7

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipelineSerial-parallel

times104

(a) Logic element usage

2 25 3 35 4 45 5 55 6 65 70

05

1

15

2

25

3

35

4

5

45

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

ConcurrentPipelineSerial-parallel

times104

(b) Register usage

2 25 3 35 4 45 5 55 6 65 70

50

100

150

200

250

300

350

400

450

500

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipelineSerial-parallel

(c) Maximum operating frequency

Figure 19 Resource utilization and maximum operating frequency for different emulation architectures of Groverrsquos search (based on 24-bitfixed point format)

6 Conclusion and Future Work

Efficient resource utilization is critical for FPGA-basedimplementations especially for emulating quantum comput-ing applications as they typically exhibit exponential resourcerequirement with increasing number of qubits In this paperwe proposed a baseline FPGA emulation framework withfocus on the datapath design optimization based on theconventional state vector model as well as an effectivemethodology that facilitates accurate modelling of quantumalgorithms for FPGAemulationA serial-parallel architecture

with efficient resource utilization for FPGA-based emulationof quantum computing is presentedThe proposed emulationarchitecture achieves linear reduction in resource utilizationcompared to pipeline implementations as found in previousworks This work has also demonstrated the advantage ofFPGA emulation over software simulation where hardwareemulation of 7-qubit Groverrsquos search is about 3 times 104 timesfaster than the software simulation performed on Intel Corei7-4790 eight-core processor running at 36GHz clock rate

However experimental results obtained in this workshow that it is difficult to realize a scalable and flexible

International Journal of Reconfigurable Computing 17

2 25 3 35 4 45 5 55 6 65 705

1

15

2

25

3

Number of qubits

Runt

ime s

peed

-up

(sim

ulat

ion

emul

atio

n)

times104

Figure 20 Runtime speed-up against number of qubits in Groverrsquossearch emulation

emulation platform for large qubit size real-world quantumsystem using the approach that applies existing state vectorquantum models This concurs with [45] that states that thepractical limit on the size of the quantum system that canbe modelled on classical computing platform can hardly beovercome due to exponentially large memory requirementsfor storing the entire state vector Hence this suggests thata model with a more effective data structure to representquantum systems is required Recently the work on stabi-lizer frames [30] has shown promise in providing a moresuitable data structure for quantum models targeted forFPGA emulation This is the subject of future work in ourresearch in applying FPGA emulation in modelling of large-scale quantum systems In addition the error-correctionstructure available with stabilizer frames will be consideredfor application in practical quantum computations With amore efficient modelling technique FPGA can represent amore efficient emulation strategy of quantum systems

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the Ministry of Higher Education(MOHE) and Universiti Teknologi Malaysia (UTM) underFundamental Research Grant Scheme (FRGS) Vote number4F422

References

[1] P W Shor ldquoAlgorithms for quantum computation discretelogarithms and factoringrdquo in Proceedings of the 35th IEEEAnnual Symposium on Foundations of Computer Science(SFCS rsquo94) pp 124ndash134 Santa Fe NM USA November 1994

[2] L K Grover ldquoQuantum mechanics helps in searching for aneedle in a haystackrdquo Physical Review Letters vol 79 no 2 pp325ndash328 1997

[3] A Malossini and T Calarco ldquoQuantum genetic optimizationrdquoIEEE Transactions on Evolutionary Computation vol 12 no 2pp 231ndash241 2008

[4] R L Rivest A Shamir and L Adleman ldquoA method forobtaining digital signatures and public-key cryptosystemsrdquoCommunications of the ACM vol 21 no 2 pp 120ndash126 1978

[5] L K Grover ldquoA fast quantum mechanical algorithm fordatabase searchrdquo in Proceedings of the 28th Annual ACMSymposium onTheory of Computing pp 212ndash219 ACM 1996

[6] N Shenvi J Kempe andK BWhaley ldquoQuantum random-walksearch algorithmrdquo Physical Review A Atomic Molecular andOptical Physics vol 67 no 5 Article ID 052307 2003

[7] E Farhi J Goldstone and S Gutmann ldquoA quantum algorithmfor the Hamiltonian NAND treerdquo Theory of Computing vol 4pp 169ndash190 2008

[8] P W Shor ldquoWhy havenrsquot more quantum algorithms beenfoundrdquo Journal of the ACM vol 50 no 1 pp 87ndash90 2003

[9] D R Simon ldquoOn the power of quantum computationrdquo SIAMJournal on Computing vol 26 no 5 pp 1474ndash1483 1997

[10] S Hallgren ldquoPolynomial-time quantum algorithms for Pellrsquosequation and the principal ideal problemrdquo Journal of the ACMvol 54 article 4 2007

[11] M Mosca and A Ekert ldquoThe hidden subgroup problem andeigenvalue estimation on a quantum computerrdquo in Quan-tum Computing and Quantum Communications pp 174ndash188Springer Berlin Germany 1999

[12] D Bacon A M Childs and W Van Dam ldquoFrom optimalmeasurement to efficient quantum algorithms for the hiddensubgroup problem over semidirect product groupsrdquo in Pro-ceedings of the 46th Annual IEEE Symposium on Foundationsof Computer Science (FOCS rsquo05) pp 469ndash478 IEEE October2005

[13] L K Grover and A M Sengupta ldquoFrom coupled pendulums toquantum searchrdquo inMathematics of QuantumComputation pp119ndash134 Chapman and HallCRC 2002

[14] R P Feynman ldquoSimulating physics with computersrdquo Interna-tional Journal ofTheoretical Physics vol 21 no 6-7 pp 467ndash4881982

[15] N S Yanofsky and M A Mannucci Quantum Computingfor Computer Scientists vol 20 Cambridge University PressCambridge UK 2008

[16] C Monroe D M Meekhof B E King W M Itano and DJ Wineland ldquoDemonstration of a fundamental quantum logicgaterdquo Physical Review Letters vol 75 no 25 pp 4714ndash4717 1995

[17] N A Gershenfeld and I L Chuang ldquoBulk spin-resonancequantum computationrdquo Science vol 275 no 5298 pp 350ndash3561997

[18] J E Mooij T P Orlando L Levitov L Tian C H Van derWaland S Lloyd ldquoJosephson persistent-current qubitrdquo Science vol285 no 5430 pp 1036ndash1039 1999

[19] I L Chuang N Gershenfeld and M Kubinec ldquoExperimentalimplementation of fast quantum searchingrdquo Physical ReviewLetters vol 80 no 15 article 3408 1998

[20] R Barends J Kelly A Megrant et al ldquoSuperconducting quan-tum circuits at the surface code threshold for fault tolerancerdquoNature vol 508 no 7497 pp 500ndash503 2014

[21] M H Amin N G Dickson and P Smith ldquoAdiabatic quantumoptimization with quditsrdquo Quantum Information Processingvol 12 no 4 pp 1819ndash1829 2013

18 International Journal of Reconfigurable Computing

[22] A D King E Hoskinson T Lanting E Andriyash and M HAmin ldquoDegeneracy degree and heavy tails in quantumanneal-ingrdquo httparxivorgabs151207325

[23] T F Roslashnnow Z Wang J Job et al ldquoDefining and detectingquantum speeduprdquo Science vol 345 no 6195 pp 420ndash4242014

[24] Y Goto and M Fujishima ldquoEfficient quantum computingemulation system with unitary macro-operationsrdquo JapaneseJournal of Applied Physics vol 46 no 4 pp 2278ndash2282 2007

[25] A U Khalid Z Zilic and K Radecka ldquoFPGA emulation ofquantum circuitsrdquo in Proceedings of the IEEE International Con-ference on Computer Design VLSI in Computers and Processors(ICCD rsquo04) pp 310ndash315 IEEE October 2004

[26] J F Rivera-Miranda A J Caicedo-Beltran J D Valencia-Payan J M Espinosa-Duran and J Velasco-Medina ldquoHard-ware emulation of quantum Fourier transformrdquo in Proceedingsof the IEEE 2nd Latin American Symposium on Circuits and Sys-tems (LASCAS rsquo11) pp 1ndash4 IEEE Bogata Colombia February2011

[27] M Aminian M Saeedi M S Zamani andM Sedighi ldquoFPGA-based circuit model emulation of quantum algorithmsrdquo inProceedings of the IEEE Computer Society Annual Symposiumon VLSI (ISVLSI rsquo08) pp 399ndash404 IEEE Montpellier FranceApril 2008

[28] H Weimer M Muller I Lesanovsky P Zoller and H PBuchler ldquoA Rydberg quantum simulatorrdquoNature Physics vol 6no 5 pp 382ndash388 2010

[29] G F Viamontes Efficient quantum circuit simulation [PhDthesis] The University of Michigan Ann Arbor Mich USA2007

[30] H J Garcıa and I L Markov ldquoSimulation of quantum circuitsvia stabilizer framesrdquo IEEE Transactions on Computers vol 64no 8 pp 2323ndash2336 2015

[31] C P Williams and S H Clearwater Explorations in QuantumComputing Springer Berlin Germany 1998

[32] A Barenco D Deutsch A Ekert and R Jozsa ldquoConditionalquantumdynamics and logic gatesrdquo Physical Review Letters vol74 no 20 pp 4083ndash4086 1995

[33] M A Nielsen and I L Chuang Quantum Computation andQuantum Information Cambridge University Press 2000

[34] W-W Zhang F Gao B Liu Q-Y Wen and H Chen ldquoAwatermark strategy for quantum images based on quantumFourier transformrdquo Quantum Information Processing vol 12no 2 pp 793ndash803 2013

[35] D Curtis and D A Meyer ldquoTowards quantum templatematchingrdquo in Proceedings of the SPIE 48th Annual Meeting inQuantum Communications and Quantum Imaging pp 134ndash141International Society for Optics and Photonics August 2003

[36] R Schutzhold and G Schaller ldquoAdiabatic quantum algorithmsas quantum phase transitions first versus second orderrdquo Physi-cal Review A vol 74 no 6 Article ID 060304 2006

[37] W P Baritompa D W Bulger and G R Wood ldquoGroverrsquosquantum algorithm applied to global optimizationrdquo SIAMJournal on Optimization vol 15 no 4 pp 1170ndash1184 2005

[38] C Durr and P Hoyer ldquoA quantum algorithm for finding theminimumrdquo httparxivorgabsquant-ph9607014

[39] A Montanaro ldquoQuantum pattern matching fast on averagerdquoAlgorithmica 2015

[40] Y H Lee M Khalil-Hani and M N Marsono ldquoFPGA-basedquantum circuit emulation a case study on Quantum Fouriertransformrdquo in Proceedings of the 14th International Symposium

on Integrated Circuits (ISIC rsquo14) pp 512ndash515 IEEE SingaporeDecember 2014

[41] M Khalil-Hani Y H Lee and M N Marsono ldquoAn accu-rate FPGA-based hardware emulation on quantum fouriertransformrdquo in Proceedings of the Australasian Symposium onParallel andDistributed Computing (AusPDC rsquo15) vol 1 SydneyAustralia 2015 a1b3

[42] M Frigo and S G Johnson ldquoThe design and implementationof FFTW3rdquo Proceedings of the IEEE vol 93 no 2 pp 216ndash2312005

[43] D Wecker and K M Svore ldquoLIQUi|gt a software design archi-tecture and domain-specific language for quantum computingrdquohttparxivorgabs14024467

[44] S Kilts Advanced FPGA Design Architecture Implementationand Optimization John Wiley amp Sons New York NY USA2007

[45] M Smelyanskiy N P D Sawaya and A Aspuru-GuzikldquoqHiPSTER the quantum high performance software testingenvironmentrdquo httparxivorgabs160107195

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 14: Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

14 International Journal of Reconfigurable Computing

14 15 16 17 18 19 20 21 22 23 240

2

4

6

8

10

12

14

Number of mantissa bits

Com

bina

tiona

l ALU

Ts

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

times104

(a) Logic element usage

2-qubit QFT5-qubit QFT

3-qubit Grover search5-qubit Grover search

14 15 16 17 18 19 20 21 22 23 240

500

1000

1500

2000

2500

3000

3500

Number of mantissa bits

Ded

icat

ed lo

gic r

egist

ers

(b) Register usage

2-qubit QFT5-qubit QFT

14 15 16 17 18 19 20 21 22 23 240

200

400

600

800

1000

1200

Number of mantissa bits

DSP

blo

ck 1

8-bi

t ele

men

t

(c) DSP block usage

Figure 17 Resource utilization for different fixed point formats used

54 Benchmarking between Simulation and Emulation Hereour emulation models are benchmarked against the equiv-alent software simulations The simulation models used arebased on an open source quantum library 119897119894119887119902119906119886119899119905119906119898 [28]Software simulation is performed on an Intel Core i7-4790eight-core processor with 36GHz clock rate running on aLinux-based Ubuntu 1404 kernel whereas hardware emu-lation is based on the Altera Stratix IV EP4SGX530KF43C4FPGA Table 1 shows the runtime comparison betweensimulation and our emulation

Figure 20 illustrates the runtime speed-up (simula-tionemulation) of Groverrsquos search case study It is importantto note that the proposed hardware emulation is implementedbased on 24-bit fixed point format (which is determined

Table 1 Comparison of runtime between simulation and emulation

QFT runtime (s) Groverrsquos search runtime (s)Simulation Emulation Simulation Emulation

2-qubit 155 times 10minus6 358 times 10minus9 296 times 10minus6 46 times 10minus9

3-qubit 466 times 10minus6 805 times 10minus9 740 times 10minus6 120 times 10minus9

4-qubit 510 times 10minus6 1344 times 10minus9 2207 times 10minus6 214 times 10minus9

5-qubit 565 times 10minus6 2193 times 10minus9 3988 times 10minus6 367 times 10minus9

6-qubit mdash mdash 10696 times 10minus6 627 times 10minus9

7-qubit mdash mdash 27714 times 10minus6 968 times 10minus9

based on the experimental work presented in Section 52)whereas 32-bit single precision float is used in the software

International Journal of Reconfigurable Computing 15

2 25 3 35 4 45 50

2

4

6

8

10

12

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

times104

(a) Logic element usage

2 25 3 35 4 45 50

05

1

15

2

25

3

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

times104

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(b) Register usage

2 25 3 35 4 45 50

200

400

600

800

1000

1200

Number of qubits

DSP

blo

ck 1

8-bi

t ele

men

t

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(c) DSP block usage

2 25 3 35 4 45 50

50

100

150

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(d) Maximum operating frequency

Figure 18 Resource utilization and maximum operating frequency for different emulation architectures of QFT (based on 24-bit fixed pointformat)

simulation to describe the quantum state For a larger qubitsize quantum circuit the number of mantissa bits for fixedpoint representation has to be increased accordingly to ensuresimilar precision as in software simulation can be achievedwith trade-offs on resource utilization and execution speed[44]

From Figure 20 it can be observed that the proposedhardware emulation provides significant speed-up over soft-ware simulation using 119897119894119887119902119906119886119899119905119906119898 It is important to notethat the achieved speed-up increases drastically as the num-ber of qubits increasesThis result further supports the notionthat hardware emulation has significant potential in themodelling of a large-scale quantum system on the classicalcomputing platform based on FPGA

As the number of required IO pins for emulating QFTand Groverrsquos search algorithms with parallel read-outs istoo much to fit in the existing FPGA devices board-levelverification is infeasible Although the usage of multiplexersreduces the number of output pins resource consumptionrises significantly with the increase in the number of qubitsand this affects the analysis of the overall experiment Thusthe estimated runtime of the proposed FPGA emulationarchitecture is obtained based on the hardware clock cycleand the operating frequency that is acquired from the FPGAdevelopment toolThis is not a critical issue in future practicaldeployment as the selected case studies are meant to work ascore modules within other quantum applications that mightnot require parallel read-out

16 International Journal of Reconfigurable Computing

2 25 3 35 4 45 5 55 6 65 70

1

2

3

4

5

6

7

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipelineSerial-parallel

times104

(a) Logic element usage

2 25 3 35 4 45 5 55 6 65 70

05

1

15

2

25

3

35

4

5

45

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

ConcurrentPipelineSerial-parallel

times104

(b) Register usage

2 25 3 35 4 45 5 55 6 65 70

50

100

150

200

250

300

350

400

450

500

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipelineSerial-parallel

(c) Maximum operating frequency

Figure 19 Resource utilization and maximum operating frequency for different emulation architectures of Groverrsquos search (based on 24-bitfixed point format)

6 Conclusion and Future Work

Efficient resource utilization is critical for FPGA-basedimplementations especially for emulating quantum comput-ing applications as they typically exhibit exponential resourcerequirement with increasing number of qubits In this paperwe proposed a baseline FPGA emulation framework withfocus on the datapath design optimization based on theconventional state vector model as well as an effectivemethodology that facilitates accurate modelling of quantumalgorithms for FPGAemulationA serial-parallel architecture

with efficient resource utilization for FPGA-based emulationof quantum computing is presentedThe proposed emulationarchitecture achieves linear reduction in resource utilizationcompared to pipeline implementations as found in previousworks This work has also demonstrated the advantage ofFPGA emulation over software simulation where hardwareemulation of 7-qubit Groverrsquos search is about 3 times 104 timesfaster than the software simulation performed on Intel Corei7-4790 eight-core processor running at 36GHz clock rate

However experimental results obtained in this workshow that it is difficult to realize a scalable and flexible

International Journal of Reconfigurable Computing 17

2 25 3 35 4 45 5 55 6 65 705

1

15

2

25

3

Number of qubits

Runt

ime s

peed

-up

(sim

ulat

ion

emul

atio

n)

times104

Figure 20 Runtime speed-up against number of qubits in Groverrsquossearch emulation

emulation platform for large qubit size real-world quantumsystem using the approach that applies existing state vectorquantum models This concurs with [45] that states that thepractical limit on the size of the quantum system that canbe modelled on classical computing platform can hardly beovercome due to exponentially large memory requirementsfor storing the entire state vector Hence this suggests thata model with a more effective data structure to representquantum systems is required Recently the work on stabi-lizer frames [30] has shown promise in providing a moresuitable data structure for quantum models targeted forFPGA emulation This is the subject of future work in ourresearch in applying FPGA emulation in modelling of large-scale quantum systems In addition the error-correctionstructure available with stabilizer frames will be consideredfor application in practical quantum computations With amore efficient modelling technique FPGA can represent amore efficient emulation strategy of quantum systems

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the Ministry of Higher Education(MOHE) and Universiti Teknologi Malaysia (UTM) underFundamental Research Grant Scheme (FRGS) Vote number4F422

References

[1] P W Shor ldquoAlgorithms for quantum computation discretelogarithms and factoringrdquo in Proceedings of the 35th IEEEAnnual Symposium on Foundations of Computer Science(SFCS rsquo94) pp 124ndash134 Santa Fe NM USA November 1994

[2] L K Grover ldquoQuantum mechanics helps in searching for aneedle in a haystackrdquo Physical Review Letters vol 79 no 2 pp325ndash328 1997

[3] A Malossini and T Calarco ldquoQuantum genetic optimizationrdquoIEEE Transactions on Evolutionary Computation vol 12 no 2pp 231ndash241 2008

[4] R L Rivest A Shamir and L Adleman ldquoA method forobtaining digital signatures and public-key cryptosystemsrdquoCommunications of the ACM vol 21 no 2 pp 120ndash126 1978

[5] L K Grover ldquoA fast quantum mechanical algorithm fordatabase searchrdquo in Proceedings of the 28th Annual ACMSymposium onTheory of Computing pp 212ndash219 ACM 1996

[6] N Shenvi J Kempe andK BWhaley ldquoQuantum random-walksearch algorithmrdquo Physical Review A Atomic Molecular andOptical Physics vol 67 no 5 Article ID 052307 2003

[7] E Farhi J Goldstone and S Gutmann ldquoA quantum algorithmfor the Hamiltonian NAND treerdquo Theory of Computing vol 4pp 169ndash190 2008

[8] P W Shor ldquoWhy havenrsquot more quantum algorithms beenfoundrdquo Journal of the ACM vol 50 no 1 pp 87ndash90 2003

[9] D R Simon ldquoOn the power of quantum computationrdquo SIAMJournal on Computing vol 26 no 5 pp 1474ndash1483 1997

[10] S Hallgren ldquoPolynomial-time quantum algorithms for Pellrsquosequation and the principal ideal problemrdquo Journal of the ACMvol 54 article 4 2007

[11] M Mosca and A Ekert ldquoThe hidden subgroup problem andeigenvalue estimation on a quantum computerrdquo in Quan-tum Computing and Quantum Communications pp 174ndash188Springer Berlin Germany 1999

[12] D Bacon A M Childs and W Van Dam ldquoFrom optimalmeasurement to efficient quantum algorithms for the hiddensubgroup problem over semidirect product groupsrdquo in Pro-ceedings of the 46th Annual IEEE Symposium on Foundationsof Computer Science (FOCS rsquo05) pp 469ndash478 IEEE October2005

[13] L K Grover and A M Sengupta ldquoFrom coupled pendulums toquantum searchrdquo inMathematics of QuantumComputation pp119ndash134 Chapman and HallCRC 2002

[14] R P Feynman ldquoSimulating physics with computersrdquo Interna-tional Journal ofTheoretical Physics vol 21 no 6-7 pp 467ndash4881982

[15] N S Yanofsky and M A Mannucci Quantum Computingfor Computer Scientists vol 20 Cambridge University PressCambridge UK 2008

[16] C Monroe D M Meekhof B E King W M Itano and DJ Wineland ldquoDemonstration of a fundamental quantum logicgaterdquo Physical Review Letters vol 75 no 25 pp 4714ndash4717 1995

[17] N A Gershenfeld and I L Chuang ldquoBulk spin-resonancequantum computationrdquo Science vol 275 no 5298 pp 350ndash3561997

[18] J E Mooij T P Orlando L Levitov L Tian C H Van derWaland S Lloyd ldquoJosephson persistent-current qubitrdquo Science vol285 no 5430 pp 1036ndash1039 1999

[19] I L Chuang N Gershenfeld and M Kubinec ldquoExperimentalimplementation of fast quantum searchingrdquo Physical ReviewLetters vol 80 no 15 article 3408 1998

[20] R Barends J Kelly A Megrant et al ldquoSuperconducting quan-tum circuits at the surface code threshold for fault tolerancerdquoNature vol 508 no 7497 pp 500ndash503 2014

[21] M H Amin N G Dickson and P Smith ldquoAdiabatic quantumoptimization with quditsrdquo Quantum Information Processingvol 12 no 4 pp 1819ndash1829 2013

18 International Journal of Reconfigurable Computing

[22] A D King E Hoskinson T Lanting E Andriyash and M HAmin ldquoDegeneracy degree and heavy tails in quantumanneal-ingrdquo httparxivorgabs151207325

[23] T F Roslashnnow Z Wang J Job et al ldquoDefining and detectingquantum speeduprdquo Science vol 345 no 6195 pp 420ndash4242014

[24] Y Goto and M Fujishima ldquoEfficient quantum computingemulation system with unitary macro-operationsrdquo JapaneseJournal of Applied Physics vol 46 no 4 pp 2278ndash2282 2007

[25] A U Khalid Z Zilic and K Radecka ldquoFPGA emulation ofquantum circuitsrdquo in Proceedings of the IEEE International Con-ference on Computer Design VLSI in Computers and Processors(ICCD rsquo04) pp 310ndash315 IEEE October 2004

[26] J F Rivera-Miranda A J Caicedo-Beltran J D Valencia-Payan J M Espinosa-Duran and J Velasco-Medina ldquoHard-ware emulation of quantum Fourier transformrdquo in Proceedingsof the IEEE 2nd Latin American Symposium on Circuits and Sys-tems (LASCAS rsquo11) pp 1ndash4 IEEE Bogata Colombia February2011

[27] M Aminian M Saeedi M S Zamani andM Sedighi ldquoFPGA-based circuit model emulation of quantum algorithmsrdquo inProceedings of the IEEE Computer Society Annual Symposiumon VLSI (ISVLSI rsquo08) pp 399ndash404 IEEE Montpellier FranceApril 2008

[28] H Weimer M Muller I Lesanovsky P Zoller and H PBuchler ldquoA Rydberg quantum simulatorrdquoNature Physics vol 6no 5 pp 382ndash388 2010

[29] G F Viamontes Efficient quantum circuit simulation [PhDthesis] The University of Michigan Ann Arbor Mich USA2007

[30] H J Garcıa and I L Markov ldquoSimulation of quantum circuitsvia stabilizer framesrdquo IEEE Transactions on Computers vol 64no 8 pp 2323ndash2336 2015

[31] C P Williams and S H Clearwater Explorations in QuantumComputing Springer Berlin Germany 1998

[32] A Barenco D Deutsch A Ekert and R Jozsa ldquoConditionalquantumdynamics and logic gatesrdquo Physical Review Letters vol74 no 20 pp 4083ndash4086 1995

[33] M A Nielsen and I L Chuang Quantum Computation andQuantum Information Cambridge University Press 2000

[34] W-W Zhang F Gao B Liu Q-Y Wen and H Chen ldquoAwatermark strategy for quantum images based on quantumFourier transformrdquo Quantum Information Processing vol 12no 2 pp 793ndash803 2013

[35] D Curtis and D A Meyer ldquoTowards quantum templatematchingrdquo in Proceedings of the SPIE 48th Annual Meeting inQuantum Communications and Quantum Imaging pp 134ndash141International Society for Optics and Photonics August 2003

[36] R Schutzhold and G Schaller ldquoAdiabatic quantum algorithmsas quantum phase transitions first versus second orderrdquo Physi-cal Review A vol 74 no 6 Article ID 060304 2006

[37] W P Baritompa D W Bulger and G R Wood ldquoGroverrsquosquantum algorithm applied to global optimizationrdquo SIAMJournal on Optimization vol 15 no 4 pp 1170ndash1184 2005

[38] C Durr and P Hoyer ldquoA quantum algorithm for finding theminimumrdquo httparxivorgabsquant-ph9607014

[39] A Montanaro ldquoQuantum pattern matching fast on averagerdquoAlgorithmica 2015

[40] Y H Lee M Khalil-Hani and M N Marsono ldquoFPGA-basedquantum circuit emulation a case study on Quantum Fouriertransformrdquo in Proceedings of the 14th International Symposium

on Integrated Circuits (ISIC rsquo14) pp 512ndash515 IEEE SingaporeDecember 2014

[41] M Khalil-Hani Y H Lee and M N Marsono ldquoAn accu-rate FPGA-based hardware emulation on quantum fouriertransformrdquo in Proceedings of the Australasian Symposium onParallel andDistributed Computing (AusPDC rsquo15) vol 1 SydneyAustralia 2015 a1b3

[42] M Frigo and S G Johnson ldquoThe design and implementationof FFTW3rdquo Proceedings of the IEEE vol 93 no 2 pp 216ndash2312005

[43] D Wecker and K M Svore ldquoLIQUi|gt a software design archi-tecture and domain-specific language for quantum computingrdquohttparxivorgabs14024467

[44] S Kilts Advanced FPGA Design Architecture Implementationand Optimization John Wiley amp Sons New York NY USA2007

[45] M Smelyanskiy N P D Sawaya and A Aspuru-GuzikldquoqHiPSTER the quantum high performance software testingenvironmentrdquo httparxivorgabs160107195

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 15: Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

International Journal of Reconfigurable Computing 15

2 25 3 35 4 45 50

2

4

6

8

10

12

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

times104

(a) Logic element usage

2 25 3 35 4 45 50

05

1

15

2

25

3

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

times104

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(b) Register usage

2 25 3 35 4 45 50

200

400

600

800

1000

1200

Number of qubits

DSP

blo

ck 1

8-bi

t ele

men

t

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(c) DSP block usage

2 25 3 35 4 45 50

50

100

150

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipeline

Serial-parallel type 1Serial-parallel type 2

(d) Maximum operating frequency

Figure 18 Resource utilization and maximum operating frequency for different emulation architectures of QFT (based on 24-bit fixed pointformat)

simulation to describe the quantum state For a larger qubitsize quantum circuit the number of mantissa bits for fixedpoint representation has to be increased accordingly to ensuresimilar precision as in software simulation can be achievedwith trade-offs on resource utilization and execution speed[44]

From Figure 20 it can be observed that the proposedhardware emulation provides significant speed-up over soft-ware simulation using 119897119894119887119902119906119886119899119905119906119898 It is important to notethat the achieved speed-up increases drastically as the num-ber of qubits increasesThis result further supports the notionthat hardware emulation has significant potential in themodelling of a large-scale quantum system on the classicalcomputing platform based on FPGA

As the number of required IO pins for emulating QFTand Groverrsquos search algorithms with parallel read-outs istoo much to fit in the existing FPGA devices board-levelverification is infeasible Although the usage of multiplexersreduces the number of output pins resource consumptionrises significantly with the increase in the number of qubitsand this affects the analysis of the overall experiment Thusthe estimated runtime of the proposed FPGA emulationarchitecture is obtained based on the hardware clock cycleand the operating frequency that is acquired from the FPGAdevelopment toolThis is not a critical issue in future practicaldeployment as the selected case studies are meant to work ascore modules within other quantum applications that mightnot require parallel read-out

16 International Journal of Reconfigurable Computing

2 25 3 35 4 45 5 55 6 65 70

1

2

3

4

5

6

7

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipelineSerial-parallel

times104

(a) Logic element usage

2 25 3 35 4 45 5 55 6 65 70

05

1

15

2

25

3

35

4

5

45

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

ConcurrentPipelineSerial-parallel

times104

(b) Register usage

2 25 3 35 4 45 5 55 6 65 70

50

100

150

200

250

300

350

400

450

500

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipelineSerial-parallel

(c) Maximum operating frequency

Figure 19 Resource utilization and maximum operating frequency for different emulation architectures of Groverrsquos search (based on 24-bitfixed point format)

6 Conclusion and Future Work

Efficient resource utilization is critical for FPGA-basedimplementations especially for emulating quantum comput-ing applications as they typically exhibit exponential resourcerequirement with increasing number of qubits In this paperwe proposed a baseline FPGA emulation framework withfocus on the datapath design optimization based on theconventional state vector model as well as an effectivemethodology that facilitates accurate modelling of quantumalgorithms for FPGAemulationA serial-parallel architecture

with efficient resource utilization for FPGA-based emulationof quantum computing is presentedThe proposed emulationarchitecture achieves linear reduction in resource utilizationcompared to pipeline implementations as found in previousworks This work has also demonstrated the advantage ofFPGA emulation over software simulation where hardwareemulation of 7-qubit Groverrsquos search is about 3 times 104 timesfaster than the software simulation performed on Intel Corei7-4790 eight-core processor running at 36GHz clock rate

However experimental results obtained in this workshow that it is difficult to realize a scalable and flexible

International Journal of Reconfigurable Computing 17

2 25 3 35 4 45 5 55 6 65 705

1

15

2

25

3

Number of qubits

Runt

ime s

peed

-up

(sim

ulat

ion

emul

atio

n)

times104

Figure 20 Runtime speed-up against number of qubits in Groverrsquossearch emulation

emulation platform for large qubit size real-world quantumsystem using the approach that applies existing state vectorquantum models This concurs with [45] that states that thepractical limit on the size of the quantum system that canbe modelled on classical computing platform can hardly beovercome due to exponentially large memory requirementsfor storing the entire state vector Hence this suggests thata model with a more effective data structure to representquantum systems is required Recently the work on stabi-lizer frames [30] has shown promise in providing a moresuitable data structure for quantum models targeted forFPGA emulation This is the subject of future work in ourresearch in applying FPGA emulation in modelling of large-scale quantum systems In addition the error-correctionstructure available with stabilizer frames will be consideredfor application in practical quantum computations With amore efficient modelling technique FPGA can represent amore efficient emulation strategy of quantum systems

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the Ministry of Higher Education(MOHE) and Universiti Teknologi Malaysia (UTM) underFundamental Research Grant Scheme (FRGS) Vote number4F422

References

[1] P W Shor ldquoAlgorithms for quantum computation discretelogarithms and factoringrdquo in Proceedings of the 35th IEEEAnnual Symposium on Foundations of Computer Science(SFCS rsquo94) pp 124ndash134 Santa Fe NM USA November 1994

[2] L K Grover ldquoQuantum mechanics helps in searching for aneedle in a haystackrdquo Physical Review Letters vol 79 no 2 pp325ndash328 1997

[3] A Malossini and T Calarco ldquoQuantum genetic optimizationrdquoIEEE Transactions on Evolutionary Computation vol 12 no 2pp 231ndash241 2008

[4] R L Rivest A Shamir and L Adleman ldquoA method forobtaining digital signatures and public-key cryptosystemsrdquoCommunications of the ACM vol 21 no 2 pp 120ndash126 1978

[5] L K Grover ldquoA fast quantum mechanical algorithm fordatabase searchrdquo in Proceedings of the 28th Annual ACMSymposium onTheory of Computing pp 212ndash219 ACM 1996

[6] N Shenvi J Kempe andK BWhaley ldquoQuantum random-walksearch algorithmrdquo Physical Review A Atomic Molecular andOptical Physics vol 67 no 5 Article ID 052307 2003

[7] E Farhi J Goldstone and S Gutmann ldquoA quantum algorithmfor the Hamiltonian NAND treerdquo Theory of Computing vol 4pp 169ndash190 2008

[8] P W Shor ldquoWhy havenrsquot more quantum algorithms beenfoundrdquo Journal of the ACM vol 50 no 1 pp 87ndash90 2003

[9] D R Simon ldquoOn the power of quantum computationrdquo SIAMJournal on Computing vol 26 no 5 pp 1474ndash1483 1997

[10] S Hallgren ldquoPolynomial-time quantum algorithms for Pellrsquosequation and the principal ideal problemrdquo Journal of the ACMvol 54 article 4 2007

[11] M Mosca and A Ekert ldquoThe hidden subgroup problem andeigenvalue estimation on a quantum computerrdquo in Quan-tum Computing and Quantum Communications pp 174ndash188Springer Berlin Germany 1999

[12] D Bacon A M Childs and W Van Dam ldquoFrom optimalmeasurement to efficient quantum algorithms for the hiddensubgroup problem over semidirect product groupsrdquo in Pro-ceedings of the 46th Annual IEEE Symposium on Foundationsof Computer Science (FOCS rsquo05) pp 469ndash478 IEEE October2005

[13] L K Grover and A M Sengupta ldquoFrom coupled pendulums toquantum searchrdquo inMathematics of QuantumComputation pp119ndash134 Chapman and HallCRC 2002

[14] R P Feynman ldquoSimulating physics with computersrdquo Interna-tional Journal ofTheoretical Physics vol 21 no 6-7 pp 467ndash4881982

[15] N S Yanofsky and M A Mannucci Quantum Computingfor Computer Scientists vol 20 Cambridge University PressCambridge UK 2008

[16] C Monroe D M Meekhof B E King W M Itano and DJ Wineland ldquoDemonstration of a fundamental quantum logicgaterdquo Physical Review Letters vol 75 no 25 pp 4714ndash4717 1995

[17] N A Gershenfeld and I L Chuang ldquoBulk spin-resonancequantum computationrdquo Science vol 275 no 5298 pp 350ndash3561997

[18] J E Mooij T P Orlando L Levitov L Tian C H Van derWaland S Lloyd ldquoJosephson persistent-current qubitrdquo Science vol285 no 5430 pp 1036ndash1039 1999

[19] I L Chuang N Gershenfeld and M Kubinec ldquoExperimentalimplementation of fast quantum searchingrdquo Physical ReviewLetters vol 80 no 15 article 3408 1998

[20] R Barends J Kelly A Megrant et al ldquoSuperconducting quan-tum circuits at the surface code threshold for fault tolerancerdquoNature vol 508 no 7497 pp 500ndash503 2014

[21] M H Amin N G Dickson and P Smith ldquoAdiabatic quantumoptimization with quditsrdquo Quantum Information Processingvol 12 no 4 pp 1819ndash1829 2013

18 International Journal of Reconfigurable Computing

[22] A D King E Hoskinson T Lanting E Andriyash and M HAmin ldquoDegeneracy degree and heavy tails in quantumanneal-ingrdquo httparxivorgabs151207325

[23] T F Roslashnnow Z Wang J Job et al ldquoDefining and detectingquantum speeduprdquo Science vol 345 no 6195 pp 420ndash4242014

[24] Y Goto and M Fujishima ldquoEfficient quantum computingemulation system with unitary macro-operationsrdquo JapaneseJournal of Applied Physics vol 46 no 4 pp 2278ndash2282 2007

[25] A U Khalid Z Zilic and K Radecka ldquoFPGA emulation ofquantum circuitsrdquo in Proceedings of the IEEE International Con-ference on Computer Design VLSI in Computers and Processors(ICCD rsquo04) pp 310ndash315 IEEE October 2004

[26] J F Rivera-Miranda A J Caicedo-Beltran J D Valencia-Payan J M Espinosa-Duran and J Velasco-Medina ldquoHard-ware emulation of quantum Fourier transformrdquo in Proceedingsof the IEEE 2nd Latin American Symposium on Circuits and Sys-tems (LASCAS rsquo11) pp 1ndash4 IEEE Bogata Colombia February2011

[27] M Aminian M Saeedi M S Zamani andM Sedighi ldquoFPGA-based circuit model emulation of quantum algorithmsrdquo inProceedings of the IEEE Computer Society Annual Symposiumon VLSI (ISVLSI rsquo08) pp 399ndash404 IEEE Montpellier FranceApril 2008

[28] H Weimer M Muller I Lesanovsky P Zoller and H PBuchler ldquoA Rydberg quantum simulatorrdquoNature Physics vol 6no 5 pp 382ndash388 2010

[29] G F Viamontes Efficient quantum circuit simulation [PhDthesis] The University of Michigan Ann Arbor Mich USA2007

[30] H J Garcıa and I L Markov ldquoSimulation of quantum circuitsvia stabilizer framesrdquo IEEE Transactions on Computers vol 64no 8 pp 2323ndash2336 2015

[31] C P Williams and S H Clearwater Explorations in QuantumComputing Springer Berlin Germany 1998

[32] A Barenco D Deutsch A Ekert and R Jozsa ldquoConditionalquantumdynamics and logic gatesrdquo Physical Review Letters vol74 no 20 pp 4083ndash4086 1995

[33] M A Nielsen and I L Chuang Quantum Computation andQuantum Information Cambridge University Press 2000

[34] W-W Zhang F Gao B Liu Q-Y Wen and H Chen ldquoAwatermark strategy for quantum images based on quantumFourier transformrdquo Quantum Information Processing vol 12no 2 pp 793ndash803 2013

[35] D Curtis and D A Meyer ldquoTowards quantum templatematchingrdquo in Proceedings of the SPIE 48th Annual Meeting inQuantum Communications and Quantum Imaging pp 134ndash141International Society for Optics and Photonics August 2003

[36] R Schutzhold and G Schaller ldquoAdiabatic quantum algorithmsas quantum phase transitions first versus second orderrdquo Physi-cal Review A vol 74 no 6 Article ID 060304 2006

[37] W P Baritompa D W Bulger and G R Wood ldquoGroverrsquosquantum algorithm applied to global optimizationrdquo SIAMJournal on Optimization vol 15 no 4 pp 1170ndash1184 2005

[38] C Durr and P Hoyer ldquoA quantum algorithm for finding theminimumrdquo httparxivorgabsquant-ph9607014

[39] A Montanaro ldquoQuantum pattern matching fast on averagerdquoAlgorithmica 2015

[40] Y H Lee M Khalil-Hani and M N Marsono ldquoFPGA-basedquantum circuit emulation a case study on Quantum Fouriertransformrdquo in Proceedings of the 14th International Symposium

on Integrated Circuits (ISIC rsquo14) pp 512ndash515 IEEE SingaporeDecember 2014

[41] M Khalil-Hani Y H Lee and M N Marsono ldquoAn accu-rate FPGA-based hardware emulation on quantum fouriertransformrdquo in Proceedings of the Australasian Symposium onParallel andDistributed Computing (AusPDC rsquo15) vol 1 SydneyAustralia 2015 a1b3

[42] M Frigo and S G Johnson ldquoThe design and implementationof FFTW3rdquo Proceedings of the IEEE vol 93 no 2 pp 216ndash2312005

[43] D Wecker and K M Svore ldquoLIQUi|gt a software design archi-tecture and domain-specific language for quantum computingrdquohttparxivorgabs14024467

[44] S Kilts Advanced FPGA Design Architecture Implementationand Optimization John Wiley amp Sons New York NY USA2007

[45] M Smelyanskiy N P D Sawaya and A Aspuru-GuzikldquoqHiPSTER the quantum high performance software testingenvironmentrdquo httparxivorgabs160107195

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 16: Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

16 International Journal of Reconfigurable Computing

2 25 3 35 4 45 5 55 6 65 70

1

2

3

4

5

6

7

Number of qubits

Com

bina

tiona

l ALU

Ts

ConcurrentPipelineSerial-parallel

times104

(a) Logic element usage

2 25 3 35 4 45 5 55 6 65 70

05

1

15

2

25

3

35

4

5

45

Number of qubits

Ded

icat

ed lo

gic r

egist

ers

ConcurrentPipelineSerial-parallel

times104

(b) Register usage

2 25 3 35 4 45 5 55 6 65 70

50

100

150

200

250

300

350

400

450

500

Number of qubits

Max

imum

freq

uenc

y (M

Hz)

ConcurrentPipelineSerial-parallel

(c) Maximum operating frequency

Figure 19 Resource utilization and maximum operating frequency for different emulation architectures of Groverrsquos search (based on 24-bitfixed point format)

6 Conclusion and Future Work

Efficient resource utilization is critical for FPGA-basedimplementations especially for emulating quantum comput-ing applications as they typically exhibit exponential resourcerequirement with increasing number of qubits In this paperwe proposed a baseline FPGA emulation framework withfocus on the datapath design optimization based on theconventional state vector model as well as an effectivemethodology that facilitates accurate modelling of quantumalgorithms for FPGAemulationA serial-parallel architecture

with efficient resource utilization for FPGA-based emulationof quantum computing is presentedThe proposed emulationarchitecture achieves linear reduction in resource utilizationcompared to pipeline implementations as found in previousworks This work has also demonstrated the advantage ofFPGA emulation over software simulation where hardwareemulation of 7-qubit Groverrsquos search is about 3 times 104 timesfaster than the software simulation performed on Intel Corei7-4790 eight-core processor running at 36GHz clock rate

However experimental results obtained in this workshow that it is difficult to realize a scalable and flexible

International Journal of Reconfigurable Computing 17

2 25 3 35 4 45 5 55 6 65 705

1

15

2

25

3

Number of qubits

Runt

ime s

peed

-up

(sim

ulat

ion

emul

atio

n)

times104

Figure 20 Runtime speed-up against number of qubits in Groverrsquossearch emulation

emulation platform for large qubit size real-world quantumsystem using the approach that applies existing state vectorquantum models This concurs with [45] that states that thepractical limit on the size of the quantum system that canbe modelled on classical computing platform can hardly beovercome due to exponentially large memory requirementsfor storing the entire state vector Hence this suggests thata model with a more effective data structure to representquantum systems is required Recently the work on stabi-lizer frames [30] has shown promise in providing a moresuitable data structure for quantum models targeted forFPGA emulation This is the subject of future work in ourresearch in applying FPGA emulation in modelling of large-scale quantum systems In addition the error-correctionstructure available with stabilizer frames will be consideredfor application in practical quantum computations With amore efficient modelling technique FPGA can represent amore efficient emulation strategy of quantum systems

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the Ministry of Higher Education(MOHE) and Universiti Teknologi Malaysia (UTM) underFundamental Research Grant Scheme (FRGS) Vote number4F422

References

[1] P W Shor ldquoAlgorithms for quantum computation discretelogarithms and factoringrdquo in Proceedings of the 35th IEEEAnnual Symposium on Foundations of Computer Science(SFCS rsquo94) pp 124ndash134 Santa Fe NM USA November 1994

[2] L K Grover ldquoQuantum mechanics helps in searching for aneedle in a haystackrdquo Physical Review Letters vol 79 no 2 pp325ndash328 1997

[3] A Malossini and T Calarco ldquoQuantum genetic optimizationrdquoIEEE Transactions on Evolutionary Computation vol 12 no 2pp 231ndash241 2008

[4] R L Rivest A Shamir and L Adleman ldquoA method forobtaining digital signatures and public-key cryptosystemsrdquoCommunications of the ACM vol 21 no 2 pp 120ndash126 1978

[5] L K Grover ldquoA fast quantum mechanical algorithm fordatabase searchrdquo in Proceedings of the 28th Annual ACMSymposium onTheory of Computing pp 212ndash219 ACM 1996

[6] N Shenvi J Kempe andK BWhaley ldquoQuantum random-walksearch algorithmrdquo Physical Review A Atomic Molecular andOptical Physics vol 67 no 5 Article ID 052307 2003

[7] E Farhi J Goldstone and S Gutmann ldquoA quantum algorithmfor the Hamiltonian NAND treerdquo Theory of Computing vol 4pp 169ndash190 2008

[8] P W Shor ldquoWhy havenrsquot more quantum algorithms beenfoundrdquo Journal of the ACM vol 50 no 1 pp 87ndash90 2003

[9] D R Simon ldquoOn the power of quantum computationrdquo SIAMJournal on Computing vol 26 no 5 pp 1474ndash1483 1997

[10] S Hallgren ldquoPolynomial-time quantum algorithms for Pellrsquosequation and the principal ideal problemrdquo Journal of the ACMvol 54 article 4 2007

[11] M Mosca and A Ekert ldquoThe hidden subgroup problem andeigenvalue estimation on a quantum computerrdquo in Quan-tum Computing and Quantum Communications pp 174ndash188Springer Berlin Germany 1999

[12] D Bacon A M Childs and W Van Dam ldquoFrom optimalmeasurement to efficient quantum algorithms for the hiddensubgroup problem over semidirect product groupsrdquo in Pro-ceedings of the 46th Annual IEEE Symposium on Foundationsof Computer Science (FOCS rsquo05) pp 469ndash478 IEEE October2005

[13] L K Grover and A M Sengupta ldquoFrom coupled pendulums toquantum searchrdquo inMathematics of QuantumComputation pp119ndash134 Chapman and HallCRC 2002

[14] R P Feynman ldquoSimulating physics with computersrdquo Interna-tional Journal ofTheoretical Physics vol 21 no 6-7 pp 467ndash4881982

[15] N S Yanofsky and M A Mannucci Quantum Computingfor Computer Scientists vol 20 Cambridge University PressCambridge UK 2008

[16] C Monroe D M Meekhof B E King W M Itano and DJ Wineland ldquoDemonstration of a fundamental quantum logicgaterdquo Physical Review Letters vol 75 no 25 pp 4714ndash4717 1995

[17] N A Gershenfeld and I L Chuang ldquoBulk spin-resonancequantum computationrdquo Science vol 275 no 5298 pp 350ndash3561997

[18] J E Mooij T P Orlando L Levitov L Tian C H Van derWaland S Lloyd ldquoJosephson persistent-current qubitrdquo Science vol285 no 5430 pp 1036ndash1039 1999

[19] I L Chuang N Gershenfeld and M Kubinec ldquoExperimentalimplementation of fast quantum searchingrdquo Physical ReviewLetters vol 80 no 15 article 3408 1998

[20] R Barends J Kelly A Megrant et al ldquoSuperconducting quan-tum circuits at the surface code threshold for fault tolerancerdquoNature vol 508 no 7497 pp 500ndash503 2014

[21] M H Amin N G Dickson and P Smith ldquoAdiabatic quantumoptimization with quditsrdquo Quantum Information Processingvol 12 no 4 pp 1819ndash1829 2013

18 International Journal of Reconfigurable Computing

[22] A D King E Hoskinson T Lanting E Andriyash and M HAmin ldquoDegeneracy degree and heavy tails in quantumanneal-ingrdquo httparxivorgabs151207325

[23] T F Roslashnnow Z Wang J Job et al ldquoDefining and detectingquantum speeduprdquo Science vol 345 no 6195 pp 420ndash4242014

[24] Y Goto and M Fujishima ldquoEfficient quantum computingemulation system with unitary macro-operationsrdquo JapaneseJournal of Applied Physics vol 46 no 4 pp 2278ndash2282 2007

[25] A U Khalid Z Zilic and K Radecka ldquoFPGA emulation ofquantum circuitsrdquo in Proceedings of the IEEE International Con-ference on Computer Design VLSI in Computers and Processors(ICCD rsquo04) pp 310ndash315 IEEE October 2004

[26] J F Rivera-Miranda A J Caicedo-Beltran J D Valencia-Payan J M Espinosa-Duran and J Velasco-Medina ldquoHard-ware emulation of quantum Fourier transformrdquo in Proceedingsof the IEEE 2nd Latin American Symposium on Circuits and Sys-tems (LASCAS rsquo11) pp 1ndash4 IEEE Bogata Colombia February2011

[27] M Aminian M Saeedi M S Zamani andM Sedighi ldquoFPGA-based circuit model emulation of quantum algorithmsrdquo inProceedings of the IEEE Computer Society Annual Symposiumon VLSI (ISVLSI rsquo08) pp 399ndash404 IEEE Montpellier FranceApril 2008

[28] H Weimer M Muller I Lesanovsky P Zoller and H PBuchler ldquoA Rydberg quantum simulatorrdquoNature Physics vol 6no 5 pp 382ndash388 2010

[29] G F Viamontes Efficient quantum circuit simulation [PhDthesis] The University of Michigan Ann Arbor Mich USA2007

[30] H J Garcıa and I L Markov ldquoSimulation of quantum circuitsvia stabilizer framesrdquo IEEE Transactions on Computers vol 64no 8 pp 2323ndash2336 2015

[31] C P Williams and S H Clearwater Explorations in QuantumComputing Springer Berlin Germany 1998

[32] A Barenco D Deutsch A Ekert and R Jozsa ldquoConditionalquantumdynamics and logic gatesrdquo Physical Review Letters vol74 no 20 pp 4083ndash4086 1995

[33] M A Nielsen and I L Chuang Quantum Computation andQuantum Information Cambridge University Press 2000

[34] W-W Zhang F Gao B Liu Q-Y Wen and H Chen ldquoAwatermark strategy for quantum images based on quantumFourier transformrdquo Quantum Information Processing vol 12no 2 pp 793ndash803 2013

[35] D Curtis and D A Meyer ldquoTowards quantum templatematchingrdquo in Proceedings of the SPIE 48th Annual Meeting inQuantum Communications and Quantum Imaging pp 134ndash141International Society for Optics and Photonics August 2003

[36] R Schutzhold and G Schaller ldquoAdiabatic quantum algorithmsas quantum phase transitions first versus second orderrdquo Physi-cal Review A vol 74 no 6 Article ID 060304 2006

[37] W P Baritompa D W Bulger and G R Wood ldquoGroverrsquosquantum algorithm applied to global optimizationrdquo SIAMJournal on Optimization vol 15 no 4 pp 1170ndash1184 2005

[38] C Durr and P Hoyer ldquoA quantum algorithm for finding theminimumrdquo httparxivorgabsquant-ph9607014

[39] A Montanaro ldquoQuantum pattern matching fast on averagerdquoAlgorithmica 2015

[40] Y H Lee M Khalil-Hani and M N Marsono ldquoFPGA-basedquantum circuit emulation a case study on Quantum Fouriertransformrdquo in Proceedings of the 14th International Symposium

on Integrated Circuits (ISIC rsquo14) pp 512ndash515 IEEE SingaporeDecember 2014

[41] M Khalil-Hani Y H Lee and M N Marsono ldquoAn accu-rate FPGA-based hardware emulation on quantum fouriertransformrdquo in Proceedings of the Australasian Symposium onParallel andDistributed Computing (AusPDC rsquo15) vol 1 SydneyAustralia 2015 a1b3

[42] M Frigo and S G Johnson ldquoThe design and implementationof FFTW3rdquo Proceedings of the IEEE vol 93 no 2 pp 216ndash2312005

[43] D Wecker and K M Svore ldquoLIQUi|gt a software design archi-tecture and domain-specific language for quantum computingrdquohttparxivorgabs14024467

[44] S Kilts Advanced FPGA Design Architecture Implementationand Optimization John Wiley amp Sons New York NY USA2007

[45] M Smelyanskiy N P D Sawaya and A Aspuru-GuzikldquoqHiPSTER the quantum high performance software testingenvironmentrdquo httparxivorgabs160107195

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 17: Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

International Journal of Reconfigurable Computing 17

2 25 3 35 4 45 5 55 6 65 705

1

15

2

25

3

Number of qubits

Runt

ime s

peed

-up

(sim

ulat

ion

emul

atio

n)

times104

Figure 20 Runtime speed-up against number of qubits in Groverrsquossearch emulation

emulation platform for large qubit size real-world quantumsystem using the approach that applies existing state vectorquantum models This concurs with [45] that states that thepractical limit on the size of the quantum system that canbe modelled on classical computing platform can hardly beovercome due to exponentially large memory requirementsfor storing the entire state vector Hence this suggests thata model with a more effective data structure to representquantum systems is required Recently the work on stabi-lizer frames [30] has shown promise in providing a moresuitable data structure for quantum models targeted forFPGA emulation This is the subject of future work in ourresearch in applying FPGA emulation in modelling of large-scale quantum systems In addition the error-correctionstructure available with stabilizer frames will be consideredfor application in practical quantum computations With amore efficient modelling technique FPGA can represent amore efficient emulation strategy of quantum systems

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the Ministry of Higher Education(MOHE) and Universiti Teknologi Malaysia (UTM) underFundamental Research Grant Scheme (FRGS) Vote number4F422

References

[1] P W Shor ldquoAlgorithms for quantum computation discretelogarithms and factoringrdquo in Proceedings of the 35th IEEEAnnual Symposium on Foundations of Computer Science(SFCS rsquo94) pp 124ndash134 Santa Fe NM USA November 1994

[2] L K Grover ldquoQuantum mechanics helps in searching for aneedle in a haystackrdquo Physical Review Letters vol 79 no 2 pp325ndash328 1997

[3] A Malossini and T Calarco ldquoQuantum genetic optimizationrdquoIEEE Transactions on Evolutionary Computation vol 12 no 2pp 231ndash241 2008

[4] R L Rivest A Shamir and L Adleman ldquoA method forobtaining digital signatures and public-key cryptosystemsrdquoCommunications of the ACM vol 21 no 2 pp 120ndash126 1978

[5] L K Grover ldquoA fast quantum mechanical algorithm fordatabase searchrdquo in Proceedings of the 28th Annual ACMSymposium onTheory of Computing pp 212ndash219 ACM 1996

[6] N Shenvi J Kempe andK BWhaley ldquoQuantum random-walksearch algorithmrdquo Physical Review A Atomic Molecular andOptical Physics vol 67 no 5 Article ID 052307 2003

[7] E Farhi J Goldstone and S Gutmann ldquoA quantum algorithmfor the Hamiltonian NAND treerdquo Theory of Computing vol 4pp 169ndash190 2008

[8] P W Shor ldquoWhy havenrsquot more quantum algorithms beenfoundrdquo Journal of the ACM vol 50 no 1 pp 87ndash90 2003

[9] D R Simon ldquoOn the power of quantum computationrdquo SIAMJournal on Computing vol 26 no 5 pp 1474ndash1483 1997

[10] S Hallgren ldquoPolynomial-time quantum algorithms for Pellrsquosequation and the principal ideal problemrdquo Journal of the ACMvol 54 article 4 2007

[11] M Mosca and A Ekert ldquoThe hidden subgroup problem andeigenvalue estimation on a quantum computerrdquo in Quan-tum Computing and Quantum Communications pp 174ndash188Springer Berlin Germany 1999

[12] D Bacon A M Childs and W Van Dam ldquoFrom optimalmeasurement to efficient quantum algorithms for the hiddensubgroup problem over semidirect product groupsrdquo in Pro-ceedings of the 46th Annual IEEE Symposium on Foundationsof Computer Science (FOCS rsquo05) pp 469ndash478 IEEE October2005

[13] L K Grover and A M Sengupta ldquoFrom coupled pendulums toquantum searchrdquo inMathematics of QuantumComputation pp119ndash134 Chapman and HallCRC 2002

[14] R P Feynman ldquoSimulating physics with computersrdquo Interna-tional Journal ofTheoretical Physics vol 21 no 6-7 pp 467ndash4881982

[15] N S Yanofsky and M A Mannucci Quantum Computingfor Computer Scientists vol 20 Cambridge University PressCambridge UK 2008

[16] C Monroe D M Meekhof B E King W M Itano and DJ Wineland ldquoDemonstration of a fundamental quantum logicgaterdquo Physical Review Letters vol 75 no 25 pp 4714ndash4717 1995

[17] N A Gershenfeld and I L Chuang ldquoBulk spin-resonancequantum computationrdquo Science vol 275 no 5298 pp 350ndash3561997

[18] J E Mooij T P Orlando L Levitov L Tian C H Van derWaland S Lloyd ldquoJosephson persistent-current qubitrdquo Science vol285 no 5430 pp 1036ndash1039 1999

[19] I L Chuang N Gershenfeld and M Kubinec ldquoExperimentalimplementation of fast quantum searchingrdquo Physical ReviewLetters vol 80 no 15 article 3408 1998

[20] R Barends J Kelly A Megrant et al ldquoSuperconducting quan-tum circuits at the surface code threshold for fault tolerancerdquoNature vol 508 no 7497 pp 500ndash503 2014

[21] M H Amin N G Dickson and P Smith ldquoAdiabatic quantumoptimization with quditsrdquo Quantum Information Processingvol 12 no 4 pp 1819ndash1829 2013

18 International Journal of Reconfigurable Computing

[22] A D King E Hoskinson T Lanting E Andriyash and M HAmin ldquoDegeneracy degree and heavy tails in quantumanneal-ingrdquo httparxivorgabs151207325

[23] T F Roslashnnow Z Wang J Job et al ldquoDefining and detectingquantum speeduprdquo Science vol 345 no 6195 pp 420ndash4242014

[24] Y Goto and M Fujishima ldquoEfficient quantum computingemulation system with unitary macro-operationsrdquo JapaneseJournal of Applied Physics vol 46 no 4 pp 2278ndash2282 2007

[25] A U Khalid Z Zilic and K Radecka ldquoFPGA emulation ofquantum circuitsrdquo in Proceedings of the IEEE International Con-ference on Computer Design VLSI in Computers and Processors(ICCD rsquo04) pp 310ndash315 IEEE October 2004

[26] J F Rivera-Miranda A J Caicedo-Beltran J D Valencia-Payan J M Espinosa-Duran and J Velasco-Medina ldquoHard-ware emulation of quantum Fourier transformrdquo in Proceedingsof the IEEE 2nd Latin American Symposium on Circuits and Sys-tems (LASCAS rsquo11) pp 1ndash4 IEEE Bogata Colombia February2011

[27] M Aminian M Saeedi M S Zamani andM Sedighi ldquoFPGA-based circuit model emulation of quantum algorithmsrdquo inProceedings of the IEEE Computer Society Annual Symposiumon VLSI (ISVLSI rsquo08) pp 399ndash404 IEEE Montpellier FranceApril 2008

[28] H Weimer M Muller I Lesanovsky P Zoller and H PBuchler ldquoA Rydberg quantum simulatorrdquoNature Physics vol 6no 5 pp 382ndash388 2010

[29] G F Viamontes Efficient quantum circuit simulation [PhDthesis] The University of Michigan Ann Arbor Mich USA2007

[30] H J Garcıa and I L Markov ldquoSimulation of quantum circuitsvia stabilizer framesrdquo IEEE Transactions on Computers vol 64no 8 pp 2323ndash2336 2015

[31] C P Williams and S H Clearwater Explorations in QuantumComputing Springer Berlin Germany 1998

[32] A Barenco D Deutsch A Ekert and R Jozsa ldquoConditionalquantumdynamics and logic gatesrdquo Physical Review Letters vol74 no 20 pp 4083ndash4086 1995

[33] M A Nielsen and I L Chuang Quantum Computation andQuantum Information Cambridge University Press 2000

[34] W-W Zhang F Gao B Liu Q-Y Wen and H Chen ldquoAwatermark strategy for quantum images based on quantumFourier transformrdquo Quantum Information Processing vol 12no 2 pp 793ndash803 2013

[35] D Curtis and D A Meyer ldquoTowards quantum templatematchingrdquo in Proceedings of the SPIE 48th Annual Meeting inQuantum Communications and Quantum Imaging pp 134ndash141International Society for Optics and Photonics August 2003

[36] R Schutzhold and G Schaller ldquoAdiabatic quantum algorithmsas quantum phase transitions first versus second orderrdquo Physi-cal Review A vol 74 no 6 Article ID 060304 2006

[37] W P Baritompa D W Bulger and G R Wood ldquoGroverrsquosquantum algorithm applied to global optimizationrdquo SIAMJournal on Optimization vol 15 no 4 pp 1170ndash1184 2005

[38] C Durr and P Hoyer ldquoA quantum algorithm for finding theminimumrdquo httparxivorgabsquant-ph9607014

[39] A Montanaro ldquoQuantum pattern matching fast on averagerdquoAlgorithmica 2015

[40] Y H Lee M Khalil-Hani and M N Marsono ldquoFPGA-basedquantum circuit emulation a case study on Quantum Fouriertransformrdquo in Proceedings of the 14th International Symposium

on Integrated Circuits (ISIC rsquo14) pp 512ndash515 IEEE SingaporeDecember 2014

[41] M Khalil-Hani Y H Lee and M N Marsono ldquoAn accu-rate FPGA-based hardware emulation on quantum fouriertransformrdquo in Proceedings of the Australasian Symposium onParallel andDistributed Computing (AusPDC rsquo15) vol 1 SydneyAustralia 2015 a1b3

[42] M Frigo and S G Johnson ldquoThe design and implementationof FFTW3rdquo Proceedings of the IEEE vol 93 no 2 pp 216ndash2312005

[43] D Wecker and K M Svore ldquoLIQUi|gt a software design archi-tecture and domain-specific language for quantum computingrdquohttparxivorgabs14024467

[44] S Kilts Advanced FPGA Design Architecture Implementationand Optimization John Wiley amp Sons New York NY USA2007

[45] M Smelyanskiy N P D Sawaya and A Aspuru-GuzikldquoqHiPSTER the quantum high performance software testingenvironmentrdquo httparxivorgabs160107195

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 18: Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

18 International Journal of Reconfigurable Computing

[22] A D King E Hoskinson T Lanting E Andriyash and M HAmin ldquoDegeneracy degree and heavy tails in quantumanneal-ingrdquo httparxivorgabs151207325

[23] T F Roslashnnow Z Wang J Job et al ldquoDefining and detectingquantum speeduprdquo Science vol 345 no 6195 pp 420ndash4242014

[24] Y Goto and M Fujishima ldquoEfficient quantum computingemulation system with unitary macro-operationsrdquo JapaneseJournal of Applied Physics vol 46 no 4 pp 2278ndash2282 2007

[25] A U Khalid Z Zilic and K Radecka ldquoFPGA emulation ofquantum circuitsrdquo in Proceedings of the IEEE International Con-ference on Computer Design VLSI in Computers and Processors(ICCD rsquo04) pp 310ndash315 IEEE October 2004

[26] J F Rivera-Miranda A J Caicedo-Beltran J D Valencia-Payan J M Espinosa-Duran and J Velasco-Medina ldquoHard-ware emulation of quantum Fourier transformrdquo in Proceedingsof the IEEE 2nd Latin American Symposium on Circuits and Sys-tems (LASCAS rsquo11) pp 1ndash4 IEEE Bogata Colombia February2011

[27] M Aminian M Saeedi M S Zamani andM Sedighi ldquoFPGA-based circuit model emulation of quantum algorithmsrdquo inProceedings of the IEEE Computer Society Annual Symposiumon VLSI (ISVLSI rsquo08) pp 399ndash404 IEEE Montpellier FranceApril 2008

[28] H Weimer M Muller I Lesanovsky P Zoller and H PBuchler ldquoA Rydberg quantum simulatorrdquoNature Physics vol 6no 5 pp 382ndash388 2010

[29] G F Viamontes Efficient quantum circuit simulation [PhDthesis] The University of Michigan Ann Arbor Mich USA2007

[30] H J Garcıa and I L Markov ldquoSimulation of quantum circuitsvia stabilizer framesrdquo IEEE Transactions on Computers vol 64no 8 pp 2323ndash2336 2015

[31] C P Williams and S H Clearwater Explorations in QuantumComputing Springer Berlin Germany 1998

[32] A Barenco D Deutsch A Ekert and R Jozsa ldquoConditionalquantumdynamics and logic gatesrdquo Physical Review Letters vol74 no 20 pp 4083ndash4086 1995

[33] M A Nielsen and I L Chuang Quantum Computation andQuantum Information Cambridge University Press 2000

[34] W-W Zhang F Gao B Liu Q-Y Wen and H Chen ldquoAwatermark strategy for quantum images based on quantumFourier transformrdquo Quantum Information Processing vol 12no 2 pp 793ndash803 2013

[35] D Curtis and D A Meyer ldquoTowards quantum templatematchingrdquo in Proceedings of the SPIE 48th Annual Meeting inQuantum Communications and Quantum Imaging pp 134ndash141International Society for Optics and Photonics August 2003

[36] R Schutzhold and G Schaller ldquoAdiabatic quantum algorithmsas quantum phase transitions first versus second orderrdquo Physi-cal Review A vol 74 no 6 Article ID 060304 2006

[37] W P Baritompa D W Bulger and G R Wood ldquoGroverrsquosquantum algorithm applied to global optimizationrdquo SIAMJournal on Optimization vol 15 no 4 pp 1170ndash1184 2005

[38] C Durr and P Hoyer ldquoA quantum algorithm for finding theminimumrdquo httparxivorgabsquant-ph9607014

[39] A Montanaro ldquoQuantum pattern matching fast on averagerdquoAlgorithmica 2015

[40] Y H Lee M Khalil-Hani and M N Marsono ldquoFPGA-basedquantum circuit emulation a case study on Quantum Fouriertransformrdquo in Proceedings of the 14th International Symposium

on Integrated Circuits (ISIC rsquo14) pp 512ndash515 IEEE SingaporeDecember 2014

[41] M Khalil-Hani Y H Lee and M N Marsono ldquoAn accu-rate FPGA-based hardware emulation on quantum fouriertransformrdquo in Proceedings of the Australasian Symposium onParallel andDistributed Computing (AusPDC rsquo15) vol 1 SydneyAustralia 2015 a1b3

[42] M Frigo and S G Johnson ldquoThe design and implementationof FFTW3rdquo Proceedings of the IEEE vol 93 no 2 pp 216ndash2312005

[43] D Wecker and K M Svore ldquoLIQUi|gt a software design archi-tecture and domain-specific language for quantum computingrdquohttparxivorgabs14024467

[44] S Kilts Advanced FPGA Design Architecture Implementationand Optimization John Wiley amp Sons New York NY USA2007

[45] M Smelyanskiy N P D Sawaya and A Aspuru-GuzikldquoqHiPSTER the quantum high performance software testingenvironmentrdquo httparxivorgabs160107195

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 19: Research Article An FPGA-Based Quantum Computing ...namely, ion trap [ ], nuclear magnetic resonance [ ], and superconductor [ ], were attempted. Nevertheless, only small-scale quantum

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of