13
CS252 Graduate Computer Architecture Lt 25 Lecture 25 Quantum Computing and Quantum Computing and Quantum CAD Design April 28 th , 2010 Prof John D. Kubiatowicz http://www.cs.berkeley.edu/~kubitron/cs252 Use Quantum Mechanics to Compute? Weird but useful properties of quantum mechanics: Quantization: Only certain values or orbits are good » Remember orbitals from chemistry??? » Remember orbitals from chemistry??? Superposition: Schizophrenic physical elements don’t quite know whether they are one thing or another All existing digital abstractions try to eliminate QM All existing digital abstractions try to eliminate QM Transistors/Gates designed with classical behavior Binary abstraction: a “1” is a “1” and a “0” is a “0” Quantum Computing: Quantum Computing: Use of Quantization and Superposition to compute. Interesting results: Shor’s algorithm: factors in polynomial time! Grover’s algorithm: Finds items in unsorted database in time proportional to square-root of n. Mt i ls simul ti n: xp n nti l cl ssic ll lin tim QM Materials simulation: exponential classically, linear-time QM 4/28/2010 cs252-S10, Lecture 25 2 Quantization: Use of “Spin” North Spin ½ particle: (Proton/Electron) Representation: |0> or |1> Particles like Protons have an intrinsic “Spin” h d fi d ith t t t l South when defined with respect to an external magnetic field Quantum effect gives “1” and “0”: Either spin is “UP” or “DOWN” nothing between 4/28/2010 cs252-S10, Lecture 25 3 Kane Proposal II (First one didn’t quite work) Single Spin Control Gates Ph h Inter-bit Control Gates Phosphorus Impurity Atoms Bits Represented by combination of proton/electron spin Operations performed by manipulating control gates Complex sequences of pulses perform NMR like operations Complex sequences of pulses perform NMR-like operations Temperature < 1° Kelvin! 4/28/2010 cs252-S10, Lecture 25 4

Use Quantum Mechanics to Compute? - University of ...kubitron/courses/cs252...CS252 Graduate Computer Architecture Lt Lecture 25 Quantum Computing and Quantum CAD Design April 28th,

  • Upload
    vanbao

  • View
    221

  • Download
    1

Embed Size (px)

Citation preview

Page 1: Use Quantum Mechanics to Compute? - University of ...kubitron/courses/cs252...CS252 Graduate Computer Architecture Lt Lecture 25 Quantum Computing and Quantum CAD Design April 28th,

CS252Graduate Computer Architecture

L t 25Lecture 25

Quantum Computing andQuantum Computing andQuantum CAD Design

April 28th, 2010

Prof John D. Kubiatowiczhttp://www.cs.berkeley.edu/~kubitron/cs252

Use Quantum Mechanics to Compute?• Weird but useful properties of quantum mechanics:

– Quantization: Only certain values or orbits are good» Remember orbitals from chemistry???» Remember orbitals from chemistry???

– Superposition: Schizophrenic physical elements don’t quite know whether they are one thing or another

• All existing digital abstractions try to eliminate QMAll existing digital abstractions try to eliminate QM– Transistors/Gates designed with classical behavior– Binary abstraction: a “1” is a “1” and a “0” is a “0”

• Quantum Computing: • Quantum Computing: Use of Quantization and Superposition to compute.

• Interesting results:– Shor’s algorithm: factors in polynomial time!– Grover’s algorithm: Finds items in unsorted database in time

proportional to square-root of n.M t i ls simul ti n: xp n nti l cl ssic ll lin tim QM– Materials simulation: exponential classically, linear-time QM

4/28/2010 cs252-S10, Lecture 25 2

Quantization: Use of “Spin”North

Spin ½ particle:(Proton/Electron)

Representation:|0> or |1>

• Particles like Protons have an intrinsic “Spin” h d fi d ith t t t l

South

when defined with respect to an external magnetic field

• Quantum effect gives “1” and “0”:Q g– Either spin is “UP” or “DOWN” nothing between

4/28/2010 cs252-S10, Lecture 25 3

Kane Proposal II (First one didn’t quite work)

Single SpinControl Gates

Ph h

Inter-bit Control Gates

PhosphorusImpurity Atoms

• Bits Represented by combination of proton/electron spin• Operations performed by manipulating control gates

Complex sequences of pulses perform NMR like operations– Complex sequences of pulses perform NMR-like operations

• Temperature < 1° Kelvin!4/28/2010 cs252-S10, Lecture 25 4

Page 2: Use Quantum Mechanics to Compute? - University of ...kubitron/courses/cs252...CS252 Graduate Computer Architecture Lt Lecture 25 Quantum Computing and Quantum CAD Design April 28th,

Now add Superposition!• The bit can be in a combination of “1” and “0”:

– Written as: = C0|0> + C1|1>– The C’s are complex numbers!The C s are complex numbers!– Important Constraint: |C0|2 + |C1|2 =1

• If measure bit to see what looks like, h b b l | |2 ll f d |0 ( “ P”)– With probability |C0|2 we will find |0> (say “UP”)

– With probability |C1|2 we will find |1> (say “DOWN”)

• Is this a real effect? Options:p– This is just statistical – given a large number of protons, a fraction

of them (|C0|2 ) are “UP” and the rest are down.– This is a real effect, and the proton is really both things until you

t t l k t ittry to look at it

• Reality: second choice! – There are experiments to prove it!

4/28/2010 cs252-S10, Lecture 25 5

A register can have many values!• Implications of superposition:

– An n-bit register can have 2n values simultaneously!– 3-bit example:3 bit example:

= C000|000>+ C001|001>+ C010|010>+ C011|011>+ C100|100>+ C101|101>+ C110|110>+ C111|111>

• Probabilities of measuring all bits are set by r a t f m a ur ng a t ar t y coefficients:– So, prob of getting |000> is |C000|2, etc.– Suppose we measure only one bit (first):pp y

» We get “0” with probability: P0=|C000|2+ |C001|2+ |C010|2+ |C011|2Result: = (C000|000>+ C001|001>+ C010|010>+ C011|011>)

» We get “1” with probability: P1=|C100|2+ |C101|2+ |C110|2+ |C111|2Result: = (C |100>+ C |101>+ C |110>+ C |111>)Result: = (C100|100>+ C101|101>+ C110|110>+ C111|111>)

• Problem: Don’t want environment to measurebefore ready!

S l ti : Q t m E C ti C d !– Solution: Quantum Error Correction Codes!

4/28/2010 cs252-S10, Lecture 25 6

Spooky action at a distance• Consider the following simple 2-bit state:

= C00|00>+ C11|11>– Called an “EPR” pair for “Einstein, Podolsky, Rosen”

N t th t bit• Now, separate the two bits:

Light-Years?

• If we measure one of them, it instantaneously sets other one!– Einstein called this a “spooky action at a distance”

Light Years?

– Einstein called this a spooky action at a distance– In particular, if we measure a |0> at one side, we get a |0> at the other

(and vice versa)• Teleportation

C “ t t” EPR i ( bit X d Y)– Can “pre-transport” an EPR pair (say bits X and Y)– Later to transport bit A from one side to the other we:

» Perform operation between A and X, yielding two classical bits» Send the two bits to the other side» Use the two bits to operate on Y» Poof! State of bit A appears in place of Y

4/28/2010 cs252-S10, Lecture 25 7

Model: Operations on coefficients + measurements

Unitary Transformations

InputComplex Measure

OutputClassical

• Basic Computing Paradigm:

TransformationsState Answer

– Input is a register with superposition of many values » Possibly all 2n inputs equally probable!

– Unitary transformations compute on coefficients» Must maintain probability property (sum of squares = 1)p y p p y ( q )» Looks like doing computation on all 2n inputs simultaneously!

– Output is one result attained by measurement• If do this poorly, just like probabilistic computation:

If 2n inputs equally probable may be 2n outputs equally probable– If 2n inputs equally probable, may be 2n outputs equally probable.– After measure, like picked random input to classical function!– All interesting results have some form of “fourier transform” computation being

done in unitary transformation

4/28/2010 cs252-S10, Lecture 25 8

Page 3: Use Quantum Mechanics to Compute? - University of ...kubitron/courses/cs252...CS252 Graduate Computer Architecture Lt Lecture 25 Quantum Computing and Quantum CAD Design April 28th,

Shor’s Factoring AlgorithmTh S it f RSA P bli k t t • The Security of RSA Public-key cryptosystems depends on the difficulty of factoring a number N=pq(product of two primes)

Cl i l t : b ti l ti f t i– Classical computer: sub-exponential time factoring– Quantum computer: polynomial time factoring

• Shor’s Factoring Algorithm (for a quantum computer)) h d 1) Choose random x : 2 x N-1.

2) If gcd(x,N) 1, Bingo!3) Find smallest integer r : xr 1 (mod N)Hard

EasyEasy

3) Find smallest integer r : x 1 (mod N)4) If r is odd, GOTO 15) If r is even, a x r/2 (mod N) (a-1)(a+1) = kN6) If N 1( d N) GOTO 1

HardEasyEasy

6) If a N-1(mod N) GOTO 17) ELSE gcd(a ± 1,N) is a non trivial factor of N.

EasyEasy

4/28/2010 cs252-S10, Lecture 25 9

Finding r with xr 1 (mod N)

/\k /

\xk /\k /

\1 k

/ /k

/ /

/\

/\x r yw w1r

/ /xy

r yw0w

( ) \w1r ( ) /

\x0 1 k

0w w

QuantumFourier

Transform

• Finally: Perform measurementr0

r r1 kTransform

– Find out r with high probability– Get |y>|aw’> where y is of form k/r and w’ is related

4/28/2010 cs252-S10, Lecture 25 10

Quantum Computing Architectures• Why study quantum computing?y y q m mp g

– Interesting, says something about physics» Failure to build quantum mechanics wrong?

– Mathematical Exercise (perfectly good reason)M m E (p f y g )– Hope that it will be practical someday:

» Shor’s factoring, Grover’s search, Design of Materials» Quantum Co-processor included in your Laptop?Q p y p p

• To be practical, will need to hand quantum computer design off to classical designers

– Baring Adiabatic algorithms, will probably need 100s to 1000s Baring Adiabatic algorithms, will probably need 100s to 1000s (millions?) of working logical Qubits 1000s to millions of physical Qubits working together

– Current chips: ~1 billion transistors! f l f h• Large number of components is realm of architecture

– What are optimized structures of quantum algorithms when they are mapped to a physical substrate? O i i i ibl b h d– Optimization not possible by hand

» Abstraction of elements to design larger circuits» Lessons of last 30 years of VLSI design: USE CAD

Quantum Circuit Model

• Quantum Circuit model – graphical representation– Time Flows from left to right– Single Wires: persistent Qubits, Double Wires: classical bitsg p

• Qubit – coherent combination of 0 and 1: = |0 + |1– Universal gate set: Sufficient to form all unitary transformations

• Example: Syndrome Measurement (for 3-bit code)p y– Measurement (meter symbol)

produces classical bits• Quantum CAD

Ci i d li– Circuit expressed as netlist– Computer manpulated circuits

and implementations4/28/2010 12cs252-S10, Lecture 25

Page 4: Use Quantum Mechanics to Compute? - University of ...kubitron/courses/cs252...CS252 Graduate Computer Architecture Lt Lecture 25 Quantum Computing and Quantum CAD Design April 28th,

Quantum Error Correction Not Transversal!

Encoded/8 (T)Ancilla

SXT:

Correct

T TX

Correct

Cor

t

Cor

Cor

Hn-physical Qubitsper logical Qubit H

rrect

Correct

Correct

rrect

Correct

rrect

• Quantum State Fragile encode all Qubits– Uses many resources: e.g. 3-level [[7,1,3]]

code 343 physical Qubits/logical Qubit)!ill d h dl i (f l l l )

QECAncilla

CorrectErrors

Correct

Syndrome

Computation

• Still need to handle operations (fault-tolerantly)– Some set of gates are simply “transversal:”

• Perform identical gate between each physical bit of logical encoding– Others (like T gate for [[7 1 3]] code) cannot be handled transversallyOthers (like T gate for [[7,1,3]] code) cannot be handled transversally

• Can be performed fault-tolerantly by preparing appropriate ancilla• Finally, need to perform periodical error correction

– Correct after every(?): Gate, Long distance movement, Long Idle Period– Correction reducing entropy Consumes Ancilla bits

• Observation: 90% of QEC gates are used for ancilla production 70-85% of all gates are used for ancilla production

4/28/2010 13cs252-S10, Lecture 25

Outline

• Quantum Computing• Ion Trap Quantum Computing• Ion Trap Quantum Computing• Quantum Computer Aided Design

– Area-Delay to Correct Result (ADCR) metricy ( )– Comparison of error correction codes

• Quantum Data PathsQLA CQLA Qalypso– QLA, CQLA, Qalypso

– Ancilla factory and Teleportation Network Design• Error Correction Optimization (“Recorrection”)p ( )• Shor’s Factoring Circuit Layout and Design

4/28/2010 14cs252-S10, Lecture 25

MEMs-Based Ion Trap Devices• Ion Traps: One of the more promising quantum computer

implementation technologies – Built on SiliconBuilt on Silicon

• Can bootstrap the vast infrastructure that currently exists in the microchip industry

– Seems to be on a “Moore’s Law” like scaling curveg• 12 bits exist, 30 promised soon, …• Many researchers working on this problem

– Some optimistic researchers speculate about room temperatureSome optimistic researchers speculate about room temperature• Properties:

– Has a long-distance Wire• So called “ballistic movement”• So-called ballistic movement

– Seems to have relatively long decoherence times– Seems to have relatively low error rates for:

Memory Gates Movement• Memory, Gates, Movement

4/28/2010 15cs252-S10, Lecture 25

Q bit t i i ( B +)

Quantum Computing with Ion Traps

Electrode Control

• Qubits are atomic ions (e.g. Be+)– State is stored in hyperfine levels– Ions suspended in channels

between electrodesQubit Ions

between electrodes• Quantum gates performed by

lasers (either one or two bit ops)– Only at certain trap locationsy p– Ions move between laser sites to

perform gates• Classical control

G (l ) Gate LocationElectrodes

– Gate (laser) ops– Movement (electrode) ops

• Complex pulse sequences to cause Ions to migrate

Gate Location

cause Ions to migrate• Care must be taken to avoid

disturbing state• Demonstrations in the Lab

– NIST, MIT, Michigan, many others

Courtesy of Chuang group, MIT4/28/2010 16cs252-S10, Lecture 25

Page 5: Use Quantum Mechanics to Compute? - University of ...kubitron/courses/cs252...CS252 Graduate Computer Architecture Lt Lecture 25 Quantum Computing and Quantum CAD Design April 28th,

An Abstraction of Ion Traps• Basic block abstraction: Simplify Layoutp y y

in/out ports

E l ti f l t th h sim l tistraight 3-way 4-way turn gate locations

• Evaluation of layout through simulation– Movement of ions can be done classically– Yields Computation Time and Probability of Successp y

• Simple Error Model: Depolarizing Errors– Errors for every Gate Operation and Unit of Waiting– Ballistic Movement Error: Two error Models– Ballistic Movement Error: Two error Models

1. Every Hop/Turn has probability of error2. Only Accelerations cause error

4/28/2010 17cs252-S10, Lecture 25

Ion Trap Physical Layout

HH

q01

Time• Input: Gate level quantum circuit H

Hq1q2q3q4Q

ubits

circuit– Bit lines– 1-qubit gates

q5q6

– 2-qubit gates• Output:

– Layout of channels– Layout of channels– Gate locations– Initial locations of ions

q0q5

q6

1

q2

– Movement/gate schedule– Control for schedule

q3

q1

q4

4/28/2010 18cs252-S10, Lecture 25

Outline

• Quantum Computering• Ion Trap Quantum Computing• Ion Trap Quantum Computing• Quantum Computer Aided Design

– Area-Delay to Correct Result (ADCR) metricy ( )– Comparison of error correction codes

• Quantum Data PathsQLA CQLA Qalypso– QLA, CQLA, Qalypso

– Ancilla factory and Teleportation Network Design• Error Correction Optimization (“Recorrection”)p ( )• Shor’s Factoring Circuit Layout and Design

4/28/2010 19cs252-S10, Lecture 25

Vision of Quantum Circuit Design

Classical ControlSchematic Capture QEC Insertion Teleportation Networkm p u

(Graphical Entry)

OR

QEC InsertionPartitioning

LayoutNetwork Insertion

Error AnalysisError Analysis…

Optimization

CAD Tool

Custom Layout andScheduling

CAD ToolImplementation

Quantum Assembly(QASM)

4/28/2010 20cs252-S10, Lecture 25

Page 6: Use Quantum Mechanics to Compute? - University of ...kubitron/courses/cs252...CS252 Graduate Computer Architecture Lt Lecture 25 Quantum Computing and Quantum CAD Design April 28th,

Important Measurement Metrics• Traditional CAD Metrics:Traditional CAD Metrics:

– Area• What is the total area of a circuit?• Measured in macroblocks (ultimately m2 or similar)Measured n macroblocks (ult mately m or s m lar)

– Latency (Latencysingle)• What is the total latency to compute circuit once• Measured in seconds (or s)

– Probability of Success (Psuccess)• Not common metric for classical circuits• Account for occurrence of errors and error correction

Q Ci i M i ADCR • Quantum Circuit Metric: ADCR – Area-Delay to Correct Result: Probabilistic Area-Delay metric

ADCR = Area E(Latency) = singleLatencyArea– ADCR = Area E(Latency) =

– ADCRoptimal: Best ADCR over all configurations• Optimization potential: Equipotential designs

success

g

P

• Optimization potential: Equipotential designs– Trade Area for lower latency– Trade lower probability of success for lower latency

4/28/2010 21cs252-S10, Lecture 25

• First, generate a physical instance of circuitE d th i it i QEC d

How to evaluate a circuit?– Encode the circuit in one or more QEC codes– Partition and layout circuit: Highly dependant of layout heuristics!

• Create a physical layout and scheduling of bits• Yields area and communication costYields area and communication cost

Normal VectorMonte Carlo:

n timesMonte Carlo:single pass

• Then, evaluate probability of success– Technique that works well for depolarizing errors: Monte Carlo

• Possible error points: Operations, Idle Bits, Communications– Vectorized Monte Carlo: n experiments with one pass– Need to perform hybrid error analysis for larger circuits

• Smaller modules evaluated via vector Monte Carlo• Teleportation infrastructure evaluated via fidelity of EPR bitsTeleportation infrastructure evaluated via fidelity of EPR bits

• Finally – Compute ADCR for particular result– Repeat as necessary by varying parameters to generate ADCRoptimal

4/28/2010 22cs252-S10, Lecture 25

Quantum CAD flowQEC Insert

CircuitSynthesis

Input CircuitFault-Tolerant

Circuit

QEC OptimizationFa

Tole

ReSynthesis (ADCRoptimal)Circuit

(No layout)

aulterant

CircuitPartitioning

Mapping,Scheduling,

Classical control

CommunicationEstimation

TeleportationNetworkInsertiong P Fg Classical controlInsertion

OutpueM

apping

PartitioneCircuit

FunctionaSystem

Hybrid FaultAnalysis

ut LayoutP Complete Layout

Re

Error AnalysisMost Vulnerable Circuits

ed

al

tsuccess

Most Vulnerable rcu ts

ADCR computation

4/28/2010 23cs252-S10, Lecture 25

Example Place and Route Heuristic:Collapsed Dataflowp

• Gate locations placed in dataflow order– Qubits flow left to right– Initial dataflow geometry folded and sortedg y– Channels routed to reflect dataflow edges

• Too many gate locations, collapse dataflow– Using scheduler feedback, identify latency critical edgesg , y y g– Merge critical node pairs– Reroute channels

• Dataflow mapping allows pipelining of computation!pp g p p g p

q0q0q0 q0q1q2

q0q1q2

q0q1

2 q2q3

q2q3q2q3

4/28/2010 24cs252-S10, Lecture 25

Page 7: Use Quantum Mechanics to Compute? - University of ...kubitron/courses/cs252...CS252 Graduate Computer Architecture Lt Lecture 25 Quantum Computing and Quantum CAD Design April 28th,

• Possible to perform a comparison between codesComparing Different QEC Codes

p p– Pick circuit/Run through CAD flow– Result depends on goodness of layout and scheduling heuristic

• Layout for CNOT gate (Compare with Cross, et. al)

ate

y g ( p , )– Using Dataflow Heuristic

• Validated with Donath’s wire-length estimator (classical CAD)

Failu

re R

a(classical CAD)– Fully account of movement– Local gate model

• Failure Probability results

Logica

l FFailure Probability results– Best:[[23,1,7]] (Golay),

[[25,1,5]] (Bacon-Shor), [[7,1,3]] (Steane)

Movement Error

– Steane does particularlywell with high movement errors

• Simplicity particularly important in regimeimportant in regime

• More info in Mark Whitney thesis– http://qarc.cs.berkeley.edu/publications

4/28/2010 25cs252-S10, Lecture 25

Outline

• Quantum Computing• Ion Trap Quantum Computing• Ion Trap Quantum Computing• Quantum Computer Aided Design

– Area-Delay to Correct Result (ADCR) metricy ( )– Comparison of error correction codes

• Quantum Data PathsQLA CQLA Qalypso– QLA, CQLA, Qalypso

– Ancilla factory and Teleportation Network Design• Error Correction Optimization (“Recorrection”)p ( )• Shor’s Factoring Circuit Layout and Design

4/28/2010 26cs252-S10, Lecture 25

Quantum Logic Array (QLA)Anc

Comp

Anc

Comp

Anc

Comp

TP EPR EPREPR

EPR EPR

TP TP

EPR

Cor

n-physicalQubits

Syndrome

AncillaFactory

Correct

Anc

Comp

Anc

Comp

Anc

CompEPREPR

EPR EPREPR TPTP TP

EPR

rrectC

orrect

1 or 2-Qubit

EPR

Anc

Comp

Anc

Comp

Anc

CompEPREPR

TPTP TP

EPR

o QubGate (logical)

Storage for2 Logical Qubits

(In-Place)

TeleporterNODEEPR EPR

( )• Basic Unit:

– Two-Qubit cell (logical)– Storage, Compute, Correction

h l

EPR

• Connect Units with Teleporters– Probably in mesh topology, but

details never entirely clear from original papersFi st S i s (L s l ) O i ti (2005)• First Serious (Large-scale) Organization (2005)– Tzvetan S. Metodi, Darshan Thaker,

Andrew W. Cross, Frederic T. Chong, and Isaac L. Chuang4/28/2010 27cs252-S10, Lecture 25

DetailsWh R ul A ?• Why Regular Array?– Distribute Ancilla generation where it is needed– Single 2-Qubit storage cell quite large

d ld h • Concatenated [[7,1,3]] could have 343 or more physical Qubits/ logical Qubit

– Size of single logical Qubit makes sense to teleport between large logical blocksmakes sense to teleport between large logical blocks

– Regularity easier to exploit for CAD tools!• Same reason we have ASICs with regular routing channels

A ti• Assumptions:– Rate of ancilla consumption constant for every Qubit – Ratio of one Teleporter for every two Qubit gate is optimal– (Implicit) Error correction after every move or gate is optimal– Parallelism of quantum circuits can exploit computation on every

Qubit in the system at same time• Are these assumptions valid???

4/28/2010 28cs252-S10, Lecture 25

Page 8: Use Quantum Mechanics to Compute? - University of ...kubitron/courses/cs252...CS252 Graduate Computer Architecture Lt Lecture 25 Quantum Computing and Quantum CAD Design April 28th,

Running Circuit at “Speed of Data”• Often, Ancilla qubits are independent of dataq p

– Preparation may be pulled offline– Very clear Area/Delay tradeoff:

• Suggests Automatic Tradeoffs (CAD Tool)gg m ff ( )• Ancilla qubits should be ready “just in time”

to avoid ancilla decoherence from idleness

Hardware Devoted to Parallel Ancilla Generation

H CX

TQEC QEC QECT-AncillaQ0

QEC

QECAncilla

QEC

QECAncilla

QEC

QECAncilla

Parallel Circuit Latency

X HT QEC QEC QECT-AncillaQ1 QECAncilla

QECAncilla

QECAncilla

Serial Circuit Latency4/28/2010 29cs252-S10, Lecture 25

How much Ancilla Bandwidth Needed?

• 32-bit Quantum Carry-Lookahead Adder32 bit Quantum Carry Lookahead Adder– Ancilla use very uneven (zero and T ancilla)– Performance is flat at high end of ancilla generation bandwidth

• Can back off 10% in maximum performance an save orders of magnitude in ancilla generation areamagnitude in ancilla generation area

• Many bits idle at any one time– Need only enough ancilla to maintain state for these bits– Many not need to frequently correct idle errorsMany not need to frequently correct idle errors

• Conclusion: makes sense to compute ancilla requirements and share area devoted to ancilla generation

4/28/2010 30cs252-S10, Lecture 25

Ancilla Factory Design I“ l ll • “In-place” ancilla preparation

0 Prep

Cat PrepVerify

?Bit

Correct0 Prep

Cat Prep

0 Prep

Verify

Verify

?

?

Correct

PhaseCorrect

Encoded Ancilla Verification Qubits

A ill f t i t f f th

Cat PrepVerify

• Ancilla factory consists of many of these– Encoded ancilla prepared

in many places then In-placeP

In-placePin many places, then

moved to output port– Movement is costly!

Prep Prep

In-placePrep

In-placePrep

4/28/2010 31cs252-S10, Lecture 25

Ancilla Factory Design IIl d ll k • Pipelined ancilla preparation: break into stages

– Steady stream of encoded ancillae at output portFully laid out and scheduled to get area and – Fully laid out and scheduled to get area and bandwidth estimates

Physical CNOTs Verif X/Zs aey0 Prep

Cat Prep

sbar

sbar

Correct

sbar

ical

Qub

its

ded

Anc

illa

Cro

ss

CNOTs

Cro

ss

Physical

Cro

ss

X/ZJunk

Phy

s

ood

Enc

od

Cat Prep VerifPhysical0 Prep

X/ZCorrect

J

Go

R l t t t bit d f ilRecycle cat state qubits and failures

Recycle used correction qubits4/28/2010 32cs252-S10, Lecture 25

Page 9: Use Quantum Mechanics to Compute? - University of ...kubitron/courses/cs252...CS252 Graduate Computer Architecture Lt Lecture 25 Quantum Computing and Quantum CAD Design April 28th,

The Qalypso Datapath ArchitectureD s d t i• Dense data region– Data qubits only– Local communication

• Shared Ancilla Factories– Distributed to data as needed

Full multipl x d t ll d t– Fully multiplexed to all data– Output ports ( ): close to data– Input ports ( ): may be far from

d ( l d i l )data (recycled state irrelevant)• Regions connected by teleportation networks

R R R

4/28/2010 33cs252-S10, Lecture 25

Tiled Quantum DatapathsA AA AncAnc Anc Anc AncAnc

Comp

Anc

Comp

Anc

Comp

AncAncAnc

EPR EPREPR

EPR EPR

TP TP

Anc

Mem

Anc

Mem

AncAncAnc

Anc

Mem

TP TPEPR EPREPR

EPR EPR

TPAnc

Anc

Mem

Anc

Anc

Mem

TP EPREPR

EPR

EPR

Anc

Anc

Comp

Anc

Comp

Anc

Comp

AncAnc

EPREPR

EPR EPREPR TPTP

Anc

Comp

Anc

Comp

Anc

Comp

Anc Anc AncTPTP

EPREPR

EPR EPREPR

CompAncComp

AncTP

EPR

EPREPR

Anc

CompCompCompEPREPR

Previous: QLA, LQLA

Mem Mem MemEPREPR

Previous: CQLA, CQLA+

MemEPR

Comp

Our Group: Qalypso

• Several Different Datapaths mappable by our CAD flow– Variations include hand-tuned Ancilla generators/factories

• Memory: storage for state that doesn’t move muchm y g f m m– Less/different requirements for Ancilla– Original CQLA paper used different QEC encoding

• Automatic mapping must:P titi i it t d i– Partition circuit among compute and memory regions

– Allocate Ancilla resources to match demand (at knee of curve)– Configure and insert teleportation network

4/28/2010 34cs252-S10, Lecture 25

Which Datapath is Best?• Random Circuit Generation• Random Circuit Generation

– f(Gate Count, Gate Types, Qubit Count, Splitting factor)– Splitting factor (r): measures connectivity of the circuit

• Example: 0 5 splits Qubits in half adds random gates between Example: 0.5 splits Qubits in half, adds random gates between two halves, then recursively splits results

• Closely related to Rent’s parameter• Qalypso clear winner (for all r)Q yp ( )

– 4x lower latency than LQLA– 2x smaller area than CQLA+

• Why Qalypso does well:y Q yp– Shared, matched ancilla generation– Automatic network sizing (not one

Teleporter for every two Qubits) A t ti Id tifi ti f– Automatic Identification ofIdle Qubits (memory)

• LQLA and CQLA+ perform close second– Original datapaths supplemented with better ancilla generators – Original datapaths supplemented with better ancilla generators,

automatic network sizing, and Idle Qubit identification– Original QLA and CQLA do very poorly for large circuits

4/28/2010 35cs252-S10, Lecture 25

How to design Teleportation Networkp

Incoming Classical Information

StorageCC

EPR St

Y Teleporters(Unique ID, Dest, Correction Info)

Stor ra

ge

CC

CC

EPR Stream X Teleporters

rage

Storage

Sto

r

CCOutgoing Message

• What is the architecture of the network?I l di T l R d i EPR G – Including Topology, Router design, EPR Generators, etc..

• What are the details of EPR distribution?• What are the practical aspects of routing?p p g

– When do we set up a channel?– What path does the channel take?

4/28/2010 36cs252-S10, Lecture 25

Page 10: Use Quantum Mechanics to Compute? - University of ...kubitron/courses/cs252...CS252 Graduate Computer Architecture Lt Lecture 25 Quantum Computing and Quantum CAD Design April 28th,

Basic Idea: Chained Teleportation

TeleportationTeleportation

p

G TT G TGT G TGT

PP Adjacent T nodes linked for teleportation

• Positive Features– Regularity (can build classical network topologies)g y p g– T node linking not on critical path– Pre-purification part of link setup

• Fidelity amplification of the liney p– Allows continuous stream of EPR correlations to be established

for use when necessary4/28/2010 37cs252-S10, Lecture 25

1600

Pre-Purification

r

1200

1400

1600

Purify at End Only

Pre-Purify Once Pairs

Per

cation T

600

800

1000 Pre-Purify Twice

ance

EPR

Co

mmun

ic

G

200

400

600

ong-

Dista

Dat

a C

T0

1.0E-09 1.0E-08 1.0E-07 1.0E-06 1.0E-05Error Rate Per Operation

Lo T

• Experiment: Transmit enough EPR pairs over network to meet required fidelity of channel– Measure total global trafficg– Higher Fidelity local EPR pairs less global EPR traffic

• Benefit: decreased congestion at T Nodes4/28/2010 38cs252-S10, Lecture 25

Building a Mesh InterconnectT T T T

P P P P

G

G

G

G

G

G G

T T T T

P P P PG G G G

G G G

Gate Gate Gate Gate

T T T T

P P P P

G G G

• Grid of T nodes linked by G nodes

Gate Gate Gate Gate

• Grid of T nodes• Packet-switched network

- Options: Dimension-Order or Adaptive RoutingP d d d i f

, linked by G nodes

- Precomputed or on-demand start time for setup• Each EPR qubit has associated classical message

4/28/2010 39cs252-S10, Lecture 25

Outline

• Quantum Computing• Ion Trap Quantum Computing• Ion Trap Quantum Computing• Quantum Computer Aided Design

– Area-Delay to Correct Result (ADCR) metricy ( )– Comparison of error correction codes

• Quantum Data PathsQLA CQLA Qalypso– QLA, CQLA, Qalypso

– Ancilla factory and Teleportation Network Design• Error Correction Optimization (“Recorrection”)p ( )• Shor’s Factoring Circuit Layout and Design

4/28/2010 40cs252-S10, Lecture 25

Page 11: Use Quantum Mechanics to Compute? - University of ...kubitron/courses/cs252...CS252 Graduate Computer Architecture Lt Lecture 25 Quantum Computing and Quantum CAD Design April 28th,

Reducing QEC Overhead

CorrectCorrect

Correct

HH Correct Correct Correct HH Correct

• Standard idea: correct after every gate and long

Correct

Standard idea: correct after every gate, and long communication, and long idle time– This is the easiest for people to analyze– Urban Legend? Must do in order to keep circuit fault tolerant!Urban Legend? Must do in order to keep circuit fault tolerant!

• This technique is suboptimal (at least in some domains)– Not every bit has same noise level!

• Different idea: identify critical Qubits• Different idea: identify critical Qubits– Try to identify paths that feed into noisiest output bits– Place correction along these paths to reduce maximum noise

4/28/2010 41cs252-S10, Lecture 25

Simple Error Propagation ModelError Distance

1

1

2

(EDist) LabelsMaximum EDist

propagation:

4

3

1 2

1

1

2

3

3 2

1

14=max(3,1)+1

H Correct

3

4

1 3

1

1

2

Correct

• EDist model of error propagation: – Inputs start with EDist = 0– Each gate propagates max input EDist to outputs g p p g m p p– Gates add 1 unit of EDist, Correction resets EDist to 1

• Maximum EDist corresponds to Critical Path– Back track critical paths that add to Maximum EDistp

• Add correction to keep EDist below critical threshold– Example: Added correction to keep EDistMAX 2

4/28/2010 42cs252-S10, Lecture 25

QEC OptimizationEDistMAXEDistMAXiteration

QECOptimization

EDistMAX

Partitioningand

Layout

FaultAnalysisInput

Circuit

• Modified version of retiming algorithm: called

EDistMAX Layout

OptimizedLayout

Circuit

ret m ng algor thm called “recorrection:”– Find minimal placement

of correction operations that meets specified

(E ) E

1024-bit QRCA and QCLA adders

p fMAX(EDist) EDistMAX

• Probably of success notalways reduced for EDistMAX > 1EDistMAX > 1– But, operation count and

area drastically reduced• Use Actual Layouts and

Fault AnalysisFault Analysis– Optimization pre-layout,

evaluated post-layout4/28/2010 43cs252-S10, Lecture 25

Recorrection in presence of different QEC codes

f Su

cces

s

Suc

cess

babi

lity

of

abili

ty o

f

Prob

Move Error Rate per Macroblock

Prob

a

Idle Error Rate per CNOT Time

• 500 Gate Random Circuit (r=0.5)• Not all codes do equally well with Recorrection

pEDistMAX=3

pEDistMAX=3

Not all codes do equally well with Recorrection– Both [[23,1,7]] and [[7,1,3]] reasonable candidates– [[25,1,5]] doesn’t seem to do as well

• Cost of communication and Idle errors is clear here!Cost of commun cat on and Idle errors s clear here!• However – real optimization situation would vary EDist

to find optimal point4/28/2010 44cs252-S10, Lecture 25

Page 12: Use Quantum Mechanics to Compute? - University of ...kubitron/courses/cs252...CS252 Graduate Computer Architecture Lt Lecture 25 Quantum Computing and Quantum CAD Design April 28th,

Outline

• Quantum Computing• Ion Trap Quantum Computing• Ion Trap Quantum Computing• Quantum Computer Aided Design

– Area-Delay to Correct Result (ADCR) metricy ( )– Comparison of error correction codes

• Quantum Data PathsQLA CQLA Qalypso– QLA, CQLA, Qalypso

– Ancilla factory and Teleportation Network Design• Error Correction Optimization (“Recorrection”)p ( )• Shor’s Factoring Circuit Layout and Design

4/28/2010 45cs252-S10, Lecture 25

Comparison of 1024-bit addersADCRoptimal for ADCRoptimal for optimal1024-bit QCLA

optimal1024-bit QRCA and QCLA

• 1024-bit Quantum Adder Architectures– Ripple-Carry (QRCA)Ripple Carry (QRCA)– Carry-Lookahead (QCLA)

• Carry-Lookahead is better in all architecturesy• QEC Optimization improves ADCR by order of

magnitude in some circuit configurations4/28/2010 46cs252-S10, Lecture 25

Area Breakdown for Adders

• Error Correction is not predominant use of area– Only 20-40% of area devoted to QEC ancilla– For Optimized Qalypso QCLA, 70% of operations for QEC ancilla

ti b t l b t 20% f p yp p

generation, but only about 20% of area• T-Ancilla generation is major component

– Often overlooked• Networking is significant portion of area when allowed to • Networking is significant portion of area when allowed to

optimize for ADCR (30%)– CQLA and QLA variants didn’t really allow for much flexibility

4/28/2010 47cs252-S10, Lecture 25

Investigating 1024-bit Shor’s

• Full Layout of all Elements– Use of 1024-bit Quantum Adders– Optimized error correction– Ancilla optimization and Custom Network LayoutAncilla optimization and Custom Network Layout

• Statistics:– Unoptimized version: 1.351015 operations

O ti i d V i 1000X ll– Optimized Version 1000X smaller– QFT is only 1% of total execution time

4/28/2010 48cs252-S10, Lecture 25

Page 13: Use Quantum Mechanics to Compute? - University of ...kubitron/courses/cs252...CS252 Graduate Computer Architecture Lt Lecture 25 Quantum Computing and Quantum CAD Design April 28th,

1024-bit Shor’s Continued

• Circuits too big to compute PsuccessWorking on this problem– Working on this problem

• Fastest Circuit: 6108 seconds ~ 19 years– Speedup by classically computing recursive squares?

• Smallest Circuit: 7659 mm2Smallest Circuit: 7659 mm– Compare to previous estimate of 0.9 m2 = 9105 mm2

4/28/2010 49cs252-S10, Lecture 25

ConclusionQ t C t A hit t• Quantum Computer Architecture:– Considering details of Quantum Computer systems at larger

scale (1000s or millions of components)• Argued that CAD tools may have a place in Quantum • Argued that CAD tools may have a place in Quantum

Computing Research– Presented Some details of a Full CAD flow (Partitioning, Layout,

Simulation, Error Analysis)Simulation, Error Analysis)– New Evaluation Metric: ADCR = Area E(Latency)– Full mapping and layout accounts for communication cost

• “Recorrection” Optimization for QECecorrect on Opt m zat on for QE– Simplistic model (EDist) to place correction blocks– Validation with full layout– Can improve ADCR by factors of 10 or morep y

• Improves latency and area significantly, can improve probability under some circumstances as well

• Full analysis of Adder architectures and 1024-bit Shor’sS ill l ( d bi ) b ll h i i– Still too long (and too big), but smaller than previous estimates

– Total circuit size still too big for our error analysis – but have hope that we can improve this

4/28/2010 50cs252-S10, Lecture 25