26
2020 Symposia on VLSI Technology and Circuits A Probabilistic Self-Annealing Compute Fabric Based on 560 Hexagonally Coupled Ring Oscillators for Solving Combinatorial Optimization Problems Ibrahim Ahmed 1 , Po-Wei Chiu 1 , and Chris H. Kim 1 1 University of Minnesota

A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

A Probabilistic Self-Annealing Compute Fabric Based on 560 Hexagonally Coupled

Ring Oscillators for Solving Combinatorial Optimization Problems

Ibrahim Ahmed1, Po-Wei Chiu1, and Chris H. Kim1

1 University of Minnesota

Page 2: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Outline• Motivation and Prior Art• Circuit and Architecture Design• Mapping Graph Problems to Hardware• Measured Results• Conclusion

Slide 1CA3.3

Page 3: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Outline• Motivation and Prior Art• Circuit and Architecture Design• Mapping Graph Problems to Hardware• Measured Results• Conclusion

Slide 2CA3.3

Page 4: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Motivation

• Solution space and time increases rapidly with the number of variables

• Intractable to solve using traditional von Neumann computer

Slide 3CA3.3

0 200 400 600 800 1000Number of variables, n

10010201040106010801010010120

Solu

tion

time,

T

COP: Boolean satisfiability

𝑻 = 𝟏. 𝟑𝟎𝟕𝒏

R. Paturi, JACM, 2005 (UCSD)

Page 5: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Motivation

Slide 4CA3.3

• Ising computation uses energy transfer between spins to reach global minima• A network of coupled oscillators can solve many NP-hard and NP-complete

problems, such as COPs

𝐻(𝑠) = −'𝐽𝑖𝑗 ⋅ 𝑠𝑖𝑠𝑗𝑖,𝑗

−'ℎ𝑖𝑠𝑖𝑖

𝐽89 𝐽78

𝐽56 𝐽45

𝐽23 𝐽12 𝐽34 𝐽14

𝐽69 𝐽58 𝐽47 Local minima

Global minimaH

amilt

onia

n H

(s)

Spin states

𝐽25

COP mapped to graph Network of Spin𝐽!" = Coupling between spins 𝑖 and 𝑗𝑠! = State of spin 𝑖, ℎ! = Local field to spin 𝑖𝐻(𝑠) = Hamiltonian of the system

A. Lucas, Front. Phys., 2014

Page 6: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Prior Art

Slide 5CA3.3

Volume= 700 ft3

D-Wave

Quantum computer

GooglePros: Can theoretically solve a wide variety or problemsCons:1. Requires sophisticated hardware2. Very sensitive to noise3. Operating temperature -273.14°C

Power =25KW

Nano-magnet based

Special process

D. Pierangeli, PRL, 2019

P. Debashis, IEDM, 2016

LASER based

Pros: Can achieve coupling dynamicsCons:1. Requires special process with

fabrication challenges2. Limited practical demonstrations

Page 7: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Prior Art

Slide 6CA3.3

M. Yamaoka, JSSC, 2016T. Takemoto, ISSCC, 2019

Pros: Can theoretically solve a wide variety or problemsCons:1. No coupling dynamics : computation

time and energy may be higher2. Search for better solution is time

dependent

ASIC implementation Breadboard and PCB implementation

T. Wang, arXiv, 2017

T. Wang, arXiv, 2019

Pros: Can achieve coupling dynamicsCons:Not practical for large number of programmable spins

CMOS integrated Ising computer has the best of both worlds

Page 8: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Outline• Motivation and Prior Art• Circuit and Architecture Design• Mapping Graph Problems to Hardware• Measured Results• Conclusion

Slide 7CA3.3

Page 9: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Choice of CMOS Spin

Slide 8

LC oscillator:• Pros: low phase noise, frequency

insensitive to PVT• Cons: very large area, narrow

frequency tuning range

Ring oscillator (ROSC):• Pros: small area, simple design,

wide frequency tuning range• Cons: higher phase noise, higher

sensitivity to PVT

Type of oscillator not important for solving Ising problems (e.g. mechanical, electrical, chemical, etc.)

464 µm

Inductor layout

RO

SC la

yout

448

µm

L

RC+-

LC oscillator Ring oscillator

CA3.3

Page 10: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits Slide 9CA3.3

ROSC Coupled Using Digital Latches

1 2 3 4

1 2 3 4

EN1

EN2

Phase comparator

1 2 3 4

1 2 3 4

EN1

EN2

Phase comparator

a) Positive coupling b) Negative coupling

V1

V2

V1

V2 V1V2

V1V2

Negative coupling

Positive coupling

c) Phase comparator

Phase,𝜙1

Phase,𝜙2

𝑠 = 0

𝑠 = 1

ROSC1 ROSC1

ROSC2 ROSC2

Phase,𝜙1

Phase,𝜙2

• Any coupling medium that enables energy transfer may couple ROSCs• ROSC and digital latches are designed with global and local enable signals

Page 11: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Choice of Architecture

Slide 10CA3.3

s2 = 0s3 = 0s4 = 1

s559 = 0s560 = 1

s1 = 1

Hexagonally coupled 28x20 ROSC array

Latch-based coupling

i-th cell and neighbors

Osc. 1Osc. 2Osc. 3Osc. 4

Osc. 560

Osc. 559

Osc. 5 s5 = 1

ni = number of coupled neighbors of i-th cell, Jij = coupling weight between cells i and j

ROSC waveform

Cell states (2560≈3.8×10168 combinations)

Isin

g H

amito

nian

(Esy

stem

)

Local minima

Global minima

Cell state𝐸𝑠𝑦𝑠𝑡𝑒𝑚 = −))𝐽𝑖𝑗 ∙ 𝑠𝑖 ∙ 𝑠𝑗

𝑛𝑖

𝑗=1

𝑁

𝑖=1

... ...

Proposed architecture ROSC waveform and unit cell states System energy of the Ising computer

• Hexagonal unit cell maximizes the number of neighbors in 2D plane• Latch based coupling between cells is digitally controlled

Page 12: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

ROSC and Latch Design

Slide 11

NAND gate

123

5 6

4

7

CLK

Ctrl

Ctrl

Ctrl

7

InverterSynchronizing Clock signal

G_REN

L_REN

Seven stage Ring oscillator

ROSC:• Seven stage ROSC• Global and local enable signals,

G_REN and L_REN, respectively, controls ROSC activation

• Measured frequency: 120MHz

Digital Latch:• Designed with back to back inverters• Global and local enable signals, G_LEN

and L_LEN, respectively, controls latch activation

Digital latch

IN2Va

Vb IN1

Va

Vb

Vg1

Vg2

Vg1

Vg2Va

Vg1Vg2

Vb

Latch Control signals

G_LEN

L_LEN

Va

G_LEN

CA3.3

Page 13: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Modular Unit Cell: Circuit Blocks

Slide 12CA3.3

G_REN

L_REN

CLK

Ai<0>

Ai<4>

Aj<4> 6

2

3 2

D Q

c) Read block

Ai<4>

Qij

a) ROSC block

Aj<0>G_LEN

L_LEN<1:3>

3

b) Latch coupling block

Ai<0>

j ≠ i

D Q D Q D Q D Q

d) Scan block

L_LEN<1:3>L_REN

SCLK_ISCLK_O

DATA_I DATA_O

• Read block samples one of the neighboring ROSC• Scan block programs the graph: four program bit per cell, 2240 bit for the chip

Page 14: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Timing Based Annealing Mechanism

Slide 13CA3.3

Time (µs)

RO

SC p

erio

d (n

s)

0.0 0.2 0.4 0.6 0.8 1.0 1.223

4

5

67

Stable states Stable states Stable states

G_LEN

• Annealing helps the chip to find a better local minima• We periodically turn off the coupling latches and let spin states to change• To reduce delay, we only annealed the chip three times for measurement

65nm CMOS, VDD 1.0V, 25°C

Page 15: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Die Photo and Chip Summary

Slide 14

Application

Process

Architecture

Voltage

Peak powerPower per cell

Combinatorial optimization problems

65nm CMOSROSC, latch based

coupling, self-annealing1.0V

Core: 0.53mm2

23mW41µW

AreaChip: 1.44mm2

Unit cell: 0.00095mm2

28x20 hexagonally coupled ring oscillator

array

On-chip CLK

760µm

701µ

m

Read scans

28x20 array

Peripheral circuit

Die photo Full chip layout Chip summary1.2mm

1.2m

m

• 28x20=560 coupled oscillators (only limiting factor is chip area)

CA3.3

Page 16: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Outline• Motivation and Prior Art• Circuit and Architecture Design• Mapping Graph Problems to Hardware• Measured Results• Conclusion

Slide 15CA3.3

Page 17: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Mapping Graph Problems to Hardware

Slide 16

State 0State 1Edge

b) “Difficult” COP (measured)a) “Easy” COP (measured)

• Randomly generate spatial graph using an algorithm that can be directly mapped to the chip without complicated embedding algorithm

• “Easy” COP: 2-color graphs such as checker board pattern• “Difficult” COP: 4 or more color graphs

CA3.3

Page 18: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Introduction to Max Cut Problem

Slide 17

σ1 1

1

2

23

12

2

2

1

3

σ6

σ5 σ2

σ3 σ4

• The problem of finding a maximum cut in a graph is known as the Max-Cut Problem

• Finding max-cut of a graph is an NP-hard problem

1

1

2

2 3

12

2

2

1

3

1

1

2

2 3

12

2

2

1

3

1

1

2

2 3

12

2

2

1

3

1

1

2

2 3

12

2

2

1

3

Cut size = 7 Cut size = 9

Cut size = 12 Cut size = 16 (max)

CA3.3

Page 19: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Introduction to Max Cut Problem

Slide 18CA3.3

σ1 1

1

2

23

12

2

2

1

3

σ6

σ5 σ2

σ3 σ4 𝐻 = Hamiltonian of the system𝜎! = Spin status of magnet 𝑖 {+1 or -1}𝑤!" = weight between magnets 𝑖 and 𝑗

𝐻 𝜎 = −+",$

(−𝑤"$) 𝜎"𝜎$

= +%"&& '()*+

(−𝑤"$) + +,-./ '()*+

𝑤"$

= +%"&& '()*+

(−𝑤"$) + +",$

𝑤"$ − +%"&& '()*+

𝑤"$

=+-00

𝑤"$ − 2× +%"&& '()*+

𝑤"$

Cut size

• Ising Hamiltonian = [sum of all weights] – 2×[cut size]

Page 20: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Outline• Motivation and Prior Art• Circuit and Architecture Design• Mapping Graph Problems to Hardware• Measured Results• Conclusion

Slide 19CA3.3

Page 21: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Repeated Measurement of Same Graph

Slide 20CA3.3

Rows (1-26)

Col

umns

(1-1

8)

0.00.20.40.60.8EdgeVertex

Graph mapped to chip

Probability of cell state flip (100 iterations)

1.0

b)a)

0 20 40 60 80 1000

0.20.40.60.81.0

Iteration

Max

-cut

(nor

mal

ized

)

c)

Measured solution for same graph

Hamming distance (between iterations)0 0.2 0.4 0.6 0.8 1

05

10152025

Prob

abili

ty (%

)

d)

• Unit cells may change states between iterations while the final cut values remain similar : proving chip is exploring various local minima

Page 22: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Max-Cut Results for Graph Size 15x15

Slide 21

150 different max-cut problems1M random solutions per problem

0 0.2 0.4 0.6 0.8 1.00

10

20

30

40

Prob

abili

ty (%

)

Normalized cut-value

Graph size: 15x15Search space: 215x15≈5.4x1067

• 150 difficult COPs are mapped and max-cut results are measured• Annealing is done only three times as compared to thousands of times in quantum computers

Page 23: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Max-Cut Results for Different Graph Sizes

Slide 22CA3.3

0 0.2 0.4 0.6 0.8 10

10

20

30

40150 max-cut problems1M random solutions per problem

0 0.2 0.4 0.6 0.8 10

10

20

30

40

0 0.2 0.4 0.6 0.8 10

10

20

30

40

0 0.2 0.4 0.6 0.8 10

10

20

30

40

Prob

abili

ty (%

)Pr

obab

ility

(%)

Prob

abili

ty (%

)Pr

obab

ility

(%)

Normalized cut-value Normalized cut-value

Normalized cut-value Normalized cut-value

Graph size: 6x6Search space: 26x6≈6.9x1010

Graph size: 15x15Search space: 215x15≈5.4x1067

Graph size: 10x10Search space: 210x10≈1.3x1030

Graph size: 26x18Search space: 226x18≈7.6x10140

• Consistently better max-cut solution compared to 1 million random search

Page 24: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Comparison with Prior Arts

Slide 23

PRL ’19 [1] IEDM ’19 [2] ISSCC ’19 [3] ISSCC ’20 [4] Quantum computer [5] This work

Architecture All to all All to all Near neighbor Near neighbor Near neighbor Near neighbor

Technology Photonics IMT devices CMOS 40nm CMOS 65nm Superconductor CMOS 65nm

Coupling Light modulation

IMT interaction Digital logic Digital logic Qubit

interaction ROSC interaction

Operating temperature

Room temperature

Room temperature

Room temperature

Room temperature -273.14°C Room

temperature

Total measured solution 16 1 10 1 Many 1000

Delay 1000 cycles 1ms 22µs 30 cycles - 1µs - 10µs (est.)

Peak power Not reported Not reported Not reported Not reported 25KW 23mW

Measured accuracy

>95% (87% cases)

97.6% (4 spins) 98.8% 100%

(Easy COP) - 98%-100% (Easy)82%-100% (Diff.)

CA3.3

[1] D. Pierangeli, PRL, 2019 [2] S. Dutta, IEDM, 2019 [3] T. Takemoto, ISSCC, 2019

[4] Y. Su, ISSCC, 2020 [5] Z. Bian, D-Wave Systems, 2010

Page 25: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Outline• Motivation and Prior Art• Circuit and Architecture Design• Mapping Graph Problems to Hardware• Measured Results• Conclusion

Slide 24CA3.3

Page 26: A Probabilistic Self-Annealing Compute Fabric Based on 560 ...people.ece.umn.edu/groups/VLSIresearch/papers/2020/... · Peak power Power per cell Combinatorial optimization problems

2020 Symposia on VLSI Technology and Circuits

Conclusion• First hardware demonstration of a true coupling based

integrated CMOS Ising computer

• Probabilistic exploration of various local minima

• Mapped and solved 1000 COPs in the chip with an accuracy

of 82%-100%

Slide 25CA3.3