Using magnetic tunnel junctions to compute like the brain...Stochastic computing with superparamagnetic tunnel junctions 27 Neural network Stochastic bit streams AND and OR gates,

· PML · NDCD · Alternative Computing Group 1

Using magnetic tunnel junctions to compute like the brainMark Stiles with help from a cast of dozens

Why?

To understand how the brain works

To develop machine consciousness

To do cognitive computing more efficiently

· PML · NDCD · Alternative Computing Group

Hardware for Artificial Intelligence – NIST Gaithersburg

2

Me Vivek Amin

Advait Madhavan Mark

Anders

Philippe Talatchian

Jonathan Goodwill

Matthew Daniels Brian

Hoskins

Siyuan Huang

Alice Mizrahi

Nikolai Zhitenev

Jabez McClelland


Computers are designed to solve numerical problems, the brain excels at categorical problems

3

Natural language processing

Image: cmu.edu

Satellite trajectories

Image: wikimedia.org

Digital Computer Neuromorphic System

High precision numerical Categorical


Computers separate long term memory from processing,

while in the brain, they are intimately connected

4

Neurons (1011)

Synapses (1015)

Small scale structure of the brain:

Memory Unit

Central Processing Unit

Control Unit

Logic Unit

Input

Output

von Neumann architecture:


Memory-Processor Bottleneck Collocation of processor and memory



Computers use a rigid encoding scheme,

the brain prioritizes resiliency with more flexible schemes

5

Neuronal spiking

timeImage: Wikipedia

Clocked processing

Image: Wikipedia


Discrete, Deterministic, & Synchronous Analog, Stochastic, & Asynchronous

Memory-Processor Bottleneck Collocation of processor and memory


Rossant et al., Frontiers in Neuroscience 5, 9 (2011)

Neuro

n I

ndex

Binary Representation

64 32 16 8 4 2 1

1 0 0 1 1 0 1

1 s


Neural networks provide the simplest implementation of brain-like cognitive computing (rate coding)

6

inputs

idealoutputs

information flow

● Neurons output a rate representing a spike train.

● The rate from each neuron is multiplied by synaptic weights for each neuron pair.

● Neurons sum the incoming weights and output a non-linear function of the sum (activation function).

● Network non-linearly transforms space to bring similar inputs together

A: 1.00

B: 0.73

C: 0.53


A whole range of approaches to more efficient computing

7

Machine Learning Specialized

Hardware, GPU, TPU

Specialty Chips, TrueNorth, Loihi, BrainScales

Hardware for Artificial Intelligence

Software

Architectures

Encoding

Circuits

Physical devices


CMOS has developed well-defined abstraction layers,but with novel devices, designing across the stack is necessary

8

Details depend on what the device can do.

What is important depends on what the architecture needs

Software

Architectures

Encoding

Circuits

Physical devices


Magnetic tunnel junctions are multifunctional devices

9

● Non-volatile embedded memory

● Spin-torque nano-oscillators

● Superparamagnetic tunnel junctions

0 2 4 6 8

Re

sis

tan

ce

Time (s)

RAP

RP

RAP

RP

Intel: MRAM integrated into 22nm FinFET CMOS

Neuronal spiking

timeImage: Wikipedia

Rossant et al., Frontiers in Neuroscience 5, 9 (2011)

Neuro

n I

ndex


Bigger energy barriers mean longer data retention, but more energy needed to write.

10

Energy

barrier

60 kT

14 kT

years

ms

μs

Memory for storage (MRAM)

Short term memory (embedded/cache)

Superparamagnetic (random signals)

7 kT

Retention

Time

AP

P

“1”

“0”

low-resistance high-resistance


Magnetic tunnel junctions can be controlled by current.

11

Electron flow through tunnel junction→ Spin-transfer torque

Electron flow through heavy metal underlayer→ Spin-orbit torque

IHM

IMTJ


Energy

Parallel Antiparallel

Negative current


Energy

Parallel Antiparallel

Positive current


Superparamagnetic magnetic tunnel junctions: Control rate with spin-transfer torque (or spin-orbit torque)

12

0.100 0.125 0.150

400

500

600

700

Re

sis

tan

ce

(

)

Time (s)

-50 0 500

500

1000

Rate

(H

z)

Current (A)

Mean R

esi

stance

(Pro

babili

ty)

𝑅𝐴𝑃

𝑅𝑃

IMTJ

AP

kBT

P


Superparamagnetic magnetic tunnel junctions: Control rate with spin-transfer torque (or spin-orbit torque)

13

0.100 0.125 0.150

400

500

600

Resis

tan

ce (

)

Time (s)

0.100 0.125 0.150

400

500

600

700

Re

sis

tan

ce

(

)

Time (s)

0.100 0.125 0.150

400

500

600

Resis

tan

ce (

)

Time (s)

-50 0 500

500

1000

Rate

(H

z)

Current (A)

Mean R

esi

stance

(Pro

babili

ty)

𝑅𝐴𝑃

𝑅𝑃

IMTJ

AP

kBT

I > 0

Stabilizes

antiparallel

state

P

I < 0

Stabilizes

parallel

stateP AP

kBT

AP

kBT

P


Computing with superparamagnetic tunnel junctions

14

● Random number generation

● Probabilistic computing

● Rate coding

Rate

Current

Pro

babili

ty

Current

𝑝 = 0.5

Borders et al., Nature 573, 390-393 (2019)

p -bit

Won Ho Choi et al., 2014 IEEE IEDM pp. 12.5.1-12.5.4,

D. Vodenicarevic et al., Phys. Rev. Applied 8, 054045 (2017)

XOR gate

Direction


Many reasons to incorporate Complementary Metal Oxide Semiconductor (CMOS)

15

pmos

nmos

𝑉𝑔

𝑉𝑔

On for 𝑉𝑔 = 0

Off for 𝑉𝑔 = 𝑉𝑑𝑑

Off for 𝑉𝑔 = 0

On for 𝑉𝑔 = 𝑉𝑑𝑑

Author: Cephidan, https://commons.wikimedia.org/wiki/File:LDD-MOS_transistor_-_CMOS_with_STI.svg


Many reasons to incorporate Complementary Metal Oxide Semiconductor (CMOS)

16

pmos

nmos

𝑉𝑔

𝑉𝑔

On for 𝑉𝑔 = 0

Off for 𝑉𝑔 = 𝑉𝑑𝑑

Off for 𝑉𝑔 = 0

On for 𝑉𝑔 = 𝑉𝑑𝑑

A

Inverter

A

pmos

nmos

AA

𝑉𝑑𝑑

0

Voltage pins Output voltage from voltage pins, not input

One transistor always off except during switching,

In 2014, 250 × 1018

transistors manufactured

What can other devices do better?

● Non-volatility (local memory)

● Plasticity (local learning)

● Stochasticity

● Oscillators

→ cheap substrate

→ local energy source → fanout

→ 30 aJ and up


Energies vary over many orders of magnitude

17

1 meV

1 eV

1 keV

1 MeV

1 aJ

1 zJ

1 fJ

1 pJ

1 GeV

1 nJ

Room temperature

Gate transition

MRAM barrier

Synaptic event

MRAM switching

Neural spike

Ion channel

Liquid Helium temperature


Software

Architectures

Encoding

Circuits

Physical devices

Population coding with superparamagnetic tunnel junctions

18

● Rate/Population Coding

● Transition rate, binary

● Pre-charge sense amplifier

● Superparamagnetic tunnel junctions, Short term memory

A. Mizrahi, T. Hirtzlin, A. Fukushima, H. Kubota, S. Yuasa, J. Grollier, and D. Querlioz, Nature Comm. 9, 1533 (2018);

A. Mizrahi, J. Grollier, D. Querlioz, and M. D. Stiles, J. Appl. Phys. 124, 152111 (2018).


Bigger energy barriers mean longer data retention, but more energy needed to write.

19

Energy

barrier

60 kT

14 kT

years

ms

μs

Memory for storage (MRAM)

Short term memory (synaptic weights)

Superparamagnetic (spiking neurons)

7 kT

Retention

Time


AP

P


Population coding – represent continuous degrees of freedom with multiple spike rates

20

Direction

θ

+ 180°

- 180°

Neurons

Neuron

Firin

g r

ate

Firin

g r

ate

𝐻Σ

𝑤𝑖

𝑟𝑖

Weights


Simple sense, respond, measure, and learn test

21

Sensory input

Motor response

SMTJ-based CMOS only

Area 12,000 μm2 20,000 μm2

Energy 7.8 nJ 20 nJ

T. Hirtzlin at C2N:

Key advantage = stochastic analog to digital conversion

Weights stored in stable MTJs.Can we save energy there?


Continuous learning: the key to a robust system that can adapt to changes

22

Usually system is only trained once because it requires a lot of energyBUT unable to adapt to changes!

Initial Training

Untrained network

Use of the network

Unexpected change Loss of precision,

failure

Untrained network

Initial Training

Weight modification

Use of the network

Unexpected change

System adapted

System equipped with continuous learning


Continuous learning overcomes unreliable synapses

23

Here, ΔE = 15 kBT magnetic tunnel junctions can be used for memory with no precision loss

Ew = 100 kBT

No loss

Ew = 6 kBT

Ew = 8 kBT

Ew = 10 kBT

Ew = 15 kBT

0 1000 2000 3000 4000 5000 60001

10

Err

or

(%)

Learning steps


2 3 4 5 6 7 8 9 10 11 12

0.001

0.01

0.1

1

Error (%)

No

rma

lize

d p

ow

er

co

nsu

mp

tio

n

46 neurons

Increase energy

barrier ΔEw

Using unreliable synapses lowers the energy consumption

24

𝐼 ∝ ∆𝐸𝑤

𝐸𝑛𝑒𝑟𝑔𝑦 ∝ ∆𝐸𝑤2

Write energy

Write current

(Sato et al., APL 2014)


2 3 4 5 6 7 8 9 10 11 12

0.001

0.01

0.1

1

10 neurons

20 neurons

100 neurons

Error (%)

No

rma

lize

d p

ow

er

co

nsu

mp

tio

n

46 neurons


25



Write energy

Write current



0.001

0.01

0.1

1

10

100

2 3 4 5 6 7 8 9 101112

10

12

14

16

18

20

100

46

20

10 neurons

OptimumNo

rma

lize

d p

ow

er

co

nsu

mp

tio

nN

um

be

r

of

ne

uro

ns

En

erg

y b

arr

ier

(k

BT

)

Error (%)


26

Example:For 3% precision→ 54 neurons in each population (2916 weights)→ ΔEw = 12 kBT



Write energy

Write current



Software

Architectures

Encoding

Circuits

Physical devices

Stochastic computing with superparamagnetic tunnel junctions

27

● Neural network

● Stochastic bit streams

● AND and OR gates, PCSA

● Superparamagnetic tunnel junctions

M. W. Daniels, A. Madhavan, P. Tatatchian, A. Mizrahi, and M. D. Stiles, Physical Review Applied, 13, 034016 (2020).


Stochastic computing represents numbers as probabilistic bit

streams

28

observed = 0.19

0.5

0.4

0.7

● Resilient to white noise

● Incorrect bit ⟹ error ≈ 𝑂 1

● Compare to binary: error ≈ 𝑂(2𝑁)

● Naturally coherent with bilevel devices (magnetic tunnel junctions).

● Interpreting these “spike trains” as rates or probabilities restricts us to 0 ≤ 𝑝 ≤ 1.

Time →

0.63

0.35

0.69

p = 0.2


Unbiased superparamagnetic tunnel junction plus CMOSfor low energy random bitstreams

29

Pre-Charge Sense Amplifier (PCSA)Superparamagnetictunnel junction

𝑅𝐴𝑃 > 𝑅𝑟𝑒𝑓 > 𝑅𝑃

Clock

Reference resistor

Clock

Clock

Low energy random bits0.100 0.125 0.150

400

500

600

700

Resis

tan

ce (

)

Time (s)

IMTJ <<IC

AP

kBT

P

𝑝 = 0.5


AND gates for multiplication – Programmable bitstream generator: 10 fJ per bit

30

Example

1001010100 (0.4)

x 1010001110 (0.5)

= 1000000100 (0.2)

AND gate (multiply)

𝒑𝐀𝐍𝐃 = 𝒑𝒂𝒑𝒃

Typical stochastic computing uses Linear Feedback Shift Registers→ Short period pseudorandom numbers

1/2

1/4

3/4

1/4, 3/4

1/8, 3/8

7/8, 5/8

1/8, 3/8 , 5/8, 7/8

1/16, 3/16, 5/16, 7/16

15/16, 13/16, 11/16, 9/16

AND gate Inverter Multiplexer


Synapses from bitstream generators (weights) and AND gates (multiplication)

31

AND-gate synapseSynaptic weight:

Bit stream from neuron

Programmable Bit stream generator


Neurons from OR gates for nonlinear summation

32

Example

1001010100 (0.4)

+ 1010001110 (0.5)

= 1011011110 (0.7)

OR gate (nonlinear add)

𝒑𝐎𝐑 = 𝒑𝒂 + 𝒑𝒃 − 𝒑𝒂𝒑𝒃

OR-gate neuron does not work with correlated bit streams


Low area, high efficiency neural network with stochastic information throughout the calculation

33

AND-gate synapseSynaptic weight: Bit stream from programmable SMTJ generator

Bit stream from neuron

OR-gate neuron


Efficient random bitstream generation and avoiding domain translation – energy efficiency on a standard problem

34

A standard deep neural network structure Standard test dataset

● Energy/performance tradeoff by tuning circuit parameters.

● Running for longer or shorter total time (~N) gives

energy/performance tradeoff.

Stochastic computing implementations

Future: implement primitives to check feasibility with realistic MTJs


Collaborators

35

Brian Hoskins, Matthew Daniels, Guru Khalsa, Mark Anders, Jonathan Goodwill, Jabez McClelland, Nikolai Zhitenev, William Rippard, Matthew Pufall, Emilie Jué, Mark Stiles

Energy Frontier Research CenterDepartment of Energy

Alice Mizrahi, Advait Madhavan, Philippe Talatchian

Sumito Tsunegi, Kay Yakushiji, AkioFukushima, Hitoshi Kubota, Shinji Yuasa

Tifenn Hirtzlin, Damien Querlioz

Jacob Torrejon, Mathieu Riou, Flavio Abreu Araujo, Paolo Bortolotti, Vincent Cros, Julie Grollier

George Tzimpragos, Tim Sherwood

Siyuan Huang, Gina Adam

Vivek Amin, Jonathan Gibbons, Paul Haney, Axel Hoffmann, Julie Grollier


Using magnetic tunnel junctions to compute like the brain

36

Architecture Encoding Devices

Population coding Fluctuation rates Superparamagnetic tunnel junctions

Synaptic weights Binary Unstable magnetic tunnel junctions

Neural network with stochastic computing

Random bitstreams Superparamagnetic tunnel junctions, Repurposed CMOS Logic gates

● Emulating features of the brain can increase efficiency for cognitive computing.

● Applications require designing across the computational stack

● Architecture● Encoding● Circuitry● Devices

● CMOS is likely to play an important role in any room-temperature computing scheme.

● Magnetic tunnel junctions

● Multifunctional● Already in CMOS Fabs

Software

Architectures

Encoding

Circuits

Physical devices

Documents

Using magnetic tunnel junctions to compute like the brain...Stochastic computing with superparamagnetic tunnel junctions 27 Neural network Stochastic bit streams AND and OR gates,