Upload
others
View
8
Download
0
Embed Size (px)
Citation preview
· PML · NDCD · Alternative Computing Group 1
Using magnetic tunnel junctions to compute like the brainMark Stiles with help from a cast of dozens
Why?
To understand how the brain works
To develop machine consciousness
To do cognitive computing more efficiently
· PML · NDCD · Alternative Computing Group
Hardware for Artificial Intelligence – NIST Gaithersburg
2
Me Vivek Amin
Advait Madhavan Mark
Anders
Philippe Talatchian
Jonathan Goodwill
Matthew Daniels Brian
Hoskins
Siyuan Huang
Alice Mizrahi
Nikolai Zhitenev
Jabez McClelland
· PML · NDCD · Alternative Computing Group
Computers are designed to solve numerical problems, the brain excels at categorical problems
3
Natural language processing
Image: cmu.edu
Satellite trajectories
Image: wikimedia.org
Digital Computer Neuromorphic System
High precision numerical Categorical
· PML · NDCD · Alternative Computing Group
Computers separate long term memory from processing,
while in the brain, they are intimately connected
4
Neurons (1011)
Synapses (1015)
Small scale structure of the brain:
Memory Unit
Central Processing Unit
Control Unit
Logic Unit
Input
Output
von Neumann architecture:
Digital Computer Neuromorphic System
Memory-Processor Bottleneck Collocation of processor and memory
High precision numerical Categorical
· PML · NDCD · Alternative Computing Group
Computers use a rigid encoding scheme,
the brain prioritizes resiliency with more flexible schemes
5
Neuronal spiking
timeImage: Wikipedia
Clocked processing
Image: Wikipedia
Digital Computer Neuromorphic System
Discrete, Deterministic, & Synchronous Analog, Stochastic, & Asynchronous
Memory-Processor Bottleneck Collocation of processor and memory
High precision numerical Categorical
Rossant et al., Frontiers in Neuroscience 5, 9 (2011)
Neuro
n I
ndex
Binary Representation
64 32 16 8 4 2 1
1 0 0 1 1 0 1
1 s
· PML · NDCD · Alternative Computing Group
Neural networks provide the simplest implementation of brain-like cognitive computing (rate coding)
6
inputs
idealoutputs
information flow
● Neurons output a rate representing a spike train.
● The rate from each neuron is multiplied by synaptic weights for each neuron pair.
● Neurons sum the incoming weights and output a non-linear function of the sum (activation function).
● Network non-linearly transforms space to bring similar inputs together
A: 1.00
B: 0.73
C: 0.53
· PML · NDCD · Alternative Computing Group
A whole range of approaches to more efficient computing
7
Machine Learning Specialized
Hardware, GPU, TPU
Specialty Chips, TrueNorth, Loihi, BrainScales
Hardware for Artificial Intelligence
Software
Architectures
Encoding
Circuits
Physical devices
· PML · NDCD · Alternative Computing Group
CMOS has developed well-defined abstraction layers,but with novel devices, designing across the stack is necessary
8
Details depend on what the device can do.
What is important depends on what the architecture needs
Software
Architectures
Encoding
Circuits
Physical devices
· PML · NDCD · Alternative Computing Group
Magnetic tunnel junctions are multifunctional devices
9
● Non-volatile embedded memory
● Spin-torque nano-oscillators
● Superparamagnetic tunnel junctions
0 2 4 6 8
Re
sis
tan
ce
Time (s)
RAP
RP
RAP
RP
Intel: MRAM integrated into 22nm FinFET CMOS
Neuronal spiking
timeImage: Wikipedia
Rossant et al., Frontiers in Neuroscience 5, 9 (2011)
Neuro
n I
ndex
· PML · NDCD · Alternative Computing Group
Bigger energy barriers mean longer data retention, but more energy needed to write.
10
Energy
barrier
60 kT
14 kT
years
ms
μs
Memory for storage (MRAM)
Short term memory (embedded/cache)
Superparamagnetic (random signals)
7 kT
Retention
Time
AP
P
“1”
“0”
low-resistance high-resistance
· PML · NDCD · Alternative Computing Group
Magnetic tunnel junctions can be controlled by current.
11
Electron flow through tunnel junction→ Spin-transfer torque
Electron flow through heavy metal underlayer→ Spin-orbit torque
IHM
IMTJ
low-resistance high-resistance
Energy
Parallel Antiparallel
Negative current
low-resistance high-resistance
Energy
Parallel Antiparallel
Positive current
· PML · NDCD · Alternative Computing Group
Superparamagnetic magnetic tunnel junctions: Control rate with spin-transfer torque (or spin-orbit torque)
12
0.100 0.125 0.150
400
500
600
700
Re
sis
tan
ce
(
)
Time (s)
-50 0 500
500
1000
Rate
(H
z)
Current (A)
Mean R
esi
stance
(Pro
babili
ty)
𝑅𝐴𝑃
𝑅𝑃
IMTJ
AP
kBT
P
· PML · NDCD · Alternative Computing Group
Superparamagnetic magnetic tunnel junctions: Control rate with spin-transfer torque (or spin-orbit torque)
13
0.100 0.125 0.150
400
500
600
Resis
tan
ce (
)
Time (s)
0.100 0.125 0.150
400
500
600
700
Re
sis
tan
ce
(
)
Time (s)
0.100 0.125 0.150
400
500
600
Resis
tan
ce (
)
Time (s)
-50 0 500
500
1000
Rate
(H
z)
Current (A)
Mean R
esi
stance
(Pro
babili
ty)
𝑅𝐴𝑃
𝑅𝑃
IMTJ
AP
kBT
I > 0
Stabilizes
antiparallel
state
P
I < 0
Stabilizes
parallel
stateP AP
kBT
AP
kBT
P
· PML · NDCD · Alternative Computing Group
Computing with superparamagnetic tunnel junctions
14
● Random number generation
● Probabilistic computing
● Rate coding
Rate
Current
Pro
babili
ty
Current
𝑝 = 0.5
Borders et al., Nature 573, 390-393 (2019)
p -bit
Won Ho Choi et al., 2014 IEEE IEDM pp. 12.5.1-12.5.4,
D. Vodenicarevic et al., Phys. Rev. Applied 8, 054045 (2017)
XOR gate
Direction
· PML · NDCD · Alternative Computing Group
Many reasons to incorporate Complementary Metal Oxide Semiconductor (CMOS)
15
pmos
nmos
𝑉𝑔
𝑉𝑔
On for 𝑉𝑔 = 0
Off for 𝑉𝑔 = 𝑉𝑑𝑑
Off for 𝑉𝑔 = 0
On for 𝑉𝑔 = 𝑉𝑑𝑑
Author: Cephidan, https://commons.wikimedia.org/wiki/File:LDD-MOS_transistor_-_CMOS_with_STI.svg
· PML · NDCD · Alternative Computing Group
Many reasons to incorporate Complementary Metal Oxide Semiconductor (CMOS)
16
pmos
nmos
𝑉𝑔
𝑉𝑔
On for 𝑉𝑔 = 0
Off for 𝑉𝑔 = 𝑉𝑑𝑑
Off for 𝑉𝑔 = 0
On for 𝑉𝑔 = 𝑉𝑑𝑑
A
Inverter
A
pmos
nmos
AA
𝑉𝑑𝑑
0
Voltage pins Output voltage from voltage pins, not input
One transistor always off except during switching,
In 2014, 250 × 1018
transistors manufactured
What can other devices do better?
● Non-volatility (local memory)
● Plasticity (local learning)
● Stochasticity
● Oscillators
→ cheap substrate
→ local energy source → fanout
→ 30 aJ and up
· PML · NDCD · Alternative Computing Group
Energies vary over many orders of magnitude
17
1 meV
1 eV
1 keV
1 MeV
1 aJ
1 zJ
1 fJ
1 pJ
1 GeV
1 nJ
Room temperature
Gate transition
MRAM barrier
Synaptic event
MRAM switching
Neural spike
Ion channel
Liquid Helium temperature
· PML · NDCD · Alternative Computing Group
Software
Architectures
Encoding
Circuits
Physical devices
Population coding with superparamagnetic tunnel junctions
18
● Rate/Population Coding
● Transition rate, binary
● Pre-charge sense amplifier
● Superparamagnetic tunnel junctions, Short term memory
A. Mizrahi, T. Hirtzlin, A. Fukushima, H. Kubota, S. Yuasa, J. Grollier, and D. Querlioz, Nature Comm. 9, 1533 (2018);
A. Mizrahi, J. Grollier, D. Querlioz, and M. D. Stiles, J. Appl. Phys. 124, 152111 (2018).
· PML · NDCD · Alternative Computing Group
Bigger energy barriers mean longer data retention, but more energy needed to write.
19
Energy
barrier
60 kT
14 kT
years
ms
μs
Memory for storage (MRAM)
Short term memory (synaptic weights)
Superparamagnetic (spiking neurons)
7 kT
Retention
Time
low-resistance high-resistance
AP
P
· PML · NDCD · Alternative Computing Group
Population coding – represent continuous degrees of freedom with multiple spike rates
20
Direction
θ
+ 180°
- 180°
Neurons
Neuron
Firin
g r
ate
Firin
g r
ate
𝐻Σ
𝑤𝑖
𝑟𝑖
Weights
· PML · NDCD · Alternative Computing Group
Simple sense, respond, measure, and learn test
21
Sensory input
Motor response
SMTJ-based CMOS only
Area 12,000 μm2 20,000 μm2
Energy 7.8 nJ 20 nJ
T. Hirtzlin at C2N:
Key advantage = stochastic analog to digital conversion
Weights stored in stable MTJs.Can we save energy there?
· PML · NDCD · Alternative Computing Group
Continuous learning: the key to a robust system that can adapt to changes
22
Usually system is only trained once because it requires a lot of energyBUT unable to adapt to changes!
Initial Training
Untrained network
Use of the network
Unexpected change Loss of precision,
failure
Untrained network
Initial Training
Weight modification
Use of the network
Unexpected change
System adapted
System equipped with continuous learning
· PML · NDCD · Alternative Computing Group
Continuous learning overcomes unreliable synapses
23
Here, ΔE = 15 kBT magnetic tunnel junctions can be used for memory with no precision loss
Ew = 100 kBT
No loss
Ew = 6 kBT
Ew = 8 kBT
Ew = 10 kBT
Ew = 15 kBT
0 1000 2000 3000 4000 5000 60001
10
Err
or
(%)
Learning steps
· PML · NDCD · Alternative Computing Group
2 3 4 5 6 7 8 9 10 11 12
0.001
0.01
0.1
1
Error (%)
No
rma
lize
d p
ow
er
co
nsu
mp
tio
n
46 neurons
Increase energy
barrier ΔEw
Using unreliable synapses lowers the energy consumption
24
𝐼 ∝ ∆𝐸𝑤
𝐸𝑛𝑒𝑟𝑔𝑦 ∝ ∆𝐸𝑤2
Write energy
Write current
(Sato et al., APL 2014)
· PML · NDCD · Alternative Computing Group
2 3 4 5 6 7 8 9 10 11 12
0.001
0.01
0.1
1
10 neurons
20 neurons
100 neurons
Error (%)
No
rma
lize
d p
ow
er
co
nsu
mp
tio
n
46 neurons
Using unreliable synapses lowers the energy consumption
25
𝐼 ∝ ∆𝐸𝑤
𝐸𝑛𝑒𝑟𝑔𝑦 ∝ ∆𝐸𝑤2
Write energy
Write current
(Sato et al., APL 2014)
· PML · NDCD · Alternative Computing Group
0.001
0.01
0.1
1
10
100
2 3 4 5 6 7 8 9 101112
10
12
14
16
18
20
100
46
20
10 neurons
OptimumNo
rma
lize
d p
ow
er
co
nsu
mp
tio
nN
um
be
r
of
ne
uro
ns
En
erg
y b
arr
ier
(k
BT
)
Error (%)
Using unreliable synapses lowers the energy consumption
26
Example:For 3% precision→ 54 neurons in each population (2916 weights)→ ΔEw = 12 kBT
𝐼 ∝ ∆𝐸𝑤
𝐸𝑛𝑒𝑟𝑔𝑦 ∝ ∆𝐸𝑤2
Write energy
Write current
(Sato et al., APL 2014)
· PML · NDCD · Alternative Computing Group
Software
Architectures
Encoding
Circuits
Physical devices
Stochastic computing with superparamagnetic tunnel junctions
27
● Neural network
● Stochastic bit streams
● AND and OR gates, PCSA
● Superparamagnetic tunnel junctions
M. W. Daniels, A. Madhavan, P. Tatatchian, A. Mizrahi, and M. D. Stiles, Physical Review Applied, 13, 034016 (2020).
· PML · NDCD · Alternative Computing Group
Stochastic computing represents numbers as probabilistic bit
streams
28
observed = 0.19
0.5
0.4
0.7
● Resilient to white noise
● Incorrect bit ⟹ error ≈ 𝑂 1
● Compare to binary: error ≈ 𝑂(2𝑁)
● Naturally coherent with bilevel devices (magnetic tunnel junctions).
● Interpreting these “spike trains” as rates or probabilities restricts us to 0 ≤ 𝑝 ≤ 1.
Time →
0.63
0.35
0.69
p = 0.2
· PML · NDCD · Alternative Computing Group
Unbiased superparamagnetic tunnel junction plus CMOSfor low energy random bitstreams
29
Pre-Charge Sense Amplifier (PCSA)Superparamagnetictunnel junction
𝑅𝐴𝑃 > 𝑅𝑟𝑒𝑓 > 𝑅𝑃
Clock
Reference resistor
Clock
Clock
Low energy random bits0.100 0.125 0.150
400
500
600
700
Resis
tan
ce (
)
Time (s)
IMTJ <<IC
AP
kBT
P
𝑝 = 0.5
· PML · NDCD · Alternative Computing Group
AND gates for multiplication – Programmable bitstream generator: 10 fJ per bit
30
Example
1001010100 (0.4)
x 1010001110 (0.5)
= 1000000100 (0.2)
AND gate (multiply)
𝒑𝐀𝐍𝐃 = 𝒑𝒂𝒑𝒃
Typical stochastic computing uses Linear Feedback Shift Registers→ Short period pseudorandom numbers
1/2
1/4
3/4
1/4, 3/4
1/8, 3/8
7/8, 5/8
1/8, 3/8 , 5/8, 7/8
1/16, 3/16, 5/16, 7/16
15/16, 13/16, 11/16, 9/16
AND gate Inverter Multiplexer
· PML · NDCD · Alternative Computing Group
Synapses from bitstream generators (weights) and AND gates (multiplication)
31
AND-gate synapseSynaptic weight:
Bit stream from neuron
Programmable Bit stream generator
· PML · NDCD · Alternative Computing Group
Neurons from OR gates for nonlinear summation
32
Example
1001010100 (0.4)
+ 1010001110 (0.5)
= 1011011110 (0.7)
OR gate (nonlinear add)
𝒑𝐎𝐑 = 𝒑𝒂 + 𝒑𝒃 − 𝒑𝒂𝒑𝒃
OR-gate neuron does not work with correlated bit streams
· PML · NDCD · Alternative Computing Group
Low area, high efficiency neural network with stochastic information throughout the calculation
33
AND-gate synapseSynaptic weight: Bit stream from programmable SMTJ generator
Bit stream from neuron
OR-gate neuron
· PML · NDCD · Alternative Computing Group
Efficient random bitstream generation and avoiding domain translation – energy efficiency on a standard problem
34
A standard deep neural network structure Standard test dataset
● Energy/performance tradeoff by tuning circuit parameters.
● Running for longer or shorter total time (~N) gives
energy/performance tradeoff.
Stochastic computing implementations
Future: implement primitives to check feasibility with realistic MTJs
· PML · NDCD · Alternative Computing Group
Collaborators
35
Brian Hoskins, Matthew Daniels, Guru Khalsa, Mark Anders, Jonathan Goodwill, Jabez McClelland, Nikolai Zhitenev, William Rippard, Matthew Pufall, Emilie Jué, Mark Stiles
Energy Frontier Research CenterDepartment of Energy
Alice Mizrahi, Advait Madhavan, Philippe Talatchian
Sumito Tsunegi, Kay Yakushiji, AkioFukushima, Hitoshi Kubota, Shinji Yuasa
Tifenn Hirtzlin, Damien Querlioz
Jacob Torrejon, Mathieu Riou, Flavio Abreu Araujo, Paolo Bortolotti, Vincent Cros, Julie Grollier
George Tzimpragos, Tim Sherwood
Siyuan Huang, Gina Adam
Vivek Amin, Jonathan Gibbons, Paul Haney, Axel Hoffmann, Julie Grollier
· PML · NDCD · Alternative Computing Group
Using magnetic tunnel junctions to compute like the brain
36
Architecture Encoding Devices
Population coding Fluctuation rates Superparamagnetic tunnel junctions
Synaptic weights Binary Unstable magnetic tunnel junctions
Neural network with stochastic computing
Random bitstreams Superparamagnetic tunnel junctions, Repurposed CMOS Logic gates
● Emulating features of the brain can increase efficiency for cognitive computing.
● Applications require designing across the computational stack
● Architecture● Encoding● Circuitry● Devices
● CMOS is likely to play an important role in any room-temperature computing scheme.
● Magnetic tunnel junctions
● Multifunctional● Already in CMOS Fabs
Software
Architectures
Encoding
Circuits
Physical devices