Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
++
P.Err
1<<
MAX
Xpsc[i]
PSc
Err.[i+1]
MUX
Err Xt
N.Err
MAX/+
MUX >
Quality Programmable Vector Processors for Approximate Computing Swagath Venkataramani1, Vinay Chippa1, Srimat Chakradhar2,
Kaushik Roy1, Anand Raghunathan1
1Integrated Systems Laboratory, School of ECE, Purdue University 2Systems Architecture Department, NEC Laboratories America
Approximate Computing - Motivation
Recognition Synthesis Mining Video
Intrinsic Application Resilience Ability of applications to produce outputs of acceptable quality despite underlying computations executed imprecisely
Search
Intrinsic Application Resilience
‘Noisy’ Real World Inputs
Self- Healing/ Iterative
Algorithms
Redundant Input Data
No Golden Output
Perceptual Limitations
Statistical Probabilistic
Computations
Vision
Sources of Resilience
Quality Programmable Processors
Experimental Methodology and Results
QUORA: Quality Programmable 1D/2D Vector Processor
National Science Foundation
Summary
Intrinsic application resilience: A new dimension to optimize HW and SW
Objective: Energy-efficient & programmable processor for approximate computing
Quality programmable processors: Quality codified as part of the instruction set
QUORA: Quality programmable 1D/2D vector processor • Quality programmable ISA and microarchitecture
Acknowledgement
NEC laboratories America
Notion of correctness is relaxed Good enough answers !!!
Need programmable platforms for
approximate computing!
APEAPE
APEAPE
APE
APEAPE APE
APE
APE
APE
APE
APE
APE
ACC
ALU
MU
X
MUX
Reg
Reg
MAPE
MAPE
MU
X
Data. O
UT
APEAPE APE
APEAPE APE
APE
APE
APE
APE
1-to
-man
y-D
EMU
X
MAPE
MAPE
SM
SM
SM
SM
MAP
E
MAP
E
MAP
E
MAP
E
SM SM SM SMSM
MUX
Data. OUT
Data. IN
1-to-many-DEMUX
SM
INST.MEMORY
ScalarReg. File ALU
Prog. Counter
INST. DECODE & CONTROL UNIT
CAPE
Halt
Dat
a. IN
DATA MEMORY
ALU
ACC
Scratch Registers
MAPE
APE
Scra
tch
Regi
ster
s
ALU AC
C
MAPE
InstructionInst. AddInst. Read
APE ARRAY
CLK
RESET
SM_row_sel
MAPE_row_sel
SM_col_sel
MAPE_col_sel
Data. INData. OUT
Data. ReadData. WriteData. Add
INTE
RFAC
E
Quality Control Unit & Quality Monitors
Qu
alit
y Co
ntro
l Un
it &
Qu
alit
y M
on
ito
rs
PE c
ount
Com
plex
ity
Ener
gy
App
roxi
mat
ion
Scop
e
3-tiered PE hierarchy enables larger energy benefits from approximate computing
APE
CAPE
MAPE
Processing Element Hierarchy
> 90%
> 70%
Quality Programmable Instruction Set
Quality Configurable Execution
47 Instructions – 9 APE, 22 MAPE, 13 CAPE, 3 SM Instructions extended with 2 quality fields
e.g., qpMAC R_length, R_row_enb, R_col_enb, R_q_type, R_q_amt
Type of error – 3 quality metrics e.g., 𝑀𝑎𝑥𝐸𝑟𝑟 = | 𝑂𝑜𝑟𝑖𝑔 − 𝑂𝑎𝑝𝑝𝑟𝑜𝑥|
Amount of error
Block Diagram of QUORA Micro-architecture
Track positive & negative errors
Modulate the threshold for
round-off
Quality specified @ vector inst. outputs Scale the precision of input operands Key idea: Compensate errors across
many scalar operations
C.PSc.
Op-code
Error target
Actual Error+/>
Error
ErrorPSc
PSc.Unit
PSc.Unit
PSc.Unit
PSc.Unit
PSc.Unit
PSc.Unit
PSc.Unit
PSc.Unit
MAP
E
MAP
E
MAP
E
MAP
E
MAPE
MAPE
MAPE
MAPE
APEAPE APE
APE APEAPE APE
APE APEAPE APE
APE APEAPE APE
ACC
CLK
Gated CLK
C.PS
c
R.PS
c
Op-
code
Quality Control unit
Precision Scaling Unit
Array Level View Quality Control Unit
• Set precision values
PSc. units shared across
row/col <1% energy
Micro-architectural Parameters Benchmarks
Micro-architectural Parameters Value
Array Dimensions 16 X 16
No. of PEs (APEs + MAPEs+ CAPE) 289 (256 + 32 + 1)
No. of SM elements 32
Depth of SM elements 64
Operating Frequency 250 MHz
Metric Value
Feature Size 45nm
Area 2.6 mm2
Power 367.8 mW
Gate Count 502042
51% 28%
1% 19%
0% 1% APEs (%)MAPEs (%)CAPE(%)SMs (%)PScE (%)Misc. (%)
Applications Dataset Quality Metric Handwritten Digit Recognition
(SVM-MNIST) MNIST
Percentage classification accuracy
Object Recognition (SVM-NORB) NORB
Digit Classification (CNN) MNIST
Eye Detection (GLVQ) NEC labs.
Optical Character Recognition (k-NN) OCR digits
Census Data Analysis (ANN) Adult
Document Search (SSI) Subset of Wikipedia
No. correct in top 25 results
Image Segmentation (K-Means-Seg) Berkeley dataset Mean distance of
clustered points from respective centroids Optical Character Clustering
(K-Means-OCR) OCR digits
Energy Benefits QP-Instructions
0
0.5
1
1.5
Nor
mal
ized
Ene
rgy
--> No Approx. < 0.5%
~ 2.5 % ~ 7.5%
1.05-1.7X savings for NO quality loss 1.18-2.1X savings for < 2.5% quality loss > 2.5X savings for < 7.5% quality loss
88%
2% 10% 1% 3%
96%
QP-APE QP-MAPEAccurate
Dynamic inst. count Energy 90% of energy in QP instructions
Precision Scaling Mechanisms
0
0.2
0.4
0.6
0.8
1
1.2
0 0.5 1
APE
Ene
rgy
-->
Average Error (%) -->
TruncUp/DownErr. Comp
Error compensation provides superior Q-E trade-off
QP-MAC
An abstract model for programmable approximate processors
Application Program
Application Quality
Requirement
Program with QP-ISA
HW/S
W IN
TERF
ACE
Quality Programmability: Ability to specify the desired accuracy requirement to HW
Notion of quality explicitly built into the instruction set
0
5
10
15
20
25
30
0 5 10
% o
f App
roxi
mat
e in
stru
ctio
ns
% Loss in Output Quality -->
Arbitrary< 50%< 25%< 7.5%
Constraining errors enables 25-100X more approximate instructions compared to allowing arbitrary errors
Quality Configurable
Execution Unit
QUALITY PROGRAMMABLE MICROARCHITECTURE
Decode & Control
Inst. Fetch
Register File
Quality Control Logic
Instruction accuracy monitor
Software visible Error Registers Quality Programmable ISA
• Quality fields in insts. qpADD dest, op1, op2, MAG, 1%
• Purely based on instruction semantics
Quality-programmable add
Error magnitude < 1%
Capable of executing instructions with different quality levels
Feedback about actual error - used by software to determine quality levels of future instructions
Micro-architecture guarantees instruction level quality bounds
SVM