1
+ + P.Err 1<< MAX X psc [i] PSc Err.[i+1] MUX Err Xt N.Err MAX/+ MUX > Quality Programmable Vector Processors for Approximate Computing Swagath Venkataramani 1 , Vinay Chippa 1 , Srimat Chakradhar 2 , Kaushik Roy 1 , Anand Raghunathan 1 1 Integrated Systems Laboratory, School of ECE, Purdue University 2 Systems Architecture Department, NEC Laboratories America Approximate Computing - Motivation Recognition Synthesis Mining Video Intrinsic Application Resilience Ability of applications to produce outputs of acceptable quality despite underlying computations executed imprecisely Search Intrinsic Application Resilience ‘Noisy’ Real World Inputs Self- Healing/ Iterative Algorithms Redundant Input Data No Golden Output Perceptual Limitations Statistical Probabilistic Computations Vision Sources of Resilience Quality Programmable Processors Experimental Methodology and Results QUORA: Quality Programmable 1D/2D Vector Processor National Science Foundation Summary Intrinsic application resilience: A new dimension to optimize HW and SW Objective: Energy-efficient & programmable processor for approximate computing Quality programmable processors: Quality codified as part of the instruction set QUORA: Quality programmable 1D/2D vector processor Quality programmable ISA and microarchitecture Acknowledgement NEC laboratories America Notion of correctness is relaxed Good enough answers !!! Need programmable platforms for approximate computing! APE APE APE APE APE APE APE APE APE APE APE APE APE APE ACC ALU MUX MUX Reg Reg MAPE MAPE MUX Data. OUT APE APE APE APE APE APE APE APE APE APE 1-to-many-DEMUX MAPE MAPE SM SM SM SM MAPE MAPE MAPE MAPE SM SM SM SM SM MUX Data. OUT Data. IN 1-to-many-DEMUX SM INST. MEMORY Scalar Reg. File ALU Prog. Counter INST. DECODE & CONTROL UNIT CAPE Halt Data. IN DATA MEMORY ALU ACC Scratch Registers MAPE APE Scratch Registers ALU ACC MAPE Instruction Inst. Add Inst. Read APE ARRAY CLK RESET SM_row_sel MAPE_row_sel SM_col_sel MAPE_col_sel Data. IN Data. OUT Data. Read Data. Write Data. Add INTERFACE Quality Control Unit & Quality Monitors Quality Control Unit & Quality Monitors PE count Complexity Energy Approximation Scope 3-tiered PE hierarchy enables larger energy benefits from approximate computing APE CAPE MAPE Processing Element Hierarchy > 90% > 70% Quality Programmable Instruction Set Quality Configurable Execution 47 Instructions 9 APE, 22 MAPE, 13 CAPE, 3 SM Instructions extended with 2 quality fields e.g., qpMAC R_length, R_row_enb, R_col_enb, R_q_type, R_q_amt Type of error 3 quality metrics e.g., =| | Amount of error Block Diagram of QUORA Micro-architecture Track positive & negative errors Modulate the threshold for round-off Quality specified @ vector inst. outputs Scale the precision of input operands Key idea: Compensate errors across many scalar operations C.PSc. +/> Error Error PSc PSc. Unit PSc. Unit PSc. Unit PSc. Unit PSc. Unit PSc. Unit PSc. Unit PSc. Unit MAPE MAPE MAPE MAPE MAPE MAPE MAPE MAPE APE APE APE APE APE APE APE APE APE APE APE APE APE APE APE ACC CL K Gated CLK C.PSc R.PSc Op-code Precision Scaling Unit Array Level View Quality Control Unit Set precision values PSc. units shared across row/col <1% energy Micro-architectural Parameters Benchmarks Micro-architectural Parameters Value Array Dimensions 16 X 16 No. of PEs (APEs + MAPEs+ CAPE) 289 (256 + 32 + 1) No. of SM elements 32 Depth of SM elements 64 Operating Frequency 250 MHz Metric Value Feature Size 45nm Area 2.6 mm 2 Power 367.8 mW Gate Count 502042 51% 28% 1% 19% 0% 1% APEs (%) MAPEs (%) CAPE(%) SMs (%) PScE (%) Misc. (%) Applications Dataset Quality Metric Handwritten Digit Recognition (SVM-MNIST) MNIST Percentage classification accuracy Object Recognition (SVM-NORB) NORB Digit Classification (CNN) MNIST Eye Detection (GLVQ) NEC labs. Optical Character Recognition (k-NN) OCR digits Census Data Analysis (ANN) Adult Document Search (SSI) Subset of Wikipedia No. correct in top 25 results Image Segmentation (K-Means-Seg) Berkeley dataset Mean distance of clustered points from respective centroids Optical Character Clustering (K-Means-OCR) OCR digits Energy Benefits QP-Instructions 0 0.5 1 1.5 Normalized Energy --> No Approx. < 0.5% ~ 2.5 % ~ 7.5% 1.05-1.7X savings for NO quality loss 1.18-2.1X savings for < 2.5% quality loss > 2.5X savings for < 7.5% quality loss 88% 2% 10% 1% 3% 96% QP-APE QP-MAPE Accurate Dynamic inst. count Energy 90% of energy in QP instructions Precision Scaling Mechanisms 0 0.2 0.4 0.6 0.8 1 1.2 0 0.5 1 APE Energy --> Average Error (%) --> Trunc Up/Down Err. Comp Error compensation provides superior Q-E trade-off QP-MAC An abstract model for programmable approximate processors Application Program Application Quality Requirement Program with QP-ISA HW/SW INTERFACE Quality Programmability: Ability to specify the desired accuracy requirement to HW Notion of quality explicitly built into the instruction set 0 5 10 15 20 25 30 0 5 10 % of Approximate instructions % Loss in Output Quality --> Arbitrary < 50% < 25% < 7.5% Constraining errors enables 25-100X more approximate instructions compared to allowing arbitrary errors Quality Configurable Execution Unit QUALITY PROGRAMMABLE MICROARCHITECTURE Decode & Control Inst. Fetch Register File Quality Control Logic Instruction accuracy monitor Software visible Error Registers Quality Programmable ISA Quality fields in insts. qpADD dest, op1, op2, MAG, 1% Purely based on instruction semantics Quality- programmable add Error magnitude < 1% Capable of executing instructions with different quality levels Feedback about actual error - used by software to determine quality levels of future instructions Micro-architecture guarantees instruction level quality bounds SVM

Quality Programmable Vector Processors for Approximate Computing · 2013-12-22 · Quality Programmable Vector Processors for Approximate Computing Swagath Venkataramani1, Vinay Chippa1,

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Quality Programmable Vector Processors for Approximate Computing · 2013-12-22 · Quality Programmable Vector Processors for Approximate Computing Swagath Venkataramani1, Vinay Chippa1,

++

P.Err

1<<

MAX

Xpsc[i]

PSc

Err.[i+1]

MUX

Err Xt

N.Err

MAX/+

MUX >

Quality Programmable Vector Processors for Approximate Computing Swagath Venkataramani1, Vinay Chippa1, Srimat Chakradhar2,

Kaushik Roy1, Anand Raghunathan1

1Integrated Systems Laboratory, School of ECE, Purdue University 2Systems Architecture Department, NEC Laboratories America

Approximate Computing - Motivation

Recognition Synthesis Mining Video

Intrinsic Application Resilience Ability of applications to produce outputs of acceptable quality despite underlying computations executed imprecisely

Search

Intrinsic Application Resilience

‘Noisy’  Real  World Inputs

Self- Healing/ Iterative

Algorithms

Redundant Input Data

No Golden Output

Perceptual Limitations

Statistical Probabilistic

Computations

Vision

Sources of Resilience

Quality Programmable Processors

Experimental Methodology and Results

QUORA: Quality Programmable 1D/2D Vector Processor

National Science Foundation

Summary

Intrinsic application resilience: A new dimension to optimize HW and SW

Objective: Energy-efficient & programmable processor for approximate computing

Quality programmable processors: Quality codified as part of the instruction set

QUORA: Quality programmable 1D/2D vector processor • Quality programmable ISA and microarchitecture

Acknowledgement

NEC laboratories America

Notion of correctness is relaxed Good enough answers !!!

Need programmable platforms for

approximate computing!

APEAPE

APEAPE

APE

APEAPE APE

APE

APE

APE

APE

APE

APE

ACC

ALU

MU

X

MUX

Reg

Reg

MAPE

MAPE

MU

X

Data. O

UT

APEAPE APE

APEAPE APE

APE

APE

APE

APE

1-to

-man

y-D

EMU

X

MAPE

MAPE

SM

SM

SM

SM

MAP

E

MAP

E

MAP

E

MAP

E

SM SM SM SMSM

MUX

Data. OUT

Data. IN

1-to-many-DEMUX

SM

INST.MEMORY

ScalarReg. File ALU

Prog. Counter

INST. DECODE & CONTROL UNIT

CAPE

Halt

Dat

a. IN

DATA MEMORY

ALU

ACC

Scratch Registers

MAPE

APE

Scra

tch

Regi

ster

s

ALU AC

C

MAPE

InstructionInst. AddInst. Read

APE ARRAY

CLK

RESET

SM_row_sel

MAPE_row_sel

SM_col_sel

MAPE_col_sel

Data. INData. OUT

Data. ReadData. WriteData. Add

INTE

RFAC

E

Quality Control Unit & Quality Monitors

Qu

alit

y Co

ntro

l Un

it &

Qu

alit

y M

on

ito

rs

PE c

ount

Com

plex

ity

Ener

gy

App

roxi

mat

ion

Scop

e

3-tiered PE hierarchy enables larger energy benefits from approximate computing

APE

CAPE

MAPE

Processing Element Hierarchy

> 90%

> 70%

Quality Programmable Instruction Set

Quality Configurable Execution

47 Instructions – 9 APE, 22 MAPE, 13 CAPE, 3 SM Instructions extended with 2 quality fields

e.g., qpMAC R_length, R_row_enb, R_col_enb, R_q_type, R_q_amt

Type of error – 3 quality metrics e.g.,  𝑀𝑎𝑥𝐸𝑟𝑟 = |  𝑂𝑜𝑟𝑖𝑔 − 𝑂𝑎𝑝𝑝𝑟𝑜𝑥|

Amount of error

Block Diagram of QUORA Micro-architecture

Track positive & negative errors

Modulate the threshold for

round-off

Quality specified @ vector inst. outputs Scale the precision of input operands Key idea: Compensate errors across

many scalar operations

C.PSc.

Op-code

Error target

Actual Error+/>

Error

ErrorPSc

PSc.Unit

PSc.Unit

PSc.Unit

PSc.Unit

PSc.Unit

PSc.Unit

PSc.Unit

PSc.Unit

MAP

E

MAP

E

MAP

E

MAP

E

MAPE

MAPE

MAPE

MAPE

APEAPE APE

APE APEAPE APE

APE APEAPE APE

APE APEAPE APE

ACC

CLK

Gated CLK

C.PS

c

R.PS

c

Op-

code

Quality Control unit

Precision Scaling Unit

Array Level View Quality Control Unit

• Set precision values

PSc. units shared across

row/col <1% energy

Micro-architectural Parameters Benchmarks

Micro-architectural Parameters Value

Array Dimensions 16 X 16

No. of PEs (APEs + MAPEs+ CAPE) 289 (256 + 32 + 1)

No. of SM elements 32

Depth of SM elements 64

Operating Frequency 250 MHz

Metric Value

Feature Size 45nm

Area 2.6 mm2

Power 367.8 mW

Gate Count 502042

51% 28%

1% 19%

0% 1% APEs (%)MAPEs (%)CAPE(%)SMs (%)PScE (%)Misc. (%)

Applications Dataset Quality Metric Handwritten Digit Recognition

(SVM-MNIST) MNIST

Percentage classification accuracy

Object Recognition (SVM-NORB) NORB

Digit Classification (CNN) MNIST

Eye Detection (GLVQ) NEC labs.

Optical Character Recognition (k-NN) OCR digits

Census Data Analysis (ANN) Adult

Document Search (SSI) Subset of Wikipedia

No. correct in top 25 results

Image Segmentation (K-Means-Seg) Berkeley dataset Mean distance of

clustered points from respective centroids Optical Character Clustering

(K-Means-OCR) OCR digits

Energy Benefits QP-Instructions

0

0.5

1

1.5

Nor

mal

ized

Ene

rgy

--> No Approx. < 0.5%

~ 2.5 % ~ 7.5%

1.05-1.7X savings for NO quality loss 1.18-2.1X savings for < 2.5% quality loss > 2.5X savings for < 7.5% quality loss

88%

2% 10% 1% 3%

96%

QP-APE QP-MAPEAccurate

Dynamic inst. count Energy 90% of energy in QP instructions

Precision Scaling Mechanisms

0

0.2

0.4

0.6

0.8

1

1.2

0 0.5 1

APE

Ene

rgy

-->

Average Error (%) -->

TruncUp/DownErr. Comp

Error compensation provides superior Q-E trade-off

QP-MAC

An abstract model for programmable approximate processors

Application Program

Application Quality

Requirement

Program with QP-ISA

HW/S

W IN

TERF

ACE

Quality Programmability: Ability to specify the desired accuracy requirement to HW

Notion of quality explicitly built into the instruction set

0

5

10

15

20

25

30

0 5 10

% o

f App

roxi

mat

e in

stru

ctio

ns

% Loss in Output Quality -->

Arbitrary< 50%< 25%< 7.5%

Constraining errors enables 25-100X more approximate instructions compared to allowing arbitrary errors

Quality Configurable

Execution Unit

QUALITY PROGRAMMABLE MICROARCHITECTURE

Decode & Control

Inst. Fetch

Register File

Quality Control Logic

Instruction accuracy monitor

Software visible Error Registers Quality Programmable ISA

• Quality fields in insts. qpADD dest, op1, op2, MAG, 1%

• Purely based on instruction semantics

Quality-programmable add

Error magnitude < 1%

Capable of executing instructions with different quality levels

Feedback about actual error - used by software to determine quality levels of future instructions

Micro-architecture guarantees instruction level quality bounds

SVM