Transcript
Page 1: Ideas for the design of an ASIP for LQCD

CASTNESS'11, Rome, Italy. © 2011 Target Compiler Technologies

Ideas for the design of an ASIP for LQCD

Target Compiler Technologies
CASTNESS’11, Rome, Italy

Page 2: Ideas for the design of an ASIP for LQCD


Agenda

ASIPs and IP Designer

EURETILE platform

An ASIP for LQCD

Page 3: Ideas for the design of an ASIP for LQCD


ASIPs in Multi-Core SoC

ASIP: Application-Specific Processor
• Anything between general-purpose µP and hardwired data-path
• Flexibility through programmability and design-time reconfigurability
• High throughput, low energy through parallelism and specialization

ASIP is foundation of heterogeneous multi-core SoC
• Balanced SoC architecture offers best performance at lowest energy and lowest cost

Page 4: Ideas for the design of an ASIP for LQCD


Why ASIPs?

Maximise performance
• Specialisation
• Parallelism: VLIW, SIMD, multi-core

Minimise power dissipation
• Specialisation
• Parallelism: VLIW, SIMD, multi-core
• Power-optimised RTL generation

Leverage the benefits of programmability
• React to changing requirements
• Ship first for evolving standards
• Remedy defects
• Extend products to new markets without an SoC respin

Page 5: Ideas for the design of an ASIP for LQCD


IP Designer Tool Suite

Page 6: Ideas for the design of an ASIP for LQCD


nML – ASIP description language

Example: architectural specialisation
Absolute-difference instruction in motion estimation

Structural skeleton

    reg V[4]<vector>;
    trn vecr<vector>;
    trn vecs<vector>;
    trn vecd<vector>;
    trn vect<vector>;
    fu vec;
    fu vabs;
    ...

• Registers, busses, functional units
• Application specific data type ‘vector’

Instruction-set grammar

    opn vec_adiff_opn(t:c2u, r:c2u)
    {
      action {
        stage E1:
          vecd = vsub(vecr=V[r], vecs=V[t]) @vec;
          V[t] = vect = vabs(vecd) @vabs;
      }
      syntax : "vadiff v"t ",v"r ",v"t;
      image : t::r;
    }

Primitive functions:
• vsub()
• vabs()

Operation pattern: V ← vabs() ← vsub() ← V, V
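The behaviour of the vadiff instruction can be sketched as a plain C reference model. This is only an illustration: the slide does not give the lane count or element type of 'vector', so LANES, the int element type and the Vec struct are assumptions, and the functions below merely mirror the names of the nML primitives.

```c
#include <assert.h>
#include <stdlib.h>

#define LANES 4  /* assumption: the slide does not state the lane count */

/* Hypothetical stand-in for the application-specific type 'vector'. */
typedef struct { int lane[LANES]; } Vec;

/* Models the primitive vsub(): lane-wise difference r - s. */
Vec vsub(Vec r, Vec s) {
    Vec d;
    for (int i = 0; i < LANES; i++) d.lane[i] = r.lane[i] - s.lane[i];
    return d;
}

/* Models the primitive vabs(): lane-wise absolute value. */
Vec vabs(Vec d) {
    Vec t;
    for (int i = 0; i < LANES; i++) t.lane[i] = abs(d.lane[i]);
    return t;
}

/* Models "vadiff vT,vR,vT": V[t] = vabs(vsub(V[r], V[t])). */
void vadiff(Vec V[4], unsigned t, unsigned r) {
    assert(t < 4 && r < 4);  /* reg V[4] declares four vector registers */
    V[t] = vabs(vsub(V[r], V[t]));
}
```

In the nML description both primitives are chained within stage E1, so the instruction reads two registers and writes the result back to V[t] in one operation; the C model only reproduces the data flow, not the timing.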

Page 7: Ideas for the design of an ASIP for LQCD


Agenda

ASIPs and IP Designer

EURETILE platform

An ASIP for LQCD

Page 8: Ideas for the design of an ASIP for LQCD


EURETILE hardware platform

Communication
• DNP

Control
• RISC

Computation
• DSP
• ASIPs: specialised towards the application
  − Lattice quantum chromodynamics (LQCD)
  − Neural network (Izhikevich)

[Figure: tile block diagram with blocks labelled DNP, RISC, DSP, MEM and ASIP1]

Page 9: Ideas for the design of an ASIP for LQCD


Agenda

ASIPs and IP Designer

EURETILE platform

An ASIP for LQCD

Page 10: Ideas for the design of an ASIP for LQCD


LQCD ASIP

Goals
• Increase performance
• Decrease gate count or usage of FPGA blocks

Means
• Task level parallelism (multi tile architecture)
• Data level parallelism
• Instruction level parallelism
• Architecture specialisation

Page 11: Ideas for the design of an ASIP for LQCD


LQCD ASIP

Instruction level parallelism

VLIW instruction word: VU_1 … VU_n | LS_0 … LS_m

• Arithmetic operations in parallel with load/store operations
• Appropriate mix of n and m based on feedback from compilation of Qphi() function
• n*m speed improvement over scalar architecture
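One simple way to reason about the mix of n vector units and m load/store units is a resource bound on the loop body. The sketch below is an idealised estimate, not the method used by the compiler: it ignores data dependencies and latencies, and the function names and any operation counts passed to it are hypothetical rather than measurements of Qphi().

```c
#include <assert.h>

/* Ceiling division for unsigned integers, b must be non-zero. */
unsigned ceil_div(unsigned a, unsigned b) {
    assert(b > 0);
    return (a + b - 1) / b;
}

/* Idealised lower bound on cycles for one loop body:
 * arithmetic operations share n vector-unit slots,
 * memory operations share m load/store slots,
 * and both kinds issue in parallel in the same VLIW word. */
unsigned vliw_cycles(unsigned arith_ops, unsigned mem_ops,
                     unsigned n, unsigned m) {
    unsigned arith_bound = ceil_div(arith_ops, n);
    unsigned mem_bound = ceil_div(mem_ops, m);
    return arith_bound > mem_bound ? arith_bound : mem_bound;
}
```

With made-up counts of 12 arithmetic and 6 load/store operations, vliw_cycles(12, 6, 2, 1) and vliw_cycles(12, 6, 4, 1) both give 6, because the single load/store slot becomes the limit; adding a second load/store unit, vliw_cycles(12, 6, 4, 2), brings the bound down to 3. This is the kind of trade-off that compiler feedback would make visible.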

Data level parallelism

c1 c2 c3

• 3-way SIMD fits with SU(3) matrix algebra
• 3x speed improvement over scalar architecture
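To illustrate why three lanes match SU(3) algebra, the sketch below multiplies a 3x3 complex matrix by a three-component colour vector, organised so that every step updates the components c1, c2, c3 together. The types Cplx and ColourVec, the column-wise storage and the function names are assumptions made for this example; the slides do not specify the data layout.

```c
#include <assert.h>

typedef struct { float re, im; } Cplx;      /* hypothetical complex type */
typedef struct { Cplx c[3]; } ColourVec;    /* components c1, c2, c3 */

/* y += col * x: the same complex multiply-accumulate on all three
 * components, i.e. one operation on a 3-way SIMD unit. */
ColourVec cv_mac(ColourVec y, ColourVec col, Cplx x) {
    for (int k = 0; k < 3; k++) {
        y.c[k].re += col.c[k].re * x.re - col.c[k].im * x.im;
        y.c[k].im += col.c[k].re * x.im + col.c[k].im * x.re;
    }
    return y;
}

/* y = U * x, with U stored as three columns (assumed layout). */
ColourVec su3_mul(const ColourVec U_col[3], ColourVec x) {
    assert(U_col != 0);  /* guard against a null matrix pointer */
    ColourVec y = {{{0.0f, 0.0f}, {0.0f, 0.0f}, {0.0f, 0.0f}}};
    for (int j = 0; j < 3; j++)
        y = cv_mac(y, U_col[j], x.c[j]);
    return y;
}
```

The scalar loop over k in cv_mac is what a 3-way SIMD unit would execute in a single operation, so the matrix-vector product needs three vector multiply-accumulates instead of nine scalar ones.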

Page 12: Ideas for the design of an ASIP for LQCD


LQCD ASIP

Architecture specialisation: complex floating point operations:

C + C, C + i*C → 2x speedup over scalar architecture

C – C, C – i*C

C * R → 4x speedup over scalar architecture

C * C → 8x speedup over scalar architecture
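The scalar work behind each complex operation can be written out in plain C. The slide does not say how the speedup factors are counted; the operation counts in the comments are simple arithmetic, and the suggestion that the 4x and 8x figures include an accumulation step is an assumption. Cplx and all function names are hypothetical.

```c
#include <assert.h>

typedef struct { float re, im; } Cplx;  /* hypothetical complex type */

/* C + C: 2 real additions. */
Cplx c_add(Cplx a, Cplx b) { return (Cplx){a.re + b.re, a.im + b.im}; }

/* C + i*C: i*(x + iy) = -y + ix, so 2 real add/sub and no multiply. */
Cplx c_add_i(Cplx a, Cplx b) { return (Cplx){a.re - b.im, a.im + b.re}; }

/* C * R: 2 real multiplications. */
Cplx c_mul_r(Cplx a, float r) { return (Cplx){a.re * r, a.im * r}; }

/* C * C: 4 real multiplications + 2 real add/sub = 6 operations. */
Cplx c_mul(Cplx a, Cplx b) {
    return (Cplx){a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}

/* acc += a * b: 4 multiplications + 4 add/sub = 8 operations. */
Cplx c_mac(Cplx acc, Cplx a, Cplx b) {
    Cplx p = c_mul(a, b);
    return c_add(acc, p);
}

/* Small self-check: i * i == -1. */
void c_ops_selfcheck(void) {
    Cplx i = {0.0f, 1.0f};
    Cplx p = c_mul(i, i);
    assert(p.re == -1.0f && p.im == 0.0f);
}
```

Under that reading, an accumulating C * R costs 2 multiplications plus 2 additions (4 operations) and an accumulating C * C costs 8, which would line up with the 4x and 8x figures when each is executed as a single instruction.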

Behaviour of floating point operations
• Defined in a C dialect intended for the modelling of functional units
• Translated into simulation and implementation (RTL) models
• Synthesis on standard cell library, mapping on FPGA primitives

Vector types and operators defined for the C compiler

vector v1, va[4], vb[4];

v1 += va[0] * vb[1];

Page 13: Ideas for the design of an ASIP for LQCD


LQCD ASIP

Architecture specialisation: address generation
• Goal: Vector units should be used every cycle, address generation must be done in parallel
• How: to be investigated, after feedback from C compiler!

Deliverables
• SDK (Compiler, Assembler, Linker, Simulator, Debugger) based on IP Designer
• SystemC model
• RTL Model + FPGA mapping


