Fondazione Silvio Tronchetti Provera Spatial Computation and Compiler techniques for configurable Architectures Milan - July ’07 Alberto Gallini Dept. of Information, Systems and Communication, University of Milan Bicocca.

3D-DRESD Alberto Gallini


Page 1: 3D-DRESD Alberto Gallini

Fondazione Silvio Tronchetti Provera

Spatial Computation and Compiler techniques for configurable Architectures

Milan - July ’07

Alberto Gallini

Dept. of Information, Systems and Communication, University of Milan Bicocca.

Page 2: 3D-DRESD Alberto Gallini


OUTLINE

1. Intro

2. A brief look at BME

3. HCL-to-ASCL compilation framework

4. High-level Compilation Layer (HCL)

5. XiRisc+PiCoGa Architecture Specific Compilation Layer (ASCL)

Page 3: 3D-DRESD Alberto Gallini

LIMITS OF SILICON TECHNOLOGY

• Physical problems: leakage currents, threshold voltage control, tunneling, electromigration, high interconnect resistance, crosstalk.

• Communication: it is hard to imagine how any form of globally connected stored-program architecture could be built in a technology where communication even between adjacent switches is difficult.

Complex design, higher and higher costs.

V. Agarwal, H.S. Murukkathampoondi, S.W. Keckler, and D.C. Burger. Clock rate versus IPC: The end of the road for conventional microarchitectures. In International Symposium on Computer Architecture (ISCA), June 2000.

Page 4: 3D-DRESD Alberto Gallini

WHAT’S GOING ON

Intel Quad core … towards multi-cores: tera-scale architecture.

ftp://download.intel.com/research/platform/videos/terascale/terascale_demo.htm

Cell Processor, IBM-Toshiba-Sony (PS3)

Cell is a heterogeneous chip multiprocessor consisting of a 64-bit Power core, augmented with 8 specialized co-processors based on a novel single-instruction multiple-data (SIMD) architecture called SPU (Synergistic Processor Unit), for data-intensive processing as found in cryptography, media and scientific applications. The system is integrated by a coherent on-chip bus.

Intel Accelerator Abstraction Layer (AAL)

AAL targets accelerators based on Field Programmable Gate Arrays (FPGAs) that are tightly coupled to the CPU via the front side bus (FSB). Intel is committed to working with hardware vendors who build FSB-attached accelerator modules, as well as providers of compilers for FPGAs, to integrate AAL into their offerings.

Page 5: 3D-DRESD Alberto Gallini

RESEARCH APPROACH

ELEMENTARY DEVICES:

Molecular devices have different structural and behavioral properties from transistors.

COMPUTATIONAL DEVICES:

These structures will not be immune to structural defects, so defects will have to be managed by the model employed.

NON-STANDARD ARCHITECTURES:

Highly scalable computational architectures (characterized by a great number of unreliable processing elements, PEs) are mapped onto the crystal. As a first step, such architectures could be hybrid (a molecular crystal merged into a regular silicon lattice).

NEW MODELS OF COMPUTATION:

Different models of computation, based on thousands of unreliable, interacting elements.

LINKING TRADITIONAL PROGRAMMING LANGUAGES:

Application deployment and efficient exploitation of spatial resources.

PRODUCTION PROCESS:

The small dimensions of molecular devices will have strong implications for the production process. Regular molecular structures can be synthesized by bottom-up self-assembly techniques.

Page 6: 3D-DRESD Alberto Gallini


BIO MOLECULAR ENGINE

An instrument for analysis and experimentation on architectural solutions

Page 7: 3D-DRESD Alberto Gallini

BME FEATURES & RESULTS

- 3D topology editor

- “Grid” support (RMI technology)

- Java/SystemC integration

- Time management and traffic monitoring

- Papers:

Alberto Gallini, Claudio Ferretti, Giancarlo Mauri, Davide Molteni: Bio-Molecular Engine: A Simulation Environment for Bio-Inspired Architectural Models of Molecular-Scale Devices Based Machines. MSV 2005: 100-106.

Alberto Gallini, Claudio Ferretti, Giancarlo Mauri: Bio Molecular Engine: a bio-inspired environment for models of growing and evolvable computation. GECCO Workshops 2005: 249-256 - ACM Press New York, NY, USA.

Guido Casiraghi, Claudio Ferretti, Alberto Gallini, Giancarlo Mauri: A Membrane Computing System Mapped on an Asynchronous, Distributed Computational Environment. Workshop on Membrane Computing 2005: 159-164.

Page 8: 3D-DRESD Alberto Gallini

COMPILATION FLOW

int fibonacci(int m);

int main(void)
{
    int x;
    for (x = 1; x < 10; x++) {
        printf("\n - fibonacci(%d) = %d", x, fibonacci(x));
    }
}

int fibonacci(int m)
{
    unsigned int f_0 = 0;
    unsigned int f_1 = 1;
    unsigned int f_2, i;

    if (m <= 1) {
        return m;
    } else {
        int i = 2;
        for (i = 2; i <= m; i++) {
            f_2 = f_0 + f_1;
            f_0 = f_1;
            f_1 = f_2;
        }
        return f_2;
    }
}

mov $vr9.s32 <- 0
mov $vr0.u32 <- 0
mov fibonacci.f_0 <- $vr0.u32
ldc $vr1.u32 <- 1
mov fibonacci.f_1 <- $vr1.u32
ldc $vr2.s32 <- 1
ldc $vr3.s32 <- 2
mov fibonacci.i0 <- $vr3.s32
ldc $vr4.s32 <- 2
cvt $vr8.s32 <- fibonacci.f_2
mov fibonacci.f_2 <- $vr5.u32
mov fibonacci.f_0 <- fibonacci.f_1
mov fibonacci.f_1 <- fibonacci.f_2
ldc $vr7.s32 <- 1
add $vr6.s32 <- fibonacci.i0,$vr7.s32
mov fibonacci.i0 <- $vr6.s32

Page 9: 3D-DRESD Alberto Gallini

TARGET ARCHITECTURES

[Figure: block diagram with STANDARD CORE(s) (e.g. VLIW or multiple simple RISC cores), MAIN MEMORY, a configurable core, and Device 1, Device 2, …, Device N; two elements are marked "?"]

Page 10: 3D-DRESD Alberto Gallini


HCL-to-ASCL approach

High-level Compilation Layer (HCL)

ASCL (1) ASCL (2) ASCL (3)

Architecture Specific Compilation Layers

Page 11: 3D-DRESD Alberto Gallini

CODE LIFE-CYCLE

Overview of the logical steps:

• Statements containing loops
• Statements with a high branch rate
• Series of associative operators
• Recursive procedures

ranking

Page 12: 3D-DRESD Alberto Gallini

CFG KERNEL FEATURES (1)

[Figure: CFG loop shapes over basic blocks B1 and B2: while loop, natural loop, self loop]

Loops:+-[1]: a:13,b:4,c:1+--+-[1]: a:14,b:5,c:4+--+--+-[1]: a:18,b:9,c:1+--+--+--+-[1]: a:19,b:10,c:9+--+--+--+--+-[1]: a:28,b:19,c:9+--+--+--+--+--+-[1]: a:37,b:28,c:9+--+--+--+--+--+--+-[1]: a:46,b:37,c:9+--+--+--+--+--+--+--+-[1]: a:55,b:46,c:9+--+--+--+--+--+--+--+--+-[1]: a:64,b:55,c:9+--+--+--+--+--+--+--+--+--+-[1]: a:73,b:64,c:9+--+--+--+--+--+--+--+--+--+--+-[1]: a:82,b:73,c:9+--+--+--+--+--+--+--+--+--+--+--+-[1]: a:91,b:82,c:9+--+--+--+--+--+--+--+--+--+--+--+--+-[1]: a:100,b:91,c:9+--+--+--+--+--+--+--+--+--+--+--+--+-[2]: a:100,b:91,c:-508005+--+--+--+--+--+--+--+--+--+--+--+--+--+-[1]: a:-508105,b:-508096,c:-9+--+--+--+--+--+--+--+--+--+--+--+--+--+-[2]: a:-508105,b:-508096,c:105662236+--+--+--+--+--+--+--+--+--+--+--+--+-[3]: a:100,b:91,c:105154231+--+--+--+--+--+--+--+--+--+--+--+-[2]: a:91,b:82,c:-666677360+--+--+--+--+--+--+--+--+--+--+--+--+-[1]: a:-666677451,b:-666677442,c:-9+--+--+--+--+--+--+--+--+--+--+--+--+-[2]: a:-666677451,b:-666677442,c:-1633047632+--+--+--+--+--+--+--+--+--+--+--+-[3]: a:91,b:82,c:1995242304+--+--+--+--+--+--+--+--+--+--+-[2]: a:82,b:73,c:-664419136+--+--+--+--+--+--+--+--+--+--+--+-[1]: a:-664419218,b:-664419209,c:-9+--+--+--+--+--+--+--+--+--+--+--+-[2]: a:-664419218,b:-664419209,c:-1672992513+--+--+--+--+--+--+--+--+--+--+-[3]: a:82,b:73,c:1957555647+--+--+--+--+--+--+--+--+--+-[2]: a:73,b:64,c:-694279600+--+--+--+--+--+--+--+--+--+--+-[1]: a:-694279673,b:-694279664,c:-9+--+--+--+--+--+--+--+--+--+--+-[2]: a:-694279673,b:-694279664,c:-608504996+--+--+--+--+--+--+--+--+--+-[3]: a:73,b:64,c:-1302784596+--+--+--+--+--+--+--+--+-[2]: a:64,b:55,c:-2022114044+--+--+--+--+--+--+--+--+--+-[1]: a:-2022114108,b:-2022114099,c:-9+--+--+--+--+--+--+--+--+--+-[2]: a:-2022114108,b:-2022114099,c:1586880895+--+--+--+--+--+--+--+--+-[3]: a:64,b:55,c:-435233149+--+--+--+--+--+--+--+-[2]: a:55,b:46,c:1993909992+--+--+--+--+--+--+--+--+-[1]: a:1993909937,b:1993909946,c:1993909937+--+--+--+--+--+--+--+--+-[2]: 
a:1993909937,b:1993909946,c:760083456+--+--+--+--+--+--+--+--+--+-[1]: a:-1233826481,b:-1233826490,c:-1233826481+--+--+--+--+--+--+--+--+--+-[2]: a:-1233826481,b:-1233826490,c:2050415960+--+--+--+--+--+--+--+--+-[3]: a:1993909937,b:1993909946,c:-1484467880+--+--+--+--+--+--+--+-[3]: a:55,b:46,c:509442112+--+--+--+--+--+--+-[2]: a:46,b:37,c:1364159424+--+--+--+--+--+--+--+-[1]: a:1364159378,b:1364159387,c:1364159378+--+--+--+--+--+--+--+-[2]: a:1364159378,b:1364159387,c:803784946+--+--+--+--+--+--+--+--+-[1]: a:-560374432,b:-560374441,c:-560374432+--+--+--+--+--+--+--+--+-[2]: a:-560374432,b:-560374441,c:182405152+--+--+--+--+--+--+--+-[3]: a:1364159378,b:1364159387,c:986190098+--+--+--+--+--+--+-[3]: a:46,b:37,c:-1944617774+--+--+--+--+--+-[2]: a:37,b:28,c:612444176+--+--+--+--+--+--+-[1]: a:612444139,b:612444148,c:612444139+--+--+--+--+--+--+-[2]: a:612444139,b:612444148,c:328475948+--+--+--+--+--+--+--+-[1]: a:-283968191,b:-283968200,c:-283968191+--+--+--+--+--+--+--+-[2]: a:-283968191,b:-283968200,c:-498404480+--+--+--+--+--+--+-[3]: a:612444139,b:612444148,c:-169928532+--+--+--+--+--+-[3]: a:37,b:28,c:442515644+--+--+--+--+-[2]: a:28,b:19,c:-1970456652+--+--+--+--+--+-[1]: a:-1970456680,b:-1970456671,c:-9+--+--+--+--+--+-[2]: a:-1970456680,b:-1970456671,c:1492064119+--+--+--+--+-[3]: a:28,b:19,c:-478392533+--+--+--+-[2]: a:19,b:10,c:-1542473120+--+--+--+--+-[1]: a:-1542473139,b:-1542473130,c:-9+--+--+--+--+-[2]: a:-1542473139,b:-1542473130,c:383838768+--+--+--+-[3]: a:19,b:10,c:-1158634352+--+--+-[2]: a:18,b:9,c:-1344615568+--+--+--+-[1]: a:-1344615586,b:-1344615577,c:-9+--+--+--+-[2]: a:-1344615586,b:-1344615577,c:-478122593+--+--+-[3]: a:18,b:9,c:-1822738161+--+-[2]: a:14,b:5,c:1618198881+--+--+-[1]: a:1618198867,b:1618198876,c:1618198867+--+--+-[2]: a:1618198867,b:1618198876,c:-1774379668+--+--+--+-[1]: a:902388761,b:902388752,c:9+--+--+--+-[2]: a:902388761,b:902388752,c:1563019408+--+--+--+--+-[1]: a:660630647,b:660630656,c:660630647+--+--+--+--+-[2]: 
a:660630647,b:660630656,c:479659548+--+--+--+--+--+-[1]: a:-180971099,b:-180971108,c:-180971099+--+--+--+--+--+-[2]: a:-180971099,b:-180971108,c:2017726440+--+--+--+--+-[3]: a:660630647,b:660630656,c:-1797581308+--+--+--+-[3]: a:902388761,b:902388752,c:-234561900+--+--+-[3]: a:1618198867,b:1618198876,c:-2008941568+--+-[3]: a:14,b:5,c:-390742687+-[2]: a:13,b:4,c:-329972872+--+-[1]: a:-329972885,b:-329972876,c:-9+--+-[2]: a:-329972885,b:-329972876,c:-1203135140+-[3]: a:13,b:4,c:-1533108012

RESULT: -1533108012

Recursive functions:

int f(int a, int b, int depth)
{
    int i = 0;
    int c = a % b;
    if (c == 0) c++;

    if ((a < NUM_ROWS) && (a > 0))
        c = f(c+a, c+b, depth+1);

    c *= (a + 0xff) * (b - 0xfa);

    if (a > 0)
        c += f(c-a, c-b, depth+1);
    return c;
}

Natural loop: the subgraph consisting of the set of nodes containing B1 and all the nodes from which B2 can be reached without passing through B1.
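The natural-loop definition above can be sketched in C. This is only an illustration: the Cfg type, MAX_NODES, natural_loop and example_cfg are hypothetical names, and the five-node graph is an invented example (node 1 plays the role of B1, node 3 the role of B2).

```c
#include <stdbool.h>

#define MAX_NODES 16   /* hypothetical bound, large enough for this example */

/* Predecessor lists of a CFG; npred[n] counts the valid entries in pred[n]. */
typedef struct {
    int n_nodes;
    int npred[MAX_NODES];
    int pred[MAX_NODES][MAX_NODES];
} Cfg;

/* Marks in_loop[v] for every node v of the natural loop of the back edge
   tail -> header: the header plus every node that reaches tail without
   passing through header. The search walks predecessors starting at tail. */
static void natural_loop(const Cfg *g, int tail, int header,
                         bool in_loop[MAX_NODES])
{
    int stack[MAX_NODES];
    int top = 0;

    for (int v = 0; v < MAX_NODES; v++)
        in_loop[v] = false;
    in_loop[header] = true;              /* never expanded, so it bounds the walk */
    if (!in_loop[tail]) {
        in_loop[tail] = true;
        stack[top++] = tail;
    }
    while (top > 0) {
        int v = stack[--top];
        for (int k = 0; k < g->npred[v]; k++) {
            int p = g->pred[v][k];
            if (!in_loop[p]) {
                in_loop[p] = true;
                stack[top++] = p;        /* each node is pushed at most once */
            }
        }
    }
}

/* Invented example: 0 -> 1, 1 -> 2, 2 -> 3, 3 -> 1 (back edge), 1 -> 4. */
static Cfg example_cfg(void)
{
    Cfg g = {0};
    g.n_nodes = 5;
    g.npred[1] = 2; g.pred[1][0] = 0; g.pred[1][1] = 3;
    g.npred[2] = 1; g.pred[2][0] = 1;
    g.npred[3] = 1; g.pred[3][0] = 2;
    g.npred[4] = 1; g.pred[4][0] = 1;
    return g;
}
```

Calling natural_loop(&g, 3, 1, in_loop) on the example graph marks nodes 1, 2 and 3 and leaves the entry node 0 and the exit node 4 outside the loop.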

Page 13: 3D-DRESD Alberto Gallini

KERNEL FEATURES (2)

Nested branch speculation:

if (x > SB_NUM_COLUMN) {
    a += (x * 0xFF);
} else {
    if ((x % 5) == 0) {
        a += x * __ALPHA__;
    } else {
        a += x * __BETA__;
    }
}

Associative operators: loop unrolling & tree execution

Original loop:

j = 0;
while (j < NUM_COLUMNS) {
    for (i = 0; i < NUM_ROWS; i++) {
        sum += m[i][j];
    }
    j++;
}

Unrolled by 8:

j = 0;
while (j < NUM_COLUMNS) {
    for (i = 0; i < NUM_ROWS-8; i += 8) {
        sum += m[i][j];
        sum += m[i+1][j];
        sum += m[i+2][j];
        sum += m[i+3][j];
        sum += m[i+4][j];
        sum += m[i+5][j];
        sum += m[i+6][j];
        sum += m[i+7][j];
    }
    for (; i < NUM_ROWS; i += 1) {
        sum += m[i][j];
    }
    j++;
}

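[Figure: adder tree combining the unrolled terms into sum; data-flow graph of the nested branch over a, x, __ALPHA__, __BETA__, SB_NUM_COLUMN, 5, 0xFF and 0, with operators >, %, ==, * and +]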
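A minimal sketch of what "tree execution" of an associative operator can look like once the loop is unrolled by 8: the eight additions of one iteration are combined pairwise, so the dependency chain is three levels deep instead of eight. The names sum_tree, sum_seq and UNROLL are hypothetical, and a one-dimensional array is used for brevity.

```c
#define UNROLL 8   /* unrolling factor, matching the slide's example */

/* Reference version: one long chain of dependent additions. */
static long sum_seq(const int *v, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Unrolled version: each group of 8 elements is reduced as a balanced tree. */
static long sum_tree(const int *v, int n)
{
    long sum = 0;
    int i = 0;

    for (; i + UNROLL <= n; i += UNROLL) {
        /* level 1: four independent additions */
        long s01 = (long)v[i]     + v[i + 1];
        long s23 = (long)v[i + 2] + v[i + 3];
        long s45 = (long)v[i + 4] + v[i + 5];
        long s67 = (long)v[i + 6] + v[i + 7];
        /* levels 2 and 3: combine the partial sums */
        sum += (s01 + s23) + (s45 + s67);
    }
    for (; i < n; i++)   /* remainder loop for the last n % 8 elements */
        sum += v[i];
    return sum;
}
```

The reassociation is safe here because integer addition is associative; for floating-point data the tree form may round differently from the sequential form.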

Page 14: 3D-DRESD Alberto Gallini

CODE “LINEARIZATION”

1 - Input: source program (ANSI C)

SUIF is exploited as front-end and the following operations are applied:

- dismantle field access expressions to address arithmetic
- dismantle structured returns
- compact multi-way branch statements
- dismantle scope statements
- dismantle if, for and while
- flatten statement lists
- rename colliding symbols
- insert struct padding
- insert struct final padding
- annotate LIR
- loop unrolling + associative operators marking
- tail-recursion elimination
- s2m

Output: annotated LIR in MACHINE-SUIF assembly

Source fragment:

…
if (from <= to) {
    do {
        amp = amplitude[i];
        if (amp < 0)
            sum = sum - amp * SFACTOR;
        else
            sum = sum + amp * SFACTOR;
        for (j = 0; j <= i; j++) {
            *(part_amplitude+i) += amplitude[j];
        }
        i++;
    } while (i <= to);
…

Resulting LIR:

ldc $vr0.s32 <- 0
mov main.from <- $vr0.s32
ldc $vr1.s32 <- 10
mov main.to <- $vr1.s32
mov main.i <- main.from
ldc $vr2.f32 <- "0.0"
mov main.sum <- $vr2.f32
ldc $vr3.s32 <- 10
ldc $vr4.u32 <- 4
cal main.suif_tmp0 <- malloc($vr3.s32,$vr4.u32)
cvt $vr5.p32 <- main.suif_tmp0
…
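Tail-recursion elimination, listed among the front-end operations, can be illustrated with a small before/after pair. This is a hand-written sketch, not output of the framework; gcd_rec and gcd_loop are hypothetical names.

```c
/* Before: the recursive call is the last action of the function (a tail call). */
static unsigned gcd_rec(unsigned a, unsigned b)
{
    if (b == 0)
        return a;
    return gcd_rec(b, a % b);
}

/* After: the tail call becomes a parameter update followed by a jump back
   to the top of the function, i.e. a plain loop with no stack growth. */
static unsigned gcd_loop(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;   /* new value of b */
        a = b;                /* new value of a */
        b = t;
    }
    return a;
}
```

Once the recursion has become a loop, the body is visible to the loop-oriented analyses in the same way as any other loop.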

Page 15: 3D-DRESD Alberto Gallini

INTERMEDIATE REPRESENTATION

2 - Input: LIR in MACHINE-SUIF assembly

MACHSUIF is exploited to get and manipulate the CFG and DFG representations of the body of each procedure.

- il2cfg: the per-procedure CFG graph is obtained
- structural analysis (cf-analysis & df-analysis)
- kernel identification and ranking
- kernel translation to SSA representation
- analysis and optimizations (architecture dependent)

Output: SSA representation of the extracted kernel

Page 16: 3D-DRESD Alberto Gallini

MAPPING

3 - Input: kernel SSA representation + machine code for the standard core

- DFG “refinement” as a function of the architecture constraints:

• physical resources of the programmable core
• efficiency of the computation obtained by the mapped circuit

- Mapping activity definition

• When does a mapping begin?
• How long does it last?
• Is the mapping static or dynamic?
• Is employing it appropriate?

Page 17: 3D-DRESD Alberto Gallini

High-level Compilation Layer (HCL)

[Figure: HCL pipeline]

Stages: C to SUIF; LIR / UNROLL (tail recursion elimination, associative operator analysis); MACHINE-SUIF; CFG; PRE-ANALYSIS (variable tracing, structural analysis initialization); STRUCTURAL ANALYSIS; BACK-END (region kernel analysis, recursive procedure analysis, kernel marking).

Intermediate forms: ANSI-C; SUIF; (unrolled) Machine-SUIF; CFG (for each procedure); annotated CFG, bit-vector results, ctrl-tree.

Page 18: 3D-DRESD Alberto Gallini

STRUCTURAL ANALYSIS IMPLEMENTATION

[Figure: one CFG per function (Cfg function 1, 2, 3, …, n); an example CFG with nodes entry, B1, B2, B3 and exit is reduced into regions (self-loop, if-then, block), which feed kernel identification]

Page 19: 3D-DRESD Alberto Gallini

DF - STRUCTURAL ANALYSIS

[Figure: successive reductions of a CFG with nodes entry, B1, B2, B3 and exit into the abstract nodes entry(a) and B2(a), then entry(b), then entry(c)]

FORWARD problems:

Bottom-up pass (flow functions):

• FB1, FB2, FB3, Fentry
• Fentry(a), FB2(a)
• Fentry(b)
• Fentry(c)

Top-down pass:

• In(entry(c))
• In(entry(b)), In(B3), In(exit)
• In(entry(a)), In(B2(a))
• In(entry), In(B1), In(B2)

BACKWARD problems:

• Out(entry(c))
• Out(entry(b)), Out(B3), Out(exit)
• Out(entry(a)), Out(B2(a))
• Out(entry), Out(B1), Out(B2)

Flow functions:

• Fself-loop = Fbody* = Fbody in(body) = Fbody(in(body))
• Fif-then = (Fthen ∘ FifY) ⊓ FifN = (Fthen ∘ Fif) ⊓ Fif
• Fblock = Fbn ∘ … ∘ Fb1 ∘ Fb0
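A minimal sketch of how such flow functions compose, assuming a reaching-definitions style bit-vector problem in which every flow function has the form F(x) = GEN | (x & PRSV) and the meet is set union. The type Flow and the helpers apply, compose and if_then_out are hypothetical names, not part of the framework.

```c
typedef unsigned Bits;   /* one bit per definition */

/* A gen/preserve flow function: F(x) = gen | (x & prsv). */
typedef struct {
    Bits gen;
    Bits prsv;
} Flow;

static Bits apply(Flow f, Bits in)
{
    return f.gen | (in & f.prsv);
}

/* Flow function of a two-node block: 'first' runs before 'second'.
   second(first(x)) = second.gen | (first.gen & second.prsv)
                                 | (x & first.prsv & second.prsv) */
static Flow compose(Flow first, Flow second)
{
    Flow r;
    r.gen  = second.gen | (first.gen & second.prsv);
    r.prsv = first.prsv & second.prsv;
    return r;
}

/* If-then region: the value leaving the region is the meet (here: union)
   of the path through THEN and the path that skips it. */
static Bits if_then_out(Flow f_if, Flow f_then, Bits in)
{
    Bits after_if = apply(f_if, in);
    return apply(f_then, after_if) | after_if;
}
```

Because gen/preserve functions are closed under composition, a whole block region can be summarized by a single Flow value during the bottom-up pass and applied once during the top-down pass.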

Page 20: 3D-DRESD Alberto Gallini

REGIONS & FLOW FUNCTIONS (1)

Do-while

FDo-while = (Fwhile ∘ Fdo)*

Implementation:

Floop(in(do)) = Fwhile(Fdo(in(do)))
out(Do-while) = Floop(Floop(in(do)))

While-loop

FWhile-loop = (Fbody ∘ Fwhile)*

Implementation:

Floop(in(while)) = Fbody(Fwhile(in(while)))
out(While-loop) = Fwhile(in(while)) ⊓ Fwhile(Floop(in(while))) ⊓ Fwhile(Floop(Floop(in(while))))

If-then-else

FIf-then-else = (Fthen ∘ Fif) ⊓ (Felse ∘ Fif)

Implementation:

in(THEN) = Fif(in(If-then-else))
in(ELSE) = Fif(in(If-then-else))
out(If-then-else) = Fthen(in(THEN)) ⊓ Felse(in(ELSE))

If-then

FIf-then = (Fthen ∘ Fif) ⊓ Fif

Implementation:

in(THEN) = Fif(in(If-then))
out(If-then) = Fthen(in(THEN)) ⊓ in(THEN)

[Figure: region shapes IF / THEN, IF / THEN / ELSE, WHILE / BODY and DO / WHILE]
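The while-loop rule can be tried on concrete bit vectors. The sketch below assumes gen/preserve flow functions with union as the meet; Flow, apply, loop_once and while_loop_out are hypothetical names. The rule unfolds the loop for zero, one and two iterations and merges the three exit values.

```c
typedef unsigned Bits;

/* A gen/preserve flow function: F(x) = gen | (x & prsv). */
typedef struct {
    Bits gen;
    Bits prsv;
} Flow;

static Bits apply(Flow f, Bits in)
{
    return f.gen | (in & f.prsv);
}

/* Floop(x) = Fbody(Fwhile(x)): one trip around the loop. */
static Bits loop_once(Flow f_while, Flow f_body, Bits in)
{
    return apply(f_body, apply(f_while, in));
}

/* out(While-loop): meet of the exits taken after 0, 1 and 2 iterations. */
static Bits while_loop_out(Flow f_while, Flow f_body, Bits in)
{
    Bits once  = loop_once(f_while, f_body, in);
    Bits twice = loop_once(f_while, f_body, once);
    return apply(f_while, in)
         | apply(f_while, once)
         | apply(f_while, twice);
}
```

With a header that generates nothing and preserves everything ({0x00, 0xFF}), a body of {0xF0, 0x01} and an input of 0x0F, the result is 0xFF: every definition can reach the loop exit.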

Page 21: 3D-DRESD Alberto Gallini

REGIONS & FLOW FUNCTIONS (2)

Natural-loop (return)

FNl-return = (Fbody ∘ Fbranch ∘ Fwhile)*

Implementation:

Floop(in(while)) = Fbody(Fbranch(Fwhile(in(while))))
out(Natural-loop) = Fwhile(in(while)) ⊓ Fwhile(Floop(in(while))) ⊓ Fwhile(Floop(Fwhile(Floop(in(while)))))

Natural-loop (break)

FNl-break = (Fbody ∘ Fbranch ∘ Fwhile)*

Implementation:

Floop(in(while)) = Fbody(Fbranch(Fwhile(in(while))))
Floop-branch(in(while)) = Fbranch(Fwhile(in(while)))
out(Natural-loop) = Fwhile(in(while)) ⊓ Floop-branch(in(while)) ⊓ Fwhile(Floop(in(while))) ⊓ Floop-branch(Fwhile(Floop(in(while)))) ⊓ Fwhile(Floop(Floop(in(while))))

[Figure: two natural loops built from WHILE, BODY, BRANCH and POST LOOP nodes, one of them with an additional EXIT node]

Page 22: 3D-DRESD Alberto Gallini

REGIONS & FLOW FUNCTIONS (3)

Block

FBlock = Fbn ∘ … ∘ Fb2 ∘ Fb1

Implementation:

out(BLOCK) = Fbn(…Fb2(Fb1(in(B1)))…)

Natural-loop (continue)

FNl-continue = (Fbody ∘ Fbranch ∘ Fwhile)*

Implementation:

Floop(in(while)) = Fbody(Fbranch(Fwhile(in(while))))
Floop-branch(in(while)) = Fbranch(Fwhile(in(while)))
out(Natural-loop) = Fwhile(in(while))
  ⊓ Fwhile(Floop-branch(in(while)))
  ⊓ Fwhile(Floop(in(while)))
  ⊓ Fwhile(Floop-branch(Fwhile(Floop-branch(in(while)))))
  ⊓ Fwhile(Floop-branch(Fwhile(Floop(in(while)))))
  ⊓ Fwhile(Floop(Fwhile(Floop-branch(in(while)))))
  ⊓ Fwhile(Floop(Fwhile(Floop(in(while)))))

[Figure: a block B1, B2, …, Bn; a natural loop with WHILE, BODY, BRANCH, INCREMENT and POST LOOP nodes, and its reduction to WHILE / IF-THEN-ELSE / POST LOOP and then to WHILE-LOOP / POST LOOP]

Page 23: 3D-DRESD Alberto Gallini

HCL BACK-END

The last pass is essentially the back-end for the ASCLs. It lets the user identify the desired properties and attach annotations to the elements of a control tree at different resolutions:

• regions (i.e. sub-trees of the control tree)
• sets of basic blocks
• sets of instructions

A region-oriented query on the control tree has the following structure:

mark_region(Region *current, RegionType type, Policy p) {
    if (current->get_region_type() == type)
        if (p.check_properties(current))
            if (p.instruction_level_analysis(current)) {
                p.annote_kernel(current);
                return;
            }
    for (cir IN current inner regions)
        if (!cir->isWrapper()) {
            mark_region(cir, type, p);
        }
}

KERNEL I/O INTERFACE:

• Input: the USES vectors uniquely determine the input interface.
• Output: GEN ∩ (USES immediately after the current kernel)

[Figure: kernel node i followed by nodes a and b; IN = USES[i] = <0101 …>, OUT = GEN[i] ∩ (USES[a] ∪ USES[b])]
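The kernel I/O interface rule can be written down directly on bit vectors. This is a sketch with hypothetical names (Bits, kernel_inputs, kernel_outputs); it assumes the output rule is the intersection of the kernel's GEN vector with the union of the USES vectors of the nodes that follow the kernel.

```c
typedef unsigned Bits;   /* one bit per definition */

/* Input interface: exactly the definitions the kernel uses. */
static Bits kernel_inputs(Bits uses_kernel)
{
    return uses_kernel;
}

/* Output interface: definitions generated by the kernel that are used
   by at least one of the nodes immediately after it. */
static Bits kernel_outputs(Bits gen_kernel, const Bits *uses_after, int n_after)
{
    Bits used_later = 0;
    for (int k = 0; k < n_after; k++)
        used_later |= uses_after[k];
    return gen_kernel & used_later;
}
```

For example, with GEN = {4-7} (0xF0) and two following nodes whose USES vectors are {0,3,7} (0x89) and {4} (0x10), the output interface is {4,7} (0x90). These values are chosen purely for illustration.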

Page 24: 3D-DRESD Alberto Gallini

EXAMPLE

[Figure: CFG of fibonacci(), nodes 0 to 9]

int fibonacci(int m);

int main(void)
{
    int x;
    for (x = 1; x < 10; x++) {
        printf("\n - fibonacci(%d) = %d", x, fibonacci(x));
    }
}

int fibonacci(int m)
{
    unsigned int f_0 = 0;
    unsigned int f_1 = 1;
    unsigned int f_2, i;
    if (m <= 1) {
        return m;
    } else {
        for (i = 2; i <= m; i++) {
            f_2 = f_0 + f_1;
            f_0 = f_1;
            f_1 = f_2;
        }
        return f_2;
    }
}


Page 26: 3D-DRESD Alberto Gallini

Example: fibonacci.cfg

**** Node # 0: p s 1 3i
**** Node # 3: ubr p 0i s 8
  instruction [0] : jmp src: dst:
**** Node # 8: ret p 3 s 9
  instruction [0] : src: dst:
  instruction [1] : ldc src: 0, dst: $vr10.s32,
  instruction [2] : ret src: $vr10.s32, dst:
**** Node # 1: cbr p 0 s 2 4
  instruction [0] : src: dst:
  instruction [1] : ldc src: 0, dst: $vr0.u32,
  instruction [2] : mov src: $vr0.u32, dst: fibonacci.f_0,
  instruction [3] : ldc src: 1, dst: $vr1.u32,
  instruction [4] : mov src: $vr1.u32, dst: fibonacci.f_1,
  instruction [5] : ldc src: 1, dst: $vr2.s32,
  instruction [6] : bgt src: fibonacci.m, $vr2.s32, dst:
**** Node # 4: p 1 s 5
  instruction [0] : src: dst:
  instruction [1] : ldc src: 2, dst: $vr3.u32,
  instruction [2] : mov src: $vr3.u32, dst: fibonacci.i,
**** Node # 5: cbr p 4 6 s 6 7
  instruction [0] : src: dst:
  instruction [1] : cvt src: fibonacci.m, dst: $vr4.u32,
  instruction [2] : bgt src: fibonacci.i, $vr4.u32, dst:
**** Node # 7: ret p 5 s 9
  instruction [0] : src: dst:
  instruction [1] : cvt src: fibonacci.f_2, dst: $vr9.s32,
  instruction [2] : ret src: $vr9.s32, dst:
**** Node # 6: ubr p 5 s 5
  instruction [0] : add src: fibonacci.f_0, fibonacci.f_1, dst: $vr5.u32,
  instruction [1] : mov src: $vr5.u32, dst: fibonacci.f_2,
  instruction [2] : mov src: fibonacci.f_1, dst: fibonacci.f_0,
  instruction [3] : mov src: fibonacci.f_2, dst: fibonacci.f_1,
  instruction [4] : ldc src: 1, dst: $vr8.s32,
  instruction [5] : cvt src: $vr8.s32, dst: $vr7.u32,
  instruction [6] : add src: fibonacci.i, $vr7.u32, dst: $vr6.u32,
  instruction [7] : mov src: $vr6.u32, dst: fibonacci.i,
  instruction [8] : jmp src: dst:
**** Node # 2: ret p 1 s 9
  instruction [0] : ret src: fibonacci.m, dst:
**** Node # 9: p 2 7 8 s

[Figure: CFG of fibonacci(), nodes 0 to 9]

Page 27: 3D-DRESD Alberto Gallini

Example: fibonacci structural analysis

Control tree:

+[root]-> 0 1 9
|+--- (1)[if then else] ->1 2 4
|+---+---+--- (4)[block] ->4 5 7
|+---+---+---+---+---+--- (5)[while loop] ->5 6

[Figure: control tree construction in steps 1) to 5); the CFG is reduced step by step (nodes 5 and 6, then 4, 5 and 7, then 1, 2 and 4) until only the root region over nodes 0, 1 and 9 remains]

Page 28: 3D-DRESD Alberto Gallini

Pre-analysis info: GEN, PRSV, REACH-IN

GEN BIT VECTOR node(0), length 1 : {0}
GEN BIT VECTOR node(3), length 1 : {}
GEN BIT VECTOR node(8), length 1 : {}
GEN BIT VECTOR node(1), length 3 : {1-2}
GEN BIT VECTOR node(4), length 4 : {3}
GEN BIT VECTOR node(5), length 4 : {}
GEN BIT VECTOR node(7), length 4 : {}
GEN BIT VECTOR node(6), length 8 : {4-7}
GEN BIT VECTOR node(2), length 8 : {}
GEN BIT VECTOR node(9), length 8 : {}

PRSV BIT VECTOR node(0): {0-7}
PRSV BIT VECTOR node(3): {0-7}
PRSV BIT VECTOR node(8): {0-7}
PRSV BIT VECTOR node(1): {0,3-4,7}
PRSV BIT VECTOR node(4): {0-2,4-6}
PRSV BIT VECTOR node(5): {0-7}
PRSV BIT VECTOR node(7): {0-7}
PRSV BIT VECTOR node(6): {0}
PRSV BIT VECTOR node(2): {0-7}
PRSV BIT VECTOR node(9): {0-7}

REACH-IN bit-vector, node (0) : {}
REACH-IN bit-vector, node (1) : {0}
REACH-IN bit-vector, node (4) : {0-2}
REACH-IN bit-vector, node (5) : {0-7}
REACH-IN bit-vector, node (7) : {0-7}
REACH-IN bit-vector, node (6) : {0-7}
REACH-IN bit-vector, node (2) : {0-2}
REACH-IN bit-vector, node (9) : {0-7}

USES bit-vector, node (0) : {}
USES bit-vector, node (1) : {0} [m]
USES bit-vector, node (4) : {}
USES bit-vector, node (5) : {0,3,7} [m][i][i]
USES bit-vector, node (7) : {4} [f_2]
USES bit-vector, node (6) : {1-7} [f_0][f_1][i][f_2][f_0][f_1][i]
USES bit-vector, node (2) : {0} [m]
USES bit-vector, node (9) : {}

[Figure: CFG of fibonacci(), nodes 0 to 9]
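The REACH-IN vectors above can be cross-checked with a few lines of C. Note that this sketch swaps in a plain iterative fixed-point computation instead of the structural analysis used by the framework; the GEN and PRSV values and the predecessor lists are transcribed from the fibonacci dumps, while the names Bits, PRED and reach_in are hypothetical.

```c
#include <stdbool.h>

#define N_NODES 10
typedef unsigned Bits;   /* bit k stands for definition k */

/* GEN and PRSV per node, indexed by node number. */
static const Bits GEN[N_NODES]  = {0x01, 0x06, 0x00, 0x00, 0x08,
                                   0x00, 0xF0, 0x00, 0x00, 0x00};
static const Bits PRSV[N_NODES] = {0xFF, 0x99, 0xFF, 0xFF, 0x77,
                                   0xFF, 0x01, 0xFF, 0xFF, 0xFF};

/* Predecessors of each node ("p" lists of the CFG dump), -1 terminated. */
static const int PRED[N_NODES][4] = {
    {-1}, {0, -1}, {1, -1}, {0, -1}, {1, -1},
    {4, 6, -1}, {5, -1}, {5, -1}, {3, -1}, {2, 7, 8, -1}
};

/* Iterates in(v) = union of out(p) over predecessors p and
   out(v) = GEN[v] | (in(v) & PRSV[v]) until nothing changes. */
static void reach_in(Bits in[N_NODES])
{
    Bits out[N_NODES];
    bool changed = true;

    for (int v = 0; v < N_NODES; v++) {
        in[v] = 0;
        out[v] = GEN[v];
    }
    while (changed) {
        changed = false;
        for (int v = 0; v < N_NODES; v++) {
            Bits new_in = 0;
            for (int k = 0; k < 4 && PRED[v][k] >= 0; k++)
                new_in |= out[PRED[v][k]];
            Bits new_out = GEN[v] | (new_in & PRSV[v]);
            if (new_in != in[v] || new_out != out[v]) {
                in[v] = new_in;
                out[v] = new_out;
                changed = true;
            }
        }
    }
}
```

On this CFG the fixed point agrees with the listed values, for example in[4] = 0x07 ({0-2}) and in[5] = 0xFF ({0-7}).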

Page 29: 3D-DRESD Alberto Gallini

KERNEL MARKED

[BASIC-BLOCK] good
**** Node # 6: ubr p 5 s 5
[+] instruction [0] : add src: , , dst: ,
 |
 +---- annote : line
 +---- annote : *** while_instruction
 +---- annote : RCHIN
 +---- annote : USES
 +---- annote : GEN
[+] instruction [1] : mov src: , dst: ,
 |
 +---- annote : line
 +---- annote : *** while_instruction
[+] instruction [2] : mov src: , dst: ,
 |
 +---- annote : line
 +---- annote : *** while_instruction
[+] instruction [3] : mov src: , dst: ,
 |
 +---- annote : line
 +---- annote : *** while_instruction
[+] instruction [4] : ldc src: , dst: ,
 |
 +---- annote : *** while_instruction
[+] instruction [5] : add src: , , dst: ,
 |
 +---- annote : *** while_instruction
[+] instruction [6] : mov src: , dst: ,
 |
 +---- annote : *** while_instruction

[Figure: CFG of fibonacci(), nodes 0 to 9]

Page 30: 3D-DRESD Alberto Gallini

iquant1_non_intra_fixed

static void iquant1_non_intra_fixed(si16_t *src, si16_t *dst, ui8_t *quant_mat, si32_t mquant)
{
    si32_t i, val, pro;

    for (i=0; i<64; i++) {
        val = src[i];
        if (val!=0) {
            pro = (2*val+(val>0 ? 1 : -1))*quant_mat[i]*mquant;
            val = (pro>=0) ? (si32_t) (pro>>5) : (si32_t) ((pro + 31)>>5);
            /* mismatch control */
            if ((val&1)==0 && val!=0)
                val += (val>0) ? -1 : 1;
        }
        /* saturation */
        dst[i] = (val>2047) ? 2047 : ((val<-2048) ? -2048 : val);
    }
}

Control tree:

[root]-> 0 2 25
+--- (0)[block] ->0 1
+--- (2)[while loop] ->2 3
+---+---+--- (3)[block] ->3 18 24
+---+---+---+---+--- (3)[if then] ->3 4
+---+---+---+---+---+--- (4)[block] ->4 10 13
+---+---+---+---+---+---+---+--- (4)[block] ->4 7
+---+---+---+---+---+---+---+---+--- (4)[if then else] ->4 5 6
+---+---+---+---+---+---+---+---+--- (7)[if then else] ->7 8 9
+---+---+---+---+---+---+---+--- (10)[if then] ->10 11
+---+---+---+---+---+---+---+---+---+--- (11)[if then] ->11 12
+---+---+---+---+---+---+---+--- (13)[if then] ->13 14
+---+---+---+---+---+---+---+---+---+---+--- (14)[block] ->14 17
+---+---+---+---+---+---+---+---+---+---+---+---+--- (14)[if then else] ->14 15 16
+---+---+---+---+--- (18)[if then else] ->18 19 20
+---+---+---+---+---+---+--- (20)[block] ->20 23
+---+---+---+---+---+---+---+---+---+--- (20)[if then else] ->20 21 22
+--- (25)[block] ->25 26

[Figure: graphical control tree of iquant1_non_intra_fixed]

Page 31: 3D-DRESD Alberto Gallini

CASE STUDY: XiRisc+PiCoGa and Griffy-C

PiCoGa & Griffy-C:

• Computation is broken into DFG NODES, each at most as complex as a 32-bit ANSI-C standard operator.

• For each DFG node it is possible to specify the SIZE, in order to avoid wasting PiCoGA resources.

• Shifts, concatenations and bitwise operators by constants are considered routing-only operators, as they do not require RLCs for implementation.

Page 32: 3D-DRESD Alberto Gallini

C TO GRIFFY-C

Flow: C to SUIF, LIR, MACHINE-SUIF, CFG, STRUCTURAL ANALYSIS, followed by the steps below.

KERNEL IDENTIFICATION

• innermost while-region;
• “PiCoGa basic block” marking;
• selection of while-region sub-trees containing only PiCoGa basic blocks;
• kernel ranking

KERNEL EXTRACTION

PiCoGa kernel translation

• SSA representation
• Cti → Cmove replacement

GRIFFY-C COMPILER
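Replacing control-transfer instructions with conditional moves can be pictured at the C level as turning a branch into a select. The sketch below is only an analogy for the transformation, written by hand; select_i32, saturate_branchy and saturate_select are hypothetical names, and the saturation bounds are borrowed from the iquant1_non_intra_fixed example.

```c
#include <stdint.h>

/* Branch-free select: returns a when cond is non-zero, otherwise b.
   The mask is all ones or all zeros, so no jump is needed.
   (Converting the result back to int32_t relies on the usual
   two's-complement behaviour of mainstream compilers.) */
static int32_t select_i32(int cond, int32_t a, int32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}

/* Saturation written with control-transfer instructions (branches). */
static int32_t saturate_branchy(int32_t val)
{
    if (val > 2047)
        return 2047;
    if (val < -2048)
        return -2048;
    return val;
}

/* The same computation as a chain of selects: both candidate values are
   computed, and a select picks the right one at each step. */
static int32_t saturate_select(int32_t val)
{
    int32_t low = select_i32(val < -2048, -2048, val);
    return select_i32(val > 2047, 2047, low);
}
```

A body written this way is straight-line code, which is the shape a pure data-flow graph needs.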

Page 33: 3D-DRESD Alberto Gallini

Example: fibonacci.lir (1)

/* Architecture is Linux */
typedef int (*__cp_1) (void);
typedef int (*__cp_2) ();
typedef int (*__cp_3) (int );
typedef char __ar_4[24];
typedef int (*__cp_5) (int );

int main(void);
extern int printf();
int fibonacci(int );

static __ar_4 _fibonacciTmp0 = {10, 32, 45, 32, 102, 105, 98, 111, 110, 97, 99, 99, 105, 40, 37, 100, 41, 32, 61, 32, 32, 37, 100, 0};

int main(void)
{
    int x;
    int suif_tmp00;
    x = (1);
  L3:;
    if ((! (x< 10))) goto L4;
    suif_tmp00 = fibonacci(x);
    (void)printf(_fibonacciTmp0, x, suif_tmp00);
    x = ((x+1));
    goto L3;
  L4:;
    /*#line 8 ""*/
    return 0 ;
}

Page 34: 3D-DRESD Alberto Gallini

Example: fibonacci.lir (2)

int fibonacci(int m)
{
    unsigned int f_0;
    unsigned int f_1;
    unsigned int f_2;
    unsigned int i;
    /*#line 11 "./test/fibonacci/fibonacci.c"*/
    f_0 = (0U);
    /*#line 11 "./test/fibonacci/fibonacci.c"*/
    f_1 = (1U);
    if ((! (m<= 1))) goto L2;
    /*#line 18 "./test/fibonacci/fibonacci.c"*/
    return m ;
  L2:;
    i = (2U);
  L5:;
    if ((! (i<= ((unsigned int )(m ))))) goto L6;
    /*#line 22 "./test/fibonacci/fibonacci.c"*/
    f_2 = ((f_0+f_1));
    /*#line 23 "./test/fibonacci/fibonacci.c"*/
    f_0 = (f_1);
    /*#line 24 "./test/fibonacci/fibonacci.c"*/
    f_1 = (f_2);
    i = ((i+((1 ))));
    goto L5;
  L6:;
    /*#line 26 "./test/fibonacci/fibonacci.c"*/
    return ((int )(f_2 )) ;
    return 0 ;
}

Page 35: 3D-DRESD Alberto Gallini

Example: fibonacci.svm (1)

/* target_lib: "suifvm"] */
/* Generated automatically by Machine SUIF */

#include <suifvm/c_printer_defs.h>

int main();
int printf();
int fibonacci();
static char _fibonacciTmp0[24];

static char _fibonacciTmp0[24] =
{10, 32, 45, 32, 102, 105, 98, 111, 110, 97, 99, 99, 105, 40, 37, 100, 41, 32, 61, 32, 32, 37, 100, 0};

int main()
{
    int x;
    int suif_tmp00;

    /* Virtual register declarations */
    int _vr0;
    int _vr1;
    void * _vr2;
    int _vr3;
    int _vr4;
    int _vr5;

    _vr0 = 1;
    x = _vr0;
L3:
    _vr1 = 10;
    if (x >= _vr1) goto L4;
    suif_tmp00 = fibonacci(x);
    _vr2 = (void *)&_fibonacciTmp0;
    printf(_vr2, x, suif_tmp00);
    _vr4 = 1;
    _vr3 = x + _vr4;
    x = _vr3;
    goto L3;
L4:
    _vr5 = 0;
    return _vr5;
} /* end of main */

Page 36: 3D-DRESD Alberto Gallini

Example: fibonacci.svm (2)

int
fibonacci(int m)
{
    unsigned int f_0;
    unsigned int f_1;
    unsigned int f_2;
    unsigned int i;

    /* Virtual register declarations */
    unsigned int _vr0;
    unsigned int _vr1;
    int _vr2;
    unsigned int _vr3;
    unsigned int _vr4;
    unsigned int _vr5;
    unsigned int _vr6;
    unsigned int _vr7;
    int _vr8;
    int _vr9;
    int _vr10;

    _vr0 = 0;
    f_0 = _vr0;
    _vr1 = 1;
    f_1 = _vr1;
    _vr2 = 1;
    if (m > _vr2) goto L2;
    return m;
    goto L1;

L2:
    _vr3 = 2;
    i = _vr3;

L5:
    _vr4 = (unsigned int)m;
    if (i > _vr4) goto L6;
    _vr5 = f_0 + f_1;
    f_2 = _vr5;
    f_0 = f_1;
    f_1 = f_2;
    _vr8 = 1;
    _vr7 = (unsigned int)_vr8;
    _vr6 = i + _vr7;
    i = _vr6;
    goto L5;

L6:
    _vr9 = (int)f_2;
    return _vr9;

L1:
    _vr10 = 0;
    return _vr10;

} /* end of fibonacci */
