Compiler Design for Digital Signal Processors · LSI Proprietary 3 Storage Networking Systems...

Preview:

Citation preview

Compiler Design

for Digital Signal Processors

Erik Eckstein

03/25/2010

2LSI Proprietary

LSI Corporation

• LSI is a global leader in semiconductors and software solutions

• Solutions for storage and networking markets

• Founded 1981

• Headquarter: Milpitas, California

• Design centers around the world

– Small design center in Vienna: development of DSP software tools

3LSI Proprietary

StorageNetworking

Systems Semiconductors

Entry RAID Systems

Mid-range RAID

Systems

Host RAID Adapters

Host RAID SW

Data Management

SW

HDD ICs: HDCs, SoCs,

RCs, Pre-Amps

SAS & SAS ROC

Controllers

SAS Expanders, Bridges

SAN Fabric Products

Custom Storage

Products

Custom Products –

Enterprise Datacom

Network Processors

DSPs

Framers/Mappers

Link Layer Products

Modems, USB,

Firewire

Market Focus

4LSI Proprietary

• Digital Signal Processors– Used in mobile phones, wireless devices, infrastructure, etc.

– Audio, video processing

– High performance for signal processing algorithms

– Low power consumption

• Example: StarPro2716– 16 StarCore SC3400e DSP cores @ 750 MHz

– 8 ARM cores for networking packet processing

– Ethernet I/O support

– 6MB on chip memory

– 96 GMACs (96*109 multiply-accumulate instructions/sec)

– Multichannel speech processing in voice gateways

DSP Overview

5LSI Proprietary

DSP Characteristics

• RISC-like architecture

• Multiple parallel units

• VLIW (very long instruction word)

• Multiple memory buses

• 40-bit registers

• Complex addressing modes

• Hardware loops

• Fixed-point arithmetic

6LSI Proprietary

SC3400e Core

PCUprogram control unit

AGUaddress generation unit

DALUdata ALU

ALU

ALU

ALU

ALU

AA

U

AA

U

BM

U

PROGRAM

XPA (32)

XPD (128)

DATA

XAA (32)

XAD.in XAD.out (64) (64)

XBA (32)

2 FETCH BUFFERS

16DATA

REGISTERS

27ADDRESS

REGISTERS

7LSI Proprietary

DSP-C Compiler Requirements

• Support for all DSP features– with language extensions (fractional data types)

– with automatic recognition

• Optimizing for speed– time critical code

• Optimizing for code size– control code

8LSI Proprietary

high intermediate representation(general)

Compiler Design

low intermediate representation(processor specific)

component component component...

component

component

component

9LSI Proprietary

Code Generator

C Code Parser

Optimizations on Low Level IR

Optimizations on High Level IR

Front End

Back End

Compilation Stages

10LSI Proprietary

addr of addr of

High Level IR Example

assign

addr of

obj "b"

+

read read

obj "c"

obj "a"

a = b + c;

11LSI Proprietary

Low Level IR Example

move.w #<32,r0

suba r1,r0

asll d7,d1

move.w #<-1,d5 move.l (r6),d4

12LSI Proprietary

Optimizations are Important!

• Compiler contains approx. 50 different optimizations– 70% high level optimizations

– 30% low level optimizations

• Most optimization are invoked several times– up to 17 times!

13LSI Proprietary

Optimization Example: FIR Filter

long fir(short *x, short *y)

{

long sum = 0;

int i;

for (i = 0; i < 128; i++) {

sum += x[i] * y[i];

}

return sum;

}

14LSI Proprietary

Loop Strength Reduction

long fir(short *x, short *y)

{

long sum = 0;

int i;

for (i = 0; i < 128; i++) {

sum += *x * *y;

x++;

y++;

}

return sum;

}

15LSI Proprietary

Hardware Loop Support

long fir(short *x, short *y)

{

long sum = 0;

int i;

loop 128 {

sum += *x * *y;

x++;

y++;

}

return sum;

}

16LSI Proprietary

Instruction Selection

_fir

doensh3 #128

clr sum

loopstart3

move.w (x),tmp1

move.w (y),tmp2

imac tmp2,tmp1,sum

adda #4,x

adda #4,y

loopend3

rts

17LSI Proprietary

Register Allocation

_fir

doensh3 #128

clr d0

loopstart3

move.w (r0),d1

move.w (r1),d2

imac d2,d1,d0

adda #4,r0

adda #4,r1

loopend3

rts

5 cycles/iteration

18LSI Proprietary

Addressing Mode Optimization

_fir

doensh3 #128

sub d0,d0,d0

loopstart3

move.w (r0)+,d1

move.w (r1)+,d2

imac d2,d1,d0

loopend3

rts

3 cycles/iteration

19LSI Proprietary

Instruction Scheduling

_fir

doensh3 #128

sub d0,d0,d0

loopstart3

[ move.w (r0)+,d1

move.w (r1)+,d2 ]*)

imac d2,d1,d0

loopend3

rts

2 cycles/iteration*) executed in parallel

20LSI Proprietary

Software Pipelining

1 cycles/iteration

_fir

sub d0,d0,d0

doensh3 #127

[ move.w (r0)+,d1 move.w (r1)+,d2 ]

loopstart3

[ move.w(r0)+,d1

move.w(r1)+,d2

imac d2,d1,d0 ]

loopend3

imac d2,d1,d0

rts

21LSI Proprietary

Loop Transformation

0.25 cycles/iteration

_fir

...

loopstart3

[ move.4w (r0)+,d0:d1:d2:d3

move.4w (r1)+,d8:d9:d10:d11

imac d0,d8,d12

imac d1,d9,d13

imac d2,d10,d14

imac d3,d11,d1 ]

loopend3

...

Questions?

23LSI Proprietary

Recommended