Dynamic Support of Processor Extensions in Cross Development Tools

Vladimir Rubanov
Institute for System Programming of RAS

SYRCoSE 2007, 31 May 2007

Extensible Embedded System (1)

System Components:

• Processor Core
• Processor Extensions = Accelerators (new FUs or co-processors)

Memory Subsystem:

• A single program memory.
• Core’s data memories.
• Shared data memory.
• Accelerators' local data memories.

Uniform instruction set for the application developer.

Extensible Embedded System (2)

[Figure: block diagram of the extensible embedded system. Labeled elements: Processor Core P; Program Memory PM; Core’s Local Memory MP; Core’s Data Memories Mp; Internal Core’s Memory {Ep, Xp}; Shared Memory MS; Accelerators Ai; Accelerators’ Local Memories {Ma}; i-th Accelerator’s Local Memory Mai; Accelerator’s Internal Memory Ea. Arrows distinguish Control Flow from Data Flow.]

Accelerator Model

[Figure: accelerator model. Labeled elements: Accelerator; Execution Memory Ea; Main Accelerator Memory MA; Decoder DAc; Execution Block EA; Shared Memory MS; Local Memory Ma; CA; AA; entries f1 t1 p1 ... fn tn pn. Arrows are marked Control Flow, Data Flow and Activation.]

System Design Process

• Stage 1: Core Design
• Stage 2: Accelerators Design
• Stage 3: SoC System Design (combining the core and selected extensions)

Cross Development Tools

• Cycle-Accurate SW Simulator
• Profilers
• Macro Assembler
• Disassembler
• Linker and Librarian
• Visual Debugger
• Integrated Development Environment (IDE)

Cross Development Workflow

[Figure: cross development workflow. Labeled elements: C Source Code; Compiler; Assembly Sources (.asm); Assembler; Object Code; Linker; Absolute Binary Module; Simulator; Debugger; Profilers; Analysis Tools; Integrated Development Environment (IDE); User.]

Cross Development Tools Role

At the Design Stage:

• Design space exploration and prototyping by simulator-based profiling.
• Early development of optimized software.
• HDL verification.

At the Deployment Stage:

• Development of various production software.

Formal Accelerator Description

Special language (ISE) for describing:

• Accelerator’s Memory Structure
• Accelerator’s Resources
• Accelerator’s Instruction Set:
  • assembly syntax;
  • binary coding;
  • cycle-accurate behavior and resource usage.

Memory Structure

DECLARE_MEMORY(INT(16, 3), 4096) LDM;
DECLARE_MEMORY(INT(64, 3), 2048) TM;
MEMORY(LDM, "Acc LDM");
MEMORY(TM, "Acc TM");

DECLARE_REGISTERS_FILE(INT(16), 4) grn;

REGFILE_BEGIN(grn, "General Registers")
    REGISTER(0, "GR0");
    REGISTER(1, "GR1");
    REGISTER(2, "GR2");
    REGISTER(3, "GR3");
REGFILE_END()

Accelerator Instruction Set (1)

.types
grn = [GR0:0] [GR1:1] [GR2:2] [GR3:3]
acr = [ACR1:0] [ACR2:1]

.operands
GRs = {grn : SS}
GRt = {grn : TT}
ACRa = {acr : A}
ACRb = {acr : B}

Accelerator Instruction Set (2)

.instructions
ALU01 {
    ADD GRs, GRt            // syntax
    0110-00SS-0111-T0-T1    // coding
    constraints {
        GRs<>GRt : "GRs and GRt must be different"
    }
    properties {
        wgrn:GRs, rgrn:GRs, rgrn:GRt
    }
}
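To make the coding line concrete, here is a minimal Python sketch of how an assembler might pack `ADD GRs, GRt` into the 16-bit pattern `0110-00SS-0111-T0-T1` and enforce the `GRs<>GRt` constraint. The names `encode_add`, `GRN` and `ADD_BASE` are hypothetical, and treating the first `T` as the high bit of the register index is an assumption; none of this is taken from the actual ISE tooling.

```python
# Hypothetical helper; the real assembler logic is generated from the ISE description.
GRN = {"GR0": 0, "GR1": 1, "GR2": 2, "GR3": 3}  # values from the .types section

# Fixed bits of 0110-00SS-0111-T0-T1 with SS and both T bits cleared.
ADD_BASE = 0b0110_0000_0111_0001


def encode_add(grs: str, grt: str) -> int:
    """Return the 16-bit opcode for `ADD grs, grt`."""
    if grs == grt:
        # Mirrors the constraint GRs<>GRt from the instruction definition.
        raise ValueError("GRs and GRt must be different")
    s, t = GRN[grs], GRN[grt]
    word = ADD_BASE
    word |= s << 8                # SS -> bits 9..8
    word |= ((t >> 1) & 1) << 3   # first T -> bit 3 (assumed to be the high bit)
    word |= (t & 1) << 1          # second T -> bit 1
    return word
```

Under these assumptions, `encode_add("GR2", "GR3")` yields `0x627B`.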


Accelerator Instruction Set (3)

ALU01 {
    behavior {
        GRs := GRs + GRt;
        // GRs ≡ grn[#GRs]
    }
}

Accelerator Instruction Set (4)

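// Simulator routine for ALU01 (ADD GRs, GRt), coding 0110-00SS-0111-T0-T1.
void ALU01(OPCODE opcode)
{
    // SS occupies bits 9..8 of the opcode.
    UINT<2> GRs_ind = (opcode >> 8) & 3;
    // TT is split across bits 3 and 1; bit 3 is assumed to be the high bit.
    UINT<2> GRt_ind = (((opcode >> 3) & 1) << 1)
                    | ((opcode >> 1) & 1);

    grn[GRs_ind] = grn[GRs_ind] + grn[GRt_ind];

    FinishCycle();
}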
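The shifts and masks in a routine like this can be derived mechanically from the coding line. The Python sketch below illustrates one possible way to do that; `compile_pattern` and `decode` are hypothetical names, and the convention that the leftmost occurrence of a field letter is its most significant bit is an assumption rather than documented ISE behavior.

```python
def compile_pattern(pattern):
    """Turn a coding string such as '0110-00SS-0111-T0-T1' into
    (mask, match, fields); fields maps a letter to its bit positions, MSB first."""
    bits = pattern.replace("-", "")
    mask = match = 0
    fields = {}
    for pos, ch in zip(range(len(bits) - 1, -1, -1), bits):
        if ch in "01":
            mask |= 1 << pos            # fixed bit: must be compared
            match |= int(ch) << pos     # ...against this value
        else:
            fields.setdefault(ch, []).append(pos)
    return mask, match, fields


def decode(word, compiled):
    """Return {letter: value} if `word` matches the pattern, else None."""
    mask, match, fields = compiled
    if (word & mask) != match:
        return None
    return {
        name: sum(((word >> pos) & 1) << (len(positions) - 1 - i)
                  for i, pos in enumerate(positions))
        for name, positions in fields.items()
    }


ADD = compile_pattern("0110-00SS-0111-T0-T1")
operands = decode(0x627B, ADD)   # field S selects GRs, field T selects GRt
```

A table of such compiled patterns, one per instruction, is also the kind of data a disassembler could use to recognize opcodes.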


Accelerator Instruction Set (5)

MAC01 {
    MAC ACRa, GRs, GRt
    …
    behavior {
        // the first cycle
        mulres := GRs * GRt;
        FinishCycle();
        // the second cycle
        ACRa := ACRa + mulres;
    }
}
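One way to picture a multi-cycle behavior such as MAC01 is as a coroutine that is suspended at every `FinishCycle()`. The Python sketch below uses a generator for this purpose. The `mac01` function, the `state` dictionary and the `run` driver are hypothetical stand-ins for illustration, not MetaDSP code, and the register values are made up.

```python
def mac01(state, acr, grs, grt):
    """Two-cycle multiply-accumulate; each `yield` plays the role of FinishCycle()."""
    # the first cycle
    mulres = state["grn"][grs] * state["grn"][grt]
    yield
    # the second cycle
    state["acr"][acr] += mulres


def run(instruction):
    """Advance an instruction to completion and return the cycles it took."""
    cycles = 0
    for _ in instruction:
        cycles += 1       # one finished cycle per yield
    return cycles + 1     # the last cycle ends when the behavior returns


# Example with made-up register contents: GR1 = 3, GR2 = 4, ACR1 = 10.
state = {"grn": [0, 3, 4, 0], "acr": [10, 0]}
cycles = run(mac01(state, acr=0, grs=1, grt=2))
```

Keeping `mulres` as a local variable of the suspended generator mirrors how the intermediate product survives between the two cycles.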


Inter-Instruction Conflicts

.inter-constraints
[@e2_write_acr = read_acr] % error: "Write After Read conflict for accumulator"
[@p_write && memory_access] % warning: "1 cycle stall: memory access immediately after pointer update"
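A rule of this kind can be checked by scanning adjacent instruction pairs. The Python sketch below does so for the second rule only, under the assumption that the `@` prefix refers to a property of the preceding instruction and the unprefixed name to the current one; the slides do not define this notation, and `check_pointer_stall` is a hypothetical name.

```python
STALL_MESSAGE = "1 cycle stall: memory access immediately after pointer update"


def check_pointer_stall(program):
    """Return (index, message) pairs for every instruction that has the
    `memory_access` property and directly follows one with `p_write`.

    `program` is a list of property sets, one per instruction, in program order.
    """
    warnings = []
    for i in range(1, len(program)):
        if "p_write" in program[i - 1] and "memory_access" in program[i]:
            warnings.append((i, STALL_MESSAGE))
    return warnings


# Made-up instruction stream described only by its properties.
program = [{"p_write"}, {"memory_access"}, {"memory_access"}, set(), {"p_write"}, set()]
found = check_pointer_stall(program)
```

Only the instruction at index 1 is flagged here, since it is the only memory access that immediately follows a pointer update.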


Cross Tools Reconfiguration (1)

1. Accelerator ISE description is created either visually or in plain text.

2. Accelerator description is compiled into a shared library module (.dll on Windows, .so on Linux).

3. Such modules are specified in the cross system configuration.

4. API is used by Core Simulator to execute accelerator instructions as fibers (explicitly controlled threads).

5. API is used by Assembler, Disassembler, Debugger and IDE to extract necessary meta-information about the accelerators.
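Step 4 describes accelerator instructions running as fibers that the Core Simulator resumes explicitly. As a rough illustration of that scheduling idea, the Python sketch below substitutes generators for real fibers: each generator yields once per finished cycle, and a hypothetical `simulate` loop resumes every active one exactly once per simulated cycle. The `busy` instruction and all names are made up for the example.

```python
def simulate(fibers):
    """Run all fibers in lockstep and return the number of simulated cycles.

    Each fiber is a generator that yields once per finished cycle.
    """
    active = list(fibers)
    cycle = 0
    while active:
        cycle += 1
        still_running = []
        for fiber in active:
            try:
                next(fiber)                  # resume until the fiber ends its cycle
                still_running.append(fiber)
            except StopIteration:
                pass                         # instruction completed during this cycle
        active = still_running
    return cycle


def busy(name, cycles, log):
    """Toy accelerator instruction that occupies `cycles` cycles."""
    for c in range(1, cycles + 1):
        log.append((name, c))
        if c < cycles:
            yield                            # hand control back to the simulator


log = []
total = simulate([busy("fft", 3, log), busy("mac", 1, log)])
```

Because control returns to the scheduler only at explicit points, the simulator decides the interleaving, which is what makes a cycle-accurate ordering possible.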


Cross Tools Reconfiguration (2)

[Figure: the Visual ISE Editor produces an ISE Specification; automatic generation turns it into an Accelerator Cross Module (.dll or .so); the Cross Tools pick the module up through on-the-fly reconfiguration.]

MetaDSP: General Layout

MetaDSP: Instruction Set Tree

MetaDSP: Instruction Properties

Results

Generalized model of extensible embedded systems.

A formalism for specifying particular accelerators in ISE language.

Tools for visual editing, analysis and verification of ISE specifications.

Framework for generating executable accelerator modules with meta-information.

Reconfigurable cross development tool chain, dynamically extensible by plugged-in accelerator modules.

MetaDSP Framework

The results have been used in MetaDSP – a framework for fast construction and modification of cross development tools for embedded systems.

Proven in 5 commercial projects for different extensible processor families:

• RISC 32/16 bit DSPs
• ARM-like RISC
• VLIW DSP

Used by customers’ development teams in Sweden, Taiwan, China and the USA.

Implemented Accelerators

• Fast Fourier Transform (FFT).
• Echo cancellation algorithms.
• Complex (imaginary) arithmetic operations.
• Image processing operations (JPEG accelerator).
• Digital voice filtering operations (FIR, IIR).
• Voice coding/decoding (AMR).
• MP3 music decoding.

Assembler

• Parameterized macros support.
• Conditional assembly (if, switch, repeat, etc.).
• Multi-dimensional arrays.
• Constant expression calculation.
• Inter-instruction conflicts detection.
• Automatic NOP insertion.
• C debugging info in DWARF 2 format.

Linker

Memory holes optimization (both at variable and module levels).

Visual interface to control memory layout with advanced features (fixed/floating address, alignment).


SW Simulator

• Cycle-accurate.
• High speed (50 MCPS on a 2000 MHz Core 2).
• Pipeline simulation with stalls, zero-overhead loops, interrupts and timers.
• Dynamically modifiable code support.
• Run-time semantics checks.
• Breakpoints, sample points, trace points.

Debugger and IDE

• Mixed C/Asm/Disasm source level debugging
• Projects support
• Full C expressions in Watch window
• Call Stack with frame switch
• Various display formats for register and memory contents
• Code helper in editor
• Syntax highlighting in editor
• Breakpoints / Samplepoints / Tracepoints / Watchpoints
• Source Browser
• RTOS debugging support
• Various profilers (linear, call graph, instruction tree, RTOS load, RTOS sequence)

Contacts

Vladimir Rubanov
vrub@ispras.ru

http://ispras.ru/groups/igroup/igroup.html