20
Introduction to multiprocessor programming

Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

  • Upload
    others

  • View
    12

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

Introduction to multiprocessor programming

Page 2: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

Why do we need multi-core processors?

n Until ~2005 processor performance increase driven by

l Clock speedl Execution optimizationl Cache

n Power walln ILP wall

è Led to multicore processors

è Parallelism must be exposed by the programmer

(source http://www.gotw.ca/publications/concurrency-ddj.htm)

Page 3: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

Processor Breakthroughs ....

A major architecture disruption: multiprocessing and specialization will have a strong impact on software

Time

Processor Power

Intel 8086

1980 1990 2000

PowerQUICC II

Multi-processing

Processor specialization

2015-2020 (?)

Architectural break point

CISC era

RISC era

Technology limitations: perf. by parallelism no more by frequency =>

disruption in programming model, long term research challenges

Domain oriented architectures: eg withpredictable performance to control the timeliness in RT critical applis, dynamicreconfiguration for adaptive, distributed

critical architectures (multilevel RT composability)

Source:G. Edelin

(Thales), 2009

Page 4: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

Motivation

End user perspective Target architecture perspective

• Explore/Develop algorithms

• Use a simple, comfortable language• E.g. Matlab, Scilab, …

• Don’t want to care about • data types• parallelism

• End result• Performance• Energy efficient• Cost efficient• Fast development time

• Multi-Processor System-on-Chip

• Parallel processor cores• Parallel programming model• E.g. pthreads, MPI, OpenMP

• Parallelism with the processor cores• Single Instruction Multiple Data• Very Long Instruction Word

• Native data types• E.g. 32-bit integer• Other data types perform

inefficient

Page 5: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

ALMA in a Nutshell

n Hide the complexity of the underlying hardware to the end usern ALMA will develop an approach for compiling annotated Scilab

code to MPSoC architecturesn Algorithms and tools for

l High-level, platform-independent application code performance estimation and optimization

l Identification of possible partitions and their placement on different resources of the underlying architectures

l Data type binding and data parallelization to exploit data-level parallelism

n Develop an unified SystemC simulation framework to provide an environment for simulating MPSoCs

n Two state-of-the-art architectures provided by RECORE and KIT

n Net result: smaller application development time/effort and faster time-to-market

Page 6: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

Objectives

n Extend Scilab for optimization on high-level system models

n Develop a parallelization and optimization environment

n Employ and extend two different architectures

n Parallel code generation

n Parallel code simulation

Page 7: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

Challenges for Compiling Scilab to MPSoCs

n Scilab (Matlab-like) programming languagel Dynamic typing (scalars, vectors, matrices)l Pointer-free, i.e. no memory aliasing problemsl End users typically use floating-point data typesl Natural parallelism within vector operations

n MPSoC target architecturesl Exploit coarse-grain parallelism (task-level)

n Distributed memoryl Exploit fine-grain parallelism (instruction-level)

Page 8: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

ALMA Architectures (1/2): Recore X2014

n Scalability by virtue ofl Packet-switched Network-on-Chipl Distributed memories & I/Ol Distributed controll Distributed processing cores

n Xentium® processing tilel Fixed-point DSP processingl 10-issue VLIW processorl SIMD capabilityl Streaming communication services

n Reconfigurabilityl Smart memory tile (RAM/FIFO)l Separate applications from each othersl Guarantee QoSl Fault tolerant application mapping

Page 9: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

ALMA Architectures (2/2): KIT Kahrisma

Proc

esso

r Con

trol

Uni

t

Mai

n M

emor

y

Con

text

Mem

ory

Cac

he S

ubsy

stem

Control Flow Tiles

Rename Tiles

EDPE Array

Instruction Cache Tiles

EDPE

EDPE

EDPE

EDPE

DSP Instance I DSP Instance II

2-issue VLIW 2-issue VLIW ISA4-issue VLIW ISA 6-issue VLIW ISA ProcessorInstances

DSP Instances

Inst

ruct

ion

pre-

proc

essi

ng

EDPE

EDPE

EDPE

EDPE

EDPE

EDPE

EDPE

EDPE

EDPE

EDPE

EDPE

EDPE

EDPE

EDPE

EDPE

EDPE

EDPE

EDPE

EDPE

EDPE

EDPE

EDPE

EDPE

EDPE

…...

…...

…...

…...

…...

…...

n Dynamic reconfigurable MPSoC

n Modules can be reconfigured to processors or DSPs

n Dynamic clustered VLIW processor instances

n Local scratchpad memory n Non-coherent access to

main memoryn Communication between

cores through a network

Page 10: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

ALMA Development Flow (overview)

Optimized application code on multi-core platform

Embedded application design Multi-core hardware design

Translation to Scilab &

annotations

Abstract hardware

description (ADL)

KITC-compiler

Multi-core simulator

Parametersforalgorithmoptimization

C-basedcodewithparalleldescriptions

ALMA algorithm

parallelization tools

Executablebinary(forsimulatorandHW)

Recore C-compiler

Structuralhardwaredescription

Feedbackforoptimization

Page 11: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

Input for the ALMA tools

ALMA dialect of the Scilab language

n Subset of Scilab language

n Extended by a preprocessing language l Variables declarationl Static types specificationl Maximum size of vector and matrix data

types definition

n Extended by an annotation language for supporting parallelism extraction

Applications

Telecom-munication

ImageProcessing

AnnotatedScilabCode

ArchitectureDescription

ADL

ADLCompiler

GeCoSFramework

Fine-GrainParallelismExt.

Corase-GrainParallelismExt.

ParallelCodeGeneration

ALMAIR

ALMAIR

AnnotatedCCode

Target-Spec.Compilation

CCode+Back-Annotation

ALMAMulti-CoreSimulator

Binary

ProfileInformation

JSON

KahrismaCompiler

RecoreCompiler

ALMAArchitectures

KahrismaArch.

RecoreArch.

Iterativ

eOptim

izatio

n

ALMAFront-EndTools

Source-LevelProfiler

ProfileInformation

SciLabFront-End(SAFE)

High-LevelOptimizer

HLIR

HLIR

Legend

ADL

App.Code

Information

Page 12: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

ALMA Front-end tools

n Scilab Front-End (SAFE)n Parses Scilab source code and

produces high level intermediate representation (HLIR) expressed in C

n ALMA profiler (aprof)l Early performance estimation at the

HLIR level

n High-Level Optimizer (HLO)l Applies platform independent

optimizations to the HLIR

Applications

Telecom-munication

ImageProcessing

AnnotatedScilabCode

ArchitectureDescription

ADL

ADLCompiler

GeCoSFramework

Fine-GrainParallelismExt.

Corase-GrainParallelismExt.

ParallelCodeGeneration

ALMAIR

ALMAIR

AnnotatedCCode

Target-Spec.Compilation

CCode+Back-Annotation

ALMAMulti-CoreSimulator

Binary

ProfileInformation

JSON

KahrismaCompiler

RecoreCompiler

ALMAArchitectures

KahrismaArch.

RecoreArch.

Iterativ

eOptim

izatio

n

ALMAFront-EndTools

Source-LevelProfiler

ProfileInformation

SciLabFront-End(SAFE)

High-LevelOptimizer

HLIR

HLIR

Legend

ADL

App.Code

Information

Page 13: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

Parallelization Tools (Fine grain extraction)

Applications

Telecom-munication

ImageProcessing

AnnotatedScilabCode

ArchitectureDescription

ADL

ADLCompiler

GeCoSFramework

Fine-GrainParallelismExt.

Corase-GrainParallelismExt.

ParallelCodeGeneration

ALMAIR

ALMAIR

AnnotatedCCode

Target-Spec.Compilation

CCode+Back-Annotation

ALMAMulti-CoreSimulator

Binary

ProfileInformation

JSON

KahrismaCompiler

RecoreCompiler

ALMAArchitectures

KahrismaArch.

RecoreArch.

Iterativ

eOptim

izatio

n

ALMAFront-EndTools

Source-LevelProfiler

ProfileInformation

SciLabFront-End(SAFE)

High-LevelOptimizer

HLIR

HLIR

Legend

ADL

App.Code

Information

n Floating point to fixed pointl No hardware support for FP in

embedded multi-core systemsl Provides an automated floating to

fixed point conversion tool

n SIMD/SWP parallelizationl Loop parallelization and layout

optimization for SIMD ISA. l Explore perf./accuracy trade-off in

fixed point encodings

Page 14: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

Parallelization Tools (Coarse-grain extraction)

Coarse-grain parallelism extraction and optimization

n Responsible for the global optimizationn Transformation of ALMA IR CFDG to

Hierarchical Task Graph (HTG)n Resource availability from ADLn HTG partitioning to coresn Optimal mapping and scheduling of

tasks to architecture resourcesn Exploits profiling information from the

simulator for better resource usage estimation

Applications

Telecom-munication

ImageProcessing

AnnotatedScilabCode

ArchitectureDescription

ADL

ADLCompiler

GeCoSFramework

Fine-GrainParallelismExt.

Corase-GrainParallelismExt.

ParallelCodeGeneration

ALMAIR

ALMAIR

AnnotatedCCode

Target-Spec.Compilation

CCode+Back-Annotation

ALMAMulti-CoreSimulator

Binary

ProfileInformation

JSON

KahrismaCompiler

RecoreCompiler

ALMAArchitectures

KahrismaArch.

RecoreArch.

Iterativ

eOptim

izatio

n

ALMAFront-EndTools

Source-LevelProfiler

ProfileInformation

SciLabFront-End(SAFE)

High-LevelOptimizer

HLIR

HLIR

Legend

ADL

App.Code

Information

Page 15: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

Parallel platform code generation

Parallel platform code generation

n Generates target-specific C codel Maps Scilab variables to memory

locationsl Expresses communicationl Expresses SIMD instruction as

intrinsicsn Uses Recore/Kahrisma C compiler

l Exploits ILP by VLIW compilationl Generates executable for the

hardware and simulator

Applications

Telecom-munication

ImageProcessing

AnnotatedScilabCode

ArchitectureDescription

ADL

ADLCompiler

GeCoSFramework

Fine-GrainParallelismExt.

Corase-GrainParallelismExt.

ParallelCodeGeneration

ALMAIR

ALMAIR

AnnotatedCCode

Target-Spec.Compilation

CCode+Back-Annotation

ALMAMulti-CoreSimulator

Binary

ProfileInformation

JSON

KahrismaCompiler

RecoreCompiler

ALMAArchitectures

KahrismaArch.

RecoreArch.

Iterativ

eOptim

izatio

n

ALMAFront-EndTools

Source-LevelProfiler

ProfileInformation

SciLabFront-End(SAFE)

High-LevelOptimizer

HLIR

HLIR

Legend

ADL

App.Code

Information

Page 16: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

Multicore architecture simulation

Multicore architecture simulation

n Simulation of ALMA target architectures

n Retargetablel Structure defined by ADLl Implementation by library of SystemC

modulesn Mixed-accuracy simulation

l Behavioural or cycle-accuratel For individual modules (processor core,

memory subsystem, network)n Collect profiling and tracing

information

Applications

Telecom-munication

ImageProcessing

AnnotatedScilabCode

ArchitectureDescription

ADL

ADLCompiler

GeCoSFramework

Fine-GrainParallelismExt.

Corase-GrainParallelismExt.

ParallelCodeGeneration

ALMAIR

ALMAIR

AnnotatedCCode

Target-Spec.Compilation

CCode+Back-Annotation

ALMAMulti-CoreSimulator

Binary

ProfileInformation

JSON

KahrismaCompiler

RecoreCompiler

ALMAArchitectures

KahrismaArch.

RecoreArch.

Iterativ

eOptim

izatio

n

ALMAFront-EndTools

Source-LevelProfiler

ProfileInformation

SciLabFront-End(SAFE)

High-LevelOptimizer

HLIR

HLIR

Legend

ADL

App.Code

Information

Page 17: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

ALMA Architecture Description Language (ADL)

ALMA ADL

n Architecture Description Language (ADL)

n Tailored to the requirements of ALMA1. Enables target independence of the

compilation toolchain2. Used as architecture description for the

simulator3. Enables design-space exploration

n Compact specification of regular MPSoC structures by for and ifconstructs

n Structural specification annotated with behavioural information

Applications

Telecom-munication

ImageProcessing

AnnotatedScilabCode

ArchitectureDescription

ADL

ADLCompiler

GeCoSFramework

Fine-GrainParallelismExt.

Corase-GrainParallelismExt.

ParallelCodeGeneration

ALMAIR

ALMAIR

AnnotatedCCode

Target-Spec.Compilation

CCode+Back-Annotation

ALMAMulti-CoreSimulator

Binary

ProfileInformation

JSON

KahrismaCompiler

RecoreCompiler

ALMAArchitectures

KahrismaArch.

RecoreArch.

Iterativ

eOptim

izatio

n

ALMAFront-EndTools

Source-LevelProfiler

ProfileInformation

SciLabFront-End(SAFE)

High-LevelOptimizer

HLIR

HLIR

Legend

ADL

App.Code

Information

Page 18: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

n IEEE 802.16e PHY Layer in NT x NR MIMO Configurationn Typical example of a state-of-the-art wireless communication system

n Application requirements impose hard, real-time constraints.n Design time must follow shrinking time-to-market.

Test Case (1/2): Telecommunications

Rx 1

Rx NR

FFT

Equalizer

Channel Estimator

Derando mizer

Deinter leaver

Symbol Deconstr

uction

- Cyclic Prefix

DiversityCombiner

- Cyclic PrefixFFT

SDU Generati

on

Data SDUs

Uplink

Frame Deconstr

uction

MAC-PHY I/F

BS Rx

`

ALMA 1st IncrementALMA 2nd IncrementTx 1

Tx NT

FEC Encod

erInterleaver

Constel. Mapping

IFFT + Cyclic Prefix

S-T Coding

IFFT + Cyclic Prefix

+ Pre amble

Data SDUs

PHYMAC

UL/DL Frame Mapper

UL/DL Schedul

er

BS Tx

PDU Generation

MAC-PHY I/F

Frame Construction

DownlinkMAC/PHY Control

Symbol Construction

Randomizer

. . . . . .

. . .

. . .

. . . . . . . . .

FEC Decoder

Const. Demap

Page 19: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

n Feature based algorithm for object recognition and multi-object tracking

n Use of Scale Invariant Feature Transform (SIFT)

n The final goal is to run such applications in smart cameras

Test Case (2/2): Image Processing

Page 20: Introduction to multiprocessor programming › modules › document › file.php › CIED11… · Introduction to multiprocessor programming. Why do we need multi-core processors?

Summary

n ALMA Goal: Hide the complexity of the underlying hardware to the end user

n ALMA will develop an approach for compiling annotated Scilab code to MPSoCs

n ALMA toolchain componentsl Front-end tools (Scilab parser, high-level optimizer, profiler)l Fine-grain parallelism extractionl Coarse-grain parallelism extractionl SystemC multi-core simulator

n ALMA toolchain is kept platform independent by a novel Architecture Description Language

n Two state-of-the-art architectures provided by RECORE and KIT

n Evaluated by two test cases from Telecommunicationsand Image Processing domain