An Execution Model for Heterogeneous Multicore Architectures

Gregory Diamos, Andrew Kerr, and Sudhakar Yalamanchili

Computer Architecture and Systems Laboratory

Center for Experimental Research in Computer Systems

School of Electrical and Computer Engineering

Georgia Institute of Technology


SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Software Challenges of Heterogeneity

Programming Model

Execution Model

Portability

Performance


System Space

Axes: System Size and Configuration; Level of Abstraction

Multi-node, multi-GPU, multicore CPU

Single GPU, multicore CPU

Runtime Execution Model (Harmony)

Runtime Translation of Data-Parallel IR (Ocelot)


Scalable Portable Execution – Harmony Runtime

Figure: Harmony run-time system. Accelerators (ACC), each with local memory, cache, DMA, and FIFO, are connected to CPUs by a network (e.g., HyperTransport, QPI, PCIe). The Harmony run-time manages chunks, each with inputs and outputs in memory.

Transparent scheduling, execution management of chunks

Binary compatibility across system sizes

Cap Model 3:
    readInputs();
    computeInvariants();
    for all chunks {
        simulateChunk();
    }
    generateResults();

Minimize/avoid retuning and porting applications as you add accelerators

Advanced optimizations: speculation, performance prediction, kernel fusion
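
The chunked structure of the Cap Model 3 loop can be sketched in Python as follows. This is an illustration only: the function bodies, the input data, and the thread pool standing in for the Harmony run-time are all assumptions, not Harmony's API. What it is meant to show is that when chunks depend only on their own inputs and the shared invariants, the same program text can run unchanged on one worker or on several.

```python
from concurrent.futures import ThreadPoolExecutor

# All names and bodies below are hypothetical stand-ins for the slide's
# pseudocode; a thread pool plays the role of the scheduling run-time.

def read_inputs():
    # Hypothetical input: 8 chunks of 4 values each.
    return [[float(i * 4 + j) for j in range(4)] for i in range(8)]

def compute_invariants(chunks):
    # Hypothetical invariant shared by every chunk: a global scale factor.
    total = sum(sum(c) for c in chunks)
    return 1.0 / total

def simulate_chunk(chunk, scale):
    # Each chunk reads only its own inputs plus the invariant,
    # so chunks can run in any order or in parallel.
    return sum(x * scale for x in chunk)

def generate_results(partials):
    return sum(partials)

def run(num_workers):
    chunks = read_inputs()
    scale = compute_invariants(chunks)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() returns results in chunk order regardless of completion order.
        partials = list(pool.map(lambda c: simulate_chunk(c, scale), chunks))
    return generate_results(partials)
```

Calling `run(1)` and `run(4)` executes the same code with different worker counts; only the argument that describes the system size changes.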



Emerging Environment

Run Time (Harmony)

CUDA JIT, LLVM I/F, Emulator

GPGPU Simulator

Supported ISAs (MIPS, SPARC, x86, etc.)

Language Front End

Kernel IR

Language Front End

Ocelot

Prof. H. Kim

Status: Single node/multi-GPU

Status: Test and Debug

Status: In progress (Fall 2009)

Status: Summer 2009, with Prof. Nate Clark

Datalog

CUDA/OpenCL


Emerging HVM Platform Architecture

With K. Schwan and A. Gavrilovska


Problem Scaling – Risk Analysis Application

With latest CPUs (2x faster) and GPUs (4x faster), GPU advantage should grow by 2x

Measured execution times

GPU interactive overhead dominates
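
The expectation and the observation on this slide can be related with a simple timing model. The sketch below assumes, as one reading of the slide, that total GPU time is kernel time plus an overhead term that does not shrink when the hardware gets faster. The baseline timings are made-up numbers chosen only to illustrate the arithmetic; they are not the measured execution times.

```python
def gpu_advantage(cpu_time, gpu_kernel_time, gpu_overhead):
    """Ratio of CPU time to total GPU time (kernel plus fixed overhead)."""
    return cpu_time / (gpu_kernel_time + gpu_overhead)

def advantage_growth(cpu_time, gpu_kernel_time, gpu_overhead,
                     cpu_speedup=2.0, gpu_speedup=4.0):
    """How much the GPU advantage grows after the hardware upgrade.

    Assumption: the overhead term is unaffected by either speedup.
    """
    before = gpu_advantage(cpu_time, gpu_kernel_time, gpu_overhead)
    after = gpu_advantage(cpu_time / cpu_speedup,
                          gpu_kernel_time / gpu_speedup,
                          gpu_overhead)
    return after / before

# Hypothetical baseline timings in arbitrary units.
no_overhead = advantage_growth(100.0, 10.0, 0.0)
with_overhead = advantage_growth(100.0, 10.0, 40.0)
```

With no overhead the growth is 4/2 = 2, matching the slide's expectation. Once the overhead term exceeds the kernel time, the growth falls well below 2, and with these example numbers the GPU advantage shrinks rather than grows.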


Other Applications


Execution Group

GPU Compilation Flow

Abstract Syntax Tree (Datalog Clauses)

Figure labels: GPU (EU), shown three times; GPU Core; CPU Core; Data Structures; Compute Kernels; P; Runtime

Clauses to Execution Units

Execution Units to Algorithms (Kernels)

Predicates to Data Structures

Runtime Mapping of Kernels to Cores
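
The four mapping stages listed above can be pictured with a toy pipeline. Everything in this sketch is hypothetical: the `Clause` and `ExecutionUnit` types, the one-unit-per-clause rule, the "join" and "project" kernel names, and the round-robin placement are assumptions made for illustration and do not describe the actual compiler or runtime.

```python
from dataclasses import dataclass, field

# Hypothetical types: none of these names come from the slides.

@dataclass
class Clause:
    head: str   # predicate defined by the clause
    body: list  # predicates the clause reads

@dataclass
class ExecutionUnit:
    clause: Clause
    kernels: list = field(default_factory=list)

def clauses_to_execution_units(clauses):
    # Stage 1: one execution unit per clause (an assumed 1:1 mapping).
    return [ExecutionUnit(c) for c in clauses]

def execution_units_to_kernels(units):
    # Stage 2: choose algorithms for each unit. Assumed here: one "join"
    # kernel per extra body predicate, then one "project" onto the head.
    for u in units:
        joins = [f"join:{p}" for p in u.clause.body[1:]]
        u.kernels = joins + [f"project:{u.clause.head}"]
    return units

def predicates_to_data_structures(clauses):
    # Stage 3: one relation (a set of tuples) per distinct predicate.
    names = {c.head for c in clauses} | {p for c in clauses for p in c.body}
    return {name: set() for name in sorted(names)}

def map_kernels_to_cores(units, cores):
    # Stage 4: runtime placement; round-robin is a placeholder policy.
    kernels = [k for u in units for k in u.kernels]
    return [(k, cores[i % len(cores)]) for i, k in enumerate(kernels)]

# Toy program:
#   path(X, Z) :- edge(X, Y), path(Y, Z).
#   path(X, Y) :- edge(X, Y).
clauses = [Clause("path", ["edge", "path"]), Clause("path", ["edge"])]
units = execution_units_to_kernels(clauses_to_execution_units(clauses))
relations = predicates_to_data_structures(clauses)
placement = map_kernels_to_cores(units, ["gpu0", "cpu0"])
```

The first three stages are decided at compile time in this sketch, while `map_kernels_to_cores` is the only step that needs to know which cores exist, which mirrors the slide's placement of that step in the runtime.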