Computing in Space - University of Warwick · Spatial Computing Paradigms • • • • Vs SIMD...

Computing in Space

Introduction

Familiar Visual Aid

FPGA Industry

There are two major players in the FPGA market

Where are FPGAs mostly used

So whats new?

Spatial Computing Paradigms

Vs SIMD

Control-flow Machine

Simple CPU Pipeline

Control-flow Computing example:IBM POWER 8, 12 cores @ 4 GHz

Spatial Computing Machine

Control Flow versus Data Flow

•–––

•–

––

Control Flow versus Data Flow

Which one would you rather do?

CPUs FPGAs

Data Flow specific properties

•––––

•–––

•• ➔

• ➔

Dataflow Computing

Converting Simple Expression

Flowing elements

public class MyKernel extends Kernel {

public MyKernel (KernelParameters parameters) {

super(parameters);

DFEVar x = io.input("x", dfeInt(32));

DFEVar result = x * x + 30;

io.output("y", result, dfeInt(32));}

The Full Kernel

Enabling large scale dataflow designs

Generating data on chip

for (int i = 0; i < N; i++) {q[i] = p[i] + i;

DFEVar p = io.input(“p”, dfeInt(32));DFEVar i = io.input(“i”, dfeInt(32));

DFEVar q = p + i;

io.output(“q”, q, dfeInt(32));

•–

Generating data on chip

DFEVar p = io.input(“p”, dfeInt(32));DFEVar i = control.count.simpleCounter(32, N);

DFEVar q = p + i;

io.output(“q”, q, dfeInt(32));

••

––

Stream Offsets

•→

Stream Offsets

Moving Average in MaxCompiler

Kernel Execution

Boundary Cases

More Complex Moving Average

Kernel Handling Boundary Cases

Starting on Scientific Computing

●●

for (i = 0; ; i += 1) { float d = input[i]; float v = 2.91 – 2.0*d; for (iter=0; iter < 4; iter += 1)

v = v * (2.0 - d * v); output[i] = v;}

Loop Unrolling in space with Dependence

float d = input; float v = 2.91 – 2.0*d; for (iter=0; iter < 4; iter += 1) v = v * (2.0 - d * v); output = v;

Loop Unrolling with Dependence

••

Variable Length Loopint d = input;int shift = 0;while (d != 0 && ((d & 0x3FF) != 0x291)) { shift = shift + 1; d = d >> 1;}output = shift;

// converted to fixed lengthint d = input;int shift = 0;bool finished = false;for (int i = 0; i < 22; ++i) { bool condition = (d != 0 && ((d & 0x3FF) != 0x291)); finished = condition ? true : finished; // loop-carried shift = finished ? shift : shift + 1; // dependencies d = d >> 1;}output = shift;

•••

Variable Length Loop – in hardwareint d = input;int shift = 0;bool finished = false;for (int i = 0; i < 22; ++i) { bool condition=(d!=0&&((d&0x3FF)!=0x291)); finished = condition ? true : finished; shift = finished ? shift : shift + 1; d = d >> 1;}int output = shift;

•––––

•–––

To Unroll or Not to Unroll

Unrolling in time - Acyclic pipeline

sum = 0.0;for (int j=0; j<M; j += 1) { sum = sum + input[j];}output = sum;

●●

Towards some Linear Algebra

•••

Multiple row sums simultaneously using one adder

••

•–––

••

Number Representation

••

•––

Fixed Point Numbers

••

Fixed Point Mathematics

••

•––

Floating Point Representation

•–

•––

Arithmetic takes Space on the DFE

MULT usage for N x M multiplication

Bits 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 5418 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 3 3 320 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 322 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 324 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 326 2 2 2 2 4 4 4 4 4 4 4 4 4 6 6 6 6 6 628 2 2 2 2 4 4 4 4 4 4 4 4 4 6 6 6 6 6 630 2 2 2 2 4 4 4 4 4 4 4 4 4 6 6 6 6 6 632 2 2 2 2 4 4 4 4 4 4 4 4 4 6 6 6 6 6 634 2 2 2 2 4 4 4 4 4 4 4 4 4 6 6 6 6 6 636 2 3 3 3 4 4 4 4 4 5 5 5 5 6 6 6 6 6 738 2 3 3 3 4 4 4 4 4 5 5 5 5 6 6 6 6 6 740 2 3 3 3 4 4 4 4 4 5 5 5 5 6 6 6 6 6 742 2 3 3 3 4 4 4 4 4 5 5 5 5 6 6 6 6 6 744 3 3 3 3 6 6 6 6 6 6 6 6 6 9 9 9 9 9 946 3 3 3 3 6 6 6 6 6 6 6 6 6 9 9 9 9 9 948 3 3 3 3 6 6 6 6 6 6 6 6 6 9 9 9 9 9 950 3 3 3 3 6 6 6 6 6 6 6 6 6 9 9 9 9 9 952 3 3 3 3 6 6 6 6 6 6 6 6 6 9 9 9 9 9 954 3 4 4 4 6 6 6 6 6 7 7 7 7 9 9 9 9 9 10

What about error vs area tradeoffs▪ Bit accurate simulations for different bit-width configurations.

[L. Gan, H. Fu, W. Luk, C. Yang, W. Xue, X. Huang, Y. Zhang, and G. Yang, Accelerating solvers for global atmospheric equations through mixed-precision data flow engine, FPL2013]

••

Finally

Computing in Space - University of Warwick · Spatial Computing Paradigms • • • • Vs SIMD...

Documents

Box2D with SIMD in JavaScript - GitHub Pagespeterjensen.github.io/html5-box2d/html5-box2d.pdf · Native SIMD -> JavaScript SIMD JavaScript Bindings Box2D SIMD opportunities Performance

Maximise Application Performance - Fujitsu...Instruction-level parallelism with SIMD instructions Improvement of computing efficiency using Expanded registers Improvement of cache

Mobile computing-tcp data flow control

SISTEME SIMD

Www.scotland.gov.uk/simd Presentation Outline SIMD Background SIMD 2009 Methodology SIMD 2009 Results Where to find more information Questions

Computing the fast Fourier transform on SIMD microprocessorscs.waikato.ac.nz/~ihw/PhD_theses/Anthony_Blake.pdf · 2012. 6. 11. · Anthony Blake: Computing the fast Fourier transform

nd Breaking SIMD Shackles with an Exposed Flexible … · 2013-09-22 · SIMD [32] and hardware support for scatter/gather. In general, 1. Control Flow Strided Access Loop Carried

Maximizing SIMD Resource Utilization in GPGPUs with SIMD

Data management and control-flow aspects of an SIMD/SPMD …hj/journals/43.pdf · 2006-10-06 · mode parallelism, parallel processing, PASM, reconfigurable ma- chines, SIMD, SPMD

Computing unsteady flow and tracer - Helda

Intel SIMD architecture

Yaminabe simd

FFT Algorithms for SIMD Parallel Processing Systems*hj/journals/24.pdf · 2006-10-06 · JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 3, 48-71 (1986) FFT Algorithms for SIMD Parallel

SIMD Programming and What You Must Know about …Contents 1 Introduction 2 SIMD Instructions 3 SIMD programming alternatives Auto loop vectorization OpenMP SIMD Directives GCC’s

Introduction SIMD Parallelismgec.di.uminho.pt/Discip/MInf/cpd1112/CSP/DataParallelism...SIMD Parallelism " Vector architectures " SIMD extensions " Graphics Processor Units (GPUs)

Arquitecturas SIMD

FLOW COMPUTING QRATE Scanner 3000 series flow computers

Automatic SIMD Vectorization of SSA-based Control …...Automatic SIMD Vectorization of SSA-based Control Flow Graphs Dissertation zur Erlangung des Grades des “Doktors der Ingenieurwissenschaften”

Information Flow Security Models for Cloud Computing

Detailed Syllabus of Information Technology PGIT101 ... · Introduction to high performance Computing – Overview, Flynn’s classifications – SISD, SIMD, MISD, MIMD, Examples