19
Single Thread Parallelism

Single Thread Parallelism

  • Upload
    connie

  • View
    38

  • Download
    0

Embed Size (px)

DESCRIPTION

Single Thread Parallelism. Outline. Pipelining SIMD (both vector and array processing) VLIW DAE HPS Data Flow. An example: Vector processing. The scalar code: for i=1,50 A(i) = (B(i)+C(i))/2; The vector code: lvs 1 lvl 50 vld V0,B vld V1,C - PowerPoint PPT Presentation

Citation preview

Page 1: Single Thread Parallelism

Single Thread Parallelism

Page 2: Single Thread Parallelism

Outline

• Pipelining

• SIMD (both vector and array processing)

• VLIW

• DAE

• HPS

• Data Flow

Page 3: Single Thread Parallelism
Page 4: Single Thread Parallelism
Page 5: Single Thread Parallelism
Page 6: Single Thread Parallelism
Page 7: Single Thread Parallelism

An example: Vector processing

• The scalar code:

for i=1,50

A(i) = (B(i)+C(i))/2;

• The vector code:

lvs 1

lvl 50

vld V0,B

vld V1,C

vadd V2,V0,V1

vshf V3,V2,1

vst V3,A

Page 8: Single Thread Parallelism

Vector processing example (continued)

• Scalar code (loads take 11 clock cycles):

• Vector code (no vector chaining):

• Vector code (with chaining):

• Vector code (with 2 load, 1 store port to memory):

Page 9: Single Thread Parallelism
Page 10: Single Thread Parallelism
Page 11: Single Thread Parallelism

The HPS Paradigm

• Incorporated the following:– Aggressive branch prediction– Speculative execution– Wide issue– Out-of-order execution– In-order retirement

• First published in Micro-18 (1985)– Patt, Hwu, Shebanow: Introduction to HPS– Patt, Melvin, Hwu, Shebanow: Critical issues

Page 12: Single Thread Parallelism
Page 13: Single Thread Parallelism
Page 14: Single Thread Parallelism
Page 15: Single Thread Parallelism
Page 16: Single Thread Parallelism

Data Flow

• Data Driven execution of inst-level Graphical code– Nodes are operators– Arcs are I/O

• Only REAL dependencies constrain processing– Anti-dependencies don’t (write-after-read)– Output dependencies don’t (write-after-write)– NO sequential I-stream (No program counter)

• Operations execute ASYNCHRONOUSLY• Instructions do not reference memory

– (at least memory as we understand it)

• Execution is triggered by presence of data

Page 17: Single Thread Parallelism
Page 18: Single Thread Parallelism
Page 19: Single Thread Parallelism