Multithreaded ASC Kevin Schaffer and Robert A. Walker ASC Processor Group Computer Science...

Multithreaded ASC

Kevin Schaffer and Robert A. Walker

ASC Processor GroupComputer Science Department

Kent State University

Organization of an ASC Computer

Processing Element 1

Control Unit(Scalar Processor)

Processing Element 2

Processing Element N

……

Broadcast/Reduction Bottleneck

Time to perform a broadcast or reduction increases as the number of PEs increases

Even for a moderate number of PEs, this time can dominate the machine cycle time

Pipelining reduces the cycle time but increases the latency

Additional latency causes pipeline hazards

Instruction Types

Scalar instructions Execute entirely within the control unit

Broadcast/Parallel instructions Execute within the PE array Use the broadcast network to transfer instruction and data

Reduction instructions Execute within the PE array Use the broadcast network to transfer instruction and data Use the reduction network to combine data from PEs

Scalar Pipeline

Instruction Fetch (IF)

Instruction Decode (ID)

Execute (EX)

Memory Access (M)

Write Back (W)

Hazards in a Scalar Pipeline

Unified SIMD Pipeline

Broadcast (B1...Bn)

Reduction (R1...Rn)

Number of stages is variable

All instructions go through every stage

Diversified SIMD Pipeline

Separate paths for each instruction type so instructions only go through stages that they use

Stalls less often than a unified pipeline organization

Hazards

Multithreading

Pipelining alone cannot eliminate hazards caused by broadcast and reduction latencies

Solution: use instructions from multiple threads to keep the pipeline full

Instructions from different threads are independent so they cannot generate stalls due to data dependencies

As long as there are a sufficient number of threads, it is possible to fill any number of stall cycles

Types of Multithreading

Coarse-grain multithreading switches to a new thread when the current thread encounters a high latency operation

Fine-grain multithreading switches to a new thread every clock cycle

Simultaneous multithreading can issue instructions from multiple threads in the same clock cycle

For a SIMD processor, fine-grain or simultaneous multithreading is necessary as pipeline stalls are relatively short and occur frequently

Multithreaded Control Unit

Fetch Unit

InstructionCache

Decode Unit 1

Decode Unit 2

Decode Unit 3

Decode Unit N

Instruction Status Table

Reduction Hazard with a Single Thread

Reduction Hazard with Multiple Threads

Execution Time vs. Latency

1 2 3 4 5 6 7 8

Communication Latency (cycles)

ASC Multithreaded ASC MASC

Throughput vs. Latency

1 2 3 4 5 6 7 8

Communication Latency (cycles)

ASC Multithreaded ASC MASC

Multithreaded ASC vs. MASC

A multithreaded ASC computer can execute at most one instruction in a cycle

A MASC computer with j instruction streams can execute up to j instructions in a cycle

In multithread ASC each thread can access every PE

In MASC each instruction stream can only access its partition of PEs

A multithreaded MASC computer could combine the advantages of both

Multithreaded ASC

Multithreaded MASC

Multithreaded ASC Processor

In order to validate simulation results and estimate hardware costs, a prototype processor was developed

Targeted for an Altera Cyclone II (EP2C35) FPGA

Using an FPGA makes it possible to get detailed measurements of speed and hardware cost

Additional Enhancements

Flags (logical values) are a first-class data types with their own set of registers and instructions

Extra reduction operators Count Responders Sum

Hardware semaphores for thread synchronization

Synthesis Results

Targeted for an Altera Cyclone II FPGA (EP2C35)

16 x 16-bit PEs

16 hardware threads

Clock speed: 75 MHz

Multithreaded ASC Kevin Schaffer and Robert A. Walker ASC Processor Group Computer Science...

Documents

Multithreaded Data Transport

CAPACITIVE ACCELEROMETER S PIEZORESIS TIVE …ˆr… · asc 4315 ln asc 4325 asc 4325-mf asc 4415 ln asc 4425 asc 4425-mf asc 4515 ln asc 4525 asc 4625 asc os 115 ln* LOW NOISE

Autumn schaffer reading_log_1

Autumn schaffer learner_analysis

Ground Functionalism - Jonathan Schaffer

Multithreaded Algorithms

Presentacion dra gisela schaffer

Schaffer a info_literacy_pathfinder

Multithreaded Chapter 4: Multithreaded Programming

Multithreaded Processors

Jane Schaffer Writing Strategy

Ch04 - Multithreaded Programming

Multithreaded programming Java provides built-in support for multithreaded programming. A multithreaded program contains two or more parts that can run

Autumn schaffer story board

Multithreaded Applications

03 Michael Schaffer

Multithreaded Architectures

Multithreaded Programming Guide - Oracle · Multithreaded Programming Guide - Oracle ... Alarms.....150

Schaffer a collection_development_plan_part_1

Portfolio Bianca Schaffer