11-May-04 <1> Qianyi Zhang School of Computer Science, University of Birmingham (Supervisor: Dr Georgios Theodoropoulos) A Distributed Colouring Algorithm for Control Hazards in Asynchronous Pipelines



Outline

• Asynchronous Hardware
• Handling the Control Hazards Problem
  – In a pipeline architecture
  – In asynchronous hardware
  – In a multistage asynchronous pipeline
• A Generic Distributed Colouring Solution
  – Multi-colour vector
  – Function of each pipeline stage and of the arbitration unit
  – A Constructive Proof
• Some Results
• Summary

The Problem of Synchronisation

• Digital system:
  – A collection of subsystems performing different computations and communicating to exchange information.
  – Before a communication transaction the subsystems need to synchronise: they wait for a common control state to be reached which guarantees the validity of the data exchanged.
• Synchronous: a global clock defines the points in time when communication can take place (Time Driven)
• Problems:
  – Clock Skew
  – Power Skew
  – Modularity
  – Performance

[Figure: Sender and Receiver]

Asynchronous Logic

• An alternative digital design philosophy:
  – It allows each subsystem to operate at its own rate and to communicate with its peers only when it needs to exchange information.
  – Synchronisation is achieved by the communication protocol: local request and acknowledge signals provide information about the validity of the data signals.

[Figures: 2-phase handshake protocol; Sutherland’s micropipeline]
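
The request/acknowledge idea can be illustrated with a small software model of a 2-phase (transition-signalling) handshake. This is only a sketch under stated assumptions: the class `TwoPhaseChannel` and its method names are hypothetical and do not come from the slides, and a real micropipeline stage is a circuit, not a Python object. The one property modelled is that every change of level on `req` or `ack` is an event, so a transfer is pending exactly when the two wires disagree.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class TwoPhaseChannel:
    """Hypothetical model of one sender-to-receiver link.

    With transition signalling, any change of level on `req` or `ack`
    is an event; the absolute level carries no meaning.
    """
    req: bool = False
    ack: bool = False
    data: Any = None
    received: List[Any] = field(default_factory=list)

    def send(self, value: Any) -> bool:
        """Sender side: returns False if the last transfer is still pending."""
        if self.req != self.ack:
            return False            # receiver has not acknowledged yet
        self.data = value           # data must be valid before req toggles
        self.req = not self.req     # request event
        return True

    def receive(self) -> bool:
        """Receiver side: returns False if there is nothing to take."""
        if self.req == self.ack:
            return False            # no pending request
        self.received.append(self.data)
        self.ack = not self.ack     # acknowledge event
        return True
```

In this model a second `send` before the matching `receive` is refused, which is the software analogue of the sender waiting for the acknowledge transition.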

Asynchronous Logic

• Asynchronous design techniques have been explored since the 1950s but failed to become mainstream: it is difficult to enforce specific orderings of operations and to deal with circuit hazards and dynamic states
• The last decade has witnessed a resurgence of interest in Asynchronous Logic
  – Solution to the clock skew problem: no clock = no skew!
  – Potential for low power: circuit components are activated only when necessary
  – Potential for higher performance: lower power allows increased supply voltages; average case optimisation
  – Potential for better technology migration: modularity
  – Better EMC: circuits generate low, uncorrelated electromagnetic interference
• However, it also brings new problems:
  – It may result in larger circuits: REQ & ACK signals
  – Circuits are more difficult to design, and their behaviour is harder to understand

Handling Control Hazards

• The control hazards problem in a pipelined architecture
  – Control hazards arise when an instruction such as a branch or a jump, or the occurrence of an unpredictable event such as an exception, changes the flow of control.
  – In a pipeline architecture, the prefetched instructions following a hazard must be removed from the pipeline before the new stream arrives.
  – The processor must be able to distinguish between instructions originating from the branch or exception target and instructions already prefetched.

[Figure: five-stage pipeline (IF, ID, EX, MEM, WB) built from InsMem, RegisterBank, ALU and DataMem, holding the instructions SUB $2, $1, $3; AND $3, $2, $4; OR $4, $1, $2; JR $25; SW $15, 100($2)]

Handling Control Hazards

• Control hazards in synchronous vs asynchronous hardware
  – Synchronous: the depth of prefetching is defined by the clock cycles and is therefore deterministic.
  – Asynchronous: the exact number of prefetched instructions is nondeterministic and therefore unpredictable: the depth of prefetching depends on the precise point at which the branch or the exception takes place.

[Figure: the instructions SUB $2, $1, $3; AND $3, $2, $4; OR $4, $1, $2; JR $25; SW $15, 100($2) entering a pipeline of InsMem, RegisterBank, ALU and DataMem]

Need a new strategy!

Using “Colour”

• Technique devised for the AMULET1 processor (Manchester)
• When a control hazard occurs, the colour of the processor changes
• Each instruction address issued to memory carries the latest operating colour of the processor, which is used to mark the corresponding fetched instruction.
• The colour bit of an instruction arriving at the datapath for execution is compared with the current colour of the processor; if they do not match, the instruction is discarded.

[Figure: the instruction stream … AND $3, $2, $4; OR $4, $1, $2; JR $25, the colour bits 0 1 0, and the label “New stream”]
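
The single-colour scheme described on this slide can be sketched in a few lines. This is a minimal illustration, not the AMULET1 implementation: the class `OneBitColourCore`, its method names and the `(address, colour)` tuple are all assumptions made for the example, and the memory is left out so that a "fetched instruction" is represented only by the tag it carries.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class OneBitColourCore:
    """Hypothetical processor model with a single colour bit."""
    colour: int = 0  # current operating colour of the processor

    def issue_fetch(self, address: int) -> Tuple[int, int]:
        # Each address carries the colour in force when it was issued;
        # the fetched instruction is marked with that same colour.
        return (address, self.colour)

    def control_hazard(self) -> None:
        # A branch, jump or exception flips the operating colour.
        self.colour ^= 1

    def accept(self, fetched: Tuple[int, int]) -> bool:
        # An instruction is executed only if its colour matches the
        # processor's current colour; otherwise it is discarded.
        _, colour = fetched
        return colour == self.colour
```

If three addresses are issued, `control_hazard()` is called, and one more address is issued, then `accept` returns `False` for the first three and `True` for the last, however many instructions were prefetched. This is why the scheme suits a pipeline whose prefetch depth is not known in advance.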

Control Hazard in Multiple Stages

• One colour bit is not enough
  – How many colours do we need?
  – How do we arbitrate if more than one stage sends a request simultaneously?
• Two basic observations
  – The state of the system is distributed
  – Stages that are deeper in the pipeline have higher priority than stages before them: a control transfer event that occurs at a pipeline stage renders events that may occur in earlier pipeline stages irrelevant and invalid, even if the latter precede the former in time.

A Generic Distributed Technique

• A colour vector with priorities:
  – One colour bit per stage
  – A vector c = (c1, c2, c3, …, cn) in the set C^n, where C = {0,1} is the set of colours, n is the number of stages in the pipeline and ci is the colour of stage i.
• Priority of ci > priority of cj for i > j
• Two arbitrations are made
  – An Address Arbitration Unit (AAU): rejects invalid control hazard requests
  – Each stage: discards the prefetched instructions following the hazard

[Figure: pipeline stages S1, S2, S3, S4]

A Generic Distributed Technique

• The Address Arbitration Unit
  – Operates as an autonomous unit, issuing instruction addresses to memory as they arrive from the Program Counter (normal operation) or from the pipeline stages (when a control hazard occurs).
  – Keeps a record of the colour state of the processor (vector c)
  – If a new transfer address arrives from stage Sk:
    • If any higher-priority colour bit (cj where j > k) in the address differs from the corresponding colour bit of the AAU, it rejects the address
    • Otherwise it lets the address through and updates its own copy of vector c
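
The AAU rule above can be written down as a short sketch. Several details are assumptions rather than statements from the slides: the class and function names are hypothetical, stages are numbered from 1 as in S1 to Sn, the vector is a Python list with c1 first, and "updates own copy" is taken to mean that the AAU adopts the whole vector carried by the accepted address.

```python
from typing import List, Sequence


def higher_priority_differs(a: Sequence[int], b: Sequence[int], k: int) -> bool:
    """True if any colour bit c_j with j > k differs between a and b.

    Stage numbers start at 1, so bit c_j is stored at index j - 1 and
    the bits with j > k occupy indices k .. n - 1.
    """
    return any(a[j] != b[j] for j in range(k, len(a)))


class AddressArbitrationUnit:
    """Hypothetical model of the AAU's accept/reject decision."""

    def __init__(self, n_stages: int) -> None:
        self.colour: List[int] = [0] * n_stages

    def transfer_request(self, k: int, vector: Sequence[int]) -> bool:
        """Handle a transfer address from stage S_k tagged with `vector`."""
        if higher_priority_differs(vector, self.colour, k):
            return False              # a deeper stage has already overridden it
        self.colour = list(vector)    # assumption: adopt the whole vector
        return True
```

For a four-stage pipeline starting at 0000, a request from S2 tagged 0100 is accepted, a later request from S4 tagged 0001 is accepted as well (no bit has higher priority than c4), and a further request from S2 tagged 0000 is then rejected because its c4 disagrees with the AAU's copy.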

A Generic Distributed Technique

• Each stage Sk in the pipeline
  – Keeps a record of the colour state of the processor (vector c), which it reads from the instructions as they pass through
  – For each new instruction that arrives:
    • If any higher-priority colour bit (cj where j > k) in the instruction differs from the corresponding colour bit of the stage: it lets the instruction through and updates its own copy of vector c
    • Otherwise:
      – If its own colour bit differs, it rejects the instruction
      – Otherwise it executes the instruction

[Figure: pipeline stages S1, S2, S3, S4]
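
The per-stage rule can be sketched the same way. As before, this is an illustration under assumptions: the names `Stage`, `Action` and `on_instruction` are hypothetical, stages are numbered from 1, the colour vector is a list with c1 first, and "updates own copy" is read as copying the instruction's whole vector. The three outcomes mirror the three cases on the slide.

```python
from enum import Enum
from typing import List, Sequence


class Action(Enum):
    PASS_AND_UPDATE = "let through and update"
    REJECT = "reject"
    EXECUTE = "execute"


class Stage:
    """Hypothetical model of pipeline stage S_k (k counted from 1)."""

    def __init__(self, k: int, n_stages: int) -> None:
        self.k = k
        self.colour: List[int] = [0] * n_stages

    def on_instruction(self, vector: Sequence[int]) -> Action:
        # Bits c_j with j > k live at indices k .. n - 1.
        higher = range(self.k, len(self.colour))
        if any(vector[j] != self.colour[j] for j in higher):
            # A deeper stage changed colour: pass the instruction on
            # and adopt its vector (assumed to be a full copy).
            self.colour = list(vector)
            return Action.PASS_AND_UPDATE
        if vector[self.k - 1] != self.colour[self.k - 1]:
            return Action.REJECT      # stale with respect to this stage
        return Action.EXECUTE
```

With `Stage(2, 4)` holding 0100, an instruction tagged 0000 is rejected, one tagged 0100 is executed, and one tagged 0001 is let through while the stage's copy becomes 0001.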

A Constructive Proof

[Figure sequence, slides 13 to 19: animation frames showing the AAU and the pipeline stages with their colour vectors. The vectors that appear are 0000, 0100 and 0001; the frames are labelled (1) and (2), and asterisks mark individual items.]

An Integrated Framework for Formal Verification and Distributed Simulation of Asynchronous Hardware

EPSRC Project No. GR/S11091/01 & GR/S11084/01

• £380,000+ over 3 years, starting April 2003
• Objectives:
  – Exploit compositionality of designs to enable automatic support for refinement checking, equivalence checking and deadlock detection.
  – Investigate applicability of data independence as a means to automate datapath abstraction and verification of parameterised component descriptions.
  – Investigate applicability of semi-formal techniques in the context of asynchronous hardware.
  – Develop algorithms and techniques for partitioning, load balancing, synchronisation and monitoring to support the distributed simulation.
  – Develop a prototype CSP-oriented integrated environment for the specification, distributed simulation and formal verification of asynchronous VLSI systems.
  – Develop test cases and conduct experiments to test and evaluate our approach.

Evaluation

  – Synthesisable asynchronous implementation of the MIPS R3000 processor core
  – Instruction set compatible with the R3000
  – 5-stage pipeline datapath
  – With precise exceptions
• Balsa:
  – A synthesis tool for asynchronous hardware, developed by the AMULET group
  – An asynchronous hardware description language based on CSP
  – A discrete event simulator at RTL level
  – A compiler for gate-level netlists

Evaluation

• The Balsa model of S1
• The Balsa model of AAU

[Figures: Cost Estimation of Stages; Cost comparison with SAMIPS]

Summary and Future Work

• A distributed colouring algorithm for dealing with control hazards in asynchronous pipelines
• The main advantages:
  – It provides flexibility in designing the pipeline of the processor, enabling prefetching at any depth
  – Low extra cost in terms of silicon area
• This approach has just been integrated into SAMIPS and proved functionally correct.
• We will evaluate the performance of this approach and the overhead it imposes in terms of time and power