AKT211 – CAO 06 – More on Advanced Processing Techniques
Ghifar, Parahyangan Catholic University
Oct 17, 2011


Outline

• Pipeline
• CISC vs RISC
• Superscalar

INSTRUCTION PIPELINE

Pipeline

• Problem with single-cycle design
  – Slowest instruction pulls down the clock frequency
  – Resource utilization is poor
  – Some instructions are impossible to implement in this manner
• The organization needs to change: pipeline

Pipeline

• Similar to the use of an assembly line in a manufacturing plant
  – Products at various stages can be worked on simultaneously

Two-Stage Instruction Pipeline

Any problem? The ‘fetch’ has to wait if:
1. T(exec) > T(fetch)
2. There is a branch instruction

Six-Stage Instruction Pipeline

Decomposing the instruction processing into:

• Fetch Instruction (FI)
• Decode Instruction (DI)
• Calculate Operands (CO)
• Fetch Operands (FO)
• Execute Instruction (EI)
• Write Operands (WO)

Six-Stage Instruction Pipeline

• Assumes that:
  – no memory conflicts
  – no branches
  – no interrupts
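Under these ideal assumptions, the timing diagram of the six-stage pipeline can be regenerated with a few lines of Python. This is only a sketch: the function name `timing_diagram` and the choice of nine instructions are illustrative assumptions, not something taken from the slides.

```python
# Ideal six-stage pipeline: no memory conflicts, no branches, no interrupts.
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def timing_diagram(n_instructions, stages=STAGES):
    """Return one row per instruction; each row lists the stage occupied
    in every time unit ('' when the instruction is not in the pipeline)."""
    k = len(stages)
    total_time = k + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total_time
        for s, name in enumerate(stages):
            # Instruction i (0-based) occupies stage s in column i + s.
            row[i + s] = name
        rows.append(row)
    return rows

if __name__ == "__main__":
    for i, row in enumerate(timing_diagram(9), start=1):
        print(f"I{i:<2}", " ".join(f"{cell:>2}" for cell in row))
```

Each instruction starts one time unit after its predecessor, so nine instructions finish in 6 + 9 - 1 = 14 time units instead of 9 × 6 = 54.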

Six-Stage Instruction Pipeline

With branches:

• Penalty: no instructions complete during time units 9 - 12
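The penalty can be reproduced with a small calculation. The sketch below assumes that the third instruction is a taken branch and that the branch is resolved in the EI stage (the usual textbook setting for this figure). Neither assumption is stated in the text above, and the names `completion_times` and `idle_units` are made up for the illustration.

```python
def completion_times(n_before, n_after, k=6, resolve_stage=5):
    """Time units at which instructions leave a k-stage pipeline.

    n_before: instructions up to and including the taken branch.
    n_after:  instructions fetched from the branch target.
    resolve_stage: 1-based stage in which the branch outcome is known
                   (5 = EI, an assumption).
    """
    # Instruction i (1-based) enters FI at time i and leaves WO at i + k - 1.
    before = [i + k - 1 for i in range(1, n_before + 1)]
    # The branch is instruction n_before; it is in resolve_stage at this time.
    resolved_at = n_before + resolve_stage - 1
    # Instructions fetched in the meantime are discarded; the target is
    # fetched in the next time unit.
    target_fetch = resolved_at + 1
    after = [target_fetch + j + k - 1 for j in range(n_after)]
    return before + after

def idle_units(times):
    """Time units between first and last completion with no completion."""
    done = set(times)
    return [t for t in range(min(times), max(times) + 1) if t not in done]

times = completion_times(n_before=3, n_after=2)
print(times)              # [6, 7, 8, 13, 14]
print(idle_units(times))  # [9, 10, 11, 12]
```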

Six-Stage Instruction Pipeline

• Modified algorithm:

Pipeline Performance

• The cycle time of an instruction pipeline:
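The cycle time is commonly written in the form below. This is the standard textbook expression, given here as an assumption. The symbols are not defined elsewhere in these slides: τ_i is the delay of stage i, τ_m the largest stage delay, k the number of stages, and d the delay of the latch between stages.

```latex
\tau = \max_{1 \le i \le k}\left[\tau_i\right] + d = \tau_m + d
```

The slowest stage therefore sets the clock for the whole pipeline.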

Pipeline Performance

• Let T[k,n] be the total time required for a pipeline with k stages to execute n instructions (total execution time):

• Pipeline speedup:
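A minimal sketch of both quantities, assuming the usual textbook expressions: T[k,n] = [k + (n - 1)]τ, where τ is the pipeline cycle time, and speedup = T[1,n] / T[k,n] = nk / (k + n - 1), where the unpipelined machine is taken to need nkτ. The function names are illustrative.

```python
def total_time(k, n, tau=1.0):
    """T[k,n]: k cycles to fill the pipeline, then one instruction per cycle."""
    return (k + (n - 1)) * tau

def speedup(k, n):
    """Speedup over an unpipelined machine that needs n * k cycles."""
    return (n * k) / (k + n - 1)

print(total_time(6, 9))           # 14.0 cycles for 9 instructions, 6 stages
print(round(speedup(6, 9), 2))    # 3.86
print(round(speedup(6, 10**6), 2))  # approaches k = 6 for large n
```

For a long instruction stream the speedup approaches the number of stages k, which is why deeper pipelines look attractive until hazards are taken into account.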

Speedup Factors with Instruction Pipeline

Pipeline Hazards

• Occurs when the pipeline, or some portion of the pipeline, must stall/idle because conditions do not permit continued execution

• 3 types of hazards:
  1. Resource Hazards
  2. Data Hazards
  3. Control Hazards

Resource Hazards

• Occurs when two (or more) instructions that are already in the pipeline need the same resource

• Sometimes referred to as a structural hazard

Data Hazards

• Occurs when there is a conflict in the access of an operand location
  – ADD EAX, EBX /* EAX = EAX + EBX */
  – SUB ECX, EAX /* ECX = ECX - EAX */

• 3 types of data hazards:
  – Read after write (RAW)
  – Write after read (WAR)
  – Write after write (WAW)
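The three cases can be told apart by comparing the registers each instruction reads and writes. The following sketch classifies the ADD/SUB pair above. Representing an instruction as a pair of register sets is an assumption made for the example, and the x86 flags register is deliberately ignored.

```python
def data_hazards(first, second):
    """Classify data hazards between two instructions in program order.

    Each instruction is a (reads, writes) pair of sets of register names.
    """
    reads1, writes1 = first
    reads2, writes2 = second
    found = []
    if writes1 & reads2:
        found.append("RAW")  # second reads what first writes
    if reads1 & writes2:
        found.append("WAR")  # second overwrites what first still has to read
    if writes1 & writes2:
        found.append("WAW")  # both write the same location
    return found

# ADD EAX, EBX reads EAX and EBX and writes EAX (flags ignored).
add = ({"EAX", "EBX"}, {"EAX"})
# SUB ECX, EAX reads ECX and EAX and writes ECX (flags ignored).
sub = ({"ECX", "EAX"}, {"ECX"})

print(data_hazards(add, sub))  # ['RAW']
```

The SUB needs the new value of EAX, so it cannot fetch its operands until the ADD has written its result.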

Control Hazards

• Also known as a branch hazard
• Occurs when the pipeline makes the wrong decision on a branch prediction and therefore brings instructions into the pipeline that must subsequently be discarded

CISC vs RISC

What is CISC?

• CISC is an acronym for Complex Instruction Set Computer. CISC chips are easy to program and make efficient use of memory. Since the earliest machines were programmed in assembly language and memory was slow and expensive, the CISC philosophy made sense, and was commonly implemented in such large computers as the PDP-11 and the DECsystem 10 and 20 machines.

• Most common microprocessor designs such as the Intel 80x86 and Motorola 68K series followed the CISC philosophy.

• CISC was developed to make compiler development simpler. It shifts most of the burden of generating machine instructions to the processor. For example, instead of having to make a compiler emit a long sequence of machine instructions to calculate a square root, a CISC processor would have a built-in ability to do this.

CISC Attributes

The design constraints that led to the development of CISC (small amounts of slow memory and the fact that most early machines were programmed in assembly language) give CISC instruction sets some common characteristics:
• A 2-operand format, where instructions have a source and a destination.
• Register to register, register to memory, and memory to register commands.
• Multiple addressing modes for memory, including specialized modes for indexing through arrays.
• Variable length instructions where the length often varies according to the addressing mode.
• Instructions which require multiple clock cycles to execute.

E.g. the Pentium is considered a modern CISC processor.

CISC Hw. Architecture

Most CISC hardware architectures have several characteristics in common:
• Complex instruction-decoding logic, driven by the need for a single instruction to support multiple addressing modes.
• A small number of general purpose registers. This is the direct result of having instructions which can operate directly on memory and the limited amount of chip space not dedicated to instruction decoding, execution, and microcode storage.
• Several special purpose registers. Many CISC designs set aside special registers for the stack pointer, interrupt handling, and so on. This can simplify the hardware design somewhat, at the expense of making the instruction set more complex.

At the time of their initial development, CISC machines used available technologies to optimize computer performance.
• Microprogramming is as easy as assembly language to implement, and much less expensive than hardwiring a control unit.
• The ease of microcoding new instructions allowed designers to make CISC machines upwardly compatible: a new computer could run the same programs as earlier computers because the new computer would contain a superset of the instructions of the earlier computers.
• As each instruction became more capable, fewer instructions could be used to implement a given task. This made more efficient use of the relatively slow main memory.
• Because micro-program instruction sets can be written to match the constructs of high-level languages, the compiler does not have to be as complicated.

CISC Disadvantages

Designers soon realized that the CISC philosophy had its own problems, including:
• Earlier generations of a processor family generally were contained as a subset in every new version, so instruction set and chip hardware become more complex with each generation of computers.
• So that as many instructions as possible could be stored in memory with the least possible wasted space, individual instructions could be of almost any length. This means that different instructions will take different amounts of clock time to execute, slowing down the overall performance of the machine.
• Many specialized instructions aren't used frequently enough to justify their existence: approximately 20% of the available instructions are used in a typical program.
• CISC instructions typically set the condition codes as a side effect of the instruction. Not only does setting the condition codes take time, but programmers have to remember to examine the condition code bits before a subsequent instruction changes them.

What is RISC?

• RISC?
  RISC, or Reduced Instruction Set Computer, is a type of microprocessor architecture that utilizes a small, highly-optimized set of instructions, rather than a more specialized set of instructions often found in other types of architectures.

• History
  The first RISC projects came from IBM, Stanford, and UC-Berkeley in the late 70s and early 80s. The IBM 801, Stanford MIPS, and Berkeley RISC 1 and 2 were all designed with a similar philosophy which has become known as RISC. Certain design features have been characteristic of most RISC processors:
  – one cycle execution time: RISC processors aim for a CPI (clock cycles per instruction) of one. This is due to the optimization of each instruction on the CPU and a technique called pipelining;
  – pipelining: a technique that allows for simultaneous execution of parts, or stages, of instructions to more efficiently process instructions;
  – large number of registers: the RISC design philosophy generally incorporates a larger number of registers to prevent large amounts of interaction with memory.

RISC Attributes

The main characteristics of CISC microprocessors are:
• Extensive instructions.
• Complex and efficient machine instructions.
• Micro-encoding of the machine instructions.
• Extensive addressing capabilities for memory operations.
• Relatively few registers.

In comparison, RISC processors are more or less the opposite of the above:
• Reduced instruction set.
• Less complex, simple instructions.
• Hardwired control unit and machine instructions.
• Few addressing schemes for memory operands with only two basic instructions, LOAD and STORE.
• Many symmetric registers which are organized into a register file.

RISC Disadvantages

• There is still considerable controversy among experts about the ultimate value of RISC architectures. Its proponents argue that RISC machines are both cheaper and faster, and are therefore the machines of the future.

• However, by making the hardware simpler, RISC architectures put a greater burden on the software. Is this worth the trouble, given that conventional microprocessors are becoming increasingly fast and cheap anyway?

CISC versus RISC

CISC | RISC
Emphasis on hardware | Emphasis on software
Includes multi-clock complex instructions | Single-clock, reduced instructions only
Memory-to-memory: "LOAD" and "STORE" incorporated in instructions | Register to register: "LOAD" and "STORE" are independent instructions
Small code sizes, high cycles per instruction | Low cycles per instruction, large code sizes
Transistors used for storing complex instructions | Spends more transistors on memory registers
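The memory-to-memory versus load/store row can be made concrete with a toy model. The sketch below multiplies two values held in memory, once with a single CISC-style instruction and once with a RISC-style load/store sequence. The mnemonics (MULT, LOAD, MUL, STORE), the register names, and the dictionary-based "memory" are all hypothetical and chosen only for illustration.

```python
def run_cisc(memory):
    """One complex instruction operates directly on memory operands."""
    program = [("MULT", "a", "b")]  # a := a * b, memory to memory
    for op, dst, src in program:
        if op == "MULT":
            memory[dst] = memory[dst] * memory[src]
    return len(program)  # instruction count

def run_risc(memory):
    """Only LOAD and STORE touch memory; arithmetic uses registers."""
    regs = {}
    program = [
        ("LOAD", "r1", "a"),   # r1 := mem[a]
        ("LOAD", "r2", "b"),   # r2 := mem[b]
        ("MUL", "r1", "r2"),   # r1 := r1 * r2
        ("STORE", "a", "r1"),  # mem[a] := r1
    ]
    for op, x, y in program:
        if op == "LOAD":
            regs[x] = memory[y]
        elif op == "MUL":
            regs[x] = regs[x] * regs[y]
        elif op == "STORE":
            memory[x] = regs[y]
    return len(program)  # instruction count

mem_cisc = {"a": 6, "b": 7}
mem_risc = {"a": 6, "b": 7}
print(run_cisc(mem_cisc), mem_cisc["a"])  # 1 42
print(run_risc(mem_risc), mem_risc["a"])  # 4 42
```

Both versions leave the same result in memory. The CISC form needs fewer instructions (smaller code), while each RISC instruction is simple enough to fit a uniform pipeline stage.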

Summation

• As memory speed increased, and high-level languages displaced assembly language, the major reasons for CISC began to disappear, and computer designers began to look at ways computer performance could be optimized beyond just making faster hardware.

• One of their key realizations was that a sequence of simple instructions produces the same results as a sequence of complex instructions, but can be implemented with a simpler (and faster) hardware design, assuming that memory can keep up. RISC (Reduced Instruction Set Computer) processors were the result.

• CISC and RISC implementations are becoming more and more alike. Many of today's RISC chips support as many instructions as yesterday's CISC chips. And today's CISC chips use many techniques formerly associated with RISC chips.

Modern Day Advancement

• CISC and RISC Convergence
  State of the art processor technology has changed significantly since RISC chips were first introduced in the early '80s. Because a number of advancements are used by both RISC and CISC processors, the lines between the two architectures have begun to blur. In fact, the two architectures almost seem to have adopted the strategies of the other. Because processor speeds have increased, CISC chips are now able to execute more than one instruction within a single clock. This also allows CISC chips to make use of pipelining. With other technological improvements, it is now possible to fit many more transistors on a single chip.

Modern Day Advancement

• This gives RISC processors enough space to incorporate more complicated, CISC-like commands. RISC chips also make use of more complicated hardware, making use of extra function units for superscalar execution. All of these factors have led some groups to argue that we are now in a "post-RISC" era, in which the two styles have become so similar that distinguishing between them is no longer relevant. However, it should be noted that RISC chips still retain some important traits. RISC chips strictly utilize uniform, single-cycle instructions. They also retain the register-to-register, load/store architecture. And despite their extended instruction sets, RISC chips still have a large number of general purpose registers.

SUPERSCALAR

What is Superscalar?

• Refers to a machine that is designed to improve the execution performance of scalar instructions

• A superscalar processor is one in which multiple independent instruction pipelines are used; it exploits what is known as instruction-level parallelism

• Equally applicable to RISC & CISC
• In practice usually RISC

General Superscalar Organization

Fetching Two Instructions per Cycle

Superpipelined

• Many pipeline stages need less than half a clock cycle

• Double internal clock speed gets two tasks per external clock cycle

• Superscalar allows parallel fetch and execute

Superscalar vs Superpipelined

Limitations

• Instruction level parallelism
• Compiler based optimisation
• Hardware techniques
• Limited by
  – True data dependency
  – Procedural dependency
  – Resource conflicts
  – Output dependency
  – Antidependency

True Data Dependency

• ADD r1, r2 (r1 := r1+r2;)
• MOVE r3,r1 (r3 := r1;)
• Can fetch and decode second instruction in parallel with first
• Can NOT execute second instruction until first is finished

Procedural Dependency

• Cannot execute instructions after a branch in parallel with instructions before a branch

• Also, if instruction length is not fixed, instructions have to be decoded to find out how many fetches are needed

• This prevents simultaneous fetches

Resource Conflict

• Two or more instructions requiring access to the same resource at the same time
  – e.g. two arithmetic instructions

• Can duplicate resources
  – e.g. have two arithmetic units

Dependencies

Antidependency

• Write-after-read (WAR) dependency
  – R3:=R3 + R5; (I1)
  – R4:=R3 + 1; (I2)
  – R3:=R5 + 1; (I3)
  – R7:=R3 + R4; (I4)

I3 cannot complete before I2 starts, as I2 needs a value in R3 and I3 changes R3
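The dependencies in the four-instruction sequence above can be listed mechanically. This is a minimal sketch, assuming a straight-line program given as (name, destination, sources) tuples; the function name `dependencies` and this representation are illustrative choices, not part of the slides.

```python
def dependencies(program):
    """Return (kind, earlier, later, register) tuples for a straight-line
    program given as a list of (name, dest, sources)."""
    deps = []
    last_writer = {}   # register -> most recent instruction that wrote it
    readers = {}       # register -> instructions that read it since that write
    for name, dest, sources in program:
        # True data dependency: a source was produced by an earlier write.
        for reg in sources:
            if reg in last_writer:
                deps.append(("RAW", last_writer[reg], name, reg))
        for reg in sources:
            readers.setdefault(reg, []).append(name)
        # Antidependency: this write must not overtake an earlier read.
        for reader in readers.get(dest, []):
            if reader != name:
                deps.append(("WAR", reader, name, dest))
        # Output dependency: two writes to the same register stay in order.
        if dest in last_writer:
            deps.append(("WAW", last_writer[dest], name, dest))
        last_writer[dest] = name
        readers[dest] = []  # reads before this write are covered by the WAW edge
    return deps

program = [
    ("I1", "R3", ["R3", "R5"]),  # R3 := R3 + R5
    ("I2", "R4", ["R3"]),        # R4 := R3 + 1
    ("I3", "R3", ["R5"]),        # R3 := R5 + 1
    ("I4", "R7", ["R3", "R4"]),  # R7 := R3 + R4
]

for dep in dependencies(program):
    print(dep)
```

For this sequence the sketch reports the antidependency between I2 and I3 on R3, an output dependency between I1 and I3 on R3, and true data dependencies from I1 to I2, from I3 to I4, and from I2 to I4.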

Register Renaming

• Antidependencies occur because register contents may not reflect the correct ordering from the program
• May result in a pipeline stall
• Registers allocated dynamically
  – i.e. registers are not specifically named

Register Renaming example

• R3b:=R3a + R5a (I1)
• R4b:=R3b + 1 (I2)
• R3c:=R5a + 1 (I3)
• R7b:=R3c + R4b (I4)
• Without subscript refers to logical register in instruction
• With subscript is hardware register allocated
• Note R3a, R3b, R3c
• Disadvantage: need more registers!
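The renamed sequence can be derived with a simple rule: every write to a logical register allocates a fresh hardware register, and every read uses the most recent allocation. The sketch below follows that rule. The letter-suffix naming mirrors the slide, while the function name `rename` and the tuple representation are assumptions made for the example.

```python
import string

def rename(instructions):
    """Give each write a fresh hardware register.

    instructions: list of (dest, sources); a source is a logical register
    name starting with 'R' or a literal such as '1'.
    """
    version = {}  # logical register -> index of its current letter suffix

    def current(reg):
        # A register never written so far keeps its initial version 'a'.
        return f"{reg}{string.ascii_lowercase[version.setdefault(reg, 0)]}"

    renamed = []
    for dest, sources in instructions:
        # Sources are renamed first, so they see the values before this write.
        new_sources = [current(s) if s.startswith("R") else s for s in sources]
        version[dest] = version.get(dest, 0) + 1  # allocate a new register
        renamed.append((current(dest), new_sources))
    return renamed

program = [
    ("R3", ["R3", "R5"]),  # I1
    ("R4", ["R3", "1"]),   # I2
    ("R3", ["R5", "1"]),   # I3
    ("R7", ["R3", "R4"]),  # I4
]

for dest, sources in rename(program):
    print(f"{dest} := {' + '.join(sources)}")
```

After renaming, I3 writes R3c while I2 still reads R3b, so the antidependency between them no longer forces an ordering; only the true data dependencies remain.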

Superscalar Execution

Superscalar Execution Example

– With Register Renaming for WAR and WAW dependencies.

Conclusion

A superscalar processor allows faster CPU throughput than would otherwise be possible at the same clock rate.

Essentially all general-purpose CPUs developed since about 1998 are superscalar.

The major problem of executing multiple instructions in a scalar program is the handling of data dependencies. If data dependencies are not effectively handled, it is difficult to achieve an execution rate of more than one instruction per clock cycle.

Comparison of processors

Any Questions?

THANK YOU