CSE 2312 Computer Organization & Assembly Language Programming, Spring 2015
Lecture 5: CPU
Junzhou Huang, Ph.D., Department of Computer Science and Engineering



Page 1

CSE 2312

Lecture 5: CPU

Junzhou Huang, Ph.D.

Department of Computer Science and Engineering

Computer Organization & Assembly Language Programming

Page 2

Reviewing (1): IC and CPI

• IC determined by program, ISA, and compiler

• CPI determined by CPU and other factors
  – Different instructions have different CPIs
  – Average CPI is affected by the instruction mix

Clock Cycles = Instruction Count (IC) × Cycles per Instruction (CPI)

CPU Time = IC × CPI × Clock Cycle Time = (IC × CPI) / Clock Rate
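These relationships can be checked numerically. The sketch below uses made-up instruction counts and CPIs (illustrative values, not from the slides) to compute a mix-weighted average CPI and the resulting CPU time:

```python
# Illustrative instruction mix (made-up numbers): class -> (count, CPI)
instruction_mix = {
    "ALU":    (2_000_000, 1),
    "load":   (1_000_000, 3),
    "branch": (  500_000, 2),
}

ic = sum(count for count, _ in instruction_mix.values())           # instruction count
clock_cycles = sum(count * cpi for count, cpi in instruction_mix.values())
avg_cpi = clock_cycles / ic                                        # mix-weighted average CPI

clock_rate = 2e9                                                   # 2 GHz (assumed)
cycle_time = 1 / clock_rate                                        # seconds per clock cycle
cpu_time = ic * avg_cpi * cycle_time                               # = clock_cycles / clock_rate

print(avg_cpi)    # weighted average CPI for this mix
print(cpu_time)   # total CPU time in seconds
```

Note how the same mix-weighted CPI appears whether you compute cycles first or average CPI first; the two forms of the equation agree.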

Page 3

Reviewing (2): CPU Performance Equation

Page 4

Reviewing (3): Performance Summary

• Performance depends on
  – Algorithm: affects IC, possibly CPI

– Programming language: affects IC, CPI

– Compiler: affects IC, CPI

– Instruction set architecture: affects IC, CPI, Tc

CPU Time = (Instructions / Program) × (Clock cycles / Instruction) × (Seconds / Clock cycle)

Page 5

Reviewing (4): How to Improve Performance?

We must lower execution time!

• Algorithm
  – Determines the number of operations executed

• Programming language, compiler, architecture
  – Determine the number of machine instructions executed per operation (IC)

• Processor and memory system
  – Determine how fast instructions are executed (CPI)

• I/O system (including OS)
  – Determines how fast I/O operations are executed

Page 6

Central Processing Unit (CPU)

The organization of a simple computer with one CPU and two I/O devices

Page 7

Basic Elements

Other devices: Cache; Virtual Memory Support (MMU); ….

Page 8

Processor

• CPU
  – Brain of the computer: executes programs stored in main memory by fetching instructions, examining them, and executing them one after another

• Bus
  – Connects the different components
  – Parallel wires for transmitting address, data and control signals
  – Can be external to the CPU (connecting memory and I/O) or internal

• Control Unit
  – Fetches instructions from main memory and determines their types

• Arithmetic Logic Unit (ALU)
  – Performs arithmetic operations, such as addition, and Boolean operations

• Registers
  – High-speed memory used to store temporary results and control information
  – Program Counter (PC): points to the next instruction to be fetched
  – Instruction Register (IR): holds the instruction currently being executed

Page 9

CPU Organization

The data path of a typical von Neumann machine

• Instructions:
  – Register-memory: memory words are fetched into registers
  – Register-register

• Data Path Cycle
  – The process of running two operands through the ALU and storing the result
  – Defines what the machine can do
  – The faster the data path cycle is, the faster the computer runs

Page 10

Arithmetic Logic Unit (ALU)

[Figure: ALU with operand inputs A and B, operation-code input op, result output C, and status outputs c, n, z, v]

• Conducts different calculations
  – +, -, ×, /
  – and, or, xor, not
  – shift, …

• Variants
  – Integer, floating point, double precision
  – A high-performance machine will have several!

• Input

– Operands: A, B

– Operation code: obtained from encoded instruction

• Output

– Result: C

– Status codes

Status codes (usually):
  c: carry out from +, -, ×, shift
  n: result is negative
  z: result is zero
  v: result overflowed
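To make the status codes concrete, here is a toy 8-bit ALU model. This is a hypothetical sketch, not the hardware in the figure; the `alu` function and its flag rules are illustrative assumptions (e.g., overflow is only modeled for addition):

```python
def alu(op, a, b, bits=8):
    """Tiny ALU model: returns (result, flags) with c, n, z, v status bits."""
    mask = (1 << bits) - 1
    if op == "add":
        full = a + b
    elif op == "sub":
        full = a - b
    elif op == "and":
        full = a & b
    else:
        raise ValueError(op)
    result = full & mask
    c = int(full != result if op != "sub" else a < b)  # carry out (borrow for sub)
    n = int(result >> (bits - 1))                      # sign bit of the result
    z = int(result == 0)                               # result is zero
    # Signed overflow (modeled for add only): operand signs agree
    # but the result's sign differs.
    sa, sb, sr = a >> (bits - 1), b >> (bits - 1), result >> (bits - 1)
    v = int(op == "add" and sa == sb and sa != sr)
    return result, {"c": c, "n": n, "z": z, "v": v}

print(alu("add", 200, 100))  # 300 does not fit in 8 bits: wraps to 44, carry set
```

Running `alu("add", 100, 100)` shows the v bit instead: both operands are positive as signed bytes, but the result 200 has its sign bit set.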

Page 11

Instruction Execution Steps

• Fetch-decode-execute
  – Fetch the next instruction from memory into the instruction register

  – Change the program counter to point to the following instruction

  – Determine the type of instruction just fetched

  – If the instruction uses a word in memory, determine where it is

  – Fetch the word, if needed, into a CPU register

  – Execute the instruction

  – Go to step 1 to begin executing the following instruction

Central to the operation of all computers
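The steps above can be sketched as a tiny interpreter for a made-up accumulator machine (the opcodes LOAD, ADD, STORE, HALT are invented for illustration, not from the slides):

```python
# A made-up accumulator machine: each instruction is (opcode, operand).
def run(program, memory):
    pc, acc = 0, 0                  # program counter, accumulator
    while True:
        instr = program[pc]         # 1. fetch into the "instruction register"
        pc += 1                     # 2. advance PC to the following instruction
        op, operand = instr         # 3. decode: determine the instruction type
        if op == "LOAD":            # 4-5. fetch the memory word into a register
            acc = memory[operand]
        elif op == "ADD":
            acc += memory[operand]
        elif op == "STORE":
            memory[operand] = acc
        elif op == "HALT":
            return memory           # 6. execute; otherwise loop back to step 1

mem = {0: 7, 1: 35, 2: 0}
run([("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", None)], mem)
print(mem[2])  # 42
```

This is also exactly the "interpreter" idea of the next slide: a program that fetches, examines, and executes the instructions of another program.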

Page 12

Interpreting Instructions

• Interpreter

– A program that fetches, examines, and executes the instructions of another program

– A program can be written to imitate the function of a CPU

– Main advantage: the ability to design a simple processor with the complexity largely confined to the memory holding the interpreter

• Benefits (simple computer with interpreted instructions)

– The ability to fix incorrectly implemented instructions or make up for design deficiencies in the basic hardware

– The opportunity to add new instructions at minimal cost even after delivery of the machine

– Structured design that permitted efficient development, testing and documenting of complex instructions

Page 13

RISC vs. CISC

• Semantic gap between
  – What the machine can do
  – What high-level programming languages require

• Key to designing instructions
  – Instructions should be able to be issued quickly
  – How long an instruction actually took mattered less than how many could be started per second

• Reduced Instruction Set Computer (RISC)
  – Did not use interpretation
  – Did not have to be backward compatible with existing products
  – Small number of instructions, around 50

• Complex Instruction Set Computer (CISC)
  – Around 200-300 instructions, e.g., the DEC VAX and IBM mainframes

• Intel (486 and up)
  – A RISC core executes the simplest (most common) instructions
  – The more complicated instructions are interpreted in the usual CISC way

Page 14

Design Principles for Modern Computers

• Instructions directly executed by hardware
  – Eliminating a level of interpretation provides high speed for most instructions
  – Interpreting less frequently occurring instructions is acceptable

• Maximize the rate at which instructions are issued
  – Parallelism can play a major role in improving performance

• Instructions should be easy to decode
  – A critical limit on the issue rate is decoding individual instructions to determine what resources they need
  – The fewer different instruction formats, the better

• Only loads and stores should reference memory
  – Accessing memory can take a long time
  – All other instructions should operate only on registers

• Provide plenty of registers
  – Running out of registers forces values to be flushed back to memory
  – Memory accesses are slow

Page 15

Instruction-Level Parallelism

• A five-stage pipeline
  – The state of each stage as a function of time
  – Nine clock cycles are illustrated

Page 16

Pipelining

• A five-stage pipeline
  – Suppose the cycle time is 2 ns
  – It takes 10 ns for an instruction to progress all the way through the five-stage pipeline
  – So the machine runs at 100 MIPS? No: the actual rate is 500 MIPS, because one instruction completes every cycle

• Pipelining
  – Allows a tradeoff between latency and processor bandwidth
  – Latency: how long it takes to execute an instruction
  – Processor bandwidth: how many MIPS the CPU delivers

• Example
  – Suppose a complex instruction takes 10 ns. Under perfect conditions, how many pipeline stages should we design to guarantee 500 MIPS?

Each stage: 1 / (500 MIPS) = 2 ns; 10 ns / 2 ns = 5 stages
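The latency-versus-bandwidth arithmetic from this slide, checked directly in code (values taken from the example):

```python
import math

cycle_time_ns = 2          # 2 ns per pipeline stage
stages = 5

latency_ns = stages * cycle_time_ns   # one instruction traverses all stages: 10 ns
mips = 1e3 / cycle_time_ns            # one instruction completes per cycle;
                                      # 1e3 / ns gives millions per second
print(latency_ns, mips)               # 10 ns latency, 500 MIPS bandwidth

# Stages needed so a 10 ns instruction still sustains 500 MIPS:
target_mips = 500
needed_cycle_ns = 1e3 / target_mips               # 2 ns per stage
needed_stages = math.ceil(10 / needed_cycle_ns)   # 5 stages
print(needed_stages)
```

The point of the example: pipelining does not shorten any single instruction's latency, it raises the completion rate.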

Page 17

Superscalar Architectures (1)

• Dual five-stage pipelines with a common instruction fetch unit
  – Fetches pairs of instructions together and puts each one into its own pipeline

  – The two instructions must not conflict over resource usage

  – Neither may depend on the result of the other
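A hypothetical sketch of that issue check: the instruction encoding (destination register, set of source registers) and the `can_dual_issue` helper are invented for illustration, and the check is deliberately conservative:

```python
def can_dual_issue(i1, i2):
    """Each instruction is (dest_reg, {source_regs}). The pair may issue
    together only if neither depends on the other: i2 must not read or
    overwrite i1's result, and i2 must not clobber a register i1 reads."""
    d1, s1 = i1
    d2, s2 = i2
    raw = d1 in s2      # read-after-write: i2 needs i1's result
    waw = d1 == d2      # write-after-write: both write the same register
    war = d2 in s1      # write-after-read: i2 overwrites what i1 reads
    return not (raw or waw or war)

print(can_dual_issue(("r1", {"r2", "r3"}), ("r4", {"r5"})))  # True: independent
print(can_dual_issue(("r1", {"r2"}), ("r3", {"r1"})))        # False: RAW on r1
```

Resource conflicts (two instructions needing the same functional unit) would need a separate check; this sketch covers only register dependences.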

Page 18

Superscalar Architectures (2)

A superscalar processor with five functional units.

• Implicit idea
  – The S3 stage can issue instructions considerably faster than the S4 stage can execute them

Page 19

Processor-Level Parallelism (1)

• An array processor
  – A large number of identical processors that perform the same sequence of instructions on different sets of data

  – Different from a standard von Neumann machine

Page 20

Processor-Level Parallelism (2)

• A single-bus multiprocessor
  – Example: locating the white ball in a picture

• A multicomputer with local memories.

Page 21

Processor-Level Parallelism (3)

• Multicomputers (loosely coupled)
  – Easier to build

• Multiprocessors (tightly coupled)
  – Easier to program

Page 22

Exercise

• Ex 1: TRUE OR FALSE, Why?

– The Data Path Cycle defines what the computer can do. The longer the data path cycle is, the faster the computer runs.

– Answer: F

– Reason: The data path cycle does define what the computer can do, but it is the shorter (faster) the data path cycle is, the faster the computer runs.

Page 23

Exercise

• Ex 2: What are the design principles for modern computers?

– (a) Instructions directly executed by hardware

– (b) Minimize rate at which instructions are issued

– (c) Instructions should be easy to decode

– (d) Only loads, stores should reference memory

– (e) Provide plenty of registers

– Answer: [a, c, d, e]

Page 24

Exercise

• Ex 3: The following diagram gives the organization of a simple computer with one CPU and two I/O devices.

– Is it correct? If not, please correct it in the diagram.

• Solution

– Incorrect.

– In place of Disk, it should be Register.

– In place of Register, it should be Disk.


Page 25

Exercise

• Ex 4: Suppose each instruction has 5 stages in a computer that uses pipelining, and each stage takes 2 ns.

– What is the maximum number of MIPS that this machine is capable of with this 5-stage pipelining techniques?

– What is the maximum number of MIPS that this machine is capable of in the absence of pipelining?

• Solution

– 1 / 2 ns = 500 MIPS

– 500 / 5 = 100 MIPS, or 1 / (5 × 2 ns) = 100 MIPS

Page 26

Attention

• HW1 due in Class

• Quiz next class (for Chapter 1)