CSE 2312 Computer Organization & Assembly Language Programming, Spring 2015
CSE 2312
Lecture 5 CPU
Junzhou Huang, Ph.D.
Department of Computer Science and Engineering
Computer Organization &
Assembly Language Programming
Reviewing (1): IC and CPI
• IC determined by program, ISA, and compiler
• CPI determined by CPU and other factors
– Different instructions have different CPI
– Average CPI affected by instruction mix
\[
\text{Clock Cycles} = \text{Instruction Count (IC)} \times \text{Cycles per Instruction (CPI)}
\]
\[
\text{CPU Time} = \text{IC} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{IC} \times \text{CPI}}{\text{Clock Rate}}
\]
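As a rough illustration of the CPU time relation above, here is a minimal Python sketch. The function names, the instruction mix, the instruction count, and the 1 GHz clock rate are all assumptions made for the example, not values from the lecture.

```python
def average_cpi(mix):
    """Weighted average CPI for a list of (fraction, cpi) pairs."""
    return sum(fraction * cpi for fraction, cpi in mix)


def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU Time = IC x CPI / Clock Rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical instruction mix (assumed, not from the lecture):
# 50% ALU ops at 1 cycle, 30% loads/stores at 2 cycles, 20% branches at 3 cycles.
mix = [(0.5, 1), (0.3, 2), (0.2, 3)]
cpi = average_cpi(mix)

# Assumed workload: 1 million instructions on a 1 GHz clock.
time_s = cpu_time_seconds(1_000_000, cpi, 1e9)
print(f"average CPI = {cpi:.2f}, CPU time = {time_s * 1e3:.2f} ms")
```

Changing the mix toward cheaper instructions lowers the average CPI, and with it the CPU time, even when IC and the clock rate stay fixed.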
Reviewing (2): CPU Performance Equation
Reviewing (3): Performance Summary
• Performance depends on
– Algorithm: affects IC, possibly CPI
– Programming language: affects IC, CPI
– Compiler: affects IC, CPI
– Instruction set architecture: affects IC, CPI, Tc (clock cycle time)
\[
\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}
\]
Reviewing (4): How to Improve Performance?
We must lower execution time!
• Algorithm
– Determines number of operations executed
• Programming language, compiler, architecture
– Determine number of machine instructions executed per operation (IC)
• Processor and memory system
– Determine how fast instructions are executed (CPI)
• I/O system (including OS)
– Determines how fast I/O operations are executed
Central Processing Unit (CPU)
The organization of a simple computer with one CPU and two I/O devices
Basic Elements
Other devices: Cache; Virtual Memory Support (MMU); ….
Processor
• CPU
– Brain of the computer: executes programs stored in main memory by fetching their instructions, examining them, and executing them one after another
• Bus
– Connects different components
– Parallel wires for transmitting address, data, and control signals
– Can be external to the CPU (connecting it to memory and I/O) or internal
• Control Unit
– Fetches instructions from main memory and determines their types
• Arithmetic Logic Unit (ALU)
– Performs operations such as addition and Boolean operations
• Registers
– High-speed memory used to store temporary results and control information
– Program Counter (PC): points to the next instruction to be fetched
– Instruction Register (IR): holds the instruction currently being executed
CPU Organization
The data path of a typical Von Neumann machine
• Instructions
– Register-memory: fetch memory words into registers
– Register-register
• Data Path Cycle
– The process of running two operands through the ALU and storing the result
– Defines what the machine can do
– The faster the data path cycle is, the faster the computer runs
Arithmetic Logic Unit (ALU)
[Figure: ALU with operand inputs A and B, operation code input op, result output C, and status outputs c, n, z, v]
• Conducts different calculations
– +, -, x, /
– and, or, xor, not
– Shift, …
• Variants
– Integer, Floating Point, Double Precision
– High performance machine will have several!
• Input
– Operands: A, B
– Operation code: obtained from encoded instruction
• Output
– Result: C
– Status codes
• Status codes (usually)
– c: carry out from +, -, x, shift
– n: result is negative
– z: result is zero
– v: result overflowed
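To make the inputs, result, and status codes above concrete, here is a small Python sketch of an ALU. The 4-bit word width, the operation names, and the choice to report a borrow as the carry flag on subtraction are assumptions for illustration; real ALUs differ in these conventions.

```python
def alu(op, a, b=0, width=4):
    """Toy ALU: returns (result, flags) for unsigned `width`-bit operands.

    flags holds the status codes c (carry), n (negative), z (zero), v (overflow).
    """
    mask = (1 << width) - 1        # e.g. 0b1111 for 4 bits
    sign = 1 << (width - 1)        # position of the sign bit
    a &= mask
    b &= mask
    carry = False
    overflow = False

    if op == "add":
        full = a + b
        carry = full > mask
        result = full & mask
        # Signed overflow: operands share a sign that differs from the result's sign.
        overflow = bool(~(a ^ b) & (a ^ result) & sign)
    elif op == "sub":
        full = a - b
        carry = full < 0           # assumed convention: c reports a borrow
        result = full & mask
        # Signed overflow: operands differ in sign and the result's sign differs from a's.
        overflow = bool((a ^ b) & (a ^ result) & sign)
    elif op == "and":
        result = a & b
    elif op == "or":
        result = a | b
    elif op == "xor":
        result = a ^ b
    elif op == "not":
        result = ~a & mask
    else:
        raise ValueError(f"unknown operation: {op}")

    flags = {"c": carry, "n": bool(result & sign), "z": result == 0, "v": overflow}
    return result, flags


result, flags = alu("add", 7, 1)
print(result, flags)
```

In this sketch, adding 7 and 1 in 4 bits yields the bit pattern 1000, so the negative and overflow flags are set while carry and zero are clear.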
Instruction Execution Steps
• Fetch-decode-execute
1. Fetch the next instruction from memory into the instruction register
2. Change the program counter to point to the following instruction
3. Determine the type of instruction just fetched
4. If the instruction uses a word in memory, determine where it is
5. Fetch the word, if needed, into a CPU register
6. Execute the instruction
7. Go to step 1 to begin executing the following instruction
Central to the operation of all computers
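The seven steps above can be sketched as a short Python loop. The one-accumulator machine, its five opcodes (LOAD, ADD, STORE, JUMP, HALT), and the memory layout are hypothetical and chosen only to keep the example small.

```python
def run(memory, max_steps=1000):
    """Run a toy one-accumulator machine; returns the accumulator at HALT."""
    pc = 0    # program counter
    acc = 0   # single accumulator register
    for _ in range(max_steps):
        ir = memory[pc]                  # step 1: fetch into the instruction register
        pc += 1                          # step 2: point the PC at the following instruction
        opcode, address = ir             # step 3: determine the instruction type
        word = None
        if opcode in ("LOAD", "ADD"):    # step 4: does it use a word in memory?
            word = memory[address]       # step 5: fetch that word into the CPU
        # step 6: execute the instruction
        if opcode == "LOAD":
            acc = word
        elif opcode == "ADD":
            acc += word
        elif opcode == "STORE":
            memory[address] = acc
        elif opcode == "JUMP":
            pc = address
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        # step 7: loop back to step 1
    raise RuntimeError("step limit reached without HALT")


# Instructions and data share one memory, as in a Von Neumann machine.
memory = [
    ("LOAD", 4),     # 0: acc = memory[4]
    ("ADD", 5),      # 1: acc += memory[5]
    ("STORE", 6),    # 2: memory[6] = acc
    ("HALT", None),  # 3: stop
    2, 3, 0,         # 4, 5, 6: data words
]
result = run(memory)
print(result, memory[6])
```

A program like `run` is itself a tiny interpreter: it fetches, examines, and executes the instructions of another program.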
Interpreting Instructions
• Interpreter
– A program that fetches, examines, and executes the instructions of another program
– One can write a program to imitate the function of a CPU
– Main advantage: the ability to design a simple processor with the complexity largely confined to the memory holding the interpreter
• Benefits (simple computer with interpreted instructions)
– The ability to fix incorrectly implemented instructions or make up for design deficiencies in the basic hardware
– The opportunity to add new instructions at minimal cost even after delivery of the machine
– Structured design that permits efficient development, testing, and documenting of complex instructions
RISC vs. CISC
• Semantic gap between
– What the machine can do
– What high-level programming languages require
• Key to designing instructions
– Instructions should be able to be issued quickly
– How long an instruction actually takes matters less than how many can be started per second
• Reduced Instruction Set Computer (RISC)
– Did not use interpretation
– Did not have to be backward compatible with existing products
– Small number of instructions, around 50
• Complex Instruction Set Computer (CISC)
– Around 200-300 instructions; examples: DEC VAX and IBM mainframes
• Intel (486 and up)
– A RISC core executes the simplest (most common) instructions
– The more complicated instructions are interpreted in the usual CISC way
Design Principles for Modern Computers
• Instructions directly executed by hardware
– Eliminating a level of interpretation provides high speed for most instructions
– Interpretation is acceptable for less frequently occurring instructions
• Maximize the rate at which instructions are issued
– Parallelism can play a major role in improving performance
• Instructions should be easy to decode
– A critical limit on the issue rate is decoding individual instructions to determine what resources they need
– The fewer different instruction formats, the better
• Only loads and stores should reference memory
– Accessing memory can take a long time
– All other instructions should operate only on registers
• Provide plenty of registers
– Running out of registers forces values to be flushed back to memory
– Memory access is slow
Instruction-Level Parallelism
• A five-stage pipeline
– The state of each stage as a function of time
– Nine clock cycles are illustrated
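Since the slide's figure does not survive in text form, the following Python sketch regenerates the kind of table it describes: which instruction occupies each of the five stages in each of nine clock cycles. It assumes an ideal pipeline with no stalls, and the function name `pipeline_schedule` is made up for this example.

```python
def pipeline_schedule(num_stages=5, num_cycles=9):
    """For each clock cycle, list which instruction (1-based) occupies each stage.

    Assumes an ideal pipeline with no stalls: instruction i enters stage s
    (both 1-based) in clock cycle i + s - 1.
    """
    schedule = []
    for cycle in range(1, num_cycles + 1):
        row = []
        for stage in range(1, num_stages + 1):
            instruction = cycle - stage + 1
            # None marks a stage that has not received an instruction yet.
            row.append(instruction if instruction >= 1 else None)
        schedule.append(row)
    return schedule


# Print one row per clock cycle; columns are stages S1..S5.
for cycle, row in enumerate(pipeline_schedule(), start=1):
    cells = " ".join(f"{'-' if i is None else i:>2}" for i in row)
    print(f"cycle {cycle}: {cells}")
```

From cycle 5 onward every stage is busy, and one instruction leaves the last stage in every cycle.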
Pipelining
• A five-stage pipeline
– Suppose the cycle time is 2 ns
– It takes 10 ns for an instruction to progress all the way through the five-stage pipeline
– So, does the machine run at 100 MIPS?
– No: one instruction completes every 2 ns, so the actual rate is 500 MIPS
• Pipelining
– Allows a tradeoff between latency and processor bandwidth
– Latency: how long it takes to execute an instruction
– Processor bandwidth: how many MIPS the CPU has
• Example
– Suppose a complex instruction takes 10 ns. Under perfect conditions, how many pipeline stages are needed to guarantee 500 MIPS?
– Time per stage: 1 / (500 MIPS) = 2 ns
– Number of stages: 10 ns / 2 ns = 5 stages
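The arithmetic on this slide can be checked with a few lines of Python. This is a sketch under the slide's own idealization (one instruction completes per cycle, no stalls); the function names are made up for illustration.

```python
import math


def mips_pipelined(cycle_time_ns):
    """Ideal pipelined rate: one instruction completes every cycle."""
    return 1000 / cycle_time_ns  # 1 instruction per ns equals 1000 MIPS


def mips_unpipelined(cycle_time_ns, num_stages):
    """Without pipelining, each instruction occupies all stages in turn."""
    return 1000 / (cycle_time_ns * num_stages)


def stages_needed(instruction_latency_ns, target_mips):
    """Number of stages so that each stage is short enough for the target rate."""
    stage_time_ns = 1000 / target_mips
    return math.ceil(instruction_latency_ns / stage_time_ns)


print(mips_pipelined(2))        # 2 ns cycle, pipelined
print(mips_unpipelined(2, 5))   # 2 ns cycle, five stages, no pipelining
print(stages_needed(10, 500))   # 10 ns instruction, 500 MIPS target
```

The three calls reproduce the slide's figures: 500 MIPS with pipelining, 100 MIPS without, and 5 stages for the 10 ns instruction.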
Superscalar Architectures (1)
• Dual five-stage pipelines with a common instruction fetch unit
– Fetches pairs of instructions together and puts each one into its own pipeline
– The two instructions must not conflict over resource usage
– Neither may depend on the result of the other
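The two pairing conditions above can be expressed as a small check. This Python sketch is only illustrative: the `Instr` record, the `SHARED_UNITS` set, and the specific hazards tested (the second instruction reading or overwriting the first one's destination register) are assumptions, not a description of any particular processor.

```python
from collections import namedtuple

# Hypothetical instruction record: destination register, source registers,
# and the functional unit the instruction needs.
Instr = namedtuple("Instr", ["dest", "srcs", "unit"])

# Assumed for illustration: units of which the machine has only one copy.
SHARED_UNITS = {"mem"}


def can_issue_together(first, second):
    """True if `first` and `second` may enter the two pipelines in the same cycle."""
    # Resource conflict: both need a unit that exists only once.
    resource_conflict = first.unit == second.unit and first.unit in SHARED_UNITS
    # Dependence: the second instruction reads the result of the first.
    reads_result = first.dest is not None and first.dest in second.srcs
    # Both write the same register, so their order would matter.
    same_dest = first.dest is not None and first.dest == second.dest
    return not (resource_conflict or reads_result or same_dest)


add = Instr("r1", ("r2", "r3"), "alu")
sub = Instr("r4", ("r1", "r5"), "alu")   # reads r1, which add produces
mul = Instr("r6", ("r7", "r8"), "alu")

print(can_issue_together(add, sub))
print(can_issue_together(add, mul))
```

When the check fails, only the first instruction would be issued and the second would wait for the next cycle.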
Superscalar Architectures (2)
A superscalar processor with five functional units
• Implicit idea
– The S3 stage can issue instructions considerably faster than the S4 stage is able to execute them
Processor-Level Parallelism (1)
• An array processor
– A large number of identical processors that perform the same sequence of instructions on different sets of data
– Different from a standard Von Neumann machine
Processor-Level Parallelism (2)
• A single-bus multiprocessor
– Example: locating the white ball in a picture
• A multicomputer with local memories
Processor-Level Parallelism (3)
• Multiple-Computers (loosely coupled)
– Easier to build
• Multiple-Processors (tightly coupled)
– Easier to program
Exercise
• Ex 1: TRUE OR FALSE, Why?
– The Data Path Cycle defines what the computer can do. The longer the data path cycle is, the faster the computer runs.
– Answer: F
– Reason: The Data Path Cycle defines what the computer can do. The shorter the data path cycle is, the faster the computer runs.
Exercise
• Ex 2: What are the design principles for modern computers?
– (a) Instructions directly executed by hardware
– (b) Minimize rate at which instructions are issued
– (c) Instructions should be easy to decode
– (d) Only loads, stores should reference memory
– (e) Provide plenty of registers
– Answer: [a, c, d, e]
Exercise
• Ex 3: The following diagram gives the organization of a simple computer with one CPU and two I/O devices.
– Is it correct? If not, please correct it in the diagram.
• Solution
– Incorrect.
– In place of Disk, it should be Register.
– In place of Register, it should be Disk.
Exercise
• Ex 4: Suppose each instruction has 5 stages in a computer that uses pipelining. Each stage takes 2 ns.
– What is the maximum number of MIPS that this machine is capable of with 5-stage pipelining?
– What is the maximum number of MIPS that this machine is capable of in the absence of pipelining?
• Solution
– With pipelining: 1 / (2 ns) = 500 MIPS
– Without pipelining: 500 / 5 = 100 MIPS, or 1 / (5 × 2 ns) = 100 MIPS
Attention
• HW1 due in Class
• Quiz next class (for Chapter 1)