

ECE 15B Computer Organization Spring 2010

Dmitri Strukov

Lecture 3: Arithmetic Instructions

Partially adapted from Computer Organization and Design, 4th edition, Patterson and Hennessy, and classes taught by Patterson at Berkeley, Ryan Kastner at UCSB and Mary Jane Irwin at Penn State

ECE 15B Spring 2010

Announcement

• TA office hours for Vivek were moved to Tuesday 11:00 am – 12:00 pm

• Basics of logic design are in Appendix C (P+H)

• SPIM and reading status


Agenda

• Key concepts from last lecture & several new ones

• C operators and operands

• Variables in Assembly: Registers

• Addition and Subtraction in Assembly


Key Concepts from Last Lecture

• Synchronous circuits

• Clocking & Pipelining & Timing Diagram

• CPU simplified diagram

CPU Clocking

• Operation of digital hardware is governed by a constant-rate clock

[Figure: clock timing diagram. Each clock cycle (one clock period) consists of data transfer and computation, followed by a state update.]

• Clock period: duration of a clock cycle, e.g., 250 ps = 0.25 ns = 250×10^-12 s

• Clock frequency (rate): cycles per second, e.g., 4.0 GHz = 4000 MHz = 4.0×10^9 Hz
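Clock period and clock rate are reciprocals of each other, which a short C sketch can make concrete. The function names and the choice of picoseconds and GHz as units are assumptions made for this illustration only.

```c
/* Clock period in picoseconds for a clock rate given in GHz.
   1 GHz = 10^9 cycles per second, so one cycle lasts 1000 ps at 1 GHz. */
double period_ps_from_ghz(double rate_ghz) {
    return 1000.0 / rate_ghz;
}

/* Clock rate in GHz for a clock period given in picoseconds. */
double rate_ghz_from_period_ps(double period_ps) {
    return 1000.0 / period_ps;
}
```

Calling period_ps_from_ghz(4.0) gives 250.0, so the two examples on the slide describe the same clock.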


CPU Overview


… with muxes

• Can’t just join wires together

• Use multiplexers


Below the Program

• High Level Language Program (e.g., C)
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;

  ↓ Compiler

• Assembly Language Program (e.g., MIPS)
    lw $t0, 0($2)
    lw $t1, 4($2)
    sw $t1, 0($2)
    sw $t0, 4($2)

  ↓ Assembler

• Machine Language Program (MIPS)
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111

  ↓ Machine Interpretation

• Hardware Architecture Description (e.g., block diagrams)

  ↓ Architecture Implementation

• Logic Circuit Description (Circuit Schematic Diagrams)
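To make the top level of this stack concrete, the C fragment from the slide can be wrapped in a function. This is only a sketch: the name swap and its signature are assumptions, and the pairing of statements with the lw/sw lines assumes a 4-byte int and that $2 holds the address of v[k].

```c
/* Swap v[k] and v[k+1], as in the C fragment on the slide. */
void swap(int v[], int k) {
    int temp = v[k];   /* lw $t0, 0($2): load v[k]                        */
    v[k] = v[k + 1];   /* lw $t1, 4($2) then sw $t1, 0($2): copy v[k+1]   */
    v[k + 1] = temp;   /* sw $t0, 4($2): store the saved value            */
}
```

Three C statements become four MIPS instructions here because each memory access needs its own load or store.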


Assembly Language

• Basic job of a CPU: execute lots of instructions

• Instructions are the primitive operations that the CPU may execute

• Different CPUs implement different sets of instructions

• Instruction Set Architecture (ISA) is the set of instructions a particular CPU implements

• Examples: Intel 80x86 (Pentium 4), IBM/Motorola PowerPC (Macintosh), MIPS, Intel IA64, ARM


Instruction Set Architectures

• Early trend was to add more and more instructions to new CPUs to do elaborate operations
  – VAX architecture had an instruction to multiply polynomials

• RISC philosophy (Cocke at IBM, Patterson, Hennessy, 1980s): RISC = Reduced Instruction Set Computing
  – Keep the instruction set small and simple, which makes it easier to build fast hardware
  – Let software (compiler) do complicated operations by composing simpler ones


How to Assess Performance of Instruction Set Architecture?

SPEC CPU Benchmark

• Programs used to measure performance
  – Supposedly typical of actual workload

• Standard Performance Evaluation Corp (SPEC)
  – Develops benchmarks for CPU, I/O, Web, …

• SPEC CPU2006
  – Elapsed time to execute a selection of programs
    • Negligible I/O, so focuses on CPU performance
  – Normalize relative to reference machine
  – Summarize as geometric mean of performance ratios
    • CINT2006 (integer) and CFP2006 (floating-point)

\[
\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}
\]


CINT2006 for Opteron X4 2356

Name         Description                     IC×10^9    CPI  Tc (ns)  Exec time (s)  Ref time (s)  SPECratio
perl         Interpreted string processing     2,118   0.75     0.40            637         9,777       15.3
bzip2        Block-sorting compression         2,389   0.85     0.40            817         9,650       11.8
gcc          GNU C Compiler                    1,050   1.72     0.40            724         8,050       11.1
mcf          Combinatorial optimization          336  10.00     0.40          1,345         9,120        6.8
go           Go game (AI)                      1,658   1.09     0.40            721        10,490       14.6
hmmer        Search gene sequence              2,783   0.80     0.40            890         9,330       10.5
sjeng        Chess game (AI)                   2,176   0.96     0.40            837        12,100       14.5
libquantum   Quantum computer simulation       1,623   1.61     0.40          1,047        20,720       19.8
h264avc      Video compression                 3,102   0.80     0.40            993        22,130       22.3
omnetpp      Discrete event simulation           587   2.94     0.40            690         6,250        9.1
astar        Games/path finding                1,082   1.79     0.40            773         7,020        9.1
xalancbmk    XML parsing                       1,058   2.70     0.40          1,143         6,900        6.0
Geometric mean                                                                                           11.7
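The columns of this table are related by two simple formulas, sketched below in C. The function names are assumptions made for illustration. The unit handling relies on IC being given in units of 10^9 instructions and Tc in nanoseconds, so the powers of ten cancel and the product comes out in seconds.

```c
/* Execution time in seconds:
   (IC x 10^9) x CPI x (Tc x 10^-9 s) = IC x CPI x Tc seconds. */
double exec_time_s(double ic_billions, double cpi, double tc_ns) {
    return ic_billions * cpi * tc_ns;
}

/* SPECratio: reference time divided by measured execution time. */
double spec_ratio(double ref_time_s, double measured_time_s) {
    return ref_time_s / measured_time_s;
}
```

For the perl row, exec_time_s(2118, 0.75, 0.40) gives about 635 s, close to the tabulated 637 s (the small gap presumably comes from rounding in the CPI and Tc columns), and spec_ratio(9777, 637) gives about 15.3.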


Review: Technology Trends

Uniprocessor Performance (SPECint)

[Figure: performance (vs. VAX-11/780) plotted by year, with growth segments labeled 1.25x/year, 1.52x/year, and 1.20x/year, and a "3X" annotation]

• VAX: 1.25x/year, 1978 to 1986

• RISC + x86: 1.52x/year, 1986 to 2002

• RISC + x86: 1.20x/year, 2002 to present

• “Sea change” in chip design: multiple “cores” or processors per chip


MIPS Architecture

• MIPS
  – Semiconductor company that built one of the first commercial RISC architectures

• We will study the MIPS architecture in detail in this class

• Why MIPS instead of Intel 80x86?
  – MIPS is simple and elegant. Don’t want to get bogged down in gritty details
  – MIPS is widely used in embedded apps
  – There are more embedded computers than PCs


Assembly Variables: Registers

• Unlike HLLs such as C or Java, assembly cannot use variables
  – Why not? Keep hardware simple

• Assembly operands are registers
  – Limited number of special locations built directly into the hardware
  – Operations can only be performed on these
  – Benefit: since the register file is small, it is very fast


Assembly Variables: Registers

• Drawback:
  – Registers are in hardware, so there is a predetermined number of them

• Solution:
  – MIPS code must be very carefully put together to use registers efficiently

• 32 registers in MIPS
  – Smaller is faster

• Each MIPS register is 32 bits wide
  – A group of 32 bits is called a word in MIPS


Assembly Variables

• Registers are numbered from 0 to 31

• Each register can be referred to by number or name

• Number references
  – $0, $1, $2, …, $30, $31


Assembly Variables: Registers

• By convention, each register also has a name to make it easier to code

• For now:
    $16 - $23 → $s0 - $s7 (correspond to C variables)
    $8 - $15 → $t0 - $t7 (correspond to temporary variables)
  – Will explain the other 16 register names later

• In general, use names to make your code more readable
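The number-to-name mapping introduced so far can be written down as a small lookup. The sketch below is illustrative only: reg_name and its buffer-based interface are assumptions, and registers outside the two ranges named on this slide simply fall back to their numeric form.

```c
#include <stdio.h>

/* Write the conventional name of MIPS register `num` into buf and return buf.
   Only $s0-$s7 ($16-$23) and $t0-$t7 ($8-$15) are named here; every other
   register is printed by number. */
const char *reg_name(int num, char *buf, size_t len) {
    if (num >= 16 && num <= 23)
        snprintf(buf, len, "$s%d", num - 16);
    else if (num >= 8 && num <= 15)
        snprintf(buf, len, "$t%d", num - 8);
    else
        snprintf(buf, len, "$%d", num);
    return buf;
}
```

With an 8-byte buffer, register 17 comes back as "$s1" and register 8 as "$t0".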


C, Java Variables vs. Registers

• In C (and most High Level Languages) variables are declared first and given a type
  – Example:
      int fahr, celsius;
      char a, b, c, d, e;
  – Each variable can only represent a value of the type it was declared as (cannot mix and match int and char variables)

• In assembly language the registers have no type
  – Operation determines how register contents are treated


Comments in Assembly

• Another way to make your code more readable: comments

• Hash (#) is used for MIPS comments
  – Anything from the hash mark to the end of the line is a comment and will be ignored
  – Note: this differs from C, where comments have the format /* comment */ and so can span many lines


Assembly Instructions

• In assembly language, each statement (called an instruction), executes exactly one of a short list of simple commands

• Unlike in C (and most other high level languages), each line of assembly code contains at most one instruction

• Instructions are related to operations (=,+,-, *,/) in C or Java


MIPS Syntax

• Instruction syntax:
    [Label:] Op-code [oper. 1], [oper. 2], [oper. 3] [# comment]
      (0)      (1)      (2)        (3)        (4)        (5)
  – Where
    1) operation name
    2, 3, 4) operands
    5) comment
    0) label field is optional, will discuss later
  – For arithmetic and logic instructions
    2) operand getting result (“destination”)
    3) 1st operand for operation (“source 1”)
    4) 2nd operand for operation (“source 2”)

• Syntax is rigid
  – 1 operator, 3 operands
  – Why? Keep hardware simple via regularity


Addition and Subtraction of Integers

• Addition in assembly
  – Example:
      add $s0, $s1, $s2 (in MIPS)
    • Equivalent to: a = b + c (in C)
    • Where MIPS registers $s0, $s1, $s2 are associated with C variables a, b, c

• Subtraction in assembly
  – Example:
      sub $s3, $s4, $s5 (in MIPS)
    • Equivalent to: d = e - f (in C)
    • Where MIPS registers $s3, $s4, $s5 are associated with C variables d, e, f


Addition and Subtraction of Integers

• The following C statement in MIPS?
    a = b + c + d - e

• Break into multiple instructions
    add $t0, $s1, $s2 # temp = b + c
    add $t0, $t0, $s3 # temp = temp + d
    sub $s0, $t0, $s4 # a = temp - e
  – Notes:
    • A single line of C may break up into several lines of MIPS
    • Everything after the hash mark on each line is ignored (i.e., comments)
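The same decomposition can be mimicked in C by allowing only one operation per statement. This is a sketch under the assumption, suggested by the instruction comments, that $s0 through $s4 hold a through e; the function name is made up for the example.

```c
/* Evaluate b + c + d - e one operation at a time, mirroring the three
   MIPS instructions. The local `temp` plays the role of $t0. */
int add_sub_chain(int b, int c, int d, int e) {
    int temp;
    temp = b + c;      /* add $t0, $s1, $s2 */
    temp = temp + d;   /* add $t0, $t0, $s3 */
    return temp - e;   /* sub $s0, $t0, $s4 */
}
```

Reusing temp for both additions corresponds to reusing $t0 as source and destination in the second add.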


Addition and Subtraction of Integers

• How do we do this? f = (g + h) - (i + j)
  – Use intermediate temporary registers
      add $t0, $s1, $s2 # temp0 = g + h
      add $t1, $s3, $s4 # temp1 = i + j
      sub $s0, $t0, $t1 # f = (g + h) - (i + j)


Immediates

• Immediates are numerical constants

• They appear often in code, so there are special instructions for them

• Add immediate:
    addi $s0, $s1, 10 # f = g + 10 (in C)
  – Where MIPS registers $s0 and $s1 are associated with C variables f and g
  – Syntax similar to add instruction, except that the last argument is a number instead of a register


Immediates

• There is no Subtract Immediate in MIPS: Why?
  – Remove redundant operations, i.e., if an operation can be decomposed into simpler ones, exclude it from the set of instructions
  – addi …, -X is equivalent to subi …, X so no subi

• Example
    addi $s0, $s1, -10 # f = g - 10
  – Where MIPS registers $s0 and $s1 are associated with C variables f and g
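A minimal C model shows why a separate subtract-immediate is unnecessary. The function below is an illustration, not a description of the hardware: its name is an assumption, overflow is ignored, and the int16_t parameter reflects the fact that the MIPS addi immediate is a 16-bit signed field, a detail not stated on the slide.

```c
#include <stdint.h>

/* Model of addi: add a signed 16-bit immediate to a 32-bit register value.
   Passing a negative immediate performs a subtraction. */
int32_t addi(int32_t rs, int16_t imm) {
    return rs + imm;
}
```

With g = 25, addi(g, -10) gives 15, the same result a hypothetical subi with immediate 10 would produce.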


Register Zero

• One particular immediate, the number zero (0), appears very often in code
  – So define register zero ($0 or $zero) to always have the value 0

• Example
    add $s0, $s1, $zero # f = g
  – Where MIPS registers $s0 and $s1 are associated with C variables f, g

• Defined in hardware, so the instruction
    addi $zero, $zero, 5
  will not do anything!
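The behavior of register zero can be sketched with a toy register file in C. Everything here (the RegFile type, the helper names, and the choice to drop writes in software) is an assumption made for illustration; real hardware hardwires the value.

```c
#include <stdint.h>

#define NUM_REGS 32

/* A toy register file: 32 registers, each 32 bits wide. */
typedef struct {
    int32_t r[NUM_REGS];
} RegFile;

/* Read a register; register 0 always reads as 0. */
int32_t reg_read(const RegFile *rf, int n) {
    return n == 0 ? 0 : rf->r[n];
}

/* Write a register; writes to register 0 are silently dropped. */
void reg_write(RegFile *rf, int n, int32_t value) {
    if (n != 0)
        rf->r[n] = value;
}

/* addi rd, rs, imm */
void do_addi(RegFile *rf, int rd, int rs, int16_t imm) {
    reg_write(rf, rd, reg_read(rf, rs) + imm);
}

/* add rd, rs, rt */
void do_add(RegFile *rf, int rd, int rs, int rt) {
    reg_write(rf, rd, reg_read(rf, rs) + reg_read(rf, rt));
}
```

After do_addi(&rf, 0, 0, 5), reading register 0 still returns 0, and do_add(&rf, 16, 17, 0) copies register 17 into register 16, which is the f = g idiom from the slide.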

Additional Notes: CPU Time

\[
\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}
\]

• Performance improved by
  – Reducing number of clock cycles
  – Increasing clock rate
  – Hardware designer must often trade off clock rate against cycle count


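Additional Notes: CPU Time Example

• Computer A: 2 GHz clock, 10 s CPU time

• Designing Computer B
  – Aim for 6 s CPU time
  – Can do faster clock, but causes 1.2 × clock cycles

• How fast must Computer B clock be?

\[
\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}
\]

\[
\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}
\]

\[
\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}
\]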
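The arithmetic of this example can be checked with two one-line helpers. The function names are assumptions for this sketch; the numbers used in the usage note are the ones given on the slide.

```c
/* Clock cycles = CPU time x clock rate. */
double clock_cycles(double cpu_time_s, double clock_rate_hz) {
    return cpu_time_s * clock_rate_hz;
}

/* Clock rate needed to finish `cycles` clock cycles in `cpu_time_s` seconds. */
double required_clock_rate(double cycles, double cpu_time_s) {
    return cycles / cpu_time_s;
}
```

Computer A runs clock_cycles(10.0, 2e9) = 20×10^9 cycles; Computer B needs 1.2 times as many, so required_clock_rate(1.2 * 20e9, 6.0) comes out at about 4×10^9 Hz, i.e. 4 GHz.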


Additional Notes: Instruction Count and CPI

\[
\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}
\]

\[
\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}
\]

• Instruction Count for a program
  – Determined by program, ISA and compiler

• Average cycles per instruction
  – Determined by CPU hardware
  – If different instructions have different CPI
    • Average CPI affected by instruction mix


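Additional Notes: CPI Example

• Computer A: Cycle Time = 250 ps, CPI = 2.0

• Computer B: Cycle Time = 500 ps, CPI = 1.2

• Same ISA

• Which is faster, and by how much?

\[
\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}
\]

\[
\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}
\]

A is faster…

\[
\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2
\]

…by this much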
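The comparison can be reproduced numerically. In the sketch below the function name is an assumption, and the instruction count I is passed explicitly even though it cancels in the ratio because both machines run the same ISA.

```c
/* CPU time in picoseconds: Instruction Count x CPI x Cycle Time. */
double cpu_time_ps(double instr_count, double cpi, double cycle_time_ps) {
    return instr_count * cpi * cycle_time_ps;
}
```

For any instruction count, for example I = 1000 (an arbitrary value chosen for the illustration), cpu_time_ps(1000, 1.2, 500) divided by cpu_time_ps(1000, 2.0, 250) is about 1.2, so A is faster by that factor.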


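Additional Notes: CPI in More Detail

• If different instruction classes take different numbers of cycles

\[
\text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{Instruction Count}_i \right)
\]

• Weighted average CPI

\[
\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)
\]

where \(\text{Instruction Count}_i / \text{Instruction Count}\) is the relative frequency of class \(i\).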
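A short C sketch of the weighted-average computation follows. The function names and the array-based interface are assumptions, and the sample numbers in the usage note are hypothetical, chosen only so the result is easy to check by hand.

```c
#include <stddef.h>

/* Total clock cycles: sum over instruction classes of CPI_i x count_i. */
double total_cycles(const double cpi[], const double count[], size_t n) {
    double cycles = 0.0;
    for (size_t i = 0; i < n; i++)
        cycles += cpi[i] * count[i];
    return cycles;
}

/* Weighted average CPI: total cycles divided by total instruction count.
   Assumes the total instruction count is greater than zero. */
double average_cpi(const double cpi[], const double count[], size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += count[i];
    return total_cycles(cpi, count, n) / total;
}
```

With three hypothetical classes of CPI 1, 2, 3 and counts 2, 1, 2, the program takes 10 cycles for 5 instructions, so the average CPI is 2.0.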


Additional Notes: Pipelining Analogy

• Pipelined laundry: overlapping execution
  – Parallelism improves performance

• Four loads: Speedup = 8/3.5 = 2.3

• Non-stop: Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
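The speedup formula can be evaluated for any number of loads. The sketch below reads the slide's numbers as 2 time units per load when done sequentially and 0.5n + 1.5 time units for n pipelined loads; the function name and that interpretation of the units are assumptions.

```c
/* Laundry analogy speedup for n loads:
   sequential time = 2n, pipelined time = 0.5n + 1.5. */
double laundry_speedup(int n) {
    double sequential = 2.0 * n;
    double pipelined = 0.5 * n + 1.5;
    return sequential / pipelined;
}
```

laundry_speedup(4) gives 8/3.5, about 2.3, and the value creeps toward 4 as n grows, which is why the non-stop case approaches the number of stages.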


Additional Notes: Pipeline Performance

[Figure: timing diagrams comparing single-cycle execution (Tc = 800 ps) with pipelined execution (Tc = 200 ps)]


Conclusions

• In MIPS assembly language
  – Registers replace C variables
  – One instruction (simple operation) per line
  – Simpler is faster


Review

• Instructions so far:
  – add, addi, sub

• Registers so far
  – C variables: $s0 - $s7
  – Temporary variables: $t0 - $t9
  – Zero: $zero