ECE 15B Computer Organization, Spring 2010
Dmitri Strukov
Lecture 3: Arithmetic Instructions
Partially adapted from Computer Organization and Design, 4th edition, Patterson and Hennessy, and classes taught by Patterson at Berkeley, Ryan Kastner at UCSB and Mary Jane Irwin at Penn State
ECE 15B Spring 2010
Announcement
• TA office hours for Vivek were moved to Tuesday 11:00 am – 12:00 pm
• Basics of logic design are in Appendix C (P+H)
• SPIM and reading status
Agenda
• Key concepts from last lecture & several new ones
• C operators and operands
• Variables in Assembly: Registers
• Addition and Subtraction in Assembly
Key Concepts from Last Lecture
• Synchronous circuits
• Clocking & Pipelining & Timing Diagram
• CPU simplified diagram
CPU Clocking
• Operation of digital hardware is governed by a constant-rate clock
[Figure: clock waveform; within each clock period, data transfer and computation take place, then state is updated]
Clock period: duration of a clock cycle, e.g., 250 ps = 0.25 ns = 250×10⁻¹² s
Clock frequency (rate): cycles per second, e.g., 4.0 GHz = 4000 MHz = 4.0×10⁹ Hz
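Because clock period and clock rate are reciprocals, the two examples above describe the same clock (1 / 4.0 GHz = 0.25 ns = 250 ps). A minimal C sketch of the conversion; the function names are hypothetical helpers, not part of the course material:

```c
/* Clock period and clock rate are reciprocals: T = 1 / f. */

/* Convert a clock rate in hertz to a clock period in seconds. */
double clock_period_s(double rate_hz) {
    return 1.0 / rate_hz;
}

/* Convert a clock period in seconds to a clock rate in hertz. */
double clock_rate_hz(double period_s) {
    return 1.0 / period_s;
}
```

For example, clock_period_s(4.0e9) gives 2.5e-10 s, i.e., 250 ps.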
Below the Program
High Level Language Program (e.g., C)
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
        ↓ Compiler
Assembly Language Program (e.g., MIPS)
    lw $t0, 0($2)
    lw $t1, 4($2)
    sw $t1, 0($2)
    sw $t0, 4($2)
        ↓ Assembler
Machine Language Program (MIPS)
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
        ↓ Machine Interpretation
Hardware Architecture Description (e.g., block diagrams)
        ↓ Architecture Implementation
Logic Circuit Description (Circuit Schematic Diagrams)
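The C fragment at the top of this hierarchy can be wrapped as a complete function. A sketch, assuming (as the offsets 0 and 4 in the assembly suggest) that $2 holds the address of v[k] and an int is 4 bytes; the function name swap_adjacent is hypothetical. The compiled MIPS code performs both loads before both stores, so the per-line comments are only a rough correspondence:

```c
/* Exchange v[k] and v[k+1], as in the C code above.
   Rough correspondence to the MIPS sequence, assuming $2 = &v[k]:
   both loads (lw) read v[k] and v[k+1]; both stores (sw) write them back swapped. */
void swap_adjacent(int v[], int k) {
    int temp = v[k];      /* lw $t0, 0($2) */
    v[k] = v[k + 1];      /* lw $t1, 4($2) then sw $t1, 0($2) */
    v[k + 1] = temp;      /* sw $t0, 4($2) */
}
```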
Assembly Language
• Basic job of a CPU: execute lots of instructions
• Instructions are the primitive operations that the CPU may execute
• Different CPUs implement different sets of instructions
• Instruction Set Architecture (ISA) is the set of instructions a particular CPU implements
• Examples: Intel 80x86 (Pentium 4), IBM/Motorola PowerPC (Macintosh), MIPS, Intel IA64, ARM
Instruction Set Architectures
• Early trend was to add more and more instructions to new CPUs to do elaborate operations
  – VAX architecture had an instruction to multiply polynomials
• RISC philosophy (Cocke IBM, Patterson, Hennessy, 1980s)
  RISC = Reduced Instruction Set Computing
  – Keep the instruction set small and simple, which makes it easier to build fast hardware
  – Let software (compiler) do complicated operations by composing simpler ones
SPEC CPU Benchmark
• Programs used to measure performance
  – Supposedly typical of actual workload
• Standard Performance Evaluation Corp (SPEC)
  – Develops benchmarks for CPU, I/O, Web, …
• SPEC CPU2006
  – Elapsed time to execute a selection of programs
    • Negligible I/O, so focuses on CPU performance
  – Normalize relative to reference machine
  – Summarize as geometric mean of performance ratios
    • CINT2006 (integer) and CFP2006 (floating-point)
$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
CINT2006 for Opteron X4 2356
Name        Description                    IC×10⁹   CPI    Tc (ns)  Exec time (s)  Ref time (s)  SPECratio
perl        Interpreted string processing  2,118    0.75   0.40     637            9,777         15.3
bzip2       Block-sorting compression      2,389    0.85   0.40     817            9,650         11.8
gcc         GNU C Compiler                 1,050    1.72   0.40     724            8,050         11.1
mcf         Combinatorial optimization     336      10.00  0.40     1,345          9,120         6.8
go          Go game (AI)                   1,658    1.09   0.40     721            10,490        14.6
hmmer       Search gene sequence           2,783    0.80   0.40     890            9,330         10.5
sjeng       Chess game (AI)                2,176    0.96   0.40     837            12,100        14.5
libquantum  Quantum computer simulation    1,623    1.61   0.40     1,047          20,720        19.8
h264avc     Video compression              3,102    0.80   0.40     993            22,130        22.3
omnetpp     Discrete event simulation      587      2.94   0.40     690            6,250         9.1
astar       Games/path finding             1,082    1.79   0.40     773            7,020         9.1
xalancbmk   XML parsing                    1,058    2.70   0.40     1,143          6,900         6.0
Geometric mean of SPECratios: 11.7
Review: Technology Trends, Uniprocessor Performance (SPECint)
• VAX: 1.25x/year, 1978 to 1986
• RISC + x86: 1.52x/year, 1986 to 2002
• RISC + x86: 1.20x/year, 2002 to present
[Figure: uniprocessor performance relative to VAX-11/780, with trend segments labeled 1.25x/year, 1.52x/year, and 1.20x/year]
“Sea change” in chip design: multiple “cores” or processors per chip
MIPS Architecture
• MIPS
  – Semiconductor company that built one of the first commercial RISC architectures
• We will study the MIPS architecture in detail in this class
• Why MIPS instead of Intel 80x86?
  – MIPS is simple and elegant; we don't want to get bogged down in gritty details
  – MIPS is widely used in embedded apps
  – There are more embedded computers than PCs
Assembly Variables: Registers
• Unlike high-level languages (HLLs) such as C or Java, assembly cannot use variables
  – Why not? Keep hardware simple
• Assembly operands are registers
  – Limited number of special locations built directly into the hardware
  – Operations can only be performed on these
  – Benefit: since the register file is small, it is very fast
Assembly Variables: Registers
• Drawback:
  – Registers are in hardware, so there is a predetermined number of them
• Solution:
  – MIPS code must be very carefully put together to use registers efficiently
• 32 registers in MIPS
  – Smaller is faster
• Each MIPS register is 32 bits wide
  – A group of 32 bits is called a word in MIPS
Assembly Variables
• Registers are numbered from 0 to 31
• Each register can be referred to by number or name
• Number references:
  – $0, $1, $2, …, $30, $31
Assembly Variables: Registers
• By convention, each register also has a name to make it easier to code
• For now:
  $16 - $23 → $s0 - $s7 (correspond to C variables)
  $8 - $15 → $t0 - $t7 (correspond to temporary variables)
  Will explain the other 16 register names later
• In general, use names to make your code more readable
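The number-to-name convention introduced so far is a simple offset within each range. A sketch in C covering only the two ranges on this slide; mips_reg_name is a hypothetical helper, and it deliberately reports no name for registers whose names the lecture defers:

```c
#include <stdio.h>

/* Write the conventional name of register `num` into buf.
   Covers only $8-$15 ($t0-$t7) and $16-$23 ($s0-$s7).
   Returns 1 on success, 0 for registers not covered yet. */
int mips_reg_name(int num, char *buf, size_t len) {
    if (num >= 8 && num <= 15) {
        snprintf(buf, len, "$t%d", num - 8);
        return 1;
    }
    if (num >= 16 && num <= 23) {
        snprintf(buf, len, "$s%d", num - 16);
        return 1;
    }
    return 0;
}
```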
C, Java Variables vs. Registers
• In C (and most high-level languages), variables are declared first and given a type
  – Example:
    int fahr, celcius;
    char a, b, c, d, e;
  – Each variable can only represent a value of the type it was declared as (cannot mix and match int and char variables)
• In assembly language, the registers have no type
  – The operation determines how register contents are treated
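To make "the operation determines how register contents are treated" concrete, the sketch below reads one 32-bit pattern in two ways, as an unsigned integer and as a two's-complement signed integer. It is an illustration in C, not MIPS code; the helper names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* A register holds only 32 bits; the interpretation comes from the operation. */

/* Read the bits as an unsigned integer. */
uint32_t as_unsigned(uint32_t bits) {
    return bits;
}

/* Read the same bits as a two's-complement signed integer.
   memcpy copies the bit pattern without any value conversion. */
int32_t as_signed(uint32_t bits) {
    int32_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

The all-ones pattern 0xFFFFFFFF is 4294967295 when read as unsigned but -1 when read as signed.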
Comments in Assembly
• Another way to make your code more readable: comments
• Hash (#) is used for MIPS comments
  – Anything from the hash mark to the end of the line is a comment and will be ignored
  – Note: this differs from C, where comments have the format /* comment */ and so can span many lines
Assembly Instructions
• In assembly language, each statement (called an instruction) executes exactly one of a short list of simple commands
• Unlike in C (and most other high-level languages), each line of assembly code contains at most one instruction
• Instructions are related to operations (=, +, -, *, /) in C or Java
MIPS Syntax
• Instruction syntax:
  [Label:]  Op-code  [oper. 1], [oper. 2], [oper. 3]  [#comment]
    (0)       (1)       (2)        (3)        (4)         (5)
  – Where 1) operation name; 2, 3, 4) operands; 5) comments; 0) label field is optional, will discuss later
  – For arithmetic and logic instructions:
    2) operand getting result (“destination”)
    3) 1st operand for operation (“source 1”)
    4) 2nd operand for operation (“source 2”)
• Syntax is rigid
  – 1 operator, 3 operands
  – Why? Keep hardware simple via regularity
Addition and Subtraction of Integers
• Addition in assembly
  – Example: add $s0, $s1, $s2 (in MIPS)
  – Equivalent to: a = b + c (in C)
  – Where MIPS registers $s0, $s1, $s2 are associated with C variables a, b, c
• Subtraction in assembly
  – Example: sub $s3, $s4, $s5 (in MIPS)
  – Equivalent to: d = e - f (in C)
  – Where MIPS registers $s3, $s4, $s5 are associated with C variables d, e, f
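The two examples above can be modeled with a tiny register file in C. This is a sketch under stated assumptions: register numbers follow the lecture's convention ($s0-$s7 = 16-23), the names S0...S5, reg, mips_add and mips_sub are hypothetical, and arithmetic wraps modulo 2³² (real MIPS add and sub raise an overflow exception instead, which this model ignores):

```c
#include <stdint.h>

/* Register numbers for $s0-$s5 ($s0 = 16 per the naming convention). */
enum { S0 = 16, S1, S2, S3, S4, S5 };

/* 32 registers, each 32 bits wide. */
static int32_t reg[32];

/* add rd, rs, rt : rd = rs + rt (wraps; no overflow trap modeled) */
void mips_add(int rd, int rs, int rt) {
    reg[rd] = (int32_t)((uint32_t)reg[rs] + (uint32_t)reg[rt]);
}

/* sub rd, rs, rt : rd = rs - rt (wraps; no overflow trap modeled) */
void mips_sub(int rd, int rs, int rt) {
    reg[rd] = (int32_t)((uint32_t)reg[rs] - (uint32_t)reg[rt]);
}
```

With b = 3 in $s1 and c = 4 in $s2, mips_add(S0, S1, S2) leaves 7 in $s0, matching a = b + c.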
Addition and Subtraction of Integers
• How do we write the following C statement in MIPS? a = b + c + d - e
  – Where C variables a, b, c, d, e are associated with MIPS registers $s0, $s1, $s2, $s3, $s4
• Break it into multiple instructions:
  add $t0, $s1, $s2   # temp = b + c
  add $t0, $t0, $s3   # temp = temp + d
  sub $s0, $t0, $s4   # a = temp - e
  – Notes:
    • A single line of C may break up into several lines of MIPS
    • Everything after the hash mark on each line is ignored (i.e., comments)
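The instruction sequence above computes b + c + d - e one operation at a time, reusing a single temporary. The same decomposition written as C with one operation per statement (compute_a is a hypothetical name; the local temp stands in for $t0):

```c
/* One operation per statement, mirroring the three MIPS instructions. */
int compute_a(int b, int c, int d, int e) {
    int temp = b + c;   /* add $t0, $s1, $s2 */
    temp = temp + d;    /* add $t0, $t0, $s3 */
    return temp - e;    /* sub $s0, $t0, $s4 */
}
```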
Addition and Subtraction of Integers
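• How do we do this? f = (g + h) - (i + j)
  – Use intermediate temporary registers
  – Where C variables f, g, h, i, j are associated with MIPS registers $s0, $s1, $s2, $s3, $s4
  add $t0, $s1, $s2   # t0 = g + h
  add $t1, $s3, $s4   # t1 = i + j
  sub $s0, $t0, $t1   # f = (g + h) - (i + j)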
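Here two temporaries are needed because both partial sums must be held at the same time before the final subtraction. A C rendering with one operation per statement (compute_f is a hypothetical name; t0 and t1 stand in for $t0 and $t1):

```c
/* Both partial sums are live at once, so two temporaries are required. */
int compute_f(int g, int h, int i, int j) {
    int t0 = g + h;     /* add $t0, $s1, $s2 */
    int t1 = i + j;     /* add $t1, $s3, $s4 */
    return t0 - t1;     /* sub $s0, $t0, $t1 */
}
```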
Immediates
• Immediates are numerical constants
• They appear often in code, so there are special instructions for them
• Add immediate:
  addi $s0, $s1, 10   # f = g + 10 (in C)
  – Where MIPS registers $s0 and $s1 are associated with C variables f and g
  – Syntax is similar to the add instruction, except that the last argument is a number instead of a register
Immediates
• There is no subtract immediate in MIPS. Why?
  – Remove redundant operations: if an operation can be decomposed into simpler ones, exclude it from the instruction set
  – addi …, -X is equivalent to subi …, X, so there is no subi
• Example:
  addi $s0, $s1, -10   # f = g - 10
  – Where MIPS registers $s0 and $s1 are associated with C variables f and g
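A sketch of why subi would be redundant: adding a negative immediate produces exactly the result a subtract-immediate would. The helper mips_addi is hypothetical and wraps modulo 2³² like 32-bit hardware arithmetic; the immediate's bit-width limits are not modeled here:

```c
#include <stdint.h>

/* addi rt, rs, imm : rt = rs + imm, computed with 32-bit wraparound.
   A negative imm gives subtraction, so "subi rt, rs, X" = "addi rt, rs, -X". */
int32_t mips_addi(int32_t rs, int32_t imm) {
    return (int32_t)((uint32_t)rs + (uint32_t)imm);
}
```

With g = 25, mips_addi(25, 10) gives 35 (f = g + 10) and mips_addi(25, -10) gives 15 (f = g - 10).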
Register Zero
• One particular immediate, the number zero (0), appears very often in code
  – So define register zero ($0 or $zero) to always have the value 0
• Example:
  add $s0, $s1, $zero   # f = g
  – Where MIPS registers $s0 and $s1 are associated with C variables f and g
• Defined in hardware, so the instruction addi $zero, $zero, 5 will not do anything: $zero still holds 0
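One way to picture a hardwired zero register is a register file whose read path always returns 0 for register 0 and whose write path discards writes to it. A C sketch with hypothetical names (regs, read_reg, write_reg, do_addi), not a description of any particular hardware implementation:

```c
#include <stdint.h>

/* 32 registers; entry 0 is never used because $zero is hardwired. */
static int32_t regs[32];

/* Reading $zero always yields 0. */
int32_t read_reg(int r) {
    return r == 0 ? 0 : regs[r];
}

/* Writes to $zero are silently discarded. */
void write_reg(int r, int32_t value) {
    if (r != 0)
        regs[r] = value;
}

/* addi rt, rs, imm built on the hardwired-zero register file. */
void do_addi(int rt, int rs, int32_t imm) {
    write_reg(rt, (int32_t)((uint32_t)read_reg(rs) + (uint32_t)imm));
}
```

Calling do_addi(0, 0, 5) (addi $zero, $zero, 5) leaves read_reg(0) at 0, while do_addi(16, 0, 7) loads the constant 7 into $s0.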
Additional Notes: CPU Time
• Performance improved by
  – Reducing number of clock cycles
  – Increasing clock rate
  – Hardware designer must often trade off clock rate against cycle count
$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$
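Additional Notes: CPU Time Example
• Computer A: 2 GHz clock, 10 s CPU time
• Designing Computer B
  – Aim for 6 s CPU time
  – Can do faster clock, but causes 1.2 × clock cycles
• How fast must Computer B clock be?
$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$
$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$$
$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$$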
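The derivation above takes three steps: recover A's cycle count, scale it by B's cycle penalty, then divide by B's target time. A C sketch of those steps; the function and parameter names are hypothetical:

```c
/* Required clock rate for Computer B:
   cycles_A = time_A x rate_A, cycles_B = factor x cycles_A,
   rate_B = cycles_B / time_B. */
double required_clock_rate_hz(double time_a_s, double rate_a_hz,
                              double cycle_factor, double time_b_s) {
    double cycles_a = time_a_s * rate_a_hz;    /* 10 s x 2 GHz = 20e9 */
    double cycles_b = cycle_factor * cycles_a; /* 1.2 x 20e9 = 24e9 */
    return cycles_b / time_b_s;                /* 24e9 / 6 s = 4 GHz */
}
```

required_clock_rate_hz(10.0, 2.0e9, 1.2, 6.0) returns approximately 4.0e9, i.e., 4 GHz.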
Additional Notes: Instruction Count and CPI
• Instruction Count for a program
  – Determined by program, ISA and compiler
• Average cycles per instruction
  – Determined by CPU hardware
  – If different instructions have different CPI
    • Average CPI affected by instruction mix
$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$
$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$
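The two equations above chain together: instruction count times CPI gives clock cycles, and dividing by the clock rate gives CPU time. A C sketch with a hypothetical function name:

```c
/* CPU time = Instruction Count x CPI / Clock Rate. */
double cpu_time_from_cpi(double instr_count, double cpi, double clock_rate_hz) {
    double clock_cycles = instr_count * cpi;  /* cycles = IC x CPI */
    return clock_cycles / clock_rate_hz;      /* seconds */
}
```

For instance, 10⁹ instructions at CPI 2.0 on a 2 GHz clock take 1 s.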
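Additional Notes: CPI Example
• Computer A: Cycle Time = 250 ps, CPI = 2.0
• Computer B: Cycle Time = 500 ps, CPI = 1.2
• Same ISA
• Which is faster, and by how much?
$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$
$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$
$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$
A is faster, by a factor of 1.2.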
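Since both machines share an ISA, the instruction count I is the same and cancels in the ratio, so it is enough to compare CPI × cycle time. A C sketch with a hypothetical helper name:

```c
/* CPU time per instruction, in picoseconds: CPI x cycle time.
   The instruction count I cancels when comparing machines with the same ISA. */
double time_per_instr_ps(double cpi, double cycle_time_ps) {
    return cpi * cycle_time_ps;
}
```

Computer A gives 2.0 × 250 = 500 ps and Computer B gives 1.2 × 500 = 600 ps per instruction, so the ratio B / A is 1.2.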
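Additional Notes: CPI in More Detail
• If different instruction classes take different numbers of cycles:
$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$
• Weighted average CPI:
$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$
  where the fraction Instruction Count_i / Instruction Count is the relative frequency of class i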
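The weighted average can be computed directly from per-class CPIs and counts: sum CPI_i × Instruction Count_i to get clock cycles, then divide by the total count. A C sketch; weighted_cpi is a hypothetical name, and the instruction mix in the usage note is invented for illustration:

```c
#include <stddef.h>

/* Average CPI = (sum of CPI_i x count_i) / (sum of count_i). */
double weighted_cpi(const double *cpi, const double *count, size_t n) {
    double cycles = 0.0, total = 0.0;
    for (size_t i = 0; i < n; i++) {
        cycles += cpi[i] * count[i];  /* clock cycles from class i */
        total += count[i];            /* total instruction count */
    }
    return cycles / total;
}
```

For a made-up mix with CPIs {1, 2, 3} and counts {2, 1, 2}, the cycles are 2 + 2 + 6 = 10 over 5 instructions, so the average CPI is 2.0.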
Additional Notes: Pipelining Analogy
• Pipelined laundry: overlapping execution
  – Parallelism improves performance
  – Each load takes 2 hours, split into four 0.5-hour stages
• Four loads: Speedup = 8/3.5 = 2.3
• Non-stop (n loads): Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
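The laundry numbers follow from a simple model, assuming k stages of equal length: done sequentially, n loads take n × k stage-times; pipelined, the first load takes k stage-times and each later load finishes one stage-time after the previous one, for k + n - 1. A C sketch of that model (pipeline_speedup is a hypothetical name; the stage length cancels, so it is omitted):

```c
/* Speedup of an ideal k-stage pipeline over sequential execution
   for n_loads items, assuming all stages take equal time. */
double pipeline_speedup(int n_loads, int k_stages) {
    double sequential = (double)n_loads * k_stages;    /* n x k stage-times */
    double pipelined = (double)(k_stages + n_loads - 1); /* fill, then 1 per stage */
    return sequential / pipelined;
}
```

With 0.5-hour stages and k = 4, the sequential time is 2n hours and the pipelined time is 0.5n + 1.5 hours, matching the slide: pipeline_speedup(4, 4) = 16/7 ≈ 2.3, and the speedup approaches 4 as n grows.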
Additional Notes: Pipeline Performance
[Figure: instruction timing for single-cycle (Tc = 800 ps) vs. pipelined (Tc = 200 ps) execution]
Conclusions
• In MIPS assembly language:
  – Registers replace C variables
  – One instruction (simple operation) per line
  – Simpler is faster