27
CSE331 W05.1 Irwin Fall 2007 PSU CSE 331 Computer Organization and Design Fall 2007 Week 5 Section 1: Mary Jane Irwin ( www.cse.psu.edu/~mji ) Section 2: Krishna Narayanan Course material on ANGEL: cms.psu.edu [adapted from D. Patterson slides]

CSE 331 Computer Organization and Design Fall 2007 Week 5

  • Upload
    leola

  • View
    33

  • Download
    0

Embed Size (px)

DESCRIPTION

CSE 331 Computer Organization and Design Fall 2007 Week 5. Section 1: Mary Jane Irwin ( www.cse.psu.edu/~mji ) Section 2: Krishna Narayanan Course material on ANGEL: cms.psu.edu [ adapted from D. Patterson slides ]. Head’s Up. Last week’s material Supporting procedure calls and returns - PowerPoint PPT Presentation

Citation preview

Page 1: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.1 Irwin Fall 2007 PSU

CSE 331Computer Organization and

DesignFall 2007

Week 5

Section 1: Mary Jane Irwin (www.cse.psu.edu/~mji)

Section 2: Krishna Narayanan

Course material on ANGEL: cms.psu.edu

[adapted from D. Patterson slides]

Page 2: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.2 Irwin Fall 2007 PSU

Head’s UpLast week’s material

Supporting procedure calls and returns

This week’s material Addressing modes; Assemblers, linkers and loaders

- Reading assignment - PH: 2.10, A.1-A.5

Next week’s material (after Exam #1) Introduction to VHDL and basic VHDL constructs

- Reading assignment – Y, Chapters 1 through 3

Reminders HW 4 is due Thursday, Sept 27th (by 11:55pm) Quiz 3 will close Sunday, Sept 30th (at 11:55pm) Exam #1 is Tuesday, Oct 2, 6:30 to 7:45pm, 113 IST

Page 3: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.3 Irwin Fall 2007 PSU

CDT Article on IBM’s BlueGene/P (9/25/07) Calculations per second

Gigaflop 1 billion (1,000,000,000) Teraflop 1 trillion (1,000,000,000,000) Petaflop 1 quadrillion

(1,000,000,000,000,000) Exaflop 1 quintrillion (1,000,000,000,000,000,000)Year Performance Maker

1993 16,000,000,000 Thinking Machine C-5

1994 236,000,000,000 Fujitsu Numerical Wind Tunnel

1996 368,000,000,000 Hitachi CP-PACS

1997 1,800,000,000,000 intel ASCI Red

2000 12,300,000,000,000 IBM ASCI White

2002 36,000,000,000,000 NEC Earth Simulator

2004 280,000,000,000,000 IBM BlueGene/L

2008 1,000,000,000,000,000 IBM BlueGene/P

2011 10,000,000,000,000,000 IBM BlueGene/P+

2017 20,000,000,000,000,000 IBM BlueGene/P++

202? 1,000,000,000,000,000,000 TBD

Page 4: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.4 Irwin Fall 2007 PSU

MIPS Operand Addressing Modes SummaryRegister addressing – operand is in a register

Base (displacement) addressing – operand’s address in memory is the sum of a register and a 16-bit constant contained within the instruction

Immediate addressing – operand is a 16-bit constant contained within the instruction

1. Register addressing

op rs rt rd funct Register

word operand

op rs rt offset

2. Base addressing

base register

Memory

word or byte operand

3. Immediate addressing

op rs rt operand

Page 5: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.5 Irwin Fall 2007 PSU

MIPS Instruction Addressing Modes SummaryPC-relative addressing – instruction’s address

in memory is the sum of the PC and a 16-bit constant contained within the instruction

Pseudo-direct addressing – instruction’s address in memory is the 26-bit constant contained within the instruction concatenated with the upper 4 bits of the PC

4. PC-relative addressing

op rs rt offset

Program Counter (PC)

Memory

branch destination instruction

5. Pseudo-direct addressing

op jump address

Program Counter (PC)

Memory

jump destination instruction||

Page 6: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.6 Irwin Fall 2007 PSU

Review: MIPS Instructions, so farCategory Instr OpC Example Meaning

Arithmetic

(R & I format)

add 0 & 20 add $s1, $s2, $s3 $s1 = $s2 + $s3

subtract 0 & 22 sub $s1, $s2, $s3 $s1 = $s2 - $s3

add immediate 8 addi $s1, $s2, 4 $s1 = $s2 + 4

shift left logical 0 & 00 sll $s1, $s2, 4 $s1 = $s2 << 4

shift right logical

0 & 02 srl $s1, $s2, 4 $s1 = $s2 >> 4 (fill with zeros)

shift right arithmetic

0 & 03 sra $s1, $s2, 4 $s1 = $s2 >> 4 (fill with sign bit)

and 0 & 24 and $s1, $s2, $s3 $s1 = $s2 & $s3

or 0 & 25 or $s1, $s2, $s3 $s1 = $s2 | $s3

nor 0 & 27 nor $s1, $s2, $s3 $s1 = not ($s2 | $s3)

and immediate c and $s1, $s2, ff00 $s1 = $s2 & 0xff00

or immediate d or $s1, $s2, ff00 $s1 = $s2 | 0xff00

load upper immediate

f lui $s1, 0xffff $s1 = 0xffff0000

Page 7: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.7 Irwin Fall 2007 PSU

Review: MIPS Instructions, so farCategory Instr OpC Example Meaning

Datatransfer(I format)

load word 23 lw $s1, 100($s2) $s1 = Memory($s2+100)

store word 2b sw $s1, 100($s2) Memory($s2+100) = $s1

load byte 20 lb $s1, 101($s2) $s1 = Memory($s2+101)

store byte 28 sb $s1, 101($s2) Memory($s2+101) = $s1

load half 21 lh $s1, 101($s2) $s1 = Memory($s2+102)

store half 29 sh $s1, 101($s2) Memory($s2+102) = $s1

Cond. branch

(I & R format)

br on equal 4 beq $s1, $s2, L if ($s1==$s2) go to L

br on not equal 5 bne $s1, $s2, L if ($s1 !=$s2) go to L

set on less than immediate

a slti $s1, $s2, 100

if ($s2<100) $s1=1; else $s1=0

set on less than

0 & 2a slt $s1, $s2, $s3 if ($s2<$s3) $s1=1; else $s1=0

Uncond. jump

jump 2 j 2500 go to 10000

jump register 0 & 08 jr $t1 go to $t1

jump and link 3 jal 2500 go to 10000; $ra=PC+4

Page 8: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.8 Irwin Fall 2007 PSU

Review: MIPS R3000 ISA Instruction Categories

Load/Store Computational Jump and Branch Floating Point

- coprocessor Memory Management Special

3 Instruction Formats: all 32 bits wide

R0 - R31

PCHI

LO

OP rs rt rd shamt funct

OP rs rt 16 bit number

OP 26 bit jump target

Registers

R format

I format

6 bits 5 bits 5 bits 5 bits 5 bits 6 bits

J format

Page 9: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.9 Irwin Fall 2007 PSU

RISC Design Principles ReviewSimplicity favors regularity

fixed size instructions – 32-bits small number of instruction formats

Smaller is faster limited instruction set limited number of registers in register file limited number of addressing modes

Good design demands good compromises three instruction formats

Make the common case fast arithmetic operands from the register file (load-

store machine) allow instructions to contain immediate operands

Page 10: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.10 Irwin Fall 2007 PSU

The Code Translation Hierarchy

C program

compiler

assembly code

Page 11: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.11 Irwin Fall 2007 PSU

Compiler

Transforms the C program into an assembly language program

Advantages of high-level languages many fewer lines of code easier to understand and debug …

Today’s optimizing compilers can produce assembly code nearly as good as an assembly language programming expert and often better for large programs

smaller code size, faster execution

To learn more take CSE 421

Page 12: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.12 Irwin Fall 2007 PSU

The Code Translation Hierarchy

C program

compiler

assembly code

assembler

object code

Page 13: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.13 Irwin Fall 2007 PSU

Page 14: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.14 Irwin Fall 2007 PSU

Assembler

Does a syntactic check of the code (i.e., did you type it in correctly) and then transforms the symbolic assembler code into object (machine) code

Advantages of assembler much easier than remembering instr’s binary codes can use labels for addresses – and let the assembler

do the arithmetic can use pseudo-instructions

- e.g., “move $t0, $t1” exists only in assembler (would be implemented using “add $t0,$t1,$zero”)

When considering performance, you should count instructions executed, not code size

Page 15: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.15 Irwin Fall 2007 PSU

The Two Main Tasks of the Assembler1. Builds a symbol table which holds the

symbolic names (labels) and their corresponding addresses A label is local if it is used only within the file where

its defined. Labels are local by default. A label is external (global) if it refers to code or

data in another file or if it is referenced from another file. Global labels must be explicitly declared global (e.g., .globl main)

2. Translates each assembly language statement into object (machine) code by “assembling” the numeric equivalents of the opcodes, register specifiers, shift amounts, and jump/branch targets/offsets

Page 16: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.16 Irwin Fall 2007 PSU

MIPS (spim) Memory AllocationMemory

230

words

0000 0000

f f f f f f f c

TextSegment

Reserved

Static data

Mem Map I/O

0040 0000

1000 00001000 8000

7f f f f f fcStack

Dynamic data

$sp

$gp

PC

Kernel Code & Data

Page 17: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.17 Irwin Fall 2007 PSU

Other Tasks of the AssemblerConverts pseudo-instr’s to legal assembly code

register $at is reserved for the assembler to do this

Converts branches to far away locations into a branch followed by a jump

Converts instructions with large immediates into a lui followed by an ori

Converts numbers specified in decimal and hexidecimal into their binary equivalents and characters into their ASCII equivalents

Deals with data layout directives (e.g., .asciiz)Expands macros (frequently used sequences of

instructions)

Page 18: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.18 Irwin Fall 2007 PSU

Typical Object File Pieces Object file header: size and position of the following

pieces of the file Text (code) segment (.text) : assembled object

(machine) code Data segment (.data) : data accompanying the code

static data - allocated throughout the program dynamic data - grows and shrinks as needed

Relocation information: identifies instructions (data) that use (are located at) absolute addresses – not relative to a register (including the PC)

on MIPS only j, jal, and some loads and stores (e.g., lw $t1, 100($zero) ) use absolute addresses

Symbol table: global labels with their addresses (if defined in this code segment) or without (if defined external to this code segment)

Debugging information

Page 19: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.19 Irwin Fall 2007 PSU

An Example

.data

.align 0str: .asciiz "The answer is "cr: .asciiz "\n"

.text

.align 2

.globl main

.globl printfmain: ori $2, $0, 5

syscallmove $8, $2

loop: beq $8, $9, doneblt $8, $9, brncsub $8, $8, $9j loop

brnc: sub $9, $9, $8j loop

done: jal printf

Gbl? Symbol Addressstr 1000 0000

cr 1000 000b

yes main 0040 0000

loop 0040 000c

brnc 0040 001c

done 0040 0024

yes printf ???? ????

Relocation Info

Address Data/Instr1000 0000 str

1000 000b cr

0040 0018 j loop

0040 0020 j loop

0040 0024 jal printf

Page 20: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.20 Irwin Fall 2007 PSU

The Code Translation Hierarchy

C program

compiler

assembly code

assembler

object code library routines

executable

linker

machine code

main text segment

printf text segment

Page 21: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.21 Irwin Fall 2007 PSU

LinkerTakes all of the independently assembled code segments

and “stitches” (links) them together Faster to recompile and reassemble a patched segment, than it

is to recompile and reassemble the entire program

1. Decides on memory allocation pattern for the code and data segments of each module Remember, modules were assembled in isolation so each has

assumed its code’s starting location is 0x0040 0000 and its static data starting location is 0x1000 0000

2. Relocates absolute addresses to reflect the new starting location of the code segment and its data segment

3. Uses the symbol tables information to resolve all remaining undefined labels branches, jumps, and data addresses to/in external modules

Linker produces an executable file

Page 22: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.22 Irwin Fall 2007 PSU

Linker Code Schematic

printf: . . .

main: . . . jal ????

call, printf

Linker

Object file

C library

Relocation info

main: . . . jal printfprintf: . . .

Executable file

Page 23: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.23 Irwin Fall 2007 PSU

Linking Two Object FilesH

dr

T

xtse

g

D

seg

R

eloc

S

mtb

l D

bg

File 1

Hdr

T

xtse

g

Dse

g

Rel

oc S

mtb

l D

bg

File 2

+

Executable

Hdr

T

xtse

g

Dse

g

R

eloc

Page 24: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.24 Irwin Fall 2007 PSU

The Code Translation Hierarchy

C program

compiler

assembly code

assembler

object code library routines

executable

linker

loader

memory

machine code

Page 25: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.25 Irwin Fall 2007 PSU

Loader

Loads (copies) the executable code now stored on disk into memory at the starting address specified by the operating system

Copies the parameters (if any) to the main routine onto the stack

Initializes the machine registers and sets the stack pointer to the first free location (0x7fff fffc)

Jumps to a start-up routine (at PC addr 0x0040 0000 on xspim) that copies the parameters into the argument registers and then calls the main routine of the program with a jal main

To learn more take CSE 311 and CSE 411

Page 26: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.26 Irwin Fall 2007 PSU

Dynamically Linked LibrariesStatically linking libraries mean that the library

becomes part of the executable code It loads the whole library even if only a small part is

used (e.g., standard C library is 2.5 MB) What if a new version of the library is released ?

(Lazy) dynamically linked libraries (DLL) – library routines are not linked and loaded until a routine is called during execution

The first time the library routine called, a dynamic linker-loader must

- find the desired routine, remap it, and “link” it to the calling routine (see book for more details)

DLLs require extra space for dynamic linking information, but do not require the whole library to be copied or linked

Page 27: CSE 331 Computer Organization and Design Fall 2007 Week 5

CSE331 W05.27 Irwin Fall 2007 PSU

First Evening Exam

Exam #1: Tuesday, Oct 2, 6:30 to 7:45pm, 113 IST

Look at the practice exam (and solutions) on ANGEL In Section 1 – two people requesting a conflict exam In Section 2 – one person requesting a conflict exam

Questions?