Lecture 5 Assembly Language

– 1 –CSCE 212H Spring 2012

Lecture 5Assembly Language

Lecture 5Assembly Language

TopicsTopics

Assembly Language Lab 2 -

January 27, 2011

CSCE 212 Computer Architecture


OverviewOverviewLast TimeLast Time

Covered through slides 11… of Lecture 4 Floating point: Review, rounding to even, multiplication,

addition Compilation steps

NewNew Architecture (Fred Brooks): Assembly Programmer’s View

Address Modes Swap

Next Time:Next Time: Lab02 - Datalab


Pop Quiz - denormalsPop Quiz - denormals

1.1. What is the representation of the largest denormalized What is the representation of the largest denormalized IEEE float (in binary)?IEEE float (in binary)?

Denormal expField = 0000 0000 Largest denormal all frac bits are 1, ie., frac =111 1111 …1111 Largest denormal representation = 0 0000 0000 111 ….1

2.2. In hex? 0x007FFFFIn hex? 0x007FFFF

3.3. What is its value as an expression, i.e., (-1)What is its value as an expression, i.e., (-1)signsign m * 2 m * 2expexp

Largest denormal’s value = 0.111 1111 … 1111 x 2-BIAS+2

4.4. How many floats are there between 1.0 and 2.0?How many floats are there between 1.0 and 2.0?


5.5. What is a/the representation of minus infinity?What is a/the representation of minus infinity? expField=0xFF, sign bit =1, frac=0x000000 (23 zeroes) -infinity = 0xFF80 0000

6.6. In C are there more ints or doubles?In C are there more ints or doubles? #doubles = (distinct exp)*(number of doubles with same exp) #doubles = (211 – 2)*(252) = 263 – 253 (note this does not count NaN or

+/- infinity as a double (This only counts positives? Ignores 0)

7.7. In Math are there more rationals than integers ?In Math are there more rationals than integers ? Argument for No: the sets have the same cardinality. There are

both countably infinite, where the Reals are uncountable. Argument for yes: every integer is a rational and ½ is a rational

that is not an integer. So actually the way the question is worded what is the best

answer?

8.8. Extra credit for pop quiz 1: what is aleph-0?Extra credit for pop quiz 1: what is aleph-0? http://mathworld.wolfram.com/Aleph-0.html


Pop Quiz – FP multiplication Pop Quiz – FP multiplication

1.1. If x=1.5 what is the If x=1.5 what is the representation of x as a representation of x as a float (in hex)float (in hex)

2.2. And if y=(2-And if y=(2-εε)*2)*23737 Note Note 1.1111.111……1 =(2-1 =(2-εε)) Then what is the frac field

of the float z = x*y

3.3. And what is the And what is the exponent (not the exponent (not the exponent field) of z?exponent field) of z?

4.4. What is the largest gap What is the largest gap between consecutive between consecutive floats?floats?

NoteNote 1.111.11……1111

X_______1.1__X_______1.1__ 111111……1111 (24 bits)(24 bits)

111111……1111 (24 (24 bits)bits)

------------------------------------------ 10.1110.11……101101 (26 bits?)(26 bits?)


Printf conversion specificationsPrintf conversion specifications

% -#0 12 .4 L d

ExamplesExamples

Figure taken from page 368 of “C a Reference Manual” by Harbison and Steele

Start specificationStart specification

FlagsFlags

Minimum field widthMinimum field width

conversion conversion typetype

Size modifierSize modifier

PrecisionPrecision


CYGWINCYGWIN

Unix Emulation under WindowsUnix Emulation under Windows Provides a bash window .bash_profile

Others CH, …Others CH, …

Other Direction: WineOther Direction: Wine

Downloading CYGWINDownloading CYGWIN Google CYGWIN startxwin – run a windows emulator under GYGWIN

Virtual Machines: Virtual Box, VMwareVirtual Machines: Virtual Box, VMware


Homework 1 problem 2.90Homework 1 problem 2.90

/usr/include/math.h/usr/include/math.h # define M_PI 3.14159265358979323846 /* pi */ Hex rep given in problem pi = 0x40490fdb Binary rep 0100 0000 0100 1001 0000 1111 1101 1011

sign +, ExpField = 100 0000 0 Exp = 128-BIAS = 1 Binary val = 1. 100 1001 0000 1111 1101 1011 * 21,

Now from the calculator 22/7= 3 + 1/7Now from the calculator 22/7= 3 + 1/7

=3 + 1428571428571428571428571428571 = 3.(142857)=3 + 1428571428571428571428571428571 = 3.(142857)

And .(142857And .(1428571010) = (.2490) = (.24901616) = (.0010 0100 1001 0000) = (.0010 0100 1001 000022) = .(001) = .(00122))

Pi = 1. 100 1001 0000 1111 1101 1011 * 21,

22/7= 1. 100 1001 0010 0100 0000 1001 0010 0100 00* 21,


Setting Variables and aliases in .bash_profileSetting Variables and aliases in .bash_profilePATH=$HOME/bin:${PATH:-/usr/bin:.}PATH=$HOME/bin:${PATH:-/usr/bin:.}

PATH=$PATH:/usr/local/simplescalar/bin:/usr/local/simplescalar/PATH=$PATH:/usr/local/simplescalar/bin:/usr/local/simplescalar/simplesim-3.0simplesim-3.0

# list of directories separated by colons, used to specify where to # list of directories separated by colons, used to specify where to find commandsfind commands

export PATHexport PATH

PS1="`hostname`> PS1="`hostname`>

w5=/class/csce574-001/web/w5=/class/csce574-001/web/

w=/class/csce212-501/Code/w=/class/csce212-501/Code/

alias h=historyalias h=history

alias lsl="ls -lrt | grep ^d"alias lsl="ls -lrt | grep ^d"

# later you can use the variables in commands like “ls $w”# later you can use the variables in commands like “ls $w”


Intel Registers figure 3.2Intel Registers figure 3.2

Intel microprocessor evolutionIntel microprocessor evolution

40044004 8008 8008 8080 8080 8086 8086 80x86 80x86

Backward compatibiltyBackward compatibilty

Registers of 8080Registers of 8080

A: AH – ALA: AH – AL

C: CH -- CLC: CH -- CL

D: DH – DLD: DH – DL

B: BH -- BLB: BH -- BL

Si,di,sp,bpSi,di,sp,bp


Homework page 105 of textHomework page 105 of text

1.1. 2.562.56

2.2. 2.572.57

3.3. 2.58 give hex representations and the value as an 2.58 give hex representations and the value as an expression of the form 1.xexpression of the form 1.x-1-1xx-2-2…x…x-n-n * 2 * 2 expexp


2.562.56Fill in the return value for the following procedure that Fill in the return value for the following procedure that

tests whether its first argument is greater than or tests whether its first argument is greater than or equal to its second. Assume the function f2u return equal to its second. Assume the function f2u return an unsigned 32-bit number having the same bit an unsigned 32-bit number having the same bit representation as its floating-point argument. You representation as its floating-point argument. You can assume that neither argument is NaN. The two can assume that neither argument is NaN. The two flavors of zero: +0 and -0 are considered equal.flavors of zero: +0 and -0 are considered equal.

int float-qe(float x, float y){int float-qe(float x, float y){unsigned ux = f2u(x);unsigned ux = f2u(x);unsigned uy = f2u(y);unsigned uy = f2u(y);/* Get the sign bits *//* Get the sign bits */unsigned sx = ux >> 31; unsigned sx = ux >> 31; unsiqned sy = uu >> 31; unsiqned sy = uu >> 31; /* Give an expression using only ux, uy, sx and sy *//* Give an expression using only ux, uy, sx and sy */return /* … */ ;return /* … */ ;

}}


2.572.57

Given a floating point format with a k-bit exponent Given a floating point format with a k-bit exponent and an n-bit fraction, write formulas for the exponent and an n-bit fraction, write formulas for the exponent E, significand M, the fraction f, and the value V for E, significand M, the fraction f, and the value V for the quantities that follow. In addition, describe the bit the quantities that follow. In addition, describe the bit representation.representation.

A.A. The number 5.0.The number 5.0.

B.B. The largest odd integer that can be represented The largest odd integer that can be represented exactly.exactly.

C.C. The reciprocal of the smallest positive normalized The reciprocal of the smallest positive normalized value. value.


2.58’ - changed table columns2.58’ - changed table columns

Intel-compatible processors also support an Intel-compatible processors also support an "extended precision" floating-point format with an "extended precision" floating-point format with an 80-bit word divided into a sign bit, k = 15 exponent 80-bit word divided into a sign bit, k = 15 exponent bits, a single integer bit, and n = 63 fraction bits. The bits, a single integer bit, and n = 63 fraction bits. The integer bit is an explicit copy of the implied bit in the integer bit is an explicit copy of the implied bit in the IEEE, floating-point representation. That is, it equals IEEE, floating-point representation. That is, it equals 1- for normalized values and 0 for denormalized 1- for normalized values and 0 for denormalized values. Fill in the following table giving the appropri-values. Fill in the following table giving the appropri-ate values of some "interesting" numbers in this ate values of some "interesting" numbers in this format:format:

Description Representation Value as Expression

Smallest denormalized

Smallest normalized

Largest normalized


New Species: IA64New Species: IA64

NameName DateDate TransistorsTransistors

ItaniumItanium 20012001 10M10M Extends to IA64, a 64-bit architecture Radically new instruction set designed for high performance Will be able to run existing IA32 programs

On-board “x86 engine”

Joint project with Hewlett-Packard

Itanium 2Itanium 2 20022002 221M221M Big performance boost


Assembly Programmer’s ViewAssembly Programmer’s View

Programmer-Visible StateProgrammer-Visible State EIP Program Counter

Address of next instruction

Register FileHeavily used program data

Condition CodesStore status information about

most recent arithmetic operationUsed for conditional branching

EIP

Registers

CPU Memory

Object CodeProgram Data

OS Data

Addresses

Data

Instructions

Stack

ConditionCodes

Memory Byte addressable array Code, user data, (some) OS

data Includes stack used to support

procedures


Moving DataMoving Data

Moving DataMoving Datamovl Source,Dest: Move 4-byte (“long”) word Lots of these in typical code

Operand TypesOperand Types Immediate: Constant integer data

Like C constant, but prefixed with ‘$’E.g., $0x400, $-533Encoded with 1, 2, or 4 bytes

Register: One of 8 integer registersBut %esp and %ebp reserved for special useOthers have special uses for particular instructions

Memory: 4 consecutive bytes of memoryVarious “address modes”

%eax

%edx

%ecx

%ebx

%esi

%edi

%esp

%ebp


Simple Addressing ModesSimple Addressing Modes

NormalNormal (R)(R) Mem[Reg[R]]Mem[Reg[R]] Register R specifies memory address

movl (%ecx),%eax

DisplacementDisplacement D(R)D(R) Mem[Reg[R]+D]Mem[Reg[R]+D] Register R specifies start of memory region Constant displacement D specifies offset

movl 8(%ebp),%edx


Using Simple Addressing ModesUsing Simple Addressing Modes

void swap(int *xp, int *yp) { int t0 = *xp; int t1 = *yp; *xp = t1; *yp = t0;}

swap:pushl %ebpmovl %esp,%ebppushl %ebx

movl 12(%ebp),%ecxmovl 8(%ebp),%edxmovl (%ecx),%eaxmovl (%edx),%ebxmovl %eax,(%edx)movl %ebx,(%ecx)

movl -4(%ebp),%ebxmovl %ebp,%esppopl %ebpret

Body

SetUp

Finish


Understanding SwapUnderstanding Swap

void swap(int *xp, int *yp) { int t0 = *xp; int t1 = *yp; *xp = t1; *yp = t0;}

movl 12(%ebp),%ecx # ecx = yp

movl 8(%ebp),%edx # edx = xp

movl (%ecx),%eax # eax = *yp (t1)

movl (%edx),%ebx # ebx = *xp (t0)

movl %eax,(%edx) # *xp = eax

movl %ebx,(%ecx) # *yp = ebx

Stack

Register Variable

%ecx yp

%edx xp

%eax t1

%ebx t0

yp

xp

Rtn adr

Old %ebp %ebp 0

4

8

12

Offset

•••

Old %ebx-4









0x120

0x124

Rtn adr

%ebp 0

4

8

12

Offset

-4

123

456

Address

0x124

0x120

0x11c

0x118

0x114

0x110

0x10c

0x108

0x104

0x100

yp

xp

%eax

%edx

%ecx

%ebx

%esi

%edi

%esp

%ebp 0x104









0x120

0x124

Rtn adr

%ebp 0

4

8

12

Offset

-4

123

456

Address

0x124

0x120

0x11c

0x118

0x114

0x110

0x10c

0x108

0x104

0x100

yp

xp

%eax

%edx

%ecx

%ebx

%esi

%edi

%esp

%ebp

0x120

0x104









0x120

0x124

Rtn adr

%ebp 0

4

8

12

Offset

-4

123

456

Address

0x124

0x120

0x11c

0x118

0x114

0x110

0x10c

0x108

0x104

0x100

yp

xp

%eax

%edx

%ecx

%ebx

%esi

%edi

%esp

%ebp

0x124

0x120

0x104









0x120

0x124

Rtn adr

%ebp 0

4

8

12

Offset

-4

123

456

Address

0x124

0x120

0x11c

0x118

0x114

0x110

0x10c

0x108

0x104

0x100

yp

xp

%eax

%edx

%ecx

%ebx

%esi

%edi

%esp

%ebp

456

0x124

0x120

0x104









0x120

0x124

Rtn adr

%ebp 0

4

8

12

Offset

-4

123

456

Address

0x124

0x120

0x11c

0x118

0x114

0x110

0x10c

0x108

0x104

0x100

yp

xp

%eax

%edx

%ecx

%ebx

%esi

%edi

%esp

%ebp

456

0x124

0x120

123

0x104









0x120

0x124

Rtn adr

%ebp 0

4

8

12

Offset

-4

456

456

Address

0x124

0x120

0x11c

0x118

0x114

0x110

0x10c

0x108

0x104

0x100

yp

xp

%eax

%edx

%ecx

%ebx

%esi

%edi

%esp

%ebp

456

0x124

0x120

123

0x104









0x120

0x124

Rtn adr

%ebp 0

4

8

12

Offset

-4

456

123

Address

0x124

0x120

0x11c

0x118

0x114

0x110

0x10c

0x108

0x104

0x100

yp

xp

%eax

%edx

%ecx

%ebx

%esi

%edi

%esp

%ebp

456

0x124

0x120

123

0x104


Indexed Addressing ModesIndexed Addressing ModesMost General FormMost General Form

D(Rb,Ri,S)D(Rb,Ri,S)

Refers to AddressRefers to Address

Mem[Reg[Rb]+S*Reg[Ri]+ D]Mem[Reg[Rb]+S*Reg[Ri]+ D] D: Constant “displacement” 1, 2, or 4 bytes Rb: Base register: Any of 8 integer registers Ri: Index register: Any, except for %esp

Unlikely you’d use %ebp, either S: Scale: 1, 2, 4, or 8

Special CasesSpecial Cases (Rb,Ri) Mem[Reg[Rb]+Reg[Ri]] D(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]+D] (Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]]


Address Computation ExamplesAddress Computation Examples

%edx

%ecx

0xf000

0x100

ExpressionExpression ComputationComputation AddressAddress

0x8(%edx)0x8(%edx)

(%edx,%ecx)(%edx,%ecx)

(%edx,%ecx,4)(%edx,%ecx,4)

0x80(,%edx,2)0x80(,%edx,2)


Address Computation InstructionAddress Computation Instruction

lealleal SrcSrc,,DestDest Src is address mode expression Set Dest to address denoted by expression

UsesUses Computing address without doing memory reference

E.g., translation of p = &x[i]; Computing arithmetic expressions of the form x + k*y

k = 1, 2, 4, or 8.


Some Arithmetic OperationsSome Arithmetic Operations

Format Computation

Two Operand InstructionsTwo Operand Instructionsaddl Src,Dest Dest = Dest + Src

subl Src,Dest Dest = Dest - Src

imull Src,Dest Dest = Dest * Src

sall Src,Dest Dest = Dest << Src Also called shll

sarl Src,Dest Dest = Dest >> Src Arithmetic

shrl Src,Dest Dest = Dest >> Src Logical

xorl Src,Dest Dest = Dest ^ Src

andl Src,Dest Dest = Dest & Src

orl Src,Dest Dest = Dest | Src


Some Arithmetic OperationsSome Arithmetic Operations

Format Computation

One Operand InstructionsOne Operand Instructionsincl Dest Dest = Dest + 1

decl Dest Dest = Dest - 1

negl Dest Dest = - Dest

notl Dest Dest = ~ Dest


Using leal for Arithmetic ExpressionsUsing leal for Arithmetic Expressions

int arith (int x, int y, int z){ int t1 = x+y; int t2 = z+t1; int t3 = x+4; int t4 = y * 48; int t5 = t3 + t4; int rval = t2 * t5; return rval;}

arith:pushl %ebpmovl %esp,%ebp

movl 8(%ebp),%eaxmovl 12(%ebp),%edxleal (%edx,%eax),%ecxleal (%edx,%edx,2),%edxsall $4,%edxaddl 16(%ebp),%ecxleal 4(%edx,%eax),%eaximull %ecx,%eax

movl %ebp,%esppopl %ebpret

Body

SetUp

Finish


Understanding arithUnderstanding arithint arith (int x, int y, int z){ int t1 = x+y; int t2 = z+t1; int t3 = x+4; int t4 = y * 48; int t5 = t3 + t4; int rval = t2 * t5; return rval;}

movl 8(%ebp),%eax # eax = xmovl 12(%ebp),%edx # edx = yleal (%edx,%eax),%ecx # ecx = x+y (t1)leal (%edx,%edx,2),%edx # edx = 3*ysall $4,%edx # edx = 48*y (t4)addl 16(%ebp),%ecx # ecx = z+t1 (t2)leal 4(%edx,%eax),%eax # eax = 4+t4+x (t5)imull %ecx,%eax # eax = t5*t2 (rval)

y

x

Rtn adr

Old %ebp %ebp 0

4

8

12

OffsetStack

•••

z16


Understanding arithUnderstanding arith

int arith (int x, int y, int z){ int t1 = x+y; int t2 = z+t1; int t3 = x+4; int t4 = y * 48; int t5 = t3 + t4; int rval = t2 * t5; return rval;}

# eax = xmovl 8(%ebp),%eax

# edx = ymovl 12(%ebp),%edx

# ecx = x+y (t1)leal (%edx,%eax),%ecx

# edx = 3*yleal (%edx,%edx,2),%edx

# edx = 48*y (t4)sall $4,%edx

# ecx = z+t1 (t2)addl 16(%ebp),%ecx

# eax = 4+t4+x (t5)leal 4(%edx,%eax),%eax

# eax = t5*t2 (rval)imull %ecx,%eax


Another ExampleAnother Example

int logical(int x, int y){ int t1 = x^y; int t2 = t1 >> 17; int mask = (1<<13) - 7; int rval = t2 & mask; return rval;}

logical:pushl %ebpmovl %esp,%ebp

movl 8(%ebp),%eaxxorl 12(%ebp),%eaxsarl $17,%eaxandl $8185,%eax

movl %ebp,%esppopl %ebpret

Body

SetUp

Finish

movl 8(%ebp),%eax eax = xxorl 12(%ebp),%eax eax = x^y (t1)sarl $17,%eax eax = t1>>17 (t2)andl $8185,%eax eax = t2 & 8185

213 = 8192, 213 – 7 = 8185


CISC PropertiesCISC Properties

Instruction can reference different operand typesInstruction can reference different operand types Immediate, register, memory

Arithmetic operations can read/write memoryArithmetic operations can read/write memory

Memory reference can involve complex computationMemory reference can involve complex computation Rb + S*Ri + D Useful for arithmetic expressions, too

Instructions can have varying lengthsInstructions can have varying lengths IA32 instructions can range from 1 to 15 bytes


Summary: Abstract MachinesSummary: Abstract Machines

1) loops2) conditionals3) switch4) Proc. call5) Proc. return

Machine Models Data Control

1) char2) int, float3) double4) struct, array5) pointer

mem proc

C

Assembly1) byte2) 2-byte word3) 4-byte long word4) contiguous byte allocation5) address of initial byte

3) branch/jump4) call5) retmem regs alu

processorStack Cond.Codes


Pentium Pro (P6)Pentium Pro (P6)HistoryHistory

Announced in Feb. ‘95 Basis for Pentium II, Pentium III, and Celeron processors Pentium 4 similar idea, but different details

FeaturesFeatures Dynamically translates instructions to more regular format

Very wide, but simple instructions

Executes operations in parallelUp to 5 at once

Very deep pipeline12–18 cycle latency


PentiumPro Block DiagramPentiumPro Block Diagram

Microprocessor Report2/16/95


PentiumPro OperationPentiumPro Operation

Translates instructions dynamically into “Uops”Translates instructions dynamically into “Uops” 118 bits wide Holds operation, two sources, and destination

Executes Uops with “Out of Order” engineExecutes Uops with “Out of Order” engine Uop executed when

Operands availableFunctional unit available

Execution controlled by “Reservation Stations”Keeps track of data dependencies between uopsAllocates resources

ConsequencesConsequences Indirect relationship between IA32 code & what actually gets

executed Tricky to predict / optimize performance at assembly level


Whose Assembler?Whose Assembler?

Intel/Microsoft Differs from GASIntel/Microsoft Differs from GAS Operands listed in opposite order

mov Dest, Src movl Src, Dest

Constants not preceded by ‘$’, Denote hex with ‘h’ at end100h $0x100

Operand size indicated by operands rather than operator suffixsub subl

Addressing format shows effective address computation[eax*4+100h] $0x100(,%eax,4)

lea eax,[ecx+ecx*2]sub esp,8cmp dword ptr [ebp-8],0mov eax,dword ptr [eax*4+100h]

leal (%ecx,%ecx,2),%eaxsubl $8,%espcmpl $0,-8(%ebp)movl $0x100(,%eax,4),%eax

Intel/Microsoft Format GAS/Gnu Format

Documents

Lecture 5 Assembly Language