55:035 Computer Architecture and Organization Lecture 3


Outline
RISC and CISC comparison
Instruction set examples
  ARM
  Freescale 68K
  Intel IA-32

RISC and CISC

Reduced Instruction Set Computer (RISC)
  Fixed-length instructions
  Simpler instructions
  Fewer cycles per instruction
  Load/store memory access
  Register operands only
  Probably doesn’t have microcode
  "RISC" is a misnomer: it may have many instructions

Complex Instruction Set Computer (CISC)
  Variable-length instructions
  More complex instructions
  More cycles per instruction
  May have an “orthogonal” instruction set
  Memory and register operands
  May have microcode

ARM
“Advanced RISC Machines”, www.arm.com
Over 90 ARM processors are shipped every second, more than any other 32-bit processor IP supplier
ARM licenses its technology to more than 200 semiconductor companies
Eight product families

ARM Example
ARM Cortex™-A8 processor
Intellectual Property (IP) core
  Licensed by other companies to create a “System on a Chip” (SoC)
Dual, symmetric, in-order issue, 13-stage pipelines
Integrated L2 cache

ARM Register Structure
15 general-purpose registers
  R14 is also the link register
  By convention, R12 is the frame pointer and R13 is the stack pointer
Current Program Status Register (CPSR)
15 banked registers
  Copied/restored when going to/from User/Supervisor mode

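[Figure: ARM register structure. Fifteen 32-bit general-purpose registers R0 to R14 and the program counter R15 (PC), each with bits 31-0. CPSR status register: condition code flags N - Negative (bit 31), Z - Zero (bit 30), C - Carry (bit 29), V - Overflow (bit 28); interrupt disable bits (7-6); processor mode bits (4-0).]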
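The CPSR layout described for this slide can be read off with a few shifts and masks. The following is a minimal C sketch, assuming the usual ARM bit positions (N, Z, C, V in bits 31-28 and the mode field in bits 4-0); the type `arm_flags` and the functions `cpsr_flags` and `cpsr_mode` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the four ARM condition code flags. */
typedef struct {
    unsigned n; /* Negative */
    unsigned z; /* Zero     */
    unsigned c; /* Carry    */
    unsigned v; /* Overflow */
} arm_flags;

/* Extract N, Z, C, V from bits 31, 30, 29, 28 of a CPSR value. */
arm_flags cpsr_flags(uint32_t cpsr)
{
    arm_flags f;
    f.n = (cpsr >> 31) & 1u;
    f.z = (cpsr >> 30) & 1u;
    f.c = (cpsr >> 29) & 1u;
    f.v = (cpsr >> 28) & 1u;
    return f;
}

/* The processor mode bits occupy bits 4-0. */
unsigned cpsr_mode(uint32_t cpsr)
{
    return cpsr & 0x1Fu;
}
```

For a CPSR value of 0xA0000013, the top nibble 1010 gives N = 1, Z = 0, C = 1, V = 0, and the mode field is 0x13.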

ARM Instruction Format
Load/store architecture (RISC)
Conditional execution of instructions
One or two operands (register)
Destination register
See appendix B

[Figure: ARM instruction format. Bits 31-28: Condition; bits 27-20: OP code; bits 19-16: Rn; bits 15-12: Rd; bits 11-4: Other info; bits 3-0: Rm.]
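The field boundaries in the instruction format (31-28, 27-20, 19-16, 15-12, 11-4, 3-0) translate directly into shift-and-mask code. The sketch below assumes exactly that layout; `arm_fields` and `arm_decode` are hypothetical names, and the sketch does not interpret the "other info" bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the fields of the format shown on the slide. */
typedef struct {
    unsigned cond;   /* bits 31-28: condition        */
    unsigned opcode; /* bits 27-20: OP code          */
    unsigned rn;     /* bits 19-16: operand register */
    unsigned rd;     /* bits 15-12: destination      */
    unsigned other;  /* bits 11-4 : other info       */
    unsigned rm;     /* bits 3-0  : operand register */
} arm_fields;

/* Split a 32-bit instruction word into the fields above. */
arm_fields arm_decode(uint32_t word)
{
    arm_fields f;
    f.cond   = (word >> 28) & 0xFu;
    f.opcode = (word >> 20) & 0xFFu;
    f.rn     = (word >> 16) & 0xFu;
    f.rd     = (word >> 12) & 0xFu;
    f.other  = (word >> 4)  & 0xFFu;
    f.rm     =  word        & 0xFu;
    return f;
}
```

For the word 0xE0820004 (to my understanding the standard encoding of ADD R0,R2,R4), the fields come out as cond = 0xE, opcode = 0x08, Rn = 2, Rd = 0, and Rm = 4.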

ARM Addressing Modes

Name                         Assembler syntax       Addressing function

With immediate offset:
  Pre-indexed                [Rn, #offset]          EA = [Rn] + offset
  Pre-indexed with writeback [Rn, #offset]!         EA = [Rn] + offset; Rn <- [Rn] + offset
  Post-indexed               [Rn], #offset          EA = [Rn]; Rn <- [Rn] + offset

With offset magnitude in Rm:
  Pre-indexed                [Rn, +/-Rm, shift]     EA = [Rn] +/- [Rm] shifted
  Pre-indexed with writeback [Rn, +/-Rm, shift]!    EA = [Rn] +/- [Rm] shifted; Rn <- [Rn] +/- [Rm] shifted
  Post-indexed               [Rn], +/-Rm, shift     EA = [Rn]; Rn <- [Rn] +/- [Rm] shifted

Relative (pre-indexed with   Location               EA = Location = [PC] + offset
immediate offset)

where:
  EA = effective address
  offset = a signed number contained in the instruction
  shift = direction #integer, where direction is LSL for left shift or LSR for right shift, and integer is a 5-bit unsigned number specifying the shift amount
  +/- Rm = the offset magnitude in register Rm can be added to or subtracted from the contents of base register Rn
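The three indexing variants differ only in which value becomes the effective address and whether the base register is updated. The C sketch below models that under simplifying assumptions: the offset has already been shifted and signed by the caller, and `arm_index_mode` and `arm_effective_address` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the three indexing variants. */
typedef enum { PRE_INDEXED, PRE_INDEXED_WB, POST_INDEXED } arm_index_mode;

/*
 * Return the effective address and, for the writeback and post-indexed
 * variants, update the base register *rn.
 * The offset is assumed to be already shifted and signed (+/-).
 */
uint32_t arm_effective_address(arm_index_mode mode, uint32_t *rn, int32_t offset)
{
    uint32_t base, sum;

    assert(rn != NULL);
    base = *rn;
    sum = base + (uint32_t)offset; /* wraps modulo 2^32 */

    switch (mode) {
    case PRE_INDEXED:       /* EA = [Rn] + offset                      */
        return sum;
    case PRE_INDEXED_WB:    /* EA = [Rn] + offset; Rn <- [Rn] + offset */
        *rn = sum;
        return sum;
    case POST_INDEXED:      /* EA = [Rn]; Rn <- [Rn] + offset          */
        *rn = sum;
        return base;
    }
    return base; /* not reached for valid modes */
}
```

With the values used in the worked slides of this lecture, a push with R5 = 2012 and offset -4 yields EA = 2008 and R5 = 2008, while a post-indexed access with R2 = 1000 and offset 25 x 4 yields EA = 1000 and R2 = 1100.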

ARM Relative Addressing Mode
LDR R1,ITEM
Pre-indexed mode with immediate offset
  PC is the base register
  Calculated offset = 52
  PC will be at 1008 when the instruction is executed

[Figure: memory map with 4-byte words. LDR R1,ITEM is at address 1000; updated [PC] = 1008; the operand is at ITEM = 1060; offset = 52.]
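The offset of 52 follows from the fact that the PC has already advanced to 1008 when the instruction at 1000 executes: 1060 - 1008 = 52. A minimal C sketch of that arithmetic, with the hypothetical function name `arm_pc_relative_offset` and the assumption that the updated PC is always the instruction address plus 8:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset the assembler must place in the instruction so that
 * [PC] + offset reaches the target. Assumes the updated PC is the
 * instruction's own address plus 8, as on the slide (1000 -> 1008).
 */
int32_t arm_pc_relative_offset(uint32_t instr_addr, uint32_t target_addr)
{
    uint32_t updated_pc = instr_addr + 8u;
    return (int32_t)(target_addr - updated_pc);
}
```

The same calculation gives the offset of 92 in the branch example of this lecture, where BEQ LOCATION sits at 1000 and LOCATION = 1100.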

ARM Pre-indexed Mode
STR R3,[R5,R6]
Pre-indexed mode
  Base register = R5
  Offset register = R6

[Figure: STR R3,[R5,R6] with base register R5 = 1000 and offset register R6 = 200 (offset = 200); the operand is stored at address 1200.]

ARM Post-indexed Mode w/ WB
LDR R1,[R2],R10,LSL #2
Use in a loop
LSL #2 is a logical shift left by 2 bits => x4
On every pass the operand is loaded from the address in R2, then R2 <- [R2] + [R10] x 4
With [R2] denoting the base address before the first pass:
  1st pass: R1 <- [[R2]]
  2nd pass: R1 <- [[R2] + [R10] x 4]
  3rd pass: R1 <- [[R2] + 2 x [R10] x 4]
  and so on

[Figure: load instruction LDR R1,[R2],R10,LSL #2 with base register R2 = 1000 and offset register R10 = 25, so each step is 100 = 25 x 4. The words (4 bytes each) at memory addresses 1000, 1100, and 1200 hold 6, -17, and 321.]

ARM Pre-indexed Mode w/ WB
STR R0,[R5,#-4]!
Push instruction
  R5 is the stack pointer (SP)
  Immediate offset of -4 is added to [R5]
  TOS = 2008

[Figure: push instruction STR R0,[R5,#-4]!. The base register (stack pointer) R5 holds 2012 before the push; after execution of the push instruction R5 holds 2008 and the contents of R0 (27) are stored at address 2008.]

ARM Instructions
All instructions can be executed conditionally
  Condition is in b31-28 of the instruction
Most instructions have shift and rotate operations directly implemented in them
  Barrel shifter
Load/store multiple instructions
  LDMIA R10!,{R0,R1,R6,R7}
    R0 <- [[R10]], R1 <- [[R10]+4], R6 <- [[R10]+8], R7 <- [[R10]+12]
    R10 <- [R10] + 16
Condition codes are set by the “S” suffix
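The load-multiple example can be traced in software: registers in the list are filled in ascending register order from consecutive words, and the base register is advanced by 4 per register loaded. The sketch below is a simplified C model, assuming a word-aligned base address that indexes a small array and a base register that is not itself in the list; `ldmia_writeback` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model of LDMIA Rbase!,{list}.
 * regs      : register file R0..R15
 * mem       : memory modelled as an array of 32-bit words
 * mem_words : number of words in mem
 * base      : number of the base register (assumed not in the list)
 * reg_list  : bit i set means Ri is in the register list
 */
void ldmia_writeback(uint32_t regs[16], const uint32_t *mem, size_t mem_words,
                     unsigned base, uint16_t reg_list)
{
    uint32_t addr = regs[base];
    unsigned r;

    for (r = 0; r < 16; r++) {
        if (reg_list & (1u << r)) {
            assert(addr % 4 == 0 && addr / 4 < mem_words);
            regs[r] = mem[addr / 4]; /* load next word           */
            addr += 4;               /* increment after the load */
        }
    }
    regs[base] = addr;               /* writeback ("!")          */
}
```

With four registers in the list, the base register ends up 16 bytes higher, matching R10 <- [R10] + 16 on the slide.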

ARM Instructions
Arithmetic
  Opcode Rd,Rn,Rm
  ADD  R0,R2,R4         => R0 <- [R2] + [R4]
  ADD  R0,R3,#17        => R0 <- [R3] + 17
    Immediate value in b7-0
  SUB  R0,R6,R5         => R0 <- [R6] - [R5]
  ADD  R0,R1,R5,LSL #4  => R0 <- [R1] + [R5] x 16
  MUL  R0,R1,R2         => R0 <- [R1] x [R2]
  MLA  R0,R1,R2,R3      => R0 <- [R1] x [R2] + [R3]
  ADDS R0,R1,R2         => R0 <- [R1] + [R2]
    Sets condition codes N, Z, C, V
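How ADDS derives the four condition codes from a 32-bit addition can be sketched in C. This is an illustration under the usual definitions (N = sign bit of the result, Z = result is zero, C = unsigned carry out, V = signed overflow), not a description of the hardware; `nzcv` and `adds` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag container for this sketch. */
typedef struct {
    unsigned n, z, c, v;
} nzcv;

/* Model of ADDS Rd,Rn,Rm: returns the sum and fills in the flags. */
uint32_t adds(uint32_t a, uint32_t b, nzcv *flags)
{
    uint32_t r = a + b; /* wraps modulo 2^32 */

    assert(flags != NULL);
    flags->n = r >> 31;                            /* sign bit of the result */
    flags->z = (r == 0);                           /* result is zero         */
    flags->c = (r < a);                            /* unsigned carry out     */
    flags->v = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;  /* operands share a sign,
                                                      result sign differs    */
    return r;
}
```

Adding 1 to 0x7FFFFFFF sets N and V (signed overflow without carry), while adding 1 to 0xFFFFFFFF sets Z and C (carry out with a zero result).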

ARM Instructions
Logic
  Opcode Rd,Rn,Rm
  AND R0,R2,R4  => R0 <- [R2] AND [R4]
  BIC R0,R0,R1  => R0 <- [R0] AND NOT [R1]
  MVN R0,R3     => R0 <- NOT [R3]
BCD Pack Program

            LDR    R0,POINTER         Load address LOC into R0.
            LDRB   R1,[R0]            Load ASCII characters
            LDRB   R2,[R0,#1]         into R1 and R2.
            AND    R2,R2,#&F          Clear high-order 28 bits of R2.
            ORR    R2,R2,R1,LSL #4    Or [R1] shifted left into [R2].
            STRB   R2,PACKED          Store packed BCD digits
                                      into PACKED.
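The BCD pack program maps almost line for line onto C. The sketch below mirrors each instruction in a comment; the function name `bcd_pack` and the two-character input buffer are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack two ASCII digit characters into one byte of BCD, following the
 * assembly program step by step. loc points at the two characters.
 */
uint8_t bcd_pack(const unsigned char loc[2])
{
    uint32_t r1 = loc[0];   /* LDRB R1,[R0]                          */
    uint32_t r2 = loc[1];   /* LDRB R2,[R0,#1]                       */

    r2 &= 0xFu;             /* AND  R2,R2,#&F                        */
    r2 |= r1 << 4;          /* ORR  R2,R2,R1,LSL #4                  */
    return (uint8_t)r2;     /* STRB R2,PACKED (low byte only stored) */
}
```

For the characters '4' and '7' (0x34 and 0x37), R2 becomes 0x347 after the ORR, and the byte store keeps 0x47. The high nibble of the first character is never masked explicitly; it is discarded because only the low byte is stored.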

ARM Instructions
Branch
  Contain a 2’s complement 24-bit offset
  Condition to be tested is in b31-28
  BEQ LOCATION
  BGT LOOP

[Figure (a): instruction format. Bits 31-28: Condition; bits 27-24: OP code; bits 23-0: Offset.]
[Figure: BEQ LOCATION at address 1000; updated [PC] = 1008; branch target instruction at LOCATION = 1100; offset = 92.]

ARM Assembly Language

Memory address   Operation   Addressing or data
label                        information

Assembler directives:
                 AREA        CODE
                 ENTRY
Statements that generate machine instructions:
                 LDR         R1,N
                 LDR         R2,POINTER
                 MOV         R0,#0
LOOP             LDR         R3,[R2],#4
                 ADD         R0,R0,R3
                 SUBS        R1,R1,#1
                 BGT         LOOP
                 STR         R0,SUM
Assembler directives:
                 AREA        DATA
SUM              DCD         0
N                DCD         5
POINTER          DCD         NUM1
NUM1             DCD         3, 17,27, 12,322

ARM Subroutines
Example 1
  Parameters passed through registers
  Branch and Link instruction (BL)

Calling program

            LDR    R1,N
            LDR    R2,POINTER
            BL     LISTADD
            STR    R0,SUM
            ...

Subroutine

LISTADD     STMFD  R13!,{R3,R14}      Save R3 and return address in R14 on
                                      stack, using R13 as the stack pointer.
            MOV    R0,#0
LOOP        LDR    R3,[R2],#4
            ADD    R0,R0,R3
            SUBS   R1,R1,#1
            BGT    LOOP
            LDMFD  R13!,{R3,R15}      Restore R3 and load return address
                                      into PC (R15).

ARM Subroutines
Example 2
  Parameters passed on stack

Calling program (assume top of stack is at level 1 in the figure)

            LDR    R0,POINTER         Push NUM1
            STR    R0,[R13,#-4]!      on stack.
            LDR    R0,N               Push n
            STR    R0,[R13,#-4]!      on stack.
            BL     LISTADD
            LDR    R0,[R13,#4]        Move the sum into
            STR    R0,SUM             memory location SUM.
            ADD    R13,R13,#8         Remove parameters from stack.
            ...

Subroutine

LISTADD     STMFD  R13!,{R0-R3,R14}   Save registers.
            LDR    R1,[R13,#20]       Load parameters
            LDR    R2,[R13,#24]       from stack.
            MOV    R0,#0
LOOP        LDR    R3,[R2],#4
            ADD    R0,R0,R3
            SUBS   R1,R1,#1
            BGT    LOOP
            STR    R0,[R13,#24]       Place sum on stack.
            LDMFD  R13!,{R0-R3,R15}   Restore registers and return.

[Figure: stack contents from the top down: [R0] (level 3), [R1], [R2], [R3], return address, n (level 2), NUM1; level 1 is the top of stack before the call.]

ARM Program Example
Byte sorting program

C program

    for (j = n - 1; j > 0; j = j - 1)
    {
        for (k = j - 1; k >= 0; k = k - 1)
        {
            if (LIST[k] > LIST[j])
            {
                TEMP = LIST[k];
                LIST[k] = LIST[j];
                LIST[j] = TEMP;
            }
        }
    }

Assembly program

            ADR     R4,LIST           Load list pointer register R4,
            LDR     R10,N             and initialize outer loop base
            ADD     R2,R4,R10         register R2 to LIST + n.
            ADD     R5,R4,#1          Load LIST + 1 into R5.
OUTER       LDRB    R0,[R2,#-1]!      Load LIST(j) into R0.
            MOV     R3,R2             Initialize inner loop base register
                                      R3 to LIST + n - 1.
INNER       LDRB    R1,[R3,#-1]!      Load LIST(k) into R1.
            CMP     R1,R0             Compare LIST(k) to LIST(j).
            STRGTB  R1,[R2]           If LIST(k) > LIST(j), swap
            STRGTB  R0,[R3]           LIST(k) and LIST(j), and
            MOVGT   R0,R1             move (new) LIST(j) into R0.
            CMP     R3,R4             If k > 0, repeat
            BNE     INNER             inner loop.
            CMP     R2,R5             If j > 1, repeat
            BNE     OUTER             outer loop.

Freescale 68K
Freescale Semiconductor
  Formerly Motorola Semiconductor
  www.freescale.com
There are more than 17 billion Freescale semiconductors at work all over the planet
  Automobiles, computer networks, communications infrastructure, office buildings, factories, industrial equipment, tools, mobile phones, home appliances and consumer products
About 20 microprocessor families

68K
68K Family
  68000: Introduced in 1979, 16-bit word length and 8/16/32-bit arithmetic, 24-bit address space (16 MB)
  68008: 8-bit version of the 68000 with 20-bit address space
  68010: Version of the 68000 supporting virtual memory and virtual machine concepts
  68020: Extended addressing capabilities, 32-bit, i-cache
  68030: Data cache in addition to the instruction cache, on-chip memory management unit
  68040: Floating-point arithmetic, pipelining, . . .
“ColdFire” family added in 1994
  V1 through V5 cores

68K Example
ColdFire V5 Core

68K Register Structure
8 32-bit data registers
8 32-bit address registers
  A7 is the stack pointer
  Separate supervisor and user stack pointers
  Users cannot execute privileged instructions
Status register

[Figure: 68K register structure. Data registers D0 to D7 (long word = bits 31-0, word = bits 15-0, byte = bits 7-0). Address registers A0 to A6. Stack pointers A7: user stack pointer and supervisor stack pointer. Program counter PC. Status register SR (bits 15-0): T (bit 15) trace mode select, S (bit 13) supervisor mode select, I (bits 10-8) interrupt mask, X (bit 4) extend, N (bit 3) negative, Z (bit 2) zero, V (bit 1) overflow, C (bit 0) carry.]

68K Instruction Format
Three operand sizes: Byte, Word, Long Word
All addressing modes supported (CISC)
One or two operands
See appendix C

[Figure: 68K instruction format, a 16-bit word (bits 15-0) with OP code, dst, size, and src fields.]

68K Addressing Modes

Name                Assembler syntax    Addressing function

Immediate           #Value              Operand = Value
Absolute Short      Value               EA = Sign Extended WValue
Absolute Long       Value               EA = Value
Register            Rn                  EA = Rn, that is, Operand = [Rn]
Register Indirect   (An)                EA = [An]
Autoincrement       (An)+               EA = [An]; Increment An
Autodecrement       -(An)               Decrement An; EA = [An]
Indexed basic       WValue(An)          EA = WValue + [An]
Indexed full        BValue(An,Rk.S)     EA = BValue + [An] + [Rk]
Relative basic      WValue(PC)          EA = WValue + [PC]
                    or Label
Relative full       BValue(PC,Rk.S)     EA = BValue + [PC] + [Rk]
                    or Label(Rk)

where:
  EA = effective address
  Value = a number given either explicitly or represented by a label
  BValue = an 8-bit Value
  WValue = a 16-bit Value
  An = an address register
  Rn = an address or a data register
  S = a size indicator

68K Instructions
Format: see appendix C
  Opcode src,dst
  Opcode src
Arithmetic examples
  ABCD, ADD, ADDA, ADDI, ADDQ, ADDX
  DIVS, DIVU, MULS, MULU
  SBCD, SUB, SUBA, SUBI, SUBQ,
Logic examples
  AND, ANDI, EOR, EORI
  NBCD, NEG, NEGX, NOP, NOT, OR, ORI, SWAP

68K Instructions
Shift examples
  ASL, ASR, BCHG, EXT, LSL, LSR
  ROL, ROR, ROXL,
Bit test and compare
  BCLR, BSET, BTST, TAS, TST
  CMP, CMPA, CMPI, CMPM
  EXG
Branch examples
  JMP, JSR, RESET, RTE, RTR, RTS, STOP, TRAP, TRAPV
Memory load and store examples
  LEA, PEA, LINK, UNLINK
  MOVE, MOVEA, MOVEM, MOVEP, MOVEQ

68K Assembly Language

Generic program

            Move       N,R1           Initialization
            Move       #NUM1,R2
            Clear      R0
LOOP        Add        (R2)+,R0
            Decrement  R1
            Branch>0   LOOP
            Move       R0,SUM

68K program

            MOVE.L   N,D1             Put n - 1 into the
            SUBQ.L   #1,D1            counter register D1.
            MOVEA.L  #NUM1,A2
            CLR.L    D0
LOOP        ADD.W    (A2)+,D0
            DBRA     D1,LOOP          Loop back until [D1] = -1.
            MOVE.L   D0,SUM
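The reason the counter is loaded with n - 1 rather than n is that DBRA decrements first and falls through only when the counter reaches -1. The C sketch below mirrors that control flow; `list_sum_dbra` is a hypothetical name, and the sketch assumes the running sum fits in 16 bits, since ADD.W only updates the low word of D0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sum n 16-bit words the way the 68K loop does.
 * Assumes n >= 1 and that the running sum stays within 16 bits.
 */
int32_t list_sum_dbra(const int16_t *num1, int16_t n)
{
    int32_t d0 = 0;           /* CLR.L  D0                    */
    int16_t d1;

    assert(num1 != NULL && n > 0);
    d1 = (int16_t)(n - 1);    /* MOVE.L N,D1 / SUBQ.L #1,D1   */
    do {
        d0 += *num1++;        /* ADD.W  (A2)+,D0              */
        d1--;                 /* DBRA: decrement D1 ...       */
    } while (d1 != -1);       /* ... and loop until [D1] = -1 */
    return d0;                /* MOVE.L D0,SUM                */
}
```

With n = 5 the counter takes the values 4, 3, 2, 1, 0 at the top of the loop body, so the body runs exactly five times before D1 reaches -1.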

68K Subroutines

Calling program

            MOVE.L   #NUM1,-(A7)      Push parameters onto stack.
            MOVE.L   N,-(A7)
            BSR      LISTADD
            MOVE.L   4(A7),SUM        Save result.
            ADDI.L   #8,A7            Restore top of stack.
            ...

Subroutine

LISTADD     MOVEM.L  D0-D1/A2,-(A7)   Save registers D0, D1, and A2.
            MOVE.L   16(A7),D1        Initialize counter to n.
            SUBQ.L   #1,D1            Adjust count to use DBRA.
            MOVEA.L  20(A7),A2        Initialize pointer to the list.
            CLR.L    D0               Initialize sum to 0.
LOOP        ADD.W    (A2)+,D0         Add entry from list.
            DBRA     D1,LOOP
            MOVE.L   D0,20(A7)        Put result on the stack.
            MOVEM.L  (A7)+,D0-D1/A2   Restore registers.
            RTS

[Figure: stack contents from the top down: [D0] (level 3), [D1], [A2], return address, n (level 2), NUM1; level 1 is the top of stack before the call.]

68K Program Example
Byte sorting program

C program

    for (j = n - 1; j > 0; j = j - 1)
    {
        for (k = j - 1; k >= 0; k = k - 1)
        {
            if (LIST[k] > LIST[j])
            {
                TEMP = LIST[k];
                LIST[k] = LIST[j];
                LIST[j] = TEMP;
            }
        }
    }

Assembly program

            MOVEA.L  #LIST,A1         Pointer to the start of the list.
            MOVE     N,D1             Initialize outer loop
            SUBQ     #1,D1            index j in D1.
OUTER       MOVE     D1,D2            Initialize inner loop
            SUBQ     #1,D2            index k in D2.
            MOVE.B   (A1,D1),D3       Current maximum value in D3.
INNER       CMP.B    D3,(A1,D2)       If LIST(k) <= [D3],
            BLE      NEXT             do not exchange.
            MOVE.B   (A1,D2),(A1,D1)  Interchange LIST(k)
            MOVE.B   D3,(A1,D2)       and LIST(j) and load
            MOVE.B   (A1,D1),D3       new maximum into D3.
NEXT        DBRA     D2,INNER         Decrement counters k and j
            SUBQ     #1,D1            and branch back
            BGT      OUTER            if not finished.

IA-32
Intel Corporation
  www.intel.com
  developer.intel.com
Microprocessor used in PCs and Apple computers
Processor families
  Desktop processors
  Server and workstation processors
  Internet device processors
  Notebook processors
  Embedded and communications processors

IA-32
Intel microprocessor history

IA-32 Example
P6 Microarchitecture

IA-32 Example
The centerpiece of the P6 processor microarchitecture is an out-of-order execution mechanism called dynamic execution. Dynamic execution incorporates three data processing concepts:
  Deep branch prediction allows the processor to decode instructions beyond branches to keep the instruction pipeline full.
  Dynamic data flow analysis requires real-time analysis of the flow of data through the processor to determine dependencies and to detect opportunities for out-of-order instruction execution.
  Speculative execution refers to the processor’s ability to execute instructions that lie beyond a conditional branch that has not yet been resolved, and ultimately to commit the results in the order of the original instruction stream.

IA-32 Register Structure
8 32-bit data registers
8 64-bit floating-point registers
6 segment registers

[Figure: IA-32 registers. Eight general-purpose registers R0 to R7 (bits 31-0); eight floating-point registers FP0 to FP7 (bits 63-0); six 16-bit segment registers: CS (code segment), SS (stack segment), and DS, ES, FS, GS (data segments).]

IA-32 Register Structure
32-bit instruction pointer
Status register
  Privilege level
  Condition codes

[Figure: instruction pointer (bits 31-0) and status register (bits 31-0). Status register bits: CF - Carry (bit 0), ZF - Zero (bit 6), SF - Sign (bit 7), TF - Trap (bit 8), IF - Interrupt enable (bit 9), OF - Overflow (bit 11), IOPL - Input/Output privilege level (bits 13-12).]

IA-32 Instruction Format
Variable instruction length (CISC)
See appendix D

[Figure: IA-32 instruction format. Fields in order: Prefix (1 to 4 bytes), OP code (1 or 2 bytes), ModR/M (1 byte), SIB (1 byte), Displacement (1 or 4 bytes), Immediate (1 or 4 bytes). The ModR/M and SIB bytes specify the addressing mode.]

IA-32 Addressing Modes

Name                      Assembler syntax           Addressing function

Immediate                 Value                      Operand = Value
Direct                    Location                   EA = Location
Register                  Reg                        EA = Reg, that is, Operand = [Reg]
Register indirect         [Reg]                      EA = [Reg]
Base with displacement    [Reg + Disp]               EA = [Reg] + Disp
Index with displacement   [Reg * S + Disp]           EA = [Reg] x S + Disp
Base with index           [Reg1 + Reg2 * S]          EA = [Reg1] + [Reg2] x S
Base with index and       [Reg1 + Reg2 * S + Disp]   EA = [Reg1] + [Reg2] x S + Disp
displacement

where:
  Value = an 8- or 32-bit signed number
  Location = a 32-bit address
  Reg, Reg1, Reg2 = one of the general purpose registers EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI, with the exception that ESP cannot be used as an index register
  Disp = an 8- or 32-bit signed number, except that in the Index with displacement mode it can only be 32 bits
  S = scale factor of 1, 2, 4, or 8
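The most general mode, base with index and displacement, subsumes the others: setting the index to 0 gives base with displacement, and setting the displacement to 0 gives base with index. A minimal C sketch of the address arithmetic, using the hypothetical function name `ia32_effective_address`:

```c
#include <assert.h>
#include <stdint.h>

/*
 * EA = [Reg1] + [Reg2] x S + Disp, computed modulo 2^32.
 * base  : contents of Reg1
 * index : contents of Reg2
 * scale : S, which must be 1, 2, 4, or 8
 * disp  : signed displacement
 */
uint32_t ia32_effective_address(uint32_t base, uint32_t index,
                                unsigned scale, int32_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint32_t)disp;
}
```

For example, a base of 1000, an index of 3 with scale 4, and a displacement of -4 give 1000 + 12 - 4 = 1008.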

IA-32 Instructions
Arithmetic examples
  ADC, ADD, CMC, DEC, DIV, IDIV, IMUL, MUL
  SBB, SUB
Logic examples
  AND, CLC, STC
  NEG, NOP, NOT, OR, XOR

IA-32 Instructions
Shift examples
  RCL, RCR, ROL, ROR, SAL, SAR, SHL, SHR
Bit test and compare
  BT, BTC, BTR, BTS, CMP, TEST
Branch examples
  CALL, RET, CLI, STI, HLT, INT, IRET
  LOOP, LOOPE,
Memory/IO load and store examples
  LEA, MOV, MOVSX, MOVZX
  IN, OUT, POP, POPAD, PUSH, PUSHAD
  XCHG

IA-32 Assembly Language

Assembler directives:
            .data
NUM1        DD     17, 3,51,242, 113
N           DD     5
SUM         DD     0
            .code
Statements that generate machine instructions:
MAIN:       LEA    EBX,NUM1
            SUB    EBX,4
            MOV    ECX,N
            MOV    EAX,0
STARTADD:   ADD    EAX,[EBX+ECX*4]
            LOOP   STARTADD
            MOV    SUM,EAX
Assembler directives:
            END    MAIN
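The program walks the list from the last element to the first: ECX counts down from n, and because EBX holds NUM1 - 4, the address EBX + ECX*4 points at element ECX - 1. The C sketch below mirrors that under the assumption that n is at least 1; `list_sum_loop` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum n 32-bit values back to front, as the LOOP-based program does. */
int32_t list_sum_loop(const int32_t *num1, uint32_t n)
{
    int32_t eax = 0;            /* MOV  EAX,0                        */
    uint32_t ecx = n;           /* MOV  ECX,N                        */

    assert(num1 != NULL && n > 0);
    do {
        eax += num1[ecx - 1];   /* ADD  EAX,[EBX+ECX*4], EBX=NUM1-4  */
        ecx--;                  /* LOOP: decrement ECX ...           */
    } while (ecx != 0);         /* ... and branch while ECX is not 0 */
    return eax;                 /* MOV  SUM,EAX                      */
}
```

Biasing the base pointer by one element lets the loop counter double as the index, so no separate index register or compare instruction is needed.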

IA-32 Subroutines

Calling program

            PUSH   OFFSET NUM1        Push parameters onto the stack.
            PUSH   N
            CALL   LISTADD            Branch to the subroutine.
            ADD    ESP,4              Remove n from the stack.
            POP    SUM                Pop the sum into SUM.
            ...

Subroutine

LISTADD:    PUSH   EDI                Save EDI and use
            MOV    EDI,0              as index register.
            PUSH   EAX                Save EAX and use as
            MOV    EAX,0              accumulator register.
            PUSH   EBX                Save EBX and load
            MOV    EBX,[ESP+20]       address NUM1.
            PUSH   ECX                Save ECX and
            MOV    ECX,[ESP+20]       load count n.
STARTADD:   ADD    EAX,[EBX+EDI*4]    Add next number.
            INC    EDI                Increment index.
            DEC    ECX                Decrement counter.
            JG     STARTADD           Branch back if not done.
            MOV    [ESP+24],EAX       Overwrite NUM1 in stack with sum.
            POP    ECX                Restore registers.
            POP    EBX
            POP    EAX
            POP    EDI
            RET                       Return.

[Figure: stack contents from the top down: [ECX] (level 3), [EBX], [EAX], [EDI], return address, n (level 2), NUM1; level 1 is the top of stack before the call.]

IA-32 Program Example
Byte sorting program

C program

    for (j = n - 1; j > 0; j = j - 1)
    {
        for (k = j - 1; k >= 0; k = k - 1)
        {
            if (LIST[k] > LIST[j])
            {
                TEMP = LIST[k];
                LIST[k] = LIST[j];
                LIST[j] = TEMP;
            }
        }
    }

Assembly program

            LEA    EAX,LIST           Load list pointer base
            MOV    EDI,N              register (EAX), and initialize
            DEC    EDI                outer loop index register
                                      (EDI) to j = n - 1.
OUTER:      MOV    ECX,EDI            Initialize inner loop index
            DEC    ECX                register (ECX) to k = j - 1.
            MOV    DL,[EAX+EDI]       Load LIST(j) into register DL.
INNER:      CMP    [EAX+ECX],DL       Compare LIST(k) to LIST(j).
            JLE    NEXT               If LIST(k) <= LIST(j), go to
                                      next lower k index entry;
            XCHG   [EAX+ECX],DL       Otherwise, interchange LIST(k)
                                      and LIST(j), leaving
            MOV    [EAX+EDI],DL       new LIST(j) in DL.
NEXT:       DEC    ECX                Decrement inner loop index k.
            JGE    INNER              Repeat or terminate inner loop.
            DEC    EDI                Decrement outer loop index j.
            JG     OUTER              Repeat or terminate outer loop.