Assembly Language Fundamentals

1

Assembly Language FundamentalsAssembly Language Fundamentals

Read Chapter 3 and Chapter 4 of textbookRead Chapter 3 and Chapter 4 of textbook

2

Directives and InstructionsDirectives and Instructions

Assembly language statements are either directives or instructions

Instructions are executable statements. They are translated by the assembler into machine instructions. Ex:

call MySub ;transfer of controlmov ax,5 ;data transfer

Directives tells the assembler how to generate machine code and allocate storage. Ex:

count db 50 ;creates 1 byte ;of storage

;initialized to 50

3

A Template for Assembly Language ProgramsA Template for Assembly Language Programs

.386 = directive to accept all instructions of 386 and previous processors (use .586 to assemble Pentium specific instructions)

end = directive that marks the end of the program

main = label of the entry point of the program (first instruction to execute)

ret = instruction that returns the control to the caller (here the Win32 console)

Macros to perform I/O are included in cs266.inc

.386

.model flatinclude cs266.inc

.data ;data allocation ;directives here

.code

main: ;instructions here

retend

Minimal template of your programs

4

The FLAT Memory ModelThe FLAT Memory Model

The .model flat directive tells the assembler to generate code that will run in protected mode and in 32-bit mode

Also ask the assembler to do whatever is needed in order that code, stack, and data share the same 32-bit memory segment All the segment registers will be loaded with the correct

values at load time and do not need to be changed by the programmer

Only the offset part of a logical address becomes relevant Each data byte (or instruction) is referred to only by a 32-bit

offset address The directives .code and .data mark the beginning of the

code and data segments. They are used only for protection.code is read-only.data is read and write

5

Steps to Produce an Executable FileSteps to Produce an Executable File

The assembler produces an object file from the assembly language source

The object file contains machine language code with some external and relocatable addresses that will be resolved by the linker. There values are undetermined at that stage.

The linker extract object modules (compiled procedures) from a library and links them with the object file to produce the executable file.

The addresses in the executable file are all resolved but they are still logical addresses.

Assembler linkerSource file

Object file

library

Executable file

.obj.asm .exe

.exe

6

Using Borland’s BCC32Using Borland’s BCC32

All these steps are performed with the command: bcc32 –v hello.asm This is the Borland C++ compiler

The bcc32 command calls TASM32 to assemble and produce an object file

It then calls ILINK32 to link this object file with the C/C++ library functions and Win32 functions used by the program to produce the executable file hello.exe

The –v option produces full debugging info td32 hello to debug ‘hello.asm’ program

See the my home page for all the info you need

7

Names and VariablesNames and Variables

A name identifies either: a variable a label a constant a keyword (assembler-reserved word).

8

Names and Variables (Cont.)Names and Variables (Cont.) A variable is a symbolic name for a location in memory that was

allocated by a data allocation directive. Ex:

count db 50 ; allocates 1 byte to

; variable count

A label is a name given to an instruction. It must be followed by ‘:’. Ex:

main: mov eax, 5 xor eax, ebx jump main

It specifies the Entry Point of a block of instructionsThe Entry Point here is “mov eax, 5”

9

Names (Cont.)Names (Cont.)

The first character must be a letter or any one of ‘@’, ‘_’, ‘$’, ‘?’

subsequent characters can include digits A programmer chosen name must be different

from an assembler reserved word avoid using ‘@’ as the first character since many

keywords start with it When called from bcc32, the TASM32 assembler is

case sensitive for user-defined words but case insensitive for the assembler reserved words

10

Integer ConstantsInteger Constants

Integer constants are made of numerical digits with, possibly, a sign and a suffix. Ex: -23 (a negative integer, base 10 is default) 1011b (a binary number) 1011 or 1011d (a decimal number) 513o or 531q (an octal number) 0A7Ch (an hexadecimal number) A7Ch (this is the name of a variable, an

hexadecimal number must start with a decimal digit)

11

Character and String ConstantsCharacter and String Constants

They are any sequence of characters enclosed either in single or double quotation marks. Embedded quotes are permitted. Ex: ‘A’ ‘ABC’ “Hello World!” “123” (this is a string, not a number) “This isn’t a test” ‘Say “hello” to him’

12

Simple Data Allocation DirectivesSimple Data Allocation Directives

The DB (define byte) directive allocates storage for one or more byte values [variable name] DB initval [,initval]

Each initializer can be any constant. Ex:

a db 10, 32, 41h ;allocate 3 bytes

b db 0Ah, 20h,‘A’;same values as above A question mark (?) in the initializer leaves the initial

value of the variable undefined. Ex:

c db ? ;the initial value for c is ;undefined

Everything that follows “;” is ignored by the assembler. It is thus a comment

13

Simple Data Allocation Directives (cont.)Simple Data Allocation Directives (cont.)

A string is stored as a sequence of characters. Ex:

aString db “ABCD”

bString DB ‘A’,’B’,’C’,’D’;same values

cString db 41h,42h,43h,44h ;same values again

The (offset) address of a variable is the address of its first byte. Ex: If the following data segment starts at address 0

.data

Var1 db “ABC”

Var2 db “DEFG” The address of Var1 is 0 = the address of ‘A’ The address of ‘B’ is 1 The address of ‘C’ is 2 The address of Var2 is 3 The address of ‘E’ is 4 …

14


DW (Define Word) allocates a sequence of words. Ex:

A DW 1234h, 5678h ; allocates 2 words

Intel’s x86 are little endian processors: the lowest order byte (of a word or double word) is always stored at the lowest address.

Ex: if variable A (above) is located at address 0, we have: address: 0 1 2 3 value: 34h 12h 78h 56h

15


DD (Define Double Word) allocates a sequence of double words. Ex:

B DD 12345678h ;allocates 1 double word

If this variable is located at address of 0, we have: address: 0 1 2 3 value: 78h 56h 34h 12h

If a value fits into a byte, it will be stored in the lowest ordered byte available. Ex:

V DW ‘A’ the value will be stored as:

address: 0 1 value: 41h 00h

16


The DUP operator enables us to repeat values when allocating storage. Ex:

a db 100 dup(?) ;100 bytes

;uninitializedb db 3 dup(“Ho”) ;6 bytes: “HoHoHo”

DUP can be nested:c db 2 dup(‘a’, 2 dup(‘b’)) ;this allocates 6 bytes:‘abbabb’

DUP must be used with data allocation directives There is a bug is some TASM32 versions:

b db 3 dup(“Ho”) Will allocate 6 bytes that will be filled with 0 (i.e. the specified

initial values are ignored).

17

ConstantsConstants

We can use the equal-sign (=) directive or the EQU directive to give a name to a constant. Ex:

one = 1 ;this is a constant

two equ 2; also a constant The EQU and = directives are equivalent The assembler does not allocate storage to a

constant (in contrast with data allocation directives)

It merely substitutes, at assembly time, the value of the constant at each occurrence of the assigned name

18

Constants (cont.)Constants (cont.)

In place of a constant, we can use a constant expression involving the standard operators used in HLLs: +, -, *, /

Ex: the following constant expression is evaluated at assembly time and given a name at assembly time:

A = (-3 * 8) + 2 A constant can be defined in terms of another

constant:

B = (A+2)/2

19

ExerciseExercise 1 1

Suppose that the following data segment starts at address 0

.dataA DW 1,2B DW 6ABCh Z EQU 232C DB 'ABCD'

A) Find the address of variable A. B) Find the address of variable B. C) Find the address of variable C. D) Find the address of character ‘C’.

20

Data Transfer InstructionsData Transfer Instructions

The MOV instruction transfers the content of the source operand to the destination operand

mov destination,source This changes the content of destination (but not the content of

source) All two-operands instructions are in the form OpCode Dst, Src

Both operands must be of the same size. An operand can be either direct or indirect Direct operands (this chapter) are either:

Immediate (a constant): noted Imm Register: noted Reg Memory variable (with displacement), noted Mem

Indirect operands are used for indirect addressing (later chapter)

21

Data Transfer Instructions (cont.)Data Transfer Instructions (cont.)

Some restrictions on MOV: imm cannot be the destination operand... EIP cannot be an operand Source and destination cannot both be mem.

Direct memory-to-memory data transfer is forbidden!

mov wordVar1,wordVar2; illegal

22


The type of an operand is given by its size (byte, word, doubleword…)

Both operands of MOV must be of the same type Type check is done by the assembler The type assigned to a mem operand is given by

its data allocation directive (DB, DW…) The type assigned to a register is given by its size An imm source operand of MOV must fit into the

size of the destination operand

23


Examples of MOV usage:

mov bh, 255; 8-bit operands

mov al, 256; error: cst too large

mov bx, AwordVar; 16-bit operands

mov bx, AbyteVar; error: size mismatch

mov edx, AdoublewordVar;32-bit operands

mov cx, bl ; error: size mismatch

mov wordVar1,wordVar2 ;error: mem-to-mem

24

MOVZX: Move with Zero ExtendMOVZX: Move with Zero Extend Often we want to move the content of a source operand into

a destination operand of larger size The MOVZX instruction does this operation by filling with

zeros the high order part of the destination. Usage: MOVZX destination,source

Immediate operands are not allowed here The size of destination must be strictly larger than the size

of source Example:

mov bh, 80hmovzx ah,bh ;illegal, size mismatch movzx ax,bh ;AX = 0080hmovzx ecx,ax ;ECX = 00000080h

Notice that if the signed value in the source operand is negative, then MOVZX will not preserve the sign.

mov bh, 80h ;BH = 80h (negative)movzx ax,bh ;AX = 0080h (positive)

25

MOVSX: Move with Sign ExtendMOVSX: Move with Sign Extend We can use the MOVSX instruction to preserve the sign of

the source operand. Usage: MOVSX destination,source

The high order part of the destination operand will be the sign extension of the source operand The sign extension of a negative number is …111111 The sign extension of a positive number is …0000000 Examples:

mov bh, 80h ;BH = 80h (negative)movsx ax,bh ;AX = FF80h (negative);FFh is the sign extension of 80hmov bl, 7Ah ;BL = 7Ah (positive)movsx ax,bl ;AX = 007Ah (positive);00h is the sign extension of 7Ah

MOVSX preserves the signed value whereas MOVZX preserves the unsigned value

Immediate operands are not allowed and the size of destination must be strictly larger than the size of source.

26

Data Transfer Instructions (cont.)Data Transfer Instructions (cont.) We can add a displacement to a memory operand to access a

memory value without a name Ex:

.data

arrB db 10h, 20h

arrW dw 1234h, 5678h arrB+1 refers to the location one byte beyond the beginning of

arrB and arrW+2 refers to the location two bytes beyond the beginning of arrW.

mov al,arrB ; AL = 10h

mov al,arrB+1 ; AL=20h (mem with displacement)

mov ax,arrW+2 ; AX = 5678h

mov ax,arrW+1 ; AX = 7812h ; little endian

convention!

mov ax,arrW-2 ; AX = 2010h

; negative displacement permitted

27


The XCHG instruction exchanges the content of the source and destination operands:

XCHG destination,source

Only mem and reg operands are permitted (and must be of the same size)

Both operands cannot be mem (direct mem-to-mem exchange is forbidden).

To exchange the content of word1 and word2, we have to do:

mov ax,word1xchg word2,axmov word1,ax

28


Given the following data segment

.data

A dw 1234h,-1

B dd 55h,66778899h Indicate if the following instruction is legal. If it is, indicate

the value, in hexadecimal, of the destination operand immediately after the instruction is executed (please verify your answers with a debugger)

MOV eax,A

MOV bx,A+1

MOV bx,A+2

MOV dx,A+4

MOV cx,B+1

MOV edx,B+2

29

Simple Arithmetic InstructionsSimple Arithmetic Instructions

The ADD instruction adds the source to the destination and stores the result in the destination (source remains unchanged)

ADD destination,source The SUB instruction subtracts the source from

the destination and stores the result in the destination (source remains unchanged)

SUB destination,source Both operands must be of the same size and

they cannot be both mem operands Recall that to perform A - B the CPU in fact

performs A + NEG(B)

30

Simple Arithmetic Instructions (cont.)Simple Arithmetic Instructions (cont.)

ADD and SUB affect all the status flags according to the result of the operation

ZF (zero flag) = 1 iff the result is zero SF (sign flag) = 1 iff the msb of the result is one OF (overflow flag) = 1 iff there is a signed overflow CF (carry flag) = 1 iff there is an unsigned overflow

Signed overflow: when the operation generates an out-of-range (erroneous) signed value

Unsigned overflow: when the operation generates an out-of-range (erroneous) unsigned value

31

More on OverflowsMore on Overflows

A unsigned overflow occurs if and only if (IFF) the unsigned value of the result does not fit into the destination operand This occurs IFF the unsigned interpretation of

the result is erroneous It is signaled by CF=1

A signed overflow occurs IFF the signed value of the result does not fit into the destination operand This occurs IFF the signed interpretation of the

result is erroneous It is signaled by OF=1

32


Both types of overflow occur independently and are signaled separately by CF and OF

mov al, 0FFh

add al,1 ; AL=00h, OF=0, CF=1

mov al,7Fh

add al, 1 ; AL=80h, OF=1, CF=0

mov al,80h

add al,80h ; AL=00h, OF=1, CF=1

Hence: we can have either type of overflow or both of them at the same time

33

Overflow ExampleOverflow Example

mov ax,4000h add ax,ax ;AX = 8000h

Unsigned Interpretation: The sum of the 2 magnitudes 4000h + 4000h

gives 8000h. This is the result in AX (the unsigned value of the result is correct). CF=0

Signed Interpretation: we add two positive numbers: 4000h + 4000h and have obtained a negative number! the signed value of the result in AX is erroneous.

Hence OF=1

34


mov ax,8000h sub ax,0FFFFh ;AX = 8001h

Unsigned Interpretation: from the magnitude 8000h we subtract the

larger magnitude FFFFh the unsigned value of the result is erroneous.

Hence CF=1 Signed Interpretation:

We subtract -1 from the negative number 8000h and obtained the correct signed result 8001h. Hence OF=0

35


mov ah,40h sub ah,80h ;AH = C0h

Unsigned Interpretation: we subtract from 40h the larger number 80h the unsigned value of the result is wrong.

Hence CF=1 Signed Interpretation:

we subtract from 40h (64) a negative number 80h (-128) to obtain a negative number

the signed value of the result is wrong. Hence OF=1

36


For each of these instructions, give the content (in hexadecimal) of the destination operand and the CF and OF flags immediately after the execution of the instruction (verify your answers with a debugger). ADD AX,BX when AX contains 8000h and BX

contains FFFFh. SUB AL,BL when AL contains 00h and BL contains

80h. ADD AH,BH when AH contains 2Fh and BH

contains 52h. SUB AX,BX when AX contains 0001h and BX

contains FFFFh.

37

Simple Arithmetic Instructions (cont.)Simple Arithmetic Instructions (cont.) The INC (increment) and DEC (decrement)

instructions add 1 or subtracts 1 from a single operand (mem or reg operand)

INC destination

DEC destination They affect all status flags, except CF. Say that

initially we have, CF=OF=0

mov bh,0FFh ; CF=0, OF=0

inc bh ; bh=00h, CF=0, OF=0

mov bh,7Fh ; CF=0, OF=0

inc bh ; bh=80h, CF=0, OF=1

38


The NEG instruction performs the twos complement of its operand

NEG destination Where destination is either mem or reg

CF=0 IFF the result is 0 OF=1 IFF there is a signed overflow. Ex:

mov ax,-5

neg ax; CF = 1, OF = 0

mov ax,8000h

neg ax; CF=1, OF=1 signed overflow!

39

I/O on the Win32 ConsoleI/O on the Win32 Console Our programs will communicate with the user via the Win32

console (the MS-DOS box) Input is done on the keyboard Output is done on the screen

Modern OS like Windows forbids user programs to interact directly with I/O hardware User programs can only perform I/O operation via system calls

For simplicity, our programs will perform I/O operations by using macros that are provided in the cs266.inc file Download “cs266.inc” and save it the directory containing all your

programs These macros are calling C libraries functions like printf() which,

in turn, are calling the Win32 API Hence, these I/O operations will be slow but simple to use and

easy to migrate to another OS We will examine the mechanisms involved in I/O operations

later in the course

40

Character OutputCharacter Output

The putch macro prints on the screen the character of the operand’s ASCII code. Usage:

putch source Where source must be a 32-bit operand

i.e. either imm, reg32, or mem32 (a double word variable) .dataaword dw 41hadword dd 61h.codeputch aword ;error: 16-bit operandputch adword ;‘a’ is written on screenputch ‘b’ ;’b’ is written on screenmov eax,’c’putch eax ;’c’ is written on screenputch ax ;error: 16-bit operand

41

Character Output (cont.)Character Output (cont.)

Also: the cursor will advance one position after printing the character

The putch macro calls the putchar() function from the C library. Hence: The number 10 = 0Ah will direct the cursor to the

beginning of the next line (the “newline character” in C). So the <CR> and <LF> functions are both performed on the screen.

putch 10 ;move the cursor to the

;beginning of next line

42

String OutputString Output

To print a string, use the following macro:

putstr source Where source must be mem operand (i.e. the name of a

variable). It cannot be a reg or imm operand. This macro calls printf(“%s”, ) of the C library. Hence:

The number 10 = 0Ah will move the cursor to the beginning of the next line (the “newline character” in C)

The string must be a “null terminating” string. The last character must have ASCII code = 0h. Ex:

.data

msg db “hello”,0ah,“world”,0h

.code

putstr msg ;prints ‘hello’ on one line

;and ‘world’ on the next line

43

Integer OutputInteger Output

To print the signed value of an integer, use:putint source

Where source must be a 32-bit operand i.e. either imm, reg32, or mem32 (a double word variable) . Ex:

.dataaword dw 243adword dd -266.codeputint aword ;error: 16-bit operandputint adword ;-266 is written on screen

putint -1 ; -1 is written on screenmov eax,0FFFFFFFFhputint eax ;-1 is written on screenputint ax ;error: 16-bit operand

44

Character InputCharacter Input To read one or more character on the keyboard, we will use

the getch macro. Usage:getch

This macro calls getchar() from the C library. So it uses a memory buffer that we will call the input buffer.

Upon execution of getch, the input buffer is first examined. If the input buffer is empty, then getch waits for the user to

enter an input line (a sequence of char ended by <CR>). Each character that the user enters (at the keyboard) is

copied into the input buffer When the user enters the <CR>: the screen cursor move to

the next line, the value 0Ah is stored in the input buffer and the control is passed to the instruction following getch

The ASCII code of the first character entered on the keyboard will be stored in AL. The remaining bits of EAX are filled with zeros. Ex:

mov eax,-1getch ; eax=41h if the user first hits ‘A’

45

Character Input (cont.)Character Input (cont.) Example: Suppose that the input buffer is initially empty

and, upon execution of getch, the users enters “hello”+<CR> on the keyboard.

Then, when the control returns to the instruction following getch, EAX contains 068h (= ‘h’) and the input buffer looks like this:

‘h’ ‘e’ ‘l’‘l’ ‘o’ 0Ah

Pointer tonext char

Pointer tolast char

If the input buffer is not empty when getch is executed, then EAX will get loaded with the ASCII code of the next character in the input buffer and the pointer to the next char will increase by one.

The input buffer is empty only when the pointer to the next char points beyond the last character (i.e: 0Ah)

The user is prompted only when the input buffer is empty

46

Character Input (example)Character Input (example)

Try to understand this program

It first prints “?” and moves the cursor to the next line awaiting user input

When the user enters “abcdef” + <CR>, the program displays (before exiting):

abc But if, instead, the user enters “a” +

<CR>, the program displays:

a

and the cursor moves to the next line awaiting user input. If the user then enters “bcdef”+<CR>, the program prints on the next line (before exiting):

b

.386

.model flatinclude cs266.inc

.code main:

putch '?'putch 10getchputch eaxgetchputch eaxgetchputch eaxret

end

47

I/O Example: Case ConversionI/O Example: Case Conversion.386.model flatinclude cs266.inc

.datamsg1 db "Enter a lower case letter: ",0msg2 db 'In upper case it is: 'char db ?,0

.code main:

putstr msg1 getch ;char in eax and goto next linesub al,20h ;converts to upper casemov char,alputstr msg2ret

end

Documents

Assembly Language Fundamentals