Instruction Set & Assembly Language Programming
Jianjian SONGSoftware Institute, Nanjing University
Content Computer Architecture Taxonomy ARM Architecture Introduction ARM Instruction Set ARM Assembly Language
Programming
1. Computer Architecture Taxonomy
What is architecture?
Architecture & Organization 1 Architecture is those attributes visible to
the programmer Instruction set, number of bits used for data
representation, I/O mechanisms, addressing techniques.
e.g. Is there a multiply instruction? Organization is how features are
implemented Control signals, interfaces, memory
technology. e.g. Is there a hardware multiply unit or is it
done by repeated addition?
Architecture & Organization 2
All Intel x86 family share the same basic architecture
The IBM System/370 family share the same basic architecture
This gives code compatibility At least backwards
Organization differs between different versions
von Neumann architecture
Memory holds data, instructions. Central processing unit (CPU)
fetches instructions from memory. Separate CPU and memory
distinguishes programmable computer. CPU registers help out: program
counter (PC), instruction register (IR), general-purpose registers, etc.
CPU + memory
memoryCPU
PC
address
data
IRADD r5,r1,r3200
200
ADD r5,r1,r3
Harvard architecture
CPU
PCdata memory
program memory
address
data
address
data
von Neumann vs. Harvard
Harvard can’t use self-modifying code.
Harvard allows two simultaneous memory fetches.
Most DSPs use Harvard architecture for streaming data: greater memory bandwidth; more predictable bandwidth.
RISC vs. CISC
Complex instruction set computer (CISC): many addressing modes; many operations.
Reduced instruction set computer (RISC): load/store; pipelinable instructions.
Load-store Architecture
指令集仅能处理 ( 如 ADD 、 SUB 等 ) 寄存器中 ( 或指令中直接指定 ) 的值,而且总是将处理结果放回寄存器中。针对存储器的唯一操作是将存储器的值装入寄存器 (load 指令 ) ,或将寄存器的值存到存储器 (store 指令 ) 。
相比较,典型的 CISC 处理器允许将存储器中的值加 (ADD) 到寄存器,有时还允许将寄存器的值加 (ADD) 到存储器中。
Instruction set characteristics
Fixed vs. variable length. Addressing modes. Number of operands. Types of operands.
Programming model
Programming model: registers visible to the programmer.
Some registers are not visible (e.g. IR).
Multiple implementations
Successful architectures have several implementations: varying clock speeds; different bus widths; different cache sizes; etc.
2. ARM Architecture Introduction
ARM (Advanced RISC Machines) ARM 公司是一家设计公司,是 IP 供应
商,靠转让设计许可证由合作伙伴生产各具特色的芯片。 What is IP ? Intellectual Property
ARM 的特点 ARM 具有 RISC 体系的一般特点:
大量寄存器 绝大多数操作都在寄存器中进行,通过 Load/
Store 的在内存和寄存器间传递数据。 寻址方式简单 采用固定长度的指令格式
此外, 小体积、低功耗、低成本、高性能 16 位 /32 位双指令集 全球众多合作伙伴
ARM 体系结构的版本和扩充 六个版本
ARMv1 ~ ARMv6 ARM 体系结构的扩充
Thumb (T variant): 16 位指令集,用以改善指令密度;
DSP (E variant): 用于 DSP 应用的算术运算指令集;
Jazeller (J variant): 允许直接执行 Java字节码 什么是指令密度?
执行同等操作序列的前提下,单位内存空间所容纳的机器指令数。
什么是指令密度?
执行同等操作序列的前提下,单位内存空间所容纳的机器指令数。
ARM 体系结构版本的命名格式 命名字符串 :
ARM vx (x: 指令集版本号, 1~6) 表示变种的字符 ( 如 T, E, J ) 用字符 x 表示排除某种写功能。
ARM 处理器系列 ARM7 系列 ARM9 系列 ARM9E 系列 ARM10 系列 SecureCore 系列 Intel StrongARM Intel XScale
3. ARM Instruction Set ARM assembly language ARM programming model ARM memory organization ARM data operations ARM flow of control
Assembly language
Why assembly language? One-to-one with instructions (more or
less). Basic features:
One instruction per line. Labels provide names for addresses (usually
in first column). Instructions often start in later columns. Columns run to end of line.
ARM assembly language example
label1 ADR r4,cLDR r0,[r4] ; a
commentADR r4,dLDR r1,[r4]SUB r0,r0,r1 ; comment
ARM 指令的一般编码格式
cond 00 X opcode S Rn Rd Shifter-operand
31 28 27 26 25 24 21 20 19 16 15 12 11 0
opcode: 指令操作符编码cond: 指令执行条件编码S: 指令的操作是否影响 CPSR 的值Rn: 包含第一个操作数的寄存器编码Rd: 目标寄存器编码Shifter_operand: 第二个操作数
ARM 指令的基本寻址方式 寄存器寻址
例: ADD R0 , R1 , R2 ; (R1)+(R2)→R0 立即数寻址
例: ADD R3 , R3 , #2 ; (R3)+2→R3 寄存器间接寻址
例: LDR R0 , [R3] ; ((R3))→R0 寄存器变址
例: LDR R0 , [R1, #4] ; ((R1)+4)→R0 相对寻址
例: B rel ; (PC)+rel→PC
Pseudo-ops
Some assembler directives don’t correspond directly to instructions: Define current address. Reserve storage. Constants.
ARM programming model
r0r1r2r3r4r5r6r7
r8r9
r10r11r12r13r14
r15 (PC)
CPSR
31 0
N Z C V
Endianness
Relationship between bit and byte/word ordering defines endianness:
byte 3 byte 2 byte 1 byte 0 byte 0 byte 1 byte 2 byte 3
bit 31 bit 0 bit 0 bit 31
little-endian big-endian
ARM data types
Word is 32 bits long. Word can be divided into four 8-bit
bytes. ARM addresses can be 32 bits long. Address refers to byte.
Address 4 starts at byte 4. Can be configured at power-up as
either little- or big-endian mode.
ARM status bits
Every arithmetic, logical, or shifting operation sets CPSR bits: N (negative), Z (zero), C (carry), V
(overflow). Examples:
-1 + 1 = 0: NZCV = 0110. 231-1+1 = -231: NZCV = 0101.
Instructions Overview
Data instructions Move Instructions Load/Store instructions Comparison instructions Branch instructions
ARM data instructions
Basic format:ADD r0,r1,r2 Computes r1+r2, stores in r0.
Immediate operand:ADD r0,r1,#2 Computes r1+2, stores in r0.
ARM data instructions
ADD, ADC : add (w. carry)
SUB, SBC : subtract (w. carry)
RSB, RSC : reverse subtract (w. carry)
MUL, MLA : multiply (and accumulate)
AND, ORR, EOR BIC : bit clear LSL, LSR : logical
shift left/right ASL, ASR : arithmetic
shift left/right ROR : rotate right RRX : rotate right
extended with C
Data operation varieties
Logical shift: fills with zeroes.
Arithmetic shift: fills with ones.
RRX performs 33-bit rotate, including C bit from CPSR above sign bit.
ARM move instructions
MOV, MVN : move (negated)
MOV r0, r1 ; sets r0 to r1
ARM load/store instructions
LDR, LDRH, LDRB : load (half-word, byte)
STR, STRH, STRB : store (half-word, byte)
Addressing modes: register indirect : LDR r0,[r1] with second register : LDR r0,[r1,-r2] with constant : LDR r0,[r1,#4]
ARM comparison instructions
CMP : compare CMN : negated compare TST : bit-wise test TEQ : bit-wise negated test These instructions set only the
NZCV bits of CPSR.
ARM branch instructions
B: Branch BL: Branch and Link
ARM ADR pseudo-op
Cannot refer to an address directly in an instruction.
Generate value by performing arithmetic on PC.
ADR pseudo-op generates instruction required to calculate address:ADR r1,FOO
Example: C assignments
C: x = (a + b) - c;
Assembler:ADR r4,a ; get address for aLDR r0,[r4] ; get value of aADR r4,b ; get address for b, reusing r4LDR r1,[r4] ; get value of bADD r3,r0,r1 ; compute a+bADR r4,c ; get address for cLDR r2,[r4] ; get value of c
C assignment, cont’d.
SUB r3,r3,r2 ; complete computation of xADR r4,x ; get address for xSTR r3,[r4] ; store value of x
Example: C assignment
C:y = a*(b+c);
Assembler:ADR r4,b ; get address for bLDR r0,[r4] ; get value of bADR r4,c ; get address for cLDR r1,[r4] ; get value of cADD r2,r0,r1 ; compute partial resultADR r4,a ; get address for aLDR r0,[r4] ; get value of a
C assignment, cont’d.
MUL r2,r2,r0 ; compute final value for yADR r4,y ; get address for ySTR r2,[r4] ; store y
Example: C assignment
C:z = (a << 2) | (b & 15);
Assembler:ADR r4,a ; get address for aLDR r0,[r4] ; get value of aMOV r0,r0,LSL 2 ; perform shiftADR r4,b ; get address for bLDR r1,[r4] ; get value of bAND r1,r1,#15 ; perform ANDORR r1,r0,r1 ; perform OR
C assignment, cont’d.ADR r4,z ; get address for zSTR r1,[r4] ; store value for z
Additional addressing modes
Base-plus-offset addressing:LDR r0,[r1,#16] Loads from location r1+16
Auto-indexing increments base register:LDR r0,[r1,#16]!
Post-indexing fetches, then does offset:LDR r0,[r1],#16 Loads r0 from r1, then adds 16 to r1.
ARM flow of control
All operations can be performed conditionally, testing CPSR: EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS,
GE, LT, GT, LE Branch operation:
B #100 Can be performed conditionally.
Example: if statement
C: if (a < b) { x = 5; y = c + d; } else x = c - d;
Assembler:; compute and test condition
ADR r4,a ; get address for aLDR r0,[r4] ; get value of aADR r4,b ; get address for bLDR r1,[r4] ; get value for bCMP r0,r1 ; compare a < bBGE fblock ; if a >= b, branch to false block
If statement, cont’d.
; true blockMOV r0,#5 ; generate value for xADR r4,x ; get address for xSTR r0,[r4] ; store xADR r4,c ; get address for cLDR r0,[r4] ; get value of cADR r4,d ; get address for dLDR r1,[r4] ; get value of dADD r0,r0,r1 ; compute yADR r4,y ; get address for ySTR r0,[r4] ; store yB after ; branch around false block
If statement, cont’d.
; false blockfblock ADR r4,c ; get address for c
LDR r0,[r4] ; get value of cADR r4,d ; get address for dLDR r1,[r4] ; get value for dSUB r0,r0,r1 ; compute a-bADR r4,x ; get address for xSTR r0,[r4] ; store value of x
after ...
Example: Conditional instruction implementation
; true blockMOVLT r0,#5 ; generate value for xADRLT r4,x ; get address for xSTRLT r0,[r4] ; store xADRLT r4,c ; get address for cLDRLT r0,[r4] ; get value of cADRLT r4,d ; get address for dLDRLT r1,[r4] ; get value of dADDLT r0,r0,r1 ; compute yADRLT r4,y ; get address for ySTRLT r0,[r4] ; store y
Example: switch statement
C: switch (test) { case 0: … break; case 1: … }
Assembler:ADR r2,test ; get address for testLDR r0,[r2] ; load value for testADR r1,switchtab ; load address for switch tableLDR r15,[r1,r0,LSL #2] ; index switch table
switchtab DCD case0DCD case1
...
Example: FIR filter
C:for (i=0, f=0; i<N; i++)f = f + c[i]*x[i];
Assembler; loop initiation code
MOV r0,#0 ; use r0 for IMOV r8,#0 ; use separate index for arraysADR r2,N ; get address for NLDR r1,[r2] ; get value of NMOV r2,#0 ; use r2 for f
FIR filter, cont’.d
ADR r3,c ; load r3 with base of cADR r5,x ; load r5 with base of x
; loop bodyloop LDR r4,[r3,r8] ; get c[i]
LDR r6,[r5,r8] ; get x[i]MUL r4,r4,r6 ; compute c[i]*x[i]ADD r2,r2,r4 ; add into running sumADD r8,r8,#4 ; add one word offset to array indexADD r0,r0,#1 ; add 1 to iCMP r0,r1 ; exit?BLT loop ; if i < N, continue
ARM subroutine linkage
Branch and link instruction:BL foo Copies current PC to r14.
To return from subroutine:MOV r15,r14
Nested subroutine calls
Nesting/recursion requires coding convention:
f1 LDR r0,[r13] ; load arg into r0 from stack
; call f2()
STR r13!,[r14] ; store f1’s return adrs
STR r13!,[r0] ; store arg to f2 on stack
BL f2 ; branch and link to f2
; return from f1()
SUB r13,#4 ; pop f2’s arg off stack
LDR r13!,r15 ; restore register and return
Summary
Load/store architecture Most instructions are RISCy,
operate in single cycle. Some multi-register operations take
longer. All instructions can be executed
conditionally.
4. ARM Assembly Language Programming
Why and when to use? AT&T format and Intel format Grammar of ARM assembly
language Examples
Why and when to use?
操作系统内核中的底层程序直接与硬件打交道,需要用到的专用指令。
CPU 中的特殊指令 频繁使用代码的时间效率 程序的空间效率 ( 如操作系统的引导程
序 )Refer to “Linux 内核源代码情景分析” ( 浙江大学出版社 )1.5 节Refer to “Linux 内核源代码情景分析” ( 浙江大学出版社 )1.5 节
AT&T format and Intel format
Grammar of ARM assembly language
语句 程序格式
语句 语句
指令 伪操作 宏
语句格式 { symbol } { instruction | directive |
pseudo-instruction } { ;comment }
伪操作 符号定义伪操作 数据定义伪操作 汇编控制伪操作 框架描述伪操作 信息报告伪操作 其它伪操作
关于变量的伪操作 声明一个全局变量,并初始化
GBLA, GBLL, GBLS 声明一个局部变量,并初始化
LCLA, LCLL, LCLS 变量赋值
SETA, SETL, SETS
Example
GBLA objectsize ; 声明一个全局的算术变量objectsize SETA 0xff ; 给该变量赋值SPACE objectsize ; 使用该变量
GBLL statusBstatusB SETL {TRUE}
关于数据常量的伪操作 EQU
name EQU expr {, type} 通常在 .inc 文件中
分配内存单元 SPACE
{label} SPACE bye_num 分配一块内存单元,并用 0 初始化
DCB {label} DCB expr, {expr} 分配一段字节内存单元,并用 expr 初始化
DCD {label} DCD expr, {expr} 分配一段字内存单元 ( 分配的内存都是字对齐的 ) ,
并用 expr 初始化
MACRO and MEND 子程序与宏
在子程序比较短,而需要传递的参数比较多的情况下使用宏汇编技术
宏定义体 MACRO: 宏定义的开始 MEND: 宏定义的结束 通常在 .mac 文件中
格式 MACRO {$label} macroname {$para1, $para2, ...} ... ;code MEND
Example
MACRO $label xmac $p1 ... ;code$label.loop1 ; 宏定义体的内部标号 ... ;code BGE $label.loop1$label.loop2 ; 宏定义体的内部标号 ... ;code BL $p1 ;参数 p1 是一个子程序的名称 BGT $label.loop2 ... ;code MEND
Example (cont’d) “abc xmac subr1”调用宏展开后的结果
... ;codeabcloop1 ; 内部标号 label 被 abc 代替 ... ;code BGE abcloop1 ; 内部标号 label 被 abc 代替abcloop2 ; 内部标号 label 被 abc 代替 ... ;code BL subr1 ; 参数 p1被实际值 subr1 代替 BGT abcloop2 ... ;code
其它伪操作 AREA: 定义一个代码段或数据段
AREA sectionname {, attr1} {, attr2} ENTRY: 程序入口点 END: 源程序结束
其它伪操作 (cont’d)
GET/INCLUDE INCLUDE filename
EXPORT EXPORT symbol {[WEAK]}
IMPORT IMPORT symbol {[WEAK]}
伪指令 ADR
ADR{cond} register, expr 将基于 PC 的地址值或基于寄存器的地址值读取到寄存器中 汇编替换成一条指令
ADRL ADRL{cond} register, expr ADRL 伪指令比 ADR读取更大的地址范围。 汇编替换为两条指令
LDR LDR{cond} register, =[expr | label_expr] 将一个 32 位的常数或地址值读取到寄存器中
NOP 空操作,如 MOV R0, R0
程序格式 以段为单位组织源文件
代码段和数据段 AREA 伪操作
Example
Review
Computer architecture and ARM architecture
Instruction set Assembly language programming
Program structure Statements