October 6, 2019 Sam Siewert
CEC 320 and 322 Microprocessor Systems
Class and Lab
Lecture 7 - ARM ISA Overview
Survey Says …
• Below 7.0 is a Problem
• Assignments are a Chore - I will try to simplify
• Come to Office hours for help!
Lab #5 Demo
Use Potentiometer to detect analog level threshold crossing
• Analog input > or < Level
– COMP_REF_1_65V from comp.h
– ComparatorValueGet()
• Count Crossings
• Update Display on OLED
• Interrupt Service Routines
– Pre-installed statically in startup_ewarm.c
– Installed in main dynamically - comp.h, ComparatorIntRegister()
• Check proper wiring of +3.3V, GND, and PC
• Power to GND short on bread-board is the main risk, so check!
Example Menu and Commands
Board Configuration and OLED Display
Chapter 2 : Instruction sets
2a Preliminaries
– Video
– 2.1.1 Computer architecture taxonomy.
– 2.1.2 Assembly language.
2b ARM Processor
2c TI C64x & C55x DSP
2d Intel x86 / AMD64
9-Oct-19 /erau/cec320/s19/btd
2B ARM Processor
• ARM versions.
• ARM ISA (Programmer’s Model)
• ARM assembly language.
• ARM machine language
• ARM memory organization.
• ARM flow of control.
• ARM example Hardware
ARM Technology Overview
• ARM: “The Architecture For The Digital World”
• ARM is a physical hardware design and intellectual property company
• ARM licenses its cores out and other companies make processors based on its cores
• ARM also provides toolchain and debugging tools for its cores
ARM History
• Acorn Computer Group developed one of the world’s first commercial RISC processors in 1985
• Roger Wilson and Steve Furber were the principal developers
• ARM (Advanced RISC Machines) was a spin out from Acorn in 1990 with the goal of defining a new microprocessor standard
Classic ARM Variations
• ARM7xxx
– 3 stage pipeline
– Integer processor
– MMU support for WinCE, Linux and Symbian
– Used in entry level mobiles, mp3 players, pagers
• ARM9xxx
– 5 stage pipeline
– Separate data and instruction cache
– Higher end mobile and communication devices
– Telematic and infotainment systems
– ARM and Thumb instruction set
• ARM11xxx
– 7 stage pipeline
– Trustzone security related extensions
– Reduced power consumption
– Speed improvements
– More DSP and SIMD extensions
– Used in PDA, smartphones, industrial controllers, mobile gaming
ARM Processor Family
• Differences between cores
– Processor modes
– Pipeline
– Architecture
– Memory protection unit
– Memory management unit
– Cache
– Hardware accelerated Java
– … and others
ARM Processor Family
• Family
– IP processor specifications available from ARM
– Allows for backwards compatibility and code re-use
– Another common family is x86
x86 and ARM SoC
Key Distinctions between an MCU and an SoC
– Both are Single Chip Solutions
– SoC includes more processing, memory, and I/O
Multi-Core CPU
Memory Controller (Local Bus)
– On-chip Memory (E.g. SRAM)
– Off-chip Memory Expansion (E.g. DRAM)
– On-chip and Off-chip Persistent Memory (NAND, NOR Flash)
I/O Bus
– Expansion I/O Bus (PCIe)
– On-chip I/O Bus
[Figure: x86 PC System Architecture for Memory and I/O Bus Interfaces to Peripherals]
[Figure: Intel Altera Cyclone V HPS, Cyclone SoC, Dual Core ARM Cortex-A9]
[Figure: NVIDIA Tegra K1, Quad Cortex-A15, Tegra K1 SoC Detailed Block Diagram]
Comparative Volumes
• Approx 12 Billion shipped in 2013
• Approx 50 Billion shipped prior to 2014
• Cortex-A
– Most smart phones & tablets
• Cortex-R
– Real-time & safety critical
– Expensive
• Cortex-M
– Embedded & inexpensive
http://www.anandtech.com/show/7909/arm-partners-ship-50-billion-chips-since-1991-where-did-they-go
Cortex-M Instructions
• Each model adds instructions, but never removes
• M4F adds floating point
ARM _ISA_ evolution
• More enhancements move further from “RISC”
• Enhancements improve performance on specific tasks
– Java
– Security
– Signal Processing
– Encryption
• Enhancements made reasonable by increasing transistor counts
Modern ARM Variations
• Three versions – roughly by time & size
• We will FOCUS on the Cortex series devices
Cortex-A IP implementations
• The designations above are IP cores available from ARM for license
• The ARM ISA has a set of IP implementations available depending upon design requirements
• This table ONLY lists Cortex-A
ARM Neoverse
• Future ARM plans for server applications
• Not the focus of CEC 320
ARM Design Philosophy
• ARM core uses RISC architecture
– Reduced instruction set
– Load store architecture
– Large number of general purpose registers
– Parallel executions with pipelines
• But some differences from RISC
– Enhanced instructions for:
  Thumb mode
  DSP instructions
  Conditional execution instruction
  32 bit barrel shifter
ARM Viewing
• How to Choose your ARM Cortex-M Processor
– https://youtu.be/qvrmOXtOpvw
– Good first examination
ARM data types
• Word is 32 bits long.
– Cortex-M4F specific
• Word can be divided into four 8-bit bytes.
• ARM address space is 32 bits long.
• Addressability is a single byte
– Instructions are fetched on 32-bit (ARM) or 16-bit (Thumb) boundaries
• ARM has 16 registers that can be specified in instruction operands
Registers
• Registers R0 thru R12 are general purpose registers
• R13 is used as the stack pointer (sp)
• R14 is used as the link register (lr)
• R15 is used as the program counter (pc)
• CPSR – Current program status register
• SPSR – Saved program status register
– A copy of CPSR for previous mode
– When an exception occurs, ARM copies the CPSR of the current mode to the SPSR of the exception mode being entered
– All privileged modes but System mode have individual SPSRs
Cortex-M Programmers’ Model
• R15-R0
– Sixteen “general-purpose” registers
• Special functions
– R15 is the Program Counter (PC)
  If R15 is the destination operand, some instructions will exhibit special behavior for mode changes
– R14 is the Link Register (LR)
  For subroutine calls and interrupts/exceptions, the return address is stored in LR. A subroutine must save LR before it makes further calls of its own.
– R13 is used as the Stack Pointer (SP)
Programmers’ Model (cont)
Bit:    31 30 29 28 | 27 ... 8 | 7 6 5 | 4 ... 0
Field:   N  Z  C  V | reserved | I F T | mode
• Current Program Status Register (CPSR)
– Condition code flags (N, Z, C, V)
– Interrupt disable bits (I, F)
– Thumb mode enable (T)
  Never change directly!
– Operating mode select
– Reserved bits
  Do not alter the state of these bits for compatibility with future ARM products
• I, F and mode cannot be changed in user mode!
Program Status Register
• Program status register (PSR)
– CPSR (Current PSR) is used to control and store CPU states
– CPSR is divided in four 8 bit fields
  Flags, Status, Extension, Control
Program Status Register
• Program status register flags
– N = 1: Negative result
– Z = 1: Result is zero
– C = 1: Carry in addition operation
– C = 0: Borrow in subtraction operation
– V = 1: Overflow or underflow
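The flag definitions above can be sketched in C. This is a minimal model, not vendor code: the type `nzcv_t` and the functions `flags_after_add` and `flags_after_sub` are hypothetical names introduced here, and the sketch assumes 32-bit operands as on the cores discussed in this lecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four condition flags. */
typedef struct {
    bool n, z, c, v;
} nzcv_t;

/* Flags as a flag-setting add (ADDS) would leave them for a + b. */
nzcv_t flags_after_add(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;                 /* wraps modulo 2^32 */
    nzcv_t f;
    f.n = (r >> 31) != 0;               /* bit 31 of the result */
    f.z = (r == 0);
    f.c = (r < a);                      /* unsigned carry out of bit 31 */
    /* signed overflow: inputs share a sign, result has the other sign */
    f.v = ((~(a ^ b) & (a ^ r)) >> 31) != 0;
    return f;
}

/* Flags as a flag-setting subtract (SUBS or CMP) would leave them for a - b. */
nzcv_t flags_after_sub(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    nzcv_t f;
    f.n = (r >> 31) != 0;
    f.z = (r == 0);
    f.c = (a >= b);                     /* C = 1 means no borrow, C = 0 means borrow */
    /* signed overflow: inputs differ in sign and the result flips a's sign */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;
    return f;
}
```

For example, `flags_after_add(0x7FFFFFFF, 1)` sets N and V (signed overflow into the sign bit), while `flags_after_sub(3, 5)` sets N and clears C, which is the borrow case listed above.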
Program Status Register
• Program status register controls
– I = 1: IRQ interrupts disabled
– F = 1: FIQ interrupts disabled
– T = 0: ARM Mode
– T = 1: Thumb Mode
Program Status Register
• Program status register control modes
– 0b10000 – User mode
– 0b10001 – FIQ mode
– 0b10010 – IRQ mode
– 0b10011 – Supervisor mode
– 0b10111 – Abort mode
– 0b11011 – Undefined mode
– 0b11111 – System mode
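A short C sketch of how the control bits and mode field listed above could be picked out of a CPSR value. The macro and function names (`CPSR_I`, `cpsr_mode_name`, and so on) are hypothetical, chosen for this illustration; the bit positions and mode encodings are the ones given on these slides for the classic ARM programmers' model.

```c
#include <stdint.h>

/* Bit positions taken from the CPSR layout on the slides. */
#define CPSR_N         (1u << 31)
#define CPSR_Z         (1u << 30)
#define CPSR_C         (1u << 29)
#define CPSR_V         (1u << 28)
#define CPSR_I         (1u << 7)   /* 1 = IRQ disabled */
#define CPSR_F         (1u << 6)   /* 1 = FIQ disabled */
#define CPSR_T         (1u << 5)   /* 1 = Thumb state  */
#define CPSR_MODE_MASK 0x1Fu       /* bits 4..0        */

/* Map the 5-bit mode field to the mode names on the slide. */
const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr & CPSR_MODE_MASK) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "Supervisor";
    case 0x17: return "Abort";
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return "Reserved";
    }
}

int cpsr_irq_disabled(uint32_t cpsr) { return (cpsr & CPSR_I) != 0; }
int cpsr_fiq_disabled(uint32_t cpsr) { return (cpsr & CPSR_F) != 0; }
int cpsr_thumb(uint32_t cpsr)        { return (cpsr & CPSR_T) != 0; }
```

As a usage example, the value `0x600000D3` decodes as Supervisor mode with IRQ and FIQ disabled, ARM state, and the Z and C flags set.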
Programmers’ Model (cont)
• Saved Program Status Register (SPSR)
– SPSR is only present when the CPU is operating in one of the exception modes
  Each exception mode has its own SPSR, since exception handlers may cause other exceptions.
– SPSR is a copy of the CPSR immediately before the exception mode was entered.
  When returning from the exception, the value in SPSR is used to restore the CPSR to the proper state for the process that was interrupted.
Operating Modes
• User
– Normal program execution mode
• System
– Privileged mode that shares the User mode registers, for running operating system tasks
• Supervisor
– Protected mode for operating system
• Abort
– Used to implement process and/or memory protection
– Two classes of aborts – data abort, prefetch abort
• Undefined
– Supports software emulation of unsupported instructions and unimplemented hardware coprocessors
• FIQ
– Fast interrupt handling
• IRQ
– General purpose interrupt handling
Banked Registers
• Cortex-R & Cortex-A specific
• Of the total 37 registers, at most 18 are visible in any given mode (R0-R15, CPSR, and that mode’s SPSR)
ARM Banked Register Viewing
• ARM Architecture Fundamentals
– https://youtu.be/7LqPJGnBPMM
– Overall – tiring, but some good content
– 27:00 – 28:20 <- good discussion of Cortex-M registers
– 30:15 - 30:45 <- good discussion of Cortex-M CPSR
– Only these intervals valid for quizzes/exams
– At least one error identified @ 32:40ish
ARM Instruction Set
• ARM and Thumb-2 Quick Reference
– http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/index.html
– Lots of good information on infocenter
• Writing ARM Assembly Language
– http://www.keil.com/support/man/docs/armasm/armasm_babcfejg.htm
– http://infocenter.arm.com/help/topic/com.arm.doc.dui0473f/Babcfejg.html
– More information on assembler application
UAL – Unified Assembler Language
• Unified Assembler Language (UAL)
– Common syntax for ARM and Thumb instructions
– Supersedes earlier versions of both the ARM and Thumb assembler languages.
– Code written using UAL can be assembled for ARM or Thumb for any ARM processor.
– By default, the assembler expects source code to be written in UAL.
• For many assembler instructions this means there are multiple machine language representations
– One in 32-bits (ARM)
– Another in 16-bits (Thumb)
ARM assembly language
• Not all of the instructions
• Most common
– Instructions have both ARM and Thumb versions
• All of these take three operands: the first two must be registers, and the last can be a register, a shifted register, or an immediate
ARM assembly language
• Comparison instructions create status register flags (a.k.a. condition codes)
– Instructions can make decisions based upon these values
• Note pseudo-operations which are not directly translated but have an alternate expression
• Many more exist – independent study
ARM data instructions
• Fairly standard assembly language:
        LDR r0,[r8] ; a comment
label   ADD r4,r0,r1
• Basic format:
        ADD r0,r1,r2
– Computes r1+r2, stores in r0.
• Immediate operand:
        ADD r0,r1,#2
– Computes r1+2, stores in r0.
ARM data instructions
• ADD, ADC : add (w. carry)
• SUB, SBC : subtract (w. carry)
• RSB, RSC : reverse subtract (w. carry)
• MUL, MLA : multiply (and accumulate)
• AND, ORR, EOR
• BIC : bit clear
• LSL, LSR : logical shift left/right
• ASL, ASR : arithmetic shift left/right
• ROR : rotate right
• RRX : rotate right extended with C
Data operation varieties
• Logical shift: fills with zeroes.
• Arithmetic shift: a right shift fills with copies of the sign bit (ones for a negative value, zeroes otherwise).
• RRX performs 33-bit rotate, including C bit from CPSR above sign bit.
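The shift and rotate varieties above can be modelled in portable C. This is an illustrative sketch only: the function names `lsr`, `asr`, `ror`, and `rrx` are hypothetical helpers named after the mnemonics, and shift amounts are assumed to be in the range 0 to 31.

```c
#include <stdint.h>

/* Logical shift right: vacated high bits are filled with zeroes. */
uint32_t lsr(uint32_t x, unsigned n)
{
    return x >> n;
}

/* Arithmetic shift right: vacated high bits copy the original sign bit.
   Written with unsigned arithmetic so the result does not depend on how
   the compiler shifts negative signed values. */
uint32_t asr(uint32_t x, unsigned n)
{
    uint32_t fill = 0u;
    if ((x & 0x80000000u) != 0u && n != 0u) {
        fill = ~(0xFFFFFFFFu >> n);     /* top n bits set */
    }
    return (x >> n) | fill;
}

/* Rotate right by n bits within the 32-bit word. */
uint32_t ror(uint32_t x, unsigned n)
{
    n &= 31u;
    return (n != 0u) ? ((x >> n) | (x << (32u - n))) : x;
}

/* RRX: 33-bit rotate right by one position through the carry flag.
   *carry supplies the new bit 31 and receives the old bit 0. */
uint32_t rrx(uint32_t x, unsigned *carry)
{
    uint32_t result = (x >> 1) | ((uint32_t)(*carry & 1u) << 31);
    *carry = x & 1u;
    return result;
}
```

Shifting `0x80000000` right by 4 gives `0x08000000` with `lsr` but `0xF8000000` with `asr`, which is the zero-fill versus sign-fill distinction.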
ARM comparison instructions
• CMP : compare (flags from Rn - Op2)
• CMN : negated compare (flags from Rn + Op2)
• TST : bit-wise test (flags from Rn AND Op2)
• TEQ : bit-wise equivalence test (flags from Rn EOR Op2)
• These instructions set only the NZCV bits of CPSR.
ARM move instructions
• MOV, MVN : move (negated)
        MOV r0, r1 ; sets r0 to r1
ARM load/store instructions
• LDR, LDRH, LDRB : load (half-word, byte)
• STR, STRH, STRB : store (half-word, byte)
• Addressing modes:
– register indirect : LDR r0,[r1]
– with second register : LDR r0,[r1,-r2]
– with constant : LDR r0,[r1,#4]
Additional addressing modes
• Base-plus-offset addressing:
        LDR r0,[r1,#16]
– Loads from location r1+16
• Auto-indexing increments base register:
        LDR r0,[r1,#16]!
– Adds 16 to r1, then loads r0 from the new r1.
• Post-indexing fetches, then does offset:
        LDR r0,[r1],#16
– Loads r0 from r1, then adds 16 to r1.
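The three addressing forms above differ only in when the offset is applied and whether the base register is updated. A minimal C model, assuming a toy word array in place of real memory (byte address `a` maps to `mem[a / 4]`); the function names are hypothetical and stand in for the three LDR forms.

```c
#include <stdint.h>

/* LDR r0,[r1,#off] : load from base+off, base register unchanged. */
uint32_t ldr_offset(const uint32_t *mem, const uint32_t *base, int32_t off)
{
    return mem[(*base + (uint32_t)off) / 4u];
}

/* LDR r0,[r1,#off]! : add off to the base first, then load from the new base. */
uint32_t ldr_preindex(const uint32_t *mem, uint32_t *base, int32_t off)
{
    *base += (uint32_t)off;
    return mem[*base / 4u];
}

/* LDR r0,[r1],#off : load from the old base, then add off to the base. */
uint32_t ldr_postindex(const uint32_t *mem, uint32_t *base, int32_t off)
{
    uint32_t value = mem[*base / 4u];
    *base += (uint32_t)off;
    return value;
}
```

With `r1 = 0` and an offset of 16, the first two forms read the word at byte address 16 and the third reads the word at address 0; only the last two leave `r1` equal to 16.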
From Essentials of Computer Architecture by Douglas E. Comer. ISBN 0131491792. © 2005 Pearson Education, Inc. All rights reserved.
Addressing Modes
• Illustrates why many (x86 has 19) addressing modes are not necessary
• ARM has very flexible addressing modes, designed for limited hardware costs
ARM ADR pseudo-op
• Cannot refer to an address directly in an instruction.
• Generate value by performing arithmetic on PC.
• ADR pseudo-op generates instruction required to calculate address:
        ADR r1,FOO
Example: C assignments
• C:
        x = (a + b) - c;
• Assembler:
        ADR r4,a ; get address for a
        LDR r0,[r4] ; get value of a
        ADR r4,b ; get address for b, reusing r4
        LDR r1,[r4] ; get value of b
        ADD r3,r0,r1 ; compute a+b
        ADR r4,c ; get address for c
        LDR r2,[r4] ; get value of c
        SUB r3,r3,r2 ; complete computation of x
        ADR r4,x ; get address for x
        STR r3,[r4] ; store value of x
Example: if statement
• C:
        if (a > b)
        {
            x = 5;
            y = c + d;
        }
        else
            x = c - d;
If statement, cont’d.
• Assembler:
        ; compute and test condition
        ADR r4,a ; get address for a
        LDR r0,[r4] ; get value of a
        ADR r4,b ; get address for b
        LDR r1,[r4] ; get value for b
        CMP r0,r1 ; calculate NZCV for (a-b)
        BLE fblock ; if (a-b)<=0, branch to false block
        ; true block
        MOV r0,#5 ; generate value for x
        ADR r4,x ; get address for x
        STR r0,[r4] ; store x
        ADR r4,c ; get address for c
        LDR r0,[r4] ; get value of c
        ADR r4,d ; get address for d
        LDR r1,[r4] ; get value of d
        ADD r0,r0,r1 ; compute y
        ADR r4,y ; get address for y
        STR r0,[r4] ; store y
        B after ; branch around false block
        ; false block
fblock  ADR r4,c ; get address for c
        LDR r0,[r4] ; get value of c
        ADR r4,d ; get address for d
        LDR r1,[r4] ; get value for d
        SUB r0,r0,r1 ; compute c-d
        ADR r4,x ; get address for x
        STR r0,[r4] ; store value of x
after   ...
ARM Instruction Encoding (fields listed from bit 31 down to bit 0)
Multiply (accumulate):                     cond 0 0 0 0 0 0 A S Rd Rn Rs 1 0 0 1 Rm
Multiply (accumulate) long:                cond 0 0 0 0 1 U A S Rd_MSW Rd_LSW Rn 1 0 0 1 Rm
Branch and exchange:                       cond 0 0 0 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 Rn
Single data swap:                          cond 0 0 0 1 0 B 0 0 Rn Rd 0 0 0 0 1 0 0 1 Rm
Halfword data transfer, register offset:   cond 0 0 0 P U 0 W L Rn Rd 0 0 0 0 1 0 1 1 Rm
Halfword data transfer, immediate offset:  cond 0 0 0 P U 1 W L Rn Rd offset 1 0 1 1 offset
Signed data transfer (byte/halfword):      cond 0 0 0 P U B W L Rn Rd addr_mode 1 1 H 1 addr_mode
Data processing and PSR transfer:          cond 0 0 I opcode S Rn Rd operand2
Load/store register/unsigned byte:         cond 0 1 I P U B W L Rn Rd addr_mode
Undefined:                                 cond 0 1 1 1
Block data transfer:                       cond 1 0 0 P U 0 W L Rn register list
Branch:                                    cond 1 0 1 L offset
Coprocessor data transfer:                 cond 1 1 0 P U N W L Rn CRd CP# offset
Coprocessor data operation:                cond 1 1 1 0 CP opcode CRn CRd CP# CP 0 CRm
Coprocessor register transfer:             cond 1 1 1 0 CP opc L CRn Rd CP# CP 1 CRm
Software interrupt:                        cond 1 1 1 1 ignored by processor
[Figure: ARM Instruction Encoding]
Data processing instruction encodings
• Many ALU operations utilize this format
• Has multiple options for 2nd operand
– Register: shifted or not
– Immediate: rotated or not

Bits:   31-28 | 27-26 | 25 | 24-21  | 20 | 19-16 | 15-12 | 11-0
Field:  cond  | 0 0   | #  | opcode | S  | Rn    | Rd    | operand 2

– opcode: arithmetic/logic function
– S: set condition codes
– Rn: first operand register
– Rd: destination register
– operand 2, immediate form (bit 25 = 1):
  bits 11-8 #rot (immediate alignment), bits 7-0 8-bit immediate
– operand 2, register form (bit 25 = 0):
  bits 3-0 Rm (second operand register), bits 6-5 Sh (shift type), and either
  bits 11-7 #shift (immediate shift length) with bit 4 = 0, or
  bits 11-8 Rs (register shift length) with bit 7 = 0 and bit 4 = 1
ARM data processing opcodes
Opcode[24:21]  Mnemonic  Meaning                        Effect
0000           AND       Logical bit-wise AND           Rd := Rn AND Op2
0001           EOR       Logical bit-wise exclusive OR  Rd := Rn EOR Op2
0010           SUB       Subtract                       Rd := Rn - Op2
0011           RSB       Reverse subtract               Rd := Op2 - Rn
0100           ADD       Add                            Rd := Rn + Op2
0101           ADC       Add with carry                 Rd := Rn + Op2 + C
0110           SBC       Subtract with carry            Rd := Rn - Op2 + C - 1
0111           RSC       Reverse subtract with carry    Rd := Op2 - Rn + C - 1
1000           TST       Test                           Scc on Rn AND Op2
1001           TEQ       Test equivalence               Scc on Rn EOR Op2
1010           CMP       Compare                        Scc on Rn - Op2
1011           CMN       Compare negated                Scc on Rn + Op2
1100           ORR       Logical bit-wise OR            Rd := Rn OR Op2
1101           MOV       Move                           Rd := Op2
1110           BIC       Bit clear                      Rd := Rn AND NOT Op2
1111           MVN       Move negated                   Rd := NOT Op2
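To connect the data processing format with the opcode table, here is a small C sketch that splits a 32-bit ARM data processing instruction word into its fields and expands the immediate form of operand 2. The struct `dp_fields_t` and the functions `dp_decode` and `dp_immediate` are hypothetical names for this illustration. The sketch assumes the classic 32-bit ARM encoding, in which the 8-bit immediate is rotated right by twice the 4-bit rotate field.

```c
#include <stdint.h>

/* Hypothetical holder for the fields of a data processing instruction. */
typedef struct {
    unsigned cond;      /* bits 31-28 */
    unsigned imm_flag;  /* bit 25: 1 = operand 2 is an immediate */
    unsigned opcode;    /* bits 24-21 */
    unsigned s;         /* bit 20: set condition codes */
    unsigned rn;        /* bits 19-16 */
    unsigned rd;        /* bits 15-12 */
    uint32_t operand2;  /* bits 11-0, still encoded */
} dp_fields_t;

dp_fields_t dp_decode(uint32_t insn)
{
    dp_fields_t f;
    f.cond     = insn >> 28;
    f.imm_flag = (insn >> 25) & 0x1u;
    f.opcode   = (insn >> 21) & 0xFu;
    f.s        = (insn >> 20) & 0x1u;
    f.rn       = (insn >> 16) & 0xFu;
    f.rd       = (insn >> 12) & 0xFu;
    f.operand2 = insn & 0xFFFu;
    return f;
}

/* Expand an immediate operand 2: 8-bit value rotated right by 2 * #rot. */
uint32_t dp_immediate(uint32_t operand2)
{
    uint32_t imm8 = operand2 & 0xFFu;
    unsigned rot  = 2u * ((operand2 >> 8) & 0xFu);
    return (rot != 0u) ? ((imm8 >> rot) | (imm8 << (32u - rot))) : imm8;
}
```

As a usage example, `ADD r0,r1,#2` executed unconditionally encodes as `0xE2810002`: cond = 1110 (AL), immediate flag set, opcode 0100 (ADD), Rn = 1, Rd = 0, and an operand 2 that expands to 2.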
Conditional Execution
• Most instruction sets only allow branches to be executed conditionally.
• However by reusing the condition evaluation hardware, ARM effectively increases number of instructions.
– All instructions contain a condition field which determines whether the CPU will execute them.
– Non-executed instructions soak up 1 cycle.
  Still have to complete cycle so as to allow fetching and decoding of following instructions.
• This removes the need for many branches, which stall the pipeline (3 cycles to refill).
– Allows very dense in-line code, without branches.
– The time penalty of not executing several conditional instructions is frequently less than overhead of the branch or subroutine call that would otherwise be needed.
ARM Condition Codes
Opcode[31:28]  Mnemonic extension  Meaning  Condition flag state
0000 EQ Equal Z==1
0001 NE Not equal Z==0
0010 CS/HS Carry set / unsigned higher or same C==1
0011 CC/LO Carry clear / unsigned lower C==0
0100 MI Minus / negative N==1
0101 PL Plus / positive or zero N==0
0110 VS Overflow V==1
0111 VC No overflow V==0
1000 HI Unsigned higher (C==1) AND (Z==0)
1001 LS Unsigned lower or same (C==0) OR (Z==1)
1010 GE Signed greater than or equal N == V
1011 LT Signed less than N != V
1100 GT Signed greater than (Z==0) AND (N==V)
1101 LE Signed less than or equal (Z==1) OR (N!=V)
1110 AL Always (unconditional) Not applicable
1111 (NV) Never Obsolete, ARM7TDMI unpredictable
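The condition table translates directly into a predicate over the four flags. A minimal C sketch follows; `cond_passes` is a hypothetical name, and the obsolete NV encoding is simply treated as "never" here, which is an assumption of this sketch rather than architectural behavior.

```c
#include <stdbool.h>

/* Returns true when an instruction with the given 4-bit condition field
   would execute under the flag values n, z, c, v. */
bool cond_passes(unsigned cond, bool n, bool z, bool c, bool v)
{
    switch (cond & 0xFu) {
    case 0x0: return z;                 /* EQ */
    case 0x1: return !z;                /* NE */
    case 0x2: return c;                 /* CS/HS */
    case 0x3: return !c;                /* CC/LO */
    case 0x4: return n;                 /* MI */
    case 0x5: return !n;                /* PL */
    case 0x6: return v;                 /* VS */
    case 0x7: return !v;                /* VC */
    case 0x8: return c && !z;           /* HI */
    case 0x9: return !c || z;           /* LS */
    case 0xA: return n == v;            /* GE */
    case 0xB: return n != v;            /* LT */
    case 0xC: return !z && (n == v);    /* GT */
    case 0xD: return z || (n != v);     /* LE */
    case 0xE: return true;              /* AL */
    default:  return false;             /* NV: treated as never in this sketch */
    }
}
```

After comparing 3 with 5 the flags are N = 1, Z = 0, C = 0, V = 0, so LT and LO pass while GT and HI fail.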
[Figure: Thumb Instruction Set]
Instruction Set Advantages
• ARM
– All instructions are 32 bits long.
– Most instructions are executed in one single cycle.
– Every instruction can be conditionally executed.
– A load/store architecture
  Data processing instructions act only on registers
    Three operand format
    Combined ALU and shifter for high speed bit manipulation
  Specific memory access instructions with powerful auto-indexing addressing modes
    32 bit, 16 bit and 8 bit data types
    Flexible multiple register load and store instructions
• Thumb
– All instructions are exactly 16 bits long to improve code density over other 32-bit architectures
– The Thumb architecture still uses a 32-bit core, with:
  32-bit address space
  32-bit registers
  32-bit shifter and ALU
  32-bit memory transfer
– Gives....
  Long branch range
  Powerful arithmetic operations
  Large address space
ARM vs. Thumb size
• Generally, routines in THUMB code are between 65 and 70% the size of the equivalent ARM code.
[Figure: Thumb code size as % of ARM code size, axis from 60% to 75%]
Code performance vs Memory width

This figure shows performance in Dhrystone 2.1 MIPS of an ARM7TDMI with 8, 16 and 32-bit wide memory systems. From 32-bit wide memory, ARM code is executed at one instruction per cycle. In narrower memory systems ARM code slows down: from 16-bit memory, 2 cycles are required per instruction, while from 8-bit memory 4 cycles are required. The Thumb version however can still execute at one instruction per cycle from 16-bit memory, or 2 cycles from 8-bit memory. It therefore has better performance with narrow memory.
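The cycle counts quoted above follow from a simple ratio of instruction width to memory bus width. A tiny C sketch of that arithmetic, assuming one bus transfer per cycle and no other stalls; `fetch_cycles` is a hypothetical helper name and the model deliberately ignores everything except instruction fetch.

```c
/* Bus transfers (and, in this simplified model, cycles) needed to fetch
   one instruction of insn_bits over a memory bus of bus_bits. */
unsigned fetch_cycles(unsigned insn_bits, unsigned bus_bits)
{
    return (insn_bits + bus_bits - 1u) / bus_bits;   /* ceiling division */
}
```

A 32-bit ARM instruction needs `fetch_cycles(32, 16) = 2` and `fetch_cycles(32, 8) = 4` transfers, while a 16-bit Thumb instruction needs `fetch_cycles(16, 16) = 1` and `fetch_cycles(16, 8) = 2`, matching the figures in the paragraph.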
Address Space
• The standard ARM C program address space model
– Others COULD be used – not advised
• The address space is 32 bits
hw_memmap.h
• Defines the base addresses of all peripherals in the TM4C system
• Each of these peripheral circuits is communicated with through read/write operations to an I/O interface mapped into memory
• Allows use of conventional load/store instructions to configure and/or communicate with peripherals
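Memory-mapped I/O means a peripheral register is accessed like any other memory word, through a volatile pointer. The sketch below illustrates the idea in C. The macro `REG32` and the helper functions are hypothetical names for this example, not the actual definitions from hw_memmap.h, and on a desktop host the address passed in should be that of an ordinary variable standing in for a register.

```c
#include <stdint.h>

/* Treat an address as a 32-bit register. volatile stops the compiler
   from caching or removing the accesses. */
#define REG32(addr) (*(volatile uint32_t *)(addr))

/* Read-modify-write helpers, each compiling to plain loads and stores. */
void reg_set_bits(uintptr_t addr, uint32_t mask)
{
    REG32(addr) |= mask;
}

void reg_clear_bits(uintptr_t addr, uint32_t mask)
{
    REG32(addr) &= ~mask;
}

uint32_t reg_read(uintptr_t addr)
{
    return REG32(addr);
}
```

On the target board the address would be a peripheral base from hw_memmap.h plus a register offset; for experimentation on a PC, pass `(uintptr_t)&some_variable` instead of a hardware address.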
ARM Reserved Addresses
0x00000000  Reset
0x00000004  Undefined instruction exception
0x00000008  Software interrupt
0x0000000C  Prefetch abort exception
0x00000010  Data abort exception
0x00000014  Reserved
0x00000018  Interrupt request (IRQ)
0x0000001C  Fast interrupt request (FIQ)
Endianness
• Relationship between byte within word ordering defines endianness:
– little-endian: byte 0 (lowest address) holds the least significant 8 bits of the word
– big-endian: byte 0 (lowest address) holds the most significant 8 bits of the word
• Only significant for multi-byte accesses
• Little : Intel
  Big : Motorola
  Bi-Endian : ARM
• Value being written 0x12345678
• MSB : 0x12
  LSB : 0x78
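The byte ordering can be made concrete with a few lines of C that lay out the word 0x12345678 in both orders. The function names `store_le32`, `store_be32`, and `host_is_little_endian` are hypothetical helpers written for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Write v into out[] with the least significant byte at the lowest index. */
void store_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v & 0xFFu);
    out[1] = (uint8_t)((v >> 8) & 0xFFu);
    out[2] = (uint8_t)((v >> 16) & 0xFFu);
    out[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Write v into out[] with the most significant byte at the lowest index. */
void store_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)((v >> 24) & 0xFFu);
    out[1] = (uint8_t)((v >> 16) & 0xFFu);
    out[2] = (uint8_t)((v >> 8) & 0xFFu);
    out[3] = (uint8_t)(v & 0xFFu);
}

/* Returns 1 if the machine running this code stores words little-endian. */
int host_is_little_endian(void)
{
    uint32_t word = 0x12345678u;
    uint8_t bytes[4];
    memcpy(bytes, &word, sizeof word);   /* inspect the in-memory layout */
    return bytes[0] == 0x78u;
}
```

For 0x12345678, `store_le32` produces the byte sequence 78 56 34 12 and `store_be32` produces 12 34 56 78, from the lowest address upward.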
ARM Procedure Call Standard
• Support for high-level languages
• In some areas it is important to adopt software-defined ‘standard’ solutions
– the ARM Procedure Call Standard (APCS) is an example
– it provides a regular way for procedures to operate
ARM Procedure Call Standard
• The APCS defines:
– particular uses for the ‘general-purpose’ registers
– the form of stack to be used
– a stack-based data structure for backtracing
– an argument and result passing mechanism
– support for shared (re-entrant) libraries
[Figure: APCS Register Use Convention]
APCS Argument and Result Passing
• The arguments are arranged into a list of words
– the first 4 arguments are passed in a1 - a4
– the remaining arguments are passed via the stack
• A simple result is returned via a1
– more complex results are passed via memory, using a1 as the pointer
Runtime Stack in procedure calls
• Each unit is a stack frame
• Frame specifics differ between architectures & even compilers
Alternate Procedure Stack
• Not optimal – Would use consistent colors for:
– Saved State
– Arguments
– Local Variables
– Dynamic Stack Usage
• Call Depth of (4)
• Each of (N) is a unique procedure
A Typical Frame Organization
• Shows:
– Fp – frame pointer
  Constant for procedure lifetime
– Sp – stack pointer
  May change as local variables are added or removed
– Activation record
  The contiguous block of memory on the stack corresponding to a procedure
ARM subroutine linkage
• Branch and link instruction:
        BL foo
– Copies the return address (the address of the instruction after the BL) to r14.
• To return from subroutine:
        MOV r15, r14
Nested subroutine calls
• Nesting/recursion requires coding convention (a full descending stack is assumed here):
f1      LDR r0,[r13]        ; load arg into r0 from stack
        ; call f2()
        STR r14,[r13,#-4]!  ; push f1’s return adrs
        STR r0,[r13,#-4]!   ; push arg to f2 on stack
        BL f2               ; branch and link to f2
        ; return from f1()
        ADD r13,r13,#4      ; pop f2’s arg off stack
        LDR r15,[r13],#4    ; pop return adrs into pc and return
Cortex-A57
• ARM’s 64-bit (ARM v8) IP core
• 15-24 stage pipeline
• 3 wide execution
• http://www.anandtech.com/show/8718/the-samsung-galaxy-note-4-exynos-review/5
big.LITTLE
• Heterogeneous Multi-Core
• Combine high-performance and high-efficiency cores on a single die and choose core based on task characteristics
• Up to 70% energy reduction on common workloads
• Transparent to software / Apps; Managed by O/S
https://www.mobilegeeks.de/samsung-plant-neues-chromebook-mit-arm-big-little-prozessor-octacore/
Power / Performance Relationship
• Relationship is a function of microarchitecture & fabrication technology
• At high performance levels, sub linear performance improvement when power is increased
http://cdn2.ubergizmo.com/wp-content/uploads/2013/01/ARM-Big-LITTLE-03.jpg
big.LITTLE Viewing
• ARM big.LITTLE Technology Explained
– https://youtu.be/KClygZtp8mA
– Starts & Ends as marketing
– Technical in middle of the video
– Significant OS discussion beyond microprocessors
• ARM DynamIQ Redefines Multi-Core Computing
– https://youtu.be/qPGTP_ZxDyY
– Good Heterogeneous animation
– Technical – but high level
Chap2B. ARM Processor (Cortex-M)
• Summary
– Assembly language is UAL – atypical in that it has multiple encodings (16/32-bit)
– Very flexible instructions
  Using RISC lessons
    Load/store
    General purpose registers
  With complex functionality
    Rotate, shift, mask on one operand in many instructions
    Conditional execution
    Banked Registers
– Handles branches and conditional execution efficiently
– Good for general purpose program encoding
– AMAZING number of hardware microarchitectures available
  Common family allows for code compatibility & ease of development efforts
Summary (Wolf)
• Load/store architecture
• Most instructions are RISCy, operate in single cycle.
– Some multi-register operations take longer.
• All instructions can be executed conditionally.
Example: C assignment
• C:
        y = a*(b+c);
• Assembler:
        ADR r4,b ; get address for b
        LDR r0,[r4] ; get value of b
        ADR r4,c ; get address for c
        LDR r1,[r4] ; get value of c
        ADD r2,r0,r1 ; compute partial result
        ADR r4,a ; get address for a
        LDR r0,[r4] ; get value of a
        MUL r2,r2,r0 ; compute final value for y
        ADR r4,y ; get address for y
        STR r2,[r4] ; store y
Example: C assignment
• C:
        z = (a << 2) | (b & 15);
• Assembler:
        ADR r4,a ; get address for a
        LDR r0,[r4] ; get value of a
        MOV r0,r0,LSL #2 ; perform shift
        ADR r4,b ; get address for b
        LDR r1,[r4] ; get value of b
        AND r1,r1,#15 ; perform AND
        ORR r1,r0,r1 ; perform OR
        ADR r4,z ; get address for z
        STR r1,[r4] ; store value for z
ARM flow of control
• All operations can be performed conditionally, testing CPSR:
– EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE
• Branch operation:
        B #100
– Can be performed conditionally.
Example: FIR filter
• C:
        for (i=0, f=0; i<N; i++)
            f = f + c[i]*x[i];
• Assembler
        ; loop initiation code
        MOV r0,#0 ; use r0 for i
        MOV r8,#0 ; use separate index for arrays
        ADR r2,N ; get address for N
        LDR r1,[r2] ; get value of N
        MOV r2,#0 ; use r2 for f
        ADR r3,c ; load r3 with base of c
        ADR r5,x ; load r5 with base of x
        ; loop body
loop    LDR r4,[r3,r8] ; get c[i]
        LDR r6,[r5,r8] ; get x[i]
        MUL r4,r4,r6 ; compute c[i]*x[i]
        ADD r2,r2,r4 ; add into running sum
        ADD r8,r8,#4 ; add one word offset to array index
        ADD r0,r0,#1 ; add 1 to i
        CMP r0,r1 ; exit?
        BLT loop ; if i < N, continue
Example: Conditional instruction implementation
        ; assumes CMP r0,r1 has already set the flags for (a-b), as in the if statement example
        ; true block, executed only when a > b
        MOVGT r0,#5 ; generate value for x
        ADRGT r4,x ; get address for x
        STRGT r0,[r4] ; store x
        ADRGT r4,c ; get address for c
        LDRGT r0,[r4] ; get value of c
        ADRGT r4,d ; get address for d
        LDRGT r1,[r4] ; get value of d
        ADDGT r0,r0,r1 ; compute y
        ADRGT r4,y ; get address for y
        STRGT r0,[r4] ; store y
Example: switch statement
• C:
        switch (test) { case 0: … break; case 1: … }
• Assembler:
        ADR r2,test ; get address for test
        LDR r0,[r2] ; load value for test
        ADR r1,switchtab ; load address for switch table
        LDR r15,[r1,r0,LSL #2] ; index switch table and jump to the selected case
switchtab DCD case0
        DCD case1
        ...
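The switch table in the assembler listing is a jump table: an array of code addresses indexed by the value of test. The same idea can be sketched in C with an array of function pointers. All names here (`case_fn`, `case0`, `case1`, `dispatch`) and the return values 100 and 101 are hypothetical, and the bounds check is an addition of this sketch that the assembly listing does not perform.

```c
#include <stddef.h>

/* Each case body is modelled as a function returning a marker value. */
typedef int (*case_fn)(void);

int case0(void) { return 100; }
int case1(void) { return 101; }

/* Plays the role of switchtab: one code address per case. */
static const case_fn switchtab[] = { case0, case1 };

/* Index the table with test and call the selected case.
   Returns -1 when test has no table entry. */
int dispatch(unsigned test)
{
    size_t entries = sizeof switchtab / sizeof switchtab[0];
    if (test >= entries) {
        return -1;
    }
    return switchtab[test]();
}
```

Calling `dispatch(1)` runs `case1`. The table lookup costs the same for every case, which is the appeal of this form over a chain of compares and branches.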