Introduction to the ARM Architecture (FAU)
or: a loose set of random facts blatantly copied from tech sheets and the Architecture Ref. Manual

Glance into the past

•  Initial ARM processor developed by Acorn Computers, 1985
•  ARM means: Acorn RISC Machine
•  Architecture was influenced by UC Berkeley's RISC project
•  RISC means: Reduced Instruction Set Computing
•  RISC vs. CISC: remember the GRA lecture?

•  // TODO: Show OLD advertisement with fancy 80s jingle

Why?

•  Architectural simplicity can be beneficial:
   → small implementations
   → possibly low power consumption
•  Key features: keeping the implementation size small while maintaining reasonable performance and low power consumption.
•  Example: the Jetson TK1 board does not exceed 15 W, even under heavy load
•  The ARM architecture is suitable for embedded applications, and nowadays even for HPC: ARM cluster in Spain.

About the ARM Cortex-A15 MPCore

•  Implements the ARMv7-A architecture
•  32-bit processor core, licensed by ARM
•  Can access 40-bit physical addresses (thus up to 1 TB of RAM)
•  15-stage integer pipeline, 17-25 stage FP pipeline
•  NEON extension (ARM's way of doing SIMD)
•  Out-of-order, speculative-issue, 3-way superscalar execution pipeline
•  32 KB data + 32 KB instruction L1 cache per core
•  Integrated low-latency L2 cache controller, up to 4 MB per cluster
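
The "40 bit, thus 1 TB" claim is plain arithmetic: 2^40 bytes. A minimal sketch to check it; the helper name addressable_bytes is made up for illustration and is not part of any ARM API.

```c
#include <stdint.h>

/* Number of bytes addressable with `bits` address bits.
 * Hypothetical helper for illustration; only valid for bits < 64. */
uint64_t addressable_bytes(unsigned bits)
{
    return (uint64_t)1 << bits;
}
```

addressable_bytes(40) gives 1099511627776 bytes, i.e. 1024 GiB, which is 256 times what a plain 32-bit physical address (4 GiB) can reach.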

About the Cortex-A15 (ARMv7-A)

Cortex-A15 Multiprocessor Functionality
•  L2 cache with Snoop Control Unit for cache coherency

Stuff to know about the Cortex-A15

•  Better memory system performance than former models
•  Enhanced floating-point performance
•  Multicore functionality for scalability
•  Wider pipelines for higher instruction throughput

About the Architecture (ARMv7-A)

•  32-bit ARM architecture
•  Offers a hardware floating-point unit and various RISC features
•  Most widely used architecture in mobile devices
•  Three profiles, described in more detail later:
   •  A = application, R = real time, M = microcontroller
•  Fixed instruction width of 32 bits (ARM instruction set)
•  Most instructions execute in close to a single clock cycle

ARMv7 Variants

•  ARMv7-A: Traditional ARM architecture with multiple modes. Supports the ARM and Thumb instruction sets (Thumb: 16-bit instruction set with a subset of the ARM instruction set's functionality → better code density). Supports a virtual memory system based on an MMU.
•  ARMv7-R: Real-time profile with multiple modes. Supports the ARM and Thumb instruction sets. Supports a protected memory system based on a memory protection unit (MPU).
•  ARMv7-M: Microcontroller profile, designed for low-latency interrupt processing. Implements a variant of the protected memory system.

Core data types

•  Data types in memory:
   •  Byte: 8 bits
   •  Halfword: 16 bits
   •  Word: 32 bits
   •  Doubleword: 64 bits
•  Data types in registers, supported by the instruction set:
   •  32-bit pointers
   •  unsigned or signed 32-bit integers
   •  unsigned 16-bit or 8-bit integers
   •  signed 16-bit or 8-bit integers
   •  two 16-bit integers packed into a register
   •  four 8-bit integers packed into a register
   •  unsigned or signed 64-bit integers held in two registers
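
A rough C sketch of the packed and two-register cases from the list above. It only models the bit layout with fixed-width integer types; all function names are made up for illustration, and which half goes into which register is left open.

```c
#include <stdint.h>

/* Pack two 16-bit integers into one 32-bit, register-sized word. */
uint32_t pack_halfwords(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Extract the two packed halfwords again. */
uint16_t high_halfword(uint32_t w) { return (uint16_t)(w >> 16); }
uint16_t low_halfword(uint32_t w)  { return (uint16_t)(w & 0xFFFFu); }

/* Split a 64-bit integer into the two 32-bit halves that would be
 * held in two registers. */
void split_doubleword(uint64_t v, uint32_t *hi, uint32_t *lo)
{
    *hi = (uint32_t)(v >> 32);
    *lo = (uint32_t)v;
}
```

For example, pack_halfwords(0x1234, 0xABCD) yields 0x1234ABCD, and splitting 0x0000000100000002 gives the halves 1 and 2.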

Core data types

•  Load and store operations transfer bytes, halfwords, or words to and from memory.
•  The instruction set also has instructions that transfer two or more words to and from memory.

About the Architecture (ARMv7-A)

•  ARM implements typical RISC features:
   •  Large and uniform register file (some ARM processors had over 60 64-bit registers)
   •  Load/store architecture → data-processing operations operate only on register contents, not on memory → more uniform non-functional behavior of instructions ( //TODO: ask students why ? )
   •  Simple addressing modes: load/store addresses are computed from register contents and instruction fields only
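
What "load/store architecture" means in practice, as a small sketch: even a simple increment of a value in memory is split into load, operate, store. The instructions in the comments are what a compiler might plausibly emit, not guaranteed output; the function name is made up for illustration.

```c
#include <stdint.h>

/* Increment a counter that lives in memory. */
void increment(uint32_t *counter)
{
    uint32_t tmp = *counter;  /* load:  e.g. LDR r1, [r0]            */
    tmp = tmp + 1;            /* register-only op: e.g. ADD r1, r1, #1 */
    *counter = tmp;           /* store: e.g. STR r1, [r0]            */
}
```

There is no single "add 1 to memory" instruction; the arithmetic always happens on registers.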

About the Architecture (ARMv7-A)

•  Other ARM features:
   •  Instructions that combine a shift with an arithmetic or logical operation
   •  Load and store multiple instructions → maximizing data throughput
      •  Multiple registers can be loaded from a block of consecutive memory
   •  Conditional execution of almost all instructions: used to be ARM's substitute for a branch predictor. Code is executed depending on the condition flags in the Application Program Status Register, which keeps the number of branches small and speeds up execution, while saving the silicon a branch predictor would need.
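
A small sketch of where the combined shift + arithmetic form pays off: computing the address of a word in an array. The instruction in the comment is one possible encoding a compiler may choose; the function name is hypothetical.

```c
#include <stdint.h>

/* Address of element `index` in an array of 4-byte words at `base`. */
uint32_t word_address(uint32_t base, uint32_t index)
{
    /* Shift and add can go into one ARM instruction,
     * e.g. ADD r0, r0, r1, LSL #2 */
    return base + (index << 2);
}
```

word_address(0x1000, 3) is 0x100C; on an architecture without the combined form this would need a separate shift instruction first.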

About the Architecture (ARMv7-A)

•  Conditional execution example: gcd algorithm in C
   •  Normal way with branches
   •  Better way for ARM with the conditional execution feature
•  BUT: modern ARM processors DO actually have branch prediction units
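
A minimal sketch of what such a gcd example typically looks like, not necessarily the code originally shown here. The C version uses the subtraction-based algorithm and assumes both inputs are positive; the comment shows the well-known four-instruction ARM version that replaces the if/else branches with conditionally executed subtractions.

```c
/* Subtraction-based gcd. Assumes a > 0 and b > 0. */
int gcd(int a, int b)
{
    while (a != b) {
        if (a > b)
            a = a - b;
        else
            b = b - a;
    }
    return a;
}

/* With conditional execution (a in r0, b in r1):
 *
 *   gcd:  CMP   r0, r1        ; sets the flags
 *         SUBGT r0, r0, r1    ; runs only if a > b
 *         SUBLT r1, r1, r0    ; runs only if a < b
 *         BNE   gcd           ; loop while a != b
 *
 * The only branch left is the loop itself.
 */
```

gcd(48, 18) returns 6. The branchy version needs extra conditional and unconditional branches inside the loop body for the if/else.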

About the Architecture (ARMv7-A)

•  Core registers
   •  Thirteen general-purpose 32-bit registers, R0-R12
   •  Three 32-bit registers for special use: SP (stack pointer), PC (program counter) and LR (link register)

About the Architecture (ARMv7-A)

•  SP: Stack pointer, holds the address of the top of the stack. When using the ARM instruction set it could be used for other things than holding a stack pointer, but that is likely to break stuff according to the manual.
•  LR: Link register, holds the address a called function should return to when it completes. More efficient than popping the return address from the stack in memory. Nice when calling a leaf routine, for example.
•  PC: Program counter. In ARM state, reading it yields the address of the current instruction plus 8 bytes (plus 4 bytes in Thumb state). → Legacy thing, from when the pipeline was only three stages deep.

About the Architecture (ARMv7-A)

•  Application Program Status Register (APSR)
   •  32-bit register
   •  Reports program status
   •  Contains condition flags such as negative (N), zero (Z) or carry (C)
   •  Contains an overflow flag (V)
   •  Contains greater than or equal flags (GE)
   •  Can be used for conditional execution (as explained earlier)
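
A sketch of how the APSR bits could be picked apart in C. The bit positions follow the ARMv7 layout (N = bit 31, Z = bit 30, C = bit 29, V = bit 28, Q = bit 27, GE = bits 19:16); the struct and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the APSR (hypothetical type for illustration). */
typedef struct {
    bool n, z, c, v, q;  /* negative, zero, carry, overflow, saturation */
    unsigned ge;         /* greater-than-or-equal flags, 4 bits */
} apsr_flags;

apsr_flags apsr_decode(uint32_t apsr)
{
    apsr_flags f;
    f.n  = (apsr >> 31) & 1u;
    f.z  = (apsr >> 30) & 1u;
    f.c  = (apsr >> 29) & 1u;
    f.v  = (apsr >> 28) & 1u;
    f.q  = (apsr >> 27) & 1u;
    f.ge = (apsr >> 16) & 0xFu;
    return f;
}

/* Two of the condition codes, evaluated from the flags. */
bool cond_eq(apsr_flags f) { return f.z; }                  /* EQ: Z set         */
bool cond_gt(apsr_flags f) { return !f.z && f.n == f.v; }   /* GT: Z clear, N==V */
```

This is what a conditional instruction like SUBGT checks before it executes: GT passes only if Z is clear and N equals V.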

About the Architecture (ARMv7-A)

•  Execution state registers: ISETSTATE, ITSTATE, ENDIANSTATE
•  Modify the execution of instructions
   → whether an instruction is interpreted as a Thumb or an ARM instruction: ISETSTATE
   → whether data is interpreted as big-endian or little-endian: ENDIANSTATE
•  No direct access to these registers from application-level instructions
•  But they can be changed as a side effect of such instructions
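
What the ENDIANSTATE choice means for data, as a sketch: the same four bytes in memory give different 32-bit values depending on the byte order. This only models the interpretation in portable C; the function names are made up for illustration.

```c
#include <stdint.h>

/* Interpret 4 bytes as a little-endian word: lowest address = least significant byte. */
uint32_t load_le32(const uint8_t b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Interpret 4 bytes as a big-endian word: lowest address = most significant byte. */
uint32_t load_be32(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24)
         | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)
         | (uint32_t)b[3];
}
```

For the bytes 78 56 34 12 (in address order), the little-endian reading is 0x12345678 and the big-endian reading is 0x78563412.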

About the Architecture (ARMv7-A)

•  Execution state registers: ISETSTATE, ITSTATE, ENDIANSTATE
•  ITSTATE is a register used for the execution of the IT (If-Then) instruction; it applies to the block of up to four instructions following an IT instruction
•  An IT instruction makes up to four following instructions conditional; each one executes only if its condition is true. IT instructions are normally generated by the assembler: most Thumb instructions have no condition field for testing the C, N, V, Z flags, so IT instructions are used instead.
•  It is divided into two subfields:
   •  IT[7:5]: holds the base condition for the If-Then block, i.e. the top 3 bits of the condition code from the firstcond field of the IT instruction
   •  IT[4:0]: encodes the size of the IT block and the value of the LSB of the condition code for each instruction in the block
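
A sketch of how the two subfields work together, modeled on the ITAdvance pseudocode in the Architecture Reference Manual as far as I recall it: the condition of the current instruction is IT[7:4], and after each instruction IT[4:0] is shifted left by one until the low bits run out. The function names are made up; the example value 0x0C assumes ITSTATE is loaded as firstcond:mask, which is what an ITE EQ would give.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inside an IT block as long as IT[3:0] is non-zero. */
bool in_it_block(uint8_t it) { return (it & 0x0Fu) != 0; }

/* Condition code for the current instruction: IT[7:4]. */
unsigned current_cond(uint8_t it) { return it >> 4; }

/* Advance ITSTATE past one instruction of the block. */
uint8_t it_advance(uint8_t it)
{
    if ((it & 0x07u) == 0)      /* IT[2:0] == 0: that was the last one */
        return 0;
    /* keep IT[7:5], shift IT[4:0] left by one */
    return (uint8_t)((it & 0xE0u) | ((it << 1) & 0x1Fu));
}

/* Number of instructions left in the block, by stepping through it. */
unsigned it_block_length(uint8_t it)
{
    unsigned n = 0;
    while (in_it_block(it)) {
        n++;
        it = it_advance(it);
    }
    return n;
}
```

Starting from 0x0C: the first instruction gets condition 0 (EQ), the state advances to 0x18, the second instruction gets condition 1 (NE), and the next advance clears the state. So the shifted-in LSB flips the base condition for the "else" slot, and the position of the lowest set bit encodes the block length.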

Sources

•  http://dept.cs.williams.edu/~tom/courses/237/labs/ArmProcessor.pdf
•  http://www.arm.com/files/pdf/AT-Exploring_the_Design_of_the_Cortex-A15.pdf
•  ARMv7 Architecture Reference
•  Wikipedia
•  YouTube
•  ARM Homepage
•  Telegraph.co.uk