The ARM Architecture · Program Status Registers 31 27 N Z C V Q 28 7 6 I F T mode 24 23 16 15 8 54...

Preview:

Citation preview

The ARM Architecture

Delivered at San Jose State University

28/4/10

1

AgendaIntroduction to ARM LtdARM Architecture/Programmers ModelD t P th d Pi liData Path and PipelinesAMBADevelopment ToolsDevelopment Tools

2

ARM Ltd

Founded in November 1990Spun out of Acorn ComputersSpun out of Acorn Computers

Designs the ARM range of RISC processor coresLicenses ARM core designs to semiconductorLicenses ARM core designs to semiconductor partners who fabricate and sell to their customers.

ARM does not fabricate silicon itself

Also develop technologies to assist with the design-in of the ARM architecture

Software tools, boards, debug hardware, li ti ft b hit tapplication software, bus architectures,

peripherals etc

3

ARM’s Activities

Connected Community

Software IP

Development Tools

Connected Community

memorymemory

SoCSoC

ProcessorsSystem Level IP:Data Engines

SoCSoC

Fabric3D Graphics

4

Physical IP

ARM Connected Community – 550+

5 5

Nokia N95 Multimedia Computer OMAP™ 2420OMAP™ 2420

Applications ProcessorARM1136™ processor-based

SoC, developed using Magma ® Blast® family and winner of

Symbian OS™ v9.2Operating System supporting ARM

b d bil d i

y2005 INSIGHT Award for ‘Most

Innovative SoC’

processor-based mobile devices, developed using ARM® RealView®

Compilation Tools

S60™ 3rd EditionS60 Platform supporting ARM

Mobiclip™ Video CodecSoftware video codec for ARM

processor-based mobile devices

pp gprocessor-based mobile devices

ST WLAN SolutionUltra-low power 802.11b/g WLAN

chip with ARM9™ processor-based MAC

6

Connect. Collaborate. Create.

Applications

7 7

8

AgendaIntroduction to ARM LtdARM Architecture/Programmers ModelD t P th d Pi liData Path and PipelinesAMBADevelopment ToolsDevelopment Tools

9

Architecture VersionsARMv7-Cortex

x1-4

Cortex-A9

Cortex-A8

x1-4ARMv6

ARM1176JZ(F)-S™

Cortex A8

ARM11™ MPCore™

ARMv5

ARM1156T2(F)-S™

ARM1136J(F)-S™

ARM1026EJ-S™

C t R4

Cortex-R4F

ARM966E-S™

ARM968E-S™

ARM926EJ-S™

ARM946E-S™

Cortex-R4

ARMv4

SC200™ARM7EJ-S™

ARM922T™ARM920T™

SC300™Cortex™-M3

10 10

SC100™ARM7TDMI(S)™ Cortex-M1

Relative Performance*

1200mW/MHz

800

1000

F (MH )

0.43

0.568

mW/MHz

200

400

600Freq (MHz)0.36

0.335

0.235

0 25

0

200

Processor

xA8

176J

Z-S

26E

J-S

920T

7TD

MI

136J

-S

026E

J-S

0.250.35

Cor

te

AR

M11

AR

M92

ARM

9

ARM

7

AR

M1

AR

M10

11

*Represents attainable speeds in 130, 90 or 65nm processes

Cortex family

Cortex-A8Architecture v7AMMU

Cortex-R4Architecture v7RMPU (optional)

Cortex-M3Architecture v7MMPU (optional)MMU

AXIVFP & NEON support

MPU (optional)AXI Dual Issue

MPU (optional)AHB Lite & APB

12

Data Sizes and Instruction SetsThe ARM is a 32-bit architecture.

Wh d i l ti t th ARMWhen used in relation to the ARM:Byte means 8 bitsHalfword means 16 bits (two bytes)( y )Word means 32 bits (four bytes)

’Most ARM’s implement two instruction sets32-bit ARM Instruction Set16-bit Thumb Instruction Set16 bit Thumb Instruction Set

Jazelle cores can also execute Java bytecode

13

y

ARM and Thumb Performance

30000

20000

25000

Dhrystone 2.1/sec

10000

15000 ARMThumb

Dhrystone 2.1/sec@ 20MHz

0

5000

Memory width (zero wait state)

32-bit 16-bit 16-bit with 32-bit stack

14

Thumb-2 Instruction Set

Second generation of the Thumb architectureBlended 16-bit and 32-bit instruction set 25% faster than Thumb

EEMBC Analysis - Performance

30% smaller than ARM

Increases performance but maintains code densitydensity

Maximizes cache and tightly coupled memory usage

EEMBC Analysis – Code Size

15

Processor Modes

The ARM has seven basic operating modes:

U i il d d d hi h t t kUser : unprivileged mode under which most tasks run

FIQ : entered when a high priority (fast) interrupt is raised

IRQ : entered when a low priority (normal) interrupt is raised

Supervisor : entered on reset and when a Software Interrupt instruction is executed

Abort : used to handle memory access violations

Undef : used to handle undefined instructions

16

System : privileged mode using the same registers as user mode

The ARM Register Set

Current Visible RegistersCurrent Visible RegistersCurrent Visible RegistersCurrent Visible RegistersCurrent Visible RegistersCurrent Visible Registersr0

r1

r2

r3

r4

User Mode r0

r1

r2

r3

r4Banked out Registers

r0

r1

r2

r3

r4Banked out Registers

FIQ ModeIRQ Mode r0

r1

r2

r3

r4Banked out Registers

Undef Mode r0

r1

r2

r3

r4Banked out Registers

SVC Mode r0

r1

r2

r3

r4Banked out Registers

Abort Mode r0

r1

r2

r3

r4Banked out Registers

r4

r5

r6

r7

r8 r8

FIQ IRQ SVC Undef Abort

r4

r5

r6

r7

r8 r8

FIQ IRQ SVC Undef Abort

r4

r5

r6

r7

r8

User IRQ SVC Undef Abort

r8

r4

r5

r6

r7

r8 r8

User FIQ SVC Undef Abort

r4

r5

r6

r7

r8 r8

User FIQ IRQ SVC Abort

r4

r5

r6

r7

r8 r8

User FIQ IRQ Undef Abort

r4

r5

r6

r7

r8 r8

User FIQ IRQ SVC Undef

r9

r10

r11

r12

r9

r10

r11

r12

r9

r10

r11

r12

r9

r10

r11

r12

r9

r10

r11

r12

r9

r10

r11

r12

r9

r10

r11

r12

r9

r10

r11

r12

r9

r10

r11

r12

r9

r10

r11

r12

r9

r10

r11

r12

r9

r10

r11

r12

r9

r10

r11

r12

r9

r10

r11

r12

r13 (sp)

r14 (lr)

r15 (pc)

cpsr

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r15 (pc)

cpsr

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r15 (pc)

cpsr

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r15 (pc)

cpsr

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r15 (pc)

cpsr

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r15 (pc)

cpsr

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r15 (pc)

cpsr

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

r13 (sp)

r14 (lr)

17

p

spsrspsr spsr spsrspsr

p

spsrspsr spsr spsrspsr

p

spsrspsr spsr spsrspsr

p

spsrspsr spsr spsrspsr

p

spsr spsr spsr spsrspsr

p

spsrspsrspsr spsrspsr

p

spsrspsr spsrspsr spsr

Exception Handling

When an exception occurs, the ARM:Copies CPSR into SPSR <mode>Copies CPSR into SPSR_<mode>Sets appropriate CPSR bits

Change to ARM state FIQIRQ

0x1C0x18Change to exception mode

Disable interrupts (if appropriate)Stores the return address in LR <mode>

IRQ(Reserved)Data Abort

Prefetch Abort

0x18

0x14

0x100x0CStores the return address in LR_<mode>

Sets PC to vector address

To return, exception handler needs to:

Prefetch AbortSoftware Interrupt

Undefined Instruction

Reset

0x0C

0x08

0x04

0x00

Vector TableRestore CPSR from SPSR_<mode>Restore PC from LR_<mode>

This can only be done in ARM stateVector table can be at

0xFFFF0000 on ARM720Td ARM9/10 f il d i

18

This can only be done in ARM state. and on ARM9/10 family devices

Program Status Registers2731

N Z C V Q

28 67

I F T mode

1623 815 5 4 024

f s x c

U n d e f i n e dJ

Condition code flagsN = Negative result from ALU Z = Zero result from ALU

Interrupt Disable bits.I = 1: Disables the IRQ.F = 1: Disables the FIQ.

C = ALU operation Carried outV = ALU operation oVerflowed T Bit

Architecture xT onlySticky Overflow flag - Q flag

Architecture 5TE/J onlyIndicates if saturation has occurred

T = 0: Processor in ARM stateT = 1: Processor in Thumb state

J bitArchitecture 5TEJ onlyJ = 1: Processor in Jazelle state

Mode bitsSpecify the processor mode

19

Cortex-M3 Programmer’s Model

Fully programmable in C r0

r1

Main

Stack-based exception modelOnly two processor modes

Th d M d f U t k

r2

r3

r4

r5

6Thread Mode for User tasksHandler Mode for OS tasks and exceptions

Vector table contains addressesr8

r9

r10

r6

r7

Process

r10

r11

r12

sp

lrsp

r15 (pc)

xPSR

20

Conditional Execution and Flags

ARM instructions can be made to execute conditionally by postfixing them with the appropriate condition code field.

This improves code density and performance by reducing the number ofThis improves code density and performance by reducing the number of forward branch instructions.CMP r3,#0 CMP r3,#0BEQ skip ADDNE r0,r1,r2Q p , ,ADD r0,r1,r2

skip

By default, data processing instructions do not affect the condition code flags but the flags can be optionally set by using “S”. CMP does not need “S”.

loop…SUBS r1,r1,#1BNE loop if Z flag clear then branch

decrement r1 and set flags

21

Classes of Instructions (v4T)

Load/Store

Miscellaneous

Data Operations

MOV PC, RmBcc

Change of Flow

BccBLBLX

22

Data processing InstructionsConsist of :

Arithmetic: ADD ADC SUB SBC RSB RSCLogical: AND ORR EOR BICComparisons: CMP CMN TST TEQData movement: MOV MVN

These instructions only work on registers, NOT memory.

Syntax:y

<Operation>{<cond>}{S} Rd, Rn, Operand2

Comparisons set flags only - they do not specify RdData movement does not specify RnSecond operand is sent to the ALU via barrel shifter

23

Second operand is sent to the ALU via barrel shifter.

Using a Barrel Shifter:The 2nd Operand

Register, optionally with shift operationShift value can be either be:

Operand 1

Operand 2 Shift value can be either be:

5 bit unsigned integerSpecified in bottom byte of another register

1

Barrel

2

another register.Used for multiplication by constant

Immediate value

BarrelShifter

Immediate value8 bit number, with a range of 0-255.

Rotated right through even number of positions

ALUnumber of positions

Allows increased range of 32-bit constants to be loaded directly into registersResult

24

registersResult

Single register data transferLDR STR WordLDRB STRB ByteLDRH STRH HalfwordLDRH STRH HalfwordLDRSB Signed byte loadLDRSH Signed halfword loadLDRSH Signed halfword load

Memory system must support all access sizes

Syntax:LDR{<cond>}{<size>} Rd, <address>STR{<cond>}{<size>} Rd, <address>

25

e.g. LDREQB

AgendaIntroduction to ARM LtdARM Architecture/Programmers Model D t P th d Pi liData Path and PipelinesAMBADevelopment ToolsDevelopment Tools

26

The ARM7TDM Core

AddressIncrementer

Address Register

A[31:0]ABE

Incrementer

nWAITMCLKBIGEND

Address Register

Register Bank PC Update

Incrementer

PC

Multiplier

InstructionDecoder

nRESET

nIRQnFIQ

nRWMAS[1:0]

ISYNC

Register Bank p

Decode StageInstruction

Decompression

A B

A

L

U p nRESET

nMREQSEQ

ABORT

LOCK

nTRANS

nM[4:0]

BarrelShifter

Read Data Register

and

Control Logic

B

u

s

B

u

s

B

u

s

nCPICPACPB

nOPC

nM[4:0]

32 Bit ALUWrite Data Register

Logic

27

D[31:0]DBE

Pipeline changes for ARM9TDMI

RegRead Shift ALU Reg

WriteThumb→ARMdecompress

ARM decodeInstructionFetch

ARM7TDMI

ARM9TDMI

Reg Selectdecompress

FETCH DECODE EXECUTE

InstructionFetch Shift + ALU Memory

AccessReg

WriteRegR d

RegD d

ARM9TDMIARM or Thumb

Inst Decode

ReadDecode

FETCH DECODE EXECUTE MEMORY WRITE

28

ARM10 vs. ARM11 Pipelines

Shift + ALU MemoryA RR R d

BranchPrediction ARM or

Th b

ARM10

S t U Access RegWrite

FETCH DECODE EXECUTE MEMORY WRITE

Reg Read

Multiply

Prediction

InstructionFetch

ISSUE

ThumbInstruction

Decode Multiply Add

ARM11

FETCH DECODE EXECUTE MEMORY WRITEISSUE

Shift ALU Saturate

Fetch1

Fetch2 Decode Issue Write

backMAC

1MAC

2MAC

3

AddressData

CacheData

Cache

29

Address Cache1

Cache2

Full Cortex-A8 Pipeline Diagram13-Stage Integer Pipeline 10-Stage NEON Pipeline

Architectura

NE

ONal register file

register file

30

AgendaIntroduction to ARM LtdARM Architecture/Programmers ModelD t P th d Pi liData Path and PipelinesAMBADevelopment ToolsDevelopment Tools

31

An Example AMBA System

High PerformanceARM processor UART

APBARM processor

HighBandwidth APB

UART

TimerAHBExternalMemoryInterface

APBBridge

Keypad

High-bandwidthon-chip RAM

DMABus Master

PIO

Low Power

High PerformancePipelinedBurst SupportMultiple Bus Masters

Non-pipelinedSimple Interface

32

p

AHB Structure

Arbiter

HADDRHWDATA

Master#1

Slave#1

HADDRHWDATA

HRDATA

HADDR

HRDATA

Master#2

Slave#2

Address/Control

Master#3

Slave#3

Write Data

Read Data

Decoder

#3

Slave#4

33

AgendaIntroduction to ARM LtdARM Architecture/Programmers ModelD t P th d Pi liData Path and PipelinesAMBADevelopment ToolsDevelopment Tools

34

Keil Development Tools for ARM

Includes ARM macro assembler, compilers (ARM RealView C/C++ Compiler Keil CARM Compiler or GNU compiler) ARM linker Keil uVisionCompiler, Keil CARM Compiler, or GNU compiler), ARM linker, Keil uVision Debugger and Keil uVision IDEKeil uVision Debugger accurately simulates on-chip peripherals (I2C, CAN, UART SPI Interrupts I/O Ports A/D and D/A converters PWM etc )UART, SPI, Interrupts, I/O Ports, A/D and D/A converters, PWM, etc.)

Evaluation Limitations16K byte object code + 16K data limitationSome linker restrictions such as base addresses for code/constantsGNU tools provided are not restricted in any way

http://www.keil.com/demo/

35

Keil Development Tools for ARM

36

37

University Resources

http://www.arm.com/support/university/

University@arm.com

38

Beagle Board

39

$ Wikis blogsTargeting community development

$149> 1000 participants

and growing

Wikis, blogs, promotion of community

activity

Personally affordable

Freedom to innovate

Active & technical

community

Open access to h d

y

Instant access to >10 million lines

Addressing open source communityhardware

documentation

FreeOpportunity

>10 million lines of code

communityneeds

Freesoftware

Opportunity to tinker and

learn

40

Fast, low power, flexible expansion

OMAP3530 Processor600MHz Cortex-A8

NEON+VFPv3

Peripheral I/ODVI-D video outNEON+VFPv3

16KB/16KB L1$256KB L2$

430MHz C64x+ DSP

SD/MMC+S-Video outUSB 2.0 HS OTG

3”

32K/32K L1$48K L1D32K L2

I2C, I2S, SPI,MMC/SDJTAG

PowerVR SGX GPU64K on-chip RAM

POP Memory

Stereo in/outAlternate powerRS-232 serialPOP Memory

128MB LPDDR RAM256MB NAND flash USB Powered

2W maximum consumption

41

OMAP is small % of thatMany adapter options

Car, wall, battery, solar, …

On-going collaboration at BeagleBoard.orgLive chat via IRC for 24/7 community support

And more…

Other Features4 LEDs

Live chat via IRC for 24/7 community supportLinks to software projects to download

Peripheral I/ODVI-D video outSD/MMC

3”4 LEDsUSR0USR1PMU STAT

SD/MMC+S-Video outUSB HS OTGI2C I2S SPI

_PWR

2 buttonsUSER

I2C, I2S, SPI,MMC/SDJTAG

RESET4 boot sources

SD/MMCNAND flash Stereo in/out

Alternate powerRS-232 serial

NAND flashUSBSerial

42

Project Ideas Using BeagleOS Projects

OS porting to ARM/Cortex (TI OMAP), such as open source FreeBSDMythTV systemMythTV system“Super-Beagle” – stack of Beagles as compute engine and task distribution

NEON Optimization ProjectsCodec optimization in ffmpeg (pick your favorite codec)Codec optimization in ffmpeg (pick your favorite codec)Voice and image recognitionOpen-source Flash player optimizations (swfdec)

43

Fin

44

Recommended