Upload
gkreugineraj
View
219
Download
0
Embed Size (px)
Citation preview
7/28/2019 Presentation 05272011
1/55
1
The ARM Architecture
Joe BungoApplications Engineer
ARM University Program
7/28/2019 Presentation 05272011
2/55
2
Agenda
Introduction to ARM Ltd
ARM Architecture/Programmers Model
Data Path and Pipelines
System Design
Development Tools
7/28/2019 Presentation 05272011
3/55
3
ARM Ltd
Founded in November 1990
Spun out of Acorn Computers
Initial funding from Apple, Acorn and VLSI
Designs the ARM range of RISC processor cores
Licenses ARM core designs to semiconductorpartners who fabricate and sell to theircustomers
ARM does not fabricate silicon itself
Also develop technologies to assist with the design-in of the ARM architecture
Software tools, boards, debug hardware
Application software
Bus architectures
Peripherals, etc
7/28/2019 Presentation 05272011
4/55
4
ARMs Activities
memorymemory
SoCSoC
Processors
System Level IP:
Data Engines
Fabric
3D Graphics
Physical IP
Software IP
Development Tools
Connected Community
7/28/2019 Presentation 05272011
5/55
5
ARM Connected Community 700+
5
7/28/2019 Presentation 05272011
6/55
6
Huge Range of Applications
Energy Efficient Appliances
IR FireDetector
IntelligentVending
Tele-parking
UtilityMeters
ExerciseMachinesIntelligent toys
Equipment Adopting 32-bit ARMMicrocontrollers
7/28/2019 Presentation 05272011
7/55
7
Worlds Smallest ARM Computer?
A CB
Wirelessly networked into large scalesensor arrays
Battery Solar Cells
Processor, SRAM and PMU
University of Michigan
Sensors, timers
Cortex-M0 +16KB RAM 65nmUWB Radio antenna
10 kB Storage memory~3fW/bit
12Ah Li-ion Battery
Wireless Sensor Network
Cortex-M0; 65
7/28/2019 Presentation 05272011
8/55
8
Worlds Largest ARM Computer?
4200 ARM poweredNeutrino Detectors
ork supported by the National Science Foundation and University of Wisconsin-Madison
70 bore holes 2.5km deep
60 detectors per stringstarting 1.5km down
1km3 of active telescope
7/28/2019 Presentation 05272011
9/55
9
From 1mm3 to 1km3
1mm3 1km3
10 $1000
Mobile
Embedded Consumer
Mobile Computing Server
Enterprise PC
Home
HPC
7/28/2019 Presentation 05272011
10/55
10
7/28/2019 Presentation 05272011
11/55
11
Agenda
Introduction to ARM Ltd
ARM Architecture/Programmers Model
Data Path and Pipelines
System Design
Development Tools
7/28/2019 Presentation 05272011
12/55
12
ARM Classic Processor Portfolio
ARM922T
SC100
ARM7TDMI(S)
ARM968E-S
ARM946E-S
ARM1176JZ(F)-S
ARM1156T2(F)-S
ARM1136J(F)-S
x1-4
ARM11MPCore
ARM926EJ-S
ARM7EJ-S
ARMv6
ARMv4
ARMv5
7/28/2019 Presentation 05272011
13/55
13
ARM Cortex Processors (v7)
ARM Cortex-A family (v7-A): Applications processors for full OS
and 3rd party applications
ARM Cortex-R family (v7-R):
Embedded processors for real-timesignal processing, control applications
ARM Cortex-M family (v7-M): Microcontroller-oriented processors
for MCU and SoC applications
Cortex-R4
Cortex-A8
SC300
Cortex-M1
Cortex-M3
...2.5GHzx1-4
Cortex-A9
12k gates...
Cortex-M0
Cortex-M4
x1-4
Cortex-A51-2
HeronR
x1-4
Cortex-A15
7/28/2019 Presentation 05272011
14/55
14
Relative Performance*
*Represents attainable speeds in 130, 90, 65, or 45nm processes
Cortex-M0 Cortex-M3 ARM7 ARM926 ARM1026 ARM1136 ARM1176 Cortex-A8 Cortex-A9Dual-core
Max Freq (MHz) 50 150 184 470 540 610 750 1100 2000
Min Power (mW/MHz) 0.012 0.06 0.35 0.235 0.36 0.335 0.568 0.43 0.5
0
500
1000
1500
2000
2500
MaxFre
quency(Mhz)
7/28/2019 Presentation 05272011
15/55
15
Cortex family
Cortex-A8
Architecture v7A
MMU
AXI
VFP & NEON support
Cortex-R4
Architecture v7R
MPU (optional)
AXI
Dual Issue
Cortex-M3
Architecture v7M
MPU (optional)
AHB Lite & APB
7/28/2019 Presentation 05272011
16/55
16
Data Sizes and Instruction Sets
The ARM is a 32-bit architecture.
When used in relation to the ARM:
Byte means 8 bits
Halfword means 16 bits (two bytes)
Word means 32 bits (four bytes)
Most ARMs implement two instruction sets
32-bit ARM Instruction Set
16-bit Thumb Instruction Set
Jazelle cores can also execute Java bytecode
7/28/2019 Presentation 05272011
17/55
17
ARM and Thumb Performance
Memory width (zero wait state)
0
5000
10000
15000
20000
25000
30000
32-bit 16-bit 16-bit with
32-bit stack
ARM
Thumb
Dhrystone 2.1/sec@ 20MHz
7/28/2019 Presentation 05272011
18/55
18
The Thumb-2 instruction set
Variable-length instructions
ARM instructions are a fixed length of 32 bits
Thumb instructions are a fixed length of 16
bits
Thumb-2 instructions can be either 16-bit or
32-bit
Thumb-2 gives approximately 26%improvement in code density over ARM
Thumb-2 gives approximately 25%
improvement in performance over
Thumb
7/28/2019 Presentation 05272011
19/55
19
Cortex-M3 Programmers Model
Fully programmable in C
Stack-based exception model
Only two processor modes
Thread Mode for User tasks
Handler Mode for OS tasks and exceptions
Vector table contains addresses
Process
r8r9
r10
r11
r12
sp
lr
r15 (pc)
xPSR
r0
r1
r2
r3
r4
r5
r6
r7
Main
sp
7/28/2019 Presentation 05272011
20/55
20
ARM Cortex-M3
Application code
OS
System Call (SVCall)Undefined Instruction
Privileged
Cortex-M3 Processor Privilege
Memory
Instructions & Data
Aborts
Interrupts
Reset
Non-Privileged
Supervisor
User
Handler Mode
Thread Mode
7/28/2019 Presentation 05272011
21/55
21
Cortex-M3 Interrupt Handling One Non-Maskable Interrupt (INTNMI) supported
1-240 prioritizable interrupts supported
Interrupts can be masked
Implementation option selects number of interrupts supported
Nested Vectored Interrupt Controller (NVIC) is tightly coupled with processor core
Interrupt inputs are active HIGH
Cortex-M3Processor Core
INTNMI
NVIC
Cortex-M3
1-240 Interrupts
INTISR[239:0]
7/28/2019 Presentation 05272011
22/55
22
Cortex-M3 Exception Handling
Reset : power-on or system reset
NMI : cannot be stopped or preempted by any exception other than reset
Faults
Hard Fault : default Fault or any fault unable to activate
Memory Manage : MPU violations
Bus Fault : prefetch and memory access violations
Usage Fault : undef instructions, divide by zero, etc.
SVCall : privileged OS requests
Debug Monitor : debug monitor program
PendSV : pending SVCalls
SysTick Interrupt : internal sys timer, i.e., used by RTOS to periodicallycheck resources or peripherals
External Interrupt : i.e., external peripherals
7/28/2019 Presentation 05272011
23/55
23
Cortex-M3 Program Status Register
One Status Register consisting of
APSR - Application Program Status Register ALU flags
IPSR - Interrupt Program Status Register Interrupt/Exception No.
EPSR - Execution Program Status Register
IT field If/Then block information
ICI field Interruptible-Continuable Instruction information
xPSR
Composite of the 3 PSRs
Stored on the stack on exception entry
IT/ICIIT
2731
N Z C V Q
28 7
ISR Number
1623 15 0242526 10
T
7/28/2019 Presentation 05272011
24/55
24
Conditional Execution
ITTET EQ
Inst 1
Inst 2
Inst 3
Inst 4
If Then (IT) instruction added (16 bit)
Up to 3 additional then or else conditions maybe specified (T or E)
Makes up to 4 following instructions conditional
Any normal ARM condition code can be used
16-bit instructions in block do not affect condition code flags
Apart from comparison instruction
32 bit instructions may affect flags (normal rules apply)
Current if-then status stored in CPSR Conditional block maybe safely interrupted and returned to
Must NOT branch into or out of if-then block
MOVEQ
ADDEQ
SUBNE
ORREQ
7/28/2019 Presentation 05272011
25/55
25
Load/Store
Miscellaneous
Classes of Instructions (v4T)
Data Operations
MOV PC, Rm
Bcc
BLBLX
Change of Flow
7/28/2019 Presentation 05272011
26/55
26
Branch : B{} label
Branch with Link : BL{} subroutine_label
The processor core shifts the offset field left by 2 positions, sign-extends it
and adds it to the PC
32 Mbyte range
How to perform longer branches?
2831 24 0
Cond 1 0 1 L Offset
Condition field
Link bit 0 = Branch1 = Branch with link
232527
Branch instructions
7/28/2019 Presentation 05272011
27/55
27
Data processing Instructions
Consist of :
Arithmetic: ADD ADC SUB SBC RSB RSC
Logical: AND ORR EOR BIC
Comparisons: CMP CMN TST TEQ
Data movement: MOV MVN
These instructions only work on registers, NOT memory.
Syntax:
{}{S} Rd, Rn, Operand2
Comparisons set flags only - they do not specify Rd Data movement does not specify Rn
Second operand is sent to the ALU via barrel shifter.
7/28/2019 Presentation 05272011
28/55
28
Register, optionally with shift operation Shift value can be either be:
5 bit unsigned integer
Specified in bottom byte of
another register.
Used for multiplication by constant
Immediate value
8 bit number, with a range of 0-255.
Rotated right through even
number of positions
Allows increased range of 32-bitconstants to be loaded directly into
registersResult
Operand1
BarrelShifter
Operand2
ALU
Using a Barrel Shifter:The 2nd Operand
7/28/2019 Presentation 05272011
29/55
29
Single register data transfer
LDR STR Word
LDRB STRB ByteLDRH STRH Halfword
LDRSB Signed byte load
LDRSH Signed halfword load
Memory system must support all access sizes
Syntax:
LDR{}{} Rd,
STR{}{} Rd,
e.g. LDREQB
7/28/2019 Presentation 05272011
30/55
30
Agenda
Introduction to ARM Ltd
ARM Architecture/Programmers Model Data Path and Pipelines
System Design
Development Tools
7/28/2019 Presentation 05272011
31/55
31
Cortex-M3 Datapath
RegisterBank Mul/Div
AddressIncrementer
ALU
B
A
INTADDR
I_HADDR
AddressRegister
BarrelShifter
Writeback
ALU
Read DataRegister
Write DataRegister
Instruction
Decode
I_HRDATA
D_HWDATA
D_HRDATA
AddressIncrementer
D_HADDRAddressRegister
7/28/2019 Presentation 05272011
32/55
32
Cortex-M3 has 3-stage fetch-decode-execute pipeline
Similar to ARM7
Cortex-M3 does more in each stage to increase overall
performance
Cortex-M3 Pipeline
Branch forwarding & speculation
1st Stage - Fetch 2nd Stage - Decode 3rd Stage - Execute
Execute stage branch (ALU branch & Load Store Branch)
Fetch(Prefetch)
AGU
InstructionDecode &
Register Read
Branch
AddressPhase & Write
Back
Data PhaseLoad/Store &
Branch
Multiply & Divide
Shift ALU & Branch
Write
7/28/2019 Presentation 05272011
33/55
33
ARM10 vs. ARM11 Pipelines
ARM11
Fetch1
Fetch2
Decode Issue
Shift ALU Saturate
Writeback
MAC1
MAC2
MAC3
AddressData
Cache1
DataCache
2
Shift + ALUMemoryAccess Reg
Write
FETCH DECODE EXECUTE MEMORY WRITE
Reg Read
Multiply
BranchPrediction
InstructionFetch
ISSUE
ARM orThumb
InstructionDecode Multiply
Add
ARM10
7/28/2019 Presentation 05272011
34/55
7/28/2019 Presentation 05272011
35/55
35
Agenda
Introduction to ARM Ltd
ARM Architecture/Programmers ModelData Path and Pipelines
System Design
Development Tools
7/28/2019 Presentation 05272011
36/55
36
High PerformanceARM processor
High-bandwidthon-chip RAM
HighBandwidth
ExternalMemory
Interface
DMABus Master
APBBridge
Keypad
UART
PIO
TimerAHB
APB
High Performance
PipelinedBurst SupportMultiple Bus Masters
Low PowerNon-pipelinedSimple Interface
An Example AMBA System
7/28/2019 Presentation 05272011
37/55
37
HWDATA
Arbiter
Decoder
Master#1
Master#3
Master#2
Slave#1
Slave#4
Slave#3
Slave#2
Address/Control
rite Data
Read Data
HADDR
HWDATA
HRDATA
HADDR
HRDATA
AHB Structure
7/28/2019 Presentation 05272011
38/55
38
Mali200 + GP2 SoC Integration
Mali 200
MMUAXI
APB
Clock
Reset
IRQs
IDLEs
Shipped as synthesizable
Verilog
Mali 200 + GP2 requires a
single instant in the SoC,
with a small number ofconnections to be made.
IDLES can be used for
gating the Mali200 and GP2
core clock
Mali GP2
AXI Fabric
7/28/2019 Presentation 05272011
39/55
39
Typical GPU SoC Design
ARM1176JZF
L230
Mali 200 Mali GP2
CLCD
PL111
PL301 High-performance matrix
SDRAMC
PL340
APB Peripheral Sub-System
D I
APB
MS
Sys
Ctrl
nRst
M
PL390
GIC
Int
DDR
PHY
Designed and optimised for AMBA: provides easier integration with ARM cores and fabric IP
Unified Memory Architecture
Local AXI Interconnect
Mali
MMU
7/28/2019 Presentation 05272011
40/55
40
ARM PIPD Logic Product Families
ProcessGeometry180nm 130nm 90nm 65nm
HighSpeed
Low
Power
Low
Area
45nm 32nm 28nm
ECO Kits
(SC12)(SC12)
High Performance (Advantage-HS / SAGE-HS)
(SC7 / SC8)(SC7 / SC8)
Ultra High Density (Metro)
(SC9 / SC10)(SC9 / SC10)
High Density (Advantage)
(SC9 / SC10(SC9 / SC10 tappedtapped))
High Density (SAGE-X)EC
OKits
Power Management KitsPower Management Kits
PowerManagementKits
PowerManagementKits
7/28/2019 Presentation 05272011
41/55
41
Agenda
Introduction to ARM Ltd
ARM Architecture/Programmers ModelData Path and Pipelines
System Design
Development Tools
7/28/2019 Presentation 05272011
42/55
42
ARM Debug Architecture
ARMcore
ETM
TAPcontroller
Trace PortJTAG port
Ethernet
Debugger (+ optional
trace tools)
EmbeddedICE Logic
Provides breakpoints and processor/systemaccess
JTAG interface (ICE)
Converts debugger commands to JTAGsignals
Embedded trace Macrocell (ETM)
Compresses real-time instruction and dataaccess trace
Contains ICE features (trigger & filter logic)
Trace port analyzer (TPA)
Captures trace in a deep buffer
EmbeddedICELogic
7/28/2019 Presentation 05272011
43/55
43
Keil Development Tools for ARM
Includes ARM macro assembler, compilers (ARM RealView C/C++Compiler, Keil CARM Compiler, or GNU compiler), ARM linker, Keil uVision
Debugger and Keil uVision IDE
Keil uVision Debugger accurately simulates on-chip peripherals (I2C, CAN,
UART, SPI, Interrupts, I/O Ports, A/D and D/A converters, PWM, etc.)
Evaluation Limitations 16K byte object code + 16K data limitation
Some linker restrictions such as base addresses for code/constants
GNU tools provided are not restricted in any way
http://www.keil.com/demo/
7/28/2019 Presentation 05272011
44/55
44
Keil Development Tools for ARM
7/28/2019 Presentation 05272011
45/55
45
7/28/2019 Presentation 05272011
46/55
46
University Resources
http://www.arm.com/support/university/
7/28/2019 Presentation 05272011
47/55
47
TI Panda Board
OMAP4430 Processor
1 GHz Dual-core ARMCortex-A9 (NEON+VFP)
C64x+ DSP
PowerVR SGX 3D GPU
1080p Video Support
POP Memory
1 GB LPDDR2 RAM
USB Powered < 4W max consumption
(OMAP small % of that) Many adapter options(Car, wall, battery, solar, ..)
7/28/2019 Presentation 05272011
48/55
48
Project Ideas Using Panda
OS Projects
OS porting to ARM/Cortex (TI OMAP)
MythTV system
Super-Panda stack of Pandas as compute engine and task
distribution
Linux applications
NEON Optimization Projects
Codec optimization in ffmpeg (pick your favorite codec)
Voice and image recognition
Open-source Flash player optimizations (swfdec)
7/28/2019 Presentation 05272011
49/55
49
Fin
7/28/2019 Presentation 05272011
50/55
50
Nokia N95 Multimedia Computer
Symbian OS v9.2Operating System supporting ARMprocessor-based mobile devices,
developed using ARM RealViewCompilation Tools
OMAP 2420Applications Processor
ARM1136 processor-based
SoC, developed using Magma
Blast family and winner of
2005 INSIGHT Award for MostInnovative SoC
Connect. Collaborate. Create.
Mobiclip Video CodecSoftware video codec for ARM
processor-based mobile devices
ST WLAN Solution
Ultra-low power 802.11b/g WLANchip with ARM9 processor-based
MAC
S60 3rd EditionS60 Platform supporting ARM
processor-based mobile devices
7/28/2019 Presentation 05272011
51/55
51
Beagle Board
7/28/2019 Presentation 05272011
52/55
52
$149
> 1000 participantsand growing
Open access tohardware
documentation
Wikis, blogs,promotion of
communityactivity
Freesoftware
Freedom toinnovate
Personallyaffordable
Active &technical
community
Opportunityto tinker and
learn
Instant access to>10 million lines
of code
Addressing
open sourcecommunity
needs
Targeting community development
7/28/2019 Presentation 05272011
53/55
53
OMAP3530 Processor
600MHz Cortex-A8 NEON+VFPv3
16KB/16KB L1$
256KB L2$
430MHz C64x+ DSP
32K/32K L1$
48K L1D 32K L2
PowerVR SGX GPU
64K on-chip RAM
POP Memory
128MB LPDDR RAM
256MB NAND flash USB Powered 2W maximum consumption
OMAP is small % of that Many adapter options
Car, wall, battery, solar,
Peripheral I/O
DVI-D video out
SD/MMC+
S-Video out
USB 2.0 HS OTG
I2C, I2S, SPI,
MMC/SD JTAG
Stereo in/out
Alternate power
RS-232 serial
3
Fast, low power, flexible expansion
7/28/2019 Presentation 05272011
54/55
54
Peripheral I/O
DVI-D video out
SD/MMC+
S-Video out
USB HS OTG
I2C, I2S, SPI,
MMC/SD
JTAG
Stereo in/out
Alternate power RS-232 serial
3
Other Features 4 LEDs
USR0
USR1
PMU_STAT
PWR
2 buttons USER
RESET
4 boot sources
SD/MMC
NAND flash
USB
Serial
On-going collaboration at BeagleBoard.org
Live chat via IRC for 24/7 community support
Links to software projects to download
And more
7/28/2019 Presentation 05272011
55/55
Project Ideas Using Beagle
OS Projects
OS porting to ARM/Cortex (TI OMAP)
MythTV system
Super-Beagle stack of Beagles as compute engine and task
distribution
Linux applications
NEON Optimization Projects
Codec optimization in ffmpeg (pick your favorite codec)
Voice and image recognition
Open-source Flash player optimizations (swfdec)