ARM Processor Architecture (II)
Speaker: Lung-Hao Chang Advisor: Prof. Andy Wu
Graduate Institute of Electronics Engineering, National Taiwan University
Modified from National Chiao-Tung University IP Core Design course
SoC Consortium Course Material, SoC Design Laboratory, 03/17/2004

Outline
  ARM processor core
  Memory hierarchy
  Software development
  Summary

ARM Processor Core

ARM7TDMI Processor Core
Current low-end ARM core for applications like digital mobile phones
TDMI
  T: Thumb, 16-bit compressed instruction set
  D: on-chip Debug support, enabling the processor to halt in response to a debug request
  M: enhanced Multiplier, yielding a full 64-bit result, high performance
  I: EmbeddedICE hardware
Von Neumann architecture
3-stage pipeline
CPI ~ 1.9

ARM7TDMI Block Diagram
[Figure: block diagram. Labels: processor core, EmbeddedICE, JTAG TAP controller (TCK, TMS, TRST, TDI, TDO), scan chains 0, 1 and 2, bus splitter (Din[31:0], Dout[31:0]), D[31:0], A[31:0], extern0, extern1, control signals opc, r/w, mreq, trans, mas[1:0], and other signals.]

ARM7TDMI Core Diagram
[Figure: core diagram]

ARM7TDMI Interface Signals (1/5)
[Figure: ARM7TDMI core with its interface signals, grouped as follows.
  Clock control: mclk, wait, eclk
  Configuration: bigend
  Interrupts: irq, fiq, isync
  Initialization: reset
  Bus control: enin, enout, enouti, abe, ale, ape, dbe, tbe, busen, highz, busdis, ecapclk
  Power: Vdd, Vss
  Debug: dbgrq, breakpt, dbgack, exec, extern1, extern0, dbgen, rangeout0, rangeout1, dbgrqi, commrx, commtx
  JTAG controls: TRST, TCK, TMS, TDI, TDO
  Memory interface: A[31:0], Din[31:0], Dout[31:0], D[31:0], bl[3:0], r/w, mas[1:0], mreq, seq, lock
  MMU interface: trans, mode[4:0], abort
  State: Tbit
  Coprocessor interface: opc, cpi, cpa, cpb
  TAP information: tapsm[3:0], ir[3:0], tdoen, tck1, tck2, screg[3:0]
  Boundary scan extension: drivebs, ecapclkbs, icapclkbs, highz, pclkbs, rstclkbs, sdinbs, sdoutbs, shclkbs, shclk2bs]

ARM7TDMI Interface Signals (2/5)
Clock control
  All state changes within the processor are controlled by mclk, the memory clock
  Internal clock = mclk AND \wait (a leading \ marks an active-low signal)
  The eclk clock output reflects the clock used by the core
Memory interface
  32-bit address (A[31:0]), bidirectional data bus (D[31:0]), separate data out (Dout[31:0]) and data in (Din[31:0])
  \mreq indicates a processor cycle which requires a memory access
  seq indicates that the memory address will be sequential to that used in the previous cycle

  mreq  seq  Cycle  Use
  0     0    N      Non-sequential memory access
  0     1    S      Sequential memory access
  1     0    I      Internal cycle; bus and memory inactive
  1     1    C      Coprocessor register transfer; memory inactive
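
The cycle-type table above can be read as a simple two-bit lookup. The following is a minimal sketch of that decode, useful for example when modeling a memory controller in software; the names `CYCLE_TYPES`, `decode_cycle` and `uses_memory` are illustrative and not part of any ARM tool.

```python
# Illustrative decode of the (mreq, seq) signal pair into a cycle type.
# The values follow the table in the slide; the function names are hypothetical.
CYCLE_TYPES = {
    (0, 0): "N",  # non-sequential memory access
    (0, 1): "S",  # sequential memory access
    (1, 0): "I",  # internal cycle, bus and memory inactive
    (1, 1): "C",  # coprocessor register transfer, memory inactive
}


def decode_cycle(mreq: int, seq: int) -> str:
    """Return the one-letter cycle type for the sampled signal levels."""
    return CYCLE_TYPES[(mreq, seq)]


def uses_memory(mreq: int, seq: int) -> bool:
    """Only N and S cycles (mreq level 0) involve the memory system."""
    return decode_cycle(mreq, seq) in ("N", "S")


if __name__ == "__main__":
    for pair in sorted(CYCLE_TYPES):
        print(pair, decode_cycle(*pair), uses_memory(*pair))
```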

ARM7TDMI Interface Signals (3/5)
Memory interface (continued)
  lock indicates that the processor should keep the bus to ensure the atomicity of the read and write phases of a swap (SWP) instruction
  \r/w, a read or a write cycle
  mas[1:0], encodes the memory access size: byte, half-word or word
  bl[3:0], externally controlled enables on latches on each of the 4 bytes on the data input bus
MMU interface
  \trans (translation control), 0: user mode, 1: privileged mode
  \mode[4:0], bottom 5 bits of the CPSR (inverted)
  abort, disallow access
State
  T bit, whether the processor is currently executing ARM or Thumb instructions
Configuration
  bigend, big-endian or little-endian

ARM7TDMI Interface Signals (4/5)
Interrupt
  \fiq, fast interrupt request, higher priority
  \irq, normal interrupt request
  isync, allows the interrupt synchronizer to be bypassed
Initialization
  \reset, starts the processor from a known state, executing from address 0x00000000
Debug support
  EmbeddedICE, contains the breakpoint and watchpoint registers
  Processor registers may be inspected
Debug interface
  dbgen, allows external hardware to enable debug
  dbgrq, asynchronous debug request
  breakpt, instruction-synchronous request

ARM7TDMI Interface Signals (5/5)
Coprocessor interface
  \cpi, ARM has identified a coprocessor instruction
  cpa, tells ARM that there is no coprocessor present
  cpb, coprocessor is busy and cannot begin executing the instruction
  \opc, whether a memory access is fetching an instruction or data
Power
  5 V or 3 V supply, depending on technology and the circuit design
JTAG (Joint Test Action Group)
  Testing the circuitry on the chip itself

ARM7TDMI characteristics
  Process       0.35 um   Transistors  74,209       MIPS    60
  Metal layers  3         Core area    2.1 mm^2     Power   87 mW
  Vdd           3.3 V     Clock        0 to 66 MHz  MIPS/W  690
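
The MIPS/W figure in the characteristics table appears to be simply MIPS divided by power in watts. A minimal sketch that checks this against the ARM7TDMI numbers above (the helper name `mips_per_watt` is illustrative):

```python
def mips_per_watt(mips: float, power_mw: float) -> float:
    """Power efficiency: MIPS divided by power, with power converted from mW to W."""
    return mips / (power_mw / 1000.0)


if __name__ == "__main__":
    # ARM7TDMI values taken from the table: 60 MIPS at 87 mW.
    print(round(mips_per_watt(60, 87)))  # agrees with the quoted 690 MIPS/W after rounding
```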

Memory Access
The ARM7 is a Von Neumann, load/store architecture, i.e.,
  A single 32-bit data bus carries both instructions and data
  Only the load/store instructions (and SWP) access memory
Memory is addressed as a 32-bit address space
Data types can be 8-bit bytes, 16-bit half-words or 32-bit words; memory may be seen as a line of bytes folded into 4-byte words
Words must be aligned to 4-byte boundaries, and half-words to 2-byte boundaries
Always ensure that the memory controller supports all three access sizes
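
The alignment rules above amount to a modulo check on the byte address. The sketch below illustrates this, together with the "byte line folded into 4-byte words" view; the names `ACCESS_SIZES`, `is_aligned` and `word_and_lane` are illustrative only.

```python
# Access sizes in bytes for the three data types named in the slide.
ACCESS_SIZES = {"byte": 1, "half-word": 2, "word": 4}


def is_aligned(address: int, kind: str) -> bool:
    """A transfer is aligned when the byte address is a multiple of its size."""
    return address % ACCESS_SIZES[kind] == 0


def word_and_lane(address: int) -> tuple:
    """View memory as a line of bytes folded into 4-byte words:
    return (word index, byte position within that word)."""
    return address // 4, address % 4


if __name__ == "__main__":
    print(is_aligned(0x1002, "half-word"))  # half-word on a 2-byte boundary
    print(is_aligned(0x1002, "word"))       # not on a 4-byte boundary
    print(word_and_lane(0x1006))
```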

ARM Memory Interface
Sequential (S cycle)
  (nMREQ, SEQ) = (0, 1)
  The ARM core requests a transfer to or from an address which is either the same as, or one word or one half-word greater than, the preceding address.
Non-sequential (N cycle)
  (nMREQ, SEQ) = (0, 0)
  The ARM core requests a transfer to or from an address which is unrelated to the address used in the preceding cycle.
Internal (I cycle)
  (nMREQ, SEQ) = (1, 0)
  The ARM core does not require a transfer, as it is performing an internal function, and no useful prefetching can be performed at the same time.
Coprocessor register transfer (C cycle)
  (nMREQ, SEQ) = (1, 1)
  The ARM core wishes to use the data bus to communicate with a coprocessor, but does not require any action by the memory system.
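
The definition of a sequential access above (same address, or one half-word or one word greater than the preceding address) can be expressed as a small address comparison. This is only a sketch of that definition for use in a software model; in hardware the core drives SEQ itself, and the function names here are hypothetical.

```python
def is_sequential(prev_addr: int, addr: int) -> bool:
    """True when addr is the same as prev_addr, or one half-word (2 bytes)
    or one word (4 bytes) greater, using 32-bit address arithmetic."""
    delta = (addr - prev_addr) & 0xFFFFFFFF
    return delta in (0, 2, 4)


def classify_access(prev_addr: int, addr: int) -> str:
    """Label a memory access as an S cycle or an N cycle."""
    return "S" if is_sequential(prev_addr, addr) else "N"


if __name__ == "__main__":
    trace = [0x8000, 0x8004, 0x8008, 0x9000, 0x9002]
    for prev, cur in zip(trace, trace[1:]):
        print(hex(prev), "->", hex(cur), classify_access(prev, cur))
```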

Cached ARM7TDMI Macrocells
ARM710T
  8K unified write-through cache
  Full memory management unit supporting virtual memory
  Write buffer
ARM720T
  As ARM710T but with WinCE support
ARM740T
  8K unified write-through cache
  Memory protection unit
  Write buffer
[Figure: cached macrocell organization. Labels: ARM core, EmbeddedICE & JTAG, CP15, MMU, instruction & data cache, write buffer, AMBA interface, virtual address, physical address, instructions & data, AMBA address, AMBA data.]

Processor Core vs. CPU Core
Processor core
  The engine that fetches instructions and executes them
  E.g.: ARM7TDMI, ARM9TDMI, ARM9E-S
CPU core
  Consists of the ARM processor core and some tightly coupled function blocks
  Cache and memory management blocks
  E.g.: ARM710T, ARM720T, ARM740T, ARM920T, ARM922T, ARM940T, ARM946E-S, and ARM966E-S
[Figure: ARM710T organization. Labels: ARM7TDMI, EmbeddedICE & JTAG, CP15, MMU, instruction & data cache, write buffer, AMBA interface, virtual address, physical address, instructions & data, AMBA address, AMBA data.]

ARM8
Higher performance than ARM7
  By increasing the clock rate
  By reducing the CPI
    Higher memory bandwidth, 64-bit wide memory
    Separate memories for instruction and data access
Core organization
  The prefetch unit is responsible for fetching instructions from memory and buffering them (exploiting the double-bandwidth memory)
  It is also responsible for branch prediction and uses static prediction based on the branch direction
    backward: predicted taken
    forward: predicted not taken
[Figure: ARM8 core organization. Labels: memory (double-bandwidth), prefetch unit, integer unit, coprocessor(s), addresses, read data, write data, PC, instructions, CPinst., CPdata.]
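
The static prediction rule above (backward branches predicted taken, forward branches predicted not taken) depends only on the direction of the branch. A minimal sketch, assuming the direction is judged by comparing the target address with the branch address; treating a branch to itself as backward is an assumption of this sketch, and the function name is illustrative.

```python
def predict_taken(branch_addr: int, target_addr: int) -> bool:
    """Static direction-based prediction: backward branches (typically loop
    back-edges) are predicted taken, forward branches are predicted not taken.
    A branch to its own address is treated as backward (assumption)."""
    return target_addr <= branch_addr


if __name__ == "__main__":
    print(predict_taken(0x8020, 0x8000))  # backward branch, e.g. end of a loop
    print(predict_taken(0x8020, 0x8040))  # forward branch, e.g. skipping a block
```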

Pipeline Organization
5-stage; the prefetch unit occupies the 1st stage, the integer unit occupies the remainder
  Prefetch unit
    (1) Instruction prefetch
  Integer unit
    (2) Instruction decode and register read
    (3) Execute (shift and ALU)
    (4) Data memory access
    (5) Write back results

Integer Unit Organization
[Figure: ARM8 integer unit. Stages: decode, execute, memory, write. Labels: inst. decode, register read, multiplier, ALU/shifter, mux, +4, write pipeline, rot/sgn ex, register write, forwarding paths, PC+8, instructions, coprocessor instructions, coproc data, address, read data, write data.]

ARM8 Macrocell
ARM810
  8-Kbyte unified instruction and data cache
    Copy-back
    Double-bandwidth
  MMU
  Coprocessor
  Write buffer
[Figure: ARM810 organization. Labels: 8 Kbyte cache (double-bandwidth), prefetch unit, ARM8 integer unit, CP15, write buffer, MMU, address buffer, JTAG, virtual address, physical address, PC, instructions, read data, write data, CPinst., CPdata, copy-back tag, copy-back data, update TLB, data in, data out, address.]

  Process       0.5 um    Transistors  836,022      MIPS    86
  Metal layers  3         Die area     76 mm^2      Power   500 mW
  Vdd           3.3 V     Clock        0 to 72 MHz  MIPS/W  172

StrongARM
The first ARM processor to use a modified Harvard (separate instruction and data cache) architecture, and now available from Intel
Features
  A 5-stage pipeline with register forwarding
  Single-cycle execution of all common instructions except 64-bit multiplies
  Instruction cache / copy-back data cache
  Write buffer
  Pseudo-static operation with low power consumption

  Process       0.35 um    Transistors  2,500,000    MIPS    115/268
  Metal layers  3          Die area     50 mm^2      Power   300/1000 mW
  Vdd           1.65/2 V   Clock        100/233 MHz  MIPS/W  380/268

StrongARM core pipeline organization
[Figure: StrongARM core pipeline. Stages: fetch, instruction decode, execute, buffer/data, write-back. Labels: I-cache, I decode, register read, + disp, shift, mux, ALU & multiply, D-cache, rotate, rot/sgn ex, register write, forwarding paths, immediate fields, branch target, branch offset, next pc, reg shift, load/store address, +4, pc + 4, pc + 8, r15, pre-index, post-index, LDM/STM, and the PC sources B, BL; MOV pc; SUBS pc; LDR pc.]

StrongARM Processor (1/2)
SA-1100
  Intel SA-1 core
  16-Kbyte instruction cache and 8-Kbyte data cache
  MMU
  Read and write buffers
  512-byte mini data cache
[Figure: SA-1100 organization.
  CPU core: SA-1 core, instruction cache, data cache, mini-cache, instruction MMU, data MMU, read buffer, write buffer
  Clocks: clock PLL (3.6864 MHz), RTC oscillator (32.768 kHz)
  System bus: memory & PCMCIA control, LCD control, DMA control, bridge
  Peripheral bus: serial 0 to serial 4, reset control, interrupt control, OS timer, general-purpose I/O, power manager, RTC
  Pins: data (32), address (26), control, LCD (5), I/O pins (28), battery (3), reset (2), USB (2), SDLC (2), IrDA (2), UART (2), Codec (4)]

StrongARM Processor (2/2)
[Figure: SA-1100 die plot. Photo courtesy of Intel Corp.]

SA-1100 characteristics
  Process       0.35 um   Transistors  2,500,000    MIPS    220/250
  Metal layers  3         Die area     75 mm^2      Power   330/550 mW
  Vdd           1.5/2 V   Clock        190/220 MHz  MIPS/W  665/450

ARM9TDMI
Harvard architecture
  Increases available memory bandwidth
    Instruction memory interface
    Data memory interface
  Simultaneous accesses to instruction and data memory can be achieved
5-stage pipeline
Changes implemented to
  Improve CPI to ~1.5
  Improve maximum clock frequency

ARM9TDMI Organization
[Figure: ARM9TDMI pipeline organization. Stages: fetch, instruction decode, execute, buffer/data, write-back. Labels: I-cache, I decode, register read, shift, mul, mux, ALU, D-cache, byte repl., rot/sgn ex, register write, forwarding paths, immediate fields, next pc, reg shift, load/store address, +4, pc + 4, pc + 8, r15, pre-index, post-index, LDM/STM, and the PC sources B, BL, MOV pc; SUBS pc; LDR pc.]

ARM9TDMI Pipeline Operations (1/4)
[Figure: pipeline comparison.
  ARM7TDMI stages: Fetch, Decode, Execute. Labels: instruction fetch, Thumb decompress, ARM decode, reg read, shift/ALU, reg write.
  ARM9TDMI stages: Fetch, Decode, Execute, Memory, Write. Labels: instruction fetch, decode, r. read, shift/ALU, data memory access, reg write.]
The ARM9TDMI pipeline is much tighter and does not have sufficient slack time to allow Thumb instructions to be first translated into ARM instructions and then decoded
It has hardware to decode both ARM and Thumb instructions directly

ARM9TDMI Pipeline Operations (2/4)
Thumb instruction decompressor
  Translates a Thumb instruction into its equivalent ARM instruction

ARM9TDMI Pipeline Operations (3/4)
Example: Thumb ADD Rd, #imm8 maps to ARM ADDS Rd, Rd, #imm8

Thumb (16 bits)
  bits 15-13  001    major opcode, format 3: MOV/CMP/ADD/SUB with immediate
  bits 12-11  10     minor opcode denoting ADD
  bits 10-8   Rd     destination and source register
  bits 7-0    #imm8  immediate value

ARM (32 bits)
  bits 31-28  1110   "always" condition
  bits 27-25  001    data processing with immediate operand
  bits 24-21  0100   ADD
  bit 20      1      set CC
  bits 19-16  0 Rd   source register
  bits 15-12  0 Rd   destination register
  bits 11-8   0000   zero shift
  bits 7-0    #imm8  immediate value
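
The mapping in this example can be written out as bit manipulation. The following sketch handles only this one Thumb instruction form (format 3 with the ADD minor opcode) and is not a general decompressor; the function name is illustrative.

```python
def thumb_add_imm8_to_arm(halfword: int) -> int:
    """Translate Thumb 'ADD Rd, #imm8' into the ARM encoding of
    'ADDS Rd, Rd, #imm8'. Raises ValueError for any other instruction."""
    if not 0 <= halfword <= 0xFFFF:
        raise ValueError("not a 16-bit value")
    # Bits 15-11 must be 001 (format 3) followed by 10 (ADD).
    if (halfword >> 11) != 0b00110:
        raise ValueError("not a Thumb ADD Rd, #imm8 instruction")
    rd = (halfword >> 8) & 0x7   # 3-bit register number, zero-extended to 4 bits
    imm8 = halfword & 0xFF
    # 0xE2900000 = cond 1110, 00, I=1, opcode 0100 (ADD), S=1, zero rotate.
    return 0xE2900000 | (rd << 16) | (rd << 12) | imm8


if __name__ == "__main__":
    # Thumb ADD r1, #5 is 0x3105.
    print(hex(thumb_add_imm8_to_arm(0x3105)))
```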

ARM9TDMI Pipeline Operations (4/4)
Coprocessor support
  Floating-point, digital signal processing, special-purpose hardware accelerator
On-chip debugger
  Additional features compared to ARM7TDMI
    Hardware single stepping
    Breakpoints can be set on exceptions

ARM9TDMI characteristics
  Process       0.25 um   Transistors  110,000       MIPS    220
  Metal layers  3         Core area    2.1 mm^2      Power   150 mW
  Vdd           2.5 V     Clock        0 to 200 MHz  MIPS/W  1500

Cached ARM9TDMI Macrocell
ARM920T
  ARM9TDMI
  16KB instruction cache, 16KB data cache
  Full Memory Management Unit, Write Buffer
ARM922T
  ARM9TDMI
  8KB instruction cache, 8KB data cache
  Full Memory Management Unit, Write Buffer
ARM940T
  ARM9TDMI
  4KB instruction cache, 4KB data cache
  Protection Unit

ARM920T CPU Core

ARM9E-S Family Overview
ARM9E-S is based on an ARM9TDMI with the following extensions:
  Single-cycle 32x16 multiplier implementation
  EmbeddedICE logic RT
  Improved ARM/Thumb interworking
  New 32x16 and 16x16 multiply instructions
  New count leading zeros instruction
  New saturated math instructions
ARM946E-S
  ARM9E-S core
  Instruction and data caches, selectable sizes
  Instruction and data RAMs, selectable sizes
  Protection unit
  AHB bus interface
Architecture v5TE
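
To make the count leading zeros and saturated math extensions listed above concrete, here is a minimal Python model of what such operations compute on 32-bit values. It is a behavioral sketch only: the function names are illustrative, and the sticky saturation flag that the hardware also updates is not modeled.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def clz32(x: int) -> int:
    """Count leading zero bits of a 32-bit value; a zero input gives 32."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()


def qadd32(a: int, b: int) -> int:
    """Saturating signed 32-bit add: the result is clamped to the
    representable range instead of wrapping around."""
    return max(INT32_MIN, min(INT32_MAX, a + b))


if __name__ == "__main__":
    print(clz32(1), clz32(0x80000000), clz32(0))
    print(hex(qadd32(INT32_MAX, 1)))  # stays at the maximum instead of wrapping
```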

ARM10TDMI (1/2)
Performance on the same IC process
  ARM10TDMI = 2 x ARM9TDMI; ARM9TDMI = 2 x ARM7TDMI
300 MHz on 0.25 um CMOS technology
Increase clock rate
  6-stage pipeline
[Figure: ARM10TDMI pipeline. Stages: Fetch, Issue, Decode, Execute, Memory, Write. Labels: instruction fetch, branch prediction, decode, r. read, shift/ALU, addr. calc., multiply, multiplier partials add, data memory access, data write, reg write.]

ARM10TDMI (2/2)
Reduce CPI
  Branch prediction
  Non-blocking load and store execution
  64-bit data memory
    Transfers 2 registers in each cycle
  4 read ports and 3 write ports

ARM1020T Overview
Architecture v5T
  ARM1020E will be v5TE
CPI ~ 1.3
6-stage pipeline
Static branch prediction
32KB instruction and 32KB data caches
  Hit-under-miss support
64 bits per cycle LDM/STM operations
EmbeddedICE Logic RT-II
Support for new VFPv1 architecture
ARM10200 test chip
  ARM1020T
  VFP10
  SDRAM memory interface
  PLL

ARM1136J(F)-S
First implementation of the ARMv6 architecture
  ARM1136J-S: integer-only core
  ARM1136JF-S: with integrated floating point
High-speed pipeline microarchitecture
  8 stages
System-level flexibility
Low power
  Microarchitecture designed for low power
  Power management modes
Availability
  Delivering to first licensees in December 2002
The ARM11 core has been developed and integrated in parallel with the ARM11 PrimeXsys Platform to ensure a fully compatible, high-performance, extendable system solution

Memory Hierarchy

Memory Size and Speed
[Figure: memory hierarchy, from registers through on-chip cache memory, 2nd-level off-chip cache and main memory to hard disk. Along this order, access time goes from fast to slow, capacity from small to large, and cost from expensive to cheap.]

Caches (1/2)
A cache memory is a small, very fast memory that retains copies of recently used memory values.
It is usually implemented on the same chip as the processor.
Caches work because programs normally display the property of locality, which means that at any particular time they tend to execute the same instructions many times on the same areas of data.
An access to an item which is in the cache is called a hit, and an access to an item which is not in the cache is a miss.
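
The effect of locality on hits and misses can be seen with a toy model. The sketch below assumes a small fully associative cache with least-recently-used replacement, which is a choice made here for brevity and not an organization described in the slides; the function name `count_hits` is illustrative.

```python
from collections import OrderedDict


def count_hits(addresses, capacity):
    """Replay an address trace through a tiny fully associative cache with
    LRU replacement (an assumption of this sketch) and count hits and misses."""
    cache = OrderedDict()
    hits = misses = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            misses += 1
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits, misses


if __name__ == "__main__":
    # A loop that touches the same three words ten times shows strong locality.
    loop_trace = [0x0, 0x4, 0x8] * 10
    print(count_hits(loop_trace, capacity=4))
```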

Caches (2/2)
A processor can have one of the following two organizations:
  A unified cache
    This is a single cache for both instructions and data
  Separate instruction and data caches
    This organization is sometimes called a modified Harvard architecture

Unified instruction and data cache
[Figure: a processor (with registers) sends an address to a single cache holding copies of instructions and copies of data; instructions and data return to the processor. The cache sits in front of a memory spanning 0x00..00 to 0xFF..FF that holds both instructions and data.]

Separate data and instruction caches
[Figure: a processor (with registers) sends instruction addresses to an instruction cache holding copies of instructions, and data addresses to a data cache holding copies of data. Both caches sit in front of a memory spanning 0x00..00 to 0xFF..FF that holds instructions and data.]

Cache Write Strategies
Write-through
  All write operations are passed to main memory
Write-through with buffered write
  All write operations are still passed to main memory and the cache is updated as appropriate, but instead of slowing the processor down to main memory speed, the write address and data are stored in a write buffer which can accept the write information at high speed
Copy-back (write-back)
  Not kept coherent with main memory
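
The difference in main memory traffic between write-through and copy-back can be illustrated with a very small counting model. It assumes every written location stays in the cache until a single final flush, so copy-back writes each dirty location to memory once; real caches work on lines and evict earlier. The function name is illustrative.

```python
def memory_writes(write_addresses, policy):
    """Count main memory writes caused by a sequence of processor writes.
    Assumption: all written locations stay cached until one final flush."""
    if policy == "write-through":
        # Every write is passed on to main memory (directly or via a write buffer).
        return len(write_addresses)
    if policy == "copy-back":
        # Memory is updated only when dirty data is finally written back.
        return len(set(write_addresses))
    raise ValueError("unknown policy: " + policy)


if __name__ == "__main__":
    trace = [0x100] * 5 + [0x104] * 3
    print(memory_writes(trace, "write-through"))
    print(memory_writes(trace, "copy-back"))
```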

Software Development

Main Components in ADS (ARM Developer Suite)
  ANSI C compilers: armcc and tcc
  ISO/Embedded C++ compilers: armcpp and tcpp
  ARM/Thumb assembler: armasm
  Linker: armlink
  Project management tool for Windows: CodeWarrior
  Instruction set simulator: ARMulator
  Debuggers: AXD, ADW, ADU and armsd
  Format converter: fromelf
  Librarian: armar
  ARM profiler: armprof
  C and C++ libraries
  ROM-based debug tools (ARM Firmware Suite, AFS)
  Real Time Debug and Trace support
  Support for all ARM cores and processors including ARM9E, ARM10, Jazelle, StrongARM and Intel XScale

ARM C Compiler
Compiler is compliant with the ANSI standard for C
Supported by the appropriate library of functions
Uses the ARM Procedure Call Standard (APCS) for all external functions
  For procedure entry and exit
May produce assembly source output
  Can be inspected, hand optimized and then assembled subsequently
Can also produce Thumb code

ARM Linker
Takes one or more object files and combines them
Resolves symbolic references between the object files and extracts the object modules from libraries
Normally the linker includes debug tables in the output file

ARM Symbolic Debugger
A front-end interface for debugging a program running either under emulation (on the ARMulator) or remotely on an ARM development board (via a serial line or through the JTAG test interface)
ARMsd allows an executable program to be loaded into the ARMulator or a development board and run. It allows the setting of
  Breakpoints: addresses in the code
  Watchpoints: memory addresses that are accessed as data addresses
  Either causes execution to halt so that the processor state can be examined

ARM Emulator: ARMulator
A suite of programs that models the behavior of various ARM processor cores and system architectures in software on a host system
Can operate at various levels of accuracy
  Instruction accurate
  Cycle accurate
  Timing accurate
Benchmarking before hardware is available
  Instruction count or number of cycles can be measured for a program
  Performance analysis
Run software on ARMulator
  Through ARMsd or ARM GUI debuggers, e.g., AXD
  The processor core model incorporates the remote debug interface, so the processor and the system state are visible from the ARM symbolic debugger
  Supports a C library to allow complete C programs to run on the simulated system

ARM Development Board
A circuit board including an ARM core (e.g. ARM9TDMI), memory components, I/O and electrically programmable devices
It can support both hardware and software development before the final application-specific hardware is available

ARM Integrator
A motherboard with some extensions to support the development of applications
Provides core modules, logic modules (Xilinx Virtex FPGA, Altera APEX FPGA), OS, input/output resources, bus arbitration, interrupt handling

Summary (1/2)
ARM Processor Family

  Processor family  # of pipeline stages  Memory organization  Clock rate  MIPS/MHz
  ARM6              3                     Von Neumann          25 MHz      (not given)
  ARM7              3                     Von Neumann          66 MHz      0.9
  ARM8              5                     Von Neumann          72 MHz      1.2
  ARM9              5                     Harvard              200 MHz     1.1
  ARM10             6                     Harvard              400 MHz     1.25
  StrongARM         5                     Harvard              233 MHz     1.15
  ARM11             8                     Von Neumann/Harvard  550 MHz     1.2
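
The clock rate and MIPS/MHz columns of the summary table can be combined into an approximate peak MIPS figure. The sketch below assumes the usual reading of the table (one MIPS/MHz value per family, with none given for ARM6); the dictionary and function names are illustrative.

```python
# (clock rate in MHz, MIPS/MHz) per family, read from the summary table.
FAMILIES = {
    "ARM7": (66, 0.9),
    "ARM8": (72, 1.2),
    "ARM9": (200, 1.1),
    "ARM10": (400, 1.25),
    "StrongARM": (233, 1.15),
    "ARM11": (550, 1.2),
}


def approx_mips(clock_mhz: float, mips_per_mhz: float) -> float:
    """Approximate MIPS as clock rate times MIPS/MHz."""
    return clock_mhz * mips_per_mhz


if __name__ == "__main__":
    for name, (clock, ratio) in FAMILIES.items():
        print(name, round(approx_mips(clock, ratio)))
```

For ARM9 and StrongARM this gives about 220 and 268 MIPS, in line with the figures quoted on the ARM9TDMI and StrongARM slides.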

Summary (2/2)
Memory hierarchy
  Unified cache / separate instruction and data caches
  Write-through with buffered write
Software development
  CodeWarrior IDE
    armcc/tcc/armcpp/tcpp
    armasm
    armlink
    armprof
  AXD (ARM eXtended Debugger)
    armsd
  ARMulator
  ARM Integrator

References
[1] http://twins.ee.nctu.edu.tw/courses/ip_core_02/index.html
[2] S. Furber, ARM System-on-Chip Architecture, Second Edition, Addison Wesley Longman, ISBN 0-201-67519-6.
[3] D. Seal (ed.), Architecture Reference Manual, Second Edition, Addison Wesley Longman, ISBN 0-201-73719-1.
[4] www.arm.com