ARM Processor Architecture (II)

Speaker: Lung-Hao Chang (張龍豪)
Advisor: Prof. Andy Wu (吳安宇)
Graduate Institute of Electronics Engineering, National Taiwan University
Modified from National Chiao-Tung University IP Core Design course



  • SoC Consortium Course Material, SoC Design Laboratory, 03/17/2004

    Outline
    - ARM processor core
    - Memory hierarchy
    - Software development
    - Summary

  • ARM Processor Core

  • ARM7TDMI Processor Core

    Current low-end ARM core, used in applications such as digital mobile phones.
    TDMI:
    - T: Thumb, a 16-bit compressed instruction set
    - D: on-chip Debug support, enabling the processor to halt in response to a debug request
    - M: enhanced Multiplier, yielding a full 64-bit result with high performance
    - I: EmbeddedICE hardware
    Von Neumann architecture
    3-stage pipeline
    CPI ~ 1.9

  • ARM7TDMI Block Diagram

    [Figure: ARM7TDMI block diagram, showing the processor core, the EmbeddedICE logic (extern0, extern1), the JTAG TAP controller (TCK, TMS, TRST, TDI, TDO), scan chains 0, 1 and 2, and a bus splitter between the bidirectional D[31:0] and the separate Din[31:0] and Dout[31:0] buses. Other external signals include A[31:0], opc, r/w, mreq, trans and mas[1:0].]

  • ARM7TDMI Core Diagram

    [Figure: ARM7TDMI core diagram]

  • ARM7TDMI Interface Signals (1/5)

    [Figure: ARM7TDMI interface signals, grouped into clock control (mclk, wait, eclk), configuration (bigend), interrupts (fiq, irq, isync), initialization (reset), bus control, debug (dbgrq, breakpt, dbgack, exec, extern0, extern1, dbgen, rangeout0, rangeout1, dbgrqi, commrx, commtx), power (Vdd, Vss), memory interface (A[31:0], D[31:0], Din[31:0], Dout[31:0], mreq, seq, lock, r/w, mas[1:0], bl[3:0]), MMU interface (mode[4:0], trans, abort), state (Tbit), coprocessor interface (opc, cpi, cpa, cpb), JTAG controls (TRST, TCK, TMS, TDI, TDO), TAP information (tapsm[3:0], ir[3:0], tdoen, tck1, tck2, screg[3:0]) and boundary-scan extension.]

  • ARM7TDMI Interface Signals (2/5)

    Clock control
    - All state changes within the processor are controlled by mclk, the memory clock.
    - Internal clock = mclk AND nWAIT, so the memory system can stretch a cycle by holding nWAIT low (an n prefix marks an active-low signal).
    - eclk: clock output reflecting the clock actually used by the core.
    Memory interface
    - 32-bit address bus (A[31:0]), bidirectional data bus (D[31:0]), and separate data-out (Dout[31:0]) and data-in (Din[31:0]) buses.
    - nMREQ indicates a processor cycle which requires a memory access.
    - seq indicates that the memory address will be sequential to the one used in the previous cycle.

      nMREQ  seq  Cycle  Use
      0      0    N      Non-sequential memory access
      0      1    S      Sequential memory access
      1      0    I      Internal cycle: bus and memory inactive
      1      1    C      Coprocessor register transfer: memory inactive
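    The (nMREQ, seq) table above is effectively a lookup. A minimal Python sketch, assuming pin levels are given as 0/1 integers; the function names are illustrative only, and the clock gate simply restates "internal clock = mclk AND nWAIT":

```python
# Cycle types of the ARM7TDMI memory interface, indexed by (nMREQ, seq).
CYCLE_TYPES = {
    (0, 0): ("N", "non-sequential memory access"),
    (0, 1): ("S", "sequential memory access"),
    (1, 0): ("I", "internal cycle: bus and memory inactive"),
    (1, 1): ("C", "coprocessor register transfer: memory inactive"),
}


def decode_cycle(nmreq: int, seq: int) -> str:
    """Return the one-letter cycle type for the given pin levels."""
    return CYCLE_TYPES[(nmreq, seq)][0]


def needs_memory(nmreq: int) -> bool:
    # nMREQ is active low: only N and S cycles (nMREQ = 0) use memory.
    return nmreq == 0


def core_clock(mclk: int, nwait: int) -> int:
    # Holding nWAIT low stops the internal clock, stretching the cycle.
    return mclk & nwait


for pins, (letter, use) in sorted(CYCLE_TYPES.items()):
    print(pins, letter, use)
```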

  • ARM7TDMI Interface Signals (3/5)

    - lock: indicates that the processor should keep the bus, to ensure the atomicity of the read and write phases of a SWP (swap) instruction.
    - nRW: distinguishes a read cycle (low) from a write cycle (high).
    - mas[1:0]: encodes the memory access size: byte, half-word or word.
    - bl[3:0]: externally controlled enables on the latches for each of the 4 bytes of the data input bus.
    MMU interface
    - nTRANS (translation control): 0 = user mode, 1 = privileged mode.
    - nM[4:0] (mode[4:0]): the bottom 5 bits of the CPSR, inverted.
    - abort: the memory system disallows the access.
    State
    - T bit: whether the processor is currently executing ARM or Thumb instructions.
    Configuration
    - bigend: selects big-endian or little-endian operation.
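    To see how mas[1:0] and bigend interact, here is a Python sketch that works out which byte lanes of the data bus an access uses. It assumes the ARM7TDMI encoding 00 = byte, 01 = half-word, 10 = word (11 reserved) and the word-invariant big-endian scheme of ARMv4; `byte_lanes` is a hypothetical helper, with lane 0 = D[7:0] and lane 3 = D[31:24].

```python
MAS_SIZE = {0b00: 1, 0b01: 2, 0b10: 4}   # 0b11 is reserved


def byte_lanes(address: int, mas: int, bigend: bool) -> list:
    """Data-bus byte lanes (0 = D[7:0] ... 3 = D[31:24]) used by an access."""
    size = MAS_SIZE[mas]                 # KeyError for the reserved code
    offset = address & 3
    if offset % size:
        raise ValueError("unaligned access")
    lanes = [offset + i for i in range(size)]
    if bigend:
        # Big-endian: the byte at offset k within a word is on lane 3 - k.
        lanes = [3 - lane for lane in lanes]
    return sorted(lanes)


print(byte_lanes(0x1002, 0b01, bigend=False))  # [2, 3]
print(byte_lanes(0x1002, 0b01, bigend=True))   # [0, 1]
```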

  • ARM7TDMI Interface Signals (4/5)

    Interrupts
    - nFIQ: fast interrupt request, higher priority.
    - nIRQ: normal interrupt request.
    - isync: allows the interrupt synchronizer to be bypassed when the interrupt inputs are already synchronous to the processor clock.
    Initialization
    - nRESET: starts the processor from a known state, executing from address 0x00000000.
    Debug support
    - EmbeddedICE contains the breakpoint and watchpoint registers; processor registers may be inspected.
    Debug interface
    - dbgen: lets external hardware enable the debug logic.
    - dbgrq: asynchronous debug request.
    - breakpt: instruction-synchronous debug request.
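    The priority rules can be sketched in Python as below. The request flags are active-high here even though nRESET, nFIQ and nIRQ are active-low pins; f_masked and i_masked stand for the CPSR F and I bits, and the vector addresses are the standard ARM low vectors. Other exceptions (aborts, SWI, undefined instruction) are left out of this simplified model.

```python
# Standard ARM low-vector addresses for the exceptions modeled here.
VECTORS = {"reset": 0x00000000, "irq": 0x00000018, "fiq": 0x0000001C}


def select_exception(reset: bool, fiq: bool, irq: bool,
                     f_masked: bool, i_masked: bool):
    """Return (name, vector) of the request to take, or None."""
    if reset:                    # reset cannot be masked
        return "reset", VECTORS["reset"]
    if fiq and not f_masked:     # FIQ has higher priority than IRQ
        return "fiq", VECTORS["fiq"]
    if irq and not i_masked:
        return "irq", VECTORS["irq"]
    return None


print(select_exception(False, True, True, f_masked=False, i_masked=False))
```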

  • ARM7TDMI Interface Signals (5/5)

    Coprocessor interface
    - nCPI: ARM has identified a coprocessor instruction.
    - cpa (coprocessor absent): tells ARM that there is no coprocessor present to accept the instruction.
    - cpb (coprocessor busy): the coprocessor cannot yet begin executing the instruction.
    - nOPC: whether a memory access is an instruction fetch or a data access.
    Power
    - 5 V or 3 V supply, depending on the process technology and the circuit design.
    JTAG (Joint Test Action Group)
    - Testing the circuitry on the chip itself.

    ARM7TDMI characteristics
      Process       0.35 µm     Transistors   74,209
      MIPS          60          Metal layers  3
      Core area     2.1 mm²     Power         87 mW
      Vdd           3.3 V       Clock         0 to 66 MHz
      MIPS/W        690
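    The cpa/cpb handshake above decides what happens to a coprocessor instruction. A minimal Python sketch of that decision, sampled once per cycle (1 = signal asserted); it deliberately ignores interrupts arriving during a busy-wait, and the outcome strings are illustrative:

```python
def coprocessor_response(cpa: int, cpb: int) -> str:
    """One-cycle view of the handshake after nCPI has been asserted."""
    if cpa:
        # No coprocessor accepts the instruction: undefined-instruction trap.
        return "undefined"
    if cpb:
        # A coprocessor is present but busy: ARM waits and samples again.
        return "busy-wait"
    return "execute"


def resolve(samples):
    """Feed successive (cpa, cpb) samples; return (outcome, cycles waited)."""
    waited = 0
    for cpa, cpb in samples:
        outcome = coprocessor_response(cpa, cpb)
        if outcome != "busy-wait":
            return outcome, waited
        waited += 1
    raise RuntimeError("coprocessor still busy at the end of the trace")


print(resolve([(0, 1), (0, 1), (0, 0)]))  # ('execute', 2)
```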

  • Memory Access

    The ARM7 is a Von Neumann, load/store architecture, i.e.:
    - a single 32-bit data bus carries both instructions and data;
    - only the load/store instructions (and SWP) access memory.
    Memory is addressed as a 32-bit address space.
    Data types can be 8-bit bytes, 16-bit half-words or 32-bit words; memory may be seen as a line of bytes folded into 4-byte words.
    Words must be aligned to 4-byte boundaries, and half-words to 2-byte boundaries.
    Always ensure that the memory controller supports all three access sizes.
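    The alignment rules reduce to a modulo test. A small Python sketch (helper names are hypothetical), which also shows the "bytes folded into 4-byte words" view as a word index plus a byte offset:

```python
ACCESS_SIZES = {"byte": 1, "half-word": 2, "word": 4}


def is_aligned(address: int, kind: str) -> bool:
    """True if address is a legal start address for this access size."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address outside the 32-bit address space")
    return address % ACCESS_SIZES[kind] == 0


def word_and_offset(address: int) -> tuple:
    """Split an address into (word index, byte offset within the word)."""
    return address >> 2, address & 3


print(is_aligned(0x1002, "half-word"), is_aligned(0x1002, "word"))  # True False
```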

  • ARM Memory Interface

    Sequential (S cycle), (nMREQ, SEQ) = (0, 1)
    - The ARM core requests a transfer to or from an address which is either the same as, or one word or one half-word greater than, the preceding address.
    Non-sequential (N cycle), (nMREQ, SEQ) = (0, 0)
    - The ARM core requests a transfer to or from an address which is unrelated to the address used in the preceding cycle.
    Internal (I cycle), (nMREQ, SEQ) = (1, 0)
    - The ARM core does not require a transfer, as it is performing an internal function, and no useful prefetching can be performed at the same time.
    Coprocessor register transfer (C cycle), (nMREQ, SEQ) = (1, 1)
    - The ARM core wishes to use the data bus to communicate with a coprocessor, but does not require any action by the memory system.
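    In hardware the core drives SEQ itself, but it can be useful to check an address trace against the S/N rule above. A Python sketch, assuming the trace holds only memory cycles as (address, size in bytes) pairs and treating only half-word and word increments as sequential, as the slide describes:

```python
def classify(accesses):
    """Return 'S' or 'N' for each (address, size_in_bytes) memory access."""
    kinds = []
    prev = None
    for address, size in accesses:
        sequential = prev is not None and (
            address == prev or (size in (2, 4) and address - prev == size)
        )
        kinds.append("S" if sequential else "N")
        prev = address
    return kinds


# Three straight-line fetches, then a branch to 0x9000.
fetches = [(0x8000, 4), (0x8004, 4), (0x8008, 4), (0x9000, 4), (0x9004, 4)]
print("".join(classify(fetches)))  # NSSNS
```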

  • Cached ARM7TDMI Macrocells

    - ARM710T: 8K unified write-through cache; full memory management unit supporting virtual memory; write buffer.
    - ARM720T: as ARM710T, but with WinCE support.
    - ARM740T: 8K unified write-through cache; memory protection unit; write buffer.

    [Figure: ARM710T organization, showing the ARM core, EmbeddedICE & JTAG, CP15, the instruction & data cache, MMU, write buffer and AMBA interface (AMBA address and data buses), with virtual addresses from the core and physical addresses to the bus.]

  • Processor Core vs. CPU Core

    Processor core
    - The engine that fetches instructions and executes them.
    - E.g.: ARM7TDMI, ARM9TDMI, ARM9E-S
    CPU core
    - Consists of the ARM processor core and some tightly coupled function blocks, such as cache and memory management blocks.
    - E.g.: ARM710T, ARM720T, ARM740T, ARM920T, ARM922T, ARM940T, ARM946E-S and ARM966E-S

    [Figure: ARM710T CPU core, built from the ARM7TDMI processor core with EmbeddedICE & JTAG, CP15, MMU, instruction & data cache, write buffer and AMBA interface; virtual addresses from the core, physical addresses to the AMBA bus.]

  • ARM8

    Higher performance than ARM7:
    - by increasing the clock rate;
    - by reducing the CPI.
    Higher memory bandwidth: 64-bit wide memory; separate memories for instruction and data access.

    [Figure: ARM8 core organization: double-bandwidth memory, prefetch unit, integer unit and coprocessor(s), connected by addresses, instructions, PC, read data, write data, CP instructions and CP data.]

    Core organization
    - The prefetch unit is responsible for fetching instructions from memory and buffering them (exploiting the double-bandwidth memory).
    - It is also responsible for branch prediction, using static prediction based on the branch direction:
      - backward: predicted taken;
      - forward: predicted not taken.
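    Backward-taken/forward-not-taken prediction needs only the sign of the branch offset. A Python sketch, assuming a branch to its own address counts as backward (as a tight loop would be); the branch traces are made up for illustration:

```python
def predict_taken(branch_address: int, target: int) -> bool:
    """Backward (or self) branches are predicted taken, forward ones not."""
    return target <= branch_address


def accuracy(trace) -> float:
    """trace: iterable of (branch_address, target, actually_taken)."""
    trace = list(trace)
    correct = sum(predict_taken(pc, tgt) == taken for pc, tgt, taken in trace)
    return correct / len(trace)


# A loop-closing branch taken 9 times, then falling through once.
loop_branch = [(0x8010, 0x8000, True)] * 9 + [(0x8010, 0x8000, False)]
# A forward branch that skips code 3 times out of 4.
skip_branch = [(0x9000, 0x9020, True)] * 3 + [(0x9000, 0x9020, False)]
print(accuracy(loop_branch), accuracy(skip_branch))  # 0.9 0.25
```

    The loop case is why the backward-taken rule pays off: loop-closing branches dominate, and they are taken on every iteration but the last.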

  • Pipeline Organization

    5-stage: the prefetch unit occupies the 1st stage, the integer unit occupies the remainder.

    (1) Instruction prefetch (prefetch unit)

    (2) Instruction decode and register read (integer unit)

    (3) Execute (shift and ALU)

    (4) Data memory access

    (5) Write back results
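    The overlap of the five stages can be tabulated cycle by cycle. This Python sketch assumes an ideal pipeline with no stalls, so n instructions finish in n + 5 - 1 cycles:

```python
STAGES = ["prefetch", "decode/reg read", "execute", "memory", "write back"]


def pipeline_table(n_instructions: int, stages=STAGES):
    """Per-cycle map of stage name -> instruction label (ideal pipeline)."""
    k = len(stages)
    table = []
    for cycle in range(n_instructions + k - 1):
        row = {}
        for s, name in enumerate(stages):
            i = cycle - s  # the instruction that entered stage 0 s cycles ago
            if 0 <= i < n_instructions:
                row[name] = f"I{i}"
        table.append(row)
    return table


for cycle, row in enumerate(pipeline_table(3)):
    print(cycle, row)
```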

  • Integer Unit Organization

    [Figure: ARM8 integer unit organization across the decode, execute, memory and write stages: instruction decode, register read, multiplier, ALU/shifter, rotate/sign-extend, write pipeline and register write, with forwarding paths, a +4 incrementer and PC+8; inputs are instructions and coprocessor instructions, and the memory-side interfaces carry address, write data, read data and coprocessor data.]

  • ARM8 Macrocell

    [Figure: ARM810 macrocell: ARM8 integer unit and prefetch unit, 8 Kbyte double-bandwidth cache, CP15, MMU with TLB, write buffer, address buffer, copy-back tag and copy-back data paths, and JTAG; the core issues virtual addresses and the MMU produces physical addresses.]

    ARM810
    - 8 Kbyte unified instruction and data cache
    - Copy-back
    - Double-bandwidth
    - MMU
    - Coprocessor
    - Write buffer

    ARM810 characteristics
      Process       0.5 µm      Transistors   836,022
      MIPS          86          Metal layers  3
      Die area      76 mm²      Power         500 mW
      Vdd           3.3 V       Clock         0 to 72 MHz
      MIPS/W        172
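    The MIPS/W row is MIPS divided by power in watts. A Python check against two of the characteristics tables (the slides round the results):

```python
def mips_per_watt(mips: float, power_mw: float) -> float:
    """Energy efficiency: MIPS divided by power in watts."""
    return mips / (power_mw / 1000.0)


# (MIPS, power in mW) from the characteristics tables.
QUOTED = {"ARM7TDMI": (60, 87), "ARM810": (86, 500)}
for name, (mips, mw) in QUOTED.items():
    print(f"{name}: {mips_per_watt(mips, mw):.0f} MIPS/W")
```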

  • StrongARM

    The first ARM processor to use a modified-Harvard (separate instruction and data cache) architecture; now available from Intel.
    Features
    - A 5-stage pipeline with register forwarding
    - Single-cycle execution of all common instructions except 64-bit multiplies
    - Instruction cache and copy-back data cache
    - Write buffer
    - Pseudo-static operation with low power consumption

    StrongARM characteristics
      Process       0.35 µm     Transistors   2,500,000
      MIPS          115/268     Metal layers  3
      Die area      50 mm²      Power         300/1000 mW
      Vdd           1.65/2 V    Clock         100/233 MHz
      MIPS/W        380/268

  • StrongARM core pipeline organization

    [Figure: StrongARM core pipeline across the fetch, instruction decode, execute, buffer/data and write-back stages: I-cache, I decode, register read with displacement, shifter, ALU & multiply, D-cache, rotate/sign-extend, register write and forwarding paths. Next-PC selection chooses among pc + 4, pc + 8, the branch target (B, BL) and the results of LDR pc, SUBS pc, MOV pc and LDM/STM; load/store addresses use pre- or post-indexing.]

  • StrongARM Processor (1/2)

    [Figure: SA-1100 block diagram. The CPU core (SA-1 core, instruction cache, data cache, mini data cache, instruction and data MMUs, read and write buffers) sits on the system bus with the memory & PCMCIA controller, LCD controller and DMA controller; a bridge connects to the peripheral bus with serial ports 0 to 4 (USB, SDLC, IrDA, UART, codec), reset control, interrupt control, OS timer, general-purpose I/O, power manager and RTC. Clocks come from a 3.6864 MHz crystal via a PLL and a 32.768 kHz RTC oscillator.]

    SA-1100
    - Intel SA-1 core
    - 16-Kbyte instruction cache and 8-Kbyte data cache
    - MMU
    - Read and write buffers
    - 512-byte mini-data cache

  • StrongARM Processor (2/2)

    [Figure: SA-1100 die plot. Photo courtesy of Intel Corp.]

    SA-1100 characteristics
      Process       0.35 µm     Transistors   2,500,000
      MIPS          220/250     Metal layers  3
      Die area      75 mm²      Power         330/550 mW
      Vdd           1.5/2 V     Clock         190/220 MHz
      MIPS/W        665/450

  • ARM9TDMI

    Harvard architecture
    - Increases the available memory bandwidth, with separate instruction memory and data memory interfaces.
    - Simultaneous accesses to instruction and data memory can be achieved.
    5-stage pipeline
    Changes implemented to:
    - improve CPI to ~1.5;
    - improve the maximum clock frequency.
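    One way to see how forwarding keeps CPI close to 1 is a small issue-cycle model. The Python sketch below assumes ALU results are forwarded with no stall while a loaded value arrives one cycle late, so an instruction that uses it immediately stalls; branches and pipeline fill are ignored, and the `Instr` record is a hypothetical representation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    text: str              # assembly text, for display only
    op: str                # "alu", "load" or "store"
    dest: Optional[int]    # destination register, or None
    srcs: tuple = ()       # source registers


def issue_cycles(program, load_use_penalty: int = 1) -> int:
    """Issue cycles for straight-line code (pipeline fill ignored)."""
    cycles = 0
    prev = None
    for ins in program:
        cycles += 1
        # Assumed interlock: a load result is not ready for the next instruction.
        if prev is not None and prev.op == "load" and prev.dest in ins.srcs:
            cycles += load_use_penalty
        prev = ins
    return cycles


program = [
    Instr("LDR r1, [r0]", "load", 1, (0,)),
    Instr("ADD r2, r1, #1", "alu", 2, (1,)),
    Instr("LDR r3, [r0, #4]", "load", 3, (0,)),
    Instr("ADD r4, r4, #1", "alu", 4, (4,)),
    Instr("ADD r5, r3, r4", "alu", 5, (3, 4)),
]
scheduled = [program[0], program[2], program[1], program[3], program[4]]
for code in (program, scheduled):
    print(f"CPI = {issue_cycles(code) / len(code):.2f}")  # 1.20, then 1.00
```

    Moving the second load up hides the load-use stall, which is the kind of scheduling a compiler can do for a pipeline like this.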

  • ARM9TDMI Organization

    [Figure: ARM9TDMI organization across the fetch, instruction decode, execute, buffer/data and write-back stages: I-cache, I decode, register read, shifter, ALU, multiplier, D-cache with byte replication and rotate/sign-extend, register write and forwarding paths. Next-PC selection chooses among pc + 4, pc + 8, B/BL/MOV pc, LDR pc, SUBS pc and LDM/STM; load/store addresses use pre- or post-indexing.]

  • ARM9TDMI Pipeline Operations (1/4)

    Stage     ARM7TDMI (3-stage)                   ARM9TDMI (5-stage)
    Fetch     instruction fetch                    instruction fetch
    Decode    Thumb decompress, ARM decode         decode, register read
    Execute   register read, shift/ALU,            shift/ALU
              register write
    Memory    (none)                               data memory access
    Write     (none)                               register write

    The ARM9TDMI pipeline is much tighter and does not have sufficient slack time to allow Thumb instructions to be first translated into ARM instructions and then decoded. Instead, it has hardware to decode both ARM and Thumb instructions directly.

  • ARM9TDMI Pipeline Operations (2/4)

    Thumb instruction decompressor: translates a Thumb instruction into its equivalent ARM instruction.

  • ARM9TDMI Pipeline Operations (3/4)

    Example: Thumb ADD Rd, #imm8 expands to ARM ADDS Rd, Rd, #imm8

    Thumb (16 bits)
      bits   15-13  12-11  10-8  7-0
             001    10     Rd    #imm8
      001:   major opcode, format 3: MOV/CMP/ADD/SUB with immediate
      10:    minor opcode denoting ADD
      Rd:    destination and source register
      #imm8: immediate value

    ARM (32 bits)
      bits   31-28  27-26  25  24-21  20  19-16  15-12  11-8  7-0
             1110   00     1   0100   1   0Rd    0Rd    0000  #imm8
      1110:  always condition
      0100:  ADD; bit 20 = 1: set condition codes (CC)
      0Rd:   source and destination registers (Rd extended to 4 bits)
      0000:  zero shift (rotate) of the immediate value
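    The decompressor's job for format 3 is pure bit rearrangement. A Python sketch covering all four format 3 operations (MOV, CMP, ADD, SUB), using the standard ARM data-processing opcodes; the function name is illustrative, not part of any ARM tool:

```python
# ARM data-processing opcodes for the Thumb format 3 minor opcodes.
ARM_OPCODE = {0b00: 0b1101,   # MOV
              0b01: 0b1010,   # CMP
              0b10: 0b0100,   # ADD
              0b11: 0b0010}   # SUB


def decompress_format3(thumb: int) -> int:
    """Expand a Thumb 'op Rd, #imm8' into the equivalent 32-bit ARM word."""
    if thumb >> 13 != 0b001:
        raise ValueError("not a format 3 Thumb instruction")
    minor = (thumb >> 11) & 0b11
    rd = (thumb >> 8) & 0b111
    imm8 = thumb & 0xFF
    rn = 0 if minor == 0b00 else rd     # MOV has no first operand
    dest = 0 if minor == 0b01 else rd   # CMP writes no register
    return (0b1110 << 28                # condition: always
            | 1 << 25                   # immediate second operand
            | ARM_OPCODE[minor] << 21
            | 1 << 20                   # S bit: set condition codes
            | rn << 16
            | dest << 12
            | 0 << 8                    # rotate = 0 (zero shift)
            | imm8)


print(hex(decompress_format3(0x3001)))  # ADD r0, #1 -> 0xe2900001
```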

  • ARM9TDMI Pipeline Operations (4/4)

    Coprocessor support
    - Floating-point, digital signal processing and special-purpose hardware accelerators
    On-chip debugger: additional features compared to ARM7TDMI
    - Hardware single stepping
    - Breakpoints can be set on exceptions

    ARM9TDMI characteristics
      Process       0.25 µm     Transistors   110,000
      MIPS          220         Metal layers  3
      Core area     2.1 mm²     Power         150 mW
      Vdd           2.5 V       Clock         0 to 200 MHz
      MIPS/W        1500

  • Cached ARM9TDMI Macrocells

    - ARM920T: ARM9TDMI; 16KB instruction cache, 16KB data cache; full memory management unit; write buffer
    - ARM922T: ARM9TDMI; 8KB instruction cache, 8KB data cache; full memory management unit; write buffer
    - ARM940T: ARM9TDMI; 4KB instruction cache, 4KB data cache; protection unit

  • ARM920T CPU Core

    [Figure: ARM920T CPU core]

  • ARM9E-S Family Overview

    ARM9E-S is based on an ARM9TDMI with the following extensions:
    - Single-cycle 32×16 multiplier implementation
    - EmbeddedICE logic RT
    - Improved ARM/Thumb interworking
    - New 32×16 and 16×16 multiply instructions
    - New count leading zeros instruction
    - New saturated math instructions
    ARM946E-S
    - ARM9E-S core
    - Instruction and data caches, selectable sizes
    - Instruction and data RAMs, selectable sizes
    - Protection unit
    - AHB bus interface
    Architecture v5TE
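    The new count-leading-zeros and saturating instructions have simple reference semantics. Python sketches of CLZ, QADD and SMULBB (one of the 16×16 multiplies) follow, assuming inputs already lie in the signed 32-bit range; the Q flag that QADD sets on saturation is not modeled:

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def clz(x: int) -> int:
    """Count leading zeros of a 32-bit value; CLZ of 0 is 32."""
    return 32 - (x & 0xFFFFFFFF).bit_length()


def qadd(a: int, b: int) -> int:
    """Signed 32-bit addition that saturates instead of wrapping (QADD)."""
    return max(INT32_MIN, min(INT32_MAX, a + b))


def to_s16(x: int) -> int:
    """Interpret the low 16 bits of x as a signed half-word."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def smulbb(rm: int, rs: int) -> int:
    """16x16 signed multiply of the bottom half-words (SMULBB)."""
    return to_s16(rm) * to_s16(rs)


print(clz(1), qadd(INT32_MAX, 1) == INT32_MAX, smulbb(0x0001FFFF, 3))
```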

  • ARM10TDMI (1/2)

    Performance on the same IC process: ARM10TDMI ≈ 2 × ARM9TDMI ≈ 2 × ARM7TDMI
    300 MHz on 0.25 µm CMOS technology
    Increased clock rate: 6-stage pipeline

    [Figure: ARM10TDMI pipeline with fetch, issue, decode, execute, memory and write stages, covering branch prediction, instruction fetch, decode, register read, shift/ALU, address calculation, multiply, multiplier partials add, data memory access, data write and register write.]

  • ARM10TDMI (2/2)

    Reduced CPI through:
    - Branch prediction
    - Non-blocking load and store execution
    - 64-bit data memory: transfers 2 registers in each cycle
    - A register file with 4 read ports and 3 write ports
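    Moving two registers per cycle roughly halves the transfer time of LDM/STM. A Python sketch, assuming the 64-bit path can only pair registers within an 8-byte-aligned doubleword and ignoring address setup and cache misses; the function name is hypothetical:

```python
import math


def ldm_transfer_cycles(n_registers: int, start_address: int,
                        bus_bits: int = 64) -> int:
    """Data-transfer cycles for an LDM/STM of n word registers."""
    if n_registers <= 0:
        return 0
    if start_address % 4:
        raise ValueError("LDM/STM addresses are word aligned")
    if bus_bits == 32:
        return n_registers                  # one register per cycle
    if bus_bits != 64:
        raise ValueError("model covers 32- and 64-bit buses only")
    # A start address ending in 0x4 needs one single-register beat first.
    first = 1 if start_address % 8 == 4 else 0
    return first + math.ceil((n_registers - first) / 2)


print(ldm_transfer_cycles(8, 0x1000, 32),
      ldm_transfer_cycles(8, 0x1000),
      ldm_transfer_cycles(8, 0x1004))  # 8 4 5
```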

  • ARM1020T Overview

    Architecture v5T (ARM1020E will be v5TE)
    CPI ~ 1.3
    6-stage pipeline
    Static branch prediction
    32KB instruction and 32KB data caches, with hit-under-miss support
    64 bits per cycle LDM/STM operations
    EmbeddedICE Logic RT-II
    Support for the new VFPv1 architecture
    ARM10200 test chip: ARM1020T, VFP10, SDRAM memory interface, PLL

  • ARM1136J(F)-S

    First implementation of the ARMv6 architecture
    - ARM1136J-S: integer-only core
    - ARM1136JF-S: with integrated floating point
    High-speed pipeline microarchitecture: 8 stages
    System-level flexibility
    Low power
    - Microarchitecture designed for low power
    - Power management modes
    Availability: delivering to first licensees in December 2002

    The ARM11 core has been developed and integrated in parallel with the ARM11 PrimeXsys Platform to ensure a fully compatible, high-performance, extendable system solution.

  • Memory Hierarchy

  • Memory Size and Speed

    [Figure: memory hierarchy from registers, on-chip cache memory, 2nd-level off-chip cache and main memory down to hard disk; going down the hierarchy, access time goes from fast to slow, capacity from small to large, and cost from expensive to cheap.]

  • Caches (1/2)

    A cache memory is a small, very fast memory that retains copies of recently used memory values.
    It is usually implemented on the same chip as the processor.
    Caches work because programs normally display the property of locality: at any particular time they tend to execute the same instructions many times on the same areas of data.
    An access to an item which is in the cache is called a hit, and an access to an item which is not in the cache is a miss.
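    Locality is easy to see in a toy cache model. The Python sketch below simulates a direct-mapped cache with a hypothetical geometry (8 lines of 16 bytes); a short loop misses once and then hits, while two addresses that map to the same line keep evicting each other:

```python
def simulate_direct_mapped(addresses, n_lines=8, line_bytes=16):
    """Return (hits, misses) for a direct-mapped cache."""
    tags = [None] * n_lines
    hits = misses = 0
    for address in addresses:
        block = address // line_bytes    # which memory line
        index = block % n_lines          # the only cache line it may use
        tag = block // n_lines           # identifies the memory line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag            # fetch the line, evicting the old one
    return hits, misses


loop = [0x8000 + 4 * i for i in range(4)] * 10   # 4 instructions, 10 iterations
print(simulate_direct_mapped(loop))             # (39, 1)
```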

  • Caches (2/2)

    A processor can have one of the following two organizations:
    - A unified cache: a single cache for both instructions and data.
    - Separate instruction and data caches: this organization is sometimes called a modified Harvard architecture.

  • Unified instruction and data cache

    [Figure: the processor's registers exchange addresses, instructions and data with a single cache memory holding copies of both instructions and data, backed by main memory spanning addresses 0x00..00 to 0xFF..FF.]

  • Separate data and instruction caches

    [Figure: the processor's registers fetch instructions through an instruction cache (copies of instructions) and exchange data through a separate data cache (copies of data), each with its own address path to main memory spanning addresses 0x00..00 to 0xFF..FF.]

  • Cache Write Strategies

    Write-through
    - All write operations are passed to main memory.
    Write-through with buffered write
    - All write operations are still passed to main memory and the cache is updated as appropriate, but instead of slowing the processor down to main memory speed, the write address and data are stored in a write buffer which can accept the write information at high speed.
    Copy-back (write-back)
    - The cache is not kept coherent with main memory: writes update only the cache, and a modified (dirty) line is written back to main memory when it is replaced.
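    The difference between the strategies shows up in main-memory traffic. A Python sketch that counts write transactions, assuming every store hits in the cache, a hypothetical 16-byte line, and that each dirty line is written back exactly once; a write buffer does not change these counts, it only hides their latency:

```python
def main_memory_writes(store_addresses, policy: str, line_bytes: int = 16) -> int:
    """Main-memory write transactions caused by a sequence of word stores."""
    if policy == "write-through":
        return len(store_addresses)          # every store goes to memory
    if policy == "copy-back":
        dirty_lines = {a // line_bytes for a in store_addresses}
        return len(dirty_lines)              # one write-back per dirty line
    raise ValueError(f"unknown policy {policy!r}")


counter_updates = [0x2000] * 100                  # one variable, 100 stores
array_fill = [0x3000 + 4 * i for i in range(64)]  # 64 consecutive words
for policy in ("write-through", "copy-back"):
    print(policy,
          main_memory_writes(counter_updates, policy),
          main_memory_writes(array_fill, policy))
```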

  • Software Development

  • Main Components in ADS (ARM Developer Suite)

    - ANSI C compilers: armcc and tcc
    - ISO/Embedded C++ compilers: armcpp and tcpp
    - ARM/Thumb assembler: armasm
    - Linker: armlink
    - Project management tool for Windows: CodeWarrior
    - Instruction set simulator: ARMulator
    - Debuggers: AXD, ADW, ADU and armsd
    - Format converter: fromelf
    - Librarian: armar
    - ARM profiler: armprof
    - C and C++ libraries
    - ROM-based debug tools (ARM Firmware Suite, AFS)
    - Real Time Debug and Trace support
    - Support for all ARM cores and processors, including ARM9E, ARM10, Jazelle, StrongARM and Intel XScale

  • ARM C Compiler

    - The compiler is compliant with the ANSI standard for C.
    - It is supported by the appropriate library of functions.
    - It uses the ARM Procedure Call Standard (APCS) for all external functions, for procedure entry and exit.
    - It may produce assembly source output, which can be inspected, hand-optimized and then assembled subsequently.
    - It can also produce Thumb code.
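    For word-sized arguments, the procedure call standard passes the first four arguments in r0-r3 and the rest on the stack, with the result returned in r0. A simplified Python sketch of the argument placement (64-bit values and structures are not handled; the function name is hypothetical):

```python
def place_arguments(args):
    """Simplified APCS placement of word-sized arguments."""
    registers = {f"r{i}": value for i, value in enumerate(args[:4])}
    stack = list(args[4:])   # 5th argument at the lowest stack address
    return registers, stack


print(place_arguments([10, 20, 30, 40, 50, 60]))
```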

  • ARM Linker

    - Takes one or more object files and combines them.
    - Resolves symbolic references between the object files and extracts object modules from libraries.
    - Normally includes debug tables in the output file.
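    Library extraction can be sketched as repeatedly pulling in library members that define a still-undefined symbol. A simplified Python model, not armlink's actual algorithm (module and symbol names are made up; duplicate definitions and section layout are ignored):

```python
def link(objects, libraries):
    """objects, libraries: dict of module -> (defined symbols, referenced symbols)."""
    included = list(objects)
    defined = set().union(*(defs for defs, _ in objects.values()))
    undefined = set().union(*(refs for _, refs in objects.values())) - defined
    progress = True
    while undefined and progress:
        progress = False
        for name, (defs, refs) in libraries.items():
            if name not in included and defs & undefined:
                included.append(name)        # extract this library member
                defined |= defs
                undefined = (undefined | refs) - defined
                progress = True
    if undefined:
        raise ValueError(f"unresolved symbols: {sorted(undefined)}")
    return included


OBJECTS = {"main.o": ({"main"}, {"helper", "printf"}),
           "util.o": ({"helper"}, set())}
LIBRARY = {"printf.o": ({"printf"}, {"putc"}),
           "putc.o": ({"putc"}, set()),
           "sqrt.o": ({"sqrt"}, set())}
print(link(OBJECTS, LIBRARY))  # ['main.o', 'util.o', 'printf.o', 'putc.o']
```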

  • ARM Symbolic Debugger

    armsd is a front-end interface for debugging programs running either under emulation (on the ARMulator) or remotely on an ARM development board (via a serial line or through the JTAG test interface).
    armsd allows an executable program to be loaded into the ARMulator or a development board and run. It allows the setting of:
    - breakpoints: addresses in the code;
    - watchpoints: memory addresses, if accessed as data addresses.
    Either causes an exception that halts the program so that the processor state can be examined.
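    A watchpoint compares each data address against a value under a mask. The Python sketch below assumes EmbeddedICE-style semantics in which a 1 in the mask means "don't care"; the breakpoint/watchpoint split mirrors the description above, and all names are hypothetical:

```python
def watchpoint_hit(address: int, value: int, mask: int) -> bool:
    """Match address against value, ignoring bits that are 1 in mask."""
    care = ~mask & 0xFFFFFFFF
    return (address & care) == (value & care)


def should_halt(address, is_fetch, breakpoints, watchpoints) -> bool:
    """Breakpoints match instruction fetches; watchpoints match data accesses."""
    if is_fetch:
        return address in breakpoints
    return any(watchpoint_hit(address, v, m) for v, m in watchpoints)


WATCH_BUFFER = [(0x2000, 0xFF)]       # any data access to 0x2000-0x20FF
print(should_halt(0x2040, False, set(), WATCH_BUFFER))   # True
print(should_halt(0x2100, False, set(), WATCH_BUFFER))   # False
```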

  • ARM Emulator: ARMulator

    A suite of programs that models the behavior of various ARM processor cores and system architectures in software on a host system.
    It can operate at various levels of accuracy:
    - instruction accurate
    - cycle accurate
    - timing accurate
    Benchmarking before hardware is available
    - The instruction count or number of cycles can be measured for a program.
    - Performance analysis.
    Running software on the ARMulator
    - Through armsd or ARM GUI debuggers, e.g., AXD.
    - The processor core model incorporates the remote debug interface, so the processor and the system state are visible from the ARM symbolic debugger.
    - A C library is supported, allowing complete C programs to run on the simulated system.
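    An instruction-accurate model with cycle counting can be very small. The Python sketch below interprets a hypothetical four-instruction subset (this is not ARMulator's API) and charges assumed costs of 1 cycle per data-processing instruction and 3 cycles per taken branch, loosely modeled on ARM7 pipeline refill; branch targets are list indices rather than byte addresses:

```python
MASK = 0xFFFFFFFF
# Assumed costs, loosely modeled on ARM7 (not measured figures).
COST = {"dp": 1, "taken": 3, "not_taken": 1}


def run(program, max_steps=10_000):
    """Interpret (op, *operands) tuples; return (regs, instructions, cycles)."""
    regs = [0] * 16
    z = False
    pc = instructions = cycles = 0
    while pc < len(program):
        if instructions >= max_steps:
            raise RuntimeError("step limit reached")
        op, *args = program[pc]
        instructions += 1
        if op == "bne":                       # branch if Z flag clear
            if not z:
                pc = args[0]
                cycles += COST["taken"]
            else:
                pc += 1
                cycles += COST["not_taken"]
            continue
        if op == "mov":                       # mov rd, #imm
            rd, imm = args
            regs[rd] = imm & MASK
        elif op == "add":                     # add rd, rn, rm
            rd, rn, rm = args
            regs[rd] = (regs[rn] + regs[rm]) & MASK
        elif op == "subs":                    # subs rd, rn, #imm (sets Z)
            rd, rn, imm = args
            regs[rd] = (regs[rn] - imm) & MASK
            z = regs[rd] == 0
        else:
            raise ValueError(f"unknown op {op!r}")
        cycles += COST["dp"]
        pc += 1
    return regs, instructions, cycles


# Sum 5 + 4 + 3 + 2 + 1 into r0.
PROGRAM = [("mov", 0, 0), ("mov", 1, 5), ("add", 0, 0, 1),
           ("subs", 1, 1, 1), ("bne", 2)]
regs, n, c = run(PROGRAM)
print(f"r0 = {regs[0]}, {n} instructions, {c} cycles")  # r0 = 15, 17 instructions, 25 cycles
```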

  • ARM Development Board

    A circuit board including an ARM core (e.g., ARM9TDMI), memory components, I/O and electrically programmable devices.
    It can support both hardware and software development before the final application-specific hardware is available.

  • ARM Integrator

    A motherboard with some extensions to support the development of applications.
    Provides core modules, logic modules (Xilinx Virtex FPGA, Altera APEX FPGA), OS, input/output resources, bus arbitration and interrupt handling.

  • Summary (1/2)

    ARM processor family

    Processor   Pipeline  Memory               Clock    MIPS/
    family      stages    organization         rate     MHz
    ARM6        3         Von Neumann          25 MHz   0.9
    ARM7        3         Von Neumann          66 MHz   0.9
    ARM8        5         Von Neumann          72 MHz   1.2
    ARM9        5         Harvard              200 MHz  1.1
    ARM10       6         Harvard              400 MHz  1.25
    StrongARM   5         Harvard              233 MHz  1.15
    ARM11       8         Von Neumann/Harvard  550 MHz  1.2
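    Multiplying the clock rate by MIPS/MHz gives a rough peak figure per family. A Python sketch using values from the summary table; ARM8, ARM9 and StrongARM land close to the MIPS quoted on the earlier characteristics slides (86, 220 and 268):

```python
# (clock rate in MHz, MIPS/MHz) per family, from the summary table.
FAMILIES = {
    "ARM7": (66, 0.9),
    "ARM8": (72, 1.2),
    "ARM9": (200, 1.1),
    "ARM10": (400, 1.25),
    "StrongARM": (233, 1.15),
    "ARM11": (550, 1.2),
}


def peak_mips(clock_mhz: float, mips_per_mhz: float) -> float:
    """Rough peak throughput: clock rate times MIPS per MHz."""
    return clock_mhz * mips_per_mhz


for family, (mhz, ratio) in FAMILIES.items():
    print(f"{family:10s} {peak_mips(mhz, ratio):6.0f} MIPS")
```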

  • Summary (2/2)

    Memory hierarchy
    - Unified cache / separate instruction and data caches
    - Write-through with buffered write
    Software development
    - CodeWarrior IDE
    - armcc/tcc/armcpp/tcpp, armasm, armlink, armprof
    - AXD (ARM eXtended Debugger), armsd
    - ARMulator, ARM Integrator

