Presentation 05272011

Embed Size (px)

Citation preview

  • 7/28/2019 Presentation 05272011

    1/55

    1

    The ARM Architecture

    Joe BungoApplications Engineer

    ARM University Program

  • 7/28/2019 Presentation 05272011

    2/55

    2

    Agenda

    Introduction to ARM Ltd

    ARM Architecture/Programmers Model

    Data Path and Pipelines

    System Design

    Development Tools

  • 7/28/2019 Presentation 05272011

    3/55

    3

    ARM Ltd

    Founded in November 1990

    Spun out of Acorn Computers

    Initial funding from Apple, Acorn and VLSI

    Designs the ARM range of RISC processor cores

    Licenses ARM core designs to semiconductorpartners who fabricate and sell to theircustomers

    ARM does not fabricate silicon itself

    Also develop technologies to assist with the design-in of the ARM architecture

    Software tools, boards, debug hardware

    Application software

    Bus architectures

    Peripherals, etc

  • 7/28/2019 Presentation 05272011

    4/55

    4

    ARMs Activities

    memorymemory

    SoCSoC

    Processors

    System Level IP:

    Data Engines

    Fabric

    3D Graphics

    Physical IP

    Software IP

    Development Tools

    Connected Community

  • 7/28/2019 Presentation 05272011

    5/55

    5

    ARM Connected Community 700+

    5

  • 7/28/2019 Presentation 05272011

    6/55

    6

    Huge Range of Applications

    Energy Efficient Appliances

    IR FireDetector

    IntelligentVending

    Tele-parking

    UtilityMeters

    ExerciseMachinesIntelligent toys

    Equipment Adopting 32-bit ARMMicrocontrollers

  • 7/28/2019 Presentation 05272011

    7/55

    7

    Worlds Smallest ARM Computer?

    A CB

    Wirelessly networked into large scalesensor arrays

    Battery Solar Cells

    Processor, SRAM and PMU

    University of Michigan

    Sensors, timers

    Cortex-M0 +16KB RAM 65nmUWB Radio antenna

    10 kB Storage memory~3fW/bit

    12Ah Li-ion Battery

    Wireless Sensor Network

    Cortex-M0; 65

  • 7/28/2019 Presentation 05272011

    8/55

    8

    Worlds Largest ARM Computer?

    4200 ARM poweredNeutrino Detectors

    ork supported by the National Science Foundation and University of Wisconsin-Madison

    70 bore holes 2.5km deep

    60 detectors per stringstarting 1.5km down

    1km3 of active telescope

  • 7/28/2019 Presentation 05272011

    9/55

    9

    From 1mm3 to 1km3

    1mm3 1km3

    10 $1000

    Mobile

    Embedded Consumer

    Mobile Computing Server

    Enterprise PC

    Home

    HPC

  • 7/28/2019 Presentation 05272011

    10/55

    10

  • 7/28/2019 Presentation 05272011

    11/55

    11

    Agenda

    Introduction to ARM Ltd

    ARM Architecture/Programmers Model

    Data Path and Pipelines

    System Design

    Development Tools

  • 7/28/2019 Presentation 05272011

    12/55

    12

    ARM Classic Processor Portfolio

    ARM922T

    SC100

    ARM7TDMI(S)

    ARM968E-S

    ARM946E-S

    ARM1176JZ(F)-S

    ARM1156T2(F)-S

    ARM1136J(F)-S

    x1-4

    ARM11MPCore

    ARM926EJ-S

    ARM7EJ-S

    ARMv6

    ARMv4

    ARMv5

  • 7/28/2019 Presentation 05272011

    13/55

    13

    ARM Cortex Processors (v7)

    ARM Cortex-A family (v7-A): Applications processors for full OS

    and 3rd party applications

    ARM Cortex-R family (v7-R):

    Embedded processors for real-timesignal processing, control applications

    ARM Cortex-M family (v7-M): Microcontroller-oriented processors

    for MCU and SoC applications

    Cortex-R4

    Cortex-A8

    SC300

    Cortex-M1

    Cortex-M3

    ...2.5GHzx1-4

    Cortex-A9

    12k gates...

    Cortex-M0

    Cortex-M4

    x1-4

    Cortex-A51-2

    HeronR

    x1-4

    Cortex-A15

  • 7/28/2019 Presentation 05272011

    14/55

    14

    Relative Performance*

    *Represents attainable speeds in 130, 90, 65, or 45nm processes

    Cortex-M0 Cortex-M3 ARM7 ARM926 ARM1026 ARM1136 ARM1176 Cortex-A8 Cortex-A9Dual-core

    Max Freq (MHz) 50 150 184 470 540 610 750 1100 2000

    Min Power (mW/MHz) 0.012 0.06 0.35 0.235 0.36 0.335 0.568 0.43 0.5

    0

    500

    1000

    1500

    2000

    2500

    MaxFre

    quency(Mhz)

  • 7/28/2019 Presentation 05272011

    15/55

    15

    Cortex family

    Cortex-A8

    Architecture v7A

    MMU

    AXI

    VFP & NEON support

    Cortex-R4

    Architecture v7R

    MPU (optional)

    AXI

    Dual Issue

    Cortex-M3

    Architecture v7M

    MPU (optional)

    AHB Lite & APB

  • 7/28/2019 Presentation 05272011

    16/55

    16

    Data Sizes and Instruction Sets

    The ARM is a 32-bit architecture.

    When used in relation to the ARM:

    Byte means 8 bits

    Halfword means 16 bits (two bytes)

    Word means 32 bits (four bytes)

    Most ARMs implement two instruction sets

    32-bit ARM Instruction Set

    16-bit Thumb Instruction Set

    Jazelle cores can also execute Java bytecode

  • 7/28/2019 Presentation 05272011

    17/55

    17

    ARM and Thumb Performance

    Memory width (zero wait state)

    0

    5000

    10000

    15000

    20000

    25000

    30000

    32-bit 16-bit 16-bit with

    32-bit stack

    ARM

    Thumb

    Dhrystone 2.1/sec@ 20MHz

  • 7/28/2019 Presentation 05272011

    18/55

    18

    The Thumb-2 instruction set

    Variable-length instructions

    ARM instructions are a fixed length of 32 bits

    Thumb instructions are a fixed length of 16

    bits

    Thumb-2 instructions can be either 16-bit or

    32-bit

    Thumb-2 gives approximately 26%improvement in code density over ARM

    Thumb-2 gives approximately 25%

    improvement in performance over

    Thumb

  • 7/28/2019 Presentation 05272011

    19/55

    19

    Cortex-M3 Programmers Model

    Fully programmable in C

    Stack-based exception model

    Only two processor modes

    Thread Mode for User tasks

    Handler Mode for OS tasks and exceptions

    Vector table contains addresses

    Process

    r8r9

    r10

    r11

    r12

    sp

    lr

    r15 (pc)

    xPSR

    r0

    r1

    r2

    r3

    r4

    r5

    r6

    r7

    Main

    sp

  • 7/28/2019 Presentation 05272011

    20/55

    20

    ARM Cortex-M3

    Application code

    OS

    System Call (SVCall)Undefined Instruction

    Privileged

    Cortex-M3 Processor Privilege

    Memory

    Instructions & Data

    Aborts

    Interrupts

    Reset

    Non-Privileged

    Supervisor

    User

    Handler Mode

    Thread Mode

  • 7/28/2019 Presentation 05272011

    21/55

    21

    Cortex-M3 Interrupt Handling One Non-Maskable Interrupt (INTNMI) supported

    1-240 prioritizable interrupts supported

    Interrupts can be masked

    Implementation option selects number of interrupts supported

    Nested Vectored Interrupt Controller (NVIC) is tightly coupled with processor core

    Interrupt inputs are active HIGH

    Cortex-M3Processor Core

    INTNMI

    NVIC

    Cortex-M3

    1-240 Interrupts

    INTISR[239:0]

  • 7/28/2019 Presentation 05272011

    22/55

    22

    Cortex-M3 Exception Handling

    Reset : power-on or system reset

    NMI : cannot be stopped or preempted by any exception other than reset

    Faults

    Hard Fault : default Fault or any fault unable to activate

    Memory Manage : MPU violations

    Bus Fault : prefetch and memory access violations

    Usage Fault : undef instructions, divide by zero, etc.

    SVCall : privileged OS requests

    Debug Monitor : debug monitor program

    PendSV : pending SVCalls

    SysTick Interrupt : internal sys timer, i.e., used by RTOS to periodicallycheck resources or peripherals

    External Interrupt : i.e., external peripherals

  • 7/28/2019 Presentation 05272011

    23/55

    23

    Cortex-M3 Program Status Register

    One Status Register consisting of

    APSR - Application Program Status Register ALU flags

    IPSR - Interrupt Program Status Register Interrupt/Exception No.

    EPSR - Execution Program Status Register

    IT field If/Then block information

    ICI field Interruptible-Continuable Instruction information

    xPSR

    Composite of the 3 PSRs

    Stored on the stack on exception entry

    IT/ICIIT

    2731

    N Z C V Q

    28 7

    ISR Number

    1623 15 0242526 10

    T

  • 7/28/2019 Presentation 05272011

    24/55

    24

    Conditional Execution

    ITTET EQ

    Inst 1

    Inst 2

    Inst 3

    Inst 4

    If Then (IT) instruction added (16 bit)

    Up to 3 additional then or else conditions maybe specified (T or E)

    Makes up to 4 following instructions conditional

    Any normal ARM condition code can be used

    16-bit instructions in block do not affect condition code flags

    Apart from comparison instruction

    32 bit instructions may affect flags (normal rules apply)

    Current if-then status stored in CPSR Conditional block maybe safely interrupted and returned to

    Must NOT branch into or out of if-then block

    MOVEQ

    ADDEQ

    SUBNE

    ORREQ

  • 7/28/2019 Presentation 05272011

    25/55

    25

    Load/Store

    Miscellaneous

    Classes of Instructions (v4T)

    Data Operations

    MOV PC, Rm

    Bcc

    BLBLX

    Change of Flow

  • 7/28/2019 Presentation 05272011

    26/55

    26

    Branch : B{} label

    Branch with Link : BL{} subroutine_label

    The processor core shifts the offset field left by 2 positions, sign-extends it

    and adds it to the PC

    32 Mbyte range

    How to perform longer branches?

    2831 24 0

    Cond 1 0 1 L Offset

    Condition field

    Link bit 0 = Branch1 = Branch with link

    232527

    Branch instructions

  • 7/28/2019 Presentation 05272011

    27/55

    27

    Data processing Instructions

    Consist of :

    Arithmetic: ADD ADC SUB SBC RSB RSC

    Logical: AND ORR EOR BIC

    Comparisons: CMP CMN TST TEQ

    Data movement: MOV MVN

    These instructions only work on registers, NOT memory.

    Syntax:

    {}{S} Rd, Rn, Operand2

    Comparisons set flags only - they do not specify Rd Data movement does not specify Rn

    Second operand is sent to the ALU via barrel shifter.

  • 7/28/2019 Presentation 05272011

    28/55

    28

    Register, optionally with shift operation Shift value can be either be:

    5 bit unsigned integer

    Specified in bottom byte of

    another register.

    Used for multiplication by constant

    Immediate value

    8 bit number, with a range of 0-255.

    Rotated right through even

    number of positions

    Allows increased range of 32-bitconstants to be loaded directly into

    registersResult

    Operand1

    BarrelShifter

    Operand2

    ALU

    Using a Barrel Shifter:The 2nd Operand

  • 7/28/2019 Presentation 05272011

    29/55

    29

    Single register data transfer

    LDR STR Word

    LDRB STRB ByteLDRH STRH Halfword

    LDRSB Signed byte load

    LDRSH Signed halfword load

    Memory system must support all access sizes

    Syntax:

    LDR{}{} Rd,

    STR{}{} Rd,

    e.g. LDREQB

  • 7/28/2019 Presentation 05272011

    30/55

    30

    Agenda

    Introduction to ARM Ltd

    ARM Architecture/Programmers Model Data Path and Pipelines

    System Design

    Development Tools

  • 7/28/2019 Presentation 05272011

    31/55

    31

    Cortex-M3 Datapath

    RegisterBank Mul/Div

    AddressIncrementer

    ALU

    B

    A

    INTADDR

    I_HADDR

    AddressRegister

    BarrelShifter

    Writeback

    ALU

    Read DataRegister

    Write DataRegister

    Instruction

    Decode

    I_HRDATA

    D_HWDATA

    D_HRDATA

    AddressIncrementer

    D_HADDRAddressRegister

  • 7/28/2019 Presentation 05272011

    32/55

    32

    Cortex-M3 has 3-stage fetch-decode-execute pipeline

    Similar to ARM7

    Cortex-M3 does more in each stage to increase overall

    performance

    Cortex-M3 Pipeline

    Branch forwarding & speculation

    1st Stage - Fetch 2nd Stage - Decode 3rd Stage - Execute

    Execute stage branch (ALU branch & Load Store Branch)

    Fetch(Prefetch)

    AGU

    InstructionDecode &

    Register Read

    Branch

    AddressPhase & Write

    Back

    Data PhaseLoad/Store &

    Branch

    Multiply & Divide

    Shift ALU & Branch

    Write

  • 7/28/2019 Presentation 05272011

    33/55

    33

    ARM10 vs. ARM11 Pipelines

    ARM11

    Fetch1

    Fetch2

    Decode Issue

    Shift ALU Saturate

    Writeback

    MAC1

    MAC2

    MAC3

    AddressData

    Cache1

    DataCache

    2

    Shift + ALUMemoryAccess Reg

    Write

    FETCH DECODE EXECUTE MEMORY WRITE

    Reg Read

    Multiply

    BranchPrediction

    InstructionFetch

    ISSUE

    ARM orThumb

    InstructionDecode Multiply

    Add

    ARM10

  • 7/28/2019 Presentation 05272011

    34/55

  • 7/28/2019 Presentation 05272011

    35/55

    35

    Agenda

    Introduction to ARM Ltd

    ARM Architecture/Programmers ModelData Path and Pipelines

    System Design

    Development Tools

  • 7/28/2019 Presentation 05272011

    36/55

    36

    High PerformanceARM processor

    High-bandwidthon-chip RAM

    HighBandwidth

    ExternalMemory

    Interface

    DMABus Master

    APBBridge

    Keypad

    UART

    PIO

    TimerAHB

    APB

    High Performance

    PipelinedBurst SupportMultiple Bus Masters

    Low PowerNon-pipelinedSimple Interface

    An Example AMBA System

  • 7/28/2019 Presentation 05272011

    37/55

    37

    HWDATA

    Arbiter

    Decoder

    Master#1

    Master#3

    Master#2

    Slave#1

    Slave#4

    Slave#3

    Slave#2

    Address/Control

    rite Data

    Read Data

    HADDR

    HWDATA

    HRDATA

    HADDR

    HRDATA

    AHB Structure

  • 7/28/2019 Presentation 05272011

    38/55

    38

    Mali200 + GP2 SoC Integration

    Mali 200

    MMUAXI

    APB

    Clock

    Reset

    IRQs

    IDLEs

    Shipped as synthesizable

    Verilog

    Mali 200 + GP2 requires a

    single instant in the SoC,

    with a small number ofconnections to be made.

    IDLES can be used for

    gating the Mali200 and GP2

    core clock

    Mali GP2

    AXI Fabric

  • 7/28/2019 Presentation 05272011

    39/55

    39

    Typical GPU SoC Design

    ARM1176JZF

    L230

    Mali 200 Mali GP2

    CLCD

    PL111

    PL301 High-performance matrix

    SDRAMC

    PL340

    APB Peripheral Sub-System

    D I

    APB

    MS

    Sys

    Ctrl

    nRst

    M

    PL390

    GIC

    Int

    DDR

    PHY

    Designed and optimised for AMBA: provides easier integration with ARM cores and fabric IP

    Unified Memory Architecture

    Local AXI Interconnect

    Mali

    MMU

  • 7/28/2019 Presentation 05272011

    40/55

    40

    ARM PIPD Logic Product Families

    ProcessGeometry180nm 130nm 90nm 65nm

    HighSpeed

    Low

    Power

    Low

    Area

    45nm 32nm 28nm

    ECO Kits

    (SC12)(SC12)

    High Performance (Advantage-HS / SAGE-HS)

    (SC7 / SC8)(SC7 / SC8)

    Ultra High Density (Metro)

    (SC9 / SC10)(SC9 / SC10)

    High Density (Advantage)

    (SC9 / SC10(SC9 / SC10 tappedtapped))

    High Density (SAGE-X)EC

    OKits

    Power Management KitsPower Management Kits

    PowerManagementKits

    PowerManagementKits

  • 7/28/2019 Presentation 05272011

    41/55

    41

    Agenda

    Introduction to ARM Ltd

    ARM Architecture/Programmers ModelData Path and Pipelines

    System Design

    Development Tools

  • 7/28/2019 Presentation 05272011

    42/55

    42

    ARM Debug Architecture

    ARMcore

    ETM

    TAPcontroller

    Trace PortJTAG port

    Ethernet

    Debugger (+ optional

    trace tools)

    EmbeddedICE Logic

    Provides breakpoints and processor/systemaccess

    JTAG interface (ICE)

    Converts debugger commands to JTAGsignals

    Embedded trace Macrocell (ETM)

    Compresses real-time instruction and dataaccess trace

    Contains ICE features (trigger & filter logic)

    Trace port analyzer (TPA)

    Captures trace in a deep buffer

    EmbeddedICELogic

  • 7/28/2019 Presentation 05272011

    43/55

    43

    Keil Development Tools for ARM

    Includes ARM macro assembler, compilers (ARM RealView C/C++Compiler, Keil CARM Compiler, or GNU compiler), ARM linker, Keil uVision

    Debugger and Keil uVision IDE

    Keil uVision Debugger accurately simulates on-chip peripherals (I2C, CAN,

    UART, SPI, Interrupts, I/O Ports, A/D and D/A converters, PWM, etc.)

    Evaluation Limitations 16K byte object code + 16K data limitation

    Some linker restrictions such as base addresses for code/constants

    GNU tools provided are not restricted in any way

    http://www.keil.com/demo/

  • 7/28/2019 Presentation 05272011

    44/55

    44

    Keil Development Tools for ARM

  • 7/28/2019 Presentation 05272011

    45/55

    45

  • 7/28/2019 Presentation 05272011

    46/55

    46

    University Resources

    http://www.arm.com/support/university/

    [email protected]

  • 7/28/2019 Presentation 05272011

    47/55

    47

    TI Panda Board

    OMAP4430 Processor

    1 GHz Dual-core ARMCortex-A9 (NEON+VFP)

    C64x+ DSP

    PowerVR SGX 3D GPU

    1080p Video Support

    POP Memory

    1 GB LPDDR2 RAM

    USB Powered < 4W max consumption

    (OMAP small % of that) Many adapter options(Car, wall, battery, solar, ..)

  • 7/28/2019 Presentation 05272011

    48/55

    48

    Project Ideas Using Panda

    OS Projects

    OS porting to ARM/Cortex (TI OMAP)

    MythTV system

    Super-Panda stack of Pandas as compute engine and task

    distribution

    Linux applications

    NEON Optimization Projects

    Codec optimization in ffmpeg (pick your favorite codec)

    Voice and image recognition

    Open-source Flash player optimizations (swfdec)

  • 7/28/2019 Presentation 05272011

    49/55

    49

    Fin

  • 7/28/2019 Presentation 05272011

    50/55

    50

    Nokia N95 Multimedia Computer

    Symbian OS v9.2Operating System supporting ARMprocessor-based mobile devices,

    developed using ARM RealViewCompilation Tools

    OMAP 2420Applications Processor

    ARM1136 processor-based

    SoC, developed using Magma

    Blast family and winner of

    2005 INSIGHT Award for MostInnovative SoC

    Connect. Collaborate. Create.

    Mobiclip Video CodecSoftware video codec for ARM

    processor-based mobile devices

    ST WLAN Solution

    Ultra-low power 802.11b/g WLANchip with ARM9 processor-based

    MAC

    S60 3rd EditionS60 Platform supporting ARM

    processor-based mobile devices

  • 7/28/2019 Presentation 05272011

    51/55

    51

    Beagle Board

  • 7/28/2019 Presentation 05272011

    52/55

    52

    $149

    > 1000 participantsand growing

    Open access tohardware

    documentation

    Wikis, blogs,promotion of

    communityactivity

    Freesoftware

    Freedom toinnovate

    Personallyaffordable

    Active &technical

    community

    Opportunityto tinker and

    learn

    Instant access to>10 million lines

    of code

    Addressing

    open sourcecommunity

    needs

    Targeting community development

  • 7/28/2019 Presentation 05272011

    53/55

    53

    OMAP3530 Processor

    600MHz Cortex-A8 NEON+VFPv3

    16KB/16KB L1$

    256KB L2$

    430MHz C64x+ DSP

    32K/32K L1$

    48K L1D 32K L2

    PowerVR SGX GPU

    64K on-chip RAM

    POP Memory

    128MB LPDDR RAM

    256MB NAND flash USB Powered 2W maximum consumption

    OMAP is small % of that Many adapter options

    Car, wall, battery, solar,

    Peripheral I/O

    DVI-D video out

    SD/MMC+

    S-Video out

    USB 2.0 HS OTG

    I2C, I2S, SPI,

    MMC/SD JTAG

    Stereo in/out

    Alternate power

    RS-232 serial

    3

    Fast, low power, flexible expansion

  • 7/28/2019 Presentation 05272011

    54/55

    54

    Peripheral I/O

    DVI-D video out

    SD/MMC+

    S-Video out

    USB HS OTG

    I2C, I2S, SPI,

    MMC/SD

    JTAG

    Stereo in/out

    Alternate power RS-232 serial

    3

    Other Features 4 LEDs

    USR0

    USR1

    PMU_STAT

    PWR

    2 buttons USER

    RESET

    4 boot sources

    SD/MMC

    NAND flash

    USB

    Serial

    On-going collaboration at BeagleBoard.org

    Live chat via IRC for 24/7 community support

    Links to software projects to download

    And more

  • 7/28/2019 Presentation 05272011

    55/55

    Project Ideas Using Beagle

    OS Projects

    OS porting to ARM/Cortex (TI OMAP)

    MythTV system

    Super-Beagle stack of Beagles as compute engine and task

    distribution

    Linux applications

    NEON Optimization Projects

    Codec optimization in ffmpeg (pick your favorite codec)

    Voice and image recognition

    Open-source Flash player optimizations (swfdec)