ARM1176JZF-S (iPhone 3G) Jeff Brantley Chris Gregg Bill Stitson


Page 1: ARM1176JZF-S ( iPhone  3G)

ARM1176JZF-S (iPhone 3G)

Jeff Brantley, Chris Gregg, Bill Stitson

Page 2: ARM1176JZF-S ( iPhone  3G)

Processor Overview

Features

• Designed for consumer and wireless products
• RISC processor with Harvard architecture
• Vector Floating Point coprocessor
• Branch prediction
• “TrustZone” security built in to the CPU
• Instruction and data caches
• 8-stage pipeline
• 32-bit and 16-bit (“Thumb”) instruction sets, and “Jazelle” technology for Java execution

Page 3: ARM1176JZF-S ( iPhone  3G)

Memory Hierarchy

• Harvard architecture: separate data and instruction caches
  • Allows simultaneous access
  • 64-bit datapaths
• L1 Cache
  • Up to 64KB in size
  • 4-way set associative
  • Virtual index, physical tag
  • 8 words per line, critical word first on miss
  • Round-robin or pseudo-random replacement policy

[1]
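The L1 parameters above determine how an address is split into tag, set index, and line offset. The following is a minimal sketch of that arithmetic, assuming 32-bit (4-byte) words and the maximum 64KB cache size; the function names are hypothetical, and taking the tag as "all physical address bits above the index" is a simplification rather than the actual tag layout of the hardware.

```python
# Sketch of the cache geometry implied by the slide:
# 4-way set associative, 8 words per line, virtually indexed, physically tagged.
WAYS = 4
LINE_BYTES = 8 * 4  # 8 words per line, assuming 4-byte words


def cache_geometry(size_bytes):
    """Return (number of sets, offset bits, index bits) for a given cache size."""
    sets = size_bytes // (WAYS * LINE_BYTES)
    offset_bits = LINE_BYTES.bit_length() - 1  # log2 of the line size
    index_bits = sets.bit_length() - 1         # log2 of the set count
    return sets, offset_bits, index_bits


def split_address(vaddr, paddr, size_bytes):
    """Index and offset come from the virtual address, the tag from the physical one."""
    sets, offset_bits, index_bits = cache_geometry(size_bytes)
    offset = vaddr & (LINE_BYTES - 1)
    index = (vaddr >> offset_bits) & (sets - 1)
    tag = paddr >> (offset_bits + index_bits)  # simplified tag (assumption)
    return tag, index, offset


if __name__ == "__main__":
    print(cache_geometry(64 * 1024))
    print(split_address(0x12345678, 0x0ABCD678, 64 * 1024))
```

With a 64KB cache this gives 512 sets, so the index uses address bits 5 to 13, which reach above a 4KB page offset. That is why the virtual-index, physical-tag distinction matters at this cache size.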

Page 4: ARM1176JZF-S ( iPhone  3G)

Level 2 Interface

• “high-bandwidth interface to second level caches, on-chip RAM, peripherals, and interfaces to external memory” [1]
• Level 2 interconnect: 64-bit wide interfaces
  • Instruction Fetch
  • Data Read/Write
  • DMA
• Peripheral Interface is 32 bits wide

Page 5: ARM1176JZF-S ( iPhone  3G)

Translation Lookaside Buffer (TLB)

• MicroTLBs
  • One each for instructions, data
  • 10 entries
  • Fully associative
  • Round-robin or random replacement
• Single Main TLB
  • Contains a fully-associative region of 8 lockable elements
• Misses handled by two-level page table
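To make the MicroTLB behaviour concrete, here is a minimal sketch of a 10-entry, fully associative buffer with round-robin replacement. The class name and the `walk` callback are hypothetical: `walk` stands in for everything behind the MicroTLB (the Main TLB and the two-level page table walk), which is not modelled here.

```python
class MicroTLB:
    """Toy model of a fully associative TLB with round-robin replacement."""

    def __init__(self, entries=10):
        self.slots = [None] * entries  # each slot holds (virtual page, physical page)
        self.next_victim = 0           # round-robin replacement pointer
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, walk):
        # Fully associative: any slot may hold any virtual page number.
        for entry in self.slots:
            if entry is not None and entry[0] == vpn:
                self.hits += 1
                return entry[1]
        # Miss: ask the next level (hypothetical callback), then fill a slot.
        self.misses += 1
        ppn = walk(vpn)
        self.slots[self.next_victim] = (vpn, ppn)
        self.next_victim = (self.next_victim + 1) % len(self.slots)
        return ppn


if __name__ == "__main__":
    tlb = MicroTLB()
    for page in list(range(10)) + [0, 10, 0]:
        tlb.lookup(page, lambda vpn: vpn + 0x100)
    print(tlb.hits, tlb.misses)
```

Touching an eleventh page evicts the oldest-filled slot regardless of how recently it was used, which is the trade-off of round-robin replacement compared with an LRU policy.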

Page 6: ARM1176JZF-S ( iPhone  3G)

Coprocessor Interface

• Core processor can interface to on-chip coprocessors
• Instruction set supports up to 16 coprocessors
  • Two of these are used by the VFP
• Coprocessors intended to run in-step with core, share data
  • Two-cycle delay: “generous timing margins” [1]
  • Loose synchronization via token queues
  • Core may flush coprocessor pipeline or cancel instructions
• Only one coprocessor “active” at one time
  • Not so bad: calls to driver software = core instructions
  • Allows much of the interface to be shared ($$$)

Page 7: ARM1176JZF-S ( iPhone  3G)

Coprocessor Synchronization

[Figure from [1]]

Page 8: ARM1176JZF-S ( iPhone  3G)

VFP Coprocessor

• Uses a dedicated interface to the processor
• IEEE 754 Standard for Binary Floating-Point Arithmetic
• 64-bit load and store buses
• 3 independent, parallel pipelines:
  • Load and store
  • Multiply and accumulate
  • Divide and square root
• Short vector instructions: 8 single precision, 4 double
• No branch instructions

Page 9: ARM1176JZF-S ( iPhone  3G)

Branch Prediction

• Branch Prediction (BP) can be turned on and off with a control register.
  • Provides high level of control
• The ARM processor performs two types of BP
  • Dynamic: performed in the Prefetch Unit
  • Static: performed by the integer core (and the first time, before historical data exists)
• Branch folding
  • After prediction, the branch instruction is completely removed from the instruction stream presented to the pipeline.

Page 10: ARM1176JZF-S ( iPhone  3G)

Dynamic Branch Prediction

• Dynamic Branch Prediction is the “first line” of branch prediction: if history exists, it will be used.
• The Branch Target Address Cache (BTAC) holds virtual target addresses of previous branches
  • 128-entry, direct mapped cache
  • Includes a 2-bit branch prediction history.
• A BTAC hit produces a branch prediction with zero cycle delay
• Both branches (resolved taken and not taken) are stored in the BTAC, which improves performance.
• Branch folding is done for almost all dynamically predicted branches.
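The BTAC described above can be sketched as a small direct-mapped table. This is an illustrative model only: the slide gives the size (128 entries), the mapping (direct), and the 2-bit history, while the indexing by word address, the tag, and the saturating-counter update rule used here are assumptions.

```python
class BTAC:
    """Toy 128-entry, direct-mapped branch target address cache."""

    ENTRIES = 128

    def __init__(self):
        # Each entry is [tag, target, counter]; counter is a 2-bit value 0..3.
        self.table = [None] * self.ENTRIES

    @staticmethod
    def _index_tag(pc):
        word = pc >> 2  # assumption: index by word address of the branch
        return word % BTAC.ENTRIES, word // BTAC.ENTRIES

    def predict(self, pc):
        """Return (predicted_taken, target) on a hit, or None on a miss."""
        index, tag = self._index_tag(pc)
        entry = self.table[index]
        if entry is None or entry[0] != tag:
            return None
        return entry[2] >= 2, entry[1]

    def update(self, pc, taken, target):
        """Record a resolved branch, taken or not taken."""
        index, tag = self._index_tag(pc)
        entry = self.table[index]
        if entry is None or entry[0] != tag:
            # Allocate (or replace on conflict), starting in a weak state.
            self.table[index] = [tag, target, 2 if taken else 1]
            return
        entry[1] = target
        entry[2] = min(3, entry[2] + 1) if taken else max(0, entry[2] - 1)


if __name__ == "__main__":
    btac = BTAC()
    print(btac.predict(0x1000))        # miss: no history yet
    btac.update(0x1000, True, 0x0800)
    print(btac.predict(0x1000))        # hit after the first resolution
```

Because the table is direct mapped, two branches whose word addresses differ by a multiple of 128 compete for the same entry. A miss of this kind is one of the cases where the static predictor takes over.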

Page 11: ARM1176JZF-S ( iPhone  3G)

Static Branch Prediction

• Static Branch Prediction is based only on the branch instruction characteristics (i.e., it does not utilize history)
• Simple: all forward conditional branches are predicted not taken, and all backward branches are predicted taken.
• “Around 65% of all branches are preceded by enough non-branch cycles to be completely predicted.” [1]
• The static branch predictor is used
  • on compulsory misses (i.e., the first time a branch is encountered)
  • when there are capacity or conflict misses in the BTAC
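The static rule and its role as a fallback can be written in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, and `btac_prediction` is an optional stand-in for whatever the dynamic predictor returned (`None` meaning a BTAC miss).

```python
def static_predict_taken(branch_pc, target_pc):
    """Backward branches are predicted taken, forward branches not taken."""
    return target_pc < branch_pc


def predict_taken(branch_pc, target_pc, btac_prediction=None):
    """Use dynamic history when it exists; otherwise fall back to the static rule."""
    if btac_prediction is not None:
        return btac_prediction
    return static_predict_taken(branch_pc, target_pc)


if __name__ == "__main__":
    # A loop-closing branch jumps backward, so it is predicted taken.
    print(predict_taken(0x2010, 0x2000))
    # A forward skip over a block is predicted not taken.
    print(predict_taken(0x2010, 0x2040))
```

The backward-taken heuristic matches typical loop structure: the branch at the bottom of a loop is taken on every iteration except the last.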

Page 12: ARM1176JZF-S ( iPhone  3G)

TrustZone

The ARM1176 processors implement “TrustZone” security extensions that “provide a secure environment for software” [1]

[Figure from [2]]

• The hardware is partitioned so that the resources are physically separated on the chip, creating a strong boundary between the Normal World and the Secure World
• Two virtual processors are created from the one physical processor, removing the need for a separate processor dedicated to security
• TrustZone-aware hardware, such as DMA controllers, allows secure data transfer
• Examples of how TrustZone can be used range from secure PIN entry from the keyboard to Digital Rights Management of multimedia data.

Page 13: ARM1176JZF-S ( iPhone  3G)

Integer Pipeline

• Up to 4 instructions fetched
• Static branch prediction in Fe2
• Decode/Issue can hold branch alongside other instruction
• Non-blocking loads
• Hit Under Miss (HUM) buffer

Page 14: ARM1176JZF-S ( iPhone  3G)

Jazelle

• Java hardware acceleration
  • Java bytecode translated to ARM instruction(s)
  • Extra decode logic between Fetch and Decode stages
• Extension of ARM instruction set
  • Limited (unpublished) subset of Java bytecodes
  • Instructions to enter and exit Jazelle state
  • Unsupported bytecodes interpreted in software by JVM
• Requires Jazelle-aware JVM
  • Relatively proprietary
  • Free/Open Source JVMs cannot take advantage

Page 15: ARM1176JZF-S ( iPhone  3G)

Thumb

• 16-bit extension to 32-bit ARM ISA
• “Most commonly used” ARM instructions in 16-bit form
• Enables higher code density
  • “Reduces memory bandwidth and size requirements by up to 35%” [4]
• Like Jazelle, requires extra pre-decode translation hardware
• Can link Thumb-compiled code optimized for space against performance-critical code compiled to 32-bit ARM

Page 16: ARM1176JZF-S ( iPhone  3G)

References

[1] “ARM1176JZF-S Processor Technical Reference Manual”, ARM Limited, document no. ARM DDI 0301F, 2004-2007.

[2] “TrustZone Hardware Architecture”, ARM Limited, http://www.arm.com/products/security/trustzone/hardware.html, downloaded Dec. 4, 2009.

[3] “Trust Zone System Design”, http://www.arm.com/products/security/trustzone/systemdesign.html, downloaded Dec. 4, 2009.

[4] “ARM1176JZ(F)-S”, ARM Limited, http://www.arm.com/products/CPUs/ARM1176.html, downloaded Dec. 4, 2009.