ARM 2007 liangalei@sjtu.edu.cn Chapter 15 The Future of the Architecture by John Rayfield Optimization Technique in Embedded System (ARM) LiangAlei@SJTU.edu.cn, 2008 April


Chapter 15

The Future of the Architecture

by John Rayfield

Optimization Technique in Embedded System (ARM), LiangAlei@SJTU.edu.cn, 2008 April


Overview

• In 1999, ARM began planning the future of the architecture.
  – What should the future direction of the architecture be?
  – This work resulted in ARMv6.
    » First implemented in the ARM1136J-S.

• Challenges for the future
  – DSP and video processing for consumer electronics (CE) devices.
  – A mixture of little- and big-endian data, for TCP/IP.
  – Synchronization methods for multiprocessor systems.
  – Power consumption (computation per mW).

• The future after ARMv6
  – ARM TrustZone


15.1 Advanced DSP & SIMD support in ARMv6

• SIMD
  – Advantages: code density and low power, because fewer instructions take less time.
  – The price of this efficiency: reduced flexibility.

• Lightweight SIMD
  – Slices the existing 32-bit datapath into four 8-bit or two 16-bit slices.
    » So the speedup is up to 2 (16-bit) or 4 (8-bit).

• ARMv6 includes this “lightweight” SIMD.
  – SADD8, UADD8, etc.
  – SADD16, UADD16, etc.
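The datapath slicing described above can be illustrated with a small portable C model. This is a sketch, not ARM's definition of the instructions: the function names `model_add8`, `model_add16`, and `simd_model_selfcheck` are hypothetical, and the model ignores the GE flags that the real instructions also update. Because the flags are ignored, the signed and unsigned variants (SADD8/UADD8, SADD16/UADD16) produce the same result bits and share one model each.

```c
#include <assert.h>
#include <stdint.h>

/* Model of SADD8/UADD8 result bits: four independent 8-bit adds.
   Each lane wraps modulo 2^8 and never carries into its neighbour. */
uint32_t model_add8(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (8 * lane)) & 0xFFu;
        uint32_t y = (b >> (8 * lane)) & 0xFFu;
        result |= ((x + y) & 0xFFu) << (8 * lane);
    }
    return result;
}

/* Model of SADD16/UADD16 result bits: two independent 16-bit adds. */
uint32_t model_add16(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0xFFFFu;
    uint32_t hi = ((a >> 16) + (b >> 16)) & 0xFFFFu;
    return (hi << 16) | lo;
}

/* Usage: one call performs four (or two) additions at once. */
void simd_model_selfcheck(void)
{
    /* Lanes, low to high: 0x10+1, 0x7F+1, 0xFF+1 (wraps to 0), 0x01+1. */
    assert(model_add8(0x01FF7F10u, 0x01010101u) == 0x02008011u);
    /* The low halfword gives 3; the high halfword wraps to 0. */
    assert(model_add16(0xFFFF0001u, 0x00010002u) == 0x00000003u);
}
```

A plain 32-bit ADD on the same operands would let the carry from one lane spill into the next, which is exactly what the sliced datapath prevents.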


ARMv6 Instruction

• SIMD arithmetic instructions

• Pack instructions
  – PKHTB Rd, Rn, Rm // pack the top half of Rn and the bottom half of Rm into Rd
  – PKHBT Rd, Rn, Rm // pack the bottom half of Rn and the top half of Rm into Rd

• Complex arithmetic instructions
  – SMUSD Rt, Ra, Rb // Rt = Ra(R)*Rb(R) - Ra(i)*Rb(i)

• Cryptographic multiplication
  – UMAAL Rl, Rh, Rm, Rs // Rh:Rl = Rm*Rs + Rh + Rl
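The arithmetic in the comments above can be made concrete with a portable C model. This is a sketch under stated assumptions: all function names are hypothetical, the optional shift operand of the pack instructions is omitted, and the real part of each complex operand is assumed to sit in the bottom halfword, with the imaginary part in the top halfword.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret a 16-bit field as a signed two's-complement value. */
static int32_t sign16(uint32_t h)
{
    return (int32_t)(h & 0x7FFFu) - (int32_t)(h & 0x8000u);
}

/* PKHBT without a shift: bottom half from rn, top half from rm. */
uint32_t model_pkhbt(uint32_t rn, uint32_t rm)
{
    return (rn & 0x0000FFFFu) | (rm & 0xFFFF0000u);
}

/* PKHTB without a shift: top half from rn, bottom half from rm. */
uint32_t model_pkhtb(uint32_t rn, uint32_t rm)
{
    return (rn & 0xFFFF0000u) | (rm & 0x0000FFFFu);
}

/* SMUSD: signed bottom*bottom minus signed top*top.
   With real parts in the bottom halves, this is the real part
   of a complex product. The difference always fits in 32 bits. */
int32_t model_smusd(uint32_t ra, uint32_t rb)
{
    return sign16(ra & 0xFFFFu) * sign16(rb & 0xFFFFu)
         - sign16(ra >> 16) * sign16(rb >> 16);
}

/* UMAAL: 32x32 multiply plus two 32-bit accumulators.
   Returns the 64-bit value Rh:Rl; the sum cannot overflow 64 bits. */
uint64_t model_umaal(uint32_t rl, uint32_t rh, uint32_t rm, uint32_t rs)
{
    return (uint64_t)rm * rs + rl + rh;
}

void media_model_selfcheck(void)
{
    assert(model_pkhbt(0x11112222u, 0x33334444u) == 0x33332222u);
    assert(model_pkhtb(0x11112222u, 0x33334444u) == 0x11114444u);
    /* (3 + 2i) * (4 + 5i): real part is 3*4 - 2*5 = 2. */
    assert(model_smusd(0x00020003u, 0x00050004u) == 2);
    /* Largest inputs exactly fill 64 bits. */
    assert(model_umaal(0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu)
           == 0xFFFFFFFFFFFFFFFFull);
}
```

The last check shows why UMAAL suits multi-word arithmetic: even with every input at its maximum, the product plus both accumulators still fits in the 64-bit result.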


15.2 System support additions to ARMv6

• Set the current endianness
  – SETEND <spec>
    » // spec = BE or LE

• And byte reversal
  – REV Rd, Rm // reverse the byte order of Rm into Rd
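What REV computes can be sketched in portable C. The function names `model_rev` and `rev_model_selfcheck` are hypothetical, and this models only the result value, for example when converting a 32-bit field between big-endian network order and a little-endian host.

```c
#include <assert.h>
#include <stdint.h>

/* Model of REV: reverse the order of the four bytes in a word. */
uint32_t model_rev(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

void rev_model_selfcheck(void)
{
    assert(model_rev(0x12345678u) == 0x78563412u);
    /* Reversing twice restores the original word. */
    assert(model_rev(model_rev(0xDEADBEEFu)) == 0xDEADBEEFu);
}
```

Without a dedicated instruction, the same swap takes several shift and mask operations, as the C expression suggests.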


15.2.2 Exception Processing

• ARMv6 adds instructions that make it more efficient for an OS to save the return state of an interrupt or exception on a stack.


15.2.3 Multiprocessing Synchronization Primitives

• System-on-Chip (SoC) architectures have become more sophisticated.
  – ARM cores are now often found in devices with many processing units that compete for shared resources.


Atomic Sync

• Previously, the SWP instruction was used to keep semaphores coherent.
  – But SWP is a bottleneck, because it is a blocking instruction: it locks the bus until the swap completes, so a processor spinning on a semaphore (as in a spin-lock) holds up other bus traffic.

• LDREX/STREX in ARMv6
  – They rely on a monitor in the memory system.
  – LDREX loads a value from M[x] into Rn, on the assumption that M[x] will not change while the value is in use.
  – STREX stores a value into M[x], and its return status indicates whether M[x] was modified between the preceding LDREX and this STREX (so STREX may fail).
  – Multiple reads, exclusive write.
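The load, modify, store-if-unchanged, retry pattern described above can be sketched in portable C11 with a compare-and-swap loop. This is an analogue, not an exact model: a compare-and-swap compares values, whereas the LDREX/STREX monitor tracks accesses to the address. Whether a compiler lowers this loop to an LDREX/STREX pair depends on the toolchain and target. The function names `atomic_add_retry` and `atomic_add_demo` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Atomically add delta to *counter and return the new value. */
uint32_t atomic_add_retry(_Atomic uint32_t *counter, uint32_t delta)
{
    /* Read the current value (the role LDREX plays). */
    uint32_t seen = atomic_load(counter);

    /* Try to publish seen + delta (the role STREX plays).
       On failure, `seen` is refreshed with the current value
       and the loop retries without blocking anyone else. */
    while (!atomic_compare_exchange_weak(counter, &seen, seen + delta)) {
        /* Another writer got in first, or the weak form failed spuriously. */
    }
    return seen + delta;
}

/* Usage: two increments starting from 40. */
uint32_t atomic_add_demo(void)
{
    _Atomic uint32_t counter = 40;
    atomic_add_retry(&counter, 1);
    return atomic_add_retry(&counter, 1);
}
```

Unlike the SWP approach, a failed attempt here costs only a retry by the losing processor; no other bus master is stalled while the update is in progress.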


Organization of ARMv6

• Most sophisticated ARM pipeline
  – 8 stages, with separate pipelines for load/store and multiply/accumulate.

• Hit-under-N-miss
  – Parallel Load Store Unit (LSU).
  – Decouples pipeline execution from the completion of loads and stores.

• Physical cache (instead of a virtual cache)
  – It reduces cache flushing on context switches.
  – Furthermore, it saves the power consumed by the associated memory accesses (up to ~20% improvement).


15.4 Future Technologies beyond ARMv6

• In 2003, ARM made further technology announcements including TrustZone and Thumb-2.


15.4.1 TrustZone

• TrustZone is an architecture extension
  – First introduced in the ARM1176JZ-S.

• Reason
  – Operating systems are now so complex that it is very hard to verify the security and correctness of the software.
  – The ARM solution is to add new operating “states” in which only a small, verifiable software kernel runs; this kernel provides services to the larger OS.
  – The microprocessor core then takes a role in controlling system peripherals, which may be made available only to the secure “state” through new signals exported on the bus interface.

• TrustZone is most useful in devices that carry out content downloads, such as cell phones or other portable devices with network connections.


15.4.2 Thumb-2

• Thumb-2 is an architecture extension
  – Designed to increase performance at high code density.
  – It allows a blend of 32-bit ARM-like instructions with 16-bit Thumb instructions.

• Thumb-2 was announced in October 2003.
  – It will be implemented in the ARM1156T2-S.
  – Details were not public at the time of writing.


Summary

• The ARM architecture is not static.
  – It is being developed and improved to suit the applications required by today’s consumer devices.
  – Although ARMv5TE was very successful at adding some DSP support to ARM, ARMv6 extends that DSP support and adds support for large multiprocessor systems.

• ARM still concentrates on one of its key benefits, code density, and has recently announced Thumb-2.

• The new focus on security with TrustZone gives ARM a lead in this area.