Chap. 1 RISC 32-bit CPU Architecture Introduction


Outline

1.1 ARM vs. MIPS
    MIPS Overview
    ARM Overview
1.2 Samsung S3C2500B (ARM9) Overview
    Samsung S3C2500B
1.3 IXP (XScale) Overview

MIPS Overview

The MIPS (Microprocessor without Interlocked Pipeline Stages) architecture grew out of research started at Stanford University under Professor John Hennessy.

The MIPS project was one of the first publicly known implementations of a Reduced Instruction Set Computer (RISC) architecture.

The MIPS processor implemented a smaller, simpler instruction set.

The MIPS processor used a technique called pipelining to process instructions more efficiently.

MIPS uses 32 general-purpose registers, each 32 bits wide.

MIPS Instruction Set Overview

The MIPS instruction set consists of about 111 instructions in total, each represented in 32 bits.

An example of a MIPS instruction and its encoding:

add $r10, $r7, $r8

Op      Rs1     Rs2     Rd      Shamt   Funct
000000  00111   01000   01010   00000   100000
        $r7     $r8     $r10
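The field layout of the example can be checked with a short Python sketch. It packs the six R-type fields (6, 5, 5, 5, 5, and 6 bits) into one 32-bit word. The helper names encode_r_type and field_string are illustrative, not part of any MIPS toolchain, and the sketch assumes the standard MIPS function code 100000 for add.

```python
# Field widths of a MIPS R-type instruction, most significant field first.
R_TYPE_FIELDS = (("op", 6), ("rs", 5), ("rt", 5), ("rd", 5), ("shamt", 5), ("funct", 6))


def encode_r_type(rs, rt, rd, funct, shamt=0, op=0):
    """Pack the R-type fields into a single 32-bit instruction word."""
    values = {"op": op, "rs": rs, "rt": rt, "rd": rd, "shamt": shamt, "funct": funct}
    word = 0
    for name, width in R_TYPE_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def field_string(word):
    """Render a 32-bit word as its six R-type bit fields, separated by spaces."""
    bits = format(word, "032b")
    parts, start = [], 0
    for _, width in R_TYPE_FIELDS:
        parts.append(bits[start:start + width])
        start += width
    return " ".join(parts)


ADD_FUNCT = 0b100000  # assumed: standard MIPS function code for add

# add $r10, $r7, $r8  ->  rd = 10, rs = 7, rt = 8
add_word = encode_r_type(rs=7, rt=8, rd=10, funct=ADD_FUNCT)
print(field_string(add_word))  # 000000 00111 01000 01010 00000 100000
print(hex(add_word))           # 0xe85020
```

Note that the destination register comes first in the assembly syntax but sits in the fourth field of the encoded word.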


ARM Overview

Advanced RISC Machines (now known as ARM) was established in November 1990.

Processor families: ARM7, ARM9, ARM10, ARM11, StrongARM, XScale (PXA, IXP, IXC, etc.)

The standard way to perform I/O functions on ARM systems is memory-mapped I/O.

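I/O Mapped I/O

Each register on a controller is assigned a dedicated I/O port.

Intel's IN and OUT instructions read from and write to these registers, respectively.

[Figure: the CPU executes outw AX, 0x68; the value 0x15D4 held in AX is written to the controller register at I/O port 0x68.]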

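Memory Mapped I/O

Memory-mapped I/O maps the registers of peripheral devices into the memory address space.

The CPU accesses these registers exactly as it would access values in memory.

[Figure: the memory address space runs from 0x0000 to 0xFFFF, with the controller register mapped at address 0xF000. The CPU executes movw 0xF000, BX followed by movw AX, [BX]; the value 0x15D4 held in AX is written to the controller register at 0xF000.]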
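The difference between the two I/O schemes can be illustrated with a small Python simulation. This is only a sketch: the classes Controller, PortMappedSystem, and MemoryMappedSystem are hypothetical models, not real hardware interfaces. The port number 0x68, the address 0xF000, and the value 0x15D4 are taken from the two figures.

```python
class Controller:
    """A hypothetical peripheral with a single 16-bit register."""

    def __init__(self):
        self.register = 0


class PortMappedSystem:
    """Separate I/O port space, reachable only through dedicated in/out operations."""

    def __init__(self, controller, port):
        self.ram = {}
        self.ports = {port: controller}

    def outw(self, value, port):
        self.ports[port].register = value & 0xFFFF

    def inw(self, port):
        return self.ports[port].register


class MemoryMappedSystem:
    """One address space; some addresses reach a device register instead of RAM."""

    def __init__(self, controller, address):
        self.ram = {}
        self.devices = {address: controller}

    def store(self, address, value):
        if address in self.devices:
            self.devices[address].register = value & 0xFFFF
        else:
            self.ram[address] = value & 0xFFFF

    def load(self, address):
        if address in self.devices:
            return self.devices[address].register
        return self.ram.get(address, 0)


# I/O mapped: a special instruction names the port.
ctrl_a = Controller()
pm = PortMappedSystem(ctrl_a, port=0x68)
pm.outw(0x15D4, 0x68)

# Memory mapped: an ordinary store reaches the device at 0xF000.
ctrl_b = Controller()
mm = MemoryMappedSystem(ctrl_b, address=0xF000)
mm.store(0xF000, 0x15D4)  # lands in the controller register
mm.store(0x1000, 0x1234)  # lands in ordinary RAM

print(hex(ctrl_a.register), hex(ctrl_b.register))
```

In the memory-mapped model the same store operation serves both RAM and the device, which is why a load/store architecture such as ARM needs no dedicated I/O instructions.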

ARM Overview (cont'd)

ARM is a full 16/32-bit RISC architecture.

ARM variants are in widespread use in embedded and low-power applications due to their power-saving design features.

Power consumption:

CPU              Power (W)   Clock (MHz)
ARM7TDMI         < 0.25      60 - 110
ARM7TDMI-S       < 0.4       > 50
ARM9TDMI         0.3         167 - 220
ARM1020E         ~0.85       200 - 400
IXP (XScale)     1.2         533
Intel 486 CPU    10          50

ARM Overview (cont'd)

ARM incorporates the following typical RISC architecture features:

A load/store architecture
  Data-processing operations operate only on register contents, not directly on memory contents.
Simple addressing modes
  All load/store addresses are determined from register contents and instruction fields only.
Pipelined
  ARM7: 3 stages
  ARM9: 5 stages
Uniform and fixed-length instruction fields, to simplify instruction decode.

ARM Overview (cont'd)

The ARM processor has a total of 37 registers:
  31 general-purpose 32-bit registers
  6 status registers

16 general registers and one or two status registers are visible at any time. The visible registers depend on the processor mode. The other registers (the banked registers) are switched in to support IRQ, FIQ, Supervisor, Abort, and Undefined mode processing.
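The totals of 31 and 6 can be reproduced by counting the banked registers per mode. The per-mode counts in this sketch are not stated on the slide; they assume the classic ARM banking scheme, in which FIQ mode has private copies of R8 to R14 and each of the other exception modes has private copies of R13 and R14.

```python
# Registers seen in User/System mode: R0-R15.
USER_VISIBLE = 16

# Assumed banking scheme: extra private general-purpose registers per exception mode.
BANKED_GENERAL = {
    "FIQ": 7,         # R8_fiq to R14_fiq
    "IRQ": 2,         # R13_irq, R14_irq
    "Supervisor": 2,  # R13_svc, R14_svc
    "Abort": 2,       # R13_abt, R14_abt
    "Undefined": 2,   # R13_und, R14_und
}

general_purpose = USER_VISIBLE + sum(BANKED_GENERAL.values())

# One CPSR shared by all modes, plus one SPSR for each exception mode.
status = 1 + len(BANKED_GENERAL)

total = general_purpose + status
print(general_purpose, status, total)  # 31 6 37
```

User mode has no SPSR, which is why one or two status registers are visible depending on the mode.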

ARM Overview (cont'd)

Registers:
  R0 to R15 are directly accessible.
  R0 to R12 are general purpose.
  R13 is the Stack Pointer (SP).
  R14 is the Link Register (LR).
  R15 is the Program Counter (PC).

ARM Overview (cont'd)

Current program status register (CPSR)
  The CPSR is accessible in all processor modes.
  It contains the condition code flags, the interrupt disable bits, the current processor mode, and other status and control information.

Saved program status register (SPSR)
  The SPSR preserves the value of the CPSR when the associated exception occurs.
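How these pieces of information share one 32-bit register can be shown with a decoding sketch. The bit positions and mode encodings are not given on the slide; they follow the classic ARM layout (N, Z, C, V in bits 31 to 28, I and F in bits 7 and 6, T in bit 5, mode in bits 4 to 0). The function name decode_cpsr and the sample value are illustrative.

```python
# Assumed mode encodings for CPSR bits [4:0] in the classic ARM architecture.
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into its flags, interrupt masks, and mode."""
    return {
        "N": bool((cpsr >> 31) & 1),  # negative
        "Z": bool((cpsr >> 30) & 1),  # zero
        "C": bool((cpsr >> 29) & 1),  # carry
        "V": bool((cpsr >> 28) & 1),  # overflow
        "irq_disabled": bool((cpsr >> 7) & 1),
        "fiq_disabled": bool((cpsr >> 6) & 1),
        "thumb": bool((cpsr >> 5) & 1),
        "mode": MODES.get(cpsr & 0x1F, "reserved"),
    }


# Sample value: Z and C set, both interrupts masked, Supervisor mode.
state = decode_cpsr(0x600000D3)
print(state)
```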

ARM Overview (cont'd)

Register organization in ARM state

Registers are arranged in partially overlapping banks, with a different register bank for each processor mode, as shown in Figure 1.

[Figure 1: register organization in ARM state]

ARM Overview (cont'd)

The ARM CPU architecture and register organization are covered in detail in Chap. 3.

ARM vs. MIPS

                              ARM             ARM7               MIPS1           MIPS16
Date announced                1985            1995               1986            1996
Instruction size (bits)       32              16/32              32              16/32
Address space (size, model)   32 bits, flat   32 bits, flat      32 bits, flat   32/64 bits, flat
Data addressing modes         6               Thumb: 6, ARM: 7   1               2


Samsung ARM S3C2500B – Product Overview

S3C2500B
  16/32-bit RISC
  Cost-effective, high-performance microcontroller solution for Ethernet-based systems
  SOHO router, Internet gateway, WLAN AP, etc.

The S3C2500B is built around a high-performance CPU core
  16/32-bit ARM940TDMI cached RISC processor
  TDMI stands for Thumb mode, Debugger core, faster Multiplier, embedded ICE logic
  Integrates 4 KB instruction/data caches, a write buffer, and an AMBA bus interface

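Write policies

Write-through
  Whenever data in the cache is modified, the corresponding content in main memory is updated immediately.
  Buffered write-through: write buffers decouple the CPU's write operations from the external bus writes to main memory.

Write-back
  When the CPU modifies cached data, only the cache is updated.
  The contents are written to main memory only when the slot is about to be replaced.
  This raises a cache coherency problem: the contents of the cache can differ from the contents of main memory.

[Figure: Processor, Cache, Write Buffer, DRAM]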
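The two write policies can be compared with a toy model. This is a minimal sketch, not a model of the S3C2500B cache: the Cache class is hypothetical, main memory is a plain dictionary, and eviction is triggered by hand rather than by a replacement policy.

```python
class Cache:
    """Toy cache that counts how often main memory is actually written."""

    def __init__(self, memory, write_back):
        self.memory = memory          # dict standing in for DRAM
        self.write_back = write_back  # True: write-back, False: write-through
        self.lines = {}               # address -> cached value
        self.dirty = set()            # addresses modified but not yet written back
        self.memory_writes = 0

    def write(self, address, value):
        self.lines[address] = value
        if self.write_back:
            self.dirty.add(address)             # defer the memory update
        else:
            self._write_memory(address, value)  # update memory immediately

    def evict(self, address):
        """Replace a slot; a dirty slot is written to memory first."""
        if address in self.dirty:
            self._write_memory(address, self.lines[address])
            self.dirty.discard(address)
        self.lines.pop(address, None)

    def _write_memory(self, address, value):
        self.memory[address] = value
        self.memory_writes += 1


wt_memory = {0x100: 0}
wb_memory = {0x100: 0}
wt = Cache(wt_memory, write_back=False)
wb = Cache(wb_memory, write_back=True)

for value in (1, 2, 3):
    wt.write(0x100, value)
    wb.write(0x100, value)

stale_value = wb_memory[0x100]  # memory still holds the old value: the coherency problem
wb.evict(0x100)                 # now the latest value reaches memory

print(wt.memory_writes, wb.memory_writes, stale_value)  # 3 1 0
```

Write-back trades fewer memory writes for a window in which the cache and main memory disagree.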

S3C2500B product overview

Integrates the following on-chip functions:
  ARM940T cached processor
  8 KB unified cache/SRAM
  I2C interface
  Ethernet controller
  HDLC controller
  GDMA controller
  UART controller
  USB controller
  IOM2 controller
  Programmable I/O ports
  Interrupt controller

Product Overview - Features

Architecture
  Embedded in-circuit emulator (ICE)
  Little/big-endian mode supported (internal architecture is big-endian)
System manager
  8/16/32-bit external bus support for ROM/SRAM, flash memory, DRAM, and external I/O
  Supports EDO/normal or SDRAM
  Four-word-deep write buffer
  Cost-effective memory-to-peripheral DMA interface
Unified instruction/data cache
  Two-way set-associative, unified 8 KB cache
  Supports the LRU (least recently used) replacement protocol
I2C serial interface
Ethernet controller (10/100 Mbps, full-duplex)
HDLC
DMA controller (2-channel general DMA)
  For memory-to-memory, memory-to-UART, and UART-to-memory transfers
UARTs (two UARTs with DMA-based or interrupt-based operation)
Timers (two 32-bit timers with interval mode or toggle mode operation)
Programmable I/O (64 programmable I/O ports)
Interrupt controller (21 interrupt sources, including 4 external interrupts)
Universal Serial Bus (USB)
  USB 1.1 compliant
  Full-speed 12 Mbps operation

[Figure: two-way set-associative cache mapping]
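Two-way set-associative mapping with LRU replacement can be sketched in a few lines of Python. The 8 KB size and the two ways come from the feature list; the 16-byte line size is an assumption made only for this example, and the TwoWayCache class is a hypothetical model that tracks tags but stores no data.

```python
class TwoWayCache:
    """Tag-only model of a two-way set-associative cache with LRU replacement."""

    def __init__(self, size_bytes=8 * 1024, line_bytes=16):  # line size is assumed
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // (line_bytes * 2)
        # Each set holds at most two tags, ordered least to most recently used.
        self.sets = [[] for _ in range(self.num_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then loaded)."""
        block = address // self.line_bytes
        index = block % self.num_sets   # which set the address maps to
        tag = block // self.num_sets    # identifies the line within that set
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)
            ways.append(tag)            # mark as most recently used
            return True
        if len(ways) == 2:
            ways.pop(0)                 # evict the least recently used way
        ways.append(tag)
        return False


cache = TwoWayCache()
# These three addresses are 4 KB apart, so they all map to set 0.
a, b, c = 0x0000, 0x1000, 0x2000
results = [cache.access(x) for x in (a, b, a, c, a, b)]
print(cache.num_sets, results)
```

With two ways, a and b coexist in the same set. Loading c evicts b, the least recently used line, so the final access to b misses again while a keeps hitting.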

[Figure: S3C2500B block diagram]


IXP (XScale) Overview

Intel XScale core
  Intel StrongARM V5 compliant
  266, 400, and 533 MHz
3 Network Processor Engines (NPE)
  Ethernet filtering
  ATM SARing
  HDLC

IXP (XScale) Overview (cont'd)

USB 1.1 device controller
  Full-speed
  16 endpoints
PCI controller
  32-bit interface
  PCI Spec. Rev. 1.1 compatible
  Host/option capable
  Master/target capable
  Two DMA channels
  264 MBps peak data rate

IXP (XScale) Overview (cont'd)

2 Ethernet MACs
ADSL support
Hardware security accelerator
  DES, 3DES, SHA-1, and MD5
  AES 128-bit and 256-bit
  For VPN, wireless, and similar applications
UTOPIA-2 interface
Low power consumption
  1.2 W @ 533 MHz

IXP (XScale) Overview (cont'd)

DSP support for:
  TI DSPs supporting HPI-8/HPI-16 bus cycles
Internal bus monitoring unit
  Seven 27-bit event counters
  Monitors internal bus occurrence and duration events
High-speed UART
Expansion bus interface

IXP (XScale) Overview (cont'd)

Typical applications
  High-performance DSL modem
  High-performance cable modem
  Residential gateway
  SME router
  Integrated access device (IAD)
  Set-top box
  DSLAM
  Access points (802.11 a/b/g)
  Network printers

IXP (XScale) Architecture

[Figure: IXP425 hardware block diagram]

[Figure: XScale core block diagram]

IXP (XScale) Core

Intel StrongARM V5TE compliant
Seven/eight-stage super-pipeline
  Integer pipe
  Multiply-accumulate (MAC) pipe
  Memory pipe
Multiply-accumulate coprocessor
  Can do 2 simultaneous 16-bit SIMD multiplies with 40-bit accumulation
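The arithmetic of two simultaneous 16-bit multiplies feeding a 40-bit accumulator can be sketched as follows. This models only the arithmetic, not any particular XScale instruction. The function names are illustrative, the operands are assumed to be signed 16-bit halves packed into 32-bit words, and overflow is assumed to wrap around rather than saturate.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def dual_mac16(acc, a, b):
    """Multiply the low and high 16-bit halves of a and b pairwise and
    add both products to a 40-bit accumulator (wraparound assumed)."""
    low_product = to_signed(a, 16) * to_signed(b, 16)
    high_product = to_signed(a >> 16, 16) * to_signed(b >> 16, 16)
    return to_signed(acc + low_product + high_product, 40)


# a packs (high = 3, low = -2); b packs (high = 4, low = 5).
result = dual_mac16(100, 0x0003FFFE, 0x00040005)
print(result)  # 100 + (-2 * 5) + (3 * 4) = 102
```

The 40-bit accumulator leaves 8 guard bits above a 32-bit sum, so many products can be accumulated before the result overflows.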

IXP (XScale) Core (cont'd)

Management unit
  32-entry data memory management unit
  32-entry instruction memory management unit
  32 KB, 32-way set-associative instruction cache
  32 KB, 32-way set-associative data cache
  2 KB, 2-way set-associative mini-data cache
  128-entry Branch Target Buffer
  8-entry write buffer
  4-entry fill and pend buffers allow “hit-under-miss” operation with data caches
Debug unit
  JTAG interface

IXP (XScale) NPE

Network Processor Engine
  Dedicated-function
  High performance, hardware-multi-threaded
  Dedicated instruction/data memory bus
  Used to offload networking functions
Additional assist hardware
  Hardware security accelerator
  CRC, AAL 2, AES, DES, SHA-1, and MD5

IXP425 Processing Power

Processor speed   Intel XScale core (Dhrystone 2.1 MIPS)   NPE MIPS        Total MIPS
266 MHz           333                                      133 x 3 = 400   733
400 MHz           500                                      133 x 3 = 400   900
533 MHz           666                                      133 x 3 = 400   1066
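The totals in the table are the sum of the core figure and the combined NPE figure, which a short calculation can confirm. The sketch uses the table's rounded NPE total of 400 (133 x 3 is exactly 399); the variable names are illustrative.

```python
# Core Dhrystone 2.1 MIPS by clock speed in MHz, as listed in the table.
CORE_MIPS = {266: 333, 400: 500, 533: 666}

# The table rounds 3 NPEs x 133 MIPS (= 399) up to 400.
NPE_TOTAL_MIPS = 400

totals = {mhz: core + NPE_TOTAL_MIPS for mhz, core in CORE_MIPS.items()}

# Core efficiency in Dhrystone MIPS per MHz, rounded to two decimals.
mips_per_mhz = {mhz: round(core / mhz, 2) for mhz, core in CORE_MIPS.items()}

print(totals)        # {266: 733, 400: 900, 533: 1066}
print(mips_per_mhz)  # 1.25 at every clock speed
```

The core delivers about 1.25 Dhrystone MIPS per MHz at all three speeds, while the NPE contribution is constant, so the NPEs account for a larger share of the total at lower clock rates.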
