
1

SHARC: ‘S’uper ‘H’arvard ‘ARC’hitecture

Nagendra Doddapaneni


2

Overview

• Harvard Architecture

• Super Harvard Architecture

• TigerSHARC processor

3

Outline

• Background

• Harvard Architecture
  − Why?
  − What?

• Modern CPU Chip Design

• Super Harvard Architecture

• TigerSHARC Processor

4

Outline

• Background <-





5

Background

• von Neumann Architecture
  − Single storage for instructions and data

• Digital Signal Processors
  − Specialized microprocessors designed specifically for digital signal processing, generally in real time

6

Outline

• Background

• Harvard Architecture
  − Why? <-
  − What?




7

Why Harvard Architecture?

• von Neumann bottleneck (‘memory bound’)

• DSP applications

• In a von Neumann architecture, the CPU can at any one time be
  − either reading an instruction
  − or reading/writing data from/to memory

8

Harvard Architecture (cont…)

9

Outline

• Background

• Harvard Architecture
  − Why?
  − What? <-




10

What is Harvard Architecture?

• Physically separate storage and signal pathways for instructions and data
• The next instruction can be fetched while the current instruction is executing
• Program memory can be small and wide
• Data memory can be large and narrower
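The difference between the two organisations can be sketched as a cycle count. This is a toy model, not a description of any real processor: it assumes every instruction needs one instruction fetch and one data access, that each bus transfer takes one cycle, and the function names are made up for illustration.

```python
def von_neumann_cycles(n_instructions: int) -> int:
    """Single shared bus: instruction fetch and data access take turns."""
    return 2 * n_instructions


def harvard_cycles(n_instructions: int) -> int:
    """Separate buses: the fetch of instruction i+1 overlaps the data access
    of instruction i, so only the very first fetch is not hidden."""
    if n_instructions == 0:
        return 0
    return n_instructions + 1
```

Under these assumptions, 1000 instructions cost 2000 cycles on the shared bus and 1001 cycles with separate buses.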

11

Outline

• Background


• Modern CPU Chip Design <-



12

Modern CPU chip design

• Incorporates features from both architectures

• ‘On chip’ cache memory is divided into an instruction cache and a data cache. The Harvard architecture is used when the CPU accesses cache memory.

• On a cache miss, ‘off chip’ main memory is accessed using the von Neumann architecture. Main memory is not separated into data and instruction sections.

13

Outline

• Background



• Super Harvard Architecture <-


14

Super Harvard Architecture

• A cache is used to store instructions, leaving both the instruction bus and the data bus free to fetch operands

• Harvard Architecture + cache = Extended Harvard Architecture or Super Harvard Architecture
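Why the cache helps can be sketched with a FIR-style loop, where each tap needs one sample and one coefficient. The model below is an assumption-laden illustration: it supposes the coefficient is stored in program memory, the sample in data memory, the loop body is a single multiply-accumulate instruction, and each bus moves one item per cycle. None of these details come from the slides.

```python
def cycles_per_tap(program_bus_transfers: int, data_bus_transfers: int) -> int:
    """The two buses work in parallel, so the busier bus sets the pace."""
    return max(program_bus_transfers, data_bus_transfers)


def fir_cycles(n_taps: int, instruction_cache: bool) -> int:
    """Cycles for n_taps multiply-accumulates of a one-instruction loop."""
    total = 0
    for tap in range(n_taps):
        # The first pass through the loop fills the cache (assumption).
        cached = instruction_cache and tap > 0
        # Program bus: the coefficient always, the instruction only on a miss.
        program_bus = 1 + (0 if cached else 1)
        data_bus = 1  # one sample per tap
        total += cycles_per_tap(program_bus, data_bus)
    return total
```

For 64 taps this gives 128 cycles without the cache and 65 with it, since after the first pass both buses carry only operands.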

15

Outline

• Background




• TigerSHARC Processor <-

16

TigerSHARC Processor

• Processor Architecture
• Instruction Parallelism and SIMD Operation
• Integer ALU
• Computational blocks
  − X and Y Register File
  − X and Y ALU
  − Multiplier
  − Shifter
  − CLU
• Program Sequencer
• I, J and K buses
• DMA Controller
• Applications

17


• Processor Architecture <-
• Instruction Parallelism and SIMD Operation
• Integer ALU
• Computational blocks



18

TigerSHARC Processor Architecture

• 3 128-bit data buses

• 2 IALUs

• 2 Computational Blocks
  − ALU (Float and Integer)
  − SHIFTER
  − MULTIPLIER
  − CLU

19


• Processor Architecture
• Instruction Parallelism and SIMD Operation <-
• Integer ALU
• Computational blocks



20

TigerSHARC Instruction Parallelism and SIMD Operation

• The core can simultaneously execute one to four 32-bit instructions encoded in a single instruction line (VLIW).

• Whether instructions can execute in parallel depends on
  − the instruction line resources each requires
  − the source and destination registers used

• Supports SIMD operations through the use of both Computational Blocks in parallel.

• Each Computational Block can execute four 16-bit or eight 8-bit SIMD computations in parallel.

21


• Processor Architecture
• Instruction Parallelism and SIMD Operation
• Integer ALU <-
• Computational blocks



22

TigerSHARC Integer ALU

• 31 32-bit general registers + 1 status register + 8 dedicated registers for circular buffers
• Performs integer ALU operations and data addressing
• ALU instructions: ADD, SUB, ARS, LRS (right shifts only), ROT (left and right), AND NOT, NOT, OR, XOR, ABS, MIN, MAX, CMP
• Status flags: zero (Z), negative (N), overflow (V), carry (C)
• Instruction conditions: EQ, LT, LE, NEQ, NLT, NLE
• Instruction options: unsigned (U), circular buffer (CB), bit reverse (BR), computed jump (CJMP)
• Address-related operations: data address generation, circular buffers, bit reverse, UREG moves, DAB control
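The two addressing modes named above, circular buffer and bit reverse, come down to simple index arithmetic. The sketch below shows only that arithmetic in software; the function names and parameters are hypothetical, and it does not model the IALU's actual registers or instruction encoding.

```python
def circular_next(index: int, step: int, base: int, length: int) -> int:
    """Advance an address by step, wrapping inside [base, base + length)."""
    return base + (index - base + step) % length


def bit_reverse(value: int, n_bits: int) -> int:
    """Reverse the lowest n_bits of value (FFT-style bit-reversed indexing)."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result
```

For an 8-entry buffer based at 0x100, stepping forward from 0x107 wraps to 0x100, and stepping back from 0x100 wraps to 0x107. Bit-reversing the indices 0 to 7 over 3 bits gives the order 0, 4, 2, 6, 1, 5, 3, 7 used when reordering FFT data.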

23



  − X and Y Register File <-
  − X and Y ALU
  − Multiplier
  − Shifter
  − CLU

• Program Sequencer
• I, J and K buses
• DMA Controller
• Applications

24

TigerSHARC Computational Blocks

X and Y Register File

• Register File Syntax
  − Each block has 32x32-bit data registers
  − Each register can store 4x8-bit, 2x16-bit or 1x32-bit words
  − Registers can be combined into dual or quad groups. These groups can store 8-, 16-, 32-, 40- or 64-bit words
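How one 32-bit register can hold four 8-bit or two 16-bit words can be shown with plain shifting and masking. This is a sketch only: the helper names are invented, and placing lane 0 in the least significant bits is an assumption, not something the slide specifies.

```python
def pack_lanes(values, lane_bits, total_bits=32):
    """Pack unsigned lane values into one word, lane 0 in the low bits."""
    if lane_bits * len(values) != total_bits:
        raise ValueError("lanes do not fill the register exactly")
    mask = (1 << lane_bits) - 1
    word = 0
    for i, value in enumerate(values):
        word |= (value & mask) << (i * lane_bits)
    return word


def unpack_lanes(word, lane_bits, total_bits=32):
    """Split a word back into its lanes, lane 0 first."""
    mask = (1 << lane_bits) - 1
    return [(word >> shift) & mask for shift in range(0, total_bits, lane_bits)]
```

Packing the bytes 0x11, 0x22, 0x33, 0x44 gives 0x44332211; reading the same register as two 16-bit words gives 0x2211 and 0x4433. Passing `total_bits=64` models a dual register group in the same way.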

25

TigerSHARC Computational Blocks

X and Y Register File

• Register File Syntax

26

Volatile registers in each block

• 24 volatile data registers in each block
  − XR0 – XR23
  − YR0 – YR23

• 2 ALU summation registers in each block
  − XPR0, XPR1, YPR0, YPR1

• 5 MAC accumulate registers in each block
  − XMR0 – XMR3, YMR0 – YMR3
  − XMR4, YMR4 – overflow registers

27



  − X and Y Register File
  − X and Y ALU <-
  − Multiplier
  − Shifter
  − CLU


28

TigerSHARC X and Y ALU

• 2x64-bit input paths
• 2x64-bit output paths
• 8-, 16-, 32- or 64-bit addition/subtraction (fixed-point)
• 32- or 64-bit logical operations (fixed-point)
• 32- or 40-bit floating-point operations

29

Sample ALU Instruction

• Example of 16 bit addition

• XYSR1:0 = R31:30 + R25:24

• Performs addition in X and Y Compute Blocks
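What a 16-bit addition on a 64-bit register pair means can be sketched in software: the pair is treated as four independent 16-bit lanes, and a carry out of one lane never reaches the next. The function below is illustrative only. Its name is made up, and wrapping on overflow is an assumption, since the slide does not say how the hardware handles lane overflow.

```python
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def add_short_lanes(a: int, b: int, lanes: int = 4) -> int:
    """Add `lanes` independent 16-bit fields of a and b."""
    result = 0
    for i in range(lanes):
        shift = i * LANE_BITS
        lane_sum = ((a >> shift) & LANE_MASK) + ((b >> shift) & LANE_MASK)
        # Keep only 16 bits so the carry cannot spill into the next lane
        # (wrap-around behaviour is an assumption of this sketch).
        result |= (lane_sum & LANE_MASK) << shift
    return result
```

With a = 0x0001FFFF00030004 and b = 0x0001000100010001, the lane-wise result is 0x0002000000040005, whereas an ordinary 64-bit addition would give 0x0003000000040005 because the carry from the third lane leaks into the fourth. Running the same operation in both compute blocks doubles the number of lanes processed per instruction.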

30



  − X and Y Register File
  − X and Y ALU
  − Multiplier <-
  − Shifter
  − CLU


31

TigerSHARC Multiplier

• Operates on fixed-point, floating-point and complex numbers

• Fixed-point numbers
  − 32x32-bit with 32- or 64-bit results
  − 4 (16x16-bit) with 4x16- or 4x32-bit results

• Floating-point numbers
  − 32x32-bit with 32-bit result
  − 40x40-bit with 40-bit result

• Complex numbers
  − 32x32-bit with 32-bit result
  − Fixed-point only

• Results stored in MR register

32

TigerSHARC Multiplier

XR0 = R1*R2;;
XR1:0 = R3*R5;;
XMR1:0 = R3*R5;;                 //uses XMR4 overflow
XR2 = MR3:2, XMR3:2 = R3*R5;;
XR3:2 = MR1:0, XMR1:0 = R3*R5;;

XFR0 = R1*R2;;
XFR1:0 = R3:2*R5:4;;             //40 bit multiply
                                 //32 bit mantissa
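The reason a 32x32-bit fixed-point multiply offers a 64-bit result can be shown with ordinary integer arithmetic. This sketch uses plain two's-complement integers and made-up helper names. Which 32 bits the hardware keeps for a 32-bit result is not stated on the slides, so keeping the low 32 bits here is an assumption.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul32_full(a: int, b: int) -> int:
    """32x32 signed multiply keeping the full 64-bit product."""
    return to_signed(to_signed(a, 32) * to_signed(b, 32), 64)


def mul32_low(a: int, b: int) -> int:
    """Same multiply, keeping only the low 32 bits (assumed behaviour)."""
    return to_signed(to_signed(a, 32) * to_signed(b, 32), 32)
```

Multiplying 0x7FFFFFFF by 2 gives 4294967294 with the full-width result, but -2 when only the low 32 bits are kept, which is why wide results and accumulators matter for multiply-accumulate loops.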

33



  − X and Y Register File
  − X and Y ALU
  − Multiplier
  − Shifter <-
  − CLU

• Program Sequencer
• I, J and K data buses
• DMA Controller
• Applications

34

TigerSHARC Shifter

• Operates on one 64-bit, one or two 32-bit, two or four 16-bit, and four or eight 8-bit fixed-point operands

• Shifts and rotates bits

• Bit manipulation operations, like bit set, clear, toggle and test

• Bit FIFO operations to support bit streams
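The bit operations listed above have direct software equivalents, sketched here for a 32-bit operand. The function names are illustrative and are not TigerSHARC mnemonics.

```python
WORD_MASK = 0xFFFFFFFF


def bit_set(word: int, pos: int) -> int:
    return (word | (1 << pos)) & WORD_MASK


def bit_clear(word: int, pos: int) -> int:
    return word & ~(1 << pos) & WORD_MASK


def bit_toggle(word: int, pos: int) -> int:
    return (word ^ (1 << pos)) & WORD_MASK


def bit_test(word: int, pos: int) -> bool:
    return bool((word >> pos) & 1)


def rotate_left32(word: int, count: int) -> int:
    """Rotate a 32-bit word left; bits leaving the top re-enter at the bottom."""
    count %= 32
    word &= WORD_MASK
    return ((word << count) | (word >> (32 - count))) & WORD_MASK
```

For example, rotating 0x80000001 left by one position gives 0x00000003, while a plain left shift would lose the top bit.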

35


• Processor Architecture
• Integer ALU
• Computational blocks
  − X and Y Register File
  − X and Y ALU
  − Multiplier
  − Shifter
  − CLU <-

• Program Sequencer
• J and K data buses
• I bus – data bus

36

TigerSHARC CLU

• CLU (Communications Logic Unit) instructions are designed to support algorithms used in communications applications

• Algorithms supported are
  − Viterbi decoding (minimal distance decoding algorithm)
  − Turbo-code decoding (variant of Viterbi decoding)
  − De-spreading for Code Division Multiple Access (CDMA) systems (used for recovering a signal that has been spread over a wide bandwidth by a Pseudo Noise code)
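De-spreading, the last algorithm listed, can be sketched in a few lines: the receiver correlates each block of received chips with the same pseudo-noise (PN) code the transmitter used and takes the sign. This is a software illustration with invented function names and an arbitrary 8-chip code. It says nothing about how the CLU instructions implement the operation.

```python
def spread(bits, pn_code):
    """Replace each data bit (+1 or -1) by the PN code multiplied by that bit."""
    return [bit * chip for bit in bits for chip in pn_code]


def despread(chips, pn_code):
    """Correlate each code-length block with the PN code and take the sign."""
    n = len(pn_code)
    bits = []
    for start in range(0, len(chips), n):
        block = chips[start:start + n]
        correlation = sum(c * p for c, p in zip(block, pn_code))
        bits.append(1 if correlation >= 0 else -1)
    return bits
```

With the code [1, -1, 1, 1, -1, -1, 1, -1], the bits [1, -1, -1, 1] are recovered exactly, and they are still recovered if a single chip per bit is corrupted, because the correlation only drops from 8 to 6.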

37




• Program Sequencer <-
• I, J and K buses
• DMA Controller
• Applications

38

TigerSHARC Program Sequencer

• Supplies instruction addresses to memory

• The instruction alignment buffer (IAB) caches up to five fetched instruction lines waiting to execute

• Extracts an instruction line from the IAB and distributes it to the appropriate core component for execution

• Determines flow control for instructions like JMP and CALL

• Reduces branch delays using branch prediction and the branch target buffer (BTB)
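How a branch target buffer can cut branch delays is sketched below in generic form. This is a textbook-style toy, not the TigerSHARC design: the class, its methods, and the policy of remembering the last taken target are all assumptions made for illustration.

```python
class BranchTargetBuffer:
    """Toy BTB: remembers the last taken target of each branch address."""

    def __init__(self):
        self._targets = {}

    def predict(self, pc):
        """Predicted next fetch address: stored target on a hit, else pc + 1."""
        return self._targets.get(pc, pc + 1)

    def update(self, pc, taken, target):
        """Record the outcome once the branch has actually resolved."""
        if taken:
            self._targets[pc] = target
        else:
            self._targets.pop(pc, None)


def count_mispredictions(btb, pc, target, outcomes):
    """Run one branch through the BTB and count wrong next-address guesses."""
    wrong = 0
    for taken in outcomes:
        actual_next = target if taken else pc + 1
        if btb.predict(pc) != actual_next:
            wrong += 1
        btb.update(pc, taken, target)
    return wrong
```

For a loop branch that is taken nine times and then falls through, this model mispredicts only twice (the first encounter and the loop exit), so the fetch unit follows the correct path for the other eight iterations.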

39




• Program Sequencer
• I, J and K buses <-
• DMA Controller
• Applications

40

TigerSHARC architecture at a glance

41

TigerSHARC Buses

• DRAM divided into 6 blocks of 4 Mbits

• The 6 blocks connect to four 128-bit wide internal buses through a crossbar connection

• The internal bus architecture provides a total memory bandwidth of 32 Gbytes/sec

• Core and I/O can access, per cycle,
  − twelve 32-bit data words
  − four 32-bit instructions
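The figures on this slide can be cross-checked with a little arithmetic. The sketch assumes every bus moves a full 128 bits each cycle and that 1 Gbyte means 10^9 bytes. The clock rate it derives is an inference from these numbers, not a value stated in the slides.

```python
BUS_COUNT = 4
BUS_WIDTH_BITS = 128
TOTAL_BANDWIDTH_BYTES_PER_S = 32e9   # 32 Gbytes/sec, taking 1 Gbyte = 1e9 bytes
BLOCKS = 6
BLOCK_SIZE_MBITS = 4

# Bytes that all buses together can move in one cycle.
bytes_per_cycle = BUS_COUNT * BUS_WIDTH_BITS // 8

# The same quantity expressed in 32-bit words: 12 data words + 4 instructions.
words_per_cycle = bytes_per_cycle // 4

# Clock rate implied by the quoted bandwidth (derived, not from the source).
implied_clock_hz = TOTAL_BANDWIDTH_BYTES_PER_S / bytes_per_cycle

# Total on-chip memory.
total_memory_mbits = BLOCKS * BLOCK_SIZE_MBITS
```

The four buses carry 64 bytes, or sixteen 32-bit words, per cycle, which matches the twelve data words plus four instructions quoted above. Dividing 32 Gbytes/sec by 64 bytes gives an implied clock of 500 MHz, and the six blocks total 24 Mbits of memory.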

42




• Program Sequencer
• I, J and K buses
• DMA Controller <-
• Applications

43

TigerSHARC DMA Controller

• On-chip, with 14 DMA channels

• Provides zero-overhead data transfers

• Operates independently of, and invisibly to, the DSP’s core

44




• Program Sequencer
• I, J and K buses
• DMA Controller
• Applications <-

45

TigerSHARC Applications

46

References

• ANALOG DEVICES
  − http://www.analog.com/processors/processors/tigersharc/index.html
  − http://www.analog.com/processors/processors/sharc/index.html
  − http://www.analog.com/processors/resources/teachingResources.html

• ECE-ADI-PROJECT HOME PAGE
  − http://www.enel.ucalgary.ca/People/Smith/ECE-ADI-PROJECT/Index/index.html
  − http://www.enel.ucalgary.ca/People/Smith/ECE-ADI-PROJECT/Index/otherschoolsFrame.htm


47

Summary

• What is Harvard Architecture?

• What is Super Harvard Architecture?

• TigerSHARC processor architecture

• How is TigerSHARC ‘faster’ for targeted DSP applications?

48

Questions?

Thank You.
