SHARC: ‘S’uper ‘H’arvard ‘ARC’hitecture
Nagendra Doddapaneni
Overview
• Harvard Architecture
• Super Harvard Architecture
• TigerSHARC processor
Outline
• Background
• Harvard Architecture
  − Why?
  − What?
• Modern CPU Chip Design
• Super Harvard Architecture
• TigerSHARC Processor
Background
• von Neumann Architecture
  − Single storage for instructions and data
• Digital Signal Processors
  − Specialized microprocessors designed specifically for digital signal processing, generally in real time
Why Harvard Architecture?
• von Neumann bottleneck (‘memory bound’)
• DSP applications
• In a von Neumann architecture, the CPU can be doing only one of these at a time:
  − Either reading an instruction
  − Or reading/writing data from/to memory
Harvard Architecture (cont…)
What is Harvard Architecture?
• Physically separate storage and signal pathways for instructions and data
• The next instruction is fetched while the current instruction executes
• Program memory can be small and wide
• Data memory can be large and narrower
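As a rough illustration of the fetch overlap described on this slide, the Python sketch below counts bus cycles for a short instruction stream. The cost model (one access per bus per cycle) and the function names are assumptions made for illustration; they do not describe any particular processor.

```python
def von_neumann_cycles(data_accesses):
    """Single shared bus: every instruction fetch and every data access
    needs its own bus cycle (assumed cost model).

    data_accesses: list with the number of data accesses per instruction.
    """
    return sum(1 + n for n in data_accesses)


def harvard_cycles(data_accesses):
    """Separate instruction and data buses: the fetch of the next
    instruction overlaps the data accesses of the current one, so each
    instruction costs max(1, n) cycles, plus one cycle to fetch the
    very first instruction."""
    if not data_accesses:
        return 0
    return 1 + sum(max(1, n) for n in data_accesses)


if __name__ == "__main__":
    program = [1, 1, 1, 1]  # four instructions, one data access each
    print("von Neumann:", von_neumann_cycles(program))
    print("Harvard:    ", harvard_cycles(program))
```

Under this model the saving grows with the share of instructions that touch data memory, which is the typical case in DSP inner loops.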
Modern CPU Chip Design
• Incorporates features from both architectures
• ‘On chip’ cache memory is divided into an instruction cache and a data cache. Harvard architecture is used when the CPU accesses cache memory.
• On a cache miss, ‘off chip’ main memory is accessed using von Neumann architecture. Main memory is not separated into data and instruction sections.
Super Harvard Architecture
• Cache used to store instructions, leaving both instruction bus and data bus free to fetch operands
• Harvard Architecture + cache = Extended Harvard Architecture or Super Harvard Architecture
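To see why the instruction cache matters, consider an FIR filter loop in which each tap needs one instruction, one coefficient and one sample. The sketch below is a simplified cycle count under stated assumptions: coefficients are stored in program memory, the loop body is a single multiply-accumulate instruction, and each bus delivers one item per cycle. The function names are hypothetical.

```python
def fir_cycles_harvard(n_taps):
    """Plain Harvard: the program bus must deliver both the instruction
    and the coefficient for every tap, so each tap takes two cycles
    while the data bus delivers the sample (assumed model)."""
    return 2 * n_taps


def fir_cycles_super_harvard(n_taps):
    """Harvard plus instruction cache: the first pass through the loop
    misses the cache and costs two cycles; afterwards the instruction
    comes from the cache and the program bus is free to fetch the
    coefficient, giving one cycle per tap."""
    if n_taps == 0:
        return 0
    return 2 + (n_taps - 1)


if __name__ == "__main__":
    for taps in (8, 64, 256):
        print(taps, fir_cycles_harvard(taps), fir_cycles_super_harvard(taps))
```

For long filters the cached version approaches half the cycle count of the uncached one in this model.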
TigerSHARC Processor
• Processor Architecture
• Instruction Parallelism and SIMD Operation
• Integer ALU
• Computational blocks
  − X and Y Register File
  − X and Y ALU
  − Multiplier
  − Shifter
  − CLU
• Program Sequencer
• I, J and K buses
• DMA Controller
• Applications
TigerSHARC Processor Architecture
• 3 128-bit data buses
• 2 IALUs
• 2 Computational Blocks
  − ALU (float and integer)
  − Shifter
  − Multiplier
  − CLU
TigerSHARC Instruction Parallelism and SIMD Operation
• The core can execute one to four 32-bit instructions simultaneously, encoded in a single instruction line (VLIW).
• Whether instructions can execute in parallel depends on
  − the instruction line resources each one requires
  − the source and destination registers used
• Supports SIMD operations through the use of both Computational Blocks in parallel.
• Each Computational Block can execute four 16-bit or eight 8-bit SIMD computations in parallel.
TigerSHARC Integer ALU
• 31 32-bit general registers + 1 status register + 8 dedicated registers for circular buffers
• Performs integer ALU operations and data addressing
• ALU instructions: ADD, SUB, ARS, LRS (right shifts only), ROT (left and right), AND NOT, NOT, OR, XOR, ABS, MIN, MAX, CMP
• Status flags: zero (Z), negative (N), overflow (V), carry (C)
• Instruction conditions: EQ, LT, LE, NEQ, NLT, NLE
• Instruction options: unsigned (U), circular buffer (CB), bit reverse (BR), computed jump (CJMP)
• Address related operations: data address generation, circular buffers, bit reverse, UREG moves, DAB control
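The circular buffer and bit reverse addressing options listed above can be sketched in a few lines of Python. This is a generic model of the two addressing modes, not the IALU's actual register interface; the function names and parameters are assumptions.

```python
def circular_next(addr, step, base, length):
    """Post-modify addr by step, wrapping so that the result stays
    inside the buffer [base, base + length)."""
    return base + (addr - base + step) % length


def bit_reverse(index, bits):
    """Reverse the low `bits` bits of index, as used to reorder the
    inputs or outputs of a radix-2 FFT."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


if __name__ == "__main__":
    # Walk an 8-word delay line at 0x100 with a stride of 3.
    addr = 0x100
    for _ in range(5):
        print(hex(addr))
        addr = circular_next(addr, 3, 0x100, 8)
    # Bit-reversed order for an 8-point FFT.
    print([bit_reverse(i, 3) for i in range(8)])
```

Doing the wrap and the bit reversal in the address generator means the inner loop of a filter or FFT spends no instructions on index bookkeeping.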
TigerSHARC Computational Blocks: X and Y Register File
• Register File Syntax
  − Each block has 32 x 32-bit data registers
  − Each register can store 4x8 bit, 2x16 bit or 1x32 bit words
  − Registers can be combined into dual or quad groups. These groups can store 8, 16, 32, 40 or 64 bit words.
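The way one 32-bit register holds four 8-bit or two 16-bit words can be modelled with shifts and masks. In the sketch below the lane order (lane 0 in the least significant bits) and the helper names are assumptions chosen for illustration.

```python
def pack_lanes(values, width):
    """Pack unsigned lane values into one integer, lane 0 in the
    least significant bits. Values are truncated to `width` bits."""
    mask = (1 << width) - 1
    word = 0
    for i, value in enumerate(values):
        word |= (value & mask) << (i * width)
    return word


def unpack_lanes(word, width, total_bits=32):
    """Split a register of total_bits into lanes of `width` bits."""
    mask = (1 << width) - 1
    return [(word >> shift) & mask for shift in range(0, total_bits, width)]


if __name__ == "__main__":
    reg = pack_lanes([0x11, 0x22, 0x33, 0x44], 8)
    print(hex(reg))                               # four bytes in one register
    print([hex(v) for v in unpack_lanes(reg, 16)])  # same bits as two shorts
```

Passing `total_bits=64` or `total_bits=128` models the dual and quad register groups in the same way.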
Volatile registers in each block
• 24 volatile data registers in each block
  − XR0 – XR23
  − YR0 – YR23
• 2 ALU summation registers in each block
  − XPR0, XPR1, YPR0, YPR1
• 5 MAC accumulate registers in each block
  − XMR0 – XMR3, YMR0 – YMR3
  − XMR4, YMR4 – Overflow registers
TigerSHARC X and Y ALU
• 2x64 bit input paths
• 2x64 bit output paths
• 8, 16, 32, or 64 bit addition/subtraction (fixed-point)
• 32 or 64 bit logical operations (fixed-point)
• 32 or 40 bit floating-point operations
Sample ALU Instruction
• Example of 16 bit addition
• XYSR1:0 = R31:30 + R25:24
• Performs addition in X and Y Compute Blocks
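A lane-wise model of the 16-bit addition above: each 64-bit register pair is treated as four independent 16-bit lanes, and the same operation runs in both compute blocks on their own register contents. Wraparound on overflow is assumed here for simplicity (the hardware's saturation options are not modelled), and the function names are hypothetical.

```python
LANE_BITS = 16
LANES = 4                      # four 16-bit lanes in a 64-bit register pair
LANE_MASK = (1 << LANE_BITS) - 1


def add_short_lanes(a, b):
    """Add two 64-bit values lane by lane. Carries never cross a lane
    boundary; each lane wraps modulo 2**16 (assumed behaviour)."""
    result = 0
    for lane in range(LANES):
        shift = lane * LANE_BITS
        lane_sum = ((a >> shift) & LANE_MASK) + ((b >> shift) & LANE_MASK)
        result |= (lane_sum & LANE_MASK) << shift
    return result


def xy_add_short(x_operands, y_operands):
    """Apply the same lane-wise add in the X and Y blocks, each with
    its own (a, b) operand pair: eight 16-bit additions in total."""
    return add_short_lanes(*x_operands), add_short_lanes(*y_operands)


if __name__ == "__main__":
    x = (0x0001_FFFF_0003_0004, 0x0001_0001_0001_0001)
    y = (0x0010_0020_0030_0040, 0x0001_0002_0003_0004)
    print([hex(r) for r in xy_add_short(x, y)])
```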
TigerSHARC Multiplier
• Operates on fixed-point, floating-point and complex numbers
• Fixed-point numbers
  − 32x32 bit with 32 or 64 bit results
  − 4 (16x16 bit) with 4x16 or 4x32 bit results
• Floating-point numbers
  − 32x32 bit with 32 bit result
  − 40x40 bit with 40 bit result
• Complex numbers
  − 32x32 bit with 32 bit result
  − Fixed-point only
• Results stored in MR register
TigerSHARC Multiplier
XR0 = R1*R2;;
XR1:0 = R3*R5;;
XMR1:0 = R3*R5;;                    //uses XMR4 overflow
XR2 = MR3:2, XMR3:2 = R3*R5;;
XR3:2 = MR1:0, XMR1:0 = R3*R5;;
XFR0 = R1*R2;;
XFR1:0 = R3:2*R5:4;;                //40 bit multiply, 32 bit mantissa
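The reason an accumulator needs extra overflow bits can be shown with a small fixed-point model. The sketch below multiplies signed 32-bit values and accumulates the products in an accumulator of configurable width; the widths and helper names are illustrative assumptions, not a description of the MR register layout.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul32(a, b):
    """Signed 32x32 multiply; the exact product always fits in 64 bits."""
    return to_signed(a, 32) * to_signed(b, 32)


def mac(samples, coeffs, acc_bits=64):
    """Multiply-accumulate. Returns the accumulator wrapped to acc_bits
    and a flag telling whether the exact sum did not fit."""
    acc = 0
    for s, c in zip(samples, coeffs):
        acc += mul32(s, c)
    wrapped = to_signed(acc, acc_bits)
    return wrapped, wrapped != acc


if __name__ == "__main__":
    big = [0x7FFFFFFF] * 3
    print(mac(big, big, acc_bits=64))   # three large products overflow 64 bits
    print(mac(big, big, acc_bits=80))   # extra guard bits absorb the growth
```

Each product of two 32-bit values already needs up to 64 bits, so a sum of several products needs additional headroom, which is the role of an overflow register alongside the accumulator.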
TigerSHARC Shifter
• Operates on one 64-bit, one or two 32-bit, two or four 16-bit, and four or eight 8-bit fixed-point operands
• Shifts and rotates bits
• Bit manipulation operations, like bit set, clear, toggle and test
• Bit FIFO operations to support bit streams
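The bit manipulation operations named above correspond to the usual mask idioms. A minimal sketch for a single 32-bit operand follows; the function names are chosen for illustration and are not the processor's mnemonics.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def bit_set(word, n):
    """Force bit n to 1."""
    return (word | (1 << n)) & MASK


def bit_clear(word, n):
    """Force bit n to 0."""
    return word & ~(1 << n) & MASK


def bit_toggle(word, n):
    """Invert bit n."""
    return (word ^ (1 << n)) & MASK


def bit_test(word, n):
    """Return True if bit n is 1."""
    return bool((word >> n) & 1)


def rotate_left(word, count):
    """Rotate a 32-bit word left; bits leaving the top re-enter at the bottom."""
    count %= WIDTH
    return ((word << count) | (word >> (WIDTH - count))) & MASK


if __name__ == "__main__":
    w = bit_set(0, 3)
    print(hex(w), bit_test(w, 3), hex(bit_toggle(w, 3)))
    print(hex(rotate_left(0x80000001, 1)))
```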
TigerSHARC CLU
• CLU instructions are designed to support different algorithms used for communications applications
• Algorithms supported are
  − Viterbi decoding (minimal distance decoding algorithm)
  − Turbo-code decoding (variant of Viterbi decoding)
  − De-spreading for Code Division Multiple Access (CDMA) systems (recovering a signal that was spread over a wide bandwidth by a pseudo-noise code)
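De-spreading is, at its core, a correlation of the received chips with the pseudo-noise (PN) code. The sketch below shows the idea in plain Python with real-valued +1/-1 chips; the PN sequence and function names are made up for illustration, and it says nothing about how the CLU instructions implement the operation.

```python
def spread(bits, pn_code):
    """Spread each data bit (0 or 1) into len(pn_code) chips of +1/-1."""
    chips = []
    for bit in bits:
        symbol = 1 if bit else -1
        chips.extend(symbol * c for c in pn_code)
    return chips


def despread(chips, pn_code):
    """Correlate each block of chips with the PN code; the sign of the
    correlation sum decides the recovered bit."""
    n = len(pn_code)
    bits = []
    for start in range(0, len(chips) - n + 1, n):
        corr = sum(chips[start + i] * pn_code[i] for i in range(n))
        bits.append(1 if corr > 0 else 0)
    return bits


if __name__ == "__main__":
    pn = [1, -1, 1, 1, -1, -1, 1, -1]   # hypothetical 8-chip code
    data = [1, 0, 1, 1]
    received = spread(data, pn)
    received[3] = -received[3]          # one corrupted chip
    print(despread(received, pn))       # the correlation still recovers the data
```

The inner sum is a long run of multiply-by-plus-or-minus-one and add operations, which is why dedicated hardware support pays off.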
TigerSHARC Program Sequencer
• Supplies instruction addresses to memory
• The instruction alignment buffer (IAB) caches up to five fetched instruction lines waiting to execute
• It extracts an instruction line from the IAB and distributes it to the appropriate core component for execution
• Determines flow control for instructions like JMP, CALL
• Reduces branch delays using branch prediction and a branch target buffer (BTB)
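The general idea of a branch target buffer can be sketched as a small table from branch address to last taken target. The model below is a generic textbook-style BTB with an assumed capacity and replacement policy; the slides do not describe the organization of the TigerSHARC BTB, so none of these details should be read as its actual design.

```python
class BranchTargetBuffer:
    """Toy BTB: remembers the target of branches that were last taken."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = {}   # branch address -> last taken target

    def predict(self, pc):
        """Return the predicted target, or None to predict fall-through."""
        return self.entries.get(pc)

    def update(self, pc, taken, target):
        """Record the real outcome once the branch has resolved."""
        if taken:
            if pc not in self.entries and len(self.entries) >= self.capacity:
                # Evict the oldest entry (dicts keep insertion order).
                self.entries.pop(next(iter(self.entries)))
            self.entries[pc] = target
        else:
            self.entries.pop(pc, None)


def count_mispredictions(btb, trace):
    """trace: iterable of (pc, taken, target) tuples in program order."""
    misses = 0
    for pc, taken, target in trace:
        actual = target if taken else None
        if btb.predict(pc) != actual:
            misses += 1
        btb.update(pc, taken, target)
    return misses


if __name__ == "__main__":
    # A loop branch at 0x40 jumping back to 0x10: taken 9 times, then exit.
    loop = [(0x40, True, 0x10)] * 9 + [(0x40, False, 0x10)]
    print(count_mispredictions(BranchTargetBuffer(), loop))
```

In this model a loop pays for a wrong guess only on entry and on exit, which is the kind of saving that matters for tight DSP loops.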
TigerSHARC architecture at a glance
TigerSHARC Buses
• DRAM divided into 6 blocks of 4 Mbits
• The 6 blocks connect to four 128-bit wide internal buses through a crossbar connection
• Internal bus architecture provides a total memory bandwidth of 32 Gbytes/sec
• Per cycle, the core and I/O can access
  − twelve 32-bit data words
  − four 32-bit instructions
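The figures on this slide are consistent with each other, as a short calculation shows. The clock rate is not stated in the slides; the value below is only what the quoted bandwidth implies, assuming 1 Gbyte = 10^9 bytes and one transfer per bus per cycle.

```python
BUS_COUNT = 4
BUS_WIDTH_BITS = 128
WORD_BITS = 32
TOTAL_BANDWIDTH_BYTES_PER_S = 32e9     # "32 Gbytes/sec" from the slide

# Bytes and 32-bit words that all four buses can move in one cycle.
bytes_per_cycle = BUS_COUNT * BUS_WIDTH_BITS // 8
words_per_cycle = BUS_COUNT * BUS_WIDTH_BITS // WORD_BITS

# Clock rate implied by the quoted bandwidth (derived, not from the slides).
implied_clock_hz = TOTAL_BANDWIDTH_BYTES_PER_S / bytes_per_cycle

# Total on-chip memory from "6 blocks of 4 Mbits".
total_memory_mbits = 6 * 4

if __name__ == "__main__":
    print(bytes_per_cycle, "bytes per cycle")
    print(words_per_cycle, "words per cycle (12 data + 4 instruction)")
    print(implied_clock_hz / 1e6, "MHz implied clock")
    print(total_memory_mbits, "Mbits of memory")
```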
TigerSHARC DMA Controller
• On-chip, with 14 DMA channels
• Provides zero-overhead data transfers
• Operates independently and invisibly to the DSP’s core
TigerSHARC Applications
Summary
• What is Harvard Architecture?
• What is Super Harvard Architecture?
• TigerSHARC processor architecture
• How is TigerSHARC ‘faster’ for targeted DSP applications?
Questions?
Thank You.