To DSP or Not to DSP?
Chad Erven

Words to Bits – Your Options
• ASIC
• FPGA
• DSP
• Embedded RISC
• General Purpose Processor (GPP)

Why Go Programmable?
1. Building the chip wrong
– Systems are increasingly too complex to be described efficiently by RTL designers
– Errors are orders of magnitude more difficult to find in hardware than in software
– Defects are extremely costly in hardware
2. Building the wrong chip
– Only software is flexible enough to adapt during and after system design
HARDWARE IS TOO HARD!

So Software and Processors, Right?
Using processors has its drawbacks, especially in SoC designs
– Never a perfect match between the application and the hardware
– Performance costs, power penalties, and wasted silicon will ALWAYS happen to some extent
– Integrating multiple disparate cores with each other

Splitting the Difference – ASIPs
Ever wish you were the processor designer?
Now you are! Write the exact instructions you need and nothing more.
An Application-Specific Instruction-set Processor (ASIP) offers the best of both worlds

Back Up!
Isn’t hardware too much work?
– Yes
So doesn’t an ASIP defeat the purpose?
– No
Why not?
– Extending a base processor is much easier
– Readily amenable to automation
– You only have to verify the instruction description; integration into the processor is guaranteed

Cool, Show Me How It Works
ASIPs derive their performance from addressing three problems that a conventional processor faces
1. Operations that are innately parallel must be expressed serially
– Somewhat solved by SIMD or MIMD processors
2. Memory space is addressed as one continuous space
– Somewhat solved by modifiers and/or pragmas (dm/pm)
3. Applications are complicated by their expression as operations on C types
– Somewhat alleviated by powerful instructions in hardware

Working with the Innate Nature of the Algorithm
Example – byte swap (common telecom task)
    unsigned int *a, *b ;  /* unsigned, so every shift is well defined */
    …
    for(int i = 0 ; i < 4096 ; i++ )
    {
        a[i] = ( ((b[i] & 0x000000ff) << 24) |
                 ((b[i] & 0x0000ff00) << 8)  |
                 ((b[i] & 0x00ff0000) >> 8)  |
                 ((b[i] & 0xff000000) >> 24) );
    }

Working with the Innate Nature of the Algorithm
Write your own instruction:
    operation swap {in AR x, out AR y} {} {
        y = {x[7:0], x[15:8], x[23:16], x[31:24]};
    }
Making the C Code:
    for(int i = 0 ; i < 4096 ; i++) a[i] = swap(b[i]) ;
Execution cycles without TIE extension: 4,915,300
Execution cycles with TIE extension: 1,638,524
3X SPEED UP!!! (4,915,300 / 1,638,524 ≈ 3.0)
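
The byte-swap instruction above can be checked on an ordinary host before any hardware exists. The following is a minimal sketch in plain C, assuming 32-bit words; the names `swap_model` and `swap_buffer` are hypothetical and are not part of the slides or of any TIE toolchain. It mirrors the concatenation {x[7:0], x[15:8], x[23:16], x[31:24]} with masks and shifts.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical host-side model of the custom byte-swap instruction:
   reverses the four bytes of a 32-bit word. Unsigned arithmetic keeps
   every shift well defined. */
static uint32_t swap_model(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) |
           ((x & 0x0000ff00u) << 8)  |
           ((x & 0x00ff0000u) >> 8)  |
           ((x & 0xff000000u) >> 24);
}

/* Applies the model to n words, the way the rewritten loop applies
   the custom instruction to each element. */
static void swap_buffer(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = swap_model(src[i]);
}
```

Swapping twice should return the original word, which makes a convenient sanity check for the instruction description.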

Instruction Fusion
[Figure: unfused versus fused operation]
Unfused operation: op1 reads reg1 and reg2 and writes reg3; op2 reads reg3 and reg4 and writes reg5.
Fused operation: op1 reads reg1 and reg2 and passes its result directly to op2, which also reads reg4 and writes reg5. The intermediate register reg3 disappears.

Example
    for(i=0 ; i<n ; i++ ) c[i] = (a[i] * b[i]) >> 4 ;
Assembly:
    loop:
        l8ui    a12,a11,0
        l8ui    a13,a10,0
        addi    a11,a11,1
        addi    a10,a10,1
        mul16u  a8,a12,a13
        srai    a8,a8,4
        s8i     a8,a9,0
        addi    a9,a9,1

Example
[Figure: dataflow graph of the unfused loop body. Nodes: l8ui (two), addi (three), mul16u, srai, s8i. Inputs: a11, a10, a9, and the constants 0, 1, and 4. Output: a9.]
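
Example
[Figure: dataflow graph after fusion. The two l8ui loads and the two addi increments on a11 and a10 remain; the multiply, shift, store, and final addi are merged into the single node fusion.mul16u.srai.s8i.addi, which reads and updates a9.]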

Example
New assembly code:
    loop:
        l8ui    a12,a11,0
        l8ui    a13,a10,0
        addi    a10,a10,1
        addi    a11,a11,1
        fusion.mul16u.srai.s8i.addi  a9,a12,a13
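
To make the fused instruction in the new assembly code concrete, here is a rough behavioural sketch in C. It assumes the fused operation multiplies its two operands as unsigned 16-bit values, shifts the product right by 4, stores the low byte at the address in a9, and advances a9 by one. The function names are hypothetical, and this models behaviour only, not the hardware description.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of the fused multiply/shift/store/increment
   operation. For 8-bit inputs the product is non-negative, so a
   logical shift matches the arithmetic shift in the assembly. */
static uint8_t *fused_mul_shift_store(uint8_t *dst, uint32_t x, uint32_t y)
{
    uint32_t product = (x & 0xffffu) * (y & 0xffffu); /* 16-bit unsigned multiply */
    uint32_t shifted = product >> 4;                  /* shift right by 4 */
    *dst = (uint8_t)shifted;                          /* store the low byte */
    return dst + 1;                                   /* advance the store pointer */
}

/* Loop body after fusion: two loads, two pointer increments, and one
   fused operation per element. */
static void scale_products(uint8_t *c, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t x = *a++; /* load byte, then increment pointer */
        uint32_t y = *b++; /* load byte, then increment pointer */
        c = fused_mul_shift_store(c, x, y);
    }
}
```

The unfused loop issues eight instructions per element and the fused one issues five, which is where the saving comes from in this sketch.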

Benchmarking
• Hand-coded assembly for the other processors
[Figure: EEMBC ConsumerMarks (performance). From [Rowen].]
[Figure: EEMBC Summary (performance/MHz). From [Rowen].]

And I Haven’t Even Gotten To…
• Sharing input operands
• Substituting variables with constants
• Replacing memory tables with logic
• Limiting immediate values to the minimum required width
• Placing operands in special registers
• Creating SIMD instructions
• Reducing the size of operand specifiers
• Custom input/output queues
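
As one illustration of an item on this list, creating SIMD instructions, the sketch below models in C what a custom four-lane, 8-bit add might compute. The name `add4x8_model` and the choice of operation are assumptions made for illustration; the slides do not define such an instruction.

```c
#include <stdint.h>

/* Hypothetical model of a custom 4 x 8-bit SIMD add: each byte lane of
   x and y is added independently, wrapping modulo 256, and no carry
   crosses into the neighbouring lane. */
static uint32_t add4x8_model(uint32_t x, uint32_t y)
{
    /* Add the low 7 bits of every lane; carries stop at bit 7 of the lane. */
    uint32_t low7 = (x & 0x7f7f7f7fu) + (y & 0x7f7f7f7fu);
    /* Sum of the top bits of every lane, with the carry-out discarded. */
    uint32_t top = (x ^ y) & 0x80808080u;
    return low7 ^ top;
}
```

In software this takes several masks and adds; as a single custom instruction the four lanes would be computed in one operation.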

Ok, Let Me Have It, Dr. Smith
(The rest of you can ask questions too)