A Parameterized Dataflow Language Extension for Embedded Streaming Systems
Yuan Lin¹, Yoonseo Choi¹, Scott Mahlke¹, Trevor Mudge¹, Chaitali Chakrabarti²
¹ Advanced Computer Architecture Lab, University of Michigan at Ann Arbor
² Department of Electrical Engineering, Arizona State University
Embedded Streaming Systems
- Mobile computing: multimedia anywhere, at any time
- Many of its key workloads are embedded streaming systems:
  - Video/audio coding (e.g., H.264)
  - Wireless communications (e.g., W-CDMA)
  - 3D graphics, and others
- Cell phones are getting more complex; PCs are getting more mobile
Characteristics of Streaming Systems
[Figure: W-CDMA physical-layer processing. Transmitter: channel encoder, interleaver, spreader, scrambler, LPF-Tx. Receiver: LPF-Rx, searcher, multiple descrambler/despreader pairs, combiner, interleaver, channel decoder (Viterbi/Turbo). Both chains sit between the analog front end and the upper layer.]
- Data are processed in a pipeline of DSP algorithm kernels
- Mostly vector/matrix-based data computation
- Periodic system reconfigurations, e.g., changing from voice communication to data communication
Embedded DSP Processors
[Figure: a multi-core DSP with an ARM control processor and four data engines (DEs), each with a SIMD unit and local memory, connected to a global memory]
- Current trend: multi-core DSPs for streaming applications
  - IBM Cell processor
  - TI OMAP
  - Many other SoCs
- Common hardware characteristics:
  - Multiple (potentially heterogeneous) data engines (DEs)
  - Software-managed scratchpad memories
  - Explicit DMA transfer operations
- Our DSP case study: SODA, a multi-core DSP processor
Programming Challenge
How can streaming systems be compiled automatically onto multi-core DSP hardware?
[Figure: a sequential program to be mapped onto the ARM and four DEs, each with a SIMD unit and local memory, sharing a global memory]
- How to divide the system into multiple threads?
- How to SIMDize DSP kernels?
- When and where to issue DMA transfers?
- How to schedule VLIW execution?
- How to manage the local and global memory?
- Who does the execution scheduling?
- ...and many other problems
Compile for Multi-core DSPs
Two-tier compilation approach
[Figure: the W-CDMA transmitter/receiver block diagram (frontend to upper layer) is compiled onto the SODA system architecture (an ARM plus four PEs, each with an execution unit and local memory, sharing a global memory); individual kernels such as void Turbo(){ ...} run on individual PEs. Each SODA PE has a SIMD side (32-lane SIMD ALU, SIMD register file, 32-lane SSN, SIMD data memory) and a scalar side (16-bit ALU, scalar register file, scalar data memory), connected through STV and VTS units for SIMD-scalar transfers.]
This presentation is focused on system-level language & compilation
Compiling functions, not instructions
System Compilation Overview
[Figure: compilation flow: SPEX source → Frontend → SPIR → Backend → per-core code for the ARM and the data engines (DE0 shown)]
- Coarse-grained compilation
  - Function-level, not instruction-level
  - A C/C++-to-C compiler
- SPEX (Signal Processing EXtension): our high-level language extension
- Frontend compilation: translate from SPEX into SPIR
- SPIR (Signal Processing IR): the system compiler's IR; models function-level interactions
- Backend compilation: function-level compilation; generates multi-threaded C code
SPIR: Function-level IR
[Figure: compilation flow (SPEX → Frontend → SPIR → Backend → PE0, ARM)]
- Must capture stream applications' system-level behaviors
- Based on the dataflow computation model:
  - Good for modeling streaming computations
  - Easy to generate parallel code from
- But which dataflow model?
[Figure: dataflow nodes connected by FIFO buffers]
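The node-and-FIFO picture can be made concrete with a small C++ sketch. The `Fifo` alias and the node functions are hypothetical illustrations, not SPIR's actual data structures. Each node fires only when its input FIFO holds a token, and nodes communicate only through FIFOs, which is what makes dataflow graphs easy to turn into parallel code.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical FIFO buffer on a dataflow edge (not SPIR's real type).
using Fifo = std::deque<int>;

// Source node: produces one token per firing.
void source_fire(Fifo& out, int value) { out.push_back(value); }

// Scale node: consumes one token and produces one token (doubled).
// It can fire only when its input FIFO is non-empty.
bool scale_fire(Fifo& in, Fifo& out) {
    if (in.empty()) return false;  // not enough tokens: cannot fire
    int x = in.front();
    in.pop_front();
    out.push_back(2 * x);
    return true;
}

// Run source -> scale -> sink over a small input stream.
// This sequential order is just one valid schedule of the graph.
std::vector<int> run_pipeline(const std::vector<int>& stream) {
    Fifo src_to_scale, scale_to_sink;
    for (int v : stream) source_fire(src_to_scale, v);
    while (scale_fire(src_to_scale, scale_to_sink)) {
    }
    assert(src_to_scale.empty());  // every produced token was consumed
    return std::vector<int>(scale_to_sink.begin(), scale_to_sink.end());
}
```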
Synchronous Dataflow
- Synchronous dataflow (SDF)
  - The simplest dataflow model
  - Static dataflow: no conditional dataflow allowed
- Pros
  - Efficiency: the execution schedule can be generated at compile time
  - Optimality: we know how to compile SDFs for multi-processor DSPs (e.g., the Berkeley Ptolemy project and the MIT StreamIt compiler)
- Cons
  - Lack of flexibility: cannot describe run-time reconfigurations in stream computations
[Figure: an SDF node with input_rate = 2 and output_rate = 3]
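Because SDF rates are fixed, a compiler can solve the balance equations at compile time to find how many times each node fires per schedule period, which is what makes static scheduling possible. Below is a minimal C++ sketch for a linear chain of nodes; the `Edge` type and the `repetitions` function are illustrative names, not taken from the paper. For a producer that writes 3 tokens per firing into a consumer that reads 2 (borrowing the two rates shown on the node above), `repetitions({{3, 2}})` returns `{2, 3}`: two producer firings yield 6 tokens, exactly what three consumer firings consume.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// One SDF edge between consecutive nodes of a chain: tokens produced per
// firing of the upstream node and consumed per firing of the downstream node.
struct Edge {
    long long produce;
    long long consume;
};

// Smallest positive integer repetition vector r for a linear SDF chain
// node0 -> node1 -> ... -> nodeN, satisfying the balance equation
// r[i] * produce_i == r[i+1] * consume_i on every edge i.
std::vector<long long> repetitions(const std::vector<Edge>& chain) {
    // Track r[i] as the reduced fraction num[i] / den[i], with r[0] = 1.
    std::vector<long long> num{1}, den{1};
    for (const Edge& e : chain) {
        assert(e.produce > 0 && e.consume > 0);
        long long n = num.back() * e.produce;
        long long d = den.back() * e.consume;
        long long g = std::gcd(n, d);
        num.push_back(n / g);
        den.push_back(d / g);
    }
    // Clear denominators by multiplying through by their lcm.
    long long l = 1;
    for (long long d : den) l = std::lcm(l, d);
    std::vector<long long> r;
    for (std::size_t i = 0; i < num.size(); ++i) r.push_back(num[i] * (l / den[i]));
    // Divide by the common gcd so the vector is minimal.
    long long g = 0;
    for (long long x : r) g = std::gcd(g, x);
    for (long long& x : r) x /= g;
    return r;
}
```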
Parameterized Dataflow
- Parameterized dataflow (PDF) uses parameters to model run-time system reconfiguration
- Each parameter is a variable with a finite set of discrete values
- First proposed by B. Bhattacharya and S. S. Bhattacharyya, "Parameterized Dataflow Modeling for DSP Systems," IEEE Transactions on Signal Processing, Oct. 2001
Parameterized attribute in SPIR: dataflow rates
- Example: a node with input_rate = {1, 4, 8} and output_rate = {2, 8}
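A PDF parameter can be modeled as a variable restricted to a finite set of discrete values. The sketch below is a hypothetical C++ illustration (the `PdfParam` class is not part of SPEX or SPIR): binding a value outside the declared set is rejected, mirroring how input_rate = {1, 4, 8} admits only those three rates.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a PDF parameter: a variable whose value must come
// from a finite, discrete set declared up front.
class PdfParam {
public:
    PdfParam(std::string name, std::vector<int> allowed)
        : name_(std::move(name)), allowed_(std::move(allowed)) {
        assert(!allowed_.empty());
    }

    // Fix the parameter to one of its allowed values (e.g., at reconfiguration).
    void bind(int value) {
        if (std::find(allowed_.begin(), allowed_.end(), value) == allowed_.end())
            throw std::invalid_argument(name_ + ": value not in parameter set");
        value_ = value;
    }

    bool is_bound() const { return value_.has_value(); }
    int value() const { return value_.value(); }  // throws if unbound
    const std::vector<int>& allowed() const { return allowed_; }

private:
    std::string name_;
    std::vector<int> allowed_;
    std::optional<int> value_;
};
```

A failed `bind` leaves the previous value in place, so a rejected reconfiguration does not corrupt the current one.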
Parameterized attribute in SPIR: conditional dataflow
- Example: an IF node routes tokens to either ifnode or elsenode depending on if_cond = {true, false}
- The edges of this conditional subgraph carry parameterized rates {1, 4, 8}, {2, 8}, {6, 8}, and {2, 4}
Parameterized attribute in SPIR: number of dataflow actors
- Example: a split node feeds A[0], A[1], ..., A[n], whose outputs are merged; the number of A nodes = {1, 4, 12}
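When the number of actors is a parameter, the split node distributes the stream across however many copies of A the current configuration requests, and the merge node reassembles the results in order. The following hypothetical C++ sketch assumes round-robin distribution; the slides do not say how the split node partitions data, and the function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Split: distribute tokens round-robin across n actor instances (lanes).
std::vector<std::vector<int>> split_lanes(const std::vector<int>& in, std::size_t n) {
    assert(n > 0);
    std::vector<std::vector<int>> lanes(n);
    for (std::size_t i = 0; i < in.size(); ++i) lanes[i % n].push_back(in[i]);
    return lanes;
}

// Merge: token i went to lane i % n at position i / n; read it back in order.
std::vector<int> merge_lanes(const std::vector<std::vector<int>>& lanes, std::size_t total) {
    std::vector<int> out;
    out.reserve(total);
    for (std::size_t i = 0; i < total; ++i)
        out.push_back(lanes[i % lanes.size()][i / lanes.size()]);
    return out;
}

// Run split -> A[0..n-1] -> merge, where n is the bound parameter value.
std::vector<int> run_split_merge(const std::vector<int>& in, std::size_t n,
                                 const std::function<int(int)>& actor_a) {
    auto lanes = split_lanes(in, n);
    for (auto& lane : lanes)  // each lane stands for one A[k] instance
        for (int& x : lane) x = actor_a(x);
    return merge_lanes(lanes, in.size());
}
```

Whatever value n takes from {1, 4, 12}, the merged output is the same, which is the property that lets the number of actors vary between configurations.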
Parameterized attribute in SPIR: streaming size between reconfigurations
- Example: stream_size = {10k, 20k}
- There are also other modifications to the dataflow model; please refer to the paper for further details
PDF Run-time Execution Model
- A three-stage run-time execution model
- Goal: provide the efficiency of synchronous dataflow execution on parameterized dataflow
- Stage 1: dataflow initialization
  - Convert a PDF graph into an SDF graph by setting parameter variables to constant values
  - Perform other initialization computation
- Stage 2: dataflow computation
  - Process the stream, from stream input to stream output, following static SDF execution schedules
- Stage 3: dataflow finalization
  - Update the dataflow states with the calculated results
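The three stages can be sketched in C++ as follows. Everything here is a hypothetical illustration: the `State` and `Config` types, the `stage_*` functions, and the toy kernel are assumptions, not the SPEX runtime. Initialization binds the parameters to constants, turning the PDF graph into a fixed-rate SDF graph; computation repeats one static schedule over the stream; finalization writes results back into persistent state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical persistent dataflow state carried across reconfigurations.
struct State {
    long long total_processed = 0;
    long long running_sum = 0;
};

// Parameter values fixed during initialization; afterwards the graph
// behaves like an SDF graph with constant rates.
struct Config {
    std::size_t block_size;   // tokens consumed per schedule iteration
    std::size_t stream_size;  // tokens processed before the next reconfiguration
};

// Stage 1: initialization. Bind parameters to constant values.
Config stage_init(std::size_t block_size, std::size_t stream_size) {
    assert(block_size > 0 && stream_size % block_size == 0);
    return Config{block_size, stream_size};
}

// Stage 2: computation. Repeat one static schedule over the stream;
// no rate decisions are made inside this loop.
long long stage_compute(const Config& cfg, const std::vector<int>& stream) {
    assert(stream.size() >= cfg.stream_size);
    long long sum = 0;
    for (std::size_t base = 0; base < cfg.stream_size; base += cfg.block_size)
        for (std::size_t k = 0; k < cfg.block_size; ++k)
            sum += 2 * stream[base + k];  // toy kernel: scale by 2, then accumulate
    return sum;
}

// Stage 3: finalization. Update the dataflow state with the results.
void stage_finalize(State& st, const Config& cfg, long long result) {
    st.total_processed += static_cast<long long>(cfg.stream_size);
    st.running_sum += result;
}
```

A reconfiguration is simply a new call to `stage_init` with different parameter values, followed by the other two stages; the `State` object persists across configurations.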
System Compilation Frontend
[Figure: compilation flow (SPEX → Frontend → SPIR → Backend → PE0, ARM)]
- Start from a stream system described in C or C++ with SPEX
- Translate the description into a dataflow representation
Q: Why can't we compile pure C/C++?
A: Some C/C++ language features cannot be translated into dataflow, e.g., passing pointers as function arguments:
- C/C++: the memory locations behind a pointer can be both read and written
- Dataflow: each edge is either read-only (an input) or write-only (an output)
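The pointer problem can be illustrated in C++. In `scale_in_place`, one pointer argument is both read and written, so the signature alone does not tell a compiler whether it denotes an input edge, an output edge, or both. Splitting the interface into a read-only input and a write-only output, as in `scale_edges`, gives each argument exactly one dataflow direction. The `InEdge` and `OutEdge` types are hypothetical illustrations, not SPEX's `channel` class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain C/C++: the same memory is read and written through one pointer,
// so it maps to neither a pure input edge nor a pure output edge.
void scale_in_place(int* buf, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) buf[i] = 2 * buf[i];
}

// Hypothetical read-only edge: tokens can only be popped.
class InEdge {
public:
    explicit InEdge(const std::vector<int>& data) : data_(data) {}
    bool empty() const { return pos_ == data_.size(); }
    int pop() { assert(!empty()); return data_[pos_++]; }
private:
    const std::vector<int>& data_;
    std::size_t pos_ = 0;
};

// Hypothetical write-only edge: tokens can only be pushed.
class OutEdge {
public:
    explicit OutEdge(std::vector<int>& data) : data_(data) {}
    void push(int v) { data_.push_back(v); }
private:
    std::vector<int>& data_;
};

// Dataflow-friendly version: input and output are distinct, one-way edges.
void scale_edges(InEdge& in, OutEdge& out) {
    while (!in.empty()) out.push(2 * in.pop());
}
```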
SPEX
    #include <spex_stream.h>                 // SPEX definition headers

    class WCDMA: spex_kernel {
        // Functions for declaring dataflow nodes
        pdf_node(interleaver)(...) { ... }
        pdf_node(turbo_dec)(...) { ... }

        // Functions for declaring a dataflow graph
        pdf_graph(wcdma_rec)() {
            ...
            interleaver(intlv_to_turbo, intlv_in);
            turbo_dec(turbo_out, intlv_to_turbo);
            ...
        }
    };
- SPEX is a set of keywords and language restrictions
- It is a guideline for programmers to write stylized C/C++ code that can be translated into dataflow (dataflow-safe C/C++ programming)
- SPEX code can be compiled directly with g++
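One way SPEX-style code could stay compilable by an ordinary C++ compiler is for the keywords to expand to plain declarations. The macro definitions below are purely hypothetical (the slides do not show the contents of spex_stream.h); they illustrate how `pdf_node(name)(args)` can become an ordinary member function, so g++ sees standard C++ while a SPEX frontend can still recognize the dataflow structure. The node bodies are placeholders.

```cpp
#include <cassert>
#include <vector>

// Hypothetical expansions; NOT the real spex_stream.h.
#define pdf_node(name) void name
#define pdf_graph(name) void name
#define pdf_graph_init(name) void name##_init
#define pdf_graph_final(name) void name##_final

struct spex_kernel {};  // hypothetical empty base class

class WCDMA : spex_kernel {
public:
    std::vector<int> fired;  // records node firings for illustration

    pdf_node(interleaver)(std::vector<int>& out, const std::vector<int>& in) {
        out = in;  // placeholder body: pass data through
        fired.push_back(1);
    }
    pdf_node(turbo_dec)(std::vector<int>& out, const std::vector<int>& in) {
        out = in;  // placeholder body: pass data through
        fired.push_back(2);
    }
    pdf_graph(wcdma_rec)() {
        std::vector<int> intlv_in{1, 2, 3}, intlv_to_turbo, turbo_out;
        interleaver(intlv_to_turbo, intlv_in);
        turbo_dec(turbo_out, intlv_to_turbo);
        assert(turbo_out == intlv_in);  // pass-through bodies preserve data
    }
    pdf_graph_init(wcdma_rec)() { fired.clear(); }  // expands to wcdma_rec_init()
    pdf_graph_final(wcdma_rec)() {}                 // expands to wcdma_rec_final()
};
```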
SPEX pdf_node Code Snippets
    // in: read-only input dataflow edge; out: write-only output dataflow edge
    pdf_node(fir)(channel<int> in, channel<int> & out) {
        ...
        z[0] = in.pop();                     // FIR's dataflow input
        for (i = 0; i < TAPS; i++) {
            sum += z[i] * coeff[i];
        }
        out.push(sum);                       // FIR's dataflow output
        ...
    }
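For experimentation outside the SPEX toolchain, the FIR node can be made self-contained. The `channel` template is a hypothetical stand-in for SPEX's `channel` (only the `pop` and `push` calls appear in the slides), and the tap count, coefficients, and delay-line shift are assumptions filling in the elided parts. Unlike the slide, `in` is passed by reference here because popping mutates this stand-in channel. Each call consumes one input token and produces one output token.

```cpp
#include <array>
#include <cassert>
#include <deque>

// Hypothetical stand-in for SPEX's channel<T>: a FIFO with pop/push.
template <typename T>
class channel {
public:
    T pop() { assert(!q_.empty()); T v = q_.front(); q_.pop_front(); return v; }
    void push(const T& v) { q_.push_back(v); }
    bool empty() const { return q_.empty(); }
private:
    std::deque<T> q_;
};

constexpr int TAPS = 4;  // assumed tap count

// Assumed filter state: delay line z and coefficients (a 4-tap moving sum).
struct FirState {
    std::array<int, TAPS> z{};  // z[0] is the newest sample
    std::array<int, TAPS> coeff{1, 1, 1, 1};
};

// One firing: pop one sample from `in`, push one filtered sample to `out`.
void fir(FirState& s, channel<int>& in, channel<int>& out) {
    for (int i = TAPS - 1; i > 0; i--) s.z[i] = s.z[i - 1];  // shift delay line
    s.z[0] = in.pop();                                       // FIR's dataflow input
    int sum = 0;
    for (int i = 0; i < TAPS; i++) sum += s.z[i] * s.coeff[i];
    out.push(sum);                                           // FIR's dataflow output
}
```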
SPEX Code Snippets
    pdf_graph(WCDMA_rec)() {
        // Static PDF node and edge declarations
        FIR fir;
        ...
        channel<int> fir_to_rake;
        ...
        // PDF scope: a PDF graph description
        pdf {
            for (i = 0; i < slot_size; i++) {
                fir.run(fir_to_rake, AtoD);
                rake.run(rake_out, fir_to_rake);
                if (mode == voice)
                    viterbi.run(mac_in, rake_out);
                else
                    turbo.run(mac_in, rake_out);
                mac(mac_in);
            }
        }
    }

    // Descriptions for the dataflow initialization and finalization stages
    pdf_graph_init(WCDMA_rec)() { ... }
    pdf_graph_final(WCDMA_rec)() { ... }
- Language restrictions within the PDF scope, e.g.:
  - Only for-loop constructs with constant loop bounds may be used
  - Only calls to pdf_node functions may be included
- These restrictions serve as a guideline for writing dataflow-safe C++ code
[Figure: the resulting dataflow graph: fir → rake → if → vit (Viterbi) or tur (Turbo) → if → mac]
System Compilation Frontend
[Figure: compilation flow (SPEX → Frontend → SPIR → Backend → PE0, ARM)]
- Translate SPEX into a parameterized dataflow representation
  - Uses traditional control-flow and dataflow analysis
- Semantic error checking to ensure dataflow-safe C/C++ code
- Possible to support other high-level languages
System Compilation Backend
[Figure: compilation flow (SPEX → Frontend → SPIR → Backend → PE0, ARM)]
- Function-level compilation
  - Node-to-DE assignments
  - Memory buffer allocations
  - DMA assignments
- Function-level optimizations
  - Software pipelining
- Code generation
  - Parallel thread generation
  - Physical buffer allocation
  - If-conversion and predicate propagation
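To make node-to-DE assignment concrete, here is a hypothetical greedy heuristic: sort nodes by estimated cost and place each on the currently least-loaded DE (longest-processing-time-first, LPT). LPT is a standard load-balancing technique chosen only for illustration; the slides do not say which assignment algorithm the backend uses, and the node costs in the usage check are made-up numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Node {
    std::string name;
    int cost;  // estimated cycles per firing (illustrative)
};

// Longest-processing-time-first: returns, for each node (in input order),
// the index of the DE it is assigned to.
std::vector<std::size_t> assign_nodes(const std::vector<Node>& nodes, std::size_t num_des) {
    assert(num_des > 0);
    std::vector<std::size_t> order(nodes.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return nodes[a].cost > nodes[b].cost;  // most expensive first
    });
    std::vector<long long> load(num_des, 0);
    std::vector<std::size_t> de_of(nodes.size());
    for (std::size_t idx : order) {
        // Ties go to the lowest-numbered DE.
        auto least = std::min_element(load.begin(), load.end());
        std::size_t de = static_cast<std::size_t>(least - load.begin());
        de_of[idx] = de;
        load[de] += nodes[idx].cost;
    }
    return de_of;
}
```

With costs fir = 30, rake = 50, turbo = 80, and mac = 10 on two DEs, turbo and mac land on DE0 (load 90) while fir and rake land on DE1 (load 80).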
Conclusion
- System-level compilation framework
  - We have a working compiler for SPEX
  - Target: SODA-like multi-core DSPs
- Parameterized dataflow is used as the compiler IR
- SPEX is a set of language extensions for efficient translation from C/C++ into dataflow
Questions?
www.eecs.umich.edu/~sdrg
Shared Variables in Dataflow
- Shared variables are not allowed in traditional dataflow models
- SPIR allows shared variables between dataflow nodes:
  - Multi-dimensional streaming patterns
  - Non-sequential streaming patterns
  - Decoupled streaming
  - Shared memory buffers
Backend Compilation
[Figure: compilation flow (SPEX → Frontend → SPIR → Backend → PE0, ARM)]
- Problem with function-level compilation: it requires function-level parallelism
  - Wireless protocols do not have many concurrent functions
[Figure: the chain FIR → Rake → Turbo mapped onto PE0, PE1, and PE2, processing input in[0..N]]
Backend Compilation
[Figure: compilation flow (SPEX → Frontend → SPIR → Backend → PE0, ARM)]
- Utilize existing compiler optimizations: function-level software pipelining
  - Processing each stream data item is analogous to one loop iteration
  - Modulo scheduling is applied to function-level compilation
[Figure: software-pipelined execution of FIR, Rake, and Turbo on PE0, PE1, and PE2; successive inputs in[i], in[i+1], and in[i+2] are staggered so that the three PEs work on different inputs concurrently]
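Treating the processing of one input as one loop iteration, a modulo schedule with initiation interval 1 starts a new input at every time step: at step t, stage s (FIR on PE0, Rake on PE1, Turbo on PE2) works on input t - s. The sketch below only builds that schedule table. It assumes every stage takes exactly one time step, which real kernels would not, and it is an illustration rather than the backend's scheduler.

```cpp
#include <cassert>
#include <vector>

// schedule[t][s] = index of the input that stage/PE s processes at step t,
// or -1 if that PE is idle (pipeline fill or drain).
using Schedule = std::vector<std::vector<int>>;

// Software-pipelined (modulo) schedule with initiation interval 1 and
// unit-latency stages: stage s processes input t - s at step t.
Schedule pipeline_schedule(int num_inputs, int num_stages) {
    assert(num_inputs > 0 && num_stages > 0);
    int steps = num_inputs + num_stages - 1;  // prologue + kernel + epilogue
    Schedule sched(steps, std::vector<int>(num_stages, -1));
    for (int t = 0; t < steps; ++t)
        for (int s = 0; s < num_stages; ++s) {
            int input = t - s;
            if (input >= 0 && input < num_inputs) sched[t][s] = input;
        }
    return sched;
}
```

With three stages and four inputs the schedule takes 6 steps, compared with 12 for running every stage of every input back to back (3 stages × 4 inputs), under the same unit-latency assumption.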