Upload
todor
View
215
Download
3
Embed Size (px)
Citation preview
Mapping of Streaming Applications considering
Alternative Application Specifications (Extended Abstract)
Jiali Teddy Zhai, Hristo Nikolov, Todor StefanovLeiden Institute of Advanced Computer Science (LIACS), Leiden University, The Netherlands
{tzhai, nikolov, stefanov}@liacs.nl
Pre-print of the full paper can be found under: http://www.liacs.nl/~tzhai/pdf/Zhai-TECS2012.pdf
Abstract—Streaming applications often require a parallel Model ofComputation (MoC) to specify their application behavior and to facili-
tate mapping onto Multi-Processor System-on-Chip (MPSoC) platforms.Various performance requirements and resource budgets of embeddedsystems ask for an efficient design space exploration (DSE) approach toselect the best design from a design space consisting of a large number
of design choices. However, existing DSE approaches explore the designspace that includes only architecture and mapping alternatives for aninitial application specification given by the application designer. In thispaper, we first show that a design often might not be optimal if alternative
specifications of a given application are not taken into account. We furtherargue that the best alternative specification consists of only independentand load-balanced application tasks. Based on the Polyhedral ProcessNetwork (PPN) MoC, we present an approach to analyze and transform
an initial PPN to an alternative one that contains only independentprocesses if possible. Finally, by prototyping real-life applications on bothFPGA-based MPSoCs and desktop multi-core platforms, we demonstrate
that mapping the alternative application specification results in a largeperformance gain compared to those approaches, in which alternativeapplication specifications are not taken into account.
I. INTRODUCTION
Streaming applications are widely used in several domains. Strin-
gent performance requirements have been driving the implementation
of streaming applications on Multi-Processor System-on-Chip (MP-
SoC) platforms, which contain an increasing number of Processing
Elements (PEs).
To enable an efficient design of streaming applications on MPSoC
platforms, application behavior is usually specified using a certain
parallel Model of Computation (MoC). However, a given application
specification in a MoC may not be the most appropriate one for the
considered MPSoC platform. This is because application designers
may not fully take the computational capacity and communication
cost of the MPSoC platform into account. Therefore, often the initial
application specification only leads to an inferior design. In this
case, alternative application specifications may be needed for efficient
design of an application on MPSoC platforms. In general, the best
application specification, if it exists, to be mapped onto n PEs is
the one that consists of n communication-free and load-balanced
tasks. In this paper, we study the problem of whether an alternative
specification exists, which consists of only communication-free and
load-balanced tasks, for an initial parallel application specification.
II. PAPER CONTRIBUTIONS
For an application modeled using the Polyhedral Process Network
(PPN) MoC [1], we devise an approach to analytically determine
the maximum amount of data-level parallelism, i.e., the number of
communication-free partitions. Subsequently, we propose a procedure
to transform the initial PPN to a set of communication-free partitions,
if they exist. Our approach also can be applied to applications with
cyclic dependences, which are traditionally considered as perfor-
mance bottleneck and hard to parallelize. Finally, we demonstrate
and validate the effectiveness of our approach by prototyping real-
�
�
�
�
�
�
�
�
� � � � � � �
�������
������������� ������� �
�� ���
�� ���� ���
(a)
���
�
���
�
���
�
���
� � � � � �
�������
������������� �
�� � ��
�������� ��
(b)
Fig. 1. Performance results of mapping the FM radio application onto (a)FPGA-based MPSoC platforms and onto (b) a desktop multi-core platform.
life streaming applications on FPGA-based MPSoCs and a desktop
multi-core platform.
III. CASE STUDIES
We performed case studies on two real-life applications modeled
using the PPN MoC, namely a Motion-JPEG (MJPEG) encoder
and the FM radio application taken from the StreamIT benchmark
suite [2]. All experiments were conducted using the open-source
ESPAM [3] tool on two different platforms, a Xilinx ML605 FPGA
board with up to 8 MicroBlaze (MB) soft-cores and a desktop multi-
core platform containing an Intel i7-920 processor with 4 cores. Due
to limited space, here we only show the performance results for the
FM radio application in Fig. 1(a) and 1(b). Compared to mapping
the initial application specification by optimally pipelining, mapping
the alternative PPN (denoted as Alternative in Fig. 1(a)) outperforms
mapping the initial PPN by 32.05% to 97.74% on 3 to 7 MBs on
the FPGA-based platform. A 7.83X speedup is achieved on 8 MBs.
On the desktop platform, our approach (denoted as Alternative in
Fig. 1(b)) outperforms the mapping of the initial PPN by 39.79% to
489.27% using 2 to 7 threads. In the best case, 3.46X speedup is
observed using 7 threads.
To quantify the time complexity of our approach, we conducted ex-
periments on a set of real-life benchmarks. In practice, our approach
takes less than 3 seconds to derive all communication-free partitions
for the considered benchmarks.
REFERENCES
[1] S. Verdoolaege, H. Nikolov, and T. Stefanov, “pn: a tool for improvedderivation of process networks,” EURASIP J. Embedded Syst., 2007.
[2] M. I. Gordon, W. Thies, and S. Amarasinghe, “Exploiting coarse-grainedtask, data, and pipeline parallelism in stream programs,” in Proc. ASPLOS,2006, pp. 151–162.
[3] H. Nikolov, T. Stefanov, and E. Deprettere, “Systematic and automatedmultiprocessor system design, programming, and implementation,” IEEE
Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, pp. 542–555,2008.
27978-1-4673-4967-3/12/$31.00 ©2012 IEEE