EuroPVM/MPI 2003. Venice, September 29 – October 2
Porting P4 to Digital Signal Processing Platforms
Juan Antonio Rico Gallego, Juan Carlos Díaz Martín
José Manuel Rodríguez García, Jesús María Álvarez Llorente
Juan Luis García Zapata
Departamento de Informática, Universidad de Extremadura, SPAIN
Index
I. Introduction and goals
II. IDSP: A Distributed Framework for DSPs
III. Implementing the P4 functionality upon IDSP
IV. Measuring the P4 Overhead
V. Conclusions
VI. Current and Future Work
Introduction and goals
DSPs have specialized architectures for running real-time digital signal processing.
Fields of application:
• Communications
• Voice and data compression
• Mobile telephony
• Speech processing
• Image and video processing
• Medical
• More...
Introduction and goals
Target machines
Networks of DSP multicomputers, such as those from Sundance™, Motorola™ or Hunt Engineering™.
[Figure: Sundance SMT310Q PCI carrier board with four TI C6201 DSPs]
Introduction and goals
Target machines
The Texas Instruments C6000 family of DSPs:
• Targeted to embedded systems
• Very limited resources
• 150 MHz, capable of delivering 900 MFLOPS
• 16 or 32 MB of 100 MHz SDRAM
• 64 KB of cache / internal RAM
• 128 KB of flash (programmable and erasable) ROM
• No MMU for virtual memory management
Introduction and goals
• High computational complexity and real-time requirements
• Most applications can be decoupled and distributed among two or more processors
Current DSP software poses a portability problem:
• Platform specific
• Provides only low-level communication libraries
• Poor support for building portable parallel applications
A distributed programming standard like MPI is needed.
IDSP: A Distributed Framework for DSPs
DSP/BIOS: the Texas Instruments kernel for the C6000 family of DSP processors (21 Kb). It provides:
• Thread management: TSK_create, TSK_delete
• Thread synchronization: SEM_pend, SEM_post
• Timing services: CLK_gethtime
• Tracing and analysis
IDSP: A Distributed Framework for DSPs
IDSP is our own development. It extends DSP/BIOS with distributed facilities (30 Kb):
• Thread management: OPER_create, OPER_destroy, GROUP_create, GROUP_destroy
• Thread point-to-point communication: COMM_send, COMM_recv, COMM_asend, COMM_arecv, COMM_wait, COMM_test, ...
IDSP runs on:
• DSK (1 × C6000)
• Sundance multicomputer SMT310Q (4 × C6000)
[Figure: a distributed DSP application running on IDSP, which runs over DSP/BIOS on several C6000 processors]
IDSP: A Distributed Framework for DSPs
An IDSP application is a group of operators communicating by message passing.
[Figure: operators 1 to 5 connected in a graph, fed by input streams 1 and 2 and producing one output stream]
An operator is a thread that runs an algorithm (FFT, etc.).
An IDSP address consists of:
• Machine
• Group
• Operator
• Port
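The four-part IDSP address can be pictured as a plain C struct. This is a minimal sketch only. The slide names the four components, but the field names, their 16-bit widths, and the helpers `idsp_addr_equal` and `idsp_addr_key` are hypothetical and are not the actual IDSP definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of an IDSP address: the slide only states that an
   address is made of a machine, a group, an operator and a port. */
typedef struct {
    uint16_t machine;   /* processor hosting the operator           */
    uint16_t group;     /* IDSP group the operator belongs to       */
    uint16_t oper;      /* operator (thread) within the group       */
    uint16_t port;      /* communication port of that operator      */
} idsp_addr_t;

/* Two addresses denote the same endpoint only if all four fields match. */
static bool idsp_addr_equal(const idsp_addr_t *a, const idsp_addr_t *b)
{
    assert(a != NULL && b != NULL);
    return a->machine == b->machine && a->group == b->group &&
           a->oper == b->oper && a->port == b->port;
}

/* Pack the address into one 64-bit key, e.g. for use in lookup tables. */
static uint64_t idsp_addr_key(const idsp_addr_t *a)
{
    assert(a != NULL);
    return ((uint64_t)a->machine << 48) | ((uint64_t)a->group << 32) |
           ((uint64_t)a->oper << 16) | (uint64_t)a->port;
}
```

Packing into a single integer is only one possible design choice. It makes the address cheap to compare and to store in the address caches described later.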
IDSP: A Distributed Framework for DSPs
IDSP has a microkernel architecture:
• A message passing kernel, the software bus, accessed through the COMM_ interface
• System server operators: Operator Server (OPER_), Group Server (GROUP_), I/O Server (CIO_), RPC system servers and the P4 address mapper
• Algorithm operators, running on top of the kernel and the servers
Implementing the P4 functionality upon IDSP
• MPICH is a portable implementation of MPI.
• It has a three-layer design:
1. MPI macros
2. Abstract Device Interface (ADI)
3. Channel Interface, P4 being a well-known example
• We have put P4 on top of IDSP.
[Figure: software stack MPI / ADI / P4 over IDSP, which runs over DSP/BIOS on each C6000]
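The point of the layering is that the upper layers call the channel only through a fixed interface. This lets P4, or a future DSP-specific channel, be swapped in underneath without changing the layers above. The sketch below illustrates that idea with a table of function pointers. `channel_ops`, `adi_send`, `adi_recv` and the loopback channel are hypothetical names and are not MPICH's actual channel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical channel interface: the operations the upper layer needs. */
typedef struct {
    const char *name;
    int (*send)(int dest, const void *buf, size_t len);
    int (*recv)(int src, void *buf, size_t len);
} channel_ops;

/* Toy loopback channel with one message buffer, standing in for P4 on IDSP. */
static char   loop_buf[256];
static size_t loop_len;

static int loop_send(int dest, const void *buf, size_t len)
{
    (void)dest;                          /* single endpoint: destination ignored */
    if (len > sizeof loop_buf) return -1;
    memcpy(loop_buf, buf, len);
    loop_len = len;
    return 0;
}

static int loop_recv(int src, void *buf, size_t len)
{
    (void)src;
    if (len < loop_len) return -1;       /* caller buffer too small */
    memcpy(buf, loop_buf, loop_len);
    return (int)loop_len;                /* number of bytes delivered */
}

static const channel_ops loopback_channel = { "loopback", loop_send, loop_recv };

/* Upper layer (standing in for the ADI): it only talks to the channel table. */
static int adi_send(const channel_ops *ch, int dest, const void *buf, size_t len)
{
    assert(ch != NULL && ch->send != NULL);
    return ch->send(dest, buf, len);
}

static int adi_recv(const channel_ops *ch, int src, void *buf, size_t len)
{
    assert(ch != NULL && ch->recv != NULL);
    return ch->recv(src, buf, len);
}
```

Replacing `loopback_channel` with another table, such as one built on IDSP primitives, would leave `adi_send` and `adi_recv` untouched.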
Implementing the P4 functionality upon IDSP
The P4 re-entrancy problem
• P4 is process based: each process links its own copy of the P4 library on top of the operating system.
• IDSP is thread based: threads share one modified P4 library on top of IDSP.
A thread-safe version of P4 has been built by:
• Putting P4 global variables in the private zone of each IDSP thread
• Using mutual exclusion mechanisms
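The first technique, moving the globals into a thread-private zone, can be sketched as follows. The former P4 globals are gathered into a struct, and each thread binds its own instance. The library then reaches them through an accessor instead of touching a process-wide variable. This is a host-side sketch under stated assumptions. The field names, `p4_bind_globals`, `p4_get_globals` and the `P4_GLOBAL` macro are hypothetical. C11 `_Thread_local` stands in for the task-private pointer that a DSP/BIOS thread would use. State that is really shared would still need a lock, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Former P4 process-wide globals, gathered into one per-thread structure
   (illustrative fields only). */
typedef struct {
    int my_id;
    int num_procs;
    int msgs_sent;
} p4_globals_t;

/* Per-thread slot holding this thread's globals. On the DSP a task-private
   pointer would play this role; here C11 _Thread_local is used instead. */
static _Thread_local p4_globals_t *p4_env;

static void p4_bind_globals(p4_globals_t *g) { p4_env = g; }

static p4_globals_t *p4_get_globals(void)
{
    assert(p4_env != NULL && "thread has not bound its P4 globals");
    return p4_env;
}

/* Code that used to write a global now goes through the accessor,
   so each thread updates only its own copy. */
#define P4_GLOBAL(field) (p4_get_globals()->field)

static void p4_note_send(void) { P4_GLOBAL(msgs_sent)++; }
```

In a real port, each operator would call `p4_bind_globals` once at thread start. The rest of the library code changes only in how it spells its global variables.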
Implementing the P4 functionality upon IDSP
Communication network
• P4 is based upon TCP/IP Berkeley sockets (IP addresses), but
• IDSP provides its own addressing scheme (IDSP addresses).
We have built IDSP/Sockets, a thin and efficient implementation of Berkeley sockets atop IDSP.
[Figure: P4 (IP addresses, sockets) over IDSP/Sockets over IDSP (IDSP addresses) over DSP/BIOS on each C6000]
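Implementing the P4 functionality upon IDSP
The IP/IDSP mapping
A call p4_send(rank, ...) on the sender becomes send(IP_address, ...) in IDSP/Sockets, which must finally be issued as COMM_send(IDSP_address, ...):
1. The receiver user operator registers its (idsp_addr, ip_addr) pair with the Address Mapping Server.
2. The sender user operator asks the server: Get(ip_addr).
3. The server replies with the matching idsp_addr.
4. The sender calls COMM_send(IDSP_address, ...).
Every user operator keeps a cache of addresses, so steps 2 and 3 are needed only when the address is not yet cached.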
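Steps 1 to 3 and the per-operator cache can be sketched in a few lines of C. The sketch assumes a fixed-size server table, IPv4 addresses held in a `uint32_t`, IDSP addresses packed into a `uint64_t`, and a small round-robin cache. All names (`server_register`, `server_get`, `resolve`, `addr_cache`) and sizes are hypothetical. The real Address Mapping Server is an IDSP operator reached by messages, not by direct function calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_MAX   16   /* capacity of the (hypothetical) mapping server */
#define CACHE_MAX 4    /* capacity of each operator's private cache     */

typedef uint32_t ip_addr_t;     /* IPv4 address used by P4             */
typedef uint64_t idsp_key_t;    /* packed IDSP address (illustrative)  */

typedef struct { ip_addr_t ip; idsp_key_t idsp; } map_entry;

/* Address Mapping Server state. */
static map_entry server_tab[MAP_MAX];
static size_t    server_n;
static unsigned  server_queries;   /* counts Get() requests reaching the server */

/* Step 1: the receiver registers its (idsp_addr, ip_addr) pair. */
static bool server_register(idsp_key_t idsp, ip_addr_t ip)
{
    if (server_n == MAP_MAX) return false;
    server_tab[server_n++] = (map_entry){ ip, idsp };
    return true;
}

/* Steps 2-3: the sender asks Get(ip_addr) and receives the IDSP address. */
static bool server_get(ip_addr_t ip, idsp_key_t *out)
{
    server_queries++;
    for (size_t i = 0; i < server_n; i++)
        if (server_tab[i].ip == ip) { *out = server_tab[i].idsp; return true; }
    return false;
}

/* Per-operator cache, consulted before asking the server. */
typedef struct { map_entry e[CACHE_MAX]; size_t n, next; } addr_cache;

static bool resolve(addr_cache *c, ip_addr_t ip, idsp_key_t *out)
{
    assert(c != NULL && out != NULL);
    for (size_t i = 0; i < c->n; i++)
        if (c->e[i].ip == ip) { *out = c->e[i].idsp; return true; }  /* hit */
    if (!server_get(ip, out)) return false;        /* unknown IP address */
    c->e[c->next] = (map_entry){ ip, *out };       /* round-robin replacement */
    c->next = (c->next + 1) % CACHE_MAX;
    if (c->n < CACHE_MAX) c->n++;
    return true;
}
```

Step 4 would then pass the resolved address to COMM_send. The `server_queries` counter is there only to show that a second send to the same IP address is served from the cache.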
Implementing the P4 functionality upon IDSP
Signals
• P4 uses UNIX signals for time-outs and process management, but...
• DSP/BIOS does not provide signals!
However, the threads involved in a DSP application interact with the kernel quite frequently for data I/O. IDSP takes advantage of this to support the UNIX signal mechanism:
1. A special message is sent to the target thread.
2. The target thread receives this message on its next socket read.
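The two-step mechanism can be sketched as follows. "Sending a signal" posts a tagged message to the target thread's mailbox. The socket read routine dispatches any signal messages it finds to the registered handler, then keeps reading until data arrives. The message layout, the mailbox, and the names `emu_kill`, `sock_read` and `record_signal` are hypothetical. A single-threaded queue stands in for the IDSP port. As on the slide, a thread that never reads would never see its signals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MBOX_MAX 8
#define NSIG_EMU 32

typedef enum { MSG_DATA, MSG_SIGNAL } msg_kind;   /* hypothetical message tags */

typedef struct {
    msg_kind kind;
    int      signo;        /* valid when kind == MSG_SIGNAL */
    char     data[32];     /* payload when kind == MSG_DATA */
    size_t   len;
} message;

/* Per-thread mailbox, standing in for the IDSP port the socket reads from. */
typedef struct {
    message q[MBOX_MAX];
    size_t  head, count;
    void  (*handlers[NSIG_EMU])(int);
} mailbox;

static int mbox_put(mailbox *m, message msg)
{
    if (m->count == MBOX_MAX) return -1;
    m->q[(m->head + m->count) % MBOX_MAX] = msg;
    m->count++;
    return 0;
}

/* Step 1: a "signal" is just a special message posted to the target thread. */
static int emu_kill(mailbox *target, int signo)
{
    assert(signo > 0 && signo < NSIG_EMU);
    message msg = { .kind = MSG_SIGNAL, .signo = signo };
    return mbox_put(target, msg);
}

/* Step 2: the next socket read dispatches pending signal messages to their
   handlers and then returns the first data message. */
static long sock_read(mailbox *m, void *buf, size_t len)
{
    while (m->count > 0) {
        message msg = m->q[m->head];
        m->head = (m->head + 1) % MBOX_MAX;
        m->count--;
        if (msg.kind == MSG_SIGNAL) {
            if (m->handlers[msg.signo]) m->handlers[msg.signo](msg.signo);
            continue;
        }
        size_t n = msg.len < len ? msg.len : len;
        memcpy(buf, msg.data, n);
        return (long)n;
    }
    return -1;   /* nothing queued; a real read would block here */
}

/* Example handler used for illustration: remembers the last signal seen. */
static int last_signal;
static void record_signal(int signo) { last_signal = signo; }
```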
Implementing the P4 functionality upon IDSP
The startup process
P4 uses a text file specifying program files and machines:
Local 0
Sun2 1 /home/user/P4pgms/sun/prog1
Sun3 2 /home/user/P4pgms/sun/prog2
rs6000 1 /home/user/P4pgms/rs6000/prog1
But embedded systems don't use disks!
The IDSP approach is as follows:
1. Every operator has a well-known integer identifier.
2. A limited number of operators is linked into the application.
3. GROUP_create takes an array of operator identifiers.
4. Currently, it assigns each operator to the least loaded machine.
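Step 4, least-loaded placement, can be sketched as a greedy loop over the requested operator identifiers. The slide does not give the real GROUP_create signature or say how load is measured. Here load is assumed to be the number of operators already running on each machine, and ties go to the lowest machine number. `place_group`, `placement` and the `OP_*` identifiers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical well-known identifiers of operators linked into the image. */
enum { OP_FFT = 1, OP_FIR = 2, OP_DECODER = 3 };

typedef struct {
    int oper_id;   /* well-known operator identifier */
    int machine;   /* machine chosen for it          */
} placement;

/* Assign each requested operator to the machine currently running the fewest
   operators (ties go to the lowest machine number). Modelling "load" as an
   operator count is an assumption made for this sketch. */
static void place_group(const int *oper_ids, size_t n_ops,
                        int *load, size_t n_machines, placement *out)
{
    assert(n_machines > 0);
    for (size_t i = 0; i < n_ops; i++) {
        size_t best = 0;
        for (size_t m = 1; m < n_machines; m++)
            if (load[m] < load[best]) best = m;
        out[i].oper_id = oper_ids[i];
        out[i].machine = (int)best;
        load[best]++;              /* the new operator adds to that machine's load */
    }
}
```

For example, with four machines already running 0, 2, 0 and 1 operators, five new operators are placed on machines 0, 2, 0, 2 and 3. This leaves every machine with two operators.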
Measuring the P4 Overhead
Overhead of the socket interface on IDSP
[Figure: time to send short messages between two operators using send (BSD sockets on IDSP) vs. COMM_send (IDSP); time (ms) vs. message size (100 to 1000 bytes)]
Measuring the P4 Overhead
Overhead of the P4 interface on IDSP
[Figure: time to send short messages between two operators using P4_send (P4) vs. COMM_send (IDSP); time (ms) vs. message size (100 to 1000 bytes)]
Conclusions
• IDSP, a message passing interface for DSPs, has been defined and implemented.
• IDSP performance on the TI C6000 DSP architecture is currently reasonably good (50 µs for short messages).
• We have been able to support P4 upon the small IDSP interface.
• P4 performance upon IDSP is good, but not good enough for high-performance distributed digital signal processing.
• A more finely tuned channel interface layer is needed for DSPs.
Current and Future Work
• IDSP is currently being augmented with MPI-like point-to-point primitives, such as COMM_waitany.
• A DSP-specific channel interface layer will be developed.
• The ADI and MPI will be supported on top of this layer.
• Support for the 64-bit C6400 family will be addressed soon.
Thank you very much!
Implementing the P4 functionality upon IDSP
Groups
• MPI implements the concept of group.
• IDSP has a different concept of group.
How is this managed?
The groups and processes of an MPI application run in the context of an IDSP group.
[Figure: several MPI groups of one MPI application, all contained within a single IDSP group]
Implementing the P4 functionality upon IDSP
Listener process
• P4 uses an auxiliary (listener) process for doing background work.
• IDSP has no auxiliary thread.
How does IDSP do this work?
We use an asynchronous communicator, on an additional port of each operator, for:
• Doing this background work
• Sending the initial information threads need in order to run (threads have no parameters at startup)
[Figure: an operator with a communication port (SEND, RECEIVE) and an additional port receiving CONNECTION_REQ, DIE and INITIAL_INFO messages]
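The background work handled on the additional port amounts to dispatching a few control message types. The sketch below uses the three message names from the slide, but everything else is an assumption for illustration: the message layout, the operator state, the fixed limit of eight peers, and the handler name `on_ctrl_message`. It shows INITIAL_INFO standing in for the startup parameters that threads do not receive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Control messages seen on the operator's additional port (names from the
   slide); the payload layout is an assumption. */
typedef enum { CONNECTION_REQ, DIE, INITIAL_INFO } ctrl_kind;

typedef struct {
    ctrl_kind kind;
    int       peer;         /* CONNECTION_REQ: rank asking to connect */
    int       rank, size;   /* INITIAL_INFO: startup parameters       */
} ctrl_msg;

typedef struct {
    int  rank, size;        /* filled in by INITIAL_INFO              */
    bool started, dying;
    bool connected[8];      /* peers with an established connection   */
} oper_state;

/* Work that a P4 listener process would do in the background, handled here
   when a message arrives asynchronously on the operator's extra port. */
static void on_ctrl_message(oper_state *st, const ctrl_msg *m)
{
    assert(st != NULL && m != NULL);
    switch (m->kind) {
    case INITIAL_INFO:       /* threads get no arguments at startup */
        st->rank = m->rank;
        st->size = m->size;
        st->started = true;
        break;
    case CONNECTION_REQ:
        if (m->peer >= 0 &&
            m->peer < (int)(sizeof st->connected / sizeof st->connected[0]))
            st->connected[m->peer] = true;
        break;
    case DIE:
        st->dying = true;    /* the operator shuts down at its next check */
        break;
    }
}
```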
An IDSP thread runs an algorithm in a different sense than an MPI/P4 process does: MPI/P4 processes all run the same program.
Implementing the P4 functionality upon IDSP
The IP/IDSP mapping
[Figure: the receiver user operator registers its (idsp_addr, ip_addr) pair with the Address Mapping Server (1); the sender user operator sends Get(ip_addr) (2) and obtains the idsp_addr (3); every user operator keeps a cache of addresses]
• P4 maps process ranks into IP addresses.
• IDSP/Sockets maps IP addresses into IDSP addresses: