Enea Confidential – Under NDA
SICS Multicore Day – 2013-09-23
System-level IPC on Multi-core
Platforms
Ola Dahl
CTO Office
Enea
Copyright © 2013 Enea AB
Before we start
• Enea – founded 1968; OSE, Linux, middleware, products and services; ~400 employees; 468 MSEK revenue
• Myself – LTH, LiU, ST-Ericsson
OSE operating system – kernel services, file system services, IP
communication, program management, run-time loader, LINX
Number of communicating entities ~ tens of thousands (pid
space extension from 16 to 20 bits) – number of nodes ~ 100s
System-level IPC
Message-passing between processes – intra-node
and inter-node
Monitoring and event handling – fault-tolerance
System-level IPC
[Figure: communicating entities A, B, C and D placed on three kinds of platform: SoC Platform, Fixed Multi-Node, and Elastic Multi-Node; the figure also carries the label Cloud]
Element Messaging Framework – Name
server, message dispatch, communication
patterns, HA functionality, Linux
#nodes ~ 100(s)
#threads/node ~ 1000s
Communicating entities - Linux process, Linux thread, RTOS task, Bare-metal
executive, User-space thread, Other executing entity (e.g. in an event-driven
execution model)
IPC
[Figure: IPC between entities running on two operating systems]
Multicore, Multiple processing entities, Parallelism on different levels – inside
one SoC block, inside SoC, between SoC
Communication on different levels – interconnect, caches, memory, hardware
buffers and hardware IPC support
Real-time – core isolation – dedicated cores for real-time response
IPC and Multicore
[Figure: two operating systems, one on cores C0–C4 and one on cores C0–C1 and D0–D2, each above Bus, Interconnect, Cache, Controllers, I/O; cores are divided into Realtime and Non-Realtime]
TCI6638K2K - Multicore DSP+ARM KeyStone II System-on-Chip -
http://www.ti.com/product/tci6638k2k
Heterogeneous Hardware
Processing – 8 C66x DSP cores (up to 1.2 GHz), 4 ARM cores (up to 1.4 GHz), wireless comm (3GPP) coprocessors
Interconnect and control – Multicore Navigator, TeraNet, Multicore Shared Memory Controller, HyperLink
Core isolation for real-time response
Real-time domain and non-real-time domain
Run-time categories in real-time domain
• Native threads
• User-space threads
• RTOS migration
• Other execution frameworks, e.g. Open Event Machine
• ENEA LWRT
Heterogeneous Software
[Figure: one operating system on cores C0–C1 and D0–D2, above Bus, Interconnect, Cache, Controllers, I/O; cores are divided into Realtime and Non-Realtime]
Communicating entities – e.g. processes, threads, user-space
threads, bare-metal executives
Levels of parallelism
• Multicore processor in a SoC
• Multiple blocks in a SoC
• Multiple SoC in a node
• Multiple nodes
Communication on different levels (e.g. intra-node and inter-node)
• On each level – Establish contact, Perform communication,
Monitor and act on events, Close
System-level IPC and Multicore
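The four steps listed for each level (establish contact, perform communication, monitor and act on events, close) can be sketched on a single node with a named FIFO. This is only an illustration and is not taken from the slides: the function name `fifo_lifecycle`, the file name `demo.fifo`, and the choice of a FIFO are assumptions, and both ends live in one process to keep the sketch short.

```python
import os
import tempfile


def fifo_lifecycle(message: bytes):
    """Walk one FIFO through establish, communicate, monitor, close."""
    events = []
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "demo.fifo")  # hypothetical name

        # Establish contact: create the file system node and open both ends.
        # The read end is opened non-blocking so that open() does not wait
        # for a writer to appear.
        os.mkfifo(path)
        reader = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        writer = os.open(path, os.O_WRONLY)
        events.append("established")

        # Perform communication: one small write, one read.
        os.write(writer, message)
        received = os.read(reader, len(message))
        events.append("communicated")

        # Monitor and act on events: once the only writer has closed,
        # a read returns b"" (end of file), which the reader can treat
        # as "peer gone".
        os.close(writer)
        if os.read(reader, 1) == b"":
            events.append("peer closed")

        # Close: release the descriptor and remove the node.
        os.close(reader)
        os.unlink(path)
        events.append("closed")
    return received, events


if __name__ == "__main__":
    print(fifo_lifecycle(b"hello"))
```

In a real system the two ends would sit in different processes or on different nodes, and the monitoring step would typically be handled by the IPC mechanism itself rather than by polling for end of file.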
Where are we heading?
• Linux
• Virtualisation
• Hardware
EE Times report - http://seminar2.techonline.com/~additionalresources/embedded_mar1913/embedded_mar1913.pdf
Linux
Linux usage
2013 – 50%
2012 – 46%
Status of embedded Linux – March 2013 - http://elinux.org/images/c/cf/Status-of-Embedded-Linux-2013-03-JJ44.pdf
• Average time between Linux releases, 3.3 to 3.8 – 70 days
• Linux 3.4 – RPMsg for IPC between Linux and e.g. RTOS
• Linux 3.7 – ARM multi-platform support, ARM 64-bit support
• Linux 3.7 – perf trace (alternative to strace)
Status of Linux – September 2013
• Latest stable kernel – 3.11.1
• Example changes in 3.11 (released September 2, 2013):
– ARM huge page support, KVM and XEN support for ARM64
– SYSV IPC message queue scalability improvements
• Example changes in 3.10 (released June 30, 2013):
– Timerless multitasking
Linux
Real-time framework e.g. Xenomai - http://www.xenomai.org/
PREEMPT_RT - https://rt.wiki.kernel.org/index.php/Main_Page
Core isolation and tickless operation – striving for “Bare-Metal Multicore
Performance in a General-Purpose Operating System” -
http://www2.rdrop.com/~paulmck/scalability/paper/BareMetalMW.2013.02.25a.
Timerless multitasking in 3.10 retains 1 Hz tick also on isolated cores
Linux 3.12-rc1 (2013-09-16) - even more tickless kernel (1 Hz maintenance tick
removed) – still work to be done, e.g. with memory management
Linux and real-time
ITRS - http://public.itrs.net - fifteen-year assessment of the semiconductor
industry’s future technology requirements
ITRS 2012 UPDATE - http://public.itrs.net/Links/2012ITRS/Home2012.htm
• System Drivers - SOC Networking Driver, SOC Consumer Driver,
Microprocessor (MPU) driver, Mixed-Signal Driver, Embedded Memory
Driver
• SOC networking driver - moving towards “multicore architectures with
heterogeneous on-demand accelerator engines”, with “integration of on-board
switch fabric and L3 caches”
Hardware
SOC networking driver – MC/AE Architecture – from
http://public.itrs.net/Links/2011ITRS/2011Chapters/2011SysDrivers.pdf
Hardware
SOC networking driver – System performance and # of cores – from
http://public.itrs.net/Links/2011ITRS/2011Chapters/2011SysDrivers.pdf
Hardware
Assumptions - constant cost (die area), per-year increase of number of cores (1.4 x), core frequency
(1.05 x), accelerator engine frequency (1.05 x) - logic, memory, cache hierarchy, switching-fabric and
system interconnect will scale consistently with the number of cores
System performance – the “product of number of cores, core frequency, and accelerator engine
frequency”
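As a worked example of the assumptions above: if system performance is the product of the three quantities, the per-year growth factor is 1.4 x 1.05 x 1.05, about 1.54. The sketch below just compounds that factor; the function and constant names are made up for illustration, and the result is only as good as the quoted assumptions.

```python
# Per-year scaling factors quoted from the ITRS assumptions on the slide.
CORES_PER_YEAR = 1.4
CORE_FREQ_PER_YEAR = 1.05
ACCEL_FREQ_PER_YEAR = 1.05


def relative_performance(years: int) -> float:
    """System performance relative to year 0, taken as the product of
    number of cores, core frequency and accelerator engine frequency."""
    per_year = CORES_PER_YEAR * CORE_FREQ_PER_YEAR * ACCEL_FREQ_PER_YEAR
    return per_year ** years


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(years, round(relative_performance(years), 1))
```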
NFV – Network Function Virtualization
ETSI - http://portal.etsi.org/NFV/NFV_White_Paper.pdf
“leveraging standard IT virtualisation technology to consolidate many network
equipment types onto industry standard high volume servers, switches and
storage, which could be located in Datacentres, Network Nodes and in the end
user premises”
Virtualization using e.g. KVM or XEN
Virtualization
Establishing and performing efficient communication
Constraints from
• Real-time
• Hardware
with an increasing interest in virtualization
System-level IPC aspects
Is there any remaining work to do?
IPC and Linux
IPC in Linux (and UNIX)
[Timeline figure, 1964 to now, decades ’70 to ’10; reference markers: UNIX, SysV, POSIX rt, Linux 1.0, Enea founded, Emacs]
• pipe
• mmap – SVR4
• flock – 4.2BSD
• POSIX shmem – Linux 2.4
• POSIX named semaphore – Linux 2.6
• POSIX mq – Linux 2.6.6
• eventfd – Linux 2.6.22
• CMA – Linux 3.2
Overview, book, man pages, etc. by Michael Kerrisk - http://man7.org/
IPC on Linux
[Timeline figure, 2000 to now: OpenMPI, RPMsg, kdbus, nanomsg, Binder, LINX for Linux, Enea Element, AF_BUS, DBUS, TIPC, 0MQ]
sysv ipc shared mem optimizations, June 18, 2013
http://lwn.net/Articles/555469/
“With these patches applied, a custom shm microbenchmark stressing
shmctl doing IPC_STAT with 4 threads a million times, reduces the
execution time by 50%”
ALS: Linux interprocess communication and kdbus, May 30, 2013
http://lwn.net/Articles/551969/
“The work on kdbus is progressing well and Kroah-Hartman expressed
optimism that it would be merged before the end of the year. Beyond
just providing a faster D-Bus (which could be accomplished without
moving it into the kernel, he said), it is his hope that kdbus can
eventually replace Android's binder IPC mechanism.”
Work in progress
Speeding up D-Bus, February 29, 2012
http://lwn.net/Articles/484203/
“D-Bus currently relies on a daemon process to authenticate processes
and deliver messages that it receives over Unix sockets. Part of the
performance problem is caused by the user-space daemon, which
means that messages need two trips through the kernel on their way to
the destination”
Fast interprocess communication revisited, November 9, 2011
https://lwn.net/Articles/466304/
“Rather we start with the observation that this many attempts to solve
essentially the same problem suggests that something is lacking in
Linux. There is, in other words, a real need for fast IPC that Linux
doesn't address”
Work in progress
Fast interprocess messaging, September 15, 2010
http://lwn.net/Articles/405346/
“Rather than copy messages through a shared segment, they would
rather deliver messages directly into another process's address space.
To this end, Christopher Yeoh has posted a patch implementing what
he calls cross memory attach.”
Work in progress
Which IPC to use?
• Functionality
• Performance
• Cost
• Technology constraints
Choosing an IPC - Functionality
| Functionality | SysV shared memory | POSIX shared memory | FIFO | Stream socket | 0MQ | LINX |
| End-point addressing | SysV key | Shmem object name | File system node | AF_UNIX – file system node, AF_INET – IP address and port | Transport and address (Transport = TCP, ipc, inproc) | Endpoint name specifying path to peer |
| End-point repr. | Variable | File desc | File desc x 2 | Socket descriptor | 0MQ socket | LINX endpoint, spid |
| Channels | A memory area | A memory area | The FIFO (unidirectional) | The socket (bidirectional) | 0MQ socket internal (bidirectional) – e.g. TCP or UNIX domain socket | Buffer associated with LINX endpoint |
| Initialisation | shmget, shmat | shm_open, mmap | mkfifo, open | socket, bind, listen, accept, connect | Create 0MQ context and 0MQ socket | linx_open, linx_hunt |
| Closing | shmdt | munmap, shm_unlink | close, unlink | close | Close 0MQ socket | linx_close |
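To make the POSIX shared memory column concrete, here is a minimal sketch using Python's `multiprocessing.shared_memory`, which on POSIX systems is built on `shm_open` and `mmap`. It is not from the slides: the function name `shm_round_trip` is an assumption, and both the writer and the reader run in one process so that the missing synchronization does not matter.

```python
from multiprocessing import shared_memory


def shm_round_trip(payload: bytes) -> bytes:
    """Write payload into a named shared memory object and read it back
    through a second mapping that is attached by name.

    The payload must be non-empty, since a zero-sized object is rejected.
    """
    # Initialisation: create a named object and map it into this process.
    writer = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        # Sending: a plain write to memory, with no synchronization.
        writer.buf[:len(payload)] = payload

        # End-point addressing: a peer attaches using the object name.
        reader = shared_memory.SharedMemory(name=writer.name)
        try:
            # Receiving: a plain read from memory, copied out as bytes.
            data = bytes(reader.buf[:len(payload)])
        finally:
            reader.close()  # unmap the reader's view
        return data
    finally:
        writer.close()   # unmap the writer's view
        writer.unlink()  # remove the named object


if __name__ == "__main__":
    print(shm_round_trip(b"hello"))
```

With two real processes, some separate mechanism (a semaphore, an eventfd, a message on another channel) would be needed to tell the reader when the data is valid, which is the "no synchronization" caveat in the table that follows.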
Choosing an IPC - Functionality
| Functionality | SysV shared memory | POSIX shared memory | FIFO | Stream socket | 0MQ | LINX |
| Sending | Write to memory, no synchronization | Write to memory, no synchronization | write | write | Send message or number of bytes to 0MQ socket | Send LINX signal |
| Receiving | Read from memory, no synchronization | Read from memory, no synchronization | read | read | Receive message or number of bytes from 0MQ socket | Receive LINX signal |
| Blocking | No (unless implemented separately) | No (unless implemented separately) | Blocking and non-blocking R/W | Blocking and non-blocking R/W | Blocking and non-blocking R/W | Receive is blocking (non-blocking possible), Send is not |
| Monitoring | No (unless implemented separately) | No (unless implemented separately) | select, poll | select, poll | Monitoring callback can be registered with 0MQ context | LINX attach |
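The stream socket column (write, read, non-blocking mode, monitoring with select) can be illustrated with a connected pair of UNIX domain sockets. The sketch below is an assumption-laden example, not the presenter's code: `monitored_exchange` is a made-up name, both ends live in one process, and the message is assumed to be short and non-empty so that one `recv` returns all of it.

```python
import select
import socket


def monitored_exchange(message: bytes, timeout: float = 1.0):
    """Send message over a socket pair and use select() to monitor the
    receiving end. Returns (ready_before_send, received_bytes)."""
    sender, receiver = socket.socketpair()  # connected stream sockets
    try:
        receiver.setblocking(False)  # non-blocking reads on this end

        # Monitoring: nothing has been sent, so a zero-timeout select()
        # reports no readable sockets.
        idle, _, _ = select.select([receiver], [], [], 0)

        # Sending: an ordinary write on the stream.
        sender.sendall(message)

        # Monitoring again: wait until the receiver becomes readable.
        ready, _, _ = select.select([receiver], [], [], timeout)

        # Receiving: read only if select() reported data.
        data = receiver.recv(len(message)) if ready else b""
        return bool(idle), data
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(monitored_exchange(b"ping"))
```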
Choosing an IPC – Technology constraints
| Technology | 0MQ | kdbus | LINX |
| Sockets | Yes | No | Yes, own type |
| Daemons | No | No | Discovery daemon (optional) |
| Kernel modules | No | Yes | Yes |
| Pthread synchronization | Yes | No | Yes |
| Kernel synchronization | No | Yes | Yes |
| Programming languages | C and more | C | C |
| Development status | Latest stable release is 3.2.3, from May 2013 | Estimated to be ready in 2013 | Initial release 2006, current version is 2.6.5, released June 2013 |
| License | LGPLv3 | LGPL | BSD and GPLv2 |
• ipc-bench: A UNIX inter-process communication benchmark
• University of Cambridge -
http://www.cl.cam.ac.uk/research/srg/netos/ipc-bench/
Measures Latency, Throughput, IPI latency
• Public results dataset
“Since we have found IPC performance to be a complex, multi-variate
problem, and because we believe that having an open corpus of
performance data will be useful to guide the development of
hypervisors, kernels and programming frameworks, we provide a
database of aggregated ipc-bench datasets.”
Enea and ipc-bench – porting to 32-bit, porting to ARM, porting to
PowerPC, adding tests for CMA, LINX, ZeroMQ
Choosing an IPC - performance
Why is this interesting?
From The case for reconfigurable I/O channels, S. Smith et al,
RESoLVE12, 2012 - http://anil.recoil.org/papers/2012-resolve-fable.pdf
“We show dramatic differences in performance between
communication mechanisms depending on locality and machine
architecture, and observe that the interactions of communication
primitives are often complex and sometimes counter-intuitive”
“Furthermore, we show that virtualisation can cause unexpected effects
due to OS ignorance of the underlying, hypervisor-level hardware
setup”
Measuring IPC performance
Submitted measurements - http://www.cl.cam.ac.uk/research/srg/netos/ipc-bench/details/tmpn2YlFp.html
Measuring IPC performance
Pairwise IPC latency between cores
64 cores, AMD Opteron(TM) Processor 6272, 8 NUMA nodes, 125.9 GB
Linux 3.8.5-030805-generic, x86_64
Submitted measurements - http://www.cl.cam.ac.uk/research/srg/netos/ipc-bench/details/tmpn2YlFp.html
Measuring IPC performance
Pairwise IPC throughput between cores. (x-axis is packet size, y-axis is Gbps)
64 cores, AMD Opteron(TM) Processor 6272, 8 NUMA nodes, 125.9 GB
Linux 3.8.5-030805-generic, x86_64
Intel(R) Xeon(R) CPU - X3460 @ 2.80GHz, Cores 6 and 7
Measuring IPC performance
[Figure: chart of results for sizes 64, 4096 and 65536; series: mempipe_spin_thr, mempipe_thr, pipe_thr, tcp_thr, unix_thr, vmsplice_coop_pipe_thr, vmsplice_pipe_thr; value axis from 0 to 180000]
ARM Pandaboard @ 1 GHz, Cores 0 and 1
Measuring IPC performance
[Figure: chart of results for sizes 64, 4096 and 65536; series: mempipe_spin_thr, mempipe_thr, pipe_thr, tcp_thr, unix_thr, vmsplice_coop_pipe_thr, vmsplice_pipe_thr; value axis from 0 to 3000]
Intel(R) Xeon(R) CPU - X3460 @ 2.80GHz, Cores 6 and 7
Measuring IPC performance
[Figure: 0MQ vs UNIX sockets; tests zmq_inproc_thr, zmq_ipc_thr, zmq_tcp_thr and unix_thr, each for sizes 64, 4096 and 65536; value axis from 0 to 30000]
Brendan Gregg - Linux Performance Analysis and Tools - SCaLE 11x 2013
http://dtrace.org/blogs/brendan/2013/06/08/linux-performance-analysis-and-tools/
Profiling and Performance
[Figure: Linux software stack with layers Apps and libs; System call interface; VFS, File systems, Block device interface; Sockets, TCP/UDP, IP, Ethernet; Scheduler, VM; Device drivers]
- perf - https://perf.wiki.kernel.org/index.php/Main_Page
- DTrace - https://github.com/dtrace4linux
- SystemTap - http://sourceware.org/systemtap/
Collecting data with perf – IPC test with pipes
Profiling and Performance
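The slide's screenshot is not reproduced here, so as a stand-in, the following is a small pipe ping-pong test of the kind that could be profiled. It is a hypothetical workload, not the test used in the talk: the names `pipe_ping_pong`, `read_exact` and `write_all` are invented, and the script name in the usage note is an assumption.

```python
import os
import time


def read_exact(fd: int, count: int) -> bytes:
    """Read exactly count bytes from fd, looping over short reads."""
    chunks = []
    while count > 0:
        chunk = os.read(fd, count)
        if not chunk:
            raise EOFError("pipe closed early")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def write_all(fd: int, data: bytes) -> None:
    """Write all of data to fd, looping over short writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def pipe_ping_pong(iterations: int = 1000, size: int = 64) -> float:
    """Bounce a message between parent and child over two pipes and
    return the average round-trip time in seconds."""
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: echo every message back, then exit without cleanup.
        os.close(to_child_w)
        os.close(to_parent_r)
        try:
            for _ in range(iterations):
                write_all(to_parent_w, read_exact(to_child_r, size))
        finally:
            os._exit(0)
    # Parent: send, wait for the echo, and time the whole loop.
    os.close(to_child_r)
    os.close(to_parent_w)
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(iterations):
        write_all(to_child_w, payload)
        read_exact(to_parent_r, size)
    elapsed = time.perf_counter() - start
    os.close(to_child_w)
    os.close(to_parent_r)
    os.waitpid(pid, 0)
    return elapsed / iterations


if __name__ == "__main__":
    print("average round trip (s):", pipe_ping_pong())
```

Saved as, say, `pipe_test.py`, it could be recorded with `perf record -g python3 pipe_test.py` and inspected with `perf report`. A C implementation such as the ipc-bench tests would give cleaner profiles, since here much of the time is spent in the Python interpreter.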
Analyzing data recorded with perf
Profiling and Performance
Examining where time is spent
Profiling and Performance
A lot more to choose from*: strace, netstat, top, pidstat, mpstat, dstat,
vmstat, slabtop, free, tcpdump, ip, nicstat, iostat, iotop, blktrace, ps,
pmap, traceroute, ntop, ss, lsof, oprofile, gprof, kcachegrind, valgrind,
google profiler, nfsiostat, cifsiostat, latencytop, powertop, LTTng,
ktap, ...
Profiling and Performance
* http://www.brendangregg.com/Slides/SCaLE_Linux_Performance2013.pdf
IPC in Linux - Stable but not finished
IPC on Linux – diversified
Performance and profiling – ipc-bench (with adaptations and
extensions), a large selection of profiling tools
Summary
• A variety of IPC mechanisms exist
• There is no clear one-fits-all solution
• Performance aspects and functionality aspects (location
transparency, robustness) – different trade-offs for different
use-cases
• IPC and Linux – many stable mechanisms but still work-in-progress
(e.g. kdbus)
• Performance and profiling required
– ipc-bench (with adaptations and extensions)
– perf for performance profiling (one of several, however with a powerful
feature set)
Conclusions
• Systems requirements and design - parallelism, partitioning,
heterogeneity, functional requirements, performance requirements –
choosing an IPC mechanism
• Programming - frameworks and execution environments – legacy
and re-use – choosing a programming paradigm
• Verification - measurements and profiling - are we designing (and
implementing) the system as we planned? – choosing the right tools
Challenges
Enea as an IPC partner - Long-term experience, Competence for building future
IPC systems – development, integration, configuration, performance assessment
System-level IPC on multicore platforms
Multicore System-on-Chip solutions, offering parallelization and partitioning, are increasingly used in real-time systems. As the number of cores increases, often in combination with increased heterogeneity in the form of hardware-accelerated functionality, we see increased demands on effective communication, both inside a multicore node and at the inter-node system level.
The presentation will outline some of the challenges, as seen from Enea, to be expected when building future communication mechanisms, with requirements on performance and scalability, as well as transparency for applications. We will give examples from ongoing work in the Linux area, from Enea and from other open source contributors.
SICS Multicore day