SICS Multicore Day – 2013-09-23
System-level IPC on Multi-core Platforms
Ola Dahl, CTO Office, Enea
Copyright © 2013 Enea AB


Enea Confidential – Under NDA


Before we start

• Enea – founded 1968; from the OSE operating system to services, middleware, and Linux; products and services today; ~400 employees; 468 MSEK revenue
• Myself – background from ST-Ericsson, LTH, LiU

OSE operating system – kernel services, file system services, IP communication, program management, run-time loader, LINX

Number of communicating entities ~ tens of thousands (pid space extended from 16 to 20 bits); number of nodes ~ 100s

System-level IPC:
Message-passing between processes – intra-node and inter-node
Monitoring and event handling – fault tolerance

System-level IPC

[Diagram: applications A–D distributed over a SoC platform, a fixed multi-node system, an elastic multi-node system, and the cloud]

Element Messaging Framework – name server, message dispatch, communication patterns, HA functionality, Linux

#nodes ~ 100(s); #threads/node ~ 1000s

IPC

Communicating entities – Linux process, Linux thread, RTOS task, bare-metal executive, user-space thread, other executing entity (e.g. in an event-driven execution model)

[Diagram: two operating systems communicating via IPC]

IPC and Multicore

Multicore – multiple processing entities; parallelism on different levels – inside one SoC block, inside a SoC, between SoCs

Communication on different levels – interconnect, caches, memory, hardware buffers and hardware IPC support

[Diagram: cores C0–C4 and DSPs D0–D2 on two nodes, over bus, interconnect, cache, controllers, I/O]

IPC and Multicore

Real-time – core isolation – dedicated cores for real-time response

[Diagram: as above, with the cores partitioned into a real-time and a non-real-time domain]

Heterogeneous Hardware

TCI6638K2K – multicore DSP+ARM KeyStone II System-on-Chip – http://www.ti.com/product/tci6638k2k

Processing – 8 C66x DSP cores (up to 1.2 GHz), 4 ARM cores (up to 1.4 GHz), wireless communication (3GPP) coprocessors

Interconnect and control – Multicore Navigator, TeraNet, Multicore Shared Memory Controller, HyperLink

Heterogeneous Software

Core isolation for real-time response – a real-time domain and a non-real-time domain

Run-time categories in the real-time domain:
• Native threads
• User-space threads
• RTOS migration
• Other execution frameworks, e.g. Open Event Machine
• Enea LWRT

[Diagram: cores C0–C1 and DSPs D0–D2 partitioned into real-time and non-real-time domains]

System-level IPC and Multicore

Communicating entities – e.g. processes, threads, user-space threads, bare-metal executives

Levels of parallelism:
• Multicore processor in a SoC
• Multiple blocks in a SoC
• Multiple SoCs in a node
• Multiple nodes

Communication on different levels (e.g. intra-node and inter-node):
• On each level – establish contact, perform communication, monitor and act on events, close

Where are we heading?

Linux – Virtualisation – Hardware

Linux

Linux usage (EE Times report – http://seminar2.techonline.com/~additionalresources/embedded_mar1913/embedded_mar1913.pdf):
2012 – 46%
2013 – 50%

Linux

Status of embedded Linux – March 2013 – http://elinux.org/images/c/cf/Status-of-Embedded-Linux-2013-03-JJ44.pdf
• Average time between Linux releases (3.3 to 3.8) – 70 days
• Linux 3.4 – RPMsg for IPC between Linux and e.g. an RTOS
• Linux 3.7 – ARM multi-platform support, ARM 64-bit support
• Linux 3.7 – perf trace (an alternative to strace)

Status of Linux – September 2013
• Latest stable kernel – 3.11.1
• Example changes in 3.11 (released September 2, 2013) – ARM huge page support, KVM and Xen support for ARM64, SysV IPC message queue scalability improvements
• Example changes in 3.10 (released June 30, 2013) – timerless multitasking

Linux and real-time

Real-time frameworks, e.g. Xenomai – http://www.xenomai.org/
PREEMPT_RT – https://rt.wiki.kernel.org/index.php/Main_Page

Core isolation and tickless operation – striving for "Bare-Metal Multicore Performance in a General-Purpose Operating System" – http://www2.rdrop.com/~paulmck/scalability/paper/BareMetalMW.2013.02.25a.pdf

Timerless multitasking in 3.10 retains a 1 Hz tick, also on isolated cores
Linux 3.12-rc1 (2013-09-16) – an even more tickless kernel (the 1 Hz maintenance tick removed) – work remains, e.g. in memory management

Hardware

ITRS – http://public.itrs.net – a fifteen-year assessment of the semiconductor industry's future technology requirements
ITRS 2012 Update – http://public.itrs.net/Links/2012ITRS/Home2012.htm
• System drivers – SOC Networking Driver, SOC Consumer Driver, Microprocessor (MPU) Driver, Mixed-Signal Driver, Embedded Memory Driver
• SOC Networking Driver – moving towards "multicore architectures with heterogeneous on-demand accelerator engines", with "integration of on-board switch fabric and L3 caches"

Hardware

SOC networking driver – MC/AE architecture – from http://public.itrs.net/Links/2011ITRS/2011Chapters/2011SysDrivers.pdf

Hardware

SOC networking driver – system performance and number of cores – from http://public.itrs.net/Links/2011ITRS/2011Chapters/2011SysDrivers.pdf

Assumptions – constant cost (die area); per-year increases in number of cores (1.4x), core frequency (1.05x), and accelerator engine frequency (1.05x); logic, memory, cache hierarchy, switching fabric, and system interconnect scale consistently with the number of cores

System performance – the "product of number of cores, core frequency, and accelerator engine frequency"

Virtualization

NFV – Network Function Virtualization
ETSI – http://portal.etsi.org/NFV/NFV_White_Paper.pdf
"leveraging standard IT virtualisation technology to consolidate many network equipment types onto industry standard high volume servers, switches and storage, which could be located in Datacentres, Network Nodes and in the end user premises"

Virtualization using e.g. KVM or Xen

System-level IPC aspects

Establishing and performing efficient communication, with constraints from:
• Real-time
• Hardware
and with an increasing interest in virtualization

IPC and Linux

Is there any remaining work to do?

IPC in Linux (and UNIX)

[Timeline, 1964 to now: UNIX, pipe, 4.2BSD, SysV, flock, SVR4, POSIX, mmap, POSIX rt, Linux 1.0, POSIX shared memory (Linux 2.4), POSIX named semaphores (Linux 2.6), POSIX message queues (Linux 2.6.6), eventfd (Linux 2.6.22), CMA (Linux 3.2)]

Overview, book, man pages, etc. by Michael Kerrisk – http://man7.org/

IPC on Linux

[Timeline, 2000 to now: TIPC, DBUS, 0MQ, AF_BUS, Binder, OpenMPI, RPMsg, kdbus, nanomsg, LINX for Linux, Enea Element]

Work in progress

sysv ipc shared mem optimizations, June 18, 2013 – http://lwn.net/Articles/555469/
"With these patches applied, a custom shm microbenchmark stressing shmctl doing IPC_STAT with 4 threads a million times, reduces the execution time by 50%"

ALS: Linux interprocess communication and kdbus, May 30, 2013 – http://lwn.net/Articles/551969/
"The work on kdbus is progressing well and Kroah-Hartman expressed optimism that it would be merged before the end of the year. Beyond just providing a faster D-Bus (which could be accomplished without moving it into the kernel, he said), it is his hope that kdbus can eventually replace Android's binder IPC mechanism."

Work in progress

Speeding up D-Bus, February 29, 2012 – http://lwn.net/Articles/484203/
"D-Bus currently relies on a daemon process to authenticate processes and deliver messages that it receives over Unix sockets. Part of the performance problem is caused by the user-space daemon, which means that messages need two trips through the kernel on their way to the destination"

Fast interprocess communication revisited, November 9, 2011 – https://lwn.net/Articles/466304/
"Rather we start with the observation that this many attempts to solve essentially the same problem suggests that something is lacking in Linux. There is, in other words, a real need for fast IPC that Linux doesn't address"

Work in progress

Fast interprocess messaging, September 15, 2010 – http://lwn.net/Articles/405346/
"Rather than copy messages through a shared segment, they would rather deliver messages directly into another process's address space. To this end, Christopher Yeoh has posted a patch implementing what he calls cross memory attach."

Which IPC to use?

• Functionality
• Performance
• Cost
• Technology constraints

Choosing an IPC – Functionality

End-point addressing:
• SysV shared memory – SysV key
• POSIX shared memory – shmem object name
• FIFO – file system node
• Stream socket – AF_UNIX: file system node; AF_INET: IP address and port
• 0MQ – transport and address (transport = tcp, ipc, inproc)
• LINX – endpoint name specifying the path to the peer

End-point representation:
• SysV shared memory – variable
• POSIX shared memory – file descriptor
• FIFO – file descriptor x 2
• Stream socket – socket descriptor
• 0MQ – 0MQ socket
• LINX – LINX endpoint, spid

Channels:
• SysV shared memory – a memory area
• POSIX shared memory – a memory area
• FIFO – the FIFO (unidirectional)
• Stream socket – the socket (bidirectional)
• 0MQ – 0MQ socket internal (bidirectional) – e.g. a TCP or UNIX domain socket
• LINX – buffer associated with the LINX endpoint

Initialisation:
• SysV shared memory – shmget, shmat
• POSIX shared memory – shm_open, mmap
• FIFO – mkfifo, open
• Stream socket – socket, bind, listen, accept, connect
• 0MQ – create a 0MQ context and a 0MQ socket
• LINX – linx_open, linx_hunt

Closing:
• SysV shared memory – shmdt
• POSIX shared memory – munmap, shm_unlink
• FIFO – close, unlink
• Stream socket – close
• 0MQ – close the 0MQ socket
• LINX – linx_close

Choosing an IPC – Functionality (continued)

Sending:
• SysV shared memory – write to memory, no synchronization
• POSIX shared memory – write to memory, no synchronization
• FIFO – write
• Stream socket – write
• 0MQ – send a message or a number of bytes to the 0MQ socket
• LINX – send a LINX signal

Receiving:
• SysV shared memory – read from memory, no synchronization
• POSIX shared memory – read from memory, no synchronization
• FIFO – read
• Stream socket – read
• 0MQ – receive a message or a number of bytes from the 0MQ socket
• LINX – receive a LINX signal

Blocking:
• SysV shared memory – no (unless implemented separately)
• POSIX shared memory – no (unless implemented separately)
• FIFO – blocking and non-blocking R/W
• Stream socket – blocking and non-blocking R/W
• 0MQ – blocking and non-blocking R/W
• LINX – receive is blocking (non-blocking possible), send is not

Monitoring:
• SysV shared memory – no (unless implemented separately)
• POSIX shared memory – no (unless implemented separately)
• FIFO – select, poll
• Stream socket – select, poll
• 0MQ – a monitoring callback can be registered with the 0MQ context
• LINX – LINX attach

Choosing an IPC – Technology constraints

• 0MQ – sockets: yes; daemons: no; kernel modules: no; pthread synchronization: yes; kernel synchronization: no; programming languages: C and more; development status: latest stable release 3.2.3, May 2013; license: LGPLv3
• kdbus – sockets: no; daemons: no; kernel modules: yes; pthread synchronization: no; kernel synchronization: yes; programming language: C; development status: estimated to be ready in 2013; license: LGPL
• LINX – sockets: yes (own socket type); daemons: optional discovery daemon; kernel modules: yes; pthread synchronization: yes; kernel synchronization: yes; programming language: C; development status: initial release 2006, current version 2.6.5, June 2013; license: BSD and GPLv2

Choosing an IPC – Performance

• ipc-bench: a UNIX inter-process communication benchmark
• University of Cambridge – http://www.cl.cam.ac.uk/research/srg/netos/ipc-bench/
• Measures latency, throughput, and IPI latency
• Public results dataset:
"Since we have found IPC performance to be a complex, multi-variate problem, and because we believe that having an open corpus of performance data will be useful to guide the development of hypervisors, kernels and programming frameworks, we provide a database of aggregated ipc-bench datasets."

Enea and ipc-bench – porting to 32-bit, ARM, and PowerPC; adding tests for CMA, LINX, ZeroMQ

Measuring IPC performance

Why is this interesting? From "The case for reconfigurable I/O channels", S. Smith et al., RESoLVE'12, 2012 – http://anil.recoil.org/papers/2012-resolve-fable.pdf

"We show dramatic differences in performance between communication mechanisms depending on locality and machine architecture, and observe that the interactions of communication primitives are often complex and sometimes counter-intuitive"

"Furthermore, we show that virtualisation can cause unexpected effects due to OS ignorance of the underlying, hypervisor-level hardware setup"

Measuring IPC performance

Submitted measurements – http://www.cl.cam.ac.uk/research/srg/netos/ipc-bench/details/tmpn2YlFp.html

[Heatmap: pairwise IPC latency between cores]
[Chart grid: pairwise IPC throughput between cores; x-axis is packet size, y-axis is Gbps]

64 cores, AMD Opteron(TM) Processor 6272, 8 NUMA nodes, 125.9 GB, Linux 3.8.5-030805-generic, x86_64

Measuring IPC performance

[Bar chart: throughput for message sizes 64, 4096, and 65536 bytes on Intel(R) Xeon(R) CPU X3460 @ 2.80GHz, cores 6 and 7; mechanisms: mempipe_spin_thr, mempipe_thr, pipe_thr, tcp_thr, unix_thr, vmsplice_coop_pipe_thr, vmsplice_pipe_thr]

Measuring IPC performance

[Bar chart: throughput for message sizes 64, 4096, and 65536 bytes on an ARM Pandaboard @ 1 GHz, cores 0 and 1; mechanisms: mempipe_spin_thr, mempipe_thr, pipe_thr, tcp_thr, unix_thr, vmsplice_coop_pipe_thr, vmsplice_pipe_thr]

Measuring IPC performance

[Bar chart: 0MQ vs UNIX sockets – throughput for message sizes 64, 4096, and 65536 bytes on Intel(R) Xeon(R) CPU X3460 @ 2.80GHz, cores 6 and 7; mechanisms: zmq_inproc_thr, zmq_ipc_thr, zmq_tcp_thr, unix_thr]

Profiling and Performance

Brendan Gregg – Linux Performance Analysis and Tools – SCaLE 11x, 2013 – http://dtrace.org/blogs/brendan/2013/06/08/linux-performance-analysis-and-tools/

[Diagram: the Linux stack – apps and libs; system call interface; sockets, TCP/UDP, IP, Ethernet; VFS, file systems, block device interface; scheduler, VM; device drivers]

Tools:
- perf – https://perf.wiki.kernel.org/index.php/Main_Page
- DTrace – https://github.com/dtrace4linux
- SystemTap – http://sourceware.org/systemtap/

Profiling and Performance

[Screenshots: collecting data with perf for an IPC test with pipes; analyzing data recorded with perf; examining where time is spent]

Profiling and Performance

A lot more to choose from*: strace, netstat, top, pidstat, mpstat, dstat, vmstat, slabtop, free, tcpdump, ip, nicstat, iostat, iotop, blktrace, ps, pmap, traceroute, ntop, ss, lsof, oprofile, gprof, kcachegrind, valgrind, google profiler, nfsiostat, cifsiostat, latencytop, powertop, LTTng, ktap, ...

* http://www.brendangregg.com/Slides/SCaLE_Linux_Performance2013.pdf

Summary

IPC in Linux – stable but not finished
IPC on Linux – diversified
Performance and profiling – ipc-bench (with adaptations and extensions), and a large selection of profiling tools

Conclusions

• A variety of IPC mechanisms exist
• There is no clear one-size-fits-all solution
• Performance aspects and functionality aspects (location transparency, robustness) – different trade-offs for different use cases
• IPC and Linux – many stable mechanisms, but still work in progress (e.g. kdbus)
• Performance measurement and profiling are required:
  – ipc-bench (with adaptations and extensions)
  – perf for performance profiling (one of several tools, but with a powerful feature set)

Challenges

• System requirements and design – parallelism, partitioning, heterogeneity, functional requirements, performance requirements – choosing an IPC mechanism
• Programming – frameworks and execution environments, legacy and re-use – choosing a programming paradigm
• Verification – measurements and profiling: are we designing (and implementing) the system as planned? – choosing the right tools

Enea as an IPC partner – long-term experience and competence for building future IPC systems: development, integration, configuration, performance assessment

System-level IPC on multicore platforms

Multicore System-on-Chip solutions, offering parallelization and partitioning, are increasingly used in real-time systems. As the number of cores increases, often in combination with increased heterogeneity in the form of hardware-accelerated functionality, we see increased demands on efficient communication, inside a multicore node but also at an inter-node, system level.

The presentation outlines some of the challenges Enea expects when building future communication mechanisms, with requirements on performance and scalability as well as transparency for applications. We give examples from ongoing work in the Linux area, from Enea and from other open source contributors.

SICS Multicore day

Copyright © 2013 Enea AB