Immutable Application Containers
Reproducibility of CAE-computations through Immutable Application Containers
HPC Advisory Council - Swiss Workshop 2015
Agenda
2
• Docker Introduction
• How I got here
QNIBng
QNIBTerminal
QNIBMonitoring
• Immutable Application Containers
3
About Me
• >10 years iterating through SysAdmin, SysOps, SysEngineer and R&D Engineer roles
DevOps @Locafox (hyper-scale web service)
• Founder of QNIB Solutions: holistic system management
Containerization of SysOps and workloads
Consultancy / software design & development
Docker in a Nutshell
• A multitude of different goods, not well suited to travel together
• Different environments to handle them; transaction costs fairly high
5
Logistics Pre-1960
• Standard container: no mix-up
Easy transactions, no matter the environment
6
Container Logistics
7
Container Matrix
8
Multiple Guests
[Diagram: Traditional Virtualisation vs. Containerisation — with a hypervisor (Type I or Type II) each service runs on its own guest kernel and OS userland above the host kernel; with containerisation all services share the host kernel and differ only in their userlands (native, #1, #2). Both stacks attach to IB.]
9
Interface View
[Diagram: Traditional Virtualisation vs. Containerisation — virtualisation (hosted/native) interposes at emulated devices / hyper-calls (<<100 functions); containerisation interposes at the syscall interface (>100 functions) and library abstraction: a lightweight abstraction without IO overhead or startup latency.]
But when it comes to cloud operations, "we see the VM as the only truly safe isolation. … Until we see foolproof security for containers, we will always double-bag our customers' workloads."
– McLuckie (Senior product lead for Google Compute Engine, Feb 2015)
10
• Isolation Hypervisor vs. Kernel Namespaces
• Resource Allocation Hypervisor vs. CGroups
• Security Hypervisor vs. SELinux/AppArmor
11
VM vs. Container
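The kernel-side isolation primitives above can be inspected directly on a Linux host — every process carries a set of namespace handles under /proc. A minimal sketch (assumes a Linux kernel with namespace support; no Docker required):

```shell
# Each entry below is one kernel namespace (pid, net, mnt, …).
# A container gets fresh inodes here; processes sharing a
# namespace show the same inode number.
ls /proc/self/ns
# Inspect one handle, e.g. the PID namespace of this shell:
readlink /proc/self/ns/pid
```

Comparing these inode numbers between a host shell and a `docker exec` shell is a quick way to see which namespaces Docker actually unshared.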
• Half day, July 16th @ISC High Performance: deep dive into the talking points
How Docker might impact System Operations & HPC Applications
Further discussion beyond today's talk
12
Docker Workshop
• Full Day, September 28th @ISC Cloud&BigData
13
Docker Workshop #2
QNIBng
• B.Sc. report in 2011
15
IBPM
• Qualified Networking w/ InfiniBand (next generation): log/performance monitoring targeted at the IB layer
16
QNIBng
Day 2
QNIBTerminal
18
QNIBTerminal
[Diagram: QNIBTerminal stack — an nginx http-proxy in front; consul for service discovery / health checks; an ELK chain (logstash, ls-indexer, elasticsearch, kibana, es-kopf) for logs/events; influxdb + grafana for performance; slurmctld plus slurmd containers compute0…compute<N> for compute.]
• Holistic approach to target complete HPC Stack
QNIBMonitoring
19
QNIBMonitoring
• Containerized Monitoring Stack
Logstash Stack (ELK)
Performance Monitoring (Graphite universe)
Combine log and performance information
Service Discovery / Health Checks
23
QNIBTerminal
[Screenshot: correlated SLURM events and Docker events]
• Big head-room to connect layers: trace errors/events that span multiple layers
Holistic view on the complete stack
• High potential within System Operations: configure once… run anywhere…
• Independent environments for Training
Integration Tests
Proof-of-Concepts to explore new stack components
24
Conclusion
MPI-Workloads w/ Docker
• 8 nodes (CentOS 7, 2x 4core XEON, 32GB, Mellanox ConnectX-2)
• 3 containers on top
CentOS6/7, Ubuntu12
• SLURM Resource Scheduler
1 native / 3 container partition
• Multiple Open MPI versions installed
1.5, 1.6, 1.8
26
HPC Testcluster ‘Venus’
• osu-micro-benchmarks-4.4.1
• osu_alltoall with two tasks on two hosts
27
MPI µBenchmark
$ mpirun -np 2 -H venus001,venus002 $(pwd)/osu_alltoall
# OSU MPI All-to-All Personalized Exchange Latency Test v4.4.1
# Size          Avg Latency(us)
1                          1.83
2                          1.82
4                          1.74
8                          1.63
16                         1.62
32                         1.68
64                         1.80
128                        2.77
256                        3.11
512                        3.51
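Output like the above is easy to post-process for plotting; a small sketch that extracts the size/latency pairs and reports the minimum (the sample data is copied from the slide):

```shell
# osu_alltoall size/latency pairs, taken from the benchmark run above.
osu_output='1 1.83
2 1.82
4 1.74
8 1.63
16 1.62
32 1.68
64 1.80
128 2.77
256 3.11
512 3.51'
# Track the smallest latency seen in column 2 and print it.
echo "$osu_output" | awk 'NR==1 || $2<min {min=$2} END {printf "min latency: %.2f us\n", min}'
# -> min latency: 1.62 us
```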
28
MPI µBenchmark [result]
[Chart: latency (µs, 0–5) over message size (KB, 4–1024) for native, cos7, cos6, u12]
29
MPI µBenchmark [result #2]
[Chart: latency (µs, 0–2.8) for the distribution-default Open MPI and for 1.5.4/1.6.4/1.8.3, across native (oMPI 1.6.4, gcc 4.8.2), cos7 (oMPI 1.6.4, gcc 4.8.2), cos6 (oMPI 1.5.4, gcc 4.4.7), u12 (oMPI 1.5.4, gcc 4.6.3)]
• Mimics thermodynamic application workloads
• A corrective to LINPACK
30
HPCG Benchmark
31
HPCG - Distributions Results
[Chart: GFLOP/s (3–6) for native, cos7 (CentOS 7.0, oMPI 1.6.4, gcc 4.8.2), cos6 (CentOS 6.5, oMPI 1.5.4, gcc 4.4.7), u12 (Ubuntu 12.04, oMPI 1.5.4, gcc 4.6.3)]
32
HPCG - Overall Results
[Chart: GFLOP/s (3–6) for the distribution-default Open MPI and for 1.5.4/1.6.4/1.8.4, across native (oMPI 1.6.4, gcc 4.8.2), cos7 (oMPI 1.6.4, gcc 4.8.2), cos6 (oMPI 1.5.4, gcc 4.4.7), u12 (oMPI 1.5.4, gcc 4.6.3)]
• High potential within end-user applications: build once… run anywhere…
• An abstracted userland enables fine-tuning, regardless of the bare-metal system
• But: will the results be equivalent?
33
Conclusion
Immutable App. Containers
• Bundled-up applications sound good, but how consistent are the results?
• The container communicates with the host's kernel; results should be fairly stable, since the syscall interface is quite mature
35
Motivation
• Laptop (VirtualBox) MacBook Pro, Intel Core i7 @3GHz, 16GB RAM, 512GB SSD
• Workstation AMD Phenom II X4 955 (3.2GHz) , 4GB RAM, 64GB SSD
• HPC Cluster 8x SUN FIRE X2250, each 2x Intel Xeon X5472, 32GB RAM, QDR InfiniBand
• AWS EC2 Public Cloud (XEN) Compute instance c3.xlarge, Intel Xeon E5-2680, 7.5 GB RAM, SSD Storage
Compute instance c4.4xlarge, Intel Xeon E5-2666v3, 30 GB RAM, SSD Storage
36
Test Bed
• OpenFOAM tutorial ‘cavity’
• 3 solid walls, 1 moving ‘lid’
37
Use-case
• Changes done to increase the computational effort: mesh size increased from 20x20x1 to 1000x1000x1000
Stretched vertex dimensions by a factor of 10
Iterates 50 times with a step width of 0.1 ms
• Decomposition into 16 cells: decomposePar && mpirun -np 16 icoFoam -parallel > log 2>&1
• Each iteration outputs the average cell pressure; summing the pressures yields a measure of consistency
38
Test Case setting
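The consistency measure described above — summing the per-iteration average cell pressure — can be sketched as follows. The log format is illustrative (a hypothetical `avg pressure: <value>` line per iteration), not the literal icoFoam output:

```shell
# Sample solver log with one average-pressure line per iteration
# (format is hypothetical, for illustration only).
log='iter 1 avg pressure: 0.25
iter 2 avg pressure: 0.50
iter 3 avg pressure: 0.75'
# Sum the last field of every pressure line; the total is the
# consistency fingerprint compared across hosts and containers.
echo "$log" | awk '/avg pressure:/ {sum+=$NF} END {printf "%.7f\n", sum}'
# -> 1.5000000
```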
• Immutable Application Containers u1204of222: Ubuntu 12.04 & OpenFOAM 2.2.2
u1204of230: Ubuntu 12.04 & OpenFOAM 2.3.0
u1410of231: Ubuntu 14.10 & OpenFOAM 2.3.1
39
OpenFOAM Containers
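An image like `u1204of222` can be described declaratively; a hedged sketch of such a Dockerfile (the package name and source are illustrative — the actual build recipe is not part of these slides):

```dockerfile
# Immutable application container: Ubuntu 12.04 + OpenFOAM 2.2.2
FROM ubuntu:12.04
# Hypothetical package install; the real images may pull from the
# openfoam.org repository or build from source instead.
RUN apt-get update && apt-get install -y openfoam222
# Every run starts from this exact userland, pinning compiler,
# MPI and solver versions regardless of the host system.
```

The point of the immutability is that the image, once built, carries the whole userland — rebuilding it elsewhere or running it on a different kernel should not change the solver's behaviour.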
• MacBook boot2docker (1.4.0), 3.16.7
CoreOS 618, 3.19
• Workstation CentOS 6.6, 2.6.32
Ubuntu 12.04, 3.13.0 & 3.17.7
Ubuntu 14.10, 3.13.0 & 3.18.1
40
Host Systems
• HPC Cluster ‘venus’ CentOS7.0alpha, 3.10.0
• AWS EC2 Ubuntu14.04, 3.13.0
CoreOS 494, 3.17.2
CoreOS 618, 3.19
• Pressure remains the same among minor releases OpenFOAM 2.2.2: 8.6402816 p
OpenFOAM 2.3.0/2.3.1: 8.6402463 p
• Runtime varies a lot
41
Results
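The pressure sums quoted above differ only between OpenFOAM minor releases; their relative deviation can be checked quickly (values taken from the slide):

```shell
# Relative deviation between the OpenFOAM 2.2.2 and 2.3.x
# summed-pressure results reported above.
awk 'BEGIN {
  p222 = 8.6402816   # OpenFOAM 2.2.2
  p23x = 8.6402463   # OpenFOAM 2.3.0 / 2.3.1
  printf "relative deviation: %.2e\n", (p222 - p23x) / p222
}'
# -> relative deviation: 4.09e-06
```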
[Chart: runtimes across MacBook, Workstation, c3.xlarge, c4.4xlarge, HPC.shm, and 1x/2x/4x/8x HPC.IB]
• A paper describing the study can be found at http://doc.qnib.org/HPCAC2015.pdf
42
Results [cont]
• Cavity iterates towards a deterministic result; use a Kármán vortex street instead (chaotic, sensitive to random events)
Is the behaviour of the userland within an Immutable Container stable?
• Diversify use-cases: end-user applications, e.g. the EAGER pipeline (Uni Tübingen)
If you have ideas / use-cases -> talk to me…
• What about GPGPUs, MICs? Docker's abstraction should only cover the same architecture
43
Future Work
• Results remain stable across multiple hardware versions
Operating systems
Kernel versions
Network technologies (SHM, InfiniBand)
• Keeping images and input data could be enough
• Vendors are able to fine-tune and certify
Distribution vendors
Software vendors
44
Conclusion
45
Thank you for your attention…
• Picture Credits
p5/6: http://de.slideshare.net/dotCloud/why-docker2bisv4
p7: http://blog.docker.com
p25: http://www.umbc.edu/hpcreu/2014/projects/team4.html
https://software.sandia.gov/hpcg/
p32: https://www.youtube.com/watch?v=0KLajv6kS6Q
p38: https://www.youtube.com/watch?v=hZm7lc4sC2o
p40: https://www.flickr.com/photos/dharmabum1964/3108162671