Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
&a Quantitative Comparison
Xu Wanghyper.sh; Kata Containers Architecture Committee
Fupan Lihyper.sh; Kata Containers Upstream Developer
“All problems in computer science can be solved by another level of indirection, except of course for the problem of too many indirections.”
----David Wheeler
Agenda• Secure Containers, or Linux ABI Oriented Virtualization
• Architecture Comparison
• Benchmarks
• The Future of the Secure Container Projects
What’s Kata Containers
• A container runtime, like runC
• Built w/ virtualization tech, like VM
• Initiated by hyper.sh and Intel®
• Hosted by OpenStack Foundation
• Contributed by Huawei, Google, MSFT, etc.
Kata Containers is Virtualized Container
What’s gVisor• A container runtime, like runC, too
• A user-space kernel, written in Go
• implements a substantial portion of the Linux
system surface
• Developed in Google and was open-sourced in
mid-2018
gVisor implements Linux by way of Linux.
The Similarity• OCI compatible container runtime
• Linux ABI for container applications
• Support OCI runtime spec (docker, k8s, zun…)
• Extra isolation layer
• Guest Kernel, or say, Linux* in Linux
• Do not expose system interface of the host to the container apps
• Goals
• Less overhead, better isolation
Container App
BLACK BOX
Host OS
Linux ABI
Linux ABI
OCI
Inte
rfac
e H
andl
ing
General Architecture ofKata Containers and gVisor
Architecture Comparison
Run Kata Containers• Employ existing VMM (or improved) and standard Linux kernel
• Per-container shims and per-sandbox proxy are stand-alone binaries, or built-in as library (such as in containerd shim v2 implementation)
• An agent in guest OS based on libcontainer
Kata Agent
VMM (e.g. Qemu)
shimKata runtime
Container App
proxyContainer App
virt-serial or vsock
No proxy when vscok enabled
Guest Linux Kernel
Host Kernel
runC-likecmdline
IO-stub process for container Apps shim
Run gVisor• All in one binary (runsc + Sentry +Gofer)• Sentry as a user-space kernel, and per-container process stub• Gofer for 9p IO
Sentry
runsc
ContainerAppSentry Container
AppGofer
runC-likecmdline
Gofer
Syscall redirectHost Kernel
9p
Detailed Architecture – Kata • Classic hardware-assistedvirtualization
• Provide virtual hardwareinterface to Guest
• 2 layers of isolation• kvm and CPU ring protection
• Many IO optimization ways• Pass through• Vhost and vhost-user
Guest Kernel
Drivers
agent Appproc
Appproc
Qem
ude
vices
kvm vhost
Host Kernel
vhost-user
passthru
Detailed Architecture – gVisor (1)• 2 layers of isolations
• Minimal kernel and written in Golang• Safer but the compatibility…
• Ring protection and small set of syscalls• Avoid access more buggy syscalls
• File operations are proceeded inseparated gofer process
• Syscall interception by ptrace or kvm• The performance impactions…
Syscallinterception
Host Kernel
seccomp/namespace
sentryApp proc Gofer
Detailed Architecture – gVisor (2)• In gVisor-kvm:
• Sentry is both host process and guestkernel
• Ring0 lower in guest• Upper in VM• In Host
• Kvm is for higher syscall interceptionperformance rather than isolation
• No insolation between different modesof sentry itself
kvm: Syscallinterception
Host Kernel
seccomp/namespace
Sentry inhost
App proc
GoferSentry as guest
kernel
Expectations based on the Arch• Isolation:
• Both include 2 isolation layer – better isolation than runC
• Compatibility:• Kata should have better compatibility over gVisor
• Performance:• Both should have little overhead on CPU/Memory• gVisor should have smaller memory footprint over kata, and may boot faster• gVisor may have some pain on syscall heavy (include IO heavy) workloads• Kata may have some IO optimization ways
Benchmarks and Analytics
Benchmarks & Setup• Functional Tests
• Syscall coverage
• Standard Benchmarks• Boot time• Memory footprint• CPU/Memory Performance• IO Performance
• Networking Performance
• Real-life cases• nginx, redis, tensorflow
• gVisor didn’t complete a mysql test, and a jenkins test (pull and build)failed due to the ‘git clone’ failure
Testing Setup• Testing Setup
• Most of the Cases: server on packet.net
CPU: 4 Physical Cores @ 2.0 GHz (1 × E3-1578L)
Memory: 32GB DDR4-2400 ECC RAM
Storage: 240 GB of SSD (1 × 240 GB)
Network: 10 Gbps Network (2 × INTEL X710 NIC'S IN TLB)
• Disk IO cases: a more powerful server with additional disks
CPU: 24 Physical Cores @ 2.2 GHz (2 × E5-2650 V4)
Memory: 256 GB of DDR4 ECC RAM (16 × 16 GB)
Storage: 2.8 TB of SSD (6 × 480GB SSD)
• Most of the test containers have 8GB memory if not further noted
• For networking tests, two servers play roles as client and server
Testing
client
container
host
disks
Syscall Coverage & Memory Footprint• gVisor will call about 70 syscalls
according to the code
• For kata, we collect statistics ofthe qemu processes in differentcases• apt-get: 53• redis: 35• nginx: 36
• With busybox (512MB memory)• We enabled template for Kata
01000020000300004000050000600007000080000
kata containers gvisor
Memory Footprint (KB)
single container average of 20
Boot time• Boot busybox image
• Kata (different configurations), gvisor, runc
10
0.5
1
1.5
2
2.5
Boot time (seconds)
cli_dis-factory
cli_en-factory
shimv2_dis-factory
shimv2_en-factory
gvisor
runc
CPU/Memory Performance• Sysbench CPU and memory test
• Test 5 rounds with sysbench and get the average result
• CPU overhead are very limited for both kata and gVisor• Random memory write: 2.5% vs 5.5% overhead comparing to host• Seq memory write: 8% vs 13% overhead comparing to host
10.00048 10.0004 10.00066
0
2
4
6
8
10
12
host kata gvisor
CPU Test (sec), Lower is better
5246.844868.7 4561.58
1917.63 1870.29 1815.2
0
1000
2000
3000
4000
5000
6000
host kata gvisor
Memory Write
avg seq write (MB/s) avg rand write (MB/s)
IO performance (1)• IO tests with FIO, gVisor didn’t complete the test
• Note: The “kata passthru fs” has multi-queue support (in the default kata guest kernel), which is not present inthe host driver
0
5000
10000
15000
20000
25000
30000
128k randread 128k randwrite 4k randread 4k randwrite
Buffer IO (MiB/s)
host fs runC kata passthru(virtio-blk) kata passthru(virtio-scsi) kata 9p
0
200
400
600
800
1000
1200
128k randread 128k randwrite 4k randread 4k randwrite
Direct IO (MiB/s)
host fs runC kata passthru fs
IO Performance (2)• IO test with dd command
• dd if=/dev/zero of=/test/fio-rand-read bs=4k count=25000000• dd if=/dev/zero of=/test/fio-rand-read bs=128k count=780000
• Note: pay attention to the cache of 9p• The DIO on a network FS like 9p only means no guest (client) page cache, but there may exist cache on the host (server)
side
0100200300400500600700800
128k 4k
dd test (MB/s)
host fs runC kata passthru fs kata 9p gvisor 9p
Networking Performance• The networking performance is tested on 10Gb Ethernet
• In default configurations
10123456789
10
Network Throughput (Gbps)
host docker container kata gvisor
Real-life case: Nginx• Apache Bench version 2.3
• ab -n 50000 -c 100 http://10.100.143.131:8080/
• General result
runC Kata* gVisorConcurrency Level 100 100 100Time taken for tests 3.455 3.439 161.338 secondsComplete requests 50000 50000 50000Failed requests 0 0 0Total transferred 42250000 42700000 42250000 bytesHTML transferred 30600000 30600000 30600000 bytesRequests per second 14473.73 14541.18 309.91 [#/sec] (mean)Time per request 6.909 6.877 322.677 [ms] (mean)Time per request 0.069 0.069 3.227 [ms] (mean, across all concurrent requests)Transfer rate 11943.65 12127.12 255.73 [Kbytes/sec] received
* The www-root of the kata test case is not located on 9pfs
Real-life case: Nginx (Cont)• Detailed Connection time(ms) and percentagerunc min mean[+/-sd] median maxConnect 0 0 0.3 0 7Processing 1 7 0.3 7 14Waiting 1 7 0.3 7 8Total 3 7 0.4 7 15
kata min mean[+/-sd] median maxConnect 0 0 0.1 0 7Processing 2 7 1 7 207Waiting 2 7 1 7 207Total 3 7 1 7 207
gvisor min mean[+/-sd] median maxConnect 0 0 0.1 0 2Processing 44 322 119.8 305 836Waiting 21 322 119.9 304 836Total 45 322 119.8 305 836
7 7 7 7 7 8 8 8
207305
355390
415
485
553
634674
836
7 7 7 7 7 7 7 7 150
100
200
300
400
500
600
700
800
900
50% 66% 75% 80% 90% 95% 98% 99%100
%
Percentage of the requests served within a certain time
(ms)
runC
kata
gvisor
* The www-root of the kata test case is not located on 9pfs
Real-life case: Redis
0
20000
40000
60000
80000
100000
120000
140000
160000
PING_IN
LINE
PING_BULK SE
TGET
INCR
LPUSH
RPUSHLP
OPRPOP
SADD
HSET
SPOP
LPUSH
(need
ed to ben
chmark
LRANGE)
LRANGE_1
00 (fi
rst 100
elements
)
LRANGE_3
00 (fi
rst 300
elements
)
LRANGE_5
00 (fi
rst 450
elements
)
LRANGE_6
00 (fi
rst 600
elements
)
MSE
T (10 k
eys)
Redis-benchmark (requests per second)
runC kata gvisor
Real-life case: Tensorflow (CPU)Note: this is a CPU test (No GPU)
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
runc kata gvisor-kvm gvisor-ptrace
CNN benchmarks (images/sec)TensorFlow 1.10Model resnet50Dataset imagenet (synthetic)Mode trainingSingleSess FalseBatch size 32 global
32 per deviceNum batches 100Num epochs 0Devices ['/cpu:0']Data format NHWCOptimizer sgd
Summary• Both kata and gVisor
• Small attack interface
• CPU performance identical to host and runC
• On Kata Containers• Some suggestions: Passthru fs, Factory and/or shimv2, etc.
• Good performance in most cases with proper configuration
• On gVisor• Fast boot performance and low memory footprint
• Still need improve the compatibility for general usage
• Furthermore tests• Fine tuned benchmarks, vhost-user/dpdk etc. and NUMA Specific tests, nested virtualization tests• Different kinds of networking tests• More real-life cases
The Future of the Projects
Kata Containers• Kata is on the way more efficient
• The shim v2 shown in the benchmarks is going to be mergedsoon
• The nemu will reduce the VMM overhead• Multiple network improvement is under developing or testing
• And more featureful• Better GPU and other accelerator support
• Contributions are welcome
gVisor and Similar Projects• gVisor shows real lightweight secure sandboxing tech• gVisor need more work on
• More efficient syscall interception and processing• Better compatibility for more applications
• Another project, linuxd, is based on linux kernel anddoing similar jobs
• Lai, Jiangshan, Containerize Linux Kernel, OpenSource Summit2018, Vancouver(https://schd.ws/hosted_files/ossna18/db/Containerize%20Linux%20Kernel.pdf)
Q&AContributions are Welcome