Linux containers/Docker (and how it works) Dmitry Fedorov

Won't talk about

(Lots of talks about it already)

This talk will not include:

Marketing stories

Docker is awesome... blah, blah, blah

Will talk about

Namespaces

Capabilities

Cgroups

Docker internals (libcontainer)

Namespaces

He's back. And this time he's got a chainsaw.

Yes, folks. We got per-process namespaces. Working. With proper behaviour on exit(), yodda, yodda. Enjoy.

Mount (Mount points)

UTS (Hostname and NIS domain name)

IPC (System V IPC, POSIX message queues)

PID (Process IDs)

Network (Network devices, stacks, ports, etc.)

User (User and group IDs)

Namespaces API

/proc/[pid]/ns

ipc -> ipc:[4026531839]
mnt -> mnt:[4026531840]
net -> net:[4026531956]
pid -> pid:[4026531836]
user -> user:[4026531837]
uts -> uts:[4026531838]

Syscalls:

clone(2)

setns(2)

unshare(2)
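
The /proc/[pid]/ns links listed above can also be read programmatically. The following is a minimal sketch in Python, assuming a Linux host; the helper names parse_ns_link and list_namespaces are illustrative and not part of any library. Two processes are in the same namespace of a given type when the inode numbers match.

```python
import os
import re

# Matches link targets such as "net:[4026531956]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a /proc/[pid]/ns link target into (namespace type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def list_namespaces(pid="self"):
    """Return {link name: inode} for the namespaces of a process (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        _ns_type, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        result[name] = inode
    return result
```

Calling list_namespaces() inside and outside a container and comparing the inode numbers shows which namespaces differ. Reading the links of another user's process (for example PID 1) usually requires root.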

Mount namespaces (CLONE_NEWNS)

Mount namespaces

Mount points

/proc/[pid]/mounts

/proc/[pid]/mountstats

Mount namespaces

On hostnode:

cat /proc/1/mounts | wc -l
32

Inside container:

docker run -it --rm centos:centos7 cat /proc/1/mounts | wc -l
16
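
To look at more than the line count, the contents of /proc/[pid]/mounts can be parsed. This is a rough sketch that assumes the usual fstab-like layout (device, mount point, filesystem type, options, then two numeric fields); Mount and parse_mounts are names made up for this example.

```python
from collections import namedtuple

# One entry of /proc/[pid]/mounts; the two trailing numeric fields are ignored.
Mount = namedtuple("Mount", "device mount_point fs_type options")


def parse_mounts(text):
    """Parse the text of /proc/[pid]/mounts into a list of Mount tuples."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        mounts.append(Mount(device, mount_point, fs_type, options.split(",")))
    return mounts
```

On a Linux host, parse_mounts(open("/proc/self/mounts").read()) returns the mount points visible in the caller's mount namespace.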

UTS namespaces (CLONE_NEWUTS)

UTS namespaces

hostname, domainname

On hostnode:

uname -n
dfedorov

Inside container:

docker run -it --rm centos:centos7 sh -c 'uname -n'
b543e1bb6eef

IPC namespaces (CLONE_NEWIPC)

IPC namespaces

System V IPC objects, POSIX message queues

/proc/sys/fs/mqueue

/proc/sys/kernel

/proc/sysvipc

On hostnode:

dfedorov@dfedorov:~$ ipcs | wc -l
45

Inside container:

dfedorov@dfedorov:~$ docker run -it --rm centos:centos7 ipcs | wc -l
10

PID namespaces (CLONE_NEWPID)

PID namespaces

process ID number space

Nesting namespace:

PID namespaces can be nested: each PID namespace has a parent, except for the initial ("root") PID namespace.

On hostnode:

ps aux | wc -l
298

Inside container:

docker run -it --rm centos:centos7 ps axu | wc -l
2

Network namespaces (CLONE_NEWNET)

Network namespaces

network devices, IPv4 and IPv6 protocol stacks, IP routing tables, firewalls

/proc/net

/sys/class/net

/sys/class/net on hostnode:

docker0 eth0 lo lxcbr0 veth1 veth50cf98d veth6b9c9cc

Inside container:

docker run -it --rm centos:centos7 ls /sys/class/net

eth0 lo

Network namespaces

Create netns manually:

ip netns add minimal # Create namespace
ip link add eth1 type veth peer name veth1 # Create virtual ethernet device
ip link set eth1 netns minimal # Attach device to namespace
ip a add 10.0.0.1/24 dev veth1
ip l set veth1 up

User namespaces (CLONE_NEWUSER)

User namespaces

user credentials (user IDs and group IDs), capabilities

Still strict user mapping. Sad ...

UID 1000 inside container -> 1000 on hostnode
UID 0 inside container -> 0 on hostnode
etc

And they don't really work ...

sudo ls -l /proc/1/ns/user
lrwxrwxrwx 1 root root 0 Nov 28 14:17 /proc/1/ns/user -> user:[4026531837]

docker run -it --rm centos:centos7 ls -l /proc/1/ns/user
lrwxrwxrwx 1 root root 0 Nov 28 11:18 /proc/1/ns/user -> user:[4026531837]
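
The UID mapping of a process is exposed in /proc/[pid]/uid_map, one range per line: first ID inside the namespace, first ID outside, and the length of the range. Below is a small sketch of how such a map translates IDs; parse_id_map and to_host_id are illustrative names, and the remapped range in the usage note is a hypothetical value, not something shown in these slides.

```python
def parse_id_map(text):
    """Parse /proc/[pid]/uid_map (or gid_map) text into (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        inside, outside, length = (int(field) for field in fields)
        entries.append((inside, outside, length))
    return entries


def to_host_id(entries, container_id):
    """Translate an ID seen inside the namespace to the ID on the host, or None if unmapped."""
    for inside, outside, length in entries:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None
```

With the identity map of the initial user namespace ("0 0 4294967295"), to_host_id returns 0 for 0 and 1000 for 1000, which is the strict mapping described above. With a hypothetical remapped range such as "0 100000 65536", UID 0 inside would become 100000 on the hostnode.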

User namespaces - Capabilities

per-thread attribute

Used caplist:

CHOWN
DAC_OVERRIDE
FOWNER
MKNOD
NET_RAW
SETGID
SETUID
SETFCAP
SETPCAP
NET_BIND_SERVICE
SYS_CHROOT
KILL

troublesome: mount (cap_sys_admin)
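
The capability sets of a process appear in /proc/[pid]/status as hexadecimal bitmasks (for example the CapEff field). The sketch below converts between capability names and such a mask, using the bit numbers from linux/capability.h for bits 0 to 31; encode_caps, decode_caps and SLIDE_CAPS are names invented for this example, and bits above 31 are reported only by number.

```python
# Capability names indexed by bit number (bits 0..31 of linux/capability.h).
CAP_NAMES = [
    "CHOWN", "DAC_OVERRIDE", "DAC_READ_SEARCH", "FOWNER", "FSETID", "KILL",
    "SETGID", "SETUID", "SETPCAP", "LINUX_IMMUTABLE", "NET_BIND_SERVICE",
    "NET_BROADCAST", "NET_ADMIN", "NET_RAW", "IPC_LOCK", "IPC_OWNER",
    "SYS_MODULE", "SYS_RAWIO", "SYS_CHROOT", "SYS_PTRACE", "SYS_PACCT",
    "SYS_ADMIN", "SYS_BOOT", "SYS_NICE", "SYS_RESOURCE", "SYS_TIME",
    "SYS_TTY_CONFIG", "MKNOD", "LEASE", "AUDIT_WRITE", "AUDIT_CONTROL",
    "SETFCAP",
]

# The capability list from the slide above.
SLIDE_CAPS = [
    "CHOWN", "DAC_OVERRIDE", "FOWNER", "MKNOD", "NET_RAW", "SETGID",
    "SETUID", "SETFCAP", "SETPCAP", "NET_BIND_SERVICE", "SYS_CHROOT", "KILL",
]


def encode_caps(names):
    """Build a capability bitmask from a list of names."""
    mask = 0
    for name in names:
        mask |= 1 << CAP_NAMES.index(name)
    return mask


def decode_caps(mask):
    """Return the capability names set in a bitmask, lowest bit first."""
    names = []
    bit = 0
    while mask >> bit:
        if mask & (1 << bit):
            # Bits beyond the table are reported by number only.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"CAP_{bit}")
        bit += 1
    return names
```

A value read from a real process can be decoded with decode_caps(int(value, 16)). SYS_ADMIN (bit 21) is absent from the list on the slide, which is why mount does not work inside the container.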

Cgroups

memory

cpu/cpuset/cpuacct

blkio

device

Purpose:

limits

accounting

affinity

permissions
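
Which cgroups a process belongs to can be read from /proc/[pid]/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. A minimal parsing sketch follows; parse_cgroup_file is an illustrative name, and the container ID in the usage note is a placeholder.

```python
def parse_cgroup_file(text):
    """Map each controller name to the cgroup path from /proc/[pid]/cgroup text."""
    membership = {}
    for line in text.splitlines():
        # Split on the first two colons only, so the path is kept whole.
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _hierarchy_id, controllers, path = parts
        for controller in controllers.split(","):
            membership[controller] = path
    return membership
```

For a line such as "4:memory:/docker/abc123" the result maps "memory" to "/docker/abc123". On a cgroup v1 host the corresponding limits and counters then live under the mounted hierarchy, typically below /sys/fs/cgroup/memory for that path.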

Efficiency

isolated, but still on hostnode

cpu: native

memory: almost native, a few % shaved off for accounting

network: small overhead

disk: native on volumes, overhead on layered fs

Still not a cake

What do we need on top of all of it?

unionfs (aufs, vfs)

snapshotting fs (btrfs, zfs)

CoW (thin provisioning, lvm)
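
The idea shared by these options is that containers share read-only image layers and each gets its own thin writable layer. The toy model below illustrates only that lookup and copy-on-write behaviour with Python dictionaries; it is not how aufs, btrfs or LVM are implemented, it ignores file deletion, and all names and file contents in it are made up.

```python
from collections import ChainMap

# Read-only image layers, shared by every container (hypothetical contents).
base_image = {"/etc/os-release": "centos7", "/bin/sh": "shell binary"}
app_layer = {"/app/run.sh": "start script"}


def new_container(*image_layers):
    """Stack a fresh, empty writable layer on top of shared read-only layers."""
    # ChainMap searches its maps left to right, so the topmost layer comes first.
    return ChainMap({}, *reversed(image_layers))


c1 = new_container(base_image, app_layer)
c2 = new_container(base_image, app_layer)

# Writes land only in the container's own top layer; lower layers stay untouched.
c1["/etc/os-release"] = "modified"
```

After the write, c1 sees the modified file while c2 and base_image still see the original, and c1.maps[0] holds only the one changed entry. That is the deduplication effect: unchanged files are stored once, no matter how many containers use them.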

Docker

container control operations

version control system

system administration

Docker Approach

Application level isolation vs OS level isolation.

One task per container.

Deduplication.

Commoditize.

Typical workflow

developer:

-- write some code
-- unit test
-- commit

docker build:

-- environment test (serverspec, rspec etc)
-- functional test
-- push to registry

devops:

-- pull images and run

Old-new challenges

Monitoring.

Logging.

Backups.

Configuration management.

No ssh, and you don't need it.

No ssh, but you have exec.

Thank you. Questions?