INSTRUCTIONS - Solaris Container Setup Overview


Solaris 10 Containers

Kimberly Chang
OS Ambassador
Solaris 10 Adoption, US Client Solutions
http://webhome.sfbay/kchangs
http://blogs.sun.com/kchangs

Server Virtualization: Solaris Containers and Solaris Dynamic System Domains

[Figure: a Sun server with two Dynamic System Domains (Domain 1, Domain 2) and five Containers (Container 1 to Container 5)]

Server Virtualization

• Consolidates multiple applications
• Provides security perimeter between applications and underlying system
• Makes more effective use of hardware
• Simplifies administration
• Adds flexibility to resource management
• Can be hardware- or software-based

Container Components

• Full Resource Containment - SRM (Solaris 9)
  > Provides predictable service levels
• Isolation - Zones (Solaris 10)
  > Prevent unauthorized access (security boundary)
  > Minimize fault propagation (fault boundary)
• Service Management Application
  > Ease of management – GUI Container Manager

Zones

• Provides virtualized OS environments, each looking like a Solaris instance
  > Implemented via a lightweight layer in the OS
  > Details of physical resources are hidden
  > Separate nodename, IP address, IP port space
  > Processes cannot see or affect processes in other containers
  > Each zone can be administered independently
  > No porting as the ABI/API is the same

Zones Block Diagram

[Figure: block diagram of one system with a global zone and two non-global zones. Labels recoverable from the diagram:
  - global zone (serviceprovider.com): zone management (zonecfg(1M), zoneadm(1M), zlogin(1), ...), one zoneadmd per zone, network devices ce0 and ge0, storage
  - red zone (red.com), zone root: /zone/redzone
  - sun zone (sun.com), zone root: /zone/sunzone
  - services shown: web services (Apache 1.3.22, J2SE), enterprise services (Oracle 8i, IAS 6), web services (Apache 2.0, Tomcat), login services (OpenSSH sshd 3.4), core services (ypbind, automountd), core services (ldap_cachemgr, inetd)
  - logical interfaces: ce0:1, ge0:1, ge0:2
  - file systems: /aux0/redspace, /red, /usr, /opt]

By default, "zone" refers to a non-global zone.

Granularity

• 8,000+ zones per OS instance
• Doesn't require dedicated CPUs, memory, physical devices, etc.
  > Just space for unique root filesystem
• Existing hardware resources can be:
  > multiplexed across zones, or
  > allocated per zone using resource pools

Security

• Security boundary around each zone
• Restricted subset of privileges
  > A compromised zone is unable to escalate its own privileges
• Important name spaces are isolated
• Processes running in a zone are unable to affect activity in other zones or the global zone

Zone Security Properties

• Services can be isolated from each other
  > Quarantining potentially risky software
  > Isolating multiple mutually distrusting parties
  > Containing potential damage by a breach
• Global Zone can:
  > observe all activities inside each zone
  > not be seen by software in each zone
  > change the contents or processes in each zone
• Non-global Zones run with fewer privileges

Zones are Less Privileged

"contract_event"       Request reliable delivery of events
"contract_observer"    Observe contract events for other users
"cpc_cpu"              Access to per-CPU perf counters
"dtrace_kernel"        DTrace kernel tracing
"dtrace_proc"          DTrace process-level tracing
"dtrace_user"          DTrace user-level tracing
"file_chown"           Change file's owner/group IDs
"file_chown_self"      Give away (chown) files
"file_dac_execute"     Override file's execute perms
"file_dac_read"        Override file's read perms
"file_dac_search"      Override dir's search perms
"file_dac_write"       Override (non-root) file's write perms
"file_link_any"        Create hard links to diff uid files
"file_owner"           Non-owner can do misc owner ops
"file_setid"           Set uid/gid (non-root) to diff id
"ipc_dac_read"         Override read on IPC, Shared Mem perms
"ipc_dac_write"        Override write on IPC, Shared Mem perms
"ipc_owner"            Override set perms/owner on IPC
"net_icmpaccess"       Send/Receive ICMP packets
"net_privaddr"         Bind to privileged port (<1024 + extras)
"net_rawaccess"        Raw access to IP
"proc_audit"           Generate audit records
"proc_chroot"          Change root (chroot)
"proc_clock_highres"   Allow use of hi-res timers
"proc_exec"            Allow use of execve()
"proc_fork"            Allow use of fork*() calls
"proc_info"            Examine /proc of other processes
"proc_lock_memory"     Lock pages in physical memory
"proc_owner"           See/modify other process states
"proc_priocntl"        Increase priority/sched class
"proc_session"         Signal/trace other session process
"proc_setid"           Set process UID
"proc_taskid"          Assign new task ID
"proc_zone"            Signal/trace processes in other zones
"sys_acct"             Manage accounting system (acct)
"sys_admin"            System admin tasks (e.g. domain name)
"sys_audit"            Control audit system
"sys_config"           Manage swap
"sys_devices"          Override device restricts (exclusive)
"sys_ipc_config"       Increase IPC queue
"sys_linkdir"          Link/unlink directories
"sys_mount"            Filesystem admin (mount, quota)
"sys_net_config"       Config net interfaces, routes, stack
"sys_nfs"              Bind NFS ports and use syscalls
"sys_res_config"       Admin processor sets, res pools
"sys_resource"         Modify res limits (rlimit)
"sys_suser_compat"     3rd party modules use of suser
"sys_time"             Change system time

Legend (the color coding of the original slide is lost in this text):
  Interesting   Some interesting privileges
  Basic         Non-root privileges
  Removed       Not available in Zones

Processes

• Certain system calls are not permitted or have restricted scope inside a zone
  > http://developers.sun.com/solaris/articles/application_in_zone.html
• All processes can be seen inside the global zone
  > But control of those processes is privileged
• Inside a zone, only processes in the same zone can be seen or affected
• proc(4) only shows processes in the same zone

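Zone Filesystem

[Figure: directory tree. In the global view, the global root (/) contains /zone, /usr, /dev, /export, ...; /zone holds the root directories of zones 1, 2 and 3. In the zone view, Zone 1 sees its own directory as the zone root (/), containing /bin, /usr, /dev, /proc, /etc, ...]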

File Systems & Devices

• Each zone is allocated its own root file system
  > No access to other zones' root file system
  > Private /dev directory mounted in zone
• Sparse-root vs. Whole-root
  > Sparse: subset of packages; sharing of execs, libs, data. /usr, /sbin, /lib, /platform are by default inherited in a read-only manner via lofs
  > Whole: copies are made (needs more storage)
• Raw devices can be given to a zone with caution

Network & Identity

• Each zone controls its identity
  > Node name, RPC domain name, time zone, locale
  > Each container can use a different naming service (DNS, LDAP, NIS, etc.)
  > Private IP addresses, ports
• Separate /etc/passwd files mean that unique root users can be assigned
• Only one TCP/IP stack per kernel
  > Zones shielded from stack specifics – routing, devices, etc.
  > Cannot view other zones' traffic

Zones and Resource Pools

[Figure: eight CPUs (cpu1 to cpu8) divided among Resource Pool A, Resource Pool B, and the Default Resource Pool. Non-Global Zone1, Non-Global Zone2, Non-Global Zone3, and the Global Zone are bound to these pools.]

Resource pool contents:
• Processor set (now)
• Scheduling class (now)
• Memory set (TBD)
• Swap set (TBD)

Solaris Container

• Solaris Containers = Zones + Resource Management
• Oracle licensing honors Containers (Zones + RM)
  > http://oracle.com/corporate/pricing/specialtopics.html
  > Running Oracle Database in Solaris 10 Containers Best Practices
    - Metalink# 317257.1

FSS Scheduling Class

• CPU allocation is based on “shares” assigned to projects or zones
  > Share defines a guaranteed floor, rather than a cap
  > Only impose a limit when there is a shortage of CPU
  > Default share value is 1 share
• FSS works within a processor set
• Avoid mixing scheduling classes within a pset
• FSS class can be used for workloads having different CPU utilization patterns
  > e.g. OLTP, DSS, Java

Solaris Container Resource Management – Fair Share Scheduler

Shares describe relative ratio...

[Figure: two pie charts of CPU allocation for App A (3 shares), App B (5 shares), App C (2 shares), and App D (5 shares).
  - All four applications active: App A 20%, App B 33%, App C 14%, App D 33%
  - Without App D: App A 30%, App B 50%, App C 20%]
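The percentages on this slide follow directly from the share counts. As a rough sketch (the function name fss_percent is made up for this illustration and is not a Solaris command), the ratio can be computed like this. Note that App C works out to 13.3%, which the slide shows as 14%.

```shell
#!/bin/sh
# fss_percent MINE SHARES...: print MINE as a percentage of the sum of
# SHARES (the shares of every workload currently competing for CPU).
# Hypothetical helper for illustration only.
fss_percent() {
    mine=$1
    shift
    total=0
    for s in "$@"; do
        total=$((total + s))
    done
    awk -v m="$mine" -v t="$total" 'BEGIN { printf "%.1f\n", 100 * m / t }'
}

# All four applications busy: A=3, B=5, C=2, D=5 shares (15 in total).
fss_percent 3 3 5 2 5   # App A
fss_percent 2 3 5 2 5   # App C

# App D idle: only A, B and C compete (10 shares in total).
fss_percent 3 3 5 2     # App A
fss_percent 5 3 5 2     # App B
```

Because only active workloads count toward the total, the same share value yields a larger percentage when a competitor goes idle.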

Fair Share Scheduler (FSS(7D))

• Assigns resources based on number of shares assigned / number of shares on the system
• Two-level model
  > Top Level: Global zone administrator assigns shares to zones
  > Second Level: Zone administrator assigns shares to projects
• A project's CPU allocation depends on project shares as well as zone shares
• Most likely to use one approach, not both at the same time

Two Level FSS

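[Figure: two-level FSS. Zones shown: twilight, drop, fracture, global. Shares allocated to zones: 4, 5, 4, 3, 6 (the Database project's zone holds 6). Shares allocated by the zone administrator to projects in that zone: 3, 1, 2, 1 (the Database project holds 2).]

Database project share = 2/(3+1+2+1) x 6/(4+5+4+3+6) = 2/7 x 6/22 = 6/77 ~ 7.8%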
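The two-level calculation from the figure can be checked with a few lines of shell. This is only a sketch of the arithmetic; the function names sum_args and two_level_percent are invented for the example.

```shell
#!/bin/sh
# sum_args N...: print the sum of its integer arguments.
sum_args() {
    total=0
    for n in "$@"; do
        total=$((total + n))
    done
    echo "$total"
}

# two_level_percent PROJ_SHARES PROJ_TOTAL ZONE_SHARES ZONE_TOTAL:
# project fraction within its zone times zone fraction within the system.
two_level_percent() {
    awk -v p="$1" -v pt="$2" -v z="$3" -v zt="$4" \
        'BEGIN { printf "%.1f\n", 100 * (p / pt) * (z / zt) }'
}

proj_total=$(sum_args 3 1 2 1)      # shares of all projects in the zone
zone_total=$(sum_args 4 5 4 3 6)    # shares of all zones on the system

# Database project: 2 project shares, in a zone holding 6 zone shares.
two_level_percent 2 "$proj_total" 6 "$zone_total"
```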

Enabling FSS Scheduler

• Set FSS to be the default scheduler class upon next reboot
  > # dispadmin -d FSS
  > 'dispadmin -d' creates /etc/dispadmin.conf
• Dynamically switch to FSS scheduler
  > sysetup init script
    # dispadmin -d FSS
    # /etc/init.d/sysetup start
  > 'priocntl' command
    # priocntl -s -c FSS -i all
    # priocntl -s -c FSS -i pid 1
• Verify
  > # ps -cafe
  > # ps -ef -o user,pid,class,comm

Examples: Single Application Containers

[Figure: one server hosting four single-application zones, drawn as an Application Environment layer on top of a Virtual Platform layer. Labels recoverable from the diagram:
  - global zone (v880-room2-rack5-1; 129.76.1.12): remote admin/monitoring (SNMP, SunMC, WBEM), platform administration (syseventd, devfsadm, ifconfig, metadb, ...), core services (inetd, rpcbind, sshd, ...), login services (SSH sshd, telnetd), zone management (zonecfg(1M), zoneadm(1M), zlogin(1), ...), one zoneadmd per zone
  - dns1 zone (dnsserver1), zone root: /zone/dns1: network services (named)
  - mail zone (mailserver), zone root: /zone/mail1: network services (sendmail, IMAP)
  - web1 zone (foo.org), zone root: /zone/web1: network services (Apache, Tomcat)
  - web2 zone (bar.net), zone root: /zone/web2: network services (IWS)
  - each non-global zone also runs core services (inetd) and login services (SSH sshd), and has a zone console (zcons) and /usr
  - network devices hme0, ce0, ce1 with logical interfaces hme0:1 to hme0:3, ce0:1 to ce0:3, ce1:1; storage complex
  - pool1 (4 CPU; 6GB), FSS, with shares 10, 30, 60; pool2 (4 CPU; 10GB)]

Examples: Multiple Application Containers

[Figure: one server hosting zones that each run several projects, drawn as an Application Environment layer on top of a Virtual Platform layer. Labels recoverable from the diagram:
  - global zone (v1280-room3-rack12-2; 129.76.4.24): remote admin/monitoring (SNMP, SunMC, WBEM), platform administration (syseventd, devfsadm, ifconfig, metadb, ...), core services (inetd, rpcbind, sshd, ...), zone management (zonecfg(1M), zoneadm(1M), zlogin(1), ...), one zoneadmd per zone
  - oracle1 zone (oracle_ops), zone root: /zone/oracle1
  - oracle2 zone (ora_ta), zone root: /zone/oracle2
  - mail zone (mailserver), zone root: /zone/mail1: network services (sendmail, IMAP), core services (inetd), login services (SSH sshd)
  - projects shown: ora_ops project (oracle), ora_ta project (oracle), dba users project (sh, bash, prstat), backup project (sqlplus), web service project (Apache 1.3.22), app service project (IAS, J2SE), system project (inetd, sshd)
  - network devices hme0, ce0, ce1 with logical interfaces hme0:1, hme0:2, ce0:1 to ce0:3, ce1:1; storage complex
  - pool1 (8 CPU), FSS; pool2 (4 CPU)
  - share values shown: 70, 10; 60, 0, 10, 15, 10, 5; 70, 20, 10]

Sun™ MC – Solaris Container Manager

• Manage systems that run the Solaris 8, 9, and 10 OS
• Manage Solaris Containers across many systems
• Uses Sun Management Center 3.5 Update 1b

Container Management
• View all the Containers in your environment
• Recreate a Container on another system
• Automatic discovery of new objects
• Create/Delete/Modify Projects

Manage Solaris Zones
• Support for IPQoS for Solaris Zones
• Create/Delete/Modify Pools, Zones
• Create new Zones through a single wizard

Zone Commands

• Zone Configuration – zonecfg
  > Define what a zone looks like
• Console Access – zlogin -C
• Zone Administration – zoneadm
  > Install, Boot, Restart, Stop, List, Verify, Uninstall

Zone Administration

• zoneadm(1M) is used by the global zone administrator to
  > install a new root file system for a configured zone
  > list zones and optionally their state
  > verify whether the configuration of an installed zone is semantically complete and ready to be booted
  > boot or ready an installed zone
  > halt or reboot a running zone
  > uninstall the root file system of an installed zone

Zone Console

• Zone pseudo-console available for each zone
  > Mimics a hardware console
  > Accessible via zlogin -C
  > Available prior to zone boot

  global# zlogin -C zone1
  [Connected to zone 'zone1' console]
  zone1# ~.
  [Connection to zone 'zone1' console closed]

• Publishes zone state change messages
  [Notice: zone halted]

Demo

Creating a zone

global# zonecfg -z zone1
zone1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:zone1> create

Settings for the zone

zonecfg:zone1> set zonepath=/zoneroots/zone1
zonecfg:zone1> set autoboot=true
zonecfg:zone1> add net
zonecfg:zone1:net> set address=192.9.200.100/24
zonecfg:zone1:net> set physical=e1000g
zonecfg:zone1:net> end
zonecfg:zone1> add inherit-pkg-dir
zonecfg:zone1:inherit-pkg-dir> set dir=/opt
zonecfg:zone1:inherit-pkg-dir> end
zonecfg:zone1> verify
zonecfg:zone1> commit
zonecfg:zone1> ^D

global# zoneadm list -vc
global# ls -l /etc/zones/zone1.xml
global# zonecfg -z zone1 info
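The same configuration can also be applied without the interactive prompt by putting the subcommands in a command file and passing it to zonecfg with -f. The following is a sketch only: the file location /tmp/zone1.cfg is an arbitrary choice for this example, the values are copied from the session above, and the zonecfg call is guarded so the script just writes the file on a machine that has no zonecfg.

```shell
#!/bin/sh
# Write the zonecfg subcommands from the interactive session to a file.
# CFG_FILE is a hypothetical location chosen for this sketch.
CFG_FILE=${CFG_FILE:-/tmp/zone1.cfg}

cat > "$CFG_FILE" <<'EOF'
create
set zonepath=/zoneroots/zone1
set autoboot=true
add net
set address=192.9.200.100/24
set physical=e1000g
end
add inherit-pkg-dir
set dir=/opt
end
verify
commit
EOF

# Apply the file only where zonecfg exists (a Solaris global zone).
if command -v zonecfg >/dev/null 2>&1; then
    zonecfg -z zone1 -f "$CFG_FILE"
else
    echo "zonecfg not found; wrote $CFG_FILE only"
fi
```

A command file is convenient when several similar zones have to be created, since only the zone name, zonepath, and address change between them.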

Installing the zone

global# zoneadm -z zone1 install
Preparing to install zone <zone1>.
Creating list of files to copy from the global zone.
Copying <2394> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <1048> packages on the zone.
Initialized <1048> packages on zone.
Zone <zone1> is initialized.
Installation of <1> packages was skipped.
Installation of these packages generated warnings: <SFWmuttS>
The file </zoneroots/zone1/root/var/sadm/system/logs/install_log> contains a log of the zone installation.

- It took about 9 minutes on my laptop

global# zoneadm list -cv
  ID NAME     STATUS     PATH
   0 global   running    /
   1 zone1    installed  /zoneroots/zone1

Boot the zone

global# zoneadm -z zone1 boot

- It took about 4 seconds for the 1st boot on my laptop.

global# zoneadm list -cv
  ID NAME     STATUS     PATH
   0 global   running    /
   1 zone1    running    /zoneroots/zone1

global# zlogin -C zone1
[Connected to zone 'zone1' console]
<Run through sysid tools as usual to do initial customization>

Example: Interactive Initial Boot

• sysidtool(1M) runs by default

[NOTICE: zone booting up]

SunOS Release 5.10 Version s10_52 32-bit
Copyright 1983-2004 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hostname: twilight
The system is coming up. Please wait.

Select a Language

  0. English
  1. French
  2. Japanese
  3. Simplified Chinese
  4. Traditional Chinese

Please make a choice (0 - 4), or press h or ? for help:

Example: Hands-off Initial Boot

• Using a sysidcfg(4) file as an alternative

# cat > ~/zone-cfg/zone1.sysidcfg
system_locale=en_US
timezone=US/Pacific
timeserver=localhost
terminal=xterms
security_policy=NONE
network_interface=PRIMARY {hostname=zone1 \
 protocol_ipv6=no}
name_service=NONE
system_locale=C
root_password=MNYm8SfoJlvIY
^D

# cp ~/zone-cfg/zone1.sysidcfg \
  /zoneroots/zone1/root/etc/sysidcfg

Example: Process Monitoring

• prstat -Z
• prstat -z <zonename>
• ps -aef -z <zonename>
• ps -aef -Z
• df -hZ
• ptree -z <zonename>
  > ptree prints the process trees with child processes indented from their respective parent processes.

Zones and File Systems

• 3 different ways of provisioning file systems:
  > LOFS – Mount directory from global in a non-global zone
  > UFS – Mount real UFS directly into non-global zone
  > Raw – Attach raw devices to non-global zone
• zonecfg requires a separate “add fs” or “add device” stanza for each device or mount point added.

Example: Zones + LOFS

global# zonecfg -z zone1
zonecfg:zone1> add fs
zonecfg:zone1:fs> set dir=/opt/local
zonecfg:zone1:fs> set special=/local
zonecfg:zone1:fs> set type=lofs
zonecfg:zone1:fs> add options [rw, nodevices]
zonecfg:zone1:fs> end
zonecfg:zone1> verify
zonecfg:zone1> commit
zonecfg:zone1> ^D

➔ This will mount the /local directory from the global zone at the mount point /opt/local in the zone

➔ Useful to share data between zones, using the global zone as a go-between

Example: Zones + UFS

global# zonecfg -z dbzone
zonecfg:dbzone> add fs
zonecfg:dbzone:fs> set dir=/opt/local
zonecfg:dbzone:fs> set special=/dev/dsk/c0d0s7
zonecfg:dbzone:fs> set raw=/dev/rdsk/c0d0s7
zonecfg:dbzone:fs> set type=ufs
zonecfg:dbzone:fs> end
zonecfg:dbzone> verify
zonecfg:dbzone> commit
zonecfg:dbzone> ^D

> Mounts the UFS disk slice /dev/dsk/c0d0s7 as /opt/local in the non-global zone.

> No exposed mount point for this file system in the global zone.

Example: Zones + Raw Devices

global# zonecfg -z zone1
zonecfg:zone1> add device
zonecfg:zone1:device> set match=/dev/rdsk/c0d0s6
zonecfg:zone1:device> end
zonecfg:zone1> add device
zonecfg:zone1:device> set match=/dev/dsk/c0d0s6
zonecfg:zone1:device> end
zonecfg:zone1> verify
zonecfg:zone1> commit
zonecfg:zone1> ^D

> Adds a raw device directly into the non-global zone
> Creates device node for the new device
> Match can include wildcards and is evaluated each time the zone boots

zone1# newfs /dev/rdsk/c0d0s6
zone1# mount /dev/dsk/c0d0s6 /opt/local

Example: Zones + FSS

# zonecfg -z zone1
zonecfg:zone1> set pool=newpool
zonecfg:zone1> add rctl
zonecfg:zone1:rctl> set name=zone.cpu-shares
zonecfg:zone1:rctl> add value (priv=privileged,limit=10,action=none)
zonecfg:zone1:rctl> end
zonecfg:zone1> verify
zonecfg:zone1> commit
zonecfg:zone1> ^D

Note: the default pool will be used if “set pool” is not specified

Changing the shares of a running zone:

# prctl -n zone.cpu-shares -r -v 25 -i zone zonename

Resource Pools Management: poolcfg(1M) and pooladm(1M)

• Enabling pools
  > # pooladm -e
• Disabling pools
  > # pooladm -d
• Creating the /etc/pooladm.conf XML file
  > # pooladm -s
• View current config info
  > # poolcfg -c info

Pools Configuration

• Create a set with min and max number of CPUs in a pool
  > # poolcfg -c 'create pset dbset (uint pset.min=1; uint pset.max=2)'
• Create a pool
  > # poolcfg -c 'create pool dbpool'
• Associate set to the pool
  > # poolcfg -c 'associate pool dbpool (pset dbset)'
• View current config info
  > # poolcfg -c info
  > # poolstat -r all
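Taken together, the pool steps above might be scripted as follows. This is a hedged sketch, not the presenter's procedure: the run wrapper, the DRY_RUN switch, and the setup_dbpool function are inventions of this example, and it prints the commands by default so it can be read through safely on any machine. The final pooladm -c is not on the slide; it is added on the understanding that poolcfg -c edits the saved configuration in /etc/pooladm.conf, which pooladm -c then activates on the running system.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of executing it.
# Set DRY_RUN=0 on a Solaris 10 global zone to really run the commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_dbpool() {
    run pooladm -e                  # enable the pools facility
    run pooladm -s                  # save current state to /etc/pooladm.conf
    run poolcfg -c 'create pset dbset (uint pset.min=1; uint pset.max=2)'
    run poolcfg -c 'create pool dbpool'
    run poolcfg -c 'associate pool dbpool (pset dbset)'
    run pooladm -c                  # activate the edited configuration
}

setup_dbpool
```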

Pools Example

tm163-118# poolcfg -c info

system tm163-118
        string  system.comment
        int     system.version 1
        boolean system.bind-default true
        int     system.poold.pid 20514

        pool dbpool
                int     pool.sys_id 3
                boolean pool.active true
                boolean pool.default false
                int     pool.importance 1
                string  pool.comment
                pset    dbset

        pool pool_default
                int     pool.sys_id 0
                boolean pool.active true
                boolean pool.default true
                int     pool.importance 1
                string  pool.comment
                pset    pset_default

Pools Example (Cont.)

        pset dbset
                int     pset.sys_id 1
                boolean pset.default false
                uint    pset.min 1
                uint    pset.max 1
                string  pset.units population
                uint    pset.load 0
                uint    pset.size 1
                string  pset.comment

                cpu
                        int     cpu.sys_id 0
                        string  cpu.comment
                        string  cpu.status on-line

        pset pset_default
                int     pset.sys_id -1
                boolean pset.default true
                uint    pset.min 1
                uint    pset.max 1
                string  pset.units population
                uint    pset.load 0
                uint    pset.size 1
                string  pset.comment

                cpu
                        int     cpu.sys_id 1
                        string  cpu.comment
                        string  cpu.status on-line

Pools and Zone

• Bind a zone to a pool
  > # poolbind -p dbpool -i zoneid dbzone
• Which pool is the zone bound to?
  > dbzone# poolbind -q $$
    25177 dbpool

System Parameter Changes in S10

• Many removed and obsoleted parameters
  > http://docs.sun.com/app/docs/doc/817-0404/6mg74vs90?a=view
• Removed System V IPC parameters

  Message Queues          Semaphores               Shared Memory
  msgsys:msginfo_msgmap   semsys:seminfo_semmaem   shmsys:shminfo_shmmin
  msgsys:msginfo_msgmax   semsys:seminfo_semmap    shmsys:shminfo_shmseg
  msgsys:msginfo_msgseg   semsys:seminfo_semmns
  msgsys:msginfo_msgssz   semsys:seminfo_semmnu
                          semsys:seminfo_semvmx
                          semsys:seminfo_semume
                          semsys:seminfo_semusz

• Obsoleted parameters replaced with controllable resource parameters (bigger default values)
  > http://docs.sun.com/app/docs/doc/817-1592/6mhahuoim?a=view

Oracle Related Parameters

• System V IPC parameters and the corresponding Solaris resource controls

  Parameter                        Oracle          Required  Resource Control        Default Value
                                   Recommendation  in S10
  SEMMNI (semsys:seminfo_semmni)   100             Yes       project.max-sem-ids     128
  SEMMNS (semsys:seminfo_semmns)   1024            No        N/A                     N/A
  SEMMSL (semsys:seminfo_semmsl)   256             Yes       process.max-sem-nsems   512
  SHMMAX (shmsys:shminfo_shmmax)                   Yes       project.max-shm-memory  ¼ of physical RAM
  SHMMIN (shmsys:shminfo_shmmin)   1               No
  SHMMNI (shmsys:shminfo_shmmni)   100             Yes       project.max-shm-ids     128
  SHMSEG (shmsys:shminfo_shmseg)   10              No        N/A                     N/A

Resource Control Commands

• System V IPC parameters no longer need to be set in /etc/system
• Set on a per-process or per-project basis
• prctl(1)
  > # prctl -n process.max-file-descriptor <pid>
  > # prctl -n project.cpu-shares -v 10 -r -i project db_project
  > # prctl -n project.max-shm-memory -v 10g -r -i project user.oracle
  > # prctl -n project.max-shm-memory -i project user.oracle
  > # prctl -i project user.oracle
• rctladm(1M)
  > # rctladm -l
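Values set with prctl apply to the running system only and do not survive a reboot. One way to make a limit persistent is to record it in the project database with projmod(1M). The sketch below assumes that the user.oracle project already exists; the rctl_attr helper is a made-up name for this example, and the 10G limit simply mirrors the prctl command above.

```shell
#!/bin/sh
# rctl_attr NAME LIMIT: build a project(4) attribute for a privileged
# resource control that denies requests above LIMIT.
# Hypothetical helper for illustration only.
rctl_attr() {
    printf '%s=(privileged,%s,deny)\n' "$1" "$2"
}

attr=$(rctl_attr project.max-shm-memory 10G)
echo "$attr"

# Record the attribute in /etc/project only where projmod exists (Solaris).
# Assumes the user.oracle project has already been created.
if command -v projmod >/dev/null 2>&1; then
    projmod -s -K "$attr" user.oracle
fi
```

New processes started in the project pick up the stored value; already running processes keep whatever was set with prctl.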

Zones FAQ/Blogs/Info

• http://www.opensolaris.org/os/community/zones/faq
• http://blogs.sun.com/<whomever>
  > David Comay (comay)
  > Dan Price (dp)
  > John Beck (jbeck)
  > Andy Tucker (tucker)
• http://www.sun.com/bigadmin/content/zones
• http://www.sun.com/blueprints/

SOLARIS 10 CONTAINERS

Kimberly Chang
kimberly.chang@sun.com
http://webhome.sfbay/kchangs
http://blogs.sun.com/kchangs