Virtualization with Solaris - front page · PDF file · 2013-08-26Virtualization...


Virtualization with Solaris
Based on Solaris 10 10/08

Bart Muijzer
Systems Solutions Architect
Operating Systems Ambassador
SUN Microsystems Nederland BV

A bit about me

• Did “Hogere Informatica Opleiding” in Enschede
• UNIX addict since 1985
• 1991 – 1998: UNIX Sysadmin, Utrecht University
• 1998 – 1999: UNIX Specialist, AZU
• 1999 – today: SUN Microsystems
  > Techie functions
  > OS Ambassador since 2002
• Married, 3 kids (that I teach Solaris)
• http://bartmu.hyves.nl

Agenda
Overall

• Virtualization with SUN
• Solaris, OpenSolaris... what's up?
• BREAK
• Solaris
  > selected features for developers
  > ... with some hands-on!
• How to engage
• Sorry... no talk on Java ;-)

Agenda – Part 1
Virtualization with SUN

• Overview
• SUN offerings

Server Virtualization
Situation Today

[Diagram: six dedicated machines: Web Server, Email Server, DNS Server, App Server, DB Server, Dir Server]

• One application per server
• Increasing operational and management costs
• Average utilization rate is 5%-20%

Server Virtualization
It's not easy but it helps...

“What the entire art of [server] virtualization comes down to is moving the OS to a place where it should not be, and running around like your head is on fire trying to fix all the problems that come up. There are a lot of problems, and they happen quite often, so the performance loss is nothing specific, but more of a death by 1000 cuts.”
The Inquirer, February 2005

Server Virtualization Approaches

[Diagram: four approaches side by side, each shown as a Server / OS / App stack. Hard Partitions and Virtual Machines run multiple OSes; OS Virtualization and Resource Mgmt. run a single OS. Arrows are labelled “Trend to flexibility” and “Trend to isolation”.]

• Hard Partitions (example workloads: Identity Server, App Server, Database)
  > Very High RAS
  > Very Scalable
  > Mature Technology
  > Ability to run different OS versions
• Virtual Machines (example workloads: Mail Server, Web Server, File Server)
  > Ability to live migrate an OS
  > Ability to run different OS versions and types
  > De-couples OS and HW versions
• OS Virtualization (example workloads: Calendar Server, Database, Web Server)
  > Very scalable and low overhead
  > Single OS to manage
  > Cleanly divides system and application administration
  > Fine grained resource management
• Resource Mgmt. (example workloads: SunRay Server, App Server, Database)
  > Very scalable and low overhead
  > Single OS to manage
  > Fine grained resource management

Hard Partitions

[Diagram: Server / OS / Application stack hosting Identity Server, App Server, Database]

• Isolation all the way into the hardware
• Only as granular as the hardware allows
• Only on certain hardware

Virtual Machines

[Diagram: Server / OS / Application stack hosting Mail Server, Web Server, File Server]

• Allows different OS versions and types
• Extra overhead for the Hypervisor
• Available on many platforms

Virtualization Types

• Full Virtualization
  > Thick Hypervisor
  > OS does not know it runs in a VM
  > Enables running legacy OS-es
• Paravirtualization
  > Thin Hypervisor
  > OS needs to know it runs on top of a VM
  > Can't run legacy OS-es
• Both can use hardware assistance
  > Intel-VT, AMD-V, SPARC CMT

Logical Domains
Virtual Machines for SPARC

[Diagram: Server / OS / Application stack hosting Mail Server, Web Server, File Server]

• Stable interface (sun4v)
• Firmware (upgrade)
• CMT processor (T1000/T2000)

OS Virtualization (1)
Solaris zones

[Diagram: one Server and one OS hosting several Containers: Calendar Server, Database, Web Server]

• Resource and namespace isolation
• Very scalable
• Available on all platforms

Server Virtualization
Solaris Zones

• Single Solaris instance
  > Appearance of many OS instances
  > Minimal performance impact

[Diagram: Zones running on top of an OS, Memory and eight CPUs]

Zone Properties

• Can have own IP stack
• Tightly linked with Solaris' Resource Management capabilities
  > Same controls for global and local zone
• Upgrade tools know about local zones
• Can be branded:
  > Linux
  > Solaris 8
• Attach, Detach, Clone, Migrate
• Configurable privileges

Branded Zones

• Available for
  > SPARC: Solaris 8 and Solaris 9 (userland only)
  > x86: Linux
• Linux:
  > RedHat Enterprise Linux 3, and CentOS
  > 32-bit only
  > Only for Solaris 10 x86
  > NOT running a Linux kernel
  > Needs Linux CD (and hence valid RTU)

Example: S8C - Upgrade in Phases
Using Containers to help migration to Solaris 10

Phase I: Deploy H/W, Deploy Solaris 8 Container

[Diagram: the existing Solaris 8 server (db27.foo.com, NIS Name Svc, Root PW: db27, local tools & scripts, Database Application) next to new hardware (OPL, T2000/T5120/T5220) running the Solaris 10 global zone with ZFS, DTrace and FMA. The new system hosts a Solaris 10 Container and, through BrandZ, a Solaris 8 Migration Container that keeps the same identity: db27.foo.com, NIS Name Svc, Root PW: db27, local tools & scripts, Database Application.]

Example: S8MA - Upgrade in Phases

Phase II: Application Redeploy

[Diagram: hardware (OPL, T2000/T5120/T5220) running the Solaris 10 global zone with ZFS, DTrace and FMA. The Database Application is redeployed from the Solaris 8 Migration Container (BrandZ) into the Solaris 10 Container; both keep the identity db27.foo.com, NIS Name Svc, Root PW: db27, local tools & scripts.]

OS Virtualization (2)
Resource Management

[Diagram: one Server and one OS running SunRay Server, App Server, Database]

• Resource controls only
• Very scalable
• Available on all platforms

OS Virtualization (3)
Solaris Containers

[Diagram: one Server and one OS running Calendar Server, Database, Web Server]

• Resource and namespace isolation with Resource Controls
• Very scalable

OS Virtualization (4)
BrandZ, SCLA, S8C and S9C

• BrandZ is an extension to Zones technology
  > Enables Solaris Containers to assume different OS personalities a.k.a. “Brands”
• Solaris Containers for Linux Applications build on BrandZ to provide Linux-branded Containers
  > Ideal for Linux consolidation and development as well as migration to Solaris
• Solaris 8 Containers, Solaris 9 Containers build on BrandZ to provide {S8, S9}-branded Containers
  > Migrate S8 or S9 servers onto S10

Hosted Virtualization
What it is

• Virtualization runs on top of some Operating System
• Examples:
  > Sun xVM VirtualBox (www.virtualbox.org)
  > VMWare Workstation
  > Microsoft Virtual Server
  > User-mode Linux (UML)
    > ./linux
  > Virtuozzo
• No doubt, there is more out there...

Server Virtualization
Solutions from Sun

[Diagram: the four approaches from multiple OSes to a single OS, with arrows labelled “Trend to flexibility” and “Trend to isolation”, and the products that implement each]

• Hard Partitions (example workloads: Identity Server, App Server, Database)
  > Dynamic System Domains
• Virtual Machines (example workloads: Mail Server, Web Server, File Server)
  > Logical Domains
  > Xen
  > VMware
  > Microsoft Virtual Server
• OS Virtualization (example workloads: Calendar Server, Database, Web Server)
  > Solaris Containers (Zones + SRM)
  > Solaris Containers for Linux Applications
  > Solaris Trusted Extensions
• Resource Mgmt. (example workloads: SunRay Server, App Server, Database)
  > Solaris Resource Manager (SRM)

Hybrid Solutions

• Hard Partitions & OS Virtualization: Dynamic System Domains with Solaris Containers
  > Combine high RAS and proven robustness with flexible application environments
  > Both can scale all the way up to 144 way systems
  > Incur no extra overhead for Virtualization
• Virtual Machines & OS Virtualization: LDoms/Xen/VMware/MSVS with Solaris Containers
  > Combine flexibility of OS version and type with secure application environments
  > Live migration allows for off-loading a system in production for repair or DR

[Diagram: both combinations hosting Database, Mail Server, Web Server and File Server workloads]

What's Up ??

Why the OS Matters

[Diagram: stack of Applications, Infrastructure Services, Operating System, Hardware and Support, annotated “What You Care About”, “What You Depend On” and “What Makes the Difference”]

• What You Worry About
  > Data Overload
  > Intrusions
  > Costs
  > Management Overload
  > Level of Service

Solaris and Open Source

• Innovate through Sharing
• Goal: allow external collaborations during the development of Solaris
• Model:
  > Release source code every ~2 weeks
  > Create a “way into SUN” for external contributions
  > Apply the Solaris Quality Process
• Solaris is Open Source, therefore:
  > Common Development and Distribution License (CDDL)
  > Compilers and other tools are free

Solaris and Open Development

[Diagram: release timeline]

• S9 and S10: FCS followed by updates u1 to u8
• Nevada (Open Sourced parts of Solaris)
  > SXCE (binary distro of Nevada)
  > Today: b103
  > Leads to Solaris.Next
• Nevada + IPS + Installer = Indiana
  > 2008.05, 2008.11, 2009.04
• Further Dev: SchilliX, Belenix, MartUX mBE, Nexenta OS, MilaX

OpenSolaris

• Sun expects
  > Help with device drivers
  > Help with security fixes
  > Larger footprint
  > More ISV support
  > More self-help discussions
  > More customers not choosing Linux (or Microsoft)
  > A better sense for the future direction Solaris should take
  > Credibility – we've delivered what we've promised
• Sun does NOT expect
  > The community to do our work
  > Customers to run their business on the code base; they should run on the product

BREAK

Agenda – Part 2
Solaris (aimed at Software Developers)

• Selected Solaris Features
  > ZFS
  > DTrace
  > Predictive Self Healing / FMA
  > Zones
  > Resource Management
  > Containers
• ... with some hands-on!!!

ZFS: no more of...
Traditional Filesystem Administration

# format
... (long interactive session omitted)

# metadb -a -f disk1:slice0 disk2:slice0

# metainit d10 1 1 disk1:slice1
d10: Concat/Stripe is setup
# metainit d11 1 1 disk2:slice1
d11: Concat/Stripe is setup
# metainit d20 -m d10
d20: Mirror is setup
# metattach d20 d11
d20: submirror d11 is attached

# metainit d12 1 1 disk1:slice2
d12: Concat/Stripe is setup
# metainit d13 1 1 disk2:slice2
d13: Concat/Stripe is setup
# metainit d21 -m d12
d21: Mirror is setup
# metattach d21 d13
d21: submirror d13 is attached

# metainit d14 1 1 disk1:slice3
d14: Concat/Stripe is setup
# metainit d15 1 1 disk2:slice3
d15: Concat/Stripe is setup
# metainit d22 -m d14
d22: Mirror is setup
# metattach d22 d15
d22: submirror d15 is attached

# newfs /dev/md/rdsk/d20
newfs: construct a new file system /dev/md/rdsk/d20: (y/n)? y
... (many pages of 'superblock backup' output omitted)
# mount /dev/md/dsk/d20 /export/home/ann
# vi /etc/vfstab
... while in 'vi', type this exactly:
/dev/md/dsk/d20 /dev/md/rdsk/d20 /export/home/ann ufs 2 yes -

# newfs /dev/md/rdsk/d21
newfs: construct a new file system /dev/md/rdsk/d21: (y/n)? y
... (many pages of 'superblock backup' output omitted)
# mount /dev/md/dsk/d21 /export/home/bob
# vi /etc/vfstab
... while in 'vi', type this exactly:
/dev/md/dsk/d21 /dev/md/rdsk/d21 /export/home/bob ufs 2 yes -

# newfs /dev/md/rdsk/d22
newfs: construct a new file system /dev/md/rdsk/d22: (y/n)? y
... (many pages of 'superblock backup' output omitted)
# mount /dev/md/dsk/d22 /export/home/sue
# vi /etc/vfstab
... while in 'vi', type this exactly:
/dev/md/dsk/d22 /dev/md/rdsk/d22 /export/home/sue ufs 2 yes -

# format
... (long interactive session omitted)
# metattach d12 disk3:slice1
d12: component is attached
# metattach d13 disk4:slice1
d13: component is attached
# metattach d21
# growfs -M /export/home/bob /dev/md/rdsk/d21
/dev/md/rdsk/d21:
... (many pages of 'superblock backup' output omitted)

Filesystem Admin – The ZFS way
ZFS Administration

• Create a storage pool named “home”
  # zpool create home mirror c0t3d0 c0t4d0
  # zfs set mountpoint=/export/home home
• Create filesystems “pieter”, “clemens”, “bartm”
  # zfs create home/pieter
  # zfs create home/clemens
  # zfs create home/bartm
• Later, add space to the “home” pool
  # zpool add home mirror c0t8d0 c0t9d0

ZFS Goodies

• snapshot, clone, rollback
  # zfs snapshot tank/home@yesterday
  # zfs clone tank/home@yesterday tank/home-yesterday
  # ls ~/.zfs/home-yesterday
  # zfs rollback -r tank/home@yesterday

• replicate (full, then incremental; the later snapshot name “today” is only an example)
  # zfs send tank/home@yesterday | ssh rhost zfs receive rpool/home@yesterday
  # zfs send -i tank/home@yesterday tank/home@today | ssh rhost zfs receive rpool/home

Demo: ZFS

PSH: Fault Management Architecture

• Predictive Self Healing components:
  > Fault Management Architecture (FMA)
  > Service Management Facility (SMF)
• FMA is based on
  > Error events, which are dispatched to
  > Diagnosis agents, that generate
  > Fault events, handled by
  > Agents that take proactive action
• Available for CPU, mem, I/O bus
• Agents interact with DR, RM, ...
• See: http://www.sun.com/bigadmin/content/selfheal/
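The event flow listed above (error events, diagnosis agents, fault events, response agents) can be sketched as a small pipeline. This is only an illustrative model in Python, not the FMA API: the class names, the component names and the threshold of three errors are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorEvent:
    component: str  # hypothetical component name, e.g. "dimm3"
    kind: str       # hypothetical error class, e.g. "correctable-ecc"


@dataclass(frozen=True)
class FaultEvent:
    component: str
    reason: str


class DiagnosisAgent:
    """Turns a stream of error events into fault events."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # assumed value, for illustration only
        self.counts = Counter()
        self.faulted = set()

    def receive(self, event):
        # Count errors per component; diagnose a fault once, at the threshold.
        self.counts[event.component] += 1
        if (self.counts[event.component] >= self.threshold
                and event.component not in self.faulted):
            self.faulted.add(event.component)
            return FaultEvent(event.component,
                              f"{self.threshold} x {event.kind}")
        return None


class RetireAgent:
    """Response agent: records which components it took out of service."""

    def __init__(self):
        self.offline = []

    def handle(self, fault):
        self.offline.append(fault.component)


def run_pipeline(events, diagnosis, response):
    """Dispatch error events to diagnosis, and resulting faults to the agent."""
    for event in events:
        fault = diagnosis.receive(event)
        if fault is not None:
            response.handle(fault)
    return response.offline


if __name__ == "__main__":
    stream = [ErrorEvent("dimm3", "correctable-ecc")] * 3
    stream.append(ErrorEvent("cpu0", "correctable-ecc"))
    print(run_pipeline(stream, DiagnosisAgent(), RetireAgent()))
```

In this model a single error on cpu0 is only counted, while the third error on dimm3 produces a fault and a proactive response, which is the "predictive" part of the idea.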


FMA for X64 - example

PSH: Service Management Facility

• Other part of Predictive Self Healing
• Manage running services
  > Replace ancient “rc files”
• Maintain:
  > Dependencies
  > Snapshots
  > Status
• Functions: enable, disable, rollback, restart
• See:
  > http://www.sun.com/bigadmin/content/selfheal/smf-quickstart.html
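To illustrate why maintaining dependencies matters, here is a minimal Python sketch of dependency-ordered start-up, which is the general idea that replaces hand-numbered rc scripts. It is not SMF code: the service names and the dependency table are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for the real dependency engine.

```python
from graphlib import TopologicalSorter

# Hypothetical services, each mapped to the services it depends on.
DEPENDENCIES = {
    "network": set(),
    "filesystem": set(),
    "name-service": {"network"},
    "database": {"filesystem", "network"},
    "web-server": {"name-service", "database"},
}


def start_order(dependencies):
    """Return an order in which every service starts after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def affected_by(service, dependencies):
    """Return the services that directly or indirectly depend on `service`."""
    hit = set()
    changed = True
    while changed:
        changed = False
        for name, deps in dependencies.items():
            if name not in hit and (service in deps or deps & hit):
                hit.add(name)
                changed = True
    return hit


if __name__ == "__main__":
    print(start_order(DEPENDENCIES))
    print(sorted(affected_by("network", DEPENDENCIES)))
```

With an explicit dependency graph, the framework can also work out which services are affected when one of them fails, instead of relying on script naming order.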


Tracing and Debugging
A real-life example

• Application: konsole (The X Terminal emulator of KDE)
• Problem: konsole becomes unresponsive (hangs) after hitting ^C
• Others: no source code available


Tracing and Debugging
A real-life example

• Analysis:

konsole normally sits in a loop calling poll() with a number of fd's, amongst which is an fd that points to /devices/pseudo/clone@0:ptm

After hitting ^C, konsole still runs, but calls to pollsys() no longer contain the fd that has opened /devices/pseudo/clone@0:ptm. So it looks like konsole never gets any more input from its child process (the shell in this case).

Tracing and debugging
Using truss(1) – trace system calls and signals

truss konsole from another window when hitting ^C.

# truss -tpollsys,read -vpollsys,read -p `pgrep konsole`

/1: pollsys(0x08074110, 6, 0x080466A8, 0x00000000) = 1
/1:     fd=3 ev=POLLIN rev=0
/1:     fd=9 ev=POLLIN rev=0
/1:     fd=8 ev=POLLIN rev=0
/1:     fd=5 ev=POLLIN rev=0
/1:     fd=11 ev=POLLIN rev=POLLIN
/1:     fd=15 ev=POLLIN rev=0
/1:     timeout: 0.928000000 sec
/1: read(11, 0x0814A990, 0) = 0

Tracing and debugging
Using truss(1)

fd 11 is subsequently dropped off the list of fds that konsole wants to watch:

/1: pollsys(0x08074110, 5, 0x080466A8, 0x00000000) = 1
/1:     fd=3 ev=POLLIN rev=0
/1:     fd=9 ev=POLLIN rev=0
/1:     fd=8 ev=POLLIN rev=POLLIN
/1:     fd=5 ev=POLLIN rev=0
/1:     fd=15 ev=POLLIN rev=0
/1:     timeout: 0.928000000 sec

Tracing and debugging
Using truss(1)

It looks like konsole treats the 0-byte read as an EOF, which it shouldn't since, on STREAMS based implementations, poll() (and its system implementation pollsys()) can return POLLIN revents even when there are 0 bytes available, see man poll(2):

POLLIN Data other than high priority data may be read without blocking. For STREAMS, this flag is set in revents even if the message is of zero length.

Tracing and debugging
Using truss(1)

So, if the assumption is correct, and konsole treats the zero-length read as EOF, it should be changed to look for POLLHUP revents instead.

The reason this shows up on Solaris and not other unices is probably because this is the only STREAMS based pseudo tty implementation that you're running on.
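The suggested fix can be sketched as a small decision function. This is a hypothetical Python illustration of the logic, not konsole's actual code (which was not available): the function name `classify` and its return labels are invented here, and it only covers the STREAMS pseudo-tty case discussed above. Other descriptor types may need different handling.

```python
import select


def classify(revents: int, nread: int) -> str:
    """Decide what one poll() result means for a watched descriptor.

    revents: the event mask poll() returned for the descriptor
    nread:   number of bytes the following read() returned
    """
    # Real data arrived: hand it to the terminal emulator.
    if revents & select.POLLIN and nread > 0:
        return "data"
    # The other side is gone: only now stop watching the descriptor.
    if revents & (select.POLLHUP | select.POLLERR):
        return "hangup"
    # POLLIN with a zero-length message is not EOF on STREAMS:
    # keep the descriptor in the poll set.
    return "keep-watching"


if __name__ == "__main__":
    print(classify(select.POLLIN, 0))
    print(classify(select.POLLHUP, 0))
```

The buggy behaviour corresponds to returning "hangup" for the `(POLLIN, 0)` case, which is what drops fd 11 from the poll set in the truss output.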

Tracing and debugging
Developer's reaction

“The story of how we got to this point deserves a blog entry of its own -- maybe I'll write one in the train when next traveling -- because it shows off all the fancy debugging tools that are available on the platform. [...]

Having tools at hand so you can ask questions like 'what are all the FDs passed in to select() in the Qt event loop?' with no recompiles is a godsend here. Or 'what are all the stack traces leading to QSocketNotifier::setEnabled in this running konsole?' Those are powerful tools, a tale for some other time.”

http://www.fruitsalad.org/people/adridg/bobulate/index.php?/archives/638-Incorporating-post-4.1.0-fixes-in-OpenSolaris.html

DTrace – Dynamic Tracing

• What is causing all the cross calls?
  > The X server
• What are the X servers doing?
  > They're mapping and unmapping /dev/null
• Why are they doing that?
  > They're creating and destroying pixmaps
• Who's asking them to do that?
  > Several instances of a stock-ticker application
• How often is each stock-ticker making this request?
  > 100 times per second
• Why is the application doing that?
  > It was written by 10000 monkeys at 10000 keyboards

DTrace

• Improved system observability
  > Better debugging and performance tuning
  > Complete view from Java thread to kernel
• Dynamic instrumentation
  > Enables continuous “black box” recording
• Examine live systems and crash dumps
  > Reduce time-to-resolution

DTrace (2)

[Diagram: the DTrace framework. Consumers (C), among them dtrace(1M), run in user space; the framework and its providers (P) sit in the kernel.]

DTrace

• Structure:

syscall::open:entry
/execname == "ls"/
{
        printf("Opened file: %s\n", copyinstr(arg0));
}

provider:module:function:name
/predicate/
{
        action;
        action;
}
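The four-part probe description can be illustrated with a small Python sketch that splits a description into its fields and checks it against a concrete probe. This is not part of DTrace: `ProbeSpec`, `parse_probe` and `matches` are hypothetical helpers, and the sketch assumes a full four-field description where an empty field matches anything (real DTrace also accepts shorter forms and wildcards, which are not handled here).

```python
from typing import NamedTuple


class ProbeSpec(NamedTuple):
    provider: str
    module: str
    function: str
    name: str


def parse_probe(description: str) -> ProbeSpec:
    """Split 'provider:module:function:name' into its four fields."""
    parts = description.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated fields: {description!r}")
    return ProbeSpec(*parts)


def matches(spec: ProbeSpec, probe: ProbeSpec) -> bool:
    """An empty field in the description matches any value in the probe."""
    return all(want == "" or want == have for want, have in zip(spec, probe))


if __name__ == "__main__":
    spec = parse_probe("syscall::open:entry")
    print(spec)
    print(matches(spec, ProbeSpec("syscall", "", "open", "entry")))
    print(matches(spec, ProbeSpec("syscall", "", "read", "entry")))
```

In the example from the slide, the module field is left empty, so the clause fires on entry to open in the syscall provider regardless of module; the predicate then narrows it to processes named "ls".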

Demo: Dtrace

Resource Management

• Resource set
  > Partitions of the hardware resources
  > Can be: CPU, memory or SWAP
• Resource pools
  > Logical partitions of different resource sets
  > Multiple pools can link to the same set
  > Dynamic: resources are (re)allocated to meet demand and objectives
• Projects
  > Workload labels linked to a Resource Pool
  > Enables processes running in a project to have specific resource sets
  > Mechanism of “shares” to assign right amount of CPUs to workloads
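As a worked illustration of the “shares” mechanism, the sketch below computes each project's CPU entitlement as its shares divided by the total shares of the projects that are currently competing. It is a simplified Python model, not the scheduler's implementation; the project names and the function `cpu_entitlement` are assumptions, and the share values 10 and 50 are borrowed from the labels on the Projects diagram.

```python
from fractions import Fraction


def cpu_entitlement(shares, active=None):
    """Map each competing project to its fraction of the pool's CPU.

    shares: dict of project name -> number of shares
    active: optional set of project names that currently want CPU;
            by default every project is assumed to be competing
    """
    names = [n for n in shares if active is None or n in active]
    total = sum(shares[n] for n in names)
    if total == 0:
        return {}
    return {n: Fraction(shares[n], total) for n in names}


if __name__ == "__main__":
    projects = {"batch": 10, "web": 50}  # hypothetical project names
    print(cpu_entitlement(projects))
    print(cpu_entitlement(projects, active={"batch"}))
```

Shares are relative weights rather than fixed percentages: with both projects busy, "batch" is entitled to 10/60 of the CPU, but when "web" is idle, "batch" may use all of it.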

Resource sets

[Diagram: Hardware and OS, with resource sets for CPU, Memory and SWAP]

Resource Pools

[Diagram: two Resource Pools on top of the CPU, Memory and SWAP resource sets]

Projects

[Diagram: three Projects, two of them labelled Project[10] and Project[50], linked to the two Resource Pools]


Zones and Resource Mgt

Solaris Zones + Solaris Resource Manager = Solaris Containers

S10 Resource Management

[Diagram: Hardware and OS with CPU, Memory and SWAP resource sets, two Resource Pools, three Projects and two ZONEs layered on top]

Next Steps

1. Get Solaris
   sun.com/solaris/get
2. Get Data Sheets and White Papers
   sun.com/solaris/reference_materials
3. Get Trained
   sun.com/solaris/freetraining | Learning Paths: sun.com/training/solaris
4. Get Started with Solaris Learning Centers
   sun.com/solaris/teachme
5. Get Current
   sun.com/solaris/move | bigadmin.com/apps | bigadmin.com/hcl
6. Get Involved
   opensolaris.{org,com} | bigadmin.com | developers.sun.com/solaris

SAI – Sun Academic Initiative
http://www.sun.com/solutions/landing/industry/education/sai/index.xml

• Collaborative relationship with educational institutions.
• Schools become authorized to deliver training on Sun technologies to their students, faculty, and staff.
• Access to free Web-based training and curricula, including courses in the latest Java and Solaris technologies.
• Fontys is already participating
• Campus Ambassador Program
  > http://developers.sun.com/students/community/map.jsp

Q & A

Bart Muijzer
bart.muyzer@sun.com

Virtualization with Solaris
