SmartOS Operations

Ben Rockwood
Director of Cloud Operations
[email protected]
www.cuddletech.com

Tuesday, October 2, 2012


Agenda

• Principles of Operation

• Provisioning

• Monitoring

• Configuration Management

• Orchestration

• Authentication

• Access Control

• Auditing

• Logging

• Metrics

• Tips & Tools


Obligatory DevOps Pitch

• DevOps is about 3 things:

• The Collaboration of People

• The Convergence of Process

• The Creation & Exploitation of Tools

• Its primary goal is providing quality & value to customers

• It is concerned with flow and encourages systems thinking

• Born from TPS/Lean, TOC, Agile, & classical Operations Management


Principles of Operation

• Goals:

• Omnipotence: All Powerful

• Omnipresence: All Seeing

• Omniscience: All Knowing

• ... Since we can’t really do that...

• Make change control simple and standardized

• Monitor as deeply as possible and alert a human as needed

• Leverage a suite of tools to help us analyze problems quickly


Principles of Operation (cont’d)

• Man is mortal

• Follow the “no snowflake” rule; minimize variation to maximize maintainability and predictability

• Maintain a set of standard operating procedures (SOPs) to ensure quality across the organization

• Make tools as simple and productive as possible to avoid ad hoc (rogue) administration

• Keep it simple, stupid (KISS); cleverness is temporary but grokability is forever

• Leverage industry-standard tools and stock facilities; avoid excessive customization


Provisioning

• Boot from USB keys, CD/DVD/ISO, or PXE

• PXE is preferred for all serious production deployments

• PXE greatly simplifies upgrades and downgrades: just change the image the TFTP server offers and reboot the machine

• Much faster and more controllable than re-imaging USB keys in place

• Shameless Smart Data Center Plug
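
As an illustration of the PXE upgrade/downgrade bullet above, here is a minimal sketch that assumes a hypothetical TFTP layout: each platform image lives in its own directory under `os/`, and the PXE menu always boots whatever `os/current` points to. The function name `set_boot_platform`, the directory layout, and the version strings are assumptions made for this example, not the SDC implementation.

```shell
#!/bin/bash
# Sketch of a PXE platform switch under a hypothetical layout:
#   $tftp_root/os/<version>/...  one directory per platform image
#   $tftp_root/os/current        symlink the PXE menu always boots
set_boot_platform() {
  local tftp_root=$1 version=$2
  local link="$tftp_root/os/current"

  # Refuse to point the boot link at an image that is not there.
  if [ ! -d "$tftp_root/os/$version" ]; then
    echo "set_boot_platform: no platform image '$version'" >&2
    return 1
  fi

  # Replace the link; a relative target keeps the TFTP tree relocatable.
  rm -f "$link" && ln -s "$version" "$link"
}

# Usage (hypothetical version string), then reboot the node:
#   set_boot_platform /tftpboot 20121004T000000Z
```

Rolling back is the same call with the previous version string; either way, the node only picks up the change on its next reboot.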


Monitoring

• JPC (Joyent Public Cloud) uses Zabbix

• Free & Open Source

• Proxy architecture allows for multiple data centers easily

• Agent or Agent-less Operation

• Agent-less: Supports IPMI & SNMP

• Agent is tiny, written in C, and can be compiled statically for easy binary installation without dependencies

• Extremely easy to customize and add custom metrics

• Dashboard provides a “single pane of glass” view of your entire infrastructure

• Includes historical graphing of all metrics

• Caution: Use Percona as the backend database
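
Adding a custom metric in Zabbix usually means a `UserParameter=<key>,<command>` line in the agent's `zabbix_agentd.conf`, where the command prints a single value. The sketch below is a hypothetical helper for such a command: it counts non-global zones in a given state from the parsable `zoneadm list -pc` output. The key name `smartos.zones[...]`, the script path, and the function name are assumptions.

```shell
#!/bin/bash
# Hypothetical helper for a Zabbix agent UserParameter. A flexible key could
# be declared in zabbix_agentd.conf like this (key and path are assumptions):
#   UserParameter=smartos.zones[*],/opt/zabbix/bin/zone_count.sh $1
# where that script simply calls: zone_count "$1"
#
# Prints the number of non-global zones in the given state (default: running).
zone_count() {
  local state=${1:-running}
  # "zoneadm list -pc" prints one colon-separated line per configured zone:
  #   zoneid:zonename:state:zonepath:uuid:brand:ip-type
  zoneadm list -pc | awk -F: -v s="$state" '
    $2 != "global" && $3 == s { n++ }
    END { print n + 0 }'
}
```

Printing a bare number keeps the item a simple numeric type on the Zabbix side, and the same pattern works for any metric you can get from the command line.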


Zabbix System Status


Monitoring: Completing the Solution

• Zabbix Agents installed by Chef

• Statically compiled binaries distributed in the cookbook

• Zabbix Alerts

• All alerts sent directly to Ops staff via Jabber & email

• “Disaster” alerts sent to PagerDuty (SMS Ops)

• Pingdom used as backup/redundant solution, alerts sent to PagerDuty


Configuration Management

• In SmartOS, CM is mandatory (imho)

• JPC uses Chef-Solo for Configuration Management

• Bootstrap script is curled and piped to bash, which:

• Downloads Chef “Fat Client”

• Creates Chef-Solo Configuration

• Creates SMF Service and Runs Chef

• Each data center has its own “attributes file” which specifies Zabbix Server, LDAP servers, SSH Keys, etc.

• One set of cookbooks is used for all DCs
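
The "creates Chef-Solo configuration" step could look roughly like the following sketch. It assumes the shared cookbooks have already been unpacked and that the per-DC attributes file is a JSON file (also holding the run list) handed to chef-solo through the `json_attribs` setting. The function name and all paths are hypothetical.

```shell
#!/bin/bash
# Sketch of the "create chef-solo configuration" step of a bootstrap script.
# Paths are hypothetical; the per-DC attributes file is assumed to be a JSON
# file holding the run list plus DC settings (Zabbix server, LDAP, SSH keys).
write_solo_config() {
  local conf_dir=$1 cookbook_dir=$2 dc_attributes=$3

  if [ ! -f "$dc_attributes" ]; then
    echo "write_solo_config: missing attributes file $dc_attributes" >&2
    return 1
  fi
  mkdir -p "$conf_dir" || return 1

  # file_cache_path, cookbook_path and json_attribs are standard solo.rb settings.
  cat > "$conf_dir/solo.rb" <<EOF
file_cache_path "/var/chef/cache"
cookbook_path "$cookbook_dir"
json_attribs "$dc_attributes"
EOF
}

# The bootstrap would then run Chef with:
#   chef-solo -c "$conf_dir/solo.rb"
```

Keeping the DC-specific values in the JSON file means the same `solo.rb` template and the same cookbooks work unchanged in every data center.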


Config Management w/Chef

• JPC Chef Cookbooks include:

• “joyent”: Base cookbook run on all nodes, installs basic tools, fixes anything undesirable in SmartOS, adds BMC driver, adds MegaSAS tools, etc.

• “computenode”: Modifications specific to general purpose compute nodes (currently empty)

• “ldap”: Configures LDAP client, modifies PAM for netgroups support, creates user directories, configures ZFS for delegated administration, etc.

• “zabbix”: Installs and configures Zabbix

• ... others, including “northstar”, “bart”, “logging”, etc.

SmartOS Cookbooks & Tools: github.com/joyent/smartos_cookbooks


Orchestration

• Orchestration layer is required for ad-hoc mass control of nodes, for:

• Re-running Chef

• Mass service control (“svcadm disable zones” on all nodes)

• Auditing

• ... things you can’t foresee

• Several options exist: MCollective, pssh, mussh, etc.

• SDC includes an MCollective-like solution (sdc-oneachnode)
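
For a sense of what pssh and mussh do under the hood, here is a minimal, sequential stand-in written as a plain ssh loop. The host-file format (one host per line, `#` comments allowed) and the function name are assumptions; the real tools add parallelism, timeouts, and better error reporting.

```shell
#!/bin/bash
# Minimal sequential stand-in for pssh/mussh: run one command on every host
# listed in a file and prefix each output line with the host name.
on_each_node() {
  local hostfile=$1; shift
  local host
  while IFS= read -r host; do
    # Skip blank lines and comments in the host list.
    case $host in ''|'#'*) continue ;; esac
    # -n stops ssh from swallowing the rest of the host list on stdin;
    # BatchMode makes it fail instead of prompting for a password.
    ssh -n -o BatchMode=yes "$host" "$@" 2>&1 | sed "s/^/$host: /"
  done < "$hostfile"
}

# Usage (hypothetical host file):
#   on_each_node compute-nodes.txt 'svcs -H zones'
```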


Orchestration w/ Mussh

$ p mussh -H compute-nodes -c 'svcs -H zones'
10.0.96.22: online Feb_17 svc:/system/zones:default
10.0.96.23: online Feb_17 svc:/system/zones:default
10.0.96.24: online Feb_17 svc:/system/zones:default
10.0.96.25: online Feb_17 svc:/system/zones:default

$ cat zonecount.mussh
RUNNING=`zoneadm list -vc | grep running | grep -v global | wc -l`
INSTALLED=`zoneadm list -vc | grep installed | wc -l`
echo "Node ${HOSTNAME}: ${RUNNING} Zones Running - ${INSTALLED} Zones in Installed State"

$ p mussh -H nodes -C zonecount.mussh
10.0.96.22: Node 45SY9R1: 9 Zones Running - 2 Zones in Installed State
10.0.96.23: Node 8X7Y9R1: 7 Zones Running - 1 Zones in Installed State
10.0.96.25: Node 8BZY9R1: 8 Zones Running - 1 Zones in Installed State
10.0.96.26: Node 5MPY9R1: 18 Zones Running - 4 Zones in Installed State
10.0.96.28: Node H4FY9R1: 5 Zones Running - 1 Zones in Installed State


Other Orchestration Tools to Consider

• ClusterSSH (cssh): http://sourceforge.net/projects/clusterssh/

• RunDeck (formerly ControlTier): http://rundeck.org


User Management & Authentication

• Use LDAP!

• JPC uses OpenLDAP

• Easy to manage; lots of resources

• Flexible replication schemes

• Flat text file configuration makes change control easier

• Client Access via Simple-SSL (636)

• Don’t enable anonymous access; you do NOT need it

• Firewalled legacy port 389 access provided for some appliances

• Perform daily management via Apache Directory Studio

• Generate user passwords using apg (20-character length)
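
If apg is not installed on a given host, a comparable 20-character password can be pulled from `/dev/urandom`. This is a hedged fallback sketch with a hypothetical function name, not JPC's procedure. It uses only letters and digits, so widen the `tr` set if your password policy requires symbols. The apg invocation in the comment is from memory; check apg(1) on your system.

```shell
#!/bin/bash
# Fallback sketch for hosts without apg: print a random alphanumeric password
# (20 characters by default) read from /dev/urandom. With apg installed,
# "apg -m 20 -x 20 -n 1" should produce one 20-character password instead.
gen_password() {
  local len=${1:-20}
  # Keep only letters and digits, wrap the stream into $len-character lines,
  # and take the first line.
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | fold -w "$len" | head -n 1
}
```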


LDAP Considerations

• The “Hard Part” is creating the Schema & seeding the DIT; JPC’s “ldap_kit” will be open sourced soon

• Always deploy LDAP Servers in pairs

• Use MirrorMode replication

• Require authentication for all users (no anonymous binds) and use only SSL if you can

• Don’t mess around with anything other than OpenLDAP & the standard Illumos LDAP client (i.e., don’t go chasing Linux PAM projects; you don’t need them)

• When configuring clients via CM, modify the files directly; running “ldapclient init” may give mixed results.


A Word About Kerberos

• It’s not worth the administrative overhead (imho)

• I don’t believe in SSO for administration in production environments (password entry encourages boundary awareness)

• Keep an eye on ApacheDS (directory.apache.org) & FreeIPA (freeipa.org) Projects


Access Control

• Use Role Based Access Control (RBAC)

• It’s not hard... really!

• Manage RBAC in LDAP, if possible

• Create abstraction profiles, ex:

• Joyent Level D: Normal user + DTrace

• Joyent Level 1: Normal user + Zone/VM Management

• Joyent Level 2: Admin, All but security

• Joyent Level 3: “Primary Administrator” (uid=0)


RBAC: Simple Example

[root@smartos01 ~]# zfs create -o mountpoint=/home zones/home
[root@smartos01 ~]# useradd -s /bin/bash -m -d /home/benr -P "Primary Administrator" benr
80 blocks
[root@smartos01 ~]# passwd benr
New Password:
Re-enter new Password:
passwd: password successfully changed for benr

benr@smartos01:~$ grep benr /etc/user_attr
benr::::type=normal;profiles=Primary Administrator

=====

$ ssh benr@xxxxxxxxx
Password:

benr@smartos01:~$ cat /etc/shadow
cat: cannot open /etc/shadow: Permission denied

benr@smartos01:~$ pfexec cat /etc/shadow
root:$5$YB.Wp7J7$3iLhl.ivH4TCCAFoih6oXCqGIF0SMAjws3w4xwxwOZ4:14897::::::
daemon:NP:6445::::::
bin:NP:6445::::::


RBAC: Learning More

• Authorizations are in /etc/security/auth_attr

• Execs are in /etc/security/exec_attr

• Profiles associate auths and execs for easy reference in /etc/security/prof_attr

• Profiles are assigned to users in /etc/user_attr

$ grep ZFS /etc/security/prof_attr
Software Installation:::Add application software to the system:profiles=ZFS File System Management;help=RtSoftwareInstall.html
ZFS File System Management:::Create and Manage ZFS File Systems:help=RtZFSFileSysMngmnt.html
ZFS Storage Management:::Create and Manage ZFS Storage Pools:help=RtZFSStorageMngmnt.html

$ grep ZFS /etc/security/exec_attr
ZFS File System Management:solaris:cmd:::/sbin/zfs:euid=0
ZFS Storage Management:solaris:cmd:::/sbin/zpool:uid=0
ZFS Storage Management:solaris:cmd:::/usr/lib/zfs/availdevs:uid=0
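
Because profiles are attached to users with plain colon-separated lines in `/etc/user_attr` (like the `benr::::type=normal;profiles=Primary Administrator` entry in the RBAC example), a small awk helper can list the profiles a user has been given. This is an illustrative sketch with a hypothetical function name. It ignores escaped separators, and on a live system `profiles <user>` reports the effective list.

```shell
#!/bin/bash
# Sketch: list the RBAC profiles assigned to a user in a user_attr-style file.
# Lines look like  user:qualifier:res1:res2:attr  where attr is a ";"-separated
# list of key=value pairs and the profiles= value is comma-separated.
user_profiles() {
  local file=$1 user=$2
  awk -F: -v u="$user" '
    $1 == u {
      n = split($5, kv, ";")
      for (i = 1; i <= n; i++)
        if (index(kv[i], "profiles=") == 1) {
          m = split(substr(kv[i], 10), p, ",")
          for (j = 1; j <= m; j++) print p[j]
        }
    }' "$file"
}

# Usage:
#   user_profiles /etc/user_attr benr
```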


RBAC Example in LDAP


RBAC Shells

• pfbash, pfcsh, pfsh, etc.

• Avoid them; they are intended for roles, not users.


Auditing

• Basic Security Module (BSM) Lives!

• BSM Auditing is enabled by Default

• Audit trails in /var/audit

• Add a crontab entry to rotate audit trails (“audit -n”) daily or weekly; they are not rotated by default.

• Print audit trails using “praudit -ls <trail>”; example:

# praudit -ls 20120830170449.20120930090225.78-2b-cb-47-af-7d | grep EXECVE | \
> awk -F, '{ print $12 " (" $7 "): " $19 " " $20}'
root (2012-08-30 17:08:58.862 +00:00): /usr/bin/zonename
root (2012-08-30 17:08:58.879 +00:00): /usr/sbin/zoneadm list
root (2012-08-30 17:08:58.897 +00:00): /usr/sbin/zfs list
root (2012-08-30 17:08:58.934 +00:00): /bin/bash /usr/bin/sysinfo
root (2012-08-30 17:08:58.938 +00:00): uname -s
root (2012-08-30 17:08:58.940 +00:00): zonename
root (2012-08-30 17:08:58.950 +00:00): cat /tmp/.sysinfo.json
root (2012-08-30 17:09:04.155 +00:00): /usr/node/bin/node /usr/sbin/vmadm
root (2012-08-30 17:09:04.273 +00:00): /usr/bin/zonename subject
root (2012-08-30 17:09:04.287 +00:00): /usr/sbin/zoneadm -z
root (2012-08-30 17:09:04.306 +00:00): /usr/sbin/zfs list
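
Rotating with `audit -n` closes the current trail and starts a new one, but closed trails still accumulate in `/var/audit`. The sketch below pairs a nightly rotation cron entry with a retention cleanup. The schedule, the 30-day retention, and the function name are assumptions. It relies on the BSM naming convention: closed trails are named `<start>.<end>.<host>` (like the trail in the praudit example), while the open trail has `not_terminated` in place of the end time.

```shell
#!/bin/bash
# Sketch of BSM audit-trail housekeeping (retention period is an assumption).
# Pair it with a root crontab entry that rotates the trail nightly, e.g.:
#   0 0 * * * /usr/sbin/audit -n
#
# Closed trails are named <start>.<end>.<host>; the trail still being written
# is <start>.not_terminated.<host> and is never touched here.
prune_audit_trails() {
  local dir=${1:-/var/audit} days=${2:-30}
  # Remove closed trails last modified more than $days days ago.
  find "$dir" -type f -name '*.*.*' ! -name '*.not_terminated.*' \
    -mtime +"$days" -exec rm -f {} \;
}
```

Ship the trails off-box (or at least archive them) before pruning if you need them for longer than the local retention window.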


Auditing with BART

• BART == Basic Auditing & Reporting Tool

• Similar to Tripwire

• Consider using “BARTlog”

2012-09-30T00:00:01+00:00 78-2b-cb-47-af-7d root: [ID 702911 audit.error] BART Reports Change: /opt/chef/lib/ruby/gems/1.9.1/gems/chef-10.14.2/lib/chef/provider/package/smartos.rb size 2811 3401 mtime 5064a239 5066c229 contents 93d30d38740082bce6529a24ee1024bf 54b367a570a1bc273237add5628b12b4
2012-09-30T00:00:01+00:00 78-2b-cb-47-af-7d root: [ID 702911 audit.error] BART Reports Change: /opt/local/include size 7 8
2012-09-30T00:00:01+00:00 78-2b-cb-47-af-7d root: [ID 702911 audit.error] BART Reports Change: /opt/local/include/X11 add
2012-09-30T00:00:01+00:00 78-2b-cb-47-af-7d root: [ID 702911 audit.error] BART Reports Change: /opt/local/include/X11/DECkeysym.h add
2012-09-30T00:00:01+00:00 78-2b-cb-47-af-7d root: [ID 702911 audit.error] BART Reports Change: /opt/local/include/X11/HPkeysym.h add
2012-09-30T00:00:01+00:00 78-2b-cb-47-af-7d root: [ID 702911 audit.error] BART Reports Change: /opt/local/include/X11/Sunkeysym.h add
2012-09-30T00:00:01+00:00 78-2b-cb-47-af-7d root: [ID 702911 audit.error] BART Reports Change: /opt/local/include/X11/X.h add
2012-09-30T00:00:01+00:00 78-2b-cb-47-af-7d root: [ID 702911 audit.error] BART Reports Change: /opt/local/include/X11/XF86keysym.h add
2012-09-30T00:00:01+00:00 78-2b-cb-47-af-7d root: [ID 702911 audit.error] BART Reports Change: /opt/local/include/X11/XWDFile.h add


Logging

• SmartOS ships with Rsyslog; you can fall back to the stock syslogd if you wish

• Rsyslog is a syslog server for this century: it includes TCP support, TLS, filtering, compression, database support, etc.

• SMF Services log to /var/svc/log

• System logs found in /var/adm & /var/log


Logging Tips

• Enable BSM Syslog Plugin

• Sadly, command executions do not include ARGV today :(

• Use logger(1) in your scripts to write syslog messages

• Centralize Syslog

• Leverage Rsyslog’s TCP capabilities for clients

• Leverage Rsyslog’s filtering capabilities for building centralized syslog servers

• ... if you can afford it, buy Splunk or Sumo Logic

• ... if you can’t, consider Graylog2 and/or Logstash

• If you have too much time on your hands, go Hadoop
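
A minimal sketch of the logger(1) tip: two helper functions that tag messages with the script name so they reach rsyslog like any other syslog traffic. The function names and the `user.notice`/`user.err` priorities are assumptions. For centralizing over TCP, rsyslog's classic forwarding rule is `*.* @@loghost:514` (a double `@` means TCP, a single `@` UDP), where `loghost` is a placeholder for your central server.

```shell
#!/bin/bash
# Sketch of script logging via logger(1): messages are tagged with the script
# name and handed to syslog, so rsyslog can filter and forward them.
# The facility/priority choices below are assumptions.
log_info()  { logger -p user.notice -t "${0##*/}" "$*"; }
log_error() { logger -p user.err -t "${0##*/}" "$*"; }

# Usage inside a script:
#   log_info "zfs snapshot of zones/home started"
#   zfs snapshot zones/home@nightly || log_error "zfs snapshot of zones/home failed"
```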


Metrics

• “If it moves, graph it. If it’s important, alert on it.”

• Kstats are your friend (see all available stats with “kstat -p”)

• For everything else, there is DTrace


Metrics: Kstats

$ kstat -p | wc -l
33461

$ kstat -p bnx:0:mac:rbytes && sleep 60 && !kstat
kstat -p bnx:0:mac:rbytes && sleep 60 && kstat -p bnx:0:mac:rbytes
bnx:0:mac:rbytes	614389071
bnx:0:mac:rbytes	614419131

30,060 bytes received on bnx0 in 60 seconds (614419131 - 614389071)

• A “registry” of kernel statistics

• Most stats are counters; to calculate activity, find the delta between two samples

• Most common tools use kstats as their source data, e.g.:

• vmstat

• iostat

• fsstat
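
The counter-delta arithmetic from the bnx0 example can be wrapped in a small helper. This sketch uses kstat's own interval/count arguments (`kstat -p <stat> 60 2` prints two samples 60 seconds apart) instead of `sleep`, and lets awk compute the delta and per-second rate. The function name is hypothetical, and counter wrap-around is not handled.

```shell
#!/bin/bash
# Sketch: turn a kstat counter into a delta and a per-second rate by taking
# two samples <interval> seconds apart.
kstat_rate() {
  local stat=$1 interval=${2:-60}
  kstat -p "$stat" "$interval" 2 | awk -v iv="$interval" '
    # Sample lines are: module:instance:name:statistic, a tab, then the value.
    # Blank separator lines (NF == 0) are ignored.
    NF == 2 { if (n++ == 0) first = $2; last = $2; name = $1 }
    END { d = last - first; printf "%s delta=%d rate=%.1f/s\n", name, d, d / iv }'
}

# Usage:
#   kstat_rate bnx:0:mac:rbytes 60
```

For the bnx0 samples above (614389071, then 614419131 sixty seconds later), it would report delta=30060 and rate=501.0/s.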


Metrics Graphing Solutions

• RRDtool: All-in-One database and graphing solution; local

• Ganglia: Flexible cluster graphing solution, based on RRDtool (agent-based)

• Graphite: Modern alternative to RRDtool; network-based graphing and RRD-style data storage (agent-less)

• In the end, data is fed into nearly all tools as key/value pairs with a timestamp.
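
To illustrate the key/value/timestamp point, Graphite's plaintext protocol accepts one `<metric.path> <value> <timestamp>` line per sample, conventionally on TCP port 2003. The sketch below converts `kstat -p` output into that format. The metric prefix scheme, the host names, and the function name are hypothetical, and the timestamp can be passed explicitly if your `date` lacks `%s`.

```shell
#!/bin/bash
# Sketch: convert "kstat -p" lines into Graphite plaintext-protocol lines
# ("<metric.path> <value> <unix-timestamp>"). Non-numeric stats are skipped.
kstat_to_graphite() {
  local prefix=$1 ts=${2:-$(date +%s)}
  awk -v p="$prefix" -v ts="$ts" '
    NF == 2 && $2 ~ /^-?[0-9.]+$/ { gsub(":", ".", $1); print p "." $1, $2, ts }'
}

# Usage (hypothetical prefix and Graphite host):
#   kstat -p bnx:0:mac:rbytes | kstat_to_graphite servers.node1 | nc graphite.example.com 2003
```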


Northstar RRDtool Example


Other Tools & Tips

• Use the Ptools to observe processes

• pfiles: List open file descriptors of a process

• pargs: List arguments & environment variables of a process

• pmap: Show memory allocation of a process

• and... pldd, pflags, pcred, pstack, pstop, prun, pwait, etc.

• Monitor per-mount file system activity with fsstat

• SmartOS includes ziostat and zmemstat


Other Tools & Tips

• Use IPMI if you’ve got it! IPMI goodies include:

• Sensor Data Repository (sdr)

• System Event Log (sel)

• Serial Console Redirection Over LAN (sol)

• FRU Inventory (fru)

• Know your place! Use LLDP if your network provides it.

• ‘getldp.pl’ uses snoop to listen for LLDP packets

$ p ./getldp.pl -lx -i bnx0
Watching for LLDP packet on bnx0 for 60 seconds...
device-id: 00:1c:73:XX:XX:XX
platform: Arista Networks EOS version 4.7.8 running on an Arista Networks DCS-7048T-A
port-id: Ethernet37
sysName: XXX-AR48-TOR-3-XX
native-vlan: 998


... now go forth and operate that thing!


Thank You.
