Monitoring Swift
OpenStack Summit, Austin 2016
Adam Takvam, Sr. Systems Engineer
Martin Lanner, Engagement Manager
April 28, 2016


SwiftStack Confidential


Overview

• Problems
  - Usage intelligence
  - Capacity planning
  - Operational health
  - Audit trails

• Background
  - Methods: logs + system metrics
  - Interpretation of metrics
  - Actions: thresholds + alerting

• Swift key monitoring concepts
  - What to monitor?
  - How to monitor

• Monitoring methods - demos
  - Logging: ELK
  - Trending/Forecasting: Prometheus + Grafana
  - System monitoring: Zabbix


It’s Linux!


Properties of Swift

• Distributed system

• Extremely durable through replication or erasure coding

• No single point of failure

• Even distribution of data

• Resilient

• Self-healing capabilities

• Can take a lot of abuse and negligence


Anatomy of a Monitoring Solution

• Agent: Gathers metrics on a host and either pushes or advertises them
  - Logstash
  - Prometheus Node Exporter
  - Zabbix Agent
  - Nagios NRPE

• Aggregation engine: Collects metrics from agents and provides an API with access to aggregated metric values
  - Nagios
  - Zabbix
  - Elasticsearch
  - Prometheus

• Visualizer: Renders graphs in a human-friendly format for easy comprehension of system state
  - Kibana
  - Grafana

• Alerting: Uses metric thresholds to trigger alerts when metrics fall out of an acceptable range
  - AlertManager
  - PagerDuty
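
To make the agent role concrete, here is a minimal Python sketch of an agent that gathers two filesystem metrics and renders them as plain text lines of the form name{label="value"} number, which an aggregation engine could scrape. The function names and the demo_ metric names are hypothetical, chosen for illustration; this is not how any of the agents listed above is implemented.

```python
import shutil


def gather_metrics(path="/"):
    """Agent step: collect a few filesystem metrics for one mount point."""
    usage = shutil.disk_usage(path)
    return {
        # Hypothetical metric names, not those of a real exporter.
        "demo_filesystem_size_bytes": usage.total,
        "demo_filesystem_free_bytes": usage.free,
    }


def render_exposition(metrics, labels):
    """Render metrics as 'name{label="value"} number' lines for a scraper."""
    label_text = ",".join(
        f'{key}="{value}"' for key, value in sorted(labels.items())
    )
    lines = [
        f"{name}{{{label_text}}} {value}"
        for name, value in sorted(metrics.items())
    ]
    return "\n".join(lines) + "\n"
```

Calling render_exposition(gather_metrics("/"), {"mountpoint": "/"}) yields one line per metric. An advertising agent would serve that text over HTTP, while a pushing agent would send it to the aggregation engine on a timer.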


Developing a Monitoring Strategy

Forms of Monitoring

• System utilization: CPU, memory, disk I/O, network, auditing cycles, replicator timing

• Performance: Transaction latency

• Errors: Invalid requests or states

• Outages: Service failures

• Feature usage: Understand CRUD operations and traffic patterns

• Audit trail: Who did what when?

Monitoring Lifecycle

• Measurement

• Reporting

• Characterization

• Thresholds

• Alerting

• Root cause analysis

• Remediation
  - Manual
  - Automated
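
The middle of the lifecycle (characterization, thresholds, alerting) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a method given in the talk: the baseline is summarized by its mean and standard deviation, and the "mean plus three standard deviations" rule is an assumed default. All function names are hypothetical.

```python
import statistics


def characterize(samples):
    """Characterization: summarize what 'normal' looks like for a metric."""
    return statistics.mean(samples), statistics.pstdev(samples)


def derive_threshold(samples, sigmas=3.0):
    """Thresholds: place the limit `sigmas` deviations above the mean.

    The 3-sigma default is an assumption made for this sketch.
    """
    mean, stdev = characterize(samples)
    return mean + sigmas * stdev


def check(value, threshold):
    """Alerting: return a message, or None when the value is acceptable."""
    if value > threshold:
        return f"ALERT: {value} exceeds threshold {threshold:.1f}"
    return None
```

For example, a baseline of measured values such as [10, 12, 11, 9, 13] gives a threshold a little above 15, so a later reading of 14 passes and a reading of 20 produces an alert message. Root cause analysis and remediation remain manual steps in this sketch.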


Examples of monitoring methods

• ELK: Usage intelligence
  - Who?
  - Agents
  - HTTP response codes
  - Errors
  - Audit trails

• Prometheus: Capacity planning
  - Data growth
  - Trending analytics

• Zabbix: Operational health
  - Network
  - CPU
  - RAM


Key concepts for monitoring Swift

• Cluster full
  - df
  - Data growth
  - Capacity planning

• Networking
  - Availability
  - Saturation

• Proxy state
  - CPU
  - /healthcheck

• Auditing cycles

• Replication cycle timing
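
As a rough illustration of the "cluster full" check, the Python sketch below computes how full each filesystem is and reports the fullest one, on the assumption that the fullest drive is the useful early warning. The function names are hypothetical, and the ratio is computed from shutil.disk_usage, so it can differ slightly from the percentage df prints.

```python
import shutil


def fill_ratio(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report zero size; treat them as empty.
        return 0.0
    return usage.used / usage.total


def fullest(paths):
    """Return (path, ratio) for the fullest filesystem among `paths`."""
    ratios = {path: fill_ratio(path) for path in paths}
    worst = max(ratios, key=ratios.get)
    return worst, ratios[worst]
```

On a storage node this might be called as fullest(glob.glob("/srv/node/*")), assuming the drives are mounted under /srv/node. Recording the result over time gives the data growth series that capacity planning needs.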


Load balancer health checks against Swift proxy servers

• Most load balancers run ICMP checks against all IPs in their pool by default

• Also, consider configuring the load balancer to run TCP checks against Swift’s /healthcheck endpoint

Example:

demo@demo:~$ curl http://swift.swiftstack.oss/healthcheck
OK
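
The same probe can be scripted. The Python sketch below issues an HTTP GET against a /healthcheck URL and treats anything other than a 200 response with the body OK as unhealthy. The function name and the two-second timeout are assumptions made for illustration; a real load balancer would use its own built-in check configuration.

```python
import urllib.error
import urllib.request


def is_healthy(url, timeout=2.0):
    """Return True when the endpoint answers HTTP 200 with a body of 'OK'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200 and response.read().strip() == b"OK"
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False
```

Usage would look like is_healthy("http://swift.swiftstack.oss/healthcheck"). Unlike an ICMP check, this fails when the host is up but the proxy service is not answering.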


Audit trails with ELK

Object size distribution

Distribution of CRUD operations over time

Zabbix triggers for Swift

Zabbix node memory usage

Zabbix drive utilization events

Disk I/O

Object Replicator Operations

Prometheus + Grafana trending and forecasting

Alerting

Example:

ALERT StorageCritical24Hours
  IF sum(predict_linear(node_filesystem_free{job="swiftstack",mountpoint=~"/srv/node/.*"}[1d], 24*3600)) < sum(node_filesystem_size{job="swiftstack",mountpoint=~"/srv/node/.*"}) * 0.2
  FOR 1h
  LABELS {
    group="storage_admin",
    severity="critical"
  }

Translation: Send a critical alert to all members of the storage_admin group if the total available storage capacity is projected to be less than 20% of the total storage capacity within the next 24 hours and that forecast has held true for at least 1 hour, recalculating every 5 minutes (per the server configuration, not shown).
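
To show what the forecast in this rule computes, here is a minimal Python sketch of a least-squares linear projection compared against 20% of total capacity. It is a simplification, not Prometheus's implementation: it works on a single series of cluster-wide free bytes instead of summing per-drive predictions, it extrapolates from the newest sample instead of the evaluation time, and it omits the FOR 1h hold. The function names mirror the rule for readability but are otherwise hypothetical.

```python
def predict_linear(samples, seconds_ahead):
    """Fit a least-squares line to (timestamp, value) samples and extrapolate
    `seconds_ahead` past the newest sample.

    Needs at least two samples with distinct timestamps.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    newest_t = max(t for t, _ in samples)
    return intercept + slope * (newest_t + seconds_ahead)


def storage_critical(free_samples, total_bytes, horizon=24 * 3600, floor=0.2):
    """True when projected free space drops below `floor` of total capacity."""
    predicted_free = predict_linear(free_samples, horizon)
    return predicted_free < total_bytes * floor
```

With free space falling steadily, for instance samples of (0, 100000), (3600, 96400) and (7200, 92800) against a total of 100000, the projection 24 hours out is well under the 20% floor and storage_critical returns True. A flat series at 90000 returns False.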


Q&A/Demo


Thank you!
