©2016 Couchbase Inc.
Monitoring Production Deployments: The Tools – LinkedIn
Alex Ma – Principal Architect – Couchbase
Michael Kehoe – Staff Site Reliability Engineer – LinkedIn

LinkedIn: Monitoring production deployments: the tools – Couchbase Connect 2016

Overview

• Monitoring Tools
• Making sense of the data
• External Monitoring Integrations
• Summary

Alex Ma
Principal Architect, Strategic [email protected]

Michael Kehoe
Staff Site Reliability Engineer (SRE) - [email protected]
• Production-SRE team
• Member of CBVT
• Australian!
• Contact
  • linkedin.com/in/michaelkkehoe
  • @matrixtek

Monitoring Tools

Monitoring Tools – Couchbase Web Console

Monitoring Tools – Couchbase REST API

• http://docs.couchbase.com/admin/admin/REST/rest-bucket-stats.html
• GET /pools/default/buckets/[bucket-name]/stats
• JSON output format
• 60 collections per metric
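As a rough illustration of the endpoint above, the following Python sketch fetches the bucket stats and averages the collected samples for one metric. It assumes the admin port is 8091, that HTTP Basic authentication is accepted, and that the JSON response nests the per-metric sample lists under `op` → `samples`; none of these details are stated on the slide, and the function names are hypothetical.

```python
import base64
import json
import urllib.request


def fetch_bucket_stats(host, bucket, user, password, port=8091):
    """GET /pools/default/buckets/<bucket>/stats and return the decoded JSON.

    Assumption: admin port 8091 and HTTP Basic authentication.
    """
    url = f"http://{host}:{port}/pools/default/buckets/{bucket}/stats"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def average_samples(stats, metric):
    """Average the collected samples for one metric.

    Assumption: samples live at stats["op"]["samples"][metric].
    Returns None when the metric is absent or has no samples.
    """
    samples = stats.get("op", {}).get("samples", {}).get(metric) or []
    return sum(samples) / len(samples) if samples else None
```

A caller would typically run `average_samples(fetch_bucket_stats(...), "cmd_get")` on a schedule and forward the result to a monitoring system.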

Monitoring Tools - cbstats

• http://docs.couchbase.com/admin/admin/CLI/cbstats-intro.html
• Command-line tool for viewing stats
• 333+ available stats
• Cumulative and snapshot

Monitoring Tools - cbstats

• Average value size = ep_value_size / (curr_items_tot - ep_num_non_resident)
• ep_value_size = Amount of RAM used to hold values in this bucket for this node
• curr_items_tot = Total count of active/replica items in this bucket for this node
• ep_num_non_resident = Total number of items not resident in RAM
• 9567135872 / (28733039 - 26582747) ≈ 4449.23 bytes
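The formula above can be sketched in a few lines of Python. The function name is hypothetical; the three inputs are the cbstats counters named on the slide, and the sample numbers are the ones from the worked example, which come out to about 4,449 bytes per resident value.

```python
def average_value_size(ep_value_size, curr_items_tot, ep_num_non_resident):
    """Average size in bytes of the values currently resident in RAM.

    Returns None when no items are resident, to avoid dividing by zero.
    """
    resident_items = curr_items_tot - ep_num_non_resident
    if resident_items <= 0:
        return None
    return ep_value_size / resident_items


# Numbers taken from the worked example on the slide.
avg = average_value_size(9567135872, 28733039, 26582747)
print(f"average value size: {avg:.2f} bytes")
```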

Monitoring Tools - cbstats

• cbstats can be pointed to a specific host and a specific port
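A minimal Python sketch of pointing cbstats at one host and port and turning its output into a dictionary. It assumes the data port is 11210, that `cbstats HOST:PORT all -b BUCKET` is accepted by the installed version (flags and authentication options vary between releases), and that the `all` group prints one `name: value` pair per line. The helper names are hypothetical.

```python
import subprocess


def cbstats_command(host, port=11210, bucket="default", group="all"):
    """Build the argument list for one cbstats call (assumed flag layout)."""
    return ["cbstats", f"{host}:{port}", group, "-b", bucket]


def parse_cbstats(text):
    """Parse 'name: value' lines into a dict of strings.

    Assumption: stat names in the requested group contain no colon.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip():
            stats[name.strip()] = value.strip()
    return stats


def read_cbstats(host, port=11210, bucket="default"):
    """Run cbstats against a specific host:port and return the parsed stats."""
    result = subprocess.run(
        cbstats_command(host, port, bucket),
        capture_output=True, text=True, check=True,
    )
    return parse_cbstats(result.stdout)
```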

Monitoring Tools - cbstats

• cbstats timings
  • Histogram that shows the timing of a number of internal operations
  • Commit to disk, background IO operations, GET ops
• http://docs.couchbase.com/admin/admin/CLI/CBstats/cbstats-timing.html

Monitoring Tools - Queries

• http://developer.couchbase.com/documentation/server/current/tools/query-monitoring.html

• http://localhost:8093/admin/vitals
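The vitals endpoint above returns a JSON document, so a small Python sketch is enough to poll it. The sketch assumes the endpoint is reachable without credentials from the local host (some versions may require authentication) and simply keeps the numeric fields, since the exact field names are not listed on the slide. Function names are hypothetical.

```python
import json
import urllib.request

VITALS_URL = "http://localhost:8093/admin/vitals"


def fetch_vitals(url=VITALS_URL):
    """Fetch the query service vitals document as a dict."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def numeric_vitals(vitals):
    """Keep only the numeric fields, which are the ones that can be graphed."""
    return {
        name: value
        for name, value in vitals.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }
```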

Monitoring Tools - htop

• htop | top | vmstat | proc
• Core Utilization
• Customization
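Per-core utilization of the kind htop displays can also be derived from `/proc/stat`. The Python sketch below is one way to do it, assuming a Linux host: it compares two snapshots of the file and treats idle plus iowait ticks as idle time. The function names are hypothetical.

```python
def parse_proc_stat(text):
    """Return {core: (total_ticks, idle_ticks)} for each 'cpuN' line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate 'cpu' line and anything that is not a core.
        if len(fields) >= 6 and fields[0].startswith("cpu") and fields[0] != "cpu":
            # user, nice, system, idle, iowait, irq, softirq, steal
            ticks = [int(value) for value in fields[1:9]]
            cores[fields[0]] = (sum(ticks), ticks[3] + ticks[4])
    return cores


def core_utilization(before, after):
    """Percent busy per core between two parse_proc_stat() snapshots."""
    usage = {}
    for core, (total_after, idle_after) in after.items():
        if core not in before:
            continue
        total_before, idle_before = before[core]
        delta_total = total_after - total_before
        if delta_total > 0:
            delta_idle = idle_after - idle_before
            usage[core] = 100.0 * (1 - delta_idle / delta_total)
    return usage
```

In practice the two snapshots would come from reading `/proc/stat` a few seconds apart.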

Monitoring Tools - iostat

• IO Utilization
• Average wait times
• Read/Write requests
• Determine Capacity
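The figures iostat reports can be approximated from two snapshots of `/proc/diskstats`. The following Python sketch assumes the standard Linux field layout (reads completed, time reading, writes completed, time writing, time doing I/O) and is meant as an illustration, not a replacement for iostat. The function names are hypothetical.

```python
def parse_diskstats(text):
    """Return the counters needed for utilization and wait time, per device."""
    disks = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 14:
            disks[f[2]] = {
                "reads": int(f[3]), "read_ms": int(f[6]),
                "writes": int(f[7]), "write_ms": int(f[10]),
                "io_ms": int(f[12]),
            }
    return disks


def disk_metrics(before, after, interval_s):
    """Rates, average wait and utilization for one device over interval_s."""
    reads = after["reads"] - before["reads"]
    writes = after["writes"] - before["writes"]
    wait_ms = (after["read_ms"] - before["read_ms"]) + (after["write_ms"] - before["write_ms"])
    busy_ms = after["io_ms"] - before["io_ms"]
    ios = reads + writes
    return {
        "reads_per_s": reads / interval_s,
        "writes_per_s": writes / interval_s,
        "await_ms": wait_ms / ios if ios else 0.0,
        "util_pct": 100.0 * busy_ms / (interval_s * 1000.0),
    }
```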

Monitoring Tools - iftop

• See where traffic is coming from
• Measure replication throughput
• Verify Capacity

Making Sense of the data

Key Statistics

Metrics to Consider:
• Couchbase-Server
• Client application
• Disk
• Network

Key Statistics – Couchbase Server

Metrics to Consider:
• Operations
• Cache miss (ep_cache_miss_rate)
• Active/Replica vbuckets (vb_active_num/vb_replica_num)
• Percentage of items in memory (vb_active_resident_items_ratio)
• Disk Queue (ep_diskqueue_items)
• Misdirected Requests (ep_num_not_my_vbuckets)
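One way to act on the statistics listed above is to compare each value against a limit chosen for the workload. The Python sketch below does that; the limits in `EXAMPLE_LIMITS` are hypothetical placeholders rather than recommendations from the talk, and the function name is likewise made up for illustration.

```python
# Hypothetical limits: ("max", x) alerts above x, ("min", x) alerts below x.
EXAMPLE_LIMITS = {
    "ep_cache_miss_rate": ("max", 5.0),
    "vb_active_resident_items_ratio": ("min", 80.0),
    "ep_diskqueue_items": ("max", 1_000_000),
}


def check_server_stats(stats, limits):
    """Return a list of human-readable alerts for stats outside their limits."""
    alerts = []
    for name, (kind, limit) in limits.items():
        if name not in stats:
            continue
        value = float(stats[name])
        if kind == "max" and value > limit:
            alerts.append(f"{name}={value} is above {limit}")
        elif kind == "min" and value < limit:
            alerts.append(f"{name}={value} is below {limit}")
    return alerts
```

Counters that only ever grow, such as ep_num_not_my_vbuckets, are better watched for their rate of change than compared against a fixed limit.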

Key Statistics – Couchbase Client

Metrics to Consider:
• Call-time latency
  • Measure GETs/SETs separately
• Hit-rate
  • Is the hit-rate what you expected?
• Errors
  • Timeouts retrieving objects
  • Unable to reach Couchbase-Server
• See http://developer.couchbase.com/documentation/server/4.0/sdks/java-2.2/event-bus-metrics.html
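Client-side call time and error counts can be collected with a thin wrapper around whatever client calls the application makes. The Python sketch below is SDK-agnostic: `CallTimer` is a hypothetical helper, and the wrapped functions stand in for real GET and SET calls, which keeps the two operation types in separate buckets.

```python
import time
from collections import defaultdict


class CallTimer:
    """Record latency (ms) and error counts per operation name."""

    def __init__(self):
        self.samples = defaultdict(list)
        self.errors = defaultdict(int)

    def timed(self, operation, func, *args, **kwargs):
        """Call func, recording its duration under the given operation name."""
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            self.errors[operation] += 1
            raise
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.samples[operation].append(elapsed_ms)

    def percentile(self, operation, pct):
        """Simple rank-based percentile; None when there are no samples."""
        data = sorted(self.samples[operation])
        if not data:
            return None
        index = min(len(data) - 1, int(len(data) * pct / 100.0))
        return data[index]
```

Usage would look like `timer.timed("get", client_get, key)` and `timer.timed("set", client_set, key, value)`, where `client_get` and `client_set` are the application's own calls.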

Key Statistics – Disk

Metrics to Consider:
• Disk Space
  • Compaction
  • Rebalance
• Disk IO
  • Can disk sustain required IOPS?
  • Disk Queue

Key Statistics – Network

Metrics to Consider:
• Network connectivity
• Connections
• Capacity/Utilization

Key Statistics – Network – Connectivity

• Ping - simple network connectivity test
• Firewalls – make sure you have the correct ports open
  • See http://developer.couchbase.com/documentation/server/current/install/install-ports.html
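Beyond ping, a TCP connect test shows whether a firewall is blocking a specific port. The Python sketch below checks a small, assumed subset of ports (8091 for the REST API and web console, 8093 for the query service, 11210 for the data service); the ports page linked above is the authoritative list, and the function names are hypothetical.

```python
import socket

# Assumed subset of ports; consult the ports documentation for the full list.
COUCHBASE_PORTS = {
    8091: "REST API / web console",
    8093: "query service",
    11210: "data service",
}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports=COUCHBASE_PORTS):
    """Map each port to whether it accepted a connection."""
    return {port: port_open(host, port) for port in ports}
```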

Key Statistics – Network – Connections

• File-descriptor limits
• Connections in CLOSE_WAIT state
  • Collect stats from /proc/net/tcp
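Counting connections per TCP state from `/proc/net/tcp` takes only a few lines of Python. The sketch below assumes the Linux layout in which the fourth column holds the state as a hexadecimal code, with `08` meaning CLOSE_WAIT; the function name is hypothetical.

```python
from collections import Counter

# Hex state codes used by the Linux kernel in /proc/net/tcp (subset).
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
    "08": "CLOSE_WAIT",
    "0A": "LISTEN",
}


def tcp_state_counts(text):
    """Count sockets per state; unknown codes are kept as their hex value."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) > 3:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts
```

Reading the file with `open("/proc/net/tcp").read()` and alerting when the CLOSE_WAIT count keeps growing is one possible use.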

Key Statistics – Network – Capacity/ Utilization

• Practical network capacity is ~85-90% of theoretical
  • E.g. 1Gb/s network interface can do 850-900Mb/s
• Congested networks are problematic
  • Higher latency on responses
  • Slower replication
• Collect stats from /proc/net/dev
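Link utilization can be estimated from two snapshots of `/proc/net/dev`. The Python sketch below assumes the usual Linux layout (two header lines, then per-interface counters with received bytes first and transmitted bytes ninth) and takes the link speed as a caller-supplied parameter. The function names are hypothetical.

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and len(fields) >= 9:
            counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def link_utilization(before, after, interval_s, link_bits_per_s):
    """Percent of link capacity used in each direction over interval_s."""
    rx_bits = (after[0] - before[0]) * 8 / interval_s
    tx_bits = (after[1] - before[1]) * 8 / interval_s
    return {
        "rx_pct": 100.0 * rx_bits / link_bits_per_s,
        "tx_pct": 100.0 * tx_bits / link_bits_per_s,
    }
```

Sustained values near the 85-90% practical ceiling mentioned above would suggest the interface is saturated.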

Key Statistics – Network – Capacity/ Utilization

• Practical network capacity is ~85-90% of theoretical (1250 MB/s)
  • E.g. 1Gb/s network interface can do 850-900Mb/s

Average object size (bytes)    4,096
ID length (bytes)              32
Metadata size (bytes)          56
Reads                          100,000
Writes                         60,000
Replica count                  1
Read network utilization       421,600,000
Write network utilization      502,080,000
Total network utilization      923,680,000 (1.25 billion theoretical max)
Remaining bandwidth            276,320,000
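The read, write and total figures in the table can be reproduced with the Python sketch below. The formulas are inferred from the numbers and are not stated in the source: a read is assumed to move the object, its metadata and the ID twice (request and response), and a write is assumed to move the object, ID and metadata once per copy (active plus replicas). Units are presumably bytes per second. Subtracting the total from the 1.25 billion theoretical maximum leaves 326,320,000, whereas the table's 276,320,000 matches a 1.2 billion baseline; the source does not say which was intended.

```python
def network_utilization(object_size, id_length, metadata_size,
                        reads, writes, replica_count):
    """Return (read, write, total) network utilization.

    Inferred formulas (assumptions, not from the source):
      read  = reads  * (object + metadata + 2 * id)
      write = writes * (object + metadata + id) * (1 + replicas)
    """
    item = object_size + id_length + metadata_size
    read_bytes = reads * (item + id_length)
    write_bytes = writes * item * (1 + replica_count)
    return read_bytes, write_bytes, read_bytes + write_bytes


read_bytes, write_bytes, total_bytes = network_utilization(
    object_size=4096, id_length=32, metadata_size=56,
    reads=100_000, writes=60_000, replica_count=1,
)
theoretical_max = 1_250_000_000  # "1.25 billion theoretical max" from the table
print(read_bytes, write_bytes, total_bytes, theoretical_max - total_bytes)
```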

External Monitoring Integrations

External Monitoring Integrations – Write your own

Getting Started
• Use Couchbase REST API
• Pipe ‘cbstats’ output
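Whichever source is used, a home-grown collector usually ends with a step that turns the stats into lines an external monitoring system can ingest. The Python sketch below assumes a simple `name value timestamp` text format and a hypothetical metric prefix; the actual format depends on the monitoring system being fed.

```python
import time


def format_metrics(prefix, stats, timestamp=None):
    """Turn a {name: value} dict into 'prefix.name value timestamp' lines.

    Non-numeric values (version strings, paths) are skipped.
    """
    stamp = int(time.time() if timestamp is None else timestamp)
    lines = []
    for name, value in sorted(stats.items()):
        try:
            number = float(value)
        except (TypeError, ValueError):
            continue
        lines.append(f"{prefix}.{name} {number} {stamp}")
    return lines
```

The input dict could come from the REST API response or from parsed cbstats output, and the resulting lines can be written to a socket, a file, or standard output.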

Using Couchbase REST API

• Examples
  • Datadog – http://lnkd.in/cb-datadog
  • This Example – http://lnkd.in/cb-stats-collector

Using Couchbase CBstats

Summary

Important to have monitoring in place
Understand the metrics you monitor
• What causes them
• How to remediate

Thank You!