CASSANDRA DAY ATLANTA 2016
MONITORING CASSANDRA
Aaron Morton @aaronmorton
CEO
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
About The Last Pickle.
Work with clients to deliver and improve Apache Cassandra based solutions.
Apache Cassandra Committer and DataStax MVPs.
Based in New Zealand, Australia, France & USA.
Metrics
Monitoring & Alerting
Insights
Codahale / Yammer / Dropwizard
Metrics
<dependency groupId="io.dropwizard.metrics" artifactId="metrics-core" version="3.1.0" />
Metrics
Separate Collection from Reporting.
Metrics Collection
Metrics are always collected.
Metrics
Metrics have a dotted-notation name, a timestamp, and a value, e.g.
com.thelastpickle.presenters.count=2
Metric Types
Gauge.
A simple value.
Metric Types
Ratio Gauge.
A ratio between two values.
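A ratio gauge can be sketched in a few lines. This is an illustrative Python sketch of the idea, not the Metrics library API: the class name, the callables, and the counts are hypothetical, and returning NaN for a zero denominator is a design choice made here so that "no data" is not confused with a ratio of zero.

```python
import math


class RatioGauge:
    """Gauge whose value is numerator / denominator, read on demand."""

    def __init__(self, numerator, denominator):
        # Both arguments are zero-argument callables returning a number.
        self._numerator = numerator
        self._denominator = denominator

    def value(self):
        denominator = self._denominator()
        # Report NaN instead of raising when there is nothing to divide by.
        if denominator == 0 or math.isnan(denominator):
            return math.nan
        return self._numerator() / denominator


# Hypothetical counters, e.g. cache hits and cache requests.
counts = {"hits": 30, "requests": 40}
hit_rate = RatioGauge(lambda: counts["hits"], lambda: counts["requests"])
```

Calling `hit_rate.value()` reads both counters at that moment, so the gauge always reflects the current ratio.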
Metric Types
Histograms.
The distribution of values in a stream of data.
Histograms
Quantiles (e.g. 75th, 95th) calculated using reservoir sampling.
(Check docs.)
Histograms
The default is an Exponentially Decaying Reservoir: (roughly) the last five
minutes of data, weighted exponentially towards newer data.
(Check docs.)
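To make reservoir sampling concrete, here is a minimal Python sketch of quantiles computed from a fixed-size sample. It assumes a uniform reservoir (Vitter's Algorithm R), which is simpler than the exponentially decaying default described above and does not favour newer data. The class name, the reservoir size, and the nearest-rank quantile rule are illustrative choices, not the library's implementation.

```python
import random


class UniformReservoir:
    """Fixed-size uniform random sample of a stream (Algorithm R)."""

    def __init__(self, size=1028, seed=None):
        self.size = size          # arbitrary illustrative sample size
        self.count = 0            # total values seen so far
        self.values = []          # the current sample
        self._rng = random.Random(seed)

    def update(self, value):
        self.count += 1
        if len(self.values) < self.size:
            # Still filling the reservoir: keep everything.
            self.values.append(value)
        else:
            # Keep the new value with probability size / count,
            # overwriting a uniformly chosen existing slot.
            slot = self._rng.randrange(self.count)
            if slot < self.size:
                self.values[slot] = value

    def quantile(self, q):
        """Nearest-rank quantile of the sampled values, q in [0, 1]."""
        if not self.values:
            return 0.0
        ordered = sorted(self.values)
        index = min(len(ordered) - 1, int(q * len(ordered)))
        return ordered[index]
```

Memory use stays bounded by the reservoir size however many values are recorded, which is why quantiles from a histogram are estimates rather than exact figures.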
Metric Types
Meter
Measures the per-second rate at which a set of events occurs.
Meter
Three different exponentially-weighted moving average rates: 1, 5, and 15 minutes
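The moving average rates can be sketched as follows. This Python sketch assumes a fixed five-second tick driven by the caller and the usual smoothing factor alpha = 1 - exp(-tick / window); the class and method names are illustrative, not the Metrics API.

```python
import math


class Ewma:
    """Exponentially weighted moving average of an event rate."""

    TICK_SECONDS = 5.0  # assumed interval between tick() calls

    def __init__(self, window_minutes):
        self.alpha = 1.0 - math.exp(-self.TICK_SECONDS / (60.0 * window_minutes))
        self.rate = 0.0           # events per second
        self._uncounted = 0       # events marked since the last tick
        self._initialized = False

    def mark(self, n=1):
        """Record n events."""
        self._uncounted += n

    def tick(self):
        """Fold the events since the last tick into the moving average."""
        instant_rate = self._uncounted / self.TICK_SECONDS
        self._uncounted = 0
        if self._initialized:
            self.rate += self.alpha * (instant_rate - self.rate)
        else:
            # The first tick seeds the average with the observed rate.
            self.rate = instant_rate
            self._initialized = True


one_minute, fifteen_minute = Ewma(1), Ewma(15)
```

After a burst of events followed by silence, the one minute rate falls faster than the fifteen minute rate, which is why the short window suits alerting and the long window suits trends.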
Metric Types
Timer.
A histogram of the duration of events and a meter of their rate.
Reporting
Reporters run in the Cassandra process, pushing
metrics to external services.
Reporters
ConsoleReporter, GraphiteReporter, InfluxDBReporter, RiemannReporter,
…
Reporters In Cassandra
Configuration file:
metrics-reporter-config-sample.yaml
Reporters In Cassandra
graphite:
  - period: 10
    timeunit: 'SECONDS'
    prefix: 'cassandra.prod.ip_1_2_3_4.'
    hosts:
      - host: '1.2.3.4'
        port: 2003
    predicate:
      color: "white"
      useQualifiedName: true
      patterns:
        - "^org.apache.cassandra.metrics.+"
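As a rough illustration of what this configuration asks for, the Python sketch below filters metric names against the pattern and formats one line of Graphite's plaintext protocol (`path value timestamp`). It assumes that a "white" predicate means only matching names are reported and that the prefix is simply prepended to the metric name; the exact matching and joining rules belong to the reporter and may differ. The function names are hypothetical.

```python
import re
import time

# Values copied from the configuration above.
PREFIX = "cassandra.prod.ip_1_2_3_4."
PATTERNS = [re.compile(r"^org.apache.cassandra.metrics.+")]


def is_reported(name, patterns=PATTERNS):
    """Assumed whitelist behaviour: report a metric if any pattern matches."""
    return any(pattern.search(name) for pattern in patterns)


def graphite_line(name, value, timestamp=None, prefix=PREFIX):
    """Format one metric as a Graphite plaintext protocol line."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{prefix}{name} {value} {timestamp}\n"
```

A line produced this way could be written to the host and port from the configuration over a plain TCP socket.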
metrics-reporter-config
Configures Metrics reporters.
github.com/addthis/metrics-reporter-config
metrics-reporter-config
Supports:
Ganglia, Graphite, Riemann
JMX
Cassandra creates JMX MBeans for each Metric.
JMX
Reporters
Reporters may change the name of measures, e.g. 95thPercentile == p95
Metrics
Monitoring & Alerting
Insights
Monitoring and Alerting
Use what you like and what works for you.
Monitoring Platforms
OpsCenter, Grafana & Graphite, Datadog, Riemann
Metrics
Monitoring & Alerting
Insights
Names ?
All under
org.apache.cassandra.metrics
Scale ?
Latency? microseconds
Rates? per second
Data? bytes
Percentiles ? 75thPercentile 95thPercentile 99thPercentile
Rates ? OneMinuteRate
Request Throughput - All Requests
ClientRequest.
$REQUEST.Latency.1MinuteRate
CASRead, CASWrite, RangeSlice, Read, ViewWrite, Write
A Note On Requests
We will focus on: Read, Write
But there are others: CAS*, RangeSlice, ViewWrite
Request Throughput - Per Table
Table.$KEYSPACE.$TABLE.
ReadLatency.1MinuteRate
WriteLatency.1MinuteRate
Request Latency - All Requests
ClientRequest.
Write.Latency.95percentile
Read.Latency.95percentile
Request Latency - Per Table
Table.$KEYSPACE.$TABLE.
CoordinatorReadLatency.95percentile
Local Latency - Per Table
Table.$KEYSPACE.$TABLE.
WriteLatency.95percentile
ReadLatency.95percentile
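The `$KEYSPACE` and `$TABLE` placeholders have to be filled in for every table you want on a dashboard. A small Python sketch of that expansion follows; it assumes the names sit under `org.apache.cassandra.metrics.` as noted earlier, and the helper name and the keyspace and table names in the usage line are hypothetical.

```python
from string import Template

ROOT = "org.apache.cassandra.metrics."

# Per-table local latency names, written as on the slide.
LOCAL_LATENCY = [
    "Table.$KEYSPACE.$TABLE.WriteLatency.95percentile",
    "Table.$KEYSPACE.$TABLE.ReadLatency.95percentile",
]


def expand(templates, tables, root=ROOT):
    """Return the full metric name for every (keyspace, table) pair."""
    names = []
    for keyspace, table in tables:
        for template in templates:
            suffix = Template(template).substitute(KEYSPACE=keyspace, TABLE=table)
            names.append(root + suffix)
    return names


# Hypothetical keyspace and table.
names = expand(LOCAL_LATENCY, [("shop", "orders")])
```

The reporter may rename the final measure (for example `p95`), so check the names your monitoring platform actually receives before building on them.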
Local Read Path
Table.$KEYSPACE.$TABLE.
KeyCacheHitRate.value
BloomFilterFalseRatio.value
LiveScannedHistogram.95percentile
TombstoneScannedHistogram.95percentile
SSTablesPerReadHistogram.95percentile
Memory Usage
Table.$KEYSPACE.$TABLE.
BloomFilterOffHeapMemoryUsed.value
IndexSummaryOffHeapMemoryUsed.value
MemtableOnHeapSize.value
MemtableOffHeapSize.value
Clients
Client.connectedNativeClients.value
CQL.PreparedStatementsRatio.value
CQL.PreparedStatementsEvicted.value
Client Errors
ClientRequest.
$REQUEST.Unavailables.1MinuteRate
$REQUEST.Timeouts.1MinuteRate
$REQUEST.Failures.1MinuteRate
Inconsistency
Storage.TotalHints.count
HintedHandOffManager.Hints_created-$IP_ADDRESS.count
Connection.TotalTimeouts.1MinuteRate
Connection.$IP_ADDRESS.Timeouts.1MinuteRate
Inconsistency
Will also want to monitor dropped messages, later…
Eventual Consistency
ReadRepair.Attempted.1MinuteRate
ReadRepair.RepairedBackground.1MinuteRate
ReadRepair.RepairedBlocking.1MinuteRate
Server Errors
Storage.Exceptions.count
Disk Usage
Storage.Load.count
Table.$KEYSPACE.$TABLE.TotalDiskSpaceUsed.count
Compactions
Compaction.PendingTasks.value
Compaction.TotalCompactionsCompleted.1MinuteRate
Table.$KEYSPACE.$TABLE.PendingCompactions.value
Thread Pool Performance
ThreadPools.request.
MutationStage.PendingTasks.value
ReadStage.PendingTasks.value
CounterMutationStage.PendingTasks.value
RequestResponseStage.PendingTasks.value
ViewMutationStage.PendingTasks.value
Thread Pool Performance
DroppedMessage.
MUTATION.Dropped.1MinuteRate
READ.Dropped.1MinuteRate
Thread Pool Performance
DroppedMessage.
$VERB.InternalDroppedLatency.95thPercentile
$VERB.CrossNodeDroppedLatency.95thPercentile
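Dropped message rates are a natural candidate for an alert. The Python sketch below shows one possible rule over a snapshot of metric values; the threshold of zero, the function name, and the sample readings are assumptions for illustration, and a real rule would live in whichever monitoring platform you use.

```python
def dropped_message_alerts(rates, threshold=0.0):
    """Return the DroppedMessage metric names whose rate exceeds the threshold."""
    return sorted(
        name
        for name, rate in rates.items()
        if name.startswith("DroppedMessage.") and rate > threshold
    )


# Hypothetical snapshot of one-minute rates, in events per second.
snapshot = {
    "DroppedMessage.MUTATION.Dropped.1MinuteRate": 0.4,
    "DroppedMessage.READ.Dropped.1MinuteRate": 0.0,
    "ClientRequest.Read.Timeouts.1MinuteRate": 2.0,
}
alerts = dropped_message_alerts(snapshot)
```

Here only the MUTATION rate would raise an alert, since the READ rate is zero and the timeout metric is not a dropped message metric.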
Commit Log Performance
CommitLog.
PendingTasks.Value
WaitingOnSegmentAllocation.95thPercentile
WaitingOnCommit.Value
Thanks.
Aaron Morton @aaronmorton
Co-Founder & Principal Consultant
www.thelastpickle.com