Always On: Building Highly Available Applications on Cassandra Robbie Strickland

Building Highly Available Apps on Cassandra (Robbie Strickland, Weather Company) | C* Summit 2016


Who Am I?

Robbie Strickland
VP, Software [email protected]
@rs_atl
An IBM Business

• Contributor to C* community since 2010
• DataStax MVP 2014/15/16
• Author, Cassandra High Availability & Cassandra 3.x High Availability
• Founder, ATL Cassandra User Group

What is HA?
• Three nines – 99.9% uptime?
– Roughly 9 hours of downtime per year
– … or a full work day of downtime!
• Can we do better?
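The downtime budget behind an availability target is plain arithmetic, so it is worth checking rather than quoting from memory. The sketch below assumes an average year of 365.25 days; the function name is made up for this example.

```python
HOURS_PER_YEAR = 365.25 * 24  # average year, including leap years (assumption)

def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly downtime allowed by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)

for target in (99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(target)
    print(f"{target}% uptime allows {hours:.2f} h/year ({hours * 60:.1f} min)")
```

Three nines works out to a little under 9 hours per year, while five nines leaves only a few minutes.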

Cassandra + HA
• No SPOF
• Multi-DC replication
• Incremental backups
• Client-side failure handling
• Server-side failure handling
• Lots of JMX stats

HA by Design (it’s not an add-on)
• Properly designed topology
• Data model that respects C* architecture
• Application that handles failure
• Monitoring strategy with early warning
• DevOps mentality

Table Stakes
• NetworkTopologyStrategy
• GossipingPropertyFileSnitch
– Or [YourCloud]Snitch
• At least 5 nodes
• RF=3
• No load balancer

HA Topology

Consistency Basics
• Start with LOCAL_QUORUM reads & writes
– Balances performance & availability, and provides full consistency within a single DC
– Experiment with eventual consistency (e.g. CL=ONE) in a controlled environment
• Avoid non-local CLs in multi-DC environments
– Otherwise it’s a crap shoot
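Why LOCAL_QUORUM reads and writes give full consistency inside one DC comes down to overlap: a quorum is a majority of the replicas, and any read majority must share at least one replica with any write majority. A minimal sketch of that arithmetic, with helper names invented for illustration:

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: a majority of RF."""
    return replication_factor // 2 + 1

def reads_see_latest_write(rf: int, read_replicas: int, write_replicas: int) -> bool:
    """Read and write replica sets are guaranteed to overlap when R + W > RF."""
    return read_replicas + write_replicas > rf

rf = 3
q = quorum(rf)
print("replicas needed for LOCAL_QUORUM:", q)
print("replica failures tolerated per DC:", rf - q)
print("quorum reads + quorum writes overlap:", reads_see_latest_write(rf, q, q))
print("ONE reads + ONE writes overlap:", reads_see_latest_write(rf, 1, 1))
```

With RF=3 a quorum is 2 replicas, so one replica per DC can be down without failing requests.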

Rack Failure
• Don’t put all your nodes in one rack!
• Use rack awareness
– Places replicas in different racks
• But don’t use RackInferringSnitch

Rack Awareness
[Diagram: replicas R1, R2 and R3 spread across Rack A and Rack B]

GossipingPropertyFileSnitch, configured in cassandra-rackdc.properties

Nodes in Rack A:
dc=dc1
rack=a

Nodes in Rack B:
dc=dc1
rack=b
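To make "places replicas in different racks" concrete, here is a simplified model of rack-aware placement. It is a sketch of the idea only, not Cassandra's actual NetworkTopologyStrategy code, and the node and rack names are made up.

```python
from typing import List, Tuple

Node = Tuple[str, str]  # (node name, rack name); hypothetical values

def place_replicas(ring: List[Node], start: int, rf: int) -> List[Node]:
    """Walk the ring clockwise from `start`, preferring racks not yet used."""
    chosen: List[Node] = []
    skipped: List[Node] = []
    racks_used = set()
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)]
        if node[1] not in racks_used:
            chosen.append(node)
            racks_used.add(node[1])
        else:
            skipped.append(node)  # same rack as an earlier replica
        if len(chosen) == rf:
            return chosen
    # Fewer racks than replicas: fill up with skipped nodes in ring order.
    return chosen + skipped[: rf - len(chosen)]

ring = [("n1", "a"), ("n2", "a"), ("n3", "b"), ("n4", "a"), ("n5", "b")]
replicas = place_replicas(ring, start=0, rf=3)
print(replicas)
```

With two racks and RF=3, the walk takes one node from each rack first and only then reuses a rack, so losing a single rack never removes all three replicas.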

Rack Awareness (Cloud Edition)
[Diagram: replicas R1, R2 and R3 spread across Availability Zone A and Availability Zone B]

[YourCloud]Snitch (it’s automagic!)

Data Center Replication
[Diagram: two data centers, dc=us-1 and dc=eu-1]

CREATE KEYSPACE myKeyspace
WITH REPLICATION = {
  'class': 'NetworkTopologyStrategy',
  'us-1': 3,
  'eu-1': 3
};

Multi-DC Consistency?
Assumption: LOCAL_QUORUM
[Diagram: two data centers, dc=us-1 and dc=eu-1]
• Within us-1: fully consistent
• Within eu-1: fully consistent
• Between us-1 and eu-1: eventually consistent

Multi-DC Routing with LOCAL CL
[Diagram: two client apps and the us-1 and eu-1 data centers]

Multi-DC Routing with non-LOCAL CL
[Diagram: two client apps and the us-1 and eu-1 data centers]

Multi-DC Routing
• Use DCAwareRoundRobinPolicy wrapped by TokenAwarePolicy
– This is the default
– Prefers local DC – chosen based on host distance and seed list
– BUT this can fail for logical DCs that are physically co-located, or for improperly defined seed lists!

Pro tip: be explicit!!

    import com.datastax.driver.core.policies.{DCAwareRoundRobinPolicy, TokenAwarePolicy}

    val localDC: String = ??? // get from config
    // Name the local DC explicitly instead of letting the driver infer it.
    val dcPolicy = new TokenAwarePolicy(
      DCAwareRoundRobinPolicy.builder()
        .withLocalDc(localDC)
        .build())

Handling DC Failure
• Make sure backup DC has sufficient capacity
– Don’t try to add capacity on the fly!
• Try to limit updates
– Avoids potential consistency issues on recovery
• Be careful with retry logic
– Isolate it to a single point in the stack
– Don’t DDoS yourself with retries!
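One way to keep retries from turning into a self-inflicted DDoS is to cap the number of attempts and spread them out with exponential backoff plus jitter, in a single wrapper that the rest of the stack calls. The sketch below is a generic illustration; the function names, delays and the flaky operation are all assumptions, and real code should retry only errors known to be retryable.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_retry(op: Callable[[], T], max_attempts: int = 3,
                    base_delay: float = 0.1, max_delay: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `op`, retrying failures with capped exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except Exception:  # broad on purpose for the sketch
            if attempt == max_attempts:
                raise  # give up and surface the error instead of retrying forever
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))  # jitter keeps clients from retrying in lockstep

# Usage with a hypothetical operation that fails twice, then succeeds.
calls = {"n": 0}

def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated timeout")
    return "ok"

result = call_with_retry(flaky, sleep=lambda seconds: None)
print(result, "after", calls["n"], "attempts")
```

Keeping this wrapper at one layer matters: if the driver, the data-access layer and the HTTP handler each retry three times, a single failure fans out into 27 requests.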

Topology Lessons
• Leverage rack awareness
• Use LOCAL_QUORUM
– Full local consistency
– Eventual consistency across DCs
• Run incremental repairs to maintain inter-DC consistency
• Explicitly route local app to local C* DC
• Plan for DC failure

Data Modeling

Quick Primer
• C* is a distributed hash table
– Partition key (first field in PK declaration) determines placement in the cluster
– Efficient queries MUST know the key!
• Data for a given partition is naturally sorted based on clustering columns
• Column range scans are efficient
• All writes are immutable
– Deletes create tombstones
– Updates do not immediately purge old data
– Compaction has to sort all this out

Who Cares?
• Bad performance = application downtime & lost users
• Lagging compaction is an operations nightmare
• Some models & query patterns create serious availability problems

Do
• Choose a partition key that distributes evenly
• Model your data based on common read patterns
• Denormalize using collections & materialized views
• Use efficient single-partition range queries

Don’t
• Create hot spots in either data or traffic patterns
• Build a relational data model
• Create an application-side join
• Run multi-node queries
• Use batches to group unrelated writes

Problem Case #1

SELECT *
FROM contacts
WHERE id IN (1,3,5,7)

[Diagram: a client querying a six-node cluster in which each key is stored on three nodes; the nodes hold keys {1,2,5,6}, {2,4,7,8}, {3,6,7,8}, {1,2,3,5}, {4,5,7,8} and {1,3,4,6}]

Must ask 4 out of 6 nodes in the cluster to satisfy quorum!

[Diagram: the same cluster with two nodes marked X as down]

“Not enough replicas available for query at consistency LOCAL_QUORUM”

1,3,5 all have sufficient replicas, yet the entire query fails because of 7

Solution #1
• Option 1: Be optimistic and run it anyway
– If it fails, you can fall back to option 2
• Option 2: Run parallel queries for each key
– Return the results that are available
– Fall back to CL ONE for failed keys
– Client token awareness means coordinator does less work
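Option 2 can be sketched without a real cluster. The simulation below uses the replica placement as read from the Problem Case #1 diagram (an interpretation of the extracted figure) and assumes that the two failed nodes are ones holding key 7. The `read` function is a stand-in for a single-partition driver query, not a real driver call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Set

# Keys held by each of the six nodes; every key lives on three nodes (RF=3).
NODES: List[Set[int]] = [{1, 2, 5, 6}, {2, 4, 7, 8}, {3, 6, 7, 8},
                         {1, 2, 3, 5}, {4, 5, 7, 8}, {1, 3, 4, 6}]
DOWN = {1, 2}  # indexes of failed nodes (assumed); both hold key 7
REQUIRED = {"LOCAL_QUORUM": 2, "ONE": 1}  # replicas needed at RF=3

class Unavailable(Exception):
    pass

def read(key: int, cl: str) -> str:
    """Stand-in for a single-partition query at the given consistency level."""
    alive = sum(1 for i, keys in enumerate(NODES) if key in keys and i not in DOWN)
    if alive < REQUIRED[cl]:
        raise Unavailable(f"key {key}: {alive} replica(s) alive, {cl} needs {REQUIRED[cl]}")
    return f"row-{key}@{cl}"

def read_with_fallback(key: int) -> Optional[str]:
    try:
        return read(key, "LOCAL_QUORUM")
    except Unavailable:
        try:
            return read(key, "ONE")  # possibly stale, but available
        except Unavailable:
            return None  # no replica reachable: report the key as missing

def multi_get(keys: List[int]) -> Dict[int, Optional[str]]:
    """Query each key independently so one bad key cannot fail the rest."""
    with ThreadPoolExecutor(max_workers=max(1, len(keys))) as pool:
        return dict(zip(keys, pool.map(read_with_fallback, keys)))

rows = multi_get([1, 3, 5, 7])
print(rows)
```

Keys 1, 3 and 5 are served at LOCAL_QUORUM and only key 7 is downgraded to ONE, whereas the single IN query would have failed as a whole.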

Problem Case #2

CREATE INDEX ON contacts(birth_year)

SELECT *
FROM contacts
WHERE birth_year=1975

[Diagram: a client querying a six-node cluster; each node holds its own local index entries for 1975: (Jim, Sue), (Sam, Jim), (Sue, Tim), (Tim, Jim), (Sue, Sam), (Sam, Tim)]

Index lives with the source data… so 5 nodes must be queried!

[Diagram: the same cluster with two nodes marked X as down]

“Not enough replicas available for query at consistency LOCAL_QUORUM”

Solution #2
• Option 1: Build your own index
– App has to maintain the index
• Option 2: Use a materialized view
– Not available before 3.0
• Option 3: Run it anyway
– Ok for small amounts of data (think 10s to 100s of rows) that can live in memory
– Good for parallel analytics jobs (Spark, Hadoop, etc.)

Problem Case #3

CREATE TABLE sensor_readings (
  sensorID uuid,
  timestamp int,
  reading decimal,
  PRIMARY KEY (sensorID, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);

• Partition will grow unbounded
– i.e. it creates wide rows
• Unsustainable number of columns in each partition
• No way to archive off old data

Solution #3

CREATE TABLE sensor_readings (
  sensorID uuid,
  time_bucket int,
  timestamp int,
  reading decimal,
  PRIMARY KEY ((sensorID, time_bucket), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
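The slide does not say how time_bucket is computed. One common choice, assumed here, is a daily bucket derived from the reading's timestamp, so each sensor gets one bounded partition per day and a range query visits one partition per bucket. The sketch treats timestamp as Unix seconds in UTC, which is also an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import List

def time_bucket(ts: int) -> int:
    """Map a Unix timestamp (seconds, UTC) to a daily bucket such as 20160907."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc)
    return day.year * 10000 + day.month * 100 + day.day

def buckets_between(start_ts: int, end_ts: int) -> List[int]:
    """Buckets a time-range query must visit, newest first (one partition each)."""
    day = datetime.fromtimestamp(start_ts, tz=timezone.utc).date()
    last = datetime.fromtimestamp(end_ts, tz=timezone.utc).date()
    buckets = []
    while day <= last:
        buckets.append(day.year * 10000 + day.month * 100 + day.day)
        day += timedelta(days=1)
    return buckets[::-1]

start = int(datetime(2016, 9, 6, 22, 0, tzinfo=timezone.utc).timestamp())
end = int(datetime(2016, 9, 8, 1, 0, tzinfo=timezone.utc).timestamp())
print(time_bucket(start))
print(buckets_between(start, end))
```

The bucket width is a trade-off: narrower buckets keep partitions small but make range reads touch more partitions, and whole buckets can be archived or dropped once they age out.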

Monitoring

Monitoring Basics
• Enable remote JMX
• Connect a stats collector (jmxtrans, collectd, etc.)
• Use nodetool for quick single-node queries
• C* tells you pretty much everything via JMX

Thread Pools
• C* is a SEDA architecture
– Essentially message queues feeding thread pools
– nodetool tpstats
• Pending messages are bad:

Pool Name              Active   Pending   Completed   Blocked   All time blocked
CounterMutationStage        0         0           0         0                  0
ReadStage                   0         0         103         0                  0
RequestResponseStage        0         0           0         0                  0
MutationStage               0  13234794           0         0                  0
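A stats collector can turn this table into an early warning. The sketch below parses text shaped like the tpstats sample from the slide and flags pools with a backlog; the alert threshold is an arbitrary assumption, and the parser relies only on the six-column layout shown here.

```python
from typing import Dict

SAMPLE = """\
Pool Name              Active   Pending   Completed   Blocked   All time blocked
CounterMutationStage        0         0           0         0                  0
ReadStage                   0         0         103         0                  0
RequestResponseStage        0         0           0         0                  0
MutationStage               0  13234794           0         0                  0
"""

def pending_by_pool(tpstats_output: str) -> Dict[str, int]:
    """Return {pool name: pending count} from tpstats-style text."""
    pending = {}
    for line in tpstats_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Expect: name, active, pending, completed, blocked, all-time blocked.
        if len(fields) >= 6 and fields[2].isdigit():
            pending[fields[0]] = int(fields[2])
    return pending

THRESHOLD = 1000  # arbitrary alert threshold for this example
backlog = {pool: n for pool, n in pending_by_pool(SAMPLE).items() if n > THRESHOLD}
print(backlog)
```

On the sample, only MutationStage is reported, which points at writes arriving faster than the node can apply them.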

Lagging Compaction
• Lagging compaction is the reason for many performance issues
• Reads can grind to a halt in the worst case
• Use nodetool tablestats/cfstats & compactionstats
• Size-Tiered: watch for high SSTable counts:

Keyspace: my_keyspace
Read Count: 11207
Read Latency: 0.047931114482020164 ms.
Write Count: 17598
Write Latency: 0.053502954881236506 ms.
Pending Flushes: 0
Table: my_table
SSTable count: 84

• Leveled: watch for SSTables remaining in L0:

Keyspace: my_keyspace
Read Count: 11207
Read Latency: 0.047931114482020164 ms.
Write Count: 17598
Write Latency: 0.053502954881236506 ms.
Pending Flushes: 0
Table: my_table
SSTable Count: 70
SSTables in each level: [50/4, 15/10, 5/100]

50 in L0 (should be 4)
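The "SSTables in each level" line is easy to check automatically. This sketch parses entries written as count/limit, as in the sample above, and reports the levels holding more SSTables than their limit. It assumes only that format; entries without a "/limit" part are skipped.

```python
import re
from typing import List, Tuple

def parse_levels(line: str) -> List[Tuple[int, int, int]]:
    """Return (level, sstables, limit) for entries written as count/limit."""
    inside = line[line.index("[") + 1 : line.index("]")]
    result = []
    for level, entry in enumerate(inside.split(",")):
        match = re.fullmatch(r"\s*(\d+)/(\d+)\s*", entry)
        if match:  # plain-number entries carry no limit and are skipped
            result.append((level, int(match.group(1)), int(match.group(2))))
    return result

line = "SSTables in each level: [50/4, 15/10, 5/100]"
over_limit = [(lvl, n, cap) for lvl, n, cap in parse_levels(line) if n > cap]
for lvl, n, cap in over_limit:
    print(f"L{lvl}: {n} SSTables (limit {cap})")
```

On the sample this flags L0 (50 against a limit of 4) and L1 (15 against 10), which is the backlog pattern the slide warns about.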

Lagging Compaction Solution
• Triage:
– Check stats history to see if it’s a trend or a blip
– Increase compaction throughput using nodetool setcompactionthroughput
– Temporarily switch to SizeTiered
• Do some digging:
– I/O problem?
– Add nodes?

Wide Rows / Hotspots
• Only takes one to wreak havoc
• It’s a data model problem
• Early detection is key!
• Watch partition max bytes
– Make sure it doesn’t grow unbounded
– … or become significantly larger than mean bytes
• Use nodetool toppartitions to sample reads/writes and find the offending partition
• Take action early to avoid OOM issues with:
– Compaction
– Streaming
– Reads
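The "max versus mean" check is simple enough to automate once the two partition-size figures are being collected. A minimal sketch follows; both thresholds are assumptions chosen for illustration, not recommended values.

```python
def partition_skew(max_bytes: int, mean_bytes: int) -> float:
    """Ratio of the largest partition to the mean partition size."""
    return max_bytes / mean_bytes if mean_bytes > 0 else 0.0

def needs_attention(max_bytes: int, mean_bytes: int,
                    max_limit: int = 100 * 1024 * 1024,  # assumed absolute ceiling
                    skew_limit: float = 50.0) -> bool:   # assumed max/mean ratio
    """Flag a table whose largest partition is too big or far above the mean."""
    return max_bytes > max_limit or partition_skew(max_bytes, mean_bytes) > skew_limit

print(needs_attention(500 * 1024 * 1024, 2048))  # one huge partition
print(needs_attention(4096, 2048))               # evenly sized partitions
```

Tracking the ratio over time, rather than a single reading, is what shows whether a partition is growing without bound.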

For More Info…
(shameless book plug)

Thanks!

Robbie Strickland
[email protected]
@rs_atl
An IBM Business