Manage your compactions before they manage you!


About Pythian

• 18 years of data infrastructure management consulting
• 200+ top brands
• 6000+ databases under management
• Over 300 DBAs in 29 countries
• Top 5% of the DBA workforce
• 9 Oracle ACEs, 2 Microsoft MVPs, 1 Cassandra MVP
• Oracle, Microsoft, MySQL, DataStax partners
• Netezza, Hadoop and MongoDB, plus UNIX sysadmin and Oracle apps

© 2015. All Rights Reserved.


About Me

• Cassandra consultant
  – First contact was version 0.8
• Cassandra MVP & DataStax Certified Architect
• Lisbon Cassandra Meetup
• Passion for distributed systems
• Loves a good challenge
• Water polo is my sport
• @cjrolo


1 Why Compact

2 Compaction strategies and tuning

3 General tuning

4 Take aways, compacted

5 Q&A


Compaction Impact

• High I/O usage
• Temporary increase in disk space usage
• High CPU usage
• Increased latency on operations!


Commonly heard

• "My system is compacting 100% of the time"
• "All disk I/O is used by compaction"
• "Compaction is far behind"
• "Hundreds or thousands of SSTables!"


Why do we compact?

• Quick recap on the write path
  – Commitlog -> Memtable -> SSTable
  – SSTables are immutable

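The recap can be made concrete with a toy Python model. It is a sketch, not Cassandra's implementation, and `ToyNode` and its methods are hypothetical names. A write goes to the commit log and the memtable; a flush freezes the memtable into a sorted, read-only SSTable. Because flushed SSTables are never modified, a key written again after a flush ends up in more than one SSTable.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass
class ToyNode:
    """Hypothetical model of the write path: commit log -> memtable -> SSTable."""

    commitlog: list = field(default_factory=list)  # append-only log, for durability
    memtable: dict = field(default_factory=dict)   # mutable, in memory
    sstables: list = field(default_factory=list)   # immutable, "on disk"

    def write(self, key, value, timestamp):
        # Append to the commit log first so the write survives a crash.
        self.commitlog.append((key, value, timestamp))
        # Then update the memtable; a newer write to the same key overwrites it here.
        self.memtable[key] = (value, timestamp)

    def flush(self):
        # Freeze the memtable into a sorted, read-only SSTable.
        self.sstables.append(MappingProxyType(dict(sorted(self.memtable.items()))))
        self.memtable = {}
        # In this toy model the flushed entries no longer need the commit log.
        self.commitlog.clear()


node = ToyNode()
node.write("user:1", "alice", timestamp=1)
node.flush()
node.write("user:1", "alicia", timestamp=2)  # update after the first flush
node.flush()
# The key now lives in two SSTables, which is exactly what compaction cleans up.
print(sum("user:1" in sstable for sstable in node.sstables))  # prints 2
```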


Why do we compact? (2)

• Tombstones
• Row duplicates
• Rows spread across multiple SSTables
• Consolidation of data is imperative

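A minimal sketch of what a compaction merge does with those inputs, assuming SSTables are plain dicts of `key -> (value, write_time)` and that the hypothetical `TOMBSTONE` marker stands for a delete. The newest write for each key wins, and tombstones are purged once they are older than a grace period (the role Cassandra's `gc_grace_seconds` table option plays). Real compaction only drops a tombstone when no SSTable outside the compaction could still hold older data for that key; this sketch skips that check, and times are plain seconds.

```python
TOMBSTONE = object()  # hypothetical marker for a deleted cell


def compact(sstables, now, gc_grace_seconds):
    """Merge SSTables (dicts of key -> (value, write_time)) into one sorted table."""
    merged = {}
    for sstable in sstables:
        for key, (value, ts) in sstable.items():
            # Last write wins: keep only the newest version of each key.
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
    return {
        key: (value, ts)
        for key, (value, ts) in sorted(merged.items())
        # Drop tombstones past the grace period; newer ones must stay so the
        # delete can still reach replicas that missed it.
        if not (value is TOMBSTONE and now - ts > gc_grace_seconds)
    }


old = {"a": ("v1", 10), "b": ("v1", 10)}
new = {"a": ("v2", 20), "b": (TOMBSTONE, 30), "c": (TOMBSTONE, 95)}
result = compact([old, new], now=100, gc_grace_seconds=50)
# "a": the newest value wins; "b": its tombstone is 70 s old, so it and the old value go;
# "c": its tombstone is only 5 s old, so it is kept.
```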


Size-tiered compaction

• The original compaction type!
• "This strategy triggers a minor compaction when there are a number of similar sized SSTables on disk as configured by the table subproperty, min_threshold. A minor compaction does not involve all the tables in a keyspace."
• When to use: write-once data, write-heavy scenarios, limited I/O available


Tuning size-tiered

• CQL properties:
  – bucket_high
  – bucket_low
  – cold_reads_to_omit
  – max_threshold
  – min_threshold
  – min_sstable_size

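How these properties interact can be sketched in Python. This is an approximation of the size-tiered bucketing idea, not Cassandra's code: sizes are in MB, the function names are hypothetical, the defaults shown (`bucket_low=0.5`, `bucket_high=1.5`, `min_sstable_size` of 50 MB, `min_threshold=4`, `max_threshold=32`) are the usual Cassandra 2.x defaults, and `cold_reads_to_omit` is not modelled. An SSTable joins a bucket when its size is between `bucket_low` and `bucket_high` times the bucket's average, or when both it and the bucket are below `min_sstable_size`.

```python
def stcs_buckets(sizes_mb, bucket_low=0.5, bucket_high=1.5, min_sstable_size_mb=50):
    """Group SSTable sizes into buckets of 'similar' size (sketch of the STCS idea)."""
    buckets = []  # each entry: [average_size, [member sizes]]
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg, members = bucket
            similar = bucket_low * avg < size < bucket_high * avg
            both_small = size < min_sstable_size_mb and avg < min_sstable_size_mb
            if similar or both_small:
                members.append(size)
                bucket[0] = sum(members) / len(members)  # keep the running average
                break
        else:
            buckets.append([size, [size]])  # no similar bucket: start a new tier
    return [members for _, members in buckets]


def compaction_candidates(buckets, min_threshold=4, max_threshold=32):
    """Buckets holding at least min_threshold SSTables, capped at max_threshold.

    Cassandra decides which SSTables to keep when a bucket is too big (for example
    by read hotness); this sketch simply keeps the smallest ones.
    """
    return [sorted(b)[:max_threshold] for b in buckets if len(b) >= min_threshold]


buckets = stcs_buckets([900, 10, 120, 20, 100, 12, 110, 30])
print(buckets)                         # [[10, 12, 20, 30], [100, 110, 120], [900]]
print(compaction_candidates(buckets))  # [[10, 12, 20, 30]]
```

The four small SSTables are grouped thanks to `min_sstable_size`, while the three SSTables of roughly 100 MB wait for a fourth before they become a compaction candidate.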


Levelled compaction

• Strategy appeared in Cassandra 1.0.
• "The leveled compaction strategy creates SSTables of a fixed, relatively small size (160 MB by default) that are grouped into levels. Within each level, SSTables are guaranteed to be non-overlapping. Each level (L0, L1, L2 and so on) is 10 times as large as the previous."
• When to use: random reads, latency-sensitive reads, frequently updated rows

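Why non-overlapping levels help latency-sensitive reads, as a sketch: the hypothetical `sstables_to_read` below takes each level as a sorted list of `(first_key, last_key, name)` ranges. L0 SSTables may overlap, so every one that covers the key is checked; from L1 up, a binary search finds at most one candidate per level. Cassandra also consults per-SSTable bloom filters, which this sketch leaves out.

```python
from bisect import bisect_right


def sstables_to_read(levels, key):
    """Names of the SSTables a read for `key` must consult.

    `levels` maps level number -> list of (first_key, last_key, name) tuples,
    sorted by first_key; ranges may overlap only in level 0.
    """
    hits = []
    for level, tables in sorted(levels.items()):
        if level == 0:
            # L0 may overlap: check every SSTable whose range covers the key.
            hits += [name for first, last, name in tables if first <= key <= last]
            continue
        # L1+: ranges are disjoint and sorted, so binary search on first_key.
        i = bisect_right([first for first, _, _ in tables], key) - 1
        if i >= 0 and key <= tables[i][1]:
            hits.append(tables[i][2])
    return hits


levels = {
    0: [("a", "z", "L0-1")],
    1: [("a", "f", "L1-1"), ("g", "m", "L1-2"), ("n", "z", "L1-3")],
    2: [("a", "c", "L2-1"), ("d", "h", "L2-2"), ("i", "p", "L2-3"), ("q", "z", "L2-4")],
}
print(sstables_to_read(levels, "k"))  # ['L0-1', 'L1-2', 'L2-3']
```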


Tuning levelled compaction


• CQL properties:
  – sstable_size_in_mb
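A back-of-the-envelope sketch of what `sstable_size_in_mb` changes, assuming level Ln (n ≥ 1) holds up to 10^n × `sstable_size_in_mb`, following the 10× rule quoted for levelled compaction, and ignoring L0. The helper name and the 100 GB data size are illustrative only.

```python
import math


def lcs_shape(data_mb, sstable_size_in_mb=160, fanout=10):
    """Return (levels_needed, sstable_count) for data_mb of data under LCS.

    Assumes level n >= 1 holds up to fanout**n * sstable_size_in_mb; L0 is ignored.
    """
    sstables = math.ceil(data_mb / sstable_size_in_mb)
    levels, capacity_mb = 0, 0
    while capacity_mb < data_mb:
        levels += 1
        capacity_mb += sstable_size_in_mb * fanout ** levels  # add the next level
    return levels, sstables


print(lcs_shape(100 * 1024))                        # (3, 640)
print(lcs_shape(100 * 1024, sstable_size_in_mb=5))  # (5, 20480)
```

With 160 MB SSTables, 100 GB fits in three levels and about 640 files; with 5 MB SSTables the same data needs five levels and more than 20,000 files, which is one route to "thousands of SSTables".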


Time-series compaction


• Available since Cassandra 2.0.11 and 2.1.1
• "DateTieredCompactionStrategy stores data written within a certain period of time in the same SSTable"
• When to use? Time series data!
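DTCS groups SSTables into time windows that grow with age. The sketch below is modelled on the DTCS window selection as we understand it from the 2.x code, so treat it as an approximation: the newest windows are `base` units wide, and once `min_threshold` windows of one size have been laid out, the next window is `min_threshold` times larger. `dtcs_windows` is a hypothetical name and the units are arbitrary; in Cassandra, `base` corresponds to `base_time_seconds` converted to the `timestamp_resolution` unit.

```python
def dtcs_windows(now, base, min_threshold, oldest):
    """Half-open [start, end) windows from the one containing `now` back to `oldest`.

    Windows are aligned to their own size; every min_threshold windows of one
    size are followed (going back in time) by a window min_threshold times larger.
    """
    if not 0 <= oldest <= now:
        raise ValueError("expected 0 <= oldest <= now")
    windows = []
    size, position = base, now // base
    while True:
        start = size * position
        windows.append((start, start + size))
        if start <= oldest:
            return windows
        if position % min_threshold:
            position -= 1  # another window of the same size
        else:
            # Step up: one window min_threshold times as wide, ending where we are.
            size, position = size * min_threshold, position // min_threshold - 1


print(dtcs_windows(now=100, base=10, min_threshold=4, oldest=0))
# [(100, 110), (90, 100), (80, 90), (40, 80), (0, 40)]
```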


Time-series tuning


• CQL properties:
  – base_time_seconds
  – max_sstable_age_days
  – max_threshold
  – min_threshold
  – timestamp_resolution
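`max_sstable_age_days` and `timestamp_resolution` work together: the age limit has to be converted into the same unit as the write timestamps before it can be compared with them. As we understand DTCS, SSTables whose newest write is older than that limit are left out of compaction. The hypothetical helper below sketches that comparison and shows why the resolution must match what clients actually send.

```python
SECONDS_PER_DAY = 86_400
# Timestamp units per second for the resolutions used in this sketch.
UNITS_PER_SECOND = {"SECONDS": 1, "MILLISECONDS": 1_000, "MICROSECONDS": 1_000_000}


def too_old_to_compact(sstable_max_ts, now_ts, max_sstable_age_days,
                       timestamp_resolution="MICROSECONDS"):
    """True if the SSTable's newest write is older than max_sstable_age_days.

    Both timestamps are assumed to be in the unit named by timestamp_resolution.
    """
    per_second = UNITS_PER_SECOND[timestamp_resolution]
    max_age = max_sstable_age_days * SECONDS_PER_DAY * per_second
    return sstable_max_ts < now_ts - max_age


day_ms = SECONDS_PER_DAY * 1_000
# An SSTable from ten days ago, with clients writing millisecond timestamps:
print(too_old_to_compact(0, 10 * day_ms, 1, "MILLISECONDS"))  # True
print(too_old_to_compact(0, 10 * day_ms, 1, "MICROSECONDS"))  # False: unit mismatch
```

If clients send millisecond timestamps while `timestamp_resolution` stays at `MICROSECONDS`, a one-day limit is effectively 1000 days, so old SSTables keep being compacted.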


DTCS Quirks...

• Out-of-order data
  – Hints
  – Clients not in sync
  – Repairs
  – Someone inserted out-of-order data...

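Why out-of-order writes hurt, as a sketch: assuming, as DTCS does, that an SSTable is placed in a time window by its oldest (minimum) write timestamp, a single old cell arriving through hints, a repair or a client with a skewed clock moves an otherwise fresh SSTable into an old window, where it gets compacted together with old data. The window list and the `window_for` helper are hypothetical.

```python
# Hypothetical DTCS-style windows, newest first, as half-open [start, end) ranges.
WINDOWS = [(100, 110), (90, 100), (80, 90), (40, 80), (0, 40)]


def window_for(write_timestamps, windows=WINDOWS):
    """Index of the window an SSTable lands in, keyed by its oldest write timestamp."""
    oldest = min(write_timestamps)
    for index, (start, end) in enumerate(windows):
        if start <= oldest < end:
            return index
    return None  # older than every window


fresh = [101, 105, 108]
print(window_for(fresh))        # 0: the newest window
print(window_for(fresh + [5]))  # 4: one late cell drags it into the oldest window
```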


Monitoring Compaction

• nodetool cfstats
• nodetool compactionstats
• CompactionManagerMBean:
  – CompletedTasks: number of completed compactions since the last start of this Cassandra instance
  – PendingTasks: estimated number of tasks remaining to perform
  – ColumnFamilyInProgress: the table currently being compacted
  – BytesTotalInProgress: total number of data bytes (index and filter are not included) being compacted
  – BytesCompacted: the progress of the current compaction
• strace
• iotop

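A small monitoring sketch in Python, assuming `nodetool` is on the PATH and that its `compactionstats` output contains a `pending tasks: N` line (check the format of your version). The alert threshold of 20 is an arbitrary example, and `compaction_progress` simply turns the BytesCompacted and BytesTotalInProgress values into a percentage; all function names are hypothetical.

```python
import re
import subprocess
from typing import Optional


def pending_compactions(output: str) -> int:
    """Extract the 'pending tasks' count from `nodetool compactionstats` text."""
    match = re.search(r"pending tasks:\s*(\d+)", output)
    if match is None:
        raise ValueError("no 'pending tasks' line found")
    return int(match.group(1))


def compaction_progress(bytes_compacted: int, bytes_total_in_progress: int) -> Optional[float]:
    """Percent done from the BytesCompacted / BytesTotalInProgress values."""
    if bytes_total_in_progress == 0:
        return None  # no compaction in progress
    return 100.0 * bytes_compacted / bytes_total_in_progress


def check_node(threshold: int = 20) -> int:
    """Run nodetool locally and warn when pending compactions exceed a threshold."""
    result = subprocess.run(["nodetool", "compactionstats"],
                            capture_output=True, text=True, check=True)
    pending = pending_compactions(result.stdout)
    if pending > threshold:
        print(f"WARNING: {pending} pending compactions (threshold {threshold})")
    return pending
```

Run `check_node()` periodically (for example from cron) on each node; a pending count that keeps growing is the "compaction is far behind" symptom.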


Disk Tuning

• Compaction means large I/O
• Big RAID stripes
• SSDs!!
• Dedicated non-striped disks
• No SAN/NAS
• The I/O scheduler can have some impact
• Some Linux settings can be used in emergencies

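A first step with the I/O scheduler is knowing which one is active. On Linux, sysfs exposes it per block device in `/sys/block/<device>/queue/scheduler`, with the selected entry in brackets (for example `noop deadline [cfq]`). This Python sketch reads and parses that file; the helper names are hypothetical and `sda` is a placeholder device name.

```python
from pathlib import Path
from typing import Optional


def active_scheduler(scheduler_line: str) -> Optional[str]:
    """Return the bracketed entry, e.g. 'noop deadline [cfq]' -> 'cfq'."""
    for name in scheduler_line.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None  # no bracketed entry found


def scheduler_for(device: str = "sda") -> Optional[str]:
    """Read the active scheduler for a block device on a Linux host."""
    text = Path("/sys/block", device, "queue", "scheduler").read_text()
    return active_scheduler(text)
```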


Take Aways, compacted

• What did we learn here...
  – Selecting the proper compaction strategy can improve your cluster performance
    • Doing the opposite can create serious issues...
  – Monitor your compactions!
  – You can try compaction strategies out without changing your tables!


Q&A

• Thanks for listening!
• Questions?


Thank you