DESCRIPTION
Presenter: Claudiu Barbura, Senior Director of Engineering at Atigeo

xPatterns is a big data analytics platform-as-a-service that enables rapid development of enterprise-grade analytical applications. It provides tools, API sets and a management console for building an ELT pipeline with data monitoring and quality gates; a data warehouse for ad-hoc and scheduled querying, analysis, model building and experimentation; tools for exporting data to Cassandra and SolrCloud clusters for real-time access through automatically generated low-latency, high-throughput APIs; and dashboard and visualization APIs and tools that leverage the available data and models.

In this talk I'll share some of the hard lessons we've learned in the past three years while leveraging Cassandra (and Hector) in large-scale enterprise-grade deployments. We will focus on three specific areas in which we identified consistent best practices and design patterns: data model optimization as a result of exporting data from HDFS/Hive/Shark into Cassandra through Spark/Hadoop MR jobs under Mesos, with throttling, instrumentation and resilience features; automatically publishing geo-replicated, instrumented and monitored REST APIs on top of the exported Cassandra data; and lessons learned from running Cassandra at scale from 0.6 to 2.0.6, including performance tuning, tips and tricks.

You will see live demos of our Publish to NoSql tools (Spark/Shark, Mesos, Hive, Cassandra), a dashboard application built on top of generated data APIs (D3.js, Cassandra) and xPatterns' monitoring and instrumentation consoles (Graphite, Ganglia, Nagios).
Cassandra in xPatterns
Cassandra Summit Sept 2014
Agenda
• xPatterns Architecture
• Export to NoSql API (Demo)
• Monitoring, instrumentation (Demo)
• xPatterns application (Demo)
• Data Modeling
• Lessons Learned since 0.6 till 2.0.6
Demos …
Lessons learned 0.6 - 2.0.6
• NTP: synchronize ALL clocks (servers and clients)
• Schema disagreement: lock cluster (Zk) before CF create/delete
• Reduce the number of CFs (avoid OOM … memtable_total_space_in_mb)
• Do not drop CFs before emptying them (truncate/compact first)
• Monitoring, instrumentation, automatic restarts
• ConsistencyLevel: ONE is best … for our use cases
• Key cache, Snappy (LZ4) compression, vnodes
Data Modeling
• Rows not too skinny and not too wide (avoid OOM)
  o Less memory pressure during high-throughput writes
  o Reduced network I/O, fewer rows, more column slices
  o Key cache & bloom filter index size affects perf
  o Efficient compaction, avoid hot spots
• Custom serialization and dynamic columns for maximum perf gain (40%)
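As a rough illustration of the two data modeling points above, the sketch below bounds row width by appending a bucket number to the row key, and packs several fields into one compact binary column value. All names, the 10,000-column cap and the record layout are assumptions made for this example; the slides do not describe the actual key scheme or serialization format used in xPatterns.

```python
import struct

MAX_COLUMNS_PER_ROW = 10_000  # assumed cap; tune per workload


def bucketed_row_key(entity_id: str, seq: int,
                     width: int = MAX_COLUMNS_PER_ROW) -> str:
    """Spread one logical entity over several physical rows of bounded width.

    Column number `seq` lands in bucket `seq // width`, so no single row
    ever holds more than `width` columns.
    """
    return f"{entity_id}:{seq // width}"


# Hypothetical record layout, big-endian with no padding:
# 8-byte timestamp (ms), 4-byte count, 8-byte floating-point score.
RECORD = struct.Struct(">qid")


def pack_record(ts_ms: int, count: int, score: float) -> bytes:
    """Serialize one record into a fixed-size binary column value."""
    return RECORD.pack(ts_ms, count, score)


def unpack_record(blob: bytes):
    """Inverse of pack_record; returns (ts_ms, count, score)."""
    return RECORD.unpack(blob)


print(bucketed_row_key("user42", 12_345))
print(len(pack_record(1_410_000_000_000, 7, 0.5)))
```

A fixed binary layout keeps each column value small and cheap to decode compared with a text encoding; whether it yields a gain comparable to the figure on the slide depends on the workload.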
Q & A
© 2013 Atigeo, LLC. All rights reserved. Atigeo and the xPatterns logo are trademarks of Atigeo. The information herein is for informational purposes only and represents the current view of Atigeo as of the date of this presentation. Because Atigeo must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Atigeo, and Atigeo cannot guarantee the accuracy of any information provided after the date of this presentation. ATIGEO MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.