Cassandra - Deep Dive

Presentation of the internal architecture and features of Cassandra, based on version 1.2


  • 1. Cassandra: A Decentralized Structured Storage System, by Sameera Nelson

  • 2. Outline
      Introduction
      Data Model
      System Architecture
      Failure Detection & Recovery
      Local Persistence
      Performance
      Statistics

  • 3. What is Cassandra?
      Distributed storage system
      Manages structured data
      Highly available, no SPoF
      Not a relational data model
      Handles high write throughput
      No impact on read efficiency

  • 4. Motivation
      Operational requirements at Facebook: performance, reliability / dealing with failures, efficiency, continuous growth
      Application: Inbox Search problem, Facebook

  • 5. Similar Work
      Google File System: distributed FS, single master/slave
      Ficus / Coda: distributed FS
      Farsite: distributed FS, no centralized server
      Bayou: distributed relational DB system
      Dynamo: distributed storage system

  • 6. Data Model

  • 7. Data Model (figure from Eben Hewitt's slides)

  • 8. Supported Operations
      insert(table; key; rowMutation)
      get(table; key; columnName)
      delete(table; key; columnName)

  • 9. Query Language
      CREATE TABLE users (
        user_id int PRIMARY KEY,
        fname text,
        lname text
      );
      INSERT INTO users (user_id, fname, lname) VALUES (1745, 'john', 'smith');
      SELECT * FROM users;

  • 10. Data Structure: Log-Structured Merge Tree

  • 11. System Architecture

  • 12. Architecture

  • 13. Fully Distributed: no single point of failure

  • 14. Cassandra Architecture
      Partitioning: data distribution across nodes
      Replication: data duplication across nodes
      Cluster membership: node management in the cluster (adding / deleting)

  • 15. Partitioning: the token ring

  • 16. Partitioning: partitions using consistent hashing

  • 17. Partitioning: assignment into the relevant partition

  • 18. Partitioning: vnodes

  • 19. Replication: based on the configured replication factor

  • 20. Replication: different replication policies
      Rack unaware
      Rack aware
      Data center aware

  • 21. Cluster Membership
      Based on Scuttlebutt
      Efficient gossip-based mechanism
      Inspired by real-life rumor spreading
      Anti-entropy protocol: repairs replicated data by comparing and reconciling differences

  • 22. Cluster Membership: gossip based

  • 23. Failure Detection & Recovery

  • 24. Failure Detection
      Track state: directly, indirectly
      Accrual detection mechanism
      Permanent node change: admin should explicitly add or remove
      Hints: data to be replayed in replication; saved in the system.hints table

  • 25. Accrual Failure Detector
      Node is faulty: the suspicion level Φ(t) increases monotonically until it crosses k, where k is a threshold variable
      Node is correct: Φ(t) = 0

  • 26. Local Persistence

  • 27. Write Request

  • 28. Write Operation

  • 29. Write Operation
      Logging data in the commit log / memtable
      Flushing data from the memtable: flushing data on threshold
      Storing data on disk in SSTables
      Mark with tombstone
      Compaction: removes deletes, sorts, merges data, consolidation

  • 30. Write Operation: compaction

  • 31. Read Request: direct / background (read repair)

  • 32. Read Operation

  • 33. Delete Operation
      Data not removed immediately
      Only a tombstone is written
      Deleted in the compaction process

  • 34. Additional Features
      Adding compression: Snappy compression
      Secondary index support
      SSL support: client/node, node/node
      Rolling commit logs
      SSTable data file merging

  • 35. Performance

  • 36. Performance
      High throughput & low latency
        Eliminating on-disk data modification
        Eliminating erase-block cycles
        No locking for concurrency control
        Maintaining integrity not required
      High availability
      Linear scalability
      Fault tolerant

  • 37. Statistics

  • 38. Stats from Netflix: linear scalability

  • 39. Stats from Netflix

  • 40. Some users

  • 41. Thank you

  • 42. Read Detailed Structure
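
Slides 15 to 19 describe partitioning on a token ring with consistent hashing, vnodes, and replication driven by a replication factor. The following is a minimal Python sketch of that idea, not Cassandra's actual implementation. The names `token_for`, `TokenRing`, and `replicas` are hypothetical, MD5 is used only as a convenient illustrative hash, and replica placement follows the simplest ("rack unaware") policy of walking clockwise around the ring.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Hash a key onto the ring (MD5 is an illustrative choice only)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Hypothetical token ring; each node owns several tokens (vnodes)."""

    def __init__(self, nodes, vnodes_per_node=8):
        # Sorted list of (token, node) pairs forms the ring.
        self._ring = sorted(
            (token_for(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._tokens = [token for token, _ in self._ring]
        self._node_count = len(set(nodes))

    def replicas(self, key, replication_factor=3):
        """Walk clockwise from the key's token, collecting distinct nodes."""
        wanted = min(replication_factor, self._node_count)
        # First token >= the key's token; wraps to index 0 past the end.
        start = bisect.bisect_left(self._tokens, token_for(key))
        owners = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == wanted:
                break
        return owners
```

For example, `TokenRing(["n1", "n2", "n3", "n4"]).replicas("user:1745")` returns three distinct node names, and the same key always maps to the same replicas as long as the node set does not change.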
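
Slide 25 describes the accrual failure detector as a suspicion level Φ(t) that grows while a node stays silent and is compared against a threshold k. A rough Python sketch under stated assumptions: heartbeat inter-arrival times are modelled as exponentially distributed, so Φ = -log10(P(no heartbeat yet)) = elapsed / (mean · ln 10). The class name, method names, window size, and the default threshold of 8.0 are illustrative choices, not taken from the slides.

```python
import math
from collections import deque


class AccrualFailureDetector:
    """Hypothetical phi-accrual detector for a single monitored node."""

    def __init__(self, threshold=8.0, window=1000):
        self.threshold = threshold               # k on the slide
        self._intervals = deque(maxlen=window)   # recent heartbeat gaps
        self._last_heartbeat = None

    def heartbeat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        if self._last_heartbeat is not None:
            self._intervals.append(now - self._last_heartbeat)
        self._last_heartbeat = now

    def phi(self, now):
        """Suspicion level; 0 right after a heartbeat, growing with silence."""
        if not self._intervals:
            return 0.0
        mean = sum(self._intervals) / len(self._intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self._last_heartbeat
        # Exponential model: P(later) = exp(-elapsed / mean); phi = -log10(P).
        return elapsed / (mean * math.log(10))

    def is_suspected(self, now):
        return self.phi(now) >= self.threshold
```

Because the detector emits a continuous value rather than a yes/no answer, the threshold can be tuned to trade detection speed against false positives.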
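
Slides 29 to 33 outline the local write path: log the mutation, update the memtable, flush to immutable SSTables on a threshold, write tombstones for deletes, and drop deleted data during compaction. The toy Python store below sketches that flow entirely in memory. `TinyLSMStore` and its methods are hypothetical names, the commit log is a plain list, and tombstones are purged at the first compaction, which is a simplification of what a real cluster would do.

```python
TOMBSTONE = object()  # marker written in place of a deleted value


class TinyLSMStore:
    """Hypothetical in-memory model of a log-structured merge write path."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.commit_log = []   # append-only record of unflushed mutations
        self.memtable = {}     # mutable, in memory
        self.sstables = []     # immutable sorted lists, oldest first

    def put(self, key, value):
        self.commit_log.append((key, value))   # log first
        self.memtable[key] = value             # then update the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def delete(self, key):
        # Deletes never modify existing data; they only add a tombstone.
        self.put(key, TOMBSTONE)

    def flush(self):
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}
            self.commit_log.clear()  # flushed data no longer needs the log

    def get(self, key):
        value = None
        if key in self.memtable:
            value = self.memtable[key]
        else:
            for table in reversed(self.sstables):  # newest SSTable wins
                found = dict(table)
                if key in found:
                    value = found[key]
                    break
        return None if value is TOMBSTONE else value

    def compact(self):
        merged = {}
        for table in self.sstables:  # oldest first, newer values overwrite
            merged.update(table)
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        self.sstables = [live] if live else []
```

Writes only ever append or replace in memory, which is the property slide 36 refers to as eliminating on-disk data modification.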