Copyright © 2014, Oracle and/or its affiliates. All rights reserved.
DILEEP KALIDINDI, 23rd February 2015
Explore, Build & Operate
NoSQL with
Apache Cassandra
Who am I?
Dileep Varma Kalidindi
Current: Senior Engineer @Responsys (since Apr’14), Circles Team.
Fascination: Problem solving, distributed & Big Data churning systems.
Past: 8+yrs with VeriSign, Informatica Labs, NTT Data.
Hobbies: Adventure sports.
Are we good?
Data
Data has never come in a single structure, and neither have its modelling techniques.
Applications evolved from OLAP, OLTP to Web, Mobile & Social.
Big Data comes with different characteristics – Volume, Velocity, Variety, Veracity & Value.
Responsys Data:
Need for better-suited data models and storage models - but why?
Impedance Mismatch – Data model & Storage model
The SQL relational model is user oriented
In-store concurrency, integrity, consistency and data type validity
Transactional guarantees, schemas and referential integrity
Purpose-built applications tend to control integrity and validity themselves (no fancy aggregation)
Difference between the persistent data model and the in-memory data structures.
Data duplication and denormalization are now first-class citizens!
Scale-up to scale-wide – NoSQL multi-node vs RDBMS clustering.
Conceptual – ACID, BASE & CAP
Transactions, consistency and availability – could we prioritize?
CAP theorem - consequences
Agenda
NoSQL
NoSQL implementations – for various purposes
Architecture fit – Polyglot persistence
Data modelling – concepts in view of NoSQL
Cassandra – Architecture
Database internals
CQL & demo
Installation, configuration & tools
Oracle NoSQL – pitch by Sheetal
# NoSQL
NoSQL
Non-relational, distributed, open-source & horizontally scalable #nxtGen
NoSQL is an accidental neologism.
Schema-less storage systems built for the 5 V's of Big Data.
Decentralized – Every node in cluster is identical
High availability – no SPoF (single point of failure), tolerant of network failures
Open source and No cost models (Except for enterprise support)
NoSQL – Architecture fit-in
Polyglot persistence thinking picks the right data store for each data set.
Service usage over direct data usage.
Concerns
Operational concerns like licensing, support, tools, upgrade, auditing.
Security of the datastore, contexts, authorization, etc.
Integration with ETL and data transfer utilities.
Deployment complexity
Data models – in view of NoSQL
NoSQL models are application specific “What questions do I have?”
Relational models are driven by structure of data “What answers do I have?”
Modelling techniques
Conceptual: Denormalization, Aggregates & Application-side joins
General: Atomic aggregates, Enumerable keys, Dimensionality reduction, Index table & Composite key index
Hierarchical: Tree aggregation, Materialized paths, Nested sets & batch graph processing
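One of the hierarchical techniques listed above, materialized paths, can be sketched in a few lines. This is only an illustration: the `catalog` data and the `subtree` helper are hypothetical, and a plain dictionary stands in for a key-ordered store.

```python
from bisect import bisect_left

# Hypothetical category tree: one entry per node, keyed by the full path
# from the root (the "materialized path").
catalog = {
    "electronics": "Electronics",
    "electronics/audio": "Audio",
    "electronics/audio/headphones": "Headphones",
    "electronics/video": "Video",
    "garden": "Garden",
}


def subtree(store, path):
    """Return the keys at or below `path` using one scan over sorted keys."""
    keys = sorted(store)          # a key-ordered store would keep these sorted
    prefix = path + "/"
    result = []
    for key in keys[bisect_left(keys, path):]:
        if key == path or key.startswith(prefix):
            result.append(key)
        elif key > prefix:
            break                 # past the contiguous range of descendants
    return result


print(subtree(catalog, "electronics/audio"))
```

Because every descendant shares the parent's path as a key prefix, a whole subtree is read with a single range scan instead of one lookup per level.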
Data models – deep view
Conceptual: Denormalization
Query data volume or IO per query vs total data volume
Processing complexity vs total data volume
Aggregates:
Simple Atomic
Tree aggregation:
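The denormalization trade-off above (less IO per query in exchange for more total data) can be illustrated with a small sketch. The `users` and `messages` records and all function names are made up for this example.

```python
# Normalized layout: two separate collections, joined in application code.
users = {1: {"name": "Asha"}, 2: {"name": "Ben"}}
messages = [
    {"id": 10, "user_id": 1, "text": "hello"},
    {"id": 11, "user_id": 2, "text": "hi"},
    {"id": 12, "user_id": 1, "text": "bye"},
]


def timeline_with_join(messages, users):
    """Application-side join: one extra user lookup per message at read time."""
    return [(users[m["user_id"]]["name"], m["text"]) for m in messages]


def denormalize(messages, users):
    """Copy the user name into every message once, at write time."""
    return [{**m, "user_name": users[m["user_id"]]["name"]} for m in messages]


def timeline_denormalized(rows):
    """Read from a single structure, with no second lookup."""
    return [(r["user_name"], r["text"]) for r in rows]


rows = denormalize(messages, users)
print(timeline_denormalized(rows) == timeline_with_join(messages, users))
```

Both reads return the same timeline, but the denormalized rows store the name three times instead of twice, which is the "total data volume" side of the trade.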
NoSQL - implementations
If one implementation fits all, then why not RDBMS?
Classification is driven from the application's point of view!
Key-Value
Strong aggregation which is opaque to the database
Oracle NoSQL, Windows Azure & Redis
Document database
Structure in the aggregate
MongoDB, CouchDB & RavenDB
NoSQL - implementations
Column family structures
Two-level aggregate structure: a key & a row aggregate; the row aggregate is a group of columns.
Bigtable, HBase & Cassandra
Graph databases
Neo4j
NoSQL – implementations – CAP fit
Apache Cassandra - Continuous availability, linear scalability & operational simplicity
About
Column store NoSQL database.
Originally developed by Facebook (2007) and now an Apache project
Masterless architecture with all nodes in a ring topology
Commercial add-ons & support (“enterprise edition”) by DataStax
Data center replication, Scalability (wide), Fault-tolerance & Tunable consistency.
Online load balancing, flexible schema, key-oriented queries & CAP-aware
Implementation of good security standards, operations, monitoring & utilities.
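Tunable consistency comes down to simple arithmetic: a read is guaranteed to overlap the latest acknowledged write when the replicas read plus the replicas written exceed the replication factor. The sketch below assumes that rule and covers only a subset of Cassandra's consistency levels; the function names are illustrative, not part of any driver API.

```python
def quorum(replication_factor):
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1


def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def read_sees_latest_write(write_level, read_level, replication_factor):
    """True when the read and write replica sets must share a replica (R + W > N)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


print(read_sees_latest_write("QUORUM", "QUORUM", 3))
print(read_sees_latest_write("ONE", "ONE", 3))
```

With a replication factor of 3, QUORUM writes plus QUORUM reads overlap (2 + 2 > 3), while ONE plus ONE does not, which is the eventual-consistency end of the dial.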
Cassandra – data model
Column – Key-value pair
Counter column
Expiring column
Super column
Column family – Collection of rows - Map <RowKeys, OrderedColumn Collection>
Dynamic (Wide)
Static (Narrow)
Keyspace – contains column families & super column families
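The "Map <RowKeys, OrderedColumn Collection>" view of a column family can be mimicked with an in-memory toy. This is a conceptual sketch only, not how Cassandra stores data; the `ColumnFamily` class and its `ttl`/`now` parameters are invented for illustration.

```python
import time


class ColumnFamily:
    """Toy column family: row key -> columns, returned ordered by column name."""

    def __init__(self):
        # row key -> {column name: (value, expiry timestamp or None)}
        self._rows = {}

    def insert(self, row_key, column, value, ttl=None, now=None):
        """Store a column; with `ttl` (seconds) it behaves like an expiring column."""
        now = time.time() if now is None else now
        expires_at = None if ttl is None else now + ttl
        self._rows.setdefault(row_key, {})[column] = (value, expires_at)

    def get_row(self, row_key, now=None):
        """Return the live (column, value) pairs of a row, sorted by column name."""
        now = time.time() if now is None else now
        columns = self._rows.get(row_key, {})
        live = {
            name: value
            for name, (value, expires_at) in columns.items()
            if expires_at is None or expires_at > now
        }
        return sorted(live.items())


cf = ColumnFamily()
cf.insert("user:1", "name", "Asha", now=0)
cf.insert("user:1", "session", "abc123", ttl=60, now=0)
cf.insert("user:2", "email", "ben@example.com", now=0)  # rows need not share columns
print(cf.get_row("user:1", now=30))
print(cf.get_row("user:1", now=120))
```

Each row carries its own set of columns, which is what makes wide (dynamic) rows possible, and the `session` column drops out of the second read once its time-to-live has passed.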
Cassandra – Architecture
CAP values – AP (Availability & Partition tolerance).
Consistency (eventual) – available, but with latency.
No row locking (HBase wins!)
Linear scaling of Cassandra – throughput vs number of nodes.
Cassandra cluster – Partitioner generates tokens for row keys
Write in action
Read in action
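How a partitioner maps row keys onto the ring can be sketched as follows. The assumptions are loud ones: the token function is an MD5 hash reduced to a small ring (loosely in the spirit of a hash-based partitioner, not Cassandra's actual token computation), node names and tokens are invented, and replicas are simply the next nodes clockwise.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 32  # assumed ring size, chosen only to keep numbers small


def token_for(row_key):
    """Hash a row key to a token on the ring (illustrative, MD5-based)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class Ring:
    def __init__(self, node_tokens):
        # node_tokens: node name -> token; each node owns the range ending at its token
        self._entries = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._entries]

    def replicas(self, row_key, replication_factor=1):
        """First node whose token >= the key's token, then the next nodes clockwise."""
        start = bisect_left(self._tokens, token_for(row_key)) % len(self._entries)
        count = min(replication_factor, len(self._entries))
        return [self._entries[(start + i) % len(self._entries)][1] for i in range(count)]


ring = Ring({"node1": 0, "node2": RING_SIZE // 3, "node3": 2 * RING_SIZE // 3})
print(ring.replicas("user:42", replication_factor=2))
```

Because placement depends only on the key's token and the node tokens, any node can work out where a row lives, which fits the masterless design described earlier in the deck.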
Installation & Configuration
Yum installation is the easiest – repository file: /etc/yum.repos.d/datastax.repo
cassandra.yaml configuration
cluster_name, data_file_directories, commitlog_directory (directory locations)
Start Cassandra: cassandra -f
Start CLI: cqlsh
Stop Cassandra: service cassandra stop, or kill the process
Demo
CQL in action
CQL 3.0 is much like SQL. All names are case-insensitive unless enclosed in double quotes.
CQL data types:
Create keyspace: Responsys_Demo
Create table, index, user
All other SQL-like functions!
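As a rough companion to the demo, the sketch below lists the kind of CQL 3 statements one might type into cqlsh and models the identifier case rule. The `users` table, its columns and the sample row are hypothetical and not taken from the original demo; `normalize_identifier` is an illustrative helper, not a driver function.

```python
def normalize_identifier(name):
    """Unquoted CQL identifiers fold to lower case; double-quoted ones keep their case."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]
    return name.lower()


# Statements to run in cqlsh, in order (table and data are made up).
DEMO_STATEMENTS = [
    "CREATE KEYSPACE Responsys_Demo WITH replication = "
    "{'class': 'SimpleStrategy', 'replication_factor': 1};",
    "USE Responsys_Demo;",
    "CREATE TABLE users (user_id int PRIMARY KEY, name text, email text);",
    "CREATE INDEX ON users (email);",
    "INSERT INTO users (user_id, name, email) VALUES (1, 'Asha', 'asha@example.com');",
    "SELECT name FROM users WHERE email = 'asha@example.com';",
]

for statement in DEMO_STATEMENTS:
    print(statement)

print(normalize_identifier("Responsys_Demo"))
```

Since the keyspace name is unquoted, it is stored in lower case, so later statements can refer to it in any capitalisation.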
Cassandra – Monitoring
JMX interface – DEMO
Nodetool – command-line front end to Cassandra's JMX interface
cfstats, netstats, ring & other operations
DataStax OpsCenter
Nagios monitoring
Cassandra logging & GC logging
Summary, Conclusions & References
Summary – Quick recap
Data evolution
ACID, BASE & CAP
NoSQL, data models, implementations
Cassandra & data model
Architecture
Installations & operations
Links & References
• https://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/
• http://www.thoughtworks.com/insights/blog/nosql-databases-overview
• http://www.dia.uniroma3.it/~torlone/bigdata/L6-NoSQL.pdf
• http://radar.oreilly.com/2013/03/returning-transactions-to-distributed-data-stores.html
Q & A
Thank you
APPENDIX