
Big Data, NoSQL with MongoDB and Cassandra

DESCRIPTION

Presentation on Big Data, NoSQL with MongoDB and Cassandra

NOSQL INTRO WITH MONGODB & CASSANDRA

NOSQL Intro with MongoDB and Cassandra

Big Data and NoSQL with MongoDB & Cassandra

Requisite Slide – Who Am I?

- Brian Enochson
- SW engineer who has worked as a designer / developer on NoSQL (Mongo, Cassandra, Hadoop)
- Specializes in SW development, architecture and training

Brian Enochson
[email protected]
Twitter: @benochso
Google Plus: https://plus.google.com/+BrianEnochson

Agenda

• Presentation Intro
• Introduction to Big Data
• Introduction to NoSQL
• Relational Database to NoSQL technology contrast & compare
• NoSQL landscape

Agenda

• Introduction to MongoDB
• MongoDB components, capabilities and common use cases
• JSON & BSON
• Documents, collections, references and Mongo ID
• Querying
• Data Modeling/Schema Design
• Replication & Sharding

Agenda

• Cassandra
• Architecture
• Data Model
• Data Modeling
• Application Development
• Wrap-up and final Q & A

Big Data – Why Needed

Why are databases like Mongo or Cassandra needed?

• To understand, one needs to look at
  • The history of databases
  • How systems were built in the past
• Then examine modern applications
  • Web scale
  • Data acquisition
• Other factors like cost of H/W

History of the Database

• 1960’s – Hierarchical and Network type (IMS and CODASYL)

• 1970’s – Beginnings of theory behind relational model. Codd

• 1980’s – Rise of the relational model. SQL. E/R Model (Chen)

• 1990’s – Access/Excel and MySQL. ODMS began to appear

• 2000’s – Two forces: large enterprise and open source. Google and Amazon. CAP Theorem (more on that to come…)

• 2010’s – Emergence of NoSQL as an industry player and viable alternative

Why were alternatives needed

• Developers today are faced with Internet scale
  • 100,000’s of users
  • Low cost of storage
  • Increased processing power
  • Ability (and need) to capture millions of events. Caching solves it to an extent but brings other complexities
  • Real-time
  • Need to scale out and not up (add any number of low-cost machines vs. replace with a more powerful machine)

• Cost
  • Let’s not forget that for enterprise DBs, Internet scale can become expensive
  • Open source DBs may solve license cost, but don’t ignore operational costs

A lot of data

Some facts from http://www.storagenewsletter.com/rubriques/market-reportsresearch/ibm-cmo-study/

Approximately 90 percent of all the real-time information being created today is unstructured data

Every day we create 2.5 quintillion bytes of data (a quintillion is 10 to the 18th, which is 18 zeroes!!)

90 percent of the world's data today has been created in the last two years alone

Relational vs. NoSQL

• Relational

• Divide into tables, relate via foreign keys, DB constraints, normalized data; the interface is SQL

• NoSQL

• Store in a schemaless format, redundancy encouraged, application access (your queries) determines the storage format. The interface varies and is optimized for the implementation; no forced DB constraints.

Are Tradeoffs Bad?

“Luckily, due to the large number of compromises made when attempting to scale their existing relational databases, these tradeoffs were not so foreign or distasteful as they might have been.”

Greg Burd - https://www.usenix.org/legacy/publications/login/2011-10/openpdfs/Burd.pdf

What Are Tradeoffs?

Eventual consistency

The application has increased responsibility, such as maintaining consistency and handling transactions

Store redundant data

3 V’s – Describing the Big Data Problem

The driving force behind the need for new technology is often referred to as the “3 V’s”.

• Volume – amount of data
• Variety – range of data types and sources
• Velocity – speed of data in and out

NoSQL is not Big Data

NoSQL != Big Data

NoSQL products were created to help solve the big data problem.

Big data is a much larger problem than just storage. Analysis tools like Hadoop, messaging systems like Kafka, real time processing engines like Storm and machine learning (Mahout) all help solve the big data problem.

NoSQL Types

Document DB
• MongoDB, CouchDB

Wide Column – Column Family
• Cassandra, HBase, Amazon SimpleDB

Key Value
• Riak, Redis, DynamoDB, Voldemort, MemcacheDB

Graph
• Neo4J, OrientDB

Search (search can also be a persistence store)
• Lucene, Solr, ElasticSearch

Many, many, many more! (http://nosql-database.org/)

Choosing the right one…

Choosing the right NoSQL type and eventual product depends on…

Type of Data
• One key and a lot of data?
• Schema variance
• High volume of data?
• Storing media, blobs
• Document oriented?
• Tracking relationships?
• Combination?
• Multi-datacenter

Type of Access

Volumes of Data (there is big data and there is BIG DATA)

Need/want support/services/training

Some Basic Concepts

• ACID

• CAP Theorem

• BASE

ACID

PROBABLY HAVE HEARD OF ACID

• Atomic – All or None

• Consistency – What is written is valid

• Isolation – Concurrent operations behave as if run one at a time

• Durability – Once committed to the DB, it stays

This is the world we have lived in for a long time…
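
To make "All or None" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names and the amounts are made up for illustration; the point is only that a failed transfer leaves both rows untouched.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between accounts atomically: both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical data for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

first = transfer(conn, "alice", "bob", 60)   # succeeds
second = transfer(conn, "alice", "bob", 60)  # would overdraw, so it is rolled back
```

After the second call both balances are exactly what the first transfer left behind, which is the guarantee an application has to rebuild by hand when the database does not provide it.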

CAP Theorem (Brewer’s)

Many may have heard this one

CAP stands for Consistency, Availability and Partition Tolerance

• Consistency – all nodes see the same data, so a read returns the most recent write. (Related to, but narrower than, the C in ACID.)

• Availability – the service is available; every request gets a response.

• Partition Tolerance – no failure short of complete network failure causes the system not to respond

** http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf

You can only have 2 of them

In Mongo terms you can have 2 of 3: Availability, Partition Tolerance or Eventual Consistency.

VISUAL GUIDE – USING THE CAP THEOREM

http://blog.nahurst.com/visual-guide-to-nosql-systems

Big Data Wrap up

• So we are talking about large amounts of data

• High velocity of acquisition

• A lot of variety that we need to store. We will worry later about how to handle it (or not)

• Need to scale and not break the bank

• Want the database to support agile, not hinder

Still Wrapping

• Maybe consider going relational if

• Highly transactional (FoundationDB?)

• Business Intelligence Systems (Hadoop may make this not true)

• Don’t be fooled by fear of losing ACID…
  http://highscalability.com/blog/2013/5/1/myth-eric-brewer-on-why-banks-are-base-not-acid-availability.html

And now let’s look at MongoDB

DB Popularity

http://db-engines.com/en/ranking_definition

Mongo Overview

A few high-level points

• Document oriented
• Storage format is JSON (actually BSON)
• Replication built in
• Master / slave architecture
• Strong querying support
• Name comes from "humongous"

Meet Mongo

• Open Source

• Schemaless

• Scalable

• Document Level Atomicity

• Easy Installation

• Relative Ease Of Use

• Great (!!!!) Documentation

And…

• No cross document transactions

• No joins

• Replication – master / slave

• Sharding

Mongo Advantage

-

* Credit – Dwight Merriman, Founder and CEO – MongoDB (was 10Gen)

Mongo Consistency

Master Slave and Secondary Reads

** http://docs.mongodb.org/manual/core/replication-introduction/

Replica Sets

Primary
• Receives all write requests
• A replica set can only have one primary
• Mongo stores all changes in the oplog

Secondary
• Replicates the primary's oplog
• Clients can prefer to read from secondaries
• If the primary goes down, a new primary is elected (after 10 seconds of no response)

Sharding http://docs.mongodb.org/manual/core/sharding-introduction/

Sharding Clusters

Shards
• Store the data; normally in production each shard is a replica set

Routers
• Route client operations to shards based on the shard key; can have more than one for availability
• Shard key is range based or hashed

Config Servers
• Contain cluster metadata
• In production there are 3 config servers
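
The difference between a range-based and a hashed shard key can be sketched in a few lines of Python. This is a toy router, not MongoDB's implementation: the shard names and split points are invented, and the hashed variant uses a simple modulo, whereas a real cluster assigns ranges of hash values to shards.

```python
import bisect
import hashlib

# Hypothetical cluster layout, made up for illustration.
SHARDS = ["shard0", "shard1", "shard2"]
RANGE_SPLITS = ["h", "q"]  # key < "h" -> shard0, "h" <= key < "q" -> shard1, rest -> shard2

def route_by_range(shard_key):
    """Range-based: neighbouring key values land on the same shard."""
    return SHARDS[bisect.bisect_right(RANGE_SPLITS, shard_key)]

def route_by_hash(shard_key):
    """Hashed: the shard is picked from a hash of the value, so neighbours usually scatter."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

range_routes = {name: route_by_range(name) for name in ["alice", "mike", "zoe"]}
hash_routes = {name: route_by_hash(name) for name in ["alice", "mike", "zoe"]}
```

Range keys keep range queries on one shard but can create hot spots for monotonically increasing values; hashed keys spread writes at the cost of scattering range queries.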

Mongo Document

At its simplest form, Mongo is a document oriented database

• MongoDB stores all data in documents, which are JSON-style data structures composed of field-and-value pairs.

• MongoDB stores documents on disk in the BSON serialization format. BSON is a binary representation of JSON documents. BSON contains more data types than does JSON.

** For in-depth BSON information, see bsonspec.org.
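
One place the "more data types" point shows up in practice is dates: BSON has a native date type, plain JSON does not. A small sketch with Python's standard json module (the `created` field is a made-up addition to the sample rating document):

```python
import json
from datetime import datetime, timezone

doc = {
    "ratingCode": "PG13",
    "country": "USA",
    "created": datetime(2013, 12, 9, tzinfo=timezone.utc),  # hypothetical field
}

try:
    json.dumps(doc)
    json_ok = True
except TypeError:
    json_ok = False  # plain JSON cannot represent a datetime value

# One common workaround in plain JSON: encode the date as an ISO 8601 string.
as_json = json.dumps(doc, default=lambda value: value.isoformat(), sort_keys=True)
```

With BSON the date is stored as a date, so it can be compared and range-queried without the application parsing strings.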

What does a Document Look Like

{ "_id" : "52a602280f2e642811ce8478",
  "ratingCode" : "PG13",
  "country" : "USA",
  "entityType" : "Rating" }

Mongo Documents

Rules for a document

Documents have the following rules:

The maximum BSON document size is 16 megabytes.

The field name _id is reserved for use as a primary key; its value must be unique in the collection.

The field names cannot start with the $ character.

The field names cannot contain the . character.
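
As a rough illustration, the naming and uniqueness rules above can be written as a small checker. This is a sketch of the rules as stated on this slide, not the server's own validation; the function name and the `seen_ids` set are hypothetical, and the 16 MB limit is left out because it depends on the encoded BSON size.

```python
def check_document(doc, seen_ids):
    """Return a list of rule violations for a top-level document (empty list means OK)."""
    problems = []
    for name in doc:
        if name.startswith("$"):
            problems.append(f"field name {name!r} starts with '$'")
        if "." in name:
            problems.append(f"field name {name!r} contains '.'")
    # _id must be unique within the collection; seen_ids holds the ids already stored.
    if "_id" in doc and doc["_id"] in seen_ids:
        problems.append(f"duplicate _id {doc['_id']!r}")
    return problems

good = check_document({"_id": 1, "ratingCode": "PG13"}, seen_ids=set())
bad = check_document({"_id": 1, "$set": 1, "a.b": 2}, seen_ids={1})
```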

Mongo Install

Windows
• http://docs.mongodb.org/manual/tutorial/install-mongodb-on-windows/

Mac
• http://docs.mongodb.org/manual/tutorial/install-mongodb-on-os-x/

Create the data directory. Defaults:
• C:\data\db
• /data/db/ (make sure you have permissions)

Or set it using --dbpath:
C:\mongodb\bin\mongod.exe --dbpath d:\test\mongodb\data

Start It!

Database
mongod

Shell
mongo

show dbs
show collections
db.stats()

Basic Operations

1_simpleinsert.txt

Insert

Find
• Find all
• Find one
• Find with criteria

Indexes
• Explain()

More Mongo Shell

2_arrays_sort.txt

• Embedded documents

• Limit, Sort

• Using regex in query

• Removing documents

• Drop collection

Import / Export

3_imp_exp.txt

Mongo provides tools for getting data in and out of the database

• Data can be exported to JSON files

• JSON files can then be imported

Conditional Operators

4_cond_ops.txt

• $lt
• $gt
• $gte
• $lte
• $or

• Also $not, $exists, $type, $in

(for $type refer to http://docs.mongodb.org/manual/reference/operator/query/type/#_S_type )
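
To show how these operators read inside a query document, here is a tiny in-memory imitation of a find() filter in Python. It is a sketch for illustration only: it handles just equality, $lt, $gt, $lte, $gte, $in, $exists and a top-level $or, and the `posts` data is invented.

```python
import operator

COMPARISONS = {"$lt": operator.lt, "$gt": operator.gt, "$lte": operator.le, "$gte": operator.ge}

def matches(doc, query):
    """Return True if `doc` satisfies the Mongo-style `query` (supported subset only)."""
    for field, condition in query.items():
        if field == "$or":
            if not any(matches(doc, sub) for sub in condition):
                return False
            continue
        if not isinstance(condition, dict):  # {"country": "USA"} means equality
            if doc.get(field) != condition:
                return False
            continue
        for op, arg in condition.items():
            if op == "$exists":
                ok = (field in doc) == arg
            elif op == "$in":
                ok = doc.get(field) in arg
            elif op in COMPARISONS:
                ok = field in doc and COMPARISONS[op](doc[field], arg)
            else:
                raise ValueError(f"unsupported operator {op}")
            if not ok:
                return False
    return True

# Hypothetical collection contents.
posts = [
    {"title": "intro", "likes": 5},
    {"title": "sharding", "likes": 42},
    {"title": "draft"},
]
popular = [p["title"] for p in posts if matches(p, {"likes": {"$gte": 10}})]
quiet = [
    p["title"]
    for p in posts
    if matches(p, {"$or": [{"likes": {"$lt": 10}}, {"likes": {"$exists": False}}]})
]
```

The query documents passed to `matches` have the same shape as the ones typed into the Mongo shell, which is the main thing to take away.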

Analytics

Aggregation Framework
• Uses a pipeline model to perform a series of operations on data. A common pattern is a match phase (selection) followed by grouping (creating the result)

Map Reduce
• Two phases
  • Map, which creates one or more documents from each input document
  • Reduce, which combines output from Map into some result
• Finalize – an optional step that can perform some logic (e.g. sorting) on the reduce output
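
The map, reduce and optional finalize phases can be mimicked in plain Python to see how data flows between them. This is a sketch of the idea, not Mongo's mapReduce command; the function names and the sample documents are assumptions.

```python
from collections import defaultdict

def map_reduce(documents, map_fn, reduce_fn, finalize_fn=None):
    """Run map, group emitted values by key, reduce each group, then optionally finalize."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):  # map may emit zero or more pairs per document
            groups[key].append(value)
    results = {key: reduce_fn(key, values) for key, values in groups.items()}
    if finalize_fn is not None:
        results = {key: finalize_fn(key, value) for key, value in results.items()}
    return results

# Hypothetical input: count rating documents per country.
ratings = [
    {"country": "USA", "ratingCode": "PG13"},
    {"country": "USA", "ratingCode": "R"},
    {"country": "UK", "ratingCode": "15"},
]
counts = map_reduce(
    ratings,
    map_fn=lambda doc: [(doc["country"], 1)],
    reduce_fn=lambda key, values: sum(values),
)
labelled = map_reduce(
    ratings,
    map_fn=lambda doc: [(doc["country"], 1)],
    reduce_fn=lambda key, values: sum(values),
    finalize_fn=lambda key, value: f"{value} rating(s)",
)
```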

Admin Commands

5_admin.txt

• show dbs
• show collections
• db.stats()
• db.posts.stats()
• db.posts.drop()
• db.system.indexes.find()

Data Modeling

• Remember, with NoSQL redundancy is not evil

• Applications ensure consistency, not the DB

• Applications join data; joins are not defined in the DB

• The data model is schema-less

• The data model is usually built to support queries

Questions to ask

• Your basic units of data (what would be a document)?

• How are these units grouped / related?

• How does Mongo let you query this data, what are the options?

• Finally, maybe most importantly, what are your application's access patterns?
  • Reads vs. writes
  • Queries
  • Updates
  • Deletions
  • How structured is it

Data Model - Normalized

Normalized

• Similar to relational model.

• One collection per entity type

• Little or no redundancy

• Allows clean updates, familiar to many SQL users, easier to understand

Normalized documents

References

• From parent to child
{ name: "O'Reilly Media",
  books: [123456789, 234567890, ...] }

• From child to parent
{ _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  publisher_id: "oreilly" }
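
Because the database does not join, the application follows the reference itself. A minimal Python sketch of resolving the child-to-parent reference, using in-memory dictionaries in place of collections (the publisher's `_id` value and the helper name are assumptions for illustration):

```python
# Stand-ins for two collections; in a real application these would be queries.
publishers = {
    "oreilly": {"_id": "oreilly", "name": "O'Reilly Media"},
}
books = [
    {"_id": 123456789, "title": "MongoDB: The Definitive Guide", "publisher_id": "oreilly"},
]

def with_publisher(book):
    """Resolve the child-to-parent reference in application code (an application-side join)."""
    publisher = publishers.get(book["publisher_id"])
    return {**book, "publisher": publisher["name"] if publisher else None}

joined = [with_publisher(book) for book in books]
```

Each resolved reference is an extra lookup, which is the cost the embedded model trades against redundancy.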

Data Model - Embedded

An often-used pattern in Mongo is to embed information as subdocuments.

• Used when there is a contains relationship

• Easier querying (when related data is often used together)

• Need to keep 16 MB document size in mind

Embedded

Other considerations For Data Modeling

Many or few collections

• Many Collections
  • As seen in normalized
  • Clean and little redundancy
  • May not provide best performance
  • May require frequent updates to application if new types added

• Multiple Collections
  • Middle ground, partially normalized

• Not many collections
  • One large generic collection
  • Contains many types
  • Use type field

Consideration Continued

• Document Growth – a document will be relocated if it exceeds its allocated size

• Atomicity
  • Atomic at document level
  • Consideration for insertions, removes and multi-document updates

• Sharding – collections distributed across mongod instances, using a shard key

• Indexes – index fields that are often queried; indexes affect write performance slightly

• Consider using TTL to automatically expire documents

Common Uses For Mongo

CMS Systems

Log Collection https://code.google.com/p/log4mongo/

Caching

Queues / Messaging
• Capped Collections – fixed-size collections that support high-throughput operations that insert, retrieve, and delete documents based on insertion order.

Analytics

Prototyping

MongoDB Development with Java

Mongo Driver
• Supplied by MongoDB itself
• Easy to set up
• Housed on Maven repo

Morphia
• Uses app model
• Handles references well

Spring Mongo
• Great if using Spring already

Other

Node
• JavaScript (JSON), CoffeeScript
• MEAN Stack

Scala
• Casbah
• Reactive Mongo

MEAN Stack

Get MEAN

Mongo, Express, Angular and Node

http://bitnami.com/stack/mean

http://mean.io

Can be installed directly, in a VM or even in the cloud

The cloud

Database in the cloud

https://mongolab.com/

Can access using shell, GUI Mongo explorer, mongoimport, mongoexport and use in application

Amazon, Rackspace, Joyent or Azure

Books

MongoDB: The Definitive Guide, 2nd Edition
By: Kristina Chodorow
Publisher: O'Reilly Media, Inc.
Pub. Date: May 23, 2013
Print ISBN-13: 978-1-4493-4468-9
Pages in Print Edition: 432

MongoDB in Action
By: Kyle Banker
Publisher: Manning Publications
Pub. Date: December 16, 2011
Print ISBN-10: 1-935182-87-0
Print ISBN-13: 978-1-935182-87-0
Pages in Print Edition: 312

The Definitive Guide to MongoDB: The NoSQL Database for Cloud and Desktop Computing
By: Eelco Plugge; Peter Membrey; Tim Hawkins
Apress, September 2010
ISBN: 9781430230519
327 pages

Books Cont.

MongoDB Applied Design Patterns
By: Rick Copeland
Publisher: O'Reilly Media, Inc.
Pub. Date: March 18, 2013
Print ISBN-13: 978-1-4493-4004-9
Pages in Print Edition: 176

MongoDB for Web Development (rough cut!)
By: Mitch Pirtle
Publisher: Addison-Wesley Professional
Last Updated: 14-JUN-2013
Pub. Date: March 11, 2015 (Estimated)
Print ISBN-10: 0-321-70533-5
Print ISBN-13: 978-0-321-70533-4
Pages in Print Edition: 360

Instant MongoDB
By: Amol Nayak
Publisher: Packt Publishing
Pub. Date: July 26, 2013
Print ISBN-13: 978-1-78216-970-3
Pages in Print Edition: 72

Cassandra

Let’s look briefly at Cassandra as an alternative to Mongo

Cassandra History

• Developed At Facebook, based on Google Big Table and Amazon Dynamo **

• Open Sourced in mid 2008

• Apache Project March 2009

• Commercial Support through Datastax (originally known as Riptano, founded 2010)

• Used at Netflix, eBay and many more. The largest installation is reportedly 300 TB on 400 machines

• Current version is 2.0.3

C* Basics

• No Single Point of Failure – highly available
• Peer to Peer – no master
• Data Center Aware – distributed architecture
• Linear Scaling – just add hardware
• Eventual Consistency, with a tunable tradeoff between latency and consistency
• Architecture is optimized for writes
• Can have 2 billion columns (cells) per row!
• Data modeling for reads. Design starts with looking at your queries (sound familiar?)
• With CQL it became more SQL-like, but no joins, no subqueries, limited ordering (but very useful)
• Column names can be part of the data, e.g. time series

C* Eventual Consistency

** Important Term ** Quorum: Q = N / 2 + 1 (N / 2 rounded down).

We get consistency in a BASE world by satisfying W + R > N

3 obvious ways:

1. W = 1, R = N

2. W = N, R = 1

3. W = Q, R = Q

(N is replication factor, R = read replica count, W = write replica count)
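
The rule is easy to check with a few lines of Python. This is plain arithmetic on the formulas above; the function names are made up for illustration.

```python
def quorum(n):
    """Q = N / 2 + 1, with N / 2 rounded down."""
    return n // 2 + 1

def overlaps(w, r, n):
    """True when W + R > N, i.e. every read set shares at least one replica with every write set."""
    return w + r > n

n = 3
q = quorum(n)
options = {
    "W=1, R=N": overlaps(1, n, n),
    "W=N, R=1": overlaps(n, 1, n),
    "W=Q, R=Q": overlaps(q, q, n),
    "W=1, R=1": overlaps(1, 1, n),  # fast, but a read may miss the latest write
}
```

With N = 3 the quorum is 2, so a quorum write plus a quorum read touches 4 replica slots out of 3 and must overlap on at least one node.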

C* Data Model

C* data model is made of these:

• Column – a name, a value and a timestamp. Applications can use the name as the data and not use the value. (Like a column in an RDBMS.)

• Row – a collection of columns identified by a unique key. The key is called a partition key. (Like a row in an RDBMS.)

• Column Family – container for an ordered collection of rows. Each row is an ordered collection of columns. Each column has a key and maybe a value. (Like a table in an RDBMS.) This is also known as a table now in C* terms.

• Keyspace – administrative container for CFs. It is a namespace. Also has a replication strategy, more on that later. (Like a DB or schema in an RDBMS.)

How Does This Look?

Tokens

Tokens – partitioner-dependent element on the ring.
• Each node has a single unique token assigned.
• Each node claims a range of tokens that runs from the token of the previous node on the ring to its own token.

Use this formula:
Initial_Token = Zero_Indexed_Node_Number * ((2^127) / Number_Of_Nodes)

In cassandra.yaml:
initial_token: 42535295865117307932921825928971026432

** http://blog.milford.io/cassandra-token-calculator/
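
The token formula is straightforward to evaluate in Python, whose integers do not overflow. A short sketch (the function name is made up; the 0 to 2^127 range is the one used by the formula on this slide):

```python
def initial_tokens(number_of_nodes):
    """Evenly spaced tokens on a ring of size 2**127, one per zero-indexed node."""
    return [node * (2**127 // number_of_nodes) for node in range(number_of_nodes)]

tokens = initial_tokens(4)
```

For a 4 node cluster, `tokens[1]` is the cassandra.yaml value shown above, so that setting belongs to the second node (node number 1).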

Replication

• Replication is how many copies of each piece of data should be stored. In C* terms it is the Replication Factor or “RF”.

• In C*, RF is set at the keyspace level:
CREATE KEYSPACE drg_compare WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};

• How the data is replicated is called the Replication Strategy
  • SimpleStrategy – returns nodes “next” to each other on the ring. Assumes a single DC
  • NetworkTopologyStrategy – for configuring per data center. Rack and DC aware.
update keyspace UserProfile with strategy_options=[{DC1:3, DC2:3}];

C* Ring Topology

SimpleStrategy

Using token generation values from before. 4 node cluster. Write value with token 32535295865117307932921825928971026432
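
The placement for this write can be worked out in Python. This is a sketch of the SimpleStrategy idea as described in this deck (owner first, then the next nodes on the ring); it assumes a replication factor of 3 and zero-indexed node numbers, and the function name is made up.

```python
import bisect

# Tokens for a 4 node cluster, from the initial token formula.
TOKENS = [node * (2**127 // 4) for node in range(4)]

def replicas_for(token, tokens=TOKENS, replication_factor=3):
    """Return node numbers: the owner of `token`, then the next nodes clockwise on the ring."""
    # A node owns the range (previous node's token, its own token], so the owner is the
    # first node whose token is >= the key's token, wrapping to node 0 past the last token.
    owner = bisect.bisect_left(tokens, token) % len(tokens)
    return [(owner + step) % len(tokens) for step in range(replication_factor)]

nodes = replicas_for(32535295865117307932921825928971026432)
```

The token falls between node 0's token (0) and node 1's token, so node 1 owns it and the remaining copies go to the nodes that follow it on the ring.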

SimpleStrategy (Cont)

Coordinator and CL

• When writing, a Coordinator Node will be selected. It is selected at write (or read) time. Not a SPOF!

• Using Gossip Protocol nodes share information with each other. Who is up, who is down, who is taking which token ranges, etc. Every second, each node shares with 1 to 3 nodes.

• Consistency Level (CL) – says how many nodes must agree before an operation is a success. Set at read or write operation.

• ONE – coordinator will wait for one node to ack write (also TWO, THREE). One is default if none provided.

• QUORUM – we saw that before. N / 2 + 1. LOCAL_QUORUM, EACH_QUORUM

• ANY – waits for some replica. If all are down, it still succeeds. Only for writes. Doesn’t guarantee the data can be read.

• ALL – Blocks waiting for all replicas
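
As a rough summary, the consistency levels above map to a number of acknowledgements the coordinator waits for. The sketch below covers a single data center only (LOCAL_QUORUM and EACH_QUORUM are left out), and the function name is hypothetical.

```python
def required_acks(consistency_level, replication_factor):
    """Number of acknowledgements the coordinator waits for (single data center)."""
    levels = {
        "ANY": 1,  # a stored hint also counts, so the write may not be readable yet
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[consistency_level]

acks_at_rf3 = {level: required_acks(level, 3) for level in ["ONE", "QUORUM", "ALL"]}
```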

Ensuring Consistency

3 important concepts:

• Read Repair – At time of read, inconsistencies are noticed between nodes and replicas are updated. Direct and background. Direct is determined by CL.

• Anti-Entropy Node Repair – For data that is not read frequently, or to update data on a node that has been down for a while, use the nodetool repair process (also called anti-entropy repair). It builds Merkle trees, compares nodes and does the repair.

• Hinted Handoff – Writes are always sent to all replicas for the specified row regardless of the consistency level specified by the client. If a node happens to be down at the time of write, its corresponding replicas will save hints about the missed writes, and then hand off the affected rows once the node comes back online. This notification happens via Gossip. Default 1 hour.

Application Development

• Interaction with Cassandra can be done using one of the supplied clients such as CLI or CQL. Otherwise client applications are built using a language client library.

• Many clients in multiple languages, including Java, .NET, Python, Scala, Go, PHP, Node.js, Perl, Ruby, etc.

• Java:
  • Hector wraps the underlying Thrift API. Hector is one of the most commonly used client libraries.
  • Astyanax is a client library developed by Netflix.
  • Datastax CQL – newest CQL driver, will be very familiar to JDBC developers
  • And many more … (JPA)

• There are also Datastax OpsCenter and various other GUIs and a REST API (Virgil)

Cassandra Summary

Many More Topics / Information Related to C* not covered

Great for Fast Writes

No Single POF

Data Center Aware

Also Relative Ease Of Use

That’s All Folks

Questions?

Comments?

Thank You!!!!!! [email protected]