
Getting Started with DataStax Enterprise from a Technical Perspective


DESCRIPTION

The requirements for building today’s online applications have changed. Implementing legacy technology hinders your ability to innovate, ensure application performance, and meet the demands of your customers. So how do you determine what underlying systems are the right fit for your needs? Join us as we review the following to help you get started with DataStax Enterprise:

- What is Cassandra and why should you care?
- What is DataStax Enterprise and how does it differ from Cassandra?
- What are the steps to evaluating DataStax Enterprise?
- Valuable resources to get up to speed on Cassandra and DataStax Enterprise


Page 1: Getting Started with DataStax Enterprise from a Technical Perspective

Getting Started with DataStax Enterprise

A Technical Overview

Confidential 1

Page 2

Page 3

Agenda


Why Cassandra?

Why DataStax Enterprise?

How to Evaluate?

Page 4

Why Cassandra?

Page 5

What is Apache Cassandra?

Apache Cassandra™ is a massively scalable NoSQL database.

• Continuous availability
• High-performing writes and reads
• Linear scalability
• Multi-data center support

Page 6

Cassandra is Fault Tolerant

[Figure: two clients connected to a ring of eight nodes with tokens 10, 20, 30, 40, 50, 60, 70, and 80; Replication Factor = 3]

Token  Order_id  Qty  Sale
70     1001      10   100
44     1002      5    50
15     1003      30   200

If a node fails or goes down temporarily, we could still retrieve the data from the other 2 nodes.
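The replica placement on this slide can be sketched as a toy token ring. This is a simplified model under stated assumptions, not Cassandra's code: the class TokenRing and node names such as node50 are hypothetical, each node owns the token range ending at its own token, and the remaining replicas go to the next nodes clockwise, in the style of SimpleStrategy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical toy token ring with SimpleStrategy-style placement (not the Cassandra API).
class TokenRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // token -> node name

    void addNode(long token, String name) {
        ring.put(token, name);
    }

    // First replica: the node with the smallest token >= the partition token (wrapping around);
    // the other RF - 1 replicas are the next nodes clockwise.
    List<String> replicasFor(long partitionToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        Long token = ring.ceilingKey(partitionToken);
        if (token == null) {
            token = ring.firstKey(); // wrap around the ring
        }
        int count = Math.min(replicationFactor, ring.size());
        while (replicas.size() < count) {
            replicas.add(ring.get(token));
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey();
            }
        }
        return replicas;
    }

    // Replicas that can still serve the row while the given nodes are down.
    List<String> liveReplicasFor(long partitionToken, int replicationFactor, Set<String> downNodes) {
        List<String> live = new ArrayList<>();
        for (String node : replicasFor(partitionToken, replicationFactor)) {
            if (!downNodes.contains(node)) {
                live.add(node);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing();
        for (long t = 10; t <= 80; t += 10) {
            ring.addNode(t, "node" + t);
        }
        // Partition tokens taken from the slide's sample rows, RF = 3.
        System.out.println(ring.replicasFor(70, 3)); // [node70, node80, node10]
        System.out.println(ring.replicasFor(44, 3)); // [node50, node60, node70]
        System.out.println(ring.replicasFor(15, 3)); // [node20, node30, node40]
        System.out.println(ring.liveReplicasFor(44, 3, Set.of("node50"))); // [node60, node70]
    }
}
```

With token 44 owned by node50 and RF = 3, taking node50 down still leaves node60 and node70 able to serve order 1002, which is the fault tolerance the slide describes.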

Page 7

The NoSQL Performance Leader

Netflix Cloud Benchmark… (Source: Netflix Tech Blog)

“In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput.”

Source: Solving Big Data Challenges for Enterprise Application Performance Management, benchmark paper presented at the Very Large Data Bases (VLDB) Conference, 2013.

End Point Independent NoSQL Benchmark
Highest in throughput…
Lowest in latency…

Page 8

Linearly Scalable

[Figure: rings of 2, 4, and 8 nodes, with throughput labels of 100,000, 200,000, and 400,000 txns per sec]

Simply add nodes to double or quadruple performance and capacity.

Page 9

Multi Data Center Support

[Figure: an East Data Center ring (tokens 10 through 80) and a West Data Center ring (tokens 15 through 85), each serving a client]

When a data center outage occurs, there is no interruption to the business.

Page 10

Built for Modern Online Applications

• Architected for today’s needs
• Linear scalability at lowest cost
• 100% uptime
• Operationally simple

Page 11

Agenda

Why Cassandra?

• Scale with ease
• Always on
• Deploy across data centers

Page 12

Agenda

Why Cassandra?

Why DataStax Enterprise?

• Scale with ease
• Always on
• Deploy across data centers

Page 13

DataStax delivers Apache Cassandra to the Enterprise

Page 14

Why DataStax?

DataStax supports both the open source community and modern business enterprises.

Open Source vs. DataStax Enterprise:
• Apache Cassandra (Cassandra Chair and 30% of committers)
• Community Edition vs. Enterprise Edition (tested & certified for production)
• OpsCenter: Standard vs. Enterprise (alerts, automated management services, cluster management)
• DevCenter
• Drivers/Connectors
• Online Documentation
• Online Training
• Mailing Lists and Forums
• Security: Standard vs. Enterprise (Kerberos authentication & SSL encryption)
• Built-in Real-time Analytics
• Built-in Enterprise Search
• In-Memory Database Option
• Expert Support (24x7x365)
• Consultative Support
• Onsite Training

Page 15

DataStax OpsCenter

• Visual browser-based UI
• Point-and-click administration
• Visual cluster management
• Proactive alerts
• Built-in external notifications
• Visual backup operations

Page 16

Cassandra Query Language (CQL)

DataStax DevCenter – a free, visual query tool for creating and running CQL statements against Cassandra and DataStax Enterprise.

Page 17

Internal Authentication

Internal validation of authorized users

Simple to implement & easy to understand

No learning curve

Object Permission Management

Deep control over who can add/change/delete/read data

Uses familiar GRANT/REVOKE from the relational world

No learning curve

Client to Node Encryption

Ensures data cannot be captured or stolen en route to a server

Data is protected in flight between clients and the database

Complete coverage is ensured

Cassandra Security

Page 18

External Authentication

External validation of authorized users

Leverages Kerberos & LDAP

Single sign-on to all data domains

Transparent Data Encryption

Protects sensitive data at rest

No changes needed at application level

Encrypt both Cassandra and Hadoop data

Data Auditing

Audit trail of all accesses and changes

Control to audit only what’s needed

Uses log4j interface to ensure performance & efficient audit operations

DataStax Enterprise Security

Page 19

Built-in Enterprise Search

• Delivers Solr integration
• Very fast performance
• Search indexes span multiple data centers (regular Solr cannot)
• Online scalability via adding new nodes
• Built-in failover; continuously available

[Figure: four nodes, each running C* & Solr]

Page 20

Built-in Enterprise Analytics

• Real-time analytics on Cassandra hot data
• MapReduce, Hive, Pig, Sqoop, and Mahout
• No single points of failure

[Figure: four nodes, each running C* & Hadoop, labeled Enterprise Analytics (MapReduce, Hive, Pig, More), Continuous availability, and Integrated big data platform]

Page 21

Agenda

Why Cassandra?

Why DataStax Enterprise?

• Scale with ease
• Always on
• Deploy across data centers

• Enterprise-ready capabilities
• 24x7x365 support

Page 22

Agenda

Why Cassandra?

Why DataStax Enterprise?

• Scale with ease
• Always on
• Deploy across data centers

• Enterprise-ready capabilities
• 24x7x365 support

How to Evaluate?

Page 23

Evaluation Process

Download & install binaries or sandbox

Leverage use cases to identify needs

Install DSE/OpsCenter on servers

Design/Modify data model

Implement data model

Load sample data

Stress test servers

Develop application

1) R&D Mode
2) POC Cycle
3) Optimize

Add Nodes (C*, Solr, and/or Hadoop)

Page 25

Tailored to Meet Your Needs

FREE Resources
• DSE Sandbox
• DSE for Non-Production
• OpsCenter (Standard)
• DevCenter
• DataStax Academy
• Community Forums
• White Papers & Documentation

PAID Services
• Onsite Consulting
• Remote Consulting
• Onsite Training
• Public Training

PAID Subscription
• Production DSE Max
• Production DSE Pro
• Production DSE Standard
• Non-Production DSE Max
• Non-Production DSE Pro
• Non-Production DSE Standard

PAID Bundles
• Quick Start Enterprise
• Quick Start Standard

Customer Benefits
• Customer Success Manager
• Proactive Guidance
• Free Health Check
• Free Migration Assessment
• Monthly Bulletin Best Practices

Page 26

The Right Mix of Support Resources

Training (Education & Training)
• How to use DataStax Enterprise
• Learn DataStax admin features
• How to use integrated search
• How to use integrated analytics

Consulting (Planning & Design)
• DataStax Enterprise architecture
• Data modeling with DataStax
• Cluster tuning and performance
• Best practices and planning

Support (Develop & Test)
• Troubleshooting errors
• Experiencing unexpected results
• Clarification on documentation
• Critical issue support

Production Support

Page 27

Available Online Resources

• Patrick McFadin’s data modeling series
• CQL/Data modeling on DataStax
• Virtual training
• Java driver sample code
• Solr documentation and tutorial on DataStax
• Analytics documentation
• GitHub code samples
• Advanced time series best practices

Massively Scale a DB!

Page 28

Agenda

Why Cassandra?

Why DataStax Enterprise?

• Scale with ease
• Always on
• Deploy across data centers

• Enterprise-ready capabilities
• 24x7x365 support

How to Evaluate?

• Evaluate efficiently

Page 29

Q&A and Next Steps

Want to learn more about the evaluation process?
• Contact your account manager or email us at [email protected]

Want access to more Cassandra resources?
• Visit Planet Cassandra at www.planetcassandra.com

Page 30

Appendix

Page 31

EC2 Install Process with Linux AMIs

• Read through the EC2 production planning guide: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningEC2_c.html

• Choose instance types from i2.2xlarge to i2.4xlarge

• Create a security group: http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/install/installAMIsecurity.html

• Pick a reputable, reliable Linux image to start with, preferably one with a 3.x kernel

• Run through the wizard and start the AMIs

• Install the prerequisites: http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installJREJNAabout_c.html

• Install the DSE node (steps depend on the OS): http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/install/installTOC.html

• Follow the "What's next" steps at the bottom of the installation instructions, including configuring the DSE node for a single data center or multiple data centers (plan the topology in advance): http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/deploy/deploySingleDC.html#deploySingleDC or http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/deploy/deployMultiDC.html#deployMultiDC

• Follow and set the recommended production settings: http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html

Page 32

Cassandra Architecture Basics – One Node

Organizes Data in Partitions

Inserted data is written to a Commit Log

As well as a MemTable

MemTables are flushed to disk in an SSTable based on size.

SSTables are immutable

Changes to a partition are written to additional SSTables.

Deletes write tombstones

[Figure: Node 1 holding two partitions, with partition keys 75 and 9, each containing row data]
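The single-node write path described on this slide can be sketched as a toy model. ToyNode and its flush threshold are hypothetical; unlike real Cassandra, this sketch resolves conflicting versions by which structure is newest rather than by per-cell write timestamps, never discards commit log segments, and never purges tombstones.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, greatly simplified single-node write path (not Cassandra's code).
class ToyNode {
    static final String TOMBSTONE = "<tombstone>"; // marker written by deletes

    private final List<String> commitLog = new ArrayList<>();              // append-only log
    private TreeMap<String, String> memtable = new TreeMap<>();           // in memory, sorted by key
    private final List<Map<String, String>> sstables = new ArrayList<>(); // immutable, newest last
    private final int flushThreshold;

    ToyNode(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) {
        commitLog.add(key + "=" + value); // 1) append to the commit log
        memtable.put(key, value);         // 2) update the memtable
        if (memtable.size() >= flushThreshold) {
            flush();                      // 3) flush based on size
        }
    }

    // Deletes are just writes of a tombstone.
    void delete(String key) {
        write(key, TOMBSTONE);
    }

    // The flushed memtable becomes a read-only SSTable; later changes go to new SSTables.
    void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        sstables.add(Collections.unmodifiableMap(memtable));
        memtable = new TreeMap<>();
    }

    // Check the memtable first, then SSTables from newest to oldest.
    String read(String key) {
        String value = memtable.get(key);
        for (int i = sstables.size() - 1; value == null && i >= 0; i--) {
            value = sstables.get(i).get(key);
        }
        return TOMBSTONE.equals(value) ? null : value;
    }

    int sstableCount() { return sstables.size(); }

    int commitLogSize() { return commitLog.size(); }
}
```

For example, with a flush threshold of 2, writing partitions 75 and 9 produces one SSTable; a later update to 75 and a delete of 9 produce a second SSTable, and reads then return the new value for 75 and nothing for 9.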

Page 33

Background – How Cassandra Stores Data

• Model brought from BigTable
• Partition key and a lot of cells
• Cell names sorted (UTF8, Int, Timestamp, etc.)
• CQL creates a timestamp if not specified

[Figure: a partition key followed by cells 1 through 2 billion; each cell has a cell name, cell value, timestamp, and TTL]

©2013 DataStax Confidential. Do not distribute without consent. 33
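The cell layout can be illustrated with a toy partition whose cells stay sorted by name, each carrying a value, a write timestamp, and an optional TTL. ToyPartition is hypothetical and simplifies Cassandra's rules: it uses one logical clock for both timestamps and expiry, compares cell names as Java strings rather than through a configurable comparator, and keeps the existing cell when timestamps tie.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy partition: sorted cells with value, timestamp, and optional TTL.
class ToyPartition {
    static final class Cell {
        final String value;
        final long timestamp; // write timestamp (CQL supplies one if the client does not)
        final long expiresAt; // Long.MAX_VALUE when no TTL is set

        Cell(String value, long timestamp, long expiresAt) {
            this.value = value;
            this.timestamp = timestamp;
            this.expiresAt = expiresAt;
        }
    }

    private final TreeMap<String, Cell> cells = new TreeMap<>(); // cell names kept sorted

    // Last write wins: an incoming cell replaces the stored one only if its timestamp is newer.
    void put(String name, String value, long timestamp, long ttl) {
        long expiresAt = ttl > 0 ? timestamp + ttl : Long.MAX_VALUE;
        cells.merge(name, new Cell(value, timestamp, expiresAt),
                (existing, incoming) -> incoming.timestamp > existing.timestamp ? incoming : existing);
    }

    // Expired cells read as absent.
    String get(String name, long now) {
        Cell c = cells.get(name);
        return (c == null || now >= c.expiresAt) ? null : c.value;
    }

    List<String> cellNames() {
        return new ArrayList<>(cells.keySet());
    }
}
```

Because conflicts are settled by timestamp rather than arrival order, a late-arriving older write (timestamp 90 after 100) is simply ignored.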

Page 34

Cassandra Architecture Basics – Multi Data Center

• Nodes can be arranged in multiple data centers
• Cassandra replicates data efficiently between remote data centers
• Each data center can have a different RF
• Use data centers to segment nodes for different query patterns

[Figure: Node 1 through Node 5 arranged in two data centers, labeled Boston and San Francisco, with a Real Time Analytics label; copies of the same row data appear in both data centers]
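Per-data-center replication factors can be sketched by walking each data center's own ring from the partition token, in the spirit of NetworkTopologyStrategy. ToyMultiDcRing and the data center and node names are hypothetical, and rack awareness is ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-data-center replica placement (racks ignored; not Cassandra's code).
class ToyMultiDcRing {
    private final Map<String, TreeMap<Long, String>> ringsByDc = new LinkedHashMap<>();

    void addNode(String dc, long token, String name) {
        ringsByDc.computeIfAbsent(dc, k -> new TreeMap<>()).put(token, name);
    }

    // For each data center, walk that data center's nodes clockwise from the partition token
    // until it holds the replication factor configured for that data center.
    Map<String, List<String>> replicasFor(long partitionToken, Map<String, Integer> rfByDc) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : rfByDc.entrySet()) {
            TreeMap<Long, String> ring = ringsByDc.get(e.getKey());
            List<String> replicas = new ArrayList<>();
            if (ring != null) {
                int wanted = Math.min(e.getValue(), ring.size());
                Long token = ring.ceilingKey(partitionToken);
                if (token == null) {
                    token = ring.firstKey(); // wrap around
                }
                while (replicas.size() < wanted) {
                    replicas.add(ring.get(token));
                    token = ring.higherKey(token);
                    if (token == null) {
                        token = ring.firstKey();
                    }
                }
            }
            result.put(e.getKey(), replicas);
        }
        return result;
    }
}
```

With RF 3 in Boston (tokens 10 through 80) and RF 2 in San Francisco (tokens 15 through 85), token 44 lands on bos50, bos60, bos70 and sf45, sf55, so each data center holds its own copies of the partition.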

Page 35

Reading Data

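import com.datastax.driver.core.*;
import com.datastax.driver.core.exceptions.*;

/* Demonstrate an easy way to query data. */
// Assumes an open Session connected to the keyspace that holds the user table.
static void printPassword(Session session) {
    try {
        ResultSet result = session.execute(
            "SELECT password FROM user WHERE username = 'user2';");
        if (result.isExhausted()) {
            return; // no matching user
        }
        Row user = result.one();
        System.out.println("Password is: " + user.getString("password"));
    } catch (NoHostAvailableException ex) {
        System.out.println("No host available");
    } catch (QueryExecutionException ex) {
        // Unavailable or timeout: the requested consistency level was not met
        System.out.println("Requested consistency level not met");
    } catch (QueryValidationException ex) {
        // Syntax error, invalid query, or not authorized
        System.out.println("Invalid query or not authorized");
    }
}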

Page 36

Prepared Statements

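import com.datastax.driver.core.*;
import com.datastax.driver.core.exceptions.*;

// Assumes an open Session connected to the keyspace that holds the user table.
static void insertUser(Session session) {
    PreparedStatement statement = session.prepare(
        "INSERT INTO user (username, password) VALUES (?, ?);");
    BoundStatement boundStatement = new BoundStatement(statement);
    try {
        session.execute(boundStatement.bind("user4", "user4password"));
    } catch (NoHostAvailableException ex) {
        System.out.println("Host not available");
    } catch (QueryExecutionException ex) {
        // Unavailable or timeout: the requested consistency level was not met
        System.out.println("Requested consistency level not met");
    } catch (QueryValidationException ex) {
        // Syntax error, invalid query, or not authorized
        System.out.println("Syntax error, invalid query, or not authorized");
    }
}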

Page 37

Query-Driven Data Modeling

Start with the queries you will need to answer.
• Your data model should match them directly

Think about:
• The actions your application needs to perform
• How you want to access the data
• What are the use cases?
• What does the data look like?

Page 38

Queries (cont.)

What are you trying to retrieve?
• Does it need to be ordered?
• Is there any nesting of data?
• Do you need to group data?
• Do you need to filter data?

Does data expire?
Does data need to be retrieved in chronological order?

Page 39

Relational Concept - Denormalization

• Combine table columns into a single view
• No joins
• It is all in how you lay out the data for fast reads

Employees

SELECT First, Last, Dept FROM employees WHERE id = '1';

id  First    Last   Dept
1   Edgar    Codd   Engineering
2   Raymond  Boyce  Math
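The difference between joining at read time and denormalizing at write time can be sketched with two in-memory layouts. EmployeeStore and its department ids are hypothetical; the point is only that the denormalized read is a single lookup by key, the shape of the SELECT on this slide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store contrasting a normalized layout (join at read time)
// with a denormalized one (department name copied into each row at write time).
class EmployeeStore {
    private final Map<String, String> departments = new HashMap<>();             // deptId -> name
    private final Map<String, String[]> employees = new HashMap<>();             // id -> {first, last, deptId}
    private final Map<String, String[]> employeesDenormalized = new HashMap<>(); // id -> {first, last, deptName}

    void addDepartment(String deptId, String name) {
        departments.put(deptId, name);
    }

    void insertEmployee(String id, String first, String last, String deptId) {
        employees.put(id, new String[] {first, last, deptId});
        // Denormalize on write: resolve the department name once and store it with the row.
        employeesDenormalized.put(id, new String[] {first, last, departments.get(deptId)});
    }

    // Normalized read: two lookups, the second playing the role of a join.
    String readNormalized(String id) {
        String[] e = employees.get(id);
        return e[0] + " " + e[1] + ", " + departments.get(e[2]);
    }

    // Denormalized read: one lookup by key, no join.
    String readDenormalized(String id) {
        String[] e = employeesDenormalized.get(id);
        return e[0] + " " + e[1] + ", " + e[2];
    }
}
```

The trade-off moves work to the write path: renaming a department would mean updating every employee row that copied the old name.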


Page 40

Time Series – Patterns

• Examples: medical devices, energy devices/equipment, financial data
• Applications for sensors, clickstreams, historical data
• Typically requires very high-volume writes
• Usually coupled with a need to analyze or search the data using real-time analytics
• Great fit for DSE Cassandra, Solr, and Analytics nodes

One partition per StationID, with 1…N Timestamp / Value/s cells:

StationID  Timestamp          Value/s
FLGAZ101   20130611T01:01:01  74.34
           20130611T01:01:11  74.28
           20130611T01:01:21  74.41
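The layout on this slide, one partition per station with readings ordered by timestamp, can be sketched in memory as follows. StationReadings is a hypothetical class; it relies on the slide's fixed-width timestamp format so that string order matches chronological order, and it does not split partitions into time buckets (for example, per day), which high-volume series often add to keep partitions bounded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical wide-row time series: one partition per station, cells sorted by timestamp.
class StationReadings {
    // Partition key (station id) -> readings ordered by timestamp, like clustering columns.
    private final Map<String, TreeMap<String, Double>> partitions = new HashMap<>();

    // Timestamps use the fixed-width yyyyMMdd'T'HH:mm:ss form, so string order equals time order.
    void record(String stationId, String timestamp, double value) {
        partitions.computeIfAbsent(stationId, k -> new TreeMap<>()).put(timestamp, value);
    }

    // Slice query: one station's readings in [from, to), in chronological order.
    List<Double> slice(String stationId, String from, String to) {
        TreeMap<String, Double> p = partitions.get(stationId);
        if (p == null) {
            return List.of();
        }
        return new ArrayList<>(p.subMap(from, true, to, false).values());
    }

    // Most recent reading, taken from the end of the sorted partition.
    Double latest(String stationId) {
        TreeMap<String, Double> p = partitions.get(stationId);
        return (p == null || p.isEmpty()) ? null : p.lastEntry().getValue();
    }
}
```

Readings can arrive out of order; the sorted partition still returns them chronologically, which is what the "retrieved in chronological order" question on the earlier modeling slide asks for.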

Page 41

Hardware
• Ideal node:
  • Processor: 8 CPU cores
  • Memory: 16 to 64 GB of RAM, with an 8 GB heap
  • Network: at least a Gigabit card
  • Disks: lots of small disks using JBOD or basic RAID (0 or 10), but prefer SSDs
• Exact needs vary by use case
• Production planning:
  • http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/architecture/architecturePlanningHardware_c.html

Page 42

Cassandra Query Language (CQL)

• Very similar to RDBMS SQL syntax
• Create objects via DDL (e.g., CREATE…)
• Core DML commands supported: INSERT, UPDATE, DELETE
• Query data with SELECT
• Leverage Java drivers to execute queries via PreparedStatements and ResultSets

SELECT * FROM USERS WHERE STATE = 'TX';

(Filtering on a non-key column such as STATE typically requires a secondary index on that column.)

Page 43

Cassandra is Durable

[Figure: a client write goes to the Commit Log and to a MemTable in memory; MemTables are flushed to disk as SSTables]

Data is organized into Partitions

Inserted data is written to a Commit Log for a node

As well as a MemTable

MemTables are flushed to disk in an SSTable based on size.

SSTables are immutable

Page 44

Overview of Replication in Cassandra

• Replication is controlled by what is called the replication factor. A replication factor of 1 means there is only one copy of a row in a cluster. A replication factor of 2 means there are two copies of a row stored in a cluster

• Replication is controlled at the keyspace level in Cassandra

[Figure: an original row and two copies of it on other nodes]

Replication factor (RF) determines how many nodes hold a copy of each partition; e.g., with RF=3 each partition is stored on three nodes.

Page 45

Data Model

• The schema used in Cassandra is modeled after Google Bigtable. It is a row-oriented, column-based structure
• A keyspace is akin to a database in the RDBMS world
• A column family is similar to an RDBMS table but is more flexible/dynamic
• A row in a column family is indexed by its key

[Figure: a Portfolio Keyspace containing a Customer Column Family with columns ID, Name, SSN, and DOB]

Page 46

Tunable Data Consistency

• Choose between strong and eventual consistency (from one to all replicas responding) depending on the need
• Can be done on a per-operation basis, and for both reads and writes
• Handles multi-data center operations

Writes: Any, One, Quorum, Local_Quorum, Each_Quorum, All
Reads: One, Quorum, Local_Quorum, Each_Quorum, All
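The arithmetic behind these levels can be sketched as follows, assuming the usual definitions: QUORUM needs floor(RF / 2) + 1 replicas, taken over the sum of all data centers' replication factors in a multi-data-center keyspace, while LOCAL_QUORUM counts only the local data center's RF. ConsistencyMath and its methods are hypothetical helpers, not part of any driver.

```java
import java.util.Map;

// Hypothetical helpers illustrating consistency-level arithmetic (not a driver API).
class ConsistencyMath {
    // QUORUM (and LOCAL_QUORUM within one data center) = floor(RF / 2) + 1 replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Across data centers, QUORUM is taken over the sum of all data centers' RFs.
    static int quorumAcrossDcs(Map<String, Integer> rfByDc) {
        int total = 0;
        for (int rf : rfByDc.values()) {
            total += rf;
        }
        return quorum(total);
    }

    // LOCAL_QUORUM only counts replicas in the coordinator's own data center.
    static int localQuorum(Map<String, Integer> rfByDc, String localDc) {
        return quorum(rfByDc.get(localDc));
    }

    // Reads are guaranteed to overlap the latest acknowledged write when R + W > RF.
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    // Replicas that can be down while an operation needing `required` replicas still succeeds.
    static int toleratedFailures(int required, int replicationFactor) {
        return replicationFactor - required;
    }
}
```

For example, with RF = 3, reading and writing at QUORUM (2 + 2 > 3) is strongly consistent and still tolerates one replica down, whereas ONE for both (1 + 1 = 2) is eventually consistent. With RF 3 in each of two data centers, QUORUM needs 4 replicas overall but LOCAL_QUORUM needs only 2 in the local data center.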

Page 47

Thank You