Modern Data Architecture for
Today’s Application Needs
Alexander Gauthier
Principal Solutions Engineer
Strategic Accounts
About This
Presenter…Southern California
DataStax – SE, Strategic Accounts
Hortonworks – Solutions Engineer
Teradata – Engineering, Pre-sales
Aster Data – Principal Engineer
Informatica – Senior Engineer
© DataStax, All Rights Reserved.
Legacy approach
to HA/DR
• Expensive DB
• Expensive Shared
Storage
• Additional Replication
Solution (SRDF, GoldenGate,
Hitachi, etc.)
• Expensive Clustering
Solution (VRTS Cluster
Server, others)
• Now Double all that!
Veritas Cluster Server Service
Let’s take a moment
© 2017 DataStax, All Rights Reserved. Company Confidential
billions
Requirements for today’s applications
CONTEXTUAL, ALWAYS-ON, DISTRIBUTED, SCALABLE, REAL-TIME
Apps have changed
Client/Server (1990s), Web (2000s), Cloud (Today)
And what powers them
DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States
and/or other countries.
Scale-out App Layer
Scale-out Data Layer
Cloud Application Characteristics
Real-Time, Distributed, Always-On, Contextual, Scalable
Powering Cloud Applications
Effortless scale
Always-on
• Designed to handle any failure, no matter how catastrophic.
• Take advantage of every opportunity.
• Focus on what matters most to you.
Instant insight
• Built into your application to create actionable, modern experiences.
How it works
DSE Architecture
[Diagram: an application layer (Customer Experience, Fraud Detection, IoT, Recommendation Engine, Enterprise Search) reaches DSE through JDBC/ODBC and a RESTful API. DSE offers Analytics, Analytical SQL, Machine Learning, Search, and Graph on top of Apache Cassandra™, which provides storage for any type of data (fault-tolerant, scalable, performant, secure, unified). DataCenter 1 and DataCenter 2 each carry transactional and analytics workloads.]
Terminology
• Node: A single instance
• Rack: A logical grouping of nodes (optional)
• Data Center: A logical grouping of racks or nodes
• Cluster: A logical grouping of data centers (1 to N)
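The hierarchy above can be sketched as nested data structures. This is only an illustrative model, not DSE code: the class names, the `build_example_cluster` helper, and the two-racks-of-two-nodes layout are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A single instance."""
    name: str


@dataclass
class Rack:
    """An optional logical grouping of nodes."""
    name: str
    nodes: List[Node] = field(default_factory=list)


@dataclass
class DataCenter:
    """A logical grouping of racks (or of nodes directly)."""
    name: str
    racks: List[Rack] = field(default_factory=list)

    def nodes(self) -> List[Node]:
        return [node for rack in self.racks for node in rack.nodes]


@dataclass
class Cluster:
    """A logical grouping of one or more data centers."""
    name: str
    data_centers: List[DataCenter] = field(default_factory=list)

    def node_count(self) -> int:
        return sum(len(dc.nodes()) for dc in self.data_centers)


def build_example_cluster() -> Cluster:
    # Hypothetical layout: two data centers, each with two racks of two nodes.
    data_centers = []
    for dc_name in ("DC1", "DC2"):
        racks = [
            Rack(rack_name, [Node(f"{dc_name}-{rack_name}-n{i}") for i in (1, 2)])
            for rack_name in ("RAC1", "RAC2")
        ]
        data_centers.append(DataCenter(dc_name, racks))
    return Cluster("example", data_centers)
```

Calling `build_example_cluster().node_count()` walks cluster, data center, rack, and node in that order, which mirrors how the four terms nest.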
[Diagram: two data centers, DC1 and DC2, with racks RAC1 and RAC2.]
Read/Write Anywhere
• Cassandra has a ‘location independence’
architecture, which allows any user to connect
to any node in any data center and read/write
the data they need
• All writes are automatically evenly
partitioned/distributed across the nodes and
replicated automatically throughout the cluster
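A toy model may help make the partitioning and replication described above concrete. The sketch below assumes an eight-node ring with tokens 10 through 80 and a replication factor of 3; the hash function, the ring size, and all function names are assumptions for illustration. Real Cassandra uses its own partitioner and, typically, virtual nodes, so treat this as a simplification rather than the actual placement algorithm.

```python
import bisect
import hashlib
from typing import List, Set

# Hypothetical ring: eight nodes, each owning the token range ending at its token.
TOKENS = [10, 20, 30, 40, 50, 60, 70, 80]
RING_SIZE = 80  # tokens fall in 1..80 in this toy model


def token_for(partition_key: str) -> int:
    """Hash a partition key onto the toy ring (a stand-in for a real partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RING_SIZE + 1


def replicas_for_token(token: int, replication_factor: int = 3) -> List[int]:
    """Return the owning node plus the next nodes clockwise around the ring."""
    start = bisect.bisect_left(TOKENS, token) % len(TOKENS)
    return [TOKENS[(start + i) % len(TOKENS)] for i in range(replication_factor)]


def live_replicas(token: int, down: Set[int], replication_factor: int = 3) -> List[int]:
    """Replicas that can still serve the token when some nodes are down."""
    return [n for n in replicas_for_token(token, replication_factor) if n not in down]
```

With these assumptions, a key whose token is 25 lands on nodes 30, 40, and 50. If node 30 is down, `live_replicas(25, {30})` still returns two nodes that hold the data, and a client may contact any node to reach them.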
[Diagram: a client connects to a ring of eight nodes labeled 10, 20, 30, 40, 50, 60, 70, and 80.]
No single point of failure
• Best in class fault tolerance
• Replication automatically handled
• Remains operationally simple at scale
[Diagram: a node fails or goes down temporarily; the client can still retrieve the data from the other 2 nodes.]
Multi-data center
• Replicate data across data
centers or cloud availability zones
• No interruption to the business with
any outage
• Global low-latency performance
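The multi-data-center behaviour above can be sketched as a client that prefers its local data center and falls back to another one during an outage. This is a minimal model under stated assumptions: the data center names `west` and `east`, the replica counts, and the function names are hypothetical, and real drivers implement this with their own load-balancing policies.

```python
from typing import Dict, Optional, Set

# Hypothetical per-data-center replication settings (replica count per DC).
REPLICATION: Dict[str, int] = {"west": 3, "east": 3}


def choose_data_center(local_dc: str, down_dcs: Set[str]) -> Optional[str]:
    """Prefer the client's local data center; fall back to any other live one."""
    if local_dc in REPLICATION and local_dc not in down_dcs:
        return local_dc
    for dc in REPLICATION:
        if dc not in down_dcs:
            return dc
    return None  # every data center is unavailable


def total_copies() -> int:
    """Each row is stored REPLICATION[dc] times in every data center."""
    return sum(REPLICATION.values())
```

Because every data center holds its own full set of replicas in this model, losing one site changes where the client connects but not whether the data is reachable.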
[Diagram: a client and two eight-node rings (nodes 10 through 80 and nodes 15 through 85) labeled West Data Center and East Data Center. After an outage, the client continues to work against the ring with nodes 10 through 80.]
Flexible deployment options
Cloud: take full advantage of the cloud’s
elasticity and global availability. With easy
migrations to any and every cloud provider, you’ll
never be locked in.
Hybrid: have the best of both worlds, spanning a
single cluster across on-premises and cloud.
Hub and spoke: have a central hub with many
spokes. Perfect for intermittent connections,
compliance, or optimizing for location needs.
Linear scalability
• Data partitioned among all nodes in the cluster
• Linear scalability (performance / storage)
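The arithmetic behind linear scalability is simple enough to write down. The sketch below takes the baseline figures shown on this slide (50,000 trans/sec and 500 GB) and assumes ideal linear scaling, where multiplying the node count multiplies both throughput and storage. The function name and the `scale_factor` parameter are assumptions, and real clusters will deviate from the ideal.

```python
from typing import Tuple

# Baseline figures taken from the slide.
BASE_TPS = 50_000
BASE_STORAGE_GB = 500


def linear_capacity(scale_factor: int) -> Tuple[int, int]:
    """Ideal linear scaling: throughput and storage both grow with node count."""
    if scale_factor < 1:
        raise ValueError("scale_factor must be at least 1")
    return BASE_TPS * scale_factor, BASE_STORAGE_GB * scale_factor


for factor in (1, 2, 4):
    tps, storage_gb = linear_capacity(factor)
    print(f"{factor}x nodes: {tps:,} trans/sec, {storage_gb:,} GB")
```

Doubling and quadrupling the baseline reproduces the slide's other two data points (100,000 trans/sec with 1 TB, and 200,000 trans/sec with 2 TB), taking 1 TB as 1,000 GB.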
[Diagram: a cluster grows from 50,000 trans/sec and 500 GB, to 100,000 trans/sec and 1 TB, to 200,000 trans/sec and 2 TB.]
DataStax Reference Architecture
[Diagram: HTTP Application, Message Queue, Streaming Analytics, Batch Analytics, Real-time.]
Hybrid Multi-Cloud Architectures
• Distribute Fault
• Negotiate Locality
• Regional Compliance
[Diagram: an App Tier above AWS, Azure, GCP, and a physical DC, with replication between them.]
Cassandra Data Model
• Row-oriented, column structure
• Keyspace: similar to a database in
the RDBMS world
• Table: similar to an RDBMS table
but more flexible/dynamic
• A row in a table (formerly called a column family) is indexed by
its key.
• Other columns may be indexed as well
[Diagram: the Portfolio keyspace containing a Customer table with columns ID, Name, SSN, and DOB.]
Cassandra Query Language (CQL)
• Syntax similar to RDBMS SQL
• Create objects via DDL, change data via DML, manage permissions via DCL
For example:
CREATE, INSERT, UPDATE,
DELETE, GRANT, REVOKE,
SELECT, WHERE
CQL Example
CREATE TABLE market_prices (
    symbol TEXT,
    date TIMESTAMP,
    price DECIMAL,
    side INT,
    PRIMARY KEY (symbol, date)
) WITH CLUSTERING ORDER BY (date DESC);
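To illustrate what the primary key and clustering order in this table imply, here is a small in-memory sketch. It assumes `symbol` is the partition key and `date` is the clustering column, so rows for one symbol are stored together and kept newest first. The `insert` and `latest` helpers and the `DSE` symbol used in the usage note are hypothetical; this is not how the database stores data internally.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal
from typing import Dict, List, Tuple

Row = Tuple[datetime, Decimal, int]  # (date, price, side)

# partition key (symbol) -> rows sorted by clustering column (date), newest first
partitions: Dict[str, List[Row]] = defaultdict(list)


def insert(symbol: str, date: datetime, price: Decimal, side: int) -> None:
    """Upsert on the full primary key (symbol, date), then keep date DESC order."""
    rows = [r for r in partitions[symbol] if r[0] != date]
    rows.append((date, price, side))
    rows.sort(key=lambda r: r[0], reverse=True)
    partitions[symbol] = rows


def latest(symbol: str, limit: int = 1) -> List[Row]:
    """Roughly: SELECT * FROM market_prices WHERE symbol = ? LIMIT ?"""
    return partitions[symbol][:limit]
```

For example, after inserting prices for a hypothetical symbol `DSE` on three different dates, `latest("DSE")` returns the most recent one without any sorting at read time, which is the point of declaring the clustering order up front.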
Write Path
[Diagram: writes go to the COMMIT LOG (on disk) and the MEMTABLE (in memory). A memtable with columns ID, NAME, DOB holding AB1 John Smith 10/11/1972, AB2 Bob Jones 3/1/1964, and ZZ3 Mike West 4/22/1968 is flushed sequentially to an SSTABLE on disk. A later memtable holding AB3 Mary Smith 1/11/1982, AB4 Jane Hess 3/1/1992, and AB1 Jonny Smith 10/11/1972 is flushed the same way. A combined table lists AB3 Mary Smith, AB4 Jane Hess, AB1 Jonny Smith, ZZ3 Mike West, and AB2 Bob Jones.]
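The write path in this diagram can be approximated with a toy store. The sketch below is a simplification under stated assumptions: the `ToyStore` class and its method names are hypothetical, and "newest wins" is decided here by flush order, whereas Cassandra resolves versions with per-write timestamps.

```python
from typing import Dict, List, Optional


class ToyStore:
    """Toy model of the write path: commit log, memtable, SSTables."""

    def __init__(self) -> None:
        self.commit_log: List[tuple] = []          # append-only; on disk in reality
        self.memtable: Dict[str, dict] = {}        # in memory, latest value per key
        self.sstables: List[Dict[str, dict]] = []  # immutable once flushed, oldest first

    def write(self, key: str, row: dict) -> None:
        self.commit_log.append((key, row))  # durability first
        self.memtable[key] = row            # then the in-memory table

    def flush(self) -> None:
        """Write the memtable out as a key-sorted SSTable and start a fresh one."""
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}
            self.commit_log.clear()  # flushed data no longer needs the log

    def read(self, key: str) -> Optional[dict]:
        """Newest data wins: check the memtable, then SSTables newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None

    def compact(self) -> None:
        """Merge all SSTables into one, keeping the newest version of each key."""
        merged: Dict[str, dict] = {}
        for sstable in self.sstables:  # oldest first, so later tables overwrite
            merged.update(sstable)
        self.sstables = [dict(sorted(merged.items()))] if merged else []
```

Replaying the slide's rows shows the behaviour: write AB1 (John Smith), AB2, and ZZ3, flush, then write AB3, AB4, and an updated AB1 (Jonny Smith) and flush again. A read of AB1 returns Jonny Smith, and `compact()` leaves a single table with five rows.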
Putting it to use
Integrated Multi-Model/Mixed Workload Platform
• Indexing & Search
• Streaming Analytics
• Graph
• Batch Analytics
DSE Search
Live indexing engine with powerful search
• Automatic indexing on insert
• Higher ingestion throughput
• Distributed query optimization
Compared to self-managed:
• No separate search cluster to manage
• Probably less total hardware required
• No “Split Brain” data inconsistencies
• No ETL or sync to build and maintain
• No app level data management code
[Diagram: your application issues CQL to a single platform combining Search and Cassandra.]
DSE Analytics
[Diagram: a single DSE cluster. Your application performs real-time operations against Cassandra nodes, data is replicated in real time to Analytics nodes, and your analytics queries run against the Analytics nodes.]
Streaming, ad-hoc, and batch
• High-performance
• Workload management
• SQL reporting
Compared to self-managed:
• No ETL
• True HA without Zookeeper
DSE Graph
A scalable, distributed graph database that is optimized for storing, traversing
and querying complex graph data in real time
• Derive value from the relationships between data
• DSE Analytics and Search integrated
• Perfect for use cases: Customer360,
Recommendations, Fraud Detection
©2016 DataStax
RDBMS vs. Graph
• A key difference between a graph database and an RDBMS is how relationships between
entities/vertexes are prioritized and managed.
• While an RDBMS uses foreign keys to connect entities in a secondary fashion, edges (the relationships)
in a graph database are of first order importance.
• Relationships are explicitly embedded in a graph data model.
• A graph-shaped business problem is one in which the concern is more with the relationships (edges) among
entities (vertexes) than with the entities in isolation.
Description | RDBMS | Graph
An identifiable “something” or object to keep track of | Entity | Vertex
A connection or reference between two objects | Relationship | Edge
A characteristic of an object | Attribute | Property
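The vertex, edge, and property vocabulary can be sketched as a tiny in-memory property graph. This is an illustration only: the class names, the `out` traversal helper, and the customer/device example in the usage note are assumptions, and it is not the DSE Graph or Gremlin API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Vertex:
    id: str
    label: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_v: str  # id of the source vertex
    in_v: str   # id of the target vertex
    properties: Dict[str, str] = field(default_factory=dict)


class Graph:
    def __init__(self) -> None:
        self.vertices: Dict[str, Vertex] = {}
        self.edges: List[Edge] = []

    def add_vertex(self, vertex: Vertex) -> None:
        self.vertices[vertex.id] = vertex

    def add_edge(self, edge: Edge) -> None:
        self.edges.append(edge)

    def out(self, vertex_id: str, label: str) -> Set[str]:
        """Follow outgoing edges with the given label (one traversal step)."""
        return {e.in_v for e in self.edges if e.out_v == vertex_id and e.label == label}
```

As a usage example, add two hypothetical `customer` vertices and one `device` vertex, connect both customers to the device with `uses` edges, and call `out("alice", "uses")`. The relationship is stored as data in its own right, so the question "who shares a device?" is answered by walking edges rather than joining tables.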
RDBMS vs. Graph
RDBMS | Graph
Simple to moderate data complexity | Heavy data complexity
Hundreds of potential relationships | Hundreds of thousands to millions or billions of potential relationships
Moderate JOIN operations with good performance | Heavy to extreme JOIN operations required
Infrequent to no data model changes | Constantly changing and evolving data model
Static to semi-static data changes | Dynamic and constantly changing data
Primarily structured data | Structured and unstructured data
Nested or complex transactions | Simple transactions
Always strongly consistent | Tunable consistency (eventual to strong)
High availability (handled with failover) | Continuous availability (no downtime)
Centralized application that is location dependent (e.g. single location), especially for write operations and not just read | Distributed application that is location independent (multiple locations involving multiple data centers and/or clouds) for write and read operations
Scale up for increased performance | Scale out for increased performance (for some graph DBs)
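The "tunable consistency (eventual to strong)" entry in the comparison can be made concrete with the usual replica-overlap rule: if the number of replicas that acknowledge a write plus the number consulted on a read exceeds the replication factor, every read overlaps the latest write. The helper names below are assumptions for illustration.

```python
def quorum(replication_factor: int) -> int:
    """A majority of the replicas."""
    return replication_factor // 2 + 1


def is_strongly_consistent(write_replicas: int, read_replicas: int,
                           replication_factor: int) -> bool:
    """Reads are guaranteed to see the latest write when W + R > RF."""
    return write_replicas + read_replicas > replication_factor
```

With a replication factor of 3, writing and reading at quorum (2 + 2 > 3) is strongly consistent, while writing to one replica and reading from one (1 + 1) is only eventually consistent. The application picks the trade-off per operation.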
Intelligent
Data Layer
• Logic
• Learning
• Understanding
• Reasoning
• Retention
Deloitte:
Mission Graph
1. Explore and Analyze: Connect the dots
across multiple datasets with a single
search.
2. Visualize Sophisticated Networks:
Create network diagrams that illustrate
relationships and identify associations.
3. Enrich Analysis with Public Data: Link
open-source data to help improve insight
into high-value entities.
4. Link Unstructured Relationships: Extract
unstructured data and pair it to structured
data.
5. Machine Learning: Continuously improve
targeting algorithms through intelligent
machine learning. The more you use
Mission Graph, the smarter it gets.
6. Proactively Manage Network Risk:
Diagnose how influencers and events
create risk to identify similar patterns and
help prevent future incidents.
Build and Manage
Advanced security
External Authentication
• External validation of authorized users
• Leverages Kerberos & LDAP/AD
• Single sign-on to all data domains
Transparent Data Encryption
• Protects sensitive data at rest via encryption
• No changes needed at app level
Data Auditing
• Audit trail of all accesses and changes
• Control to audit only what’s needed
• Uses the log4j interface for performant, efficient auditing
Build how you want
Drivers
• ODBC
• JDBC
DevCenter
Friendly GUI for CQL developers
• Visually Create and Navigate Database Objects
• Tune Queries for Faster Performance
• Powerful Context-Aware Editor
Studio
Explore, query, and analyze DSE Graph
• Gremlin Query Language
• Auto-completion, result set visualization, execution management, and much more.
• Friendly Fluent API
OpsCenter
Visual management for DSE
• Automate what no one likes – backups, repairs
• REST API to work in your world
• Instantly manage your cluster, scaling up or down at a moment’s notice
• Monitor your cluster and follow best practices, ensuring a secure environment
Q&A