How RightScale Architects its Databases (for World-wide Scale, HA and DR scenarios)
Josep Blanquer
Senior Systems Architect, RightScale
Talk with the Experts.
Menu
Intro
Data Taxonomy
Data Storage Design: Scale, HA and DR
Conclusion
Intro: Expectations and scope
What this is and what is not
• IS a talk about:
• how RightScale has designed and implemented its backing datastores
• …for a few of the most representative internal systems
• …with the rationale behind it
• Is NOT a talk about:
• RightScale’s overall architecture
• Nodes or hosts; it’s about Systems
• RightScale’s data modeling
Note: Most of the design is implemented and in production, but some of the
most advanced pieces are still in beta or still being worked on
Intro: Tools and Technologies
• RightScale uses a mix of RDBMS and NoSQL technologies:
• MySQL, Cassandra and S3 (for backups and archiving)
• Transactionality:
• MySQL: strong ACID properties
• Cassandra: no Atomicity, eventually Consistent, some Isolation, Durable
• Availability:
• MySQL: async replication. Master to N-Slaves or Master-Master
• Cassandra: Distributed, master-less, highly-replicated (multi-DC)
• Sharding:
• MySQL: no explicit inter-node tools. (Sharding done by application)
• Cassandra: partitions data internally across nodes.
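Because MySQL has no inter-node sharding tools, the application itself has to decide which database serves each request. A minimal sketch of that idea, assuming data is sharded by account ID with a stable hash; the shard names and the `shard_for` helper are hypothetical and not taken from RightScale’s code:

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to MySQL hosts.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2"]


def shard_for(account_id: int, shards=SHARDS) -> str:
    """Pick a shard from a stable hash of the account ID.

    hashlib is used instead of the built-in hash() so that the mapping
    stays the same across processes and restarts.
    """
    digest = hashlib.sha256(str(account_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

A fixed hash like this keeps routing stateless, but changing the number of shards remaps accounts. An explicit account-to-shard lookup table is the usual alternative when individual accounts need to be moved.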
Taxonomy of RightScale’s Data
Representative systems with different data semantics:
Global Objects
Marketplace Assets
Dashboard Objects
Audits
Tags
Recent Events
Cloud Polling Data
Routing Data
Monitoring/Syslog
Global Objects and Marketplace Assets are common across accounts:
• Users
• Plans
• Settings
• MultiCloud Marketplace: Published Assets
• Sharing Groups
• …
Dashboard Objects, Audits, Tags and Recent Events are private to each account:
• Deployments
• Imported assets
• Alert Specifications
• Server Inputs Audit
• Tags
• User Events
• …
Cloud Polling Data is private to each account:
• Cloud resource states (cache)
• Cloud credentials
Routing Data is private to each account:
• Instance agents location
• Core agents location
• Agent action registry
• …
Monitoring/Syslog data is private to each account:
• Collected metric data
• Collected syslog data
• …
Taxonomy of RightScale’s Data: Data Placement
Who uses the data?
• Users through the Dash/API
• Instances from the Cloud
[Diagram: the same data categories arranged between Users (data close to the Users) and Instances (data close to the Cloud)]
Taxonomy of RightScale’s Data: Data scope and containment
Which data do we need?
• Data for all accounts
• Data for a single account
[Diagram: the same data categories arranged between X-acct (data shared between accounts) and Account (data required within the scope of a single account)]
Taxonomy of RightScale’s Data
Who uses the data? Proximity to User vs. Cloud
Which data do we need? Scope of data available
[Diagram: the data categories placed on a grid of Users/Instances by X-acct/Account, giving three groups]
• Close to user, globally accessible data
• Close to user, account-shardable data
• Close to cloud resources, account-shardable* data
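The two questions above define two axes, proximity and scope, and each system lands in one cell. A small sketch of that classification; the assignment of individual systems to cells is an assumption inferred from the slide layout and the later diagrams, not something the text states outright, and all type names are hypothetical:

```python
from dataclasses import dataclass
from enum import Enum


class Proximity(Enum):
    USER = "close to user"
    CLOUD = "close to cloud resources"


class Scope(Enum):
    GLOBAL = "globally accessible"
    ACCOUNT = "account-shardable"


@dataclass(frozen=True)
class Placement:
    proximity: Proximity
    scope: Scope


# Assumed assignment, inferred from the slides rather than stated in the text.
PLACEMENT = {
    "Global Objects": Placement(Proximity.USER, Scope.GLOBAL),
    "Marketplace Assets": Placement(Proximity.USER, Scope.GLOBAL),
    "Dashboard Objects": Placement(Proximity.USER, Scope.ACCOUNT),
    "Audits": Placement(Proximity.USER, Scope.ACCOUNT),
    "Tags": Placement(Proximity.USER, Scope.ACCOUNT),
    "Recent Events": Placement(Proximity.USER, Scope.ACCOUNT),
    "Cloud Polling Data": Placement(Proximity.CLOUD, Scope.ACCOUNT),
    "Routing Data": Placement(Proximity.CLOUD, Scope.ACCOUNT),
    "Monitoring/Syslog": Placement(Proximity.CLOUD, Scope.ACCOUNT),
}


def systems_in(proximity: Proximity, scope: Scope) -> list:
    """List the systems that fall into one cell of the grid."""
    wanted = Placement(proximity, scope)
    return [name for name, placement in PLACEMENT.items() if placement == wanted]
```

Under this reading the fourth cell (close to cloud, globally accessible) stays empty, which matches the three groups listed on the slide.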
[Diagram: grid with Users and Instances on one axis, Account and X-Account on the other]
[Diagram: the global datastore placed in the X-Account area of the grid, fed by custom replication]
Global: MySQL
• ACID semantics
• Master-Slave replication
Custom replication
Why custom? More control:
• Multiple sources
• Individual columns
• Apply transformations
• Smart re-sync features
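The slide does not show how the custom replication is built. As a rough illustration of three of the listed capabilities (multiple sources, individual columns, transformations), here is a sketch over in-memory rows; the `replicate` function and its parameters are hypothetical, and smart re-sync is not modelled:

```python
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, object]
Transform = Callable[[object], object]


def replicate(
    sources: Iterable[List[Row]],
    columns: Dict[str, Optional[Transform]],
    key: str = "id",
) -> Dict[object, Row]:
    """Build a replica keyed by `key` from several source tables.

    Only the columns named in `columns` are copied; a non-None value is a
    function applied to the column before it is stored.
    """
    replica: Dict[object, Row] = {}
    for table in sources:
        for row in table:
            copied: Row = {key: row[key]}
            for name, transform in columns.items():
                if name in row:
                    value = row[name]
                    copied[name] = transform(value) if transform else value
            # Later sources overwrite earlier ones for the same key.
            replica.setdefault(row[key], {}).update(copied)
    return replica
```

For example, calling `replicate([users_a, users_b], {"email": str.lower})` on two user tables would copy only the key and a lower-cased `email` column, leaving every other column behind.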
[Diagram: the grid now showing the global, dash, S3, events, tags and audit datastores on the Users side]
Dashboard: MySQL
• ACID semantics
• Master to N-Slaves replication
• Slave reads
• Rows tagged by account
Other systems: Cassandra
• Simpler Key-Value access
• Great scalability
• Great replica control
• High write availability
• Time-to-live expiration as cache
• Rows tagged by account
Data archive: S3
• Low read rate
• Globally accessible
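Cassandra lets a write carry a time-to-live, after which the data is no longer returned, which is what makes it usable as a cache. The sketch below models only that behaviour in plain Python; it is not Cassandra client code, and the `TTLStore` class with its injectable clock is hypothetical:

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLStore:
    """In-memory key-value store whose entries expire after a time-to-live."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[object, float]] = {}

    def put(self, key: str, value: object, ttl_seconds: float) -> None:
        """Store a value together with its absolute expiry time."""
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[object]:
        """Return the value, or None if it is missing or has expired."""
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop it lazily, as if it had never been written.
            del self._data[key]
            return None
        return value
```

With rows tagged by account, a key such as `"account-7:recent-events"` (an illustrative format) would keep each account’s cached entries separate.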
[Diagram: the dash, events, tags and audit datastores in the Account area repeated as a second copy]
So we can horizontally scale our dashboard by partitioning objects based on account groups: Clusters
[Diagram: RightScale Accounts split into Account Set 1, Account Set 2, …; Users reach Cluster 1, Cluster 3, … Cluster N, each holding its own dash, S3, events, tags and audit]
Features:
• 1 cluster: N accounts
• 1 account: 1 home
• Migratable accounts
Benefits:
• Great horizontal growth
• Better failure isolation
• Independent scale
• Load rebalancing
• Versionable code
• Differentiated service
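The three features above (N accounts per cluster, one home per account, migratable accounts) suggest an explicit account-to-cluster mapping. A minimal sketch under that assumption; the `ClusterDirectory` class is hypothetical, and the actual copying of an account’s data during a migration is not shown:

```python
from collections import Counter
from typing import Dict


class ClusterDirectory:
    """Maps each account to exactly one home cluster."""

    def __init__(self) -> None:
        self._home: Dict[int, str] = {}

    def assign(self, account_id: int, cluster: str) -> None:
        """Give a new account its home; an account can only have one."""
        if account_id in self._home:
            raise ValueError(f"account {account_id} already has a home cluster")
        self._home[account_id] = cluster

    def home_of(self, account_id: int) -> str:
        return self._home[account_id]

    def migrate(self, account_id: int, new_cluster: str) -> str:
        """Move an account to another cluster and return its previous home."""
        old = self._home[account_id]
        self._home[account_id] = new_cluster
        return old

    def load(self) -> Counter:
        """Number of accounts per cluster, useful when rebalancing."""
        return Counter(self._home.values())
```

Keeping the mapping explicit, rather than deriving it from a hash, is what would allow a single account to be moved for load rebalancing or differentiated service without touching any other account.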
[Diagram: the grid extended with routing, gateway and monitor datastores on the Instances side]
[Diagram: the routing, gateway and monitor datastores on the Instances side repeated as several copies]
And partition our cloud objects based on the cloud the instances of an account run on: Islands
[Diagram: an Account’s Instances run on Cloud 1, Cloud 2, … Cloud N, with services co-located with the resources; Island 1, Island 2, … Island N each hold routing, gateway and monitor]
Features:
• 1 instance: 1 home island
• 1 Island can serve N clouds
• Core Agents: global data
Benefits:
• Close to cloud resources
• Good failure isolation
• As good as cloud
• Good scale: global replicas across Cassandra DCs
Gateway: MySQL
• Master-Slave replication
• Can port to NoSQL easily
• Mostly a resource cache
• But cloud partitionable
Monitoring: Custom
• Replicated files
• Backup to S3
• Archive to S3
Routing: Cassandra
• Simpler Key-Value access
• Very high availability
• Great scalability
• Great replica control
• Plus cross DC replication*
[Diagram: Cluster 1, Cluster 3, … Cluster N (dash, S3, events, tags, audit) on the Users side and Island 1, Island 2, … Island N (routing, gateway, monitor) on the Instances side, spread over Different Geographies and Different Clouds]
What if the cloud where the cluster is deployed fails?
Sister Clusters
[Diagram: the Clusters and Islands as before, with clusters paired as sisters, each holding a full replica of the other]
Features:
• Each master has an extra remote slave
• Each cluster in a pair is a DC replica of the other’s local ring
At Disaster Recovery time:
• Apps are told to start serving an extra shard
• No need to provision more infrastructure to recover (best avoided, since everybody is in the same boat)
• New resources can be allocated over time to help offload existing ones
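The recovery step, in which the surviving sister starts serving an extra shard, can be sketched as a simple reassignment. This is an illustration under assumed names (`SISTER`, `serving_plan`, the cluster and shard labels), not RightScale’s actual failover procedure:

```python
from typing import Dict, List, Set

# Hypothetical pairing: each cluster holds a full replica of its sister.
SISTER = {"cluster-1": "cluster-2", "cluster-2": "cluster-1"}


def serving_plan(shards: Dict[str, str], failed: Set[str]) -> Dict[str, List[str]]:
    """Return which shards each healthy cluster should serve.

    `shards` maps a shard name to its home cluster. Shards whose home has
    failed are handed to the sister cluster, which already holds a replica.
    """
    plan: Dict[str, List[str]] = {}
    for shard, home in sorted(shards.items()):
        target = SISTER[home] if home in failed else home
        if target in failed:
            raise RuntimeError(f"both clusters of the pair for {shard} are down")
        plan.setdefault(target, []).append(shard)
    return plan
```

Because the sister already has the data, the plan only changes which application servers answer for a shard; no new infrastructure appears in it, which mirrors the second bullet above.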
Conclusions
• Shown that RightScale uses multiple database technologies:
• RDBMS: MySQL for the ACID semantics and ‘queryability’
• Using a Master to N-Slaves for RO scale and quick failure recovery
• And ReadOnly Provisioning, to increase RO availability and scale remote systems
• NoSQL: Cassandra for Availability and Scalability
• For higher Read/Write availability within a cluster
• For fully replicated regions across the globe (for Read/Write!)
• Shown how RightScale applies them through different techniques:
• It partitions resource data into Islands based on cloud proximity
• Can achieve in-cloud polling, and keep monitoring/syslog data storage next to instances
• Can provide routing availability, colocated with instances for any world region
• It partitions core data into Clusters based on account groups
• To scale the core horizontally and independently, and achieve account isolation/differentiation
• Enhances fault isolation: assigning accounts to Clusters deployed away from their cloud resources
• It maintains cluster pairs (sister sites)
• To recover from full cloud region failures
• It doesn’t require massive amounts of new resources to recover
Questions?