April 25-26, San Francisco
Cloud success starts here
Building RightScale’s Globally Distributed Datastore
Josep M. Blanquer, Chief Architect
#RightscaleCompute
In this talk…
• Intro
• Data Taxonomy
• Data Storage Design
• Scale, HA and DR considerations
• Conclusion
Intro: Expectations and scope
What this is and what it is not
• IS a talk about:
  • how RightScale has designed and implemented its backing datastores
  • …for a few of the most representative internal systems
  • …with the rationale behind it
• Is NOT a talk about:
  • RightScale’s overall architecture
  • Nodes or hosts; it’s about Systems
  • RightScale’s data modeling
Intro: Tools and Technologies
• RightScale uses a mix of RDBMS and NoSQL technologies:
  • MySQL, Cassandra and S3 (for backups and archiving)
• Transactionality:
  • MySQL: strong ACID properties
  • Cassandra: no Atomicity, eventually Consistent, some Isolation, Durable
• Availability:
  • MySQL: async replication. Master-SlaveN or Master-Master
  • Cassandra: distributed, master-less, highly replicated (multi-DC)
• Queryability:
  • MySQL: extremely flexible at adding indexes and changing the data model
  • Cassandra: more difficult to change the querying patterns
Taxonomy of RightScale’s Data
Representative systems with different data semantics:
• Global Objects, Marketplace Assets
• Dashboard Objects, Audits, Tags, Recent Events
• Cloud Polling Data
• Routing Data
• Monitoring/Syslog
Global Objects, Marketplace Assets (common across accounts):
• Users, Plans, Settings
• MultiCloud Marketplace: Published Assets, Sharing Groups, …
Dashboard Objects, Audits, Tags, Recent Events (private to each account):
• Deployments, Imported assets, Alert Specifications, Server Inputs, Audits, Tags, User Events, …
Cloud Polling Data (private to each account):
• Cloud resource states (cache), Cloud credentials
Routing Data (private to each account):
• Instance agents location, Core agents location, Agent action registry, …
Monitoring/Syslog (private to each account):
• Collected metric data, Collected syslog data, …
Taxonomy of RightScale’s Data
Data Placement
Who uses the data?
• Users, through the Dash/API: data close to the users
• Instances, from the cloud: data close to the cloud
Taxonomy of RightScale’s Data
Data scope and containment
Which data do we need?
• Data for all accounts (X-acct): data shared between accounts
• Data for a single account (Account): data required within the scope of a single account
Taxonomy of RightScale’s Data
Who uses the data? Proximity to User vs. Cloud
Which data do we need? Scope of data available
• Global Objects, Marketplace Assets: close to user, globally accessible data
• Dashboard Objects, Audits, Tags, Recent Events: close to user, account-shardable data
• Cloud Polling Data, Routing Data, Monitoring/Syslog: close to cloud resources, account-shardable* data
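The two questions above can be read as a small lookup: each system is classified by who uses it and by the scope of data it needs, and that pair decides where the data should live. A minimal Python sketch of that idea follows. The enum and function names are invented for illustration, and the assignment of systems to classes is an assumption read off the slide's diagram order and the Clusters and Islands slides of this deck.

```python
from enum import Enum


class Proximity(Enum):
    USER = "close to user"
    CLOUD = "close to cloud resources"


class Scope(Enum):
    GLOBAL = "globally accessible"
    ACCOUNT = "account-shardable"


# Assumed classification of the representative systems (hypothetical keys).
TAXONOMY = {
    "global objects": (Proximity.USER, Scope.GLOBAL),
    "marketplace assets": (Proximity.USER, Scope.GLOBAL),
    "dashboard objects": (Proximity.USER, Scope.ACCOUNT),
    "audits": (Proximity.USER, Scope.ACCOUNT),
    "tags": (Proximity.USER, Scope.ACCOUNT),
    "recent events": (Proximity.USER, Scope.ACCOUNT),
    "cloud polling data": (Proximity.CLOUD, Scope.ACCOUNT),
    "routing data": (Proximity.CLOUD, Scope.ACCOUNT),
    "monitoring/syslog": (Proximity.CLOUD, Scope.ACCOUNT),
}


def systems_with(proximity, scope):
    """Return the systems that share one placement class, sorted by name."""
    wanted = (proximity, scope)
    return sorted(name for name, cls in TAXONOMY.items() if cls == wanted)
```

Grouping by the pair rather than by system name makes the three placement classes explicit, which is what the rest of the talk builds on.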
(Diagram: data placement grid with the axes Users vs. Instances and Account vs. X-Account.)
Global (X-Account): MySQL
• ACID semantics
• Master-Slave replication
Custom replication
Why custom? More control:
• Multiple sources
• Individual columns
• Apply transformations
• Smart re-sync features
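The slide does not show how the custom replication is built, so the following is only a sketch of what the listed features could look like: rows merged from several sources, restricted to chosen columns, with a per-column transformation applied on copy. The function names, the dict-based tables and the `resync` helper (one possible reading of "smart re-sync") are all assumptions, not RightScale code.

```python
def replicate(sources, columns, transforms=None):
    """Merge rows from several source tables into one replica.

    sources    -- list of dicts mapping primary key -> row (dict of columns)
    columns    -- names of the columns to copy; all other columns are dropped
    transforms -- optional dict mapping column name -> function applied on copy
    """
    transforms = transforms or {}
    replica = {}
    for source in sources:
        for key, row in source.items():
            target = replica.setdefault(key, {})
            for column in columns:
                if column in row:
                    convert = transforms.get(column, lambda value: value)
                    target[column] = convert(row[column])
    return replica


def resync(replica, fresh):
    """Return the keys whose replicated row differs from a fresh copy."""
    return sorted(key for key, row in fresh.items() if replica.get(key) != row)


# Hypothetical example data: two source tables keyed by user id.
users = {1: {"email": "A@Example.com", "password_hash": "x"}}
plans = {1: {"plan": "premium"}}
replica = replicate([users, plans], ["email", "plan"], {"email": str.lower})
```

In this example the replica holds only the `email` and `plan` columns, and `password_hash` never leaves the source, which is the kind of control stock master-slave replication does not give.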
Dashboard: MySQL
• ACID semantics
• Master-SlaveN replication
• Slave reads
• Rows tagged by account
Other systems (events, tags, audit): Cassandra
• Simpler Key-Value access
• Great scalability
• Great replica control
• High write availability
• Time-to-live expiration as cache
• Rows tagged by account
Data archive: S3
• Low read rate
• Globally accessible
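Two of the properties above, time-to-live expiration used as a cache and rows tagged by account, can be illustrated without a database. The sketch below is an in-memory stand-in, not a Cassandra client: the class name and its methods are hypothetical, and in Cassandra itself the expiry would be requested per write (CQL offers `USING TTL`) rather than checked on read.

```python
import time


class TtlStore:
    """Key-value store whose entries expire; every row is tagged by account."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._rows = {}      # (account_id, key) -> (value, expires_at)

    def put(self, account_id, key, value, ttl_seconds):
        self._rows[(account_id, key)] = (value, self._clock() + ttl_seconds)

    def get(self, account_id, key):
        entry = self._rows.get((account_id, key))
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the row so stale cache entries do not accumulate.
            del self._rows[(account_id, key)]
            return None
        return value
```

Because the account id is part of every key, all rows of one account can be located and moved together, which is what makes the data shardable by account.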
So we can horizontally scale our dashboard by partitioning objects based on account groups: Clusters
Clusters
(Diagram: RightScale accounts are split into account sets (Account Set 1, Account Set 2, …); each set is served by one cluster (Cluster 1 … Cluster N) with its own dash, events, tags, audit and S3 stores. Cluster locations shown: US East, EU, Japan.)
Features:
• 1 cluster: N accounts
• 1 account: 1 home
• Migratable accounts
Benefits:
• Great horizontal growth
• Better failure isolation
• Independent scale
• Load rebalancing
• Versionable code
• Differentiated service
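The three cluster features (one cluster serves many accounts, each account has exactly one home, accounts can be migrated) amount to a directory from account to cluster. A minimal sketch, assuming a hypothetical `ClusterDirectory` class and made-up cluster names; the real lookup mechanism is not described in the talk.

```python
class ClusterDirectory:
    """Tracks the single home cluster of every account."""

    def __init__(self):
        self._home = {}  # account_id -> cluster name

    def assign(self, account_id, cluster):
        # 1 account: 1 home, so a second assignment is rejected.
        if account_id in self._home:
            raise ValueError(f"account {account_id} already has a home cluster")
        self._home[account_id] = cluster

    def home_of(self, account_id):
        return self._home[account_id]

    def migrate(self, account_id, new_cluster):
        """Move an account to another cluster and return its previous home."""
        old_cluster = self._home[account_id]
        self._home[account_id] = new_cluster
        return old_cluster

    def accounts_in(self, cluster):
        # 1 cluster: N accounts.
        return sorted(a for a, c in self._home.items() if c == cluster)
```

Keeping the mapping explicit, instead of hashing the account id, is what allows an individual account to be moved for load rebalancing or differentiated service.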
(Diagram: routing, polling and monitor stores are added on the Instances side, next to the global, dash, events, tags, audit and S3 stores on the Users side.)
And partition our cloud objects based on the cloud the instances of an account run on: Islands
(Diagram: Island 1, Island 2 … Island N each run their own routing, polling and monitor services, co-located with the resources of Cloud 1, Cloud 2 … Cloud N, where an account’s instances run.)
Islands
Features:
• 1 instance: 1 home island
• 1 island can serve N clouds
• Core Agents: global data
Benefits:
• Close to cloud resources
• Good failure isolation
  • As good as cloud
• Good scale: global replicas across Cassandra DCs
Polling Clouds: MySQL
• Master-Slave replication
• Can port to NoSQL easily
• Mostly a resource cache
• But cloud partitionable
Monitoring: Custom
• Replicated files
• Backup to S3
• Archive to S3
Routing: Cassandra
• Simpler Key-Value access
• Very high availability
• Great scalability
• Great replica control
• Plus cross-DC replication*
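The island features can be sketched the same way as the clusters, except that the partition key is the cloud an instance runs on rather than the account. The class, method and cloud names below are hypothetical and only illustrate "1 instance: 1 home island" and "1 island can serve N clouds".

```python
class IslandMap:
    """Maps clouds to islands and instances to their home island."""

    def __init__(self, cloud_to_island):
        # One island can serve several clouds, so values may repeat.
        self._cloud_to_island = dict(cloud_to_island)
        self._instance_home = {}  # instance_id -> island

    def register_instance(self, instance_id, cloud):
        """Give an instance its home island, chosen by the cloud it runs on."""
        island = self._cloud_to_island[cloud]
        self._instance_home[instance_id] = island
        return island

    def home_island(self, instance_id):
        return self._instance_home[instance_id]

    def clouds_served_by(self, island):
        return sorted(c for c, i in self._cloud_to_island.items() if i == island)
```

Routing, polling and monitoring requests for an instance would then go to its home island, which keeps that traffic next to the cloud resources.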
(Diagram: Cluster 1 … Cluster N, each with dash, events, tags, audit and S3, are deployed in different geographies: US East, West, EU, Japan. Island 1 … Island N, each with routing, polling and monitor, are deployed in different clouds: Azure, AWS East, Private.)
What if the cloud where the cluster is deployed fails?
What if the cloud where the island is deployed fails?
Sister Clusters
Full replica
Features:
• Each master has an extra remote slave
• Each cluster in a pair is a DC replica of the other’s local ring
At Disaster Recovery time:
• Apps are told to start serving an extra shard
• No need to provision more infrastructure to recover (try to avoid it, since everybody is in the same boat)
• New resources can be allocated over time to help offload existing ones
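The disaster recovery step, "apps are told to start serving an extra shard", can be sketched as a bookkeeping change inside a cluster pair: the surviving sister adds the failed cluster's shard to the set it serves, with no new infrastructure. The `SisterPair` class and the cluster names are assumptions for illustration; the sketch does not model the data replication that makes the takeover possible.

```python
class SisterPair:
    """Two clusters that each hold a full replica of the other's data."""

    def __init__(self, cluster_a, cluster_b):
        self._sister = {cluster_a: cluster_b, cluster_b: cluster_a}
        # Each cluster starts out serving only its own shard.
        self._serving = {cluster_a: {cluster_a}, cluster_b: {cluster_b}}

    def fail(self, cluster):
        """Mark a cluster as down; its sister takes over the extra shard."""
        sister = self._sister[cluster]
        self._serving[sister] |= self._serving[cluster]
        self._serving[cluster] = set()
        return sister

    def serving(self, cluster):
        return sorted(self._serving[cluster])

    def route(self, shard):
        """Return the cluster currently serving a shard."""
        for cluster, shards in self._serving.items():
            if shard in shards:
                return cluster
        raise LookupError(f"no cluster is serving shard {shard!r}")
```

After a failover the sister carries both shards on its existing hardware, and extra capacity can be added later to offload it.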
Conclusions
• Shown that RightScale uses multiple database technologies:
  • RDBMS: MySQL for the ACID semantics and ‘queryability’
    • Using a Master to N-Slaves for RO scale and quick failure recovery
    • And ReadOnly Provisioning, to increase RO availability and scale remote systems
  • NoSQL: Cassandra for Availability and Scalability
    • For higher Read/Write availability within a cluster
    • For fully replicated regions across the globe (for Read/Write!)
• Shown how RightScale uses them in different techniques:
  • It partitions resource data into Islands based on cloud proximity
    • Can achieve in-cloud polling, and keep monitoring/syslog data storage next to instances
    • Can provide routing availability, colocated with instances for any world region
  • It partitions core data into Clusters based on account groups
    • To scale the core horizontally and independently, and achieve account isolation/differentiation
    • Enhances fault isolation: assigning accounts to Clusters deployed away from their cloud resources
  • It maintains cluster pairs (sister sites)
    • To recover from full cloud region failures
    • It doesn’t require massive amounts of new resources to recover
Questions?