How RightScale Architects its Databases (for World-wide Scale, HA and DR scenarios)

Josep Blanquer, Senior Systems Architect, RightScale


Talk with the Experts.

Menu

Intro

Data Taxonomy

Data Storage Design: Scale, HA and DR

Conclusion


Intro: Expectations and scope

What this is and what it is not

• IS a talk about:

• how RightScale has designed and implemented its backing datastores

• …for a few of the most representative internal systems

• …with the rationale behind it

• Is NOT a talk about

• RightScale’s overall architecture

• Nodes or hosts; it’s about systems

• RightScale’s data modeling

Note: Most of the design is implemented and in production, but some of the most advanced pieces are still in beta or still being worked on.


Intro: Tools and Technologies

• RightScale uses a mix of RDBMS and NoSQL technologies:

• MySQL, Cassandra, and S3 (for backups and archiving)

• Transactionality:

• MySQL: strong ACID properties

• Cassandra: no Atomicity, eventually Consistent, some Isolation, Durable

• Availability:

• MySQL: async replication, either Master-SlaveN (one master, N slaves) or Master-Master

• Cassandra: Distributed, master-less, highly-replicated (multi-DC)

• Sharding:

• MySQL: no explicit inter-node tools. (Sharding done by application)

• Cassandra: partitions data internally across nodes.
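As an illustration of what "sharding done by application" can mean for MySQL, here is a minimal sketch in which the application itself picks a shard from the account id. The shard names and the `shard_for` helper are hypothetical, and hash-modulo routing is only one possible scheme; it is not presented as RightScale's actual implementation.

```python
import zlib

# Hypothetical shard list; each entry stands in for a MySQL connection target.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2"]


def shard_for(account_id: int, shards=SHARDS) -> str:
    """Pick a shard deterministically from the account id."""
    # crc32 gives the same result in every process, so all app servers agree.
    key = str(account_id).encode("utf-8")
    return shards[zlib.crc32(key) % len(shards)]


# The application, not the database, decides where a query goes.
target = shard_for(42)
```

With this style of routing, changing the number of shards remaps most accounts, which is one reason an explicit account-to-home mapping can be preferable when accounts need to be moved individually.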


Taxonomy of RightScale’s Data

Representative systems with different data semantics:

Global Objects

Marketplace Assets

Dashboard Objects

Audits

Tags

Recent Events

Cloud Polling Data

Routing Data

Monitoring/Syslog

Global Objects and Marketplace Assets are common across accounts:
• Users
• Plans
• Settings
• MultiCloud Marketplace: Published Assets
• Sharing Groups

Dashboard Objects, Audits, Tags, and Recent Events are private to each account:
• Deployments
• Imported assets
• Alert Specifications
• Server Inputs Audit
• Tags
• User Events

Cloud Polling Data is private to each account:
• Cloud resource states (cache)
• Cloud credentials

Routing Data is private to each account:
• Instance agents location
• Core agents location
• Agent action registry

Monitoring/Syslog data is private to each account:
• Collected metric data
• Collected syslog data

Taxonomy of RightScale’s Data: Data Placement

Who uses the data?
• Users through the Dash/API
• Instances from the Cloud

[Diagram: the systems listed above arranged along a Users/Instances axis, with "Data close to the Users" at one end and "Data close to the Cloud" at the other.]

Taxonomy of RightScale’s Data: Data Scope and Containment

Which data do we need?
• Data for all accounts
• Data for a single account

[Diagram: the same systems arranged along a cross-account (X-acct)/Account axis, separating "Data shared between accounts" from "Data required within scope of a single account".]

Taxonomy of RightScale’s Data

• Who uses the data? Proximity to User vs. Cloud
• Which data do we need? Scope of data available

[Diagram: the systems plotted on both axes (Users/Instances and X-acct/Account), forming three regions:]
• Close to cloud resources, account-shardable* data
• Close to user, account-shardable data
• Close to user, globally accessible data
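The two questions above can be read as two independent axes that together pick a placement region. The following is a small sketch of that classification; the enum and function names are hypothetical, and the example assignments at the end are one illustrative reading of the taxonomy, not something the slides state explicitly.

```python
from enum import Enum


class Proximity(Enum):
    USER = "close to user"
    CLOUD = "close to cloud resources"


class Scope(Enum):
    GLOBAL = "globally accessible"
    ACCOUNT = "account-shardable"


def placement(proximity: Proximity, scope: Scope) -> str:
    """Name the placement region for a system, given its two axes."""
    # The slide shows three regions only; cloud-side global data is not one of them.
    if proximity is Proximity.CLOUD and scope is Scope.GLOBAL:
        raise ValueError("no such region in the taxonomy")
    return f"{proximity.value}, {scope.value} data"


# Illustrative assignments (assumed, not taken verbatim from the slides).
examples = {
    "Global Objects": placement(Proximity.USER, Scope.GLOBAL),
    "Dashboard Objects": placement(Proximity.USER, Scope.ACCOUNT),
    "Routing Data": placement(Proximity.CLOUD, Scope.ACCOUNT),
}
```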


[Diagram: the "global" store placed on the Users/Instances and Account/X-Account axes, labeled "Custom replication".]

Custom replication. Why custom? More control:
• Multiple sources
• Individual columns
• Apply transformations
• Smart re-sync features

Global: MySQL
• ACID semantics
• Master-Slave replication
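To make the "more control" points concrete, here is a minimal sketch of replication that reads from multiple sources, copies only individual columns, and applies a transformation per column. All names (`replicate`, `replicate_all`, the sample tables and columns) are hypothetical, the data is made up for illustration, and smart re-sync is not covered.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Transform = Callable[[object], object]


def keep(value: object) -> object:
    """Transformation that copies a column unchanged."""
    return value


def replicate(rows: Iterable[Row], columns: Dict[str, Transform]) -> List[Row]:
    """Copy only the listed columns, applying each column's transformation."""
    return [{name: fn(row[name]) for name, fn in columns.items()} for row in rows]


def replicate_all(sources: Dict[str, List[Row]],
                  rules: Dict[str, Dict[str, Transform]]) -> Dict[str, List[Row]]:
    """Replicate every source that has a rule set; other sources are skipped."""
    return {name: replicate(rows, rules[name])
            for name, rows in sources.items() if name in rules}


sources = {
    "users": [{"id": 1, "email": "Ana@Example.com", "password_hash": "secret"}],
    "plans": [{"id": 7, "name": "premium", "price_cents": 5000}],
}
rules = {
    "users": {"id": keep, "email": lambda v: str(v).lower()},  # drop password_hash
    "plans": {"id": keep, "name": keep},                       # drop price_cents
}
replica = replicate_all(sources, rules)
```

The point of the sketch is that the replica schema is chosen per consumer, which stock Master-Slave replication of whole tables does not offer by itself.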

[Diagram: stores added to the Users/Instances and Account/X-Account axes: global, dash, events, tags, audit, and S3.]

Dashboard: MySQL
• ACID semantics
• Master-SlaveN replication
• Slave reads
• Rows tagged by account

Other systems: Cassandra
• Simpler key-value access
• Great scalability
• Great replica control
• High write availability
• Time-to-live expiration as cache
• Rows tagged by account

Data archive: S3
• Low read rate
• Globally accessible
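The combination of "time-to-live expiration as cache" and "rows tagged by account" can be pictured with a small in-process sketch. Cassandra expires data server-side; the `TTLStore` class below only mimics that behavior in plain Python so the idea is visible, and its name, methods, and injectable clock are assumptions made for this example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLStore:
    """Key-value store whose rows are tagged by account and expire after a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._rows: Dict[Tuple[int, str], Tuple[float, object]] = {}

    def put(self, account_id: int, key: str, value: object, ttl_seconds: float) -> None:
        # The account id is part of the row key, so rows never mix across accounts.
        self._rows[(account_id, key)] = (self._clock() + ttl_seconds, value)

    def get(self, account_id: int, key: str) -> Optional[object]:
        entry = self._rows.get((account_id, key))
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired rows behave like a cache miss and are dropped lazily.
            del self._rows[(account_id, key)]
            return None
        return value
```

Because every row carries its account id in the key, the same store layout works unchanged when accounts are later split across separate deployments.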

[Diagram: the dash, events, tags, and audit stores shown a second time, alongside global and S3.]

So we can horizontally scale our dashboard by partitioning objects based on account groups: Clusters

Clusters

[Diagram: users reach RightScale accounts, which are split into account sets (Account Set 1, Account Set 2, …); the sets are spread over Cluster 1 … Cluster N, and each cluster has its own dash, events, tags, audit, and S3 stores.]

Features:
• 1 cluster: N accounts
• 1 account: 1 home
• Migratable accounts

Benefits:
• Great horizontal growth
• Better failure isolation
• Independent scale
• Load rebalancing
• Versionable code
• Differentiated service
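The three features (one cluster holds N accounts, each account has exactly one home, accounts can migrate) amount to a directory from account to cluster. Here is a minimal sketch of such a directory; the `ClusterDirectory` class and its method names are hypothetical, and real migration would also have to move the account's data, which this sketch leaves out.

```python
from typing import Dict, List


class ClusterDirectory:
    """Maps each account to exactly one home cluster."""

    def __init__(self) -> None:
        self._home: Dict[int, str] = {}

    def assign(self, account_id: int, cluster: str) -> None:
        # 1 account: 1 home, so a second assignment is rejected.
        if account_id in self._home:
            raise ValueError(f"account {account_id} already has a home")
        self._home[account_id] = cluster

    def home_of(self, account_id: int) -> str:
        return self._home[account_id]

    def migrate(self, account_id: int, new_cluster: str) -> None:
        # Migratable accounts: only the directory entry changes here.
        if account_id not in self._home:
            raise KeyError(account_id)
        self._home[account_id] = new_cluster

    def accounts_on(self, cluster: str) -> List[int]:
        # 1 cluster: N accounts.
        return sorted(a for a, c in self._home.items() if c == cluster)
```

An explicit directory, as opposed to a hash of the account id, is what makes load rebalancing and differentiated service possible, since any single account can be moved without touching the others.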


[Diagram: the routing, gateway, and monitor stores shown on the Instances side, next to the global, dash, events, tags, audit, and S3 stores.]

And partition our cloud objects based on the cloud the instances of an account run on: Islands

Islands

[Diagram: Cloud 1, Cloud 2 … Cloud N, each running an account's instances, with Island 1, Island 2 … Island N providing routing, gateway, and monitor services co-located with resources.]

Features:
• 1 instance: 1 home island
• 1 island can serve N clouds
• Core Agents: global data

Benefits:
• Close to cloud resources
• Good failure isolation
• As good as cloud
• Good scale: global replicas across Cassandra DCs

Gateway: MySQL
• Master-Slave replication
• Can port to NoSQL easily
• Mostly a resource cache
• But cloud partitionable

Monitoring: Custom
• Replicated files
• Backup to S3
• Archive to S3

Routing: Cassandra
• Simpler key-value access
• Very high availability
• Great scalability
• Great replica control
• Plus cross DC replication*
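The island features can be sketched as a lookup from the cloud an instance runs on to the island that serves it. The cloud and island names below are placeholders, and the `Instance` type and helper functions are hypothetical; the sketch only shows the cardinalities (one home island per instance, possibly several clouds per island).

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical assignment: one island can serve several clouds.
ISLAND_FOR_CLOUD: Dict[str, str] = {
    "cloud-1": "island-1",
    "cloud-2": "island-1",
    "cloud-n": "island-n",
}


@dataclass(frozen=True)
class Instance:
    instance_id: str
    cloud: str


def home_island(instance: Instance) -> str:
    """Each instance has exactly one home island, derived from its cloud."""
    return ISLAND_FOR_CLOUD[instance.cloud]


def clouds_served_by(island: str) -> List[str]:
    """All clouds whose instances are homed on the given island."""
    return sorted(c for c, i in ISLAND_FOR_CLOUD.items() if i == island)
```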

[Diagram: Clusters 1 … N (dash, events, tags, audit, S3) on the Users side and Islands 1 … N (routing, gateway, monitor) on the Instances side, spread across different geographies and different clouds.]

What if the cloud the cluster is deployed on… fails?

Sister Clusters

[Diagram: the same clusters and islands, with clusters paired as sister clusters and labeled "Full replica".]

Features:
• Each master has an extra remote slave
• Each cluster in a pair is a DC replica of the other’s local ring

At Disaster Recovery time:
• Apps are told to start serving an extra shard
• No need to provision more infrastructure to recover (try to avoid it, since everybody is in the same boat)
• New resources can be allocated over time to help offload existing ones
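The recovery step "apps are told to start serving an extra shard" can be sketched as a pure function over a serving table. The cluster and shard names, the `SISTER` pairing, and the `fail_over` helper are all hypothetical; the sketch assumes the sister already holds a full replica of the failed cluster's data, as the features above describe.

```python
from typing import Dict, Set

# Hypothetical pairing: each cluster names its sister.
SISTER: Dict[str, str] = {"cluster-1": "cluster-2", "cluster-2": "cluster-1"}


def fail_over(serving: Dict[str, Set[str]], failed: str) -> Dict[str, Set[str]]:
    """Return a new serving table in which the sister takes over the failed cluster's shards."""
    sister = SISTER[failed]
    # Copy the table so the caller's view is left untouched.
    updated = {c: set(shards) for c, shards in serving.items() if c != failed}
    # No new infrastructure is provisioned: the sister simply serves more shards.
    updated[sister] |= serving[failed]
    return updated


serving = {"cluster-1": {"shard-1"}, "cluster-2": {"shard-2"}}
after = fail_over(serving, "cluster-1")
```

New capacity can then be added later and shards handed back, without that step being on the critical path of the recovery.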


Conclusions

• Shown that RightScale uses multiple database technologies:
  • RDBMS: MySQL for the ACID semantics and ‘queryability’
    • Using a Master to N-Slaves for RO scale and quick failure recovery
    • And ReadOnly Provisioning, to increase RO availability and scale remote systems
  • NoSQL: Cassandra for availability and scalability
    • For higher Read/Write availability within a cluster
    • For fully replicated regions across the globe (for Read/Write!)

• Shown how RightScale uses them in different techniques:
  • It partitions resource data into Islands based on cloud proximity
    • Can achieve in-cloud polling, and keep monitoring/syslog data storage next to instances
    • Can provide routing availability, colocated with instances for any world region
  • It partitions core data into Clusters based on account groups
    • To scale the core horizontally and independently, and achieve account isolation/differentiation
    • Enhances fault isolation: assigning accounts to Clusters deployed away from their cloud resources
  • It maintains cluster pairs (sister sites)
    • To recover from full cloud region failures
    • It doesn’t require massive amounts of new resources to recover


Questions?