RightScale Webinar: How RightScale Architects Its Databases (for Worldwide Scale, HA and DR...


#rightscale

How RightScale Architects Its Databases

(for Worldwide Scale, HA and DR Scenarios)

January 30, 2013

Watch the recording of this webinar


Your Panel Today

Presenting
• Rafael H. Saavedra, VP Engineering, RightScale
• Josep Blanquer, Chief Architect, RightScale

Q&A
• Jared Marcell, Account Manager, RightScale
• David Manriquez, Account Manager, RightScale

Please use the “Questions” window to ask questions any time!


Menu

Intro

Data Taxonomy

Data Storage Design: Scale, HA and DR

Conclusion


Intro: Expectations and scope

What this is and what it is not
• IS a talk about:
  • how RightScale has designed and implemented its backing datastores
  • …for a few of the most representative internal systems
  • …with the rationale behind it
• Is NOT a talk about:
  • RightScale’s overall architecture
  • Nodes or hosts; it’s about Systems
  • RightScale’s data modeling

Note: Most of the design is implemented and in production, but some of the most advanced pieces are still in beta or still being worked on.


Intro: Tools and Technologies
• RightScale uses a mix of RDBMS and NoSQL technologies:
  • MySQL, Cassandra and S3 (for backups and archiving)
• Transactionality:
  • MySQL: strong ACID properties
  • Cassandra: no Atomicity, eventually Consistent, some Isolation, Durable
• Availability:
  • MySQL: async replication. Master to N Slaves (Master-SlaveN) or Master-Master
  • Cassandra: Distributed, master-less, highly-replicated (multi-DC)
• Sharding:
  • MySQL: no explicit inter-node tools (sharding done by application)
  • Cassandra: partitions data internally across nodes
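To make "sharding done by application" concrete, here is a minimal sketch of an application choosing a MySQL shard itself. The host names, the `shard_for` helper and the hash-based placement are illustrative assumptions, not RightScale's actual scheme.

```python
import zlib

# Hypothetical shard hosts (placeholders, not real RightScale endpoints).
MYSQL_SHARDS = [
    "mysql-shard-0.example.internal",
    "mysql-shard-1.example.internal",
    "mysql-shard-2.example.internal",
]


def shard_for(account_id: int, shards=MYSQL_SHARDS) -> str:
    """Pick the MySQL shard for an account inside the application.

    MySQL offers no inter-node sharding tool here, so the application
    must map each key to a shard deterministically.
    """
    # crc32 is stable across processes and machines, so every app server
    # computes the same shard for the same account.
    key = str(account_id).encode("utf-8")
    return shards[zlib.crc32(key) % len(shards)]
```

A plain modulo mapping like this reshuffles keys whenever the shard list changes; an explicit lookup table is one common alternative when keys must be movable.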


Glossary: Examples we will use

• Marketplace Assets (RightScripts, ServerTemplates): Configuration data objects that are user-generated, private or shared
• Tags: Resource data that drives automation and reporting
• Events: Data used to communicate recent events and news feeds to users
• Cloud Polling and Gateway: Data that records actions and states of external API-linked services
• Routing: Data used to locate and transport messages across instances and/or our services
• Monitoring: Infrastructure monitoring data recorded and presented on behalf of users


Taxonomy of RightScale’s Data

Representative systems with different data semantics:

• Global Objects, Marketplace Assets
• Dashboard Objects, Audits, Tags, Recent Events
• Cloud Polling Data
• Routing Data
• Monitoring/Syslog


Global Objects, Marketplace Assets
Common across accounts: Users, Account Plans, Settings, and the MultiCloud Marketplace (Published Assets, Sharing Groups, …)


Dashboard Objects, Audits, Tags, Recent Events
Private to each account: Deployments, Imported assets, Alert Specifications, Server Inputs, Audits, Tags, User Events, …


Cloud Polling Data
Private to each account: Cloud resource states (cache), Cloud credentials


Routing Data
Private to each account: Instance agents location, Core agents location, Agent action registry, …


Monitoring/Syslog
Private to each account: Collected metric data, Collected syslog data, …


Taxonomy of RightScale’s Data: Data scope and containment

Which data do we need?
• Data for all accounts (X-Account): data shared between accounts
• Data for a single account (Account): data required within the scope of a single account


Taxonomy of RightScale’s Data: Data Placement

Who uses the data?
• Users through the Dash/API: data close to the Users
• Instances from the Cloud: data close to the Cloud


Taxonomy of RightScale’s Data

Who uses the data? Proximity to User vs. Cloud
Which data do we need? Scope of data available

• Close to cloud resources: Account-shardable* data
• Close to user: Account-shardable data
• Close to user: Globally accessible data


Custom replication

Why custom? More control
• Multiple sources
• Individual columns
• Apply transformations
• Smart re-sync features

Global: MySQL
• ACID semantics
• Master-Slave replication
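As a rough illustration of what "custom replication" with multiple sources, individual columns and transformations can mean, the sketch below copies selected columns from several row sources and applies per-column transforms. The `replicate` function and its parameters are hypothetical; the slides do not describe RightScale's replicator at this level, and re-sync is not modeled.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def replicate(
    sources: Iterable[Iterable[Row]],
    columns: List[str],
    transforms: Dict[str, Callable[[object], object]],
) -> List[Row]:
    """Copy only `columns` from every source, transforming values on the way."""
    replicated: List[Row] = []
    for source in sources:              # multiple sources
        for row in source:
            copy: Row = {}
            for col in columns:         # individual columns only
                value = row[col]
                if col in transforms:   # optional per-column transformation
                    value = transforms[col](value)
                copy[col] = value
            replicated.append(copy)
    return replicated
```

For example, replicating only `id` and `email` with `{"email": str.lower}` drops every other column and normalizes the e-mail addresses in the copy.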

[Diagram: the user-side stores: global, dash, events, tags, audit and S3]

Dashboard: MySQL
• ACID semantics
• Master-SlaveN replication
• Slave reads
• Rows tagged by account

Other systems: Cassandra
• Simpler Key-Value access
• Great scalability
• Great replica control
• High write availability
• Time-to-live expiration as cache
• Rows tagged by account

Data archive: S3
• Low read rate
• Globally accessible
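The combination of "rows tagged by account" and "time-to-live expiration as cache" can be pictured with a small in-memory stand-in. This is only a sketch of the semantics: `TTLCache` is a hypothetical class, not Cassandra's API, and the injectable clock exists just to make the behavior easy to test.

```python
import time


class TTLCache:
    """Key-value rows tagged by account that expire after a time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._rows = {}  # (account_id, key) -> (value, expires_at)

    def put(self, account_id, key, value):
        # Every row carries its account id, so data stays account-shardable.
        self._rows[(account_id, key)] = (value, self._clock() + self._ttl)

    def get(self, account_id, key):
        entry = self._rows.get((account_id, key))
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired rows behave as if they were never written.
            del self._rows[(account_id, key)]
            return None
        return value
```

Because stale rows simply disappear, the store can be treated as a cache that needs no explicit invalidation pass.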


So we can horizontally scale our dashboard by partitioning objects based on account groups: Clusters

[Diagram: RightScale Accounts split into account sets (Account Set 1, Account Set 2, …); each set is homed on one cluster (Cluster 1 … Cluster N) with its own dash, events, tags, audit and S3 stores; clusters are placed in regions such as US East, EU and Japan]

Features:
• 1 cluster: N accounts
• 1 account: 1 home
• Migratable accounts

Benefits:
• Great horizontal growth
• Better failure isolation
• Independent scale
• Load rebalancing
• Versionable code
• Differentiated service
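The cluster features (1 cluster: N accounts, 1 account: 1 home, migratable accounts) amount to a directory from account to home cluster. The following is a minimal sketch under that reading; `AccountDirectory` and the cluster names are made up for illustration, and a real migration would also have to move the account's data.

```python
class AccountDirectory:
    """Tracks the single home cluster of each account."""

    def __init__(self):
        self._home = {}  # account_id -> cluster name

    def assign(self, account_id, cluster):
        # 1 account: 1 home. Re-homing must go through migrate().
        if account_id in self._home:
            raise ValueError(f"account {account_id} already has a home")
        self._home[account_id] = cluster

    def home_of(self, account_id):
        return self._home[account_id]

    def accounts_in(self, cluster):
        # 1 cluster: N accounts.
        return sorted(a for a, c in self._home.items() if c == cluster)

    def migrate(self, account_id, new_cluster):
        """Move an account to another cluster and return the old home."""
        old_cluster = self._home[account_id]
        self._home[account_id] = new_cluster
        return old_cluster
```

Keeping the mapping explicit, rather than hashing accounts to clusters, is what would allow load rebalancing by moving individual accounts.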

[Diagram: the instance-side stores (routing, polling, monitor) are added alongside the user-side stores (global, dash, events, tags, audit, S3)]

And partition our cloud objects based on the cloud the instances of an account run on: Islands

[Diagram: Cloud 1, Cloud 2 … Cloud N, each with services co-located with its resources]

Features:
• 1 instance: 1 home island
• 1 Island can serve N clouds
• Core Agents: global data

Benefits:
• Close to cloud resources
• Good failure isolation
  • As good as cloud
• Good scale: global replicas across Cassandra DCs

[Diagram: Island 1, Island 2 … Island N, each holding its own routing, polling and monitor stores]

Polling Clouds: MySQL
• Master-Slave replication
• Can port to NoSQL easily
• Mostly a resource cache
• But cloud partitionable

Monitoring: Custom
• Replicated files
• Backup to S3
• Archive to S3

Routing: Cassandra
• Simpler Key-Value access
• Very high availability
• Great scalability
• Great replica control
• Plus cross DC replication*
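The island rules (1 instance: 1 home island, 1 island can serve N clouds) can be sketched as a lookup from an instance's cloud to the island that holds its routing, polling and monitor stores. The cloud and island names below are invented placeholders, and `home_island` and `store_location` are hypothetical helpers.

```python
# Hypothetical cloud-to-island assignment: one island may serve several clouds.
ISLAND_FOR_CLOUD = {
    "cloud-1": "island-1",
    "cloud-2": "island-1",
    "cloud-n": "island-n",
}

# The per-island stores named on the slides.
ISLAND_STORES = ("routing", "polling", "monitor")


def home_island(cloud: str) -> str:
    """Return the single home island for instances running in `cloud`."""
    try:
        return ISLAND_FOR_CLOUD[cloud]
    except KeyError:
        raise LookupError(f"no island serves cloud {cloud!r}") from None


def store_location(cloud: str, store: str) -> tuple:
    """Locate an island-level store for an instance in `cloud`."""
    if store not in ISLAND_STORES:
        raise ValueError(f"{store!r} is not an island store")
    return (home_island(cloud), store)
```

Since the island is derived from the cloud, instance traffic for these stores stays close to the cloud resources it describes.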

[Diagram: Clusters 1 … N (dash, events, tags, audit, S3) serve users from different geographies (US East, West EU, Japan); Islands 1 … N (routing, polling, monitor) serve instances in different clouds (Azure, AWS East, Private)]

What if the cloud where the cluster is deployed fails?


Sister Clusters

Full replica

Features:
• Each master has an extra remote slave
• Each cluster in a pair is a DC replica of the other’s local ring

At Disaster Recovery time:
• Apps are told to start serving an extra shard
• No need to provision more infrastructure to recover (try to avoid it, since everybody is in the same boat)
• New resources can be allocated over time to help offload existing ones
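The recovery step "apps are told to start serving an extra shard" can be sketched as bookkeeping over which cluster serves which shard. `ShardServing` and the cluster names are illustrative assumptions; the sketch ignores replica promotion and only shows that no new infrastructure is needed to take over the shard.

```python
class ShardServing:
    """Tracks which cluster currently serves each shard in sister pairs."""

    def __init__(self, sisters):
        # sisters maps each cluster to its sister, e.g. {"a": "b", "b": "a"}.
        self._sisters = dict(sisters)
        # Normally each cluster serves only its own shard.
        self._served = {cluster: {cluster} for cluster in self._sisters}

    def fail_over(self, failed_cluster):
        """Tell the sister's apps to also serve the failed cluster's shards."""
        sister = self._sisters[failed_cluster]
        self._served[sister] |= self._served[failed_cluster]
        self._served[failed_cluster] = set()
        return sister

    def serving_cluster(self, shard):
        for cluster, shards in self._served.items():
            if shard in shards:
                return cluster
        raise LookupError(f"shard {shard!r} is not being served")
```

Because the sister already holds a full replica, failing over changes only what the existing apps serve, which matches the point about not provisioning during a region-wide outage.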


Conclusions
• Shown that RightScale uses multiple database technologies
  • RDBMS: MySQL for the ACID semantics and ‘queryability’
    • Using a Master to N-Slaves for RO scale, and quick failure recovery
    • And ReadOnly Provisioning: to increase RO availability and scale remote systems
  • NoSQL: Cassandra for Availability and Scalability
    • For higher Read/Write availability within a cluster
    • For fully replicated regions across the globe (for Read/Write!)
• Shown how RightScale uses them in different techniques
  • It partitions resource data into Islands based on cloud proximity
    • Can achieve in-cloud polling, and keep monitoring/syslog data storage next to instances
    • Can provide routing availability, colocated with instances for any world region
  • It partitions core data into Clusters based on account groups
    • To scale the core horizontally and independently, and achieve account isolation/differentiation
    • Enhances fault isolation: assigning accounts to Clusters deployed away from their cloud resources
  • It maintains cluster pairs (sister sites)
    • To recover from full cloud region failures
    • It doesn’t require massive amounts of new resources to recover


Next Steps
1. Learn: Building Scalable Applications in the Cloud Whitepaper: www.rightscale.com/whitepapers
2. Analyze: Deployment review of your environment: www.rightscale.com/contact
3. Try: Free Edition: www.rightscale.com/free

Contact RightScale
(866) 720-0208
sales@rightscale.com
www.rightscale.com

The next big RightScale Community Event!
April 25-26 in San Francisco
www.RightScaleCompute.com
• Attend technical breakout sessions
• Get RightScale training
• Talk with RightScale customers
• Ask questions at the Expert Bar