ruxit theme 2014.05.15 Behind the scenes @ ruxit Running a global monitoring infrastructure on AWS Alois Reitbauer, ruxit @aloisreitbauer

Ruxit - How we launched a global monitoring platform on AWS in 80 days

Behind the scenes @ ruxit

Running a global monitoring infrastructure on AWS

Alois Reitbauer, ruxit

@aloisreitbauer

Ruxit – what we do

SaaS-based Monitoring and Management Solution

A bit of history

How we moved to a global AWS deployment in 80 days

How we moved to the Cloud in 80 days

June 2014 – Beta Cloud Deployment

July 2014 – Open Beta Offering to Public

August 2014 – Full automation

September 2014 – Official Product Launch

October 2014 – >1000 active companies

Our architecture

Lessons learned building a global cloud platform

Cluster

[Diagram: a cluster on Amazon EC2 spanning three Availability Zones. Layers: Elastic Load Balancer, HA Proxy, Public Security Gateways, Server Cluster, Cassandra DB Cluster]

Cluster

[Diagram: cluster with five components labeled 3rdP. Endpoints: cloudcontrol.ruxit.com, account.ruxit.com, and three instances of *.live.ruxit.com]

Ruxit is built on AWS

How we solve challenges using the AWS technology stack

Challenge: Growth

Being one of the fastest growing B2B SaaS companies

Challenge: Usability

Real-time provisioning of DNS names

Challenge: Reliability

Zero downtime without manual intervention

Challenge: Delivery

Manage deployment artifacts globally

How we achieve zero downtime

Your application will break; your users should not notice

Key Guiding Principles

Over Provisioning

Quarantine Mode

Rolling Updates

Soft Stickiness

We never run above two thirds of capacity

Over provisioning is built into our architecture.
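
The two-thirds rule can be turned into simple sizing arithmetic. The sketch below is an illustration only: it assumes load and node capacity are measured in the same arbitrary units, and the function names are hypothetical. One plausible reading of the rule, given three Availability Zones, is that the two remaining zones can still carry the full load if one zone is lost; the second function checks exactly that.

```python
# Hypothetical sizing helpers; names and units are assumptions, not from the talk.


def nodes_required(peak_load: int, node_capacity: int) -> int:
    """Smallest node count that keeps utilization at or below two thirds.

    We need n * node_capacity * 2/3 >= peak_load, i.e. n >= 3*load / (2*cap).
    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    if peak_load < 0 or node_capacity <= 0:
        raise ValueError("peak_load must be >= 0 and node_capacity > 0")
    needed = (3 * peak_load + 2 * node_capacity - 1) // (2 * node_capacity)
    return max(1, needed)


def survives_zone_loss(peak_load: int, node_capacity: int,
                       nodes_per_zone: int, zones: int = 3) -> bool:
    """True if the remaining zones can carry peak_load after one zone fails."""
    remaining_capacity = node_capacity * nodes_per_zone * (zones - 1)
    return peak_load <= remaining_capacity


if __name__ == "__main__":
    nodes = nodes_required(peak_load=400, node_capacity=100)
    print(nodes, survives_zone_loss(400, 100, nodes_per_zone=nodes // 3))
```

With a peak load of 400 units and nodes of 100 units each, six nodes keep utilization at exactly two thirds, and two nodes per zone across three zones still cover the load after one zone fails.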

Quarantine and Diagnose in Production

[Diagram: three Availability Zones with Elastic Load Balancer, HA Proxy, Public Security Gateways, Server Cluster, and Cassandra DB Cluster]
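
The slide gives only the name of the technique, so the following is a minimal sketch of one way quarantine could work, not a description of Ruxit's implementation. It assumes a quarantined node is taken out of the set of routable nodes but left running so it can be diagnosed. `NodePool` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class NodePool:
    """Tracks which nodes receive traffic and which are held for diagnosis."""

    active: set[str] = field(default_factory=set)
    quarantined: set[str] = field(default_factory=set)

    def quarantine(self, node: str) -> None:
        """Stop routing to a node without shutting it down."""
        if node not in self.active:
            raise KeyError(f"{node} is not an active node")
        self.active.remove(node)
        self.quarantined.add(node)

    def release(self, node: str) -> None:
        """Return a diagnosed node to the routable set."""
        if node not in self.quarantined:
            raise KeyError(f"{node} is not quarantined")
        self.quarantined.remove(node)
        self.active.add(node)

    def routable(self) -> list[str]:
        """Nodes that may receive traffic, in a stable order."""
        return sorted(self.active)


if __name__ == "__main__":
    pool = NodePool(active={"a", "b", "c"})
    pool.quarantine("b")
    print(pool.routable(), sorted(pool.quarantined))
```

Because the node stays up, its state at the time of the problem remains available for inspection while users are served by the remaining nodes.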

How we handle upgrades

We have to be able to upgrade without any downtime

Rolling update

[Diagram: Elastic Load Balancer, HA Proxy, Public Security Gateways, Server Cluster; Cloud Control and AWS S3]
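
A rolling update replaces the software on one node at a time so the rest of the cluster keeps serving traffic. The sketch below shows the general pattern under stated assumptions: the `install` and `is_healthy` callables are hypothetical stand-ins for whatever the orchestrator (Cloud Control in the diagram) does to fetch an artifact and verify a node, and the rollout stops at the first unhealthy node.

```python
from typing import Callable


def rolling_update(nodes: list[str],
                   versions: dict[str, str],
                   target: str,
                   install: Callable[[str, str], None],
                   is_healthy: Callable[[str], bool]) -> list[str]:
    """Update nodes one at a time; stop the rollout on the first failure.

    Returns the nodes that were updated. Nodes already on the target
    version are skipped, so the function is safe to re-run.
    """
    updated: list[str] = []
    for node in nodes:
        if versions.get(node) == target:
            continue
        install(node, target)  # hypothetical: fetch artifact and restart node
        if not is_healthy(node):
            raise RuntimeError(
                f"{node} unhealthy after update to {target}; rollout stopped")
        versions[node] = target
        updated.append(node)
    return updated


if __name__ == "__main__":
    state = {"a": "1.0", "b": "1.1", "c": "1.0"}
    done = rolling_update(["a", "b", "c"], state, "1.1",
                          install=lambda node, version: None,
                          is_healthy=lambda node: True)
    print(done, state)
```

Stopping at the first failed health check limits the damage of a bad release to a single node, which the rest of the cluster can absorb if it is over-provisioned.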

Soft Stickiness

Combining Data Locality with Transparent Failover

Dynamic Traffic Routing

[Diagram: Elastic Load Balancer, HA Proxy, Public Security Gateways, Server Cluster; labels: A, B, C]

Constant Failover Mode

[Diagram: Elastic Load Balancer, HA Proxy, Public Security Gateways, Server Cluster; labels: A, B, C]

Routing with Wishlist

[Diagram: Elastic Load Balancer, HA Proxy, Public Security Gateways, Server Cluster; labels: A, B, C, B]

Routing with Failover

[Diagram: Elastic Load Balancer, HA Proxy, Public Security Gateways, Server Cluster; labels: A, C, B, B]
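
The wishlist and failover slides suggest a routing rule along these lines: a request names the servers it would prefer (for data locality), and the gateway honors that preference when possible but falls back to any healthy server otherwise. The following is a minimal sketch of that idea, not the actual gateway logic; the `route` function and its fallback choice (the first healthy server) are assumptions made for illustration.

```python
from typing import Sequence


def route(wishlist: Sequence[str], healthy: Sequence[str]) -> str:
    """Pick a server for a request.

    Prefer the first wishlist entry that is healthy (data locality);
    otherwise fail over to the first healthy server (transparent failover).
    """
    healthy_set = set(healthy)
    for server in wishlist:
        if server in healthy_set:
            return server
    if not healthy:
        raise RuntimeError("no healthy server available")
    return healthy[0]


if __name__ == "__main__":
    print(route(["B"], ["A", "B", "C"]))  # preferred server is available
    print(route(["B"], ["A", "C"]))       # preferred server is down
```

The stickiness is "soft" in the sense that the preference is a hint rather than a requirement: a request is never rejected just because its preferred server is unavailable.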

Our road from DevOps to NoOps

We don’t have a dedicated Operations team and we don’t want one

Key Guiding Principles

Autonomous Operations

Feedback and Transparency

Everything is production

Data-Driven Operations

Run books become backlogs

If you describe what to do, you can also code it into the platform

Ruxit needs to be able to manage itself

Feedback and Transparency

Everybody has access to our production monitoring data.

Full Transparency on Quality

We treat all environments like production

Everybody has access to our production monitoring data.

Data-Driven Operations

There is no decision without data.

Understand the impact of deployments

[Chart labels: Java, OS, Apache, IIS, .NET]

Information on Agent Deployment

[Chart labels: 1.57, 1.59, 1.61, 1.63, 1.65, 1.67, 1.69, 1.58, 1.54, 1.54]

Questions?

Member of

Alois [email protected] @ruxit.com blog.ruxit.com