So Many Ways To Skin The Cat Chris J.T. Auld AZR308

AZR308. Building distributed systems on an abstraction against commodity hardware at Internet scale, composed of multiple services. Distributed System




So Many Ways To Skin The Cat

Chris J.T. Auld AZR308


Why We Use The Cloud


Building distributed systems on an abstraction against commodity hardware at Internet scale, composed of multiple services.

• Distributed System. Cloud applications are inherently distributed and must balance consistency and availability.

• Abstraction. Cloud applications are developed against various levels of abstraction: virtual machines (IaaS), platform services (storage, compute – PaaS), and software services (media, mobile, messaging, data – SaaS).

• Commodity hardware at Internet scale. Clouds deploy commodity hardware in chunks of 1000+ machines to balance cost-efficiency and performance.

• Composed of multiple services. Platform services (stuff you get from the cloud), external services (stuff you get from others) and application services (stuff you write)

Internet Scale


Drive cost efficiency at design time and at runtime while still using a robust platform with good uptime.

• Drive cost efficiency. The overall cost of an application includes the development effort, the running costs and the people time to operate and manage. All three need to be optimized.

• Design time. Pre-built cloud services reduce dev effort, optimized deployment models allow faster iteration.

• Runtime. Granular purchasing in small increments. High level abstractions remove much of the traditional operations overhead.

• Robust platform. Still not the cheapest, but comparing like for like in quality it is hard to compete with the economies of scale

Low Cost & Capable


Important Concept

Don’t make things harder than they need to be.

By chasing scalability and multi-tenancy you’ll add cost and often have to make some uncomfortable trade-offs.


Fundamentals

Azure Storage


Azure Storage

The durable and highly available storage layer for Windows Azure

Data written to a storage account is auto-replicated (synchronous) to three storage nodes (across three fault/upgrade domains)

Optionally, data is asynchronously replicated to a fail-over data center


Azure Storage Optimization

• per GB per month stored

• “transactions”

• data egress

• hotspots

• one-time operations

• retry policies

• same region

• multiple storage accounts

• different storage account

• Storage 2.0


Abstractions

• Blobs – Simple interface to store and retrieve files in the cloud
  • Simple file storage endpoint
  • RESTful, so can serve to clients direct

• Disks/Drives – Network-mounted durable disks for VMs in Azure
  • Mounted disks in IaaS and Services are VHDs stored in Azure Blobs

• Tables – Scalable and extremely easy to use NoSQL system that auto-scales
  • Key-value store

• Queues – Reliable messaging system

All of the abstractions use the same underlying storage sub-system


Important Concept

You can’t travel faster than the speed of light…

“I’m going to spin up a Cassandra cluster because it’s much faster than Windows Azure Tables”

Your durable storage is *always* Azure Storage


Design Goals

Highly Available with Strong Consistency
• Provide access to data in face of failures/partitioning

Durability
• Replicate data several times within and across regions

Scalability
• Need to scale to zettabytes
• Provide a global namespace to access data around the world
• Automatically scale out and load balance data to meet peak traffic demands

• Additional details can be found in the SOSP paper:
  • “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”, ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011 http://tinyurl.com/was-internals


Architecture inside WAS

[Diagram: layers inside WAS: REST, Front-End Layer, Partition Layer (Index), Distributed File System]

• Front-end Layer
  • REST front-end (blob, table, queue)
  • Authn/Authz/Metrics/Logging
  • Stateless scale

• Partition Layer
  • Provides transaction semantics and strong consistency for Blobs, Tables and Queues
  • Stores and reads the objects to/from extents in the Stream layer
  • Provides inter-stamp (geo) replication by shipping logs to other stamps
  • Scalable object index via partitioning, and dynamic re-partitioning

• Distributed File System Layer
  • Data persistence and replication (JBOD)
  • Data is stored in files called extents, each replicated 3 times across different nodes (UDs/FDs)
  • Append-only file system. Writes to SSDs
  • DFS distribution != Partition distribution


Blobs

• Blobs stored in Containers
  • 1 or more Containers per account
  • Scoping and Authz is at container level
  • …/Container/blobname
  • Container is logical only. Unit of scale is the Blob and then the Storage Account

• Blobs
  • Two types, Page and Block
  • Page is random R/W, Block has immutable block semantics
  • Block is faster; Page is sparse storage so can be ‘cheaper’ as empty bits don’t cost

• Can hit REST endpoint direct from client
  • Get, can set certain response headers e.g. Content-Type, Cache-Control
  • Put via Shared Access Signatures
  • Trivial to put the Azure CDN in front
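The …/Container/blobname addressing can be sketched in a few lines of Python. This is only an illustration of how a public blob URL is composed; the account, container and blob names are hypothetical, and no authentication or Shared Access Signature handling is shown.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the public REST URL for a blob (no SAS token, illustration only)."""
    # Blob names may contain '/' to simulate folders; quote() leaves '/' intact
    # and percent-encodes characters such as spaces.
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"


# Hypothetical account, container and blob names.
example_url = blob_url("myaccount", "photos", "images/cat 1.png")
```

A browser can GET such a URL directly when the container allows public reads, which is what makes serving straight from storage (or via the CDN) possible.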

#1


Blobs – Worked Example

Upload

• Direct upload from client. Good experience from JS, better from Silverlight (can parallelize) http://tinyurl.com/blob-jscript http://tinyurl.com/blob-slight

• Geo-distance. Leverage “closer” storage accounts for globally distributed clients.

• Multiple storage accounts per geo to scale out if traffic limits are hit

• Use non-geo-redundant storage for initial ingest to save money, boost throughput

Process

• Process data using compute co-located with each storage account

• Avoid pushing large files back out over the network (outbound is metered)

• Gzip content if suitable. Save on storage and outbound traffic

Serve

• Deliver via the CDN
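The “Gzip content if suitable” step can be tried out with Python’s standard `gzip` module. This is a minimal sketch on a made-up, highly repetitive text payload; the real saving depends entirely on the content, and already-compressed formats (JPEG, video) gain little or nothing.

```python
import gzip
from typing import Tuple


def gzip_savings(payload: bytes) -> Tuple[bytes, float]:
    """Compress payload and return (compressed bytes, fraction of bytes saved)."""
    compressed = gzip.compress(payload)
    saved = 1 - len(compressed) / len(payload)
    return compressed, saved


# Hypothetical log-style payload; repetitive text compresses well.
sample = b"timestamp,level,message\n" + b"2013-09-10,INFO,request served\n" * 500
compressed, saved = gzip_savings(sample)
```

The compressed bytes are what would be stored and sent out, so the same fraction is saved on both the per-GB storage charge and the metered outbound traffic.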


Azure CDN

• CDNs Cache Content at the Edge
  • Serve content from ‘closest’ node
  • Faster and lower latency for clients
  • Remove load from origin servers

• Azure CDN
  • 8 nodes in USA
  • 8 nodes in Europe
  • 6 nodes in APAC
  • A node in Brazil and a node in Doha
  • Watch for costs: >$ EU/US, <$ RoW

• Strategies
  • Serve Blob content
  • Serve content from Hosted Services
  • Serve content from on-premise apps by pushing out via Storage

#2

[Diagram: Browser Client, DNS, http://xxx.vo.msecnd.net, Origin]


Tables

• NoSQL Key-Value Store
  • Entities in Tables. No schema
  • PartitionKey determines the partition server responsible for an entity
  • RowKey + PartitionKey uniquely identify an Entity in a table
  • Unit of scale is the partition and then the storage account (not multiple tables!)
  • Hot partitions can be scaled up at the Partition layer

• Strategies
  • It’s not an RDBMS
  • Don’t normalize
  • Store different types in the same table

• Be careful querying
  • Large scans are expensive
  • Embrace eventual consistency and ‘groom’ your data
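The PartitionKey/RowKey model and the cost difference between query shapes can be sketched with an in-memory stand-in. `ToyTable` and its method names are hypothetical and are not the Azure Tables API; the point is only that a lookup with both keys touches one entry, a partition scan touches one partition, and a table scan touches everything.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a partitioned key-value table (not the Azure API)."""

    def __init__(self):
        # partition key -> {row key -> entity properties}
        self._partitions = defaultdict(dict)

    def upsert(self, partition_key, row_key, **properties):
        # No schema: each entity carries whatever properties it needs.
        self._partitions[partition_key][row_key] = dict(properties)

    def point_query(self, partition_key, row_key):
        # Cheapest path: both keys known, a single entity is touched.
        return self._partitions.get(partition_key, {}).get(row_key)

    def partition_scan(self, partition_key, predicate):
        # Touches every entity in one partition only.
        rows = self._partitions.get(partition_key, {})
        return [e for e in rows.values() if predicate(e)]

    def table_scan(self, predicate):
        # Touches every partition: the expensive case to avoid.
        return [e for rows in self._partitions.values()
                for e in rows.values() if predicate(e)]


# Different entity types stored in the same table, partitioned by customer.
table = ToyTable()
table.upsert("customer-42", "profile", kind="Customer", name="Ada")
table.upsert("customer-42", "order-0001", kind="Order", total=99.0)
table.upsert("customer-7", "order-0001", kind="Order", total=15.0)
```

Keeping a customer’s profile and orders in one partition means the common “everything for this customer” query never leaves that partition.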

#3

Go to AZR412: WAS Tables, What Are They Good For? 1610hr today in Epsom


Drives & Disks

• VHDs in Blobs
  • ‘Fixed’ style Microsoft VHD format stored in Page blobs
  • Page blobs are sparse storage, so create VHDs larger than you need

#4

Azure PaaS uses Drives

Azure IaaS uses Disks


Drives & Disks

• Drives (Azure PaaS)
  • Kernel mode driver in the Azure Role guest OS
  • REST network traffic direct off the Guest OS
  • 128 KB chunks (each metered transaction is 128 KB)
  • Can mount up to 16 drives

• Disks (Azure IaaS)
  • Driver in the Azure IaaS host OS. The guest just sees a standard disk interface.
  • REST network traffic runs off the Host OS
  • 2 MB chunks (each metered transaction is 2 MB)
  • Can mount up to 16 disks, but the number is restricted by VM size.

So for apps with chunky file IO we’ll see some cost savings on Disks.

Do Disks and Drives deliver different performance on account of the different paths?
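The cost saving for chunky IO follows from simple arithmetic. The sketch below assumes, as a simplification, that large sequential IO fills every chunk completely, so the number of metered transactions is the IO size divided by the chunk size, rounded up. The function name is made up for illustration.

```python
DRIVE_CHUNK_BYTES = 128 * 1024        # Drives (PaaS): 128 KB per metered transaction
DISK_CHUNK_BYTES = 2 * 1024 * 1024    # Disks (IaaS): 2 MB per metered transaction


def metered_transactions(io_bytes: int, chunk_bytes: int) -> int:
    """Transactions needed to move io_bytes, assuming every chunk is full."""
    return (io_bytes + chunk_bytes - 1) // chunk_bytes  # integer ceiling division


one_gib = 1024 ** 3
drive_txns = metered_transactions(one_gib, DRIVE_CHUNK_BYTES)  # 8192
disk_txns = metered_transactions(one_gib, DISK_CHUNK_BYTES)    # 512
```

Under that assumption, moving 1 GiB costs 16 times fewer transactions on a Disk than on a Drive. Small random IO gains nothing, since each request is a transaction regardless of chunk size.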


Caching

• Option to use RAM and local host drives to cache data
• Available for both Drives and Disks – implementation slightly different
• Trading off IOPS, Latency and Cost
  • Azure storage gives better IOPS but at higher latency and incurs transaction costs
  • Cache gives lower latency and is free, but lower IOPS if we need to go to disk

• Drives (PaaS)
  • Write-through cache
  • Caching only on first read (not first write and first read)

• Disks (IaaS)
  • Read/Write (write-back cache)
  • Read (write-through cache)
  • Write cache uses host RAM for storage. Non-flushed writes are subject to loss.

Most OLTP database scenarios are best with caching off (high random IO requirements)


Important Concept

The performance limits on Azure Storage are entirely artificial…

If you can deal with the latency and make your DAL storage-account aware then the performance you can get will be pretty awesome…


RDBMS


Windows Azure SQL Database

• SQL Server as Platform as a Service
  • 3 transactionally consistent replicas
  • Broad, but not complete, SQL Server compatibility

• Fully managed platform
  • Just talk TDS
  • CREATE DATABASE, DDL
  • Connect with DAL & Tools of choice

• Costs
  • Starts at $5/month for <100MB
  • As low as 99c/GB
  • Premium SKU provides dedicated capacity ($15/$30/day)

• Challenges
  • Non-deterministic performance – hard to capacity plan
  • Limited database size

#5

[Diagram: TDS, Firewall, Single Logical Database, Multiple Physical Replicas (Replica 1, Replica 2, Replica 3)]

Scott Klein has two sessions on Azure SQL Database

AZR314: Query Performance Tuning straight after this session in Epsom

AZR311: Azure SQL Database for the DBA 1510hr today in NZ3


Important Concept

Windows Azure is the cheapest high availability RDBMS that money can buy.

Azure Websites + SQL Azure == Awesome

Low cost to buy, low cost to operate


SQL Server in an IaaS Virtual Machine

• SQL Server installed in an IaaS VM
  • Start with MS provided image or build your own
  • Supports full SQL Server feature set
  • Transparent Data Encryption
  • Various HA options
  • Full text indexing
  • Something that doesn’t work with Windows Azure SQL Database
  • Database size of > 150GB

• Costs
  • Storage (per above. Keep an eye on transactions)
  • VM time (2c/hr through to $2.04/hr)
  • Bundled with SQL License if needed (6.5c/hr for SQL Web on XS to $6.25/hr for SQL Enterprise on A7)

• Performance Tuning Guide http://go.microsoft.com/fwlink/?LinkId=306266

#6


Getting the most out of SQL on IaaS

See TechEd US 2013 sessions that deep dive on HA & Perf Tuning
DBI311 Performance Tuning Microsoft SQL Server in Windows Azure Virtual Machines
MDC406 SQL Server High Availability and Disaster Recovery on Windows Azure VMs

• Capacity Plan by Load Testing
  • Performance of Disks is reasonably consistent
  • Performance of underlying VM is consistent
  • Can end up with some inconsistency between deployments
  • Most workloads with <1TB of data will suffice with a single data disk

• Increase Performance
  • Scale up VM Size (cores, memory and network IO). A7 SKU provides 56GB RAM and full rate IO (2GB/S)
  • Use multiple drives. Ideally placing DB files across multiple drives.
  • If you *really* need more than 500 IOPS for a single file (DB transaction logs, TempDB) then use Storage Spaces
  • Enable data compression if still IO bound

• High Availability
  • Use AlwaysOn Availability Groups
  • Various architectures: Single DC HA, Cross DC HA, Hybrid HA


The OSS Tent


The CAP Theorem (2000ish)

• Persistent state is hard to scale
• We’re all coming from the world of ACID databases. These do not scale in a distributed fashion

• A Distributed System Can Guarantee Two But Never Three Of These Things
  • Consistency
  • Availability
  • Partition Tolerance

Choosing among Cloud Computing Data Stores is usually about trading off CAP


C + A
Forfeit Partitions
2-Phase Commit to be Distributed
E.g. RDBMSs, Neo4j

C + P
Forfeit Availability
Pessimistic locking, quorum commit models
E.g. MongoDB, BigTable

A + P
Forfeit Consistency
Eventual consistency
E.g. Riak, Cassandra, Dynamo


Riak

• Focused on availability
  • Key/Value store
  • Masterless design means write availability can be guaranteed even if only one node is standing
  • Eventual consistency (i.e. A+P) but this is tunable

• Real Time Trade-offs of Consistency and Availability
  • When a Partition occurs can remain available but sacrifice consistency or;
  • Remain consistent but go “offline”
  • N = Nodes that must eventually be written to
  • W = Nodes that must be successfully written to before returning to client
  • R = Nodes that must be read in order to return value to client
  • Can tune W, R to get more consistency or more availability. (Leaving the DW thing aside for the moment…)

#7

W+R > N provides strong consistency
E.g. N=2, W=2, R=1 is strongly consistent but will fail for writes with any node failure
N=4, W=3, R=2 is strongly consistent and available under failure of a node

W+R <= N will always be eventually consistent
E.g. N=2, W=1, R=1 will have a period where the single read could return a value from a node other than that which was written to by the client

W=1 optimizes for write availability
R=1 optimizes for read availability
Can still be strongly consistent if we want, e.g. N=4, W=1, R=N=4 can write to a single node but must read the same value from all 4 nodes, ensuring consistency
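The N/W/R rules can be checked mechanically. The helpers below are a plain-Python sketch of the arithmetic on this slide, not Riak client code; the function names are made up for illustration.

```python
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """W + R > N means every read quorum overlaps every write quorum."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must each be between 1 and N")
    return w + r > n


def write_failures_tolerated(n: int, w: int) -> int:
    """Nodes that can be down while writes still reach W replicas."""
    return n - w


def read_failures_tolerated(n: int, r: int) -> int:
    """Nodes that can be down while reads still reach R replicas."""
    return n - r
```

Running the slide’s configurations through these: N=2, W=2, R=1 is strongly consistent but tolerates zero failed nodes for writes; N=4, W=3, R=2 is strongly consistent and tolerates one failed node for writes and two for reads; N=2, W=1, R=1 is only eventually consistent; N=4, W=1, R=4 is strongly consistent but tolerates zero failed nodes for reads.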


Riak

• Pre/Post Commit Hooks
  • A bit like triggers
  • Can transform data or even call off to another service on commit

• Search & Indexes
  • Secondary indexes
  • Solr-like full text searching
  • Map-reduce capability; useful for things like aggregates
  • Deployed on CentOS 6.2

• Pros & Cons
  • Classic NoSQL with all the modelling and querying pain that brings
  • No transactions and probably need to learn some Erlang
  • Able to achieve really high availability and tunable model is nice
  • If you like the look of Amazon Dynamo then Riak lets you do that on Azure


Riak in Azure

• Supported by Basho
  • http://basho.com/announcing-riak-on-microsoft-windows-azure/
  • Deployed on CentOS 6.2
  • VM available in MSOpenTech VMDepot http://vmdepot.msopentech.com/Vhd/Show?vhdId=66


Redis

• Focused on performance
  • In-memory Key/Value store with optional durability
  • Hundreds of thousands of reads/writes per second is easy to achieve
  • Supports atomic transactions
  • A range of complex data structure types

• Distribution
  • Master-Slave Replication. Async and non-blocking on the master
  • Slaves are writeable
  • A+P: Eventually consistent

• Trade-off of Performance and Durability
  • Batch based writing at intervals (RDB)
  • Append Only File recording every write (AOF)
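The RDB versus AOF trade-off can be illustrated with a toy in-memory store. `ToyStore` is hypothetical and is not how Redis is implemented; it also assumes every logged write has actually reached disk, which in real Redis depends on the configured fsync policy.

```python
class ToyStore:
    """Toy key-value store contrasting snapshot and append-only durability."""

    def __init__(self):
        self.data = {}       # live in-memory state
        self.snapshot = {}   # RDB-style: state as of the last periodic save
        self.aof = []        # AOF-style: log of every write, in order

    def set(self, key, value):
        self.data[key] = value
        self.aof.append((key, value))  # extra work on every single write

    def save_snapshot(self):
        # Done at intervals: cheap per write, but later writes are at risk.
        self.snapshot = dict(self.data)

    def recover_from_snapshot(self):
        return dict(self.snapshot)

    def recover_from_aof(self):
        # Replay the log from the start to rebuild the state.
        recovered = {}
        for key, value in self.aof:
            recovered[key] = value
        return recovered


store = ToyStore()
store.set("a", 1)
store.save_snapshot()
store.set("b", 2)  # written after the last snapshot, then the process "crashes"
```

Recovering from the snapshot yields only `a`; replaying the log yields both `a` and `b`. That is the trade: the snapshot is cheaper at write time, the log is safer at crash time.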

#8


Redis Example

• So we built this SharePoint site…
• …and it’s mission critical so we need it to be highly available…
• …and occasionally we have events that increase the load on the site by some 10s or orders of magnitude…
• …and so at the moment we’re spending several 10s of thousands each month on hosting…
• …and we still fall over when the load hits

Help!?! Can we use Azure?


[Diagram: Nginx + Redis (M) + Crawler4j; Nginx + Redis (S); Nginx + Redis (S); …]

Fix this.
What can they trade off?
Uptime for edit operations?


Redis on Azure

• Run Redis on Windows in Azure
  • http://msopentech.com/blog/2013/04/22/redis-on-windows-stable-and-reliable/

• Run Redis on Linux
  • Probably best to build your own image

• Use Redis Cloud managed service from Garantia
  • http://redis-cloud.com


Important Concept

Using a PaaS service is great, but, if you’re using it to buy an ultra high performance, ultra low latency data server then the round-trip through the load balancer to your service provider might kill the benefits.


Neo4j

• A Graph Database
  • If you can draw it on a whiteboard… you can store it in a Graph DB
  • Models Nodes and Relationships and properties of both
  • C + A: It’s more like a normal RDBMS with ACID properties

• Why a GraphDB
  • M:N relationships are funky
  • Deep traversals of the graph are hard and expensive
  • Many current problems well suited to graph model
  • Recommendation engines
  • Social networks

• E.g. Path Exists a la LinkedIn
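The “path exists” question is a graph traversal. The sketch below is a plain-Python breadth-first search over an adjacency map, not Neo4j or Cypher; the people and the `max_hops` parameter are made up for illustration.

```python
from collections import deque


def path_exists(graph, start, goal, max_hops=None):
    """Breadth-first search: is goal reachable from start within max_hops edges?"""
    if start == goal:
        return True
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if max_hops is not None and hops >= max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return True
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, hops + 1))
    return False


# Hypothetical connection graph: alice - bob - carol - dave, erin is isolated.
connections = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob", "dave"},
    "dave": {"carol"},
    "erin": set(),
}
```

Here dave is three hops from alice, so the search succeeds with no limit or with `max_hops=3` and fails with `max_hops=2`. A graph database stores the relationships so that each hop is a direct pointer follow rather than a join, which is why deep traversals stay cheap there.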

#9


Neo4j on Azure

• On Windows on Azure
  • Supported by Neo4j with a pre-built Cloud Service package
  • http://tinyurl.com/az-neo4j

• On Linux
  • Pre-built VM from VMDepot http://vmdepot.msopentech.com/List/Index?sort=Featured&search=Neo4j


ElasticSearch

• Distributed Fancy-Pants Search Server
  • Built on Lucene
  • A+P: Eventually consistent
  • Analytics and query improvement feedback
  • JSON document based

• Fancy-Pants + Lucene
  • One of the best full text search tools available
  • Wildcards
  • Fuzzy
  • Proximity
  • Etc…

• Geospatial querying
  • Distance
  • Bounding ‘box’

• RESTful Interface
  • Very well suited to PaaS type consumption

#10


Elastic Search on Azure

• On Windows on Azure
  • Coming….
  • http://stackoverflow.com/questions/16475075/whats-the-recommended-elasticsearch-deployment-on-windows-azure

• On Linux
  • Build your own VM at this stage

• As PaaS
  • https://facetflow.com/
  • 5000 documents for free
  • Not in production yet…

• Solr is another option.


MongoDB

• Document Database
  • Stores and indexes BSON (Binary JSON) documents
  • Schema-less
  • Supports a concept of server-side functions
  • Master-Slave Replication + Sharding
  • C+A by default
  • Indexing includes Geo support
  • GridFS provides potential for ‘updateable distributed web server’

• Huge Amounts of Data
  • Hard to beat for absolute scale
  • Combination of Sharding and Replication used to tune for data volume and requests
  • Flexible schema supports change over time

• An elegant programming model
  • JSON documents are easy to work with in HTML applications
  • Direct storage of ‘Model’ from your MV* style application

#11


Important Concept

Who remembers Client-Server?

In a case of what goes around comes around, expect to see the re-emergence of something that looks a lot like Client-Server.

JavaScript MV* Framework against NoSQL REST Backend


MongoDB on Azure

• On Windows Azure VMs
  • http://blogs.msdn.com/b/interoperability/archive/2012/07/09/mongodb-installer-for-windows-azure.aspx

• On Windows Azure in a Worker Role
  • https://github.com/mongodb/mongo-azure/

• On Ubuntu using pre-built image
  • http://vmdepot.msopentech.com/Vhd/Show?vhdId=149

• As PaaS on Azure via MongoLab
  • http://www.windowsazure.com/en-us/store/service/?id=527f070d-3339-43dd-9c54-d43f7befc2f9

#11


Polyglot Persistence

• Use a whole bunch of different stores
• Each store best for a specific purpose

• eCommerce App
  • Financial Transactions
    • SQL Azure
  • Purchase & Browse History & Likes
    • Neo4j to support recommendations engine
    • Maybe imported via a Hadoop style pipeline
  • Sessions and Shopping Cart
    • Redis? Riak?
  • Product Catalog
    • Mongo
    • ElasticSearch for indexing
  • Product images
    • Blob storage
    • Serve via CDN

#12


Important Concept

Even if many of your chosen data tools provide strong consistency, in a Polyglot world the platform as a whole will be eventually consistent


Related content

AZR412 – Azure Storage Tables
AZR311 – SQL Azure for the DBA
AZR314 – Query tuning on SQL Azure

NoSQL Distilled (Martin Fowler) http://tinyurl.com/ns-distilled
7 Databases in 7 Weeks http://tinyurl.com/ns-7DB

Find Me Later At The Intergen Booth


Evaluate this session and you could win instantly!

Head to...aka.ms/te


© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.