DESCRIPTION
How do you manage performance in the cloud, in particular in "Platform as a Service (PaaS)" environments like Windows Azure or Heroku, where you don't have a "virtual machine" to manage? Even in "Infrastructure as a Service (IaaS)" environments like Amazon EC2 there are limitations on the tools you can deploy to assist in performance management, troubleshooting, etc. (e.g. you can't deploy promiscuous mode network sniffing tools in EC2). James Smith from Adactus will give us an overview of Cloud Services as a whole, and then drill down into some of the issues they have experienced in deploying their "Pulse" Claims Management Solution into the Azure cloud (http://www.pulseclaims.com/home). Beyond just looking at page speed performance, he'll talk about the challenges involved in managing SLAs, Cloud "support" (or lack of it!), performance troubleshooting and the whole "performance lifecycle".
Managing Performance in the Cloud
TheDevMgr
BACKGROUND
Cloud History
In the nineties…
• Desktop internet computing
• Shift from local to centralised computing
• Software was cheap and hardware was expensive.
The carefree noughty days
• Shift from desktop to mobile
• The cloud is born
• Bezos and his book company start to shape the future.
The twenty-tens
• Shift from centralised to distributed computing
• Commoditisation of computing (PAYG)
• Anything-as-a-Service (XaaS).
THE CLOUD
What is it?
Service Models
• XaaS: Anything
• SaaS: Software
• PaaS: Platform
• IaaS: Infrastructure
Infrastructure (IaaS)
• Outsource hardware to support operations
  – Storage, servers, networking components
• Service provider owns and hosts equipment
• Service provider responsible for management & maintenance.
Platform (PaaS)
• Paradigm for delivering operating systems and associated services over the Internet
• No downloads or installation
• Google App Engine, Microsoft Windows Azure, Heroku & Force.com.
Software (SaaS)
• Software distribution model in which applications are hosted by a vendor or service provider
• Made available to customers over the Internet
• SalesForce.com, many...many...more.
Deployment Models
Private, Public, Hybrid
Private Cloud
• “Virtualised” infrastructure operated for a single organisation (single tenant)
• Hosted internally or externally
• Managed internally or by a third-party
• Can be secured to meet compliance
• More expensive, less flexible.
Public Cloud
• Service provider makes resources available to the general public over the Internet
  – Compute, Storage, O/S, Applications
• May be free or pay-per-usage model
• Fast deployment, short commitments
• Shared services, less control.
Hybrid
• Core platform on private cloud
• Burstable capability into public cloud
• Brings best of both private and public
• Brings problems of both private and public.
THE COST OF POOR CLOUD PERFORMANCE
Financial and customer satisfaction
Cost
• Compuware survey suggests large business losses can exceed £500k due to poor cloud performance
• 57% of European IT Directors believe that they can’t manage cloud application performance
• You still have to deliver 2 second response times.
Performance
• 50% of ops teams have suffered more than one P-1 performance issue in the cloud
• 33% experience a P-1 issue every month
• 60% of incidents took more than 2 hours to resolve
• Good luck webops (cloudops). Source: AppDynamics
COMMON PERFORMANCE CHALLENGES
Traditional and new problems
Performance Challenges
• Traditional
  • Connectivity
    – Bandwidth / Latency
  • Bottlenecks
    – CPU, IO, Database
• Contemporary
  • Bigger scale
    – More stuff
  • Shared infrastructure
    – Not your stuff (entirely).
Traditional
• Connectivity
  • Latency, jitter & packet loss
  • Bandwidth limitations
  • Users demand fast access to data
• Bottlenecks
  • Will still occur!
  • Virtualised hardware
    – Host Contention
    – Storage.
Contemporary
• Bigger Scale
  • 10s, 100s, 1,000s, 10,000s of servers
    – VM Sprawl
  • Dynamically allocated physical resource
  • Over-provisioning
  • Hidden billing costs
• Shared Resources
  • Room for one more?
  • Deal with other people’s problems
    – DDoS, general stupidity?
    – Mi casa es tu casa.
• Elasticity
  – Planned (scheduled/controlled scaling)
  – Unplanned (auto-scaling)
• Global distribution
  – Data Centres
  – Data
• Less Control.
Paradigm Shift
Data location still matters!
CLOUD EXPERIENCES
Stories from the trenches
INFRASTRUCTURE-AS-A-SERVICE
IaaS
Application
• Adactus Food Ordering Platform
• Transacts
  – > 7 million orders & > $100M USD a year
  – 30% of daily orders taken in 1 hour
• Adopted as eCommerce platform for Pizza Hut and KFC globally.
Platform
• Private
  • Global instances all deployed on private clouds
  • VMware ESX Hosts
    – V-Webs
  • Dedicated / Non-Virtualised SQL
• Public
  • Rackspace public cloud
  • On-Demand
    – Load Balancers
    – Web Servers
    – SQL Servers
  • High-scale, high-volume.
Challenges
• Big Scale
  – A lot more to manage
• Virtual Platform
  – Contention
• End-to-End Application Performance Management.
Solutions
• Cloud-centric APM
  – AppDynamics
  – CloudKick (now Rackspace APM)
  – Rightscale
• Automated Operations
  – Chef, Puppet (SysOps)
  – CloudFoundry, OpenShift (App LifeCycle)
  – Heroku, AppFog (NoOps?)
PLATFORM-AS-A-SERVICE
PaaS
Application
• Adactus Pulse
• Claims management solution for the insurance industry delivered as SaaS
• Processed over a million claims
• Deployed for ISS and Aviva.
Platform
• Deployed into Windows Azure Platform
  – Web Roles
  – Worker Roles
  – SQL Azure
  – SQL Azure Reporting Services
• Upgrade of traditional ASP.NET application
• Continuous Deployment Process.
Challenges
• Disproving the “shared resource” impact
  – Is it the infrastructure?
• Database performance is a black-box
  – Limitations and more limitations
• Getting performance data is hard work
  – Not easy to access, dispersed everywhere
• Baseline performance is not linear.
Baseline Performance
Large variances in baseline performance.
Windows Azure is more consistent.
Solutions
• Instrumentation is king
  – Aspect-oriented programming (AOP)
    • Gibraltar
  – Does your provider offer a Performance API?
• Dedicated Cloud (Azure) Tools
  • Dynatrace
  • Cerebrata
• You must automate
  – Deployment (and everything else!)
  – Consistency is key.
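As a rough illustration of "instrumentation is king", the sketch below wraps a function in a timing decorator, which is one lightweight way to get AOP-style cross-cutting measurement. It is not how Pulse or Gibraltar does it; `TIMINGS`, `timed` and `load_claim` are hypothetical names invented for this example, and a real deployment would ship the measurements to a logging or APM service rather than keep them in memory.

```python
import functools
import time

# Hypothetical in-memory store; a real system would forward these
# measurements to a logging or APM service.
TIMINGS = {}


def timed(func):
    """Record the wall-clock duration of every call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether the call succeeded or raised.
            elapsed = time.perf_counter() - start
            TIMINGS.setdefault(func.__name__, []).append(elapsed)
    return wrapper


@timed
def load_claim(claim_id):
    # Stand-in for a real data-access call.
    time.sleep(0.01)
    return {"id": claim_id}
```

Calling `load_claim(42)` returns the claim as usual and appends one duration to `TIMINGS["load_claim"]`, so the business code stays free of timing logic.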
DATABASE-AS-A-SERVICE
DaaS
Overview
• Service provider takes responsibility for installing and maintaining the database.
• Amazon (MySQL)
• Microsoft SQL Azure
• Google App Engine Datastore
• CouchDB, MongoDB.
Challenges
• Most service providers are having performance issues (even Google!)
• Database is a (performance) black-box
  – You will find limitations
• Need to handle transient connections
  – Your database will be there, but not always.
Solutions
• Do as much tuning outside of the cloud as possible
• Instrument your data access
• DB sharding becomes viable and easy
• Build connection resiliency into your data-framework.
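One common way to build connection resiliency is to retry transient failures with exponential backoff and jitter. The sketch below assumes the data layer raises a dedicated exception for retryable faults; `TransientError` and `with_retries` are hypothetical names for this example, not part of any database driver, and the attempt count and delays are illustrative defaults.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for faults worth retrying (e.g. a dropped connection)."""


def with_retries(operation, attempts=5, base_delay=0.1, max_delay=5.0,
                 sleep=time.sleep):
    """Call operation(), retrying on TransientError with backoff and jitter."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            # Double the delay each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter avoids many clients retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
```

Only errors known to be transient are retried; anything else propagates immediately, so genuine bugs are not hidden behind retries.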
Caution
• On-premise databases
  – Are you sure?
• You might be about to create your own data storm?
  – Too much on-premise data
  – Too little bandwidth.
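A quick back-of-envelope calculation shows how "too much data, too little bandwidth" plays out. The function below is a sketch only: `transfer_hours` is a hypothetical helper, and the 80% link efficiency is an assumed figure, not a measured one.

```python
def transfer_hours(data_gb, bandwidth_mbps, efficiency=0.8):
    """Estimate the hours needed to move data_gb gigabytes over a link.

    bandwidth_mbps is the nominal link speed in megabits per second;
    efficiency is the assumed fraction of that speed actually achieved.
    """
    bits = data_gb * 8 * 1000**3
    seconds = bits / (bandwidth_mbps * 1000**2 * efficiency)
    return seconds / 3600
```

Under these assumptions, `transfer_hours(500, 100)` comes to roughly 13.9 hours to move 500 GB over a 100 Mbps link.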
SOFTWARE-AS-A-SERVICE
SaaS
Overview
• Adactus Pulse
  – Delivered on a SaaS Model
• We consume SaaS (heavily)
  – CRM, Performance, Google Apps, Wiki, Bug Tracking, Testing, Accounting, Planning & Forecasting, Document Management, CMS, Exception Handling, Business Intelligence, Deployment, APM, Collaboration, HRM, ERP and more.
Challenges
• Consumer
  • Good news
    – Performance is out of your control!
  • Bad news
    – Performance is out of your control!
• Provider
  • Expectations are high!
    – Response times
  • Performance is still king!
    – Competitors
    – Repeat use.
Real User Monitoring
• Consumer
  • It’s your new best friend
  • Get to know your SLA
    – It’s your new best friend
  • Simple rules
    – Be the first to know
    – Get your money back
• Provider
  • It’s your new best friend
  • You will live & die by your SLAs
  • Simple rules
    – Be the first to know
    – Tell your customers.
Monitoring
XaaS
• SaaS: RUM
• PaaS: Instrumentation
• IaaS: APM
BEYOND PERFORMANCE
Stories from the trenches
Service-Level-Agreements
• Critical element for both provider and consumer
• Don’t waste time on detailed numerical service level agreements
• SLAs need to be based on end-user experience.
Service-Level-Agreements
1. Establish system availability
2. Establish system response time
3. Establish error resolution time
4. Establish a failover window for disaster recovery
5. Ensure that you can get your data back.
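To make the first two items concrete, here is a minimal sketch of checking availability and response time against SLA targets using samples of real user requests. `check_sla` and `percentile` are hypothetical helpers, and the 2-second, 99.5% and 95th-percentile thresholds are assumed defaults for illustration, not terms of any actual agreement.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct from 1 to 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_sla(samples, max_response_s=2.0, min_availability=0.995, pct=95):
    """Evaluate a non-empty list of (succeeded, response_seconds) samples."""
    ok_times = [t for succeeded, t in samples if succeeded]
    availability = len(ok_times) / len(samples)
    # With no successful requests, the response-time target cannot be met.
    response_s = percentile(ok_times, pct) if ok_times else float("inf")
    return {
        "availability": availability,
        "response_s": response_s,
        "met": availability >= min_availability and response_s <= max_response_s,
    }
```

Measuring at a percentile rather than the mean keeps the check tied to what most users actually experience, which is the point of basing SLAs on end-user experience.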
Service-Level-Agreements
• IaaS
  – The O/S is your responsibility
    • Managed Cloud Platforms are available
• PaaS
  – SLAs stop at the O/S
    • Your application still remains your responsibility
• SaaS
  – Know your SLA inside out. It’s your responsibility.
Disaster Recovery
• It’s hard in the cloud
• DR strategies are still emerging
• Bandwidth & network capacity limits
• Security is still a concern.
Disaster Recovery
• There isn’t a single blueprint
• Identify critical resources and recovery methods
• Architect for redundancy
• Back up to/from and restore to/from the cloud
• Most cloud SLAs > 99.5% availability
  – Up to about 3 hours, 39 minutes downtime per month.
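The downtime an availability figure permits is simple arithmetic, sketched below. `allowed_downtime_minutes` is a hypothetical helper, and the default period is an assumed average month of 730.5 hours (365.25 days / 12).

```python
def allowed_downtime_minutes(availability_pct, period_hours=730.5):
    """Maximum downtime, in minutes, permitted by an availability percentage.

    period_hours defaults to an average month (365.25 days / 12).
    """
    return (1 - availability_pct / 100) * period_hours * 60
```

At 99.5% this works out to about 219 minutes, or roughly 3 hours 39 minutes per month; at 99.9% it drops to about 44 minutes.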
THANK YOU. QUESTIONS?
That’s all folks!