Bright talk running a cloud - final

http://img11.imageshack.us/img11/2017/skatingdownarollercoastw.jpg

Running a Cloud: How the Cloud Impacts Service Management and IT Operations

Mr. White has fifteen years of experience designing and managing the deployment of systems monitoring and Event Management software. Prior to joining IBM, Mr. White held various positions, including leading the Monitoring and Event Management organization of a Fortune 100 company and developing solutions as a consultant for a wide variety of organizations, such as the Mexican Secretaría de Hacienda y Crédito Público, Telmex, Wal-Mart of Mexico, JP Morgan Chase, Nationwide Insurance, and the US Navy Facilities and Engineering Command.

Andrew White
Cloud and Smarter Infrastructure Solution Specialist
IBM Corporation

http://weheartit.com/entry/12433848

Follow Us: #ITSMSummit

GROUND RULES FOR THIS SESSION…

1. If you can’t tell if I am trying to be funny… GO AHEAD AND LAUGH!
2. Feel free to text, tweet, yammer, or whatever. Use the session hashtag.
3. If you have a question, no need to wait until the end. Just interrupt me. Seriously… I don’t mind.

I have a lot of experience leading Systems and Event Management teams.

My name is Andrew White.

I am here today to share some of what I have learned about Cloud Operations.

More importantly, I am here today to talk about how the cloud affects…

QUESTION: What value does your IT organization create for your business?

If you can’t answer this question, how can you be sure you are doing the right things and doing them well?

HINT: “We provide infrastructure or applications the business uses” is not a value statement.

We are all here for one reason…

How does IT preserve the value it creates?

• 100% Uptime*
• Scalability*
• Performance*
• Agility*
• Good UX*

*To the best of our ability

How well would THEY say you are doing?

CURRENT MARKET CONDITIONS

§ The velocity of change and the volume of data are increasing
§ Virtualization introduces complexity and increases the consumption of resources
§ Shared services are forced to oversubscribe finite resources
§ Expertise is limited to functional silos, and no one understands how the system functions end-to-end
§ Supporting a cloud requires the ability to manage a large-scale, dynamic infrastructure
§ Agile development and Continuous Delivery are in conflict with ITIL processes

We need to recognize when we have problems to solve.

THE TRADITIONAL APPROACH

To solve problems quickly, we look for solutions that we can use to define best practices and develop processes to insert a measure of control.

THE PROBLEM WITH THIS APPROACH

§ Solutions are driven by accepted conventions
§ Best practices are coveted and are usually adopted without understanding how and why they were developed
§ There must always be a right answer
§ No logical analysis is required
§ People are frequently seen as the “root cause”
§ The outcomes are enforced using “re-dos” and punitive actions (or the looming threat of these things)

http://leanhomebuilding.files.wordpress.com/2010/12/standard2.jpg

HOW DO WE KNOW WE NEED TO CHANGE?

§ We receive feedback from our business partners that system performance and availability have been unacceptable for many of our critical business applications
§ Our productivity is impacted and we fail to meet delivery timelines
§ IT is not able to measure its impact on the business or the end user experience
§ There is a lack of clear communication during a problem
§ People are “hoarding” data and reports
§ IT lacks the information needed to prioritize performance issues and opportunities based on business need
§ We take a really long time to figure out what is wrong
§ The same old problems keep coming back
§ We never really get to the “true root cause”

CONTROL IS AN ILLUSION

Our typical approach towards service improvement is a bit like attempting to put the toothpaste back in the tube.

“Some problems are so complex that you have to be highly intelligent and well informed just to be undecided about them.”
-- Laurence J. Peter

“Organizations don’t fail because they take the wrong path, they fail because they can’t imagine a better path than the one they are on.”
-- Marty Neumeier

What is the next step in the evolution?

Is it the infrastructure or the application? The perennial problem…

DRIVING THE RIGHT KIND OF ACTION

[Diagram: an Application layer measured by End User Experience (Transaction 1 … Transaction N at each of Gainesville, San Antonio, Des Moines, and Columbus), and an Infrastructure layer (Network, Mainframe, Storage, Linux, Middleware, and Database, each tracked by KPI 1 … KPI N)]


Who ya gonna call?


CLOUD PAIN POINTS

§ It takes too long to diagnose problems in the application and infrastructure
§ Existing management tools are outdated and don’t work at scale
§ Critical information is missed, causing outages and poor user experiences
§ Most problems are managed reactively

Does any of this sound familiar?

DRIVING THE RIGHT KIND OF ACTION

[Diagram: the same Application (End User Experience transactions by location) and Infrastructure (Network, Mainframe, Storage, Linux, Middleware, and Database KPIs) view, now with The Cloud added]

REQUIREMENTS FOR UNITY OF EFFORT

1. Command and Control
2. Shared Experience
3. Situational Awareness

Symptoms of Missing Elements
• Command and Control (No Leadership)
  • The team lacks a clear direction
  • Lots of activity, lack of progress
• Shared Experience (Poor Relationships)
  • Us vs. Them mentality
  • Unhealthy competition
• Situational Awareness (Poor Communication)
  • Focused on cooperation, not collaboration
  • Blame culture
  • Infrequent or non-existent communication

TWO TYPES OF DECISION MAKING

§ Programmed Decisions
  § Routine
  § Repetitive
  § Well-Structured
  § Predetermined Decision Rules
§ Non-Programmed Decisions
  § Unique
  § Presence of Risk
  § Presence of Uncertainty
  § Black Swans

BOYD’S OODA “LOOP”

Observe, Orient, Decide, Act

[Diagram: Observe (Outside Information, Unfolding Circumstances, Unfolding Interaction With Environment, Implicit Guidance & Control) feeds forward to Orient (Cultural Norms, Cognitive Abilities, Knowledge Life Cycle, Prior Wisdom, New Information), which feeds forward to Decide (Decision as Hypothesis) and Act (Action as Test), with Feedback returning to Observe]

• Note how observation shapes orientation, shapes decision, shapes action, and in turn is shaped by the feedback and other phenomena coming into our sensing or observing window.
• Also note how the entire “loop” (not just orientation) is an ongoing many-sided implicit cross-referencing process of projection, empathy, correlation, and rejection.

From “The Essence of Winning and Losing,” John R. Boyd, January 1996.

INCIDENT LIFE CYCLE

[Diagram: from the start of an Outage, Down Time = Detection Time + Response Time + Repair Time + Recovery Time, spanning the stages Detection, Diagnosis, Repair, Recover, and Restore, shown against Observe, Orient, Decide, and Act]

ANATOMY OF AN OUTAGE

[Diagram: Corporate LANs & VPNs, Load Balancer, Firewall, Web Servers, Message Queue, WAS, Database, zOS CICS, zOS MQ, and DB2, with numbered callouts 1 to 5]

IM01109089: P0 - Affecting Multiple apps

1. 5:45-ish pm: CICS ABENDs start flooding the console, but not at a level high enough to open a ticket
2. 6:00-ish pm: MQ flows are interrupted and start alerting in Flow Diagnostics
3. 6:04 pm: Synthetic transactions fail; at 6:14 pm the Ops Center confirms the issue and creates a P0 incident
4. 6:54 pm: Support teams investigate the interrupted flows and determine it is a “back-end” problem
5. 10:29 pm: Support teams investigate MQ, ultimately rule it out, and decide to reset CICS to resolve the issue
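The incident life cycle splits down time into phases. One way to see where the hours went in the outage above is to measure the gaps between its milestones. The Python sketch below does that arithmetic. The milestone labels are illustrative, and treating roughly 5:45 pm (when the ABENDs began) as the start of the outage is an assumption. The timeline gives only approximate times and no time for completed recovery.

```python
from datetime import datetime


def phase_durations(milestones):
    """Minutes elapsed between consecutive (label, "HH:MM") milestones.

    Assumes all milestones fall on the same day and are listed in order.
    """
    parsed = [(label, datetime.strptime(clock, "%H:%M")) for label, clock in milestones]
    durations = {}
    for (start_label, start), (end_label, end) in zip(parsed, parsed[1:]):
        minutes = int((end - start).total_seconds() // 60)
        durations[f"{start_label} -> {end_label}"] = minutes
    return durations


# Milestones from the outage timeline on a 24-hour clock; labels are illustrative.
timeline = [
    ("CICS ABENDs begin", "17:45"),
    ("P0 incident created", "18:14"),
    ("back-end problem identified", "18:54"),
    ("decision to reset CICS", "22:29"),
]

durations = phase_durations(timeline)
for phase, minutes in durations.items():
    print(f"{phase}: {minutes} min")
print("total:", sum(durations.values()), "min")
```

Under these assumptions the script reports 29, 40, and 215 minutes, for 284 minutes in total. Most of that time falls after the problem was labelled “back-end” and before a fix was chosen, not in detection.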

http://www.ithakabound.com/wp-content/uploads/2010/02/DC-Snow-men-pushing-car.jpg

Why did this happen?

Four Sources of Bad Decisions:

1. Failure to frame the problem correctly
2. Poor use of evidence
3. Faulty decision making process
4. No feedback for improvement

WHERE THE BREAKDOWN OCCURS

Observe, Orient, Decide, Act

[Diagram: the Current State feeds Situational Awareness (Level 1: Perception of Elements in Current Situation; Level 2: Comprehension of Current Situation; Level 3: Projection of Future Status), which drives Decision and then Performance of Actions, with Feedback to the Current State]

Systemic Influences: System Capability, Interface Design, Stress & Workload, Complexity, Automation
Individual Influences: Goals & Objectives, Preconceptions, Expectations; Abilities, Experience, Training; Long Term Memory, Automaticity; Cognitive Processes

Adapted from Endsley, M.R. (1995b). Toward a theory of situation awareness in dynamic systems. Human Factors 37(1), 32–64.

SOMETIMES WE MISS WHAT IS GOING ON

Say… what’s a mountain goat doing all the way up here in a cloud bank?

NORMATIVE DECISION MAKING MODEL

§ Limited Information Collection
  § 7 +/- 2
  § Tendency to acquire manageable rather than optimal amounts of information
  § Difficulty identifying all possible options
§ Judgmental Heuristics
  § Judgmental heuristics: rules of thumb or shortcuts that people use to reduce information processing demands
  § Availability heuristic: tendency to base decisions on information readily available in memory
  § Representativeness heuristic: tendency to assess the likelihood of an event occurring based on impressions about similar occurrences
§ Satisficing
  § Choosing a solution that meets a minimum standard of acceptance
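Satisficing is easiest to see next to its opposite. This Python sketch contrasts two approaches: taking the first option that clears a minimum bar, and evaluating every option before choosing. The remediation options and their scores are made up for illustration. Real responders rarely have a numeric score to hand.

```python
def satisfice(options, score, minimum):
    """Return the first option whose score meets the minimum standard (satisficing)."""
    for option in options:  # options are considered in the order they come to mind
        if score(option) >= minimum:
            return option
    return None  # nothing met the bar


def optimize(options, score):
    """Return the best-scoring option after evaluating every one."""
    return max(options, key=score)


# Hypothetical remediation options with made-up effectiveness scores (0-10).
fixes = {"restart web tier": 6, "reroute MQ traffic": 7, "reset CICS region": 9}
considered = list(fixes)  # the order in which a responder happens to think of them

print(satisfice(considered, fixes.get, minimum=5))  # restart web tier
print(optimize(considered, fixes.get))              # reset CICS region
```

The satisficer stops at the first “good enough” fix. The result depends on the order in which options come to mind, which is where the availability heuristic creeps in.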

Our systems are capable of producing a huge amount of data, both on the status of their own components and on the status of the environment. The problem with today’s systems is not a lack of information, but finding what is needed when it is needed.

1. Adapted from Endsley, M.R. (1995b). Toward a theory of situation awareness in dynamic systems. Human Factors 37(1), 32–64.


Why does any of this matter?

REQUIREMENTS FOR UNITY OF EFFORT: Command and Control, Shared Experience, Situational Awareness

In the cloud, much of this will be federated or done by software.

CLOUD IS ASSISTED DECISION MAKING

§ Programmed Decision Making
  § Collect evidence
  § Identify the problem
  § Select a solution
  § Implement and evaluate the outcome
§ Non-Programmed Decision Making
  § Narrow evidence down to the ideal level
  § Apply heuristics to limit the impact of cognitive bias
  § Present options to a human for a decision
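The two paths above can be sketched as a router. Evidence that matches a predetermined rule is handled automatically. Anything else is narrowed to a short list for a person. The rule table, problem signatures, and confidence scores below are hypothetical. The default shortlist of 5 is chosen to stay within the “7 +/- 2” limit mentioned earlier. The “implement and evaluate” step is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical rule table: a recognized problem -> its predetermined solution.
PROGRAMMED_RULES: Dict[str, str] = {
    "disk_full": "purge temp files and expand volume",
    "service_down": "restart service",
}


def decide(evidence: List[Tuple[str, float]], max_options: int = 5):
    """Route evidence to a programmed or a non-programmed decision path.

    evidence: (problem_signature, confidence) pairs collected from monitoring.
    Returns ("automate", action) for a recognized problem, otherwise
    ("ask_human", shortlist) with at most `max_options` candidates.
    """
    if not evidence:
        return ("no_action", None)
    # Identify the problem: rank signatures by confidence.
    ranked = sorted(evidence, key=lambda item: item[1], reverse=True)
    top_signature = ranked[0][0]
    # Programmed path: a predetermined rule exists, so select its solution.
    if top_signature in PROGRAMMED_RULES:
        return ("automate", PROGRAMMED_RULES[top_signature])
    # Non-programmed path: narrow the evidence to a short list for a person.
    shortlist = [signature for signature, _ in ranked[:max_options]]
    return ("ask_human", shortlist)


print(decide([("service_down", 0.9), ("network_latency", 0.4)]))
print(decide([("unknown_jvm_pause", 0.7), ("gc_pressure", 0.6), ("db_lock", 0.2)], max_options=2))
```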

DECISIONS BEING AUTOMATED IN THE CLOUD

Packing
• Compressing workloads onto the fewest number of physical servers
• Maximizing cost efficiencies

Striping
• Spreading workloads across as many physical servers as possible
• Ensuring higher performance levels and reducing risk due to component failure

Load-Awareness
• Allocating new workloads to the servers with the lowest load
• Maximizing the performance of the workloads

HA-Awareness
• Ensuring workloads are distributed across pods
• Matching availability levels with service requirements and cost targets

Energy-Awareness
• Placing workloads according to energy costs
• Ending workloads to reduce energy consumption, or rescheduling them for off-peak hours

Affinity-Awareness
• Placing workloads close to critical resource dependencies
• Collocating compatible workloads to maximize available resources

Platform-Awareness
• Allocating workloads to the best platform
• Migrating workloads to the least expensive platform still capable of delivering required service levels

Topology-Awareness
• Allocating resources within a service group near each other
• Isolating single points of failure
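To make the first three placement policies concrete, here is a minimal Python sketch over a single capacity dimension. The `Server` type, capacity units, and policy functions are hypothetical. Packing is approximated by best fit, a common bin-packing heuristic. Striping is approximated by choosing the server hosting the fewest workloads, and load-awareness by choosing the lowest utilization. Real schedulers weigh CPU, memory, affinity, and HA constraints together.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    capacity: int
    used: int = 0
    workloads: int = 0

    def free(self) -> int:
        return self.capacity - self.used


def fits(servers, demand):
    return [s for s in servers if s.free() >= demand]


def packing(servers, demand):
    """Best fit: the fullest server that still has room, so fewer servers stay busy."""
    return min(fits(servers, demand), key=lambda s: s.free(), default=None)


def striping(servers, demand):
    """Spread out: the server currently hosting the fewest workloads."""
    return min(fits(servers, demand), key=lambda s: s.workloads, default=None)


def load_aware(servers, demand):
    """Lowest load: the server with the lowest utilization ratio."""
    return min(fits(servers, demand), key=lambda s: s.used / s.capacity, default=None)


def place_all(policy, demands, capacities=(10, 10, 10)):
    """Place each demand in order on fresh servers; None means nothing had room."""
    servers = [Server(f"host{i}", c) for i, c in enumerate(capacities)]
    placement = []
    for demand in demands:
        target = policy(servers, demand)
        if target is not None:
            target.used += demand
            target.workloads += 1
        placement.append(target.name if target is not None else None)
    return placement


demands = [4, 3, 2, 1]
print(place_all(packing, demands))     # ['host0', 'host0', 'host0', 'host0']
print(place_all(striping, demands))    # ['host0', 'host1', 'host2', 'host0']
print(place_all(load_aware, demands))  # ['host0', 'host1', 'host2', 'host2']
```

Packing leaves two hosts idle, which is the cost-efficiency trade-off. Striping and load-awareness agree until the fourth workload: striping counts workloads, while load-awareness looks at how full each host is.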

CLOUD OPERATION REQUIREMENT

“The perception of and reaction to a set of changing events in terms of what can be done, instead of merely the recollection of a stimulus.”¹

Operating a cloud means enabling good decision making.

1. Adapted from Endsley, M.R. (1995b). Toward a theory of situation awareness in dynamic systems. Human Factors 37(1), 32–64.

When decisions are not made based on information, it’s called gambling.

SOME THINGS NEVER CHANGE

[Diagram: components surrounding The Cloud: Corporate LANs & VPNs, Home Wireless & Broadband, Mobile Broadband, ISP Connection, DNS & Internet Services, Content Mgmt System, Social Network Widgets, Site Tracking & Analytics, Banner Ads & Revenue Generators, and Multimedia & CDN Content]

Is It My Cloud Provider?
• Configuration errors
• Application design issues
• Code defects
• Insufficient infrastructure
• Oversubscription issues
• Poor routing optimization
• Low cache hit rate

Is It a Service Provider Problem?
• Non-optimized mobile content
• Bad performance under load
• Blocking content delivery
• Incorrect geo-targeted content

Is It an ISP Problem?
• Peering problems
• ISP outages

Is It My Code or a Browser Problem?
• Missing content
• Poorly performing JavaScript
• Inconsistent CSS rendering
• Browser/device incompatibility
• Page size too big
• Conflicting HTML tag support
• Too many objects
• Content not optimized for device

OUR UNDERSTANDING OF YOUR GOALS

Migrating to the cloud is disruptive to an IT organization. We have seen many of our clients use it as an opportunity to re-evaluate the way they operate their environments and the tools they use to deliver a quality service. We have identified three key goals driving the adoption of the cloud:

PROACTIVE OPERATIONS
§ Gaining visibility into and control of an increasingly complex operating environment in order to prevent frequent and prolonged outages
§ Evolving from fault monitoring to a holistic approach to managing application performance
§ Increased focus on cloud makes problem isolation and resolution more complex

CONTROL COST
§ Optimizing the performance of business processes to boost productivity
§ Providing cost transparency to track, analyze, and manage resources and control the costs associated with highly virtualized and cloud environments
§ Improving software asset management to prevent over-spending and under-licensing

ELIMINATE HUMAN FACTORS
§ Leveraging automation to facilitate rapid growth and reduce the cost of service delivery
§ Maintaining OS and application patch levels across all images (active or dormant) to protect the enterprise and enable compliance
§ Automating application releases to optimize service delivery and align the Development and Operations teams, thereby increasing innovation, reducing costs, and accelerating time to value

OK. So now what?

Starting the journey…

WHAT THIS MEANS TO US…

There are a few inescapable facts we face:
1. We need reliable systems to store the promises we make to our customers
2. Our systems mirror the complexity of the businesses they support
3. Our environments must be massive in scale to handle the workload
4. There is too much activity for a single person to be totally situationally aware
5. If the users can’t use it, it doesn’t work

ROADMAP TO MATURE CLOUD OPERATIONS

The building blocks on your journey towards an agile, flexible and optimized environment

[Diagram: maturity stages Infrastructure Optimization, Application Analytics, Analytics Enabled Datacenter, Virtualization Optimization, DevOps, Cloud Enabled Datacenter, Cloud Optimized, and Analytics Empowered, built from capabilities including Monitoring & Capacity, Event Management, Network Management, Patch Mgmt, Backup & Recovery, HA / DR, Storage Virtualization, Bare Metal Provisioning, Dynamic Scheduling, Cost Management, Infrastructure as Code, Orchestration, Continuous Delivery, App Provisioning, App Perf Mgmt, App Diagnostics, Transaction Tracing, Performance Analytics, and Service Visualization]

REMEMBER THE OPS USE CASE

• Security
• Backups
• High Availability
• Upgradability
• Deployment Process
• Scaling and Elasticity
• Anticipated Performance Under Load
• Known Defects

NEW OPERATIONAL REQUIREMENTS

§ Keep the data moving
§ Query on streams
§ Handle stream imperfections
§ Integrate stored and streaming data
§ Guarantee data safety and availability
§ Partition and scale applications automatically
§ Process and respond instantaneously
§ Drive interoperability
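Two of these requirements are querying on streams and handling stream imperfections. Both can be illustrated with a tumbling-window count that tolerates out-of-order events up to an allowed lateness. Events that arrive after their window has closed are set aside. This is a minimal Python sketch of the watermark idea used by stream processors, not any product’s API. The window size, lateness, and event names are hypothetical.

```python
from collections import defaultdict


def tumbling_counts(events, window=60, allowed_lateness=30):
    """Count events per (window_start, key) over a possibly out-of-order stream.

    events: iterable of (timestamp_seconds, key) in arrival order.
    The watermark trails the newest timestamp seen by `allowed_lateness`
    seconds. An event whose window ended at or before the watermark is
    treated as too late and returned separately instead of being counted.
    """
    counts = defaultdict(int)
    too_late = []
    newest = float("-inf")
    for ts, key in events:
        newest = max(newest, ts)
        watermark = newest - allowed_lateness
        window_start = ts - ts % window
        if window_start + window <= watermark:
            too_late.append((ts, key))  # its window is already considered closed
            continue
        counts[(window_start, key)] += 1
    return dict(counts), too_late


stream = [(5, "login"), (62, "login"), (40, "login"), (130, "login"), (20, "login")]
counts, too_late = tumbling_counts(stream)
print(counts)    # {(0, 'login'): 2, (60, 'login'): 1, (120, 'login'): 1}
print(too_late)  # [(20, 'login')]
```

The event at t=40 arrives after t=62 but is still counted, because its window is within the allowed lateness. The event at t=20 shows up after t=130 and is diverted rather than silently altering a closed result.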

CLEANING UP THE LANDSCAPE

[Diagram: IT architecture patterns: Silo, Monolithic Framework, Niche, Launch Pad, and Information Bus]

Adapted from: Akella, Janaki. “IT Architecture: Cutting costs and complexity.” McKinsey Quarterly 13 Nov 2009 https://www.mckinseyquarterly.com/IT_architecture_Cutting_costs_and_complexity_2391

CREATING A DIRECTED WORKFLOW

[Diagram: a Directed (rather than Non Directed) workflow aligned with Observe, Orient, Decide. A Launchpad leads to the Executive Dashboard, Business Area Dashboards, Application PAC Dashboards, Command Center Dashboards, Technology Owner Dashboard, and Application Owner Dashboard, then to the Problem Isolation Workspace and Problem Diagnostics Workspace, and finally to the System Detail View and Component Detail View]

A TYPICAL ITIL CHANGE PROCESS

Objectives:
- What changes are coming?
- Why is the change required?
- Has the existing configuration been reviewed?
- What is the risk & impact: low, medium, or high?
- What is the plan B?
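The change objectives above work as a completeness check that a change record must pass before review. That is also the kind of rule an automated workflow can enforce. This Python sketch is hypothetical: the `ChangeRequest` fields do not come from any ITSM product. It only checks that each question has an answer, not whether the answer is any good.

```python
from dataclasses import dataclass
from typing import List, Optional

RISK_LEVELS = ("low", "medium", "high")


@dataclass
class ChangeRequest:
    what: Optional[str] = None       # What changes are coming?
    why: Optional[str] = None        # Why is the change required?
    config_reviewed: bool = False    # Has the existing configuration been reviewed?
    risk: Optional[str] = None       # Risk & impact: low, medium, or high
    plan_b: Optional[str] = None     # What is the plan B?

    def open_questions(self) -> List[str]:
        """List the objectives this change request has not yet answered."""
        missing = []
        if not self.what:
            missing.append("What changes are coming?")
        if not self.why:
            missing.append("Why is the change required?")
        if not self.config_reviewed:
            missing.append("Has the existing configuration been reviewed?")
        if self.risk not in RISK_LEVELS:
            missing.append("What is the risk & impact (low, medium, high)?")
        if not self.plan_b:
            missing.append("What is the plan B?")
        return missing


cr = ChangeRequest(what="Upgrade the MQ cluster", why="Vendor end of support", risk="medium")
print(cr.open_questions())
```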

AUTOMATING ITIL PROCESSES

• A palette of library assets enables easy workflow composition through drag and drop
• Access to rich libraries (toolkits) of reusable automation assets speeds automation creation
• A rich set of action types, flow control, and data handling primitives simplifies the creation of complex automations
• Easy workflow action editing for managing data mapping, error recovery options, implementation details, etc.
• A graphical editor for composing and connecting workflows
• Rich tooling functions to edit, version, debug, and optimize workflows

FINDING METRICS THAT MATTER

Evaluating the Effectiveness of a Metric

§ Will the metric be used in a report? If so, which one? How is it used in the report?
§ Will the metric be used in a dashboard? If so, which one? How will it be used?
§ What action(s) will be taken if an alert is generated? Who are the actors? Will a ticket be generated? If so, what severity?
§ How often is this event likely to occur? What is the impact if the event occurs? What is the likelihood it can be detected by monitoring?
§ Will the metric help identify the source of a problem? Is it a coincident / symptomatic indicator?
§ Is the metric always associated with a single problem? Could this metric become a false indicator?
§ What is the impact if this goes undetected?
§ What is the lifespan for this metric? What is the potential for changes that may reduce the efficacy of the metric?

PICKING BETTER MONITORS

Current Approach
1. Itemize the existing monitors
2. Brainstorm potential gaps to fill
3. Deploy new monitors

Proposed Approach
1. Identify the potential risks
2. Itemize the existing monitors
3. Determine which gaps exist
4. Fill the monitoring gaps
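The proposed approach starts from risks rather than from the monitors already in place, which makes the gap check a simple set comparison. In this Python sketch, the risk and monitor names are hypothetical, loosely modeled on the outage example. Also reporting deployed monitors that map to no identified risk is one possible extension, not part of the slide.

```python
def monitoring_gaps(risk_detectors, existing_monitors):
    """Compare identified risks against the monitors already deployed.

    risk_detectors: {risk: set of monitors that would detect it}
    existing_monitors: set of monitors already deployed
    Returns (risks with no deployed detector, deployed monitors tied to no risk).
    """
    uncovered = sorted(
        risk for risk, detectors in risk_detectors.items()
        if not detectors & existing_monitors
    )
    needed = set().union(*risk_detectors.values())
    unmapped = sorted(existing_monitors - needed)
    return uncovered, unmapped


risks = {
    "CICS region ABENDs": {"cics_abend_rate"},
    "MQ flow interruption": {"mq_flow_diagnostics", "mq_queue_depth"},
    "Customer login failure": {"synthetic_login"},
}
existing = {"mq_flow_diagnostics", "synthetic_login", "cpu_utilization"}

print(monitoring_gaps(risks, existing))  # (['CICS region ABENDs'], ['cpu_utilization'])
```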

WHAT GOOD MONITORING LOOKS LIKE

[Diagram: Corporate LANs & VPNs, load balancers, firewall, switch, Web Server Farm, Database, Data Power, Mainframe, and Middleware, with callouts 1 to 10 marking where each element of good monitoring applies]

Elements of Good Monitoring
1. System Availability
2. Operating System Performance
3. Hardware Monitoring
4. Service/Daemon and Process Availability
5. Error Logs
6. Application Resource KPIs
7. End-to-End Transactions
8. Point of Failure Transactions
9. Fail-Over Success
10. “Activity Monitors” and “Reverse Hockey Stick”
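Element 10 refers to activity monitors and the “reverse hockey stick.” Assume this means a metric, such as transaction volume, that suddenly drops toward zero while nothing else raises an alarm. A minimal Python sketch then compares current activity with a recent baseline. The drop ratio and minimum history length are hypothetical tuning values.

```python
from statistics import mean


def activity_dropped(history, current, drop_ratio=0.2, min_history=3):
    """Alert when current activity falls below drop_ratio of the recent baseline.

    history: recent per-interval counts (e.g. transactions per minute).
    A silent application raises no errors, so the absence of activity is the signal.
    """
    if len(history) < min_history:
        return False  # not enough data to form a baseline yet
    baseline = mean(history)
    return baseline > 0 and current < baseline * drop_ratio


recent = [120, 115, 130]
print(activity_dropped(recent, current=100))  # False: normal variation
print(activity_dropped(recent, current=4))    # True: volume has fallen off a cliff
```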

http://info.streamdatacenters.com/Portals/165393/Gallery/Album/6624/Richardson%20Aerial-01.png

This is no longer the way we should think about monitoring:

Monitoring Happens Here

Cloud Monitoring Happens Here

WHAT DO YOU WANT TO ACCOMPLISH?

Your monitoring should help you answer:

• How will we know if the users are getting the experience they are expecting?
• How much capacity do we need during normal and peak times to ensure user expectations are met?
• How quickly can the provider we select ramp up to meet our needs if we find that the service is underperforming?
• How fast do we need to be able to access additional capacity once it is ready for us?

Here comes the elevator pitch…

THE IBM SOLUTION

IBM SmartCloud Suite offers essential management capabilities for applications in complex cloud and hybrid environments.

SINGLE POINT OF MANAGEMENT
• At-a-glance status determination via network topology graphs
• Proactively identify and respond to compliance issues
• Monitor the performance of the environment and the tenants living inside of it
• Understand the current capacity needs and forecast future needs
• Understand the costs associated with providing the service and enable “showback” and “chargeback” reporting to the application owners

MAXIMIZE SERVICE AVAILABILITY
• Minimize service and system outages
• Identify recurring incidents and implement action to remediate problems before they cause impacts
• Assist troubleshooting by suppressing “noise” events and providing root cause determination

IMPROVED OPERATIONAL EFFICIENCY
• Reduce the need for manual action or intervention
• Automate for repeatability and elimination of human error
• Develop standardized practices for complex business processes
• Enable the development of APIs to allow for self-service management by the consumers
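Showback reports to application owners what their share of a shared service costs. Chargeback actually bills them for it. In both cases the core arithmetic is an allocation rule. This Python sketch assumes the simplest rule, a split proportional to one usage metric. The tenant names, cost, and CPU-hour figures are hypothetical, and real cost models usually blend several metrics.

```python
def showback(total_cost, usage_by_tenant):
    """Split a shared cost across tenants in proportion to their measured usage."""
    total_usage = sum(usage_by_tenant.values())
    if total_usage == 0:
        return {tenant: 0.0 for tenant in usage_by_tenant}
    return {
        tenant: round(total_cost * usage / total_usage, 2)
        for tenant, usage in usage_by_tenant.items()
    }


# Hypothetical month: $1,000 of shared platform cost and CPU-hours per application owner.
report = showback(1000, {"payments": 300, "claims": 100, "portal": 100})
print(report)  # {'payments': 600.0, 'claims': 200.0, 'portal': 200.0}
```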

APPLICATION PERFORMANCE MANAGEMENT

VISIBILITY, CONTROL AND AUTOMATION TO INTELLIGENTLY MANAGE CRITICAL APPLICATIONS IN CLOUD AND HYBRID ENVIRONMENTS.

• Understand the end-user experience
• Follow changing workloads
• See steps across the cloud

Environments: mobile devices & smart endpoints; private, public & hybrid clouds; highly virtualized applications, storage & networks

Capabilities, built on shared data & common services:
• Discovery: Visibility into application resources
• End User Experience: Transaction performance monitoring to ensure SLA compliance
• Transaction Tracking: Rapid problem isolation through transaction path analysis
• Diagnostics: Domain-specific operations tools for diagnosis and repair
• Predictive Analytics: Proactive approach to reduce outages & improve performance

COMPOSITE APPLICATIONS

[Diagram: a composite web application assembled from Site Content Search, Session Information, User Login & Identity Mgmt, Content Mgmt System, Social Network Widgets, Site Tracking & Analytics, Banner Ads & Revenue Generators, and Multimedia & CDN Content]

GAINING PERSPECTIVE REQUIRES BALANCE

[Diagram: monitoring vantage points: Client Monitoring, Packet Capture, Server Probe, and Synthetic Transactions]

Four Perspectives of User Experience
1. Client to the Server
2. Server to the Client
3. “3rd Party” Vantage Point
4. Synthetic Transactions

Page 74: Bright talk   running a cloud - final

Follow Us: #ITSMSummit!

Predic.ve  Outage  Avoidance  

Ensure  availability  of  applicaBons  and  services  

   

•  Use learning tools to augment custom best practices •  Leverage statistical methods to maximize predictive warning •  Improve problem detection across IT silos

Predict

Faster  Problem  Resolu.on  

Find  &  correct  problems  faster  with  tools  that  determine  acBons  

required  to  resolve  issues  

   

•  Identify problems quicker with insight to large unstructured repositories

•  Isolate problems quicker by bringing relevant unstructured data into problem investigations

•  Repair problems quicker with the right details quickly to hand.

Resolve

Optimized Performance

Track, optimize, and predict capacity and performance needs over time

•  Track capacity and performance of applications and services in classic and cloud environments
•  Optimize resource deployment with what-if and best-fit planning tools
•  Escalate capacity and performance problems before they cause critical failures

Perform
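To escalate a capacity problem before it causes a critical failure, a simple approach is to fit a linear trend to recent utilization and estimate when it will cross a limit. The sketch below uses an ordinary least-squares fit written out by hand; real capacity planning tools use richer models. The daily sampling, the 90% limit, and the helper names `fit_line` and `days_until_limit` are assumptions for illustration.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def days_until_limit(
    daily_utilization: Sequence[float],
    limit: float = 90.0,  # assumed escalation threshold, in percent
) -> Optional[float]:
    """Estimate days from the last sample until the fitted trend reaches `limit`.

    Returns 0.0 if the latest sample is already at or above the limit,
    and None if the trend is flat or falling (no crossing predicted).
    """
    if len(daily_utilization) < 2:
        raise ValueError("need at least two samples to fit a trend")
    if daily_utilization[-1] >= limit:
        return 0.0
    days = list(range(len(daily_utilization)))
    slope, intercept = fit_line(days, daily_utilization)
    if slope <= 0:
        return None
    crossing_day = (limit - intercept) / slope
    return max(0.0, crossing_day - days[-1])


print(days_until_limit([60, 62, 64, 66, 68]))  # 11.0
```

An escalation rule could then open a ticket whenever the estimate drops below the lead time needed to add capacity.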

Improved Insight

Enhance visibility into system resource relationships while increasing customer satisfaction

•  Determine which resources are interdependent to assess the impact of failures
•  Gain insight into what is important to your customers
•  Decrease customer churn and acquisition costs while increasing customer retention and satisfaction

Know
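Knowing which resources are interdependent lets you answer "if this fails, what else is affected?" by walking a dependency graph. A minimal sketch, assuming the relationships are already available (for example, exported from a CMDB) as a mapping from each component to the components that depend on it; `impacted_by` and the component names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def impacted_by(failed: str, dependents: Dict[str, List[str]]) -> Set[str]:
    """Return every component that directly or transitively depends on `failed`.

    `dependents` maps a component to the components that depend on it.
    A breadth-first walk with a visited set handles shared and cyclic edges.
    """
    seen: Set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen and child != failed:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical topology: a shared database behind a content management
# system and a search service, both of which feed the customer web portal.
DEPENDENTS = {
    "db01": ["cms", "search"],
    "cms": ["web-portal"],
    "search": ["web-portal"],
    "cdn": ["web-portal"],
}

print(sorted(impacted_by("db01", DEPENDENTS)))  # ['cms', 'search', 'web-portal']
```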

Automated analytics helps lower IT administration costs:

•  Performance and capacity planning tools monitor appropriately and escalate, reducing time-consuming report browsing
•  Learning tools reduce the customization and best-practices investment needed at initial deployment
•  Log analysis speeds problem resolution, so teams can do more with less

BUSINESS VALUE OF ADOPTING ANALYTICS

Page 75: Bright talk   running a cloud - final


That is great, but we need more…

Page 76: Bright talk   running a cloud - final


In addition to handling monitoring and performance alerts, event management helps drive improved availability.

Our Formula:

1.  Continually collect, categorize, and analyze all events from as many sources as possible
2.  Correlate events and analyze them using previous outages as patterns to identify situations worth investigating
3.  Notify a support team so the situation can be mitigated before it becomes an outage
4.  Automate responses that have well-established situational fingerprints and proven resolution steps

THE EVENT MANAGEMENT FOCUS
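The four steps of the formula can be sketched as a small pipeline: group categorized events, match them against fingerprints taken from previous outages, and then either run a proven automated response or notify a support team. Everything here is a hypothetical illustration; the `Event` and `Fingerprint` types, the fingerprint table, and the `restart_app_pool` action are assumptions, not any product's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Event:
    source: str  # e.g. "network", "database" (hypothetical categories)
    host: str
    kind: str    # normalized event type after categorization


@dataclass
class Fingerprint:
    name: str
    kinds: FrozenSet[str]                          # event types seen together in a past outage
    action: Optional[Callable[[str], str]] = None  # proven fix, if one exists


def restart_app_pool(host: str) -> str:  # hypothetical remediation
    return f"restarted app pool on {host}"


FINGERPRINTS = [
    Fingerprint("app-pool-exhaustion", frozenset({"thread_pool_full", "http_5xx"}), restart_app_pool),
    Fingerprint("storage-latency", frozenset({"disk_latency_high", "db_slow_query"})),
]


def process(events: List[Event]) -> List[str]:
    """Correlate events per host against known fingerprints; automate or notify."""
    by_host: Dict[str, Set[str]] = {}
    for e in events:                          # step 1: collect and categorize
        by_host.setdefault(e.host, set()).add(e.kind)
    outcomes: List[str] = []
    for host, kinds in sorted(by_host.items()):
        for fp in FINGERPRINTS:               # step 2: correlate with past outages
            if fp.kinds <= kinds:
                if fp.action is not None:     # step 4: automate proven responses
                    outcomes.append(fp.action(host))
                else:                         # step 3: notify a support team
                    outcomes.append(f"notify support: {fp.name} on {host}")
    return outcomes


events = [
    Event("middleware", "app01", "thread_pool_full"),
    Event("middleware", "app01", "http_5xx"),
    Event("storage", "db02", "disk_latency_high"),
    Event("database", "db02", "db_slow_query"),
]
for line in process(events):
    print(line)
# restarted app pool on app01
# notify support: storage-latency on db02
```

Only fingerprints with a proven fix carry an `action`; everything else falls through to a human, which mirrors the formula's rule of automating only well-established situations.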

Page 77: Bright talk   running a cloud - final


ONE INTEGRATED ENVIRONMENT

Distributed, Database, Mainframe, Network, Middleware, Storage

Event Pool

Operational Data Warehouse

Predictive

Enrichment & Correlation

Service Desk, Paging

CMDB

Knowledge

Asset Mgmt

Event Catalog

Event API

Business Telemetry

3rd Party Providers

Presentation Framework
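In an integrated environment, raw events from the event pool become far more actionable once they are enriched with context such as the owning service and support group from the CMDB. A minimal sketch, assuming the CMDB lookup has been reduced to an in-memory dictionary keyed by host name; the `enrich` helper, field names, and records are hypothetical.

```python
from typing import Any, Dict

# Hypothetical CMDB extract keyed by host name.
CMDB: Dict[str, Dict[str, str]] = {
    "db02": {"service": "Online Banking", "support_group": "DBA-Oncall", "tier": "1"},
    "app01": {"service": "Online Banking", "support_group": "Middleware", "tier": "1"},
}


def enrich(event: Dict[str, Any], cmdb: Dict[str, Dict[str, str]]) -> Dict[str, Any]:
    """Return a copy of `event` with CMDB context merged in.

    Unknown hosts are flagged rather than dropped, so gaps in the CMDB
    surface as configuration discrepancies instead of silent misses.
    """
    enriched = dict(event)  # never mutate the original event
    record = cmdb.get(event.get("host", ""))
    if record is None:
        enriched["cmdb_status"] = "unregistered"
    else:
        enriched.update(record)
        enriched["cmdb_status"] = "matched"
    return enriched


raw = {"host": "db02", "kind": "disk_latency_high", "severity": "warning"}
print(enrich(raw, CMDB)["support_group"])  # DBA-Oncall
```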

Page 78: Bright talk   running a cloud - final


Presentation Framework

Asset Management & Topology Database

Aggregation and Analysis

Security Management

Availability Management

Configuration Management

Change Management

Performance Management

Enterprise Data Sources

Business Telemetry Information

Configuration Discrepancies

Enrichment Data, Business Activity Data

Historical Data

“Enriched” Events

Change Activity

Topology Snapshots

Trend-Related Faults

Discovered Problems

Status Indications

Incidents

Audit Information and Suspicious Activity

Automated Discovery

Page 79: Bright talk   running a cloud - final


CONCEPTUALIZING SITUATIONAL AWARENESS

Situational Awareness Engine

Adapted from http://www.slideshare.net/TimBassCEP/getting-started-in-cep-how-to-build-an-event-processing-application-presentation-717795

Real-Time Event Streams

Detected and Predicted Situations

Patterns from Historical Data

Causal Relationship from Past RCAs

Page 80: Bright talk   running a cloud - final


CONCEPTUAL MODEL OF COMPLEX EVENT PROCESSING

Adapted from http://www.slideshare.net/aparnachaudhary/esper-cep-engine

Event Pipeline

Event Queries

Time Window

Data Events

Control Event

Other Events

Event Filter

Scenarios: A, B, C

Feedback Loop

Event Intelligence

Action Events
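The core idea in this model, a filter and a query evaluated over a time window of incoming data events that emit action events when a pattern matches, can be sketched in a few lines. This is a minimal illustration and not Esper's API. It assumes events carry a timestamp in seconds and arrive in time order; the `sliding_window_rule` helper and its rule (three or more `login_failed` events from one host within 60 seconds) are hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, List


@dataclass(frozen=True)
class DataEvent:
    ts: float  # seconds (assumed), events arrive in timestamp order
    host: str
    kind: str


@dataclass(frozen=True)
class ActionEvent:
    ts: float
    host: str
    reason: str


def sliding_window_rule(
    stream: Iterable[DataEvent],
    kind: str = "login_failed",
    window_s: float = 60.0,
    min_count: int = 3,
) -> List[ActionEvent]:
    """Emit an ActionEvent when `min_count` matching events from one host
    fall inside a sliding window of `window_s` seconds."""
    windows: Dict[str, Deque[float]] = defaultdict(deque)
    actions: List[ActionEvent] = []
    for ev in stream:
        if ev.kind != kind:                        # event filter
            continue
        win = windows[ev.host]
        win.append(ev.ts)
        while win and ev.ts - win[0] > window_s:   # expire events outside the time window
            win.popleft()
        if len(win) >= min_count:                  # event query matched
            actions.append(ActionEvent(ev.ts, ev.host, f"{len(win)} x {kind} in {window_s:.0f}s"))
            win.clear()                            # one burst yields one action event
    return actions


stream = [
    DataEvent(0.0, "web01", "login_failed"),
    DataEvent(10.0, "web01", "login_failed"),
    DataEvent(15.0, "web02", "login_ok"),
    DataEvent(40.0, "web01", "login_failed"),
    DataEvent(200.0, "web01", "login_failed"),
]
for action in sliding_window_rule(stream):
    print(action)
# ActionEvent(ts=40.0, host='web01', reason='3 x login_failed in 60s')
```

The feedback loop in the diagram would correspond to tuning `window_s` and `min_count` (or adding new rules) as real incidents confirm or refute what the rule flagged.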

Page 81: Bright talk   running a cloud - final


ITERATIVE DEVELOPMENT

As you recognize opportunities to capture knowledge, use it to improve your Event Management System.

Page 82: Bright talk   running a cloud - final


IT culture naturally turns to technology for solutions. Leverage your monitoring and testing tools to practice failure scenarios. Track potential points of failure by creating monitoring for them, and report their rate of occurrence to the developers at the start of each new iteration.

PLAYING TO OUR STRENGTHS
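Reporting the rate of occurrence for each potential point of failure can be as simple as counting monitored failure events per point and normalizing by the length of the iteration. A minimal sketch, assuming failures have already been tagged with a point-of-failure label and that an iteration is a fixed number of days; the `failure_rates` helper, the labels, and the 14-day length are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def failure_rates(
    failure_points: Iterable[str],
    iteration_days: int = 14,  # hypothetical iteration length
) -> List[Tuple[str, int, float]]:
    """Return (point, count, failures_per_day), most frequent first."""
    if iteration_days <= 0:
        raise ValueError("iteration_days must be positive")
    counts = Counter(failure_points)
    return [(point, n, n / iteration_days) for point, n in counts.most_common()]


observed = ["db-connection-pool"] * 7 + ["payment-gateway-timeout"] * 3 + ["cache-miss-storm"]
for point, n, rate in failure_rates(observed):
    print(f"{point:<25} {n:>3}  {rate:.2f}/day")
```

Sorting by frequency puts the most common failure points at the top of the report the developers see at the start of the next iteration.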

Page 83: Bright talk   running a cloud - final


LET’S KEEP THE CONVERSATION GOING…

[email protected]

ReverendDrew

SystemsManagementZen.Wordpress.com

systemsmanagementzen.wordpress.com/feed/

@SystemsMgmtZen

ReverendDrew

[email protected]

614-306-3434

Page 84: Bright talk   running a cloud - final