1
Digital Service Efficiency: A New Management Scorecard (DCM 10.2)
Shekhar Dasgupta, Founder, GreenField Software
2
Digital Service Efficiency: A New Management Scorecard
This presentation defines and outlines management scorecards, including Kaplan & Norton’s Balanced Scorecard. It then discusses why a supplementary scorecard should be used to measure IT efficiencies with respect to specific data center operational roles. Finally, it shows how next-gen DCIM software should build a role-based DSE framework to achieve organizational objectives and goals.
3
Scorecard Examples
Management Scorecards
What are they?
• Organizational Performance Management frameworks
• Mix of financial & non-financial measures against benchmarks
• Started in the late 1970s by Dr. Aubrey Daniels
• Goal: alignment of top management towards common organizational objectives & positive outcomes through key measurement parameters
What do they measure?
• People Performance
• Process Efficiency
• Systems Efficiency
4
Balanced Scorecard
• Developed by Kaplan & Norton in the 1990s
• Linked company strategy to Financial & Non-Financial KPIs
• Multiple variants, including industry-specific templates
• Technology recognized as enabler for business process efficiencies and driver for innovation and growth
• Observations:
  - Tool for objectively incentivizing executives on non-financial KPIs
  - Practitioners have neither developed IT infrastructure-related KPIs nor linked them directly to the BSC framework
5
Why a New Scorecard for Data Centers?
• For the Data Center to function effectively and compete better:
• Who is responsible for making that happen?
• How does one measure whether the new processes are being managed effectively?
• What are the costs, and how do they measure against benchmarks?
• How does one measure whether the innovations and new systems are delivering the desired outcomes?
6
Digital Service Efficiency: A New Scorecard for Data Centers
“Digital Service Efficiency (DSE) methodology is eBay’s miles-per-gallon (MPG) equivalent for viewing the productivity and efficiency of technical infrastructure across four key areas: performance, cost, environmental impact and revenue.”
“The DSE methodology equips decision-makers to see the results of their technical infrastructure choices to date (i.e., what MPG they achieved with their design and operations), and serves as the flexible tool they need when faced with making new decisions (i.e., what knobs to turn to achieve maximum performance across all dimensions). Ultimately, DSE enables balance within the technology ecosystem by exposing how turning knobs in one dimension affects the others.”
Original Designer: Dean Nelson from eBay, Inc.
7
DSE Dashboard
eBay’s real-time dashboard is available at http://tech.ebay.com/dashboard
8
Next Gen DCIM
Delivering Role-Based DSE Scorecard
9
DCIM & Role-Based Digital Services Efficiency
Today’s DCIM
• Current DCIM measures in real time:
  - IT asset utilization
  - DC power usage (PUE, CUE)
  - Cooling requirements
  - Floor & rack space occupancy
• Data analyzed for improving energy efficiencies & capacity planning
• Helps to predict & prevent failures
Next-Gen DCIM
• DCIM DSE will provide role-based scorecards
• DCIM DSE will provide granular cost measurement across the complete IT infrastructure
• DCIM DSE scorecard will measure infrastructure capability for process improvements & technology innovations
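One way to picture a role-based scorecard is as a small data structure that groups KPIs under the KRAs of a role. The sketch below is illustrative only: the class names, fields and sample values are assumptions made for this example and do not describe any particular DCIM product.

```python
from dataclasses import dataclass, field


@dataclass
class KPI:
    name: str
    actual: float
    target: float
    higher_is_better: bool = True  # e.g. False for PUE, where lower is better

    def met(self) -> bool:
        if self.higher_is_better:
            return self.actual >= self.target
        return self.actual <= self.target


@dataclass
class Scorecard:
    role: str
    kpis_by_kra: dict = field(default_factory=dict)  # KRA name -> list of KPI

    def add(self, kra: str, kpi: KPI) -> None:
        self.kpis_by_kra.setdefault(kra, []).append(kpi)

    def share_met(self) -> float:
        """Fraction of KPIs on this scorecard that meet their target."""
        kpis = [k for group in self.kpis_by_kra.values() for k in group]
        return sum(k.met() for k in kpis) / len(kpis) if kpis else 0.0


# Hypothetical sample values for a Facility Staff scorecard.
facility = Scorecard(role="Facility Staff")
facility.add("Maintaining energy efficiency",
             KPI("PUE", actual=1.6, target=1.7, higher_is_better=False))
facility.add("Uptime reporting",
             KPI("Facility uptime %", actual=99.90, target=99.95))
print(facility.share_met())  # 0.5: PUE target met, uptime target missed
```

Keeping the KRA as the grouping key mirrors the structure of the slides that follow, where each role has a list of KRAs and each KRA has its own KPIs.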
10
Data Center Operations & Roles
[Figure: organization chart of Data Center Manager, Facility Staff and IT Staff]
Data Center Manager: Responsible for overall data center operations
Data Center Facility Staff: Responsible for data center facilities operations
Data Center IT Staff: Responsible for data center IT operations
11
DCIM DSE Scorecard
For Facility Staff
12
Data Center Facility Staff
Facility KRA:
• Infrastructure monitoring & health check
• Scheduled & Preventive Maintenance
• Incident and Problem Management
• Maintaining Energy efficiency
• Uptime Reporting
[Figure label: Facility KPIs]
13
Infrastructure Monitoring KRA & KPIs
DCIM provides better DC facility monitoring through:
• Real-time monitoring of power systems: electrical panels (HT & LT panels), UPS, PDUs (row & rack)
• Real-time monitoring of cooling systems: chillers, PACs, AHUs
• Real-time monitoring of environmental statistics of the DC: temperature, humidity, water leak, smoke, fire
• Ability to monitor the above subsystems through a single dashboard and get alerts on abnormal conditions over email/SMS
KRA: Data Center infrastructure monitoring & health check
DCIM
Cooling KPIs:
• Fan Runtime
• Supply Air Temperature
• Supply Air Humidity
• Rack Cooling Index
• Return Temperature Index
• Return Air Humidity
• Power Consumption (kW)
UPS KPIs:
• Utility Line & Output Voltage
• Power Loss
• UPS Load
• Remaining Battery Capacity
• Internal UPS Temperature
• UPS battery run time remaining before battery exhaustion
• Elapsed time since the UPS switched to battery power
Environment KPIs:
• Cabinet Internal Temperature
• Cabinet Internal Humidity
• Room Ambient Temperature/Humidity
• Smoke
• Water Leak Detection
• Cabinet Door Ajar
• Motion
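The alerting on abnormal conditions described for this KRA can be sketched as a comparison of sensor readings against per-sensor limits. The sensor names and limit values below are hypothetical and chosen only for illustration; a real deployment would take them from site policy and equipment specifications.

```python
def find_breaches(readings, limits):
    """Return (sensor, value, low, high) for each reading outside its limits."""
    breaches = []
    for sensor, value in readings.items():
        if sensor not in limits:
            continue  # no limit configured for this sensor
        low, high = limits[sensor]
        if value < low or value > high:
            breaches.append((sensor, value, low, high))
    return breaches


# Hypothetical limits: (lowest acceptable, highest acceptable).
limits = {
    "supply_air_temp_c": (18.0, 27.0),
    "cabinet_humidity_pct": (20.0, 80.0),
    "ups_load_pct": (0.0, 80.0),
}
# Hypothetical readings from one polling cycle.
readings = {
    "supply_air_temp_c": 24.5,
    "cabinet_humidity_pct": 45.0,
    "ups_load_pct": 91.0,
}

for sensor, value, low, high in find_breaches(readings, limits):
    print(f"ALERT {sensor}={value} outside [{low}, {high}]")
```

In this sample only the UPS load reading falls outside its limits, so a single alert line is printed; delivery over email or SMS is left out of the sketch.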
14
Maintenance KRA & KPIs
• Real-time monitoring & alerts help staff during routine checks and help prevent failures of facility equipment
• Helps schedule routine maintenance for facility devices
• Breakdown maintenance analysis prevents similar failures or enables faster recovery
KRA: Preventive & Breakdown Maintenance
DCIM
Scheduled Maintenance:
• Age of Device
• Criticality of Device
• Date of Last Check-Up
• Check-Up Frequency
Breakdown Maintenance:
• Failure Rate
• Mean Time Between Failures
• Mean Time To Repair
• Total Maintenance Cost / Asset Replacement Value
• Condition Based Maintenance %
• Uptime / Required Time
• Spare Part Used Versus Availability
• Immediate Corrective Maintenance Time / Total DT Related to Maintenance
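Several of the breakdown maintenance KPIs follow from a simple failure log. The sketch below uses the common textbook definitions (MTBF as operating time per failure, MTTR as repair time per failure); the function name and the sample figures are assumptions made for illustration.

```python
def maintenance_kpis(period_hours, repair_hours):
    """Compute basic breakdown KPIs for one device over an observation period.

    period_hours: length of the observation period in hours.
    repair_hours: one entry per failure, giving the hours spent repairing it.
    """
    failures = len(repair_hours)
    downtime = sum(repair_hours)
    uptime = period_hours - downtime
    kpis = {"failures": failures, "availability": uptime / period_hours}
    if failures:
        kpis["failure_rate_per_hour"] = failures / uptime
        kpis["mtbf_hours"] = uptime / failures    # Mean Time Between Failures
        kpis["mttr_hours"] = downtime / failures  # Mean Time To Repair
    return kpis


# Hypothetical device: 30-day period (720 h) with three repairs.
kpis = maintenance_kpis(720, [2.0, 4.0, 6.0])
print(kpis["mtbf_hours"], kpis["mttr_hours"])  # 236.0 4.0
```

With no failures in the period the function reports availability only, since MTBF and MTTR are undefined for an empty failure log.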
15
Incident Management KRA & KPIs
DCIM enforces an ITSM best-practice framework on data center facility operations and ensures that all incidents and service requests are tracked until closure
KRA: Incident and Problem Management
DCIM
Incident Measures:
• Number of incidents
• Breakdown of incidents at each stage (logged, WIP and closed)
• Number and % of major incidents
• Number of incidents reopened as % of total
• Breakdown of incidents by time of day
Resolution Measures:
• Mean response time versus target response time
• Mean elapsed time for incident resolution (turnaround time)
• % of incidents resolved within target resolution time
• Number and % of incidents incorrectly assigned
• Number and % of incidents incorrectly categorized
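A few of these incident and resolution measures can be derived directly from closed incident records. The following is a minimal sketch; the record fields and sample incidents are hypothetical and stand in for whatever the ticketing data actually contains.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    resolution_hours: float  # elapsed time from logging to closure
    target_hours: float      # target resolution time for this incident
    major: bool = False
    reopened: bool = False


def incident_kpis(incidents):
    """Summarize a list of closed incidents into scorecard measures."""
    n = len(incidents)
    if n == 0:
        return {"count": 0}

    def pct(count):
        return 100.0 * count / n

    return {
        "count": n,
        "pct_major": pct(sum(i.major for i in incidents)),
        "pct_reopened": pct(sum(i.reopened for i in incidents)),
        "pct_within_target": pct(
            sum(i.resolution_hours <= i.target_hours for i in incidents)),
        "mean_resolution_hours": sum(i.resolution_hours for i in incidents) / n,
    }


# Hypothetical sample of four closed incidents.
sample = [
    Incident(2.0, 4.0),
    Incident(6.0, 4.0, major=True),
    Incident(3.0, 4.0, reopened=True),
    Incident(1.0, 4.0),
]
summary = incident_kpis(sample)
print(summary["pct_within_target"], summary["mean_resolution_hours"])  # 75.0 3.0
```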
16
DCIM: Helping facility staff with their PUE KPI
Ensure a stable PUE for the data center
DCIM monitors data center PUE in real time and also analyzes historical PUE data to recommend ways to improve PUE
KPI: Maintain efficiency level (PUE)
DCIM
Other power management measures: watts per sq ft, Rack Cooling Index (RCI)
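PUE is the ratio of total facility power to IT equipment power, so a value closer to 1.0 means less overhead for cooling and power distribution. The sketch below shows the arithmetic; the function names and sample figures are illustrative, and basing watts per square foot on IT load (rather than total load) is an assumption of this example.

```python
def pue(total_facility_kw, it_load_kw):
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw


def watts_per_sq_ft(it_load_kw, floor_area_sq_ft):
    """Power density, here assumed to be IT load over raised-floor area."""
    if floor_area_sq_ft <= 0:
        raise ValueError("floor area must be positive")
    return it_load_kw * 1000.0 / floor_area_sq_ft


# Hypothetical site: 800 kW total draw, 500 kW IT load, 5,000 sq ft floor.
print(pue(800.0, 500.0))               # 1.6
print(watts_per_sq_ft(500.0, 5000.0))  # 100.0
```

Tracking the same calculation over time gives the historical PUE series on which the analytics mentioned above would operate.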
17
DCIM: Helping facility staff in their Uptime KPI
Periodic reporting of Facility uptime, RTO & RPO statistics of Facility Services & subsystems.
DCIM provides Facility Uptime and recovery metrics. Includes reporting on health & functional statistics of facility subsystems like power, cooling and environmental components. DCIM provides dashboards, analytics and scheduled reports on facility uptime, DC energy efficiency (PUE) and incident management
KPI: Facility Uptime as per SLA
DCIM
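Checking facility uptime against an SLA is a matter of converting recorded downtime into a percentage and comparing it with the agreed level. This is a sketch under assumed figures; the 99.9% SLA and the 30-day period are examples, not values from the presentation.

```python
def uptime_pct(period_minutes, downtime_minutes):
    """Percentage of the period during which the facility was up."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def allowed_downtime_minutes(period_minutes, sla_pct):
    """Downtime budget implied by an SLA percentage over the period."""
    return period_minutes * (1.0 - sla_pct / 100.0)


# Hypothetical month: 30 days, 30 minutes of recorded downtime, 99.9% SLA.
period = 30 * 24 * 60
actual = uptime_pct(period, 30)
budget = allowed_downtime_minutes(period, 99.9)
print(f"uptime {actual:.3f}%, budget {budget:.1f} min, SLA met: {actual >= 99.9}")
```

Expressing the SLA as a downtime budget (about 43 minutes per 30 days at 99.9%) is often easier for operations staff to act on than the percentage itself.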
18
DCIM DSE Scorecard
For IT Staff
19
KRA: Data Center IT Staff
IT KRA:
• IT Monitoring
• IT Hardware Maintenance
• IT Asset Management
• IT Vendor/Contract Management
• Business Continuity
• Reporting
[Figure label: IT KPIs]
20
Monitoring & Provisioning KRAs & KPIs
Real-time monitoring of resource utilization of IT devices: server CPU, memory, storage, network bandwidth.
KRA: IT Monitoring & Provisioning
DCIM
• Proactive monitoring enables alerts when thresholds are breached
• Auto-provisioning of racks & devices
• Virtualization Planner: identifies servers that can be virtualized; also identifies under-utilized IT devices and recommends retirement or replacement
Monitoring:
• CPU Utilization
• Memory Utilization
• Power Consumption
• Storage Utilization versus Free Storage
• Server Uptime versus Target
• Failures prevented due to proactive monitoring
• Failures due to human errors
Provisioning:
• Time to harden a new server
• Time to provision a new device
• Time to provision new rack space
• Time to virtualize a new system
• Time to replace a legacy system
• Time to decommission a legacy system
• Time to install patches & updates
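Identifying under-utilized devices from monitored CPU and memory utilization can be sketched as a simple filter. The thresholds, server names and utilization figures below are hypothetical; the slides do not state which criteria a virtualization planner applies.

```python
def underutilized(servers, cpu_max=10.0, mem_max=20.0):
    """Return names of servers whose average CPU and memory use are both low.

    servers: mapping of server name -> (avg CPU %, avg memory %).
    cpu_max, mem_max: assumed cut-offs below which a server counts as idle.
    """
    return [
        name
        for name, (cpu, mem) in servers.items()
        if cpu < cpu_max and mem < mem_max
    ]


# Hypothetical averages collected over a monitoring window.
servers = {
    "app-01": (55.0, 70.0),
    "batch-02": (4.0, 12.0),
    "web-03": (8.0, 35.0),
}
print(underutilized(servers))  # ['batch-02']
```

Requiring both metrics to be low keeps a server such as web-03, which has little CPU load but meaningful memory use, off the candidate list.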
21
IT Hardware Maintenance KRA & KPIs
DCIM helps schedule preventive maintenance (PM) based on the following:
• Age of a device as recorded in DCIM
• Utilization/load of the device as monitored by DCIM
DCIM helps IT staff understand the cascading effect of the temporary unavailability (due to PM) of a particular device, so that prior notification can be sent
KRA: IT Hardware Maintenance
DCIM
Scheduled Maintenance:
• Age of Device
• Criticality of Device based on utilization and application hosted
• Date of Last Check-Up
• Date of last upgrade/nature of upgrade
Breakdown Maintenance:
• Failure Rate
• Mean Time Between Failures
• Mean Time To Repair
• Total Maintenance Cost / Asset Replacement Value
• Condition Based Maintenance %
• Uptime / Required Time
• Spare Part Used Versus Availability
• Immediate Corrective Maintenance Time / Total DT Related to Maintenance
22
IT Asset Management KRA & KPIs
• DCIM serves as enterprise asset management software for both IT & Facilities
• DCIM auto-discovers intelligent assets and creates an asset database
• DCIM helps manage IT asset relationships
• DCIM also maintains information about redundant assets in HA and DR setups
KRA: IT Asset Management
DCIM
Asset Management:
• Time taken to add or delete intelligent & non-intelligent assets
• Time taken to update due to MAC
• Time taken to add interdependencies between assets
• % accuracy of asset database
• % Over & Under Provisioned
23
Vendor/Contract Management KRA & KPIs
• DCIM tracks support renewal dates
• Tracks hardware vendors/suppliers and service providers
KRA: Vendor/Contract Management
DCIM
Vendor Management:
• % of systems out of support renewal
• % Uptime by device category and vendor
• % Contractor’s Compliance by SLA terms
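The share of systems whose support has lapsed follows from the tracked renewal dates. The sketch below assumes each asset carries a single support end date; the asset names and dates are hypothetical.

```python
from datetime import date


def pct_out_of_support(support_end_dates, today):
    """Percentage of assets whose support end date is before `today`."""
    if not support_end_dates:
        return 0.0
    expired = sum(1 for end in support_end_dates.values() if end < today)
    return 100.0 * expired / len(support_end_dates)


# Hypothetical asset register: asset name -> support end date.
support_end_dates = {
    "srv-a": date(2024, 3, 31),
    "srv-b": date(2026, 1, 15),
    "san-1": date(2025, 12, 31),
    "sw-core": date(2027, 6, 30),
}
print(pct_out_of_support(support_end_dates, today=date(2025, 1, 1)))  # 25.0
```

Passing `today` explicitly keeps the calculation reproducible for scheduled reports and for testing.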
24
Business Continuity KRA & KPIs
DCIM helps in better impact analysis of outages and faster root cause analysis (RCA) of any incident, and thereby in faster turnaround time
KRA: Business Continuity
DCIM
Business Continuity:
• Recovery Time Objective (RTO)
• Recovery Point Objective (RPO)
• Actual versus RTO & RPO
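Comparing actuals against RTO and RPO amounts to checking the measured recovery time and the measured data-loss window from an outage or drill against the two objectives. The function and sample values below are illustrative assumptions.

```python
def continuity_check(rto_min, rpo_min, actual_recovery_min, actual_data_loss_min):
    """Compare measured recovery figures with the RTO and RPO objectives.

    All values are in minutes. Positive margins mean the objective was met
    with time to spare; negative margins mean it was missed.
    """
    return {
        "rto_met": actual_recovery_min <= rto_min,
        "rpo_met": actual_data_loss_min <= rpo_min,
        "rto_margin_min": rto_min - actual_recovery_min,
        "rpo_margin_min": rpo_min - actual_data_loss_min,
    }


# Hypothetical drill: RTO 60 min, RPO 15 min; recovered in 45 min
# with the last usable backup 20 min old.
result = continuity_check(60, 15, 45, 20)
print(result)
# {'rto_met': True, 'rpo_met': False, 'rto_margin_min': 15, 'rpo_margin_min': -5}
```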
25
DCIM: Helping IT staff in their Reporting KRA
DCIM provides superior reporting on IT infra availability, resource utilization and incident management
KRA: Reporting
DCIM
[Figure: Trend Comparison for Multiple Servers]
26
DCIM DSE Scorecard
For Data Center Manager
27
KRA: Data Center Manager
DC Manager KRA:
• Increase profitability by controlling data center cost
• Minimize DC failure and improve availability
• Improve operational efficiency and meet business SLA
• Data center capacity planning
• Adopt ‘Green’ practices for sustainable DC operations
• Reports & Analytics
[Figure label: DC Manager KPIs]
28
Data Center Manager Cost & Profitability KRA & KPIs
Control CapEx:
• Repurpose under-utilized servers
• Discover stranded capacities & defer costly upgrades
Reduce OpEx:
• Reduce cooling costs
• Reduce server footprint
KRA: Increase profitability by controlling data center cost
DCIM
29
Data Center Manager Availability KRA & KPIs
• Ability to predict failures
• Better impact analysis in the event of subsystem/component failure
• Faster RCA and turnaround time capabilities
KRA: Minimize DC failure and improve availability
DCIM
Measures (Actuals versus SLA Benchmarks):
• Number of incidents/alarms
• Breakdown of alarms at each stage (logged, WIP and closed)
• Major alarms by type
  - Facilities: Fire, Temp, ….
  - IT: Server, Storage, Application…
• RTO
• RPO
30
Data Center Manager Operational Efficiency vs. SLA
DCIM automates critical data center processes like Asset Management, Capacity Planning and Provisioning, thereby minimizing human error, increasing accuracy and data integrity and improving operational efficiency of the data center.
KRA: Improve operational efficiency and meet business SLA
DCIM
Measures (Actuals versus SLA Benchmarks):
• Asset DB accuracy
• Time and cost to provision additional resources
• Availability by servers, storage and applications
• Watt per rack and watt per sq ft
• PUE & CUE
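Two of these measures reduce to short formulas. Watts per rack is the IT load divided by the number of racks, and CUE (Carbon Usage Effectiveness) can be obtained as the carbon emission factor of the energy supply multiplied by PUE. The sketch below uses hypothetical figures; the emission factor in particular depends on the local energy mix and is assumed here.

```python
def watts_per_rack(it_load_kw, rack_count):
    """Average IT power draw per rack, in watts."""
    if rack_count <= 0:
        raise ValueError("rack count must be positive")
    return it_load_kw * 1000.0 / rack_count


def cue(carbon_emission_factor, pue):
    """Carbon Usage Effectiveness in kgCO2e per kWh of IT energy.

    carbon_emission_factor: kgCO2e emitted per kWh of energy supplied.
    pue: Power Usage Effectiveness of the facility.
    """
    return carbon_emission_factor * pue


# Hypothetical site: 500 kW IT load across 100 racks, PUE 1.6,
# and an assumed supply emission factor of 0.5 kgCO2e/kWh.
print(watts_per_rack(500.0, 100))  # 5000.0
print(round(cue(0.5, 1.6), 3))     # 0.8
```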
31
Data Center Manager: Capacity Planning KRA & KPIs
• Monitor current capacity utilization
• Forecast future capacity requirements accurately
• Design and implement critical capacities efficiently without under/over-provisioning
KRA: Data center capacity planning
DCIM
Monitoring:
• Incidents due to Capacity Shortages
• Capacity Adjustments
• Unplanned Capacity Adjustments
• Resolution Time of Capacity Shortage
• Percentage of Capacity Monitoring
Planning & Forecasting:
• Exactness of Capacity Forecast
• % reduction in panic buying
• % reduction in lost business due to inadequate capacity
• Capacity Reserves
• Relative reduction in cost of production of Capacity Plan
Sources: 1. Clemson Computing & Information Technology
2. IT Process Maps
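The slides do not define how "Exactness of Capacity Forecast" or "Capacity Reserves" are calculated. One plausible reading, offered here only as an assumption, is forecast accuracy as 100% minus the absolute percentage error, and reserves as the unused share of installed capacity at peak. The function names and figures below are hypothetical.

```python
def forecast_accuracy_pct(forecast, actual):
    """Assumed metric: 100% minus the absolute percentage error of the forecast."""
    if actual <= 0:
        raise ValueError("actual demand must be positive")
    return 100.0 * (1.0 - abs(forecast - actual) / actual)


def capacity_reserve_pct(installed, peak_used):
    """Assumed metric: share of installed capacity left unused at peak."""
    if installed <= 0:
        raise ValueError("installed capacity must be positive")
    return 100.0 * (installed - peak_used) / installed


# Hypothetical figures in kW: 90 forecast versus 100 actual demand;
# 400 installed versus 300 used at peak.
accuracy = forecast_accuracy_pct(90.0, 100.0)
reserve = capacity_reserve_pct(400.0, 300.0)
print(round(accuracy, 1), round(reserve, 1))  # 90.0 25.0
```

Whatever definitions are adopted, fixing them in code makes the KPI comparable from one planning cycle to the next.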
32
Data Center Manager ‘Green Practices’ KRA & KPIs
Monitor energy consumption in the data center down to the lowest level
Find ways to reduce energy consumption and improve efficiency
Ensure that DC operations comply with organization’s sustainability goals
KRA: Adopt ‘Green’ practices for sustainable DC operations
DCIM
33
Data Center Manager Reporting KRA & KPIs
Reports & Analytics on:
- Uptime and availability
- Energy efficiency and health
- Data center costs and savings
- Capacity/Resource utilization and availability
- Operational efficiency and SLA Compliance
KRA: Reports & Analytics
DCIM
34
How Will DSE Scorecard Help Data Center Operations?
Link Back to Organizational Vision & Strategy & BSC
Are the Data Centre Infrastructure & Capital Costs aligned to process improvements?
Have we been able to reduce Infrastructure OpEx?
Are we maintaining a Risk-free Data Centre Infrastructure?
Is the infrastructure delivering on the technology innovation?
Next Gen DCIM w/ DSE Scorecard