Understanding MLC Cost Impact of Performance and Capacity Management
St. Louis CMG, Oct 4th 2016
Donald Zeunert
BMC Software
Survey Questions
Hardware upgrades
– driven by online / batch?
MLC Costs - 4hr Rolling Average
– online, batch or both?
4HRA MSU peak hour
– SCRT (CEC, LPAR, Product sources)?
Topics
What is important to reduce IT Costs
– MIPS / MSUs reduction?
– ISV SWLCs reduction?
– Specialty Engines usage?
– JIT Hardware upgrades?
Why do I need to manage my 4 hour MSU rolling average?
How can I manage the 4HRA w/o impacting SLAs?
What is important to reducing IT Costs?
CIO objectives and IT Budget Cost sources %
Bigger benefits with less effort if you focus on managing MLC costs
Focus on Software Costs
MLC 25-30%
– Save 10% = 2.5%
ISVs 7-10%
– Save 10% = 0.7%
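The arithmetic behind these two figures can be checked with a short Python sketch. It assumes the low end of each quoted range (MLC at 25% and ISVs at 7% of the IT budget); `budget_impact` is a hypothetical helper name used only for illustration.

```python
def budget_impact(share_of_it_budget: float, saving_fraction: float) -> float:
    """Return a saving expressed as a fraction of the total IT budget."""
    return share_of_it_budget * saving_fraction

# Low end of each range quoted on the slide (assumption: 25% and 7%).
mlc_saving = budget_impact(0.25, 0.10)
isv_saving = budget_impact(0.07, 0.10)

print(f"MLC: {mlc_saving:.1%} of IT budget")
print(f"ISV: {isv_saving:.1%} of IT budget")
```

The same 10% saving is worth roughly three to four times more when applied to MLC than to ISV charges, which is the reason for focusing on MLC first.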
Meeting CIO objectives - With 80/20 Rule
Focus 20% of effort to get 80% of benefit
Hardware no longer $ issue
Performance & Availability Rules of Thumb
Keep CEC & MVS 100% busy
– An unused MIP is lost forever
– See USAA Session 18345
PR/SM Configuration
– Ensure production can steal from test and vice versa
– Over-configure logical to physical ratio so MIPS > guaranteed % can be used
Upgrade thresholds
– Online workloads peak at 70-80% of capacity
Decrease Batch window
– Complete ASAP for maximum window to recover from failures or schedule planned outages
– Start batch as soon as possible
Are you managing P&A using most of these rules?
IBM SCRT Report – LPARs contribution to CPC 4HRA
The CPC 4HRA used for z/OS MLC charges falls on a different date than any of the LPAR detail breakdowns
PRD2, the largest LPAR, did peak at the same hour the two days before
None of the other LPARs peaked around that date / time
2016 SCRT version has detail mode to show all hours
Data in the SCRT report is insufficient to understand what caused the CPC 4HRA peak for the month
Goal of Usage (4HRA) based MLC
Billed at 4hr Rolling average not peak usage
• Heavy online with multiple Peaks
• Little or no batch within 4hr of peak
• Batch consumption peak below the onlines' 4HRA
Buy extra capacity - meet SLAs, pay for less than used
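The difference between peak usage and the billed 4-hour rolling average can be sketched in Python. This is a minimal illustration, assuming hourly MSU samples and an invented series with short online peaks; `rolling_average` and `hourly_msu` are hypothetical names, and real reporting works from SMF data rather than a hand-built list.

```python
from collections import deque

def rolling_average(msu_samples, window=4):
    """Trailing rolling average; the first entries average the samples seen so far."""
    recent = deque(maxlen=window)
    averages = []
    for sample in msu_samples:
        recent.append(sample)
        averages.append(sum(recent) / len(recent))
    return averages

# Hypothetical hourly MSU consumption: short online peaks, low usage between them.
hourly_msu = [40, 40, 100, 40, 40, 100, 40, 40]

r4ha = rolling_average(hourly_msu)
peak_usage = max(hourly_msu)   # capacity actually needed to meet the SLA
billed_msu = max(r4ha)         # what a 4HRA-based charge would be driven by

print(peak_usage, billed_msu)
```

With this series the workload briefly uses 100 MSUs, while the highest 4-hour average is 70 MSUs, which is the sense in which the capacity above the 4HRA is used but not paid for.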
“Ideal” workload (chart spanning 9:00 AM to 5:00 PM, with a region marked FREE)
Q: Is my CPC Ideal?
“Indy, they're digging in the wrong place!”
When you have all of the information you can focus your efforts in the right place
Modification to formula; take back one kadam
The SCRT report doesn’t have enough information to tell if our CPCs are “Ideal”. We need to rely on performance monitors, capacity planning reports, and other tools.
Is my workload “Ideal”? – Understand 4HRA Drivers
Batch 7:30PM
Backup / ReOrgs
Online Peak 3pm
4HRA Risk from batch spike
4HRA PEAK
Start of online day
White space consumed?
Investigate
• CPC billed MSUs time
• Which LPAR is contributing to CPC max
• What workload on the LPAR
• From a spike, necessary?
Typical ISSUEs – 4HRA not when expected
BATCH is source of problem
– Started too early – averages w/ end of online peak usage
– Finishes too late – averages w/ start of online peak usage
Source of peak MSUs is the 4HRA
– Finishes with hours to spare and is neither of the above conditions
• Needs to be controlled
– Drops, then peaks again, and barely finishes on time
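The "started too early" case can be illustrated with a small simulation. The sketch below assumes an invented 24-hour profile (onlines at 50 MSUs from 8am to 5pm, 20 MSUs otherwise) and a 3-hour batch job adding 40 MSUs; all numbers and the names `peak_4hra` and `day_profile` are hypothetical.

```python
def peak_4hra(hourly_msu, window=4):
    """Highest average over any `window` consecutive hourly samples."""
    return max(
        sum(hourly_msu[i:i + window]) / window
        for i in range(len(hourly_msu) - window + 1)
    )

def day_profile(batch_start_hour, batch_hours=3, batch_msu=40):
    """Hypothetical day: onlines 8am-5pm at 50 MSUs, 20 MSUs otherwise, plus batch."""
    profile = [50 if 8 <= hour < 17 else 20 for hour in range(24)]
    for hour in range(batch_start_hour, batch_start_hour + batch_hours):
        profile[hour] += batch_msu
    return profile

early = peak_4hra(day_profile(batch_start_hour=17))  # batch right behind the onlines
late = peak_4hra(day_profile(batch_start_hour=21))   # batch delayed by four hours

print(early, late)
```

In this made-up profile, starting batch at 5pm averages it with the last online hour and lifts the peak 4HRA to 57.5 MSUs; starting at 9pm leaves the peak at the online level of 50 MSUs.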
Daytime Onlines Peak 4HRA 56 MSUs
10pm – Why not started earlier? MSUs to finish on time
Onlines volume complete
3:00AM
Why not CAP to Online Peak?
Batch needs 100% capacity to complete on time
Need to Begin sooner
Shorten Duration
– Make (batch, sorts, backups, DB reorgs, etc.) more efficient
– Use less GCP or use specialty engines
Delay Subset
– Find discretionary batch / work that can complete after 8am
Batch starts 9pm, finishes 7:30am
Onlines start 8am
Need all MSUs to complete
Resource limiting not an option
Batch Multiple Peaks
Need to smooth and CAP
– Batch starts 7pm
– Drops at 8:15 as waiting for something
– Picks up at 9:30, drops off till midnight
– Small spike
– 3am something huge (reorgs / backups) creates 4hr MSU peak for billing
• Doesn’t run long; done 6:15am, ready for 7am online start
Resource Limit Second Peak
Monitor / Manage / Plan - 4hr Avg MSU
Monitor demand and 4HRA MSUs
– Catch loops and anomalies to exclude from SCRT report
– Understand normal LPARs / CEC usage patterns
– Track current against normal 4hr MSU patterns
Manage potential SLA impact of capping
– Max MSUs Consumed
– Average MSUs Consumed
Plan your corrective actions
– Understand what to sacrifice to stay under expected max
• What actions can be taken to allow sacrifice
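Tracking current consumption against the normal pattern could look something like the sketch below. It assumes a per-hour baseline is already available and flags hours that exceed it by more than a chosen tolerance; the 20% tolerance, the sample values, and the name `flag_anomalies` are all assumptions for illustration, not features of any particular monitor.

```python
def flag_anomalies(current, baseline, tolerance=0.20):
    """Return the indexes where current MSUs exceed baseline by more than `tolerance`."""
    return [
        hour
        for hour, (now, normal) in enumerate(zip(current, baseline))
        if now > normal * (1 + tolerance)
    ]

# Hypothetical 4HRA MSUs for four consecutive hours.
baseline_4hra = [50, 55, 60, 58]
current_4hra = [52, 54, 90, 59]

suspect_hours = flag_anomalies(current_4hra, baseline_4hra)
print(suspect_hours)
```

Hours flagged this way are candidates for investigation (a looping job, an unplanned reorg) before they set the monthly peak.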
Capping types and Pro/Cons
Manage to current consumption
Can’t exceed even if no impact to 4HRA
PR/SM Initial Capping (aka Hard Capping) – Relative to LPAR's current weight, # of CPs worth, don’t take “white space”
Absolute Capping (hard cap, new on EC12) – Not relative to share, specified in terms of 1/100ths of a processor
WLM Resource Groups – CPU SU/SEC a Service class can use across the Sysplex
– Sysplex scope may not be CPC and aligned with SCRT billing
Manage to 4hr MSU Average
May exceed MSU cap until impacts 4HRA limit
4HRA not allowed to exceed cap
– Will never cap lower than Defined capacity
– Even if 4HRA exceeds cap not billed for it
Types
• Defined Capacity – Soft Capping, # of MSUs to give LPAR in 1 MSU increments
• Group Capacity – Set capacity for group of LPARs in a CEC to subset of CEC capacity but in PR/SM ratio entitlements
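The "not billed for it" point about Defined Capacity can be sketched as follows. The example assumes the billable value for each interval is the lesser of the 4HRA and the defined capacity, and that capping is active whenever the 4HRA is above the defined capacity; the sample values and the names `billable_msu` and `capped_intervals` are hypothetical.

```python
def billable_msu(r4ha, defined_capacity):
    """Monthly billable peak, assuming each interval counts min(4HRA, defined capacity)."""
    return max(min(value, defined_capacity) for value in r4ha)

def capped_intervals(r4ha, defined_capacity):
    """Indexes of intervals where the 4HRA is above the cap, so soft capping would apply."""
    return [i for i, value in enumerate(r4ha) if value > defined_capacity]

# Hypothetical 4HRA values (MSUs) for one LPAR and a defined capacity of 60 MSUs.
lpar_r4ha = [40, 55, 70, 62]

print(billable_msu(lpar_r4ha, defined_capacity=60))
print(capped_intervals(lpar_r4ha, defined_capacity=60))
```

The second list is the one to review against SLAs: those are the intervals where work may have been delayed to hold the bill at the cap.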
Growth LPAR – Rolling and Using
Most monitors will let you view a CEC's Rolling 4hr MSU
But this is not enough
• Alarms – Not just 4HRA (too late); look at MSU spikes
• Actions – What on-the-fly actions are allowed?
• Impact SLAs?
Peak 4HRA long time in making. What actions now?
In the image, the purple line was the 4HRA cap
Ease of identifying drivers of 4HRA MSUs
Easier
Homogeneous software on LPARs
Very few LPARs
Very few LPARs / CEC
Hard
Heterogeneous software on LPARs
– New work on z LPARs just WAS and / or DB2.
– IMS, CICS not on all LPARs
– DB2 not on all LPARs
– DB2 only LPARs (SAP)
Numerous LPARs with different time of day peaks and different application mixes typically from large customers / acquisitions
– Banks
– Insurance
– Service Bureaus
Sub-capacity Pricing – Example Scenario
CPC Peak 4HRA: 1108 MSUs
LPAR1 Peak 4HRA: 550 MSUs
LPAR2 Peak 4HRA: 400 MSUs
LPAR3 Peak 4HRA: 450 MSUs
[Chart: Monthly MSU 4 Hour Rolling Average for LPAR1, LPAR2, LPAR3 and Total CEC (LPARs 1-3)]
LPAR1: z/OS, CICS, DB2, IMS; Onlines
LPAR2: z/OS, CICS, DB2, IMS; Onlines
LPAR3: z/OS, CICS, IMS, MQ; Batch
Bill - z/OS, CICS, IMS: 1108 MSUs
Bill - MQ: 450 MSUs
Largest LPAR peak not what is billed
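Why the bill is 1108 MSUs and not the sum of the LPAR peaks (550 + 400 + 450 = 1400) can be shown with a short Python sketch. A product is charged on the peak of the combined 4HRA of the LPARs it runs on, and the LPARs here peak at different hours. The hourly values below are invented so that the per-LPAR peaks and the CPC peak match the figures in the scenario; `combined_peak` is a hypothetical helper.

```python
# Hypothetical hourly 4HRA values (MSUs) per LPAR, invented so that the
# per-LPAR peaks and the CPC peak match the figures quoted in the scenario.
lpar_4hra = {
    "LPAR1": [300, 550, 400, 200],
    "LPAR2": [400, 358, 300, 150],
    "LPAR3": [100, 200, 300, 450],
}

def combined_peak(lpars):
    """Peak of the hour-by-hour sum of 4HRA over the given LPARs."""
    hourly_totals = [sum(vals) for vals in zip(*(lpar_4hra[name] for name in lpars))]
    return max(hourly_totals)

sum_of_lpar_peaks = sum(max(values) for values in lpar_4hra.values())
cpc_peak = combined_peak(["LPAR1", "LPAR2", "LPAR3"])  # products on every LPAR
mq_peak = combined_peak(["LPAR3"])                      # MQ runs only on LPAR3

print(sum_of_lpar_peaks, cpc_peak, mq_peak)
```

Products present on every LPAR (z/OS, CICS, IMS) follow the CPC peak, while MQ, which runs only on LPAR3, follows that LPAR's own peak.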
Need for LPAR and Workload balancing
4HRA Peaks at Same basic time on CEC777
Different peak times on CEC888
Complexity: different products on LPARs / CPCs
LPARs on different CPCs peak at different times
If Subsystem MSUs not = z/OS MSUs, then not licensed on all LPARs
SCRT Record type data
– Bill by product date / time
CPC 1 / CPC 2
PR/SM – Easy changes lead to big rewards
Controlling 4HRA Monthly Max
Using PR/SM
– Capping
– Max share via #LPs
Using WLM (w/ PR/SM)
– Capping
• LPAR
• Service class – for lower importance work
– Discretionary can be source of 4HRA peak
CEC and LPAR – Rolling 4hr MSUs
Reality: workloads tend to consume 100% of capacity provided
LPARs configured for stealing for latent demand
Contribution by LPAR
4HRA Free above the DC line
LPARs with latent demand expand to take up others' contractions
Logical / Physical – Guaranteed Minimum
Everyone's share < 10% of 16 CPs so guaranteed < 1.6 CPs
Using White Space / more than guaranteed share
Impacting 4HRA?
If no LPARs with work to compete with production batch, can exceed guaranteed share. And therefore easily exceed (2x) daytime 4HRA (3 vs 1.5 CPs)
Does not create a maximum other than 100% of # of LPs
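The guaranteed-minimum arithmetic can be sketched as below. It assumes the guaranteed share is the LPAR's weight divided by the total weight, multiplied by the number of physical CPs, and that without capping the only ceiling is the number of logical processors. The weights and function names are hypothetical; the 16 CPs and 3 LPs echo the figures above.

```python
def guaranteed_cps(weight, total_weight, physical_cps):
    """CPs an LPAR is guaranteed when every LPAR is competing for the CEC."""
    return physical_cps * weight / total_weight

def uncapped_ceiling_cps(logical_cps, physical_cps):
    """Without capping, an LPAR can use up to 100% of its logical processors."""
    return min(logical_cps, physical_cps)

# Hypothetical LPAR holding exactly 10% of the total weight on a 16-CP CEC, with 3 LPs.
guaranteed = guaranteed_cps(weight=100, total_weight=1000, physical_cps=16)
ceiling = uncapped_ceiling_cps(logical_cps=3, physical_cps=16)

print(guaranteed, ceiling)
```

The gap between the two numbers is the white space an LPAR can consume overnight when nothing competes with it, which is how batch can push the 4HRA well above the daytime level.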
Avoid Batch SLA impact
Reduce Batch GCP CPU
• zIIP enabled sorts, reorgs, etc.
• Delta / changed backups vs full
• Reorgs by need instead of all
Balance LPARs across CECs
• Avoid simultaneous peaks
• Use scheduling environments
Reduce LPAR Software footprint
• Consider Batch only LPARs; eliminate non-essential software
Reduce Batch CPU
• Tune application and subsystems
• DB2 V10 zIIP enabled sequential prefetch (batch, online)
• Verify adequate zIIP capacity
Reduce Batch Elapsed time
• Convert serial processing to parallel processing
• N-way data sharing (VSAM RLS)
• Batch pipes or equivalent
MSUs and Relative Nest Intensity (RNI)
High RNI = Higher MSU for same workload.
Drives demand MSUs, which drives 4HRA and MLC costs
Summary - New Rules of Thumb
Manage 4 hour rolling average MSU consumption without impacting Service Level Agreements
Don’t use CEC Capacity if SLAs can be met
– Don’t always let unimportant workloads consume “spare” MIPS
• Group CAP box to Production monthly peak +N% for SLA
– Let others consume 100% of what the bill will be anyway
– Be careful how many LPs you give LPAR if no competing LPARs
– Balance LPARs across CECs if peak at different times
Don’t let batch be 4HRA peak by default
– If it can complete with an acceptable window for recovery
• Don’t start batch ASAP if it creates a 4HRA max
• Don’t let batch finish late if it creates a 4HRA max in online window before online peak
Don’t run software on all LPARs
– If not needed, as it is billed even if not used
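The "Production monthly peak +N%" rule for a group cap is simple arithmetic, sketched here in Python. Rounding up to a whole MSU is an assumption of this sketch, as are the example headroom of 5% and the name `group_cap_msu`; the 1108 MSU peak is borrowed from the earlier example scenario.

```python
def group_cap_msu(production_peak_msu: int, headroom_pct: int) -> int:
    """Production monthly peak plus N% headroom, rounded up to a whole MSU.

    Integer arithmetic avoids floating-point surprises when rounding up.
    """
    scaled = production_peak_msu * (100 + headroom_pct)
    return (scaled + 99) // 100

print(group_cap_msu(1108, 5))
print(group_cap_msu(1000, 10))
```

Lower importance work can then consume whatever is left under that cap, since the bill will be at or near the production peak anyway.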
Discussion Topic
Mobile Workload Pricing
– 60% discount on CICS, IMS, DB2, MQ, WAS MSUs if customer can prove request initiated from a “mobile” device (phone, tablet)
• So if same URL is used for “mobile” and PC and you can’t tell the difference, you can’t get the discount
– Do you currently track “mobile” MSU consumption?
• How do you do it?
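The effect of the 60% discount on the MSUs that count toward the bill can be sketched as below. It assumes the discount is applied by subtracting 60% of the identifiable mobile MSUs from the total; how the reporting tool applies it in detail is not covered here, and the function name and sample values are hypothetical.

```python
def mobile_adjusted_msu(total_msu, mobile_msu, discount=0.60):
    """MSUs counted after discounting the share proven to originate from mobile devices."""
    if not 0 <= mobile_msu <= total_msu:
        raise ValueError("mobile MSUs must be between 0 and the total")
    return total_msu - discount * mobile_msu

# Hypothetical interval: 1000 MSUs in total, 200 of them traceable to mobile requests.
print(mobile_adjusted_msu(1000, 200))

# If mobile traffic cannot be told apart from PC traffic, nothing can be claimed.
print(mobile_adjusted_msu(1000, 0))
```

The second call is the case raised above: with a shared URL and no way to identify mobile requests, the tracked mobile MSUs are zero and so is the discount.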
On May 6, 2014 IBM announced a new reporting tool (MWRT), available June 30th: workload on z/OS from mobile devices (cell phones / tablets) pays 60% reduced MLC
Replaced w/ new Java-based SCRT tool
WLM w/ CICS & IMS support MWLC via SMF70s and 72s