 Average Session Load (ASL) The Golden Metric ? Kyle Hailey http://perfvision.com

  • Average Session Load (ASL): The Golden Metric?
    Kyle Hailey
    http://perfvision.com

    In this Session
    The Holy Grail of Performance: ASL

    ASL: stethoscope for Database Health
    • Tap into the heartbeat of the database
      • Heart stopped: hung?
      • Beating very slow: idle?
      • Beating fast: heavily loaded?
    • Subcomponent trinity
      • CPU
      • Waits
      • Time series
    • Extra: use maximum CPU as a yardstick

    Word of Wisdom
    • My goal is to cut out a lot of the noise and extraneous data and hone in on the essential
    • Half of the game is knowing when to act and how much effort to put in

    Idle Database
    • Value of proving the database is idle
    • "It's the database's fault": how many times do you hear that?
    • Database idle
      • No load on the database
      • Database performance is fine
      • Underutilized
    • Problem lies elsewhere
    • Saved me time and stress many times

    What's the Database Doing
    Often I want a quick and easy way to see what the database is doing
    • Is it working?
    • Is it blocked?
    • How much is going on?
    • Is the database healthy?

    Checking the Database
    How do *you* check the database health?
    • Routine exams?
    • Statspack?
      • 1300 lines of data
      • Which lines?
      • How many stats?
    • Automated alerts?
      • What do you set alerts on?
      • What if no alerts fire? Are you OK?
      • Do alerts really tell you what's happening?

    What's the Database Doing
    Whip out the stethoscope: ASL

    The Cult of ASL
    Once you've been initiated there is no going back

    Welcome to The Cult of ASL
    Magic Metric for Database Health: Average Session Load (ASL)
    For those of you who are already members, welcome back

    ASL: The Golden Metric
    • Powerful
    • Multidimensional
    • Indispensable

    Average Session Load (ASL) goes under the guise of
    • Session Load
      • I often refer to it this way
    • Average Active Sessions (AAS?)
      • The way I named it in OEM 10g graphs
    • Centi-seconds per second (or secs/sec)
      • In the dark ages, before OEM 10g, waits were often measured as centi-secs per sec

    ASL: Average Session Load
    • Average value
      • Averaged over 15 secs in OEM 10g
      • Time period could be 5 minutes, an hour
    • Active sessions only
      • Active sessions put load on the database
      • Inactive sessions don't put load on it
        • Except for memory usage
      • Active sessions are
        • Sessions in a call
        • Started a SQL statement but hasn't returned yet
        • DBWR writing blocks out

    ASL Calculations
    ASL = DB TIME / Time Period
    DB Time (DBT) = sum over all sessions of time in the states
    • CPU
    • Wait

    DB TIME (10g) =
        select value from v$sysstat where name = 'DB time';

    DB TIME =
        select sum(time_waited) from v$system_event where event not in ( ... idle events );
        +
        select value from v$sysstat where name = 'CPU used by this session';
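The ratio ASL = DB TIME / Time Period can be sketched in a few lines. This is only an illustration: the function name and arguments are made up for this example, and it assumes both snapshots of the cumulative DB time counter have already been converted to seconds.

```python
def asl_from_db_time(dbt_start_s, dbt_end_s, t_start_s, t_end_s):
    """Average session load over an interval.

    dbt_*_s: cumulative DB time at each snapshot, in seconds (assumed units)
    t_*_s:   wall-clock time of each snapshot, in seconds
    """
    elapsed = t_end_s - t_start_s
    if elapsed <= 0:
        raise ValueError("end snapshot must be later than start snapshot")
    return (dbt_end_s - dbt_start_s) / elapsed


# Hypothetical snapshots: DB time grew by 310 s during a 60 s interval.
print(round(asl_from_db_time(1000.0, 1310.0, 0.0, 60.0), 2))
```

Taking the difference of two snapshots matters because the counter is cumulative since instance startup; dividing the raw value by uptime would only give a lifetime average.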

    ASL sources
    ASL can be found from or derived from
    • 10g
      • OEM
      • v$sysstat: DB time / elapsed time
    • 9i
      • v$system_event
      • v$sysstat
      • This works from Oracle 7 to 10g (probably 11)

    ASL 9i
    • Data
      • v$system_event
        • Sum wait times
        • Non-idle waits
      • v$sysstat
        • CPU used by this session
    • ASL = (sum(waits) + cpu) / elapsed time
      • ASL(CPU) = cpu / elapsed
      • ASL(wait) = sum(waits) / elapsed
    • Produces
      • Session time / elapsed time
      • Session centi-secs/sec
      • Session secs/sec
      • Avg Session Load (ASL)
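A rough sketch of the 9i-style split into ASL(CPU) and ASL(wait). It assumes the two counter deltas are in centiseconds (the "centi-secs/sec" unit mentioned above) and that idle events were already excluded from the wait total; the function and argument names are hypothetical.

```python
def asl_components(wait_cs_delta, cpu_cs_delta, elapsed_s):
    """Split average session load into its wait and CPU parts.

    wait_cs_delta: change in summed non-idle wait time, centiseconds (assumed)
    cpu_cs_delta:  change in 'CPU used by this session', centiseconds (assumed)
    elapsed_s:     wall-clock seconds between the two snapshots
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    asl_wait = wait_cs_delta / 100.0 / elapsed_s  # centisecs -> secs, per sec
    asl_cpu = cpu_cs_delta / 100.0 / elapsed_s
    return {"wait": asl_wait, "cpu": asl_cpu, "total": asl_wait + asl_cpu}


# Hypothetical deltas: 27,800 cs of waits and 3,200 cs of CPU in 60 s.
parts = asl_components(27800, 3200, 60)
print({name: round(value, 2) for name, value in parts.items()})
```

Keeping the two parts separate, rather than returning only the total, is what later lets the CPU part be compared against the number of CPUs.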

    ASL in OEM DB Home Page

    Calculating ASL: Statspack
    • Uses
      • v$system_event
      • v$sysstat
    • Look for
      • Top 5 Timed Events
      • Elapsed Time
      • cpu_count helpful
    Seconds in Wait / Elapsed time = ASL

    Use Statspack to Find Waits
    Statspack cheat sheet:
    • Install
      • Connect as SYSDBA
      • @?/rdbms/admin/spcreate.sql
    • Run
      • exec statspack.snap;
    • Generate reports
      • @?/rdbms/admin/spreport.sql

    Statspack
    • Trusty Statspack report
      • Elapsed Time
      • Check Top 5 Timed Events
    Start at line 52 of about 1300!

    Elapsed Time

    STATSPACK report for

    DB Name         DB Id    Instance     Inst Num Release     RAC Host
    ------------ ----------- ------------ -------- ----------- --- -------
    LABSF03       1420044432 labsf03             1 10.1.0.2.0  NO  labsfr

                  Snap Id     Snap Time      Sessions Curs/Sess
                --------- ------------------ -------- ---------
    Begin Snap:         1 03-Apr-06 12:34:06       18       5.6
      End Snap:         2 03-Apr-06 12:34:36       18       4.8
       Elapsed:                1.00 (mins)

    Used CPU Time and Wait Time
    Look at Top 5 Timed Events

    Top 5 Timed Events
    ~~~~~~~~~~~~~~~~~~                               % Total
    Event                     Waits   Time (s)     Call Time
    --------------------- --------- ---------- -------------
    buffer busy waits         2,748        250         78.72
    CPU time                                32         10.16
    free buffer waits         1,588         15          4.63
    write complete waits         10          8          2.51
    log buffer space            306          5          1.51

    Example: CPU + WAITS
    • CPU = 32 secs
    • WAITS = 250 + 15 + 8 + 5 = 278 secs
    • Elapsed Time = 60 secs
    (32 + 278) user secs / 60 secs = 5.1 average session load (4.6 waiting, 0.5 on CPU)

    Top 5 Timed Events
    Event                  Time (s)
    --------------------- ---------
    buffer busy waits           250
    CPU time                     32
    free buffer waits            15
    write complete waits          8
    log buffer space              5
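The arithmetic of this example can be replayed directly from the Top 5 Timed Events figures. The dictionary below simply copies the Time (s) column from the report; the variable names are illustrative.

```python
# Time (s) column of the Top 5 Timed Events section shown above.
top5_seconds = {
    "buffer busy waits": 250,
    "CPU time": 32,
    "free buffer waits": 15,
    "write complete waits": 8,
    "log buffer space": 5,
}
elapsed_s = 60  # Elapsed: 1.00 (mins)

cpu_s = top5_seconds["CPU time"]
# Everything that is not CPU time is wait time.
wait_s = sum(t for event, t in top5_seconds.items() if event != "CPU time")

asl_cpu = cpu_s / elapsed_s
asl_wait = wait_s / elapsed_s
asl = asl_cpu + asl_wait

print(f"waits={wait_s}s cpu={cpu_s}s "
      f"ASL={asl:.2f} (wait {asl_wait:.2f}, cpu {asl_cpu:.2f})")
```

Unrounded, the total is about 5.17; the slide cuts the two parts to 4.6 and 0.5 and quotes their sum, 5.1.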

    Available CPU
    • init.ora
    • Statspack 10g shows # of CPUs
    • Statspack 9i: # of CPUs

    SQLPLUS> show parameters cpu_count

    NAME               TYPE     VALUE
    ------------------ -------- ----------
    cpu_count          integer  2

    Available CPU vs ASL
    • ASL = 5.1 (4.6 waiting, 0.5 on CPU)
    • # of CPUs = 2
    • Far above available CPU => problem
    • Plenty of free CPU => wait bottleneck

    ASL Primary Purpose
    Answers the question: is the database
    • Idle?
    • Active?
    • How active?

    • ASL < 1: database is not blocked
    • ASL ~= 0: database basically idle
    • ASL < # of CPUs
      • Extra CPU to be had
      • Database is probably not blocked
    • ASL > # of CPUs
      • Could have performance problems
    • ASL >> # of CPUs
      • There is a bottleneck
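These rules of thumb can be written down as a small classifier. This is a sketch only: the slide gives no numbers for "~= 0" or ">>", so the `idle_eps` and `overload_factor` defaults below are assumptions to be tuned per system, and the function name is made up.

```python
def interpret_asl(asl, cpu_count, idle_eps=0.05, overload_factor=2.0):
    """Map an ASL value to the rule-of-thumb reading.

    idle_eps:        what counts as "ASL ~= 0" (assumed value)
    overload_factor: how many times cpu_count counts as ">>" (assumed value)
    """
    if asl <= idle_eps:
        return "basically idle"
    if asl < 1:
        return "not blocked"
    if asl < cpu_count:
        return "extra CPU to be had, probably not blocked"
    if asl <= overload_factor * cpu_count:
        return "could have performance problems"
    return "bottleneck"


# The Statspack example above: ASL = 5.1 on a 2-CPU host.
print(interpret_asl(5.1, 2))
print(interpret_asl(0.0, 2))
```

The checks run from the most benign reading to the most serious, so the first matching rule wins and a single-CPU host still gets the "ASL < 1" reading.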

    Going Farther with ASL
    • ASL can tell you a lot
    • But its components tell you much more
    • To go farther, you need the components of ASL
      • CPU
        • How many CPUs (max CPU available)
      • Wait
        • Which waits
      • Value over time

    Components of ASL
    • DB Home Page
      • ASL point in time
    • Performance Page
      • ASL over time

    EM DB Home Page [screenshot]

    Performance Page [screenshot]

    ASL Performance Page [screenshot]

    OEM 10g - ASL
    ASL is the top of the curve (chart value shown: 31.92)

    OEM 10g - CPU

    OEM 10g - Waits

    OEM 10g - CPU vs Wait

    OEM 10g - Max CPU
    Maximum possible CPU

    OEM 10g - Zoom-In
    Available CPU vs CPU + wait

    OEM 10g - Get to work!
    Chart annotations: "Relax", "Get to Work!", "Looks OK, But"

    Calculating ASL

    ASL Calculations
    ASL = DB TIME / Time Period
    But there is another way

    ASL alternative calculation: active sessions
    • Count active sessions over an interval
    • Average by the interval
    • Less accurate, but surprisingly close
    • v$session_wait (or v$active_session_history)
      • wait_time > 0 means ON CPU
      • Filter out idle events
      • 9i or lower: join to v$session
        • status = 'ACTIVE'
        • type = 'USER'
      • 10g: v$session has all the columns
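The sampling approach can be sketched as follows. This is an illustration under stated assumptions: each sample is a list of hypothetical session rows (plain dicts with `event` and `wait_time`), standing in for what a periodic query on the session wait view would return, and `IDLE_EVENTS` is a tiny illustrative subset of the real idle-event list.

```python
# Illustrative subset only; a real idle-event list is much longer.
IDLE_EVENTS = {"SQL*Net message from client", "rdbms ipc message"}


def active_count(sample):
    """Number of active sessions in one sample of session rows."""
    count = 0
    for row in sample:
        on_cpu = row["wait_time"] > 0        # per the slide: > 0 means ON CPU
        waiting = row["event"] not in IDLE_EVENTS
        if on_cpu or waiting:
            count += 1
    return count


def asl_from_samples(samples):
    """Average the per-sample active session counts over the interval."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(active_count(s) for s in samples) / len(samples)


idle = {"event": "SQL*Net message from client", "wait_time": 0}
on_cpu = {"event": "SQL*Net message from client", "wait_time": 1}
io_wait = {"event": "db file sequential read", "wait_time": 0}

samples = [
    [on_cpu, io_wait, idle],     # 2 active
    [io_wait, idle, idle],       # 1 active
    [on_cpu, io_wait, io_wait],  # 3 active
]
print(asl_from_samples(samples))  # 2.0
```

A session on CPU keeps showing its last wait event, which is why the on-CPU check comes before the idle-event filter.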

    Two Sources Comparison
    • v$system_event & v$sysstat
      • Indirect measure via time spent in the DB
      • Accurate measure of time counters
      • Values can lag (especially CPU)
    • v$session_wait
      • Direct measure of # of sessions
      • Closer to real time
      • Statistical approximation via samples

    ASL in OEM
    Same chart but calculated differently (charts 1 and 2)

    ASL Performance Page (chart 1)
    • Sources: v$sysstat, v$system_event
    • ASL = DBT / Time
    • ASL(tn) = ( DBT(tn) - DBT(t0) ) / (tn - t0)

    ASL Top Activity Page (chart 2)
    • ASL(tn) = ( sum of the active session counts of the n samples ) / n
    • n = # of samples

    Average Active Sessions = Top Activity?
    • Performance Page: Average Active Sessions
      • ASL = DBT / Time
      • ASL(tn) = ( DBT(tn) - DBT(t0) ) / (tn - t0)
    • Top Activity = ?

    DB TIME = area under the curve
    • ASL = DBT / Time
    • DBT = area under the curve
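The "area under the curve" relation can be checked numerically. The sketch below integrates an active-sessions curve with the trapezoid rule to recover DB time, then divides by the interval to get ASL back; the sample points are invented for illustration.

```python
def db_time_from_curve(times_s, active_sessions):
    """Trapezoid-rule area under the active-sessions curve.

    Returns DB time in session-seconds for the sampled interval.
    """
    if len(times_s) != len(active_sessions) or len(times_s) < 2:
        raise ValueError("need two or more matching points")
    area = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        area += dt * (active_sessions[i] + active_sessions[i - 1]) / 2.0
    return area


# Hypothetical chart points, one every 15 seconds.
times_s = [0, 15, 30, 45, 60]
curve = [2, 4, 6, 4, 2]

dbt = db_time_from_curve(times_s, curve)
asl = dbt / (times_s[-1] - times_s[0])
print(dbt, asl)  # 240.0 4.0
```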

    ASL Top Activity Page
    • n = # of samples
    • ASL = DBT / Time

    ASL Top Activity Page [screenshot]

    Samples vs Counters
    • Counters: v$system_event (slight lags)
    • Samples: v$active_session_history

    The Power ASH gives ASL
    • DB Home
    • Performance Page
    • Top Activity Page

    ASH in OEM
    Top Activity gives more information

    Top Activity: Based on ASH
    Chart annotations: "missing", "Thanks to ASH"

    Top Activity: ASH Dimensions

    ASL % Session Time Issue
    • Shown in % DB Time
    • Missing % Session Time

    Top Activity: ASH Sessions
    • Many users active
    • On the Performance Page, there is no way to tell how many users
    • But the Top Activity Page fixes that

    Top Activity: ASH Sessions
    • Two users active

    SQL and Session
    • DB Home
    • Performance
    • Top Activity
      • SQL
      • Session

    Session: ASH Activity

    SQL: ASH Activity

    Getting the Most
    • Need to know the system's profile
      • What your application is like
        • Data warehouse
        • OLTP
      • Typical load
    • Once you get to know it you can see anomalies
      • Is ASL near 0 when it should be higher?
      • Is that data warehouse query running normally?
        • Do you know what it looks like?
      • Is there an unusual bottleneck?

    Knowing your DB Profile

    When to tune
    General rules of thumb
    • Waits >> CPU
    • CPU > Max CPU
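The two rules of thumb can be expressed as a quick check. This is a sketch with assumptions: the slide does not say how much larger ">>" means, so `wait_factor` is an invented default, and the function name and messages are illustrative.

```python
def tuning_hints(cpu_load, wait_load, max_cpu, wait_factor=2.0):
    """Apply the two rules of thumb to the CPU and wait parts of ASL.

    cpu_load, wait_load: average sessions on CPU / waiting
    max_cpu:             number of CPUs available
    wait_factor:         multiple that counts as "waits >> CPU" (assumed)
    """
    hints = []
    if wait_load > wait_factor * cpu_load:
        hints.append("waits >> CPU: look for a wait bottleneck")
    if cpu_load > max_cpu:
        hints.append("CPU > max CPU: CPU is saturated")
    return hints


# The Statspack example: 0.5 on CPU, 4.6 waiting, 2 CPUs.
print(tuning_hints(cpu_load=0.5, wait_load=4.6, max_cpu=2))
```

Returning a list rather than a single verdict allows both conditions to be reported when they hold at once.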

    Waits > CPU [screenshot]

    CPU > Max CPU [screenshot]

    Getting More out of ASL
    • DB Home
    • Performance
    • Top Activity
      • SQL
      • Session

    In Summary
    • ASL is simple and powerful
    • ASL's components are even more powerful
      • CPU
      • Wait
      • Value over time
    • Use # of CPUs as a yardstick
    • Know your application load profile to see anomalies
      • Data warehouse
      • OLTP
      • Heavy load
      • Light load

    Wouldn't it be nice to have an easy way to show that the application wasn't even putting any load on the database?

    Statspack can be an overwhelming amount of data, clocking in at a typical 1300 lines.

    Graphics are a good solution for displaying this amount of data more quickly, but what stats do you show, and how?

    Alerts are great at automated notification, but what do you set alerts on? Alerts are generally set on a standard workload. What if there are no alerts going off? How do you check that things really are working correctly and that there is not a problem with the alerts, their levels, or possibly missing alerts?

    IQ is a controversial metric as well, but it has its place. Someone with an IQ of 70 probably would not be the best candidate for an Ivy League math professor. The problem is that they might be good at other jobs, but IQ lacks information about which professions the person might excel at. IQ lacks the multiple dimensions that would break down abilities into different areas. Similarly, ASL by itself lacks dimensions and is limited but useful. It can of course be broken down into several parts to give us a more detailed picture of activity.

    Looking at the situation more concretely, when there is a slowdown on the database we look at our trusted Statspack report for the period of the slowdown. The first step in analyzing the Statspack report is to look at the Top 5 Timed Events. The top 5 timed wait events will tell us if any wait event has crept up to cause a bottleneck. If we do find a wait event bottleneck, we will need to know who or what is causing the problem in order to solve it. For example, if there is a CPU bottleneck, we need to know what SQL statement is hogging the CPU. If there is an IO bottleneck, we need to know what SQL statement is stuck on IO and needs tuning. If there is a complex situation like buffer busy waits or latch contention, we need to know which sessions were involved, what the wait event arguments were, and what SQL they were executing.

    Statspack fails to give us the necessary detailed information, but ASH does.

    Statspack is probably the most reliable source of performance information. The Statspack report, ?/rdbms/admin/spreport.sql, generates over 1000 lines of information, but the first and possibly only place to go in the report is Top 5 Timed Events.

    In Top 5 Timed Events we can determine if the database has any performance issues. If it does, then we can find out if it's CPU or a wait. If it's a wait, we can tune that particular wait.

    If the users call up saying the database is hanging and ASL < 1, you know it's not true. If ASL is near 0, you even know that the database is idle and that the users or application are not requesting any work from Oracle.

    The OEM DB Home page only shows ASL at a point in time, which is limited. Click on the Performance page tab to get a timeline view.

    In the middle of the page is Average Session Load, or Average Active Sessions.

    Page elements: Maximum CPU line, ADDM report (folder with checkmark), Run ADDM Now, Run ASH Report, Top Activity, CPU Used, Wait Classes.

        select substr(name,0,25) event, substr(wait_class,0,25) class
          from v$event_name
         where wait_class != 'Other'
           and wait_class != 'Idle'
           and wait_class != 'System I/O'
         order by wait_class
        /

    EVENT                     CLASS
    ------------------------- --------------
    buffer pool resize        Administrative
    switch logfile command    Administrative
    index (re)build online st Administrative
    index (re)build online cl Administrative
    index (re)build online me Administrative

    enq: TM - contention      Application
    enq: TX - row lock conten Application
    SQL*Net break/reset to cl Application
    SQL*Net break/reset to db Application
    enq: UL - contention      Application

    log file sync             Commit

    enq: TX - index contentio Concurrency
    latch: row cache objects  Concurrency
    row cache lock            Concurrency
    cursor: mutex X           Concurrency
    cursor: mutex S           Concurrency
    cursor: pin S wait on X   Concurrency
    latch: shared pool        Concurrency
    latch: library cache      Concurrency
    latch: library cache lock Concurrency
    latch: library cache pin  Concurrency
    library cache pin         Concurrency
    library cache lock        Concurrency
    library cache load lock   Concurrency
    pipe put                  Concurrency
    os thread startup         Concurrency
    latch: cache buffers chai Concurrency
    buffer busy waits         Concurrency

    sort segment request      Configuration
    enq: TX - allocate ITL en Configuration
    enq: SQ - contention      Configuration
    free buffer waits         Configuration
    write complete waits      Configuration
    latch: redo writing       Configuration
    latch: redo copy          Configuration
    log buffer space          Configuration
    log file switch (checkpoi Configuration
    log file switch (private  Configuration
    log file switch (archivin Configuration
    log file switch completio Configuration
    enq: ST - contention      Configuration
    enq: HW - contention      Configuration
    enq: SS - contention      Configuration
    undo segment extension    Configuration
    undo segment tx slot      Configuration

    SQL*Net more data from db Network
    SQL*Net message to client Network
    SQL*Net message to dblink Network
    SQL*Net more data to clie Network
    SQL*Net more data to dbli Network
    SQL*Net more data from cl Network

    Datapump dump file I/O    User I/O
    BFILE read                User I/O
    local write wait          User I/O
    buffer read retry         User I/O
    read by other session     User I/O
    db file sequential read   User I/O
    db file scattered read    User I/O
    db file single write      User I/O
    db file parallel read     User I/O
    direct path read          User I/O
    direct path read temp     User I/O
    direct path write         User I/O
    direct path write temp    User I/O

    Once you get to know a system's profile, it will be easy to see aberrations.