MONITORING SQL SERVER
Key Performance Metrics and How to Interpret Them
Tuning blog: http://www.sqlperformance.com/
E-mail ebooks@sqlsentry.com for free copies of our $10 e-books:
YOUR PRESENTER
• John Q Martin
  o Sales Engineer for SQL Sentry
  o Worked with SQL Server for ~10 years
  o Consultant, SQL DBA, Dev & BI Developer
  o Former Microsoft Premier Field Engineer
• Contact Information
  o Email: Jmartin@SQLSentry.com
  o Blog: http://blogs.sqlsentry.com/author/JohnMartin/
  o Twitter: @SQLDiplomat
  o LinkedIn: https://uk.linkedin.com/in/johnqmartin
AGENDA
• CPU Monitoring
• Memory Monitoring
• Storage Monitoring
• SQL Server Monitoring
  o Monitoring Counters
  o Wait Stats
  o DMVs
  o Events
MONITORING APPROACHES
MONITORING FUNDAMENTALS
• Monitor over time and keep the captured data, as it will be invaluable
  o Don’t just grab everything “just in case”
• Use historical data to create baselines
  o Baselines make it possible to spot when regular events or time periods are ‘out of band’
• Historical monitoring data can be used for trend analysis and capacity planning.
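
The baseline idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the hour-of-day grouping, and the three-standard-deviation tolerance are hypothetical choices, not something the slides prescribe, and the sample values are made up.

```python
from statistics import mean, stdev

def build_baseline(history):
    """Group historical samples by hour of day and summarise each group.

    history: iterable of (hour_of_day, value) pairs, e.g. (9, 40.0).
    Returns {hour: (mean, standard deviation)}.
    """
    by_hour = {}
    for hour, value in history:
        by_hour.setdefault(hour, []).append(value)
    return {
        hour: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for hour, values in by_hour.items()
    }

def is_out_of_band(baseline, hour, value, tolerance=3.0):
    """True when value is more than `tolerance` standard deviations from the mean."""
    if hour not in baseline:
        return False  # no history for this hour, so nothing to compare against
    avg, spread = baseline[hour]
    return abs(value - avg) > tolerance * spread

# Made-up CPU % samples: busy at 09:00, quiet at 02:00.
history = [(9, 40.0), (9, 42.0), (9, 38.0), (9, 41.0),
           (2, 5.0), (2, 6.0), (2, 4.0)]
baseline = build_baseline(history)
print(is_out_of_band(baseline, 9, 43.0))  # within the usual 09:00 range
print(is_out_of_band(baseline, 2, 60.0))  # far above the usual 02:00 range
```

Grouping by hour is one simple way to compare like with like. A real baseline would more likely be keyed on the business events and cycles that matter for the system being monitored.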
CPU METRICS
• Process
  o % Processor Time
  o % Privileged Time
• Processor
  o % Processor Time
  o % Privileged Time
  o % DPC Time
CPU METRICS
• It is important to monitor each CPU as well as the total CPU usage.
  o Helps identify potential MAXDOP issues.
  o Lets you see whether there are possible misconfigurations in the system outside of SQL Server
• Excessive DPC and Privileged time can indicate issues elsewhere in the system, such as networking or storage.
• Monitoring the SQL Server processes lets you see how much CPU time is spent on SQL Server
  o Depending on storage, you can capture more process instances
CPU METRICS
[Figures: Real Time Monitoring; Tracking CPU over time]
MEMORY METRICS
• NUMA Node Memory
  o Available MBytes
  o Total MBytes
• Memory
  o Available MBytes
  o Page Faults/sec
  o Page Reads/sec
  o Page Writes/sec
MEMORY MONITORING
• Understand what volumes of data are being read into and out of memory.
• Tracking memory use by NUMA node can have benefits, depending on the configuration of the system.
  o Differences in the amount of memory allocated to each NUMA node can affect processing in the CPUs within each node.
• Ensuring that there is sufficient free memory is important for maintaining a stable system.
MEMORY MONITORING
• Track memory by node
• Aggregate to overall
• What else is in use?
[Figure labels: Node 0, Node 1]
STORAGE METRICS
• Logical Disk
  o Same as Physical Disk
  o Depends on your disk configuration; if there is a 1:1 mapping between physical and logical disks, use the Physical Disk metrics.
• Physical Disk
  o Disk Read Bytes/sec
  o Disk Write Bytes/sec
  o Disk Reads/sec
  o Disk Writes/sec
  o Split IO/sec
  o Current Disk Queue Length
STORAGE MONITORING
• Key monitoring elements for storage
  o IOPS
  o Throughput
  o Latency
• Monitor the amount of space used
  o The sample rate does not need to be frequent; it can be minutes or hours rather than seconds.
• Understand the configuration of the disks to decide whether you need Logical Disk counters, Physical Disk counters, or both.
SQL SERVER STORAGE DMVS
• sys.dm_io_virtual_file_stats()
  o Gives detail on the reads and writes against each database file
  o Allows you to derive the read/write balance for data files
  o IO operations and bytes written
• sys.dm_io_pending_io_requests
  o Shows outstanding file IOs for SQL Server database files.
• sys.dm_db_index_physical_stats()
  o Gathers index fragmentation details
  o Can cause lots of IO; use sparingly on large databases
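
The values returned by sys.dm_io_virtual_file_stats() are cumulative, so the read/write balance and latency for a period come from the difference between two snapshots. The sketch below shows that arithmetic in Python. The dictionary keys are column names from the DMV, but the function name, the snapshot structure, and the numbers are assumptions made for illustration.

```python
def file_io_summary(earlier, later):
    """Summarise I/O for one database file between two cumulative snapshots."""
    reads = later["num_of_reads"] - earlier["num_of_reads"]
    writes = later["num_of_writes"] - earlier["num_of_writes"]
    read_stall_ms = later["io_stall_read_ms"] - earlier["io_stall_read_ms"]
    write_stall_ms = later["io_stall_write_ms"] - earlier["io_stall_write_ms"]
    bytes_written = later["num_of_bytes_written"] - earlier["num_of_bytes_written"]
    total_ops = reads + writes
    return {
        "reads": reads,
        "writes": writes,
        # Share of operations that were reads (the read/write balance).
        "read_pct": 100.0 * reads / total_ops if total_ops else 0.0,
        # Average stall per operation approximates latency for the interval.
        "avg_read_ms": read_stall_ms / reads if reads else 0.0,
        "avg_write_ms": write_stall_ms / writes if writes else 0.0,
        "bytes_written": bytes_written,
    }

# Made-up snapshots of one data file, taken some time apart.
snap_1 = {"num_of_reads": 1000, "num_of_writes": 500,
          "io_stall_read_ms": 8000, "io_stall_write_ms": 1500,
          "num_of_bytes_written": 4_000_000}
snap_2 = {"num_of_reads": 1600, "num_of_writes": 700,
          "io_stall_read_ms": 12800, "io_stall_write_ms": 1900,
          "num_of_bytes_written": 5_638_400}

summary = file_io_summary(snap_1, snap_2)
print(summary)
```

Because the counters reset when the instance restarts, a later snapshot with smaller values than the earlier one should be discarded rather than differenced.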
SQL SERVER METRICS
• Buffer Manager
  o Buffer Cache Hit Ratio
  o Checkpoint pages/sec
  o Page Reads/sec
  o Page Writes/sec
• Access Methods
  o Forwarded Records/sec
  o FreeSpace Scans/sec
  o Page Splits/sec
  o Workfiles Created/sec
  o Worktables Created/sec
SQL SERVER METRICS
• Buffer Node
  o Page Life Expectancy
  o Local node page lookups/sec
  o Remote node page lookups/sec
• Databases
  o Active Transactions
  o Log Bytes Flushed/sec
  o Log Flush Wait Time
  o Log Flush Waits/sec
  o Log Flushes/sec
  o Percent Log Used
PAGE LIFE EXPECTANCY
• The PLE value on its own is meaningless. Discuss.
• The value needs to be given context
  o How large is the buffer pool?
  o What is my IO sub-system capability?
  o What % of the IO channel is used to maintain the PLE value?
• Investigate changes
  o What happened when PLE suddenly dropped?
• Monitor at the Buffer Node level
  o The global PLE value will not equal the arithmetic mean of the node values.
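
One way to put a PLE value in context is to estimate how much read throughput is needed to sustain it. The Python sketch below is a rough approximation: it assumes the whole buffer pool is replaced once per PLE period, and the function names, buffer pool size, and IO capacity figure are hypothetical.

```python
def buffer_pool_churn_mb_per_sec(buffer_pool_mb, ple_seconds):
    """Approximate read rate needed to turn over the buffer pool once per PLE period."""
    if ple_seconds <= 0:
        raise ValueError("ple_seconds must be positive")
    return buffer_pool_mb / ple_seconds

def io_channel_share_pct(buffer_pool_mb, ple_seconds, io_capacity_mb_per_sec):
    """Share of the IO sub-system's throughput consumed by that churn."""
    churn = buffer_pool_churn_mb_per_sec(buffer_pool_mb, ple_seconds)
    return 100.0 * churn / io_capacity_mb_per_sec

# Assumed figures: 64 GB buffer pool, IO sub-system good for 400 MB/s.
BUFFER_POOL_MB = 65536
IO_CAPACITY_MB_PER_SEC = 400

for ple in (4096, 512):
    churn = buffer_pool_churn_mb_per_sec(BUFFER_POOL_MB, ple)
    share = io_channel_share_pct(BUFFER_POOL_MB, ple, IO_CAPACITY_MB_PER_SEC)
    print(f"PLE {ple}s: ~{churn:.0f} MB/s churn, ~{share:.0f}% of IO capacity")
```

With these assumed figures, a PLE of 4096 seconds works out to about 16 MB/s (4% of capacity), and a PLE of 512 seconds to about 128 MB/s (32%). The same PLE value would mean something quite different on a smaller buffer pool or a slower IO sub-system.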
PAGE LIFE EXPECTANCY
• Look for changes and see what else was going on
  o Large batch job/report
  o Someone runs DBCC DROPCLEANBUFFERS
• Frequent monitoring is required, as changes can happen fast
  o Seconds to minutes for the monitoring interval.
What happened?
SQL SERVER WAITS
• ASYNC_NETWORK_IO
• LCK_*
• PAGELATCH_*
• RESOURCE_SEMAPHORE
• WRITELOG
• LOGBUFFER
• CXPACKET
• THREADPOOL
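
Wait statistics are cumulative since the instance started, so the waits that matter for a given period are found by differencing two snapshots. The sketch below assumes each snapshot is a mapping of wait type to accumulated wait time in milliseconds, as could be captured from the wait_type and wait_time_ms columns of sys.dm_os_wait_stats. The slides do not name that DMV, so treat it, the function name, and the sample numbers as assumptions.

```python
def top_waits(earlier, later, n=3):
    """Rank wait types by wait time accumulated between two snapshots.

    Returns a list of (wait_type, delta_ms, percent_of_total) tuples.
    """
    deltas = {}
    for wait_type, wait_ms in later.items():
        delta = wait_ms - earlier.get(wait_type, 0)
        if delta > 0:  # ignore wait types with no new wait time
            deltas[wait_type] = delta
    total = sum(deltas.values())
    # Largest delta first; ties broken by name so the order is stable.
    ranked = sorted(deltas.items(), key=lambda item: (-item[1], item[0]))
    return [(name, ms, 100.0 * ms / total) for name, ms in ranked[:n]]

# Made-up cumulative wait times (ms) from two snapshots.
snap_1 = {"WRITELOG": 1000, "CXPACKET": 5000, "ASYNC_NETWORK_IO": 200}
snap_2 = {"WRITELOG": 4000, "CXPACKET": 6000, "ASYNC_NETWORK_IO": 200,
          "THREADPOOL": 1000}

for wait_type, delta_ms, pct in top_waits(snap_1, snap_2):
    print(f"{wait_type}: {delta_ms} ms ({pct:.0f}%)")
```

In this made-up data CXPACKET has the largest cumulative total, but WRITELOG accounts for most of the wait time in the interval, which is the kind of difference a snapshot comparison is meant to expose.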
SQL SERVER EVENTS
• Monitor SQL Agent for Failed Jobs
• Monitor for 823, 824, 825 errors
  o Can indicate storage or corruption issues
  o Make use of Agent Alerts or tools to scan the agent log
• Monitor and manage the dbo.suspect_pages table in msdb
  o SQL Server tracks incidences of corrupt pages here
  o Limited to 1,000 rows, so it needs to be managed if anything is recorded here
SUMMARY
• Identify the base metrics that you should be capturing, and a capture frequency
  o Understand why you are collecting them and how to use them effectively
• Identify specific business events and cycles, and create baselines to track performance over multiple iterations and over time
• Look for correlation between performance metrics
  o Make use of the CORREL function in Excel if needed
• Track changes to the environment, code, applications, etc. This will help supplement the monitoring data.
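
Excel's CORREL function returns the Pearson correlation coefficient of two series, and the same check can be scripted. The Python sketch below is a minimal version of that calculation. The function name and the sample series are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(var_x * var_y)

# Made-up samples taken at the same five points in time.
cpu_pct = [20, 35, 50, 65, 80]
batch_requests = [100, 175, 250, 325, 400]
page_life_expectancy = [900, 700, 500, 300, 100]

print(round(pearson(cpu_pct, batch_requests), 6))        # moves with CPU
print(round(pearson(cpu_pct, page_life_expectancy), 6))  # moves against CPU
```

A coefficient near 1 or -1 only says that two metrics move together. It does not say which one drives the other, so it is a prompt for investigation rather than a diagnosis.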
QUESTIONS
THANK YOU!
• Slides will be available at http://blogs.sqlsentry.com
• More information at:
  o SQLSkills, et al
• E-mail ebooks@sqlsentry.com for free copies of our e-books:
  o Just tell them where you met me
• My contact info for other questions:
  o Email: Jmartin@SQLSentry.com
  o Twitter: @SQLDiplomat