Upload
nathen-threlfall
View
241
Download
0
Tags:
Embed Size (px)
Citation preview
MICROSOFT SQL SERVERDATABASE ENGINE I/O
by Bob Dorr, Microsoft SQL Server Principle Escalation Engineer, 1994 – Present
Built: Jan 2008
Areas Covered
Write Ahead Logging (WAL) Protocol Synchronous vs Asynchronous I/O Scatter / Gather I/O Sector alignment, Block Alignment Latching and a page: A read walk-through SQL Server I/O Sizes Data cache maintenance PAE and AWE Read Ahead User Mode and Kernel Mode (SYSTRAP) Sparse Files and Copy On Write (COW)
Pages Locked Pages Scribbler(s) and Bit flips Page Protection and Constant Pages Checksum vs Torn Stale Read Stalled I/O
WAL Protocol
Write Ahead Logging ACID (Durability Property) Log records secured
before data Hardened / Stable Media Log contains parity bit
• Commit • Rollback• Trigger Snapshot
Synchronous vs Asynchronous I/O
Sync: Wait for Completion Async: Post and Continue
Overlapped Event Completion Port
SQL Server 98% Async Usage
Overlapped and HasOverlappedIoCompleted Network Layers Use Completion Port
Backup/Restore Use Sync – Sequential Patterns
• dm_io_pending_io_requests• Overlapped Structure• Async Processing ~= CPU• Package vs Phone
Scatter / Gather I/O
Consolidates or Distributes APIs
ReadFileScatter WriteFileGather
Increases Efficiency Used by SQL I/O Paths Used by Windows Page File
• Old Design: 6.x Sorting• AWE Availability • WriteMultiple• # of 8K Pages• Forward and
Backward• Buffer Pool Ramp-up
Disk
Memory
GatherScatter
Sector AlignmentBlock Alignment
Sector: Log Writes Block: Performance Avoid Crossovers DiskPart/DiskPar Utilities Discuss with your Vendor
• Double Touch• Rewrites• Defragment• 4K Sectors
Alignment: http://support.microsoft.com/kb/929491To verify that an existing partition is aligned, divide the size of the stripe unit by the starting offset of the RAID disk group. Use the following syntax: ((Partition offset) * (Disk sector size)) / (Stripe unit size)
Example of alignment calculations in bytes for a 256-KB stripe unit size: (63 * 512) / 262144 = 0.123046875(64 * 512) / 262144 = 0.125(128 * 512) / 262144 = 0.25(256 * 512) / 262144 = 0.5(512 * 512) / 262144 = 1
These examples shows that the partition is not aligned correctly for a 256-KB stripe unit size until the partition is created by using an offset of 512 sectors (512 bytes per sector).
Latch
Multiple Readers (SH) One Writer (EX) Protects In-Memory Data Page
Latch = Physical Protection Lock = Logical Protection
User Mode UMS/SQLOS Aware Optimized FIFO Ordering
• Flushed & Rollback• Latch Timeout• Sub-latch
BUF Array
Memory (Data Pages)
BUFStatusLatchDatabase*PageIdHash *…
Reading A Page
Get Free Buffer for Read Acquire Exclusive (EX) Latch Is already in-memory/hashed? Add Entry to Page Hash Post and Record Asynchronous Read … Continue Processing …. Check Status (Scheduler Switch) Complete: Validate I/O and Release
Latch
• Page Audits• Read retry • Stalled I/O Warnings • Error raised at Acquire• Shared (SH) waiters• PAGE_IO* vs PAGE* Latch• Writing A Page
kernel transition – Stuck I/O?ntdll!ZwWriteFile+0xa kernel32!WriteFile+0xf6 sqlservr!DiskWriteAsync+0xee …
0:000> uf ZwWriteFilemov r10,rcxmov eax,5Syscall Kernel Transitionret
Myth: Single Worker Per FileTruth: Each Worker Issues I/O
Worker #1
Vol #1
Vol #2
Serial Planselect * from dbTest.dbo.tblTestinsert into dbTest.dbo.tblTest
dbTest.MDF dbTest.NDF dbTest.LDF
Worker #2
Parallel Planselect * from dbTest.dbo.tblTestinsert into dbTest.dbo.tblTest
Create Database Workers Assigned by Volume ID
Primary = dbTest.MDFSecondary = dbTest.NDFLog = dbTest.LDF
Worker #3
Worker #4
Worker #5
Data Cache Maintenance
Memory Pressure: LazyWriter Per NUMA Node Time Of Last Access (TLA)
Recovery Interval: Checkpoint Queue I/O Targets .LDF Usage Triggers Alternate Triggers (Backup, Restore, …) Scatter/Gather Usage (WriteMultiple)
• Checkpoint Assignments• By Ordinal Sweep• Stalled I/O – LW #0• I/O Queue Depth > 2
PAE and AWE
Physical Address Extensions /PAE in Boot.ini Boots Kernel with 36 bit addressing Physical Memory > 4GB Virtual Address Unchanged (/2gb or /3GB) Automatic for Hot Add Memory Computers
Address Windows Extension Windows APIs (AllocateUserPhysicalPages) Physical Memory Allocations Un/Mapped in or out of Virtual Address Range
• Data Pages-Only• Locked Pages• Windows Paging•Windows 2000 Bugs
•32 Bit Address = 4294967295 (0xFFFFFFFF) 4GB• Interlocked Instruction
lock xadd dword ptr [ecx],eax
•36 Bit Address = 68719476735 (0xFFFFFFFFF) 64GB• Multiple Instructions
Read Ahead
128 Pages Standard SKU 1024 Pages Enterprise
SKU Uses ReadFileScatter Plan Based Decisions Power of Asynchronous
I/O• Read Over Write • Ramp-up
Sparse Files – Copy On Write
Usage Online DBCC Snapshot Databases
Buffer Pool: PrepareToDirty
File Control Block (FCB) Chaining
• Sparse Allocation• FCB Tracking• Windows Limits • New Page Allocations
Advanced Protection
What is a Scribbler? Data Page Audits
None Torn Bits Checksum
Log Block Checksum Constant Page Backup with Checksum
• DBCC Page Audit• Stale Read Check• SQLIOSim
REFERENCES
Overview
SQL Server Always Onhttp://www.microsoft.com/sql/alwayson
SQL Server I/O Basics Chapter 1http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlIObasics.mspx
SQL Server I/O Basics Chapter 2http://www.microsoft.com/technet/prodtechnol/sql/2005/iobasics.mspx
Fundamentals and Requirements
KB230785 - SQL Server 7.0, SQL Server 2000 and SQL Server 2005 logging and data storage algorithms extend data reliability
KB917047 - Microsoft SQL Server I/O subsystem requirements for the tempdb database
KB231347 - SQL Server databases not supported on compressed volumes (except 2005 read only files)
Subsystems
KB917043 - Key factors to consider when evaluating third-party file cache systems with SQL Server
KB234656- Using disk drive caching with SQL Server KB46091-
Using hard disk controller caching with SQL Server KB86903 -
Description of caching disk controls in SQL Server KB304261-
Description of support for network database files in SQL Server
KB910716 (in progress) - Support for third-party Remote Mirroring solutions used with SQL Server 2000 and 2005
KB833770 - Support for SQL Server 2000 on iSCSI technology components (applies to SQL Server 2005)
Design and Configuration
White paper - Physical Database Layout and Design KB298402 -
Understanding How to Set the SQL Server I/O Affinity Option KB78363 - When Dirty Cache Pages are Flushed to Disk White paper - Database Mirroring in SQL Server 2005 White paper -
Database Mirroring Best Practices and Performance Considerations
KB910378 - Scalable shared database are supported by SQL Server 2005
MSDN article - Read-Only Filegroups KB156932 -
Asynchronous Disk I/O Appears as Synchronous on Windows NT, Windows 2000, and Windows XP
Diagnostics
KB826433 - Additional SQL Server Diagnostics Added to Detect Unreported I/O Problems
KB897284 - SQL Server 2000 SP4 diagnostics help detect stalled and stuck I/O operations (applies to SQL Server 2005)
KB828339 - Error message 823 may indicate hardware problems or system problems in SQL Server
KB167711 - Understanding Bufwait and Writelog Timeout Messages
KB815436 - Use Trace Flag 3505 to Control SQL Server Checkpoint Behavior
KB906121 - Checkpoint resumes behavior that it exhibited before you installed SQL Server 2000 SP3 when you enable trace flag 828
WebCast- Data Recovery in SQL Server 2005
Certification Policy
KB913945- Microsoft does not certify that third-party products will work with Microsoft SQL Server
KB841696 - Overview of the Microsoft third-party storage software solutions support policy
KB231619 - How to use the SQLIOStress utility to stress a disk subsystem such as SQL Server
Utilities
Download - SQLIO Disk Subsystem Benchmark Tool
Download - SQLIOStress utility to stress disk subsystem (applies to SQL Server 7.0, 2000, and 2005 - replaced with SQLIOSim and SQL Server 2008 installed in BINN)
Blog Content SQL Server Urban Legends Discussed
http://blogs.msdn.com/psssql/archive/2007/02/21/sql-server-urban-legends-discussed.aspx How It Works: SQL Server Checkpoint (FlushCache) Outstanding I/O Target
http://blogs.msdn.com/psssql/archive/2008/04/11/how-it-works-sql-server-checkpoint-flushcache-outstanding-i-o-target.aspx How It Works: SQL Server Page Allocations
http://blogs.msdn.com/psssql/archive/2008/04/08/how-it-works-sql-server-page-allocations.aspx How It Works: Shapshot Database (Replica) Dirty Page Copy Behavior (NewPage)
http://blogs.msdn.com/psssql/archive/2008/03/24/how-it-works-shapshot-database-replica-dirty-page-copy-behavior-newpage.aspx
How It Works: SQL Server 2005 I/O Affinity and NUMA Don't Always Mixhttp://blogs.msdn.com/psssql/archive/2008/03/18/how-it-works-sql-server-2005-i-o-affinity-and-numa-don-t-always-mix.aspx
How It Works: Debugging SQL Server Stalled or Stuck I/O Problems - Root Causehttp://blogs.msdn.com/psssql/archive/2008/03/03/how-it-works-debugging-sql-server-stalled-or-stuck-i-o-problems-root-cause.aspx
How It Works: SQL Server 2005 Database Snapshots (Replica)http://blogs.msdn.com/psssql/archive/2008/02/07/how-it-works-sql-server-2005-database-snapshots-replica.aspx
How It Works: File Stream the Before and After Image of a Filehttp://blogs.msdn.com/psssql/archive/2008/01/15/how-it-works-file-stream-the-before-and-after-image-of-a-file.aspx
Using SQLIOSim to Diagnose SQL Server Reported Checksum (Error 824/823) Failureshttp://blogs.msdn.com/psssql/archive/2008/12/19/using-sqliosim-to-diagnose-sql-server-reported-checksum-error-824-823-failures.aspx
How to use the SQLIOSim utility to simulate SQL Server activity on a disk subsystem http://support.microsoft.com/kb/231619
Should I run SQLIOSim? - An e-mail follow-up from SQL PASS 2008 http://blogs.msdn.com/psssql/archive/2008/11/24/should-i-run-sqliosim-an-e-mail-follow-up-from-sql-pass-2008.aspx
What do I need to know about SQL Server database engine I/O? http://blogs.msdn.com/psssql/archive/2006/11/27/what-do-i-need-to-know-about-sql-server-database-engine-i-o.aspx
SQLIOSim is "NOT" an I/O Performance Tuning Tool http://blogs.msdn.com/psssql/archive/2008/04/05/sqliosim-is-not-an-i-o-performance-tuning-tool.aspx
How It Works: SQLIOSim - Running Average, Target Duration, Discarded Buffers ... http://blogs.msdn.com/psssql/archive/2008/11/12/how-it-works-sqliosim-running-average-target-duration-discarded-buffers.aspx
How It Works: SQLIOSim [Audit Users] and .INI Control File Sections with User Count Options http://blogs.msdn.com/psssql/archive/2008/08/19/how-it-works-sqliosim-audit-users-and-ini-control-file-sections-with-user-count-options.aspx
Understanding SQLIOSIM Output http://sqlblog.com/blogs/kevin_kline/archive/2007/06/28/understanding-sqliosim-output.aspx
Additional Learning Resources
Inside SQL Server 7.0 and Inside SQL Server 2000Written by Kalen Delaney – her husband is Paul Randle who wrote the core dbcc checks for SQL 7.0, 2000 and 2005
The Guru’s Guide to SQL Server Architecture and Internals – ISBN 0-201-70047-6
Written by Ken after he joined Microsoft SQL Server SupportMany chapters reviewed by developers and folks like myself
SQL Server 2005 Practical Troubleshooting ISBN 0-321-44774-3 – Ken Henderson
Authors of this book were key developers or support team membersCesar – QP developer and leader of the QP RedZone with Keithelm and JackliSameert – Developer of UMS and SQLOS SchedulerSanteriv – Developer of the lock managerSlavao – Developer of the SOS memory managers and engine architectWei Xiao – Engine developerBart Duncan – long time SQL EE and now developer of the Microsoft Data
Warehouse – performance focusedBob Ward – SQL Server Support Senior EE
Advanced Windows Debugging – ISBN 0-321-37446Written by Microsoft developers – excellent resource
Applications for Windows – Jeffrey RichterGreat details about Windows basics