Indy PASS: Writing Efficient Queries – Part 1 – Indexing

Writing Efficient Queries – Part 1

Using SQL Server Internals to Improve Data Access

Eddie Wuerch, MCT, MCITP

Principal, Data Management, ExactTarget

[email protected]

Disk I/O – Key Points

Disk I/O (reads and writes) is usually the slowest component of a system

◦Compare: Memory and CPU speeds are reported in GHz: billions of actions per second

◦Disk I/O rates are simply measured in IOPS: Input/output Operations Per Second

◦Fast disks on mostly sequential workloads can get 100-150 IOPS

Because of disk access rates, our first tuning goal is to reduce overall I/O

…so let’s look at those data pages, then examine how to process fewer of them

Disk I/O – Key Points - II

SQL Server is not a black box

Most data is in structured storage

There are many ways to access the data; some ways are significantly faster (and cause less impact on other processes) than others

Understanding SQL Server data storage internals will guide you to the faster ways

Data Storage in SQL Server

The base unit of SQL Server data storage is the page

All data – system and user – is stored in pages

Each page is 8KB (8192 bytes)

Pages are allocated from files in 64KB (8-page) extents
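To make the page and extent sizes concrete, here is a minimal Python sketch of the arithmetic. The 1 GB data size and the helper name `pages_and_extents` are illustrative assumptions, and the sketch treats every byte of a page as usable, which ignores the page header. The 150 IOPS figure is simply the upper end of the range quoted on the Disk I/O slide.

```python
PAGE_BYTES = 8192          # one page is 8KB
PAGES_PER_EXTENT = 8       # one extent is 8 pages (64KB)


def pages_and_extents(data_bytes):
    """Return (pages, extents) needed to hold data_bytes, rounding up."""
    pages = -(-data_bytes // PAGE_BYTES)          # ceiling division
    extents = -(-pages // PAGES_PER_EXTENT)
    return pages, extents


# Hypothetical example: 1 GB of data.
pages, extents = pages_and_extents(1024 ** 3)
print(pages, "pages in", extents, "extents")

# Rough read time at an assumed 150 I/O operations per second.
IOPS = 150
print(round(pages / IOPS), "seconds if every page is its own I/O")
print(round(extents / IOPS), "seconds if every extent is read as one I/O")
```

Even as a rough estimate, the gap between the two timings suggests why reading whole extents, and touching fewer pages overall, matters so much.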

Page Processing

Pages are read from disk and processed in memory as an entire 8KB unit

Extents are often read in from disk as a single block to reduce I/O

All data is processed in memory; if a page is not already in memory, processing is put on hold while it is read from disk

Page Types

System Page Types

◦Space Management: File Header, PFS, GAM, SGAM

◦Change Management: DCM, BCM

Data Page Types

◦In-row data

◦Index

◦LOB data and Row-overflow data

All pages are 8KB

In-Row Data Pages

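[Diagram: an 8KB page with the page header at the top, then Row 1… Row 2… Row 3… Row 4…, and the row-offset entries 4… 3… 2… 1… at the end of the page]

Page Header

◦96 bytes

Row Data

◦Rows written serially

◦Starts at the 97th byte (offset 96)

Row-offset table

◦Starts at end of page, moves backwards

◦Records first-byte offset of each row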
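The layout described above can be modeled in a few lines of Python. This is only a sketch under stated assumptions: rows are treated as opaque byte strings (real rows carry their own internal structure), each row-offset entry is stored as a 2-byte little-endian integer, and the `Page` class and its method names are invented for illustration.

```python
import struct

PAGE_SIZE = 8192     # every page is 8KB
HEADER_SIZE = 96     # the page header occupies bytes 0-95
SLOT_SIZE = 2        # assumed: each row-offset entry is a 2-byte integer


class Page:
    def __init__(self):
        self.buf = bytearray(PAGE_SIZE)
        self.free_start = HEADER_SIZE   # first row lands on the 97th byte
        self.row_count = 0

    def free_bytes(self):
        # Gap between the end of the row data and the start of the offset table.
        return PAGE_SIZE - self.row_count * SLOT_SIZE - self.free_start

    def insert(self, row):
        """Append a row serially; return False if the page is full."""
        if len(row) + SLOT_SIZE > self.free_bytes():
            return False
        offset = self.free_start
        self.buf[offset:offset + len(row)] = row
        self.free_start += len(row)
        # The offset table grows backwards from the last byte of the page.
        slot_pos = PAGE_SIZE - (self.row_count + 1) * SLOT_SIZE
        struct.pack_into("<H", self.buf, slot_pos, offset)
        self.row_count += 1
        return True

    def row_offset(self, slot):
        """Read the first-byte offset recorded for the given row slot."""
        slot_pos = PAGE_SIZE - (slot + 1) * SLOT_SIZE
        return struct.unpack_from("<H", self.buf, slot_pos)[0]


page = Page()
for row in (b"Adams", b"Baker", b"Chen"):
    page.insert(row)
print([page.row_offset(i) for i in range(page.row_count)])  # [96, 101, 106]
```

Rows fill the page from the front while offsets fill it from the back, so the page is full when the two regions meet.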

Disk Access Methods

Think of a phone book, with each entry as a record

Ordered by Last Name, First Name, MI

Two ways to find a record:

◦Use Last Name, First Name to find a number (Index Seek)

◦Look through the entire phone book, one page at a time, scanning each row for data (Table Scan)
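The difference between the two access methods can be sketched in Python by counting how many pages each one reads. The phone book contents, the three-page layout, and the function names `table_scan` and `index_seek` are all made up for illustration; the "index" here is just the first key on each page, which is far simpler than a real index.

```python
from bisect import bisect_right

# Hypothetical phone book: each inner list is one page, sorted by (last, first).
pages = [
    [("Adams", "Ann", "555-0101"), ("Baker", "Bob", "555-0102")],
    [("Chen", "Cy", "555-0103"), ("Diaz", "Dee", "555-0104")],
    [("Evans", "Eli", "555-0105"), ("Ford", "Fay", "555-0106")],
]
# The "index": the first (last, first) key found on each page.
first_keys = [page[0][:2] for page in pages]


def table_scan(last, first):
    """Read every page and test every row; return (phone, pages read)."""
    pages_read, found = 0, None
    for page in pages:
        pages_read += 1
        for row in page:
            if row[:2] == (last, first):
                found = row[2]
    return found, pages_read


def index_seek(last, first):
    """Use the sorted keys to go straight to the one page that can hold the row."""
    i = bisect_right(first_keys, (last, first)) - 1
    if i < 0:
        return None, 0          # name sorts before the first page
    for row in pages[i]:
        if row[:2] == (last, first):
            return row[2], 1
    return None, 1


print(table_scan("Diaz", "Dee"))   # reads all 3 pages
print(index_seek("Diaz", "Dee"))   # reads 1 page
```

With three pages the gap is small; the scan cost grows with the size of the book while the seek cost barely moves.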

Index Types

Clustered Index

◦Represents the table itself

◦Index specifies the physical ordering of that data

◦Only 1 allowed per table

◦May be unique, does not have to be the primary key

Non-clustered index

◦Additional index of data

◦Over 200 allowed

◦May be unique

The phone book example

If a table has a clustered index, the pointer to each row in the table is the clustered index key

The leaf level of the nonclustered index contains the nonclustered keys and the clustered index keys

Nonclustered indexes may also include additional non-indexed columns, which are stored at the leaf level of the index

Index Pages

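[Diagram: a three-level index tree with pages 4-6 at one level, pages 22-27 below them, and pages 274-283 at the bottom]

First level:

◦A-K : Page 4

◦L-U : Page 5

◦V-Z : Page 6

Second level:

◦A : Page 22

◦B : Page 23

◦C : Page 24

◦D….

Third level:

◦Baa : Page 276

◦Baba : Page 277

◦Base : Page 278

◦Ba…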
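The entries on this slide can be read as a lookup procedure: start at the top, pick the entry whose range covers the search value, and follow its page pointer down a level. Below is a minimal Python sketch of that walk. It assumes the first list sits on a root page, the second on page 4, and the third on page 23, which is an inference from the diagram rather than something the slide states, and it leaves out the entries the slide elides.

```python
from bisect import bisect_right

# Index pages from the slide. Which page holds each list is an assumption:
# the first list is treated as the root, the second as page 4, the third as page 23.
# Entries the slide elides ("D…", "Ba…") are left out rather than guessed.
index_pages = {
    "root": [("A", 4), ("L", 5), ("V", 6)],
    4: [("A", 22), ("B", 23), ("C", 24)],
    23: [("Baa", 276), ("Baba", 277), ("Base", 278)],
}


def seek(name):
    """Walk down from the root; return (page reached, index pages read).

    The walk stops at the first page this sketch does not model.
    """
    page_id, pages_read = "root", []
    while page_id in index_pages:
        entries = index_pages[page_id]
        pages_read.append(page_id)
        keys = [key for key, _ in entries]
        i = bisect_right(keys, name) - 1   # last entry whose key <= name
        if i < 0:
            return None, pages_read
        page_id = entries[i][1]
    return page_id, pages_read


print(seek("Baker"))   # (277, ['root', 4, 23])
```

"Baker" sorts after "Baba" and before "Base", so the seek lands on page 277 after reading only three index pages, however many pages sit at the bottom level.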

Index Lookups

Index Lookups - revisited

Nonclustered index

Clustered index (table)

Separate trip through the clustered index for each nonclustered (ncl) entry!
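The cost of those extra trips can be illustrated with a small Python model. Dictionaries stand in for the two indexes, the data is invented, and the function names are hypothetical. The second nonclustered index stores the phone number at its leaf level, in the way an included column would, so it can answer the query without touching the clustered index at all.

```python
# Clustered index (the table itself): clustered key -> full row. Data is made up.
clustered = {
    101: {"last": "Baker", "first": "Ann", "phone": "555-0101"},
    102: {"last": "Chen", "first": "Bo", "phone": "555-0102"},
    103: {"last": "Baker", "first": "Cy", "phone": "555-0103"},
    104: {"last": "Baker", "first": "Di", "phone": "555-0104"},
}

# Nonclustered index on last name: the leaf level holds the nonclustered key
# plus the clustered key of each matching row.
ncl_last = {}
for key, row in clustered.items():
    ncl_last.setdefault(row["last"], []).append(key)

# The same index with "phone" carried along at the leaf level.
ncl_last_covering = {}
for key, row in clustered.items():
    ncl_last_covering.setdefault(row["last"], []).append((key, row["phone"]))


def phones_with_lookups(last):
    """Seek the nonclustered index, then visit the clustered index once per entry."""
    trips, phones = 0, []
    for key in ncl_last.get(last, []):
        trips += 1                      # separate trip through the clustered index
        phones.append(clustered[key]["phone"])
    return phones, trips


def phones_covered(last):
    """The index already holds the phone number, so no clustered trips are needed."""
    return [phone for _, phone in ncl_last_covering.get(last, [])], 0


print(phones_with_lookups("Baker"))   # three phone numbers, 3 trips
print(phones_covered("Baker"))        # the same phone numbers, 0 trips
```

Both functions return the same phone numbers; only the number of trips through the clustered index differs, and that number grows with every matching row.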

Operational Join Types

Merge Joins

Hash Joins

Loop Joins

Join Type Comparison

Merge

◦One trip through each table

◦Requires sorted input on both sides, usually from indexes; works best when at least one side is unique

◦Usually the fastest join type

Hash

◦Works well for very large joins

◦Builds join data in memory, spilling to tempdb when it does not fit

Loop

◦When the other two can’t be used

◦One trip through one table

◦One trip through the other table for each entry in the first table

◦Generally the slowest of the three types
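The three strategies can be sketched in Python on small in-memory lists. These are textbook versions of the algorithms, not SQL Server's implementation: rows are tuples whose first element is the join key, the merge version assumes both inputs are already sorted and that keys on the left side are unique, and the sample data is invented.

```python
def loop_join(outer, inner):
    """One pass over outer; one full pass over inner for each outer row."""
    return [(o, i) for o in outer for i in inner if o[0] == i[0]]


def hash_join(build, probe):
    """Build a hash table on one input, then make a single pass over the other."""
    table = {}
    for b in build:
        table.setdefault(b[0], []).append(b)
    return [(b, p) for p in probe for b in table.get(p[0], [])]


def merge_join(left, right):
    """One pass over each input; both sorted on the key, left keys unique."""
    out, i = [], 0
    for r in right:
        # Advance the left cursor until it catches up with the right key.
        while i < len(left) and left[i][0] < r[0]:
            i += 1
        if i < len(left) and left[i][0] == r[0]:
            out.append((left[i], r))
    return out


# Hypothetical data: customers are unique on the key, orders are not.
customers = [(1, "Ann"), (2, "Bo"), (3, "Cy")]
orders = [(1, "o-10"), (1, "o-11"), (3, "o-12"), (4, "o-13")]

results = [
    loop_join(customers, orders),
    hash_join(customers, orders),
    merge_join(customers, orders),
]
print(results[0])
print(results[0] == results[1] == results[2])   # True
```

All three return the same rows. What differs is the work done: the loop join compares every pair, the hash join pays to build a table first, and the merge join walks each input once but only because the inputs arrived sorted.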

Join and Indexing Tips

When defining an index, if the data is unique, then declare the index as unique

Join on keys

Provide arguments in WHERE clauses to match available indexes

Cluster tables on range scans

Look for covering indexes

So How Do I Know?

SET STATISTICS IO ON

So How Do I Know?

sys.dm_db_index_usage_stats

◦user_seeks

◦user_scans

◦user_lookups

◦user_updates

sys.dm_db_missing_index_*

◦Not magic, has limitations

◦Many similar index entries with different INCLUDE columns may indicate a need to revisit the clustered index design

So How Do I Know?

Scan-indicating waits

◦Lots of PAGEIOLATCH_SH and PAGEIOLATCH_EX waits are generated by table scans that read from disk

◦CXPACKET waits – related to parallelism, often caused by scanning large tables (don’t reduce MAXDOP: fix the scan!)

◦Other processes with SOS_SCHEDULER_YIELD or high signal wait times may be mitigated by reducing CPU load of scans

So How Do I Know?

TempDB activity in instances without much use of temp tables or table variables

◦SELECT * FROM sys.dm_io_virtual_file_stats(DB_ID('tempdb'), NULL)

◦Must track over time, perform time-slice analysis

◦May indicate additional worktable sort and hash-match activity

◦Tracking this for all of your databases shows the amount of I/O your systems are performing, and whether the disk systems are keeping up
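Because the counters returned by sys.dm_io_virtual_file_stats are cumulative, a single reading says little; the useful number is the difference between two readings. Here is a minimal Python sketch of that time-slice calculation. The column names (`sample_ms`, `num_of_reads`, `num_of_writes`, `io_stall_read_ms`) are columns of that function, but the snapshot values are made up, the `time_slice` helper is hypothetical, and how the snapshots get collected is left out.

```python
def time_slice(earlier, later):
    """Subtract two cumulative snapshots to get the activity in between."""
    elapsed_s = (later["sample_ms"] - earlier["sample_ms"]) / 1000
    reads = later["num_of_reads"] - earlier["num_of_reads"]
    writes = later["num_of_writes"] - earlier["num_of_writes"]
    stall = later["io_stall_read_ms"] - earlier["io_stall_read_ms"]
    return {
        "reads_per_sec": reads / elapsed_s,
        "writes_per_sec": writes / elapsed_s,
        "avg_read_stall_ms": stall / reads if reads else 0.0,
    }


# Made-up snapshots of one tempdb file, taken 60 seconds apart.
snap_1 = {"sample_ms": 1_000_000, "num_of_reads": 50_000,
          "num_of_writes": 80_000, "io_stall_read_ms": 400_000}
snap_2 = {"sample_ms": 1_060_000, "num_of_reads": 56_000,
          "num_of_writes": 92_000, "io_stall_read_ms": 448_000}

print(time_slice(snap_1, snap_2))
# {'reads_per_sec': 100.0, 'writes_per_sec': 200.0, 'avg_read_stall_ms': 8.0}
```

Collecting a snapshot at a regular interval and storing the differences gives a per-interval history that can be lined up against slow periods.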

Resources

Microsoft White Papers

◦SQL Server 2000 I/O Basics (http://technet.microsoft.com/en-us/library/cc966500.aspx)

◦SQL Server I/O Basics, Chapter 2 (http://technet.microsoft.com/en-us/library/cc917726.aspx)

◦SQL Server Waits and Queues (download) (http://technet.microsoft.com/en-us/library/cc966413.aspx)

The Waits and Queues document is highly recommended for tuning or analyzing workloads

Resources

Inside SQL Server Book Series

◦SQL 2005

  The Storage Engine (Kalen Delaney)

  Query Tuning and Optimization (Delaney, et al.)

  T-SQL Querying (Ben-Gan, Kollar, Sarka)

◦SQL 2008

  Microsoft SQL Server 2008 Internals (Delaney, Randal, Tripp, Cunningham)

  T-SQL Querying (Ben-Gan, Kollar, Sarka)

Questions?

Email: [email protected]
