Storage Performance on SQL Server
Joe Chang
Coverage

Emphasis is on Line of Business DB
Different priorities for less critical apps
Performance
Fault-tolerance covered elsewhere
Overview

IO Performance Objectives
The Complete Storage Environment
Direct-Attach and SAN
Storage Components
Disk Performance
SQL Server IO Characteristics
Configuration Examples
SSD
Old Rules

Meet transaction throughput
Disk Performance Criteria
Read from Data, Write to Logs
Separate Data and Log files?
Disk Queue Depth < 2 per disk
Prevalent use of SAN
LUNs with unknown number of disks
Latency (Avg Disk Sec/Read)
Storage Performance Criteria

SELECT (Read) Query
Data must be read into buffer cache if not already in cache – read from data
INSERT/UPDATE/DELETE (Write) Query
Data must be read into buffer cache
Transaction must be written to log
Buffer is marked as dirty; lazy writer handles the data write
Large Query (as necessary)
Write and Read to tempdb
Flashback: 1994 - 2009

1994: Pentium 100MHz
64MB, 4 x 16MB SIMM ($700+ each?)
OS + DB executable ~ 16-24MB
Net: 40MB buffer cache
Difficult to support transactions
Reports run on 1st of month
Today: 4 x Quad Core
128GB, 32 x 4GB, $4800
3000X increase in buffer cache
Requirements Then and Now

Old: Support transactions
No longer really an issue for most environments (after proper SQL tuning!)
Today: Minimize disruptions to transactions
Large query or table scan while supporting transactions
Checkpoint – write dirty buffers to data
Transaction Log backup
Backup & Restore
Cost versus Value/Requirements

Money is no object:
With a sufficient number of disks, IO channels, and proper configuration, it is possible to avoid most disruptions
Otherwise – manage IO disruptions
Establish tolerable disruptions: 5-30 seconds?
Large reports run off-hours
Configure sufficient performance to handle transient events
Complete Storage Environment: Direct Attach and SAN

Most Common Mistakes

Storage sized to capacity requirements only
2 HBA (or RAID Controllers)
Too few big-capacity disk drives
Instead:
Fill system PCI-E slots with controllers
Many small 15K drives (146GB 3.5in or 73GB 2.5in)
Direct Attach

System IO capability is distributed across multiple PCI-E slots.
A single controller does not have sufficient IO
A single (or even dual) SAS/FC port does not have sufficient IO
Distribute IO over multiple:
PCI-E channels
Controllers (SAS or FC)
Dual-port SAS or FC
Disk Array Enclosures (DAE)
Do not daisy chain (shared SAS/FC) until all channels are filled!
[Diagram: server system with four CPUs, two IO hubs, multiple PCI-E slots, dual-port SAS HBAs, and SAS links fanning out to disk enclosures]
SAN

A SAN is really computer system(s), typically connected by FC to host and storage
Can be fault-tolerant in all components and paths: HBA, cables, switches, SP, disks
No special performance enhancements
Slight degradation (excessive layers)
Write cache is mirrored between SPs – really important!
Distribute load over all front-end and back-end FC ports
[Diagram: server system with HBAs in multiple PCI-E slots, redundant FC paths to SAN service processors SP A and SP B, and FC back-end ports to the disks]
Direct Attach & SAN

Direct Attach
RAID Controller in server
Fault-tolerant disks, sometimes controller/path, 2-node clusters

SAN
Host Bus Adapter, (switches), Service Processor
Full component and path fault tolerance
Multi-node clusters
SAN Vendor View

One immensely powerful SAN serving the storage needs of all servers
Storage consolidation – centralize management and minimize unused space
Problem is: the SAN is not immensely powerful
What happens if a LUN for another server fails, and a restore from backup is initiated during busy hours?
[Diagram: a single SAN behind one switch, shared by DW/BI, Email, Web, QA DB, OLTP DB, and SharePoint servers]
Proper View

Nothing should disrupt the operation of a line-of-business server
Data Warehouse should not be mixed with transaction processing DB
Consider multiple storage systems for very large IOPS loads instead of a single SAN
[Diagram: DW/BI and OLTP each on dedicated storage, separate from the SAN shared by SharePoint and file servers]
Storage Systems

SAN Entry: HP MSA 2000 (Dell MD 3000)
SAN Mid-range: EMC CLARiiON, HP EVA, NetApp FAS3100
SAN Enterprise: EMC DMX, Hitachi, 3PAR, FAS6000
Direct Attach High Density: HP MSA 50, 70, Dell MD 1120
Direct Attach: HP MSA 60, Dell MD 1000
EMC CLARiiON
CX4 highlights:
High-performance Flash drives
Spin Down – low power SATA II drives
Adaptive Cooling
Virtual Provisioning
= Capacity optimization, energy efficiency
Multi-core processors, increased memory
64-bit FLARE, up to 960 drives
= Up to twice the performance and scale

[Diagram: two CLARiiON service processors linked by x8 CMI, each with SPS, power supply, a CPU module (multi-core processors, memory) and an IO complex of iSCSI and Fibre Channel modules; LCCs in the disk enclosures]
EMC DMX

Cache
If system memory is 128GB, what do you expect to find in the 16GB SAN cache that is not in the buffer cache?
Performance benchmarks
Most use direct-attach storage
With SAN: cache disabled
Alternative: tiny read cache, almost all allocated to write
Complete Environment Summary

Server System
Memory bandwidth
IO bandwidth, ports, PCI-E slots
Pipes/channels from server to storage
Storage System
RAID controller, etc.
Pipes to disk drives
Disk drives
If system memory is 128GB, what do you expect to find in the 16GB SAN cache that is not in the buffer cache?
Storage Components
Storage Components/Interfaces

System IO
HBA and RAID Controllers
SAS (3Gbit/s going to 6), FC (4Gbit/s to 8)
Storage Enclosures (DAE)
Disk Drives
SAN – Systems
SAN – Switches
Server Systems: PCI-E Gen 1

PCI-E Gen 1: 2.5Gbit/s per lane, bi-directional
Dell PowerEdge 2950 – 2 x8, 1 x4
Dell PowerEdge R900 – 4 x8, 3 x4 (shared)
HP ProLiant DL385G5p – 2 x8, 2 x4
HP ProLiant DL585G5 – 3 x8, 4 x4
HP ProLiant DL785G5 – 3 x16, 3 x8, 5 x4
Most PCI-E slots have dedicated bandwidth; some may be shared bandwidth (with an expander chip)
Server Systems: PCI-E Gen 2

PCI-E Gen 2: 5.0Gbit/s per lane
x4: 2 GB/sec in each direction
Dell PowerEdge R710 – 2 x8, 2 x4
Dell PowerEdge R910 (?)
HP ProLiant DL370G6 – 2 x16, 2 x8, 6 x4
Intel 5520 chipset: 36 PCI-E Gen 2 lanes, 1 ESI (x4)
ProLiant ML/DL 370G6 has 2 5520 IOH devices
Disk Drives

Rotational speed – 7200, 10K, 15K RPM
Average rotational latency: 4, 3, 2 milli-sec
Average seek time: 8.5, 4.7, 3.4ms (7200, 10K, 15K RPM)
2.5in 15K: 2.9ms avg seek
Average random access time:
Rotational + Seek + Transfer + Overhead
Native Command Queuing
Disk Interfaces

SATA – mostly 7200RPM
A SATA disk can be used in a SAS system
A SATA adapter cannot connect to a SAS disk
SAS – 15K
3.5in LFF, 2.5in SFF
Currently 3 Gbit/s, next gen: 6 Gbit/s
FC – typically in SAN
4 Gbit/s, next: 8 Gbit/s
Disk Drives (3.5in, LFF)

Platter diameters: 95mm, 84mm, 65mm
7200RPM, 1TB: Barracuda 12: 8.5ms, 125MB/s; Barracuda LP (5900RPM): 95MB/s
10,000RPM: 5ms – end of life?
15,000RPM: 3.4ms, 146/300/450GB, 167MB/sec
Lower RPM drives have higher bit density and larger platters, contributing to very low $/GB. Desktop drives are rated for 2 years @ 20% duty cycle, server drives for 5 years @ 100%.
Seagate Drives

Cheetah 3.5in LFF drives
15K.4 36/73/146GB, 3.5/4.0ms, 95? MB/sec
15K.5 73/146/300GB, 3.5/4.0ms, 125-73 MB/sec
15K.6 146/300/450GB, 3.4/3.9ms, 171-112 MB/sec
15K.7 300/450/600GB

Savvio 2.5in SFF drives
15K.1 36/72GB, 2.9/3.3ms, 112-79 MB/sec
15K.2 73/146GB, 2.9/3.3ms, 160-120 MB/s
Also: Barracuda ES, Savvio 10K.3
Dell PowerVault

Dell PowerVault MD 1000 – 15 x 3.5in; $7K for 15 x 146GB 15K drives
Dell PowerVault MD 1120 – 24 x 2.5in; $11K for 24 x 73GB 15K drives
HP MSA
MSA 60: 12 LFF drives
MSA 70: 25 SFF drives
Direct Attach Cluster Capable

Dell PowerVault MD 3000 – 15 x 3.5in, 2 internal dual-port RAID controllers; $11.5K for 15 x 146GB 15K drives
Listed as Direct Attach, but essentially an entry SAN
PCI-E SAS RAID Controllers

First Generation
PCI-E host interface, PCI-X SAS controller, PCI-E to PCI-X bridge
800MB/sec
Second Generation
Native PCI-E to SAS
1.6GB/sec in x8 PCI-E, 2 x4 SAS ports
FC HBA

QLogic QLE2562 – dual-port 8Gb/s FC, x8 PCI-E Gen 2
QLogic QLE2462 – dual-port 4Gb/s FC, x4 PCI-E Gen 1
QLogic QLE2464 – quad-port FC, x8 PCI-E Gen 1
Emulex LPe12002, LPe11002/11004
Disk Performance
Random IO Theory: Queue Depth 1

Drive    Rotational latency (ms)  Avg seek (ms)  8KB transfer (ms)  Total (ms)  IOPS
7200     4.17                     8.5            0.06               12.7        78.6
10K      3.0                      4.7            0.07               7.77        128.7
15K      2.0                      3.4            0.05               5.45        183.6
15K SFF  2.0                      2.9            0.05               4.95        202
IO rate based on data distributed over the entire disk, accessed at random, one IO command issued at a time; not accounting for other delays
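The queue-depth-1 figures above come straight from summing the latency components; a quick sketch (inputs taken from the table, small rounding differences aside):

```python
# Theoretical queue-depth-1 random IOPS: one IO in flight at a time,
# so IOPS = 1000 / (rotational latency + average seek + transfer),
# with all times in milliseconds.
drives = {
    "7200":    (4.17, 8.5, 0.06),
    "10K":     (3.0,  4.7, 0.07),
    "15K":     (2.0,  3.4, 0.05),
    "15K SFF": (2.0,  2.9, 0.05),
}

for name, (rotate, seek, transfer) in drives.items():
    total = rotate + seek + transfer      # ms per random 8KB IO
    iops = 1000.0 / total                 # serial issue: no overlap
    print(f"{name:8s} {total:5.2f} ms  {iops:6.1f} IOPS")
```

This is why a lone 15K drive tops out under ~200 random IOPS at queue depth 1 no matter how fast its interface is.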
Other Factors
Short Stroke:
Data is distributed over a fraction of the entire disk
Average seek time is lower (track-to-track minimum)
Command Queuing:
More than one IO issued at a time,
Disk can reorder individual IO accesses, lowering access time per IO
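As a rough illustration of why short-stroking helps, one simple model (an assumption for illustration, not from the slides) is that average seek time grows with the square root of the seek span:

```python
import math

# Hypothetical short-stroke model: seek = settle overhead +
# coefficient * sqrt(fraction of full stroke used). Calibrated so the
# full-stroke average seek is 3.4 ms with an assumed ~1 ms settle
# time (15K LFF class drive from the earlier table).
SETTLE_MS = 1.0
COEFF = 3.4 - SETTLE_MS          # so avg_seek_ms(1.0) == 3.4

def avg_seek_ms(used_fraction):
    return SETTLE_MS + COEFF * math.sqrt(used_fraction)

for frac in (1.0, 0.25, 0.1):
    seek = avg_seek_ms(frac)
    total = 2.0 + seek + 0.05    # + rotational latency + 8KB transfer
    print(f"{frac:4.0%} of disk: seek {seek:4.2f} ms, {1000/total:5.1f} IOPS")
```

The exact curve varies by drive; the point is that confining data to a fraction of the platter cuts the seek component, and command queuing then reduces it further by reordering accesses.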
8K Random IOPS vs Utilization

[Chart: 8K random IOPS (0-600) for space utilization from 88% down to 1.4%, at queue depths Q1 through Q64]
IOPS for range of queue depth and space utilization
Latency versus Queue Depth

[Chart: latency (0-180ms) versus queue depth Q1 through Q64, for space utilization from 88% down to 1.4%]
Latency versus queue depth for range of space utilization
Disk Summary

Frequently cited rules for random IO:
Apply to queue depth 1, data spread across the entire disk
Key factors:
Short-stroke
High queue depth
SAN
A complex SAN may hide short-stroke and high-queue behavior
SQL Server IO Patterns
SQL Server IO

Transactional queries: Read/Write
Reporting / DW queries
Checkpoints
T-Log backups
Differential/Full backups
Transactional Query

Few rows involved
SELECT xx FROM Table WHERE Col1 = yy
Execution plan has bookmark lookup or loop joins
IO for data not in buffer cache:
8KB, random, issued 1 at a time, serially (5ms min latency)
(up to around 24-26 rows)
Even if the LUN has many disks, IO depth is 1!
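The practical impact of queue-depth-1 IO: each key lookup that misses the buffer cache pays a full random-access latency before the next IO is even issued. A back-of-envelope sketch (the 5 ms floor and serial issue are from the slide; the cache-hit parameter is illustrative):

```python
# Serial (queue depth 1) key lookups: each cache miss costs one full
# random disk access before the next IO can be issued, no matter how
# many disks sit behind the LUN.
LATENCY_MS = 5.0          # ~minimum random read latency

def serial_lookup_ms(rows, cache_hit_ratio=0.0):
    misses = rows * (1.0 - cache_hit_ratio)
    return misses * LATENCY_MS

print(serial_lookup_ms(25))        # ~125 ms for 25 rows, all misses
print(serial_lookup_ms(25, 0.8))   # ~25 ms if 80% already cached
```

So a 25-row transactional query against a cold cache takes on the order of an eighth of a second in IO wait alone.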
Large Query

Plan has bookmark lookup or loop join
Uses scatter-gather IO for more than (approximately) 30 rows
Depending on Standard or Enterprise Edition
Multiple IOs issued with one call, generating high queue depth
A query for 100 rows can run faster than one for 20!
High row count non-clustered index seek: are key lookups really random? Build the index with care; only highly selective SARGs in the key.
Tempdb

A large query may spool intermediate results to tempdb
Sequence of events is:
Read from data
Write to tempdb
Read from tempdb (sometimes)
Repeat
Disk load is not temporally uniform!
Data and tempdb should share a common pool of Disks/LUNs
Checkpoint

Dirty data buffers written to disk
User does not wait on the data write
SQL Server should throttle checkpoint writes
But a high queue depth of writes may result in high-latency reads
Log Backup

Disrupts sequential log writes

Update

Problem in SQL Server 2000
UPDATE uses non-clustered index
Plan does not factor in key lookups
Execution – fetch one row at a time, ~5-10ms per key lookup
Storage Configuration Examples
General Strategy – Distribute IO

Distribute IO across multiple PCI-E slots
Distribute IO across multiple HBA/Controllers
Distribute IO across many disk drives
Daisy-chain DAE only after all channels are filled
High transaction (write) volume: dedicate HBA/controller, SAN SP, disk drives for logs?
LFF or SFF disks

LFF: 12-15 disks per enclosure
SFF: 24-25 disks per enclosure
15 disks on x4 SAS: total bandwidth 800MB/s, 53MB/s per disk
24 disks on x4 SAS: 33MB/s per disk
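The per-disk figures are just the x4 SAS channel bandwidth divided by the number of drives sharing it:

```python
# A x4 SAS (3Gbit/s per lane) channel delivers roughly 800MB/s usable;
# every drive on the channel shares that ceiling for sequential IO.
CHANNEL_MB_S = 800.0

def per_disk_mb_s(disks):
    return CHANNEL_MB_S / disks

print(f"{per_disk_mb_s(15):.0f} MB/s per disk (15 LFF disks)")
print(f"{per_disk_mb_s(24):.0f} MB/s per disk (24 SFF disks)")
```

This is the trade-off the slide implies: SFF enclosures pack more spindles (good for random IOPS) but dilute sequential bandwidth per disk.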
Minimum for Line-of-Business

2 x Xeon 5500 or 5400 series
64-72GB memory
4 SAS RAID Controllers in x4 or x8 PCI-E slots
$11-13K
4 x 15-disk enclosures, 60 x 146GB 15K drives
6TB capacity (3+1 RAID 5), 600GB database
3GB/sec sequential
30K IOPS short-stroke, peak
$28K
SQL Server Ent License $50K
12-15 disks per x4 SAS port: 800-1000MB/sec bandwidth
SAN option: 2 dual-port FC HBA, EMC CLARiiON CX4-240, 4 DAE
Intermediate

1 DAE per controller in x4 PCI-E slots
2 DAE per controller in x8 PCI-E slots, use both SAS ports, 1 DAE per x4 SAS
Daisy-chain DAE only for very high disk counts
SAN example: CLARiiON CX4-480, 3 dual-port HBA, 6 DAE
4 x Xeon 7400 series, 128GB memory, 4 SAS RAID Controllers, $25K
6 LFF (3.5in) disk enclosures, 90 x 73GB 15K drives
9TB capacity (3+1 RAID 5), 900GB database
3GB/sec+ sequential, 45K IOPS short-stroke peak
$42K
SQL Server Ent License $100K
SFF disks for Heavy Random IO

4 x Xeon 7400 series, 128GB memory, 4 SAS RAID Controllers (x4 and x8 PCI-E slots), $25K
6 SFF (2.5in) disk enclosures, 144 x 73GB 15K drives
7TB capacity (3+1 RAID 5), 700GB database
3GB/sec+ sequential, 70K IOPS short-stroke peak
$66K
SQL Server Ent License $100K
Really Serious DW

8 x Opteron 8400 series, 256GB memory, 8 SAS RAID Controllers, $80K
Or Unisys, NEC, IBM
14 SFF (2.5in) disk enclosures, 336 x 73GB 15K drives
16TB capacity (3+1 RAID 5)
7-9GB/sec+ sequential
1.6TB database: 160K IOPS peak; 3.2TB: 130K IOPS peak
$154K
SQL Server Ent License $200K
Needs lots of IO bandwidth and slots – more than a 4-way Xeon 7400 series with the 7300 chipset can handle
SAN – CLARiiON examples

Minimum (disks): CX4-240, 2 dual-port FC HBA, 4 DAE
Intermediate (120 disks): CX4-480, 4 dual-port FC HBA, 8 DAE
High-bandwidth DW (240 disks): CX4-960, 2 quad + 4 dual-port FC HBA, 16 DAE
Very high random IO (480 disks): CX4-960, 2 quad + 4 dual-port HBA, 32 DAE
Storage Performance Verification

What To Test

Sequential
Random: low queue, high queue
High row count UPDATE with non-clustered index
Checkpoint writes
Full-stroke and short-stroke

Cache Settings

Read: Read-Ahead, Adaptive Read-Ahead, or None
Write: Write-Back or Write-Through
Recommended: Read – none or very small (2MB/LUN); Write – Write-Back

SAN - HBA Settings

NumberOfRequests
Default – 32? Prevents multiple hosts from overloading the SAN
Match to number of disks to control queue depth?
MaxSGList
SSD
SSD Types

DRAM – fastest, most expensive
NVRAM:
SLC – more expensive per GB, higher write performance
MLC – low cost per GB
Interfaces:
SAS
PCI-E (Fusion-io, 1GB/sec, 120K IOPS+)
Complete SAN (Texas Memory Systems)
SSD

Intel X25-E, 32 & 64GB
Sequential Read 250MB/s, Write 170MB/s
Random Read: 35,000 IOPS @ 4KB
Random Write: 3,300 IOPS @ 4KB
Good but not spectacular
Latency: 75us Read, 85us Write
Really helpful for serial Queue Depth 1 accesses
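Why low latency matters so much for serial access: at queue depth 1 the rate is simply 1 / latency, since each IO waits for the previous one to finish:

```python
# Queue-depth-1 throughput is latency-bound: IOPS = 1000 / latency_ms.
def qd1_iops(latency_ms):
    return 1000.0 / latency_ms

print(f"15K disk (~5.45 ms): {qd1_iops(5.45):8.0f} IOPS")
print(f"SSD read (0.075 ms): {qd1_iops(0.075):8.0f} IOPS")
```

An SSD's ~75 microsecond read latency buys roughly a 70x improvement for the serial key-lookup pattern described earlier, even though its random-IOPS spec looks "good but not spectacular".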
SQL Server IO Cost Structure

Key Lookup, Loop Join:
4-5 micro-sec in-memory
15-25us for 8K read from disk + eviction
45us for 64K read due to cold cache
SSD and RAID

Does an SSD need to be in RAID?
A disk drive is fundamentally a single device
Motor or media failure results in loss of the drive
An SSD is not required to be a single device
Composed of an SoC interfacing SAS to NVRAM
Dual SoC plus ECC with chip-kill could make an SSD fault-tolerant
Additional Slides

Partition Alignment

http://blogs.msdn.com/jimmymay/default.aspx
Misaligned theory:
With 64K stripe, warm cache, 8KB IO: on average every 8 random IO accesses generate 10 actual IOs – 25% gain from alignment
With 64K stripe, cold cache, 64KB IO: every disk access generates 2 IOs – 100% gain from alignment
RAID Theory

Operation    RAID 0  RAID 1+0  RAID 5
Read         1       1         1
Small Write  1       1/2       1/4
Large Write  1       1/2       1 - 1/N

Theoretical performance per drive for N drives in a RAID group.
RAID 5 small write: 1 read data, 1 read parity, 1 write data, 1 write parity. The write penalty is reduced if an entire stripe can be written.
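The table above translates directly into effective random-write throughput for a group; a small sketch (the 8-drive group and 183 IOPS per drive are example inputs, the per-write IO costs follow the table):

```python
# Effective small random-write IOPS for a RAID group: each host write
# costs 2 disk IOs in RAID 1+0 (two mirrored writes) and 4 in RAID 5
# (read data, read parity, write data, write parity).
def raid_write_iops(disks, disk_iops, ios_per_write):
    return disks * disk_iops / ios_per_write

DISK_IOPS = 183    # ~15K drive at queue depth 1, from the earlier table
for name, cost in (("RAID 0", 1), ("RAID 1+0", 2), ("RAID 5", 4)):
    iops = raid_write_iops(8, DISK_IOPS, cost)
    print(f"{name:8s}: {iops:6.0f} small-write IOPS (8 drives)")
```

This is why write-heavy workloads (logs, tempdb) favor RAID 1+0 while read-mostly data can live on RAID 5.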
EMC CLARiiON CX4 Family

                     CX4-120        CX4-240        CX4-480        CX4-960
SP CPU               1x1.2GHz DC    1x1.6GHz DC    1x2.2GHz DC    2x2.3GHz QC
System memory        6GB            8GB            16GB           32GB
Memory per SP        3GB            4GB            8GB            16GB
Max cache            600MB          1.264GB        4.5GB          10.76GB
Max write cache      600MB          1.264GB        4.5GB          10.76GB
CMI                  x4             x4             x8
Front-end base       4 FC + 4 iSCSI 4 FC + 4 iSCSI 8 FC + 4 iSCSI 8 FC + 4 iSCSI
Back-end base        2 FC           4 FC           8 FC           8 FC
Max drives           120            240            480            480-960
Total IO slots       6              8              10             12
IO populated in base 4              4              6              6
Front-end FC ports   12             12             16             24
Back-end FC          2              4              8              16
Max iSCSI            8              12             12             16
NetApp

Write Anywhere File Layout (WAFL)
Very different characteristics
Overrides many standard database strategies
No need to defragment – see NetApp-specific documents
Index rebuild to clean up unused space may still be helpful
Enterprise SAN

Massive cross-bar
RAID groups: RAID 5 3+1 or 7+1, RAID 10 2+2 or 4+4
Hyper Volume: 16GB slices from a RAID group
LUNs created from Hyper Volumes
Theory: a massive number of disks, say 1000, can do 150K IOPS. Each server averages 10K IOPS steady, with surges to 50K. Many servers can share a large SAN.
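The consolidation arithmetic above looks fine on averages but breaks down under surges; a quick sketch using the slide's numbers:

```python
# Shared-SAN headroom: servers draw STEADY IOPS on average but surge
# to SURGE. How many servers fit if k of them surge at once?
SAN_IOPS = 150_000
STEADY, SURGE = 10_000, 50_000

def max_servers(concurrent_surges):
    # concurrent_surges servers at SURGE, the remainder at STEADY
    remaining = SAN_IOPS - concurrent_surges * SURGE
    return concurrent_surges + remaining // STEADY

for k in (0, 1, 2, 3):
    print(f"{k} simultaneous surges: up to {max_servers(k)} servers")
```

With no surges the pool supports 15 servers, but allowing even three simultaneous surges drops that to 3 - which is exactly the earlier point about a restore or large report on one server disrupting every other host on the SAN.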
Table Scan to Disk

[Chart: table scan throughput in MB/sec (0-1,600) for Default, RowLock, PagLock, TabLock, and NoLock hints – SQL 2000 vs SQL 2005, clustered index scan vs heap table scan]
Low Queue Writes

Read activity drops sharply during checkpoints (4 x 15K SCSI disks)

Updates – All data in memory

Checkpoint does not slow the SQL batch when no reads are required
HP Test System 2

[Diagram: rx8620 server, 16 Itanium 2 1.5GHz, 8 x 2Gb/s FC ports to 4 x HSV110 controllers, plus 6 SCSI disks]