Upload
devender-singh-rautela
View
237
Download
30
Embed Size (px)
DESCRIPTION
SSIS Best practice, SSIS Best Practice
Citation preview
Bob Duffy
Irish SQL Academy 2008. Level 300
DTS 2000
1.75 Developers
SSIS 2005
*Figures are only approximations and should not be referenced or quoted
• Minimize staging (else use RawFiles if possible)
• Hardware Infrastructure: Disks, RAM, CPU, Network
• SQL Infrastructure: File Groups, Indexing, Partitioning
Optimize and Stabilize the basics
• Replace destinations with RowCount
• Source->RowCount throughput
• Source->Destination throughputMeasure
• OVAL performance tuning strategy
• The Three S‟s
• Data Flow Bag of TricksTune
• Lookup patterns
• Script vs custom transformParallelize
• Increase the efficiency of every aspect
Sharpen
• Parallelize, partition, pipelineShare
• Buy faster, bigger, better hardwareSpend
But be aware of limitations
RowBased
(synchronous)
Partially Blocking
(asynchronous)
Blocking(asynchronous)
http://msdn.microsoft.com/en-us/library/ms345346.aspx
4 Gb Fiber Channel
Destination server runs SQL
Server
DatabaseEMC CX3-80
Destination server:Unisys ES7000/One32 sockets each with dual core Intel 3.4 GHz CPUs256 GB RAMWindows Server 2008SQL Server 2008
1 Gb Ethernet connections
2 Gb Fiber Channel
Source servers run SSIS
Source data EMC CX600
Source servers:Unisys ES3220L2 sockets each with 4 core Intel 2 GHz CPUs4 GB RAMWindows Server 2008SQL Server 2008
Make: Unisys
Model: ES7000/one Enterprise Server
OS: Microsoft Windows Server 2008 x64 Datacenter Edition
CPU: 32 socket dual core Intel® Xeon 3.4 GHz (7140M)
RAM: 256 GB
HBA: 8 dual port 4Gbit FC
NIC: Intel® PRO/1000 MT Server Adapter
Database: Pre-release build of SQL Server 2008 Enterprise Edition (V10.0.1300.4)
Storage: EMC Clariion CX3-80 (Qty 1)
11 trays of 15 disks; 165 spindles x 146 GB 15Krpm; 4Gbit FC
Quantity: 4
Make: Unisys
Model: ES3220L
OS: Windows2008 x64 Enterprise Edition
CPU: 2 socket quad core Intel® Xeon processors @ 2.0GHz
RAM: 4 GB
HBA: 1 dual port 4Gbit Emulex FC
NIC: Intel PRO1000/PT dual port
Database: Pre-release build of SQL Server 2008 Integration Services (V10.0.1300.4)
Storage: 2x EMC CLARiiON CX600 (ea: 45 spindles, 4 2Gbit FC)
C1
C1
C1
C1
Orders TablePartition
1Partition
2Partition
3Partition
4Partition
5Partition
6Partition
55Partition
56
Orders_1 Orders_2 Orders_3 Orders_4 Orders_5 Orders_6 Orders_55 Orders_56. . .
SSIS
SSIS
SSIS
SSIS
SSIS
SSIS
SSIS
SSIS. . .
orders.tbl.1 orders.tbl.2 orders.tbl.3 orders.tbl.4 orders.tbl.5 orders.tbl.6 orders.tbl.55 orders.tbl.56
(Package details removed to protect the innocent)
• Iterative design, development & testingFollow Microsoft
Development Guidelines
• People & Processes
• Kimball‟s ETL and SSIS books are an excellent referenceUnderstand the Business
• Resource contention, processing windows, …
• SSIS does not forgive bad database design
• Old principles still apply – e.g. load with/without indexes?
Get the big picture
• Will this run on IA64 / X64?
• No BIDS on IA64 – how will I debug?
• Is OLE-DB driver XXX available on IA64?
• Memory and resource usage on different platforms
Platform considerations
• Break complex ETL into logically distinct packages (vsmonolithic design)
• Improves development & debug experience
Process Modularity
• Separate sub-processes within package into separate Containers
• More elegant, easier to develop
• Simple to disable whole Containers when debugging
Package Modularity
• Use Script Task/Transform for one-off problems
• Build custom components for maximum re-use
Component Modularity
Concise naming conventions
Conformed “blueprint” design patterns
Presentable layout
Annotations
Error Logging
Configurations
Get as close to the data as possible
• Limit number of columns
• Filter number of rows
Don‟t be afraid to leverage TSQL
• Type conversions, null coercing, coalescing, data type sharpening
• select nullif(name, „‟) from contacts order by 1
• select convert(tinyint, code) from sales
Performance Testing & Tuning
• Connect Output to RowCount transform
• See Performance Best Practices
„FastParse‟ for text files
BEFORE:
selectdbo.Tbl_Dim_Store.SK_Store_ID
, Tbl_Dim_Store.Store_Num,isnull(dbo.Tbl_Dim_Merchant_Division.SK_Merch_Di
v_ID, 0) as SK_Merch_Div_IDfrom dbo.Tbl_Dim_Storeleft outer join dbo.Tbl_Dim_Merchant_Division
on dbo.Tbl_Dim_Store.Merch_Div_Num = dbo.Tbl_Dim_Merchant_Division.Merch_Div_Num
where Current_Row = 1
AFTER:
select * from etl.uf_FactStoreSales(@Date)
Use the power of TSQL to clean the data 'on the fly'
• Too many moving parts is inelegant and likely slow
• But don‟t be afraid to experiment – there are many ways to solve a problem
Avoid over-design
• Allocate enough threads
• EngineThreads property on DataFlow Task
• See Performance Talk
Maximize Parallelism
• Synchronous vs. Asynchronous components
• Memcopy is expensive
Minimize blocking
• For example, minimize data retrieved by LookupTxMinimize
ancillary data
• Full Cache – for small lookup datasets
• No Cache – for volatile lookup datasets
• Partial Cache – for large lookup datasets
Three Modes of Operation
• Full Cache is optimal, but uses the most memory, also takes time to load
• Partial Cache can be expensive since it populates on the fly using singleton SELECTs
• No Cache uses no memory, but takes longer
Tradeoff memory vs.
performance
• Catch is that it requires Sorted inputs
• See SSIS Performance white paper for more details
Can use Merge Join component
instead
• Can written in any .Net language
• Must be signed, registered and installed – but can be widely re-used
• Quite fiddly for single task
Custom components
• Can be written in VisualBasic.Net or C#
• Are persisted within a package – and have limited reuse
• Have template methods already created for you
Scripts
http://sqlcat.com
http://technet.microsoft.com/en-us/library/bb961995.aspx
http://blogs.msdn.com/sqlperf/archive/2008/02/27/etl-world-record.aspx