Satisfy Your Technical Curiosity

Creating High Impact Data Warehouses with SQL Server Integration Services and Analysis Services

Ashvini Sharma
Senior Program Manager
[email protected]

A DW Architecture

[Diagram: sources (SQL/Oracle, SAP/Dynamics, Legacy, Text, XML) -> Integration Services -> Data Warehouse (SQL Server, Oracle, DB2, Teradata) -> Analysis Services -> Reports, Dashboards, Scorecards, Excel, BI tools]

Session Objectives

• Assumptions
  • Experience with SSIS and SSAS
• Goals
  • Discuss design, performance, and scalability for building ETL packages and cubes (UDMs)
  • Best practices
  • Common mistakes


SQL Server 2005 Best Practices Analyzer

• Utility that scans your SQL Server metadata and recommends best practices
• Best practices from dev team and Customer Support Services
• What’s new:
  • Support for SQL Server 2005
  • Support for Analysis Services and Integration Services
  • Scan scheduling
  • Auto update framework
  • CTP available now, RTM April
• http://www.microsoft.com/downloads/details.aspx?FamilyId=DA0531E4-E94C-4991-82FA-F0E3FBD05E63&displaylang=en

Agenda

• Integration Services
  • Quick overview of IS
  • Principles of Good Package Design
  • Component Drilldown
  • Performance Tuning
• Analysis Services
  • UDM overview
  • UDM design best practices
  • Performance tips

What is SQL Server Integration Services?

• Introduced in SQL Server 2005
• The successor to Data Transformation Services
• The platform for a new generation of high-performance data integration technologies

ETL Objective: Before SSIS

[Diagram: call center data (semi-structured), legacy data (binary files), and an application database pass through hand coding, text mining, cleansing, and repeated ETL and staging steps before reaching the warehouse, which feeds reports, mobile data, data mining, and alerts & escalation]

• Integration and warehousing require separate, staged operations.
• Preparation of data requires different, often incompatible, tools.
• Reporting and escalation is a slow process, delaying smart responses.
• Heavy data volumes make this scenario increasingly unworkable.

Changing the Game with SSIS

[Diagram: call center (semi-structured data), legacy data (binary files), and an application database flow into SQL Server Integration Services (custom source, standard sources, text mining components, data-cleansing components, merges, data mining components), which loads the warehouse and feeds reports, mobile data, and alerts & escalation]

• Integration and warehousing are a seamless, manageable operation.
• Source, prepare, and load data in a single, auditable process.
• Reporting and escalation can be parallelized with the warehouse load.
• Scales to handle heavy and complex data requirements.

SSIS Architecture

• Control Flow (Runtime)
  • A parallel workflow engine
  • Executes containers and tasks
• Data Flow (“Pipeline”)
  • A special runtime task
  • A high-performance data pipeline
  • Applies graphs of components to data movement
  • Components can be sources, transformations or destinations
  • Highly parallel operations possible

Agenda

• Overview of Integration Services
• Principles of Good Package Design
• Component Drilldown
• Performance Tuning

Principles of Good Package Design - General

• Follow Microsoft Development Guidelines
  • Iterative design, development & testing
• Understand the Business
  • Understanding the people & processes is critical for success
  • Kimball’s “Data Warehouse ETL Toolkit” book is an excellent reference
• Get the big picture
  • Resource contention, processing windows, …
  • SSIS does not forgive bad database design
  • Old principles still apply – e.g. load with/without indexes?
• Platform considerations
  • Will this run on IA64 / X64?
    • No BIDS on IA64 – how will I debug?
    • Is OLE-DB driver XXX available on IA64?
  • Memory and resource usage on different platforms

Principles of Good Package Design - Architecture

• Process Modularity
  • Break complex ETL into logically distinct packages (vs. monolithic design)
  • Improves development & debug experience
• Package Modularity
  • Separate sub-processes within package into separate Containers
  • More elegant, easier to develop
  • Simple to disable whole Containers when debugging
• Component Modularity
  • Use Script Task/Transform for one-off problems
  • Build custom components for maximum re-use

Bad Modularity

Good Modularity

Principles of Good Package Design - Infrastructure

• Use Package Configurations
  • Build it in from the start
    • Will make things easier later on
    • Simplify deployment Dev -> QA -> Production
• Use Package Logging
  • Performance & debugging
• Build in Security from the start
  • Credentials and other sensitive info
  • Package & Process IP
  • Configurations & Parameters
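The idea behind package configurations, keeping environment-specific settings outside the package so the same package moves from Dev to QA to Production unchanged, can be sketched in a few lines of Python. This is only a conceptual illustration, not the SSIS configuration mechanism itself; the setting keys, server names, and the `resolve_settings` helper are all hypothetical.

```python
# Hypothetical per-environment settings; in SSIS these would live in a
# package configuration (for example an XML file or a SQL Server table).
DEFAULTS = {"batch_size": 10000, "log_level": "INFO"}

ENVIRONMENTS = {
    "Dev": {"server": "dev-sql01", "log_level": "DEBUG"},
    "QA": {"server": "qa-sql01"},
    "Production": {"server": "prod-sql01", "batch_size": 50000},
}


def resolve_settings(environment):
    """Merge the shared defaults with the overrides for one environment."""
    if environment not in ENVIRONMENTS:
        raise KeyError(f"unknown environment: {environment}")
    settings = dict(DEFAULTS)
    settings.update(ENVIRONMENTS[environment])
    return settings


if __name__ == "__main__":
    for env in ("Dev", "QA", "Production"):
        print(env, resolve_settings(env))
```

The package logic never changes between environments; only the external settings do, which is why building this in from the start is cheaper than retrofitting it.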

Principles of Good Package Design - Development

• SSIS is visual programming!
• Use source code control system
  • Undo is not as simple in a GUI environment!
  • Improved experience for multi-developer environment
• Comment your packages and scripts
  • In 2 weeks even you may forget a subtlety of your design
  • Someone else has to maintain your code
• Use error-handling
  • Use the correct precedence constraints on tasks
  • Use the error outputs on transforms – store them in a table for processing later, or use downstream if the error can be handled in the package
  • Try…Catch in your scripts
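The error-output idea can be illustrated outside SSIS. The sketch below is a plain Python analogy, not SSIS script code: rows that fail conversion are caught and routed to a separate error list (standing in for an error table) instead of failing the whole run. The row layout and the `transform` helper are hypothetical.

```python
def to_int(value):
    """Convert a raw field to int; raises ValueError/TypeError on bad input."""
    return int(value)


def transform(rows):
    """Split rows into a good output and an error output."""
    good, errors = [], []
    for row in rows:
        try:
            good.append({"id": row["id"], "qty": to_int(row["qty"])})
        except (ValueError, TypeError) as exc:
            # Keep the original row plus the reason, so it can be stored
            # in an error table and reprocessed later.
            errors.append({"row": row, "error": str(exc)})
    return good, errors


if __name__ == "__main__":
    sample = [{"id": 1, "qty": "5"}, {"id": 2, "qty": "abc"}, {"id": 3, "qty": None}]
    good_rows, error_rows = transform(sample)
    print(len(good_rows), "good rows;", len(error_rows), "error rows")
```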

Component Drilldown - Tasks & Transforms

• Avoid over-design
  • Too many moving parts is inelegant and likely slow
  • But don’t be afraid to experiment – there are many ways to solve a problem
• Maximize Parallelism
  • Allocate enough threads
  • EngineThreads property on DataFlow Task
  • “Rule of thumb” - # of datasources + # of async components
• Minimize blocking
  • Synchronous vs. Asynchronous components
  • Memcopy is expensive – reduce the number of asynchronous components in a flow if possible – example coming up later
• Minimize ancillary data
  • For example, minimize data retrieved by LookupTx
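As a worked example of the EngineThreads rule of thumb, the following Python sketch counts sources and asynchronous components in a described data flow. The component list and the `suggested_engine_threads` helper are hypothetical; the result is only a starting point for the property, to be confirmed by measurement.

```python
def suggested_engine_threads(components):
    """Apply the rule of thumb: # of sources + # of async components."""
    sources = sum(1 for _, kind in components if kind == "source")
    async_count = sum(1 for _, kind in components if kind == "async")
    return sources + async_count


# Hypothetical data flow: two sources, one synchronous transform,
# two asynchronous transforms (Sort and Aggregate), one destination.
data_flow = [
    ("Orders file", "source"),
    ("Customers table", "source"),
    ("Derived Column", "sync"),
    ("Sort", "async"),
    ("Aggregate", "async"),
    ("Warehouse table", "destination"),
]

threads = suggested_engine_threads(data_flow)
print("Suggested EngineThreads starting value:", threads)
```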

Debugging & Performance Tuning - General

• Leverage the logging and auditing features
  • MsgBox is your friend
  • Pipeline debuggers are your friend
  • Use the throughput component from Project REAL
• Experiment with different techniques
  • Use source code control system
  • Focus on the bottlenecks – methodology discussed later
• Test on different platforms
  • 32bit, IA64, x64
  • Local Storage, SAN
  • Memory considerations
  • Network & topology considerations

Debugging & Performance Tuning - Volume

• Remove redundant columns
  • Use SELECT statements as opposed to tables
  • SELECT * is your enemy
  • Also remove redundant columns after every async component!
• Filter rows
  • WHERE clause is your friend
  • Conditional Split in SSIS
  • Concatenate or re-route unneeded columns
• Parallel loading
  • Source system: split source data into multiple chunks
    • Flat Files – multiple files
    • Relational – via key fields and indexes
  • Multiple Destination components all loading same table
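The volume advice (name only the columns you need, filter with WHERE, and split a relational source by key ranges) can be sketched with Python's built-in `sqlite3` module. The `orders` table, its columns, and the helper names are hypothetical, and the chunks are read sequentially here; in a package each range would feed its own source so the reads can run in parallel.

```python
import sqlite3


def key_ranges(min_key, max_key, chunks):
    """Split [min_key, max_key] into contiguous, non-overlapping ranges."""
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    total = max_key - min_key + 1
    size = -(-total // chunks)  # ceiling division
    ranges = []
    start = min_key
    while start <= max_key:
        end = min(start + size - 1, max_key)
        ranges.append((start, end))
        start = end + 1
    return ranges


def read_chunk(conn, low, high):
    """Read only the needed columns and rows for one key range."""
    sql = "SELECT order_id, amount FROM orders WHERE order_id BETWEEN ? AND ?"
    return conn.execute(sql, (low, high)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL, notes TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i * 1.5, "unused wide column") for i in range(1, 11)],
)

# The wide "notes" column is never selected, so it never enters the flow.
chunks = [read_chunk(conn, low, high) for low, high in key_ranges(1, 10, 3)]
```

Splitting on the primary key lets each range query use the index, which is the point of the "via key fields and indexes" bullet.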

Debugging & Performance Tuning - Application

• Is BCP good enough?
  • Overhead of starting up an SSIS package may offset any performance gain over BCP for small data sets
  • Is the greater manageability and control of SSIS needed?
• Which pattern?
  • Many Lookup patterns possible – which one is most suitable?
  • See Project REAL for examples of patterns: http://www.microsoft.com/sql/solutions/bi/projectreal.mspx
• Which component?
  • Bulk Import Task vs. Data Flow
    • Bulk Import might give better performance if there are no transformations or filtering required, and the destination is SQL Server.
  • Lookup vs. MergeJoin (LeftJoin) vs. set based statements in SQL
    • MergeJoin might be required if you’re not able to populate the lookup cache.
    • Set based SQL statements might provide a way to persist lookup cache misses and apply a set based operation for higher performance.
  • Script vs. custom component
    • Script might be good enough for small transforms that are typically not reused
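To make the Lookup vs. MergeJoin trade-off concrete, here is a small Python analogy. It is a sketch of the two general techniques (a cached hash lookup and a left merge join over sorted inputs), not of the SSIS components' internals; the function names and sample data are hypothetical, and the merge version assumes unique dimension keys.

```python
def lookup_join(facts, dimension_rows):
    """Lookup pattern: cache the reference data in memory, probe per row."""
    cache = {key: surrogate for key, surrogate in dimension_rows}
    return [(key, value, cache.get(key)) for key, value in facts]


def merge_left_join(facts, dimension_rows):
    """Merge pattern: both inputs sorted by key, no full cache required."""
    result = []
    dim_iter = iter(dimension_rows)
    current = next(dim_iter, None)
    for key, value in facts:
        # Advance the dimension side until it catches up with the fact key.
        while current is not None and current[0] < key:
            current = next(dim_iter, None)
        if current is not None and current[0] == key:
            result.append((key, value, current[1]))
        else:
            result.append((key, value, None))  # left join keeps unmatched rows
    return result


facts = [("A", 10), ("B", 20), ("B", 5), ("D", 7)]      # sorted by key
dimension = [("A", 1), ("B", 2), ("C", 3)]              # sorted, unique keys

assert lookup_join(facts, dimension) == merge_left_join(facts, dimension)
```

The lookup needs the whole reference set in memory but accepts unsorted input; the merge needs sorted input on both sides but holds only one reference row at a time, which is why it becomes attractive when the cache cannot be populated.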

Case Study - Patterns

[Figure: two package designs with measured run times of 105 seconds and 83 seconds]

• Use Error Output for handling Lookup miss
• Ignore lookup errors and check for null looked up values in Derived Column

Debugging & Performance Tuning - A methodology

• Optimize and Stabilize the basics
  • Minimize staging (else use RawFiles if possible)
  • Make sure you have enough Memory
  • Windows, Disk, Network, …
  • SQL FileGroups, Indexing, Partitioning
• Get Baseline
  • Replace destinations with RowCount
  • Source->RowCount throughput
  • Source->Destination throughput
• Incrementally add/change components to see effect
  • This could include the DB layer
  • Use source code control!
• Optimize slow components for resources available
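The baseline step can be mimicked in plain Python: time the source feeding a sink that only counts rows, then time the same source feeding a real destination, and compare. This is a sketch of the measurement idea only; `make_source`, `row_count_sink`, and the in-memory list standing in for a destination are hypothetical.

```python
import time


def rows_per_second(source, sink):
    """Push every row from source into sink; return (rows, rows per second)."""
    start = time.perf_counter()
    count = 0
    for row in source:
        sink(row)
        count += 1
    elapsed = time.perf_counter() - start
    return count, (count / elapsed if elapsed > 0 else float("inf"))


def make_source(n):
    """Hypothetical source: yields n small rows."""
    return ((i, f"name-{i}") for i in range(n))


def row_count_sink(row):
    """Stand-in for RowCount: consumes the row and does nothing else."""


stored = []  # stand-in for a real destination
baseline = rows_per_second(make_source(100_000), row_count_sink)
with_destination = rows_per_second(make_source(100_000), stored.append)

print("Source->RowCount rows/sec:   ", round(baseline[1]))
print("Source->Destination rows/sec:", round(with_destination[1]))
```

The gap between the two numbers is the cost attributable to the destination; repeating the measurement after each added component shows where throughput is lost.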

Case Study - Parallelism

• Focus on critical path
• Utilize available resources

[Figure labels: Memory Constrained; Reader and CPU Constrained; Let it rip!; Optimize the slowest]

Summary

• Follow best practice development methods
• Understand how SSIS architecture influences performance
  • Buffers, component types
  • Design Patterns
• Learn the new features
  • But do not forget the existing principles
• Use the native functionality
  • But do not be afraid to extend
• Measure performance
  • Focus on the bottlenecks
• Maximize parallelism and memory use where appropriate
  • Be aware of different platforms’ capabilities (64bit RAM)
• Testing is key

Analysis Services

Clippy® for Business Intelligence!

Agenda

• Server architecture and UDM basics
• Optimizing the cube design
• Partitioning and Aggregations
• Processing
• Queries and calculations
• Conclusion

Client Server Architecture

[Diagram: client apps (BIDS, SSMS, Profiler, Excel) connect through OLEDB, ADOMD, ADOMD.NET, and AMO to the Analysis Server using XMLA over TCP, or over HTTP via IIS]

Dimension

• An entity on which analysis is to be performed (e.g. Customers)
• Consists of:
  • Attributes that describe the entity
  • Hierarchies that organize dimension members in meaningful ways

CustomerID  FirstName  LastName    State  City       MaritalStatus  Gender  …  Age
123         Tom        Martens     BRU    Brussels   Married        Male    …  42
456         Frank      Desmet      OVL    Gent       Unmarried      Male    …  34
789         Ilse       Vermeersch  ANT    Antwerpen  Married        Female  …  21

Attribute

• Containers of dimension members.
• Completely define the dimensional space.
• Enable slicing and grouping the dimensional space in interesting ways.
  • Customers in Brussels and age > 50
  • Customers who are married and drink Beer
• Typically have one-many relationships
  • City -> State, State -> Country, etc.
  • All attributes implicitly related to the key

Hierarchy

• Ordered collection of attributes into levels
• Navigation path through dimensional space
• User defined hierarchies – typically multiple levels
• Attribute hierarchies – implicitly created for each attribute – single level

Customers by Geography: Country -> State -> City -> Customer
Customers by Demographics: Marital -> Gender -> Customer

Dimension Model

[Diagram: attributes (Customer, City, State, Country, Gender, Marital, Age) shown alongside the hierarchies built from them, including Country -> State -> City -> Customer, Gender -> Customer, and Marital -> Gender -> Customer]

Cube

• Collection of dimensions and measures
• Measure: numeric data associated with a set of dimensions (e.g. Qty Sold, Sales Amount, Cost)
• Multi-dimensional space
  • Defined by dimensions and measures
  • E.g. (Customers, Products, Time, Measures)
• Intersection of dimension members and measures is a cell: (Belgium, Mussels, 2006, Sales Amount) = €10,523,374.83
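One way to picture a cell is as a value addressed by one member from each dimension plus a measure. The Python sketch below models a tiny cube as a dictionary keyed by such coordinate tuples. It is a conceptual illustration only, not how Analysis Services stores data; the fact rows and amounts are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (customer country, product, year, sales amount).
facts = [
    ("Belgium", "Mussels", 2006, 120.0),
    ("Belgium", "Mussels", 2006, 80.5),
    ("Belgium", "Chocolate", 2006, 42.0),
    ("Ireland", "Mussels", 2005, 10.0),
]

# The cube maps (Customers, Products, Time, Measures) coordinates to values.
cube = defaultdict(float)
for country, product, year, amount in facts:
    cube[(country, product, year, "Sales Amount")] += amount
    cube[(country, product, year, "Qty Sold")] += 1  # assumes one unit per row

# A cell is the value at one coordinate.
cell = cube[("Belgium", "Mussels", 2006, "Sales Amount")]
print(cell)
```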

A Cube

[Figure: cube with axes Product (Peas, Corn, Bread, Spuds, Chocolate), Market (Brussels, London, Oslo, Dublin), and Time (Jan, Feb, Mar); the highlighted cell is units of Chocolate sold in Brussels in January]

Measure Group

• Group of measures with same dimensionality
• Analogous to fact table
• Cube can contain more than one measure group
  • E.g. Sales, Inventory, Finance
• Multi-dimensional space
  • Subset of dimensions and measures in the cube
• AS2000 comparison
  • Virtual Cube -> Cube
  • Cube -> Measure Group

Measure Group

             Sales  Inventory  Finance
Customers    X
Products     X      X
Time         X      X          X
Promotions   X
Warehouse           X
Department                     X
Account                        X
Scenario                       X

Agenda

• Server architecture and UDM Basics
• Optimizing the cube design
• Partitioning and Aggregations
• Processing
• Queries and calculations
• Conclusion

Top 3 Tenets of Good Cube Design

• Attribute relationships
• Attribute relationships
• Attribute relationships

Attribute Relationships

• One-to-many relationships between attributes
• Server simply “works better” if you define them where applicable
• Examples:
  • City -> State, State -> Country
  • Day -> Month, Month -> Quarter, Quarter -> Year
  • Product Subcategory -> Product Category
• Rigid vs. flexible relationships (default is flexible)
  • Customer -> City, Customer -> PhoneNo are flexible
  • Customer -> BirthDate, City -> State are rigid
• All attributes implicitly related to key attribute
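Before declaring an attribute relationship such as City -> State, it helps to confirm that the data really is one-to-many, meaning each city belongs to exactly one state. The following Python sketch checks that on sample rows. The `violations` helper is hypothetical, the first three rows echo the customer table shown earlier in the deck, and the fourth row (with the made-up state code "WVL") is deliberately inconsistent to show a failure.

```python
def violations(rows, child, parent):
    """Return child values that map to more than one parent value."""
    seen = {}
    for row in rows:
        seen.setdefault(row[child], set()).add(row[parent])
    return {c: sorted(p) for c, p in seen.items() if len(p) > 1}


customers = [
    {"Customer": 123, "City": "Brussels", "State": "BRU"},
    {"Customer": 456, "City": "Gent", "State": "OVL"},
    {"Customer": 789, "City": "Antwerpen", "State": "ANT"},
    {"Customer": 790, "City": "Gent", "State": "WVL"},  # hypothetical bad row
]

print(violations(customers[:3], "City", "State"))  # clean data
print(violations(customers, "City", "State"))      # Gent maps to two states
```

An empty result means the relationship is safe to define; any entry in the result points at data that would need cleaning first.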

Attribute Relationships (continued)

[Diagram: Customer key attribute with City, State, Country, Gender, Marital, and Age]

Attribute Relationships (continued)

[Diagram: Customer key attribute with City, State, and Country arranged as a chain, plus Gender, Marital, and Age]

Attribute Relationships - Where are they used?

• MDX Semantics
  • Tells the formula engine how to roll up measure values
  • If the grain of the measure group is different from the key attribute (e.g. Sales by Month)
    • Attribute relationships from grain to other attributes required (e.g. Month -> Quarter, Quarter -> Year)
    • Otherwise no data (NULL) returned for Quarter and Year
  • MDX Semantics explained in detail at: http://www.sqlserveranalysisservices.com/OLAPPapers/AttributeRelationships.htm
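The roll-up behaviour can be pictured with a simplified Python model: sales recorded at Month grain can only be aggregated to Quarter and Year if something says which quarter each month belongs to, and which year each quarter belongs to. This is an analogy under stated assumptions, not the formula engine's algorithm; the mappings, the sales figures, and the `roll_up` helper are hypothetical, and returning `None` stands in for the NULL the slide mentions.

```python
from collections import defaultdict

# Attribute relationships expressed as child -> parent lookups.
MONTH_TO_QUARTER = {"2006-01": "2006-Q1", "2006-02": "2006-Q1", "2006-04": "2006-Q2"}
QUARTER_TO_YEAR = {"2006-Q1": "2006", "2006-Q2": "2006"}

sales_by_month = {"2006-01": 100, "2006-02": 150, "2006-04": 70}


def roll_up(values, relationship):
    """Aggregate values to the parent level; None if no relationship exists."""
    if relationship is None:
        return None  # nothing links the grain to the requested level
    totals = defaultdict(int)
    for member, value in values.items():
        totals[relationship[member]] += value
    return dict(totals)


by_quarter = roll_up(sales_by_month, MONTH_TO_QUARTER)
by_year = roll_up(by_quarter, QUARTER_TO_YEAR)
missing = roll_up(sales_by_month, None)

print(by_quarter, by_year, missing)
```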


Attribute Relationships: Where are they used?

Storage
  Reduces redundant relationships between dimension members – normalizes dimension storage
  Enables clustering of records within partition segments (e.g. store facts for a month together)
Processing
  Reduces memory consumption in dimension processing – fewer hash tables to fit in memory
  Allows large dimensions to push 32-bit barrier
  Speeds up dimension and partition processing overall


Attribute Relationships: Where are they used?

Query performance
  Dimension storage access is faster
  Produces more optimal execution plans
Aggregation design
  Enables aggregation design algorithm to produce effective set of aggregations
Dimension security
  DeniedSet = {Country.Ireland} should deny cities and customers in Ireland – requires attribute relationships
Member properties
  Attribute relationships identify member properties on levels


Attribute Relationships: How to set them up?

Creating an attribute relationship is easy, but …
  Pay careful attention to the key columns!
  Make sure every attribute has unique key columns (add composite keys as needed)
There must be a 1:M relation between the key columns of the two attributes
Invalid key columns cause a member to have multiple parents
  Dimension processing picks one parent arbitrarily and succeeds
  Hierarchy looks wrong!
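The 1:M requirement can be checked against the dimension table before processing. The following is a minimal sketch, assuming the dimension rows are available as dictionaries; the function name, column names, and sample rows are hypothetical and not part of Analysis Services.

```python
from collections import defaultdict

def members_with_multiple_parents(rows, child_key, parent_key):
    """Return {child member: sorted parent members} for children seen under more than one parent."""
    parents = defaultdict(set)
    for row in rows:
        child = tuple(row[c] for c in child_key)
        parent = tuple(row[c] for c in parent_key)
        parents[child].add(parent)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}

# Hypothetical customer dimension rows.
rows = [
    {"City": "Springfield", "State": "IL"},
    {"City": "Springfield", "State": "MA"},
    {"City": "Boston", "State": "MA"},
]

# City keyed on the city name alone: Springfield ends up with two parent states.
bad = members_with_multiple_parents(rows, ["City"], ["State"])
# City keyed on the composite (State, City): every member has exactly one parent.
good = members_with_multiple_parents(rows, ["State", "City"], ["State"])
```

An empty result means the key columns support a 1:M relationship for the rows checked; a non-empty result lists the members that would be assigned a parent arbitrarily.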


Attribute Relationships: How to set them up?

Don’t forget to remove redundant relationships!
  All attributes start with relationship to key
  Customer → City → State → Country
    Customer → State (redundant)
    Customer → Country (redundant)
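Redundant relationships can be found mechanically: a direct relationship is redundant when the same target is still reachable through a longer chain. The sketch below assumes the relationships are given as (attribute, related attribute) pairs and contain no cycles; the function name is hypothetical.

```python
def redundant_relationships(relationships):
    """Return direct relationships that are already implied by a longer chain."""
    direct = {}
    for source, target in relationships:
        direct.setdefault(source, set()).add(target)

    def reachable(start, skip_edge):
        # Attributes reachable from start without using skip_edge.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in direct.get(node, ()):
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                seen.add(nxt)
                stack.append(nxt)
        return seen

    return sorted((s, t) for s, t in relationships if t in reachable(s, (s, t)))

relationships = [
    ("Customer", "City"), ("City", "State"), ("State", "Country"),
    ("Customer", "State"), ("Customer", "Country"),
]
redundant = redundant_relationships(relationships)
# redundant == [("Customer", "Country"), ("Customer", "State")]
```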


Attribute Relationships: Example

Time dimension
  Day, Week, Month, Quarter, Year
    Year: 2003 to 2010
    Quarter: 1 to 4 - Key columns (Year, Quarter)
    Month: 1 to 12 - Key columns (Year, Month)
    Week: 1 to 52 - Key columns (Year, Week)
    Day: 20030101 to 20101231

[Diagram of the Time dimension attributes: Day, Week, Month, Quarter, Year]
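The effect of the composite key columns can be seen by counting distinct members over the 2003 to 2010 range. This is an illustrative sketch only; the helper names are hypothetical and the calendar is generated in Python rather than read from a real dimension table.

```python
from datetime import date, timedelta

def distinct_members(days, key):
    """Count the distinct attribute members produced by a key function."""
    return len({key(d) for d in days})

first, last = date(2003, 1, 1), date(2010, 12, 31)
days = [first + timedelta(n) for n in range((last - first).days + 1)]

def quarter(d):
    return (d.month - 1) // 3 + 1

# Keyed on the quarter or month number alone, members are shared across all years.
quarter_only = distinct_members(days, quarter)
month_only = distinct_members(days, lambda d: d.month)

# Keyed on (Year, Quarter) or (Year, Month), each member belongs to exactly one year.
year_quarter = distinct_members(days, lambda d: (d.year, quarter(d)))
year_month = distinct_members(days, lambda d: (d.year, d.month))
```

With single-column keys there are only 4 quarters and 12 months, each with eight possible parent years; with composite keys there are 32 quarters and 96 months, each with one parent.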


Defining Attribute Relationships


User Defined Hierarchies

Pre-defined navigation paths through dimensional space defined by attributes
Attribute hierarchies enable ad hoc navigation
Why create user defined hierarchies then?
  Guide end users to interesting navigation paths
  Existing client tools are not “attribute aware”
  Performance
    Optimize navigation path at processing time
    Materialization of hierarchy tree on disk
    Aggregation designer favors user defined hierarchies


Natural Hierarchies

1:M relation (via attribute relationships) between every pair of adjacent levels
Examples:
  Country-State-City-Customer (natural)
  Country-City (natural)
  State-Customer (natural)
  Age-Gender-Customer (unnatural)
  Year-Quarter-Month (depends on key columns)
    How many quarters and months?
    4 & 12 across all years (unnatural)
    4 & 12 for each year (natural)
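The definition above can be expressed as a reachability test over the attribute relationships: a hierarchy is natural when each level determines the level above it, directly or through a chain. The sketch below is a simplified model with hypothetical names, not how the server stores relationships.

```python
def is_natural(levels, relationships):
    """levels are listed top-down; relationships are (attribute, related attribute) pairs."""
    direct = {}
    for source, target in relationships:
        direct.setdefault(source, set()).add(target)

    def implies(lower, upper):
        # True when a chain of attribute relationships leads from lower to upper.
        seen, stack = set(), [lower]
        while stack:
            node = stack.pop()
            for nxt in direct.get(node, ()):
                if nxt == upper:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return all(implies(lower, upper) for upper, lower in zip(levels, levels[1:]))

relationships = [
    ("Customer", "City"), ("City", "State"), ("State", "Country"),
    ("Customer", "Gender"), ("Customer", "Age"),
]

natural_full = is_natural(["Country", "State", "City", "Customer"], relationships)
natural_skip = is_natural(["Country", "City"], relationships)
unnatural = is_natural(["Age", "Gender", "Customer"], relationships)
```

Country-City is natural even though the State level is skipped, because City still reaches Country through State. Age-Gender-Customer fails because Gender does not determine Age.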


Natural Hierarchies: Best Practice for Hierarchy Design

Performance implications
  Only natural hierarchies are materialized on disk during processing
  Unnatural hierarchies are built on the fly during queries (and cached in memory)
  Server internally decomposes unnatural hierarchies into natural components
    Essentially operates like ad hoc navigation path (but somewhat better)
Create natural hierarchies where possible
  Using attribute relationships
  Not always appropriate (e.g. Age-Gender)


Best Practices for Cube Design

Dimensions
  Consolidate multiple hierarchies into single dimension (unless they are related via fact table)
  Avoid ROLAP storage mode if performance is key
  Use role playing dimensions (e.g. OrderDate, BillDate, ShipDate) - avoids multiple physical copies
  Use parent-child dimensions prudently
    No intermediate level aggregation support
  Use many-to-many dimensions prudently
    Slower than regular dimensions, but faster than calculations
    Intermediate measure group must be “small” relative to primary measure group


Best Practices for Cube Design

Attributes
  Define all possible attribute relationships!
  Mark attribute relationships as rigid where appropriate
  Use integer (or numeric) key columns
  Set AttributeHierarchyEnabled to false for attributes not used for navigation (e.g. Phone#, Address)
  Set AttributeHierarchyOptimizedState to NotOptimized for infrequently used attributes
  Set AttributeHierarchyOrdered to false if the order of members returned by queries is not important
Hierarchies
  Use natural hierarchies where possible


Best Practices for Cube Design

Measures
  Use smallest numeric data type possible
  Use semi-additive aggregate functions instead of MDX calculations to achieve same behavior
  Put distinct count measures into separate measure group (BIDS does this automatically)
  Avoid string source column for distinct count measures


Agenda

Server architecture and UDM Basics
Optimizing the cube design
Partitioning and Aggregations
Processing
Queries and calculations
Conclusion


Partitioning

Mechanism to break up large cube into manageable chunks
Partitions can be added, processed, deleted independently
  Update to last month’s data does not affect prior months’ partitions
  Sliding window scenario easy to implement
    E.g. 24 month window → add June 2006 partition and delete June 2004
Partitions can have different storage settings

Partitions require Enterprise Edition!
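The sliding window arithmetic from the example above can be sketched as follows. This assumes one partition per calendar month; the function name and the (year, month) representation are hypothetical.

```python
def sliding_window(newest, window_months):
    """Return (partition to add, partition to delete) as (year, month) pairs."""
    year, month = newest
    # Count months from year 0, then step back by the window length.
    index = year * 12 + (month - 1) - window_months
    return newest, (index // 12, index % 12 + 1)

to_add, to_delete = sliding_window((2006, 6), 24)
# to_add == (2006, 6), to_delete == (2004, 6)
```

After the swap the cube holds July 2004 through June 2006, which is exactly 24 monthly partitions.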


Benefits of Partitioning

Partitions can be processed and queried in parallel
  Better utilization of server resources
  Reduced data warehouse load times
Queries are isolated to relevant partitions → less data to scan
  SELECT … FROM … WHERE [Time].[Year].[2006]
  Queries only 2006 partitions
Bottom line → partitions enable:
  Manageability
  Performance
  Scalability
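The idea of isolating a query to the relevant partitions can be illustrated with a small model. This is only a conceptual sketch: the partition names are hypothetical, and each slice is reduced to a set of years, which is far simpler than what the server actually tracks.

```python
def partitions_to_scan(partitions, year):
    """partitions maps a partition name to the set of years covered by its slice."""
    return sorted(name for name, years in partitions.items() if year in years)

# Hypothetical partition layout for a sales measure group.
partitions = {
    "Sales_2004": {2004},
    "Sales_2005": {2005},
    "Sales_2006_H1": {2006},
    "Sales_2006_H2": {2006},
}

scanned = partitions_to_scan(partitions, 2006)
# scanned == ["Sales_2006_H1", "Sales_2006_H2"]
```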


Best Practices for Partitioning

No more than 20M rows per partition
Specify partition slice
  Optional for MOLAP – server auto-detects the slice and validates against user specified slice (if any)
  Must be specified for ROLAP
Manage storage settings by usage patterns
  Frequently queried → MOLAP with lots of aggs
  Periodically queried → MOLAP with fewer or no aggs
  Historical → ROLAP with no aggs
Alternate disk drive - use multiple controllers to avoid I/O contention


Best Practices for Aggregations

Define all possible attribute relationships
Set accurate attribute member counts and fact table counts
Set AggregationUsage to guide agg designer
  Set rarely queried attributes to None
  Set commonly queried attributes to Unrestricted
Do not build too many aggregations
  In the 100s, not 1000s!
Do not build aggregations larger than 30% of fact table size (agg design algorithm doesn’t)


Best Practices for Aggregations

Aggregation design cycle
  Use Storage Design Wizard (~20% perf gain) to design initial set of aggregations
  Enable query log and run pilot workload (beta test with limited set of users)
  Use Usage Based Optimization (UBO) Wizard to refine aggregations
    Use larger perf gain (70-80%)
  Reprocess partitions for new aggregations to take effect
  Periodically use UBO to refine aggregations



Improving Processing

SQL Server Performance Tuning
  Improve the queries that are used for extracting data from SQL Server
    Check for proper plans and indexing
    Conduct regular SQL performance tuning process
AS Processing Improvements
  Use SP2!
    Processing 20 partitions: SP1 1:56, SP2 1:06
  Don’t accept the UI default for parallel processing
    Go into advanced processing tab and change it
  Monitor the values:
    Maximum number of datasource connections
    MaxParallel – how many partitions are processed in parallel; don’t let the server decide on its own
  Use INT for keys, if possible

Parallel processing requires Enterprise Edition!


Improving Processing

For best performance use ASCMD.EXE and XMLA
  Use <Parallel> </Parallel> to group processing tasks together until Server is using maximum resources
  Proper use of <Transaction> </Transaction>
ProcessFact and ProcessIndex separately instead of ProcessFull (for large partitions)
  Consumes less memory.
  ProcessClearIndexes deletes existing indexes and ProcessIndexes generates or reprocesses existing ones.


Best Practices for Processing

Partition processing
  Monitor aggregation processing spilling to disk (perfmon counters for temp file usage)
    Add memory, turn on /3GB, move to x64/ia64
  Fully process partitions periodically
    Achieves better compression over repeated incremental processing
Data sources
  Avoid using .NET data sources – OLEDB is faster for processing



Non_Empty_Behavior

Most client tools (Excel, ProClarity) display non empty results – eliminate members with no data
With no calculations, non empty is fast – just checks fact data
With calculations, non empty can be slow – requires evaluating formula for each cell
Non_Empty_Behavior allows non empty on calculations to just check fact data
  Note: query processing hint – use with care!

Create Member [Measures].[Internet Gross Profit] As
  [Internet Sales Amount] - [Internet Total Cost],
  Format_String = "Currency",
  Non_Empty_Behavior = [Internet Sales Amount];    BAD!
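One plausible reading of why the slide flags this definition as BAD is that the hint names only one of the two measures in the formula, so a cell with a cost but no sales would be dropped even though the calculation has a value. The sketch below models that mismatch with hypothetical cell data; it imitates MDX null handling in plain Python and is not how the server evaluates the hint.

```python
def gross_profit(sales, cost):
    """Null-aware subtraction: empty only when both inputs are empty."""
    if sales is None and cost is None:
        return None
    return (sales or 0) - (cost or 0)

def non_empty_by_formula(cells):
    # Evaluate the calculation for every cell and keep those with a value.
    return [k for k, (sales, cost) in cells.items() if gross_profit(sales, cost) is not None]

def non_empty_by_hint(cells):
    # Shortcut implied by the hint: empty whenever Internet Sales Amount is empty.
    return [k for k, (sales, _cost) in cells.items() if sales is not None]

# Hypothetical (sales, cost) values per product; None stands for an empty cell.
cells = {"Bikes": (100.0, 60.0), "Helmets": (None, 5.0), "Gloves": (None, None)}

by_formula = non_empty_by_formula(cells)
by_hint = non_empty_by_hint(cells)
```

Here the formula keeps Bikes and Helmets while the hint keeps only Bikes, so the hint changes the query result rather than just speeding it up.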


Auto-Exists

Attributes/hierarchies within a dimension are always existed together
  City.Brussels * Country.Members returns {(Brussels, Belgium)}
  (Dublin, Belgium), (Dublin, USA) do not “exist”
Exploit the power of auto-exists
  Use Exists/CrossJoin instead of .Properties – faster
  Requires attribute hierarchy enabled on member property

  Filter(Customer.Members, Customer.CurrentMember.Properties("Gender") = "Male")

  Exists(Customer.Members, Gender.[Male])


Conditional Statement: IIF

Use scopes instead of conditions such as IIf/Case
  Scopes are evaluated once statically
  Conditions are evaluated dynamically for each cell
Always try to coerce IIF for one branch to be null

Create Member Measures.Sales As
  Iif(Currency.CurrentMember Is Currency.USD,
      Measures.SalesUSD, Measures.SalesUSD * Measures.XRate);

Create Member Measures.Sales As Null;
Scope(Measures.Sales, Currency.Members);
  This = Measures.SalesUSD * Measures.XRate;
  Scope(Currency.USD);
    This = Measures.SalesUSD;
  End Scope;
End Scope;


Best Practices for MDX

Use calc members instead of calc cells where possible
Use .MemberValue for calcs on numeric attributes
  Filter(Customer.members, Salary.MemberValue > 100000)
Avoid redundant use of .CurrentMember and .Value
  (Time.CurrentMember.PrevMember, Measures.CurrentMember).Value can be replaced with Time.PrevMember
Avoid LinkMember, StrToSet, StrToMember, StrToValue
Replace simple calcs with computed columns in DSV
  Calculation done at processing time is always better
Many more at:
  Analysis Services Performance Whitepaper: http://download.microsoft.com/download/8/5/e/85eea4fa-b3bb-4426-97d0-7f7151b2011c/SSAS2005PerfGuide.doc
  http://sqljunkies.com/weblog/mosha
  http://sqlserveranalysisservices.com


Conclusion

AS2005 is a major re-architecture from AS2000
Design for perf & scalability from the start
Many principles carry through from AS2000
  Dimensional design, Partitioning, Aggregations
Many new principles in AS2005
  Attribute relationships, natural hierarchies
  New design alternatives – role playing, many-to-many, reference dimensions, semi-additive measures
  Flexible processing options
  MDX scripts, scopes
Use Analysis Services with SQL Server Enterprise Edition to get max performance and scale


Resources

SSIS
  SQL Server Integration Services site – links to blogs, training, partners, etc.: http://msdn.microsoft.com/SQL/sqlwarehouse/SSIS/default.aspx
  SSIS MSDN Forum: http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=80&SiteID=1
  SSIS MVP community site: http://www.sqlis.com
SSAS
  BLOGS: http://blogs.msdn.com/sqlcat
  PROJECT REAL-Business Intelligence in Practice
  Analysis Services Performance Guide
  TechNet: Analysis Services for IT Professionals
Microsoft BI
  SQL Server Business Intelligence public site: http://www.microsoft.com/sql/evaluation/bi/default.asp
  http://www.microsoft.com/bi
