Upload
james-barker
View
216
Download
0
Embed Size (px)
Citation preview
Satisfy Your Technical Curiosity
Creating High Impact Data Warehouses Creating High Impact Data Warehouses with SQL Server Integration Services with SQL Server Integration Services
and Analysis Servicesand Analysis Services
Ashvini SharmaAshvini SharmaSenior Program ManagerSenior Program [email protected]@microsoft.com
Satisfy Your Technical Curiosity
A DW ArchitectureA DW Architecture
Data Warehouse(SQL Server, Oracle,
DB2, Teradata)
Data Warehouse(SQL Server, Oracle,
DB2, Teradata)
SQL/OracleSQL/Oracle SAP/Dynamics
SAP/Dynamics LegacyLegacy TextText XMLXML
Integration ServicesIntegration Services
ReportsReports Dashboards
Dashboards ScorecardsScorecards ExcelExcel BI toolsBI tools
Analysis ServicesAnalysis Services
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
Session ObjectivesSession ObjectivesAssumptionsAssumptions
Experience with SSIS and SSASExperience with SSIS and SSAS
GoalsGoalsDiscuss design, performance, and scalability for building ETL Discuss design, performance, and scalability for building ETL packages and cubes (UDMs)packages and cubes (UDMs)Best practicesBest practicesCommon mistakesCommon mistakes
Satisfy Your Technical Curiosity
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
SQL Server 2005 Best Practices SQL Server 2005 Best Practices AnalyzerAnalyzer
Utility that scans your SQL Server metadata and recommends Utility that scans your SQL Server metadata and recommends best practicesbest practicesBest practices from dev team and Customer Support ServicesBest practices from dev team and Customer Support ServicesWhat’s new:What’s new:
Support for SQL Server 2005Support for SQL Server 2005Support for Analysis Services and Integration ServicesSupport for Analysis Services and Integration ServicesScan schedulingScan schedulingAuto update frameworkAuto update frameworkCTP available now, RTM April CTP available now, RTM April
http://www.microsoft.com/downloads/details.aspx?FamilyId=DA0531E4-http://www.microsoft.com/downloads/details.aspx?FamilyId=DA0531E4-E94C-4991-82FA-F0E3FBD05E63&displaylang=enE94C-4991-82FA-F0E3FBD05E63&displaylang=en
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
AgendaAgendaIntegration ServicesIntegration Services
Quick overview of ISQuick overview of ISPrinciples of Good Package DesignPrinciples of Good Package DesignComponent DrilldownComponent DrilldownPerformance TuningPerformance Tuning
Analysis ServicesAnalysis ServicesUDM overviewUDM overviewUDM design best practicesUDM design best practicesPerformance tipsPerformance tips
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
What is SQL Server Integration Services?What is SQL Server Integration Services?
Introduced in SQL Server Introduced in SQL Server 20052005The successor to Data The successor to Data Transformation ServicesTransformation ServicesThe platform for a new The platform for a new generation of high-generation of high-performance data performance data integration technologiesintegration technologies
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
Call center data: semi structured
Legacy data: binary files
Application database
ETL Warehouse
Reports
Mobiledata
Data mining
Alerts & escalation
•Integration and warehousing require separate, staged operations.•Preparation of data requires different, often incompatible, tools.•Reporting and escalation is a slow process, delaying smart responses.•Heavy data volumes make this scenario increasingly unworkable.
Handcoding
StagingText mining
ETL Staging
Cleansing &
ETL
Staging
ETL
ETL Objective: Before SSISETL Objective: Before SSIS
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
Call center: semi-structured data
Legacy data: binary files
Application database
•Integration and warehousing are a seamless, manageable operation.•Source, prepare, and load data in a single, auditable process.•Reporting and escalation can be parallelized with the warehouse load.•Scales to handle heavy and complex data requirements.
SQL Server Integration Services
Text miningcomponents
Customsource
Standardsources
Data-cleansingcomponents
Merges
Data miningcomponents
Warehouse
Reports
Mobiledata
Alerts & escalation
Changing the Game with SSISChanging the Game with SSIS
Satisfy Your Technical Curiosity
SSIS ArchitectureSSIS ArchitectureControl Flow (Runtime)Control Flow (Runtime)
A parallel workflow engineA parallel workflow engineExecutes containers and tasksExecutes containers and tasks
Data Flow (“Pipeline”)Data Flow (“Pipeline”)A special runtime taskA special runtime taskA high-performance data pipelineA high-performance data pipelineApplies graphs of components to data Applies graphs of components to data movementmovementComponent can be sources, Component can be sources, transformations or destinationstransformations or destinationsHighly parallel operations possibleHighly parallel operations possible
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
AgendaAgendaOverview of Integration ServicesOverview of Integration ServicesPrinciples of Good Package DesignPrinciples of Good Package DesignComponent DrilldownComponent DrilldownPerformance TuningPerformance Tuning
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
Principles of Good Package Design - GeneralPrinciples of Good Package Design - General
Follow Microsoft Development GuidelinesFollow Microsoft Development GuidelinesIterative design, development & testingIterative design, development & testing
Understand the BusinessUnderstand the BusinessUnderstanding the people & processes are critical for successUnderstanding the people & processes are critical for successKimball’s “Data Warehouse ETL Toolkit” book is an excellent referenceKimball’s “Data Warehouse ETL Toolkit” book is an excellent reference
Get the big pictureGet the big pictureResource contention, processing windows, …Resource contention, processing windows, …SSIS does not forgive bad database designSSIS does not forgive bad database designOld principles still apply – e.g. load with/without indexes?Old principles still apply – e.g. load with/without indexes?
Platform considerationsPlatform considerationsWill this run on IA64 / X64?Will this run on IA64 / X64?
No BIDS on IA64 – how will I debug?No BIDS on IA64 – how will I debug?Is OLE-DB driver XXX available on IA64?Is OLE-DB driver XXX available on IA64?
Memory and resource usage on different platformsMemory and resource usage on different platforms
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
Principles of Good Package Design - Principles of Good Package Design - ArchitectureArchitecture
Process ModularityProcess ModularityBreak complex ETL into logically distinct packages (vs. Break complex ETL into logically distinct packages (vs. monolithic design)monolithic design)Improves development & debug experienceImproves development & debug experience
Package ModularityPackage ModularitySeparate sub-processes within package into separate Separate sub-processes within package into separate ContainersContainersMore elegant, easier to developMore elegant, easier to developSimple to disable whole Containers when debuggingSimple to disable whole Containers when debugging
Component ModularityComponent ModularityUse Script Task/Transform for one-off problemsUse Script Task/Transform for one-off problemsBuild custom components for maximum re-useBuild custom components for maximum re-use
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
Bad ModularityBad Modularity
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
Good ModularityGood Modularity
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
Principles of Good Package Design - Principles of Good Package Design - InfrastructureInfrastructure
Use Package ConfigurationsUse Package ConfigurationsBuild it in from the startBuild it in from the start
Will make things easier later onWill make things easier later onSimplify deployment Dev Simplify deployment Dev QA QA Production Production
Use Package LoggingUse Package LoggingPerformance & debuggingPerformance & debugging
Build in Security from the startBuild in Security from the startCredentials and other sensitive infoCredentials and other sensitive infoPackage & Process IPPackage & Process IPConfigurations & ParametersConfigurations & Parameters
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
Principles of Good Package Design - Principles of Good Package Design - DevelopmentDevelopment
SSIS is visual programming! SSIS is visual programming! Use source code control systemUse source code control system
Undo is not as simple in a GUI environment!Undo is not as simple in a GUI environment!Improved experience for multi-developer environmentImproved experience for multi-developer environment
Comment your packages and scriptsComment your packages and scriptsIn 2 weeks even you may forget a subtlety of your designIn 2 weeks even you may forget a subtlety of your designSomeone else has to maintain your codeSomeone else has to maintain your code
Use error-handlingUse error-handlingUse the correct precedence constraints on tasksUse the correct precedence constraints on tasksUse the error outputs on transforms – store them in a table for Use the error outputs on transforms – store them in a table for processing later, or use downstream if the error can be handled in the processing later, or use downstream if the error can be handled in the packagepackageTry…Catch in your scriptsTry…Catch in your scripts
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
Component Drilldown - Tasks & TransformsComponent Drilldown - Tasks & Transforms
Avoid over-designAvoid over-designToo many moving parts is inelegant and likely slowToo many moving parts is inelegant and likely slowBut don’t be afraid to experiment – there are many ways to But don’t be afraid to experiment – there are many ways to solve a problemsolve a problem
Maximize ParallelismMaximize ParallelismAllocate enough threadsAllocate enough threadsEngineThreads property on DataFlow TaskEngineThreads property on DataFlow Task““Rule of thumb” - # of datasources + # of async componentsRule of thumb” - # of datasources + # of async components
Minimize blockingMinimize blockingSynchronous vs. Asynchronous componentsSynchronous vs. Asynchronous componentsMemcopy is expensive – reduce the number of Memcopy is expensive – reduce the number of asynchronous components in a flow if possible – example asynchronous components in a flow if possible – example coming up latercoming up later
Minimize ancillary dataMinimize ancillary dataFor example, minimize data retrieved by LookupTxFor example, minimize data retrieved by LookupTx
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
Debugging & Performance Tuning - GeneralDebugging & Performance Tuning - General
Leverage the logging and auditing featuresLeverage the logging and auditing featuresMsgBox is your friendMsgBox is your friendPipeline debuggers are your friendPipeline debuggers are your friendUse the throughput component from Project REALUse the throughput component from Project REAL
Experiment with different techniquesExperiment with different techniquesUse source code control system Use source code control system Focus on the bottlenecks – methodology discussed laterFocus on the bottlenecks – methodology discussed later
Test on different platformsTest on different platforms32bit, IA64, x6432bit, IA64, x64Local Storage, SANLocal Storage, SANMemory considerationsMemory considerationsNetwork & topology considerationsNetwork & topology considerations
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
Debugging & Performance Tuning - VolumeDebugging & Performance Tuning - Volume
Remove redundant columnsRemove redundant columnsUse SELECT statements as opposed to tablesUse SELECT statements as opposed to tablesSELECT * is your enemySELECT * is your enemyAlso remove redundant columns after every async component!Also remove redundant columns after every async component!
Filter rowsFilter rowsWHERE clause is your friendWHERE clause is your friendConditional Split in SSISConditional Split in SSISConcatenate or re-route unneeded columnsConcatenate or re-route unneeded columns
Parallel loadingParallel loadingSource system split source data into multiple chunksSource system split source data into multiple chunks
Flat Files – multiple filesFlat Files – multiple filesRelational – via key fields and indexesRelational – via key fields and indexes
Multiple Destination components all loading same tableMultiple Destination components all loading same table
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
Debugging & Performance Tuning - Debugging & Performance Tuning - ApplicationApplication
Is BCP good enough?Is BCP good enough?Overhead of starting up an SSIS package may offset any performance gain over Overhead of starting up an SSIS package may offset any performance gain over BCP for small data setsBCP for small data setsIs the greater manageability and control of SSIS needed?Is the greater manageability and control of SSIS needed?
Which pattern?Which pattern?Many Lookup patterns possible – which one is most suitable?Many Lookup patterns possible – which one is most suitable?See Project Real for examples of patterns:See Project Real for examples of patterns:http://www.microsoft.com/sql/solutions/bi/projectreal.mspxhttp://www.microsoft.com/sql/solutions/bi/projectreal.mspx
Which component?Which component?Bulk Import Task vs. Data FlowBulk Import Task vs. Data Flow
Bulk Import might give better performance if there are no transformations or filtering required, Bulk Import might give better performance if there are no transformations or filtering required, and the destination is SQL Server.and the destination is SQL Server.
Lookup vs. MergeJoin (LeftJoin) vs. set based statements in SQLLookup vs. MergeJoin (LeftJoin) vs. set based statements in SQLMergeJoin might be required if you’re not able to populate the lookup cache.MergeJoin might be required if you’re not able to populate the lookup cache.Set based SQL statements might provide a way to persist lookup cache misses and apply a set Set based SQL statements might provide a way to persist lookup cache misses and apply a set based operation for higher performance.based operation for higher performance.
Script vs. custom componentScript vs. custom componentScript might be good enough for small transforms that’re typically not reusedScript might be good enough for small transforms that’re typically not reused
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
Case Study - PatternsCase Study - Patterns
105 seconds 83 seconds
Use Error Output for handling Lookup miss
Ignore lookup errors and check for null looked up values in Derived Column
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
Debugging & Performance Tuning – Debugging & Performance Tuning – A methodologyA methodology
Optimize and Stabilize the basicsOptimize and Stabilize the basicsMinimize staging (else use RawFiles if possible)Minimize staging (else use RawFiles if possible)Make sure you have enough MemoryMake sure you have enough MemoryWindows, Disk, Network, …Windows, Disk, Network, …SQL FileGroups, Indexing, PartitioningSQL FileGroups, Indexing, Partitioning
Get BaselineGet BaselineReplace destinations with RowCountReplace destinations with RowCountSource->RowCount throughputSource->RowCount throughputSource->Destination throughputSource->Destination throughput
Incrementally add/change components to see effectIncrementally add/change components to see effectThis could include the DB layerThis could include the DB layerUse source code control!Use source code control!
Optimize slow components for resources availableOptimize slow components for resources available
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
Case Study - ParallelismCase Study - ParallelismFocus on critical pathFocus on critical pathUtilize available resourcesUtilize available resources
Memory Constrained Reader and CPU Constrained
Let it rip! Optimize the slowest
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
SummarySummaryFollow best practice development methodsFollow best practice development methodsUnderstand how SSIS architecture influences performanceUnderstand how SSIS architecture influences performance
Buffers, component typesBuffers, component typesDesign PatternsDesign Patterns
Learn the new featuresLearn the new featuresBut do not forget the existing principlesBut do not forget the existing principles
Use the native functionalityUse the native functionalityBut do not be afraid to extendBut do not be afraid to extend
Measure performanceMeasure performanceFocus on the bottlenecksFocus on the bottlenecks
Maximize parallelism and memory use where appropriateMaximize parallelism and memory use where appropriateBe aware of different platforms capabilities (64bit RAM)Be aware of different platforms capabilities (64bit RAM)
Testing is keyTesting is key
Satisfy Your Technical Curiosity
Analysis ServicesAnalysis Services
Satisfy Your Technical Curiosity
Clippy® for Business Intelligence!Clippy® for Business Intelligence!
Satisfy Your Technical Curiosity
AgendaAgenda
Server architecture and UDM basicsServer architecture and UDM basicsOptimizing the cube designOptimizing the cube designPartitioning and AggregationsPartitioning and AggregationsProcessingProcessingQueries and calculationsQueries and calculationsConclusionConclusion
Satisfy Your Technical Curiosity
Analysis Server
OLEDB
ADOMD .NET
AMO IIS
TCP
HTTP
XMLA
ADOMD
Client Apps
BIDS
SSMS
Profiler
Excel
Client Server ArchitectureClient Server Architecture
Satisfy Your Technical Curiosity
DimensionDimension
An entity on which analysis is to be performed An entity on which analysis is to be performed (e.g. Customers)(e.g. Customers)Consists of:Consists of:
Attributes that describe the entityAttributes that describe the entityHierarchies that organize dimension members in Hierarchies that organize dimension members in meaningful waysmeaningful ways
CustomerID
FirstName
LastName
State
City MaritalStatus
Gender
… Age
123 Tom Martens BRU Brussels Married Male … 42
456 Frank Desmet OVL Gent Unmarried
Male … 34
789 Ilse Vermeersch ANT Antwerpen
Married Female
… 21
Satisfy Your Technical Curiosity
AttributeAttribute
Containers of dimension members.Containers of dimension members.Completely define the dimensional space.Completely define the dimensional space.Enable slicing and grouping the dimensional Enable slicing and grouping the dimensional space in interesting ways.space in interesting ways.
Customers in Customers in Brussels Brussels and and age > 50age > 50Customers who are Customers who are marriedmarried and and drink Beerdrink Beer
Typically have one-many relationshipsTypically have one-many relationshipsCity City State, State State, State Country, etc. Country, etc.All attributes implicitly related to the keyAll attributes implicitly related to the key
Satisfy Your Technical Curiosity
Ordered collection of attributes into levelsOrdered collection of attributes into levelsNavigation path through dimensional spaceNavigation path through dimensional spaceUser defined hierarchies – typically multiple levelsUser defined hierarchies – typically multiple levelsAttribute hierarchies – implicitly created for each Attribute hierarchies – implicitly created for each attribute – single levelattribute – single level
HierarchyHierarchy
Customers by Geography
Country
State
City
Customer
Customers by Demographics
Marital
Gender
Customer
Satisfy Your Technical Curiosity
Customer
City
State
Country
Gender Marital
Country
State
City
Customer
Gender
Customer
Marital
Gender
Customer
Customer
City
State
Country
Gender
Marital
Attributes Hierarchies
Age
Dimension ModelDimension Model
Satisfy Your Technical Curiosity
CubeCube
Collection of dimensions and measuresCollection of dimensions and measuresMeasure Measure numeric data associated with a numeric data associated with a set of dimensions (e.g. Qty Sold, Sales set of dimensions (e.g. Qty Sold, Sales Amount, Cost)Amount, Cost)Multi-dimensional spaceMulti-dimensional space
Defined by dimensions and measuresDefined by dimensions and measuresE.g. (Customers, Products, Time, Measures)E.g. (Customers, Products, Time, Measures)
Intersection of dimension members and measures Intersection of dimension members and measures is is a cell (Belgium, Mussels, 2006, Sales Amount) = a cell (Belgium, Mussels, 2006, Sales Amount) = E10,523,374.83E10,523,374.83
Satisfy Your Technical Curiosity
A CubeA Cube
ProductProductPeasPeas CornCorn BreadBread SpudsSpuds ChocolateChocolate
MMaarrkkeett
BrusselsBrussels
LondonLondon
OsloOslo
DublinDublinJanJan
MarMarFebFeb
TimeTime
Units of Chocolate sold in Units of Chocolate sold in Brussels in JanuaryBrussels in January
Satisfy Your Technical Curiosity
Measure GroupMeasure Group
Group of measures with same dimensionalityGroup of measures with same dimensionalityAnalogous to fact tableAnalogous to fact tableCube can contain more than one measure Cube can contain more than one measure groupgroup
E.g. Sales, Inventory, FinanceE.g. Sales, Inventory, Finance
Multi-dimensional spaceMulti-dimensional spaceSubset of dimensions and measures in the cubeSubset of dimensions and measures in the cube
AS2000 comparisonAS2000 comparisonVirtual Cube Virtual Cube Cube CubeCube Cube Measure Group Measure Group
Satisfy Your Technical Curiosity
Sales Inventory Finance
Customers X
Products X X
Time X X X
Promotions X
Warehouse X
Department
X
Account X
Scenario X
Measure GroupMeasure Group
Measure GroupMeasure Group
Satisfy Your Technical Curiosity
AgendaAgenda
Server architecture and UDM BasicsServer architecture and UDM BasicsOptimizing the cube designOptimizing the cube designPartitioning and AggregationsPartitioning and AggregationsProcessingProcessingQueries and calculationsQueries and calculationsConclusionConclusion
Satisfy Your Technical Curiosity
Top 3 Tenets of Good Cube DesignTop 3 Tenets of Good Cube Design
Attribute relationshipsAttribute relationshipsAttribute relationshipsAttribute relationshipsAttribute relationshipsAttribute relationships
Satisfy Your Technical Curiosity
Attribute RelationshipsAttribute Relationships
One-to-many relationships between attributesOne-to-many relationships between attributesServer simply “works better” if you define them where Server simply “works better” if you define them where applicableapplicableExamples:Examples:
City City State, State State, State Country CountryDay Day Month, Month Month, Month Quarter, Quarter Quarter, Quarter Year Year Product Subcategory Product Subcategory Product Category Product Category
Rigid v/s flexible relationships (default is flexible)Rigid v/s flexible relationships (default is flexible)Customer Customer City, Customer City, Customer PhoneNo are flexible PhoneNo are flexibleCustomer Customer BirthDate, City BirthDate, City State are rigid State are rigid
All attributes implicitly related to key attributeAll attributes implicitly related to key attribute
Satisfy Your Technical Curiosity
Customer
City
State Country
Gender Marital Age
Attribute Relationships Attribute Relationships (continued)(continued)
Satisfy Your Technical Curiosity
Customer
City
State
Country
Gender Marital Age
Attribute Relationships Attribute Relationships (continued)(continued)
Satisfy Your Technical Curiosity
Attribute Relationships Attribute Relationships Where are they used?Where are they used?
MDX SemanticsMDX SemanticsTells the formula engine how to roll up measure valuesTells the formula engine how to roll up measure valuesIf the grain of the measure group is different from the key If the grain of the measure group is different from the key attribute (e.g. Sales by Month)attribute (e.g. Sales by Month)
Attribute relationships from grain to other attributes required Attribute relationships from grain to other attributes required (e.g. Month (e.g. Month Quarter, Quarter Quarter, Quarter Year) Year)Otherwise no data (NULL) returned for Quarter and YearOtherwise no data (NULL) returned for Quarter and Year
MDX Semantics explained in detail at:MDX Semantics explained in detail at:http://www.sqlserveranalysisservices.com/OLAPPapers/AttributeRelationships.htmhttp://www.sqlserveranalysisservices.com/OLAPPapers/AttributeRelationships.htm
Satisfy Your Technical Curiosity
Attribute RelationshipsAttribute RelationshipsWhere are they used?Where are they used?
StorageStorageReduces redundant relationships between dimension Reduces redundant relationships between dimension members – normalizes dimension storagemembers – normalizes dimension storageEnables clustering of records within partition segments (e.g. Enables clustering of records within partition segments (e.g. store facts for a month together)store facts for a month together)
ProcessingProcessingReduces memory consumption in dimension processing – Reduces memory consumption in dimension processing – less hash tables to fit in memoryless hash tables to fit in memoryAllows large dimensions to push 32-bit barrierAllows large dimensions to push 32-bit barrierSpeeds up dimension and partition processing overallSpeeds up dimension and partition processing overall
Satisfy Your Technical Curiosity
Attribute RelationshipsAttribute RelationshipsWhere are they used?Where are they used?
Query performanceQuery performanceDimension storage access is fasterDimension storage access is fasterProduces more optimal execution plansProduces more optimal execution plans
Aggregation designAggregation designEnables aggregation design algorithm to produce effective Enables aggregation design algorithm to produce effective set of aggregationsset of aggregations
Dimension securityDimension securityDeniedSet = {Country.Ireland} should deny cities and DeniedSet = {Country.Ireland} should deny cities and customers in Ireland – requires attribute relationshipscustomers in Ireland – requires attribute relationships
Member propertiesMember propertiesAttribute relationships identify member properties on levelsAttribute relationships identify member properties on levels
Satisfy Your Technical Curiosity
Attribute RelationshipsAttribute RelationshipsHow to set them up?How to set them up?
Creating an attribute relationship is easy, but …Creating an attribute relationship is easy, but …Pay careful attention to the key columns!Pay careful attention to the key columns!Make sure every attribute has unique key columns (add Make sure every attribute has unique key columns (add composite keys as needed)composite keys as needed)
There must be a 1:M relation between the key There must be a 1:M relation between the key columns of the two attributescolumns of the two attributesInvalid key columns cause a member to have Invalid key columns cause a member to have multiple parentsmultiple parents
Dimension processing picks one parent arbitrarily and Dimension processing picks one parent arbitrarily and succeedssucceedsHierarchy looks wrong!Hierarchy looks wrong!
Satisfy Your Technical Curiosity
Attribute RelationshipsAttribute RelationshipsHow to set them up?How to set them up?
Don’t forget to remove redundant relationships!Don’t forget to remove redundant relationships!All attributes start with relationship to keyAll attributes start with relationship to keyCustomer Customer City City State State Country Country
Customer Customer State (redundant) State (redundant)Customer Customer Country (redundant) Country (redundant)
Satisfy Your Technical Curiosity
Attribute RelationshipsAttribute RelationshipsExampleExample
Time dimensionTime dimensionDay, Week, Month, Quarter, YearDay, Week, Month, Quarter, Year
Year: 2003 to 2010Year: 2003 to 2010Quarter: 1 to 4Quarter: 1 to 4Month: 1 to 12Month: 1 to 12Week: 1 to 52Week: 1 to 52Day: 20030101 to 20101231Day: 20030101 to 20101231
Day
Week
Month
Quarter
Year
Satisfy Your Technical Curiosity
Attribute RelationshipsAttribute RelationshipsExampleExample
Time dimensionTime dimensionDay, Week, Month, Quarter, YearDay, Week, Month, Quarter, Year
Year: 2003 to 2010Year: 2003 to 2010Quarter: 1 to 4 - Quarter: 1 to 4 - Key columns (Year, Quarter)Key columns (Year, Quarter)
Month: 1 to 12Month: 1 to 12Week: 1 to 52Week: 1 to 52Day: 20030101 to 20101231Day: 20030101 to 20101231
Day
Week
Month
Quarter
Year
Satisfy Your Technical Curiosity
Time dimensionTime dimensionDay, Week, Month, Quarter, YearDay, Week, Month, Quarter, Year
Year: 2003 to 2010Year: 2003 to 2010Quarter: 1 to 4 - Quarter: 1 to 4 - Key columns (Year, Quarter)Key columns (Year, Quarter)
Month: 1 to 12 - Month: 1 to 12 - Key columns (Year, Month)Key columns (Year, Month)
Week: 1 to 52Week: 1 to 52Day: 20030101 to 20101231Day: 20030101 to 20101231
Attribute RelationshipsAttribute RelationshipsExampleExample
Day
Week
Month
Quarter
Year
Satisfy Your Technical Curiosity
Time dimensionTime dimensionDay, Week, Month, Quarter, YearDay, Week, Month, Quarter, Year
Year: 2003 to 2010Year: 2003 to 2010Quarter: 1 to 4 - Quarter: 1 to 4 - Key columns (Year, Quarter)Key columns (Year, Quarter)
Month: 1 to 12 - Month: 1 to 12 - Key columns (Year, Month)Key columns (Year, Month)
Week: 1 to 52 - Week: 1 to 52 - Key columns (Year, Week)Key columns (Year, Week) Day: 20030101 to 20101231Day: 20030101 to 20101231
Attribute RelationshipsAttribute RelationshipsExampleExample
Day
Week
Month
Quarter
Year
Satisfy Your Technical Curiosity
Defining Attribute RelationshipsDefining Attribute Relationships
Satisfy Your Technical Curiosity
User Defined HierarchiesUser Defined Hierarchies
Pre-defined navigation paths thru dimensional Pre-defined navigation paths thru dimensional space defined by attributesspace defined by attributesAttribute hierarchies enable ad hoc navigationAttribute hierarchies enable ad hoc navigationWhy create user defined hierarchies then?Why create user defined hierarchies then?
Guide end users to interesting navigation pathsGuide end users to interesting navigation pathsExisting client tools are not “attribute aware”Existing client tools are not “attribute aware”PerformancePerformance
Optimize navigation path at processing timeOptimize navigation path at processing timeMaterialization of hierarchy tree on diskMaterialization of hierarchy tree on diskAggregation designer favors user defined hierarchiesAggregation designer favors user defined hierarchies
Satisfy Your Technical Curiosity
Natural HierarchiesNatural Hierarchies
1:M relation (via attribute relationships) between 1:M relation (via attribute relationships) between every pair of adjacent levelsevery pair of adjacent levelsExamples:Examples:
Country-State-City-Customer (natural)Country-State-City-Customer (natural)Country-City (natural)Country-City (natural)State-Customer (natural)State-Customer (natural)Age-Gender-Customer (unnatural)Age-Gender-Customer (unnatural)Year-Quarter-Month (depends on key columns)Year-Quarter-Month (depends on key columns)
How many quarters and months?How many quarters and months?4 & 12 across all years (unnatural)4 & 12 across all years (unnatural)4 & 12 for each year (natural)4 & 12 for each year (natural)
Satisfy Your Technical Curiosity
Natural Hierarchies Natural Hierarchies Best Practice for Hierarchy DesignBest Practice for Hierarchy Design
Performance implicationsPerformance implicationsOnly natural hierarchies are materialized on disk Only natural hierarchies are materialized on disk during processingduring processingUnnatural hierarchies are built on the fly during Unnatural hierarchies are built on the fly during queries (and cached in memory)queries (and cached in memory)Server internally decomposes unnatural Server internally decomposes unnatural hierarchies into natural componentshierarchies into natural components
Essentially operates like ad hoc navigation path (but somewhat better)Essentially operates like ad hoc navigation path (but somewhat better)
Create natural hierarchies where possibleCreate natural hierarchies where possibleUsing attribute relationshipsUsing attribute relationshipsNot always appropriate (e.g. Age-Gender)Not always appropriate (e.g. Age-Gender)
Satisfy Your Technical Curiosity
Best Practices for Cube DesignBest Practices for Cube DesignDimensionsDimensions
Consolidate multiple hierarchies into single dimension Consolidate multiple hierarchies into single dimension (unless they are related via fact table)(unless they are related via fact table)Avoid ROLAP storage mode if performance is keyAvoid ROLAP storage mode if performance is keyUse role playing dimensions (e.g. OrderDate, BillDate, Use role playing dimensions (e.g. OrderDate, BillDate, ShipDate) - avoids multiple physical copiesShipDate) - avoids multiple physical copiesUse parent-child dimensions prudentlyUse parent-child dimensions prudently
No intermediate level aggregation supportNo intermediate level aggregation support
Use many-to-many dimensions prudentlyUse many-to-many dimensions prudentlySlower than regular dimensions, but faster than calculationsSlower than regular dimensions, but faster than calculationsIntermediate measure group must be “small” relative to primary measure Intermediate measure group must be “small” relative to primary measure groupgroup
Satisfy Your Technical Curiosity
Best Practices for Cube DesignBest Practices for Cube DesignAttributesAttributes
Define all possible attribute relationships!Define all possible attribute relationships!Mark attribute relationships as rigid where appropriateMark attribute relationships as rigid where appropriateUse integer (or numeric) key columnsUse integer (or numeric) key columnsSet AttributeHierarchyEnabled to false for attributes not Set AttributeHierarchyEnabled to false for attributes not used for navigation (e.g. Phone#, Address)used for navigation (e.g. Phone#, Address)Set AttributeHierarchyOptimizedState to NotOptimized for Set AttributeHierarchyOptimizedState to NotOptimized for infrequently used attributesinfrequently used attributesSet AttributeHierarchyOrdered to false if the order of Set AttributeHierarchyOrdered to false if the order of members returned by queries is not importantmembers returned by queries is not important
HierarchiesHierarchiesUse natural hierarchies where possibleUse natural hierarchies where possible
Satisfy Your Technical Curiosity
Best Practices for Cube DesignBest Practices for Cube Design
MeasuresMeasuresUse smallest numeric data type possible Use smallest numeric data type possible Use semi-additive aggregate functions instead of MDX Use semi-additive aggregate functions instead of MDX calculations to achieve same behaviorcalculations to achieve same behaviorPut distinct count measures into separate measure Put distinct count measures into separate measure group (BIDS does this automatically)group (BIDS does this automatically)Avoid string source column for distinct count measuresAvoid string source column for distinct count measures
Satisfy Your Technical Curiosity
AgendaAgenda
Server architecture and UDM BasicsServer architecture and UDM BasicsOptimizing the cube designOptimizing the cube designPartitioning and AggregationsPartitioning and AggregationsProcessingProcessingQueries and calculationsQueries and calculationsConclusionConclusion
Satisfy Your Technical Curiosity
PartitioningPartitioning
Mechanism to break up large cube into Mechanism to break up large cube into manageable chunksmanageable chunksPartitions can be added, processed, Partitions can be added, processed, deleted independentlydeleted independently
Update to last month’s data does not affect prior months’ Update to last month’s data does not affect prior months’ partitionspartitionsSliding window scenario easy to implementSliding window scenario easy to implement
E.g. 24 month window E.g. 24 month window add June 2006 partition and delete June 2004 add June 2006 partition and delete June 2004
Partitions can have different storage settingsPartitions can have different storage settings
Partitions require Enterprise Edition!
Satisfy Your Technical Curiosity
Benefits of PartitioningBenefits of PartitioningPartitions can be processed and queried Partitions can be processed and queried in parallelin parallel
Better utilization of server resourcesBetter utilization of server resourcesReduced data warehouse load timesReduced data warehouse load times
Queries are isolated to relevant partitions Queries are isolated to relevant partitions less data less data to scanto scan
SELECT … FROM … WHERE [Time].[Year].[2006]SELECT … FROM … WHERE [Time].[Year].[2006]Queries only 2006 partitionsQueries only 2006 partitions
Bottom line Bottom line partitions enable: partitions enable:ManageabilityManageabilityPerformancePerformanceScalabilityScalability
Satisfy Your Technical Curiosity
Best Practices for PartitioningBest Practices for Partitioning
No more than 20M rows per partitionNo more than 20M rows per partitionSpecify partition sliceSpecify partition slice
Optional for MOLAP – server auto-detects the slice and Optional for MOLAP – server auto-detects the slice and validates against user specified slice (if any)validates against user specified slice (if any)Must be specified for ROLAPMust be specified for ROLAP
Manage storage settings by usage patternsManage storage settings by usage patternsFrequently queried Frequently queried MOLAP with lots of aggs MOLAP with lots of aggsPeriodically queried Periodically queried MOLAP with less or no aggs MOLAP with less or no aggsHistorical Historical ROLAP with no aggs ROLAP with no aggs
Alternate disk drive - use multiple controllers to avoid Alternate disk drive - use multiple controllers to avoid I/O contentionI/O contention
Satisfy Your Technical Curiosity
Best Practices for AggregationsBest Practices for Aggregations
Define all possible attribute relationshipsDefine all possible attribute relationshipsSet accurate attribute member counts and fact Set accurate attribute member counts and fact table countstable countsSet AggregationUsage to guide agg designerSet AggregationUsage to guide agg designer
Set rarely queried attributes to NoneSet rarely queried attributes to NoneSet commonly queried attributes to UnrestrictedSet commonly queried attributes to Unrestricted
Do not build too many aggregationsDo not build too many aggregationsIn the 100s, not 1000s!In the 100s, not 1000s!
Do not build aggregations larger than 30% of fact Do not build aggregations larger than 30% of fact table size (agg design algorithm doesn’t)table size (agg design algorithm doesn’t)
Satisfy Your Technical Curiosity
Best Practices for AggregationsBest Practices for Aggregations
Aggregation design cycleAggregation design cycleUse Storage Design Wizard (~20% perf gain) to Use Storage Design Wizard (~20% perf gain) to design initial set of aggregationsdesign initial set of aggregationsEnable query log and run pilot workload (beta test Enable query log and run pilot workload (beta test with limited set of users)with limited set of users)Use Usage Based Optimization (UBO) Wizard to Use Usage Based Optimization (UBO) Wizard to refine aggregationsrefine aggregationsUse larger perf gain (70-80%)Use larger perf gain (70-80%)Reprocess partitions for new aggregations to Reprocess partitions for new aggregations to take effecttake effectPeriodically use UBO to refine aggregationsPeriodically use UBO to refine aggregations
Satisfy Your Technical Curiosity
AgendaAgenda
Server architecture and UDM BasicsServer architecture and UDM BasicsOptimizing the cube designOptimizing the cube designPartitioning and AggregationsPartitioning and AggregationsProcessingProcessingQueries and calculationsQueries and calculationsConclusionConclusion
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
Improving ProcessingImproving ProcessingSQL Server Performance TuningSQL Server Performance Tuning
Improve the queries that are used for extracting data from SQL ServerImprove the queries that are used for extracting data from SQL ServerCheck for proper plans and indexingCheck for proper plans and indexingConduct regular SQL performance tuning processConduct regular SQL performance tuning process
AS Processing ImprovementsAS Processing ImprovementsUse SP2 !!Use SP2 !!
Processing 20 partitions: SP1 1:56, SP2: 1:06Processing 20 partitions: SP1 1:56, SP2: 1:06Don’t let UI default for parallel processingDon’t let UI default for parallel processing
Go into advanced processing tab and change itGo into advanced processing tab and change itMonitor the values:Monitor the values:
Maximum number of datasource connectionsMaximum number of datasource connectionsMaxParallel – How many partitions processed in parallel, don’t let the server decide MaxParallel – How many partitions processed in parallel, don’t let the server decide on its own.on its own.
Use INT for keys, if possible.Use INT for keys, if possible.
Parallel processing requires Enterprise Edition!
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
Improving ProcessingImproving Processing
For best performance use ASCMD.EXE and XMLA For best performance use ASCMD.EXE and XMLA Use <Parallel> </Parallel> to group processing tasks Use <Parallel> </Parallel> to group processing tasks together until Server is using maximum resourcestogether until Server is using maximum resourcesProper use of <Transaction> </Transaction>Proper use of <Transaction> </Transaction>
ProcessFact and ProcessIndex separately instead ProcessFact and ProcessIndex separately instead of ProcessFull (for large partitions)of ProcessFull (for large partitions)
Consumes less memory.Consumes less memory.
ProcessClearIndexes deletes existing indexes and ProcessClearIndexes deletes existing indexes and ProcessIndexes generates or reprocesses existing ProcessIndexes generates or reprocesses existing ones.ones.
Satisfy Your Technical Curiosity
Best Practices for ProcessingBest Practices for Processing
Partition processingPartition processingMonitor aggregation processing spilling to disk (perfmon Monitor aggregation processing spilling to disk (perfmon counters for temp file usage)counters for temp file usage)
Add memory, turn on /3GB, move to x64/ia64Add memory, turn on /3GB, move to x64/ia64
Fully process partitions periodicallyFully process partitions periodicallyAchieves better compression over repeated incremental Achieves better compression over repeated incremental processingprocessing
Data sourcesData sourcesAvoid using .NET data sources – OLEDB is faster for Avoid using .NET data sources – OLEDB is faster for processingprocessing
Satisfy Your Technical Curiosity
AgendaAgenda
Server architectureServer architectureUDM BasicsUDM BasicsOptimizing the cube designOptimizing the cube designPartitioning and AggregationsPartitioning and AggregationsProcessingProcessingQueries and calculationsQueries and calculationsConclusionConclusion
Satisfy Your Technical Curiosity
Non_Empty_BehaviorNon_Empty_Behavior
Most client tools (Excel, Proclarity) display non empty Most client tools (Excel, Proclarity) display non empty results – eliminate members with no dataresults – eliminate members with no dataWith no calculations, non empty is fast – just checks With no calculations, non empty is fast – just checks fact datafact dataWith calculations, non empty can be slow – requires With calculations, non empty can be slow – requires evaluating formula for each cellevaluating formula for each cellNon_Empty_Behavior allows non empty on Non_Empty_Behavior allows non empty on calculations to just check fact datacalculations to just check fact data
Note: query processing hint – use with care!Note: query processing hint – use with care!
Create Member [Measures].[Internet Gross Profit]Create Member [Measures].[Internet Gross Profit] AAss[Internet Sales Amount][Internet Sales Amount] -- [Internet Total Cost],[Internet Total Cost],Format_String = "Currency",Format_String = "Currency",Non_Empty_Behavior =Non_Empty_Behavior = [Internet Sales Amount][Internet Sales Amount]; ; BAD!BAD!
Satisfy Your Technical Curiosity
Auto-ExistsAuto-ExistsAttributes/hierarchies within a dimension are always Attributes/hierarchies within a dimension are always existed togetherexisted together
City.Brussels * Country.Members returns {(Brussels, Belgium)}City.Brussels * Country.Members returns {(Brussels, Belgium)}(Dublin, Belgium), (Dublin, USA) do not “exist”(Dublin, Belgium), (Dublin, USA) do not “exist”
Exploit the power of auto-existsExploit the power of auto-existsUse Exists/CrossJoin instead of .Properties – fasterUse Exists/CrossJoin instead of .Properties – fasterRequires attribute hierarchy enabled on member propertyRequires attribute hierarchy enabled on member property
Filter(Customer.Members,Filter(Customer.Members, Customer.CurrentMember.Properties(“Gender”) = “Male”)Customer.CurrentMember.Properties(“Gender”) = “Male”)
Exists(Customer.Members, Gender.[Male])Exists(Customer.Members, Gender.[Male])
Satisfy Your Technical Curiosity
Conditional Statement: IIFConditional Statement: IIFUse scopes instead of conditions such as IIf/CaseUse scopes instead of conditions such as IIf/Case
Scopes are evaluated once staticallyScopes are evaluated once staticallyConditions are evaluated dynamically for each cellConditions are evaluated dynamically for each cell
Always try to coerce IIF for one branch to be nullAlways try to coerce IIF for one branch to be null
Create Member Measures.Create Member Measures.Sales Sales AAss Iif(Currency.CurrentMember Is Currency.USD,Iif(Currency.CurrentMember Is Currency.USD, Measures.SalesUSD, Measures.SalesUSD * Measures.XRate);Measures.SalesUSD, Measures.SalesUSD * Measures.XRate);
Create Member Measures.Sales As Null;Create Member Measures.Sales As Null;Scope(Measures.Sales, Currency.Members);Scope(Measures.Sales, Currency.Members); This = Measures.SalesUSD * Measures.XRate;This = Measures.SalesUSD * Measures.XRate; Scope(Currency.USA);Scope(Currency.USA); This = Measures.SalesUSD;This = Measures.SalesUSD; End Scope;End Scope;End Scope;End Scope;
Satisfy Your Technical Curiosity
Best Practices for MDXBest Practices for MDXUse calc members instead of calc cells where possibleUse calc members instead of calc cells where possibleUse .MemberValue for calcs on numeric attributesUse .MemberValue for calcs on numeric attributes
Filter(Customer.members, Salary.MemberValue > 100000)Filter(Customer.members, Salary.MemberValue > 100000)Avoid redundant use of .CurrentMember and .ValueAvoid redundant use of .CurrentMember and .Value
(Time.CurrentMember.PrevMember, Measures.CurrentMember ).Value can be (Time.CurrentMember.PrevMember, Measures.CurrentMember ).Value can be replaced with Time.PrevMemberreplaced with Time.PrevMember
Avoid LinkMember, StrToSet, StrToMember, StrToValueAvoid LinkMember, StrToSet, StrToMember, StrToValueReplace simple calcs with computed columns in DSVReplace simple calcs with computed columns in DSV
Calculation done at processing time is always betterCalculation done at processing time is always betterMany more at:Many more at:
Analysis Services Performance Whitepaper: Analysis Services Performance Whitepaper: http://download.microsoft.com/download/8/5/e/85eea4fa-b3bb-4426-97d0-http://download.microsoft.com/download/8/5/e/85eea4fa-b3bb-4426-97d0-7f7151b2011c/SSAS2005PerfGuide.doc7f7151b2011c/SSAS2005PerfGuide.dochttp://sqljunkies.com/weblog/moshahttp://sqljunkies.com/weblog/moshahttp://sqlserveranalysisservices.comhttp://sqlserveranalysisservices.com
Satisfy Your Technical Curiosity
ConclusionConclusionAS2005 is major re-architecture from AS2000AS2005 is major re-architecture from AS2000Design for perf & scalability from the startDesign for perf & scalability from the startMany principles carry through from AS2000Many principles carry through from AS2000
Dimensional design, Partitioning, AggregationsDimensional design, Partitioning, Aggregations
Many new principles in AS2005Many new principles in AS2005Attribute relationships, natural hierarchiesAttribute relationships, natural hierarchiesNew design alternatives – role playing, many-to-many, reference New design alternatives – role playing, many-to-many, reference dimensions, semi-additive measuresdimensions, semi-additive measuresFlexible processing optionsFlexible processing optionsMDX scripts, scopesMDX scripts, scopes
Use Analysis Services with SQL Server Enterprise Use Analysis Services with SQL Server Enterprise Edition to get max performance and scaleEdition to get max performance and scale
Satisfy Your Technical Curiosity
Satisfy Your Technical CuriositySatisfy Your Technical Curiosity
ResourcesResourcesSSISSSIS
SQL Server Integration Services site – links to blogs, training, partners, etc.: SQL Server Integration Services site – links to blogs, training, partners, etc.: http://msdn.microsoft.com/SQL/sqlwarehouse/SSIS/default.aspx http://msdn.microsoft.com/SQL/sqlwarehouse/SSIS/default.aspx
SSIS MSDN Forum: http://forums.microsoft.com/MSDN/ShowForum.aspx?SSIS MSDN Forum: http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=80&SiteID=1ForumID=80&SiteID=1
SSIS MVP community site: SSIS MVP community site: http://www.sqlis.comhttp://www.sqlis.com
SSASSSAS
BLOGS: http://blogs.msdn.com/sqlcatBLOGS: http://blogs.msdn.com/sqlcatPROJECT REAL-Business Intelligence in PracticePROJECT REAL-Business Intelligence in PracticeAnalysis Services Performance GuideAnalysis Services Performance GuideTechNet: Analysis Services for IT ProfessionalsTechNet: Analysis Services for IT Professionals
Microsoft BIMicrosoft BI
SQL Server Business Intelligence public site: SQL Server Business Intelligence public site: http://www.microsoft.com/sql/evaluation/bi/default.asp http://www.microsoft.com/sql/evaluation/bi/default.asp
http://www.microsoft.com/bihttp://www.microsoft.com/bi
Satisfy Your Technical Curiosity