Upload
ralph-parrish
View
214
Download
1
Embed Size (px)
Citation preview
Scaling SQL Server 2000 Scaling SQL Server 2000 Analysis Services to the Analysis Services to the MAXMAX
Dave WickertDave WickertProgram ManagerProgram ManagerSQL Server BI Practices TeamSQL Server BI Practices TeamMicrosoft CorporationMicrosoft Corporation
ScalabilityScalability
““For our purposes, scalability is For our purposes, scalability is the result of a system where the result of a system where performance and response times performance and response times are smooth, even and predictable are smooth, even and predictable as the number of users is as the number of users is increased”increased”
AgendaAgenda Out-of-the-box defaultsOut-of-the-box defaults Performance tuningPerformance tuning
Aggregation designAggregation design Storage modesStorage modes
Cube designCube design PartitioningPartitioning Memory managementMemory management Schema optimizationSchema optimization
Architectural designArchitectural design NLB Clustering (web-farm of Analysis servers)NLB Clustering (web-farm of Analysis servers)
For more details on all these topics…For more details on all these topics… For information that didn’t fit in this talk…For information that didn’t fit in this talk…
1.1. Analysis Services Performance Guide available at:Analysis Services Performance Guide available at:http://www.microsoft.com/technethttp://www.microsoft.com/technet /prodtechnol/sql/maintain/optimize/AnSvcsPG.asp /prodtechnol/sql/maintain/optimize/AnSvcsPG.asp
2.2. Analysis Services Operations Guide available at:Analysis Services Operations Guide available at:http://www.microsoft.com/technethttp://www.microsoft.com/technet /prodtechnol/sql/maintain/operate/AnServOG.asp /prodtechnol/sql/maintain/operate/AnServOG.asp
3.3. ““Creating Large-Scale, Highly Available OLAP Sites” Creating Large-Scale, Highly Available OLAP Sites” white paper available at:white paper available at:http://www.microsoft.com/sql/evaluation/bi/creatingOLAPsites.asphttp://www.microsoft.com/sql/evaluation/bi/creatingOLAPsites.asp
We can’t cover it all . . .We can’t cover it all . . .
Out-of-the-box DefaultsOut-of-the-box DefaultsAll are ‘server’ propertiesAll are ‘server’ properties
Data and temp folder locationData and temp folder location Validate the default memory settingsValidate the default memory settings Increase the process buffer size Increase the process buffer size
100-200MB minimum100-200MB minimum
Always set the system-wide Always set the system-wide processing log fileprocessing log file
Enable error reportingEnable error reporting
Out-of-the-box DefaultsOut-of-the-box Defaults
Dave WickertDave WickertProgram ManagerProgram ManagerBI Practices TeamBI Practices Team
Aggregation DesignAggregation Design
What aggregations areWhat aggregations are How many aggregations there areHow many aggregations there are How big aggregations areHow big aggregations are How aggregations help query performanceHow aggregations help query performance Why too many aggregations cost in processingWhy too many aggregations cost in processing Controls over the aggregation designControls over the aggregation design
Setting level and partition countsSetting level and partition counts Setting Aggregation Usage propertySetting Aggregation Usage property
Aggregation design toolsAggregation design tools Aggregation design strategiesAggregation design strategies
Design aggregates for the Design aggregates for the Foodmart SALES cubeFoodmart SALES cube
Dave WickertDave WickertProgram ManagerProgram ManagerBI Practices TeamBI Practices Team
What Aggregations AreWhat Aggregations AreSubtotals at a certain level from every dimensionSubtotals at a certain level from every dimension
CustomersCustomersAll CustomersAll CustomersCountryCountryStateStateCityCityNameName
ProductProductAll ProductsAll ProductsCategoryCategoryBrandBrandItemItemSKUSKU
FactsFactsCustomer ID SKU Units Sold Sales
345-23 135123 2 $45.67
563-01 451236 34 $67.32
…
Highest Level AggregationHighest Level Aggregation
Customer Product Units Sold Sales
All All 347814123 $345,212,301.30
Intermediate AggregationIntermediate AggregationCountry Item ID Units Sold Sales
Can sd452 9456 $23,914.30
US yu678 4623 $57,931.45
…
125 possible combinations125 possible combinations Five customer levels, five product levels, five time levelsFive customer levels, five product levels, five time levels
Imagine a cube with ten dimensions, five levels eachImagine a cube with ten dimensions, five levels each 551010 = 9,765,625 or ~ 10 million theoretical combinations! = 9,765,625 or ~ 10 million theoretical combinations!
General rule: multiply the number of levels General rule: multiply the number of levels in each dimensionin each dimension
How Many AggregationsHow Many AggregationsCustomerCustomerAll CustomersAll CustomersCountryCountryStateStateCityCityNameName
ProductProductAll ProductsAll ProductsCategoryCategoryBrandBrandItemItemSKUSKU
TimeTimeAll TimeAll TimeYearYearQuarterQuarterMonthMonthDayDay
Goal is to find the best subset Goal is to find the best subset of this potentially huge number of possibilitiesof this potentially huge number of possibilities
Aggregations at lower levels have more possible cells…Aggregations at lower levels have more possible cells…(All, All, All)(All, All, All) 1 x 1 x 11 x 1 x 1 = 1= 1
(Country, Item, Quarter)(Country, Item, Quarter) 3 x 7621 x 123 x 7621 x 12 = 274,356= 274,356
(Name, SKU, Day)(Name, SKU, Day) 3811 x 8211 x 10953811 x 8211 x 1095 = 34,264,872,495= 34,264,872,495
The size varies based on the scarcity of the data itself since The size varies based on the scarcity of the data itself since empty cells are never stored.empty cells are never stored.
Obviously, it also depends on the number of measures (and the Obviously, it also depends on the number of measures (and the number of bytes used; based on the measure’s datatype).number of bytes used; based on the measure’s datatype).
Size of AggregationsSize of AggregationsCustomerAll Customers (1)All Customers (1)Country (3)Country (3)State (80)State (80)City (578)City (578)Name (3811)Name (3811)
ProductAll Products (1)All Products (1)Category (60)Category (60)Brand (911)Brand (911)Item (7621)Item (7621)SKU (8211)SKU (8211)
TimeAll Time (1)All Time (1)Year (3)Year (3)Quarter (12)Quarter (12)Month (36)Month (36)Day (1095)Day (1095)
Aggs ‘Cost’ In ProcessingAggs ‘Cost’ In Processing Aggs computed from base factsAggs computed from base facts
(Name, SKU, Day)(Name, SKU, Day) (State, Item, Quarter) (State, Item, Quarter)(Name, SKU, Day)(Name, SKU, Day) (Name, Category, All) (Name, Category, All)
An agg ‘miss’ is if An agg ‘miss’ is if nono aggs are below the one being queried – aggs are below the one being queried – thus the only way to calculate it is to go to the base factsthus the only way to calculate it is to go to the base facts
Aggs may be computed from other aggsAggs may be computed from other aggs(State, Item, Quarter)(State, Item, Quarter) (Country, Item, Quarter) (Country, Item, Quarter)(City, Category, All)(City, Category, All) (State, All, All) (State, All, All)(State, All, All)(State, All, All) (All, All, All) (All, All, All)
Remember the cube with ten 5-level dimsRemember the cube with ten 5-level dims 551010 = 9,765,625 or ~ 10 million combinations = 9,765,625 or ~ 10 million combinations Computing even 10% of these is costly!Computing even 10% of these is costly!
Goal is find the optimal agg set Goal is find the optimal agg set that helps query performance the most that helps query performance the most but don’t cost too much in processingbut don’t cost too much in processing
Aggregation Design WizardsAggregation Design Wizards Evaluate cost / benefit of aggsEvaluate cost / benefit of aggs
Relative to other aggregationsRelative to other aggregations Designed in “waves” from Designed in “waves” from
top of pyramidtop of pyramid Cost is related to aggregation sizeCost is related to aggregation size Benefit is related to “distance” from Benefit is related to “distance” from
another aggregation another aggregation Storage Design wizardStorage Design wizard
Assumes all combinations Assumes all combinations of levels are equally likelyof levels are equally likely
Usage Based Usage Based Optimization wizardOptimization wizard Assumes query pattern resembles Assumes query pattern resembles
your selection from the query logyour selection from the query log Representative history is neededRepresentative history is needed
Fact TableFact Table
Getting control over Getting control over your aggregation your aggregation designdesign
Dave WickertDave WickertProgram ManagerProgram ManagerBI Practices TeamBI Practices Team
Controls Over Agg DesignControls Over Agg DesignRow counts drive the design processRow counts drive the design process
Level member countsLevel member counts Property of each dimension levelProperty of each dimension level Used to estimate sizes of aggregationsUsed to estimate sizes of aggregations Set at creation or by manual interventionSet at creation or by manual intervention
Partition row countsPartition row counts Property of each partitionProperty of each partition Used to estimate data densityUsed to estimate data density Set at creation or by manual interventionSet at creation or by manual intervention
Reset these values before starting aggregation Reset these values before starting aggregation design!design!
Controls Over Agg DesignControls Over Agg DesignAggregation usage propertyAggregation usage property
Property of a dimensionProperty of a dimension Used to include or exclude levels Used to include or exclude levels from from
considerationconsideration (not from the design)(not from the design) Standard Standard (standard dims default)(standard dims default) Top Level Only Top Level Only (virtual dims default)(virtual dims default) Bottom Level OnlyBottom Level Only Top and Bottom Levels Top and Bottom Levels (changing dims default)(changing dims default) CustomCustom
Uses Enable Aggregations property Uses Enable Aggregations property of each levelof each level
TrickTrick: Set rarely queried dims to ‘Top Level Only’ -- Each time used : Set rarely queried dims to ‘Top Level Only’ -- Each time used reduces the cube complexity by a dimension. Transfers the cost from reduces the cube complexity by a dimension. Transfers the cost from processing to query time.processing to query time.
Agg Design StrategiesAgg Design StrategiesGuidelinesGuidelines
Design initial overall agg set with Design initial overall agg set with Storage Design wizardStorage Design wizard 10% - 20% performance gain (higher complexity cubes may be 10% - 20% performance gain (higher complexity cubes may be
even lower) – limit to no more than 20-40 minutes of ‘design’ time.even lower) – limit to no more than 20-40 minutes of ‘design’ time. Pilot usage – Pilot usage – collect query logscollect query logs Design UBO agg setDesign UBO agg set
Aggregation Usage “Standard” on most dimensionsAggregation Usage “Standard” on most dimensions Use higher performance gainUse higher performance gain Merge new aggregations with existing setMerge new aggregations with existing set
Periodically add UBO aggs as usage changesPeriodically add UBO aggs as usage changes Merge new aggregations with existing setMerge new aggregations with existing set Eventually may need to start overEventually may need to start over
Partition Aggregation Partition Aggregation UtilityUtility
Dave WickertDave WickertProgram ManagerProgram ManagerBI Practices TeamBI Practices Team
PartitioningPartitioning
Partitioning helps query performancePartitioning helps query performance Set the slice value on partitionsSet the slice value on partitions
Partitioning can help Partitioning can help processing performanceprocessing performance Be more selective about what you processBe more selective about what you process Process in parallelProcess in parallel
Partitioning and the data lifecyclePartitioning and the data lifecycle Partitioning by time is most commonPartitioning by time is most common Used to remove old dataUsed to remove old data Aggregation designs can vary over timeAggregation designs can vary over time
MonthMonth
44
Partitioning And Agg DesignPartitioning And Agg Design(rolling ‘n’ months)(rolling ‘n’ months)
Aggregation designs are per-partitionAggregation designs are per-partition Perform UBO on most recent partitionPerform UBO on most recent partition Copy design to empty “base” partitionCopy design to empty “base” partition Clone “base” for new partitionsClone “base” for new partitions
MonthMonth
11
MonthMonth
BASEBASE(no (no
data)data)
MonthMonth
22MonthMonth
33MonthMonth
3535MonthMonth
3636MonthMonth
3737
Design aggs based on most recent usageDesign aggs based on most recent usage
Copy new agg design to base partitionCopy new agg design to base partition
Clone base for new partitionClone base for new partition
Incremental updates in new partitionIncremental updates in new partition
NeNeww
datdataaDelete oldest partition when obsoleteDelete oldest partition when obsolete
Memory ManagementMemory Management
Dimension, replica and shadow Dimension, replica and shadow memorymemory
Memory used during processingMemory used during processing Query results cacheQuery results cache Memory cleaningMemory cleaning Handling large memory needsHandling large memory needs
Concern is Concern is virtual memoryvirtual memory of ofmsmdsrv processmsmdsrv process
Need enough physical memory to avoid Need enough physical memory to avoid pagingpaging
ReplicaReplicaMemoryMemory
ShadowShadowDimensionsDimensions
ProcessingProcessingBuffersBuffers
Read aheadRead ahead
ConnectionsConnections
DimensionDimensionMemoryMemory
AvailableAvailableCacheCache
Min
imu
m allo
cated m
emo
ryM
inim
um
allocated
mem
ory
Mem
ory co
nservatio
n th
resho
ldM
emo
ry con
servation
thresh
old
Server Memory UsageServer Memory UsageLarger memory needsLarger memory needs
By default AS can use up to 2GBBy default AS can use up to 2GB Address space of a Win32 processAddress space of a Win32 process Msmdsrv process is not AWE-awareMsmdsrv process is not AWE-aware
Use the /3GB switch for moreUse the /3GB switch for more Change boot.ini file to enable 3GBChange boot.ini file to enable 3GB Requires Windows Advanced Server or DatacenterRequires Windows Advanced Server or Datacenter Set AS high and low memory limits appropriatelySet AS high and low memory limits appropriately
To use even more memory: 64-bit serversTo use even more memory: 64-bit servers Use SQL Server 2000 (64-bit) Analysis ServicesUse SQL Server 2000 (64-bit) Analysis Services Use Windows Server 2003Use Windows Server 2003
Why 64-bit?Why 64-bit? All about capacity; not performance; large All about capacity; not performance; large
virtual address spacevirtual address space Tens of millions of members in a dimensionTens of millions of members in a dimension Able to cache large amount of aggregates (virtually Able to cache large amount of aggregates (virtually
unlimited query cache)unlimited query cache) Huge processing buffers at the same time as large Huge processing buffers at the same time as large
query cachequery cache
Performance impactPerformance impact Marginal for most applications (% delta; X)Marginal for most applications (% delta; X) Exception: Process buffers in multiple GB (10-Exception: Process buffers in multiple GB (10-
20GB); never having to go to temp files20GB); never having to go to temp files
Minimizing the effect on the Minimizing the effect on the host RDBMShost RDBMS Optimize your cube schemaOptimize your cube schema
Goal: pure table scan of the fact tableGoal: pure table scan of the fact table
If you have the capacity If you have the capacity , run the , run the RDBMS and AS on the same RDBMS and AS on the same machinemachine No network roundtripsNo network roundtrips With a full process, it can mean LOTS With a full process, it can mean LOTS
of dataof data
Schema optimizationSchema optimization
Dave WickertDave WickertProgram ManagerProgram ManagerBI Practices TeamBI Practices Team
Schema OptimizationSchema Optimization Eliminate joins in RDBMS when processingEliminate joins in RDBMS when processing
Goal is a fact table scanGoal is a fact table scanNote:Note: Not all joins can be eliminated, e.g. the one Not all joins can be eliminated, e.g. the oneused to enforce the data slice if using partitioningused to enforce the data slice if using partitioning
Criteria in BOL and the Performance GuideCriteria in BOL and the Performance Guide Topic: “Optimizing Cube Schemas”Topic: “Optimizing Cube Schemas”
Doing schema optimizationDoing schema optimization Use Optimize Schema tool in Cube Editor -or-Use Optimize Schema tool in Cube Editor -or- By-hand by setting the member key column to the fact By-hand by setting the member key column to the fact
tabletableNote:Note: A dimension is ‘unoptimized’ if you remove it and A dimension is ‘unoptimized’ if you remove it and then re-add it to the cubethen re-add it to the cube
More TopicsMore TopicsAll Discussed in Performance and Operations All Discussed in Performance and Operations GuidesGuides
Changing dimensionsChanging dimensionsand flex aggsand flex aggs
Cache warmingCache warming Incremental updatesIncremental updates Tuning the RDBMSTuning the RDBMS Data typesData types Unique member keysUnique member keys Virtual cubesVirtual cubes Distinct countsDistinct counts Optimizing hardwareOptimizing hardware Performance countersPerformance counters Process buffer tuningProcess buffer tuning 64-bit systems64-bit systems
Storage mode selection, i. e. Storage mode selection, i. e. MOLAP vs. ROLAP vs. HOLAPMOLAP vs. ROLAP vs. HOLAP
Database placement on serversDatabase placement on servers Optimizing cell writebackOptimizing cell writeback More schema optimizationMore schema optimization Parent-child dims and ROLAPParent-child dims and ROLAP Grouping measures by usageGrouping measures by usage Middle tier systems / connection Middle tier systems / connection
poolingpooling
For queries:For queries: ““Service” process is fully multithreadedService” process is fully multithreaded Various types of worker thread queues for queriesVarious types of worker thread queues for queries
For processing:For processing: Parallelized at the segment-level and read-ahead Parallelized at the segment-level and read-ahead
buffersbuffers Therefore, for large SMP box, use the Parallel Therefore, for large SMP box, use the Parallel
Processing UtilityProcessing Utility Latest on at the Microsoft.com Download Center – search Latest on at the Microsoft.com Download Center – search
for “Analysis Services”for “Analysis Services” Marginal returns above 8-12 requests in parallelMarginal returns above 8-12 requests in parallel
Lazy aggregator is single threadedLazy aggregator is single threaded
Scale-up performanceScale-up performanceSMP – more CPUsSMP – more CPUs
Uses “web farm” technologyUses “web farm” technology May not need it so long as you ‘hit’ May not need it so long as you ‘hit’
aggregates; normal cases focus on:aggregates; normal cases focus on: Good aggregation design (usage-based)Good aggregation design (usage-based) Good partitioning designGood partitioning design
However, there can be problems areasHowever, there can be problems areas ““Wide queries” , e.g. top count, median vs. Wide queries” , e.g. top count, median vs.
mean, etc.mean, etc. Very complex cubes with random queries , Very complex cubes with random queries ,
where the probability of hitting an aggregate is where the probability of hitting an aggregate is very low very low
Scale-out performanceScale-out performanceNLB ClusteringNLB Clustering
InternetInternetInternetInternet
CorpnetCorpnetCorpnetCorpnet
NLB clusteredNLB clustered(one system to the outside world)(one system to the outside world)
Data flows in – Cubes flow outData flows in – Cubes flow out
Scale-out performanceScale-out performanceNLB ClusteringNLB Clustering . . .
. . .
Easy to add new capacityEasy to add new capacity Just roll in a new server, install, update the Just roll in a new server, install, update the
OLAP data folder and converge it into the OLAP data folder and converge it into the clustercluster
Can mix and match with any size systemCan mix and match with any size system No additional HW or SWNo additional HW or SW
All you need is Windows Adv ServerAll you need is Windows Adv Server Linear scalability Linear scalability
and . . . high availability since each system is and . . . high availability since each system is standalonestandalone
Scale-out performanceScale-out performanceWhy NLB?Why NLB?
Requires networking expertiseRequires networking expertise Requires scripting supportRequires scripting support
Application Center can be used to assistApplication Center can be used to assist
Requires “n” copies of all data folderRequires “n” copies of all data folder One for each node in the cluster One for each node in the cluster
Not all AS capabilities can be supported,Not all AS capabilities can be supported,i.e. writeback or “what-if” is neededi.e. writeback or “what-if” is needed
Only used to address querying as the Only used to address querying as the bottleneck; does not help processing if it is bottleneck; does not help processing if it is the problemthe problem
Scale-out performanceScale-out performanceWhy not NLB?Why not NLB?
SummarySummary
Out-of-the-box defaultsOut-of-the-box defaults Performance tuningPerformance tuning
Aggregation design; Storage modesAggregation design; Storage modes
Cube designCube design Partitioning; Memory managementPartitioning; Memory management Schema optimization; Changing dimensionsSchema optimization; Changing dimensions
Architectural designArchitectural design Scale-out with NLB Clustering (web-farm of Scale-out with NLB Clustering (web-farm of
Analysis servers)Analysis servers)
Call To ActionCall To ActionGet more involved with the Get more involved with the scalability feature of Analysis scalability feature of Analysis Services; read the Performance Services; read the Performance and Operations Guides; make it and Operations Guides; make it right – get the most from your right – get the most from your systemsystem
For more information, please email For more information, please email [email protected]
You can download all presentations atYou can download all presentations atwww.microsoft.com/usa/southcentral/
SQL Server Summit SQL Server Summit Brought To You By:Brought To You By:
© 2004 Microsoft Corporation. All rights reserved.© 2004 Microsoft Corporation. All rights reserved.This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.