DESCRIPTION
This is my presentation for SQL Saturday Philly 2012. The topic is managing SQL Server data warehouses, with a look at the SQL Server data warehouse landscape and the challenges a DBA must prepare for in large DW workloads and BI solutions.
Microsoft SQL Server Data Warehouses for SQL DBAs
SQL Saturday Philly, June 9, 2012

MSSQLDUDE
Twitter: @mssqldude
http://mssqldude.wordpress.com
SQL Server Magazine BI Blog: http://www.sqlmag.com/blog/sql-server-bi-blog-17
mkromer@microsoft.com

Twitter: @jdanton
http://joedantoni.wordpress.com
jdanton1@yahoo.com
Agenda
• The SQL Server data warehouse landscape
• Managing and monitoring the data warehouse
  − Managing fragmentation
  − Managing reporting users (SSRS)
• Tuning the DW databases
  − Compression considerations
  − TempDB & user database layout
• Data loading
  − SSIS (ETL)
• SAN & data file recommendations
• End-user BI solutions
  − Semantic models (SSAS)
  − In-memory analytics
Microsoft Data Warehousing Offerings (Tier 1)

SQL Server Enterprise (software only)
• Scalable and reliable platform for data warehousing on any hardware
• Ideal for data marts or small to mid-sized enterprise data warehouses (EDWs)
• Offers flexibility in hardware and architecture
• Scale-up data warehousing; 10s of terabytes

Fast Track Data Warehouse (reference architectures: software and hardware)
• Reference architectures offering the best price/performance for data warehousing
• Ideal for data marts or small to mid-sized DWs with scan-centric workloads
• Scale-up data warehousing; 4-80 terabytes

HP Business DW Appliance (integrated appliance: software and hardware)
• An affordable SMP solution for data warehousing on optimized hardware
• Ideal for small data marts or DWs with scan-centric workloads
• Scale-up data warehousing; up to 5 terabytes

Parallel Data Warehouse (fully integrated DW appliance)
• Appliance for high-end data warehousing requiring the highest scalability, performance, or complexity
• Scale-out data warehousing with massively parallel processing (MPP); 10s-100s of terabytes
Some Data Warehouses Today
• Big SAN + big SMP server, connected together
• What's wrong with this picture? Answer: the system is out of balance
  − The server can consume 12 GB/s of I/O, but the SAN can only deliver 2 GB/s - even when the SAN is dedicated to the SQL data warehouse, which it often isn't
  − Queries are slow despite significant investment in both server and storage
• Result: significant investment, not delivering performance
Challenges of traditional Data Warehouse
The Alternative: A Balanced System
• Design a server + storage configuration that can deliver all the I/O bandwidth the CPUs can consume when executing a SQL relational DW workload
• Avoid sharing storage devices among servers
• Avoid overinvesting in disk drives
Key to Fast Track Data Warehouse Architecture
Data Warehouse appliances for SQL Server
SQL Server Fast Track Data Warehouse
• A method for designing a cost-effective, balanced system for data warehouse workloads
• Reference hardware configurations developed in conjunction with hardware partners using this method
• Best practices for data layout, loading, and management
• A solution to help customers and partners accelerate their data warehouse deployments
Fast Track Data Warehouse Components
Software:
• SQL Server 2008 R2 Enterprise
• Windows Server 2008 R2
Hardware:
• Tight specifications for servers, storage, and networking
• 'Per core' building block
Configuration guidelines:
• Physical table structures
• Indexes
• Compression
• SQL Server settings
• Windows Server settings
• Loading
Core Fast Track Metrics
• These metrics are used to validate Fast Track:
  − Maximum Consumption Rate (MCR): the ability of SQL Server to process data from memory for a specific CPU and server combination and a standard SQL query.
  − Benchmark Consumption Rate (BCR): the ability of SQL Server to process data from disk for a specific CPU and server combination and a user workload or query.
System Benchmarking - MCR
• MCR measures the rate at which SQL Server can process data from memory for a given CPU and server
  − Measured per physical core
  − Page-compressed data
• Similar in concept to the "miles per gallon" rating for a new car - not necessarily what you will see when you drive the car, but a good starting point
• Current value for published Fast Track RAs: 200 MB/s per core
Establishing Fast Track MCR
• Create a dataset that contains at least one table that will fit in memory
  − Enable PAGE-level compression
  − Use data representative of the target workload
• Load the table into memory by executing your chosen query against the table
  − After loading into memory, execute a query that scans the table and check that there is no disk activity
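The steps above can be sketched in T-SQL; the table name and MAXDOP value are placeholders of my own, not part of the Fast Track specification.

```sql
-- Hypothetical MCR measurement sketch: table name and MAXDOP are assumptions.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- First run warms the buffer pool with the page-compressed table.
SELECT COUNT_BIG(*) FROM dbo.LineItem_MCR OPTION (MAXDOP 4);

-- Second run should report zero physical reads; estimate MCR as
-- (logical reads x 8 KB) / CPU time, divided by the number of cores in use.
SELECT COUNT_BIG(*) FROM dbo.LineItem_MCR OPTION (MAXDOP 4);
```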
System Benchmarking - BCR
• BCR measures the rate at which SQL Server can process data from disk for a given CPU and server for a unique customer workload
  − Measured per physical core
  − Page-compressed data
• Similar in concept to the "actual miles per gallon" you get with your current driving habits
• A BCR that is 30% below the MCR rating for the system indicates that a higher-MCR-rated system should be chosen
Establishing Fast Track BCR
• To determine BCR, create a dataset of at least one table
  − The table should be large enough not to fit entirely in the buffer pool or the SAN array cache
  − You can use a synthetic dataset if customer data is not available
  − If using a synthetic dataset, it is important to approximate the expected characteristics of the targeted data
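Since BCR must be measured from disk, a cold-cache run is needed; a minimal sketch (object names are hypothetical, and flushing the buffer pool belongs on test systems only):

```sql
-- Hypothetical BCR measurement sketch: flush the buffer pool so the
-- representative query reads from disk (do this on test systems only).
CHECKPOINT;
DBCC DROPCLEANBUFFERS;

SET STATISTICS TIME ON;

-- A representative customer workload query against the benchmark dataset.
SELECT OrderDateKey, SUM(SalesAmount) AS Revenue
FROM dbo.FactSales_BCR
GROUP BY OrderDateKey
OPTION (MAXDOP 8);
```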
DEMO: Managing Large DWs on SQL Server EE - MCR & BCR
Fast Track Reference Configurations
• 2-processor configurations (5-20 TB, 2-3.7 GB/s): HP ProLiant DL380 G7, HP ProLiant DL385 G7, IBM System x3650 M3, Bull Novascale R460 E2
• 4-processor configurations (20-40 TB, 3.5-7.5 GB/s): HP ProLiant DL580 G7, HP ProLiant DL585 G7, IBM System x3850 X5, Bull Novascale R480 E1
• 8-processor configurations (40-80 TB, 7.5-14 GB/s): HP ProLiant DL980 G7
Capacities represent a storage array fully populated with 300 GB 15k SAS drives and a 3:1 compression ratio, including the addition of one storage expansion tray per enclosure.
Data Warehouse Workload Characteristics
• Scan intensive
• Hash joins
• Aggregations

Example query:

SELECT L_RETURNFLAG, L_LINESTATUS,
       SUM(L_QUANTITY) AS SUM_QTY,
       SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE,
       SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS SUM_DISC_PRICE,
       SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT) * (1 + L_TAX)) AS SUM_CHARGE,
       AVG(L_QUANTITY) AS AVG_QTY,
       AVG(L_EXTENDEDPRICE) AS AVG_PRICE,
       AVG(L_DISCOUNT) AS AVG_DISC,
       COUNT(*) AS COUNT_ORDER
FROM LINEITEM
GROUP BY L_RETURNFLAG, L_LINESTATUS
ORDER BY L_RETURNFLAG, L_LINESTATUS
Software Configuration: SQL Server Startup
• -E: allocate 64 extents at a time (4 MB)
  − This is not a guarantee of logically contiguous extent allocation
• -T1117: autogrow all files in a filegroup in even increments
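-E can only be set as a startup parameter (for example via SQL Server Configuration Manager), but the even-growth behavior of trace flag 1117 can also be turned on globally at runtime, as a quick sketch:

```sql
-- Enable TF1117 globally (equivalent to -T1117 at startup; lasts until restart).
DBCC TRACEON (1117, -1);

-- Confirm which trace flags are currently active.
DBCC TRACESTATUS (-1);
```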
Software Configuration: TempDB
• Follow standard tempdb best practices
  − Autogrow should be enabled for tempdb
  − Use a large growth increment (10% of initial size)
• Create one tempdb data file per LUN
  − Make all files the same size
• Allocate the transaction log to a dedicated log LUN
• Sizing tempdb
  − Typically 20-30% of primary data space
  − Tempdb is not compressed
Software Configuration: TempDB & TLOG
• TempDB
  − Preallocate space, and add a single data file per LUN
  − Make all files the same size
  − Assign tempdb log files to one of the LUNs dedicated to log files
  − Enable autogrow
• TLOG
  − Create a single transaction log file per database on one of the LUNs assigned to transaction log space
  − Spread log files for different databases across the available LUNs, or use multiple log files for log growth as required
  − Enable the autogrow option for log files
• User DB data files
  − Remember: do NOT enable autogrow
  − Avoid fragmentation!
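The tempdb layout above might look something like this in T-SQL; the paths, sizes, and file names are illustrative assumptions, not Fast Track mandates.

```sql
-- Sketch: equally sized tempdb files, one per data LUN (paths are assumptions).
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, SIZE = 25GB);

ALTER DATABASE tempdb
    ADD FILE (NAME = tempdev02,
              FILENAME = 'H:\tempdb\tempdev02.ndf',
              SIZE = 25GB);
-- Repeat ADD FILE for each remaining LUN, keeping every file the same size.
```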
DW Server Baseline Configs
• Configuration: per Fast Track validated RAs - base requirements
  − 2-socket server
  − At least 32 GB of main-board memory
  − At least one open 8x PCIe slot for HBA use
  − Blade servers are not supported
• Memory: 4 GB per core minimum
  − More memory may be warranted for a given customer workload
DEMO: Managing Large DWs on SQL Server EE - Resource Governor, Compression, and Partitioning
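As a rough sketch of the three demo topics, something along these lines could be shown; all object names and thresholds here are hypothetical, not taken from the deck.

```sql
-- Resource Governor: cap ad hoc report users at 30% CPU (a classifier
-- function would still be needed to route sessions into this group).
CREATE RESOURCE POOL ReportPool WITH (MAX_CPU_PERCENT = 30);
CREATE WORKLOAD GROUP ReportGroup USING ReportPool;
ALTER RESOURCE GOVERNOR RECONFIGURE;

-- Compression: rebuild a large fact table with PAGE compression.
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Partitioning: monthly ranges on an integer date key.
CREATE PARTITION FUNCTION pfMonth (int)
    AS RANGE RIGHT FOR VALUES (20120101, 20120201, 20120301);
CREATE PARTITION SCHEME psMonth
    AS PARTITION pfMonth ALL TO ([PRIMARY]);
```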
Fast Track Data Striping
• Fast Track evenly spreads SQL data files across physical RAID-1 disk arrays
[Diagram: a Fast Track storage enclosure of RAID-1 disk pairs. Data files DB1-1.ndf through DB1-8.ndf are placed two per array across four data arrays (ARY01-ARY04, volumes v01-v08); the log file DB1.ldf sits on a separate array (ARY05, volume v09), keeping primary data and log apart.]
User Databases
• Create at least one filegroup containing one data file per LUN
  − FT targets 1:1 LUN-to-CPU-core affinity
  − Make all files the same size
  − This effectively stripes database files across the data LUNs
• Multiple filegroups are necessary
• Best practice: pre-allocate all databases and do not use autogrow
• When autogrow is used, trace flag 1117 will enforce even file growth and maintain the 4 MB extent allocation (stripe width)
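A minimal sketch of such a pre-allocated user database, with one equally sized file per data LUN (two shown) and autogrow disabled on the data files; the database name, paths, and sizes are assumptions.

```sql
-- Pre-allocated user database: equal file sizes, FILEGROWTH = 0 on data files.
CREATE DATABASE FT_DW
ON PRIMARY
    (NAME = FT_DW_01, FILENAME = 'G:\data\FT_DW_01.mdf',
     SIZE = 100GB, FILEGROWTH = 0),
    (NAME = FT_DW_02, FILENAME = 'H:\data\FT_DW_02.ndf',
     SIZE = 100GB, FILEGROWTH = 0)
LOG ON
    (NAME = FT_DW_log, FILENAME = 'L:\log\FT_DW_log.ldf',
     SIZE = 50GB, FILEGROWTH = 5GB);
```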
Transaction Log
• Create a single transaction log file per database and place it on a dedicated log LUN
• Enable autogrow for log files
• The transaction log size for each database should be at least twice the size of the largest DML operation
SQL Server File Layout
[Diagram: tempdb data files (TempDB.mdf, TempDB_02.ndf ... TempDB_16.ndf, 25 GB each), permanent database filegroup files (Permanent_1.ndf ... Permanent_16.ndf), and stage database filegroup files (Stage_1.ndf ... Stage_16.ndf) are spread one per data LUN across LUNs 1-16. The permanent DB log and stage DB log each sit on a dedicated log LUN, and a local drive is reserved for the system.]
SQL Server Parallel Data Warehouse
[Diagram: the control rack contains the control nodes (active/passive), management nodes, landing zone node, and backup node; the data rack contains the SQL Server compute nodes with their storage nodes, plus a spare compute node. The nodes communicate over a dual InfiniBand private network, with dual Fibre Channel connections to storage.]
Linear Scalability
• 1 data rack: 17 servers, 22 procs, 132 cores
• 4 data racks: 47 servers, 82 procs, 492 cores
• Expand to 4 data racks and quadruple your performance and capacity!
PDW POC Query Results (query speed in seconds)

Query   PDW Time   Orig. Time   Speedup
Q1      16         4200         263x
Q2      6          1200         200x
Q3      2          120          60x
Q4      2          120          60x
Q5      2          120          60x
Q6      4          1200         300x

PDW ran 60x-300x faster than the original query speeds.
Parallel Data Warehouse Appliance: Hardware Architecture
[Diagram: control nodes (active/passive), management nodes, landing zone node, and backup node in the control rack; SQL Server compute nodes, storage nodes, and a spare compute node in the data rack, connected by a dual InfiniBand private network and dual Fibre Channel to storage. The corporate network connects client drivers, the ETL load interface, the corporate backup solution, and data center monitoring to the appliance.]
Parallel Data Warehouse Benefits: Massively Parallel Processing
[Diagram: Query 1 is submitted to SQL Server on the control node, executed in parallel on all 10 compute nodes, and the results are sent back to the client.]
Parallel Data Warehouse Benefits: Massively Parallel Processing (continued)
[Diagram: blazing-fast performance comes from parallelizing queries across highly optimized shared-nothing nodes. Multiple queries are simultaneously executed across all nodes, and PDW supports querying while data is loading.]
Data Layout Approaches
• Replicated: a table structure that exists as a full copy within each discrete PDW node.
• Distributed: a table structure that is hashed on a single column and uniformly distributed across all nodes on the appliance. Each distribution is a separate physical table in the DBMS.
• Ultra shared nothing: the ability to design a schema of both distributed and replicated tables to minimize data movement between nodes.
  − Small sets of data can be stored more efficiently in full (replicated).
  − Certain set operations (i.e., single-node operations) are more efficient against full sets of data.
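In PDW's SQL dialect, the two layouts above are declared on the table itself; a sketch with hypothetical table and column names:

```sql
-- A large fact table distributed by hashing a single column across all nodes.
CREATE TABLE dbo.FactSales
(   OrderDateKey int,
    ProductKey   int,
    SalesAmount  money )
WITH (DISTRIBUTION = HASH(ProductKey));

-- A small dimension replicated in full on every node, so joins to it
-- never require data movement.
CREATE TABLE dbo.DimRegion
(   RegionKey  int,
    RegionName varchar(50) )
WITH (DISTRIBUTION = REPLICATE);
```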
Software Architecture
[Diagram: clients (query tools, MS BI - AS and RS, Internet Explorer, and other third-party tools) connect over OLE DB, ODBC, ADO.NET, and JDBC. The control node runs SQL Server (hosting DW authentication, DW configuration, the DW schema, and TempDB) alongside the SQL parser, the DMS manager, the core engine services, IIS for the admin console, and the MPP Engine Coordinator. Compute nodes run SQL Server holding the user data; the landing zone and backup nodes also run the Data Movement Service.]

MPP Engine Coordinator:
• Provides a single system image
• SQL compilation
• Global metadata and appliance configuration
• Global query optimization and plan generation
• Global query execution coordination
• Global transaction coordination
• Authentication and authorization
• Supportability (hardware and software status)

Data Movement Service (runs on every node):
• Data movement across the appliance
• Distributed query execution operators
SQL Server 2012 DW: Columnstore Indexes
Blazing-Fast Performance
• 57,000 transactions per second¹
• 100,000,000 transactions per day²
• Now, up to 10x faster³
• "400 percent improvement in performance." - First American Title Insurance Company

¹ Source: Microsoft customer evidence, Choice Hotels International
² Source: Microsoft customer evidence, KAS Bank
³ Source: Microsoft customer testing; common data warehousing queries
Columnstore Indexes: Fetch Only Needed Columns

SELECT ProductKey, SUM(SalesAmount)
FROM SalesTable
WHERE OrderDateKey < 20101108

[Diagram: SalesTable is stored as per-column segments (OrderDateKey, StoreKey, RegionKey, ProductKey, Quantity, SalesAmount) in two row groups. The query touches only the ProductKey, SalesAmount, and OrderDateKey segments; the StoreKey, RegionKey, and Quantity segments are never read.]
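A sketch of how such an index could be created on the SalesTable used above; the index name is hypothetical, and note that in SQL Server 2012 the table becomes read-only while a nonclustered columnstore index exists.

```sql
-- SQL Server 2012 nonclustered columnstore index covering the table's columns.
CREATE NONCLUSTERED COLUMNSTORE INDEX csi_SalesTable
ON dbo.SalesTable
    (OrderDateKey, StoreKey, RegionKey, ProductKey, Quantity, SalesAmount);
```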
Columnstore Indexes: Processing Data
• Columnstore vs. rowstore
• New batch processing in the engine
• A columnstore index scan can produce batches or rows
  − Batch-enabled operators get batches
  − Non-batch operators get rows
• The query optimizer decides
[Diagram: a batch object holds column vectors plus a list of qualifying rows.]
SSRS Scale Out: Microsoft SQL Server BI
SSRS Recommended Scale-Out 1 of 3
In a standard scale-out server deployment, multiple report servers share a single report server database. The report server database should be installed on a remote SQL Server instance. The following diagram is an example of a standard scale-out server deployment configuration with the report server database on a remote SQL Server instance.
SSRS Recommended Scale-Out 2 of 3
As another option, you might decide to host the report server database on a SQL Server instance that is part of a failover cluster. The following diagram is an example of a scale-out server deployment configuration where the report server databases are on an instance that is part of a failover cluster.
SSRS Recommended Scale-Out 3 of 3
In addition to the standard scale-out deployment, you might determine that your reporting environment would benefit from a more advanced scale-out deployment configuration. For example, you might decide to use the load-balanced report servers for interactive report processing and add a separate report server computer to process only scheduled reports. The following diagram is an example of this advanced scale-out server deployment configuration.
SSRS Logs
Report Server Execution Log: the report server execution log contains data about specific reports, including when a report was run, who ran it, where it was delivered, and which rendering format was used. The execution log is stored in the report server database.
Report Server Service Trace Log The service trace log contains very detailed information that is useful if you are debugging an application or investigating an issue or event. The file is located at \Microsoft SQL Server\<SQL Server Instance>\Reporting Services\LogFiles.
Report Server HTTP Log
The HTTP log file contains a record of all HTTP requests and responses handled by the Report Server Web service and Report Manager. HTTP logging is not enabled by default. You must modify the ReportingServicesService.exe configuration file to use this feature in your installation. The file is located at \Microsoft SQL Server\<SQL Server Instance>\Reporting Services\LogFiles.
SSAS Backup & Recovery: Microsoft SQL Server BI
SSAS Backups
• Take full backups on as regular a basis as possible
  − Creating a backup file is an I/O-intensive operation. You can improve performance by writing the backup to a high-speed disk drive, or by putting the backup file on a separate drive so that read operations (from users) do not collide with the backup's write operations
• Synchronize to a secondary AS server
• Reprocess the cube from source
• Detach/attach
• In SSMS, right-click an SSAS database and click Script Database to create an XMLA script that can be executed later to re-create the SSAS database. This approach also requires that you reprocess the SSAS database.
• ASCMD
SSAS Deployment Best Practices
• SSAS LOVES RAM
  − 64-bit hardware is recommended
  − Monitor your SSAS instances' memory usage with these perfmon counters (make sure they are SSAS, not MSSQL):
    − MSAS2008:Memory\Memory Usage Kb
    − MSAS2008:Memory\Memory Limit Low Kb
    − MSAS2008:Memory\Memory Limit High Kb
• Schema rowsets (DMVs) useful for exploring SSAS metadata:
  − $SYSTEM.MDSCHEMA_CUBES
  − $SYSTEM.MDSCHEMA_DIMENSIONS
  − $SYSTEM.MDSCHEMA_FUNCTIONS
  − $SYSTEM.MDSCHEMA_HIERARCHIES
  − $SYSTEM.MDSCHEMA_INPUT_DATASOURCES
  − $SYSTEM.MDSCHEMA_KPIS
  − $SYSTEM.MDSCHEMA_LEVELS
  − $SYSTEM.MDSCHEMA_MEASUREGROUP_DIMENSIONS
  − $SYSTEM.MDSCHEMA_MEASUREGROUPS
  − $SYSTEM.MDSCHEMA_MEASURES
  − $SYSTEM.MDSCHEMA_MEMBERS
  − $SYSTEM.MDSCHEMA_PROPERTIES
  − $SYSTEM.MDSCHEMA_SETS
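These schema rowsets can be queried with plain SELECT statements; a small sketch (run it in an MDX query window connected to the SSAS instance, not the relational engine):

```sql
-- Lists each cube and when its data was last refreshed.
SELECT CUBE_NAME, LAST_DATA_UPDATE
FROM $SYSTEM.MDSCHEMA_CUBES;
```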
SSAS General Best Practices
• Turn off Flight Recorder in production
• Align SSAS partitioning with source DB partitions
• Consider the "Lock Pages in Memory" Windows group policy setting for SSAS (and SQL Server, too)
• Partitioning
• Incremental processing
• Synchronize the processing server with the query server via backup/restore or detach/attach
• http://msftasprodsamples.codeplex.com
Network Packet Size Helps for SSIS & SSAS
Under the properties of your data source, increasing the network packet size for SQL Server minimizes the protocol overhead required to build many small packets. The default value for SQL Server 2008 is 4096. With a data warehouse load, a packet size of 32K (in SQL Server, this means assigning the value 32767) can benefit processing. Don't change the value in SQL Server using sp_configure; instead, override it in your data source. This can be set whether you are using TCP/IP or shared memory.
SQL Server Integration Services
Microsoft SQL Server BI
SSIS Performance Counters
SQLServer:SSIS Service
• SSIS Package Instances - total number of simultaneous SSIS packages running
SQLServer:SSIS Pipeline
• BLOB bytes read - total bytes read from binary large objects during the monitoring period
• BLOB bytes written - total bytes written to binary large objects during the monitoring period
• BLOB files in use - number of binary large object files used by the data flow task during the monitoring period
• Buffer memory - the amount of physical or virtual memory used by the data flow task during the monitoring period
• Buffers in use - the number of buffers in use during the data flow task during the monitoring period
• Buffers spooled - the number of buffers written to disk during the data flow task during the monitoring period
• Flat buffer memory - the total number of blocks of memory in use by the data flow task during the monitoring period
• Flat buffers in use - the number of blocks of memory in use by the data flow task at a point in time
• Private buffer memory - the total amount of physical or virtual memory used by data transformation tasks in the data flow engine during the monitoring period
• Private buffers in use - the number of blocks of memory in use by the transformations in the data flow task at a point in time
• Rows read - total number of input rows in use by the data flow task at a point in time
• Rows written - total number of output rows in use by the data flow task at a point in time
"Buffers in use," "Flat buffers in use," and "Private buffers in use" are useful for discovering leaks. During package execution you will see these counters fluctuate, but once the package finishes executing, their values should return to what they were before execution. Otherwise, buffers have leaked.
SSIS Best Practices
• Measure CPU usage for DTExec
  − Process / % Processor Time (Total)
• Measure network throughput
  − Network Interface / Current Bandwidth: provides an estimate of current bandwidth
  − Network Interface / Bytes Total/sec: the rate at which bytes are sent and received over each network adapter
  − Network Interface / Transfers/sec: how many network transfers per second are occurring; if it is approaching 40,000, get another NIC card and use teaming between the NIC cards
• In SSIS, use NOLOCK or TABLOCK hints to remove locking overhead
• Be careful of SSIS package scheduling conflicts
• Move data in bulk instead of row by row
• Consider ELT (Extract, Load & Transform) using staging tables
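The NOLOCK/TABLOCK advice above can be sketched in T-SQL; the table names are hypothetical, and note that NOLOCK trades read consistency for reduced locking overhead.

```sql
-- Bulk-style load into a staging heap with TABLOCK (enables minimal logging
-- under the right recovery model), reading the source with NOLOCK.
INSERT INTO dbo.Stage_FactSales WITH (TABLOCK)
SELECT OrderDateKey, ProductKey, SalesAmount
FROM dbo.FactSales_Source WITH (NOLOCK);
```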
Business Intelligence Architectures
In-memory models & reporting
© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.