Data Warehouse in the Cloud Marketing or Reality? SQL_2013_DWH_IN_THE... · Data Warehouse in the Cloud – Marketing or Reality? Alexei Khalyako Sr. Program Manager ... –Iaas SQL

Data Warehouse in the Cloud –

Marketing or Reality?

Alexei Khalyako

Sr. Program Manager

Windows Azure Customer Advisory Team

Data Warehouse we used to know

• High-End workload

• High-End hardware

• Special know-how

*BeyeNetwork Big Data research

Reality is

• Thousands of departmental level DW

• Relatively low performance SLA

http://de.slideshare.net/h1pan/big-data-analytics-16229308

New BI demands

• Utilize external data sources

• Unstructured data

• Origin is in the Cloud

*BeyeNetwork Big Data research

http://de.slideshare.net/h1pan/big-data-analytics-16229308

New opportunity

• Platform is there

– IaaS: SQL Server VM

– PaaS: SQL Azure DB

• “Closer” to data

• Less administrative overhead

• Lower initial cost and TCO

SQL Server Data Warehousing in Windows Azure Virtual Machines

• Inspired by the Fast Track Reference Architecture guide

• Based on the High Memory images

• Up to 1TB

• MSDN: SQL Server Data Warehousing in Windows Azure Virtual Machines

http://msdn.microsoft.com/en-us/library/dn387396.aspx

High Memory VM in Azure

How to deploy

• PowerShell script

• Windows Azure Gallery

The Azure Data Warehouse under the hood

Data Warehouse Lifecycle

• Thoughts on the architecture

• Creating DB

• Connectivity

• Populating Database

– Initial data loading OR

– Backup/Restore

– Incremental data loading

– Compression

• Query performance

Thoughts on the architecture

• Data Loading

– Minimize log impact

– Scale loading streams

– Do not reinvent the wheel: follow the Data Loading Performance Guide

• Query Performance

http://blogs.msdn.com/b/sqlcat/archive/2009/02/12/the-data-loading-performance-guide-now-available-from-msdn.aspx?Redirected=true

Windows Azure VM Architecture

• Disks implemented as a shared multi-tenant service

• Built-in triple redundancy, optional geo-redundancy

• Performance less predictable than on-prem

– Host machines, storage services, network bandwidth shared between subscribers

– Perf can depend on where and when VM is provisioned

– Subject to maintenance operations

• Granular control & configurability vs. cost, simplicity, out of box redundancy

[Diagram: Storage Location Service above two Storage Stamps. Each stamp contains LB, Front-ends, Partition Layer, and Stream Layer with intra-stamp replication; geo-replication runs between the stamps.]

Tweaks to improve IO Subsystem

• Database file initialization

– GPEdit.msc

• Data file placement

– SQL Striping for User Data and TempDB

– Aggregated throughput

– Set the size and data growth options wisely

Primary

Log

*You may do it differently. Then

• Creating a 350 GB DB took ~3 hours

Scaling IO Options

Windows Storage Spaces

• Log drive

• Not clear support story

SQL Data Files

• Spread File Group over all drives
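One way to picture "spread File Group over all drives" is a database whose primary filegroup has one equally sized data file per data disk, so that SQL Server's proportional fill distributes writes across them. The sketch below only builds the T-SQL text; the database name, drive letters, folder names, and sizes are hypothetical and do not come from the slides.

```python
def create_db_sql(db, data_drives, log_drive, size_gb=100, growth_gb=10):
    """Build a CREATE DATABASE statement with one data file per data drive.

    All names, paths and sizes are illustrative assumptions.
    """
    files = []
    for i, drive in enumerate(data_drives, start=1):
        ext = "mdf" if i == 1 else "ndf"  # first file is the primary data file
        path = f"{drive}:\\Data\\{db}_data{i}.{ext}"
        files.append(
            f"  (NAME = N'{db}_data{i}', FILENAME = N'{path}', "
            f"SIZE = {size_gb}GB, FILEGROWTH = {growth_gb}GB)"
        )
    log_path = f"{log_drive}:\\Log\\{db}_log.ldf"
    log = (
        f"  (NAME = N'{db}_log', FILENAME = N'{log_path}', "
        f"SIZE = {size_gb}GB, FILEGROWTH = {growth_gb}GB)"
    )
    return (
        f"CREATE DATABASE [{db}]\nON PRIMARY\n"
        + ",\n".join(files)
        + f"\nLOG ON\n{log};"
    )


# Three data disks (F, G, H) and a separate log disk (L), all hypothetical.
print(create_db_sql("DW", ["F", "G", "H"], "L"))
```

Keeping the files the same size matters here, because proportional fill favors files with more free space.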

Scaling IO Options: Data disk (read), Log (write)

SQLIO cumulative throughput, 64K:

Configuration                      IOs/sec   MBs/sec
Single data disk                   1215.13   75.94
Windows Storage Spaces, 3 disks    2677.69   167.35
SQL striping, 3 disks *            2742.22   171.38

SQLIO cumulative throughput, 256K:

Configuration                      IOs/sec   MBs/sec
Single data disk                   288       71.98
Windows Storage Spaces, 3 disks
SQL striping, 3 disks

* But we can access one file at a time!

Connectivity Options

• Windows Azure VM endpoints

• Point-to-Site / Site-to-Site

*Other options are also available (FTP)

What and how we tested

Getting initial data

• Copy backup to the Data Disks

• Backup/Restore to/from URL

• ETL to the new DB

URL is fast!

Target                       DB Size   Time      Speed
Backup to local data disk    244 GB    3 hours   22.978 MB/sec
Backup to URL                244 GB    46 min    90.667 MB/sec
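The speed figures can be sanity-checked from database size and elapsed time. A minimal sketch, assuming 1 GB = 1024 MB and treating "3 hours" as exactly 180 minutes: the results come out at roughly 23 and 90 MB/sec, which suggests the slide's speed values use a decimal comma rather than a thousands separator.

```python
def throughput_mb_per_sec(size_gb, minutes):
    """Average throughput in MB/sec, taking 1 GB = 1024 MB."""
    return size_gb * 1024 / (minutes * 60)


local_disk = throughput_mb_per_sec(244, 180)  # "3 hours" on the slide
to_url = throughput_mb_per_sec(244, 46)

print(f"local data disk: {local_disk:.1f} MB/sec")
print(f"URL:             {to_url:.1f} MB/sec")
print(f"URL is about {to_url / local_disk:.1f}x faster")
```

The small differences from the slide's values are expected, since the slide rounds the elapsed times.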

DB and Data Loading

? Data loading

? Tools (BCP, SSIS, ...)

? Time SLA

? Query performance

? Indexing strategy

? Sizing

? Compression

Loading Data in Azure

• Smaller batches (10K-15K rows)

• Retry logic

• Network latency is high

• Parallel loading!!

Start with:

• SSIS for Hybrid Data Movement

http://sqlcat.com/sqlcat/b/whitepapers/archive/2013/01/02/ssis-tips-tricks-and-best-practices-ssis-for-azure-and-hybrid-data-movement.aspx

• SSIS Performance and Operational guide

http://sqlcat.com/sqlcat/b/whitepapers/archive/2013/01/02/ssis-tips-tricks-and-best-practices-ssis-operational-and-tuning-guide.aspx
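The three loading recommendations (small batches, retry logic, parallel streams) can be combined in a few lines. This is only a sketch of the pattern, not the SSIS approach the guides describe: `send` stands in for whatever actually writes a batch to the database, and the function names, the `ConnectionError` used to signal a transient fault, and the backoff delays are all assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(rows, size):
    """Yield consecutive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def with_retry(send, batch, attempts=3, base_delay=0.01):
    """Call send(batch), retrying transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return send(batch)
        except ConnectionError:
            if attempt == attempts:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** (attempt - 1))


def load(rows, send, batch_size=10_000, streams=4):
    """Send rows in small batches over several parallel streams.

    Returns the sum of the values returned by send (for example, row counts).
    """
    with ThreadPoolExecutor(max_workers=streams) as pool:
        futures = [
            pool.submit(with_retry, send, batch)
            for batch in batches(rows, batch_size)
        ]
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    # Stand-in sender: just counts the rows in each batch.
    print(load(range(25_000), send=len, batch_size=10_000, streams=4))
```

Small batches keep the cost of a retry low: when a connection drops on a high-latency link, only the current batch has to be resent.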

Baseline

• Understand Data Sources performance

– Flat file in Azure VM: ~60 MB/sec (reads)

– SQLIO shows the max throughput of the IO subsystem on the DB side

– App performance can be different

Parallel Loading

• Flat file sources: max 60 MB/sec each

• 8 destinations to keep all CPUs busy on the DW site

• Mod(7) function

• Begin to load
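A modulo function is a simple way to split one source into parallel streams: each row goes to the stream numbered `key % n`. The slide pairs Mod(7) with 8 destinations without explaining how the two fit together; since `key % n` yields exactly `n` distinct values (0 to n-1), the sketch below keeps the stream count as a parameter. The function name and the integer keys are assumptions for illustration.

```python
from collections import defaultdict


def split_by_mod(keys, streams):
    """Assign each integer key to stream number key % streams."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[key % streams].append(key)
    return dict(buckets)


# Sixteen sample keys spread over eight loading streams.
result = split_by_mod(range(1, 17), 8)
for stream_id in sorted(result):
    print(stream_id, result[stream_id])
```

With evenly distributed keys, every stream receives about the same number of rows, which is what keeps all destinations busy.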

Monitoring Loading Performance

Top waits you will see:

• ASYNC_NETWORK_IO

• PAGEIOLATCH_EX

• WRITELOG

• PAGEIOLATCH_UP

• SOS_SCHEDULER_YIELD

• PAGEIOLATCH_SH

• PAGELATCH_UP

• PREEMPTIVE_OLEDBOPS

Network IO

Disk IO

CPU
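To see which of the three resources dominates during a load, wait times can be grouped by resource. The sketch below assumes wait totals have already been read (for example from the `wait_type` and `wait_time_ms` columns of `sys.dm_os_wait_stats`). The mapping from wait type to resource is a common interpretation, not something the slide states, and the sample numbers are made up purely to show the shape of the output.

```python
from collections import defaultdict

# Assumed mapping of wait types to the slide's three resource labels.
WAIT_RESOURCE = {
    "ASYNC_NETWORK_IO": "Network IO",
    "PAGEIOLATCH_EX": "Disk IO",
    "PAGEIOLATCH_UP": "Disk IO",
    "PAGEIOLATCH_SH": "Disk IO",
    "WRITELOG": "Disk IO",
    "SOS_SCHEDULER_YIELD": "CPU",
}


def group_waits(wait_ms):
    """Sum wait time per resource; unmapped wait types go to 'Other'."""
    totals = defaultdict(int)
    for wait_type, ms in wait_ms.items():
        totals[WAIT_RESOURCE.get(wait_type, "Other")] += ms
    return dict(totals)


# Illustrative values only, not measurements from the talk.
sample = {
    "ASYNC_NETWORK_IO": 300,
    "WRITELOG": 500,
    "PAGEIOLATCH_SH": 700,
    "SOS_SCHEDULER_YIELD": 100,
    "PAGELATCH_UP": 50,
}
for resource, ms in sorted(group_waits(sample).items(), key=lambda kv: -kv[1]):
    print(f"{resource:10} {ms} ms")
```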

Loading: table options

Heap

• 780 772 573 rows

• Elapsed time: 01:06:15.313

Heap compressed

• 780 772 573 rows

• Elapsed time: 05:12:06.094

Loading: table options

Clustered Index

• Sort!

• Elapsed time: 01:20:12.547

Heap

• 780 772 573 rows

• Elapsed time: 01:06:15.313
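The elapsed times are easier to compare as load rates. A small sketch, assuming all three loads moved the same 780 772 573 rows (the slides state the row count for the heap and compressed-heap loads only): the compressed heap took roughly 4.7 times as long as the plain heap, and the clustered index roughly 1.2 times as long.

```python
ROWS = 780_772_573


def seconds(hms):
    """Convert 'HH:MM:SS.fff' to seconds."""
    h, m, s = hms.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)


elapsed = {
    "Heap": "01:06:15.313",
    "Heap compressed": "05:12:06.094",
    "Clustered index": "01:20:12.547",
}

baseline = seconds(elapsed["Heap"])
for name, hms in elapsed.items():
    secs = seconds(hms)
    print(f"{name:16} {ROWS / secs:>10,.0f} rows/sec  {secs / baseline:.2f}x heap time")
```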

Query Performance

• Heap

• Primary Key/Clustered Index

• Compression

Query performance: results

[Chart: query execution time by query type for Heap, Clustered Index, and Clustered Index Compressed]

Please welcome on stage SQL 2014

What’s new?

• Data files to BLOBs

• Updatable clustered columnstore index

Loading data

• Clustered Columnstore Index load test: 2 hours 16 min

• Heap: 1 hour 1 min

SQL 2014

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

CI 708 92. 385 695 834 413 407 508 971 791 251 529 209 393 345 108 729 725 742 111 221 214

CIComp 388 64. 207 382 483 162 253 485 653 451 241 279 158 139 147 85. 387 405 412 108 120 108

CSI 18. 9.9 20. 16. 20. 7.4 11. 24. 70. 63. 41. 14. 96. 12. 6.9 64. 11. 100 9.7 31. 51. 18.

[Chart: query execution time by query type for clustered index (CI), compressed clustered index (CIComp), and clustered columnstore index (CSI); chart annotations: x51, x39, x2, x55]

Query 19

Estimates vs Actual

And the winner is…

SQL Server 2014!!

Summary

• Easy and fast deployment through the Gallery or PowerShell scripts

• Azure Data Warehouse is consistent with most of the best practices

– Query

– Loading

• Low initial investment and TCO

THANK YOU!

• For attending this session and PASS SQLRally Nordic 2013, Stockholm
