Partition Switch based data loads


Page 1: Partition Switch based data loads

Copyright © 2015 Tata Consultancy Services Limited

Microsoft APS based EDW

Sustaining Strategic Growth: Implementing partitioning

Page 2: Partition Switch based data loads

Presented by: Leo Khaskin, Solution Architect

Agenda

• Use Case

• Best Practices

• Future State Architecture

• Live Demo

• Partitioning based process template

• Partition Switch Mechanics

• Compare Existing vs Test Environment

• Prototype Design

• Performance Statistics

• Considerations

• Benefits: Scalability, Process Control, Maintainability, Flexibility

• Next Step - Implementation

Page 3: Partition Switch based data loads


Use Case

When an EDW on the APS platform matures, with hundreds of data flows pumping data into thousands of tables, production teams often observe slower query performance and queuing of SQL queries, which leads to significant delays in data delivery.

If updates to a fact table are not limited to any point in time, the recommended method is CTAS: create a new table that implements the relevant business rules, drop the existing table, and rename the temp table to the original name.

With a significant number of records (1B+) and complex rules, the query becomes heavy and might take significant time, consuming much of the appliance's resources and thus blocking other queries from execution.

Also, an SSAS model sourced from the fact table will require Full Process, which consumes significant time.

When CTAS execution time gets close to the SLA, it is the right time to evaluate the Partition Switch option.
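
The CTAS rebuild described above (create, drop, rename) can be sketched as a small Python helper that only builds the T-SQL text. This is a minimal illustration, not the presenter's code: the table name FactSales, the column CustomerKey, and the source query are hypothetical, and the sketch assumes the PDW dialect's CREATE TABLE AS SELECT with a hash distribution and its RENAME OBJECT statement.

```python
def ctas_rebuild(table, distribution_col, select_sql):
    """Build the three statements of a CTAS full rebuild (hypothetical names)."""
    tmp = f"{table}_tmp"  # temp table name is an assumption for illustration
    return [
        # 1. Create the new table, applying the business rules in the SELECT.
        f"CREATE TABLE {tmp} WITH (DISTRIBUTION = HASH({distribution_col})) "
        f"AS {select_sql};",
        # 2. Drop the existing table.
        f"DROP TABLE {table};",
        # 3. Rename the temp table to the original name.
        f"RENAME OBJECT {tmp} TO {table};",
    ]


statements = ctas_rebuild(
    "FactSales", "CustomerKey", "SELECT * FROM StageSales"
)
for sql in statements:
    print(sql)
```

Because step 1 rewrites the whole table on every run, its cost grows with the total row count rather than with the size of the increment, which is the pressure the use case describes.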

Page 4: Partition Switch based data loads


PDW Best Practices – Sustaining Strategic Growth

• Data preparation – NOT in PDW

• Optimize Query

• Utilize CSI

• Monitor PDW Resources

• Partition Switch

• Separated Processes:

  • Load

  • Refresh

  • Process SSAS

[Diagram labels: Process, Policy, Tool, PDW Optimal Performance]

Page 5: Partition Switch based data loads


Future State Architecture – Sustaining Strategic Growth

[Architecture diagram; only its legend and box labels survive]

Data Flow

1 Source System

2 Batch extract

3 SQL Server SMP – Data Preparation

4 Prepared data Increment

5 SSIS package

  a DWLoader

  b Partition Switch

  c SSAS Processor

6 PDW

7 Data Consumers

Data preparation labels: NON AU Stage, DQA, Data Type Validation, Constraints Check, Surrogate Key Generator, Distribution Key Generator, De-Duplication, System of Records, Prepared Data

PDW labels: Stage, Fact, Computations, Mart

Other labels: Source File in NAS, SSAS, SSRS, Ad Hoc, TAB, DWL, PS

Page 6: Partition Switch based data loads


Partition Switch Mechanics

Load data into PDW (FFLoader)

Parallel Partitions Processing

Process SSAS model (SSAS Processor)
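
The switch step itself can be sketched as follows. This is a hedged illustration rather than the prototype's actual code: the table names FactSales, FactSales_Stage and FactSales_Old are hypothetical, all three tables are assumed to share the same schema, distribution and partition boundaries, and the target partition of each switch is assumed to be empty.

```python
def switch_partition(fact, stage, old, partition_no):
    """Build the T-SQL that swaps one partition of `fact` for freshly
    loaded data in `stage` (all table names are hypothetical)."""
    return [
        # Switch out: move the current partition from the fact table
        # into the matching (empty) partition of the "old" table.
        f"ALTER TABLE {fact} SWITCH PARTITION {partition_no} "
        f"TO {old} PARTITION {partition_no};",
        # Switch in: move the newly loaded partition from the stage
        # table into the now-empty partition of the fact table.
        f"ALTER TABLE {stage} SWITCH PARTITION {partition_no} "
        f"TO {fact} PARTITION {partition_no};",
    ]


for sql in switch_partition("FactSales", "FactSales_Stage", "FactSales_Old", 7):
    print(sql)
```

Both statements reassign ownership of the partition rather than copy rows, so only the load into the stage table touches the data itself.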

Page 7: Partition Switch based data loads


Compare Existing vs Test Environment

*Only 2 partitions were executed in parallel due to memory constraints. SSIS is running on a 4-core machine; a maximum of 6 partitions can be processed simultaneously. The degree of parallelism is defined by the SSIS server's number of cores, configuration settings, and available memory.
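
The idea of a capped degree of parallelism can be sketched outside SSIS with a bounded worker pool. This is only an analogy under stated assumptions: the real limit is set by SSIS server settings, and process_one below is a hypothetical stand-in for the per-partition load, switch and process work.

```python
from concurrent.futures import ThreadPoolExecutor


def process_partitions(partitions, worker, max_parallel=2):
    """Run `worker` once per partition, with at most `max_parallel`
    partitions in flight at the same time. Results keep input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(worker, partitions))


def process_one(partition_no):
    # Hypothetical placeholder for the real per-partition work.
    return f"partition {partition_no} done"


results = process_partitions([1, 2, 3, 4, 5], process_one, max_parallel=2)
print(results)
```

Raising max_parallel shortens the wall-clock time only while cores and memory allow it, which matches the note that memory constraints held the test to 2 parallel partitions.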

Page 8: Partition Switch based data loads


Prototype Design

Metadata operation

Dataset operation


Page 9: Partition Switch based data loads


Performance Statistics – No pressure on PDW resources

Execution Notes:

The table depicts the average run time per partition under parallel execution.

The degree of parallelism is defined by SSIS server settings.

Highlighted executions were performed on the same table with a Column Store Index (CSI) applied.

[Charts: averaged memory consumption; CPU utilization]

Page 10: Partition Switch based data loads


Considerations / Decisions

Partition grain: larger partitions – fewer partitions

System of records: maintain a copy – create a new copy every run

Table availability: table copy – single partition (on the fly: switch out / in)
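
The partition grain trade-off can be made concrete by counting boundaries for different grains. This is a minimal sketch under assumptions not stated on the slide: the table is partitioned on a date column, boundaries fall on the first day of a month, and the date range is made up for illustration.

```python
from datetime import date


def boundaries(start, end, months_per_partition):
    """Return first-of-month boundary dates after `start` and up to `end`,
    stepping by the chosen grain in months."""
    out = []
    year, month = start.year, start.month
    while True:
        month += months_per_partition
        year += (month - 1) // 12
        month = (month - 1) % 12 + 1
        boundary = date(year, month, 1)
        if boundary > end:
            break
        out.append(boundary)
    return out


start, end = date(2014, 1, 1), date(2015, 1, 1)
for label, grain in [("monthly", 1), ("quarterly", 3), ("yearly", 12)]:
    # N boundary values split a range-partitioned table into N + 1 partitions.
    print(label, len(boundaries(start, end, grain)) + 1, "partitions")
```

A coarser grain means fewer partitions to manage, but each load then has to rebuild and switch a larger slice of data.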

Page 11: Partition Switch based data loads


Benefits

• Significantly shorter load time

• Possibility to process SSAS model incrementally

• Ability to use CSI

  • Data compression – smaller footprint on disk

  • Batch execution mode enabled

  • Improved execution plans

  • Faster query performance

• Scalability to TB sizes

• Better process control

• Increased Maintainability

• Modular design – Reusable Components

• Data Recovery, Archiving, System of Record

Page 12: Partition Switch based data loads


Next Step - Implementation

Environment

Data

Contact us for evaluation:

Leo Khaskin, [email protected]

Huzeifa Nasir, [email protected]