SQLDay 2017

Azure Data Factory vs SSIS: May the best tool win!

Regis Baccaro, @regbac, http://theblobfarm.wordpress.com

Source: Azure Data Factory vs SSIS Ins and Outs, SQLDay (sqlday.pl/materials2017/Regis Baccaro/SQLDay Azure D…), 2017-05-29

About.me : Regis Baccaro

Consultant

Developer

Speaker

Author

Data Platform MVP

Farmer

SQL Nexus lead

MCT

The contestants

Agenda

What is Data Factory
- Pricing

Core Concepts
- JSON
- Datasets
- Pipeline & Activities
- Scheduling & Execution

Building a pipeline
- Azure Portal
- Visual Studio
- PowerShell
- ARM Template
- REST API

SSIS 101
- What is SSIS for?
- Benefits

Comparing SSIS and ADF
- Dev
- Admin
- Deployment
- Monitoring
- Source & Destinations
- Security

What is Data Factory

Data Factory is a fully managed cloud-based data integration service that orchestrates and automates the movement and transformation of data

Biggest mistakes / pain points with Data Factory

Positioning it as SSIS for the Cloud

Ad hoc execution

Data type management

Datasets without time slices

Debugging

Continuous integration

Azure Data Factory Architecture

What is Data Factory: Pricing

No upfront cost

No termination fees

Pay as you g(r)o(w)

Pay for movement and data usage

Core Concepts: Datasets

(Section outline: Datasets, Pipeline & Activities, Scheduling & Execution, Compute, Linked Services)

A Dataset is a logical description of the data.

The mechanism (address, protocol, authentication scheme) to access the data is defined in the Linked Service and referenced in the dataset definition.

[Diagram: Dataset = Logical; Linked Service = Address, Protocol, Authentication]
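To make the split between the two entities concrete, here is a minimal Python sketch that builds both JSON documents. Every name in it (AzureStorageLinkedService, InputBlobDataset, the folder path, the placeholder connection string) is an illustrative assumption, and the property layout follows the ADF V1 JSON schema as commonly documented, so it should be checked against the official reference before use.

```python
import json

# Linked service: HOW to reach the store (address, protocol, authentication).
# The name and the placeholder connection string are hypothetical.
linked_service = {
    "name": "AzureStorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": (
                "DefaultEndpointsProtocol=https;"
                "AccountName=<account>;AccountKey=<key>"
            )
        },
    },
}

# Dataset: WHAT the data looks like (a logical description only).
# It carries no credentials; it points at the linked service by name.
dataset = {
    "name": "InputBlobDataset",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": linked_service["name"],
        "typeProperties": {
            "folderPath": "input/",
            "format": {"type": "TextFormat", "columnDelimiter": ","},
        },
        "external": True,
        "availability": {"frequency": "Hour", "interval": 1},
    },
}


def references(dataset_doc, linked_service_doc):
    """Return True if the dataset points at the given linked service by name."""
    return (
        dataset_doc["properties"]["linkedServiceName"]
        == linked_service_doc["name"]
    )


if __name__ == "__main__":
    print(json.dumps(dataset, indent=2))
```

Keeping the connection details out of the dataset means several datasets can share one linked service, and credentials only need to change in one place.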

Core Concepts: Pipeline & Activities

Pipeline = logical grouping of activities

- Data movement activities

- Data transformation activities
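As an illustration of "a pipeline is a logical grouping of activities", the sketch below assembles an ADF V1-style pipeline document containing a single data movement (Copy) activity. The function name, the activity name and the dataset names are hypothetical, and the JSON layout is an assumption based on the V1 schema rather than something shown in the slides.

```python
import json


def build_copy_pipeline(name, input_dataset, output_dataset, start, end):
    """Return an ADF V1-style pipeline document with a single Copy activity."""
    copy_activity = {
        "name": "CopyBlobToSql",  # hypothetical activity name
        "type": "Copy",  # a data movement activity
        "inputs": [{"name": input_dataset}],
        "outputs": [{"name": output_dataset}],
        "typeProperties": {
            "source": {"type": "BlobSource"},
            "sink": {"type": "SqlSink"},
        },
        "scheduler": {"frequency": "Hour", "interval": 1},
    }
    return {
        "name": name,
        "properties": {
            "description": "Logical grouping of activities",
            "activities": [copy_activity],  # more activities can be appended
            "start": start,  # active period of the pipeline (UTC)
            "end": end,
        },
    }


pipeline = build_copy_pipeline(
    "DemoPipeline",
    "InputBlobDataset",
    "OutputSqlDataset",
    "2017-05-01T00:00:00Z",
    "2017-05-02T00:00:00Z",
)

if __name__ == "__main__":
    print(json.dumps(pipeline, indent=2))
```

A data transformation activity would sit in the same `activities` list with a different `type` and a reference to a compute linked service.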

Relationship between Data Factory entities

Two types of activities: Copy and Transform

Core Concepts: Data Movement Activities

Supported data stores:

Azure Blob storage, Azure Data Lake Store, Azure SQL Database, Azure SQL Data Warehouse, Azure Table storage, Azure DocumentDB (Cosmos DB), Azure Search Index, SQL Server*, Oracle*, MySQL*, DB2*, Teradata*, PostgreSQL*, Sybase*

Cassandra*, MongoDB*, Amazon Redshift, File System*, HDFS*, Amazon S3, FTP, Salesforce, Generic ODBC*, Generic OData, Web Table (table from HTML), GE Historian*

All are sources; sinks are marked in orange on the slide.

Core Concepts: Data Transform Activities

Transformation                                        Compute Environment
Hive                                                  HDInsight [Hadoop]
Pig                                                   HDInsight [Hadoop]
MapReduce                                             HDInsight [Hadoop]
Hadoop Streaming                                      HDInsight [Hadoop]
Machine Learning activities:
  Batch Execution and Update Resource                 Azure VM
Stored Procedure                                      Azure SQL, Azure SQL Data Warehouse, or SQL Server
Data Lake Analytics U-SQL                             Azure Data Lake Analytics
DotNet                                                HDInsight [Hadoop] or Azure Batch

Core Concepts: Scheduling & Execution

Recurring schedule

Data slices
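The relationship between a recurring schedule and data slices can be sketched in a few lines of Python: a schedule with a frequency and an interval cuts the pipeline's active period into consecutive time windows, and each window is one slice. This is a simplified model, not ADF's actual scheduler. The function name is hypothetical, only fixed-length frequencies are handled, and emitting only whole windows is an assumption of this sketch.

```python
from datetime import datetime, timedelta

# Fixed-length frequencies only; calendar-based ones are left out of this sketch.
STEP_BY_FREQUENCY = {
    "Minute": timedelta(minutes=1),
    "Hour": timedelta(hours=1),
    "Day": timedelta(days=1),
}


def data_slices(start, end, frequency="Hour", interval=1):
    """Split [start, end) into consecutive (slice_start, slice_end) windows."""
    step = STEP_BY_FREQUENCY[frequency] * interval
    slices = []
    slice_start = start
    # Only whole windows are emitted; a trailing partial window is dropped.
    while slice_start + step <= end:
        slices.append((slice_start, slice_start + step))
        slice_start += step
    return slices


if __name__ == "__main__":
    for begin, finish in data_slices(
        datetime(2017, 5, 1, 0, 0), datetime(2017, 5, 1, 3, 0)
    ):
        print(begin.isoformat(), "->", finish.isoformat())
```

Thinking of execution as "one run per slice" also explains two of the pain points listed earlier: ad hoc execution and datasets without time slices do not fit this model naturally.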

Core Concepts: Compute

Compute environments: HDInsight [Hadoop], Azure VM, Azure SQL, Azure SQL Data Warehouse or SQL Server, Azure Data Lake Analytics, Azure Batch

Transformations they run: Hive, Pig, MapReduce, Hadoop Streaming, Machine Learning activities (Batch Execution and Update Resource), Stored Procedure, Data Lake Analytics U-SQL

PORTAL.AZURE.COM

Building a pipeline

(Options: Azure Portal, Visual Studio, PowerShell, ARM Template, REST API)

A Linked Service represents either:

- Data Stores = Source or Sink

- Compute resource = Data transformation

Data Stores

Contain credentials and connection information for Sources and Destinations.

An On Premises Data Store MUST reference a Data Gateway
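The gateway rule lends itself to a simple pre-deployment check. The sketch below assumes the ADF V1 convention that on-premises linked service types start with "OnPremises" (for example OnPremisesSqlServer) and that the gateway is referenced through a `gatewayName` property; both are assumptions to verify against the JSON reference, and the helper name and sample documents are hypothetical.

```python
def missing_gateway(linked_service):
    """Return True if an on-premises data store does not reference a gateway.

    Assumption: on-premises types are recognisable by the "OnPremises"
    prefix and the gateway is named in typeProperties["gatewayName"].
    """
    props = linked_service["properties"]
    is_on_prem = props["type"].startswith("OnPremises")
    has_gateway = bool(props.get("typeProperties", {}).get("gatewayName"))
    return is_on_prem and not has_gateway


# Hypothetical linked service definitions for illustration.
good = {
    "name": "SqlServerLinkedService",
    "properties": {
        "type": "OnPremisesSqlServer",
        "typeProperties": {
            "connectionString": "Data Source=<server>;Initial Catalog=<db>;",
            "gatewayName": "<gateway name>",
        },
    },
}
bad = {
    "name": "SqlServerWithoutGateway",
    "properties": {
        "type": "OnPremisesSqlServer",
        "typeProperties": {
            "connectionString": "Data Source=<server>;Initial Catalog=<db>;"
        },
    },
}

if __name__ == "__main__":
    for doc in (good, bad):
        status = "MISSING gateway" if missing_gateway(doc) else "ok"
        print(doc["name"], status)
```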

Dataset

Data structure in the data store

Demos

PowerShell

- Get-AzureRmDataFactoryDataset
- Get-AzureRmDataFactoryGateway
- Get-AzureRmDataFactoryHub
- Get-AzureRmDataFactoryLinkedService
- Get-AzureRmDataFactoryPipeline
- Get-AzureRmDataFactoryRun
- Get-AzureRmDataFactorySlice

Azure Resource Manager Template

More JSON!!

REST API

- CURL tool with REST
- Create Web Application in AAD
- Assign to Data Factory Contributor role
- Use CURL to communicate with Web Application

$cmd = {.\curl.exe -X PUT -H "Authorization: Bearer $accessToken" -H "Content-Type: application/json" --data "@azurestoragelinkedservice.json" https://management.azure..

$results = Invoke-Command -scriptblock $cmd;
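For readers who prefer not to shell out to curl, the same PUT request can be prepared with Python's standard library. This is only a sketch: the function name is hypothetical, the management URL is truncated on the slide and is therefore left as a parameter rather than guessed, and the request is built but deliberately not sent.

```python
import json
import urllib.request


def build_put_request(url, access_token, definition):
    """Build (but do not send) a PUT request carrying a JSON definition.

    url          -- full management endpoint; not shown in full on the slide
    access_token -- bearer token obtained for the AAD web application
    definition   -- dict with the entity definition, e.g. a linked service
    """
    body = json.dumps(definition).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values only; replace with a real endpoint and token.
    request = build_put_request(
        "https://example.invalid/linkedservice",
        "<access token>",
        {"name": "AzureStorageLinkedService"},
    )
    print(request.get_method(), request.full_url)
    # Sending would be: urllib.request.urlopen(request)
```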

SSIS 101: What is SSIS for?

Data extraction, transformation and loading (ETL) tool

Born in 2005 – enterprise ready but still room for new features

Part of SQL Server license and installation

SSIS 101: Benefits

Rich development tool

Many built-in transformations

Extensible with scripts

Development tool

Administration tool

Data source & destinations

Data transformations

Price

Error handling

Deployment

Monitoring

Security

Technology

Requirements

Big Data compatibility

Comparing SSIS and ADF: Dev

(Criteria: Dev, Admin, Deployment, Monitoring, Source & Destinations, Security, Performance)

Development tools

- SSIS: SSDT (free)

- ADF: Azure portal ADF Editor, PowerShell, JSON scripts, Visual Studio

                   SSIS          ADF
Standalone tool    Yes           Yes
Powerful GUI       Yes           No
Available          Free          Free
Prerequisite       SQL Server    Azure Subscription

Comparing SSIS and ADF: Admin

SSIS          ADF
SSMS          Azure portal
PowerShell    PowerShell w/ADF Cmdlets

Comparing SSIS and ADF: Pricing

ADF orchestration

- In the cloud
  - High frequency: €0,51 per activity/month
  - Low frequency: €0,84 per activity/month

- On-prem with Data Management Gateway
  - High frequency: €1,26 per activity/month
  - Low frequency: €2,11 per activity/month

ADF data movement

- Between cloud data stores: €0,21/hour
- When on-prem is involved: €0,08/hour

Inactive pipelines: €0,67/month

ADF re-running activities

- In the cloud: €1,13 per 1000 re-runs
- On-prem: €2,83 per 1000 re-runs

Calculator: https://azure.microsoft.com/en-us/pricing/calculator/?service=data-factory
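To see how the individual price components add up, the sketch below combines them into a rough monthly estimate. The rates are copied as quoted on the slides (euros, May 2017) and will be out of date; the function name and the way usage is parameterised are assumptions, and free allowances or discounts are not modelled. The official calculator remains the authoritative source.

```python
# Rates as quoted on the slides (EUR, May 2017). Illustrative only.
PRICES_EUR = {
    "activity_month": {
        ("cloud", "high"): 0.51,
        ("cloud", "low"): 0.84,
        ("on_prem", "high"): 1.26,
        ("on_prem", "low"): 2.11,
    },
    "movement_hour": {"cloud": 0.21, "on_prem": 0.08},
    "rerun_per_1000": {"cloud": 1.13, "on_prem": 2.83},
    "inactive_pipeline_month": 0.67,
}


def monthly_cost(location, frequency, activities, movement_hours,
                 reruns=0, inactive_pipelines=0):
    """Rough monthly ADF cost in EUR for one location and frequency class.

    location  -- "cloud" or "on_prem"
    frequency -- "high" or "low", as labelled on the slides
    """
    orchestration = activities * PRICES_EUR["activity_month"][(location, frequency)]
    movement = movement_hours * PRICES_EUR["movement_hour"][location]
    rerun_cost = reruns / 1000 * PRICES_EUR["rerun_per_1000"][location]
    inactive = inactive_pipelines * PRICES_EUR["inactive_pipeline_month"]
    return orchestration + movement + rerun_cost + inactive


if __name__ == "__main__":
    estimate = monthly_cost("cloud", "low", activities=10, movement_hours=20,
                            reruns=500, inactive_pipelines=1)
    print(f"Estimated monthly cost: EUR {estimate:.2f}")
```

The point of the exercise is the shape of the bill: with ADF, cost scales with the number of activities and the hours of data movement rather than with a licence edition.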

SSIS pricing by SQL Server edition

- Basic SSIS (Import & Export wizard): free, Express edition

- Standard SSIS: Standard (and BI) editions

- Enterprise SSIS (CDC & advanced adapters): Enterprise edition

Still true with SQL Server 2016 SP1

Comparing SSIS and ADF: Pricing

                    ADF    SSIS
Licensing                  Yes
Pay for features           Yes
Pay per usage       Yes

                                       ADF    SSIS
Azure environment                      Yes
Hardware setup                                Yes
Software setup                                Yes
Administration costs / Data center            Yes

Comparing SSIS and ADF: Deployment

SSIS

- MSDB

- Project deployment

- Package deployment

- SSIS Catalog

ADF

- PowerShell scripts

- Automation with PowerShell scripts

Comparing SSIS and ADF: Monitoring

ADF SSIS

Alerts Yes

Error logging Yes Yes

Error handling Yes

SSIS Logging

SSIS Catalog reports

Diagram view

Drill through features

Capable GUI

Data slice execution

Data lineage

                  ADF    SSIS
Monitoring GUI    Yes    Yes
Drillthrough      Yes    Yes
Data slice        Yes
Data lineage      Yes

Comparing SSIS and ADF: Source & Destinations

SSIS: SQL Server, Oracle, SAP, Azure, Access, Sybase, PostgreSQL, FoxPro, SharePoint, Web Service, ...

ADF: Azure Blob storage, Azure Data Lake Store, Azure SQL Database, Azure SQL Data Warehouse, Azure Table storage, Azure DocumentDB, [Azure Search Index], SQL Server*, Oracle*, MySQL*, DB2*, Teradata*, PostgreSQL*, Sybase*, Cassandra*, MongoDB*, Amazon Redshift, File System*, HDFS*, Amazon S3, FTP, Salesforce, Generic ODBC*, Generic OData, Web Table (table from HTML), GE Historian*

All are sources; sinks are marked in orange on the slide.

Transformations

                             ADF    SSIS
Copy                         Yes    Yes
C# custom transformations    Yes    Yes
Pig and Hive                 Yes    Yes*
Azure ML Scoring             Yes    With scripting
Stored procedure             Yes    Yes
Built-in transformations            Yes

Comparing SSIS and ADF: Security

              ADF    SSIS
Role based    Yes    Yes

Conclusion

Not built for the same purpose

ADF is still a V1 product

ADF is great for cloud data integration for MS and Azure

Get the benefits of both: hybrid SSIS and ADF

- Cloud-based data movement, computing and monitoring

- On-premises data transformations

Contact me on Twitter: @regbac

Questions?
