31
ETL with SSIS 101 read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected] 1

ETL with SSIS 101 - belle's sql · PDF fileData Warehousing ETL Understanding the data – what needs to be captured and reported Also understanding your own Data Quality Naming conventions

Embed Size (px)

Citation preview

ETL with SSIS 101

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

whoami? aka shameless plugs

� Donabel Santos

� SQL Geek (I love SQL Server!), PowerShell enthusiast

� Principal, QueryWorks Solutions� Training, Consulting

� Instructor, BC Institute of Technology

� Blogger – sqlmusings.com

� Tweep – @sqlbelle

� Author – some articles + upcoming PowerShell/SQL Server book (summer 2012)

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Agenda

� What ETL is

� SSIS Refresher

� Demo, demo, demo

� SSIS Tips

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Materials

� Presentation materials will be at my blog

� http://www.sqlmusings.com

� Demo files

� Slides

� Questions? Comments? Email me

[email protected]

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

ETL

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Extract Transform Load (ETL)

� A way to bridge “source” systems and

“destination” systems

� Mostly used for cleaning data and loading

data warehouse

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Lifecycle: Extract, Transform, Load

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Data Warehousing ETL

� Understanding the data – what needs to be captured and reported

� Also understanding your own Data Quality

� Naming conventions

� Schemas would also be helpful (stg, etl, tmp, dwh)

� Auditing

� When packages last run, when records are last updated etc

� NULL Values

� Not acceptable in a DW

� Will cause loss of data when joins are done

� Maps to an “Unknown” default value in the dimensions

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Data Warehousing ETL

� Staging

� Good for auditing, recoverability, releasing sources (tables, files etc)

� Checksum

� Allows you to easily figure out if any records have changed

� Helpful for Slowly Changing Dimensions (SCD)

� Surrogate Key (SK) vs Business Key/Alternate Key (BK or AK)

� SK is a “meaningless” number that is the PK in the DW

� Faster; no dependencies in source keys; avoids collision

� BK or AK is the PK in the source

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Data Warehousing ETL – Dealing with Changes

� Kimball – Slowly Changing Dimensions (SCD)

� SCD Type 1

� Only keep last value

� How: UPDATE

� When to use: errors, don’t want to keep history, don’t have a lot of storage

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

SK FName City

1 Jamie Vancouver

SK FName City

1 Jamie Richmond

Data Warehousing ETL – Dealing with Changes

� Kimball – Slowly Changing Dimensions (SCD)

� SCD Type 2

� Keep all historical values

� HOW: Insert new records, update start and end dates

� When to use: you want to keep history

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

SK FName City StartDate EndDate

1 Jamie Vancouver 2012-01-01 2012-03-10

2 Jamie Richmond 2012-03-01 NULL

Data Warehousing ETL – Dealing with Changes

� Kimball – Slowly Changing Dimensions (SCD)

� SCD Type 3

� Keep only current, plus last value before current

� How: one more column that has previous value

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

SK FName PreviousCity EffectiveDate CurrentCity

1 Jamie Vancouver 2012-03-10 Richmond

SSIS REFRESHER

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

BIDS Tour

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

SSIS Components – Data Flow

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Source Transformation Destination

Common Transformations

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Variables

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Expressions

� Allow you to have more dynamic (and powerful) SSIS packages

� Allow you to set properties at runtime

� “Configurable” properties depend on which component you’re working with

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Expressions

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Expressions

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

� Check if field is null� ISNULL(RegistrationDate)

� Checks if SellDate is June or later� DATEPART("mm",SellStartDate) > 6

� Calculates markup� ROUND(ListPrice / StandardCost,2) * 100

� Checks if product is a manufactured finished good and size is not a specified bike size� MakeFlag == TRUE && FinishedGoodsFlag == TRUE && Size !=

(DT_WSTR,10)@BikeSize

� Ternary Operator� UPPER(SUBSTRING(Color,1,1)) + LOWER(SUBSTRING(Color,2,15))

� File with today’s date� "C:\\SSIS\\File_" + (DT_WSTR, 30)(DT_DBDATE)GETDATE() + ".txt"

MISSION POSSIBLE

Our mission for today, should we choose to accept it …

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Sources

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Excel 2007

Access 2007

MySQL

Text File

XML

Mission Accomplished

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

GENERAL SSIS TIPS

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Tip: Package Configuration Files

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Tip: EncryptSensitiveWithUserKey

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Tip: Credentials and Proxies

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

What we covered

� What ETL is

� SSIS Refresher

� Demo, demo, demo

� SSIS Tips

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

ETL/SSIS Resources

� BI Developer Network

� http://www.bidn.com/

� Jamie Thomson – SSIS Junkie

� http://sqlblog.com/blogs/jamie_thomson/

� Pragmatic Works

� http://pragmaticworks.com/

� Matt Masson – SSIS Team Blog

� http://blogs.msdn.com/b/mattm/

� Andy Leonard

� http://sqlblog.com/blogs/andy_leonard/

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

THANK YOU!

Presentation materials will be at my blog

http://www.sqlmusings.com

Demo files

Slides

Cheatsheets

Questions? Comments? Email me

[email protected]