A Generalized Lesson in ETL Architecture. Presented by Wes Dumey, Durable Impact Consulting, Inc. June 11, 2007

Page 1: A Generalized Lesson in ETL Architecture

A Generalized Lesson in ETL Architecture

Presented by Wes Dumey

Durable Impact Consulting, Inc.

June 11, 2007

Page 2: A Generalized Lesson in ETL Architecture

Agenda

1. ETL Overview

2. When is ETL Appropriate?

3. Tools vs. Hard Coding

4. ETL Architecture

Page 3: A Generalized Lesson in ETL Architecture

• ETL Overview: 20 minutes

• ETL Design Tips: 20 minutes

• Demonstration: 20 minutes

• Ask questions at any time

Page 4: A Generalized Lesson in ETL Architecture

Speaker Biography

• Senior Consultant, Durable Impact Consulting, Inc.
• Experience on high-performance data warehouses
• Education
  – B.S. in Computer Information Systems, Missouri State University
  – M.A. in Business Economics (in progress), University of South Florida
• External interests: aviation (private pilot) and economics

Page 5: A Generalized Lesson in ETL Architecture

ETL Overview

• Extract, Transform, and Load (ETL) is used to populate a data warehouse
• Extract is where data is pulled from source systems
  – SQL connections over the network
  – Flat files
  – Transaction messaging (MSMQ)
• Transformations can be the most complex part of data warehousing
  – Convert text to numbers
  – Apply business logic in this stage
• Load is where data is loaded into the data warehouse
  – Sequential or bulk loading
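The three stages above can be sketched in a few lines of Python. This is only an illustration, not the approach shown in the talk: the in-memory CSV stands in for a flat-file extract, SQLite stands in for the warehouse, and the names `RAW_EXTRACT`, `fact_orders`, and `run_etl` are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extract; a real job would read a file or query a source system.
RAW_EXTRACT = """order_id,amount,region
1001, 19.99 ,east
1002,5,WEST
1003,not-a-number,east
"""


def extract(text):
    """Extract: pull rows from a flat file (here an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert text to numbers and apply a simple business rule."""
    clean, rejected = [], []
    for row in rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            rejected.append(row)  # keep bad rows for review instead of dropping them silently
            continue
        # Assumed business rule: region codes are stored in upper case.
        clean.append((int(row["order_id"]), amount, row["region"].strip().upper()))
    return clean, rejected


def load(conn, rows):
    """Load: insert all rows in one batch rather than one statement per row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, region TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text=RAW_EXTRACT):
    conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    clean, rejected = transform(extract(text))
    load(conn, clean)
    return conn, rejected
```

Calling `run_etl()` returns the connection and the list of rejected rows, so the caller can decide what to do with records that failed conversion.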

Page 6: A Generalized Lesson in ETL Architecture

When is ETL Appropriate?

• Many companies find value in the graphical representation of data and use it in other applications as well

• ETL is very efficient when designed correctly

Page 7: A Generalized Lesson in ETL Architecture

ETL Tools vs. Hard Coding

• Many shops still use hard code (triggers, procedures, code blocks)
• Hard to maintain code
• Hard to scale properly
• ETL tools easy to visualize flows
• With SSIS, there is no good reason to not use an ETL tool

Page 8: A Generalized Lesson in ETL Architecture

What is going on here?

Page 9: A Generalized Lesson in ETL Architecture

ETL Design Methodology

• Steps for successful ETL design

1. Clear and concise requirements

2. Modularized design

3. Data cleansing capability

4. High emphasis on data quality

5. Functional testing

6. Sufficient documentation

• See the methodology document

Page 10: A Generalized Lesson in ETL Architecture

ETL Methodology Steps

1. Extract the data – pulls data

2. Load PSA and audit tables

3. Source Load Temp – sources and cleanses data

4. Lookup Dimensions – extract records for update

5. Lookup Facts

6. Transform Facts

7. Transform Dimensions

8. Quality Check

9. Load
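The nine steps above could be laid out roughly as follows. This is a compressed sketch under several assumptions: PSA is read as a persistent staging area, SQLite stands in for the warehouse, the lookup and transform steps are collapsed into single statements, and every table and function name (`psa_sales`, `tmp_sales`, `run_batch`, and so on) is hypothetical rather than taken from the methodology document.

```python
import sqlite3

# Hypothetical source rows: (customer code, customer name, amount as text).
SOURCE_ROWS = [
    ("C1", "Acme", "100.50"),
    ("C2", "Globex", "75"),
    ("C1", "Acme", "bad"),
]

SCHEMA = """
CREATE TABLE psa_sales (batch_id INTEGER, customer_code TEXT,
                        customer_name TEXT, amount_text TEXT);
CREATE TABLE audit (batch_id INTEGER, step TEXT, row_count INTEGER);
CREATE TABLE tmp_sales (customer_code TEXT, customer_name TEXT, amount REAL);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY AUTOINCREMENT,
                           customer_code TEXT UNIQUE, customer_name TEXT);
CREATE TABLE fact_sales (customer_key INTEGER, amount REAL);
"""


def run_batch(conn, source_rows, batch_id=1):
    conn.executescript(SCHEMA)
    cur = conn.cursor()

    # 1. Extract the data.
    extracted = list(source_rows)

    # 2. Load PSA and audit tables: keep the raw rows and record how many arrived.
    cur.executemany("INSERT INTO psa_sales VALUES (?, ?, ?, ?)",
                    [(batch_id, *row) for row in extracted])
    cur.execute("INSERT INTO audit VALUES (?, 'extract', ?)",
                (batch_id, len(extracted)))

    # 3. Source load temp: cleanse rows on the way into a working table.
    staged_rows = cur.execute(
        "SELECT customer_code, customer_name, amount_text "
        "FROM psa_sales WHERE batch_id = ?", (batch_id,)).fetchall()
    for code, name, amount_text in staged_rows:
        try:
            amount = float(amount_text)
        except ValueError:
            continue  # the rejected row is still available in the PSA
        cur.execute("INSERT INTO tmp_sales VALUES (?, ?, ?)",
                    (code.strip(), name.strip(), amount))

    # 4 and 7. Lookup and transform dimensions: add customers not seen before.
    cur.execute("INSERT OR IGNORE INTO dim_customer (customer_code, customer_name) "
                "SELECT DISTINCT customer_code, customer_name FROM tmp_sales")

    # 5 and 6. Lookup and transform facts: swap natural keys for surrogate keys.
    facts = cur.execute(
        "SELECT d.customer_key, t.amount FROM tmp_sales t "
        "JOIN dim_customer d ON d.customer_code = t.customer_code").fetchall()

    # 8. Quality check: every cleansed row must have resolved to a dimension key.
    staged = cur.execute("SELECT COUNT(*) FROM tmp_sales").fetchone()[0]
    if len(facts) != staged:
        raise ValueError("fact rows lost during dimension lookup")

    # 9. Load.
    cur.executemany("INSERT INTO fact_sales VALUES (?, ?)", facts)
    cur.execute("INSERT INTO audit VALUES (?, 'load', ?)", (batch_id, len(facts)))
    conn.commit()
```

Keeping the raw rows in the staging table and writing row counts to the audit table at both ends makes it possible to explain afterwards why fewer rows were loaded than were extracted.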

Page 11: A Generalized Lesson in ETL Architecture

Design Considerations

• Naming conventions and comments
• Standard approaches allow for:
  – Quick, micro-batch processing (if desired)
  – Ability to pause/resume, resurrection
• Data cleansing
• Legal requirements (HIPAA, SOX)
• Industry-standard best practices
• Data retention
  – Archive vs. purge
• Quality
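One way to get the pause/resume behavior listed above is to record each completed step in a checkpoint, so that a rerun after a failure skips the work already done. The sketch below assumes a JSON file as the checkpoint store and steps that are safe to rerun from their start; `run_with_checkpoint` and the file layout are hypothetical, not something the slides prescribe.

```python
import json
import os


def run_with_checkpoint(steps, checkpoint_path):
    """Run (name, callable) steps in order, skipping steps already recorded as done.

    Returns the names of the steps executed during this call.
    """
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)  # steps completed by an earlier, interrupted run

    ran = []
    for name, func in steps:
        if name in done:
            continue  # finished before the interruption
        func()  # an exception here leaves the checkpoint file in place
        done.append(name)
        ran.append(name)
        with open(checkpoint_path, "w") as fh:
            json.dump(done, fh)  # persist progress after every step

    os.remove(checkpoint_path)  # batch complete; the next run starts from scratch
    return ran
```

If the second of three steps raises, the checkpoint file lists only the first step, and calling `run_with_checkpoint` again with the same path resumes at the second.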

Page 12: A Generalized Lesson in ETL Architecture

Demonstration

Page 13: A Generalized Lesson in ETL Architecture

Let’s Get Started

• Gather Functional Requirements
• Build the Data Model
• Write Technical Specification
• Construct
• Test

Follows the systems development lifecycle

IECT

Page 14: A Generalized Lesson in ETL Architecture

Closing Info

• Presenter Information
• Blog
  – www.thedamndata.com “A techies’ discussion of databases, datawarehouses, and the damn data itself”
  – Covering SQL Server 2005, Oracle, IBM WebSphere DataStage ETL tool, SSIS, and whatever the hell else is on my mind
  – Check it out – funny and hopefully informative
• Corporate Information
  – www.durableimpact.com – Durable Impact Consulting
• Presenting finalized EDW at Tampa Code Camp
