Upload
maneesh-gupta
View
137
Download
2
Embed Size (px)
Citation preview
Company
LOGOA Generalized Lesson in ETL Architecture
Presented by Wes Dumey
Durable Impact Consulting, Inc.
June 11, 2007
Agenda
1. ETL Overview1. ETL Overview
2. When is ETL Appropriate?2. When is ETL Appropriate?
3. Tools vs. Hard Coding3. Tools vs. Hard Coding
4. ETL Architecture 4. ETL Architecture
• ETL Overview– 20 Minutes
• ETL Design Tips– 20 Minutes
• Demonstration– 20 Minutes
• Ask questions at any time
Speaker Biography• Senior Consultant, Durable Impact Consulting,
Inc.• Experience on high-performance data
warehouses• Education
– B.S. in Computer Information Systems• Missouri State University
– M.A. in Business Economics (in progress)• University of South Florida
• External interests: aviation (Private pilot) and economics
ETL Overview
• Extract Transform and Load is used to populate a data warehouse
• Extract is where data is pulled from source systems– SQL connect over networks– Flat files– Transaction messaging (MSMQ)
• Transformations can be the most complex part of data warehousing– Convert text to numbers– Apply business logic in this stage
• Load is where data is loaded into the data warehouse – Sequential or bulk loading
ETL?
• Many companies find value in the graphical representation of data and use it in other applications as well
• ETL is very efficient when designed correctly
ETL Tools vs. Hard Coding
• Many shops still use hard code (triggers, procedures, code blocks)
• Hard to maintain code• Hard to scale properly• ETL tools easy to visualize flows• With SSIS, there is no good reason to not
use an ETL tool
What is going on here?
ETL Design Methodology
• Steps for successful ETL Design1. Clear and concise requirements
2. Modularized design
3. Data cleansing capability
4. High Emphasis on Data Quality
5. Functional Testing
6. Sufficient Documentation• See the methodology document
ETL Methodology Steps
1. Extract the data – pulls data
2. Load PSA and audit tables
3. Source Load Temp – sources and cleanses data
4. Lookup Dimensions – extract records for update
5. Lookup Facts
6. Transform Facts
7. Transform Dimensions
8. Quality Check
9. Load
Design Considerations
• Naming conventions and comments• Standard approaches allow for:
– Quick, micro-batch processing (if desired)– Ability to pause/resume, resurrection
• Data cleansing • Legal requirements (HIPAA, SOX)• Industry-standard best practices• Data retention
– Archive vs Purge• Quality
Demonstration
Let’s Get Started
• Gather Functional Requirements• Build the Data Model• Write Technical Specification• Construct• Test
Follows the systems development lifecycle
IECT
Closing Info
• Presenter Information• Blog
– www.thedamndata.com “A techies’ discussion of databases, datawarehouses, and the damn data itself”
– Covering SQL Server 2005, Oracle, IBM Websphere DataStage ETL tool, SSIS, and whatever the hell else is on my mind
– Check it out – funny and hopefully informative• Corporate Information
– www.durableimpact.com – Durable Impact Consulting• Presenting finalized EDW at Tampa Code Camp
Cycle Diagram
Text
TextText
Text
Text
Add Your Text
Cycle Name
Progress Diagram
Phase 1Phase 1 Phase 2Phase 2 Phase 3Phase 3
Block Diagram
TEXT TEXT TEXT TEXT
TEXT TEXT TEXT TEXT
Table
TEXT TEXT TEXT TEXT TEXT
Title A
Title B
Title C
Title D
Title E
Title F
3-D Pie Chart
TEXT
TEXT
TEXT
TEXTTEXT
TEXT
Marketing Diagram
Title
TEXT TEXTTEXT TEXT