Building the Warehouse
Chapter 10
Overview
* Defining DW Concepts & Terminology
* Planning for a Successful Warehouse
* Project Management (Methodology, Maintaining Metadata)
* Meeting a Business Need
* Choosing a Computing Architecture
* Modeling the Data Warehouse
* Analyzing User Query Needs
* Planning Warehouse Storage
* ETT (Building the Warehouse)
* Supporting End User Access
* Managing the Data Warehouse
Extraction/Transformation/Transportation Process (ETT)
* Extract source data
* Transform/clean data
* Index and summarize
* Load data into WH
* Detect change
* Refresh data
[Figure: operational systems feed the warehouse through ETT, carried out with programs, gateways, and tools]
ETT Processes
* Must result in data that is relevant, useful, high-quality, accurate, and accessible
* Require a large proportion of warehouse development time and resources
[Figure: ETT cleans up, consolidates, and restructures data from operational systems so that warehouse data is relevant, useful, quality, accurate, and accessible]
Data Staging Area
* The construction site for the warehouse
* Required by most implementations
* Composed of ODS, flat files, or relational server tables
* Frequently configured as multitier staging
[Figure: operational system -> extract -> data staging area -> transport (load) -> warehouse]
Remote Staging Model
* Data staging area within the warehouse environment
[Figure: operational system (operational environment) -> data staging area and warehouse (warehouse environment)]
* Data staging area in its own environment, avoiding negative impact on the warehouse environment
[Figure: operational system (operational environment) -> data staging area (staging environment) -> warehouse (warehouse environment); steps labeled "Extract, transform, transport" and "Transport (local)"]
Onsite Staging Model
* Data staging area within the operational environment, possibly affecting the operational system
[Figure: operational system and data staging area (operational environment) -> warehouse (warehouse environment); steps labeled "Extract" and "Transform"]
Extracting Data
* Routines developed to select fields from source
* Various data formats
* Rules, audit trails, error correction facilities
[Figure: operational databases -> data staging area (data mapping, transform) -> warehouse database]
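As a rough illustration of an extraction routine that selects fields from a source, applies a rule, and keeps an audit trail, a minimal Python sketch follows. The CSV source format, the field names, and the `extract_fields` helper are assumptions made for illustration; they are not part of the course material.

```python
import csv
import io

def extract_fields(source_text, wanted_fields):
    """Select only the wanted fields from CSV source data.

    Returns (rows, audit); audit counts records read and extracted
    and lists the source line numbers that were rejected.
    """
    rows = []
    audit = {"read": 0, "extracted": 0, "rejected": []}
    reader = csv.DictReader(io.StringIO(source_text))
    for line_no, record in enumerate(reader, start=2):  # line 1 is the header
        audit["read"] += 1
        # Rule (assumed): every wanted field must be present and non-empty.
        if any(not (record.get(name) or "").strip() for name in wanted_fields):
            audit["rejected"].append(line_no)
            continue
        rows.append({name: record[name].strip() for name in wanted_fields})
        audit["extracted"] += 1
    return rows, audit

# Hypothetical source extract: the second record has no name.
SAMPLE = "id,name,dob,notes\n123,Bloggs,10/12/56,vip\n124,,01/02/60,\n"
rows, audit = extract_fields(SAMPLE, ["id", "name", "dob"])
print(rows)
print(audit)
```

Rejected line numbers are kept rather than silently dropped so that an error correction step can revisit them later.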
Source Systems
* Production
* Archive
* Internal
* External
Production Data
* Operating system platforms
* Hardware platforms
* File systems
* Database systems and vertical applications
  - IMS, DB2, VSAM, NonStop SQL, Oracle, Sybase, Rdb
  - SAP, Shared Medical Systems, Dun and Bradstreet Financials, Hogan Financials, Oracle Financials
Archive Data
* Historical data
* Useful for analysis over long periods of time
* Useful for first-time load
* May require unique transformations
[Figure: operational database -> warehouse database]
Internal Data
* Planning, sales, and marketing organization data
* Maintained by:
  - Spreadsheets (structured)
  - Documents (unstructured)
* Treated like any other source data
[Figure: planning, marketing, and accounting data -> warehouse database]
External Data
* Information from outside the organization
* Issues of frequency, format, and predictability
* Described and tracked using metadata
[Figure: external sources feeding the warehousing databases: A.C. Nielsen, IRI, IMS, Walsh America; competitive information; economic forecasts; Wall Street Journal; Barron's; Dun and Bradstreet; purchased databases]
Mapping
* Defines which operational attributes to use
* Defines how to transform the attributes for the warehouse
* Defines where the attributes exist in the warehouse
* Mapping tools are available

Metadata:
File A    Staging File One
F1        Number
F2        Name
F3        DOB

File A:
F1        123
F2        Bloggs
F3        10/12/56

Staging File One:
Number    USA123
Name      Mr.Bloggs
DOB       10-Dec-56
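The File A to Staging File One example can be sketched in Python as mapping metadata that names, for each source field, its target field and a transformation. The specific rules here (prefixing "USA", prefixing "Mr.", and reading the source date as day/month/year) are inferred from the sample values on the slide and should be treated as assumptions; `MAPPING`, `reformat_date`, and `map_record` are hypothetical names.

```python
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def reformat_date(value):
    """Turn dd/mm/yy into dd-Mon-yy (assumes day-first source dates)."""
    day, month, year = value.split("/")
    return f"{int(day):02d}-{MONTHS[int(month) - 1]}-{year}"

# Mapping metadata: source field -> (target field, transformation).
MAPPING = {
    "F1": ("Number", lambda v: "USA" + v),
    "F2": ("Name", lambda v: "Mr." + v),
    "F3": ("DOB", reformat_date),
}

def map_record(source_record, mapping=MAPPING):
    """Apply the mapping metadata to one source record."""
    return {
        target: transform(source_record[source])
        for source, (target, transform) in mapping.items()
    }

staged = map_record({"F1": "123", "F2": "Bloggs", "F3": "10/12/56"})
print(staged)
```

Keeping the rules in a data structure rather than in the code path means the same metadata can document the mapping and drive it.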
Extraction Techniques
* Programs: C, COBOL, PL/SQL
* Gateways: transparent database access
* In-house development is popular
* Tools
  - High initial cost
  - Ongoing automation
  - Data cleanup
Sources and Targets
Data marts
Data analysis
Data mining
OLAP
Designing Extraction Processes
* Analysis:
  - Source, technologies
  - Data types, quality, owners
* Design options:
  - Manual, custom, gateway, third-party
  - Replication, full, or delta refresh
* Design issues:
  - Batch window, volumes, data currency
  - Automation, skills needed, resources
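To make the difference between a full and a delta refresh concrete, the sketch below compares two snapshots of a source table and reports only what changed. Snapshot comparison is one of several ways to detect change and is used here as an assumption; the `detect_changes` helper and the sample rows are hypothetical.

```python
def detect_changes(previous, current):
    """Compare two snapshots keyed by primary key.

    Returns the keys that were inserted, updated, or deleted
    between the previous and the current snapshot.
    """
    prev_keys, curr_keys = set(previous), set(current)
    return {
        "inserted": sorted(curr_keys - prev_keys),
        "updated": sorted(
            key for key in prev_keys & curr_keys
            if previous[key] != current[key]
        ),
        "deleted": sorted(prev_keys - curr_keys),
    }

# Hypothetical snapshots: key -> row values.
yesterday = {1: ("Bloggs", "NY"), 2: ("Smith", "LA"), 3: ("Jones", "SF")}
today = {1: ("Bloggs", "NY"), 2: ("Smith", "TX"), 4: ("Brown", "DC")}

delta = detect_changes(yesterday, today)
print(delta)
```

A delta refresh would move only the rows named in `delta`, which matters when the batch window is short or volumes are large; a full refresh would reload every row of `today`.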
Maintaining Extraction Metadata
* Source location, type, structure
* Access method
* Privilege information
* Temporary storage
* Failure procedures
* Validity checks
* Handlers for missing data
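One possible way to hold these items is a single metadata record per source. The sketch below is illustrative only: the `ExtractionMetadata` class, its field names, and all sample values (paths, account names) are hypothetical and not taken from any tool mentioned in this lesson.

```python
from dataclasses import dataclass, field

@dataclass
class ExtractionMetadata:
    source_location: str
    source_type: str
    structure: list            # ordered source field names
    access_method: str
    privileges: str
    temp_storage: str
    failure_procedure: str
    validity_checks: list = field(default_factory=list)
    missing_data_defaults: dict = field(default_factory=dict)

    def fill_missing(self, record):
        """Handler for missing data: substitute the recorded default."""
        filled = dict(record)
        for name in self.structure:
            if filled.get(name) in (None, ""):
                filled[name] = self.missing_data_defaults.get(name)
        return filled

# All values below are made up for the example.
meta = ExtractionMetadata(
    source_location="/data/orders.dat",
    source_type="flat file",
    structure=["id", "region"],
    access_method="nightly file transfer",
    privileges="read-only account",
    temp_storage="/staging/tmp",
    failure_procedure="retry once, then notify the operator",
    validity_checks=["id is numeric"],
    missing_data_defaults={"region": "UNKNOWN"},
)
print(meta.fill_missing({"id": "1", "region": ""}))
```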
Possible ETT Failure
* A missing source file
* A system failure
* Poor mapping information
* Inadequate storage planning
* A source structural change
* No contingency plan
* Inadequate data validation
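Two of these failures, a missing source file and a source structural change, can be caught before extraction starts. The following is a minimal sketch assuming CSV sources with a header row; the `preflight` helper and the file names are hypothetical.

```python
import csv
import os
import tempfile

def preflight(path, expected_header):
    """Return a list of problems found before extraction starts."""
    if not os.path.exists(path):
        return ["missing source file: " + path]
    with open(path, newline="") as handle:
        header = next(csv.reader(handle), [])
    if header != expected_header:
        return ["source structure changed: expected %s, found %s"
                % (expected_header, header)]
    return []

# Demonstration against a temporary, made-up source file.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "customers.csv")
    with open(source, "w", newline="") as handle:
        handle.write("id,name,dob\n123,Bloggs,10/12/56\n")
    ok = preflight(source, ["id", "name", "dob"])
    changed = preflight(source, ["id", "name", "birth_date"])
    missing = preflight(os.path.join(tmp, "absent.csv"), ["id"])

print(ok)
print(changed)
print(missing)
```

An empty list means the source passed both checks; anything else would be handed to whatever contingency procedure the project defines.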
Maintaining ETT Quality
* ETT must be:
  - Tested
  - Documented
  - Monitored and reviewed
* Display metadata must be coordinated
Selection Criteria
* Base functionality
* Interface features
* Metadata repository
* Open API
* Metadata access
* Repository utilities
* Input and output processing
* Cleansing, reformatting, and auditing
* Reference
* Training requirements
WTI Partner ETT Tools
* Carleton
* Constellar
* Evolutionary Technologies
* Informatica
* Information Builders
* Oracle EDMS, Toolkits, OADW
* Prism Solutions
* Sagent
* Vality Technology
Summary
This lesson discussed the following topics:
* ETT processes are essential and consume a large proportion of warehouse resources and time
* The extraction process acquires source data
* You may encounter many data sources
* There are many data extraction issues
* ETT tools should be considered