Overview Of Data Warehousing

Embed Size (px)

Citation preview

  • 8/14/2019 Overview Of Data Warehousing

    1/62

    SUSHILSUSHIL

    KULKARNIKULKARNI

    OVERVIEW OFOVERVIEW OF

    DATA WAREHOUSINGDATA WAREHOUSING

  • 8/14/2019 Overview Of Data Warehousing

    2/62

  • 8/14/2019 Overview Of Data Warehousing

    3/62

    SUSHIL KULKARNI

    DATABASEDATABASE

    VSVSDATA WAREHOUSEDATA WAREHOUSE

  • 8/14/2019 Overview Of Data Warehousing

    4/62

    DATABASEDATABASE

    VSVS

    DATA WAREHOUSEDATA WAREHOUSE

    SUSHIL KULKARNI

    To accelerate decision making:

    1. Right information,

    2. Right time,

    3.Easily accessible

    Problems with database

    1. Fragments

    2. Operational / Information

    Processing

  • 8/14/2019 Overview Of Data Warehousing

    5/62

    SUSHIL KULKARNI

    PROBLEMS !PROBLEMS !

  • 8/14/2019 Overview Of Data Warehousing

    6/62

    PROBLEMSPROBLEMS

    SUSHIL KULKARNI

  • 8/14/2019 Overview Of Data Warehousing

    7/62

    PROBLEMS !PROBLEMS !

    SUSHIL KULKARNI

    IT business requires:

    1. integrated

    2. company-wide view of high quality

    3. Fixed network with changing users

    Informational Processing systemsdepartment must be separated fromoperational systems to improveperformance

  • 8/14/2019 Overview Of Data Warehousing

    8/62

    PROBLEMS !PROBLEMS !

    SUSHIL KULKARNI

    No single system of data.

    View of databases as a whole is difficult

    Organization wants to analyze theactivities in a balance way

    Customer relationship with management

  • 8/14/2019 Overview Of Data Warehousing

    9/62

    SUSHIL KULKARNI

    DEFINITIONDEFINITION

    OFOFDATA WAREHOUSEDATA WAREHOUSE

  • 8/14/2019 Overview Of Data Warehousing

    10/62

    SOSO WHAT ISWHAT IS

    DATA WAREHOUSE ?DATA WAREHOUSE ?

    SUSHIL KULKARNI

    Subject-oriented:

    customers, patients, students,products, time.

    Integrated: Gathered CENTRALLY from

    1.several internal systems of records

    2. sources external to the organization

  • 8/14/2019 Overview Of Data Warehousing

    11/62

    WHAT ISWHAT IS

    DATA WAREHOUSE ?DATA WAREHOUSE ?

    SUSHIL KULKARNI

    Time - variant:

    Use to study trends and changes.

    Non - updatable:

    cannot updated by end users.

  • 8/14/2019 Overview Of Data Warehousing

    12/62

    SUSHIL KULKARNI

    DATA SYSTEMSDATA SYSTEMS

  • 8/14/2019 Overview Of Data Warehousing

    13/62

    SUSHIL KULKARNI

    DATA SYSTEMS AVAILABLE

    INFORMATION SYSTEMSOPERATIONAL SYSTEMS

  • 8/14/2019 Overview Of Data Warehousing

    14/62

    OPERATIONAL SYSTEMSOPERATIONAL SYSTEMS

    SUSHIL KULKARNI

    Used to run a business in real time basedon current data and process large volumesof relatively simple read/write transactions,while providing fast response.

    Examples

    1. Sales order processing2. Reservation systems

    3. Patient registration

  • 8/14/2019 Overview Of Data Warehousing

    15/62

    INFORMATION SYSTEMSINFORMATION SYSTEMS

    SUSHIL KULKARNI

    Designed to support decision-making based on

    1. Historical data

    2. Prediction data.

    Designed for complex queries or data-miningapplications.

    Examples:

    1. Sales trend analysis,

    2. Customer segmentation

    3. Human resources planning

  • 8/14/2019 Overview Of Data Warehousing

    16/62

    DIFFERENCEDIFFERENCE

    SUSHIL KULKARNI

    Periodical batchupdates and queriesrequiring many or allrows

    Many, constantupdates and querieson one or a few tablerows

    Volume

    Ease of flexible accessand use

    Performancethroughput,

    availability

    Design goal

    Broad, ad hoc,complex queries andanalysis

    Narrow, planned, andsimple updates andqueries

    Scope of usage

    Managers, businessanalysts, customers

    Clerks, sales-persons,administrations

    Primary users

    Real and analyzehistorical data.

    Real time data entryPurpose

    Informational

    Systems

    Operational

    Systems

    Characteristics

  • 8/14/2019 Overview Of Data Warehousing

    17/62

    SUSHIL KULKARNI

    DATA WAREHOUSEDATA WAREHOUSESCOPESCOPE

  • 8/14/2019 Overview Of Data Warehousing

    18/62

    DATA WAREHOUSE SCOPEDATA WAREHOUSE SCOPE

    SUSHIL KULKARNI

    Broad :

    Required for companies, Very costly, Maybe divided according to Depts.

    Narrow:

    Required for Personal information

  • 8/14/2019 Overview Of Data Warehousing

    19/62

    SUSHIL KULKARNI

    TYPESTYPES

    OFOFDATA WAREHOUSEDATA WAREHOUSE

  • 8/14/2019 Overview Of Data Warehousing

    20/62

    TYPES OF DATA WAREHOUSETYPES OF DATA WAREHOUSE

    SUSHIL KULKARNI

    Point Point

    End-users allowed to get operational

    databases directly using any tools

  • 8/14/2019 Overview Of Data Warehousing

    21/62

    TYPES OF DATA WAREHOUSETYPES OF DATA WAREHOUSE

    SUSHIL KULKARNI

    Central Data Warehouses

  • 8/14/2019 Overview Of Data Warehousing

    22/62

    TYPES OF DATA WAREHOUSETYPES OF DATA WAREHOUSE

    SUSHIL KULKARNI

    1. EIS : Executive Information System 2. DSS: Decision Support System

    3. Reporting

    Distributed Data Warehouse:

    Certain Components of DW are distributedacross a number of different physicaldatabases

  • 8/14/2019 Overview Of Data Warehousing

    23/62

    SUSHIL KULKARNI

    BIG PICTUREBIG PICTURE

  • 8/14/2019 Overview Of Data Warehousing

    24/62

    BIG PICTUREBIG PICTURE

    SUSHIL KULKARNI

  • 8/14/2019 Overview Of Data Warehousing

    25/62

    SUSHIL KULKARNI

    END USERSEND USERS

  • 8/14/2019 Overview Of Data Warehousing

    26/62

    END USERSEND USERS

    SUSHIL KULKARNI

    Executives and managers

    "Power" users (business and financialanalysts, engineers, etc.)

    Support users (clerical, administrative,

    etc.)

  • 8/14/2019 Overview Of Data Warehousing

    27/62

    SUSHIL KULKARNI

    DATA MARTDATA MART

  • 8/14/2019 Overview Of Data Warehousing

    28/62

    DATA MARTSDATA MARTS

    SUSHIL KULKARNI

  • 8/14/2019 Overview Of Data Warehousing

    29/62

    DATA MARTSDATA MARTS

    SUSHIL KULKARNI

    Create many DMs

    Limited scope Independent ETL process or derived from

    DW

    Examples:

    1.Financial DM

    2. Marketing DM

    3. Supply chain DM

  • 8/14/2019 Overview Of Data Warehousing

    30/62

    D.M. PICTURED.M. PICTURE

    SUSHIL KULKARNI

  • 8/14/2019 Overview Of Data Warehousing

    31/62

    SUSHIL KULKARNI

    DATADATA

    ININDATA WAREHOUSEDATA WAREHOUSE

    DATA INDATA IN

  • 8/14/2019 Overview Of Data Warehousing

    32/62

    DATA INDATA IN

    DATA WARE HOUSEDATA WARE HOUSE

    SUSHIL KULKARNI

    one version of the truth across theenterprisewith meaning full recordesFor IT staff : clean, consistent, and

    documented formatted data.

    For engineer or analyst: convenient, in acommon formatted data, exportable toother common formats

    DATA INDATA IN

  • 8/14/2019 Overview Of Data Warehousing

    33/62

    Production Data: Data from different

    Operational systems with heterogeneous

    platforms

    Internal Data: Private data of organization

    like spread sheets, documents, customerprofiles

    DATA INDATA IN

    DATA WARE HOUSEDATA WARE HOUSE

    SUSHIL KULKARNI

    DATA INDATA IN

  • 8/14/2019 Overview Of Data Warehousing

    34/62

    External Data: Data from external sources.Statistics relating to their industry

    produced by external agencies

    Example: DW of car rental company

    contains data on the current productionschedules of the leading automobilemanufactures

    DATA INDATA IN

    DATA WARE HOUSEDATA WARE HOUSE

    SUSHIL KULKARNI

    DATA INDATA IN

  • 8/14/2019 Overview Of Data Warehousing

    35/62

    Archived Data: Data from current business

    and old data store in archive files

    DATA INDATA IN

    DATA WARE HOUSEDATA WARE HOUSE

    SUSHIL KULKARNI

    DATA INDATA IN

  • 8/14/2019 Overview Of Data Warehousing

    36/62

    Methods of archiving data:

    1. Recent data is archived to separate

    archival database that may be online 2. Old data is archived to flat files on disk

    storage

    3. Oldest data is archived to tapecartridges or microfilms or kept off line

    DATA INDATA IN

    DATA WARE HOUSEDATA WARE HOUSE

    SUSHIL KULKARNI

  • 8/14/2019 Overview Of Data Warehousing

    37/62

    SUSHIL KULKARNI

    DATA BASEDATA BASE

    STRUCTURESTRUCTURE

  • 8/14/2019 Overview Of Data Warehousing

    38/62

    DATA BASE STRUCTUREDATA BASE STRUCTURE

    SUSHIL KULKARNI

    DW made up of three separate databases:

    1. Interim data store

    2. Meta data repository

    3. Production DW

  • 8/14/2019 Overview Of Data Warehousing

    39/62

    SUSHIL KULKARNI

    OLTPOLTP

    ANDANDOLAPOLAP

  • 8/14/2019 Overview Of Data Warehousing

    40/62

    OLTPOLTP

    SUSHIL KULKARNI

    On line transaction processing

    Standard Normalized Structure

    Designed for transactions: Insert, Updates,

    Delete

  • 8/14/2019 Overview Of Data Warehousing

    41/62

    OLAPOLAP

    SUSHIL KULKARNI

    On line analytical processing , StarSchema [See Table]

    Read Only

    Historical data

    Aggregated data

  • 8/14/2019 Overview Of Data Warehousing

    42/62

    SUSHIL KULKARNI

    ARCHETECTUREARCHETECTURE

    ARCHITECTUREARCHITECTURE

  • 8/14/2019 Overview Of Data Warehousing

    43/62

    ARCHITECTUREARCHITECTURE

    ANDAND

    END-TO - PROCESSEND-TO - PROCESS

    SUSHIL KULKARNI

    BACK END TOOLSBACK END TOOLS

  • 8/14/2019 Overview Of Data Warehousing

    44/62

    BACK END TOOLSBACK END TOOLS

    ANDAND

    UTILITIESUTILITIES

    SUSHIL KULKARNI

    Tools are used to extract & loading data

    Data extraction from foreign sources bygateways & interfaces

    Examples: EDA/SQL, ODBC, Oracle OpenConnect, Sybase Enterprise ConnectInformix Enterprise gate way

  • 8/14/2019 Overview Of Data Warehousing

    45/62

    PROCESS OF BRINGING DATAPROCESS OF BRINGING DATA

    TO DATA WAREHOUSETO DATA WAREHOUSE

    ETL PROCESSETL PROCESS

    SUSHIL KULKARNI

  • 8/14/2019 Overview Of Data Warehousing

    46/62

    Large volumes of data from multiple

    sources are involvedHigh probability of errors and anomalies in

    the data

    Tools that help to detect data anomaliesand correct them can have a high payoff

    CLEANINGCLEANING

    SUSHIL KULKARNI

    CLEANINGCLEANING

  • 8/14/2019 Overview Of Data Warehousing

    47/62

    Examples where data cleaning becomesnecessary are:

    1. Inconsistent field lengths,

    2. Inconsistent descriptions,

    3. Inconsistent value assignments,

    4. Missing entries and violation of

    integrity constraints.Different, classes of data cleaning tools used to

    extract & loading data

    1. Data Migration

    2. Data scrubbing

    3. Data Auditing tools

    CLEANINGCLEANING

    SUSHIL KULKARNI

  • 8/14/2019 Overview Of Data Warehousing

    48/62

    Data migrationtools allow simple

    transformation rules to be specified

    Examples: replace the string genderby

    sex.

    Warehouse Manager from Prism is anexample of a popular tool of this kind.

    DATA MIGRATIONDATA MIGRATION

    SUSHIL KULKARNI

  • 8/14/2019 Overview Of Data Warehousing

    49/62

    Data scrubbingtools use domain-specific

    knowledge

    Example: Postal addresses, to do the

    scrubbing of data.

    Use parsing and fuzzy matchingtechniques to accomplish cleaning from

    multiple sources.

    Tools: Integrity and Trillum

    DATA SCRUBBINGDATA SCRUBBING

    SUSHIL KULKARNI

    DATA AUDITINGDATA AUDITING

  • 8/14/2019 Overview Of Data Warehousing

    50/62

    Data auditingtools make it possible todiscover rules and relationships by

    scanning data.

    Example: Tool may discover a suspicious

    pattern (based on statistical analysis) that

    a certain car dealer has never receivedany complaints.

    DATA AUDITINGDATA AUDITING

    SUSHIL KULKARNI

    LOADINGLOADING

  • 8/14/2019 Overview Of Data Warehousing

    51/62

    Additional preprocessing required:

    1.Checking integrity constraints

    2. Sorting; summarization, aggregation

    3.Other computation to build the derived tables

    stored in the warehouse

    Batch load utilities are used for this purpose. Inaddition to populating the warehouse, a load utility

    must allow the system administrator to monitorstatus, to cancel, suspend and resume a load, andto restart after failure with no loss of data integrity.

    LOADINGLOADING

    SUSHIL KULKARNI

    REFRESHREFRESH

  • 8/14/2019 Overview Of Data Warehousing

    52/62

    Refreshing a warehouse consists inpropagating updates on source data tocorrespondingly update the base data andderived data stored in the warehouse.

    Two sets of issues: whento refresh, andhowto refresh.

    REFRESHREFRESH

    SUSHIL KULKARNI

    SUMMARIZATIONSUMMARIZATION

  • 8/14/2019 Overview Of Data Warehousing

    53/62

    Required lot of space to store and requirecomputer time as well as resources.Some of the summaries may containfigures that explain the summary.

    Advantage is that the data warehouse isnot calculating the summaries.

    SUMMARIZATIONSUMMARIZATION

    SUSHIL KULKARNI

    METADATAMETADATA

  • 8/14/2019 Overview Of Data Warehousing

    54/62

    Administrative metadata

    Business metadata includes business

    terms and definitions,

    Operational metadata includesinformation that is collected during theoperation of the warehouse:

    METADATAMETADATA

    SUSHIL KULKARNI

  • 8/14/2019 Overview Of Data Warehousing

    55/62

    The ETL Process

    Capture

    Scrub or data cleansing

    Transform

    Load and Index

    ETL = Extract, transform, and load

    SSt i d t ili ti

  • 8/14/2019 Overview Of Data Warehousing

    56/62

    Steps in data reconciliationSteps in data reconciliation

    Static extract=

    capturing a snapshot of

    the source data at a

    point in time

    Incremental extractIncremental extract=

    capturing changes that

    have occurred since the

    last static extract

    Capture = extractobtaining a snapshot of a

    chosen subset of the source data for loading

    into the data warehouse

    Steps in data reconciliationSteps in data reconciliation

  • 8/14/2019 Overview Of Data Warehousing

    57/62

    Scrub = cleanseuses pattern

    recognition and AI techniques to

    upgrade data quality

    Fixing errors:misspellings,

    erroneous dates, incorrect field

    usage, mismatched addresses,

    missing data, duplicate data,

    inconsistencies

    Also:decoding, reformatting,

    time stamping, conversion, key

    generation, merging, error

    detection/logging, locating

    missing data

    Steps in data reconciliationSteps in data reconciliation

    Steps in data reconciliationSteps in data reconciliation

  • 8/14/2019 Overview Of Data Warehousing

    58/62

    Transform = convert data from

    format of operational system to

    format of data warehouse

    Record-level:Selection data partitioning

    Joining data combining

    Aggregation data

    summarization

    Field-level:single-field from one field to

    one field

    multi-field from many fields to

    one, or one field to many

    Steps in data reconciliationSteps in data reconciliation

    Steps in data reconciliationSteps in data reconciliation

  • 8/14/2019 Overview Of Data Warehousing

    59/62

    Load/Index = place transformed

    data into the warehouse and

    create indexes

    Refresh mode:bulkrewriting of target data at

    periodic intervals

    Update mode: onlychanges in source data are

    written to data warehouse

    Steps in data reconciliationSteps in data reconciliation

    Si l fi ld t f tiSi l fi ld t f ti

  • 8/14/2019 Overview Of Data Warehousing

    60/62

    Single-field transformationSingle-field transformation

    In general some transformation function

    translates data from old form to new form

    Algorithmic transformation uses a formula

    or logical expression

    Tablelookup another approach

    Multi field transformationMulti field transformation

  • 8/14/2019 Overview Of Data Warehousing

    61/62

    Multi field transformationMulti field transformation

    M:1 from many source

    fields to one target field

    1:M from one

    source field to

    many target fields

  • 8/14/2019 Overview Of Data Warehousing

    62/62

    T H A N K S !T H A N K S !