finalpresentation-111220200340-phpapp01


  • 7/31/2019 finalpresentation-111220200340-phpapp01


    Data Warehousing

    A data warehouse is a subject-oriented, integrated, time-variant, and nonupdatable collection of data in support of management's decision-making process.

    Subject-Oriented: organized around high-level entities such as customers, patients, students, products, and time.

    Integrated: data gathered from several internal systems of record or from sources external to the organization.


    Time-Variant: a time dimension is used in data warehousing to study trends and changes.

    Nonupdatable: new data is always added as a supplement to the database rather than as a replacement. The database continually absorbs this new data, incrementally integrating it with previous data.

    A data warehouse can consist of more than one database.
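The nonupdatable and time-variant properties can be illustrated with a small sketch. It assumes a hypothetical in-memory `AppendOnlyStore` class (not from the source): every load appends a row stamped with its load date instead of overwriting an existing row, so any earlier state can still be queried.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """One value as it was loaded on a given date (hypothetical schema)."""
    key: str
    value: float
    loaded_on: date


@dataclass
class AppendOnlyStore:
    """Hypothetical warehouse table: rows are only ever appended, never updated."""
    rows: list = field(default_factory=list)

    def load(self, key: str, value: float, loaded_on: date) -> None:
        # New data supplements old data instead of replacing it.
        self.rows.append(Snapshot(key, value, loaded_on))

    def as_of(self, key: str, when: date):
        """Return the latest value for `key` loaded on or before `when`, or None."""
        candidates = [r for r in self.rows if r.key == key and r.loaded_on <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r.loaded_on).value

    def history(self, key: str):
        """Return (date, value) pairs for `key` in date order, for trend analysis."""
        ordered = sorted(self.rows, key=lambda r: r.loaded_on)
        return [(r.loaded_on, r.value) for r in ordered if r.key == key]


store = AppendOnlyStore()
store.load("customer-42/balance", 100.0, date(2019, 1, 31))
store.load("customer-42/balance", 250.0, date(2019, 2, 28))
print(store.as_of("customer-42/balance", date(2019, 2, 1)))  # 100.0
print(len(store.rows))  # 2
```

Because nothing is overwritten, `history()` returns the full trend for a key, which is what the time-variant property is meant to support. A real warehouse would keep these rows in database tables rather than in a Python list.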


    In Simple Words

    A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use in a business context.


    Problem: Heterogeneous Information Sources

    Heterogeneities are everywhere:

    Different interfaces

    Different data representations

    Duplicate and inconsistent information

    Example: combining research results from different bioinformatics repositories.

    [Diagram: heterogeneous information sources, including personal databases, digital libraries, scientific databases, and the World Wide Web]


    Goal: Unified Access to Data

    Integration System

    Collects and combines information

    Provides integrated view, uniform user interface

    Supports sharing

    [Diagram: an integration system providing unified access to the World Wide Web, digital libraries, scientific databases, and personal databases]
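The integration problem above can be sketched in a few lines. Everything here is hypothetical: two sources (`A` and `B`) that name and format the same fields differently, adapter functions that map each into one common representation, and a merge step that combines duplicates and reports inconsistent values rather than silently choosing one.

```python
import re


def normalize_name(raw: str) -> str:
    """Map 'LOVELACE, ADA' and 'Ada Lovelace' to the same canonical form."""
    raw = raw.strip()
    if "," in raw:
        last, first = [part.strip() for part in raw.split(",", 1)]
        raw = f"{first} {last}"
    return " ".join(raw.split()).title()


def normalize_phone(raw: str) -> str:
    """Keep digits only so '(555) 0100' and '555-0100' compare equal."""
    return re.sub(r"\D", "", raw)


# Hypothetical adapters: each source uses its own field names and formats.
def from_source_a(rec: dict) -> dict:
    return {"name": normalize_name(rec["Name"]), "phone": normalize_phone(rec["Phone"]), "source": "A"}


def from_source_b(rec: dict) -> dict:
    return {"name": normalize_name(rec["full_name"]), "phone": normalize_phone(rec["tel"]), "source": "B"}


def integrate(records: list) -> tuple:
    """Merge records sharing a canonical name; report conflicting phone numbers."""
    unified, conflicts = {}, []
    for rec in records:
        existing = unified.get(rec["name"])
        if existing is None:
            unified[rec["name"]] = {"name": rec["name"], "phone": rec["phone"], "sources": [rec["source"]]}
        else:
            existing["sources"].append(rec["source"])
            if existing["phone"] != rec["phone"]:
                conflicts.append((rec["name"], existing["phone"], rec["phone"]))
    return unified, conflicts


records = [
    from_source_a({"Name": "Ada Lovelace", "Phone": "555-0100"}),
    from_source_b({"full_name": "LOVELACE, ADA", "tel": "(555) 0100"}),
    from_source_b({"full_name": "Babbage, Charles", "tel": "555-0199"}),
    from_source_a({"Name": "Charles Babbage", "Phone": "555-0123"}),
]
unified, conflicts = integrate(records)
print(len(unified))  # 2
print(conflicts)     # [('Charles Babbage', '5550199', '5550123')]
```

Matching on a normalized name is a deliberate simplification; real integration systems usually match on stable identifiers or use record-linkage techniques, and `str.title()` will mangle names such as "McDonald".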


    The Need for Data Warehousing

    1. A business requires an integrated, companywide view of high-quality information.

    2. The information systems department must separate informational systems from operational systems (systems of record) to dramatically improve performance in managing company data.


    Why a Warehouse

    For analysis and decision support, end users require access to data captured and stored in an organization's operational or production systems.

    This data is stored in multiple formats, on multiple platforms, in multiple data structures, with multiple names, and was probably created using different business rules.


    Why Should We Consider Data Warehousing Solutions?

    When users request access to large amounts of historical information for reporting, you should strongly consider a warehouse. Users benefit when the information is organized efficiently for this type of access.


    An Example Illustrating the Need for Data Warehousing


    Data Warehouse Components

    [Diagram: source systems (mainframe, midrange, and PC applications and external sources, stored in IMS, VSAM, DB/2, DB/400, DB/6000, and DB2/2) pass through extract programs, data cleansers/scrubbers, translators/transformers, timing tools, data loading, and file transfer into a combined data warehouse (customers, policies, premiums, claims, reserves, rates). Decision support tools then use the warehouse for management reporting, sales/marketing, customer relations, reserve analysis, and risk analysis.]
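The staging components in the diagram (extract programs, cleansers/scrubbers, translators/transformers, and data loading) can be approximated by a minimal extract-cleanse-transform-load pipeline. This sketch assumes a hypothetical CSV extract of insurance claims and uses Python's built-in `sqlite3` as a stand-in for the warehouse; reading slash-separated dates as day/month/year is also an assumption.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical extract from a source system: a CSV file with inconsistent formatting.
RAW_CLAIMS = """claim_id,policy_id,amount,claim_date
C1, P100 ,1200.50,2019-03-01
C2,P200,,2019-03-02
C3,p100,300,01/04/2019
C1, P100 ,1200.50,2019-03-01
"""


def extract(text: str) -> list:
    """Extract: read raw rows from the source file."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows: list) -> list:
    """Cleanse/scrub: trim whitespace, drop rows missing an amount, drop exact duplicates."""
    seen, clean = set(), []
    for row in rows:
        row = {k: v.strip() for k, v in row.items()}
        if not row["amount"]:
            continue
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        clean.append(row)
    return clean


def transform(rows: list) -> list:
    """Transform: unify codes, types, and date formats (two formats assumed here)."""
    out = []
    for row in rows:
        raw_date = row["claim_date"]
        fmt = "%Y-%m-%d" if "-" in raw_date else "%d/%m/%Y"
        out.append((
            row["claim_id"],
            row["policy_id"].upper(),
            float(row["amount"]),
            datetime.strptime(raw_date, fmt).date().isoformat(),
        ))
    return out


def load(conn: sqlite3.Connection, rows: list) -> int:
    """Load: insert the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS claims "
        "(claim_id TEXT, policy_id TEXT, amount REAL, claim_date TEXT)"
    )
    conn.executemany("INSERT INTO claims VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load(conn, transform(cleanse(extract(RAW_CLAIMS))))
print(loaded)  # 2
print(conn.execute("SELECT policy_id, SUM(amount) FROM claims GROUP BY policy_id").fetchall())
# [('P100', 1500.5)]
```

Keeping each stage a separate function mirrors the separate tools in the diagram and makes it easy to test or replace one stage at a time.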


    Administration and Management Tools

    A data warehouse requires tools to support the administration and management of such a complex environment.

    For the various types of meta-data and the day-to-day operations of the data warehouse, the administration and management tools must be capable of supporting these tasks:

    Monitoring data loading from multiple sources

    Data quality and integrity checks

    Managing and updating meta-data

    Monitoring database performance to ensure efficient query response times and resource utilization


    Auditing data warehouse usage to provide user chargeback information

    Replicating, subsetting, and distributing data

    Maintaining efficient data storage management

    Archiving and backing up data

    Implementing recovery following failure

    Security management
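Two of the tasks listed above, monitoring data loading and data quality/integrity checks, can be sketched as a post-load check. The `check_load` function and `LoadReport` record are hypothetical names; the checks shown (row-count reconciliation, required columns, and a foreign-key lookup against a dimension's keys) are common examples, not an exhaustive list.

```python
from dataclasses import dataclass


@dataclass
class LoadReport:
    """Hypothetical summary produced after each load from a source system."""
    source: str
    rows_extracted: int
    rows_loaded: int
    problems: list


def check_load(source, rows_extracted, loaded_rows, required, known_keys, fk_column):
    """Run simple quality and integrity checks on a freshly loaded batch.

    required:   columns that must not be empty
    known_keys: valid keys in the referenced dimension (referential integrity)
    fk_column:  column in the batch that must reference one of known_keys
    """
    problems = []
    # Monitoring the load: every extracted row should have arrived.
    if len(loaded_rows) != rows_extracted:
        problems.append(f"row count mismatch: extracted {rows_extracted}, loaded {len(loaded_rows)}")
    for i, row in enumerate(loaded_rows):
        # Data quality: required columns must be present and non-empty.
        for col in required:
            if row.get(col) in (None, ""):
                problems.append(f"row {i}: missing {col}")
        # Integrity: the foreign key must point at a known dimension row.
        if row.get(fk_column) not in known_keys:
            problems.append(f"row {i}: unknown {fk_column} {row.get(fk_column)!r}")
    return LoadReport(source, rows_extracted, len(loaded_rows), problems)


batch = [
    {"claim_id": "C1", "policy_id": "P100", "amount": 1200.5},
    {"claim_id": "C2", "policy_id": "P999", "amount": None},
]
report = check_load("claims-mainframe", 3, batch, required=["claim_id", "amount"],
                    known_keys={"P100", "P200"}, fk_column="policy_id")
for problem in report.problems:
    print(problem)
```

A report like this could also be stored as meta-data about each load, so administrators can track source reliability over time.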


    In computing, data flow is the path data takes from source document to data entry, processing, and final reports. Data changes format and sequence (within a file) as it moves from program to program.


    Data Flow

    Inflow: the processes associated with the extraction, cleansing, and loading of data from the source systems into the data warehouse.

    Upflow: the processes associated with adding value to the data in the warehouse through summarizing and distributing the data.

    Downflow: the processes associated with archiving and backing up data in the warehouse.

    Outflow: the processes associated with making the data available to end users.

    Meta-flow: the processes associated with the management of the meta-data.
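Upflow and downflow can be illustrated on a handful of detailed rows. In this sketch (hypothetical data and function names), `upflow` adds value by summarizing detail into monthly totals per region, and `downflow` separates rows older than a cutoff so they can be archived or backed up.

```python
from collections import defaultdict
from datetime import date

# Hypothetical detailed sales rows already loaded by the inflow.
detail = [
    {"day": date(2018, 11, 5), "region": "North", "amount": 40.0},
    {"day": date(2019, 1, 10), "region": "North", "amount": 100.0},
    {"day": date(2019, 1, 20), "region": "South", "amount": 50.0},
    {"day": date(2019, 2, 3), "region": "North", "amount": 25.0},
]


def upflow(rows):
    """Upflow: add value by summarizing detail into (year, month, region) totals."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["day"].year, r["day"].month, r["region"])] += r["amount"]
    return dict(totals)


def downflow(rows, cutoff):
    """Downflow: split off rows older than `cutoff` for archiving/backup."""
    archive = [r for r in rows if r["day"] < cutoff]
    active = [r for r in rows if r["day"] >= cutoff]
    return active, archive


summary = upflow(detail)
active, archive = downflow(detail, cutoff=date(2019, 1, 1))
print(summary[(2019, 1, "North")])  # 100.0
print(len(active), len(archive))    # 3 1
```

Running the summarization before archiving keeps older periods represented in the summary even after their detail rows leave the active store.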


    Architectures

    Many database architectures have been implemented; two are worth contrasting:

    1. OLTP (Online Transaction Processing)

    2. Data warehouse (OLAP, Online Analytical Processing)

    OLTP is used to store data and query it frequently, and is based on normalized schemas.

    A data warehouse is used to store historical data and is based on fact tables and dimension tables.
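A minimal fact/dimension (star) schema can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_date`, `dim_product`) and the sample rows are hypothetical; the point is the shape: a central fact table holding measures and foreign keys, surrounded by dimension tables holding descriptive attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes used to slice the facts.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- Fact table: one row per measured event, with foreign keys to each dimension.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    revenue     REAL
);

INSERT INTO dim_date VALUES (20190101, 2019, 1), (20190201, 2019, 2);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Hardware'), (2, 'Antivirus', 'Software');
INSERT INTO fact_sales VALUES
    (20190101, 1, 2, 2000.0),
    (20190101, 2, 5, 250.0),
    (20190201, 1, 1, 1000.0);
""")

# A typical analytical query: aggregate the facts, grouped by dimension attributes.
query = """
SELECT d.year, d.month, p.category, SUM(f.revenue) AS revenue
FROM fact_sales AS f
JOIN dim_date AS d ON f.date_key = d.date_key
JOIN dim_product AS p ON f.product_key = p.product_key
GROUP BY d.year, d.month, p.category
ORDER BY d.year, d.month, p.category
"""
for row in conn.execute(query):
    print(row)
# (2019, 1, 'Hardware', 2000.0)
# (2019, 1, 'Software', 250.0)
# (2019, 2, 'Hardware', 1000.0)
```

In a normalized OLTP schema the same attributes would typically be spread over more tables; the star layout trades some redundancy for simpler, faster analytical joins.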


    Difference between OLTP and Data Warehouse

    Feature              OLTP                              OLAP
    Users                Clerk, IT professional            Knowledge worker
    Function             Day-to-day operations             Decision support
    DB design            Application-oriented              Subject-oriented
    Data                 Current, up-to-date, detailed     Historical, summarized,
                                                           multidimensional, integrated
    Access               Read/write; index/hash on         Lots of scans
                         primary key
    Unit of work         Short, simple transaction         Complex query
    Records accessed     Tens                              Millions
    Number of users      Thousands                         Hundreds
    DB size              100 MB-GB                         100 GB-TB
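The access and unit-of-work rows of the table can be illustrated with SQLite. In this sketch (hypothetical `accounts` table and data), an OLTP-style transaction updates one row located through its primary key, while an OLAP-style query scans every row to produce a summary. `EXPLAIN QUERY PLAN` is used only to show the difference; its exact wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, branch TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(i, "North" if i % 2 else "South", 100.0) for i in range(1, 1001)],
)

# OLTP-style unit of work: a short read/write transaction touching one row via its primary key.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE accounts SET balance = balance - 25 WHERE account_id = ?", (42,))

# OLAP-style unit of work: a query that scans every row to produce a summary.
totals = conn.execute(
    "SELECT branch, SUM(balance) FROM accounts GROUP BY branch ORDER BY branch"
).fetchall()


def plan(sql, params=()):
    """Return SQLite's query-plan text (column 3 of each EXPLAIN QUERY PLAN row)."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))


print(totals)  # [('North', 50000.0), ('South', 49975.0)]
print(plan("SELECT balance FROM accounts WHERE account_id = ?", (42,)))  # expected to mention SEARCH
print(plan("SELECT branch, SUM(balance) FROM accounts GROUP BY branch"))  # expected to mention SCAN
```

At real warehouse scale the scan side is where columnar storage, summary tables, and partitioning matter; SQLite here only makes the two access patterns visible.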


    Special Thanks to

    Google.com and other sites.