Data Warehouse Concepts

  • View
    6

  • Download
    3

Embed Size (px)

Text of Data Warehouse Concepts

Data Warehouse Concepts

ContentsData & Information Introduction to Data warehouse (DWH) Characteristics of DWH Operational System Vs DWH DWH Architectures Data Marts Metadata

Data & InformationA fundamental concept of data warehouse is the distinction between data and information. Data is composed of observable and recordable facts that are often found in operational or transactional systems. In a data warehouse environment, data only comes to have value to end-users when it is organized and presented as information. Information is an integrated collection of facts and is used as the basis for decision making.

Introduction to Data WarehouseDefinitions:"A data warehouse is a subject oriented, integrated, time-variant, non volatile collection of data in support of management's decision making process". A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. A Data Warehouse is a structured repository (Subject Oriented) of Historic Data.

Data warehouses separate analysis part from transactional part and enables the organization to collect data from several sources.

Characteristics of Data WarehouseData Warehouse is usually: Subject Oriented Integrated Non-Volatile Time-Variant Accessible & Process Oriented

Subject OrientedSales

DWH

Marketing

Finance

Information is presented according to specific subjects or Information is presented according to specific subjects or areas of interest. areas of interest. Data is manipulated to provide information about a particular Data is manipulated to provide information about a particular subject. subject.

IntegratedOperational Systems Appln A m/f Appln B 1/0 Appln C Male/Female Appln A Bal_On_Hand Appln B Current_Balance Appln C Cash_On_Hand DWH m/f

Current_Balance

Though the data in the data warehouses is scattered around different tables, databases or even servers but the data is integrated consistently in the values of variables, naming conventions and physical data definitions (datatype).

Time-VariantOperational Systems View of Business Today DWH Designated Time Frame (3 10 years). DWH stores historical data.

Contains a history of the subject, as well as current Contains a history of the subject, as well as current information. information. Historical information is an important component of a data Historical information is an important component of a data warehouse. warehouse.

Non-VolatileOperational SystemsInsert Create Read Load

DWHRead Read

Update

Read Delete

Read Read Read Only

Stable information that doesnt change each time an Stable information that doesnt change each time an operational process is executed. Information is consistent operational process is executed. Information is consistent regardless of when the warehouse is accessed. regardless of when the warehouse is accessed. There exist only two operations time based loading of data, There exist only two operations time based loading of data, accessing the loaded data. accessing the loaded data.

Accessible & Process OrientedAccessible: The primary purpose of a data warehouse is to provide readily accessible information to end-users. Process-Oriented: It is important to view data warehousing as a process for delivery of information.

Operational System Vs Data WarehouseOperational System Characteristics Data Focused, Transaction Processing focused system. Current, Near-term (Today, Last week). Day-to-day decisions, Current operational results. Twice daily, Daily, Weekly. Data Warehouse Subject Oriented, Integrated, Non-Volatile, Time-Variant. Historic (Last month, Quarterly, Five years). Long-term decisions, Reporting, Trend detection. Weekly, Monthly, Quarterly.

Age of the data

Primary Use

Frequency of load

DWH ArchitecturesData Warehouse Architecture (Basic) Data Warehouse Architecture (with a Staging Area) Data Warehouse Architecture (with a Staging Area and Data Marts)

DWH Architectures (contd..)Operational SystemsData Extraction Operational System Data Transformation

Data WarehouseData Storing Data Access

Users

Meta Data

Analysis

DWHReporting

Data Loading Legacy Systems

Data Warehouse Architecture (Basic)

Mining

DWH Architectures (contd..)Operational SystemsData Extraction Operational System Data Transformation

Data WarehouseData Storing Data Access

Users

Staging Area

Meta Data

Analysis

DWHReporting

Data Loading Legacy Systems

Data Warehouse Architecture (with a Staging Area)

Mining

DWH Architectures (contd..)Operational SystemsData Extraction Operational System Data Transformation Staging Area

Data WarehouseData Storing Data Marts Data Access

Users

Meta Data

Analysis Sales

DWHMarketing Reporting

Data Loading Legacy Systems Finance

Data Warehouse Architecture (with a Staging Area and Data Marts)

Mining

Data MartsData Marts:Data mart is a subset of DWH. A data mart is a specialized version of a DWH. A data mart configuration emphasizes easy access to relevant information.

DWH

Data Marts

Data Marts (contd..)Dependent data mart: Data can be derived from an enterprise-wide data warehouse. Independent data mart: Data can be collected directly from sources.

Data Marts (contd..)Reasons for creating a Data martEases access to frequently needed data Creates collective view by a group of users Improves end-user response time Ease of creation Lower cost than implementing a full Data warehouse

MetadataMetadata:Metadata is data about data. Something can be data and metadata at the same time. It is possible to create meta-meta-...-metadata.

Metadata is used to speed up and enrich searching for resources.E.g: Browsers automatically download and locally cache metadata, to improve the speed at which files can be accessed and searched.

Questions ?

Thank You !