Data Warehouse Dimensional Model Components Concept


8/6/2019 Data Warehouse Dimensional Model Components Concept

http://slidepdf.com/reader/full/data-warehouse-dimensional-model-components-concept 1/18


A dimensional model is the equivalent of the logical data design of a data warehouse, and much more. It is simpler in design and suits the purpose of a data warehouse.

Dimensional Modeling Concept

A dimensional model is a logical design technique that seeks to present the data in a standard, intuitive framework that allows for high-performance access. It is inherently dimensional, and it adheres to a discipline that uses the relational model with some important restrictions. Every dimensional model is composed of one table with a multi-part key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multi-part key in the fact table (see Figure). This characteristic 'star-like' structure is often called a star join.

A fact table, because it has a multi-part primary key made up of two or more foreign keys, always expresses a many-to-many relationship. The most useful fact tables also contain one or more numerical measures, or 'facts', that occur for the combination of keys that define each record. In the Figure, the facts are Units_Sold, Dollars_Sold, and Avg_sales. The most useful facts in a fact table are numeric and additive. Additivity is crucial because data warehouse applications almost never retrieve a single fact table record; rather, they fetch back hundreds, thousands, or even millions of these records at a time, and the only useful thing to do with so many records is to add them up.

Dimension tables, by contrast, most often contain descriptive textual information: the attributes (also called classification attributes) that are used for analysis. Dimension attributes are the source of most of the interesting constraints in data warehouse queries, and they are virtually always the source of the row headers in the SQL answer set.

Fact Table and Dimension Tables in a Dimensional Model Schema
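
The fact and dimension tables described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the schema in the Figure: the table names (product_dim, store_dim, sales_fact), the dimension columns and the sample rows are assumptions; only the measures Units_Sold and Dollars_Sold come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: single-part primary key plus descriptive attributes.
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, store_name TEXT, city TEXT);

-- Fact table: multi-part key made of foreign keys, plus additive measures.
CREATE TABLE sales_fact (
    product_key  INTEGER REFERENCES product_dim(product_key),
    store_key    INTEGER REFERENCES store_dim(store_key),
    units_sold   INTEGER,
    dollars_sold REAL,
    PRIMARY KEY (product_key, store_key)
);

-- Hypothetical sample data.
INSERT INTO product_dim VALUES (1, 'Desk', 'Furniture'), (2, 'Chair', 'Furniture'), (3, 'Lamp', 'Lighting');
INSERT INTO store_dim VALUES (10, 'Downtown', 'NY'), (11, 'Mall', 'NY');
INSERT INTO sales_fact VALUES (1, 10, 5, 500.0), (2, 10, 8, 400.0), (3, 10, 2, 60.0), (1, 11, 1, 100.0);
""")

# The dimension attribute supplies the row header; the facts are added up.
rows = conn.execute("""
    SELECT p.category, SUM(f.units_sold), SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

for category, units, dollars in rows:
    print(category, units, dollars)
```

The query never reads a single fact row in isolation; it groups by a dimension attribute and sums the additive measures, which is the access pattern the text describes.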


Let's consider a data warehouse cube with four dimensions and two measures. For every combination of values of the four dimensions, the cube holds one value for each measure. For example:

Co-ordinate [City (X), Product (Y), Channel (Z), Month] = [Sales (Quantity), Sales (Value)]
or
[NY, Standard Desk-top, Mail, September 2005] = [2000 units, $15000]

In the dimensional modeling schema, the fact table contains the measure values at the lowest granularity for all the possible combinations of dimensions. The dimension tables contain the details of the dimensions, including their attributes and all the higher-level hierarchies. The link between the fact table and each associated dimension table is a dimension key, which is the primary key of the dimension table at its lowest level of granularity.
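
The cube coordinate above can be pictured as a mapping from a tuple of dimension values to a tuple of measures. The sketch below is plain Python; the first cell is the example from the text, the other two cells and the roll_up helper are hypothetical and only illustrate how lowest-grain cells aggregate to a higher level.

```python
from collections import defaultdict

# Key: (city, product, channel, month). Value: (sales quantity, sales value).
cube = {
    ("NY", "Standard Desk-top", "Mail", "September 2005"): (2000, 15000.0),
    ("NY", "Standard Desk-top", "Web", "September 2005"): (500, 4000.0),   # hypothetical cell
    ("LA", "Standard Desk-top", "Mail", "September 2005"): (300, 2400.0),  # hypothetical cell
}

def roll_up(cells, keep):
    """Aggregate the cells down to the dimension positions listed in `keep`."""
    totals = defaultdict(lambda: [0, 0.0])
    for coord, (quantity, value) in cells.items():
        key = tuple(coord[i] for i in keep)
        totals[key][0] += quantity
        totals[key][1] += value
    return {key: tuple(total) for key, total in totals.items()}

# Keep only the City dimension (position 0); the other three are summed out.
by_city = roll_up(cube, keep=(0,))
print(by_city)
```

Each entry of `cube` corresponds to one row of the fact table; the roll-up works only because both measures are additive.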

Fact Table- The central linkage in Dimensional Modeling

A fact table contains the values of all the measures for the set of dimensions linked to it. It holds the measure values for each combination of dimension values at the lowest level of granularity. The measures are typically numeric, so they can undergo mathematical aggregation and analysis.

Families of Fact Tables

• Chains and circles.
• Heterogeneous products.
• Transactions and snapshots.
• Aggregates.

Dimension Table- What does and should it contain

The dimension table contains all the information on the dimension. This includes:

a. The primary key (the equivalent foreign key in the fact table).

b. All attributes of the dimension. These include:

• The hierarchy attributes: Consider a business hierarchy for the location dimension, such as pin code to city to district to state to country. Each hierarchy element will be an attribute.
• Textual as well as code attributes: The location code as well as the name of the location. Both are required because different users could use them for different reasons. A power user could be looking for the location code (NY01), whereas an end user could be looking for a more explicit header (New York).
• All parallel hierarchies: A product could have different hierarchies, depending upon whether the CFO or the Head of Sales is looking at it. Including them all enables analysis on every hierarchy as well as across hierarchies.
• The surrogate primary key that links to the fact table (refer to Surrogate Keys as Primary keys of dimension tables): Surrogate keys are used because the production keys could change or be reused. For example, a bill number could be reused after 5 years, or a part number (especially in FMCG) could be reused after a few years.
• The production or source system key: This is required for auditability and for linking back to the extraction data and source systems.

Dimensional Model Schemas- Star, Snow-Flake and Constellation

A dimensional model can be organized as a star schema or a snowflake schema.


Dimensional Model Star Schema using Star Query

The star schema is perhaps the simplest data warehouse schema. It is called a star schema because the entity-relationship diagram of this schema resembles a star, with points radiating from a central table. The center of the star consists of a large fact table, and the points of the star are the dimension tables.

A star schema is characterized by one or more very large fact tables that contain the primary information in the data warehouse, and a number of much smaller dimension tables (or lookup tables), each of which contains information about the entries for a particular attribute in the fact table.

A star query is a join between a fact table and a number of dimension tables. Each dimension table is joined to the fact table using a primary key to foreign key join, but the dimension tables are not joined to each other. The cost-based optimizer recognizes star queries and generates efficient execution plans for them.

A typical fact table contains keys and measures. For example, in the sample schema, the fact table, sales, contains the measures quantity_sold, amount, and average, and the keys time_key, item_key, branch_key, and location_key. The dimension tables are time, branch, item and location.

A star join is a primary key to foreign key join of the dimension tables to a fact table.
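
A star query against the sample schema might look like the sketch below, run here through Python's sqlite3 module. The fact table name sales, its keys and the measures quantity_sold and amount follow the text; the `_dim` suffix on the dimension tables, their attribute columns and all rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sales (
    time_key      INTEGER REFERENCES time_dim(time_key),
    item_key      INTEGER REFERENCES item_dim(item_key),
    branch_key    INTEGER REFERENCES branch_dim(branch_key),
    location_key  INTEGER REFERENCES location_dim(location_key),
    quantity_sold INTEGER,
    amount        REAL
);
-- Hypothetical sample data.
INSERT INTO time_dim VALUES (1, 'Sep', 2005), (2, 'Oct', 2005);
INSERT INTO item_dim VALUES (1, 'Desktop'), (2, 'Laptop');
INSERT INTO branch_dim VALUES (1, 'B1');
INSERT INTO location_dim VALUES (1, 'NY'), (2, 'LA');
INSERT INTO sales VALUES
    (1, 1, 1, 1, 10, 1000.0),
    (1, 2, 1, 1,  4, 2000.0),
    (2, 1, 1, 2,  6,  600.0),
    (1, 1, 1, 2,  3,  300.0);
""")

# Every dimension joins to the fact table on its key; no dimension joins another.
star_query = """
    SELECT l.city, i.item_name, SUM(s.quantity_sold), SUM(s.amount)
    FROM sales s
    JOIN time_dim t     ON t.time_key = s.time_key
    JOIN item_dim i     ON i.item_key = s.item_key
    JOIN location_dim l ON l.location_key = s.location_key
    WHERE t.month = 'Sep' AND t.year = 2005
    GROUP BY l.city, i.item_name
    ORDER BY l.city, i.item_name
"""
result = conn.execute(star_query).fetchall()
for row in result:
    print(row)
```

The constraint (September 2005) comes from a dimension attribute, the row headers (city, item) come from dimension attributes, and only the fact table supplies the numbers being summed.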

The main advantages of star schemas are that they:

• Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
• Provide highly optimized performance for typical star queries.
• Are widely supported by a large number of business intelligence tools, which may anticipate or even require that the data warehouse schema contains dimension tables.


Snow-Flake Schema in Dimensional Modeling

The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. It is called a snowflake schema because the diagram of the schema resembles a snowflake.

Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one large table. For example, a location dimension table in a star schema might be normalized into a location table and a city table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance. The Figure above presents a graphical representation of a snowflake schema.
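
The location example can be sketched as follows, again with sqlite3. All table names, columns and rows here are hypothetical; the point is only that the snowflaked form stores each city once but needs one more join to answer the same question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star form: one denormalized location dimension (city and state repeat per row).
CREATE TABLE location_star (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT, state TEXT);
INSERT INTO location_star VALUES
    (1, '1 Main St', 'Palo Alto', 'CA'),
    (2, '9 Oak Ave', 'Palo Alto', 'CA'),
    (3, '5 Pine Rd', 'Albany',    'NY');

-- Snowflake form: city attributes are normalized into their own table.
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE location (
    location_key INTEGER PRIMARY KEY,
    street       TEXT,
    city_key     INTEGER REFERENCES city(city_key)
);
INSERT INTO city VALUES (1, 'Palo Alto', 'CA'), (2, 'Albany', 'NY');
INSERT INTO location VALUES (1, '1 Main St', 1), (2, '9 Oak Ave', 1), (3, '5 Pine Rd', 2);

CREATE TABLE sales (location_key INTEGER, amount REAL);
INSERT INTO sales VALUES (1, 100.0), (2, 50.0), (3, 70.0);
""")

# Star: one join from the fact table to the dimension.
star = conn.execute("""
    SELECT d.state, SUM(s.amount)
    FROM sales s
    JOIN location_star d ON d.location_key = s.location_key
    GROUP BY d.state ORDER BY d.state
""").fetchall()

# Snowflake: the same answer needs an extra join through the city table.
snowflake = conn.execute("""
    SELECT c.state, SUM(s.amount)
    FROM sales s
    JOIN location l ON l.location_key = s.location_key
    JOIN city c     ON c.city_key = l.city_key
    GROUP BY c.state ORDER BY c.state
""").fetchall()

print(star, snowflake)
```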

Fact Constellation Schema


This schema is used mainly for aggregate fact tables, or where we want to split a fact table for better comprehension. The fact table is split only when we want to focus on aggregation over a few facts and dimensions.

Dimensional Modeling vs. Relational Modeling

Dimensional modeling differs from normalized OLTP modeling in order to enable analysis and querying through massive and unpredictable queries, something a relational model is ill-equipped to handle.

How is a dimensional model different from an E-R diagram?

• An E-R diagram (used in OLTP or transactional systems) has a highly normalized model (even at a logical level), whereas a dimensional model aggregates most of the attributes and hierarchies of a dimension into a single entity.
• An E-R diagram is a complex maze of hundreds of entities linked with each other, whereas the dimensional model has a logically grouped set of star schemas.
• The E-R diagram is split as per the entities. A dimensional model is split as per the dimensions and facts.
• In an E-R diagram, all attributes of an entity, textual as well as numeric, belong to the entity table, whereas a 'dimension' entity in a dimensional model has mostly textual attributes and the 'fact' entity has mostly numeric attributes.

Dimensional modeling is a better approach for a data warehouse than a standard data model.

The dimensional model has a number of important data warehouse advantages that the E-R model lacks.

The first advantage of the dimensional model is that it provides a standard framework and standard types of joins. All dimensions can be thought of as symmetrically equal entry points into the fact table. The logical design can be done independently of expected query patterns. The user interfaces are symmetrical, the query strategies are symmetrical, and the SQL generated against the dimensional model is symmetrical. In other words:

• You will never find attributes in fact tables or facts in dimension tables.
• If you see a non-fact field in the fact table, you can assume that it is a key to a dimension table.

The second advantage of the dimensional model is that it is smoothly extensible to accommodate unexpected new data elements and new design decisions. All existing tables (both fact and dimension) can be changed in place by simply adding new data rows to the table. Data should not have to be reloaded. Typically, no query tool or reporting tool needs to be reprogrammed to accommodate the change, and all old applications continue to run without yielding different results. You can make the following graceful changes to the design after the data warehouse is up and running:

• Adding new unanticipated facts (that is, new additive numeric fields in the fact table), as long as they are consistent with the fundamental grain of the existing fact table.
• Adding completely new dimensions, as long as there is a single value of that dimension defined for each existing fact record.
• Adding new, unanticipated dimensional attributes.
• Breaking existing dimension records down to a lower level of granularity from a certain point in time forward.
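
Two of the graceful changes listed above (a new fact and a new dimensional attribute) can be sketched with sqlite3. Using `ALTER TABLE ... ADD COLUMN` as the mechanism is an assumption of this sketch, as are the table names, the new columns discount_amount and colour, and the sample rows. The check is that a query written before the change returns the same result afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE sales_fact  (product_key INTEGER, units_sold INTEGER);
INSERT INTO product_dim VALUES (1, 'Desk'), (2, 'Chair');
INSERT INTO sales_fact VALUES (1, 5), (2, 8), (1, 2);
""")

# An "old application" query written before the schema is extended.
old_query = """
    SELECT p.product_name, SUM(f.units_sold)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
"""
before = conn.execute(old_query).fetchall()

# New additive fact at the same grain, and a new dimension attribute.
conn.execute("ALTER TABLE sales_fact ADD COLUMN discount_amount REAL DEFAULT 0")
conn.execute("ALTER TABLE product_dim ADD COLUMN colour TEXT")
conn.execute("UPDATE product_dim SET colour = 'Oak' WHERE product_key = 1")

# The old query is untouched and still yields the same answer.
after = conn.execute(old_query).fetchall()
print(before, after)
```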

The third advantage of the dimensional model is that there is a body of standard approaches for handling common modeling situations in the business world. Each of these situations has a well-understood set of alternatives that can be specifically programmed in report writers, query tools, and other user interfaces. These modeling situations include:

• Slowly changing dimensions, where a 'constant' dimension such as Product or Customer actually evolves slowly and asynchronously. Dimensional modeling provides specific techniques for handling slowly changing dimensions, depending on the business environment.
• Heterogeneous products, where a business such as a bank needs to:
  o track a number of different lines of business together within a single common set of attributes and facts, but at the same time
  o describe and measure the individual lines of business in highly idiosyncratic ways using incompatible measures.

Foundation & Conformed Dimensions and Facts in Data Warehouse Dimensional Model

A data warehouse is a repository which feeds data marts and other downstream systems. It has to be designed to have a global or reusable set of dimensions and measures.

Data warehouse modeling has two components:

• A foundation to support medium to long-term capabilities, without the need to unsettle the structure time and again.
• The individual phases of development of data marts, which eventually merge into the enterprise-wide data warehouse.

A project has to address both the foundation and the phase elements. Every stage in the data warehouse project will address these two elements in a distinct and overt manner. For dimensional modeling, the following foundation-setting elements work like reusable components. They will be the same across the data marts and data warehouse for the current and future phases of development:

Standard set of foundation or conformed dimensions. This means that:

• Dimensions are supersets of all possible attributes for that dimension. For example, the customer 'age' attribute may not be required for sales analysis, but may be required for credit analysis. Therefore, when creating the standard dimensions, one makes the superset of attributes.
• Dimensions include all possible levels of the business hierarchy. For example, a portfolio analysis of a channel may not require the branch-level location, but the agent productivity analysis could.
• Dimensions include not only categories, but descriptive textual attributes as well, wherever needed. For example, a textual detail for a location code could be needed for distribution analysis, but may not be needed for portfolio analysis.
• Make the dimension most granular. Many a time the analysis does not need to go down to the most granular level of customer ID. However, if the dimension starts from customer group upwards and a customer moves out of his existing customer segment, the whole dimensional model could run into issues.

Examples of foundation dimensions are Customer, Location, Channel, Sales Lead etc. Please refer to Universal Dimensions for more examples.

Standard set of foundation or conformed facts. This means that:

• A fact table will include all possible units of measure for a given set of dimensions. For example, sales by numbers could need only the number of 'Crates' in one data mart and 'Pieces' in the other. However, both units for the given measure should be included even if there is a standard conversion rate, because these standard conversion rates keep changing with time.
• A fact table logically groups a business instance. For example, you could require distribution of a 'product' to a retail outlet for distribution analysis. However, you will require the fact on final sale to the end customer for sales analysis. As a guideline, highly linked business processes should be combined in a single fact.

Standard set of foundation measures. This means that:

• All the measures and their possible units are to be listed out.
• Measures are most susceptible to having confusing definitions or to being misnamed. Detailed formulas behind measures are a must. Refer to Sales Revenue Fact-Measure as an example.

Examples of foundation measures are Sales Measures, Customer Measures, etc. Please refer to FACTS-Base Measures for more examples.

Slowly Changing Dimensions SCD in Dimensional Modeling

A dimensional model has to address some complex situations, like slowly changing dimensions.

Slowly Changing Dimensions

Entities change over time. Customer demographics, product characteristics, classification rules, status of customers etc. lead to changes in the attributes of dimensions. In a transaction system, many a time the change is overwritten and track of the change is lost.

For example, a source system may have only the latest customer PIN code, as it is needed to send the marketing and billing statements. However, a data warehouse needs to maintain all the previous PIN codes as well, because we need to track how many customers move to new locations and how frequently.

A key benefit of a data warehouse is to provide historical information, which is typically overwritten (and thus lost) in the transaction systems. How slowly changing dimensions are handled in a dimensional model is a key determinant of that benefit.


There are three ways to handle slowly changing dimensions:

Slowly Changing Dimension Method 1 (in short, SCD 1)

This is the way most of the source systems handle it: overwrite the attribute value. For example, if a customer's marital status has moved from 'Unmarried' to 'Married', we overwrite 'Unmarried' with 'Married'. Similarly, if an insurance policy status has moved from 'Lapsed' to 'Re-instated', the new status is written over the old status. This is done when we are not analyzing the historical information.
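
A minimal sketch of method 1 in plain Python, assuming the dimension is held as a dictionary keyed by the production Customer ID. The structure, the helper name scd1_update and the customer ID are hypothetical.

```python
# Hypothetical customer dimension: one row per production Customer ID.
customer_dim = {
    110002: {"marital_status": "Unmarried", "city": "Palo Alto"},
}

def scd1_update(dim, customer_id, **changes):
    """Type 1: overwrite the attributes in place; previous values are not kept."""
    dim[customer_id].update(changes)

scd1_update(customer_dim, 110002, marital_status="Married")
print(customer_dim[110002])
```

After the call there is no trace that the customer was ever 'Unmarried', which is why this method suits only attributes whose history is not analyzed.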

Slowly Changing Dimension Method 2 (in short, SCD 2)

This is the true-blue technique to deliver precise historical analysis. It is used when there is more than one change in the attributes of an entity and we need to track the date of change of the attribute.

In this method, a new record is added, and the new record is given a separate identifier as the primary key. We cannot use the production key as the primary key here, as it has not changed (the Customer ID has remained the same, while the value of its attribute 'marital status' has changed). This new identifier is called the surrogate key.

Apart from adding a new record and providing a new primary (surrogate) key, the validity period for this new record is also added.

For example, you have a dimension table with customer_ID '110002' and marital status 'Single'. Over time, the customer gets married and also moves to a new location. The customer dimension records will be:

Surrogate Key  Customer ID  Date Valid     Marital Status  Date of Birth  City
1100021        110002       Sept 23, 2004  Single          Jan 8, 1982    Palo Alto
1100022        110002       Oct 25, 2005   Married         Jan 8, 1982    Palo Alto
1100023        110002       Nov 23, 2005   Married         Jan 8, 1982    San Francisco
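
The table above can be reproduced with a short Python sketch. It makes two assumptions that the text does not state: 'Date Valid' is read as the date from which a row version is valid, and the next surrogate key is generated as the current maximum plus one. The helper names scd2_apply and version_as_of are hypothetical.

```python
from datetime import date

def scd2_apply(rows, customer_id, valid_from, **changes):
    """Type 2: append a new row version with a fresh surrogate key."""
    versions = [r for r in rows if r["customer_id"] == customer_id]
    current = max(versions, key=lambda r: r["valid_from"])
    new_row = {
        **current,   # carry forward unchanged attributes
        **changes,   # apply the changed attributes
        "surrogate_key": max(r["surrogate_key"] for r in rows) + 1,  # assumed key scheme
        "valid_from": valid_from,
    }
    rows.append(new_row)
    return new_row

def version_as_of(rows, customer_id, as_of):
    """Return the row version that was valid on the date `as_of`."""
    candidates = [r for r in rows
                  if r["customer_id"] == customer_id and r["valid_from"] <= as_of]
    return max(candidates, key=lambda r: r["valid_from"])

rows = [{
    "surrogate_key": 1100021, "customer_id": 110002,
    "valid_from": date(2004, 9, 23), "marital_status": "Single",
    "date_of_birth": date(1982, 1, 8), "city": "Palo Alto",
}]
scd2_apply(rows, 110002, date(2005, 10, 25), marital_status="Married")
scd2_apply(rows, 110002, date(2005, 11, 23), city="San Francisco")

for r in rows:
    print(r["surrogate_key"], r["valid_from"], r["marital_status"], r["city"])
```

Fact rows would store the surrogate key that was current when the fact occurred, so a sale made in early 2005 stays attached to the 'Single, Palo Alto' version of the customer.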

Slowly Changing Dimension Method 3 (in short, SCD 3)

This is midway between method 1 and method 2. Here we don't add an additional record, but add a new field, 'old attribute value'. However, this has limitations. First, the design has to know from the beginning which attributes will change, because a new field has to be added for every attribute that can change. Secondly, an attribute can change at most once in the lifetime of the entity, or at least in the lifetime of the data warehouse.

Surrogate Key  Customer ID  Marital Status  Date of Birth  City           Old Marital Status  Old City
1100021        110002       Married         Jan 8, 1982    San Francisco  Single              Palo Alto
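
A minimal sketch of method 3 in plain Python, assuming one 'old' column per tracked attribute. The column names prefixed with `old_` and the helper scd3_update are hypothetical.

```python
# One row per customer; every tracked attribute has a matching "old_" column.
row = {
    "surrogate_key": 1100021, "customer_id": 110002,
    "marital_status": "Single", "old_marital_status": None,
    "city": "Palo Alto", "old_city": None,
}

def scd3_update(record, attribute, new_value):
    """Type 3: move the current value into the 'old' column, then overwrite it."""
    record["old_" + attribute] = record[attribute]
    record[attribute] = new_value

scd3_update(row, "marital_status", "Married")
scd3_update(row, "city", "San Francisco")
print(row)
```

A further change to city would overwrite old_city with 'San Francisco' and lose 'Palo Alto', which is the one-change limitation described above.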

NOTE: The term 'slowly changing dimension' is used because it is a universally acknowledged term. However, the same methods apply to fast changing dimensions as well.

Surrogate Keys as Primary keys of dimension tables


elements/stores are connected. This can also include how data flows across an enterprise. Refer to Data Mapping & Assessment.

Data Loading

Refer to Data Loading in Data Warehouse.

Data Integration

Data integration is part of Data Delivery. It is done both for offline data usage (Data Warehouse) and online data usage ('Operational Data Store' and 'Single Customer View').

Data Access Tools

Data access tools are used in the context of OLAP, reports and querying, and data mining. These are the tools which allow the user to access the data. Examples include user query tools, reporting tools etc.

Scorecard

A scorecard is part of a broader corporate methodology or management discipline. It is a performance assessment report for a given person, process, team, business unit or entity, prepared with respect to the strategic business plan. Typically a scorecard is produced for a period of time. Refer to Scorecard & Dashboard in Dashboards and Scorecards.

Dashboard

A dashboard shows what is happening at this point in time. For example, if you are driving your car, the combination of speedometer, RPM meter, fuel gauge and temperature gauge is the dashboard. Refer to Scorecard, Dashboard in Dashboards and Scorecards.

Customer Relationship Management

Customer Relationship Management is a superset term for Customer Acquisition or Sales & Distribution Management, and Customer Management. Also refer to the Customer Dimension.

Cube

A data cube is a multidimensional structure that contains an aggregate value at each point, i.e., the result of applying an aggregate function to an underlying relation. Data cubes are used to implement online analytical processing (OLAP).

Consumer

An individual, group or application that accesses data/information in a Data Warehouse.

Consolidation

The process that takes data from different systems and entities, and possibly disparate formats, and combines and aggregates that information to create a unified view. During this process the data is transformed to common formats, common data standards, and common codes. Refer to Data Transformation in DW Design & Architecture.

Communications Integrity

An operational quality that ensures transmitted data has been accurately received at its destination.

Business Intelligence Platform

The set of tools and architectures which provide the business intelligence capability. One has to remember that business intelligence is not only a technology capability; it also has a lot to do with human and cultural capability.

Business Intelligence (BI)

Business Intelligence is another word for 'Data Usage', which includes Data Analysis/OLAP and Data Mining/Data Modeling. We haven't used the term business intelligence in the knowledgebase, as it is a very wide term and has many connotations.

Business Drivers

The parameters which drive the business objectives. In everyday terms they are typically the financial and the customer parameters.

Base Tables

Base tables are the tables which provide the detailed data. For example, in a Data Warehouse they are known as the detailed data tables in the 'staging area' as well as the 'loaded area'.

Attribute

An attribute is a feature or characteristic of an object. For example, a customer has attributes like age, sex etc. In an RDBMS they are the columns of a table. In a Dimensional Model they are the fields in the dimension table.

Atomic Data

Data elements which are at the lowest level of 'Granularity' of data. For a Dimensional Model it is recommended to keep the granularity of data at its atomic level.

Analysis or Analytics

Data Analysis/OLAP is the most fundamental way to make sense out of your data. It involves looking at the data from all possible angles, slicing and dicing on various dimensions, drilling up and down, applying filters, exception highlighting, graphs and other presentation tools, and doing time trending analysis. Whether you are doing a pivot in Excel or creating advanced views in an upmarket OLAP tool, most of the usage of data in today's world falls within the realm of Data Analysis/OLAP. It is essentially a postgraduate course before you go for a fellowship in Data Mining.

Analytic Applications

The Data Analysis/OLAP applications. By definition they don't include 'Data Mining' and 'Dimensional Model' applications. The analytic applications provide Data Analysis Functions and Data Analysis Types.

Aggregate Data

Aggregate data is the summation, average etc. of the individual data elements. In layman's terms it is called summation. In Data Management terms it is any function applied to present one value in lieu of multiple individual values.

Administration Data

Both in the Data Warehouse and in production 'source systems', any data which is used to administer the systems. This data includes essentially everything apart from production/business data. Administration data is a subset of 'Meta Data'.

View

These are different representations of data made out of a cube. Views can be stored, and can be populated ad hoc or on a schedule. Any scorecard, dashboard or other MIS output generated and displayed by an OLAP/reporting tool is called a view.

Changing attributes

These are attributes which change with time, for example marital status, age etc. Refer to Special Situations in Dimensional Model.

Field or front end systems

These are the systems which form a layer to interact with the sales force or end customers. Most CRM applications are field systems. Refer to Field Systems in Source Systems in Data Staging in Data Warehouse.

Derived attribute

This is an attribute which is derived from the base attribute. For example, if the base attribute is sales revenue in number of units, the attribute 'sales in cartons' is a derived attribute. Refer to Dimensional Model.

Attribute Value

It is the value an attribute can take. In layman's terms it is a value of a field/column in a schema. For example, the values of the attribute 'City' of the dimension 'Location' can be all possible cities. Refer to Dimensional Model.

Granularity

The level of atomicity of data in a Data Warehouse. The more base-level the data, the greater the granularity. For example, the office code is of greater granularity than city or district for the 'Location' dimension.

Staging

A method that includes extraction of data from Source Systems to a Staging Area and Data Transformation, so that the data is ready for Data Loading into a Multidimensional Database in a Data Warehouse.

Data Pre-processing

Preparation of data before it can be used for OLAP or Data Mining. Refer to Transformation Design in Data Warehouse.

Test Cycles

This is the grouping of test conditions into a schedule.

Test Plans

These are the detailed listing of all the conditions you are going to test. A test condition could include more than one test scenario and vice versa.

Test Scenarios

This is the list of various business events where the system response needs to be tested.

Test Approach

This states the overall approach to the testing phase. It includes the scope of testing, the sequence of testing, the systems to be included, and the volume and variety of test data.

Business case

This is the explanation of why a project makes business sense. This explanation includes cost-benefit analysis, non-quantified benefits, strategic fit etc.

Project Proposal

This is an in-process deliverable in the Project Initiation phase. The project proposal document contains the details of the project approaches and recommendation. After acceptance of the project proposal, the detailed project planning is done.

Project Agreement

This is the final document containing all specifics related to the project scope, risks, cost, plan, organization and the works.

Project Requirement Stage

This is the stage where the detailed business requirements are captured. The requirements should be detailed enough to enable the system design.

Project Initiation Stage

This is the first stage in the project, which evolves an idea/issue/need into a well planned and sponsored project. Refer to 'Project Initiation Stage' in 'Project Management'.

Project Task Teams

Project task teams are the natural teams formed to complete a certain task in the project plan.

Project Core Team

The Project Core Team is chaired by the project manager and comprises the key stakeholders and participants in a project.

Project Co-ordinator

The Project Co-ordinator is responsible for day-to-day co-ordination and tracking of a project.

Project Manager

The Project Manager is responsible for successfully scoping, executing and implementing a project.

Project Sponsor

The role which is responsible for the business case and achievement of the project objectives.

Surrogate Key

This is the key which is used to connect the dimension table with the fact table. This is done to ensure that the detail in the Data Warehouse is not lost due to change of the primary key over time.

Star Schema 

A star schema is the basic schema, in which the fact table is connected to the dimension tables. Refer to 'Star Schema' in Dimensional Model.
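As a minimal sketch of a star schema and a star join, the following uses Python's built-in sqlite3 module. The tables (sales_fact, product_dim, store_dim), their columns and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

def build_star_schema():
    """Create a tiny in-memory star schema (all names are illustrative)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, city TEXT);
        -- The fact table holds one foreign key per dimension plus numeric measures.
        CREATE TABLE sales_fact (
            product_key  INTEGER REFERENCES product_dim(product_key),
            store_key    INTEGER REFERENCES store_dim(store_key),
            units_sold   INTEGER,
            dollars_sold REAL
        );
        INSERT INTO product_dim VALUES (1, 'Fax machine'), (2, 'Copier');
        INSERT INTO store_dim   VALUES (1, 'New York'), (2, 'Boston');
        INSERT INTO sales_fact  VALUES (1, 1, 3, 300.0), (1, 2, 1, 100.0),
                                       (2, 1, 2, 800.0), (2, 1, 1, 400.0);
    """)
    return conn

def dollars_by_city(conn):
    """Star join: group by a dimension attribute and add up the facts."""
    rows = conn.execute("""
        SELECT s.city, SUM(f.dollars_sold)
        FROM sales_fact f
        JOIN store_dim s ON s.store_key = f.store_key
        GROUP BY s.city
        ORDER BY s.city
    """).fetchall()
    return dict(rows)

conn = build_star_schema()
totals = dollars_by_city(conn)
print(totals)
```

The dimension attribute (city) supplies the row headers of the answer set, while the additive fact (dollars_sold) is summed over many fact rows.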

Source System

Refer to Source Systems in Execution-MiHDelivery - Data Warehouse/Marting.

Snowflake Schema 

A snowflake schema has a dimension table further split into multiple sub-tables for reasons of design, size and performance. Refer to 'Snowflake Schema' in 'Dimensional Model' in 'Data Warehouse'.

Share of Wallet

This is the measure of how much of a customer's potential value an enterprise is able to tap. Companies always try to maximize the customer's spending on their products, or introduce new products to cover a wider range of the customer's spending potential. Refer to Customer Management.

Scalability

The ability to support increasing levels of data or processing requirements. Scalability essentially demands that the platform can be expanded gradually and to high levels.

RDBMS

Relational database management system. This is the platform now used for most production systems. It has data stored in tables, rows and columns, which are linked through 'foreign keys'.

Query

A request for data from a data source, given specific conditions.

Primary Key

The field or combination of fields that uniquely identifies a record. No two records in a table can have the same primary key.

Partitioning

Splitting of target data into smaller units.

Parallelism

The ability to perform functions in parallel. This is used extensively in large-scale ETL, OLAP and mining tools.

Normalization

The process of converting a higher-level data or logical structure into a simpler, broken-down and more integral structure. This involves removing redundant attributes, separating the master data from transaction data, and creating a single point of reference for data elements. This method is typically used in production or source systems, mainly to maintain data integrity and avoid redundant data.

Multidimensional Database (MDBS and MDBMS)

A powerful database that lets users analyze large amounts of data. An MDBS captures and presents data as arrays that can be arranged in multiple dimensions.

Multidimensional

The representation of data that contains more than one dimension. Please refer to Dimensional Model.

Logical Data Model

This is the business representation of the way data should be organized. Refer to Dimensional Model in Data Warehouse.

Information

The useful and actionable analysis, interpretation and presentation of raw, mostly transaction-level data.

Hypercube

An OLAP product that stores all data in a single cube which has all the application dimensions applied to it.

HOLAP (Hybrid OLAP)

A product that can provide multidimensional analysis simultaneously of data stored in a multidimensional database and in an RDBMS. It is becoming a popular architecture for server OLAP.

Business Hierarchy

In the context of the Dimensional Model, this is the parent-child relationship within a dimension. For example, a Location dimension will have the hierarchy street --> city --> district --> state.

Granularity

It is the degree of fineness of the data. More granular data means a greater level of detail. For example, sales figures at city level are less granular than sales figures at branch level.

Foreign Key

A foreign key is the primary key of one data structure that is placed into a related data structure to represent a relationship among those structures. Foreign keys resolve relationships and support navigation among data structures.

Filters

Refer to 'Filter' in 'analytic functions' in 'OLAP/Data Analysis/OLAP'.

Fastload

A method used to load data from the data sets in the staging area into the Data Loading area of the Data Warehouse. This is mainly done by bypassing integrity checks and logging, and by using fewer commit points.

Fact Table

A central table in a Data Warehouse schema that contains numerical measures and keys relating facts to dimension tables. Fact tables contain data that describes specific events within a business, such as bank transactions or product sales. Refer to 'Facts' in Dimensional Model in Data Warehouse.

Data Extraction

Refer to Data Extraction in Data Warehouse in the knowledgebase.

Business Rule

This is a statement that governs any change in an enterprise. Everything you program in a system is either a definition of data, a definition of display, or a business rule.

ETL

It represents the process of converting raw data to a state where it can be used for analysis and data mining/modeling. It includes Data Extraction, which extracts transaction data from disparate sources into a common repository; Data Transformation, which cleans and transforms the data; and Data Loading, which loads the transformed data into a multi-dimensional database to enable analysis and mining.
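The three ETL stages can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the source systems (billing_system, branch_system), the record layout and the cleaning rules are hypothetical, and a plain list stands in for the target database.

```python
def extract(sources):
    """Extract: gather raw transaction records from several source systems."""
    return [record for source in sources for record in source]

def transform(records):
    """Transform: drop incomplete rows, tidy the text, convert amounts to float."""
    cleaned = []
    for rec in records:
        if rec.get("amount") in (None, ""):
            continue  # skip records that cannot be analyzed
        cleaned.append({
            "city": rec["city"].strip().title(),
            "amount": float(rec["amount"]),
        })
    return cleaned

def load(records, warehouse):
    """Load: append the transformed rows to the target store (a list here)."""
    warehouse.extend(records)
    return len(records)

# Hypothetical source extracts with inconsistent formatting.
billing_system = [{"city": " new york ", "amount": "120.50"}]
branch_system = [{"city": "BOSTON", "amount": 80}, {"city": "boston", "amount": None}]

warehouse = []
loaded = load(transform(extract([billing_system, branch_system])), warehouse)
print(loaded, warehouse)
```

Keeping the three stages as separate functions mirrors the sequence in the definition: each stage can be tested or replaced without touching the others.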

ERP

Enterprise Resource Planning. This subject is typically applied in the manufacturing domain, but its principles are used across all industry types. It deals with linking the sales forecast to sales planning, purchase management, manufacturing schedules, bill of materials, inventory management and financial management. In other words, it is a process of optimizing resources across the enterprise through integrated planning and management.

Enterprise Portal

Enterprise Portal is a subject within data visualization. An enterprise portal is a one-stop source for employees to view and use information. The information can be for generic or public use, or it can be domain or user specific. It includes transaction reports, spreadsheets, analytics, dashboards, company news, policies, the business process repository, etc. It is driven by the availability and security layers.

Performance Management

Please refer to the section Performance Management.

Drill Down

This is a type of analysis whereby one moves up, down and across the dimensions, attributes and their hierarchies. Refer to 'Drill Analysis' in Data Analysis Types in Data Analysis/OLAP.

Dimension 

Refer to Dimensional Model in Data Warehouse or Universal Dimensions in Execution-MiHRepositories.

Desktop Applications

These are the applications through which a user gets the data on the desktop. For example, an OLAP client, a data modeling tool or a query tool can be called a desktop application.

Derived Data

All data that is not 'Atomic Data'.

Dependent Data Mart/Stores

These are the data marts, stores and cubes that are cascaded from the parent Data Warehouse, in terms of metadata as well as data.

Data Warehouse Network

An integrated network of data warehouses that contain sharable data propagated from a source Data Warehouse on the basis of information consumer demand. The warehouses are managed to control data redundancy and to promote effective use of the sharable data.

Drill (vertical) and Cross (horizontal) Navigation and Analysis

These are the methods of moving vertically and horizontally within the dimensional structure of the data warehouse and OLAP. The term is used more in the context of OLAP, because the various end-user Business Intelligence tools typically sit on top of OLAP, which in turn sits over the data warehouse.

Drill-down Navigation

It is a method of exploring for more detailed data. It is done by revealing lower-level data than was previously displayed. For instance, you can drill down from state to city to office. The available levels depend on the granularity of the data in OLAP and the data warehouse.
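A drill-down can be sketched as repeated aggregation at a finer level, restricted to the member chosen at the level above. The office-level figures, the SALES and LEVELS names and the drill_down helper below are hypothetical, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical office-level sales: (state, city, office, sales).
SALES = [
    ("New York", "New York City", "Midtown", 120),
    ("New York", "New York City", "Brooklyn", 80),
    ("New York", "Albany", "Downtown", 50),
    ("California", "Los Angeles", "Hollywood", 200),
]
LEVELS = ("state", "city", "office")  # hierarchy from coarse to fine

def drill_down(rows, level, **parents):
    """Sum sales at `level`, restricted to the chosen parent members."""
    idx = LEVELS.index(level)
    totals = defaultdict(int)
    for row in rows:
        members = dict(zip(LEVELS, row[:3]))
        if all(members[name] == value for name, value in parents.items()):
            totals[row[idx]] += row[3]
    return dict(totals)

by_state = drill_down(SALES, "state")
by_city = drill_down(SALES, "city", state="New York")
by_office = drill_down(SALES, "office", state="New York", city="New York City")
print(by_state, by_city, by_office)
```

The finest level available here is the office, because that is the granularity at which the sample data is stored.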

Drill-link 

A URL hyperlink to a destination, defining the parameters, such as the document name and prompt answers, for the drill. When the document is viewed on the Web, a user can click the link to navigate to the link's destination.

Roll-up Navigation

A method of exploring for more widely summarized data. It is the opposite of drill-down. Typically you move up a dimension hierarchy. For example, if you have the office-level break-up of sales revenue, you can roll it up to city, zone, region and country level figures.
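A roll-up can be sketched as summing child members under their parent, one level at a time. The lookups OFFICE_TO_CITY and CITY_TO_COUNTRY and the figures are hypothetical, and the hierarchy is shortened to office -> city -> country for brevity. Summing in this way assumes the measure is additive.

```python
from collections import defaultdict

# Hypothetical parent lookups for a location hierarchy: office -> city -> country.
OFFICE_TO_CITY = {"Midtown": "New York", "Brooklyn": "New York", "Back Bay": "Boston"}
CITY_TO_COUNTRY = {"New York": "USA", "Boston": "USA"}

def roll_up(figures, parent_of):
    """Aggregate figures one level up by summing children under each parent."""
    totals = defaultdict(float)
    for member, value in figures.items():
        totals[parent_of[member]] += value
    return dict(totals)

office_sales = {"Midtown": 120.0, "Brooklyn": 80.0, "Back Bay": 60.0}
city_sales = roll_up(office_sales, OFFICE_TO_CITY)
country_sales = roll_up(city_sales, CITY_TO_COUNTRY)
print(city_sales, country_sales)
```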

Cross-Dimensional (horizontal) Analysis and Navigation

Cross-dimensional analysis is an analysis across multiple dimensions, which is the key reason why OLAP and its multi-dimensional structure exist. Most business reporting and analysis goes across dimensions.

A single-dimension analysis is one where you get measures for a single dimension. For example, one looks at the measures sales, employee headcount, operating expenses, etc. for the 'location' dimension (office, city, state, region, country).

A cross-dimensional example would be to look at the measures sales, gross profit, etc. for the 'location' dimension (office, city, state, region, country) for a given set of products and a given number of quarters. If you top this kind of example with other analysis types (max-min, exception, filtration), you come close to the real-life complexity of a business analysis query. One example can be:

Identifying the top ten offices where the sales for the 'washing and cleaning' product range are more than the average sales for this product range across all offices, among those offices that have been open for more than 3 years and have had an average growth of 5% per quarter over the last 4 quarters.
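The query above combines a comparison against an average, two filters and a ranking. A minimal Python sketch follows. The Office record, its field names and the sample data are hypothetical, and "an average growth of 5% per quarter" is read here as "at least 5%", which is an assumption.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Office:
    name: str
    years_open: float
    range_sales: float        # sales for the 'washing and cleaning' range
    quarterly_growth: list    # growth rate for each of the last 4 quarters

def top_offices(offices, limit=10, min_years=3, min_growth=0.05):
    """Apply the filters from the example, then rank by product-range sales."""
    avg_sales = mean(o.range_sales for o in offices)  # average across ALL offices
    qualifying = [
        o for o in offices
        if o.range_sales > avg_sales
        and o.years_open > min_years
        and mean(o.quarterly_growth) >= min_growth  # assumed reading of "5%"
    ]
    qualifying.sort(key=lambda o: o.range_sales, reverse=True)
    return [o.name for o in qualifying[:limit]]

offices = [
    Office("A", 5, 900, [0.06, 0.05, 0.07, 0.06]),   # qualifies
    Office("B", 2, 1000, [0.10, 0.10, 0.10, 0.10]),  # open for too short a time
    Office("C", 6, 700, [0.01, 0.02, 0.00, 0.01]),   # not above average, slow growth
    Office("D", 4, 800, [0.05, 0.06, 0.05, 0.06]),   # qualifies
    Office("E", 10, 100, [0.08, 0.08, 0.08, 0.08]),  # below average sales
]
result = top_offices(offices)
print(result)
```

Note that the average is taken over all offices before any filter is applied, as the wording of the example requires.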

Cross-dimensional analysis capability in an OLAP server is also manifested in cross-dimensional navigation. For example, you are looking at a pie chart of revenue share for different product lines. By clicking on the slice for a given product (general insurance - vehicles), you may go to the state-wise split of the revenue for that product. Going further, you may click on a given state (New York, California, ...) and look at the split across channels (telemarketing, sales employees, tied agents, corporate agents, 3rd-party brokers, ...).

In the above examples, you are able to navigate seamlessly and drill across because of the cross-dimensional linkages.

Here is the list of cross-dimensional analyses you can perform:

Drill-across Dimensions

You drill across dimensions when you move from one dimension to another.

For example, you are looking at the revenue break-up for cities. Now you want the break-up of revenue for various products (say fax machine, telephone and copier) within one city (say New York). Within the fax machine product in New York City, you then want the break-up by the channels of telemarketing, mailers and direct sales.

In the above example you have drilled across the dimensions 'Location' -> 'Products' -> 'Channel'.

This is one of the most important features and is a fundamental capability expected of an OLAP tool.
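The Location -> Products -> Channel example can be sketched as a sequence of break-ups, each one fixing the member chosen in the previous step and grouping by a different dimension. The FACTS rows, the COLUMNS lookup and the break_up helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, product, channel, revenue).
FACTS = [
    ("New York", "Fax machine", "Telemarketing", 40),
    ("New York", "Fax machine", "Mailers", 25),
    ("New York", "Fax machine", "Direct sales", 35),
    ("New York", "Telephone", "Direct sales", 60),
    ("New York", "Copier", "Mailers", 90),
    ("Boston", "Fax machine", "Telemarketing", 30),
]
COLUMNS = {"city": 0, "product": 1, "channel": 2}

def break_up(rows, by, **fixed):
    """Revenue split by dimension `by`, keeping only rows that match `fixed`."""
    totals = defaultdict(int)
    for row in rows:
        if all(row[COLUMNS[dim]] == value for dim, value in fixed.items()):
            totals[row[COLUMNS[by]]] += row[3]
    return dict(totals)

by_city = break_up(FACTS, "city")
by_product = break_up(FACTS, "product", city="New York")
by_channel = break_up(FACTS, "channel", city="New York", product="Fax machine")
print(by_city, by_product, by_channel)
```

Unlike a drill-down, each step here moves to a different dimension rather than to a lower level of the same hierarchy.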

Drill Across Measures

It is similar to drill-across dimensions.

For example, you are doing a sales revenue analysis and have been able to find the best and least performing offices. However, to understand the picture further, you now move across measures to find out about the sales transactions of these offices (low revenue but a higher number of sales transactions points to a certain level of activity), the number of sales staff (the low-performing offices could have fewer staff) and the number of months since the office was set up (new offices in their gestation period could be performing lower).
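Moving across measures keeps the same set of members and swaps the measures being shown. In the sketch below the office names, the measure names and all figures are hypothetical.

```python
# Hypothetical per-office measures; every figure here is illustrative.
OFFICES = {
    "Albany":  {"revenue": 50,  "transactions": 400, "sales_staff": 3,  "months_open": 48},
    "Midtown": {"revenue": 300, "transactions": 350, "sales_staff": 12, "months_open": 60},
    "Newark":  {"revenue": 40,  "transactions": 60,  "sales_staff": 4,  "months_open": 5},
}

def measures_for(offices, names, measures):
    """Show a different set of measures for the same set of offices."""
    return {name: {m: offices[name][m] for m in measures} for name in names}

# Start with revenue and pick the two lowest-performing offices...
ranked = sorted(OFFICES, key=lambda name: OFFICES[name]["revenue"])
low_performers = ranked[:2]
# ...then move across to other measures for those same offices.
context = measures_for(OFFICES, low_performers,
                       ["transactions", "sales_staff", "months_open"])
print(low_performers, context)
```

In this made-up data, one low-revenue office shows many transactions with few staff, while the other has only recently opened, which are the kinds of explanations the text describes.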

Drill Across Attributes

This is in every respect the same as 'drill-across dimensions'. For example, you have the data for revenue in the US by customer relationship value bands (say USD 10K to USD 20K / USD 20K to USD 50K / USD 50K and above). For the USD >50K band, you want the break-up by age band (18 to 25 years / 25 to 40 years / >40 years), and within >40 years, you want the break-up by occupation (self-employed, practicing professional, employed, ...).

In this example we drill across the attributes relationship value band --> age band --> occupation. All of them belong to the customer dimension.

Pivoting, and Slicing & Dicing Analysis

Slicing means taking a slice out of a cube, given a selected dimension (for example product or customer segment), a value of that dimension (home furnishings, ...) and measures (sales value, sales units, ...) or KPIs (sales productivity). Dicing means viewing the slices from different angles, for example revenue for different products within a given state, or revenue for different states for a given product. One form of slicing and dicing is called pivoting.

Slicing and dicing lead to what you can call a pivot, a term familiar from Excel. A pivot is the standard, basic look and feel of the views you create on OLAP cubes. It gives you the ability to create width and depth in your view of the data.

A pivot is a two-dimensional layout of the summary data. The X and Y axes are the dimensions, and the intersection cell for any two dimension values contains the value of the measure.
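The two-dimensional layout described above can be sketched with plain Python. The pivot helper, the record layout and the sample figures are hypothetical; an OLAP tool or a spreadsheet would build the same kind of grid for you.

```python
from collections import defaultdict

def pivot(rows, row_dim, col_dim, measure):
    """Lay out summed `measure` values with row_dim down and col_dim across."""
    cells = defaultdict(float)
    for rec in rows:
        cells[(rec[row_dim], rec[col_dim])] += rec[measure]
    row_labels = sorted({r for r, _ in cells})
    col_labels = sorted({c for _, c in cells})
    grid = [[cells.get((r, c), 0.0) for c in col_labels] for r in row_labels]
    return row_labels, col_labels, grid

SALES = [
    {"product": "Sofa", "state": "Ohio", "quarter": "Q1", "sales": 10.0},
    {"product": "Sofa", "state": "Ohio", "quarter": "Q2", "sales": 12.0},
    {"product": "Lamp", "state": "Ohio", "quarter": "Q1", "sales": 4.0},
    {"product": "Lamp", "state": "Utah", "quarter": "Q1", "sales": 6.0},
]

# Slice: fix one member of one dimension (state = Ohio).
ohio = [rec for rec in SALES if rec["state"] == "Ohio"]
rows, cols, grid = pivot(ohio, "product", "quarter", "sales")
# Dice: view the full data from another angle (state by product).
rows2, cols2, grid2 = pivot(SALES, "state", "product", "sales")
print(rows, cols, grid)
print(rows2, cols2, grid2)
```

Each cell of the grid holds the summed measure for one pair of dimension values; combinations with no data are shown as 0.0 in this sketch.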

Here is an example of how you can slice and dice through pivot:

Step 1: Starting layout: You can have the product list on the Y axis (say 10 products) and the quarters (say four quarters) on the X axis. You can have sales value as the measure shown in the table at the intersection of a given product and a quarter. You will have a 10 X 4 matrix.

Step 2: Adding depth cross-dimensionally: Taking a step further, you can add a dimension of locations under the product to give it more depth. You can now have different locations (say 3 locations) for each product row. You will now have a 30 (3 locations for each of the 10 products) X 4 (quarters) matrix.

Step 3: Adding depth within a single dimension: You can also add another level, such as months, under quarters. Now you will have a 30 X 12 matrix (3 months for each quarter). You can also specify whether you want sub-totals for every dimension. For example, you can have sub-totals for locations, products, months and quarters.

Step 4: Pivoting on an axis: You can also pivot your view and transpose it, putting the product + location combination on the X axis and the quarter + month combination on the Y axis.

Step 5: Adding width: Referring to the starting layout, you can also add dimensions in 'width' instead of 'depth'. For example, instead of having the location dimension under the product, you can add the location dimension adjacent to the product dimension. You will therefore have a matrix whose Y axis has 10 rows (for the 10 products) and 3 rows (for the 3 locations), giving a 13 X 4 matrix.
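The matrix sizes in the steps above can be checked with a short Python sketch. The member names are placeholders; only the counts (10 products, 4 quarters, 3 locations, 3 months per quarter) come from the text.

```python
from itertools import product

products = [f"Product {i}" for i in range(1, 11)]  # 10 products
quarters = ["Q1", "Q2", "Q3", "Q4"]                # 4 quarters
locations = ["East", "West", "North"]              # 3 locations (placeholder names)
months = ["M1", "M2", "M3"]                        # 3 months within each quarter

def shape(row_headers, col_headers):
    """Return (rows, columns) of the layout defined by two header lists."""
    return (len(row_headers), len(col_headers))

# Step 1: products down the Y axis, quarters across the X axis.
step1 = shape(products, quarters)
# Step 2: nest locations under each product (depth on the Y axis).
nested_rows = list(product(products, locations))
step2 = shape(nested_rows, quarters)
# Step 3: nest months under each quarter (depth on the X axis).
nested_cols = list(product(quarters, months))
step3 = shape(nested_rows, nested_cols)
# Step 4: pivot, swapping the two axes.
step4 = shape(nested_cols, nested_rows)
# Step 5: place locations beside products (width) instead of under them.
step5 = shape(products + locations, quarters)
print(step1, step2, step3, step4, step5)
```

Nesting multiplies the header counts (10 x 3 = 30), while placing dimensions side by side adds them (10 + 3 = 13).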