Dimensional Modeling By Dr. Gabriel

  • Dimensional Modeling
    Dimensional modeling is a logical design technique for structuring data. It is:
      • intuitive to business users
      • easy to understand
      • fast in query performance
    The primary constructs of a dimensional model are fact tables and dimension tables.

  • Star Schema
    A star schema consists of one fact table and multiple dimension tables.
    Example: assume this schema belongs to a retail chain. The fact is revenue (money). Each way you want to view that data (by year, by product, by region, etc.) is called a dimension.
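The following is a minimal sketch of the retail-chain star schema. It uses Python's standard `sqlite3` module with an in-memory database. The table and column names (`fact_sales`, `dim_store`, `dim_date`, `revenue`, ...) and the sample rows are hypothetical. The sketch shows that the fact table holds only foreign keys plus the revenue measure. Each "by" question then becomes a join to one dimension followed by a GROUP BY.

```python
import sqlite3

# Hypothetical retail-chain star schema: one fact table, three dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, calendar_year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    store_key   INTEGER NOT NULL REFERENCES dim_store(store_key),
    revenue     REAL    NOT NULL  -- the additive fact
);
INSERT INTO dim_date    VALUES (20230115, 2023), (20240110, 2024);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_store   VALUES (10, 'East'), (20, 'West');
INSERT INTO fact_sales  VALUES (20230115, 1, 10, 100.0),
                               (20230115, 2, 20, 50.0),
                               (20240110, 1, 20, 75.0);
""")

# "Revenue by region": join the fact table to one dimension and group.
by_region = dict(conn.execute("""
    SELECT s.region, SUM(f.revenue)
    FROM fact_sales f JOIN dim_store s ON s.store_key = f.store_key
    GROUP BY s.region ORDER BY s.region
""").fetchall())

# "Revenue by year": same fact table, different dimension.
by_year = dict(conn.execute("""
    SELECT d.calendar_year, SUM(f.revenue)
    FROM fact_sales f JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.calendar_year ORDER BY d.calendar_year
""").fetchall())

print(by_region)  # {'East': 100.0, 'West': 125.0}
print(by_year)    # {2023: 150.0, 2024: 75.0}
```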

  • Facts
    Facts are measurements, and they are numeric.
      • Additive facts
          • Critical, because BI applications do not typically retrieve a single fact table row; data is summarized.
      • Semi-additive facts
          • Cannot be summed across time periods.
          • Examples: account balances, inventory levels.
      • Non-additive facts
          • Cannot be summed across any dimension.
          • Are stored in dimension tables.
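The sketch below illustrates semi-additivity using hypothetical monthly account-balance rows. Summing balances across accounts within one month is meaningful. Summing them across months is not. In its place, the sketch uses the period-end balance or the average balance over the periods, which are two common substitutes.

```python
from collections import defaultdict

# Hypothetical monthly account-balance snapshot rows: (month, account, balance).
balances = [
    ("2024-01", "A", 100.0),
    ("2024-01", "B", 200.0),
    ("2024-02", "A", 150.0),
    ("2024-02", "B", 250.0),
]

# Semi-additive: summing across the account dimension within one period is valid.
total_by_month = defaultdict(float)
for month, _account, balance in balances:
    total_by_month[month] += balance

# Summing across time would double-count money (100 + 150 is not a balance).
# Instead use the period-end value or the average over the periods.
months = sorted(total_by_month)
ending_balance = total_by_month[months[-1]]
average_balance = sum(total_by_month.values()) / len(months)

print(dict(total_by_month))             # {'2024-01': 300.0, '2024-02': 400.0}
print(ending_balance, average_balance)  # 400.0 350.0
```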

  • Fact Tables
    Fact tables store numeric, additive facts.
      • Conformed facts are facts with identical definitions; they may have the same standardized name in separate tables.
      • For non-conformed facts, different interpretations must be given different names.

  • Fact Tables
    Fact table keys:
      • The key is a composite key made up of the foreign keys of the intersecting dimension tables.
      • Every foreign key must match a unique primary key in the corresponding dimension table.
      • Foreign keys should not be null; special keys such as Unknown or N/A should be used instead.

  • Fact Tables
    Fact table granularity: data should be stored at the lowest, most detailed atomic grain captured by a business process. This provides:
      • flexibility in querying and reporting
      • scalability

  • Dimension Tables
    Dimension tables consist of highly correlated groups of attributes that represent key business objects such as products, customers, employees, and facilities. They store attributes used for:
      • constraining/filtering queries
      • labeling query results
    Dimensions can easily be identified when business users use the word "by", for example: by year, by product, by region, etc.

  • Dimension Tables
    Dimension attributes are:
      • textual fields
      • numeric values that behave like text
      • non-additive
    Requirements for dimension attributes:
      • labels consist of full words
      • descriptive
      • no missing values
      • discretely valued (only one value for each row in the dimension table)
      • quality assured (no misspellings, no obsolete or orphaned values, no different versions of the same attribute)

  • Dimension Tables
    Dimension tables are small in terms of the number of rows.
    Storing descriptions for each attribute is critical, because it makes the tables easy to use for business users.
    Rows are uniquely identified by a single key, usually a sequential surrogate key.

  • Dimension Tables
    Advantages of using surrogate keys:
      • Performance: efficient joins, smaller indexes, more rows per block.
      • Data integrity: protection when keys in operational systems are reused, e.g. for discontinued products or deceased customers.
      • Mapping when integrating data from different sources: keys from different sources may differ, so a mapping table relates the surrogate key to the keys from each source.

  • Dimension Tables
    Advantages of using surrogate keys (cont.):
      • Handling unknown or N/A values: a surrogate key value can easily be assigned to rows with these values.
      • Tracking changes in dimension attribute values: a new row is created with the changed attribute values and assigned the next available surrogate key.

  • Dimension Tables
    Disadvantages of using surrogate keys:
      • Assigning and managing surrogate keys, and substituting them for natural keys, puts extra load on the ETL system.
      • However, many ETL tools have built-in support for surrogate key processing, and once the process is developed it can easily be reused for other dimensions.
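The following is a minimal sketch of the ETL-side key handling described in the surrogate-key slides. `SurrogateKeyMap` is a hypothetical helper, not the API of any ETL tool. It does three things:

- assigns sequential surrogate keys;
- keeps the mapping table from (source system, natural key) to surrogate key;
- reserves key 0 for the special Unknown row, so fact foreign keys are never null.

```python
from typing import Dict, Optional, Tuple


class SurrogateKeyMap:
    """Hypothetical ETL helper: maps (source_system, natural_key) -> surrogate key.

    Surrogate keys are assigned sequentially starting at 1; key 0 is reserved
    for the special 'Unknown' dimension row so fact foreign keys are never null.
    """

    UNKNOWN_KEY = 0

    def __init__(self) -> None:
        self._mapping: Dict[Tuple[str, str], int] = {}
        self._next_key = 1

    def lookup_or_assign(self, source_system: str, natural_key: Optional[str]) -> int:
        # Missing natural keys point at the Unknown row instead of NULL.
        if natural_key is None:
            return self.UNKNOWN_KEY
        pair = (source_system, natural_key)
        if pair not in self._mapping:
            self._mapping[pair] = self._next_key
            self._next_key += 1
        return self._mapping[pair]

    def link(self, source_system: str, natural_key: str, surrogate_key: int) -> None:
        """Record that a key from another source identifies the same member."""
        self._mapping[(source_system, natural_key)] = surrogate_key


keys = SurrogateKeyMap()
k1 = keys.lookup_or_assign("crm", "CUST-42")
k2 = keys.lookup_or_assign("crm", "CUST-77")
keys.link("billing", "9001", k1)  # same customer, different key in another source
print(k1, k2, keys.lookup_or_assign("billing", "9001"))  # 1 2 1
print(keys.lookup_or_assign("crm", None))                # 0
```

Suppose a source system later reuses `CUST-42` for a different customer. The ETL process would then have to retire the old mapping before assigning a new key. That policy is left out of this sketch.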

  • Conformed Dimensions
    Conformed dimensions (a.k.a. master or common reference dimensions) are shared across the DW environment, joining to multiple fact tables that represent various business processes. There are 2 types:
      • identical dimensions
      • one dimension being a subset of a more detailed dimension

  • Conformed Dimensions
    Identical dimensions:
      • Same content, interpretation, and presentation regardless of the business process involved.
      • Same keys, attribute names, attribute definitions, and domain values regardless of the fact tables they join to.
      • Example: the product dimension referenced by orders and the one referenced by inventory are identical.
    One dimension being a perfect subset of a more detailed, granular dimension table:
      • Same attribute names, definitions, and domain values.
      • Example: sales is linked to a dimension table at the individual product level; the sales forecast is linked at the brand level.

  • Conformed Dimensions
    Sales Fact Table: Date key (FK), Product key (FK), other FKs, Sales quantity, Sales amount
    Product Dimension: Product key (PK), Product description, SKU number, Brand description, Subclass description, Class description, Department description, Color, Size, Display type

    Sales Forecast Fact Table: Month key (FK), Brand key (FK), other FKs, Forecast quantity, Forecast amount

    Brand Dimension: Brand key (PK), Brand description, Subclass description, Class description, Department description, Display type
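One way to keep the brand dimension conformed is to derive it directly from the product dimension. Because the brand dimension is built from the product dimension's own columns, its names, definitions, and domain values cannot drift apart. The sketch below does this with hypothetical rows and column names, and keeps only a few of the brand-level attributes for brevity.

```python
# Hypothetical product dimension rows (product level).
product_dim = [
    {"product_key": 1, "sku": "A-1", "brand": "Acme", "subclass": "Snacks",
     "class": "Food", "department": "Grocery"},
    {"product_key": 2, "sku": "A-2", "brand": "Acme", "subclass": "Snacks",
     "class": "Food", "department": "Grocery"},
    {"product_key": 3, "sku": "Z-9", "brand": "Zeta", "subclass": "Soap",
     "class": "Care", "department": "Household"},
]

# Brand-level attributes: a strict subset of the product dimension's columns.
BRAND_COLUMNS = ("brand", "subclass", "class", "department")

def build_brand_dimension(products):
    """Derive the conformed brand dimension (one row per distinct brand-level combination)."""
    distinct = sorted({tuple(p[c] for c in BRAND_COLUMNS) for p in products})
    return [
        {"brand_key": key, **dict(zip(BRAND_COLUMNS, values))}
        for key, values in enumerate(distinct, start=1)
    ]

brand_dim = build_brand_dimension(product_dim)
for row in brand_dim:
    print(row)
```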

  • Conformed Dimensions
    Benefits:
      • Consistency: every fact table is filtered consistently, and results are labeled consistently.
      • Integration: users can create queries that drill across fact tables representing different processes, querying each one individually and then joining the result sets on common dimension attributes.
      • Reduced development time to market: once created, conformed dimensions are reused.
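The drill-across pattern described in the benefits slide is sketched below. Each fact table is queried on its own and aggregated to the shared conformed attribute. The two result sets are then joined on that attribute. The rows are hypothetical and are assumed to be already rolled up to brand. In practice, product-level sales would first be rolled up through the product dimension.

```python
from collections import defaultdict

# Hypothetical rows from two fact tables that share the conformed brand attribute.
sales_rows = [("Acme", 120.0), ("Acme", 30.0), ("Zeta", 80.0)]  # (brand, sales amount)
forecast_rows = [("Acme", 140.0), ("Zeta", 100.0)]              # (brand, forecast amount)

def summarize(rows):
    """Query one fact table on its own: aggregate to the conformed attribute."""
    totals = defaultdict(float)
    for brand, amount in rows:
        totals[brand] += amount
    return totals

def drill_across(left, right):
    """Join the two result sets on the common attribute (full outer join)."""
    brands = sorted(set(left) | set(right))
    return [(b, left.get(b, 0.0), right.get(b, 0.0)) for b in brands]

report = drill_across(summarize(sales_rows), summarize(forecast_rows))
for brand, actual, forecast in report:
    print(brand, actual, forecast)
# Acme 150.0 140.0
# Zeta 80.0 100.0
```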

  • Dimensional Design Process
    Based on business requirements and data realities:
      • Step 1: choose the business process
      • Step 2: declare the grain
      • Step 3: identify the dimensions
      • Step 4: identify the facts

  • Enterprise Bus Architecture
    Requirements are gathered and represented in the form of an Enterprise Data Warehouse Bus Matrix:
      • Each row corresponds to a business process.
      • Each column corresponds to a dimension of the business; each column is a conformed dimension.
    The Enterprise Data Warehouse Bus Matrix documents the overall data architecture for the DW/BI system.
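The following sketch represents a bus matrix as plain data. Rows are business processes, columns are conformed dimensions, and an "X" means the process's fact table uses that dimension. The process and dimension names are hypothetical. The usage count shows how many processes each conformed dimension would serve.

```python
# Hypothetical bus matrix: business process (row) -> conformed dimensions it uses.
bus_matrix = {
    "Orders":    {"Date", "Product", "Customer", "Sales Rep"},
    "Shipments": {"Date", "Product", "Customer", "Warehouse"},
    "Inventory": {"Date", "Product", "Warehouse"},
}

# Columns: every dimension used by at least one process.
dimensions = sorted(set().union(*bus_matrix.values()))

def render(matrix, columns, width=11):
    """Return the matrix as fixed-width text, one line per business process."""
    lines = ["Process".ljust(width) + "".join(c.ljust(width) for c in columns)]
    for process, used in matrix.items():
        cells = "".join(("X" if c in used else "").ljust(width) for c in columns)
        lines.append(process.ljust(width) + cells)
    return "\n".join(lines)

# How many business processes each conformed dimension serves.
usage = {d: sum(d in used for used in bus_matrix.values()) for d in dimensions}

print(render(bus_matrix, dimensions))
print(usage)  # {'Customer': 2, 'Date': 3, 'Product': 3, 'Sales Rep': 1, 'Warehouse': 2}
```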

  • Enterprise Bus Architecture Matrix

  • Enterprise Bus Architecture Matrix
    Possible problems: the level of detail for each row and column in the matrix.
    Row-related:
      • Listing departments (imitating the organizational chart) instead of business processes.
      • Listing reports and analytics related to a business process instead of the business process itself. Ex.: the shipping orders business process supports various analytics such as customer ranking, sales rep performance, and product movement analyses.

  • Enterprise Bus Architecture Matrix
    Possible problems (cont.):
    Column-related:
      • Overly generalized columns/dimensions. Example: an Entity column is too general, as it includes employees, suppliers, contractors, vendors, and customers.
      • Too many columns related to the same dimension; the worst case is when each attribute is listed separately. Example: Product, Product Group, and LOB are all related to the Product dimension and should be listed as one.

  • Date/Time Dimensions
    A standard date dimension table is at a daily grain.

    Rationale: remove calendar calculations from BI applications. Use numeric surrogate keys for date dimension tables.
    Date Dimension: Date key (PK), Calendar Date, Calendar Month, Calendar Day, Calendar Quarter, Calendar Half Year, Calendar Year, Fiscal Quarter, Fiscal Year
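A date dimension at a daily grain can be generated once by the ETL system, so BI applications never compute calendar attributes themselves. The sketch below does this under two stated assumptions:

- The numeric key uses the YYYYMMDD form, which is one common convention.
- The fiscal year starts on July 1 and is labeled by the calendar year in which it ends. This is a hypothetical choice for illustration only.

```python
from datetime import date, timedelta

# Assumption for illustration: the fiscal year starts on July 1 and is
# labeled by the calendar year in which it ends (July 2024 -> FY2025).
FISCAL_YEAR_START_MONTH = 7

def date_dimension(start: date, end: date):
    """Yield one row per day (daily grain) with a numeric YYYYMMDD date key."""
    day = start
    while day <= end:
        months_into_fy = (day.month - FISCAL_YEAR_START_MONTH) % 12
        fiscal_year = day.year
        if FISCAL_YEAR_START_MONTH > 1 and day.month >= FISCAL_YEAR_START_MONTH:
            fiscal_year += 1
        yield {
            "date_key": day.year * 10000 + day.month * 100 + day.day,
            "calendar_date": day.isoformat(),
            "calendar_day": day.day,
            "calendar_month": day.month,
            "calendar_quarter": (day.month - 1) // 3 + 1,
            "calendar_half_year": 1 if day.month <= 6 else 2,
            "calendar_year": day.year,
            "fiscal_quarter": months_into_fy // 3 + 1,
            "fiscal_year": fiscal_year,
        }
        day += timedelta(days=1)

rows = list(date_dimension(date(2024, 6, 30), date(2024, 7, 1)))
for r in rows:
    print(r["date_key"], r["calendar_quarter"], r["fiscal_quarter"], r["fiscal_year"])
# 20240630 2 4 2024
# 20240701 3 1 2025
```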

  • Date/Time Dimensions
    Time of day should be treated as a dimension only if there are meaningful textual descriptions for periods within the day (for example: lunch hour, rush hours). Otherwise, time of day should be represented as a simple non-additive fact or as a date/timestamp.

  • Date/Timestamp
    Full date/timestamps are stored in the fact table to support precise time-interval calculations across fact rows. These calculations should be performed by the ETL system. Example: elapsed time between the original claim date and the first payment date.
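The following small sketch shows the claim example. The ETL step computes the interval between the two timestamps once and stores the result on the fact row, so BI queries do not need to do date arithmetic. The field names, the sample timestamps, and the choice to express the interval in days rounded to two decimals are all hypothetical.

```python
from datetime import datetime

# Hypothetical claim fact row with full timestamps kept in the fact table.
claim = {
    "original_claim_ts": datetime(2024, 3, 1, 9, 30),
    "first_payment_ts": datetime(2024, 3, 15, 16, 0),
}

# ETL step: compute the precise interval once and store it as a fact.
elapsed = claim["first_payment_ts"] - claim["original_claim_ts"]
claim["days_to_first_payment"] = round(elapsed.total_seconds() / 86400, 2)

print(claim["days_to_first_payment"])  # 14.27
```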

  • Multiple Time Zones
    Express time in Coordinated Universal Time (UTC); additionally, it may be expressed in local time. Another option: use a single time zone (for example, ET) to express all times.

    Call Center Activity Fact: Local call date key (FK), UTC call date key (FK), Local call time of day key (FK), UTC call time of day key (FK)
    Dimensions: Local call date dimension, UTC call date dimension, Local call time of day dimension, UTC call time of day dimension
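The sketch below derives the four foreign keys of the call center activity fact from one timezone-aware local timestamp. It rests on these assumptions:

- Date keys use the YYYYMMDD form.
- Time-of-day keys are minute-of-day numbers (0 to 1439). This is hypothetical.
- The zone is a fixed UTC-05:00 offset, which keeps the example free of tzdata. A real system would use `zoneinfo` so that daylight saving time is handled.

```python
from datetime import datetime, timedelta, timezone

def date_key(ts: datetime) -> int:
    return ts.year * 10000 + ts.month * 100 + ts.day

def time_of_day_key(ts: datetime) -> int:
    # Hypothetical time-of-day key: minute of the day, 0..1439.
    return ts.hour * 60 + ts.minute

def call_keys(local_ts: datetime) -> dict:
    """Derive local and UTC date / time-of-day keys for one call fact row.

    local_ts must be timezone-aware so it can be converted to UTC.
    """
    utc_ts = local_ts.astimezone(timezone.utc)
    return {
        "local_call_date_key": date_key(local_ts),
        "utc_call_date_key": date_key(utc_ts),
        "local_call_time_of_day_key": time_of_day_key(local_ts),
        "utc_call_time_of_day_key": time_of_day_key(utc_ts),
    }

# A call at 21:30 local time in a UTC-05:00 zone falls on the next UTC day.
eastern = timezone(timedelta(hours=-5))
keys = call_keys(datetime(2024, 1, 15, 21, 30, tzinfo=eastern))
print(keys)
```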

  • Degenerate Dimensions
    Degenerate dimensions occur in transaction fact tables that have a natural parent-child structure. The key is the only attribute left after the other attributes have been separated into dimensions, and it should be the actual transaction number. It is stored in the fact table; do not create a corresponding dimension table.

  • Degenerate Dimensions
    Example:

    ORDERS TRANSACTIONS: order#, customer id, customer lname, customer fname, shipto street address, shipto city, shipto state, shipto zip, order total amount, discount amount, net order amount, payment amount, order date

    ORDERS FACTS: customer key, shipto address key, order date key, order total amount, discount amount, net order amount, payment amount, order#

    DIM CUSTOMER: customer key, customer id, customer lname, customer fname

    DIM SHIPTO ADDRESS: shipto address key, shipto street address, shipto city, shipto state, shipto zip

    DIM ORDER DATE: order date key, calendar date, calendar month
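The following sketch shows the transformation from the example. One ORDERS TRANSACTIONS record becomes an ORDERS FACTS row:

- Customer, ship-to address, and order date are replaced by surrogate keys.
- The measures are copied over.
- The order number stays in the fact row as a degenerate dimension.

The key-lookup dictionaries, field spellings, and sample values are hypothetical stand-ins for the outputs of the dimension loads.

```python
# Hypothetical surrogate-key lookups produced by loading the three dimensions.
customer_keys = {"C-100": 1}
shipto_keys = {("12 Main St", "Springfield", "IL", "62701"): 7}
date_keys = {"2024-03-05": 20240305}

def to_order_fact(txn: dict) -> dict:
    """Split an ORDERS TRANSACTIONS record into an ORDERS FACTS row."""
    address = (txn["shipto_street_address"], txn["shipto_city"],
               txn["shipto_state"], txn["shipto_zip"])
    return {
        "customer_key": customer_keys[txn["customer_id"]],
        "shipto_address_key": shipto_keys[address],
        "order_date_key": date_keys[txn["order_date"]],
        "order_total_amount": txn["order_total_amount"],
        "discount_amount": txn["discount_amount"],
        "net_order_amount": txn["net_order_amount"],
        "payment_amount": txn["payment_amount"],
        # Degenerate dimension: the order number stays in the fact row
        # and has no dimension table of its own.
        "order_number": txn["order_number"],
    }

txn = {
    "order_number": "SO-5001", "customer_id": "C-100",
    "customer_lname": "Smith", "customer_fname": "Ann",
    "shipto_street_address": "12 Main St", "shipto_city": "Springfield",
    "shipto_state": "IL", "shipto_zip": "62701",
    "order_total_amount": 100.0, "discount_amount": 10.0,
    "net_order_amount": 90.0, "payment_amount": 90.0,
    "order_date": "2024-03-05",
}
fact = to_order_fact(txn)
print(fact)
```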

  • Slowly Changing Dimensions
    Dimension table attributes change infrequently.
    Mini-dimensions: more frequently changing attributes are separated into their own dimension table, a.k.a. a mini-dimension.
    3 types of handling slowly changing dimensions:
      • overwrite the dimension attribute
      • add a new dimension row
      • add a new dimension attribute

  • Slowly Changing Dimensions - Overwrite the dimension attribute
    New values overwrite old ones; no history is kept.
    Problems occur if data was previously aggregated based on the old values: those aggregations will not match ad-hoc aggregations based on the new values. Previous aggregations need to be updated to keep aggregated data in sync.

  • Slowly Changing Dimensions - Add a new dimension row
    This is the most popular technique. A new row with a new surrogate PK is inserted into the dimension table to reflect the new attribute values. Both the old and new values are stored, along with effective and expiration dates and a current row indicator.
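The following is a minimal sketch of the add-a-new-row technique over an in-memory list of dimension rows. The function name, the column names, and the sample customer are hypothetical. Two conventions are assumed:

- The high date 9999-12-31 marks the current row.
- The old row expires the day before the new row takes effect.

```python
from datetime import date, timedelta

# Conventional "not yet expired" value (one common choice, assumed here).
HIGH_DATE = date(9999, 12, 31)

def apply_type2_change(dim_rows, natural_key, changes, effective):
    """Expire the current version of natural_key and append a new version.

    Each row is a dict with surrogate_key, natural_key, descriptive attributes,
    effective_date, expiration_date and is_current. Returns the new surrogate
    key, which fact rows loaded from now on should reference.
    """
    current = next(r for r in dim_rows
                   if r["natural_key"] == natural_key and r["is_current"])
    # Close out the old version; it stays in the table as history.
    current["expiration_date"] = effective - timedelta(days=1)
    current["is_current"] = False
    new_row = {**current, **changes,
               "surrogate_key": max(r["surrogate_key"] for r in dim_rows) + 1,
               "effective_date": effective,
               "expiration_date": HIGH_DATE,
               "is_current": True}
    dim_rows.append(new_row)
    return new_row["surrogate_key"]

customers = [{"surrogate_key": 1, "natural_key": "C-100", "city": "Boston",
              "effective_date": date(2020, 1, 1), "expiration_date": HIGH_DATE,
              "is_current": True}]
new_key = apply_type2_change(customers, "C-100", {"city": "Denver"}, date(2024, 5, 1))
for row in customers:
    print(row["surrogate_key"], row["city"], row["expiration_date"], row["is_current"])
# 1 Boston 2024-04-30 False
# 2 Denver 9999-12-31 True
```

Historical fact rows keep pointing at surrogate key 1. New fact rows use key 2. This is how history is preserved without rewriting facts.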

  • Slowly Changing Dimensions - Add a new dimension attribute
    This technique is used infrequently. A new column is added to the dimension table: the old value is recorded in a "prior" attribute column, and the new value is recorded in the existing column. All BI applications transparently use the new value, and queries can be written to access the values stored in the prior attribute column.
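The add-a-new-attribute technique can be sketched as follows. A row is represented as a dict, and the `prior_<attribute>` naming is a hypothetical convention. Only one level of history is kept: a second change overwrites the prior value.

```python
def apply_type3_change(row: dict, attribute: str, new_value) -> None:
    """Copy the current value to 'prior_<attribute>', then overwrite the current one."""
    row[f"prior_{attribute}"] = row[attribute]
    row[attribute] = new_value

product = {"product_key": 12, "department": "Grocery"}
apply_type3_change(product, "department", "Snacks")
print(product)
# {'product_key': 12, 'department': 'Snacks', 'prior_department': 'Grocery'}
```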

  • Role-playing Dimensions
    The same physical dimension table plays different logical roles in a dimensional model. Example: multiple date dimensions.

    Order Transaction Fact: Order date key (FK), Ship date key (FK), Product key (FK), Order amount
    Order Date Dimension: Order date key (PK), Order date, Order date day of week, Order date month
    Ship Date Dimension: Ship date key (PK), Ship date, Ship date day of week, Ship date month

  • Role-playing Dimensions
    Other examples: Customer (ship to, bill to, sold to), Facility or port (origin, destination), Provider (referring, performing).
    A role-playing dimension is stored in a single physical table but presented in separately labeled views. It is implemented using views or aliases, depending on the database platform.
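The sketch below implements role playing with views, using Python's standard `sqlite3` module. One physical `dim_date` table is exposed as `order_date_dim` and `ship_date_dim`, each with its own role-specific column labels. Both views then join to the same fact row. All object names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One physical date dimension (hypothetical columns).
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, day_of_week TEXT);
INSERT INTO dim_date VALUES (20240301, '2024-03-01', 'Friday'),
                            (20240304, '2024-03-04', 'Monday');

-- Two separately labeled views: the same table playing two roles.
CREATE VIEW order_date_dim AS
    SELECT date_key AS order_date_key, full_date AS order_date,
           day_of_week AS order_date_day_of_week FROM dim_date;
CREATE VIEW ship_date_dim AS
    SELECT date_key AS ship_date_key, full_date AS ship_date,
           day_of_week AS ship_date_day_of_week FROM dim_date;

CREATE TABLE fact_order (order_date_key INTEGER, ship_date_key INTEGER,
                         product_key INTEGER, order_amount REAL);
INSERT INTO fact_order VALUES (20240301, 20240304, 1, 25.0);
""")

row = conn.execute("""
    SELECT o.order_date_day_of_week, s.ship_date_day_of_week, f.order_amount
    FROM fact_order f
    JOIN order_date_dim o ON o.order_date_key = f.order_date_key
    JOIN ship_date_dim  s ON s.ship_date_key  = f.ship_date_key
""").fetchone()
print(row)  # ('Friday', 'Monday', 25.0)
```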

  • Junk Dimensions
    Miscellaneous flags and text attributes that cannot be placed into any of the existing dimension tables are stored in a junk dimension, as unique combinations of values.

    Data profiling is useful in identifying junk dimension candidates.
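The following sketch shows two common ways to fill a junk dimension with unique combinations. The flag names and values are hypothetical.

- Pre-build the full cartesian product. This is practical only when it is small.
- Store only the combinations that actually occur in the source data.

In both cases the fact row then carries a single junk key instead of several flag columns.

```python
from itertools import product

# Hypothetical low-cardinality flags left over after the main dimensions were built.
PAYMENT_TYPES = ("Cash", "Credit")
ORDER_CHANNELS = ("Web", "Phone")
GIFT_WRAP = ("Y", "N")

# Option 1: pre-build every combination (fine when the cartesian product is small).
junk_dim = {combo: key for key, combo in
            enumerate(product(PAYMENT_TYPES, ORDER_CHANNELS, GIFT_WRAP), start=1)}

# Option 2: store only the combinations that actually occur in the source data.
source_rows = [("Cash", "Web", "N"), ("Credit", "Web", "N"), ("Cash", "Web", "N")]
observed = sorted(set(source_rows))
observed_junk_dim = {combo: key for key, combo in enumerate(observed, start=1)}

# Fact rows carry a single junk key instead of three flag columns.
fact_junk_keys = [observed_junk_dim[r] for r in source_rows]
print(len(junk_dim), observed_junk_dim, fact_junk_keys)
```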

  • Snowflaking
    Snowflaking occurs when dimension tables are normalized.

    It increases complexity for users and decreases performance.
    Product Dimension: Product key (PK), Product description, SKU number, Brand key (FK), Package type key (FK)
    Brand Dimension: Brand key (PK), Brand description, Subcategory key (FK)
    Subcategory Dimension: Subcategory key (PK), Subcategory description
    Package Type Dimension: Package type key (PK), Package type description

  • Outrigger Dimensions
    Outrigger dimensions look like the beginning of a snowflake. Reasons to use one:
      • a large number of attributes
      • a different grain
      • a different update frequency
    Example:
    Fact Table: Customer key (FK)
    Customer Dimension: Customer key (PK), First name, Last name, Address, County, County demographics key (FK)
    County Demographics Outrigger Dimension: County demographics key (PK), Total population, Males, Females, Under 18

  • Bridge Tables
    Bridge tables are used to implement variable-depth hierarchies. They should be used only when absolutely necessary, because they negatively affect usability and decrease performance.

    Example: reporting revenue for customers that have subsidiary relationships.

    Customer Dimension: Customer key (PK)
    Customer Hierarchy Bridge: Parent customer key, Subsidiary customer key, # levels from parent, Bottom flag, Top flag
    Fact Table: Date key (FK), Customer key (FK)
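The following sketch builds the customer hierarchy bridge from hypothetical parent-to-subsidiary links and uses it to roll up revenue. The bridge contains one row per (ancestor, descendant) pair, and each customer is also paired with itself at level 0. The rollup is the in-memory equivalent of joining the fact table's customer key to the bridge's subsidiary key and filtering on the parent key. The sketch assumes the hierarchy is a tree with no cycles.

```python
# Hypothetical customer hierarchy: parent key -> direct subsidiary keys.
children = {1: [2, 3], 2: [4], 3: [], 4: []}

def build_bridge(children):
    """One bridge row per (ancestor, descendant) pair, including self at level 0."""
    has_parent = {c for subs in children.values() for c in subs}
    rows = []

    def walk(top, node, depth):
        rows.append({"parent_key": top, "subsidiary_key": node,
                     "levels_from_parent": depth,
                     "bottom_flag": not children[node],    # no subsidiaries below
                     "top_flag": node not in has_parent})  # no parent above
        for child in children[node]:
            walk(top, child, depth + 1)

    for customer in children:
        walk(customer, customer, 0)
    return rows

# Fact rows: (customer_key, revenue).
fact_revenue = [(1, 100.0), (2, 40.0), (3, 5.0), (4, 10.0)]

def revenue_rollup(bridge, facts, parent_key):
    """fact JOIN bridge ON fact.customer_key = bridge.subsidiary_key
    WHERE bridge.parent_key = parent_key: a customer plus all its subsidiaries."""
    members = {r["subsidiary_key"] for r in bridge if r["parent_key"] == parent_key}
    return sum(amount for key, amount in facts if key in members)

bridge = build_bridge(children)
print(len(bridge))                              # 8
print(revenue_rollup(bridge, fact_revenue, 1))  # 155.0
print(revenue_rollup(bridge, fact_revenue, 2))  # 50.0
```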

  • 3 Fundamental Fact Table Grains
    Transaction: one row per transaction or transaction line. Rows are inserted into the fact table only when transaction activity occurs.

  • 3 Fundamental Fact Table Grains
    Periodic snapshot: at predetermined intervals, snapshots at the same level of detail are taken and stacked consecutively in the fact table.
      • Examples: most financial reports, bank account balances.
      • Complements detailed transaction facts but does not substitute for them.
      • Shares the same conformed dimensions, but has fewer dimensions.
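The following sketch derives a monthly periodic snapshot of an account balance from transaction-grain rows. It assumes the opening balance is zero and uses hypothetical data. A row is produced for every predetermined interval, including months with no activity. This is the main difference from the transaction grain, where rows appear only when activity occurs.

```python
from itertools import accumulate

# Hypothetical account transactions: (month, amount); positive = deposit.
transactions = [("2024-01", 100.0), ("2024-01", -30.0), ("2024-03", 50.0)]
months = ["2024-01", "2024-02", "2024-03"]  # predetermined snapshot intervals

net_by_month = {m: 0.0 for m in months}
for month, amount in transactions:
    net_by_month[month] += amount

# One snapshot row per month, even when no transactions occurred (2024-02).
snapshot = list(zip(months, accumulate(net_by_month[m] for m in months)))
print(snapshot)  # [('2024-01', 70.0), ('2024-02', 70.0), ('2024-03', 120.0)]
```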

  • 3 Fundamental Fact Table Grains
    Accumulating snapshot:
      • Less frequently used.
      • Has multiple date FKs, one for each milestone in the workflow.
      • Has many N/A or Unknown fields when a row is first inserted; this requires a special row in the date dimension table, as discussed earlier.
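The following sketch shows an accumulating snapshot row for a hypothetical order-fulfillment workflow. The row is created when the order arrives. Milestones that have not happened yet point at a special "not yet happened" date key, assumed here to be 0. The same row is then updated in place as each milestone occurs. The milestone names and key format are assumptions.

```python
from datetime import date

UNKNOWN_DATE_KEY = 0  # special "not yet happened" row in the date dimension (assumed)

# Hypothetical order-fulfillment workflow: one date key per milestone.
MILESTONES = ("order_date_key", "ship_date_key", "delivery_date_key", "payment_date_key")

def date_key(d: date) -> int:
    return d.year * 10000 + d.month * 100 + d.day

def new_pipeline_row(order_number: str, ordered: date) -> dict:
    """Insert time: every later milestone still points at the special row."""
    row = {m: UNKNOWN_DATE_KEY for m in MILESTONES}
    row["order_number"] = order_number
    row["order_date_key"] = date_key(ordered)
    return row

def record_milestone(row: dict, milestone: str, when: date) -> None:
    """Accumulating snapshot rows are revisited and updated as milestones occur."""
    row[milestone] = date_key(when)

row = new_pipeline_row("SO-5001", date(2024, 3, 1))
record_milestone(row, "ship_date_key", date(2024, 3, 4))
print(row)
```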

  • Facts of Different Granularity
    A single fact table cannot contain facts of different granularity; all measurements must be at the same level of detail.
    Example: measurements are captured for each order line, except for the shipping charge, which applies to the entire order.
    Solutions:
      • allocate higher-level facts down to the lower granularity
      • create two separate fact tables
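The following sketch shows the allocation solution. An order-level shipping charge is spread across order lines in proportion to line amounts. Proportional allocation is only one possible business rule and is assumed here. The amounts are rounded to cents. The last line absorbs the rounding remainder so that the allocated values still sum exactly to the header charge. The sketch assumes the line amounts total more than zero.

```python
def allocate(order_lines, header_amount):
    """Allocate an order-level charge to lines in proportion to line amounts."""
    total = sum(line["line_amount"] for line in order_lines)
    header_cents = round(header_amount * 100)
    allocated = []
    running = 0
    for i, line in enumerate(order_lines):
        if i == len(order_lines) - 1:
            cents = header_cents - running  # last line takes the rounding remainder
        else:
            cents = round(header_cents * line["line_amount"] / total)
            running += cents
        allocated.append({**line, "allocated_shipping": cents / 100})
    return allocated

lines = [{"line": 1, "line_amount": 30.0},
         {"line": 2, "line_amount": 30.0},
         {"line": 3, "line_amount": 30.0}]
result = allocate(lines, 10.00)
print([r["allocated_shipping"] for r in result])  # [3.33, 3.33, 3.34]
```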

  • Multiple Currencies and Units of Measure
    Measurements are provided in the local currency. They are also converted to a standardized currency, or the conversion rates must be stored. Similarly, with multiple units of measure, conversions to all the different units of measure are provided.
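The following sketch keeps the local amount and stores a standardized amount beside it. It uses `decimal` so that money is not rounded through binary floats. The conversion rates are made-up illustrative values, not real ones. In practice the rate would come from a conversion-rate table keyed by currency and date.

```python
from decimal import Decimal, ROUND_HALF_UP

# Made-up illustrative rates to the standard currency (USD); not real data.
RATES_TO_USD = {"USD": Decimal("1"), "EUR": Decimal("1.10"), "GBP": Decimal("1.25")}

def currency_facts(amount: str, local_currency: str) -> dict:
    """Return the local amount plus the amount converted to the standard currency."""
    local = Decimal(amount)
    standard = (local * RATES_TO_USD[local_currency]).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {"local_amount": local, "local_currency": local_currency,
            "standard_amount_usd": standard}

row = currency_facts("200.00", "EUR")
print(row["local_amount"], row["standard_amount_usd"])  # 200.00 220.00
```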

  • Factless Fact Tables
    Factless fact tables record business processes that do not generate quantifiable measurements. Example: student attendance.

    A factless fact table can easily be converted into a traditional fact table by adding a Count attribute that is always equal to 1; this helps in performing aggregations.
    Student Attendance Event Facts: Date key, Student key, Facility key, Faculty key, Course/section key
    Dimensions: Date, Facility, Course/section, Student, Faculty
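The following sketch shows the student attendance factless fact table with the added Count attribute. It uses Python's standard `sqlite3` module. The column is named `attendance_count` and defaults to 1. Attendance per course section and day then becomes an ordinary SUM. Table names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_attendance_fact (
    date_key INTEGER, student_key INTEGER, facility_key INTEGER,
    faculty_key INTEGER, course_section_key INTEGER,
    attendance_count INTEGER NOT NULL DEFAULT 1  -- the artificial Count fact
);
INSERT INTO student_attendance_fact
    (date_key, student_key, facility_key, faculty_key, course_section_key)
VALUES (20240902, 1, 10, 100, 500),
       (20240902, 2, 10, 100, 500),
       (20240904, 1, 10, 100, 500);
""")

attendance = conn.execute("""
    SELECT course_section_key, date_key, SUM(attendance_count)
    FROM student_attendance_fact
    GROUP BY course_section_key, date_key
    ORDER BY date_key
""").fetchall()
print(attendance)  # [(500, 20240902, 2), (500, 20240904, 1)]
```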

  • Consolidated Fact Tables
    Fact tables populated from different sources may be consolidated into a single one. The level of granularity must be the same, and measurements are listed side by side.
    Example: by combining forecast and actual sales amounts, a forecast/actual sales variance amount can easily be calculated and stored.
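The following sketch consolidates actual and forecast amounts that share the same grain, here (month, brand). The two measures sit side by side in one fact row, and the variance is calculated once and stored. The data is hypothetical. The sign convention (actual minus forecast) is an assumption.

```python
# Hypothetical actual and forecast amounts at the same grain: (month, brand).
actual = {("2024-01", "Acme"): 150.0, ("2024-01", "Zeta"): 80.0}
forecast = {("2024-01", "Acme"): 140.0, ("2024-01", "Zeta"): 100.0}

# Consolidated fact rows: both measures side by side plus the stored variance.
consolidated = []
for month, brand in sorted(set(actual) | set(forecast)):
    a = actual.get((month, brand), 0.0)
    f = forecast.get((month, brand), 0.0)
    consolidated.append({"month": month, "brand": brand,
                         "actual_amount": a, "forecast_amount": f,
                         "variance_amount": a - f})  # assumed: actual minus forecast

for row in consolidated:
    print(row)
```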

  • Recommendations to Avoid Common Misconceptions about Dimensional Modeling
      • Do not take a report-centric approach:
          • do not create a new dimensional model for each slightly different report
          • do not create a new dimensional model for each department for data from the same source
      • Create dimensional models at the finest level of granularity (atomic data), which is:
          • flexible and independent of a specific business question/report
          • scalable
      • Use conformed dimensions to:
          • ease integration efforts
          • make the ETL process structured
          • avoid chaos when integrating multiple data marts

  • Comprehensive Example: Video Rental

  • E-R Diagram
    Customer: #Cust No, F Name, L Name, Ads1, Ads2, City, State, Zip, Tel No, CC No, Expire
    Rental: #Rental No, Date, Clerk No, Pay Type, CC No, Expire, CC Approval
    Line: #Line No, Due Date, Return Date, OD charge, Pay type
    Video: #Video No, One-day fee, Extra days, Weekend
    Title: #Title No, Name, Vendor No, Cost
    Relationship labels: Requestor of, Owner of, Name for, Holder of

  • Dimensional Model

  • Modeling Process

  • 4 Steps of Dimensional Modeling
      • Choose a business process
      • Declare the grain
      • Identify the dimensions
      • Identify the facts

  • High-level Model Diagram
    A data model at the entity level that shows the specific fact and dimension tables applicable to a specific business process. It is a great communication and training tool.
    Example: an Orders fact table with the dimensions Date (Order, Due), Order junk, Customer, Promotion, Product, Currency, Channel, and Sales person.

  • Derived Facts
      • Additive calculations using other facts in the same table can be calculated using a view. Example: net sales, computed by subtracting the commission amount from gross sales.
      • Non-additive calculations expressed at a different level of detail than the fact table itself can be calculated by BI tools at query time. Example: year-to-date sales.
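The following sketch shows both kinds of derived fact, using Python's standard `sqlite3` module. Net sales is computed row by row in a view. Year-to-date sales is evaluated at query time for a chosen "as of" date. The `ytd_net_sales` function stands in for what a BI tool would do. Table, view, and column names and the sample rows are hypothetical. Date keys are assumed to be YYYYMMDD integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (date_key INTEGER, gross_sales REAL, commission_amount REAL);
INSERT INTO fact_sales VALUES (20240115, 100.0, 10.0),
                              (20240220, 200.0, 20.0),
                              (20240310, 50.0, 5.0);

-- Additive derived fact: computed row by row in a view.
CREATE VIEW fact_sales_v AS
    SELECT date_key, gross_sales, commission_amount,
           gross_sales - commission_amount AS net_sales
    FROM fact_sales;
""")

def ytd_net_sales(as_of_key: int) -> float:
    """Year-to-date net sales, evaluated at query time (BI-tool style)."""
    year_start = (as_of_key // 10000) * 10000 + 101  # e.g. 20240101
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(net_sales), 0) FROM fact_sales_v "
        "WHERE date_key BETWEEN ? AND ?", (year_start, as_of_key)).fetchone()
    return total

print(ytd_net_sales(20240229))  # 270.0
print(ytd_net_sales(20240331))  # 315.0
```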

  • Derived facts

  • Detailed Dimensional Design Worksheet

  • Updating bus matrix

  • Sample Data Model Issue List

  • Design Document
      • Brief description of the business processes included in the design
      • High-level discussion of the business requirements to be supported, pointing back to the detailed requirements document
      • High-level data model diagram
      • Detailed dimensional design worksheet for each fact and dimension table
      • Open issues list highlighting the unresolved issues
      • Discussion of any known limitations of the design in supporting the project scope and business requirements
      • Other items of interest, such as design compromises or source data concerns

  • Questions?