The IT Perspective: Data Warehousing, Management, and Analytical Structures

Rafal Lukawiecki, Strategic Consultant, Project Botticelli Ltd


Objectives

Explain the basics of:

1. Master Data Management

2. Data Warehousing

3. ETL

4. OLAP/Multidimensional Data

The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation.

Portions © 2010 Project Botticelli Ltd & entire material © 2010 Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.

This seminar is based on a number of sources, including a few dozen Microsoft-owned presentations, used with permission. Thank you to Chris Dial, Tara Seppa, Aydin Gencler, Ivan Kosyakov, Bryan Bredehoeft, Marin Bezic, and Donald Farmer with his entire team for all the support.


SQL Services – Why?

Install only the ones you need

Which?

Integration Services: Get your data from the world outside (ETL)

Analysis Services: Cubes, Data Mining, support for PowerPivot on SharePoint

Reporting Services: DIY Report Builder and traditional “big” reports

Master Data Services: Quality of critical master data (cities, colours, customers)

Database Engine: Data warehouse and OLTP relational storage


Master Data Management


MDM

Ensures consistency of data across all organisational uses

Impacts overall data quality

Processes and tools for the consistent collection, aggregation, matching, distribution, and persistence of master data

Related to Federated Data Management

Key to MDM: Modelling

Why MDM?

It’s About Evolution of Enterprise Architecture

MDM Processes

Import & Integration

• Batched Acquisition from Staging Tables

• Members, Attributes, Parent-Child Relationships

• SQL Integration Services

Modeling

• Versioning Changes

• Auditing

• Compliance

• Tracking of Instances

Export & Subscription

• Subscription Views

• Export to: Operational Systems, Data Warehouses, BI Analytics, Reporting Tools

Microsoft Master Data Services

SQL 2008 R2 Enterprise, Datacenter, Developer

Tools:

Master Data Manager: Primary tool for managing your master data

MDS Configuration Manager: IT Pro tool

MDS Web Service: For developers wanting to extend MDS

Concepts:

Models

Entities

Attributes

Members

Hierarchies

Collections

Versions

Database


Modelling Master Data

Model organises data at highest level

Allowing versioning of changes to data

There are typically four categories of models:

People (Customers, Staff)

Places (Geographies, Cities, Countries)

Things (Products)

Concepts (Accounts, Behaviours, Transactions)


Example: Product MDM Model

Product (model)
    Product (entity)
        Name (free-form attr)
        Code (free-form attr)
        Subcategory (domain-based attr)
            Name (free-form attr)
            Code (free-form attr)
            Category (domain-based attr)
                Name (free-form attr)
                Code (free-form attr)
        StandardCost (free-form attr)
        ListPrice (free-form attr)
        Photo (file attr)
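The Product model above can be sketched as plain data structures. This is only an illustration in Python, not the Master Data Services API: the class names (`Attribute`, `Entity`, `Model`), the `add_member` helper, and all sample member codes and values are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical classes for illustration only; they are not part of MDS.

@dataclass
class Attribute:
    name: str
    kind: str                        # "free-form", "domain-based", or "file"
    domain: "Entity | None" = None   # entity supplying allowed values (domain-based only)

@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)
    members: dict = field(default_factory=dict)   # member code -> attribute values

    def add_member(self, code, **values):
        """Add a member; a domain-based value must be a member code of the domain entity."""
        for attr in self.attributes:
            if attr.kind == "domain-based" and attr.name in values:
                if values[attr.name] not in attr.domain.members:
                    raise ValueError(f"{values[attr.name]!r} is not a member of {attr.domain.name}")
        self.members[code] = values

@dataclass
class Model:
    name: str
    entities: list = field(default_factory=list)

category = Entity("Category", [Attribute("Name", "free-form"), Attribute("Code", "free-form")])
subcategory = Entity("Subcategory", [
    Attribute("Name", "free-form"),
    Attribute("Code", "free-form"),
    Attribute("Category", "domain-based", domain=category),
])
product = Entity("Product", [
    Attribute("Name", "free-form"),
    Attribute("Code", "free-form"),
    Attribute("Subcategory", "domain-based", domain=subcategory),
    Attribute("StandardCost", "free-form"),
    Attribute("ListPrice", "free-form"),
    Attribute("Photo", "file"),
])
product_model = Model("Product", [product, subcategory, category])

# Sample members (invented values): each domain-based value points at an existing member.
category.add_member("BIKES", Name="Bikes")
subcategory.add_member("ROAD", Name="Road Bikes", Category="BIKES")
product.add_member("BK-R50", Name="Road-50", Subcategory="ROAD", ListPrice=699.0)
```

The point of the domain-based attributes is visible in `add_member`: a product can only reference a subcategory that already exists as a member, which is one way master data stays consistent.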


1. Reviewing a Data Model Using Master Data Services


Data Warehouse

Rich Connectivity

Data Providers: OLE DB, ODBC, DB2, Oracle, XML, SQL Server, SQL Server Analysis Services, SQL Server Report Server Models, SQL Server Data Mining Models, SQL Server Integration Services, MySAP, Hyperion Essbase, SAP NetWeaver BI, Teradata


Star Schema


Star Schema Benefits

Simple, not-so-normalized model

High-performance queries

Especially with Star Join Query Optimization

Mature and widely supported

Low-maintenance
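A star schema can be sketched with one fact table surrounded by denormalized dimension tables. The example below uses Python's built-in `sqlite3` module purely to show the shape of the schema and of a typical star join; the table and column names and the sample rows are assumptions, and SQLite does not provide SQL Server's Star Join Query Optimization.

```python
import sqlite3

star = sqlite3.connect(":memory:")
star.executescript("""
    -- Dimension tables: one denormalized table per dimension.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    -- Fact table: a foreign key to each dimension plus numeric measures.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        quantity    INTEGER,
        amount      REAL
    );
    INSERT INTO dim_product VALUES (1, 'Road-50', 'Bikes'), (2, 'Helmet', 'Accessories');
    INSERT INTO dim_date    VALUES (20100101, 2010, 1), (20100201, 2010, 2);
    INSERT INTO fact_sales  VALUES (1, 20100101, 2, 1400.0),
                                   (2, 20100101, 5, 150.0),
                                   (1, 20100201, 1, 700.0);
""")

# Star join: the fact table joins directly to each dimension, one hop away.
star_rows = star.execute("""
    SELECT p.category, d.year, SUM(f.amount) AS total_amount
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date    AS d ON d.date_key    = f.date_key
    GROUP BY p.category, d.year
    ORDER BY p.category
""").fetchall()
print(star_rows)
```

Every dimension is a single join away from the fact table, which is what keeps queries against this shape simple.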

Snowflake Dimension Tables

Define hierarchies using multiple dimension tables

Support fact tables with varying granularity

Simplify consolidation of heterogeneous data

Potential for slower query performance in relational reporting

No difference in performance in Analysis Services database
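For contrast with a single denormalized dimension table, a snowflaked product dimension might look like the sketch below: the hierarchy is normalized into separate tables, so relational queries need extra joins to reach the upper levels. It again uses Python's `sqlite3` only for illustration; all table names and rows are assumptions.

```python
import sqlite3

snow = sqlite3.connect(":memory:")
snow.executescript("""
    -- The product hierarchy is split across three normalized dimension tables.
    CREATE TABLE dim_category    (category_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_subcategory (subcategory_key INTEGER PRIMARY KEY, name TEXT,
                                  category_key INTEGER REFERENCES dim_category(category_key));
    CREATE TABLE dim_product     (product_key INTEGER PRIMARY KEY, name TEXT,
                                  subcategory_key INTEGER REFERENCES dim_subcategory(subcategory_key));
    CREATE TABLE fact_sales      (product_key INTEGER REFERENCES dim_product(product_key),
                                  amount REAL);
    INSERT INTO dim_category    VALUES (1, 'Bikes');
    INSERT INTO dim_subcategory VALUES (10, 'Road Bikes', 1), (11, 'Mountain Bikes', 1);
    INSERT INTO dim_product     VALUES (100, 'Road-50', 10), (101, 'Mountain-100', 11);
    INSERT INTO fact_sales      VALUES (100, 700.0), (101, 1200.0), (100, 700.0);
""")

# Reaching the category level takes three joins instead of one.
by_category = snow.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product     AS p ON p.product_key     = f.product_key
    JOIN dim_subcategory AS s ON s.subcategory_key = p.subcategory_key
    JOIN dim_category    AS c ON c.category_key    = s.category_key
    GROUP BY c.name
""").fetchall()

# The subcategory level needs one join fewer.
by_subcategory = snow.execute("""
    SELECT s.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product     AS p ON p.product_key     = f.product_key
    JOIN dim_subcategory AS s ON s.subcategory_key = p.subcategory_key
    GROUP BY s.name
    ORDER BY s.name
""").fetchall()
print(by_category, by_subcategory)
```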


Slowly Changing Dimensions

Maintain historical context as dimension data changes

Three common ways (there are more):

Type 1: Overwrite the existing dimension record

Type 2: Insert a new ‘versioned’ dimension record

Type 3: Track limited history with attributes
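The three types can be sketched on an in-memory customer dimension. This is a minimal illustration, not how any particular tool implements them; the column names (`customer_key`, `valid_from`, `valid_to`, `is_current`, `previous_city`) and the sample data are assumptions.

```python
from datetime import date

def scd_type1(row, new_city):
    """Type 1: overwrite in place; the old value is lost."""
    row["city"] = new_city
    return row

def scd_type2(rows, customer_id, new_city, change_date):
    """Type 2: close the current version and append a new one with a new surrogate key."""
    current = next(r for r in rows if r["customer_id"] == customer_id and r["is_current"])
    current["is_current"] = False
    current["valid_to"] = change_date
    rows.append({
        "customer_key": max(r["customer_key"] for r in rows) + 1,
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    })
    return rows

def scd_type3(row, new_city):
    """Type 3: keep only the previous value, in a dedicated attribute."""
    row["previous_city"] = row["city"]
    row["city"] = new_city
    return row

# The same change (customer C1 moves from Dublin to Cork) handled three ways.
t1 = scd_type1({"customer_key": 1, "customer_id": "C1", "city": "Dublin"}, "Cork")
t2 = scd_type2(
    [{"customer_key": 1, "customer_id": "C1", "city": "Dublin",
      "valid_from": date(2009, 1, 1), "valid_to": None, "is_current": True}],
    "C1", "Cork", date(2010, 6, 1),
)
t3 = scd_type3({"customer_key": 1, "customer_id": "C1", "city": "Dublin",
                "previous_city": None}, "Cork")
```

Only Type 2 preserves full history: facts recorded before the change keep pointing at the old surrogate key, so old sales still report under Dublin.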

Integration and ETL

Let’s do ETL with SSIS

SQL Server Integration Services (SSIS) service

SSIS object model

Two distinct runtime engines:

Control flow

Data flow

32-bit and 64-bit editions


The Package

The basic unit of work, deployment, and execution

An organized collection of:

Connection managers

Control flow components

Data flow components

Variables

Event handlers

Configurations

Can be designed graphically or built programmatically

Saved in XML format to the file system or SQL Server


Control Flow

Control flow is a process-oriented workflow engine

A package contains a single control flow

Control flow elements

Containers

Tasks

Precedence constraints

Variables
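The idea of tasks linked by precedence constraints and sharing variables can be sketched as follows. This is a toy workflow runner in Python, not the SSIS object model: the function `run_control_flow`, the task names, and the outcome labels are all assumptions, and containers are left out for brevity.

```python
def run_control_flow(tasks, constraints):
    """Run tasks in dependency order, honouring precedence constraints.

    tasks:       dict of task name -> callable returning True (success) or False (failure)
    constraints: list of (predecessor, required_outcome, successor) tuples
    """
    outcomes = {}
    pending = list(tasks)
    log = []
    while pending:
        progressed = False
        for name in list(pending):
            incoming = [(pred, req) for pred, req, succ in constraints if succ == name]
            if any(pred not in outcomes for pred, _ in incoming):
                continue  # a predecessor has not been evaluated yet
            pending.remove(name)
            progressed = True
            if all(outcomes[pred] == req for pred, req in incoming):
                outcomes[name] = "success" if tasks[name]() else "failure"
                log.append((name, outcomes[name]))
            else:
                outcomes[name] = "skipped"  # a constraint was not satisfied
        if not progressed:
            raise ValueError("cycle in precedence constraints")
    return log

variables = {"rows_loaded": 0}  # stands in for package variables shared by tasks

def extract():
    variables["rows_loaded"] = 3
    return True

def load():
    return variables["rows_loaded"] > 0

def notify_failure():
    return True

log = run_control_flow(
    {"Extract": extract, "Load": load, "Notify failure": notify_failure},
    [("Extract", "success", "Load"), ("Load", "failure", "Notify failure")],
)
print(log)
```

Here "Notify failure" never runs because "Load" succeeds; a failing `load` would trigger it instead.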


Data Flow

The Data Flow Task

Performs traditional ETL and more

Fast and scalable

Data Flow Components

Extract data from Sources

Load data into Destinations

Modify data with Transformations

Service Paths

Connect data flow components

Create the pipeline
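A data flow pipeline of this kind (source, transformation, destinations) can be sketched with Python generators. The function names, the CSV columns, and the 1000 threshold are assumptions for illustration; they are not SSIS component names.

```python
import csv
import io

def csv_source(text):
    """Source: extract rows from CSV text as dictionaries."""
    yield from csv.DictReader(io.StringIO(text))

def derive_total(rows):
    """Transformation: add a computed column to each row."""
    for row in rows:
        row = dict(row)
        row["total"] = int(row["quantity"]) * float(row["unit_price"])
        yield row

def conditional_split(rows, predicate, matched, unmatched):
    """Split: route each row to one of two destinations (plain lists here)."""
    for row in rows:
        (matched if predicate(row) else unmatched).append(row)

raw = "order_id,quantity,unit_price\n1,2,700.0\n2,5,30.0\n3,1,1200.0\n"
large_orders, small_orders = [], []

# Chaining the generators forms the pipeline; rows stream through one at a time.
conditional_split(
    derive_total(csv_source(raw)),
    lambda r: r["total"] >= 1000,
    large_orders,
    small_orders,
)
print(len(large_orders), len(small_orders))
```

Because each stage is a generator, no stage holds the whole data set in memory, which loosely mirrors how a pipeline passes rows between components.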

1. Using SQL Server Integration Services for Splitting Data

OLAP/Multidimensional Data

Cube = Unified Dimensional Model

Multidimensional data

Combination of measures and dimensions as one conceptual model

Measures are sourced from fact tables

Dimensions are sourced from dimension tables
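As a rough sketch of how measures and dimensions combine, the snippet below sums a measure from fact rows, grouped by attributes looked up in dimension tables. The data, the names, and the `aggregate` helper are assumptions; a real Analysis Services cube stores and processes aggregations rather than recomputing them per query.

```python
from collections import defaultdict

# Dimension tables keyed by surrogate key (sample data, invented for the example).
dim_product = {
    1: {"name": "Road-50", "category": "Bikes"},
    2: {"name": "Helmet", "category": "Accessories"},
}
dim_date = {
    20100101: {"year": 2010, "month": 1},
    20100201: {"year": 2010, "month": 2},
}
# Fact table: dimension keys plus a numeric measure.
fact_sales = [
    {"product_key": 1, "date_key": 20100101, "amount": 1400.0},
    {"product_key": 2, "date_key": 20100101, "amount": 150.0},
    {"product_key": 1, "date_key": 20100201, "amount": 700.0},
]

def aggregate(measure, by):
    """Sum a measure over the fact rows, grouped by (dimension, attribute) pairs."""
    dims = {"product": (dim_product, "product_key"), "date": (dim_date, "date_key")}
    totals = defaultdict(float)
    for fact in fact_sales:
        key = tuple(dims[d][0][fact[dims[d][1]]][attr] for d, attr in by)
        totals[key] += fact[measure]
    return dict(totals)

by_category = aggregate("amount", [("product", "category")])
by_category_month = aggregate("amount", [("product", "category"), ("date", "month")])
print(by_category)
print(by_category_month)
```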


Hierarchies

Benefits

View of data at different levels of summarization

Path to drill down or drill up

Implementation

Denormalized star schema dimension

Normalized snowflake dimension

Self-referencing relationship


Hierarchy

Defined in Analysis Services

Ordered collection of attributes into levels

Navigation path through dimensional space

Very important to get right!

Customers by Geography: Country > State > City > Customer

Customers by Demographics: Marital > Gender > Customer
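A hierarchy as an ordered list of levels, with roll-up and drill-down along it, might be sketched like this. The customer rows, the `sales` figures, and the helper names are assumptions for illustration; in Analysis Services the hierarchy is defined in the dimension, not in client code.

```python
from collections import defaultdict

# Ordered levels of the "Customers by Geography" hierarchy, top to bottom.
GEOGRAPHY_LEVELS = ["country", "state", "city", "customer"]

customers = [
    {"customer": "Ann", "city": "Seattle", "state": "WA", "country": "USA", "sales": 100.0},
    {"customer": "Bob", "city": "Spokane", "state": "WA", "country": "USA", "sales": 50.0},
    {"customer": "Eve", "city": "Portland", "state": "OR", "country": "USA", "sales": 70.0},
]

def roll_up(rows, level):
    """Summarize sales at one level, keyed by the path from the top level down."""
    depth = GEOGRAPHY_LEVELS.index(level) + 1
    totals = defaultdict(float)
    for row in rows:
        path = tuple(row[name] for name in GEOGRAPHY_LEVELS[:depth])
        totals[path] += row["sales"]
    return dict(totals)

def drill_down(rows, path):
    """Return totals one level below the member identified by path."""
    children = [r for r in rows if all(r[n] == v for n, v in zip(GEOGRAPHY_LEVELS, path))]
    return roll_up(children, GEOGRAPHY_LEVELS[len(path)])

by_country = roll_up(customers, "country")
by_state = roll_up(customers, "state")
wa_cities = drill_down(customers, ("USA", "WA"))
print(by_country, by_state, wa_cities)
```

The order of the levels is what makes the navigation path meaningful, which is why the level ordering matters so much when designing the hierarchy.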


Measure Group

Group of measures with same dimensionality

Analogous to a fact table

Cube can contain more than one measure group

E.g. Sales, Inventory, Finance

Defined by dimension relationships

Measure Groups (columns): Sales, Inventory, Finance

Dimensions (rows):

Customers X

Products X X

Time X X X

Promotions X

Warehouse X

Department X

Account X

Scenario X


1. Using BIDS to Review Dimension Design

2. Cube Design and Functionality


Summary

As a platform for enterprise Business Intelligence you should consider four services:

Data Warehouse (can be relational)

Process for Data Management (MDS)

Process for Data Integration (ETL)

Analysis (OLAP, Data Mining, Columnar)

= SQL Server 2008 R2


© 2010 Microsoft Corporation & Project Botticelli Ltd. All rights reserved.
