Enterprise Information Management Pristine Data Inc. Ashish Nachane

Enterprise Data Management 1.3


Page 1: Enterprise Data Management 1.3

Enterprise Information Management

Pristine Data Inc. Ashish Nachane

Page 2: Enterprise Data Management 1.3

Contents

• Introduction
• Information Management Paradigm
• Information Management Evolution Cycle
• Data Categories
• Information Management Initiatives
• Data Service
• Data Governance
• Data Quality
• Appendix
• Working Diagrams
• Implementation Techniques

Page 3: Enterprise Data Management 1.3

Information Management Paradigm

Introduction

Diagram axes: Information/Data Domain, Functional Area

Diagram groupings: Content & Use Management, Information Delivery, Operations, Integration, Architecture

Governance
• Guiding Principles
• Project Management
• Enhancement Prioritization
• Business Case Development
• Funding requests
• Privacy

Data Quality
• Data Profiling
• Quality Measurement
• Cleanup Initiatives

Content Expertise
• Data Content SMEs
• Source System SMEs
• Report Librarian
• Atomic Data SMEs
• Metadata Tools

Nomenclature
• Common Code Tables
• Data Naming Standards
• KPI Definition
• Element Reference Directory

Development / Design
• Analytic Sandbox
• Analytic Pilots
• Information Delivery Tool SMEs
• ETL SMEs

Metadata
• Conceptual Models
• Logical Models
• Physical Data Models
• Business Definitions

Architecture
• Process Blueprints
• Technology evaluation
• Tool/product Standards
• Corporate Standards Development

Shared Services
• Reusable Components
• Reusable Infrastructure
• Enterprise Licenses
• Web Services Stack

Training
• Tools
• Metadata / Content

Storage Management
• Allocation of Local / SAN / NAS Storage
• Mapping Importance to Physical Media
• Physical Failover (RAIDx)
• Physical Data Layout

Disaster Recovery
• Business Continuity Planning
• Backup / Recovery Strategy
• DR SLAs

Performance Management
• Query Monitoring
• DB Tuning
• Usage Monitoring
• Data Archive Strategy

Capacity Planning
• Physical Storage
• Index / Working Overhead
• Growth / Usage Projections
• ETL Staging

ETL Architecture
• Integration Paradigms
• ETL Best Practices
• Shared Process Management
• Common Operational Controls / Monitoring

Data Acquisition
• Data Sourcing / Mapping
• Extract Management
• Incremental Derivation

Data Transformation
• Transformation Rules
• Transformation Services
• Source / Target Maps

Data Access
• Standard Reporting
• Ad Hoc Queries
• Analytic marts
• OLAP
• Data Mining

Security
• Stratification
• Policies
• Access Control Mechanisms
• Compliance Procedures

Infrastructure
• Network Connectivity & Capacity
• Storage Mechanisms
• Servers

EAI / ESB / Messaging
• Message based integration
• Business Activity Monitoring

EII
• Real-time Integration

Page 4: Enterprise Data Management 1.3

Information Management Life Cycle

Business operations managed effectively

Information Integration and Delivery

Basic Data Management

Enterprise Information Management

• Enterprise view of data exists
• Master Data Management in place
• Data definitions and standards exist across the enterprise
• Standardized processes to create and maintain high-quality data exist
• Industry standards-compliant structure and format of data
• 3rd party data integration

• Information created and delivered at the point of operational performance or management control

• Complete information – bad news with the good

• Formal information network predominates

• Collaborative exchange of transactions & information with trading partners

• Analytics applied to operational measurement

• Increased effectiveness of management decision making and faster answers to critical business questions

• Organizational incentives and information processes designed to maximize enterprise value

• Enhanced information, tools and decision-making at every management control point

• Value creation from information used to disrupt the value network

Strategic Information Capability

Facts >>>> Understanding >>>> Optimization >>>> Innovation

Increasing Executive-Level Commitment

• Data exists in operational silos
• Limited analytic capabilities exist
• Information not shared across functional areas

Business performance analyzed across silos – knowing what happened

Information integrated into operations to optimize performance

Competitive advantages derived from leveraging information to accelerate desired capabilities

Introduction


Page 5: Enterprise Data Management 1.3

Data Categories

Introduction

Most organizations today accumulate several categories of data. Each category has a specific use within the enterprise.

Page 6: Enterprise Data Management 1.3

Enterprise Process Management

Gartner defines Business Process Management as:

A management discipline that treats business processes as assets that directly contribute to enterprise performance by driving operational excellence and business agility.

http://www.gartner.com/it-glossary/business-process-management-bpm/
http://www.metacase.com/methods/bpmn.html

As an organization’s business evolves, the processes that support it need to evolve as well. A typical BPM initiative follows these steps:

• Analyze current processes (‘As is Model’)
• Design the future processes (‘To be Model’)
• Develop future processes (Workflows, Business Rules, Process Models etc.)
• Execute workflows
• Monitor business activity

Software vendors such as IBM, Oracle, Pegasystems and Appian, among many others, offer BPM suites that support the tasks involved in a BPM program. Components offered by these suites include:

• Process Modeling, Simulation
• Rules Engines, Forms Designer
• Web Services (SOA), Application Integration
• Content Repositories, Document Management
• Data and Database access
• Business Activity Monitoring, Portal, Analytics

BPM evolution cycle

Information Management Initiatives


Page 7: Enterprise Data Management 1.3

Service Oriented Architecture

Solaris Windows Linux AIX Mainframes

Middleware

SOA Products .Net MQSeries TIBCO CICS

A service is an action or collection of actions performed in order to provide predictable results. Many such services exist within an organization. These standalone services need to collaborate with each other to give the organization a platform for accomplishing its goals.

This collaboration is made possible by ensuring that the services:
• Are available
• Are complete and reliable
• Can communicate with each other using a common interface (protocol)

Establishing these baseline requirements is a key goal of Service Oriented Architecture.

Application Layer

SAP ORACLE CA In House / Custom Applications

Internal Services External Services

Order Processing

Accounts Receivables

Market Reporting

Outsourced Services

SOA Benefits
• Quicker response to changing market conditions
• Holistic approach to organization needs
• Better business control over IT solutions
• Agile and scalable infrastructure
• Lower application development costs
• De-coupling users from service implementations
• Reusability

Information Management Initiatives – Enterprise Application Integration

Page 8: Enterprise Data Management 1.3

ESB Stack

Enterprise Service Bus

In a Service Oriented Architecture, an ESB is a software architecture that facilitates interaction between mutually exclusive software applications. An ESB uses asynchronous messaging to communicate between applications. An ESB needs to:
• Govern message exchange between services
• Control deployment and versioning of services
• Resolve contention between services
• Promote reusability of services
• Provide commodity services such as protocol conversion, event handling, data mapping, message and event queuing, exception handling, and service quality enforcement

Messaging – Message Service, Message Routing & Consolidation (MQ Series, MSMQ)

Protocol Conversion – XML, XSL, CORBA, SOAP

Web Services – WSDL, REST, CGI

Special Message Services – Test tools, loop back

Business Application Monitoring

Data Consolidation & Mapping – EDI, MDM, B2B

Application Adapters – RFC, IDoc, XML-RPC

Process Automation – BPEL, Workflow

The ESB architecture assumes that services are autonomous and that the availability of a service cannot be guaranteed, so messages need to be buffered continuously. An ESB manages message processing such that it can:
• Buffer a message and deliver it as soon as the receiver is ready
• Enforce dynamic processing and security policies
• Monitor messages and services
• Prioritize, delay and reschedule message delivery
• Maintain message logs and handle exceptions

ESB does not implement SOA but provides features for SOA implementation. ESB is standards based and flexible. ESB is not always web services based.
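The buffering, prioritization and logging behavior listed for an ESB can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any ESB product is implemented: `MessageBus`, `send`, `register` and the `billing` service are hypothetical names, and an in-memory heap stands in for a durable message store.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class MessageBus:
    """Toy store-and-forward bus (hypothetical, for illustration only)."""

    _receivers: Dict[str, Callable[[str], None]] = field(default_factory=dict)
    _buffer: List[Tuple[int, int, str, str]] = field(default_factory=list)
    _seq: itertools.count = field(default_factory=itertools.count)
    log: List[str] = field(default_factory=list)

    def send(self, service: str, body: str, priority: int = 5) -> None:
        # Lower number means higher priority; the sequence number keeps
        # first-in, first-out order among messages of equal priority.
        heapq.heappush(self._buffer, (priority, next(self._seq), service, body))
        self.log.append(f"queued {service}: {body}")
        self._flush()

    def register(self, service: str, handler: Callable[[str], None]) -> None:
        # A receiver becoming ready triggers delivery of buffered messages.
        self._receivers[service] = handler
        self._flush()

    def _flush(self) -> None:
        pending = []
        while self._buffer:
            item = heapq.heappop(self._buffer)
            _, _, service, body = item
            handler = self._receivers.get(service)
            if handler is None:
                pending.append(item)  # receiver not ready: keep buffering
                continue
            try:
                handler(body)
                self.log.append(f"delivered {service}: {body}")
            except Exception as exc:  # log the exception instead of crashing the bus
                self.log.append(f"failed {service}: {body} ({exc})")
        for item in pending:
            heapq.heappush(self._buffer, item)


bus = MessageBus()
received = []
bus.send("billing", "invoice-2", priority=5)  # no receiver yet: buffered
bus.send("billing", "invoice-1", priority=1)  # higher priority, sent later
bus.register("billing", received.append)      # receiver ready: buffer drains
print(received)
```

Both messages wait in the buffer until the receiver registers, and the higher-priority message is delivered first even though it was sent second.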

Information Management Initiatives – Enterprise Application Integration

Page 9: Enterprise Data Management 1.3

Information Management & Data Warehousing

Information Management Initiatives – Enterprise Business Intelligence

Data Warehouse (DW): consolidated data storage designed to support an organization’s analytical needs

Operational Data Store (ODS): data storage holding near real-time data to support operational needs

Data Mart (DM): a data store designed to support a department's analytical needs

Extract, Transform, Load (ETL): the process used to extract transactions from different systems, transform the data into usable form, and load it to the data warehouse

Business Intelligence (BI): the process of analyzing data based on relationships and trends

Knowledge Management (KM): an approach to organizing information such that it is more available and more valuable

Master Data Management (MDM): an approach to defining and managing an organization’s non-transactional reference data

Executive Information Systems (EIS): systems that provide up-to-the-minute information to executive management about the organization’s operations

On-line Analytical Processing (OLAP): a system designed to provide efficient data analytics by rolling up data into pre-defined aggregates

Data Mining: a process and set of techniques for analyzing large amounts of data to derive patterns such as customer behavior

Dimensions: tables used to store master data associated with a business activity, such as Contacts, Universities, Alumni or Placements.

Facts: tables used to store historical business activity details by dimension. Traditionally, fact tables contain multiple rows for the same entity over a period of time.

Aggregates: database tables used to store business activities aggregated / rolled up based on pre-defined criteria.
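How dimensions, facts and aggregates relate can be shown with a small in-memory SQLite database. This is a sketch only: the table and column names (`dim_contact`, `fact_donation`, `agg_donation_by_university`) and the sample rows are invented for illustration and do not come from any real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension: master data describing who took part in the activity.
    CREATE TABLE dim_contact (
        contact_key INTEGER PRIMARY KEY,
        name        TEXT,
        university  TEXT
    );

    -- Fact: one row per business event, keyed by dimension.
    CREATE TABLE fact_donation (
        contact_key INTEGER REFERENCES dim_contact (contact_key),
        donated_on  TEXT,
        amount      INTEGER
    );

    INSERT INTO dim_contact VALUES (1, 'Ada', 'Cambridge'), (2, 'Alan', 'Princeton');
    INSERT INTO fact_donation VALUES
        (1, '2023-01-15', 100),
        (1, '2023-06-01', 50),
        (2, '2023-03-10', 75);

    -- Aggregate: facts rolled up on a pre-defined criterion (university).
    CREATE TABLE agg_donation_by_university AS
    SELECT d.university, COUNT(*) AS donations, SUM(f.amount) AS total_amount
    FROM fact_donation AS f
    JOIN dim_contact AS d USING (contact_key)
    GROUP BY d.university;
    """
)

aggregate = conn.execute(
    "SELECT university, donations, total_amount "
    "FROM agg_donation_by_university ORDER BY university"
).fetchall()
print(aggregate)
```

The fact table holds several rows for the same contact over time, while the aggregate table stores one pre-computed row per university that reports can read directly.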

Data Organization

Data Warehouse Objectives

Data Manipulation Techniques

Data Warehousing Terms

Massively Parallel Processing (MPP): a technique that distributes data and processing across independent nodes, each with its own memory, so that thousands of rows can be processed at the same time

Columnar Databases: a data organization technique that stores data by column instead of the traditional row-based layout, which typically speeds up analytical reads and aggregations that touch only a few columns
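The difference between row-based and column-based organization can be illustrated with plain Python lists. This is a conceptual sketch with made-up sample data; real columnar databases add compression, encoding and on-disk layouts that are not shown here.

```python
# Row-based layout: each record keeps all of its fields together.
rows = [
    {"contact": "Ada", "university": "Cambridge", "amount": 100},
    {"contact": "Alan", "university": "Princeton", "amount": 75},
    {"contact": "Grace", "university": "Yale", "amount": 50},
]


def to_columns(records):
    """Pivot a list of records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-based layout: each column's values are stored together.
columns = to_columns(rows)

# Summing one column from the row layout visits every full record...
total_from_rows = sum(record["amount"] for record in rows)
# ...while the column layout reads only the values it needs.
total_from_columns = sum(columns["amount"])
print(total_from_rows, total_from_columns)
```

Both layouts hold the same data and give the same total; the column layout simply lets an aggregate over `amount` skip the other fields.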


Page 10: Enterprise Data Management 1.3

Data Warehouse Components

Extract, Transform, Load
• Acquire: extract data from source systems
• Profile: collect statistics related to source data
• Cleanse: ensure data integrity
• Transform: apply business rules to source data
• Integrate: consolidate data from multiple sources
• Load: move the data to data storage
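The six steps above can be sketched as a small pipeline. This is an illustrative outline under stated assumptions rather than a reference implementation: the two source extracts, the `gifts` table, and the business rule (amounts stored as integer cents) are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3
from collections import Counter

# Hypothetical extracts from two source systems.
CRM_ROWS = [
    {"contact_id": "1", "name": " Ada Lovelace ", "gift": "100.50"},
    {"contact_id": "2", "name": "Alan Turing", "gift": ""},
]
EVENT_ROWS = [
    {"contact_id": "1", "name": "Ada Lovelace", "gift": "25"},
]


def acquire():
    """Acquire: extract rows from each source system."""
    return {"crm": list(CRM_ROWS), "events": list(EVENT_ROWS)}


def profile(rows):
    """Profile: collect simple statistics (row count, missing values per column)."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value.strip() == "":
                missing[column] += 1
    return {"row_count": len(rows), "missing": dict(missing)}


def cleanse(rows):
    """Cleanse: trim whitespace and drop rows without a gift amount."""
    trimmed = [{k: v.strip() for k, v in row.items()} for row in rows]
    return [row for row in trimmed if row["gift"]]


def transform(rows):
    """Transform: apply the assumed business rule (amounts as integer cents)."""
    return [
        (int(row["contact_id"]), row["name"], round(float(row["gift"]) * 100))
        for row in rows
    ]


def integrate(*batches):
    """Integrate: consolidate records from all sources into one row per contact."""
    totals = {}
    for batch in batches:
        for contact_id, name, cents in batch:
            _, running = totals.get(contact_id, (name, 0))
            totals[contact_id] = (name, running + cents)
    return [(cid, name, cents) for cid, (name, cents) in sorted(totals.items())]


def load(conn, records):
    """Load: move the consolidated records into data storage."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gifts "
        "(contact_id INTEGER, name TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO gifts VALUES (?, ?, ?)", records)
    conn.commit()


sources = acquire()
stats = {name: profile(rows) for name, rows in sources.items()}
batches = [transform(cleanse(rows)) for rows in sources.values()]
conn = sqlite3.connect(":memory:")
load(conn, integrate(*batches))
rows_loaded = conn.execute(
    "SELECT contact_id, name, amount_cents FROM gifts"
).fetchall()
print(stats, rows_loaded)
```

Profiling runs on the raw extracts so that its statistics describe the source data before cleansing changes it.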

Data Organization
• EDW: enterprise-wide single version of truth
• ODS: near real-time data for operational reporting
• DM: historical view of departmental data

Source Data: Contacts, Fund Raising, Activity Management

Reporting of operational performance metrics

Consolidated Reporting across business units

Departmental / Specific subject area analytics

Periodic Trend Analysis

Organized Adoption of analysis driven decision making

Master Data Management: central repository to hold the organization’s reference data

Information Access

Metadata: Data Definitions, Business Rules, Data Standards, Operational Metadata, Technical Metadata, Taxonomy

Information Management Initiatives – Enterprise Business Intelligence


Page 11: Enterprise Data Management 1.3

Data Service Architecture

Information Management Initiatives – Enterprise Business Intelligence

Page 12: Enterprise Data Management 1.3

Data Governance

Data governance is the exercise of authority and control over the management of data assets. The goals of a data governance initiative are:
• To define, approve, and communicate data strategies, policies, standards, architecture, procedures and metrics
• To enforce regulatory compliance and conformance to data policies, standards, architectures and procedures
• To sponsor, track and oversee the delivery of data management projects and services
• To manage and resolve data related issues
• To understand and promote the value of data assets

Activities:
1. Data Management Planning
   1. Understand Strategic Enterprise Data Needs
   2. Develop & Maintain Data Strategy
   3. Establish Data Professional Roles & Operations
   4. Identify & Appoint Data Stewards
   5. Establish Data Governance Organization
   6. Develop & Approve Data Policies, Standards & Procedures
   7. Review and Approve Data Architecture
   8. Plan & Sponsor Data Management Projects and Services
   9. Estimate Data Asset Values and Associated Costs
2. Data Management Control
   1. Supervise Data Professional Organization & Staff
   2. Coordinate Data Governance Activities
   3. Manage & Resolve Data Related Issues
   4. Monitor & Ensure Data Regulatory Compliance
   5. Enforce Conformance With Data Policies & Standards
   6. Oversee Data Management Projects and Services
   7. Communicate and Promote the Value of Data Assets

Suppliers:
• Business Executives
• IT Executives
• Data Stewards
• Regulatory Bodies

Inputs:
• Business Goals
• Business Strategies
• IT Objectives
• IT Strategies
• Data Needs
• Data Issues
• Regulatory Requirements

Outputs:
• Data Policies
• Data Standards
• Resolved Issues
• Data Mgmt. Projects & Services
• Quality Data & Information
• Recognized Data Values

Consumers:
• Data Producers
• Knowledge Workers
• Managers & Executives
• Data Professionals
• Customers

Metrics:
• Data Value
• Data Management Cost
• Achievement of Objective
• # Meetings Held, Decisions Made
• Steward Representation
• Data Professional Headcount

Participants:
• Executive Data Stewards
• Coordinating Data Stewards
• Business Data Stewards
• Data Professionals
• DM Leader
• CIO

Tools:
• Email
• Personal Productivity Tools
• Internet and Other Resources

Data Governance

Page 13: Enterprise Data Management 1.3

Data Quality

Data Quality Phases

An organization’s data quality is determined by:
• How accurate the data is
• How complete the data is
• How timely the overall data availability is

Data quality improvement proceeds through the following phases:
• Quality Assessment – this phase determines the current state of data quality. It involves loading and profiling the source data, which makes redundancies and outliers easy to identify
• Design – this phase is used to design the quality process. The relationships between objects are finalized during this phase
• Transformation – any changes or updates required on the source data are made during this phase
• Monitoring – data monitoring is the process of examining data over time and sending alerts when the data violates any of the business rules that have been set
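The assessment and monitoring phases can be sketched with a few checks in Python. This is a minimal illustration: the sample records, the reference date `AS_OF`, the 90-day freshness window and the rule names are all assumptions made for the example, not thresholds from any standard.

```python
from datetime import date

AS_OF = date(2024, 1, 15)  # assumed reference date for the timeliness rule

# Hypothetical source records to assess.
records = [
    {"id": 1, "email": "ada@example.org", "amount": 100, "updated": date(2024, 1, 10)},
    {"id": 2, "email": None, "amount": -5, "updated": date(2024, 1, 9)},
    {"id": 2, "email": "alan@example.org", "amount": 75, "updated": date(2023, 6, 1)},
]


def completeness(rows, column):
    """Share of rows in which the column is populated."""
    filled = sum(1 for row in rows if row[column] not in (None, ""))
    return filled / len(rows)


def duplicate_ids(rows):
    """Profiling check: ids that appear more than once (redundancies)."""
    seen, dupes = set(), set()
    for row in rows:
        (dupes if row["id"] in seen else seen).add(row["id"])
    return sorted(dupes)


# Business rules for monitoring: one for accuracy, one for timeliness.
RULES = {
    "amount must be non-negative": lambda row: row["amount"] >= 0,
    "record updated within 90 days": lambda row: (AS_OF - row["updated"]).days <= 90,
}


def monitor(rows):
    """Return an alert (record id, rule name) for every rule violation."""
    return [
        (row["id"], name)
        for row in rows
        for name, rule in RULES.items()
        if not rule(row)
    ]


email_completeness = completeness(records, "email")
duplicates = duplicate_ids(records)
alerts = monitor(records)
print(email_completeness, duplicates, alerts)
```

In a monitoring setup, `monitor` would be run on each new load and its alerts routed to the people responsible for the data.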


Page 14: Enterprise Data Management 1.3

Contents

• Introduction
• Information Management Paradigm
• Information Management Evolution Cycle
• Information Management Initiatives
• Data Governance
• Appendix
• Working Diagrams
• Implementation Techniques

Page 15: Enterprise Data Management 1.3

Data Warehouse Architecture

Page 16: Enterprise Data Management 1.3

Techniques to implement Data Warehouses

• Data is consolidated from multiple sources and loaded to a staging area in a single batch, usually during off-peak hours
• Business rules are applied to the data in the staging area
• Transformed data is loaded to a common data repository
• Reporting applications reference the common data repository for historical reporting

• Data is consolidated from multiple sources and loaded directly to a common data storage multiple times during the day
• Business rules are applied to the data while it is loaded to the common data storage
• Data is loaded into a database structure that is very similar to the transactional data structure
• Reporting applications reference the common data repository but also have to apply rules in order to present aggregated data

Appendix

• Data is consolidated from multiple sources and loaded to a staging area in a single batch, usually during off-peak hours
• Business rules are applied to the data in the staging area
• Reference data is loaded to specific reference tables (dimensions)
• Transformed transaction data is loaded to a subject data repository
• Reporting applications reference the historical transaction data through a common reference data repository. This presents a conformed view of the reference data for the organization

Page 17: Enterprise Data Management 1.3

Drivers for BI / Data Warehouse

Operational Efficiency

Required activities

Define business value

Business Intelligence

Decision Support Simulation

Managerial Reporting

Define info mgmt goals

Define current state

Competitive Advantage

Effective Data Management

Governance

Key Performance Metrics

Goals

Source Analysis Processes

While Business Intelligence is often the main driver for a Data Warehouse, data warehousing also supports higher levels of collaboration by providing a single version of the truth for shared data.

Data warehousing reduces IT cost by reducing the:
• Costs associated with information collection, consolidation, and dissemination
• Effort required for report development
• Maintenance of transactional systems
• Disruption caused by ad-hoc information requests

When information assets are well-managed, the business can be better positioned to focus on:
• Improving customer satisfaction
• Improving delivery of products and services
• Meeting increasing demands for regulatory reporting
• Managing risk across the organization
• Building repeatable processes and platforms

Information Management Initiatives – Enterprise Business Intelligence


Page 18: Enterprise Data Management 1.3

Enterprise Data Warehouse – Conceptual (WIP)