
TDWI Savannah pres3


©2015 AGR. All rights reserved. No part of this presentation may be reproduced or distributed without the prior written permission of AGR. All trademarks are the property of their respective owners. AGR Confidential

Governance of a Large ERP Data Warehouse

Asim Aziz [email protected]


Problem Statement

•  Need to combine ERP data with non-ERP data to satisfy end-to-end reporting requirements.
•  Requirement to use reporting tools other than BW or BusinessObjects (BOBJ), for example Cognos, SAS, direct SQL etc., that don’t integrate with the ERP system.
•  The alternative approach, extracting tables manually through the ERP application layer for reporting, is time consuming (in some cases not possible for larger tables) and inefficient.
•  Need space to house ERP and non-ERP data in a cost-efficient manner.
•  Need to analyze large amounts of data and keep it near real time with operational systems.
•  Safeguard information from exposure and theft.
•  Satisfy auditors and regulators by using technology and processes to ensure specific guidelines are followed.
•  Need to define a common business language.
•  Need to better understand data and data relationships.
•  Need to analyze and monitor data quality.
•  Need to cleanse, standardize and match information.
•  Requirement to maintain data lineage.
•  Need to increase innovation by leveraging information in the DW.


Overview: Building a Data Warehouse to Practice Good Data Governance

•  We developed a large data warehouse for the Navy. Along the way, we had to overcome some thorny problems.

•  Such a project must necessarily involve Data Governance at many levels, as we consider:
   –  Data Security, from multiple points of view
   –  Data Lineage
   –  Data Stewardship
   –  Data Profiling, etc.

•  But Data Governance is more than just this checklist! Data Governance comprises the activities according to which the enterprise manages its data as a resource.

•  So the main way the organization is practicing good governance is by building the warehouse itself. That is, the organization is exploiting its operational data – a valuable resource that would otherwise lie fallow!


Businesses run their operations on ERP

Many of the world’s largest and most sophisticated organizations use ERP programs to run their operations. And these programs do a great job!

But now these organizations have an additional requirement: they need to analyze the data that these ERP programs have been collecting.


The US Navy also needs BI from SAP


•  The US Navy is a large SAP customer, and it needs a BI solution for its SAP (and non-SAP) data, for the same reasons commercial customers do.

•  Naval Supply Systems Command (NAVSUP-BSC) led the Navy’s effort to integrate multiple commands’ SAP data into a consolidated warehouse.

•  NAVSUP requirements: maintainability, low cost, support of heterogeneous front and back ends, huge data volumes, low impact on operations, support of multiple BI platforms, thousands of users.


Business Analytics for SAP ERP Data


•  NAVSUP considered a number of competing alternatives, performed extensive tests, and ran POCs for about one year. In the end they chose Business Analytics for SAP ERP Data.

•  A typical configuration is shown above. This system is based largely on IBM products, particularly IBM PureData System for Analytics (formerly Netezza) and IBM InfoSphere Information Server (IIS) information management software, together with ELTMaestro from Boston Common Analytics.


IBM InfoSphere Information Server supports successful information governance


IBM InfoSphere Information Server offers end-to-end data quality capabilities that help organizations to:
•  Define a common business language to reduce miscommunication between business and IT.
•  Understand data and their relationships to gain a complete picture before beginning a project.
•  Analyze and monitor data quality continuously to reduce the proliferation of incorrect or inconsistent data.
•  Cleanse, standardize and match data to assure its quality and consistency and to provide a single version of the truth.
•  Maintain data lineage so end users can trace data back to original sources, establishing trust and confidence in the information received.

© IBM 2015
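IIS delivers these capabilities through its own components (for example, InfoSphere Information Analyzer for profiling and InfoSphere QualityStage for cleansing and matching). The following is only a generic Python illustration of the underlying ideas, not IIS code: a basic column profile, a standardization rule, and exact matching on the standardized value. The function names and the standardization rule are assumptions chosen for the example.

```python
import re
from collections import Counter
from typing import Dict, List, Optional


def profile_column(values: List[Optional[str]]) -> Dict[str, object]:
    """Basic profile of one column: row count, null/blank count, distinct count, top value."""
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


def standardize(name: str) -> str:
    """Uppercase, replace anything other than A-Z and 0-9 with a space, collapse whitespace."""
    cleaned = re.sub(r"[^A-Z0-9]", " ", name.upper())
    return " ".join(cleaned.split())


def match(names: List[str]) -> Dict[str, List[str]]:
    """Group raw values that are identical after standardization (exact match only)."""
    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(standardize(name), []).append(name)
    return groups


print(profile_column(["NAVSUP", "NAVSUP", None, "", "NAVAIR"]))
print(match(["Navsup  BSC", "NAVSUP-BSC", "NAVAIR"]))
```

Real matching engines add fuzzy comparison and weighting on top of this; exact matching after standardization is simply the easiest case to show.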


High-Level Technical Solution


Change Data Capture

The first step is to capture the changes to the operational databases. We’ll illustrate using the case of SAP, since the non-SAP cases are often easier.


Change Data Capture

•  An Oracle database typically underlies the SAP application; SAP objects are implemented as Oracle tables.

•  The IIS CDC agent reads the redo logs of the Oracle database, capturing the activity of the SAP application with minimal impact on operations.

•  The CDC agent transmits change information to the IIS server.
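Reading the redo logs yields an ordered stream of insert, update, and delete events. A minimal Python sketch of how such a stream could be replayed against a replica of one table, assuming a hypothetical `ChangeRecord` shape keyed on the table’s primary key; this is not the IIS CDC record format or API.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ChangeRecord:
    """One captured change, in commit order (hypothetical shape, not the IIS CDC format)."""
    op: str                        # "I" = insert, "U" = update, "D" = delete
    key: Any                       # primary key of the affected row
    row: Optional[Dict[str, Any]]  # after-image for inserts/updates; None for deletes


def apply_changes(replica: Dict[Any, Dict[str, Any]],
                  changes: List[ChangeRecord]) -> Dict[Any, Dict[str, Any]]:
    """Replay captured changes, in order, against a key -> row replica of one table."""
    for change in changes:
        if change.op in ("I", "U"):
            if change.row is None:
                raise ValueError(f"{change.op} for key {change.key!r} has no after-image")
            # Insert and update both leave the after-image as the current row (upsert).
            replica[change.key] = dict(change.row)
        elif change.op == "D":
            # Deleting a key that never reached the replica is treated as a no-op.
            replica.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation {change.op!r}")
    return replica


replica = {101: {"QUANTITY": 2}}
changes = [
    ChangeRecord("U", 101, {"QUANTITY": 3}),
    ChangeRecord("I", 105, {"QUANTITY": 10}),
    ChangeRecord("D", 101, None),
]
print(apply_changes(replica, changes))  # {105: {'QUANTITY': 10}}
```

Order matters: replaying the same three events in a different sequence would leave a different result, which is why change streams are applied in commit order.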


Change Data Capture


•  Most of the source tables are transparent SAP tables, meaning that each SAP table corresponds one-to-one to an Oracle table, so the relationship between the SAP objects and the Oracle tables is straightforward.

•  But what about cluster tables? Here the relationship is more complex: several logical SAP tables are stored together in one physical table cluster, with their field values packed into an encoded raw column. Cluster tables are a big problem for many SAP data extraction efforts.

•  InfoSphere DataStage includes the SAP Pack, which contains efficient tools for extracting data from cluster tables.

•  In general, DataStage has large libraries of methods for extracting data from heterogeneous sources.


DataStage moves large volumes quickly

•  Following extraction from sources, data is loaded to a staging area in Netezza by IIS DataStage.

•  The main consideration here is handling massive data volumes (multiple terabytes) quickly, especially during initial loads or refreshes.

•  DataStage performed well during stress testing on terabyte-range data loads.
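Loading multiple terabytes quickly depends on parallelism. DataStage parallel jobs can hash-partition rows across processing nodes, and a Netezza table’s distribution key spreads rows across the appliance’s data slices in a similar way. The sketch below illustrates only the general technique, with a tuple row layout and a partition count chosen for illustration; it does not reproduce the hash function either product uses.

```python
import zlib
from typing import Dict, Iterable, List, Sequence


def hash_partition(rows: Iterable[Sequence[object]], key_index: int,
                   n_partitions: int) -> Dict[int, List[Sequence[object]]]:
    """Assign each row to one of n_partitions by a stable hash of its key column."""
    if n_partitions < 1:
        raise ValueError("n_partitions must be at least 1")
    partitions: Dict[int, List[Sequence[object]]] = {p: [] for p in range(n_partitions)}
    for row in rows:
        # crc32 gives the same value in every process, unlike Python's salted str hash(),
        # so a given key always lands in the same partition.
        digest = zlib.crc32(str(row[key_index]).encode("utf-8"))
        partitions[digest % n_partitions].append(row)
    return partitions


rows = [(key, "NAVSUP", 1.0) for key in range(10_000)]
parts = hash_partition(rows, key_index=0, n_partitions=4)
print({p: len(r) for p, r in parts.items()})
```

A key with many distinct, evenly spread values keeps the partitions balanced; a skewed key (say, a command code with only a handful of values) would pile most rows onto a few partitions and erase the benefit of parallelism.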


ELTMaestro from Boston Common Analytics enables Netezza to support successful data governance

Within Netezza, the data is processed by the ELTMaestro tool, which supports native Netezza operations through a graphical dataflow language. ELTMaestro supports successful data governance with:
•  Data lineage integration with IIS Metadata Workbench, enabling data to be traced from the BI analysis tool all the way back to SAP tables.
•  Support for Netezza row-level security.
•  Advanced audit, balance, and control capabilities; batch and run management; and support for other native Netezza operations.
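The balance part of audit, balance, and control is commonly implemented by reconciling each load: comparing row counts and a control total (such as a summed amount column) between the source extract and the loaded target, and flagging the batch if they disagree. A minimal sketch of that step, assuming the control total is the sum of a single amount column; it is not ELTMaestro’s implementation, and the sample amounts are the INVOICE values from the example table in the security section.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BalanceResult:
    source_rows: int
    target_rows: int
    source_total: float
    target_total: float

    @property
    def balanced(self) -> bool:
        # Row counts must match exactly; control totals must agree to the cent.
        return (self.source_rows == self.target_rows
                and abs(self.source_total - self.target_total) < 0.005)


def balance_check(source_amounts: Sequence[float],
                  target_amounts: Sequence[float]) -> BalanceResult:
    """Reconcile row count and a summed control total between a source extract and a target load."""
    return BalanceResult(
        source_rows=len(source_amounts),
        target_rows=len(target_amounts),
        source_total=round(sum(source_amounts), 2),
        target_total=round(sum(target_amounts), 2),
    )


invoices = [23.38, 18.12, 40.00, 2152.80]
print(balance_check(invoices, invoices).balanced)      # True
print(balance_check(invoices, invoices[:3]).balanced)  # False: a row was dropped
```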


Data Governance: Security

•  Organizations take Data Governance seriously because Data Governance also means security.

•  Business Analytics for SAP ERP Data faced several different kinds of security challenges, which required different kinds of solutions.

•  Although the Navy has a demanding security environment, these security challenges will be familiar to many commercial organizations as well.


Security Challenges

As you might imagine, the US Navy has stringent, and sometimes intricate, security requirements.
•  Each application is thoroughly examined before it is integrated into the system.
•  Traditional role-based security applies to each application and application user.
•  The architecture of the tables in the data warehouse required row- and column-based security. That means that in some large tables, some users are allowed to see some rows but not others, and some users are allowed to see some columns but not others.

•  Suppose an authorized user is downloading terabytes of information not obviously related to their job, during hours that they aren’t expected to be at work. Should their supervisor be notified? Maybe so! That would be the task of an active security monitoring system. NAVSUP BSC decided to require such a system for their data warehouse.


Row- and column-based security

KEY  COMMAND  MATERIAL_NUMBER  FUND     UNIT_PRICE  INVOICE  CMD_NAME  QUANTITY
101  1782     11961958         97XBP28  11.69       23.38    NAVSUP    2
102  1782     13316115         97XBP29  9.06        18.12    NAVSUP    2
103  1719     12767391         97XBP31  0.20        40.00    NAVAIR    200
104  1719     12153775         97XBP31  35.88       2152.80  NAVAIR    60

Users authorized to view only NAVSUP data see rows 101 and 102.

Users authorized to view only NAVAIR data see rows 103 and 104.

Columns containing prices (UNIT_PRICE and INVOICE) are hidden from contract employees.
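The two rules illustrated by the table amount to a row filter plus a column mask. The minimal Python sketch below applies them to the example rows, assuming a user is described only by the set of commands they may view and a contractor flag (both hypothetical simplifications). It is illustrative only: in a warehouse like this one, such rules would normally be enforced inside the database (for example, through Netezza row-level security) so that every BI tool sees the same restrictions.

```python
from typing import Dict, List, Set

# The four example rows from the table above.
ROWS: List[Dict[str, object]] = [
    {"KEY": 101, "COMMAND": 1782, "MATERIAL_NUMBER": 11961958, "FUND": "97XBP28",
     "UNIT_PRICE": 11.69, "INVOICE": 23.38, "CMD_NAME": "NAVSUP", "QUANTITY": 2},
    {"KEY": 102, "COMMAND": 1782, "MATERIAL_NUMBER": 13316115, "FUND": "97XBP29",
     "UNIT_PRICE": 9.06, "INVOICE": 18.12, "CMD_NAME": "NAVSUP", "QUANTITY": 2},
    {"KEY": 103, "COMMAND": 1719, "MATERIAL_NUMBER": 12767391, "FUND": "97XBP31",
     "UNIT_PRICE": 0.20, "INVOICE": 40.00, "CMD_NAME": "NAVAIR", "QUANTITY": 200},
    {"KEY": 104, "COMMAND": 1719, "MATERIAL_NUMBER": 12153775, "FUND": "97XBP31",
     "UNIT_PRICE": 35.88, "INVOICE": 2152.80, "CMD_NAME": "NAVAIR", "QUANTITY": 60},
]

PRICE_COLUMNS: Set[str] = {"UNIT_PRICE", "INVOICE"}


def secure_view(rows: List[Dict[str, object]], allowed_commands: Set[str],
                is_contractor: bool) -> List[Dict[str, object]]:
    """Return only the rows and columns this user may see."""
    visible = []
    for row in rows:
        # Row-level rule: keep only rows from commands the user is authorized for.
        if row["CMD_NAME"] not in allowed_commands:
            continue
        # Column-level rule: contract employees never receive price columns.
        visible.append({col: val for col, val in row.items()
                        if not (is_contractor and col in PRICE_COLUMNS)})
    return visible


for r in secure_view(ROWS, {"NAVAIR"}, is_contractor=True):
    print(r)
```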


Guardium Active Security

•  The system is overseen by an active security system called InfoSphere Guardium.

•  Guardium detects suspicious activity even by authorized users: e.g., large downloads, access at unusual times, and repeated access.


•  Guardium can respond with a range of actions such as logging the activity, sending an email, or actively closing a session.
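Guardium’s policies and responses are configured within the product itself; the sketch below only illustrates the general idea of rule-based detection of suspicious activity by authorized users, using the three example patterns above. The thresholds, the `Session` fields, and the escalation ladder (one finding is logged, two trigger an email, three close the session) are hypothetical choices, not Guardium behavior or defaults.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Hypothetical thresholds for illustration only; they are not Guardium defaults.
MAX_BYTES = 10 * 1024 ** 3   # more than 10 GiB in one session counts as a large download
WORK_HOURS = range(6, 20)    # sessions starting 06:00-19:59 are considered normal
MAX_REPEATS = 500            # more queries than this against one table counts as repeated access


@dataclass
class Session:
    user: str
    start: datetime
    bytes_downloaded: int
    max_queries_per_table: int


def findings(s: Session) -> List[str]:
    """Flag suspicious patterns even though the user is authorized."""
    found = []
    if s.bytes_downloaded > MAX_BYTES:
        found.append("large_download")
    if s.start.hour not in WORK_HOURS:
        found.append("unusual_time")
    if s.max_queries_per_table > MAX_REPEATS:
        found.append("repeated_access")
    return found


def respond(found: List[str]) -> str:
    """Escalate with the number of findings: log, then email alert, then close the session."""
    return {0: "allow", 1: "log", 2: "email_alert"}.get(len(found), "close_session")


night_bulk = Session("user1", datetime(2015, 3, 1, 2, 30), 20 * 1024 ** 3, 12)
print(findings(night_bulk), respond(findings(night_bulk)))  # ['large_download', 'unusual_time'] email_alert
```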


Different reasons to secure data

Reasons to restrict data access include:
•  Origin: Commands may not want other commands to view their data.
•  Data type: For example, contractors should not see dollar amounts.
•  Values in particular fields: For example, a data row may be restricted based on plant number or company code.
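One way to combine all three kinds of restriction is to generate, per user, a query (or view definition) that filters rows by origin and by field value and omits restricted columns. A minimal sketch, assuming a hypothetical table `INVOICE_FACT` with a hypothetical `PLANT` column alongside the `CMD_NAME` column from the example table, and `?` placeholders (the parameter style depends on the database driver). It is not how the warehouse described here implements its security.

```python
from typing import List, Sequence, Tuple


def build_secure_query(table: str, columns: Sequence[str], hidden: Sequence[str],
                       commands: Sequence[str], plants: Sequence[str]) -> Tuple[str, List[str]]:
    """Combine origin, data-type, and field-value restrictions into one parameterized SELECT.

    Table and column names are interpolated directly, so they must come from trusted
    configuration; the user's entitlement values are passed as bind parameters.
    """
    if not commands or not plants:
        raise ValueError("user must be entitled to at least one command and one plant")
    hidden_set = set(hidden)
    # Data type: drop restricted columns, such as dollar amounts for contractors.
    visible = [c for c in columns if c not in hidden_set]
    if not visible:
        raise ValueError("no columns left to select")
    # Origin and field values: restrict rows with IN-lists of "?" placeholders.
    cmd_marks = ", ".join("?" for _ in commands)
    plant_marks = ", ".join("?" for _ in plants)
    sql = (f"SELECT {', '.join(visible)} FROM {table} "
           f"WHERE CMD_NAME IN ({cmd_marks}) AND PLANT IN ({plant_marks})")
    return sql, [*commands, *plants]


sql, params = build_secure_query(
    "INVOICE_FACT", ["MATERIAL_NUMBER", "CMD_NAME", "PLANT", "UNIT_PRICE", "INVOICE"],
    hidden=["UNIT_PRICE", "INVOICE"], commands=["NAVSUP"], plants=["P100", "P200"])
print(sql)
print(params)
```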


Data Governance: Data Lineage and Stewardship

Business Analytics for SAP ERP Data uses IIS Metadata Workbench to preserve a complete data lineage. A line in a Cognos report can be traced all the way back to the SAP data from which it originated.
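Metadata Workbench keeps lineage in its own metadata repository. Conceptually, tracing a report line back to SAP is an upstream walk over a directed graph of "derived from" links. A minimal sketch, assuming a hand-built dictionary of hypothetical asset names (the SAP billing tables VBRK and VBRP are used only as plausible examples):

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage graph: each asset maps to the upstream assets it is derived from.
LINEAGE: Dict[str, List[str]] = {
    "cognos:Invoice Summary report": ["netezza:MART.INVOICE_FACT"],
    "netezza:MART.INVOICE_FACT": ["netezza:STAGE.VBRK", "netezza:STAGE.VBRP"],
    "netezza:STAGE.VBRK": ["sap:VBRK"],
    "netezza:STAGE.VBRP": ["sap:VBRP"],
}


def trace_to_sources(asset: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Walk upstream breadth-first and return the original sources (assets with no parents)."""
    sources: Set[str] = set()
    seen = {asset}
    queue = deque([asset])
    while queue:
        node = queue.popleft()
        parents = lineage.get(node, [])
        if not parents:
            sources.add(node)
        for parent in parents:
            if parent not in seen:  # avoid revisiting upstream assets shared by several paths
                seen.add(parent)
                queue.append(parent)
    return sources


print(sorted(trace_to_sources("cognos:Invoice Summary report", LINEAGE)))  # ['sap:VBRK', 'sap:VBRP']
```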


Data Governance includes Business Intelligence

Data governance is the set of practices according to which the organization protects, preserves, and maintains its data and exploits it as a resource. You can’t leave valuable operational data unanalyzed and claim to be performing best-practice data governance! In other words, data governance demands that you treat the data that ERP and other enterprise operational programs have collected as a valued resource, and extract all of the worth that it has to offer.

Business Intelligence is a critical component of Data Governance.


Takeaways

•  It’s no surprise that building a large data warehouse involves lots of Data Governance-related activities.

•  This is especially true in the case we’ve just considered because of the strong emphasis on:
   –  Data Security
   –  Data Lineage
   –  Data Stewardship

•  But Data Governance is about more than just keeping your data safe and knowing where it came from. It’s also about deriving maximum benefit from your data. And in this case, that means using the data to support Business Intelligence – in other words, building the data warehouse itself.


Questions/Comments?

•  Call 443-327-9727
•  Email: [email protected]
–  Asim
