©2015 AGR. All rights reserved. No part of this presentation may be reproduced or distributed without the prior written permission of AGR. All trademarks are the property of their respective owners. AGR Confidential
Governance of a Large ERP Data Warehouse
Asim Aziz aaziz@agr-us.com
Problem Statement
• Need to combine ERP data with non-ERP data to satisfy end-to-end reporting requirements.
• Requirement to use reporting tools other than BW or Bobj (for example Cognos, SAS, or direct SQL) that don’t integrate with the ERP system.
• Need for an alternative to extracting tables manually through the ERP application layer for reporting, which is time-consuming (in some cases not possible for larger tables) and inefficient.
• Need for space to house ERP and non-ERP data in a cost-efficient manner.
• Need to analyze large amounts of data and keep it near real time with operational systems.
• Need to safeguard information from exposure and theft.
• Need to satisfy auditors and regulators by using technology and processes to ensure specific guidelines are followed.
• Need for defining a common business language.
• Need for better understanding of data and data relationships.
• Need to analyze and monitor data quality.
• Need to cleanse, standardize and match information.
• Requirement to maintain data lineage.
• Need to increase innovation by leveraging information in the DW.
Overview: Building a Data Warehouse to Practice Good Data Governance
• We developed a large data warehouse for the Navy. Along the way, we had to overcome some thorny problems.
• Such a project must necessarily involve Data Governance at many levels, as we consider
  ✓ Data Security, from multiple points of view
  ✓ Data Lineage
  ✓ Data Stewardship
  ✓ Data Profiling, etc.
• But Data Governance is more than just this checklist! Data Governance comprises the activities according to which the enterprise manages its data as a resource.
• So the main way the organization is practicing good governance is by building the warehouse itself. That is, the organization is exploiting its operational data – a valuable resource that would otherwise lie fallow!
Businesses run their operations on ERP
Many of the world’s largest and most sophisticated organizations use ERP programs to run their operations. And these programs do a great job!
But now these organizations have an additional requirement: they need to analyze the data that these ERP programs have been collecting.
The US Navy also needs BI from SAP
• The US Navy is a large SAP customer, and it needs a BI solution for its SAP (and non-SAP) data, for the same reasons commercial customers do.
• Naval Supply Systems Command (NAVSUP-BSC) led the Navy’s effort to integrate multiple commands’ SAP data into a consolidated warehouse.
• NAVSUP requirements: maintainability, low cost, support of heterogeneous front and back ends, huge data volumes, low impact on operations, support of multiple BI platforms, thousands of users.
Business Analytics for SAP ERP Data
• NAVSUP considered a number of competing alternatives, performed extensive tests, and ran POCs for about one year. In the end they chose Business Analytics for SAP ERP Data.
• A typical configuration is shown above. This system is based largely on IBM hardware, particularly IBM PureData System for Analytics (formerly Netezza), InfoSphere Information Server (IIS) information management software from IBM, and ELTMaestro from Boston Common Analytics.
IBM InfoSphere Information Server supports successful information governance
IBM InfoSphere Information Server offers end-to-end data quality capabilities that help organizations to:
• Define a common business language to reduce miscommunication between business and IT.
• Understand data and their relationships to gain a complete picture before beginning a project.
• Analyze and monitor data quality continuously to reduce the proliferation of incorrect or inconsistent data.
• Cleanse, standardize and match data to assure its quality and consistency and to provide a single version of the truth.
• Maintain data lineage so end users can trace data back to original sources, establishing trust and confidence in the information received.
© IBM 2015
High Level Technical Solution
The first step is to capture the changes to the operational databases. We’ll illustrate using the case of SAP, since the non-SAP cases are often easier.
Change Data Capture
• An Oracle database typically underlies the SAP application; SAP objects are implemented as Oracle tables.
• The IIS CDC agent reads the redo logs of the Oracle database, capturing the activity of the SAP application with minimal impact on operations.
• The CDC agent transmits change information to the IIS server.
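The idea of replaying captured changes on the receiving side can be sketched as follows. This is a minimal illustration only: the `ChangeRecord` structure, the operation codes, and `apply_changes` are hypothetical names invented for this example and do not reflect the actual IIS CDC record format or API.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """One captured change (hypothetical layout, not the IIS CDC format)."""
    op: str                               # "I" insert, "U" update, "D" delete (assumed codes)
    key: int                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # new row image for inserts and updates


def apply_changes(target: Dict[int, Dict[str, Any]],
                  changes: Iterable[ChangeRecord]) -> Dict[int, Dict[str, Any]]:
    """Replay change records, in order, against an in-memory copy of a table."""
    for change in changes:
        if change.op in ("I", "U"):
            target[change.key] = dict(change.row or {})
        elif change.op == "D":
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation code: {change.op!r}")
    return target


# Example: one update, one insert, then a delete of the updated row.
replica = {101: {"QUANTITY": 2}}
change_log = [
    ChangeRecord("U", 101, {"QUANTITY": 3}),
    ChangeRecord("I", 102, {"QUANTITY": 2}),
    ChangeRecord("D", 101),
]
apply_changes(replica, change_log)
```

Because only the changes are shipped and replayed, the operational database does not have to be re-read in full on each refresh, which is the property the log-based approach above relies on.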
Change Data Capture
• Most of the source tables are transparent SAP tables, meaning that there is a straightforward relationship between the SAP objects and the Oracle tables.
• But what about cluster tables? These are cases where the relationship is more complex. Cluster tables are a big problem for many SAP data extraction efforts.
• InfoSphere DataStage includes an SAP Pack, which contains efficient tools for extracting data from cluster tables.
• In general, DataStage has large libraries of methods for extracting data from heterogeneous sources.
DataStage moves large volumes quickly
• Following extraction from sources, data is loaded to a staging area in Netezza by IIS DataStage.
• The main consideration here is handling massive data volumes (multiple terabytes) quickly, especially during initial loads or refreshes.
• DataStage performed well during stress testing on terabyte-range data loads.
ELTMaestro from Boston Common Analytics enables Netezza to support successful data governance
Within Netezza, the data is processed by the ELTMaestro tool. ELTMaestro supports native Netezza operations through a graphical dataflow language. ELTMaestro supports successful data governance with:
• Data lineage integration with IIS Metadata Workbench, enabling data to be traced from the BI analysis tool all the way back to SAP tables.
• Support for Netezza row-level security.
• Advanced audit, balance, and control capabilities, batch and run management, support of other native Netezza operations.
Data Governance: Security
• Organizations take Data Governance seriously because Data Governance also means security.
• Business Analytics for SAP ERP Data faced several different kinds of security challenges, which required different kinds of solutions.
• Although the Navy has a demanding security environment, these security challenges will be familiar to many commercial organizations as well.
Security Challenges
As you might imagine, the US Navy has stringent and sometimes intricate security requirements.
• Each application is thoroughly examined before it is integrated into the system.
• Traditional role-based security for each application and application user.
• The architecture of the tables in the data warehouse required row- and column-based security. That means that there are some large tables where some users are allowed to see some rows, but not others, and some users are allowed to see some columns but not others.
• Suppose an authorized user is downloading terabytes of information not obviously related to their job, during hours that they aren’t expected to be at work. Should their supervisor be notified? Maybe so! That would be the task of an active security monitoring system. NAVSUP BSC decided to require such a system for their data warehouse.
Row- and column-based security
KEY  COMMAND  MATERIAL_NUMBER  FUND     UNIT_PRICE  INVOICE  CMD_NAME  QUANTITY
101  1782     11961958         97XBP28       11.69    23.38  NAVSUP           2
102  1782     13316115         97XBP29        9.06    18.12  NAVSUP           2
103  1719     12767391         97XBP31        0.20    40.00  NAVAIR         200
104  1719     12153775         97XBP31       35.88  2152.80  NAVAIR          60

Users authorized to view NAVSUP data see only the rows where CMD_NAME is NAVSUP (keys 101 and 102).
Users authorized to view NAVAIR data see only the rows where CMD_NAME is NAVAIR (keys 103 and 104).
Columns containing prices are hidden from contract employees.
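The combined effect of row and column restrictions on the sample table can be sketched in a few lines. This is an illustration under stated assumptions, not the Netezza or ELTMaestro mechanism: the function `visible_rows`, the choice of `CMD_NAME` as the row filter, and the treatment of `UNIT_PRICE` and `INVOICE` as the hidden price columns are all assumptions made for the example.

```python
from typing import Any, Dict, FrozenSet, List

# Sample rows from the slide (a subset of columns is enough for the illustration).
ROWS: List[Dict[str, Any]] = [
    {"KEY": 101, "CMD_NAME": "NAVSUP", "UNIT_PRICE": 11.69, "INVOICE": 23.38, "QUANTITY": 2},
    {"KEY": 102, "CMD_NAME": "NAVSUP", "UNIT_PRICE": 9.06, "INVOICE": 18.12, "QUANTITY": 2},
    {"KEY": 103, "CMD_NAME": "NAVAIR", "UNIT_PRICE": 0.20, "INVOICE": 40.00, "QUANTITY": 200},
    {"KEY": 104, "CMD_NAME": "NAVAIR", "UNIT_PRICE": 35.88, "INVOICE": 2152.80, "QUANTITY": 60},
]

# Assumption: these are the "columns containing prices".
PRICE_COLUMNS: FrozenSet[str] = frozenset({"UNIT_PRICE", "INVOICE"})


def visible_rows(rows: List[Dict[str, Any]],
                 allowed_commands: FrozenSet[str],
                 hidden_columns: FrozenSet[str] = frozenset()) -> List[Dict[str, Any]]:
    """Keep only rows for permitted commands, then drop hidden columns."""
    return [
        {col: val for col, val in row.items() if col not in hidden_columns}
        for row in rows
        if row["CMD_NAME"] in allowed_commands
    ]


# A NAVSUP employee sees both NAVSUP rows with all columns.
navsup_view = visible_rows(ROWS, frozenset({"NAVSUP"}))

# A NAVAIR contractor sees the NAVAIR rows without the price columns.
contractor_view = visible_rows(ROWS, frozenset({"NAVAIR"}), PRICE_COLUMNS)
```

In a real warehouse this filtering would be enforced inside the database rather than in application code, so that every BI tool connecting to the warehouse sees the same restricted view.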
Guardium Active Security
• The system is overseen by an active security system called InfoSphere Guardium.
• Guardium detects suspicious activity even by authorized users, e.g. large downloads, access at unusual times, or repeated access.
• Guardium can respond with a range of actions such as logging the activity, sending an email, or actively closing a session.
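The kind of rule an active monitoring system evaluates can be sketched as below. This does not use or represent Guardium's policy language or API; the `AccessEvent` fields, the thresholds, the working-hours window, and the escalation order (log, then email, then close the session) are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Hypothetical thresholds; a real deployment would tune these per policy.
MAX_BYTES_PER_SESSION = 10**12   # flag downloads larger than about 1 TB
MAX_ACCESSES_PER_HOUR = 100      # flag unusually frequent access
WORKING_HOURS = range(7, 19)     # 07:00 to 18:59 counts as normal


@dataclass(frozen=True)
class AccessEvent:
    user: str
    bytes_read: int
    accesses_last_hour: int
    timestamp: datetime


def findings_for(event: AccessEvent) -> List[str]:
    """Return the list of suspicious traits found in one access event."""
    findings = []
    if event.bytes_read > MAX_BYTES_PER_SESSION:
        findings.append("large download")
    if event.timestamp.hour not in WORKING_HOURS:
        findings.append("access at unusual time")
    if event.accesses_last_hour > MAX_ACCESSES_PER_HOUR:
        findings.append("repeated access")
    return findings


def choose_action(findings: List[str]) -> str:
    """Escalate with the number of findings (an assumed policy)."""
    actions = ["none", "log", "email", "close_session"]
    return actions[min(len(findings), len(actions) - 1)]


# An authorized user pulling 2 TB at 02:00 with heavy repeated access.
night_event = AccessEvent("jdoe", 2 * 10**12, 250, datetime(2015, 3, 1, 2, 0))
night_action = choose_action(findings_for(night_event))

# The same user reading a small amount during the working day.
day_event = AccessEvent("jdoe", 5 * 10**6, 3, datetime(2015, 3, 2, 10, 30))
day_action = choose_action(findings_for(day_event))
```

The point of the sketch is that the checks apply to users who are already authorized: authorization decides what a user may see, while monitoring looks at how that access is being used.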
Different reasons to secure data
Reasons to restrict data access include:
• Origin: Commands may not want other commands to view their data.
• Data type: For example, contractors should not see dollar amounts.
• Values in particular fields: For example, a data row may be restricted based on plant number or company code.
Data Governance: Data Lineage and Stewardship
Business Analytics for SAP ERP Data uses IIS Metadata Workbench to preserve a complete data lineage. A line in a Cognos report can be traced all the way back to the SAP data from which it originated.
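Tracing a report line back to its origins amounts to walking a "derived from" graph upstream until nodes with no parents are reached. The sketch below shows that walk on a toy graph; it is not the Metadata Workbench data model, and every node name in `LINEAGE` as well as the function `trace_to_sources` is hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical lineage: each asset maps to the assets it was derived from.
LINEAGE: Dict[str, List[str]] = {
    "cognos_report.line_7": ["warehouse.purchases"],
    "warehouse.purchases": ["staging.orders_raw", "staging.materials_raw"],
    "staging.orders_raw": ["sap.source_table_a"],
    "staging.materials_raw": ["sap.source_table_b"],
}


def trace_to_sources(node: str, upstream: Dict[str, List[str]]) -> Set[str]:
    """Return every original source (node with no parents) that feeds `node`."""
    sources: Set[str] = set()
    seen: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against revisiting shared or cyclic paths
        seen.add(current)
        parents = upstream.get(current, [])
        if not parents:
            sources.add(current)
        stack.extend(parents)
    return sources


report_sources = trace_to_sources("cognos_report.line_7", LINEAGE)
```

Recording each hop (source table, staging table, warehouse table, report) as an edge is what makes this end-to-end trace possible, which is why every tool in the pipeline has to contribute its lineage metadata.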
Data Governance includes Business Intelligence
Data governance is the set of practices according to which the organization protects, preserves, and maintains its data and exploits it as a resource. You can’t leave valuable operational data unanalyzed and claim to be performing best-practice data governance! In other words, data governance demands that you treat the data that ERP and other enterprise operational programs have collected as a valued resource, and extract all of the worth that it has to offer.
Business Intelligence is a critical component of Data Governance.
Takeaways
• It’s no surprise that building a large data warehouse involves lots of Data Governance-related activities.
• This is especially true in the case we’ve just considered because of the strong emphasis on
  – Data Security
  – Data Lineage
  – Data Stewardship
• But Data Governance is about more than just keeping your data safe and knowing where it came from. It’s also about deriving maximum benefit from your data. And in this case, that means using the data to support Business Intelligence – in other words, building the data warehouse itself.