Data Warehouse on a Budget
How to really do more with less
Discussion, October 11, 2009
Confidential and Proprietary
Agenda
• Essentials (foundation products)
• Key components (design build products)
• Inexpensive, high quality substitutes
• A word about accelerators: how to deliver value quickly
• Other options: alternatives to mainstream thought
About me…
• Led design/build of seven (7) large scale (over 5TB) data warehouses designed, built, and deployed in the Financial Services, Transportation, Supply Chain, Retail, Utility, and Professional Services industries
• Over twenty-five (25) data marts (special purpose subject areas) designed, built, and deployed across a wide variety of industries
• Eight (8) data warehouse executive assessments prepared and delivered for management review and action. In addition, five (5) detailed business cases prepared to support the investment in the analytic environment
• Five (5) commercial off-the-shelf products developed and marketed worldwide to the Software Engineering and Healthcare industries
Essentials: Foundation components
Essentials – Fundamental Pattern
Essentials – Full Deployment
Essentials - Federated Environments
[Diagram labels]
• SAP Financials, Claims Processing, CRM, 3rd Party
• Common Staging Area
• Federated Marketing Data Warehouse
• Federated Financial Data Warehouse
• Actuarial Data Warehouse
• Federated Claims Processing Data Marts
• Subset Data Marts
• Real Time ODS
• Real Time Data Mining and Analytics
• Real Time Segmentation, Classification, Qualification, Offerings, etc.
• Analytical Applications
• E-Commerce
• Warranty Analysis
• AWARE
• Federated Meta Data Repositories
Essentials – Common Information Model
Essentials – EII
Design Build Components
A closer look…
Key Processes
Test Automation Strategy
[Diagram: map of test automation patterns]
• Test Automation Strategy: Recorded Test, Scripted Test, Data Driven Test, Test Automation Framework
• Test Fixture Strategy: Fresh Fixture, Standard Fixture, Shared Fixture (Immutable, Pre-built)
• Fresh Fixture Patterns: Inline Setup, Delegated Setup, Implicit Setup, Creation Method, Object Method
• Shared Fixture Patterns
  – Construction: Lazy Fixture Setup, Suite Fixture Setup, Decorated Setup, Chained Test
  – Access: Finder Method, Fixture Registry
• Result Verification Patterns: State Verification, Behavior Verification, Expected Object, Guard Assertion, Custom Assertion, Verification Method, Delta Assertion
• Fixture Tear Down Patterns: Inline Tear Down, Implicit Tear Down, Garbage Collected Tear Down, Automated Tear Down
• Unit Basics
  – Test Definition: Test Case Class, Test Method, Four Phase Test, Assertion Method, Assertion Message
  – Test Execution: Test Runner, Test Case Object, Test Discovery, Test Enumeration, Test Selection
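Several of the patterns named on this slide (Four Phase Test, Fresh Fixture, Implicit Setup, State Verification, Implicit Tear Down) can be illustrated with a short Python sketch. The `load_customers` function and the `dim_customer` table are hypothetical stand-ins for a warehouse load step; they do not come from the talk.

```python
import sqlite3
import unittest


def load_customers(conn, rows):
    """Hypothetical load step: insert (id, name) rows, trimming names."""
    conn.executemany(
        "INSERT INTO dim_customer (id, name) VALUES (?, ?)",
        [(cid, name.strip()) for cid, name in rows],
    )
    conn.commit()


class LoadCustomersTest(unittest.TestCase):
    def setUp(self):
        # Implicit Setup: each test gets a Fresh Fixture (an in-memory database).
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def tearDown(self):
        # Implicit Tear Down: the framework calls this after every test.
        self.conn.close()

    def test_load_trims_names(self):
        # Phase 1, setup: input rows built inline.
        rows = [(1, " Acme "), (2, "Globex")]
        # Phase 2, exercise: run the step under test.
        load_customers(self.conn, rows)
        # Phase 3, verify: State Verification against the table contents.
        actual = self.conn.execute(
            "SELECT id, name FROM dim_customer ORDER BY id"
        ).fetchall()
        self.assertEqual(
            actual, [(1, "Acme"), (2, "Globex")], "names should be stored trimmed"
        )
        # Phase 4, teardown: handled by tearDown above.


def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LoadCustomersTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` executes the single test. Because the fixture is rebuilt for every test, tests stay independent of one another at the cost of repeated setup work.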
Test Automation Strategy - Realized
Data Quality
Data Quality Process
• Data Profiling
  – Measure: quantify the number and types of defects
  – Analyze: assess the nature and cause of the defects
• Data Cleansing
  – Parse: isolate and identify data elements in data structures
  – Standardize: normalize data values and formats according to business rules and third-party references
  – Correct: verify, scrub, and append data based on algorithms and business rules provided from a secondary source
• Data Enhancement
  – Enhance: append additional data, enhancing the information value
• Match and Consolidate
  – Match: identify duplicate records within multiple tables and databases
  – Consolidate: combine unique data elements from matched records into a single source
• Management Reporting and Oversight
  – Report: provide reporting within the data quality process
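The Parse, Standardize, Match, Consolidate, and Report steps of the data quality process can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the pipe-delimited input format, the `STATE_CODES` reference table, and the name-plus-state match key are all hypothetical choices, not part of the talk.

```python
from collections import defaultdict

# Hypothetical reference data for the Standardize step.
STATE_CODES = {"new york": "NY", "n.y.": "NY", "ny": "NY", "texas": "TX", "tx": "TX"}


def parse(raw):
    """Parse: isolate name, state, and phone from a 'name|state|phone' string."""
    name, state, phone = (part.strip() for part in raw.split("|"))
    return {"name": name, "state": state, "phone": phone}


def standardize(record):
    """Standardize: normalize values and formats using the reference table."""
    out = dict(record)
    out["name"] = " ".join(record["name"].split()).title()
    out["state"] = STATE_CODES.get(record["state"].lower(), record["state"].upper())
    out["phone"] = "".join(ch for ch in record["phone"] if ch.isdigit())
    return out


def match(records):
    """Match: group records that share a standardized (name, state) key."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["name"], rec["state"])].append(rec)
    return list(groups.values())


def consolidate(group):
    """Consolidate: keep the first non-empty value seen for each field."""
    merged = {}
    for rec in group:
        for field, value in rec.items():
            if not merged.get(field):
                merged[field] = value
    return merged


def run_pipeline(raw_rows):
    """Run the steps in order and return (consolidated records, summary report)."""
    cleaned = [standardize(parse(row)) for row in raw_rows]
    golden = [consolidate(group) for group in match(cleaned)]
    report = {
        "input": len(raw_rows),
        "output": len(golden),
        "duplicates_removed": len(raw_rows) - len(golden),
    }
    return golden, report
```

Exact-key matching is the simplest possible Match step; real tools typically use fuzzy or probabilistic matching, which this sketch does not attempt.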
Data Quality – Why it is needed
SQL Server 2008 Data Profiling Task in Integration Services
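To make the idea of column profiling concrete, here is a small Python sketch that computes a null ratio and a value distribution for one column. It is only loosely analogous to what a profiling tool reports and is not the Integration Services task itself; the function name `profile_column` and the treatment of empty strings as nulls are assumptions made for illustration.

```python
from collections import Counter


def profile_column(rows, column):
    """Return simple profile statistics for one column of a list of dict rows."""
    values = [row.get(column) for row in rows]
    # Assumption: None and the empty string both count as missing.
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    total = len(values)
    return {
        "column": column,
        "row_count": total,
        "null_ratio": (total - len(non_null)) / total if total else 0.0,
        "distinct_count": len(counts),
        "most_common": counts.most_common(3),
    }
```

Running this over each column of a staging table gives a quick view of how many values are missing and which values dominate, which is the kind of evidence the Measure and Analyze steps rely on.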
Choices: Enabling Technologies
The choices
• Oracle, SAP, IBM, Informatica
  – Powerful
  – Expensive
  – Demands high skill levels to deploy successfully
• Microsoft
  – Good, well-rounded general purpose platform
  – Missing key management and meta-data elements
• Open Source (Pentaho, Jaspersoft, and Infobright)
  – Validated the market for open source BI reporting and ETL tools
  – Good, special purpose tools in the right hands (Talend)
• Alternatives
  – Wherescape RED
  – Special Purpose Tools (SeeWhy, Pervasive)
Total Cost of Ownership
• Labor intensive
• Subject to Vendor Driven Architecture (VDA)
• Expensive (maintenance, hidden support costs)?
• Missing critical management components
• Customization and development costs
• Meet organizational capability and align with objectives
  – Expensive and time consuming if not
  – Java or .NET
  – UNIX or Microsoft
• Technical debt
  – Quick and dirty is expensive
  – Should invest more heavily in design
How to save 10 million dollars while staring into the abyss…
• Replace …
  – AIX with Linux
  – WebSphere with JBoss
  – Domino with Alfresco or Drupal (ECM)
  – Cognos with Pentaho
  – Tivoli Monitoring with Hyperic
  – Tivoli NetView with Zenoss
  – Tivoli (Netcool) with OpenNMS
  – Tivoli Configuration Manager with Puppet
  – Tivoli Provisioning Manager with OpenQRM
Source: John Willis, IT Management and Cloud Blog, http://www.johnmwillis.com/other/how-to-save-10-million-dollars-while-staring-into-the-abyss/
Seriously…
• Most of our costs are in our people (4-5x)
  – Development
  – Support
  – Maintenance
• Need for consistent, repeatable process controls
  – Enable cost efficiency
  – Deliver information products faster and less expensively
  – Reduce complexity
  – Reuse components
  – Improve communication
• Leverage standardization benefits
  – Less variance in work products
  – Solve problems once
  – Improved quality (defects caught earlier in the cycle)
  – Adopt standardized reference models and templates
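The point about people costs can be made concrete with a little arithmetic. The sketch below assumes one possible reading of "4-5x", namely that people costs are four to five times technology costs; that reading, the function name, and every input figure are hypothetical and not data from the talk.

```python
def savings_comparison(technology_cost, people_multiplier, license_cut, labor_cut):
    """Compare savings from cutting technology spend versus labor spend.

    people_multiplier: people cost as a multiple of technology cost
    (an assumed reading of the slide's "4-5x").
    license_cut, labor_cut: fractional reductions between 0 and 1.
    """
    people_cost = technology_cost * people_multiplier
    total = technology_cost + people_cost
    return {
        "total": total,
        "people_share": people_cost / total,
        "saved_by_license_cut": technology_cost * license_cut,
        "saved_by_labor_cut": people_cost * labor_cut,
    }
```

With a multiplier of 4, `savings_comparison(100.0, 4, 1.0, 0.25)` shows that eliminating technology spend entirely saves the same amount as a 25% gain in labor efficiency, which is why repeatable process and standardization matter as much as license price.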
Seriously…
• Open Source may not be so “Open”
  – Align with internal skills and core competencies
    • UNIX vs. Windows
    • Java vs. .NET
    • Perl vs. PowerShell or WSH
    • PHP vs. ASP
• Windows DW Stack may not be complete
  – Management
  – Metadata
  – Flexibility
• Do not try to build a system whose complexity exceeds the organization's capabilities to deliver
What is the best solution on a budget?
• Probably something in between
  – Platform (don’t forget virtualization in development)
  – Database and Storage Architecture
  – Middleware
  – Data Profiling and Quality Tools
  – Configuration Management and ALM
  – Test Automation and Continuous Integration
    • CruiseControl
    • NAnt
    • Maven
  – Reporting and Information Delivery
    • Reporting Services
    • Excel (Server based – zero footprint)
Inexpensive, high quality substitutes: Alternatives to mainstream thought
Zenoss Core - monitoring and systems management
Puppet – Automated Systems Administration
Subversion – Version Control
Maven and Eclipse – Build and Manage Projects
Pentaho (BI-Suite)
Jaspersoft
Talend
Infobright
Protégé and the Essential Architecture Project
DB Designer 4
A word about accelerators
Wherescape RED
Methodology: Along the way…
MIKE2.0 (Methodology)
Comprehensive Process Models
Self documenting
Questions and reference links
• Wherescape RED: http://www.wherescape.com/home/home.aspx
• Talend: http://www.talend.com/index.php
• Essential Project: http://www.enterprise-architecture.org/
• MIKE2.0: http://mike2.openmethodology.org/
• Pentaho BI Enterprise Suite: http://www.pentaho.com/
• Infobright: http://www.infobright.com/
• Jaspersoft: http://www.jaspersoft.com/
• John Willis, IT Management and Cloud Blog: http://www.johnmwillis.com/other/how-to-save-10-million-dollars-while-staring-into-the-abyss/
• CruiseControl: http://cruisecontrol.sourceforge.net/
• Maven: http://maven.apache.org/
• NAnt: http://nant.sourceforge.net/
• Subversion: http://subversion.tigris.org/
• Puppet: http://reductivelabs.com/trac/puppet/
Thank You…
Mr. Parnitzke is a hands-on technology executive, trusted partner, advisor, software publisher, and widely recognized database management and enterprise architecture thought leader. Over his career he has served in executive, technical, publisher (commercial software), and practice management roles across a wide range of industries. Now a highly sought-after technology management advisor and hands-on practitioner, he counts many of the Fortune 500 as well as emerging businesses among his customers. He is known for taking on complex challenges and solving them across all levels of the customer’s organization, delivering distinctive value and lasting relationships.
Contact: [email protected]
Blogs:
• Applied Enterprise Architecture (pragmaticarchitect.wordpress.com)
• The Corner Office (cornerofficeguy.wordpress.com)
• Data management professional (jparnitzke.wordpress.com)
• Essential Analytics (essentialanalytics.wordpress.com)
• The program office (theprogramoffice.wordpress.com)