© Copyright 2008 and previous years by Data Blueprint - all rights reserved! - datablueprint.com
Barriers to BI - Technical and Non-technical:
The changing role of Data Architects
Peter Aiken
• DoD Computer Scientist– Reverse Engineering Program Manager/Office of the Chief Information Officer (1992-1997)
• Visiting Scientist
– Software Engineering Institute/Carnegie Mellon University (2001-2002)
• DAMA International Advisor/Board Member (http://dama.org)
– 2001 DAMA International Individual Achievement Award (with Dr. E. F. "Ted" Codd)
– 2005 DAMA Community Award
• Founding Advisor/International Association for Information and Data Quality (http://iaidq.org)
• Founding Advisor/Meta-data Professionals Organization (http://metadataprofessional.org)
• Founding Director Data Blueprint 1999
• Full time in information technology since 1981
• IT engineering research and project background
• University teaching experience since 1979
• Seven books and dozens of articles
• Research Areas – reengineering, data reverse engineering, software requirements engineering, information engineering, human-computer interaction, systems integration/systems engineering, strategic planning, and DSS/BI
• Director
– George Mason University/Hypermedia Laboratory (1989-1993)
• Published Papers– Communications of the ACM, IBM Systems Journal, InformationWEEK, Information & Management, Information
Resources Management Journal, Hypermedia, Information Systems Management, Journal of Computer Information Systems and IEEE Computer & Software
Dogs New Clothes
• What does it mean to improve BI/Analytic capabilities?
• Improvement requires addressing two types of challenges
– Technical challenges
– Non-technical challenges
• Moving your initiative to the next level
Agenda
Source: Black Market Prices Spring 2007 from Trend Micro
Valuing Information
• $980-$4,900
– Trojan program to steal online account information
• $490
– Credit card number with PIN
• $78-294
– Billing data, including account number, address, SSAN, home address, and birthdate
• $147
– Driver's license
• $147
– Birth certificate
• $98
– Social security card
• $6-24
– Credit card number with security code and expiration date
• $6
– Paypal account logon and password
IT Project Failure Rates
Recent IT project failure-rate statistics can be summarized as follows:
– Carr 1994
• 16% of IT Projects completed on time, within budget, with full functionality
– OASIG Study (1995)
• 7 out of 10 IT projects "fail" in some respect
– The Chaos Report (1995)
• 75% blew their schedules by 30% or more
• 31% of projects will be canceled before they ever get completed
• 53% of projects will cost over 189% of their original estimates
• 16% of projects are completed on time and on budget
– KPMG Canada Survey (1997)
• 61% of IT projects were deemed to have failed
– Conference Board Survey (2001)
• Only 1 in 3 large IT project customers were "very satisfied"
– Robbins-Gioia Survey (2001)
• 51% of respondents viewed their large IT implementation project as unsuccessful
– McDonald's Innovate (2002)
• Automate fast food network from fry temperature to # of burgers sold-$180M USD write-off
– Ford Everest (2004)
• Replacing internal purchasing systems-$200 million over budget
– FBI (2005)
• Blew $170M USD on suspected terrorist database-"start over from scratch"
http://www.it-cortex.com/stat_failure_rate.htm (accessed 9/14/02)
New York Times 1/22/05 pA31
[Aiken Data Reverse Engineering 1996 p. 50]
Agility
• In today's turbulent competitive environments, an organization's agility must be defined as a state of:
– "organizational dexterity … combined with ... awareness of that dexterity"
• Goal:
– Facilitate the interaction among organizations, individuals, and systems
• Organizational 'agility' is limited by capabilities
BI/Analytic Capabilities
• Business Intelligence (BI)
– refers to technologies, applications and practices for the collection, integration, analysis, and presentation of business information and sometimes to the information itself.
– The purpose of business intelligence--a term that dates at least to 1958--is to support better business decision making.
• Analytics
– The simplest definition of Analytics is "the science of analysis."
– A simple and practical definition, however, would be how an entity (i.e., business) arrives at an optimal or realistic decision based on existing data.
• Both definitions from wikipedia.org
Analytics
BI/Analytic Capabilities
Business Intelligence
Strategy formulation Strategy implementation
50% Data Warehouse Failure Rate
FBI manages data on Americans to reduce risk
• STAR (System to Assess Risk)
– Rates the threat posed by people already identified as suspected terrorists
• Identity Intelligence Program (2003)
– Examine and analyze consumer complaints to identify major identity theft rings in a given geographic area
• Health Care Fraud System (2003)
– Analyses billing records in government and private insurance claims databases to identify fraud or over-billing by health care providers.
• FDA Consumer Complaints (2005)
– Looks at consumer complaints to the Food and Drug Administration to identify larger trends about fraud by Internet pharmacies.
• Housing Fraud (1999)
– Analyzes public data on real estate transactions to identify fraudulent housing purchases, including so-called property flipping.
• National Insurance Crime Bureau
– Compares National Insurance Crime Bureau information against other data to crack down on fake car accident insurance claims and identify major offenders.
http://www.msnbc.msn.com/id/19702310/
Defining BI/Analytic Capabilities
• A common application of business analytics is portfolio analysis. In this, a bank or lending agency has a collection of accounts of varying value and risk. The accounts may differ by the social status (wealthy, middle-class, poor, etc.) of the holder, the geographical location, its net value, and many other factors. The lender must balance the return on the loan with the risk of default for each loan. The question is then how to evaluate the portfolio as a whole.
• For instance, the least risky loan may be to the very wealthy, but there are a very limited number of wealthy people. On the other hand, there are many poor borrowers who can be lent to, but at greater risk. Some balance must be struck that maximizes return and minimizes risk. The analytics solution may combine time series analysis with many other techniques in order to make decisions on when to lend money to these different borrower segments, or decisions on the interest rate charged to members of a portfolio segment to cover any losses among members in that segment.
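The last point, setting a segment's interest rate so that it covers expected losses within that segment, can be illustrated with a small calculation. This is only a sketch under stated assumptions: the segment names echo the slide, but the default probabilities, loss fractions, funding cost, and the names `Segment` and `break_even_rate` are hypothetical and not taken from the presentation. It solves (1 - p)(1 + r) + p(1 - lgd) = 1 + c for the rate r.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A borrower segment with hypothetical risk parameters."""
    name: str
    default_probability: float  # chance that a loan in the segment defaults
    loss_given_default: float   # fraction of principal lost when it does


def break_even_rate(segment: Segment, funding_cost: float) -> float:
    """Rate at which expected repayment equals principal plus funding cost.

    Derived from (1 - p) * (1 + r) + p * (1 - lgd) = 1 + c.
    """
    p = segment.default_probability
    if not 0.0 <= p < 1.0:
        raise ValueError("default_probability must be in [0, 1)")
    return (funding_cost + p * segment.loss_given_default) / (1.0 - p)


# Illustrative numbers only; none of these come from the slides.
segments = [
    Segment("wealthy", 0.01, 0.5),
    Segment("middle-class", 0.04, 0.5),
    Segment("poor", 0.12, 0.5),
]

for seg in segments:
    rate = break_even_rate(seg, funding_cost=0.04)
    print(f"{seg.name}: break-even rate {rate:.2%}")
```

A real portfolio model would add volume limits per segment and time-varying default rates; the sketch only shows why riskier segments need higher rates to cover their own losses.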
BI/Analytic KPIs
• BI/Analytic initiative goals
– Understanding the current and future enterprise BI/Analytic needs and delivering required information effectively and efficiently
• Key Performance Indicators (KPI)
– Financial and non-financial metrics used to help an organization define and measure progress toward organizational goals.
– KPIs can be delivered through Business Intelligence techniques to assess the present state of the business and to assist in prescribing a course of action. (from wikipedia.org)
• BI/Analytic KPIs
– Financial and non-financial measures used to help an organization define and measure progress toward organizational business intelligence goals
– BI/Analytic KPIs permit an organization to determine the state of its BI/Analytic initiatives
Improved BI/Analytic Capabilities
• Plan
– Recognize an opportunity and plan a change
• Develop good understanding of existing strengths, weaknesses, capabilities and limitations
• Do
– Test the change
• Implement plans to take advantage of strengths and capabilities
• Develop plans to address weaknesses and limitations
• Check
– Review the test, analyze the results and identify what you’ve learned
• Improve the corrective action
• Act
– Take action based on the results of the previous steps
• Refine the correction - repeat
– Adapted from http://www.asq.org/learn-about-quality/project-planning-tools/overview/pdca-cycle.html
• What does it mean to improve BI/Analytic capabilities?
• Improvement requires addressing two types of challenges
– Technical challenges
– Non-technical challenges
• Moving your initiative to the next level
Agenda
BI Challenges
• Technical Challenges
– Poor quality data
– Poor understanding of architectural constructs
– Poor quality data management practices
– New technical expertise is required
• Non-Technical Challenges
– Architecture is underappreciated
– BI perceived as a "technology" project
– Inability to link technical capabilities to business objectives
– Putting BI initiatives in context
• Compliance and Regulatory Mandates
Cost of Poor Data Quality $600 Billion Annually!
Thanks to Bret Champlin
Why?
Obstacles to Real-Time BI: Lessons from Deployment
• Performance and scalability: 24%
• Immature technology: 28%
• Lack of tools for doing real-time processing: 35%
• Education and understanding of real-time BI by IT staff: 36%
• Poor quality data: 43%
• Lack of infrastructure for handling real-time processing: 46%
• Education and understanding of real-time BI by business users: 46%
• Non-integrated data sources: 47%
• Business case, high cost or budget issues: 60%
Source: TDWI, The Real Time Enterprise Report, 2003
Defining Customer Challenges
• Purchased an A4 on June 15 2007
• Had not done business with the dealership prior
• "makes them seem sleazy when I get a letter in the mail before I've even made the first payment on the car advertising lower payments than I got"
A congratulations letter from another bank
Problems
• Bank did not know it made an error
• Tools alone could not have prevented this error
• Lost confidence in the ability of the bank to manage customer funds
From my retirement plan
FBI & Canadian Social Security Gender Codes
1. Male
2. Female
3. Formerly male now female
4. Formerly female now male
5. Uncertain
6. Won't tell
7. Doesn't know
8. Male soon to be female
9. Female soon to be male
Hypothesized extensions contributed by a Chicago DAMA Member
10. Both soon to be female
11. Both soon to be male
12. Psychologically female, biologically male
13. Psychologically male, biologically female
If column 1 in source = "m"
• then set value of target data to "male"
• else set value of target data to "female"
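A minimal sketch of why the if/else rule above is risky: every source value that is not exactly "m" silently becomes "female". The function names and the two-entry lookup table are hypothetical, chosen only for illustration; the source system's real code list is not given in the slides.

```python
# Hypothetical lookup table; the real source codes are an assumption.
KNOWN_CODES = {"m": "male", "f": "female"}


def naive_translate(source_value: str) -> str:
    """The rule as written on the slide: anything that is not "m" is "female"."""
    return "male" if source_value == "m" else "female"


def checked_translate(source_value: str) -> str:
    """Map only codes that are explicitly known; reject everything else."""
    key = source_value.strip().lower()
    if key not in KNOWN_CODES:
        raise ValueError(f"unmapped source code: {source_value!r}")
    return KNOWN_CODES[key]


# The naive rule mislabels an uppercase "M", an unknown "u", and a blank.
for value in ["m", "M", "u", ""]:
    print(repr(value), "->", naive_translate(value))
```

Rejecting unmapped values surfaces bad or unexpected data during migration instead of writing a wrong value into the target.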
Why Data Projects Fail by Joseph R. Hudicka
• Assessed 1200 migration projects!
– Surveyed only experienced migration specialists who have done at least four migration projects
• The median project costs over 10 times the amount planned!
• Biggest Challenges: Bad Data; Missing Data; Duplicate Data
• The survey did not consider projects that were cancelled largely due to data migration difficulties
• "… problems are encountered rather than discovered"
[Chart: Median Project Expense versus Median Project Cost, on an axis from $0 to $500,000]
Joseph R. Hudicka "Why ETL and Data Migration Projects Fail" Oracle Developers Technical Users Group Journal June 2005 pp. 29-31
The challenge ahead
[Chart: average scores on a 0.00 to 5.00 scale for items 1 through 28]
The chart represents the average scores presented on the previous slide - interesting that none have apparently reached level-3
Misunderstanding Data Management
Expanding DM Scope
• 1950-1970: Database design; Database operation
• 1970-1990: Data requirements analysis; Data modeling
• 1990-2000: Organization-wide DM coordination; Organization-wide data integration; Data stewardship; Data use
• 2000-: Data Quality, Data Security, Analytics, Data Compliance, Data Mashups, Business Rules (more)
Data Management
"Understanding the current and future data needs of an enterprise and making that data effective and efficient in supporting business activities"
Aiken, P., Allen, M. D., Parker, B., Mattia, A., "Measuring Data Management's Maturity: A Community's Self-Assessment" IEEE Computer (research feature April 2007)
A Model Specifying Relationships Among Important Terms
[Built on definition by Dan Appleton 1983]
1. Each FACT combines with one or more MEANINGS.
2. Each specific FACT and MEANING combination is referred to as a DATUM.
3. An INFORMATION is one or more DATA that are returned in response to a specific REQUEST.
4. INFORMATION REUSE is enabled when one FACT is combined with more than one MEANING.
5. INTELLIGENCE is INFORMATION associated with its USES.
6. Data Element: a unit of data for which the definition, identification, representation and permissible values are specified by means of a set of attributes
Wisdom & knowledge are often used synonymously
[Diagram: FACT and MEANING combine into DATA; DATA returned for a REQUEST form INFORMATION; INFORMATION associated with a USE forms INTELLIGENCE]
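The fact/meaning/datum/information definitions can be sketched as a small data structure. This is one possible reading, not the author's model: the class `Datum`, the functions `information` and `reused_facts`, and the sample values are hypothetical, and a REQUEST is simplified here to "return all data with a given meaning".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """One specific FACT combined with one MEANING."""
    fact: str
    meaning: str


def information(data: list[Datum], request: str) -> list[Datum]:
    """INFORMATION: the data returned in response to a REQUEST.

    Simplifying assumption: the request names the meaning of interest.
    """
    return [d for d in data if d.meaning == request]


def reused_facts(data: list[Datum]) -> dict[str, set[str]]:
    """Facts combined with more than one meaning (information reuse)."""
    meanings: dict[str, set[str]] = {}
    for d in data:
        meanings.setdefault(d.fact, set()).add(d.meaning)
    return {fact: m for fact, m in meanings.items() if len(m) > 1}


# Illustrative data: the same fact carries two meanings.
data = [
    Datum("2007-06-15", "purchase date"),
    Datum("2007-06-15", "loan start date"),
    Datum("A4", "vehicle model"),
]

print(information(data, "purchase date"))
print(reused_facts(data))
```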
Avoiding Unnecessary Work Using Business Rule Metadata
[Diagram entities: PERSON, JOB CLASS, EMPLOYEE, POSITION]
BR1) Zero, one, or more EMPLOYEES can be associated with one PERSON
BR2) Zero, one, or more EMPLOYEES can be associated with one JOB CLASS
BR3) Zero, one, or more EMPLOYEES can be associated with one POSITION
BR4) One or more POSITIONS can be associated with one JOB CLASS
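Business rules such as BR1 through BR4 can be held as metadata and checked mechanically. The sketch below is an assumption-laden illustration: the `Rule` class, the `violations` function, and the sample identifiers are hypothetical, and each rule is read as "every child row references one existing parent, and every parent has at least `min_children` children" (zero for BR1 to BR3, one for BR4).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A cardinality rule between a child entity and a parent entity."""
    name: str
    child: str
    parent: str
    min_children: int  # minimum number of children each parent must have


RULES = [
    Rule("BR1", "EMPLOYEE", "PERSON", 0),
    Rule("BR2", "EMPLOYEE", "JOB CLASS", 0),
    Rule("BR3", "EMPLOYEE", "POSITION", 0),
    Rule("BR4", "POSITION", "JOB CLASS", 1),
]


def violations(rule: Rule, parent_ids: set[str],
               child_to_parent: dict[str, str]) -> list[str]:
    """Return human-readable violations of one rule for the given rows."""
    problems = []
    counts = Counter(child_to_parent.values())
    for child_id, parent_id in child_to_parent.items():
        if parent_id not in parent_ids:
            problems.append(
                f"{rule.name}: {rule.child} {child_id} references "
                f"unknown {rule.parent} {parent_id}")
    for parent_id in sorted(parent_ids):
        if counts[parent_id] < rule.min_children:
            problems.append(
                f"{rule.name}: {rule.parent} {parent_id} has fewer than "
                f"{rule.min_children} {rule.child} rows")
    return problems


# Illustrative check of BR4: job class JC2 has no position.
print(violations(RULES[3], {"JC1", "JC2"}, {"P1": "JC1"}))
```

Keeping the rules as data means a change such as allowing job sharing becomes an edit to the rule table rather than to application code.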
Job Sharing
'Mond-Licht' or
'Mondschein'
Organizational DM Functions and their Inter-relationships
[Diagram: six functions (Data Program Coordination, Organizational Data Integration, Data Stewardship, Data Development, Data Support Operations, Data Asset Use) linked by flows labeled Organizational Strategies, Goals, Integrated Models, Standard Data, Business Data, Business Value, Application Models & Designs, Feedback, Implementation, Direction, and Guidance. Callout: Major focus of study and research]
New Technical Expertise Required
• Focus has been on new systems development
• Guidance and technical expertise required to develop new data applications and components.
– New domain focus is on maintenance of existing environments.
– Understanding what the existing systems were originally designed to accomplish (the requirements) and how (the design) those systems implemented the business requirements in a physical system.
Future Challenge: BI and Non-Tabular Data
Integration of Unstructured Data
• Properties selection under the file menu of MS-Office 2000 +
• Queries can be run for slide titles or other document structures
Typical System Evolution
• Payroll Application (3rd GL); Payroll Data (database)
• R&D Applications (researcher supported, no documentation); R&D Data (raw)
• Mfg. Applications (contractor supported); Mfg. Data (home grown database)
• Finance Application (3rd GL, batch system, no source); Finance Data (indexed)
• Marketing Application (4th GL, query facilities, no reporting, very large); Marketing Data (external database)
• Personnel App. (20 years old, un-normalized data); Personnel Data (database)
How many interfaces are required to solve this integration problem?
[Diagram: Application 1 through Application 6, each connected point-to-point to every other application]
15 interfaces: (N*(N-1))/2
RBC: 200 applications - 4900 batch interfaces
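The interface count follows directly from the formula (N*(N-1))/2. The short sketch below evaluates it and, as an assumption for comparison, counts one connection per application for a hub-based design; the function names are hypothetical.

```python
def point_to_point_interfaces(n: int) -> int:
    """Interfaces needed when every application talks to every other one."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


def hub_interfaces(n: int) -> int:
    """Assumed hub design: each application connects once, to the hub."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n


for n in (6, 200):
    print(f"{n} applications: "
          f"{point_to_point_interfaces(n)} point-to-point, "
          f"{hub_interfaces(n)} via a hub")
```

For six applications this reproduces the 15 interfaces on the slide. For 200 applications the formula gives an upper bound of 19,900 pairwise interfaces.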
XML-based Integration Solution
[Diagram: Application 1 through Application 6, each connected to a central Integration Processor]
3-Way Scalability
Expand the:
1. Number of data items from each system
– How many individual data items are tagged?
2. Number of interconnections between the systems and the hub
– How many systems are connected to the hub?
3. Amount of interconnectability among hub-connected systems
– How many inter-system data item transformations exist in the rule collection?
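The three scaling dimensions can be made concrete with a toy hub. This is a sketch only, not the presenter's implementation: the class `IntegrationHub`, its methods, and the sample systems and items are all hypothetical. It tracks tagged data items per system, connected systems, and transformation rules, which are the three quantities the questions above ask about.

```python
from typing import Callable, Optional


class IntegrationHub:
    """Toy hub holding tagged items, connected systems, and transformation rules."""

    def __init__(self) -> None:
        self.tagged_items: dict[str, set[str]] = {}
        # (source system, source item, target system) -> (target item, transform)
        self.rules: dict[tuple[str, str, str], tuple[str, Callable]] = {}

    def connect(self, system: str, items: list[str]) -> None:
        """Connect a system and register the data items it has tagged."""
        self.tagged_items[system] = set(items)

    def add_rule(self, src_system: str, src_item: str,
                 dst_system: str, dst_item: str,
                 transform: Optional[Callable] = None) -> None:
        """Add one inter-system data item transformation to the rule collection."""
        if src_item not in self.tagged_items.get(src_system, set()):
            raise ValueError(f"{src_system}.{src_item} is not tagged")
        if dst_item not in self.tagged_items.get(dst_system, set()):
            raise ValueError(f"{dst_system}.{dst_item} is not tagged")
        identity = lambda value: value
        self.rules[(src_system, src_item, dst_system)] = (
            dst_item, transform or identity)

    def route(self, src_system: str, src_item: str, value, dst_system: str):
        """Translate one value from a source item to its target item."""
        dst_item, transform = self.rules[(src_system, src_item, dst_system)]
        return dst_item, transform(value)

    def scale(self) -> dict[str, int]:
        """Report the hub's size along the three dimensions."""
        return {
            "data_items": sum(len(i) for i in self.tagged_items.values()),
            "systems": len(self.tagged_items),
            "rules": len(self.rules),
        }


hub = IntegrationHub()
hub.connect("payroll", ["emp_id", "salary"])
hub.connect("personnel", ["employee_number"])
hub.add_rule("payroll", "emp_id", "personnel", "employee_number",
             transform=lambda v: v.zfill(6))

print(hub.route("payroll", "emp_id", "42", "personnel"))
print(hub.scale())
```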
XML-based Integration Solution
[Diagram: Application 1 through Application 6, each connected to a central XML Processor]
Niccolò Machiavelli (1469-1527)
He who doesn’t lay his foundations beforehand may by great abilities do so afterward, although with great trouble to the architect and danger to the building.
Machiavelli, Niccolo. The Prince. 19 Mar. 2004 http://pd.sparknotes.com/philosophy/prince
Sample Conversation (Developing Constraints)
• I'd like to build a building.
• What kind of building - do you want to sleep in it? Eat in it? Work in it?
• I'd like to sleep in it.
• Oh, you want to build a house?
• Yes, I'd like a house.
• How large a house do you have in mind?
• Well, my lot size is 100 feet by 300 feet.
• Then you want a house about 50 feet by 100 feet.
• Yes, that's about right.
• How many bedrooms do you need?
• Well, I have two children, so I'd like three bedrooms ...
Building from the Top
Look Familiar?
BI perceived as a "technology" project
Link business objectives to technical capabilities
Legal Incentives
• Data Quality Act
• Clinger-Cohen
• Sarbanes-Oxley
• Basel II Capital Accord
• California Senate Bill 1386
• USA Patriot Act
• What does it mean to improve BI/Analytic capabilities?
• Improvement requires addressing two types of challenges
– Technical challenges
– Non-technical challenges
• Moving your initiative to the next level
Agenda
Capability Maturity Model Levels
• Initial (1): Our DM practices are ad hoc and dependent upon "heroes"
• Repeatable (2): We have DM experience and have the ability to implement disciplined processes
• Defined (3): We have experience that we have standardized so that all in the organization can follow it
• Managed (4): We manage our DM processes so that the whole organization can follow our standard DM guidance
• Optimizing (5): We have a process for improving our DM capabilities
1996 Council of American Building Officials (CABO) and the 2000 International Code Council recommendations call for unit runs to be not less than 10 inches and unit rises not more than 7¾ inches.
One concept for process improvement, others include:
• Norton Stage Theory
• TQM
• TQdM
• TDQM
• ISO 9000
and focus on understanding current processes and determining where improvements can be made.
Key Finding: Process Frameworks are not Created Equal
With the exception of CMM and ITIL, use of process-efficiency frameworks does not predict higher on-budget project delivery…
[Chart: Percentage of Projects on Budget, By Process Framework Adoption]
…while the same pattern generally holds true for on-time performance
[Chart: Percentage of Projects on Time, By Process Framework Adoption]
Source: Applications Executive Council, Applications Budget, Spend, and Performance Benchmarks: 2005 Member Survey Results, Washington D.C.: Corporate Executive Board 2006, p. 23.
Data Management Practices Assessment
[Chart: scores on a 0 to 5 scale for Development guidance, Data Administration, Support systems, Asset recovery capability, and Development training; series: Nokia, Industry Competition, All Respondents. Callouts: Challenge (three), Client, Result 1 through Result 5]
High Marks for IFC’s Program
Data Mgmt Audit 2006
[Chart: scores on a 0 to 5 scale for Leadership & Guidance, Asset Creation, Metadata Management, Quality Assurance, Change Management, and Data Quality; series: Overall Benchmarks, Industry Benchmarks, TRE, IFC, ISG]
"These IFC scores represent the highest aggregate scores in the area of data stewardship recorded in our database of hundreds of assessments that has been recognized as a representative scientific sample."
Assessment Benefits
• Quantitative Benefits
– Objective determination of baseline BI/Analytic capabilities
– Gap analysis indicates specific actions required to achieve the "next" level
– Available comparisons with similar organizations
– Provides facts useful when prioritizing subsequent investments
• Qualitative Benefits
– Highlights strengths, weaknesses, capabilities, and limitations of existing BI/Analytic practices
– Raises awareness of business value of BI/Analytics
Data Management Incentives
• Faster
– Capable of rapid response
• Better
– Better resource use
• Cheaper
– 20 to 40% of all IT spending is non-value added (i.e., integration)
• No decrease in quality!
• Legal
Contact Information:
Peter Aiken, Ph.D.
Department of Information Systems
School of Business
Virginia Commonwealth University
1015 Floyd Avenue - Room 4170
Richmond, Virginia 23284-4000
Data Blueprint
Maggie L. Walker Business & Technology Center
501 East Franklin Street
Richmond, VA 23219
804.521.4056
http://datablueprint.com
office: +1.804.883.759
cell: +1.804.382.5957
e-mail: [email protected]
http://peteraiken.net
Copyright 12/18/07 by Data Blueprint - all rights reserved!