MELJUN CORTES Fundamentals of Enterprise Data Management Week 01
© 2013 IBM Corporation. IBM Confidential
BAFEDM2: Fundamentals of Enterprise Data Management
Week 01
Agenda
Course Overview
• Introduction to the Course
• Setting of Course Objectives
• Administrative Matters
• Project Introduction
Module 1: Introduction to Data Warehousing
• What is a Data Warehouse
• Why a Data Warehouse
Course Overview
BAFEDM2: Fundamentals of Enterprise Data Management
Course Description
The course is designed to introduce students to the fundamentals of database management systems and of enterprise data management using a data warehouse, which can then support data mining, reporting, and data analysis. It describes the activities involved in data mining tasks, such as anomaly detection, association rule learning, clustering, classification, regression, and summarization. The course also introduces students to formalized means of organizing and storing structured and unstructured data in an organization.
Course Objectives
The course will enable the student to:
• Understand database management systems
• Describe the process of data discovery and data patterns in large data sets
• Understand various methods at the intersection of artificial intelligence, machine learning, statistics, and database systems
• Understand various techniques for data extraction and data pre-processing before the data is used for data modeling
• Understand the concept of master data management (MDM)
• Describe data inference considerations, interestingness metrics, and complexity considerations
• Understand various techniques used for post-processing of discovered structures and visualization
Course Objectives (continued)
• Describe the importance of data warehouses for reporting and data analysis, and understand how they differ from operational data sources
• Describe formalized means of organizing and storing documents and other content in an organization related to the organization’s processes
• Describe the need for, and the policies around, data security and privacy, and the techniques used to protect information from unauthorized access, use, disclosure, disruption, modification, perusal, inspection, recording, or destruction
• Describe online fraud and its consequences, and understand predictive analytics for the detection of fraudulent activities
Learning Outcomes
Upon completion of this course, the student should be able to:
• Understand data management concepts and the criticality of data availability for making reliable business decisions
• Demonstrate understanding of business intelligence, including the importance of gathering, storing, analyzing, and accessing data
• Describe where to look for data in an organization and create required reports
• Understand the functions and data access constraints of various departments within an organization and provide compliance reports
• Perform high-quality tasks required by the organization in particular, and the industry in general
Course Modules
Module 1: Introduction to Data Warehousing
In this module, the business reasons behind undertaking a data warehousing project are outlined and the framework of data warehouse architecture is explained. The key factors to be considered while developing a data warehouse are analyzed, and the goals of data warehousing are arrived at.
Module 2: Data Warehouse Design Considerations
In this module, a realistic approach to modeling a business process is given. It aims at answering questions on how to model complex hierarchies, how to determine the granularity of data required by a business, and other design considerations during dimensional modeling.
Course Modules (continued)
Module 3: Extract, Transform and Loading Process
This module expounds on the different ETL architectures and strategies available, and how to choose the best strategy for any business environment.
Module 4: Measuring the Effectiveness of a Data Warehouse
This module stresses the importance of being able to measure the progress and quality of a data warehouse in effectively meeting business goals.
Module 5: Data Security
This module describes the need for, and the policies around, data security and privacy, and the techniques used to protect information from unauthorized access and use.
Course Outline and Timeframe
Week 01 (3.0 hours)
Course Overview
• Introduction to the Course
• Setting of Course Objectives
• Administrative Matters
• Project Introduction
Module 1: Introduction to Data Warehousing
• Data Evolution
• What is a Data Warehouse
• Data Warehouse vs Business Analytics
• Why a Data Warehouse
• The Goals of a Data Warehouse
Week 02 (3.0 hours)
• Framework of the Data Warehouse
• Data Warehouse Options
Week 03 (3.0 hours)
Module 2: Data Warehouse Design Considerations
• Data Models
• The Dimensional Model
• Facts and Dimensions
• Four-Step Dimensional Design Process
• Case Study: Retail
Course Outline and Timeframe (continued)
Week 04 (3.0 hours)
• Case Study: Education
• Case Study: Communications
• Dimensional Modeling Best Practices
• Project Identification and Sign-Off (Data Model)
Week 05 (3.0 hours)
• Project Consultation
Week 06 (3.0 hours)
• Long Exam 1 (Modules 1 to 2)
• Project Consultation
Week 07 (3.0 hours)
Module 3: Extract, Transform and Loading Process
• Extract Processing
• Transform and Prepare for Load
• Load Process
Week 08 (3.0 hours)
Module 4: Measuring the Effectiveness of a Data Warehouse
• First Step: Measure
• Next Step: Manage and Improve
• Project Development (ETL Process, Measures)
Week 09 (3.0 hours)
• Project Consultation
Course Outline and Timeframe (continued)
Week 10 (3.0 hours)
• Definition of Acceptable Performance
• Capacity Planning for the Data Warehouse
Week 11 (3.0 hours)
Module 5: Data Security
• Industry Standards on Data Security
• Securing the Data Warehouse
• Project Development (Data Security)
Week 12 (3.0 hours)
• Project Consultation
Week 13 (3.0 hours)
• Long Exam 2 (Modules 3 to 5)
• Project Consultation
Week 14-15 (6.0 hours)
• Project Presentation Dry-Run
Week 16-17 (6.0 hours)
• Finals: Project Presentation
Week 18 (3.0 hours)
• Final Project Submission
• Course Debrief
Readings
• “Data Warehousing: Design, Development and Best Practices” by Soumendra Mohanty
• “Building the Data Warehouse” by William H. Inmon
• “The Data Warehouse Toolkit” by Ralph Kimball
• “The Data Warehouse Lifecycle Toolkit” by Ralph Kimball
Course Requirements
Class Participation
• Lectures and class discussions
• Reading and written assignments
• Long exams
Project
• The class will be divided into groups of 3 or 4. A designated “Project Sponsor” will be assigned to each group.
• Each group will be asked to put together a data warehouse design for the analytics project that they will put up in BAFBAN1.
• The deliverables will be discussed and built upon as the course progresses. Checkpoints may occur during the course to check the progress of the deliverables.
• The goal of the group is to get approval from their “Project Sponsor” for their project, which would mean that they secure funding for it.
• The deliverables will be submitted to the instructor(s) at the appointed time indicated in the course timeframe. The instructor(s) will then distribute the deliverables to the “Project Sponsor”.
• Each group will be given 1.0 hour to conduct the presentation to their “Project Sponsor”.
• The final grade of the group will be determined by a point system.
Grading System
Breakdown of Marks
Exams
• Quizzes: 10%
• Long Exams: 20%
• Case Analysis: 20%
Project
• Quality of Deliverables*: 25%
• Presentation: 10%
• Final Output: 10%
• Contribution Rating (peer evaluation): 5%
*Consider project ranking or extra merit.
Grading System (continued)
Grading Scale
Final Mark | Numerical Equivalent | Quality Point Equivalent
A | 92 to 100 | 3.76 to 4.00
B+ | 87 to 91 | 3.31 to 3.75
B | 83 to 86 | 2.81 to 3.30
C+ | 79 to 82 | 2.31 to 2.80
C | 76 to 78 | 1.81 to 2.30
D | 70 to 75 | 1.00 to 1.80
F | Below 70 | Below 1.00
W | Overcut | Overcut
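As a rough illustration of how the breakdown of marks and the grading scale combine, the sketch below computes a weighted final mark and looks up its letter grade. The weights and numerical ranges come from the slides; the component names, the function names, and the assumption that every component is scored on a 0 to 100 scale are hypothetical. The slides list whole-number ranges only, so treating a fractional mark such as 91.5 as falling in the lower band is also an assumption.

```python
# Weights from the Breakdown of Marks slide (percent of the final mark).
WEIGHTS = {
    "quizzes": 10,
    "long_exams": 20,
    "case_analysis": 20,
    "quality_of_deliverables": 25,
    "presentation": 10,
    "final_output": 10,
    "contribution_rating": 5,
}

# Lower bound of each letter's numerical range, highest first.
SCALE = [(92, "A"), (87, "B+"), (83, "B"), (79, "C+"), (76, "C"), (70, "D")]


def final_mark(scores):
    """Weighted average of component scores, each assumed to be on 0-100."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(mark):
    """Map a numerical mark to a letter; anything below 70 is an F."""
    for lower_bound, letter in SCALE:
        if mark >= lower_bound:
            return letter
    return "F"


# Hypothetical student: 95 in quizzes, 85 in everything else.
example_scores = {name: 85 for name in WEIGHTS}
example_scores["quizzes"] = 95
mark = final_mark(example_scores)
print(mark, letter_grade(mark))
```

The “W” (overcut) mark depends on attendance rather than on scores, so it is deliberately left out of this sketch.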
Classroom Policies
Attendance will be checked at the start of the sessions. Students are allowed to miss a maximum of nine (9) class hours for this course. Hours missed due to tardiness will be counted towards this maximum number.
Deadlines will be strictly enforced. Deliverables received after the designated deadlines will not be checked.
Graded work will be returned to the students within a reasonable period of time. One week after the release of graded work, students are allowed to appeal for changes of grade. Beyond this period, appeals will no longer be entertained.
Make-up activities may be given only to students who have missed or are unable to complete or undertake a major class requirement due to:
• Participation in an official school activity
• Illness which involves hospitalization or contagious diseases
In either case, students are required to present proper documentation prior to taking the make-up exam.
Classroom Policies (continued)
Students are not allowed to eat or drink inside the classrooms. If students should choose to eat dinner or any snack during the break, they must take their food outside the classroom.
Students are required to turn off their mobile devices before the start of class. Any device that goes off during class may be confiscated. A first offense is punishable with a warning. A second offense may be subject to disciplinary proceedings.
Students should come to class in proper attire. Students not in proper attire will not be allowed inside the classroom.
Other rules and general academic policies will apply.
Module 1: Introduction to Data Warehousing
BAFEDM2: Fundamentals of Enterprise Data Management
Data Evolution
Evolving data into information, then knowledge, then action creates the outcome that has the most impact and value to an organization.
“Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime.” This is what Business Intelligence is really about!
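[Figure: progression from Data to Information to Knowledge to Action to Outcome (Impact and Value), with stages marked as “Done by Software” or “Done by People”. Labels in the figure: Descriptive, Quantitative, Qualitative; Facts, Metrics; Recall, Instincts; Experience, Belief; Insight, Resolve, Decision; Innovation; Achievement, Discovery.]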
Data Evolution (continued)
Data: is composed of individual discrete facts that capture the descriptive, quantitative, and qualitative values of business interest. Data warehousing involves three types of data:
• Run-the-Business Data: produced by corporate applications, such as the one used to fill customer orders for its products or the one used to manage financial transactions.
• Integrate-the-Business Data: built to improve the quality of and synchronize two or more applications, such as a master list of customers.
• Monitor-the-Business Data: presented to end users for reporting and decision support, such as financial dashboards.
Information: is an organized collection of data presented in a specific and meaningful way.
Knowledge: encompasses the familiarity, awareness, understanding, and perceptions of a person about a given subject.
Action: is the process of doing something; effective action is the process of doing the right thing.
What is a Data Warehouse
A data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data in support of management’s decisions.
In a data warehouse, information used for analysis is organized around subjects (e.g., employees, accounts, sales, products) rather than activities.
Integrated data refers to de-duplicating information and merging it from many sources into one consistent definition (e.g., when shortlisting the top banks in the country, you must know that “BPI” and “Bank of the Philippine Islands” are one and the same).
Since the information in a data warehouse is heavily queried against time, it is extremely important to preserve the data recorded for each and every business event of an organization rather than overwrite it.
Time-variant data carries a time reference, so it can answer questions that span a period (e.g., what were the total sales of Product A for the past three years?).
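The “BPI” example can be sketched as a small integration step: names from different sources are mapped to one canonical name before totals are computed. The alias table, the source records, and the function names below are invented for illustration; real integration would more likely rely on maintained reference or master data than on a hard-coded dictionary.

```python
from collections import defaultdict

# Hypothetical alias table mapping source spellings to one canonical name.
CANONICAL_NAMES = {
    "bpi": "Bank of the Philippine Islands",
    "bank of the philippine islands": "Bank of the Philippine Islands",
}


def canonical_name(raw_name):
    """Return the agreed name for a bank, or the trimmed input if unknown."""
    key = " ".join(raw_name.lower().split())
    return CANONICAL_NAMES.get(key, raw_name.strip())


def integrate(records):
    """Merge (name, amount) records from many sources under canonical names."""
    totals = defaultdict(float)
    for name, amount in records:
        totals[canonical_name(name)] += amount
    return dict(totals)


# Invented records from two source systems that spell the same bank differently.
source_a = [("BPI", 120.0), ("Example Bank", 80.0)]
source_b = [("Bank of the Philippine Islands", 30.0)]
print(integrate(source_a + source_b))
```

Without the canonical mapping, the same bank would appear twice in any ranking, once under each spelling.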
What is a Data Warehouse (continued)
A data warehouse is a powerful database model that significantly enhances the user’s ability to quickly analyze large, multidimensional data sets. It cleanses and organizes data to allow users to make business decisions based on facts.
A data warehouse is a collection of integrated, subject-oriented databases designed to support the decision support function, where each unit of data is relevant to some moment of time.
A data warehouse is a repository of data summarized or aggregated in simplified form from operational systems. End-user-oriented data access and reporting tools let users get at the data for decision support.
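To make the ideas of time-referenced units of data and summarization concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, the columns, and the sales figures are made up; the point is only that every row carries a date, rows are appended rather than overwritten, and a question such as “total sales of Product A per year” becomes a simple aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (sale_date TEXT, product TEXT, amount REAL)")

# Invented rows; each business event is kept with its own date.
rows = [
    ("2011-03-15", "Product A", 100.0),
    ("2011-09-01", "Product A", 50.0),
    ("2012-06-30", "Product A", 200.0),
    ("2012-07-04", "Product B", 75.0),
    ("2013-01-20", "Product A", 125.0),
]
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)


def yearly_sales(connection, product, first_year, last_year):
    """Total sales per year for one product within an inclusive year range."""
    query = """
        SELECT substr(sale_date, 1, 4) AS sale_year, SUM(amount)
        FROM sales_fact
        WHERE product = ?
          AND substr(sale_date, 1, 4) BETWEEN ? AND ?
        GROUP BY substr(sale_date, 1, 4)
        ORDER BY sale_year
    """
    params = (product, str(first_year), str(last_year))
    return connection.execute(query, params).fetchall()


print(yearly_sales(conn, "Product A", 2011, 2013))
```

An operational system might keep only the latest balance or status; keeping dated rows is what allows the three-year question to be answered at all.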
Data Warehouse vs Business Analytics
Data Warehousing
• is a way of storing data and creating information through leveraging data marts. Data marts are segments or categories of information and/or data that are grouped together to provide insights into that segment or category. A data warehouse does not require business intelligence to work. Reporting tools can generate reports from the data warehouse.
Business Analytics
• is the leveraging of a data warehouse to help make business decisions and recommendations. Information and data rules engines are leveraged here to help make these decisions along with statistical analysis tools and data mining tools.
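One way to picture the relationship described above: a data mart can be treated as a segment of warehouse rows grouped by category, from which a plain report is produced without any further analytics tooling. The row layout, the segment names, and the helper functions are hypothetical and only meant as a sketch.

```python
from collections import defaultdict

# Invented warehouse rows: (segment, item, amount).
WAREHOUSE = [
    ("sales", "Product A", 300.0),
    ("sales", "Product B", 75.0),
    ("finance", "Office rent", 500.0),
    ("sales", "Product A", 125.0),
]


def build_data_marts(rows):
    """Group warehouse rows by segment; each group stands in for a data mart."""
    marts = defaultdict(list)
    for segment, item, amount in rows:
        marts[segment].append((item, amount))
    return dict(marts)


def report(mart):
    """A plain report: total amount per item within one data mart."""
    totals = defaultdict(float)
    for item, amount in mart:
        totals[item] += amount
    return dict(totals)


marts = build_data_marts(WAREHOUSE)
print(report(marts["sales"]))
```

Business analytics would start where this sketch stops, for example by applying rules, statistical analysis, or data mining to the same marts to recommend a decision.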
Why a Data Warehouse: Need and Advantages
Operational Efficiency
• Make the right information available at the right time
• Manage data volumes and business complexities
Compliance and Transparency
• Leverage the value of data across the enterprise
• Adhere to federal rules and regulations
Information Integration
• Manage information complexity with data integration
• Manage information technology costs
Competitive Differentiation
• Outperform the competition rather than just stay in business
• Use analytics to gain insights from the information within data
Data Governance
• Manage data as an asset
• Make data secure and reliable
Why a Data Warehouse: Implementation Challenges
Cost in Building the Data Warehouse
• Operational cost for a data warehouse
• Maintenance cost for a data warehouse
Big Bang Approach
• Justifying return on investment (ROI) on data warehouse development
• Time required to build and get results in a data warehouse
Architectural Challenges
• Manage information consistency with architectural changes
• Manage information complexity with data consolidation
Data Governance Issues
• Data ownership issues
• Data integration, consistency, and quality issues
Business Complexities
• Adaptability of a data warehouse to ever-changing business scenarios
• Business complexities due to mergers and acquisitions
The Goals of a Data Warehouse
"We have mountains of data in this company, but we can't access it."
"We need to slice and dice the data every which way."
"You've got to make it easy for business people to get at the data directly."
"Just show me what is important."
"It drives me crazy to have two people present the same business metrics at a meeting, but with different numbers."
"We want people to use information to support more fact-based decision making."
The Goals of a Data Warehouse (continued)
The data warehouse must make an organization's information easily accessible.
The data warehouse must present the organization's information consistently.
The data warehouse must be adaptive and resilient to change.
The data warehouse must be a secure bastion that protects our information assets.
The data warehouse must serve as the foundation for improved decision making.
The business community must accept the data warehouse if it is to be deemed successful.
For the Next Session
BAFEDM2: Fundamentals of Enterprise Data Management
For the Next Sessions
Agenda
• Framework of the Data Warehouse
• Data Warehouse Options