Business Intelligence. Unit 1 Important Concepts


  • Slide 1
  • Business Intelligence
  • Slide 2
  • Unit 1 Important Concepts
  • Slide 3
  • INTRODUCTION: Organizations need business intelligence. Business intelligence (BI): knowledge about your customers, competitors, business partners, competitive environment, and internal operations that lets you make effective, important, and strategic business decisions.
  • Slide 4
  • INTRODUCTION: IT tools help process information to create business intelligence according to two modes: OLTP and OLAP.
  • Slide 5
  • INTRODUCTION: Online transaction processing (OLTP): the gathering of input information, processing that information, and updating existing information to reflect the gathered and processed information. Databases support OLTP. Operational database: a database that supports OLTP.
  • Slide 6
  • INTRODUCTION: Online analytical processing (OLAP): the manipulation of information to support decision making. Databases can support some OLAP. Data warehouses support only OLAP, not OLTP. Data warehouses are special forms of databases that support decision making.
  • Slide 7
  • INTRODUCTION
  • Slide 8
  • What Is a Data Warehouse? Data warehouse: a logical collection of information, gathered from operational databases, used to create business intelligence that supports business analysis activities and decision-making tasks. "A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context." -- Barry Devlin, IBM Consultant
  • Slide 9
  • What Is a Data Warehouse?
  • Slide 10
  • What Is a Data Warehouse? Multidimensional: rows and columns, plus layers. Often called hypercubes.
  • Slide 11
  • Data Warehouses: a record of an enterprise's past transactional and operational information, designed to favor efficient data analysis and reporting. Data warehousing is not meant for current "live" data.
  • Slide 12
  • Data Warehouses: large amounts of data, sometimes subdivided into smaller logical units (dependent data marts).
  • Slide 13
  • What Are Data-Mining Tools? Data-mining tools: software tools that you use to query information in a data warehouse. Types: query-and-reporting tools, intelligent agents, multidimensional analysis tools, and statistical tools.
  • Slide 14
  • Data Warehouses: Components of a data warehouse: Sources -> Data Source Interaction; Data Transformation; Data Warehouse (Data Storage); Reporting (Data Presentation); Metadata.
  • Slide 15
  • Slide 16
  • Data Warehouses: ADVANTAGES. Complete control over the four main areas of data management systems: (1) clean data; (2) query processing: multiple options; (3) indexes: multiple types; (4) security: data and access.
  • Slide 17
  • Data Warehouses: DISADVANTAGES. (1) Adding new data sources takes time and carries a high cost. (2) Data owners lose control over their data, raising ownership, security, and privacy issues. (3) Initial implementation takes a long time and carries a high cost. (4) It is difficult to accommodate changes in data types and ranges, data source schemas, indexes, and queries.
  • Slide 18
  • OLTP vs. OLAP. OLTP (Online Transaction Processing): describes processing at operational sites. OLAP (Online Analytical Processing): describes processing at the warehouse.
  • Slide 19
  • OLTP Database vs. Data Warehouse: both are relational databases (a relational database groups data using common attributes found in the data set), but their objectives are different.
  • Slide 20
  • OLTP database: designed for real-time business operations. Data warehouse: designed for analysis of business measures by categories and attributes.
  • Slide 21
  • OLTP database: mostly updates; many small transactions; MB to GB of data. Data warehouse: mostly reads; queries are long and complex; GB to TB of data.
  • Slide 22
  • OLTP database: current snapshot; raw data; thousands of users (e.g., clerical users). Data warehouse: history; summarized, reconciled data; hundreds of users (e.g., decision-makers, analysts).
  • Slide 23
  • SUMMARY: four questions for you
  • Slide 24
  • Question 1: Which system is which? (1) Designed for real-time business operations. (2) Designed for analysis of business measures by categories and attributes.
  • Slide 25
  • Answer 1: (1) OLTP database: designed for real-time business operations. (2) Data warehouse: designed for analysis of business measures by categories and attributes.
  • Slide 26
  • Question 2: Which system is which? (1) Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table. (2) Optimized for a common set of transactions, usually adding or retrieving a single row at a time per table.
  • Slide 27
  • Answer 2: (1) Data warehouse: optimized for bulk loads and large, complex, unpredictable queries that access many rows per table. (2) OLTP database: optimized for a common set of transactions, usually adding or retrieving a single row at a time per table.
  • Slide 28
  • Question 3: Which system is which? (1) Loaded with consistent, valid data; requires no real-time validation. (2) Optimized for validation of incoming data during transactions; uses validation data tables.
  • Slide 29
  • Answer 3: (1) Data warehouse: loaded with consistent, valid data; requires no real-time validation. (2) OLTP database: optimized for validation of incoming data during transactions; uses validation data tables.
  • Slide 30
  • Question 4: Which system is which? (1) Supports thousands of concurrent users. (2) Supports few concurrent users relative to OLTP.
  • Slide 31
  • Answer 4: (1) OLTP database: supports thousands of concurrent users. (2) Data warehouse: supports few concurrent users relative to OLTP.
  • Slide 32
  • Data, Information & Knowledge: Data is just symbols. Information is data that has been processed to be useful; it provides answers to "who", "what", "where", and "when" questions. Knowledge is the application of data and information; it answers "how" questions.
  • Slide 33
  • Data: Data is raw. It simply exists and has no significance beyond its existence (in and of itself). It can exist in any form, usable or not. It has no meaning in itself. In computer parlance, a spreadsheet generally starts out by holding data.
  • Slide 34
  • Information: Information is data that has been given meaning by way of relational connection. This "meaning" can be useful, but does not have to be. In computer parlance, a relational database makes information from the data stored within it.
  • Slide 35
  • Knowledge: Knowledge is the appropriate collection of information, such that its intent is to be useful. Summaries of information in a database are one example; modeling and simulation tools likewise exercise some type of stored knowledge.
  • Slide 36
  • Examples - Supermarket. OLTP: the event is that 3 cans of soup and 1 box of crackers were bought; update the database to reflect that event. OLAP: last winter, in all stores in the northeast, how many customers bought soup and crackers together? Data mining: are there any interesting combinations of foods that customers frequently bought together? (Copyright 2005 Pearson Addison-Wesley. All rights reserved.)
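The supermarket example can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the `purchases` table, its columns, and the helper names are hypothetical, and a basket is used as a stand-in for a customer. The OLTP side is one small write per checkout event; the OLAP side is a read-only aggregate over the accumulated history.

```python
import sqlite3


def build_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical purchases table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases ("
        " basket_id INTEGER, region TEXT, season TEXT,"
        " item TEXT, quantity INTEGER)"
    )
    return conn


def record_purchase(conn, basket_id, region, season, items):
    """OLTP-style work: one small write transaction per checkout event."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO purchases VALUES (?, ?, ?, ?, ?)",
            [(basket_id, region, season, item, qty) for item, qty in items],
        )


def baskets_with_both(conn, region, season, item_a, item_b):
    """OLAP-style work: count baskets that contain both (distinct) items."""
    row = conn.execute(
        "SELECT COUNT(*) FROM ("
        "  SELECT basket_id FROM purchases"
        "  WHERE region = ? AND season = ? AND item IN (?, ?)"
        "  GROUP BY basket_id"
        "  HAVING COUNT(DISTINCT item) = 2)",
        (region, season, item_a, item_b),
    ).fetchone()
    return row[0]


conn = build_store()
# The event from the slide: 3 cans of soup and 1 box of crackers.
record_purchase(conn, 1, "northeast", "winter", [("soup", 3), ("crackers", 1)])
# Two more made-up events so the analytical query has something to filter.
record_purchase(conn, 2, "northeast", "winter", [("soup", 1)])
record_purchase(conn, 3, "south", "winter", [("soup", 2), ("crackers", 2)])
print(baskets_with_both(conn, "northeast", "winter", "soup", "crackers"))
```

In a real deployment the write would hit the operational database and the aggregate would run against the warehouse; a single in-memory database is used here only to keep the sketch self-contained.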
  • Slide 37
  • 11 Database designing rules. Rule 1: What is the nature of the application (OLTP or OLAP)? Rule 2: Break your data into logical pieces, make life simpler. Rule 3: Do not get overdosed with rule 2. Rule 4: Treat duplicate non-uniform data as your biggest enemy. Rule 5: Watch for data separated by separators.
  • Slide 38
  • Database designing rules. Rule 6: Watch for partial dependencies. Rule 7: Choose derived columns carefully. Rule 8: Do not be hard on avoiding redundancy, if performance is the key. Rule 9: Multidimensional data is a different beast altogether. Rule 10: Centralize name value table design. Rule 11: For unlimited hierarchical data, self-reference PK and FK.
  • Slide 39
  • Slide 40
  • Normal form examples. 1NF: first name, middle name, and surname go in different columns. 2NF: every non-key column should depend on the whole primary key (roll no. and standard); a syllabus column that depends only on the standard violates this. 3NF: an average column depends on marks and number of subjects, which are other non-key columns, so it violates 3NF. Normalization rules are important guidelines, but treating them as set in stone is asking for trouble.
  • Slide 41
  • Rule 1: What is the nature of the application (OLTP or OLAP)? Transactional: the end user is more interested in CRUD, i.e., creating, reading, updating, and deleting records. The official name for this kind of database is OLTP. Analytical: the end user is more interested in analysis, reporting, forecasting, etc. There are fewer inserts and updates, and the main intention is to fetch and analyze data as fast as possible. The official name for this kind of database is OLAP.
  • Slide 42
  • Rule 1: What is the nature of the application (OLTP or OLAP)? In other words, if you think inserts, updates, and deletes are more prominent, go for a normalized table design; otherwise, create a flat, denormalized database structure.
  • Slide 43
  • Slide 44
  • Rule 2: Break your data into logical pieces, make life simpler. This is the first rule of first normal form. If your queries use too many string-parsing functions such as substring, charindex, etc., apply this rule. E.g., a query for student names containing "Koirala" but not "Harisingh" becomes very complex when the whole name is stored in one field. The better approach is to break this field into further logical pieces so that you can write clean and optimal queries.
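A small sketch of Rule 2, using Python's sqlite3 module. The table names, column names, and sample rows below are hypothetical and exist only to contrast the two designs: pattern matching against one packed name field versus plain comparisons on separate columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Design A: the whole name packed into one field.
conn.execute("CREATE TABLE students_flat (full_name TEXT)")
# Design B: the name broken into logical pieces.
conn.execute(
    "CREATE TABLE students (first_name TEXT, middle_name TEXT, surname TEXT)"
)

# Made-up sample rows for illustration.
names = [
    ("Shiv", "Harisingh", "Koirala"),
    ("Raju", "Prasad", "Koirala"),
    ("Amit", "Kumar", "Sharma"),
]
with conn:
    conn.executemany("INSERT INTO students VALUES (?, ?, ?)", names)
    conn.executemany(
        "INSERT INTO students_flat VALUES (?)", [(" ".join(n),) for n in names]
    )


def flat_query(conn):
    """Pattern matching on the packed field: fragile, and it cannot use an index."""
    return [r[0] for r in conn.execute(
        "SELECT full_name FROM students_flat"
        " WHERE full_name LIKE '%Koirala%'"
        " AND full_name NOT LIKE '%Harisingh%'"
        " ORDER BY full_name"
    )]


def split_query(conn):
    """Plain equality tests once the name is stored in separate columns."""
    return [r[0] for r in conn.execute(
        "SELECT first_name FROM students"
        " WHERE surname = 'Koirala' AND middle_name <> 'Harisingh'"
        " ORDER BY first_name"
    )]
```

Both queries pick out the same student here, but the second states the intent directly and does not break when one name happens to be a substring of another.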
  • Slide 45
  • Slide 46
  • Rule 3: Do not get overdosed with rule 2. Is the decomposition needed? It should be logical. It is rare that you will operate on the ISD codes of phone numbers separately (unless your application demands it). So it is wise to leave the phone number as a single field, because splitting it can lead to more complications.
  • Slide 47
  • Rule 4: Treat duplicate non-uniform data as your biggest enemy. Focus on duplicate data and refactor it; it creates confusion. For instance, "5th Standard" and "Fifth standard" mean the same thing.
  • Slide 48
  • Rule 4: Treat duplicate non-uniform data as your biggest enemy. One solution is to move the data into a separate master table and refer to it via foreign keys. E.g., create a new master table called Standards and link to it using a simple foreign key.
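A minimal sketch of the master-table idea, assuming SQLite via Python's sqlite3 module. The `standards` table follows the slide's name; the `students` columns and the sample rows are hypothetical. Because students store only a key, there is a single spelling of each standard, and an unknown key is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute(
    "CREATE TABLE standards ("
    " standard_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE students ("
    " roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL,"
    " standard_id INTEGER NOT NULL REFERENCES standards(standard_id))"
)
with conn:
    conn.execute("INSERT INTO standards VALUES (5, '5th Standard')")
    conn.executemany(
        "INSERT INTO students VALUES (?, ?, ?)",
        [(1, "Student A", 5), (2, "Student B", 5)],
    )


def distinct_standard_names(conn):
    """Every student resolves to the one spelling held in the master table."""
    return [r[0] for r in conn.execute(
        "SELECT DISTINCT s.name FROM students st"
        " JOIN standards s ON s.standard_id = st.standard_id"
    )]


def try_insert_unknown_standard(conn):
    """Return False when the foreign key rejects a standard that does not exist."""
    try:
        with conn:
            conn.execute("INSERT INTO students VALUES (3, 'Student C', 99)")
    except sqlite3.IntegrityError:
        return False
    return True
```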
  • Slide 49
  • Rule 5: Watch for data separated by separators. The second rule of first normal form says to avoid repeating groups. When too much data is stuffed into one column, such as a syllabus column, those fields are termed repeating groups. Manipulating this data requires complex queries, and query performance degrades.
  • Slide 50
  • Rule 5: Watch for data separated by separators. Columns that hold data stuffed with separators need special attention; a better approach is to move those fields to a different table and link them with keys for better management.
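As an illustration of Rule 5, the sketch below unpacks a separator-delimited syllabus column into one row per subject. The comma separator, the `syllabus` table layout, and the sample values are assumptions made for the example.

```python
import sqlite3


def split_syllabus(rows):
    """Turn (standard, 'A, B, C') rows into one (standard, subject) row each."""
    return [
        (standard, subject.strip())
        for standard, packed in rows
        for subject in packed.split(",")
        if subject.strip()
    ]


# Made-up packed rows, as they might look before the cleanup.
packed_rows = [
    ("5th Standard", "English, Maths, Science"),
    ("6th Standard", "English,Maths"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE syllabus (standard TEXT, subject TEXT)")
with conn:
    conn.executemany(
        "INSERT INTO syllabus VALUES (?, ?)", split_syllabus(packed_rows)
    )


def standards_teaching(conn, subject):
    """A plain equality lookup; no string parsing is needed any more."""
    return [r[0] for r in conn.execute(
        "SELECT standard FROM syllabus WHERE subject = ? ORDER BY standard",
        (subject,),
    )]
```

Once each subject has its own row, questions such as "which standards teach Science?" become simple filters instead of substring searches.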
  • Slide 51
  • Rule 6: Watch for partial dependencies. Watch for fields that depend on only part of the primary key. E.g., the primary key is created on roll number and standard. The syllabus is associated with the standard in which the student is studying, not directly with the student. Move the syllabus field and attach it to the Standards table. This rule is the second normal form: all non-key fields should depend on the full primary key, not on part of it.
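One way to spot a partial dependency in sample data is to check whether a column is fully determined by a single part of the key. The sketch below does that in plain Python; the helper names and rows are hypothetical, and a check on sample rows can only suggest a dependency, not prove it for all future data.

```python
def depends_only_on(rows, determinant, dependent):
    """Return True if `dependent` is fixed by `determinant` alone in these rows."""
    seen = {}
    for row in rows:
        key = row[determinant]
        # Remember the first value seen for this key; any mismatch breaks it.
        if seen.setdefault(key, row[dependent]) != row[dependent]:
            return False
    return True


def split_out(rows, determinant, dependent):
    """Move `dependent` into a master mapping keyed by `determinant`."""
    master = {row[determinant]: row[dependent] for row in rows}
    slim = [{k: v for k, v in row.items() if k != dependent} for row in rows]
    return master, slim


# Made-up rows keyed on (roll_no, standard).
rows = [
    {"roll_no": 1, "standard": "5th", "syllabus": "Syllabus A", "marks": 70},
    {"roll_no": 2, "standard": "5th", "syllabus": "Syllabus A", "marks": 85},
    {"roll_no": 1, "standard": "6th", "syllabus": "Syllabus B", "marks": 60},
]

# syllabus follows standard alone, so it belongs in the Standards table.
standards, student_rows = split_out(rows, "standard", "syllabus")
```

Here `syllabus` is determined by `standard` alone, while `marks` is not, so only `syllabus` is a candidate to move.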
  • Slide 52
  • Slide 53
  • Rule 7: Choose derived columns carefully
  • Slide 54
  • OLTP applications: getting rid of derived columns is a good idea. OLAP: with a lot of summations and calculations, these kinds of fields are necessary to gain performance. This is the third normal form: no column should depend on other non-primary-key columns. Assess the situation, then decide whether you want to implement the third normal form.
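For the OLTP side of this trade-off, one option is to compute the derived value at query time instead of storing it. The sketch below uses a SQLite view for that; the `marks` table, the `student_average` view, and the sample scores are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (roll_no INTEGER, subject TEXT, score INTEGER)")
with conn:
    conn.executemany(
        "INSERT INTO marks VALUES (?, ?, ?)",
        [(1, "English", 60), (1, "Maths", 80), (2, "English", 90)],
    )
# The average is derived on demand, so it can never drift out of date.
conn.execute(
    "CREATE VIEW student_average AS"
    " SELECT roll_no, AVG(score) AS average FROM marks GROUP BY roll_no"
)


def average_for(conn, roll_no):
    """Read the derived average; None if the student has no marks."""
    row = conn.execute(
        "SELECT average FROM student_average WHERE roll_no = ?", (roll_no,)
    ).fetchone()
    return None if row is None else row[0]


def update_score(conn, roll_no, subject, score):
    """An ordinary OLTP update; no stored average has to be kept in sync."""
    with conn:
        conn.execute(
            "UPDATE marks SET score = ? WHERE roll_no = ? AND subject = ?",
            (score, roll_no, subject),
        )
```

An OLAP design would more likely store the precomputed average, paying for it at load time so that reads stay fast.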
  • Slide 55
  • Rule 8: Do not be hard on avoiding redundancy, if performance is the key. If performance is what you need, think about denormalization. Normalization: queries must join many tables. Denormalization: the number of joins drops, which improves performance.
  • Slide 56
  • Rule 8: Do not be hard on avoiding redundancy, if performance is the key
  • Slide 57
  • Rule 9: Multidimensional data is a different beast altogether. OLAP projects mostly deal with multidimensional data. E.g., get sales per country, customer, and date, where each sales figure sits at the intersection of three dimensions.
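One common way to lay out such data is a star schema: a central fact table of sales amounts with one key per dimension. The slide text itself does not name a design, so treat the schema below as an assumption; all table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_country  (country_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date     (date_id     INTEGER PRIMARY KEY, day  TEXT);
CREATE TABLE fact_sales (
    country_id  INTEGER REFERENCES dim_country(country_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    date_id     INTEGER REFERENCES dim_date(date_id),
    amount      REAL
);
INSERT INTO dim_country  VALUES (1, 'India'), (2, 'Nepal');
INSERT INTO dim_customer VALUES (1, 'Customer A'), (2, 'Customer B');
INSERT INTO dim_date     VALUES (1, '2024-01-01'), (2, '2024-01-02');
INSERT INTO fact_sales   VALUES (1, 1, 1, 100.0), (1, 2, 1, 50.0), (2, 1, 2, 25.0);
""")

# Whitelist of dimensions: (dimension table, key column, label column).
DIMENSIONS = {
    "country": ("dim_country", "country_id", "name"),
    "customer": ("dim_customer", "customer_id", "name"),
    "date": ("dim_date", "date_id", "day"),
}


def sales_by(conn, dimension):
    """Total sales along one dimension of the cube."""
    table, key, label = DIMENSIONS[dimension]  # KeyError for unknown names
    sql = (
        f"SELECT d.{label}, SUM(f.amount) FROM fact_sales f"
        f" JOIN {table} d ON d.{key} = f.{key}"
        f" GROUP BY d.{label} ORDER BY d.{label}"
    )
    return conn.execute(sql).fetchall()
```

Each row of `fact_sales` is one cell of the cube; `sales_by` collapses the other two dimensions to give totals along the chosen one.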
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Rule 10: Centralize name value table design. Name-and-value tables have a key and some data associated with the key, e.g., a currency table and a country table, each with only a key and a value. For such tables, creating one central table and differentiating the data with a type field makes more sense.
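A minimal sketch of a central name-value table with a type field, again with SQLite. The `lookup` table name, its columns, and the sample codes are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One central table; the type column says which lookup a row belongs to.
conn.execute(
    "CREATE TABLE lookup ("
    " type TEXT NOT NULL, code TEXT NOT NULL, value TEXT NOT NULL,"
    " PRIMARY KEY (type, code))"
)
with conn:
    conn.executemany(
        "INSERT INTO lookup VALUES (?, ?, ?)",
        [
            ("currency", "INR", "Indian Rupee"),
            ("currency", "USD", "US Dollar"),
            ("country", "IN", "India"),
            ("country", "US", "United States"),
        ],
    )


def values_of(conn, type_name):
    """Return {code: value} for one kind of lookup data."""
    return dict(conn.execute(
        "SELECT code, value FROM lookup WHERE type = ?", (type_name,)
    ))
```

Adding a new kind of lookup then means inserting rows with a new type value rather than creating another two-column table.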
  • Slide 62
  • Slide 63
  • Rule 11: For unlimited hierarchical data, self-reference PK and FK. This applies to an unlimited parent-child hierarchy, e.g., a multi-level marketing scenario where a salesperson can have multiple salespeople below them. For such scenarios, a foreign key that references the primary key of the same table models the hierarchy.
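The self-referencing design can be sketched as follows, with a recursive query to walk the hierarchy to any depth. The `sales_people` table, its columns, and the sample names are hypothetical; the recursive common table expression is one way to traverse such a table and is not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# parent_id is a foreign key back to this same table's primary key.
conn.execute(
    "CREATE TABLE sales_people ("
    " id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
    " parent_id INTEGER REFERENCES sales_people(id))"
)
with conn:
    conn.executemany(
        "INSERT INTO sales_people VALUES (?, ?, ?)",
        [(1, "Top", None), (2, "Mid A", 1), (3, "Mid B", 1), (4, "Leaf", 2)],
    )


def downline(conn, person_id):
    """Return (name, depth) for everyone below person_id, at any depth."""
    return conn.execute(
        """
        WITH RECURSIVE tree(id, name, depth) AS (
            SELECT id, name, 1 FROM sales_people WHERE parent_id = ?
            UNION ALL
            SELECT c.id, c.name, t.depth + 1
            FROM sales_people c JOIN tree t ON c.parent_id = t.id
        )
        SELECT name, depth FROM tree ORDER BY depth, name
        """,
        (person_id,),
    ).fetchall()
```

Because every level lives in the same table, adding another tier to the hierarchy needs no schema change, only more rows.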
  • Slide 64
  • Slide 65
  • Business Models: depend on business requirements, e.g., an e-commerce business model.
  • Slide 66
  • Data Marts: Data warehouses can support all of an organization's information. Data marts hold subsets of an organization-wide data warehouse. Data mart: a subset of a data warehouse in which only a focused portion of the data warehouse information is kept.
  • Slide 67
  • Slide 68
  • Assignment 1: (1) Differentiate between OLTP and OLAP. (2) Explain the design aspects of OLTP and OLAP. (3) What is BI, and what are its components?
  • Slide 69
  • References: OLTP vs. OLAP PPTs; notes by Shivprasad Koirala.