INTRODUCTION Organizations need business intelligence Business
intelligence (BI) knowledge about your customers, competitors,
business partners, competitive environment, and internal operations
to make effective, important, and strategic business decisions
3-3
Slide 4
INTRODUCTION IT tools help process information to create
business intelligence according to: OLTP OLAP 3-4
Slide 5
INTRODUCTION Online transaction processing (OLTP) the gathering
of input information, processing that information, and updating
existing information to reflect the gathered and processed
information Databases support OLTP Operational database databases
that support OLTP 3-5
Slide 6
INTRODUCTION Online analytical processing (OLAP) the
manipulation of information to support decision making Databases
can support some OLAP Data warehouses only support OLAP, not OLTP
Data warehouses are special forms of databases that support
decision making 3-6
Slide 7
INTRODUCTION 3-7
Slide 8
What Is a Data Warehouse? Data warehouse logical collection of
information gathered from operational databases used to create
business intelligence that supports business analysis activities
and decision-making tasks A data warehouse is simply a single,
complete, and consistent store of data obtained from a variety of
sources and made available to end users in a way they can
understand and use it in a business context. -- Barry Devlin, IBM
Consultant 3-8
Slide 9
3-9 What Is a Data Warehouse?
Slide 10
3-10 What Is a Data Warehouse? Multidimensional Rows and
columns Also layers Many times called hypercubes
Slide 11
Data Warehouses a record of an enterprise's past transactional
and operational information designed to favor efficient data
analysis and reporting data warehousing is not meant for current
"live" data
Slide 12
Data Warehouses large amounts of data sometimes subdivided into
smaller logical units (dependent data marts)
Slide 13
3-13 What Are Data-Mining Tools? Data-mining tools software
tools that you use to query information in a data warehouse
Query-and-reporting tools Intelligence agents Multidimensional
analysis tools Statistical tools
Slide 14
Data Warehouses Components of a data warehouse: Sources ->
Data Source Interaction Data Transformation Data Warehouse (Data
Storage) Reporting (Data Presentation) Metadata
Slide 15
Slide 16
Data Warehouses ADVANTAGES complete control over the four main
areas of data management systems: Clean data Query processing:
multiple options Indexes: multiple types Security: data and
access
Slide 17
Data Warehouses DISADVANTAGES Adding new data sources takes
time and associated high cost Data owners lose control over their
data, raising ownership, security and privacy issues Long initial
implementation time and associated high cost Difficult to
accommodate changes in data types and ranges, data source schema,
indexes and queries
Slide 18
OLTP vs. OLAP OLTP: On Line Transaction Processing Describes
processing at operational sites OLAP: On Line Analytical Processing
Describes processing at warehouse
Slide 19
OLTP Database vs. Data Warehouse relational databases - groups
data using common attributes found in the data set objectives are
different
Slide 20
OLTP database Data Warehouse Designed for real time business
operations Designed for analysis of business measures by categories
and attributes
Slide 21
OLTP database Data Warehouse Mostly updates Many small
transactions Mb - Gb of data Mostly reads Queries are long and
complex Gb - Tb of data
Slide 22
OLTP database Data Warehouse Current snapshot Raw data
Thousands of users (e.g., clerical users) History Summarized,
reconciled data Hundreds of users (e.g., decision-makers,
analysts)
Slide 23
SUMMARY four questions for you
Slide 24
Designed for real time business operations Designed for
analysis of business measures by categories and attributes 1 2
Slide 25
Designed for real time business operations Designed for
analysis of business measures by categories and attributes Data
Warehouse OLTP database
Slide 26
Optimized for bulk loads and large, complex, unpredictable
queries that access many rows per table. Optimized for a common set
of transactions, usually adding or retrieving a single row at a
time per table. 1 2
Slide 27
Optimized for bulk loads and large, complex, unpredictable
queries that access many rows per table. Optimized for a common set
of transactions, usually adding or retrieving a single row at a
time per table. OLTP database Data Warehouse
Slide 28
Loaded with consistent, valid data; requires no real time
validation. Optimized for validation of incoming data during
transactions; uses validation data tables. 1 2
Slide 29
Loaded with consistent, valid data; requires no real time
validation. Optimized for validation of incoming data during
transactions; uses validation data tables. OLTP database Data
Warehouse
Slide 30
Supports thousands of concurrent users. Supports few concurrent
users relative to OLTP. 1 2
Slide 31
Supports thousands of concurrent users. Supports few concurrent
users relative to OLTP. Data Warehouse OLTP database
Slide 32
Data, Information & Knowledge Data is just symbols
Information is data that are processed to be useful; provides
answers to "who", "what", "where", and "when" questions Knowledge
is application of data and information; answers "how"
questions
Slide 33
Data Data is raw. It simply exists and has no significance
beyond its existence (in and of itself). It can exist in any form,
usable or not. It does not have meaning of itself. In computer
parlance, a spreadsheet generally starts out by holding data.
Slide 34
Information Information is data that has been given meaning by
way of relational connection. This "meaning" can be useful, but
does not have to be. In computer parlance, a relational database
makes information from the data stored within it.
Slide 35
Knowledge Knowledge is the appropriate collection of
information, such that it's intent is to be useful. Summaries of
information in a database for example. Or modeling and simulation
tools exercise some type of stored knowledge.
Slide 36
Copyright 2005 Pearson Addison- Wesley. All rights reserved.
1-36 Examples - Supermarket OLTP Event is 3 cans of soup and 1 box
of crackers bought; update database to reflect that event OLAP Last
winter in all stores in northeast, how many customers bought soup
and crackers together? Data Mining Are there any interesting
combinations of foods that customers frequently bought
together?
Slide 37
11 Database designing rules Rule 1: What is the nature of the
application (OLTP or OLAP)? Rule 2: Break your data in to logical
pieces, make life simpler Rule 3: Do not get overdosed with rule 2
Rule 4: Treat duplicate non-uniform data as your biggest enemy Rule
5: Watch for data separated by separators
Slide 38
Database designing rules Rule 6: Watch for partial dependencies
Rule 7: Choose derived columns preciously Rule 8: Do not be hard on
avoiding redundancy, if performance is the key Rule 9:
Multidimensional data is a different beast altogether Rule 10:
Centralize name value table design Rule 11: For unlimited
hierarchical data self-reference PK and FK
Slide 39
Slide 40
Normal form examples 1 NF : First Name, Middle name, Surname-
different columns 2 NF : Syllabus column of 5 th standard should
depend on both primary keys roll no. & standard 3 NF : Average
column depends on marks & no. of subjects Normalization rules
are important guidelines but taking them as a mark on stone is
calling for trouble.
Slide 41
Rule 1: What is the nature of the application (OLTP or OLAP)?
Transactional: End user is more interested in CRUD, i.e., creating,
reading, updating, and deleting records. The official name for such
a kind of database is OLTP. Analytical: End user is more interested
in analysis, reporting, forecasting, etc. - less number of inserts
and updates. - main intention here is to fetch and analyze data as
fast as possible. - The official name for such a kind of database
is OLAP.
Slide 42
Rule 1: What is the nature of the application (OLTP or OLAP)?
In other words if you think inserts, updates, and deletes are more
prominent then go for a normalized table design, else create a flat
denormalized database structure.
Slide 43
Slide 44
Rule 2: Break your data into logical pieces, make life simpler
The first rule from 1 st normal form. If your queries are using too
many string parsing functions like substring, charindex, etc apply
this rule E.g. Query- student names having Koirala and not
Harisingh, very complex query The better approach would be to break
this field into further logical pieces to write clean and optimal
queries.
Slide 45
Slide 46
Rule 3: Do not get overdosed with rule 2 Decomposing, is it
needed? The decomposition should be logical. Its rare that you will
operate on ISD codes of phone numbers separately (until your
application demands it). So it would be a wise decision to just
leave it as it can lead to more complications.
Slide 47
Rule 4: Treat duplicate non-uniform data as your biggest enemy
Focus and refactor duplicate data, it creates confusion. For
instance, in the below diagram, you can see 5th Standard and Fifth
standard means the same.
Slide 48
Rule 4: Treat duplicate non-uniform data as your biggest enemy
One of the solutions -move the data into a different master table
altogether and refer them via foreign keys. E.g. new master table
called Standards and linked the same using a simple foreign
key.
Slide 49
Rule 5: Watch for data separated by separators The 2 nd rule of
1 st normal form says avoid repeating groups. Too much data stuffed
in syllabus column. These fields are termed as Repeating groups. To
manipulate this data, the query would be complex and the
performance of the queries degrades.
Slide 50
Rule 5: Watch for data separated by separators Columns which
have data stuffed with separators need special attention and a
better approach would be to move those fields to a different table
and link them with keys for better management.
Slide 51
Rule 6: Watch for partial dependencies Watch for fields which
depend partially on primary keys. E.g Primary key is created on
roll number and standard. The syllabus is associated with the
standard in which the student is studying and not directly with the
student. Move the syllabus field and attach it to the Standards
table. This rule is the 2 nd normal form: All keys should depend on
the full primary key and not partially.
Slide 52
Slide 53
Rule 7: Choose derived columns preciously
Slide 54
OLTP applications: getting rid of derived columns would be a
good OLAP :a lot of summations, calculations, these kinds of fields
are necessary to gain performance. The 3 rd normal form: No column
should depend on other non-primary key columns. See the situation
and then decide if you want to implement the 3 rd normal form.
Slide 55
Rule 8: Do not be hard on avoiding redundancy, if performance
is the key Need for performance: think about de- normalization.
Normalization: make joins with many tables Denormalization: the
joins reduce and increase performance.
Slide 56
Rule 8: Do not be hard on avoiding redundancy, if performance
is the key
Slide 57
Rule 9: Multidimensional data is a different beast altogether
OLAP projects mostly deal with multidimensional data. E.g. get
sales per country, customer, and date, where sales figures have
three intersections of dimension data.
Slide 58
Slide 59
Slide 60
Slide 61
Rule 10: Centralize name value table design Name and value
tables :has key and some data associated with the key. E.g.
currency table and a country table. Have only a key and value. For
such kinds of tables, creating a central table and differentiating
the data by using a type field makes more sense.
Slide 62
Slide 63
Rule 11: For unlimited hierarchical data self- reference PK and
FK Unlimited parent child hierarchy. E.g. A multi-level marketing
scenario where a sales person can have multiple sales people below
them. For such scenarios, using a self-referencing primary key and
foreign key will help to achieve the same
Slide 64
Slide 65
Business Models Depends on business requirements E.g.
E-commerce business model
Slide 66
3-66 Data Marts Data warehouses can support all of an
organizations information Data marts have subsets of an
organizationwide data warehouse Data mart subset of a data
warehouse in which only a focused portion of the data warehouse
information is kept
Slide 67
Slide 68
Assignment 1 Differentiate between OLTP and OLAP. Explain the
design aspects of OLTP & OLAP What is BI & what are its
components?
Slide 69
References OLTP Vs OLAP ppts Notes by Shivprasad koirala