View
229
Download
0
Tags:
Embed Size (px)
Citation preview
Database Access and Techniques, most notably Data Mining.
Tools:Computers andIT. VB, VBA,
Excel, InterDev,Etc.
Humans:DecisionMakingProcess
Algorithms:Math/Flow Chart
stuff that helps thetools help the humans
make decisions.
DSS
DSS
Data:Facts pertinentto the decision
at hand.
Warehouses & Marts• Data Warehouse
– a database designed to support decision-making in an organization. It is batch-updated and structured for fast online queries and exploration. Data warehouses may aggregate enormous amounts of data from many different operational systems.
• Data Mart– a database focused on addressing the concerns of a specific
problem or business unit (e.g. Marketing, Engineering). Size doesn’t define data marts, but they tend to be smaller than data warehouses.
Data Warehouses & Data Marts
TPS& other
operational systems
DataWarehouse
Data Mart(Marketing)
Data Mart(Engineering)
3rd party data
= query, OLAP, mining, etc.
= operational clients
Data Warehousing
• Physical separation of operational and decision support environments
• Purpose: to establish a data repository making operational data accessible
• Transforms operational data to relational form• Only data needed for decision support come
from the TPS• Data are transformed and integrated into a
consistent structure• Data warehousing (or information warehousing):
a solution to the data access problem • End users perform ad hoc query, reporting
analysis and visualization
Data Warehousing Benefits• Increase in knowledge worker productivity • Supports all decision makers’ data requirements• Provide ready access to critical data• Insulates operation databases from ad hoc
processing• Provides high-level summary information • Provides drill down capabilities
Yields– Improved business knowledge– Competitive advantage– Enhances customer service and satisfaction– Facilitates decision making– Help streamline business processes
DW Suitability
For organizations where• Data are in different systems• Information-based approach to
management in use• Large, diverse customer base• Same data have different representations
in different systems• Highly technical, messy data formats
Transform Data from TPS to Warehouse
• Consolidate data– e.g. from multiple TPS around the country/world
• “Scrub” the data– keep definitions consistent (e.g. translate part
numbers/product names if they differ per country)
• Calculate fields (decrease processor load)• Summarize fields (decrease processor load)• De-normalize data (ease of use)
OLAP: Data Access and Mining, Querying and Analysis
Online Analytical processing (OLAP)
– DSS and EIS computing done by end-users in online systems
– Versus online transaction processing (OLTP)
OLAP Activities– Generating queries– Requesting ad hoc reports– Conducting statistical analyses – Building multimedia applications
OLAP uses the data warehouse
and a set of tools, usually with multidimensional
capabilities• Query tools
• Spreadsheets
• Data mining tools
• Data visualization tools
Using SQL for Querying
• SQL (Structured Query Language)Data language English-like, nonprocedural, very user friendly languageFree format
Example:SELECT Name, SalaryFROM EmployeesWHERE Salary >2000
Sample of SQL Statements
Natural Language
SQL
List of all purchases of L.B. University since January of 1996, in terms of products, prices, and quantities
SELECT PRODUCTS PURCH PRICE QUANTITY FROM PURCHASE-HIST WHERE CUST-NAME EQ L.B. UNIVERSITY AND PURCH-DATE GE 01/01/96
List the price of cotton shirts, medium size, with short sleeves and white color
SELECT PRICE, AMOUNT- AVAIL FROM PRODUCT WHERE PROD-NAME EQ COTTON SHIRT AND SIZE EQ MEDIUM AND STYLE EQ SHORT SLEEVES AND COLOR EQ WHITE
Query Tools & OLAP
• Query Tools– user-lead discovery. Can return individual records or summaries.
Requests are formulated in advance (e.g. “show me all delinquent accounts in the northeast region during Q1”).
• OLAP - Online Analytical Processing– user-lead discovery. Data is explored via “drill down” into the data
by selecting variables to summarize on. Results are usually reported in a cross-tab report or graph (e.g. “show me a tabular breakdown of sales by business unit, product type, and year”).
OLAP• Online Analytical Processing. (example of
cross-tab results presented below)
1. business unit
2. product type 3. year
DataSources
I nternalDataSources
ExternalDataSources
DataAcquisition,Extraction,DeliveryTransformation
DataWarehouse
BusinessCommunication
Querying
ReportGeneration
SpreadsheetForecastingAnalysisModeling
Multimedia
E IS,Others
Online Analytical Processing
DataPresentationandVisualization
Data Mining• automated information discovery process,
uncovers important patterns in existing data– can use neural networks or other approaches.
Requires ‘clean’, reliable, consistent data. Historical data must reflect the current environment.
• e.g. “What are the characteristics that identify when we are likely to lose a customer?”
Data Mining
For• Knowledge discovery in
databases• Knowledge extraction• Data archeology• Data exploration• Data pattern processing• Data dredging• Information harvesting
Data Mining Examples
• Market Segmentation - e.g. Dayton Hudson
• Direct Marketing - e.g. Chase
• Market basket analysis - e.g. Wal-Mart
• Customer Churn - e.g. Fleet Bank
• Fraud Detection - e.g. Bank of America
• Cost Reduction Prospecting - e.g. Merk Medco.
Major Data Mining Characteristics and
Objectives• Data are often buried deep• Client/server architecture• Sophisticated new tools--including advanced
visualization tools--help to remove the information “ore”
• Massaging and synchronizing data • Usefulness of “soft” data• End-user minor is empowered by “data drills” and
other power query tools with little or no programming skills
• Often involves finding unexpected results• Tools are easily combined with spreadsheets etc.• Parallel processing for data mining
Data Mining Application Areas
• Marketing• Banking• Retailing and sales• Manufacturing and production• Brokerage and securities trading• Insurance• Computer hardware and software• Government and defense• Airlines• Health care• Broadcasting• Law Enforcement• … almost everywhere!!!
Stupid Data-Miner Tricks• Ad-Hoc Theories
– when an oddity jumps out of the data, it’s tempting to develop a theory for it. Sometimes findings are just statistical flukes.
• Using Too Many Variables– the more factors considered, the more likely a relationship will
be found - valid or not.
• Not Taking No for an Answer– it’s OK to stop looking if you can’t find anything. There are
no silver bullets.
• Limited or incorrect interpretation