Database Access and Techniques, most notably Data Mining.


DSS

• Tools: computers and IT (VB, VBA, Excel, InterDev, etc.)
• Humans: the decision-making process
• Algorithms: math/flow chart stuff that helps the tools help the humans make decisions
• Data: facts pertinent to the decision at hand


Warehouses & Marts

• Data Warehouse
  – a database designed to support decision-making in an organization. It is batch-updated and structured for fast online queries and exploration. Data warehouses may aggregate enormous amounts of data from many different operational systems.

• Data Mart
  – a database focused on addressing the concerns of a specific problem or business unit (e.g. Marketing, Engineering). Size doesn’t define data marts, but they tend to be smaller than data warehouses.


Data Warehouses & Data Marts

[Diagram. Components shown: TPS & other operational systems; Data Warehouse; Data Mart (Marketing); Data Mart (Engineering); 3rd party data. Legend: one symbol = query, OLAP, mining, etc.; another symbol = operational clients.]


Data Warehousing

• Physical separation of operational and decision support environments
• Purpose: to establish a data repository making operational data accessible
• Transforms operational data to relational form
• Only data needed for decision support come from the TPS
• Data are transformed and integrated into a consistent structure
• Data warehousing (or information warehousing): a solution to the data access problem
• End users perform ad hoc query, reporting, analysis and visualization


Data Warehousing Benefits

• Increases knowledge worker productivity
• Supports all decision makers’ data requirements
• Provides ready access to critical data
• Insulates operational databases from ad hoc processing
• Provides high-level summary information
• Provides drill-down capabilities

Yields
– Improved business knowledge
– Competitive advantage
– Enhanced customer service and satisfaction
– Easier decision making
– Help in streamlining business processes


DW Suitability

For organizations where
• Data are in different systems
• Information-based approach to management in use
• Large, diverse customer base
• Same data have different representations in different systems
• Highly technical, messy data formats


Transform Data from TPS to Warehouse

• Consolidate data
  – e.g. from multiple TPS around the country/world
• “Scrub” the data
  – keep definitions consistent (e.g. translate part numbers/product names if they differ per country)
• Calculate fields (decrease processor load)
• Summarize fields (decrease processor load)
• De-normalize data (ease of use)
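The transformation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real ETL tool: the record layout, the sample data, the part-number mapping `PART_ALIASES`, and all function names are hypothetical. De-normalization is not shown separately; each row simply carries its region inline.

```python
from collections import defaultdict

# Hypothetical TPS extracts from two countries (illustrative data only).
TPS_US = [
    {"part": "A-100", "qty": 3, "unit_price": 10.0, "region": "US"},
    {"part": "B-200", "qty": 1, "unit_price": 25.0, "region": "US"},
]
TPS_DE = [
    {"part": "DE-A100", "qty": 2, "unit_price": 10.0, "region": "DE"},
]

# Assumed mapping that translates country-specific part numbers to one standard.
PART_ALIASES = {"DE-A100": "A-100"}


def consolidate(*sources):
    """Merge copies of the records from several operational systems."""
    return [dict(row) for source in sources for row in source]


def scrub(rows):
    """Keep definitions consistent by translating local part numbers."""
    for row in rows:
        row["part"] = PART_ALIASES.get(row["part"], row["part"])
    return rows


def add_calculated_fields(rows):
    """Precompute line totals once so queries do not recompute them."""
    for row in rows:
        row["total"] = row["qty"] * row["unit_price"]
    return rows


def summarize(rows):
    """Pre-aggregate total sales per part."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["part"]] += row["total"]
    return dict(totals)


warehouse_rows = add_calculated_fields(scrub(consolidate(TPS_US, TPS_DE)))
sales_by_part = summarize(warehouse_rows)
print(sales_by_part)
```

Because `consolidate` copies each record, the operational extracts are left untouched, which mirrors the separation of operational and decision support data.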


OLAP: Data Access and Mining, Querying and Analysis

Online Analytical Processing (OLAP)
– DSS and EIS computing done by end-users in online systems
– Versus online transaction processing (OLTP)

OLAP Activities
– Generating queries
– Requesting ad hoc reports
– Conducting statistical analyses
– Building multimedia applications


OLAP uses the data warehouse and a set of tools, usually with multidimensional capabilities

• Query tools
• Spreadsheets
• Data mining tools
• Data visualization tools


Using SQL for Querying

• SQL (Structured Query Language)
  – Data language
  – English-like, nonprocedural, very user-friendly language
  – Free format

Example:
  SELECT Name, Salary
  FROM Employees
  WHERE Salary > 2000
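To try the example query, one option is Python's built-in `sqlite3` module with an in-memory database. The table contents below are made up for illustration, and the `ORDER BY Name` clause is an addition so that the result order is predictable.

```python
import sqlite3

# In-memory database with a small, hypothetical Employees table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, Salary REAL)")
conn.executemany(
    "INSERT INTO Employees (Name, Salary) VALUES (?, ?)",
    [("Avery", 1800), ("Blake", 2500), ("Casey", 3100)],
)

# The slide's query, plus ORDER BY for a stable result order.
rows = conn.execute(
    "SELECT Name, Salary FROM Employees WHERE Salary > 2000 ORDER BY Name"
).fetchall()
print(rows)
conn.close()
```

Only the employees earning more than 2000 come back; the statement says what is wanted, not how to find it, which is what "nonprocedural" means here.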


Sample of SQL Statements

Natural Language: List of all purchases of L.B. University since January of 1996, in terms of products, prices, and quantities
SQL: SELECT PRODUCTS PURCH PRICE QUANTITY FROM PURCHASE-HIST WHERE CUST-NAME EQ L.B. UNIVERSITY AND PURCH-DATE GE 01/01/96

Natural Language: List the price of cotton shirts, medium size, with short sleeves and white color
SQL: SELECT PRICE, AMOUNT-AVAIL FROM PRODUCT WHERE PROD-NAME EQ COTTON SHIRT AND SIZE EQ MEDIUM AND STYLE EQ SHORT SLEEVES AND COLOR EQ WHITE
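The sample statements use a shorthand (`EQ`, `GE`, hyphenated names, unquoted text) that most SQL engines will not accept as written. The sketch below shows one possible translation of the first request into standard SQL, run through Python's `sqlite3`. The column names `PRODUCT`, `PRICE`, `QUANTITY`, the underscore spelling of the table and column names, the ISO date format, and the sample rows are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: hyphens become underscores, dates are ISO strings.
conn.execute(
    "CREATE TABLE PURCHASE_HIST ("
    "CUST_NAME TEXT, PURCH_DATE TEXT, PRODUCT TEXT, PRICE REAL, QUANTITY INTEGER)"
)
conn.executemany(
    "INSERT INTO PURCHASE_HIST VALUES (?, ?, ?, ?, ?)",
    [
        ("L.B. UNIVERSITY", "1995-11-30", "PAPER", 4.5, 100),
        ("L.B. UNIVERSITY", "1996-02-14", "TONER", 60.0, 5),
        ("OTHER CUSTOMER", "1996-03-01", "PAPER", 4.5, 20),
    ],
)

query = """
    SELECT PRODUCT, PRICE, QUANTITY
    FROM PURCHASE_HIST
    WHERE CUST_NAME = ?      -- shorthand EQ becomes =
      AND PURCH_DATE >= ?    -- shorthand GE becomes >=
"""
# Values are passed as parameters rather than pasted into the SQL text.
purchases = conn.execute(query, ("L.B. UNIVERSITY", "1996-01-01")).fetchall()
print(purchases)
conn.close()
```

Storing dates as ISO `YYYY-MM-DD` text lets the `>=` comparison work on plain strings; a two-digit-year format such as 01/01/96 would not sort correctly.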


Query Tools & OLAP

• Query Tools
  – user-led discovery. Can return individual records or summaries. Requests are formulated in advance (e.g. “show me all delinquent accounts in the northeast region during Q1”).

• OLAP - Online Analytical Processing
  – user-led discovery. Data is explored via “drill down” into the data by selecting variables to summarize on. Results are usually reported in a cross-tab report or graph (e.g. “show me a tabular breakdown of sales by business unit, product type, and year”).


OLAP

• Online Analytical Processing
  [Figure: example cross-tab results, broken down by 1. business unit, 2. product type, 3. year]
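A cross-tab with drill down, as described for OLAP, can be approximated in plain Python. This is a rough sketch rather than how an OLAP product works internally; the sales figures, the `crosstab` helper, and its key names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts: (business unit, product type, year, amount).
SALES = [
    ("Marketing", "Software", 1996, 120),
    ("Marketing", "Software", 1997, 150),
    ("Marketing", "Hardware", 1996, 80),
    ("Engineering", "Hardware", 1996, 200),
    ("Engineering", "Hardware", 1997, 210),
]


def crosstab(rows, row_key, col_key):
    """Sum amounts into a {row value: {column value: total}} table."""
    table = defaultdict(lambda: defaultdict(int))
    for unit, product, year, amount in rows:
        record = {"unit": unit, "product": product, "year": year}
        table[record[row_key]][record[col_key]] += amount
    return {r: dict(cols) for r, cols in table.items()}


# Top level: sales by business unit and year.
by_unit_year = crosstab(SALES, "unit", "year")

# Drill down into one business unit: break it out by product type and year.
marketing_only = [row for row in SALES if row[0] == "Marketing"]
marketing_by_product_year = crosstab(marketing_only, "product", "year")

print(by_unit_year)
print(marketing_by_product_year)
```

The user picks which variables to summarize on (here `unit`, `product`, `year`), and drilling down is just re-running the summary on a narrower slice.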


[Diagram. Components shown:
• Data Sources: Internal Data Sources, External Data Sources
• Data Acquisition, Extraction, Delivery, Transformation
• Data Warehouse
• Business Communication
• Querying
• Report Generation
• Spreadsheet, Forecasting, Analysis, Modeling
• Multimedia
• EIS, Others
• Online Analytical Processing
• Data Presentation and Visualization]


Data Mining

• automated information discovery process, uncovers important patterns in existing data
  – can use neural networks or other approaches. Requires ‘clean’, reliable, consistent data. Historical data must reflect the current environment.

• e.g. “What are the characteristics that identify when we are likely to lose a customer?”
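As a very simple stand-in for the customer-loss question, the sketch below compares churn rates across the values of each attribute. It is far cruder than a neural network or any real mining tool, and the customer records, attribute names, and `churn_rates` helper are invented for illustration.

```python
from collections import defaultdict

# Hypothetical customer history: attributes plus whether the customer left.
CUSTOMERS = [
    {"plan": "basic", "support_calls": "many", "churned": True},
    {"plan": "basic", "support_calls": "many", "churned": True},
    {"plan": "basic", "support_calls": "few", "churned": False},
    {"plan": "premium", "support_calls": "many", "churned": True},
    {"plan": "premium", "support_calls": "few", "churned": False},
    {"plan": "premium", "support_calls": "few", "churned": False},
]


def churn_rates(customers, attribute):
    """Return the fraction of churned customers for each value of an attribute."""
    counts = defaultdict(lambda: [0, 0])  # value -> [churned, total]
    for customer in customers:
        tally = counts[customer[attribute]]
        tally[0] += customer["churned"]
        tally[1] += 1
    return {value: churned / total for value, (churned, total) in counts.items()}


rates_by_calls = churn_rates(CUSTOMERS, "support_calls")
rates_by_plan = churn_rates(CUSTOMERS, "plan")
print(rates_by_calls)
print(rates_by_plan)
```

In this toy data the number of support calls separates lost customers from retained ones much more sharply than the plan does, which is the kind of pattern a mining tool would surface automatically.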


Data Mining

For
• Knowledge discovery in databases
• Knowledge extraction
• Data archeology
• Data exploration
• Data pattern processing
• Data dredging
• Information harvesting


Data Mining Examples

• Market Segmentation - e.g. Dayton Hudson

• Direct Marketing - e.g. Chase

• Market basket analysis - e.g. Wal-Mart

• Customer Churn - e.g. Fleet Bank

• Fraud Detection - e.g. Bank of America

• Cost Reduction Prospecting - e.g. Merck-Medco
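Of the examples above, market basket analysis is the easiest to sketch: count how often pairs of items appear in the same basket and keep the frequent ones. The baskets, the 0.75 support threshold, and the `pair_support` helper are assumptions for illustration and say nothing about how any named company actually does it.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (illustrative data only).
BASKETS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
]


def pair_support(baskets):
    """Return the fraction of baskets containing each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


support = pair_support(BASKETS)
# Keep only pairs bought together in at least 75% of baskets.
frequent_pairs = {pair: s for pair, s in support.items() if s >= 0.75}
print(frequent_pairs)
```

With this data only the beer and diapers pair clears the threshold, appearing together in three of the four baskets.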


Major Data Mining Characteristics and Objectives

• Data are often buried deep
• Client/server architecture
• Sophisticated new tools, including advanced visualization tools, help to remove the information “ore”
• Massaging and synchronizing data
• Usefulness of “soft” data
• End-user miner is empowered by “data drills” and other power query tools with little or no programming skills
• Often involves finding unexpected results
• Tools are easily combined with spreadsheets etc.
• Parallel processing for data mining


Data Mining Application Areas

• Marketing
• Banking
• Retailing and sales
• Manufacturing and production
• Brokerage and securities trading
• Insurance
• Computer hardware and software
• Government and defense
• Airlines
• Health care
• Broadcasting
• Law Enforcement
• … almost everywhere!!!


Stupid Data-Miner Tricks

• Ad-Hoc Theories
  – when an oddity jumps out of the data, it’s tempting to develop a theory for it. Sometimes findings are just statistical flukes.

• Using Too Many Variables
  – the more factors considered, the more likely a relationship will be found - valid or not.

• Not Taking No for an Answer
  – it’s OK to stop looking if you can’t find anything. There are no silver bullets.

• Limited or incorrect interpretation
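The "too many variables" trap can be demonstrated with pure noise. In the sketch below every candidate variable is random, so any correlation with the target is a fluke, yet the strongest correlation found can only grow as more candidates are searched. The sample size, the number of candidates, the seed, and the helper names are arbitrary choices for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_fluke(target, candidates):
    """Largest absolute correlation between the target and any candidate."""
    return max(abs(pearson(target, c)) for c in candidates)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_observations = 20
target = [rng.gauss(0, 1) for _ in range(n_observations)]
# Every candidate is pure noise, so any "relationship" is a fluke.
noise = [[rng.gauss(0, 1) for _ in range(n_observations)] for _ in range(200)]

best_of_5 = strongest_fluke(target, noise[:5])
best_of_200 = strongest_fluke(target, noise)
print(f"best |r| among 5 noise variables:   {best_of_5:.2f}")
print(f"best |r| among 200 noise variables: {best_of_200:.2f}")
```

Since the 200 candidates include the first 5, the second figure is never smaller than the first; with few observations and many candidates it is typically large enough to look like a real finding.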