DWH-Ahsan Abdullah: Data Warehousing, Lecture-29: Brief Intro. to Data Mining. Virtual University of Pakistan. Ahsan Abdullah, Assoc. Prof. & Head, Center for Agro-Informatics Research (www.nu.edu.pk/cairindex.asp), National University of Computers & Emerging Sciences, Islamabad. Email: [email protected]


Page 1:

Data Warehousing

Lecture-29

Brief Intro. to Data Mining

Virtual University of Pakistan

Ahsan Abdullah, Assoc. Prof. & Head

Center for Agro-Informatics Research, www.nu.edu.pk/cairindex.asp

National University of Computers & Emerging Sciences, Islamabad

Email: [email protected]

Page 2:

What is Data Mining?: Non-technical view

“There are things that we know that we know…

there are things that we know that we don’t know…

there are things that we don’t know we don’t know.”

Donald Rumsfeld

US Secretary of Defense

Page 3:

What is Data Mining?: Slightly formal

Page 4:

What is Data Mining?: Formal view

Data mining digs out valuable, non-trivial information from large, multidimensional, apparently unrelated databases (data sets).

Page 5:

Why Data Mining? Huge volume

Page 6:

Claude Shannon's info. theory

More volume means less information
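
The slide does not spell out how this claim follows from Shannon's theory. One hedged reading is that volume and information content are different quantities: storing more records does not by itself add information. The sketch below, using hypothetical event data that is not from the lecture, computes the Shannon entropy of a small log and of the same log repeated 100 times.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Shannon entropy H = -sum(p * log2(p)) of the empirical distribution."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# HYPOTHETICAL data: a tiny log of event types, then the same log stored 100 times.
small_log = ["sale", "sale", "refund", "sale"]
large_log = small_log * 100

# The record count grows 100-fold, but the entropy per record is unchanged.
print(len(small_log), entropy_bits(small_log))
print(len(large_log), entropy_bits(large_log))
```

Under this reading, the larger log carries the same information per record while costing 100 times the storage, so the useful information per unit of volume falls as volume grows.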

Page 7:

Value vs. Volume

[Figure: Value of Data vs. Volume of Data. Levels shown, in slide order: Decision (Y/N), Decision Support, Knowledge, Information, Indexed Data, Raw Data.]

Page 8:

Why Data Mining?: Supply & Demand

Page 9:

Page 10:

Data Mining is HOT!

10 Hottest Jobs of year 2025: Time Magazine, 22 May, 2000

10 emerging areas of technology: MIT’s Magazine of Technology Review, Jan/Feb, 2001

Page 11:

How is Data Mining different? Traditionally

Data Warehouses (Data-driven exploration):

Data Mining (Knowledge-driven exploration):

Traditional Database (Transactions):

Knowledge Discovery (KDD)

Page 12:

Data Mining Vs. Statistics

Page 13:

Data Mining Vs. Statistics

Page 14:

Knowledge extraction using statistics

[Figure: Inflation vs. stock index increase. x-axis: Inflation (%), with points at 1.6, 1.7, 1.8, 1.85, 1.9, 1.95, 2, 2.9, 3, 3.3, 4.2, 4.4, 5, 6. y-axis: Stock increase (%), from 0 to 40.]

Q: What will be the stock increase when inflation is 6%?

A: Model non-linear relationship using a line y = mx + c. Hence answer is 13%
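
The line-fitting step in the answer can be sketched as an ordinary least-squares fit of y = mx + c. The inflation values are the x-axis labels from the slide, but the stock-increase values did not survive in this transcript, so the `stock_increase` list below is hypothetical and the sketch is not expected to reproduce the slide's 13%. The function name `fit_line` is also an assumption made for illustration.

```python
# Inflation values (%) taken from the x-axis labels on the slide.
inflation = [1.6, 1.7, 1.8, 1.85, 1.9, 1.95, 2, 2.9, 3, 3.3, 4.2, 4.4, 5, 6]

# HYPOTHETICAL stock increases (%): the slide's y-values are not available,
# so these numbers are invented purely to exercise the code.
stock_increase = [35, 33, 30, 29, 27, 26, 25, 18, 17, 16, 14, 13.5, 13, 12.5]

def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + c; returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

m, c = fit_line(inflation, stock_increase)
prediction = m * 6 + c  # the question on the slide asks about 6% inflation
print(f"y = {m:.2f}x + {c:.2f}; predicted stock increase at 6%: {prediction:.1f}%")
```

Because a straight line is forced onto a relationship the slide itself calls non-linear, the prediction is only as good as that approximation.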

Page 15:

Failure of regression models

[Figure: two plots, each with an x-axis from 0 to 35. One has a y-axis from 0 to 70000, the other from -10000 to 70000. Fitted trend line shown on the slide:]

y = -0.0127x^6 + 1.5029x^5 - 63.627x^4 + 1190.3x^3 - 9725.3x^2 + 31897x - 29263
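
One way to see the kind of failure the slide title refers to is to evaluate the sixth-degree polynomial printed on the slide at points beyond the plotted x-range of 0 to 35. This is a minimal sketch: the coefficients are the rounded values from the slide, and the evaluation points 40 and 50 are chosen here for illustration only.

```python
# Coefficients of the sixth-degree polynomial printed on the slide,
# highest power first (rounded as they appear there).
COEFFS = [-0.0127, 1.5029, -63.627, 1190.3, -9725.3, 31897.0, -29263.0]

def poly(x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0.0
    for a in COEFFS:
        result = result * x + a
    return result

# 0..35 is the x-range of the slide's plots; 40 and 50 lie outside it.
for x in (0, 10, 20, 30, 35, 40, 50):
    print(f"x = {x:>2}: y = {poly(x):,.0f}")
```

Outside the plotted range the curve leaves the 0 to 70000 band by a wide margin in both directions, which illustrates why a high-degree fit cannot be trusted for extrapolation.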

Page 16:

Data Mining is…

Decision Trees

Neural Networks

Rule Induction

Clustering

Genetic Algorithms

If . . .

Then . . .
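
As an illustration of the If/Then form that rule induction produces, the sketch below derives one rule per attribute value by predicting the most frequent class for that value. This is about the simplest possible form of rule induction and is not taken from the lecture. The records, the attribute name `complaints`, and the labels are all hypothetical.

```python
from collections import Counter, defaultdict

# HYPOTHETICAL toy records: (complaints level, outcome).
records = [
    ("high", "churn"), ("high", "churn"), ("high", "stay"),
    ("low", "stay"), ("low", "stay"), ("low", "churn"), ("low", "stay"),
]

def induce_rules(rows):
    """For each attribute value, predict its most frequent class label."""
    counts = defaultdict(Counter)
    for value, label in rows:
        counts[value][label] += 1
    return {value: labels.most_common(1)[0][0] for value, labels in counts.items()}

rules = induce_rules(records)
for value, label in rules.items():
    print(f"If complaints = {value} Then outcome = {label}")
```

Practical rule-induction methods also choose which attributes to test and measure how reliable each rule is; this sketch omits both.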

Page 17:

Data Mining is NOT ...

Data warehousing

Ad Hoc Query / Reporting

Online Analytical Processing (OLAP)

Data Visualization

Software Agents

Page 18:

Data Mining: Business Perspective

“Knowledge” is worth knowing if it can be used to increase profit, either by lowering cost or by raising revenue.

Business questions

Profiling/Segmentation

Cross-Service

Employee retention: