DWH-Ahsan Abdullah
Data Warehousing Lecture-29
Brief Intro. to Data Mining
Virtual University of Pakistan
Ahsan Abdullah, Assoc. Prof. & Head
Center for Agro-Informatics Research, www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: [email protected]
What is Data Mining?: Non-technical view
“There are things that we know that we know…
there are things that we know that we don’t know…
there are things that we don’t know we don’t know.”
Donald Rumsfeld
US Secretary of Defence
What is Data Mining?: Slightly formal
What is Data Mining?: Formal view
Data mining digs out valuable, non-trivial information from large, multidimensional, apparently unrelated databases (sets).
Why Data Mining? Huge volume
Claude Shannon's info. theory
More volume means less information
Value vs. Volume
[Figure: value of data plotted against volume of data. Levels, in slide order: Decision (Y/N), Decision Support, Knowledge, Information, Indexed Data, Raw Data]
Why Data Mining?: Supply & Demand
Data Mining is HOT!
10 Hottest Jobs of year 2025, Time Magazine, 22 May, 2000
10 emerging areas of technology, MIT’s Magazine of Technology Review, Jan/Feb, 2001
How is Data Mining different? Traditionally
Data Warehouses (Data-driven exploration):
Data Mining (Knowledge-driven exploration)
Traditional Database (Transactions):
Knowledge Discovery (KDD)
Data Mining Vs. Statistics
Data Mining Vs. Statistics
Knowledge extraction using statistics
[Figure: Inflation vs. stock index increase. x-axis: Inflation (%), with points at 1.6, 1.7, 1.8, 1.85, 1.9, 1.95, 2, 2.9, 3, 3.3, 4.2, 4.4, 5, 6; y-axis: Stock increase (%), 0 to 40]
Q: What will be the stock increase when inflation is 6%?
A: Model the non-linear relationship using a line y = mx + c. Hence the answer is 13%.
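The line-fitting step in the answer can be sketched in a few lines of Python. This is only an illustration: the slide's stock-increase values did not survive, so the points below are hypothetical, chosen to lie exactly on y = 2x + 1 so that the fitted line happens to give 13 at 6% inflation. The function name `fit_line` is likewise an assumption, not something from the lecture.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c. Returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


# Hypothetical points (NOT the slide's data), lying exactly on y = 2x + 1.
inflation = [1.0, 2.0, 3.0, 4.0]        # inflation (%)
stock_increase = [3.0, 5.0, 7.0, 9.0]   # stock increase (%)

m, c = fit_line(inflation, stock_increase)
prediction = m * 6.0 + c  # extrapolate the line to 6% inflation

print(f"m = {m:.2f}, c = {c:.2f}")
print(f"Predicted stock increase at 6% inflation: {prediction:.1f}%")
```

Note that 6% lies outside the range of the hypothetical points, so the prediction is an extrapolation and only as good as the assumption that the relationship stays linear.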
Failure of regression models
[Figure: two plots, x-axis 0 to 35; y-axis 0 to 70000 in the first plot and -10000 to 70000 in the second. Fitted trend line: $y = -0.0127x^6 + 1.5029x^5 - 63.627x^4 + 1190.3x^3 - 9725.3x^2 + 31897x - 29263$]
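To get a feel for how the sixth-degree trend line behaves, the equation printed on the slide can be evaluated directly. The sketch below uses Horner's rule and is only an exploration aid. The underlying data points are not available, the function name `evaluate` and the sample x values are assumptions, and the printed coefficients are rounded, so values at large x may differ noticeably from the plotted curve.

```python
# Coefficients of the trend line as printed on the slide, highest degree first.
COEFFS = [-0.0127, 1.5029, -63.627, 1190.3, -9725.3, 31897.0, -29263.0]


def evaluate(coeffs, x):
    """Evaluate a polynomial at x using Horner's rule.

    coeffs are ordered from the highest-degree term down to the constant.
    """
    result = 0.0
    for a in coeffs:
        result = result * x + a
    return result


# Sample the curve across the plotted x range (0 to 35).
for x in (0, 1, 5, 10, 20, 30, 35):
    print(f"x = {x:2d}  y = {evaluate(COEFFS, x):.1f}")
```

At x = 0 the polynomial reduces to its constant term, -29263, so a model fitted to non-negative data can still return negative values.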
Data Mining is…
Decision Trees
Neural Networks
Rule Induction
Clustering
Genetic Algorithms
If . . . Then . . .
Data Mining is NOT ...
Data warehousing
Ad Hoc Query / Reporting
Online Analytical Processing (OLAP)
Data Visualization
Software Agents
Data Mining: Business Perspective
“Knowledge” is worth knowing if it can be used to increase profit, either by lowering cost or by raising revenue.
Business questions
Profiling/Segmentation
Cross-Service
Employee retention: