8/10/2019 1 Data Mining Processes and Knowledge Discovery (1)
CS359 Introduction to Data Mining
This course introduces the fundamental concepts of data mining and knowledge discovery from databases.
It focuses on the discussion and demonstration of common data mining methods and how data mining results become useful to businesses and organizations.
Course objectives
Attendance will be checked.
No make-up quizzes
Make-up long exam only for excused absence.
Set schedule within a week after the exam date
Late submissions will not be accepted (assignments, cases and project)
Policies
Han, J. & Kamber, M. (2006). Data Mining: Concepts and Techniques, 2nd Edition. Morgan Kaufmann Publishers, Elsevier Inc., California.
Tan, P., Steinbach, M. & Kumar, V. (2006). Introduction to Data Mining. Addison Wesley.
References
Data Mining Software Links by Dr. Pang-Ning Tan: www.cse.msu.edu/~cse980/software.html
RapidMiner: http://rapid-i.com/content/view/26/84/lang,en/
Weka: http://www.cs.waikato.ac.nz/ml/weka/
Software Links
Data Mining Processes and Knowledge Discovery
Define data mining and knowledge discovery in databases
Discuss some business applications of data mining
Identify the elements of the data mining process
Discuss the steps in CRISP-DM
Objectives
Is also known as Knowledge Discovery in Databases; a nontrivial extraction of implicit, previously unknown and potentially useful information from databases (Han et al, 1999)
Involves the use of analysis to detect patterns and allow predictions (Olson & Shi, 2007)
What is Data Mining?
Exploratory data analysis
Finds its roots along with the development in classical statistics, artificial intelligence and machine learning
Looks for actionable information, or information that can be utilized in a concrete way to improve profitability
Data Mining
Hypothesis Testing
A theory about the relationship between actions and outcomes is expressed and tested
Knowledge Discovery
A preconceived notion may not be present
Relationships can be identified by looking into the data
Data mining requires the identification of a problem
General Types of Data Mining
Retailing
Affinity Positioning: based upon the identification of products that the same customer is likely to want
Cross-selling: knowledge of products that go together can be used to market the complementary product
Data Mining Applications
Banking
Customer Relationship Management: identify customer value, develop programs to maximize revenue
Credit Card Management
Identify Balance Surfers, or credit card holders who pay old balances with a new card
Lift: identify effective market segments
Churn: identify likely customer turnover
Data Mining Applications
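As an illustration of the lift idea on the banking slide, the sketch below compares the response rate inside one market segment with the response rate of the whole customer base. The function name, its parameters and the campaign numbers are hypothetical and not taken from the course material. Under this common definition, a lift above 1 suggests the segment responds better than average.

```python
def lift(segment_responders, segment_size, total_responders, total_size):
    """Ratio of a segment's response rate to the overall response rate."""
    segment_rate = segment_responders / segment_size
    overall_rate = total_responders / total_size
    return segment_rate / overall_rate


# Hypothetical campaign: 1000 customers, 100 of whom responded overall.
# The targeted segment has 100 customers, 20 of whom responded.
segment_lift = lift(20, 100, 100, 1000)
print(f"lift = {segment_lift:.2f}")
```

A segment with a lift close to 1 behaves like the general population, so targeting it separately is unlikely to pay off.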
Insurance
Fraud detection: identify fraud claims meriting investigation
Telecommunications
Churn: customer turnover or switching carriers
Medicine
Cancer Cell Detection
Machine Vision
Pattern Recognition
Data Mining Applications
Cross-Industry Standard Process for Data Mining
Phases
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
CRISP-DM Process
Knowing what the study is for
Identify business task
Business Understanding
Select the related data from many available databases to correctly describe a given business task
Identify relevant data for the problem description
Selected variables for the relevant data should be independent of each other, that is, they should not contain overlapping information
Types of data: geographic, socio-graphic, transactional or quantitative and qualitative
Data Understanding
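One possible way to check whether selected variables contain overlapping information is to look for pairs with a very high correlation. The sketch below is only an illustration: the column names, the sample values and the 0.9 threshold are assumptions, and Pearson correlation only detects linear overlap.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def overlapping_pairs(columns, threshold=0.9):
    """Return pairs of column names whose absolute correlation exceeds the threshold."""
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) > threshold:
                pairs.append((a, b))
    return pairs


# Hypothetical customer data: minutes and seconds carry the same information.
customers = {
    "call_minutes": [10, 20, 30, 40, 50],
    "call_seconds": [600, 1200, 1800, 2400, 3000],
    "age": [25, 60, 31, 48, 22],
}
print(overlapping_pairs(customers))
```

When a pair is flagged, one of the two variables can usually be dropped before modeling.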
Also known as data preprocessing
Clean selected data for better quality
Filter, aggregate and fill in missing values (imputation)
Filter: remove outliers and redundancies
Aggregate: data is reduced to obtain aggregated information
Filling-in or Smoothing: missing values are found and replaced with reasonable values
Data Preparation
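The three preparation steps on this slide (fill in, filter, aggregate) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the records, the field names, the use of the median as the "reasonable value" and the fixed upper bound for outliers are all hypothetical choices, not the course's prescribed method.

```python
from statistics import mean, median

# Hypothetical monthly bills; None marks a missing value and
# 9999.0 stands in for a likely data-entry error.
records = [
    {"customer": "A", "bill": 40.0},
    {"customer": "A", "bill": None},
    {"customer": "A", "bill": 44.0},
    {"customer": "B", "bill": 9999.0},
    {"customer": "B", "bill": 52.0},
    {"customer": "B", "bill": 48.0},
]


def fill_missing(rows, key):
    """Filling-in: replace missing values with the median of the observed ones."""
    observed = [r[key] for r in rows if r[key] is not None]
    fill = median(observed)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]


def remove_outliers(rows, key, upper):
    """Filter: drop rows whose value exceeds an analyst-chosen upper bound."""
    return [r for r in rows if r[key] <= upper]


def aggregate(rows, group_key, key):
    """Aggregate: reduce the rows to one mean value per group."""
    groups = {}
    for r in rows:
        groups.setdefault(r[group_key], []).append(r[key])
    return {g: mean(values) for g, values in groups.items()}


filled = fill_missing(records, "bill")
filtered = remove_outliers(filled, "bill", upper=1000.0)
prepared = aggregate(filtered, "customer", "bill")
print(prepared)
```

The median is used for filling in because, unlike the mean, it is barely affected by the extreme value that the filter step removes afterwards.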
Data transformation
Uses mathematical formulations to convert different measurements into a unified numerical scale
Numerical to numerical scales
Shrink or enlarge the data
Categorical to numerical scales
Categorical values can be ordinal (less, moderate, strong) or nominal (red, yellow, blue)
Data Preparation
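The transformations on this slide can be illustrated as follows. This is a sketch under assumptions: min-max scaling is just one way to shrink numerical data to a common scale, and the integer codes for ordinal values and the one-hot vectors for nominal values are common conventions rather than anything the slides prescribe. All function names are hypothetical.

```python
def min_max_scale(values):
    """Numerical to numerical: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Ordinal categories have a natural order, so integer ranks preserve it.
ORDINAL_LEVELS = {"less": 0, "moderate": 1, "strong": 2}


def encode_ordinal(values, levels=ORDINAL_LEVELS):
    """Categorical (ordinal) to numerical: map each level to its rank."""
    return [levels[v] for v in values]


def encode_nominal(values):
    """Categorical (nominal) to numerical: one 0/1 indicator per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


print(min_max_scale([10, 20, 30]))
print(encode_ordinal(["strong", "less", "moderate"]))
print(encode_nominal(["red", "yellow", "blue"]))
```

Nominal values get indicator columns instead of ranks because assigning red = 0, yellow = 1, blue = 2 would suggest an order that does not exist.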
Data mining software is used to generate results for various situations
Data is divided into:
Training set: used for the development of the model
Test set: used to test the model that's built
Modeling
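A simple way to divide data into a training set and a test set is a random split. The sketch below assumes a 70/30 split and a fixed random seed for repeatability; both are illustrative choices, and the function name is hypothetical.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split it into (training set, test set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for ten data records
train, test = train_test_split(rows, test_fraction=0.3, seed=42)
print(len(train), "training rows,", len(test), "test rows")
```

Keeping the test rows out of model development is what makes the later test of the model meaningful.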
Data Modeling Techniques
Association: the relationship of a particular item in a data transaction to other items in the same transaction is used to predict patterns
Classification: learning different functions that map each item of the selected data into one of a predefined set of classes
Modeling
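For the association technique, the relationship between items in the same transaction is commonly quantified with support and confidence. The slides do not name these measures, so treat the sketch below as one standard way to make the idea concrete; the transactions and function names are hypothetical.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the share that also contain the consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


print("support(bread, milk) =", support({"bread", "milk"}, transactions))
print("confidence(bread -> milk) =", round(confidence({"bread"}, {"milk"}, transactions), 2))
```

A rule such as "bread -> milk" is usually only reported when both its support and its confidence pass analyst-chosen minimums.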
Clustering: takes ungrouped data and uses automatic techniques to put this data into groups
Prediction Analysis: discovers the relationship between the dependent and independent variables
Sequential Pattern Analysis: seeks to find similar patterns in data transactions over a business period
Modeling
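To illustrate clustering, the sketch below groups one-dimensional values with k-means. K-means is only one of many clustering techniques and is not named in the slides; the data, the starting centers and the fixed iteration count are assumptions made for the example.

```python
def k_means_1d(values, centers, iterations=10):
    """Group numbers around the given starting centers using k-means."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical ungrouped data, for example monthly call counts.
values = [1, 2, 3, 10, 11, 12]
centers, clusters = k_means_1d(values, centers=[0, 5])
print(centers, clusters)
```

No group labels are supplied in advance; the two groups emerge from the data alone, which is what separates clustering from classification.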
Data interpretation stage
Two things to consider:
How to recognize business value from knowledge patterns discovered
How to visualize the results to properly interpret patterns
Evaluation
The results are reported to project sponsors
The result is applied to the business task or data mining objective
Deployment
Data Cleaning
Data Integration
Data Selection
Data Transformation
Data Mining
Pattern Evaluation
Knowledge Presentation
Knowledge Discovery Process
Data Mining System Architecture
[Figure: architecture diagram. Components shown: Database, Data Warehouse, WWW and Other Repositories; Data cleaning, Integration and Selection; Database or Data Warehouse Server; Data Mining Engine; Pattern Evaluation; User Interface; Knowledge Base]
Relational Databases
Data Warehouses
Transactional Databases
Object-Relational Databases
Temporal, Sequence or Time-Series Database
Spatial Databases and Spatiotemporal Databases
Data Mining on what data?
Descriptive: characterizes the general properties of data
Data characterization, Data discrimination, Association, Clustering
Predictive: performs inference on the current data in order to make predictions
Classification and Prediction, Evolution analysis
Data Mining - what patterns?
NO
A pattern is interesting if
(1) it is easily understood by humans,
(2) valid on new or test data with some degree of certainty,
(3) potentially useful, and
(4) novel.
A pattern is also interesting if it validates a hypothesis that the user sought to confirm.
Are all patterns interesting?
Refers to COMPLETENESS of a data mining algorithm
It is unrealistic and inefficient for data mining systems to generate all of the possible patterns.
A focused search which makes use of interestingness measures should be used to control pattern generation.
Can a data mining system generate all interesting patterns?
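One way to picture a focused search is to use minimum support as the interestingness measure and stop extending item combinations that fall below it. The sketch below is a simplified, Apriori-style illustration, not a full implementation of any published algorithm; the transactions, the 0.5 threshold and the function name are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Collect itemsets whose support meets the minimum, level by level.

    Larger candidates are built only from items that survived the
    previous level, so infrequent items are never extended.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates and size <= max_size:
        level = {}
        for candidate in candidates:
            s = sum(1 for t in transactions if candidate <= t) / n
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        survivors = sorted({item for itemset in level for item in itemset})
        size += 1
        candidates = [frozenset(c) for c in combinations(survivors, size)]
    return frequent


# Hypothetical transactions; "jam" appears only once and is pruned early.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "jam"},
]
result = frequent_itemsets(baskets, min_support=0.5)
for itemset, s in result.items():
    print(sorted(itemset), s)
```

Because "jam" fails the threshold at the first level, no pair containing it is ever counted, which is the kind of control over pattern generation the slide describes.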
1. What is the business task or data mining objective?
2. What are the relevant data and their sources?
3. How was the data prepared? What were the processes?
4. What was the data mining technique used?
5. How was the model used to address the business task?
CASE study: Telephone company