Objectives
After finishing this class, the students will:
Understand the basic terms in Data Mining and Warehousing
Understand their necessity in business and IS
Objectives
Understand the basic concepts of Data Mining and Warehousing
Understand the implementation processes of those concepts
Motivation
Lots of data is being collected and warehoused
Web data, e-commerce
Purchases at department/grocery stores
Bank/credit card transactions
Motivation
Computers have become cheaper and more powerful
Competitive pressure is strong
Businesses need better, customized services for an edge (e.g. in Customer Relationship Management)
Data Warehousing
A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site
Data Mining
Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
Non-trivial extraction of implicit, previously unknown and potentially useful information from data
Data Mining
Data mining is the process of discovering actionable information from large sets of data.
Data mining uses mathematical analysis to derive patterns and trends that exist in data.
Typically, these patterns cannot be discovered by traditional data exploration because the relationships are too complex or because there is too much data.
Discovering the knowledge
Data cleaning
Remove noise and irrelevant data
Data integration
Combine data from multiple sources
Data selection
Retrieve the data relevant to the analysis task
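
The first three steps can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the two sources (`store_sales`, `web_sales`), the field names, and the rule that a missing or negative amount counts as noise.

```python
# Hypothetical records from two sources; names and values are illustrative only.
store_sales = [
    {"customer_id": 1, "amount": 25.0},
    {"customer_id": 2, "amount": None},   # noise: missing amount
    {"customer_id": 3, "amount": 40.0},
]
web_sales = [
    {"customer_id": 1, "amount": 15.0},
    {"customer_id": 4, "amount": -5.0},   # noise: negative amount
]

def clean(records):
    """Data cleaning: drop records with a missing or negative amount."""
    return [r for r in records if r["amount"] is not None and r["amount"] >= 0]

def integrate(*sources):
    """Data integration: combine the sources, tagging each record with its origin."""
    combined = []
    for name, records in sources:
        combined.extend({**r, "source": name} for r in records)
    return combined

def select(records, min_amount):
    """Data selection: keep only the records relevant to the analysis task."""
    return [r for r in records if r["amount"] >= min_amount]

integrated = integrate(("store", clean(store_sales)), ("web", clean(web_sales)))
relevant = select(integrated, min_amount=20.0)
print(relevant)
```

In practice each step depends on the business question; the threshold in `select` stands in for whatever criterion the analysis task defines.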
Discovering the knowledge
Data transformation
Transform and consolidate data into a form that is appropriate for mining
Data mining
Apply analysis methods to extract patterns from the data
Pattern evaluation
Identify the interesting patterns that represent knowledge
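
As a rough illustration of the data transformation step, the sketch below consolidates individual transactions into one total per customer and then rescales the totals to the range [0, 1]. The transaction data and the choice of min-max scaling are assumptions; other transformations may suit a given mining method better.

```python
def total_per_customer(transactions):
    """Consolidate individual transactions into one total per customer."""
    totals = {}
    for customer_id, amount in transactions:
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return totals

def min_max_scale(values):
    """Rescale numbers linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # All values are equal, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

# Hypothetical (customer_id, amount) pairs.
transactions = [(1, 25.0), (1, 15.0), (3, 40.0), (2, 10.0)]
totals = total_per_customer(transactions)
scaled = min_max_scale(list(totals.values()))
print(totals, scaled)
```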
Discovering the knowledge
Knowledge Presentation
Visualize and present the mined knowledge to the user
Data mining tasks
Prediction Methods
Use some variables to predict unknown or future values of other variables.
Description Methods
Find human-interpretable patterns that describe the data.
Data mining tasks
Classification [Predictive]
Clustering [Descriptive]
Association Rule Discovery [Descriptive]
Sequential Pattern Discovery [Descriptive]
Regression [Predictive]
Deviation Detection [Predictive]
Data mining Algorithms
Classification algorithms
predict one or more discrete variables, based on the other attributes in the dataset.
Regression algorithms
predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset.
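
The difference between the two can be shown with a minimal sketch: a nearest-neighbor rule that returns a discrete label, and a least-squares line that returns a continuous value. Both techniques and all data values are chosen here purely for illustration; the slides do not prescribe a particular algorithm.

```python
def nearest_neighbor_label(train, x):
    """Classification: return the discrete label of the closest training point."""
    closest = min(train, key=lambda pair: abs(pair[0] - x))
    return closest[1]

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: yearly spend -> customer segment (a discrete label).
labeled = [(100, "low"), (150, "low"), (900, "high"), (1000, "high")]
label = nearest_neighbor_label(labeled, 800)

# Hypothetical data: advertising budget -> profit (a continuous value).
budget = [1.0, 2.0, 3.0, 4.0]
profit = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(budget, profit)
predicted = slope * 5.0 + intercept
print(label, predicted)
```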
Data mining Algorithms
Segmentation algorithms
divide data into groups, or clusters, of items that have similar properties
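
One common way to form such groups is k-means clustering. The sketch below is a simplified one-dimensional version under assumed inputs (customer ages and two hand-picked starting centers); it is meant to show the idea, not a production implementation.

```python
def k_means_1d(values, centers, iterations=10):
    """Group numbers around the nearest center, then move each center to its group's mean."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; an empty cluster keeps its center.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical customer ages and starting centers.
ages = [21, 23, 25, 61, 63, 65]
centers, clusters = k_means_1d(ages, centers=[20, 70])
print(centers, clusters)
```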
Data mining Algorithms
Association algorithms
find correlations between different attributes in a dataset. The most common application of this kind of algorithm is for creating association rules, which can be used in a market basket analysis.
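
Association rules are usually judged by their support (how often the items occur together) and confidence (how often the rule holds when its left-hand side occurs). The sketch below computes both for a hypothetical set of shopping baskets; the items and baskets are invented for illustration.

```python
# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

# Rule under test: bread -> milk
rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
print(rule_support, rule_confidence)
```

Here bread and milk appear together in 2 of 4 baskets, and 2 of the 3 baskets with bread also contain milk.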
Data mining Algorithms
Sequence analysis algorithms
summarize frequent sequences or episodes in data, such as a Web path flow.
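
A minimal sketch of the idea, assuming hypothetical Web sessions recorded as ordered lists of page names: count every run of consecutive pages of a given length and keep those that occur in at least a minimum number of sessions. Real sequence analysis algorithms are considerably more elaborate.

```python
from collections import Counter

def frequent_subpaths(sessions, length, min_count):
    """Return consecutive page runs of the given length seen in at least min_count sessions."""
    counts = Counter()
    for pages in sessions:
        # Use a set so that each run is counted at most once per session.
        seen = {tuple(pages[i:i + length]) for i in range(len(pages) - length + 1)}
        counts.update(seen)
    return {path: n for path, n in counts.items() if n >= min_count}

# Hypothetical click paths through a shop website.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "search", "product"],
    ["home", "product", "cart"],
]
frequent = frequent_subpaths(sessions, length=2, min_count=2)
print(frequent)
```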
Data mining Models
Risk and probability
Choosing the best customers for targeted mailings, determining the probable break-even point for risk scenarios, assigning probabilities to diagnoses or other outcomes
Data mining Models
Recommendations
Determining which products are likely to be sold together, generating recommendations
Data mining Models
Finding sequences
Analyzing customer selections in a shopping cart, predicting next likely events
Data mining Models
Grouping
Separating customers or events into clusters of related items, analyzing and predicting affinities