CCB-681: Data Mining


Unit 1

Basics of data mining, knowledge discovery in databases (KDD), KDD process, data mining task primitives, integration of data mining systems with a database or data warehouse system, major issues in data mining, data preprocessing: data cleaning, data integration and transformation, data reduction, etc.


Data mining is defined as extracting information from huge sets of data. In other words, data mining is the procedure of mining knowledge from data. Mined knowledge can be used for any of the following applications −

Market Analysis
Fraud Detection
Customer Retention
Production Control
Science Exploration


Why Data Mining?

Credit ratings/targeted marketing:
Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
Identify likely responders to sales promotions.

Fraud detection:
Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?


Customer relationship management:
Which of my customers are likely to be the most loyal, and which are most likely to leave for a competitor?

Data mining helps to extract such information.

Today’s Scenario: The Explosive Growth of Data

The solution is data mining: automated analysis of massive data sets.


What Is Data Mining?

Data mining (knowledge discovery from data)
Extraction of interesting (non-trivial (significant), implicit (hidden), previously unknown, and potentially useful) patterns or knowledge from huge amounts of data.

Alternative name
Data mining is also called knowledge discovery from data. More precisely, it is the analysis step of the "knowledge discovery in databases" (KDD) process.


Data mining

Process of semi-automatically analyzing large databases to find patterns that are:
valid: they hold on new data with some certainty.
novel: they are non-obvious to the system.
useful: it should be possible to act on them.
understandable: humans should be able to interpret them.
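The "valid" criterion can be illustrated with a small check: a pattern found on one set of records should still hold, at least approximately, on records it has not seen. The sketch below is only an illustration; the records, the `pattern_accuracy` helper, and the threshold of 3 late payments are all made up for this example and are not part of the course material.

```python
# Toy records: (late_payments, defaulted). All values are invented for illustration.
discovery_data = [(0, False), (1, False), (4, True), (5, True), (3, True), (0, False)]
new_data = [(0, False), (4, True), (3, False), (5, True)]

def pattern_accuracy(records, threshold=3):
    """Fraction of records where 'late_payments >= threshold' agrees with the default flag."""
    hits = sum((late >= threshold) == defaulted for late, defaulted in records)
    return hits / len(records)

# The hypothetical pattern fits the discovery data perfectly ...
print(pattern_accuracy(discovery_data))  # 1.0
# ... and still holds on unseen data, but only with some certainty.
print(pattern_accuracy(new_data))        # 0.75
```

A pattern whose accuracy collapsed on the new records would fail the validity criterion, however well it fit the data it was discovered on.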


The actual data mining task is the semi-automatic or automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies (association rule mining, sequential pattern mining).
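As a rough illustration of the "dependencies" case, the sketch below computes the two standard measures used in association rule mining, support and confidence, over a handful of toy transactions. The transactions and the helper names `support` and `confidence` are assumptions chosen for this example; real association rule miners such as Apriori search over many candidate itemsets rather than evaluating a single rule.

```python
# Toy market-basket transactions; the items are hypothetical.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "milk"}))       # 0.5: two of the four baskets
print(confidence({"bread"}, {"milk"}))  # about 0.67: two of the three bread baskets
```

A rule is usually reported only when both measures exceed user-chosen minimum thresholds.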

Is everything “data mining”?


Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. ... Data mining is the analysis step of the "knowledge discovery in databases" process or KDD.

Knowledge discovery is an iterative process.


Data Mining: A KDD Process

Data mining: core of the knowledge discovery process

[Figure: KDD process flow. Databases, Data Cleaning and Data Integration, Data Warehouse, Selection, Task-relevant Data, Data Mining, Pattern Evaluation]


The KDD process

The main objective of the KDD process is to extract information from data in the context of large databases.

Knowledge discovery in databases is regarded as an automated, exploratory (experimental) analysis and modeling of vast data repositories.

KDD is the organized procedure of identifying valid, useful, and understandable patterns in huge and complex data sets.

Data mining is the core of the KDD process: it comprises the algorithms that explore the data, develop the model, and find previously unknown patterns.


The model is used to extract knowledge from the data, analyze the data, and make predictions.

The knowledge discovery process is iterative and interactive, comprising nine steps. The process is iterative at each stage, meaning that moving back to previous steps might be required.

The process begins with determining the KDD objectives and ends with the implementation of the discovered knowledge. At that point, the loop is closed.


List of steps involved in the knowledge discovery process −

Data Cleaning − In this step, noisy and inconsistent data are removed.
Data Integration − In this step, multiple data sources are combined.
Data Selection − In this step, data relevant to the analysis task are retrieved from the database.
Data Transformation − In this step, data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.


Data Mining − In this step, intelligent methods are applied in order to extract data patterns.
Data Evaluation of patterns is covered next.
Pattern Evaluation − In this step, the extracted data patterns are evaluated.
Knowledge Presentation − In this step, the mined knowledge is presented to the user.
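The steps listed above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not a real KDD system: the two data sources, the field names, and every function below are hypothetical, and the "mining" step is a trivial ranking standing in for a real mining algorithm.

```python
from collections import Counter

# Two hypothetical sources with invented sales records.
store_a = [{"customer": "c1", "item": "milk", "amount": 2.0},
           {"customer": "c2", "item": "bread", "amount": None}]  # missing value
store_b = [{"customer": "c1", "item": "bread", "amount": 1.5},
           {"customer": "c3", "item": "milk", "amount": 2.0},
           {"customer": "c3", "item": "pen", "amount": 0.5}]

def clean(records):
    """Data cleaning: drop records with a missing amount."""
    return [r for r in records if r["amount"] is not None]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, items):
    """Data selection: keep only records relevant to the analysis task."""
    return [r for r in records if r["item"] in items]

def transform(records):
    """Data transformation: aggregate total spend per item."""
    totals = Counter()
    for r in records:
        totals[r["item"]] += r["amount"]
    return totals

def mine(totals):
    """Data mining (trivial stand-in): rank items by total spend."""
    return totals.most_common()

def evaluate(ranked, min_total):
    """Pattern evaluation: keep only items at or above a spend threshold."""
    return [(item, total) for item, total in ranked if total >= min_total]

def present(patterns):
    """Knowledge presentation: format the result for a reader."""
    return [f"{item}: total spend {total:.2f}" for item, total in patterns]

data = integrate(clean(store_a), clean(store_b))
data = select(data, items={"milk", "bread"})
report = present(evaluate(mine(transform(data)), min_total=2.0))
print(report)  # ['milk: total spend 4.00']
```

Because the process is iterative, a disappointing report would normally send the analyst back to an earlier step, for example to change the selection or the threshold, rather than ending the run.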
