DATA MINING Data Mining, Data Pattern and Machine Learning

Data Mining, Data Pattern, Machine Learning(Week 2


Page 1: Data Mining, Data Pattern, Machine Learning(Week 2

8/8/2019 Data Mining, Data Pattern, Machine Learning(Week 2

http://slidepdf.com/reader/full/data-mining-data-pattern-machine-learningweek-2 1/19

DATA MINING

Data Mining, Data Pattern

and Machine Learning


Definition

• “…the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.”

Hand, Mannila & Smyth

• “… an interdisciplinary field bringing together techniques from machine learning, pattern recognition, statistics, databases, and visualization to address the issue of information extraction from large databases.”

Evangelos Simoudis in Cabena et al.

• “… the extraction of implicit, previously unknown, and potentially useful information from data.”

Witten & Frank


Why Has Data Mining Appeared

• Large volumes of data stored by organizations in a competitive environment, combined with advances in technologies which can be applied to the data

• Background and evolution

• The need for exploratory data analysis

 –  Niche marketing, customer retention, the internet, online interaction, scientific discovery

• The means to implement Data Mining

 –  data warehouses, computing power, effective modelling approaches


Structural Pattern of Data


Structural Pattern of Data --cont--


Machine Learning

• To learn:

 –  To get knowledge of by study, experience, or being taught

 –  To become aware by information or from observation

 –  To commit to memory

 –  To be informed

 –  To receive instruction

• Learning:

 –  Things learn when they change their behavior in a way that makes them perform better in the future


Machine Learning --cont--

• Machine Learning is concerned with learning in a practical, not a theoretical, sense

• We are interested in techniques for finding and describing structural patterns in data, as a tool for helping to explain that data and make predictions from it


Data Mining

• Preliminary Analysis

 –  Much interesting information can be found by querying the data set

 –  May be supported by a visualisation of the data set

• Choose one or more modelling approaches

• There are (at least?) two styles of data mining

 –  Hypothesis testing

 – Knowledge discovery

• The styles and approaches are not mutually exclusive


The Process of Knowledge Discovery

• Pre-processing

 –  data selection

 –  cleaning

 –  coding

• Data Mining

 –  select a model

 –  apply the model

• Analysis of results and assimilation

 –  Take action and measure the results


Data Selection

• Identify the relevant data, both internal and

external to the organisation

• Select the subset of the data appropriate for

• Store the data in a database separate from

the operational systems


Data Pre-Processing

• Cleaning

 –  Domain consistency: replace certain values with null

 –  De-duplication: the same customer may be added to the database (DB) on each purchase transaction

 –  Disambiguation: highlighting ambiguities for a decision by the user

• e.g., if names differed slightly but addresses were the same
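The cleaning steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`name`, `address`, `age`), the valid age range, and the similarity threshold of 0.6 are all hypothetical and not taken from the slides. Name similarity is measured with the standard library's `difflib.SequenceMatcher`, which is one possible choice among many.

```python
from difflib import SequenceMatcher

# Hypothetical customer records; the field names are illustrative only.
records = [
    {"name": "J. Smith", "address": "1 High St", "age": 34},
    {"name": "John Smith", "address": "1 High St", "age": 34},
    {"name": "A. Jones", "address": "9 Low Rd", "age": -1},
]


def enforce_domain(record, low=0, high=120):
    """Domain consistency: replace an out-of-range age with None (null)."""
    cleaned = dict(record)
    age = cleaned["age"]
    if age is None or not (low <= age <= high):
        cleaned["age"] = None
    return cleaned


def flag_ambiguous(rows, threshold=0.6):
    """Disambiguation: list name pairs that share an address and look alike.

    The pairs are only highlighted; the decision is left to the user.
    """
    pairs = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            a, b = rows[i], rows[j]
            if a["address"] != b["address"] or a["name"] == b["name"]:
                continue
            ratio = SequenceMatcher(
                None, a["name"].lower(), b["name"].lower()
            ).ratio()
            if ratio >= threshold:
                pairs.append((a["name"], b["name"]))
    return pairs


cleaned = [enforce_domain(r) for r in records]
ambiguous = flag_ambiguous(cleaned)
```

Here the invalid age becomes null, and the two "Smith" records at the same address are reported for a human decision rather than merged automatically.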


Data Pre-Processing --cont--

• Enrichment

 –  Additional fields are added to records from external sources which may be vital in establishing relationships.

 –  e.g., take addresses and replace them with regional codes

 –  e.g., transform birth dates into age ranges

• It is often necessary to convert continuous data into range data for categorisation purposes.
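A minimal sketch of the two coding examples above, in Python. The postcode-to-region lookup (`REGION_CODES`), the ten-year range width, and the fixed reference date are assumptions made for the illustration; a real project would take them from its own domain.

```python
from datetime import date


def age_on(birth, today):
    """Whole years elapsed between a birth date and a reference date."""
    years = today.year - birth.year
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1  # birthday has not yet occurred this year
    return years


def age_range(age, width=10):
    """Convert a continuous age into a labelled range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical lookup from postcode prefix to regional code.
REGION_CODES = {"SW": "R1", "NE": "R2"}


def region_code(postcode):
    """Replace an address detail with a coarse regional code."""
    return REGION_CODES.get(postcode[:2].upper(), "UNKNOWN")


today = date(2019, 8, 8)  # fixed reference date, chosen for repeatability
customer = {"birth": date(1985, 9, 1), "postcode": "sw1a 1aa"}
coded = {
    "age_range": age_range(age_on(customer["birth"], today)),
    "region": region_code(customer["postcode"]),
}
```

The coded record keeps only the categorical values, which is usually what a categorisation model needs.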


Data Mining Tasks

• Various taxonomies exist. E.g. Berry & Linoff define 6 tasks:

 –  Classification

 –  Estimation (a.k.a. regression)

 –  Prediction

 –  Association Rule Discovery (a.k.a. Affinity Grouping)

 –  Clustering

 –  Description

• The tasks are also referred to as operations. Cabena et al. define 4 operations:

 –  Predictive Modelling

 –  Database Segmentation (a.k.a. clustering)

 –  Link Analysis

 –  Deviation Detection

• Beware! Different authors use different names for the same technique, operation or task.


Classification

• Classification involves considering the features of some object then assigning it to some pre-defined class, for example:

 –  Which phone numbers are fax numbers

 –  Which customers are high-value
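As an illustration of assigning an object to a pre-defined class from its features, the sketch below uses a nearest-centroid classifier. This technique is chosen here only for brevity; the slides do not prescribe one. The features (annual spend, orders per year) and the training examples are hypothetical, and the features are left unscaled, which a real application would normally correct.

```python
from math import dist

# Hypothetical labelled examples: (annual spend, orders per year) -> class.
training = [
    ((5200.0, 24.0), "high-value"),
    ((4800.0, 30.0), "high-value"),
    ((300.0, 2.0), "standard"),
    ((650.0, 5.0), "standard"),
]


def centroids(examples):
    """Average the feature vectors of each pre-defined class."""
    sums = {}
    for features, label in examples:
        total, count = sums.get(label, ([0.0] * len(features), 0))
        sums[label] = ([t + f for t, f in zip(total, features)], count + 1)
    return {
        label: tuple(t / n for t in total)
        for label, (total, n) in sums.items()
    }


def classify(features, class_centroids):
    """Assign the object to the class whose centroid is closest."""
    return min(
        class_centroids,
        key=lambda label: dist(features, class_centroids[label]),
    )


model = centroids(training)
label = classify((4000.0, 18.0), model)
```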


Regression

• Regression deals with numerically valued outcomes rather than discrete categories as occurs in classification.

 –  Estimating family income
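To make the contrast with classification concrete, here is a small least-squares sketch that estimates a numeric outcome. The data (years of education against family income in thousands) are invented for the example, and a single-predictor straight line is an assumption; real estimation problems usually involve more predictors.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: years of education vs family income (thousands).
years = [10, 12, 14, 16]
income = [30.0, 36.0, 42.0, 48.0]

slope, intercept = fit_line(years, income)
estimate = slope * 13 + intercept  # estimated income for 13 years
```

The output is a number on a continuous scale, not a class label.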


Prediction

• Essentially the same as classification and

estimation but involves future behavior

• Historical data is used to build a model

• The model developed is then applied to current

inputs to predict future outputs

 –  Predict which customers will respond to an

advertising promotion

 –  Classifying loan applications
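The build-then-apply pattern described above might look like the following sketch. The model is deliberately trivial (historical response rate per customer segment) and the segments, history, and 0.5 threshold are hypothetical; the point is only that the model is built from past data and then applied to current inputs.

```python
from collections import defaultdict

# Hypothetical history: (customer segment, responded to a past promotion?).
history = [
    ("student", True), ("student", True), ("student", False),
    ("retired", False), ("retired", False), ("retired", True),
    ("family", True), ("family", True),
]


def build_model(rows):
    """Estimate each segment's response rate from historical data."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [responses, total]
    for segment, responded in rows:
        counts[segment][0] += int(responded)
        counts[segment][1] += 1
    return {seg: hits / total for seg, (hits, total) in counts.items()}


def predict(model, segment, threshold=0.5):
    """Predict a response when the historical rate exceeds the threshold.

    Segments never seen in the history are predicted not to respond.
    """
    return model.get(segment, 0.0) > threshold


model = build_model(history)
current = ["student", "retired", "family", "unknown"]
predictions = {seg: predict(model, seg) for seg in current}
```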


Association Rule Discovery

• Association Rule Discovery is also referred to as Market Basket Analysis, or Affinity Grouping

• A common example is discovering which items are bought together at the supermarket. Once this is known, decisions can be made on, for example:

 –  how to arrange items on the shelves

 –  which items should be promoted together
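A small sketch of the supermarket example: it counts item pairs across baskets and reports support and confidence for each candidate rule. The baskets and the minimum support of 0.5 are made up, and only pairs are considered; full association rule algorithms such as Apriori also handle larger item sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical supermarket baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]


def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support    = share of baskets containing both items
    confidence = share of baskets with the antecedent that also
                 contain the consequent
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


rules = pair_rules(baskets)
```

In this toy data every basket containing milk also contains butter, so the rule milk → butter has confidence 1.0, which might suggest shelving or promoting the two together.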


Clustering

• Clustering is also sometimes referred to as segmentation (though this has other meanings in other fields)

• In clustering there are no pre-defined classes. A similarity measure is used to group records. The user must attach meaning to the clusters formed

• Clustering often precedes some other data mining task, for example:

 –  once customers are separated into clusters, a promotion might be carried out based on market basket analysis of the resulting cluster
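The idea of grouping records by a similarity measure, with no pre-defined classes, can be sketched with a one-dimensional k-means. The choice of k-means, the absolute difference as similarity measure, the spend figures, and the starting centres are all assumptions for this illustration.

```python
def kmeans_1d(values, centres, iterations=10):
    """Assign each value to its nearest centre, then recompute the centres."""
    centres = list(centres)
    clusters = [[] for _ in centres]
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for v in values:
            nearest = min(
                range(len(centres)), key=lambda i: abs(v - centres[i])
            )
            clusters[nearest].append(v)
        # Keep the previous centre if a cluster ends up empty.
        centres = [
            sum(c) / len(c) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
    return centres, clusters


# Hypothetical weekly spend per customer.
spend = [12.0, 15.0, 14.0, 95.0, 102.0, 99.0]
centres, clusters = kmeans_1d(spend, centres=[0.0, 50.0])
```

The algorithm only returns two groups of numbers. Calling them "low spenders" and "high spenders" is the meaning the user attaches afterwards.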


Deviation Detection

• Records whose attributes deviate from the norm by significant amounts are also called outliers

• Application areas include:

 –  fraud detection

 –  tracing defects

• Visualization techniques and statistical techniques are useful in finding outliers

• A cluster which contains only a few records may in fact represent outliers
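One simple statistical technique for finding outliers is the z-score: flag values that lie more than a chosen number of standard deviations from the mean. The sketch below assumes a threshold of 2.0 and uses invented transaction amounts; both are illustrative, and a z-score is only one of several possible deviation measures.

```python
from statistics import mean, pstdev


def outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical transaction amounts; one is far from the rest.
amounts = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 18.0, 250.0]
flagged = outliers(amounts)
```

Note that a single extreme value inflates both the mean and the standard deviation, so in small samples a robust measure (for example, one based on the median) may be a better choice.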
