clifton-knight
Data Mining
Data Mining Taxonomy
Predictive Method
- …predict the value of a particular attribute…
Descriptive Method
- …foundation of human-interpretable patterns that describe the data…
Overview
- Introduction
- Data Mining Taxonomy
- Data Mining Models and Algorithms
- Quick Wins with Data Mining
- Privacy-Preserving Data Mining
Definition of Data Mining
“…The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data…”
Fayyad, Piatetsky-Shapiro, Smyth [1996]
Data Mining Taxonomy
Descriptive Models
- Clustering
- Association
Examples:
- creation of different customer segments;
- unrelated products that are bought together (market basket analysis).
Predictive Models
- Classification
- Regression
Examples:
- a customer's likelihood of switching to a competitor;
- an insurance claim's likelihood of being fraudulent;
- the likelihood someone will place a catalog order;
- the revenue a customer will generate during the next year.
Classification & Regression
Classification: …aim to identify the characteristics that indicate the group to which each case belongs…
Two Crows Corporation
Regression: …uses existing values to forecast what other values will be…
Two Crows Corporation
Clustering & Association
Clustering: …divides a database into different groups… …find groups that are very different from each other, with similar members…
Two Crows Corporation
Association: …involve determinations of affinity - how frequently two or more things occur together…
Two Crows Corporation
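The idea of "how frequently two or more things occur together" can be illustrated with a minimal sketch that counts item pairs across baskets. The baskets below are invented for illustration, the function name `pair_counts` is hypothetical, and the fraction of baskets containing a pair is what is usually called its support; real association algorithms add pruning that this sketch leaves out.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how often each unordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical baskets, invented for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
]

counts = pair_counts(baskets)
# Support: the fraction of all baskets that contain the pair.
support = {pair: n / len(baskets) for pair, n in counts.items()}
print(counts.most_common(3))
```

Pairs with high support are candidates for the "bought together" patterns of market basket analysis.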
Deviation Detection & Pattern Discovery
Deviation Detection:
…discovering most significant changes in data from previously measured or normative values…
V. Kumar, M. Joshi, Tutorial on High Performance Data Mining.
Sequential Pattern Discovery:
…process of looking for patterns and rules that predict strong sequential dependencies among different events…
V. Kumar, M. Joshi, Tutorial on High Performance Data Mining.
Data Mining Models & Algorithms
- Neural Networks
- Decision Trees
- Rule Induction
- K-nearest Neighbor
- Logistic Regression
- Discriminant Analysis
Neural Networks
- efficiently model large and complex problems;
- may be used in classification problems or for regressions;
- start with an input layer => hidden layer => output layer.
[Figure: neural network with nodes numbered 1 to 6, labeled Inputs, Hidden Layer, and Output]
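The input layer => hidden layer => output layer flow can be sketched as two successive layer computations. This is a minimal illustration under stated assumptions: the layer sizes, the weight values, the sigmoid transform, and the names `layer` and `forward` are all choices made for the example, not taken from the slides.

```python
import math


def sigmoid(a):
    """Squash an activation into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def layer(inputs, weights):
    """Compute one layer: each row of `weights` holds the input weights of one unit."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def forward(inputs, hidden_weights, output_weights):
    """Pass the inputs through the hidden layer, then through the output layer."""
    hidden = layer(inputs, hidden_weights)
    return layer(hidden, output_weights)


# Hypothetical network: 2 inputs, 3 hidden units, 1 output unit.
hidden_weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
output_weights = [[0.7, -0.5, 0.2]]
print(forward([1.0, 2.0], hidden_weights, output_weights))
```

Training consists of adjusting the numbers in `hidden_weights` and `output_weights`; the forward pass itself stays the same.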
Neural Networks (cont.)
- can be easily implemented to run on massively parallel computers;
- cannot be easily interpreted;
- require an extensive amount of training time;
- require a lot of data preparation (involving very careful data cleansing, selection, preparation, and pre-processing);
- require a sufficiently large data set and a high signal-to-noise ratio.
Decision Trees
- a way of representing a series of rules that lead to a class or value;
- basic components of a decision tree: decision nodes, branches, and leaves.
[Figure: example decision tree]
Income>40,000
  No: Job>5
    Yes: Low Risk
    No: High Risk
  Yes: High Debt
    Yes: High Risk
    No: Low Risk
Decision Trees (cont.)
- handle non-numeric data very well;
- work best when the predictor variables are categorical.
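A decision tree is equivalent to a series of nested rules, which a short sketch can make concrete. The code below assumes one reading of the flattened example figure (Income>40,000 at the root; the "No" branch tests Job>5, the "Yes" branch tests High Debt); that reading, the function name `credit_risk`, and the parameter names are assumptions made for illustration.

```python
def credit_risk(income_over_40000, job_over_5, high_debt):
    """Walk the example tree from the root decision node down to a leaf.

    Assumed reading of the figure:
      Income>40,000 = No  -> test Job>5     (Yes: Low Risk, No: High Risk)
      Income>40,000 = Yes -> test High Debt (Yes: High Risk, No: Low Risk)
    """
    if income_over_40000:
        return "High Risk" if high_debt else "Low Risk"
    return "Low Risk" if job_over_5 else "High Risk"


print(credit_risk(income_over_40000=True, job_over_5=False, high_debt=False))
print(credit_risk(income_over_40000=False, job_over_5=False, high_debt=False))
```

Each path from the root to a leaf corresponds to one rule, for example "if Income>40,000 and not High Debt, then Low Risk".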
Rule Induction
- method of deriving a set of rules to classify cases;
- generates a set of independent rules which do not necessarily form a tree;
- the rules may not cover all possible situations;
- the rules may sometimes conflict in their predictions.
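Because induced rules are independent of one another, a case can match several rules that disagree, or match none at all. The sketch below shows both situations; the three rules, the field names (`income`, `job`, `high_debt`), and the function `matching_predictions` are hypothetical and were written by hand for illustration rather than induced from data.

```python
def matching_predictions(case, rules):
    """Return the prediction of every rule whose condition the case satisfies."""
    return [label for condition, label in rules if condition(case)]


# Hypothetical, hand-written rules; each is a (condition, prediction) pair.
rules = [
    (lambda c: c["income"] > 40000 and not c["high_debt"], "Low Risk"),
    (lambda c: c["high_debt"], "High Risk"),
    (lambda c: c["job"] > 5, "Low Risk"),
]

conflicting = {"income": 50000, "job": 10, "high_debt": True}
uncovered = {"income": 30000, "job": 2, "high_debt": False}

print(matching_predictions(conflicting, rules))  # two rules fire and disagree
print(matching_predictions(uncovered, rules))    # no rule fires
```

A practical system needs a policy for both cases, such as a default class for uncovered cases and a priority order for conflicts.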
K-nearest neighbor
- decides in which class to place a new case by examining some number of the most similar cases or neighbors;
- assigns the new case to the same class to which most of its neighbors belong;
[Figure: a new case N surrounded by neighboring cases of classes X and Y]
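The two steps above (find the most similar cases, then take the majority class) can be sketched as follows. This is a minimal illustration assuming numeric features and Euclidean distance as the similarity measure; the data points and the function name `knn_classify` are invented for the example.

```python
import math
from collections import Counter


def knn_classify(new_case, labeled_cases, k=3):
    """Assign new_case the class held by most of its k nearest neighbors."""
    # Sort the known cases by Euclidean distance to the new case and keep k of them.
    nearest = sorted(labeled_cases, key=lambda item: math.dist(item[0], new_case))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled cases: (features, class).
cases = [
    ((1.0, 1.0), "X"),
    ((1.0, 2.0), "X"),
    ((2.0, 1.0), "X"),
    ((5.0, 5.0), "Y"),
    ((6.0, 5.0), "Y"),
]

print(knn_classify((1.5, 1.5), cases, k=3))
print(knn_classify((5.5, 5.0), cases, k=3))
```

With two classes, an odd `k` avoids tied votes.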
Artificial Neural Networks
Introduction
What is neural computing/neural networks?
The brain is a remarkable computer. It interprets imprecise information from the senses at an incredibly high speed.
Introduction
• A good example is the processing of visual information: a one-year-old baby is much better and faster at recognising objects, faces, and other visual features than even the most advanced AI system running on the fastest supercomputer.
• Most impressive of all, the brain learns (without any explicit instructions) to create the internal representations that make these skills possible.
Biological Neural Systems
The brain is composed of approximately 100 billion ($10^{11}$) neurons.
[Figure: schematic drawing of two biological neurons connected by synapses, with labels Dendrites, Synapse, and Axon]
A typical neuron collects signals from other neurons through a host of fine structures called dendrites.
The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches.
At the end of the branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons.
When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon.
Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes.
What is a Neural Net?
A neural net simulates some of the learning functions of the human brain. It can recognize patterns and "learn." You can use it to forecast and make smarter business decisions. It can also serve as an "expert system" that simulates the thinking of an expert and can offer advice. Unlike conventional rule-based artificial-intelligence software, a neural net extracts expertise from data automatically - no rules are required. In other words, through trial and error the system "learns" to become an "expert" in the field the user gives it to study.
Components Needed:
In order for a neural network to learn, it needs 2 basic components:
• Inputs: any information the expert uses to determine his/her final decision or outcome.
• Outputs: the decisions or outcomes arrived at by the expert that correspond to the inputs entered.
How does a neural network learn?
A neural network learns by determining the relation between the inputs and outputs.
By calculating the relative importance of the inputs and outputs, the system can determine such relationships.
Through trial and error, the system compares its results with the expert-provided results in the data until it has reached an accuracy level defined by the user. With each trial, the weights assigned to the inputs are changed until the desired results are reached.
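The trial-and-error loop described above can be sketched with the classic perceptron learning rule. The slides do not name a specific update rule, so the rule, the learning rate, the constant first input (which acts as an adjustable threshold), and the names `predict` and `train` are assumptions made for this illustration.

```python
def predict(weights, inputs):
    """Output 1 if the weighted sum of the inputs is positive, otherwise 0."""
    a = sum(w * x for w, x in zip(weights, inputs))
    return 1 if a > 0 else 0


def train(examples, target_accuracy=1.0, rate=0.1, max_trials=1000):
    """Adjust the weights trial by trial until the user-defined accuracy is reached."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(max_trials):
        correct = 0
        for inputs, expected in examples:
            error = expected - predict(weights, inputs)
            if error == 0:
                correct += 1
            else:
                # Nudge each weight in the direction that reduces the error.
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
        if correct / len(examples) >= target_accuracy:
            break
    return weights


# Hypothetical "expert" data: logical OR, with a constant first input of 1.
examples = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]
weights = train(examples)
print(weights)
```

The loop stops either when a full pass over the data meets `target_accuracy` or after `max_trials` passes, whichever comes first.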
Artificial Neural Networks
Artificial neurons are analogous to the biological neurons that inspired them.
[Figure: An artificial neuron, with inputs $x_1, x_2, \dots, x_N$ weighted by $w_1, w_2, \dots, w_N$, summed into the activation $a$ and passed through $f$ to give the output $y$]
Here the neuron is actually a processing unit: it calculates the weighted sum of its input signals to generate the activation signal $a$, given by
$$a = \sum_{i=1}^{N} w_i x_i$$
where $w_i$ is the strength of the synapse connected to the neuron and $x_i$ is an input feature to the neuron.
Artificial Neural Networks
The activation signal is passed through a transform function to produce the output of the neuron, given by
$$y = f(a)$$
The transform function can be linear, or non-linear, such as a threshold or sigmoid function [more later …].
For a linear function, the output y is proportional to the activation signal a. For a threshold function, the output y is set at one of two levels, depending on whether the activation signal a is greater than or less than some threshold value. For a sigmoid function, the output y varies continuously as the activation signal a changes.
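The weighted sum and the three kinds of transform function can be put together in a minimal sketch of a single artificial neuron. The weight and input values, the default slope and threshold, the two output levels 0 and 1, and the logistic form of the sigmoid are assumptions chosen for illustration.

```python
import math


def activation(weights, inputs):
    """Activation signal a: the weighted sum of the inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def linear(a, slope=1.0):
    """Output proportional to the activation signal."""
    return slope * a


def threshold(a, theta=0.0, low=0.0, high=1.0):
    """Output set at one of two levels depending on whether a exceeds theta."""
    return high if a > theta else low


def sigmoid(a):
    """Output that varies continuously between 0 and 1 as a changes."""
    return 1.0 / (1.0 + math.exp(-a))


def neuron(weights, inputs, f):
    """Output y = f(a) for a chosen transform function f."""
    return f(activation(weights, inputs))


weights = [0.5, -1.0, 2.0]  # hypothetical synapse strengths
inputs = [1.0, 2.0, 0.5]    # hypothetical input features
for f in (linear, threshold, sigmoid):
    print(f.__name__, neuron(weights, inputs, f))
```

Only the choice of `f` differs between the three outputs; the activation is the same in each case.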
Artificial Neural Networks
Artificial neural network models (or simply neural networks) are typically composed of interconnected units or artificial neurons. How the neurons are connected depends on some specific task that the neural network performs.
Two key features of neural networks distinguish them from any other sort of computing developed to date:
- Neural networks are adaptive, or trainable
- Neural networks are naturally massively parallel
These features suggest the potential for neural network systems capable of learning, autonomously improving their own performance, adapting automatically to changing environments, being able to make decisions at high speed and being fault tolerant.
Neural Network Architectures
Feed-forward single layered networks
Feed-forward multi-layer networks
Recurrent networks
Neural Network Applications
- Speech/voice recognition
- Optical character recognition
- Face detection/recognition
- Pronunciation (NETtalk)
- Stock-market prediction
- Navigation of a car
- Signal processing/communication
- Imaging/vision
- …