STATE-RECOGNIZED UNIVERSITY OF APPLIED SCIENCES
STUDY AND TAKE OFF.
Author I: Dipl.-Inf. (FH) Johannes Hoppe
Author II: M.Sc. Johannes Hofmeister
Author III: Prof. Dr. Dieter Homeister
Date: 13.05.2011
Further Data Mining Algorithms
01 Data Mining Algorithms - Regression Analysis
Regression Analysis
› Also known as function approximation
› Includes any technique for modeling and analyzing several variables
› Models the relationship between one or more variables you are trying to predict (dependent variables) and the predictive variables (independent variables)
SSAS built-in
› MS Linear Regression Analysis
› MS Logistic Regression Analysis
› MS Time Series Algorithm
http://msdn.microsoft.com/en-us/library/ms170993(SQL.90).aspx
Linear Regression
› Analyzes two continuous columns
› The relationship is an equation
› The equation is a line (linear equation): f(x) = m*x + b
› Error == distance from the regression line
http://msdn.microsoft.com/en-us/library/ms174824(SQL.90).aspx
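How m and b in f(x) = m*x + b can be obtained is sketched below with ordinary least squares. This is a minimal illustration, not the SSAS implementation. It assumes that "distance from the regression line" means the vertical distance (the residual). The function names and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Return slope m and intercept b of the least-squares line f(x) = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def residuals(xs, ys, m, b):
    """Vertical distance of each point from the line (assumed error measure)."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


# Hypothetical data lying exactly on the line y = 2*x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, b = fit_line(xs, ys)
errors = residuals(xs, ys, m, b)
```

With real data the residuals are not zero; least squares picks the m and b that make the sum of their squares as small as possible.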
Example
[Chart: scatter plot of Sales with a linear trend line; x-axis 0 to 600, y-axis 0 to 6000]
f(x) = 7.81381138497918 x + 866.585289444156
R² = 0.701037764746929
Explanation
The diagram shows the relationship between sales and advertising along with the regression equation. The goal is to predict sales based on the amount spent on advertising. The graph shows a clearly linear relationship between sales and advertising. A key measure of the strength of the relationship is the R-square, which measures the share of the overall variation in the data that is explained by the model. This regression analysis results in an R-square of 70%, which implies that 70% of the variation in sales can be explained by the variation in advertising.
[Source: Olivia Parr Rud et al., Data Mining Cookbook]
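The R-square described above can be computed as one minus the ratio of unexplained to total variation. The sketch below uses the trend-line coefficients from the chart; the observations are hypothetical (the original data points are not part of the slides), so the resulting value will not match the 70% from the example.

```python
def r_squared(observed, predicted):
    """Share of the variation in `observed` that is explained by `predicted`."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in observed)                # total
    return 1.0 - ss_res / ss_tot


def predict_sales(advertising):
    """Trend line from the chart: f(x) = 7.81381138497918 x + 866.585289444156."""
    return 7.81381138497918 * advertising + 866.585289444156


# Hypothetical observations, for illustration only.
advertising = [100.0, 200.0, 300.0, 400.0, 500.0]
sales = [1500.0, 2600.0, 3000.0, 4200.0, 4600.0]

predicted = [predict_sales(a) for a in advertising]
r2 = r_squared(sales, predicted)
```

An R-square of 1.0 would mean every point lies on the line; values closer to 0 mean the line explains little of the variation.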
Logistic regression
› Dependent variables have values between 0 and 1
› A function that describes the probability of a given event
› Instead of creating a straight line, logistic regression analysis creates an "S"-shaped curve that contains maximum and minimum constraints
› Note: the algorithm described on Wikipedia != the algorithm documented on MSDN
http://msdn.microsoft.com/en-us/library/ms174828(SQL.90).aspx
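The "S"-shaped curve can be illustrated with the textbook logistic function. This is a sketch of the general idea only, not the Microsoft algorithm (which, as noted above, differs). The names `event_probability`, `weight`, and `bias` are hypothetical.

```python
import math


def logistic(z):
    """Textbook logistic function: maps any real z into the open interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form that avoids overflow for large negative z.
    e = math.exp(z)
    return e / (1.0 + e)


def event_probability(x, weight, bias):
    """Probability of the event for input x under a one-variable logistic model."""
    return logistic(weight * x + bias)


# Hypothetical model parameters, for illustration only.
low = event_probability(-5.0, weight=1.0, bias=0.0)
mid = event_probability(0.0, weight=1.0, bias=0.0)
high = event_probability(5.0, weight=1.0, bias=0.0)
```

The curve flattens toward 0 for very negative inputs and toward 1 for very positive inputs, which are the minimum and maximum constraints mentioned above.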
MS Time-Series Algorithm
› Trend analysis
› Optimized for analyzing continuous values
  › e.g. product sales over time
› Train, then predict
› Cross-predictions possible! *
* cool!
http://msdn.microsoft.com/en-us/library/ms174923(SQL.90).aspx
MS Time-Series Algorithm
› Combination of two algorithms; their results are blended
› ARTxp
  › Autoregressive tree method
  › Developed by Microsoft Research
  › Based on Microsoft Decision Trees
  › For short-term predictions
› ARIMA
  › Autoregressive Integrated Moving Average
  › Developed by Box and Jenkins
  › For long-term predictions
http://msdn.microsoft.com/en-us/library/ms174828(SQL.90).aspx
http://msdn.microsoft.com/en-us/library/bb677216.aspx
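Both algorithm names contain "autoregressive", meaning that a value is predicted from its own earlier values. The sketch below shows only that shared idea with a first-order model x[t] = c + phi * x[t-1], fitted by least squares. It is neither ARTxp nor ARIMA, and the function names and sample series are hypothetical.

```python
def fit_ar1(series):
    """Least-squares estimate of c and phi in x[t] = c + phi * x[t-1]."""
    prev = series[:-1]   # x[t-1]
    curr = series[1:]    # x[t]
    n = len(prev)
    mean_p = sum(prev) / n
    mean_c = sum(curr) / n
    sxy = sum((p - mean_p) * (q - mean_c) for p, q in zip(prev, curr))
    sxx = sum((p - mean_p) ** 2 for p in prev)
    phi = sxy / sxx
    c = mean_c - phi * mean_p
    return c, phi


def forecast(series, c, phi, steps):
    """Predict the next `steps` values, feeding each prediction back in."""
    predictions = []
    last = series[-1]
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions


# Hypothetical product sales that grow by 10% per period.
sales = [100.0, 110.0, 121.0, 133.1, 146.41]
c, phi = fit_ar1(sales)
next_two = forecast(sales, c, phi, 2)
```

Because each forecast is built on the previous forecast, errors accumulate with every step, which is one reason a short-term and a long-term method are combined.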
02 Data Mining Algorithms - Neural Networks
Neural Networks (NN or ANN)
› Better term: artificial neural networks (ANN), as opposed to biological NN
› Sometimes called neuronal networks
› By the way… http://code.google.com/p/clustered-neuronal-network/wiki/ProjektInfos
Definition
› A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use.
› It resembles the brain in two respects:
  › Knowledge is acquired by the network through a learning process.
  › Interneuron connection strengths known as synaptic weights are used to store the knowledge.
[Source: Haykin, S. (1994), Neural Networks: A Comprehensive Foundation, NY: Macmillan.]
› Most NNs are composed of several layers of neurons
› The direction of most connections is from input to output
› Often used: backpropagation networks
› A single neuron has several inputs with individual weights and one output
› In the basic form, the output is activated if the sum of inputs*weights exceeds a given threshold
› Learning is done with a target value at an additional training input plus a training mode signal
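The single neuron in its basic form can be sketched as follows. The output rule follows the description above. The training loop uses the simple perceptron learning rule as an assumption, not the backpropagation mentioned in the list; the target value plays the role of the training input. All names and the logical-AND sample data are hypothetical.

```python
def neuron_output(inputs, weights, threshold):
    """Fire (1) if the sum of inputs*weights exceeds the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0


def train(samples, weights, threshold, rate=0.25, epochs=20):
    """Perceptron rule (assumed): nudge weights and threshold toward the target."""
    weights = list(weights)
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - neuron_output(inputs, weights, threshold)
            # Raise weights of active inputs if the neuron should have fired,
            # lower them if it fired wrongly; move the threshold the other way.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            threshold -= rate * error
    return weights, threshold


# Hypothetical training data: logical AND of two inputs.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, threshold = train(and_samples, [0.0, 0.0], 0.0)
outputs = [neuron_output(inputs, weights, threshold) for inputs, _ in and_samples]
```

A single neuron of this kind can only separate classes with a straight line, which is why most networks stack several layers of neurons.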
THANK YOU FOR YOUR ATTENTION