
Neural Networks and Data Mining, Slide 1

Artificial Neural Networks and Data Mining

Uwe Lämmel
Wismar Business School
www.wi.hs-wismar.de/~laemmel
[email protected]

Slide 2

Content

Data Mining
Classification: approach
Data Mining Cup
  – 2004: Who will cancel?
  – 2007: Who will get a rebate coupon?
  – 2008: How long will someone participate in a lottery?
  – 2009: Forecast of book sales figures
  – 2010: ?
Clustering: approach
  – Behaviour of bank customers

Slide 3

Data Mining

Data Mining is the
– systematic and automated
– discovery and extraction
– of previously unknown knowledge
– out of huge amounts of data.

"KDD – Knowledge Discovery in Databases" is used as a synonym.

The notion is misleading: gold mining extracts gold, but data mining extracts knowledge from data, not data.

Slide 4

Data Mining – Applications

– classification
– clustering
– association
– prediction
– text mining
– web mining

clustering: partitioning a data set into subsets (clusters) so that the data in each subset (ideally) share some common features, i.e. similarity or proximity with respect to some defined distance measure; clustering builds the classes

classification: items are placed in subsets (classes) whose properties are known, e.g.
– customer is bad, average, good
– pattern recognition
– …
A set of training items is used to train the classification algorithm.
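To make the clustering definition concrete, here is a minimal sketch of one common distance-based partitioning technique, k-means with Euclidean distance. It is not taken from the slides (which later use a SOM); the toy points and the naive initialisation with the first k points are assumptions for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Partition points into k clusters using Euclidean distance (Lloyd's k-means)."""
    centers = [tuple(p) for p in points[:k]]  # naive initialisation: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centers


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The distance measure (`math.dist`) is the only place where "similarity" is defined; swapping it changes what the clusters mean.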

Slide 5

Data Mining Process

[Figure: CRISP-DM model (Cross-Industry Standard Process for Data Mining)]

Slide 6

Content

Data Mining
Classification: approach using NN
Data Mining Cup
Clustering: approach

Slide 7

Classification using NN

Prerequisite: a set of training patterns (many patterns)

Approach:
– code the values
– divide the set of training patterns into
  – a training set
  – a test set
– build a network
– train the network using the training set
– check the network quality using the test set

[Figure: real data → training patterns → coded patterns → training set and test set]
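The division of the coded patterns into a training set and a test set could look like the following Python sketch. The 30% test share, the fixed seed and the toy patterns are assumptions; the slides do not prescribe a ratio.

```python
import random


def split_patterns(patterns, test_fraction=0.3, seed=42):
    """Shuffle coded patterns and divide them into a training set and a test set."""
    shuffled = list(patterns)  # copy, so the caller's list stays untouched
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


# Hypothetical coded patterns: (input vector, teaching output)
patterns = [([i / 10, (i % 3) / 2], [1 if i % 2 else 0]) for i in range(10)]
train, test = split_patterns(patterns)
print(len(train), len(test))  # 7 3
```

Shuffling before splitting avoids a test set that only contains patterns from the end of the file.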

Slide 8

Development of an NN-application

[Flowchart, reconstructed as steps:]
1. Build a network architecture.
2. Input of training patterns: calculate the network output and compare it to the teaching output.
3. Error is too high: modify the weights and continue with step 2. Quality is good enough: continue with step 4.
4. Use the test set data: evaluate the output and compare it to the teaching output.
5. Error is too high: change parameters and repeat. Quality is good enough: done.
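The development cycle can be sketched with a single logistic neuron trained by the delta rule: calculate the output, compare it to the teaching output, modify the weights while the error is too high, then check the result on test data. This is an illustrative stand-in, not the JavaNNS implementation; the learning rate, target error and the OR patterns (the tiny test set simply reuses two of them) are assumptions.

```python
import math
import random


def output(weights, bias, inputs):
    """Network output of a single logistic neuron."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-net))


def error(weights, bias, patterns):
    """Mean squared difference between network output and teaching output."""
    return sum((t - output(weights, bias, x)) ** 2 for x, t in patterns) / len(patterns)


def train(patterns, eta=1.0, target_error=0.01, max_epochs=5000, seed=1):
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in patterns[0][0]]
    bias = 0.0
    for _ in range(max_epochs):
        if error(weights, bias, patterns) <= target_error:  # quality is good enough
            break
        for x, t in patterns:  # error is too high: modify weights
            o = output(weights, bias, x)
            delta = (t - o) * o * (1 - o)  # negative gradient of the squared error
            weights = [w + eta * delta * xi for w, xi in zip(weights, x)]
            bias += eta * delta
    return weights, bias


training_set = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]  # logical OR
test_set = [([0, 1], 1), ([0, 0], 0)]
w, b = train(training_set)
print(f"test error: {error(w, b, test_set):.4f}")
```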

Slide 9

Build an Artificial Neural Network

Number of input neurons?
– depends on the number of attributes
– depends on the coding

Number of output neurons?
– depends on the coding of the class attribute

Number of hidden neurons?
– experiments necessary
– generally: not more than the number of input neurons
– a quarter to a half of the number of input neurons may work
– see capacity of a neural network
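A small helper can count input neurons for a given coding and apply the quarter-to-half rule of thumb for hidden neurons. The attribute list is hypothetical, and one-hot coding of symbolic attributes (one neuron per value) is an assumed coding scheme.

```python
def input_neurons(attributes):
    """1 neuron per numeric attribute, 1 per value of a one-hot coded symbolic attribute."""
    count = 0
    for _name, kind, values in attributes:
        count += len(values) if kind == "symbolic" else 1
    return count


def hidden_neuron_range(n_inputs):
    """Rule of thumb from the slide: a quarter to a half of the input neurons."""
    return max(1, n_inputs // 4), max(1, n_inputs // 2)


# Hypothetical attribute description: (name, kind, values)
attributes = [
    ("age", "numeric", None),
    ("income", "numeric", None),
    ("customer_type", "symbolic", ["bad", "average", "good"]),
    ("region", "symbolic", ["north", "south", "east", "west"]),
]
n_in = input_neurons(attributes)
print(n_in, hidden_neuron_range(n_in))  # 9 (2, 4)
```

The range is only a starting point for the experiments the slide asks for.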

Slide 10

Experiments using the JavaNNS

– build a network
– load the training patterns
– open the Error Graph
– open the Control Panel
– initialize the network
– try different learning parameters: 0.1, 0.2, 0.5, 0.8
– start learning

Slide 11

Getting Results

– assess the error
Finally:
– make the test patterns the current pattern set
– Save Data …
  – include output files
  – save as a .res file
– evaluate the .res file
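Once the teaching outputs and network outputs have been read from the .res file, the evaluation can be as simple as thresholding and counting. This sketch does not parse the .res format itself; the 0.5 threshold, the single output neuron and the example values are assumptions.

```python
from collections import Counter


def evaluate(teaching, outputs, threshold=0.5):
    """Compare network outputs with teaching outputs for a single-output-neuron classifier."""
    predicted = [1 if o >= threshold else 0 for o in outputs]
    confusion = Counter(zip(teaching, predicted))  # (teaching, predicted) -> count
    accuracy = sum(t == p for t, p in zip(teaching, predicted)) / len(teaching)
    return accuracy, confusion


# Hypothetical values, e.g. teaching-output and output columns of a result file
teaching = [1, 0, 0, 1, 1, 0]
outputs = [0.91, 0.12, 0.64, 0.78, 0.33, 0.05]
acc, conf = evaluate(teaching, outputs)
print(acc)           # 0.6666666666666666
print(conf[(0, 1)])  # 1 (one false positive)
```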

Slide 12

Experiments

How can we improve the results?
– data pre-processing?
– architecture of the ANN?
– learning parameters?
– evaluation of the results: post-processing?

Record your work!

Slide 13

Content

Data Mining
Classification: approach
Data Mining Cup
  – 2004: Who will cancel?
  – 2007: Who will get a rebate coupon?
  – 2008: How long will someone participate in a lottery?
  – 2009: Forecast of book sales figures
  – 2010: ?
Clustering: approach
  – Behaviour of bank customers

Slide 14

Data Mining Cup: www.data-mining-cup.de

– annual competition for students
– runs April to May/June
– real-world problem:
  – problem
  – set of training data
  – set of data for classification
  – to be developed: a classification
– supported by many companies (data/software)
– ~200–300 participants
– workshop (user day)

Slide 15

DMC 2004: A Mailing Action

Mailing action of a company:
– special offer
– estimated annual income per customer:

                 customer will cancel   customer will not cancel
gets an offer    43.80€                 66.30€
gets no offer     0.00€                 72.00€

Given:
– 10,000 sets of customer data, containing 1,000 cancellers (training)

Problem:
– the test set contains 10,000 sets of customer data
– Who will cancel?
– Whom to send an offer?

Slide 16

Mailing Action – Aim?

No mailing action:
– 9,000 × 72.00€ = 648,000€

Everybody gets an offer:
– 1,000 × 43.80€ + 9,000 × 66.30€ = 640,500€

Maximum (100% correct classification):
– 1,000 × 43.80€ + 9,000 × 72.00€ = 691,800€

                 customer will cancel   customer will not cancel
gets an offer    43.80€                 66.30€
gets no offer     0.00€                 72.00€
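The three scenarios above can be recomputed from the payoff table with a few lines of Python; `total_income` is a hypothetical helper name.

```python
# Income per customer (EUR) from the DMC 2004 payoff table: (will cancel, will not cancel)
OFFER = (43.80, 66.30)
NO_OFFER = (0.00, 72.00)


def total_income(cancellers_with_offer, loyal_with_offer, cancellers=1_000, loyal=9_000):
    """Income when the given numbers of cancellers and loyal customers receive the offer."""
    return (cancellers_with_offer * OFFER[0]
            + (cancellers - cancellers_with_offer) * NO_OFFER[0]
            + loyal_with_offer * OFFER[1]
            + (loyal - loyal_with_offer) * NO_OFFER[1])


print(round(total_income(0, 0), 2))          # no mailing action: 648000.0
print(round(total_income(1_000, 9_000), 2))  # everybody gets an offer: 640500.0
print(round(total_income(1_000, 0), 2))      # perfect classification: 691800.0
```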

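Slide 17

Goal Function: Lift

Basis: no mailing action, 9,000 · 72.00€
Goal: extra income

$\mathrm{lift}_M = 43.80 \cdot c_M + 66.30 \cdot nk_M - 72.00 \cdot nk_M = 43.80 \cdot c_M - 5.70 \cdot nk_M$

where $c_M$ is the number of cancellers and $nk_M$ the number of non-cancellers in the mailing group $M$ (the customers who get an offer).

                 customer will cancel   customer will not cancel
gets an offer    43.80€                 66.30€
gets no offer     0.00€                 72.00€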
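The lift can also be computed directly from actual and predicted labels. The payoffs are those of the table; the toy data is hypothetical.

```python
def lift(actual, offer):
    """Extra income compared with no mailing action (DMC 2004 payoffs).

    actual[i] is True if customer i will cancel; offer[i] is True if customer i gets an offer.
    """
    c_m = sum(1 for a, o in zip(actual, offer) if o and a)       # cancellers in mailing group M
    nk_m = sum(1 for a, o in zip(actual, offer) if o and not a)  # non-cancellers in M
    return 43.80 * c_m + 66.30 * nk_m - 72.00 * nk_m


# Hypothetical: 10 customers, 2 of them cancel; the classifier mails customers 0 and 2
actual = [True, True] + [False] * 8
offer = [True, False, True] + [False] * 7
print(round(lift(actual, offer), 2))  # 43.80 - 5.70 = 38.1
```

With perfect classification of the 10,000 DMC customers the lift is 43,800€, i.e. 691,800€ − 648,000€; mailing everybody gives −7,500€.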

Slide 18

Data

[Figure: screenshot of the data, annotated "results", "important" and "missing values"; 32 input attributes]
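The slide does not say how the missing values were treated. One simple option, shown here as an assumption, is to fill them with the column mean and then scale each numeric attribute to [0, 1] before feeding it to the network.

```python
def code_column(values):
    """Replace missing values (None) by the column mean, then scale to [0, 1] (min-max)."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    filled = [mean if v is None else v for v in values]
    lo, hi = min(filled), max(filled)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [(v - lo) / span for v in filled]


ages = [25, None, 45, 35]  # hypothetical attribute with one missing value
print(code_column(ages))   # [0.0, 0.5, 1.0, 0.5]
```

Mean imputation hides the fact that a value was missing; an extra 0/1 input neuron per attribute is another option if missingness itself carries information.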

Slide 19

Feed Forward Network – What to do?

– train the net with the training set (10,000)
– test the net using the test set (another 10,000)
  – classify all 10,000 customers as canceller or loyal
  – evaluate the additional income

Slide 20

Results

[Chart; labels: data mining cup 2002, neural network project 2004]

Gain: additional income from the mailing action if the target group was chosen according to the analysis.

Slide 21

DMC 2007: Rebate System

Check-out couponing allows individual coupon generation at the check-out: the coupon is printed at the end of the sales slip, depending on the current customer.

Questions:
– How can the retailer identify whether a customer is a potential couponing customer?
– To which coupons will the customer respond?

Slide 22

Couponing

Print:
– coupon A
– coupon B
– no coupon

50,000 customer cards for training. Classify another 50,000 customers!

Cost function:
– coupon not redeemed (false assignment to A or B): -1
– coupon A redeemed (correct assignment to A): +3
– coupon B redeemed (correct assignment to B): +6

Maximize the value!

Slide 23

Data Understanding

– What is the meaning of the attributes?
– Type and range of values?

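Slide 24

20-20-2 Network

$\text{Profit} = 3 \cdot AA + 6 \cdot BB - (NA + NB + BA + AB)$

where $XY$ is the number of customers of actual class $X$ assigned to class $Y$ ($N$ = no coupon).

Results:
– winner 2007: 7,890
– my version: 6,714
– our students: 6,468 (73/230)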
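The profit formula can be evaluated from a list of actual classes and assigned coupons. Reading XY as "actual class X, assigned class Y" follows from the cost function above; pairs such as A assigned to N score 0 because no coupon is printed. The data is hypothetical.

```python
from collections import Counter

# Payoff per (actual class, assigned class); "N" = no coupon. Unlisted pairs score 0.
PAYOFF = {("A", "A"): 3, ("B", "B"): 6,
          ("N", "A"): -1, ("N", "B"): -1, ("B", "A"): -1, ("A", "B"): -1}


def profit(actual, assigned):
    """Profit = 3*AA + 6*BB - (NA + NB + BA + AB)."""
    counts = Counter(zip(actual, assigned))
    return sum(PAYOFF.get(pair, 0) * n for pair, n in counts.items())


actual = ["A", "B", "N", "B", "A", "N"]
assigned = ["A", "B", "A", "N", "B", "N"]
print(profit(actual, assigned))  # 3 + 6 - 1 + 0 - 1 + 0 = 7
```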

Slide 25

DMC 2008: Participation in a Lottery

Predict, at the beginning of the lottery, how long participants will participate:
0 – the first ticket has not been paid for
1 – only the ticket for the first class has been paid for
2 – only the first two classes were played
3 – the lottery was played until the end, but no ticket was purchased for the following lottery
4 – at least the first ticket for the following lottery was purchased

[cost matrix]

Slide 26

Data

113,476 patterns, 69 attributes
– new customer (yes/no)
– age
– bank
– car
– …

Slide 27

100-40-20-5 Network

Results:
– 1,030,240 RWTH Aachen (1)
– …
– 1,024,535 RWTH Aachen (8)
– 865,565 Bauhaus Univ. Weimar (100)
– Univ. Wismar: 878,550 – 835,035 – 1,494,315 (212)
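The layer notation (e.g. 100-40-20-5: 100 input neurons, hidden layers of 40 and 20 neurons, 5 output neurons) describes a fully connected feed-forward network. A minimal forward pass with logistic activation might look like this; the random initial weights are an assumption, and the five output neurons presumably code the classes 0 to 4.

```python
import math
import random


def build_network(layer_sizes, seed=0):
    """Random weights for a fully connected feed-forward net, e.g. [100, 40, 20, 5]."""
    rng = random.Random(seed)
    # One weight matrix per pair of adjacent layers; each row ends with a bias weight.
    return [[[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]


def forward(network, inputs):
    """Propagate an input vector through all layers."""
    activation = list(inputs)
    for layer in network:
        next_activation = []
        for row in layer:
            net = row[-1] + sum(w * a for w, a in zip(row[:-1], activation))  # bias + weighted sum
            next_activation.append(1.0 / (1.0 + math.exp(-net)))  # logistic activation
        activation = next_activation
    return activation


net = build_network([100, 40, 20, 5])
out = forward(net, [0.5] * 100)
print(len(out))  # 5
```

Including the bias weights, this architecture has 101·40 + 41·20 + 21·5 = 4,965 trainable weights.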

Slide 28

DMC 2009 – online bookshop "Libri"

Sales figures for training:
– more than 1,800 books
– 2,418 shops

Sales figures to forecast:
– 8 books
– 2,394 shops

Slide 29

DMC 2009 – online bookshop "Libri"

Slide 30

DMC 2009 – 83-25-9-3 network

Slide 31

DMC 2010: Revenue maximisation by intelligent couponing

Many customers order only once in an online shop. Decide whether to send a voucher worth €5.00 to those customers who would not have decided to re-order by themselves.

– 32,427 data sets for training
– 32,428 data sets for prediction
– 37 attributes per set, plus the target attribute in the training set

Slide 32

DMC 2010

out of 67 teams!

Slide 33

Content

Data Mining
Classification: approach
Data Mining Cup
Clustering: approach
  – Behaviour of bank customers

Slide 34

Clustering Transaction Data

Co-operation: Hochschule Wismar, HypoVereinsbank, Medienhaus Rostock

Issue: What information can be extracted from turnover time series?

Strategy:
1. Cluster the time series data
2. Assign customers/accounts to clusters
3. Examine the clusters

Slide 35

Transaction Data & Time Series

Original financial data are not suitable:
– the order of values is important
– time displacements are problematic

Corporate clients, 223 branches

Cumulated transactions per
– month
– account
– type of transaction
... for a total of 6 years
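Cumulating raw bookings into one monthly series per account and transaction type could be sketched as follows. The record layout (account, type, date, amount) and the sample values are hypothetical; over 6 years the time axis would have 72 months.

```python
from collections import defaultdict
from datetime import date


def monthly_series(transactions, months):
    """Cumulate transaction amounts per (account, transaction type) and month.

    transactions: iterable of (account, type, booking date, amount)
    months: ordered list of (year, month) tuples defining the time axis
    """
    index = {m: i for i, m in enumerate(months)}
    series = defaultdict(lambda: [0.0] * len(months))
    for account, kind, day, amount in transactions:
        key = (day.year, day.month)
        if key in index:  # ignore bookings outside the time axis
            series[(account, kind)][index[key]] += amount
    return dict(series)


tx = [("4711", "debit", date(2003, 1, 5), 120.0),
      ("4711", "debit", date(2003, 1, 20), 80.0),
      ("4711", "credit", date(2003, 2, 3), 500.0)]
months = [(2003, 1), (2003, 2), (2003, 3)]
print(monthly_series(tx, months))
# {('4711', 'debit'): [200.0, 0.0, 0.0], ('4711', 'credit'): [0.0, 500.0, 0.0]}
```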

Slide 36

Fourier versus Original Data

– No displacement: similarity is detected on both the transaction curve and the frequency spectrum.
– Data is displaced: only the frequency spectrum shows the similarity.
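Why the spectrum helps can be checked numerically: the magnitudes of the discrete Fourier transform do not change when a series is shifted circularly in time. This naive DFT sketch uses made-up turnover values; real displacements are not exactly circular, so there the invariance holds only approximately.

```python
import cmath


def amplitude_spectrum(series):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) DFT)."""
    n = len(series)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(series)))
            for k in range(n)]


def shift(series, months):
    """Circularly displace a series by the given (positive) number of months."""
    return series[-months:] + series[:-months]


a = [0, 2, 8, 3, 1, 0, 0, 1, 6, 2, 0, 0]  # hypothetical monthly turnover
b = shift(a, 3)                            # same pattern, three months later
diff_time = sum(abs(x - y) for x, y in zip(a, b))
diff_freq = sum(abs(x - y) for x, y in zip(amplitude_spectrum(a), amplitude_spectrum(b)))
print(diff_time > 0, diff_freq < 1e-9)  # True True
```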

Slide 37

Using a classification model

[Diagram, reconstructed as steps:]
1. Building the model: customer turnover sequence A (t0 … tm) → preprocessing → clustering → initial cluster, used to build the classification model.
2. Applying the model: sequence B (t0+n … tm+n) → preprocessing → classification model → new cluster.
3. Comparing cluster assignments: initial cluster vs. new cluster, identical or different?
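Comparing the initial and the new cluster assignments boils down to counting how many accounts changed cluster, overall and per business sector. The cluster ids and sectors below are hypothetical.

```python
def change_rate(initial, new):
    """Share of accounts whose cluster assignment differs between the two sequences."""
    changed = sum(1 for a, b in zip(initial, new) if a != b)
    return changed / len(initial)


def change_rate_by_sector(initial, new, sectors):
    """Variability per business sector as (changed, total) counts."""
    stats = {}
    for a, b, s in zip(initial, new, sectors):
        changed, total = stats.get(s, (0, 0))
        stats[s] = (changed + (a != b), total + 1)
    return stats


initial = [3, 3, 7, 12, 7, 0]  # hypothetical cluster ids for sequence A
new = [3, 4, 7, 12, 8, 0]      # assignments predicted for sequence B
sectors = ["Taxi", "Taxi", "Churches", "Trucking", "Churches", "Trucking"]
print(change_rate(initial, new))                     # 0.3333333333333333
print(change_rate_by_sector(initial, new, sectors))  # {'Taxi': (1, 2), 'Churches': (1, 2), 'Trucking': (0, 2)}
```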

Slide 38

Clustering & Prediction Results

– 140,000 records (1 record = 1 account)
– 6x5 SOM = max. 30 clusters
– average change of cluster assignments: approx. 19%

Variability per business sector:
22.3%   Taxi                   239/1070
22.3%   Ship Broker Offices     64/471
20.9%   Churches               228/1091
20.2%   Trucking              1010/5008
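A 6x5 self-organizing map offers at most 30 clusters, one per map neuron. The sketch below is a generic SOM (Gaussian neighbourhood, linearly decaying learning rate and radius); these schedules and the toy vectors are assumptions, not the settings used in the project.

```python
import math
import random


def best_matching_unit(weights, x):
    """Map neuron (grid position) whose weight vector is closest to x (Euclidean)."""
    return min(weights, key=lambda g: math.dist(weights[g], x))


def train_som(data, rows=6, cols=5, epochs=50, seed=0):
    """Train a rows x cols self-organizing map; every map neuron is a potential cluster."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = {g: [rng.random() for _ in data[0]] for g in grid}
    for epoch in range(epochs):
        progress = epoch / epochs
        eta = 0.5 * (1 - progress)                               # decaying learning rate
        radius = max(0.5, max(rows, cols) / 2 * (1 - progress))  # shrinking neighbourhood
        for x in rng.sample(data, len(data)):                    # patterns in random order
            winner = best_matching_unit(weights, x)
            for g in grid:
                d = math.dist(g, winner)                         # distance on the map grid
                h = math.exp(-d * d / (2 * radius * radius))     # Gaussian neighbourhood
                weights[g] = [w + eta * h * (xi - w) for w, xi in zip(weights[g], x)]
    return weights


# Hypothetical preprocessed feature vectors (e.g. spectrum magnitudes)
data = [[0.1, 0.2, 0.1], [0.15, 0.25, 0.1], [0.9, 0.8, 0.7], [0.85, 0.9, 0.75]]
som = train_som(data)
clusters = [best_matching_unit(som, x) for x in data]
print(clusters)
```

Each account is assigned to the grid position of its best matching unit; only neurons that actually win patterns form non-empty clusters, hence "max. 30".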

Slide 39

End