
An Introduction to Deep Learning with RapidMiner

Philipp Schlunder - RapidMiner Research

©2017 RapidMiner, Inc. All rights reserved.

What’s in store for today?

1. Things to know before getting started
2. What’s Deep Learning anyway?
3. How to use it inside RapidMiner


Before getting up to speed


Hardware

What do I actually need?

• Minimum: PC or Mac running RapidMiner
• Recommended Starter Kit: PC with supported NVIDIA GPU running Linux/Windows
• Optimum: Dedicated server with multiple GPUs running Linux

Software

• RapidMiner Studio or RapidMiner Server
• Python installation (at least version 3.5) with:
  – Pandas
  – Linux recommended: TensorFlow (for GPUs)
  – Windows recommended: Microsoft CNTK
• Keras extension

Keras backends: TensorFlow, Cognitive Toolkit (CNTK), Theano

I don’t have a server with GPUs!

We’ve got you covered with RapidMiner Server on the Amazon AWS and Microsoft Azure marketplaces:

• Put your server where your data is
• Server installation in minutes
• No need for own hardware
• Bring-your-own-license images (Amazon AMIs & Microsoft VHDs)

When should I try Deep Learning?

1. Stuck with current model – accuracy not high enough
2. Huge amount of training data available
3. Fixed specific use-case
4. Need a notion of memory – understanding context
5. Multi-dimensional input – pictures: 3 colors per pixel; text: word vectors

Things to consider

• Large number of parameters to tune often requires a huge amount of data
  – Is my infrastructure suited for that?
  – Is the potential ROI high enough for the resource investment?
• For many general ML tasks only minor improvement
• Often not that flexible and very domain-specific
• Hard to understand the reasoning (consider LIME)
• General Data Protection Regulation (25th of May 2018, EU-based)
  – “right to an explanation”
  – “prevent discriminatory effects based on race, opinions, health”

What is Deep Learning anyway?

Machine Learning

Hierarchical Learning

Input → First Representation → Second Representation

Deep Learning

Layers

A network consists of an input layer, one or more hidden layers, and an output layer.

Neurons

• Input layer: number of attributes
• Output layer: number of label values
• Hidden layer: each neuron forms a new attribute based on the previous ones and some activation function (e.g. ReLU)

Layer: Fully-Connected

In a fully-connected (dense) layer, every neuron is connected to all neurons of the previous layer.
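Such a fully-connected layer can be sketched in a few lines of plain Python (an illustrative sketch, not RapidMiner or Keras code; the weights, biases, and inputs are made up):

```python
def relu(x):
    """ReLU activation: negative values become 0."""
    return max(0.0, x)

def dense_forward(inputs, weights, biases):
    """One fully-connected layer: every output neuron sees every input."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(relu(total))
    return outputs

# 3 input attributes -> 2 hidden neurons
hidden = dense_forward([1.0, 2.0, 3.0],
                       weights=[[0.5, -1.0, 0.25], [1.0, 1.0, -0.5]],
                       biases=[0.0, 0.5])
print(hidden)  # first neuron's sum is negative, so ReLU zeroes it
```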

Change of Perspective


Layer: Convolutional

A filter of size F slides over an N×N input and produces an activation map of side length (N – F) / Stride + 1.

Example: N = 4, F = 2, Stride = 1 → (4 – 2) / 1 + 1 = 3

Layer: Convolutional (worked example)

Sliding the 2×2 example filter [[1, 0], [0, 1]] over the 4×4 input with stride 1: at each position the window values are multiplied element-wise with the filter and summed, e.g. 4×1 + 3×0 + 1×0 + 7×1 = 11. Repeating this for every window position fills the 3×3 activation map.
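The sliding-window arithmetic above can be sketched in plain Python (not the Keras implementation; the filter matches the slides, while the input values here are illustrative):

```python
def output_size(n, f, stride):
    """Activation-map side length: (N - F) / Stride + 1."""
    return (n - f) // stride + 1

def convolve2d(image, filt, stride=1):
    """Valid convolution: slide filt over image, multiply and sum."""
    n, f = len(image), len(filt)
    out = output_size(n, f, stride)
    result = []
    for i in range(out):
        row = []
        for j in range(out):
            acc = 0
            for a in range(f):
                for b in range(f):
                    acc += image[i * stride + a][j * stride + b] * filt[a][b]
            row.append(acc)
        result.append(row)
    return result

filt = [[1, 0], [0, 1]]      # the example filter from the slides
image = [[4, 3, 0, 0],       # illustrative 4x4 input
         [1, 7, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]

print(output_size(4, 2, 1))           # (4 - 2) / 1 + 1 = 3
print(convolve2d(image, filt)[0][0])  # 4*1 + 3*0 + 1*0 + 7*1 = 11
```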

Layer: Convolutional – Padding

Padding = 1: the 4×4 input is surrounded by a border of zeros before the filter is applied, so edge values are covered as often as inner ones and the activation map stays larger. The size formula becomes (N – F + 2·Padding) / Stride + 1.

Layer: Convolutional – Activation Maps

Applying the filter [[1, 0], [0, 1]] to the input yields one activation map. A convolutional layer applies several such filters, and every filter produces its own activation map.

Layer: Pooling

Max(imum) pooling slides a 2×2 window with Stride = 2 over the 4×4 input and keeps only the largest value of each window, producing a 2×2 map. Average pooling instead replaces each window by its average (the window sums divided by 4).

• Representation becomes smaller → fewer values to optimize
• Applied to each activation map of the previous layer
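The 2×2, stride-2 pooling described above can be sketched in plain Python (not the Keras implementation; the input values are illustrative):

```python
def pool2d(image, size=2, stride=2, op=max):
    """Slide a size x size window with the given stride; apply op per window."""
    out = (len(image) - size) // stride + 1
    result = []
    for i in range(out):
        row = []
        for j in range(out):
            window = [image[i * stride + a][j * stride + b]
                      for a in range(size) for b in range(size)]
            row.append(op(window))
        result.append(row)
    return result

def mean(values):
    return sum(values) / len(values)

image = [[1, 9, 4, 3],
         [1, 2, 7, 4],
         [7, 6, 3, 4],
         [8, 3, 8, 0]]

print(pool2d(image, op=max))   # 2x2 map of window maxima
print(pool2d(image, op=mean))  # 2x2 map of window averages
```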

Layer: Recurrent

Recurrent neurons carry a hidden state: s₀, s₁, s₂, … The state of the layer changes with each repetition, giving the network a notion of memory.

Example: “I drank some coffee” vs. “Drinking a hot beverage that […] for me it is coffee.” – even after many steps (s₀, s₁, s₂, …, s₉₃) the earlier context is still available.
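The state update can be sketched minimally in plain Python (illustrative only; a real recurrent layer uses learned weight matrices and a nonlinearity, here f is just a weighted sum):

```python
def step(state, x, w_state=0.5, w_input=1.0):
    """s_t = f(s_{t-1}, x_t); here f is a simple weighted sum."""
    return w_state * state + w_input * x

def run_sequence(xs, s0=0.0):
    """Feed a sequence through the recurrent update, collecting states."""
    states = [s0]
    for x in xs:
        states.append(step(states[-1], x))
    return states

# the state changes with each repetition: s0, s1, s2, ...
print(run_sequence([1.0, 2.0, 3.0]))
```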

Layer: Long Short Term Memory

An LSTM neuron extends the previous state s_prev with gates (i = input, o = output, f = forget, g = gate). Information can be:

• not stored (input)
• only stored partially (gate)
• forgotten (forget)
• considered only partially (output)


Layer: Dropout

Dropout Rate:
• Probability of dropping a neuron
• Between 0 and 1

Dropout is a form of regularization. It helps reduce complexity and constrain the model.

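Dropout during training can be sketched in plain Python (illustrative, not the Keras layer; this uses the common “inverted dropout” rescaling so the expected sum of activations stays the same):

```python
import random

def dropout(values, rate, rng=None):
    """Zero each value with probability `rate`; rescale the survivors."""
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - rate)          # inverted-dropout scaling
    return [v * scale if rng.random() >= rate else 0.0 for v in values]

activations = [0.5, 1.0, 1.5, 2.0, 2.5]
print(dropout(activations, rate=0.5))   # roughly half the values zeroed
```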

Input & Setup


Input

Input Layer Output Layer

Hidden Layer


Input

input_shape depends on the layers used:

• Convolution/Recurrent: (timesteps, input_dim) = (1, 3)
• Others: (input_dim,) = (3,)

When the data contains timesteps, e.g. 4 of them: (timesteps, input_dim) = (4, 3)

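What input_shape describes can be illustrated with a plain-Python sketch (not Keras itself; the helper `shape` is hypothetical):

```python
def shape(entry):
    """Nested-list shape, analogous to the tuples Keras expects."""
    if not isinstance(entry, list):
        return ()
    return (len(entry),) + shape(entry[0])

flat_entry = [0.1, 0.2, 0.3]            # 3 attributes
sequence_entry = [[0.1, 0.2, 0.3]] * 4  # 4 timesteps x 3 attributes

print(shape(flat_entry))      # (3,)   -> (input_dim,)
print(shape(sequence_entry))  # (4, 3) -> (timesteps, input_dim)
```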

Input

batch_size: number of entries (picture, sentence, …) from the input data set to use in one model-building step (obtaining weights). Here: 4.

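Splitting a data set into such batches can be sketched as (plain Python; the entries here are just placeholders):

```python
def batches(entries, batch_size):
    """Group the data set into batches; each batch drives one update step."""
    return [entries[i:i + batch_size]
            for i in range(0, len(entries), batch_size)]

data = list(range(10))           # 10 entries (pictures, sentences, ...)
for batch in batches(data, 4):   # batch_size = 4
    print(batch)                 # 3 steps: two full batches and a rest of 2
```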

The Model – Weights

• Weights determine the amount of a neuron’s contribution
• Weights are changed using an optimizer
• An optimizer reduces the loss
• One weight-change cycle over the training data is called an epoch

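The idea behind the optimizer can be shown with a tiny plain-Python sketch: plain gradient descent on a one-weight mean-squared-error loss (adam and rmsprop, used later, are refinements of this; the numbers are illustrative):

```python
def mse_loss(weight, x, target):
    """Squared error of the prediction weight * x against the target."""
    return (weight * x - target) ** 2

def gradient(weight, x, target):
    """Derivative of mse_loss with respect to the weight."""
    return 2 * (weight * x - target) * x

weight = 0.0
for _ in range(50):  # 50 weight-change cycles (epochs on this tiny "data set")
    weight -= 0.1 * gradient(weight, x=1.0, target=2.0)

print(round(weight, 3))  # close to 2.0, where the loss is minimal
```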

The Model – Building it

Parameter | Binary Classification | Multi-Class Classification | Regression
optimizer | adam/rmsprop          | adam/rmsprop               | adam/rmsprop
loss      | binary_crossentropy   | categorical_crossentropy   | mse

• rmsprop: especially good choice for recurrent layers
• adam: rmsprop with extra momentum

While training, observe the loss (log/Tensorboard):
• Bump → change weight initialization
• Lowers slowly → learning rate too low
• Fast decline, then stagnating → learning rate too high
• Brief decline, rises extremely afterwards → learning rate way too high

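The table above can be kept around as a small lookup (plain Python; the dictionary name and task keys are just illustrative):

```python
# Per-task training settings, mirroring the table above.
SETTINGS = {
    "binary classification": {
        "optimizer": "adam/rmsprop", "loss": "binary_crossentropy"},
    "multi-class classification": {
        "optimizer": "adam/rmsprop", "loss": "categorical_crossentropy"},
    "regression": {
        "optimizer": "adam/rmsprop", "loss": "mse"},
}

task = "regression"
print(SETTINGS[task]["loss"])  # mse
```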

The Model – Architecture

(Example architecture: an input layer followed by several convolution and pooling layers, a recurrent layer, a fully-connected layer, and the output layer.)

• Start small and simple
• Test overfitting on a small batch (small loss, ~1.0 accuracy)
• Add regularization until the loss goes down
  – Not going down → learning rate too low
  – Constant or NaN → learning rate too high
• Hyper-parameter tuning
• Often:
  – Number of filters rises (64 → 128 → 256)
  – Filter size shrinks (5 → 3)
  – Stride shrinks (2 → 1)
  – Padding is introduced (0 → 1)

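How the feature map shrinks through such a stack follows from the size formula (N – F + 2·Padding) / Stride + 1; a plain-Python sketch with illustrative layer settings that follow the “filters rise, filter size shrinks” rule of thumb:

```python
def next_size(n, f, stride=1, padding=0):
    """Feature-map side length after a convolution or pooling step."""
    return (n - f + 2 * padding) // stride + 1

size, sizes = 32, []                          # e.g. a 32x32 input
stack = [
    ("conv, 64 filters, F=5",         dict(f=5)),
    ("pool 2x2, stride 2",            dict(f=2, stride=2)),
    ("conv, 128 filters, F=3",        dict(f=3)),
    ("conv, 256 filters, F=3, pad 1", dict(f=3, padding=1)),
]
for name, params in stack:
    size = next_size(size, **params)
    sizes.append(size)
    print(name, "->", size)  # padding=1 keeps the size constant here
```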

Deep Learning Summary


Summary

• Different types of layers:
  – fully-connected, convolutional, recurrent, pooling, dropout, …
• A model is defined by:
  – the setup of layers (architecture)
  – weights obtained through optimization of a loss over several epochs
• Classification vs. regression:
  – change the number of units in the last layer (number of possible classes vs. number of targeted values to predict)
  – change the loss from crossentropy to mse


How to use it in RapidMiner?


Options in RapidMiner

Feature               | Neural Net | Deep Learning (H2O) | Deep Learning (Keras)
Fully-Connected Layer | X          | X                   | X
Basic Optimization    | X          | X                   | X
Advanced Optimization |            | X                   | X
Multi-Threading       |            | X                   | X
GPU Support           |            |                     | X
Advanced Layers       |            |                     | X

(Complexity increases from left to right.)


Demo Time


Classification

3 label values


Regression

32 entries

256 iterations of changing/optimizing the weights with a method called ‘Adam’


64 filters → 64 activation maps



…250 neurons


Regression

1 value to estimate (the closing value of the 30th timestep)


Log storage directory

tensorboard --logdir=C:/Users/PhilippSchlunder/Desktop/logs/

Start Tensorboard with the log storage directory location provided through the command line, then view the loss graph in the browser at localhost:6006.


Q & A

Philipp Schlunder – @WortPixel

inquiries@rapidminer.com · @RapidMiner · www.rapidminer.com

Download RapidMiner 7.6


Sources & Recommended

• Stanford CS231n
• Keras documentation
• “When not to use deep learning”
• “Deep Learning is not the AI future”