Deep Learning in a Nutshell

Stefan Carlsson, Computer Vision Group, KTH

Machine Learning

x = data ---> y = label

y = f(x)

For example: y = ax + b

[Figure: data points in the (x, y) plane with the line y = ax + b]

$$\min_{a,b} \sum_i (y_i - a x_i - b)^2$$

[Figure: data points in the (x, y) plane]
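A minimal sketch of this minimization, assuming plain Python and the standard closed-form least-squares solution. The function name `fit_line` and the sample data are hypothetical and not taken from the talk.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing sum_i (y_i - a*x_i - b)**2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution obtained by setting the derivatives of the
    # sum of squares with respect to a and b to zero.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical data lying close to the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
a, b = fit_line(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```

The fit requires at least two distinct x values; otherwise `sxx` is zero and the slope is undefined.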

$$\min_{a,b} \sum_{i=1}^{2} (y_i - a x_i - b)^2$$

[Figure: data points in the (x, y) plane, two of them marked o]

Use training data only, to find a and b

[Figure: the training points (x1, y1) and (x2, y2) marked on the axes]
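As a small worked case, assume the two training points have distinct x values. The sum of squares can then be driven to zero, because the line through both points fits them exactly:

$$
a = \frac{y_2 - y_1}{x_2 - x_1}, \qquad b = y_1 - a x_1
$$

Substituting gives $y_1 - a x_1 - b = 0$ and $y_2 - a x_2 - b = 0$, so both residuals vanish.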

Supervised learning

1. Data set to be classified: $x_1, \dots, x_n$
2. Select subset for training: $x_{i_1}, \dots, x_{i_k}$
3. Label training subset: $y_{i_1}, \dots, y_{i_k}$
4. Find model parameters: a, b

$$\min_{a,b} \sum_{i = i_1}^{i_k} (y_i - a x_i - b)^2$$
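The four steps can be sketched as follows, assuming plain Python. The data values, the `annotate` rule standing in for a human labeler, and the helper `fit_line` are all hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (closed-form solution)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def annotate(x):
    """Hypothetical labeling rule standing in for a human annotator."""
    return 3.0 * x - 2.0


# 1. Data set to be classified (hypothetical values).
data = [0.5 * i for i in range(20)]

# 2. Select a subset of indices i_1 ... i_k for training.
random.seed(0)
train_idx = random.sample(range(len(data)), k=5)

# 3. Label the training subset only.
train_x = [data[i] for i in train_idx]
train_y = [annotate(x) for x in train_x]

# 4. Find the model parameters a, b from the training subset.
a, b = fit_line(train_x, train_y)

# The fitted model can then label every item in the data set.
predicted = [a * x + b for x in data]
print(a, b)
```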

Vectors, classification

Data sets are in general vectors: $x = (x_1, \dots, x_m)$

Linear classifier: $y = \sum_i w_i x_i = w^T x$

Non-linearity: $y^* = f(y)$

[Figure: plot of y* against y]

$y = w_1 x_1 + w_2 x_2 + w_3$

[Figure: points in the (x1, x2) plane with regions labeled y* = 1 and y* = -1; plot of the non-linearity y* = f(y), taking the values 1 and -1]
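A minimal sketch of such a classifier in two dimensions, assuming the non-linearity f is a step that outputs 1 or -1. The weights and the choice to map y = 0 to -1 are assumptions made for illustration.

```python
def linear_classifier(w, x):
    """Compute y = w1*x1 + w2*x2 + w3, then y* = f(y) with f a step to 1 or -1."""
    y = w[0] * x[0] + w[1] * x[1] + w[2]
    return 1 if y > 0 else -1


# Hypothetical weights: the decision line is x1 + x2 - 1 = 0.
w = (1.0, 1.0, -1.0)

label_above = linear_classifier(w, (2.0, 2.0))  # point on the positive side
label_below = linear_classifier(w, (0.0, 0.0))  # point on the negative side
print(label_above, label_below)
```

All points on one side of the line receive y* = 1 and all points on the other side receive y* = -1, which is why a single unit of this kind can only separate linearly separable classes.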

[Figure: points in the (x1, x2) plane]

Not linearly separable

[Figure: points in the (x1, x2) plane]

Non-linear classifier

[Figure: points in the (x1, x2) plane]

or, a non-linear coordinate transformation

[Figure: axes y1 and y2 of the transformed coordinates]

$y_1 = f_1(x_1, x_2)$

$y_2 = f_2(x_1, x_2)$

$y_1 = w_{a1} x_1 + w_{a2} x_2 + w_{a3}$

$y_2 = w_{b1} x_1 + w_{b2} x_2 + w_{b3}$

[Figure: points in the (x1, x2) plane; plot of the non-linearity y* = f(y), taking the values 1 and -1]

$y = w_{c1} y_1^* + w_{c2} y_2^*$

Hierarchies of linear transformations + non-linearity

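[Figure, with the label y* = c]

$y = w_{c1} y_1^* + w_{c2} y_2^*$

$y_1 = w_{a1} x_1 + w_{a2} x_2 + w_{a3}$

$y_2 = w_{b1} x_1 + w_{b2} x_2 + w_{b3}$

[Figure: network diagram with inputs x1, x2 feeding units y1, y2, whose outputs y* feed the output unit; plot of the non-linearity y* = f(y), taking the values 1 and -1]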

Hierarchical nets
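A minimal sketch of a two-level hierarchy of this kind, assuming a step non-linearity with outputs 1 and -1. The weights are hand-picked hypothetical values, chosen so that the net separates an XOR-style data set that no single linear classifier can separate.

```python
def step(y):
    """Non-linearity f: maps y to y* = 1 or y* = -1."""
    return 1 if y > 0 else -1


def two_layer_net(x1, x2):
    # First level: two linear units, each followed by the non-linearity.
    y1 = 1.0 * x1 + 1.0 * x2 - 0.5    # w_a = (1, 1, -0.5), hypothetical
    y2 = -1.0 * x1 - 1.0 * x2 + 1.5   # w_b = (-1, -1, 1.5), hypothetical
    y1_star, y2_star = step(y1), step(y2)
    # Second level: linear combination of y1*, y2*, then the non-linearity.
    y = 1.0 * y1_star + 1.0 * y2_star  # w_c = (1, 1), hypothetical
    return step(y)


# XOR-style points: the two classes are not linearly separable in (x1, x2).
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [two_layer_net(x1, x2) for x1, x2 in points]
print(labels)
```

In a trained net the weights would be found by minimizing an error over labeled training data rather than chosen by hand.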

Deep learning

Traditional learning:

1. Feature engineering of raw data
2. Non-linear classifier design

Deep learning:

Learn the parameters of a hierarchical network with raw data as input

Deep learning makes no distinction between feature selection and classifier design

Hierarchical Convolutional Nets

[Figure: network diagram in which a row of inputs x feeds the y* units of convolutional layer 1, which feed convolutional layer 2, followed by a fully connected layer]
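A minimal one-dimensional sketch of this layer structure, assuming plain Python. The window size of two, the stride of one, the step non-linearity and all weights are assumptions made for illustration. The defining property shown is that a convolutional layer applies the same weights to every window of its input, while the fully connected layer has one weight per input.

```python
def step(y):
    """Non-linearity f: maps y to y* = 1 or y* = -1."""
    return 1 if y > 0 else -1


def conv_layer(xs, w, bias):
    """Apply the same weights w to every window of len(w) consecutive inputs."""
    k = len(w)
    return [
        step(sum(wi * xi for wi, xi in zip(w, xs[j:j + k])) + bias)
        for j in range(len(xs) - k + 1)
    ]


def fully_connected(xs, w, bias):
    """One output unit connected to every input, each with its own weight."""
    return step(sum(wi * xi for wi, xi in zip(w, xs)) + bias)


# Hypothetical inputs x1 ... x7 and hand-picked weights.
x = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0]
h1 = conv_layer(x, w=[-1.0, 1.0], bias=-0.5)   # convolutional layer 1
h2 = conv_layer(h1, w=[1.0, 1.0], bias=0.0)    # convolutional layer 2
out = fully_connected(h2, w=[1.0] * len(h2), bias=0.0)
print(h1, h2, out)
```

With these weights the first layer responds with 1 wherever the input steps from 0 to 1, regardless of where in the row that step occurs.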

Image Classification

ImageNet Dataset

15 million images annotated with content

Training convolutional nets with ImageNet data

[Figure: 1.2 million images of 1000 classes ---> convolutional layers ---> fully connected layers ---> 1000 classes output]

60 million parameters for training

2012

Previously: Feature engineering + non-linear classifier

Training a classifier for each class using negative examples

Non-linear classifier

Object detection until 2012

A revolution in computer vision

Input image features for early nodes

1st level filters

2nd level generic “filters”

Input image features for higher level nodes

Deep learning success

● image recognition
● speech recognition
● automatic translation
● natural language understanding
● bioinformatics

IBM acquires AlchemyAPI to power up Watson’s deep learning skills

Bioinformatics

Deep Learning as a generic tool

input ---> complex system ---> output

input ---> Human knowledge ---> output

Natural language ---> Human knowledge ---> Understanding

● Lab tests
● Wearable monitoring
● Dialogue
● Personal genome

---> Medical knowledge ---> Medical state

Summary

● Deep learning in hierarchical nets is a powerful way of finding mappings between raw input data and labeled output data that generalize outside the datasets used for training

● It is able to identify the structure and regularity of complex mappings in very different domains of application

● The datasets used for training are the crucial component in this, while the software is generic
