An introduction to Deep Learning
Who am I?
• David Rostcheck
• I am a data science consultant
• Follow my articles on LinkedIn
DEEP LEARNING
in some tests, Deep Learning has already shown abilities at the same level as humans
These include:
• computers that understand natural language
• autonomous vehicles
• programs that can identify what is occurring in a video
It’s notable that
these solutions to diverse problems
in very different fields
use the same powerful technology
NEURAL NET
a neural net is a
simulation
of the brain,
a mathematical abstraction
in the real brain,
the neurons send signals with frequencies,
not discrete signals
tools exist that try to simulate the brain in a way that’s
more accurate
to the real brain
Example: Numenta NuPIC, a type of Hierarchical Temporal Memory (HTM)
but the techniques of neural nets
are sufficient
to deliver results
similar to or better than those of humans
in specific cognitive tests
therefore:
Deep Learning
what is it?
common point of view:
a neural net with distinct levels
is correct, but…
there is another point of view,
maybe more useful,
that we are going to present here
it comes from Vincent Vanhoucke, Principal Research Scientist at Google.
the following comes from
his course on Deep
Learning, on Udacity
He thinks about Deep Learning as
a framework for calculating
linear and almost linear
equations in an efficient way
to develop this framework,
we are going to construct a
classifier
the simplest (and worst)
possible
but wait a minute…
why
a classifier?
Because classification (or more generally prediction) is a central technique in Machine Learning
with this, we can achieve ranking, regression, detection, reinforcement learning, and more…
we start with a linear equation, in vector form…
Think about constructing a simple classifier to predict, for each occurrence of X, which class it belongs to:
to do this, we must learn the values of W and b
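The equation itself did not survive in this transcript. Given the later reference to WX + b, a reasonable reading is Y = WX + b, where Y holds one score per class. A minimal sketch in plain Python, with made-up values for W, b and X (in practice they would be learned):

```python
def linear_scores(W, x, b):
    """Return y = W x + b, with W given as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

# Hypothetical values: 3 classes, 2 input features.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [-1.0, -1.0]]
b = [0.0, 0.5, 0.0]
x = [2.0, 1.0]

scores = linear_scores(W, x, b)
# The predicted class is the index of the largest score.
predicted_class = max(range(len(scores)), key=scores.__getitem__)
print(scores, predicted_class)
```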
Does it work well?
No.
It’s the worst.
Why?
there are two problems…
No. 1:
it gives values,
and what we want
are probabilities
we can fix it with the “softmax” function:
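The formula is missing from this transcript. The standard definition is softmax(y)_i = exp(y_i) / sum_j exp(y_j). A small sketch, assuming the scores come from a classifier like the one above (the input values are made up):

```python
import math

def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.5, -3.0])
print(probs)
```

Larger scores get larger probabilities, and the order of the classes is preserved.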
we express the correct values in a vector of values 1 (correct) and 0 (the others).
we call this “one-hot encoding”
to evaluate errors, we compare the probabilities with the correct values
using what we call “cross-entropy”
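A sketch of both ideas, using the usual definition of cross-entropy, D(p, t) = -sum_i t_i log(p_i). The probability vectors here are invented for illustration:

```python
import math

def one_hot(label, num_classes):
    """Vector with 1 at the correct class and 0 everywhere else."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def cross_entropy(probs, target):
    """D(p, t) = -sum_i t_i * log(p_i); terms with t_i = 0 contribute nothing."""
    return -sum(t * math.log(p) for p, t in zip(probs, target) if t > 0)

target = one_hot(0, 3)                           # class 0 is the correct one
good = cross_entropy([0.7, 0.2, 0.1], target)    # confident and right
bad = cross_entropy([0.1, 0.2, 0.7], target)     # confident and wrong
print(target, good, bad)
```

The error is small when the probability of the correct class is high, and grows quickly as it approaches zero.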
better, but…
there remains the second problem:
our equation is linear
and doesn’t represent non-linear equations well
this problem killed the perceptron (single level neural net)
it doesn’t help to just add levels to the network
because we can represent any combination of linear operations as another linear operation: we can reduce the new network to another WX + b with the same problem
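To see the collapse concretely: W2(W1 X + b1) + b2 = (W2 W1) X + (W2 b1 + b2). A quick check in plain Python, with arbitrary example matrices (the helper names are only for this sketch):

```python
def matvec(M, v):
    """Matrix times vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    """Matrix times matrix."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

# Two stacked linear levels with arbitrary example values.
W1, b1 = [[1.0, 2.0], [0.0, -1.0]], [1.0, 0.0]
W2, b2 = [[2.0, 0.0], [1.0, 1.0]], [0.0, 3.0]
x = [1.0, 2.0]

two_levels = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# The same computation as a single level.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)
one_level = add(matvec(W, x), b)

print(two_levels, one_level)
```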
What do we do?
without another option,
we have to introduce non-linear
functions
logistic function
but it’s expensive to calculate, so we can use a much simpler non-linear function called a “Rectified Linear Unit”, or ReLU
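The two functions, side by side, in their standard forms: the logistic function needs an exponential, while ReLU is just a comparison.

```python
import math

def logistic(x):
    """1 / (1 + e^-x): smooth, but needs an exponential."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """max(0, x): zero for negative inputs, the identity otherwise."""
    return max(0.0, x)

for value in (-2.0, 0.0, 3.0):
    print(value, logistic(value), relu(value))
```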
now we can construct our neural net, in a way that’s efficient to calculate
we can express this in a modular way, with a series of linear or almost linear operations on matrices ... that allows us to use the power of a GPU
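One way to picture the modular view: each level is a function, and the network is the functions applied in order. This sketch uses plain Python lists instead of a GPU matrix library, and all weights are made up:

```python
import math

def linear(W, b):
    """Build a module that computes W x + b."""
    def forward(x):
        return [sum(w * xi for w, xi in zip(row, x)) + bi
                for row, bi in zip(W, b)]
    return forward

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def network(x, modules):
    """Apply each module to the output of the previous one."""
    for module in modules:
        x = module(x)
    return x

# linear -> ReLU -> linear -> softmax, with hypothetical weights.
modules = [
    linear([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
    relu,
    linear([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]),
    softmax,
]
probs = network([2.0, 0.5], modules)
print(probs)
```

Adding a level means adding another entry to the list; nothing else changes.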
this is good, but we are still lacking something…
to improve our estimation, we must minimize the error,
and this requires us to calculate the derivative of the function
think about the chain rule of calculus:
$\frac{df}{dx} = \frac{df}{du} \cdot \frac{du}{dx}$
that can convert a derivative into a product (of other derivatives):
that fits in our modular framework
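A small numeric check of the idea, with two example modules chosen only for illustration: u(x) = 3x + 1 and f(u) = u². Each module knows its own derivative, and the derivative of the whole is their product.

```python
def u(x):
    return 3.0 * x + 1.0

def du_dx(x):
    return 3.0

def f(u_val):
    return u_val ** 2

def df_du(u_val):
    return 2.0 * u_val

def composed_derivative(x):
    """Chain rule: df/dx = df/du * du/dx, evaluated at x."""
    return df_du(u(x)) * du_dx(x)

def numerical_derivative(fn, x, eps=1e-6):
    """Central finite difference, used only to check the result."""
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)

analytic = composed_derivative(2.0)
numeric = numerical_derivative(lambda x: f(u(x)), 2.0)
print(analytic, numeric)
```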
now we have it! a general, modular framework that incorporates everything we need!
and we can construct deep neural nets, adding more levels as we need them
…but wait a minute:
why do we like deep networks?
the most interesting problems,
like language and vision,
have very complex rules
we need a lot of parameters to represent them
yes, but why don’t we use wider networks?
why is it better to have deep ones?
deep networks are more efficient and better capture the structure inherent in many problems
CONVNETS
the convolutional network, or convnet,
transforms the input
so that the translation
of the input does not matter
we use it for visual recognition
Let’s start with a photo:
We use a region (kernel) of the photo as the input to another small neural net, with K outputs
we slide the window across the photo
this transforms the photo into another new one, with K color channels, and different dimensions
this operation is called
a convolution
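A simplified sketch of the operation, assuming a single-channel image, a stride of 1, no padding, and only the linear part of the small neural net (no bias, no non-linearity). The image and the two kernels are invented for the example:

```python
def convolve(image, kernels, stride=1):
    """Slide each k x k kernel over a 2-D image.

    Returns out[row][col][n], where n indexes the K kernels.
    """
    k = len(kernels[0])
    flat_kernels = [[w for row in kern for w in row] for kern in kernels]
    height, width = len(image), len(image[0])
    out = []
    for r in range(0, height - k + 1, stride):
        out_row = []
        for c in range(0, width - k + 1, stride):
            # The window currently under the kernel, flattened.
            patch = [image[r + i][c + j] for i in range(k) for j in range(k)]
            out_row.append([sum(p * w for p, w in zip(patch, flat))
                            for flat in flat_kernels])
        out.append(out_row)
    return out

image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
kernels = [
    [[1.0, 0.0], [0.0, 0.0]],      # copies the top-left pixel of the window
    [[0.25, 0.25], [0.25, 0.25]],  # averages the window
]
result = convolve(image, kernels)
print(result)
```

Here a 3 x 3 image and K = 2 kernels of size 2 x 2 give a 2 x 2 output with 2 channels: the new “photo” has different dimensions and K channels.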
if the region (the “kernel”) has
the same size as the original,
what did we obtain?
?
in this case,
the window fits in only one position, so we get an ordinary fully connected level with K outputs
Questions?
Contact: [email protected], twitter: @davidrostcheck
Articles: http://linkedin.com/in/davidrostcheck