
Page 1: Deep Learning: Theory and Practice

Deep Learning: Theory and Practice

15-03-2018  Deep Learning

[email protected]

Page 2: Deep Learning: Theory and Practice

Neural Networks

Multi-layer Perceptron [Hopfield, 1982]

(Figure: thresholding function vs. non-linear function such as tanh or sigmoid)

• Useful for classifying data with non-linear decision boundaries: non-linear class separation can be realized given enough data.
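As a concrete, purely illustrative sketch (not from the slides) of a single-hidden-layer network in NumPy; the parameter names W1, b1, W2, b2 are assumed for this example:

import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Single-hidden-layer MLP: affine -> tanh -> affine -> sigmoid output."""
    h = np.tanh(x @ W1 + b1)               # hidden layer with a non-linear activation
    z = h @ W2 + b2                        # output pre-activation
    return 1.0 / (1.0 + np.exp(-z))        # sigmoid output in (0, 1)

# Tiny usage example with random parameters (4 samples, 3 features, 5 hidden units)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)
print(mlp_forward(x, W1, b1, W2, b2).shape)   # (4, 1)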

Page 3: Deep Learning: Theory and Practice

Neural Networks

Types of Non-linearities

tanh, sigmoid, ReLU

Cost Function

Mean Square Error or Cross Entropy, measured against the desired outputs (standard forms are sketched below)
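The formulas on this slide did not survive extraction; the standard forms are sketched here in assumed notation, where z is a pre-activation, y_k(x_n) is the network output and d_{nk} the desired output of unit k for training example n:

\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}, \qquad \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \mathrm{ReLU}(z) = \max(0, z)

E_{\mathrm{MSE}} = \frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K} \bigl( y_k(x_n) - d_{nk} \bigr)^{2}

E_{\mathrm{CE}} = - \sum_{n=1}^{N} \sum_{k=1}^{K} d_{nk} \, \ln y_k(x_n)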

Page 4: Deep Learning: Theory and Practice

Richard, Michael D., and Richard P. Lippmann. "Neural network classifiers estimate Bayesian a posteriori probabilities." Neural Computation 3.4 (1991): 461-483.

Learning Posterior Probabilities with NNs

Neural networks predict posterior probabilities [Richard, 1991]

When DNNs are trained with CE or MSE, the network estimates the conditional expectation of the desired targets given the input.

When the targets are discrete classes, this conditional expectation is the class posterior!
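In assumed notation, the [Richard, 1991] result can be stated as follows: with input x and 1-of-K targets d_k, training with MSE or CE drives output unit k toward

y_k(x) \approx E[\, d_k \mid x \,] = P(C_k \mid x)

since the conditional expectation of a 0/1 indicator target is exactly the posterior probability of class C_k.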

Page 5: Deep Learning: Theory and Practice

Learning Posterior Probabilities with NNs

Choice of target function

• Softmax function for classification

• Softmax produces positive values that sum to 1

• Allows the interpretation of outputs as posterior probabilities
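For reference, the standard softmax (assumed notation: z_k is the pre-activation of output unit k and K the number of classes); every y_k is positive and the outputs sum to 1:

y_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}, \qquad k = 1, \dots, K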

Page 6: Deep Learning: Theory and Practice

Parameter Learning

Error function over the entire training data

Typical error surface as a function of the parameters (weights and biases)

Page 7: Deep Learning: Theory and Practice

Parameter Learning

Non-linear nature of the error function

• Move in the direction opposite to the gradient

Error back propagation

Error surface close to a local optimum
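A sketch of the learning rule implied here, in assumed notation (w a weight, \eta the learning rate, \tau the iteration index, E_n the error on training example n); back propagation supplies the gradient by applying the chain rule layer by layer:

E(w) = \sum_{n=1}^{N} E_n(w), \qquad w^{(\tau+1)} = w^{(\tau)} - \eta \, \frac{\partial E}{\partial w}\bigg|_{w = w^{(\tau)}}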

Page 8: Deep Learning: Theory and Practice

Summary so far…

• Neural networks as discriminative classifiers

• Need for a hidden layer

• Choice of non-linearities and target functions

• Estimating posterior probabilities with NNs

• Parameter learning with back propagation.

Page 9: Deep Learning: Theory and Practice

Need For Deep Networks

Modeling complex real world data like speech, image, text

• Single hidden layer networks are too restrictive.

• They need a large number of units in the hidden layer and must be trained with large amounts of data.

• They do not generalize well enough.

Networks with multiple hidden layers - deep networks

(Open questions until 2005)

• Are these networks trainable?

• How can we initialize such networks?

• Will these generalize well or overtrain?

Page 10: Deep Learning: Theory and Practice

Deep Networks Intuition

Neural networks with multiple hidden layers - Deep networks [Hinton, 2006]

Page 11: Deep Learning: Theory and Practice

Deep Networks Intuition

Neural networks with multiple hidden layers - Deep networks

Page 12: Deep Learning: Theory and Practice

Deep Networks Intuition

Neural networks with multiple hidden layers - Deep networks

Deep networks perform hierarchical data abstractions which enable the non-linear separation of complex data samples.

Page 13: Deep Learning: Theory and Practice

Summary so far…

• From linear models to neural networks.

• Deep neural networks as extensions of NNs.

• Intuition behind multiple hidden layers.

Page 14: Deep Learning: Theory and Practice

Deep Networks

Will deep networks generalize?

• DNNs are quite data hungry; performance improves as more data is added.

• The generalization problem is tackled by providing training data from all possible conditions.

• Many artificial data augmentation methods have been successfully deployed (a sketch follows below).

• These provide state-of-the-art performance in several real world applications.
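A minimal sketch of one common augmentation strategy (additive Gaussian noise on the input features); the function name and parameters are illustrative and not taken from the slides:

import numpy as np

def augment_with_noise(features, labels, copies=3, noise_std=0.05, seed=0):
    """Return the original data plus `copies` noisy replicas of every example."""
    rng = np.random.default_rng(seed)
    aug_x, aug_y = [features], [labels]
    for _ in range(copies):
        aug_x.append(features + rng.normal(0.0, noise_std, size=features.shape))
        aug_y.append(labels)
    return np.concatenate(aug_x), np.concatenate(aug_y)

# Usage: 100 labelled examples become 400 after augmentation
x = np.random.default_rng(1).normal(size=(100, 20))
y = np.arange(100) % 2
x_aug, y_aug = augment_with_noise(x, y)
print(x_aug.shape, y_aug.shape)   # (400, 20) (400,)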

Page 15: Deep Learning: Theory and Practice

Representation Learning in Deep Networks

• The input data representation is one of the most important components of any machine learning system.

(Figure: the same data plotted in Cartesian coordinates and in polar coordinates)
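To make the coordinate example concrete, a small sketch (assumed data: two classes that differ only in radius) showing how a change of representation turns a problem that is not linearly separable in Cartesian coordinates into one that is trivially separable in polar coordinates:

import numpy as np

def to_polar(x, y):
    """Map Cartesian (x, y) to polar (radius, angle)."""
    return np.hypot(x, y), np.arctan2(y, x)

rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=200)
inner = 0.5 * np.stack([np.cos(angles[:100]), np.sin(angles[:100])], axis=1)   # class 0: small circle
outer = 2.0 * np.stack([np.cos(angles[100:]), np.sin(angles[100:])], axis=1)   # class 1: large circle

# No straight line separates the two circles in (x, y),
# but a single threshold on the radius separates them in polar coordinates.
r_inner, _ = to_polar(inner[:, 0], inner[:, 1])
r_outer, _ = to_polar(outer[:, 0], outer[:, 1])
print(r_inner.max() < r_outer.min())   # True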

Page 16: Deep Learning: Theory and Practice

• The input data representation is one of the most important components of any machine learning system.

• Extract factors that enable classification while suppressing factors that are susceptible to noise.

• Finding the right representation for real world applications is substantially challenging.

• Deep learning solution: build complex representations from simpler representations.

• The dependencies between these hierarchical representations are refined by the target.

Representation Learning in Deep Networks

Page 17: Deep Learning: Theory and Practice

Representation Learning in Deep Networks

Pixels → Edges → Contours → Object Parts → Object Identity

[Zeiler, 2014]