
Page 1: Deep Learning: Theory and Practice

Deep Learning: Theory and Practice

15-03-2018  Deep Learning

[email protected]

Page 2: Deep Learning: Theory and Practice

Neural Networks

Multi-layer Perceptron [Hopfield, 1982]

(Figure: thresholding function vs. non-linear function such as tanh or sigmoid)

• Useful for classifying data with non-linear decision boundaries: non-linear class separation can be realized given enough data.
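As a concrete, purely illustrative sketch (not from the slides) of a single-hidden-layer network in NumPy; the parameter names W1, b1, W2, b2 are assumed for this example:

import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Single-hidden-layer MLP: affine -> tanh -> affine -> sigmoid output."""
    h = np.tanh(x @ W1 + b1)               # hidden layer with a non-linear activation
    z = h @ W2 + b2                        # output pre-activation
    return 1.0 / (1.0 + np.exp(-z))        # sigmoid output in (0, 1)

# Tiny usage example with random parameters (4 samples, 3 features, 5 hidden units)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)
print(mlp_forward(x, W1, b1, W2, b2).shape)   # (4, 1)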

Page 3: Deep Learning: Theory and Practice

Neural Networks

Types of Non-linearities

tanh, sigmoid, ReLU

Cost Function

Mean Square Error or Cross Entropy, measured against the desired outputs (standard forms are sketched below)
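The formulas on this slide did not survive extraction; the standard forms are sketched here in assumed notation, where z is a pre-activation, y_k(x_n) is the network output and d_{nk} the desired output of unit k for training example n:

\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}, \qquad \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \mathrm{ReLU}(z) = \max(0, z)

E_{\mathrm{MSE}} = \frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K} \bigl( y_k(x_n) - d_{nk} \bigr)^{2}

E_{\mathrm{CE}} = - \sum_{n=1}^{N} \sum_{k=1}^{K} d_{nk} \, \ln y_k(x_n)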

Page 4: Deep Learning: Theory and Practice

Richard, Michael D., and Richard P. Lippmann. "Neural network classifiers estimate Bayesian a posteriori probabilities." Neural Computation 3.4 (1991): 461-483.

Learning Posterior Probabilities with NNs

Neural networks predict posterior probabilities [Richard, 1991]

When DNNs are trained with CE or MSE, the network estimates the conditional expectation of the desired targets given the input.

When the targets are discrete classes, this conditional expectation is the class posterior!
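In assumed notation, the [Richard, 1991] result can be stated as follows: with input x and 1-of-K targets d_k, training with MSE or CE drives output unit k toward

y_k(x) \approx E[\, d_k \mid x \,] = P(C_k \mid x)

since the conditional expectation of a 0/1 indicator target is exactly the posterior probability of class C_k.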

Page 5: Deep Learning: Theory and Practice

Learning Posterior Probabilities with NNs

Choice of target function

• Softmax function for classification

• Softmax produces positive values that sum to 1

• Allows the interpretation of outputs as posterior probabilities
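For reference, the standard softmax (assumed notation: z_k is the pre-activation of output unit k and K the number of classes); every y_k is positive and the outputs sum to 1:

y_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}, \qquad k = 1, \dots, K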

Page 6: Deep Learning: Theory and Practice

Parameter Learning

Error function over the entire training data

Typical error surface as a function of the parameters (weights and biases)

Page 7: Deep Learning: Theory and Practice

Parameter Learning

Non-linear nature of the error function

• Move in the direction opposite to the gradient

Error back propagation

Error surface close to a local optimum
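A sketch of the learning rule implied here, in assumed notation (w a weight, \eta the learning rate, \tau the iteration index, E_n the error on training example n); back propagation supplies the gradient by applying the chain rule layer by layer:

E(w) = \sum_{n=1}^{N} E_n(w), \qquad w^{(\tau+1)} = w^{(\tau)} - \eta \, \frac{\partial E}{\partial w}\bigg|_{w = w^{(\tau)}}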

Page 8: Deep Learning: Theory and Practice

Summary so far…

• Neural networks as discriminative classifiers

• Need for a hidden layer

• Choice of non-linearities and target functions

• Estimating posterior probabilities with NNs

• Parameter learning with back propagation.

Page 9: Deep Learning: Theory and Practice

Need For Deep Networks

Modeling complex real world data like speech, image, text

• Single hidden layer networks are too restrictive.

• They need a large number of units in the hidden layer and must be trained with large amounts of data.

• They do not generalize well enough.

Networks with multiple hidden layers - deep networks

(Open questions until 2005)

• Are these networks trainable?

• How can we initialize such networks?

• Will these generalize well or overtrain?

Page 10: Deep Learning: Theory and Practice

Deep Networks Intuition

Neural networks with multiple hidden layers - Deep networks [Hinton, 2006]

Page 11: Deep Learning: Theory and Practice

Deep Networks Intuition

Neural networks with multiple hidden layers - Deep networks

Page 12: Deep Learning: Theory and Practice

Deep Networks Intuition

Neural networks with multiple hidden layers - Deep networks

Deep networks perform hierarchical data abstractions which enable the non-linear separation of complex data samples.

Page 13: Deep Learning: Theory and Practice

Summary so far…

• From linear models to neural networks.

• Deep neural networks as extensions of NNs.

• Intuition behind multiple hidden layers.

Page 14: Deep Learning: Theory and Practice

Deep Networks

Will deep networks generalize?

• DNNs are quite data hungry; performance improves as more data is added.

• The generalization problem is tackled by providing training data from all possible conditions.

• Many artificial data augmentation methods have been successfully deployed (a sketch follows below).

• These provide state-of-the-art performance in several real world applications.
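A minimal sketch of one common augmentation strategy (additive Gaussian noise on the input features); the function name and parameters are illustrative and not taken from the slides:

import numpy as np

def augment_with_noise(features, labels, copies=3, noise_std=0.05, seed=0):
    """Return the original data plus `copies` noisy replicas of every example."""
    rng = np.random.default_rng(seed)
    aug_x, aug_y = [features], [labels]
    for _ in range(copies):
        aug_x.append(features + rng.normal(0.0, noise_std, size=features.shape))
        aug_y.append(labels)
    return np.concatenate(aug_x), np.concatenate(aug_y)

# Usage: 100 labelled examples become 400 after augmentation
x = np.random.default_rng(1).normal(size=(100, 20))
y = np.arange(100) % 2
x_aug, y_aug = augment_with_noise(x, y)
print(x_aug.shape, y_aug.shape)   # (400, 20) (400,)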

Page 15: Deep Learning: Theory and Practice

Representation Learning in Deep Networks

• The input data representation is one of the most important components of any machine learning system.

(Figure: the same data plotted in Cartesian coordinates and in polar coordinates)
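To make the coordinate example concrete, a small sketch (assumed data: two classes that differ only in radius) showing how a change of representation turns a problem that is not linearly separable in Cartesian coordinates into one that is trivially separable in polar coordinates:

import numpy as np

def to_polar(x, y):
    """Map Cartesian (x, y) to polar (radius, angle)."""
    return np.hypot(x, y), np.arctan2(y, x)

rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=200)
inner = 0.5 * np.stack([np.cos(angles[:100]), np.sin(angles[:100])], axis=1)   # class 0: small circle
outer = 2.0 * np.stack([np.cos(angles[100:]), np.sin(angles[100:])], axis=1)   # class 1: large circle

# No straight line separates the two circles in (x, y),
# but a single threshold on the radius separates them in polar coordinates.
r_inner, _ = to_polar(inner[:, 0], inner[:, 1])
r_outer, _ = to_polar(outer[:, 0], outer[:, 1])
print(r_inner.max() < r_outer.min())   # True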

Page 16: Deep Learning: Theory and Practice

• The input data representation is one of the most important components of any machine learning system.

• Extract factors that enable classification while suppressing factors that are susceptible to noise.

• Finding the right representation for real world applications is substantially challenging.

• Deep learning solution: build complex representations from simpler representations.

• The dependencies between these hierarchical representations are refined by the target.

Representation Learning in Deep Networks

Page 17: Deep Learning: Theory and Practice

Representation Learning in Deep Networks

Pixels → Edges → Contours → Object Parts → Object Identity

[Zeiler, 2014]