Neural Networks
Multi-layer Perceptron [Rumelhart, 1986]
Each unit applies a thresholding function or a smooth non-linear function (tanh, sigmoid).
• Useful for classifying data with non-linear boundaries - non-linear class separation can be realized given enough data (e.g., XOR, which no linear classifier can separate, is solvable with a single hidden layer).
Neural Networks
[Figure: plots of the tanh, sigmoid, and ReLU non-linearities]
Cost-Function

Mean Square Error: $E_{\mathrm{MSE}} = \frac{1}{N}\sum_{n=1}^{N}\lVert \hat{y}_n - y_n \rVert^2$

Cross Entropy: $E_{\mathrm{CE}} = -\sum_{n=1}^{N}\sum_{k} y_{nk}\log \hat{y}_{nk}$

where the $y_n$ are the desired outputs and the $\hat{y}_n$ are the network outputs.
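A minimal NumPy sketch of the two cost functions above, assuming one-hot targets `y` and network outputs `y_hat` (both illustrative names):

```python
import numpy as np

def mse(y_hat, y):
    # Mean square error, averaged over the N samples.
    return np.mean(np.sum((y_hat - y) ** 2, axis=1))

def cross_entropy(y_hat, y, eps=1e-12):
    # Cross entropy for one-hot targets; eps guards against log(0).
    return -np.sum(y * np.log(y_hat + eps))
```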
Types of Non-linearities
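For reference, the three non-linearities plotted earlier can be written directly; a minimal NumPy sketch:

```python
import numpy as np

def tanh(x):
    # Saturating, zero-centred non-linearity in (-1, 1).
    return np.tanh(x)

def sigmoid(x):
    # Saturating non-linearity in (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Non-saturating for positive inputs; zero otherwise.
    return np.maximum(0.0, x)
```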
Richard, Michael D., and Richard P. Lippmann. "Neural network classifiers estimate Bayesian a posteriori probabilities." Neural Computation 3.4 (1991): 461-483.
Learning Posterior Probabilities with NNs
Neural networks predict posterior probabilities [Richard, 1991]
When DNNs are trained with CE or MSE, they estimate the conditional expectation of the desired targets given the input. When the targets are discrete classes, this conditional expectation is exactly the class posterior!
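One line makes the claim concrete, assuming 1-of-K coded targets $t_k \in \{0, 1\}$:

$$\mathbb{E}[t_k \mid x] = 0 \cdot P(t_k = 0 \mid x) + 1 \cdot P(t_k = 1 \mid x) = P(\text{class } k \mid x)$$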
Learning Posterior Probabilities with NNs
Choice of target function
• Softmax function for classification
• Softmax produces positive values that sum to 1
• Allows the interpretation of outputs as posterior probabilities (see the sketch below)
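A minimal, numerically stable softmax sketch in NumPy (subtracting the maximum before exponentiating is a standard stability trick, not something stated on the slide):

```python
import numpy as np

def softmax(z):
    # Shifting by the max does not change the result but avoids overflow.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())  # positive values that sum to 1, e.g. [0.659 0.242 0.099] 1.0
```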
Parameter Learning
Error function for the entire data
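The slide's formula did not survive extraction; a standard form, assuming per-sample errors $E_n$ and parameters $\theta$ (weights and biases), is:

$$E(\theta) = \sum_{n=1}^{N} E_n(\theta)$$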
[Figure: typical error surface as a function of the parameters (weights and biases)]
Parameter Learning
Non-linear nature of the error function
• Move in the reverse direction of the gradient (see the update rule below)
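The corresponding gradient-descent update, with learning rate $\eta$ (standard notation, not from the slide):

$$\theta^{(t+1)} = \theta^{(t)} - \eta\, \nabla_{\theta} E\big(\theta^{(t)}\big)$$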
Error back propagation
[Figure: error surface close to a local optimum]
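A minimal back propagation sketch for a single-hidden-layer network with sigmoid units and MSE loss, trained on XOR; the architecture, learning rate, and iteration count are illustrative assumptions, not the lecture's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR: the classic non-linearly separable toy problem.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Randomly initialised weights, zero biases.
W1 = rng.normal(0.0, 1.0, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0.0, 1.0, (4, 1)); b2 = np.zeros(1)
eta = 0.5  # learning rate (illustrative choice)

for step in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)       # hidden activations
    y_hat = sigmoid(h @ W2 + b2)   # network outputs

    # Backward pass: propagate the error from the output back to the input.
    d_out = (y_hat - y) * y_hat * (1.0 - y_hat)  # MSE + sigmoid derivative
    d_hid = (d_out @ W2.T) * h * (1.0 - h)       # chain rule through hidden layer

    # Move in the reverse direction of the gradient.
    W2 -= eta * h.T @ d_out; b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * X.T @ d_hid; b1 -= eta * d_hid.sum(axis=0)

print(np.round(y_hat.ravel(), 2))  # should approach [0, 1, 1, 0]
```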
Summary so far…
• Neural networks as discriminative classifiers
• Need for hidden layer
• Choice of non-linearities and target functions
• Estimating posterior probabilities with NNs
• Parameter learning with back propagation.
Need For Deep Networks
Modeling complex real-world data like speech, images, and text
• Single hidden layer networks are too restrictive.
• They need a large number of hidden units and must be trained with large amounts of data.
• They do not generalize well enough.
Networks with multiple hidden layers - deep networks
(Open questions until 2005)
• Are these networks trainable?
• How can we initialize such networks?
• Will these generalize well or overtrain?
Deep Networks Intuition
Neural networks with multiple hidden layers - Deep networks [Hinton, 2006]
Deep networks perform hierarchical data abstractions which enable the non-linear separation of complex data samples.
Summary so far…
• From linear models to neural networks.
• Deep neural networks as extensions of NNs.
• Intuition behind multiple hidden layers.
Deep Networks - Will deep networks generalize?
• DNNs are quite data-hungry, and performance improves as the data grows.
• The generalization problem is tackled by providing training data from all possible conditions.
• Many artificial data augmentation methods have been successfully deployed (a sketch follows), providing state-of-the-art performance in several real-world applications.
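As an illustration of artificial data augmentation, a minimal sketch that perturbs a 1-D feature vector with additive noise and small shifts (the specific perturbations are assumed examples, not the methods the lecture refers to):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(x, noise_std=0.05, max_shift=2):
    # Additive Gaussian noise plus a small random circular shift.
    out = x + rng.normal(0.0, noise_std, size=x.shape)
    return np.roll(out, rng.integers(-max_shift, max_shift + 1))

x = np.sin(np.linspace(0.0, 2.0 * np.pi, 16))
batch = np.stack([augment(x) for _ in range(4)])  # four augmented variants
print(batch.shape)  # (4, 16)
```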
Representation Learning in Deep Networks
[Figure: the same data represented in Cartesian coordinates and in polar coordinates]
• The input data representation is one of the most important components of any machine learning system (illustrated in the sketch below).
• Extract factors that enable classification while suppressing factors that are susceptible to noise.
• Finding the right representation for real-world applications is substantially challenging.
• The deep learning solution - build complex representations from simpler representations.
• The dependencies between these hierarchical representations are refined by the target.
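A minimal sketch echoing the Cartesian-vs-polar figure above: two ring-shaped classes (an assumed toy dataset) that are not linearly separable in (x, y) become separable by a single threshold on the radius:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes on concentric rings of radius 1 and 3.
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
r = np.concatenate([np.full(100, 1.0), np.full(100, 3.0)]) + rng.normal(0.0, 0.1, 200)
x, y = r * np.cos(theta), r * np.sin(theta)
labels = (np.arange(200) >= 100).astype(int)

# Change of representation: Cartesian (x, y) -> polar radius.
radius = np.hypot(x, y)
pred = (radius > 2.0).astype(int)
print("accuracy with one threshold in polar space:", (pred == labels).mean())  # ~1.0
```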
Representation Learning in Deep Networks
[Figure: hierarchical feature abstraction in a deep network - pixels → edges → contours → object parts → object identity [Zeiler, 2014]]