NASSCOM Big Data and Analytics Summit 2015: Session IX: Transitioning from Predictive & Prescriptive Analytics to Artificial Intelligence & Cognitive Computing

Transitioning From Predictive Analytics To Artificial Intelligence

Presented by Rajeev Rastogi

Predictions with Multimedia (Images, Audio) Data

• Applications: Object recognition, speech recognition

• Raw data (image pixels, audio signals) too low-level, lacks predictive power

• Need higher level feature representations (with predictive power)

[Figure: input data and prediction targets. Image → “Car”; Audio → “Hello World”]

Higher Level Features: Computer Vision

• Computer vision researchers have developed hand-crafted features over the past decade

• But, how do we automatically generate high-level feature representations?

[Figure: examples of SIFT and HoG features]
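To give a feel for what a hand-crafted feature involves, here is a minimal sketch of a HoG-style descriptor for a single image cell, assuming NumPy. It is a simplification for illustration only (no block normalization, no bin interpolation), and the function name `orientation_histogram` and the 8x8 toy cell are assumptions, not part of the talk.

```python
import numpy as np

def orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    cell = np.asarray(cell, dtype=float)
    gy, gx = np.gradient(cell)  # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees.
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    # Map each angle to a bin; clip guards against rounding at the upper boundary.
    bins = np.minimum((angle / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# Toy cell with a vertical edge: intensity changes only in the x direction,
# so all gradient energy falls into the first orientation bin.
cell = np.zeros((8, 8))
cell[:, 4:] = 1.0
hist = orientation_histogram(cell)
```

Every design decision here (bin count, unsigned angles, normalization) is chosen by hand, which is the manual effort that learned feature representations aim to avoid.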


Deep Learning: Deep Neural Networks

• Multiple hidden layers learn higher level feature representations

• Non-linear sigmoid function performs mapping between layers

• Deep architectures can compactly represent functions that shallow architectures would need exponentially many units to model

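• Edge weights learned using backpropagation

[Figure: network with input data, hidden layers, and outputs; plot of the sigmoid function, ranging from 0 to 1]

$$p(s_i = 1) = \frac{1}{1 + \exp\left(-b_i - \sum_j s_j w_{ji}\right)}$$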
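The forward mapping and the weight learning described on this slide can be sketched in a few lines of NumPy. Each unit applies a sigmoid to its bias plus the weighted sum of the units in the layer below, and backpropagation pushes the output error back through the layers. The layer sizes, the XOR toy data, the squared-error loss, and the learning rate are all assumptions made for illustration; the talk does not specify them.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, layers):
    """Return the activations of every layer, starting with the input."""
    activations = [x]
    for W, b in layers:
        # Unit i computes sigmoid(b_i + sum_j s_j * w_ji); W[j, i] plays the role of w_ji.
        activations.append(sigmoid(activations[-1] @ W + b))
    return activations

def backprop(x, y, layers):
    """Gradients of the squared error 0.5 * ||output - y||^2 for each (W, b)."""
    activations = forward(x, layers)
    # Error signal at the output; s * (1 - s) is the derivative of the sigmoid.
    delta = (activations[-1] - y) * activations[-1] * (1.0 - activations[-1])
    grads = []
    for k in reversed(range(len(layers))):
        grads.append((activations[k].T @ delta, delta.sum(axis=0)))
        if k > 0:
            W, _ = layers[k]
            # Propagate the error signal to the layer below.
            delta = (delta @ W.T) * activations[k] * (1.0 - activations[k])
    return grads[::-1]

rng = np.random.default_rng(0)
sizes = [2, 4, 4, 1]  # input, two hidden layers, output (illustrative sizes)
layers = [(rng.normal(0.0, 1.0, (m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

# XOR as a toy dataset (an assumption, not from the talk).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

def loss(layers):
    return 0.5 * np.sum((forward(X, layers)[-1] - Y) ** 2)

loss_before = loss(layers)
learning_rate = 0.1
for _ in range(500):  # plain full-batch gradient descent, for simplicity
    grads = backprop(X, Y, layers)
    layers = [(W - learning_rate * gW, b - learning_rate * gb)
              for (W, b), (gW, gb) in zip(layers, grads)]
loss_after = loss(layers)
```

The sketch uses full-batch gradient descent to keep the loop short; the same `backprop` gradients could be fed to a stochastic, mini-batch update instead.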


Deep Learning: Model Pre-training

• Key challenge: the functions being optimized are highly non-convex, with many local minima. Poorly initialized weights can cause the algorithm to get stuck in a poor local minimum.

• Model pre-training is used to initialize the weights. Weights are learned one layer at a time (with the previous layer's output as input) using unsupervised (e.g., RBMs) and supervised techniques.

[Figure: layer-wise pre-training from the input, shown as pre-training step 1 and pre-training step 2]
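The layer-at-a-time idea can be sketched as follows, assuming NumPy and using RBMs trained with one-step contrastive divergence (CD-1) as the unsupervised learner. This is a minimal illustration rather than the procedure used in the talk: the helper names `train_rbm` and `pretrain_stack`, the random binary toy data, the layer sizes, and the hyper-parameters are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=50, learning_rate=0.1, seed=0):
    """Train one RBM with one-step contrastive divergence (CD-1)."""
    rng = np.random.default_rng(seed)
    n_samples, n_visible = data.shape
    W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
    b_visible = np.zeros(n_visible)
    b_hidden = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data.
        h_prob = sigmoid(data @ W + b_hidden)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: reconstruct the visible units, then the hidden units.
        v_recon = sigmoid(h_sample @ W.T + b_visible)
        h_recon = sigmoid(v_recon @ W + b_hidden)
        # CD-1 update: data statistics minus reconstruction statistics.
        W += learning_rate * (data.T @ h_prob - v_recon.T @ h_recon) / n_samples
        b_visible += learning_rate * (data - v_recon).mean(axis=0)
        b_hidden += learning_rate * (h_prob - h_recon).mean(axis=0)
    return W, b_hidden

def pretrain_stack(data, hidden_sizes):
    """Greedy layer-wise pre-training: each RBM is trained on the output of the layer below."""
    layers, layer_input = [], data
    for step, n_hidden in enumerate(hidden_sizes):
        W, b = train_rbm(layer_input, n_hidden, seed=step)
        layers.append((W, b))
        # The hidden probabilities become the "input" for the next pre-training step.
        layer_input = sigmoid(layer_input @ W + b)
    return layers

rng = np.random.default_rng(42)
toy_data = (rng.random((100, 16)) < 0.3).astype(float)  # random binary vectors
initial_layers = pretrain_stack(toy_data, hidden_sizes=[8, 4])
```

The returned `(W, b)` pairs would then serve as the starting point for supervised fine-tuning with backpropagation, in place of a purely random initialization.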


Deep Learning: Example of Learned Features

Training set: Aligned images of faces

[Figure: learned feature hierarchy: Pixels → Edges → Object parts (combination of edges) → Object models]

Source: Andrew Ng. “Machine Learning and AI via Brain Simulations”

Deep Learning Success in Computer Vision

• ImageNet Large Scale Visual Recognition Challenge

– 1000 categories, 1.5 million labeled examples

• Deep learning model [Krizhevsky, Hinton 2012]

– 650K neurons, 832M synapses, 60M parameters

– Trained with backprop on GPU

• Error rate: 15% (whenever correct class isn’t in top 5)

• Previous state-of-the-art: 25% error
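The top-5 error quoted above counts a prediction as wrong only when the correct class is missing from the model's five highest-scoring classes. A minimal sketch of that metric, assuming NumPy and a made-up score matrix with 7 classes (the function name `top_k_error` and all numbers in the example are illustrative):

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k highest-scoring classes."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    top_k = np.argsort(scores, axis=1)[:, -k:]  # indices of the k largest scores per row
    hit = (top_k == labels[:, None]).any(axis=1)
    return 1.0 - hit.mean()

scores = np.array([
    [0.1, 0.5, 0.2, 0.9, 0.3, 0.4, 0.0],  # true class 3 is ranked first: hit
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.1, 0.0],  # true class 5 is ranked sixth: miss
    [0.2, 0.1, 0.6, 0.5, 0.4, 0.3, 0.0],  # true class 0 is ranked fifth: hit
    [0.3, 0.2, 0.1, 0.0, 0.6, 0.5, 0.4],  # true class 3 is ranked last: miss
])
labels = np.array([3, 5, 0, 3])
error = top_k_error(scores, labels, k=5)  # 2 of the 4 examples miss, so 0.5
```

Setting `k=1` gives the stricter top-1 error, which on this toy data is 0.75.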


Deep Learning Success in Speech Recognition

• [Zeiler et al. 2013]

• Several large technology companies have deployed Deep Learning-based speech recognition systems in their products

Number of hidden layers    Word error rate (%)
1                          16
2                          12.8
4                          11.4
8                          10.9

GMM baseline: 15.4%
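Word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, insertions) between the recognizer's output and a reference transcript, divided by the number of reference words. A minimal sketch in plain Python, with made-up transcripts; the function name `word_error_rate` is an assumption, and the last line simply works out the relative reduction implied by the figures on this slide.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Assumes a non-empty reference transcript.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # previous[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)

# One substitution ("world" -> "word") and one deletion ("are") over 5 reference words.
wer = word_error_rate("hello world how are you", "hello word how you")  # 0.4

# Relative reduction from the GMM baseline (15.4%) to 8 hidden layers (10.9%).
relative_reduction = (15.4 - 10.9) / 15.4  # roughly 0.29
```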


Challenges

• Deep learning models have millions of parameters – billions of training examples needed to learn parameters.

• Models are learned using backpropagation + stochastic gradient descent. This is difficult to parallelize, which makes it hard to scale to large datasets.

• Deep learning models have numerous hyper-parameters (# hidden layers, # hidden units per layer, regularization, learning rate) that require tuning.
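Two of these challenges can be seen in a small sketch, assuming NumPy. A linear regression model stands in for a deep network purely to keep the example short; the synthetic data, the function name `sgd_linear_regression`, and the grid values are all assumptions. The inner loop shows why SGD is sequential (each update starts from the weights left by the previous one), and the outer loop shows the simplest form of hyper-parameter tuning: a grid search scored on held-out data.

```python
import itertools
import numpy as np

def sgd_linear_regression(X, y, learning_rate, l2, epochs=20, batch_size=16, seed=0):
    """Mini-batch SGD for L2-regularized least squares (stand-in for a deep model)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = order[start:start + batch_size]
            # Each update depends on the weights produced by the previous update,
            # which is what makes plain SGD hard to parallelize.
            residual = X[batch] @ w - y[batch]
            gradient = X[batch].T @ residual / len(batch) + l2 * w
            w -= learning_rate * gradient
    return w

# Synthetic training and validation data (illustrative only).
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
X_train, X_val = rng.normal(size=(200, 3)), rng.normal(size=(50, 3))
y_train = X_train @ true_w + 0.1 * rng.normal(size=200)
y_val = X_val @ true_w + 0.1 * rng.normal(size=50)

# Grid search over two hyper-parameters, scored by validation mean squared error.
results = {}
for learning_rate, l2 in itertools.product([0.001, 0.01, 0.1], [0.0, 0.01, 0.1]):
    w = sgd_linear_regression(X_train, y_train, learning_rate, l2)
    results[(learning_rate, l2)] = np.mean((X_val @ w - y_val) ** 2)

best_setting = min(results, key=results.get)
```

Even this two-parameter grid needs nine training runs; adding the number of hidden layers and the units per layer multiplies the cost, which is why tuning is listed as a challenge in its own right.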
