WHITE PAPER

H3 TRENDS IN AI ALGORITHMS: THE INFOSYS WAY
Abstract
Artificial Intelligence algorithms are the wheels of AI. To realize the art of the possible in AI applications, a very good and deep understanding of these algorithms is required. This paper brings a perspective on the landscape of various AI algorithms that will be shaping key advancements across industries.
External Document © 2019 Infosys Limited
Today, technology adoption is influenced by business and technology uncertainties. These uncertainties drive organisations to evaluate technology adoption based on risks and returns. Broadly, technology-led disruptions can be classified into Horizon 1, Horizon 2 and Horizon 3. Horizon 1 or H1 technologies are those that are in mainstream client adoption and have steady business transactions, while H2 and H3 are those that are yet to become mainstream but have started to spring interesting possibilities and potential returns in the future.
At the Infosys Center for Emerging Technology Solutions (iCETS), we continuously look at H2 and H3 technologies and their impact on client landscapes. These H2 and H3 technologies are very important to monitor, as they have the potential to transform or disrupt existing well-oiled business models, hence fetching large returns. However, there are also associated adoption risks that need to be monitored, as some of these can have a high negative impact on compliance, safety, and so on.
With the emergence and availability of several open datasets, the computational thrust from GPU availability, and the maturity of Artificial Intelligence (AI) algorithms, AI is making strong inroads into current and future IT ecosystems. Today, AI plays an integral role in IT strategy by driving new experiences and creating a new art of possibilities. In this paper, we look at important AI algorithms that are shaping various H3 AI possibilities. Before we do that, here is a chart representing the broader AI algorithm landscape in the context of this paper.
Figure 1.0: Horizon 3 AI Algorithms - Infosys Research. (Chart plotting business uncertainty against technology uncertainty, spanning core offerings in mainstream deployment, new offerings being adopted and scaled, and emerging offerings to envision, invent and disrupt; the corresponding algorithms and use cases are listed in Table 1.0 below.)
H1 of AI, “core offerings”, are typically defined as algorithm-powered use cases that have become mainstream and will remain major investment areas for the current wave. In that respect, use cases such as product or customer recommendations, churn and sentiment analysis, leveraging algorithms such as Random Forest, Support Vector Machines (SVM), Naïve Bayes and n-gram based approaches, have been mainstream for some time and will continue to get weaved into varied AI experiences.
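As a concrete illustration of the last of these, n-gram features can be extracted in a few lines of Python (a generic sketch, not tied to any particular library):

```python
# Minimal n-gram extraction, as used by the H1 text-classification
# approaches mentioned above.
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the service was not good".split()
bigrams = ngrams(tokens, 2)
# Bigrams such as ("not", "good") let simple models like Naive Bayes
# capture negation that unigram features miss.
print(bigrams)
```

Feeding such n-grams (instead of single words) into a classifier is what lets these otherwise simple models pick up short-range word order.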
H2 of AI, “new offerings”, use cases are the ones that are currently in an experimental, evolutionary mode and will have a major impact on the Artificial Intelligence systems that become mainstream in the second wave. Convolutional Neural Networks (CNN) have laid the foundation for several art-of-the-possible Computer Vision use cases, ranging from object detection, image captioning and segmentation to facial recognition. Long Short-Term Memory (LSTM) and Recurrent Neural Networks (RNN) are helping to significantly improve the art of the possible in use cases such as language translation, sentence formulation, text summarization, topic extraction and so on. Word-vector based models such as GloVe and Word2Vec help in dealing with large multi-dimensional text corpora and in finding hidden, unspotted, complex interwoven relationships and similarities between topics, entities and keywords.
These H2 AI algorithms are promising interesting new possibilities in various business functions; however, they are still in a nascent stage of adoption and user testing.
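The word-vector similarity mentioned above reduces to a cosine computation between embedding vectors; a minimal sketch with made-up 4-dimensional vectors (real GloVe or Word2Vec embeddings have 50-300 dimensions, and these values are purely illustrative):

```python
import math

# Toy "word vectors" (hypothetical values for illustration only).
vectors = {
    "king":  [0.8, 0.6, 0.1, 0.2],
    "queen": [0.7, 0.7, 0.1, 0.3],
    "apple": [0.1, 0.0, 0.9, 0.8],
}

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Related words end up with higher similarity than unrelated ones.
assert cosine(vectors["king"], vectors["queen"]) > cosine(vectors["king"], vectors["apple"])
```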
H3 of AI, “emerging offerings”, use cases are the ones that are potential game changers and can unearth new possibilities from AI that are unexplored and unimagined today. As these technologies are relatively new, more time is required to establish their weaknesses, strengths and nuances. In this paper, we look at key H3 AI algorithmic trends and how we leverage these in various use cases built as part of our IP, the Infosys Enterprise Cognitive Platform (iECP).
Horizon 1 (Mainstream)
• Algorithms: Logistic Regression; Naive Bayes; Random Forest; Support Vector Machines (SVM); Collaborative Filtering; n-grams
• Use Cases: Recommendations; Prediction; Document and Image Classification; Document and Image Clustering; Sentiment Analysis; Named Entity Recognition (NER); Keyword Extraction

Horizon 2 (Adopt, Scale)
• Algorithms: Convolutional Neural Networks (CNN); Long Short-Term Memory (LSTM); Recurrent Neural Networks (RNN); Word2Vec; GloVe; Transfer Learning (Vision)
• Use Cases: Object Detection; Face Recognition; Product Brand Recognition and Classification; Speech Recognition; Sentence Completion; Speech Transcription; Topic Classification; Topic Extraction; Intent Mining; Question Extraction

Horizon 3 (Envision, Invent, Disrupt)
• Algorithms: Explainable AI; Generative Networks; Fine Grained Classification; Capsule Networks; Meta Learning; Transfer Learning (Text); Single Shot Learning; Reinforcement Learning; AutoML; Neural Architecture Search (NAS)
• Use Cases: Scene Captioning; Scene Detection; Store Footfall Counts; Specific Object Class Detection; Sentence Completion; Video Scene Prediction; Auto Learning; Fake Image and Art Generation; Music Generation; Data Augmentation

Table 1.0: H3 Algorithms and Use Cases - Infosys Research
Explainable AI (XAI)

Neural Network algorithms are considered to derive hidden patterns from data that many conventional best-of-breed Machine Learning algorithms, such as Support Vector Machines, Random Forest and Naïve Bayes, are unable to establish. However, there is an increasing rate of incorrect and unexplainable decisions and results produced by Neural Network algorithms in activities such as credit lending, skilled job hiring and facial recognition. Given this scenario, AI results should be justified, explained and reproduced for consistency and correctness, as some of these results can have a profound impact on livelihoods.

Geoffrey Hinton (University of Toronto), often called the godfather of deep learning, explains: “A deep-learning system doesn’t have any explanatory power. The more powerful the deep-learning system becomes, the more opaque it can become.”

It is to address these issues of transparency in AI that Explainable AI was developed. Explainable AI (XAI) as a framework increases the transparency of black-box algorithms by providing explanations for the predictions made, and it can accurately explain a prediction at the individual level.

Here are a few approaches, provided through certain frameworks, that can help in understanding the traceability of results.

Feature Visualization, as depicted in the figure below, helps in visualizing the various layers of a neural network. It helps establish that lower layers are useful in learning features such as edges and textures, whereas higher layers provide more of the higher-order abstract concepts, such as objects. Network dissection helps in associating these established units to concepts: the units learn from labeled concepts during supervised training stages, and network dissection shows how, and in what magnitude, they are influenced by channel activations.

Several frameworks are currently evolving to improve the explainability of models. Two known frameworks in this space are LIME and SHAP.

LIME (Local Interpretable Model-agnostic Explanations): LIME treats the model as a black box and creates another surrogate model for which explainability is supported or feasible, such as a simple linear model (e.g., Logistic Regression). The surrogate model is then used to evaluate different components of the image by perturbing the inputs and evaluating the impact on the result, thereby deciding which parts of the image are most important in arriving at the result. Since the original model does not participate directly, the approach is model independent. The challenge with this approach is that even when the surrogate-model based explanations are relevant to the model they are used on, they may not generalize precisely or be one-to-one mappable to the original model all the time.

Figure 2.0: Feature Visualization (panels: Edges, Textures, Patterns, Parts, Objects). Source: Olah et al., 2017 (CC-BY 4.0)
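LIME's perturb-and-observe idea can be sketched with a toy black box. Everything here is hypothetical: the "model" scores high only when one particular superpixel is present, and a simple on-vs-off score difference stands in for the weights of LIME's fitted surrogate model:

```python
import itertools

# A 4-"superpixel" image; the hypothetical black box scores high only
# when superpixel 2 is present.
def black_box(mask):
    return 0.9 if mask[2] == 1 else 0.1

N_FEATURES = 4

# Enumerate all on/off perturbation masks (LIME samples them randomly;
# with 4 features an exhaustive sweep is affordable).
masks = list(itertools.product([0, 1], repeat=N_FEATURES))

def importance(feature):
    """Mean black-box score with the feature on minus with it off."""
    on  = [black_box(m) for m in masks if m[feature] == 1]
    off = [black_box(m) for m in masks if m[feature] == 0]
    return sum(on) / len(on) - sum(off) / len(off)

scores = [importance(f) for f in range(N_FEATURES)]
# Superpixel 2 dominates (~0.8 vs ~0.0 for the rest), mirroring LIME's
# "superpixels with highest positive weights" explanation step.
print(scores)
```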
Steps:
1. Create a set of noisy (perturbed) example images by disabling certain features (marking certain portions gray).
2. For each example, get the probability that a tree frog is in the image as per the original model.
3. Using these created data points, train a simple linear model (Logistic Regression, etc.) and get the results.
4. The superpixels with the highest positive weights become the explanation.

Figure 3.0: Explaining a Prediction with LIME. Source: Pol Ferrando, Understanding how LIME explains predictions

SHAP (SHapley Additive exPlanations): SHAP uses a game-theory based approach to explain the outcome, using various permutations and combinations of features and their effect on the delta of the result (predicted minus actual), and then computing the average score for each feature to explain the results. For image use cases, it marks the dominating feature areas by coloring the pixels in the image. SHAP produces relatively accurate results and is more widely used in Explainable AI than LIME.

Generative AI

Generative AI will have a potentially strong role in creative work, be it writing articles, creating completely new images from an existing set of trained models, improving image or video quality, merging images for artistic creations, creating music or improving datasets through data generation. Generative AI, as it matures in the near term, will augment many jobs, and will potentially replace many in the future.

Generative Networks consist of two deep neural networks, a generative network and a discriminative network, that work together to provide a high-level simulation of conceptual tasks. To train a generative model, we first collect a large amount of data in some domain (e.g., millions of images, sentences or sounds) and then train the model to generate similar data. The generative network generates data to fool the discriminative network, while the discriminative network learns by identifying real vs. fake data received from the generative network.

The generator trains on an objective of whether it can fool the discriminator network, whereas the discriminator trains on its ability to not be fooled and to correctly identify real vs. fake data. Both networks learn through backpropagation. The generator is typically a deconvolutional neural network, and the discriminator is a convolutional neural network.

Generative Networks can be of multiple types depending on the objective they are designed for, one example being Neural Style Transfer.

Neural Style Transfer (NST)

Neural Style Transfer (NST) is one of the Generative AI techniques in deep learning. As seen below, it merges two images, namely a content image (C) and a style image (S), to create a generated image (G). The generated image G combines the content of image C with the style of image S.
Some other popular GAN variations are:
• Super Resolution GAN (SRGAN), which helps improve the quality of images.
• StackGAN, which generates realistic-looking photographs from textual descriptions of simple objects like birds and flowers.
• Sketch-GAN, a generative model for vector drawings, which is a Recurrent Neural Network (RNN) able to construct stroke-based drawings of common objects. The model is trained on a dataset of human-drawn images representing many different classes.
• eGANs (Evolutionary Generative Adversarial Networks), which generate photographs of faces at different ages, from young to old.
• IcGAN, which reconstructs photographs of faces with specific features, such as changes in hair color, style, facial expression and even gender.
Figure 4.0: Novel Artistic Images through Neural Style Transfer. Source: Fisseha Berhane, Deep Learning & Art: Neural Style Transfer. (Examples shown: a colorful circle merged with the style of a blue painting; the Louvre museum rendered in an Impressionist painting style; the ancient city of Persepolis rendered in the style of Van Gogh's The Starry Night.)
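The style side of the NST objective is typically computed from Gram matrices of feature activations; a minimal pure-Python sketch (real NST extracts these feature maps from a CNN such as VGG, and adds a content loss against C; the tiny feature values below are purely illustrative):

```python
# Style in NST is compared through Gram matrices: G[i][j] is the
# correlation between flattened feature maps i and j.
def gram_matrix(features):
    """features: list of flattened feature maps (equal-length lists)."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]

def style_loss(gram_s, gram_g):
    """Sum of squared differences between style and generated Grams."""
    return sum((s - g) ** 2
               for row_s, row_g in zip(gram_s, gram_g)
               for s, g in zip(row_s, row_g))

style_feats     = [[1.0, 2.0], [0.5, 1.0]]   # from style image S
generated_feats = [[1.0, 1.5], [0.5, 1.2]]   # from generated image G
loss = style_loss(gram_matrix(style_feats), gram_matrix(generated_feats))
# Minimizing this loss (plus a content loss against C) by updating the
# pixels of G drives G toward the style of S with the content of C.
```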
Fine Grained Classification

Classification of an object into broad categories, such as car, table or flower, is common in Computer Vision. However, establishing an object's finer class based on specific characteristics is where AI is making rapid progress. This is because granular features of objects are now being trained on and used to differentiate objects.

Examples of Fine Grained Classification are:
• Fine grained clothing style finder, type of shoe, etc.
• Recognizing a car type
• Recognizing the breed of a dog, a plant species, an insect, a bird species, etc.

However, fine-grained classification is challenging due to the difficulty of finding discriminative features: finding the subtle traits that fully characterize the object is not straightforward.

Fine Grained Classification Approaches
• Feature representations that better preserve fine-grained information
• Segmentation-based approaches that facilitate extraction of purer features, and part/pose normalized feature spaces
• Pose normalization schemes

In Fine Grained Classification, the progression through an 8-layer CNN can be thought of as a progression from low- to mid- to high-level features. The later layers aggregate more complex structural information across larger scales: sequences of convolutional layers interleaved with max-pooling can capture deformable parts, and fully connected layers can capture complex co-occurrence statistics.

Bird recognition is one of the major examples of fine grained classification. In the image below, given a test image, groups of detected keypoints are used to compute multiple warped image regions that are aligned with prototypical models. Each region is fed through a deep convolutional network, and features are extracted from multiple layers, after which they are concatenated and fed to a classifier.

Figure 5.0: Bird Recognition Pipeline Overview. Source: Branson, Van Horn et al., Bird Species Categorization
Car Detection System using Fine Grained Classification

The pictures and steps below depict a fine grained classification approach for a car detection system:
a. Detect parts using a collection of unsupervised part detectors.
b. Output a grid of discriminative features. (The CNN is learned with class labels and then truncated, retaining the first two convolutional layers, which preserve spatial information.) The appearance of each part detected using the learned CNN features is described by pooling in the detected region of each part.
c. The appearance of any undetected part is set to zero. This results in an Ensemble of Localized Learned Features (ELLF) representation, which is then used to predict fine-grained object categories.
d. A standard CNN, by contrast, passes the output of the convolutional layers through several fully connected layers in order to make a prediction.
Capsule Network

Convolutional Neural Networks are so far the de facto, well-accepted algorithms for working with image datasets. They work on the pixels of images using filters (channels) of various sizes, convolving and using pooling techniques to bubble up the stronger features, deriving colors, textures, edges and shapes, and establishing structures from the lower to the highest layers.

Given the face of a person, a CNN identifies the face by establishing the eyes, ears, eyebrows, lips, chin and other components of the face. However, if the facial image is provided with an incorrect position and alignment of the eyes and eyebrows, say with the eyebrows swapped with the lips and the ears placed on the forehead, the same trained CNN would still go on to detect it as a human face. This is a huge drawback of the CNN algorithm, and it happens due to CNN's inability to store information on the relative positions of various objects.

Capsule Networks, invented by Geoffrey Hinton, address exactly this problem by storing the spatial relationships of the various parts.

Capsule Networks, like CNNs, are multi-layered neural networks consisting of several capsules, where each capsule consists of several neurons. Capsules in the lower layers are called primary capsules and are trained to detect an object (e.g., a triangle or circle) within a given region of the image. Each outputs a vector that has two properties, length and orientation: length represents the probability of the presence of the object, and orientation represents the pose parameters of the object, such as coordinates, rotation angle, etc. Capsules in higher layers, called routing capsules, detect larger and more complex objects, such as eyes, ears, etc.
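The length-as-probability behavior described above comes from the capsule "squash" non-linearity of Sabour et al.; a minimal sketch:

```python
import math

# The capsule "squash" non-linearity: it shrinks a capsule's output
# vector to length < 1 so the length can be read as a probability,
# while preserving its direction (the pose).
def squash(s):
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq)
    return [scale * x / norm for x in s]

v = squash([3.0, 4.0])
length = math.sqrt(sum(x * x for x in v))
# length = 25/26 ~ 0.96: strong evidence the entity is present, and
# v still points in the direction of [3, 4] (the pose is preserved).
```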
Figure 6.0: Car Detection System. Source: Learning Features and Parts for Fine-Grained Recognition
Routing by Agreement

Unlike CNNs, which primarily bubble up higher-order features using max or average pooling, Capsule Networks bubble up features using routing by agreement, where every capsule participates in choosing the shape by voting, in the manner of a democratic election.

In the figure given above, the lower level corresponds to rectangles, triangles and circles, and the higher level corresponds to houses, boats and cars. If there is an image of a house, the capsules corresponding to rectangles and triangles will have large activation vectors. Their relative positions (coded in their instantiation parameters) will bet on the presence of high-level objects. Since they agree on the presence of a house, the output vector of the house capsule will become large. This, in turn, will make the predictions by the rectangle and triangle capsules larger. This cycle repeats 4-5 times, after which the bets on the presence of a house will be considerably larger than the bets on the presence of a boat or a car.

Advantages over CNN
• Less data for training: Capsule Networks need very little training data (roughly 10% of what a CNN needs).
• Fewer parameters: the connections between layers require fewer parameters, as a capsule groups neurons, resulting in relatively less computation bandwidth.
• Preserve pose and position: Capsule Networks preserve pose and position information, unlike CNNs.
• High accuracy: Capsule Networks achieve higher accuracy compared to CNNs.
• Reconstruction vs. mere classification: a CNN helps you classify images but not reconstruct the same image, whereas Capsule Networks help you reconstruct the exact image.
• Information retention vs. loss: with CNNs, a kernel for edge detection works only on a specific angle, and each angle requires a corresponding kernel. When dealing with edges, CNNs work well because there are very few ways to describe an edge. But once we get up to the level of shapes, we do not want a kernel for every angle of rectangles, ovals, triangles and so on. It would get unwieldy, and would become even worse when dealing with more complicated shapes that have three-dimensional rotations and features like lighting, which is the reason traditional neural nets do not handle unseen rotations effectively.
Capsule Networks are best suited for object detection and image segmentation, as they better model hierarchical relationships and provide high accuracy. However, Capsule Networks are still under research; they are relatively new and have mostly been tested and benchmarked on the MNIST dataset, but they will be the future for the massive use cases emerging from vision datasets.
Figure 7.0: A simple CapsNet with 3 layers. This model gives comparable results to deep convolutional networks. Source: Dynamic Routing Between Capsules, Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton

Figure 8.0: Capsule Network for House or Boat classification. Source: Beginners' Guide to Capsule Networks
Meta Learning

Traditional methods of learning in Machine Learning focus on taking a huge labeled dataset and then learning to predict y (the dependent variable, say classifying an image as cat or dog) given a set of x (the independent variables, images of cats and dogs). This process involves selecting an algorithm, such as a Convolutional Neural Network, and arriving at various hyper-parameters, such as the number of layers in the network, the number of neurons in each layer, the learning rate, weights, bias, dropouts, and the activation function used to activate each neuron (such as sigmoid, tanh or ReLU). The learning happens through several iterations of forward and backward passes (propagation), readjusting (also called learning) the weights based on the difference in the loss (actual vs. computed). At the minimal loss, the weights and other network parameters are frozen and considered the final model for future prediction tasks. This is obviously a long and tedious process, and repeating it for every use case or task is engineering-, data- and compute-intensive.

Meta Learning focuses on how to learn to learn. It is one of the fascinating disciplines of artificial intelligence. Human beings have varying styles of learning: some people learn and memorize with one instance of a visual or auditory scan; some need multiple perspectives to strengthen the neural connections for permanent memory; some remember by writing, while others remember through actual experiences. Meta Learning tries to leverage these traits to build its learning characteristics.

Like the variety in human learning techniques, Meta Learning also uses various learning methods based on the patterns of problems, such as those based on boundary space, amount of data, optimizing the size of the neural network, or using a recurrent network approach. Each of these is briefly discussed below.

Types of Meta-Learning Models

Few Shots Meta-Learning: This learning technique focuses on learning from a few instances of data. Typically, neural nets need millions of data points to learn; however, Few Shots Meta-Learning uses only a few instances of data to build models. An example is facial recognition systems using Single Shot Learning; this is explained in detail in the Single Shot Learning section.

Optimizer Meta-Learning: In this method, the emphasis is on optimizing the neural network and its hyper-parameters. A great example of optimizer meta-learning is models that focus on improving gradient descent techniques.

Metric Meta-Learning: In this learning method, the metric space is narrowed down to improve the focus of learning. The learning is then carried out only in this metric space, leveraging various optimization parameters established for the given metric space.

Recurrent Model Meta-Learning: This type of meta-learning model is tailored to Recurrent Neural Networks (RNNs), such as Long Short-Term Memory (LSTM) networks. In this architecture, the meta-learner algorithm trains an RNN model to process a dataset sequentially and then process new inputs from the task. In an image classification setting, this might involve passing in the set of (image, label) pairs of a dataset sequentially, followed by new examples which must be classified. Meta-Reinforcement Learning is an example of this approach.

Transfer Learning (TL)

Humans can learn from their own existing experiences, or from experiences they have heard, seen or observed. The Transfer Learning discipline of AI is based on similar traits of human learning, where new models can learn from and benefit from existing trained models.

For example, if a Computer Vision based detection model that already detects various types of vehicles, such as cars, trucks and bicycles, needs to be trained to detect an airplane, then without Transfer Learning you may have to retrain the full model with images of all the previous objects. With Transfer Learning, however, you can introduce an additional layer on top of the existing pre-trained layers to start detecting airplanes.

Typically, in a no-Transfer-Learning scenario, the model needs to be trained from scratch, and during training the right weights are arrived at through many iterations (epochs) of forward and backward propagation, which takes a significant amount of computation power and time. In addition, vision models need a significant amount of image data, in this example images of airplanes, to be trained.

With the Transfer Learning approach, you can reuse the existing pre-trained weights of an existing trained model, with significantly fewer images (5 to 10 percent of the images needed for training a ground-up model), for the model to start detecting. As the pre-trained model has already learnt some basic features, such as identifying edges, curves and shapes in the earlier layers, it needs to learn only the higher-order features specific to airplanes, on top of the existing computed weights. In brief, Transfer Learning helps eliminate the need to learn everything from scratch.
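The freeze-the-base, train-the-head idea can be illustrated without any deep learning framework. In this toy sketch the "pretrained" feature extractor is a stand-in (its definition is hypothetical and never changes), and only the new head's weights are updated:

```python
# Toy Transfer Learning: a frozen feature extractor plus a trainable
# linear head. The extractor stands in for pre-trained CNN layers.
def frozen_features(x):
    # Pretend these features were learned on a large source task;
    # they are never updated during the new training.
    return [x, x * x]

# New task: fit y = 3*x + 2*x^2 using only a linear head over the
# frozen features.
data = [(x, 3.0 * x + 2.0 * x * x) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]

w = [0.0, 0.0]   # head weights: the ONLY trainable parameters
lr = 0.01
for _ in range(2000):
    for x, y in data:
        f = frozen_features(x)
        err = (w[0] * f[0] + w[1] * f[1]) - y
        # Gradient step on the head only; the extractor stays frozen.
        w = [wi - lr * 2.0 * err * fi for wi, fi in zip(w, f)]
# w converges near [3, 2]: the head adapts to the new task while the
# reused features do not change.
```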
Transfer Learning helps save a significant amount of data, computational power and time when training new models, as they leverage pre-trained weights from existing trained models and architectures. However, it is important to understand that the Transfer Learning approach today is only mature enough to be applied to similar use cases; that is, you cannot use the vehicle detection model discussed above to train a facial recognition model.

Another key consideration during Transfer Learning is understanding the details of the data on which the original model was trained, as Transfer Learning can implicitly push built-in biases from the underlying data into newer systems. It is recommended that the datasheets of the underlying models and data be studied thoroughly, unless the usage is for experimental purposes.

Having used the human brain rationale earlier, it is important to note that human brains have gone through centuries of experiences and gene evolution and thus have the ability to learn faster, whereas transfer learning is just a few decades old and is only now becoming the foundation for new vision and text use cases.
Figure 9.0: Transfer Learning Layers. Source: John Cherrie, Training Deep Learning Models with Transfer Learning
Single Shot Learning

Humans have the impressive skill to reason about new concepts and experiences from just a single example. They have the ability for one-shot generalization: the aptitude to encounter a new concept, understand its structure, and then generate compelling alternative variations of the same.

Facial recognition systems are good candidates for Single Shot Learning; otherwise, needing tens of thousands of individual face images to train one neural network can be extremely costly, time consuming and infeasible. However, a Single Shot Learning based system using an existing pre-trained FaceNet model, with a facial-encoding based approach on top of it, can be very effective in establishing face similarity by computing the distance between faces.

In this approach, a 128-dimensional encoding of each face image is generated and compared with the other image's encoding to determine whether the person is the same or different. Various distance-based algorithms, such as Euclidean distance, can be used to determine whether the encodings are within a specified threshold. The model training approach involves creating pairs of (Anchor, Positive) and (Anchor, Negative) and training the model in a way where the (Anchor, Positive) pair distance is smaller and the (Anchor, Negative) distance is larger.

“Anchor” is the image of the person for whom the recognition model needs to be trained.
“Positive” is another image of the same person.
“Negative” is an image of a different person.
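The (Anchor, Positive, Negative) objective described above is the triplet loss; a minimal sketch with tiny 3-dimensional encodings standing in for real 128-dimensional FaceNet encodings (the vectors are made up for illustration):

```python
import math

def euclidean(u, v):
    """Euclidean distance between two encodings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the positive is closer than the negative by >= margin."""
    return max(0.0, euclidean(anchor, positive)
                    - euclidean(anchor, negative) + margin)

anchor   = [0.1, 0.9, 0.3]   # person A
positive = [0.1, 0.8, 0.3]   # same person, different photo
negative = [0.9, 0.1, 0.7]   # a different person

loss = triplet_loss(anchor, positive, negative)
# A well-trained encoder drives this loss to 0; at inference time, two
# faces match when euclidean(enc1, enc2) falls under a chosen threshold.
```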
Figure 10.0: Encoding approach, inspired by the ML course from Coursera
Deep Reinforcement Learning (RL)

This is a specialized Machine Learning discipline in which an agent learns to behave in an environment by receiving rewards or punishments for the actions it performs. The agent can have an objective of maximizing short-term or long-term rewards. This discipline uses deep learning techniques to bring human-level performance to the given task.

Deep Reinforcement Learning has found significant relevance and application in various game design systems, such as video games, chess, AlphaGo and Atari, as well as in industrial applications such as robots and driverless cars.

In reinforcement learning, a policy π controls what action we should take, and a value function V measures how good it is to be in a particular state. The value function tells us the maximum expected future reward the agent will get at each state.

Three Approaches to Reinforcement Learning

Value Based: In value-based RL, the goal is to optimize the value function V(s). A Q-table uses a mathematical function to arrive at a state value based on an action. The value of each state is the total amount of reward an agent can expect to accumulate over the future, starting at that state. The agent uses this value function to select which state to choose at each step.

Policy Based: In policy-based RL, we want to directly optimize the policy function π(s) without using a value function. The policy is what defines the agent's behavior at a given time. There are two types of policies:
1. Deterministic: a policy which, at a given state, will always return the same action.
2. Stochastic: a policy that outputs a probability distribution over actions.

Value based and policy based are the more conventional Reinforcement Learning approaches, and they are useful for modeling relatively simple systems.

Figure 11.0: Q-learning vs. Deep Q-learning. (In Q-learning, a state indexes a Q-table to obtain a Q value; in Deep Q-learning, a state is fed to a deep Q neural network that outputs a Q value per action, the expected discounted reward given that state, with action = policy(state).) Schema inspired by the Q-learning notebook by Udacity.
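The value-based approach can be sketched as tabular Q-learning on a hypothetical 4-state corridor (deterministic sweeps over all state-action pairs stand in for exploration):

```python
# Tabular Q-learning on a 4-state corridor: action 1 (right) moves
# toward state 3, where a reward of 1 is collected; action 0 (left)
# moves back. A minimal value-based RL sketch.
N_STATES, ACTIONS = 4, [0, 1]
GAMMA, ALPHA = 0.9, 0.5   # discount factor and learning rate

def step(state, action):
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

Q = [[0.0, 0.0] for _ in range(N_STATES)]
for _ in range(200):                      # repeated sweeps, no exploration noise
    for s in range(N_STATES - 1):         # state 3 is terminal
        for a in ACTIONS:
            nxt, r = step(s, a)
            # Q-learning update: move Q(s,a) toward r + gamma * max Q(s')
            Q[s][a] += ALPHA * (r + GAMMA * max(Q[nxt]) - Q[s][a])

# The learned values prefer moving right from every non-terminal state.
assert all(Q[s][1] > Q[s][0] for s in range(N_STATES - 1))
```

A deep Q-network replaces the table with a neural network that maps a state to one Q value per action, which is what makes the approach scale to large state spaces such as game screens.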
Model Based
In model-based RL, we model the environment: we create a model of the environment's behavior, and this model is then used to arrive at results that maximise short-term or long-term rewards. The model equation can be any equation defined based on the environment's behavior, and it must be sufficiently generalized to counter new situations.
When a model-based approach uses Deep Neural Network algorithms to generalize sufficiently well and learn the complexities of the environment to produce optimal results, it is called Deep Reinforcement Learning. The challenge with the model-based approach is that each environment needs a dedicated trained model.
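A toy sketch of this model-based loop (the four-state environment and every name here are invented for illustration): the agent first fits a model of the environment's behavior from random interaction, then plans entirely against that learned model rather than the real environment:

```python
import random

# Hypothetical deterministic environment: 4 states in a line, reward at state 3.
def env_step(state, action):          # action: 0 = left, 1 = right
    nxt = max(0, state - 1) if action == 0 else min(3, state + 1)
    return nxt, (1.0 if nxt == 3 else 0.0)

# 1. Learn a model of the environment's behavior from random interaction.
model = {}                            # (state, action) -> (next_state, reward)
random.seed(1)
for _ in range(200):
    s, a = random.randrange(4), random.choice([0, 1])
    model[(s, a)] = env_step(s, a)    # deterministic, so one sample suffices

# 2. Plan with the learned model: value iteration that never touches env_step.
V = [0.0] * 4
for _ in range(50):
    V = [max(model[(s, a)][1] + 0.9 * V[model[(s, a)][0]] for a in (0, 1))
         for s in range(4)]

policy = [max((0, 1), key=lambda a: model[(s, a)][1] + 0.9 * V[model[(s, a)][0]])
          for s in range(4)]
print(policy)  # the planned policy moves right, toward the rewarding state
```

The dedicated-model limitation is visible here: `model` only describes this one environment, so a new environment means collecting new experience and fitting a new model.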
AlphaGo was trained using data from several games to beat human players at the game of Go. The training accuracy was just 57%, and still it was sufficient to beat human-level performance. The training methods involved reinforcement learning and deep learning to build a policy network, which tells what moves are promising, and a value network, which tells how good the board position is. The search for the final move from these networks is done using the Monte Carlo Tree Search (MCTS) algorithm. Using supervised learning, a policy network was created to imitate the expert moves.
DeepMind released AlphaGo Zero in late 2017, which beat AlphaGo and did not involve any training of the deep network from previous games' data. The deep network was trained by picking training samples from AlphaGo Zero playing games against itself, selecting the best moves to train the network, and then applying those in real games to improve the results iteratively. This is possible because deep reinforcement learning algorithms can store long-range tree search results for the next best move in memory and do very large computations that are difficult for a human brain.
Designing a machine learning solution involves several steps, such as collecting data; understanding, cleansing and normalizing data; doing feature engineering; selecting or designing the algorithm; selecting the model architecture; selecting and tuning the model's hyper-parameters; evaluating the model's performance; deploying and monitoring the machine learning system in an online system; and so on. Such machine learning solution design requires an expert Data Scientist to complete the pipeline.
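The "select the right algorithm" step that AutoML automates can be sketched as a loop that fits each candidate and keeps the best validation score. The dataset and both toy "models" below are invented purely for illustration:

```python
import random
import statistics

# Toy 1-D dataset: class 1 tends to have larger feature values (illustrative).
random.seed(0)
data = [(random.gauss(0, 1), 0) for _ in range(100)] + \
       [(random.gauss(3, 1), 1) for _ in range(100)]
random.shuffle(data)
train, valid = data[:150], data[150:]

def fit_majority(train):
    """Baseline 'model': always predict the most common training class."""
    majority = round(statistics.mean(y for _, y in train))
    return lambda x: majority

def fit_threshold(train):
    """'Model': predict 1 above the midpoint of the two class means."""
    m0 = statistics.mean(x for x, y in train if y == 0)
    m1 = statistics.mean(x for x, y in train if y == 1)
    thr = (m0 + m1) / 2
    return lambda x: 1 if x > thr else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

# The AutoML-style part: evaluate each candidate pipeline on held-out data
# and keep the winner, automating the 'select the algorithm' step.
candidates = {"majority": fit_majority, "threshold": fit_threshold}
scores = {name: accuracy(fit(train), valid) for name, fit in candidates.items()}
best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 2))
```

Real AutoML systems do this over thousands of pipeline variants with smarter search than exhaustive scoring, but the contract is the same: candidates in, validated winner out.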
Auto ML (AML)
As the complexity of these and other tasks can easily get overwhelming, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The AI research area that encompasses progressive automation of machine learning pipeline tasks is called AutoML (Automatic Machine Learning).
Google CEO Sundar Pichai wrote, "Designing neural nets is extremely time intensive and requires an expertise that limits its use to a smaller community of scientists and engineers. That's why we've created an approach called AutoML, showing that it's possible for neural nets to design neural nets," while Google's Head of AI Jeff Dean suggested that 100x computational power could replace the need for machine learning expertise.
AutoML Vision relies on two core techniques: transfer learning and neural architecture search.
[Figure: an AutoML system takes Xtrain, Ytrain, Xtest and a budget; meta-learning and a hand-crafted portfolio seed a Bayesian optimization loop over an ML pipeline (data processor, feature preprocessor, classifier), and an ensemble is built to produce Ytest.]
Figure 120 An example of an auto-sklearn pipeline. Source: André Biedenkapp, "We did it Again: World Champions in AutoML"
Implementing AutoML
Here is a look at a few libraries that help in implementing AutoML.
AUTO-SKLEARN
Auto-sklearn automates several key tasks in the Machine Learning pipeline, such as addressing column missing values, encoding of categorical values, data scaling and normalization, feature pre-processing, and selection of the right algorithm with hyper-parameters. The pipeline supports 15 classification and 14 feature processing algorithms. Selection of the right algorithm can happen based on ensembling techniques and by applying meta-knowledge gathered from executing similar scenarios (datasets and algorithms).
Usage
Auto-sklearn is written in Python and can be considered a drop-in replacement for scikit-learn classifiers. Here is a sample set of commands:

>>> import autosklearn.classification
>>> cls = autosklearn.classification.AutoSklearnClassifier()
>>> cls.fit(X_train, y_train)
>>> predictions = cls.predict(X_test)
SMAC (Sequential Model-Based Algorithm Configuration)
SMAC is a tool for automating certain AutoML steps. It is useful for selecting key features, for hyper-parameter optimization, and for speeding up algorithmic outputs.
BOHB (Bayesian Optimization Hyperband)
BOHB combines Bayesian hyper-parameter optimization with bandit methods for faster convergence.
Google and H2O also have their respective AutoML tools, which are not covered here but can be explored for specific cases.
AutoML needs significant memory and computational power to execute alternate algorithms and compute results. At present, GPU resources are extremely costly to execute even simple Machine Learning workloads, such as a CNN algorithm to classify objects. If multiple such alternate algorithms are to be executed, the computation cost would grow exponentially. This is impractical, infeasible and inefficient for the current state of the Data Science industry. Adoption of AutoML will depend on two things: one, the maturity of the AutoML pipeline, and two, more importantly, how quickly GPU clusters become cheap, the second being the most critical. Selling cloud GPU capacity could be one motivation for several cloud infrastructure companies to promote AutoML in the industry. Also, AutoML will not replace the Data Scientist's work, but it can provide augmentation and speed for certain tasks, such as data standardization, model tuning, and trying multiple algorithms. It is only the beginning for AutoML, but this technique has high relevance and usefulness for solving ultra-complex problems.
Neural Architecture Search (NAS)
Neural Architecture Search (NAS) is a component of AutoML and addresses the important step of designing the Neural Network architecture.
Designing a fresh Neural Net architecture involves an expert establishing and organizing Neural Network layers, filters or channels, and filter sizes, and selecting other optimum hyper-parameters, and so on, through several rounds of computational iterations. Since the AlexNet deep neural
network architecture won the ImageNet competition (image classification based on the ImageNet dataset) in 2012, several architecture styles such as VGG, ResNet, Inception, Xception, InceptionResNet, MobileNet and NASNet have evolved significantly. However, selecting the right architecture for the right problem is also a skill, due to the various influencing factors, such as applicability to the problem, accuracy, number of parameters, memory and computational footprint, and size of the architecture, that govern the overall functioning efficiency.
Neural Architecture Search tries to address this problem space by automatically selecting the right Neural Network architecture to solve a given problem.
[Figure: NAS and Hyperparameter Optimization shown as overlapping sub-areas of AutoML.]
Figure 130 Source: Liam Li, Ameet Talwalkar, "What is neural architecture search?"
Key Components of NAS
Search space: The search space provides the boundary within which the specific architecture needs to be searched. Computer Vision use cases (captioning a scene, product identification) would need a different neural network architecture style from Speech (speech transcription, speaker classification) or unstructured text (topic extraction, intent mining) use cases. The search space tries to provide available catalogs of best-in-class architectures based on other domains' data and performance. These are also usually hand-crafted by expert data scientists.
Optimization method: This is responsible for providing the mechanism to search for the best architecture. Architectures could be searched and applied randomly, or by using a statistical or Machine Learning evaluation approach such as Bayesian methods or reinforcement learning methods.
Evaluation method: This has the role of evaluating the quality of the architectures considered by the optimization method. It could be done using a full training approach, or by doing partial training and then applying certain specialized methods such as early stopping, weight sharing, network morphism, etc.
For selective problem spaces, NAS has outperformed manual methods and is showing definite promise for the future. However, it is still evolving and not ready for production usage, as several architectures need to be established and evaluated depending on the problem space.
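A sketch of the simplest optimization method, random search, over a hand-defined search space. The `evaluate` function below is a made-up proxy score standing in for the expensive train-and-validate evaluation step a real NAS system would run:

```python
import random

# Hand-crafted search space: depth, width and kernel size choices.
search_space = {
    "depth":  [2, 4, 6, 8],
    "width":  [16, 32, 64],
    "kernel": [3, 5, 7],
}

def sample_architecture(rng):
    """Optimization method: random search simply samples the space."""
    return {key: rng.choice(options) for key, options in search_space.items()}

def evaluate(arch):
    """Evaluation method stand-in: an invented proxy score instead of a full
    training run (rewards depth, penalizes parameter count)."""
    params = arch["depth"] * arch["width"] * arch["kernel"] ** 2
    return arch["depth"] * 10 - params / 1000

rng = random.Random(42)
best_arch, best_score = None, float("-inf")
for _ in range(50):                      # fixed evaluation budget
    arch = sample_architecture(rng)
    score = evaluate(arch)
    if score > best_score:
        best_arch, best_score = arch, score
print(best_arch, round(best_score, 2))
```

Swapping `sample_architecture` for a Bayesian or reinforcement-learning controller, and `evaluate` for partial training or weight sharing, recovers the component breakdown described above.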
[Figure: Components of NAS. Search space: DAG representation, cell block, meta-architecture, NAS-specific. Optimization method: reinforcement learning, evolutionary search, gradient-based optimization, Bayesian optimization. Evaluation method: full training, partial training, weight-sharing, network morphism, hypernetworks.]
Figure 140 Components of NAS. Source: Liam Li, Ameet Talwalkar, "What is neural architecture search?"
Addressing H3 AI Trends at Infosys
In this paper we looked at some key H3 AI areas; by no means is this an exhaustive list. Amongst all those discussed, Transfer Learning, Capsule Networks, Explainable AI and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases and integrating them into our product stack, Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do.
Trend | Use cases
1. Explainable AI (XAI) | Applicable wherever results need to be traced, e.g. tumor detection, mortgage rejection, candidate selection, etc.
2. Generative AI / Neural Style Transfer (NST) | Art generation, sketch generation, image or video resolution improvements, data generation/augmentation, music generation
3. Fine Grained Classification | Vehicle classification, type of tumor detection
4. Capsule Networks | Image re-construction, image comparison/matching
5. Meta Learning | Intelligent agents, continuous learning scenarios for document review and corrections
6. Transfer Learning | Identifying a person not wearing a helmet, logo/brand detection in images, speech model training for various accents and vocabularies
7. Single Shot Learning | Face recognition, face verification
8. Deep Reinforcement Learning (RL) | Intelligent agents, robots, driverless cars, traffic light monitoring, continuous learning scenarios for document review and corrections
9. Auto ML | Invoice attribute extraction, document classification, document clustering
10. Neural Architecture Search (NAS) | CNN or RNN based use cases such as image classification, object identification, image segmentation, speaker classification, etc.
Table 20 AI Use cases Infosys Research
References
1. Explainable AI (XAI)
2. Fine Grained Classification
3. Capsule Networks
4. Meta Learning
5. Transfer Learning
6. Single Shot Learning
7. Deep Reinforcement Learning (RL)
8. Auto ML
bull httpschristophmgithubiointerpretable-ml-book
bull httpssimmachinescomexplainable-ai
bull httpswwwcmuedunewsstoriesarchives2018octoberexplainable-aihtml
bull httpsmediumcomQuantumBlackmaking-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
bull httpstowardsdatasciencecomexplainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739
bull httpsvisioncornelleduse3wp-contentuploads201502BMVC14pdf
bull httpswwwfastai20180723auto-ml-3
bull httpsarxivorgpdf160305106pdf
bull httpsarxivorgpdf171009829pdf
bull httpskerasioexamplescifar10_cnn_capsule
bull httpswwwyoutubecomwatchv=pPN8d0E3900
bull httpswwwyoutubecomwatchv=rTawFwUvnLE
bull httpsmediumfreecodecamporgunderstanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc
bull httpsmediumcomjrodthoughtswhats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
bull httpproceedingsmlrpressv48santoro16pdf
bull httpstowardsdatasciencecomwhats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660
bull httpsdeepmindcomblogarticledeep-reinforcement-learning
bull httpsmediumfreecodecamporgan-introduction-to-reinforcement-learning-4339519de419
bull httpsmediumcomjonathan_huialphago-zero-a-game-changer-14ef6e45eba5
bull httpsarxivorgpdf181112560pdf
bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac
bull httpswwwfastai20180723auto-ml-3
bull httpswwwfastai20180716auto-ml2auto-ml
bull httpscompetitionscodalaborgcompetitions17767
bull httpswwwautomlorgautomlauto-sklearn
bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac
bull httpsautomlgithubioHpBandSterbuildhtmloptimizersbohbhtml
© 2019 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.
For more information, contact askus@infosys.com
Infosys.com | NYSE: INFY | Stay Connected
9 Neural Architecture Search (NAS)
10 Infosys Enterprise Cognitive Platform
bull httpswwworeillycomideaswhat-is-neural-architecture-search
bull httpswwwinfosyscomservicesincubating-emerging-technologiesofferingsPagesenterprise-cognitive-platformaspx
Sudhanshu Hate is the inventor and architect of Infosys Enterprise Cognitive Platform (iECP), a microservices-API-based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and working with clients on industry problems. His current areas of interest are Computer Vision, Speech and unstructured-text-based AI possibilities.
To know more about our work on the H3 trends in AI, write to icets@infosys.com
About the author
Here is a chart representing the broader AI algorithm landscape in the context of this paper.
[Figure: a chart plotting Business Uncertainty against Technology Uncertainty, moving from Core Offerings (mainstream: enhance, adopt/scale) through New Offerings (deploy, differentiate/diversify) to Emerging Offerings (envision/invent, disrupt), with emerging investment opportunities where algorithms and use cases get incubated into new offerings. The algorithms and use cases for each horizon are listed in Table 10.]
Figure 10 Horizon 3 AI Algorithms- Infosys Research
H1 of AI "core offerings" are typically defined as algorithm-powered use cases that have become mainstream and will remain major investment areas for the current wave. In that respect, use cases such as product or customer recommendations, and churn and sentiment analysis, leveraging algorithms such as Random Forest, Support Vector Machines (SVM), Naïve Bayes and n-grams based approaches, have been mainstream for some time and will continue to get weaved into varied AI experiences.

H2 of AI "new offerings" use cases are the ones that are currently in an experimental, evolutionary mode and will have a major impact on Artificial Intelligence systems that will be mainstream in the second wave. Convolution Neural Networks (CNN) have laid the foundation for several art-of-the-possible Computer Vision use cases, ranging from object detection, image captioning and segmentation to facial recognition. Long Short-Term Memory (LSTM) and Recurrent Neural Nets (RNN) are helping to significantly improve the possibilities of use cases such as language translation, sentence formulation, text summarization, topic extraction and so on. Word-vector based models such as GloVe and Word2Vec are helping in dealing with large, multi-dimensional text corpuses and in finding hidden, unspotted, complex interwoven relationships and similarities between topics, entities and keywords.

These H2 AI algorithms are promising interesting new possibilities in various business functions; however, they are still in a nascent stage of adoption and user testing.

H3 of AI "emerging offerings" use cases are the ones that are potential game changers and can unearth new possibilities from AI that are unexplored and unimagined today. As these technologies are relatively new, more time is required to establish their weaknesses, strengths and nuances. In this paper we look at key H3 AI algorithmic trends and how we leverage them in various use cases built as part of our IP, Infosys Enterprise Cognitive Platform (iECP).
Horizon 1 (Mainstream)
Algorithms: Logistic Regression; Naive Bayes; Random Forest; Support Vector Machines (SVM); Collaborative Filtering; n-grams
Use cases: Recommendations; Prediction; Document/Image Classification; Document/Image Clustering; Sentiment Analysis; Named Entity Recognition (NER); Keyword Extractions

Horizon 2 (Adopt, Scale)
Algorithms: Convolution Neural Networks (CNN); Long Short-Term Memory (LSTM); Recurrent Neural Networks (RNN); Word2Vec; GloVe; Transfer Learning (Vision)
Use cases: Object Detection; Face Recognition; Product/Brand Recognition and Classification; Speech Recognition; Sentence Completion; Speech Transcriptions; Topic Classification; Topic Extraction; Intent Mining; Question Extraction

Horizon 3 (Envision, Invent, Disrupt)
Algorithms: Explainable AI; Generative Networks; Fine Grained Classification; Capsule Networks; Meta Learning; Transfer Learning (Text); Single Shot Learning; Reinforcement Learning; Auto ML; Neural Architecture Search (NAS)
Use cases: Scene Captioning; Scene Detection; Store Footfall Counts; Specific Object Class Detection; Sentence Completion; Video Scene Prediction; Auto Learning; Fake Images/Art Generation; Music Generation; Data Augmentation
Table 10 H3 Algorithms and Usecases- Infosys Research
Explainable AI (XAI)

Neural Network algorithms are considered to derive hidden patterns from data that many conventional, best-of-breed Machine Learning algorithms, such as Support Vector Machines, Random Forest and Naïve Bayes, are unable to establish. However, there is an increasing rate of incorrect and unexplainable decisions and results produced by Neural Network algorithms in activities such as credit lending, skilled job hiring and facial recognition. Given this scenario, AI results should be justified, explained and reproduced for consistency and correctness, as some of these results can have a profound impact on livelihoods.

Geoffrey Hinton (University of Toronto), often called the godfather of deep learning, explains: "A deep-learning system doesn't have any explanatory power. The more powerful the deep-learning system becomes, the more opaque it can become." It is to address the issues of transparency in AI that Explainable AI was developed. Explainable AI (XAI) as a framework increases the transparency of black-box algorithms by providing explanations for the predictions made, and can accurately explain a prediction at the individual level.

Here are a few approaches, provided through certain frameworks, that can help in understanding the traceability of results.

Feature Visualization, as depicted in the figure below, helps in visualizing the various layers of a neural network. It helps establish that lower layers are useful in learning features such as edges and textures, whereas higher layers provide higher-order abstract concepts such as objects. Network dissection helps in associating these established units with concepts: the units learn from labeled concepts during the supervised training stages, and network dissection shows how, and in what magnitude, they are influenced by channel activations.

[Figure: feature visualizations progressing from Edges to Textures, Patterns, Parts and Objects across layers.]
Figure 20 Feature Visualisation. Source: Olah et al. 2017 (CC-BY 4.0)

Several frameworks are currently evolving to improve the explainability of models. Two known frameworks in this space are LIME and SHAP.

LIME (Local Interpretable Model-agnostic Explanations): LIME treats the model as a black box and tries to create a surrogate model where explainability is supported or feasible, such as SVM, Random Forest or Logistic Regression. The surrogate model is then used to evaluate different components of the image by perturbing the inputs and evaluating the impact on the result, thereby deciding which parts of the image are most important in arriving at the result. Since the original model does not participate directly, the approach is model-independent. The challenge with this approach is that even when the surrogate-model-based explanations are relevant to the model it is used on, they may not be precisely generalizable or one-to-one mappable to the original model all the time.
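The perturb-and-observe idea behind LIME can be sketched with a stub black-box model over an "image" of four superpixels. The model and its probabilities are invented for illustration, and real LIME fits a locally weighted linear surrogate rather than the crude per-feature averaging used here:

```python
import itertools

# Stub black-box model over an 'image' of 4 superpixels (mask: 1 = kept).
# Invented behavior: the class probability depends mostly on superpixel 2.
def black_box_probability(mask):
    return 0.1 + 0.7 * mask[2] + 0.1 * mask[0]

# 1. Create perturbed examples by disabling subsets of superpixels.
examples = list(itertools.product([0, 1], repeat=4))

# 2. Query the black box for each perturbed example.
probs = {mask: black_box_probability(mask) for mask in examples}

# 3. Fit a simple surrogate: each superpixel's weight is the average change
#    in probability when it is kept versus disabled (a crude linear fit).
weights = []
for i in range(4):
    kept = [p for m, p in probs.items() if m[i] == 1]
    dropped = [p for m, p in probs.items() if m[i] == 0]
    weights.append(sum(kept) / len(kept) - sum(dropped) / len(dropped))

# 4. The superpixel with the highest positive weight is the explanation.
explanation = max(range(4), key=lambda i: weights[i])
print(explanation, [round(w, 2) for w in weights])
```

Even this toy version is model-independent: only `black_box_probability` is queried, never its internals, which is the property the text attributes to LIME.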
Steps:
1. Create a set of noisy (perturbed) example images by disabling certain features (marking certain portions gray)
2. For each example, get the probability that a tree frog is in the image as per the original model
3. Using these created data points, train a simple linear model (Logistic Regression, etc.) and get the results
4. Superpixels with the highest positive weights become the explanation

Figure 30 Explaining a Prediction with LIME. Source: Pol Ferrando, "Understanding how LIME explains predictions"

SHAP (SHapley Additive exPlanations): SHAP uses a game-theory based approach to explain the outcome, using various permutations and combinations of features and their effect on the delta of the result (predicted minus actual), and then computing the average of the score for each feature to explain the results. For image use cases, it marks the dominating feature areas by coloring the pixels in the image. SHAP produces relatively accurate results and is more widely used in Explainable AI than LIME.

Generative AI

Generative AI will have a potentially strong role in creative work, be it writing articles, creating completely new images from an existing set of trained models, improving image or video quality, merging images for artistic creations, creating music, or improving datasets through data generation. Generative AI, as it matures in the near term, will augment many jobs and will potentially replace many in the future.

Generative Networks consist of two deep neural networks, a generative network and a discriminative network. They work together to provide a high-level simulation of conceptual tasks.

To train a Generative model, we first collect a large amount of data in some domain (e.g. think millions of images, sentences or sounds) and then train the model to generate similar data. The generative network generates data to fool the discriminative network, while the discriminative network learns by identifying real vs. fake data received from the generative network. The generator trains with an objective function based on whether it can fool the discriminator network, whereas the discriminator trains on its ability to not be fooled and to correctly identify real vs. fake. Both networks learn through backpropagation. The generator is typically a deconvolutional neural network, and the discriminator is a convolutional neural network.

Generative Networks can be of multiple types depending on the objective they are designed for, one example being Neural Style Transfer.

Neural Style Transfer (NST)

Neural Style Transfer (NST) is one of the Generative AI techniques in deep learning. As seen below, it merges two images, namely a content image (C) and a style image (S), to create a generated image (G). The generated image G combines the content of image C with the style of image S.
Some of the other GAN variations that are popular are:
• Super Resolution GAN (SRGAN), which helps improve the quality of images
• Stack-GAN, which generates realistic-looking photographs from textual descriptions of simple objects like birds and flowers
• Sketch-GAN, a generative model for vector drawings, which is a Recurrent Neural Network (RNN) able to construct stroke-based drawings of common objects; the model is trained on a dataset of human-drawn images representing many different classes
• eGANs (Evolutionary Generative Adversarial Networks), which generate photographs of faces at different ages, from young to old
• IcGAN, which reconstructs photographs of faces with specific features, such as changes in hair color, style, facial expression, and even gender
[Figure: content/style/generated image examples: a colorful circle rendered in the style of a blue painting; the Louvre museum rendered in an impressionist painting style; the ancient city of Persepolis rendered in the style of Van Gogh's The Starry Night.]
Figure 40 Novel Artistic Images through Neural Style Transfer. Source: Fisseha Berhane, "Deep Learning & Art: Neural Style Transfer"
Fine Grained Classification
Classification of an object into specific categories such as car, table or flower is common in Computer Vision. However, establishing an object's finer class based on specific characteristics is where AI is making rapid progress. This is because granular features of objects are being trained on and used for differentiating objects.
Examples of Fine Grained Classification are:
• Fine-grained clothing style finder, type of a shoe, etc.
• Recognizing a car type
• Recognizing a breed of dog, plant species, insect, bird species, etc.
However, fine-grained classification is challenging due to the difficulty of finding discriminative features: finding the subtle traits that fully characterize the object is not straightforward.
Fine Grained Classification Approaches
• Feature representations that better preserve fine-grained information
• Segmentation-based approaches that facilitate the extraction of purer features and part/pose normalized feature spaces
• Pose normalization schemes
In Fine Grained Classification, the progression through an 8-layer CNN network can be thought of as a progression from low- to mid- to high-level features. The later layers aggregate more complex structural information across larger scales: sequences of convolutional layers interleaved with max-pooling can capture deformable parts, and fully connected layers can capture complex co-occurrence statistics.
Bird recognition is one of the major examples of fine grained classification. In the image below, given a test image, groups of detected key points are used to compute multiple warped image regions that are aligned with prototypical models. Each region is fed through a deep convolutional network, and features are extracted from multiple layers, after which they are concatenated and fed to a classifier.
Figure 50 Bird Recognition Pipeline Overview. Source: Branson, Van Horn et al., "Bird Species Categorization"
Car Detection System using Fine Grained Classification
The pictures and steps below depict a fine grained classification approach for a car detection system:
a. Detect parts using a collection of unsupervised part detectors
b. Output a grid of discriminative features (the CNN is learned with class labels and then truncated, retaining the first two convolutional layers, which retain spatial information); the appearance of each detected part is described by pooling the learned CNN features in the detected region of each part
c. Set the appearance of any undetected part to zero; this results in the Ensemble of Localized Learned Features (ELLF) representation, which is then used to predict fine-grained object categories
d. A standard CNN passes the output of the convolutional layers through several fully connected layers in order to make a prediction
Capsule Network
Convolutional Networks are so far the de facto and well-accepted algorithms for working with image-based datasets. They work on the pixels of images using filters (channels) of various sizes, convolving and using pooling techniques to bubble up the stronger features, deriving colors, textures, edges and shapes, and establishing structures from the lower to the highest layers.
Given the face of a person, a CNN identifies the face by establishing the eyes, ears, eyebrows, lips, chin and other components of the face. However, if the facial image is provided with an incorrect position and alignment of eyes and eyebrows, or say with eyebrows swapped with lips and ears placed on the forehead, the same CNN-trained algorithm would still go on and detect this as a human face. This is a huge drawback of the CNN algorithm and happens due to its inability to store information on the relative position of various objects.
Capsule Network, invented by Geoffrey Hinton, addresses exactly this problem of CNN by storing the spatial relationships of various parts.
Capsule Networks, like CNNs, are multi-layered neural networks consisting of several capsules, and each capsule consists of several neurons. Capsules in lower layers, called primary capsules, are trained to detect an object (e.g., a triangle or a circle) within a given region of the image. Each outputs a vector with two properties: length and orientation. Length represents the probability of the presence of the object, and orientation represents the pose parameters of the object, such as coordinates, rotation angle, etc.
Capsules in higher layers, called routing capsules, detect larger and more complex objects such as eyes, ears, etc.
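The length-as-probability property of a capsule's output vector comes from the squashing non-linearity used in the original CapsNet paper; a minimal NumPy sketch (function and variable names are ours):

```python
import numpy as np

def squash(s, eps=1e-9):
    """Squash a capsule's raw vector s so that its length lies in [0, 1)
    (interpretable as a probability of presence) while its orientation
    (direction, i.e. the pose) is preserved."""
    norm_sq = np.sum(s * s)
    norm = np.sqrt(norm_sq + eps)
    return (norm_sq / (1.0 + norm_sq)) * (s / norm)

v = squash(np.array([3.0, 4.0]))   # raw vector of length 5.0
length = np.linalg.norm(v)         # close to 1: object very likely present
```

Short vectors are shrunk toward zero and long vectors saturate just below length 1, so the vector length behaves like a probability while the direction carries the pose.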
Figure 60: Car Detection System. Source: Learning Features and Parts for Fine-Grained Recognition
Routing by Agreement
Unlike CNNs, which primarily bubble up higher-order features using max or average pooling, Capsule Networks bubble up features using routing by agreement, where every capsule participates in choosing the shape by voting (in a democratic-election way).
In the figure given above:
• The lower level corresponds to rectangles, triangles and circles.
• The higher level corresponds to houses, boats and cars.
If there is an image of a house, the capsules corresponding to rectangles and triangles will have large activation vectors. Their relative positions (coded in their instantiation parameters) will bet on the presence of high-level objects. Since they agree on the presence of a house, the output vector of the house capsule will become large. This in turn will make the predictions by the rectangle and triangle capsules larger. This cycle repeats 4-5 times, after which the bets on the presence of a house will be considerably larger than the bets on the presence of a boat or a car.
Advantages over CNN
• Less data for training: Capsule Networks need far less data for training (almost 10%) as compared to CNNs.
• Fewer parameters: The connections between layers require fewer parameters, as a capsule groups neurons, resulting in relatively less computation bandwidth.
• Preserved pose and position: They preserve pose and position information, unlike CNNs.
• High accuracy: Capsule Networks have shown higher accuracy compared to CNNs.
• Reconstruction vs. mere classification: A CNN helps you classify images but not reconstruct the same image, whereas Capsule Networks help you reconstruct the exact image.
• Information retention vs. loss: With a CNN, a kernel for edge detection works only at a specific angle, and each angle requires a corresponding kernel. When dealing with edges, a CNN works well because there are very few ways to describe an edge. Once we get up to the level of shapes, however, we do not want a kernel for every angle of rectangles, ovals, triangles and so on. It would get unwieldy, and would become even worse when dealing with more complicated shapes that have 3-dimensional rotations and features like lighting; this is the reason traditional neural nets do not handle unseen rotations effectively.
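The voting cycle described above can be sketched as a simplified dynamic-routing loop. This is an illustrative NumPy reduction of the procedure from Dynamic Routing Between Capsules, run on toy prediction vectors rather than a trained network; shapes and values are invented for the example:

```python
import numpy as np

def squash(s, eps=1e-9):
    norm_sq = np.sum(s * s, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def route(u_hat, iterations=3):
    """Routing by agreement.
    u_hat: (n_lower, n_higher, dim) prediction vectors: what each lower
    capsule predicts for each higher capsule's pose."""
    n_lower, n_higher, _ = u_hat.shape
    b = np.zeros((n_lower, n_higher))                         # routing logits
    for _ in range(iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling (the "votes")
        s = (c[..., None] * u_hat).sum(axis=0)                # weighted sum per higher capsule
        v = squash(s)                                         # (n_higher, dim) outputs
        b += np.einsum('ijk,jk->ij', u_hat, v)                # agreement raises the bets
    return v, c

rng = np.random.default_rng(0)
# Toy example: 3 lower capsules agree on higher capsule 0 ("house")
# and disagree on higher capsule 1 ("boat").
u_hat = np.zeros((3, 2, 4))
u_hat[:, 0, :] = [1.0, 0.0, 0.0, 0.0]            # identical pose predictions
u_hat[:, 1, :] = rng.normal(size=(3, 4)) * 0.1   # conflicting, weak predictions
v, c = route(u_hat)
```

After a few iterations the coupling coefficients concentrate on capsule 0, and its output vector grows long (high presence probability) while the disagreed-upon capsule stays short.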
Capsule Networks are best suited for object detection and image segmentation, as they better model hierarchical relationships and provide high accuracy. However, Capsule Networks are still under research: they are relatively new and have mostly been tested and benchmarked on the MNIST dataset. Nevertheless, they may well be the future for working with the massive use cases emerging from vision datasets.
Figure 70: A simple CapsNet with 3 layers. This model gives comparable results to deep convolutional networks. Source: Dynamic Routing Between Capsules, Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton
Figure 80: Capsule Network for house or boat classification. Source: Beginners' Guide to Capsule Networks
Meta Learning
Traditional methods of learning in Machine Learning focus on taking a huge labeled dataset and then learning to detect y (the dependent variable, say classifying an image as cat or dog) given a set of x (the independent variables, images of cats and dogs). This process involves selecting an algorithm, such as a Convolutional Neural Net, and arriving at various hyperparameters, such as the number of layers in the network, the number of neurons in each layer, the learning rate, weights, bias, dropouts, and the activation function used to activate the neurons (such as sigmoid, tanh or ReLU). The learning happens through several iterations of forward and backward passes (propagation), readjusting (also called learning) the weights based on the loss (the difference between actual and computed values). At the minimal loss, the weights and other network parameters are frozen and considered the final model for future prediction tasks. This is obviously a long and tedious process, and repeating it for every use case or task is engineering-, data- and compute-intensive.
Meta Learning focuses on how to learn to learn, and is one of the most fascinating disciplines of artificial intelligence. Human beings have varying styles of learning: some people learn and memorize with one instance of a visual or auditory scan; some need multiple perspectives to strengthen the neural connections for permanent memory; some remember by writing, while others remember through actual experiences. Meta Learning tries to leverage these styles to build its learning characteristics.
Types of Meta-Learning Models
Like the variety in human learning techniques, Meta Learning also uses various learning methods based on the patterns of problems, such as those based on boundary space, the amount of data, optimizing the size of the neural network, or using a recurrent network approach. Each of these is briefly discussed below.
Few Shots Meta-Learning
This learning technique focuses on learning from a few instances of data. Typically, neural nets need millions of data points to learn; Few Shots Meta-Learning, however, uses only a few instances of data to build models. An example is facial recognition using Single Shot Learning, which is explained in detail in the Single Shot Learning section.
Optimizer Meta-Learning
In this method, the emphasis is on optimizing the neural network and its hyperparameters. A great example of optimizer meta-learning is models focused on improving gradient descent techniques.
Metric Meta-Learning
In this learning method, the metric space is narrowed down to improve the focus of learning. The learning is then carried out only in this metric space, leveraging the various optimization parameters established for the given metric space.
Recurrent Model Meta-Learning
This type of meta-learning model is tailored to Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM) networks. In this architecture, the meta-learner algorithm trains an RNN model to process a dataset sequentially and then process new inputs from the task. In an image classification setting, this might involve passing in the (image, label) pairs of a dataset sequentially, followed by new examples which must be classified. Meta-Reinforcement Learning is an example of this approach.
Transfer Learning (TL)
Humans can learn from their own existing experiences, or from experiences they have heard, seen or observed. The Transfer Learning discipline of AI is based on similar traits of human learning, where new models can learn from and benefit from existing trained models.
For example, if a Computer Vision based detection model that already detects various types of vehicles, such as cars, trucks and bicycles, needs to be trained to detect an airplane, then without Transfer Learning you may have to retrain the full model with images of all the previous objects. With Transfer Learning, however, you can introduce an additional layer on top of the existing pre-trained layers to start detecting airplanes.
Typically, in a no-Transfer-Learning scenario, the model needs to be trained from scratch: during training, the right weights are arrived at through many iterations (epochs) of forward and backward propagation, which takes a significant amount of computation power and time. In addition, vision models need a significant amount of image data, such as, in this example, images of airplanes, to be trained.
With the Transfer Learning approach, you can reuse the pre-trained weights of an existing trained model with a significantly smaller number of images (5 to 10 percent of the images needed to train a ground-up model) for the model to start detecting. As the pre-trained model has already learnt the basics of identifying edges, curves and shapes in the earlier layers, it only needs to learn the higher-order features specific to airplanes on top of the existing computed weights. In brief, Transfer Learning helps eliminate the need to learn everything from scratch.
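The freeze-the-backbone, train-only-the-head idea can be illustrated without any deep learning framework. In the toy sketch below, a fixed random projection stands in for a pre-trained CNN backbone, and only a small logistic-regression head is trained on the new task; all names, sizes and data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained backbone: a FIXED (frozen) random projection
# with a ReLU. In practice this would be, e.g., a CNN pre-trained on
# ImageNet with its top classification layers removed.
W_frozen = rng.normal(size=(64, 16)) / np.sqrt(64)   # scaled for stable features
def backbone(x):
    return np.maximum(x @ W_frozen, 0.0)             # frozen: never updated below

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Small labeled set for the NEW task only. The labels happen to be
# expressible in the frozen features, which is the working premise of
# transfer learning: the backbone already captures useful structure.
X = rng.normal(size=(200, 64))
feats = backbone(X)                         # computed once; backbone stays frozen
v_true = rng.normal(size=16)
y = (feats @ v_true > 0).astype(float)      # synthetic "airplane" labels

# Train ONLY the new head on top of the frozen features.
w, b = np.zeros(16), 0.0
for _ in range(500):
    p = sigmoid(feats @ w + b)
    w -= 0.5 * feats.T @ (p - y) / len(y)
    b -= 0.5 * np.mean(p - y)

accuracy = np.mean((sigmoid(feats @ w + b) > 0.5) == y)
```

Because only the 17 head parameters are updated, the "fine-tuning" needs far less data and compute than retraining the whole network would.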
Transfer Learning helps save a significant amount of data, computational power and time in training new models, as they leverage pre-trained weights from existing trained models and architectures. However, it is important to understand that the Transfer Learning approach today is only mature enough to be applied to similar use cases; that is, you cannot use the vehicle detection model discussed above to train a facial recognition model.
Another key consideration with Transfer Learning is understanding the details of the data on which the underlying model was trained, as it can implicitly push built-in biases from the underlying data into newer systems. It is recommended that the datasheets of the underlying models and data be studied thoroughly, unless the usage is for experimental purposes only.
Having used the human-brain rationale earlier, it is important to note that human brains have gone through centuries of experience and gene evolution and hence can learn faster, whereas transfer learning is just a few decades old and is only now becoming the ground for new vision and text use cases.
Figure 90: Transfer Learning Layers. Source: John Cherrie, Training Deep Learning Models with Transfer Learning
Single Shot Learning
Humans have the impressive skill to reason about new concepts and experiences from just a single example. They have the ability for one-shot generalization: the aptitude to encounter a new concept, understand its structure, and then generate compelling alternative variations of the same.
Facial recognition systems are good candidates for Single Shot Learning; otherwise, needing tens of thousands of individual face images to train one neural network can be extremely costly, time consuming and often infeasible. A Single Shot Learning based system, using an existing pre-trained FaceNet model and a facial-encoding based approach on top of it, can be very effective at establishing face similarity by computing the distance between faces.
In this approach, a 128-dimensional encoding of each face image is generated and compared with another image's encoding to determine whether the person is the same or different. Various distance-based algorithms, such as Euclidean distance, can be used to determine whether the two encodings are within a specified threshold. The model training approach involves creating (Anchor, Positive) and (Anchor, Negative) pairs and training the model in a way that the (Anchor, Positive) pair distance is smaller and the (Anchor, Negative) distance is larger.
"Anchor" is the image of the person for whom the recognition model needs to be trained.
"Positive" is another image of the same person.
"Negative" is an image of a different person.
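A minimal sketch of the encoding-and-distance idea, using made-up 128-dimensional vectors in place of real FaceNet outputs; the 0.7 verification threshold and the 0.2 margin are illustrative values only:

```python
import numpy as np

rng = np.random.default_rng(1)

def l2_normalize(v):
    return v / np.linalg.norm(v)

# Hypothetical 128-dimensional face encodings standing in for the output
# of a FaceNet-style embedding model (real encodings come from the network).
anchor   = l2_normalize(rng.normal(size=128))
positive = l2_normalize(anchor + 0.05 * l2_normalize(rng.normal(size=128)))  # same person
negative = l2_normalize(rng.normal(size=128))                                # different person

def distance(a, b):
    return float(np.linalg.norm(a - b))   # Euclidean distance between encodings

def triplet_loss(a, p, n, margin=0.2):
    """Zero only when the anchor-positive distance is at least `margin`
    (in squared terms) smaller than the anchor-negative distance;
    training pushes encodings toward that condition."""
    return max(0.0, distance(a, p) ** 2 - distance(a, n) ** 2 + margin)

same_person = distance(anchor, positive) < 0.7   # illustrative threshold
```

At verification time only the distance and threshold are needed, which is why a single enrolled image per person is enough.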
Figure 100: Encoding approach, inspired by the ML course from Coursera
Deep Reinforcement Learning (RL)
This is a specialized Machine Learning discipline in which an agent learns to behave in an environment by receiving rewards or punishments for the actions it performs. The agent can have an objective of maximizing short-term or long-term rewards. This discipline uses deep learning techniques to bring human-level performance to the given task.
Deep Reinforcement Learning has found significant relevance and application in various game design systems, such as video games, chess, AlphaGo and Atari, as well as in industrial applications such as robots and driverless cars.
In reinforcement learning, the policy π controls what action we should take, while the value function v measures how good it is to be in a particular state. The value function tells us the maximum expected future reward the agent will get at each state.
Three Approaches to Reinforcement Learning
Value Based
In value-based RL, the goal is to optimize the value function V(s). The value of each state is the total amount of reward an agent can expect to accumulate over the future, starting at that state. A Q-table uses a mathematical function to arrive at a state based on an action. The agent uses this value function to select which state to choose at each step.
Policy Based
In policy-based RL, we want to directly optimize the policy function π(s) without using a value function. The policy is what defines the agent's behavior at a given time. There are two types of policies:
1. Deterministic: a policy which, at a given state, will always return the same action.
2. Stochastic: a policy that outputs a probability distribution over actions.
Value based and policy based are the more conventional Reinforcement Learning approaches. They are useful for modeling relatively simple systems.
Figure 110: Q-learning uses a Q-table mapping each state to a Q-value per action, while Deep Q-learning uses a Deep Q Neural Network that takes a state and outputs the expected discounted reward for each action (action = policy(state)). Schema inspired by the Q-learning notebook by Udacity
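The value-based approach can be illustrated with tabular Q-learning on a toy corridor environment. This is a sketch under assumed toy dynamics, not a production RL loop; the environment, rates and episode count are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tabular Q-learning on a toy 1-D corridor: states 0..4, the agent starts
# at 0 and receives a reward of 1 on reaching state 4. Deep Q-learning
# replaces this explicit Q-table with a Deep Q Neural Network.
n_states, n_actions = 5, 2              # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))     # Q-table: expected return per (state, action)
alpha, gamma = 0.5, 0.9                 # learning rate, discount factor

def step(s, a):
    s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == n_states - 1 else 0.0), s2 == n_states - 1

for _ in range(200):                    # episodes; behavior policy is random
    s, done = 0, False                  # (Q-learning is off-policy)
    while not done:
        a = int(rng.integers(n_actions))
        s2, r, done = step(s, a)
        # Move Q(s, a) toward: reward + discounted best value of next state.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2

greedy_policy = np.argmax(Q, axis=1)    # derived policy: go right in states 0-3
```

The learned Q-values decay geometrically with distance from the goal (roughly 1, 0.9, 0.81, 0.729 for the "right" action), so the greedy policy read off the table always moves toward the reward.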
Model Based
In model-based RL, we model the environment. This means we create a model of the behavior of the environment, and this model is then used to arrive at results that maximize short-term or long-term rewards. The model equation can be any equation defined based on the environment's behavior, and it must be sufficiently generalized to counter new situations.
When the model-based approach uses Deep Neural Network algorithms to generalize sufficiently well and learn the complexities of the environment to produce optimal results, it is called Deep Reinforcement Learning. The challenge with the model-based approach is that each environment needs a dedicated trained model.
AlphaGo was trained using data from several games to beat human players at the game of Go. The training accuracy was just 57%, and yet it was sufficient to beat human-level performance. The training methods combined reinforcement learning and deep learning to build a policy network, which tells which moves are promising, and a value network, which tells how good the board position is. The search for the final move from these networks is done using the Monte Carlo Tree Search (MCTS) algorithm. Using supervised learning, a policy network was created to imitate expert moves.
DeepMind released AlphaGo Zero in late 2017, which beat AlphaGo and did not involve any training of the deep network from previous games data. The deep network was trained by picking training samples from AlphaGo and AlphaGo Zero playing games against themselves, selecting the best moves to train the network, and then applying those in real games to improve the results iteratively. This is possible because deep reinforcement learning algorithms can store long-range tree search results for the next best move in memory and perform very large computations that are difficult for a human brain.
Auto ML (AML)
Designing a machine learning solution involves several steps, such as collecting data; understanding, cleansing and normalizing data; doing feature engineering; selecting or designing the algorithm; selecting the model architecture; selecting and tuning the model's hyperparameters; evaluating the model's performance; and deploying and monitoring the machine learning system in an online setting. Such machine learning solution design requires an expert Data Scientist to complete the pipeline.
As the complexity of these and other tasks can easily get overwhelming, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The AI research area that encompasses progressive automation of machine learning pipeline tasks is called AutoML (Automatic Machine Learning).
Google CEO Sundar Pichai wrote, "Designing neural nets is extremely time intensive and requires an expertise that limits its use to a smaller community of scientists and engineers. That's why we've created an approach called AutoML, showing that it's possible for neural nets to design neural nets," while Google's Head of AI, Jeff Dean, suggested that 100x computational power could replace the need for machine learning expertise.
AutoML Vision relies on two core techniques: transfer learning and neural architecture search.
Figure 120: An example of an Auto-sklearn pipeline: given training data (Xtrain, Ytrain) and a budget, meta-learning and a hand-crafted portfolio seed a Bayesian optimization loop over data processors, feature preprocessors and classifiers, followed by ensemble building to predict Ytest. Source: André Biedenkapp, We did it Again: World Champions in AutoML
Implementing AutoML
Here is a look at a few libraries that help in implementing AutoML.
AUTO-SKLEARN
Auto-sklearn automates several key tasks in the Machine Learning pipeline, such as addressing missing column values, encoding categorical values, data scaling and normalization, feature pre-processing, and selection of the right algorithm with hyperparameters. The pipeline supports 15 classification and 14 feature-processing algorithms. Selection of the right algorithm can happen based on ensembling techniques and by applying meta-knowledge gathered from executing similar scenarios (datasets and algorithms).
Usage
Auto-sklearn is written in Python and can be considered a drop-in replacement for scikit-learn classifiers. Here is a sample set of commands:
>>> import autosklearn.classification
>>> cls = autosklearn.classification.AutoSklearnClassifier()
>>> cls.fit(X_train, y_train)
>>> predictions = cls.predict(X_test)
SMAC (Sequential Model-based Algorithm Configuration)
SMAC is a tool for automating certain AutoML steps. It is useful for the selection of key features, hyperparameter optimization, and speeding up algorithmic outputs.
BOHB (Bayesian Optimization and Hyperband searches)
BOHB combines Bayesian hyperparameter optimization with bandit-based methods to achieve faster convergence.
Google and H2O also have their respective AutoML tools, which are not covered here but can be explored for specific cases.
AutoML needs significant memory and computational power to execute alternative algorithms and compute results. At present, GPU resources are extremely costly even for simple Machine Learning workloads, such as a CNN algorithm to classify objects; if multiple alternative algorithms are to be executed, the compute dollars needed grow exponentially. This is impractical, infeasible and inefficient for the current state of the Data Science industry. Adoption of AutoML will therefore depend on two things: one, the maturity of the AutoML pipeline, and two, more importantly, how quickly GPU clusters become cheap, the second being the most critical. Selling cloud GPU capacity could be one of the motivations for several cloud-infrastructure companies to promote AutoML in the industry. AutoML will not replace the Data Scientist's work, but it can augment and speed up certain tasks, such as data standardization, model tuning and trying multiple algorithms. It is only the beginning for AutoML, but this technique has high relevance and usefulness for solving ultra-complex problems.
Neural Architecture Search (NAS)
Neural Architecture Search (NAS) is a component of AutoML and addresses the important step of designing the Neural Network architecture.
Designing a fresh Neural Net architecture involves an expert establishing and organizing the Neural Network layers, filters or channels, filter sizes and other optimal hyperparameters through several rounds of computational iterations. Since the AlexNet deep neural network architecture won the ImageNet competition (image classification based on the ImageNet dataset) in 2012, several architecture styles, such as VGG, ResNet, Inception, Xception, Inception-ResNet, MobileNet and NASNet, have evolved significantly. However, selecting the right architecture for the right problem is also a skill, due to the various influencing factors, such as applicability to the problem, accuracy, number of parameters, memory and computational footprint, and size of the architecture, that govern the overall functioning efficiency.
Neural Architecture Search tries to address this problem space by automatically selecting the right Neural Network architecture to solve a given problem.
Figure 130: NAS as a component of AutoML, alongside hyperparameter optimization. Source: Liam Li, Ameet Talwalkar, What is neural architecture search?
Key Components of NAS
Search space: The search space provides the boundary within which the specific architecture needs to be searched. Computer Vision use cases (captioning a scene, or product identification) would need a different neural network architecture style from Speech use cases (speech transcription, or speaker classification) or unstructured Text use cases (topic extraction, intent mining). The search space tries to provide available catalogs of best-in-class architectures based on other domain data and performance. These are also usually hand-crafted by expert data scientists.
Optimization method: This is responsible for providing the mechanism to search for the best architecture. Candidates can be searched and applied randomly, or by using a statistical or Machine Learning evaluation approach, such as Bayesian methods or reinforcement learning methods.
Evaluation method: This has the role of evaluating the quality of the architectures considered by the optimization method. It can be done using a full training approach, or by doing partial training and then applying certain specialized methods such as early stopping, weight sharing or network morphism.
For selective problem spaces, NAS has outperformed manual methods and is showing definite promise for the future. However, it is still evolving and not ready for production usage, as several architectures need to be established and evaluated depending on the problem space.
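These three components appear even in the simplest possible NAS setup. The toy sketch below uses random search as the optimization method and a stand-in scoring function as the evaluation method; the search space, the score formula and all names are invented for illustration (a real evaluation would train each candidate network, fully or partially, and measure accuracy):

```python
import numpy as np

rng = np.random.default_rng(0)

# Search space: number of layers and units per layer.
SEARCH_SPACE = {"layers": [1, 2, 3, 4], "units": [8, 16, 32, 64]}

def evaluate(arch):
    """Evaluation method (hypothetical proxy score): prefers mid-sized
    architectures and penalizes parameter count, mimicking the
    accuracy-vs-footprint trade-off discussed above."""
    layers, units = arch
    accuracy_proxy = 1.0 - abs(layers - 2) * 0.1 - abs(units - 32) / 100.0
    cost = layers * units / 1000.0
    return accuracy_proxy - cost

def random_search(n_trials=200):
    """Optimization method: plain random search over the space."""
    best, best_score = None, -np.inf
    for _ in range(n_trials):
        arch = (int(rng.choice(SEARCH_SPACE["layers"])),
                int(rng.choice(SEARCH_SPACE["units"])))
        score = evaluate(arch)
        if score > best_score:
            best, best_score = arch, score
    return best, best_score

best_arch, best_score = random_search()
```

Swapping `random_search` for Bayesian optimization, evolutionary search or a reinforcement-learning controller changes only the optimization method; the search space and evaluation method stay the same, which is exactly the decomposition in the figure below.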
Figure 140: Components of NAS: the search space (DAG representation, cell block, meta-architecture, NAS-specific), the optimization method (reinforcement learning, evolutionary search, gradient-based optimization, Bayesian optimization) and the evaluation method (full training, partial training, weight-sharing, network morphism, hypernetworks). Source: Liam Li, Ameet Talwalkar, What is neural architecture search?
Addressing H3 AI Trends at Infosys
In this paper we looked at some key H3 AI areas; by no means is this an exhaustive list. Among those discussed, Transfer Learning, Capsule Networks, Explainable AI and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases and integrating them into our product stack, the Infosys Enterprise Cognitive platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do.
Trend: Use cases
1. Explainable AI (XAI): Applicable wherever results need to be traceable, e.g., Tumor Detection, Mortgage Rejection, Candidate Selection, etc.
2. Generative AI / Neural Style Transfer (NST): Art Generation, Sketch Generation, Image or Video Resolution Improvements, Data Generation/Augmentation, Music Generation
3. Fine Grained Classification: Vehicle Classification, Type of Tumor Detection
4. Capsule Networks: Image Re-construction, Image Comparison/Matching
5. Meta Learning: Intelligent Agents, Continuous Learning scenarios for document review and corrections
6. Transfer Learning: Identifying a person not wearing a helmet, Logo/brand detection in images, Speech model training for various accents and vocabularies
7. Single Shot Learning: Face Recognition, Face Verification
8. Deep Reinforcement Learning (RL): Intelligent Agents, Robots, Driverless cars, Traffic Light Monitoring, Continuous Learning scenarios for document review and corrections
9. Auto ML: Invoice Attribute Extraction, Document Classification, Document Clustering
10. Neural Architecture Search (NAS): CNN or RNN based use cases such as Image Classification, Object Identification, Image Segmentation, Speaker Classification, etc.
Table 20: AI Use cases. Infosys Research
References
1. Explainable AI (XAI)
• https://christophm.github.io/interpretable-ml-book
• https://simmachines.com/explainable-ai
• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html
• https://medium.com/QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739
2. Fine Grained Classification
• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf
3. Capsule Networks
• https://arxiv.org/pdf/1710.09829.pdf
• https://keras.io/examples/cifar10_cnn_capsule
• https://www.youtube.com/watch?v=pPN8d0E3900
• https://www.youtube.com/watch?v=rTawFwUvnLE
• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc
4. Meta Learning
• https://medium.com/@jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
• http://proceedings.mlr.press/v48/santoro16.pdf
• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660
5. Transfer Learning
• https://www.fast.ai/2018/07/23/auto-ml-3
6. Single Shot Learning
• https://arxiv.org/pdf/1603.05106.pdf
7. Deep Reinforcement Learning (RL)
• https://deepmind.com/blog/article/deep-reinforcement-learning
• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
• https://medium.com/@jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5
• https://arxiv.org/pdf/1811.12560.pdf
8. Auto ML
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac
• https://www.fast.ai/2018/07/23/auto-ml-3
• https://www.fast.ai/2018/07/16/auto-ml2
• https://competitions.codalab.org/competitions/17767
• https://www.automl.org/automl/auto-sklearn
• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html
copy 2019 Infosys Limited Bengaluru India All Rights Reserved Infosys believes the information in this document is accurate as of its publication date such information is subject to change without notice Infosys acknowledges the proprietary rights of other companies to the trademarks product names and such other intellectual property rights mentioned in this document Except as expressly permitted neither this documentation nor any part of it may be reproduced stored in a retrieval system or transmitted in any form or by any means electronic mechanical printing photocopying recording or otherwise without the prior permission of Infosys Limited and or any named intellectual property rights holders under this document
For more information, contact askus@infosys.com
Infosys.com | NYSE: INFY | Stay Connected
9. Neural Architecture Search (NAS)
• https://www.oreilly.com/ideas/what-is-neural-architecture-search
10. Infosys Enterprise Cognitive Platform
• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx
About the author
Sudhanshu Hate is the inventor and architect of the Infosys Enterprise Cognitive Platform (iECP), a microservices API based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and in working with clients on industry problems. His current areas of interest are Computer Vision, Speech and unstructured-text based AI possibilities.
To know more about our work on the H3 trends in AI, write to icets@infosys.com
H1 of AI, the "core offerings", are typically defined as algorithm-powered use cases that have become mainstream and will remain major investment areas for the current wave. In that respect, use cases such as product or customer recommendations and churn and sentiment analysis, leveraging algorithms such as Random Forest, Support Vector Machines (SVM), Naïve Bayes and n-gram based approaches, have been mainstream for some time and will continue to be woven into varied AI experiences.
H2 of AI, the "new offerings", are use cases that are currently in an experimental, evolutionary mode and will have a major impact on the Artificial Intelligence systems that become mainstream in the second wave. Convolutional Neural Networks (CNN) have laid the foundation for several art-of-the-possible Computer Vision use cases, ranging from object detection, image captioning and segmentation to facial recognition. Long Short-Term Memory (LSTM) networks and Recurrent Neural Nets (RNN) are helping to significantly improve the art of the possible in use cases such as language translation, sentence formulation, text summarization, topic extraction and so on. Word-vector based models such as GloVe and Word2Vec are helping in dealing with large, multi-dimensional text corpuses and in finding hidden, unspotted, complex interwoven relationships and similarities between topics, entities and keywords.
These H2 AI algorithms promise interesting new possibilities in various business functions; however, they are still at a nascent stage of adoption and user testing.
H3 of AI, the "emerging offerings", are use cases that are potential game changers and can unearth possibilities from AI that are unexplored and unimagined today. As these technologies are relatively new, more time is required to establish their weaknesses, strengths and nuances. In this paper we look at key H3 AI algorithmic trends and at how we leverage them in various use cases built as part of our IP, the Infosys Enterprise Cognitive platform (iECP).
Horizon 1 (Mainstream)
Algorithms: Logistic Regression, Naive Bayes, Random Forest, Support Vector Machines (SVM), Collaborative Filtering, n-grams
Use cases: Recommendations, Prediction, Document/Image Classification, Document/Image Clustering, Sentiment Analysis, Named Entity Recognition (NER), Keyword Extractions
Horizon 2 (Adopt, Scale)
Algorithms: Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), Recurrent Neural Networks (RNN), Word2Vec, GloVe, Transfer Learning (Vision)
Use cases: Object Detection, Face Recognition, Product/Brand Recognition and Classification, Speech Recognition, Sentence Completion, Speech Transcriptions, Topic Classification, Topic Extraction, Intent Mining, Question Extraction
Horizon 3 (Envision, Invent, Disrupt)
Algorithms: Explainable AI, Generative Networks, Fine Grained Classification, Capsule Networks, Meta Learning, Transfer Learning (Text), Single Shot Learning, Reinforcement Learning, Auto ML, Neural Architecture Search (NAS)
Use cases: Scene Captioning, Scene Detection, Store Footfall Counts, Specific Object Class Detection, Sentence Completion, Video Scene Prediction, Auto Learning, Fake Images/Art Generation, Music Generation, Data Augmentation
Table 10: H3 Algorithms and Use cases. Infosys Research
Explainable AI (XAI)

Neural Network algorithms are known to derive hidden patterns from data that many conventional best-of-breed Machine Learning algorithms, such as Support Vector Machines, Random Forest and Naïve Bayes, are unable to establish. However, there is an increasing rate of incorrect and unexplainable decisions and results produced by Neural Network algorithms in activities such as credit lending, skilled job hiring and facial recognition. Given this scenario, AI results should be justified, explained and reproducible for consistency and correctness, as some of these results can have a profound impact on livelihoods.

Geoffrey Hinton (University of Toronto), often called the godfather of deep learning, explains: "A deep-learning system doesn't have any explanatory power. The more powerful the deep-learning system becomes, the more opaque it can become." It is to address these issues of transparency in AI that Explainable AI was developed. Explainable AI (XAI) as a framework increases the transparency of black-box algorithms by providing explanations for the predictions made, and it can accurately explain a prediction at the individual level.

Here are a few approaches, provided through certain frameworks, that can help in understanding the traceability of results.

Feature Visualization, as depicted in the figure below, helps in visualizing the various layers in a neural network. It helps establish that lower layers are useful in learning features such as edges and textures, whereas higher layers provide higher-order abstract concepts such as objects. Network dissection helps in associating these established units with concepts: the units learn from labeled concepts during supervised training stages, and network dissection shows how, and in what magnitude, they are influenced by channel activations.

Several frameworks are currently evolving to improve the explainability of models. Two known frameworks in this space are LIME and SHAP.

LIME (Local Interpretable Model-agnostic Explanations) treats the model as a black box and tries to create a surrogate model where explainability is supported or feasible, such as SVM, Random Forest or Logistic Regression. The surrogate model is then used to evaluate different components of the image by perturbing the inputs and evaluating the impact on the result, thereby deciding which parts of the image are most important in arriving at the result. Since the original model does not participate directly, the approach is model independent. The challenge with this approach is that even when the surrogate-model-based explanations are relevant to the model it is used on, they may not generalize precisely or be one-to-one mappable to the original model at all times.
[Figure panels: Edges, Textures, Patterns, Parts, Objects]
Figure 20: Feature Visualisation. Source: Olah et al., 2017 (CC-BY 4.0)
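The surrogate-model idea behind LIME can be sketched in a few lines: perturb the input, query the black box, and fit a weighted linear surrogate whose coefficients act as per-feature explanations. This is a simplified tabular illustration; the helper name, toy black-box model and weighting scheme are ours, not LIME's actual API:

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_style_explanation(predict_fn, x, num_samples=1000, seed=0):
    """Perturb x by randomly masking features, query the black-box model,
    and fit a weighted linear surrogate; its coefficients explain x."""
    rng = np.random.default_rng(seed)
    masks = rng.integers(0, 2, size=(num_samples, x.size))  # 1 = keep feature
    perturbed = masks * x                                    # masked copies of x
    preds = predict_fn(perturbed)
    weights = masks.mean(axis=1)         # samples closer to the original weigh more
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, preds, sample_weight=weights)
    return surrogate.coef_               # per-feature importance for this prediction

# Toy black box: the prediction depends almost entirely on feature 0
black_box = lambda X: 5.0 * X[:, 0] + 0.1 * X[:, 1]
coefs = lime_style_explanation(black_box, np.array([1.0, 1.0, 1.0]))
```

The surrogate's largest coefficient lands on feature 0, mirroring how LIME's highest-weight superpixels become the explanation for an image prediction.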
Steps:
1. Create a set of noisy (perturbed) example images by disabling certain features (marking certain portions gray).
2. For each example, get the probability that a tree frog is in the image as per the original model.
3. Using these created data points, train a simple linear model (Logistic Regression, etc.) and get the results.
4. The superpixels with the highest positive weights become the explanation.

Figure 30: Explaining a Prediction with LIME. Source: Pol Ferrando, Understanding how LIME explains predictions

SHAP (SHapley Additive exPlanations) uses a game-theory based approach to explain the outcome: it takes various permutations and combinations of features, measures their effect on the delta of the result (predicted minus actual), and then computes the average score for each feature to explain the results. For image use cases, it marks the dominating feature areas by coloring the pixels in the image.

SHAP produces relatively accurate results and is more widely used in Explainable AI than LIME.

Generative AI

Generative AI will have a potentially strong role in creative work, be it writing articles, creating completely new images from an existing set of trained models, improving image or video quality, merging images for artistic creations, creating music, or improving datasets through data generation. Generative AI, as it matures in the near term, will augment many jobs and will potentially replace many in the future.

Generative Networks consist of two deep neural networks, a generative network and a discriminative network. They work together to provide a high-level simulation of conceptual tasks.

To train a generative model, we first collect a large amount of data in some domain (e.g. millions of images, sentences or sounds) and then train the model to generate similar data. The generative network generates data to fool the discriminative network, while the discriminative network learns by identifying real vs fake data received from the generative network.

The generator trains with an objective function on whether it can fool the discriminator network, whereas the discriminator trains on its ability to not be fooled and to correctly identify real vs fake. Both networks learn through backpropagation. The generator is typically a deconvolutional neural network and the discriminator is a convolutional neural network.

Generative Networks can be of multiple types depending on the objective they are designed for, an example being:

Neural Style Transfer (NST)

Neural Style Transfer (NST) is one of the Generative AI techniques in deep learning. As seen below, it merges two images, namely a content image (C) and a style image (S), to create a generated image (G). The generated image G combines the content of image C with the style of image S.
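In loss terms, G is found by minimizing a weighted combination of a content distance to C and a style distance to S, where style is usually compared through Gram matrices of CNN feature maps. A minimal sketch, with raw arrays standing in for feature maps and illustrative alpha/beta weights:

```python
import numpy as np

def gram_matrix(features):
    """Style representation: channel-by-channel correlations G = F F^T."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def nst_loss(gen, content, style, alpha=1.0, beta=100.0):
    """NST objective: alpha * content distance + beta * style (Gram) distance."""
    content_loss = np.mean((gen - content) ** 2)
    style_loss = np.mean((gram_matrix(gen) - gram_matrix(style)) ** 2)
    return alpha * content_loss + beta * style_loss
```

In a full NST implementation the generated image's pixels are iteratively updated by gradient descent on this loss, using feature maps taken from several layers of a pretrained CNN.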
Some of the other popular GAN variations are:
• Super-Resolution GAN (SRGAN), which helps improve the quality of images.
• Stack-GAN, which generates realistic-looking photographs from textual descriptions of simple objects like birds and flowers.
• Sketch-GAN, a generative model for vector drawings which is a Recurrent Neural Network (RNN) and is able to construct stroke-based drawings of common objects. The model is trained on a dataset of human-drawn images representing many different classes.
• eGANs (Evolutionary Generative Adversarial Networks), which generate photographs of faces at different ages, from young to old.
• IcGAN, which reconstructs photographs of faces with specific features, such as changes in hair color, style, facial expression and even gender.
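The adversarial objectives described earlier reduce, in the standard GAN formulation, to a pair of binary cross-entropy losses over the discriminator's scores. A minimal sketch, where d_real and d_fake stand for discriminator outputs on real and generated samples:

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-8):
    """D learns to score real data near 1 and generated data near 0."""
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-8):
    """G improves as the discriminator scores its fakes closer to 'real'."""
    return -np.mean(np.log(d_fake + eps))
```

Training alternates between the two: one gradient step reducing discriminator_loss, then one reducing generator_loss, each network backpropagating through its own parameters.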
[Figure: three NST examples, each combining a content image and a style image into a generated image: a colorful circle plus a blue painting yields the circle in the blue painting style; the Louvre museum plus an Impressionist-style painting yields the Louvre in Impressionist style; the ancient city of Persepolis plus The Starry Night (Van Gogh) yields Persepolis in Van Gogh style.]
Figure 40: Novel Artistic Images through Neural Style Transfer. Source: Fisseha Berhane, Deep Learning & Art: Neural Style Transfer
Fine Grained Classification

Classification of an object into specific categories, such as car, table or flower, is common in Computer Vision. However, establishing an object's finer class based on specific characteristics is where AI is making rapid progress. This is because granular features of objects are being trained on and used to differentiate objects.

Examples of Fine Grained Classification are:
• Fine-grained clothing style finder, type of a shoe, etc.
• Recognizing a car type
• Recognizing the breed of a dog, or a plant species, insect, bird species, etc.

However, fine-grained classification is challenging due to the difficulty of finding discriminative features. Finding those subtle traits that fully characterize the object is not straightforward.

Fine Grained Classification Approaches
• Feature representations that better preserve fine-grained information
• Segmentation-based approaches that facilitate the extraction of purer features and part/pose-normalized feature spaces
• Pose normalization schemes

In Fine Grained Classification, the progression through an 8-layer CNN network can be thought of as a progression from low- to mid- to high-level features. The later layers aggregate more complex structural information across larger scales: sequences of convolutional layers interleaved with max-pooling can capture deformable parts, and fully connected layers can capture complex co-occurrence statistics.

Bird recognition is one of the major examples of fine-grained classification. In the image below, given a test image, groups of detected key points are used to compute multiple warped image regions that are aligned with prototypical models. Each region is fed through a deep convolutional network and features are extracted from multiple layers, after which they are concatenated and fed to a classifier.

Figure 50: Bird Recognition Pipeline Overview. Source: Branson, Van Horn et al., Bird Species Categorization
Car Detection System using Fine Grained Classification

The pictures and steps below depict a fine-grained classification approach for a car detection system:
a. Detect parts using a collection of unsupervised part detectors.
b. Output a grid of discriminative features. (The CNN is learned with class labels and then truncated, retaining the first two convolutional layers, which retain spatial information.) The appearance of each part detected using the learned CNN features is described by pooling in the detected region of each part.
c. The appearance of any undetected part is set to zero. This results in the Ensemble of Localized Learned Features (ELLF) representation, which is then used to predict fine-grained object categories.
d. A standard CNN passes the output of the convolutional layers through several fully connected layers in order to make a prediction.
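Steps (b) and (c) above, pooling features within detected part regions and zeroing undetected parts, can be sketched as follows; the function name and max-pooling choice are illustrative, not the paper's exact formulation:

```python
import numpy as np

def ellf_features(part_maps, detected):
    """Build an ELLF-style vector: pool CNN features over each part's
    region; an undetected part contributes a zero block (step c)."""
    blocks = []
    for feature_map, found in zip(part_maps, detected):
        if found:
            blocks.append(feature_map.max(axis=(0, 1)))  # pool over the region
        else:
            blocks.append(np.zeros(feature_map.shape[-1]))
    return np.concatenate(blocks)

# Two parts with 3-channel feature maps; the second part was not detected
maps = [np.ones((2, 2, 3)), np.ones((2, 2, 3))]
vec = ellf_features(maps, [True, False])
```

The concatenated vector keeps a fixed layout per part, so a downstream classifier can learn which part-appearance combinations distinguish fine-grained categories.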
Capsule Network

Convolutional Networks are so far the de facto and well-accepted algorithms for working with image-based datasets. They work on the pixels of images using filters (channels) of various sizes, convolving and using pooling techniques to bubble up the stronger features, deriving colors, textures, edges and shapes and establishing structures from the lowest to the highest layers.

Given the face of a person, a CNN identifies the face by establishing the components of the face: eyes, ears, eyebrows, lips, chin, etc. However, if the facial image is provided with incorrect position and alignment of eyes and eyebrows, or say the eyebrows are swapped with the lips and the ears are placed on the forehead, the same trained CNN would still go on and detect this as a human face. This is a huge drawback of the CNN algorithm, and it happens due to its inability to store information on the relative position of various objects.

Capsule Networks, invented by Geoffrey Hinton, address exactly this problem of CNNs by storing the spatial relationships of various parts.

Capsule Networks, like CNNs, are multi-layered neural networks consisting of several capsules, where each capsule consists of several neurons. Capsules in lower layers are called primary capsules and are trained to detect an object (e.g. triangle, circle) within a given region of an image. Each outputs a vector that has two properties: length and orientation. Length represents the probability of the presence of the object, and orientation represents the pose parameters of the object, such as coordinates, rotation angle, etc.

Capsules in higher layers, called routing capsules, detect larger and more complex objects, such as eyes, ears, etc.
Figure 60: Car Detection System. Source: Learning Features and Parts for Fine-Grained Recognition
Routing by Agreement

Unlike CNNs, which primarily bubble up higher-order features using max or average pooling, Capsule Networks bubble up features using routing by agreement, where every capsule participates in choosing the shape by voting, in a democratic-election way.

In the figure given above:
• The lower level corresponds to rectangles, triangles and circles.
• The higher level corresponds to houses, boats and cars.

If there is an image of a house, the capsules corresponding to rectangles and triangles will have large activation vectors. Their relative positions (coded in their instantiation parameters) will bet on the presence of high-level objects. Since they will agree on the presence of a house, the output vector of the house capsule will become large. This in turn will make the predictions by the rectangle and triangle capsules larger. This cycle will repeat 4-5 times, after which the bets on the presence of a house will be considerably larger than the bets on the presence of a boat or a car.

Advantages over CNN

• Less data for training: Capsule Networks need much less data for training (almost 10%) as compared to CNNs.
• Fewer parameters: the connections between layers require fewer parameters, as capsules group neurons, resulting in relatively less computational bandwidth.
• Preserve pose and position: they preserve pose and position information, as against CNNs.
• High accuracy: Capsule Networks have higher accuracy as compared to CNNs.
• Reconstruction vs mere classification: CNNs help you classify images but not reconstruct the same image, whereas Capsule Networks help you reconstruct the exact image.
• Information retention vs loss: with CNNs, during edge detection, a kernel works only at a specific angle, and each angle requires a corresponding kernel. When dealing with edges, CNNs work well because there are very few ways to describe an edge. Once we get up to the level of shapes, we do not want to have a kernel for every angle of rectangles, ovals, triangles and so on. It would get unwieldy, and would become even worse when dealing with more complicated shapes that have 3-dimensional rotations and features like lighting, which is the reason why traditional neural nets do not handle unseen rotations effectively.

Capsule Networks are best suited for object detection and image segmentation, as they better model hierarchical relationships and provide high accuracy. However, Capsule Networks are still under research, relatively new, and mostly tested and benchmarked on the MNIST dataset; but they will be the future in working with massive use cases emerging from Vision datasets.
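The length-as-probability property of a capsule's output vector is enforced by the squashing nonlinearity from the original CapsNet paper; a minimal sketch:

```python
import numpy as np

def squash(v, eps=1e-9):
    """Scale a capsule's output so its length lies in [0, 1) while its
    orientation (the pose information) is preserved."""
    norm_sq = float(np.sum(v ** 2))
    scale = norm_sq / (1.0 + norm_sq)
    return scale * v / np.sqrt(norm_sq + eps)

v = np.array([3.0, 4.0])   # raw capsule output, length 5
s = squash(v)              # length shrinks below 1, direction unchanged
```

Long vectors are squashed to lengths just below 1 (a confident detection) and short vectors toward 0, which is what lets routing by agreement treat vector length as a vote.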
Figure 70: A simple CapsNet with 3 layers. This model gives comparable results to deep convolutional networks. Source: Dynamic Routing Between Capsules; Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton
Figure 80: Capsule Network for House or Boat classification. Source: Beginners' Guide to Capsule Networks
Meta Learning

Traditional methods of learning in Machine Learning focus on taking a huge labeled dataset and then learning to predict y (the dependent variable, say classifying an image as cat or dog) given a set of x (the independent variables, the images of cats and dogs). This process involves the selection of an algorithm, such as a Convolutional Neural Net, and arriving at various hyper-parameters such as the number of layers in the network, the number of neurons in each layer, learning rate, weights, bias, dropouts, and the activation function used to activate each neuron, such as sigmoid, tanh or ReLU. The learning happens through several iterations of forward and backward passes (propagation), readjusting (also called learning) the weights based on the difference in the loss (actual vs computed). At the minimal loss, the weights and other network parameters are frozen and are considered the final model for future prediction tasks. This is obviously a long and tedious process, and repeating it for every use case or task is engineering-, data- and compute-intensive.

Meta Learning focuses on how to learn to learn. It is one of the fascinating disciplines of artificial intelligence. Human beings have varying styles of learning: some people learn and memorize with one instance of a visual or auditory scan; some need multiple perspectives to strengthen the neural connections for permanent memory; some remember by writing, while some remember through actual experiences. Meta Learning tries to leverage these styles to build its learning characteristics.

Like the variety in human learning techniques, Meta Learning also uses various learning methods based on the patterns of problems, such as those based on boundary space or the amount of data, by optimizing the size of the neural network, or by using a recurrent network approach. Each of these is briefly discussed below.

Types of Meta-Learning Models

Few Shots Meta-Learning
This learning technique focuses on learning from a few instances of data. Typically, neural nets need millions of data points to learn; however, Few Shots Meta-Learning uses only a few instances of data to build models. An example is facial recognition systems using Single Shot Learning; this is explained in detail in the Single Shot Learning section.

Optimizer Meta-Learning
In this method the emphasis is on optimizing the neural network and its hyper-parameters. A great example of optimizer meta-learning is models that are focused on improving gradient descent techniques.

Metric Meta-Learning
In this learning method the metric space is narrowed down to improve the focus of learning. The learning is then carried out only in this metric space, by leveraging various optimization parameters established for the given metric space.

Recurrent Model Meta-Learning
This type of meta-learning model is tailored to Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM). In this architecture, the meta-learner algorithm trains an RNN model to process a dataset sequentially and then process new inputs from the task. In an image classification setting, this might involve passing in the set of (image, label) pairs of a dataset sequentially, followed by new examples which must be classified. Meta-Reinforcement Learning is an example of this approach.

Transfer Learning (TL)

Humans can learn from their own existing experiences, or from experiences they have heard, seen or observed. The Transfer Learning discipline of AI is based on similar traits of human learning, where new models can learn from and benefit from existing trained models.

For example, if a Computer Vision based detection model with no Transfer Learning, which already detects various types of vehicles such as cars, trucks and bicycles, needs to be trained to detect an airplane, then you may have to retrain the full model with images of all the previous objects. However, with Transfer Learning, you can introduce an additional layer on top of the existing pre-trained layers to start detecting airplanes.

Typically, in a no-Transfer-Learning scenario, the model needs to be trained from scratch, and during training the right weights are arrived at through many iterations (epochs) of forward and backward propagation, which takes a significant amount of computational power and time. In addition, Vision models need a significant amount of image data, such as, in this example, images of airplanes, to be trained.

With a Transfer Learning approach, you can reuse the existing pre-trained weights of an existing trained model, with a significantly smaller number of images (5 to 10 percent of the images needed to train a ground-up model), for the model to start detecting. As the pre-trained model has already learnt some basic features, identifying edges, curves and shapes in its earlier layers, it needs to learn only the higher-order features specific to airplanes on top of the existing computed weights. In brief, Transfer Learning helps eliminate the need to learn everything from scratch.
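The freeze-and-retrain idea can be sketched end to end in a few lines. Everything here is illustrative: the "backbone" is a stand-in function with fixed weights rather than a real pretrained network, and the data is a toy two-class task:

```python
import numpy as np

def frozen_backbone(x):
    """Stand-in for the pre-trained layers: weights fixed, never updated."""
    W = np.full((4, 2), 0.5)           # pretend pretrained weights
    return np.tanh(x @ W)

def train_new_head(X, y, lr=0.5, epochs=200, seed=0):
    """Train only a small new classification head on top of frozen features."""
    rng = np.random.default_rng(seed)
    head = rng.normal(size=(2, 1))
    feats = frozen_backbone(X)          # computed once; backbone is frozen
    for _ in range(epochs):
        pred = 1 / (1 + np.exp(-feats @ head))          # sigmoid head
        head -= lr * feats.T @ (pred - y) / len(y)      # logistic gradient step
    return head

# Toy new task ("airplane" vs "not"): only the tiny head is trained
X = np.array([[1., 1., 1., 1.], [-1., -1., -1., -1.]])
y = np.array([[1.], [0.]])
head = train_new_head(X, y)
```

Only the handful of head parameters are updated, which is why Transfer Learning needs far less data and compute than retraining the whole network.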
Transfer Learning helps in saving a significant amount of data, computational power and time when training new models, as they leverage pre-trained weights from existing trained models and architectures. However, it is important to understand that the Transfer Learning approach today is only mature enough to be applied to similar use cases; that is, you cannot use the above-discussed vehicle model to train a facial recognition model.

Another key consideration during Transfer Learning is that it is important to understand the details of the data on which the underlying model was trained, as it can implicitly push built-in biases from the underlying data into newer systems. It is recommended that the datasheets of the underlying models and data be studied thoroughly, unless the usage is for experimental purposes.

Having used the human-brain rationale earlier, it is important to note that human brains have gone through centuries of experience and gene evolution and have the ability to learn faster, whereas transfer learning is just a few decades old and is becoming the foundation for new vision and text use cases.
Figure 90: Transfer Learning Layers. Source: John Cherrie, Training Deep Learning Models with Transfer Learning
Single Shot Learning

Humans have the impressive skill of reasoning about new concepts and experiences from just a single example. They have the ability for one-shot generalization: the aptitude to encounter a new concept, understand its structure, and then generate compelling alternative variations of the same.

Facial recognition systems are good candidates for Single Shot Learning: otherwise, needing tens of thousands of individual face images to train one neural network can be extremely costly, time consuming and infeasible. However, a Single Shot Learning based system, using an existing pre-trained FaceNet model with a facial-encoding based approach on top of it, can be very effective at establishing face similarity by computing the distance between faces.

In this approach, a 128-dimensional encoding of each face image is generated and compared with other images' encodings to determine whether the person is the same or different. Various distance-based algorithms, such as Euclidean distance, can be used to determine whether two faces are within a specified threshold. The model training approach involves creating pairs of (Anchor, Positive) and (Anchor, Negative), and training the model in a way where the (Anchor, Positive) pair distance is smaller and the (Anchor, Negative) distance is larger.

"Anchor" is the image of the person for whom the recognition model needs to be trained.
"Positive" is another image of the same person.
"Negative" is an image of a different person.
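The encoding-and-distance scheme above can be sketched as follows; the 0.6 threshold and 0.2 margin are illustrative defaults, not values from the paper:

```python
import numpy as np

def face_distance(enc_a, enc_b):
    """Euclidean distance between two 128-dimensional face encodings."""
    return float(np.linalg.norm(np.asarray(enc_a) - np.asarray(enc_b)))

def is_same_person(enc_a, enc_b, threshold=0.6):
    """Same person when the encodings are closer than the threshold."""
    return face_distance(enc_a, enc_b) < threshold

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Training objective: the (Anchor, Positive) distance should be smaller
    than the (Anchor, Negative) distance by at least the margin."""
    pos = np.sum((anchor - positive) ** 2)
    neg = np.sum((anchor - negative) ** 2)
    return float(max(pos - neg + margin, 0.0))
```

The loss is zero once positives are already closer than negatives by the margin, so training concentrates on the hard triplets that still violate it.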
Figure 100: Encoding approach, inspired by the ML course from Coursera
Deep Reinforcement Learning (RL)

This is a specialized Machine Learning discipline in which an agent learns to behave in an environment by receiving rewards or punishments for the actions it performs. The agent can have an objective to maximize short-term or long-term rewards. This discipline uses deep learning techniques to bring human-level performance to the given task.

Deep Reinforcement Learning has found significant relevance and application in various game design systems, such as creating video games, chess, AlphaGo and Atari, as well as in industrial applications such as robots, driverless cars, etc.

In reinforcement learning, the policy π controls what action we should take, and the value function v measures how good it is to be in a particular state. The value function tells us the maximum expected future reward the agent will get at each state.

Three Approaches to Reinforcement Learning

Value Based
In value-based RL, the goal is to optimize the value function V(s). A Q-table uses a mathematical function to arrive at a state based on an action. The value of each state is the total amount of reward an agent can expect to accumulate over the future, starting at that state. The agent will use this value function to select which state to choose at each step.

Policy Based
In policy-based RL, we want to directly optimize the policy function π(s) without using a value function. The policy is what defines the agent's behavior at a given time. There are two types of policies:
1. Deterministic: a policy which, at a given state, will always return the same action.
2. Stochastic: a policy that outputs a probability distribution over actions.

Value-based and policy-based are the more conventional Reinforcement Learning approaches. They are useful for modeling relatively simple systems.
[Figure: in Q-learning, a Q-table maps each state to a Q-value per action; in Deep Q-learning, a Deep Q Network takes a state and outputs the expected discounted reward (Q-value) for each action, given that state, with action = policy(state).]
Figure 110: Schema inspired by the Q-learning notebook by Udacity
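The Q-table update behind the figure can be sketched as one line of tabular Q-learning; the states, actions and learning-rate/discount values here are toy choices:

```python
def q_learning_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Move Q(s, a) toward the observed reward plus the discounted value
    of the best action available from the next state."""
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

# Toy table: two states, two actions, all values start at zero
Q = {"s0": {"left": 0.0, "right": 0.0},
     "s1": {"left": 0.0, "right": 0.0}}
q_learning_update(Q, "s0", "right", reward=1.0, next_state="s1")
```

Deep Q-learning replaces the dictionary with a neural network that maps a state to Q-values for all actions, which is what makes large state spaces tractable.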
Model Based
In model-based RL, we model the environment: we create a model of the behavior of the environment, and this model is then used to arrive at results that maximize short-term or long-term rewards. The model equation can be any equation defined based on the environment's behavior, and it must be sufficiently generalized to counter new situations.

When the model-based approach uses Deep Neural Network algorithms to generalize sufficiently well and learn the complexities of the environment to produce optimal results, it is called Deep Reinforcement Learning. The challenge with the model-based approach is that each environment needs a dedicated trained model.
AlphaGo was trained using data from several games to beat human players at the game of Go. The training accuracy was just 57%, and still it was sufficient to beat human-level performance. The training methods involved reinforcement learning and deep learning to build a policy network, which tells what moves are promising, and a value network, which tells how good the board position is. The search for the final move from these networks is done using the Monte Carlo Tree Search (MCTS) algorithm. Using supervised learning, a policy network was created to imitate expert moves.

DeepMind released AlphaGo Zero in late 2017, which beat AlphaGo and did not involve any training of the deep network on data from previous games. The deep network was trained by picking training samples from AlphaGo and AlphaGo Zero playing games against themselves, selecting the best moves to train the network, and then applying those in real games to improve the results iteratively. This is possible because deep reinforcement learning algorithms can store long-range tree search results for the next best move in memory and do very large computations that are difficult for a human brain.
Auto ML (AML)

Designing a machine learning solution involves several steps, such as collecting data; understanding, cleansing and normalizing data; doing feature engineering; selecting or designing the algorithm; selecting the model architecture; selecting and tuning the model's hyper-parameters; evaluating the model's performance; and deploying and monitoring the machine learning system in an online system. Such machine learning solution design requires an expert Data Scientist to complete the pipeline.

As the complexity of these and other tasks can easily get overwhelming, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The AI research area that encompasses progressive automation of machine learning pipeline tasks is called AutoML (Automatic Machine Learning).

Google CEO Sundar Pichai wrote: "Designing neural nets is extremely time intensive, and requires an expertise that limits its use to a smaller community of scientists and engineers. That's why we've created an approach called AutoML, showing that it's possible for neural nets to design neural nets," while Google's Head of AI, Jeff Dean, suggested that 100x computational power could replace the need for machine learning expertise.

AutoML Vision relies on two core techniques: transfer learning and neural architecture search.
[Figure: an auto-sklearn pipeline. Given (Xtrain, Ytrain) and (Xtest, budget), meta-learning and a hand-crafted portfolio warm-start Bayesian optimization over an ML pipeline of data processor, feature preprocessor and classifier; an ensemble is then built to predict Ytest.]
Figure 120: An example of an auto-sklearn pipeline. Source: André Biedenkapp, We did it Again: World Champions in AutoML
Implementing AutoML

Here is a look at a few libraries that help in implementing AutoML.

AUTO-SKLEARN
Auto-sklearn automates several key tasks in the Machine Learning pipeline, such as addressing missing column values, encoding of categorical values, data scaling and normalization, feature pre-processing, and selection of the right algorithm with its hyper-parameters. The pipeline supports 15 classification and 14 feature processing algorithms. Selection of the right algorithm can happen based on ensembling techniques and by applying meta-knowledge gathered from executing similar scenarios (datasets and algorithms).

Usage
Auto-sklearn is written in Python and can be considered a drop-in replacement for scikit-learn classifiers. Here is a sample set of commands:

>>> import autosklearn.classification
>>> cls = autosklearn.classification.AutoSklearnClassifier()
>>> cls.fit(X_train, y_train)
>>> predictions = cls.predict(X_test)

SMAC (Sequential Model-based Algorithm Configuration)
SMAC is a tool for automating certain AutoML steps. It is useful for the selection of key features, hyper-parameter optimization, and speeding up algorithmic outputs.

BOHB (Bayesian Optimization and Hyperband searches)
BOHB combines Bayesian hyper-parameter optimization with bandit methods for faster convergence.

Google and H2O also have their respective AutoML tools, which are not covered here but can be explored in specific cases.
AutoML needs significant memory and computational power to execute alternative algorithms and compute results. At present, GPU resources are extremely costly for executing even simple Machine Learning workloads, such as a CNN algorithm to classify objects. If multiple such alternative algorithms are to be executed, the compute dollars needed would grow exponentially. This is impractical, infeasible and inefficient for the current state of the Data Science industry. Adoption of AutoML will depend on two things: first, the maturity of the AutoML pipeline, and second, but more important, how quickly GPU clusters become cheap, the second being most critical. Selling cloud GPU capacity could be one of the motivations of several cloud infrastructure companies promoting AutoML in the industry. Also, AutoML will not replace the Data Scientist's work, but it can augment and speed up certain tasks such as data standardization, model tuning and trying multiple algorithms. It is only the beginning for AutoML, but this technique has high relevance and usefulness for solving ultra-complex problems.
Neural Architecture Search (NAS)

Neural Architecture Search (NAS) is a component of AutoML and addresses the important step of designing the Neural Network architecture.

Designing a fresh Neural Net architecture involves an expert establishing and organizing Neural Network layers, filters or channels and filter sizes, selecting other optimal hyper-parameters, and so on, through several rounds of computational iterations. Since the AlexNet deep neural network architecture won the ImageNet competition (image classification based on the ImageNet dataset) in 2012, several architecture styles such as VGG, ResNet, Inception, Xception, Inception-ResNet, MobileNet and NASNet have evolved significantly. However, selecting the right architecture for the right problem is also a skill, due to the presence of various influencing factors, such as applicability to the problem, accuracy, number of parameters, memory and computational footprint, and size of the architecture, which govern overall functioning efficiency.

Neural Architecture Search tries to address this problem space by automatically selecting the right Neural Network architecture to solve a given problem.
Figure 13.0: NAS as a specialization of hyperparameter optimization, which is itself part of AutoML. Source: Liam Li and Ameet Talwalkar, What is neural architecture search?
Key Components of NAS

Search space: The search space provides the boundary within which a specific architecture needs to be searched. Computer Vision use cases (captioning a scene, product identification) need a different neural network architecture style from Speech use cases (speech transcription, speaker classification) or unstructured-text use cases (topic extraction, intent mining). The search space tries to provide available catalogs of best-in-class architectures based on other domains' data and performance. These catalogs are also usually hand-crafted by expert data scientists.

Optimization method: This is responsible for providing the mechanism to search for the best architecture. Architectures can be searched and applied randomly, or by using a statistical or Machine Learning evaluation approach, such as Bayesian methods or reinforcement learning methods.

Evaluation method: This has the role of evaluating the quality of the architectures considered by the optimization method. It can be done using a full training approach, or by doing partial training and then applying certain specialized methods, such as early stopping, weight sharing or network morphism.

For selective problem spaces, NAS has outperformed manual methods and is showing definite promise for the future. However, it is still evolving and not ready for production usage, as several architectures need to be established and evaluated depending on the problem space.
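The three components can be sketched together as a toy random-search NAS loop. Everything here is illustrative: the search space is a small hand-written catalog, and evaluate() is a made-up proxy score standing in for (partial) training, not a real accuracy measurement.

```python
import random

# Search space: a hand-crafted catalog of architecture choices.
search_space = {
    "num_layers": [2, 4, 6],
    "width": [32, 64, 128],
    "kernel": [3, 5],
}

def evaluate(arch):
    # Stand-in for the evaluation method (full or partial training).
    # A made-up score that favors deeper, wider nets but penalizes
    # parameter count, mimicking an accuracy/cost trade-off.
    params = arch["num_layers"] * arch["width"] ** 2 * arch["kernel"]
    return arch["num_layers"] * 0.1 + arch["width"] * 0.001 - params * 1e-7

def random_search(n_trials, seed=0):
    # Optimization method: sample architectures at random and keep the best.
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = {k: rng.choice(v) for k, v in search_space.items()}
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

best_arch, best_score = random_search(n_trials=20)
print(best_arch, round(best_score, 4))
```

Replacing random sampling with reinforcement learning, evolutionary search or Bayesian optimization, and replacing the proxy score with partial training, gives the NAS variants discussed above.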
Components of NAS:
• Search space: DAG representation, cell block, meta-architecture, NAS-specific representations
• Optimization method: reinforcement learning, evolutionary search, gradient-based optimization, Bayesian optimization
• Evaluation method: full training, partial training, weight-sharing, network morphism, hypernetworks

Figure 14.0: Components of NAS. Source: Liam Li and Ameet Talwalkar, What is neural architecture search?
Addressing H3 AI Trends at Infosys

In this paper, we looked at some key H3 AI areas; by no means is this an exhaustive list. Amongst all those discussed, Transfer Learning, Capsule Networks, Explainable AI and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases and integrating them into our product stack, the Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do.
Trends and use cases:

1. Explainable AI (XAI): applicable wherever results need to be traced, e.g., tumor detection, mortgage rejection, candidate selection, etc.
2. Generative AI / Neural Style Transfer (NST): art generation, sketch generation, image or video resolution improvements, data generation/augmentation, music generation
3. Fine Grained Classification: vehicle classification, type-of-tumor detection
4. Capsule Networks: image re-construction, image comparison/matching
5. Meta Learning: intelligent agents, continuous-learning scenarios for document review and corrections
6. Transfer Learning: identifying a person not wearing a helmet, logo/brand detection in images, speech model training for various accents and vocabularies
7. Single Shot Learning: face recognition, face verification
8. Deep Reinforcement Learning (RL): intelligent agents, robots, driverless cars, traffic light monitoring, continuous-learning scenarios for document review and corrections
9. AutoML: invoice attribute extraction, document classification, document clustering
10. Neural Architecture Search (NAS): CNN- or RNN-based use cases such as image classification, object identification, image segmentation, speaker classification, etc.

Table 2.0: AI Use cases. Source: Infosys Research
References

1. Explainable AI (XAI)
• https://christophm.github.io/interpretable-ml-book/
• https://simmachines.com/explainable-ai/
• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html
• https://medium.com/QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739
2. Fine Grained Classification
• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf
3. Capsule Networks
• https://arxiv.org/pdf/1710.09829.pdf
• https://keras.io/examples/cifar10_cnn_capsule/
• https://www.youtube.com/watch?v=pPN8d0E3900
• https://www.youtube.com/watch?v=rTawFwUvnLE
• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc
4. Meta Learning
• https://medium.com/jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
• http://proceedings.mlr.press/v48/santoro16.pdf
• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660
5. Transfer Learning
6. Single Shot Learning
• https://arxiv.org/pdf/1603.05106.pdf
7. Deep Reinforcement Learning (RL)
• https://deepmind.com/blog/article/deep-reinforcement-learning
• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
• https://medium.com/jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5
• https://arxiv.org/pdf/1811.12560.pdf
8. Auto ML
• https://www.fast.ai/2018/07/23/auto-ml-3/
• https://www.fast.ai/2018/07/16/auto-ml2/
• https://competitions.codalab.org/competitions/17767
• https://www.automl.org/automl/auto-sklearn/
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac/
• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html
© 2019 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.
For more information, contact askus@infosys.com

Infosys.com | NYSE: INFY | Stay Connected
9. Neural Architecture Search (NAS)
• https://www.oreilly.com/ideas/what-is-neural-architecture-search
10. Infosys Enterprise Cognitive Platform
• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx
About the author

Sudhanshu Hate is the inventor and architect of Infosys Enterprise Cognitive Platform (iECP), a microservices-API-based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and in working with clients on industry problems. His current areas of interest are Computer Vision, Speech and unstructured-text-based AI possibilities.

To know more about our work on the H3 trends in AI, write to icets@infosys.com
Explainable AI (XAI)

Neural Network algorithms are able to derive hidden patterns from data that many conventional best-of-breed Machine Learning algorithms, such as Support Vector Machines, Random Forest and Naïve Bayes, are unable to establish. However, there is an increasing rate of incorrect and unexplainable decisions and results produced by Neural Network algorithms in activities such as credit lending, skilled job hiring and facial recognition. Given this scenario, AI results should be justified, explained and reproducible for consistency and correctness, as some of these results can have a profound impact on livelihoods.

Geoffrey Hinton (University of Toronto), often called the godfather of deep learning, explains: "A deep-learning system doesn't have any explanatory power. The more powerful the deep-learning system becomes, the more opaque it can become."

It is to address these issues of transparency in AI that Explainable AI was developed. Explainable AI (XAI) as a framework increases the transparency of black-box algorithms by providing explanations for the predictions made, and can accurately explain a prediction at the individual level.

Here are a few approaches, provided through certain frameworks, that can help in understanding the traceability of results.

Feature Visualization, as depicted in the figure below, helps in visualizing the various layers in a neural network. It helps establish that lower layers are useful in learning features such as edges and textures, whereas higher layers provide higher-order abstract concepts such as objects.

Network dissection helps in associating these established units with concepts. The units learn from labeled concepts during supervised training stages, and network dissection establishes how, and in what magnitude, they are influenced by channel activations.

Figure 2.0: Feature Visualization across layers (edges, textures, patterns, parts, objects). Source: Olah et al., 2017 (CC-BY 4.0)

Several frameworks are currently evolving to improve the explainability of models. Two known frameworks in this space are LIME and SHAP.

LIME (Local Interpretable Model-agnostic Explanations): LIME treats the model as a black box and tries to create a surrogate model in which explainability is supported or feasible, such as a simple linear model. The surrogate model is then used to evaluate different components of the image by perturbing the inputs and evaluating the impact on the result, thereby deciding which parts of the image are most important in arriving at the result. Since the original model does not participate directly, the approach is model independent. The challenge with this approach is that even when the surrogate-model-based explanations are relevant to the model they are used on, they may not generalize precisely, or be one-to-one mappable to the original model, all the time.

Steps (for an image classified as containing a tree frog):
1. Create a set of noisy (perturbed) example images by disabling certain features (marking certain portions gray).
2. For each example, get the probability that a tree frog is in the image as per the original model.
3. Using these created data points, train a simple linear model (Logistic Regression, etc.) and get the results.
4. The superpixels with the highest positive weights become the explanation.

Figure 3.0: Explaining a Prediction with LIME. Source: Pol Ferrando, Understanding how LIME explains predictions

SHAP (SHapley Additive exPlanations): SHAP uses a game-theory-based approach to explain a prediction, measuring the effect of various permutations and combinations of features on the delta of the result (predicted minus actual), and then computing the average of the scores for each feature to explain the results. For image use cases, it marks the dominating feature areas by coloring the pixels in the image. SHAP produces relatively accurate results and is more widely used in Explainable AI than LIME.

Generative AI

Generative AI will have a potentially strong role in creative work, be it writing articles, creating completely new images from an existing set of trained models, improving image or video quality, merging images for artistic creations, creating music, or improving datasets through data generation. As it matures in the near term, Generative AI will augment many jobs, and it will potentially replace many in the future.

Generative Adversarial Networks consist of two deep neural networks, a generative network and a discriminative network. They work together to provide a high-level simulation of conceptual tasks.

To train a generative model, we first collect a large amount of data in some domain (e.g., millions of images, sentences or sounds) and then train the model to generate similar data. The generative network generates data to fool the discriminative network, while the discriminative network learns by identifying real vs. fake data received from the generative network.

The generator trains with an objective function based on whether it can fool the discriminator network, whereas the discriminator trains on its ability to not be fooled and to correctly identify real vs. fake. Both networks learn through backpropagation. The generator is typically a deconvolutional neural network, and the discriminator is a convolutional neural network.

Generative networks can be of multiple types depending on the objective they are designed for, one example being Neural Style Transfer.

Neural Style Transfer (NST)

Neural Style Transfer (NST) is one of the Generative AI techniques in deep learning. As seen below, it merges two images, namely a content image (C) and a style image (S), to create a generated image (G). The generated image G combines the content of image C with the style of image S.
Some of the other popular GAN variations are:
• Super Resolution GAN (SRGAN), which helps improve the quality of images.
• Stack-GAN, which generates realistic-looking photographs from textual descriptions of simple objects like birds and flowers.
• Sketch-GAN, a generative model for vector drawings, which is a Recurrent Neural Network (RNN) able to construct stroke-based drawings of common objects. The model is trained on a dataset of human-drawn images representing many different classes.
• eGANs (Evolutionary Generative Adversarial Networks), which generate photographs of faces at different ages, from young to old.
• IcGAN, which reconstructs photographs of faces with specific features, such as changes in hair color, style, facial expression and even gender.
Figure 4.0: Novel Artistic Images through Neural Style Transfer. Examples: a colorful circle (content) with a blue painting (style) gives a colorful circle in the blue painting style; the Louvre museum with an Impressionist-style painting gives a Louvre painting in Impressionist style; the ancient city of Persepolis with The Starry Night (Van Gogh) gives Persepolis in Van Gogh style. Source: Fisseha Berhane, Deep Learning & Art: Neural Style Transfer
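The content/style combination that NST optimizes can be sketched numerically. This is a hedged illustration: the arrays below are random stand-ins for CNN layer activations, and the loss weights are arbitrary; a real NST implementation would extract activations from a pre-trained network such as VGG.

```python
import numpy as np

def gram_matrix(feats):
    # feats: (channels, height*width) feature map from one CNN layer.
    # The Gram matrix captures which channels co-activate, i.e. "style".
    return feats @ feats.T / feats.shape[1]

def nst_loss(gen, content, style, alpha=1.0, beta=100.0):
    # Content loss: match the raw activations of the content image C.
    content_loss = np.mean((gen - content) ** 2)
    # Style loss: match the Gram matrix of the style image S.
    style_loss = np.mean((gram_matrix(gen) - gram_matrix(style)) ** 2)
    return alpha * content_loss + beta * style_loss

rng = np.random.default_rng(1)
C = rng.normal(size=(8, 64))   # stand-in content-image activations
S = rng.normal(size=(8, 64))   # stand-in style-image activations

# A generated image identical to the content image has zero content loss
# but still pays a style penalty, so optimization pulls G toward both.
loss_at_content = nst_loss(C, C, S)
print(loss_at_content)
```

Gradient descent on the generated image's pixels (not on the network weights) against this combined loss is what produces the images shown in Figure 4.0.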
Fine Grained Classification

Classification of an object into broad categories, such as car, table or flower, is common in Computer Vision. However, establishing an object's finer class based on specific characteristics is where AI is making rapid progress. This is because granular features of objects are now being trained on and used for differentiating objects.

Examples of Fine Grained Classification are:
• Fine-grained clothing style finder, type of a shoe, etc.
• Recognizing a car type
• Recognizing the breed of a dog, a plant species, an insect, a bird species, etc.

However, fine-grained classification is challenging due to the difficulty of finding discriminative features: finding the subtle traits that fully characterize the object is not straightforward.

Fine Grained Classification Approaches:
• Feature representations that better preserve fine-grained information
• Segmentation-based approaches that facilitate extraction of purer features and part/pose-normalized feature spaces
• Pose normalization schemes

In Fine Grained Classification, the progression through an 8-layer CNN can be thought of as a progression from low- to mid- to high-level features. The later layers aggregate more complex structural information across larger scales: sequences of convolutional layers interleaved with max-pooling can capture deformable parts, and fully connected layers can capture complex co-occurrence statistics.

Bird recognition is one of the major examples of fine grained classification. In the image below, given a test image, groups of detected keypoints are used to compute multiple warped image regions that are aligned with prototypical models. Each region is fed through a deep convolutional network, and features are extracted from multiple layers, after which they are concatenated and fed to a classifier.

Figure 5.0: Bird Recognition Pipeline Overview. Source: Branson, Van Horn et al., Bird Species Categorization
Car Detection System using Fine Grained Classification

The pictures and steps below depict a fine grained classification approach for a car detection system:
a. Detect parts using a collection of unsupervised part detectors.
b. Output a grid of discriminative features. (The CNN is learned with class labels and then truncated, retaining the first two convolutional layers, which preserve spatial information.) The appearance of each part detected using the learned CNN features is described by pooling in the detected region of each part.
c. Set the appearance of any undetected part to zero. This results in the Ensemble of Localized Learned Features (ELLF) representation, which is then used to predict fine-grained object categories.
d. By contrast, a standard CNN passes the output of the convolutional layers through several fully connected layers in order to make a prediction.
Capsule Network

Convolutional Networks are so far the de facto and well-accepted algorithms for working with image-based datasets. They work on the pixels of images, convolving filters (channels) of various sizes and using pooling techniques to bubble up the stronger features, deriving colors, textures, edges and shapes, and establishing structures from the lower to the highest layers.

Given the face of a person, a CNN identifies the face by establishing the eyes, ears, eyebrows, lips, chin and other components of the face. However, if the facial image is provided with an incorrect position and alignment of the eyes and eyebrows, or, say, the eyebrows are swapped with the lips and the ears are placed on the forehead, the same trained CNN would still go on to detect it as a human face. This is a huge drawback of the CNN algorithm, and it happens due to the algorithm's inability to store information on the relative positions of various objects.

Capsule Networks, invented by Geoffrey Hinton, address exactly this problem of CNNs by storing the spatial relationships of the various parts.

Capsule Networks, like CNNs, are multi-layered neural networks consisting of several capsules, where each capsule consists of several neurons. Capsules in the lower layers are called primary capsules and are trained to detect an object (e.g., a triangle or a circle) within a given region of the image. Each outputs a vector that has two properties: length and orientation. Length represents the probability of the presence of the object, and orientation represents the pose parameters of the object, such as its coordinates and rotation angle.

Capsules in higher layers, called routing capsules, detect larger and more complex objects, such as eyes and ears.
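The length-as-probability, orientation-as-pose behavior of a capsule's output vector comes from the squash nonlinearity of Sabour et al.; a minimal numpy sketch:

```python
import numpy as np

def squash(v, eps=1e-9):
    # Squash nonlinearity: keeps the vector's orientation (the pose)
    # but maps its length into (0, 1) so it can act as a probability.
    sq_norm = np.sum(v ** 2)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * v / np.sqrt(sq_norm + eps)

strong = squash(np.array([6.0, 8.0]))    # long input -> length near 1
weak = squash(np.array([0.03, 0.04]))    # short input -> length near 0

print(np.linalg.norm(strong), np.linalg.norm(weak))
```

Note that both outputs point in the same direction as their inputs: only the length, i.e. the detection probability, is rescaled.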
Figure 6.0: Car Detection System. Source: Learning Features and Parts for Fine-Grained Recognition
Routing by Agreement

Unlike CNNs, which primarily bubble up higher-order features using max or average pooling, Capsule Networks bubble up features using routing by agreement, where every capsule participates in choosing the shape by voting, in a democratic-election way.

In the figure below:
• The lower level corresponds to rectangles, triangles and circles.
• The higher level corresponds to houses, boats and cars.

If there is an image of a house, the capsules corresponding to rectangles and triangles will have large activation vectors. Their relative positions (coded in their instantiation parameters) will bet on the presence of high-level objects. Since they agree on the presence of a house, the output vector of the house capsule will become large. This in turn will make the predictions by the rectangle and triangle capsules larger. This cycle will repeat 4-5 times, after which the bets on the presence of a house will be considerably larger than the bets on the presence of a boat or a car.

Advantages over CNN

• Less data for training: Capsule Networks need much less data for training (almost 10%) as compared to CNNs.
• Fewer parameters: The connections between layers require fewer parameters, as capsules group neurons, resulting in relatively less computational bandwidth.
• Preserved pose and position: They preserve pose and position information, as against CNNs.
• High accuracy: Capsule Networks have shown higher accuracy as compared to CNNs.
• Reconstruction vs. mere classification: CNNs help you classify images but not reconstruct the same image, whereas Capsule Networks help you reconstruct the exact image.
• Information retention vs. loss: With CNNs, a kernel for edge detection works only at a specific angle, and each angle requires a corresponding kernel. When dealing with edges, CNNs work well because there are very few ways to describe an edge. Once we get up to the level of shapes, however, we do not want a kernel for every angle of rectangles, ovals, triangles and so on. That would get unwieldy, and would become even worse when dealing with more complicated shapes that have 3-dimensional rotations and features like lighting, which is the reason traditional neural nets do not handle unseen rotations effectively.

Capsule Networks are best suited for object detection and image segmentation, as they better model hierarchical relationships and provide high accuracy. However, Capsule Networks are still under research; they are relatively new and have mostly been tested and benchmarked on the MNIST dataset, but they will be the future for working with the massive use cases emerging from vision datasets.

Figure 7.0: A simple CapsNet with 3 layers. This model gives comparable results to deep convolutional networks. Source: Dynamic Routing Between Capsules, Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton

Figure 8.0: Capsule Network for house or boat classification. Source: Beginners' Guide to Capsule Networks
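The voting cycle described above can be sketched as a simplified dynamic-routing loop. The prediction vectors here are hand-made toy "votes", and the shrink step is a simplified stand-in for the squash function, so this is an illustration of the mechanism, not the paper's exact implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def route(u_hat, iterations=4):
    # u_hat: (num_lower, num_higher, dim) prediction vectors -- each lower
    # capsule's "vote" for the pose of each higher capsule.
    num_lower, num_higher, _ = u_hat.shape
    b = np.zeros((num_lower, num_higher))        # routing logits
    for _ in range(iterations):
        c = np.apply_along_axis(softmax, 1, b)   # coupling coefficients
        s = (c[:, :, None] * u_hat).sum(axis=0)  # weighted vote per capsule
        v = s / (1 + np.linalg.norm(s, axis=1, keepdims=True))  # shrink < 1
        # Agreement step: votes aligned with the output raise their logits.
        b += (u_hat * v[None, :, :]).sum(axis=2)
    return np.linalg.norm(v, axis=1)             # activation per higher capsule

# Three lower capsules all cast the same vote for capsule 0 ("house"),
# but disagree about capsule 1 ("boat"), so those votes partly cancel.
house_vote = np.array([0.0, 2.0, 0.0])
u_hat = np.array([
    [house_vote, [ 1.0,  0.0, 0.0]],
    [house_vote, [-1.0,  0.0, 0.0]],
    [house_vote, [ 0.5, -0.5, 0.0]],
])
activations = route(u_hat)
print(activations)
```

Agreeing votes reinforce each other across iterations, so the "house" capsule ends up with the larger activation, exactly the election described above.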
Meta Learning

Traditional methods of learning in Machine Learning focus on taking a huge labeled dataset and then learning to predict y (the dependent variable, say classifying an image as cat or dog) given a set of x (the independent variables, the images of cats and dogs). This process involves selecting an algorithm, such as a Convolutional Neural Net, and arriving at various hyperparameters, such as the number of layers in the network, the number of neurons in each layer, the learning rate, weights, bias, dropouts, and the activation function that activates each neuron (sigmoid, tanh, ReLU). The learning happens through several iterations of forward and backward passes (propagation), readjusting (also called learning) the weights based on the difference in the loss (actual vs. computed). At the minimal loss, the weights and other network parameters are frozen and are considered the final model for future prediction tasks. This is obviously a long and tedious process, and repeating it for every use case or task is engineering-, data- and compute-intensive.

Meta Learning focuses on how to learn to learn. It is one of the most fascinating disciplines of artificial intelligence. Human beings have varying styles of learning: some people learn and memorize with one instance of a visual or auditory scan; some need multiple perspectives to strengthen the neural connections for permanent memory; some remember by writing, while others remember through actual experiences. Meta Learning tries to leverage these traits to build its learning characteristics.

Types of Meta-Learning Models

Like the variety in human learning techniques, Meta Learning also uses various learning methods based on the patterns of problems, such as those based on boundary space or amount of data, by optimizing the size of the neural network, or by using a recurrent-network approach. Each of these is briefly discussed below.

Few Shots Meta-Learning: This learning technique focuses on learning from a few instances of data. Typically, neural nets need millions of data points to learn; Few Shots Meta-Learning, however, uses only a few instances of data to build models. An example is facial recognition systems using Single Shot Learning; this is explained in detail in the Single Shot Learning section.

Optimizer Meta-Learning: In this method, the emphasis is on optimizing the neural network and its hyperparameters. A great example of optimizer meta-learning is models focused on improving gradient descent techniques.

Metric Meta-Learning: In this learning method, the metric space is narrowed down to improve the focus of learning. The learning is then carried out only in this metric space, leveraging the various optimization parameters established for the given metric space.

Recurrent Model Meta-Learning: This type of meta-learning model is tailored to Recurrent Neural Networks (RNNs), such as Long Short-Term Memory (LSTM). In this architecture, the meta-learner algorithm trains an RNN model to process a dataset sequentially and then process new inputs from the task. In an image classification setting, this might involve sequentially passing in the (image, label) pairs of a dataset, followed by new examples that must be classified. Meta-Reinforcement Learning is an example of this approach.

Transfer Learning (TL)

Humans can learn from their own existing experiences, or from experiences they have heard, seen or observed. The Transfer Learning discipline of AI is based on similar traits of human learning, where new models can learn from and benefit from existing trained models.

For example, consider a Computer Vision based detection model that already detects various types of vehicles, such as cars, trucks and bicycles, and now needs to be trained to detect an airplane. With no Transfer Learning, you may have to retrain the full model with images of all the previous objects. With Transfer Learning, however, you can introduce an additional layer on top of the existing pre-trained layers to start detecting airplanes.

Typically, in a no-Transfer-Learning scenario, the model needs to be trained from scratch, and during training the right weights are arrived at through many iterations (epochs) of forward and back propagation, which takes a significant amount of computational power and time. In addition, vision models need a significant amount of image data, in this example, images of airplanes, to be trained.

With the Transfer Learning approach, you can reuse the pre-trained weights of an existing trained model, with significantly fewer images (5 to 10 percent of the images needed for training a ground-up model), for the model to start detecting. As the pre-trained model has already learnt basic features, identifying edges, curves and shapes, in its earlier layers, it needs to learn only the higher-order features specific to airplanes on top of the existing computed weights. In brief, Transfer Learning helps eliminate the need to learn everything from scratch.
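The reuse-the-body, train-a-small-head idea can be sketched in numpy. This is purely illustrative: the "pre-trained" layer here is a frozen random projection standing in for the early layers of a real trained network, and the task is a tiny synthetic one.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for a pre-trained network body: a frozen layer mapping raw
# inputs to generic features (the edges/curves/shapes of a real model).
W_pretrained = rng.normal(size=(5, 16))

def features(x):
    return np.maximum(x @ W_pretrained, 0)       # frozen ReLU layer

# New task with little data: train only a small logistic-regression head.
X = rng.normal(size=(80, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

F = features(X)                                  # reused, never updated
w = np.zeros(16)                                 # the only trained weights
for _ in range(500):                             # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-(F @ w)))
    w -= 0.1 * F.T @ (p - y) / len(y)

train_acc = (((1.0 / (1.0 + np.exp(-(F @ w)))) > 0.5) == (y > 0.5)).mean()
print(train_acc)
```

Only the 16 head weights are updated; the body's weights never change, which is why transfer learning needs far less data and compute than training ground up.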
Transfer Learning helps save a significant amount of data, computational power and time in training new models, as they leverage pre-trained weights from existing trained models and architectures. However, it is important to understand that the Transfer Learning approach today is only mature enough to be applied to similar use cases; that is, you cannot use the vehicle-detection model discussed above to train a facial recognition model.

Another key consideration during Transfer Learning is to understand the details of the data on which the underlying models were trained, as Transfer Learning can implicitly push the built-in biases of that underlying data into the newer systems. It is recommended that the datasheets of the underlying models and data be studied thoroughly, unless the usage is for experimental purposes.

Having used the human-brain rationale earlier, it is important to note that human brains have gone through centuries of experiences and gene evolution and so have the ability to learn faster, whereas Transfer Learning is just a few decades old and is becoming the foundation for new vision and text use cases.
Figure 9.0: Transfer Learning Layers. Source: John Cherrie, Training Deep Learning Models with Transfer Learning
Single Shot Learning

Humans have the impressive skill of reasoning about new concepts and experiences from just a single example. They have the ability for one-shot generalization: the aptitude to encounter a new concept, understand its structure, and then generate compelling alternative variations of the same.

Facial recognition systems are good candidates for Single Shot Learning, which otherwise would need tens of thousands of individual face images to train one neural network, an extremely costly, time-consuming and infeasible exercise. A Single Shot Learning based system, using an existing pre-trained FaceNet model with a facial-encoding-based approach on top of it, can be very effective in establishing face similarity by computing the distance between faces.

In this approach, a 128-dimensional encoding of each face image is generated and compared with another image's encoding to determine whether the person is the same or different. Various distance-based algorithms, such as Euclidean distance, can be used to determine whether the two encodings are within a specified threshold.

The model training approach involves creating pairs of (Anchor, Positive) and (Anchor, Negative) images and training the model in a way that makes the (Anchor, Positive) pair distance smaller and the (Anchor, Negative) distance larger, where:
• "Anchor" is the image of the person for whom the recognition model needs to be trained.
• "Positive" is another image of the same person.
• "Negative" is an image of a different person.
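The encoding-and-distance approach can be sketched as follows. The encoder here is a fixed random projection standing in for a pre-trained FaceNet-style model, and the threshold value is arbitrary, so this illustrates the verification logic only.

```python
import numpy as np

rng = np.random.default_rng(4)

# Fixed random projection standing in for a pre-trained FaceNet-style
# encoder that maps a face image to a 128-dimensional embedding.
PROJECTION = rng.normal(size=(32, 128)) / np.sqrt(32)

def encode(image_vec):
    return image_vec @ PROJECTION

def same_person(img_a, img_b, threshold=2.0):
    # Verification: a small embedding distance means "same identity".
    dist = np.linalg.norm(encode(img_a) - encode(img_b))
    return dist < threshold

anchor = rng.normal(size=32)                    # person A
positive = anchor + 0.01 * rng.normal(size=32)  # person A, slight variation
negative = rng.normal(size=32)                  # a different person

print(same_person(anchor, positive), same_person(anchor, negative))
```

Training with triplet loss pushes the (anchor, positive) distance below and the (anchor, negative) distance above such a threshold, so a single new face image is enough to enroll a person.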
Figure 10.0: Encoding approach, inspired by the Machine Learning course on Coursera
Deep Reinforcement Learning (RL)

This is a specialized Machine Learning discipline where an agent learns to behave in an environment by getting a reward or punishment for the actions it performs. The agent can have an objective to maximize short-term or long-term rewards. This discipline uses deep learning techniques to bring in human-level performance on the given task.

Deep Reinforcement Learning has found significant relevance and application in various game design systems, such as creating video games, chess, AlphaGo and Atari, as well as in industrial applications such as robots, driverless cars, etc.

In reinforcement learning, a policy π controls what action we should take, while a value function v measures how good it is to be in a particular state. The value function tells us the maximum expected future reward the agent will get at each state.

Three Approaches to Reinforcement Learning

Value Based
In value-based RL, the goal is to optimize the value function V(s). The value of each state is the total amount of reward an agent can expect to accumulate over the future, starting at that state. A Q-table uses a mathematical function to arrive at a state based on an action. The agent will use this value function to select which state to choose at each step.

Policy Based
In policy-based RL, we want to directly optimize the policy function π(s) without using a value function. The policy is what defines the agent's behavior at a given time. There are two types of policies:
1. Deterministic: a policy which, at a given state, will always return the same action.
2. Stochastic: a policy that outputs a probability distribution over actions.

Value-based and policy-based are the more conventional Reinforcement Learning approaches. They are useful for modeling relatively simple systems.
(Figure schematic: in Q-learning, a Q-table maps a state to a Q-value for each action; in Deep Q-learning, a Deep Q neural network takes the state and outputs the Q-value, the expected discounted reward, for each possible action, from which action = policy(state).)
Figure 110 Schema inspired by the Q learning notebook by Udacity
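The Q-table idea in the schema above can be sketched as tabular Q-learning on a toy environment. The 1-D walk environment, reward and hyper-parameters below are illustrative assumptions, not from the paper:

```python
import numpy as np

# Toy 1-D walk: states 0..4, actions 0 = left, 1 = right, reward 1 for reaching state 4
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.3
rng = np.random.default_rng(0)

def step(state, action):
    nxt = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward, nxt == n_states - 1  # next state, reward, episode done?

for _ in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the Q-table, sometimes explore
        action = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[state]))
        nxt, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a')
        Q[state, action] += alpha * (reward + gamma * np.max(Q[nxt]) - Q[state, action])
        state = nxt
```

After training, the learned policy at every state is "go right", the expected-reward path to the goal. Replacing the Q-table with a neural network that estimates Q-values is the step from Q-learning to Deep Q-learning shown in the figure.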
Model Based
In model-based RL, we model the environment. This means we create a model of the behavior of the environment, and this model is then used to arrive at results that maximize short-term or long-term rewards. The model equation can be any equation that is defined based on the environment's behavior, and it must be sufficiently generalized to counter new situations.

When a model-based approach uses Deep Neural Network algorithms to generalize sufficiently well and learn the complexities of the environment to produce optimal results, it is called Deep Reinforcement Learning. The challenge with the model-based approach is that each environment needs a dedicated trained model.
AlphaGo was trained using data from several games to beat a human being in the game of Go. The training accuracy was just 57%, yet it was sufficient to beat human-level performance. The training methods involved reinforcement learning and deep learning to build a policy network that tells which moves are promising and a value network that tells how good the board position is. The search for the final move from these networks is done using the Monte Carlo Tree Search (MCTS) algorithm. Using supervised learning, a policy network was created to imitate expert moves.

DeepMind released AlphaGo Zero in late 2017, which beat AlphaGo and did not involve any training of the deep network on data from previous games. The deep network was trained by picking training samples from AlphaGo and AlphaGo Zero playing games against themselves, selecting the best moves to train the network, and then applying those in real games to improve the results iteratively. This is possible because deep reinforcement learning algorithms can store long-range tree search results for the next best move in memory and do very large computations that are difficult for a human brain.
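MCTS, mentioned above, decides which branch of the game tree to search next by scoring nodes with an upper-confidence formula. A sketch of the standard UCT score (the exploration constant c is a common default, an assumption rather than a value from the paper):

```python
import math

def uct_score(child_value, child_visits, parent_visits, c=1.4):
    # Upper Confidence bound applied to Trees: balance exploitation
    # (mean value so far) against exploration (rarely visited nodes)
    if child_visits == 0:
        return float("inf")  # always try an unvisited child first
    return child_value / child_visits + c * math.sqrt(math.log(parent_visits) / child_visits)
```

At each step of the tree search, the child with the highest UCT score is expanded, which is how the policy and value networks' estimates get combined with simulation statistics.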
Designing a machine learning solution involves several steps, such as collecting data; understanding, cleansing and normalizing data; doing feature engineering; selecting or designing the algorithm; selecting the model architecture; selecting and tuning the model's hyper-parameters; evaluating the model's performance; and deploying and monitoring the machine learning system in an online system. Such machine learning solution design requires an expert Data Scientist to complete the pipeline.

Auto ML (AML)

As the complexity of these and other tasks can easily get overwhelming, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The AI research area that encompasses progressive automation of machine learning pipeline tasks is called AutoML (Automated Machine Learning).
Google CEO Sundar Pichai wrote, "Designing neural nets is extremely time intensive and requires an expertise that limits its use to a smaller community of scientists and engineers. That's why we've created an approach called AutoML, showing that it's possible for neural nets to design neural nets," while Google's Head of AI Jeff Dean suggested that 100x computational power could replace the need for machine learning expertise.
AutoML Vision relies on two core techniques: transfer learning and neural architecture search.
(Figure schematic: the AutoML system takes Xtrain, Ytrain, Xtest and a budget; meta-learning and a hand-crafted portfolio warm-start Bayesian Optimization over an ML pipeline of data processor, feature preprocessor and classifier; an ensemble is then built to produce Ytest.)
Figure 120 An example of an auto-sklearn pipeline. Source: André Biedenkapp, We did it Again: World Champions in AutoML
Implementing AutoML

Here is a look at a few libraries that help in implementing AutoML.

AUTO-SKLEARN
Auto-sklearn automates several key tasks in the Machine Learning pipeline, such as addressing column missing values, encoding of categorical values, data scaling and normalization, feature pre-processing and selection of the right algorithm with hyper-parameters. The pipeline supports 15 classification and 14 feature processing algorithms. Selection of the right algorithm can happen based on ensembling techniques and by applying meta-knowledge gathered from executing similar scenarios (datasets and algorithms).
Usage
Auto-sklearn is written in Python and can be considered a drop-in replacement for scikit-learn classifiers. Here is a sample set of commands:

>>> import autosklearn.classification
>>> cls = autosklearn.classification.AutoSklearnClassifier()
>>> cls.fit(X_train, y_train)
>>> predictions = cls.predict(X_test)
SMAC (Sequential Model-based Algorithm Configuration)
SMAC is a tool for automating certain AutoML steps. SMAC is useful for selecting key features, optimizing hyper-parameters and speeding up algorithmic outputs.

BOHB (Bayesian Optimization and Hyperband)
BOHB combines Bayesian hyper-parameter optimization with bandit methods for faster convergence.
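The bandit side of BOHB can be illustrated with successive halving, the core of Hyperband: evaluate many configurations cheaply, then repeatedly keep the best fraction and give the survivors a larger budget. This is a generic sketch, with evaluate() standing in for training a model for a given budget (names are illustrative):

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    budget = min_budget
    while len(configs) > 1:
        # Score every surviving configuration at the current budget
        scores = {c: evaluate(c, budget) for c in configs}
        # Keep the best 1/eta fraction, then grow the budget by eta
        keep = max(1, len(configs) // eta)
        configs = sorted(configs, key=lambda c: scores[c], reverse=True)[:keep]
        budget *= eta
    return configs[0]
```

BOHB's refinement is to propose the candidate configurations with a Bayesian model instead of sampling them at random, which is where the faster convergence comes from.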
Google and H2O also have their respective AutoML tools, which are not covered here but can be explored in specific cases.

AutoML needs significant memory and computational power to execute alternative algorithms and compute results. At present, GPU resources are extremely costly for executing even simple Machine Learning workloads, such as a CNN algorithm to classify objects. If multiple such alternative algorithms must be executed, the computation dollars needed would grow exponentially. This is impractical, infeasible and inefficient for the current state of the Data Science industry. Adoption of AutoML will depend on two things: one, the maturity of the AutoML pipeline, and two, more importantly, how quickly GPU clusters become cheap, the second being most critical. Selling cloud GPU capacity could be one of the motivations for several cloud-based infrastructure companies to promote AutoML in the industry. Also, AutoML will not replace the Data Scientist's work, but it can provide augmentation and speed for certain tasks such as data standardization, model tuning and trying multiple algorithms. It is only the beginning for AutoML, but this technique has high relevance and usefulness for solving ultra-complex problems.
Neural Architecture Search (NAS)

Neural Architecture Search (NAS) is a component of AutoML and addresses the important step of designing the Neural Network architecture.

Designing a fresh Neural Net architecture involves an expert establishing and organizing Neural Network layers, filters or channels and filter sizes, selecting other optimum hyper-parameters, and so on, through several rounds of computational iterations. Since the AlexNet deep neural network architecture won the ImageNet competition (image classification based on the ImageNet dataset) in 2012, several architecture styles such as VGG, ResNet, Inception, Xception, Inception-ResNet, MobileNet and NASNet have evolved significantly. However, selecting the right architecture for the right problem is also a skill, due to the various influencing factors, such as applicability to the problem, accuracy, number of parameters, memory and computational footprint, and size of the architecture, that govern the overall functioning efficiency.

Neural Architecture Search tries to address this problem space by automatically selecting the right Neural Network architecture to solve a given problem.
(Figure schematic: NAS and hyperparameter optimization are both subfields of AutoML.)
Figure 130 Source: Liam Li, Ameet Talwalkar, What is neural architecture search?
Key Components of NAS

Search space: The search space provides the boundary within which the specific architecture needs to be searched. Computer Vision use cases (captioning a scene or product identification) would need a different neural network architecture style than Speech (speech transcription or speaker classification) or unstructured Text (topic extraction, intent mining) use cases. The search space tries to provide available catalogs of best-in-class architectures based on other domain data and performance. These are also usually hand-crafted by expert data scientists.

Optimization method: This is responsible for providing the mechanism to search for the best architecture. The search could be random, or could use a statistical or Machine Learning evaluation approach such as a Bayesian method or reinforcement learning methods.

Evaluation method: This has the role of evaluating the quality of an architecture considered by the optimization method. It could be done using a full training approach, or by doing partial training and then applying certain specialized methods such as early stopping, weight sharing, network morphism, etc.

For selective problem spaces, NAS has outperformed manual methods and is showing definite promise for the future. However, it is still evolving and not ready for production usage, as several architectures need to be established and evaluated depending on the problem space.
(Figure schematic, components of NAS. Search space: DAG representation, cell block, meta-architecture, NAS-specific. Optimization method: reinforcement learning, evolutionary search, gradient-based optimization, Bayesian optimization. Evaluation method: full training, partial training, weight-sharing, network morphism, hypernetworks.)
Figure 140 Components of NAS. Source: Liam Li, Ameet Talwalkar, What is neural architecture search?
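The three components can be sketched together using the simplest optimization method, random search. The tiny search space and the scoring function below are illustrative stand-ins (a real evaluation method would train, at least partially, each candidate network):

```python
import random

# Toy NAS search space: a few discrete architecture choices (illustrative)
search_space = {
    "n_layers": [2, 4, 8],
    "width": [32, 64, 128],
    "activation": ["relu", "tanh"],
}

def sample_architecture(rng):
    # Draw one architecture uniformly from the search space
    return {k: rng.choice(v) for k, v in search_space.items()}

def random_search(evaluate, n_trials=20, seed=0):
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = sample_architecture(rng)
        score = evaluate(arch)  # evaluation method, e.g. validation accuracy
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch
```

Swapping the uniform sampler for reinforcement learning, evolutionary search or Bayesian optimization gives the more sophisticated optimization methods listed in the figure.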
Addressing H3 AI Trends at Infosys

In this paper, we looked at some key H3 AI areas; by no means is this an exhaustive list. Amongst all discussed, Transfer Learning, Capsule Networks, Explainable AI and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases and integrating them into our product stack, Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do.
Trend: Use cases

1. Explainable AI (XAI): applicable wherever results need to be traced, e.g. Tumor Detection, Mortgage Rejection, Candidate Selection, etc.
2. Generative AI / Neural Style Transfer (NST): Art Generation, Sketch Generation, Image or Video Resolution Improvements, Data Generation/Augmentation, Music Generation
3. Fine Grained Classification: Vehicle Classification, Type of Tumor Detection
4. Capsule Networks: Image Re-construction, Image Comparison/Matching
5. Meta Learning: Intelligent Agents, Continuous Learning scenarios for document review and corrections
6. Transfer Learning: identifying a person not wearing a helmet, Logo/brand detection in images, Speech Model training for various accents and vocabularies
7. Single Shot Learning: Face Recognition, Face Verification
8. Deep Reinforcement Learning (RL): Intelligent Agents, Robots, Driverless cars, Traffic Light Monitoring, Continuous Learning scenarios for document review and corrections
9. Auto ML: Invoice Attribute Extraction, Document Classification, Document Clustering
10. Neural Architecture Search (NAS): CNN or RNN based use cases such as Image Classification, Object Identification, Image Segmentation, Speaker Classification, etc.
Table 20 AI Use cases, Infosys Research
1 Explainable AI (XAI)
2 Fine Grained Classification
5 Transfer Learning
6 Single Shot Learning
3 Capsule Networks
4 Meta Learning
7 Deep Reinforcement Learning (RL)
8 Auto ML
bull httpschristophmgithubiointerpretable-ml-book
bull httpssimmachinescomexplainable-ai
bull httpswwwcmuedunewsstoriesarchives2018octoberexplainable-aihtml
bull httpsmediumcomQuantumBlackmaking-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
bull httpstowardsdatasciencecomexplainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739
bull httpsvisioncornelleduse3wp-contentuploads201502BMVC14pdf
bull httpswwwfastai20180723auto-ml-3
bull httpsarxivorgpdf160305106pdf
bull httpsarxivorgpdf171009829pdf
bull httpskerasioexamplescifar10_cnn_capsule
bull httpswwwyoutubecomwatchv=pPN8d0E3900
bull httpswwwyoutubecomwatchv=rTawFwUvnLE
bull httpsmediumfreecodecamporgunderstanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc
bull httpsmediumcomjrodthoughtswhats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
bull httpproceedingsmlrpressv48santoro16pdf
bull httpstowardsdatasciencecomwhats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660
bull httpsdeepmindcomblogarticledeep-reinforcement-learning
bull httpsmediumfreecodecamporgan-introduction-to-reinforcement-learning-4339519de419
bull httpsmediumcomjonathan_huialphago-zero-a-game-changer-14ef6e45eba5
bull httpsarxivorgpdf181112560pdf
bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac
bull httpswwwfastai20180723auto-ml-3
bull httpswwwfastai20180716auto-ml2auto-ml
bull httpscompetitionscodalaborgcompetitions17767
bull httpswwwautomlorgautomlauto-sklearn
bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac
bull httpsautomlgithubioHpBandSterbuildhtmloptimizersbohbhtml
References
copy 2019 Infosys Limited Bengaluru India All Rights Reserved Infosys believes the information in this document is accurate as of its publication date such information is subject to change without notice Infosys acknowledges the proprietary rights of other companies to the trademarks product names and such other intellectual property rights mentioned in this document Except as expressly permitted neither this documentation nor any part of it may be reproduced stored in a retrieval system or transmitted in any form or by any means electronic mechanical printing photocopying recording or otherwise without the prior permission of Infosys Limited and or any named intellectual property rights holders under this document
For more information, contact askus@infosys.com
Infosys.com | NYSE: INFY | Stay Connected
9 Neural Architecture Search (NAS)
10 Infosys Enterprise Cognitive Platform
bull httpswwworeillycomideaswhat-is-neural-architecture-search
bull httpswwwinfosyscomservicesincubating-emerging-technologiesofferingsPagesenterprise-cognitive-platformaspx
About the author

Sudhanshu Hate is the inventor and architect of Infosys Enterprise Cognitive Platform (iECP), a microservices API based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and working with clients on industry problems. His current areas of interest are Computer Vision, Speech and unstructured Text based AI possibilities.

To know more about our work on the H3 trends in AI, write to icets@infosys.com
Steps

1. Create a set of noisy (perturbed) example images by disabling certain features (marking certain portions gray).
2. For each example, get the probability that a tree frog is in the image as per the original model.
3. Using these created data points, train a simple linear model (Logistic Regression, etc.) and get the results.
4. The superpixels with the highest positive weights become the explanation.

Figure 30 Explaining a Prediction with LIME. Source: Pol Ferrando, Understanding how LIME explains predictions

SHAP (SHapley Additive exPlanations)
SHAP uses a game theory based approach to explain the outcome: it uses various permutations and combinations of features and their effect on the delta of the result (predicted minus actual), then computes the average of the score for each feature to explain the results. For image use cases, it marks the dominating feature areas by coloring the pixels in the image.

SHAP produces relatively accurate results and is more widely used in Explainable AI than LIME.

Generative AI

Generative AI will have a potentially strong role in creative work, be it writing articles, creating completely new images from an existing set of trained models, improving image or video quality, merging images for artistic creations, creating music, or improving datasets through data generation. As it matures in the near term, Generative AI will augment many jobs and will potentially replace many in the future.

Generative Networks consist of two deep neural networks, a generative network and a discriminative network. They work together to provide a high-level simulation of conceptual tasks.

To train a Generative model, we first collect a large amount of data in some domain (e.g., millions of images, sentences or sounds) and then train the model to generate similar data. The Generative network generates data to fool the Discriminative network, while the Discriminative network learns by identifying real vs. fake data received from the Generative network.

The generator trains with an objective function on whether it can fool the discriminator network, whereas the discriminator trains on its ability to not be fooled and to correctly identify real vs. fake data. Both networks learn through backpropagation. The generator is typically a deconvolutional neural network and the discriminator a convolutional neural network.

Generative Networks can be of multiple types depending on the objective they are designed for, examples being:

Neural Style Transfer (NST)
Neural Style Transfer (NST) is one of the Generative AI techniques in deep learning. As seen below, it merges two images, namely a content image (C) and a style image (S), to create a generated image (G). The generated image G combines the content of image C with the style of image S.
Some of the other popular GAN variations are:

• Super Resolution GAN (SRGAN), which helps improve the quality of images.
• StackGAN, which generates realistic-looking photographs from textual descriptions of simple objects like birds and flowers.
• Sketch-GAN, a generative model for vector drawings, which is a Recurrent Neural Network (RNN) able to construct stroke-based drawings of common objects. The model is trained on a dataset of human-drawn images representing many different classes.
• eGANs (Evolutionary Generative Adversarial Networks), which generate photographs of faces at different ages, from young to old.
• IcGAN, which reconstructs photographs of faces with specific features, such as changes in hair color, style, facial expression and even gender.
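The generator and discriminator objectives described earlier can be sketched as the standard GAN losses, assuming d_real and d_fake are the discriminator's probability outputs for real and generated samples (the values used here are illustrative):

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # Discriminator: maximize log D(real) + log(1 - D(fake)),
    # i.e. minimize the negative of that sum
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    # Generator: fool the discriminator, i.e. maximize log D(fake)
    return -np.mean(np.log(d_fake))
```

Training alternates gradient steps on these two losses through backpropagation, which is the adversarial game the section above describes.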
(Figure examples: a colorful circle as the content image with a blue painting as the style image yields a colorful circle with blue painting style; the Louvre museum with an impressionist-style painting yields a Louvre painting in impressionist style; the ancient city of Persepolis with The Starry Night (Van Gogh) yields Persepolis in Van Gogh style.)
Figure 40 Novel Artistic Images through Neural Style Transfer. Source: Fisseha Berhane, Deep Learning & Art: Neural Style Transfer
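The content/style combination in NST is typically driven by two losses: a content loss comparing feature maps directly, and a style loss comparing Gram matrices of feature maps. A minimal sketch, assuming the feature maps have already been extracted from a pretrained CNN (shapes and names are illustrative):

```python
import numpy as np

def content_loss(f_content, f_generated):
    # Mean squared difference between content and generated feature maps
    return 0.5 * np.sum((f_content - f_generated) ** 2)

def gram_matrix(features):
    # features: (channels, height*width); channel correlations encode "style"
    return features @ features.T

def style_loss(f_style, f_generated):
    n_c, n_hw = f_style.shape
    gs, gg = gram_matrix(f_style), gram_matrix(f_generated)
    return np.sum((gs - gg) ** 2) / (4 * n_c**2 * n_hw**2)
```

The generated image G is obtained by optimizing the pixels of G to minimize a weighted sum of these two losses, pulling G toward the content of C and the style of S at the same time.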
Fine Grained Classification

Classification of an object into broad categories such as car, table, flower and such is common in Computer Vision. However, establishing an object's finer class based on specific characteristics is where AI is making rapid progress. This is because granular features of objects are being trained on and used for differentiating objects.

Examples of Fine Grained Classification are:
• Fine-grained clothing style finder, type of a shoe, etc.
• Recognizing a car type
• Recognizing the breed of a dog, a plant species, an insect or bird species, etc.

However, fine-grained classification is challenging due to the difficulty of finding discriminative features. Finding the subtle traits that fully characterize the object is not straightforward.

Fine Grained Classification Approaches
• Feature representations that better preserve fine-grained information
• Segmentation-based approaches that facilitate extraction of purer features and part/pose normalized feature spaces
• Pose normalization schemes

In Fine Grained Classification, the progression through an 8-layer CNN network can be thought of as a progression from low- to mid- to high-level features. The later layers aggregate more complex structural information across larger scales: sequences of convolutional layers interleaved with max-pooling can capture deformable parts, and fully connected layers can capture complex co-occurrence statistics.

Bird recognition is one of the major examples of fine grained classification. In the image below, given a test image, groups of detected keypoints are used to compute multiple warped image regions that are aligned with prototypical models. Each region is fed through a deep convolutional network, and features are extracted from multiple layers, after which they are concatenated and fed to a classifier.
Figure 50 Bird Recognition Pipeline Overview. Source: Branson, Van Horn et al., Bird Species Categorization
Car Detection System using Fine Grained Classification

The pictures and steps below depict a fine grained classification approach for a car detection system:

a. Detect parts using a collection of unsupervised part detectors.
b. Output a grid of discriminative features. (The CNN is learned with class labels and then truncated, retaining the first two convolutional layers that preserve spatial information.) The appearance of each detected part is described by pooling the learned CNN features in the detected region of each part.
c. The appearance of any undetected part is set to zero. This results in an Ensemble of Localized Learned Features (ELLF) representation, which is then used to predict fine-grained object categories.
d. A standard CNN passes the output of the convolutional layers through several fully connected layers in order to make a prediction.
Capsule Network

Convolutional Networks are so far the de facto and well-accepted algorithms for working with image based datasets. They work on the pixels of images using filters (channels) of various sizes, convolving with pooling techniques to bubble up the stronger features, deriving colors, textures, edges and shapes, and establishing structures from the lowest to the highest layers.

Given the face of a person, a CNN identifies the face by establishing the eyes, ears, eyebrows, lips, chin and other components of the face. However, if the facial image is provided with incorrect position and alignment of the eyes and eyebrows, or say the eyebrows swap with the lips and the ears are placed on the forehead, the same trained CNN algorithm would still go on to detect this as a human face. This is a huge drawback of the CNN algorithm, and it happens due to CNN's inability to store information on the relative position of various objects.

Capsule Network, invented by Geoffrey Hinton, addresses exactly this problem of CNN by storing the spatial relationships of various parts.
Capsule Networks, like CNNs, are multi-layered neural networks consisting of several capsules, where each capsule consists of several neurons. Capsules in lower layers are called primary capsules and are trained to detect an object (e.g., a triangle or a circle) within a given region of the image. Each outputs a vector that has two properties, length and orientation: length represents the probability of the presence of the object, and orientation represents the pose parameters of the object, such as coordinates, rotation angle, etc.

Capsules in higher layers, called routing capsules, detect larger and more complex objects, such as eyes, ears, etc.
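The length-as-probability property described above is obtained with the capsule "squash" non-linearity from the Dynamic Routing paper: it preserves a vector's orientation while mapping its length into the range [0, 1). A minimal sketch:

```python
import numpy as np

def squash(v, eps=1e-9):
    # Keep the vector's orientation, but squash its length into [0, 1)
    # so the length can be read as the probability that the object is present
    norm_sq = np.sum(v ** 2)
    norm = np.sqrt(norm_sq + eps)
    return (norm_sq / (1.0 + norm_sq)) * (v / norm)
```

Long vectors get squashed to a length just below 1 (object almost certainly present), while short vectors shrink toward 0, with the pose information in the direction left untouched.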
Figure 60 Car Detection System. Source: Learning Features and Parts for Fine-Grained Recognition
Routing by Agreement

Unlike CNN, which primarily bubbles up higher order features using max or average pooling, Capsule Network bubbles up features using routing by agreement, where every capsule participates in choosing the shape by voting, in the manner of a democratic election.

In the figure given above:
• The lower level corresponds to rectangles, triangles and circles.
• The higher level corresponds to houses, boats and cars.

If there is an image of a house, the capsules corresponding to rectangles and triangles will have large activation vectors. Their relative positions (coded in their instantiation parameters) will bet on the presence of high-level objects. Since they agree on the presence of a house, the output vector of the house capsule will become large. This in turn will make the predictions by the rectangle and triangle capsules larger. This cycle repeats 4-5 times, after which the bets on the presence of a house will be considerably larger than the bets on the presence of a boat or a car.
Advantages over CNN

• Less data for training: Capsule Networks need far less data for training (almost 10%) as compared to CNN.
• Fewer parameters: The connections between layers require fewer parameters, as a capsule groups neurons, resulting in relatively less computational bandwidth.
• Preserve pose and position: They preserve pose and position information, as against CNN.
• High accuracy: Capsule Networks have higher accuracy as compared to CNNs.
• Reconstruction vs. mere classification: CNN helps you classify images but not reconstruct the same image, whereas Capsule Networks help you reconstruct the exact image.
• Information retention vs. loss: With CNN, a kernel for edge detection works only at a specific angle, and each angle requires a corresponding kernel. When dealing with edges, CNN works well because there are very few ways to describe an edge. Once we get up to the level of shapes, however, we do not want a kernel for every angle of rectangles, ovals, triangles and so on. It would get unwieldy, and would become even worse when dealing with more complicated shapes that have 3-dimensional rotations and features like lighting, which is the reason why traditional neural nets do not handle unseen rotations effectively.
Capsule Networks are best suited for object detection and image segmentation, as they better model hierarchical relationships and provide high accuracy. However, Capsule Networks are still under research and relatively new, and they have mostly been tested and benchmarked on the MNIST dataset; nevertheless, they will be the future for the massive use cases emerging from vision datasets.
Figure 70 A simple CapsNet with 3 layers. This model gives comparable results to deep convolutional networks. Source: Dynamic Routing Between Capsules, Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton
Figure 80 Capsule Network for House or Boat classification. Source: Beginners' Guide to Capsule Networks
Traditional methods of learning in Machine Learning focus on taking a huge labeled dataset and then learning to predict y (the dependent variable, say classifying an image as cat or dog) given a set of x (the independent variables, images of cats and dogs). This process involves selecting an algorithm, such as a Convolutional Neural Net, and arriving at various hyper-parameters, such as the number of layers in the network, the number of neurons in each layer, learning rate, weights, bias, dropouts, and the activation function used to activate a neuron, such as sigmoid, tanh or ReLU. The learning happens through several iterations of forward and backward passes (propagation), readjusting (also called learning) the weights based on the difference in the loss (actual vs. computed). At the minimal loss, the weights and other network parameters are frozen and considered the final model for future prediction tasks. This is obviously a long and tedious process, and repeating it for every use case or task is engineering, data and compute intensive.
Meta Learning focuses on how to learn to
learn It is one of the fascinating discipline
of artificial intelligence Human beings
have varying styles of learning Some
Humans can learn from their own existing
experiences or experiences they have
heard seen or observed Transfer Learning
discipline of AI is based on similar traits of
human learning where new models can
learn and benefit from existing trained
model
For example if a Computer Vision based
detection model with no Transfer
Learning that already detects various
types of vehicles such as cars trucks and
bicycles needs to be trained to detect an
airplane then you may have to retrain the
full model with images of all the previous
objects
Like the variety in human learning
techniques Meta Learning also uses
various learning methods based on
patterns of problems such as those based
on boundary space amount of data by
optimizing size of neural network or using
recurrent network approach Each of these
are briefly discussed inline
Few Shots Meta-Learning
This learning technique focuses on
learning from a few instances of data
Typically Neural Nets need millions of
data points to learn however Few Shots
Meta- Learning uses only a few instances
of data to build models Examples being
Facial recognition systems using Single
Shot Learning this is explained in detail in
Single Shot Learning section
Optimizer Meta-Learning
In this method the emphasis is on
optimizing the neural network and its
hyper- parameters A great example of
optimizer meta-learning are models that
are focused on improving gradient descent
techniques
Metric Meta-Learning
In this learning method the metric space
is narrowed down to improve the focus of
learning Then the learning is carried out
only in this metric space by leveraging
various optimization parameters that are
established for the given metric space
Recurrent Model Meta-Learning
This type of meta-learning model is tailored
to Recurrent Neural Networks(RNNs)
such as Long-Short-Term-Memory(LSTM)
In this architecture the meta-learner
algorithm will train a RNN model to process
a dataset sequentially and then process
new inputs from the task In an image
classification setting this might involve
passing in the set of (image label) pairs
of a dataset sequentially followed by new
examples which must be classified Meta-
Reinforcement Learning is an example of
this approach
Meta Learning
Transfer Learning (TL)
Types of Meta-Learning Models
people learn and memorize with one
instance of visual or auditory scan Some
people need multiple perspectives to
strengthen the neural connections for
permanent memory Some remember by
writing while some remember through
actual experiences Meta Learning tries
to leverage these to build its learning
characteristics
However with Transfer Learning you can
introduce an additional layer on top of the
existing pre-trained layer to start detecting
airplanes
Typically in a no Transfer Learning
scenario model needs to be trained and
during training right weights are arrived
at by doing many iterations (epochs) of
forward and back propagation which takes
significant amount of computation power
and time In addition Vision models need
significant amount of image data such as
in this example images of airplanes to be
trained
With the Transfer Learning approach, you can reuse the existing pre-trained weights of an existing trained model, with a significantly smaller number of images (5 to 10 percent of the images needed to train a ground-up model), for the model to start detecting. As the pre-trained model has already learnt some basics around identifying edges, curves and shapes in its earlier layers, it needs to learn only the higher-order features specific to airplanes on top of the existing computed weights. In brief, Transfer Learning helps eliminate the need to learn anything from scratch.
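The idea of freezing the pre-trained layers and training only a small new head can be sketched in plain numpy. The frozen "pre-trained" extractor and the toy dataset below are stand-ins invented for illustration, not a real vision model; only the head weights are ever updated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen, pre-trained feature extractor: these weights
# were learned elsewhere and are never updated below (scaled for
# stable gradients in this toy setting).
W_pretrained = rng.normal(size=(64, 16)) / 8.0

def extract_features(x):
    # Frozen layers: the generic edge/curve/shape detectors of the text.
    return np.maximum(x @ W_pretrained, 0.0)  # ReLU

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A small labeled set for the new class: far fewer samples than a
# ground-up model would need.
X = rng.normal(size=(40, 64))
feats = extract_features(X)
w_true = rng.normal(size=16)              # hidden labeling rule (toy)
y = (feats @ w_true > 0).astype(float)

# Only the new head is trainable.
w_head, b_head, lr = np.zeros(16), 0.0, 0.1
for _ in range(500):
    p = sigmoid(feats @ w_head + b_head)
    w_head -= lr * feats.T @ (p - y) / len(y)  # head-only gradients;
    b_head -= lr * np.mean(p - y)              # W_pretrained untouched

acc = np.mean((sigmoid(feats @ w_head + b_head) > 0.5) == y)
print(f"new-head training accuracy: {acc:.2f}")
```

Because the expensive representation is reused, only a 16-weight head is trained here, which is the computational saving the section describes.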
Transfer Learning helps save a significant amount of data, computational power and time in training new models, as they leverage pre-trained weights from existing trained models and architectures. However, it is important to understand that the Transfer Learning approach today is only mature enough to be applied to similar use cases; that is, you cannot use the above-discussed model to train a facial recognition model.
Another key thing during Transfer Learning is that it is important to understand the details of the data on which new use cases are being trained, as it can implicitly push the built-in biases from the underlying data into newer systems. It is recommended that the datasheets of the underlying models and data be studied thoroughly, unless the usage is for experimental purposes.
Having used the human brain rationale earlier, it is important to note that human brains have gone through centuries of experiences and gene evolution and have the ability to learn faster, whereas Transfer Learning is just a few decades old and is becoming the proving ground for new vision and text use cases.
Figure 90 Transfer Learning Layers Source John Cherrie Training Deep Learning Models with Transfer Learning
Single Shot Learning
Humans have the impressive skill to reason about new concepts and experiences with just a single example. They have the ability for one-shot generalization: the aptitude to encounter a new concept, understand its structure, and then generate compelling alternative variations of the same.
Facial recognition systems are good candidates for Single Shot Learning; otherwise, needing tens of thousands of individual face images to train one neural network can be extremely costly, time consuming and infeasible. However, a Single Shot Learning based system, using an existing pre-trained FaceNet model and a facial-encoding based approach on top of it, can be very effective at establishing face similarity by computing the distance between faces.
In this approach, a 128-dimensional encoding of each face image is generated and compared with other images' encodings to determine whether the person is the same or different. Various distance-based algorithms such as Euclidean distance can be used to determine whether they are within a specified threshold. The model training approach involves creating pairs of (Anchor, Positive) and (Anchor, Negative) and training the model in a way where the (Anchor, Positive) pair distance is smaller and the (Anchor, Negative) distance is larger.
"Anchor" is the image of a person for whom the recognition model needs to be trained.
"Positive" is another image of the same person.
"Negative" is an image of a different person.
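The encoding-and-distance approach can be sketched as follows. The 128-dimensional vectors below are random stand-ins for real FaceNet encodings, and the 0.6 threshold and 0.2 margin are illustrative choices, not values from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Squared Euclidean distances between the 128-d encodings.
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((anchor - negative) ** 2)
    # Zero loss once the negative is at least `margin` farther away.
    return max(pos_dist - neg_dist + margin, 0.0)

def same_person(enc_a, enc_b, threshold=0.6):
    # Verification: compare the encoding distance against a tuned threshold.
    return np.linalg.norm(enc_a - enc_b) < threshold

rng = np.random.default_rng(1)
anchor = rng.normal(size=128); anchor /= np.linalg.norm(anchor)
positive = anchor + 0.03 * rng.normal(size=128)   # same person, small shift
negative = rng.normal(size=128); negative /= np.linalg.norm(negative)

print(same_person(anchor, positive))   # True
print(same_person(anchor, negative))   # False
print(triplet_loss(anchor, positive, negative))
```

Minimizing the triplet loss is what pulls (Anchor, Positive) distances below the verification threshold while pushing (Anchor, Negative) distances above it.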
Figure 100 Encoding approach inspired from ML Course from Coursera
Deep Reinforcement Learning (RL)
This is a specialized Machine Learning discipline where an agent learns to behave in an environment by getting a reward or punishment for the actions performed. The agent can have an objective to maximize short-term or long-term rewards. This discipline uses deep learning techniques to bring in human-level performance on the given task.
Deep Reinforcement Learning has found significant relevance and application in various game design systems, such as creating video games, chess, AlphaGo and Atari, as well as in industrial applications such as robots and driverless cars.
In reinforcement learning, a policy p controls what action we should take, and a value function v measures how good it is to be in a particular state. The value function tells us the maximum expected future reward the agent will get at each state.
Three Approaches to Reinforcement Learning
Value Based
In value-based RL, the goal is to optimize the value function V(s). A Q-table uses any mathematical function to arrive at a state based on an action. The value of each state is the total amount of reward an agent can expect to accumulate over the future, starting at that state. The agent will use this value function to select which state to choose at each step.
Policy Based
In policy-based RL, we want to directly optimize the policy function π(s) without using a value function. The policy is what defines the agent's behavior at a given time. There are two types of policies:
1 Deterministic: a policy which, at a given state, will always return the same action.
2 Stochastic: a policy that outputs a probability distribution over actions.
Value-based and policy-based are the more conventional Reinforcement Learning approaches. They are useful for modeling relatively simple systems.
[Figure: Q-learning maps a (state, action) pair to an expected discounted reward, given that state, via a Q-table; Deep Q-learning instead feeds the state to a Deep Q Neural Network that outputs a Q value for each action, with action = policy(state).]
Figure 110 Schema inspired by the Q learning notebook by Udacity
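The value-based approach can be sketched with tabular Q-learning on a toy environment. The corridor environment, hyper-parameters and episode budget below are illustrative, not from the paper; the behavior policy is purely random, which still works because Q-learning is off-policy.

```python
import numpy as np

# Tabular Q-learning on a toy 1-D corridor: states 0..4, reward of 1
# for reaching state 4 (the terminal state).
n_states, n_actions = 5, 2               # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))      # the Q-table
alpha, gamma = 0.5, 0.9                  # learning rate, discount factor
rng = np.random.default_rng(0)

def step(state, action):
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward, nxt == n_states - 1

for _ in range(300):                     # episodes
    s = 0
    for _ in range(50):                  # cap on episode length
        a = int(rng.integers(n_actions))           # random exploration
        nxt, r, done = step(s, a)
        # Bellman update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * np.max(Q[nxt]) - Q[s, a])
        s = nxt
        if done:
            break

greedy = np.argmax(Q, axis=1)
print(greedy[:4])   # learned greedy policy: move right in states 0..3
```

Replacing the table lookup with a neural network that predicts Q values from the state is exactly the step from Q-learning to Deep Q-learning shown in the figure.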
Model Based
In model-based RL, we model the environment. This means we create a model of the behavior of the environment, and this model is then used to arrive at results that maximize short-term or long-term rewards. The model equation can be any equation defined based on the environment's behavior, and it must be sufficiently generalized to counter new situations.
When the model-based approach uses Deep Neural Network algorithms to generalize sufficiently well and learn the complexities of the environment to produce optimal results, it is called Deep Reinforcement Learning. The challenge with the model-based approach is that each environment needs a dedicated trained model.
AlphaGo was trained using data from several games to beat human players in the game of Go. The training accuracy was just 57 percent, and still it was sufficient to beat human-level performance. The training methods involved reinforcement learning and deep learning to build a policy network that tells which moves are promising and a value network that tells how good the board position is. The search for the final move from these networks is done using the Monte Carlo Tree Search (MCTS) algorithm. Using supervised learning, a policy network was created to imitate expert moves.
DeepMind released AlphaGo Zero in late 2017, which beat AlphaGo and did not involve any training from previous game data. The deep network was trained by picking training samples from AlphaGo Zero playing games against itself, selecting the best moves to train the network, and then applying those in real games to improve the results iteratively. This is possible because deep reinforcement learning algorithms can store long-range tree search results for the next best move in memory and do very large computations that are difficult for a human brain.
Designing a machine learning solution involves several steps, such as collecting data; understanding, cleansing and normalizing data; doing feature engineering; selecting or designing the algorithm; selecting the model architecture; selecting and tuning the model's hyper-parameters; evaluating the model's performance; deploying and monitoring the machine learning system in an online system; and so on. Such machine learning solution design requires an expert Data Scientist to complete the pipeline.
Auto ML (AML)
As the complexity of these and other tasks can easily get overwhelming, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The AI research area that encompasses progressive automation of machine learning pipeline tasks is called AutoML (Automatic Machine Learning).
Google CEO Sundar Pichai wrote, "Designing neural nets is extremely time intensive and requires an expertise that limits its use to a smaller community of scientists and engineers. That's why we've created an approach called AutoML, showing that it's possible for neural nets to design neural nets," while Google's Head of AI Jeff Dean suggested that 100x computational power could replace the need for machine learning expertise.
AutoML Vision relies on two core techniques: transfer learning and neural architecture search.
[Figure: the AutoML system takes {Xtrain, Ytrain, Xtest, budget}, uses meta-learning and a hand-crafted portfolio with Bayesian optimization over an ML pipeline (data processor, feature preprocessor, classifier), and builds an ensemble to produce Ytest.]
Figure 120 An example of an Auto-sklearn pipeline. Source: André Biedenkapp, We did it Again: World Champions in AutoML
Implementing AutoML
Here is a look at a few libraries that help in implementing AutoML.
AUTO-SKLEARN
AUTO-SKLEARN automates several key tasks in the Machine Learning pipeline, such as addressing missing column values, encoding categorical values, data scaling and normalization, feature pre-processing, and selecting the right algorithm with its hyper-parameters. The pipeline supports 15 classification and 14 feature-processing algorithms. Selection of the right algorithm can happen based on ensembling techniques and by applying meta-knowledge gathered from executing similar scenarios (datasets and algorithms).
Usage
Auto-sklearn is written in Python and can be considered a drop-in replacement for scikit-learn classifiers. Here is a sample set of commands:

>>> import autosklearn.classification
>>> cls = autosklearn.classification.AutoSklearnClassifier()
>>> cls.fit(X_train, y_train)
>>> predictions = cls.predict(X_test)
SMAC (Sequential Model-Based Algorithm Configuration)
SMAC is a tool for automating certain AutoML steps. It is useful for selecting key features, optimizing hyper-parameters, and speeding up algorithmic outputs.
BOHB (Bayesian Optimization and HyperBand)
BOHB combines Bayesian hyper-parameter optimization with bandit-based methods for faster convergence.
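The bandit side of BOHB can be illustrated with a tiny successive-halving loop: many configurations get a small budget, and only the better half survives to be evaluated with a doubled budget. The evaluate() function, budgets and configuration count below are invented stand-ins for "train briefly and measure validation loss", not BOHB's actual API.

```python
import random

def evaluate(config, budget):
    # Stand-in for "train with this budget, return validation loss";
    # loss shrinks with budget and depends on a hidden config quality.
    return config["quality"] + 1.0 / budget + random.random() * 0.01

random.seed(0)
configs = [{"id": i, "quality": random.random()} for i in range(16)]
budget = 1
while len(configs) > 1:
    # Score everyone on the current (cheap) budget.
    scored = sorted(configs, key=lambda c: evaluate(c, budget))
    configs = scored[: len(configs) // 2]   # keep the better half
    budget *= 2                             # give survivors more budget

print(configs[0]["id"], round(configs[0]["quality"], 3))
```

BOHB additionally replaces the random sampling of new configurations with a Bayesian model of which regions of the search space look promising.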
Google and H2O also have their respective AutoML tools, which are not covered here but can be explored for specific cases.
AutoML needs significant memory and computational power to execute alternate algorithms and compute results. At present, GPU resources are extremely costly for executing even simple Machine Learning workloads, such as a CNN algorithm to classify objects. If multiple such alternate algorithms were to be executed, the computation cost would grow exponentially. This is impractical, infeasible and inefficient for the current state of the Data Science industry. Adoption of AutoML will depend on two things: one, the maturity of the AutoML pipeline, and two, more importantly, how quickly GPU clusters become cheap, the second being most critical. Selling Cloud GPU capacity could be one of the motivations for several cloud-infrastructure companies to promote AutoML in the industry. Also, AutoML will not replace the Data Scientist's work, but it can provide augmentation and speed for certain tasks such as data standardization, model tuning and trying multiple algorithms. It is only the beginning for AutoML, but this technique has high relevance and usefulness for solving ultra-complex problems.
Neural Architecture Search (NAS)
Neural Architecture Search (NAS) is a component of AutoML and addresses the important step of designing the Neural Network architecture.
Designing a fresh Neural Net architecture involves an expert establishing and organizing Neural Network layers, filters or channels, filter sizes, other optimum hyper-parameters, and so on, through several rounds of computational iterations. Since the AlexNet deep neural network architecture won the ImageNet competition (image classification based on the ImageNet dataset) in 2012, several architecture styles such as VGG, ResNet, Inception, Xception, InceptionResNet, MobileNet and NASNet have evolved significantly. However, selecting the right architecture for the right problem is also a skill, due to the presence of various influencing factors, such as applicability to the problem, accuracy, number of parameters, memory and computational footprint, and size of the architecture, that govern the overall functioning efficiency.
Neural Architecture Search tries to address this problem space by automatically selecting the right Neural Network architecture to solve a given problem.
[Figure: hyperparameter optimization and NAS shown as overlapping sub-areas within AutoML.]
Figure 130 Source Liam Li Ameet Talwalkar What is neural architecture search
Key Components of NAS
Search space: The search space provides the boundary within which the specific architecture needs to be searched. Computer Vision use cases (captioning a scene or product identification) would need a different neural network architecture style than Speech (speech transcription or speaker classification) or unstructured Text (topic extraction, intent mining) use cases. The search space tries to provide available catalogs of best-in-class architectures based on other domain data and performance. These are also usually hand-crafted by expert data scientists.
Optimization method: This is responsible for providing the mechanism to search for the best architecture. Candidates could be searched and applied randomly, or by using a statistical or Machine Learning evaluation approach such as Bayesian methods or reinforcement learning methods.
Evaluation method: This has the role of evaluating the quality of the architecture considered by the optimization method. It could be done using a full training approach, or by doing partial training and then applying certain specialized methods such as early stopping, weight sharing, network morphism, etc.
For selective problem spaces, NAS has outperformed manual methods and is showing definite promise for the future. However, it is still evolving and not ready for production usage, as several architectures need to be established and evaluated depending on the problem space.
Components of NAS:
• Search Space: DAG representation, cell block, meta-architecture, NAS-specific
• Optimization Method: reinforcement learning, evolutionary search, gradient-based optimization, Bayesian optimization
• Evaluation Method: full training, partial training, weight-sharing, network morphism, hypernetworks
Figure 140 Components of NAS Source Liam Li Ameet Talwalkar What is neural architecture search
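As a minimal illustration of the three components working together, here is a toy NAS loop using random search as the optimization method. The search space, scoring function and budget are invented for illustration; a real system would (partially) train each candidate network to evaluate it.

```python
import random

# Search space: the boundary within which architectures are sampled.
search_space = {
    "n_layers": [2, 3, 4],
    "width": [32, 64, 128],
    "activation": ["relu", "tanh"],
}

def sample_architecture(rng):
    # Optimization method: random search simply samples the space.
    return {k: rng.choice(v) for k, v in search_space.items()}

def evaluate(arch):
    # Evaluation method: a deterministic proxy score standing in for
    # validation accuracy after partial training; it rewards depth and
    # width but penalizes size to mimic a compute/memory constraint.
    score = 0.5 + 0.1 * arch["n_layers"] + 0.001 * arch["width"]
    return score - 0.02 * (arch["n_layers"] * arch["width"] / 128)

rng = random.Random(0)
best_arch, best_score = None, float("-inf")
for _ in range(20):                      # search budget
    arch = sample_architecture(rng)
    s = evaluate(arch)
    if s > best_score:
        best_arch, best_score = arch, s

print(best_arch, round(best_score, 3))
```

Swapping `sample_architecture` for a reinforcement-learning controller or an evolutionary mutation step, and `evaluate` for partial training with early stopping, recovers the more sophisticated NAS variants listed above.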
Addressing H3 AI Trends at Infosys
In this paper we looked at some key H3 AI areas; by no means is this an exhaustive list. Amongst all those discussed, Transfer Learning, Capsule Networks, Explainable AI and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases and integrating them into our product stack, Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do.
Trend: Use cases
1. Explainable AI (XAI): Applicable wherever results need to be traced, e.g. Tumor Detection, Mortgage Rejection, Candidate Selection, etc.
2. Generative AI / Neural Style Transfer (NST): Art Generation, Sketch Generation, Image or Video Resolution Improvements, Data Generation/Augmentation, Music Generation
3. Fine Grained Classification: Vehicle Classification, Type of Tumor Detection
4. Capsule Networks: Image Re-construction, Image Comparison/Matching
5. Meta Learning: Intelligent Agents, Continuous Learning scenarios for document review and corrections
6. Transfer Learning: Identifying a person not wearing a helmet, Logo/brand detection in an image, Speech Model training for various accents and vocabularies
7. Single Shot Learning: Face Recognition, Face Verification
8. Deep Reinforcement Learning (RL): Intelligent Agents, Robots, Driverless cars, Traffic Light Monitoring, Continuous Learning scenarios for document review and corrections
9. Auto ML: Invoice Attribute Extraction, Document Classification, Document Clustering
10. Neural Architecture Search (NAS): CNN or RNN based use cases such as Image Classification, Object Identification, Image Segmentation, Speaker Classification, etc.
Table 20 AI Use cases Infosys Research
Reference
1 Explainable AI (XAI)
2 Fine Grained Classification
3 Capsule Networks
4 Meta Learning
5 Transfer Learning
6 Single Shot Learning
7 Deep Reinforcement Learning (RL)
8 Auto ML
• https://christophm.github.io/interpretable-ml-book
• https://simmachines.com/explainable-ai
• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html
• https://medium.com/QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739
• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf
• https://www.fast.ai/2018/07/23/auto-ml-3
• https://arxiv.org/pdf/1603.05106.pdf
• https://arxiv.org/pdf/1710.09829.pdf
• https://keras.io/examples/cifar10_cnn_capsule
• https://www.youtube.com/watch?v=pPN8d0E3900
• https://www.youtube.com/watch?v=rTawFwUvnLE
• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc
• https://medium.com/jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
• http://proceedings.mlr.press/v48/santoro16.pdf
• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660
• https://deepmind.com/blog/article/deep-reinforcement-learning
• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
• https://medium.com/jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5
• https://arxiv.org/pdf/1811.12560.pdf
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac
• https://www.fast.ai/2018/07/23/auto-ml-3
• https://www.fast.ai/2018/07/16/auto-ml2
• https://competitions.codalab.org/competitions/17767
• https://www.automl.org/automl/auto-sklearn
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac
• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html
copy 2019 Infosys Limited Bengaluru India All Rights Reserved Infosys believes the information in this document is accurate as of its publication date such information is subject to change without notice Infosys acknowledges the proprietary rights of other companies to the trademarks product names and such other intellectual property rights mentioned in this document Except as expressly permitted neither this documentation nor any part of it may be reproduced stored in a retrieval system or transmitted in any form or by any means electronic mechanical printing photocopying recording or otherwise without the prior permission of Infosys Limited and or any named intellectual property rights holders under this document
For more information contact askusinfosyscom
Infosyscom | NYSE INFY Stay Connected
9 Neural Architecture Search (NAS)
• https://www.oreilly.com/ideas/what-is-neural-architecture-search
10 Infosys Enterprise Cognitive Platform
• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx
Sudhanshu Hate is the inventor and architect of Infosys Enterprise Cognitive Platform (iECP), a microservices API based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and working with clients on industry problems. His current areas of interest are Computer Vision, Speech and unstructured text based AI possibilities.
To know more about our work on the H3 trends in AI write to icetsinfosyscom
About the author
Some of the other popular GAN variations are:
Super Resolution GAN (SRGAN), which helps improve the quality of images.
Stack-GAN, which generates realistic-looking photographs from textual descriptions of simple objects like birds and flowers.
Sketch-GAN, a generative model for vector drawings, which is a Recurrent Neural Network (RNN) able to construct stroke-based drawings of common objects. The model is trained on a dataset of human-drawn images representing many different classes.
eGANs (Evolutionary Generative Adversarial Networks), which generate photographs of faces at different ages, from young to old.
IcGAN, which reconstructs photographs of faces with specific features, such as changes in hair color, style, facial expression, and even gender.
[Figure: Neural Style Transfer examples: content images (a colorful circle, the Louvre museum, the ancient city of Persepolis) combined with style images (a blue painting, an impressionist-style painting, The Starry Night by Van Gogh) to produce generated images such as Persepolis in Van Gogh's style.]
Figure 40 Novel Artistic Images through Neural Style Transfer Source Fisseha Berhane Deep Learning amp Art Neural Style Transfer
Fine Grained Classification
Classification of an object into specific categories such as car, table, flower and such is common in Computer Vision. However, establishing an object's finer class based on specific characteristics is where AI is making rapid progress. This is possible because granular features of objects are being trained on and used for differentiating objects.
Examples of Fine Grained Classification are:
Fine-grained clothing style finder, type of a shoe, etc.
Recognizing a car type
Recognizing the breed of a dog, a plant species, an insect, a bird species, etc.
However, fine-grained classification is challenging due to the difficulty of finding discriminative features. Finding those subtle traits that fully characterize the object is not straightforward.
Fine Grained Classification Approaches
Feature representations that better preserve fine-grained information
Segmentation-based approaches that facilitate extraction of purer features and part/pose-normalized feature spaces
Pose normalization schemes
In Fine Grained Classification, the progression through the 8-layer CNN network can be thought of as a progression from low- to mid- to high-level features. The later layers aggregate more complex structural information across larger scales: sequences of convolutional layers interleaved with max-pooling can capture deformable parts, and fully connected layers can capture complex co-occurrence statistics.
Bird recognition is one of the major examples of fine-grained classification. In the image below, given a test image, groups of detected key points are used to compute multiple warped image regions that are aligned with prototypical models. Each region is fed through a deep convolutional network, and features are extracted from multiple layers, after which they are concatenated and fed to a classifier.
Figure 50 Bird Recognition Pipeline Overview Source Branson Van Hoen et al Bird Species Categorization
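The "extract features from multiple layers and concatenate" step of the bird pipeline can be sketched in numpy. The region sizes, layer widths and random stand-in layers below are illustrative, not values from the cited paper.

```python
import numpy as np

def features_from_layer(region, dim, seed):
    # Stand-in for one network layer's response to a warped region:
    # a fixed random projection followed by a ReLU.
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(region.size, dim))
    return np.maximum(region.ravel() @ W, 0.0)

# Two warped, aligned image regions computed from detected key points.
regions = [np.ones((4, 4)), np.ones((4, 4))]

# For each region, extract features at two layer depths (32-d and 64-d)
# and concatenate everything into one descriptor for the classifier.
descriptor = np.concatenate([
    np.concatenate([features_from_layer(r, d, s)
                    for d, s in [(32, 0), (64, 1)]])
    for r in regions
])
print(descriptor.shape)  # (192,) = 2 regions x (32 + 64) features
```

The resulting fixed-length descriptor is what a linear classifier would consume to predict the bird species.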
Car Detection System using Fine Grained Classification
The pictures and steps below depict a fine-grained classification approach for a car detection system:
a. Detects parts using a collection of unsupervised part detectors.
b. Outputs a grid of discriminative features. (The CNN is learned with class labels and then truncated, retaining the first two convolutional layers that preserve spatial information.) The appearance of each part detected using the learned CNN features is described by pooling in the detected region of each part.
c. The appearance of any undetected part is set to zero. This results in the Ensemble of Localized Learned Features (ELLF) representation, which is then used to predict fine-grained object categories.
d. A standard CNN passes the output of the convolutional layers through several fully connected layers in order to make a prediction.
Capsule Network
Convolutional Networks are so far the de facto and well-accepted algorithms for working with image-based datasets. They work on the pixels of images using filters (channels) of various sizes, convolving with pooling techniques to bubble up the stronger features, deriving colors, textures, edges and shapes, and establishing structures through the lower to the highest layers.
Given the face of a person, a CNN identifies the face by establishing eyes, ears, eyebrows, lips, chin and other components of the face. However, if the facial image is provided with incorrect position and alignment of eyes and eyebrows, or say the eyebrows swap with the lips and the ears are placed on the forehead, the same trained CNN algorithm would still go on and detect this as a human face. This is a huge drawback of the CNN algorithm, and it happens due to its inability to store information on the relative position of various objects.
Capsule Network, invented by Geoffrey Hinton, addresses exactly this problem of CNN by storing the spatial relationships of various parts.
Capsule Networks, like CNNs, are multi-layered neural networks consisting of several capsules, where each capsule consists of several neurons. Capsules in lower layers, called primary capsules, are trained to detect an object (e.g. a triangle or circle) within a given region of the image. Each outputs a vector that has two properties: length and orientation. Length represents the probability of the presence of the object, and orientation represents the pose parameters of the object, such as coordinates, rotation angle, etc.
Capsules in higher layers, called routing capsules, detect larger and more complex objects, such as eyes, ears, etc.
Figure 60 Car Detection System Source Learning Features and Parts for Fine-Grained Recognition
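The length-and-orientation output described above is typically produced with a "squash" non-linearity, which shrinks a capsule's raw vector so its length behaves like a probability while its direction is preserved. A small numpy sketch, with an illustrative input vector:

```python
import numpy as np

def squash(v, eps=1e-9):
    # Scales a capsule's raw output so its length lies in [0, 1)
    # (probability of presence) while its orientation is preserved.
    sq_norm = np.sum(v ** 2)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * v / np.sqrt(sq_norm + eps)

raw = np.array([3.0, 4.0])         # raw capsule output, length 5
out = squash(raw)

length = np.linalg.norm(out)       # close to 1: object very likely present
direction = out / length           # same orientation as the raw vector
print(length, direction)
```

A long raw vector squashes to a length near 1, and a short one to a length near 0, which is exactly the probability interpretation the text describes.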
Routing by Agreement
Unlike CNNs, which primarily bubble up higher-order features using max or average pooling, Capsule Networks bubble up features using routing by agreement, where every capsule participates in choosing the shape by voting (as in a democratic election).
In the figure given above:
The lower level corresponds to rectangles, triangles and circles.
The higher level corresponds to houses, boats and cars.
If there is an image of a house, the capsules corresponding to rectangles and triangles will have large activation vectors. Their relative positions (coded in their instantiation parameters) will bet on the presence of high-level objects. Since they agree on the presence of a house, the output vector of the house capsule will become large. This in turn will make the predictions by the rectangle and triangle capsules larger. This cycle will repeat 4-5 times, after which the bets on the presence of a house will be considerably larger than the bets on the presence of a boat or a car.
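The voting cycle above can be sketched as dynamic routing between two capsule layers. The prediction vectors below are toy numbers, with agreement deliberately engineered on higher capsule 0 so the routing has something to converge to.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def squash(v, axis=-1, eps=1e-9):
    sq = np.sum(v ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * v / np.sqrt(sq + eps)

# u_hat[i, j]: lower capsule i's prediction (8-d) for higher capsule j.
rng = np.random.default_rng(0)
n_lower, n_higher, dim = 6, 3, 8
u_hat = rng.normal(size=(n_lower, n_higher, dim))
u_hat[:, 0, :] = 1.0   # all lower capsules agree on higher capsule 0

b = np.zeros((n_lower, n_higher))          # routing logits
for _ in range(3):                          # routing iterations
    c = softmax(b, axis=1)                  # each lower capsule votes
    s = np.einsum("ij,ijd->jd", c, u_hat)   # weighted sum per higher capsule
    v = squash(s)                           # higher capsule outputs
    b += np.einsum("ijd,jd->ij", u_hat, v)  # reward agreement

lengths = np.linalg.norm(v, axis=1)
print(lengths)   # capsule 0 ends up with the largest activation
```

Because the lower capsules' predictions agree only for capsule 0, the agreement term keeps raising its routing weights, which is the "election" the text describes.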
Advantage over CNN
Less data for training: Capsule Networks need far less data for training (almost a tenth of what a CNN needs).
Fewer parameters: The connections between layers require fewer parameters, as a capsule groups neurons, resulting in relatively less computation bandwidth.
Preserve pose and position: They preserve pose and position information, as against CNNs.
High accuracy: Capsule Networks have higher accuracy as compared to CNNs.
Reconstruction vs. mere classification: A CNN helps you classify images but not reconstruct the same image, whereas Capsule Networks help you reconstruct the exact image.
Information retention vs. loss: With a CNN, during edge detection, a kernel works only at a specific angle, and each angle requires a corresponding kernel. When dealing with edges, CNNs work well because there are very few ways to describe an edge. Once we get up to the level of shapes, we do not want to have a kernel for every angle of rectangles, ovals, triangles and so on. It would get unwieldy, and would become even worse when dealing with more complicated shapes that have 3-dimensional rotations and features like lighting, which is why traditional neural nets do not handle unseen rotations effectively.
Capsule Networks are best suited for object detection and image segmentation, as they better model hierarchical relationships and provide high accuracy. However, Capsule Networks are still under research; they are relatively new and mostly tested and benchmarked on the MNIST dataset, but they will be the future for working with the massive use cases emerging from vision datasets.
Figure 70 A simple CapsNet with 3 layers This model gives comparable results to deep convolutional networks Source Dynamic
Routing Between Capsules Sara Sabour Nicholas Frosst Geoffrey E Hinton
Figure 80 Capsule Network for House or Boat classification Source Beginners' Guide to Capsule Networks
Traditional methods of learning in Machine
Learning focuses on taking a huge labeled
dataset and then learning to detect y
(dependent variable say classifying an
image as cat or dog) and given set of x
(independent variables images of cats
and dogs) This process involves selection
of an algorithm such as Convolution
Neural Net and arriving at various hyper
parameters such as number of layers
in the network number of neurons in
each layer learning rate weights bias
dropouts activation function to activate
the neuron such as sigmoid tanh and
Relu The learning happens through
several iterations of forward and backward
passes (propagation) by readjusting (also
called learning) the weights based on
difference in the loss (actual vs computed)
At the minimal loss the weights and
other network parameters are frozen
and are considered final model for future
prediction tasks This is obviously a long
and tedious process and repeating this for
every use case or task is engineering data
and compute intensive
Meta Learning

Meta Learning focuses on how to learn to learn. It is one of the fascinating disciplines of artificial intelligence. Human beings have varying styles of learning. Some people learn and memorize with one instance of a visual or auditory scan. Some people need multiple perspectives to strengthen the neural connections for permanent memory. Some remember by writing, while some remember through actual experiences. Meta Learning tries to leverage these traits to build its learning characteristics.

Like the variety in human learning techniques, Meta Learning also uses various learning methods based on the patterns of problems, such as those based on boundary space or the amount of data, by optimizing the size of the neural network, or by using a recurrent network approach. Each of these is briefly discussed below.

Types of Meta-Learning Models

Few Shots Meta-Learning
This learning technique focuses on learning from a few instances of data. Typically, Neural Nets need millions of data points to learn; however, Few Shots Meta-Learning uses only a few instances of data to build models. An example is facial recognition systems using Single Shot Learning; this is explained in detail in the Single Shot Learning section.

Optimizer Meta-Learning
In this method, the emphasis is on optimizing the neural network and its hyper-parameters. A great example of optimizer meta-learning is models that are focused on improving gradient descent techniques.

Metric Meta-Learning
In this learning method, the metric space is narrowed down to improve the focus of learning. The learning is then carried out only in this metric space, by leveraging various optimization parameters that are established for the given metric space.

Recurrent Model Meta-Learning
This type of meta-learning model is tailored to Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM). In this architecture, the meta-learner algorithm trains an RNN model to process a dataset sequentially and then process new inputs from the task. In an image classification setting, this might involve passing in the set of (image, label) pairs of a dataset sequentially, followed by new examples which must be classified. Meta-Reinforcement Learning is an example of this approach.

Transfer Learning (TL)

Humans can learn from their own existing experiences, or from experiences they have heard, seen, or observed. The Transfer Learning discipline of AI is based on similar traits of human learning, where new models can learn and benefit from existing trained models.

For example, consider a Computer Vision based detection model that already detects various types of vehicles such as cars, trucks, and bicycles, and now needs to be trained to detect an airplane. With no Transfer Learning, you may have to retrain the full model with images of all the previous objects. However, with Transfer Learning, you can introduce an additional layer on top of the existing pre-trained layers to start detecting airplanes.

Typically, in a no Transfer Learning scenario, the model needs to be trained from scratch, and during training the right weights are arrived at by doing many iterations (epochs) of forward and back propagation, which takes a significant amount of computation power and time. In addition, Vision models need a significant amount of image data, such as, in this example, images of airplanes, to be trained.

With the Transfer Learning approach, you can reuse the existing pre-trained weights of an existing trained model with significantly fewer images (5 to 10 percent of the images needed for training a ground-up model) for the model to start detecting. As the pre-trained model has already learnt some basic features, such as identifying edges, curves, and shapes, in its earlier layers, it needs to learn only the higher order features specific to airplanes on top of the existing computed weights. In brief, Transfer Learning helps eliminate the need to learn everything from scratch.
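The mechanics of freezing pre-trained layers and training only a new head can be sketched with a toy example. The code below is a minimal numpy sketch under stated assumptions, not a production recipe: the "pre-trained" feature extractor is simulated with fixed random weights, and the toy binary task stands in for "airplane vs not airplane".

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated frozen, pre-trained feature extractor: its weights are never updated.
W_frozen = rng.normal(size=(4, 8))

def features(x):
    # Frozen layer: ReLU(x @ W_frozen), reused as-is for the new task.
    return np.maximum(0.0, x @ W_frozen)

# Toy binary task, 64 samples with 4 raw features each.
X = rng.normal(size=(64, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Only the new head (8 weights plus a bias) is trained, which is far fewer
# parameters than retraining the whole network.
w_head = np.zeros(8)
b_head = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

F = features(X)          # features from the frozen layer, computed once
for _ in range(500):     # plain gradient descent on the logistic loss
    p = sigmoid(F @ w_head + b_head)
    grad = F.T @ (p - y) / len(y)
    w_head -= 0.5 * grad
    b_head -= 0.5 * np.mean(p - y)

accuracy = np.mean((sigmoid(F @ w_head + b_head) > 0.5) == (y == 1))
```

Only `w_head` and `b_head` change during training; `W_frozen` plays the role of the pre-trained layers whose learned edges, curves, and shapes are reused.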
Transfer Learning helps in saving a significant amount of data, computational power, and time in training new models, as they leverage pre-trained weights from existing trained models and architectures. However, it is important to understand that the Transfer Learning approach today is only mature enough to be applied to similar use cases; that is, you cannot use the above discussed model to train a facial recognition model.

Another key consideration during Transfer Learning is to understand the details of the data on which new use cases are being trained, as it can implicitly push the built-in biases from the underlying data into newer systems. It is recommended that the datasheets of underlying models and data be studied thoroughly unless the usage is for experimental purposes.

Having used the human brain rationale earlier, it is important to note that human brains have gone through centuries of experiences and gene evolution and hence have the ability to learn faster, whereas transfer learning is just a few decades old and is becoming the ground for new vision and text use cases.
Figure 9.0: Transfer Learning Layers. Source: John Cherrie, Training Deep Learning Models with Transfer Learning
Single Shot Learning

Humans have the impressive skill of reasoning about new concepts and experiences from just a single example. They have the ability for one-shot generalization: the aptitude to encounter a new concept, understand its structure, and then generate compelling alternative variations of the same.

Facial recognition systems are good candidates for Single Shot Learning; otherwise, needing tens of thousands of individual face images to train one neural network can be extremely costly, time consuming, and infeasible. A Single Shot Learning based system using an existing pre-trained FaceNet model, with a facial encoding based approach on top of it, can be very effective at establishing face similarity by computing the distance between faces.
In this approach, a 128-dimensional encoding of each face image is generated and compared with the other image's encoding to determine if the person is the same or different. Various distance based algorithms, such as Euclidean distance, can be used to determine if the encodings are within a specified threshold. The model training approach involves creating pairs of (Anchor, Positive) and (Anchor, Negative) and training the model in a way where the (Anchor, Positive) pair distance is smaller and the (Anchor, Negative) distance is larger.

"Anchor" is the image of the person for whom the recognition model needs to be trained.
"Positive" is another image of the same person.
"Negative" is an image of a different person.
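The distance comparison and the (Anchor, Positive, Negative) training objective can be sketched as follows. This is an illustrative sketch: the encodings here are stand-in vectors, whereas a real system would obtain them from a pre-trained FaceNet-style model, and the 0.6 threshold is a placeholder that would be tuned on validation pairs.

```python
import numpy as np

def euclidean_distance(enc_a, enc_b):
    # Distance between two 128-dimensional face encodings.
    return float(np.linalg.norm(np.asarray(enc_a) - np.asarray(enc_b)))

def same_person(enc_a, enc_b, threshold=0.6):
    # Distances below the threshold are treated as "same person".
    return euclidean_distance(enc_a, enc_b) < threshold

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Training objective: the (Anchor, Positive) distance should be smaller
    # than the (Anchor, Negative) distance by at least the margin.
    d_pos = euclidean_distance(anchor, positive) ** 2
    d_neg = euclidean_distance(anchor, negative) ** 2
    return max(d_pos - d_neg + margin, 0.0)

# Stand-in encodings instead of real model outputs.
rng = np.random.default_rng(1)
anchor = rng.normal(size=128)
positive = anchor + rng.normal(scale=0.01, size=128)  # near-duplicate of anchor
negative = rng.normal(size=128)                       # a different person

print(same_person(anchor, positive))   # small distance: same person
print(same_person(anchor, negative))   # large distance: different person
```

A well-trained encoder drives the triplet loss to zero, which is exactly the "(Anchor, Positive) closer, (Anchor, Negative) farther" property described above.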
Figure 10.0: Encoding approach, inspired by the ML course from Coursera
Deep Reinforcement Learning (RL)

This is a specialized Machine Learning discipline where an agent learns to behave in an environment by getting a reward or punishment for the actions it performs. The agent can have an objective to maximize short term or long-term rewards. This discipline uses deep learning techniques to bring in human level performance on the given task.

Deep Reinforcement Learning has found significant relevance and application in various game design systems, such as creating video games, chess, AlphaGo, and Atari, as well as in industrial applications such as robots and driverless cars.

Three Approaches to Reinforcement Learning

In reinforcement learning, the policy p controls what action we should take, while the value function v measures how good it is to be in a particular state.

Value Based
In value-based RL, the goal is to optimize the value function V(s). The value function tells us the maximum expected future reward the agent will get at each state; the value of each state is the total amount of reward an agent can expect to accumulate over the future, starting at that state. The agent will use this value function to select which state to choose at each step. A Q-table uses a mathematical function to arrive at a state based on an action.

Policy Based
In policy-based RL, we want to directly optimize the policy function π(s) without using a value function. The policy is what defines the agent's behavior at a given time. There are two types of policies:
1. Deterministic: a policy which, at a given state, will always return the same action.
2. Stochastic: a policy that outputs a probability distribution over actions.

Value based and Policy based are the more conventional Reinforcement Learning approaches. They are useful for modeling relatively simple systems.
Figure 11.0: Q-learning (Q-table) vs Deep Q-learning (Deep Q Neural Network) schema, inspired by the Q-learning notebook by Udacity
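The value-based approach can be made concrete with tabular Q-learning on a toy environment. Everything below is illustrative: a hypothetical five-state corridor where only reaching the rightmost state yields a reward, with the standard Q-table update rule.

```python
import random

random.seed(0)

N_STATES = 5            # states 0..4; reaching state 4 yields reward 1
ACTIONS = (-1, +1)      # move left or move right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2

# Q-table: Q[state][action], initialized to zero.
Q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

for _ in range(200):                       # episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the table, sometimes explore.
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(Q[s], key=Q[s].get)
        nxt, r, done = step(s, a)
        # Q-learning update: move Q[s][a] toward r + gamma * max_a' Q[nxt][a'].
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[nxt].values()) - Q[s][a])
        s = nxt

# After training, the greedy policy at every non-terminal state is "move right".
greedy = {s: max(Q[s], key=Q[s].get) for s in range(N_STATES - 1)}
```

The learned table is exactly the "value function the agent uses to select which state to choose at each step"; Deep Q-learning replaces this table with a neural network when the state space is too large to enumerate.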
Model Based
In model-based RL, we model the environment. This means we create a model of the behavior of the environment, and this model is then used to arrive at results that maximize short term or long-term rewards. The model equation can be any equation that is defined based on the environment's behavior, and it must be sufficiently generalized to counter new situations.

When the model based approach uses Deep Neural Network algorithms to generalize sufficiently well and learn the complexities of the environment to produce optimal results, it is called Deep Reinforcement Learning. The challenge with the model based approach is that each environment needs a dedicated trained model.
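The model-based idea can be sketched in a few lines: estimate a transition and reward model from logged experience, then pick actions by looking ahead against the learned model. The states, actions, and experience tuples below are purely illustrative.

```python
from collections import defaultdict

# Logged experience tuples: (state, action, next_state, reward).
# In a real system these come from interacting with the environment.
experience = [
    ("s0", "a1", "s1", 0.0), ("s1", "a1", "s2", 1.0),
    ("s0", "a0", "s0", 0.0), ("s1", "a0", "s0", 0.0),
    ("s0", "a1", "s1", 0.0), ("s1", "a1", "s2", 1.0),
]

# Learn a model of the environment: most frequent next state and mean reward.
counts = defaultdict(lambda: defaultdict(int))
rewards = defaultdict(list)
for s, a, nxt, r in experience:
    counts[(s, a)][nxt] += 1
    rewards[(s, a)].append(r)

def predicted_next(s, a):
    return max(counts[(s, a)], key=counts[(s, a)].get)

def predicted_reward(s, a):
    rs = rewards[(s, a)]
    return sum(rs) / len(rs)

def plan(s, actions=("a0", "a1")):
    # One-step lookahead: pick the action the learned model says is best.
    return max(actions, key=lambda a: predicted_reward(s, a))

print(plan("s1"))  # the model predicts "a1" earns the reward
```

Deep Reinforcement Learning replaces these frequency tables with a deep network so the learned model generalizes to situations not seen in the logged experience.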
AlphaGo was trained using data from several games to beat human players at the game of Go. The training accuracy was just 57 percent, and still it was sufficient to beat human level performance. The training methods involved reinforcement learning and deep learning to build a policy network that tells which moves are promising and a value network that tells how good the board position is. The search for the final move from these networks is done using the Monte Carlo Tree Search (MCTS) algorithm. Using supervised learning, a policy network was created to imitate the expert moves.

DeepMind released AlphaGo Zero in late 2017, which beat AlphaGo and did not involve any training from previous games data. The deep network training was done by picking training samples from AlphaGo and AlphaGo Zero playing games against itself, selecting the best moves to train the network, and then applying those in real games to improve the results iteratively. This is possible because deep reinforcement learning algorithms can store long-range tree search results for the next best move in memory and do very large computations that are difficult for a human brain.
Auto ML (AML)

Designing a machine learning solution involves several steps, such as collecting data; understanding, cleansing, and normalizing data; doing feature engineering; selecting or designing the algorithm; selecting the model architecture; selecting and tuning the model's hyper-parameters; evaluating the model's performance; deploying and monitoring the machine learning system in an online system; and so on. Such machine learning solution design requires an expert Data Scientist to complete the pipeline.

As the complexity of these and other tasks can easily get overwhelming, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The AI research area that encompasses progressive automation of machine learning pipeline tasks is called AutoML (Automatic Machine Learning).
Google CEO Sundar Pichai wrote, "Designing neural nets is extremely time intensive and requires an expertise that limits its use to a smaller community of scientists and engineers. That's why we've created an approach called AutoML, showing that it's possible for neural nets to design neural nets," while Google's Head of AI Jeff Dean suggested that 100x computational power could replace the need for machine learning expertise.

AutoML Vision relies on two core techniques: transfer learning and neural architecture search.
Figure 12.0: An example of the Auto-sklearn pipeline. Source: André Biedenkapp, We did it Again: World Champions in AutoML
Implementing AutoML

Here is a look at a few libraries that help in implementing AutoML.

AUTO-SKLEARN
Auto-sklearn automates several key tasks in the Machine Learning pipeline, such as addressing column missing values, encoding of categorical values, data scaling and normalization, feature pre-processing, and selection of the right algorithm with hyper-parameters. The pipeline supports 15 classification and 14 feature processing algorithms. Selection of the right algorithm can happen based on ensembling techniques and by applying meta knowledge gathered from executing similar scenarios (datasets and algorithms).

Usage
Auto-sklearn is written in Python and can be considered a drop-in replacement for scikit-learn classifiers. Here is a sample set of commands:

>>> import autosklearn.classification
>>> cls = autosklearn.classification.AutoSklearnClassifier()
>>> cls.fit(X_train, y_train)
>>> predictions = cls.predict(X_test)
SMAC (Sequential Model-based Algorithm Configuration)
SMAC is a tool for automating certain AutoML steps. SMAC is useful for selecting key features, optimizing hyper-parameters, and speeding up algorithmic outputs.

BOHB (Bayesian Optimization HyperBand searches)
BOHB combines Bayesian hyper-parameter optimization with bandit methods for faster convergence.

Google and H2O also have their respective AutoML tools, which are not covered here but can be explored for specific cases.
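What these tools automate is easiest to see against the baseline they improve on: plain random search over a hyper-parameter space. The sketch below is illustrative; the `validation_score` function is a stand-in for actually training a model and measuring validation accuracy, and SMAC and BOHB replace the blind sampling with model-guided and bandit-based strategies.

```python
import random

random.seed(42)

def validation_score(learning_rate, n_layers):
    # Stand-in for "train a model with this configuration and measure
    # validation accuracy"; it peaks near learning_rate=0.01 with 3 layers.
    return 1.0 - abs(n_layers - 3) * 0.1 - abs(learning_rate - 0.01) * 5.0

best = None
for _ in range(50):
    config = {
        "learning_rate": 10 ** random.uniform(-4, 0),  # log-uniform sample
        "n_layers": random.randint(1, 6),
    }
    score = validation_score(**config)
    if best is None or score > best[0]:
        best = (score, config)

best_score, best_config = best
```

Each of the 50 "evaluations" here would be a full training run in practice, which is exactly why smarter search (Bayesian optimization, early stopping of poor configurations) pays off.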
AutoML needs significant memory and computational power to execute alternate algorithms and compute results. At present, GPU resources are extremely costly even for executing simple Machine Learning workloads, such as a CNN algorithm to classify objects. If multiple such alternate algorithms are to be executed, the compute dollars needed would grow exponentially. This is impractical, infeasible, and inefficient for the current state of the Data Science industry. Adoption of AutoML will depend on two things: one, the maturity of the AutoML pipeline, and two, more importantly, how quickly GPU clusters become cheap, the second being the most critical. Selling Cloud GPU capacity could be one of the motivations of several companies running cloud-based infrastructure to promote AutoML in the industry. Also, AutoML will not replace the Data Scientist's work, but it can provide augmentation and speed for certain tasks such as data standardization, model tuning, and trying multiple algorithms. It is only the beginning for AutoML, but this technique has high relevance and usefulness for solving ultra-complex problems.
Neural Architecture Search (NAS)

Neural Architecture Search (NAS) is a component of AutoML and addresses the important step of designing the Neural Network architecture.

Designing a fresh Neural Net architecture involves an expert establishing and organizing Neural Network layers, filters or channels, filter sizes, selecting other optimum hyper-parameters, and so on, through several rounds of computational iterations. Since the AlexNet deep neural network architecture won the ImageNet competition (image classification based on the ImageNet dataset) in 2012, several architecture styles such as VGG, ResNet, Inception, Xception, InceptionResNet, MobileNet, and NASNet have evolved significantly. However, selecting the right architecture for the right problem is also a skill, due to the presence of various influencers, such as applicability to the problem, accuracy, number of parameters, memory and computational footprint, and size of the architecture, that govern the overall functioning efficiency.

Neural Architecture Search tries to address this problem space by automatically selecting the right Neural Network architecture to solve a given problem.
Figure 13.0: AutoML, hyperparameter optimization, and NAS. Source: Liam Li, Ameet Talwalkar, What is neural architecture search?
Key Components of NAS

Search space: The search space provides the boundary within which the specific architecture needs to be searched. Computer Vision based use cases (captioning the scene or product identification) would need a different neural network architecture style than Speech (speech transcription or speaker classification) or unstructured Text (topic extraction, intent mining) based use cases. The search space tries to provide available catalogs of best in class architectures based on other domain data and performance. These are also usually hand crafted by expert data scientists.

Optimization method: This is responsible for providing the mechanism to search for the best architecture. Architectures could be searched and applied randomly, or by using a statistical or Machine Learning evaluation approach such as Bayesian methods or reinforcement learning methods.

Evaluation method: This has the role of evaluating the quality of the architecture considered by the optimization method. It could be done using a full training approach, or by doing partial training and then applying certain specialized methods such as early stopping, weight sharing, network morphism, etc.

For selective problem spaces, NAS has outperformed manual methods and is showing definite promise for the future. However, it is still evolving and not ready for production usage, as several architectures need to be established and evaluated depending on the problem space.
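The three components can be wired together in a toy way: a small hand-crafted search space, random sampling as the optimization method, and a stand-in evaluation function in place of actually (partially) training each candidate. Everything below is illustrative, including the scoring heuristic.

```python
import random

random.seed(7)

# Search space: a tiny catalog of architecture choices.
SEARCH_SPACE = {
    "n_conv_layers": [2, 4, 6],
    "filters": [16, 32, 64],
    "use_skip_connections": [False, True],
}

def evaluate(arch):
    # Evaluation method: a stand-in for (partial) training on a dataset.
    # Here, deeper nets with skip connections score best, minus a small
    # penalty for memory/compute footprint.
    score = arch["n_conv_layers"] * 0.1
    score += 0.2 if arch["use_skip_connections"] else 0.0
    score -= arch["filters"] * 0.001
    return score

def sample(space):
    # Optimization method: plain random search over the space.
    return {name: random.choice(options) for name, options in space.items()}

candidates = [sample(SEARCH_SPACE) for _ in range(20)]
best_arch = max(candidates, key=evaluate)
```

Real NAS systems differ only in scale and sophistication: the random `sample` becomes reinforcement learning, evolutionary search, or Bayesian optimization, and `evaluate` becomes partial training with tricks like weight sharing to keep the compute bill tractable.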
Figure 14.0: Components of NAS. Source: Liam Li, Ameet Talwalkar, What is neural architecture search?
Addressing H3 AI Trends at Infosys

In this paper, we looked at some key H3 AI areas; by no means is this an exhaustive list. Amongst all those discussed, Transfer Learning, Capsule Networks, Explainable AI, and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases, and integrating them into our product stack, Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do.
Trend: Use cases
1. Explainable AI (XAI): Applicable across use cases where results need to be traced, e.g., Tumor Detection, Mortgage Rejection, Candidate Selection, etc.
2. Generative AI / Neural Style Transfer (NST): Art Generation, Sketch Generation, Image or Video Resolution Improvements, Data Generation/Augmentation, Music Generation
3. Fine Grained Classification: Vehicle Classification, Type of Tumor Detection
4. Capsule Networks: Image Re-construction, Image Comparison/Matching
5. Meta Learning: Intelligent Agents, Continuous Learning scenarios for document review and corrections
6. Transfer Learning: Identifying a person not wearing a helmet, Logo/brand detection in images, Speech model training for various accents and vocabularies
7. Single Shot Learning: Face Recognition, Face Verification
8. Deep Reinforcement Learning (RL): Intelligent Agents, Robots, Driverless cars, Traffic Light Monitoring, Continuous Learning scenarios for document review and corrections
9. Auto ML: Invoice Attribute Extraction, Document Classification, Document Clustering
10. Neural Architecture Search (NAS): CNN or RNN based use cases such as Image Classification, Object Identification, Image Segmentation, Speaker Classification, etc.
Table 2.0: AI Use cases. Source: Infosys Research
1 Explainable AI (XAI)
2 Fine Grained Classification
5 Transfer Learning
6 Single Shot Learning
3 Capsule Networks
4 Meta Learning
7 Deep Reinforcement Learning (RL)
8 Auto ML
bull httpschristophmgithubiointerpretable-ml-book
bull httpssimmachinescomexplainable-ai
bull httpswwwcmuedunewsstoriesarchives2018octoberexplainable-aihtml
bull httpsmediumcomQuantumBlackmaking-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
bull httpstowardsdatasciencecomexplainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739
bull httpsvisioncornelleduse3wp-contentuploads201502BMVC14pdf
bull httpswwwfastai20180723auto-ml-3
bull httpsarxivorgpdf160305106pdf
bull httpsarxivorgpdf171009829pdf
bull httpskerasioexamplescifar10_cnn_capsule
bull httpswwwyoutubecomwatchv=pPN8d0E3900
bull httpswwwyoutubecomwatchv=rTawFwUvnLE
bull httpsmediumfreecodecamporgunderstanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc
bull httpsmediumcomjrodthoughtswhats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
bull httpproceedingsmlrpressv48santoro16pdf
bull httpstowardsdatasciencecomwhats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660
bull httpsdeepmindcomblogarticledeep-reinforcement-learning
bull httpsmediumfreecodecamporgan-introduction-to-reinforcement-learning-4339519de419
bull httpsmediumcomjonathan_huialphago-zero-a-game-changer-14ef6e45eba5
bull httpsarxivorgpdf181112560pdf
bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac
bull httpswwwfastai20180723auto-ml-3
bull httpswwwfastai20180716auto-ml2auto-ml
bull httpscompetitionscodalaborgcompetitions17767
bull httpswwwautomlorgautomlauto-sklearn
bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac
bull httpsautomlgithubioHpBandSterbuildhtmloptimizersbohbhtml
Reference
copy 2019 Infosys Limited Bengaluru India All Rights Reserved Infosys believes the information in this document is accurate as of its publication date such information is subject to change without notice Infosys acknowledges the proprietary rights of other companies to the trademarks product names and such other intellectual property rights mentioned in this document Except as expressly permitted neither this documentation nor any part of it may be reproduced stored in a retrieval system or transmitted in any form or by any means electronic mechanical printing photocopying recording or otherwise without the prior permission of Infosys Limited and or any named intellectual property rights holders under this document
For more information, contact askus@infosys.com
Infosys.com | NYSE: INFY | Stay Connected
9 Neural Architecture Search (NAS)
10 Infosys Enterprise Cognitive Platform
bull httpswwworeillycomideaswhat-is-neural-architecture-search
bull httpswwwinfosyscomservicesincubating-emerging-technologiesofferingsPagesenterprise-cognitive-platformaspx
About the author

Sudhanshu Hate is the inventor and architect of Infosys Enterprise Cognitive Platform (iECP), a microservices API based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and working with clients on industry problems. His current areas of interest are Computer Vision, Speech, and unstructured Text based AI possibilities.

To know more about our work on the H3 trends in AI, write to icets@infosys.com
Fine Grained Classification

Classification of an object into specific categories, such as car, table, flower, and such, is common in Computer Vision. However, establishing the object's finer class based on specific characteristics is where AI is making rapid progress. This is because granular features of objects are being trained on and used for differentiation of objects.

Examples of Fine Grained Classification are:
• Fine grained clothing style finder, type of a shoe, etc.
• Recognizing a car type
• Recognizing the breed of a dog, or a plant species, insect, bird species, etc.

However, fine-grained classification is challenging due to the difficulty of finding discriminative features. Finding those subtle traits that fully characterize the object is not straightforward.

Fine Grained Classification Approaches
• Feature representations that better preserve fine-grained information
• Segmentation-based approaches that facilitate extraction of purer features and part/pose normalized feature spaces
• Pose Normalization Schemes

In Fine Grained Classification, the progression through the 8-layer CNN network can be thought of as a progression from low to mid to high-level features. The later layers aggregate more complex structural information across larger scales: sequences of convolutional layers interleaved with max-pooling can capture deformable parts, and fully connected layers can capture complex co-occurrence statistics.

Bird recognition is one of the major examples of fine grained classification. In the below image, given a test image, groups of detected key points are used to compute multiple warped image regions that are aligned with prototypical models. Each region is fed through a deep convolutional network and features are extracted from multiple layers, after which they are concatenated and fed to a classifier.
Figure 5.0: Bird Recognition Pipeline Overview. Source: Branson, Van Horn et al., Bird Species Categorization
Car Detection System using Fine Grained Classification

The below pictures and steps depict a fine grained classification approach for a car detection system:
a. Detects parts using a collection of unsupervised part detectors.
b. Outputs a grid of discriminative features. (The CNN is learned with class labels and then truncated, retaining the first two convolutional layers that retain spatial information.) The appearance of each part detected using the learned CNN features is described by pooling in the detected region of each part.
c. The appearance of any undetected part is set to zero. This results in the Ensemble of Localized Learned Features (ELLF) representation, which is then used to predict fine-grained object categories.
d. A standard CNN passes the output of the convolutional layers through several fully connected layers in order to make a prediction.
Capsule Network

Convolutional Networks are so far the de facto and well accepted algorithms to work with image based datasets. They work on the pixels of images using filters (channels) of various sizes, convolving and using pooling techniques to bubble the stronger features up, to derive colors, textures, edges, and shapes, and establish structures through the lower to the highest layers.

Given the face of a person, a CNN identifies the face by establishing the eyes, ears, eyebrows, lips, chin, and other components of the face. However, if the facial image is provided with incorrect positions and alignment of the eyes and eyebrows, or say the eyebrows swap with the lips and the ears are placed on the forehead, the same CNN trained algorithm would still go on and detect this as a human face. This is a huge drawback of the CNN algorithm, and it happens due to its inability to store information on the relative position of various objects.

Capsule Network, invented by Geoffrey Hinton, addresses exactly this problem of CNN by storing the spatial relationships of various parts.

Capsule Networks, like CNNs, are multi layered neural networks consisting of several capsules, and each capsule consists of several neurons. Capsules in lower layers are called primary capsules and are trained to detect an object (e.g., triangle, circle) within a given region of the image. Each outputs a vector that has two properties: length and orientation. Length represents the probability of the presence of the object, and orientation represents the pose parameters of the object, such as coordinates, rotation angle, etc.

Capsules in higher layers, called routing capsules, detect larger and more complex objects such as eyes, ears, etc.
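The "length encodes probability" idea relies on the squashing nonlinearity from the CapsNet paper, which shrinks a capsule's raw output vector to a length between 0 and 1 while preserving its orientation. A small numpy sketch of that function:

```python
import numpy as np

def squash(s, eps=1e-9):
    # v = (|s|^2 / (1 + |s|^2)) * (s / |s|)
    # Long input vectors map to length near 1 (object very likely present),
    # short ones to length near 0, and orientation (pose) is preserved.
    norm_sq = np.sum(s ** 2)
    norm = np.sqrt(norm_sq) + eps
    return (norm_sq / (1.0 + norm_sq)) * (s / norm)

raw = np.array([3.0, 4.0])    # raw capsule output, length 5
v = squash(raw)
length = np.linalg.norm(v)    # probability-like value in (0, 1)
```

For the length-5 input above the squashed length is 25/26, close to 1, while the direction (the pose information CNN pooling throws away) is unchanged.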
Figure 6.0: Car Detection System. Source: Learning Features and Parts for Fine-Grained Recognition
Routing by Agreement

Unlike CNN, which primarily bubbles up higher order features using max or average pooling, Capsule Network bubbles up features using routing by agreement, where every capsule participates in choosing the shape by voting (in a democratic election way).

In the figure given above:
• The lower level corresponds to rectangles, triangles, and circles
• The higher level corresponds to houses, boats, and cars

If there is an image of a house, the capsules corresponding to rectangles and triangles will have large activation vectors. Their relative positions (coded in their instantiation parameters) will bet on the presence of high-level objects. Since they will agree on the presence of a house, the output vector of the house capsule will become large. This in turn will make the predictions by the rectangle and the triangle capsules larger. This cycle will repeat 4-5 times, after which the bets on the presence of a house will be considerably larger than the bets on the presence of a boat or a car.

Advantages over CNN
• Less data for training: Capsule Networks need very little data for training (almost 10 percent of what a CNN needs) as compared to CNN.
• Fewer parameters: The connections between layers require fewer parameters, as a capsule groups neurons, resulting in relatively less computation bandwidth.
• Preserve pose and position: They preserve pose and position information, as against CNN.
• High accuracy: Capsule Networks have higher accuracy as compared to CNNs.
• Reconstruction vs mere classification: CNN helps you classify images but not reconstruct the same image, whereas Capsule Networks help you reconstruct the exact image.
• Information retention vs loss: With CNN, during edge detection, the kernel works only at a specific angle, and each angle requires a corresponding kernel. When dealing with edges, CNN works well, because there are very few ways to describe an edge. Once we get up to the level of shapes, we do not want to have a kernel for every angle of rectangles, ovals, triangles, and so on. It would get unwieldy, and would become even worse when dealing with more complicated shapes that have 3-dimensional rotations and features like lighting; this is the reason why traditional neural nets do not handle unseen rotations effectively.

Capsule Networks are best suited for object detection and image segmentation, while helping to better model hierarchical relationships, and they provide high accuracy. However, Capsule Networks are still under research, relatively new, and mostly tested and benchmarked on the MNIST dataset, but they will be the future in working with massive use cases emerging from Vision datasets.
Figure 7.0: A simple CapsNet with 3 layers. This model gives comparable results to deep convolutional networks. Source: Dynamic Routing Between Capsules, Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton
Figure 8.0: Capsule Network for House or Boat classification. Source: Beginners' Guide to Capsule Networks
External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited
Traditional methods of learning in Machine
Learning focuses on taking a huge labeled
dataset and then learning to detect y
(dependent variable say classifying an
image as cat or dog) and given set of x
(independent variables images of cats
and dogs) This process involves selection
of an algorithm such as Convolution
Neural Net and arriving at various hyper
parameters such as number of layers
in the network number of neurons in
each layer learning rate weights bias
dropouts activation function to activate
the neuron such as sigmoid tanh and
Relu The learning happens through
several iterations of forward and backward
passes (propagation) by readjusting (also
called learning) the weights based on
difference in the loss (actual vs computed)
At the minimal loss the weights and
other network parameters are frozen
and are considered final model for future
prediction tasks This is obviously a long
and tedious process and repeating this for
every use case or task is engineering data
and compute intensive
Meta Learning

Meta Learning focuses on how to learn to learn. It is one of the fascinating disciplines of artificial intelligence. Human beings have varying styles of learning: some people learn and memorize with one instance of a visual or auditory scan; some need multiple perspectives to strengthen the neural connections for permanent memory; some remember by writing, while others remember through actual experiences. Meta Learning tries to leverage these traits to build its learning characteristics.

Like the variety in human learning techniques, Meta Learning also uses various learning methods based on patterns of problems, such as those based on boundary space or amount of data, by optimizing the size of the neural network, or by using a recurrent network approach. Each of these is briefly discussed below.

Types of Meta-Learning Models

Few Shots Meta-Learning

This learning technique focuses on learning from a few instances of data. Typically, Neural Nets need millions of data points to learn; however, Few Shots Meta-Learning uses only a few instances of data to build models. An example is facial recognition systems using Single Shot Learning; this is explained in detail in the Single Shot Learning section.

Optimizer Meta-Learning

In this method, the emphasis is on optimizing the neural network and its hyper-parameters. A great example of optimizer meta-learning is models that are focused on improving gradient descent techniques.

Metric Meta-Learning

In this learning method, the metric space is narrowed down to improve the focus of learning. The learning is then carried out only in this metric space, by leveraging various optimization parameters that are established for the given metric space.

Recurrent Model Meta-Learning

This type of meta-learning model is tailored to Recurrent Neural Networks (RNNs), such as Long Short-Term Memory (LSTM). In this architecture, the meta-learner algorithm trains an RNN model to process a dataset sequentially and then process new inputs from the task. In an image classification setting, this might involve passing in the set of (image, label) pairs of a dataset sequentially, followed by new examples which must be classified. Meta-Reinforcement Learning is an example of this approach.

Transfer Learning (TL)

Some humans can learn from their own existing experiences or from experiences they have heard, seen, or observed. The Transfer Learning discipline of AI is based on similar traits of human learning, where new models can learn from and benefit from an existing trained model.

For example, if a Computer Vision based detection model with no Transfer Learning, one that already detects various types of vehicles such as cars, trucks, and bicycles, needs to be trained to detect an airplane, then you may have to retrain the full model with images of all the previous objects. However, with Transfer Learning, you can introduce an additional layer on top of the existing pre-trained layers to start detecting airplanes.

Typically, in a no Transfer Learning scenario, the model needs to be trained from scratch, and during training the right weights are arrived at by doing many iterations (epochs) of forward and back propagation, which takes a significant amount of computation power and time. In addition, Vision models need a significant amount of image data, in this example images of airplanes, to be trained.

With the Transfer Learning approach, you can reuse the existing pre-trained weights of an existing trained model with a significantly smaller number of images (5 to 10 percent of the images needed for training a ground-up model) for the model to start detecting. As the pre-trained model has already learnt some basic features for identifying edges, curves, and shapes in the earlier layers, it needs to learn only the higher order features specific to airplanes, starting from the existing computed weights. In brief, Transfer Learning helps eliminate the need to learn everything from scratch.
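A minimal sketch of this idea, assuming a toy stand-in for the pre-trained layers (a fixed random projection rather than a real vision model): only the new head's weights are updated, while the "pre-trained" weights stay frozen.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a pre-trained model's earlier layers: a FIXED (frozen) projection.
# Purely illustrative -- in practice these weights come from a trained vision model.
W_frozen = rng.normal(size=(8, 16))

def features(x):
    # Frozen layers: in a real model these detect edges, curves, and shapes.
    return np.tanh(x @ W_frozen)

# Small dataset for the new task (e.g. "airplane vs not"): transfer learning
# needs far fewer examples than training a model ground up.
X = rng.normal(size=(40, 8))
y = (features(X) @ rng.normal(size=16) > 0).astype(float)

# Train ONLY the new head added on top of the frozen features.
w_head, b_head = np.zeros(16), 0.0
for _ in range(1000):
    z = np.clip(features(X) @ w_head + b_head, -30, 30)
    p = 1.0 / (1.0 + np.exp(-z))
    w_head -= 0.5 * features(X).T @ (p - y) / len(y)
    b_head -= 0.5 * np.mean(p - y)

accuracy = np.mean(((features(X) @ w_head + b_head) > 0) == y)
```

Because `W_frozen` is never updated, only the small head is learned, which is the source of the data and compute savings described above.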
Transfer Learning helps in saving a significant amount of data, computational power, and time in training new models, as they leverage pre-trained weights from existing trained models and architectures. However, it is important to understand that the Transfer Learning approach today is only mature enough to be applied to similar use cases; that is, you cannot use the above discussed vehicle detection model to train a facial recognition model.

Another key consideration during Transfer Learning is to understand the details of the data on which the underlying models were trained, as Transfer Learning can implicitly push the built-in biases from that data into newer systems. It is recommended that the datasheets of the underlying models and data be studied thoroughly unless the usage is for experimentation purposes.

Having earlier used the human brain rationale, it is important to note that human brains have gone through centuries of experiences and gene evolution and so have the ability to learn faster, whereas transfer learning is just a few decades old and is becoming the ground for new vision and text use cases.
Figure 9.0: Transfer Learning Layers. Source: John Cherrie, Training Deep Learning Models with Transfer Learning
Single Shot Learning

Humans have the impressive skill to reason about new concepts and experiences from just a single example. They have the ability for one-shot generalization: the aptitude to encounter a new concept, understand its structure, and then generate compelling alternative variations of the same.

Facial recognition systems are good candidates for Single Shot Learning; otherwise, needing tens of thousands of individual face images to train one neural network can be extremely costly, time consuming, and infeasible. However, a Single Shot Learning based system, using an existing pre-trained FaceNet model with a facial encoding based approach on top of it, can be very effective at establishing face similarity by computing the distance between faces.

In this approach, a 128-dimensional encoding of each face image is generated and compared with another image's encoding to determine if the person is the same or different. Various distance based algorithms, such as Euclidean distance, can be used to determine if the encodings are within a specified threshold. The model training approach involves creating pairs of (Anchor, Positive) and (Anchor, Negative), and training the model in a way where the (Anchor, Positive) pair distance is smaller and the (Anchor, Negative) distance is larger.

"Anchor" is the image of the person for whom the recognition model needs to be trained.
"Positive" is another image of the same person.
"Negative" is an image of a different person.
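The encoding-and-distance approach can be sketched as follows; the 0.6 threshold and the simulated encodings are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def face_distance(enc_a, enc_b):
    # Euclidean distance between two 128-dimensional face encodings.
    return float(np.linalg.norm(enc_a - enc_b))

def same_person(enc_a, enc_b, threshold=0.6):
    # Hypothetical threshold; in practice it is tuned per model and dataset.
    return face_distance(enc_a, enc_b) < threshold

# Simulated encodings: Anchor and Positive are two near-identical points
# (same person), Negative is an unrelated point (different person).
anchor = rng.normal(size=128)
positive = anchor + rng.normal(scale=0.01, size=128)
negative = rng.normal(size=128)

# The training objective enforces exactly this triplet property:
d_pos = face_distance(anchor, positive)   # small: same person
d_neg = face_distance(anchor, negative)   # large: different person
```

In a real system the encodings would come from a trained model such as FaceNet rather than from simulated vectors.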
Figure 10.0: Encoding approach, inspired from the ML Course from Coursera
Deep Reinforcement Learning (RL)

This is a specialized Machine Learning discipline where an agent learns to behave in an environment by getting a reward or punishment for the actions it performs. The agent can have an objective to maximize short-term or long-term rewards. This discipline uses deep learning techniques to bring in human-level performance on the given task.

Deep Reinforcement Learning has found significant relevance and application in various game design systems, such as creating video games, chess, AlphaGo, and Atari, as well as in industrial applications such as robots, driverless cars, etc.

Three Approaches to Reinforcement Learning

Value Based

In value-based RL, the goal is to optimize the value function V(s). The value of each state is the total amount of reward an agent can expect to accumulate over the future, starting at that state. The agent uses this value function to select which state to choose at each step. The Q-table uses a mathematical function to arrive at a state based on an action.

Policy Based

In policy-based RL, we want to directly optimize the policy function π(s) without using a value function. The policy is what defines the agent's behavior at a given time: the policy p controls what action we should take, while the value function v measures how good it is to be in a particular state, telling us the maximum expected future reward the agent will get at each state.

There are two types of policies:
1. Deterministic: a policy which, at a given state, will always return the same action.
2. Stochastic: a policy that outputs a probability distribution over actions.

Value based and Policy based are the more conventional Reinforcement Learning approaches. They are useful for modeling relatively simple systems. As the figure below illustrates, Q-learning maps a state and an action to a Q-value through a Q-table, whereas Deep Q-learning feeds the state to a Deep Q neural network that outputs a Q-value (expected discounted reward) for each possible action; the chosen action then follows the policy: action = policy(state).
Figure 11.0: Schema inspired by the Q-learning notebook by Udacity
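The tabular Q-learning scheme sketched in the figure above can be illustrated in code; the four-state chain environment, learning rate, and discount factor here are illustrative choices, not from the paper.

```python
import numpy as np

# Toy deterministic chain: states 0..3; moving right from state 2 reaches the
# goal state 3, which pays a reward of 1. (Environment is illustrative.)
n_states, n_actions = 4, 2          # actions: 0 = left, 1 = right

def step(state, action):
    nxt = min(state + 1, 3) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == 3 else 0.0)

Q = np.zeros((n_states, n_actions))  # the Q-table: rows = states, cols = actions
alpha, gamma = 0.5, 0.9              # learning rate and discount factor
rng = np.random.default_rng(3)

for _ in range(500):                 # episodes
    s = 0
    for _ in range(10):              # steps per episode, explored randomly
        a = int(rng.integers(n_actions))
        s2, r = step(s, a)
        # Q-learning update: move Q[s, a] toward the discounted expected reward.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

policy = Q.argmax(axis=1)            # action = policy(state), greedy w.r.t. Q
```

Deep Q-learning replaces the table `Q` with a neural network that maps a state to one Q-value per action.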
Model Based

In model-based RL, we model the environment. This means we create a model of the behavior of the environment, and this model is then used to arrive at results that maximise short-term or long-term rewards. The model equation can be any equation that is defined based on the environment's behavior, and it must be sufficiently generalized to counter new situations.

When the model based approach uses Deep Neural Network algorithms to generalize sufficiently well and learn the complexities of the environment to produce optimal results, it is called Deep Reinforcement Learning. The challenge with the model based approach is that each environment needs a dedicated trained model.
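A toy sketch of the idea, with a hypothetical model equation standing in for the learned environment model; the transition rule, rewards, and action set are illustrative assumptions.

```python
# The model is a hypothetical equation predicting the environment's response;
# the agent plans against this model instead of the real environment.

def model(state, action):
    next_state = (state + action) % 5     # illustrative transition equation
    reward = 1.0 if next_state == 0 else -0.1
    return next_state, reward

def plan(state):
    # Consult the model to pick the action with the best predicted reward,
    # i.e. use the model to maximize the short-term reward.
    return max((1, 2, 3), key=lambda a: model(state, a)[1])
```

A dedicated `model` function like this would have to be learned per environment, which is the challenge noted above.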
AlphaGo was trained by using data from several games to beat human players in the game of Go. The training accuracy was just 57 percent, and still it was sufficient to beat human-level performance. The training methods involved reinforcement learning and deep learning to build a policy network that tells which moves are promising, and a value network that tells how good the board position is. The search for the final move from these networks is done using the Monte Carlo Tree Search (MCTS) algorithm. Using supervised learning, a policy network was created to imitate expert moves.

DeepMind released AlphaGo Zero in late 2017, which beat AlphaGo and did not involve any training from previous games data. The deep network training was done by picking training samples from AlphaGo and AlphaGo Zero playing games against themselves, selecting the best moves to train the network, and then applying those in real games to improve the results iteratively. This is possible because deep reinforcement learning algorithms can store long-range tree search results for the next best move in memory and do very large computations that are difficult for a human brain.
Auto ML (AML)

Designing a machine learning solution involves several steps, such as collecting data; understanding, cleansing, and normalizing data; doing feature engineering; selecting or designing the algorithm; selecting the model architecture; selecting and tuning the model's hyper-parameters; evaluating the model's performance; deploying and monitoring the machine learning system in an online system; and so on. Such machine learning solution design requires an expert Data Scientist to complete the pipeline.

As the complexity of these and other tasks can easily get overwhelming, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The AI research area that encompasses progressive automation of machine learning pipeline tasks is called AutoML (Automatic Machine Learning).

Google CEO Sundar Pichai wrote, "Designing neural nets is extremely time intensive and requires an expertise that limits its use to a smaller community of scientists and engineers. That's why we've created an approach called AutoML, showing that it's possible for neural nets to design neural nets", while Google's Head of AI Jeff Dean suggested that 100x computational power could replace the need for machine learning expertise.
AutoML Vision relies on two core techniques: transfer learning and neural architecture search.
Figure 12.0: An example of an Auto-sklearn pipeline. Source: André Biedenkapp, We did it Again: World Champions in AutoML
Implementing AutoML

Here is a look at a few libraries that help in implementing AutoML.

AUTO-SKLEARN

Auto-sklearn automates several key tasks in the Machine Learning pipeline, such as addressing column missing values, encoding of categorical values, data scaling and normalization, feature pre-processing, and selection of the right algorithm with hyper-parameters. The pipeline supports 15 classification and 14 feature processing algorithms. Selection of the right algorithm can happen based on ensembling techniques and by applying meta knowledge gathered from executing similar scenarios (datasets and algorithms).

Usage

Auto-sklearn is written in Python and can be considered a drop-in replacement for scikit-learn classifiers. Here is a sample set of commands:

>>> import autosklearn.classification
>>> cls = autosklearn.classification.AutoSklearnClassifier()
>>> cls.fit(X_train, y_train)
>>> predictions = cls.predict(X_test)
SMAC (Sequential Model-Based Algorithm Configuration)

SMAC is a tool for automating certain AutoML steps. SMAC is useful for selection of key features, hyper-parameter optimization, and speeding up algorithmic outputs.
BOHB (Bayesian Optimization Hyperband searches)

BOHB combines Bayesian hyper-parameter optimization with bandit methods for faster convergence.

Google and H2O also have their respective AutoML tools, which are not covered here but can be explored in specific cases.
AutoML needs significant memory and computational power to execute alternate algorithms and compute results. At present, GPU resources are extremely costly even for executing simple Machine Learning workloads, such as a CNN algorithm to classify objects. If multiple such alternate algorithms must be executed, the computation cost multiplies, which is impractical, infeasible, and inefficient for the current state of the Data Science industry. Adoption of AutoML will depend on two things: one, the maturity of the AutoML pipeline, and second, but more important, how quickly GPU clusters become cheap, the second being most critical. Selling Cloud GPU capacity could be one of the motivations for several cloud based infrastructure companies to promote AutoML in the industry. Also, AutoML will not replace the Data Scientist's work, but it can provide augmentation and speed to certain tasks, such as data standardization, model tuning, and trying multiple algorithms. It is only the beginning for AutoML, but this technique has high relevance and usefulness for solving ultra-complex problems.
Neural Architecture Search (NAS)

Neural Architecture Search (NAS) is a component of AutoML and addresses the important step of designing the Neural Network architecture.

Designing a fresh Neural Net architecture involves an expert establishing and organizing Neural Network layers, filters or channels, and filter sizes, and selecting other optimum hyper-parameters, through several rounds of computational iterations. Since the AlexNet deep neural network architecture won the ImageNet competition (image classification based on the ImageNet dataset) in 2012, several architecture styles, such as VGG, ResNet, Inception, Xception, InceptionResNet, MobileNet, and NASNet, have significantly evolved. However, selection of the right architecture for the right problem is also a skill, due to the presence of various influencers, such as applicability to the problem, accuracy, number of parameters, memory and computational footprint, and size of the architecture, that govern the overall functioning efficiency.

Neural Architecture Search tries to address this problem space by automatically selecting the right Neural Network architecture to solve a given problem.
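The core loop of NAS can be sketched minimally: enumerate a search space, score each candidate with an evaluation method, and keep the best. The search space and scoring function below are hypothetical stand-ins, not from the paper.

```python
import itertools

# Hand-defined search space of architecture hyper-parameters (illustrative;
# real NAS search spaces describe layers, cells, and their connections).
search_space = {
    "layers": [2, 4, 8],
    "filters": [16, 32, 64],
    "kernel": [3, 5],
}

def evaluate(arch):
    # Stand-in evaluation method: a real NAS run trains the candidate network
    # (fully or partially) and returns its validation accuracy.
    return 0.5 + 0.04 * arch["layers"] + 0.001 * arch["filters"] - 0.01 * arch["kernel"]

# Optimization method: exhaustive search over this tiny space; real systems
# use random search, Bayesian methods, or reinforcement learning instead.
candidates = [dict(zip(search_space, combo))
              for combo in itertools.product(*search_space.values())]
best = max(candidates, key=evaluate)
```

Swapping the exhaustive loop for a smarter optimization method, and the scoring stub for partial training or early stopping, yields the component structure discussed below.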
Figure 13.0: NAS as a component of AutoML, alongside hyperparameter optimization. Source: Liam Li and Ameet Talwalkar, What is neural architecture search?
Key Components of NAS

Search space: The search space provides the boundary within which the specific architecture needs to be searched. Computer Vision based use cases (captioning the scene or product identification) would need a different neural network architecture style as against Speech (speech transcription or speaker classification) or unstructured Text (topic extraction, intent mining) based use cases. The search space tries to provide available catalogs of best in class architectures based on other domain data and performance. These are also usually hand crafted by expert data scientists.

Optimization method: This is responsible for providing the mechanism to search for the best architecture. Architectures could be searched and applied randomly, or by using a statistical or Machine Learning evaluation approach, such as a Bayesian method or reinforcement learning methods.

Evaluation method: This has the role of evaluating the quality of the architecture considered by the optimization method. It could be done using a full training approach, or by doing partial training and then applying certain specialized methods, such as early stopping, weight sharing, network morphism, etc.
For selective problem spaces, NAS has outperformed manual methods and is showing definite promise for the future. However, it is still evolving and not ready for production usage, as several architectures need to be established and evaluated depending on the problem space.
Figure 14.0: Components of NAS: search space (DAG representation, cell block, meta-architecture, NAS-specific), optimization method (reinforcement learning, evolutionary search, gradient-based optimization, Bayesian optimization), and evaluation method (full training, partial training, weight-sharing, network morphism, hypernetworks). Source: Liam Li and Ameet Talwalkar, What is neural architecture search?
Addressing H3 AI Trends at Infosys

In this paper we looked at some key H3 AI areas; by no means is this an exhaustive list. Amongst all those discussed, Transfer Learning, Capsule Networks, Explainable AI, and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases, and integrating them into our product stack, Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do.
1. Explainable AI (XAI): Applicable across use cases where results need to be traced, e.g., Tumor Detection, Mortgage Rejection, Candidate Selection, etc.
2. Generative AI / Neural Style Transfer (NST): Art Generation, Sketch Generation, Image or Video Resolution Improvements, Data Generation/Augmentation, Music Generation
3. Fine Grained Classification: Vehicle Classification, Type of Tumor Detection
4. Capsule Networks: Image Re-construction, Image Comparison/Matching
5. Meta Learning: Intelligent Agents, Continuous Learning scenarios for document review and corrections
6. Transfer Learning: Identifying a person not wearing a helmet, Logo/brand detection in images, Speech Model training for various accents and vocabularies
7. Single Shot Learning: Face Recognition, Face Verification
8. Deep Reinforcement Learning (RL): Intelligent Agents, Robots, Driverless cars, Traffic Light Monitoring, Continuous Learning scenarios for document review and corrections
9. Auto ML: Invoice Attribute Extraction, Document Classification, Document Clustering
10. Neural Architecture Search (NAS): CNN or RNN based use cases, such as Image Classification, Object Identification, Image Segmentation, Speaker Classification, etc.
Table 2.0: AI Use cases. Source: Infosys Research
Reference

1. Explainable AI (XAI)
• https://christophm.github.io/interpretable-ml-book
• https://simmachines.com/explainable-ai
• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html
• https://medium.com/QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

2. Fine Grained Classification
• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf

3. Capsule Networks
• https://arxiv.org/pdf/1710.09829.pdf
• https://keras.io/examples/cifar10_cnn_capsule
• https://www.youtube.com/watch?v=pPN8d0E3900
• https://www.youtube.com/watch?v=rTawFwUvnLE
• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

4. Meta Learning
• https://medium.com/jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
• http://proceedings.mlr.press/v48/santoro16.pdf
• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

5. Transfer Learning
• https://www.fast.ai/2018/07/23/auto-ml-3/

6. Single Shot Learning
• https://arxiv.org/pdf/1603.05106.pdf

7. Deep Reinforcement Learning (RL)
• https://deepmind.com/blog/article/deep-reinforcement-learning
• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
• https://medium.com/jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5
• https://arxiv.org/pdf/1811.12560.pdf

8. Auto ML
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac/
• https://www.fast.ai/2018/07/23/auto-ml-3/
• https://www.fast.ai/2018/07/16/auto-ml2/
• https://competitions.codalab.org/competitions/17767
• https://www.automl.org/automl/auto-sklearn/
• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html

9. Neural Architecture Search (NAS)
• https://www.oreilly.com/ideas/what-is-neural-architecture-search

10. Infosys Enterprise Cognitive Platform
• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx

© 2019 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.
For more information, contact askus@infosys.com
Infosys.com | NYSE: INFY | Stay Connected
About the author

Sudhanshu Hate is the inventor and architect of Infosys Enterprise Cognitive Platform (iECP), a microservices API based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and working with clients on industry problems. His current areas of interest are Computer Vision, Speech, and unstructured Text based AI possibilities.

To know more about our work on the H3 trends in AI, write to icets@infosys.com
Car Detection System using Fine Grained Classification

The below pictures and steps depict a fine-grained classification approach for a car detection system:

a. Detects parts using a collection of unsupervised part detectors.
b. Outputs a grid of discriminative features. (The CNN is learned with class labels and then truncated, retaining the first two convolutional layers that preserve spatial information.) The appearance of each part detected using the learned CNN features is described by pooling in the detected region of each part.
c. The appearance of any undetected part is set to zero. This results in an Ensemble of Localized Learned Features (ELLF) representation, which is then used to predict fine-grained object categories.
d. A standard CNN passes the output of the convolutional layers through several fully connected layers in order to make a prediction.

Figure 6.0: Car Detection System. Source: Learning Features and Parts for Fine-Grained Recognition

Capsule Network

Convolutional Networks are so far the de facto and well-accepted algorithms to work with image based datasets. They work on the pixels of images using filters (channels) of various sizes, convolving and using pooling techniques to bubble the stronger features up to derive colors, textures, edges, and shapes, and establish structures from the lowest to the highest layers.

Given the face of a person, a CNN identifies the face by establishing the eyes, ears, eyebrows, lips, chin, and other components of the face. However, if the facial image is provided with incorrect position and alignment of eyes and eyebrows, or say the eyebrows swap with the lips and the ears are placed on the forehead, the same trained CNN algorithm would still go on and detect this as a human face. This is a huge drawback of the CNN algorithm and happens due to its inability to store information on the relative position of various objects.

Capsule Network, invented by Geoffrey Hinton, addresses exactly this problem of CNN by storing the spatial relationships of various parts.

Capsule Networks, like CNNs, are multi-layered neural networks consisting of several capsules, where each capsule consists of several neurons. Capsules in lower layers are called primary capsules and are trained to detect an object (e.g., a triangle or circle) within a given region of the image. Each outputs a vector that has two properties: length and orientation. Length represents the probability of the presence of the object, and orientation represents the pose parameters of the object, such as coordinates, rotation angle, etc.

Capsules in higher layers, called routing capsules, detect larger and more complex objects, such as eyes, ears, etc.
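The length/orientation behavior of a capsule's output vector can be sketched with the squash nonlinearity commonly used in capsule networks; the input vector here is illustrative.

```python
import numpy as np

def squash(v, eps=1e-9):
    # Capsule activation: preserves the vector's orientation (pose) while
    # scaling its length into [0, 1) so length can act as a probability.
    norm = np.linalg.norm(v)
    return (norm**2 / (1.0 + norm**2)) * v / (norm + eps)

# Raw output vector of a primary capsule (pose parameters of, say, a triangle).
raw = np.array([3.0, 4.0])                    # length 5.0
out = squash(raw)

presence_probability = np.linalg.norm(out)    # length: probability object exists
orientation = out / np.linalg.norm(out)       # direction: pose is preserved
```

Note that the orientation of `out` is the same as that of `raw`; only the length is compressed.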
Routing by Agreement

Unlike CNN, which primarily bubbles up higher order features using max or average pooling, Capsule Network bubbles up features using routing by agreement, where every capsule participates in choosing the shape by voting (in a democratic election way).

In the figure given below:
• The lower level corresponds to rectangles, triangles, and circles.
• The higher level corresponds to houses, boats, and cars.

If there is an image of a house, the capsules corresponding to rectangles and triangles will have large activation vectors. Their relative positions (coded in their instantiation parameters) will bet on the presence of high-level objects. Since they will agree on the presence of a house, the output vector of the house capsule will become large. This in turn will make the predictions by the rectangle and the triangle capsules larger. This cycle will repeat 4-5 times, after which the bets on the presence of a house will be considerably larger than the bets on the presence of a boat or a car.

Advantage over CNN

• Less data for training: Capsule Networks need very little data for training (almost 10 percent of what a CNN needs).
• Fewer parameters: The connections between layers require fewer parameters, as a capsule groups neurons, resulting in relatively less computation bandwidth.
• Preserve pose and position: They preserve pose and position information, as against CNN.
• High accuracy: Capsule Networks have higher accuracy as compared to CNNs.
• Reconstruction vs mere classification: CNN helps you classify images but not reconstruct the same image, whereas Capsule Networks help you reconstruct the exact image.
• Information retention vs loss: With CNN, during edge detection, a kernel works only on a specific angle, and each angle requires a corresponding kernel. When dealing with edges, CNN works well because there are very few ways to describe an edge. Once we get up to the level of shapes, we do not want to have a kernel for every angle of rectangles, ovals, triangles, and so on. It would get unwieldy, and would become even worse when dealing with more complicated shapes that have 3-dimensional rotations and features like lighting, which is the reason why traditional neural nets do not handle unseen rotations effectively.
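The voting cycle described above can be sketched as a simplified routing-by-agreement loop; the vote vectors are hand-picked so that both lower capsules agree on "house", and the squash nonlinearity is omitted for brevity.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Votes from two lower capsules (rectangle, triangle) for two higher capsules
# (house, boat), as 2-D pose vectors. Both lower capsules agree on "house".
votes = np.array([
    [[1.0, 0.0], [0.2, 0.1]],    # rectangle: (vote for house, vote for boat)
    [[0.9, 0.1], [-0.3, 0.2]],   # triangle:  (vote for house, vote for boat)
])

logits = np.zeros((2, 2))        # routing logits: lower capsule x higher capsule
for _ in range(5):               # the 4-5 agreement iterations from the text
    coupling = np.apply_along_axis(softmax, 1, logits)
    # Each higher capsule's output is the coupling-weighted sum of its votes.
    outputs = np.einsum('lh,lhd->hd', coupling, votes)
    # Agreement (the "bet"): dot product of each vote with the current output.
    logits += np.einsum('lhd,hd->lh', votes, outputs)

house_len, boat_len = np.linalg.norm(outputs, axis=1)
```

After the iterations, the house capsule's output vector is considerably longer than the boat capsule's, mirroring the election-style agreement in the text.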
Capsule Networks are best suited for object detection and image segmentation, as they help better model hierarchical relationships and provide high accuracy. However, Capsule Networks are still under research, relatively new, and mostly tested and benchmarked on the MNIST dataset, but they will be the future in working with massive use cases emerging from Vision datasets.
Figure 7.0: A simple CapsNet with 3 layers; this model gives comparable results to deep convolutional networks. Source: Dynamic Routing Between Capsules, Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton
Figure 8.0: Capsule Network for House or Boat classification. Source: Beginners' Guide to Capsule Networks
External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited
Traditional methods of learning in Machine
Learning focuses on taking a huge labeled
dataset and then learning to detect y
(dependent variable say classifying an
image as cat or dog) and given set of x
(independent variables images of cats
and dogs) This process involves selection
of an algorithm such as Convolution
Neural Net and arriving at various hyper
parameters such as number of layers
in the network number of neurons in
each layer learning rate weights bias
dropouts activation function to activate
the neuron such as sigmoid tanh and
Relu The learning happens through
several iterations of forward and backward
passes (propagation) by readjusting (also
called learning) the weights based on
difference in the loss (actual vs computed)
At the minimal loss the weights and
other network parameters are frozen
and are considered final model for future
prediction tasks This is obviously a long
and tedious process and repeating this for
every use case or task is engineering data
and compute intensive
Meta Learning focuses on how to learn to
learn It is one of the fascinating discipline
of artificial intelligence Human beings
have varying styles of learning Some
Humans can learn from their own existing
experiences or experiences they have
heard seen or observed Transfer Learning
discipline of AI is based on similar traits of
human learning where new models can
learn and benefit from existing trained
model
For example if a Computer Vision based
detection model with no Transfer
Learning that already detects various
types of vehicles such as cars trucks and
bicycles needs to be trained to detect an
airplane then you may have to retrain the
full model with images of all the previous
objects
Meta Learning

Some people learn and memorize with one instance of a visual or auditory scan. Some people need multiple perspectives to strengthen the neural connections for permanent memory. Some remember by writing, while some remember through actual experiences. Meta Learning tries to leverage these styles to build its learning characteristics.

Like the variety in human learning techniques, Meta Learning also uses various learning methods based on the pattern of the problem: methods based on boundary space, on the amount of data, on optimizing the size of the neural network, or on a recurrent network approach. Each of these is briefly discussed below.

Types of Meta-Learning Models

Few Shots Meta-Learning

This learning technique focuses on learning from a few instances of data. Typically, neural nets need millions of data points to learn; Few Shots Meta-Learning, however, uses only a few instances of data to build models. An example is facial recognition using Single Shot Learning, which is explained in detail in the Single Shot Learning section.

Optimizer Meta-Learning

In this method the emphasis is on optimizing the neural network and its hyper-parameters. A good example of optimizer meta-learning is models focused on improving gradient descent techniques.

Metric Meta-Learning

In this method the metric space is narrowed down to improve the focus of learning. Learning is then carried out only in this metric space, leveraging various optimization parameters established for the given metric space.

Recurrent Model Meta-Learning

This type of meta-learning model is tailored to Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM) networks. In this architecture, the meta-learner algorithm trains an RNN model to process a dataset sequentially and then process new inputs from the task. In an image classification setting, this might involve passing in the (image, label) pairs of a dataset sequentially, followed by new examples that must be classified. Meta-Reinforcement Learning is an example of this approach.
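The metric and few-shot ideas above can be sketched with a toy nearest-class-mean classifier: each class is represented by only a few support examples, and a query is assigned to the class whose mean is closest in the embedding space. This is a minimal illustration in plain Python; the 2-D "embeddings" and class names are made up (real systems would use learned, higher-dimensional embeddings).

```python
import math

# Few-shot setup: each class has only a handful of embedding vectors
# (2-D here instead of a realistic 128-D embedding space).
support = {
    "cat": [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9)],
    "dog": [(4.0, 3.8), (4.2, 4.1), (3.9, 4.0)],
}

def class_mean(points):
    # Mean vector of a class's few support examples.
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def classify(query):
    # Nearest-class-mean in the metric (embedding) space.
    means = {label: class_mean(pts) for label, pts in support.items()}
    return min(means, key=lambda label: math.dist(query, means[label]))

print(classify((1.0, 1.0)))
```

The same pattern underlies metric meta-learning: learning happens in (and is restricted to) a metric space where distance encodes similarity.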
Transfer Learning (TL)

However, with Transfer Learning, you can simply introduce an additional layer on top of the existing pre-trained layers to start detecting airplanes.
Typically, in a no-Transfer-Learning scenario, a model needs to be trained from scratch; during training, the right weights are arrived at through many iterations (epochs) of forward and backward propagation, which takes a significant amount of computation power and time. In addition, vision models need a significant amount of image data, in this example images of airplanes, to be trained.

With the Transfer Learning approach, you can reuse the pre-trained weights of an existing trained model, and significantly fewer images (5 to 10 percent of the images needed to train a ground-up model) are required for the model to start detecting. As the pre-trained model has already learnt basic features such as edges, curves, and shapes in its earlier layers, it only needs to learn the higher-order features specific to airplanes on top of the existing computed weights. In brief, Transfer Learning helps eliminate the need to learn everything from scratch.
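The freeze-and-extend idea can be sketched in plain Python. Everything here is illustrative: the "pretrained" feature extractor and the airplane data are made up, standing in for, e.g., ImageNet-trained convolutional layers. The point is that the frozen layers are never updated; only a small new head is trained, which is why a few examples suffice.

```python
def pretrained_features(x):
    # Frozen "early layers": pretend these transforms were learned on a
    # large dataset and are reused as-is (never updated below).
    return (x[0] + x[1], x[0] - x[1], x[0] * x[1])

def train_head(samples, epochs=50, lr=0.1):
    # Train only the new head (a linear classifier) on top of frozen features.
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, label in samples:          # label: 1 = airplane, 0 = other
            f = pretrained_features(x)
            pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0
            err = label - pred            # perceptron-style update
            w = [wi + lr * err * fi for wi, fi in zip(w, f)]
            b += lr * err
    return w, b

def predict(x, w, b):
    f = pretrained_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0

# Only a handful of labelled examples of the new class are needed.
data = [((2.0, 2.0), 1), ((1.5, 2.5), 1), ((-1.0, -2.0), 0), ((-2.0, -0.5), 0)]
w, b = train_head(data)
print(predict((2.2, 1.8), w, b))
```

In a real framework the analogous steps are: load a pre-trained backbone, mark its layers non-trainable, attach a new classification head, and fit only the head.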
External Document © 2019 Infosys Limited
Transfer Learning helps in saving
significant amount of data computational
power and time in training new models as
they leverage pre-trained weights from the
existing trained models and architectures
However it is important to understand
that Transfer Learning approach today
is only matured enough to be applied to
similar use cases that is you cannot use
the above discussed model to train a facial
recognition model
Another key thing during Transfer Learning
is that it is important to understand the
details of the data on which new use cases
are being trained as it can implicitly push
the built-in biases from the underlying data
into newer systems It is recommended
that the datasheets of underlying models
and data be studied thoroughly unless the
usage is for experimentative purpose
Earlier having used the human brain
rationale it is important to note that
human brains have gone through centuries
of experiences and gene evolution and has
the ability to learn faster whereas transfer
learning is just a few decades old and is
becoming ground for new vision and text
use cases
Figure 9.0: Transfer Learning layers. Source: John Cherrie, Training Deep Learning Models with Transfer Learning
Single Shot Learning

Humans have the impressive ability to reason about new concepts and experiences from just a single example. They have the capacity for one-shot generalization: the aptitude to encounter a new concept, understand its structure, and then generate compelling alternative variations of the same.

Facial recognition systems are good candidates for Single Shot Learning: otherwise, needing tens of thousands of individual face images to train one neural network can be extremely costly, time consuming, and infeasible. A Single Shot Learning based system, using an existing pre-trained FaceNet model with a facial-encoding based approach on top of it, can be very effective at establishing face similarity by computing the distance between faces.
In this approach, a 128-dimensional encoding of each face image is generated and compared with the other image's encoding to determine whether the person is the same or different. Various distance-based algorithms, such as Euclidean distance, can be used to determine whether the two encodings are within a specified threshold. The model training approach involves creating pairs of (Anchor, Positive) and (Anchor, Negative) images and training the model such that the (Anchor, Positive) pair distance is smaller and the (Anchor, Negative) distance is larger.

"Anchor" is the image of the person for whom the recognition model needs to be trained.

"Positive" is another image of the same person.

"Negative" is an image of a different person.
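The distance-and-threshold decision above can be sketched as follows. The short made-up vectors stand in for the 128-dimensional embeddings, and the 0.5 threshold and 0.2 margin are arbitrary illustrative values.

```python
import math

# Made-up encodings standing in for 128-D FaceNet-style embeddings.
anchor   = [0.10, 0.80, 0.30]  # embedding of the enrolled person's photo
positive = [0.12, 0.79, 0.28]  # another photo of the same person
negative = [0.90, 0.10, 0.60]  # a photo of a different person

def same_person(enc_a, enc_b, threshold=0.5):
    # Euclidean distance between encodings, compared to a tuned threshold.
    return math.dist(enc_a, enc_b) <= threshold

# Training pushes d(anchor, positive) below d(anchor, negative) by a margin
# (the triplet objective): d(a, p) + margin < d(a, n).
margin = 0.2
triplet_ok = math.dist(anchor, positive) + margin < math.dist(anchor, negative)

print(same_person(anchor, positive), same_person(anchor, negative), triplet_ok)
```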
Figure 10.0: Encoding approach, inspired by the Machine Learning course from Coursera
Deep Reinforcement Learning (RL)

This is a specialized Machine Learning discipline where an agent learns to behave in an environment by receiving rewards or punishments for the actions it performs. The agent can have an objective to maximize short-term or long-term rewards. This discipline uses deep learning techniques to bring in human-level performance on the given task.

Deep Reinforcement Learning has found significant relevance and application in various game-playing systems, such as video games, chess, AlphaGo, and Atari, as well as in industrial applications such as robots, driverless cars, etc.

In reinforcement learning, a policy π controls what action we should take (action = policy(state)), while a value function v measures how good it is to be in a particular state. The value function tells us the maximum expected future reward the agent will get at each state.

Three Approaches to Reinforcement Learning

Value Based

In value-based RL, the goal is to optimize the value function V(s). The value of each state is the total amount of reward an agent can expect to accumulate over the future, starting at that state. The agent will use this value function to select which state to choose at each step. A Q-table uses a mathematical function to arrive at a state based on an action.

Policy Based

In policy-based RL, we want to directly optimize the policy function π(s) without using a value function. The policy is what defines the agent's behavior at a given time.

There are two types of policies:

1. Deterministic: a policy which, at a given state, will always return the same action.

2. Stochastic: a policy that outputs a probability distribution over actions.

Value-based and policy-based are the more conventional Reinforcement Learning approaches. They are useful for modeling relatively simple systems.

Figure 11.0: Q-learning maps a state to per-action Q-values via a Q-table, while Deep Q-learning maps a state to Q-values via a Deep Q Neural Network. Schema inspired by the Q-learning notebook by Udacity
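The value-based approach can be made concrete with a minimal tabular Q-learning sketch. The environment here is a made-up five-state corridor with a reward only at the far end, and the hyper-parameters (learning rate, discount, exploration rate) are arbitrary illustrative choices.

```python
import random

# States 0..4 on a line; reward 1.0 for reaching state 4; actions move
# left (-1) or right (+1). Purely illustrative.
random.seed(0)
n_states, actions = 5, (-1, +1)
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration

for _ in range(300):                     # episodes
    s = 0
    while s != n_states - 1:
        # Epsilon-greedy action selection from the Q-table.
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        best_next = max(Q[(s2, a2)] for a2 in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# After training, the greedy policy should move right in every state.
policy = {s: max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)}
print(policy)
```

Deep Q-learning replaces the explicit Q-table with a neural network that approximates Q(s, a), which is what makes large state spaces tractable.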
Model Based

In model-based RL, we model the environment. That is, we create a model of the behavior of the environment, and this model is then used to arrive at results that maximize short-term or long-term rewards. The model equation can be any equation defined based on the environment's behavior, and it must be sufficiently generalized to counter new situations.

When a model-based approach uses Deep Neural Network algorithms to generalize sufficiently well and learn the complexities of the environment to produce optimal results, it is called Deep Reinforcement Learning. The challenge with the model-based approach is that each environment needs a dedicated trained model.
AlphaGo was trained using data from several games to beat human players at the game of Go. The training accuracy was just 57%, and still it was sufficient to beat human-level performance. The training methods involved reinforcement learning and deep learning to build a policy network that tells which moves are promising and a value network that tells how good the board position is. The search for the final move from these networks is done using the Monte Carlo Tree Search (MCTS) algorithm. Using supervised learning, a policy network was created to imitate expert moves.

DeepMind released AlphaGo Zero in late 2017, which beat AlphaGo and did not involve any training on data from previous games. The deep network was trained by picking training samples from AlphaGo and AlphaGo Zero playing games against itself, selecting the best moves to train the network, and then applying those in real games to improve the results iteratively. This is possible because deep reinforcement learning algorithms can store long-range tree search results for the next best move in memory and do very large computations that are difficult for a human brain.
Auto ML (AML)

Designing a machine learning solution involves several steps, such as collecting data; understanding, cleansing, and normalizing data; doing feature engineering; selecting or designing the algorithm; selecting the model architecture; selecting and tuning the model's hyper-parameters; evaluating the model's performance; deploying and monitoring the machine learning system in an online system; and so on. Such machine learning solution design requires an expert Data Scientist to complete the pipeline.

As the complexity of these and other tasks can easily get overwhelming, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The AI research area that encompasses progressive automation of machine learning pipeline tasks is called AutoML (Automatic Machine Learning).
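The core idea, automatically searching over pipeline choices that a Data Scientist would otherwise hand-tune, can be sketched in plain Python. The tiny dataset, the raw-vs-scaled preprocessing choice, and the k-NN hyper-parameter grid are all made up for illustration.

```python
import math
from statistics import mean, pstdev

# Toy data: (features, label). A real AutoML system would take any dataset.
train = [((1.0, 100.0), 0), ((1.2, 120.0), 0), ((1.1, 110.0), 0),
         ((3.0, 300.0), 1), ((3.2, 290.0), 1), ((2.9, 310.0), 1)]
valid = [((1.1, 105.0), 0), ((3.1, 295.0), 1), ((1.3, 118.0), 0), ((3.0, 305.0), 1)]

def make_scaler(xs):
    # Standardize each feature column using training statistics.
    stats = [(mean(c), pstdev(c) or 1.0) for c in zip(*xs)]
    return lambda x: tuple((v - m) / s for v, (m, s) in zip(x, stats))

def knn_predict(x, data, k):
    nearest = sorted(data, key=lambda d: math.dist(x, d[0]))[:k]
    labels = [lab for _, lab in nearest]
    return max(set(labels), key=labels.count)

def accuracy(transform, k):
    data = [(transform(x), lab) for x, lab in train]
    return mean(knn_predict(transform(x), data, k) == lab for x, lab in valid)

# Search space: preprocessing choice x model hyper-parameter.
identity = lambda x: x
preprocessors = [("raw", identity), ("scaled", make_scaler([x for x, _ in train]))]
search_space = [(name, t, k) for name, t in preprocessors for k in (1, 3, 5)]

best = max(search_space, key=lambda cfg: accuracy(cfg[1], cfg[2]))
print(best[0], best[2], accuracy(best[1], best[2]))
```

Real AutoML systems search far larger spaces with smarter strategies (Bayesian optimization, meta-learning, ensembling) rather than exhaustive enumeration.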
Google CEO Sundar Pichai wrote, "Designing neural nets is extremely time intensive and requires an expertise that limits its use to a smaller community of scientists and engineers. That's why we've created an approach called AutoML, showing that it's possible for neural nets to design neural nets," while Google's Head of AI Jeff Dean suggested that 100x computational power could replace the need for machine learning expertise.

AutoML Vision relies on two core techniques: transfer learning and neural architecture search.
Figure 12.0: An example Auto-sklearn pipeline: given (Xtrain, Ytrain, Xtest, budget), meta learning and a hand-crafted portfolio seed Bayesian optimization over the ML pipeline (data processor, feature preprocessor, classifier), followed by ensemble building to produce Ytest. Source: André Biedenkapp, We did it Again: World Champions in AutoML
Implementing AutoML

Here is a look at a few libraries that help in implementing AutoML.

AUTO-SKLEARN

Auto-sklearn automates several key tasks in the Machine Learning pipeline, such as addressing missing column values, encoding of categorical values, data scaling and normalization, feature pre-processing, and selection of the right algorithm with its hyper-parameters. The pipeline supports 15 classification and 14 feature-processing algorithms. Selection of the right algorithm can happen based on ensembling techniques and by applying meta-knowledge gathered from executing similar scenarios (datasets and algorithms).

Usage

Auto-sklearn is written in Python and can be considered a drop-in replacement for scikit-learn classifiers. Here is a sample set of commands:

>>> import autosklearn.classification
>>> cls = autosklearn.classification.AutoSklearnClassifier()
>>> cls.fit(X_train, y_train)
>>> predictions = cls.predict(X_test)
SMAC (Sequential Model-Based Algorithm Configuration)

SMAC is a tool for automating certain AutoML steps. SMAC is useful for selecting key features, optimizing hyper-parameters, and speeding up algorithmic outputs.
BOHB (Bayesian Optimization and HyperBand)

BOHB combines Bayesian hyper-parameter optimization with bandit-based HyperBand searches for faster convergence.
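The bandit component that BOHB builds on (successive halving, the core of HyperBand) can be sketched as follows: start many configurations on a small budget, keep the better half at each round, and grow the budget for the survivors. The loss function and budgets here are made up for illustration.

```python
import random

random.seed(1)

def noisy_loss(config, budget):
    # Pretend "validation loss after `budget` units of training": configs
    # closer to 0.3 are truly better; more budget gives a less noisy estimate.
    true_loss = abs(config - 0.3)
    return true_loss + random.gauss(0, 0.3 / budget)

configs = [i / 16 for i in range(16)]   # 16 candidate hyper-parameter values
budget = 1
while len(configs) > 1:
    scores = {c: noisy_loss(c, budget) for c in configs}
    configs = sorted(configs, key=scores.get)[:len(configs) // 2]
    budget *= 2                          # survivors get a larger budget

print(configs[0])                        # the single surviving configuration
```

BOHB then replaces the uniform random choice of candidates with a Bayesian model fitted to the observed (configuration, loss) pairs.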
Google and H2O also have their respective AutoML tools, which are not covered here but can be explored for specific cases.
AutoML needs significant memory and computational power to execute alternate algorithms and compute results. At present, GPU resources are extremely costly even for simple Machine Learning workloads, such as a CNN algorithm to classify objects. If multiple such alternate algorithms were to be executed, the computation dollars needed would grow exponentially. This is impractical, infeasible, and inefficient for the current state of the Data Science industry. Adoption of AutoML will depend on two things: one, the maturity of the AutoML pipeline, and two, but more important, how quickly GPU clusters become cheap, the second being the most critical. Selling cloud GPU capacity could be one of the motivations for several cloud infrastructure companies to promote AutoML in the industry. Also, AutoML will not replace the Data Scientist's work, but it can provide augmentation and speed for certain tasks such as data standardization, model tuning, and trying multiple algorithms. It is only the beginning for AutoML, but this technique has high relevance and usefulness for solving ultra-complex problems.
Neural Architecture Search (NAS)

Neural Architecture Search (NAS) is a component of AutoML and addresses the important step of designing the Neural Network architecture.

Designing a fresh Neural Net architecture involves an expert establishing and organizing Neural Network layers, filters or channels, and filter sizes, and selecting other optimum hyper-parameters, through several rounds of computational iterations. Since the AlexNet deep neural network architecture won the ImageNet competition (image classification based on the ImageNet dataset) in 2012, several architecture styles such as VGG, ResNet, Inception, Xception, InceptionResNet, MobileNet, and NASNet have evolved significantly. However, selecting the right architecture for the right problem is also a skill, due to the presence of various influencers such as applicability to the problem, accuracy, number of parameters, memory and computational footprint, and size of the architecture, all of which govern the overall functioning efficiency.

Neural Architecture Search tries to address this problem space by automatically selecting the right Neural Network architecture to solve a given problem.
Figure 13.0: NAS and hyperparameter optimization as sub-areas of AutoML. Source: Liam Li, Ameet Talwalkar, What is neural architecture search?
Key Components of NAS

Search space: The search space provides the boundary within which the specific architecture needs to be searched. Computer Vision use cases (captioning a scene or product identification) would need a different neural network architecture style than Speech (speech transcription or speaker classification) or unstructured Text (topic extraction, intent mining) use cases. The search space tries to provide available catalogs of best-in-class architectures based on other domain data and performance. These are also usually hand-crafted by expert data scientists.

Optimization method: This is responsible for providing the mechanism to search for the best architecture. Architectures could be searched and applied randomly, or by using a statistical or Machine Learning evaluation approach such as Bayesian methods or reinforcement learning methods.

Evaluation method: This has the role of evaluating the quality of the architectures considered by the optimization method. It could be done using a full training approach, or by doing partial training and then applying certain specialized methods such as early stopping, weight sharing, network morphism, etc.

For selective problem spaces, NAS has outperformed manual methods and is showing definite promise for the future. However, it is still evolving and not ready for production usage, as several architectures need to be established and evaluated depending on the problem space.
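The three components above can be sketched with the simplest possible optimization method, random search. Everything here is a made-up illustration: the search space dimensions, the sampler, and especially the proxy score, which stands in for (partial) training of each candidate and crudely trades accuracy-like capacity against parameter footprint.

```python
import random

random.seed(42)

# Search space: a small catalog of architectural choices.
SEARCH_SPACE = {
    "depth":  [2, 4, 6, 8],
    "width":  [16, 32, 64],
    "kernel": [3, 5, 7],
}

def sample_architecture():
    # Optimization method (random search): sample one point of the space.
    return {dim: random.choice(opts) for dim, opts in SEARCH_SPACE.items()}

def proxy_score(arch):
    # Evaluation method (stand-in): pretend quality grows with capacity but
    # is penalized for parameter count, mimicking an accuracy-vs-footprint
    # trade-off. Real NAS would train (at least partially) and measure.
    capacity = arch["depth"] * arch["width"]
    params = capacity * arch["kernel"] ** 2
    return capacity / (1 + 0.0005 * params)

best = max((sample_architecture() for _ in range(50)), key=proxy_score)
print(best)
```

Replacing the random sampler with reinforcement learning, evolutionary search, gradient-based, or Bayesian methods, and the proxy with partial training, weight sharing, or hypernetworks, yields the NAS variants shown in Figure 14.0.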
Figure 14.0: Components of NAS: search space (DAG representation, cell block, meta-architecture, NAS-specific), optimization method (reinforcement learning, evolutionary search, gradient-based optimization, Bayesian optimization), and evaluation method (full training, partial training, weight-sharing, network morphism, hypernetworks). Source: Liam Li, Ameet Talwalkar, What is neural architecture search?
Addressing H3 AI Trends at Infosys

In this paper we looked at some key H3 AI areas; by no means is this an exhaustive list. Amongst all those discussed, Transfer Learning, Capsule Networks, Explainable AI, and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases, and integrating them into our product stack, Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do.
Trend: Use cases

1. Explainable AI (XAI): Applicable wherever results need to be traced, e.g. Tumor Detection, Mortgage Rejection, Candidate Selection, etc.

2. Generative AI, Neural Style Transfer (NST): Art Generation, Sketch Generation, Image or Video Resolution Improvements, Data Generation/Augmentation, Music Generation

3. Fine Grained Classification: Vehicle Classification, Type of Tumor Detection

4. Capsule Networks: Image Re-construction, Image Comparison/Matching

5. Meta Learning: Intelligent Agents, Continuous Learning scenarios for document review and corrections

6. Transfer Learning: Identifying a person not wearing a helmet, Logo/brand detection in images, Speech model training for various accents and vocabularies

7. Single Shot Learning: Face Recognition, Face Verification

8. Deep Reinforcement Learning (RL): Intelligent Agents, Robots, Driverless cars, Traffic Light Monitoring, Continuous Learning scenarios for document review and corrections

9. Auto ML: Invoice Attribute Extraction, Document Classification, Document Clustering

10. Neural Architecture Search (NAS): CNN or RNN based use cases such as Image Classification, Object Identification, Image Segmentation, Speaker Classification, etc.

Table 2.0: AI Use cases. Source: Infosys Research
Reference

1. Explainable AI (XAI)
• https://christophm.github.io/interpretable-ml-book/
• https://simmachines.com/explainable-ai/
• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html
• https://medium.com/@QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

2. Fine Grained Classification
• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf

3. Capsule Networks
• https://arxiv.org/pdf/1710.09829.pdf
• https://keras.io/examples/cifar10_cnn_capsule/
• https://www.youtube.com/watch?v=pPN8d0E3900
• https://www.youtube.com/watch?v=rTawFwUvnLE
• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

4. Meta Learning
• https://medium.com/@jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
• http://proceedings.mlr.press/v48/santoro16.pdf
• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

5. Transfer Learning
• https://www.fast.ai/2018/07/23/auto-ml-3/

6. Single Shot Learning
• https://arxiv.org/pdf/1603.05106.pdf

7. Deep Reinforcement Learning (RL)
• https://deepmind.com/blog/article/deep-reinforcement-learning
• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
• https://medium.com/@jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5
• https://arxiv.org/pdf/1811.12560.pdf

8. Auto ML
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac/
• https://www.fast.ai/2018/07/23/auto-ml-3/
• https://www.fast.ai/2018/07/16/auto-ml2/
• https://competitions.codalab.org/competitions/17767
• https://www.automl.org/automl/auto-sklearn/
• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html

9. Neural Architecture Search (NAS)
• https://www.oreilly.com/ideas/what-is-neural-architecture-search
10. Infosys Enterprise Cognitive Platform
• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx

© 2019 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.

For more information, contact askus@infosys.com

Infosys.com | NYSE: INFY | Stay Connected
About the author

Sudhanshu Hate is the inventor and architect of Infosys Enterprise Cognitive Platform (iECP), a microservices API based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and working with clients on industry problems. His current areas of interest are Computer Vision, Speech, and unstructured Text based AI possibilities.

To know more about our work on the H3 trends in AI, write to icets@infosys.com
Traditional methods of learning in Machine
Learning focuses on taking a huge labeled
dataset and then learning to detect y
(dependent variable say classifying an
image as cat or dog) and given set of x
(independent variables images of cats
and dogs) This process involves selection
of an algorithm such as Convolution
Neural Net and arriving at various hyper
parameters such as number of layers
in the network number of neurons in
each layer learning rate weights bias
dropouts activation function to activate
the neuron such as sigmoid tanh and
Relu The learning happens through
several iterations of forward and backward
passes (propagation) by readjusting (also
called learning) the weights based on
difference in the loss (actual vs computed)
At the minimal loss the weights and
other network parameters are frozen
and are considered final model for future
prediction tasks This is obviously a long
and tedious process and repeating this for
every use case or task is engineering data
and compute intensive
Meta Learning focuses on how to learn to
learn It is one of the fascinating discipline
of artificial intelligence Human beings
have varying styles of learning Some
Humans can learn from their own existing
experiences or experiences they have
heard seen or observed Transfer Learning
discipline of AI is based on similar traits of
human learning where new models can
learn and benefit from existing trained
model
For example if a Computer Vision based
detection model with no Transfer
Learning that already detects various
types of vehicles such as cars trucks and
bicycles needs to be trained to detect an
airplane then you may have to retrain the
full model with images of all the previous
objects
Like the variety in human learning
techniques Meta Learning also uses
various learning methods based on
patterns of problems such as those based
on boundary space amount of data by
optimizing size of neural network or using
recurrent network approach Each of these
are briefly discussed inline
Few Shots Meta-Learning
This learning technique focuses on
learning from a few instances of data
Typically Neural Nets need millions of
data points to learn however Few Shots
Meta- Learning uses only a few instances
of data to build models Examples being
Facial recognition systems using Single
Shot Learning this is explained in detail in
Single Shot Learning section
Optimizer Meta-Learning
In this method the emphasis is on
optimizing the neural network and its
hyper- parameters A great example of
optimizer meta-learning are models that
are focused on improving gradient descent
techniques
Metric Meta-Learning
In this learning method the metric space
is narrowed down to improve the focus of
learning Then the learning is carried out
only in this metric space by leveraging
various optimization parameters that are
established for the given metric space
Recurrent Model Meta-Learning
This type of meta-learning model is tailored
to Recurrent Neural Networks(RNNs)
such as Long-Short-Term-Memory(LSTM)
In this architecture the meta-learner
algorithm will train a RNN model to process
a dataset sequentially and then process
new inputs from the task In an image
classification setting this might involve
passing in the set of (image label) pairs
of a dataset sequentially followed by new
examples which must be classified Meta-
Reinforcement Learning is an example of
this approach
Meta Learning
Transfer Learning (TL)
Types of Meta-Learning Models
people learn and memorize with one
instance of visual or auditory scan Some
people need multiple perspectives to
strengthen the neural connections for
permanent memory Some remember by
writing while some remember through
actual experiences Meta Learning tries
to leverage these to build its learning
characteristics
However with Transfer Learning you can
introduce an additional layer on top of the
existing pre-trained layer to start detecting
airplanes
Typically in a no Transfer Learning
scenario model needs to be trained and
during training right weights are arrived
at by doing many iterations (epochs) of
forward and back propagation which takes
significant amount of computation power
and time In addition Vision models need
significant amount of image data such as
in this example images of airplanes to be
trained
With Transfer Learning approach you can
reuse the existing pre-trained weights of an
existing trained model with significantly
less number of images ( 5 to 10 percent
of actual images needed for training
ground up model) for the model to start
detecting As the pre-trained model has
already learnt some basic learning around
identifying edges curves and shapes in the
earlier layers it needs to learn only higher
order features specific to airplanes with
the existing computed weights In brief
Transfer Learning helps eliminate the need
to learn anything from scratch
External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited
Transfer Learning helps in saving
significant amount of data computational
power and time in training new models as
they leverage pre-trained weights from the
existing trained models and architectures
However it is important to understand
that Transfer Learning approach today
is only matured enough to be applied to
similar use cases that is you cannot use
the above discussed model to train a facial
recognition model
Another key thing during Transfer Learning
is that it is important to understand the
details of the data on which new use cases
are being trained as it can implicitly push
the built-in biases from the underlying data
into newer systems It is recommended
that the datasheets of underlying models
and data be studied thoroughly unless the
usage is for experimentative purpose
Earlier having used the human brain
rationale it is important to note that
human brains have gone through centuries
of experiences and gene evolution and has
the ability to learn faster whereas transfer
learning is just a few decades old and is
becoming ground for new vision and text
use cases
External Document copy 2019 Infosys Limited
Figure 90 Transfer Learning Layers Source John Cherrie Training Deep Learning Models with Transfer Learning
External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited
Humans have the impressive skill to reason
about new concepts and experiences with
just a single example They have the ability
for one-shot generalization the aptitude
to encounter a new concept understand
its structure and then generate compelling
alternative variations of the same
Facial recognition systems are good
candidates for Single Shot Learning
otherwise needing ten thousands of
individual face images to train one neural
network can be extremely costly time
consuming and infeasible However a
Single Shot LearningSingle Shot Learning based system using
existing pre-trained FaceNet model and
facial encoding based approach on top of
it can be very effective to establish face
similarity by computing distance between
the faces
In this approach 128 bit encoding of each
face image is generated and compared
with other imagersquos encoding to determine
if the person is same or different
Various distance based algorithms such
as Euclidean distance can be used to
determine if they are within specified
threshold The model training approach
involves creating pairs of (Anchor Positive)
and (Anchor Negative) and training the
model in a way where (Anchor Positive)
pair distance difference is smaller and
(Anchor Negative) distance is farther
ldquoAnchorrdquo is the image of a person for whom
the recognition model needs to be trained
ldquoPositiverdquo is another image of the same
person
ldquoNegativerdquo is image of a different person
Figure 10.0: Encoding approach, inspired by the ML course from Coursera
Deep Reinforcement Learning (RL)
This is a specialized Machine Learning discipline where an agent learns to behave in an environment by getting a reward or punishment for the actions performed. The agent can have an objective to maximize short-term or long-term rewards. This discipline uses deep learning techniques to bring in human-level performance on the given task.
Deep Reinforcement Learning has found significant relevance and application in various game design systems, such as creating video games, chess, AlphaGo and Atari, as well as in industrial applications of robots, driverless cars, etc.
In reinforcement learning, the policy π controls what action we should take, while the value function v measures how good it is to be in a particular state. The value function tells us the maximum expected future reward the agent will get at each state.
Three Approaches to Reinforcement Learning
Value Based
In value-based RL, the goal is to optimize the value function V(s). A Q-table uses a mathematical function to arrive at a state based on an action. The value of each state is the total amount of reward an agent can expect to accumulate over the future, starting at that state. The agent will use this value function to select which state to choose at each step.
Policy Based
In policy-based RL, we want to directly optimize the policy function π(s) without using a value function: action = policy(state). The policy is what defines the agent's behavior at a given time. There are two types of policies:
1. Deterministic: a policy which, at a given state, will always return the same action.
2. Stochastic: a policy that outputs a probability distribution over actions.
Value based and policy based are the more conventional Reinforcement Learning approaches. They are useful for modeling relatively simple systems.
Figure 11.0: Q-learning versus Deep Q-learning. In Q-learning, a Q-table maps each state to a Q-value per action; in Deep Q-learning, a Deep Q Neural Network takes the state and outputs the expected (discounted) reward for each action. Schema inspired by the Q-learning notebook by Udacity
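The Q-table scheme can be made concrete with a minimal tabular Q-learning loop. The five-state chain environment, its reward and all the parameters below are invented purely for illustration:

```python
import numpy as np

n_states, n_actions = 5, 2           # toy chain: move left (0) or right (1)
alpha, gamma = 0.5, 0.9              # learning rate and discount factor
Q = np.zeros((n_states, n_actions))  # the Q-table: one row per state

def step(state, action):
    # Toy environment: reward of 1 only for stepping right
    # from the last-but-one state into the terminal state.
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if (state == n_states - 2 and action == 1) else 0.0
    return next_state, reward

rng = np.random.default_rng(0)
for _ in range(500):
    s = rng.integers(n_states)       # explore states and actions randomly
    a = rng.integers(n_actions)
    s2, r = step(s, a)
    # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])

policy = Q.argmax(axis=1)            # greedy policy: best action per state
```

A Deep Q-Network replaces the table with a neural network that maps a state to Q-values for all actions, which is what makes large state spaces tractable.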
Model Based
In model-based RL, we model the environment: we create a model of the behavior of the environment, and this model is then used to arrive at results that maximize short-term or long-term rewards. The model equation can be any equation that is defined based on the environment's behavior, and it must be sufficiently generalized to counter new situations.
When a model based approach uses Deep Neural Network algorithms to generalize sufficiently well and learn the complexities of the environment to produce optimal results, it is called Deep Reinforcement Learning. The challenge with the model based approach is that each environment needs a dedicated trained model.
AlphaGo was trained using data from several games to beat human players in the game of Go. The training accuracy was just 57%, and still it was sufficient to beat human-level performance. The training methods involved reinforcement learning and deep learning to build a policy network that tells which moves are promising and a value network that tells how good the board position is. The search for the final move from these networks is done using the Monte Carlo Tree Search (MCTS) algorithm. Using supervised learning, a policy network was created to imitate expert moves.
DeepMind released AlphaGo Zero in late 2017, which beat AlphaGo and did not involve any training from previous games data. The deep network was trained by picking training samples from AlphaGo and AlphaGo Zero playing games against itself, selecting the best moves to train the network, and then applying those in real games to improve the results iteratively. This is possible because deep reinforcement learning algorithms can store long-range tree search results for the next best move in memory and do very large computations that are difficult for a human brain.
Auto ML (AML)
Designing a machine learning solution involves several steps, such as collecting data; understanding, cleansing and normalizing data; doing feature engineering; selecting or designing the algorithm; selecting the model architecture; selecting and tuning the model's hyper-parameters; evaluating the model's performance; deploying and monitoring the machine learning system in an online system; and so on. Such machine learning solution design requires an expert Data Scientist to complete the pipeline.
As the complexity of these and other tasks can easily get overwhelming, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The AI research area that encompasses progressive automation of machine learning pipeline tasks is called AutoML (Automatic Machine Learning).
Google CEO Sundar Pichai wrote, "Designing neural nets is extremely time intensive and requires an expertise that limits its use to a smaller community of scientists and engineers. That's why we've created an approach called AutoML, showing that it's possible for neural nets to design neural nets," while Google's Head of AI, Jeff Dean, suggested that 100x computational power could replace the need for machine learning expertise.
AutoML Vision relies on two core techniques: transfer learning and neural architecture search.
Figure 12.0: An example auto-sklearn pipeline, combining meta-learning over a hand-crafted portfolio, data and feature preprocessing, Bayesian optimization of the ML pipeline (preprocessor and classifier) and ensemble building. Source: André Biedenkapp, We did it Again: World Champions in AutoML
Implementing AutoML
Here is a look at a few libraries that help in implementing AutoML.
AUTO-SKLEARN
Auto-sklearn automates several key tasks in the Machine Learning pipeline, such as addressing missing column values, encoding of categorical values, data scaling and normalization, feature pre-processing, and selection of the right algorithm along with its hyper-parameters. The pipeline supports 15 classification and 14 feature processing algorithms. Selection of the right algorithm can happen based on ensembling techniques and by applying meta-knowledge gathered from executing similar scenarios (datasets and algorithms).
Usage
Auto-sklearn is written in Python and can be considered a drop-in replacement for scikit-learn classifiers. Here is a sample set of commands:
>>> import autosklearn.classification
>>> cls = autosklearn.classification.AutoSklearnClassifier()
>>> cls.fit(X_train, y_train)
>>> predictions = cls.predict(X_test)
SMAC (Sequential Model-based Algorithm Configuration)
SMAC is a tool for automating certain AutoML steps. SMAC is useful for selecting key features, optimizing hyper-parameters and speeding up algorithmic outputs.
BOHB (Bayesian Optimization Hyperband)
BOHB combines Bayesian hyper-parameter optimization with bandit methods for faster convergence.
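The bandit component BOHB builds on is Hyperband-style successive halving: evaluate many configurations on a small budget, keep the best fraction, and re-run the survivors with a larger budget. The sketch below is library-agnostic and not the BOHB API; the quadratic stand-in "loss" function, the budgets and the 1/eta keep-fraction are invented for illustration.

```python
import random

def evaluate(config, budget):
    # Stand-in for training a model: pretend the loss shrinks
    # as more budget (e.g. epochs) is spent.
    return (config - 0.3) ** 2 + 1.0 / budget

def successive_halving(n_configs=27, min_budget=1, eta=3, rounds=3):
    random.seed(0)
    configs = [random.random() for _ in range(n_configs)]
    budget = min_budget
    for _ in range(rounds):
        # Evaluate every surviving configuration at the current budget...
        scored = sorted(configs, key=lambda c: evaluate(c, budget))
        # ...keep the best 1/eta fraction and grow the budget by eta.
        configs = scored[:max(1, len(scored) // eta)]
        budget *= eta
    return configs[0]

best = successive_halving()
```

BOHB differs from plain Hyperband in that new candidate configurations are proposed by a Bayesian model fitted to the results observed so far, rather than sampled purely at random.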
Google and H2O also have their respective AutoML tools, which are not covered here but can be explored in specific cases.
AutoML needs significant memory and computational power to execute alternate algorithms and compute results. At present, GPU resources are extremely costly even for executing simple Machine Learning workloads, such as a CNN algorithm to classify objects. If multiple such alternate algorithms are to be executed, the computation dollars needed would grow exponentially. This is impractical, infeasible and inefficient for the current state of the Data Science industry. Adoption of AutoML will depend on two things: one, the maturity of the AutoML pipeline, and two, more importantly, how quickly GPU clusters become cheap, the second being most critical. Selling cloud GPU capacity could be one of the motivations of several cloud infrastructure companies to promote AutoML in the industry. Also, AutoML will not replace the Data Scientist's work, but it can augment and speed up certain tasks such as data standardization, model tuning and trying multiple algorithms. It is only the beginning for AutoML, but this technique has high relevance and usefulness for solving ultra-complex problems.
Neural Architecture Search (NAS)
Neural Architecture Search (NAS) is a component of AutoML and addresses the important step of designing the Neural Network architecture.
Designing a fresh Neural Net architecture involves an expert establishing and organizing Neural Network layers, filters or channels and filter sizes, selecting other optimum hyper-parameters, and so on, through several rounds of computational iterations. Since the AlexNet deep neural network architecture won the ImageNet competition (image classification based on the ImageNet dataset) in 2012, several architecture styles such as VGG, ResNet, Inception, Xception, InceptionResNet, MobileNet and NASNet have significantly evolved. However, selection of the right architecture for the right problem is also a skill, due to the presence of various influencers such as applicability to the problem, accuracy, number of parameters, memory and computational footprint, and size of the architecture, all of which govern the overall functioning efficiency.
Neural Architecture Search tries to address this problem space by automatically selecting the right Neural Network architecture to solve a given problem.
Figure 13.0: NAS and hyperparameter optimization as parts of AutoML. Source: Liam Li and Ameet Talwalkar, What is neural architecture search?
Key Components of NAS
Search space: The search space provides the boundary within which the specific architecture needs to be searched. Computer Vision based use cases (captioning a scene or product identification) would need a different neural network architecture style as against Speech (speech transcription or speaker classification) or unstructured Text (topic extraction, intent mining) based use cases. The search space tries to provide available catalogs of best-in-class architectures based on other domains' data and performance. These are also usually hand-crafted by expert data scientists.
Optimization method: This is responsible for providing the mechanism to search for the best architecture. Candidates could be searched and applied randomly, or by using a statistical or Machine Learning evaluation approach such as a Bayesian method or reinforcement learning methods.
Evaluation method: This has the role of evaluating the quality of the architecture considered by the optimization method. It could be done using a full training approach, or by doing partial training and then applying certain specialized methods such as early stopping, weight sharing, network morphism, etc.
For selective problem spaces, NAS has outperformed manual methods and is showing definite promise for the future. However, it is still evolving and is not ready for production usage, as several architectures need to be established and evaluated depending on the problem space.
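As a toy illustration of how the components interact, here is random search (the simplest optimization method) over a small hand-defined search space, with a stand-in evaluation function. The knobs, names and scoring below are invented; real NAS scores candidates by fully or partially training them.

```python
import random

# Search space: a few architecture knobs, hand-crafted as in practice.
SEARCH_SPACE = {
    "layers":  [2, 4, 8],
    "filters": [16, 32, 64],
    "kernel":  [3, 5],
}

def evaluate(arch):
    # Stand-in for the evaluation method (partial training, early
    # stopping, etc.): a made-up score that rewards capacity but
    # penalizes very deep stacks.
    return arch["filters"] * 0.01 + arch["layers"] * 0.1 - (arch["layers"] // 8) * 0.5

def random_search(n_trials=10, seed=0):
    # Optimization method: sample architectures at random and keep the best.
    random.seed(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

best_arch, best_score = random_search()
```

Swapping the random sampling for reinforcement learning, evolutionary search or Bayesian optimization, and the scoring stub for real (partial) training, yields the NAS variants discussed above.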
Figure 14.0: Components of NAS. Search space: DAG representation, cell block, meta-architecture, NAS-specific. Optimization method: reinforcement learning, evolutionary search, gradient-based optimization, Bayesian optimization. Evaluation method: full training, partial training, weight-sharing, network morphism, hypernetworks. Source: Liam Li and Ameet Talwalkar, What is neural architecture search?
Addressing H3 AI Trends at Infosys
In this paper we looked at some key H3 AI areas; by no means is this an exhaustive list. Amongst all those discussed, Transfer Learning, Capsule Networks, Explainable AI and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases and integrating them into our product stack, Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do.
1. Explainable AI (XAI): applicable wherever results need to be traced, e.g. tumor detection, mortgage rejection, candidate selection, etc.
2. Generative AI / Neural Style Transfer (NST): art generation, sketch generation, image or video resolution improvements, data generation/augmentation, music generation
3. Fine Grained Classification: vehicle classification, type of tumor detection
4. Capsule Networks: image re-construction, image comparison/matching
5. Meta Learning: intelligent agents, continuous learning scenarios for document review and corrections
6. Transfer Learning: identifying a person not wearing a helmet, logo/brand detection in images, speech model training for various accents and vocabularies
7. Single Shot Learning: face recognition, face verification
8. Deep Reinforcement Learning (RL): intelligent agents, robots, driverless cars, traffic light monitoring, continuous learning scenarios for document review and corrections
9. Auto ML: invoice attribute extraction, document classification, document clustering
10. Neural Architecture Search (NAS): CNN or RNN based use cases such as image classification, object identification, image segmentation, speaker classification, etc.
Table 2.0: AI Use cases. Source: Infosys Research
References
1. Explainable AI (XAI)
• https://christophm.github.io/interpretable-ml-book
• https://simmachines.com/explainable-ai
• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html
• https://medium.com/@QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739
2. Fine Grained Classification
• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf
3. Capsule Networks
• https://arxiv.org/pdf/1710.09829.pdf
• https://keras.io/examples/cifar10_cnn_capsule
• https://www.youtube.com/watch?v=pPN8d0E3900
• https://www.youtube.com/watch?v=rTawFwUvnLE
• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc
4. Meta Learning
• https://medium.com/@jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
• http://proceedings.mlr.press/v48/santoro16.pdf
• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660
5. Transfer Learning
• https://www.fast.ai/2018/07/23/auto-ml-3
6. Single Shot Learning
• https://arxiv.org/pdf/1603.05106.pdf
7. Deep Reinforcement Learning (RL)
• https://deepmind.com/blog/article/deep-reinforcement-learning
• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
• https://medium.com/@jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5
• https://arxiv.org/pdf/1811.12560.pdf
8. Auto ML
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac
• https://www.fast.ai/2018/07/23/auto-ml-3
• https://www.fast.ai/2018/07/16/auto-ml2
• https://competitions.codalab.org/competitions/17767
• https://www.automl.org/automl/auto-sklearn
• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html
9. Neural Architecture Search (NAS)
• https://www.oreilly.com/ideas/what-is-neural-architecture-search
10. Infosys Enterprise Cognitive Platform
• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx
About the author
Sudhanshu Hate is inventor and architect of Infosys Enterprise Cognitive Platform (iECP), a microservices API based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and working with clients on industry problems. His current areas of interest are Computer Vision, Speech and unstructured Text based AI possibilities.
To know more about our work on the H3 trends in AI, write to icets@infosys.com
For more information, contact askus@infosys.com
Infosys.com | NYSE: INFY | Stay Connected
© 2019 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.
architecture styles such as VGG ResNet
Inception Xception InceptionResNet
MobileNet and NASNet have significantly
evolved However selection of the right
architecture for the right problem is also
a skill due to the presence of various
influencers such as applicability to the
problem accuracy number of parameters
memory and computational footprint and
size of the architecture that govern the
overall functioning efficiency
Neural Architecture Search tries to address
this problem space by automatically
selecting right Neural Network architecture
to solve a given problem
AutoML
HyperparameterOptimization
NAS
External Document copy 2019 Infosys Limited
Figure 130 Source Liam Li Ameet Talwalkar What is neural architecture search
External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited
Search space The search space provides
boundary within which the specific
architecture needs to be searched
Computer Vision (captioning the scene
or product identification) based use
cases would need a different neural
network architecture style as against
Speech (speech transcription or speaker
classification) or unstructured Text (Topic
extraction intent mining) based use cases
Search space tries to provide available
catalogs of best in class architectures based
on other domain data and performance
Key Components of NAS
These are also usually hand crafted by
expert data scientists
Optimization method This is responsible
for providing mechanism to search the
best architecture It could be searched
and applied randomly or using certain
statistical or Machine Learning evaluation
approach such as Bayesian method or
reinforcement learning methods
Evaluation method This has the role
of evaluating the quality of architecture
considered by optimization method It
could be done using full training approach
or doing partial training and then applying
certain specialized methods such as partial
training or early stopping weights sharing
network morphism etc
For selective problem spaces as
compared to manual methods NAS have
outperformed and is showing definite
promise for future However it is still
evolving and not ready for production
usages as several architectures need to be
established and evaluated depending on
the problem space
Search Space
DAG Representation
Cell Block
Meta-Architecture
NAS Specic
Reinforcement Learning
Evolutionary Search
Gradient-Based Optimization
BayesianOptimization
Optimization Method
Components of NAS
Full Training
Partial Training
Weight-Sharing
Network Morphism
Hypernetworks
Evaluation Method
Figure 140 Components of NAS Source Liam Li Ameet Talwalkar What is neural architecture search
External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited
In this paper we looked at some key H3 AI
areas by no means this is an exhaustive
list Amongst all discussed Transfer
Learning Capsule Networks Explainable
AI Generative AI are making interesting
Addressing H3 AI Trends at Infosys
things possible and looks highly promising
We are keenly experimenting with these
building early use cases and integrating
into our product stack Infosys Enterprise
Cognitive platform (iECP) to solve
interesting client problems Here is a look
at how we are employing these H3 trends
in the work we do
Trend Use cases
1
2
3
4
5
6
7
8
9
10
Explainable AI (XAI)
Generative AI Neural Style Transfer (NST)
Fine Grained Classication
Capsule Networks
Meta Learning
Transfer Learning
Single Shot Learning
Deep Reinforcement Learning (RL)
Auto ML
Neural Architecture Search (NAS)
Applicable across where results need to be traced eg Tumor Detection Mortgage Rejection Candidate Selection etc
Art Generation Sketch Generation Image or Video Resolution Improvements Data GenerationAugmentation Music Generation
Vehicle Classication Type of Tumor Detection
Image Re-constructionImage ComparisonMatching
Intelligent Agents Continuous Learning scenarios for document review and corrections
Identifying person not wearing helmet Logobrand detection in the image Speech Model training for various accents vocabularies
Face Recognition Face Verication
Intelligent Agents Robots Driverless cars Trac Light Monitoring Continuous Learning scenarios for document review and corrections
Invoice Attribute Extraction Document Classication Document Clustering
CNN or RNN based use cases such as Image Classication Object Identication Image Segmentation Speaker Classication etc
External Document copy 2019 Infosys Limited
Table 20 AI Use cases Infosys Research
External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited
1 Explainable AI (XAI)
2 Fine Grained Classification
5 Transfer Learning
6 Single Shot Learning
3 Capsule Networks
4 Meta Learning
7 Deep Reinforcement Learning (RL)
8 Auto ML
bull httpschristophmgithubiointerpretable-ml-book
bull httpssimmachinescomexplainable-ai
bull httpswwwcmuedunewsstoriesarchives2018octoberexplainable-aihtml
bull httpsmediumcomQuantumBlackmaking-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
bull httpstowardsdatasciencecomexplainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739
bull httpsvisioncornelleduse3wp-contentuploads201502BMVC14pdf
bull httpswwwfastai20180723auto-ml-3
bull httpsarxivorgpdf160305106pdf
bull httpsarxivorgpdf171009829pdf
bull httpskerasioexamplescifar10_cnn_capsule
bull httpswwwyoutubecomwatchv=pPN8d0E3900
bull httpswwwyoutubecomwatchv=rTawFwUvnLE
bull httpsmediumfreecodecamporgunderstanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc
bull httpsmediumcomjrodthoughtswhats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
bull httpproceedingsmlrpressv48santoro16pdf
bull httpstowardsdatasciencecomwhats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660
bull httpsdeepmindcomblogarticledeep-reinforcement-learning
bull httpsmediumfreecodecamporgan-introduction-to-reinforcement-learning-4339519de419
bull httpsmediumcomjonathan_huialphago-zero-a-game-changer-14ef6e45eba5
bull httpsarxivorgpdf181112560pdf
bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac
bull httpswwwfastai20180723auto-ml-3
bull httpswwwfastai20180716auto-ml2auto-ml
bull httpscompetitionscodalaborgcompetitions17767
bull httpswwwautomlorgautomlauto-sklearn
bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac
bull httpsautomlgithubioHpBandSterbuildhtmloptimizersbohbhtml
Reference
copy 2019 Infosys Limited Bengaluru India All Rights Reserved Infosys believes the information in this document is accurate as of its publication date such information is subject to change without notice Infosys acknowledges the proprietary rights of other companies to the trademarks product names and such other intellectual property rights mentioned in this document Except as expressly permitted neither this documentation nor any part of it may be reproduced stored in a retrieval system or transmitted in any form or by any means electronic mechanical printing photocopying recording or otherwise without the prior permission of Infosys Limited and or any named intellectual property rights holders under this document
For more information contact askusinfosyscom
Infosyscom | NYSE INFY Stay Connected
9 Neural Architecture Search (NAS)
10 Infosys Enterprise Cognitive Platform
bull httpswwworeillycomideaswhat-is-neural-architecture-search
bull httpswwwinfosyscomservicesincubating-emerging-technologiesofferingsPagesenterprise-cognitive-platformaspx
Sudhanshu Hate is inventor and architect of Infosys Enterprise Cognitive Platform (iECP)
a microservices API based Artificial Intelligence platform He has over 21 years of experience
in creating products solutions and working with clients on industry problems His current
areas of interests are Computer Vision Speech and Unstructured Text based AI possibilities
To know more about our work on the H3 trends in AI write to icetsinfosyscom
About the author
External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited
Transfer Learning helps save a significant amount of data, computational power, and time when training new models, as they leverage pre-trained weights from existing trained models and architectures. However, it is important to understand that the Transfer Learning approach today is mature enough to be applied only to similar use cases; that is, you cannot use the above-discussed model to train a facial recognition model.

Another key consideration during Transfer Learning is understanding the details of the data on which new use cases are being trained, as Transfer Learning can implicitly push the built-in biases of the underlying data into newer systems. It is recommended that the datasheets of the underlying models and data be studied thoroughly, unless the usage is for experimental purposes.

Having used the human-brain rationale earlier, it is important to note that human brains have gone through centuries of experience and gene evolution, which gives them the ability to learn faster, whereas transfer learning is just a few decades old and is becoming the foundation for new vision and text use cases.
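The core idea can be sketched in a few lines: treat an existing model's layers as a frozen feature extractor and train only a small new head on top. The snippet below is a minimal NumPy illustration; the fixed random projection stands in for real pre-trained layers (an assumption made for brevity, in practice the frozen weights would come from a model such as ResNet or MobileNet), and the dataset is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained backbone: these weights are FROZEN (never updated).
W_frozen = rng.normal(size=(20, 64)) / np.sqrt(20)

def backbone(X):
    """Frozen feature extractor: a fixed projection followed by ReLU."""
    return np.maximum(X @ W_frozen, 0.0)

# Small labeled dataset for the NEW task (toy, synthetic).
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Only the new head (w, b) is trained: logistic regression on frozen features.
feats = backbone(X)
w = np.zeros(feats.shape[1])
b = 0.0
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))   # sigmoid
    grad = p - y                                  # dLoss/dlogit
    w -= 0.05 * feats.T @ grad / len(y)
    b -= 0.05 * grad.mean()

acc = (((feats @ w + b) > 0) == (y > 0.5)).mean()
print(f"head-only training accuracy: {acc:.2f}")
```

Because only the small head is updated, training needs far fewer labeled examples and far less compute than fitting the whole network, which is exactly the saving described above.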
External Document © 2019 Infosys Limited
Figure 9.0: Transfer Learning Layers. Source: John Cherrie, Training Deep Learning Models with Transfer Learning
Single Shot Learning

Humans have the impressive skill to reason about new concepts and experiences with just a single example. They have the ability for one-shot generalization: the aptitude to encounter a new concept, understand its structure, and then generate compelling alternative variations of the same.

Facial recognition systems are good candidates for Single Shot Learning; otherwise, needing tens of thousands of individual face images to train one neural network can be extremely costly, time consuming, and infeasible. However, a Single Shot Learning based system, using an existing pre-trained FaceNet model with a facial-encoding based approach on top of it, can be very effective in establishing face similarity by computing the distance between faces.

In this approach, a 128-dimensional encoding of each face image is generated and compared with the other image's encoding to determine if the person is the same or different.

Various distance-based algorithms, such as Euclidean distance, can be used to determine whether two encodings are within a specified threshold. The model training approach involves creating pairs of (Anchor, Positive) and (Anchor, Negative) and training the model such that the (Anchor, Positive) pair distance is smaller and the (Anchor, Negative) distance is larger.

"Anchor" is the image of the person for whom the recognition model needs to be trained.
"Positive" is another image of the same person.
"Negative" is an image of a different person.
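The comparison step described above can be sketched with plain vectors standing in for face encodings. The 128-dimensional vectors and the threshold value below are made up for illustration (the real encodings would come from a model such as FaceNet); the snippet checks the distance threshold and the triplet property that the (Anchor, Positive) distance is smaller than the (Anchor, Negative) distance.

```python
import numpy as np

rng = np.random.default_rng(1)

def distance(enc_a, enc_b):
    """Euclidean distance between two face encodings."""
    return float(np.linalg.norm(enc_a - enc_b))

def same_person(enc_a, enc_b, threshold=0.6):
    """Declare the same person if the distance is within the
    (tunable, illustrative) threshold."""
    return distance(enc_a, enc_b) <= threshold

# Toy 128-dimensional unit encodings: an anchor, a positive (same person,
# small perturbation of the anchor) and a negative (different person).
anchor = rng.normal(size=128)
anchor /= np.linalg.norm(anchor)
positive = anchor + 0.02 * rng.normal(size=128)
positive /= np.linalg.norm(positive)
negative = rng.normal(size=128)
negative /= np.linalg.norm(negative)

d_ap = distance(anchor, positive)
d_an = distance(anchor, negative)
print(f"d(anchor, positive) = {d_ap:.3f} -> same person: {same_person(anchor, positive)}")
print(f"d(anchor, negative) = {d_an:.3f} -> same person: {same_person(anchor, negative)}")

# Triplet objective intuition: the positive pair should be closer than
# the negative pair by at least a margin; loss is zero once that holds.
margin = 0.2
triplet_loss = max(0.0, d_ap - d_an + margin)
```

This is why a single reference image per person suffices at enrollment time: recognition reduces to one distance computation against the stored encoding.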
Figure 10.0: Encoding approach, inspired by the ML Course from Coursera
Deep Reinforcement Learning (RL)

This is a specialized Machine Learning discipline in which an agent learns to behave in an environment by receiving rewards or punishments for the actions it performs. The agent can have an objective of maximizing short-term or long-term rewards. This discipline uses deep learning techniques to bring human-level performance to the given task.

Deep Reinforcement Learning has found significant relevance and application in various game-design systems, such as video games, chess, AlphaGo, and Atari, as well as in industrial applications such as robots, driverless cars, etc.

In reinforcement learning, a policy π controls what action we should take, and a value function v measures how good it is to be in a particular state. The value function tells us the maximum expected future reward the agent will get at each state.

Three Approaches to Reinforcement Learning

Value Based

In value-based RL, the goal is to optimize the value function V(s). The value of each state is the total amount of reward an agent can expect to accumulate over the future, starting at that state. A Q-table uses a mathematical function to arrive at a state value based on an action. The agent uses this value function to select which state to choose at each step.

Policy Based

In policy-based RL, we want to directly optimize the policy function π(s) without using a value function. The policy is what defines the agent's behavior at a given time. There are two types of policies:

1. Deterministic: a policy that, at a given state, will always return the same action.
2. Stochastic: a policy that outputs a probability distribution over actions.

Value-based and policy-based are the more conventional Reinforcement Learning approaches. They are useful for modeling relatively simple systems.
[Figure: In Q-learning, a Q-table maps a (state, action) pair to a Q-value, the expected (discounted) reward for taking that action in that state; in Deep Q-learning, a deep Q neural network maps a state to one Q-value per action.]
Figure 11.0: Schema inspired by the Q-learning notebook by Udacity
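The Q-table idea can be made concrete with a few lines of tabular Q-learning. The environment below is a made-up five-state corridor (move left or right, reward only at the rightmost state), chosen purely for illustration; a deep Q-network would replace the table with a neural network when the state space is too large to enumerate.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy corridor: states 0..4, actions 0 = left, 1 = right; reward 1 on reaching state 4.
N_STATES = 5

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1   # (next state, reward, done)

Q = np.zeros((N_STATES, 2))                   # the Q-table: states x actions
alpha, gamma, eps = 0.5, 0.9, 0.3             # learning rate, discount, exploration

for _ in range(500):                          # episodes
    s = int(rng.integers(N_STATES - 1))       # random non-terminal start state
    for _ in range(30):                       # step limit per episode
        # Epsilon-greedy: mostly exploit the table, sometimes explore.
        a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
        if done:
            break

policy = Q.argmax(axis=1)                     # greedy policy per state
print("greedy policy (0 = left, 1 = right):", policy)
```

After training, the greedy policy moves right from every non-terminal state, i.e., the table has learned that the reward lies at the right end of the corridor.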
Model Based

In model-based RL, we model the environment. This means we create a model of the behavior of the environment, and this model is then used to arrive at results that maximize short-term or long-term rewards. The model equation can be any equation defined based on the environment's behavior, and it must be sufficiently generalized to counter new situations.

When the model-based approach uses Deep Neural Network algorithms to generalize sufficiently well and learn the complexities of the environment to produce optimal results, it is called Deep Reinforcement Learning. The challenge with the model-based approach is that each environment needs a dedicated trained model.
AlphaGo was trained using data from several games to beat human players in the game of Go. The training accuracy was just 57 percent, and still it was sufficient to beat human-level performance. The training methods involved reinforcement learning and deep learning to build a policy network that tells which moves are promising, and a value network that tells how good the board position is. The search for the final move from these networks is done using the Monte Carlo Tree Search (MCTS) algorithm. Using supervised learning, a policy network was created to imitate expert moves.

DeepMind released AlphaGo Zero in late 2017, which beat AlphaGo and did not involve any training on data from previous games. The deep network was trained by having AlphaGo Zero play games against itself, selecting the best moves as training samples, and then applying those in real games to improve the results iteratively. This is possible because deep reinforcement learning algorithms can store long-range tree-search results for the next best move in memory and do very large computations that are difficult for a human brain.
Auto ML (AML)

Designing a machine learning solution involves several steps, such as collecting data; understanding, cleansing, and normalizing data; doing feature engineering; selecting or designing the algorithm; selecting the model architecture; selecting and tuning the model's hyper-parameters; evaluating the model's performance; deploying and monitoring the machine learning system in an online system; and so on. Such a machine learning solution design requires an expert Data Scientist to complete the pipeline.

As the complexity of these and other tasks can easily get overwhelming, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The AI research area that encompasses progressive automation of machine learning pipeline tasks is called AutoML (Automatic Machine Learning).
Google CEO Sundar Pichai wrote, "Designing neural nets is extremely time intensive and requires an expertise that limits its use to a smaller community of scientists and engineers. That's why we've created an approach called AutoML, showing that it's possible for neural nets to design neural nets," while Google's Head of AI, Jeff Dean, suggested that 100x computational power could replace the need for machine learning expertise.

AutoML Vision relies on two core techniques: transfer learning and neural architecture search.
[Figure: An AutoML system takes Xtrain, Ytrain, Xtest, and a budget; meta-learning and a hand-crafted portfolio warm-start Bayesian optimization over the ML pipeline (data processor, feature preprocessor, classifier), and an ensemble is built to produce Ytest.]
Figure 12.0: An example of an auto-sklearn pipeline. Source: André Biedenkapp, "We did it Again: World Champions in AutoML"
Implementing AutoML

Here is a look at a few libraries that help in implementing AutoML.

AUTO-SKLEARN

Auto-sklearn automates several key tasks in the Machine Learning pipeline, such as addressing missing column values, encoding of categorical values, data scaling and normalization, feature pre-processing, and selection of the right algorithm with hyper-parameters. The pipeline supports 15 classification and 14 feature-processing algorithms. Selection of the right algorithm can happen based on ensembling techniques and by applying meta-knowledge gathered from executing similar scenarios (datasets and algorithms).

Usage

Auto-sklearn is written in Python and can be considered a drop-in replacement for scikit-learn classifiers. Here is a sample set of commands:

>>> import autosklearn.classification
>>> cls = autosklearn.classification.AutoSklearnClassifier()
>>> cls.fit(X_train, y_train)
>>> predictions = cls.predict(X_test)
SMAC (Sequential Model-based Algorithm Configuration)

SMAC is a tool for automating certain AutoML steps. It is useful for selecting key features, for hyper-parameter optimization, and for speeding up algorithm runs.
BOHB (Bayesian Optimization with Hyperband)

BOHB combines Bayesian hyper-parameter optimization with bandit methods (Hyperband) for faster convergence.
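The bandit component that Hyperband (and hence BOHB) builds on is successive halving: start many configurations with a small budget, keep the better half, and double the budget for the survivors. The sketch below shows only that halving loop on a made-up objective; the loss function and budgets are illustrative assumptions, not the BOHB library's API.

```python
import numpy as np

rng = np.random.default_rng(3)

def loss(config, budget):
    """Stand-in for 'train config for `budget` steps, report validation loss'.
    Here: distance of a hyperparameter from an optimum of 0.7 (unknown to the
    search), observed with noise that shrinks as the budget grows."""
    return abs(config - 0.7) + rng.normal(scale=0.5 / np.sqrt(budget))

# Successive halving: many configs at small budget -> keep best half, double budget.
configs = list(rng.uniform(0.0, 1.0, size=16))
budget = 1
while len(configs) > 1:
    scores = [loss(c, budget) for c in configs]
    ranked = [c for _, c in sorted(zip(scores, configs))]
    configs = ranked[: len(configs) // 2]   # keep the better (lower-loss) half
    budget *= 2                             # survivors get more budget
    print(f"next budget = {budget:>3}, survivors = {len(configs)}")

best = configs[0]
print(f"selected hyperparameter: {best:.3f} (optimum was 0.7)")
```

Cheap low-budget evaluations weed out clearly bad configurations early, so the expensive high-budget runs are spent only on promising ones; BOHB additionally replaces the uniform sampling with a Bayesian model of which configurations look promising.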
Google and H2O also have their respective AutoML tools, which are not covered here but can be explored for specific cases.

AutoML needs significant memory and computational power to execute alternate algorithms and compute results. At present, GPU resources are extremely costly even for executing simple Machine Learning workloads, such as a CNN algorithm to classify objects. If multiple such alternate algorithms were to be executed, the compute cost would grow steeply. This is impractical, infeasible, and inefficient for the current state of the Data Science industry. Adoption of AutoML will depend on two things: one, the maturity of the AutoML pipeline, and two, more importantly, how quickly GPU clusters become cheap, with the second being the most critical. Selling cloud GPU capacity could be one of the motivations for several cloud-based infrastructure companies to promote AutoML in the industry. Also, AutoML will not replace the Data Scientist's work, but it can provide augmentation and speed for certain tasks, such as data standardization, model tuning, and trying multiple algorithms. It is only the beginning for AutoML, but this technique has high relevance and usefulness for solving ultra-complex problems.
Neural Architecture Search (NAS)

Neural Architecture Search (NAS) is a component of AutoML and addresses the important step of designing the Neural Network architecture.

Designing a fresh Neural Net architecture involves an expert establishing and organizing Neural Network layers, filters or channels, and filter sizes, selecting other optimum hyper-parameters, and so on, through several rounds of computational iterations. Since the AlexNet deep neural network architecture won the ImageNet competition (image classification based on the ImageNet dataset) in 2012, several architecture styles such as VGG, ResNet, Inception, Xception, InceptionResNet, MobileNet, and NASNet have evolved significantly. However, selecting the right architecture for the right problem is also a skill, due to the various influencers that govern overall functioning efficiency, such as applicability to the problem, accuracy, number of parameters, memory and computational footprint, and size of the architecture.

Neural Architecture Search tries to address this problem space by automatically selecting the right Neural Network architecture to solve a given problem.
[Figure: The relationship between AutoML, hyperparameter optimization, and NAS.]
Figure 13.0: Source: Liam Li and Ameet Talwalkar, "What is neural architecture search?"
Key Components of NAS

Search space: The search space provides the boundary within which a specific architecture needs to be searched. Computer Vision use cases (captioning a scene or product identification) would need a different neural network architecture style than Speech (speech transcription or speaker classification) or unstructured Text (topic extraction, intent mining) use cases. The search space tries to provide available catalogs of best-in-class architectures based on other domains' data and performance. These are also usually hand-crafted by expert data scientists.

Optimization method: This is responsible for providing the mechanism to search for the best architecture. Candidates can be searched and applied randomly, or using a statistical or Machine Learning evaluation approach such as Bayesian methods or reinforcement learning methods.

Evaluation method: This has the role of evaluating the quality of an architecture considered by the optimization method. It can be done using a full training approach, or by doing partial training and then applying certain specialized methods such as early stopping, weight sharing, network morphism, etc.

For selective problem spaces, NAS has outperformed manual methods and is showing definite promise for the future. However, it is still evolving and not ready for production usage, as several architectures need to be established and evaluated depending on the problem space.
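A minimal way to see the three components together is a toy random-search NAS loop: a hand-written search space of layer widths and learning rates, random sampling as the optimization method, and a cheap proxy evaluation (partial training of a tiny NumPy network on synthetic data). Everything here, including the search space, the data, and the scoring, is an illustrative assumption, far from a production NAS system.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic task: y is a nonlinear function of 5 inputs.
X = rng.normal(size=(300, 5))
y = np.tanh(X[:, 0]) + 0.5 * X[:, 1] ** 2
X_tr, y_tr, X_va, y_va = X[:200], y[:200], X[200:], y[200:]

# Search space: hand-crafted, as noted above.
SEARCH_SPACE = {"width": [2, 4, 8, 16, 32], "lr": [0.01, 0.05, 0.1]}

def sample_architecture():
    """Optimization method: plain random sampling from the search space."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def evaluate(arch, steps=200):
    """Evaluation method: partial training of a 1-hidden-layer net as a cheap proxy."""
    w1 = rng.normal(scale=0.5, size=(5, arch["width"]))
    w2 = rng.normal(scale=0.5, size=arch["width"])
    for _ in range(steps):
        h = np.tanh(X_tr @ w1)                 # hidden activations
        err = h @ w2 - y_tr                    # residual on the training split
        w2 -= arch["lr"] * h.T @ err / len(y_tr)
        w1 -= arch["lr"] * (X_tr.T @ ((err[:, None] * w2) * (1 - h ** 2))) / len(y_tr)
    h_va = np.tanh(X_va @ w1)
    return float(np.mean((h_va @ w2 - y_va) ** 2))   # validation MSE

best_arch, best_score = None, float("inf")
for _ in range(10):                                   # search budget: 10 candidates
    arch = sample_architecture()
    score = evaluate(arch)
    if score < best_score:
        best_arch, best_score = arch, score

print("best architecture:", best_arch, "val MSE:", round(best_score, 3))
```

Real NAS systems differ mainly in scale and in the sophistication of each slot: the same loop, but with cell-based search spaces, reinforcement-learning or gradient-based optimizers, and weight-sharing evaluation instead of training each candidate from scratch.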
[Figure: Components of NAS. Search Space: DAG representation, cell block, meta-architecture, NAS-specific. Optimization Method: reinforcement learning, evolutionary search, gradient-based optimization, Bayesian optimization. Evaluation Method: full training, partial training, weight-sharing, network morphism, hypernetworks.]
Figure 14.0: Components of NAS. Source: Liam Li and Ameet Talwalkar, "What is neural architecture search?"
Addressing H3 AI Trends at Infosys

In this paper, we looked at some key H3 AI areas; by no means is this an exhaustive list. Amongst all those discussed, Transfer Learning, Capsule Networks, Explainable AI, and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases and integrating them into our product stack, the Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do.
Trend | Use cases
1. Explainable AI (XAI) | Applicable wherever results need to be traced, e.g., Tumor Detection, Mortgage Rejection, Candidate Selection, etc.
2. Generative AI / Neural Style Transfer (NST) | Art Generation, Sketch Generation, Image or Video Resolution Improvements, Data Generation/Augmentation, Music Generation
3. Fine Grained Classification | Vehicle Classification, Type of Tumor Detection
4. Capsule Networks | Image Re-construction, Image Comparison/Matching
5. Meta Learning | Intelligent Agents, Continuous Learning scenarios for document review and corrections
6. Transfer Learning | Identifying a person not wearing a helmet, Logo/brand detection in an image, Speech Model training for various accents and vocabularies
7. Single Shot Learning | Face Recognition, Face Verification
8. Deep Reinforcement Learning (RL) | Intelligent Agents, Robots, Driverless cars, Traffic Light Monitoring, Continuous Learning scenarios for document review and corrections
9. Auto ML | Invoice Attribute Extraction, Document Classification, Document Clustering
10. Neural Architecture Search (NAS) | CNN or RNN based use cases such as Image Classification, Object Identification, Image Segmentation, Speaker Classification, etc.
Table 2.0: AI Use cases. Source: Infosys Research
References

1. Explainable AI (XAI)
• https://christophm.github.io/interpretable-ml-book/
• https://simmachines.com/explainable-ai/
• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html
• https://medium.com/@QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

2. Fine Grained Classification
• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf

3. Capsule Networks
• https://arxiv.org/pdf/1710.09829.pdf
• https://keras.io/examples/cifar10_cnn_capsule/
• https://www.youtube.com/watch?v=pPN8d0E3900
• https://www.youtube.com/watch?v=rTawFwUvnLE
• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

4. Meta Learning
• https://medium.com/@jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
• http://proceedings.mlr.press/v48/santoro16.pdf
• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

5. Transfer Learning
• https://www.fast.ai/2018/07/23/auto-ml-3/

6. Single Shot Learning
• https://arxiv.org/pdf/1603.05106.pdf

7. Deep Reinforcement Learning (RL)
• https://deepmind.com/blog/article/deep-reinforcement-learning
• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
• https://medium.com/@jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5
• https://arxiv.org/pdf/1811.12560.pdf

8. Auto ML
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac/
• https://www.fast.ai/2018/07/23/auto-ml-3/
• https://www.fast.ai/2018/07/16/auto-ml2/
• https://competitions.codalab.org/competitions/17767
• https://www.automl.org/automl/auto-sklearn/
• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html
© 2019 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.
For more information, contact askus@infosys.com
Infosys.com | NYSE: INFY | Stay Connected
9. Neural Architecture Search (NAS)
• https://www.oreilly.com/ideas/what-is-neural-architecture-search

10. Infosys Enterprise Cognitive Platform
• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx
About the author

Sudhanshu Hate is the inventor and architect of the Infosys Enterprise Cognitive Platform (iECP), a microservices API-based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and working with clients on industry problems. His current areas of interest are Computer Vision, Speech, and unstructured Text-based AI possibilities.

To know more about our work on the H3 trends in AI, write to icets@infosys.com
Humans have the impressive skill to reason
about new concepts and experiences with
just a single example They have the ability
for one-shot generalization the aptitude
to encounter a new concept understand
its structure and then generate compelling
alternative variations of the same
Facial recognition systems are good
candidates for Single Shot Learning
otherwise needing ten thousands of
individual face images to train one neural
network can be extremely costly time
consuming and infeasible However a
Single Shot LearningSingle Shot Learning based system using
existing pre-trained FaceNet model and
facial encoding based approach on top of
it can be very effective to establish face
similarity by computing distance between
the faces
In this approach 128 bit encoding of each
face image is generated and compared
with other imagersquos encoding to determine
if the person is same or different
Various distance based algorithms such
as Euclidean distance can be used to
determine if they are within specified
threshold The model training approach
involves creating pairs of (Anchor Positive)
and (Anchor Negative) and training the
model in a way where (Anchor Positive)
pair distance difference is smaller and
(Anchor Negative) distance is farther
ldquoAnchorrdquo is the image of a person for whom
the recognition model needs to be trained
ldquoPositiverdquo is another image of the same
person
ldquoNegativerdquo is image of a different person
External Document copy 2019 Infosys Limited
Figure 100 Encoding approach inspired from ML Course from Coursera
External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited
Deep Reinforcement Learning (RL)

This is a specialized Machine Learning discipline where an agent learns to behave in an environment by receiving rewards or punishments for the actions it performs. The agent can have an objective to maximize short-term or long-term rewards. This discipline uses deep learning techniques to bring in human-level performance on the given task.

Deep Reinforcement Learning has found significant relevance and application in various game design systems, such as video games, chess, AlphaGo and Atari, as well as in industrial applications such as robots, driverless cars, etc.

In reinforcement learning, the policy π controls what action we should take, while the value function v measures how good it is to be in a particular state. The value function tells us the maximum expected future reward the agent will get at each state.

Three Approaches to Reinforcement Learning

Value Based

In value-based RL, the goal is to optimize the value function V(s). The value of each state is the total amount of reward an agent can expect to accumulate over the future, starting at that state. The agent uses this value function to select which state to choose at each step. The Q-table uses a mathematical function to arrive at a state based on the action.

Policy Based

In policy-based RL, we want to directly optimize the policy function π(s) without using a value function. The policy is what defines the agent's behavior at a given time: action = policy(state).

There are two types of policies:
1. Deterministic: a policy which, at a given state, will always return the same action.
2. Stochastic: a policy that outputs a probability distribution over actions.

Value-based and policy-based are the more conventional Reinforcement Learning approaches. They are useful for modeling relatively simple systems.

Figure 110: Q-learning (a Q-table mapping state and action to an expected discounted reward) versus Deep Q-learning (a Deep Q Neural Network that takes a state and outputs a Q value per action). Schema inspired by the Q-learning notebook by Udacity.
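The value-based idea can be sketched with tabular Q-learning on a toy corridor environment. The environment, learning rate, and episode count below are illustrative assumptions for the sketch, not part of the paper.

```python
import random

N_STATES, GOAL = 5, 4          # corridor of states 0..4, reward on reaching state 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

# Q-table: expected discounted reward for each (state, action) pair
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # actions: 0 = left, 1 = right

def step(state, action):
    """Toy environment: move left or right along the corridor."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

random.seed(0)
for _ in range(200):                        # training episodes
    state, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the value function, sometimes explore
        if random.random() < EPSILON:
            action = random.randrange(2)
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        nxt, reward, done = step(state, action)
        # Bellman update: nudge Q(s, a) toward reward + gamma * max_a' Q(s', a')
        Q[state][action] += ALPHA * (reward + GAMMA * max(Q[nxt]) - Q[state][action])
        state = nxt

# The learned value function now selects the action at each state
greedy_path = [0]
while greedy_path[-1] != GOAL and len(greedy_path) < 10:
    s = greedy_path[-1]
    greedy_path.append(step(s, 0 if Q[s][0] > Q[s][1] else 1)[0])
```

After training, following the greedy policy walks straight from state 0 to the goal; replacing the table with a neural network that maps a state to Q values per action gives the Deep Q-learning variant in Figure 110.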
Model Based

In model-based RL, we model the environment. This means we create a model of the behavior of the environment, and this model is then used to arrive at results that maximize short-term or long-term rewards. The model equation can be any equation defined based on the environment's behavior, and it must be sufficiently generalized to handle new situations.

When the model-based approach uses Deep Neural Network algorithms to generalize sufficiently well and learn the complexities of the environment to produce optimal results, it is called Deep Reinforcement Learning. The challenge with the model-based approach is that each environment needs a dedicated trained model.

AlphaGo was trained using data from several games to beat human players in the game of Go. The training accuracy was just 57%, and still it was sufficient to beat human-level performance. The training methods involved reinforcement learning and deep learning to build a policy network that tells which moves are promising and a value network that tells how good the board position is. The search for the final move from these networks is done using the Monte Carlo Tree Search (MCTS) algorithm. Using supervised learning, a policy network was created to imitate expert moves.

DeepMind released AlphaGo Zero in late 2017, which beat AlphaGo and did not involve any training from previous games' data to train the deep network. The deep network was trained by picking training samples from AlphaGo and AlphaGo Zero playing games against themselves, selecting the best moves to train the network, and then applying those in real games to improve the results iteratively. This is possible because deep reinforcement learning algorithms can store long-range tree search results for the next best move in memory and do very large computations that are difficult for a human brain.
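A minimal sketch of the model-based idea: learn a model of the environment's behavior from interaction, then plan inside that model rather than the real environment. The toy corridor, the random-interaction model learning, and the fixed rollout policy are illustrative simplifications, not how AlphaGo works.

```python
import random

# Toy corridor environment: states 0..4, reward 1 for being at the goal state
N_STATES, GOAL, ACTIONS = 5, 4, (-1, +1)

def env_step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, (1.0 if nxt == GOAL else 0.0)

# 1) Learn a model of the environment's behavior from random interaction
model = {}  # (state, action) -> (next_state, reward)
random.seed(0)
for _ in range(500):
    s, a = random.randrange(N_STATES), random.choice(ACTIONS)
    model[(s, a)] = env_step(s, a)

# 2) Plan inside the learned model instead of the real environment
def plan(state, depth=6):
    """Try each first action, then roll out a fixed 'go right' policy in the
    learned model; pick the first action with the best simulated return."""
    best_action, best_return = None, float("-inf")
    for first in ACTIONS:
        s, total, act = state, 0.0, first
        for _ in range(depth):
            s, r = model[(s, act)]
            total += r
            act = +1
        if total > best_return:
            best_action, best_return = first, total
    return best_action
```

The key property is that the rollouts in `plan` never touch `env_step` directly; systems like AlphaGo replace the lookup table with learned networks and the fixed rollout policy with MCTS.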
Designing a machine learning solution involves several steps, such as collecting data; understanding, cleansing and normalizing data; doing feature engineering; selecting or designing the algorithm; selecting the model architecture; selecting and tuning the model's hyper-parameters; evaluating the model's performance; deploying and monitoring the machine learning system in an online setting; and so on. Such machine learning solution design requires an expert Data Scientist to complete the pipeline.

Auto ML (AML)

As the complexity of these and other tasks can easily get overwhelming, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The AI research area that encompasses progressive automation of machine learning pipeline tasks is called AutoML (Automatic Machine Learning).

Google CEO Sundar Pichai wrote, "Designing neural nets is extremely time intensive and requires an expertise that limits its use to a smaller community of scientists and engineers. That's why we've created an approach called AutoML, showing that it's possible for neural nets to design neural nets," while Google's Head of AI, Jeff Dean, suggested that 100x computational power could replace the need for machine learning expertise.

AutoML Vision relies on two core techniques: transfer learning and neural architecture search.
[Figure: Auto-sklearn AutoML system. Inputs are Xtrain, Ytrain, Xtest and a budget; meta-learning over a hand-crafted portfolio feeds Bayesian optimization of the ML pipeline (data processor, feature preprocessor, classifier); an ensemble is then built to produce Ytest.]
Figure 120: An example of an Auto-sklearn pipeline. Source: André Biedenkapp, "We did it Again: World Champions in AutoML"
Implementing AutoML

Here is a look at a few libraries that help in implementing AutoML.

AUTO-SKLEARN

Auto-sklearn automates several key tasks in the Machine Learning pipeline, such as addressing missing column values, encoding of categorical values, data scaling and normalization, feature pre-processing, and selection of the right algorithm with hyper-parameters. The pipeline supports 15 classification and 14 feature processing algorithms. Selection of the right algorithm can happen based on ensembling techniques and by applying meta knowledge gathered from executing similar scenarios (datasets and algorithms).

Usage

Auto-sklearn is written in Python and can be considered a drop-in replacement for scikit-learn classifiers. Here is a sample set of commands:

>>> import autosklearn.classification
>>> cls = autosklearn.classification.AutoSklearnClassifier()
>>> cls.fit(X_train, y_train)
>>> predictions = cls.predict(X_test)
SMAC (Sequential Model-Based Algorithm Configuration)

SMAC is a tool for automating certain AutoML steps. SMAC is useful for selection of key features, hyper-parameter optimization, and speeding up algorithmic outputs.

BOHB (Bayesian Optimization Hyperband searches)

BOHB combines Bayesian hyper-parameter optimization with bandit methods for faster convergence.

Google and H2O also have their respective AutoML tools, which are not covered here but can be explored in specific cases.
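The bandit half of BOHB — the part it borrows from Hyperband — can be sketched as successive halving: evaluate many configurations cheaply, keep the best, and spend more budget only on survivors. The stand-in objective function and budget schedule below are illustrative assumptions; a real tool would train a model at each evaluation and add a Bayesian model to propose configurations.

```python
import random

def evaluate(config: float, budget: int) -> float:
    """Stand-in objective (lower is better): a real AutoML tool would train a
    model for `budget` epochs here and return a validation loss."""
    rng = random.Random(int(config * 1e6) + budget)   # deterministic noise
    return (1.0 - config) + rng.uniform(0, 0.1) / budget

def successive_halving(n_configs=16, min_budget=1):
    """Core bandit idea behind Hyperband/BOHB: evaluate many hyper-parameter
    configurations on a small budget, keep the best half, double the budget."""
    rng = random.Random(42)
    configs = [rng.random() for _ in range(n_configs)]
    budget = min_budget
    while len(configs) > 1:
        configs = sorted(configs, key=lambda c: evaluate(c, budget))
        configs = configs[: len(configs) // 2]   # survivors: 16 -> 8 -> 4 -> 2 -> 1
        budget *= 2
    return configs[0]

best = successive_halving()
```

Notice that the noisy cheap evaluations at budget 1 are only trusted enough to discard the clearly bad half; the final choice is made at the largest budget, where the noise term is smallest.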
AutoML needs significant memory and computational power to execute alternative algorithms and compute results. At present, GPU resources are costly even for simple Machine Learning workloads, such as a CNN algorithm to classify objects. If multiple alternative algorithms have to be executed, the compute cost multiplies accordingly. This is impractical, infeasible and inefficient for the current state of the Data Science industry. Adoption of AutoML will depend on two things: first, the maturity of the AutoML pipeline, and second, but more important, how quickly GPU clusters become cheap, the second being the most critical. Selling cloud GPU capacity could be one of the motivations of several cloud infrastructure companies to promote AutoML in the industry. Also, AutoML will not replace the Data Scientist's work, but it can augment and speed up certain tasks such as data standardization, model tuning and trying multiple algorithms. It is only the beginning for AutoML, but this technique has high relevance and usefulness for solving ultra-complex problems.
Neural Architecture Search (NAS)

Neural Architecture Search (NAS) is a component of AutoML and addresses the important step of designing the Neural Network architecture.

Designing a fresh Neural Net architecture involves an expert establishing and organizing Neural Network layers, filters or channels, and filter sizes, selecting other optimum hyper-parameters, and so on, through several rounds of computational iterations. Since the AlexNet deep neural network architecture won the ImageNet competition (image classification based on the ImageNet dataset) in 2012, several architecture styles such as VGG, ResNet, Inception, Xception, InceptionResNet, MobileNet and NASNet have evolved significantly. However, selection of the right architecture for the right problem is also a skill, due to the presence of various influencers, such as applicability to the problem, accuracy, number of parameters, memory and computational footprint, and size of the architecture, that govern the overall functioning efficiency.

Neural Architecture Search tries to address this problem space by automatically selecting the right Neural Network architecture to solve a given problem.

Figure 130: NAS and Hyperparameter Optimization as components of AutoML. Source: Liam Li, Ameet Talwalkar, "What is neural architecture search?"
Key Components of NAS

Search space: The search space provides the boundary within which the specific architecture needs to be searched. Computer Vision based use cases (captioning a scene or product identification) would need a different neural network architecture style than Speech (speech transcription or speaker classification) or unstructured Text (topic extraction, intent mining) based use cases. The search space tries to provide available catalogs of best-in-class architectures based on other domain data and performance. These are also usually hand-crafted by expert data scientists.

Optimization method: This is responsible for providing the mechanism to search for the best architecture. Architectures could be searched and applied randomly, or by using a statistical or Machine Learning evaluation approach such as Bayesian methods or reinforcement learning methods.

Evaluation method: This has the role of evaluating the quality of the architectures considered by the optimization method. It could be done using a full training approach, or by doing partial training and then applying certain specialized methods such as early stopping, weight sharing, network morphism, etc.

For selective problem spaces, NAS has outperformed manual methods and is showing definite promise for the future. However, it is still evolving and not ready for production usage, as several architectures need to be established and evaluated depending on the problem space.

Figure 140: Components of NAS — Search Space (DAG representation, cell block, meta-architecture, NAS-specific), Optimization Method (reinforcement learning, evolutionary search, gradient-based optimization, Bayesian optimization) and Evaluation Method (full training, partial training, weight-sharing, network morphism, hypernetworks). Source: Liam Li, Ameet Talwalkar, "What is neural architecture search?"
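The three components can be sketched together using the simplest optimization method, random search. The search space and the proxy scoring function below are illustrative assumptions; a real NAS system would fully or partially train each candidate network as its evaluation method.

```python
import random

# Search space: hand-crafted boundaries within which architectures are sampled
SEARCH_SPACE = {
    "num_layers": [2, 4, 8],
    "width": [32, 64, 128],
    "activation": ["relu", "tanh"],
}

def sample_architecture(rng: random.Random) -> dict:
    """Optimization method, simplest case: uniform random search."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}

def evaluate_architecture(arch: dict) -> float:
    """Evaluation method stub: a real NAS system would train the candidate
    network and return validation accuracy. This made-up proxy simply
    prefers deeper, wider ReLU networks."""
    score = arch["num_layers"] * 0.05 + arch["width"] / 256.0
    return score + (0.1 if arch["activation"] == "relu" else 0.0)

def random_search_nas(n_trials: int = 20, seed: int = 0) -> dict:
    rng = random.Random(seed)
    candidates = [sample_architecture(rng) for _ in range(n_trials)]
    return max(candidates, key=evaluate_architecture)

best_arch = random_search_nas()
```

Swapping `sample_architecture` for a Bayesian or reinforcement learning proposer, and `evaluate_architecture` for partial training with early stopping or weight sharing, yields the more sophisticated NAS variants listed in Figure 140.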
Addressing H3 AI Trends at Infosys

In this paper, we looked at some key H3 AI areas; by no means is this an exhaustive list. Amongst all discussed, Transfer Learning, Capsule Networks, Explainable AI and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases and integrating them into our product stack, the Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do.
Trend: Use cases

1. Explainable AI (XAI): Applicable wherever results need to be traced, e.g. Tumor Detection, Mortgage Rejection, Candidate Selection, etc.
2. Generative AI / Neural Style Transfer (NST): Art Generation, Sketch Generation, Image or Video Resolution Improvements, Data Generation/Augmentation, Music Generation
3. Fine Grained Classification: Vehicle Classification, Type of Tumor Detection
4. Capsule Networks: Image Re-construction, Image Comparison/Matching
5. Meta Learning: Intelligent Agents, Continuous Learning scenarios for document review and corrections
6. Transfer Learning: Identifying a person not wearing a helmet, Logo/brand detection in images, Speech model training for various accents and vocabularies
7. Single Shot Learning: Face Recognition, Face Verification
8. Deep Reinforcement Learning (RL): Intelligent Agents, Robots, Driverless cars, Traffic Light Monitoring, Continuous Learning scenarios for document review and corrections
9. Auto ML: Invoice Attribute Extraction, Document Classification, Document Clustering
10. Neural Architecture Search (NAS): CNN or RNN based use cases such as Image Classification, Object Identification, Image Segmentation, Speaker Classification, etc.
Table 20: AI Use cases. Source: Infosys Research
Reference

1. Explainable AI (XAI)
• https://christophm.github.io/interpretable-ml-book/
• https://simmachines.com/explainable-ai/
• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html
• https://medium.com/QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

2. Fine Grained Classification
• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf

3. Capsule Networks
• https://arxiv.org/pdf/1710.09829.pdf
• https://keras.io/examples/cifar10_cnn_capsule/
• https://www.youtube.com/watch?v=pPN8d0E3900
• https://www.youtube.com/watch?v=rTawFwUvnLE
• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

4. Meta Learning
• https://medium.com/@jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
• http://proceedings.mlr.press/v48/santoro16.pdf
• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

5. Transfer Learning
• https://www.fast.ai/2018/07/23/auto-ml-3/

6. Single Shot Learning
• https://arxiv.org/pdf/1603.05106.pdf

7. Deep Reinforcement Learning (RL)
• https://deepmind.com/blog/article/deep-reinforcement-learning
• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
• https://medium.com/@jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5
• https://arxiv.org/pdf/1811.12560.pdf

8. Auto ML
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac/
• https://www.fast.ai/2018/07/23/auto-ml-3/
• https://www.fast.ai/2018/07/16/auto-ml2/
• https://competitions.codalab.org/competitions/17767
• https://www.automl.org/automl/auto-sklearn/
• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html

9. Neural Architecture Search (NAS)
• https://www.oreilly.com/ideas/what-is-neural-architecture-search

10. Infosys Enterprise Cognitive Platform
• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx

© 2019 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.

For more information, contact askus@infosys.com

Infosys.com | NYSE: INFY | Stay Connected
About the author

Sudhanshu Hate is the inventor and architect of the Infosys Enterprise Cognitive Platform (iECP), a microservices API based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and working with clients on industry problems. His current areas of interest are Computer Vision, Speech and Unstructured Text based AI possibilities.

To know more about our work on the H3 trends in AI, write to icets@infosys.com
External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited
This is a specialized Machine Learning
discipline where an agent learns to behave
in an environment by getting reward or
punishment for the actions performed The
agent can have an objective to maximize
short term or long-term rewards This
discipline uses deep learning techniques to
Value Based
In value-based RL the goal is to optimize
the value function V(s) Qtable uses any
The agent will use this value function to
select which state to choose at each step
Policy Based
In policy-based RL we want to directly
optimize the policy function π(s) without
using a value function
The policy is what defines the agent
behavior at a given time
Deep Reinforcement Learning (RL)
Three Approaches to Reinforcement Learning
bring in human level performance on the
given task
Deep Reinforcement Learning has found
significant relevance and application in
various game design systems such as
creating video games chess alpha Go
Atari as well as in industrial applications of
mathematical function to arrive at a state
based on action
The value of each state is the total amount
There are two types of policies
1 Deterministic A policy which at a given
state will always return the same action
2 Stochastic A policy that outputs a
distribution probability over actions
Value based and Policy based are more
conventional Reinforcement Learning
approaches They are useful for modeling
relatively simple systems
robots driverless car etc
In reinforcement learning policy p
controls what action we should take Value
function v measures how good it is to be
in a particular state The value function
tells us the maximum expected future
reward the agent will get at each state
of the reward an agent can expect to
accumulate over the future starting at that
state
State
State
Q value
Q value action 2
Q value action 1
Q value action 3
Qtable
Deep Q Neuralnetwork
Q learning
Deep Q learning
Action
ExpectedReward discounted
Given that state
action = policy(state)
Figure 110 Schema inspired by the Q learning notebook by Udacity
External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited
Model Based
In model-based RL we model the
environment This means we create a
model of the behavior of the environment
then this model is used to arrive at results
that maximises short term or long-term
rewards The model equation can be
any equation that is defined based on
the environments behavior and must be
sufficiently generalized to counter new
situations
When Model based approach uses Deep
Neural Network algorithms to sufficiently
well generalize and learn the complexities
of the environment to produce optimal
results it is called Deep Reinforcement
Learning The challenge with model based
approach is each environment needs a
dedicated trained model
AlphaGo was trained by using data from
several games to beat the human being
in the game of Go The training accuracy
was just 57 and still it was sufficient to
beat the human level performance The
training methods involved reinforcement
learning and deep learning to build a
policy network that tells what moves are
promising and a value network that tells
how good the board position is Searches
for the final move from these networks
is done using Monte Carlo Tree Search
(MCTS) algorithm Using supervised
learning a policy network was created to
imitate the expert moves
Deep Mind released AlphaGo Zero in late
2017 which beat AlphaGo and did not
involve any training from previous games
data to train deep network The deep
network training was done by picking
the training samples from AlphaGo and
AlphaGo Zero playing games against
itself and selecting best moves to train
the network and then applying those
in real games to improve the results
iteratively This is possible because deep
reinforcement learning algorithms can
store long-range tree search results for the
next best move in memory and do very
large computations that are difficult for a
human brain
Designing machine learning solution
involves several steps such as collecting
data understanding cleansing and
normalizing data doing feature
engineering selecting or designing
the algorithm selecting the model
architecture selecting and tuning modelrsquos
hyper-parameters evaluating modelrsquos
performance deploying and monitoring
the machine learning system in an online
system and so on Such machine learning
solution design requires an expert Data
Scientist to complete the pipeline
Auto ML (AML)As the complexity of these and other tasks
can easily get overwhelming the rapid
growth of machine learning applications
has created a demand for off-the-shelf
machine learning methods that can be
used easily and without expert knowledge
The AI research area that encompasses
progressive automation of machine
learning pipeline tasks is called AutoML
(Automatic Machine Learning)
Google CEO Sundar Pichai wrote
ldquoDesigning neural nets is extremely time
intensive and requires an expertise that
limits its use to a smaller community of
scientists and engineers Thatrsquos why wersquove
created an approach called AutoML
showing that itrsquos possible for neural nets
to design neural netsrdquo while Googlersquos
Head of AI Jeff Dean suggested that 100x
computational power could replace the
need for machine learning expertise
AutoML Vision relies on two core
techniques transfer learning and neural
architecture search
Xtrain Ytrain
Xtest budget
Han
d-cr
afte
d po
rtfo
lio Meta Learning
AutoML system
Build Ensemble Ytest Data
ProcessorFeature
Preprocessor
Bayesian Optimization
Classier
ML Pipeline
Figure 120 An example of Auto sklearn pipeline Source Andreacute Biedenkapp We did it Again World Champions in AutoML
External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited
Here is a look at the few libraries that help
in implementing AutoML
AUTO-SKLEARN
AUTO SKLEARN automates several key
tasks in Machine Learning pipeline such
as addressing column missing values
encoding of categorical values data scaling
and normalization feature pre-processing
and selection of right algorithm with
hyper-parameters The pipeline supports
15 Classification and 14 Feature processing
algorithms Selection of right algorithm can
happen based on ensembling techniques
and applying meta knowledge gathered
from executing similar scenarios (datasets
and algorithms)
Usage
Auto-sklearn is written in python and can
be considered as replacement for scikit-
learn classifiers Here is a sample set of
commands
gtgtgt import autosklearnclassification
Implementing AutoML gtgtgt cls = autosklearnclassification
AutoSklearnClassifier()
gtgtgt clsfit(X_train y_train)
gtgtgt predictions = clspredict(X_test
y_test)
SMAC (Sequential Model-Based
Algorithm Configuration)
SMAC is a tool for automating certain
AutoML steps SMAC is useful for selection
of key features hyper-parameter
optimization and to speed up algorithmic
outputs
BOHB (Bayesian Optimization
Hyperband searches)
BOHB combines Bayesian hyper parameter
optimization with bandit methods for
faster convergence
Google H2O also have their respective
AutoML tools which are not covered here
but can be explored in specific cases
AutoML needs significant memory and
computational power to execute alternate
algorithms and compute results At
present GPU resources are extremely
costly to execute even simple Machine
Learning workloads such as CNN algorithm
to classify objects If multiple such
alternate algorithms should be executed
the computation dollar needed would be
exponential This is impractical infeasible
and inefficient for the current state of Data
Science industry Adoption of AutoML will
depend on two things one the maturity
of AutoML pipeline and second but more
important how quickly GPU clusters
become cheap The second being most
critical Selling Cloud GPU capacity could
be one of the motivation of several cloud
based infrastructure-running companies
to promote AutoML in the industry Also
AutoML will not replace the Data scientistrsquos
work but can provide augmentation
and speed to certain tasks such as data
standardization model tuning and
trying multiple algorithms It is only the
beginning for AutoML but this technique
has high relevance and usefulness for
solving ultra-complex problems
External Document copy 2019 Infosys Limited
External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited
Neural Architecture Search (NAS) is a
component of AutoML and addresses
the important step of designing Neural
Network Architecture
Designing fresh Neural Net architecture
involves an expert establishing and
organizing Neural Network layers filters
or channels filter sizes selecting other
optimum Hyper parameters and so on
through several rounds of computational
Neural Architecture Search (NAS)
iterations Since AlexNet deep neural
network architecture won the ImageNet
(image classification based on ImageNet
dataset) competition in 2012 several
architecture styles such as VGG ResNet
Inception Xception InceptionResNet
MobileNet and NASNet have significantly
evolved However selection of the right
architecture for the right problem is also
a skill due to the presence of various
influencers such as applicability to the
problem accuracy number of parameters
memory and computational footprint and
size of the architecture that govern the
overall functioning efficiency
Neural Architecture Search tries to address
this problem space by automatically
selecting right Neural Network architecture
to solve a given problem
AutoML
HyperparameterOptimization
NAS
External Document copy 2019 Infosys Limited
Figure 130 Source Liam Li Ameet Talwalkar What is neural architecture search
External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited
Search space The search space provides
boundary within which the specific
architecture needs to be searched
Computer Vision (captioning the scene
or product identification) based use
cases would need a different neural
network architecture style as against
Speech (speech transcription or speaker
classification) or unstructured Text (Topic
extraction intent mining) based use cases
Search space tries to provide available
catalogs of best in class architectures based
on other domain data and performance
Key Components of NAS
These are also usually hand crafted by
expert data scientists
Optimization method This is responsible
for providing mechanism to search the
best architecture It could be searched
and applied randomly or using certain
statistical or Machine Learning evaluation
approach such as Bayesian method or
reinforcement learning methods
Evaluation method This has the role
of evaluating the quality of architecture
considered by optimization method It
could be done using full training approach
or doing partial training and then applying
certain specialized methods such as partial
training or early stopping weights sharing
network morphism etc
For selective problem spaces as
compared to manual methods NAS have
outperformed and is showing definite
promise for future However it is still
evolving and not ready for production
usages as several architectures need to be
established and evaluated depending on
the problem space
Search Space
DAG Representation
Cell Block
Meta-Architecture
NAS Specic
Reinforcement Learning
Evolutionary Search
Gradient-Based Optimization
BayesianOptimization
Optimization Method
Components of NAS
Full Training
Partial Training
Weight-Sharing
Network Morphism
Hypernetworks
Evaluation Method
Figure 140 Components of NAS Source Liam Li Ameet Talwalkar What is neural architecture search
External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited
In this paper we looked at some key H3 AI
areas by no means this is an exhaustive
list Amongst all discussed Transfer
Learning Capsule Networks Explainable
AI Generative AI are making interesting
Addressing H3 AI Trends at Infosys
things possible and looks highly promising
We are keenly experimenting with these
building early use cases and integrating
into our product stack Infosys Enterprise
Cognitive platform (iECP) to solve
interesting client problems Here is a look
at how we are employing these H3 trends
in the work we do
Trend Use cases
1
2
3
4
5
6
7
8
9
10
Explainable AI (XAI)
Generative AI Neural Style Transfer (NST)
Fine Grained Classication
Capsule Networks
Meta Learning
Transfer Learning
Single Shot Learning
Deep Reinforcement Learning (RL)
Auto ML
Neural Architecture Search (NAS)
Applicable across where results need to be traced eg Tumor Detection Mortgage Rejection Candidate Selection etc
Art Generation Sketch Generation Image or Video Resolution Improvements Data GenerationAugmentation Music Generation
Vehicle Classication Type of Tumor Detection
Image Re-constructionImage ComparisonMatching
Intelligent Agents Continuous Learning scenarios for document review and corrections
Identifying person not wearing helmet Logobrand detection in the image Speech Model training for various accents vocabularies
Face Recognition Face Verication
Intelligent Agents Robots Driverless cars Trac Light Monitoring Continuous Learning scenarios for document review and corrections
Invoice Attribute Extraction Document Classication Document Clustering
Model Based

In model-based RL, we model the environment: we create a model of the environment's behavior and then use that model to arrive at results that maximize short-term or long-term rewards. The model equation can be any equation defined from the environment's behavior, and it must be sufficiently generalized to handle new situations.

When the model-based approach uses deep neural network algorithms to generalize sufficiently well, learn the complexities of the environment and produce optimal results, it is called Deep Reinforcement Learning. The challenge with the model-based approach is that each environment needs a dedicated trained model.
AlphaGo was trained using data from several games to beat human players at the game of Go. The training accuracy was just 57%, and still it was sufficient to beat human-level performance. The training method combined reinforcement learning and deep learning to build a policy network, which tells which moves are promising, and a value network, which tells how good the board position is. The search for the final move from these networks is done using the Monte Carlo Tree Search (MCTS) algorithm. Using supervised learning, a policy network was created to imitate expert moves.

DeepMind released AlphaGo Zero in late 2017, which beat AlphaGo and did not involve any training from previous game data. The deep network was trained by having AlphaGo Zero play games against itself, picking the best moves as training samples, and then applying those in real games to improve the results iteratively. This is possible because deep reinforcement learning algorithms can store long-range tree search results for the next best move in memory and perform very large computations that are difficult for a human brain.
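The way a tree search combines the two networks can be illustrated with a toy UCB-style move selector. This is a sketch of the general idea only, not DeepMind's implementation; the function names, statistics and constants are invented for illustration:

```python
import math

# Toy move selection combining a value estimate ("how good is the position
# after this move") with a policy-prior exploration bonus ("how promising
# does this move look"), the core trade-off inside MCTS-style searches.

def select_move(stats, priors, c_puct=1.0):
    """stats: move -> (visit_count, total_value); priors: move -> policy prob."""
    total_visits = sum(n for n, _ in stats.values())
    def score(move):
        n, w = stats[move]
        q = w / n if n else 0.0                                # value estimate
        u = c_puct * priors[move] * math.sqrt(total_visits) / (1 + n)  # prior bonus
        return q + u
    return max(stats, key=score)

stats = {"a": (10, 6.0), "b": (2, 1.5), "c": (0, 0.0)}
priors = {"a": 0.5, "b": 0.3, "c": 0.2}
print(select_move(stats, priors))
```

Here move "b" wins: its average value is high and it is still under-visited, so both the value term and the exploration term favor it over the heavily visited "a".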
Designing a machine learning solution involves several steps, such as collecting data; understanding, cleansing and normalizing data; doing feature engineering; selecting or designing the algorithm; selecting the model architecture; selecting and tuning the model's hyper-parameters; evaluating the model's performance; and deploying and monitoring the machine learning system in an online setting. Such machine learning solution design requires an expert Data Scientist to complete the pipeline.
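To make concrete what is being done by hand at each of these steps, here is a deliberately miniature pipeline written from scratch (illustrative only; the data, the k-NN "algorithm selection" and the leave-one-out tuning loop are all invented stand-ins for the real work):

```python
# A hand-built miniature of the pipeline described above: imputation,
# scaling, algorithm choice and hyper-parameter tuning, each done manually.
# These are exactly the steps AutoML tries to automate.

def impute(rows):
    """Cleansing: replace missing values (None) with the column mean."""
    cols = list(zip(*rows))
    means = [sum(v for v in c if v is not None) / sum(v is not None for v in c)
             for c in cols]
    return [[m if v is None else v for v, m in zip(r, means)] for r in rows]

def scale(rows):
    """Normalization: min-max scale each column to [0, 1]."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(r, lo, hi)] for r in rows]

def knn_predict(train_X, train_y, x, k):
    """'Selecting the algorithm': a k-nearest-neighbour majority vote."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

X = scale(impute([[1.0, 2.0], [None, 0.0], [3.0, 4.0],
                  [4.0, None], [0.0, 1.0], [5.0, 5.0]]))
y = [0, 0, 1, 1, 0, 1]

def accuracy(k):
    """Hyper-parameter tuning: leave-one-out accuracy for a given k."""
    hits = sum(knn_predict(X[:i] + X[i+1:], y[:i] + y[i+1:], X[i], k) == y[i]
               for i in range(len(X)))
    return hits / len(X)

best_k = max([1, 3, 5], key=accuracy)
```

Every function above is a decision an expert currently makes; AutoML's premise is that each can be searched over automatically.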
Auto ML (AML)

As the complexity of these and other tasks can easily get overwhelming, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The AI research area that encompasses progressive automation of machine learning pipeline tasks is called AutoML (Automatic Machine Learning).

Google CEO Sundar Pichai wrote, "Designing neural nets is extremely time intensive and requires an expertise that limits its use to a smaller community of scientists and engineers. That's why we've created an approach called AutoML, showing that it's possible for neural nets to design neural nets," while Google's Head of AI Jeff Dean suggested that 100x computational power could replace the need for machine learning expertise.

AutoML Vision relies on two core techniques: transfer learning and neural architecture search.
Figure 12.0: An example of an Auto-sklearn pipeline. Given training data and a budget, meta-learning (seeded from a hand-crafted portfolio) and Bayesian optimization search over the ML pipeline stages (data processor, feature preprocessor, classifier), and an ensemble is built from the best models. Source: André Biedenkapp, We did it Again: World Champions in AutoML
Implementing AutoML

Here is a look at a few libraries that help in implementing AutoML.

AUTO-SKLEARN

Auto-sklearn automates several key tasks in the machine learning pipeline, such as addressing missing column values, encoding of categorical values, data scaling and normalization, feature pre-processing, and selection of the right algorithm with hyper-parameters. The pipeline supports 15 classification and 14 feature-processing algorithms. Selection of the right algorithm can happen based on ensembling techniques and by applying meta-knowledge gathered from executing similar scenarios (datasets and algorithms).

Usage

Auto-sklearn is written in Python and can be considered a replacement for scikit-learn classifiers. Here is a sample set of commands:

>>> import autosklearn.classification
>>> cls = autosklearn.classification.AutoSklearnClassifier()
>>> cls.fit(X_train, y_train)
>>> predictions = cls.predict(X_test)
SMAC (Sequential Model-based Algorithm Configuration)

SMAC is a tool for automating certain AutoML steps. SMAC is useful for selecting key features, optimizing hyper-parameters and speeding up algorithmic outputs.

BOHB (Bayesian Optimization and Hyperband searches)

BOHB combines Bayesian hyper-parameter optimization with bandit methods for faster convergence.
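The bandit half of BOHB is Hyperband, which is built on successive halving: start many configurations with a tiny budget, keep only the best-performing fraction, and double the budget for the survivors. Here is a toy sketch of that loop under invented assumptions (the `loss` function is a hypothetical stand-in for validation loss; the real libraries add Bayesian sampling of new configurations):

```python
# Toy successive halving, the bandit strategy underlying Hyperband/BOHB.
# Illustrative only: "loss" fakes a validation loss that improves with
# training budget and depends on the configuration.

def loss(config, budget):
    """Hypothetical validation loss for a config trained with this budget."""
    return config["lr_penalty"] / (1 + budget)

def successive_halving(configs, min_budget=1, rounds=3):
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1 and rounds > 0:
        survivors.sort(key=lambda c: loss(c, budget))      # evaluate cheaply
        survivors = survivors[: max(1, len(survivors) // 2)]  # keep best half
        budget *= 2                                        # double the budget
        rounds -= 1
    return survivors[0]

configs = [{"lr_penalty": p} for p in (0.9, 0.3, 0.5, 0.1)]
best = successive_halving(configs)
print(best)  # the config with the smallest penalty survives
```

The appeal is budget efficiency: bad configurations are eliminated after only a small training budget, so most compute goes to promising ones, which is where the "faster convergence" claim comes from.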
Google and H2O also have their respective AutoML tools, which are not covered here but can be explored for specific cases.

AutoML needs significant memory and computational power to execute alternative algorithms and compute results. At present, GPU resources are extremely costly even for simple machine learning workloads, such as a CNN algorithm to classify objects. If multiple such alternative algorithms are to be executed, the compute dollars needed would grow exponentially. This is impractical, infeasible and inefficient for the current state of the data science industry. Adoption of AutoML will depend on two things: first, the maturity of the AutoML pipeline, and second, and more important, how quickly GPU clusters become cheap, the second being the most critical. Selling cloud GPU capacity could be one of the motivations for several cloud infrastructure companies to promote AutoML in the industry. Also, AutoML will not replace the Data Scientist's work, but it can augment and speed up certain tasks such as data standardization, model tuning and trying multiple algorithms. It is only the beginning for AutoML, but this technique has high relevance and usefulness for solving ultra-complex problems.
Neural Architecture Search (NAS)

Neural Architecture Search (NAS) is a component of AutoML and addresses the important step of designing the neural network architecture.

Designing a fresh neural net architecture involves an expert establishing and organizing neural network layers, filters or channels, and filter sizes, selecting other optimum hyper-parameters, and so on, through several rounds of computational iterations. Since the AlexNet deep neural network architecture won the ImageNet competition (image classification based on the ImageNet dataset) in 2012, several architecture styles such as VGG, ResNet, Inception, Xception, InceptionResNet, MobileNet and NASNet have significantly evolved. However, selecting the right architecture for the right problem is also a skill, because various influencers, such as applicability to the problem, accuracy, number of parameters, memory and computational footprint, and size of the architecture, govern the overall functioning efficiency.

Neural Architecture Search tries to address this problem space by automatically selecting the right neural network architecture to solve a given problem.
Figure 13.0: NAS and hyper-parameter optimization as sub-fields of AutoML. Source: Liam Li, Ameet Talwalkar, What is neural architecture search?
Key Components of NAS

Search space: The search space provides the boundary within which the specific architecture needs to be searched. Computer Vision use cases (captioning a scene or product identification) would need a different neural network architecture style than Speech (speech transcription or speaker classification) or unstructured Text (topic extraction, intent mining) use cases. The search space tries to provide available catalogs of best-in-class architectures based on other domain data and performance. These are also usually hand-crafted by expert data scientists.

Optimization method: This is responsible for providing the mechanism to search for the best architecture. Candidates could be searched and applied randomly, or by using a statistical or machine learning evaluation approach such as Bayesian methods or reinforcement learning methods.

Evaluation method: This has the role of evaluating the quality of the architecture considered by the optimization method. It could be done using a full training approach, or by doing partial training and then applying certain specialized methods such as early stopping, weight sharing, network morphism, etc.

For selective problem spaces, NAS has outperformed manual methods and is showing definite promise for the future. However, it is still evolving and not ready for production usage, as several architectures need to be established and evaluated depending on the problem space.
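The three components can be made concrete in a few lines. This is a deliberately tiny sketch under invented assumptions (a three-choice search space, random sampling as the optimization method, and a fake scoring function as the evaluation method; a real evaluator would partially or fully train each candidate):

```python
import random

# Illustrative random-search NAS: the search space enumerates a few
# architecture choices, the optimization method is random sampling, and the
# evaluation method is a cheap scoring stub standing in for (partial) training.

random.seed(0)  # make the sketch deterministic

SEARCH_SPACE = {
    "depth": [2, 4, 8],
    "width": [16, 32, 64],
    "kernel": [3, 5],
}

def evaluate(arch):
    """Evaluation stub: a made-up accuracy that rewards depth but
    penalizes oversized models (parameter count)."""
    params = arch["depth"] * arch["width"] * arch["kernel"] ** 2
    return 0.5 + 0.1 * arch["depth"] - 0.0001 * params

def random_search(n_trials=20):
    """Optimization method: sample candidates and keep the best one seen."""
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch

best = random_search()
print(best)
```

Swapping `random_search` for a Bayesian or reinforcement-learning controller, and `evaluate` for partial training with early stopping or weight sharing, recovers the design space the figure below enumerates, without changing the overall loop.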
Figure 14.0: Components of NAS: the search space (DAG representation, cell block, meta-architecture, NAS-specific), the optimization method (reinforcement learning, evolutionary search, gradient-based optimization, Bayesian optimization) and the evaluation method (full training, partial training, weight-sharing, network morphism, hypernetworks). Source: Liam Li, Ameet Talwalkar, What is neural architecture search?
Addressing H3 AI Trends at Infosys

In this paper we looked at some key H3 AI areas; by no means is this an exhaustive list. Amongst all those discussed, Transfer Learning, Capsule Networks, Explainable AI and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases and integrating them into our product stack, the Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do:

1. Explainable AI (XAI): applicable wherever results need to be traced, e.g. Tumor Detection, Mortgage Rejection, Candidate Selection, etc.
2. Generative AI / Neural Style Transfer (NST): Art Generation, Sketch Generation, Image or Video Resolution Improvements, Data Generation/Augmentation, Music Generation
3. Fine Grained Classification: Vehicle Classification, Type of Tumor Detection
4. Capsule Networks: Image Re-construction, Image Comparison/Matching
5. Meta Learning: Intelligent Agents, Continuous Learning scenarios for document review and corrections
6. Transfer Learning: Identifying a person not wearing a helmet, Logo/brand detection in images, Speech model training for various accents and vocabularies
7. Single Shot Learning: Face Recognition, Face Verification
8. Deep Reinforcement Learning (RL): Intelligent Agents, Robots, Driverless Cars, Traffic Light Monitoring, Continuous Learning scenarios for document review and corrections
9. Auto ML: Invoice Attribute Extraction, Document Classification, Document Clustering
10. Neural Architecture Search (NAS): CNN or RNN based use cases such as Image Classification, Object Identification, Image Segmentation, Speaker Classification, etc.

Table 2.0: AI Use cases. Source: Infosys Research
References

1. Explainable AI (XAI)
• https://christophm.github.io/interpretable-ml-book
• https://simmachines.com/explainable-ai
• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html
• https://medium.com/@QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

2. Fine Grained Classification
• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf

3. Capsule Networks
• https://arxiv.org/pdf/1710.09829.pdf
• https://keras.io/examples/cifar10_cnn_capsule
• https://www.youtube.com/watch?v=pPN8d0E3900
• https://www.youtube.com/watch?v=rTawFwUvnLE
• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

4. Meta Learning
• https://medium.com/@jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
• http://proceedings.mlr.press/v48/santoro16.pdf
• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

5. Transfer Learning
• https://www.fast.ai/2018/07/23/auto-ml-3

6. Single Shot Learning
• https://arxiv.org/pdf/1603.05106.pdf

7. Deep Reinforcement Learning (RL)
• https://deepmind.com/blog/article/deep-reinforcement-learning
• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
• https://medium.com/@jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5
• https://arxiv.org/pdf/1811.12560.pdf

8. Auto ML
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac
• https://www.fast.ai/2018/07/23/auto-ml-3
• https://www.fast.ai/2018/07/16/auto-ml2
• https://competitions.codalab.org/competitions/17767
• https://www.automl.org/automl/auto-sklearn
• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html

9. Neural Architecture Search (NAS)
• https://www.oreilly.com/ideas/what-is-neural-architecture-search

10. Infosys Enterprise Cognitive Platform
• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx

About the author

Sudhanshu Hate is the inventor and architect of the Infosys Enterprise Cognitive Platform (iECP), a microservices API based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and working with clients on industry problems. His current areas of interest are Computer Vision, Speech and unstructured Text based AI possibilities.

To know more about our work on the H3 trends in AI, write to icets@infosys.com

© 2019 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.

For more information, contact askus@infosys.com

Infosys.com | NYSE: INFY | Stay Connected
bull httpproceedingsmlrpressv48santoro16pdf
bull httpstowardsdatasciencecomwhats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660
bull httpsdeepmindcomblogarticledeep-reinforcement-learning
bull httpsmediumfreecodecamporgan-introduction-to-reinforcement-learning-4339519de419
bull httpsmediumcomjonathan_huialphago-zero-a-game-changer-14ef6e45eba5
bull httpsarxivorgpdf181112560pdf
bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac
bull httpswwwfastai20180723auto-ml-3
bull httpswwwfastai20180716auto-ml2auto-ml
bull httpscompetitionscodalaborgcompetitions17767
bull httpswwwautomlorgautomlauto-sklearn
bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac
bull httpsautomlgithubioHpBandSterbuildhtmloptimizersbohbhtml
Reference
copy 2019 Infosys Limited Bengaluru India All Rights Reserved Infosys believes the information in this document is accurate as of its publication date such information is subject to change without notice Infosys acknowledges the proprietary rights of other companies to the trademarks product names and such other intellectual property rights mentioned in this document Except as expressly permitted neither this documentation nor any part of it may be reproduced stored in a retrieval system or transmitted in any form or by any means electronic mechanical printing photocopying recording or otherwise without the prior permission of Infosys Limited and or any named intellectual property rights holders under this document
For more information contact askusinfosyscom
Infosyscom | NYSE INFY Stay Connected
9 Neural Architecture Search (NAS)
10 Infosys Enterprise Cognitive Platform
bull httpswwworeillycomideaswhat-is-neural-architecture-search
bull httpswwwinfosyscomservicesincubating-emerging-technologiesofferingsPagesenterprise-cognitive-platformaspx
Sudhanshu Hate is inventor and architect of Infosys Enterprise Cognitive Platform (iECP)
a microservices API based Artificial Intelligence platform He has over 21 years of experience
in creating products solutions and working with clients on industry problems His current
areas of interests are Computer Vision Speech and Unstructured Text based AI possibilities
To know more about our work on the H3 trends in AI write to icetsinfosyscom
About the author
External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited
Key Components of NAS

Search space: The search space provides the boundary within which a specific architecture is searched. Computer Vision use cases (captioning a scene, product identification) need a different neural network architecture style than Speech (speech transcription, speaker classification) or unstructured Text (topic extraction, intent mining) use cases. The search space therefore provides a catalog of best-in-class candidate architectures, based on their data and performance in other domains. These catalogs are also usually hand-crafted by expert data scientists.
Optimization method: This is responsible for providing the mechanism to search for the best architecture. Candidates can be picked and applied randomly, or selected using a statistical or machine learning approach such as a Bayesian method or reinforcement learning.
Evaluation method: This has the role of evaluating the quality of the architectures considered by the optimization method. Evaluation can be done by fully training each candidate, or more cheaply through partial training combined with specialized methods such as early stopping, weight sharing, or network morphism.
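To make the partial-training idea concrete, here is a minimal, illustrative sketch of a successive-halving style evaluator. Everything in it is a toy assumption: `partial_train` is a hypothetical stand-in that returns a noisy score instead of actually training a network, with the noise shrinking as the training budget grows.

```python
import random
import zlib

def partial_train(arch, budget_epochs):
    # Hypothetical stand-in for partially training architecture `arch`.
    # A real evaluator would train the network for `budget_epochs` epochs
    # and return validation accuracy; here we fake a noisy score whose
    # noise shrinks as the budget grows.
    rng = random.Random(zlib.crc32(arch.encode()))  # stable per-architecture seed
    true_quality = rng.random()                     # hidden "full-training" quality
    noise = random.uniform(-0.5, 0.5) / budget_epochs
    return true_quality + noise

def successive_halving(candidates, start_budget=1, rounds=3):
    # Early-stopping evaluation: train every survivor briefly,
    # keep the top half, double the budget, and repeat.
    pool, budget = list(candidates), start_budget
    for _ in range(rounds):
        scored = sorted(pool, key=lambda a: partial_train(a, budget), reverse=True)
        pool = scored[:max(1, len(scored) // 2)]
        budget *= 2
    return pool[0]
```

With eight candidates and three rounds, only the most promising architectures ever receive a larger training budget, which is exactly the cost saving that partial training buys.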
For selected problem spaces, NAS has outperformed manual methods and shows definite promise for the future. However, it is still evolving and not ready for production use, as suitable architectures need to be established and evaluated for each problem space.
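As an illustration of how the three components fit together, here is a minimal random-search NAS loop. It is a simplified sketch, not a real NAS system: the search space is a made-up catalog of choices, and `evaluate` scores an architecture with a hand-written formula where a real system would train the network and measure validation accuracy.

```python
import random

# Assumed toy search space: an architecture is one choice per dimension.
SEARCH_SPACE = {
    "op":      ["conv3x3", "conv5x5", "max_pool"],
    "n_cells": [2, 4, 6],
    "width":   [16, 32],
}

def sample_architecture(rng):
    # Optimization method (random search): draw one point from the space.
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def evaluate(arch):
    # Evaluation method (stand-in): a real NAS system would train the
    # candidate, fully or partially, and report validation accuracy.
    op_score = {"conv3x3": 0.6, "conv5x5": 0.5, "max_pool": 0.3}[arch["op"]]
    return op_score + 0.02 * arch["n_cells"] + 0.001 * arch["width"]

def random_search(n_trials=20, seed=0):
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = sample_architecture(rng)
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```

Swapping `sample_architecture` for a Bayesian or reinforcement-learning controller, and `evaluate` for partial training with early stopping, recovers the more sophisticated NAS variants described above.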
Components of NAS:
Search Space: DAG representation, cell/block, meta-architecture, NAS-specific
Optimization Method: reinforcement learning, evolutionary search, gradient-based optimization, Bayesian optimization
Evaluation Method: full training, partial training, weight sharing, network morphism, hypernetworks

Figure 140: Components of NAS. Source: Liam Li and Ameet Talwalkar, What is neural architecture search?
Addressing H3 AI Trends at Infosys

In this paper we looked at some key H3 AI areas; by no means is this an exhaustive list. Amongst all those discussed, Transfer Learning, Capsule Networks, Explainable AI, and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases and integrating them into our product stack, the Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do.
Trend: Use cases

1. Explainable AI (XAI): Applicable wherever results need to be traced, e.g. tumor detection, mortgage rejection, candidate selection
2. Generative AI / Neural Style Transfer (NST): Art generation, sketch generation, image or video resolution improvement, data generation/augmentation, music generation
3. Fine Grained Classification: Vehicle classification, type of tumor detection
4. Capsule Networks: Image reconstruction, image comparison/matching
5. Meta Learning: Intelligent agents, continuous learning scenarios for document review and corrections
6. Transfer Learning: Identifying a person not wearing a helmet, logo/brand detection in images, speech model training for various accents and vocabularies
7. Single Shot Learning: Face recognition, face verification
8. Deep Reinforcement Learning (RL): Intelligent agents, robots, driverless cars, traffic light monitoring, continuous learning scenarios for document review and corrections
9. Auto ML: Invoice attribute extraction, document classification, document clustering
10. Neural Architecture Search (NAS): CNN or RNN based use cases such as image classification, object identification, image segmentation, speaker classification

Table 20: AI Use cases. Source: Infosys Research
References

1. Explainable AI (XAI)
• https://christophm.github.io/interpretable-ml-book/
• https://simmachines.com/explainable-ai/
• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html
• https://medium.com/@QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

2. Fine Grained Classification
• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf

3. Capsule Networks
• https://arxiv.org/pdf/1710.09829.pdf
• https://keras.io/examples/cifar10_cnn_capsule/
• https://www.youtube.com/watch?v=pPN8d0E3900
• https://www.youtube.com/watch?v=rTawFwUvnLE
• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

4. Meta Learning
• https://medium.com/@jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
• http://proceedings.mlr.press/v48/santoro16.pdf
• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

5. Transfer Learning
• https://www.fast.ai/2018/07/23/auto-ml-3/

6. Single Shot Learning
• https://arxiv.org/pdf/1603.05106.pdf

7. Deep Reinforcement Learning (RL)
• https://deepmind.com/blog/article/deep-reinforcement-learning
• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
• https://medium.com/@jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5
• https://arxiv.org/pdf/1811.12560.pdf

8. Auto ML
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac/
• https://www.fast.ai/2018/07/23/auto-ml-3/
• https://www.fast.ai/2018/07/16/auto-ml2/#auto-ml
• https://competitions.codalab.org/competitions/17767
• https://www.automl.org/automl/auto-sklearn/
• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html

9. Neural Architecture Search (NAS)
• https://www.oreilly.com/ideas/what-is-neural-architecture-search

10. Infosys Enterprise Cognitive Platform
• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx

About the author

Sudhanshu Hate is the inventor and architect of the Infosys Enterprise Cognitive Platform (iECP), a microservices API based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and working with clients on industry problems. His current areas of interest are Computer Vision, Speech, and unstructured Text based AI possibilities.

To know more about our work on the H3 trends in AI, write to icets@infosys.com

© 2019 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.

For more information, contact askus@infosys.com

Infosys.com | NYSE: INFY | Stay Connected